PowerPoint Transcript

CLiMB: Computational Linguistics for Metadata Building
Center for Research on Information Access
Columbia University Libraries
CLiMB - Columbia University
CLiMB: Interdisciplinary Research Project at Columbia University
Funded by the Mellon Foundation, 2002-2004
• Center for Research on Information Access (CRIA)
• Libraries
• Computer Science Department
Problems in Image Access
• Cataloging digital images
• Traditional approach: manual expertise
– labor intensive
– expensive
• Can automated techniques help?
Can we harvest image descriptors?
CLiMB Technical Contribution
CLiMB will identify and extract
• proper nouns
• terms and phrases
from text related to an image, for example:
September 14, 1908, the basis of the Greenes' final
design had been worked out. It featured a radically
informal, V-shaped plan (that maintained the original
angled porch) and interior volumes of various heights,
all under a constantly changing roofline that echoed
the rise and fall of the mountains behind it. The
chimneys and foundation would be constructed of the
sandstone boulders that comprised the local geology,
and the exterior of the house would be sheathed in
stained split-redwood shakes.
(Edward R. Bosley. Greene & Greene. London : Phaidon, 2000. p. 127)
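The extraction step can be illustrated with a minimal sketch. It is not CLiMB's actual toolset: it assumes a crude regular-expression heuristic for proper nouns (runs of capitalized words, skipping a lone capitalized word at the start of a sentence) and a small hypothetical vocabulary standing in for a controlled thesaurus such as the AAT, matched as whole words or phrases.

```python
import re

# Hypothetical vocabulary standing in for a controlled thesaurus such as the AAT.
VOCABULARY = ("porch", "roofline", "chimneys", "sandstone", "split-redwood shakes")

# One or more capitalized words, optionally joined by "&" (as in "Greene & Greene").
PROPER_NOUN = re.compile(r"\b[A-Z][a-z']+(?:\s+&?\s*[A-Z][a-z']+)*")


def find_proper_nouns(text):
    """Return runs of capitalized words, skipping lone sentence-initial words."""
    found = []
    for match in PROPER_NOUN.finditer(text):
        phrase = match.group(0)
        before = text[:match.start()].rstrip()
        sentence_initial = before == "" or before.endswith((".", "!", "?"))
        if sentence_initial and " " not in phrase:
            continue  # most likely an ordinary word capitalized at sentence start
        found.append(phrase)
    return found


def find_terms(text, vocabulary=VOCABULARY):
    """Return vocabulary terms found as whole words, in order of first appearance."""
    lowered = text.lower()
    hits = []
    for term in vocabulary:
        match = re.search(r"\b" + re.escape(term) + r"\b", lowered)
        if match:
            hits.append((match.start(), term))
    return [term for _, term in sorted(hits)]


if __name__ == "__main__":
    sample = ("Charles Greene designed the Blacker House with an angled porch. "
              "Chimneys of sandstone echoed the roofline.")
    print(find_proper_nouns(sample))  # ['Charles Greene', 'Blacker House']
    print(find_terms(sample))         # ['porch', 'chimneys', 'sandstone', 'roofline']
```

A real pipeline would need tokenization, part-of-speech tagging, and abbreviation handling. This heuristic, for instance, discards a single name that follows "Mrs." because it reads the period as a sentence end.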
7/19/2016
Overall Goals
• Research: Development of richer retrieval through a larger number of descriptors
• Practice: Development of a suite of CLiMB tools
• Resources: A vocabulary list that can be used by other visual resource professionals
The essence of CLiMB:
• Use scholars themselves as “catalogers” by drawing on scholarly publications
• Enhance existing descriptive metadata
CLiMB Project Teams
• Coordinating
• Collections (Curatorial)
• Technical
• External Advisory
Squeezing Metadata out of Scholarly Texts
• Image collection
• Associated text
• Target object identification (TOI)
• CLiMB suite of tools
• Evaluation
CLiMB Processes
Inputs: source text; other texts; TOIs; AAT / BBIs / etc.; image collections; core descriptive records
Phase I: process texts
– Generate TEI markup
– Run CLiMB suite of tools
– Result: enriched XML test records
Phase II: select metadata from texts
– User evaluation by art librarians, subject specialists, catalogers, and search & retrieval experts
– Select words & phrases to include in core descriptive records
– Result: CLiMB enriched descriptive records
Phase III: use CLiMB metadata in image search platform
– Image search platform with CLiMB metadata, serving end-users
CLiMB Collections
• Greene & Greene Architectural Drawings, Avery Architectural and Fine Arts Library
• Chinese Paper Gods, C.V. Starr East Asian Library
• Photographs from the Archives, American Institute of Indian Studies
Greene & Greene Architectural Records and Papers Collection
Drawings and Archives
Avery Architectural and Fine Arts Library
Columbia University Libraries
Charles Sumner Greene (1868-1957)
Henry Mather Greene (1870-1954)
NYDA.1960.001.00023
All Saints Episcopal Church (Pasadena, Calif.). Alterations
1902-1903
Greene & Greene Catalog Record
Author: Greene & Greene.
Title: [Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.). Alterations.]
Residence of Mrs. Dudley P. Allen, 1188 Hillcrest Ave., Pasadena, Cal.
[graphic] : Alteration / Greene & Greene, Architects.
Published: [1917]
Physical Details: 4 sheets : various media ; 87.8 x 57.3 cm. (34 5/8 x 22 5/8 in.)
Location: Columbia University, Avery Architectural Drawings
Other Authors:
Subjects:
Greene, Charles Sumner, 1868-1957.
Greene, Henry Mather, 1870-1954.
Houses
Alterations
Architecture--Designs and plans--United States.
Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.)
Component Item: [1] Item no. NYDA.1960.001.03224. [AVERYimage]. Electric lighting -floor plan, part plan of basement : Sheet no.
Component Item: [2] Item no. NYDA.1960.001.00073. [AVERYimage]. [Electric lighting] -floor plan, part plan of basement.
Greene & Greene Bibliography
• Bosley, Edward R. Greene & Greene. London : Phaidon, 2000.
• Current, William R. Greene & Greene: architects in the residential style. Fort Worth [Tex.] : Amon Carter Museum of Western Art, [1974].
• Makinson, Randell L. Greene & Greene: architecture as fine art. Salt Lake City : Peregrine Smith, c1977.
• Makinson, Randell L. Greene & Greene: the passion and the legacy. Salt Lake City : Gibbs and Smith, c1998.
• Smith, Bruce. Greene & Greene masterworks. San Francisco : Chronicle Books, c1998.
• Strand, Janann. A Greene & Greene guide. [Pasadena, Calif. : G. Dahlstrom, 1974].
Chinese Paper Gods
Anne S. Goodrich Collection
C.V. Starr East Asian Library, Columbia University
Pan-hu chih-shen
God of tigers
Chinese Paper Gods Catalog Record
Title: Chuang gong chuang mu [graphic].
Published: [193-]
Physical Details: 1 print : wood-engraving, color ; 34 x 30 cm.
In: Anne S. Goodrich Collection.
Location: Columbia University, C.V. Starr East Asian Library (CJK)
EAX GAC 1 no. 16
Subjects: Gods, Chinese, in art.
Folk art--China.
Genre Or Form: Woodcuts--Chinese.
Notes: Date according to time period Anne S. Goodrich collected prints in Beijing.
Record ID: NYCP02-F20
Chinese Paper Gods Bibliography
• Day, Clarence Burton. Chinese peasant cults : being a study of Chinese paper gods. Taipei : Ch'eng Wen Pub. Co., 1974.
• Goodrich, Anne Swann. Peking paper gods : a look at home worship. Nettetal : Steyler Verlag, 1991.
• Laing, Ellen Johnston. Art and aesthetics in Chinese popular prints : selections from the Muban Foundation collection. Ann Arbor, MI : Center for Chinese Studies, University of Michigan, c2002.
Chinese gods: selection from LC Authority File
HEADING: Nezha (Chinese deity)
Used For/See From:
Daluoxian (Chinese deity)
Jinhuan Yuanshuai (Chinese deity)
Jinkang Yuanshuai (Chinese deity)
Li Nezha (Chinese deity)
Luoche Taizi (Chinese deity)
Ne Zha (Chinese deity)
Nezhataizi (Chinese deity)
No-cha (Chinese deity)
Nuozha (Chinese deity)
Tailuoxian (Chinese deity)
Taizi Yuanshuai (Chinese deity)
Taiziyeh (Chinese deity)
Yühuang Taizi (Chinese deity)
Zhongtan Yuanshuai (Chinese deity)
Search Also Under: Gods, Chinese
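Variant lists like this can be turned into a lookup table that resolves any listed form to the authorized heading. A minimal sketch, assuming a hypothetical in-memory copy of a few of the variants above (not an LC service or API) and an aggressive normalization that drops the "(Chinese deity)" qualifier, diacritics, case, spaces, and hyphens:

```python
import re
import unicodedata

# Hypothetical in-memory excerpt of the authority record shown above
# (authorized heading -> a few of its "Used For/See From" variants).
AUTHORITY = {
    "Nezha (Chinese deity)": [
        "Daluoxian (Chinese deity)",
        "Li Nezha (Chinese deity)",
        "Ne Zha (Chinese deity)",
        "No-cha (Chinese deity)",
        "Nuozha (Chinese deity)",
        "Yühuang Taizi (Chinese deity)",
    ],
}


def normalize(name):
    """Drop a trailing qualifier, diacritics, case, spaces, and punctuation."""
    name = re.sub(r"\s*\([^)]*\)\s*$", "", name)  # e.g. "(Chinese deity)"
    decomposed = unicodedata.normalize("NFKD", name)
    bare = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^0-9a-z]", "", bare.lower())


def build_index(authority):
    """Map every normalized heading and variant to its authorized heading."""
    index = {}
    for heading, variants in authority.items():
        for form in [heading, *variants]:
            index[normalize(form)] = heading
    return index


def lookup(name, index):
    """Return the authorized heading for name, or None if it is not listed."""
    return index.get(normalize(name))


if __name__ == "__main__":
    index = build_index(AUTHORITY)
    print(lookup("Ne-Zha", index))         # Nezha (Chinese deity)
    print(lookup("Yuhuang Taizi", index))  # Nezha (Chinese deity)
    print(lookup("Zhong Kui", index))      # None
```

Normalization lets spellings such as "Ne-Zha" or "Yuhuang Taizi" match listed forms, but it only resolves variants that the authority record actually lists. It can also merge genuinely distinct names, so matches may still need review.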
Three Testbed Collections
• Greene & Greene
– detailed records
– more difficult to associate text with image
• Chinese Paper Gods
– strong associations between text and image
– problems with transliteration and variants
• South Asian Temples
– large set of digital images
– diacritics and variants
CLiMB Collections: Future
• Additional collection of digital images
• Close association between image and text
• Regularized metadata
Suggestions:
• Catalogue raisonné
• Museum collection catalog
• Exhibition catalog
Target Object Identification (TOI)
• Define based on institutional needs
• Varies from collection to collection
– Greene & Greene – Project Names
– Chinese Paper Gods – God Names
– South Asian Temples – Temple Names
• Compile authority list
Project Name Matching
• Locate project names in Greene & Greene texts
• Challenge: finding variant name forms
– Robert R. Blacker house (TOI)
– Blacker estate
– The house
• Possible techniques to improve matching
– Develop a semi-automatic technique
– Use existing information to label text
– Build an iterative platform for manual intervention
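The semi-automatic approach can be sketched as follows. This is an illustrative assumption, not the CLiMB implementation: variant forms are derived from the TOI with hypothetical rules (the owner's surname plus "house", "estate", or "residence"), mentions of the TOI or a derived variant are labeled automatically, and generic references such as "the house" are queued for manual review.

```python
import re

# Hypothetical rules: a variant is the owner's surname plus one of these nouns.
BUILDING_NOUNS = ("house", "estate", "residence")

# Generic references that cannot be tied to a TOI without a human decision.
GENERIC = re.compile(r"\bthe (?:house|estate|residence)\b", re.IGNORECASE)


def variant_forms(toi):
    """Derive variants from a TOI of the form '<owner name> house'."""
    surname = toi.split()[-2]  # assumes the surname precedes the final noun
    return [toi] + [f"{surname} {noun}" for noun in BUILDING_NOUNS]


def label_mentions(text, toi):
    """Split mentions into automatic matches and ones needing manual review."""
    forms = sorted(variant_forms(toi), key=len, reverse=True)  # longest first
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(f) for f in forms) + r")\b", re.IGNORECASE
    )
    automatic = [m.group(0) for m in pattern.finditer(text)]
    review = [m.group(0) for m in GENERIC.finditer(text)]
    return automatic, review


if __name__ == "__main__":
    sample = ("Drawings for the Robert R. Blacker house survive. "
              "Later letters mention the Blacker estate and the house itself.")
    automatic, review = label_mentions(sample, "Robert R. Blacker house")
    print(automatic)  # ['Robert R. Blacker house', 'Blacker estate']
    print(review)     # ['the house']
```

Rules like these cannot follow changes of ownership. The Culbertson house also appears under the names of later owners, so such variants have to come from a compiled authority list rather than from the TOI string itself.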
The Culbertson House
• Cordelia A. Culbertson house (Pasadena, Calif.)
• Culbertson sisters house (Pasadena, Calif.)
• Culbertson, Cordelia A.
• Allen, Elizabeth S.
• Allen, Mrs. Dudley P.
• Prentiss, Francis F.
• Francis F. Prentiss house (Pasadena, Calif.)
The house was purchased by Mrs. Allen, who remarried and became Mrs. Prentiss!
CLiMB Suite of Tools
http://www.columbia.edu/cu/cria/climb/presentations.html
Next Steps – CLiMB Evaluation
Current Developments
• Meeting with experts – October 17th
• Survey with experienced image searchers
Long Term Goal
• Test CLiMB tools and data in an image search platform
CLiMB: Computational Linguistics for Metadata Building
• Image collection
• Associated text
• Target object identification (TOI)
• CLiMB suite of tools
• Evaluation
Thank you!
Any questions?
www.columbia.edu/cu/cria/climb