Transcript Powerpoint
CLiMB: Computational Linguistics for Metadata Building
Center for Research on Information Access
Columbia University Libraries
CLiMB - Columbia University

CLiMB: Interdisciplinary Research Project at Columbia University
Funded by Mellon Foundation 2002-2004
• Center for Research on Information Access (CRIA)
• Libraries
• Computer Science Department

Problems in Image Access
Cataloging digital images
Traditional approach:
• manual expertise
• labor intensive
• expensive
Can automated techniques help?

Can we harvest image descriptors?

CLiMB Technical Contribution
CLiMB will identify and extract
• proper nouns
• terms and phrases
from text related to an image:
September 14, 1908, the basis of the Greenes' final design had been worked out. It featured a radically informal, V-shaped plan (that maintained the original angled porch) and interior volumes of various heights, all under a constantly changing roofline that echoed the rise and fall of the mountains behind it. The chimneys and foundation would be constructed of the sandstone boulders that comprised the local geology, and the exterior of the house would be sheathed in stained split-redwood shakes.
Edward R. Bosley. Greene & Greene. London : Phaidon, 2000. p. 127

Overall Goals
• Research: Development of richer retrieval through increased numbers of descriptors
• Practice: Development of suite of CLiMB tools
• Resources: Vocabulary list which can be used by other visual resource professionals
The essence of CLiMB:
• Use scholars themselves as “catalogers” by utilizing scholarly publications
• Enhance existing descriptive metadata

CLiMB Project Teams
Coordinating
Collections (Curatorial)
Technical
External Advisory

CLiMB Committees
Coordinating: Judith Klavans, Stephen Davis, Angela Giral, Patricia Renfro, Bob Wolven
Curatorial: Judith Klavans, Stephen Davis, Angela Giral, Amy Heinrich, David Magier, Bob Scott, Bob Wolven, Roberta Blitz
Technical: Stephen Davis, Judith Klavans, Vera Horvath, David Elson, Roberta Blitz

Squeezing Metadata out of Scholarly Texts
• Image collection
• Associated text
• Target object identification (TOI)
• CLiMB suite of tools
• Evaluation

CLiMB Processes [diagram]
Phase I: process texts
Phase II: select metadata from texts
Phase III: use CLiMB metadata in image search platform
Diagram labels: Inputs; Source Text; TOIs; AAT / BBIs / etc.; Image Collections; User Evaluation; Generate TEI Markup; Run CLiMB Suite of Tools; Other Texts; Result: Enriched XML Test Records; Select words & phrases to include in Core Descriptive Records; Core Descriptive Records; CLiMB Enriched Descriptive Records; Art Librarians; Subject Specialists; Catalogers; Search & Retrieval Experts; Image Search Platform; Image Search Platform with CLiMB Metadata; end-users

Squeezing Metadata out of Scholarly Texts
• Image collection
• Associated text
• Target object identification (TOI)
• CLiMB suite of tools
• Evaluation

CLiMB Collections
• Greene & Greene Architectural Drawings, Avery Architectural and Fine Arts Library
• Chinese Paper Gods, C.V. Starr East Asian Library
• Photographs from the Archives, American Institute of Indian Studies

Greene & Greene Architectural Records and Papers Collection
Drawings and Archives
Avery Architectural and Fine Arts Library
Columbia University Libraries

Charles Sumner Greene (1868-1957)
Henry Mather Greene (1870-1954)

NYDA.1960.001.00023
All Saints Episcopal Church (Pasadena, Calif.). Alterations 1902-1903

Greene & Greene Catalog Record
Author: Greene & Greene.
Title: [Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.). Alterations.] Residence of Mrs. Dudley P. Allen, 1188 Hillcrest Ave., Pasadena, Cal. [graphic] : Alteration / Greene & Greene, Architects.
Published: [1917]
Physical Details: 4 sheets : various media ; 87.8 x 57.3 cm. (34 5/8 x 22 5/8 in.)
Location: Columbia University, Avery Architectural Drawings
Other Authors: Greene, Charles Sumner, 1868-1957. Greene, Henry Mather, 1870-1954.
Subjects: Houses Alterations Architecture--Designs and plans--United States. Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.)
Component Item: [1] Item no. NYDA.1960.001.03224. [AVERYimage]. Electric lighting -floor plan, part plan of basement : Sheet no.
Component Item: [2] Item no. NYDA.1960.001.00073. [AVERYimage]. [Electric lighting] -floor plan, part plan of basement.

Greene & Greene Bibliography
• Bosley, Edward R. Greene & Greene. London : Phaidon, 2000.
• Current, William R. Greene & Greene: architects in the residential style. Fort Worth [Tex.] : Amon Carter Museum of Western Art, [1974]
• Makinson, Randell L. Greene & Greene: architecture as fine art. Salt Lake City : Peregrine Smith, c1977.
• Makinson, Randell L. Greene & Greene: the passion and the legacy. Salt Lake City : Gibbs and Smith, c1998.
• Smith, Bruce. Greene & Greene masterworks. San Francisco : Chronicle Books, c1998.
• Strand, Janann. A Greene & Greene guide [Pasadena, Calif. : G. Dahlstrom, 1974]

Chinese Paper Gods
Anne S. Goodrich Collection
C.V. Starr East Asian Library, Columbia University

Pan-hu chih-shen
God of tigers

Chinese Paper Gods Catalog Record
Title: Chuang gong chuang mu [graphic].
Published: [193-]
Physical Details: 1 print : wood-engraving, color ; 34 x 30 cm.
In: Anne S. Goodrich Collection.
Location: Columbia University, C.V. Starr East Asian Library (CJK) EAX GAC 1 no. 16
Subjects: Gods, Chinese, in art. Folk art--China.
Genre Or Form: Woodcuts--Chinese.
Notes: Date according to time period Anne S. Goodrich collected prints in Beijing.
Record ID: NYCP02-F20

Chinese Paper Gods Bibliography
• Day, Clarence Burton. Chinese peasant cults : being a study of Chinese paper gods. Taipei : Ch'eng Wen Pub. Co., 1974.
• Goodrich, Anne Swann. Peking paper gods : a look at home worship. Nettetal : Steyler Verlag, 1991.
• Laing, Ellen Johnston. Art and aesthetics in Chinese popular prints: selections from the Muban Foundation collection. Ann Arbor, MI : Center for Chinese Studies, University of Michigan, c2002

Chinese gods: selection from LC Authority File
HEADING: Nezha (Chinese deity)
Used For/See From:
Daluoxian (Chinese deity)
Jinhuan Yuanshuai (Chinese deity)
Jinkang Yuanshuai (Chinese deity)
Li Nezha (Chinese deity)
Luoche Taizi (Chinese deity)
Ne Zha (Chinese deity)
Nezhataizi (Chinese deity)
No-cha (Chinese deity)
Nuozha (Chinese deity)
Tailuoxian (Chinese deity)
Taizi Yuanshuai (Chinese deity)
Taiziyeh (Chinese deity)
Yühuang Taizi (Chinese deity)
Zhongtan Yuanshuai (Chinese deity)
Search Also Under: Gods, Chinese

Three Testbed Collections
• Greene & Greene
  – detailed records
  – more difficult to associate text with image
• Chinese Paper Gods
  – strong associations
  – problems with transliteration and variants
• South Asian Temples
  – large set of digital images
  – diacritics and variants

CLiMB Collections: Future
• Additional collection of digital images
• Close association between image and text
• Regularized metadata
Suggestions:
• Catalogue raisonné
• Museum collection catalog
• Exhibition catalog

Squeezing Metadata out of Scholarly Texts
• Image collection
• Associated text
• Target object identification (TOI)
• CLiMB suite of tools
• Evaluation

Target Object Identification (TOI)
• Define based on institutional needs
• Varies from collection to collection
  – Greene & Greene: Project
  – Chinese Paper Gods: Deity
  – South Asian Temples: Location & Temple
• Compile authority list

Project Name Matching
• Locate project names in Greene & Greene
• Challenge: finding variant name forms
  – Robert R. Blacker house (TOI)
  – Blacker estate
  – The house
• Possible techniques to improve matching
  – Developing a semi-automatic technique
  – Use existing information to label text
  – An iterative platform for manual intervention

Squeezing Metadata out of Scholarly Texts
• Image collection
• Associated text
• Target object identification (TOI)
• CLiMB suite of tools
• Evaluation

CLiMB Suite of Tools
http://www.columbia.edu/cu/cria/climb/presentations.html

Squeezing Metadata out of Scholarly Texts
• Image collection
• Associated text
• Target object identification (TOI)
• CLiMB suite of tools
• Evaluation

Next Steps – CLiMB Evaluation
Current Developments
• Meeting with experts – October 17th
• Survey with experienced image searchers
Long Term Goal
• Test CLiMB tools and data in an image search platform

CLiMB: Computational Linguistics for Metadata Building
• Image collection
• Associated text
• Target object identification (TOI)
• CLiMB suite of tools
• Evaluation

Thank you! Any questions?
www.columbia.edu/cu/cria/climb
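
Illustration: the Project Name Matching and LC Authority File slides both describe mapping variant name forms in a text back to one preferred heading (the TOI). The following is a minimal sketch of that idea, not the CLiMB implementation. The names are taken from the slides, while the `AUTHORITY` dictionary layout and the functions `normalize` and `find_tois` are hypothetical. A bare form such as "The house" is deliberately left out, since resolving it needs surrounding context, which is presumably where the manual intervention mentioned on the slide would come in.

```python
import re
import unicodedata

# Illustrative authority list: preferred heading -> variant forms.
# Names come from the slides; this data layout is an assumption.
AUTHORITY = {
    "Robert R. Blacker house": ["Blacker estate"],
    "Nezha (Chinese deity)": ["Ne Zha", "No-cha", "Nuozha", "Li Nezha", "Yühuang Taizi"],
}


def normalize(text):
    """Lowercase and strip diacritics so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def find_tois(text, authority=AUTHORITY):
    """Return (heading, matched form) pairs in order of appearance."""
    normalized = normalize(text)
    hits = []
    for heading, variants in authority.items():
        # Drop a trailing qualifier such as "(Chinese deity)" before matching.
        bare_heading = re.sub(r"\s*\(.*?\)$", "", heading)
        forms = [bare_heading] + variants
        # Try longer forms first so "Li Nezha" is preferred over "Nezha".
        forms.sort(key=len, reverse=True)
        alternation = "|".join(re.escape(normalize(form)) for form in forms)
        # Lookarounds keep matches on whole words only.
        pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)")
        for match in pattern.finditer(normalized):
            hits.append((match.start(), heading, match.group(0)))
    hits.sort()
    return [(heading, form) for _, heading, form in hits]


if __name__ == "__main__":
    sample = "Sheets for the Blacker estate precede the Robert R. Blacker house set."
    for heading, form in find_tois(sample):
        print(f"{form!r} -> {heading}")
```

Matching on a normalized copy of the text is one simple way to absorb the diacritic and capitalization variants noted for the testbed collections; transliteration variants still have to be listed explicitly in the authority list.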