
Using Metadata in CONTENTdm
Diana Brooking and Allen Maberry
Metadata Implementation Group, Univ. of Washington
Crossing Organizational Boundaries
Oct. 29, 2002
Outline
• The metadata “environment”: factors that
influence basic decisions
• Structure of metadata: Dublin Core, field
structure in CONTENTdm
• Content standards: what goes into the fields,
formatting, controlled vocabularies
• The data dictionary: bringing it all together
Metadata: what is it?
• Data about data
– “Metadata are data that describe the attributes of a
resource; characterize its relationships; support its
discovery, management, and effective use; and exist in
an electronic environment.” (Sherry Vellucci, LRTS 44 (1), 1999)
• Commonly known as cataloging
Metadata: how is it used?
• For description: information for display
with the image
• For searching: users search for images by
searching for text attached to the image
Basic Decisions: Description
• How much information do you have?
• How much information do your users
need/want?
– What is depicted in the image?
– Who created it?
– Why is it important? Why did you select it?
• How much detail do you need to go into?
Basic Decisions: Searching
• How will users find the images? What will they be
looking for? What aspects are they interested in?
• How will you find the images? What are your
staff’s needs?
• At what level do you need to distinguish images
from one another?
• At what level do you need to bring like resources
together?
Decision Factors
• Size of file
– 50 images (small enough to browse)
– 10,000 images (need for more precise
searching)
– 10,000 images of many different things vs.
10,000 images of trains
Decision Factors
• Audience
– General public vs. specialists (e.g., railroad
enthusiasts)
• Institutional mission
– Say you are a railroad museum (audience
expectations)
Decision Factors
• Legacy data
– Starting from scratch
– Years of good cataloging
– Years of inconsistent cataloging
• Software issues
– What kind of data can the system handle?
– What are its search capabilities?
– Short-term vs. long-term view
Basic Dublin Core Metadata
• What is the Dublin Core Metadata Element
Set (DCMES)?
• Why was it developed, and how has it been
developed?
• A short history of the DC Initiative is
available at
http://www.dublincore.org/about/overview/
Dublin Core Metadata Element Set
• There are 15 basic elements
• See Dublin Core Element Set, Version 1.1 Reference Description
• But it is adaptable and expandable to fit the
needs of different users through the use of
“application profiles”
Dublin Core and CONTENTdm
• CONTENTdm is designed around the
Dublin Core
• (Very) basic overview of how
CONTENTdm works
– CONTENTdm uses DC element names as file
names
– Because each database has constant file names
it is easy to combine them to search either one
or more collections
Dublin Core mapping
• An example:
– Collection A has a field “Photographer”
mapped to DC:Creator, and Collection B has a
field “Artist” mapped to DC:Creator. Searching
across both databases searches the
CONTENTdm index “Creat*” and retrieves
data from the index for both “Photographers”
and “Artists” for collections A + B or A+B+n…
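The mapping in this example can be sketched in a few lines of Python. This is an illustration only, not how CONTENTdm is implemented: the `FIELD_MAP` and `RECORDS` structures, the `search_dc` function, and the names in the records are all made up for the sketch.

```python
# Hypothetical sketch: local field labels mapped to Dublin Core elements.
# None of these structures come from CONTENTdm itself.
FIELD_MAP = {
    "A": {"Photographer": "Creator"},
    "B": {"Artist": "Creator"},
}

# Made-up records, keyed by each collection's own field labels.
RECORDS = {
    "A": [{"Photographer": "Smith, Jane"}],
    "B": [{"Artist": "Smith, John"}, {"Artist": "Doe, Alex"}],
}


def search_dc(dc_element, term, collections):
    """Return (collection, local label, value) for every field mapped to
    dc_element whose value contains term (case-insensitive)."""
    term = term.lower()
    hits = []
    for name in collections:
        for record in RECORDS[name]:
            for label, value in record.items():
                mapped = FIELD_MAP[name].get(label)
                if mapped == dc_element and term in value.lower():
                    hits.append((name, label, value))
    return hits


# One search on "Creator" reaches both "Photographer" and "Artist".
hits = search_dc("Creator", "smith", ["A", "B"])
```

The search names only the shared element, never the local labels, which is why adding a further collection needs nothing more than another entry in the mapping.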
Dublin Core and searching
• What are the practical consequences of this?
– In cross database searching, one can search on
specific fields. However, the names of these
fields will not be Photographer or Artist, but
“Creator” because that is the common name of
the index in each collection.
– However, you can do a keyword search on all
“searchable” fields in the database whether they
are mapped to a Dublin Core field or not.
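A rough sketch of the keyword-search behavior just described, under the assumption that each field carries a "searchable" flag that is independent of its Dublin Core mapping. The `FIELDS` table, the sample record, and `keyword_match` are hypothetical.

```python
# Hypothetical field settings for one collection: an optional Dublin Core
# mapping plus a "searchable" flag for each local field.
FIELDS = {
    "Photographer": {"dc": "Creator", "searchable": True},
    "Notes": {"dc": None, "searchable": True},  # unmapped but searchable
    "Negative number": {"dc": None, "searchable": False},
}

# Made-up record for illustration.
record = {
    "Photographer": "Smith, Jane",
    "Notes": "Locomotive at the depot",
    "Negative number": "locomotive-0042",
}


def keyword_match(record, term):
    """Return the labels of searchable fields whose text contains term."""
    term = term.lower()
    return [
        label
        for label, value in record.items()
        if FIELDS[label]["searchable"] and term in value.lower()
    ]


matched = keyword_match(record, "locomotive")
```

Here the unmapped "Notes" field is still found by keyword, while the non-searchable "Negative number" field is ignored even though it contains the term.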
Modern Book Arts field labels
– bibliographic description = descr0
– text production = descr1
– image production = descr2, etc.
Cross-database search index
– Description = descr*
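The way one index name such as descr* can cover several local fields can be illustrated with simple wildcard matching. Whether CONTENTdm matches names this way internally is an assumption; the "title" entry and the `labels_for_index` function are hypothetical additions for contrast.

```python
from fnmatch import fnmatchcase

# Field labels and nicknames from the Modern Book Arts example, plus one
# hypothetical non-description field for contrast.
NICKNAMES = {
    "bibliographic description": "descr0",
    "text production": "descr1",
    "image production": "descr2",
    "title": "title",  # hypothetical
}


def labels_for_index(pattern):
    """Return the local labels whose nickname matches the wildcard pattern."""
    return [
        label
        for label, nickname in NICKNAMES.items()
        if fnmatchcase(nickname, pattern)
    ]


matched = labels_for_index("descr*")
```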
Dublin Core tips
– It is important to make sure that you are careful
about what information you put in searchable
fields, even if they are not mapped to a DC
element.
– If you have multiple collections it is very
important to make sure that the same type of
data is mapped to the same DC elements
consistently
Content Standards
• Used for choosing and formatting the data that
goes into the fields.
• Increase coherence and intelligibility of
description
• Enhance reliability of retrieval
• Enable compatibility with other collections (cross-database searching)
• Make maintenance and possible migration of data
to other software easier
Standards = Consistency
• “Date” field: dates should always be formatted the
same way
• “Photographer” field: same person’s name should
always appear in the same form
• “Subject” field: same topic should have the same
term used to describe it across images
• If different terms or formats are used, the user may
not even realize that more than one search is
necessary
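As a small illustration of the “Date” point, the sketch below rewrites several date spellings into one form. The choice of YYYY-MM-DD as the target and the list of input formats are assumptions for the example, not a recommendation from this presentation; a real project would list the formats found in its own data.

```python
from datetime import datetime

# Input formats assumed for illustration only. Month names are parsed as
# English, which is Python's default locale behavior.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%b. %d, %Y"]


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # try the next format
        return parsed.strftime("%Y-%m-%d")
    return None


samples = ["Oct. 29, 2002", "10/29/2002", "October 29, 2002", "circa 1900"]
normalized = [normalize_date(s) for s in samples]
```

Values that return None (such as "circa 1900") are the ones that need a cataloger's decision rather than an automatic rewrite.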
Examples of Content Standards
For description:
• Anglo-American Cataloguing Rules, 2nd ed.,
2002 revision (libraries)
• Graphic Materials: Rules for Describing
Original Items and Historical Collections,
1982; revisions available electronically
(libraries, also museums, historical
societies, LC Prints & Photo., CORBIS)
Content Standards: Controlled
Vocabularies
“Any subset of the lexicon of a natural
language. A list of preferred and
nonpreferred terms produced by the process
of vocabulary control. Types of controlled
vocabularies include subject heading lists
and thesauri.” (NISO)
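The idea of preferred and nonpreferred terms can be sketched as a lookup table. The tiny vocabulary below is invented for illustration and does not reproduce any published list; it assumes a project that has chosen “Church buildings” as its preferred form. The `control_term` function is hypothetical.

```python
# Hypothetical mini-vocabulary for illustration only.
PREFERRED = {"Church buildings", "Locomotives"}

# Nonpreferred terms (lowercased) pointing to their preferred terms.
USE_FOR = {
    "churches": "Church buildings",
    "locomotive engines": "Locomotives",
}


def control_term(term):
    """Return the preferred form of term; raise KeyError if it is unknown."""
    key = term.strip().lower()
    for preferred in PREFERRED:
        if key == preferred.lower():
            return preferred
    if key in USE_FOR:
        return USE_FOR[key]
    raise KeyError(f"{term!r} is not in the controlled vocabulary")


term = control_term("Churches")
```

Raising an error for unknown terms, rather than accepting them silently, is one way to keep new headings from entering the data without review.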
Controlled vocabs for which
fields?
• When you need consistency across images,
user searches to find all …
– Proper names for things (people, places, etc.)
– Subjects depicted in the images
• Not necessary when you have…
– Fields that contain data more likely to be
unique to the particular image (title, notes,
other free text fields)
Remember…
You can have fields that don’t use controlled
vocabularies, but where you still need
consistency in format:
– Dates
– Image numbers
– Physical description
• You could create your own controlled vocab
lists (if you really had to)
Controlled Vocabularies
For names:
• Library of Congress/National Authority File:
http://authorities.loc.gov
• Union List of Artist Names (Getty):
http://www.getty.edu/research/tools/vocabulary/ulan
• USGS Geographic Names Information System:
http://geonames.usgs.gov/gnishome.html
Controlled Vocabularies
For subjects:
• Library of Congress Subject Headings:
http://authorities.loc.gov
• LC Thesaurus for Graphic Materials:
http://www.loc.gov/rr/print/tgm1
• Art & Architecture Thesaurus (Getty):
http://www.getty.edu/research/tools/vocabulary/aat
• Chenhall’s Nomenclature (The Revised
Nomenclature for Museum Cataloging. Walnut
Creek: Altamira Press, 1995)
Vocabulary conflicts?
• DC Subject: LCSH vs. AAT
– Church buildings vs. Churches
• DC Coverage: LC vs. Board on Geographic
Names
– Moscow vs. Moskva
• Challenge of meeting needs of diverse
collections and users, while maintaining
consistency within and between databases
Data Dictionaries
For each project a data dictionary documents:
• Database-specific field labels
• Mapping of fields to DC elements
• Data formatting instructions for each field
• Recommended controlled vocabularies
• UW data dictionaries:
http://www.lib.washington.edu/msd/mig/datadicts/default.html
• MOHAI
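The items a data dictionary documents (field labels, DC mappings, formatting instructions, recommended vocabularies) could also be kept in machine-readable form so that records can be checked against them. The sketch below is hypothetical throughout: the field labels, the date pattern, and the vocabulary names are examples, not taken from any UW data dictionary.

```python
import re

# Hypothetical data dictionary for one project.
DATA_DICTIONARY = {
    "Photographer": {"dc": "Creator", "format": None,
                     "vocabulary": "name authority file"},
    "Date": {"dc": "Date", "format": r"\d{4}-\d{2}-\d{2}",
             "vocabulary": None},
    "Subject": {"dc": "Subject", "format": None,
                "vocabulary": "subject thesaurus"},
}


def check_record(record):
    """Return a list of problems found when checking record against the
    data dictionary (unknown fields and badly formatted values)."""
    problems = []
    for label, value in record.items():
        entry = DATA_DICTIONARY.get(label)
        if entry is None:
            problems.append(f"{label}: field not in data dictionary")
        elif entry["format"] and not re.fullmatch(entry["format"], value):
            problems.append(f"{label}: value {value!r} does not match required format")
    return problems


problems = check_record(
    {"Photographer": "Smith, Jane", "Date": "Oct. 29, 2002", "Caption": "Depot"}
)
```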