Not So Different After All
Not So Different After All —
Creating Access To Diverse
Objects in Digital Repositories
Jennifer O’Brien Roper, Gretchen Gueguen, and Susan Schreibman
University of Maryland Libraries
What We Will Cover…
The Building Blocks of Digital
Repositories
The Thomas MacGreevy Archive vs.
The University of Maryland’s Digital
Repository
Future issues of exploration…
Barriers to Digital Library
Integration
Thematic Collections
Object Collections
Indiana University (XPat)
http://www.dlib.indiana.edu/collections/
Packaged Collections
Documenting the American South
http://docsouth.unc.edu
Historic Pittsburgh
http://digital.library.pitt.edu/pittsburgh
Romantic Circles
http://www.rc.umd.edu
After the fact Collections
NINES
http://www.nines.org
Digital Project Building Blocks
Metadata
Vocabulary
Interface Design
Digital Projects: Metadata
Standard
Initiatives to aggregate collections:
Z39.50
Keep different metadata schemes
Keep localized repositories
Pressure on the search interface
Open Archives Initiative Protocol for Metadata
Harvesting (OAI-PMH)
Extract a normalized record
Create a central repository
Pressure on the central repository
Digital Projects: Metadata
Standard
[Diagram: three individual repositories, each holding local metadata, supply metadata to a centralized repository for searching (normalized metadata). The user query goes to the centralized repository, and URLs lead back to the individual repositories.]
Digital Projects: Metadata
Standard
Lessons from metadata harvesting (OAI-PMH)
Normalizing metadata for sharing
Keeping local enhanced records
Centralize functions like searching, but disperse
functions like displaying and storing
Drawbacks
Lack of enforcement for Metadata Standards
Various levels of granularity in data
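The harvesting model on these slides (extract a normalized record, search centrally, display locally) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the repository dictionaries, field names, and example.org URLs are hypothetical, and no real OAI-PMH request is made.

```python
from dataclasses import dataclass


@dataclass
class NormalizedRecord:
    """Minimal shared record held by the central repository."""
    title: str
    subjects: list
    url: str  # points back to the owning repository, which handles display


def normalize(local_record, field_map, base_url):
    """Extract a normalized record; the richer local record stays where it is."""
    title = local_record.get(field_map["title"], "")
    subjects = list(local_record.get(field_map["subjects"], []))
    return NormalizedRecord(title, subjects, f"{base_url}/{local_record['id']}")


def harvest(repositories):
    """Build the central index from every repository's local records."""
    central = []
    for repo in repositories:
        for record in repo["records"]:
            central.append(normalize(record, repo["field_map"], repo["base_url"]))
    return central


def search(central, term):
    """Search the normalized records and return URLs into the local repositories."""
    term = term.lower()
    return [
        r.url
        for r in central
        if term in r.title.lower() or any(term in s.lower() for s in r.subjects)
    ]


# Hypothetical repositories, each with its own local metadata scheme.
REPOSITORIES = [
    {
        "base_url": "https://texts.example.org/item",
        "field_map": {"title": "main_title", "subjects": "keywords"},
        "records": [
            {"id": "t1", "main_title": "The Entombment", "keywords": ["Art"], "extent": "6 kb"},
        ],
    },
    {
        "base_url": "https://images.example.org/object",
        "field_map": {"title": "caption", "subjects": "tags"},
        "records": [
            {"id": "i7", "caption": "Family portrait", "tags": ["Portraits", "Art"]},
        ],
    },
]

central = harvest(REPOSITORIES)
print(search(central, "art"))
```

Local-only fields such as "extent" never reach the central index, which mirrors the "keep local enhanced records" lesson. The sketch also shows the drawback: nothing forces the two repositories to fill "subjects" at the same level of granularity.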
Digital Projects: Vocabularies
The “ideal” language
Pre-coordinated vs. post-coordinated
Hierarchies and relationships are great for
some things, not for others
Digital Projects: Vocabularies
Controlled vs. local
Locally created languages
Fit materials well
Speak the user's language
BUT
Are difficult to enhance
All controlled vocabularies are difficult to
combine
Digital Projects: Interface
Issues
Multiple Hierarchies
Multiple Object Types
Facilitate access to both individual and general
collections
Film, audio, video, image, text in one interface
Multiple Modes of Access
Allow users to browse through objects in a
manner that promotes cross-collection discovery
Thomas MacGreevy Archive
http://www.macgreevy.org
TEI P4-based repository of texts
A few different collections, some
searchable, some not
Changes Desired
Add images to the searchable collection
Add a collection of letters with special
display needs
Thomas MacGreevy Archive :
Metadata Standard
Limited to TEI P4
Letters have irregularities in the body
Images have multiple levels of “being”
Collections haven’t been named before
Thomas MacGreevy Archive :
Metadata Standard
Solution?
Adapt things to fit
<title type="main">The Entombment</title>
<title type="version">An Electronic Version</title>
Fit the title of the original (here, a painting by Poussin) in the main
title and indicate that it is a digital copy in the version title. This is
typical TEI practice.
<respStmt>
<resp>
Markup completed by:
<name>Gretchen Gueguen</name>
</resp>
</respStmt>
<extent TEIform="extent">6 kb</extent>
Keep the responsibility statement about the creation of the TEI
file. Record the creator of the original when you record details
about the original
Thomas MacGreevy Archive :
Metadata Standard
Add only what is necessary
<keywords>
<list type="keyword">
<item type="subject">Art</item>
<item type="nationality">French</item>
<item type="date">1600-1699</item>
</list>
</keywords>
Keywords for cross-searching
<note type="phys-desc-item">
<note type="phys-desc-photo">
<note type="technical-description">
These details are for display and internal record keeping, NOT searching
<figDesc>
The McGreevy Family: Front row from left to right <persName reg="Thomas
MacGreevy">Thomas</persName>, <persName reg="Honora
McGreevy">Nora</persName>… </figDesc>
A caption is searchable and provides many precise points of access.
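As a rough sketch of the split described above (keywords, names, and captions feed the search; notes are kept for display only), the following Python pulls the searchable parts out of a fragment modeled on the slides. The wrapping <record> element and the note text are placeholders invented for the example, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Fragment modeled on the slides; <record> and the note text are placeholders.
TEI_FRAGMENT = """
<record>
  <keywords>
    <list type="keyword">
      <item type="subject">Art</item>
      <item type="nationality">French</item>
      <item type="date">1600-1699</item>
    </list>
  </keywords>
  <note type="technical-description">placeholder technical note</note>
  <figDesc>The McGreevy Family: Front row from left to right
    <persName reg="Thomas MacGreevy">Thomas</persName>,
    <persName reg="Honora McGreevy">Nora</persName>
  </figDesc>
</record>
"""


def searchable_fields(xml_text):
    """Collect only the fields meant for cross-searching; notes are skipped."""
    root = ET.fromstring(xml_text)
    index = {}
    # Keyword items are grouped by their type attribute (subject, date, ...).
    for item in root.iter("item"):
        index.setdefault(item.get("type"), []).append((item.text or "").strip())
    # Regularized names give precise access points regardless of the caption wording.
    index["person"] = [p.get("reg") for p in root.iter("persName") if p.get("reg")]
    # The caption itself is indexed as plain text with whitespace collapsed.
    fig = root.find(".//figDesc")
    if fig is not None:
        index["caption"] = " ".join("".join(fig.itertext()).split())
    return index


fields = searchable_fields(TEI_FRAGMENT)
print(fields["person"])
```

Using the reg attribute rather than the element text means a search for "Honora McGreevy" finds the image even though the caption only says "Nora".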
Thomas MacGreevy Archive :
Metadata Standard
Translate existing codes
<keywords>
<list type="collection">
<item type="images">Image</item>
</list>
</keywords>
The collections were loosely designated before by a combination of
item type and value. We continued to use this convention even
when it was redundant.
Thomas MacGreevy Archive :
Metadata Standard
Only the information that is also available for texts is displayed, such as title, creator, text, and keywords
Thomas MacGreevy Archive:
Vocabulary
Limited vocabulary available
Main descriptors based on Dewey
Other descriptors based on nationality and date
of subject
Some other fields used a consistent
language
Collection designations
Text types (poem, art review, obituary, etc.)
500 texts finished
Solution?
Thomas MacGreevy Archive:
Vocabulary
Can’t create a new vocabulary for the
collection
Can’t create a scalable vocabulary for
everything
So,
Add some new words to list and
retrospectively update
Thomas MacGreevy Archive:
Vocabulary
Existing Terms
Architecture
Art
Biography
Catholicism
Critical Method
Dance
Education
Film
Folklore
Great War
History
Journals
Music
Mythology
Opera
Sport
Theatre
Travel
New Terms
Career & Finances
Domestic Life
Irish Culture
Literature
Politics & Government
Portraits
Social Life
Thomas MacGreevy Archive :
Interface
New collections affect how searching
is done
New object types need to be displayed
different ways
Viewing
Search results
Thomas MacGreevy Archive :
Interface – Browse
Searchable collections
Unsearchable collections
Thomas MacGreevy Archive :
Interface – Search
Unclear approach to subjects
Confusing, often missed, options
Thomas MacGreevy Archive :
Interface – Revised Search
Search in full text as well as by type
Faceted subjects made explicit
Collection, author, and date of objects are searchable nodes
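A minimal sketch of how a revised search of this kind might combine a full-text term with explicit facets (type, collection, author, subject). The sample objects and field names are invented for illustration and do not describe the archive's actual implementation.

```python
from collections import Counter

# Hypothetical sample objects; field names are assumptions for this sketch.
OBJECTS = [
    {"id": "a1", "type": "art review", "collection": "texts",
     "author": "Thomas MacGreevy", "subjects": ["Art"],
     "text": "Poussin and the Entombment"},
    {"id": "p1", "type": "image", "collection": "images",
     "author": "unknown", "subjects": ["Art", "Portraits"],
     "text": "The McGreevy Family"},
    {"id": "l1", "type": "letter", "collection": "letters",
     "author": "Thomas MacGreevy", "subjects": ["Social Life"],
     "text": "Dear Nora"},
]


def search(objects, text="", **facets):
    """Return objects matching the full-text term and every requested facet."""
    hits = []
    for obj in objects:
        if text and text.lower() not in obj["text"].lower():
            continue
        matches_all = True
        for name, wanted in facets.items():
            value = obj.get(name)
            values = value if isinstance(value, list) else [value]
            if wanted not in values:
                matches_all = False
                break
        if matches_all:
            hits.append(obj)
    return hits


def facet_counts(hits, name):
    """Count facet values within a result set, e.g. to sort results by type."""
    counts = Counter()
    for obj in hits:
        value = obj.get(name)
        counts.update(value if isinstance(value, list) else [value])
    return counts


art = search(OBJECTS, subjects="Art")
print([o["id"] for o in art], dict(facet_counts(art, "type")))
```

Because texts, images, and letters share the same facet fields, one query can return all three object types, and the counts by type support grouping in the results list.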
Thomas MacGreevy Archive :
Interface – Results
User determines relevancy by author and title within results
Thomas MacGreevy Archive :
Interface – Revised Results
Sort by document type in results
Show thumbnails and keywords for image objects
Indicate that images are associated with texts
List blurbs for articles and abstracts for letters
Thomas MacGreevy Archive :
Interface – Revised Display
Tabbed interface for comparison
Unique, textual features of letters represented in images
Thomas MacGreevy Archive :
Lessons Learned
Neutral metadata standard
Scalable controlled vocabulary
Starting over is sometimes worth it,
sometimes not
University of Maryland Digital
Repository
FEDORA architecture
Individual collections
Unique look and feel
Customized metadata design
Cross collection search and browse
Common search interface
Rich minimum standards for metadata
UM Digital Repository :
Metadata Standard - Description
Dublin Core
Great for cross-collection description
Too simple for rich description within a
focused collection
VRA Core
Excels at rich description
Created for and focused solely on images
UM Digital Repository :
Metadata Standard - Description
Hybrid standard
University of Maryland Descriptive Metadata
(UMDM)
Customized DTD
Rigorous minimum standard
Common base of granular data
MODS
UM Digital Repository :
Metadata Standard – Local Standard
Required base elements
Coverage Place
Coverage Time
Media Type
Physical description
PID
Relationships
Repository
Rights
Optional base elements
Identifier
Language
Agent
Style
Culture
Description
Subject
Title
UM Digital Repository :
Metadata Standard - METS
Wrapper for all objects
METS record for every object contains:
Header
Descriptive Metadata
Administrative Metadata
File Section
Structural Map
Structural Links
Behavior Section
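To make the list of sections concrete, here is a small Python sketch that builds an empty METS wrapper with one element per section named on the slide. It is a skeleton only and not schema-valid (for example, the required IDs and the structural map's div are omitted); the PID value is a made-up placeholder.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# METS element names corresponding to the sections listed on the slide, in order.
SECTIONS = [
    "metsHdr",      # Header
    "dmdSec",       # Descriptive Metadata
    "amdSec",       # Administrative Metadata
    "fileSec",      # File Section
    "structMap",    # Structural Map
    "structLink",   # Structural Links
    "behaviorSec",  # Behavior Section
]


def mets_skeleton(pid):
    """Return an empty METS wrapper for one object (illustrative, not valid METS)."""
    ET.register_namespace("mets", METS_NS)
    root = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": pid})
    for name in SECTIONS:
        ET.SubElement(root, f"{{{METS_NS}}}{name}")
    return root


wrapper = mets_skeleton("example:0001")  # hypothetical PID
print(ET.tostring(wrapper, encoding="unicode"))
```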
UM Digital Repository :
Metadata Standard - METS
Flexibility to use external descriptive
standards
Behavioral control
Map other standards to UMDM
dynamically
UM Digital Repository :
Metadata Standard - Conversion
Mapping existing data to UMDM
Indicates where information is in the existing
dataset, and intended UMDM location
Transformation notes
Static information to be added
UM Digital Repository :
Metadata Standard - Conversion
UMDM                                      MARC
<title type="main">                       245 ‡a ‡b ‡p
<title type="alternate">                  246
<agent type="contributor"><persName>      700
<agent type="provider"><corpName>         710
<covPlace><geogName type="continent">     static: North America
<covPlace><geogName type="country">       static: United States
<covTime><century era="ad">               static: 1901-2000
<covTime><date era="ad">                  008 character position 11-14
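The mapping above can be read as a small crosswalk: some UMDM elements are filled from MARC fields, some from static values, and one from fixed positions in the 008 field. The Python below is a sketch of that idea only. The dictionary-based MARC record, the path-style keys, and the sample values are all assumptions made for the example; a real conversion would read MARC with a proper library and write UMDM XML.

```python
# Static values added to every converted record, as in the mapping table.
STATIC = {
    "covPlace/geogName[@type='continent']": "North America",
    "covPlace/geogName[@type='country']": "United States",
    "covTime/century[@era='ad']": "1901-2000",
}

# UMDM target -> (MARC tag, subfield codes). None means "all subfields present",
# because the table names only the tag for 246, 700, and 710.
FIELD_MAP = {
    "title[@type='main']": ("245", "abp"),
    "title[@type='alternate']": ("246", None),
    "agent[@type='contributor']/persName": ("700", None),
    "agent[@type='provider']/corpName": ("710", None),
}


def marc_to_umdm(marc):
    """Map a simplified MARC record (tag -> {subfield: value}) to UMDM targets."""
    out = dict(STATIC)
    for target, (tag, codes) in FIELD_MAP.items():
        field = marc.get(tag)
        if not field:
            continue
        wanted = codes if codes is not None else field.keys()
        parts = [field[c] for c in wanted if c in field]
        if parts:
            out[target] = " ".join(parts)
    # 008 is a fixed-length string; positions 11-14 are counted from zero.
    fixed = marc.get("008", "")
    if len(fixed) >= 15:
        out["covTime/date[@era='ad']"] = fixed[11:15]
    return out


# Invented sample record for illustration only.
SAMPLE = {
    "008": "060101r20061955".ljust(40),
    "245": {"a": "Campus views :", "b": "aerial film"},
    "700": {"a": "Doe, Jane"},
}

umdm = marc_to_umdm(SAMPLE)
print(umdm["title[@type='main']"], umdm["covTime/date[@era='ad']"])
```

Keeping the crosswalk as data rather than code makes it easy to record the "transformation notes" alongside each row and to reuse the same function for another source dataset.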
UM Digital Repository :
Metadata Standard - Conversion
Ingestion
As a distinct standard, with dynamic
generation
Batch uploaded from another source
Incrementally built
UM Digital Repository :
Vocabularies
Consistent input is key to cross-searchability
Controlled vocabularies
General descriptive
Names and name authority
Subjects
UM Digital Repository :
Vocabularies – General Descriptive
External vocabularies
Media Type (DCMI Type Vocabulary)
Language (Former ISO 639-2 values)
Local vocabularies
Repository
UM Digital Repository :
Vocabularies – General Descriptive
Terms created as needed
Culture
nationality, ethnic, regional, organizational, etc.
Style
architectural, literary, musical, etc.
UM Digital Repository :
Vocabularies – Name Authority
Existing terms
LC Name Authority File
Getty Thesaurus of Geographic Names
Creating terms
Name Authority Cooperative Program
UM Digital Repository :
Vocabularies – Subject
Collection based
Repository wide
“browse” terms
UM Digital Repository :
Vocabularies – Subject
Subject: Fine Arts
Subject: Architecture
UM Digital Repository :
Vocabularies – Subject
Collection based
Appropriate to project focus and scope
Existing thesauri
Library of Congress Subject Headings
Art & Architecture Thesaurus
Thesaurus for Graphic Materials
Etc.
Local thesauri
UM Digital Repository :
Vocabularies – Subject
“browse” terms
Defined independent of any project
Applied to all objects, regardless of collection
Intentionally general
Only two levels of specificity
Experimented with locally derived list based
on LC Call Number Scheme
UM Digital Repository :
Interface Design
Make clear through general and
collection interfaces that:
Objects are in multiple hierarchies
Users can access multiple object types
Users can use multiple modes of access
and discovery
UM Digital Repository :
Interface Design - General
University of Maryland “theme”
Access to general metadata
Accommodate multiple file types
Simple and advanced searching
UM Digital Repository :
Interface Design - Collections
Collection based theme
Access to customized metadata
Exploit file types specific to collection
Customized search
Extras
Contextualized materials
Exhibits
Documentation
Date limit options
Restricted explicit field search
Embedded video player
Metadata to contextualize video
Amazon.com-inspired feature
Linked subject display
Option to limit or not by format
Drop-down menus to help guide searchers
Wide variety of unique fields to explicitly search
Display of customized metadata
thumbnail
UM Digital Repository :
Interface Design - Browse
Explore collection without searching
“Browse” subjects as initial gateway
Use common metadata elements to drill
down
Lists rebuilt weekly, not dynamically
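One way to picture "rebuilt weekly, not dynamically" is a browse index that is computed once and then served from a cache until it is a week old. The Python sketch below assumes a lazy rebuild on first access after expiry; the slides do not say how the rebuild is triggered (a scheduled job is just as plausible), and the class, field names, and sample data are hypothetical.

```python
import time

WEEK_SECONDS = 7 * 24 * 3600


def build_browse_lists(objects):
    """Group object ids under each of their browse subject terms."""
    lists = {}
    for obj in objects:
        for term in obj.get("browse", []):
            lists.setdefault(term, []).append(obj["id"])
    return {term: sorted(ids) for term, ids in lists.items()}


class BrowseCache:
    """Serve precomputed browse lists; rebuild only when they are a week old."""

    def __init__(self, load_objects, max_age=WEEK_SECONDS, clock=time.time):
        self._load = load_objects   # callable returning the current objects
        self._max_age = max_age
        self._clock = clock         # injectable so the behavior can be tested
        self._lists = None
        self._built_at = None

    def lists(self):
        now = self._clock()
        if self._built_at is None or now - self._built_at >= self._max_age:
            self._lists = build_browse_lists(self._load())
            self._built_at = now
        return self._lists


# Usage with a made-up object store.
store = [{"id": "o1", "browse": ["Fine Arts"]}]
cache = BrowseCache(lambda: list(store))
print(cache.lists())
```

The trade-off is that newly ingested objects do not appear in the browse lists until the next rebuild, in exchange for browse pages that never query the whole repository on each request.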
UM Digital Repository :
Future Issues
Difficulties managing multiple
hierarchies
Workflows for “general” collections
with no curator
Balancing infrastructure and individual
project development
Digital archiving questions
unanswered
Not So Different After All —
Creating Access To Diverse Objects
in Digital Repositories
http://www.lib.umd.edu/dcr/publications/lita.ppt
Jennifer O’Brien Roper
Gretchen Gueguen
Susan Schreibman