Not So Different After All


Not So Different After All —
Creating Access To Diverse
Objects in Digital Repositories
Jennifer O’Brien Roper, Gretchen Gueguen, and Susan Schreibman
University of Maryland Libraries
What We Will Cover…



The Building Blocks of Digital
Repositories
The Thomas MacGreevy Archive vs.
The University of Maryland’s Digital
Repository
Future issues of exploration…
Barriers to Digital Library
Integration

Thematic Collections



Object Collections


Indiana University (XPat)
http://www.dlib.indiana.edu/collections/
Packaged Collections


Documenting the American South
http://docsouth.unc.edu
Historic Pittsburgh
http://digital.library.pitt.edu/pittsburgh
Romantic Circles
http://www.rc.umd.edu
After-the-fact Collections

NINES
http://www.nines.org
Digital Project Building Blocks



Metadata
Vocabulary
Interface Design
Digital Projects: Metadata
Standard
Initiatives to aggregate collections:
Z39.50




Keep different metadata schemes
Keep localized repositories
Pressure on the search interface
Open Archives Initiative Protocol for Metadata
Harvesting (OAI-PMH, abbreviated MHP here)



Extract a normalized record
Create a central repository
Pressure on the central repository
Digital Projects: Metadata
Standard
[Diagram: a user query goes to a centralized repository for searching (normalized metadata); three individual repositories (local metadata) each supply metadata to it and are linked back by URL.]
Digital Projects: Metadata
Standard

Lessons from MHP




Normalizing metadata for sharing
Keeping local enhanced records
Centralize functions like searching, but disperse
functions like displaying and storing
Drawbacks


Lack of enforcement for Metadata Standards
Various levels of granularity in data
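A minimal sketch of the harvesting model above, under stated assumptions: the repository names, field names, and URLs are invented for illustration and do not come from the protocol or from any real collection. Each repository keeps its rich local record, only a normalized record is shared, and the central index returns URLs so that display and storage stay local.

```python
# Hypothetical sketch: local repositories keep rich records; the central
# index holds only normalized fields and a URL back to the local copy.

LOCAL_RECORDS = [
    {"repo": "archive-a", "url": "http://example.org/a/1",
     "local": {"main_title": "The Entombment", "maker": "Poussin", "notes": "6 kb"}},
    {"repo": "archive-b", "url": "http://example.org/b/7",
     "local": {"caption": "Entombment study", "artist": "Unknown"}},
]

# Per-repository mapping from local field names to the shared schema.
FIELD_MAPS = {
    "archive-a": {"main_title": "title", "maker": "creator"},
    "archive-b": {"caption": "title", "artist": "creator"},
}


def normalize(record):
    """Return the normalized record that would be shared with the central index."""
    mapping = FIELD_MAPS[record["repo"]]
    shared = {mapping[k]: v for k, v in record["local"].items() if k in mapping}
    shared["url"] = record["url"]  # display and storage stay local
    return shared


def build_central_index(records):
    """Harvest every repository once and keep only normalized records."""
    return [normalize(r) for r in records]


def search(index, term):
    """Search only the normalized titles; return URLs of the local records."""
    term = term.lower()
    return [r["url"] for r in index if term in r.get("title", "").lower()]


central = build_central_index(LOCAL_RECORDS)
print(search(central, "entombment"))
```

Fields with no mapping (here, "notes") never reach the central index, which illustrates both the lesson (local records stay enhanced) and the drawback (granularity is lost in the shared record).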
Digital Projects: Vocabularies

The “ideal” language

Pre-coordinated vs. post-coordinated

Hierarchies and relationships are great for
some things, not for others
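The difference between pre-coordinated and post-coordinated vocabularies can be sketched as follows. The headings, terms, and object ids are invented for illustration; the point is only where the combination of concepts happens.

```python
# Hypothetical records illustrating the two indexing styles.

# Pre-coordinated: concepts are combined into one heading at indexing time.
PRE_COORDINATED = {
    "obj1": ["Art, French -- 17th century"],
    "obj2": ["Art, Irish -- 20th century"],
}

# Post-coordinated: single-concept terms, combined only at search time.
POST_COORDINATED = {
    "obj1": {"Art", "French", "1600-1699"},
    "obj2": {"Art", "Irish", "1900-1999"},
}


def search_pre(heading):
    """Match a whole heading exactly, as in a browse list of headings."""
    return sorted(k for k, hs in PRE_COORDINATED.items() if heading in hs)


def search_post(*terms):
    """Return objects carrying every requested term (Boolean AND)."""
    wanted = set(terms)
    return sorted(k for k, ts in POST_COORDINATED.items() if wanted <= ts)


print(search_pre("Art, French -- 17th century"))
print(search_post("Art"))
print(search_post("Art", "French"))
```

In this sketch the pre-coordinated heading carries its own context but can only be matched as a unit, while the post-coordinated terms can be recombined freely at the cost of losing the relationships between them.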
Digital Projects: Vocabularies

Controlled vs. local

Locally created languages


Fit materials well
Speak the users language
BUT


Are difficult to enhance
All controlled vocabularies are difficult to
combine
Digital Projects: Interface
Issues

Multiple Hierarchies


Multiple Object Types


Facilitate access to both individual and general
collections
Film, audio, video, image, text in one interface
Multiple Modes of Access

Allow users to browse through objects in a
manner that promotes cross-collection discovery
Thomas MacGreevy Archive




http://www.macgreevy.org
TEI P4-based repository of texts
A few different collections, some
searchable, some not
Changes Desired


Add images to the searchable collection
Add a collection of letters with special
display needs
Thomas MacGreevy Archive :
Metadata Standard

Limited to TEI P4



Letters have irregularities in the body
Images have multiple levels of “being”
Collections haven’t been named before
Thomas MacGreevy Archive :
Metadata Standard
Solution?

Adapt things to fit
<title type="main">The Entombment</title>
<title type="version">An Electronic Version</title>

Fit the title of the original (here, a painting by Poussin) in the main
title and indicate that it is a digital copy in the version title. This is
typical TEI practice.
<respStmt>
<resp>
Markup completed by:
<name>Gretchen Gueguen</name>
</resp>
</respStmt>
<extent TEIform="extent">6 kb</extent>

Keep the responsibility statement about the creation of the TEI
file. Record the creator of the original when you record details
about the original
Thomas MacGreevy Archive :
Metadata Standard
Add only what is necessary
<keywords>
<list type="keyword">
<item type="subject">Art</item>
<item type="nationality">French</item>
<item type="date">1600-1699</item>
</list>
</keywords>


Keywords for cross-searching
<note type="phys-desc-item">
<note type="phys-desc-photo">
<note type="technical-description">
These details are for display and internal record keeping, NOT searching.
<figDesc>
The McGreevy Family: Front row from left to right <persName reg="Thomas
MacGreevy">Thomas</persName>, <persName reg="Honora
McGreevy">Nora</persName>… </figDesc>

A caption is searchable and provides many precise points of access.
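As a rough illustration of how these additions become access points, the sketch below pulls the typed keyword items and the regularized names out of a fragment modelled on the examples above. The wrapping `<record>` element and the function name are assumptions added so the fragment parses on its own; they are not part of the archive's encoding.

```python
import xml.etree.ElementTree as ET

# Fragment modelled on the slide's examples; the wrapping <record> element
# is an assumption added so the fragment parses as one document.
SAMPLE = """
<record>
  <keywords>
    <list type="keyword">
      <item type="subject">Art</item>
      <item type="nationality">French</item>
      <item type="date">1600-1699</item>
    </list>
  </keywords>
  <figDesc>The McGreevy Family: Front row from left to right
    <persName reg="Thomas MacGreevy">Thomas</persName>,
    <persName reg="Honora McGreevy">Nora</persName>
  </figDesc>
</record>
"""


def search_terms(xml_text):
    """Return (keywords grouped by type, regularized names from the caption)."""
    root = ET.fromstring(xml_text)
    keywords = {}
    for item in root.iter("item"):
        keywords.setdefault(item.get("type"), []).append(item.text)
    # The reg attribute gives a consistent form of each name for searching.
    names = [p.get("reg") for p in root.iter("persName")]
    return keywords, names


keywords, names = search_terms(SAMPLE)
print(keywords)
print(names)
```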
Thomas MacGreevy Archive :
Metadata Standard
Translate existing codes
<keywords>
<list type="collection">
<item type="images">Image</item>
</list>
</keywords>


The collections were loosely designated before by a combination of
item type and value. We continued to use this convention even
when it was redundant.
Thomas MacGreevy Archive :
Metadata Standard
Only the information that is also available for texts is displayed,
such as title, creator, text, and keywords
Thomas MacGreevy Archive:
Vocabulary

Limited vocabulary available



Main descriptors based on Dewey
Other descriptors based on nationality and date
of subject
Some other fields used a consistent
language


Collection designations
Text types (poem, art review, obituary, etc.)
500 texts finished
Solution?

Thomas MacGreevy Archive:
Vocabulary
Can’t create a new vocabulary for the
collection
Can’t create a scalable vocabulary for
everything
So,
Add some new words to the list and
retrospectively update

Thomas MacGreevy Archive:
Vocabulary
Existing Terms
Architecture
Art
Biography
Catholicism
Critical Method
Dance
Education
Film
Folklore
Great War
History
Journals
Music
Mythology
Opera
Sport
Theatre
Travel
New Terms
Career & Finances
Domestic Life
Irish Culture
Literature
Politics & Government
Portraits
Social Life
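One way a retrospective update could be checked is to validate every record's subject terms against the extended list. This is a sketch only: the term lists are taken from the slide, but the record ids, their assignments, and the function name are invented for illustration.

```python
EXISTING_TERMS = {
    "Architecture", "Art", "Biography", "Catholicism", "Critical Method",
    "Dance", "Education", "Film", "Folklore", "Great War", "History",
    "Journals", "Music", "Mythology", "Opera", "Sport", "Theatre", "Travel",
}
NEW_TERMS = {
    "Career & Finances", "Domestic Life", "Irish Culture", "Literature",
    "Politics & Government", "Portraits", "Social Life",
}
VOCABULARY = EXISTING_TERMS | NEW_TERMS


def unknown_subjects(records):
    """Map record id -> subject terms that are not in the controlled list."""
    problems = {}
    for rec_id, subjects in records.items():
        bad = sorted(set(subjects) - VOCABULARY)
        if bad:
            problems[rec_id] = bad
    return problems


# Hypothetical records; ids and assignments are invented for illustration.
records = {
    "letter-001": ["Domestic Life", "Travel"],
    "image-014": ["Portraits", "Family"],
}
print(unknown_subjects(records))
```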
Thomas MacGreevy Archive :
Interface


New collections affect how searching
is done
New object types need to be displayed
in different ways


Viewing
Search results
Thomas MacGreevy Archive :
Interface – Browse
Searchable
collections
Unsearchable
collections
Thomas MacGreevy Archive :
Interface – Search
Unclear
approach to
subjects
Confusing, often
missed, options
Thomas MacGreevy Archive :
Interface – Revised Search
Search in full-text as
well as by type
Faceted
subjects made
explicit
Collection, author, and date of
objects are searchable nodes
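The revised search described in these callouts combines a full-text term with explicit facets. A minimal sketch, assuming an invented in-memory data set and hypothetical field names (this is not the archive's actual search code):

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    collection: str
    author: str
    year: int
    doc_type: str
    text: str


# Invented sample data, for illustration only.
ITEMS = [
    Item("Letter one", "Letters", "Author A", 1930, "letter",
         "notes on Poussin and the galleries"),
    Item("Review one", "Texts", "Author A", 1960, "art review",
         "a study of Poussin"),
    Item("Photograph one", "Images", "Author B", 1910, "image",
         "front row from left to right"),
]


def search(items, text=None, **facets):
    """Filter by an optional full-text term and any number of exact-match facets."""
    hits = []
    for item in items:
        if text and text.lower() not in item.text.lower():
            continue
        if all(getattr(item, name) == value for name, value in facets.items()):
            hits.append(item.title)
    return hits


print(search(ITEMS, text="poussin"))
print(search(ITEMS, text="poussin", doc_type="letter"))
print(search(ITEMS, collection="Images"))
```

Treating collection, author, date, and type as separate facets keeps each choice explicit to the user instead of hiding it behind a single subject box.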
Thomas MacGreevy Archive :
Interface – Results
User determines
relevancy by
author and title
within results
Thomas MacGreevy Archive :
Interface – Revised Results
Sort by document
type in results
Show thumbnails
and keywords for
image objects
Indicate that images are
associated with texts
List blurbs for articles
and abstracts for letters
Thomas MacGreevy Archive :
Interface – Revised Display
Tabbed interface
for comparison
Unique, textual features
of letters represented in
images
Thomas MacGreevy Archive :
Lessons Learned



Neutral metadata standard
Scalable controlled vocabulary
Starting over is sometimes worth it,
sometimes not
University of Maryland Digital
Repository


FEDORA architecture
Individual collections



Unique look and feel
Customized metadata design
Cross collection search and browse


Common search interface
Rich minimum standards for metadata
UM Digital Repository :
Metadata Standard - Description

Dublin Core



Great for cross-collection description
Too simple for rich description within a
focused collection
VRA Core


Excels at rich description
Created for and focused solely on images
UM Digital Repository :
Metadata Standard - Description

Hybrid standard





University of Maryland Descriptive Metadata
(UMDM)
Customized DTD
Rigorous minimum standard
Common base of granular data
MODS
UM Digital Repository :
Metadata Standard – Local Standard

Required base elements
Coverage Place
Coverage Time
Media Type
Physical description
PID
Relationships
Repository
Rights
Optional base elements
Identifier
Language
Agent
Style
Culture
Description
Subject
Title
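A rigorous minimum standard implies a check at ingest time. The sketch below groups the elements as they appear on the slide; the snake_case keys, the sample record, and the function name are assumptions for illustration and are not taken from the UMDM DTD.

```python
# Element names follow the slide; the snake_case keys are an assumption.
REQUIRED = [
    "coverage_place", "coverage_time", "media_type", "physical_description",
    "pid", "relationships", "repository", "rights",
]
OPTIONAL = [
    "identifier", "language", "agent", "style", "culture", "description",
    "subject", "title",
]


def validate(record):
    """Return (missing required elements, elements outside the base set)."""
    missing = [name for name in REQUIRED if not record.get(name)]
    unknown = sorted(set(record) - set(REQUIRED) - set(OPTIONAL))
    return missing, unknown


# Hypothetical, deliberately incomplete record.
record = {
    "pid": "example:1",
    "media_type": "Image",
    "repository": "Example Repository",
    "rights": "Example rights statement",
    "title": "Example title",
    "color": "sepia",
}
missing, unknown = validate(record)
print(missing)
print(unknown)
```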
UM Digital Repository :
Metadata Standard - METS


Wrapper for all objects
METS record for every object contains:







Header
Descriptive Metadata
Administrative Metadata
File Section
Structural Map
Structural Links
Behavior Section
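The sections listed above correspond to the top-level elements of a METS document. The sketch below builds an empty skeleton with those elements in the order given; it is not a valid METS record (the sections are empty and carry no required attributes), and the object id and function name are invented.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# Section order as listed on the slide, using METS element names.
SECTIONS = ["metsHdr", "dmdSec", "amdSec", "fileSec",
            "structMap", "structLink", "behaviorSec"]


def mets_skeleton(object_id):
    """Build an empty wrapper with one child element per METS section."""
    root = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})
    for name in SECTIONS:
        ET.SubElement(root, f"{{{METS_NS}}}{name}")
    return root


doc = mets_skeleton("example:1")
print(ET.tostring(doc, encoding="unicode"))
```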
UM Digital Repository :
Metadata Standard - METS

Flexibility to use external descriptive
standards

Behavioral control

Map other standards to UMDM
dynamically
UM Digital Repository :
Metadata Standard - Conversion

Mapping existing data to UMDM



Indicates where information is in the existing
dataset, and intended UMDM location
Transformation notes
Static information to be added
UM Digital Repository :
Metadata Standard - Conversion
UMDM                                        MARC
<title type="main">                         245 ‡a ‡b ‡p
<title type="alternate">                    246
<agent type="contributor"><persName>        700
<agent type="provider"><corpName>           710
<covPlace><geogName type="continent">       static: North America
<covPlace><geogName type="country">         static: United States
<covTime><century era="ad">                 static: 1901-2000
<covTime><date era="ad">                    008 character positions 11-14
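A mapping like this can be applied mechanically. The sketch below is illustrative only: the MARC record is a made-up dictionary with subfields already joined, the UMDM targets are written as path-like strings for readability, and "positions 11-14" is read as zero-based inclusive offsets, which is an assumption.

```python
# Hypothetical in-memory MARC record: tag -> value (subfields already joined).
marc = {
    "245": "Campus views : photographs",
    "246": "Views of campus",
    "700": "Doe, Jane",
    "710": "Example University Libraries",
    "008": "000000i19201925mdu",
}

# (UMDM path, source) pairs transcribed from the mapping table.
# A source is a MARC tag, a ("static", value) pair, or a (tag, start, end) slice.
CROSSWALK = [
    ('title[@type="main"]', "245"),
    ('title[@type="alternate"]', "246"),
    ('agent[@type="contributor"]/persName', "700"),
    ('agent[@type="provider"]/corpName', "710"),
    ('covPlace/geogName[@type="continent"]', ("static", "North America")),
    ('covPlace/geogName[@type="country"]', ("static", "United States")),
    ('covTime/century[@era="ad"]', ("static", "1901-2000")),
    ('covTime/date[@era="ad"]', ("008", 11, 15)),
]


def convert(marc_record):
    """Apply the crosswalk and return a flat dict of UMDM path -> value."""
    out = {}
    for path, source in CROSSWALK:
        if isinstance(source, str):
            value = marc_record.get(source)
        elif source[0] == "static":
            value = source[1]  # static information added to every record
        else:
            tag, start, end = source
            value = marc_record.get(tag, "")[start:end] or None
        if value:
            out[path] = value
    return out


umdm = convert(marc)
for path, value in umdm.items():
    print(path, "=", value)
```

Keeping the crosswalk as data rather than code mirrors the mapping document itself: source location, target location, and static additions sit side by side and can be reviewed by cataloguers.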
UM Digital Repository :
Metadata Standard - Conversion

Ingestion



As a distinct standard, with dynamic
generation
Batch uploaded from another source
Incrementally built
UM Digital Repository :
Vocabularies


Consistent input is key to cross-searchability
Controlled vocabularies



General descriptive
Names and name authority
Subjects
UM Digital Repository :
Vocabularies – General Descriptive

External vocabularies



Media Type (DCMI Type Vocabulary)
Language (Former ISO 639-2 values)
Local vocabularies

Repository
UM Digital Repository :
Vocabularies – General Descriptive

Terms created as needed

Culture


nationality, ethnic, regional, organizational, etc.
Style

architectural, literary, musical, etc.
UM Digital Repository :
Vocabularies – Name Authority

Existing terms



LC Name Authority File
Getty Thesaurus of Geographic Names
Creating terms

Name Authority Cooperative Program
UM Digital Repository :
Vocabularies – Subject

Collection based

Repository wide

“browse” terms
UM Digital Repository :
Vocabularies – Subject
Subject: Fine Arts
Subject: Architecture
UM Digital Repository :
Vocabularies – Subject

Collection based


Appropriate to project focus and scope
Existing thesauri





Library of Congress Subject Headings
Art & Architecture Thesaurus
Thesaurus for Graphic Materials
Etc.
Local thesauri
UM Digital Repository :
Vocabularies – Subject

“browse” terms





Defined independent of any project
Applied to all objects, regardless of collection
Intentionally general
Only two levels of specificity
Experimented with locally derived list based
on LC Call Number Scheme
UM Digital Repository :
Interface Design

Make clear through general and
collection interfaces that:



Objects are in multiple hierarchies
Users can access multiple object types
Users can use multiple modes of access
and discovery
UM Digital Repository :
Interface Design - General




University of Maryland “theme”
Access to general metadata
Accommodate multiple file types
Simple and advanced searching
UM Digital Repository :
Interface Design - Collections





Collection based theme
Access to customized metadata
Exploit file types specific to collection
Customized search
Extras



Contextualized materials
Exhibits
Documentation
Date limit options
Restricted explicit field search
Embedded
video player
Metadata to
contextualize video
Amazon.com inspired
feature
Linked subject display
Option to limit or not by format
Drop-down menus to help guide searchers
Wide variety of unique fields to explicitly search
Display of
customized
metadata
thumbnail
UM Digital Repository :
Interface Design - Browse




Explore collection without searching
“Browse” subjects as initial gateway
Use common metadata elements to drill
down
Lists rebuilt weekly, not dynamically
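Rebuilding browse lists on a schedule rather than per request can be sketched as a single batch pass over the objects. The objects, PIDs, and the two-level reading of the "browse" terms below are assumptions made for illustration.

```python
from collections import defaultdict

# Invented objects; "browse" holds a (broad, narrower) two-level browse term.
OBJECTS = [
    {"pid": "example:1", "browse": ("Fine Arts", "Architecture"), "media_type": "Image"},
    {"pid": "example:2", "browse": ("Fine Arts", "Architecture"), "media_type": "Text"},
    {"pid": "example:3", "browse": ("Fine Arts", "Music"), "media_type": "Sound"},
]


def build_browse_lists(objects):
    """Precompute broad term -> narrower term -> PIDs in one batch pass."""
    lists = defaultdict(lambda: defaultdict(list))
    for obj in objects:
        broad, narrow = obj["browse"]
        lists[broad][narrow].append(obj["pid"])
    # Freeze into plain dicts so the stored lists do not grow on lookup.
    return {b: dict(n) for b, n in lists.items()}


BROWSE = build_browse_lists(OBJECTS)  # would be rerun by a weekly job
print(BROWSE["Fine Arts"]["Architecture"])
```

The trade-off in this design is freshness for speed: a newly ingested object does not appear in the browse lists until the next rebuild.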
UM Digital Repository :
Future Issues




Difficulties managing multiple
hierarchies
Workflows for “general” collections
with no curator
Balancing infrastructure and individual
project development
Digital archiving questions
unanswered
Not So Different After All —
Creating Access To Diverse Objects
in Digital Repositories
http://www.lib.umd.edu/dcr/publications/lita.ppt
Jennifer O’Brien Roper
Gretchen Gueguen
Susan Schreibman