IFLA/DELOS/NSF Workshop Standards and Metadata


IFLA/DELOS/NSF Workshop
Standards and Metadata
EVA 2000 Moscow
November 2, 2000
Thomas Baker, GMD
Carl Lagoze, Cornell Univ.
Introductions
• Thomas Baker
– GMD Library, Bonn, Germany
– Dublin Core Executive Committee
– EU DELOS Network of Excellence
• Carl Lagoze
– Digital Library Research Group, Faculty of
Computing and Information, Cornell University,
Ithaca, NY, USA
– Dublin Core Advisory Committee
– NSF Digital Library Initiative
Workshop Roadmap
• Introduction to Metadata (30 min.)
• Dublin Core Metadata Initiative (60 min.)
Break
• Simplicity and Complexity (45 min.)
• Metadata Infrastructure (45 min.)
Lunch
• Deploying and Using Metadata (90 min.)
• Metadata Landscape (30 min.)
EVA 2000 Moscow
Introduction to Metadata
Haven’t we done metadata
already?
What’s wrong with this
model?
• Expensive
– Complex (even for its original goal?)
– Professional intervention (assumes single community of
expertise)
• Monolithic
– One size fits all approach
– Reflects its centralized system origins
• Bias towards physical artifacts
– Fixed resources
– Incomplete handling of resource evolution and other
resource relationships
Internet Commons includes
Multiple Communities
[Diagram: the Internet Commons at the center, surrounded by its communities: Home Pages, Scientific Data, Commerce, Geo, Library, Museums, Whatever...]
Web Challenge to Traditional
Cataloging
• Scale
• Permanence
• Authenticity
• Organizational Context
• Variety
State of the Web as an
Information System
• Search systems are motivated by advertising
• Index coverage is unpredictable and limited (1/3)
• Too much recall, too little precision
• Index spam abounds
• Resources (and their names) are volatile
• What about versions, editions, back issues?
• Archiving is presently unsolved
• Authority and quality of service are spotty
• Managing Intellectual Property Rights is hard
Metadata: Part of a Solution
• Structured data about data
– helps to impose order on chaos
– enables automated discovery/manipulation
• Variety across several dimensions:
– specialization
– decentralization
– democratization
Metadata Takes Many Forms
• resource discovery
• document administration
• rights management
• content rating
• security and authentication
• archival status
• products and services
• database schemas
• process control or description
Metadata Challenges
• Accommodate multiple varieties of
metadata
• Tension: functionality and simplicity
• Tension: extensibility and
interoperability
• Human and machine creation and use
• Community-specific functionality,
creation, administration, access
Warwick Framework:
Containing Chaos
• Conceptual Architecture for metadata from
the Warwick Metadata Workshop (DC-2)
• Conceptual architecture to support the
specification, collection, encoding, and
exchange of modular metadata
• Provide context for metadata efforts
(including Dublin Core)
– avoids the “black-hole” of comprehensive element
sets
– focuses interoperability issues at package level
Modularization Allows
Distributed Management
• Communities of expertise (not software
vendors) are responsible for:
– Semantics
– Registration
– Administration
– Access management
– Authority of data
– Sharing and Distribution
Interoperability requires conventions about:
• Semantics
– The meaning of the
elements
• Structure
– human-readable
– machine-parseable
• Syntax
– grammars to convey
semantics and structure
EVA 2000 Moscow
Dublin Core Metadata
Initiative
History of the Dublin Core
• 1994: "Do we have a simple set of tags for
ordinary people to describe their Web pages?"
• 1995: The Dublin Core: 13 elements, later 15
• 1996: The Dublin Core is but one of many
vocabularies needed ("Warwick Framework")
• 1997: "WF needs formal expression in a
Resource Description Framework (RDF)"
• 2000: Dublin Core Metadata Initiative
recommends qualifiers, broadens its
organizational scope beyond the Core
A pidgin for digital tourists
• Metadata is language.
• Dublin Core is a small and simple language -- a
pidgin -- for finding resources across domains.
• Speakers of different languages naturally
"pidginize" to communicate
– E.g., tourists using simple phrases to order beer ("zwei
Bier bitte" "dva pivo" "biru o san bai"...)
• We are all "tourists" on the global Internet.
A grammar of Dublin Core
• http://www.dlib.org/dlib/october00/baker/10baker.html
• By design not as subtle as mother tongues,
but easy to learn and extremely useful in
practice
• Pidgins: small vocabularies (Dublin Core:
fifteen special nouns and lots of optional
adjectives)
• Simple grammars: sentences (statements)
follow a simple fixed pattern...
Example Dublin Core
statements
• Resource has Title 'Grammar of Dublin Core'.
• Resource has Creator 'Tom Baker'.
• Resource has Subject 'Metadata'.
• Resource has Relation http://foo.org/file.htm.
[Diagram: the statement pattern "Resource has property X". "Resource" is the implied subject and "has" the implied verb; the property is one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date...); X is the property value (an appropriate literal); qualifiers (adjectives) are optional.]
The fifteen special nouns
(properties)
Creator
Title
Subject
Contributor
Date
Description
Publisher
Type
Format
Coverage
Rights
Relation
Source
Language
Identifier
Resource has Subject "Languages -- Grammar"
Resource has Date "2000-06-13"
Dumb-Down Principle for
qualifiers
• The fifteen elements should be usable
and understandable with or without the
qualifiers
• Like saying that nouns can stand on their
own without adjectives
• If your software encounters an
unfamiliar qualifier, look it up -- or just
ignore it!
To test whether qualifiers are "good",
cover them with your hand and ask:
-- Does the statement still make sense?
-- Is it still correct?
Resource has Subject "Languages -- Grammar"
Resource has Date "2000-06-13"
Element Refinements
• Make the meaning of an element
narrower or more specific.
– a Date Created versus a Date
Modified
– an IsReplacedBy Relation versus a
Replaces Relation
• If your software does not understand
the qualifier, you can safely ignore it.
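The dumb-down principle and the "safely ignore it" rule for refinements can be illustrated with a minimal Python sketch. The names Statement, KNOWN_QUALIFIERS, and dumb_down are hypothetical, invented for this illustration; they are not part of any Dublin Core specification or library.

```python
from typing import NamedTuple, Optional

# The fifteen Dublin Core elements listed earlier in the talk.
DC_ELEMENTS = {
    "Creator", "Title", "Subject", "Contributor", "Date",
    "Description", "Publisher", "Type", "Format", "Coverage",
    "Rights", "Relation", "Source", "Language", "Identifier",
}

# Assumption: the element refinements this particular application understands.
KNOWN_QUALIFIERS = {"Created", "Modified"}


class Statement(NamedTuple):
    element: str                      # one of the fifteen "nouns"
    value: str                        # the property value
    qualifier: Optional[str] = None   # optional "adjective"


def dumb_down(stmt: Statement) -> Statement:
    """Drop a qualifier the software does not know; keep element and value."""
    if stmt.element not in DC_ELEMENTS:
        raise ValueError(f"not a Dublin Core element: {stmt.element}")
    if stmt.qualifier is not None and stmt.qualifier not in KNOWN_QUALIFIERS:
        return Statement(stmt.element, stmt.value)
    return stmt


known = dumb_down(Statement("Date", "2000-06-13", "Created"))
unknown = dumb_down(Statement("Relation", "http://foo.org/file.htm", "IsReplacedBy"))
print(known)
print(unknown)
```

The unfamiliar IsReplacedBy refinement is dropped, yet the statement that remains (Resource has Relation ...) still makes sense and is still correct, which is the test the slides propose for a "good" qualifier.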
Value Encoding Schemes
• Say that the value is
– a term from a controlled vocabulary
(e.g., Library of Congress Subject
Headings)
– a string formatted in a standard way
(e.g., "2000-05-03" means May 3, not
March 5)
• Even if a scheme is not known by
software, the value should be
"appropriate" and usable for resource
discovery.
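As a rough illustration of a value encoding scheme, the sketch below reads a year-month-day string such as "2000-05-03" and falls back to treating the value as a plain string when the format is not recognized. The function name parse_w3cdtf_day is hypothetical, and only the full YYYY-MM-DD form is handled here.

```python
import re
from datetime import date
from typing import Optional

# Year-month-day form, e.g. "2000-05-03" (year first, then month, then day).
_YMD = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")


def parse_w3cdtf_day(value: str) -> Optional[date]:
    """Return a date for a YYYY-MM-DD string, or None if it does not match."""
    match = _YMD.match(value)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


parsed = parse_w3cdtf_day("2000-05-03")
print(parsed.month, parsed.day)

# A value in an unrecognized format is not rejected: the caller can still
# index the raw string for resource discovery.
fallback = parse_w3cdtf_day("spring 2000")
print(fallback)
```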
Peer review of proposals
for new terms
• DCMI Usage Committee reviews proposals for new
qualifiers (and perhaps elements)
• Evaluates proposals in light of grammatical
principles (are the qualifiers ignorable?)
• Tiered model of approval status (tentative):
proposed, conforming, recommended, obsolete
• First qualifiers "recommended" in July 2000
• http://purl.org/DC/documents/rec/dcmes-qualifiers-20000711.htm
Open questions in Dublin Core
• What are "appropriate values" for the
fifteen properties? How can they be used
for cross-domain searching?
• How can DCMI control the evolution of
Dublin Core as it is adapted in practice?
• How can an application use DC as a pidgin
while describing resources with more
complex metadata?
• Can we keep the Core simple?
Search buckets versus
description
• Think of DC elements as fuzzy search buckets
– Different types of data appropriate for different
buckets: URLs, date strings, word strings, names
– Separate books about Sigmund Freud versus books
by Sigmund Freud into different buckets
• Search bucket: for discovering resources
• But general, fuzzy categories may not be
sufficient for describing resources
– After searching, display more detailed descriptions
on screen
DCMI broadens its mission
(Oct 2000)
• The mission of the DCMI is to make it
easier to find resources using the Internet
through the following activities:
– Developing metadata standards for discovery
across domains (example: the Dublin Core)
– Defining frameworks for the interoperation of
metadata sets
– Facilitating the development of community or
disciplinary specific metadata sets that are
consistent with items 1 and 2
A context for the Core
• If "the Dublin Core" is the core of
DCMI, what is the surrounding context?
• If "the Dublin Core" is the simple pidgin,
what is the broader landscape of
metadata language?
• How do pidgins relate to more complex
models or "application profiles"?
• Do we need pidgins for describing other
things, such as "people" and "events"?
Using DC with other vocabularies
• Specialized application profiles [government
information, education, mathematics] may need to:
– Use general-purpose Dublin Core
elements
– Use elements from another, more
domain-specific standard
– Narrow standard definitions of DC
elements for specific local uses
– Invent local elements outside the
scope of existing standards
Example: adapting DC:Title
to local uses
• As defined in the official Dublin Core
"namespace":
– "Title: A name given to the resource"
• As defined in a UK "application profile":
– "Title: A name given to the collection"
• Definition is narrower
Namespaces in translation
• Dublin Core has been translated into 26
languages
– machine-readable tokens are shared
by all
– human-readable labels are defined in
different languages
– translations are distributed,
maintained in many countries
One token - labels in many
languages
[Diagram: the single token dc:creator is linked by rdfs:label to a label in each language: "Creator" (DCMI server), "Verfasser" (server in Germany), "Pencipta" (server in Jakarta).]
RDF -- a more powerful
sentence pattern
• Dublin Core statements:
– Resource has Creator "Tom Baker".
– Resource has Identifier http://foo.org/bar.html.
• Resource Description Framework "triples": a more powerful way to say the same thing:
– http://foo.org/bar.html has Creator "Tom Baker".
DCMI Re-organization
• Expanded mission
– Core metadata elements for Agents (or Events)?
– Frameworks for integrating multiple standards
• Re-organization model
– Membership organization like W3C or Unicode
Consortium?
– Retain open consensus model
– International perspective
– Better training, documentation, outreach
DCMI Open Metadata
Registry
• Managing vocabularies defined by the DCMI
– Languages
– Versioning
– Controlled vocabularies
• Foundation for modular, incremental
integration and evolution
• Collaboration with European SCHEMAS Project
and ULIS in Tsukuba, Japan
• http://wip.dublincore.org/registry/
Official recognition of the
Dublin Core
• CEN Workshop Agreement
– endorses Dublin Core elements as CWA13874
– provides usage guidelines for European industry
• NISO Z39.85
– National Information Standards Organization,
an ANSI affiliate
– Balloting concluded in August 2000
DCMI Activities
• Standards development and maintenance
• Metadata registry
• Technical working groups and periodic
workshops
• Tutorial materials and user guides
• Education and training
• Access to software
• Liaisons with other standards or user
communities
DC-9 Workshop in Tokyo,
2001
• DC-8 Workshop was at the National Library of
Canada (Ottawa)
– emphasis on application profiles, longer-term
organizational mission, and domain-specific
adaptations of Dublin Core
• DC-9 in Tokyo: well-defined tracks
– implementation reports and research papers
– ongoing technical working group meetings
– general introduction and tutorials for nonexperts
EVA 2000 Moscow
Simplicity and Complexity
Warwick Framework
• Container/Package approach to metadata
• Rejection of universal ontology
• Recognition of individual community needs
• Provide scope for metadata efforts
Warwick Framework Design
[Diagram: a Container holding four Packages: a Dublin Core package, a MARC metadata package, an indirect-reference package (a URI), and a terms-and-conditions package. Caption: Containers for aggregating Packages of typed metadata sets.]
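The container/package idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the package type strings, and the example.org URI are hypothetical and do not come from the Warwick Framework documents.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class MetadataPackage:
    """A typed metadata set carried inside a container."""
    package_type: str            # e.g. "dublin-core", "marc"
    data: Dict[str, str]


@dataclass
class IndirectPackage:
    """A package that only points (by URI) at metadata stored elsewhere."""
    package_type: str
    uri: str


Package = Union[MetadataPackage, IndirectPackage]


@dataclass
class Container:
    """Aggregates packages without interpreting their contents."""
    packages: List[Package] = field(default_factory=list)

    def add(self, package: Package) -> None:
        self.packages.append(package)

    def of_type(self, package_type: str) -> List[Package]:
        return [p for p in self.packages if p.package_type == package_type]


container = Container()
container.add(MetadataPackage("dublin-core", {"Title": "Grammar of Dublin Core"}))
container.add(MetadataPackage("marc", {"245": "Grammar of Dublin Core"}))
# Hypothetical URI, used only to show an indirect reference.
container.add(IndirectPackage("terms-and-conditions", "http://example.org/terms"))

print([p.package_type for p in container.packages])
```

The container never looks inside a package, so each community can manage its own package type independently, which is the point of modularization.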
Warwick Framework
Implementation and Research
• Packaging, linking, storing, and transmitting
component/package framework
• Semantic interactions and interoperability
among multiple metadata
packages/vocabularies
Interoperability among
Metadata Vocabularies
[Diagram: the vocabularies Dublin Core, MARC, IMS, and INDECS, together with the "abc core classes".]
Harmony Project
• Project Investigators
– Dan Brickley - ILRT, Bristol (U.K.)
– Jane Hunter - DSTC, Brisbane (Australia)
– Carl Lagoze - Computer Science, Cornell (U.S.)
• More Information
– http://www.ilrt.bris.ac.uk/discovery/harmony/
Attribute/Value approaches
to metadata…
The playwright of Hamlet was Shakespeare
[Diagram: the sentence labeled as subject "Hamlet", implied verb "has a", metadata noun "creator", literal "Shakespeare", and a metadata adjective; as a graph, a single resource R1 carries the values "Hamlet" and "Shakespeare".]
…run into problems for richer
descriptions…
The playwright of Hamlet was Shakespeare,
who was born in Stratford
[Diagram: both facts labeled as "Hamlet has a creator", one with the value "Shakespeare" and the other with "Stratford"; as a graph, the single resource R1 carries both "Shakespeare" and "Stratford".]
…because of their failure to
model entity distinctions
[Diagram: with distinct entities, R1 ("Hamlet") is linked by creator to R2, which carries "Shakespeare" and "Stratford".]
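The difference between the flat attribute/value form and a model with distinct entities can be sketched as follows. The property names title, name, and birthplace are hypothetical, chosen only for this illustration; R1 and R2 follow the labels in the diagrams.

```python
# Flat attribute/value description: every value hangs off the one resource,
# so "Stratford" looks like a property of Hamlet rather than of its creator.
flat = {
    "title": "Hamlet",
    "creator": "Shakespeare",
    "birthplace": "Stratford",   # whose birthplace? the flat form cannot say
}

# Entity-aware description: R1 (the play) and R2 (the person) are distinct
# resources, and "creator" is a link from one to the other.
resources = {
    "R1": {"title": "Hamlet", "creator": "R2"},
    "R2": {"name": "Shakespeare", "birthplace": "Stratford"},
}


def birthplace_of_creator(resource_id: str) -> str:
    """Follow the creator link, then read the birthplace of that entity."""
    creator_id = resources[resource_id]["creator"]
    return resources[creator_id]["birthplace"]


print(birthplace_of_creator("R1"))
```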
Applying a Model-Centric
Approach
• Formally define common entities and
relationships underlying multiple metadata
vocabularies
• Describe them (and their interrelationships) in a simple logical model
• Provide the framework for extending these
common semantics to domain and
application-specific metadata vocabularies.
Applications of the ABC
Model
• Guidance for communities developing
vocabularies
• Foundation for understanding existing
vocabularies
• Basis for mappings among vocabularies
using formalisms such as RDF
Harmony/ABC Workshop
• January 27-28 2000 CNI Washington
• Representatives from
– Dublin Core, INDECS, MPEG-7, IFLA
– Archives, Museums, Libraries, Audiovisual
• Result: Importance of processes, events,
and states in understanding and describing
resources
Conceptual Basis:
Evolution of Content over
Time
IFLA Entity Model
From Bearman et al., D-Lib Magazine, January 1999.
Events help metadata
relationships?
• Recognizing inherent lifecycle aspects of digital
content - transformation of “input” resources to
“output” resources and of their descriptions. (e.g.,
IFLA model)
• Modeling implied events as first-class objects
provides attachment points for common entities –
e.g., agents, contexts (times & places), roles.
• Clarifying attachment points facilitates mapping
across common entities in different vocabularies.
Content, Events, &
Descriptions
[Diagram: a sequence of events E1 to E4 and resources R1 to R4, with descriptions desc1 and desc2.]
ABC Event Model
A Simple Example:
Live At Lincoln Performance
• Performance at The Lincoln Center for the
Performing Arts
• On April 7, 1998 at 8pm Eastern time
• Orchestra is New York Philharmonic
• Musical score – “Concerto for Violin”
• 130 minute MP3 audio recording
• Rights held by Lincoln Center
Example in ABC Model
Derivation of Multiple Views
[Diagram: an ABC description in XML, from which multiple views are derived: Dublin Core in XML/RDF, ID3 tags embedded in MP3, an MPEG-7 description in DDL, and the CIDOC CRM model.]
Step 1 – Structural Mapping
[Diagram: an event-aware model mapped to a resource-centric model.]
Structural Mapping Rules
Event attributes transferred to output:
• Context/Date, /Time, /Place ->
Date.Performance, Time.Performance,
Place.Performance
• Act/Role -> Agent.Role e.g. Orchestra
• Event Type -> Relation between input & output
e.g. Performance ->Relation.isPerformanceOf
• Output Description generated from event
Type and input Title e.g. “Performance of
Concerto for Violin”
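The mapping rules above can be sketched as a small Python function that flattens an event-aware description into resource-centric fields. The dictionary layout of the input (event, context, acts, input) is an assumption made for this illustration and is not the ABC model's actual serialization; the values come from the Live At Lincoln example.

```python
from typing import Dict

# Hypothetical event-aware description of the performance.
event_description = {
    "event": {
        "type": "Performance",
        "context": {
            "date": "1998-04-07",
            "time": "8pm Eastern",
            "place": "The Lincoln Center for the Performing Arts",
        },
        "acts": [{"agent": "New York Philharmonic", "role": "Orchestra"}],
    },
    "input": {"title": "Concerto for Violin"},
}


def to_resource_centric(desc: dict) -> Dict[str, str]:
    """Transfer event attributes onto the output resource."""
    event = desc["event"]
    event_type = event["type"]
    title = desc["input"]["title"]
    out: Dict[str, str] = {}
    # Context/Date, /Time, /Place -> Date.Performance, Time.Performance, ...
    for key, value in event["context"].items():
        out[f"{key.capitalize()}.{event_type}"] = value
    # Act/Role -> Agent.Role
    for act in event["acts"]:
        out[f"Agent.{act['role']}"] = act["agent"]
    # Event Type -> Relation between input and output
    out[f"Relation.is{event_type}Of"] = title
    # Output Description generated from event Type and input Title
    out["Description"] = f"{event_type} of {title}"
    return out


flat_record = to_resource_centric(event_description)
for name, value in flat_record.items():
    print(name, "=", value)
```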
Step 2 – Semantic Mapping
XSLT for Transformations
• Works well for structural and syntactic
mapping between metadata descriptions
• Semantic mappings need to be hardcoded
• Unsuitable for loosely constrained or variable
input
A More General Solution
• Flexible semantic mappings require
additional knowledge:
– Metadata Term Ontology – MetaNet
• Methods for using that context knowledge
for mapping
– Some combination of procedural language (Java)
and XSLT
– Investigating more general mapping rule
language (analogies to compiler technology)
Planned Experimental
Context
• CIMI Experiments
– Dublin Core for basic resource descriptions
– Richer descriptions derived from ABC model
– Mapping among descriptions
– Understanding relationship between ABC and
CIDOC CRM
• Connecting with Recordkeeping Metadata
Issue - SPIRT Project
EVA 2000 Moscow
Metadata Infrastructure
Metadata is language
• Metadata schemas are languages for
making statements about resources:
– Book has Title "Gone with the Wind".
– Web page has Publisher "Springer Verlag".
• Vocabulary terms (elements) are
defined in standards like Dublin Core
• Metadata grammars constrain the
statements and data models one can
form
But languages evolve with use
• Inevitably, languages resist stability
• People stretch official definitions
• Implementers misunderstand the
intended meaning or use of elements
• Implementers coin local terms and
extensions
• If the application does not fit the
standard, the standard is often
"customized" to fit the application
Metadata languages are
"multilingual"
• Metadata is not a spoken language
• The words of metadata -- "elements" -- are
symbols that stand for concepts
expressible in multiple natural languages
• Standards may have dozens of translations
• Are concepts like "title", "author", or
"subject" used the same way in English,
Finnish, and Korean?
What metadata languages
lack
• Comprehensive dictionaries
– Where can one get an overview of vocabulary
terms used in metadata languages?
• A publication context for implementers
– Where can you see how they are using
metadata?
• Standard grammars
– How do we understand the principles of
metadata?
Can we manage this evolution?
• How can we (scalably) monitor the usage
of a language that is:
– Never spoken?
– Rarely published in a way that can be
harvested?
• How can dictionary editors help a
metadata language evolve and grow in
response to usage?
• How can this evolution occur across
(human) languages?
RDF Schemas (RDFS) -- W3C
standard
• A dictionary format for metadata terms:
– Simple XML format for terms and definitions
• Example: "Title" (Dublin Core)
– Human-readable label and definition:
• Title: A name given to the resource.
– Unique, machine-readable identifiers
• dc:title
• Support for cross-references
– between terms in related standards
– between local adaptations and related standards
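What such a dictionary entry holds can be sketched in Python: a machine-readable identifier, human-readable labels, a definition, and a cross-reference to a broader term. This is not RDFS itself; the Term class, the language codes, and the marcrel:editor identifier are assumptions made for this illustration. The labels and the Editor/Contributor relation are taken from the slides.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Term:
    identifier: str                 # unique, machine-readable token
    labels: Dict[str, str]          # human-readable label per language
    definition: str = ""
    refines: Optional[str] = None   # cross-reference to a broader term


registry: Dict[str, Term] = {}


def register(term: Term) -> None:
    registry[term.identifier] = term


register(Term("dc:title", {"en": "Title"}, "A name given to the resource."))
register(Term("dc:creator", {"en": "Creator", "de": "Verfasser", "id": "Pencipta"}))
register(Term("dc:contributor", {"en": "Contributor"}))
# Hypothetical identifier for the MARC relator term "Editor".
register(Term("marcrel:editor", {"en": "Editor"}, refines="dc:contributor"))


def broadest(identifier: str) -> str:
    """Follow 'refines' cross-references up to the most general term."""
    term = registry[identifier]
    while term.refines is not None:
        term = registry[term.refines]
    return term.identifier


print(broadest("marcrel:editor"))
print(registry["dc:creator"].labels["de"])
```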
Print world versus the Web
• Traditional print world
– Standards are currently defined and published as
paper documents or Web pages in HTML
– Metadata implementors rarely publish their local
extensions and adaptations
• RDF Schemas (RDFS)
– Web-based publication format
– Explicit cross references from implementation
schemas and the standards on which they are based
EOR -- an RDF Schema
Browser
• Harvests RDF Schemas
– Schemas distributed on multiple Web servers
– Creates huge database of schemas for searching
– Web interface functions as a "metadata browser"
– Click on cross-references between linked terms
• Downloadable as open source software
– http://eor.dublincore.org/index.html
– Authors: Eric Miller (OCLC, RDF Working Group, DCMI) and
Tod Matola
Hyperlink Metadata Terms over
the Web
• Index of metadata terms searchable as one
huge database
• Click on cross-references to follow term-to-term links between vocabularies
• Point-to-point, like the Web itself
– In 1992, Gopher located the right file within
directory trees (but not points within the file)
– HTML enabled point-to-point links between
documents
"Editor" -- a MARC relator -- refines "Contributor"
Follow the link to MARC Relator Terms
...the source of which looks like this:
...or to Contributor
[here, in English, French, German]
Or view the schema of MyRDF itself...
...itself an RDF schema like the others
Registries can function as
dictionaries
• Historically, dictionaries of English,
French, etc: recorded variants,
prescribed forms, and helped
standardize (national) languages
• Metadata dictionaries can help metadata
vocabularies evolve more like other
human languages
– Not just top-down, like traditional
standards
– Also bottom-up, in response to usage
Dictionaries prescribe and
describe
• Prescribe definitions and recommend
usage
• Describe how terms are actually used
– Monitor usage through collecting examples
• Editors and usage boards must strike a
balance between prescription and
description.
SCHEMAS Project -- a Thin
Registry
• http://www.schemas-forum.org, an EU Project
• Pointers to resources elsewhere (a "thin"
registry or portal)
• Short descriptions of metadata standards
activities
• Critical commentaries by domain experts
• Promote the publication of schemas (in RDF)
• Goal: help implementors discover how others
(e.g. EU Projects) are using standards in order
to harmonize usage
DCMI -- a Thick Registry
• A thick registry: stores official metadata
element definitions in a central database or
repository
• Managing a namespace (as a standards agency):
publish qualifiers as available, with version
control
– Managing translations of the standard in multiple
languages
• Eventually:
– User guide interface
– Support for standardisation processes (peer review)
– Downloadable input to software tools for generating,
editing, validating DC metadata
Dictionaries as a tool for
harmonization
• Knowledge of how other projects are using
standards will avoid "reinventing the wheel"
• To help information providers harmonize
their schemas for improved access within
domains:
– Between countries (Nordic Metadata Project)
– Preprint repositories (Open Archives Initiative)
– Subject gateways (Renardus)
– Theses and dissertations (NDLTD)
– Mathematics and physics (MathNet, PhysNet)
A global registry
infrastructure?
• Analogously to HTML for text, RDF Schema
format suggests a scalable ecology of metadata
vocabularies on the Web
• Sharing machine-readable elements translated
into many languages suggests a global
(multilingual) metadata language for digital
libraries
• Can a well-managed registry infrastructure allow
this language to evolve -- with flexible innovation
in usage alongside more stable standards?
The scope of registries
• Anything "semantic" (terms and
definitions) is potentially an RDF schema:
– controlled vocabularies
– namespaces, application profiles, annotations
– the "schema" of the registry itself
• Application constraints can be modelled in
XML Schemas
– "title is mandatory"; "date must be after 1980"
• Will XML and RDF Schemas merge?
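To make the two example constraints concrete, here is a minimal sketch in plain Python rather than XML Schema, so it only mimics the kind of check a schema would express. The function name check_record and the record layout are hypothetical, and the date is assumed to start with a four-digit year.

```python
from typing import Dict, List


def check_record(record: Dict[str, str]) -> List[str]:
    """Return the list of application constraints the record violates."""
    problems: List[str] = []
    # "title is mandatory"
    if not record.get("title"):
        problems.append("title is mandatory")
    # "date must be after 1980" (assumes the value starts with YYYY)
    date_value = record.get("date")
    if date_value is not None and int(date_value[:4]) <= 1980:
        problems.append("date must be after 1980")
    return problems


valid = check_record({"title": "Grammar of Dublin Core", "date": "2000-06-13"})
invalid = check_record({"date": "1975-01-01"})
print(valid)
print(invalid)
```

The split mirrors the slide: what a term means belongs in a dictionary of terms, while rules such as these are local to one application.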
EVA 2000 Moscow
Deploying and Using
Metadata
Syntax Alternatives:
HTML
• Advantages:
– Simple Mechanism – META tags embedded in
content
– Widely deployed tools and knowledge
• Disadvantages
– Limited structural richness (won’t support
hierarchical, tree-structured data or entity
distinctions).
– Limited formalisms (parsing and schema
definition)
Dublin Core in HTML
<link rel="schema.DC" href="http://purl.org/dc">
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Creator" content="Carl Lagoze">
<meta name="DC.Subject" content="bibliographic control web cataloging">
<meta name="DC.Date" scheme="W3CDTF" content="2000-10-23">
<meta name="DC.Format" content="text/html">
<meta name="DC.Identifier" content="http://lcweb.loc.gov/lagoze_paper.html">
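A harvester could read such embedded tags with very little code. The sketch below uses Python's standard html.parser module; the class name DublinCoreMetaParser and the shortened sample page are assumptions made for this illustration.

```python
from html.parser import HTMLParser
from typing import Dict, List


class DublinCoreMetaParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> pairs from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: Dict[str, List[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.startswith("DC.") and content is not None:
            # Strip the "DC." prefix; an element may repeat, so keep a list.
            self.elements.setdefault(name[3:], []).append(content.strip())


page = """
<html><head>
<link rel="schema.DC" href="http://purl.org/dc">
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Creator" content="Carl Lagoze">
<meta name="DC.Date" scheme="W3CDTF" content="2000-10-23">
</head><body></body></html>
"""

parser = DublinCoreMetaParser()
parser.feed(page)
print(parser.elements)
```

Every value arrives as a flat string, which illustrates the limited structural richness listed among the disadvantages of the HTML syntax.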
Syntax Alternatives:
XML
• The standard for networked text and data
• Widespread tool support
– Parsers (DOM and SAX)
– Extensibility (namespaces)
– Type definition (XML Schema)
– Transformation and Rendering (XSLT)
– Rich linking semantics (XLINK)
XML Schema
• Rich XML-based language for expressing
type semantics
• Replaces arcane and limited DTD (origin in
SGML)
• Facilities
– Data typing (both complex and primitive)
– Constraints
– Defaults
Dublin Core in XML
<metadata xmlns:dc="http://www.openarchives.org/OAI/dc.xsd">
  <dc:creator>Carl Lagoze</dc:creator>
  <dc:title>Accommodating Simplicity and Complexity in Metadata</dc:title>
  <dc:date>2000-07-01</dc:date>
  <dc:publisher>Cornell University, Computer Science</dc:publisher>
</metadata>
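Because the XML form is namespace-qualified, a standard parser can pick out the Dublin Core elements reliably. The following is a minimal sketch using Python's xml.etree.ElementTree, with the namespace URI copied from the slide; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

# Namespace URI exactly as it appears in the example above.
DC_NS = "http://www.openarchives.org/OAI/dc.xsd"

document = """
<metadata xmlns:dc="http://www.openarchives.org/OAI/dc.xsd">
  <dc:creator>Carl Lagoze</dc:creator>
  <dc:title>Accommodating Simplicity and Complexity in Metadata</dc:title>
  <dc:date>2000-07-01</dc:date>
  <dc:publisher>Cornell University, Computer Science</dc:publisher>
</metadata>
""".strip()

root = ET.fromstring(document)

fields = {}
for child in root:
    # ElementTree expands a qualified name to "{namespace}localname".
    namespace, _, local_name = child.tag[1:].partition("}")
    if namespace == DC_NS:
        fields[local_name] = (child.text or "").strip()

for name, value in fields.items():
    print(name, "=", value)
```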
Syntax Alternatives:
RDF
• RDF (Resource Description Framework)
• The instantiation of the Warwick
Framework on the Web
• Provides enabling technology for richly structured metadata
• Rich data model supporting notions of
distinct entities and properties
• Syntax expressed in XML
RDF Components
• Formal data model
• Syntax for interchange of data
• Schema Type system (schema model)
RDF Data Model
• Directed labeled graphs
• Model elements
– Resource
– Property
– Value
– Statement
– Containers
RDF Model Primitives
[Diagram: a Statement is an arc labeled with a Property, leading from a Resource to a Value, which may itself be a Resource.]
RDF Syntax Example
[Diagram: the resource URI:R has dc:Title "CIMI Presentation" and dc:Creator "Eric Miller".]
<RDF xmlns = "http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:dc = "http://purl.org/dc/elements/1.0/">
  <Description about = "URI:R">
    <dc:Title> CIMI Presentation </dc:Title>
    <dc:Creator> Eric Miller </dc:Creator>
  </Description>
</RDF>
RDF Model Example #2
[Diagram: URI:R has dc:Title "CIMI Presentation" and oa:Creator URI:ERIC; URI:ERIC has bib:Name "Eric Miller", bib:Email "emiller@oclc.org", and bib:Aff URI:OCLC ("OCLC").]
RDF Syntax Example #2
<RDF xmlns = "http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:dc = "http://purl.org/dc/elements/1.0/"
     xmlns:bib = "http://www.bib.org/persons#">
  <Description about = "URI:R">
    <dc:Title> CIMI Presentation </dc:Title>
    <oa:Creator>
      <Description>
        <bib:Name> Eric Miller </bib:Name>
        <bib:Email> emiller@oclc.org </bib:Email>
        <bib:Aff resource = "http://www.oclc.org" />
      </Description>
    </oa:Creator>
  </Description>
</RDF>
RDF Containers
• Permit the aggregation of several values
for a property
• Express multiple aggregation semantics
– unordered
– sequential or priority order
– alternative
RDF Schemas
• Declaration of vocabularies
– properties defined by a particular community
– characteristics of properties and/or constraints on
corresponding values
• Schema Type System - Basic Types
– Property, Class, SubClassOf, Domain, Range
– Minimal (but extensible) at this time
– minimize significant clashes with the typing system designed by the
XML Schema WG
• Expressible in the RDF model and syntax
Relationships among
vocabularies
[Diagram: related terms from different vocabularies: dc:Creator, marc:100, ms:director, bib:Author.]
Bringing it together
• RDF Metadata transmission
– Embedded (e.g. <META>), Transmitted with resource
(HTTP), Trusted 3rd Party (HTTP GET)
• RDF Data Model
– Support consistent encoding, exchange and processing
of metadata… critical when aggregating data from
multiple sources
• RDF Schema
– Declare, define, reuse vocabularies
Open Archives Initiative
http://www.openarchives.org
What is Interoperability?
• Naming?
– Handles
– PURLs
• Metadata?
– MARC
– Dublin Core
• Document models?
– WebDAV
• Federated searching?
– Z39.50?
– DASL?
• Services and
Protocols?
– Dienst
Partitioning Interoperability
[Diagram: layers labeled Mediator Services; Linking, Searching, Summarizing; Metadata Harvesting; Document Models.]
The World According to OAI
[Diagram: Service Providers (Searching, Current Awareness, Summarization) and Data Providers.]
UPS Meeting Results
• Establishment of Open Archives Initiative
– Loose coalition to experiment with
interoperability solutions
• Santa Fe Convention
– Organizational and technical framework to
support metadata harvesting for ePrint
archives
Metadata Harvesting is not
New
• Harvest Project (1992-1995)
– DARPA-funded
– Mike Schwartz (U. Colorado), Mic Bowman (Penn
State), Udi Manber (U. Arizona)
“Open” Archives
• Political Agenda?
– Author self-archiving of E-Prints
– “Mission” to reformulate scholarly publishing
framework
• Technical?
– Infrastructure to facilitate interoperability
across multiple domains
Other communities of
interest
• “Cambridge” digital library federation
meetings
– research library community has many materials
for which they’d like to ‘expose’ metadata
• San Antonio OAI workshop
– librarians, publishers (some), others
Technical Umbrella for Practical
Interoperability…
[Diagram: E-Print Archives, Publishers, Reference Libraries]
…that can be exploited by different communities
Acting mission statement
Supply and promote an application independent
technical framework – a supportive infrastructure
that empowers different scholarly communities to
pursue their own interests in interoperability in the
technical, legal, business, and organizational contexts
that are appropriate to them.
Dan Greenstein, Director DLF
What does this REALLY
Mean?
• Keep the bar low enough to make
widespread adoption possible
• Provide enough back-doors to make true
“disruption” possible (e.g., ePrint
community):
– refine record notion to mandate full-content
connection
– refine metadata to mandate linkage to full-content
Organizational Stability
• Institutional backing of CNI (Coalition for
Networked Information) and DLF (Digital
Library Federation)
• Formation of steering committee
– first steps towards international involvement
Framework for Partitioning
Tasks
• Steering Committee
– policy guidance
• Technical Committee
– technical specifications
• Workshops
– public dissemination, feedback, community-building
Ithaca Technical Meeting
• Input
– experiences gained with implementing &
discussing the current SFc specs
– emerging interest for the application of SFc concepts as a general interoperability
framework in a scholarly environment
Ithaca technical meeting
• Output
– guidelines for an in-depth revised technical
spec to be issued early 2001
– stable for experimentation; not definitive
– minimize risk for early adopters
– maximize chances for future interoperability
across communities
Components of OAI Model
underlying concepts
abstract principles
concrete implementation of principles
OAI Underlying Concepts
managed archives (data providers)
records in an archive
open interface to archives
service providers
Building on Underlying
Concepts
abstract principles -> implementation of principle
• metadata harvesting -> OAI harvesting protocol
• identifiers -> URIs (community schemes)
• metadata set formats -> DC & XML container (parallel sets)
• acceptable use -> Flow Control (usage restrictions)
• registration -> (community specific)
What is a record?
A record in an archive is a metadata record.
The metadata record describes, and can
contain an entry point to, full content.
Metadata: Interoperability &
Extensibility
We recognize that archives will use specific metadata
sets and formats that suit the needs of their communities
and the types of data they handle. However,
interoperability depends on a shared format for
exchanging metadata and therefore archives should
implement the basic Open Archives Metadata Set.
Metadata Solutions
• Adoption of unqualified Dublin Core
Element Set as required metadata.
• Support for parallel metadata sets
maintained
– EPMS (e-print community)
– Others
• Research library community
• Museum community
Metadata XML Container
<record>
  <header>
    <identifier>oai:arXiv:hep/001001</identifier>
    <datestamp>1999-12-25</datestamp>
  </header>
  <metadata xmlns:dc="http:…">
    <dc:creator>Ernest Rutherford</dc:creator>
    <dc:title>Investigations of Radioactivity</dc:title>
    <dc:identifier>doi:1234/5432</dc:identifier>
  </metadata>
</record>
Identifier Issues
• Basic identifier constraints based on URI
specifications
– A key for requesting a record from a repository
– Key and metadata format ID uniquely identify a
record
• Individual communities may develop URN
registration schemes
Identifier Solutions
full-identifier = oai:archive-identifier:record-identifier
– oai: registered URI scheme
– archive-identifier: archive identifier, registered within OAI
– record-identifier: unique ID within archive (syntax is archive specific)
example = oai:ncstrl:ncstrl.cornellcs/TR94-1418
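The three-part identifier can be taken apart with a single split. A minimal sketch, assuming only the `oai:archive-identifier:record-identifier` layout shown on the slide; the names `OAIIdentifier` and `parse_full_identifier` are hypothetical.

```python
from typing import NamedTuple


class OAIIdentifier(NamedTuple):
    scheme: str      # the registered URI scheme, "oai"
    archive_id: str  # registered within OAI
    record_id: str   # unique within the archive; syntax is archive specific


def parse_full_identifier(full_identifier):
    """Split 'oai:archive-identifier:record-identifier' into its parts.

    Only the first two colons are treated as separators, so an
    archive-specific record identifier may itself contain colons.
    """
    parts = full_identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not a full OAI identifier: {full_identifier!r}")
    return OAIIdentifier(*parts)


example = parse_full_identifier("oai:ncstrl:ncstrl.cornellcs/TR94-1418")
```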
Repositories, Identifiers, and
Records
Identifier
Datestamp
MF1
MF2 MF3 MF4
<record>
<header>
…
</header>
<metadata>
….
</metadata>
</record>
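The relationship pictured here, one identifier and datestamp with several metadata formats behind it, can be sketched as an in-memory store in which the identifier plus a metadata format ID selects exactly one record. The class `Repository` and its method names are hypothetical, and the format IDs `MF1` and `MF2` are simply the labels from the slide.

```python
class Repository:
    """Toy repository: each identifier has one datestamp and one
    metadata record per metadata format (MF1, MF2, ...)."""

    def __init__(self):
        # identifier -> {"datestamp": str, "formats": {format_id: metadata}}
        self._items = {}

    def add(self, identifier, datestamp, format_id, metadata):
        item = self._items.setdefault(
            identifier, {"datestamp": datestamp, "formats": {}}
        )
        item["datestamp"] = datestamp
        item["formats"][format_id] = metadata

    def list_metadata_formats(self, identifier):
        return sorted(self._items[identifier]["formats"])

    def get_record(self, identifier, format_id):
        """Identifier and metadata format ID together select one record."""
        item = self._items[identifier]
        return {
            "header": {"identifier": identifier, "datestamp": item["datestamp"]},
            "metadata": item["formats"][format_id],
        }


repo = Repository()
repo.add("oai:arXiv:hep/001001", "1999-12-25", "MF1", {"title": "Investigations of Radioactivity"})
repo.add("oai:arXiv:hep/001001", "1999-12-25", "MF2", {"paper_title": "Investigations of Radioactivity"})
record = repo.get_record("oai:arXiv:hep/001001", "MF1")
```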
Selective harvesting
• Recognized need for light-weight facility
for selective harvesting
– By Date
• Sets
– A low-cost means of selective harvesting
– NOT a general tool for defining global
categories
– Attribution of meanings to sets can be done
within communities and in bilateral fashion
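On the archive side, selective harvesting by date and by set amounts to two cheap filters over the record headers. The sketch below is an illustration only: the record layout, the set names `S1` and `S2`, and the function `select_identifiers` are assumptions, not taken from the protocol document.

```python
from datetime import date


def select_identifiers(records, since=None, set_name=None):
    """Return identifiers of records changed on or after `since`
    and, if `set_name` is given, belonging to that set."""
    selected = []
    for rec in records:
        if since is not None and date.fromisoformat(rec["datestamp"]) < since:
            continue
        if set_name is not None and set_name not in rec["sets"]:
            continue
        selected.append(rec["identifier"])
    return selected


# Hypothetical record headers; set membership has no global meaning here,
# only whatever the archive and its harvesters agree on.
records = [
    {"identifier": "oai:demo:1", "datestamp": "1999-12-25", "sets": ["S1"]},
    {"identifier": "oai:demo:2", "datestamp": "2000-06-01", "sets": ["S1", "S2"]},
    {"identifier": "oai:demo:3", "datestamp": "2000-09-15", "sets": ["S2"]},
]

recent = select_identifiers(records, since=date(2000, 1, 1))
recent_s1 = select_identifiers(records, since=date(2000, 1, 1), set_name="S1")
```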
Protocol Solutions
• Normalized and Enhanced Verb Set
– GetRecord
– Identity
– ListIdentifiers
– ListMetadataFormats
– ListRecords
– ListSets
Protocol Solutions
• CGI-script friendly syntax
– baseurl?verb=verbname&argname=argval...
– verbname is the name of the verb
– argname is the name of the attribute
– argval is the value of the attribute
• Example
– http://foo/blaz?verb=ListRecords&set=S1
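A request in this syntax is easy to assemble with standard URL tooling. The following is a minimal sketch using `urllib.parse.urlencode`; the verb names are copied from the verb-set slide as shown there, and `build_request` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Verb names exactly as listed on the verb-set slide.
VERBS = {
    "GetRecord",
    "Identity",
    "ListIdentifiers",
    "ListMetadataFormats",
    "ListRecords",
    "ListSets",
}


def build_request(base_url, verb, **args):
    """Build baseurl?verb=verbname&argname=argval... for one request."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb!r}")
    # The verb comes first; remaining arguments are sorted for stable output.
    query = urlencode([("verb", verb)] + sorted(args.items()))
    return f"{base_url}?{query}"


url = build_request("http://foo/blaz", "ListRecords", set="S1")
```

Because every request is a plain query string, an archive can answer it from a simple CGI script.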
Registration Solutions
• Automation through:
– On-line registration of:
• Archive identifier (uniqueness enforcement)
• base-url of the archive's OAI protocol implementation
– Identity verb that exposes archive
characteristics
– Use of protocol for registration of metadata
formats and validity checking
• Registration of service providers is still an
open issue
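Uniqueness enforcement for archive identifiers can be sketched as a small registry that maps each identifier to the base-url of that archive's protocol implementation. The class `ArchiveRegistry` and the example URLs are hypothetical; how the actual registration service works is not described here.

```python
class ArchiveRegistry:
    """Toy on-line registry: archive identifier -> base-url."""

    def __init__(self):
        self._base_urls = {}

    def register(self, archive_id, base_url):
        # Uniqueness enforcement: an identifier can be claimed only once.
        if archive_id in self._base_urls:
            raise ValueError(f"archive identifier already registered: {archive_id!r}")
        self._base_urls[archive_id] = base_url

    def base_url(self, archive_id):
        return self._base_urls[archive_id]


registry = ArchiveRegistry()
registry.register("ncstrl", "http://example.org/ncstrl/oai")  # hypothetical URL
```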
Release Schedule
• October 15 – normalized meeting notes
distributed to meeting group
• November 1 – beta specification to
steering committee and limited distribution
• Early January – stabilization of
specification and public meeting
EVA 2000 Moscow
Metadata Landscape
Conferences
• ACM Digital Libraries 2001, San Antonio, June
2001, http://www.dl00.org/
• European Conference on Digital Libraries,
Darmstadt, Sep 2001 http://www.ecdl2001.org
• Asian Digital Library Conference, Seoul, December
2000, http://ADL2000.kaist.ac.kr
• Tenth International WWW Conference, Hong
Kong, May 2001, http://www10.org
NSF Digital Library
Initiative
• Phase I (1994-1998): six large-scale
testbeds involving research universities,
industrial partners, and next-generation
technologies
• Phase II (1999+): expanded scope, smaller
projects as well as large testbeds,
emphasis on making accessible new types of
content
Distributed National
Electronic Resource (UK)
• A managed environment for Internet access to
scholarly journals and other materials relevant to
higher education in the UK
• Uses international standards (e.g., Dublin Core)
• National purchase and licensing agreements for
best value to UK education community
• eLib research funding since mid-1990s emphasized
incremental improvement of standards and
services
Global Info (Germany)
• "The German Digital Library Project"
• Since 1996, integrating access to
scientific information among libraries,
publishers, learned societies, and
individual scientists
• Emphasis on open standards (e.g.,
Dublin Core) and open-standard
formats (e.g., XML, RDF, MPEG)
European Union
• Fifth Framework Programme, 1998-2002
– several dozen projects with several countries each
– Digital Heritage, Cultural Content
– Interactive Electronic Publishing
– Multimedia Content and Tools
• DELOS Network of Excellence
– http://www.ercim.org/delos/
– Communication within European digital library research
community and international networking
MathNet
• German Mathematical Societies index math
pre-prints and home pages of mathematicians
– Encourages use of Dublin-Core-based metadata by
distributing free metadata editor; displays hits "with
metadata" separately from hits "without metadata"
• International Mathematical Union (IMU)
planning international Web service based on
German MathNet model
• Seeking international agreement on simple
metadata profiles for types of math materials
IMS Global Learning
Consortium, Inc.
• Teachers seeking appropriate classroom
materials on Web may want to know:
– for which age-group?
– has it already been used successfully in
classrooms?
– will it work on my equipment?
• IMS: Rich descriptions of learning
resources in a standard record format
Federal Geographic Data Committee
• (US) FGDC Content Standard for Digital
Geospatial Metadata: integrate access to
resources about a particular area found in
diverse repositories
• Government, education, and business needs
– Emergency management
– Integrated databases and comprehensive maps
– City planning
– Environmental control
Visual Resources Association
• VRA Core Categories in a two-level model
for describing objects such as paintings
and buildings
• "Works" described separately from
"images" of those works (One-to-One
Principle)
• Conceptual clarity of One-to-One Principle
implies more complex work-flow and
processing for catalogers and software
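The extra work implied by the One-to-One Principle shows up as soon as works and images are modelled as separate, linked records. The sketch below is illustrative only: the field names are hypothetical and are not the VRA Core Categories.

```python
from dataclasses import dataclass


@dataclass
class Work:
    """The object itself, e.g. a painting or a building."""
    work_id: str
    title: str
    creator: str


@dataclass
class Image:
    """One visual surrogate of a work; it gets its own record."""
    image_id: str
    work_id: str  # link back to the work it depicts
    media_type: str
    view: str


def images_of(work, images):
    """Software must join the two levels to present a complete description."""
    return [img for img in images if img.work_id == work.work_id]


painting = Work(work_id="w1", title="Example Painting", creator="Example Artist")
images = [
    Image(image_id="i1", work_id="w1", media_type="slide", view="full view"),
    Image(image_id="i2", work_id="w1", media_type="digital image", view="detail"),
    Image(image_id="i3", work_id="w2", media_type="slide", view="exterior"),
]
painting_images = images_of(painting, images)
```

Each image record describes only the image, and the creator of the painting is stated once, on the work record.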
Nordic Metadata Project
• Cooperation between Scandinavian
countries (since circa 1996)
• Pioneered idea of metadata-based
distributed index across national
boundaries
• NetLab (Lund University) maintains
SAFARI, which harvests Dublin-Core-based
metadata embedded in documents on
Web servers
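Harvesting Dublin Core metadata embedded in Web documents can be sketched with Python's standard `html.parser`. This assumes the common convention of `<meta name="DC.xxx" content="...">` tags in the HTML head; it is not a description of how SAFARI itself is implemented, and `DCMetaParser` is a hypothetical name.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect <meta name="DC.xxx" content="..."> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.dc = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # Element names are matched case-insensitively after the "DC." prefix.
        if name.lower().startswith("dc.") and content is not None:
            self.dc.setdefault(name[3:].lower(), []).append(content)


PAGE = """
<html><head>
<title>Investigations of Radioactivity</title>
<meta name="DC.Title" content="Investigations of Radioactivity">
<meta name="DC.Creator" content="Ernest Rutherford">
<meta name="keywords" content="physics">
</head><body></body></html>
"""

parser = DCMetaParser()
parser.feed(PAGE)
harvested = parser.dc
```

Pages without such tags simply yield an empty dictionary, which is how an index could separate hits "with metadata" from hits "without metadata".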
Renardus Project (EU)
• http://www.konbib.nl/coop/reynard
– National libraries (Netherlands coordinates)
– NDR: National Digital Resource in UK
– Die Deutsche Bibliothek
• Goal: integrated access to subject
gateways in Europe
• High-level agreement on simple,
Dublin-Core-based schema as common
denominator
Networked Digital Library of
Theses and Dissertations
(NDLTD)
• http://www.ndltd.org
• International consortium of projects
putting dissertations online
• Difficult to agree on single unified
metadata schema -- national, legal, and
disciplinary requirements differ
significantly
• NDLTD agreement on a small
Dublin-Core-based set of metadata elements?
CIDOC
• International Council of Museums:
object-oriented model (CIDOC) designed for
describing multiple entities that may be
– physical (e.g., museum objects)
– conceptual (e.g., works)
– temporal (e.g., historical periods)
– spatial (e.g., places)
• Implies an integrated information space of
"encyclopedic" scope
Rich Site Summary (RSS)
• Metadata for content syndication (news
feeds)
• Used in developing media content portals
• Built on established vocabularies (DC), uses
RDF syntax
• Layers of application-specific semantics:
syndication vocabularies, annotation
vocabularies, etc.
Moving Picture Experts
Group (MPEG)
• MPEG 4: encoding and interacting with
audio-visual objects
• MPEG 7: multimedia content description
interface for such objects
• MPEG 21: ambitious "umbrella" framework
describing the infrastructure for
delivering and consuming multimedia
content
More...
• INDECS - Uses an event-based model to describe
intellectual property rights for commercial
transactions
• DOI - Uses the INDECS framework with a Digital
Object Identifier for content description and
management of references between scientific,
technical, and medical journals
• BSR - Basic Semantic Registry as a universal
interlingua of concepts
• GILS - Government Information Locator Service
...and more...
• PDS - Planetary Data System
• IEEE Learning Object Metadata - an elaborate,
hierarchical scheme for describing multiple facets
of educational material
• MARC 21 - Machine Readable Cataloging format
and related vocabularies for libraries
• EPICS Data Dictionary, a subset of which -- ONIX
-- describes books in a specific XML format
(pushed by Amazon.com)
For further information....
• "Metadata Watch Reports" of SCHEMAS
Project, http://www.schemas-forum.org
– Critical overview (with expert commentary) on
the metadata landscape as it evolves
– Related database of individual activity reports
• D-Lib Magazine, http://www.dlib.org/dlib/
• Ariadne, http://www.ariadne.ac.uk
Why the Web won
• Tim Berners-Lee's original model was very
simple, and it was easy to implement
• Real-world experience with simple HTML led
iteratively to better understanding of
priorities
– As with bicycles and airplanes, there was no "theory"
for design -- design was perfected iteratively,
starting simple
• Complex standards impose significant costs,
especially if legacy data must be converted
Learning from experience
• People are only human: the most perfect
language is always subject to interpretation
• By design, metadata languages must allow for
innovation and evolution
• Physics and art history, Chinese and Finnish --
different languages will continue in real life
• Likewise, a diversity of metadata languages is
inevitable
• Interoperability over "everything" can only be
via a simple and general pidgin