
Dublin Core Metadata Initiative
Stuart Weibel
OCLC Office of Research
Director, Dublin Core Metadata Initiative
Presentation Outline
• Introduction to Metadata
• Dublin Core Metadata Initiative
• Metadata Registries
• Syntax Alternatives for Web Metadata
• A Few Strategic Applications
Introduction to Metadata
The Web as an Information System
• Search systems are motivated by business models, not user needs
• Index coverage is unpredictable and limited
• Too much recall, too little precision
• Index spam abounds
• Resources (and their names) are volatile
• Archiving is presently unsolved
• Authority and quality of service are spotty
• Managing intellectual property rights is hard
Metadata: Part of a Solution
• Structured data about data
– Organization and management of content
– Support discovery
– Direct content in channels
– Enable automated discovery/manipulation
Internet Commons includes Multiple Communities
[Diagram: the Internet Commons at the center, surrounded by communities: Home Pages, Scientific Data, Commerce, Geo, Library, Museums, Whatever...]
Interoperability requires conventions about:
• Semantics
– The meaning of the elements
• Structure
– human-readable
– machine-parseable
• Syntax
– grammars to convey semantics and structure
Haven’t we done metadata already?
The MARC family of standards is the single most successful resource description standard in the world
What’s wrong with this model on the Web?
• Expensive
– Complex
– Professional catalogers required
• Bias towards bibliographic artifacts
– Fixed resources
– Incomplete handling of resource evolution and other resource relationships
• Anglo-centric
– MARC 21 accounts for ¾ of MARC records, but there are other varieties
Dublin Core Metadata Initiative
History of the Dublin Core
• 1994: Simple tags to describe Web pages
• 1995: The Dublin Core is one of many vocabularies needed ("Warwick Framework")
• 1996: The Dublin Core: 13 elements expanded to 15, appropriate for Text and Images
• 1997: The Warwick Framework needs formal expression in a Resource Description Framework (RDF)
• 2000: Dublin Core Metadata Initiative recommends qualifiers, broadens its organizational scope beyond the Core
Dublin Core Metadata Initiative
• The mission of DCMI is to make it easier to find resources using the Internet through the following activities:
– Developing metadata standards for discovery across domains (example: the Dublin Core)
– Defining frameworks for the interoperation of metadata sets
– Facilitating the development of community- or discipline-specific metadata sets
DCMI Organizational Structure
[Organization chart; box labels in extraction order: Board of Trustees; Executive Director; Managing Director; Usage Board; DCMI Subscribers; Standards Development WGs; Infrastructure WGs; User Support and Education WGs; Liaison; Directorate; Advisory Board; DCMI Activity Areas]
DCMI Activities
• Standards development and maintenance
• Metadata registry and infrastructure
• Technical working groups and periodic workshops
• Tutorial materials and user guides
• Education and training
• Open source software
• Liaisons with other standards or user communities
Unqualified Dublin Core is the Pidgin metadata language
• Metadata is language
• Dublin Core is a small and simple language -- a pidgin -- for finding resources across domains using the Internet.
• Speakers of different languages naturally "pidginize" to communicate
Qualifiers and Domain-specific Extensions
• The Dublin Core architecture supports more sophisticated metadata solutions through the addition of:
– Qualifiers
– Domain-specific extensions
– Application Profiles involving mixed namespaces (more on this later)
• Increased sophistication comes at the cost of some degree of interoperability
Varieties of Qualifiers: Value Encoding Schemes
• A value encoding scheme says that the value is
– a term from a controlled vocabulary (e.g., Library of Congress Subject Headings)
– a string formatted in a standard way (e.g., "2001-05-02" means May 2, not February 5)
• Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
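The idea of a value encoding scheme can be sketched in a few lines of Python. This is only an illustration: the function name `interpret_value` and the scheme label "W3CDTF" are assumptions made for the example, not taken from the slides. A known scheme yields a typed value, while an unknown scheme leaves the string usable as-is.

```python
from datetime import date


def interpret_value(value, scheme=None):
    """Return a typed value when the scheme is understood, else the raw string."""
    if scheme == "W3CDTF":  # assumed label for YYYY-MM-DD formatted dates
        try:
            return date.fromisoformat(value)
        except ValueError:
            return value  # malformed or partial date: keep the literal
    return value  # unknown scheme: the string is still usable for discovery


d = interpret_value("2001-05-02", scheme="W3CDTF")
print(d.month, d.day)  # 5 2 (May 2, not February 5)
print(interpret_value("Languages -- Grammar", scheme="LCSH"))
```

Falling back to the plain string mirrors the point above: software that does not know a scheme should still be able to search on the value.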
Varieties of Qualifiers: Element Refinements
• Make the meaning of an element narrower or more specific.
– a Date Created versus a Date Modified
– an IsReplacedBy Relation versus a Replaces Relation
• If your software does not understand the qualifier, you can safely ignore it.
A Grammar of Dublin Core
• http://www.dlib.org/dlib/october00/baker/10baker.html
• By design not as subtle as mother tongues, but easy to learn and useful in practice
• Pidgins: small vocabularies (Dublin Core: fifteen special nouns and lots of optional adjectives)
• Simple grammars: sentences (statements) follow a simple fixed pattern...
[Diagram: the fixed statement pattern]
Resource (implied subject) has (implied verb) property X
– property: one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date...)
– qualifiers (adjectives): optional modifiers of the property
– X: the property value (an appropriate literal)
Examples:
– Resource has Subject "Languages -- Grammar"
– Resource has Date "2000-06-13"
Dumb-Down Principle for Qualifiers
• The fifteen elements should be usable and understandable with or without the qualifiers
• Qualifiers refine meaning (but may be harder to understand)
• Nouns can stand on their own without adjectives
• If your software encounters an unfamiliar qualifier, look it up -- or just ignore it!
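A minimal Python sketch of the dumb-down principle follows. It assumes, purely for illustration, that each statement is held as an (element, qualifier, value) triple. The function name `dumb_down` and that representation are hypothetical; the fifteen element names are those of the Dublin Core element set.

```python
# The fifteen Dublin Core elements (the "nouns").
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dumb_down(statements):
    """Drop every qualifier, keeping only (element, value) pairs.

    `statements` is a list of (element, qualifier, value) triples;
    the qualifier may be None for unqualified statements.
    """
    simple = []
    for element, _qualifier, value in statements:
        if element in DC_ELEMENTS:  # statements outside the Core are skipped
            simple.append((element, value))
    return simple


qualified = [
    ("date", "created", "2000-06-13"),
    ("subject", None, "Languages -- Grammar"),
]
print(dumb_down(qualified))
```

Because the qualifier only narrows the element, discarding it loses precision but leaves a statement that is still true and still searchable.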
Using DC with other vocabularies
• Specialized application profiles may need to:
– Use general-purpose Dublin Core elements
– Use elements from another, more domain-specific standard
– Narrow standard definitions of DC elements for specific local uses
– Invent local elements outside the scope of existing standards
What is an Application Profile?
• A metadata schema incorporating a set of elements from one or more metadata element sets
• A set of policies defining how the elements should be applied to the domain of the application
• A set of guidelines that make the policies concerning elements explicit
Multiple Namespace Fragment
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:co="http://purl.org/rss/1.0/modules/company/"
<dc:publisher>The O'Reilly Network</dc:publisher>
<dc:creator>Rael Dornfest</dc:creator>
<dc:rights>Copyright &#169; 2000 O'Reilly &amp; Associates, Inc.</dc:rights>
<dc:date>2000-01-01T12:00+00:00</dc:date>
<dc:description>XML is placing increasingly heavy loads on the existing technical infrastructure of the Internet.</dc:description>
<co:name>XML.com</co:name>
<co:market>NASDAQ</co:market>
<co:symbol>XML</co:symbol>
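To show how software can tell the two vocabularies in the fragment apart, here is a small Python sketch using the standard library's ElementTree parser. The wrapping `item` element is an assumption added so that the fragment parses as a complete document; only a few of the fragment's elements are repeated.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
CO = "http://purl.org/rss/1.0/modules/company/"

# The <item> wrapper is hypothetical; the slide shows only a fragment.
fragment = f"""<item xmlns:dc="{DC}" xmlns:co="{CO}">
  <dc:publisher>The O'Reilly Network</dc:publisher>
  <dc:creator>Rael Dornfest</dc:creator>
  <co:name>XML.com</co:name>
  <co:symbol>XML</co:symbol>
</item>"""

root = ET.fromstring(fragment)
by_namespace = {}
for child in root:
    # ElementTree expands a prefixed name to "{namespace-uri}localname".
    uri, local = child.tag[1:].split("}")
    by_namespace.setdefault(uri, {})[local] = child.text

print(by_namespace[DC])  # Dublin Core statements
print(by_namespace[CO])  # company-module statements
```

Grouping by namespace URI rather than by prefix matters because prefixes such as `dc` are local shorthand and may differ between documents.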
Namespaces and Translation
• Dublin Core has been translated into 26 languages
– machine-readable tokens are shared by all
– human-readable labels are defined in different languages
– translations are distributed, maintained in many countries
– eventually linked in DCMI registry
One concept identifier – with labels in many languages
[Diagram: dc:creator linked by rdfs:label to each of the following labels]
– “Verfasser” [German]
– “Creator” [English]
– “Pencipta” [Indonesian]
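The relationship between one shared token and its many labels can be sketched as a simple lookup in Python. The table layout, the two-letter language codes, and the function name `label_for` are assumptions for illustration; the three labels are the ones shown in the diagram.

```python
# One machine-readable token, many human-readable labels.
LABELS = {
    "dc:creator": {"de": "Verfasser", "en": "Creator", "id": "Pencipta"},
}


def label_for(token, lang, fallback="en"):
    """Return the label in `lang`, then the fallback language, then the token itself."""
    labels = LABELS.get(token, {})
    return labels.get(lang) or labels.get(fallback) or token


print(label_for("dc:creator", "de"))  # Verfasser
print(label_for("dc:creator", "fi"))  # Creator (no Finnish label in this table)
```

Software exchanges only the token `dc:creator`; the label is chosen at display time for the reader's language.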
Metadata Registries: Dictionaries of Metadata Terms and Usage
Metadata is language
• Metadata schemas are languages for making statements about resources:
– Book has Title "Gone with the Wind".
– Web page has Publisher "Springer Verlag".
• Vocabulary terms (elements) are defined in standards like Dublin Core
• Metadata grammars constrain the statements and data models one can form
Metadata languages are Multilingual
• Metadata is not a spoken language
• The words of metadata -- "elements" -- are symbols that stand for concepts expressible in multiple natural languages
• Standards may have dozens of translations
• Are concepts like "title", "author", or "subject" used the same way in English, Finnish, and Korean?
Languages Evolve With Use
• Inevitably, languages resist stability
• People stretch official definitions
• Implementers misunderstand the intended meaning or use of elements
• Implementers coin local terms and extensions
• If the application does not fit the standard, the standard is often "customized" to fit the application
How do we manage this evolution?
• How can we monitor the usage of a language that is:
– Never spoken?
– Rarely published in a way that can be harvested?
• How can dictionary editors help a metadata language evolve and grow in response to usage?
• How can this evolution occur across (human) languages?
RDF Schemas (RDFS) -- W3C standard
• A dictionary format for metadata terms:
– Simple XML format for namespaces, terms and definitions
• Example: "Title" (Dublin Core)
– Human-readable label and definition:
  • Title: A name given to the resource.
– Unique, machine-readable identifiers
  • dc:title
• Support for cross-references
– Between multiple language renditions of a namespace
– Between terms in related standards
– Between local adaptations and related standards
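As a rough illustration of the "dictionary format", the Python sketch below reads a hand-written RDF Schema entry for Title and extracts its identifier, label, and definition. The XML snippet is an assumption modeled on the bullet points above, not a copy of the published DCMI schema.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Illustrative schema entry; the real DCMI schema carries more detail.
schema = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/title">
    <rdfs:label xml:lang="en">Title</rdfs:label>
    <rdfs:comment xml:lang="en">A name given to the resource.</rdfs:comment>
  </rdf:Property>
</rdf:RDF>"""

root = ET.fromstring(schema)
terms = {}
for prop in root.findall(f"{{{RDF}}}Property"):
    uri = prop.get(f"{{{RDF}}}about")  # the unique, machine-readable identifier
    terms[uri] = {
        "label": prop.findtext(f"{{{RDFS}}}label"),
        "definition": prop.findtext(f"{{{RDFS}}}comment"),
    }

for uri, entry in terms.items():
    print(uri, "->", entry["label"], ":", entry["definition"])
```

A registry harvesting many such files could build its term dictionary the same way, keyed by the identifier rather than by the label.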
Registries can function as dictionaries
• Metadata dictionaries can help metadata vocabularies evolve more like other human languages
– Not just top-down, like traditional standards
– Also bottom-up, in response to usage
DCMI – Metadata Registry
• Stores official metadata element definitions in a central database or repository
• Managing a namespace (as a standards agency): publish qualifiers as available, with version control
– Managing translations of the standard in multiple languages
• Eventually:
– User guide interface
– Support for standardisation processes (peer review)
– Downloadable input to software tools for generating, editing, validating DC metadata
Dictionaries as a tool for harmonization
• Knowledge of how other projects are using standards will avoid "reinventing the wheel"
• To help information providers harmonize their schemas for improved access within domains:
– Between countries (Nordic Metadata Project)
– Preprint repositories (Open Archives Initiative)
– Subject gateways (Renardus)
– Theses and dissertations (NDLTD)
– Mathematics and physics (MathNet, PhysNet)
A global registry infrastructure?
• RDF Schema format suggests a scalable ecology of metadata vocabularies on the Web
• Sharing machine-readable elements translated into many languages suggests a global (multilingual) metadata language for digital libraries
• Can a well-managed registry infrastructure allow this language to evolve -- with flexible innovation in usage alongside more stable standards?
EOR -- an RDF Toolkit for Schema Infrastructure
• Harvests RDF Schemas
– Schemas distributed on multiple Web servers
– Creates huge database of schemas for searching
– Web interface functions as a "metadata browser"
– Click on cross-references between linked terms
• Downloadable as open source software
– http://eor.dublincore.org/
EOR Toolkit
• Integrate RDF components for supporting search services, topic-maps, site-maps, annotation environments and semantic metadata registries
• Base-level functionality of this toolkit includes:
– Creation, deletion, and management of RDF databases.
– Ability to infuse RDF instance data into RDF databases.
– Ability to search RDF databases.
– Generic interface design capabilities to support RDF applications.
– Web interface functions as a "metadata browser"
• Open Source: http://eor.dublincore.org
Syntax Alternatives for Web Metadata
Syntax Alternatives: HTML
• Advantages:
– Simple Mechanism – META tags embedded in content
– Widely deployed infrastructure (the Web)
– Public domain tools
• Disadvantages:
– Limited structural richness (won’t easily support hierarchical, tree-structured data or entity distinctions).
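The META-tag mechanism can be illustrated with a short Python sketch that renders a flat record as HTML. The `DC.` name prefix and the `schema.DC` link follow the commonly used convention for embedding Dublin Core in HTML; the function name `dc_meta_tags` and the sample record are assumptions for the example.

```python
from html import escape


def dc_meta_tags(record):
    """Render (element, value) pairs as HTML META tags using a "DC." name prefix."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in record:
        content = escape(value, quote=True)  # protect &, <, > and quotes
        lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


record = [
    ("title", "A Grammar of Dublin Core"),
    ("date", "2000-06-13"),
    ("subject", "Languages -- Grammar"),
]
print(dc_meta_tags(record))
```

The output is a flat list of name/content pairs, which is exactly the limitation noted above: there is no natural place to say that a value is itself a structured entity.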
Syntax Alternatives: XML
• The standard for networked text and data
• Widespread tool support
– Parsers (DOM and SAX)
– Extensibility (namespaces)
– Type definition (XML Schema)
– Transformation and Rendering (XSLT)
– Rich linking semantics (XLINK)
XML DTDs
• Works, but…
• DTDs are a stopgap measure
– Extensibility is problematic
– Many ways to ‘say’ the same thing (too much
flexibility)
– Interoperability must be pre-coordinated
– DTDs cannot evolve gracefully
– Granularity is at the level of the DTD
XML Schemas
• Rich XML-based language for expressing type semantics
• Replaces arcane and limited DTD (origin in SGML)
• Facilities
– Data typing (both complex and primitive)
– Constraints
– Defaults
Syntax Alternatives: RDF
• RDF (Resource Description Framework)
• The instantiation of the Warwick Framework on the Web
• Rich data model supporting notions of distinct entities and properties
• Syntax expressed in XML
• Granularity is at the level of the element, not the entire schema as with XML DTDs
RDF Components
• RDF Model and Syntax WG
– Formal data model
– Syntax for interchange of data
• RDF Schema (RDFS)
– Type system (schema model)
RDF Schemas
• Declaration of vocabularies
– properties defined by a particular community
– characteristics of properties and/or constraints on corresponding values
• Schema Type System - Basic Types
– Property, Class, SubClassOf, Domain, Range
– Minimal (but extensible) at this time
– Minimizes significant clashes with the typing system designed by the XML Schema WG
• Expressible in the RDF model and syntax
RDF: In Summary
• RDF Metadata transmission
– Embedded (e.g. <META>), Transmitted with resource (HTTP), or from a trusted 3rd Party
• RDF Data Model
– Support consistent encoding, exchange and processing of metadata… critical when aggregating data from multiple sources
• RDF Schema
– Declare, define, reuse vocabularies
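Why a consistent data model helps when aggregating can be seen in a toy Python sketch that treats each statement as a (resource, property, value) triple. The resource URL, the values, and the helper name `values` are all hypothetical.

```python
DC = "http://purl.org/dc/elements/1.1/"
doc = "http://example.org/report"  # hypothetical resource identifier

# Two sources describe the same resource; the values are made up.
catalog_a = {
    (doc, DC + "title", "Example Report"),
    (doc, DC + "creator", "A. Author"),
}
catalog_b = {
    (doc, DC + "creator", "A. Author"),
    (doc, DC + "date", "2001-05-02"),
}

# With one shared model, aggregation is a set union; duplicates collapse.
merged = catalog_a | catalog_b


def values(triples, subject, prop):
    """All values of `prop` stated about `subject`, sorted for stable output."""
    return sorted(o for s, p, o in triples if s == subject and p == prop)


print(len(merged))  # 3
print(values(merged, doc, DC + "creator"))  # ['A. Author']
```

No schema-level coordination between the two catalogs was needed, because each statement stands on its own and properties are identified by full URIs.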
Unresolved Issues Concerning RDF and XML Schemas
• RDF Schemas and XML Schemas have overlapping functionality
– XML Schemas provide strong data typing, but also support semantic specifications
– RDF is focused on semantic data model and extensible namespace management
• Resolution of overlap and market acceptance will determine the future of each
• The Semantic Web Activity in the W3C is chartered to address such issues: http://www.w3.org/2001/sw
A Few Strategic Projects
Open Archives Initiative
http://www.openarchives.org
• Protocols to support alternative scholarly publishing solutions
• Federated repositories for:
– ePrints
– Libraries
– Publishers
• OAI archives may contain full text or surrogates (metadata)
• Metadata harvesting protocols
OAI Metadata
OAI archives will use specific metadata sets and formats that suit the needs of their communities and the types of data they handle.
However, interoperability depends on a shared format for exchanging metadata and therefore archives should implement the basic Open Archives Metadata Set.
OAI Metadata Solutions
• Adoption of the unqualified Dublin Core Element Set as required metadata.
• Support for parallel metadata sets is maintained
– EPMS (e-print community)
– Others
  • Research library community
  • Museum community
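For a sense of what harvesting unqualified Dublin Core looks like in practice, the Python sketch below builds a harvesting request URL. It assumes the `verb` and `metadataPrefix` parameters and the `oai_dc` prefix used by the published OAI metadata harvesting protocol, which may differ from the protocol draft current at the time of this talk. The base URL and the function name are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a harvesting request asking an archive for all of its records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hypothetical archive endpoint.
print(list_records_url("http://archive.example.org/oai"))
```

A harvester would request the shared Dublin Core format from every archive, and could ask for a richer community-specific set only from archives that advertise it.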
Renardus Project (EU)
• http://www.konbib.nl/coop/reynard
– National libraries (Netherlands coordinates)
– NDR: National Digital Resource in UK
– Die Deutsche Bibliothek
• Goal: integrated access to subject gateways in Europe
• High-level agreement on simple, Dublin Core-based schema as common denominator
Networked Digital Library of Theses and Dissertations (NDLTD)
• http://www.ndltd.org
• International consortium of projects putting dissertations online
• NDLTD agreement on a small Dublin Core-based set of metadata elements with extensions to support application-specific needs
• http://www.ndltd.org/standards/metadata/current.html
PRISM: Publishing Requirements for Industry Standard Metadata
• PRISM is an XML metadata standard for syndicating, aggregating, post-processing and multi-purposing content from magazines, news, catalogs, books and mainstream journals.
• Uses DC and its relation types as the foundation for its metadata
• Adobe, Time, Inc, Getty Images, Conde Nast, Sotheby’s, Interwoven...
• http://www.prismstandard.org
Rich Site Summary (RSS)
http://purl.org/RSS
• Metadata for content syndication (news feeds)
• Used in developing media content portals
• Built on established vocabularies (DC), using RDF syntax
• Layers of application-specific semantics: syndication vocabularies, annotation vocabularies, etc.
For further information....
• "Metadata Watch Reports" of SCHEMAS Project, http://www.schemas-forum.org
– Critical overview (with expert commentary) on the metadata landscape as it evolves
– Related database of individual activity reports
• D-Lib Magazine, http://www.dlib.org/dlib/
• Ariadne, http://ariadne.ac.uk
• DCMI Homepage, http://dublincore.org
DC-2001
• DC-2001 in Tokyo
– October 22-26, 2001
• Three tracks:
– Technical working group meetings
– Implementation reports and research papers
– General introduction and tutorials for non-experts
How to Participate
• Join the DC-General mailing list
• Join a working group
• Create a working group
• Information on lists and working groups is available at http://dublincore.org