Metadata 101
Amy Benson
NELINET, Inc.
November 7, 2005
Overview
Terms and definitions
– What (the heck) do all those acronyms mean?
Categories of metadata schemes and tools
– How do they relate to each other?
Uses and functions
– What do you do with them?
Staying power
– Which ones do you really have to pay attention to?
Standards





Increase interoperability
Lower use and participation barriers
Build larger communities of users which can
drive creation of a wider range of relevant
services and tools (Windows vs Mac)
Improve chances of long term survival of
materials
Prefer open over proprietary
Categories
Metadata containers
– XML, RDF
Metadata standards
– MARC, MODS, DC, EAD, TEI, ONIX, FGDC, GILS
Metadata content standards
Transmission standards and protocols
– METS, OAI, SOAP, Z39.50, SRW
Identifiers
– URI, URL, PURL, URN, DOI, ISTC
Metadata - What is it?



Data about data
Information about any aspect of a resource: size, location, attributes, topic, origin, use,
audience, creator, quality, access rights,
reviews… the list is endless
An aid to the discovery, identification,
assessment, and management of described
entities
Types of Metadata
Descriptive
– What is it?
Discovery
– How can I find it?
Structural
– What files comprise it?
Administrative
– When was it created?
Types of Metadata
Identifiers
– How can I get to it?
Terms & conditions
– Can I use it?
Preservation
– Which key characteristics of the resource need to be maintained?
Metadata Terms
Structured metadata
Extensibility
– Modify to suit local needs
Granularity
– Level at which item or collection of items is described
Interoperability
– Works with other systems
– Share data across systems
Metadata - Who needs it?
Impact of metadata on collection access
– Without metadata there is no service to users
– Metadata provides the means for resource discovery, grouping, filtering, matching user needs
– Keyword searching works only for resources that are text-based - excludes photographs, data sets, objects, maps, audio, video…
Metadata itself as valuable content
– Item descriptions, Finding aids, Reviews
Metadata
Description vs. discovery
– Full description is important for collection inventory and management - less so for discovery
– Full description of a resource includes much information that will never be part of a user’s search key
Deep vs. shallow
– Basic discovery metadata supports broad, cross-domain searching that can lead users to more complete search mechanisms and descriptions
Interoperability




Interoperability allows different computer
systems, networks, and software to work
together and share information
Usually achieved by following standards
Generally, an increase in specialization results
in a decrease in interoperability
Allows different systems to make use of same
data
Interoperability
Advantages
– Can increase awareness and use of collections
– Reduces geographic and domain-specific isolation of collections
– Creates new avenues for scholarship
– Likely to assist / promote the longevity of data and collections
– Holy Grail = one-stop access to the universe of online resources
Interoperability
Disadvantages
– Consensus
– Compromise
– Delays
– Loss of independence
– Uniformity
– Increased implementation difficulties
– Loss of specificity and detail
Worthy goal?
Interoperability
NINCH (National Initiative for a Networked Cultural Heritage) Guide to Good Practice: the first two of its six core principles
1. Optimize interoperability of materials
2. Enable broadest use
Interoperability
Canadian Culture Online (CCO) Technical Standards and Guidelines
– Technical requirements that CCO-funded projects must meet
– Six metadata elements are required when describing objects to ensure interoperability: title, creator, subject, date created, language, identifier
XML
eXtensible Markup Language
– Based on SGML - Standard Generalized Markup Language
– Developed by the World Wide Web Consortium (W3C)
– Open standard (non-proprietary)
– Uses language tags, similar to HTML: <title>Gone with the wind</title>
A structure for storing and tagging information, without prescribing how the information is displayed or used
XML




Data stored in XML can be of many types
Its simple syntax is easy for machines to
process
Natural language tags make XML
understandable to humans
XML defines the syntax, but not the data
elements that make up an XML document
XML



The structure of XML allows for hierarchical
relationships – often necessary for complex
documents, 3-D objects, archives, etc.
XML is extensible – an important feature that
allows tags to be created by users or a
community of users
XML-encoded data is easily transformed or repurposed
XML - Elements Example
<!DOCTYPE list
[ <!ELEMENT list (book+)>
<!ELEMENT book (title, author*, date+, year, comment*, code*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (aulast*, aufirst*)>
<!ELEMENT aulast (#PCDATA)>
<!ELEMENT aufirst (#PCDATA)>
<!ELEMENT date (day*, month*)>
<!ELEMENT day (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
<!ELEMENT code (#PCDATA)>
]>
XML – Record Example
<book> <title>Weaving the Web</title>
<author><aulast>Berners-Lee,</aulast>
<aufirst>Tim</aufirst></author>
<date> <day>6</day>
<month>January</month></date>
<year>2002</year>
<comment>Interesting topic, but not too well
written.</comment>
<code>nonfiction</code>
</book>
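Because the tags carry the meaning, software can pull individual values out of a record like the one above. A minimal sketch in Python, assuming the standard-library ElementTree parser; the `summarize` helper and the choice of output fields are illustrative, not part of the original slides:

```python
import xml.etree.ElementTree as ET

# The record from the slide, kept as a string for the example.
RECORD = """
<book>
  <title>Weaving the Web</title>
  <author><aulast>Berners-Lee,</aulast><aufirst>Tim</aufirst></author>
  <date><day>6</day><month>January</month></date>
  <year>2002</year>
  <comment>Interesting topic, but not too well written.</comment>
  <code>nonfiction</code>
</book>
"""

def summarize(record_xml: str) -> dict:
    """Pull a few values out of a <book> record by element name."""
    book = ET.fromstring(record_xml)
    # The sample data stores the comma inside <aulast>, so strip it.
    last = book.findtext("author/aulast", default="").rstrip(",")
    first = book.findtext("author/aufirst", default="")
    date_parts = [
        book.findtext("date/day"),
        book.findtext("date/month"),
        book.findtext("year"),
    ]
    return {
        "title": book.findtext("title"),
        "author": f"{first} {last}".strip(),
        "date": " ".join(part for part in date_parts if part),
    }

summary = summarize(RECORD)
print(summary)
```

The same parsed tree could feed a display, a search index, or a conversion to another format, which is the re-purposing the later slides describe.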
XML - Partial list of ONIX elements
RecipeML
XML
Usually, tags, definitions, and requirements are defined and adhered to by a specific community
– DTD (Document Type Definition): describes the permissible data structure for an XML file
– Schema: also describes the permissible data structure for an XML file; a newer, XML-based way to define XML document types
XML DTDs and Schemas
DTDs and schemas
– Lay out the logical structure of the data
– Establish rules about which elements a document may have, which are required, which can repeat, etc.
– Establish a root element, parent and child elements, and where data can be placed within hierarchy
– DTDs can be placed within an XML file, or be external to it, and then referenced
– Schemas are external
XML – Simple DTD Example
<!DOCTYPE list
[ <!ELEMENT list (book+)>
<!ELEMENT book (title, author*, date+, year, comment*, code*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (aulast*, aufirst*)>
<!ELEMENT aulast (#PCDATA)>
<!ELEMENT aufirst (#PCDATA)>
<!ELEMENT date (day*, month*)>
<!ELEMENT day (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
<!ELEMENT code (#PCDATA)>
]>
XML – Ways to use XML


XML-encoded data is able to be re-purposed:
re-used in multiple contexts
Due to its ability to be easily parsed, software
can transform it in countless ways, thereby
allowing:




Easy migration paths
Alternative displays
On-the-fly response to user needs
Transform XML for display via style sheets
(XSL) and transformations (XSLT)
XML - XSL
XML prescribes the structure of a document/record, but not content or display
XSL - eXtensible Stylesheet Language
– XML uses stylesheets to display the code in user-friendly ways
– Use different stylesheets to render the data in different ways
– Similar to the Cascading Style Sheets (CSS) used for HTML
XML - XSLT
eXtensible Stylesheet Language Transformations (XSLT)
– A markup language and programming syntax for processing XML
– Is most often used to:
    – Transform XML to HTML for delivery to standard web clients
    – Transform XML from one set of XML tags to another
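The first use, XML to HTML, can be illustrated without an XSLT processor. Python's standard library does not include one, so the sketch below is a stand-in written with ElementTree, not XSLT itself; the `book_to_html` function and the `div`/`h1`/`p` layout are assumptions chosen for the example:

```python
import xml.etree.ElementTree as ET

def book_to_html(record_xml: str) -> str:
    """Re-express a <book> record as a small HTML fragment."""
    book = ET.fromstring(record_xml)
    # Build the output tree element by element, as a stylesheet template would.
    div = ET.Element("div", {"class": "book"})
    ET.SubElement(div, "h1").text = book.findtext("title")
    ET.SubElement(div, "p").text = book.findtext("comment")
    return ET.tostring(div, encoding="unicode")

html = book_to_html(
    "<book><title>Weaving the Web</title>"
    "<comment>Interesting topic.</comment></book>"
)
print(html)
```

A real XSLT stylesheet would declare the same mapping as templates, so that the display can change without touching the stored XML.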
XML File
XML File Transformation
XML vs Traditional Database Software
If your information is…
– Tightly structured
– Fixed field length
– Massive numbers of individual items
You need a database
If your information is…
– Loosely structured
– Variable field length
– Massive record size
You need XML
XML Software

Software
–
–
–
–

XMLSpy: http://www.xmlspy.com/
XMetal: http://www.xmetal.com/
AxKit: http://axkit.org/
Cocoon: http://xml.apache.org/cocoon/
Used to
–
–
–
Assist with content authoring and coding
Apply dynamic transformations to XML content
Render HTML for standard web browsers, PDAs,
cell phones, etc.
Namespaces


A namespace identifies a specific set of
elements
Namespaces allow metadata terms to be
unambiguously used across applications
–

Defines what ‘Date’ or ‘Title’ means in a specific
usage, or namespace
Each namespace has a unique identifier
associated with it
Namespaces - Example
<dc:DC
xmlns:dc='http://purl.org/dc/elements/1.1/'>
<dc:title>Internet Ethics</dc:title>
<dc:creator>Duncan Langford</dc:creator>
<dc:format>Book</dc:format>
<dc:identifier>ISBN 0333776267</dc:identifier>
</dc:DC>
Namespaces - Example
<s:student
xmlns:s='http://www.develop.com/student'
xmlns:w='urn:schemas.develop.com:workshop'>
<s:id>3235329</s:id>
<s:name>Jeff Smith</s:name>
<w:name>Emerging Metadata Topics</w:name>
<s:institution>XNL</s:institution>
</s:student>
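The point of the student/workshop example is that two elements called "name" stay distinct because each belongs to a different namespace. A short sketch of how a parser sees this, assuming Python's standard-library ElementTree; the root element here uses the declared `s` prefix so that the document parses:

```python
import xml.etree.ElementTree as ET

DOC = """
<s:student xmlns:s='http://www.develop.com/student'
           xmlns:w='urn:schemas.develop.com:workshop'>
  <s:id>3235329</s:id>
  <s:name>Jeff Smith</s:name>
  <w:name>Emerging Metadata Topics</w:name>
  <s:institution>XNL</s:institution>
</s:student>
"""

# Prefixes are local shorthand; the namespace identifiers are what matter.
NS = {
    "s": "http://www.develop.com/student",
    "w": "urn:schemas.develop.com:workshop",
}

root = ET.fromstring(DOC)
student_name = root.findtext("s:name", namespaces=NS)
workshop_name = root.findtext("w:name", namespaces=NS)

print(root.tag)        # the parser stores the full namespace with the tag
print(student_name)
print(workshop_name)
```

Asking for plain `name` with no namespace would find nothing, which is exactly the unambiguous behavior the slides describe.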
Resource Description Framework (RDF)



A structured framework for multiple resource
description schemas
Problem: data providers offer well organized
repositories of metadata, but use different
description systems
Solution: RDF - a way for machines to
understand multiple description systems or
metadata schemas and the relationship(s)
between them
RDF

Allows interoperability among multiple resource
description methods
–
–


Communities define and state their metadata
schemas in XML documents
Systems use the definitions and statements to
“understand” the metadata
In practice the element sets are namespaces
which are “called” or “stated” within RDF
RDF schemas “owned” by known groups
provide basis for trusted metadata
RDF Example
MARC

Advantages
–
–
–

Rich set of descriptive elements
Highly interoperable within library community
Long, established history
Disadvantages
–
–
–
Low extensibility
As is, not interoperable beyond the library world
Weak on administrative, rights, and other kinds of
metadata important for digital resources
MARC

Future of MARC
–



Must MARC die? No. New life through XML
MARC XML from the Library of Congress (LC)
MODS: a version of MARC encoded in XML,
developed by the Library of Congress
Crosswalks between MARC and many other
metadata schemas already exist
MARC XML


LC has developed a MARC XML schema,
stylesheets, and tools
The schema allows representation of a
complete MARC record in XML
–

Lossless conversion
Will support new transformations to new uses
of MARC data
–
MARC to MARCXML to Dublin Core and MODS
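To give a feel for what MARC data looks like once it is in XML, here is a hedged sketch that reads two fields from a heavily trimmed MARCXML record using Python's standard library. The namespace is the one used by the LC MARCXML schema; the sample record content and the `subfields` helper are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative record: a real MARCXML record carries a leader,
# control fields, and many more data fields.
MARCXML = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Berners-Lee, Tim.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Weaving the Web</subfield>
  </datafield>
</record>
"""

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

def subfields(record, tag, code):
    """Return the text of every subfield `code` in every field `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [sf.text for sf in record.findall(path, NS)]

record = ET.fromstring(MARCXML)
title = subfields(record, "245", "a")
creator = subfields(record, "100", "a")
print(title, creator)
```

The numeric tags and subfield codes survive as attributes, which is why the conversion can be lossless and why further transformations to Dublin Core or MODS become ordinary XML processing.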
Metadata Object Description
Schema (MODS)





Set of 20 bibliographic elements - a subset of
the MARC 21 Format for Bibliographic Data
Not as complete as the full MARC format, but
richer than Dublin Core (for example)
Highly interoperable with existing MARC records
Uses language-based tags, rather than numbers
like MARC 21 (245, 650, etc.)
Under development by the LC Network
Development and MARC Standards Office
MODS

XML-based
–



Intended to work with/complement other metadata
formats
Can be used for conversion of existing MARC
records or to create new resource description
records
Useful particularly for library applications that
want to go beyond the OPAC
Shares features of MARC and Dublin Core
MODS Elements










TitleInfo
Name
TypeOfResource
Genre
PublicationInfo
Language
PhysicalDescription
Abstract
TableOfContents
TargetAudience










Note
Cartographics
Subject
Classification
RelatedItem
Identifier
Location
AccessCondition
Extension
RecordInfo
MODS Elements




Title element is mandatory, all others are
optional
Elements can have subelements and attributes
which provide refining detail for the element
Elements and sub-elements are repeatable,
except in certain cases
Elements display in any order
MODS Example
MODS Implementation

MODS User Guidelines
–


http://www.loc.gov/standards/mods/registry.html
MODS Implementation Registry
Contains descriptions of MODS projects
planned, in progress, and fully implemented
–
http://www.loc.gov/standards/mods/registry.html
Dublin Core (DC)



A method of describing resources intended to
facilitate the discovery of electronic resources
Designed to allow simple description of
resources by non-catalogers as well as
specialists
National and International standard
–
–

ANSI/NISO standard Z39.85-2001
ISO standard 15836
Includes 15 “core” elements
Dublin Core Elements








Title
Creator
Subject
Description
Publisher
Contributor
Date
Type







Format
Identifier
Source
Language
Relation
Coverage
Rights
Dublin Core







All elements optional and repeatable
Elements display in any order
Authority control not required
Simple and Qualified DC
Extensible
Flexible
International
Dublin Core

Simple
–
–
–

Lowest common denominator
Less rich
Discovery role – leads to resource or more complete
description of resource
Qualified
–
–
More precise
Less interoperable
Dublin Core Examples



Generic
Title="The sound of music"
HTML
<meta name="DC.Title" content="The sound of music">
XML
<?xml version="1.0"?>
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>The Sound of Music</dc:title>
</metadata>
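The HTML and XML forms above are two serializations of the same element/value pair. A small sketch, assuming Python's standard library, that produces both from one dictionary; the function names and the plain `metadata` wrapper element follow the slide's example rather than any required layout:

```python
from html import escape
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dc_meta_tags(fields: dict) -> list:
    """Render each Dublin Core element as an HTML <meta> tag."""
    return [
        f'<meta name="DC.{name.capitalize()}" '
        f'content="{escape(value, quote=True)}">'
        for name, value in fields.items()
    ]

def dc_xml(fields: dict) -> str:
    """Render the same elements as namespaced XML."""
    root = ET.Element("metadata")
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

fields = {"title": "The Sound of Music"}
tags = dc_meta_tags(fields)
xml_text = dc_xml(fields)
print(tags[0])
print(xml_text)
```

Since every Dublin Core element is optional and repeatable, a fuller version would accept a list of values per element.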
Dublin Core Examples - HTML
Dublin Core Examples - XML
DC Record in OCLC Connexion
Other Metadata Standards







Encoded Archival Description (EAD)
Text Encoding Initiative (TEI)
Visual Resources Association (VRA)
Global Information Locator Service (GILS)
Online Information Exchange (ONIX)
Content Standard for Digital Geospatial
Metadata (CSDGM) aka FGDC
Data Documentation Initiative (DDI)
ONline Information eXchange
(ONIX)



Developed and maintained by EDItEUR jointly
with Book Industry Communication and the
Book Industry Study Group
ONIX is the international standard for
representing and communicating book industry
product information in electronic form
XML-based
ONIX


Highly focused on e-commerce of books
ONIX was developed as a solution to two perceived problems
– (1) The need for richer book data online to improve sales
– (2) The widely varying format requirements of the major book wholesalers and retailers (an interoperability problem)
May appear in future library applications
CSDGM / FGDC



Primary standard for geospatial metadata
All federal agencies are required to produce
and collect geospatial data in this format
Allows for very detailed description
–


334 different metadata elements
Tremendous potential uses
Challenge is to establish interoperability with
other metadata standards
Metadata for Images in XML - MIX




An XML-based set of technical data elements
required to manage digital image collections
Encodes information such as image source,
compression scheme, & image editing software
Currently being developed by LC and the NISO
Technical Metadata for Digital Still Images
Standards Committee
Draft 0.2 available for review and comment
–
http://www.loc.gov/standards/mix/
Data Documentation Initiative (DDI)



International, XML-based standard for the
content, presentation, transport, and
preservation of documentation for datasets in
the social and behavioral sciences
Creating appropriate metadata will enable
effective, efficient, and accurate use of the
datasets
http://www.icpsr.umich.edu/DDI/codebook/
Crosswalks

Crosswalks map an element from one scheme
to its closest equivalent in another scheme
–


Example: MARC 1XX field is mapped to DC ‘creator’
Instrumental for converting data in one format
to another format - one that is potentially more
widely accessible
Support the demand for cross-domain
searching and interoperability
Crosswalks
There is rarely a one-to-one correlation between elements of different schemes
– One to many - DC to MARC
– Many to one or none - MARC to DC
– None to one or many
MARC to DC
– http://www.loc.gov/marc/marc2dc.html#unqualif
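In code, a crosswalk is essentially a lookup table plus a policy for fields that have no equivalent. The sketch below is illustrative only: the 1XX to creator rows come from the example in these slides, the 245 and 650 rows are assumptions added for the demonstration, and none of it should be read as the LC crosswalk itself. The sample record values are invented:

```python
# Illustrative mapping table, not the published LC crosswalk.
CROSSWALK = {
    "100": "creator",  # from the slides: MARC 1XX -> DC creator
    "110": "creator",
    "111": "creator",
    "245": "title",    # assumption for illustration
    "650": "subject",  # assumption for illustration
}

def marc_to_dc(marc_fields):
    """Map (tag, value) pairs to DC elements; report tags with no mapping."""
    dc = {}
    dropped = []
    for tag, value in marc_fields:
        element = CROSSWALK.get(tag)
        if element is None:
            dropped.append(tag)  # the "many to none" case: detail is lost
        else:
            # Several MARC tags may feed one repeatable DC element.
            dc.setdefault(element, []).append(value)
    return dc, dropped

marc = [
    ("100", "Berners-Lee, Tim"),
    ("245", "Weaving the Web"),
    ("650", "World Wide Web"),
    ("300", "1 volume"),
]
dc, dropped = marc_to_dc(marc)
print(dc)
print(dropped)
```

The `dropped` list makes the loss of specificity visible, which is the usual cost of mapping a rich scheme onto a simple one.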
Content Standards

AACR (Anglo-American Cataloguing Rules)
–
–
–
“The rules cover the description of, and the
provision of access points for, all library materials
commonly collected at the present time.”
The current text is the 2nd ed, 2002 Revision (with
2003, 2004, and 2005 updates)
The Joint Steering Committee for Revision of AACR
(JSC) is working on a new code, “RDA: Resource
Description and Access” scheduled to be published
in 2008
Content Standards

International Standard Bibliographic
Description (ISBD)
–
–
–
A family of standards to regularize the form and
content of bibliographic descriptions
Available for different material types: monographs,
computer files, etc.
Designed to promote record sharing and exchange
Content Standards

Book Industry Standards And Communications (BISAC)
– Metadata Committee has the responsibility for the continued development and maintenance of ONIX for Books in North America
– Developed the Metadata Best Practices document
– Intended as a response to the question, “I’ve downloaded the ONIX documentation. Now what?”
http://www.bisg.org/docs/Best_Practices_Document.pdf
Content Standards

Describing Archives: A Content Standard
(DACS)
–
–
Designed to facilitate consistent, appropriate, and
self-explanatory description of archival materials
and creators of archival materials
Replaces Archives, Personal Papers, and
Manuscripts (APPM)
Content Standards

Western States Dublin Core Metadata Best
Practices
–
Provide guidelines for creating metadata records for
digitized cultural heritage resources
Element set based on Dublin Core
–
http://www.cdpheritage.org/resource/metadata/wsdcmbp/
–
Content Standards

Cataloging Cultural Objects (CCO)
–
–
–
–
–
Provides guidelines for selecting, ordering, and
formatting data used to populate catalog records
Designed to promote good descriptive cataloging,
shared documentation, and enhanced end-user
access
Feb. 2005 draft available for review
A project of the Visual Resources Association
http://www.vraweb.org/ccoweb/
Content Standards

Descriptive Metadata Guidelines for RLG
Cultural Materials
–
–
–
Designed to help institutions with decision making
about metadata for online access to collections
Can be used to create or review local best practice
in describing collections of cultural objects,
regardless of the specific metadata standard used
http://www.rlg.org/en/pdfs/RLG_desc_metadata.pdf
Application Profiles


Elements from one or more metadata
standards combined to suit the needs of a
specific community
May also include usage guidelines
–

Example: Title element is required
A Library Application Profile for Dublin Core is
under development
–
Working draft is available from the DCMI web site
Authority Control Anyone?




Recommended, but not required by many
schemas
Librarians know its value
Controlled vocabularies: LCSH
Thesauri
–

Getty Art & Architecture Thesaurus; LC Thesaurus
for Graphic Materials I & II
Pre-set searches
FAST




Faceted Application of Subject Terminology
(FAST)
LCSH is by far the most commonly used and
widely accepted subject vocabulary for general
application
Need for a new approach to subject vocabulary
for electronic resources
Easy to maintain and amenable to automatic
authority control and computer manipulation
FAST


Maintains upward compatibility with LCSH, and
any valid set of LC subject headings can be
converted to FAST headings
Retains the advantages of a controlled
vocabulary
–
–
Most LCSH headings are synthesized by catalogers
based on rules
For FAST, all headings (except chronological) are
established and only established headings can be
assigned
LCSH
650 American loyalists $z England.
651 United States $x History $y Revolution, 1775-1783 $v Biography.
650 Secret service $z Great Britain.
650 Painters $z United States.
FAST
Faceting of LCSH
648 1775 - 1783
650 American loyalists
650 Revolution (United States, 1775-1783)
650 Secret service
650 Painters
651 England
651 United States
651 Great Britain
655 Biography
655 History
Authority Control: FAST vs. LCSH
LCSH: Many headings are established; most assigned headings are synthesized by catalogers based on rules
FAST: All headings (except chronological) are established

LCSH: Very large number (billions plus) of possible headings
FAST: Faceting limits the number of possible headings to a few million

LCSH: Most headings are distinct (based on NACO normalization rules*); some conflicts occur, particularly with $x & $v
FAST: All headings are distinct; tagging and subfield coding provides no unique information

*http://www.loc.gov/catdir/pcc/naco/normrule.html
Metadata Encoding & Transmission
Standard (METS)



A system for packaging metadata necessary
for both the management of digital library
objects within a repository and the exchange of
such objects between repositories, or between
repositories and their users
Used for: Digital collection repositories
Developed by the Digital Library Federation
(DLF) and Library of Congress (LC)
Metadata Encoding & Transmission
Standard (METS)


METS can be understood as a binder that
unites metadata about a particular resource
A METS record includes six parts:
–
–
–
–
–
–
Header
Descriptive metadata
Administrative metadata
File groups
Structural map
Behavior section
Object Components
(21 Files and counting…)
[Diagram: a whole document and its pages 1-4, held as GIF, JPG, TIFF, and PDF files at 100, 800, 1400, and 2000 pixels, plus TEI, MrSid, and AIFF files]
METS Schema
METS
– metsHdr (METS Header)
– dmdSec (Descriptive Metadata)
– amdSec (Administrative Metadata)
– fileSec (Files)
– structMap (Structure)
– behaviorSec (Viewers)
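The "binder" idea can be sketched as an empty wrapper with one slot per section. This is a rough illustration in Python, assuming the standard METS namespace; the skeleton is deliberately empty and would not validate against the METS schema, which requires content and attributes inside several of these sections:

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# The six parts named on the slides, in document order.
SECTIONS = ["metsHdr", "dmdSec", "amdSec", "fileSec", "structMap", "behaviorSec"]

def mets_skeleton() -> ET.Element:
    """Create a <mets> wrapper with one empty element per section."""
    root = ET.Element(f"{{{METS_NS}}}mets")
    for name in SECTIONS:
        ET.SubElement(root, f"{{{METS_NS}}}{name}")
    return root

skeleton = mets_skeleton()
names = [child.tag.split("}")[1] for child in skeleton]
print(ET.tostring(skeleton, encoding="unicode"))
```

In practice the descriptive section would hold or point to a record in another scheme (Dublin Core or MODS, for example), and the file section would list files such as the page images.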
Open Archives Initiative (OAI)



A tool that supports interoperability among
multiple databases
OAI goal: coarse-granularity resource
discovery
OAI handles simple discovery from multiple
community-specific repositories with metadata
crosswalked to unqualified Dublin Core
OAI


Roots are in the science community interested
in locating and searching multiple repositories
of pre- and e-prints of scientific papers
Not really an archive, the way we traditionally
think of the word
OAI


Data providers expose (make available) the
metadata for their collections
Service providers harvest the exposed
metadata and aggregate it (so that one search
does it all) and/or provide additional services
related to the harvested metadata, such as
providing easy access to recent additions,
updated materials, pre-set searches, etc.
OAI

OAI Protocol for Metadata Harvesting
–
–
–
–
Metadata content must be encoded in XML and
have a corresponding XML schema for validation
Metadata must be supplied in unqualified Dublin
Core format, at least
Other metadata formats are optional
Metadata may optionally include a link to the actual
content / resource
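A harvest under these rules boils down to an HTTP request with a verb and a metadata format, followed by XML parsing of the reply. The sketch below assumes Python's standard library; the repository base URL and the sample response are hypothetical, and the response is trimmed (a real one also carries elements such as a response date and the echoed request):

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "http://repository.example.org/oai"  # hypothetical data provider

def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request; oai_dc is unqualified Dublin Core."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Trimmed, made-up response used instead of a live network call.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>The Sound of Music</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

def harvested_titles(response_xml: str) -> list:
    """Collect every dc:title found inside harvested records."""
    root = ET.fromstring(response_xml)
    return [t.text for t in root.iterfind(".//oai:record//dc:title", NS)]

url = list_records_url(BASE_URL)
titles = harvested_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```

A service provider would repeat this against many repositories and merge the results into one searchable index.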
OAI Infrastructure
[Diagram: repositories, DC records, harvester, service provider]
OAI Infrastructure
[Diagram: user, search, repository]
OAI Infrastructure
[Diagram: user, search, repository, repository]
OAI Harvesters - Examples

Registered OAI Service Providers
–

http://www.openarchives.org/service/listproviders.html
OAIster
–
http://oaister.umdl.umich.edu/o/oaister/
OAI - Advantages




Data providers – more exposure of, and
therefore, ideally, more access to one’s data
Overcome the geographical and domain-specific isolation that can occur
Service providers – more data in one place is
of value to users
Service providers may offer additional services
beyond increased access: prints, rights
negotiation, etc.
Simple Object Access Protocol
(SOAP)


A protocol that defines how to request services,
objects, and information in a platform-independent manner using HTTP and XML
The main goal of SOAP is to facilitate
interoperability between systems that need to
interact
–

Can run applications as if local user
Used for: Web services & e-commerce
Z39.50




Z39.50 is a search and retrieval protocol,
maintained by LC, capable of operating over
TCP/IP
Negotiates queries with multiple, separate
databases – does not harvest + create new db
Built in to some library software systems
OAI not intended to replace other approaches,
but to provide an easy-to-use alternative for
different constituencies and purposes
Search/Retrieve Web Service



The primary function of SRW is to allow a user
to search remote databases of records
Protocol uses easily available technologies (XML, SOAP, HTTP, URI) to perform tasks
traditionally done using proprietary solutions
such as database queries and responses
Builds on Z39.50 and moves it forward
–
ZING: Z39.50 International: Next Generation
Functional Requirements for
Bibliographic Records (FRBR)

A study by IFLA (International Federation of
Library Associations) of the full range of
functions performed by the bibliographic record
–
What do we use bibliographic records for?



Description, access, location, identification, annotations ...
The report provides a framework for the nature
of and uses for bibliographic records
A conceptual model that can be used as a
means to meet user needs and expectations
Functional Requirements for
Bibliographic Records (FRBR)

Tasks we use bibliographic records for:
–
–
–
–

Finding
Identifying
Selecting
Obtaining access to resources
FRBR should allow systems to handle
bibliographic data in new, useful ways that fulfill
these tasks
Functional Requirements for
Bibliographic Records (FRBR)


Conceptual model of relationships between bibliographic entities
Hierarchical relationships
– Work
    – The intellectual product
– Expression
    – An ‘expression’ of the parent work such as a translation, edition, revisions, annotated text, etc.
    – Expressions entail additional intellectual effort
Functional Requirements for
Bibliographic Records (FRBR)

Hierarchical relationships
– Manifestation
    – Published runs of each expression in multiple formats over time
    – The level at which we traditionally create a catalog record
– Item
    – Each copy of a specific manifestation
    – Circulation records track items
Functional Requirements for
Bibliographic Records (FRBR)

OCLC is researching the application of FRBR
to WorldCat
–

“FRBRization”
They have created an algorithm that groups
records automatically based on the
Work/Expression/Manifestation/Item model
http://www.oclc.org/research/projects/frbr/algorithm.htm
OCLC & FRBR


OCLC Research has developed an algorithm to build FRBR “work” sets using author/title keys
Fiction Finder Project: research team mined record content from all records for fiction materials in WorldCat and applied the FRBR algorithm to yield
– An enriched record view for every work of fiction represented in WorldCat
– Better search results displays for WorldCat fiction records, including links to groups of related WorldCat records by language, format, manifestation/edition, etc.
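The idea of building work sets from author/title keys can be sketched in a few lines. This is a simplified illustration, not OCLC's actual algorithm: the normalization rule (lowercase, drop punctuation) and the record layout are assumptions, and the two sample records reuse the Madame Bovary ISBNs that appear later in these slides:

```python
import re
from collections import defaultdict

def work_key(author: str, title: str) -> tuple:
    """Build a crude match key: lowercase, punctuation removed."""
    def norm(text: str) -> str:
        return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()
    return (norm(author), norm(title))

def group_into_works(records: list) -> dict:
    """Cluster manifestation-level records that share an author/title key."""
    works = defaultdict(list)
    for rec in records:
        works[work_key(rec["author"], rec["title"])].append(rec)
    return dict(works)

records = [
    {"author": "Flaubert, Gustave", "title": "Madame Bovary",
     "publisher": "Gallimard", "isbn": "207041311X"},
    {"author": "Flaubert, Gustave.", "title": "Madame Bovary.",
     "publisher": "Penguin", "isbn": "0140448187"},
]
works = group_into_works(records)
for key, members in works.items():
    print(key, [m["isbn"] for m in members])
```

A production system would also need to handle translated titles, variant forms of names, and works with no author, which is where authority data comes in.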
xISBN



A web service that takes as input an ISBN and
returns a list of other ISBNs of associated
intellectual works
Developed by OCLC’s Office of Research
Results intended for use by computer systems
to generate new searches such as in OPAC
RLG’s RedLightGreen


Search interface for the RLG union catalog of 126
million bibliographic records representing 42 million
titles
FRBR-esque implementation
–


Uses FRBR concepts such as Work, Expression and
Manifestation for record clusters
Designed for the web-savvy undergraduate
Offers filtering and grouping of search results
–
http://www.redlightgreen.com
Identifiers
Four potential purposes
– Locator
    – Where is the document I seek?
– Identifier
    – Unique label for a resource
– Gatherers
    – Groups like resources similar to a uniform title
– Differentiator
    – Helps identify different versions of same resource
Identifiers

Uniform Resource Identifiers (URI)
–
Generic set of all names/addresses that refer to
resources on the Web including:






Uniform Resource Locator (URL)
Persistent Uniform Resource Locator (PURL)
Uniform Resource Name (URN)
OpenURL
DOI
ISTC
Uniform Resource Locator (URL)



Web address or location at which a resource is
held, not an identifier for the resource itself
Most common way to locate documents / items
on the Web (http, ftp, mailto, etc.)
Not particularly stable or permanent
–

Error 404: File not Found
No metadata, but important starting point as
we look at some of the related technologies
Persistent Uniform Resource
Locator (PURL)




PURL Service is managed by OCLC
Functionally, a PURL is a URL
The PURL remains constant even if the URL
changes - its function is to automatically redirect a user to the current URL
PURL system/resolver is updated by resource
manager to reflect any changes to location of
the file, or URL
PURLs



PURLs can be used both in documents and in
cataloging systems
PURLs increase the probability of correct
resolution and long-term access to resources
Use of PURLs can reduce the burden and
expense of catalog maintenance (and business
card printing)
PURL - Example

US Government is a big user of PURLs
– http://www.ccny.cuny.edu/library/Divisions/Government/iraqbib.html
OpenURL



OpenURL = context-sensitive linking
OpenURL is a method of transporting metadata
and identifiers within URLs to allow for the
delivery of context-sensitive services
For example, a URL can carry with it
information such as author / title from a
previous search to allow a system to re-execute a search in a second database without
re-entry of the data by the user
OpenURL Metadata
OpenURL Example



OpenURL incorporates data from a citation
search
Embeds metadata such as ISSN, date, volume
number, pages, etc. in an OpenURL
A valid OpenURL incorporating the metadata:
http://sfx.library.yale.edu/sfx_local?sid=Entrez:PubMed&id=pmid:16135848
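Taking that OpenURL apart shows the two pieces involved: the address of the link resolver and the metadata it is handed. A minimal sketch using Python's standard library; the `openurl_parts` helper is an illustrative name, and the code only parses the string without contacting the resolver:

```python
from urllib.parse import urlsplit, parse_qs

OPENURL = (
    "http://sfx.library.yale.edu/sfx_local"
    "?sid=Entrez:PubMed&id=pmid:16135848"
)

def openurl_parts(url: str) -> tuple:
    """Split an OpenURL into the resolver address and its key/value metadata."""
    parts = urlsplit(url)
    # parse_qs returns a list per key; keep the first value of each.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    resolver = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return resolver, params

resolver, params = openurl_parts(OPENURL)
print(resolver)
print(params)
```

Here `sid` names the source of the link and `id` carries an identifier for the cited item; a fuller OpenURL could also carry ISSN, volume, pages, and so on as further parameters.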
Uniform Resource Name (URN)




Uniform Resource Names (URNs) are intended to
serve as persistent, location-independent resource
identifiers
Globally unique
Never change
Format
– urn:<namespace identifier>:<namespace specific string>
Use a resolver system to indicate current location of resource
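The three-part format is simple enough to split mechanically. The sketch below, in plain Python, assumes nothing beyond the format shown above; the `parse_urn` name is illustrative and the sample uses the `isbn` URN namespace with an ISBN that appears later in these slides:

```python
def parse_urn(urn: str) -> tuple:
    """Split 'urn:<namespace identifier>:<namespace specific string>'."""
    scheme, sep, rest = urn.partition(":")
    if scheme.lower() != "urn" or not sep:
        raise ValueError(f"not a URN: {urn!r}")
    nid, sep, nss = rest.partition(":")
    if not nid or not sep or not nss:
        raise ValueError(f"URN is missing a part: {urn!r}")
    # The namespace identifier is case-insensitive; the specific string
    # is left untouched because its rules depend on the namespace.
    return nid.lower(), nss

nid, nss = parse_urn("urn:isbn:0140448187")
print(nid, nss)
```

Note that parsing gives a name only; finding the resource still requires a resolver that knows the namespace.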
Digital Object Identifier (DOI)




Overseen by the International DOI Foundation
DOIs are persistent, location-independent
identifiers of resources
Developed to enable management of
copyrightable materials in an electronic
environment (locate, buy, sell, track, license)
Specific type / implementation of a URN
DOI

A two-part number with a prefix identifying the
original publisher and a suffix identifying the
specific work
–

Similar to the ISBN
A DOI resolution request for a specific resource
would return one or more URLs - *locations*
where a user could obtain access to the
resource
–
Appropriate copy: online, text, free, illustrated, etc.
DOI



Applications of the DOI will require metadata
The basis of the DOI metadata scheme is a
minimal "kernel" of elements
DOI minimal kernel elements of metadata:
–
DOI, DOI genre, identifier, title, type, origination,
primary agent, agent role, and administrative data
such as registrant, and date of registration
International Standard Text Work
Codes (ISTC)




Type of URN
Persistent and unique identifiers for textual
works – abstract, conceptual entities rather
than specific bibliographic manifestations
International Standard Codes are also being
developed for Audiovisual Works (ISAN) and
Musical Works (ISWC)
Emerging ISO standard
ISTC




ISTC Registration Authority will be managed by
a consortium comprised of CISAC, Nielsen
BookData, and R.R. Bowker Inc.
ISTCs will be assigned by the Registration
Authority and Regional Agencies
ISTCs can and will be assigned to works
retrospectively
Each registered work must include basic
metadata such as author, title, subject (ONIX)
ISTC

Similar to ISBN, but focused on the work versus the manifestation
– Madame Bovary, Chez Gallimard, 2001
    – 207041311X
– Madame Bovary, Penguin, 2001
    – 0140448187
– Two ISBNs, one single ISTC for the work, Madame Bovary
ISTC


The ISTC will allow computer systems to bring
together all manifestations of an intellectual
work
What’s the point?
–
As multiple versions of books, documents, articles
proliferate, systems need a way to control
presentation and access to users who generally
don’t care about the difference between the Penguin
2001 edition and the Signet Classic 2001 edition
Semantic Web




The mother of all metadata projects, under
development by the W3C
An extension of the current Web in which
information is given well-defined meaning,
understandable to people and computers
This in turn, provides better integration of
existing information on the Web
Key components: URIs, XML, RDF
Summary



Planning and goal setting are two important
factors for successful metadata implementation
Stick with open standards (non-proprietary),
where possible
Keep an eye on XML, DC, OAI, METS - but
don’t quote me
Questions?
Amy Benson
Program Director
NELINET Digital Services
NELINET, Inc.
[email protected]
508.597.1937
800.635.4638 x1937