Metadata and electronic information Michael Day UKOLN: The UK Office for Library and Information Networking, University of Bath http://www.ukoln.ac.uk/ [email protected].

Metadata and electronic
information
Michael Day
UKOLN: The UK Office for Library and Information
Networking, University of Bath
http://www.ukoln.ac.uk/
[email protected]
Final CIRCE Workshop, The Council House,
Birmingham, 15 January 1999.
Presentation Outline
• Metadata - some definitions
• Metadata formats
• The resource discovery context
– Dublin Core
– Resource Description Framework (RDF)
• Interoperability
• Other metadata applications
Metadata: definitions (1)
Metadata = “data about data”
“… the Internet-age term for structured data
about data” - Joint NSF-EU Working Group
on Metadata (1998)
“… structured data about data that imposes
order on a disordered information universe”
- Carl Lagoze (Cornell University)
Metadata: definitions (2)
“… machine understandable information
about web resources or other things” - Tim
Berners-Lee (World Wide Web Consortium)
Roles:
• Provides information about resources
• Supports operations carried out on
information objects
Metadata formats
Diversity of metadata formats and
frameworks, e.g.:
• Dublin Core
• EAD, CIMI, TEI
• PICS, RDF
• MARC
• GILS, FGDC
• ROADS
http://www.ukoln.ac.uk/metadata/glossary/
Some examples (1)
USMARC:
245 00 Worldnews online $h [computer file].
246 3 World news online
256 Computer online service.
260 Washington, D.C. : $b Worldnews Online, $c [1995
538 Mode of access: Internet.
500 Title from title frame.
520 “WorldNews OnLine is a service …”
650 0 Newspapers $x Databases.
856 7 $u http://worldnews.net $2 http
Extract from: Nancy B. Olson, ed., Cataloguing Internet resources: a manual and
practical guide, 2nd ed. Dublin, Ohio: OCLC Online Computer Library Center, 1997.
http://www.purl.org/oclc/cataloging-internet
Some examples (2)
TEI header:
<teiHeader type="aacr2"><fileDesc><titleStmt>
<title type="245">Rubaiyat of Omar Khayyam : the astronomer
poet of Persia / rendered into English verse by Edward
Fitzgerald ; with drawings by Florence Lundborg</title>
<title type="gmd">[electronic resource]</title>
<author>Omar Khayyam</author> [...]
<respStmt>
<resp>Creation of machine-readable version:</resp>
<name>Stephen Ramsay, Electronic Text Center</name>
<resp>Conversion to TEI.2-conformant markup:</resp>
<name>University of Virginia Library Electronic Text Center
</name>
</respStmt> [...]
From: University of Virginia Library, Cataloging Services Department, Cataloging
Procedures Manual, Chapter XII. Charlottesville, Va.: University of Virginia Library,
1996-98.
http://www.lib.virginia.edu/cataloging/manual/chapters/chapxiib.html
Some examples (3)
IAFA template:
Template-Type: SERVICE
Handle: 871473886-23884
Title: Wellcome Unit for the History of Medicine
URI-v1: http://units.ox.ac.uk/cgi-bin/safeperl/wuhminfo/p?home.html
Admin-Email-v1: [email protected]
Publisher-Name-v1: Wellcome Unit for the History of Medicine
Publisher-Postal-v1: 45-47 Banbury Road, Oxford, OX2 6PE
Publisher-City-v1: Oxford
Description: The home page of the Wellcome Unit for the History of
Medicine, a sub-department of the Modern History Faculty of the University
of Oxford, this site provides information on the Unit, seminars,
conferences and workshops, research interests, staff, current projects, and
the graduate programmes.
Keywords: History of Medicine; Medicine
Language-v1: English
Subject-Descriptor-v1: WZ40 History of Medicine
Subject-Descriptor-Scheme-v1: NLM
Record-Last-Modified-Date: Fri, 10 Oct 1997 19:09:16 +0000
Record-Last-Modified-Email: [email protected]
Record-Created-Date: Fri, 10 Oct 1997 19:09:16 +0000
Record-Created-Email: [email protected]
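A template of this kind is a flat list of "Attribute: value" lines, so it can be read with very little code. The following is a minimal sketch in Python, not the ROADS software's own parser. The function name parse_template and the rule that a line without a leading attribute name continues the previous value are assumptions made for illustration, and the sample is an abridged copy of the record above.

```python
import re

# An attribute line looks like "Title: ..." or "URI-v1: ...".
ATTRIBUTE_LINE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.*)$")


def parse_template(text):
    """Parse 'Attribute: value' lines into a dict.

    Assumption: a line that does not start with an attribute name is a
    continuation of the previous value (e.g. a wrapped Description).
    A continuation line that itself starts with "word:" would be
    misread as a new attribute; this sketch does not handle that case.
    """
    record = {}
    last_key = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = ATTRIBUTE_LINE.match(line)
        if match:
            last_key = match.group(1)
            record[last_key] = match.group(2)
        elif last_key is not None:
            record[last_key] = (record[last_key] + " " + line).strip()
    return record


# Abridged copy of the template shown above.
SAMPLE = """\
Template-Type: SERVICE
Handle: 871473886-23884
Title: Wellcome Unit for the History of Medicine
Description: The home page of the Wellcome Unit for the History of
Medicine, a sub-department of the Modern History Faculty of the University
of Oxford
Keywords: History of Medicine; Medicine
Subject-Descriptor-v1: WZ40 History of Medicine
Subject-Descriptor-Scheme-v1: NLM
"""

record = parse_template(SAMPLE)
# Keywords are separated by semicolons in the sample record.
keywords = [k.strip() for k in record["Keywords"].split(";")]
print(record["Template-Type"], "|", record["Title"])
print(keywords)
```

Keeping the scheme (NLM) in its own attribute next to the descriptor is what lets a gateway interpret "WZ40" as a classification code and not as free text.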
A metadata typology
Simple to rich:
• Band One (full text indexes)
– Proprietary formats
• Band Two (simple structured generic formats)
– Proprietary formats
– Dublin Core
– ROADS
– IAFA/Whois++ templates
• Band Three (more complex structure, domain specific; part of larger semantic framework)
– FGDC
– MARC
– TEI headers
– ICPSR
– EAD
– CIMI
Adapted from: Lorcan Dempsey and Rachel Heery, “Metadata: a current view of
practice and issues”, Journal of Documentation, vol. 54, no. 2, March 1998,
pp. 145-172.
Resource discovery
Approaches to Internet resource discovery:
• Robot-based global indexes, e.g. Alta
Vista, Lycos, etc.
• Subject gateways - e.g. ROADS-based
services
• Library catalogues, e.g. using
USMARC 856 field - InterCat project
(OCLC), BIBLINK
• Need for “core” metadata for simple
resource discovery and interoperability
- Dublin Core initiative
Dublin Core (1)
International initiative to define a core set of
metadata elements for resource
discovery on the Internet
• Six DC workshops (to date), with a seventh forthcoming:
– DC-1 (Dublin, Ohio) - 1995
– DC-2 (Warwick) - 1996
– DC-3 (Dublin, Ohio) - 1996
– DC-4 (Canberra) - 1997
– DC-5 (Helsinki) - 1997
– DC-6 (Washington, D.C.) - 1998
– DC-7 (Frankfurt am Main) - 1999 (forthcoming)
http://purl.oclc.org/dc
Dublin Core (2)
15 Elements:
• Title
• Subject
• Description
• Creator
• Publisher
• Contributor
• Date
• Type
• Format
• Identifier
• Source
• Language
• Relation
• Coverage
• Rights
Core elements defined in RFC 2413:
http://src.doc.ic.ac.uk/computing/internet/rfc/rfc2413.txt
Dublin Core (3)
DC Qualifiers:
• TYPE - refines the meaning of
elements:
– Relation TYPE=IsPartOf
• SCHEME - associates the value with
an externally defined ‘scheme’:
– Subject SCHEME=DDC
– Date SCHEME=ISO 8601
• LANGUAGE - indicates the language
of the value
– Title LANGUAGE=en
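One way to picture the three qualifiers is as optional attributes attached to an element value. The sketch below is a hypothetical data structure written for illustration, not part of any Dublin Core specification. The class name QualifiedElement, its field names, and the example values (the DDC number, the URL and the date) are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualifiedElement:
    """A Dublin Core element value with the three optional qualifiers."""

    element: str                     # e.g. "Subject"
    value: str
    type_: Optional[str] = None      # TYPE: refines the element's meaning
    scheme: Optional[str] = None     # SCHEME: externally defined scheme
    language: Optional[str] = None   # LANGUAGE: language of the value

    def describe(self) -> str:
        """Render in the 'Element QUALIFIER=value' notation used above."""
        parts = [self.element]
        if self.type_:
            parts.append(f"TYPE={self.type_}")
        if self.scheme:
            parts.append(f"SCHEME={self.scheme}")
        if self.language:
            parts.append(f"LANGUAGE={self.language}")
        return " ".join(parts) + ": " + self.value


# Example values are invented for illustration only.
examples = [
    QualifiedElement("Relation", "http://www.ukoln.ac.uk/", type_="IsPartOf"),
    QualifiedElement("Subject", "025.3", scheme="DDC"),
    QualifiedElement("Date", "1999-01-15", scheme="ISO 8601"),
    QualifiedElement("Title", "Metadata and electronic information", language="en"),
]

for item in examples:
    print(item.describe())
```

An application that ignores the qualifiers still has a usable element and value, while one that understands SCHEME=DDC can treat the value as a classification number.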
Dublin Core (4)
Syntax issues:
• Simple DC can be embedded into
HTML Web pages
– Limited functionality
• Web moving to Extensible Markup
Language (XML)
• Resource Description Framework
– RDF … described as “an architecture for
metadata on the Web”
RDF
Resource Description Framework
• World Wide Web Consortium (W3C)
• Data model and XML-based syntax
• An implementation of the conceptual
‘Warwick Framework’
• Modular interoperability
• Useful for aggregating the different
metadata types required for managing
digital information over time
http://www.w3.org/RDF/
DC in HTML
Example of DC embedded in HTML:
<HTML>
<HEAD>
<TITLE>UKOLN Home Page</TITLE>
<META NAME="DC.Title" CONTENT="UKOLN: UK Office for Library and
Information Networking">
<META NAME="DC.Subject" CONTENT="national centre, network
information support, library community, awareness, research,
information services, public library networking, bibliographic
management, distributed library systems, metadata, resource
discovery, conferences, lectures, workshops">
<META NAME="DC.Description" CONTENT="UKOLN is a national centre
for support in network information management in the library and
information communities. It provides awareness, research and
information services">
<META NAME="DC.Creator" CONTENT="UKOLN Information Services
Group">
</HEAD>
<BODY> [...]
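META tags in this style are regular enough to generate from a simple record. The sketch below is one possible approach, not a tool described in this presentation. The function name dc_meta_tags and the dictionary layout are assumptions, and the values are taken from the example above.

```python
from html import escape


def dc_meta_tags(record):
    """Return one <META> line per Dublin Core value.

    `record` maps element names (e.g. "Title") to a string or a list
    of strings. Values are escaped so that quotes and ampersands
    cannot break the CONTENT attribute.
    """
    lines = []
    for element, values in record.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            lines.append(
                '<META NAME="DC.%s" CONTENT="%s">'
                % (escape(element, quote=True), escape(value, quote=True))
            )
    return "\n".join(lines)


# Values copied from the embedded example above.
record = {
    "Title": "UKOLN: UK Office for Library and Information Networking",
    "Creator": "UKOLN Information Services Group",
}

print(dc_meta_tags(record))
```

Escaping matters here because an unescaped double quote inside a value would end the CONTENT attribute early, which is the same class of problem as a mismatched quotation mark in hand-written tags.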
DC in XML-RDF
<rdf:RDF
xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
xmlns:dc="http://purl.org/dc/elements/1.0/">
<rdf:Description about="http://www.ukoln.ac.uk/metadata/"
dc:Title="UKOLN metadata homepage"
dc:Subject="metadata; BIBLINK; DESIRE; NewsAgent; ROADS;
PRIDE; Cedars; Dublin Core; DC; Z39.50; WHOIS++"
dc:Publisher="UKOLN, University of Bath"
dc:Type="Text"
dc:Format="text/html - 4847 bytes" >
<dc:Creator>
<rdf:Bag rdf:_1="Michael Day"
rdf:_2="Andy Powell" />
</dc:Creator>
<dc:Identifier>
<rdf:Bag rdf:_1="http://purl.org/net/ukoln/metadata"
rdf:_2="http://purl.eu.org/net/ukoln/metadata" />
</dc:Identifier>
</rdf:Description>
</rdf:RDF>
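Because this serialisation is ordinary XML with namespaces, a generic XML parser can recover the Dublin Core values without any RDF-specific software. The sketch below uses Python's standard xml.etree.ElementTree and an abridged copy of the record above. The function name dc_properties is an assumption, and the code handles only the two patterns used in this example (properties written as attributes, and rdf:Bag members written as rdf:_n attributes); it is not a general RDF parser.

```python
import xml.etree.ElementTree as ET

# Namespace URIs exactly as declared in the example above.
RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"
DC_NS = "http://purl.org/dc/elements/1.0/"

# Abridged copy of the record shown above.
SAMPLE = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
  xmlns:dc="http://purl.org/dc/elements/1.0/">
  <rdf:Description about="http://www.ukoln.ac.uk/metadata/"
    dc:Title="UKOLN metadata homepage"
    dc:Publisher="UKOLN, University of Bath">
    <dc:Creator>
      <rdf:Bag rdf:_1="Michael Day"
               rdf:_2="Andy Powell" />
    </dc:Creator>
  </rdf:Description>
</rdf:RDF>"""


def dc_properties(rdf_xml):
    """Collect DC values from each rdf:Description as {element: [values]}."""
    dc_prefix = "{%s}" % DC_NS
    rdf_prefix = "{%s}" % RDF_NS
    member_prefix = rdf_prefix + "_"
    result = {}
    root = ET.fromstring(rdf_xml)
    for description in root.findall(rdf_prefix + "Description"):
        # Pattern 1: DC properties written as XML attributes.
        for name, value in description.attrib.items():
            if name.startswith(dc_prefix):
                result.setdefault(name[len(dc_prefix):], []).append(value)
        # Pattern 2: DC child elements holding an rdf:Bag of members.
        for child in description:
            if not child.tag.startswith(dc_prefix):
                continue
            key = child.tag[len(dc_prefix):]
            for bag in child.findall(rdf_prefix + "Bag"):
                members = [
                    (int(name[len(member_prefix):]), value)
                    for name, value in bag.attrib.items()
                    if name.startswith(member_prefix)
                ]
                # Order members by their rdf:_n index.
                result.setdefault(key, []).extend(v for _, v in sorted(members))
    return result


properties = dc_properties(SAMPLE)
for element, values in properties.items():
    print(element, "=", values)
```

The parser is strict about well-formedness, so a record with a mismatched quotation mark in an attribute value would be rejected outright instead of being partly read.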
Interoperability
Problem of heterogeneous and distributed
resources
• Protocols
– Z39.50
– Whois++ cross-searching (ROADS)
• Metadata conversion
– Nordic Metadata Project
– BIBLINK
• “Layered” approaches
– Arts and Humanities Data Service
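Metadata conversion is often implemented as a crosswalk: a table that maps each element of one format to a field in another. The sketch below is illustrative only and does not reproduce the mappings used by the Nordic Metadata Project or BIBLINK. The four tag and subfield pairs follow the fields that appear in the USMARC example earlier in this presentation (245 title, 260 $b publisher, 520 summary, 856 $u location), and the names DC_TO_MARC and dc_to_marc are assumptions.

```python
# Illustrative crosswalk: Dublin Core element -> (MARC tag, subfield code).
# This is an assumed subset for demonstration, not an official mapping.
DC_TO_MARC = {
    "Title": ("245", "a"),
    "Publisher": ("260", "b"),
    "Description": ("520", "a"),
    "Identifier": ("856", "u"),
}


def dc_to_marc(dc_record):
    """Convert a flat DC record to MARC-like text lines.

    Returns (fields, unmapped): `fields` is a list of strings sorted by
    tag, and `unmapped` lists the DC elements this crosswalk cannot place.
    """
    fields = []
    unmapped = []
    for element, value in dc_record.items():
        if element in DC_TO_MARC:
            tag, code = DC_TO_MARC[element]
            fields.append(f"{tag} ${code} {value}")
        else:
            unmapped.append(element)
    return sorted(fields), unmapped


dc_record = {
    "Title": "UKOLN metadata homepage",
    "Publisher": "UKOLN, University of Bath",
    "Identifier": "http://www.ukoln.ac.uk/metadata/",
    "Rights": "(example value)",
}

fields, unmapped = dc_to_marc(dc_record)
for line in fields:
    print(line)
print("Not converted:", unmapped)
```

Reporting the unmapped elements makes visible what a conversion leaves behind, which is a practical concern whenever two formats differ in richness.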
Other applications
Metadata has potential applications in other
areas relating to the management of digital
resources:
• Digital preservation
• Electronic commerce
• Authentication
• Managing intellectual property rights
• Managing access to resources
• Content rating services
UKOLN
UKOLN is funded by the British Library Research and
Innovation Centre (BLRIC), the Joint Information Systems
Committee (JISC) of the UK Higher Education Funding
Councils, as well as by project funding from the JISC’s
Electronic Libraries (eLib) Programme and the European
Union. UKOLN also receives support from the University of
Bath, where it is based.
http://www.ukoln.ac.uk/
More information on UKOLN’s work on metadata can be
found at:
http://www.ukoln.ac.uk/metadata/