Metadata and electronic information Michael Day UKOLN: The UK Office for Library and Information Networking, University of Bath http://www.ukoln.ac.uk/ [email protected].
Final CIRCE Workshop, The Council House, Birmingham, 15 January 1999.

Presentation Outline
• Metadata - some definitions
• Metadata formats
• The resource discovery context
  – Dublin Core
  – Resource Description Framework (RDF)
• Interoperability
• Other metadata applications

Metadata: definitions (1)
Metadata = “data about data”
“… the Internet-age term for structured data about data” - Joint NSF-EU Working Group on Metadata (1998)
“… structured data about data that imposes order on a disordered information universe” - Carl Lagoze (Cornell University)

Metadata: definitions (2)
“… machine understandable information about web resources or other things” - Tim Berners-Lee (World Wide Web Consortium)
Roles:
• Provides information about resources
• Supports operations carried out on information objects

Metadata formats
Diversity of metadata formats and frameworks, e.g.:
• Dublin Core
• EAD, CIMI, TEI
• PICS, RDF
• MARC
• GILS, FGDC
• ROADS
http://www.ukoln.ac.uk/metadata/glossary/

Some examples (1)
USMARC:
245 00 Wordnews online $h [computer file].
246 3 World news online
256 Computer online service.
260 Washington, D.C. : $b Worldnews Online, $c [1995
538 Mode of access: Internet.
500 Title from title frame.
520 “WorldNews OnLine is a service …”
650 0 Newspapers $x Databases.
856 7 $u http://worldnews.net $2 http
Extract from: Nancy B. Olson, ed., Cataloguing Internet resources: a manual and practical guide, 2nd ed. Dublin, Ohio: OCLC Online Computer Library Center, 1997.
http://www.purl.org/oclc/cataloging-internet

Some examples (2)
TEI header:
<teiHeader type="aacr2">
<fileDesc>
<titleStmt>
<title type="245">Rubaiyat of Omar Khayyam : the astronomer poet of Persia / rendered into English verse by Edward Fitzgerald ; with drawings by Florence Lundborg</title>
<title type="gmd">[electronic resource]</title>
<author>Omar Khayyam</author>
[...]
<respStmt>
<resp>Creation of machine-readable version:</resp>
<name>Stephen Ramsay, Electronic Text Center</name>
<resp>Conversion to TEI.2-conformant markup:</resp>
<name>University of Virginia Library Electronic Text Center</name>
</respStmt>
[...]
From: University of Virginia Library, Cataloging Services Department, Cataloging Procedures Manual, Chapter XII. Charlottesville, Va.: University of Virginia Library, 1996-98.
http://www.lib.virginia.edu/cataloging/manual/chapters/chapxiib.html

Some examples (3)
IAFA template:
Template-Type: SERVICE
Handle: 871473886-23884
Title: Wellcome Unit for the History of Medicine
URI-v1: http://units.ox.ac.uk/cgi-bin/safeperl/wuhminfo/p?home.html
Admin-Email-v1: [email protected]
Publisher-Name-v1: Wellcome Unit for the History of Medicine
Publisher-Postal-v1: 45-47 Banbury Road, Oxford, OX2 6PE
Publisher-City-v1: Oxford
Description: The home page of the Wellcome Unit for the History of Medicine, a sub-department of the Modern History Faculty of the University of Oxford, this site provides information on the Unit, seminars, conferences and workshops, research interests, staff, current projects, and the graduate programmes.
Keywords: History of Medicine; Medicine
Language-v1: English
Subject-Descriptor-v1: WZ40 History of Medicine
Subject-Descriptor-Scheme-v1: NLM
Record-Last-Modified-Date: Fri, 10 Oct 1997 19:09:16 +0000
Record-Last-Modified-Email: [email protected]
Record-Created-Date: Fri, 10 Oct 1997 19:09:16 +0000
Record-Created-Email: [email protected]

A metadata typology

Simple                                                              Rich

Band One              Band Two                  Band Three
(full text indexes)   (simple structured        (more complex structure, domain specific;
                      generic formats)          part of larger semantic framework)

Proprietary formats   Proprietary formats       FGDC
                      Dublin Core               MARC
                      ROADS                     TEI headers
                      IAFA/Whois++ templates    ICPSR
                                                EAD
                                                CIMI

Adapted from: Lorcan Dempsey and Rachel Heery, “Metadata: a current view of practice and issues”, Journal of Documentation, vol. 54, no. 2, March 1998, pp. 145-172.

Resource discovery
Approaches to Internet resource discovery:
• Robot-based global indexes, e.g. Alta Vista, Lycos, etc.
• Subject gateways - e.g. ROADS-based services
• Library catalogues, e.g. using USMARC 856 field - InterCat project (OCLC), BIBLINK
• Need for “core” metadata for simple resource discovery and interoperability - Dublin Core initiative

Dublin Core (1)
International initiative to define a core set of metadata elements for resource discovery on the Internet
• Six DC workshops (to date):
  – DC-1 (Dublin, Ohio) - 1995
  – DC-2 (Warwick) - 1996
  – DC-3 (Dublin, Ohio) - 1996
  – DC-4 (Canberra) - 1997
  – DC-5 (Helsinki) - 1997
  – DC-6 (Washington, D.C.) - 1998
  – DC-7 (Frankfurt/AM) - 1999
http://purl.oclc.org/dc

Dublin Core (2)
15 Elements:
• Title
• Subject
• Description
• Creator
• Publisher
• Contributor
• Date
• Type
• Format
• Identifier
• Source
• Language
• Relation
• Coverage
• Rights
Core elements defined in RFC 2413:
http://src.doc.ic.ac.uk/computing/internet/rfc/rfc2413.txt

Dublin Core (3)
DC Qualifiers:
• TYPE - refines the meaning of elements:
  – Relation TYPE=IsPartOf
• SCHEME - associates the value with an externally defined ‘scheme’:
  – Subject SCHEME=DDC
  – Date SCHEME=ISO 8601
• LANGUAGE - indicates the language of the value
  – Title LANGUAGE=en

Dublin Core (4)
Syntax issues:
• Simple DC can be embedded into HTML Web pages
  – Limited functionality
• Web moving to Extensible Markup Language (XML)
• Resource Description Framework
  – RDF … described as “an architecture for metadata on the Web”

RDF
Resource Description Framework
• World Wide Web Consortium (W3C)
• Data model and XML-based syntax
• An implementation of the conceptual ‘Warwick Framework’
• Modular interoperability
• Useful for aggregating the different metadata types required for managing digital information over time
http://www.w3.org/RDF/

DC in HTML
Example of DC embedded in HTML:
<HTML>
<HEAD>
<TITLE>UKOLN Home Page</TITLE>
<META NAME="DC.Title" CONTENT="UKOLN: UK Office for Library and Information Networking">
<META NAME="DC.Subject" CONTENT="national centre, network information support, library community, awareness, research, information services, public library networking, bibliographic management, distributed library systems, metadata, resource discovery, conferences, lectures, workshops">
<META NAME="DC.Description" CONTENT="UKOLN is a national centre for support in network information management in the library and information communities. It provides awareness, research and information services">
<META NAME="DC.Creator" CONTENT="UKOLN Information Services Group">
</HEAD>
<BODY>
[...]
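
To illustrate how embedded DC of this kind might be harvested, the following is a minimal Python sketch, not part of the original presentation. It assumes that elements are carried in META tags whose NAME begins with "DC.", as in the example above. The class name DublinCoreMetaParser, the function extract_dublin_core and the shortened SAMPLE_PAGE are hypothetical names introduced here for illustration only.

```python
from html.parser import HTMLParser


class DublinCoreMetaParser(HTMLParser):
    """Collect <META NAME="DC.*" CONTENT="..."> pairs from an HTML page.

    Hypothetical helper for illustration; it is not taken from any
    UKOLN or Dublin Core software.
    """

    def __init__(self):
        super().__init__()
        # Maps an element name (e.g. "Title") to the list of values found.
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.upper().startswith("DC.") and content is not None:
            element = name[3:]  # strip the "DC." prefix
            self.elements.setdefault(element, []).append(content)


def extract_dublin_core(html_text):
    """Return a dict of Dublin Core element names to lists of values."""
    parser = DublinCoreMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.elements


# Shortened version of the slide's example page (assumed sample data).
SAMPLE_PAGE = """<HTML>
<HEAD>
<TITLE>UKOLN Home Page</TITLE>
<META NAME="DC.Title" CONTENT="UKOLN: UK Office for Library and Information Networking">
<META NAME="DC.Creator" CONTENT="UKOLN Information Services Group">
</HEAD>
<BODY>
</BODY>
</HTML>"""

if __name__ == "__main__":
    for element, values in extract_dublin_core(SAMPLE_PAGE).items():
        for value in values:
            print(f"{element}: {value}")
```

Values are kept in lists because a page may repeat an element, for example several DC.Creator tags. The sketch ignores the TYPE, SCHEME and LANGUAGE qualifiers, which is one aspect of the "limited functionality" of simple embedded DC.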
DC in XML-RDF
<rdf:RDF
  xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
  xmlns:dc="http://purl.org/dc/elements/1.0/">
  <rdf:Description about="http://www.ukoln.ac.uk/metadata/"
    dc:Title="UKOLN metadata homepage"
    dc:Subject="metadata; BIBLINK; DESIRE; NewsAgent; ROADS; PRIDE; Cedars; Dublin Core; DC; Z39.50; WHOIS++"
    dc:Publisher="UKOLN, University of Bath"
    dc:Type="Text"
    dc:Format="text/html - 4847 bytes" >
    <dc:Creator>
      <rdf:Bag rdf:_1="Michael Day" rdf:_2="Andy Powell" />
    </dc:Creator>
    <dc:Identifier>
      <rdf:Bag rdf:_1="http://purl.org/net/ukoln/metadata"
        rdf:_2="http://purl.eu.org/net/ukoln/metadata" />
    </dc:Identifier>
  </rdf:Description>
</rdf:RDF>

Interoperability
Problem of heterogeneous and distributed resources
• Protocols
  – Z39.50
  – Whois++ cross-searching (ROADS)
• Metadata conversion
  – Nordic Metadata Project
  – BIBLINK
• “Layered” approaches
  – Arts and Humanities Data Service

Other applications
Metadata has potential applications in other areas relating to the management of digital resources:
• Digital preservation
• Electronic commerce
• Authentication
• Managing intellectual property rights
• Managing access to resources
• Content rating services

UKOLN
UKOLN is funded by the British Library Research and Innovation Centre (BLRIC), the Joint Information Systems Committee (JISC) of the UK Higher Education Funding Councils, as well as by project funding from the JISC’s Electronic Libraries (eLib) Programme and the European Union. UKOLN also receives support from the University of Bath, where it is based.
http://www.ukoln.ac.uk/
More information on UKOLN’s work on metadata can be found at:
http://www.ukoln.ac.uk/metadata/
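
Supplementary sketch for the "DC in XML-RDF" slide: the Python fragment below suggests how a record in that abbreviated syntax could be read with a generic XML parser. It is a rough illustration under stated assumptions, not a conforming RDF parser. It handles only one rdf:Description, with DC values given either as attributes or as rdf:Bag members. The namespace URIs are those shown on the slide (a working-draft RDF namespace and DC 1.0). The names read_dc_description, local_name and SAMPLE are hypothetical, and SAMPLE is a shortened copy of the slide's record.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as given on the slide (assumed unchanged for this sketch).
RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"
DC_NS = "http://purl.org/dc/elements/1.0/"

# Shortened copy of the slide's record (assumed sample data).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
  xmlns:dc="http://purl.org/dc/elements/1.0/">
  <rdf:Description about="http://www.ukoln.ac.uk/metadata/"
    dc:Title="UKOLN metadata homepage"
    dc:Publisher="UKOLN, University of Bath">
    <dc:Creator>
      <rdf:Bag rdf:_1="Michael Day" rdf:_2="Andy Powell" />
    </dc:Creator>
  </rdf:Description>
</rdf:RDF>"""


def local_name(qualified, namespace):
    """Return the local part of an ElementTree '{ns}name', or None."""
    prefix = "{" + namespace + "}"
    if qualified.startswith(prefix):
        return qualified[len(prefix):]
    return None


def read_dc_description(rdf_xml):
    """Read the first rdf:Description into a plain dictionary."""
    root = ET.fromstring(rdf_xml)
    description = root.find("{%s}Description" % RDF_NS)
    record = {"about": description.get("about"), "elements": {}}
    elements = record["elements"]

    # Case 1: DC values written as attributes, e.g. dc:Title="...".
    for key, value in description.attrib.items():
        name = local_name(key, DC_NS)
        if name:
            elements.setdefault(name, []).append(value)

    # Case 2: DC values written as child elements holding an rdf:Bag.
    for child in description:
        name = local_name(child.tag, DC_NS)
        if not name:
            continue
        for bag in child.findall("{%s}Bag" % RDF_NS):
            members = []
            for key, value in bag.attrib.items():
                member = local_name(key, RDF_NS)  # "_1", "_2", ...
                if member and member[1:].isdigit():
                    members.append((int(member[1:]), value))
            for _, value in sorted(members):
                elements.setdefault(name, []).append(value)
    return record


if __name__ == "__main__":
    record = read_dc_description(SAMPLE)
    print(record["about"])
    for name, values in record["elements"].items():
        print(name, "=", "; ".join(values))
```

Because both the attribute form and the Bag form end up in the same dictionary of lists, a consumer of the record does not need to know which syntax the author chose.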