Description, Discovery, Disclosure Rachel Heery UKOLN University of Bath http://www.ukoln.ac.uk/metadata/ UKOLN is funded by the British Library Research and Innovation Centre, the Joint Information Systems Committee.


Description, Discovery,
Disclosure
Rachel Heery <[email protected]>
UKOLN
University of Bath
http://www.ukoln.ac.uk/metadata/
UKOLN is funded by the British Library Research and Innovation Centre,
the Joint Information Systems Committee of the Higher Education Funding
Councils, as well as by project funding from the JISC’s Electronic Libraries
Programme and the European Union.
UKOLN also receives support from the University of Bath where it is based.
University of Glasgow
Outline
Description
definitions
Discovery
metadata typology
Dublin Core
metadata creation
subject gateways
Disclosure
metadata registries
Definitions (1)
..data about data..
..structured, machine readable data
..data which supports operations carried out
on information objects
11/6/2015
Definitions (2)
What does metadata describe?
“… machine understandable information
about web resources or other things”
- Tim Berners-Lee (World Wide Web
Consortium)
digital resources
and
physical resources?
Metadata supports ...
• resource discovery
• authentication
• data management
• rights management
• digital preservation
• content rating services
Discovery..
Metadata for resource
discovery
Provides support for:
• searching
• location
• retrieval (delivery)
• description
May enable:
• Semantic interoperability
• Interworking systems
Diversity of formats and protocols
Metadata is structured according to standards:
MARC, EAD, CIMI, TEI ….
Dublin Core
XML, RDF
Metadata is searched using protocols:
Z39.50, whois++, LDAP
Glossary available at
http://www.ukoln.ac.uk/metadata/glossary/
A metadata typology
Simple to rich:
• Band One (full text indexes): proprietary formats
• Band Two (simple structured generic formats): proprietary formats, Dublin Core, ROADS, IAFA/Whois++ templates
• Band Three (more complex structure, domain specific; part of larger semantic framework): FGDC, MARC, GILS, TEI headers, ICPSR, EAD, CIMI
USMARC record
Extract from USMARC record
…….
111 2$aSeminar on Cataloging Digital Documents $d(1994 :
$cUniversity of Virginia
Library and Library of Congress)
245 10$aProceedings of the Seminar on Cataloging Digital
Documents, October 12-14, 1994 $h{computer file}
/$cUniversity of Virginia Library, Charlottesville, and the Library of
Congress.
256 $aComputer data and program.
260 $a{Washington, D.C. :$bLibrary of Congress,$c1994}.
538 $aAccess: Internet. Address:
http://lcweb.loc.gov/catdir/semdigdocs/seminar.html.
500 $aTitle from title screen.
500 $a"Sponsor: Sarah Thomas, director for cataloging, Library of
Congress"--Home page.
………….
Extract from: Guidelines for the Use of Field 856. Network Development and MARC Standards Office,
Library of Congress. Revised March 1996
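The tag, indicator and subfield layout of the extract above can be pulled apart mechanically. The following is a minimal sketch, assuming the display form used on the slide (three-character tag, optional indicators, then `$`-prefixed subfields); `parse_marc_line` is a hypothetical helper, not part of any MARC library, and it would mis-split data that contains a literal `$`.

```python
def parse_marc_line(line):
    """Split one display-form MARC line into tag, indicators and subfields."""
    tag, rest = line[:3], line[3:].lstrip(" ")
    # Everything before the first "$" is the indicator string (may be empty).
    indicators, _, body = rest.partition("$")
    subfields = []
    for chunk in body.split("$"):
        if chunk:
            # First character is the subfield code, the remainder is its data.
            subfields.append((chunk[0], chunk[1:].strip()))
    return {"tag": tag, "indicators": indicators.strip(), "subfields": subfields}


# Field 260 as it appears in the extract above.
field = parse_marc_line("260 $a{Washington, D.C. :$bLibrary of Congress,$c1994}.")
for code, data in field["subfields"]:
    print(field["tag"], code, data)
```

Real MARC exchange records use a binary directory structure rather than this printed form, so the sketch only applies to text extracts like the one on the slide.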
TEI header
<TEIHEADER><FILEDESC>
<TITLESTMT><TITLE>Liberty Lyrics (1895):
a machine-readable transcription</TITLE>
<AUTHOR>Bevington, Louisa Sarah (Guggenberger)
(1845-?)</AUTHOR>
<RESPSTMT><RESP>Transcribed and encoded by </RESP>
<NAME>Felix Jung</NAME></RESPSTMT>
<RESPSTMT><RESP>Edited by </RESP>
<NAME>Perry Willett</NAME></RESPSTMT></TITLESTMT>
<EXTENT>TEI formatted filesize uncompressed&colon;
1426 bytes</EXTENT>
<PUBLICATIONSTMT>
<PUBLISHER>Library Electronic Text Resource Service
(LETRS), Indiana University</PUBLISHER>
<DATE>September 22, 1995</DATE>
<AVAILABILITY><P>&copy; 1995, The Trustees of
Indiana University. Indiana University makes a
claim of copyright only to original contributions
ROADS Template
Template-Type: SERVICE
Handle: 871473886-23884
Title: Wellcome Unit for the History of Medicine
URI-v1: http://units.ox.ac.uk/
Admin-Email-v1: [email protected]
Publisher-Name-v1: Wellcome Unit for the History of Medicine
Publisher-Postal-v1: 45-47 Banbury Road, Oxford, OX2 6PE
Publisher-City-v1: Oxford
Description: The home page of the Wellcome Unit for the History
of Medicine this site provides information on the Unit,
seminars, ..
Keywords: History of Medicine; Medicine
Language-v1: English
Subject-Descriptor-v1: WZ40 History of Medicine
Subject-Descriptor-Scheme-v1: NLM
Record-Last-Modified-Date: Fri, 10 Oct 1997 19:09:16 +0000
Record-Created-Date: Fri, 10 Oct 1997 19:09:16 +0000
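Because a ROADS template is a flat list of attribute/value pairs, it is easy to read into a dictionary. The sketch below is illustrative only: `parse_template` is a hypothetical helper, and it assumes that each attribute starts at the left margin and that continuation lines are indented, which may not match every ROADS installation.

```python
def parse_template(text):
    """Parse 'Attribute: value' lines; indented lines continue the previous value."""
    record = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace():
            # Assumed convention: leading whitespace marks a continuation line.
            if last is not None:
                record[last] += " " + line.strip()
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        name, sep, value = line.partition(":")
        if sep:
            last = name.strip()
            record[last] = value.strip()
    return record


sample = """Template-Type: SERVICE
Handle: 871473886-23884
Title: Wellcome Unit for the History of Medicine
URI-v1: http://units.ox.ac.uk/
Keywords: History of Medicine; Medicine
Subject-Descriptor-v1: WZ40 History of Medicine
Subject-Descriptor-Scheme-v1: NLM
"""

record = parse_template(sample)
keywords = [k.strip() for k in record["Keywords"].split(";")]
print(record["Template-Type"], record["URI-v1"], keywords)
```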
Dublin Core …..
An instance of resource discovery
metadata
What is the Dublin Core?
• widespread consensus
• 15 element metadata set
• simple set for untrained creators
• base set for semantic interoperability
• web-based document-like objects?
Dublin Core history
• workshop series - DC-1 (1995) to DC-6
• email discussion lists (Mailbase)
• RFC 2413 - DC core elements
http://src.doc.ic.ac.uk/computing/internet/rfc/rfc2413.txt
• submission to NISO (…ISO)
• DC home page
http://purl.oclc.org/dc/
Dublin Core elements
• 15 element core metadata set
• Title
• Subject
• Description
• Creator
• Publisher
• Contributor
• Date
• Type
• Format
• Identifier
• Source
• Language
• Relation
• Coverage
• Rights
Dublin Core Qualified
• refine the meaning of elements using ‘type’:
• Relation TYPE=IsPartOf
• associate value with externally defined ‘scheme’:
• Subject SCHEME=LCSH
• Date SCHEME=ISO 8601
• indicate ‘language’ of value
• Title LANGUAGE=en
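One way to picture the three kinds of qualifier is as optional fields on a single element/value statement. The sketch below is an assumption-laden illustration, not a Dublin Core API: the `Statement` class, its field names and all the values are hypothetical, and `label()` simply reproduces the notation used on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """One Dublin Core element/value pair with optional qualifiers."""
    element: str
    value: str
    type: Optional[str] = None      # refines the meaning of the element
    scheme: Optional[str] = None    # externally defined scheme for the value
    language: Optional[str] = None  # language of the value

    def label(self):
        """Render the element and its qualifiers in the slide's notation."""
        parts = [self.element]
        if self.type:
            parts.append(f"TYPE={self.type}")
        if self.scheme:
            parts.append(f"SCHEME={self.scheme}")
        if self.language:
            parts.append(f"LANGUAGE={self.language}")
        return " ".join(parts)


# Hypothetical values, chosen only to exercise each qualifier.
statements = [
    Statement("Relation", "http://www.ukoln.ac.uk/", type="IsPartOf"),
    Statement("Subject", "Metadata", scheme="LCSH"),
    Statement("Date", "1997-10-10", scheme="ISO 8601"),
    Statement("Title", "UKOLN Home Page", language="en"),
]
for s in statements:
    print(s.label(), "=", s.value)
```

Keeping the qualifiers optional means an unqualified statement is still a valid member of the simple 15 element set.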
DC in HTML
<HTML><HEAD>
<TITLE>UKOLN Home Page</TITLE>
<META NAME="DC.Title" CONTENT="UKOLN: UK Office
for Library and Information Networking">
<META NAME="DC.Subject" CONTENT="national centre,
network information support, library community,
awareness, research, information services, public
library networking, bibliographic management,
distributed library systems, metadata, resource
discovery, conferences, lectures, workshops">
<META NAME="DC.Description" CONTENT="UKOLN is a
national centre for support in network information
management in the library and information
communities. It provides awareness, research and
information services">
<META NAME="DC.Creator" CONTENT="UKOLN Information
Services Group">
</HEAD>
...
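Embedded Dublin Core of this kind can be harvested with an ordinary HTML parser. The following is a minimal sketch using Python's standard `html.parser` module; the class name `DCMetaParser` is hypothetical, the page is a shortened version of the example above, and the `DC.` prefix test reflects only the naming shown on the slide.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect <META NAME="DC.x" CONTENT="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith("dc."):
            element = name[3:]
            content = attributes.get("content") or ""
            self.elements.setdefault(element, []).append(content)


page = """<HTML><HEAD>
<TITLE>UKOLN Home Page</TITLE>
<META NAME="DC.Title" CONTENT="UKOLN: UK Office for Library and Information Networking">
<META NAME="DC.Creator" CONTENT="UKOLN Information Services Group">
</HEAD></HTML>"""

parser = DCMetaParser()
parser.feed(page)
print(parser.elements)
```

Values are kept in lists because an element such as Creator or Subject may be repeated in one page.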
RDF
Resource Description Framework
• Abstract data model
• expressed in XML based syntax
• Provides structure (resource, property,
value)
• Provides common syntax
• Provides means to aggregate
metadata modules
http://www.w3.org/TR/REC-rdf-syntax/
DC in RDF
Diagram: resource http://www.ukoln.ac.uk/metadata/ has property DC:Title with value "The UKOLN Metadata Home Page"
<?xml:namespace
ns="http://purl.org/dublin_core/schema/" prefix="DC"?>
<RDF:RDF>
<RDF:Description
RDF:HREF="http://www.ukoln.ac.uk/metadata/">
<DC:Title>The UKOLN Metadata Home Page</DC:Title>
</RDF:Description>
</RDF:RDF>
DC in XML-RDF
<rdf:RDF
xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
xmlns:dc="http://purl.org/dc/elements/1.0/">
<rdf:Description about="http://www.ukoln.ac.uk/metadata/"
dc:Title="UKOLN metadata homepage"
dc:Subject="metadata; BIBLINK; DESIRE; NewsAgent; ROADS;
PRIDE; Cedars; Dublin Core; DC; Z39.50; WHOIS++"
dc:Publisher="UKOLN, University of Bath"
dc:Type="Text"
dc:Format="text/html - 4847 bytes" >
<dc:Creator>
<rdf:Bag rdf:_1="Michael Day"
rdf:_2="Andy Powell" />
</dc:Creator>
<dc:Identifier>
<rdf:Bag rdf:_1="http://purl.org/net/ukoln/metadata"
rdf:_2="http://purl.eu.org/net/ukoln/metadata" />
</dc:Identifier>
</rdf:Description>
</rdf:RDF>
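The (resource, property, value) structure behind this syntax can be made visible by reading the XML back into triples. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened copy of the example, keeping the working-draft namespace URIs from the slide; the `triples` function is hypothetical, handles only the two patterns shown here (attribute properties and an `rdf:Bag` of attribute members), and flattens each bag into separate values, which a real RDF parser would not do.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"
DC_NS = "http://purl.org/dc/elements/1.0/"

doc = """<rdf:RDF
 xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
 xmlns:dc="http://purl.org/dc/elements/1.0/">
<rdf:Description about="http://www.ukoln.ac.uk/metadata/"
 dc:Title="UKOLN metadata homepage"
 dc:Publisher="UKOLN, University of Bath"
 dc:Type="Text">
<dc:Creator>
<rdf:Bag rdf:_1="Michael Day"
 rdf:_2="Andy Powell" />
</dc:Creator>
</rdf:Description>
</rdf:RDF>"""


def triples(xml_text):
    """Return (resource, property, value) tuples for DC properties."""
    root = ET.fromstring(xml_text)
    out = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        resource = desc.get("about")
        # Properties written as attributes in the dc namespace.
        for name, value in desc.attrib.items():
            if name.startswith(f"{{{DC_NS}}}"):
                out.append((resource, "dc:" + name.split("}")[1], value))
        # Properties written as child elements that hold an rdf:Bag.
        for child in desc:
            if not child.tag.startswith(f"{{{DC_NS}}}"):
                continue
            prop = "dc:" + child.tag.split("}")[1]
            for bag in child.findall(f"{{{RDF_NS}}}Bag"):
                for key, value in bag.attrib.items():
                    if key.startswith(f"{{{RDF_NS}}}_"):
                        out.append((resource, prop, value))
    return out


result = triples(doc)
for triple in result:
    print(triple)
```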
Metadata creation..
Who creates metadata?
Resource creators
• author
• webmaster
• institution
Service providers
• search services
• third parties
• commercial publishers
• hand crafted
• robot generated
Metadata creation editors
DC dot (UKOLN)
http://www.ukoln.ac.uk/metadata/dcdot/
Reggie (DSTC)
http://metadata.net
Nordic Metadata Template (Nordic Web
Index)
http://www.lub.lu.se/cgi-bin/nmdc.pl
Metadata creation robots
Search engine robots
AC/DC UK Academic Directory
- Harvest
http://acdc.hensa.ac.uk/index.shtml
Nordic Web Index
- Combine metadata aware robot
http://nwi.ub2.lu.se/
BIBLINK …..
An instance of a metadata creation
system
BIBLINK
Aim
Establish information flow between
electronic publishers and National
Bibliographic Agencies
Achieved by
Establishing workspace
Management of database of records
Searching and downloading of records
http://www.ukoln.ac.uk/metadata/BIBLINK/
BIBLINK Workspace Interfaces
Email or HTTP (Web) user interface
Input formats:
• Dublin Core in HTML
• two SGML DTDs
Export views
• Dublin Core in HTML
• two SGML DTDs
• MARC (various flavours)
Administrator Web interface
• user registration, access control,
mapping tables, configuration, ...
Description of BIBLINK Workspace
BIBLINK Workspace: a shared facility for storing and manipulating BIBLINK workspace records
Parties interacting with the workspace:
• Publishers
• Third parties, e.g. identification agencies - ISBN, ISSN, etc.
• BIBLINK Workspace Administrator
• National Bibliographic Agencies
Search services..
Search Service models
Geographic coverage:
global service
regional service
Domain coverage
subject
sector
Business models:
commercial
institutional
collaborative
centralised
Selection criteria:
quality
language
target audience
Subject gateways...
ROADS
Resource Organisation and Discovery in
Subject-based Services
Web based tools for Subject Services
• SOSIG, ADAM, OMNI, … plus
Manage and search Internet resource
descriptions
• ROADS templates (based on IAFA
templates)
• WHOIS++ directory service protocol
http://www.ukoln.ac.uk/roads/
ROADS Template Types
In original RFC:
New types:
SERVICE
PROJECT
EVENT
DUBLIN CORE
DOCUMENT
SOFTWARE
DATASET
MAILING LIST
…...
COLLECTION
(under
development)
ROADS template
Template-Type: SERVICE
Handle: 871473886-23884
Title: Wellcome Unit for the History of Medicine
URI-v1: http://units.ox.ac.uk/
Admin-Email-v1: [email protected]
Publisher-Name-v1: Wellcome Unit for the History of Medicine
Publisher-Postal-v1: 45-47 Banbury Road, Oxford, OX2 6PE
Publisher-City-v1: Oxford
Description: The home page of the Wellcome Unit for the
History of Medicine this site provides information on the
Unit, seminars, ..
Keywords: History of Medicine; Medicine
Language-v1: English
Subject-Descriptor-v1: WZ40 History of Medicine
Subject-Descriptor-Scheme-v1: NLM
Record-Last-Modified-Date: Fri, 10 Oct 1997 19:09:16 +0000
Record-Created-Date: Fri, 10 Oct 1997 19:09:16 +0000
Disclosure: registries
Metadata registries
Objectives
• Definitions
• Mappings
• Information
Users
• Human
• Software
Enable
• Tool creation
• Further automation
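For software users, a registry's mappings could be consumed as a simple crosswalk table. The sketch below is purely illustrative: the ROADS attribute names come from the template shown earlier, but the `ROADS_TO_DC` mapping and the `crosswalk` function are hypothetical and are not taken from any published registry.

```python
# Hypothetical crosswalk: ROADS attribute name -> Dublin Core element.
ROADS_TO_DC = {
    "Title": "Title",
    "Description": "Description",
    "Keywords": "Subject",
    "Publisher-Name-v1": "Publisher",
    "URI-v1": "Identifier",
    "Language-v1": "Language",
}


def crosswalk(record, mapping):
    """Return (mapped, unmapped): target-keyed values and attributes with no mapping."""
    mapped, unmapped = {}, []
    for name, value in record.items():
        target = mapping.get(name)
        if target is None:
            # Report what the mapping does not cover instead of dropping it silently.
            unmapped.append(name)
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped


roads_record = {
    "Title": "Wellcome Unit for the History of Medicine",
    "URI-v1": "http://units.ox.ac.uk/",
    "Handle": "871473886-23884",
}
dc_record, leftover = crosswalk(roads_record, ROADS_TO_DC)
print(dc_record, leftover)
```

Returning the unmapped attributes alongside the result is a design choice that lets a tool flag gaps in the registry's mappings.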
ROADS registry
RECCI
Template Registry
• Cataloguing Guidelines
• Definitive lists of template types and elements
• Interoperability guidelines
• Template usage statistics
• Recommendations for collaborative cataloguing
• Rules for content (schemes)
• Change control
Future
Move from projects to services
Wider deployment
Resource discovery:
integration into learning environments
targeted, personalised services
Core metadata sets for …..
Metadata for complex digital objects