MARC21 Update UK Cataloguing & Indexing Standards Forum 13


Publishing the British National Bibliography as Linked Open Data
Corine Deliot
Metadata Standards Analyst
British Library
Linked Data: what cataloguers need to know
London, 20 February 2015
© The British Library Board 2014
Overview
• Motivations and approach
• The modelling process and the data model
• Technical process: from MARC 21 to RDF, linking to external datasets
• Outcomes and dissemination
• Plans for future developments
• Use of the BNB data
• Challenges
• Benefits
www.bl.uk
Motivations
• Publishing our data for others to re-use
• Looking beyond library audiences
• Taking part in the Linked Data conversation
How?
• Pragmatic, bottom-up approach
• Using existing staff
• Building on existing skills
• Using existing tools as much as possible
• But training and mentoring from external provider
Why BNB?
• General bibliography, not a unique institutional catalogue
• Consistent format over 60 years
• Size & range of content: 3 million records on all subjects in many languages
• Control of metadata: publishable as CC0
© Waldir/ Wikimedia Commons/ CC BY-SA-3.0
Usage terms: http://creativecommons.org/licenses/by-sa/3.0/
The modelling process (I)
• Identify our objects of interest, i.e. what the MARC record says about “things in the world”
– e.g. bibliographic resources, people, organizations, places, subjects, etc.
• Assign URIs to identify these objects of interest
URIs: Things to think about
• Create our own URIs or use existing ones?
– e.g. http://viaf.org/viaf/96994048 or http://id.loc.gov/authorities/names/n78095332
• Create opaque or transparent URIs?
– e.g. http://viaf.org/viaf/96994048 (opaque) or http://dbpedia.org/resource/William_Shakespeare (transparent)
• What pattern?
– URI pattern guidance from the UK Cabinet Office: “Designing URI Sets for the UK Public Sector”
• Create valid, i.e. syntax-conformant, URIs
URI patterns
• http://bnb.data.bl.uk/id/resource/{control-number}
• http://bnb.data.bl.uk/id/resource/{BNB-number}
• http://bnb.data.bl.uk/id/person/{person-name}
• http://bnb.data.bl.uk/id/organization/{organization-name}
• http://bnb.data.bl.uk/id/concept/lcsh/{topic}
• http://bnb.data.bl.uk/id/concept/ddc/{editionnumber}/{dewey-number}
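A minimal sketch of how URIs could be minted from these patterns. The slides do not state the actual rule for turning a name or topic string into a URI segment, so `slugify` below is a hypothetical rule (keep letters and digits only), and the function names are illustrative rather than the British Library's own.

```python
from urllib.parse import quote

BASE = "http://bnb.data.bl.uk/id"


def slugify(label: str) -> str:
    """Hypothetical slug rule: keep letters and digits, drop everything else."""
    return "".join(ch for ch in label if ch.isalnum())


def resource_uri(control_number: str) -> str:
    """Fill the /id/resource/{control-number} pattern."""
    return f"{BASE}/resource/{quote(control_number, safe='')}"


def person_uri(name: str) -> str:
    """Fill the /id/person/{person-name} pattern using the assumed slug rule."""
    return f"{BASE}/person/{quote(slugify(name), safe='')}"


def lcsh_uri(topic: str) -> str:
    """Fill the /id/concept/lcsh/{topic} pattern using the assumed slug rule."""
    return f"{BASE}/concept/lcsh/{quote(slugify(topic), safe='')}"


if __name__ == "__main__":
    print(resource_uri("008043929"))
    print(person_uri("Shakespeare, William, 1564-1616"))
```

Percent-encoding the segment with `quote` is one way to keep the result syntax-conformant when a heading contains non-ASCII letters.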
URI patterns
• http://bnb.data.bl.uk/id/resource/008043929
• http://bnb.data.bl.uk/doc/resource/008043929
• http://bnb.data.bl.uk/doc/resource/008043929.rdf
• http://bnb.data.bl.uk/doc/resource/008043929.ttl
• http://bnb.data.bl.uk/doc/resource/008043929.json
• http://bnb.data.bl.uk/doc/resource/008043929.html
• http://bnb.data.bl.uk/doc/resource/008043929.xml
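The list above pairs one identifier URI (`/id/`) with a document URI (`/doc/`) and its format-specific variants. A small sketch of that relationship follows; the function name and the returned dictionary layout are assumptions made for illustration, and nothing here contacts the platform.

```python
# File extensions taken from the slide above.
FORMATS = ("rdf", "ttl", "json", "html", "xml")


def document_urls(id_uri: str) -> dict[str, str]:
    """Derive the document URL and its per-format variants from an /id/ URI."""
    if "/id/" not in id_uri:
        raise ValueError(f"not an identifier URI: {id_uri}")
    # Only the first /id/ segment is swapped for /doc/.
    doc = id_uri.replace("/id/", "/doc/", 1)
    urls = {"doc": doc}
    for ext in FORMATS:
        urls[ext] = f"{doc}.{ext}"
    return urls


if __name__ == "__main__":
    for kind, url in document_urls("http://bnb.data.bl.uk/id/resource/008043929").items():
        print(kind, url)
```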
The modelling process (II)
• Describe these objects of interest (i.e. use classes) and how they relate to each other (i.e. use properties)
– Use classes and properties from existing RDF vocabularies
– Define our own classes and properties when required; documented in the British Library Terms RDF schema
RDF Vocabularies
• Bibliographic Resource
– Dublin Core
– Bibliographic Ontology
– ISBD
– British Library Terms
• Person/Organization
– FOAF: Friend of a Friend
– Bio: a Vocabulary for Biographical Information
– Org: an Organisation Ontology
– RDA
– MADS/RDF
• Event
– Event Ontology
– British Library Terms
• Place
– WGS84 Geo Positioning
• Concept
– SKOS
– British Library Terms
• RDF
• RDF Schema
• OWL
The British Library Terms RDF Schema
@prefix blt:<http://www.bl.uk/schemas/bibliographic/blterms#> .
• Existing property “not quite right” (e.g. not granular enough)
– e.g. dcterms:identifier vs blt:bnb
The British Library Terms RDF Schema
@prefix blt:<http://www.bl.uk/schemas/bibliographic/blterms#> .
• Property or class required by specific feature of the model
– e.g. blt:publication and blt:PublicationEvent (rdfs:subClassOf event:Event)
The British Library Terms RDF Schema
@prefix blt:<http://www.bl.uk/schemas/bibliographic/blterms#> .
• For pragmatic reasons, e.g. to facilitate searching and navigating through the graph
– e.g. blt:TopicLCSH and blt:TopicDDC
– e.g. blt:hasCreated owl:inverseOf dcterms:creator
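To illustrate why an inverse property such as blt:hasCreated helps navigation, here is a rough sketch that adds the person-to-work link for every work-to-person dcterms:creator statement. Triples are plain Python tuples rather than a real RDF library, and the example URIs are hypothetical; how the BNB data actually materialises these links is not stated in the slides.

```python
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"
BLT_HAS_CREATED = "http://www.bl.uk/schemas/bibliographic/blterms#hasCreated"


def add_inverse_creator_links(triples):
    """Return the input triples plus a blt:hasCreated triple for each dcterms:creator triple."""
    triples = set(triples)
    inverses = {
        (obj, BLT_HAS_CREATED, subj)
        for (subj, pred, obj) in triples
        if pred == DCTERMS_CREATOR
    }
    return triples | inverses


if __name__ == "__main__":
    # Hypothetical URIs, for illustration only.
    book = "http://example.org/id/resource/1"
    author = "http://example.org/id/person/ExampleAuthor"
    for triple in sorted(add_inverse_creator_links([(book, DCTERMS_CREATOR, author)])):
        print(triple)
```

With the inverse present, a query can start from the person and reach their works in one step instead of searching every resource for a matching creator.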
The BNB data model - Books
http://www.bl.uk/bibliographic/pdfs/bldatamodelbook.pdf
Data Model Features (I): the Bibliographic Resource
Data Model Features (II): Publication as an event
Usual approach:
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
<BibResource> dc:publisher "Publisher" ;
    dcterms:issued "Date" ;
    ?:placeOfPublication "Place" .
Event-based approach:
@prefix blt: <http://www.bl.uk/schemas/bibliographic/blterms#> .
@prefix event: <http://purl.org/NET/c4dm/event.owl#> .
<BibResource> blt:publication <PublicationEvent> .
<PublicationEvent> event:place <Place> ;
    event:agent <Publisher> ;
    event:time <Year> .
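The event-based approach can be sketched as a function that produces the triples for one publication event. This is an illustration only: triples are plain tuples, every argument is assumed to be an already-minted URI, and the rdf:type statement is added here for completeness rather than taken from the slide.

```python
BLT = "http://www.bl.uk/schemas/bibliographic/blterms#"
EVENT = "http://purl.org/NET/c4dm/event.owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def publication_event_triples(resource, event, place, publisher, year):
    """Build the triples linking a resource to its publication event (all arguments are URIs)."""
    return [
        (resource, BLT + "publication", event),
        (event, RDF_TYPE, BLT + "PublicationEvent"),  # assumption: type stated explicitly
        (event, EVENT + "place", place),
        (event, EVENT + "agent", publisher),
        (event, EVENT + "time", year),
    ]


if __name__ == "__main__":
    # Hypothetical URIs, for illustration only.
    for triple in publication_event_triples(
        "http://example.org/id/resource/1",
        "http://example.org/id/publicationevent/1",
        "http://example.org/id/place/London",
        "http://example.org/id/agent/ExamplePublisher",
        "http://example.org/id/year/2015",
    ):
        print(triple)
```

Because place, publisher and year are resources rather than literals, each can carry its own links to external datasets.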
Data model features (III)
• Birth and death are modelled as biographical events
• Extensive use of foaf:focus to relate “things in the world” (e.g. people, organizations, places) to their SKOS concepts
– e.g. “London”, the capital of England and the UK, as a single “thing in the world” may be the “focus” of multiple concepts belonging to different concept schemes, e.g. thesauri (LCSH, Rameau, etc.)
<Thing-as-Concept> foaf:focus <Thing in the World> .
http://efoundations.typepad.com/efoundations/2011/09/things-their-conceptualisations-skos-foaffocus-modellingchoices.html by Pete Johnston
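A small sketch of the foaf:focus idea: several concepts from different schemes point at one real-world thing, so grouping by the object of foaf:focus gathers them. The URIs below are hypothetical placeholders, and triples are plain tuples rather than a real RDF graph.

```python
from collections import defaultdict

FOAF_FOCUS = "http://xmlns.com/foaf/0.1/focus"


def concepts_by_focus(triples):
    """Map each real-world thing to the set of concepts whose foaf:focus it is."""
    index = defaultdict(set)
    for subj, pred, obj in triples:
        if pred == FOAF_FOCUS:
            index[obj].add(subj)
    return dict(index)


if __name__ == "__main__":
    # Hypothetical URIs, for illustration only.
    london = "http://example.org/id/place/London"
    data = [
        ("http://example.org/concept/lcsh/London", FOAF_FOCUS, london),
        ("http://example.org/concept/rameau/Londres", FOAF_FOCUS, london),
    ]
    print(concepts_by_focus(data))
```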
MARC to RDF Conversion Workflow
Workflow
• Full BNB MARC21 file
• Select records
• Convert to precomposed UTF-8
• MARC pre-processing: normalise for improved matching & transforms
• Create BL URIs and add external URIs by matching
• Transform to RDF/XML using XSLT
• BNB RDF/XML file
• Generate RDF triple dump
• Load to BL Downloads page
• Load to Linked Data Platform
Process
• Selection
• Character set conversion
• Pre-processing
• URI generation
• Data transformation
• Create & load triples
• Produce VoID descriptions
Tools
• Catalogue Bridge Utilities
• MARC Global/MARC Report: http://www.marcofquality.com/
• Jena Eyeball: http://jena.sourceforge.net/Eyeball/
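The character set conversion and normalisation steps can be illustrated with Unicode NFC normalisation, which turns a base letter plus combining diacritic into a single precomposed character. This sketch assumes the text has already been decoded to Unicode (it does not handle MARC-8), and `match_key` is a hypothetical normalisation for matching, not the rule used in the actual workflow, which relied on the tools listed above.

```python
import unicodedata


def to_precomposed_utf8(text: str) -> bytes:
    """Normalise to NFC (precomposed form) and encode as UTF-8."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


def match_key(text: str) -> str:
    """Hypothetical matching key: NFC, case-folded, punctuation removed, whitespace collapsed."""
    folded = unicodedata.normalize("NFC", text).casefold()
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in folded)
    return " ".join(kept.split())


if __name__ == "__main__":
    decomposed = "Bronte\u0308, Charlotte."  # 'e' followed by a combining diaeresis
    print(to_precomposed_utf8(decomposed))
    print(match_key(decomposed))
```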
Linking to external sources (I)
To give our data broader context we linked to:
• General resources:
– GeoNames
– Lexvo
– ISNI
– RDF Book Mashup
• Library resources:
– LCSH
– VIAF
– Dewey.info
– MARC language and country codes
Linking to external sources (II)
Techniques included:
• Automatic generation from record data
• Auto text match with linked data dumps
• Crosswalk matching for coded data
© Silverspoon/ Wikimedia Commons/ CC BY-SA-3.0
Usage terms: http://creativecommons.org/licenses/by-sa/3.0/
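Crosswalk matching for coded data could look roughly like the following: a lookup table maps a MARC language code to an ISO 639-3 code, from which an external URI is built. The table holds only a few entries for illustration, and the Lexvo URI pattern used here is an assumption about the target dataset rather than something stated in the slides.

```python
# Partial, illustrative crosswalk from MARC language codes to ISO 639-3 codes.
MARC_TO_ISO639_3 = {
    "eng": "eng",
    "fre": "fra",
    "ger": "deu",
    "wel": "cym",
}

# Assumed URI pattern for Lexvo language identifiers.
LEXVO_BASE = "http://lexvo.org/id/iso639-3/"


def lexvo_uri(marc_language_code: str):
    """Return a Lexvo URI for a MARC language code, or None if the code is not in the crosswalk."""
    iso = MARC_TO_ISO639_3.get(marc_language_code.strip().lower())
    return LEXVO_BASE + iso if iso else None


if __name__ == "__main__":
    for code in ("eng", "fre", "zzz"):
        print(code, lexvo_uri(code))
```

Returning `None` for unknown codes leaves the record unlinked instead of guessing, which suits coded data where a wrong link is worse than a missing one.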
Outcomes
• Two datasets (Books and Serials) and their VoID descriptions, accessible at:
• BNB Linked Data platform: http://bnb.data.bl.uk
• SPARQL endpoint: http://bnb.data.bl.uk/sparql
• SPARQL editor: http://bnb.data.bl.uk/flint-sparql
• Bulk downloads: http://www.bl.uk/bibliographic/download.html
– Updated monthly
– Serializations available: RDF/XML, N-Triples
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
Usage terms: http://creativecommons.org/licenses/by-sa/3.0/
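As a sketch of how the SPARQL endpoint might be addressed, the snippet below builds a GET request URL with the standard `query` parameter of the SPARQL protocol. It only constructs the URL and sends nothing; the query itself assumes titles are expressed with dcterms:title, and whether the endpoint is still available at this address is not something the slides can confirm.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://bnb.data.bl.uk/sparql"

# Assumed query shape: titles expressed with dcterms:title.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?book ?title WHERE {
  ?book dcterms:title ?title .
} LIMIT 5
"""


def sparql_get_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Build a SPARQL protocol GET URL carrying the query in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query.strip()})}"


if __name__ == "__main__":
    url = sparql_get_url(QUERY)
    print(url)
    # Round-trip check: the decoded parameter equals the original query text.
    print(parse_qs(urlsplit(url).query)["query"][0] == QUERY.strip())
```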
Platform change
• 2011: initial Talis platform
• 2013: data migration to TSO platform
http://www.tso.co.uk/our-expertise/technology/openup-platform
– Tendering process
– Migration of data and services over a couple of months
Dissemination
• British Library Terms RDF schema declared in LOV (Linked
Open Vocabularies) http://lov.okfn.org
• Linked Open BNB on data.gov.uk
– 5-star Openness rating
– included in the National Information Infrastructure
http://data.gov.uk/dataset/the-linked-open-british-nationalbibliography
• Open Data Institute certification
– Pilot level
– 92% Data Quality indicator as part of Heritage & Culture
Challenge evaluation
Plans for Future Developments
• Refine and extend the model
• Investigate FRBRization
• Link to other external sources
– e.g. DBpedia/Wikidata
• Collaborate with other national libraries
• Expand scope beyond current BNB, e.g. the British Catalogue of Music
• Improve developer support
Use of the BNB data
• Statistics
– e.g. number of hits on the SPARQL endpoint
– e.g. number of downloads on the BL webpage
– e.g. web log analysis reports
• BNB data used in pilot projects
– e.g. Linked Open BNB data used as test data for a semantic search demonstrator
– e.g. data provided to Microsoft to assist in their research into linking structured data
• BNB data used in tutorials
– e.g. http://www.meanboyfriend.com/overdue_ideas/2014/10/using-an-apihands-on-exercise/ by Owen Stephens
Use of the BNB data
• Anecdotal evidence
However, use is quite difficult to assess; this is part and parcel of the data being open and available for all to use.
Challenges
Converting MARC data into RDF!
• Publication event approach: transforming transcribed text into data
• URI creation from string
– may result in duplication
– changes over time may also produce duplication
• Legacy data issues
– e.g. inconsistency of the data
– e.g. cataloguers using inadequate input tools for diacritics
• This was (relatively) new; nobody had all the answers
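The duplication risk from string-based URI creation can be shown with a toy example: when a heading changes over time (here, a death date is added), a URI derived from the string changes too, so one person ends up with two URIs. The slug rule and the headings are hypothetical and serve only to illustrate the problem.

```python
BASE = "http://bnb.data.bl.uk/id/person/"


def person_uri(heading: str) -> str:
    """Hypothetical rule: build the URI from the letters and digits of the heading."""
    return BASE + "".join(ch for ch in heading if ch.isalnum())


if __name__ == "__main__":
    # Two forms of the same (invented) heading, before and after a death date is added.
    variants = ["Smith, John, 1950-", "Smith, John, 1950-2014"]
    uris = {person_uri(v) for v in variants}
    for uri in sorted(uris):
        print(uri)
    print(len(uris), "distinct URIs for one person")
```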
Benefits of Linked Open Data
• We have learnt a lot about the practical aspects of working with linked data
• The data model is influencing other implementations
– Re-used by Danish Bibliography Centre
• LOD raised the profile of Collection Metadata internally and the Library’s profile externally
• LOD helped us focus our legacy data enhancement activities
For further information
http://bnb.data.bl.uk
http://www.bl.uk/bibliographic/datafree.html
Thank you.
Questions?
http://twitter.com/#!/BLMetadata