UNIMARC and linked data

Gordon Dunsire and Mirna Willer
Presented at Session 187 (Advancing UNIMARC: alignment
and innovation) of the World Library and Information
Congress : 77th IFLA General Conference and Assembly,
13-18 August 2011, San Juan, Puerto Rico
Overview
• Background
• Linked data and the Semantic Web
• Methods and issues in representing UNIMARC
for the Semantic Web
• Recommendations
Background
• Representation of IFLA standards for use in the
Semantic Web
– Work of the FRBR Namespaces project and IFLA
Namespaces Task Group
– Work of the ISBD/XML Study Group
• Included a feasibility study of representation of UNIMARC
• Representations allow legacy catalogue records
to be published as linked data using RDF
• Branding IFLA standards for authority & trust
– Semantic Web lets “Anyone say Anything about Any
resource”
Linked data and RDF
• Resource Description Framework (RDF)
• Designed for machine-processing of metadata
at global scale (Semantic Web)
– 24/7/365
– Trillions of operations per second
• Everything must be disambiguated
– Machines are dumb
• A simple approach helps!
– Machine-readable identifiers
RDF triple
• Metadata expressed as “atomic” statements
– A simple, single, irreducible statement
• The title of this book is “Cataloguing is fun!”
• Constructed in 3 parts
– “Triple”
• The title of this book is “Cataloguing is fun!”
– Subject of the statement = Subject: This book
– Nature of the statement = Predicate: has title
– Value of the statement = Object: “Cataloguing is fun!”
• This book – has title – “Cataloguing is fun!”
– subject – predicate – object
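The three-part structure above can be sketched as a small data type. This is a minimal illustration only: the `Triple` type and `to_sentence` helper are hypothetical names, not part of RDF or of any library, and the subject and predicate are still plain words rather than identifiers.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One atomic statement: subject, predicate, object (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


# The example statement from the slide, split into its three parts.
statement = Triple(
    subject="This book",
    predicate="has title",
    obj="Cataloguing is fun!",
)


def to_sentence(t: Triple) -> str:
    """Render a triple in the 'subject - predicate - object' form."""
    return f'{t.subject} - {t.predicate} - "{t.obj}"'


print(to_sentence(statement))
```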
Machine-readable identifiers
• Uniform Resource Identifier (URI)
– Can be any unique combination of numbers and
letters
• No intrinsic meaning; it’s just an identifier
• RDF requires the subject and predicate of a triple
to be URIs
– Object can be a URI, or a literal string (“Cataloguing is
fun!”)
• URIs can be matched by machine to link triples
together
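A rough sketch of how matching URIs links triples together. Every URI and the author name below are made-up placeholders under `example.org`, and `follow` is a hypothetical helper; the point is only that the machine compares identifiers character by character, without needing to know what they mean.

```python
# All URIs below are hypothetical placeholders for illustration.
BOOK = "http://example.org/book/1"
AUTHOR = "http://example.org/person/7"
HAS_TITLE = "http://example.org/prop/hasTitle"
HAS_AUTHOR = "http://example.org/prop/hasAuthor"
HAS_NAME = "http://example.org/prop/hasName"

triples = [
    (BOOK, HAS_TITLE, "Cataloguing is fun!"),  # object is a literal string
    (BOOK, HAS_AUTHOR, AUTHOR),                # object is a URI
    (AUTHOR, HAS_NAME, "A. Cataloguer"),       # made-up name
]


def follow(all_triples, start):
    """Return the triples whose subject matches an object of a triple about `start`."""
    targets = {o for s, _, o in all_triples if s == start}
    return [t for t in all_triples if t[0] in targets]


# The object of the second triple matches the subject of the third,
# so the two statements are linked.
print(follow(triples, BOOK))
```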
UNIMARC element identifiers
• Element: Number (ISBN)
– Tag: 010; 1st ind.: b; 2nd ind.: b; Subfield: a
– (Unique in element set)
• Coded Information Block: Target audience code
– 100bba (Unique in element set)
– Character position: 17-19
• Target audience vocabulary: children, ages 9-14
– Code: d (Unique in vocabulary)
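A minimal sketch of deriving an element identifier from the encoding, assuming the pattern suggested by the examples in this presentation (100bba, and later unimarcb:P205bbb): tag, then the two indicators with "b" standing for blank, then the subfield code. The function names, the handling of non-blank indicators, and the treatment of the "P" prefix are assumptions for illustration.

```python
def element_id(tag: str, ind1: str, ind2: str, subfield: str) -> str:
    """Concatenate tag, indicators and subfield; a blank indicator becomes 'b'."""
    def norm(indicator: str) -> str:
        # Assumption: blank is written as 'b'; any other value is kept as-is.
        return "b" if indicator.strip() == "" else indicator

    return f"{tag}{norm(ind1)}{norm(ind2)}{subfield}"


def property_name(tag: str, ind1: str, ind2: str, subfield: str) -> str:
    """Prefix the element identifier with 'P', as in unimarcb:P205bbb."""
    return "unimarcb:P" + element_id(tag, ind1, ind2, subfield)


print(element_id("100", " ", " ", "a"))     # 100bba
print(property_name("205", " ", " ", "b"))  # unimarcb:P205bbb
```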
Vocabularies and Element sets
• Controlled terminologies represented as
vocabularies
• UNIMARC entities, attributes, and
relationships form an element set
– Attributes and relationships represented as
properties/predicates
– Entities represented in RDF as classes
• But only 1 entity in UNIMARC-B (Resource)
• ISBD already has an equivalent class for Resource
UNIMARC and ISBD properties
• Element identifier/URI: unimarcb:P205bbb
– Label (English): (has) issue statement
• Equivalent ISBD URI: isbd:P1011
– Label (English): has additional edition statement
• The meaning is the same, but the identifiers
and labels are different
• unimarcb:P205bbb same as isbd:P1011 (in
RDF)
– Or use isbd:P1011 instead of unimarcb:P205bbb
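The second option above (using the ISBD property in place of the UNIMARC one) could look roughly like this. The mapping table holds only the one pair given in the slide; the resource identifier, the literal value, and the `to_isbd` helper are hypothetical, and the slide does not say which RDF property would be used to state the equivalence itself.

```python
# The one equivalence stated in the presentation.
EQUIVALENT = {"unimarcb:P205bbb": "isbd:P1011"}


def to_isbd(triple):
    """Swap a UNIMARC predicate for its ISBD equivalent, if one is known."""
    subject, predicate, obj = triple
    return (subject, EQUIVALENT.get(predicate, predicate), obj)


# Hypothetical resource identifier and literal value.
unimarc_triple = ("ex:resource1", "unimarcb:P205bbb", "2nd issue")
print(to_isbd(unimarc_triple))
```

A predicate with no known equivalent passes through unchanged, so the same function can be applied to every triple from a record.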
Translations
• The same identifier is used for translated
elements (captions, definitions, etc.) and
vocabularies (preferred terms, definitions,
etc.)
• E.g. Vocabulary of 116bba0 = Coded data for
graphics: Specific material designation
Graphics SMD translation example
• Term identifier/URI: namespace/b
• Notation: b
• Preferred label (English): drawing
• Preferred label (Italian): disegno
• Preferred label (Portuguese): desenho
• Definition (English): An original visual representation (other than a print or painting) ...
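A small sketch of the idea that one identifier carries labels in several languages. The dictionary layout, the language codes used as keys, and the `label` helper are assumptions for illustration; the URI is left as the placeholder "namespace/b" given in the slide.

```python
# One term, one identifier, several language-specific labels.
TERM = {
    "uri": "namespace/b",  # placeholder from the slide, not a real URI
    "notation": "b",
    "pref_label": {
        "en": "drawing",
        "it": "disegno",
        "pt": "desenho",
    },
}


def label(term: dict, lang: str, fallback: str = "en") -> str:
    """Return the preferred label in `lang`, falling back to English."""
    labels = term["pref_label"]
    return labels.get(lang, labels[fallback])


print(label(TERM, "it"))  # disegno
print(label(TERM, "fr"))  # no French label here, so falls back to English
```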
Triples from UNIMARC records
• Create or obtain URI for the Resource
described
• Obtain URI for UNIMARC tag/subfield
– Direct from tag/indicators/subfield encoding
• Obtain URI of value of subfield, or use a literal
value
– URI from vocabulary or UNIMARC Authority
• Publish triple
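The steps above can be sketched end to end. This is an illustration under stated assumptions, not the authors' implementation: the namespace and resource URIs are hypothetical `example.org` placeholders, the record layout is an ad hoc dictionary rather than a real MARC parser, the subfield values are made up, and the identifier pattern (P + tag + indicators + subfield, "b" for blank) is inferred from the examples in this presentation.

```python
PROPERTY_NS = "http://example.org/unimarcb/"  # hypothetical namespace


def property_uri(tag, ind1, ind2, code):
    """Step 2: build the predicate URI directly from the encoding."""
    def blank(indicator):
        return "b" if indicator.strip() == "" else indicator
    return f"{PROPERTY_NS}P{tag}{blank(ind1)}{blank(ind2)}{code}"


def record_to_triples(resource_uri, fields, value_uris=None):
    """Turn each subfield of a record into one (subject, predicate, object) triple.

    `value_uris` optionally maps (predicate, raw value) to a URI taken from a
    vocabulary or authority file; values not found there stay literal strings.
    """
    value_uris = value_uris or {}
    triples = []
    for field in fields:
        for code, value in field["subfields"]:
            predicate = property_uri(field["tag"], field["ind1"], field["ind2"], code)
            obj = value_uris.get((predicate, value), value)  # step 3
            triples.append((resource_uri, predicate, obj))
    return triples


# Step 1: a hypothetical URI for the resource described; made-up field values.
resource = "http://example.org/resource/1"
record = [
    {"tag": "010", "ind1": " ", "ind2": " ", "subfields": [("a", "0-000-00000-0")]},
    {"tag": "205", "ind1": " ", "ind2": " ", "subfields": [("b", "2nd issue")]},
]

# Step 4: "publishing" is reduced here to printing the triples.
for triple in record_to_triples(resource, record):
    print(triple)
```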
Recommendations: Foundation
• Approve the method of identifying UNIMARC
elements and vocabularies.
• Approve the pattern for namespaces for
UNIMARC/B and /A elements and vocabularies.
• Decide on initial creation and maintenance of
UNIMARC elements and vocabularies in the Open
Metadata Registry (OMR).
• Decide between re-using existing ISBD
namespaces for UNIMARC/B, or representing all
UNIMARC/B elements and linking them to existing ISBD
classes and properties as appropriate.
18/07/2015
Dunsire & Willer. UNIMARC and Linked
Data, IFLA 2011 San Juan, Puerto Rico
Recommendations: Foundation
• Investigate further whether to re-use existing
FRAD/FRBR and FRSAD namespaces, or to
represent all UNIMARC/A elements and link them to
existing FRAD/FRBR/FRSAD classes/subclasses
and properties as appropriate.
• Investigate further the appropriate classes for
UNIMARC/A in relation to UNIMARC/B,
FRAD/FRBR and FRSAD.
• Support and promote the translation of
UNIMARC classes and properties in national
languages.
Recommendations: Application
• Discuss and consider the requirements for
Application Profiles for UNIMARC.
• Check and verify the availability of SKOS
representations of other external vocabularies
used in UNIMARC.
• Investigate and verify internal UNIMARC
vocabularies for suitable SKOS representations;
consider approaching the owners of external
vocabularies to liaise on developing SKOS
representations.
Recommendations: Application
• Investigate further the “combinatorial explosion” of
UNIMARC properties; determine if some combinations
are invalid and do not require a separate property.
• Consider and approve the re-use of aggregated ISBD
elements which are represented in RDF using Syntax
encoding schemes (SES), which will avoid the need for
developing UNIMARC equivalents.
• Monitor relevant MARC21 developments, especially
the Bibliographic Framework Transition Initiative
recently announced by the Library of Congress.
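To make the "combinatorial explosion" in the first recommendation above concrete: if each property identifier is derived from tag, indicators and subfield, then every combination of indicator values and subfield codes under one tag is a candidate property. The sketch below counts them for an invented tag; the tag 999, its indicator values, and its subfields are hypothetical and do not describe any real UNIMARC field.

```python
from itertools import product


def candidate_properties(tag, ind1_values, ind2_values, subfields):
    """List one candidate property identifier per indicator/subfield combination."""
    return [
        f"P{tag}{i1}{i2}{sf}"
        for i1, i2, sf in product(ind1_values, ind2_values, subfields)
    ]


# Hypothetical field: 3 values for the 1st indicator, 2 for the 2nd, 4 subfields.
props = candidate_properties("999", ["b", "0", "1"], ["b", "1"], ["a", "b", "c", "d"])
print(len(props))  # 3 * 2 * 4 = 24 candidate properties for a single tag
```

If some indicator combinations are invalid for a field, filtering them out before generating identifiers would shrink this number, which is the question the recommendation raises.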
Thank you
• [email protected]
• [email protected]