Initiatives to make standard library metadata models and structures available to the Semantic Web

Gordon Dunsire, UK

Mirna Willer, HR

Presented at WLIC Session 149, Sun 15 Aug 2010, Gothenburg, Sweden

Overview

• I: Initiatives: IFLA initiatives (FRBR, ISBD, etc.) and their relation to external initiatives (RDA, linked data vocabularies such as VIAF and LCSH, etc.).
• II: Shift of focus: potential use of these initiatives to support the Semantic Web (parsing existing legacy records to create huge quantities of high-quality instance triples, the power of inferencing to create new triples, etc.), and the shift of cataloguing focus from record to statement (triple).

WLIC 2010, Gothenburg: Sun 15 August 2010

IFLA initiatives: Background

• IFLA’s initiatives to make the standard library metadata models, structures, and vocabularies it has developed available to the Semantic Web were initially stimulated by external projects:
  – RDA: Resource Description and Access
    • Data models meeting (London) with Dublin Core Metadata Initiative (DCMI), IEEE Learning Object Metadata (IEEE LOM), W3C Simple Knowledge Organization System (SKOS)

IFLA initiatives: Standards, models

• “Functional Requirements” family or “FRBR family of models”:
  – FRBR, 1998: Bibliographic Records [data]
  – FRAD, 2009: Authority Data
  – FRSAD, [2010]: Subject Authority Data
  – Preliminary work: the FRBR Namespace Project used the testing area of the National Science Digital Library (NSDL) Metadata Registry
    • Now the Open Metadata Registry
• ISBD XML in the RDF/XML environment

IFLA initiatives: Infrastructure

• 2009-2010: the IFLA Namespaces project is developing an administrative and technical infrastructure to support such initiatives and encourage uptake of standards by other agencies.
• Basic namespace: “iflastandards.info”
  – FRBR: “http://iflastandards.info/ns/fr/frbr/frbrer/” as the basis of the uniform resource identifiers (URIs) of each RDF class and property [entity & relationship] in the FRBR model
  – /frbrer/ to distinguish from FRBRoo [CIDOC CRM]
  – FRAD: “http://iflastandards.info/ns/fr/frad/”
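The namespace scheme above can be sketched as a small helper that expands a prefixed name into a full URI. This is only an illustration: the helper name `element_uri` and the dictionary layout are assumptions, while the two base URIs are the ones quoted on the slide and the identifier 1001 is one the slides use later for a FRBR class.

```python
# Base namespaces as quoted on the slide.
NAMESPACES = {
    "frbrer": "http://iflastandards.info/ns/fr/frbr/frbrer/",
    "frad": "http://iflastandards.info/ns/fr/frad/",
}


def element_uri(prefix: str, local_id: str) -> str:
    """Expand a prefixed name such as frbrer:1001 into a full URI.

    Hypothetical helper for illustration; not part of any IFLA tooling.
    """
    return NAMESPACES[prefix] + local_id


work_class_uri = element_uri("frbrer", "1001")
print(work_class_uri)
```

Keeping the "er" segment in the base URI means FRBRer and FRBRoo elements can never collide, even if they share a numeric identifier.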

IFLA initiatives: FR family

• Representation of the FRBRer model element set is mainly complete
  – FRAD and FRSAD close behind
• Representation in Resource Description Framework (RDF) is informing work on combining and consolidating the model family
  – Also supplies a “learning curve” for the Semantic Web environment

IFLA initiatives: ISBD RDF/XML

• FRBR is a conceptual model built on the E-R methodology, which is intrinsically applicable to representation in RDF, while ISBD is a data standard
• Design of the RDF representation of ISBD involves:
  – the treatment of aggregated statements in a defined number of elements within the areas;
  – the treatment of mandatory and optional elements and areas;
  – the order of areas and elements within an area;
  – the repeatability of areas and elements;
  – the treatment of punctuation and its double function.

Related standards: RDA

• DCMI RDA Task Group has three goals:
  – define RDA modelling entities as an RDF vocabulary of properties and classes;
  – identify in-line value vocabularies as candidates for publication in RDFS or SKOS [nearly completed];
  – develop a Dublin Core Application Profile for RDA based on FRBR and FRAD.
• The Task Group is using the Open Metadata Registry to develop RDF representations of the RDA vocabularies

Related standards: Other

• The National Library of Sweden has developed a methodology for representing MARC21 records in RDF and implemented it for LIBRIS, the Swedish Union Catalogue
• The Vocabulary Mapping Framework (VMF) project [funded by the UK Joint Information Systems Committee (JISC)]:
  – to develop a major expansion of the RDA/ONIX framework for resource categorization
  – to create a tool to support the automated mapping of vocabularies from metadata standards of use to the JISC community, which includes research, teaching, and learning
  – CIDOC CRM, FRAD, FRBR, MARC21 and RDA vocabularies [included] & ISBD and UNIMARC [represented]

Related standards: Vocabularies

• Instance values from terminologies (subject headings, classification captions and indexes, and thesauri) can be represented in RDF using SKOS:
  – Library of Congress Subject Headings (LCSH)
  – Faceted Application of Subject Terminology (FAST), Medical Subject Headings (MeSH), Form and genre headings for fiction and drama, and Thesaurus for Graphic Materials (TGM)
  – French RAMEAU subject headings
  – DDC Summaries
• Linked data: [set of] best practices for publishing and connecting structured data on the Web

Linked data initiatives

• UDC Consortium: published a selection of around 2,000 UDC classes in 16 languages online as the UDC summary (RDF version in development)
• Virtual International Authority File (VIAF): a set of linked controlled vocabularies (authority records of personal names) from national bibliographic agencies
• ISBD: prescribes vocabulary control for the data in Area 0 for content form and media type. Terms for the elements (content form, content qualification, and media type) are taken from closed lists

Linked data from catalogue records

• Most linked data initiatives involve vocabularies
• Linked data can also represent bibliographic descriptions
  – Huge quantities of high-quality bibliographic metadata are locked in catalogue records
    • UNIMARC, MARC21, EAD, etc.
• Use RDF “models” to parse the records into linked data

Disaggregating the metadata record into single statements

Record
  Record ID: 1234
  Author: Mirna Willer
  Title: “UNIMARC format for authority records”
  Date: “2004”

Statements
  1234 has Author Mirna Willer
  1234 has Title “UNIMARC format for authority records”
  1234 has Date “2004”
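The record-to-statements step above can be sketched in a few lines. This is a minimal illustration, assuming the record is held as a plain dictionary; the function name `disaggregate` and the "has <field>" property labels are hypothetical and simply mirror the wording on the slide.

```python
def disaggregate(record_id, record):
    """Split one record (field -> value) into single (subject, property, value) statements."""
    return [(record_id, f"has {field}", value) for field, value in record.items()]


# The example record from the slide.
record = {
    "Author": "Mirna Willer",
    "Title": "UNIMARC format for authority records",
    "Date": "2004",
}

statements = disaggregate("1234", record)
for statement in statements:
    print(statement)
```

Each statement now stands on its own: it can be stored, corrected, or linked without touching the other two.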

Representing a single statement as an RDF triple

Statement
  1234 has Title “UNIMARC format for authority records”

  [subject] URI = http://natlibx/1234
  [property] URI = http://.../???
  [object] literal = “UNIMARC format for authority records”

Triple
  natlibx:1234 some:??? “UNIMARC ...”

Property URIs

has Title = http://.../???

  ISBD: has Title Proper
    http://iflastandards.info/ns/isbd/elements/1004
  FRBR: has Title of the Manifestation
    http://iflastandards.info/ns/fr/frbr/frbrer/3020
  FRBR: has Title of the Expression
    http://iflastandards.info/ns/fr/frbr/frbrer/3008
  FRBR: has Title of the Work
    http://iflastandards.info/ns/fr/frbr/frbrer/3001
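Once a property URI is chosen, the title statement can be written out as a concrete triple. The sketch below serializes it as one N-Triples line. Choosing the ISBD "has Title Proper" property is an assumption made here for illustration (the slides leave the choice open), `http://natlibx/1234` is the placeholder subject URI from the slides, and `to_ntriples` is a hypothetical helper that handles only plain string literals.

```python
# Assumption: the cataloguer picks the ISBD "has Title Proper" property.
ISBD_TITLE_PROPER = "http://iflastandards.info/ns/isbd/elements/1004"


def to_ntriples(subject_uri, property_uri, literal):
    """Serialize a (URI, URI, plain literal) triple as a single N-Triples line."""
    # Escape backslashes first, then double quotes, as N-Triples requires.
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject_uri}> <{property_uri}> "{escaped}" .'


line = to_ntriples(
    "http://natlibx/1234",  # placeholder subject URI from the slide
    ISBD_TITLE_PROPER,
    "UNIMARC format for authority records",
)
print(line)
```

Swapping in one of the FRBR title properties instead changes only the middle URI, but it also changes what the subject is claimed to be (a Manifestation, Expression, or Work).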

Inferring new triples from existing triples

An RDF property can have a domain (the type of thing the property is applied to) and a range (the type of thing that can be a value of the property).

Example: FRBR property “is created by (person)” (frbrer:2009) has domain Work (frbrer:1001) and range Person (frbrer:1005)

  natliby:456 frbrer:2009 viaf:21647077

  natliby:456 rdf:type frbrer:1001
  viaf:21647077 rdf:type frbrer:1005

Therefore natliby:456 is a Work, and viaf:21647077 is a Person
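The domain and range reasoning above can be sketched without an RDF library, using plain tuples for triples. This is a simplified illustration of the two relevant RDFS entailment rules, not a full reasoner; the `DOMAIN` and `RANGE` lookup tables and the function name `infer_types` are assumptions introduced for the example.

```python
RDF_TYPE = "rdf:type"

# Domain and range of "is created by (person)", as stated on the slide.
DOMAIN = {"frbrer:2009": "frbrer:1001"}  # Work
RANGE = {"frbrer:2009": "frbrer:1005"}   # Person


def infer_types(triples):
    """Return the rdf:type triples implied by property domains and ranges."""
    inferred = set()
    for subject, prop, obj in triples:
        if prop in DOMAIN:
            # The subject of the property must belong to its domain class.
            inferred.add((subject, RDF_TYPE, DOMAIN[prop]))
        if prop in RANGE:
            # The object of the property must belong to its range class.
            inferred.add((obj, RDF_TYPE, RANGE[prop]))
    return inferred - set(triples)


existing = {("natliby:456", "frbrer:2009", "viaf:21647077")}
new_triples = infer_types(existing)
for triple in sorted(new_triples):
    print(triple)
```

One asserted triple yields two new ones, which is the sense in which inferencing multiplies the value of converted legacy records.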

Linking triples

Statement
  1234 has Author Mirna Willer
  [object] URI = viaf:29776655

Triple
  natlibx:1234 some:123 viaf:29776655
Another
  viaf:29776655 [is Author of] natliby:456
and
  natliby:456 frbrer:2009 viaf:21647077
and
  viaf:21647077 foaf:name “Dunsire, Gordon”

Q: Who is a co-author with Mirna Willer? A: “Dunsire, Gordon”
Q: Are they persons? A: Yes
Q: Really? A: VIAF & natliby say so!
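The co-author question above can be answered by following links across the four triples. The sketch below does this with plain tuples; `ex:isAuthorOf` is a hypothetical stand-in for the bracketed "[is Author of]" property, `some:123` is the placeholder property from the slide, and the function names are assumptions.

```python
# The four triples from the slide, drawn from different sources.
triples = {
    ("natlibx:1234", "some:123", "viaf:29776655"),
    ("viaf:29776655", "ex:isAuthorOf", "natliby:456"),  # hypothetical property name
    ("natliby:456", "frbrer:2009", "viaf:21647077"),
    ("viaf:21647077", "foaf:name", "Dunsire, Gordon"),
}


def objects(subject, prop):
    """All values of a property for a given subject."""
    return {o for (s, p, o) in triples if s == subject and p == prop}


def co_author_names(person):
    """Names of other creators of any work the person is an author of."""
    names = set()
    for work in objects(person, "ex:isAuthorOf"):
        for creator in objects(work, "frbrer:2009"):
            if creator != person:
                names |= objects(creator, "foaf:name")
    return names


print(co_author_names("viaf:29776655"))
```

No single source holds the answer; it only emerges because the shared VIAF and natliby identifiers let the triples join up.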

Metadata focus

Shift of focus of metadata creation, maintenance, storage, preservation (by professionals, amateurs, machines)

From Record
To Statement(s) = triple(s)

But metadata display aggregates triples (from multiple sources) to create records on the fly
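Building a display record on the fly is the reverse of disaggregation: collect every triple about one subject, from however many sources, and group the values by property. A minimal sketch follows, assuming each source is just a list of tuples; the function name `build_record`, the two source lists, and the "has ..." property labels are hypothetical.

```python
from collections import defaultdict


def build_record(subject, *sources):
    """Aggregate all triples about one subject into a property -> values record."""
    record = defaultdict(list)
    for source in sources:
        for s, prop, value in source:
            if s == subject:
                record[prop].append(value)
    return dict(record)


# Two hypothetical sources holding statements about the same resource.
source_a = [
    ("1234", "has Title", "UNIMARC format for authority records"),
    ("1234", "has Date", "2004"),
]
source_b = [
    ("1234", "has Author", "Mirna Willer"),
    ("5678", "has Date", "1998"),
]

display_record = build_record("1234", source_a, source_b)
print(display_record)
```

The record exists only for the moment of display; the stored and maintained units remain the individual statements.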

Thank you
