Linked Library Data Modeling Metadata for the [Semantic] Web Presented 2010-11-19 Columbia University Digital Library Seminar Series Corey A Harper.


Linked Library Data
Modeling Metadata for the [Semantic] Web
Presented 2010-11-19
Columbia University
Digital Library Seminar Series
Corey A Harper
Topical Overview
• Semantic Web Intro
• Linked Open Data
– Graphs: Entity – Attribute – Value
– A Few Examples
• Library Data
2010-11-19
Harper - Linked Library Data - Columbia University
Topical Overview (cont)
• Linked Library Data
– SKOS and Authority Control
– FRBR and Bibliographic Data
– National Libraries
• Resource Description and Access (RDA)
• Dublin Core Metadata Initiative
Semantic Web
• TBL’s original vision
– “Weaving the Web” – 1999
• Then: Focus on Machine Reasoning
– Scientific American Article
• Now: Focus on things & links
– Reasoning becoming lower level
Semantic Web
• Originally:
– Metadata standard built on XML
– Metadata about “Web” things
• Eventually:
– Metadata about all things
– Metadata about relationships between things
Semantic Web Terminology
• Resource: Any thing
• Class: Abstraction of a type of thing
• Individual: An instance of a class
• Property: An attribute of an individual
• Ontology: A domain-specific collection of classes and properties
• Statement/Triple:
– A Resource (subject) - Nodes
– A Property (predicate) - Arcs
– A Value (object) - Nodes
Semantic Web Terminology
• Graphs: Representations of statements about resources
• Nodes: The Subjects and Objects in a Graph
• Arcs: The Predicates in a Graph
• Literals: “Objects” represented as strings (constant values) rather than things (URI References)
• Domains and Ranges: Constraints on Nodes
• For Example…
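As a concrete illustration of these terms, here is a minimal Python sketch of a two-statement graph. It is not from the talk: the `Triple` type, the `describe` and `literals` helpers, and the example.org URI are hypothetical names chosen for illustration, while `rdf:type`, `foaf:name` and `foaf:Person` are existing vocabulary terms.

```python
from typing import NamedTuple, Union


class URIRef(str):
    """A thing (node or arc) identified by a URI reference."""


class Literal(str):
    """An object given as a constant string value rather than a thing."""


class Triple(NamedTuple):
    subject: URIRef               # a resource (node)
    predicate: URIRef             # a property (arc)
    obj: Union[URIRef, Literal]   # a value (node): URI reference or literal


EX = "http://example.org/"  # hypothetical namespace, for illustration only
RDF_TYPE = URIRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
FOAF_NAME = URIRef("http://xmlns.com/foaf/0.1/name")
FOAF_PERSON = URIRef("http://xmlns.com/foaf/0.1/Person")

# A graph is simply a set of statements.
graph = {
    # An individual that is an instance of the class foaf:Person
    Triple(URIRef(EX + "person/1"), RDF_TYPE, FOAF_PERSON),
    # The same individual with a literal-valued property
    Triple(URIRef(EX + "person/1"), FOAF_NAME, Literal("Corey A Harper")),
}


def describe(graph, subject):
    """Return every (predicate, object) pair stated about a subject."""
    return sorted((t.predicate, t.obj) for t in graph if t.subject == subject)


def literals(graph):
    """Return the objects that are literals rather than URI references."""
    return sorted(t.obj for t in graph if isinstance(t.obj, Literal))
```

Both statements share a subject node, so `describe(graph, URIRef(EX + "person/1"))` returns two arcs, only one of which ends in a literal.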
RDF
• Resource Description Framework
• Formally Begun in 1999
• Ideas from 1995
• Finalized in 2004
• Frighteningly complex at times…
– “Directed Labeled Graphs”
SemWeb Value Proposition
• Formally Modeled (Meta) Data
• Formal Semantics Declaration
• Increased Granularity compared to record-based Metadata
• Improved Interoperability
“The vast bulk of data to be on the
Semantic Web is already sitting in
databases … all that is needed [is] to
write an adapter to convert a particular
format into RDF and all the content in
that format is available.”
-Tim Berners-Lee
in an interview with the
Consortium Standards Bulletin
Linked Open Data
• Use URIs as names for things.
• Use HTTP URIs so that people can look up those names.
• When someone looks up a URI, provide useful information.
• Include links to other URIs, so that they can discover more things.
http://www.w3.org/DesignIssues/LinkedData.html
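One common way to "look up" such a name is an ordinary HTTP GET that asks for an RDF representation through the Accept header. The sketch below only builds the request and does not send it. The `rdf_request` helper is a hypothetical name, the DBpedia URI is an illustrative choice, and whether a given server honours the requested media type is an assumption that has to be checked per service.

```python
from urllib.request import Request


def rdf_request(uri: str) -> Request:
    """Build (but do not send) an HTTP GET that asks for RDF/XML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


# Illustrative URI for a "thing"; any HTTP URI naming a resource would do.
req = rdf_request("http://dbpedia.org/resource/Columbia_University")

print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))

# To actually dereference the URI (requires network access):
#   from urllib.request import urlopen
#   body = urlopen(req).read()
```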
Linked Data Cloud
• Automated generation
– Comprehensive Knowledge Archive Network (CKAN)
– Vocabulary of Interlinked Datasets (voiD)
– Basically, catalog your metadata!
• Recent criticism: data quality
Data in the Cloud
• Hubs in the May 2008 Version:
– FOAF
– DBpedia
– Geonames
– MusicBrainz
• Myriad Sources coming online:
– Thomson Reuters
– New York Times
– British Broadcasting Corporation
– Google and Facebook
– More and More Library Data
DBpedia
• Structured Wikipedia Data
• Genres, Influences, External Links
• Multi-lingual / Multi-script labels
• Rich Semantics
• Many linkages to other datasets
DBpedia
• 3.4 Million “things” described
• Ontology based on “infoboxes”
– 1.5 million things classified
• Approx. 50,000 “Properties”
– Approx. 1,200 defined in ontology
• Brief Example
Domain Modeling
• Starting from application / goal / function
“To guide and evaluate our designs, we need
objective criteria that are founded on the
purpose of the resulting artifact, rather than
based on a priori notions of naturalness or
Truth.” – Gruber, 1993
• Does this apply to Libraries? FRBRer?
DBPedia Model
• Partial basis in data entry conventions
• Infoboxes and Infobox Templates
• Metadata Entry Format
• Partial source of Ontology
– Class Structure
– Vocabulary Design
DBpedia
• 3.4 Million “things” described
• Ontology based on “infoboxes”
– 1.5 million things classified
– http://wiki.dbpedia.org/Ontology
• Approx. 50,000 “Properties”
– Approx. 1,200 defined in ontology
More Examples
• British Broadcasting Corporation
– Programmes, Music, Wildlife
• Google Refine
• Data.gov and data.gov.uk
• NY Times
What *things* are in our data???
…Library data is extremely complicated
Bibliographic Data
• Rich stores of MARC, MODS, &c.
• Robust Controlled Vocabularies
– Subject Heading lists
– Code lists
– Thesauri
• Emerging data model in FR*
Bibliographic Vocabs
• Bibliographic Ontology
– Zotero, Omeka, EPrints and Others
• FRBR – unofficial
– And now Official (Thank you IFLA!)
• ISBD
Library Authority Data
“Include links to other URIs. so that they
can discover more things.”
Short of providing and linking to URIs,
this *is* authority data.
This is what our authority files are for.
Library Controlled Vocabularies: Benefits
• Reputation - Trusted Tradition
• Mature - Time tested and carefully developed
• General & Comprehensive - Cover large knowledge spaces
SKOS
• Simple Knowledge Organization System
• Properties and Classes for describing Controlled Vocabulary
[Diagram labels: skos:primaryTopic, RDF Page, skos:person]
LCSH in Dublin Core
• Encoding Scheme for DC Subject
• No easy way to draw on equivalent terms and cross-references
• Abstract Model, RDF and SKOS could enable applications to make use of the whole vocabulary
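A rough Python sketch of what "making use of the whole vocabulary" could look like once a heading is expressed in SKOS. The `skos:prefLabel`, `skos:altLabel` and `skos:broader` properties are real SKOS terms. The example.org URIs, the sample labels, and the `find_concepts` and `broader_labels` helpers are assumptions invented for illustration and are not taken from the actual LCSH data.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/subjects/"  # hypothetical vocabulary namespace

# (subject, predicate, object) statements; the labels are illustrative only.
triples = [
    (EX + "www", SKOS + "prefLabel", "World Wide Web"),
    (EX + "www", SKOS + "altLabel", "WWW"),           # equivalent term
    (EX + "www", SKOS + "broader", EX + "hypertext"),  # cross-reference
    (EX + "hypertext", SKOS + "prefLabel", "Hypertext systems"),
]


def find_concepts(term):
    """Match a search term against preferred and alternative labels."""
    label_props = {SKOS + "prefLabel", SKOS + "altLabel"}
    return sorted({s for s, p, o in triples
                   if p in label_props and o.lower() == term.lower()})


def broader_labels(concept):
    """Follow skos:broader and return the preferred labels of the parents."""
    parents = {o for s, p, o in triples
               if s == concept and p == SKOS + "broader"}
    return sorted(o for s, p, o in triples
                  if s in parents and p == SKOS + "prefLabel")
```

With a plain string encoding scheme, a search for "WWW" would miss records indexed under the preferred heading. Here `find_concepts("WWW")` resolves to the same concept as the preferred label, and `broader_labels` exposes its cross-reference.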
LCSH as a Web Service!
• Uses principles of linked data
• lcsh.info -> id.loc.gov
• People noticed when taken down
• Links to French Subject Headings
• URIs for Literal String lookup
• http://id.loc.gov/authorities/label/World Wide Web
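The label lookup pattern above contains spaces, which have to be percent-encoded before the URI is used in an HTTP request. A small sketch, assuming only the base path shown on the slide; the `label_uri` helper is a hypothetical name, and how the service responds to the request is not shown here.

```python
from urllib.parse import quote

LABEL_BASE = "http://id.loc.gov/authorities/label/"  # base path from the slide


def label_uri(heading: str) -> str:
    """Turn a literal heading string into a percent-encoded lookup URI."""
    return LABEL_BASE + quote(heading)


print(label_uri("World Wide Web"))
# prints http://id.loc.gov/authorities/label/World%20Wide%20Web
```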
Other Vocabularies
• Thesaurus for Economics
• French Subject Headings
• Swedish Subject Headings
• IconClass (not on web yet)
• OCLC Terminology Services
• Dewey Decimal Classification
• Virtual International Authority File
Linked Library Data
• VIAF, LCSH, MARC Codes
• Open Library, XC, Kuali OLE
• Library of Congress, OCLC
• Hungarian, German, British, Swedish National Libraries
• Formalized Efforts: W3C, IFLA & RDA
Image courtesy of Martin Malmsten
http://blog.libris.kb.se/semweb/?p=7
Kungliga Biblioteket
National Széchényi Library
“Our RDFDC, FOAF and SKOS
statements are linked together. Our
name authority is matched with the
DBPedia name files and URI aliases are
handled as owl:sameAs statements.” - Adam Horvath
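To illustrate how URI aliases expressed as owl:sameAs statements can be consumed, the following sketch groups URIs into clusters that name the same thing, using a small union-find. This is not the National Széchényi Library's implementation. The `same_as_clusters` function is a hypothetical helper and the example.org, example.com and example.net URIs are placeholders; only the owl:sameAs property URI is real.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Placeholder statements: two local name authorities matched to external URIs.
statements = [
    ("http://example.org/names/1", OWL_SAME_AS, "http://example.com/authority/a"),
    ("http://example.com/authority/a", OWL_SAME_AS, "http://example.net/person/x"),
    ("http://example.org/names/2", OWL_SAME_AS, "http://example.com/authority/b"),
]


def same_as_clusters(triples):
    """Group URIs connected by owl:sameAs into sorted alias clusters."""
    parent = {}

    def find(uri):
        # Locate the representative of a URI's cluster (with path halving).
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent[find(s)] = find(o)  # merge the two clusters

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return sorted(sorted(c) for c in clusters.values())


clusters = same_as_clusters(statements)
```

Because owl:sameAs is symmetric and transitive, the first two statements place three URIs in one cluster even though the first and third are never linked directly.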
W3C LLD XG
• “Incubator Group”
• Membership:
– Researchers, Consultants, Librarians
– National Libraries: Germany, France, LoC, Sweden
– OCLC & IFLA
W3C LLD XG Goals
• Collecting, Curating and Clustering over 50 Use Cases
• Mining use cases for functional requirements and design patterns
• Recommendations to W3C
– Should lead to Working Groups
• RDA elements, roles and vocabularies have been provisionally registered
• IFLA FRBRer and ISBD elements and vocabularies have been officially registered
• Discussions about long term maintenance of both RDA and the vocabularies
• Effort to create multi-language RDA Vocabularies
RDA Slides Adapted from Diane Hillmann
RDA Development
RDA Elements Listing
334!
RDA Elements Listing
Base material
334!
Detail: Base Material
Detail: Base Material
URI
RDA Base Material Vocabulary
RDA WEMI Relationships
Detail: RDA WEMI Relationship
Metadata Registries
• Formerly NSDL Registry
– Now “Open Metadata Registry”
– Managing Vocabularies
– Providing Vocabulary Services
• DCMI Registry Community
• DCMI Architecture Forum
DCMI and the Semantic Web
• Collaboration from the start
• Libraries (esp. OCLC) were at the table
• Perception of DCMI as DCMES
– DCMI = Metadata Vocab / Framework
– DCMES = Metadata Record Format
DCMI and the Semantic Web
• Every example above had dcterms
• DCMI as Research Institute and Metadata Think Tank
– Modeling Work
– Metadata Registries
– Application Profiles
– Description Set Profiles
– Singapore Framework
Changing Role of DCMI
• Mike Bergman at DC2010:
– Reference Metadata
– Reference Concepts
– Mapping Predicates
• “Mappings should be approximate”
– Usage Guidelines
• Complement to W3C Standards
Why Does This Matter?
Our descriptions no longer stand alone!
Connect our data with the rest of the WEB
Allow others to reuse more easily
– FOAF
– DBpedia
– Geonames
– MusicBrainz
– New York Times
– Thomson Reuters
– Government Data - data.gov
– British Broadcasting Corporation
Conclusions
• Distributed bibliographic control environment
– Linking Data
– Focus on identification over description
• “In short, by treating values as nonliteral resources and assigning URIs to
them we give ourselves (and others)
the hooks on which to hang further
descriptions.” - Andy Powell
Endless possibilities
• This barely scratches the surface
• The Giant Global Graph!!
• With more soundly modeled bibliographic and authority data…
– Mashups
– Web Services
– User Profiling
– Collaboration tools
– Terminology Services
– Context sensitive interfaces
– Customized Exhibits
Continuing Challenges
• Emerging Technology
• Design Patterns
• Complexity (httpRange-14)
• Existing Technical Infrastructure
• Bootstrapping
• Business Cases
More Information
• W3C LLD XG:
http://www.w3.org/2005/Incubator/lld/wiki/Main_Page
• ALA LLD Interest Group:
– http://kcoyle.net/lld-ala.html
• IFLA Semantic Web SIG
– https://wiki.d-nb.de/x/vA10Ag
Thanks!
[email protected]
212.998.2479
Questions?