Linked Data
A Personal Perspective
Janifer Gatenby
OCLC EMEA
With acknowledgements to Richard Wallis and Anila Angjeli
The world’s libraries. Connected.
• What is it?
• What does it promise?
• How do we get there?
• What happens when we get there?
What is it?
• Not really a new way of linking but a new way of expressing a link
“It is about using canonical trusted globally referenceable identifiers for concepts, people, organisations, locations etc. instead of copying text strings and losing the connection with the authoritative sources they came from.”
Richard Wallis
MARC21 links
• 700 10 $a name $e role $0 authority control number
• (added entry in a MARC record for a name related to a work, not the main author)
These familiar links reference an authority record in the same database as the bibliographic record, and hence have no address portion. Linked data extends the linking range.
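The step from a bare control number to a link with an address portion can be sketched in a few lines of Python. This is only an illustration: it assumes a simplified "$"-delimited text rendering of the 700 field, the helper names are hypothetical, the name and role values are placeholders, and the base address and control number are borrowed from the id.loc.gov example used in this talk.

```python
def parse_subfields(field_text):
    """Split a '$'-delimited MARC field rendering into {code: value}.

    Repeated subfield codes would overwrite each other in this sketch.
    """
    subfields = {}
    # Everything before the first '$' is the tag and indicators; skip it.
    for chunk in field_text.split("$")[1:]:
        code, value = chunk[0], chunk[1:].strip()
        subfields[code] = value
    return subfields


def control_number_to_uri(control_number, base):
    """Prefix a bare authority control number with an address portion."""
    return base.rstrip("/") + "/" + control_number


# Hypothetical field: "name" and "role" are placeholders, not real data.
field = "700 10 $a name $e role $0 nr89009099"
subfields = parse_subfields(field)
uri = control_number_to_uri(subfields["0"], "http://id.loc.gov/authorities/names")
print(uri)
```

The $0 value alone only makes sense inside one database; once the address is attached, the same identifier can be followed from anywhere.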
Extending the linking range: URI
• URI – immutable address as well as an identifier
• http://id.loc.gov/authorities/names/nr89009099
• http://viaf.org/viaf/116774723
• http://isni-url.oclc.nl/isni/000000114556841
9 NACO libraries –
LC,
National Agricultural Library,
National Library of Medicine,
British Library,
NL Mexico,
NLNZ,
NL Scotland,
NL South Africa,
NL Wales
Extending the linking range: RDF
• RDF – metadata is expressed in triples
• Data
• Data label (properties)
• Vocabulary from which the label comes (gives context to the label)
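A minimal Python sketch of the idea, using plain tuples rather than an RDF library. The subject URI under example.org is hypothetical, and so is the helper name; the FOAF namespace and the two name forms are the ones that appear in the VIAF example in this talk.

```python
from collections import namedtuple

# A triple: the thing described, the data label (property), and the data.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

FOAF = "http://xmlns.com/foaf/0.1/"      # vocabulary that supplies the labels
PERSON = "http://example.org/person/1"   # hypothetical subject URI

triples = [
    Triple(PERSON, FOAF + "name", "De Groot, Gerard J., 1955-"),
    Triple(PERSON, FOAF + "name", "DeGroot, Gerard J., 1955-"),
]


def values_for(triples, subject, predicate):
    """Return every data value recorded for one subject and one label."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


names = values_for(triples, PERSON, FOAF + "name")
print(names)
```

Because the label is itself a URI inside the FOAF vocabulary, "name" is never ambiguous: the vocabulary gives it its context.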
Linked Data Principles
1. Use URIs as names for things
2. Use HTTP URIs so people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF)
4. Include links to other URIs, so that they can discover more
Tim Berners-Lee - 2006
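Principles 2 and 3 amount to an ordinary HTTP lookup that asks for data instead of a web page. The sketch below uses Python's standard urllib; it assumes that the server honours an Accept header of application/rdf+xml, which is not confirmed in this talk, and the function names are hypothetical. Only the request is built here; the actual lookup needs network access.

```python
import urllib.request


def build_lookup(uri, media_type="application/rdf+xml"):
    """Build an HTTP GET request that asks for RDF rather than an HTML page."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def look_up(uri):
    """Dereference the URI and return the response body (needs network access)."""
    with urllib.request.urlopen(build_lookup(uri), timeout=10) as response:
        return response.read()


# Build (but do not send) a lookup for the VIAF URI shown earlier.
request = build_lookup("http://viaf.org/viaf/116774723")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```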
Vocabularies
• Vocabularies are not schemas, they are lists of defined data labels (concepts)
• Schema.org (search engines)
• BibFrame (library community)
• FOAF (Friend of a Friend)
• OWL (e.g. owl:sameAs)
• Vocabularies can be mixed
foaf:name "Jimmy Wales" ;
foaf:mbox <mailto:[email protected]> ;
foaf:homepage <http://www.jimmywales.com/> ;
foaf:nick "Jimbo" ;
What does it promise?
• Enriched displays without data maintenance
• Better harvesting and ranking
• because of markup
• and because of links
• Navigation to pages with additional information
– Example: from VIAF via ISNI to encyclopaedias, rights management societies (digitisation rights), Bowker (biographies from fly leaves)
Interconnecting French cultural heritage treasures on the Web
[Diagram: BnF sources (Main catalogue in MARC, Archives and Manuscripts catalogue in EAD, digital documents in DC, other BnF resources) and external resources are brought together with Semantic Web techniques (modeling, matching, clustering, alignments) to produce web pages for Internet users and raw data for machines.]
Example
• ISNI 0000 0001 2283 1567 (soon)
• BnF persistent ID
• Links
• Imported from Wikipedia and integrated in the page
• Vocabularies used: existing ones + others defined for the specific needs of the project
• Data can be downloaded
• Information about the data model (or ontology) at: http://data.bnf.fr/about-en
How do we get there?
DNB CultureGraph
• “It’s all about creating connections”
• DDC to RVK (German classification) by comparing search results
• GND (names) to German Wikipedia
Example VIAF
• Ingesting data to compare and create links
• Makes clusters; cluster identifier
• Ingesting preferred to external linking
• Wikipedia, ISNI, WorldCat identities
• More data used for clustering, so more reliable
• VIAFBot for making reciprocal links in Wikipedia / Wikidata
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
<rdf:type rdf:resource="http://rdvocab.info/uri/schema/FRBRentitiesRDA/Person"/>
<foaf:name>De Groot, Gerard J., 1955-</foaf:name>
<foaf:name>DeGroot, Gerard J., 1955-</foaf:name>
<rdaGr2:dateOfBirth>1955-06-22</rdaGr2:dateOfBirth>
<owl:sameAs rdf:resource="http://data.bnf.fr/ark:/12148/cb12299846b#foaf:Person"/>
<owl:sameAs rdf:resource="http://www.idref.fr/034977651/id"/>
<owl:sameAs rdf:resource="http://d-nb.info/gnd/12422900X"/>
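A consumer can pull the name forms and the reciprocal links out of such a cluster with standard XML tooling. The following is a minimal Python sketch, not a description of how VIAF itself is processed: the fragment is wrapped in an rdf:Description without a subject URI so that it parses on its own, and the namespace URI bound to rdaGr2 is an assumption.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
OWL = "http://www.w3.org/2002/07/owl#"

# The fragment from the slide, wrapped so that it is well-formed XML.
# Assumption: the rdaGr2 namespace URI. The wrapper has no subject URI.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdaGr2="http://rdvocab.info/ElementsGr2/">
  <rdf:Description>
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>De Groot, Gerard J., 1955-</foaf:name>
    <foaf:name>DeGroot, Gerard J., 1955-</foaf:name>
    <rdaGr2:dateOfBirth>1955-06-22</rdaGr2:dateOfBirth>
    <owl:sameAs rdf:resource="http://data.bnf.fr/ark:/12148/cb12299846b#foaf:Person"/>
    <owl:sameAs rdf:resource="http://www.idref.fr/034977651/id"/>
    <owl:sameAs rdf:resource="http://d-nb.info/gnd/12422900X"/>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(DOC)

# Every name form recorded for the cluster.
names = [el.text for el in root.iter("{%s}name" % FOAF)]

# Every external identifier the cluster is declared equivalent to.
same_as = [el.get("{%s}resource" % RDF) for el in root.iter("{%s}sameAs" % OWL)]

for uri in same_as:
    print(uri)
```

Each owl:sameAs target is a link that a local catalogue can inherit without maintaining the data behind it.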
Text Rights
Music Rights
Trade Sources
Encyclopaedias
Libraries
Researchers & Professional
7 million NEW LINKS to & from VIAF
Source group                                      bnf      dnb         lc      nta    nukat      wkp   All VIAF
Text Rights Sources (123,964 assigned)         37.383   25.177     72.960   83.498   32.184   14.935    406.178
Research & Professional (404,272 assigned)     24.141   14.688     76.986   30.526   16.730    3.465    223.305
Music Sources (189,000 assigned)               27.542   33.997     38.218   13.560    8.675   19.700    207.231
Trade Sources (2.4 million assigned)          570.224  384.230  2.138.955  741.671  442.037  138.636  6.100.349
Totals                                        659.290  458.092  2.327.119  869.255  499.626  176.736  6.937.063
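The totals row can be checked against the four source groups with a short Python sketch. It reads the full stops in the figures above as thousands separators; the variable names are hypothetical.

```python
COLUMNS = ["bnf", "dnb", "lc", "nta", "nukat", "wkp", "All VIAF"]

# Figures transcribed from the table, one list per source group.
ROWS = {
    "Text rights": [37383, 25177, 72960, 83498, 32184, 14935, 406178],
    "Research & professional": [24141, 14688, 76986, 30526, 16730, 3465, 223305],
    "Music": [27542, 33997, 38218, 13560, 8675, 19700, 207231],
    "Trade": [570224, 384230, 2138955, 741671, 442037, 138636, 6100349],
}
TOTALS = [659290, 458092, 2327119, 869255, 499626, 176736, 6937063]

# Sum each column across the four source groups.
computed = [sum(column) for column in zip(*ROWS.values())]

for name, value, stated in zip(COLUMNS, computed, TOTALS):
    print(name, value, "matches" if value == stated else "differs")
```

Every column sums to the stated total, and the "All VIAF" total of 6.937.063 is the "7 million" of the slide title.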

Linked Data: isni-url.oclc.nl/isni/
ISNI – an identifier
• Identifiers seal uniqueness: “n” number of other elements are necessary for uniqueness
• Stable identifier; stable metadata:
• assigned where there is confidence in the quality and completeness of the metadata to establish uniqueness
• ISNI system + Quality Team (BL & BnF)
Linking erroneous data propagates errors.
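One inexpensive guard against propagating a mistyped identifier is to verify its form before a link is made. The sketch below is in Python and relies on the fact that an ISNI is 16 characters long and ends in an ISO 7064 MOD 11-2 check character; the function names are hypothetical, and a passing check says nothing about whether the metadata behind the identifier is right.

```python
def isni_check_character(first15):
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for digit in first15:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_isni(isni):
    """True if the string has 16 characters and a correct check character."""
    compact = isni.replace(" ", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return isni_check_character(compact[:15]) == compact[15]


# The ISNI quoted in the data.bnf.fr example passes the check.
valid = is_well_formed_isni("0000 0001 2283 1567")
# Swapping the last two digits is caught.
transposed = is_well_formed_isni("0000 0001 2283 1576")
print(valid, transposed)
```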
Links are made once and inherited, e.g. by local catalogues
• URI – immutable address as well as an identifier
• http://id.loc.gov/authorities/names/nr89009099
• http://viaf.org/viaf/116774723
• http://isni-url.oclc.nl/isni/000000114556841
9 NACO libraries –
Library of Congress,
National Agricultural Library,
National Library of Medicine,
British Library,
NL Mexico,
NLNZ,
NL Scotland,
NL South Africa,
NL Wales
What happens when we get there?
• Search happens mostly in the search engines
• Library catalogue concentrates on:
• Being linked to (& linking out)
• Delivery, particularly of the digitised and immediate
What happens when we get there?
• How do search and linked data interact?
• Is search really fully delegated to search engines & larger union catalogues?
Types of search
Search type       Happening in
Known item        Search engines, also in more specific sources where expected to reduce noise
Subject search    Search engines, also in more specific sources
Index browse      In catalogues
Follow a link     Everywhere. In library catalogues from a full record display.
The more your catalogue is linked in, the more likely it is to attract all types of searches
Links plus data needed in catalogues
“It is about using canonical trusted globally referenceable identifiers for concepts, people, organisations, locations etc. instead of copying text strings and losing the connection with the authoritative sources they came from.”
This doesn’t mean that you only need the links; you often also need to ingest the data
• Data needed
– For making indexes
– For comparisons, e.g. for deduplication
– Data mining
Besides, data storage is no longer the constraint it once was
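Why the link alone is not enough can be shown with a small Python sketch. The three local records and all function names are hypothetical; the URI and the two name forms are reused from the VIAF example. The browse index and the name comparison both need the ingested text, while grouping by link needs only the URI.

```python
from collections import defaultdict

# Hypothetical local records: each keeps the link (URI) and the ingested name.
records = [
    {"id": "rec1", "uri": "http://d-nb.info/gnd/12422900X",
     "name": "De Groot, Gerard J., 1955-"},
    {"id": "rec2", "uri": "http://d-nb.info/gnd/12422900X",
     "name": "DeGroot, Gerard J., 1955-"},
    {"id": "rec3", "uri": None, "name": "DeGroot, Gerard J., 1955-"},
]


def browse_index(records):
    """Sorted name index: only possible because the names were ingested."""
    return sorted({r["name"] for r in records})


def group_by_link(records):
    """Records that share a URI describe the same entity."""
    groups = defaultdict(list)
    for r in records:
        if r["uri"]:
            groups[r["uri"]].append(r["id"])
    return dict(groups)


def normalise(name):
    """Lower-case the name and drop spaces and punctuation for comparison."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def suggest_links(records):
    """Compare ingested names to propose a URI for records without a link."""
    known = {normalise(r["name"]): r["uri"] for r in records if r["uri"]}
    return {r["id"]: known.get(normalise(r["name"]))
            for r in records if not r["uri"]}


index = browse_index(records)
groups = group_by_link(records)
suggestions = suggest_links(records)
print(index, groups, suggestions)
```

A suggestion made this way is only a candidate: as noted for ISNI, linking erroneous data propagates errors, so a real system would compare more data before accepting it.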
Richard Wallis: Further Reading
• http://www.slideshare.net/tulipbiru64/the-singlepower-of-link-richard-wallis
• http://www.slideshare.net/rjw/linked-data-andoclc