SWIB 2012 Linked Open Library Data in Practice: Lessons Learned and Opportunities for data.bnf.fr
Romain Wenz, Curator, Bibliothèque nationale de France, Département de l’information bibliographique et numérique
What it looks like
Web pages about Authors, Works, Subjects. Gathering information from: library records (12 million at BnF), archive materials, digital objects (2 million at BnF: Gallica).
Part I
The purpose and difficulties: build Web pages about writers, books and subjects, linking to all resources in the library, in a completely automatic way.
Example
Information about Cicero, http://data.bnf.fr/11885977/ciceron/: most studied books, editions of these books, digitized books, and activities such as translations by Cicero.
Grouping by « Works »
http://data.bnf.fr/11952658/dante_alighieri_la_divine_comedie/ Manuscripts, editions, digital books.
About a « theme »
Books about swimming: http://data.bnf.fr/12647518/natation/
Several formats
MARC catalogues; XML-EAD archives and manuscripts; Dublin Core digital library. Authorities: Persons and Organisations, Works (Uniform titles), Subject Headings.
Several structures
Library records: flat structure. Archival fonds: hierarchical structure and heritage. Digital content that can be processed: tables of contents, OCR.
Purpose: info about concepts
Pages for humans, structure for machines.
Links and authorities
ARK identifiers from authorities. Materials for matching: dates; preferred and alternative labels; graph of links (relations, roles).
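A minimal sketch of how dates and preferred or alternative labels could be combined to match a name against authority records. The `Authority` class, the normalisation rules, and the ARK values are assumptions made for illustration; the slides do not describe the actual matching code used for data.bnf.fr.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import List, Optional


def normalize(label: str) -> str:
    """Lowercase, strip accents and punctuation so label variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


@dataclass
class Authority:
    ark: str                      # placeholder identifier, not a real BnF ARK
    preferred: str                # preferred label
    alternatives: List[str] = field(default_factory=list)
    birth: Optional[int] = None   # year; negative for BCE
    death: Optional[int] = None

    def labels(self):
        return {normalize(x) for x in [self.preferred, *self.alternatives]}


def dates_compatible(a: Optional[int], b: Optional[int]) -> bool:
    """A missing date never blocks a match; two known dates must agree."""
    return a is None or b is None or a == b


def match_authority(name, birth, death, authorities):
    """Return the single authority whose labels and dates fit, else None."""
    key = normalize(name)
    hits = [
        a for a in authorities
        if key in a.labels()
        and dates_compatible(birth, a.birth)
        and dates_compatible(death, a.death)
    ]
    # Ambiguous or empty results are left unmatched rather than guessed.
    return hits[0] if len(hits) == 1 else None
```

In this sketch a label match alone is not enough when several authorities share a name: dates act as the tie-breaker, and ambiguous cases stay unmatched.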
Workflow: digital documents, archives and manuscripts, library catalogue records
Matching and alignment
Web pages for humans, data for computers
Data model
Complex ontology
Part II
Feedback on activities
How?
FRBR principles; things that work.
FRBR principles
Functional Requirements for Bibliographic Records. Uses: dates, labels, related roles. Which roles: creation of a work; production of a version (language, type); material production (publication); life of an item.
Why FRBR?
Linking writers and works with a useful type of link:
- Writer of a work
- Contributor to an edition: translator, preface, …
- Producer: physical copy with a printer, distributor
- Associated with a unique item: owner, annotator
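The role types above line up with the four FRBR levels (work, expression, manifestation, item). The following sketch shows one way to file person-to-resource links under the level their role belongs to. The `ROLE_LEVEL` table and the role names are illustrative assumptions, not the vocabulary actually used by data.bnf.fr.

```python
from enum import Enum


class Level(Enum):
    WORK = "work"                    # creation of a work
    EXPRESSION = "expression"        # production of a version
    MANIFESTATION = "manifestation"  # material production, publication
    ITEM = "item"                    # life of a unique copy


# Hypothetical role-to-level table, following the four cases on the slide.
ROLE_LEVEL = {
    "author": Level.WORK,
    "translator": Level.EXPRESSION,
    "preface": Level.EXPRESSION,
    "printer": Level.MANIFESTATION,
    "distributor": Level.MANIFESTATION,
    "owner": Level.ITEM,
    "annotator": Level.ITEM,
}


def group_links_by_level(links):
    """Group (person, role) pairs by FRBR level; unknown roles are skipped."""
    grouped = {level: [] for level in Level}
    for person, role in links:
        level = ROLE_LEVEL.get(role)
        if level is not None:
            grouped[level].append((person, role))
    return grouped
```

Typing the link by level is what lets a page distinguish "wrote this work" from "translated this edition" or "owned this copy".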
From a bibliographic record
Make the link towards a work. Common properties. Possible « expressions ». Fields: author, dates, name, role, type of document, language, date, title.
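As a rough illustration of going from a flat bibliographic record to a work with several possible « expressions », the sketch below splits each record into a work-level key and expression-level properties. Which fields belong to which level is an assumption here (author and uniform title for the work; language, type of document and date for the expression), and the field names `uniform_title`, `doc_type`, etc. are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkKey:
    author: str
    title: str


@dataclass(frozen=True)
class Expression:
    language: Optional[str]
    doc_type: Optional[str]
    date: Optional[str]


def split_record(record: dict):
    """Separate the properties shared by the work from those of one version."""
    work = WorkKey(record["author"], record["uniform_title"])
    expression = Expression(
        record.get("language"), record.get("doc_type"), record.get("date")
    )
    return work, expression


def group_by_work(records):
    """Map each work key to the set of distinct expressions found for it."""
    works = {}
    for record in records:
        work, expression = split_record(record)
        works.setdefault(work, set()).add(expression)
    return works
```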
Matching (« Aligning »)
Using a « prediction function » to predict which Work a bibliographic resource is associated with: words of all titles; groups of words; a threshold; stopwords and improvements.
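A minimal sketch of what such a prediction function might look like, built from the ingredients listed above: the words of all titles of a work, groups of words (here, pairs of consecutive words), a stopword list, and a threshold. The scoring formula, the stopword list, and the default threshold are assumptions for illustration; the slides do not give the actual function.

```python
import re

# Illustrative stopword list; a real one would be tuned on the catalogue.
STOPWORDS = {"a", "and", "de", "et", "l", "la", "le", "les", "of", "the"}


def features(title: str) -> set:
    """Single words plus pairs of consecutive words, stopwords removed."""
    words = [w for w in re.findall(r"\w+", title.lower()) if w not in STOPWORDS]
    pairs = [" ".join(p) for p in zip(words, words[1:])]
    return set(words) | set(pairs)


def score(record_title: str, work_titles) -> float:
    """Share of the record's features found among all titles of the work."""
    record = features(record_title)
    if not record:
        return 0.0
    pool = set()
    for title in work_titles:
        pool |= features(title)
    return len(record & pool) / len(record)


def predict_work(record_title: str, works: dict, threshold: float = 0.5):
    """Return the id of the best-scoring work, or None below the threshold."""
    best_id, best_score = None, 0.0
    for work_id, titles in works.items():
        s = score(record_title, titles)
        if s > best_score:
            best_id, best_score = work_id, s
    return best_id if best_score >= threshold else None
```

The threshold trades precision for recall: raising it leaves more records unmatched, lowering it risks attaching a record to the wrong work.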
Clustering
Starting from the manifestations that are not matched, if there are enough common points. What it looks like in theory… and in practice.
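One possible reading of this clustering step, sketched under stated assumptions: two unmatched records are linked when they share enough « common points » (here, the same author counts as one point and each shared title word as one more), and linked records are merged into clusters with a small union-find. The point-counting rule and the `min_common` value are hypothetical.

```python
import re


def common_points(a: dict, b: dict) -> int:
    """Count shared evidence: same author (1 point) plus shared title words."""
    points = 1 if a["author"] == b["author"] else 0
    words_a = set(re.findall(r"\w+", a["title"].lower()))
    words_b = set(re.findall(r"\w+", b["title"].lower()))
    return points + len(words_a & words_b)


def cluster(records, min_common: int = 3):
    """Return lists of record indices that belong together (size > 1 only)."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if common_points(records[i], records[j]) >= min_common:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return [members for members in groups.values() if len(members) > 1]
```

The pairwise comparison is quadratic in the number of records, so at catalogue scale one would first partition the records, for example by author.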
The purpose
Gather data. Make them useful on the Web. Upgrade the catalogues.
Part III
« Linked Open Library »
Open: technical, legal
With the “Open data” initiatives led by the French government, it is possible to use an Open Licence. There is currently a strong state incentive around open data and open formats. Once data is linked and open, what comes next?
First, changes in general use, since people can now find BnF’s resources directly on the Web. Mailing address: lots of mail, « new publics ». Usage statistics: 80%+ of users come from search engines. R&D: improvements to integrate into the main catalogues and archives.
Secondly, the data is being used by broader communities. In small public libraries, new procedures are being explored for re-use of the dataset in local catalogues: example of « OpenCat » with Fresnes. Use in other contexts: example of IF verso (translations), Institut français, http://ifverso.com/; specific catalogues (bindings).
In the long term?
Semantic Web technologies could set a standard for library data, if we keep them linked and open.
Library missions
Strengths or weaknesses?
Descriptive information: trust
Produced to handle a collection and not for marketing purposes
Describing local « concepts » : local use
For documents, not encyclopaedically
Use of standards: long-term perspective
MARC catalogues, EAD archives, DC digital collection
Already « machine-readable »
But not with Web standards yet
Project: [email protected]