Accessing Cultural Heritage Collections using Semantic Web Techniques
Antoine ISAAC
(including cool graphics by Frank van Harmelen)
STITCH Project
Book & Digital Media Master
March 2nd, 2007
Accessing Cultural Heritage collections using Semantic Web techniques
Background
• CATCH
• Continuous Access To Cultural Heritage
• Funded by NWO
• 10 computer science research projects applied to the Cultural Heritage field
• Personalization of access
• Image and text analysis for creating metadata
• …
• STITCH
• SemanTic Interoperability To access Cultural Heritage
• Exchanging and integrating metadata
Agenda
• Cultural Heritage and Semantic Web
• Two important issues
• Publishing Cultural Heritage vocabularies on the Semantic Web
• Vocabulary alignment
• Demo
Some Needs for Cultural Heritage Collections
• Representation of objects and knowledge about them
• Pointing at collection objects
• Describing them (creating metadata) according to specific
• Metadata structures (schemes)
• Controlled expert vocabularies (e.g. thesauri)
• Accessing objects using metadata
• E.g. search using information contained in thesauri
KB Illustrated Manuscripts
The Semantic Web (1/4)
• Pointing at resources: documents, knowledge objects
Uniform Resource Identifiers (≈ URLs)
A Web of Resources
[Figure: the resources The_Netherlands, Amsterdam, rep321 (http://www.ned.nl/rep321) and rep321#paragraph3]
The Semantic Web (2/4)
• Pointing at resources: documents, knowledge objects
• Creating structured assertions involving resources
RDF (Resource Description Framework)
Factual knowledge encoded as subject-property-object triples
Metadata in RDF
[Figure: RDF graph over The_Netherlands, Amsterdam, rep321 (http://www.ned.nl/rep321) and rep321#paragraph3, with edges labelled subject (twice), hasCapital and partOf]
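As a rough illustration of subject-property-object triples, the graph on this slide can be written as plain tuples and queried by pattern. This is only a sketch: the edge directions are an assumption (the original figure is not available), and the `match` helper is a hypothetical name, not part of any RDF library.

```python
# Triples from the slide's example graph. The edge directions are an
# assumption, since only the node and edge labels survive in the transcript.
TRIPLES = [
    ("rep321", "subject", "The_Netherlands"),
    ("The_Netherlands", "hasCapital", "Amsterdam"),
    ("rep321#paragraph3", "partOf", "rep321"),
    ("rep321#paragraph3", "subject", "Amsterdam"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


# Which resources have Amsterdam as their subject?
about_amsterdam = match(TRIPLES, p="subject", o="Amsterdam")
```

A real application would use full URIs for every node and property rather than these short names.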
The Semantic Web (3/4)
• Pointing at resources: documents, knowledge objects
• Enabling structured assertions
• Using “building blocks” with precise semantics
Ontologies: formal definitions of shared conceptual vocabularies
RDF Schema / OWL (Web Ontology Language)
Ontological information
[Figure: the same RDF graph, extended with the classes Report and Document and with subClassOf and type edges]
The Semantic Web (4/4)
• Pointing at resources: documents, knowledge objects
• Enabling structured assertions
• Using “building blocks” with precise semantics
• Checking existing facts, inferring new ones
Some of the tasks are delegated from the user to inference engines that use the formal semantics of ontologies
Ontological information
[Figure: the same graph as on the previous "Ontological information" slide, with one additional type edge]
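The additional type edge can be read as the result of an inference step. The sketch below applies the RDFS subclass rule to two facts; it assumes that rep321 is typed as a Report and that Report is a subclass of Document, which is a plausible reading of the figure rather than something the transcript states. `infer_types` is a hypothetical helper, not the API of an actual inference engine.

```python
def infer_types(triples):
    """Apply the RDFS subclass rule until no new triple appears:
    (x type C) and (C subClassOf D)  =>  (x type D)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_of = [(s, o) for (s, p, o) in facts if p == "subClassOf"]
        typed = [(s, o) for (s, p, o) in facts if p == "type"]
        for x, c in typed:
            for c2, d in subclass_of:
                if c == c2 and (x, "type", d) not in facts:
                    facts.add((x, "type", d))
                    changed = True
    return facts


# Assumed reading of the figure: rep321 is a Report, and every Report is a Document.
FACTS = [
    ("Report", "subClassOf", "Document"),
    ("rep321", "type", "Report"),
]

# Only the triples that were not stated explicitly.
inferred = infer_types(FACTS) - set(FACTS)
```

The loop runs to a fixed point, so chains of subclasses are followed as well.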
Building on top of XML
eXtensible Markup Language
<rdf:Description rdf:about="http://www.ned.nl/doc321">
  <subject rdf:resource="http://www.geo.org/voc/The_Netherlands"/>
</rdf:Description>
<rdf:Description rdf:about="http://www.geo.org/voc/The_Netherlands">
  <hasCapital rdf:resource="http://www.geo.org/voc/Amsterdam"/>
</rdf:Description>
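To show how such a fragment maps back to triples, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The slide omits the enclosing `rdf:RDF` element and the namespace declarations, so the wrapper and the default namespace `http://www.geo.org/voc/` for `subject` and `hasCapital` are assumptions made only so that the fragment parses. The `read_triples` helper is hypothetical and handles only properties with an `rdf:resource` value.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
VOC_NS = "http://www.geo.org/voc/"  # assumed namespace for subject / hasCapital

# The slide's fragment, wrapped in an rdf:RDF element (the wrapper is an assumption).
DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns="{VOC_NS}">
  <rdf:Description rdf:about="http://www.ned.nl/doc321">
    <subject rdf:resource="http://www.geo.org/voc/The_Netherlands"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.geo.org/voc/The_Netherlands">
    <hasCapital rdf:resource="http://www.geo.org/voc/Amsterdam"/>
  </rdf:Description>
</rdf:RDF>"""


def read_triples(xml_text):
    """Extract (subject, property, object) URI triples from simple RDF/XML."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; dropping the
            # braces yields the property URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            triples.append((subject, predicate, prop.get(f"{{{RDF_NS}}}resource")))
    return triples


triples = read_triples(DOC)
```

A dedicated RDF parser would be the usual choice in practice; the point here is only that each nested property element corresponds to one triple.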
Building on top of the Web
• Web-based resources allow division/sharing of
• documents
• vocabularies
• metadata
[Figure: http://www.geo.org/voc/, http://www.kb.nl/eDepot and http://www.ned.nl/rep321, with the triple (par3, subject, Amsterdam): different owners & locations]
Cultural Heritage Collections and Semantic Web
• Need to categorize/classify things
• Need to structure representations
• Using metadata schemes is similar to using relations
Semantic Web techniques are good candidates for representing Cultural Heritage metadata
Agenda
• Cultural Heritage and Semantic Web
• Two important issues
• Publishing Cultural Heritage vocabularies on the Semantic Web
• Vocabulary alignment
• Demo
Publishing Cultural Heritage vocabularies on the Semantic Web
• Situation: a lot of knowledge out there
• Aim: providing domain expertise to the outside world
• Thesaurus web services
• Aim: a global network of collections and vocabularies
• Coordinating different vocabularies
• Problem: need to enforce some homogenization
• Many different models and formats
SKOS
• Simple Knowledge Organization System
• World Wide Web Consortium (W3C)
• Model to represent structured vocabularies (thesauri, classification schemes) on the Semantic Web
• Building blocks to create RDF/XML data
• Concepts and Concept schemes
• Lexical properties (prefLabel, altLabel)
• Semantic relations (broader, related)
• Notes (scopeNote, definition)
SKOS: Nederlandse Basisclassificatie (KB)
[Figure: SKOS graph of the concepts nbc:nbc0200, nbc:nbc0214 and nbc:nbc0230, linked by skos:broader and skos:related]
skos:prefLabel values: "Wetenschap en cultuur in het algemeen" (science and culture in general), "Organisatie van wetenschap en cultuur" (organization of science and culture), "Museologie" (museology)
skos:scopeNote: "Verwijzing: voor algemene musea, zie 02.14" (reference: for general museums, see 02.14)
skos: = http://www.w3.org/2004/02/skos/core#
nbc: = http://www.kb.nl/nbc/
SKOS: Nederlandse Basisclassificatie (KB)
<rdf:Description rdf:about="http://stitch.cs.vu.nl/nbc#nbc0200">
  <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
  <skos:prefLabel>wetenschap en cultuur in het algemeen</skos:prefLabel>
</rdf:Description>
<rdf:Description rdf:about="http://stitch.cs.vu.nl/nbc#nbc0214">
  <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
  <skos:prefLabel>organisatie van wetenschap en cultuur</skos:prefLabel>
  <skos:broader rdf:resource="http://stitch.cs.vu.nl/nbc#nbc0200"/>
</rdf:Description>
<rdf:Description rdf:about="http://stitch.cs.vu.nl/nbc#nbc0230">
  <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
  <skos:prefLabel>museologie</skos:prefLabel>
  <skos:broader rdf:resource="http://stitch.cs.vu.nl/nbc#nbc0200"/>
  <skos:scopeNote>voor algemene musea, zie: 02.14</skos:scopeNote>
</rdf:Description>
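One way to see what the skos:broader links provide is to load the concepts and list the narrower concepts of a given class. The following is a sketch with Python's standard library only; the `rdf:RDF` wrapper and namespace declarations are added as an assumption (the slide shows just the descriptions), the rdf:type and skos:scopeNote elements are left out for brevity, and `load_concepts` and `narrower` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
NBC = "http://stitch.cs.vu.nl/nbc#"

# The three concepts from the slide, wrapped so that the document parses.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <rdf:Description rdf:about="{NBC}nbc0200">
    <skos:prefLabel>wetenschap en cultuur in het algemeen</skos:prefLabel>
  </rdf:Description>
  <rdf:Description rdf:about="{NBC}nbc0214">
    <skos:prefLabel>organisatie van wetenschap en cultuur</skos:prefLabel>
    <skos:broader rdf:resource="{NBC}nbc0200"/>
  </rdf:Description>
  <rdf:Description rdf:about="{NBC}nbc0230">
    <skos:prefLabel>museologie</skos:prefLabel>
    <skos:broader rdf:resource="{NBC}nbc0200"/>
  </rdf:Description>
</rdf:RDF>"""


def load_concepts(xml_text):
    """Return {concept URI: {"label": str, "broader": [URIs]}}."""
    concepts = {}
    for desc in ET.fromstring(xml_text).findall(f"{{{RDF}}}Description"):
        uri = desc.get(f"{{{RDF}}}about")
        label = desc.findtext(f"{{{SKOS}}}prefLabel")
        broader = [
            b.get(f"{{{RDF}}}resource") for b in desc.findall(f"{{{SKOS}}}broader")
        ]
        concepts[uri] = {"label": label, "broader": broader}
    return concepts


def narrower(concepts, uri):
    """Labels of the concepts whose skos:broader points at uri."""
    return sorted(c["label"] for c in concepts.values() if uri in c["broader"])


concepts = load_concepts(DOC)
children = narrower(concepts, NBC + "nbc0200")
```

Because every vocabulary published in SKOS uses the same properties, the same two functions would work unchanged on another thesaurus.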
SKOS: Brinkman Trefwoorden (KB)
[Figure: SKOS graph of the concepts bk:075607204, bk:075607220 and bk:075611791, linked by skos:broader and skos:related]
skos:prefLabel values: "geneeskunde" (medicine), "kindergeneeskunde" (pediatrics), "geneesmiddelen" (medicines)
skos:altLabel: "medicijnen" (medication)
skos:scopeNote: "kinderen ouder dan 12 vallen niet onder kindergeneeskunde" (children older than 12 do not fall under pediatrics)
skos: = http://www.w3.org/2004/02/skos/core#
bk: = http://www.kb.nl/brinkman/
SKOS
• Open (future) standard
• Web-compatible
• Shareable
• Links and blocks have established meaning
• Compliant with community needs
Agenda
• Cultural Heritage and Semantic Web
• Two important issues
• Publishing Cultural Heritage vocabularies on the Semantic Web
• Vocabulary alignment
• Demo
Cultural Heritage Interoperability Problems
• Current trend: accessing different collections simultaneously
• Problem: integrating different databases/metadata schemes/vocabularies
• Syntactic interoperability can be solved
• Common metadata scheme
• Common vocabulary model (SKOS?)
• How about conceptual heterogeneity?
The semantic interoperability problem
• There is no standard thesaurus
• We don’t really want one: different vocabularies for different expertise domains, traditions, tasks
• Consequence:
• “klassieke ruïnes” (classical ruins) vs. “landschap met ruïnes” (landscape with ruins)
• “maagd Maria” (Virgin Mary) vs. “Heilige Moeder” (Holy Mother)
• Practical problem:
• Searching for “Heilige Moeder” misses “maagd Maria”
• Unless we know both vocabularies
Old situation
Vocabulary alignment
• STITCH aim: find correspondences between vocabulary elements
• “klassieke ruïnes” ≈ “landschap met ruïnes”
• “maagd Maria” = “Heilige Moeder”
• Doing it automatically
• Vocabularies are big (tens of thousands of concepts)
• They evolve
• Applications can change their reference vocabularies
• Using techniques from
• Linguistics
• Computer science
• Statistics
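To make the benefit of such correspondences concrete, here is a small sketch of a search that expands the query term with its aligned equivalents before looking in two collections. The object identifiers (`ms-1` and so on) and the two tiny indexes are invented for illustration, and `expand` and `search` are hypothetical helpers; only the terms come from the slides.

```python
# One equivalence correspondence, using the slide's example terms.
EQUIVALENT = [("maagd Maria", "Heilige Moeder")]

# Hypothetical collections: object id -> subject terms from that
# collection's own vocabulary.
COLLECTION_A = {"ms-1": ["maagd Maria"]}
COLLECTION_B = {"ms-2": ["Heilige Moeder"], "ms-3": ["landschap met ruïnes"]}


def expand(term, equivalences):
    """Return the term plus every term linked to it by an equivalence."""
    terms = {term}
    for a, b in equivalences:
        if a == term:
            terms.add(b)
        if b == term:
            terms.add(a)
    return terms


def search(term, collections, equivalences):
    """Ids of the objects indexed with the term or one of its equivalents."""
    wanted = expand(term, equivalences)
    return sorted(
        obj
        for index in collections
        for obj, subjects in index.items()
        if wanted & set(subjects)
    )


without_alignment = search("Heilige Moeder", [COLLECTION_A, COLLECTION_B], [])
with_alignment = search("Heilige Moeder", [COLLECTION_A, COLLECTION_B], EQUIVALENT)
```

Without the correspondence the query only reaches the collection that uses “Heilige Moeder”; with it, the object indexed as “maagd Maria” is found as well.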
New situation
Automatic alignment techniques
• Lexical
Labels of entities and textual definitions
• Structural
Structure of the formal definitions of entities, position in the hierarchy
• Statistical
Object information (e.g. book indexing)
• Background knowledge
Using a shared conceptual reference to find links
[Figure: the labels “Long brain tumor” and “Long tumor”]
Lexical alignment
• Compare each pair of concepts
• Use labels and synonyms of concepts
• Heuristic method to discover equivalence and specialization relations
“Long brain tumor” is more specific than “Long tumor”
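The heuristic on this slide can be sketched as a comparison of word sets: identical sets suggest equivalence, and a label that contains all words of another label plus extra ones suggests specialization. This is one simple reading of the slide, not the STITCH implementation; the concept identifiers are hypothetical, and a realistic matcher would also normalize word forms.

```python
def tokens(label):
    """Lower-cased set of the words in a label."""
    return frozenset(label.lower().split())


def lexical_relation(label_a, label_b):
    """Relation of label_a to label_b, or None when the word sets are not nested."""
    a, b = tokens(label_a), tokens(label_b)
    if a == b:
        return "equivalent"
    if a > b:  # a contains every word of b plus extra qualifiers
        return "more specific"
    if a < b:
        return "more general"
    return None


def align(vocab_a, vocab_b):
    """Compare each pair of concepts through all their labels and synonyms."""
    correspondences = []
    for id_a, labels_a in vocab_a.items():
        for id_b, labels_b in vocab_b.items():
            found = {lexical_relation(x, y) for x in labels_a for y in labels_b}
            # Prefer equivalence over specialization when both are found.
            for relation in ("equivalent", "more specific", "more general"):
                if relation in found:
                    correspondences.append((id_a, relation, id_b))
                    break
    return correspondences


# Hypothetical concept identifiers; the labels come from the slide.
VOCAB_A = {"a:1": ["Long brain tumor"]}
VOCAB_B = {"b:1": ["Long tumor"]}
result = align(VOCAB_A, VOCAB_B)
```

Comparing every pair is quadratic in the vocabulary sizes, so for vocabularies with tens of thousands of concepts an index from words to concepts would be needed.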
Lexical alignment: Manuscripts case
[Figure: correspondences labelled broader and Equivalent]
Automatic Alignment Techniques
• Lexical
Labels of entities and textual definitions
• Structural
Structure of the formal definitions of entities, position in the hierarchy
• Statistical
Object information (e.g. book indexing)
• Shared background knowledge
Using a conceptual reference to deduce correspondences
[Figure: the labels “Long brain tumor” and “Long tumor”]
Statistical alignment
Statistical approach: KB case
• Experiment with GOO trefwoordenthesaurus (GTT) and Brinkman thesaurus (BK)
Statistical approach: KB case
• Comparing books indexed with BK concepts and books indexed with GTT concepts
• Overlap measure
[Figure: overlap between the books indexed with concept C1 [GTT] and those indexed with concept C2 [BK]]
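The slide does not say which overlap measure was used. As a sketch under that caveat, the example below uses the Jaccard coefficient (shared books divided by all books indexed with either concept) as a stand-in; the concept identifiers and book numbers are invented, and `jaccard` and `rank_pairs` are hypothetical helpers.

```python
def jaccard(books_c1, books_c2):
    """Share of books indexed with both concepts among those indexed with either."""
    union = books_c1 | books_c2
    return len(books_c1 & books_c2) / len(union) if union else 0.0


def rank_pairs(gtt_index, bk_index):
    """Score every (GTT concept, BK concept) pair; best overlap first."""
    scored = [
        (jaccard(books_1, books_2), c1, c2)
        for c1, books_1 in gtt_index.items()
        for c2, books_2 in bk_index.items()
    ]
    return sorted((s for s in scored if s[0] > 0), reverse=True)


# Invented data: concept -> set of ids of the books indexed with it.
GTT = {"gtt:A": {1, 2, 3, 4}, "gtt:B": {5, 6}}
BK = {"bk:X": {2, 3, 4, 7}, "bk:Y": {6, 8, 9}}

ranked = rank_pairs(GTT, BK)
best_score, best_gtt, best_bk = ranked[0]
```

The approach relies on books that carry index terms from both vocabularies; concepts that never share a book get no candidate correspondence.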
Results
1: 9132.9 (1704 3479 976) Schilderijen schilderkunst
2: 8088.5 (1204 2330 767) Kwaliteitszorg kwaliteitsmanagement
3: 6232.7 (820 1572 543) Personeelsmanagement personeelsbeleid
4: 5392.1 (1399 3271 622) Beeldende kunsten beeldende kunst
5: 5063.1 (4951 1152 613) Nederlands - Nederlandse taalkunde
17: 3421.8 (280 714 243) Diabetes mellitus suikerziekte
Agenda
• Cultural Heritage and Semantic Web
• Two important issues
• Publishing Cultural Heritage vocabularies on the Semantic Web
• Vocabulary alignment
• Demo
Demo
• KB Illuminated Manuscripts
• BNF Mandragore Manuscripts
Manuscripts, 2nd Collection: BNF Mandragore
Manuscripts vocabularies
• Mandragore
• Big (16000 terms)
• Weakly structured (2-level deep, multi-inheritance)
• Alternative lexical forms
• Definitions
• Iconclass
• Huge (>24000 subjects)
• Richly structured: 10-level hierarchy, cross-references
• Compound concepts: keys, structural digits…
• Keywords
[Monolingual case, since Iconclass comes in French and English]
Demo
• http://stitch.cs.vu.nl/rp33333/MANDRA-SV-ICEmandraNewNONE , amphibians
• Wheat
Conclusion: Semantic Web can help Cultural Heritage
• Representation of collections and associated expert vocabularies
• Publication and access
• Semantic integration
New opportunities for making knowledge accessible
Cf. Dublin Core RDF Schema
Links
• Semantic Web at W3C
• http://www.w3.org/2001/sw/
• Semantic Web at Vrije Universiteit
• http://www.cs.vu.nl/ai/kr/
• http://www.cs.vu.nl/bi/
• SKOS
• http://www.w3.org/2004/02/skos/
• Other Cultural Heritage and Semantic Web projects
• MuseumFinland, http://www.museosuomi.fi/
• eCulture, http://e-culture.multimedian.nl/
Thanks!