Multilingual access to online content


www.europeanaconnect.eu
Multilingual Access to Online Content
- the Europeana Experience
Vivien Petras (Humboldt-Universität zu Berlin)
 With the help of many people involved in Europeana
(referenced in the slides)
Eurovoc Conference, 18-19 November 2010
Outline
• Europeana – a brief introduction
• Multilingual access to Europeana – approaches
• Europeana Semantic Data Layer
• Multilingual Alignments of Vocabularies
• Semantic Search Engine Prototype
Europeana
“A digital library that is a
single, direct and
multilingual access point
to the European cultural
heritage.”
European Parliament, 27
September 2007
Europeana Today
• 13 million objects
• 28 data aggregators
• 1500 participating institutions
• 200 partners
• 35 FTE’s
• 21 projects
• 1 million visits in 2010
• 30,000 My Europeana signees
• 2008: Prototype
• 2010: Operational Service
• Stable portal
• Open Source Code
• EuropeanaLabs
• Public Domain Charter
From: Cousins, Jill (2010). Europeana Overview. Europeana Open Cultures Conference, 14-15 October Amsterdam
Europeana Contributions by Country
Different
languages!(?)
From: Cousins, Jill (2010). Europeana Overview. Europeana Open Cultures Conference, 14-15 October Amsterdam
Europeana Content Types
• Images: 60%
• Texts: 38%
• Videos: 1%
• Sounds: 1%
• Examples: Books, Articles, Postcards, Folklore objects, Photography, Art
Example record: Goethe, Johann Wolfgang von
Title: Goethe, Johann Wolfgang von
Date: unknown
Creator: Goethe, Johann Wolfgang von
Description: Goethe, Johann Wolfgang von
Language: de-DE
Format: image/jpeg
Source: SLUB/Deutsche Fotothek
Rights: Deutsche Fotothek
Provider: Deutsche Fotothek ; Germany
Identifier: http://www.deutschefotothek.de/obj70226592.html
Subject: Bildnis; Bildniskatalog; Foto; Fotos; Portrait
Type: image
Multilingual Access to Europeana
• Interface
  • static pages
• Search
  • query translation
  • (document translation)
• Subject Browse (& Search)
  • Controlled vocabularies
  • Semantic Data Layer
French
English
Spanish
Dutch
Portuguese
German
Italian
Polish
Hungarian
Swedish
Europeana Semantic Data Layer
Doerr, M.; Gradmann, S.; Hennicke, S.; Isaac, A.; Van de Sompel, H. (2010). The Europeana Data Model (EDM).
Europeana Semantic Data Layer
Bridging "isles of information" by connecting objects from different
domains (museum, archive, library) via cross-vocabulary links.
Doerr, M.; Gradmann, S.; Hennicke, S.; Isaac, A.; Van de Sompel, H. (2010). The Europeana Data Model (EDM).
Semantic Data Layer Alignment Example
Norwegian vocabulary ↔ Irish vocabulary (SKOS mapping: skos:exactMatch)
From: Cousins, Jill (2010). Europeana Overview. Europeana Open Cultures Conference, 14-15 October Amsterdam
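A minimal sketch of what such a cross-vocabulary link might look like as data. The two concept URIs are hypothetical placeholders (not real Europeana, Norwegian, or Irish identifiers), and the triples are held as plain Python tuples rather than in an RDF library.

```python
# Sketch only: a skos:exactMatch link stored as (subject, predicate, object) triples.
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Hypothetical concept URIs, invented for illustration.
NORWEGIAN_CONCEPT = "http://example.org/vocab/no/concept/123"
IRISH_CONCEPT = "http://example.org/vocab/ie/concept/456"


def exact_match_links(source, target):
    """Return the mapping in both directions, since skos:exactMatch is symmetric."""
    return [
        (source, SKOS_EXACT_MATCH, target),
        (target, SKOS_EXACT_MATCH, source),
    ]


def to_ntriples(triples):
    """Serialise triples whose three terms are all URIs as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


links = exact_match_links(NORWEGIAN_CONCEPT, IRISH_CONCEPT)
print(to_ntriples(links))
```

Storing both directions is a convenience for lookup; an RDF store with SKOS inference would derive the reverse link on its own.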
Multilingual Alignment: Approach
• Identify and convert relevant semantic resources
• Pivot vocabularies for relevant categories (subject, persons, places…) = multilingual and with wide coverage
• E.g. UDC, DDC, VIAF, TGN, Geonames, Wordnets, DBpedia
From: Isaac, Antoine; Schreiber, Guus (2010). Vrije Universiteit Amsterdam Approach to Multilingual Mapping of Vocabularies.
Multilingual Alignment: Approach
• Align more specific vocabularies to the pivots
= anchoring mappings
• Finding instances of skos:exactMatch mappings
• Vocabulary characteristics important for matching:
• Lexical variance of labels (e.g. plural/singular, diacritics, multilinguality)
• Preferred / alternative labels
• Nature of hierarchy
From: EuropeanaConnect Milestone 1.2.1 (2010). Specification of preferred terms identification methodology.
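The label characteristics listed above can be illustrated with a small normalisation sketch. This is not the EuropeanaConnect method: the plural handling is a deliberately naive English-only rule, and the concept shape (a dict with `pref` and `alt` keys) is an assumption made for the example.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks so that 'Café' and 'Cafe' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_label(label):
    """Case-fold, strip diacritics, and crudely drop a trailing plural 's'."""
    key = strip_diacritics(label).casefold().strip()
    # Naive English-only rule; real vocabularies need per-language stemming.
    if len(key) > 3 and key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key


def labels_match(concept_a, concept_b):
    """Compare preferred and alternative labels of two concepts.

    Each concept is assumed to be a dict: {"pref": str, "alt": [str, ...]}.
    """
    keys_a = {normalize_label(l) for l in [concept_a["pref"], *concept_a["alt"]]}
    keys_b = {normalize_label(l) for l in [concept_b["pref"], *concept_b["alt"]]}
    return bool(keys_a & keys_b)


# Hypothetical concepts: the match is found through an alternative label.
print(labels_match({"pref": "Portraits", "alt": []},
                   {"pref": "Bildnis", "alt": ["portrait"]}))
```

Treating preferred and alternative labels alike raises recall; a stricter matcher might rank preferred-to-preferred matches higher.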
Multilingual Alignment: Approach
• Methodology:
• Conversion to SKOS/RDF
• Application of different alignment methods:
• Lexical matching
• Structure-based matching
• Instance-based matching
• Filtering / disambiguation of matching candidates:
• Analyzing children / parent matches
• Combining alignments
From: EuropeanaConnect Milestone 1.2.1 (2010). Specification of preferred terms identification methodology.
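As a rough illustration of two of the steps above, the sketch below generates candidates by lexical matching and then disambiguates them by checking whether the parents also match. The vocabulary shape and the toy data are assumptions for the example, not the project's actual data or algorithm.

```python
from collections import defaultdict


def lexical_candidates(source, target):
    """Pair concepts whose lower-cased labels coincide.

    Each vocabulary is assumed to be a dict:
    concept id -> {"label": str, "parent": concept id or None}.
    """
    by_label = defaultdict(list)
    for tid, concept in target.items():
        by_label[concept["label"].lower()].append(tid)
    return [(sid, tid)
            for sid, concept in source.items()
            for tid in by_label.get(concept["label"].lower(), [])]


def filter_by_parents(candidates, source, target):
    """Resolve ambiguous sources by keeping the target whose parent also matches."""
    candidate_set = set(candidates)
    grouped = defaultdict(list)
    for sid, tid in candidates:
        grouped[sid].append(tid)
    kept = []
    for sid, tids in grouped.items():
        if len(tids) == 1:
            kept.append((sid, tids[0]))
            continue
        supported = [t for t in tids
                     if (source[sid]["parent"], target[t]["parent"]) in candidate_set]
        if len(supported) == 1:  # still ambiguous otherwise, so drop it
            kept.append((sid, supported[0]))
    return kept


# Hypothetical toy vocabularies: "organ" is ambiguous in the target.
source = {"s1": {"label": "Instruments", "parent": None},
          "s2": {"label": "Organ", "parent": "s1"}}
target = {"t1": {"label": "instruments", "parent": None},
          "t2": {"label": "organ", "parent": "t1"},
          "t3": {"label": "Anatomy", "parent": None},
          "t4": {"label": "organ", "parent": "t3"}}

candidates = lexical_candidates(source, target)
print(filter_by_parents(candidates, source, target))
```

Dropping candidates that stay ambiguous favours precision, which suits mappings intended as skos:exactMatch.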
VUA Vocabulary Alignment Tool Amalgame
• AMsterdam ALignment GenerAtion MEtatool
• Uses EDOAL (Expressive and Declarative Ontology
Alignment Language) or SKOS
• Also provides pre- / post-mapping statistics and an
evaluation tool
From: EuropeanaConnect Milestone 1.2.2 (2010). Semantics of descriptions aligned (intermediary).
VUA Vocabulary Alignment Tool Amalgame
http://semanticweb.cs.vu.nl/beta/amalgame/list_alignments
• Skosified: en, fr, de, nl, hu
• Mappings (>500,000): en, fr, nl
• Mostly label matches
Europeana Semantic Search Engine
http://eculture.cs.vu.nl/europeana/session/search
Europeana Semantic Search Engine
Disambiguation of search terms
Europeana Semantic Search Engine
Multilingual query expansion
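One way multilingual query expansion over aligned vocabularies could work is sketched below: find the concepts whose label matches the query term, follow exactMatch links, and collect the labels in all languages. The concept identifiers, labels, and links are invented toy data, and nothing here is taken from the prototype's implementation.

```python
from collections import deque

# Hypothetical toy data: concept id -> (language, labels).
CONCEPT_LABELS = {
    "en:portrait": ("en", ["portrait"]),
    "de:bildnis": ("de", ["Bildnis", "Porträt"]),
    "fr:portrait": ("fr", ["portrait"]),
}
# Hypothetical skos:exactMatch links between the concepts above.
EXACT_MATCH = [("en:portrait", "de:bildnis"), ("de:bildnis", "fr:portrait")]


def expand_query(term):
    """Return the labels of all concepts reachable from `term` via exactMatch."""
    needle = term.casefold()
    start = [cid for cid, (_lang, labels) in CONCEPT_LABELS.items()
             if any(label.casefold() == needle for label in labels)]
    # exactMatch is symmetric, so index the links in both directions.
    neighbours = {}
    for a, b in EXACT_MATCH:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = set(start)
    queue = deque(start)
    while queue:  # breadth-first walk over the mapping graph
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    expanded = {label for cid in seen for label in CONCEPT_LABELS[cid][1]}
    # Fall back to the original term when no concept matches.
    return sorted(expanded) if expanded else [term]


print(expand_query("Bildnis"))
```

The expanded labels would then be combined into a single query against the metadata index, so that a German search term also retrieves records described in English or French.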
Europeana Semantic Search Engine
Clustering of search results
• Works created by matching person
• Works related to matching person
• Works created by a teacher of matching person
• Works related to an artefact created by matching person
• Works created by an artist professionally related to matching person
• Works titled
• Works showing concept
• Works with matching Location
• …
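Clustering of this kind can be pictured as grouping each hit by the relation that connects it to the matched entity. The sketch below assumes results arrive as (title, relation) pairs; both that shape and the sample hits are hypothetical.

```python
from collections import defaultdict


def cluster_results(results):
    """Group result titles by the relation linking them to the matched entity."""
    clusters = defaultdict(list)
    for title, relation in results:
        clusters[relation].append(title)
    return dict(clusters)


# Hypothetical hits for a person query.
hits = [
    ("Work A", "created by matching person"),
    ("Work B", "related to matching person"),
    ("Work C", "created by matching person"),
]
for relation, titles in cluster_results(hits).items():
    print(f"Works {relation}: {titles}")
```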
Next Steps
• Adding more vocabularies from the content providers:
• VIAF
• Spanish and Polish subject heading lists
• Switching metadata delivery to Europeana Data Model
(EDM) format (2011)
• And: linking with the cloud…
Europeana & Linked Open Data
Information Spaces
• DBpedia
• PND and SWD
(prototype)
• Geonames
• LCSH
• …
Doerr, M.; Gradmann, S.; Hennicke, S.; Isaac, A.; Van de Sompel, H. (2010). The Europeana Data Model (EDM).
Thank you.
www.europeana.eu