Europeana v1.0 WP3: Charting the landscape for EDM prototyping


Interoperability Aspects in
Europeana
Antoine Isaac
Workshop on Research Metadata in Context
7./8. September 2010, Nijmegen
europeana.eu: the mission
• Making European cultural heritage more accessible (on the web)
• Federating (online) cultural collections across countries and
domains
• Hundreds of institutions, millions of objects
europeana.eu in practice
• We rely on aggregating from our providers:
• Metadata
• References to digital objects
• We have a portal
• End-user “showcase”
• We will strive to become a metadata distributor
• Allowing partners to get enriched (contextualized) data for their
objects
• Allowing third-parties to deploy object access functions similar to
Europeana’s, in their own services
Current status – provider data
• Very heterogeneous: different communities, different
institutions, different interests and means
• Descriptions of original objects and digital objects use
hundreds of vocabularies, e.g.:
• Libraries: “MARC-style” records
• Museums: very diverse, richest ones with event-based
descriptions (CIDOC-CRM)
• Archives: “EAD-style” hierarchical finding aids
• Cross-field “container” formats: METS
Current status – provider data
• Grain varies
• Quality varies
• Free keyword indexing
• Explicit or implicit use of controlled vocabularies
• Ad hoc vs. more standard (DDC, AAT, etc.)
• Persistent identifier usage not widespread
• (National) Libraries are doing better
Current Europeana metadata stream
• Europeana Semantic Elements for ingestion of descriptive
metadata and pointers to digital objects
• Dublin Core fields + Europeana-specific ones
• Providers do the mapping from their data to ESE
• Ingestion process: OAI-PMH, still often via files
• (Fielded) full-text search using SOLR/Lucene
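The OAI-PMH ingestion step can be illustrated with a request URL builder. This is only a minimal sketch: the endpoint `https://provider.example.org/oai` and the set name `paintings` are hypothetical, and `oai_dc` is used because it is the metadata prefix every OAI-PMH repository must support; the slides do not say which prefix providers actually expose.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting of one set (collection).
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical provider endpoint and set name, for illustration only.
url = list_records_url("https://provider.example.org/oai", "oai_dc", set_spec="paintings")
print(url)
```

A real harvester would also follow the `resumptionToken` returned with large result lists; that is left out here.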
Limitations of ESE
• Simple “flat” format
• Losing richer (structured) data
• OK for full-text indexing and search
• Not ok for all the rest (display, access to data, richer search)
• Variations of DC field usage across collections
• dc:coverage
• dc:rights
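A small sketch of why a flat record is fine for full-text search but weak for anything else. The record layout and the field values are assumptions made for illustration, not the actual ESE schema.

```python
# Flat, ESE-like record: every field is just a list of plain strings.
# Field values are illustrative, not taken from a real provider record.
flat_record = {
    "dc:title": ["Mona Lisa"],
    "dc:creator": ["Leonardo da Vinci"],
    "dc:coverage": ["Italy", "16th century"],  # a place and a period, mixed
}


def full_text(record):
    """Concatenate all values; this is all a full-text index needs."""
    return " ".join(value for values in record.values() for value in values)


print(full_text(flat_record))
# Nothing in the record says which dc:coverage value is a place and which
# is a period, so richer display or faceted search cannot rely on it.
```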
Digression: talk about rights?
• Lots of objects, with rights not cleared yet
• Collection-level approaches are difficult to implement
• Rights for metadata differ from rights for “real” objects
• Result: in Europeana, users don’t know the rights status of
the objects they can access
• They have to go to providers’ site for each object
• Deterring reference and re-use
• Recent developments: trying to
• Encourage provision of rights at object-level
• Use “controlled vocabularies” for rights (CC)
• Promote public domain (esp. for metadata)
The future
• A new data model as a solution?
• EDM – Europeana Data Model
EDM requirements & principles
1. Distinction between “provided object” (painting, book,
program) and digital representation
2. Distinction between object and metadata record describing
an object
3. Allow for multiple records for same object, containing
potentially contradictory statements about an object
4. Support for objects that are composed of other objects
5. Standard metadata format that can be specialized
6. Standard vocabulary format that can be specialized
7. EDM should be based on existing standards
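The first four requirements can be read as a small data structure. The sketch below is one possible reading, not the EDM specification: all class and field names are hypothetical, and the dates and URLs are made up to show two providers disagreeing.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalRepresentation:
    url: str  # pointer to a digital file, distinct from the object itself


@dataclass
class MetadataRecord:
    provider: str
    statements: dict  # property name -> value, as asserted by this provider


@dataclass
class ProvidedObject:
    identifier: str
    representations: list = field(default_factory=list)
    records: list = field(default_factory=list)  # several records per object
    parts: list = field(default_factory=list)    # objects composed of objects

    def values_for(self, prop):
        """Return each provider's value for a property, keeping disagreements."""
        return {r.provider: r.statements[prop]
                for r in self.records if prop in r.statements}


# Illustrative data only: identifiers, URL, and dates are invented.
painting = ProvidedObject(identifier="object/1")
painting.representations.append(
    DigitalRepresentation(url="https://provider-a.example.org/img/1.jpg"))
painting.records.append(MetadataRecord("provider-a", {"dc:date": "1503"}))
painting.records.append(MetadataRecord("provider-b", {"dc:date": "1506"}))
print(painting.values_for("dc:date"))
```

Keeping records separate from the object is what lets contradictory statements coexist instead of one overwriting the other.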
EDM basics
Re-using available vocabularies
• OAI ORE for organization of metadata about an object
• Dublin Core for core metadata representation
• SKOS for vocabulary representation
EDM basics
• A semantic web-inspired model
• E.g., DC would not be used with text fields alone, reference to
controlled vocabularies (via URIs) will be encouraged
• Keeping original descriptive metadata
• Achieving interoperability through mapping (cf. Peter’s “profile
matching”?)
• Flexibility (ingesting richer original metadata) is a main
requirement
• Even though we ourselves might not use all of the data to its
full potential, e.g. for search
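To illustrate using DC with URI references rather than text fields alone, here is a minimal sketch with triples as plain tuples. The DC and SKOS namespace URIs are the standard ones; the object and concept URIs under `example.org` and the literal values are hypothetical. Language tags are ignored for brevity.

```python
DC = "http://purl.org/dc/elements/1.1/"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical identifiers, for illustration only.
OBJ = "http://data.example.org/object/1"
CONCEPT = "http://vocab.example.org/concept/painting"

triples = [
    (OBJ, DC + "type", "schilderij"),           # original text field, kept as-is
    (OBJ, DC + "type", CONCEPT),                # reference to a controlled vocabulary
    (CONCEPT, SKOS + "prefLabel", "painting"),  # label carried by the vocabulary
]


def is_uri(value):
    return value.startswith(("http://", "https://"))


def display_values(subject, prop, triples):
    """Return values for display, resolving concept URIs to their labels."""
    labels = {s: o for s, p, o in triples if p == SKOS + "prefLabel"}
    values = []
    for s, p, o in triples:
        if s == subject and p == prop:
            values.append(labels.get(o, o) if is_uri(o) else o)
    return values


print(display_values(OBJ, DC + "type", triples))
```

The original literal and the vocabulary reference sit side by side, so nothing from the provider's description is lost.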
A flexible model: different semantic grains
A flexible model: object and events
Around the data model
• Opportunity (and need) to get and produce richer metadata
• De-duplication
• Semantic enrichment with contextual resources (thesauri, authority
lists) within and outside Europeana
• Alignment of contextual resources
• Linked Data: serving data on the web, pointing to others’ data
• Fits Europeana’s mission very well
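Semantic enrichment with a contextual resource can be as simple as matching free keywords against thesaurus labels. The sketch below uses exact matching on normalized labels, which is an assumption chosen for brevity rather than the method Europeana uses; the thesaurus and its URIs are hypothetical.

```python
def normalize(label):
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(label.lower().split())


def build_index(thesaurus):
    """Map each normalized label to its concept URI (first one wins)."""
    index = {}
    for uri, labels in thesaurus.items():
        for label in labels:
            index.setdefault(normalize(label), uri)
    return index


def enrich(keywords, index):
    """Attach a concept URI to each keyword, or None when nothing matches."""
    return {kw: index.get(normalize(kw)) for kw in keywords}


# Hypothetical thesaurus: concept URI -> known labels.
thesaurus = {
    "http://vocab.example.org/concept/painting": ["painting", "paintings"],
    "http://vocab.example.org/concept/portrait": ["portrait"],
}
index = build_index(thesaurus)
result = enrich(["  Paintings", "Portrait", "fresco"], index)
print(result)
```

Unmatched keywords stay in the result with `None`, so the original free-text indexing is preserved alongside the enrichment.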
Around the data model
Rationalization of data ingestion, archival and dissemination
process (OAIS) makes more explicit what Europeana needs
to do to behave more as a real metadata archive
• Not only feeding a Lucene/SOLR instance
• Cope with enrichments, versions, pointers to external
resources
• Registries of vocabularies (metadata structures and
controlled value vocabularies) and links between them
Encouraging community initiatives
Best practices for representing and providing metadata can be
seen as a complement to the general EDM.
Building interoperability cores at community-level
• Museums:
• ATHENA project (LIDO format)
• Audio/visual:
• PrestoPrime, European Film Gateway
• Archives:
• APEnet (using EAD)
Planning
Thank you!
References for ESE and EDM:
http://version1.europeana.eu/web/guest/technical-requirements/
http://version1.europeana.eu/web/europeana-project/technicaldocuments/
Mona Lisa example
Example with event-based metadata