The mapping process

The mapping process – some
observations
Robina Clayphan
EDLF
Data Flow
Local schemas > ESE
Management of the process
• Sheer complexity of managing the hundreds of files going
through the steps in the process
• keeping track of the status of the files
– straightforward ones: put in the right place for the next step
– problem ones: refer back to the provider or a developer
• Use of Sharepoint document libraries and rapid
establishment of procedures that all must adhere to
• The management of the process evolved during
implementation - a very steep learning curve
• Maintenance of authority files
– getting meta-metadata from the providers (types etc.)
– collection IDs
(sort of) Policy issues
• Inclusion criterion: must have a link giving direct
access to the digital object
– check if URLs in data actually resolve to the object
described
• Often:
– resolve to metadata page with e.g. pdf icon
– how many clicks are acceptable – need for policy decision
– granularity mismatch – link at title level only
• Sometimes:
– 404 page not found - refer to provider – persistence of URLs
– need a plug-in (e.g. DjVu) – is that OK?
• Occasionally: a log-in required for restricted access resources
• Need for providers to ensure they only provide links
to resources that can be accessed
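The cases listed above could be sorted automatically before anyone clicks through by hand. What follows is a minimal sketch, not the procedure actually used in the project: it assumes an HTTP client has already recorded the status code, Content-Type and final URL for each link. The `LinkCheck` class, the log-in URL hints and the category labels are all hypothetical; `image/vnd.djvu` is taken here as the DjVu media type.

```python
from dataclasses import dataclass


@dataclass
class LinkCheck:
    """Result of requesting one URL from a record (hypothetical structure)."""
    status: int        # HTTP status code of the final response
    content_type: str  # Content-Type header of the final response
    final_url: str     # URL after any redirects


# Assumptions for illustration only: URL fragments that suggest a log-in page,
# and media types that need a browser plug-in to display.
LOGIN_HINTS = ("/login", "/signin")
PLUGIN_TYPES = ("image/vnd.djvu",)
HTML_TYPES = ("text/html", "application/xhtml+xml")


def classify(check: LinkCheck) -> str:
    """Sort a checked link into one of the cases from the slide."""
    if check.status == 404:
        return "broken: refer to provider"
    url = check.final_url.lower()
    if check.status in (401, 403) or any(hint in url for hint in LOGIN_HINTS):
        return "restricted: log-in required"
    if not 200 <= check.status < 300:
        return "unknown: manual review"
    media_type = check.content_type.split(";")[0].strip().lower()
    if media_type in PLUGIN_TYPES:
        return "plug-in needed: policy decision"
    if media_type in HTML_TYPES:
        # Probably a metadata or landing page, not the object itself.
        return "landing page: count clicks to object"
    return "direct: digital object"
```

A check like this cannot tell whether the object reached is the one the record describes, and it cannot detect a granularity mismatch, so those cases would still need a human look.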
Data level problems 1
• Trying to understand decision-making process of the
original metadata creators
– What they meant by e.g. dc:date, dc:source
• Trying to discern the (implicit) data model of the
original metadata creators
– What is the dc:relation referring to
• Understanding data in a foreign language or foreign
script
– Is negyedévenként really Hungarian for terminally?
• And, if so, why is it in dc:format?
Data level problems 2
• Questions to developers that arose from examining
the data
– All records have two instances of dc:identifier: the first a URL, the
second (possibly) a shelfmark. Need to map each instance to a
different ESE element - can it be done?
– All records have two instances of dc:rights: the first appropriate,
the second not – is it possible to just display the first and ignore
the second?
– Where values had been divided between multiple instances of
the same element – could they be concatenated with
punctuation for a better display? E.g. spatial1, spatial2, spatial3
used for a geographic hierarchy. Another collection had up to 14
instances of dc:subject.
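All three questions can, in principle, be answered yes. The sketch below shows one way it might be done and is not the transformation the developers actually used. Several things in it are assumptions made for illustration: the ESE namespace URI, the choice of `europeana:isShownAt` as the target for the URL identifier, `dcterms:spatial` as the repeated geographic element, and the flat `<record>` wrapper. It also tells the two identifiers apart by their shape (URL or not) rather than by their position.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ESE = "http://www.europeana.eu/schemas/ese/"  # assumed namespace URI


def texts(parent: ET.Element, tag: str) -> list:
    """Return the non-empty, stripped text of every child with this tag."""
    return [e.text.strip() for e in parent.findall(tag) if e.text and e.text.strip()]


def map_record(src: ET.Element) -> ET.Element:
    """Map one harvested record to an ESE-like record (hypothetical layout)."""
    out = ET.Element("record")

    # 1. Two dc:identifier instances: the URL goes to a europeana element,
    #    anything else (e.g. a shelfmark) stays as dc:identifier.
    for value in texts(src, f"{{{DC}}}identifier"):
        is_url = value.startswith(("http://", "https://"))
        target = f"{{{ESE}}}isShownAt" if is_url else f"{{{DC}}}identifier"
        ET.SubElement(out, target).text = value

    # 2. Two dc:rights instances: keep only the first, ignore the rest.
    rights = texts(src, f"{{{DC}}}rights")
    if rights:
        ET.SubElement(out, f"{{{DC}}}rights").text = rights[0]

    # 3. A geographic hierarchy split across repeated elements:
    #    concatenate with punctuation into one value for display.
    places = texts(src, f"{{{DCTERMS}}}spatial")
    if places:
        ET.SubElement(out, f"{{{DCTERMS}}}spatial").text = ", ".join(places)

    return out
```

Whether concatenation is the right choice depends on the element: it suits a hierarchy that is read as one place, but the collection with 14 dc:subject instances is probably better left as separate values.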
Normalisation level
• At the normalisation stage you can see if your
interpretation of the record actually makes sense
when it has been processed against the source data.
• Apply the Quality Control Checklist
• Edit mapping and repeat!
(my) Conclusion
• All indicates:
– that it is easier if the mapping and normalising are done as
close to source as possible, ideally by the providers
• they are the ones who understand what the data means and can
make sensible mapping decisions
• they understand the language and script
– Tools would be nice!
Data Flow
[Diagram labels: #0 Transform data to populate local repository; Aggregator?; Aggregator with provider?; Aggregator with provider?; EuropeanaLocal; #5 Export data to Europeana; Local schemas > ESE]
EuropeanaLocal Content Provider Model - to illustrate
movement of metadata only
[Diagram labels, in extracted order: Content provider local systems; Customised transformations to e.g. OAI-DC; Content provider repositories; Harvesting of e.g. OAI-DC; Aggregator; Aggregator; Mapping and transformation to ESE, including <europeana> elements; EuropeanaLocal; Parallel Test Environment; Europeana; No metadata transformations]
Issues for EuropeanaLocal
• Currently a great deal of manual effort goes into
metadata transformation.
– at provider sites: local format to repository format
– by the Europeana development team: harvested
format to ESE
– normalisation by Europeana development team
• Where will this work happen in EuropeanaLocal?
– feasibility of central Europeana staff handling
hundreds more collections?
• Can we minimise the current manual overhead?
• What are the possibilities for automating all or some of
the transformation work?