Introduction to SeaDataNet Metadata
SeaDataNet Training Course
Introduction to SeaDataNet
Metadata
Roy Lowry
British Oceanographic Data Centre
Overview
• An introduction to the
SeaDataNet metadata formats
covering
Purpose
Entity definition
History
Population
Strengths
Weaknesses
Overview
• SeaDataNet metadata formats
European Directory of Marine Organisations
(EDMO)
Cruise Summary Report (formerly ROSCOP)
European Directory of Marine Environmental
Datasets (EDMED)
European Directory of the Ocean Observing
System (EDIOS)
SeaDataNet Common Data Index (CDI)
European Directory of Marine Environmental
Research Projects (EDMERP)
EDMO
• Purpose
Provides SeaDataNet with an address book of
organisations associated with marine data
Provides descriptions of these organisations
• Entity definition
Any group of people sharing a common postal
address engaged in activities associated with marine
data acquisition and use
• History
Developed by Maris during SEA-SEARCH in
response to a need to improve address metadata
management across the project
EDMO
• Population
On-line Content Management System
fronted by a web form (http://www.seasearch.net/organisations/)
Partners are responsible for maintenance of
their national record set
Management supported by a reasonably
sophisticated access control system that
authenticates users and grants access to
the appropriate database subset
EDMO
• Strengths
The maintenance tool. Please use it to look after the
entries for your country
Provides a single point of entry for SeaDataNet
metadata documents associated with a given
organisation
Centralisation of metadata common to other
catalogues, replacing four independently maintained
address metadata repositories
Rich information content, including descriptions,
logos and spatial location information
EDMO
• Weaknesses
Simple data model is poorly equipped for the
management of organisational evolution
Organisations merge, fragment, rename and move
All we can do in EDMO is document this using plain
language fields
Text fields contain embedded markup
These look very nice when displayed through the
search interface
However, the markup causes problems generating
XML documents for record transport between systems
Examples including graphics and relative URLs break
when transported by copy/paste
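A minimal sketch of the markup problem, assuming a made-up description string and illustrative element names (`organisation`, `description`) that are not taken from the EDMO schema. Pasting marked-up text straight into an XML string can produce a document that does not parse, whereas letting an XML library escape the text keeps the record well-formed. Escaping does not repair relative URLs, which still point nowhere once the text leaves its original site.

```python
import xml.etree.ElementTree as ET

# Hypothetical free-text field containing embedded HTML markup.
description = 'Visit <a href="../about.html">our site</a> & see <b>more</b>'

# Naive concatenation: the bare '&' makes the result ill-formed XML.
naive = (
    "<organisation><description>"
    + description
    + "</description></organisation>"
)


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Letting the library serialise the text escapes '<', '>' and '&'.
# Element names are illustrative only, not the EDMO transport schema.
org = ET.Element("organisation")
ET.SubElement(org, "description").text = description
safe = ET.tostring(org, encoding="unicode")

# The original text survives a round trip through the escaped document.
round_trip = ET.fromstring(safe).find("description").text
```

Here `is_well_formed(naive)` is false while `is_well_formed(safe)` is true, and `round_trip` equals the original `description`.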
CSR
• Purpose
To document the operational and data generation activities
of an oceanographic research cruise
• Entity definition
A subject of some controversy
I am a metadata purist and support the definition of a
‘cruise’ as the interval of time between leaving port and
returning to port
Thus for a 3-leg cruise I would generate 3 CSR records
whilst others would generate just one. I do this because:
Combining records is easier than splitting them
Cruise ‘legs’ for some ships can be VERY different (e.g. 3
legs of a Meteor cruise: one JGOFS, one OMEX, one
WOCE)
Merging ‘legs’ is a slippery slope – I’ve even encountered a
single record covering the activities of two ships three
months apart
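The point that combining records is easier than splitting them can be sketched as follows. The `LegRecord` type, its fields and the dates are hypothetical and are not part of the CSR format. Merging per-leg records only needs a minimum, a maximum and a set union; recovering the individual legs from the merged record is not possible, because the per-leg dates and programme assignments are gone.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LegRecord:
    """Hypothetical, much-reduced stand-in for one CSR record."""
    ship: str
    start: date
    end: date
    programmes: frozenset


def merge_legs(legs):
    """Combine per-leg records for a single ship into one summary record."""
    ships = {leg.ship for leg in legs}
    if len(ships) != 1:
        # Guard against the 'two ships in one record' case.
        raise ValueError("can only merge legs from exactly one ship")
    ordered = sorted(legs, key=lambda leg: leg.start)
    return LegRecord(
        ship=ordered[0].ship,
        start=ordered[0].start,
        end=max(leg.end for leg in ordered),
        programmes=frozenset().union(*(leg.programmes for leg in ordered)),
    )


# Illustrative dates only; they do not describe a real Meteor cruise.
legs = [
    LegRecord("Meteor", date(1990, 3, 1), date(1990, 3, 28), frozenset({"JGOFS"})),
    LegRecord("Meteor", date(1990, 4, 1), date(1990, 4, 29), frozenset({"OMEX"})),
    LegRecord("Meteor", date(1990, 5, 2), date(1990, 5, 30), frozenset({"WOCE"})),
]
merged = merge_legs(legs)
```

The merged record spans the first departure to the last return and lists all three programmes, but nothing in it says which programme belonged to which leg.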
CSR
• Entity definition (continued)
Problem with my definition is that the real world
creates grey areas. For example, does a personnel
change by pilot boat in an estuary count as
‘docking’?
Others extend the definition to cover any activity
collecting oceanographic data (shoehorning)
I believe this is a very bad thing to do
The activity super-class and other activity sub-classes
are much better described by other metadata
standards (e.g. in OGC Observations and
Measurements)
Later on in SeaDataNet we could consider
incorporating some of these to further enrich our
metadata portfolio
In the meantime remember that it is NOT necessary
to have every measurement covered by a CSR. If it
isn’t appropriate, don’t create one.
CSR
• History
Originally a paper form developed by IOC called a
ROSCOP
Replaced in 1990 by the Cruise Summary Report with
richer content (but the name ROSCOP stuck)
Numerous on-line databases developed during the
1990s
Primary repositories now DOD for SeaDataNet
partners and ICES for non-SeaDataNet
CSR
• Population
On-line web-form (http://www.seasearch.net/roscop/welcome.html)
XML schema available for bulk transfers
• Strengths
Flexible population mechanisms
Long history with a massive legacy
population
Cruise is (or should be) a well defined
concept to oceanographers
CSR
• Weaknesses
“Parameter” vocabulary
Really a vocabulary describing shipborne activities
No clear equivalent elsewhere for interoperability, but
ontological mapping to multiple vocabularies might provide
a solution
On-line systems developed using plaintext fields when
controlled vocabularies would have made interoperability
between repositories more straightforward
Spatial coverage limitations
Coarse-grained
Described using Marsden Squares but BODC has deployed
a Web Service to convert these to ISO19115/DIF standard
bounding boxes
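A rough sketch of what a Marsden Square to bounding box conversion involves. This is not the BODC Web Service. It assumes the commonly described numbering in which squares 1 to 288 cover the equator to 80°N in 10° rows of 36, with each row starting at the Greenwich meridian and counting westward; that convention should be checked against an authoritative chart before use. Southern-hemisphere and polar squares are deliberately left out.

```python
def marsden_to_bbox(square: int):
    """Return (west, south, east, north) in degrees for a 10-degree square.

    Assumes squares 1-288 run in rows of 36 from the equator to 80N,
    each row starting at Greenwich and counting westward (an assumption
    of this sketch, to be verified against a Marsden Square chart).
    """
    if not 1 <= square <= 288:
        raise ValueError("this sketch only handles squares 1 to 288")
    row, col = divmod(square - 1, 36)
    south = row * 10
    north = south + 10
    # Column 0 spans 0-10W, column 1 spans 10W-20W, and so on westward.
    east = -10 * col
    west = east - 10
    # Past the antimeridian, express the box in eastern longitudes.
    if west < -180:
        west += 360
        east += 360
    return (west, south, east, north)
```

Under that assumed numbering, `marsden_to_bbox(145)` gives `(-10, 40, 0, 50)`, the box from 10°W to 0° and 40°N to 50°N. The coarse granularity is plain: every cruise touching a square inherits the full 10° by 10° box.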
EDMED
• Purpose
To describe marine environmental datasets to promote
their discovery
• Entity definition
A dataset, but what is a dataset?
ISO19101 defines a dataset as ‘an identifiable collection of
data’ which covers everything from the parameters
measured on a single water sample to the 7,500,000 CTDs
in the USNODC World Ocean Database
Sound judgement is needed to decide upon appropriate
granularity
Best approach is to establish objective criteria
Worth remembering that a measurement may be included in
more than one dataset
Posing this question to metadata specialists can provide
good sport!
EDMED
• History
Developed by BODC in the late 1980s
Adopted by EU MAST Data Committee, then SEA-SEARCH and now SeaDataNet
• Population
Form interface to stand-alone Access database that
is submitted to BODC for ingestion
XML schema available for bulk transfers
• Strengths
Content quality controlled on ingestion, therefore
standards are high
Rich content developed during SEA-SEARCH
EDMED
• Weaknesses
Developed in splendid isolation,
including vocabularies, therefore
interoperability with other systems is
difficult
Heavy dependence on plaintext fields:
a problem that should be addressed
during SeaDataNet
EDIOS
• Purpose
To describe marine environmental datasets
comprising data that are collected repeatedly,
regularly and routinely in order to promote their
discovery (initially for operational planning purposes)
• Entity definition
A dataset comprised of data that are collected
repeatedly, regularly and routinely, but what is a
dataset (cf. EDMED)?
• History
Developed as an EU project led by EuroGOOS
Inherited by SeaDataNet
EDIOS
• Population
Currently an issue
There is a Word-based form (the MIF)
– Developed in parallel to the data model and
database with no evidence of communication
– Completed MIFs entered into the database at
BODC, requiring significant interpretation and
information rehashing (long and painful process)
SeaDataNet work in progress
– IFREMER/BODC working to produce an XML
schema to facilitate large-scale transfer
– Maris/BODC developing a web-form based
content management system along the lines of
EDMO
EDIOS
• Strengths
Rich data model based on structured fields
with minimal plaintext
Data model includes hierarchical
relationships between entities (project one-to-many
observing programmes one-to-many measurement series)
Data model includes support for complex
spatial objects (polygons not boxes)
Data model is particularly well suited to the
description of operational oceanographic
systems
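The hierarchy and polygon support listed above could be pictured with a sketch like the following. The class and field names are invented for illustration and do not reproduce the EDIOS relational schema.

```python
from dataclasses import dataclass, field


@dataclass
class MeasurementSeries:
    """Hypothetical leaf entity: one routinely collected series."""
    name: str
    # Spatial footprint as polygon vertices (longitude, latitude),
    # rather than a simple bounding box.
    footprint: list


@dataclass
class ObservingProgramme:
    """One programme owns many measurement series."""
    name: str
    series: list = field(default_factory=list)


@dataclass
class Project:
    """One project owns many observing programmes."""
    name: str
    programmes: list = field(default_factory=list)


def count_series(project: Project) -> int:
    """Walk the two one-to-many links and count the leaf series."""
    return sum(len(programme.series) for programme in project.programmes)


# Invented example content.
triangle = [(-5.0, 50.0), (-4.0, 50.5), (-4.5, 51.0)]
project = Project(
    "Example coastal monitoring",
    [
        ObservingProgramme(
            "Tide gauges",
            [MeasurementSeries("Sea level, station A", triangle)],
        ),
        ObservingProgramme(
            "Ferry boxes",
            [
                MeasurementSeries("Surface temperature", triangle),
                MeasurementSeries("Surface salinity", triangle),
            ],
        ),
    ],
)
```

Because every level is a structured object rather than free text, questions such as "how many series does this project run?" reduce to a traversal like `count_series(project)`.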
EDIOS
• Weaknesses
At the start of SeaDataNet EDIOS had
17 local vocabularies
Extremely poor content governance
Undergoing replacement with
managed SeaDataNet standard
vocabularies (6 down, 11 to go)
Legacy content has not been
systematically quality controlled
EDIOS
• How is EDIOS different from EDMED?
Both are content standards designed to
describe datasets
Any dataset described by an EDMED
document could be described by an EDIOS
document and vice versa
Once vocabularies have been harmonised
and some mappings set up it should be
possible to generate an EDMED document
from an EDIOS document
Generation of an EDIOS document from an
EDMED document will never be possible
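The asymmetry can be illustrated with a toy sketch, in which all field names are hypothetical and neither real format is reproduced. Flattening structured fields into descriptive text is a simple formatting step; going the other way would mean reliably parsing free text back into fields, which plaintext does not support in general.

```python
def to_plaintext_summary(programme: dict) -> str:
    """Flatten a structured (EDIOS-like) entry into free text (EDMED-like).

    The dictionary keys are invented for this sketch.
    """
    lines = [f"{programme['name']}:"]
    for series in programme["series"]:
        lines.append(
            f"- {series['parameter']} every {series['interval_hours']} h "
            f"at {series['station']}"
        )
    return "\n".join(lines)


# Invented example entry.
programme = {
    "name": "Example tide gauge network",
    "series": [
        {"parameter": "sea level", "interval_hours": 1, "station": "Station A"},
        {"parameter": "air pressure", "interval_hours": 6, "station": "Station B"},
    ],
}
summary = to_plaintext_summary(programme)
```

The resulting `summary` reads well to a person, but another author could have written "hourly" or "four times a day", so no general inverse function exists.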
EDIOS
• How is EDIOS different from EDMED?
SeaDataNet convention is to use EDIOS for
‘qualifying’ datasets and EDMED for everything else
EDMED currently has a working population
mechanism, but EDIOS does not
Advice to partners
Identify datasets to be described by EDIOS
documents, map them to the EDIOS data model
(relational schema and Access prototype on BSCW)
and gather together the necessary information
Prepare EDMED documents for all other data sets
and get them into BODC
Submit EDIOS entries to BODC once the necessary
systems are operational
CDI
• Purpose
To provide an ultra-light discovery metadata
description of accessible SeaDataNet data objects
Used to build a manageable fine-grained index of
discrete data objects (millions of entries)
• Entity definition
The fundamental SeaDataNet data delivery unit such
as a current meter record or a CTD profile
• History
Developed by SEA-SEARCH as a pilot for SeaDataNet
CDI
• Population
XML schema describing files that should be
generated automatically from existing digital
indexes
• Strengths
Light content makes efficient handling of
large numbers of records possible
• Weaknesses
Light content restricts available information
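A minimal sketch of the population approach, generating one light XML record per row of an existing index. The CSV columns and the element names (`records`, `record`, and so on) are placeholders and not the CDI schema; a real export would follow the published schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Stand-in for an existing digital index held by a data centre.
INDEX_CSV = """id,type,lat,lon,start
CTD0001,CTD profile,50.25,-4.22,1998-06-01
CM0042,current meter record,56.10,-7.35,1999-02-14
"""

FIELDS = ("type", "lat", "lon", "start")


def index_to_records(csv_text: str) -> ET.Element:
    """Build one small XML record per index row.

    Element names are illustrative only, not the CDI schema.
    """
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(csv_text.strip())):
        record = ET.SubElement(root, "record", id=row["id"])
        for name in FIELDS:
            ET.SubElement(record, name).text = row[name]
    return root


root = index_to_records(INDEX_CSV)
xml_text = ET.tostring(root, encoding="unicode")
```

Each record carries only a handful of fields, which is what keeps millions of entries manageable and also what limits the information each one can offer.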
EDMERP
• Purpose
Description of European marine research
projects and programmes
• Entity definition
A co-ordinated collection of marine data
acquisition activities in Europe
• History
Developed by Maris during SEA-SEARCH
EDMERP
• Population
Access form: resulting mdb file submitted to
Maris
On-line content management system
planned
• Strengths
Provides centralised project metadata
• Weaknesses
Local vocabularies and plaintext
That’s All Folks!
Questions or
Geoff?