Data driven research in Earth and Environmental Sciences


Joining the Dots
Managing and identifying geolocated data by DOIs and IGSNs
Jens Klump | OCE Science Leader Earth Science Informatics
20 August 2014
MINERAL RESOURCES FLAGSHIP
A few words to introduce myself ...
• 1992 – 1995 B.Sc. in geology and in oceanography, Univ. Cape
Town, South Africa.
• 1995 B.Sc. (Honours) in geology (exploration geochemistry) from
Univ. Cape Town, South Africa.
• 1996 – 1999 PhD in marine geology (biogeochemistry) from Univ.
Bremen, Germany.
• 1999 – 2000 training in application and database development,
project management.
• 2000 – 2001 IT project manager for DIE ZEIT (weekly newspaper,
Hamburg, Germany).
• 2001 – 2014 senior research scientist at the German Research
Centre for Geosciences GFZ, Potsdam, Germany.
• Since March 2014 CSIRO OCE Science Leader Earth Science
Informatics.
2 | Joining the dots | Jens Klump
Previous Work
• Supporting the research data value chain
• Understanding data management in geoscience research
• Development of project and enterprise research data solutions
• Development and implementation of persistent identifiers (DOI, IGSN)
• Integration of data from heterogeneous sources
• Information models to describe data and processes
• Semantic technologies for data interoperability
• Adoption of new technologies
• Studies on HPC, visualisation, 3D printing, internet of things
• Application of information technology to geosciences
• Sensor web enablement in environmental monitoring networks and in the
laboratory
• Data driven research on natural gas hydrates
DOI:
Data Publication and Citation
Making data part of the record of science
HTTP Error 404
History of DOI
• “Link rot” was recognised as a problem early on and led to the
development of the handle system of persistent identifiers in
1995.
• DOI proposed 1997 and in production since 1998.
• First DOI for data minted in 2004 in the context of a DFG project.
• A business model had to be found to expand DOI for data to an international
scale.
• DataCite founded in 2009.
• 31 members at present, 3.6 M datasets registered (1.2 M in last 12 months)
• The total number of journal publications was estimated at 1.8 M articles for 2012.
• Some of the datasets are very fine-grained.
Data in publications
http://dx.doi.org/10.1594/GFZ.SDDB.1043
Access to data
• Description
• Citation
• Related materials
• Download data
• Download metadata
• ISO19115
• NASA DIF
• DataCite
• eSciDoc
http://dx.doi.org/10.1594/GFZ.SDDB.1043
DOI for data
• Resolution 
• Resolution from DOI to URL provided by Handle service.
• Granularity?
• What is the smallest identifiable object?
• Identity?
• What exactly is identified by a DOI?
• Versioning?
• Updates, corrections, errata …
• Time series?
• Continuing time series from environmental monitoring
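The resolution step listed above can be sketched in code. This is a minimal sketch, assuming the JSON lookup interface of the doi.org proxy (`/api/handles/`); the function names are hypothetical and the record used to check the parser should be a stand-in, not real registry output.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: the doi.org proxy serves Handle records as JSON under this path.
HANDLE_API = "https://doi.org/api/handles/"


def handle_api_url(doi: str) -> str:
    """Build the lookup URL for a DOI, accepting input with a 'doi:' prefix."""
    name = doi.strip()
    if name.lower().startswith("doi:"):
        name = name[4:]
    return HANDLE_API + urllib.parse.quote(name, safe="/")


def url_from_handle_record(record: dict) -> Optional[str]:
    """Return the first value of type URL in a Handle record, if any."""
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None


def resolve_doi(doi: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch the Handle record over the network and extract the target URL."""
    with urllib.request.urlopen(handle_api_url(doi), timeout=timeout) as resp:
        return url_from_handle_record(json.load(resp))
```

Calling `resolve_doi("doi:10.1594/GFZ.SDDB.1043")` needs network access. The identifier stays fixed while the URL stored in the Handle record can be updated, which is what guards against link rot.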
The Ship of Theseus Paradox
[Figure: a ship from Year 1 to Year n, with one plank changed each year.]
The Ship of Theseus Paradox
[Figure: the same sequence from Year 1 to Year n, one plank changed each year, alongside the collected planks in Year n.]
The Ship of Theseus Paradox
Can any object be identical with another object?
Is it the equivalent object we are looking for?
What is represented by the identifier?
Formally the Ship of Theseus Paradox can be approached by
introducing the concept of perdurantism.
The perdurantist view is that an individual has distinct temporal
parts throughout its existence.
Perdurantism is usually presented as the antipode to endurantism,
the view that an individual is wholly present at every moment of
its existence.
Single item
Appended time series
Updated item
Snapshots
Collection
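One way to read the cases above in perdurantist terms is to treat a growing time series as one individual with distinct temporal parts. The sketch below is an illustration under that reading, not a scheme taken from the talk: the class names, the `example-series` identifier and the hash-suffix naming are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """One temporal part of a series: frozen content plus its own identifier."""
    identifier: str
    values: tuple


@dataclass
class TimeSeries:
    """A continuing series that keeps one identifier while it grows."""
    identifier: str
    values: list = field(default_factory=list)

    def append(self, value: float) -> None:
        self.values.append(value)

    def snapshot(self) -> Snapshot:
        # Hypothetical naming scheme: series identifier plus a short content hash.
        digest = hashlib.sha256(repr(self.values).encode("utf-8")).hexdigest()[:8]
        return Snapshot(f"{self.identifier}@{digest}", tuple(self.values))


series = TimeSeries("example-series")  # placeholder identifier, not a real DOI
series.append(1.0)
year1 = series.snapshot()
series.append(2.0)
year2 = series.snapshot()
```

Here `series.identifier` stays the same across appends, while `year1` and `year2` carry different identifiers and different, immutable content. A citation can then point either at the evolving whole or at one fixed state.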
doi:10.1594/GFZ.SDDB.1202
Publication of Geodata
Repositories vs. Services
• How should data identified by DOI be disseminated?
• File based:
• Generic, close to original record of science, OAIS compliant.
• Limited for use by user agents (machines), often requires manual
interventions.
• Services:
• Machine friendly, use can be automated.
• Storage not OAIS compliant.
• File based data can be transformed into services.
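The last point can be illustrated with a small sketch: a file is parsed once and then exposed through a query function that a program can call without manual steps. The file content, column names and function names are assumptions made for illustration; a real service would sit behind a network interface.

```python
import csv
import io

# Hypothetical file content standing in for a published data file.
CSV_TEXT = """sample,latitude,longitude,depth_m
A,-33.9,18.4,10.0
B,52.4,13.1,250.0
C,-31.9,115.8,40.0
"""


def load_records(text):
    """Parse the file once into typed records."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "sample": row["sample"],
            "latitude": float(row["latitude"]),
            "longitude": float(row["longitude"]),
            "depth_m": float(row["depth_m"]),
        })
    return rows


def query_bbox(records, south, north, west, east):
    """Service-style operation: return the records inside a bounding box."""
    return [
        r for r in records
        if south <= r["latitude"] <= north and west <= r["longitude"] <= east
    ]


records = load_records(CSV_TEXT)
southern = query_bbox(records, south=-90, north=0, west=-180, east=180)
```

The file remains the archived record, and the query function is a derived view that can be rebuilt from it at any time.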
IGSN:
International Geo Sample Number
Connecting Geology to the Internet of Things
Internet of Things
• “The Internet of Things refers to uniquely identifiable objects
(things) and their virtual representations in an Internet-like
structure.”
Internet of Things
• Specimens are a basic unit for Geoscience observations.
• basic unit in data reporting.
• basic unit for data discovery, access, and analysis.
• Access to information about the samples is essential for evaluation
and interpretation of specimen-based data.
• Access to physical specimens makes it possible to build more comprehensive
datasets and facilitates re-use of resources.
• No standard way to access information about specimens
• Few online repository catalogues
• Few disciplinary catalogues (e.g. Index of Marine & Lacustrine Geological
Samples, IODP)
• Incomplete specimen metadata in publications – if any.
Why do we need identifiers for specimens?
Locations of rock specimens in EarthChem called “M1”.
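The naming problem can be shown with a short sketch. The records below are invented for illustration and are not EarthChem data; the `namespace/local_name` key is a hypothetical stand-in for a globally unique identifier.

```python
from collections import defaultdict

# Hypothetical catalogue entries: two collections each hold a specimen named "M1".
specimens = [
    {"namespace": "COLLECTION-A", "local_name": "M1", "lat": -33.9, "lon": 18.4},
    {"namespace": "COLLECTION-B", "local_name": "M1", "lat": 52.4, "lon": 13.1},
    {"namespace": "COLLECTION-A", "local_name": "M2", "lat": -34.0, "lon": 18.5},
]


def index_by(records, key_func):
    """Group records under the key produced by key_func."""
    index = defaultdict(list)
    for record in records:
        index[key_func(record)].append(record)
    return index


# Local names collide across collections; namespaced keys do not.
by_name = index_by(specimens, lambda r: r["local_name"])
by_guid = index_by(specimens, lambda r: f'{r["namespace"]}/{r["local_name"]}')

ambiguous = [name for name, hits in by_name.items() if len(hits) > 1]
```

In this sketch `ambiguous` contains `"M1"`, while every key in `by_guid` maps to exactly one specimen.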
Globally Unique Identifiers
• Verification of literature data
without GUIDs for data and drill
holes or samples required in-depth
knowledge of the organisational
structures of ocean drilling.
• Data were available, but
difficult to find.
• Search involved PANGAEA and
SEDIS (IODP).
Literature, Data, Samples
[Figure: literature, data and samples linked through identifiers (doi:10..., doi:10.1594/..., IGSN hdl: ...) and search.]
Why not use DOI for specimens?
• DOI could be used for specimens.
• Remember, it’s a digital identifier for objects, not
only an identifier for digital objects.
• Historically, TIB Hannover declined to
register DOI for specimens on formal
grounds. This was prior to DataCite.
• The use case of dealing with physical
specimens called for a different set of rules
even though structures are similar to
DataCite.
• Based on the Handle system, IGSN can easily
be merged with DataCite in the future.