ULI as Testbed - Scientific Committee on Oceanic Research

Pilot Implementation: Publication and Citation of Scientific Primary Data
Result of CODATA WG, supported by DFG
Jan Brase
Learning Lab Lower Saxony, Uni. Hannover
Michael Lautenschlager
WDC for Climate
Model and Data / Max-Planck-Institute for Meteorology
ERPANET WS, Cork, Ireland, 17+18.06.04
IDF Member's Meeting, London, 22.06.04
Roots
• The CODATA (1) National Committee initiated a working group (WG), grant-aided by the DFG.
Working Period
• September 2001 to May 2002
Result
• Final Report "Konzept zur Zitierfähigkeit wissenschaftlicher Primärdaten" ("Conception of Citing Scientific Primary Data"), Hannover, 29.05.2002
Continuation
• Two-year project for pilot implementation, funded by the DFG, starting in October 2003
(1) CODATA: Committee on Data for Science and Technology
J.Brase (L3S) + M.Lautenschlager (WDCC) / 18.06.04 / 2
Northern Hemisphere temperature response for scenario IS92a
[Figure: NH mean temperature anomaly relative to the 1961–1990 mean of the IPCC DDC greenhouse-gas-only experiments; marked points ECHAM4 / 1: ΔT = 0.7°C, ECHAM4 / 2: ΔT = 2.5°C, ECHAM4 / 3: ΔT = 4.3°C]
Each curve is associated with approx. 1 TB of data (numbers).
ECHAM4 / 1: Temperature 2000
[Figure: temperature, -8°C to -12°C; corresponds to point 1 in the NH temperature anomaly curve; CO2 = 370 ppmv; ECHAM4/OPYC, greenhouse gas only, according to IS92a]
ECHAM4 / 2: Temperature 2050
[Figure: temperature, -4°C to -8°C; corresponds to point 2 in the NH temperature anomaly curve; CO2 = 500 ppmv; ECHAM4/OPYC, greenhouse gas only, according to IS92a]
ECHAM4 / 3: Temperature anomaly 2099
[Figure: temperature, 0°C to -4°C; corresponds to point 3 in the NH temperature anomaly curve; CO2 = 690 ppmv; ECHAM4/OPYC, greenhouse gas only, according to IS92a]
Problem and Solution
Shortcomings in data provision and interdisciplinary use:
• Rules of good scientific practice are not observed in all cases.
• Data sources are widely unknown.
• Data are archived without context.
• Data cannot be cited as independent entities.
Method of solution: publication of primary data as independent entities
• Persistent identifier with a global resolving mechanism for data archive and context referencing (scientific data model at archive level)
• Integration into library catalogues, so that data can be found together with articles
• STD-DOI application profile: metadata kernel + items for electronic publication (interface between scientific data archives and libraries)
Credits in Science
"Citation Index": scientific efficiency is "measured" by publications.
Extra work for data publication is currently not acknowledged:
• data processing, context documentation, quality assurance.
Recommendation: data publications should be included in the standard scientific "Citation Index".
• Motivation of the individual scientist.
• Connection between person and primary dataset.
Citable data publications
• support the rules of good scientific practice,
• encourage interdisciplinary data utilisation,
• make data searchable in library catalogues together with articles,
• close the gap between scientific literature and related data sources.
Metadata for primary data 1
Attribute | Example
1. DOI | 10.1594/WDCC/IPCC_EH4_OPYC_SRES_B2_MM
2. identifier | URN:TIB:10.1594/WDCC/IPCC_EH4_OPYC_SRES_B2_MM
3. creator | Monika Esch (Author)
4. publisher | WDCC, World Data Center for Climate
5. title | Climate Projection for the next Century calculated by the Global Climate Model ECHAM4OPYC using the SRES B2 IPCC Scenario
6. language | en
7. StructuralType | Digital
8. mode | Abstract
9. resourceType | Dataset
Metadata for primary data 2
Attribute | Example
10.-12. registration information | 10.1594 (RA) / 1 (issue no.) / 2004-07-18 (issue date)
13. creationDate | 2001-12-31
14. publicationDate | 2004-07-18
15. description | These data represent results from the ECHAM4/OPYC climate model running the SRES B2 scenario. The database tables contain monthly mean time series of ……
16. publicationPlace | Hamburg
17. size | 614190228 Bytes
18. format | GRIB
19. edition | 1
20. relatedDOIs | (none)
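The two metadata tables together form a kernel of 20 attributes. As a minimal sketch, assuming Python and a flat record, they could be held in a dataclass like the one below. The class name `PrimaryDataMetadata`, the snake_case field names and the field types are illustrative assumptions, not the STD-DOI schema; attributes 10-12 are split into three fields so that the record has exactly 20 entries.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class PrimaryDataMetadata:
    """Hypothetical record for the 20 kernel attributes (names and types assumed)."""
    doi: str                  # 1
    identifier: str           # 2 (URN)
    creator: str              # 3
    publisher: str            # 4
    title: str                # 5
    language: str             # 6
    structural_type: str      # 7
    mode: str                 # 8
    resource_type: str        # 9
    registration_agency: str  # 10 (DOI prefix of the RA)
    issue_number: int         # 11
    issue_date: str           # 12, YYYY-MM-DD
    creation_date: str        # 13
    publication_date: str     # 14
    description: str          # 15
    publication_place: str    # 16
    size_bytes: int           # 17
    data_format: str          # 18
    edition: int              # 19
    related_dois: List[str] = field(default_factory=list)  # 20, empty = "(none)"


# The example values from the tables above.
example = PrimaryDataMetadata(
    doi="10.1594/WDCC/IPCC_EH4_OPYC_SRES_B2_MM",
    identifier="URN:TIB:10.1594/WDCC/IPCC_EH4_OPYC_SRES_B2_MM",
    creator="Monika Esch",
    publisher="WDCC, World Data Center for Climate",
    title="Climate Projection for the next Century calculated by the Global "
          "Climate Model ECHAM4OPYC using the SRES B2 IPCC Scenario",
    language="en",
    structural_type="Digital",
    mode="Abstract",
    resource_type="Dataset",
    registration_agency="10.1594",
    issue_number=1,
    issue_date="2004-07-18",
    creation_date="2001-12-31",
    publication_date="2004-07-18",
    # Shortened: the slide elides the rest of the description.
    description="These data represent results from the ECHAM4/OPYC climate "
                "model running the SRES B2 scenario.",
    publication_place="Hamburg",
    size_bytes=614190228,
    data_format="GRIB",
    edition=1,
)
```

Keeping the size as an integer number of bytes (rather than the string "614190228 Bytes") is a design choice of this sketch; it makes sizes comparable and summable.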
Criteria for Persistent Identifier Allocation
Critical points are ensuring data quality and a stable connection between identifier and data entity.
• Checks during allocation are restricted to syntax and completeness, i.e. that an expert data description and long-term archiving are in place.
• Scientific quality assurance is expected from the author and will be reviewed during the allocation process.
• Like published articles, published primary data cannot be changed.
• A stable connection between identifier reference and data entity, as well as long-term availability of the primary data, is essential and must be ensured (e.g. by the ICSU WDCs).
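The "syntax and completeness" check can be sketched as a small validation function, assuming metadata arrives as a Python dict. Which attributes are mandatory is not stated on the slides, so the `MANDATORY` tuple is a hypothetical choice, and the DOI pattern is a simplified check ("10." plus a registrant code, a slash and a non-empty suffix), not the full DOI syntax. Scientific quality is deliberately not checked here, matching the division of labour described above.

```python
import re

# Hypothetical set of mandatory attributes (an assumption, not from the slides).
MANDATORY = ("doi", "creator", "publisher", "title", "publication_date")

# Simplified DOI syntax: "10." + dot-separated digits + "/" + non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d+(\.\d+)*/\S+$")
# Date syntax only (YYYY-MM-DD); calendar validity is not checked.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # Completeness: every mandatory attribute must be present and non-blank.
    for name in MANDATORY:
        if not str(record.get(name, "")).strip():
            problems.append(f"missing mandatory attribute: {name}")
    # Syntax: check the DOI and the publication date if they are present.
    doi = record.get("doi", "")
    if doi and not DOI_PATTERN.match(doi):
        problems.append(f"malformed DOI: {doi}")
    date = record.get("publication_date", "")
    if date and not DATE_PATTERN.match(date):
        problems.append(f"publication_date is not YYYY-MM-DD: {date}")
    return problems
```

A record that fails returns every problem at once, so the data centre can fix all of them before resubmitting.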
DOI and URN
DOI (Digital Object Identifier) | URN (Uniform Resource Name)
Non-profit, but membership fee | Presently free of charge
Extended metadata support | Basic technical metadata
System of registration agencies (infrastructure) | Anybody can register URN namespaces
Global resolving mechanism | Resolving at community level
[Figure: architecture of the DFG project "Publication and Citation of Scientific Primary Data". International DOI Foundation (global Handle System); TIB Hannover as registration agency; data centres GFZ (geophysics), M&D/MPIM (climate models) and Marum/AWI (observations), each providing data storage and long-term archiving (M&D/MPIM and Marum/AWI in WDCs); DDB as URN node; TIB-ORDER library catalogue]
More Details of Pilot Implementation
Application Example
Primary data publication
• During her research for the World Data Center for Climate (WDCC), the scientist Mrs. Weather obtains primary data about the weather in Hannover in the year 2003.
• As usual, the primary data are tested, evaluated, stored and administered at the WDCC.
• In addition, Mrs. Weather registers the primary data at the TIB (primary data publication by STD-DOI/URN assignment).
Registration of primary data
• After quality assurance, the WDCC transmits to the TIB the URL where the data can be accessed, together with an XML file containing all relevant metadata (generated from the scientific data model).
• This includes all information obligatory for citing electronic media (ISO 690-2):
  - author
  - title
  - size
  - edition
  - language
  - publisher
  - publishing date
  - publishing place
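The transmission step (data URL plus an XML metadata file) can be sketched with Python's standard `xml.etree.ElementTree`. The element names `registration`, `url` and `metadata`, and the use of underscores in `publishing_date`/`publishing_place` (XML names cannot contain spaces), are assumptions for illustration; the actual STD-DOI XML schema is not given on the slides. The data URL is a placeholder, and the field values are taken from the ECHAM4/OPYC metadata example.

```python
import xml.etree.ElementTree as ET


def build_registration_xml(data_url: str, metadata: dict) -> str:
    """Build a hypothetical registration file: the data URL plus metadata fields."""
    root = ET.Element("registration")
    ET.SubElement(root, "url").text = data_url
    meta = ET.SubElement(root, "metadata")
    for name, value in metadata.items():
        # One child element per field, e.g. <author>Monika Esch</author>.
        ET.SubElement(meta, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# ISO 690-2 fields filled with the ECHAM4/OPYC example values.
example_fields = {
    "author": "Monika Esch",
    "title": "Climate Projection for the next Century calculated by the Global "
             "Climate Model ECHAM4OPYC using the SRES B2 IPCC Scenario",
    "size": 614190228,
    "edition": 1,
    "language": "en",
    "publisher": "WDCC, World Data Center for Climate",
    "publishing_date": "2004-07-18",
    "publishing_place": "Hamburg",
}

# Placeholder URL; the real access URL is supplied by the data centre.
xml_text = build_registration_xml("https://example.org/wdcc/dataset", example_fields)
```

ElementTree escapes special characters such as `&` and `<` in the values automatically, which matters for free-text fields like titles and descriptions.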
Identifier
• The TIB stores this information about the primary data and assigns the primary data a unique identifier for registration: a DOI.
• DOI (Digital Object Identifier) is a system for persistent and actionable identification and interoperable exchange of intellectual property on digital networks.
• It is coordinated by the International DOI Foundation (IDF).
DOI structure: prefix / suffix, e.g. 10.1000/123456 (prefix 10.1000, suffix 123456)
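The prefix/suffix structure can be illustrated with a small parser. This is a sketch, assuming a DOI is given either bare or with a leading `doi:`; the function name `split_doi` is hypothetical. The prefix ends at the first slash, so a suffix may itself contain further slashes (as the TIB examples do).

```python
def split_doi(doi: str) -> tuple:
    """Split a DOI such as "10.1000/123456" into (prefix, suffix)."""
    text = doi.strip()
    if text.lower().startswith("doi:"):
        text = text[4:].strip()
    # Split at the first "/" only; everything after it is the suffix.
    prefix, sep, suffix = text.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix
```

For example, `split_doi("10.1000/123456")` gives `("10.1000", "123456")`.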
Citing primary data
In her publications, Mrs. Weather now cites the primary data by their unique DOI, maintained by the TIB:
doi:10.1594/WDCC/W_Han_2003_MMB_2
• 10.1594 (prefix) stands for the TIB as the registration agency.
• WDCC stands for the respective research institute.
• W_Han_2003_MMB_2 is the internal name of the dataset.
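The three-part reading of the example DOI can be sketched as follows. This assumes the convention "prefix/INSTITUTE/NAME" seen in the TIB examples; DOI suffixes are opaque strings in general, so the split only holds for DOIs that follow this convention. The names `StdDoiParts` and `parse_std_doi` are hypothetical.

```python
from typing import NamedTuple


class StdDoiParts(NamedTuple):
    registration_agency: str  # prefix, e.g. "10.1594" for the TIB
    institute: str            # e.g. "WDCC"
    dataset: str              # internal name of the dataset


def parse_std_doi(doi: str) -> StdDoiParts:
    """Split a DOI of the assumed form prefix/INSTITUTE/NAME into its parts."""
    text = doi.strip()
    if text.lower().startswith("doi:"):
        text = text[4:].strip()
    # At most two splits: any further "/" stays inside the dataset name.
    parts = text.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected DOI layout: {doi!r}")
    return StdDoiParts(*parts)
```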
Resolving the DOI
Such DOIs can be resolved in any web browser worldwide (which makes the data citable) in three ways:
• http://dx.doi.org/10.1594/WDCC/W_Han_2003_MMB_2
• http://doi.tib-hannover.de:8000/10.1594/WDCC/W_Han_2003_MMB_2
• doi://10.1594/WDCC/W_Han_2003_MMB_2 (after installing a browser plugin)
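The two HTTP resolution routes amount to appending the DOI to a resolver base URL. A minimal sketch, assuming the resolver addresses listed on the slide (in 2004): the generic proxy is nowadays usually written `https://doi.org/`, and the TIB handle server address may no longer be reachable, so the code only builds strings and does not contact any server. `RESOLVERS` and `resolver_urls` are hypothetical names.

```python
from urllib.parse import quote

# Resolver base URLs as listed on the slide (2004).
RESOLVERS = (
    "http://dx.doi.org/",
    "http://doi.tib-hannover.de:8000/",
)


def resolver_urls(doi: str) -> list:
    """Return one HTTP URL per resolver for the given bare DOI."""
    # Percent-encode characters that are unsafe in URL paths, but keep "/".
    path = quote(doi.strip(), safe="/")
    return [base + path for base in RESOLVERS]
```

Percent-encoding matters because DOI suffixes may contain characters such as `#` or `?`, which would otherwise be read as URL fragment or query delimiters.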
Usage scenario 1
• Mr. Storm is reading publications by Mrs. Weather in a journal and would like to analyse her data from different perspectives.
• In his publication "Comparison of the weather from Hannover and Miami", Mr. Storm cites Mrs. Weather's data using its DOI, referring to the uniqueness and own identity of the original data.
• Citation example:
Weather, 2003: Weather in Hannover for 2003. [doi:10.1594/WDCC/W_Han_2003_MMB_2]
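The citation example follows a simple pattern, "Creator, Year: Title. [doi:DOI]", which can be produced by a one-line formatter. This sketch copies the pattern from that single example; it is not a complete implementation of ISO 690-2 or of any journal's citation style, and `format_data_citation` is a hypothetical name.

```python
def format_data_citation(creator: str, year: int, title: str, doi: str) -> str:
    """Format a dataset citation as "Creator, Year: Title. [doi:DOI]"."""
    return f"{creator}, {year}: {title}. [doi:{doi}]"


citation = format_data_citation(
    "Weather", 2003, "Weather in Hannover for 2003", "10.1594/WDCC/W_Han_2003_MMB_2"
)
```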
Usage scenario 2
• Mr. Nice is writing a paper about ice cream sales figures in Hannover in 2003, but he has no information about the weather.
• He uses the TIB as the central registration agency to start a metadata search over the registered primary data.
• The result is doi:10.1594/WDCC/W_Han_2003_MMB_2.
• He resolves the DOI and finds the data sufficient for his purpose.
• The metadata refers him to the WDCC as publisher and data archive.
• In his paper, he too cites the data using their DOI.
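The metadata search in this scenario can be sketched as a keyword match over registered records. This assumes a hypothetical in-memory `REGISTRY` mapping DOIs to a few metadata fields; the actual TIB search ran against its catalogue database, whose query interface is not described here. The two entries reuse titles and publishers that appear elsewhere in this document.

```python
from typing import Dict, List

# Hypothetical in-memory index: DOI -> selected metadata fields.
REGISTRY: Dict[str, Dict[str, str]] = {
    "10.1594/WDCC/W_Han_2003_MMB_2": {
        "title": "Weather in Hannover for 2003",
        "publisher": "WDCC",
    },
    "10.1594/WDCC/IPCC_EH4_OPYC_SRES_B2_MM": {
        "title": "Climate Projection for the next Century calculated by the "
                 "Global Climate Model ECHAM4OPYC using the SRES B2 IPCC Scenario",
        "publisher": "WDCC",
    },
}


def search(registry: Dict[str, Dict[str, str]], *terms: str) -> List[str]:
    """Return the DOIs whose metadata contain every term (case-insensitive)."""
    hits = []
    for doi, fields in registry.items():
        text = " ".join(fields.values()).lower()
        if all(term.lower() in text for term in terms):
            hits.append(doi)
    return hits
```

With these entries, `search(REGISTRY, "Hannover", "2003")` returns only the Hannover weather DOI, which Mr. Nice can then resolve and cite.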
URN
In cooperation with the German Library (DDB) in Frankfurt, every dataset is also registered with a unique URN that has the same structure as the DOI:
DOI structure: 10.1594/WDCC/W_Han_2003_MMB_2
URN structure: URN:TIB:10.1594/WDCC/W_Han_2003_MMB_2
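Because the URN reuses the DOI string, converting between the two is a matter of adding or removing the `URN:TIB:` prefix. A minimal sketch, assuming exactly that prefix as shown in the examples; whether the DDB applied any further normalisation is not stated. The "urn" scheme and the namespace identifier are case-insensitive, so the reverse direction matches them case-insensitively. Function names are hypothetical.

```python
def doi_to_urn(doi: str, namespace: str = "TIB") -> str:
    """Derive the URN by prefixing the DOI, e.g. 10.1594/X -> URN:TIB:10.1594/X."""
    text = doi.strip()
    if text.lower().startswith("doi:"):
        text = text[4:].strip()
    return f"URN:{namespace}:{text}"


def urn_to_doi(urn: str, namespace: str = "TIB") -> str:
    """Inverse of doi_to_urn; scheme and namespace are matched case-insensitively."""
    prefix = f"urn:{namespace}:".lower()
    if not urn.lower().startswith(prefix):
        raise ValueError(f"not a {namespace} URN: {urn!r}")
    return urn[len(prefix):]
```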
Current situation
In cooperation with the World Data Center for Climate (WDCC, Max Planck Institut für Meteorologie, Hamburg), the Geoforschungszentrum Potsdam, the World Data Center MARE (Uni. Bremen and Alfred Wegener Institute, Bremerhaven) and the Learning Lab Lower Saxony (Uni. Hannover), the TIB Hannover is now the world's first registration agency for scientific and technical data (STD-DOI).
Technical
• A Handle server is installed at the TIB Hannover, so the TIB is able to register and resolve DOIs.
• The TIB has officially received a DOI prefix (10.1594).
• The first datasets have been stored at the TIB manually.
• The automatic registration process is under development.
Technical realization
[Figure: components of the registration system: Central Library database, Göttingen (metadata storage); International DOI Foundation (DOI registration); Cocoon web server (XML-based, XSL transformation); Handle server; DDB (URN registration); GFZ and WDCs (data URL with XML file)]
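The flow in the diagram (a data centre submits a data URL with metadata; the metadata are stored, the DOI is registered on the Handle server and a matching URN at the DDB) can be sketched with in-memory stand-ins. Everything here is hypothetical: `RegistrationPipeline` and its dictionaries only mimic the roles of the components and do not model the interfaces of Cocoon, the Handle server, the Göttingen database or the DDB. Refusing a second registration of the same DOI reflects the rule that published primary data cannot be changed.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RegistrationPipeline:
    """Hypothetical in-memory stand-in for the components in the diagram."""
    metadata_store: Dict[str, dict] = field(default_factory=dict)  # library database
    handles: Dict[str, str] = field(default_factory=dict)          # DOI -> data URL
    urns: Dict[str, str] = field(default_factory=dict)             # URN -> DOI

    def register(self, doi: str, data_url: str, metadata: dict) -> str:
        """Store metadata, register the DOI and the matching URN; return the URN."""
        if doi in self.handles:
            # Published primary data cannot be changed, so refuse re-registration.
            raise ValueError(f"DOI already registered: {doi}")
        self.metadata_store[doi] = dict(metadata)
        self.handles[doi] = data_url
        urn = f"URN:TIB:{doi}"
        self.urns[urn] = doi
        return urn
```

In the real system these three steps touch separate services, so an automatic process would also need to handle a failure partway through; the sketch ignores that.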
Outlook
2004
• We expect about 10,000 datasets by the end of the year.
2005
• The system is to be extended to other scientific fields.
2006
• The TIB Hannover is to become the central registration agency for scientific primary data.
Further information
Project webpage:
http://www.std-doi.de
TIB Handle Server:
http://doi.tib-hannover.de:8000
DOI Foundation:
http://www.doi.org
URN registration of the DDB:
http://www.persistent-identifier.de