
Preservation, access and re-use of research data
A Publisher’s perspective……and how we can help
Joep Verheggen, Elsevier
PARSE.insight workshop, Darmstadt, 22 September 2009
Context
“…… increased availability of primary sources of data in digital form has the potential to
shift the balance away from research based on secondary sources such as publications,
thus positioning data as the central element in the scientific process.” (a statement from the
Director of the Directorate General for Information Society and Media of the European Commission, 2008)
“If the raw data doesn’t form a central part of the scientific record then we perhaps need to
start asking whether the usefulness of that record in its current form is starting to run out.”
(from a blog called Science in the Open: http://blog.openwetware.org/scienceintheopen/2008/05/16/avoidthe-pain-and-embarassment-make-all-the-raw-data-available/)
“..let us get back to the days where observational scientists could justify peer reviewed
publication primarily on the basis of collection, description and reporting of high quality data
sets (usually with some basic level of interpretation)..” Quote taken from a discussion paper called “The
Risk-Reward Basis for Data Publication” (marine sciences, 2007)
“Problem = scientific community does not see online data as ‘publication’” (from a presentation
called: How to motivate scientists to publish data online, Mark J. Costello. June 2008)
What makes researchers decide to publish data? Simply put:
motivation
• reward & recognition
• community culture
• collaboration
• instructions (& mandates)
application
• modelling & simulation
• commercial applications
• long term preservation
• re-evaluation
process
infrastructure
Genetics
Materials sciences
What do scientists want…….
What do Publishers currently do……
Instructions to authors in “Tetrahedron”
Supplementary files are linked directly from an article’s abstract page.
Supplementary files are referenced within the article text and linked via the article’s abstract page using the DOI.
How do Publishers view research data in the context of “IPR”?
The Publishing Industry (STM/ALPSP) position is:
“…..believe that, as a general principle, data sets, raw data outputs of research,
and sets or subsets of that data should wherever possible be made freely accessible
to other scholars” (Statement from STM & ALPSP, June 2006)
It is also stated that:
“….articles published in scholarly journals often include tables and charts in which
certain data points are included or expressed. Journal publishers often do seek
the transfer of or ownership of the publishing rights in such illustrations.., but this
does not amount to a claim to the underlying data itself..”
Research data and the Publisher’s Mission
Publishers are committed to making genuine contributions to the research communities…..
• support to the scholarly communication process
• increased availability of research output
• increased citations to research output
• increased overall quality of research
• develop new means of knowledge discovery
• increase in the research efficiency
Can we meaningfully contribute to an “editorial” process for data?
• Submission processes
• Editorial organization, review
Can we contribute to the data dissemination/retrieval process?
• Storing, linking
• Search, discovery
Can we contribute to research workflows?
• Metadata, collections, ontologies
• Visualization, mining, etc.
Support through the journal networks and publishing platforms
Move from…..
• General instructions to make available
• Available as supplementary information with the online article
• Textual references to data repositories & datasets
• Verbal instructions, limited support by editorial team
To……….
• “More granular” definition of research data and supplementary information
• Specific instructions & mandates how, when and where to submit, and how to cite
• Specific sustainable destinations for research data
• Agreed formats & metadata requirements for data submission
• Expand editorial teams with a “data-editor”
• Hyper-linking between articles and (final) dataset destinations and vice versa
• “Federated searching”
• Intelligent (contextual) referencing of datasets in articles
Note: a successful implementation requires a combination of domain specific and generic solutions
Working examples……..
Vice versa
Digital object identifiers – for data(sets)
A possible solution
• Creation of new and strengthening of existing data centers.
• Global access to data sets and their metadata through existing catalogues.
• By the use of persistent identifiers
This and the following slides are taken from Jan Brase’s presentation
Results
• Citability of primary data
• High visibility of the data
• Easy re-use and verification of the data sets.
• Scientific reputation for the collection and documentation of data (Citation Index)
• Accepting the rules of good scientific practice
• Avoiding duplications
• Motivation for new research

What the DOI System Can Do to Help
Project

• The German Research Foundation (DFG) has started the project Publication and Citation of Scientific Primary Data to increase the accessibility of scientific primary data, starting with the field of earth science.
• The German National Library of Science and Technology (TIB) is now established as a “non-commercial” DOI-registration agency for scientific primary data as a member of the International DOI Foundation (IDF).
Data and article

The DOI system offers an easy way to connect
the article with the underlying data:
The dataset:
G. Yancheva, N. R. Nowaczyk et al (2007)
Rock magnetism and X-ray flourescence spectrometry analyses
on sediment cores of the Lake Huguang Maar, Southeast
China, PANGAEA
doi:10.1594/PANGAEA.587840
Is cited in the article:
G. Yancheva, N. R. Nowaczyk et al (2007)
Influence of the intertropical convergence zone on the East
Asian monsoon
Nature 445, 74-77
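
As a rough illustration of how such a link can be followed in practice, the sketch below turns the dataset DOI quoted on this slide into a web address via the public DOI resolver at https://doi.org/. The function name doi_to_url and the simple validity check are assumptions made for this example only; they are not part of the DOI system or of the original presentation.

```python
from urllib.parse import quote

# Public DOI resolver; a DOI name appended to it redirects to the
# landing page registered for that object.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Hypothetical helper: turn 'doi:10.1594/...' into a resolver URL."""
    cleaned = doi.strip()
    # Drop the optional 'doi:' label used in citations.
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    # Minimal sanity check (an assumption of this sketch, not a full
    # validator): DOI names start with '10.' and contain a '/'.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI name: {doi!r}")
    # Percent-encode anything that is not safe in a URL path.
    return DOI_RESOLVER + quote(cleaned, safe="/")


# The dataset cited on this slide.
print(doi_to_url("doi:10.1594/PANGAEA.587840"))
```

The same call works for any article DOI, which is what allows an article and its dataset to point at each other with identifiers of the same kind.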
“Quasi mash-up” type of interaction
Collaboration is the way to go…….
In conclusion
Do Publishers recognise the importance of “data publishing”?
YES
Can Publishers help to get research data in the open?
YES
Will Publishers help to improve the discoverability of data?
YES
…..and YES:
• Solutions must be scalable & sustainable
• Existing capabilities should be used as much as possible
• We need to secure buy-in from the researchers and research communities as well as the policy makers
And finally
Thank you for your attention……..