Publishing Data

Catherine Jones
Library Systems Development Manager, STFC
Rutherford Appleton Laboratory
CLADDIER workshop, Chilworth, Southampton, UK
15th May 2007
Contents
• Set the scene
• Definition of publication
• Complexities
• Making data permanently available
• Quality control
• User requirements
• Issues
Microsoft’s Towards 2020 Science Report
Modern scientific communication relies on
both journals and databases. At present
these are not integrated.
By 2020, mutual linking will be
commonplace, and publications containing
only peer-reviewed data will
become available.
http://research.microsoft.com/towards2020science/downloads.htm
Publication concept
In this context “publication” is defined as
the process through which data is fixed
and made retrievable over the long term,
and may imply that there has been some
quality control process.
Complexities of Data
These all show the same data at different
levels of processing.
Making data permanently available
Three areas:
1. Defining what is to be kept: encapsulation
2. Ensuring that it is described effectively: metadata
3. Identifying who is responsible for managing the data: trusted repository
Encapsulation
A method of identifying a fixed collection
of meaningful data so that it can be
preserved as a clearly defined unchanging
entity.
Complicating cases:
• Datasets that are still growing
• Versions of datasets
• Format translations
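To make the idea of a fixed, unchanging entity concrete, the sketch below fingerprints a collection of files with SHA-256 checksums, so that any later change (new records in a growing dataset, a new version, a format translation) shows up as a different fingerprint. This is only one possible approach, assuming the data is stored as files under a single directory; the function names are hypothetical and not part of any data centre's tooling.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its SHA-256 digest."""
    manifest = {}
    # Sorting gives a deterministic order regardless of filesystem listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def manifest_id(manifest: dict[str, str]) -> str:
    """Derive a single fingerprint for the whole collection from its sorted entries."""
    h = hashlib.sha256()
    for name, digest in sorted(manifest.items()):
        h.update(f"{name}\t{digest}\n".encode("utf-8"))
    return h.hexdigest()


def changed_files(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Files added, removed or modified since the old manifest was fixed."""
    return {name for name in old.keys() | new.keys() if old.get(name) != new.get(name)}
```

Publishing the manifest and its fingerprint alongside the data would let a citation point at exactly the encapsulated collection; a new version would simply receive a new manifest.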
Metadata
Metadata needs to be created to ensure that
the data is usable now and over the long
term.
Semantic encapsulation is important, as
it is likely to be used in citation.
Trusted repository
To ensure that the data is available over
the long term, the Data Centre needs to
be on a secure footing and well managed.
Quality Control
Usability of the dataset. This is one of
the roles of the Data Centres.
Usefulness of the dataset. This is the
role of domain experts.
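The data-centre side of quality control (usability) lends itself to automated checks, whereas usefulness still needs domain experts. A minimal sketch, assuming usability is judged partly by whether a set of required metadata fields is filled in; the field list `REQUIRED_FIELDS` and the function names are hypothetical, not a standard.

```python
from dataclasses import dataclass, field

# Hypothetical minimum metadata a data centre might require before accepting a dataset.
REQUIRED_FIELDS = ("title", "creator", "publisher", "publication_year", "identifier", "format")


@dataclass
class UsabilityReport:
    missing: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.missing


def check_usability(metadata: dict[str, object]) -> UsabilityReport:
    """Flag required metadata fields that are absent, None, or blank strings.

    This covers only the data-centre side (is the dataset usable?);
    whether the data is scientifically useful is left to domain experts.
    """
    report = UsabilityReport()
    for name in REQUIRED_FIELDS:
        value = metadata.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            report.missing.append(name)
    return report
```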
User requirements for citation
1. Need for an unambiguous reference to a well-defined, permanent entity
2. This reference/citation needs to be understandable to humans
3. Author and publication year, or their equivalents, are important
4. In this area, an unambiguous data reference includes the activity or tool that produced the data
5. The source of the data (i.e. the repository) may be as important as the producer and needs to be unambiguous
Requirements from data producers
1. Traceable to the data provider/producer
2. Usable for usage metrics
3. Recognised as intellectually equivalent to academic papers
4. Usable for searching for papers that cite the data
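If data citations carry unambiguous identifiers, both usage metrics and searches for papers citing a dataset reduce to simple indexing. The sketch below assumes the citation information is already available as a mapping from each paper to the dataset identifiers it cites; the identifiers and function names are illustrative only.

```python
from collections import Counter, defaultdict


def citation_counts(papers: dict[str, list[str]]) -> Counter:
    """Count how many distinct papers cite each dataset identifier."""
    counts = Counter()
    for cited in papers.values():
        counts.update(set(cited))  # a paper citing the same dataset twice counts once
    return counts


def papers_citing(papers: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert paper -> datasets into dataset -> sorted list of citing papers."""
    index = defaultdict(set)
    for paper, cited in papers.items():
        for dataset_id in cited:
            index[dataset_id].add(paper)
    return {dataset_id: sorted(citing) for dataset_id, citing in index.items()}
```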
Citation format
Author, title, [medium], publisher,
publication date, identifier, feature,
[access date, available at]
Natural Environment Research Council, Mesosphere-Stratosphere-Troposphere Radar Facility at Aberystwyth,
[Internet]. British Atmospheric Data Centre (BADC), 1990- urn:badc.nerc.ac.uk/data/mst/v3/upd15032006, feature
200409031205 [http://featuretype.registry/VerticalProfile]
[cited 2006 Apr 25. Available from
http://badc.nerc.ac.uk/data/mst.]
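Once each field is recorded separately, the template can be applied mechanically. This sketch assumes a simple record type (`DataCitation`, hypothetical) and joins the fields with commas in the template's order; its punctuation follows the template rather than reproducing the slide's example character for character (for instance, it emits `[Internet],` where the example has `[Internet].`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    author: str
    title: str
    publisher: str
    publication_date: str
    identifier: str
    medium: Optional[str] = None        # e.g. "Internet"
    feature: Optional[str] = None       # a specific record within the dataset
    feature_type: Optional[str] = None  # URI describing the kind of feature
    access_date: Optional[str] = None
    available_at: Optional[str] = None


def format_citation(c: DataCitation) -> str:
    """Author, title, [medium], publisher, publication date, identifier,
    feature, [access date, available at]; optional parts are skipped if unset."""
    parts = [c.author, c.title]
    if c.medium:
        parts.append(f"[{c.medium}]")
    parts += [c.publisher, c.publication_date, c.identifier]
    if c.feature:
        feature = f"feature {c.feature}"
        if c.feature_type:
            feature += f" [{c.feature_type}]"
        parts.append(feature)
    text = ", ".join(parts)
    # The access clause is only added when both the date and the location are known.
    if c.access_date and c.available_at:
        text += f" [cited {c.access_date}. Available from {c.available_at}]"
    return text
```

Keeping the fields separate, rather than storing a preformatted string, also makes it easier to meet the requirement that a citation be both human-readable and machine-searchable.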
Issues for consideration
• The ability to cite data is strongly linked to how the data is defined.
• Dynamic datasets pose additional problems for long-term accessibility.
• Versioning of both the data and the processing/analysis software is a major issue to resolve.
• Peer review of the data is important.
• Where a facility provides data from a set of instruments, deciding how to identify datasets is complex.
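One possible way to make a dynamic dataset citable is to publish it as a series of numbered, immutable snapshots and to record the processing software version with each one. The sketch below illustrates that approach under simple assumptions (in-memory records, integer version numbers, and a hypothetical `dataset_id/vN` reference form); it does not describe how any particular data centre handles versioning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    version: int
    records: tuple          # frozen copy of all data released so far
    software_version: str   # processing software that produced these records


class GrowingDataset:
    """A dataset that keeps growing, published as numbered, immutable snapshots."""

    def __init__(self, dataset_id: str):
        self.dataset_id = dataset_id
        self._pending: list = []
        self._snapshots: list[Snapshot] = []

    def append(self, record) -> None:
        self._pending.append(record)

    def release(self, software_version: str) -> str:
        """Freeze everything received so far and return a citable reference."""
        previous = self._snapshots[-1].records if self._snapshots else ()
        snap = Snapshot(len(self._snapshots) + 1,
                        previous + tuple(self._pending),
                        software_version)
        self._snapshots.append(snap)
        self._pending.clear()
        return f"{self.dataset_id}/v{snap.version}"

    def resolve(self, version: int) -> Snapshot:
        """Return the frozen snapshot for a version number (numbering starts at 1)."""
        if not 1 <= version <= len(self._snapshots):
            raise KeyError(f"{self.dataset_id} has no version {version}")
        return self._snapshots[version - 1]
```

Because earlier snapshots are never modified, a citation to `v1` keeps resolving to the same records even after the dataset has grown.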