Data lifecycle for STFC facilities

Scientists are Sensitive too: Some Issues in Research ethics arising from Data Sharing

Brian Matthews

Scientific Information Group, Scientific Computing Department, STFC Rutherford Appleton Laboratory

What’s this talk about?

• Cultural barriers to sharing data
  – Ethics of unrestricted access
• How this affects what we do
  – Data Policy
  – Implementation of Data Publication
• Stimulate discussion

“We must give taxpayers more bang for their buck. Open access to papers and data will speed up important breakthroughs by our researchers and businesses, boosting knowledge and competitiveness in Europe.”

Máire Geoghegan-Quinn, European Commissioner for Research, Innovation and Science, July 2012
http://europa.eu/rapid/press-release_IP-12-790_en.htm?locale=en

Opportunities for Data Exchange (ODE)

• EC FP7 Project: 2010-12
• Workshops and interviews
  – Conceptual model
  – Drivers, barriers, enablers to data sharing

R. Darby, S. Lambert, B. Matthews, M. Wilson, K. Gitmans, S. Dallmeier-Tiessen, S. Mele, J. Suhonen. Enabling Scientific Data Sharing and Re-use. IEEE Conf. on E-Science, Chicago, Oct 2012.

http://www.alliancepermanentaccess.org/index.php/community/current-projects/ode/

Drivers for Data Sharing

• Societal benefits
  – Economic/commercial benefits
  – Better quality decision making in government and commerce
• Academic benefits
  – The integrity of science as an activity is increased
• Research benefits
  – For the data contributor:
    • Validation of scientific results by other scientists
    • Recognition of their contribution
  – For the data user:
    • Re-use of data in new analysis
    • Re-use of data in metastudies
    • Re-use of data in interdisciplinary studies
• Organisational benefits
  – Producer Organisation: enhances organisational profile
  – Publisher Organisation: adds value to the product
  – Infrastructure Organisation: reputation as "data holder with expert support" is increased
  – Consumer Organisation: can use data to make policy decisions

Cultural Barriers to Data Sharing

• Publisher practices:
  – Journal articles do not describe available data as a publication
  – Data not recognised as a citable publication
  – Lack of data reviewers to assess data quality
• Research assessment
  – Publication and citation of data not tracked
  – Not counted as part of performance evaluation for careers
• Academic defensiveness
  – Fear that others will benefit from their data and gain priority for results
  – Fear that their results will not be validated
  – Fear that misuse of data will harm the data contributor
  – Fear that data will be used to support arguments the data contributor disagrees with
• Personal data confidentiality
  – Anonymity of subjects, in medical and social science in particular
  – Perceived conflicts between data protection and FOI

Thus unrestricted data access has ethical implications

“By confusing the allocation of scientific merit and potentially undermining authorship conventions, data sharing could work against individual scientists' need for recognition”

Gerrit Hirschfeld, Open science: Data sharing is harder to reward. Correspondence, Nature 487, 302 (19 July 2012). doi:10.1038/487302c

RCUK Principles on Data Policy

Common Principles

1. Public good
2. Preservation
3. Discoverability
4. Confidentiality
5. First use
6. Recognition
7. Costs

A tension between these principles

Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.

RCUK recognises that there are legal, ethical and commercial constraints on release of research data. To ensure that the research process is not damaged by inappropriate release of data, research organisation policies and practices should ensure that these are considered at all stages in the research process.

http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx

So how do we implement a data management policy and infrastructure which acknowledges and manages these tensions?

The Science We Do: Large Scale Facilities

Visit facility on research campus; place sample in beam; diffraction pattern from sample

• ~30,000 user visitors each year in Europe:
  – physics, chemistry, biology, medicine, energy, environmental, materials, culture, pharmaceuticals, petrochemicals, microelectronics
• Billions of € of investment
  – c. £400M for DLS + running costs
• Over 5,000 high impact publications per year in Europe

Data management infrastructure

– Capture, Process
– Store
– Catalogue
– Link to publications

Common infrastructure in Europe

[Figure panels: Fitting experimental data to model; Structure of cholesterol in crude oil; Longitudinal strain in aircraft wing; Bioactive glass for bone growth; Hydrogen storage for zero emission vehicles; Magnetic moments in electronic storage]

A Facility Data Repository

• A bit like a University research repository
  – Data generated from within our institution
  – by nature collaborative with other institutions
• A bit like a Subject Repository
  – Data collected via “Neutron or Synchrotron” science
  – not one discipline, nor the whole of any discipline
  – Do not have a mandate to aggregate and disseminate data within the discipline
• A bit like a Memory institution
  – Public funding to collect, preserve and make available data
  – No obligation to deposit/mandate to archive

Supporting our user community

ISIS Data Policy

• ISIS Neutron Source Data Policy
  http://www.isis.stfc.ac.uk/user-office/data-policy11204.html
  – Consultation with science user community
• Influencing Diamond Synchrotron policy
• Now influencing policy across Europe
  – PaNData data policy framework
  – For similar facilities
  http://wiki.pan-data.eu/imagesGHD/0/08/PaN-data-D2-1.pdf

Some policy details

3.1.1 All raw data and the associated metadata obtained as a result of free (non commercial) access to ISIS, reside in the public domain, with ISIS acting as the custodian.

3.3.2 Access to the on-line catalogue will be restricted to those who register with STFC/ISIS as users of the on-line catalogue.

3.3.3 Access to raw data and the associated metadata obtained from an experiment is restricted to the experimental team for a period of three years after the end of the experiment. Thereafter, it will become publicly accessible. Any PI that wishes their data to remain ‘restricted access’ for a longer period will be required to make a special case to the Director of ISIS.

3.3.6 The on-line catalogue will enable the linking of experimental data to experimental proposals. Access to proposals will only ever be provided to the experimental team and appropriate STFC staff, unless otherwise authorized by the PI.

4.1.1 Ownership of all results derived from the analysis of the raw data is determined by the contractual obligations of the person(s) performing the analysis.

5.4 PIs and researchers who carry out analyses of raw data and metadata are encouraged to link the results of these analyses with the raw data / metadata using the facilities provided by the on-line catalogue. Furthermore, they are encouraged to make such results publicly accessible.
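The embargo rule in clause 3.3.3 can be sketched as a simple access check. This is only an illustration, not how the ISIS catalogue is implemented: the `Experiment` class, the `extended_until` field (standing in for an extension granted by the Director) and the choice to treat the third anniversary as the first day of public access are all assumptions. The registration requirement of clause 3.3.2 is left out.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set

EMBARGO_YEARS = 3  # clause 3.3.3: three years after the end of the experiment


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; 29 Feb falls back to 28 Feb."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


@dataclass
class Experiment:
    """Hypothetical record of one experiment (not the real catalogue schema)."""
    end_date: date
    team: Set[str] = field(default_factory=set)
    # Assumption: a later release date agreed with the Director of ISIS.
    extended_until: Optional[date] = None

    def release_date(self) -> date:
        """First day on which the raw data is treated as publicly accessible."""
        default = add_years(self.end_date, EMBARGO_YEARS)
        if self.extended_until is not None and self.extended_until > default:
            return self.extended_until
        return default


def can_access_raw_data(exp: Experiment, user: Optional[str], today: date) -> bool:
    """The experimental team always has access; others only after release."""
    if user is not None and user in exp.team:
        return True
    return today >= exp.release_date()
```

For an experiment ending on 1 March 2011 with no extension, this sketch gives a release date of 1 March 2014; before that date only members of `team` pass the check.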

TopCat

Data Publication using DOIs

• Use DataCite service to:
  – mint, sustain, search and discover digital object identifiers (DOI)
• DOIs are issued per experiment
  – experiments collecting raw data
  – Easy for facility
  – May also want finer granularity (datasets, datafiles)
• Makes data a research publication
  – Data is citable like a journal article
  – Can be accessed and quality assured
  – Bibliometric services count data citation frequency for impact, e.g. REF submission
• Experimenters can gain credit for collecting data

DOI Data Access Process

Easton, S; Barnes, C H W; Ionescu, A (2011): RB820232: Magnetic moment of EuO in spin filtering magnetic tunnel structures; STFC ISIS Facility. doi:10.5286/ISIS.E.24066298
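A citation of this shape can be assembled from the four fields DataCite asks for plus the DOI. The sketch below is illustrative only: the function names are hypothetical, the punctuation approximates the example above rather than any official citation style, and the only external fact relied on is that a DOI resolves through the public `https://doi.org/` resolver.

```python
from typing import Sequence

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def format_data_citation(
    authors: Sequence[str], year: int, title: str, publisher: str, doi: str
) -> str:
    """Build 'authors (year): title; publisher. doi:...' from its parts."""
    author_part = "; ".join(authors)
    return f"{author_part} ({year}): {title}; {publisher}. doi:{doi}"


def doi_url(doi: str) -> str:
    """Return the resolver URL that leads to the data landing page."""
    return DOI_RESOLVER + doi


citation = format_data_citation(
    authors=["Easton, S", "Barnes, C H W", "Ionescu, A"],
    year=2011,
    title="RB820232: Magnetic moment of EuO in spin filtering magnetic tunnel structures",
    publisher="STFC ISIS Facility",
    doi="10.5286/ISIS.E.24066298",
)
print(citation)
print(doi_url("10.5286/ISIS.E.24066298"))
```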

Issues of using DOIs

• The DOI is issued when the experimental time is allocated.
  – we want to identify the experiment
  – encourage use of the DOI itself
• DataCite require authors, title, date, publisher to be entered when the DOI is issued.
  – But this can give information away too early
    • before expiration of embargo period
    • before publication of results
    • before derivation of results
  – Even releasing metadata could be unethical (or at least break our policy)
• Should the DOI be issued later so that this information is only made public later?
  – When data is collected?
  – When data is released?
• The minimum metadata are available from the registry when data are collected:
  – Update the metadata when the data is released.
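The two-stage approach in the last bullet (minimal metadata first, full metadata on release) can be sketched as a function that decides what the registry exposes at each stage. Everything here is an assumption for illustration: the `ExperimentRecord` class, the placeholder text, and the choice of which fields to withhold. It makes no calls to the DataCite service.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical placeholder shown while the embargo is in force.
WITHHELD = "(withheld until data release)"


@dataclass
class ExperimentRecord:
    """Fields entered for a DOI: authors, title, date, publisher."""
    doi: str
    authors: List[str]
    title: str
    year: int
    publisher: str


def public_metadata(rec: ExperimentRecord, released: bool) -> Dict[str, object]:
    """Metadata made public before and after the data is released."""
    if released:
        # After release: update the registry entry with the full record.
        return {
            "doi": rec.doi,
            "authors": list(rec.authors),
            "title": rec.title,
            "year": rec.year,
            "publisher": rec.publisher,
        }
    # Before release: the DOI identifies the experiment, but the fields
    # that could reveal the research (who, and on what) are masked.
    return {
        "doi": rec.doi,
        "authors": [WITHHELD],
        "title": WITHHELD,
        "year": rec.year,
        "publisher": rec.publisher,
    }
```

Which fields count as "minimum" is a policy decision; here authors and title are masked on the assumption that they are the fields most likely to give information away too early.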

Summary

• Open data is a public policy goal
  – But need to bring along the research community
  – Ethical implications of unrestricted release
  – Data release should not damage the research process
• Data publication
  – Provide data publication and citation mechanisms
• Data embargos
  – Exclusive access to research data, as a data collector
  – Embargo the metadata too
  – A crude one size fits all mechanism
• Recognition and rewards for data publication
  – A common system of credit and recognition for data production and sharing is needed
  – Provide researchers with clear instructions on how to cite data
  – Include data publication & citation metrics in researcher appraisal

Thank You Questions?

www.stfc.ac.uk/SCD