Data lifecycle for STFC facilities
Scientists are Sensitive too: Some Issues in Research ethics arising from Data Sharing
Brian Matthews
Scientific Information Group, Scientific Computing Department, STFC Rutherford Appleton Laboratory
[email protected]
What’s this talk about?
• Cultural barriers to sharing data
  – Ethics of unrestricted access
• How this affects what we do
  – Data Policy
  – Implementation of Data Publication
• Stimulate discussion
“We must give taxpayers more bang for their buck. Open access to papers and data will speed up important breakthroughs by our researchers and businesses, boosting knowledge and competitiveness in Europe.”
Máire Geoghegan-Quinn, European Commissioner for Research, Innovation and Science, July 2012
http://europa.eu/rapid/press-release_IP-12-790_en.htm?locale=en
Opportunities for Data Exchange (ODE)
• EC FP7 Project: 2010-12
• Workshops and interviews
  – Conceptual model
  – Drivers, barriers, enablers to data sharing
R. Darby, S. Lambert, B. Matthews, M. Wilson, K. Gitmans, S. Dallmeier-Tiessen, S. Mele, J. Suhonen. Enabling Scientific Data Sharing and Re-use. IEEE Conf. on E-Science, Chicago, Oct 2012.
http://www.alliancepermanentaccess.org/index.php/community/current-projects/ode/
Drivers for Data Sharing
• Societal benefits
  – Economic/commercial benefits
  – Better quality decision making in government and commerce
• Academic benefits
  – The integrity of science as an activity is increased
• Research benefits
  – For the data contributor:
    • Validation of scientific results by other scientists
    • Recognition of their contribution
  – For the data user:
    • Re-use of data in new analysis
    • Re-use of data in metastudies
    • Re-use of data in interdisciplinary studies
• Organisational benefits
  – Producer Organisation: enhances organisational profile
  – Publisher Organisation: adds value to the product
  – Infrastructure Organisation: reputation as "data holder with expert support" is increased
  – Consumer Organisation: can use data to make policy decisions
Cultural Barriers to Data Sharing
• Publisher practices
  – Journal articles do not describe available data as a publication
  – Data not recognised as a citable publication
  – Lack of data reviewers to assess data quality
• Research assessment
  – Publication and citation of data not tracked
  – Not counted as part of performance evaluation for careers
• Academic defensiveness
  – Fear that others will benefit from their data and gain priority for results
  – Fear that their results will not be validated
  – Fear that misuse of data will harm the data contributor
  – Fear that data will be used to support arguments the data contributor disagrees with
• Personal data confidentiality
  – Anonymity of subjects in medical and social science in particular
  – Perceived conflicts between data protection and FOI
Thus unrestricted data access has ethical implications
“By confusing the allocation of scientific merit and potentially undermining authorship conventions, data sharing could work against individual scientists' need for recognition”
Gerrit Hirschfeld, "Open science: Data sharing is harder to reward", Correspondence, Nature 487, 302 (19 July 2012). doi:10.1038/487302c
RCUK Principles on Data Policy
Common Principles
1. Public good
2. Preservation
3. Discoverability
4. Confidentiality
5. First use
6. Recognition
7. Costs
A tension between these principles
Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.
RCUK recognises that there are legal, ethical and commercial constraints on release of research data. To ensure that the research process is not damaged by inappropriate release of data, research organisation policies and practices should ensure that these are considered at all stages in the research process.
http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
So how do we implement a data management policy and infrastructure which acknowledges and manages these tensions?
The Science We Do: Large Scale Facilities
Visit facility on research campus; place sample in beam; diffraction pattern from sample
• ~30,000 user visitors each year in Europe:
  – physics, chemistry, biology, medicine, energy, environmental, materials, culture, pharmaceuticals, petrochemicals, microelectronics
• Billions of € of investment
  – c. £400M for DLS + running costs
• Over 5,000 high impact publications per year in Europe
Data management infrastructure
– Capture, process, store, catalogue, link to publications
– Common infrastructure in Europe
Example science (figure captions): fitting experimental data to model; structure of cholesterol in crude oil; longitudinal strain in aircraft wing; bioactive glass for bone growth; hydrogen storage for zero emission vehicles; magnetic moments in electronic storage
A Facility Data Repository
• A bit like a University research repository
  – Data generated from within our institution
  – By nature collaborative with other institutions
• A bit like a Subject Repository
  – Data collected via “Neutron or Synchrotron” science
  – Not one discipline, nor the whole of any discipline
  – Do not have a mandate to aggregate and disseminate data within the discipline
• A bit like a Memory institution
  – Public funding to collect, preserve and make available data
  – No obligation to deposit/mandate to archive
Supporting our user community
ISIS Data Policy
• ISIS Neutron Source Data Policy
  http://www.isis.stfc.ac.uk/user-office/data-policy11204.html
  – Consultation with science user community
• Influencing Diamond Synchrotron policy
• Now influencing policy across Europe
  – PaNData data policy framework
  – For similar facilities
  http://wiki.pan-data.eu/imagesGHD/0/08/PaN-data-D2-1.pdf
Some policy details
3.1.1 All raw data and the associated metadata obtained as a result of free (non commercial) access to ISIS, reside in the public domain, with ISIS acting as the custodian.
3.3.2 Access to the on-line catalogue will be restricted to those who register with STFC/ISIS as users of the on-line catalogue.
3.3.3 Access to raw data and the associated metadata obtained from an experiment is restricted to the experimental team for a period of three years after the end of the experiment. Thereafter, it will become publicly accessible. Any PI that wishes their data to remain ‘restricted access’ for a longer period will be required to make a special case to the Director of ISIS.
3.3.6 The on-line catalogue will enable the linking of experimental data to experimental proposals. Access to proposals will only ever be provided to the experimental team and appropriate STFC staff, unless otherwise authorized by the PI.
4.1.1 Ownership of all results derived from the analysis of the raw data is determined by the contractual obligations of the person(s) performing the analysis.
5.4 PIs and researchers who carry out analyses of raw data and metadata are encouraged to link the results of these analyses with the raw data / metadata using the facilities provided by the on-line catalogue. Furthermore, they are encouraged to make such results publicly accessible.
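As a rough illustration of the embargo rule in clause 3.3.3, the access decision might be sketched as below. This is not the ISIS catalogue's implementation: the function names, the `extended_until` parameter (standing in for an extension granted by the Director) and the choice to open the data on the third anniversary itself are all assumptions, and the registration requirement of clause 3.3.2 is left out for brevity.

```python
from datetime import date
from typing import Optional, Set

EMBARGO_YEARS = 3  # period stated in clause 3.3.3


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 Feb to 28 Feb when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def release_date(experiment_end: date,
                 extended_until: Optional[date] = None) -> date:
    """Date on which raw data and metadata become publicly accessible.

    `extended_until` is a hypothetical field for a longer embargo
    granted on request; it can only lengthen the default period.
    """
    default = add_years(experiment_end, EMBARGO_YEARS)
    if extended_until is not None and extended_until > default:
        return extended_until
    return default


def can_access_raw_data(user: str, team: Set[str], experiment_end: date,
                        today: date,
                        extended_until: Optional[date] = None) -> bool:
    """Team members always have access; others only after the embargo."""
    if user in team:
        return True
    # Assumption: data opens on the release date itself, not the day after.
    return today >= release_date(experiment_end, extended_until)
```

For an experiment ending on 1 May 2010, a user outside the team would be refused on 30 April 2013 and admitted from 1 May 2013, unless an extension had been granted.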
TopCat
Data Publication using DOIs
• Use DataCite service to:
  – mint, sustain, search and discover digital object identifiers (DOI)
• DOIs are issued per experiment
  – Experiments collecting raw data
  – Easy for facility
  – May also want finer granularity (datasets, datafiles)
• Makes data a research publication
  – Data is citable like a journal article
  – Can be accessed and quality assured
  – Bibliometric services count data citation frequency for impact, e.g. REF submission
• Experimenters can gain credit for collecting data
DOI Data Access Process
Easton, S; Barnes, C H W; Ionescu, A (2011): RB820232: Magnetic moment of EuO in spin filtering magnetic tunnel structures; STFC ISIS Facility. doi:10.5286/ISIS.E.24066298
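To show how a citation of this shape could be assembled from catalogue fields, here is a minimal sketch. The `ExperimentRecord` class and its field names are hypothetical, and the layout is simply inferred from the one example citation above rather than taken from any published STFC or DataCite style guide. The `https://doi.org/` prefix is the standard DOI resolver.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExperimentRecord:
    """Hypothetical catalogue record for one experiment."""
    authors: List[str]
    year: int
    experiment_id: str  # the identifier that prefixes the title, e.g. "RB820232"
    title: str
    publisher: str
    doi: str


def format_citation(rec: ExperimentRecord) -> str:
    """Build a data citation in the layout of the example on the slide."""
    authors = "; ".join(rec.authors)
    return (f"{authors} ({rec.year}): {rec.experiment_id}: {rec.title}; "
            f"{rec.publisher}. doi:{rec.doi}")


def doi_url(doi: str) -> str:
    """Resolvable link for a DOI via the standard doi.org resolver."""
    return f"https://doi.org/{doi}"


# Values copied from the example citation above.
example = ExperimentRecord(
    authors=["Easton, S", "Barnes, C H W", "Ionescu, A"],
    year=2011,
    experiment_id="RB820232",
    title="Magnetic moment of EuO in spin filtering magnetic tunnel structures",
    publisher="STFC ISIS Facility",
    doi="10.5286/ISIS.E.24066298",
)

print(format_citation(example))
print(doi_url(example.doi))
```

Keeping the citation as a pure function of the record means the same fields can feed both the human-readable citation and whatever metadata is sent to the DOI registry.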
Issues of using DOIs
• The DOI is issued when the experimental time is allocated.
  – We want to identify the experiment
  – Encourage use of the DOI itself
• DataCite require authors, title, date, publisher to be entered when the DOI is issued.
  – But this can give information away too early
    • before expiration of embargo period
    • before publication of results
    • before derivation of results
  – Even releasing metadata could be unethical (or at least break our policy)
• Should the DOI be issued later so that this information is only made public later?
  – When data is collected?
  – When data is released?
• The minimum metadata are available from the registry when data are collected:
  – Update the metadata when the data is released.
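One way to picture the last option (minimal metadata first, full metadata on release) is the sketch below. It makes no calls to DataCite and does not reflect its API. Which fields count as "minimum" is not specified on the slide, so withholding authors and title while exposing date and publisher is purely an assumption, as are the class, function and placeholder names.

```python
from dataclasses import dataclass
from typing import Dict, List

# The four fields the slide says must be supplied when a DOI is issued.
REQUIRED_FIELDS = ("authors", "title", "date", "publisher")

WITHHELD = "(withheld until release)"  # hypothetical placeholder text


@dataclass
class ExperimentMetadata:
    doi: str
    authors: List[str]
    title: str
    date: str
    publisher: str


def registry_view(meta: ExperimentMetadata, released: bool) -> Dict[str, object]:
    """Metadata exposed through the DOI registry at a given stage.

    Before release, only fields assumed to be non-sensitive are shown;
    after release, the full record replaces the placeholders.
    """
    view: Dict[str, object] = {
        "doi": meta.doi,
        "date": meta.date,
        "publisher": meta.publisher,
    }
    if released:
        view["authors"] = list(meta.authors)
        view["title"] = meta.title
    else:
        view["authors"] = [WITHHELD]
        view["title"] = WITHHELD
    return view


record = ExperimentMetadata(
    doi="10.5286/ISIS.E.24066298",
    authors=["Easton, S", "Barnes, C H W", "Ionescu, A"],
    title="Magnetic moment of EuO in spin filtering magnetic tunnel structures",
    date="2011",
    publisher="STFC ISIS Facility",
)

during_embargo = registry_view(record, released=False)
after_release = registry_view(record, released=True)
```

Every required field is present in both views, so the DOI can still be minted at allocation time; only the content of the sensitive fields changes when the embargo ends.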
Summary
• Open data is a public policy goal
  – But need to bring along the research community
  – Ethical implications of unrestricted release
  – Data release should not damage the research process
• Data publication
  – Provide data publication and citation mechanisms
• Data embargos
  – Exclusive access to research data, as a data collector
  – Embargo the metadata too
  – A crude one size fits all mechanism
• Recognition and rewards for data publication
  – A common system of credit and recognition for data production and sharing is needed
  – Provide researchers with clear instructions on how to cite data
  – Include data publication & citation metrics in researcher appraisal
Thank You
Questions?
www.stfc.ac.uk/SCD