Mind the Gap: Reflections on Data Policies and Practice Dr Liz Lyon, Director, UKOLN, University of Bath, UK Associate Director, UK Digital Curation Centre JISC/CNI.


Mind the Gap: Reflections on Data Policies and Practice
Dr Liz Lyon, Director, UKOLN, University of Bath, UK
Associate Director, UK Digital Curation Centre
JISC/CNI Conference, Edinburgh, July 2010
UKOLN is supported by:
This work is licensed under a Creative Commons Licence
Attribution-ShareAlike 2.0
www.ukoln.ac.uk
A centre of expertise in digital information management
Overview
• UK Data Policy Context
– Institutions & open science
– Data practice today
• Future landscape
– Scale and complexity
– Open and personal
– Drivers and incentives
• Challenges & Actions
– Planning tools
– Policy Gaps
1. Current Practice
1. Scale, Complexity, Predictive Potential
2. Continuum of Openness
3. Citizen Science
4. Credentials, Incentives, Rewards
5. Institutional Readiness & Response
6. Data Informatics Capacity & Capability
• Open Science at Web-Scale Report
http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/publications.html#november-2009
Scoping study: institution perspective
• Creating & organising data
• Storage and access
• Back-up
• Preservation
• Sharing and re-use
INCREMENTAL Project
http://www.flickr.com/photos/mattimattila/3003324844/
“Departments don’t have guidelines or norms for personal back-up and researcher procedure, knowledge and diligence varies tremendously. Many have experienced moderate to catastrophic data loss”
Incremental Project Report, June 2010
“Data sharing was more readily discussed by early career researchers.”
“While many researchers are positive about sharing data in principle, they are almost universally reluctant in practice. ... using these data to publish results before anyone else is the primary way of gaining prestige in nearly all disciplines.”
INCREMENTAL Project
Heather Piwowar
…but many researchers don’t share…
…and are reluctant to re-use data…
“Interviewees were often unaware of existing guidance, resources ... and policy documents.”
“They found the documents ... to be dense, wordy, theoretical, ambiguous and un-engaging.”
Incremental Project Report, June 2010
“Many people are suspicious of ‘policies’ which sound like hollow mandates, but are receptive to ‘procedures’ or ‘advice’ which may be essentially the same thing, but convey a sense of purpose and assistance rather than requirement.”
The majority of people felt that some form of policy or guidance was needed....
Incremental Project Report, June 2010
2. Future Data Landscape?
Genomics exemplar
$1000 genome in <15 minutes ....by 2013?
...Next next generation technology race to market
Researchers need....
• Large-scale data storage that is:
– Cost-effective (rent on-demand)
– Secure (privacy and IPR)
– Robust and resilient
– Low entry barrier / ease-of-use
– Has data-handling / transfer / analysis capability
• Cloud services?
• “... analyse an entire human genome in a single day sitting with a laptop at your local Starbucks.”
Data storage policy?
The “new” genome informatics ecosystem
The case for cloud computing in genome informatics.
Lincoln D Stein, May 2010
Post-genome decade
Human genomes: >24 published & almost 200 unpublished
They have shared their data….
Share my data
Data sharing policy?
“P4 medicine: Predictive, Personalised, Preventive, Participatory.”
Leroy Hood –
Institute for Systems Biology
Image from Scientific American
...“medicine is going to become an information science”...
P4 medicine
• Each patient’s genome sequenced
• Your genome is basis of your medical record
• New method to anonymise medical records for genomics research at Vanderbilt Univ (April 2010)
• New Predictive models of health and disease
• Personalised treatments focus on Preventative therapies
Genome scale network biology
Genomic data as a commodity
Stephen Friend
• Sage Bionetworks: Integrative genomics
• Open data in the Sage Commons repository
• Human and mouse: clinical and genetics data
• Develop predictive models of disease: liver / breast / colon cancer, diabetes, obesity
• Crowd-sourced effort: global scope
Participatory medicine: share data & empower the patient...
Sage Congress
San Francisco April 2010
• Significant implications for Faculty
• Awareness of wider societal benefits
• University Ethics Committee
“You have zero privacy anyway. Get over it”
Scott McNealy, CEO Sun Microsystems, 1999
Data Ethics & Privacy Policy?
Public participation, citizen science
Results data : validate in professional press
• Faculty attitude & culture
• Professional : amateur
Data policy for public engagement?
Incentives?
Calls for action, new metrics
Complexity: what are we citing?
Attribution granularity
Macro
• Journal
• Article
• Workflow
• Visualisation
• Model
• Data
• Annotation
• Concept
Micro / Nano
Large-scale predictive network models of disease
• Multiple datasets
• Visualise: Cytoscape
• Workflow: Taverna
Data citation policy?
3. Policy guidance, planning tools, Code of Conduct
State-of-the-Art Report: Models & Tools (Alex Ball, June 2010)
• Data Lifecycles
• Data Policies (UK) incl DMP
• Standards & tools
• Data Asset Framework (DAF)
• DANS Seal of Approval
• Preservation metadata
• Archive management tools
• Cost / benefit tools
• Data types, formats, standards, capture
• Ethics and Intellectual Property
• Access, sharing and re-use
• Short-term storage & data management
• Deposit & long-term preservation
• Adherence and review
DMP Online
Currently updating Version 2.0
Version 3.0 summer 2010
http://www.dcc.ac.uk/dmponline
Making DMPs work: the start of a long process…
• Embed DMPs in funder policies & research lifecycles as the norm
• Code of Conduct for Research
• Assess & review DMPs (not just the science content of proposals)
• Educate reviewers (DCC guidance for social science in prep)
• Manage compliance of researchers
• Infrastructure to share DMPs
• Analyse cost-benefits for UK HE
Take homes...
• Practice is disconnected from policy
• Policy Gaps
– Data Storage (& Appraisal: DCC guidance in prep)
– Data Sharing (& Licensing: DCC guidance in prep)
– Ethics and Privacy
– Citizen Science & Public Engagement
– Data Citation and Attribution
• Collaborate with funders to make DMPs work
• Digital Curation Centre DMP tool & resources
www.dcc.ac.uk
Thank you…
Chicago Mart Plaza, 6-8 December 2010