Transcript Document
Data Management
Principles - Planning
UniMelb Cluster - Research Symposium
Lyle Winton
24 Oct 2008
16/07/2015
Who am I?
Dr Lyle Winton
Background:
Researcher/Scientist
Technical Consultant
education and research for gov. infrastructure projects
Software Engineer
experimental high energy physics, distributed systems, Grid
industry, higher education (web development, information systems, enterprise systems)
Currently:
eScholarship Research Centre (eSRC) & Research Computing Services, Information Services
Senior Research Support Officer (eResearch)
provide ICT support for research workers, supply expertise & strategic advice
develop plans for eResearch infrastructure
be active in local & national eResearch co-ordination groups
eScholarship Research Centre
Data Management
What are we doing…
(eSRC & eR – myself, Joanne Evans, Simon Porter, Gavan McCarthy, Leon Sterling)
Policy
Planning
Tools
(focus)
(focus)
Services
Infrastructure
Training
Consultancy
(focus)
(focus)
Nationally…
ANDS Vision: “The development of ANDS is intended to provide the essential meeting place where the Australian path forward for research data management can evolve and where a vision can be achieved.”
Towards an Australian Data Commons, ANDS – Oct 2007
“ – institutions will be expected to have and support data management plans, and any researcher seeking support through a number of government funding agencies will be expected to describe how the data generated through the project will be managed throughout its lifecycle.”
ANDS Interim Business Plan – Sept 2008
“Enabling Components… Data Storage: … This investment will extend to research organisations for the development of institutional nodes of the storage grid, on the condition that the storage is used exclusively for research data; the institutes co-invest in the infrastructure; each institute publishes and adopts a data management plan; and each institute ensures its researchers use and abide by the data management plan.”
Strategic Roadmap for Research Infrastructure, NCRIS – July 2008.
Known problems…
“A mature data stewardship system, interlinking policy and infrastructure could address the needs of researchers and improve the quality and efficiency of Australian innovation and research.”
“The survey found that individual researchers and research groups do not include data management as an element when planning research projects.”
“Grants do not fund the creation of datasets as an end in itself, nor are funds provided explicitly for the management of data.”
“The survey found that research groups and organisations rarely have formal policies for the management of data. They usually have a set of practices that may or may not be adhered to at the project level.”
“Researchers… see research data as belonging to them. … Experienced researchers have been managing data all their careers.”
AERES report – Oct 2006
Some UniMelb goals…
Information Futures Commission
Excerpts from final report…
We will know we're on track if:
“Management and dissemination of research data and digital collections is painless.”
We propose that we will:
“Develop and adopt standards, guidelines and processes for the management, access and preservation of research data”
“Implement a program for targeted curation of collections…”
“Implement a digitisation and profiling strategy for works in collections (including 'born digital')…”
Numerous references to services surrounding data:
“Adequate physical and digital collections support research, learning and teaching, and knowledge transfer … Cataloguing and search tools make it easy to discover, cite and manage information.”
Where are we heading?
Formal Research Data Management Infrastructure/Plans/Policies are emerging!
Globally researchers are beginning to adopt this as good practice
University is moving towards this as standard practice
We need to start implementing and/or improving…
Professional Data/Info Management Practice
ensuring quality research data
enables (appropriate) access
enables reuse of data
Policy, Intellectual Property & Licensing, Contracts, Legislation, Process …
not just paperwork and hurdles
ensuring research has integrity, repeatability
enables (appropriate) access
enables reuse of data
Data Management Plan (DMP)
Why now?
Research Data is increasing in size
Research Collaborations are increasing
Data is increasingly digital
Wonderful opportunities for reuse, sharing, collaboration, analysis
However:
while microfilm and non-acidic paper can last for 100+ years
magnetic media lasts 10+ years
optical media lasts 20+ years
(with proper handling)
2-10% of hard drives fail every year
software & hardware can outdate
And much info is still only hardcopy
Lab books, notes, primary data, samples
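To put the "2-10% of hard drives fail every year" figure in terms of a multi-year retention period, here is a minimal Python sketch. It assumes a constant annual failure rate and independent years, which is a simplification for illustration only (real failure rates vary with drive age and model); `prob_drive_fails` is a hypothetical helper name. Under those assumptions the chance a single drive fails within n years is 1 - (1 - p)^n.

```python
def prob_drive_fails(annual_rate: float, years: int) -> float:
    """Probability a single drive fails at least once within `years`.

    Assumes a constant annual failure rate and independent years
    (a simplification for illustration only).
    """
    survives_all_years = (1.0 - annual_rate) ** years
    return 1.0 - survives_all_years


# Range quoted on the slide: 2-10% of hard drives fail every year.
for rate in (0.02, 0.10):
    print(f"{rate:.0%} per year -> {prob_drive_fails(rate, 5):.1%} within 5 years")
```

Under these assumptions, a 2% annual rate gives roughly a 9.6% chance of losing a single drive within 5 years, and a 10% annual rate gives roughly 41%.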
Burroughs 1977 – B 9495 Magnetic Tape Subsystem
Parts of the elephant…
Researchers & Departments are
at varying levels of maturity
experiencing different pain-points
Infrastructure Providers are
focused on specific problems
experts in different aspects/solutions
getting varying requirements
Framing the elephant…
Training for post-grads
UpSkills eResearch Stream – “Data Management Workshop”
run 3 so far
Influences and References
The University of Melbourne Policy
(Research Office, Records Services)
Australian Code for Responsible Conduct of Research
(NHMRC, ARC, Universities Australia)
OAK Law Project, QUT
Belinda Weaver presentations, UQ
PILIN Project (ANDS/ARROW)
A few examples!
Review of material
By eScholarship Research Centre
By local eResearch social network (eCoffee)
By a small group of department research/IT managers
By School of Graduate Research
Training for post-grads
Workshop Covers:
Components of a “Data Management Plan”
Information Modelling, Good Practice Guidance
Technologies
Resources, References, Examples, Q&A
Feedback has been very positive!!!
Development of a web site (ongoing)
Recommended reading list
A Research DMP Template (ongoing)
Drafting guidelines to support the implementation and compliance (underway)
Future developments:
Training materials for supervisors?
Discussing undergraduate data management training across Uni
Possible DMP registry
Why Manage Research Data
IT IMPROVES YOUR RESEARCH BOTH NOW AND LATER…
Data is often valuable for a long time!!!
Maximise usefulness of data to fellow researchers
Context for the research, how data was collected, quality controls, how people can and should use it (access and licensing), how you then attribute people/projects
can help lead to subsequent research papers
Good Practice Better Research
Results of your research may outlast the project, your degree, your position, your career, your institution
historical value, predictable or unforeseen
DMPs state the parameters within which you MUST do research, then follow them! (being a Professional Researcher)
document for newcomers, your group, project, externals
Ensure research integrity (and repeatability)
through keeping better records
can trace your outcomes right from data collection, through research method, through to results
promotes awareness of responsibilities, policies, ethics, legislation
Why Manage Research Data
IT MAY SAVE WASTED TIME…
You need to properly…
Collect research data
Manage research data
Archive research data
…otherwise there is a risk you cannot use your data, wasting years of effort.
From a study of 500 charges of “research misconduct”, 40% could have been avoided by good data management practice!
“Student submits her PhD thesis for examination then leaves country taking the data with them. An examiner questions the integrity of the research data. A reanalysis of the data and original questionnaire is required.”
“Participant in a research project lodges a claim for compensation, alleging that he was not adequately informed about the effects of the study, does not recall giving consent, and the raw data he provided has become public. Where are the records?”
“Ten years after a patent has been granted a patent infringement action is lodged. The laboratory notebook is required.”
“At completion of a research project the data and records are boxed and stored in a departmental storeroom. Sometime later the researcher needs to access the original records to refute a claim of falsification. He finds that the storeroom has since been converted into a laboratory/coffee-shop/learning-hub.”
Why Manage Research Data
AND YOU NEED TO PLAN AHEAD…
University of Melbourne Policy
research methods and results open to scrutiny
data should be retained in a durable and appropriately referenced form
for at least 5 years from any publication
minimum of 15 years for clinical trials
minimum of 7 years for adult psychological files (for minors 7 years after reaching 18)
or longer if external/funding/regulatory/archival requirements
research units & departments have formally documented procedures for retention
researchers must comply
ensure research data and records are accurate, complete, authentic and reliable
data and records are kept in a form that allows verification and include sufficient detail
(authenticity and validity of conclusions)
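Each retention rule above sets a minimum period, so when several apply the longest one governs. The following is a minimal Python sketch of that logic, not an official University tool: `earliest_disposal_date`, `add_years` and their parameters are hypothetical names, and two assumptions are made that the policy summary here does not state: the 15-year clinical-trial period and the 5-year period are both counted from the last publication, and the psychological-file period is counted from the date the file was closed.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 Feb maps to 28 Feb in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_disposal_date(
    last_publication: date,
    clinical_trial: bool = False,
    psych_file_closed: Optional[date] = None,
    participant_birth: Optional[date] = None,
    external_minimum: Optional[date] = None,
) -> date:
    """Hypothetical helper: earliest date the data may be disposed of.

    Each applicable rule yields a candidate date; the latest candidate wins.
    """
    # At least 5 years from any publication.
    candidates = [add_years(last_publication, 5)]
    if clinical_trial:
        # Assumption: the 15-year clinical-trial period runs from the last publication.
        candidates.append(add_years(last_publication, 15))
    if psych_file_closed is not None:
        # Adult psychological files: 7 years (assumed to run from file closure).
        candidates.append(add_years(psych_file_closed, 7))
        if participant_birth is not None and psych_file_closed < add_years(participant_birth, 18):
            # Participant was a minor: keep until 7 years after reaching 18.
            candidates.append(add_years(participant_birth, 25))
    if external_minimum is not None:
        # Longer external/funding/regulatory/archival requirement, if any.
        candidates.append(external_minimum)
    return max(candidates)
```

For example, data last published on 24 Oct 2008 from a clinical trial would, under these assumptions, be retained until at least 24 Oct 2023. Taking the maximum over candidate dates keeps each rule independent, so adding a new external requirement never shortens an existing one.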
What’s in a DMP?
A Possible Template:
Context (Outline, Pre-planning, Decisions)
Responsibilities (ethics, consent, licensing, legislation, funding requirements, reporting)
Process & Policies
Data Collection and QC Process
Access Policy
Appropriate Use and Access Patterns
Data Maintenance, Persistence and Archival Practice
Decommissioning/Destruction/Sanitisation
Technical Requirements (policy for system developers/implementers/admins)
Current Infrastructure and Requirements
Future Infrastructure Requirements
Interoperability
Data Security
Availability, Reliability, Support and Response
(full template found at http://www.esrc.unimelb.edu.au/dmp )
Why Plan?
Making the most of Infrastructure
ARCS
Data Fabric (NCRIS)
University Infrastructure
National Compute Infrastructure (VLSCI, ANUSF, VPAC)
Advanced Technology (imaging, sequencing, synchrotron)
Why Plan?
Making the most of Research Networks
ANDS
Data Commons
BioGrid Australia
Protein Data Bank
Increasingly you need to ensure
Research integrity, traceability
Data and Result quality
Data reusability
Data security (misuse/damage, unintended/intended)
Communication
2-way Communication is important
Good Practice will emerge from both Research and ICT expertise
National Infrastructure
Administration/ICT and Research Community
Opportunities and Trade-offs
3+-way communication?
Vision: a local community of practice
to provide and review guidelines and policies
to share data management plans
to drive development of shared infrastructure
advocate for and steer national infrastructure
What you can do…
http://www.esrc.unimelb.edu.au/dmp
Provide general feedback
Ask questions, we’ll seek answers
Work with us on guidance & good practice
Encourage students to attend future UpSkills
Talk with your students/group/department about formally documenting a DMP
Feed back your DMP