Data Management
Principles - Planning
UniMelb Cluster - Research Symposium
Lyle Winton
24 Oct 2008
16/07/2015
Who am I?

Dr Lyle Winton

Background:
• Researcher/Scientist
  • experimental high energy physics, distributed systems, Grid
• Technical Consultant
  • education and research for gov. infrastructure projects
• Software Engineer
  • industry, higher education (web development, information systems, enterprise systems)
Currently:
• eScholarship Research Centre (eSRC) & Research Computing Services, Information Services
• Senior Research Support Officer (eResearch)
  • provide ICT support for research workers, supply expertise & strategic advice
  • develop plans for eResearch infrastructure
  • be active in local & national eResearch co-ordination groups
eScholarship Research Centre
Data Management

What are we doing…
(eSRC & eR: myself, Joanne Evans, Simon Porter, Gavan McCarthy, Leon Sterling)
• Policy
• Planning
• Tools
(focus)
(focus)
• Services
• Infrastructure
• Training
• Consultancy
(focus)
(focus)
Nationally…
ANDS Vision: “The development of ANDS is intended to provide the essential meeting place where the
Australian path forward for research data management can evolve and where a vision can be
achieved.”
Towards an Australian Data Commons, ANDS – Oct 2007

“ – institutions will be expected to have and support data management plans, and any researcher
seeking support through a number of government funding agencies will be expected to describe
how the data generated through the project will be managed throughout its lifecycle.”
ANDS Interim Business Plan – Sept 2008

“Enabling Components… Data Storage: … This investment will extend to research organisations for
the development of institutional nodes of the storage grid, on the condition that the storage is used
exclusively for research data; the institutes co-invest in the infrastructure; each institute publishes
and adopts a data management plan; and each institute ensures its researchers use and abide by
the data management plan.”
Strategic Roadmap for Research Infrastructure, NCRIS – July 2008.
Known problems…
“A mature data stewardship system, interlinking policy and infrastructure could address
the needs of researchers and improve the quality and efficiency of Australian
innovation and research.”

“The survey found that individual researchers and research groups do not include data
management as an element when planning research projects.”

“Grants do not fund the creation of datasets as an end in itself, nor are funds provided
explicitly for the management of data.”

“The survey found that research groups and organisations rarely have formal policies for
the management of data. They usually have a set of practices that may or may not be
adhered to at the project level.”

“Researchers… see research data as belonging to them. … Experienced researchers
have been managing data all their careers.”

AERES report – Oct 2006
Some UniMelb goals…

Information Futures Commission

Excerpts from final report…

We will know we're on track if:



“Management and dissemination of research data and digital collections is
painless.”
We propose that we will:

“Develop and adopt standards, guidelines and processes for the
management, access and preservation of research data”

“Implement a program for targeted curation of collections…”

“Implement a digitisation and profiling strategy for works in collections
(including 'born digital')…”
Numerous references to services surrounding data:
“Adequate physical and digital collections support research, learning and
teaching, and knowledge transfer … Cataloguing and search tools make it
easy to discover, cite and manage information.”
Where are we heading?

Formal Research Data Management Infrastructure/Plans/Policies are emerging!
• Globally researchers are beginning to adopt this as good practice
• University is moving towards this as standard practice

We need to start implementing and/or improving…

Professional Data/Info Management Practice
• ensuring quality research data
• enables (appropriate) access
• enables reuse of data

Policy, Intellectual Property & Licensing, Contracts, Legislation, Process …
• not just paperwork and hurdles
• ensuring research has integrity, repeatability
• enables (appropriate) access
• enables reuse of data

Data Management Plan (DMP)
Why now?



• Research Data is increasing in size
• Research Collaborations are increasing
• Data is increasingly digital

Wonderful opportunities for reuse, sharing, collaboration, analysis

However:
• while microfilm and non-acidic paper can last for 100+ years
• magnetic media lasts 10+ years
• optical media lasts 20+ years (with proper handling)
• 2-10% of hard drives fail every year
• software & hardware can become outdated

And much info is still only hardcopy
• Lab books, notes, primary data, samples

[Image: Burroughs 1977 – B 9495 Magnetic Tape Subsystem]
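As a rough illustration of the hard-drive figure on this slide: if a single copy of the data sits on one drive, the yearly failure chance compounds over a retention period. The sketch below assumes a constant, independent annual failure rate, which is a simplification the slide does not make. The function name and the 5- and 15-year horizons are illustrative choices.

```python
def prob_at_least_one_failure(annual_failure_rate: float, years: int) -> float:
    """Chance that a single drive fails at least once within `years`.

    Assumes a constant, independent annual failure rate (a simplification).
    """
    if not 0.0 <= annual_failure_rate <= 1.0:
        raise ValueError("annual_failure_rate must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    # Probability the drive survives every one of the years
    survival = (1.0 - annual_failure_rate) ** years
    return 1.0 - survival


if __name__ == "__main__":
    for rate in (0.02, 0.10):      # the 2-10% range quoted on the slide
        for years in (5, 15):      # illustrative horizons
            p = prob_at_least_one_failure(rate, years)
            print(f"{rate:.0%} per year over {years} years: {p:.0%} chance of failure")
```

Under these assumptions a single unreplicated drive has roughly a 10% to 41% chance of failing within 5 years, which is one argument for planning backups and migration up front.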
Parts of the elephant…

Researchers & Departments
• are at varying levels of maturity
• are experiencing different pain-points

Infrastructure Providers
• are focused on specific problems
• are experts in different aspects/solutions
• are getting varying requirements
Framing the elephant…
Training for post-grads

UpSkills eResearch Stream – “Data Management Workshop”
• run 3 times so far

Influences and References
• The University of Melbourne Policy (Research Office, Records Services)
• Australian Code for the Responsible Conduct of Research (NHMRC, ARC, Universities Australia)
• OAK Law Project, QUT
• Belinda Weaver presentations, UQ
• PILIN Project (ANDS/ARROW)
• A few examples!

Review of material
• By eScholarship Research Centre
• By local eResearch social network (eCoffee)
• By a small group of department research/IT managers
• By School of Graduate Research
Training for post-grads

Workshop Covers:
• Components of a “Data Management Plan”
• Recommended reading list
• Information Modelling, Good Practice Guidance
• Technologies

Feedback has been very positive!!!

Development of a web site (ongoing)
• Resources, References, Examples, Q&A
• A Research DMP Template (ongoing)
• Drafting guidelines to support the implementation and compliance (underway)

Future developments:
• Training materials for supervisors?
• Discussing undergraduate data management training across Uni
• Possible DMP registry
Why Manage Research Data

IT IMPROVES YOUR RESEARCH BOTH NOW AND LATER…

Data is often valuable for a long time!!!
• Results of your research may outlast the project, your degree, your position, your career, your institution
• historical value, predictable or unforeseen

Maximise usefulness of data to fellow researchers
• Context for the research, how data was collected, quality controls, how people can and should use it (access and licensing), how you then attribute people/projects
• can help lead to subsequent research papers

Good Practice → Better Research
• DMPs state the parameters within which you MUST do research, then follow them! (being a Professional Researcher)
• document for newcomers, your group, project, externals

Ensure research integrity (and repeatability)
• through keeping better records
• can trace your outcomes right from data collection, through research method, through to results
• promotes awareness of responsibilities, policies, ethics, legislation
Why Manage Research Data

IT MAY SAVE WASTED TIME…

You need to properly…




Collect research data
Manage research data
Archive research data
…otherwise there is a risk you cannot use your data, wasting years of effort.

From a study of 500 charges of “research misconduct”, 40% could have been avoided by good data
management practice!

“Student submits her PhD thesis for examination then leaves the country, taking the data with her. An
examiner questions the integrity of the research data. A reanalysis of the data and original questionnaire is
required.”

“Participant in a research project lodges a claim for compensation, alleging that he was not adequately
informed about the effects of the study, does not recall giving consent, and the raw data he provided has
become public. Where are the records?”

“Ten years after a patent has been granted a patent infringement action is lodged. The laboratory notebook
is required.”

“At completion of a research project the data and records are boxed and stored in a departmental storeroom.
Sometime later the researcher needs to access the original records to refute a claim of falsification. He finds
that the storeroom has since been converted into a laboratory/coffee-shop/learning-hub.”
Why Manage Research Data

AND YOU NEED TO PLAN AHEAD…

University of Melbourne Policy
• research methods and results open to scrutiny
• data should be retained in a durable and appropriately referenced form
  • for at least 5 years from any publication
  • minimum of 15 years for clinical trials
  • minimum of 7 years for adult psychological files (for minors 7 years after reaching 18)
  • or longer if external/funding/regulatory/archival requirements
• research units & departments have formally documented procedures for retention
• researchers must comply
• ensure research data and records are accurate, complete, authentic and reliable
• data and records formed for verification and include sufficient detail (authenticity and validity of conclusions)
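The retention minimums listed on this slide can be sketched as a small date calculation. This is an illustrative reading of the slide only, not the University's policy text. The category names, the function names, and the choice to count each period in whole calendar years from a caller-supplied start date (for example the publication date) are all assumptions. The "or longer" clause is modelled as an optional extra requirement passed in by the caller.

```python
from datetime import date
from typing import Optional

# Minimum retention in years, as listed on the slide.
# The category names themselves are hypothetical labels.
MIN_YEARS = {"general": 5, "clinical_trial": 15, "psych_adult": 7}


def add_years(d: date, years: int) -> date:
    """Add whole calendar years; 29 Feb falls back to 28 Feb in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_disposal(category: str, start: date,
                      birth_date: Optional[date] = None,
                      external_min_years: int = 0) -> date:
    """Earliest date the data could be disposed of under the slide's minimums.

    `start` is the date the period is counted from (assumed: publication date).
    `external_min_years` models "or longer if external ... requirements".
    """
    if category == "psych_minor":
        if birth_date is None:
            raise ValueError("birth_date is required for psych_minor")
        # 7 years after the participant reaches 18
        policy_end = add_years(birth_date, 18 + 7)
    else:
        policy_end = add_years(start, MIN_YEARS[category])
    # A longer external requirement overrides the policy minimum
    return max(policy_end, add_years(start, external_min_years))
```

For example, `earliest_disposal("general", date(2008, 10, 24))` gives `date(2013, 10, 24)`, and passing `external_min_years=10` pushes that to 2018. A real DMP would record which rule applied and why.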
What’s in a DMP?

A Possible Template:
• Context (Outline, Pre-planning, Decisions)
• Responsibilities (ethics, consent, licensing, legislation, funding requirements, reporting)
• Process & Policies
  • Data Collection and QC Process
  • Access Policy
  • Appropriate Use and Access Patterns
  • Data Maintenance, Persistence and Archival Practice
  • Decommissioning/Destruction/Sanitisation
• Technical Requirements (policy for system developers/implementers/admins)
  • Current Infrastructure and Requirements
  • Future Infrastructure Requirements
  • Interoperability
  • Data Security
  • Availability, Reliability, Support and Response
(full template found at http://www.esrc.unimelb.edu.au/dmp )
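One way to put the template headings to work is as a simple completeness check on a draft plan. The sketch below copies only the headings shown on this slide; the full template at the URL above may differ. Treating the parenthesised items under Context and Responsibilities as sub-headings, the nested-dictionary layout, and the function name are all assumptions made for illustration.

```python
# Headings copied from the slide; the data layout is an illustrative assumption.
DMP_TEMPLATE = {
    "Context": ["Outline", "Pre-planning", "Decisions"],
    "Responsibilities": [
        "Ethics", "Consent", "Licensing", "Legislation",
        "Funding requirements", "Reporting",
    ],
    "Process & Policies": [
        "Data Collection and QC Process",
        "Access Policy",
        "Appropriate Use and Access Patterns",
        "Data Maintenance, Persistence and Archival Practice",
        "Decommissioning/Destruction/Sanitisation",
    ],
    "Technical Requirements": [
        "Current Infrastructure and Requirements",
        "Future Infrastructure Requirements",
        "Interoperability",
        "Data Security",
        "Availability, Reliability, Support and Response",
    ],
}


def missing_sections(plan: dict) -> list:
    """Return 'Section / Subsection' labels that the draft plan leaves empty."""
    missing = []
    for section, subsections in DMP_TEMPLATE.items():
        answers = plan.get(section, {})
        for sub in subsections:
            # Blank or whitespace-only text counts as not filled in
            if not str(answers.get(sub, "")).strip():
                missing.append(f"{section} / {sub}")
    return missing


if __name__ == "__main__":
    draft = {"Context": {"Outline": "Survey of coastal erosion sites"}}
    for label in missing_sections(draft):
        print("TODO:", label)
```

A check like this only shows which headings are still blank; whether the content under each heading is adequate remains a matter for review.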
Why Plan?

Making the most of Infrastructure
• ARCS Data Fabric (NCRIS)
• University Infrastructure
• National Compute Infrastructure (VLSCI, ANUSF, VPAC)
• Advanced Technology (imaging, sequencing, synchrotron)
Why Plan?

Making the most of Research Networks
• ANDS Data Commons
• BioGrid Australia
• Protein Data Bank

Increasingly you need to ensure
• Research integrity, traceability
• Data and Result quality
• Data reusability
• Data security (misuse/damage, unintended/intended)
Communication

2-way Communication is important
• Administration/ICT and Research Community
• Opportunities and Trade-offs

Good Practice will emerge from both Research and ICT expertise

National Infrastructure
• 3+ way communication?

Vision: a local community of practice
• to provide and review guidelines and policies
• to share data management plans
• to drive development of shared infrastructure
• to advocate for and steer national infrastructure
What you can do…







• http://www.esrc.unimelb.edu.au/dmp
• Provide general feedback
• Ask questions, we’ll seek answers
• Work with us on guidance & good practice
• Encourage students to attend future UpSkills
• Talk with your students/group/department about formally documenting a DMP
• Feed back your DMP