DMPTool and Data Management Basics
Hannah Norton
July 29, 2014
Image modified from: http://www.flickr.com/photos/blprnt/3642742876/in/photostream/

Background: the Data Lifecycle
Figure: data lifecycle diagram with the stages Study Concept, Data Management Planning, Data Collection, Data Processing, Data Analysis, Data Archiving, Data Distribution, Data Discovery, and Repurposing
* Based on Data Documentation Initiative (DDI) version 3.0 Combined Life Cycle Model
What is a data management plan (DMP)?
• A clear description of how you plan to address
data management issues in your research.
• A way to communicate your data
management efforts to members of your team
and others (especially funders).
A data management plan gives a concise
description of the who, what, where, and when
of your data throughout its life cycle.
Why do researchers need a Data Management Plan (DMP)?
For all the same reasons you should take care of
your data…
• To ensure that valuable data resources will be
accessible in the future to members of the
research team and the broader community.
• To make life easier – by planning ahead and
documenting data throughout its life cycle,
researchers can save time and focus on research.
• To increase the visibility of research.
• To satisfy funders’ requirements.
Components of a DMP
• Project description
• Data collection:
– Types of data
– Data and metadata standards to be used
• Legal and ethical issues:
– Privacy and confidentiality
– Intellectual property rights
• Policies for data sharing and re-use
• Data preservation (long-term)
• Who is responsible for data management
http://dmptool.org
Log in to DMPTool with GatorLink
Funders with DMPTool Templates
• Alfred P. Sloan Foundation
• Gordon and Betty Moore Foundation
• Gulf of Mexico Research Initiative
• Institute of Education Sciences (US Dept of Education)
• Institute of Museum and Library Services
• Joint Fire Science Program
• National Institutes of Health
• National Endowment for the Humanities – Office of Digital Humanities
• National Science Foundation (General and 11 Directorates)
• U.S. Geological Survey
http://library.ufl.edu/datamgmt
http://guides.uflib.ufl.edu/datamanagement
Sample DMPs from UF
• Example text in the IR@UF:
http://ufdc.ufl.edu/AA00014694/00001/
• Research Computing guidance on Data
Management Plans (includes links to UF
College of Engineering and Department of
Astronomy guides):
http://www.hpc.ufl.edu/research/proposalsupport/data-management-plan/
Components of a DMP
• Project description
• Data collection
• Legal and ethical issues
• Policies for data sharing and re-use
• Data preservation (long-term)
• Who is responsible for data management
Example data collection questions
• What file formats will you use for your data, and why?
What metadata/documentation will be submitted
alongside the data? (NIH)
• Describe the data to be collected (actual observations)
during your research including amount (if known). Name
the type of data, the instrument or collection approach,
and how the data will be sampled. (NSF-BIO)
• Give a short description of the data, including amount
(estimated amount or known amount) and content. Data
types could include XML spreadsheets, interview
transcripts, text files, historical documents, diaries, field
notes, geospatial data, citations, software code, algorithms,
etc. (NEH)
Data generated throughout the lifecycle has different needs
• Raw data - some must be kept forever, others
can be discarded after the project is complete
• Intermediate data for analyzing and
processing - can often be discarded at the
end of the computation, but computational
methods should be kept for reproducibility
• Final data - should be made available
indefinitely to the community
File formats
Formats with the following characteristics are considered
relatively stable and better for long-term preservation:
• open documentation
• support across a range of software platforms
• wide adoption
• no compression (or lossless compression)
• no embedded files or embedded programs/scripts
• non-proprietary format
See the following for preferred and accepted file formats
for the IR@UF: http://ufdc.ufl.edu/AA00017119/00011
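As an illustration only, the six characteristics above can be applied as a simple checklist. This is a minimal Python sketch, assuming a hypothetical FormatProfile record in which you record a yes/no answer for each characteristic yourself; the class, field names, and example values are assumptions for demonstration, not an assessment taken from the IR@UF format list.

```python
from dataclasses import dataclass


@dataclass
class FormatProfile:
    """Hypothetical yes/no answers about one file format (illustrative only)."""
    name: str
    openly_documented: bool
    multi_platform_support: bool
    widely_adopted: bool
    lossless_or_uncompressed: bool
    no_embedded_content: bool
    non_proprietary: bool


# The six preservation-friendly characteristics from the slide, mapped to fields.
CRITERIA = {
    "open documentation": "openly_documented",
    "support across a range of software platforms": "multi_platform_support",
    "wide adoption": "widely_adopted",
    "no compression (or lossless compression)": "lossless_or_uncompressed",
    "no embedded files or embedded programs/scripts": "no_embedded_content",
    "non-proprietary format": "non_proprietary",
}


def unmet_criteria(fmt: FormatProfile) -> list[str]:
    """Return the characteristics this format lacks; an empty list means all six are met."""
    return [label for label, attr in CRITERIA.items() if not getattr(fmt, attr)]


# Example answers are assumptions made for the sake of the demonstration.
csv = FormatProfile("CSV", True, True, True, True, True, True)
vendor = FormatProfile("hypothetical vendor instrument format",
                       False, False, True, True, False, False)

print(unmet_criteria(csv))  # → []
for label in unmet_criteria(vendor):
    print("missing:", label)
```

The checklist only flags concerns; it does not replace the repository's own list of preferred and accepted formats.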
What exactly is metadata again?
• Descriptive information that helps you and
others understand your data
• “Data about data” that acts as a surrogate for
your data when you or others are trying to:
– Find the data later
– Know what the data is later
– Share the data later
Metadata across the disciplines
Basic information to keep:
• Descriptive
– What is it about?
– Title, time, author, keywords
– Relations to other data objects
• Administrative
– Ownership and use permissions
• Provenance
– Where does it come from?
– History of changes to the data, versions
More specific information varies by discipline
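One way to picture these three categories is as a structured record that travels with the data. The Python sketch below is hypothetical: the class and field names are illustrative and do not follow any particular metadata standard, and the discipline_specific dictionary is an assumed placeholder for the extra fields each discipline adds.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptive:
    """What is it about? (hypothetical field names)"""
    title: str
    date: str                 # assumed to be an ISO 8601 date string
    authors: list[str]
    keywords: list[str] = field(default_factory=list)
    related_objects: list[str] = field(default_factory=list)  # IDs of related datasets, papers


@dataclass
class Administrative:
    """Ownership and use permissions."""
    owner: str
    use_permissions: str


@dataclass
class Provenance:
    """Where the data come from, plus a history of changes."""
    source: str
    history: list[str] = field(default_factory=list)  # one note per change/version

    def record_change(self, note: str) -> int:
        """Append a change note and return the new version number (1-based)."""
        self.history.append(note)
        return len(self.history)


@dataclass
class MetadataRecord:
    descriptive: Descriptive
    administrative: Administrative
    provenance: Provenance
    discipline_specific: dict[str, str] = field(default_factory=dict)


# Usage with made-up example values.
record = MetadataRecord(
    Descriptive("Hypothetical stream temperature survey", "2014-06-01",
                ["A. Researcher"], ["hydrology"]),
    Administrative("Example Lab", "internal use only until publication"),
    Provenance("field data loggers at a single site"),
    {"instrument": "hypothetical logger model"},
)
version = record.provenance.record_change("removed duplicate readings")
print(version)  # → 1
```

Keeping provenance as an append-only history makes it easier for you or others to see later how the data reached their current state.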
Components of a DMP
• Project description
• Data collection
• Legal and ethical issues
• Policies for data sharing and re-use
• Data preservation (long-term)
• Who is responsible for data management
Example legal/ethical questions
• Procedures for managing and for maintaining the
confidentiality of the data to be shared (IES)
• Will any permission restrictions need to be placed
on the data? (NSF-BIO)
• Policies for public access and sharing should be
described, including provisions for appropriate
protection of privacy, confidentiality, security,
intellectual property, or other rights or
requirements. (NEH)
Components of a DMP
• Project description
• Data collection
• Legal and ethical issues
• Policies for data sharing and re-use
• Data preservation (long-term)
• Who is responsible for data management
Example data sharing questions
• Will you share data via a repository, handle
requests directly or use another mechanism?
(IES)
• What transformations will be necessary to
prepare data for preservation/data sharing? (NIH)
• How long will the original data
collector/creator/principal investigator retain the
right to use the data before opening it up to
wider use? (NEH)
Example data preservation/archiving questions
• If your method of sharing is with an archive, which
archive/repository/database have you identified as a
place to deposit data? (IES)
• What is the long-term strategy for maintaining,
curating and archiving the data? (NSF-BIO)
• The Data Management Plan should describe physical
and cyber resources and facilities that will be used for
the effective preservation and storage of research data.
These can include third party facilities and repositories.
(NEH)
Finding a home for your data
• Data storage, both short-term and long-term,
can take place in 3 types of places:
– Locally, within the lab or research environment
– Within the institution
– Within a national/discipline-based repository
See the following guide to find discipline-based
repositories: http://guides.uflib.ufl.edu/datasets
http://www.hpc.ufl.edu/
Repositories
Advantages of an institutional
repository:
• Linked to your institution –
intellectual capital of the
institution in one place
• You can put all your datasets
together
• Some guarantee of support
from the university
• Some domain repositories
may “go out of business”
once their funding ends
Advantages of a domain
repository:
• Your data will be stored with
similar datasets
• Researchers in your discipline
may find your data more
easily
• The repository will
understand what your data
needs in terms of storage,
archiving and preservation
• Computational tools may be
developed to crunch a critical
mass of data of a certain kind
Adapted from: http://libraries.mit.edu/guides/subjects/data-management/Managing%20Research%20Data%20101.pdf
Benefits of sharing data
• Data can be used by other researchers with
different objectives
• Discovery can be accelerated by building upon
previous research
• Results can be reproduced more easily and
accurately
• Researchers receive the credit they’re due
• Data producers have a new channel by which to
promote their work (increase impact of research)
Components of a DMP
• Project description
• Data collection
• Legal and ethical issues
• Policies for data sharing and re-use
• Data preservation (long-term)
• Who is responsible for data management
Example data management responsibility questions
• Roles and responsibilities of project or
institutional staff in the management and
retention of research data (IES)
• Who will be responsible for data management
and for monitoring the data management plan?
How will adherence to this data management
plan be checked or demonstrated? (NSF-BIO)
• Who will have responsibility over time for
decisions about the data once the original
personnel are no longer available? (NEH)
A cautionary tale…
From NYU Health Science Center Libraries: http://youtu.be/N2zK3sAtr-4
Questions?
Feel free to contact the Data
Management/Curation Task Force: [email protected]
Or me: Hannah Norton, [email protected], 352-273-8412