Transcript software

Research Data Management
PROJECT LIFECYCLE
Introduction and context
Basic Project Info.
• Thesis Title
• UH or Research Council?
• Duration
Related Policies
• UH and STFC policies:
• Open after publication, as your research is publicly funded through UH & RCUK
Roles and Responsibilities
• Yours and your supervisory team
• Additional help from IT staff, HelpDesk and other team members
Planning Research Data Management
Introduction and context
UH Template Questions
- Short description of the project’s fundamental aims and purpose
- Funding body requirements relating to the creation of data
- Institutional or research group guidelines
- Other policy-related dependencies
- Date of creation
- Aims and purpose of this plan
- Target audience for this plan
- Does this version supersede an earlier plan?
- Contact details and expertise of nominated data managers / named individuals
- Glossary of terms
Data types, formats, standards and capture methods
What data will be created?
• Note the type and volume of data that will be created e.g. transcripts,
measurements, imaging etc.
• Explain how you will capture the data. e.g. in a numbered, dated notebook.
• What formats do you propose to use and why? e.g. Microsoft Access, Excel or SPSS,
as they’re in widespread use.
DICE DMP Breakdown
Activity
What is data?
What does data mean to you? Spend a couple of minutes thinking about what
data you will be working with, throughout your project.
Then we’ll combine your ideas and compare them to the DMP.
Data types, formats, standards and capture methods
Project Desc. and process
• Basic description for context
• Define data
• Software, Documents, Formats
Existing and New Data
• Archival data, Catalogues, New obs.
• Proprietary? Project only for 6 months
Metadata
• Headers: table column headings, image information, paper reference, software version.
Data types, formats, standards and capture methods
UH Template Questions
- Give a short overview description of the data being generated or reused in this
research
- How will you manage integration between the data being gathered during the
project and pre-existing data sources?
- How will you capture or create new data?
- Which file formats will you use and why?
- What directory and file naming convention will you use?
- Are there any tools or software needed to create/process/visualise these data?
- Are there appropriate computing hardware, facilities, and resources to manage,
store, and analyse these data?
- Are the datasets which you will be capturing/creating self-explanatory, or
understandable in isolation?
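The directory and file naming question above can be made concrete with a small helper. This is a minimal sketch in shell, assuming a hypothetical convention of date, project, description and zero-padded version; the function name make_name and the example values are illustrative only and are not a UH standard.

```shell
#!/bin/sh
# Hypothetical naming convention (an assumption, not a UH rule):
#   YYYYMMDD_project_description_vNN.ext
# Lower-case, no spaces, zero-padded version so names sort correctly.

make_name() {
    # $1 = date (YYYYMMDD), $2 = project, $3 = description,
    # $4 = version number (plain integer), $5 = file extension
    desc=$(printf '%s' "$3" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '-')
    printf '%s_%s_%s_v%02d.%s\n' "$1" "$2" "$desc" "$4" "$5"
}

make_name 20140312 thesis "Spectra Fit" 3 fits
# prints 20140312_thesis_spectra-fit_v03.fits
```

Whatever convention you choose, writing it down once (in the plan or a README) matters more than the exact pattern.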
Data types and formats
Images / Photos
Plots
Code
Tables
Audio-Visual
Transcripts
Data types and formats
Images (Raw, Processed, Plotted, Photos, Scans, CAD)
• Formats: FITS, JPG, PNG, BMP, PS
• Uses: Reuse, paper, talk, poster, archive, web
• Considerations: Use, size, longevity
Tables (Catalogues, Query results, Calculations, Measurements)
• Formats: Text files, FITS, spreadsheets
• Uses: Code input, spectra, plot, paper, CDS
• Considerations: Use, metadata, accessibility
Source code (Models, simulations, scripts, inputs, outputs, instructions)
• Formats: .c, .pl, .py, .idl, README, Makefile, input, output
• Uses: Third party edit, run, paper, web
• Considerations: User friendly; functions, size
Interviews (Audio, Video, Written Transcript)
• Formats: .txt, .odt, .doc, .mp3, .mp4, .avi
• Uses: Producing transcripts, further analysis
• Considerations: Format, longevity, security, metadata
Data metadata
How will the data be documented and described?
• What contextual details are needed? e.g. a description of the capture methods and
data analysis.
• How will you capture this? e.g. in papers, in a database, in a ‘readme’ text file, in
file properties/headers.
• Which standards will you use and why? e.g. refer to data centre recommendations
for metadata, controlled vocabularies, documentation.
• Are there any encoding guidelines you should follow?
DICE DMP Breakdown
Data metadata
UH Template Questions
- What contextual details are needed to make the data you capture or collect
meaningful?
- How will you create or capture these metadata?
- What form will the metadata take?
- Why have you chosen particular standards and approaches for metadata and
contextual documentation?
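One way to capture the contextual details asked about above is a plain-text README stored next to the data. This is a minimal sketch, assuming hypothetical field names and a hypothetical function write_readme; where a data centre recommends a metadata standard, that should take precedence.

```shell
#!/bin/sh
# Write a minimal README describing a dataset directory.
# Field names below are illustrative assumptions, not a formal standard.

write_readme() {
    # $1 = dataset directory, $2 = title, $3 = creator, $4 = capture method
    cat > "$1/README.txt" <<EOF
Title:          $2
Creator:        $3
Date created:   $(date +%Y-%m-%d)
Capture method: $4
File formats:   (list formats and software versions here)
Licence:        (state how the data may be reused)
EOF
}

# Example run against a temporary directory.
dataset=$(mktemp -d)
write_readme "$dataset" "Example dataset" "A. Researcher" "Archival catalogue query"
cat "$dataset/README.txt"
```

Keeping the README in the same directory as the data means the description travels with the files when they are copied or archived.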
Legal and ethical issues
How will you manage ethics and intellectual property?
• How will you safeguard the privacy of research participants? e.g. by negotiating
informed consent.
• Will there be any restrictions and why? e.g. delays while you seek a patent,
embargoes as right of first use.
DICE DMP Breakdown
Legal and ethical issues
Legal Issues
Ethics
• Do you have copyright issues?
• Is there a patent pending on your work?
• Is the data personal?
• Who owns your data? UH? STFC? Third party company?
• How will the data be licensed?
• How will you deal with disputes?
Legal and ethical issues
UH Template Questions
- Are there ethical and privacy issues that may prohibit sharing some or all of the data?
- How may they be resolved?
- Is the data that you capture / create ‘personal data’ in terms of the Data Protection
Act (1998) or equivalent legislation if outside the UK?
- What action have you taken to comply with your obligation under the Data Protection
Act (1998) or equivalent legislation if outside the UK?
- Will the data be covered by copyright or the Database Right? Give details.
- Who owns the copyright and other intellectual property?
- How will the database be licensed?
Legal and ethical issues
How will you manage your data?
• How will you store and back-up the data? e.g. University storage with IT backup, mirror data on partner's server.
DICE DMP Breakdown
Activity
How is your data at risk?
What precautions do you have in place to safeguard your data? Spend a
couple of minutes thinking about how your data could be lost, damaged or
stolen.
Then we’ll combine your ideas and compare them to the DMP.
Short-term storage and data management
Storage
• Where will your data be stored?
• How will it be transmitted?
Back-up
• Where will you back-up?
• Who will do it?
• And how often?
Security
• Keeping sensitive data private
• Safe from loss or theft
Short-term storage and data management
[Diagram: storage locations: laptop, cluster, external HD, DVD and tape, UH PC local drive, networked drives (U: and X:, 5G), UH server]
Sharing
DMS
• Access only to UH members
• Versioning
• OS independent
• Set file structure
Research Drives
• Access only to UH members
• Undefined file structure
• No versioning
Zend.To
• Send large files using the UH server
• Web based only
• Open and Free
Safeguarding data with Research Data Management
Short-term storage and data management
Backing up should be an automatic part of your everyday research activities.
In 2005, an electrical fault in the electronics and laser research building at the
University of Southampton cost £50-100M including temporary building hire and
transfer of work to Holland.
Imagine if a fire or similar disaster happened at UH.
How much would it cost you‽
Storing your data on the UH network means that it
is stored at de Havilland and at College Lane.
Mountbatten Building, So’ton Uni.
Short-term storage and data management
rsync
• Updates the changes to files between two directories and servers
# General form
/usr/bin/rsync [options] [src] [dest]
# -a archive mode, -v verbose, -u skip files that are newer at the destination
/usr/bin/rsync -avu /data/jgoodger/ /local/data/
crontab
• Timed schedule to perform tasks, such as your rsync
SHELL=/bin/tcsh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
[email protected]
# Run the rsync at 03:17 every day
17 3 * * * /usr/bin/rsync -avu /data/auser/ /local/data/
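A backup is only useful if the copy matches the original. The sketch below is one possible check, assuming sha256sum from GNU coreutils is installed; the function names manifest and backup_matches are hypothetical, and the temporary directories stand in for a real source and its backup.

```shell
#!/bin/sh
# Compare a source directory with its backup by file checksum.
# Assumes GNU coreutils (sha256sum); directory names are illustrative.

manifest() {
    # Print "checksum  relative/path" for every file under $1, sorted by path.
    (cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2)
}

backup_matches() {
    # Succeeds (exit 0) only if both trees hold identical files.
    [ "$(manifest "$1")" = "$(manifest "$2")" ]
}

# Demonstration with temporary directories standing in for real data.
src=$(mktemp -d)
dest=$(mktemp -d)
echo "observation 1" > "$src/run1.txt"
cp -p "$src/run1.txt" "$dest/run1.txt"

if backup_matches "$src" "$dest"; then
    echo "backup matches source"
else
    echo "backup differs from source"
fi
```

A check like this could be run from the same crontab shortly after the scheduled rsync, so that a failed or partial backup is noticed early.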
Short-term storage and data management
Windows: Backup and Restore
• Set an automated backup through Control Panel
Mac: Time Machine
• Back up your entire content to another disk or to the net.
Short-term storage and data management
Most data needs some level of security:
- Sensitive Personal Information
- Proprietary data
- New discoveries
- Revolutionary code / software
All of it needs to be accessible, but secure during storing, sharing, and publishing.
If you lost it, who would be able to access your data?
Short-term storage and data management
Keep your data secure in an encrypted folder
BitLocker is available on Windows 7+; TrueCrypt works on any operating system.
• Open Source Encryption that works with Windows, Mac and Linux
• Pack your files into an encrypted volume
• Send by email, shared drive, cloud storage, web space
• Password access
• Variable encryption algorithms available
Short-term storage and data management
UH Template Questions
- What is the ballpark size of the data being collected / created?
- Where physically will you store the data during the project’s lifetime?
- What media will you use for primary storage during the project’s lifetime?
- What software will be used in storing and processing these data?
- How will you back up the data during the project’s lifetime?
- How regularly will backups be made?
- Who is responsible for backups?
- How will you manage access restrictions and data security during the project’s
lifetime?
Data sharing and access
What are the plans for data sharing and access?
• Who is expected to use the completed dataset(s) and for what purpose?
• How will the data be developed with future users in mind? e.g. choose appropriate
formats.
• How will you make the data available? e.g. deposit in a data centre, forward copies
on request, create website, publish a book.
DICE DMP Breakdown
Activity
What happens when you’re finished?
After you’ve published, what happens to your data? Spend a couple of
minutes thinking about where your data should be stored, who should have
access, and what would happen if something happened to you?
Then we’ll combine your ideas and compare them to the DMP.
Data sharing and access
Data sharing and Reuse
• Who else wants your data?
• Why might they not have it?
Access to Data
• How and when will you release your data?
• Project timetable
Timing
• Limits on pub dates?
• Special Journal or Conference Publication
• Embargo or Patent Pending?
Data sharing and access
UH Template Questions
- Which groups or organisations are likely to be interested in the data that you will create
/ capture?
- How do you anticipate your new data being reused?
- Are you under obligation or do you have plans to share all or part of the data that you
create / capture?
- If not, why will you not share your data?
- If you can, how and when will you make the data available?
- What is the process for gaining access to the data? Will access be chargeable?
- Does the original data collector / creator / PI, retain the right to use the data before
opening it up to wider use? Give details.
- Are there any embargo periods for political / commercial / patent reasons? Give details.
- How will you implement permission, restrictions, and embargoes?
Deposit and long-term preservation
What is the strategy for long-term preservation and sustainability?
• What are the plans for sustainability? e.g. choose open standards, deposit in data
centre.
• Which repository / data centre have you identified as a place to deposit data?
Show you've consulted them.
• How will you prepare data for preservation and sharing? Show time and resource
budgeted in.
DICE DMP Breakdown
Deposit and long-term preservation
Selection
• Which data will be kept / made public?
• Which tools are independently valuable?
• How will sensitive data be managed?
Location and Schedule
• Where will your data be published? In a
national or subject specific archive? At UH?
• How long should the data be kept?
Metadata
• What metadata and documents will also be
archived?
• How will this data be created?
Deposit and long-term preservation
[Diagram: Working Data, Publication, Archiving. Labels: Journal Paper, All Data, Archive (arXiv), UHRA, Supporting Data, National Archive]
Currently, selection, methods, algorithms, results, plots, and conclusions are in papers,
published in journals and open archived in the arXiv.
In the future you’ll need to select supporting data, including material with independent
scientific merit, for publication online in open access archives; either subject
specific or in the UHRA.
Deposit and long-term preservation
What is kept depends on the decisions made by the government, RCUK and the journals.
Probably supporting material and data that has scientific merit, but could be all of it.
Be prepared!
• Keep clear and useful notes on your work;
— Annotate your code so others (including your future self) can make sense of it
— Keep a README of instructions for reduction, analysis or code procedures
— Clearly name the published results – isolate them in or copy them to a directory
— Version control your codes, results, plots and drafts so you can compare at least
— Make a note of results/conclusions of dead ends
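The advice above to isolate published results can be followed with a few commands. This is a minimal sketch, assuming a hypothetical layout in which working files live in work/ and published copies go to published/ under a label for the paper; the function name freeze_results is illustrative.

```shell
#!/bin/sh
# Copy the files behind a publication into their own directory,
# keep timestamps, and drop write permission to discourage later edits.
# Directory layout and names are assumptions for illustration.

freeze_results() {
    # $1 = destination directory, remaining arguments = files to copy
    dest=$1
    shift
    mkdir -p "$dest"
    for f in "$@"; do
        cp -p "$f" "$dest/" && chmod a-w "$dest/$(basename "$f")"
    done
    # Record what was frozen in a manifest next to the directory.
    ls "$dest" > "$dest.manifest.txt"
}

# Demonstration in a temporary project directory.
project=$(mktemp -d)
mkdir "$project/work"
echo "flux,error" > "$project/work/table1.csv"
freeze_results "$project/published/paper1" "$project/work/table1.csv"
cat "$project/published/paper1.manifest.txt"
# prints table1.csv
```

This does not replace version control; it only gives the published files a fixed, clearly named home that is easy to deposit later.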
Deposit and long-term preservation
UH Template Questions
- What is the long-term strategy for maintaining, curating, and archiving these data?
- On what basis will data be selected for long-term preservation?
- Will or should data be kept beyond the life of the project?
- How long will or should these data be kept beyond the life of the project?
- Which archive / repository / central database / data centre have you identified as a
place to deposit data?
- How will you manage sensitive data over the longer term?
- What metadata / documentation will be submitted alongside the datasets or
created on deposit / transformation in order to make the data re-useable?
- How will this metadata / documentation be created, and by whom?
- Will you include links to published materials and/or outcomes? Give details.
- How will you address the issue of persistent citation?