RDMRose: Research Data Management for LIS Session 6 Managing Data Session 6.1 Practical data management Practical data management Session 6.1 Nov-15 Learning material produced by RDMRose http://www.sheffield.ac.uk/is/research/projects/rdmrose.


RDMRose: Research Data Management for LIS
Session 6 Managing Data
Session 6.1 Practical data management
Practical data management
Session 6.1
Nov-15
Learning material produced by RDMRose
http://www.sheffield.ac.uk/is/research/projects/rdmrose
Learning outcomes
• By the end of this session you will be able to:
– Describe and apply practical principles of data
management
– Select appropriate messages about practical data
management for particular audiences (in terms of
discipline and seniority)
Session overview
• The importance of good data management
• Risk assessment
• Data quality
• Data security
• Teaching data management
Practical data management
• One of the key ways to motivate researchers
to engage with RDM is to stress the inherent
importance of data quality management and
the consequences of bad management
• Practical data management includes:
– Data quality
– Metadata quality, e.g. file naming
– Backing up data / data security
The importance of good data management
• The range of different arguments for RDM includes
data quality issues
• A good “starting point” for libraries in engaging
with RDM is raising PhD researchers’ awareness
• Information professionals should already
understand the principles of good data
management
Practical messages for PhD students
ACTIVITY 6.1.1
Activity 6.1.1 Practical messages for PhD students
• Brainstorm what you think might be key
practical messages for PhD students about
how to manage their data.
Using stories about what can go wrong
ACTIVITY 6.1.2
Activity 6.1.2 Using stories about what can go wrong
• The SODAMAT project has assembled some news
stories about what can go wrong for
researchers:
– https://code.soundsoftware.ac.uk/projects/sodamat/wiki/Evidence_Promoting_Good_Data_Management
• If you wanted to use one of these stories to
inform an audience of early career
researchers, which would you pick and why?
Risk assessment
• Data practices should undergo risk assessment
• This implies categorising risks, in terms of their
severity and their likelihood, then determining
possible stances (from toleration to terminating
the activity)
• A risk log is a project management tool for
monitoring risks
• The key to managing risk is often said to be:
– Planning early and continuing to update the plan
– Apportioning responsibility clearly
Risk assessment
                     Low severity       Medium severity     High severity
Low probability      Tolerate           Tolerate / Treat    Treat
Medium probability   Tolerate / Treat   Treat               Treat / Transfer
High probability     Treat              Treat / Transfer    Treat / Transfer / Terminate
Based on DATUM in Action (2012b) and JISC project guidelines.
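As an illustration, the matrix can be held as a simple lookup table. This is a minimal sketch rather than part of the DATUM or JISC guidance: the name `possible_stances` and the lower-case labels are assumptions made for the example, and the cell contents follow one reading of the slide's table.

```python
# Stances for each (probability, severity) cell of the risk matrix.
# The assignment of stances to cells is an assumed reading of the slide.
RISK_MATRIX = {
    ("low", "low"): ["tolerate"],
    ("low", "medium"): ["tolerate", "treat"],
    ("low", "high"): ["treat"],
    ("medium", "low"): ["tolerate", "treat"],
    ("medium", "medium"): ["treat"],
    ("medium", "high"): ["treat", "transfer"],
    ("high", "low"): ["treat"],
    ("high", "medium"): ["treat", "transfer"],
    ("high", "high"): ["treat", "transfer", "terminate"],
}


def possible_stances(probability: str, severity: str) -> list[str]:
    """Return the stances the matrix suggests for a risk (hypothetical helper)."""
    key = (probability.lower(), severity.lower())
    if key not in RISK_MATRIX:
        raise ValueError("probability and severity must be low, medium or high")
    # Return a copy so callers cannot alter the matrix by accident.
    return list(RISK_MATRIX[key])


print(possible_stances("High", "Medium"))
```

A risk judged to be of high probability and medium severity would, on this reading, be either treated or transferred.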
Risk logs
A risk log has one column for each of the following:
• Risk description (condition/cause/consequence)
• Probability (P), scored 1–5 (1 = low, 5 = high)
• Severity (S), scored 1–5 (1 = low, 5 = high)
• Risk score (P x S)
• Timescale
• Owner
• Detail of action to be taken
Based on JISC infoNet (2012).
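The risk score in the log is the product of probability and severity (P x S), so risks can be ranked once both have been scored from 1 to 5. The sketch below shows one possible way to keep such a log in code. The class name `RiskLogEntry` and the two example risks are invented for illustration and do not come from JISC infoNet.

```python
from dataclasses import dataclass


@dataclass
class RiskLogEntry:
    """One row of a risk log (field names are illustrative assumptions)."""
    description: str   # condition / cause / consequence
    probability: int   # P: 1 = low, 5 = high
    severity: int      # S: 1 = low, 5 = high
    timescale: str
    owner: str
    action: str        # detail of action to be taken

    def __post_init__(self):
        # Reject scores outside the 1-5 scale used by the risk log.
        for name in ("probability", "severity"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        """Risk score: probability multiplied by severity (P x S)."""
        return self.probability * self.severity


# Hypothetical entries, for illustration only.
risk_log = [
    RiskLogEntry("Laptop stolen; interview recordings lost", 2, 5,
                 "Whole project", "PhD student",
                 "Encrypt the disk; keep a second copy on university storage"),
    RiskLogEntry("Shared files overwritten; earlier version lost", 4, 3,
                 "Analysis phase", "Project lead",
                 "Agree a versioning rule for file names"),
]

# List the highest-scoring risks first.
for entry in sorted(risk_log, key=lambda e: e.score, reverse=True):
    print(f"{entry.score:>2}  {entry.description}  (owner: {entry.owner})")
```

Sorting by score puts the risks that most need attention at the top, although the owner would still review each entry individually.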
Data threats
• Theft or loss of device
• Corruption of back up material
• Hard drive failures
• Difficulty locating data files
– Difficulty finding relevant version
• Colleagues move on, taking files with them so they cannot
be consulted or leaving data without explanations of their
source
• Files over-written
• Poor metadata
• Not enough information about context is supplied to
understand the data
• Obsolescence of file types
High risk data (DATUM in Action, 2012b)
• “Details relating to identifiable individuals the contents of which, if
compromised, have the potential to cause damage or distress
• Any set of data relating to an identifiable individual’s sensitive
personal details
• Data concerning any vulnerable individual
• Large data sets relating to 1,000 or more identifiable individuals
• Research recommendations, before the decision was officially
announced
• Data that, if compromised, would affect contracts with commercial
or other partners, or confidentiality and non-disclosure agreements
• Information that would compromise patent applications
• Any data that is the result of an un-repeatable study”
Risk analysis
ACTIVITY 6.1.3
Activity 6.1.3 Risk analysis
• Think about the files on your own computer at
work. Make a list of a few of the types of thing
you are storing.
• Use the risk log to analyse strategies for
managing the risk.
• How do you think you compare to other
people with regard to how well you manage
your data?
Data quality (Gordon, 2007)
• Completeness
• Correctness
• Enterprise awareness
• Input validation
• Integrity
• Currency
• Duplication
• Inconsistency
Quality controls in the research context
• Instrument calibration
• Taking multiple measurements
• Following protocols in taking measurements
• Validation rules
• Using controlled vocabularies
• Expert validation
• Statistical tests to identify anomalous values
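Three of these controls (validation rules, controlled vocabularies and statistical tests for anomalous values) lend themselves to a short sketch. Everything specific here is an assumption made for illustration: the field names, the list of allowed sites, the temperature range, and the choice of a median-based "modified z-score" with a cut-off of 3.5 as the statistical test. A real project would choose rules and tests suited to its own data.

```python
from statistics import median

# Hypothetical controlled vocabulary for a "site" field.
ALLOWED_SITES = {"north", "south", "east"}


def validate_record(record: dict) -> list[str]:
    """Apply simple validation rules and return a list of problems found."""
    problems = []
    if record.get("site") not in ALLOWED_SITES:
        problems.append(f"site {record.get('site')!r} is not in the controlled vocabulary")
    temperature = record.get("temperature_c")
    # Assumed plausible range for this example: -50 to 60 degrees Celsius.
    if not isinstance(temperature, (int, float)) or not -50 <= temperature <= 60:
        problems.append("temperature_c is missing or outside the range -50 to 60")
    return problems


def anomalous_values(values: list[float], threshold: float = 3.5) -> list[float]:
    """Flag values far from the median, using the median absolute deviation."""
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    return [v for v in values if 0.6745 * abs(v - centre) / mad > threshold]


print(validate_record({"site": "West", "temperature_c": 200}))
print(anomalous_values([20.1, 19.8, 20.3, 20.0, 35.7, 19.9]))
```

A flagged value is a prompt for expert review rather than proof of an error, which is why expert validation sits alongside the automated checks in the list above.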
Metadata quality: file naming conventions
• “Data files are distinguishable from each
other within their containing folder
• Data file naming prevents confusion when
multiple people are working on shared
files
• Data files are easier to locate and browse
• Data files can be retrieved not only by the
creator but by other users
• Data files can be sorted in logical
sequence
• Data files are not accidentally overwritten
or deleted
• Different versions of data files can be
identified
• If data files are moved to other storage
platform their names will retain useful
context”
(EDINA and Data Library, n.d.)
• Simplicity
• Avoid special characters, spaces
• Appropriate word order
• Rules about version control
(DATUM in Action, 2012a)
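The naming principles above can be applied consistently by generating file names rather than typing them. The sketch below assumes one possible convention, project_description_YYYYMMDD_vNN.extension, which is an illustration and not the scheme recommended by DATUM in Action or MANTRA. The function names are likewise hypothetical.

```python
import re
from datetime import date


def safe_part(text: str) -> str:
    """Lower-case the text and replace spaces and special characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_file_name(project: str, description: str, created: date,
                    version: int, extension: str) -> str:
    """Build a name in a fixed word order: project, description, date, version."""
    parts = [
        safe_part(project),
        safe_part(description),
        created.strftime("%Y%m%d"),   # year first, so names sort chronologically
        f"v{version:02d}",            # zero-padded version number
    ]
    return "_".join(parts) + "." + extension.lower().lstrip(".")


# Pattern describing the assumed convention, used to check existing names.
FILE_NAME_PATTERN = re.compile(r"[a-z0-9-]+_[a-z0-9-]+_\d{8}_v\d{2}\.[a-z0-9]+")


def follows_convention(name: str) -> bool:
    """Return True if a file name matches the assumed convention."""
    return FILE_NAME_PATTERN.fullmatch(name) is not None


name = build_file_name("RDM Survey", "Interview notes (draft)", date(2015, 11, 3), 2, "docx")
print(name)
print(follows_convention("Final version!!.docx"))
```

Putting the date in year-month-day order and padding the version number means an ordinary alphabetical sort lists the files in a logical sequence.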
Data security: storage and back up
• Choice of media
• Frequency of back up
• How long are back ups stored?
• Security, if sensitive data
• Issues with cloud based services, such as
Dropbox, e.g. procedures for restoring files,
reliability
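One way to gain confidence that a back up is reliable is to compare checksums of the original files with those of the copies. The following is a minimal sketch under stated assumptions: both the working folder and the back up are reachable as local folders, and the function names `sha256_of` and `verify_backup` are invented for the example. It does not test whether a cloud service's restore procedure actually works.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> list[str]:
    """List files that are missing from the back up or differ from the original."""
    problems = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_dir)
        copy = backup_dir / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source) != sha256_of(copy):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

Calling `verify_backup(Path("data"), Path("backup/data"))`, with folder names of your own, returns an empty list when every file has an identical copy. A file that has been edited since the last back up will also be reported as different, so the check is most informative straight after a back up has run.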
Developing teaching materials
ACTIVITY 6.1.4
Activity 6.1.4 Developing teaching materials
• The MANTRA project has some excellent guides to basic data
management principles:
http://datalib.edina.ac.uk/mantra/.
• Work through the material under “organising data”,
“storage and security” and “data protection, rights and
access”.
• Imagine you are trying to construct an induction
presentation for new PhD students in one of the
departments you support. You have to cover a lot of topics,
such as available library resources. What, if anything, might
you include from the MANTRA material and why? Compose
up to three PowerPoint slides that put across the key issues
in the strongest way.
Activity 6.1.4 Developing teaching materials
• Discuss with a colleague how you think the key
messages might be different depending on the
discipline the students belong to – or is this a
generic issue?
• How, if at all, would you change the message for
a taught student doing a dissertation, an early
career researcher or a professor?
• Which aspects of data management should the
library explain and which should the computing
service promote, and why?
References
• DATUM in Action. (2012a). Folders and files – guidance. Newcastle:
Northumbria University School of Computing, Engineering &
Information Sciences. Retrieved from
http://www.northumbria.ac.uk/static/5007/ceispdf/filenameguide.pdf
• DATUM in Action. (2012b). Information security guidance.
Newcastle: Northumbria University School of Computing,
Engineering & Information Sciences. Retrieved from
http://www.northumbria.ac.uk/static/5007/ceispdf/infosecurity.pdf
• EDINA and Data Library, University of Edinburgh. (n.d.). Research
Data MANTRA. Retrieved from http://datalib.edina.ac.uk/mantra/
• Gordon, K. (2007). Principles of Data Management: Facilitating
Information Sharing. Swindon: British Computer Society.
• JISC infoNet. (2012). The Risk Log. Newcastle upon Tyne. Retrieved
from http://www.jiscinfonet.ac.uk/infokits/riskmanagement/identifying-risk/risk-log
• UK Data Archive. (2011). Managing and Sharing Data: Best Practice
for Researchers (3rd ed., fully revised). Colchester: University of
Essex. Retrieved from http://www.dataarchive.ac.uk/media/2894/managingsharing.pdf
[This includes a useful checklist as an appendix.]