RDMRose: Research Data Management for LIS
Session 6: Managing Data
Session 6.1: Practical data management
Nov-15. Learning material produced by RDMRose, http://www.sheffield.ac.uk/is/research/projects/rdmrose

Learning outcomes
• By the end of this session you will be able to:
  – Describe and apply practical principles of data management
  – Select appropriate messages about practical data management for particular audiences (in terms of discipline and seniority)

Session overview
• The importance of good data management
• Risk assessment
• Data quality
• Data security
• Teaching data management

Practical data management
• One of the key ways to motivate researchers to engage with RDM is to consider the inherent importance of data quality management and the consequences of bad management
• Practical data management includes:
  – Data quality
  – Metadata quality, e.g. file naming
  – Backing up data / data security

The importance of good data management
• The range of arguments for RDM includes data quality issues
• A good “starting point” for libraries in engaging with RDM is raising PhD researchers’ awareness
• Information professionals should already understand the principles of good data management

Activity 6.1.1: Practical messages for PhD students
• Brainstorm what you think might be key practical messages for PhD students about how to manage their data.

Activity 6.1.2: Using stories about what can go wrong
• The SODAMAT project has assembled some news stories about what can go wrong for researchers:
  – https://code.soundsoftware.ac.uk/projects/sodamat/wiki/Evidence_Promoting_Good_Data_Management
• If you wanted to use one of these stories to inform an audience of early career researchers, which would you pick and why?

Risk assessment
• Data practices should undergo risk assessment
• This implies categorising risks in terms of their severity and their likelihood, then determining possible stances (from toleration to terminating the activity)
• A risk log is a project management tool for monitoring risks
• The key to managing risk is often said to be:
  – Planning early and continuing to update the plan
  – Apportioning responsibility clearly

Risk assessment: stances by probability and severity

                     Low severity       Medium severity    High severity
Low probability      Tolerate           Tolerate / Treat   Treat
Medium probability   Tolerate / Treat   Treat              Treat / Transfer
High probability     Treat              Treat / Transfer   Treat / Transfer / Terminate

Based on DATUM in Action (2012b) and JISC project guidelines.

Risk logs
Risk description (Condition/Cause/Consequence) | Probability (P) 1–5 (1 = low, 5 = high) | Severity (S) 1–5 (1 = low, 5 = high) | Risk Score (PxS) | Timescale | Owner | Detail of action to be taken

Based on JISC infoNet (2012).

Data threats
• Theft or loss of device
• Corruption of back up material
• Hard drive failures
• Difficulty locating data files
  – Difficulty finding relevant version
• Colleagues move on, taking files with them so they cannot be consulted, or leaving data without explanations of their source
• Files over-written
• Poor metadata
• Not enough information about context is supplied to understand the data
• Obsolescence of file types

High risk data (DATUM, 2012b)
• “Details relating to identifiable individuals the contents of which, if compromised, have the potential to cause damage or distress
• Any set of data relating to an identifiable individual’s sensitive personal details
• Data concerning any vulnerable individual
• Large data sets relating to 1,000 or more identifiable individuals
• Research recommendations, before the decision was officially announced
• Data that, if compromised, would affect contracts with commercial or other partners, or confidentiality and non-disclosure agreements
• Information that would compromise patent applications
• Any data that is the result of an un-repeatable study”

Activity 6.1.3: Risk analysis
• Think about the files on your own computer at work. Make a list of a few of the types of thing you are storing.
• Use the risk log to analyse strategies for managing the risk.
• How do you think you compare to other people with regard to how well you manage your data?
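
For anyone who prefers to keep the risk log from this activity in a script rather than a spreadsheet, the following is a minimal sketch of how the Risk Score (P x S) could be calculated and used to order entries. The `RiskEntry` class, the `rank` function and all sample values are assumptions made for illustration; they are not taken from JISC infoNet or DATUM, and the three example risks are simply drawn from the data threats listed in this session.

```python
from dataclasses import dataclass


@dataclass
class RiskEntry:
    """One row of a risk log. Field names mirror the risk log columns;
    the class itself is a hypothetical illustration."""
    description: str   # condition / cause / consequence
    probability: int   # P, from 1 (low) to 5 (high)
    severity: int      # S, from 1 (low) to 5 (high)
    timescale: str
    owner: str
    action: str        # detail of action to be taken

    def __post_init__(self):
        # Reject values outside the 1-5 scale used by the risk log.
        for name in ("probability", "severity"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        """Risk score as defined in the risk log: P x S."""
        return self.probability * self.severity


def rank(log):
    """Return the entries ordered from highest to lowest risk score."""
    return sorted(log, key=lambda entry: entry.score, reverse=True)


# Invented sample entries, for illustration only.
log = [
    RiskEntry("Laptop stolen; interview recordings lost", 2, 5,
              "Ongoing", "Researcher", "Encrypt disk; keep a second copy"),
    RiskEntry("File over-written; earlier version lost", 4, 3,
              "Ongoing", "Researcher", "Adopt version numbers in file names"),
    RiskEntry("File type becomes obsolete", 1, 3,
              "After project end", "Supervisor", "Export to an open format"),
]

for entry in rank(log):
    print(f"{entry.score:>2}  {entry.description}  (owner: {entry.owner})")
```

Sorting by score is only one way to prioritise; the stance chosen for each risk (tolerate, treat, transfer or terminate) still needs a human judgement about the data concerned.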

Data quality (Gordon, 2007)
• Completeness
• Correctness
• Enterprise awareness
• Input validation
• Integrity
• Currency
• Duplication
• Inconsistency

Quality controls in the research context
• Instrument calibration
• Taking multiple measurements
• Following protocols in taking measurements
• Validation rules
• Using controlled vocabularies
• Expert validation
• Statistical tests to identify anomalous values

Metadata quality: file naming conventions
• “Data files are distinguishable from each other within their containing folder
• Data file naming prevents confusion when multiple people are working on shared files
• Data files are easier to locate and browse
• Data files can be retrieved not only by the creator but by other users
• Data files can be sorted in logical sequence
• Data files are not accidentally overwritten or deleted
• Different versions of data files can be identified
• If data files are moved to other storage platform their names will retain useful context” (EDINA and Data Library, n.d.)

Principles for naming files (DATUM in Action, 2012a):
• Simplicity
• Avoid special characters, spaces
• Appropriate word order
• Rules about version control

Data security: storage and back up
• Choice of media
• Frequency of back up
• How long are back ups stored?
• Security, if sensitive data
• Issues with cloud-based services, such as Dropbox, e.g. procedures for restoring files, reliability

Activity 6.1.4: Developing teaching materials
• The Mantra project has some excellent guides to basic data management principles: http://datalib.edina.ac.uk/mantra/
• Work through the material under “organising data”, “storage and security” and “data protection, rights and access”.
• Imagine you are trying to construct an induction presentation for new PhD students in one of the departments you support. You have to cover a lot of topics, such as available library resources. What, if anything, might you include from the Mantra material and why? Compose up to three PowerPoint slides that put over the key issues in the strongest way.
• Discuss with a colleague how you think the key messages might differ depending on the discipline the students belong to. Or is this a generic issue?
• How, if at all, would you change the message for a taught student doing a dissertation, an early career researcher or a professor?
• Which aspects of data management should the library explain and which should the computing service promote, and why?

References
• DATUM in Action. (2012a). Folders and files – guidance. Newcastle: Northumbria University School of Computing, Engineering & Information Sciences. Retrieved from http://www.northumbria.ac.uk/static/5007/ceispdf/filenameguide.pdf
• DATUM in Action. (2012b). Information security guidance. Newcastle: Northumbria University School of Computing, Engineering & Information Sciences. Retrieved from http://www.northumbria.ac.uk/static/5007/ceispdf/infosecurity.pdf
• EDINA and Data Library, University of Edinburgh. (n.d.). Research Data MANTRA. Retrieved from http://datalib.edina.ac.uk/mantra/
• Gordon, K. (2007). Principles of Data Management: Facilitating Information Sharing. Swindon: British Computer Society.
• JISC infoNet. (2012). The Risk Log. Newcastle upon Tyne. Retrieved from http://www.jiscinfonet.ac.uk/infokits/riskmanagement/identifying-risk/risk-log
• UK Data Archive. (2011). Managing and Sharing Data: Best Practice for Researchers (3rd ed., fully revised). Colchester: University of Essex. Retrieved from http://www.data-archive.ac.uk/media/2894/managingsharing.pdf [This includes a useful checklist as an appendix.]