Data Management Principles - Planning
UniMelb Cluster - Research Symposium
Lyle Winton
24 Oct 2008
16/07/2015

Who am I?
  Dr Lyle Winton
  Background:
    Researcher/Scientist
    Technical Consultant
      education and research for gov. infrastructure projects
    Software Engineer
      experimental high energy physics, distributed systems, Grid
      industry, higher education (web development, information systems, enterprise systems)
  Currently:
    eScholarship Research Centre (eSRC) & Research Computing Services, Information Services
    Senior Research Support Officer (eResearch)
      provide ICT support for research workers, supply expertise & strategic advice
      develop plans for eResearch infrastructure
      be active in local & national eResearch co-ordination groups
  eScholarship Research Centre

Data Management
  What are we doing…
  (eSRC & eR – myself, Joanne Evans, Simon Porter, Gavan McCarthy, Leon Sterling)
  Policy, Planning, Tools, Services, Infrastructure, Training, Consultancy
  [Diagram: four of these areas are marked "(focus)"; the transcript does not preserve which]

Nationally… ANDS
  Vision: “The development of ANDS is intended to provide the essential meeting place where the Australian path forward for research data management can evolve and where a vision can be achieved.”
    Towards an Australian Data Commons, ANDS – Oct 2007
  “ – institutions will be expected to have and support data management plans, and any researcher seeking support through a number of government funding agencies will be expected to describe how the data generated through the project will be managed throughout its lifecycle.”
    ANDS Interim Business Plan – Sept 2008
  “Enabling Components… Data Storage: … This investment will extend to research organisations for the development of institutional nodes of the storage grid, on the condition that the storage is used exclusively for research data; the institutes co-invest in the infrastructure; each institute publishes and adopts a data management plan; and each institute ensures its researchers use and abide by the data management plan.”
    Strategic Roadmap for Research Infrastructure, NCRIS – July 2008.

Known problems…
  “A mature data stewardship system, interlinking policy and infrastructure could address the needs of researchers and improve the quality and efficiency of Australian innovation and research.”
  “The survey found that individual researchers and research groups do not include data management as an element when planning research projects.”
  “Grants do not fund the creation of datasets as an end in itself, nor are funds provided explicitly for the management of data.”
  “The survey found that research groups and organisations rarely have formal policies for the management of data. They usually have a set of practices that may or may not be adhered to at the project level.”
  “Researchers… see research data as belonging to them. … Experienced researchers have been managing data all their careers.”
    AERES report – Oct 2006

Some UniMelb goals…
  Information Futures Commission
  Excerpts from final report…
  We will know we're on track if:
    “Management and dissemination of research data and digital collections is painless.”
  We propose that we will:
    “Develop and adopt standards, guidelines and processes for the management, access and preservation of research data”
    “Implement a program for targeted curation of collections…”
    “Implement a digitisation and profiling strategy for works in collections (including 'born digital')…”
  Numerous references to services surrounding data:
    “Adequate physical and digital collections support research, learning and teaching, and knowledge transfer … Cataloguing and search tools make it easy to discover, cite and manage information.”

Where are we heading?
  Formal Research Data Management Infrastructure/Plans/Policies are emerging!
  Globally researchers are beginning to adopt this as good practice
  University is moving towards this as standard practice
  We need to start implementing and/or improving…
    Professional Data/Info Management Practice
      ensuring quality research data
      enables (appropriate) access
      enables reuse of data
    Policy, Intellectual Property & Licensing, Contracts, Legislation, Process …
      not just paperwork and hurdles
      ensuring research has integrity, repeatability
      enables (appropriate) access
      enables reuse of data
  Data Management Plan (DMP)

Why now?
  Research Data is increasing in size
  Research Collaborations are increasing
  Data is increasingly digital
  Wonderful opportunities for reuse, sharing, collaboration, analysis
  However: while microfilm and non-acidic paper can last for 100+ years
    magnetic media lasts 10+ years
    optical media lasts 20+ years (with proper handling)
    2-10% of hard drives fail every year
    software & hardware can become outdated
  And much info is still only hardcopy
    Lab books, notes, primary data, samples
  [Image: Burroughs 1977 – B 9495 Magnetic Tape Subsystem]

Parts of the elephant…
  Researchers & Departments
    are at varying levels of maturity
    are experiencing different pain-points
  Infrastructure Providers
    are focused on specific problems
    are experts in different aspects/solutions
    are getting varying requirements

Framing the elephant…

Training for post-grads
  UpSkills eResearch Stream – “Data Management Workshop”
    run 3 so far
  Influences and References
    The University of Melbourne Policy (Research Office, Records Services)
    Australian Code for the Responsible Conduct of Research (NHMRC, ARC, Universities Australia)
    OAK Law Project, QUT
    Belinda Weaver presentations, UQ
    PILIN Project (ANDS/ARROW)
    A few examples!
  Review of material
    By eScholarship Research Centre
    By local eResearch social network (eCoffee)
    By a small group of department research/IT managers
    By School of Graduate Research

Training for post-grads
  Workshop Covers:
    Development of a web site (ongoing)
    Components of a “Data Management Plan”
    Recommended reading list
    Information Modelling, Good Practice Guidance
    Technologies
    Feedback has been very positive!!!
    Resources, References, Examples, Q&A
    A Research DMP Template (ongoing)
    Drafting guidelines to support the implementation and compliance (underway)
  Future developments:
    Training materials for supervisors?
    Discussing undergraduate data management training across Uni
    Possible DMP registry

Why Manage Research Data
IT IMPROVES YOUR RESEARCH BOTH NOW AND LATER…
  Data is often valuable for a long time!!!
  Maximise usefulness of data to fellow researchers
    Context for the research, how data was collected, quality controls, how people can and should use it (access and licensing), how you then attribute people/projects
    can help lead to subsequent research papers
  Good Practice Better Research
  Results of your research may outlast the project, your degree, your position, your career, your institution
    historical value, predictable or unforeseen
  DMPs state the parameters within which you MUST do research, then follow them!
    (being a Professional Researcher)
    document for newcomers, your group, project, externals
  Ensure research integrity (and repeatability)
    through keeping better records
    can trace your outcomes right from data collection, through research method, through to results
    promotes awareness of responsibilities, policies, ethics, legislation

Why Manage Research Data
IT MAY SAVE WASTED TIME…
  You need to properly…
    Collect research data
    Manage research data
    Archive research data
  …otherwise there is a risk you cannot use your data, wasting years of effort.
  From a study of 500 charges of “research misconduct”, 40% could have been avoided by good data management practice!
  “Student submits her PhD thesis for examination then leaves country taking the data with her. An examiner questions the integrity of the research data. A reanalysis of the data and original questionnaire is required.”
  “Participant in a research project lodges a claim for compensation, alleging that he was not adequately informed about the effects of the study, does not recall giving consent, and the raw data he provided has become public. Where are the records?”
  “Ten years after a patent has been granted a patent infringement action is lodged. The laboratory notebook is required.”
  “At completion of a research project the data and records are boxed and stored in a departmental storeroom. Sometime later the researcher needs to access the original records to refute a claim of falsification. He finds that the storeroom has since been converted into a laboratory/coffee-shop/learning-hub.”

Why Manage Research Data
AND YOU NEED TO PLAN AHEAD…
  University of Melbourne Policy
    research methods and results open to scrutiny
    data should be retained in a durable and appropriately referenced form
      for at least 5 years from any publication
      minimum of 15 years for clinical trials
      minimum of 7 years for adult psychological files (for minors 7 years after reaching 18)
      or longer if external/funding/regulatory/archival requirements
    research units & departments have formally documented procedures for retention
      researchers must comply
    ensure research data and records are accurate, complete, authentic and reliable
    data and records formed for verification and include sufficient detail (authenticity and validity of conclusions)

What’s in a DMP?
  A Possible Template:
    Context (Outline, Pre-planning, Decisions)
    Responsibilities (ethics, consent, licensing, legislation, funding requirements, reporting)
    Process & Policies
      Data Collection and QC Process
      Access Policy
      Appropriate Use and Access Patterns
      Data Maintenance, Persistence and Archival Practice
      Decommissioning/Destruction/Sanitisation
    Technical Requirements (policy for system developers/implementers/admins)
      Current Infrastructure and Requirements
      Future Infrastructure Requirements
      Interoperability
      Data Security
      Availability, Reliability, Support and Response
  (full template found at http://www.esrc.unimelb.edu.au/dmp )

Why Plan?
  Making the most of Infrastructure
    ARCS Data Fabric (NCRIS)
    University Infrastructure
    National Compute Infrastructure (VLSCI, ANUSF, VPAC)
    Advanced Technology (imaging, sequencing, synchrotron)

Why Plan?
  Making the most of Research Networks
    ANDS Data Commons
    BioGrid Australia
    Protein Data Bank
  Increasingly you need to ensure
    Research integrity, traceability
    Data and Result quality
    Data reusability
    Data security (misuse/damage, unintended/intended)

Communication
  2-way Communication is important
    Good Practice will emerge from both Research and ICT expertise
    National Infrastructure
    Administration/ICT and Research Community
    Opportunities and Trade-offs
    3+ -way communication?
  Vision: a local community of practice
    to provide and review guidelines and policies
    to share data management plans
    to drive development of shared infrastructure
    advocate for and steer national infrastructure

What you can do…
  http://www.esrc.unimelb.edu.au/dmp
  Provide general feedback
  Ask questions, we’ll seek answers
  Work with us on guidance & good practice
  Encourage students to attend future UpSkills
  Talk with your students/group/department about formally documenting a DMP
  Feed back your DMP
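
Worked example: retention periods
  The minimum retention periods quoted under "University of Melbourne Policy" earlier in the talk can be turned into a simple date calculation. The sketch below is illustrative only and is not part of the policy or of the DMP template. The category names, the function names and the idea of a caller-supplied reference date are assumptions made for this example: the slide states "from any publication" for the 5-year rule but does not say when the clock starts for the other categories. The optional other_requirement argument stands in for "or longer if external/funding/regulatory/archival requirements".

```python
from datetime import date
from typing import Optional

# Minimum retention periods in years, as quoted on the policy slide.
# The category names are hypothetical labels chosen for this sketch.
MIN_RETENTION_YEARS = {
    "general": 5,             # at least 5 years from any publication
    "clinical_trial": 15,     # minimum of 15 years for clinical trials
    "psychological_file": 7,  # minimum of 7 years for adult psychological files
}
ADULT_AGE = 18


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 Feb falls back to 28 Feb."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_disposal_date(
    category: str,
    reference_date: date,
    date_of_birth: Optional[date] = None,
    other_requirement: Optional[date] = None,
) -> date:
    """Earliest date on which data could be disposed of under the quoted minimums.

    reference_date is an assumption of this sketch (for example, the date
    of publication for the general rule).
    """
    years = MIN_RETENTION_YEARS[category]
    earliest = add_years(reference_date, years)

    # For minors, psychological files are kept until 7 years after reaching 18.
    if category == "psychological_file" and date_of_birth is not None:
        turns_adult = add_years(date_of_birth, ADULT_AGE)
        if reference_date < turns_adult:
            earliest = add_years(turns_adult, years)

    # External, funding, regulatory or archival requirements may extend this.
    if other_requirement is not None:
        earliest = max(earliest, other_requirement)
    return earliest


if __name__ == "__main__":
    published = date(2008, 10, 24)
    print(earliest_disposal_date("general", published))
    print(earliest_disposal_date("clinical_trial", published))
```

  These dates are minimums only. A real DMP would also record who is responsible for the data at that point and how it is archived or destroyed, as listed in the template.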