Data catalogues and the data repository ADMIRe JISC MRD Dr Tom Parsons March 2013 Friday, November 06, 2015 ADMIRe.
A world-class university
• One of the world’s top 100 universities, Nottingham is recognised globally for ground-breaking research and teaching excellence.
• 40,000 students from more than 150 countries, two overseas campuses and strong links with universities around the world
• Heavily focused on research: Medical & Health Sciences, Sciences, Engineering, Social Sciences and Arts
• Large research income (£100m) – primarily RCUK, UK/EU government, commercial and charities

RDM policy
“1.5. The University will provide mechanisms and services for storage, backup, registration, deposit, retention and preservation of research data assets in support of current and future access, during and after completion of research projects.”
• Key priorities for ADMIRe:
– Is the current provision good enough?
– Where are the gaps?
– What do we need to provide?

Understanding requirements
• Approaches:
– Survey (summer 2012)
– Focus groups (November 2012)
– Interviews (May 2012 onwards)
• Mixture of ADMIRe, in-house, JISC MRD & Sero
• Outputs: service model, detailed requirements catalogue, logical models & prototype
• Institutional requirements: “Enterprise Architecture compliant”, use and integrate with existing systems

Survey results: Types of data

Survey results: Data storage

Survey results: Metadata…

Sharing data?

Survey results: Total research data estimates
• From the survey’s 366 responses
• 75 GB average (mean/frequency)

Total research data estimates
• 75 GB average x approx. number of PIs & post-grads (4000) = 300 TB (±90%)
• Large number of unknowns
• A large amount of data, a large number of files and a good case for managing it

Focus groups to understand more
• Five Faculty-based focus groups (30 people in total)
• Based upon California Digital Library model

Active data

Archive data

Preservation activities
No. | Function | Description | Req. | Freq
1 | Tag | Enter metadata describing a bag of research data assets | M | M
2 | Bag | Zip the data files up in a bag | C | M
3 | Transfer | Transfer a bag to archival storage | C | M
4 | Ingest | Ingest a bag into storage | C | M
5 | Update | Update (enhance, correct) metadata for a stored bag | O | L
6 | GetDOI | Get (public, private) DOIs for designated assets | C | L
7 | Publish | Publish assets appropriately on landing pages | C | L
8 | Relocate | Relocate assets and update locators | O | L
9 | Search | Search for assets by keyword or field | M | H
10 | Access | Access metadata and data according to permissions | M | M
11 | Notify | Notify actors automatically about data events | O | P
12 | Annotate | Create notes about a bag or its contents | O | L
13 | Check | Check (verify) that the contents of a bag are in order | M | P
14 | Report | Run reports on aspects of the system (DOI, bag, user) | O | L
15 | Administer | Administer permissions and system parameters | M | M
Actors: R, S, A (the eight “+” marks cannot be assigned to rows from this transcript)

Mapping requirements

Where are we now?

• Data Retention Platform
– Description: A storage platform that enables storage of “unstructured” data files.
– Scope: Storage of files and very basic (file type, size, retention period, user) BPM Metastorm frontend.
– Interfaces/Integrations: AD to support access. (Note that Open Access will be supported by providing a persistent account, used by the Research data web site server, that has read-only access to all “Open” data sets.)
– Direct users: Researchers
• Research data search and retrieve web site
– Description: Web site. Expected to be CMS or possibly SharePoint.
– Scope: Web site with relevant information and screens to enable data search and return results
– Interfaces/Integrations:
1. Data Retention Platform via REST to enable HTTP(S) transfer.
2. FAST (embedded function) to allow search from a web page.
3. Equella (API) to expose metadata onto search results.
4. Active Directory/LDAP to authenticate file access
– Direct users: Those searching for data sets
• Equella
– Description: Metadata database
– Scope: Stores metadata
– Interfaces/Integrations: See Metastorm, FAST and Research Web Site
– Direct users: N/A
• FAST
– Description: Search engine
– Scope: Provides search results and rich search functionality on the metadata
– Interfaces/Integrations:
1. Potential federation to Primo
2. Crawl of Equella
– Direct users: Anyone
• Baggit
– Description: File collection tool
– Scope: Tool to assist researchers in selecting and bringing files into a collection
– Interfaces/Integrations: Linked to from Metastorm
– Direct users: PI
• DMP Online
– Description: Online tool providing support for creating a Data Management Plan
– Scope: Used to create a Data Management Plan that is managed to ensure Research Council requirements are met
– Interfaces/Integrations:
1. Metastorm will link this within the curation workflow
2. Metastorm will take the XML output of this and read key fields directly to automate some metadata creation in Equella
3. Metastorm will save the output file of this tool
– Direct users: PI
• DOI
– Description: Online tool for creating a unique digital object identifier
– Scope: Workflow to fork out to this system to allow the researcher to create a persistent object identifier.
– Interfaces/Integrations: See Metastorm
– Direct users: PI
• Active File Services
– Description: File services primarily for storage of active (i.e. not curated) files
– Scope: The source of files for curation (“Bagging”). Selectable by browsing using the Baggit tool.
– Direct users: PI
• “Other Repository”
– Scope: Sometimes selectable by browsing using the Baggit tool as the source of files for curation (“Bagging”). However, these may be databases or alternative repositories that are used instead.
– Interfaces/Integrations: If used, and where possible, the DOI will point to these.

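The “Bag” and “Check” functions listed under preservation activities (zip the data files up in a bag; verify that the contents of a bag are in order) can be illustrated with a minimal Python sketch. It is not the Baggit tool used in the project. It is loosely modelled on the BagIt packaging layout (a data/ payload directory plus a checksum manifest), and the helper names make_bag and check_bag are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def make_bag(files: list[Path], bag_dir: Path) -> Path:
    """Copy files into bag_dir/data and write a checksum manifest.

    Hypothetical helper: assumes file names are unique and small enough
    to read into memory.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    lines = []
    for src in files:
        dest = data_dir / src.name
        shutil.copy2(src, dest)
        lines.append(f"{sha256_of(dest)}  data/{dest.name}")
    # Bag declaration file, as used by the BagIt layout.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def check_bag(bag_dir: Path) -> bool:
    """Return True if every manifest entry exists and matches its checksum."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue  # ignore blank lines, e.g. in an empty manifest
        digest, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or sha256_of(target) != digest:
            return False
    return True
```

A researcher-facing tool would call something like make_bag on the files selected from active storage before transfer, and an archive service could run check_bag periodically, which matches the “P” frequency given for Check.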
ADMIRe Phasing: Drop 1 (to June 2013)
Objective: Deliver Key Functions but without over integration
Deliverables:
1. Instructions and links on web site on how and why to use DMP Online
2. Instructions and links on web site on how and why to use DOI
3. Implementation (but not integration) of Baggit for Research users
4. Delivery of Metadata in Equella
– Including instructions and links on web site on how and why to use
5. Creation of Research Data Search Page
– Including instructions and links on web site on how and why to use
– Implementation of FAST search crawl
– Embed of FAST in web page
– Delivery of Results page to include relevant information
6. Metastorm development that:
– Creates User (PI Researcher) interface to Equella
– Provides fields to add all metadata into Equella, including Research Project Information, Subject Specific Information, Technical Metadata
– Allows Researcher to choose when a page is searchable

ADMIRe Phasing: Drop 2 (to Dec 2013)
Deliverables:
1. Delivery of Retention platform
• Delivered outside of ADMIRe project
2. Delivery of Open Access Platform
• (Subset of Retention platform)
3. Definition and Delivery of
• End to end workflow automation and integration for data management process with a vision of “Input Once”
• Integrations of Baggit, Agresso Awards Management, DMP Online, DOI
4. Definition and Delivery of a report for Research Councils that
• Confirms project adherence (at Project close) to funding requirements for data management and access
• Enables non-conformance to be addressed

Reusable outputs
• Focus groups/interview formats
• Requirements catalogue
• Use cases
• Survey – questions, write-up etc
• Software? No…

Questions?
[email protected]
ADMIRe Project Manager