Digital Libraries and e-Research: a UK perspective on a changing landscape. Dr Liz Lyon, Director UKOLN, University of Bath, UK eScience Forum, Berlin October 2005. UKOLN.
UKOLN is supported by: [logos not reproduced]
www.ukoln.ac.uk, a centre of expertise in digital information management
www.bath.ac.uk

Overview
1. e-Research & data-intensive science
2. Building e-infrastructure: repository services & adding value
   • Aggregation and linking: eBank UK
   • Integration and workflows
3. Looking to the longer term: digital curation and preservation

1. e-Research & data-intensive science

eScience: the data deluge
Data overload!
EPSRC National Crystallography Service
How do we disseminate?

Diversity of data collections
• Very large, relatively homogeneous: Large Hadron Collider (LHC) outputs from CERN
• Smaller, heterogeneous and richer collections: World Data Centre for Solar-terrestrial Physics, CCLRC
• Small-scale laboratory results: “jumping robots” project at the University of Bath
• Population survey data: UK Biobank
• Highly sensitive, personal data: patient care records

Taxonomy of data collections
• Research collections: jumping robots
• Community collections: Flybase at Indiana (with UC Berkeley)
• Reference collections: Protein Data Bank
Evolution…
Source: NSF Long-Lived Digital Data Collections, draft report revised May 2005

Experience of data-sharing
• Large scale data sharing in the life sciences: Draft Report, June 2005. Sponsored by UK research funding bodies MRC, BBSRC, NERC, JISC, Wellcome
• Outcomes & recommendations
  – Importance of standards and good quality metadata
  – Require a data management plan
  – Work needed on vocabularies & ontologies
  – Awareness of archiving & long term preservation
• Position of research funders and policy makers?
[Diagram: The scholarly knowledge cycle. Liz Lyon, Ariadne, July 2003.]
Elements shown in the diagram:
• Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
• Data analysis, transformation, mining, modelling
• Research & e-Science workflows; Learning & Teaching workflows
• Repositories: institutional, e-prints, subject, data, learning objects
• Aggregator services: national, commercial
• Presentation services: subject, media-specific, data, commercial portals
• Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
• Peer-reviewed publications: journals, conference proceedings
• Quality assurance bodies
• Flows: deposit / self-archiving; harvesting metadata; searching, harvesting, embedding; resource discovery, linking, embedding; learning object creation, re-use; validation; publication
© Liz Lyon (UKOLN, University of Bath), 2005. This work is licensed under a Creative Commons License Attribution-ShareAlike 2.0

Digital repositories: a UK view in 2005
• Institutional repository trends (D-Lib Magazine, Sept 2005)
  – Statistics: UK 31 (Germany 103, Sweden 25)
  – Policy: UK RCUK draft (Germany YES)
  – National programmes: UK YES (Germany, Australia, Sweden, Netherlands YES)
• Pioneering work: University of Southampton, SHERPA, ePrints UK, eBank UK…
• Southampton has a Self-Archiving Policy and a mandate rather than a recommendation
• JISC £4M Digital Repository Programme started
• 13 Oct: JISC announces extra £80M capital funding over 2 years, which includes further support for repositories

2. Building e-infrastructure: repository services & adding value

eBank UK Project
http://www.ukoln.ac.uk/projects/ebank-uk/
• Two key themes:
  – Open access to datasets
  – Linking research data to publications and to learning
• UKOLN, University of Southampton, University of Manchester
• e-Science application ‘CombeChem’: Grid-enabled combinatorial chemistry + National Crystallography Service
• Resource Discovery Network / PSIgate physical sciences portal

Data Flow in eBank UK
[Diagram: data files and metadata flow through the stages below.]
• Create / Submit: HTML deposition interface
• Store/link: institutional repository (eCrystals); deposit
• Index and Search: local archive search interface
• Harvest (XML) via OAI-PMH: eBank aggregator service
• Present (HTML): Service Provider interfaces, e.g. Subject Portal

CombeChem: An EPSRC pilot project
[Diagram labels: Simulation, Video, Diffractometer, Properties, Analysis, Structures Database, Properties e-Lab, X-Ray e-Lab, Grid Middleware]

Crystallography workflow
Stages run from RAW DATA through DERIVED DATA to RESULTS DATA:
• Initialisation: mount new sample, set up data collection
• Collection: collect data
• Processing: process and correct images
• Solution: solve structures
• Refinement: refine structure
• CIF: produce CIF (Crystallographic Information File)
• Validation: chemical & crystallographic checks
• Report: generate Crystal Structure Report

A data repository entry
[Screenshot]

Access to the underlying data: complex objects
ecrystals.chem.soton.ac.uk
[Screenshot]

Aggregating: search & discover
[Screenshot]

Linking data to publications
[Screenshot]

Embedding in a science portal for student learners
[Screenshot]

Ontologies for discovery in an inter-disciplinary world
• Transform the ‘list’ into an ‘ontology’
• Embed ontology into the deposition process
• Aggregators use keywords for linking with the broader literature
• Researchers use keyword ontology in search and discovery services

Persistent identifiers for data citation
• eBank use cases: depositor, author, service provider, reader, publisher, ?
• Schemes: DOI, Handle, ARK, PURL
• Global identification: express as http URIs
• Added value services: CrossRef, resolution service, integration (Globus), look-up service, ?
• Degree of trust or persistence
• Costs
• Future potential: political, ?
• Domain identifiers: International Chemical Identifier (InChI) codes

Publication & citation of scientific primary data project
• National Library for Science & Technology (TIB), University of Hanover, Germany
• STD-DOI Project http://www.std-doi.de
• DOI registry for datasets
• Data requirements: quality control, long-term curation, use DOI resolver
• Data publication agents: World Data Center Climate, GeoForschungsZentrum Potsdam
• Exemplar data citation:
  – Kamm, H; Machon, L; Donner, S (2004): Gas chromatography (KTB Field Lab), GFZ Potsdam. doi:10.1594/GFZ/ICDP/KTB/ktb-geoch-gaschr-p

Integration into crystallographic publishing practices
Publishers’ seal of approval

Integration into chemistry research workflows
• R4L Repository for the Laboratory Project (JISC-funded): automated data capture from instrumentation, registration of results
• SMART TEA: electronic Laboratory notebook + annotations
• Related sub-domains of chemistry: SPECTRa Project (JISC-funded)
• Research assessment (RAE) process?

Integration into the curriculum and e-Learning workflows
• MChem course
• Assess role in Undergraduate Chemical Informatics courses
• Pedagogic evaluation

3. Looking to the longer term: digital curation & preservation

Repositories and digital curation
• Data preservation: static; for later use?
• Data curation: dynamic; in use now (and the future)?
“maintaining and adding value to a trusted body of digital information for current and future use”
Assuring long-term access to the research record

UK Digital Curation Centre
http://www.dcc.ac.uk
• Universities of Edinburgh, Glasgow, CCLRC, UKOLN
• Funding from JISC and EPSRC for 3 years

UK Digital Curation Centre
• Research agenda
  – Annotation, provenance and archiving of scientific databases, socio-legal issues, organisational dynamics of trusted repositories
• Development activities
  – Audit & certification, representation information registry, wiki at http://dev.dcc.ac.uk
• Delivering services
  – Curation manual, 45 initial topics identified; Workshop Programme: Digital Curation & Preservation: Defining the Research Agenda for the next Decade, Nov in Warwick, UK
• Outreach Programme
  – 1st International DCC Conference presentations available; PV 2005, Nov in Edinburgh; International Journal of Digital Curation

Thank you. Questions?
More information: UKOLN http://www.ukoln.ac.uk/
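
Note on the persistent identifiers slide: the recommendation to "express as http URIs" can be illustrated with the exemplar data citation from the STD-DOI slide. The following is a minimal sketch in Python, not something shown in the talk. The resolver base URLs (doi.org and hdl.handle.net), the "doi:" / "hdl:" prefix convention, and the function name `to_http_uri` are assumptions made for illustration; ARK and PURL are left out because their resolvers depend on the issuing organisation.

```python
from urllib.parse import quote

# Resolver base URLs: assumptions of this sketch, not taken from the slides.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
}


def to_http_uri(identifier: str) -> str:
    """Express a 'scheme:value' persistent identifier as an http(s) URI."""
    scheme, sep, value = identifier.strip().partition(":")
    scheme = scheme.lower()
    if not sep or not value:
        raise ValueError(f"expected 'scheme:value', got {identifier!r}")
    if scheme not in RESOLVERS:
        raise ValueError(f"no resolver configured for scheme {scheme!r}")
    # Keep '/' readable and percent-encode characters unsafe in a URL path.
    return RESOLVERS[scheme] + quote(value, safe="/")


if __name__ == "__main__":
    # DOI taken from the exemplar data citation on the STD-DOI slide.
    print(to_http_uri("doi:10.1594/GFZ/ICDP/KTB/ktb-geoch-gaschr-p"))
```

Keeping the scheme-to-resolver mapping in one table reflects the slide's point about trust and persistence: if a resolution service changes, only the table entry needs updating, while the identifier recorded in the citation stays the same.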