Developing e-Infrastructure to support new research and learning paradigms
Dr Liz Lyon, Director, UKOLN, University of Bath, UK
Building the Info Grid, Copenhagen, September 2005
DEFF Seminar, Copenhagen, September 2005
UKOLN (www.ukoln.ac.uk): a centre of expertise in digital information management, University of Bath (www.bath.ac.uk)

Overview
1. e-Research: a changing landscape
2. Developing infrastructure: repository services & adding value
  • Aggregation and linking: eBank UK
  • Integration and workflows
3. Looking to the longer term: digital curation and preservation

1. e-Research: a changing landscape

e-Science: the data deluge
• Data overload! EPSRC National Crystallography Service
• How do we disseminate?

Diversity of data collections
• Very large, relatively homogeneous: Large Hadron Collider (LHC) outputs from CERN
• Smaller, heterogeneous and richer collections: World Data Centre for Solar-Terrestrial Physics, CCLRC
• Small-scale laboratory results: "jumping robots" project at the University of Bath
• Population survey data: UK Biobank
• Highly sensitive, personal data: patient care records

Taxonomy of data collections
• Research collections: jumping robots
• Community collections: FlyBase at Indiana (with UC Berkeley)
• Reference collections: Protein Data Bank
Evolution…
Source: NSF, Long-Lived Digital Data Collections, draft report revised May 2005

Experience of data-sharing
• Large scale data sharing in the life sciences, draft report, June 2005
• Sponsored by UK research funding bodies: MRC, BBSRC, NERC, JISC, Wellcome
• Outcomes & recommendations
  – Importance of standards and good-quality metadata
  – Require a data management plan
  – Work needed on vocabularies & ontologies
  – Awareness of archiving & long-term preservation
• Position of research funders and policy makers?

[Figure: The scholarly knowledge cycle (Liz Lyon, Ariadne, July 2003). Research & e-Science workflows (data creation, capture and gathering from laboratory experiments, Grids, fieldwork, surveys and media; data analysis, transformation, mining and modelling) and learning & teaching workflows (learning object creation and re-use) feed deposit and self-archiving into repositories (institutional, e-prints, subject, data, learning objects). Aggregator services (national, commercial) harvest metadata. Presentation services (subject, media-specific, data and commercial portals; institutional portals, Learning Management Systems, undergraduate and postgraduate courses and modules) support searching, resource discovery, linking and embedding. Validation by quality assurance bodies leads to publication in peer-reviewed journals and conference proceedings.]
© Liz Lyon (UKOLN, University of Bath), 2005. This work is licensed under a Creative Commons Attribution-ShareAlike 2.0 License.

A view in 2005
• Institutional repositories: country update, CNI/JISC/SURF
• D-Lib Magazine, September 2005
• Emerging trends
  – Germany 103, UK 31, Sweden 25
  – Policy: Germany YES, UK RCUK draft
  – National programmes: UK, Germany, Australia, Sweden, Netherlands YES
  – Services: indexing, search, harvesting

2. Developing infrastructure: repository services & adding value

Developing models
• The e-Framework for Education & Research
• JISC, UK and Department of Education, Science & Training, Australia
• www.e-framework.org
"The primary goal of the initiative is to produce an evolving and sustainable, open standards based service oriented technical framework to support the education and research communities."
• Reference models
• Service definitions

[Figure: JISC Information Environment architecture. Content providers (JISC-funded, institutional, external); brokers, aggregators, catalogues and indexes; presentation services (subject portals, media-specific portals, institutional portals, learning management systems); shared infrastructure (authentication/authorisation (Athens), service registries, metadata schema registries, identifier services, institutional profiling services, terminology services, OpenURL institutional link servers); end-user desktop/browser.]
© Andy Powell (UKOLN, University of Bath), 2005. This work is licensed under a Creative Commons Attribution-ShareAlike 2.0 License.

[Figure: The scholarly knowledge cycle again, with eBank UK in the role of aggregator service between repositories and presentation services.]
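The JISC Information Environment architecture includes OpenURL link servers. These let a portal pass a citation to the user's own institutional resolver. What follows is a minimal sketch of building such a link. It assumes the OpenURL 0.1 key names (sid, genre, atitle, title, volume, spage, date). The link-server address, source id and citation values are hypothetical. The later NISO standard, OpenURL 1.0 (Z39.88-2004), uses differently named keys (e.g. rft.atitle), so a real link must match what the resolver supports.

```python
from urllib.parse import urlencode

# Hypothetical institutional link-server address (not taken from the slides).
LINK_SERVER = "http://linkserver.example.ac.uk/resolve"


def openurl_01(base, sid, **citation):
    """Build an OpenURL 0.1-style link: base URL + '?' + key=value pairs.

    `sid` names the referring service ("VendorID:DatabaseID"); the keyword
    arguments are OpenURL 0.1 citation keys such as genre, atitle, title,
    volume, spage and date. urlencode percent-encodes every value.
    """
    return base + "?" + urlencode({"sid": sid, **citation})


# Hypothetical citation values, for illustration only.
url = openurl_01(
    LINK_SERVER,
    sid="ebank:portal",
    genre="article",
    atitle="Example article",
    title="Example Journal",
    volume="61",
    spage="1",
    date="2005",
)
print(url)
```

The portal only needs to know the user's resolver address. Deciding which full-text or data copy the user may access stays with the institution's link server.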
eBank UK Project
• Two key themes:
  – Open access to datasets
  – Linking research data to publications and to learning
• JISC-funded from September 2003: now in Phase 2
• UKOLN at the University of Bath (lead), University of Southampton, University of Manchester
• Exemplar: e-Science testbed 'CombeChem'
  – Grid-enabled combinatorial chemistry / crystallography
  – National Crystallography Service
• Resource Discovery Network / PSIgate physical sciences portal
• http://www.ukoln.ac.uk/projects/ebank-uk/

The "hybrid" project team
• UKOLN: Michael Day, Monica Duke, Rachel Heery, Traugott Koch, Liz Lyon + Andy Powell
• Southampton: Les Carr, Simon Coles, Jeremy Frey, Chris Gutteridge, Mike Hursthouse, Andrew Milstead
• Manchester: John Blunden-Ellis

[Figure: Data flow in eBank UK. Create → submit via HTML deposition interface → deposit and store/link data files and metadata in the institutional repository (eCrystals), which also offers a local archive search interface (HTML) → harvest (XML, OAI-PMH) → eBank aggregator service: index and search → present (HTML) through service provider interfaces, e.g. a subject portal.]

CombeChem: an EPSRC pilot project
[Figure: CombeChem components linked by Grid middleware: X-ray e-Lab (diffractometer, video), properties e-Lab, simulation, analysis, structures database, properties.]

Crystallography workflow
Data types: raw data, derived data, results data
• Initialisation: mount new sample, set up data collection
• Collection: collect data
• Processing: process and correct images
• Solution: solve structure
• Refinement: refine structure
• CIF: produce CIF (Crystallographic Information File)
• Validation: chemical & crystallographic checks
• Report: generate Crystal Structure Report

A data repository entry
Access to the underlying data: complex objects (ecrystals.chem.soton.ac.uk)
Harvesting: OAIster
Aggregating: search & discover
Linking data to publications
eBank embedded in a science portal

Ontologies for discovery in an interdisciplinary world
• Transform the 'list' into an 'ontology'
• Embed the ontology into the deposition process
• Publish keywords in OAI
• Aggregators use keywords for linking with the broader literature
• Researchers use the keyword ontology in search and discovery services

Persistent identifiers for data citation
• eBank use cases: depositor, author, service provider, reader, publisher, ?
• Schemes: DOI, Handle, ARK, PURL
• Global identification: express as http URIs
• Added-value services: CrossRef, resolution service, integration (Globus), look-up service, ?
• Degree of trust or persistence
• Costs
• Future potential: political, ?
• Domain identifiers: International Chemical Identifier (InChI) codes

Publication & citation of scientific primary data project
• National Library for Science & Technology (TIB), University of Hanover, Germany
• STD-DOI Project: http://www.std-doi.de
• DOI registry for datasets
• Data requirements: quality control, long-term curation, use of the DOI resolver
• Data publication agents: World Data Center Climate, GeoForschungsZentrum Potsdam
• Exemplar data citation:
  – Kamm, H; Machon, L; Donner, S (2004): Gas chromatography (KTB Field Lab), GFZ Potsdam. doi:10.1594/GFZ/ICDP/KTB/ktb-geoch-gaschr-p

Integration into crystallographic publishing practices
• Publisher's seal of approval

Integration into chemistry research workflows
• R4L (Repository for the Laboratory) project (JISC-funded): automated data capture from instrumentation, registration of results
• SMART TEA: electronic laboratory notebook + annotations
• Related sub-domains of chemistry: SPECTRa project (JISC-funded)
• Research assessment (RAE) process?

Integration into the curriculum and e-Learning workflows
• MChem course
• Assess role in undergraduate chemical informatics courses
• Pedagogic evaluation
• Introducing school children to e-Research?

Knowledge extraction & “post-processing”
New information & knowledge…
• Mining (data, text, structures)
• Modelling (economic, climate, mathematical, bio)
• Analysis (statistical, lexical, pattern matching, gene)
• Presentation (visualisation, rendering)
• In federated repositories: digital libraries, datasets, learning materials
• Role of Google?

3. Looking to the longer term: digital curation & preservation

Repositories and digital curation
[Figure: Repositories and digital curation. "For later use?" (static): data preservation. "In use now (and the future)?" (dynamic): data curation, "maintaining and adding value to a trusted body of digital information for current and future use".]

Assuring long-term access to the research record
• Trusted digital repositories
  – Audit Checklist for Certification
  – Draft report, Research Libraries Group, August 2005
  – RLG-NARA Taskforce
  – Defined criteria under 4 categories:
      • Organisation
      • Functions, processes & procedures
      • Designated community & usability
      • Technologies & technical infrastructure
• UK Digital Curation Centre: http://www.dcc.ac.uk
  – 1st International DCC Conference, 29-30 September 2005, Bath, UK

Thank you. Questions?
More information: UKOLN, http://www.ukoln.ac.uk/
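The "Data flow in eBank UK" and "Harvesting: OAIster" slides describe aggregators harvesting repository metadata as XML over OAI-PMH. What follows is a minimal sketch of one harvesting step, assuming the standard OAI-PMH 2.0 ListRecords verb and unqualified Dublin Core (oai_dc). The repository address and the sample record are hypothetical, not eCrystals data. Parsing works on a hand-written response, so the sketch runs offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and the oai_dc / Dublin Core schemas.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumptionToken request carries only verb and token."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        fields = {}
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:  # deleted records carry no metadata element
            for child in dc:
                name = child.tag.rsplit("}", 1)[-1]  # '{ns}title' -> 'title'
                fields.setdefault(name, []).append((child.text or "").strip())
        records.append({"identifier": identifier, "dc": fields})
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token or None  # an empty token marks the last page


# Hand-written sample response page (hypothetical record).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example crystal structure</dc:title>
          <dc:subject>crystallography</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repository.example.org/oai"))
records, token = parse_list_records(SAMPLE)
print(records, token)
```

A real harvester would fetch each page over HTTP. It would keep calling `list_records_url(base, resumption_token=token)` until the token comes back empty.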
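The crystallography workflow slide ends with producing a CIF (Crystallographic Information File), which is what a repository such as eCrystals stores alongside the raw data. CIF data items are `_tag value` pairs inside a `data_` block. Numbers may carry a bracketed standard uncertainty in units of the last digit, e.g. `10.000(2)` means 10.000 ± 0.002. The following is a deliberately partial reader written for illustration, not a full CIF parser: it skips `loop_` tables, semicolon-delimited text fields and values placed on the line after their tag. The sample values are hypothetical.

```python
def parse_cif_items(text):
    """Return (data block name, {tag: value}) for single-line CIF data items.

    Sketch only: loop_ tables, semicolon text fields and values on the line
    after their tag are skipped rather than parsed.
    """
    block = None
    items = {}
    in_text_field = False
    for raw in text.splitlines():
        if raw.startswith(";"):  # text fields open and close with ';' in column 1
            in_text_field = not in_text_field
            continue
        line = raw.strip()
        if in_text_field or not line or line.startswith("#"):
            continue
        if line.startswith("data_"):
            block = line[len("data_"):]
        elif line.startswith("_"):
            parts = line.split(None, 1)
            if len(parts) == 2:  # bare tags (e.g. loop_ headers) have no value here
                items[parts[0]] = parts[1].strip().strip("'\"")
    return block, items


def split_su(value):
    """Split '10.000(2)' into (10.0, 0.002): bracketed digits are the standard
    uncertainty in units of the last decimal place."""
    if "(" not in value:
        return float(value), None
    number, su = value.rstrip(")").split("(", 1)
    decimals = len(number.split(".", 1)[1]) if "." in number else 0
    return float(number), int(su) * 10 ** -decimals


SAMPLE_CIF = """data_example
# hypothetical values for illustration
_chemical_formula_sum 'C6 H6'
_cell_length_a 10.000(2)
_symmetry_space_group_name_H-M 'P 21/c'
_publ_section_comment
;
Free text spanning
several lines.
;
loop_
_atom_site_label
_atom_site_fract_x
C1 0.1234(5)
"""

block, items = parse_cif_items(SAMPLE_CIF)
print(block, items)
print(split_su(items["_cell_length_a"]))
```

Extracted items like these are the kind of values a deposition interface could copy into the repository's descriptive metadata. A production system would use a complete CIF parser instead of this line-based reader.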
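The "Ontologies for discovery" slide proposes turning a flat keyword list into an ontology, so that searches can reach related terms. What follows is a minimal sketch of one such use: query expansion over narrower terms, where a search for a broad term also matches records tagged with its narrower terms. The terms, the `NARROWER` mapping and the records are hypothetical, not the eBank UK vocabulary.

```python
# Hypothetical keyword ontology: each term maps to its narrower terms.
# (Illustrative terms only; not the eBank UK keyword list.)
NARROWER = {
    "chemistry": ["crystallography", "spectroscopy"],
    "crystallography": ["x-ray diffraction", "crystal structure"],
    "spectroscopy": ["nmr spectroscopy"],
}


def expand(term, narrower=NARROWER):
    """Return `term` plus every transitively narrower term, depth-first."""
    seen = []
    stack = [term.lower()]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in a badly formed ontology
        seen.append(current)
        stack.extend(reversed(narrower.get(current, [])))
    return seen


def search(records, term):
    """Return ids of records whose keywords match the term or any narrower term."""
    wanted = set(expand(term))
    return [r["id"] for r in records if wanted & {k.lower() for k in r["keywords"]}]


# Hypothetical harvested records with depositor-supplied keywords.
RECORDS = [
    {"id": "rec-1", "keywords": ["X-ray diffraction"]},
    {"id": "rec-2", "keywords": ["genomics"]},
    {"id": "rec-3", "keywords": ["NMR spectroscopy"]},
]

print(expand("chemistry"))
print(search(RECORDS, "chemistry"))  # matches via narrower terms
print(search(RECORDS, "crystallography"))
```

Embedding the same vocabulary at deposition time keeps depositor keywords inside the ontology. That is what makes this kind of expansion reliable for an aggregator.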
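The "Persistent identifiers for data citation" slide recommends expressing identifiers from the DOI, Handle, ARK and PURL schemes as http URIs. What follows is a minimal sketch of that mapping. It assumes the public resolvers doi.org (written http://dx.doi.org/ in 2005), hdl.handle.net and n2t.net for ARKs. The `doi:` and `hdl:` prefixes are common conventions. The DOI in the test is the STD-DOI exemplar; the Handle and ARK values are hypothetical.

```python
from urllib.parse import quote

# Public HTTP resolvers for each identifier scheme.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "ark": "https://n2t.net/ark:",
}


def to_http_uri(identifier):
    """Express a 'scheme:value' persistent identifier as an HTTP URI."""
    scheme, sep, value = identifier.partition(":")
    scheme = scheme.lower()
    if scheme in ("http", "https"):
        return identifier  # PURLs and similar are already HTTP URIs
    if not sep or scheme not in RESOLVERS:
        raise ValueError(f"unsupported identifier: {identifier!r}")
    # Percent-encode characters such as '#' or '?' that would break the URI path.
    return RESOLVERS[scheme] + quote(value, safe="/")


print(to_http_uri("doi:10.1594/GFZ/ICDP/KTB/ktb-geoch-gaschr-p"))
print(to_http_uri("hdl:1234/5678"))  # hypothetical Handle
```

Resolving through a central proxy keeps the citation stable even if the data moves. The trade-off is the slide's "degree of trust or persistence": the URI is only as durable as the resolver behind it.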
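The same slide lists InChI codes as domain identifiers for chemistry. An InChI is a single string: an `InChI=` prefix, a version, the molecular formula, then further layers separated by `/`, each led by a letter such as `c` (connections) or `h` (hydrogens). What follows is a minimal sketch that splits a string into those parts. It is an illustration, not a validator. Repeated prefix letters, which can occur in non-standard InChIs with extra layers, are not distinguished, so later layers overwrite earlier ones.

```python
def parse_inchi(inchi):
    """Split an InChI string into version, formula and prefixed layers.

    Sketch only: repeated layer prefixes are not distinguished (later
    layers overwrite earlier ones with the same letter).
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len(prefix):].split("/")
    layers = {layer[0]: layer[1:] for layer in rest if layer}
    return {"version": version, "formula": formula, "layers": layers}


print(parse_inchi("InChI=1S/CH4/h1H4"))  # standard InChI for methane
```

The "S" in "1S" marks a standard InChI. Because the string is derived from the structure itself, two repositories holding the same compound should produce the same InChI. That makes it a natural key for linking data across collections.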
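The STD-DOI slide gives one exemplar data citation: authors (year), title, data publication agent, DOI. What follows is a minimal sketch of generating that layout from structured fields. The field names in `DataCitation` are hypothetical, and the layout is inferred from this single exemplar, not from a formal citation specification.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """A dataset citation laid out like the STD-DOI exemplar."""
    authors: list[str]  # "Surname, Initial" strings, in citation order
    year: int
    title: str
    publisher: str      # the data publication agent, e.g. GFZ Potsdam
    doi: str            # DOI without the "doi:" prefix

    def __str__(self):
        return (f"{'; '.join(self.authors)} ({self.year}): "
                f"{self.title}, {self.publisher}. doi:{self.doi}")


ktb = DataCitation(
    authors=["Kamm, H", "Machon, L", "Donner, S"],
    year=2004,
    title="Gas chromatography (KTB Field Lab)",
    publisher="GFZ Potsdam",
    doi="10.1594/GFZ/ICDP/KTB/ktb-geoch-gaschr-p",
)
print(ktb)
```

Keeping the citation as structured fields, rather than a preformatted string, lets a repository render the same record in other citation styles. It could also export the fields as metadata.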