Digital Libraries and e-Research: a UK perspective on a changing landscape
Dr Liz Lyon, Director, UKOLN, University of Bath, UK
eScience Forum, Berlin, October 2005



UKOLN (www.ukoln.ac.uk): a centre of expertise in digital information management
University of Bath (www.bath.ac.uk)
Overview
1. e-Research & data-intensive science
2. Building e-infrastructure: repository services & adding value
   • Aggregation and linking: eBank UK
   • Integration and workflows
3. Looking to the longer term: digital curation and preservation
1. e-Research & data-intensive science
eScience: the data deluge
Data overload! (EPSRC National Crystallography Service)
How do we disseminate?
Diversity of data collections
• Very large, relatively homogeneous: Large Hadron Collider (LHC) outputs from CERN
• Smaller, heterogeneous and richer collections: World Data Centre for Solar-Terrestrial Physics (CCLRC)
• Small-scale laboratory results: “jumping robots” project at the University of Bath
• Population survey data: UK Biobank
• Highly sensitive, personal data: patient care records
Taxonomy of data collections
• Research collections: jumping robots
• Community collections: FlyBase at Indiana (with UC Berkeley)
• Reference collections: Protein Data Bank
Evolution……
Source: NSF Long-Lived Digital Data Collections, draft report revised May 2005
Experience of data-sharing
• Large scale data sharing in the life sciences: Draft Report, June 2005
  Sponsored by UK research funding bodies: MRC, BBSRC, NERC, JISC, Wellcome
• Outcomes & recommendations
– Importance of standards and good-quality metadata
– Require a data management plan
– Work needed on vocabularies & ontologies
– Awareness of archiving & long-term preservation
• Position of research funders and policy makers?
The scholarly knowledge cycle (diagram). Liz Lyon, Ariadne, July 2003.
© Liz Lyon (UKOLN, University of Bath), 2005. This work is licensed under a Creative Commons License, Attribution-ShareAlike 2.0.
Diagram components:
• Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
• Data analysis, transformation, mining, modelling
• Research & e-Science workflows; Learning & Teaching workflows
• Deposit / self-archiving
• Repositories: institutional, e-prints, subject, data, learning objects
• Harvesting metadata; searching, harvesting, embedding
• Aggregator services: national, commercial
• Presentation services: subject, media-specific, data, commercial portals
• Institutional presentation services: portals, Learning Management Systems, u/g and p/g courses, modules
• Learning object creation, re-use
• Peer-reviewed publications: journals, conference proceedings
• Validation: quality assurance bodies
• Resource discovery, linking, embedding
Digital repositories: a UK view in 2005
• Institutional repository trends (D-Lib Magazine, Sept 2005)
– Statistics: UK 31 (Germany 103, Sweden 25)
– Policy: UK RCUK draft (Germany YES)
– National programmes: UK YES (Germany, Australia, Sweden, Netherlands YES)
• Pioneering work: University of Southampton, SHERPA, ePrints UK, eBank UK……
• Southampton has a self-archiving policy that is a mandate rather than a recommendation
• JISC £4M Digital Repository Programme started
• 13 Oct: JISC announces extra £80M capital funding over 2 years, which includes further support for repositories
2. Building e-infrastructure: repository services & adding value
eBank UK Project
http://www.ukoln.ac.uk/projects/ebank-uk/
• Two key themes:
– Open access to datasets
– Linking research data to publications and to learning
• UKOLN, University of Southampton, University of Manchester
• e-Science application CombeChem: Grid-enabled combinatorial chemistry, plus the National Crystallography Service
• Resource Discovery Network / PSIgate physical sciences portal
Data Flow in eBank UK (diagram)
• Create data files and metadata; deposit and submit them through an HTML deposition interface
• Store/link in the institutional repository (eCrystals), which also offers a local archive search interface (HTML)
• Harvest (XML, via OAI-PMH) into the eBank aggregator service, which indexes the records for search
• Present (HTML) through service provider interfaces, e.g. a subject portal
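The harvest step in this flow uses OAI-PMH: an aggregator sends HTTP GET requests such as `verb=ListRecords` to a repository and pages through the XML responses using a `resumptionToken`. The Python sketch below illustrates that pattern under stated assumptions: the base URL is a placeholder rather than the real eCrystals endpoint, only simple Dublin Core (`oai_dc`) titles are extracted, and the helper names `list_records_url` and `parse_list_records` are hypothetical, not part of eBank UK's software. Network access is left out so the parser can be tried on a saved response.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request (hypothetical helper).

    In OAI-PMH a resumptionToken is an exclusive argument, so follow-up
    requests carry only the verb and the token.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], next_token) from one ListRecords response.

    next_token is None when the list is complete (no token, or an empty one).
    """
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI + "record"):
        identifier = record.find(OAI + "header").findtext(OAI + "identifier")
        title = record.findtext(".//" + DC + "title")  # None for deleted records
        records.append((identifier, title))
    token_el = root.find(".//" + OAI + "resumptionToken")
    next_token = token_el.text if token_el is not None and token_el.text else None
    return records, next_token
```

A harvester would build the first URL, fetch it (for example with `urllib.request.urlopen`), parse the response, and repeat with the returned token until it comes back as `None`.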
CombeChem: An EPSRC pilot project
Diagram components: Simulation; Video; Diffractometer; Properties; Analysis; Structures Database; Properties e-Lab; X-Ray e-Lab; Grid Middleware
Crystallography workflow
Data stages: raw data, derived data, results data
• Initialisation: mount new sample, set up data collection
• Collection: collect data
• Processing: process and correct images
• Solution: solve structures
• Refinement: refine structure
• CIF: produce CIF (Crystallographic Information File)
• Validation: chemical & crystallographic checks
• Report: generate Crystal Structure Report
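The eight crystallography stages form a strictly ordered pipeline, and that ordering is what allows raw, derived and results data for one sample to be linked in a repository entry. As a hedged illustration only (not the CombeChem or eBank UK implementation), the Python sketch below encodes the stage order and checks a hypothetical per-sample stage log; the `Stage` enum and `check_stage_log` function are illustrative names, and the idea of validating the log at deposit time is an assumption.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Crystallography workflow stages, numbered in the order listed on the slide."""
    INITIALISATION = 1  # mount new sample, set up data collection
    COLLECTION = 2      # collect data
    PROCESSING = 3      # process and correct images
    SOLUTION = 4        # solve structures
    REFINEMENT = 5      # refine structure
    CIF = 6             # produce Crystallographic Information File
    VALIDATION = 7      # chemical & crystallographic checks
    REPORT = 8          # generate Crystal Structure Report


def check_stage_log(logged):
    """Check a hypothetical per-sample stage log and return the stages not yet logged.

    Raises ValueError if a stage is logged out of order or more than once.
    """
    for earlier, later in zip(logged, logged[1:]):
        if later <= earlier:
            raise ValueError(f"{later.name} logged after {earlier.name}")
    return [stage for stage in Stage if stage not in logged]
```

For a sample that has only been collected and processed, `check_stage_log([Stage.INITIALISATION, Stage.COLLECTION, Stage.PROCESSING])` lists Solution through Report as still outstanding.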
A data repository entry
Access to the underlying data: complex objects
ecrystals.chem.soton.ac.uk
Aggregating: search & discover
Linking data to publications
Embedding in a science portal for student learners
Ontologies for discovery in an inter-disciplinary world
• Transform the ‘list’ into an ‘ontology’
• Embed the ontology into the deposition process
• Aggregators use keywords for linking with the broader literature
• Researchers use the keyword ontology in search and discovery services
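The practical gain from turning a flat keyword list into an ontology is that a search for a broad term can also find records tagged with narrower terms. The Python sketch below shows this query expansion over a tiny, entirely hypothetical chemistry hierarchy; the terms, record IDs and the `expand_term` and `search_records` names are illustrative assumptions, not eBank UK's actual keyword ontology.

```python
# Hypothetical keyword ontology: each term maps to its narrower terms.
NARROWER = {
    "organic compound": ["carboxylic acid"],
    "carboxylic acid": ["amino acid"],
    "inorganic compound": ["metal oxide"],
}


def expand_term(term):
    """Return the term followed by every transitively narrower term."""
    result, seen, stack = [], set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in a hand-built ontology
        seen.add(current)
        result.append(current)
        stack.extend(NARROWER.get(current, []))
    return result


def search_records(records, term):
    """Return IDs of records tagged with the term or any narrower term."""
    wanted = set(expand_term(term))
    return [rid for rid, keywords in records.items() if wanted & set(keywords)]
```

With `records = {"rec1": ["amino acid"], "rec2": ["metal oxide"], "rec3": ["carboxylic acid"]}`, a search for "organic compound" returns `["rec1", "rec3"]`, although neither record was tagged with that exact term.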
Persistent identifiers for data citation
• eBank use cases: depositor, author, service provider, reader, publisher, ?
• Schemes: DOI, Handle, ARK, PURL
• Global identification: express as http URIs
• Added value services: CrossRef, resolution service, integration (Globus), look-up service, ?
• Degree of trust or persistence
• Costs
• Future potential: political, ?
• Domain identifiers: International Chemical Identifier (InChI) codes
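The point about expressing identifiers as http URIs can be made concrete: each scheme has a web resolver that turns a bare identifier into a link. The Python sketch below assumes particular resolver hosts (doi.org and hdl.handle.net for DOIs and Handles, n2t.net for ARKs); these are real public services today, but they are choices made for this illustration rather than part of eBank UK's design, and the `to_http_uri` function is hypothetical.

```python
from urllib.parse import quote

# Assumed resolver prefixes: the public DOI and Handle proxies, and one
# widely used ARK resolver. PURLs are already http(s) URIs.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "ark": "https://n2t.net/",
}


def to_http_uri(identifier):
    """Express a 'scheme:value' identifier as an actionable http(s) URI."""
    if identifier.startswith(("http://", "https://")):
        return identifier  # e.g. a PURL, which is already a web URI
    scheme, _, value = identifier.partition(":")
    scheme = scheme.lower()
    if scheme not in RESOLVERS:
        raise ValueError(f"no resolver known for scheme {scheme!r}")
    if scheme == "ark":
        # ARK resolvers expect the 'ark:' label to stay in the path.
        return RESOLVERS["ark"] + "ark:" + value
    # Keep '/' (the prefix/suffix separator) but percent-encode characters
    # such as spaces, '#' or '?' that would otherwise break the URI.
    return RESOLVERS[scheme] + quote(value, safe="/")
```

For example, the dataset DOI `doi:10.1594/GFZ/ICDP/KTB/ktb-geoch-gaschr-p` becomes `https://doi.org/10.1594/GFZ/ICDP/KTB/ktb-geoch-gaschr-p`. Keeping the resolver table separate from the identifiers reflects the "degree of trust or persistence" question: if a resolver host changes, only the table needs updating.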
Publication & citation of scientific primary data project
• National Library for Science & Technology (TIB), University of Hanover, Germany
• STD-DOI Project http://www.std-doi.de
• DOI registry for datasets
• Data requirements: quality control, long-term curation, use of the DOI resolver
• Data publication agents: World Data Center Climate, GeoForschungsZentrum Potsdam
• Exemplar data citation:
– Kamm, H; Machon, L; Donner, S (2004): Gas chromatography (KTB Field Lab), GFZ Potsdam. doi:10.1594/GFZ/ICDP/KTB/ktb-geoch-gaschr-p
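The exemplar citation follows a regular pattern (authors, year, title, publisher, DOI), so a data publication agent could generate it from structured metadata rather than typing it by hand. The Python sketch below is one way to do that; the `DatasetCitation` class and its field names are assumptions for illustration, not the STD-DOI project's metadata schema.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """Hypothetical metadata record for a citable dataset."""
    authors: list[str]  # "Surname, Initial" strings, in author order
    year: int
    title: str
    publisher: str      # the data publication agent, e.g. GFZ Potsdam
    doi: str            # bare DOI, without the "doi:" prefix

    def __str__(self) -> str:
        # Pattern of the exemplar: Authors (Year): Title, Publisher. doi:DOI
        return (f"{'; '.join(self.authors)} ({self.year}): "
                f"{self.title}, {self.publisher}. doi:{self.doi}")
```

Filling the fields from the exemplar above reproduces the slide's citation string exactly, and the bare `doi` field can also be passed to a resolver to produce a web link.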
Integration into crystallographic publishing practices
Publisher's seal of approval
Integration into chemistry research workflows
• R4L (Repository for the Laboratory) Project (JISC-funded): automated data capture from instrumentation, registration of results
• SMART TEA electronic laboratory notebook + annotations
• Related sub-domains of chemistry: SPECTRa Project (JISC-funded)
• Research assessment (RAE) process?
Integration into the curriculum and e-Learning workflows
• MChem course
• Assess role in undergraduate Chemical Informatics courses
• Pedagogic evaluation
3. Looking to the longer term: digital curation & preservation
Repositories and digital curation
• Data preservation: static, for later use?
• Data curation: dynamic, in use now (and the future)?
“maintaining and adding value to a trusted body of digital information for current and future use”
Assuring long-term access to the research record
UK Digital Curation Centre
http://www.dcc.ac.uk
• Partners: University of Edinburgh, University of Glasgow, CCLRC, UKOLN
• Funding from JISC and EPSRC for 3 years
UK Digital Curation Centre
• Research agenda
– Annotation, provenance and archiving of scientific databases; socio-legal issues; organisational dynamics of trusted repositories
• Development activities
– Audit & certification, representation information registry, wiki at http://dev.dcc.ac.uk
• Delivering services
– Curation manual (45 initial topics identified); Workshop Programme, including Digital Curation & Preservation: Defining the Research Agenda for the Next Decade, November, Warwick, UK
• Outreach Programme
– 1st International DCC Conference presentations available; PV 2005, November, Edinburgh; International Journal of Digital Curation
Thank you.
Questions?…..
More information: UKOLN http://www.ukoln.ac.uk/