eCrystals Federation: Open Repositories for Open Science
Dr Liz Lyon, UKOLN, University of Bath, UK
Dr Simon Coles, University of Southampton, UK
Dr Manjula Patel, UKOLN, University of Bath, UK
CNI Taskforce Meeting, Washington DC, December 2007
This work is licensed under a
Creative Commons Licence
Attribution-ShareAlike 3.0
http://creativecommons.org/licenses/by-sa/3.0/
Federation
Overview
1. Chemistry and Open Science: context and practice
2. Lessons learnt from eBank Phase 3
3. Data curation and preservation issues
4. Setting up the Federation: challenges ahead?
Chemistry and Open Science: context and practice
Social networks for chemists…
New postgraduate cohorts: millennials / Google generation: new behaviours
>8000 views
Community content for chemists: rich media
video + paper = Pubcast
At the coalface: tagging & sharing workflows
Astronomy, Bioinformatics, Chemistry, Social Science pilots. Universities of Manchester & Southampton
“Small science”: sharing in the lab
OpenWetWare
Laboratory wikis
Transforming practice?
Open Notebook Science (ONS)
2006
• 26 September: first use of the term, blogged by Jean-Claude Bradley, Drexel University
2007
• 27 March: ONS at American Chemical Society Symposium
• 7 August: ONS poster in Second Life on Nature island
• 24 September: ONS case studies in Second Life
• 4 October: > 43,000 hits in Google for the term ONS
• 10 & 15 October: Policy lists, DabbleDB membership database created, US
• 11 October: ONS experiment starts in Cambridge, UK
• 7 November: Cameron Neylon (Univ. Southampton / STFC, UK) posts “Sourceforge for Science” concept
• 10 November: Open Data for common molecules, Wikichemicals? Peter Murray-Rust’s blog at Univ. Cambridge, UK
• 27 November: Research Network proposal submitted to UK research council
• Yesterday: about 2,400,000 Google hits for Open Notebook Science
New ideas are surfacing very fast with instant development, testing and take-up…
eBank Project – building the eCrystals Data Repository
eBank-UK Phase 1 2003
Institutional Repository exemplar
http://ecrystals.chem.soton.ac.uk
Metadata Publication
• Using simple Dublin Core
– Crystal structure
– Title (Systematic IUPAC Name)
– Authors
– Affiliation
– Creation Date
• Additional chemical information through Qualified Dublin Core
– Empirical formula
– International Chemical Identifier (InChI)
– Compound Class & Keywords
• Specifies which ‘datasets’ are present in an entry
• DOI http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
• Rights & Citation http://ecrystals.chem.soton.ac.uk/rights.html
• Application Profile http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
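The metadata fields listed above could be assembled into a record roughly as follows. This is a minimal sketch, not the eBank application profile itself: the `dc` namespace is the standard Dublin Core element set, but the `ebank` namespace URI, the element names `empiricalFormula` and `inchi`, the function `build_record`, and every field value (a benzene placeholder) are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core element set namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
# Hypothetical namespace standing in for the chemistry-specific
# (qualified) terms of an application profile.
EBANK_NS = "http://example.org/ebank-terms/"

ET.register_namespace("dc", DC_NS)
ET.register_namespace("ebank", EBANK_NS)


def build_record(dc_fields, chem_fields):
    """Build a flat XML record from simple DC fields plus chemistry extras."""
    root = ET.Element("record")
    # Simple Dublin Core: each element may repeat (e.g. several creators).
    for name, values in dc_fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    # Additional chemical information, one value per (hypothetical) term.
    for name, value in chem_fields.items():
        ET.SubElement(root, f"{{{EBANK_NS}}}{name}").text = value
    return root


# Placeholder values only; they do not describe a real eCrystals entry.
record = build_record(
    {
        "title": ["benzene"],
        "creator": ["A. Crystallographer", "B. Chemist"],
        "date": ["2007-12-01"],
        "type": ["CrystalStructure"],
        "identifier": ["doi:10.0000/example.1"],
    },
    {
        "empiricalFormula": "C6 H6",
        "inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
    },
)

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the generic fields in plain Dublin Core means a harvester that knows nothing about chemistry can still index title, authors and date, while chemistry-aware services can read the extra terms.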
[Diagram labels: wikis, blogs, Publish, Harvest]
Lessons learnt from eBank Phase 3
Study Aims and Approach
• Scoping the eCrystals Federation of crystallography data repositories
• Questionnaire and interview-based
• Joint Consultation Workshop (eBank, R4L, SPECTRa) & Report
• Engage whole data lifecycle community: crystallographers, central facilities, publishers, data centres, and chemical information specialists
• Mixed project team: Chemists, Digital Library researchers & Computer Scientists
Lessons: Policy and practice
• Must be considered at level of the Institution and the practising Laboratory
• Mixed lab practice: central service facility versus single “staff crystallographer” in department
• “Repository Lite” for smaller lab operations?
• Established data ‘publication’ practice + domain subject repository: Cambridge Crystallographic Data Centre (CCDC)
• Institutional policy buy-in is essential
• Demonstrate benefits and added value to senior managers
• Implications for information services structure
Interoperability & Standards
• Instrument manufacturers’ proprietary formats
• Technical software platform
• Metadata schema: Application profiles
• Standards and identifiers: International Chemical Identifier (InChI), DOI, CIF, CML, de facto software
X-ray diffractometers
• Semantic interoperability
Subject Repositories, Publishing and IPR
• Established subject repository at CCDC (40 years old!): repository interactions?
• The “embargo problem”: prior dissemination affecting publication of journal article
• Cultural issues related to chemists: “it’s my data” (journal article will always be sacred)
• Mechanisms for sharing with collaborators and referees prior to publication?
Advocacy
• The most important issue?!?
Data curation and preservation issues
Digital Curation Centre
http://www.dcc.ac.uk/
• Community Development work
• Led by UKOLN
• eBank/eCrystals partner
eBank-UK Phase 3 Curation & Preservation Study
http://www.ukoln.ac.uk/projects/ebank-uk/curation/
Examined four main areas
1. Audit and certification (TRAC, DRAMBORA, NESTOR, ISO International repository audit and certification BOF Group)
2. The Open Archival Information System (OAIS) and Representation Information (RI)
3. eBank-UK application profile and preservation metadata
4. ePrints.org repository platform
Observations & Recommendations 1
• Self-assessment using DRAMBORA toolkit
• Engage DCC audit & certification team
• Formulation of long-term objectives and policy
– Deposit agreements
– Services
• Aim for community-supported sustainability plan
• Implement regular audits: annual
• Produce evidence of compliance
– Documentation
– Transparency
– Adequacy
– Measurability
• Federation context
Observations & Recommendations 2
• Maintenance and open access of critical file formats and software
– Work-up software e.g. XPREP
– Export raw data from instrumentation as imgCIF
• Consider Representation Information (RI) in context of whole crystallography landscape (CCDC, IUCr etc.)
• Develop a preservation and curation strategy and formal policies to indicate levels of service
– Deposit, ingest, validation, dissemination
• Consider services to be developed over the DCC Registry/Repository of Representation Information (RRoRI)
Observations & Recommendations 3
• Develop preservation strategy & plan for the specific content
• Capture preservation metadata, including versioning and provenance information
• PREMIS Data Dictionary
– Semantic Units (e.g. file format, significant properties, provenance, fixity info)
– Extend eBank metadata application profile (AP)?
• Obtain consensus on AP
• Seek to automate metadata generation, extraction, maintenance
• ePrints.org support for information packages
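As an illustration of automating preservation metadata, the sketch below derives fixity, size and format information for a deposited file and re-checks it later. It is a simplified stand-in, not a PREMIS-conformant record: the dictionary keys borrow the names of PREMIS semantic units (`messageDigestAlgorithm`, `messageDigest`, `size`, `formatName`), while the functions `describe_object` and `verify_fixity`, the file name and the sample bytes are all hypothetical.

```python
import hashlib


def describe_object(name, data, format_name):
    """Generate minimal preservation metadata for one file's bytes."""
    return {
        "originalName": name,
        "formatName": format_name,
        "size": len(data),
        "fixity": {
            "messageDigestAlgorithm": "SHA-256",
            "messageDigest": hashlib.sha256(data).hexdigest(),
        },
    }


def verify_fixity(data, metadata):
    """Return True if the bytes still match the recorded digest and size."""
    recorded = metadata["fixity"]["messageDigest"]
    current = hashlib.sha256(data).hexdigest()
    return current == recorded and len(data) == metadata["size"]


# Placeholder content standing in for a deposited CIF file.
cif_bytes = b"data_example\n_cell_length_a 10.0\n"
meta = describe_object("example.cif", cif_bytes, "CIF")

print(meta["size"], meta["fixity"]["messageDigestAlgorithm"])
print(verify_fixity(cif_bytes, meta))            # unchanged file
print(verify_fixity(cif_bytes + b"x", meta))     # altered file
```

Because the digest is computed at ingest with no human input, a check like this could run as part of a regular audit and supply evidence that stored files have not changed.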
Setting up the Federation: Challenges ahead?
eCrystals Federation Data Deposit Model
[Diagram. Actors: Funder, Scientist, User, Publishers, Data centres / aggregator services, IR Federation. Activities: Create, Deposit, Curate, Preserve, Collaborate, Share, Harvest, Link, Discover, Re-use. Other labels: Advisory, Policy, Advocacy, Standards, Training.]
Repository deployment & support
• Roll-out in 2 phases
– Universities Sydney, Glasgow, Newcastle with eprints.org platform
– Universities Cambridge, STFC, ReciprocalNet, ARCHER with other platforms
• Information Environment Service Registry (IESR) listing Federation Collections
Laboratory Workflow & Provenance
• Achieving end-to-end workflows: avoiding fragmentation of data, results and interpretations
• Account for differing laboratory practice
[Diagram labels: RAW DATA, DERIVED DATA, RESULTS DATA, Public domain material]
Repository interoperability & linking services
• Establish core Federation application profile and mappings
• Bi-directional links with derived articles in “publisher repositories”, IUCr, Royal Society of Chemistry (RSC), Chemistry Central
• Test linking options: StORe middleware and CLADDIER (JISC-funded projects)
• OAI-ORE Pathways Project developments
Interoperability testbed
• Experimental data sets + metadata as compound objects
• Dublin Core and METS not sufficient
• OAI-ORE (base: Atom Publishing Protocol) testbed
• Enable 3rd party services e.g. data / text mining
eChemistry project
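The idea of treating experimental data sets plus metadata as a compound object can be sketched as a small data structure. This is loosely modelled on the OAI-ORE notions of an aggregation and a resource map that describes it; it is not an ORE serialisation. The class names, the `role` field, the dictionary layout and all `example.org` URIs are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AggregatedResource:
    uri: str
    media_type: str
    role: str  # e.g. "raw", "derived", "results" (hypothetical labels)


@dataclass
class Aggregation:
    uri: str
    title: str
    resources: List[AggregatedResource] = field(default_factory=list)

    def add(self, resource):
        self.resources.append(resource)

    def resource_map(self, map_uri):
        """Describe the aggregation and its parts as one nested dictionary."""
        return {
            "id": map_uri,
            "describes": {
                "id": self.uri,
                "title": self.title,
                "aggregates": [
                    {"id": r.uri, "mediaType": r.media_type, "role": r.role}
                    for r in self.resources
                ],
            },
        }


# Hypothetical URIs; a real entry would use the repository's own identifiers.
entry = Aggregation("http://example.org/entry/1", "Example crystal structure")
entry.add(AggregatedResource("http://example.org/entry/1/frames.img",
                             "application/octet-stream", "raw"))
entry.add(AggregatedResource("http://example.org/entry/1/structure.cif",
                             "chemical/x-cif", "results"))

rem = entry.resource_map("http://example.org/entry/1/map")
print(len(rem["describes"]["aggregates"]), "resources aggregated")
```

A third-party service, for example a text or data mining tool, could walk the `aggregates` list and fetch only the parts whose role or media type it understands.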
Enabling data discovery
• Royal Society of Chemistry Project Prospect tagging & semantic linking
Preservation & Sustainability
• DRAMBORA Assessment: use DRAMBORA Interactive
• Enhance Application Profile with PREMIS preservation metadata
• Populate RRoRI with crystallography representation information
• Examine repository platform conformance to OAIS Ref Model
• Survey partner institutional preservation policies
Embedding into current publishing practice
• Chemists still want to publish scholarly articles
• Blogs and repositories are a new form of rapid communication, but there are prior publication concerns
• Timing of release of data into public domain and formal publication will be crucial
– Repository must provide control over timing of public visibility
– EPrints3 version of eCrystals has ‘embargo tokens’
• Validation and quality in an ‘Open’ world
– Quality indicators?
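Control over the timing of public visibility could work along the following lines. This is a hypothetical sketch and does not reflect how the EPrints3 ‘embargo tokens’ are actually implemented: the `Entry` class, its `release_date` and `access_tokens` fields, the token string and the `is_visible` function are all invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class Entry:
    entry_id: str
    # None means the entry is public immediately.
    release_date: Optional[date] = None
    # Tokens handed to collaborators or referees before release.
    access_tokens: Set[str] = field(default_factory=set)


def is_visible(entry, today, token=None):
    """Decide whether an entry may be shown to a requester on a given day."""
    if entry.release_date is None or today >= entry.release_date:
        return True  # embargo absent or already expired
    # Still under embargo: only holders of a valid token may view it.
    return token is not None and token in entry.access_tokens


# Placeholder entry embargoed until a journal article appears.
embargoed = Entry("example-1", release_date=date(2008, 6, 1),
                  access_tokens={"referee-token"})

print(is_visible(embargoed, date(2008, 1, 15)))                   # public user
print(is_visible(embargoed, date(2008, 1, 15), "referee-token"))  # referee
print(is_visible(embargoed, date(2008, 6, 1)))                    # after release
```

A scheme of this kind would let data be deposited at the time of the experiment while still allowing sharing with collaborators and referees prior to publication.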
Advocacy
• Chemists still wary of ‘Open Access’
• eCrystals Roadshow Workshops engaging both crystallographers and their service ‘users’ in the workplace
• Open forum at International Union of Crystallography world congress (Aug 2008)
• Publishers Workshop to demonstrate co-existence of open data models & traditional scholarly articles
Questions?
Slides will be available at :
http://wiki.ecrystals.chem.soton.ac.uk/index.php