a centre of expertise in data curation and preservation
Reflections on open scholarship:
process, product and people
Dr Liz Lyon,
DCC Associate Director Outreach
Director, UKOLN, University of Bath, UK
This work is licensed under a Creative Commons Licence
Attribution-ShareAlike 2.0
2nd International Digital Curation Conference, November 2006
Three themes
• How?
– Unpacking the title: open scholarship
• What?
– Creating and using science-ready archives
• Who?
– “Digital natives” as data scientists
What do we mean by “open”?
• Publicly available?
• Shared?
• Inclusive?
• Collaborative?
• Participative?
• Non-proprietary?
Scholarship today?
“Open Access”
Data-driven science
Datacentric
2020 vision
Reference datasets as infrastructure
“Open source science”
Research into neglected tropical diseases
http://www.thesynapticleap.org/
http://openwetware.org/wiki/Main_Page
Synthetic biology: materials for (bio) mash-ups? Interesting IPR issues…
Bioblog
Blogs, blogs and metablogs….
http://www.flickr.com/photos/64696485@N00/13146762/
The Tool Box?
The Peer Review Process?
http://www.ch.ic.ac.uk/wiki2/index.php/Mauveine
The Scientific Paper?
Crystal Structure reports: data-rich scientific articles
Original slide: Brian McMahon, IUCr
• 3-d positional coordinates
• Atomic motions
• Molecular geometry
• Chemical bonding
• Crystal packing
• Chemical behaviour arising from structure
Two dedicated IUCr journals: Acta Cryst. C, E
Important part of scientific discussion in many other titles: Acta Cryst. B, D, F
Validation of data through publication
• Data-centric scholarly “publications”
• Raw, primary, derived data integrated with interpretations
• Mandatory submission of data with text
The database publication?
http://declanbutler.info/blog/?p=58#more-58
The “mash-up”
Data from FAO, WHO + Google Earth
Pause for thought…..
• Big science communities
– Grid-enabled applications
– Large managed open data archives
– Funder policy driver
• Small(er) science communities
– Collaborative and social software
– Evolving open wikis and blogs
– Grassroots driver
• Curation and preservation issues
– Burgeoning wiki and blog content
– Web archiving
• Positioning of repositories???
Big science
Funder-mandated sharing?
Top down
“science-ready archives”
Small science
Community culture
Discipline?
Institution?
Bottom up
Data capture
• Laboratory protocols: common practice
• Instrumentation: proprietary software
• Standard specifications and formats
• Working towards standard specifications in the lab
– Open Microscopy Environment OME
– Medical imaging DICOM
– Flow cytometry standard FCS
– Mass Spectrometry Standards Working Group mzData vs mzXML
• Laboratory management data systems in development
Workflow: m2m?
RepoMMan: Repository Metadata and Management (Univ Hull) using WS-BPEL
DAME signal processing workflows using Grid Services
[Figure: workflow diagram. Actors: Airport Maintenance Engineer; DS&S Maintenance Analyst (Fleet Manager); Rolls Royce Domain Expert. Steps: Aircraft Lands; Visual Inspection; Quote Diagnosis; Brief Diagnosis / Prognosis; Check Diagnoses; Diagnosis Result; Detailed Diagnosis / Prognosis; Detailed Analysis; Analyst Decision; Expert Decision; Request Information; Provide Information; Request Further Details; Provide Further Details; Maintenance Procedure; Maintenance Result; Sign-off Diagnosis; Release Engine. Branch conditions: [known], [unknown], [Clear], [information required], [diagnosis], [diagnosis complete], [fault resolved], [fault unresolved].]
e-Scientist desktop?
Slide: Carole Goble
Silchester: A VRE for Archaeology
Harmonisation and normalisation
• Standard Deposit API (GNU eprints, Dspace, Fedora)
• Dublin Core Application Profile for ePrints (+ Eduserv)
• Requirements: richer metadata set, support for value-added services, version identification, appropriate copy (OA), citations
• Based on FRBR
• Data model for scholarly works
• Application profile includes simple and qualified DC properties
The ePrints application profile
• simple DC properties (the usual suspects…)
– identifier, title, abstract, subject, creator, publisher, type, language, format
• qualified DC properties
– access rights, licence, date available, bibliographic citation, references, date modified
• new properties
– grant number, affiliation institution, status, version, copyright holder
• properties from other schemes
– funder, supervisor, editor (MARC relators)
– name, family name, given name, workplace homepage, mailbox, homepage (FOAF)
• clearer use of existing relationships
– has version, is part of
• new relationship properties
– has adaptation, has translation, is expressed as, is manifested as, is available as
• vocabularies
– access rights, entity type, resource type and status
Slide: Julie Allinson, UKOLN, Andy Powell, Eduserv
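As a rough illustration of how a deposit tool might use the property lists on this slide, the sketch below checks a record's keys against three of the groups named above (simple DC, qualified DC and new properties). The function name, the set names and the example record are hypothetical and are not part of the application profile itself; the property labels are copied from the slide rather than from the formal schema.

```python
# Property groups as listed on the slide; the set names are our own labels.
SIMPLE_DC = {"identifier", "title", "abstract", "subject", "creator",
             "publisher", "type", "language", "format"}
QUALIFIED_DC = {"access rights", "licence", "date available",
                "bibliographic citation", "references", "date modified"}
NEW_PROPERTIES = {"grant number", "affiliation institution", "status",
                  "version", "copyright holder"}

ALLOWED = SIMPLE_DC | QUALIFIED_DC | NEW_PROPERTIES


def unknown_properties(record: dict) -> list:
    """Return, in sorted order, the keys of `record` not listed on the slide."""
    return sorted(key for key in record if key not in ALLOWED)


# Hypothetical record: every value here is invented for illustration.
example = {
    "title": "An example eprint",
    "creator": "A. Author",
    "version": "accepted manuscript",
    "access rights": "open",
    "colour": "blue",  # deliberately not a profile property
}

print(unknown_properties(example))
```

A real implementation would work from the published schema and its namespaces rather than from plain-text labels.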
Use DC Application Profile for ePrints?
Data description and discovery
• Validation, publication & discovery of data models & schema
• eBank Application Profile http://www.ukoln.ac.uk/projects/ebankuk/schemas/
• Harmonisation and normalisation of metadata and semantics
• DOI http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
• Rights & Citation policy http://ecrystals.chem.soton.ac.uk/rights.html
• Crystallography: a community working together
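The DOI example on this slide, once its fragments are joined, reads 10.1594/ecrystals.chem.soton.ac.uk/145 behind the dx.doi.org resolver. A minimal sketch of how a citation tool might split such a DOI into prefix and suffix and build the resolver link follows. The function names are hypothetical, and the validation is deliberately loose: it only checks for the "10." prefix and a non-empty suffix.

```python
def split_doi(doi: str) -> tuple:
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, _, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a well-formed DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str, resolver: str = "http://dx.doi.org/") -> str:
    """Build a resolvable link; the resolver base is the one shown on the slide."""
    split_doi(doi)  # raises ValueError if the DOI is malformed
    return resolver + doi


doi = "10.1594/ecrystals.chem.soton.ac.uk/145"
print(split_doi(doi))
print(resolver_url(doi))
```

The suffix may itself contain slashes, as it does here, which is why the split is made only at the first one.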
eCrystals ‘Global Federation’ Model
[Figure: federation diagram. Components: data creation & capture in “Smart lab”; laboratory repository; institutional data repositories; subject repository; aggregator services; presentation services / portals; publishers (peer-review journals, conference proceedings, etc.); institution library & information services. Labelled flows and activities: deposit; validation; search, harvest; publication; data discovery, linking, citation; data analysis, transformation, mining, modelling; curation; preservation.]
Data deposit & sharing: roles and responsibilities
• Funder
• Institution
• Faculty
• Individual
Noor et al PLoS Biol 4(7) 2006
eBank Project exemplar
Adding value:
aggregating & linking data + interpretations
“Repository wow-factor”…
…or adding value through user interface tools…
Facilitating use and re-use: text mining tools
• Adding value
Second pause for thought…
• We need to work with instrument suppliers
• We need to understand more about workflow
• We need to develop new ways of adding value to datasets through innovative user tools and services
• We need more evidence of how data is used and re-used (or not…)
Getting the skills mix
• Communities, teams, individuals
• International Virtual Observatory Alliance
– Global community
– Virtual organisation
• Multi-disciplinary team approach
– eBank Project exemplar: computer scientists, domain scientists (chemists), digital library experts
– Lessons learnt: e-Science Human Factors Audit Report 2006, Roy Kalawsky, Loughborough
• NSF Report 2005: Long-lived digital data collections
– “Data scientist”
Wanted! Data scientist?
Digital natives as data scientists?
• eBank Project: assessing role of research data in undergraduate Chemical Informatics and MChem courses at Univ. of Southampton
• Pedagogic evaluation by Grainne Conole
• Report imminent…
“There were several parts to the course – We started off with how to get 2D and 3D representations of molecules onto a computer using a one-dimensional format, a SMILE string …so just ways of like getting data into a format so that it can be easily shared between different computers or different people without having to change lots of things”

“Well basically I’ve done nothing like it before, so it’s the first time I’ve sort of delved into computing or computational chemistry … quite nice, quite enjoyed starting off with just like a string of data and pop it into say a database, just a flat string of numbers basically and then come out with a crystal structure, which is exactly what it should represent which is quite cool”
Source: Grainne Conole
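The one-dimensional format the student describes is presumably SMILES, which writes a molecule as a line of text. As a small, hedged illustration of why such a string is easy to share and process, the sketch below counts the heavy atoms in a simple SMILES string. It is an assumption-laden toy: `count_atoms` is a hypothetical helper, it handles only the unbracketed "organic subset" symbols, it ignores implicit hydrogens, and it is no substitute for a real cheminformatics toolkit.

```python
import re
from collections import Counter

# Two-letter symbols come first so "Cl" is not read as "C" followed by "l".
# Lower-case letters are aromatic atoms in SMILES notation.
ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def count_atoms(smiles: str) -> Counter:
    """Count heavy atoms in a SMILES string without bracket atoms."""
    if "[" in smiles:
        raise ValueError("bracket atoms are outside the scope of this sketch")
    # Bond symbols, ring-closure digits and parentheses are simply skipped.
    return Counter(m.group().capitalize() for m in ATOM.finditer(smiles))


print(count_atoms("CCO"))       # ethanol
print(count_atoms("c1ccccc1"))  # benzene, written with aromatic atoms
```

The same short string can be stored in a database, e-mailed or pasted between programs unchanged, which is the sharing property the quotation points to.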
New skills requirements:
• interdisciplinary
• quantitative
• data curation
Integrate within the curriculum
Wingreen & Botstein Mol Cell Biol 7, 2006
Final pause for thought…
• Various approaches to develop and obtain digital curation skills
• Skills are there but often in discrete communities: we need to bring communities together (like at this conference…)
• Integration within the curriculum: undergraduate students, library & information science, archival studies, computer science
• Provide recognition and a career path for emerging data scientists
Take home messages
• Scholarship is changing fast
• Big science and open source science both create significant digital curation challenges
• Science-ready archives are the goal
• Native data scientists are coming
• The culture will change too…
Thank you….