Bloomsbury Conference
UCL, London
6.25.10
Fourth Bloomsbury Conference on e-Publishing and e-Publications
Valued Resources: Roles and Responsibilities of Digital Curators and Publishers
Conceptualizing Library Data Curation and Publishing Services at Purdue University
Charles Watkinson
Director
Purdue University Press
D. Scott Brandt
Assoc Dean for Research
Purdue University Libraries
Structure of the Presentation
I. Some Background & Context
II. Exploring the Library’s Role in the “Data Deluge”
III. Data Curation Profiles: what we’re learning
IV. What a Publisher can learn from the Profiles
“Data curation is the activity of managing and promoting the use of data from the point of creation, to ensure its fitness for contemporary purposes and availability for discovery and reuse.”
Purdue University and Purdue Libraries
• ~38K students, ~1.8K faculty
• Strengths in science, technology, agriculture, and engineering
• 12 subject-oriented libraries plus other units
• The university press is a unit of the Libraries (only 11% of US university presses report within their libraries)
[Organization chart, Purdue University Libraries: Dean; Associate Dean for Research (D2C2, Data Research Scientist, Interdisciplinary Research Librarian); Associate Dean for Academic Affairs (Library Faculty, 32); Associate Dean for Digital Programs and Information Access; Associate Dean for Planning & Administration; Directors of the Office of Copyright, Finance, and the University Press]
[Diagram: layers of research data and outputs: published data/datasets; unpublished research; published research (non-traditional and traditional); secondary/tertiary resources; analyzed, processed, and “raw” data/datasets]
• Analyzed data/datasets: analyzed data might need to be reviewed prior to publication, or in case of questions after publication. Publishers increasingly link it as “supplementary data”.
• Processed data/datasets: quite often data must be scrubbed/anonymized, or processed into a suitable format, prior to analysis; some disciplines (e.g., astronomy, physics) share this data widely within their communities.
• “Raw” data/datasets: some raw data are shared readily (e.g., genetics), but, depending on the discipline, raw data are also quite often discarded.
Modified from: Brandt, D.S. “Scholarly Communication.” In To Stand the Test of Time: Long-Term Stewardship of Digital Data Sets in Science and Engineering. Final Report of the Workshop “New Collaborative Relationships: Academic Libraries in the Digital Data Universe.” ARL, Washington, DC, September 2006.
Purdue University Libraries (PUL) response to the “data deluge”
• Investigating research data needs and building relationships with faculty, in order to:
– design, build, and assess prototype infrastructure, tools, and services to handle digital data.
• This approach recognizes the discipline-specific nature of faculty needs, though there is a tension between this and the practical requirements of building a sustainable suite of services and digital infrastructure.
Our organization to achieve this vision
Distributed liaison plus centralized services, providing support across the research lifecycle to disciplinary faculty:
• Faculty Liaison: subject librarians
• Publishing: e-Pubs & Press
• Data Management: D2C2
• Rights Management: University Copyright Office
1. Investigating Research Data Needs
• Strategy 1: Embedding data scientists in research projects; D2C2 provides this expert consultancy.
• Strategy 2: Creating tools to structure conversations about data; Data Curation Profiles (DCPs) help liaison librarians structure their conversations.
[Diagram labels: DCP, D2C2, librarians, researchers]
2. Solving Problems and Developing Prototype Tools, Systems, Services
[Diagram: research lifecycle stages: Study Concept & Design; Data Collection; Data Processing; Data Access & Dissemination; Analysis; Research Outcomes]
• Ingest, Preservation and Access for Water Quality Datasets in an Institutional Repository
• Developing a Data Management and Curation Workflow for Camp Calcium
• Developing a Content Organization Framework for Regenstrief Center Healthcare Delivery Hub
• Enabling end-to-end geospatial data modeling workflows via INPort: The Isotope Networks Portal
• Leveraging Relational Information in the HUBs using Linked Data
• Investigate and Implement Persistence for HUB Resources
• DataCite (founding member)
• Integrating Spatial Educational Experiences (ISEE) into Crop, Soil, and Environmental Science Curricula
• INTEROP: Developing Community-based DRought Information Network Protocols and Tools for Multi-disciplinary Regional Scale Applications
• Prototype publications linked to data through e-Pubs and Purdue University Press
Adapted from: e-Science and the Life Cycle Model of Research
http://datalib.library.ualberta.ca/~humphrey/lifecycle-science060308.doc
Data Curation Profiles
Profiling Data
• Research Data Lifecycle (the story of the data from the producer’s perspective)
• Data Management / Storage
• Disposition of the Data
• Data Dissemination and Sharing
• Data Preservation and Repositories
• Roles for Libraries, Librarians, and Publishers
Sample Profile link
Disposition of the Data
• Willingness / Motivations to share
– feelings, reservations, and willingness towards sharing
• Access control
– need to restrict or control access to/from others
• Target data for sharing
– the stage in the lifecycle at which the data should be shared
• Value of the data
– real or potential value, from the researcher’s perspective
• Embargo (and reasons why/why not)
What data curators can learn
• Advancing university-based cyberinfrastructure depends on our understanding of how to support data practices and needs
• Sharing is at the heart of success: collecting, storing, and making use of data can only come after the means for sharing are in place
• We cannot collect and curate all data, particularly in a way that facilitates effective re-use
– We will need to work with researchers to develop selection and appraisal guidelines, and data services
From: Cragin, M. (2009). “Data Sharing, Small Science, and Institutional Repositories.” UK e-Science All Hands Meeting, Oxford, UK.
Data Curation Proliferation
[Slide graphic: DCP, 12 workshops; dataconservancy.org]
What publishers can learn
• Researchers want to disseminate their outputs, but these range in scope, format, and use
• They are generally willing to share data with others, but not without certain restrictions, or benefits for themselves
• They hold on to their data but do little to curate it; what is most easily or willingly shared is not always the data with the most re-use value
Purdue UP lesson learned 1
“Researchers want to disseminate their outputs, but these range in scope, format, and use”
• Print books and subscription-based journals, PUP’s traditional focus, are not enough
• PUP / Libraries need to offer a range of different channels to fit different needs
• PUP / Libraries need a venue to experiment with hybrid or new models
“A Continuum of Scholarly Content” in the IR
[Figure: institutional repository content types plotted by source of scholarship (Faculty, Student, Admin, Unaffiliated) against scholarly impact of content (Low to High); category labels include “Primary research output” and “Nonresearch”. Content types shown: Book, Preprint, Postprint, Datasets, Faculty Journal, Faculty Conference, Research Finding, Research Reports, Policy Report, Society Journal, Symposium, Dissertation, Masters Thesis, Graduate Journal, Honor Papers, Undergrad Conference, Undergrad Journal, Committee Meetings, Newsletter, Admin Report, Alumni Magazine, Historical Collection, Commencement address.]
Red stars = Purdue UP?
Blue stars = Purdue e-Pubs?
(with thanks to J.G. Bankier, Berkeley Electronic Press)
Purdue UP lesson learned 2
“Researchers are willing to share data with others, but not without certain restrictions/benefits”
• PUP provides a layer of editorial services for credentialing that can incentivize data sharing
• PUP needs to make it easy to link to and cite data in publications (DataCite is so important!)
• PUP / Libraries need to be nuanced in their Open Access messages (OA is not always the right strategy)
[Screenshot with callouts:]
• Read the full text of the book on your portable device
• Follow in-text URLs to supplementary data
• View spreadsheets on-site or download them to your personal computer
Purdue UP lesson learned 3
“What is most easily/willingly shared is not always the data with the most re-use value”
• Move away from producing data supplements for publications to producing supplementary publications to drive re-use of data
• Take advantage of being “inside the tent” to have deeper conversations with scholars about which data are most important for re-use
Next Steps
• Spreading the use of DCPs so that we can get a more complete picture of how faculty behavior around data varies
• More clearly defining library-based publishing services, and building relevant skills and tools in the Libraries and Press
• Communicating to faculty the full range of library services they have access to, and changing their old views of what Purdue Libraries and Purdue UP “do”
Thank you!
D. Scott Brandt
Charles Watkinson