Data Curation Education - University of Illinois at Urbana

GSLIS Research Showcase, 9 April 2010
The Data Conservancy:
Research on Data Curation and Repositories
Center for Informatics Research in Science & Scholarship
Carole Palmer, PI
Melissa Cragin, John MacMullen, Tiffany Chao
Allen Renear, Dave Dubin, Simone Sacchi
Michael Welge & Loretta Auvil, NCSA
What’s the problem?
Scientists & scholars generate increasingly vast amounts of digital data.
Digital data are extremely fragile, and there are few standards of good practice.
Data are essential raw materials of science and scholarship
Data are valuable institutional, disciplinary, and national assets
with tremendous potential for integration and reuse.
Need for repositories of “curated” data
Data curation is the active and on-going management of data through its
lifecycle of interest and usefulness to scholarship and science.
enable data discovery and retrieval
maintain data quality
add value
provide for re-use over time
The Data Conservancy positions research libraries as a core part of
an emerging distributed network of data collections and services
“Data sets are the new special collections.”
(Sayeed Choudhury, personal communication, 2007)
“Data centers are the new library stacks.”
(Winston Tabb, JHU Dean of Libraries)
Data collections and services consistent with research library mission.
Will be like other collections requiring library support and expertise
Will need to serve broad academic constituency.
flickr.com/photos/001fj/2907653323/
Flickr users: stancia, rh creative commons
Astronomy as an exemplar scientific community
Achieved notable success in community data standards, practices,
documentation, and associated services for research and learning.
DC initial goal: ingest astronomy data into a preservation archive and
connect the data to existing services used by astronomers.
SDSS: 140 TB, 3 times the amount currently held on the JHU campus
Demonstrate utility of hosting data in environment that supports
existing scientific capabilities in a sustainable manner.
Extend to:
life sciences
earth sciences
social sciences
To date, limited support for “small” science
Data from Big Science is … easier to handle, understand and archive.
Small Science is horribly heterogeneous and far more vast.
In time will generate 2-3 times more data than Big Science.
(S. Carlson, ‘Lost in a Sea of Science Data’, The Chronicle of Higher Education, 23/06/2006.)
small science data
CIRSS contributions to DC and DataNet Partners
Data practices group (Palmer, Cragin, MacMullen, Chao)
comparative analysis concentrating on small science
taxonomies of data types, practices, & curation
criteria for deposition, sharing, quality control
long-term potentials of data
Data concepts group (Renear, Dubin, Sacchi)
development of formal terminology, identity conditions for
collections, data sets, versions, and data items
rules that relate collection and data set metadata
support development of common collection registry scheme
NCSA SEASR group (Welge, Auvil)
extend and advance the Software Environment for the Advancement
of Scholarly Research (SEASR), beginning with high-throughput biology