
How would you give guidance or prioritize how to address gaps in the lifecycle of data acquisition, curation and preservation? Are there new programs or community opportunities?
The Fir Group
Kerstin Lehnert, John Graybeal, Dmitri Mozzherin, Vivian Hutchison, Giri Palanisamy, Eric Wolf, Ron Weaver, Jan Peters, Walt Snyder, Mary Marlino, Cheryl Morris, Benjamin D Branch, Steve Tessler, Lisa Raymond, Jeanine Aquino, Scott Jensen, Percy Donaghay, Dave Folker, Sze-Ling Celine Chan
Data Lifecycle
Acquisition – curation – preservation
The data lifecycle starts with PLANNING
Consider ‘use and re-use’ as part of the data lifecycle
Data Acquisition
• Two different phases:
• ‘Data Creation’: when the data are generated (field, lab, computation, …)
• ‘Data Gathering’: when the data reach the data system
• What is the definition of the "data system"? Many data sources have a long path to the data system.
• Differences between large science programs and small investigator-based projects
• Legacy data vs. new data
• Historical data submitted to the data archive years later: metadata must be developed and submitted after the fact
Data Acquisition Gaps
• Standards for acquisition that make ingestion more efficient
• Incentives to submit data to archives
• Metadata that ensure proper use and reuse
• Infrastructure and tools
• Need to define metadata standards, etc. that apply across all disciplines vs. domain-specific metadata
Data Curation Gaps
• Inability to discover the data being curated
• Best practices
• Standards, e.g., uniform metadata
• Funding for metadata collection and coordination
• Communication process to gather requirements
• Infrastructure
• Ability to document provenance
Data Preservation Gaps
• Funding for data preservation
• Access to the ‘original’ (raw?) data
• Access to the software/algorithms used to process the data, i.e., the metadata needed to reconstruct the data
• Metadata that help users use and understand the data
• Ability to reuse data
• Harvest information from reuse/repurposing in other contexts
• Add value to data during analysis and cycle it back to the archive for others to benefit
• Repositories for data
Guidance Needed
• Plan
• Partner with data management
• Initial metadata in acquisition plan
• Tools to assist with metadata entry and data management plans
• From funding agencies (or funding from them to develop guidance)
• How to know what not to keep (since we can't keep everything)
Possible Steps
• Best data practice(s) award
• NSF program managers instruct all review panels to evaluate every proposal on its data management (DM) plan, so that reviewers realistically assess the DM resources
• Every data set must have a DOI
• Perform a community survey to determine what the data lifecycle looks like in different disciplines
To The Cloud…
http://etherpad.ooici.org/geodata-fir