Geo-Data Informatics (GDI) Workshop: Exploring the Life Cycle, Citation and Integration of Geo-Data Summary Report from Thursday, 3 March 2011 Pine Room Data Integration Breakout.


Geo-Data Informatics (GDI) Workshop:
Exploring the Life Cycle, Citation
and Integration of Geo-Data
Summary Report from
Thursday, 3 March 2011
Pine Room Data Integration Breakout Group
Discussion Prompt
In your view/experience, what parts of data integration
implementations/applications or frameworks are well
established (or not) in your discipline(s), and what are the
common gaps?
Moderator: Cyndy Chandler (WHOI, BCO-DMO)
Rapporteur: Chris Mattmann (NASA JPL, USC)
Discussion notes kept at a TWC-hosted TitanPad site
Participants
• Bob Arko (Lamont-Doherty Earth Observatory)
• Joanne Luciano (TWC, RPI)
• Anna Milan (National Geophysical Data Center)
• Bob Simons (NOAA)
• Brian Wee (NEON, Inc.)
• Leslie Hsu (LDEO)
• Roland Viger (USGS)
• James Wilson (James Madison University)
• Tom Narock (NASA/GSFC)
• Cathy Constable (SIO, UCSD)
• Ruth Duerr (NSIDC)
• Yoori Choi (CUAHSI)
• Lee Allison (Arizona Geological Survey)
• Erin Robinson (ESIP)
• Kavitha Chandrasekar (Indiana University)
• Bob Detrick (NSF)
• Clifford Jacobs (NSF)
• Leonard Jonson (NSF)
Data Integration
• What does that mean?
Combining more than one data source into a single data
object. Different from display of multiple data sources
in a single view.
Example: a database join
Time series data sets made up of a variety of sources of
data often require data integration.
Data aggregation and interoperability are related
concepts.
Group did not come to consensus.
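The "database join" example above can be sketched as follows. This is a minimal illustration, not something presented at the breakout: the table names, column names, station labels and values are all hypothetical, and SQLite is used only because it ships with Python. Two separate sources are joined on station and time so that the result is a single data object, rather than two sources shown side by side.

```python
import sqlite3

def integrate(temperature_rows, salinity_rows):
    """Join two hypothetical sources on (station, time) into one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE temperature (station TEXT, time TEXT, temp_c REAL)")
    conn.execute("CREATE TABLE salinity (station TEXT, time TEXT, psu REAL)")
    conn.executemany("INSERT INTO temperature VALUES (?, ?, ?)", temperature_rows)
    conn.executemany("INSERT INTO salinity VALUES (?, ?, ?)", salinity_rows)
    # Inner join: only observations present in BOTH sources survive.
    rows = conn.execute(
        "SELECT t.station, t.time, t.temp_c, s.psu "
        "FROM temperature AS t JOIN salinity AS s "
        "ON t.station = s.station AND t.time = s.time "
        "ORDER BY t.station, t.time"
    ).fetchall()
    conn.close()
    return rows

# Hypothetical observations from two independent sources.
temperature = [("A", "2011-03-03T00:00Z", 4.1), ("A", "2011-03-03T01:00Z", 4.3)]
salinity = [("A", "2011-03-03T00:00Z", 34.9), ("B", "2011-03-03T00:00Z", 35.2)]

integrated = integrate(temperature, salinity)
```

Only the one observation that both sources share ends up in `integrated`. Whether unmatched records should be dropped, kept with gaps, or interpolated is exactly the kind of decision that makes integrating time series harder than displaying them together.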
Geo Disciplines Represented
• Geology
• Hydrology
• Oceanography
• Geophysics
• Geography
• Marine geology and geophysics
• Space science
• Air quality
• Computational neuroscience
• Multi-disciplinary or discipline-agnostic: data
management, computer science and archive
Geo-Data Integration
• What aspects are well established or not?
• Identify common gaps?
• For many projects, two common themes emerged as
being associated with some level of success in ability to
do data integration:
– ‘long-term’ commitment of funding support
– Active engagement of funding managers
Examples:
Unidata (Atmospheric Sciences)
CUAHSI (Hydrology)
IRIS (Earthquake)
US JGOFS, US GLOBEC, US WOCE (Ocean Sciences)
ODP (Ocean Drilling)
NEON
Support for Data Integration
Development of community of practice
• Infrastructure to foster communication (workshops)
• Mentoring of students and early career PIs
• Development of tools (e.g. Unidata developed NetCDF
which has been adopted by many communities)
• Education and training
• The persistence and recognition of a ‘named’
community can enable funds to flow from some
agencies to researchers
Support for Data Integration
• Some communities agreed on common data
formats that facilitated data integration
• Pressures from funding agencies or
community needs resulted in common
software tools
• Some communities identified ‘primary’ or
‘core’ variables (e.g. common, essential
measurements)
Summary
• ‘Long-term’ funding support enables
development of a community-of-practice that
fosters communication, education and
training, development and adoption of
common tools and identification of core
measurements. Communities-of-Practice can
divide up the labor and work collaboratively to
address shared challenges (economy of scale).
Additional Observations
• Tension between local and global (single PI to
coordinated project to national to
international). An awareness of global use of
data could help with subsequent data
integration.
• Early planning/specs for data management are
important, but it has traditionally been difficult
to obtain funding for them.
Gaps
• Lack of awareness/understanding that keeping
data ‘alive’ (usable) is not free
• Many people think data stewardship and data
preservation are "solved problems" (they are not).
• "Bit-level preservation" has been solved, but what
is the useful lifespan of those files? What effort is
required to make the archived data compatible
with all the latest tools and technology? The ability
to use a dataset declines over time without
ongoing attention to ensure that it still meets
current access requirements.
Gaps
• Historical or legacy data (originating PI is no
longer active in the research community)
• No national policy for scientific preservation
• Different disciplines have different
interpretations of features in a dataset
• Lack of guidelines for best practices regarding
metadata required to document model results
– software, methodology, inputs, outputs, etc.
Gaps
• Misconception that you create metadata one
time, and it's forever good
– not a true statement
– somehow the metadata needs to be updated
– systems and the infrastructure need to support
this
– metadata needs to evolve over time
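One way a system could support metadata that evolves over time is to keep dated revisions instead of overwriting a single record. The sketch below is only an illustration of that idea: the `MetadataRecord` class, the dataset identifier, the field names and the dates are all hypothetical and were not discussed by the group.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class MetadataRecord:
    """Hypothetical metadata record that keeps every dated revision."""
    dataset_id: str
    revisions: list = field(default_factory=list)  # list of (date, dict)

    def update(self, when, changes):
        # Start from a copy of the latest revision so history is never mutated.
        merged = dict(self.revisions[-1][1]) if self.revisions else {}
        merged.update(changes)
        self.revisions.append((when, merged))

    def current(self):
        # The most recent revision is the metadata in force today.
        return self.revisions[-1][1] if self.revisions else {}

record = MetadataRecord("cruise-042")
record.update(date(2011, 3, 3), {"format": "netCDF-3", "contact": "originating PI"})
record.update(date(2015, 6, 1), {"format": "netCDF-4", "contact": "data center"})
```

Here `record.current()` reflects the later format and contact, while the first revision stays available for anyone who needs to know how the dataset was described when it was created.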
Suggestion
Group agreed that ESIP would be an
appropriate community in which to continue
these discussions and to begin the much-needed
planning of cross-disciplinary solutions that
address the gaps and improve infrastructure
for geo-data integration.
Additional Comments
• NRC study done 7-8 years ago about the loss
of data and samples in the geosciences,
"Geoscience Data and Collections: National
Resources in Peril":
http://www.nap.edu/openbook.php?record_id=10348&page=R1
Additional Comments
• Marine Metadata Interoperability (MMI)
http://marinemetadata.org/
Collection of ‘Guides’ on topics including Semantic
Web technologies, controlled vocabularies, ontologies,
standards, metadata best practices, and much more.
• MMI Ontology Registry and Repository (ORR) is a web
application through which you can create, update,
access, and map ontologies and their terms.
http://mmisw.org/orr/#b
Additional Comments
• CUAHSI: Hydrologic Ontology System (funded by
NSF)
http://his.cuahsi.org/ontologyfiles.html
http://water.sdsc.edu/hiscentral/startree.aspx
• "Data Management Plan" template available from
CUAHSI (February 2011) at
http://www.cuahsi.org/his-dmp.html
It includes data inventory, data and metadata
standards, data management life cycle, etc.
Additional Comments
• ELIXIR
http://www.bbsrc.ac.uk/science/international/elixir.aspx
European life science infrastructure for
biological information.
• Its mission: To construct and operate a
sustainable infrastructure for biological
information in Europe to support life science
research and its translation to medicine and the
environment, the bio-industries and society.