Transcript Slide 1

Observation Data Model: Creating a Unified Model
for Observational Data in the Ecological and
Environmental Sciences.
Steve Kelling
Cornell Lab of Ornithology
There is an enormous variety of observation data.
A major challenge is joining observation data gathered
in different projects.
Nitrogen
Skunk
Ocean Current
Trilobites
Integrating Variables to Model Species Occurrence.
When multiple independent variables (e.g., land cover, human density, climate) are
incorporated into species distribution and abundance analyses, accurate estimates
of species occurrence can be obtained.
Top:
Distribution of the Northern Cardinal
during the summer of 2006.
Over 200 independent variables
were used in the model.
Right:
Confidence intervals indicate
the differences between
locations.
Observations
Workshop
Santa Barbara / NCEAS
9-11 July 2007
Mark Schildauer:
“I agree with what I
said here”
A variety of observational data models were analyzed at the meeting.
Organization
Short description of observational data modeling approach
SEEK
The SEEK extensible observations ontology (OBOE) focuses on capturing the
essential information about observations required to comprehensively discover and
integrate heterogeneous ecological data.
NatureServe
The NatureServe Observational Data Standard focuses on developing an XML
Schema for specimen-oriented survey data to improve data aggregation and sharing
within and between organizations.
ALTER-NeT
The European ALTER-NeT Ontology, CEDEX, focuses on developing an object-oriented
data system for cataloguing observational ecological data while retaining
semantic information to aid data discovery and analysis.
SPIRE
The Spire initiative focuses on developing domain-independent, general-purpose
ontologies to enable annotation of the contents and structure of existing ecological
databases with an initial focus on taxonomy and food web issues (ETHAN).
OGC
The OGC Observations and Measurements (O&M) Standard focuses on developing a generic
conceptual XML Schema for representing all aspects of observation and
measurement data.
VSTO
The Virtual Solar-Terrestrial Observatory focuses on building ontologies for
interoperating among different existing meteorological and atmospheric metadata
standards.
TDWG
TDWG is developing a “meta-model” to integrate biodiversity observations with
specimen data by identifying similarities between these two data types, determining
whether existing standards suffice to describe them, and, if not, developing the
additional concepts needed for clarification.
ODM
The CUAHSI Observations Data Model (ODM) and associated relational database
focus on storing hydrologic observations data in a system designed to optimize data
retrieval for integrated analysis of information collected by multiple investigators.
Observations Workshop Summary
• Holistic, integrative, large-scale science would benefit from better data
discovery, interpretation, and integration within and across disciplines.
• The workshop participants found much commonality among their
approaches to modeling observational data.
• An extensible observational data model has
advantages over conventional models.
• The development of a core observational data
model should be domain independent
and be conducted through an
established standards body.
This presentation will review the
following outcomes of the meeting:
• Definition of Observation
• Capabilities
• Requirements
An Observation is the Determination of the Value
of a Property of some Entity in a particular
Context
– Entity: a thing, process, or phenomenon
– Determination: the outcome of the process by
which the Value of a Characteristic is measured
– Characteristic: a property that can be assigned a value
– Value: a discrete or continuous quantification or
qualification of a Characteristic
– Context: the setting and conditions that constrain
the interpretation and applicability of a Measurement,
such as space, time, or treatment
– property
– Observer:
– Protocol:
– Standard:
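The definition above can be sketched as a few Python data classes. This is a minimal illustration only: the class and field names (Entity, Characteristic, Context, Observation, unit, treatment) are hypothetical and are not taken from OBOE, OGC O&M, or any other model reviewed at the workshop, and the coordinates and observer ID are placeholders.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical classes mirroring the terms defined above; not an existing standard.


@dataclass(frozen=True)
class Entity:
    """A thing, process, or phenomenon that is observed."""
    name: str


@dataclass(frozen=True)
class Characteristic:
    """A property of an entity that can be assigned a value."""
    name: str
    unit: Optional[str] = None  # None for qualitative characteristics


@dataclass(frozen=True)
class Context:
    """Setting that constrains interpretation: space, time, treatment."""
    latitude: float
    longitude: float
    date: str  # ISO 8601 date string
    treatment: Optional[str] = None


@dataclass(frozen=True)
class Observation:
    """Determination of the Value of a Characteristic of an Entity in a Context."""
    entity: Entity
    characteristic: Characteristic
    value: Union[int, float, str]  # quantitative or qualitative value
    context: Context
    observer: Optional[str] = None
    protocol: Optional[str] = None


obs = Observation(
    entity=Entity("Cardinalis cardinalis"),  # Northern Cardinal
    characteristic=Characteristic("abundance", unit="count"),
    value=3,
    context=Context(latitude=42.48, longitude=-76.45, date="2006-07-15"),  # placeholder
    observer="observer-01",  # placeholder ID
    protocol="stationary point count",
)
print(f"{obs.characteristic.name} of {obs.entity.name} = {obs.value} {obs.characteristic.unit}")
```

Making the classes frozen means a recorded observation cannot be silently altered after it is created; corrections would be new records, which keeps provenance traceable.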
Capabilities
Organizational Approach
Producers
– Design
– Create
– Manage
– Publish
Consumers
– Find/Discover
– Access
– Interpret
– Integrate
– Analyze
– Report/Present/Visualize
Community
Consumers
• Scientist -- obtain content and information about content, run data analyses
• Aggregator -- organizations such as GBIF
• Application Programmer -- write analyses, obtain data, query
• Citizens -- K-12, analysis results, aggregated data
Producers
• Scientist -- collect data
• Application Programmer -- build tools for publishing and sharing data
• Information Manager -- database administrator, schema designer
• Data Entry Personnel -- enter data into the system
• Citizen Scientists -- e.g., collecting census data
Producers:
• Design- Standardize schema components, catalogs of properties and attributes
so they can be shared.
• Create- Import/export of data assets should
preserve data integrity and maintain data ownership.
• Manage- Develop flexible tools for resource
management and access control.
• Publish- Enable structured data and provenance
descriptions that can be published via common
data exchange formats.
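For the Create and Publish capabilities, a minimal sketch of bundling an observation with its provenance in a common exchange format follows. JSON is assumed here only as an example format; the field names, owner, and dates are hypothetical. The round-trip check illustrates the requirement that import/export preserve data integrity.

```python
import json
from datetime import datetime, timezone


def to_exchange_record(observation: dict, provenance: dict) -> str:
    """Bundle an observation with its provenance and serialize it as JSON."""
    record = {"observation": observation, "provenance": provenance}
    return json.dumps(record, sort_keys=True, indent=2)


observation = {
    "entity": "Cardinalis cardinalis",
    "characteristic": "abundance",
    "unit": "count",
    "value": 3,
    "context": {"lat": 42.48, "lon": -76.45, "date": "2006-07-15"},  # placeholder
}
provenance = {
    "owner": "example-project",  # hypothetical data owner
    "protocol": "stationary point count",
    "published": datetime(2007, 7, 9, tzinfo=timezone.utc).isoformat(),
}

text = to_exchange_record(observation, provenance)

# Integrity check: importing the exported record must reproduce it exactly.
round_trip = json.loads(text)
assert round_trip == {"observation": observation, "provenance": provenance}
print(text)
```

Keeping provenance in the same record as the data, rather than in a separate file, means ownership information travels with every copy that is exchanged.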
Consumers:
• Discover- Facilitate discovery by providing access to content themes,
context, provenance, and attributes via semantic searches.
• Access- Standardize data access processes via
exchange schemas and improved machine-to-machine
communication.
• Integrate- Capture relationships and mediate
differences between datasets to allow integration.
• Analyze- Enable scientific workflow processes
to explore patterns and test hypotheses.
• Report- Provide resources for data visualizations
and result publication.
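Integration often means mediating differences between datasets, such as column names and units. A minimal sketch, assuming two hypothetical water-temperature datasets and a hand-written crosswalk; in practice such mappings would more likely come from shared ontologies than from a hard-coded dictionary.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]

# Two hypothetical datasets record the same characteristic under different
# column names and units.
dataset_a: List[Row] = [{"site": "A1", "water_temp_F": 59.0}]
dataset_b: List[Row] = [{"site": "B7", "temperature_C": 12.5}]

# Crosswalk: source column -> (shared characteristic name, converter to shared unit).
CROSSWALK: Dict[str, Tuple[str, Callable[[float], float]]] = {
    "water_temp_F": ("water_temperature_C", lambda f: (f - 32.0) * 5.0 / 9.0),
    "temperature_C": ("water_temperature_C", lambda c: c),
}


def harmonize(rows: List[Row]) -> List[Row]:
    """Rename and convert columns listed in CROSSWALK; copy other columns unchanged."""
    out: List[Row] = []
    for row in rows:
        new_row: Row = {}
        for key, value in row.items():
            if key in CROSSWALK:
                target, convert = CROSSWALK[key]
                new_row[target] = convert(value)
            else:
                new_row[key] = value
        out.append(new_row)
    return out


merged = harmonize(dataset_a) + harmonize(dataset_b)
for row in merged:
    print(row)
```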
Requirements
Organizational Approach
Data Model Characteristics
Accurate portrayal of observational data via well-defined
terms that conform to existing standards.
• Data Model Items
• Extensibility
Data Model Items:
• Organization of observations by survey-type, protocol, project, data set,
data stream, or particular entity must be maintained.
• Specific relationships among controlled terms (e.g., taxon names, categorical
response values such as H/M/L) must be identified.
• Context
must unambiguously represent space, time,
location, and other relevant aspects of the data,
with some indication of uncertainty.
• Provenance and ownership information must be maintained
at the atomic level of data precision.
• Experimental design and methods must be described.
• Collection Event must be maintained.
• Measurements must be accurately maintained.
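Two of these items, controlled terms and context with uncertainty, can be enforced when data are entered. A minimal sketch, assuming an enumeration for the H/M/L categorical responses and a point location with a positional uncertainty radius in metres; the names and range checks are illustrative assumptions, not part of any reviewed standard.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Controlled categorical response values (H/M/L)."""
    HIGH = "H"
    MEDIUM = "M"
    LOW = "L"


@dataclass(frozen=True)
class SpatialContext:
    """Point location plus an explicit indication of positional uncertainty."""
    latitude: float
    longitude: float
    uncertainty_m: float  # radius of positional uncertainty in metres

    def __post_init__(self) -> None:
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude out of range")
        if self.uncertainty_m < 0:
            raise ValueError("uncertainty must be non-negative")


def parse_level(code: str) -> Level:
    """Map a raw entry to the controlled vocabulary; raises ValueError otherwise."""
    return Level(code.strip().upper())


site = SpatialContext(latitude=42.48, longitude=-76.45, uncertainty_m=30.0)
print(parse_level("m"))  # Level.MEDIUM
```

Rejecting out-of-vocabulary values at entry is cheaper than reconciling free-text categories later during integration.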
Extensibility
• Support extensions that are specific to sub-disciplines.
• Allow references to ontologies from different domains.
• Allow terms and definitions to be packaged for re-use.
• Allow “competing” domain extensions.
• Extensions should not impact the core
model.
• Allow extensions to be related
(crosswalks).
• Allow extensions to be further extended.
• Allow “core” extensions for a
particular community.
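The extensibility requirements can be illustrated with a small registry. A minimal sketch under stated assumptions: extensions are dictionaries of terms, the core terms are fixed and may not be redefined, an extension may name a parent it extends, and crosswalks map a term in one extension to a term in another. All extension names and terms are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Core model terms; extensions may add terms but never redefine these.
CORE_TERMS = frozenset({"Entity", "Characteristic", "Value", "Context"})


@dataclass
class Extension:
    name: str
    terms: Dict[str, str]                 # term -> definition
    parent: Optional["Extension"] = None  # an extension may extend another one

    def all_terms(self) -> Dict[str, str]:
        """Terms defined here plus those inherited from parent extensions."""
        inherited = self.parent.all_terms() if self.parent else {}
        return {**inherited, **self.terms}


@dataclass
class Registry:
    extensions: Dict[str, Extension] = field(default_factory=dict)
    # (extension, term) -> (other extension, equivalent term)
    crosswalks: Dict[Tuple[str, str], Tuple[str, str]] = field(default_factory=dict)

    def register(self, ext: Extension) -> None:
        clash = CORE_TERMS.intersection(ext.terms)
        if clash:
            raise ValueError(f"{ext.name} redefines core terms: {sorted(clash)}")
        self.extensions[ext.name] = ext

    def add_crosswalk(self, src: str, src_term: str, dst: str, dst_term: str) -> None:
        for ext_name, term in ((src, src_term), (dst, dst_term)):
            if term not in self.extensions[ext_name].all_terms():
                raise KeyError(f"{term!r} is not defined in {ext_name!r}")
        self.crosswalks[(src, src_term)] = (dst, dst_term)


registry = Registry()
birds = Extension("birds", {"BreedingCode": "evidence of breeding at a site"})
raptors = Extension("raptors", {"NestHeight": "height of nest above ground"}, parent=birds)
atlas = Extension("atlas", {"BreedingEvidence": "breeding status category"})  # competing
for ext in (birds, raptors, atlas):
    registry.register(ext)

registry.add_crosswalk("raptors", "BreedingCode", "atlas", "BreedingEvidence")
print(sorted(raptors.all_terms()))  # ['BreedingCode', 'NestHeight']
```

Competing extensions coexist in the registry; a crosswalk relates them without forcing either community to adopt the other's terms, and the core term set is never modified.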
Conclusion:
Compare the model being developed by the BIS TDWG Observations and
Specimen Records Interest Group with OGC and SDD.
For example, SDD may provide a vocabulary for
Characteristics/Properties in the Observations model.
Acknowledgements:
Mark Schildauer
Matt Jones
Paul Allen