Transcript Slide 1
Observation Data Model: Creating a Unified Model for Observational Data in the Ecological and Environmental Sciences
Steve Kelling, Cornell Lab of Ornithology

There is an enormous variety of observation data. A major challenge is joining observation data gathered in different projects.
Nitrogen, Skunk, Ocean Current, Trilobites

Integrating Variables to Model Species Occurrence
When multiple independent variables (e.g., land cover, human density, climate) are incorporated into species distribution and abundance analyses, accurate estimates of species occurrence can be obtained.
Top: Distribution of Northern Cardinal during the summer of 2006. Over 200 independent variables were used in the model.
Right: Confidence intervals provide an indication of difference between locations.

Observations Workshop
Santa Barbara / NCEAS, 9-11 July 2007
Mark Schildhauer: “I agree with what I said here”

A variety of observational data models were analyzed at the meeting.
Organization: Short description of observational data modeling approach
SEEK: The SEEK extensible observations ontology (OBOE) focuses on capturing the essential information about observations required to comprehensively discover and integrate heterogeneous ecological data.
NatureServe: The NatureServe Observational Data Standard focuses on developing an XML Schema for specimen-oriented survey data to improve data aggregation and sharing within and between organizations.
ALTER-NeT: The European ALTER-NeT Ontology, CEDEX, focuses on developing an object-oriented data system for cataloguing observational ecological data while retaining semantic information to aid data discovery and analysis.
SPIRE: The Spire initiative focuses on developing domain-independent, general-purpose ontologies to enable annotation of the contents and structure of existing ecological databases, with an initial focus on taxonomy and food web issues (ETHAN).
OGC: The OGC Observation and Measurement Standard focuses on developing a generic conceptual XML Schema for representing all aspects of observation and measurement data.
VSTO: The Virtual Solar-Terrestrial Observatory focuses on building ontologies for interoperating among different existing meteorological and atmospheric metadata standards.
TDWG: TDWG is developing a “meta-model” to integrate biodiversity observations with specimen data by identifying similarities between these two data types, determining whether existing standards suffice to describe them, and, if not, developing the additional concepts needed for clarification.
ODM: The CUAHSI Observations Data Model (ODM) and associated relational database focus on storing hydrologic observations data in a system designed to optimize data retrieval for integrated analysis of information collected by multiple investigators.

Observations Workshop Summary
• Holistic, integrative, large-scale science would benefit from better data discovery, interpretation, and integration within and across disciplines.
• The workshop participants found much commonality among their approaches in modeling observational data.
• An extensible observational data model has advantages over conventional models.
• The development of a core observational data model should be domain independent and be conducted through an established standards body.

This presentation will review the following outcomes of the meeting:
• Definition of Observation
• Capabilities
• Requirements

An Observation is the Determination of the Value of a Property of some Entity in a particular Context
– Entity: thing or process or phenomenon
– Determination: the outcome of the process by which the Value of a Characteristic is measured
– Characteristic: a property that can be assigned a value
– Value: discrete or continuous quantification or qualification of a Characteristic
– Context: the setting and conditions that constrain the interpretation and applicability of a Measurement, such as space, time, or treatment property
– Observer:
– Protocol:
– Standard:

Capabilities: Organizational Approach
Producers
– Design
– Create
– Manage
– Publish
Consumers
– Find/Discover
– Access
– Interpret
– Integrate
– Analyze
– Report/Present/Visualize

Community
Consumers
• Scientist -- obtains content and information about content, runs data analyses
• Aggregator -- organizations such as GBIF
• Application Programmer -- writes analyses, obtains data, queries
• Citizens -- K-12, analysis results, aggregated data
Producers
• Scientist -- collects data
• Application Programmer -- builds tools for publishing and sharing data
• Information Manager -- database admin, schema designer
• Data Entry Personnel -- enter data into the system
• Citizen Scientists -- e.g., collecting census data

Producers:
• Design: Standardize schema components, catalogs of properties, and attributes so they can be shared.
• Create: Import/export of data assets should preserve data integrity and maintain data ownership.
• Manage: Develop flexible tools for resource management and access control.
• Publish: Enable structured data and provenance descriptions that can be published via common data exchange formats.

Consumers:
• Discover: Facilitate discovery by providing access to content themes, context, provenance, and attributes via semantic searches.
• Access: Standardize data access processes via exchange schemas and improved machine-to-machine communication.
• Integrate: Capture relationships and mediate differences between datasets to allow integration.
• Analyze: Enable scientific workflow processes to explore patterns and test hypotheses.
• Report: Provide resources for data visualizations and result publication.

Requirements: Organizational Approach
Data Model Characteristics
Accurate portrayal of observational data via well-defined terms that conform to existing standards.
• Data Model Items
• Extensibility

Data Model Items:
• Organization of observations by survey type, protocol, project, data set, data stream, or particular entity must be maintained.
• Specific relationships among controlled terms (e.g., taxon names, categorical response values such as H/M/L) must be identified.
• Context must unambiguously represent space-time location and other relevant aspects of the data, with some indication of uncertainty.
• Provenance and ownership information must be maintained at the atomic level of data precision.
• Experimental design and methods must be described.
• The Collection Event must be maintained.
• Measurements must be accurately maintained.

Extensibility
• Support extensions that are specific to sub-disciplines.
• Allow referencing to ontologies from different domains.
• Allow terms and definitions to be packaged for re-use.
• Allow “competing” domain extensions.
• Extensions should not impact the core model.
• Allow extensions to be related (crosswalks).
• Allow extensions to be further extended.
• Allow “core” extensions for a particular community.

Conclusion: Look at comparisons between the developing BIS TDWG Observations Specimen Records Interest Group model and OGC and SDD. For example, SDD may provide a vocabulary for Characteristics/Properties in the Observations model.

Acknowledgements: Mark Schildhauer, Matt Jones, Paul Allen
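
Illustrative sketch (not part of the workshop outcomes): the definition of an Observation (an Entity, a Characteristic, a Value, and a Context) and the requirement that extensions should not impact the core model could be expressed roughly as follows in Python. Every class name, field name, the `extend` helper, and all sample values here are hypothetical and chosen only for illustration; the workshop did not prescribe any implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Entity:
    """The thing, process, or phenomenon being observed."""
    name: str


@dataclass(frozen=True)
class Characteristic:
    """A property of an Entity that can be assigned a value."""
    name: str
    unit: Optional[str] = None


@dataclass(frozen=True)
class Context:
    """Setting that constrains interpretation: space, time, or treatment."""
    location: Optional[str] = None
    time: Optional[str] = None
    treatment: Optional[str] = None


@dataclass
class Observation:
    """Core record: the Value of a Characteristic of an Entity in a Context."""
    entity: Entity
    characteristic: Characteristic
    value: Any
    context: Context
    observer: Optional[str] = None
    protocol: Optional[str] = None
    # Domain extensions live in their own namespaces so that adding one
    # never changes the core fields above (hypothetical design choice).
    extensions: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def extend(self, namespace: str, **terms: Any) -> None:
        """Attach sub-discipline terms under a named extension."""
        self.extensions.setdefault(namespace, {}).update(terms)


# Made-up sample values, for illustration only.
obs = Observation(
    entity=Entity("Northern Cardinal"),
    characteristic=Characteristic("count", unit="individuals"),
    value=3,
    context=Context(location="hypothetical site A", time="2006-07-01"),
)
obs.extend("ornithology", breeding_code="hypothetical-code")
```

Keeping extensions in a separate, namespaced mapping is one way to let “competing” domain extensions coexist without touching the core fields; crosswalks between extensions would need additional structure not shown here.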