The VIRTUAL SOLAR-TERRESTRIAL OBSERVATORY

Ontologies and Semantic Applications in Earth Sciences

Peter Fox (TWC/RPI; formerly HAO/NCAR)

Thanks to many.

Projects funded by NSF/OCI and NASA/ACCESS/ESTO

20081118 Fox OOS meeting

Background

Scientists should be able to access a global, distributed knowledge base of scientific data that:
• appears to be integrated
• appears to be locally available

But… data is obtained by multiple means (models and instruments), using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) meta-data. It may be inconsistent, incomplete, evolving, and distributed.

And… there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology.

Data-types as service

Diagram labels:
• VO layer: VO App 2, VO App 3 (limited interoperability)
• Data-type services: VOTable; Web {Feature, Coverage, Mapping} Service; Simple Image Access Protocol; Simple Spectrum Access Protocol; Simple Time Access Protocol; Sensor Web Enablement: Sensor {Observation, Planning, Analysis} Service
• Annotations: lightweight semantics; limited meaning, hard coded; use the same approach; under review
• Data sources: DB 1, DB 2, DB 3, … DB n

Diagram labels:
• Added value: education, clearinghouses, disciplines, other services, etc.
• Semantic mediation layer - mid-upper-level: VO; semantic interoperability; added value; Web Serv.; VO API; mediation layer; semantic query, hypothesis and inference
• Semantic mediation layer - VSTO - low level:
– Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and …) Classes
– Maps queries to underlying data
– Generates access requests for metadata, data
• Query, access and use of data: metadata, schema, …
• Standard, or not, vocabularies and schema: DB 1, DB 2, DB 3, … DB n

20080602 Fox VSTO et al.

Semantic Web Methodology and Technology Development Process

• Establish and improve a well-defined methodology vision for Semantic Technology based application development
• Leverage any existing vocabularies

Process diagram stages: Use Case; Small Team; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy

E.g. Science and technical use cases

Science use case: find data that represent the state of the neutral atmosphere anywhere above 100 km and toward the Arctic Circle (above 45N) at any time of high geomagnetic activity.
– Extract information from the use case - encode knowledge
– Translate this into a complete query for data - inference and integration of data from instruments, indices and models

Technical use case: provide semantically-enabled, smart data query services via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter in any order and with constraints included in any combination.
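The "constraints in any order and in any combination" requirement can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the VSTO service itself: the `Record` type, the `query` function, and every catalog entry below are hypothetical names and values invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    instrument: str
    parameter: str
    time: datetime


# Hypothetical catalog entries, invented for illustration only.
CATALOG = [
    Record("instrument_a", "neutral_temperature", datetime(2005, 1, 26, 18, 0)),
    Record("instrument_a", "neutral_wind", datetime(2005, 3, 2, 6, 0)),
    Record("instrument_b", "neutral_temperature", datetime(2005, 1, 26, 19, 0)),
]


def query(records, *, instrument=None, parameter=None, start=None, end=None):
    """Return the records that satisfy every constraint that was supplied.

    Constraints left as None are ignored, so callers may combine them freely.
    """
    matches = []
    for rec in records:
        if instrument is not None and rec.instrument != instrument:
            continue
        if parameter is not None and rec.parameter != parameter:
            continue
        if start is not None and rec.time < start:
            continue
        if end is not None and rec.time > end:
            continue
        matches.append(rec)
    return matches


# Parameter only, then Date-Time plus Instrument, given in a different order.
by_parameter = query(CATALOG, parameter="neutral_temperature")
narrowed = query(
    CATALOG,
    start=datetime(2005, 1, 26),
    end=datetime(2005, 1, 27),
    instrument="instrument_a",
)
```

Keyword-only arguments are one simple way to make the order of constraints irrelevant; a real service would resolve the constraints against the ontology rather than an in-memory list.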

VSTO - semantics and ontologies in an operational environment: vsto.hao.ucar.edu, www.vsto.org

Diagram labels: Web Service; Existing OPeNDAP Service

Semantic Web Services

Semantic Web Services

The OWL document returned, which uses the VSTO ontology, can be used either syntactically or semantically.

Semantic Web Benefits

• Unified/abstracted query workflow: Parameters, Instruments, Date-Time across widely different disciplines
• Decreased input requirements for query: in one case reducing the number of selections from eight to three
• Semantic query support: by using background ontologies and a reasoner, our application has the opportunity to only expose coherent queries (portal and services)
• Semantic integration: in the past users had to remember (and maintain codes) to account for numerous different ways to combine and plot the data, whereas now semantic mediation provides the level of sensible data integration required, exposed as smart web services
– understanding of coordinate systems, relationships, data synthesis, transformations, etc.
– returns independent variables and related parameters
• A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)

VSTO: http://vsto.hao.ucar.edu, http://www.vsto.org
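As a rough illustration of "only expose coherent queries", the sketch below offers a user only those parameters that an instrument class can measure, including parameters inherited from its parent classes. It assumes a tiny hand-written class hierarchy; the class names, parameter names, and the simple subclass walk (standing in for a real reasoner) are all hypothetical and are not taken from the VSTO ontology.

```python
# Hypothetical mini-ontology: each class maps to its parent class.
SUBCLASS_OF = {
    "Interferometer": "OpticalInstrument",
    "Photometer": "OpticalInstrument",
    "OpticalInstrument": "Instrument",
    "Radar": "Instrument",
}

# Hypothetical assertions: parameters measured by every member of a class.
MEASURES = {
    "OpticalInstrument": {"EmissionIntensity"},
    "Interferometer": {"NeutralWind", "NeutralTemperature"},
    "Radar": {"ElectronDensity"},
}


def ancestors(cls):
    """Return cls followed by all of its superclasses, nearest first."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def coherent_parameters(cls):
    """Collect parameters asserted on cls or inherited from its superclasses."""
    params = set()
    for c in ancestors(cls):
        params |= MEASURES.get(c, set())
    return params


def is_coherent(cls, parameter):
    """True if a query pairing this instrument class and parameter makes sense."""
    return parameter in coherent_parameters(cls)


# A portal could populate its parameter menu from this set.
menu = sorted(coherent_parameters("Interferometer"))
```

A production system would delegate this inference to an OWL reasoner; the walk up the hierarchy here only mimics the subclass part of that reasoning.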

http://dataportal.ucar.edu/schemas/vsto_all.owl (1.0, 2.0 coming)

Fox RPI: Semantic Data Frameworks May 14, 2008

Ingest/pipelines: problem definition

Data is coming in faster, in greater volumes and outstripping our ability to perform adequate quality control

Data is being used in new ways and we frequently do not have sufficient information on what happened to the data along the processing stages to determine if it is suitable for a use we did not envision

We often fail to capture, represent and propagate manually generated information that needs to go with the data flows

Each time we develop a new instrument, we develop a new data ingest procedure and collect different metadata and organize it differently. It is then hard to use with previous projects

The task of event determination and feature classification is onerous and we don't do it until after we get the data

Use cases

• Who (person or program) added the comments to the science data file for the best vignetted, rectangular polarization brightness image from January 26, 2005 1849:09UT taken by the ACOS Mark IV polarimeter?

• What were the cloud cover and atmospheric seeing conditions during the local morning of January 26, 2005 at MLSO?

• Find all good images on March 21, 2008.

• Why are the quick look images from March 21, 2008, 1900UT missing?

• Why does this image look bad?

Provenance

• Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility
• Knowledge provenance; enrich with ontologies and ontology-aware tools
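One way to picture provenance capture is a history of events attached to each data product, which can then answer questions such as "who (person or program) added the comments". The sketch below is a minimal illustration under that assumption; the class names, agent names, and timestamps are hypothetical and do not describe the actual pipeline or PML.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceEvent:
    action: str  # e.g. "acquired", "calibrated", "commented"
    agent: str   # person or program responsible for the action
    when: str    # ISO 8601 timestamp
    note: str = ""


@dataclass
class DataProduct:
    name: str
    history: list = field(default_factory=list)

    def record(self, action, agent, when, note=""):
        """Append one provenance event to this product's history."""
        self.history.append(ProvenanceEvent(action, agent, when, note))

    def who(self, action):
        """Return the agents responsible for a given action, in order."""
        return [e.agent for e in self.history if e.action == action]


# Hypothetical history for one image; every value is invented.
image = DataProduct("example_image")
image.record("acquired", "instrument_controller", "2005-01-26T18:49:09Z")
image.record("calibrated", "calibration_program", "2005-01-26T19:05:00Z")
image.record("commented", "observer_1", "2005-01-26T19:30:00Z", "thin cloud")

commenters = image.who("commented")
```

Recording events at the time each processing stage runs, rather than reconstructing them later, is what lets such questions be answered after the fact.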

Quick look browse

Visual browse

Search and structured query

Search

Data Integration Use Case

• Determine the statistical signatures of both volcanic and solar forcings on the height of the tropopause

Detection and attribution relations…

SWEET 2.0

Semantic framework indicating how volcano and atmospheric parameters and databases can immediately be plugged in to the semantic data framework to enable data integration.

Faceted Search

Summary

• Level of ontology encoding relates to use, e.g.
– VSTO:
– SPCDIS:
– SESDI: Data integration needs higher level of curation of ontologies and mapping to data
• Languages and tools
– Rapid prototyping (PHP, Semantic MediaWiki)
– Clean and simple (RDFS, Perl and SPARQL)
– Complex and rich (Java, Protégé, Jena, Pellet, ELMO, Maven, Eclipse)

Modified GEON Solution Framework

From Data Discovery to Data Integration:
• Level 1: Data Registration at the Discovery Level, e.g. volcano location and activity
• Level 2: Data Registration at the Inventory Level, e.g. list of datasets by types, times, products
• Level 3: Data Registration at the Item Detail Level, e.g. access to individual quantities

Earth Sciences Virtual Database: a data warehouse where the schema heterogeneity problem is solved; schema-based integration

Ontology-based Data Integration

A.K. Sinha, Virginia Tech, 2006

Spare material

Example 1: Registration of Volcanic Data

SO2 emission from Kilauea east rift zone, vehicle-based (Source: HVO)

Location codes:
• U - Above the 180° turn at Holei Pali (upper Chain of Craters Road)
• L - Below Holei Pali (lower Chain of Craters Road)
• UL - Individual traverses were made both above and below the 180° turn at Holei Pali
• H - Highway 11

Abbreviations: t/d=metric tonne (1000 kg)/day, SD=standard deviation, WS=wind speed, WD=wind

Registering Volcanic Data (2)

• No explicit lat/long data
• Volcano identified by name
• Volcano ontology framework will link name to location
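Linking a volcano name to a location during registration might look like the following minimal sketch. It assumes a plain dictionary as a stand-in for the volcano ontology; the `GAZETTEER` table, the `register` function, the record fields, and the rounded Kilauea coordinates are illustrative assumptions, not part of the framework described here.

```python
# Hypothetical stand-in for the volcano ontology: name -> (lat, lon).
# The coordinates are approximate and included only for illustration.
GAZETTEER = {
    "Kilauea": (19.42, -155.29),
}


def register(record, gazetteer):
    """Return a copy of the record with lat/lon resolved from the volcano name.

    Raises KeyError if the name is unknown, so that unregistered volcanoes
    are flagged rather than silently stored without a location.
    """
    name = record["volcano"]
    if name not in gazetteer:
        raise KeyError(f"no location known for volcano {name!r}")
    lat, lon = gazetteer[name]
    return {**record, "lat": lat, "lon": lon}


# A record as it might arrive: identified by name, with no coordinates.
raw = {"volcano": "Kilauea", "location_code": "U"}
registered = register(raw, GAZETTEER)
```

The original record is left unchanged, so the name-based form and the located form can both be kept if needed.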

Registering Atmospheric Data (2)

Building blocks

• Data formats and metadata: IAU standard FITS, with SOHO keyword convention, JPEG, GIF
• Ontologies: OWL-DL and RDF
• The Proof Markup Language (PML) provides an interlingua for capturing the information agents need to understand results and to justify why they should believe the results.
• The Inference Web toolkit provides a suite of tools for manipulating, presenting, summarizing, analyzing, and searching PML, so that end users can understand information and its derivation, thereby facilitating trust in and reuse of information.
• Capturing semantics of data quality, event, and feature detection within suitable community ontology packages (SWEET, VSTO)