Transcript Document

Geoscience Knowledge Representation Using the SWEET Ontologies
Rob Raskin
Jet Propulsion Laboratory
Transforming Data into Knowledge
[Figure: progression from Data through Information to Knowledge, running from syntax to semantics, compared along these rows:
Basic elements: bytes, numbers, models, facts
Services: ingest, archive, visualize, infer, understand, predict
Storage: file, database, HDF-EOS, GIS, MIS, ontology, mind
Interoperability: syntactic (OPeNDAP, WMS/WCS) to semantic
Volume/density: high/low to low/high
Statistics: checksum, moments, descriptive, inferential
Analysis: Fourier, wavelet, EOF, SSA
Methodology: exploratory analysis, model-based mining]
What is Knowledge?
Facts, relations, meanings, contexts
Organized information
Core ingredient in “common sense”
Common understanding
In a form to apply reasoning/inference
Dynamic
Expandable
Semantic Understanding is Difficult!
Sea surface temperature: measured 3 m above surface
Sea surface temperature: measured at surface
Variable t: temperature
Variable t: time
Data quality = 5
Let’s eat, Grandma.
Let’s eat Grandma.
Time flies like an arrow.
Fruit flies like a pie.
“Mission accomplished. Major combat operations in Iraq have ended”
LA Times headline
Database vs Knowledge Base
Database
Entities and relations
Closed world
All facts included: anything not stated is false
Knowledge base
Classes and properties
Collection of facts
Captures corporate memory
Open world
Facts not stated are unknown: they may be either true or false
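The closed-world/open-world distinction can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical in-memory set of triples; the function names and sample facts are illustrative and do not come from any SWEET or PO.DAAC tool. The same missing fact yields False under database semantics but "unknown" (None) under knowledge-base semantics.

```python
from typing import Optional

# Hypothetical facts stored as (subject, predicate, object) triples.
FACTS = {
    ("AIRS", "measures", "Temperature"),
    ("Ocean", "hasSubstance", "Water"),
}

# Hypothetical facts that have been explicitly asserted to be false.
KNOWN_FALSE = {
    ("Ocean", "hasSubstance", "Granite"),
}


def closed_world_query(fact: tuple) -> bool:
    """Database semantics: anything not stored is assumed false."""
    return fact in FACTS


def open_world_query(fact: tuple) -> Optional[bool]:
    """Knowledge-base semantics: an unstated fact is unknown unless explicitly negated."""
    if fact in FACTS:
        return True
    if fact in KNOWN_FALSE:
        return False
    return None  # unknown: may be either true or false


missing = ("AIRS", "measures", "Humidity")
print(closed_world_query(missing))  # False
print(open_world_query(missing))    # None
```

Under the open-world reading, a knowledge base can only answer False when a negative fact has been stated; silence is never treated as denial.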
PO.DAAC Knowledge Bases (public access)
Components: People, Roles/Tasks, Data Processing, Data Products, Metadata, Tools/Services, Web Pages, Science Concepts, Missions, Instruments, Organizations, Applications, Announcements, Inquiries, Computers, Documents (Docushare)
Relations
People have roles
Instruments measure science parameters
Inquiries relate to data products
etc.
Example of Knowledge-Assisted Service
Yellow page lookup:
cars vs automobiles
hotels vs motels vs resorts
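A yellow-page lookup that matches only literal keywords misses listings filed under a synonym. The sketch below assumes a hypothetical hand-built table mapping terms to concepts and a hypothetical list of listings, and contrasts plain keyword matching with concept-based matching. Grouping hotels, motels, and resorts under one "Lodging" concept is an illustrative assumption; a real knowledge base might instead model them as distinct sibling classes.

```python
from typing import List, Optional

# Hypothetical concept table: each concept maps to the terms that denote it.
CONCEPTS = {
    "Automobile": {"car", "cars", "automobile", "automobiles"},
    "Lodging": {"hotel", "hotels", "motel", "motels", "resort", "resorts"},
}

# Hypothetical listings, each filed under the single word its advertiser chose.
LISTINGS = [
    ("Joe's Used Cars", "cars"),
    ("City Automobiles", "automobiles"),
    ("Harbor Motel", "motel"),
    ("Grand Hotel", "hotel"),
]


def concept_of(term: str) -> Optional[str]:
    """Return the concept a term denotes, or None if the term is not in the table."""
    term = term.lower()
    for concept, terms in CONCEPTS.items():
        if term in terms:
            return concept
    return None


def keyword_lookup(query: str) -> List[str]:
    """Literal matching: only listings filed under exactly the query word."""
    return [name for name, word in LISTINGS if word == query.lower()]


def concept_lookup(query: str) -> List[str]:
    """Knowledge-assisted matching: any listing whose word denotes the same concept."""
    concept = concept_of(query)
    if concept is None:
        return keyword_lookup(query)  # fall back to literal matching
    return [name for name, word in LISTINGS if concept_of(word) == concept]


print(keyword_lookup("cars"))  # ["Joe's Used Cars"]
print(concept_lookup("cars"))  # ["Joe's Used Cars", 'City Automobiles']
```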
Semantic-based Service Example: Google
Type into Google: “gymnasiums in Seattle”
Generates map of Seattle with dots locating gyms
Google understands that
Seattle is a place
“gymnasiums” denotes a place-based service
Google understands semantics, so the search results could also include
locations near Seattle
similar services (e.g., health clubs)
Assertion of Facts as Triples
Subject-Verb-Object representation
Flood subClassOf WeatherPhenomena
HDF subClassOf FileFormat
Pressure subClassOf PhysicalProperty
Ocean hasSubstance Water
AIRS measures Temperature
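Because every fact is a subject-predicate-object triple, a set of tuples is enough to sketch how a tool might store and query them. This minimal Python sketch stores the triples above and follows subClassOf links transitively. The FlashFlood triple is a hypothetical addition used only to show the inference, and the function names are illustrative, not part of SWEET.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

TRIPLES: Set[Triple] = {
    ("Flood", "subClassOf", "WeatherPhenomena"),
    ("HDF", "subClassOf", "FileFormat"),
    ("Pressure", "subClassOf", "PhysicalProperty"),
    ("Ocean", "hasSubstance", "Water"),
    ("AIRS", "measures", "Temperature"),
    # Hypothetical triple (not from the slide), added to show transitive inference.
    ("FlashFlood", "subClassOf", "Flood"),
}


def objects(subject: str, predicate: str, triples: Set[Triple] = TRIPLES) -> Set[str]:
    """All objects o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def superclasses(cls: str, triples: Set[Triple] = TRIPLES) -> Set[str]:
    """All classes reachable from cls by following subClassOf (transitive closure)."""
    found: Set[str] = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for parent in objects(current, "subClassOf", triples):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


print(objects("AIRS", "measures"))          # {'Temperature'}
print(sorted(superclasses("FlashFlood")))   # ['Flood', 'WeatherPhenomena']
```

No triple states that FlashFlood is a WeatherPhenomena; the tool derives it, which is the kind of "meaning" a plain keyword index cannot supply.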
Applications
Software tools can find “meaning” in resources for
Discovery
Fusion
Lineage
…
Requirements
Data products associated with objects in “science concept space”
Richer descriptions than DIFs (Directory Interchange Format records)
Data services associated with objects in “service concept space”
Richer descriptions than SERFs (Service Entry Resource Format records)
Search/fusion tools that exploit ontologies
Semantic Web Vision
Web page creators place XML tags around technical terms on web pages
XML tags point to a knowledge base where each term is “defined”
Search tools use this information to provide value-added services
Common search engines (Google) use these capabilities only minimally at present
Ontologies
Current preferred method to store “facts”
General definition: “all that is known”
Computer science definition: machine-readable definition of terms and how they relate to one another
As with a dictionary, terms are defined in terms of other terms
Provide shared understanding of concepts
Support knowledge reuse
Support machine-to-machine communication with deeper semantics than a controlled vocabulary
XML-based Ontology Languages
XML satisfies desired properties for language syntax
Readable by both humans and machines
However, there are too many possible ways that XML tags can be named and used
No standardization of XML tag meanings as in HTML (e.g., a <b></b> pair renders text in bold)
Additional standardized semantics are needed to exploit shared understanding of concepts
RDF and OWL
W3C has adopted languages that specialize XML
Resource Description Framework (RDF)
Web Ontology Language (OWL)
Languages predefine specific tags
RDF/RDFS: Class, subClassOf, Property, subPropertyOf, …
RDF and OWL form a nested collection of languages, each roughly a specialization of the preceding language with further shared understanding:
XML, RDF, RDFS, OWL Lite, OWL DL, OWL Full
Semantic Web for Earth and Environmental Terminology (SWEET)
SWEET is a concept space
Enables scalable classification of Earth system science concepts
Currently being expanded to space science
Anybody can import, expand, and specialize the work of others
No need to regenerate a physics, chemistry, or math ontology
Concept space is translatable into other languages/cultures using “sameAs” notions
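One way to picture the "sameAs" translation idea is as equivalence classes of terms: asserting that two terms are the same merges their classes, and every term in a class is a translation of every other. The sketch below is a minimal illustration using a union-find structure; the prefixed term names ("sweet:", "fr:", "es:") and the class name are hypothetical, and this is not how SWEET itself stores translations.

```python
from typing import Dict, Set


class SameAs:
    """Union-find over terms; each equivalence class holds mutually 'sameAs' terms."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, term: str) -> str:
        """Return the representative term of the class containing `term`."""
        self.parent.setdefault(term, term)
        while self.parent[term] != term:
            # Path halving: point each visited node at its grandparent.
            self.parent[term] = self.parent[self.parent[term]]
            term = self.parent[term]
        return term

    def assert_same(self, a: str, b: str) -> None:
        """Record that a and b denote the same concept (merges their classes)."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def equivalent(self, a: str, b: str) -> bool:
        return self.find(a) == self.find(b)

    def translations(self, term: str) -> Set[str]:
        """All known terms equivalent to `term`, including itself."""
        root = self.find(term)
        return {t for t in list(self.parent) if self.find(t) == root}


terms = SameAs()
terms.assert_same("sweet:Ocean", "fr:Océan")
terms.assert_same("fr:Océan", "es:Océano")
print(terms.equivalent("sweet:Ocean", "es:Océano"))  # True
```

Equivalence is transitive here by construction: the Spanish term is linked to the SWEET term only through the French one.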
SWEET Ontologies and Their Interrelationships
[Figure: SWEET ontologies grouped into integrative ontologies and faceted ontologies, covering Living Substances, Non-Living Substances, Natural Phenomena, Physical Processes, Human Activities, Earth Realm, Physical Properties, Data, Space, Time, Numerics, and Units]
SWEET as an Upper-Level Earth Science Ontology
[Figure: Math, Space, Time, Physics, and Chemistry ontologies are imported into SWEET (Property, EarthRealm, Process/Phenomena, Substance, Data), which is in turn imported by specialized domain ontologies such as Stratospheric Chemistry and Biogeochemistry]
Why an Upper-Level Ontology for Earth System Science?
Many common concepts are used across Earth science disciplines (such as properties of the Earth)
Provides common definitions for terms used in multiple disciplines or communities
Provides a common language in support of community and multidisciplinary activities
Provides common “properties” (relations) for tool developers
Reduces the burden (and barrier to entry) on creators of specialized domain ontologies
Only need to create ontologies for incremental knowledge
How SWEET was Initially Populated
Initial sources
GCMD (Global Change Master Directory)
Over 10,000 datasets
Over 1000 keywords
Data providers submit far more terms than these 1000 keywords, for “free-text” search
CF (Climate and Forecast metadata conventions)
Over 500 keywords
Very long term names, e.g., surface_downwelling_photon_spherical_irradiance_in_sea_water
Decomposed into facets
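The facet decomposition of a long CF name can be sketched as a greedy longest match over its underscore-separated tokens. The facet vocabulary, facet labels, and connective list below are hypothetical placeholders chosen for this one example; the slide does not say which SWEET classes each piece maps to. Unmatched tokens are kept with no facet so that a human can review them.

```python
from typing import List, Optional, Tuple

# Hypothetical facet vocabulary: phrase -> facet label (illustrative only).
VOCAB = {
    "surface": "Space",
    "downwelling": "Direction",
    "photon": "Substance",
    "spherical_irradiance": "PhysicalProperty",
    "sea_water": "Substance",
}

# Hypothetical connective words that carry no facet of their own.
CONNECTIVES = {"in", "at", "of"}


def decompose(name: str) -> List[Tuple[str, Optional[str]]]:
    """Split a CF-style name into (phrase, facet) pairs by greedy longest match."""
    tokens = name.split("_")
    facets: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at token i first.
        for j in range(len(tokens), i, -1):
            phrase = "_".join(tokens[i:j])
            if phrase in VOCAB:
                facets.append((phrase, VOCAB[phrase]))
                i = j
                break
        else:
            # No phrase matched: skip connectives, flag anything else for review.
            if tokens[i] not in CONNECTIVES:
                facets.append((tokens[i], None))
            i += 1
    return facets


for phrase, facet in decompose("surface_downwelling_photon_spherical_irradiance_in_sea_water"):
    print(phrase, facet)
```

Longest match matters: without it, "sea" and "water" would be looked up separately and "sea_water" would never be recognized as one substance.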
Spatial Ontology
Concepts of 0-D, 1-D, 2-D, and 3-D objects
Default coordinate system: lat/lon/up
Polygons used to store spatial extents
Spatial attributes added (population, area, etc.)
Scientific applications include geology, to represent 3-D structure
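When a spatial extent is stored as a polygon, a basic query is whether a given location falls inside it. A common technique is even-odd ray casting, sketched below. This assumes vertices given as (lat, lon) pairs and treats coordinates as planar, ignoring the antimeridian, the poles, and points lying exactly on an edge; the box is a hypothetical extent, not one taken from SWEET.

```python
from typing import List, Tuple

LatLon = Tuple[float, float]


def contains(polygon: List[LatLon], lat: float, lon: float) -> bool:
    """Even-odd ray casting: cast a ray from the point toward increasing longitude
    and count edge crossings; an odd count means the point is inside.

    Simplifying assumption: coordinates are treated as planar (no antimeridian
    or pole handling, no special treatment of points on an edge).
    """
    inside = False
    n = len(polygon)
    for k in range(n):
        lat1, lon1 = polygon[k]
        lat2, lon2 = polygon[(k + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the point's latitude can cross the ray.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


# Hypothetical rectangular extent, vertices as (lat, lon).
box = [(45.0, -125.0), (45.0, -117.0), (49.0, -117.0), (49.0, -125.0)]
print(contains(box, 47.6, -122.3))  # True
print(contains(box, 40.0, -122.0))  # False
```

The straddle test also guarantees that the division is never by zero, since horizontal edges (lat1 == lat2) are skipped.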
Numerical Ontologies
Numerics
Extents: interval, point, 0, positiveIntegers, …
Relations: lessThan, greaterThan, …
SpatialEntities
Extents: country, Antarctica, equator, inlet, …
Relations: above, northOf, …
TemporalEntities
Extents: duration, century, season, …
Relations: after, before, …
Numerical Ontologies (cont.)
Numeric concepts are defined in OWL only through the standard XML Schema (XSD) datatypes
Numerical relations defined in SWEET: lessThan, max, …
Intervals defined as restrictions on the real line
Cartesian product (multidimensional spaces) added in SWEET
Numeric ontologies used to define spatial and temporal concepts
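The notions of an interval as a restriction on the real line, a lessThan relation between intervals, and a Cartesian product for multidimensional spaces can be sketched with two small Python classes. The class and method names are hypothetical and the closed-interval convention is an assumption; SWEET expresses these notions in OWL, not Python.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Interval:
    """Closed interval [low, high] on the real line; infinite bounds are allowed."""
    low: float = -math.inf
    high: float = math.inf

    def contains(self, x: float) -> bool:
        return self.low <= x <= self.high

    def less_than(self, other: "Interval") -> bool:
        """True when every point of self lies below every point of other."""
        return self.high < other.low


@dataclass(frozen=True)
class Box:
    """Cartesian product of intervals: a region of a multidimensional space."""
    axes: Tuple[Interval, ...]

    def contains(self, point: Sequence[float]) -> bool:
        return len(point) == len(self.axes) and all(
            axis.contains(x) for axis, x in zip(self.axes, point)
        )


non_negative = Interval(low=0.0)  # restriction of the real line to x >= 0
print(non_negative.contains(-1.0))                  # False
print(Interval(0, 1).less_than(Interval(2, 3)))     # True

# Hypothetical latitude/longitude band built as a product of two intervals.
band = Box((Interval(-30.0, 30.0), Interval(-180.0, 180.0)))
print(band.contains((10.0, 50.0)))                  # True
```

The same pattern carries over to the spatial and temporal concepts: a season or a latitude band is just an interval on a time or latitude axis, and a region is a product of such intervals.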
Conceptual Ontologies
Phenomena (ElNino, Volcano, Thunderstorm, Deforestation)
Each has an associated spatial/temporal extent, EarthRealms, PhysicalProperties, etc.
Specific instances included, e.g., 1997-98 ElNino
Human Activities
Fisheries, IndustrialProcessing, Economics, Public Good
State
History or state of the planet or a component
SWEET Users
ESML - Earth Science Markup Language
ESIP - Earth Science Information Partners Federation
GEON - Geosciences Network
GENESIS - Global Environmental & Earth Science Information System
IRI - International Research Institute (Columbia)
LEAD - Linked Environments for Atmospheric Discovery
MMI - Marine Metadata Initiative
NOESIS
PEaCE - Pacific Ecoinformatics and Computational Ecology
SESDI - Semantically Enabled Science Data Integration
VSTO - Virtual Solar-Terrestrial Observatory
Collaboration Web Site
Discussion tools
Blog, wiki, moderated discussion board
Version control/configuration management
Trace dependencies on external ontologies
Tools to search for existing concepts in registered ontologies
Ontology validation procedure
W3C note is the formal submission method
Registry/discovery of ontologies
Support workflows/services for ontology development
Community Issues
Content
Standards and Conventions
Agreement on standards for use of OWL
Fuzzy representation conventions
Review Board
Maintain alignment given expansion of classes and properties
Who will oversee and maintain for perpetuity (or at least through the next funding cycle)?
ESIP Federation? ESSI?
Global Support
Provide tools to visualize and appreciate the big picture
Update/Matching Issues
No removal of terms except for spelling or factual errors
Must avoid contradictions
Additions can create redundancy if sameAs is not used
Humans must oversee “matching”
Subscription service to notify affected ontologies when changes are made
CF has established a moderator to carry out analogous additions
OWL “import” imports the entire file
Associate community with ontology terms
Community tagging
Best Practices
Keep ontologies small and modular
Be careful: owl:imports imports everything
Use higher-level ontologies where possible
Identify hierarchy of concept spaces
Model schemas
Try to keep dependencies unidirectional
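An owl:imports statement pulls in the whole imported ontology, and that ontology's own imports in turn. So it helps to compute the transitive closure of an import graph, and to check that dependencies stay one-directional (acyclic). This sketch assumes a hypothetical import graph whose ontology names are illustrative, not actual SWEET file names, and uses a depth-first search to report any cycle.

```python
from typing import Dict, List, Optional, Set

# Hypothetical import graph: each ontology lists the ontologies it imports.
IMPORTS: Dict[str, List[str]] = {
    "stratosphericChemistry": ["sweetSubstance", "sweetProcess"],
    "sweetSubstance": ["chemistry"],
    "sweetProcess": ["physics", "time"],
    "chemistry": [],
    "physics": ["math"],
    "time": ["math"],
    "math": [],
}


def import_closure(ontology: str, graph: Dict[str, List[str]] = IMPORTS) -> Set[str]:
    """Everything pulled in, directly or indirectly, by importing `ontology`."""
    seen: Set[str] = set()
    stack = [ontology]
    while stack:
        for dep in graph.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def find_cycle(graph: Dict[str, List[str]] = IMPORTS) -> Optional[List[str]]:
    """Return one import cycle as a list of names, or None if imports are acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {name: WHITE for name in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:  # back edge: dep is already on the current path
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for name in graph:
        if color[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


print(sorted(import_closure("sweetProcess")))  # ['math', 'physics', 'time']
print(find_cycle())                            # None
```

Importing the single specialized ontology here drags in six others, which is the practical reason for keeping ontologies small and modular.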