Geoscience Knowledge
Representation Using the
SWEET Ontologies
Rob Raskin
Jet Propulsion Laboratory
Transforming Data into
Knowledge
Data -> Information -> Knowledge
Basic Elements: Bytes, Numbers -> Models -> Facts
Services: Ingest, Archive -> Visualize -> Infer -> Understand, Predict
Storage: File, Database, HDF-EOS, GIS, MIS -> Ontology, Mind
Interoperability: Syntactic (OPeNDAP, WMS/WCS) -> Semantic
Volume/Density: High/Low -> Low/High
Statistics: Checksum, Moments, Descriptive -> Inferential
Analysis: Fourier, Wavelet, EOF, SSA
Methodology: Exploratory analysis -> Model-based mining
Syntax -> Semantics
What is Knowledge?
Facts, relations, meanings, contexts
Organized information
Core ingredient in “common sense”
Common understanding
In a form to apply reasoning/inference
Dynamic
Expandable
Semantic Understanding is Difficult!
Sea surface temperature: measured 3 m above surface
Sea surface temperature: measured at surface
Variable t: temperature
Variable t: time
Data quality = 5
Let’s eat, Grandma.
Let’s eat Grandma.
Time flies like an arrow.
Fruit flies like a pie.
“Mission accomplished. Major combat
operations in Iraq have ended”
LA Times headline
Database vs Knowledge Base
Database
Entities and Relations
Closed world
All facts included
Knowledge base
Classes and Properties
Collection of facts
Captures corporate memory
Open world
Facts not stated may be either true or untrue
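The closed-world and open-world assumptions can be contrasted with a minimal sketch. The fact set and the function names below are hypothetical, chosen only for illustration; a real knowledge base would typically also support explicit negation.

```python
from typing import Optional

# Hypothetical facts, for illustration only.
FACTS = {
    ("AIRS", "measures", "Temperature"),
    ("Ocean", "hasSubstance", "Water"),
}

def closed_world(triple) -> bool:
    """Database-style answer: anything not stored is taken to be false."""
    return triple in FACTS

def open_world(triple) -> Optional[bool]:
    """Knowledge-base-style answer: True if asserted, None (unknown) otherwise."""
    return True if triple in FACTS else None

query = ("AIRS", "measures", "Humidity")
print(closed_world(query))  # False: the fact is not stored
print(open_world(query))    # None: the fact may be true or untrue
```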
PO.DAAC Knowledge Bases Public access
People
Roles/Tasks
Data Processing
Data Products
Metadata
Tools/Services
Web Pages
Science Concepts
Missions
Instruments
Organizations
Applications
Announcements
Inquiries
Computers
Documents (Docushare)
Relations
People have roles
Instruments measure science parameters
Inquiries relate to data products
etc.
Example of Knowledge-Assisted Service
Yellow Page Lookup:
cars vs automobiles
Hotels vs motels vs resorts
Semantic-based Service
Example: Google
Type into Google: “gymnasiums in Seattle”
Generates map of Seattle with dots locating gyms
Google understands that
Seattle is a place
Gymnasiums is a place-based service
Google understands semantics so that the search
results also could include
locations near Seattle
Similar services (e.g., health club)
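The idea of widening a search to similar services and nearby locations can be sketched as a simple query expansion. This is not a description of how Google works; the RELATED and NEARBY tables and the expand function are hypothetical stand-ins for what a knowledge base would supply.

```python
# Hypothetical vocabulary: each term maps to terms treated as equivalent or similar.
RELATED = {
    "gymnasium": {"gym", "health club"},
    "car": {"automobile"},
    "hotel": {"motel", "resort"},
}

# Hypothetical gazetteer of places near a given place.
NEARBY = {"Seattle": {"Bellevue", "Tacoma"}}

def expand(service: str, place: str):
    """Return every (service, place) pair the expanded search would cover."""
    services = {service} | RELATED.get(service, set())
    places = {place} | NEARBY.get(place, set())
    return sorted((s, p) for s in services for p in places)

for pair in expand("gymnasium", "Seattle"):
    print(pair)
```

A plain keyword search would cover only the single pair ("gymnasium", "Seattle"); the expansion covers all nine combinations.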
Assertion of Facts as Triples
Subject-Verb-Object representation
Flood subClassOf WeatherPhenomena
HDF subClassOf FileFormat
Pressure subClassOf PhysicalProperty
Ocean hasSubstance Water
AIRS measures Temperature
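The triples on this slide can be held in a very small in-memory store. The sketch below assumes plain Python tuples rather than any particular RDF library; the match and superclasses helpers are hypothetical names, and the extra WeatherPhenomena triple in the usage lines is added only to show transitive reasoning.

```python
# The five triples from the slide, as (subject, verb, object) tuples.
TRIPLES = [
    ("Flood", "subClassOf", "WeatherPhenomena"),
    ("HDF", "subClassOf", "FileFormat"),
    ("Pressure", "subClassOf", "PhysicalProperty"),
    ("Ocean", "hasSubstance", "Water"),
    ("AIRS", "measures", "Temperature"),
]

def match(subject=None, verb=None, obj=None, triples=TRIPLES):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, v, o) for (s, v, o) in triples
        if subject in (None, s) and verb in (None, v) and obj in (None, o)
    ]

def superclasses(cls, triples=TRIPLES):
    """Follow subClassOf links transitively and return all ancestors."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for _, _, parent in match(current, "subClassOf", None, triples):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found

print(match(verb="measures"))  # what does each instrument measure?

# Hypothetical extra fact, not on the slide, to show inference across two links.
extended = TRIPLES + [("WeatherPhenomena", "subClassOf", "Phenomena")]
print(superclasses("Flood", extended))
```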
Applications
Software tools can find “meaning” in resources for
Discovery
Fusion
Lineage
…
Requirements
Data products associated with objects in “science concept space”
Data services associated with objects in “service concept space”
Richer descriptions than DIFs
Richer descriptions than SERFs
Search/fusion tools that exploit ontologies
Semantic Web Vision
Web page creators place XML tags around
technical terms on web pages
XML tags point to knowledge base where
term is “defined”
Search tools use this information to provide
value-added services
Common search engines (Google) use these
capabilities only minimally, at present
Ontologies
Current preferred method to store “facts”
General definition: “all that is known”
Computer science definition: Machine-readable definition
of terms and how they relate to one another
As with a dictionary, terms are defined in terms of other terms
Provide shared understanding of concepts
Support knowledge reuse
Support machine-to-machine communications with
deeper semantics than controlled vocabulary
XML-based Ontology
Languages
XML satisfies desired properties for language
syntax
Readable by both humans and machines
However, there are too many possible ways
that XML tags can be named and used
No standardization of XML tag meanings as in
HTML (<b> </b> pair => renders in bold)
Additional standardized semantics needed to
exploit shared understanding of concepts
RDF and OWL
W3C has adopted languages that specialize XML
Resource Description Framework (RDF)
Web Ontology Language (OWL)
Languages predefine specific tags
RDF: Class, subclass, property, subproperty, …
RDF and OWL form a nested collection of languages, each roughly
a specialization of the preceding language with further shared
understanding
XML
RDF
RDFS
OWL Lite
OWL DL
OWL Full
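Because RDF and OWL predefine tags such as owl:Class and rdfs:subClassOf, a generic XML parser is enough to pull class relations out of a file. The sketch below uses only the Python standard library; the http://example.org/demo# namespace and the two classes in DOC are assumptions made for illustration and are not taken from SWEET.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespace URIs for the predefined tags.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical ontology fragment in RDF/XML syntax.
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/demo#Flood">
    <rdfs:subClassOf rdf:resource="http://example.org/demo#WeatherPhenomena"/>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/demo#Pressure">
    <rdfs:subClassOf rdf:resource="http://example.org/demo#PhysicalProperty"/>
  </owl:Class>
</rdf:RDF>"""

def subclass_pairs(xml_text):
    """Return (class, parent) pairs found via owl:Class and rdfs:subClassOf."""
    root = ET.fromstring(xml_text)
    pairs = []
    for cls in root.iter(f"{{{OWL}}}Class"):
        child = cls.get(f"{{{RDF}}}about")
        for sub in cls.findall(f"{{{RDFS}}}subClassOf"):
            parent = sub.get(f"{{{RDF}}}resource")
            # Keep only the fragment after '#' for readability.
            pairs.append((child.split("#")[-1], parent.split("#")[-1]))
    return pairs

print(subclass_pairs(DOC))
```

The parser knows nothing about floods or pressure; it relies only on the shared meaning of the predefined tags, which is the point of layering RDF and OWL on top of XML.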
Semantic Web for Earth and
Environmental Terminology
(SWEET)
SWEET is a concept space
Enables scalable classification of Earth system science
concepts
Anybody can import, expand, and specialize the work of
others
Currently being expanded to Space science
No need to regenerate a physics, chemistry, or math ontology
Concept space is translatable into other
languages/cultures using “sameAs” notions
SWEET Ontologies and Their Interrelationships
Integrative Ontologies
Living Substances
Non-Living Substances
Faceted Ontologies
Natural Phenomena
Physical Processes
Human Activities
Earth Realm
Physical Properties
Data
Space
Time
Numerics
Units
SWEET as an Upper Level
Earth Science Ontology
Imported by SWEET: Math, Space, Time, Physics, Chemistry
SWEET: Property, EarthRealm, Process, Phenomena, Substance, Data
Specialized domains that import SWEET: Stratospheric Chemistry, Biogeochemistry
Why an Upper-Level Ontology
for Earth System Science?
Many common concepts used across Earth Science
disciplines (such as properties of the Earth)
Provides common definitions for terms used in multiple
disciplines or communities
Provides common language in support of community and
multidisciplinary activities
Provides common “properties” (relations) for tool developers
Reduced burden (and barrier to entry) on creators of
specialized domain ontologies
Only need to create ontologies for incremental knowledge
How SWEET was Initially
Populated
Initial sources
GCMD
Over 10,000 datasets
Over 1000 keywords
Data providers submit far more than the 1000 terms for “free-text”
search
CF
Over 500 keywords
Very long term names
surface_downwelling_photon_spherical_irradiance_in_sea_water
Decomposed into facets
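Decomposing a long CF name into facets can be sketched as a greedy longest-match split. The FACETS table below is a hypothetical vocabulary invented for this example; the slides do not list the actual facet assignments used in SWEET.

```python
# Hypothetical facet vocabulary (term -> facet); not the actual SWEET mapping.
FACETS = {
    "surface": "location",
    "downwelling": "direction",
    "photon": "quantity_qualifier",
    "spherical_irradiance": "property",
    "sea_water": "substance",
}

def decompose(name: str, facets=FACETS):
    """Split an underscore-delimited name into (term, facet) pairs.

    At each position the longest run of words found in the vocabulary wins;
    words that match nothing are kept with facet None.
    """
    words = name.split("_")
    result, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):
            term = "_".join(words[i:j])
            if term in facets:
                result.append((term, facets[term]))
                i = j
                break
        else:
            result.append((words[i], None))  # unrecognized word, e.g. "in"
            i += 1
    return result

name = "surface_downwelling_photon_spherical_irradiance_in_sea_water"
for term, facet in decompose(name):
    print(term, facet)
```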
Spatial Ontology
Concepts of 0-D, 1-D, 2-D, and 3-D objects
Default coordinate system: lat/lon/up
Polygons used to store spatial extents
Spatial attributes added (population, area, etc.)
Scientific applications include geology (to represent 3-D structure)
Numerical Ontologies
Numerics
Extents: interval, point, 0, positiveIntegers, …
Relations: lessThan, greaterThan, …
SpatialEntities
Extents: country, Antarctica, equator, inlet, …
Relations: above, northOf, …
TemporalEntities
Extents: duration, century, season, …
Relations: after, before, …
Numerical Ontologies (cont.)
Numeric concepts defined in OWL only through
standard XML XSD spec
Numerical relations defined in SWEET
Intervals defined as restrictions on real line
lessThan, max, …
Cartesian product (multidimensional spaces)
added in SWEET
Numeric ontologies used to define spatial and
temporal concepts
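The way numeric concepts build up to spatial ones can be sketched as follows: an interval is a restriction on the real line, lessThan is a relation between intervals, and a Cartesian product of intervals gives a multidimensional extent. The Interval class and box_contains helper are hypothetical names, and the latitude/longitude box is only an illustrative value.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Closed interval [low, high]: a restriction on the real line."""
    low: float
    high: float

    def contains(self, x: float) -> bool:
        return self.low <= x <= self.high

    def less_than(self, other: "Interval") -> bool:
        """True when every point of self lies below every point of other."""
        return self.high < other.low

def box_contains(box, point) -> bool:
    """A box is a tuple of Intervals, one per dimension (a Cartesian product)."""
    return len(box) == len(point) and all(
        iv.contains(x) for iv, x in zip(box, point)
    )

# Illustrative 2-D spatial extent built from two numeric intervals (lat, lon).
tropics = (Interval(-23.5, 23.5), Interval(-180.0, 180.0))
print(box_contains(tropics, (10.0, 45.0)))   # True
print(box_contains(tropics, (60.0, 45.0)))   # False
print(Interval(0, 1).less_than(Interval(2, 3)))  # True
```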
Conceptual Ontologies
Phenomena
(ElNino, Volcano, Thunderstorm, Deforestation)
Each has associated spatial/temporal extent, EarthRealms, PhysicalProperties, etc.
Specific instances included
e.g., 1997-98 ElNino
Human Activities
Fisheries, IndustrialProcessing, Economics, Public Good
State
History or state of planet or component
SWEET Users
ESML - Earth Science Markup Language
ESIP - Earth Science Information Partner Federation
GEON - Geosciences Network
GENESIS - Global Environmental & Earth Science Information System
IRI - International Research Institute (Columbia)
LEAD - Linked Environments for Atmospheric Discovery
MMI - Marine Metadata Initiative
NOESIS
PEaCE - Pacific Ecoinformatics and Computational Ecology
SESDI - Semantically Enabled Science Data Integration
VSTO - Virtual Solar-Terrestrial Observatory
Collaboration Web Site
Discussion tools
Version Control/Configuration Management
Trace dependencies on external ontologies
Tools to search for existing concepts in registered
ontologies
Ontology Validation Procedure
Blog, wiki, moderated discussion board
W3C note is formal submission method
Registry/discovery of ontologies
Support workflows/services for ontology development
Community Issues
Content
Standards and Conventions
Agreement on standards for use of OWL
Fuzzy representation conventions
Review Board
Maintain alignment given expansion of classes and
properties
Who will oversee and maintain in perpetuity (or at least
through the next funding cycle)?
ESIP Federation? ESSI?
Global Support
Provide tools to visualize and appreciate the big picture
Update/Matching Issues
No removal of terms except for spelling or factual
errors
Must avoid contradictions
Additions can create redundancy if sameAs not used
Humans must oversee “matching”
Subscription service to notify affected ontologies when
changes are made
CF has established moderator to carry out analogous additions
OWL “import” imports entire file
Associate community with ontology terms
Community tagging
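The redundancy issue noted above (additions without sameAs) can be sketched by grouping terms into equivalence classes from human-approved sameAs links. The term names and the equivalence_classes function are hypothetical; the grouping uses a standard union-find structure.

```python
def equivalence_classes(terms, same_as):
    """Group terms connected by sameAs links.

    terms: iterable of term names.
    same_as: iterable of (a, b) pairs; both names must appear in terms.
    Returns a sorted list of sorted groups.
    """
    parent = {t: t for t in terms}

    def find(t):
        # Walk up to the representative, shortening the path as we go.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in same_as:
        parent[find(a)] = find(b)

    groups = {}
    for t in terms:
        groups.setdefault(find(t), set()).add(t)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical terms contributed by different communities.
terms = ["SeaSurfaceTemperature", "SST", "Precipitation", "Rainfall"]
approved = [("SST", "SeaSurfaceTemperature")]  # link reviewed by a human
print(equivalence_classes(terms, approved))
```

Precipitation and Rainfall stay separate here because no reviewer has asserted that they are the same concept, which mirrors the point that humans must oversee matching.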
Best Practices
Keep ontologies small, modular
Be careful: “owl:imports” imports
everything
Use higher level ontologies where possible
Identify hierarchy of concept spaces
Model schemas
Try to keep dependencies unidirectional
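Whether dependencies are unidirectional can be checked by looking for cycles in the import graph. The sketch below is a plain depth-first search; the find_cycle function and the ontology names in the example graphs are hypothetical.

```python
def find_cycle(imports):
    """Return one import cycle as a list of names, or None if there is none.

    imports maps each ontology name to the list of ontologies it imports.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    stack = []

    def visit(node):
        state[node] = IN_PROGRESS
        stack.append(node)
        for dep in imports.get(node, []):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path: a cycle closes here.
                return stack[stack.index(dep):] + [dep]
            if dep_state == UNSEEN:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        state[node] = DONE
        return None

    for node in list(imports):
        if state.get(node, UNSEEN) == UNSEEN:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# Hypothetical layered graph: specialized domain -> upper level -> foundations.
layered = {
    "Biogeochemistry": ["SWEET"],
    "SWEET": ["Math", "Space", "Time"],
}
print(find_cycle(layered))  # None

# Hypothetical bidirectional dependency between two ontologies.
tangled = {"A": ["B"], "B": ["A"]}
print(find_cycle(tangled))  # ['A', 'B', 'A']
```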