Transcript Test

Developing Ontologies
(and more)
Peter Fox (NCAR)
ESIP Winter Meeting (TIWG)
January 9, 2008, Washington, D.C.
1
Ontology Spectrum
[Figure: the ontology spectrum, from least to most formal]
• Catalog/ID
• Terms/glossary
• Thesauri, “narrower term” relation
• Informal is-a
• Formal is-a
• Formal instance
• Frames (properties)
• Value Restrs.
• Selected Logical Constraints (disjointness, inverse, …)
• General Logical constraints
Originally from the AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness.
Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
2
Ontology - declarative knowledge
• The triple: {subject-predicate-object}
interferometer is-a optical instrument
Fabry-Perot is-a interferometer
Optical instrument has focal length
Optical instrument is-a instrument
Instrument has instrument operating mode
Data archive has measured parameter
SO2 concentration is-a concentration
Concentration is-a parameter
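The triples above can be held in a very small data structure and queried without any semantic web tooling. The following is a minimal sketch, assuming plain Python tuples stand in for RDF triples; the function names `ancestors` and `inherited_properties` are illustrative, not part of any library, and the terms are lower-cased for consistency.

```python
# Triples from the slide, stored as (subject, predicate, object) tuples.
TRIPLES = {
    ("interferometer", "is-a", "optical instrument"),
    ("Fabry-Perot", "is-a", "interferometer"),
    ("optical instrument", "has", "focal length"),
    ("optical instrument", "is-a", "instrument"),
    ("instrument", "has", "instrument operating mode"),
    ("data archive", "has", "measured parameter"),
    ("SO2 concentration", "is-a", "concentration"),
    ("concentration", "is-a", "parameter"),
}


def ancestors(term, triples=TRIPLES):
    """Return every class reachable from `term` by following is-a links."""
    found = []
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "is-a" and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


def inherited_properties(term, triples=TRIPLES):
    """Collect 'has' objects declared on the term or on any of its ancestors."""
    classes = [term] + ancestors(term, triples)
    return sorted(o for s, p, o in triples if p == "has" and s in classes)


print(ancestors("Fabry-Perot"))
print(inherited_properties("Fabry-Perot"))
```

Following the is-a chain is what lets a Fabry-Perot pick up "focal length" and "instrument operating mode" even though neither is stated on it directly.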
3
Semantic Web Layers
4
http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/
Terminology
• Ontology (n.d.). The Free On-line Dictionary of Computing.
http://dictionary.reference.com/browse/ontology
– An explicit formal specification of how to represent the objects,
concepts and other entities that are assumed to exist in some area
of interest and the relationships that hold among them.
• Semantic Web
– An extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation (www.semanticweb.org)
– Primer: http://www.ics.forth.gr/isl/swprimer/
• Languages
– OWL 1.0 (Lite, DL, Full) - Web Ontology Language (W3C)
– RDF - Resource Description Framework (W3C)
– OWL-S/SWSL - Web Services (W3C)
– WSMO/WSML - Web Services (EC/W3C)
– SWRL - Semantic Web Rule Language, RIF - Rule Interchange Format
– Editors: Protégé, SWOOP, CoE, VOM, Medius, SWeDE, …
5
OWL and RDF
• OWL
– Lite
– DL
– Full
• RDF
• Services
– OWL-S
– SWSL
– WSML
– SAWSDL - (WSDL-S)
• Rules
– SWRL
6
Developing Ontologies
• Approach:
– Bottom-up
– Top-down (upper-level or foundational)
– Mid-level (use case)
• Using tools
• Coding and testing
• Iterating
• Maintaining and evolving (curation, preservation)
7
GRDDL - bottom up
• GRDDL - Gleaning Resource Descriptions
from Dialects of Languages
• In short: “XML/XHTML (for example) into RDF via XSLT”
• Good support, e.g. Jena
• Handles microformats
• Active community
• How to categorize, use, re-use (parts of)?
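To make the bottom-up idea concrete, the sketch below gleans triples from a small XHTML fragment. It is only an illustration: GRDDL proper names an XSLT transform, whereas this example swaps in Python's standard `xml.etree.ElementTree` as a stand-in because the standard library has no XSLT engine. The hCard-style markup, the `id` value, and the function name `glean` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML input using microformat-style class attributes.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="vcard" id="pfox">
      <span class="fn">Peter Fox</span>
      <span class="org">NCAR</span>
    </div>
  </body>
</html>"""

NS = {"h": "http://www.w3.org/1999/xhtml"}


def glean(xhtml_text):
    """Turn each vcard block into (subject, predicate, object) triples."""
    root = ET.fromstring(xhtml_text)
    triples = []
    for card in root.iterfind(".//h:div[@class='vcard']", NS):
        subject = "#" + card.get("id")
        for span in card.iterfind("h:span", NS):
            # The class attribute plays the role of the predicate.
            triples.append((subject, span.get("class"), span.text))
    return triples


for triple in glean(XHTML):
    print(triple)
```

The open question on the slide still applies: the gleaned predicates ("fn", "org") carry no agreed meaning until they are mapped to a shared vocabulary.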
8
Collecting
• RDFa extends XHTML by:
– extending the link and meta elements to include child elements
– adding metadata to any element (a bit like the class attribute in microformats, but via dedicated properties)
– It is very similar to microformats, but with more rigor:
• it is a general framework (instead of an “agreement” on the meaning of, say, a class attribute value)
• terminologies can be mixed more easily
• ATOM (used with RSS)
9
Foundational Ontologies
CONTENTS
• General concepts and relations that apply in all domains
– physical object, process, event,…, inheres, participates,…
• Rigorously defined
– formal logic, philosophical principles, highly structured
• Examples
– DOLCE, BFO, GFO, SUMO, CYC, (Sowa)
10
Courtesy: Boyan Brodaric
Foundational Ontologies
PURPOSE: help integrate domain ontologies
“…and then there was one…”
[Figure: a foundational ontology above the domain ontologies: Geology, Struc, Rock, Geophysics, Marine, Water, Planetary]
11
Courtesy: Boyan Brodaric
Foundational Ontologies
PURPOSE: help organize domain ontologies
“…a place for everything, and everything in its place…”
[Figure: domain concepts placed within the foundational ontology: shale, rock, formation, lithification]
12
Courtesy: Boyan Brodaric
Problem scenario
• Little work done on linking foundational ontologies with geoscience ontologies
• Such linkage might benefit various scenarios requiring cross-disciplinary knowledge, e.g.:
– water budgets: groundwater (geology) and surface water (hydro)
– hazards risk: hazard potential (geology, geophysics) and items at threat (infrastructure, people, environment, economic)
– health: toxic substances (geochemistry) and people, wildlife
– many others…
13
Courtesy: Boyan Brodaric
DOLCE
14
DOLCE + SWEET
[Figure: DOLCE categories aligned with SWEET classes; legend: “= SWEET”, “< SWEET”]
DOLCE : SWEET
Physical-body : BodyofGround, BodyofWater,…
Material-Artifact : Infrastructure, Dam, Product,…
Physical-Object : LivingThing, MarineAnimal
Amount-of-Matter : Substance
Activity : HumanActivity
Physical-Phenomenon : Phenomena
Process : Process
State : StateOfMatter
Quality : Quantity, Moisture,…
Physical-Region : Basalt,…
Temporal-Region : Ordovician,…
Benefits
• full coverage
• rich relations
• home for orphans
• single superclasses
Issues
• individuals (e.g. Planet Earth)
• roles (contaminant)
• features (SeaFloor)
15
Courtesy: Boyan Brodaric
Conclusions
• Surprisingly good fit amongst ontologies
– so far: no show-stopper conflicts, a few difficult conflicts
• DOLCE richness benefits geoscience ontologies
– good conceptual foundation helps clear some existing problems
• Unresolved issues in modeling science entities
– modeling classifications, interpretations, theories, models,…
• Same procedure with GeoSciML
16
Courtesy: Boyan Brodaric
SUMO - Suggested Upper Merged Ontology
• Physical
  – Object
    • SelfConnectedObject
      – ContinuousObject
      – CorpuscularObject
    • Collection
  – Process
• Abstract
  – SetClass
    • Relation
  – Proposition
  – Quantity
    • Number
    • PhysicalQuantity
  – Attribute
17
18
19
Using SNAP/ SPAN
20
GeoSciOnt?
21
22
Using SWEET
• Plug-in (import) domain detailed modules
• Lots of classes, few relations (properties)
23
Mix-n-Match
• The IRI example:
– Collect a lot of different ontologies representing
different terms, levels of concepts, etc. into a
base form: RDF
– See Benno’s talk in session 1b.
• MMI
• Others
24
[Figure labels: NC basic attributes; CF attributes; IRIDL attributes/objects; CF data objects; SWEET Ontologies (OWL); CF Standard Names (RDF object); Location; CF Standard Names As Terms; IRIDL Terms; SWEET as Terms; Search Terms; Gazetteer Terms]
25
Blumenthal
IRI RDF Architecture
[Figure labels: MMI; Data Servers; Ontologies; JPL; bibliography; Start Point; Standards Organizations; RDF Crawler; RDFS Semantics; Owl Semantics; SWRL Rules; SeRQL CONSTRUCT; Sesame; Location Canonicalizer; Time Canonicalizer; Search Queries]
26
Blumenthal
Search Interface
Mid-Level: Developing ontologies
• Use cases and small team (7-8; 2-3 domain experts, 2 knowledge experts, 1 software engineer, 1 facilitator, 1 scribe)
• Identify classes and properties (leverage controlled vocab.)
– Start with narrower terms, generalize when needed or possible
– Adopt a suitable conceptual decomposition (e.g. SWEET)
– Import modules when concepts are orthogonal
• Review, vet, publish
• Only code them (in RDF or OWL) when needed (CMAP, …)
• Ontologies: small and modular
27
Use Case example
• Plot the neutral temperature from the Millstone-Hill
Fabry Perot, operating in the vertical mode during
January 2000 as a time series.
• Objects:
– Neutral temperature is a (temperature is a) parameter
– Millstone Hill is a (ground-based observatory is a) observatory
– Fabry-Perot is a interferometer is a optical instrument is a instrument
– Vertical mode is a instrument operating mode
– January 2000 is a date-time range
– Time is a independent variable/ coordinate
– Time series is a data plot is a data product
28
Class and property example
• Parameter
– Has coordinates (independent variables)
• Observatory
– Operates instruments
• Instrument
– Has operating mode
• Instrument operating mode
– Has measured parameters
• Date-time interval
• Data product
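One way to see how these classes and properties hang together is to write them down as plain data classes and populate them from the Millstone Hill use case. This is a sketch only, assuming Python dataclasses as a stand-in for OWL classes; the attribute names and the helper `measured_parameters` are illustrative choices, not taken from any existing ontology encoding.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str
    coordinates: List[str] = field(default_factory=list)  # independent variables


@dataclass
class OperatingMode:
    name: str
    measured_parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Instrument:
    name: str
    operating_modes: List[OperatingMode] = field(default_factory=list)


@dataclass
class Observatory:
    name: str
    instruments: List[Instrument] = field(default_factory=list)


def measured_parameters(observatory, instrument_name, mode_name):
    """Names of parameters the given instrument measures in the given mode."""
    return [
        parameter.name
        for instrument in observatory.instruments
        if instrument.name == instrument_name
        for mode in instrument.operating_modes
        if mode.name == mode_name
        for parameter in mode.measured_parameters
    ]


# Individuals taken from the use case on the earlier slide.
neutral_temperature = Parameter("neutral temperature", coordinates=["time"])
vertical = OperatingMode("vertical", [neutral_temperature])
fabry_perot = Instrument("Fabry-Perot", [vertical])
millstone_hill = Observatory("Millstone Hill", [fabry_perot])

print(measured_parameters(millstone_hill, "Fabry-Perot", "vertical"))
```

The property chain observatory, instrument, operating mode, parameter is exactly the path the use case walks to decide what can be plotted.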
29
30
31
32
Higher level use case
• Find data which represents the state of the
neutral atmosphere above 100km, toward the
arctic circle at any time of high geomagnetic
activity
33
Translating the Use-Case - nonmonotonic?
Input
Physical properties: State of neutral atmosphere
Spatial:
• Above 100km
• Toward arctic circle (above 45N)
Conditions:
• High geomagnetic activity
Action: Return Data

GeoMagneticActivity has ProxyRepresentation
GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere)
Kp is a GeophysicalIndex
hasTemporalDomain: “daily”
hasHighThreshold: xsd_number = 8
Date/time when Kp >= 8

Specification needed for query to CEDARWEB
Instrument
Parameter(s)
Operating Mode
Observatory
Date/time
Return-type: data
34
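The step from "high geomagnetic activity" to concrete dates can be sketched as a threshold test on the proxy index. The example below assumes a daily Kp series held in a dictionary; the values are invented for illustration and are not real observations, and `high_activity_dates` is a hypothetical helper, not part of CEDARWEB.

```python
from datetime import date

HIGH_THRESHOLD = 8  # hasHighThreshold from the slide

# Hypothetical daily Kp values, made up for this example.
DAILY_KP = {
    date(2000, 1, 10): 3,
    date(2000, 1, 11): 8,
    date(2000, 1, 12): 9,
    date(2000, 1, 13): 5,
}


def high_activity_dates(daily_kp, threshold=HIGH_THRESHOLD):
    """Return the dates on which the index meets or exceeds the threshold."""
    return sorted(day for day, kp in daily_kp.items() if kp >= threshold)


for day in high_activity_dates(DAILY_KP):
    print(day.isoformat())
```

The resulting dates fill the Date/time slot of the query specification; the remaining slots (instrument, parameter, operating mode, observatory) come from the other parts of the ontology.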
Translating the Use-Case ctd.
Input
Physical properties: State of neutral atmosphere
Spatial:
• Above 100km
• Toward arctic circle (above 45N)
Conditions:
• High geomagnetic activity
Action: Return Data

NeutralAtmosphere is a subRealm of TerrestrialAtmosphere
hasPhysicalProperties: NeutralTemperature, Neutral Wind, etc.
hasSpatialDomain: [0,360],[0,180],[100,150]
hasTemporalDomain:
NeutralTemperature is a Temperature (which) is a Parameter
GeoMagneticActivity has ProxyRepresentation
GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere)
Kp is a GeophysicalIndex
hasTemporalDomain: “daily”
hasHighThreshold: xsd_number = 8
Date/time when Kp >= 8
FabryPerotInterferometer is a Interferometer, (which) is a Optical Instrument (which) is a Instrument
hasFilterCentralWavelength: Wavelength
hasLowerBoundFormationHeight: Height
ArcticCircle is a GeographicRegion
hasLatitudeBoundary:
hasLatitudeUpperBoundary:

Specification needed for query to CEDARWEB
Instrument
Parameter(s)
Operating Mode
Observatory
Date/time
Return-type: data
35
Tools - Using Protégé
36
Creating Ontologies - visual
• UML - new release of ODM/MOF
– Ontology Definition Metamodel/Meta Object
Facility (OMG) for UML
– Provides standardized notation
• CMAP Ontology Editor (concept mapping tool
from IHMC)
– Drag/drop visual development of classes, subclass (is-a) and property relationships
– Reads and writes OWL
– Formal convention (OWL/RDF tags, etc.)
• White board, text file
37
Using CMAP/COE
38
39
Is OWL the only option? No…
• SKOS - Simple Knowledge Organization System
• Annotations (RDFa)
• Atom
• Natural Language (read results from a web search and transform to a usable form)
– CL (common logic)
– Rabbit, e.g. ShellfishCourse is a Meal Course that (if has drink) always has drink Potable Liquid that has Full body and which either has Moderate or Strong flavour
– PENG (processable English)
40
Is OWL the only option II? No…
• Natural Language (NL)
– Read results from a web search and transform to a
usable form
– Find/filter out inconsistencies, concepts/relations that
cannot be represented
• Popular options
– CLCE (common logic controlled english)
– Rabbit, e.g. ShellfishCourse is a Meal Course that (if has
drink) always has drink Potable Liquid that has Full body
and which either has Moderate or Strong flavour
– PENG (processable English)
• Really need PSCI - process-able science
41
Creating Ontologies - verbal
• Translating use cases
• E.g. Find data which represents the state of
the neutral atmosphere above 100km, toward
the arctic circle at any time of high
geomagnetic activity
• Can this be expressed as an ontology?
– CLCE, Rabbit, PENG, Sydney syntax
• Notice something about the next examples?
42
Sydney syntax
If X has Y as a father then Y is the only father of X.
The class person is equivalent to male or female, and male and female are mutually exclusive.
is equivalent to:
The classes male and female are mutually exclusive. The class person is fully defined as anything that is a male or a female.
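The Sydney syntax statements above amount to three checkable constraints: a functional property (only one father), disjoint classes, and a class defined as a union. The sketch below tests them over a handful of made-up individuals using Python sets; the names and the helper functions are assumptions for illustration and do not come from any Sydney syntax tool.

```python
# Hypothetical individuals, invented for this example.
MALE = {"Adam", "Bob"}
FEMALE = {"Carol"}
PERSON = {"Adam", "Bob", "Carol"}
HAS_FATHER = [("Bob", "Adam"), ("Carol", "Adam")]


def mutually_exclusive(first, second):
    """True when no individual belongs to both classes."""
    return first.isdisjoint(second)


def equivalent_to_union(cls, *parts):
    """True when the class is exactly the union of the given classes."""
    return cls == set().union(*parts)


def is_functional(pairs):
    """True when every subject has at most one distinct value."""
    seen = {}
    for subject, value in pairs:
        if seen.setdefault(subject, value) != value:
            return False
    return True


print(mutually_exclusive(MALE, FEMALE))
print(equivalent_to_union(PERSON, MALE, FEMALE))
print(is_functional(HAS_FATHER))
```

A reasoner does more than this (it infers as well as checks), but the constraints being stated are the same ones.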
43
PENG - Processable English
1. If X is a research programmer then X is a programmer.
2. Bill Smith is a research programmer who works at the CLT.
3. Who is a programmer and works at the CLT?
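The three PENG sentences correspond to a rule, a pair of facts, and a query. A minimal sketch of how they could be evaluated, assuming facts are stored as tuples and the rule is applied by simple forward chaining; `apply_rules` and `who` are hypothetical helpers, not part of PENG.

```python
# Sentence 1: a rule of the form "if X is a <premise> then X is a <conclusion>".
RULES = [("research programmer", "programmer")]

# Sentence 2: two facts about Bill Smith.
FACTS = {
    ("Bill Smith", "is a", "research programmer"),
    ("Bill Smith", "works at", "the CLT"),
}


def apply_rules(facts, rules):
    """Forward-chain the is-a rules until no new facts appear."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for subject, predicate, obj in list(derived):
                new_fact = (subject, "is a", conclusion)
                if predicate == "is a" and obj == premise and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


def who(facts, kind, workplace):
    """Sentence 3: who is a <kind> and works at <workplace>?"""
    return sorted(
        subject
        for subject, predicate, obj in facts
        if predicate == "is a"
        and obj == kind
        and (subject, "works at", workplace) in facts
    )


print(who(apply_rules(FACTS, RULES), "programmer", "the CLT"))
```

Without the rule the query returns nothing, since "programmer" is never stated directly about Bill Smith.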
44
CLCE - Common Logic Controlled English
CLCE: If a set x is the set of (a cat, a dog, and an elephant), then the cat is an element of x, the dog is an element of x, and the elephant is an element of x.
PC: ~(∃x:Set)(∃x1:Cat)(∃x2:Dog)(∃x3:Elephant)(Set(x,x1,x2,x3) ∧ ~(x1∈x ∧ x2∈x ∧ x3∈x))
45
Use Case
• Provide a decision support capability for an
analyst to determine an individual’s
susceptibility to avian flu without having to be
precise in terminology (-nyms)
46
47
48
Using ThManager
49
Services
• Ontologies of services provide answers to:
– What does the service provide for prospective clients? The answer to this question is given in the "profile," which is used to advertise the service. To capture this perspective, each instance of the class Service presents a ServiceProfile.
– How is it used? The answer to this question is given in the "process model." This perspective is captured by the ServiceModel class. Instances of the class Service use the property describedBy to refer to the service's ServiceModel.
– How does one interact with it? The answer to this question is given in the "grounding." A grounding provides the needed details about transport protocols. Instances of the class Service have a supports property referring to a ServiceGrounding.
50
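The three questions map onto three properties of a Service: presents, describedBy, and supports. A small sketch of that structure, assuming a Python dataclass in place of the OWL class; the service name and the three values are hypothetical placeholders, and `answer` is an illustrative helper.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    presents: str      # the ServiceProfile: what the service provides
    described_by: str  # the ServiceModel: how the service is used
    supports: str      # the ServiceGrounding: how to interact with it


# Maps each question from the slide to the property that answers it.
QUESTIONS = {
    "What does the service provide?": "presents",
    "How is it used?": "described_by",
    "How does one interact with it?": "supports",
}


def answer(service, question):
    """Look up the description that answers the given question."""
    return getattr(service, QUESTIONS[question])


# Hypothetical service instance, invented for this example.
plot_service = Service(
    name="TimeSeriesPlotService",
    presents="TimeSeriesPlotProfile",
    described_by="TimeSeriesPlotProcessModel",
    supports="TimeSeriesPlotGrounding",
)

for question in QUESTIONS:
    print(question, "->", answer(plot_service, question))
```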
Developing a service ontology
• Use case: find and display in the same projection,
sea surface temperature and land surface
temperature from a global climate model.
• Classes/ concepts:
– Temperature
– Surface (sea/ land)
– Model
– Climate
– Global
– Projection
– Display …
51
Service ontology
• Climate model is a model
• Model has domain
• Climate Model has component representation
• Land surface is-a component representation
• Ocean is-a component representation
• Sea surface is part of ocean
• Model has spatial representation (and temporal)
• Spatial representation has dimensions
• Latitude-longitude is a horizontal spatial representation
• Displaced pole is a horizontal spatial representation
• Ocean model has displaced pole representation
• Land surface model has latitude-longitude representation
• Lambert conformal is a geographic spatial representation
• Reprojection is a transform between spatial representation
• ….
52
Service ontology
• A sea surface model has grid representation displaced pole and land surface model has grid representation latitude-longitude, and both must be transformed to Lambert conformal for display
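Once the grid representation of each model is stated, working out which reprojections the display step needs is a simple comparison. A sketch under the assumption that each model has exactly one grid representation; the dictionary layout and the function `reprojections_needed` are illustrative only.

```python
# Grid representation of each model component, as stated on the slide.
MODEL_GRID = {
    "sea surface model": "displaced pole",
    "land surface model": "latitude-longitude",
}

DISPLAY_PROJECTION = "Lambert conformal"


def reprojections_needed(model_grid, target):
    """List (model, source grid, target grid) for every model not already on the target."""
    return [
        (model, grid, target)
        for model, grid in model_grid.items()
        if grid != target
    ]


for model, source, target in reprojections_needed(MODEL_GRID, DISPLAY_PROJECTION):
    print(f"{model}: reproject {source} -> {target}")
```

A service ontology lets this decision be made from descriptions alone, before any data is fetched.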
53
Best practices
• Ontologies/ vocabularies must be shared and
reused - swoogle.umbc.edu, www.planetont.org
• Examine ‘core vocabularies’ to start with
– SKOS Core: about knowledge systems
– Dublin Core: about information resources, digital libraries, with extensions for rights, permissions, digital rights management
– FOAF: about people and their organizations
– DOAP: on the descriptions of software projects
– DOLCE seems the most promising to match science
ontologies
• Go “Lite” as much as possible, then DL, and Full only if you have to - balancing expressibility vs. implementability
• Minimal properties to start, add only when needed
54
Tutorial Summary
• Many different options for ontology
development and encoding
• Tools are in reasonable shape, no killer-tool
• Best practices DO exist
– PLEASE DO NOT just start coding OWL!
• Use case should drive the functional
requirements of both your ontology and how
you will ‘build’ one
• PARTNER with someone already familiar
55
More information
• OWL-S - http://www.w3.org/Submission/OWL-S
• SWSO/F/L - Semantic Web Services Ontology/Framework/Language http://www.w3.org/Submission/SWSF/
• WSMO/X/L - Web Services Modeling Ontology/Execution/Language http://www.w3.org/Submission/WSMX/ www.wsmo.org, www.wsmx.org
• SAWSDL - (WSDL-S)
56
Other tools
• Reasoners
– Pellet, Racer, Medius KBS, FaCT++, fuzzyDL, KAON2, MSPASS, QuOnto
• Query Languages
– SPARQL, XQUERY, SeRQL, OWL-QL, RDFQuery
• Other Tools for Semantic Web
– Search: SWOOGLE swoogle.umbc.edu
– Collaboration: www.planetont.org
– Other: Jena, SeSAME/SAIL, Mulgara, Eclipse, KOWARI
– Semantic wiki: OntoWiki, SemanticMediaWiki
57
Editors
• Protégé (http://protege.stanford.edu)
• SWOOP (http://mindswap.org/2004/SWOOP)
• Altova SemanticWorks (http://www.altova.com/download/semanticworks/semantic_web_rdf_owl_editor.html)
• SWeDE (http://owleclipse.projects.semwebcentral.org/InstallSwede.html), goes with Eclipse
• Medius
• TopBraid Composer and other commercial tools
• Visual Ontology Modeler (VOM) - Sandpiper
• CMAP Ontology Editor (COE)
(http://cmap.ihmc.us/coe)
58
What about Earth Science?
• SWEET (Semantic Web for Earth and Environmental
Terminology)
– http://sweet.jpl.nasa.gov
– based on GCMD terms
– modular using faceted and integrative concepts
• VSTO (Virtual Solar-Terrestrial Observatory)
– http://vsto.hao.ucar.edu
– captures observational data (from instruments)
– modular using domains
• MMI
– http://marinemetadata.org
– captures aspects of marine data, ocean observing systems
– partly modular, mostly by developed project
• GeoSciML
– http://www.opengis.net/GeoSciML/
– is a GML (Geography ML) application language for Geoscience
– modular, in ‘packages’
59