Foundations I: Methodologies, Knowledge Representation Deborah McGuinness and Peter Fox CSCI/ITEC-6962-01 Week 2, September 14, 2009


Foundations I: Methodologies,
Knowledge Representation
Deborah McGuinness and Peter Fox
CSCI/ITEC-6962-01
Week 2, September 14, 2009
1
Key Points Regarding Flu
Epidemics
• Wash your hands often, especially after shaking hands with others (use hand disinfectants if there is no access to soap and water).
• Avoid close contact with people who are sick.
• Cover your mouth and nose with a tissue when coughing or sneezing. If you don't have a tissue, use the inside of your elbow.
• Do not touch your eyes, nose, or mouth, especially after contact with shared keyboards, microscopes, instruments, or other people.
• STAY HOME if you have flu-like symptoms, i.e., fever (100 degrees F [37.8 degrees C] or higher), cough, sore throat, runny or stuffy nose, body aches, headache, chills, fatigue.
• For more information: http://www.rpi.edu/about/flu/
Review of reading Assignment 1
• Ontologies 101, Semantic Web, e-Science,
RDFS, Common Logic, OWL guide
• Any comments, questions?
3
Contents
• Review of methodologies
• Elements of KR in semantic web context
• And in e-Science
• Choices of representation, models
• Examples of KR
• Encoding and understanding representations
• Assignment 1
4
Semantic Web Methodology and
Technology Development Process
• Establish and improve a well-defined methodology vision for Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Figure: development-process cycle. Labels: Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Evaluation]
5
KR and methodologies
• Procedural Knowledge: Knowledge is encoded in functions/procedures. This can be viewed as hard-coded and less flexible. Example:
    function Person(X) return boolean is
      if (X = "Socrates") or (X = "Hillary") then return true else return false;
  or
    function Mortal(X) return boolean is
      return Person(X);
• Networks: A compromise between declarative and procedural schemes. Knowledge is represented in a labeled, directed graph whose nodes represent concepts and entities, while its arcs represent relationships between these entities and concepts.
• Frames: Much like a semantic network, except that each node represents a prototypical concept and/or situation. Each node has several property slots whose values may be specified or inherited.
• Logic: A way of declaratively representing knowledge. For example:
  – person(Socrates).
  – person(Hillary).
  – forall X [person(X) ---> mortal(X)]
  – DL, FOL, HOL
6
KR and methodologies
• Decision Trees: Concepts are organized in the form of a
tree.
• Statistical Knowledge: The use of certainty factors, Bayesian
Networks, Dempster-Shafer Theory, Fuzzy Logics, ..., etc.
• Rules: The use of Production Systems to encode condition-action rules (as in expert systems).
• Parallel Distributed processing: The use of connectionist
models.
• Subsumption Architectures: Behaviors are encoded
(represented) using layers of simple (numeric) finite-state
machine elements.
• Hybrid Schemes: Any representation formalism employing a
combination of KR schemes.
7
Remember, in science!
• Some of the knowledge is lost when it is
placed into any particular representation
structure, or may not be reusable (e.g.
Frames)
• So, you may ask something that cannot be
answered or inferred
• Knowledge evolves, i.e. changes
• Knowledge and understanding are very often
context dependent (and discipline, language,
and skill-level dependent, and …)
8
And, if you are used to logic
• You are working mostly within the world of
logic, whereas we are trying to represent
knowledge with logic and we are usually
dealing with tangible objects, such as trees,
clouds, rock, storms, etc.
• Because of this, we have to be very careful
when translating real things into logical
symbols - this can, surprisingly, be a difficult
challenge.
• Consider your method of representation (yes,
we do want to compute with it)
9
Thus
• A person who wants to encode knowledge
needs to decouple the ambiguities of
interpretation from the mathematical certainty
of (any form of) logic.
• The nature of interpretation is critical in formal
knowledge representation and is carefully
formalized by KR scientists in order to
guarantee that no ambiguity exists in the
logical structure of the represented
knowledge.
10
Representing Knowledge With Objects
• Take all individuals that we need to keep track of and
place them into different buckets based on how similar
they are to each other. Each bucket is given a
descriptive name based on what objects it contains.
• Since the individuals in a given bucket are at least
somewhat similar, we can avoid needing to describe
every inconsequential detail about each individual.
Instead, properties that are common to all individuals
in a bucket can just be assigned to the entire bucket at
once. Properties are typically either primitive values
(such as numbers or text strings) or may be
references to other buckets.
11
Representing Knowledge With Objects
• Some buckets will be more similar to each other than
others and we can arrange the buckets into a
hierarchy based on the similarity.
• If all buckets in a branch in the tree of buckets share a
property, the information can be further simplified by
assigning the property only to the parent bucket. Other
buckets (and individuals) are said to inherit that
property.
• Buckets may have different names: e.g. Classes,
Frames, or Nodes
• BUT, once we move to (e.g.) DL, not all object rules
apply, e.g. cannot override properties
• Multiple inheritance is not always obvious to people
12
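The bucket hierarchy with inherited properties described on these two slides can be sketched as a small frame structure. The `Frame` class, the slot names, and the `portable` slot are hypothetical, chosen only for illustration. Note that the sketch allows a child to override an inherited value, which is typical of frame and object systems but, as noted above, does not carry over to DL.

```python
class Frame:
    """A 'bucket': a named node with property slots and an optional parent."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def get(self, slot):
        """Return the slot value, walking up the hierarchy if it is not set locally."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)


# Properties are assigned once, on the most general bucket that shares them.
instrument = Frame("Instrument", has_operating_mode=True, portable=False)
optical = Frame("OpticalInstrument", parent=instrument, has_focal_length=True)
interferometer = Frame("Interferometer", parent=optical)

# A frame-style override of an inherited value (hypothetical slot).
handheld = Frame("HandheldInstrument", parent=instrument, portable=True)

print(interferometer.get("has_operating_mode"), handheld.get("portable"))
```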
Re-enter Semantic Web
• At its core, the Semantic Web can be thought
of as a methodology for linking up pieces of
structured and unstructured information into
commonly shared description-logic
ontologies.
13
Semantic Web Layers
[Figure: Semantic Web layer diagram. Sources: http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/]
14
Elements of KR in Semantic Web
• Declarative Knowledge
• Statements as triples: {subject-predicate-object}
interferometer is-a optical instrument
Fabry-Perot is-a interferometer
Optical instrument has focal length
Optical instrument is-a instrument
Instrument has instrument operating mode
Instrument has measured parameter
Instrument operating mode has measured parameter
NeutralTemperature is-a temperature
Temperature is-a parameter
• A query: select all optical instruments which have operating
mode vertical
• An inference: infer operating modes for a Fabry-Perot
Interferometer which measures neutral temperature
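As a rough sketch of how such statements can be stored, queried, and used for a simple inference, the triples above can be held as Python tuples. The helper names (`match`, `ancestors`, `inherited_properties`) are hypothetical, capitalization of the terms has been normalized, and a real application would more likely use an RDF store and a reasoner.

```python
# The triples from the slide, as (subject, predicate, object) tuples.
triples = [
    ("interferometer", "is-a", "optical instrument"),
    ("Fabry-Perot", "is-a", "interferometer"),
    ("optical instrument", "has", "focal length"),
    ("optical instrument", "is-a", "instrument"),
    ("instrument", "has", "instrument operating mode"),
    ("instrument", "has", "measured parameter"),
    ("instrument operating mode", "has", "measured parameter"),
    ("NeutralTemperature", "is-a", "temperature"),
    ("temperature", "is-a", "parameter"),
]


def match(pattern, store):
    """Query: return triples matching (s, p, o); None acts as a wildcard."""
    return [t for t in store
            if all(want is None or want == got for want, got in zip(pattern, t))]


def ancestors(term, store):
    """Inference: all classes reachable from term by following is-a links."""
    found, frontier = [], [term]
    while frontier:
        current = frontier.pop()
        for _, _, parent in match((current, "is-a", None), store):
            if parent not in found:
                found.append(parent)
                frontier.append(parent)
    return found


def inherited_properties(term, store):
    """Properties stated on term or on any of its is-a ancestors."""
    props = set()
    for cls in [term] + ancestors(term, store):
        props.update(o for _, _, o in match((cls, "has", None), store))
    return props


print(ancestors("Fabry-Perot", triples))
print(sorted(inherited_properties("Fabry-Perot", triples)))
```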
15
Ontology Spectrum
[Figure: ontology spectrum, from least to most expressive: Catalog/ID; Terms/glossary; Thesauri (“narrower term” relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restrs.; Selected Logical Constraints (disjointness, inverse, …); General Logical constraints]
Originally from AAAI 1999- Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty;
– updated by McGuinness.
Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
16
OWL or RDF or OWL 2 RL?
• In representing knowledge you will need to
balance expressivity with implementability
– OWL (Lite, DL, Full) 1 or 2?
– RDF and RDFS
– Rules, e.g. SWRL or OWL 2 RL
• You will need to consider the sources of your
knowledge
• You will need to consider what you want to do
with the represented knowledge
17
The knowledge base
• Using, Re-using, Re-purposing, Extending,
Subsetting
• Approach:
– Bottom-up (instance level or vocabularies)
– Top-down (upper-level or foundational)
– Mid-level (use case)
• Coding and testing (understanding)
• Using tools (some in this class, more over the next two
classes)
• Iterating (later)
• Maintaining and evolving (curation, preservation) (later)
18
‘Collecting’ the ‘data’
• Part of the (meta)data information is present in tools ... but thrown away
at output. E.g., a business chart can be generated by a tool: it ‘knows’ the
structure, the classification, etc. of the chart, but usually this information
is lost. Storing it in web data would be easy!
• Semantic Web-aware tools are around (even if you do not know it...),
though more would be good:
– Photoshop CS stores metadata in RDF in, say, jpg files (using XMP)
– RSS 1.0 feeds are generated by (almost) all blogging systems (a huge
amount of RDF data!)
• Scraping - different tools, services, etc, come around every day:
– get RDF data associated with images, for example: service to get RDF from
flickr images
– service to get RDF from XMP
– XSLT scripts to retrieve microformat data from XHTML files
– RSS scraping in use in Virtual Observatory projects in Japan
– scripts to convert spreadsheets to RDF
• SQL - A huge amount of data in Relational Databases
– Although tools exist, it is not feasible to convert that data into RDF
– Instead: SQL ⇋ RDF ‘bridges’ are being developed: a query to RDF data is
transformed into SQL on-the-fly
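The "scripts to convert spreadsheets to RDF" item above can be illustrated with a minimal sketch that turns rows of CSV text into N-Triples lines. Everything specific here is an assumption made for illustration: the `http://example.org/` namespace, the column names, and the sample row. The sketch also assumes the key column and the column names are already safe to place in a URI.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def rows_to_ntriples(csv_text, key_column):
    """Emit one N-Triples line per non-empty cell; the key column names the subject."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row[key_column]}>"
        for column, value in row.items():
            if column == key_column or not value:
                continue
            # Escape backslashes and double quotes inside the literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{BASE}{column}> "{literal}" .')
    return lines


# Hypothetical one-row spreadsheet.
sheet = "id,name,kind\nfpi1,Fabry-Perot,interferometer\n"
ntriples = rows_to_ntriples(sheet, "id")
print("\n".join(ntriples))
```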
19
More Collecting
• RDFa (formerly known as RDF/A) extends XHTML
by:
– extending the link and meta elements to include child elements
– adding metadata to any element (a bit like the class attribute in
microformats, but via dedicated properties)
• It is very similar to microformats, but with more
rigor:
– it is a general framework (instead of an “agreement” on
the meaning of, say, a class attribute value)
– terminologies can be mixed more easily
• GRDDL - Gleaning Resource Descriptions from
Dialects of Languages
• ATOM - XML-based Web content and metadata
syndication format (used with RSS)
20
GRDDL - bottom up
• GRDDL - Gleaning Resource Descriptions
from Dialects of Languages
• Pretty much = “XML/XHTML (for e.g.) into
RDF via XSLT”
• Good support, e.g. Jena
• Handles microformats
• Active community
• How to categorize, use, re-use (parts of)?
21
Foundational Ontologies
• Domain-independent concepts and relations
– physical object, process, event, …, participates, …
• (Usually) rigorously defined
– formal logic, philosophical principles, highly structured
• Examples
– DOLCE – Descriptive Ontology for Linguistic and Cognitive Engineering
– SUMO – Suggested Upper Merged Ontology
– CYC Upper Level Ontology
– BFO – Basic Formal Ontology
– GFO – General Formal Ontology (developed by Onto Med)
22
Foundational Ontologies
PURPOSE: help integrate domain ontologies
“…and then there was one…”
[Figure: a single foundational ontology above the domain ontologies it integrates. Labels: Geology ontology; Struc ontology; Rock ontology; Geophysics ontology; Marine ontology; Water ontology; Planetary ontology]
23
Courtesy: Boyan Brodaric
Foundational Ontologies
PURPOSE: help organize domain ontologies
“…a place for everything, and everything in its place…”
[Figure: a foundational ontology organizing domain terms. Labels: shale; rock; formation; lithification]
24
Courtesy: Boyan Brodaric
Problem scenario
• Little work done on linking foundational ontologies with geoscience ontologies
• Such linkage might benefit various scenarios requiring cross-disciplinary knowledge, e.g.:
– water budgets: groundwater (geology) and surface water (hydro)
– hazards risk: hazard potential (geology, geophysics) and items at threat (infrastructure, people, environment, economic)
– health: toxic substances (geochemistry) and people, wildlife
– many others…
25
Courtesy: Boyan Brodaric
DOLCE
26
SUMO - Suggested Upper Merged Ontology
• Physical
  • Object
    • SelfConnectedObject
      • ContinuousObject
      • CorpuscularObject
    • Collection
  • Process
• Abstract
  • SetClass
    • Relation
  • Proposition
  • Quantity
    • Number
    • PhysicalQuantity
  • Attribute
27
• http://www.ifomis.org/Research/IFOMISReports/IFOMIS%20Report%2005_2003.pdf
28
29
Using SNAP/ SPAN
30
DOLCE + SWEET
Mapping (DOLCE category: equivalent (= SWEET) or narrower (< SWEET) classes):
• Physical-body: BodyofGround, BodyofWater, …
• Material-Artifact: Infrastructure, Dam, Product, …
• Physical-Object: LivingThing, MarineAnimal
• Amount-of-Matter: Substance
• Activity: HumanActivity
• Physical-Phenomenon: Phenomena
• Process: Process
• State: StateOfMatter
• Quality: Quantity, Moisture, …
• Physical-Region: Basalt, …
• Temporal-Region: Ordovician, …
Benefits:
• full coverage
• rich relations
• home for orphans
• single superclasses
Issues:
• individuals (e.g. Planet Earth)
• roles (contaminant)
• features (SeaFloor)
31
Courtesy: Boyan Brodaric
Conclusions
• Surprisingly good fit amongst ontologies
– so far: no show-stopper conflicts, a few difficult conflicts
• DOLCE richness benefits geoscience ontologies
– good conceptual foundation helps clear some existing problems
• Unresolved issues in modeling science entities
– modeling classifications, interpretations, theories, models, …
• Same procedure with GeoSciML
32
Courtesy: Boyan Brodaric
33
SWEET 2.0 Modular Design
• Supports easy extension by domain specialists
• Organized by subject (theoretical to applied)
• Reorganization of classes, but no significant changes to content
• Importation is unidirectional
[Figure: module layers linked by importation, from theoretical to applied: Math, Time, Space; Basic Science; Geoscience Processes; Geophysical Phenomena; Applications]
34
SWEET 2.0 Ontologies
35
Using SWEET
• Plug-in (import) domain detailed modules
• Lots of classes, few relations (properties)
• Version 2.0 is re-usable and extensible
36
Mix-n-Match
• The hybrid example:
– Collect a lot of different ontologies representing
different terms, levels of concepts, etc. into a
base form: RDF
37
[Figure: sources of terms collected into RDF. Labels: NC basic attributes; CF attributes; IRIDL attributes/objects; CF data objects; SWEET Ontologies (OWL); CF Standard Names (RDF object); Location; CF Standard Names As Terms; IRIDL Terms; SWEET as Terms; Search Terms; Gazetteer Terms]
38
Blumenthal
IRI RDF Architecture
[Figure: IRI RDF architecture. Labels: MMI; Data Servers; Ontologies; JPL; bibliography; Start Point; Standards Organizations; RDF Crawler; RDFS Semantics; Owl Semantics; SWRL Rules; SeRQL CONSTRUCT; Sesame; Location Canonicalizer; Time Canonicalizer; Search Queries; Search Interface]
39
Blumenthal
Mid-Level: Developing ontologies
• Use cases and small team (7-8; 2-3 domain experts, 2
knowledge experts, 1 software engineer, 1 facilitator, 1
scribe)
• Identify classes and properties (leverage controlled
vocab.)
– Start with narrower terms, generalize when needed or
possible
– Adopt a suitable conceptual decomposition (e.g. SWEET)
– Import modules when concepts are orthogonal
• Review, vet, publish
• Only code them (in RDF or OWL) when needed
(CMAP, …)
• Ontologies: small and modular
40
Use Case example
• Plot the neutral temperature from the Millstone-Hill
Fabry Perot, operating in the non-vertical mode
during January 2000 as a time series.
• Objects:
– Neutral temperature is a (temperature is a) parameter
– Millstone Hill is a (ground-based observatory is a) observatory
– Fabry-Perot is a interferometer is a optical instrument is a instrument
– Non-vertical mode is a instrument operating mode
– January 2000 is a date-time range
– Time is a independent variable/ coordinate
– Time series is a data plot is a data product
41
Class and property example
• Parameter
– Has coordinates (independent variables)
• Observatory
– Operates instruments
• Instrument
– Has operating mode
• Instrument operating mode
– Has measured parameters
• Date-time interval
• Data product
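One way to see how these classes and properties support the use case is to write them down as simple Python data classes. This is only a sketch: the attribute names are hypothetical renderings of the slide's properties, and the single populated example (Millstone Hill, Fabry-Perot, non-vertical mode, neutral temperature) is taken from the use case wording rather than from any real instrument catalog.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str
    coordinates: List[str] = field(default_factory=list)  # independent variables


@dataclass
class OperatingMode:
    name: str
    measured_parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Instrument:
    name: str
    operating_modes: List[OperatingMode] = field(default_factory=list)


@dataclass
class Observatory:
    name: str
    instruments: List[Instrument] = field(default_factory=list)


def modes_measuring(observatory, parameter_name):
    """Return (instrument, mode) name pairs whose mode measures the parameter."""
    return [(inst.name, mode.name)
            for inst in observatory.instruments
            for mode in inst.operating_modes
            for param in mode.measured_parameters
            if param.name == parameter_name]


# Illustrative instances based on the use case text.
neutral_temperature = Parameter("neutral temperature", coordinates=["time"])
non_vertical = OperatingMode("non-vertical", [neutral_temperature])
fabry_perot = Instrument("Fabry-Perot", [non_vertical])
millstone_hill = Observatory("Millstone Hill", [fabry_perot])

print(modes_measuring(millstone_hill, "neutral temperature"))
```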
42
43
44
45
Higher level use case
• Find data which represents the state of the
neutral atmosphere above 100km, toward the
arctic circle at any time of high geomagnetic
activity
46
Extending the KR for a purpose
Input:
• Physical properties: State of neutral atmosphere
• Spatial:
  – Above 100km
  – Toward arctic circle (above 45N)
• Conditions:
  – High geomagnetic activity
• Action: Return Data
Extended KR:
• GeoMagneticActivity has ProxyRepresentation
• GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere)
• Kp is a GeophysicalIndex
  – hasTemporalDomain: “daily”
  – hasHighThreshold: xsd_number = 8
• Date/time when Kp >= 8
Specification needed for query to CEDARWEB:
• Instrument
• Parameter(s)
• Operating Mode
• Observatory
• Date/time
• Return-type: data
47
Translating the Use-Case ctd.
Input:
• Physical properties: State of neutral atmosphere
• Spatial:
  – Above 100km
  – Toward arctic circle (above 45N)
• Conditions:
  – High geomagnetic activity
• Action: Return Data
Extended KR:
• NeutralAtmosphere is a subRealm of TerrestrialAtmosphere
  – hasPhysicalProperties: NeutralTemperature, Neutral Wind, etc.
  – hasSpatialDomain: [0,360],[0,180],[100,150]
  – hasTemporalDomain:
• NeutralTemperature is a Temperature (which) is a Parameter
• GeoMagneticActivity has ProxyRepresentation
• GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere)
• Kp is a GeophysicalIndex
  – hasTemporalDomain: “daily”
  – hasHighThreshold: xsd_number = 8
• Date/time when Kp >= 8
• FabryPerotInterferometer is a Interferometer, (which) is a Optical Instrument (which) is a Instrument
  – hasFilterCentralWavelength: Wavelength
  – hasLowerBoundFormationHeight: Height
• ArcticCircle is a GeographicRegion
  – hasLatitudeBoundary:
  – hasLatitudeUpperBoundary:
Specification needed for query to CEDARWEB:
• Instrument
• Parameter(s)
• Operating Mode
• Observatory
• Date/time
• Return-type: data
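The step from "high geomagnetic activity" to concrete dates (Kp at or above its high threshold of 8) can be sketched as a simple filter. The daily Kp values below are invented purely for illustration and are not real observations, and the resulting dictionary only mirrors the field names listed on the slide; it is not the actual CEDARWEB query interface.

```python
from datetime import date

KP_HIGH_THRESHOLD = 8  # hasHighThreshold for Kp, as given on the slide

# Hypothetical daily Kp records, invented for illustration only.
daily_kp = {
    date(2000, 1, 10): 3,
    date(2000, 1, 11): 8,
    date(2000, 1, 12): 9,
    date(2000, 1, 13): 5,
}


def high_activity_dates(kp_by_day, threshold=KP_HIGH_THRESHOLD):
    """Dates on which the proxy index meets or exceeds the high threshold."""
    return sorted(day for day, kp in kp_by_day.items() if kp >= threshold)


# A plain dictionary using the slide's specification fields (hypothetical format).
specification = {
    "Instrument": "FabryPerotInterferometer",
    "Parameter(s)": ["NeutralTemperature"],
    "Date/time": high_activity_dates(daily_kp),
    "Return-type": "data",
}

print(specification["Date/time"])
```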
48
Knowledge representation - visual
• UML – Unified Modeling Language
– Ontology Definition Metamodel/Meta Object
Facility (OMG) for UML
– Provides standardized notation
• CMAP Ontology Editor (concept mapping tool
from IHMC - http://cmap.ihmc.us/coe )
– Drag/drop visual development of classes,
subclass (is-a) and property relationship
– Reads and writes OWL
– Formal convention (OWL/RDF tags, etc.)
• White board, text file
49
50
Representing processes
51
Is OWL/RDF the only option? No…
• SKOS - Simple Knowledge Organization
System for Taxonomies
http://www.w3.org/2004/02/skos/
• Annotations (RDFa) – for un- or semistructured information sources
http://www.w3.org/TR/xhtml-rdfa-primer/
http://rdfa.info
• Atom (and RSS) – for representing
syndication feeds – structured
http://tools.ietf.org/html/rfc4287
• More expressive languages IKL, CL, …
52
Use Case
• Provide a decision support capability for an
analyst to determine an individual’s
susceptibility to avian flu without having to be
precise in terminology (-nyms)
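The idea of tolerating imprecise terminology can be sketched with SKOS-style preferred and alternative labels. The dictionary layout, the `ex:AvianInfluenza` identifier, and the `find_concept` helper are hypothetical; only the label roles (`prefLabel`, `altLabel`) come from SKOS.

```python
# One concept with a preferred label and alternative labels (synonyms).
concepts = {
    "ex:AvianInfluenza": {
        "prefLabel": "avian influenza",
        "altLabel": ["avian flu", "bird flu"],
    },
}


def find_concept(term, concepts):
    """Resolve a user's term to a concept id via any of its labels, ignoring case."""
    wanted = term.strip().lower()
    for concept_id, labels in concepts.items():
        names = [labels["prefLabel"]] + labels.get("altLabel", [])
        if wanted in (name.lower() for name in names):
            return concept_id
    return None


print(find_concept("Bird Flu", concepts))
```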
53
54
55
Building SKOS
• ThManager
• Protégé (4) plugin for SKOS
56
Is OWL the only option II? No…
• Natural Language (NL)
– Read results from a web search and transform to a
usable form
– Find/filter out inconsistencies, concepts/relations that
cannot be represented
• Popular options
– CLCE (Common Logic Controlled English)
– Rabbit, e.g. ShellfishCourse is a Meal Course that (if has
drink) always has drink Potable Liquid that has Full body
and which either has Moderate or Strong flavour
– PENG (processable English)
• Really need PSCI - process-able science but that’s
another story (research project)
57
Sydney syntax
If X has Y as a father then Y is the
only father of X.
The class person is equivalent to
male or female, and male and
female are mutually exclusive.
equivalent to
The classes male and female are
mutually exclusive. The class
person is fully defined as anything
that is a male or a female.
58
PENG - Processable English
1. If X is a research programmer then
X is a programmer.
2. Bill Smith is a research
programmer who works at the CLT.
3. Who is a programmer and works at
the CLT?
59
CLCE - Common Logic Controlled English
CLCE: If a set x is the set of (a cat, a dog, and an elephant), then the cat is an element of x, the dog is an element of x, and the elephant is an element of x.
PC: ~(∃x:Set)(∃x1:Cat)(∃x2:Dog)(∃x3:Elephant)(Set(x,x1,x2,x3) ∧ ~(x1∈x ∧ x2∈x ∧ x3∈x))
60
Rules (aka ‘Logic’)
• OWL-DL and OWL-Lite are based on
Description Logic
• There are things that DL cannot express
(though there are things that are difficult to
express with rules and easy in DL...)
– A well-known example is Horn rules (e.g., the
‘uncle’ relationship): (P1 ∧ P2 ∧ ...) → C
– e.g.: parent(?x,?y) ∧ brother(?y,?z) ⇒
uncle(?x,?z)
– Or, for any X, Y and Z: if Y is a parent of X, and Z
is a brother of Y, then Z is an uncle of X
61
Examples from
http://www.w3.org/Submission/SWRL/
• A simple use of these rules would be to assert that
the combination of the hasParent and
hasBrother properties implies the hasUncle
property. Informally, this rule could be written as:
– hasParent(?x1,?x2) ∧ hasBrother(?x2,?x3) ⇒
hasUncle(?x1,?x3)
• In the abstract syntax the rule would be written like:
– Implies(Antecedent(hasParent(I-variable(x1) I-variable(x2))
hasBrother(I-variable(x2) I-variable(x3)))
Consequent(hasUncle(I-variable(x1) I-variable(x3))))
• From this rule, if John has Mary as a parent and
Mary has Bill as a brother then John has Bill as an
uncle.
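The effect of this rule on the John, Mary, and Bill example can be sketched in a few lines of Python. This is not how a SWRL engine is implemented; the tuple-based fact store and the `apply_uncle_rule` function are hypothetical and hard-code this one rule.

```python
# Facts from the example: John has Mary as a parent, Mary has Bill as a brother.
facts = {
    ("hasParent", "John", "Mary"),
    ("hasBrother", "Mary", "Bill"),
}


def apply_uncle_rule(facts):
    """hasParent(?x1,?x2) and hasBrother(?x2,?x3) imply hasUncle(?x1,?x3)."""
    inferred = set()
    for pred1, x1, x2 in facts:
        if pred1 != "hasParent":
            continue
        for pred2, subject2, x3 in facts:
            # The shared variable ?x2 must bind to the same individual in both atoms.
            if pred2 == "hasBrother" and subject2 == x2:
                inferred.add(("hasUncle", x1, x3))
    return inferred


print(apply_uncle_rule(facts))
```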
62
Examples
• An even simpler rule would be to assert that
Students are Persons, as in
– Student(?x1) ⇒ Person(?x1).
– Implies(Antecedent(Student(I-variable(x1)))
Consequent(Person(I-variable(x1))))
– However, this kind of use for rules in OWL just duplicates
the OWL subclass facility. It is logically equivalent to write
instead
• Class(Student partial Person) or
• SubClassOf(Student Person)
– which would make the information directly available to an
OWL reasoner.
63
Semantic Web with Rules
• Metalog
• RuleML
• SWRL
• RIF
• OWL 2 RL
• WRL
• Cwm
• Jess - rules engine
64
Query
• Querying knowledge representations in OWL and/or
RDF
• OWL-QL (for OWL) http://projects.semwebcentral.org/projects/owl-ql/
• SPARQL for RDF http://www.sparql.org/ and http://www.w3.org/TR/rdf-sparql-query/
• XQuery (for XML)
• SeRQL (for Sesame)
• RDFQuery (RDF)
• Few as yet for natural language representations
65
Developing a service ontology
• Use case: find and display in the same projection,
sea surface temperature and land surface
temperature from a global climate model.
• Classes/ concepts:
– Temperature
– Surface (sea/ land)
– Model
– Climate
– Global
– Projection
– Display …
66
Service ontology
• Climate model is a model
• Model has domain
• Climate Model has component representation
• Land surface is-a component representation
• Ocean is-a component representation
• Sea surface is part of ocean
• Model has spatial representation (and temporal)
• Spatial representation has dimensions
• Latitude-longitude is a horizontal spatial representation
• Displaced pole is a horizontal spatial representation
• Ocean model has displaced pole representation
• Land surface model has latitude-longitude representation
• Lambert conformal is a geographic spatial representation
• Reprojection is a transform between spatial representations
• ….
67
Service ontology
• A sea surface model has grid representation displaced pole
and a land surface model has grid representation latitude-longitude, and both must be transformed to Lambert
conformal for display
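The conclusion drawn from the service ontology (both models must be reprojected before display) can be sketched as a small planning function. The dictionary and the `reprojection_plan` helper are hypothetical; the representation names come from the slide.

```python
# Grid representation of each model, as stated in the service ontology.
grid_representation = {
    "sea surface model": "displaced pole",
    "land surface model": "latitude-longitude",
}
DISPLAY_PROJECTION = "Lambert conformal"


def reprojection_plan(representations, target):
    """List (model, source, target) transforms needed to share one projection."""
    return [(model, source, target)
            for model, source in sorted(representations.items())
            if source != target]


for step in reprojection_plan(grid_representation, DISPLAY_PROJECTION):
    print(step)
```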
68
Best practices (some)
• Ontologies/ vocabularies must be shared and
reused - swoogle.umbc.edu, bioportal, OOR
• Examine ‘core vocabularies’ to start with
– SKOS Core: about knowledge systems
– Dublin Core: about information resources, digital libraries,
with extensions for rights, permissions, digital rights
management
– FOAF: about people and their organizations
– SIOC: about communities
– DOAP: on the descriptions of software projects
– DOLCE seems the most promising to match science
ontologies
• Go “Lite” as much as possible, then add logic as needed,
balancing expressivity against implementability
• Minimal properties to start, add only when needed
69
Summary
• The science of knowledge representation has, throughout its
history, consisted of a compromise between pragmatism,
scientific rigor, and accessibility to domain experts
• Many different options for ontology development and
encoding, i.e. knowledge representation
• Sometimes, your choice of representation may need to
change based on language and tools availability/
capability…
• Balancing expressivity and implementability means we favor
an object-type, e.g. DL representation (but also suggests the
need for a meta-representation: e.g. KIF – Knowledge
Interchange Format)
• Next class (3) – ontology engineering
• Use cases should drive the functional requirements of both
your ontology and how you will ‘build’ one (see class 4)
70
Assignments for Week 2
• Reading: Semantic Web for the Working Ontologist
• Assignment 1: Representing Knowledge and
Understanding Representations
71