Data and Knowledge Representation Lecture 3

Qing Zeng, Ph.D.
Last Time We Talked About
Boolean Algebra
 Predicate Logic (First order logic)

Today We Will Talk About
Ontology
 Major KR Schemes

Tell me what’s in this room
Tables, chairs, windows, computers, papers,
pens, people, etc.
 We can write ∃x. ∃y. Table(x) ∧ Room(y) ∧ In(x, y)
 But what is a table? What is a room?
 Logic has no vocabulary of its own
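
A minimal sketch of this point, reading the formula as an existential statement and checking it against a toy model. The individuals (t1, c1, r1) and the sets standing in for the predicates are invented for illustration. The logic only tests set membership; it says nothing about what makes something a table or a room.

```python
from itertools import product

# Hypothetical toy model: every name below is an illustrative assumption.
domain = {"t1", "c1", "r1"}
table = {"t1"}                          # extension of Table(x)
room = {"r1"}                           # extension of Room(y)
inside = {("t1", "r1"), ("c1", "r1")}   # extension of In(x, y)


def some_table_in_some_room() -> bool:
    """Check: there exist x, y with Table(x) and Room(y) and In(x, y)."""
    return any(
        x in table and y in room and (x, y) in inside
        for x, y in product(domain, repeat=2)
    )


print(some_table_in_some_room())
```

The predicate names are just labels for sets here. Deciding what belongs in those sets is the job of the ontology.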

Ontology Fills the Gap
Ontology is the study of existence, of all
kinds of existence, of all kinds of entities
 It supplies the predicates of predicate
logic and the labels that fill the boxes and
circles of conceptual graphs

Webster’s Definition of Ontology

“1 : a branch of metaphysics concerned
with the nature and relations of being
2 : a particular theory about the nature of
being or the kinds of existents” -http://www.webster.com/cgi-bin/dictionary
My Simplified Understanding
Ontology seeks to describe entities through
classification of relations among entities
 Domain ontology limits its scope to a specific
domain such as medicine
 In informatics, we further limit domain ontology
to what is needed by an application or certain
kinds of applications, such as clinical guidelines or
retrieval of pathology information

Why Ontology in Biomedical Domain

Encode data
 E.g.

Patient A is diabetic and HIV positive
Represent knowledge
 E.g.
Blood Glucose test is a diagnostic test for
diabetes.
Sources of Ontology
Observation: provides knowledge of the
physical world
 Reasoning: makes sense of observation by
generating a framework of abstractions
called “metaphysics”.

Ontology Development in Biomedical
Domain

Areas that directly involve ontology
 Data model
 Vocabulary/terminology
 Knowledge-based system
Philosopher’s Approach to Ontology

Top-down
 Concerned with the entire universe
 Build top level ontology first

Long history
 Lao Zi (Book of Tao)
 Plato
 Aristotle
 Kant (1787)
Computer/Information Science’s
Approach

Bottom Up
 Start with limited world or specific
applications
 Exception: Cyc system

Designed with computing in mind
Short History
 First use of the term “ontology” in computer
science community: McCarthy, J. 1980
“Circumscription – A Form of Non-Monotonic
Reasoning”, Artificial Intelligence, 13: 27–39.
Problems Faced by Computer/Information
Scientists

Tower of Babel
 Ontology used/developed by different groups
for applications
 Terminological and conceptual incompatibilities
 Problems arise in system development and
maintenance as well as data/knowledge exchange

Insufficient expressive power
Example

Problem Oriented Medical Record
 Weed LL. Medical records that guide and
teach. 1968. MD Comput. 1993 Mar-Apr;10(2):100-14.
 Where “SOAP” comes from…
 The gist: organizing medical data/information
by patient problem

Many EMRs have a place for a “problem list”
Example

Which one of the following is a “problem”?
 Cough
 Anxiety
 Pregnancy
 Sleep disorder
 Rash

Physicians cannot agree
 Cited by a number of POEMRs as one of the
reasons for failure
Another Example

What does “acute” mean?
 sharpness or severity, e.g. acute pain
 having a sudden onset, sharp rise, and short
course, e.g. acute pancreatitis

In a data model for findings, we had
severity as an attribute. Thus we need to
decide where “acute” fits in.
To Solve the Problem
Develop formalism for sharing (e.g. KIF,
CGIF)
 Develop standard ontology
 Develop new formalism to increase
expressive power

Ontological Categories
Making a choice on ontological categories
is the first step in system design – John Sowa
 An ontological category is a

 “Class” in OO system
 “Domain” in database theory
 “Type” in AI theory
 “Type” or “sort” in logic
Brentano’s tree of Aristotle’s
Categories
[Tree diagram; node labels: Being, Substance, Accident, Property, Inherence, Relation, Directedness, Containment]
CYC Ontology
[Hierarchy diagram; node labels: Thing, Individual Object, Intangible, Represented Thing, Relationship, Event, Stuff, IntangibleObject, Collection]
Contrast -> Distinction

All perceptions start with contrast
 Bright – dark
 Tall – short
 Healthy – ill
 Happy – sad

Distinctions (discrete or continuous) are
conceptual interpretations of perceptual
contrasts
Distinction -> Categories

Distinctions may be combined to generate
categories. E.g.
 Classify patients.
 Distinctions: (insured, uninsured), (inpatient,
outpatient), (infant, child, adult), (emergency,
urgent, general)…
 Categories: insured pediatric emergency
patient, uninsured adult inpatient…
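
Combining distinctions into categories can be sketched as a Cartesian product. The dimension names (coverage, setting, age, acuity) are assumptions added for illustration; the values come from the patient example.

```python
from itertools import product

# Each distinction is one dimension; the dimension names are hypothetical.
distinctions = {
    "coverage": ["insured", "uninsured"],
    "setting": ["inpatient", "outpatient"],
    "age": ["infant", "child", "adult"],
    "acuity": ["emergency", "urgent", "general"],
}


def categories(dists):
    """Return every combination of one value per distinction."""
    return [" ".join(combo) for combo in product(*dists.values())]


cats = categories(distinctions)
print(len(cats))   # 2 * 2 * 3 * 3 = 36 candidate categories
print(cats[0])     # insured inpatient infant emergency
```

Not every combination is useful in practice. A real design would keep only the categories an application actually needs.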
Sowa’s Ontology (Peirce and
Whitehead)
 AXIOMS:
 Physical: physical entities have a location
in space and a point in time. E.g. hand,
hair, computer.
 Abstract: abstract entities do not have a
location in space or a point in time. E.g.
theorem, knowledge, story.
Sowa’s Ontology

AXIOMS:
 Independent: independent entities can exist without
being dependent on the existence of another entity.
E.g. person, diary, song.
 Relative: relative entities require the existence of
some other entity. E.g. joints between bones, middle
child, remission after a disease episode.
 Mediating: mediating entities require the existence of
(at least) two other entities and establish a new
relationship among them. E.g. theory of relativity,
diagnostic strategy, cardiovascular system.
Sowa’s Ontology

AXIOMS:
 Continuant: has only spatial parts and no
temporal parts; identity cannot depend on
location in space and time. E.g. gender, alert
and reminder system, medication formula.
 Occurrent: has both spatial parts
(participants) and temporal parts (stages);
can only be identified by location in space and
time. E.g. disease episode, clinical event,
medication order.
Matrix of Central Categories
             Physical                    Abstract
             Continuant   Occurrent      Continuant    Occurrent
Independent  Object       Process        Schema        Script
Relative     Juncture     Participation  Description   History
Mediating    Structure    Situation      Reason        Purpose
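
The matrix can be read as a lookup keyed by the three distinctions. This is a minimal sketch with cell assignments following Sowa's twelve central categories; the function name and the lowercase key strings are assumptions for illustration.

```python
# Keys: (independent/relative/mediating, physical/abstract, continuant/occurrent)
CENTRAL_CATEGORIES = {
    ("independent", "physical", "continuant"): "Object",
    ("independent", "physical", "occurrent"): "Process",
    ("independent", "abstract", "continuant"): "Schema",
    ("independent", "abstract", "occurrent"): "Script",
    ("relative", "physical", "continuant"): "Juncture",
    ("relative", "physical", "occurrent"): "Participation",
    ("relative", "abstract", "continuant"): "Description",
    ("relative", "abstract", "occurrent"): "History",
    ("mediating", "physical", "continuant"): "Structure",
    ("mediating", "physical", "occurrent"): "Situation",
    ("mediating", "abstract", "continuant"): "Reason",
    ("mediating", "abstract", "occurrent"): "Purpose",
}


def central_category(dependence, nature, duration):
    """Look up the central category for one choice on each axis."""
    return CENTRAL_CATEGORIES[(dependence, nature, duration)]


# A joint between bones: relative, physical, continuant.
print(central_category("relative", "physical", "continuant"))  # Juncture
```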
Exercise
Assume you are developing an alert system
to monitor errors in laboratory information
systems. Identify some distinctions for
categorizing the errors and describe which
distinctions are in contrast with which
other distinctions.
Semantic Network

A long-standing notion: there are different
pieces of knowledge of the world, and they
are all linked together through certain
semantics.
Basic Components

Nodes
 Represent concepts

Arcs
 Represent relations

Labels for nodes and arcs
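
A minimal sketch of these components: nodes are concept names, arcs are labeled directed links between them. The class name, method names, and the sample concepts are assumptions chosen for illustration.

```python
from collections import defaultdict


class SemanticNetwork:
    """Labeled directed graph: nodes are concepts, arcs are relations."""

    def __init__(self):
        # source node -> set of (arc label, target node)
        self._arcs = defaultdict(set)

    def add(self, source, label, target):
        self._arcs[source].add((label, target))

    def related(self, source, label):
        """Targets reachable from source over arcs with the given label."""
        arcs = self._arcs.get(source, ())
        return sorted(target for lab, target in arcs if lab == label)


net = SemanticNetwork()
net.add("diabetes", "IS A", "disease")
net.add("blood glucose test", "IS A", "diagnostic test")
net.add("blood glucose test", "MEASURES", "blood glucose")

print(net.related("blood glucose test", "IS A"))  # ['diagnostic test']
```

Nothing in this structure constrains which labels may connect which nodes, which is both the flexibility and the weakness of a plain semantic network.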
Little Constraint
[Diagram: patient, nurse, and physician nodes, each pair connected by an “Interact” arc]
Little Constraint
[Diagram: DSG Site, Instructors’ Homepage, Course Site, and Web nodes connected by “Link” arcs]
Relation
Directed or non-directed
 Multiple relations between two concepts
 Can have different properties

 Reflexive (e.g. co-occurrence)
 Transitive (e.g. causal)
 Symmetric (e.g. sibling)
 ………..
Some Often Used Relations in
Biomedical Domain
IS A
 IS PART OF
 CAUSE OF
 MEASURES
 CO-OCCURS
 …………

Major Limitation

Lack of Semantics
 No formal semantics for the relations
 E.g. does “ISA” mean subclass, member, etc.?
 Multiple interpretations are possible
 Restricted expressiveness
 E.g. cannot distinguish between instance and
class
Extension

Extending expressivity (distinguish
different types of concepts and relations)
 Distinguish between “some” and “all”
 Distinguish between “existence” and
“intension”
 Distinguish between “definition” and
“assertion”

Add semantic rigor
 Map to logic (Sowa – CG)
Frame-based Network
Distinguish instance vs. class
 Hierarchical structure (superclass and
subclass)
 Multiple hierarchy
 Slots

 Member slot
 Own slot
Slot
Frame identifying information
 Relationship between frames
 Descriptors of requirements for frame
match
 Procedural information
 Default information
 Restrictions and constraints
 New instance information
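
A minimal sketch of frames with slots, assuming the usual reading of the two slot kinds: an own slot describes the frame itself and is not inherited, while a member slot supplies default values to subclasses and instances. The Frame class and all frame and slot names are hypothetical.

```python
class Frame:
    """A frame with own slots and member (inheritable default) slots."""

    def __init__(self, name, parents=(), own=None, member=None):
        self.name = name
        self.parents = list(parents)      # superclasses, or the class of an instance
        self.own = dict(own or {})        # values describing this frame only
        self.member = dict(member or {})  # defaults passed down the hierarchy

    def get(self, slot):
        """Own value first, then the nearest inherited member-slot default."""
        if slot in self.own:
            return self.own[slot]
        for parent in self.parents:
            found, value = parent._member_lookup(slot)
            if found:
                return value
        raise KeyError(slot)

    def _member_lookup(self, slot):
        # Depth-first search: a subclass default overrides its superclass.
        if slot in self.member:
            return True, self.member[slot]
        for parent in self.parents:
            found, value = parent._member_lookup(slot)
            if found:
                return True, value
        return False, None


lab_test = Frame("LabTest",
                 own={"documentation": "Any laboratory test"},
                 member={"specimen": "blood", "fasting_required": False})
glucose = Frame("FastingGlucoseTest", parents=[lab_test],
                member={"fasting_required": True})
order = Frame("order-001", parents=[glucose], own={"result_mg_dl": 110})

print(order.get("specimen"))          # blood (default from LabTest)
print(order.get("fasting_required"))  # True (overridden by the subclass)
```

With more than one parent, the order of the parents list decides which default wins. That is one of the places where multiple inheritance becomes hard to reason about.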

Strength
Helps organize knowledge hierarchically
 Procedural information
 Supports multiple inheritance

Weakness
Expressiveness (e.g. quantifier)
 Inheritance

 Subclassing (overriding slot values)
 Multiple inheritance

Large complex knowledge system
Example: MED
Example: Protégé
Production Rules
Also called IF-THEN rules
 Many forms:

 IF condition THEN action
 IF premise THEN conclusion
 IF proposition p1 and proposition p2 are true
THEN proposition p3 is true
Components
Rule base
 Inference engine
 Working memory

Inference

Modus ponens
 Forward chaining

Modus tollens
 Backward chaining
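
Forward chaining can be sketched as repeated modus ponens over a working memory of facts. This is a minimal illustration, not how any particular engine is implemented; the two clinical rules are invented and are not medical guidance.

```python
def forward_chain(rules, facts):
    """Apply IF-THEN rules until no new fact can be added.

    rules: list of (premises, conclusion); facts: the initial working memory.
    """
    working_memory = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Modus ponens: all premises hold, so assert the conclusion.
            if conclusion not in working_memory and set(premises) <= working_memory:
                working_memory.add(conclusion)
                changed = True
    return working_memory


# Hypothetical rule base.
rules = [
    (("fasting glucose high",), "suspect diabetes"),
    (("suspect diabetes", "HbA1c high"), "diagnose diabetes"),
]

result = forward_chain(rules, {"fasting glucose high", "HbA1c high"})
print("diagnose diabetes" in result)
```

Here the rule list plays the role of the rule base, the set of facts is the working memory, and the loop is a very small inference engine.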
Example: MYCIN
IF the identity of the germ is not known
with certainty
AND the germ is gram-negative
AND the morphology of the organism is
"rod"
AND the germ is aerobic
THEN there is a strong probability (0.8) that
the germ is of type Enterobacteriaceae
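
A minimal sketch of how the 0.8 in such a rule is typically used, assuming the certainty-factor arithmetic commonly described for MYCIN: for AND-ed premises, the conclusion's certainty is the rule's factor times the weakest premise, and premises at or below a 0.2 threshold are treated as not established. The premise certainties are invented. The Gram stain premise is encoded as gram-negative here, since Enterobacteriaceae are gram-negative rods.

```python
def conclusion_cf(premise_cfs, rule_cf, threshold=0.2):
    """Certainty of a rule's conclusion given AND-ed premise certainties."""
    weakest = min(premise_cfs)
    if weakest <= threshold:
        return 0.0  # premises not established strongly enough; rule does not fire
    return rule_cf * weakest


# Hypothetical certainties for the four premises of the rule.
premises = {
    "identity not known with certainty": 1.0,
    "stain is gram-negative": 1.0,
    "morphology is rod": 0.9,
    "organism is aerobic": 1.0,
}

cf = conclusion_cf(premises.values(), rule_cf=0.8)
print(round(cf, 2))  # 0.72
```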
Example
 POINT
[Architecture diagram; components and their annotated roles:]
 Main Inference Control: controls the execution of the inference engine by retrieving and providing needed knowledge
 Medical Knowledge Base: defines semantic relations between concepts
 Jess Inference Engine: fires rules when adequate knowledge is provided
 Inference Rules: define rules of relevance based on semantic relations between concepts
Example
Medical
Knowledgebase
Inference Rules
Inference Process
Inference Results
Pro and Con

Pro
 Modular
 Natural

Con
 Not efficient
 Not expressive
Exercise



The thyroid gland is located at the base of your neck in front of your
trachea (or windpipe). It has two sides and is shaped like a
butterfly.
The thyroid gland makes, stores, and releases two hormones - T4
(thyroxine) and T3 (triiodothyronine). Thyroid hormones control the
rate at which every part of your body works. This is called your
metabolism. Your metabolism controls whether you feel hot or cold
or tired or rested. When your thyroid gland is working the way it
should, your metabolism stays at a steady pace -not too fast or too
slow.
If no cancer cells are found, your doctor may prescribe a thyroid
hormone to decrease the size of your nodule. Or, your doctor may
suggest surgery to remove it. If cancer cells are found, further
treatment will be needed. Thyroid cancer usually can be treated
with success.
Exercise

Which representation scheme to choose?
Reading
Sowa: Chapter 2
 Sowa: Chapter 4
