Constraint propagation


URKS
Uncertainty Reasoning
in Knowledge Systems
D. Bollé
H. Bruyninckx
M. Nuttin
D. De Schreye
Course overview:
- Introduction to uncertainty in A.I.
  Motivation - examples - some basic concepts
  D. De Schreye (1 session)
- Probability concepts, techniques and systems
  Bayesian theory and approach
  D. Bollé and H. Bruyninckx (+/- 5 sessions)
- Fuzzy concepts, techniques and systems
  Zadeh theory and approach / possibility theory
  M. Nuttin (+/- 6 sessions)
- Comparison and question session
Practical sessions:
- 5 practical (hands-on) sessions of 2.5 hours:
  - first: introduction to Matlab
  - 2 sessions on probability examples
  - 2 sessions on fuzzy examples (the second is pen-and-paper)
- examination: exercises related to the lab sessions (1/2 of the points)
  - open book
- remainder: written theory exam
  - closed book
http://www.cs.kuleuven.ac.be/~dannyd/URKS
Uncertainty in Knowledge Systems
Introduction

(word cloud: probable, fuzzy, imprecise, likely, vague, uncertain, approximating, assuming, possible - and levels of certainty on the uncertain)
Overview:
Contents
- Sources of uncertainty
- Utility and decision theory
- Diagnosis and weak implications
- Quantification types for uncertainty
- What is a probability?
- Prior versus conditional probability
- Probabilistic rules
- Axioms of probability
- Joint probability distributions
- Motivating Bayes' rule
- Inference under uncertainty:
  - abductive reasoning
  - probabilistic reasoning
- GenInfer
- Opinion nets
- Introducing fuzzy sets
Sources of uncertainty:
1. Information obtained may take the form of weak implications
   Ex.: in diagnosis:
   disease(p, Cavity) (0.8) ← symptom(p, Toothache)
   ⇒ quantification of the frequency with which the rule applies
2. Imprecise language:
   "often", "sometimes", "frequently", "hardly ever", ...
   - need to quantify these in terms of frequency,
   - need to deal with the proposed frequency in rules.
Study of the relation between imprecise language and frequency:
Sources of uncertainty (2):
3. Unknown information.
   grass_is_wet ← sprinkler_was_on
   grass_is_wet ← rain_last_night
   We observe that grass_is_wet, but have no information on the sprinkler nor on rain.
   How to reason? ⇒ Abductive reasoning
   Note: there can be "ranges" of unknown, depending on additional evidence.
   ⇒ quantification of possible conclusions
4. Conflicting information.
   Ex.: several experts have provided conflicting information:
   ⇒ quantification of a measure of belief
Sources of uncertainty (3):
5. Vague concepts:
   Herman is tall.
   What is Herman's height?
   - at least 1.80 m?
   - could Herman be 1.78 m and still be tall?
   - if Herman is in the population of basketball players, is Herman still tall?
   - if Herman were a kid of 9 years and 1.45 m, would Herman also be tall?
   ⇒ We may want to quantify the degree to which Herman belongs to the set of 'tall_people'
Sources of uncertainty (4):
6. Precise specifications may be too complex:
   Plan_90: leave 90 minutes before departure
   Problem: will Plan_90 succeed?
   This again depends on unknown information (traffic jam? earthquake? accident?)
   BUT: enumeration of all conditions may be impossible or impractical:
   Succeed(Plan_90) ← not(car_break_down) and
                      not(out_of_gas) and
                      not(accident) and
                      not(traffic_jam) and
                      ...
   ⇒ Quantification of an estimated degree of success instead of a specification of all conditions
Sources of uncertainty (5):
7. Propagation of uncertain information:
   Tomorrow(cold) (0.7)
   Tomorrow(rain) (0.6)
   Can we easily determine the uncertainty for:
   Tomorrow(cold) ∧ Tomorrow(rain) (?)
   Or for:
   Tomorrow(cold) ∨ Tomorrow(rain) (?)
   Not without sufficient information on the interdependencies of the events!
   In the absence of information on the interdependencies: propagation of uncertain knowledge increases the uncertainty of the conclusions.
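A minimal sketch (added for illustration, not course material) of the tightest bounds that hold when nothing is known about the interdependencies:

# Bounds on P(A and B) and P(A or B) given only P(A) and P(B),
# with no information on how A and B depend on each other.

def and_bounds(p_a, p_b):
    # P(A and B) can lie anywhere between these two values.
    return max(0.0, p_a + p_b - 1.0), min(p_a, p_b)

def or_bounds(p_a, p_b):
    # P(A or B) can lie anywhere between these two values.
    return max(p_a, p_b), min(1.0, p_a + p_b)

# Tomorrow(cold) has probability 0.7, Tomorrow(rain) has 0.6:
print(and_bounds(0.7, 0.6))  # (0.3, 0.6): the conjunction stays quite uncertain
print(or_bounds(0.7, 0.6))   # (0.7, 1.0)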
Utility theory:
Plan_90: leave 90 minutes before departure
Plan_120: leave 120 minutes ...
Plan_1440: leave 24 hours ...
Assume that Plan_90 is the right thing to do: what would this mean?
- Plan_120 is more likely to succeed.
- Plan_1440 is practically sure to succeed.
BUT: Plan_90 attempts to optimize all our goals:
- arrive on time for the flight
- avoid a long wait at the airport
- avoid getting speeding tickets on the drive
- ...
⇒ Utility theory is used to represent and reason about preferences.
Decision theory:
If we have expressed preferences using utility theory, and we have expressed the probabilities of events and effects by probability theory:

Decision theory = Probability theory + Utility theory

A system is rational if it chooses the action that yields the highest expected utility, averaged over the probabilities of all outcomes of the action.
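A toy illustration of this rule; the probabilities and utilities below are made up for the example, not course data:

# Each plan: (P(catch the flight), utility if caught, utility if missed).
# The long wait at the airport lowers the utility of Plan_1440's success.
plans = {
    "Plan_90":   (0.95,   100, -500),
    "Plan_120":  (0.99,    90, -500),
    "Plan_1440": (0.9999, -200, -500),
}

def expected_utility(p, u_ok, u_missed):
    # Average the utilities over the outcome probabilities.
    return p * u_ok + (1 - p) * u_missed

for name, args in plans.items():
    print(name, round(expected_utility(*args), 1))

# The rational choice maximises expected utility:
print(max(plans, key=lambda n: expected_utility(*plans[n])))  # Plan_120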
More on diagnosis and weak implications:
Is it possible to capture diagnosis in hard rules?
   disease(p, Cavity) ← symptom(p, Toothache)
is simply wrong: there may be other diseases!
   disease(p, Cavity) ∨ disease(p, GumDisease) ∨ disease(p, ImpactedWisdom) ∨ …
      ← symptom(p, Toothache)
Again an enumeration problem: do we know them all?
   symptom(p, Toothache) ← disease(p, Cavity)
wrong again: not every cavity causes pain!
There is no correct logical rule.
The best we can do is provide "a quantification of belief".
What kind of quantifications?
Basic distinction:
- Degree of belief
- Degree of membership
Degree of belief:
- Probability (respecting the axioms of probability theory)
  Given Toothache, there is an 80% chance of Cavity
- Certainty factors (don't respect the axioms)
  Toothache gives Cavity with factor 0.7 (in [-1,1])
Degree of membership:
- Fuzzy logic (measures in vague concepts)
  Herman is tall (with 95% measure of belonging to the set of 'tall people')
What is a probability?
P(A) = a number in the range [0,1] expressing the degree of belief in A.
An often used intuition: counting
   P(randomly chosen person is Chinese) = #(Chinese) / #(all_people)
An interesting intuition to verify the basic axioms and rules of probability.
BUT: counting is not always possible, nor desirable.
What is a probability? (2)
Statistics may help:
- Count a randomly selected subset of the population
- determine the ratio (e.g. of Chinese) in this subset and take it as the probability
But even more often:
- A general measure of belief on the basis of
  - prior experience
  - intuition
  - ...
Crucial is:
- Belief and probability change with newly gathered information!
⇒ Prior probability
⇒ Conditional probability
Prior versus Conditional:
- The Chinese student story
- The bombed plane story
The Chinese student story
Prior probability:
   A = a randomly chosen student in some classroom is Chinese
   P(A) = 1/6
Conditional probability:
add information:
   B = the chosen classroom is in K.U.Leuven
   P(A | B) = 1/60
add information:
   C = the classroom is from MAI
   P(A | B ∧ C) = 1/4
Prior versus Conditional (2):
The bombed plane story
Prior probability:
   A = there is a bomb on board
   P(A) = 1/10⁶
Conditional probability:
add information:
   A' = there is (independently) another bomb on board
   chance of A ∧ A': P(A ∧ A') = 1/10¹²
   B = I bring a second bomb myself
   P(A | B) = 1/10⁶
- Prior probability P(A):
  probability of A in the absence of any other information
- Conditional probability P(A|B):
  probability of A given that we already know B
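Spelling out the joke (a one-line derivation added for clarity): A and B are independent, so bringing your own bomb does not make another one any less likely:

   P(A | B) = P(A ∧ B) / P(B) = P(A) · P(B) / P(B) = P(A) = 1/10⁶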
Toothache example:
P(Cavity) = 0.1
   10% of all individuals have a cavity
P(Toothache) = 0.05
   5% have a toothache
P(Cavity | Toothache) = 0.8
   given that we know the individual has a toothache, there is an 80% chance of a cavity
P(Toothache | Cavity) = 0.4
   conditional probability is NOT symmetric
P(Cavity | Toothache ∧ ¬GumDisease) = 0.9
   additionally given that another diagnosis is already excluded, the conditional probability increases
P(Cavity | Toothache ∧ FalseTeeth) = 0
   adding information does not necessarily increase the probability
A sensible semantics for probabilistic rules:
A probabilistic rule
   A (factor) ← B1 ∧ B2 ∧ … ∧ Bn
should best have the semantics:
   P(A | B1 ∧ B2 ∧ … ∧ Bn) = factor
So we simply have an alternative syntax:
   disease(p, Cavity) (0.8) ← symptom(p, Toothache)
   ≡ P(Cavity | Toothache) = 0.8
But this is NOT standard at all!
In probabilistic logic programming (Subrahmanian et al.):
   A [n1,n2] ← B [m1,m2]
means: if the probability that in some possible world B is true is between m1 and m2,
then the probability that in some world A is true is between n1 and n2.
Example: flip a coin, with events A = head, B = tail
   head [0.5,0.5] ← tail [0.5,0.5]
If the probability of a world in which you get tail is 0.5,
then the probability of a world in which you get head is also 0.5.
Notice: no world in the intersection!
In the conditional probability semantics this would be:
   head (0.0) ← tail
The axioms of Probability
1. 0 ≤ P(A) ≤ 1
2. P(True) = 1, P(False) = 0
3. P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
   (Venn diagram: the overlap A ∧ B is counted twice in P(A) + P(B))
Derived: P(¬A) = 1 - P(A)
These axioms are the major difference with "certainty factor" systems, which do NOT respect them (Mycin: factors in the range [-1, 1]).
Joint probability distribution
Given 2 properties/events: list the entire distribution of all probability assignments to all possible combinations of truth values for the properties/events:

              Toothache   ¬Toothache
   Cavity        0.04        0.06
   ¬Cavity       0.01        0.89

All prior and conditional probabilities can be derived!
   P(Toothache | Cavity) = 0.04 / (0.04 + 0.06) = 0.4
BUT: gathering this distribution is often not possible, or at least very tedious.
⇒ Bayes' rule!
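A small sketch (added for illustration) of how such probabilities follow mechanically from the joint distribution above:

joint = {  # P(Cavity = c, Toothache = t), the table above
    (True,  True):  0.04,
    (True,  False): 0.06,
    (False, True):  0.01,
    (False, False): 0.89,
}

def p_cavity():
    # Prior P(Cavity): marginalise Toothache out.
    return sum(p for (c, t), p in joint.items() if c)

def p_toothache_given_cavity():
    # P(Toothache | Cavity) = P(Toothache and Cavity) / P(Cavity).
    return joint[(True, True)] / p_cavity()

print(p_cavity())                  # 0.1
print(p_toothache_given_cavity())  # 0.4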
Inference under Uncertainty.
The logic version of the burglary-alarm example:
(network: burglary and earthquake can set off the alarm; the alarm makes John call and, unless there is loud music, makes Mary call; a ringing phone can also make John call)
In logic:
   JohnCalls ← Alarm
   JohnCalls ← PhoneRings
   Alarm ← Burglary
   Alarm ← EarthQuake
   MaryCalls ← Alarm ∧ ¬LoudMusic
What can we deduce from an observation that John calls, Mary doesn't, and Mary's CD player was broken?
Abductive Reasoning
Deductive reasoning (using Modus Ponens):
   from A and B ← A, conclude B.
Abductive reasoning (assume A is unknown):
   from B and B ← A, abduce A.
Abduce that A holds as an explanation for the observation B.
More generally: given a set of observations and a set of logical rules, find a set of hypotheses (from the unknown properties) that allows us to deduce the observations.
Abduction in burglar-alarm:
Unknown information: burglary, earthquake, phoneRings, alarm.
Observation: JohnCalls.
Rules:
   JohnCalls ← Alarm
   JohnCalls ← PhoneRings
   Alarm ← Burglary
   Alarm ← EarthQuake
   MaryCalls ← Alarm ∧ ¬LoudMusic
From the observation JohnCalls we can abduce PhoneRings, or abduce Alarm and in turn abduce Burglary or EarthQuake as its explanation.
⇒ 3 possible solutions: PhoneRings, Burglary, EarthQuake
Abduction in burglar-alarm (2):
Same rules:
   JohnCalls ← Alarm
   JohnCalls ← PhoneRings
   Alarm ← Burglary
   Alarm ← EarthQuake
   MaryCalls ← Alarm ∧ ¬LoudMusic
Observation: JohnCalls ∧ ¬MaryCalls ∧ ¬LoudMusic.
Abducing Alarm would let us deduce MaryCalls, so the hypothesis of alarm is inconsistent with the observations: we deduce ¬Alarm.
⇒ 1 possible explanation: PhoneRings
Note: abductive procedures may be complicated!
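A brute-force sketch of this kind of abduction (added for illustration; real abductive procedures are more refined). It reads the rules as the only possible causes, a closed-world assumption, and enumerates all assignments to the unknown base events:

from itertools import product

def derive(burglary, earthquake, phone_rings, loud_music):
    # Close the rules: a derived atom is true only if some rule fires.
    alarm = burglary or earthquake
    john_calls = alarm or phone_rings
    mary_calls = alarm and not loud_music
    return {"burglary": burglary, "earthquake": earthquake,
            "phoneRings": phone_rings, "loudMusic": loud_music,
            "alarm": alarm, "johnCalls": john_calls, "maryCalls": mary_calls}

def explanations(observations):
    # All worlds (assignments to the base events) matching the observations.
    sols = []
    for base in product([True, False], repeat=4):
        world = derive(*base)
        if all(world[atom] == val for atom, val in observations.items()):
            sols.append(sorted(a for a, v in world.items() if v))
    return sols

# First scenario: we only observe that John calls. Among the answers are
# the three single-hypothesis explanations: phoneRings, burglary, earthquake.
print(explanations({"johnCalls": True}))

# Second scenario: John calls, Mary does not, and there is no loud music.
print(explanations({"johnCalls": True, "maryCalls": False,
                    "loudMusic": False}))  # only phoneRings survives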
Belief Networks or Bayesian Nets
The probabilistic version of the burglary-alarm example:
(network: burglary → alarm ← earthquake; alarm → JohnCalls; alarm → MaryCalls)

P(B) = .001        P(E) = .002

P(A | B, E):
   B  E   P(A)
   T  T   .95
   T  F   .94
   F  T   .29
   F  F   .001

P(J | A):
   A   P(J)
   T   .90
   F   .05

P(M | A):
   A   P(M)
   T   .70
   F   .01

An acyclic directed network!
Prior probabilities for the roots, conditional probabilities (on the parents) for the lower levels.
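Worth making explicit (standard Bayesian-network semantics, stated here for later use): the network encodes the full joint distribution as the product of the local tables:

   P(J, M, A, B, E) = P(B) · P(E) · P(A | B, E) · P(J | A) · P(M | A)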
Inference in Belief Networks
(the same network and probability tables as above)
Many (all) types of questions can be answered, using Bayes' rule.
What is the probability that there is no burglary nor earthquake, but that the alarm went off and both John and Mary called?
   = P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = 0.00062
What is the probability that there is a burglary, given that John calls?
   = P(B | J) = 0.016 (via Bayes' rule)
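A sketch (added for illustration) that answers both questions by brute-force enumeration of the joint distribution encoded by the network:

from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}                      # P(MaryCalls | Alarm)

def pr(value, p):
    return p if value else 1 - p

def joint(b, e, a, j, m):
    # P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)
    return (pr(b, P_B) * pr(e, P_E) * pr(a, P_A[(b, e)])
            * pr(j, P_J[a]) * pr(m, P_M[a]))

# P(J and M and A and not B and not E):
print(joint(False, False, True, True, True))  # ~0.00062

# P(B | J) = P(B and J) / P(J), summing out the other variables:
worlds = list(product([True, False], repeat=5))
p_j = sum(joint(*w) for w in worlds if w[3])
p_bj = sum(joint(*w) for w in worlds if w[0] and w[3])
print(p_bj / p_j)  # ~0.016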
An application: GENINFER
- A couple is expecting a child.
- The (expecting) mother has a hemophilia risk.
⇒ determine the probability of hemophilia for the child
Hemophilia is genetically determined:
- due to a defective X chromosome
(figure: chromosome pairs - a mother XX with one defective X is a carrier, a father Xy with a defective X is a hemophiliac; the child inherits one chromosome from each pair)
The Bayesian Network:
P(M) = .00008 (mother_carrier)      P(F) = .00008 (father_hemoph)

P(C | M, F) (child_recessive):
   M  F   P(C)
   T  T   .75
   T  F   .50
   F  T   .50
   F  F   0

A family tree:
(figure: great grandmother and great grandfather, whose son, a great uncle, is a hemophiliac; grandmother and grandfather; mother and father; the men are marked 'ok' (healthy), the women '?' (carrier status unknown))
Expanding to full network:
Tempting solution: fix the observed family members directly:
   P(GGM) = 1, P(GU) = 1 (the great uncle is a hemophiliac, so the great grandmother must be a carrier)
   P(GGF) = 0, P(GF) = 0, P(F) = 0 (these men are known to be healthy)
But these are not prior probabilities!
But in fact this remains correct if you interpret the events differently.
Expanding to full network (2)
Instead, keep the priors P(GGM) = P(GGF) = .00008 at the roots and treat the family data as evidence:
Compute: P(GGM | GU ∧ ¬GGF) = 1
Compute: P(GM | GGM ∧ ¬GGF) = 0.5, etc.
All dependencies are based on:
   M  F   P(C)
   T  T   .75
   T  F   .50
   F  T   .50
   F  F   0
Propagating the evidence down the tree gives the posterior carrier probabilities GM = 0.5 and M = 0.25, and C = 0.125 for the child (the fathers keep their tiny prior of .00008).
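A sketch of this chain of computations (added for illustration; it ignores the fathers' tiny .00008 prior risk):

# Given the great uncle's hemophilia, the great grandmother is a carrier
# with probability 1; each generation of daughters then halves the carrier
# probability, and the table gives P(C | M carrier, F healthy) = 0.5.

p_ggm = 1.0          # posterior: certainly a carrier
p_gm = 0.5 * p_ggm   # the grandmother inherits the bad X with prob. 1/2
p_m = 0.5 * p_gm     # the mother likewise
p_c = 0.5 * p_m      # the child: P(C | M=T, F=F) = 0.5

print(p_gm, p_m, p_c)  # 0.5 0.25 0.125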
And if there are uncles?
Recompute: P(GM | GGM ∧ ¬GGF ∧ ¬U1 ∧ ¬U2) (the two healthy uncles are extra evidence), and propagate the information to Mother and Child.
(figure: the network extended with uncles U1 and U2 as sons of GGM and GGF; the probability of hemophilia for the child drops to 0.028)
And brothers?
Probability under the additional condition of 3 healthy brothers: a further decrease.
(figure: the network further extended with brothers B1, B2 and B3 as sons of M and F; the probability for the child drops to 0.007)
Belief Networks as Rule Systems
The burglary-alarm network once more, now written as probabilistic rules:
   Burglary (0.001) ←
   Earthquake (0.002) ←
   Alarm (0.95) ← Burglary ∧ Earthquake
   Alarm (0.94) ← Burglary ∧ ¬Earthquake
   Alarm (0.29) ← ¬Burglary ∧ Earthquake
   Alarm (0.001) ← ¬Burglary ∧ ¬Earthquake
   JohnCalls (0.90) ← Alarm
   JohnCalls (0.05) ← ¬Alarm
   MaryCalls (0.70) ← Alarm
   MaryCalls (0.01) ← ¬Alarm
This doesn't add anything ... but it shows that you need many rules to represent the full Bayesian net.
In many cases you may not have all this information!
Opinion Nets: when dependencies are unknown
- Will the stock (on the stock market) split?
- Ask the opinion of 2 brokers and of 2 mystics.
(network: Broker 1 and Broker 2 feed an OR node, the brokers' opinion; Mystic 1 and Mystic 2 feed an OR node, the mystics' opinion; an AND node combines both into the overall opinion)
   Stock_split ← Brokers_say_split ∧ Mystics_say_split
   Brokers_say_split ← Broker1 ∨ Broker2
   Mystics_say_split ← Mystic1 ∨ Mystic2
+ the opinions (as probabilities) of the brokers and mystics
Opinion Nets (2)
Problem:
- We don't know the dependencies between the opinions of the brokers and/or mystics!
- How to propagate their (probabilistic) opinions?
⇒ You need upper and lower bounds!
For each event E, keep an interval around the unknown P(E) inside [0, 1]:
   U(E) ≥ P(E)   (Upperbound)
   L(E) ≤ P(E)   (Lowerbound)
We try to make these least upper bounds and greatest lower bounds.
Rules governing the propagation of bounds
... for the or connective (A and B feed an OR node computing A or B):
   U(A) ≤ U(A or B)
   U(B) ≤ U(A or B)
   L(A) ≥ L(A or B) - U(B)
   L(B) ≥ L(A or B) - U(A)
   U(A or B) ≤ U(A) + U(B)
   L(A or B) ≥ max[L(A), L(B)]
... similar for the and connective
Some example inferences:
(Venn diagrams: from disjoint A and B up to one set containing the other)
   max[P(A), P(B)] ≤ P(A or B) ≤ P(A) + P(B)
Since max(L(A), L(B)) ≤ max(P(A), P(B)), max(L(A), L(B)) is a lower bound for P(A or B):
   max(L(A), L(B)) ≤ L(A or B)
Propagation of opinions:
Using forward propagation, with A in [0.3, 0.4] and B in [0.1, 0.4]:
   U(A or B) ≤ U(A) + U(B) = 0.4 + 0.4 = 0.8
   L(A or B) ≥ max[L(A), L(B)] = max(0.3, 0.1) = 0.3
   so A or B lies in [0.3, 0.8].
Or backward propagation, starting from A or B in [0.3, 0.8] and A, B in [0, 1]:
   U(A) ≤ U(A or B) = 0.8
   U(B) ≤ U(A or B) = 0.8
   so A and B each lie in [0, 0.8].
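A minimal sketch (added for illustration) of both propagation directions for an OR node, with intervals as (lower, upper) pairs:

def or_forward(a, b):
    # Tighten the interval of (A or B) from those of A and B.
    la, ua = a
    lb, ub = b
    return max(la, lb), min(1.0, ua + ub)

def or_backward(a, b, a_or_b):
    # Tighten the intervals of A and B from that of (A or B).
    la, ua = a
    lb, ub = b
    lo, uo = a_or_b
    new_a = (max(la, lo - ub), min(ua, uo))
    new_b = (max(lb, lo - ua), min(ub, uo))
    return new_a, new_b

# The slide's forward example: A in [0.3, 0.4], B in [0.1, 0.4]
print(or_forward((0.3, 0.4), (0.1, 0.4)))          # (0.3, 0.8)

# Backward: knowing only (A or B) in [0.3, 0.8] caps A and B at 0.8;
# the lower bounds stay 0 because 0.3 - 1.0 is negative.
print(or_backward((0.0, 1.0), (0.0, 1.0), (0.3, 0.8)))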
The global propagation (restricted to lower bounds):
(figure: the full opinion net with an interval at every node; the lower bound from the brokers' OR node is 0.33, the one from the mystics' OR node is 0.85, and the AND node combines them)
   L(A and B) ≥ L(A) + L(B) - 1
   here: 0.18 = 0.33 + 0.85 - 1
⇒ Not just opinions: any probability propagation in the absence of dependency information needs to apply such approximations.
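A quick check (illustrative) of the and rule used in the last step:

def and_lower(l_a, l_b):
    # L(A and B) >= L(A) + L(B) - 1, never below 0.
    return max(0.0, l_a + l_b - 1.0)

print(and_lower(0.33, 0.85))  # 0.18, the overall lower bound above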
Going Fuzzy … for a few minutes.
Examples of fuzzy statements:
- The motor is running very hot.
- Tom is a very tall guy.
- Electric cars are not very fast.
- High-performance drives require very rapid dynamics and precise regulation.
- Leuven is quite a short distance from Brussels.
- Leuven is a beautiful city.
- The maximum range of an electric vehicle is short.
If short means 300 km or less, would 301 km be long?
⇒ We want to express to what degree a property holds.
Relations, sets and functions
These offer alternative representations of logical statements. All three below are equivalent!
Relations:
   loves(John, Mary)
   loves(Mary, Phil)
   loves(Carl, Susan)
Sets:
   loves = {(John, Mary), (Mary, Phil), (Carl, Susan)}
Functions to {0,1}:
   loves(John, Mary) = 1
   loves(Mary, Phil) = 1
   loves(Carl, Susan) = 1
The last representation is the one that allows refined statements.
Fuzzy sets:
Are functions f: domain → [0,1].
(figures: over a height axis from 150 to 210 cm, the crisp set 'tall men' jumps from 0 to 1 at a single threshold, while the fuzzy set 'tall men' rises gradually from 0 to 1)
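A minimal sketch of the two membership functions (the threshold values 170-190 cm are assumed for the example, not taken from the course):

def tall_crisp(height_cm):
    # Crisp set: tall means 180 cm or more, full stop.
    return 1.0 if height_cm >= 180 else 0.0

def tall_fuzzy(height_cm):
    # Fuzzy set: membership rises linearly from 0 at 170 cm to 1 at 190 cm.
    if height_cm <= 170:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 170) / 20

for h in (165, 175, 178, 185, 195):
    print(h, tall_crisp(h), round(tall_fuzzy(h), 2))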
Representing a domain:
(figures: over the same height axis, crisp sets split men's height into 'short', 'medium' and 'tall' with sharp boundaries, while the fuzzy sets for 'short', 'medium' and 'tall' overlap and change membership gradually)
And much, MUCH MORE ...
... in the coming lessons.