
Probabilistic Reasoning and
Bayesian Networks
Lecture Prepared For
COMP 790-058
Yue-Ling Wong
Probabilistic Robotics
• A relatively new approach to robotics
• Deals with uncertainty in robot perception and
action
• The key idea is to represent uncertainty explicitly
using the calculus of probability theory
• i.e. represent information by probability
distributions over a whole space of guesses,
instead of relying on a single "best guess"
3 Parts of this Lecture
• Part 1. Acting Under Uncertainty
• Part 2. Bayesian Networks
• Part 3. Probabilistic Reasoning in Robotics
Reference for Part 3
• Sebastian Thrun et al. (2005), Probabilistic Robotics
• The book covers major techniques and
algorithms in localization, mapping, planning and
control
• All algorithms in the book are based on a single
overarching mathematical foundation:
– Bayes rule
– its temporal extension known as Bayes filters
Goals of this lecture
• To introduce this overarching mathematical
foundation: Bayes rule and its temporal
extension, known as Bayes filters
• To show how Bayes rule and Bayes filters
are used in robotics
Preliminaries
• Part 1
– Probability theory
– Bayes rule
• Part 2
– Bayesian Networks
– Dynamic Bayesian Networks
Outline of this lecture
• Part 1. Acting Under Uncertainty (October 20)
– To go over the fundamentals of probability theory necessary to
understand Bayesian reasoning
– Start from an AI perspective, without the temporal aspect of
robotics
• Part 2. Bayesian Networks (October 22)
– DAG representation of random variables
– Dynamic Bayesian Networks (DBN) to handle uncertainty and changes
over time
• Part 3. Probabilistic Reasoning in Robotics (October 22)
– To give a general idea of how DBNs are used in robotics to handle
changes in sensor and control data over time when making inferences
– Demonstrate the use of Bayes rule and the Bayes filter in a simple example
of a mobile robot monitoring the status (open or closed) of a door
Historical Background and Applications of
Bayesian Probabilistic Reasoning
• Bayesian probabilistic reasoning has been used
in AI since the 1960s, especially in medical diagnosis
• One system outperformed human experts in the
diagnosis of acute abdominal illness
(de Dombal et al., British Medical Journal, 1974)
Historical Background and Applications of
Bayesian Probabilistic Reasoning
• Directed Acyclic Graph (DAG) representations for
Bayesian reasoning started in the 1980s
• Example systems using Bayesian networks
(1980s-1990s):
– MUNIN system: diagnosis of neuromuscular disorders
– PATHFINDER system: pathology
Historical Background and Applications of
Bayesian Probabilistic Reasoning
• NASA AutoClass for data analysis
http://ti.arc.nasa.gov/project/autoclass/autoclass-c/
finds the set of classes that is maximally
probable with respect to the data and model
• Bayesian techniques are utilized to calculate the
probability of a call being fraudulent at AT&T
Historical Background and Applications of
Bayesian Probabilistic Reasoning
By far the most widely used Bayesian network systems:
– The diagnosis-and-repair modules (e.g. Printer Wizard) in
Microsoft Windows
(Breese and Heckerman (1996). Decision-theoretic troubleshooting: A framework
for repair and experiment. In Uncertainty in Artificial Intelligence: Proceedings of
the Twelfth Conference, pp. 124-132)
– Office Assistant in Microsoft Office
(Horvitz, Breese, Heckerman, and Hovel (1998). The Lumiere project: Bayesian
user modeling for inferring the goals and needs of software users. In Uncertainty
in Artificial Intelligence: Proceedings of the Fourteenth Conference, pp. 256-265.
http://research.microsoft.com/~horvitz/lumiere.htm)
– Bayesian inference for e-mail spam filtering
Historical Background and Applications of
Bayesian Probabilistic Reasoning
• An important application of temporal probability
models: Speech recognition
References and Sources of Figures
• Part 1:
Stuart Russell and Peter Norvig, Artificial
Intelligence: A Modern Approach, 2nd ed., Prentice
Hall, Chapter 13
• Part 2:
Stuart Russell and Peter Norvig, Artificial Intelligence:
A Modern Approach, 2nd ed., Prentice Hall, Chapters 14 &
15
• Part 3:
Sebastian Thrun, Wolfram Burgard, and Dieter Fox,
Probabilistic Robotics, Chapter 2
Part 1 of 3: Acting Under
Uncertainty
Uncertainty Arises
• The agent's sensors give only partial, local
information about the world
• Noise in sensor data
• Uncertainty in manipulators
• Dynamic aspects of situations (e.g. changes
over time)
Degree of Belief
An agent's knowledge can at best provide
only a degree of belief in the relevant
sentences.
One of the main tools to deal with degrees
of belief will be probability theory.
Probability Theory
Assigns to each sentence a numerical
degree of belief between 0 and 1.
In Probability Theory
You may assign 0.8 to the sentence:
"The patient has a cavity."
This means you believe:
"The probability that the patient has a
cavity is 0.8."
• It depends on the percepts that the agent has received
to date.
• The percepts constitute the evidence on which
probability assessments are based.
Versus In Logic
You assign true or false to the same
sentence.
Whether it is true or false depends on the
interpretation and the world.
Terminology
• Prior or unconditional probability
– The probability before the evidence is
obtained.
• Posterior or conditional probability
– The probability after the evidence is obtained.
Example
Suppose the agent has drawn a card from a
shuffled deck of cards.
Before looking at the card, the agent might assign
a probability of 1/52 to its being the ace of
spades.
After looking at the card, the agent has obtained
new evidence. The probability for the same
proposition (the card being the ace of spades)
would be 0 or 1.
Terminology and Basic
Probability Notation
Terminology and Basic Probability Notation
Proposition
An assertion that such-and-such is the case.
Terminology and Basic Probability Notation
Random variable
Refers to a "part" of the world whose
"status" is initially unknown.
Example: Cavity might refer to whether
the patient's lower left wisdom tooth has a
cavity.
Convention used here: Capitalize the
names of random variables.
Terminology and Basic Probability Notation
Domain of a random variable
The collection of values that a random
variable can take on.
Example: The domain of Cavity might
be: true, false
The domain of Weather might be:
sunny, rainy, cloudy, snow
Terminology and Basic Probability Notation
Abbreviations used here:
cavity to represent Cavity = true
¬cavity to represent Cavity = false
snow to represent Weather = snow
cavity ∧ ¬toothache
to represent:
Cavity = true ∧ Toothache = false
Terminology and Basic Probability Notation
cavity ∧ ¬toothache
or
Cavity = true ∧ Toothache = false
is a proposition that may be assigned a
degree of belief
Terminology and Basic Probability Notation
Prior or unconditional probability
The degree of belief associated with a
proposition in the absence of any other
information.
Example:
p(Cavity=true) = 0.1
or p(cavity) = 0.1
Terminology and Basic Probability Notation
p(Weather=sunny) = 0.7
p(Weather=rain) = 0.2
p(Weather=cloudy) = 0.08
p(Weather=snow) = 0.02
or we may simply write
P(Weather) = 0.7, 0.2, 0.08, 0.02
Terminology and Basic Probability Notation
Prior probability distribution
A vector of values for the probabilities of
each individual state of a random variable
Example: This denotes a prior probability
distribution for the random variable
Weather.
P(Weather) = 0.7, 0.2, 0.08, 0.02
Terminology and Basic Probability Notation
Joint probability distribution
The probabilities of all combinations of the
values of a set of random variables.
P(Weather, Cavity)
denotes the probabilities of all combinations
of the values of a set of random variables
Weather and Cavity.
Terminology and Basic Probability Notation
P(Weather, Cavity)
can be represented by a 4x2 table of probabilities:

                     Cavity = true    Cavity = false
  Weather = sunny
  Weather = rainy
  Weather = cloudy
  Weather = snow
Terminology and Basic Probability Notation
Full joint probability distribution
The probabilities of all combinations of the
values of the complete set of random
variables.
Terminology and Basic Probability Notation
Example: Suppose the world consists of just
the variables Cavity, Toothache, and
Weather.
P(Cavity, Toothache, Weather)
denotes the full joint probability distribution
which can be represented as a 2x2x4
table with 16 entries.
Terminology and Basic Probability Notation
Posterior or conditional probability
Notation: p(a|b)
Read as: "The probability of proposition a,
given that all we know is proposition b."
Terminology and Basic Probability Notation
Example:
p(cavity|toothache) = 0.8
Read as:
"If a patient is observed to have a
toothache and no other information is yet
available, then the probability of the
patient's having a cavity will be 0.8."
Terminology and Basic Probability Notation
Equation:
p(a | b) = p(a ∧ b) / p(b)
where p(b) > 0
Terminology and Basic Probability Notation
Product rule
p(a ∧ b) = p(a | b) p(b)
which is rewritten from the previous equation
p(a | b) = p(a ∧ b) / p(b)
Terminology and Basic Probability Notation
Product rule
p(a ∧ b) = p(a | b) p(b)
can also be written the other way around:
p(a ∧ b) = p(b ∧ a) = p(b | a) p(a)
Terminology and Basic Probability Notation
Intuition
[Venn diagram: the region cavity ∧ toothache is the overlap of the cavity and toothache regions.]
p(cavity | toothache) = p(cavity ∧ toothache) / p(toothache)
Terminology and Basic Probability Notation
Intuition
[Venn diagram: the same regions, read the other way around.]
p(toothache | cavity) = p(toothache ∧ cavity) / p(cavity)
Derivation of Bayes' Rule
p(a ∧ b) = p(a | b) p(b)
p(b ∧ a) = p(b | a) p(a)
p(b ∧ a) = p(a ∧ b)
⇒ p(b | a) p(a) = p(a | b) p(b)
⇒ p(b | a) = p(a | b) p(b) / p(a)
Terminology and Basic Probability Notation
Bayes' rule, Bayes' law, or Bayes' theorem
p(b | a) = p(a | b) p(b) / p(a)
Bayesian Spam Filtering
Given that an email contains certain words, the
probability that the email is spam equals the
probability of finding those words in spam email,
times the probability that any email is spam,
divided by the probability of finding those words
in any email.
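
As a minimal sketch (Python; the numbers are illustrative assumptions, not from the slides), the computation is a direct application of Bayes' rule:

    # Bayes' rule for spam scoring:
    # p(spam | words) = p(words | spam) * p(spam) / p(words)
    p_words_given_spam = 0.20  # assumed: probability of these words in spam email
    p_spam = 0.30              # assumed: prior probability that any email is spam
    p_words = 0.08             # assumed: probability of these words in any email

    p_spam_given_words = p_words_given_spam * p_spam / p_words
    print(p_spam_given_words)  # 0.75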
Speech Recognition
P(words | signal) = α P(signal | words) P(words)
Given the acoustic signal, the probability
that the signal corresponds to the words is
equal to the probability of getting the
signal with the words, times the probability
of finding those words in any speech,
times a normalization coefficient
Terminology and Basic Probability Notation
Conditional distribution
Notation: P(X|Y)
It gives the values of p(X=xi | Y=yj) for each
possible i, j.
Terminology and Basic Probability Notation
Conditional distribution
Example:
P(X,Y) = P(X|Y)P(Y)
denotes a set of equations:
p(X=x1 ∧ Y=y1) = p(X=x1 | Y=y1) p(Y=y1)
p(X=x1 ∧ Y=y2) = p(X=x1 | Y=y2) p(Y=y2)
...
Probabilistic Inference
Using Full Joint Distributions
Terminology and Basic Probability Notation
Simple dentist diagnosis example.
3 Boolean variables:
• Toothache
• Cavity
• Catch (the dentist's steel probe catches in
the patient's tooth)
A full joint distribution for the
Toothache, Cavity, Catch world
               toothache            ¬toothache
            catch    ¬catch      catch    ¬catch
  cavity    0.108    0.012       0.072    0.008
  ¬cavity   0.016    0.064       0.144    0.576
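
As a minimal sketch (Python; not part of the slides), the same table can be stored as a lookup from truth values to probabilities, which the calculations on the following slides can reuse:

    # Full joint distribution P(Cavity, Toothache, Catch) from the table above,
    # keyed by (cavity, toothache, catch).
    joint = {
        (True,  True,  True):  0.108, (True,  True,  False): 0.012,
        (True,  False, True):  0.072, (True,  False, False): 0.008,
        (False, True,  True):  0.016, (False, True,  False): 0.064,
        (False, False, True):  0.144, (False, False, False): 0.576,
    }
    assert abs(sum(joint.values()) - 1.0) < 1e-9  # entries sum to 1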
Getting information from the full
joint distribution
               toothache            ¬toothache
            catch    ¬catch      catch    ¬catch
  cavity    0.108    0.012       0.072    0.008
  ¬cavity   0.016    0.064       0.144    0.576

p(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
Getting information from the full
joint distribution
               toothache            ¬toothache
            catch    ¬catch      catch    ¬catch
  cavity    0.108    0.012       0.072    0.008
  ¬cavity   0.016    0.064       0.144    0.576

p(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
(an unconditional or marginal probability)
Marginalization, Summing Out,
Theorem of Total Probability,
and Conditioning
Getting information from the full
joint distribution
               toothache            ¬toothache
            catch    ¬catch      catch    ¬catch
  cavity    0.108    0.012       0.072    0.008
  ¬cavity   0.016    0.064       0.144    0.576

p(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
= p(cavity, toothache, catch) + p(cavity, toothache, ¬catch)
+ p(cavity, ¬toothache, catch) + p(cavity, ¬toothache, ¬catch)
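
In code, summing out is a one-line sum over the joint table (a Python sketch, assuming the joint dictionary from the earlier sketch):

    # p(cavity): sum the joint entries over all values of Toothache and Catch.
    p_cavity = sum(p for (cavity, toothache, catch), p in joint.items() if cavity)
    print(p_cavity)  # 0.108 + 0.012 + 0.072 + 0.008 = 0.2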
Marginalization Rule
Marginalization rule
For any sets of variables Y and Z,
P(Y) = Σz P(Y, z)
A distribution over Y can be obtained by summing
out all the other variables from any joint
distribution containing Y.
A variant of the rule after applying
the product rule
Conditioning
For any sets of variables Y and Z,
P(Y) = Σz P(Y | z) P(z)
or
p(y) = Σz p(y | z) p(z)
Read as: Y is conditioned on the variable Z.
Often referred to as the theorem of total probability.
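
The identity can be checked numerically against the joint table sketched earlier (a Python sketch, assuming the joint dictionary defined above):

    # Theorem of total probability, conditioning Cavity on Toothache:
    # p(cavity) = sum over t of p(cavity | t) p(t)
    def p_toothache(t):
        # Marginal probability that Toothache takes truth value t.
        return sum(pr for (c, tt, k), pr in joint.items() if tt == t)

    def p_cavity_given_toothache(t):
        # Conditional p(cavity | Toothache = t), read off the joint table.
        num = sum(pr for (c, tt, k), pr in joint.items() if c and tt == t)
        return num / p_toothache(t)

    total = sum(p_cavity_given_toothache(t) * p_toothache(t) for t in (True, False))
    print(total)  # 0.2, matching the direct marginal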
Getting information from the full
joint distribution
               toothache            ¬toothache
            catch    ¬catch      catch    ¬catch
  cavity    0.108    0.012       0.072    0.008
  ¬cavity   0.016    0.064       0.144    0.576

The probability of a cavity, given evidence of a toothache:
p(cavity | toothache)
(a conditional probability)
Getting information from the full
joint distribution
               toothache            ¬toothache
            catch    ¬catch      catch    ¬catch
  cavity    0.108    0.012       0.072    0.008
  ¬cavity   0.016    0.064       0.144    0.576

The probability of a cavity, given evidence of a toothache:
p(cavity | toothache) = p(cavity ∧ toothache) / p(toothache)
= (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064)
= 0.6
(a conditional probability)
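
The same calculation in code (a Python sketch, assuming the joint dictionary defined earlier):

    # p(cavity | toothache) = p(cavity AND toothache) / p(toothache)
    num = sum(pr for (c, t, k), pr in joint.items() if c and t)  # 0.108 + 0.012
    den = sum(pr for (c, t, k), pr in joint.items() if t)        # 0.108 + 0.012 + 0.016 + 0.064
    print(num / den)  # 0.6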
Independence
Independence
If the propositions a and b are independent,
then
p(a|b) = p(a)
p(b|a) = p(b)
p(ab) = p(a,b) = p(a)p(b)
Think about the coin flipping example.
Independence Example
Suppose Weather and Cavity are
independent.
p(cavity | Weather=cloudy) = p(cavity)
p(Weather=cloudy | cavity) = p(Weather=cloudy)
p(cavity, Weather=cloudy) =
p(cavity)p(Weather=cloudy)
Similarly…
If the variables X and Y are independent,
then
P(X|Y) = P(X)
P(Y|X) = P(Y)
P(X,Y) = P(X)P(Y)
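
As a minimal sketch (Python; the distributions are illustrative), independence means the joint table can be rebuilt from the two marginals:

    # If Weather and Cavity are independent, p(w, c) = p(w) * p(c) for all w, c,
    # so we store 4 + 2 numbers instead of a full 4 x 2 joint table.
    p_weather = {"sunny": 0.7, "rainy": 0.2, "cloudy": 0.08, "snow": 0.02}
    p_cavity = {True: 0.2, False: 0.8}

    joint_wc = {(w, c): pw * pc
                for w, pw in p_weather.items()
                for c, pc in p_cavity.items()}

    assert abs(joint_wc[("sunny", True)] - 0.7 * 0.2) < 1e-9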
Normalization
Previous Example
               toothache            ¬toothache
            catch    ¬catch      catch    ¬catch
  cavity    0.108    0.012       0.072    0.008
  ¬cavity   0.016    0.064       0.144    0.576

The probability of a cavity, given evidence of a toothache:
p(cavity | toothache) = p(cavity ∧ toothache) / p(toothache)
= (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064)
= 0.6
Previous Example
               toothache            ¬toothache
            catch    ¬catch      catch    ¬catch
  cavity    0.108    0.012       0.072    0.008
  ¬cavity   0.016    0.064       0.144    0.576

The probability of no cavity, given evidence of a toothache:
p(¬cavity | toothache) = p(¬cavity ∧ toothache) / p(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
= 0.4
Normalization
The term 1/p(toothache) remains constant, no
matter which value of Cavity we calculate.
In fact, it can be viewed as a normalization
constant for the distribution
P(Cavity|toothache), ensuring that it adds
up to 1.
Recall this example…
               toothache            ¬toothache
            catch    ¬catch      catch    ¬catch
  cavity    0.108    0.012       0.072    0.008
  ¬cavity   0.016    0.064       0.144    0.576

The probability of a cavity, given evidence of a toothache:
p(cavity | toothache) = p(cavity ∧ toothache) / p(toothache)
= (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064)
= 0.6
Now, normalization simplifies
the calculation
               toothache            ¬toothache
            catch    ¬catch      catch    ¬catch
  cavity    0.108    0.012       0.072    0.008
  ¬cavity   0.016    0.064       0.144    0.576

The probability distribution of Cavity, given evidence of a toothache:
P(Cavity | toothache) = α P(Cavity, toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
= α ⟨0.12, 0.08⟩
= ⟨0.6, 0.4⟩
Example of Probabilistic Inference:
Wumpus World
[Figure: a Wumpus world grid. Visited squares are marked OK; unvisited frontier squares are marked Pit? / Wumpus?]
Now what??
By applying Bayes' rule, you can calculate the probabilities of these cells having a
pit, based on the known information.
[Figure: the same grid, now annotated with pit probabilities of 0.31, 0.86, and 0.31 for the three frontier squares.]
To Calculate the Probability Distribution for
Wumpus Example
Let Unknown be a composite variable consisting of the
Pi,j variables for the squares other than the Known
squares and the query square [1,3]