Transcript Document
CISC453 Winter 2010
AIMA3e Chapter 13:
Quantifying Uncertainty
OUTLINE
2
overview
1. rationale for a new representational language
what logical representations can't do
2.
3.
4.
5.
6.
utilities & decision theory
possible worlds & propositions
unconditional & conditional probabilities
random variables
probability distributions
Quantifying Uncertainty
OUTLINE
3
overview
7. using the Joint Probability Distribution
for inference by enumeration
for unconditional & conditional probabilities
8. reducing complexity
independence
9. Bayes' Rule
from causal probability to diagnostic probability
conditional independence
10. pits & probability in Wumpus World
11. summary
Quantifying Uncertainty
4
Quantifying Uncertainty
consider our approach so far
we've handled limited observability &/or non-determinism
using belief states that capture all possible world states
but the representation can become large, as can corresponding
contingent plans, and it's possible that no plan can be
guaranteed to reach the goal, yet the agent must act
agents should behave rationally
this rationality depends both on the importance of goals and
on the chances of & degree to which they'll be reached
Quantifying Uncertainty
5
A Visit to the Dentist
we'll use medical/dental diagnosis examples
extensively
our new prototype problem relates to whether a dental
patient has a cavity or not
the process of diagnosis always involves uncertainty & this
leads to difficulty with logical representations (propositional
logic examples)
(1) toothache cavity
(2) toothache cavity gumDisease ...
(3) cavity toothache
(1) is just wrong since other things cause toothaches
(2) will need to list all possible causes
(3) tries a causal rule but it's not always the case that cavities
cause toothaches & fixing the rule requires making it logically
exhaustive
Quantifying Uncertainty
6
Representations for Diagnosis
logic is not sufficient for medical diagnosis, due to
our Laziness: it's too hard to list all possible antecedents or
consequents to make the rule have no exceptions
our Theoretical Ignorance: generally, there is no complete
theory of the domain, no complete model
our Practical Ignorance: even if the rules were complete, in
any particular case it's impractical or impossible to do all the
necessary tests, to have all relevant evidence
the example relationship between toothache & cavities is not
a logical consequence in either direction
instead, knowledge of the domain provides a degree of belief
in diagnostic sentences & the way to represent this is with
probability theory
next slide: recall our discussion of ontological & epistemological
commitments from 352
Quantifying Uncertainty
7
Epistemological Commitment
ontological commitment
what a representational language assumes about the nature
of reality - logic & probability theory agree in this, that facts
do or do not hold
epistemological commitment
the possible states of knowledge
for logic, sentences are true/false/unknown
for probability theory, there's a numerical degree of belief in
sentences, between 0 (certainly false) and 1 (certainly true)
Quantifying Uncertainty
The Qualification Problem
8
for a logical representation
the success of a plan can't be inferred because of all the
conditions that could interfere but can't be deduced not to
happen (this is the qualification problem)
probability is a way of dealing with the qualification problem by
numerically summarizing the uncertainty that derives from
laziness &/or ignorance
returning to the toothache & cavity problem
in the real world, the patient either does or does not have a cavity
a probabilistic agent makes statements with respect to the
knowledge state, & these may change as the state of knowledge
changes
for example, an agent initially may believe there's an 80% chance
(probability 0.8) that the patient with the toothache has a cavity,
but subsequently revises that as additional evidence is available
9
Rational Decisions
making choices among plans/actions when the
probabilities of their success differ
this requires additional knowledge of preferences among
outcomes
this is the domain of utility theory: every state has a degree
of utility/usefulness to the agent & the agent will prefer those
with higher utility
utilities are specific to an agent, to the extent that they can even
encompass perverse or altruistic preferences
Quantifying Uncertainty
10
Rational Decisions
making choices among plans/actions when the
probabilities of their success differ
we can combine preferences (utilities) + probabilities to get a
general theory of rational decisions: Decision Theory
a rational agent chooses actions to yield the highest expected
utility averaged over all possible outcomes of the action
this is the Maximum Expected Utility (MEU) principle
expected = average of the possible outcomes of an action weighted
by their probabilities
choice of action = the one with highest expected utility
Quantifying Uncertainty
Revising Belief States
11
belief states
in addition to the possible world states that we included before,
belief states now include probabilities
the agent incorporates probabilistic predictions of action
outcomes, selecting the one with the highest expected utility
AIMA3e chapters 13 through 17 address various aspects of using
probabilistic representations
an algorithmic description of the Decision Theoretic Agent
function DT-AGENT (percept) returns an action
persistent: belief-state, probabilistic beliefs about the current state of the world
action, the agent's action
update belief-state based on action and percept
calculate outcome probabilities for actions,
given action descriptions and current belief state
select action with the highest expected utility,
given probabilities of outcomes and utility information
return action
12
Notation & Basics
we should interpret probabilities as describing
possible worlds and their likelihoods
the sample space is the set of all possible worlds
note that possible worlds are mutually exclusive & exhaustive
for example, a roll of a pair of dice has 36 possible worlds
we use the Greek letter omega to refer to possible worlds
refers to the sample space, to its elements (particular
possible worlds)
a basic axiom for probability theory
(13.1) 0 P() 1,
P() 1
as an example, for the dice rolls, each possible world is a pair
(1, 1), (1, 2), ..., (6, 6)
each with a probability of 1/36, all summing to 1
Quantifying Uncertainty
13
Notation & Basics
assertions & queries in probabilistic reasoning
these are usually about sets of possible worlds
these are termed events in probability theory
for AI, the sets of possible worlds are described by
propositions in a formal language
the set of possible worlds corresponding to a proposition
contains those in which the proposition holds
the probability of the proposition is the sum over those
possible worlds
Quantifying Uncertainty
Propositions
14
propositions
another axiom of probability theory, using the Greek letter phi
() for proposition
(13.2) P()
P()
so for a fair pair of dice P(total = 7) =
P((1+6))+P((6+1))+P((2+5))+P((5+2))+P(((3+4))+P((4+3))
=1/36+1/36+1/36+1/36+1/36+1/36 = 1/6
asserting the probability of a proposition constrains the
underlying probability model without fully determining it
Propositions
15
propositions: unconditional & conditional probabilities
P(total = 7) from the previous slide & similar probabilities are
called unconditional or prior probabilities, sometimes
abbreviated as priors
they indicate the degree of belief in propositions without any other
information, though in most cases, we do have other information,
or evidence
when we have evidence, the probabilities are conditional or
posterior, given the evidence
Conditional Probabilities
calculating conditional probabilities
in terms of unconditional probabilities, for propositions a & b
P (a b)
notation & formula:
P (a | b)
P ( b)
intuitively, observing b excludes the possible worlds where b is
false, so with a total probability P(b), within which the worlds
where a is true satisfy a b and are the fraction P(a b)/P(b)
an alternative formulation of the conditional rule is:
axiom (13.3) P(a b) = P(a | b)P(b)
this is the product rule form
16
Random Variables
17
more terminology & notation
for chapters 13 & 14, propositions for sets of possible worlds
use notation that combines aspects of propositional logic &
constraint satisfaction - a factored representation in which a
possible world is represented as a set of variable + value pairs
for example: Weather = sunny
variables in probability are called random variables
as a convention, their names begin with an UC letter, & each has a
domain of all its possible values
for the Weather example, say {sunny, rain, cloudy, snow}
Random Variables & Values
18
propositions in our probability notation
by convention, the values for random variables use lower case
letters, for example Weather = rain
each random variable has a domain, its set of possible values
for a Boolean random variable the domain is {true, false}
also by convention, A = true is written as simply a, A = false as
¬a
domains also may be arbitrary sets of tokens, like the {red, green,
blue} of the map coloring CSP or {juvenile, teen, adult} for Age
when it's unambiguous, a value by itself may represent the
proposition that a variable has that value
for example, using just sunny for Weather = sunny
Random Variables & Values
19
propositions in our probability notation
more background & notation
domains may be infinite (like the integers)
domains may be continuous (something like temperature)
for ordered domains inequality notation is allowed: val1 < val2
finally, we use propositional logic connectives to combine
elementary propositions: P(cavity | ¬toothache teen) = 0.1
20
Distribution Notation
bold is used as a notational coding
for the probabilities of all possible values of a random variable
we may list the propositions or we may abbreviate, given an
ordering on the domain
as in the ordering (sunny, rain, cloudy, snow) for Weather
then P(Weather) = <0.6, 0.1, 0.29, 0.01>, where bold indicates
there's a vector of values
this defines a probability distribution for the random variable
Weather
we can use a similar shorthand for conditional distributions, for
example:
P(X|Y) lists the values for P(X=xi | Y=yj) for all i,j pairs
Quantifying Uncertainty
Continuous Variables
21
distributions are
the probabilities of all possible values of a random variable
there's alternative notation for continuous variables where there
cannot be an explicit list: instead, express the distribution as a
parameterized function of value
for example, P(NoonTemp=x) = Uniform[18C,26C] (x) specifies a
probability density function (pdf) that defines density function
values for intervals of the NoonTemp variable values
AIMA3e uses the same notation for discrete distributions & density
functions, P, since confusion about what is intended is unlikely
note that while probabilities are unitless, density functions are
measured with a unit, reciprocal degrees in the temperature example
above
Distribution Notation
22
for distributions on multiple variables
we use commas between the variables: so P(Weather, Cavity)
denotes the probabilities of all combinations of values of the 2
variables
for discrete random variables we can use a tabular
representation, in this case yielding a 4x2 table of probabilities
this gives the joint probability distribution of Weather & Cavity
tabulates the probabilities for all combinations
Distribution Notation
for distributions on multiple variables
the notation also allows mixing variables & values
P(sunny, Cavity) is just a 2-vector of probabilities
the distribution notation, P, allows compact expressions
for example, here are the product rules for all possible
combinations of Weather & Cavity
P(Weather, Cavity) = P(Weather | Cavity)P(Cavity)
the distribution notation summarizes what otherwise would be 8
separate equations each of the form
P(W = sunny C = true) = P(W = sunny | C = true)P(C = true)
23
Full Joint Distribution
24
now we fill in some details
of the semantics of the probability of a proposition as the sum
of probabilities for the possible worlds in which it holds
possible worlds are analogous to those in propositional logic
each possible world is specified by an assignment of values to all of
the random variables under consideration
for the random variables Cavity, Toothache & Weather
there are 16 possible worlds (2x2x4) & the value of a given
proposition is determined in the same recursive fashion as for
formulas in propositional logic
Full Joint Distribution
25
semantics of a proposition
the probability model is determined by the joint distribution for
all the random variables: the full joint probability distribution
for the Cavity, Toothache, Weather domain, the notation is:
P(Cavity, Toothache, Weather)
this can be represented as a 2x2x4 table
given the definition of the probability of a proposition as a sum
over possible worlds, the full joint distribution allows calculating
the probability of any proposition over its variables by summing
entries in the FJD
Probability Axioms
26
we can derive
some additional relationships for degrees of belief among
logically related propositions, from axioms 13.1 & 13.2 &
some algebraic manipulation
for example, P(¬a) = 1 - P(a), the relationship between the
probability of a proposition & its negation
and also axiom (13.4) P(a b) = P(a) + P(b) - P(a b)
this axiom for the probability of a disjunction is referred to as the
inclusion-exclusion principle
(13.1) 0 P() 1,
P() 1
(13.4) P(a b) = P(a) + P(b) - P(a b)
together, 13.1 & 13.4 are referred to as Kolmogorov's axioms,
from which the Russian mathematician derived all of probability
theory, including issues related to handling continuous variables
Is Probability the Answer?
27
historically
there's been a debate over whether probabilities are the only
viable mechanism for describing degrees of belief
the degree of belief in a proposition can be reformulated as
betting odds for establishing amounts of wagers on outcomes
of events
deFinetti (1931, 1993) proved that if an agent's set of
degrees of belief are inconsistent with the probability axioms,
then when formulated as bets on outcomes of events, there is
a combinations of bets by an opposing agent that will cause
the agent to lose money every time
Quantifying Uncertainty
Rationality & Probability Axioms
28
apparently then
no rational agent will have beliefs that violate the axioms of
probability
a common rebuttal to this argument is that betting is a poor
metaphor & the agent could just refuse to bet
which itself is countered by pointing out that betting is just a
model for the decision-making that goes on, inevitably, all the
time
other authors have constructed similar arguments to support
those of deFinetti
furthermore, in the "real world", AI reasoning systems based
on probability have been highly successful
Quantifying Uncertainty
29
Don't Mess with the Probability Axioms
from Table 13.2
evidence for the rationality of probability
Agent 1
Agent 2
Outcomes & Payoffs to Agent 1
Proposition
Belief
Bet
Stakes
a, b
a, ¬b
¬a, b
¬a, ¬b
a
0.4
a
4 to 6
-6
-6
4
4
b
0.3
b
3 to 7
-7
3
-7
3
ab
0.8
¬(a b)
2 to 8
2
2
2
-8
-11
-1
-1
-1
Agent 1's inconsistent beliefs allow Agent 2 to set up bets to guarantee Agent 1
loses, independent of the outcome of a and b
so, for example, Agent 1's degree of belief in a is 0.4, so will bet
"against" it & pay 6 to Agent 2 if a is the outcome, receive 4 from
Agent 2 if it is not, and so on
Quantifying Uncertainty
30
Inference With Probability
using the full joint distributions for inference
here's the FJD for the Toothache, Cavity, Catch domain of 3
Boolean variables
as required by the axioms, the probabilities sum to 1.0
when available, the FJD gives a direct means of calculating
the probability of any proposition
just sum the probabilities for all the possible worlds in which the
proposition is true
Quantifying Uncertainty
Full Joint Distribution & Inference
31
an example of using the FJD for inference
to calculate: P(cavity toothache)
cavity toothache holds for 6 possible worlds
the corresponding sum is:
0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064
= 0.28
so P(cavity toothache) = 0.28
Quantifying Uncertainty
Full Joint Distribution & Inference
32
using the full joint distributions for inference
a common task is to state the distribution over a single variable
or a subset of variables: sum over the other variables to get
the unconditional or marginal probability
for example, P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
the terminology for this is: "marginalization" or "summing out"
it takes other variables out of the equation
for sets of variables Y and Z:
P(Y) P(Y, z)
zZ
zZ means to sum over all the possible combinations of values
of the set of variables Z
Full Joint Distribution & Inference
33
using the full joint distributions for inference
a variant considers conditional probabilities instead of joint
probabilities uses the product rule, referred to as conditioning
P(Y) P(Y | z)P(z)
z
the common scenario is to want conditional probabilities of
some variable given evidence about others
use the Product Rule (13.3) P(a | b)=P(a b) / P(b) to get an
expression in terms of unconditional probabilities, then sum
appropriately in the FJD
for example: the probability of a cavity, given evidence of a toothache
P(cavity | toothache) = P(cavity toothache) / P(toothache)
= (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.6
Full Joint Distribution & Inference
34
as a check we might
compute the probability of no cavity, given a toothache
P(cavity | toothache) = P(cavity toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4
as they should, the probabilities sum to 1.0
we note that P(toothache) is the denominator for both, & as
part of the calculation of both values for cavity, can be viewed
as a normalization constant for the distribution
P(Cavity | toothache) terms both have P(toothache) as
denominator ensuring they sum to 1
Quantifying Uncertainty
35
Normalization Constant
note that P(toothache) was the denominator
for calculating both conditional probabilities
it functions as a normalization constant for the distribution
P(Cavity | toothache), ensuring the probabilities add to 1
in AIMA, this constant is denoted by and we use it to mean
a normalizing constant , where probabilities must add to 1
since the sum for the distribution must be 1, we can just sum
the raw values obtained and then use 1/sum for
this may make calculations simpler, and might even allow
them when some probability assessment is not available
Quantifying Uncertainty
36
Normalization Constant
an example of using the normalization constant
P(Cavity | toothache) = P(Cavity, toothache) / P(toothache)
= P(Cavity, toothache)
= [P(Cavity, toothache, catch)+P(Cavity, toothache, catch)]
= [<0.108, 0.016> + <0.012, 0.064>]
= <0.12, 0.08>
= <0.6, 0.4>
since the probabilities must add to 1.0, the calculation can be
done without knowing , just normalizing at the end
Quantifying Uncertainty
37
Generalization of Inference
given a query, the generalized version of the process
for a conditional probability distribution is:
for a single variable X (Cavity in the preceding example), let E
be the list of evidence variables (just Toothache in the
example) and e the list of observed values for them, and Y
the unobserved variables (Catch in the example)
the query: P(X | e) is calculated by summing out over the
unobserved variables
(13.9) P(X | e) = P(X, e) = yP(X, e, y)
Quantifying Uncertainty
38
Inference for Probability
given the full joint distribution & 13.9
we can answer all probability queries for discrete variables
are we left with any any unresolved issues?
well, given n variables, and d as an upper bound on the number
of values then the full joint distribution table size &
corresponding processing of it are O(dn), exponential in n
since n might be 100 or more for real problems, this is often
simply not practical
as a result, the FJD is not the implementation of choice for
real systems, but functions more as the theoretical reference
point (analogous to role of truth tables for propositional logic)
the next sections we look at are foundational for developing
practical systems
Quantifying Uncertainty
Efficiency Through Independence
39
consider a new version of our example domain
now defined in terms of 4 random variables
Toothache, Catch, Cavity, Weather
so P(Toothache, Catch, Cavity, Weather) has a FJD with
2x2x2x4=32 entries
one way to display it would be as four 2x2x2 tables, 1 for
each value of Weather
how are they related?
for example:
P(toothache, catch, cavity, cloudy)
& P(toothache, catch, cavity)
Quantifying Uncertainty
Efficiency Through Independence
40
in the 4-variable domain
what is the relationship between
P(toothache, catch, cavity, cloudy) & P(toothache, catch, cavity)
given what we know about relating probabilities (the product
rule)
P(toothache, catch, cavity, cloudy)
= P(cloudy | toothache, catch, cavity) P(toothache,catch,cavity)
but we "know" that dental problems don't influence the weather
& we know weather doesn't seem to influence dental variables
so
P(cloudy | toothache, catch, cavity) = P(cloudy)
P(toothache, catch, cavity, cloudy) = P(cloudy) P(toothache, catch, cavity)
& similarly for each entry in P(Toothache, Catch, Cavity, Cloudy)
thus the 32 element table for 4 variables reduces to an 8
element table & a 4 element table
Quantifying Uncertainty
41
Independence
the property of independence
or marginal independence or absolute independence
notationally, in terms of propositions or random variables, is:
P(a|b) = P(a) or P(b|a) = P(b) or P(a b) = P(a) P(b)
P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A) P(B)
from our knowledge of the domain, we can simplify the full
joint distribution, dividing variables into independent subsets
with separate distributions
as an example, for the Dentistry-Weather domain
Quantifying Uncertainty
42
Independence
absolute independence
while very powerful for simplifying probability representation
& inference absolute independence is unfortunately rare
though, for example, for n independent coin tosses
P(C1, …, Cn), the full joint distribution with 2n entries becomes n
single variable distributions P(Ci)
and while
this is an artificial example and the converse is more likely the
case for real domains
that is, within a large domain like dentistry there are likely
dozens of diseases & hundreds of symptoms, all interrelated
Quantifying Uncertainty
43
Bayes' Rule
from the Product Rule, for propositions a & b
P(a b)= P(a | b) P(b), or alternatively
= P(b | a) P(a)
we can derive Bayes' Rule for conditional probabilities
equate the alternative RHSs & divide by P(b) to yield Bayes' rule
P(a | b) = P(b | a) P(a) / P(b)
in the general case of multivalued variables, in distribution form
P(Y|X) = P(X|Y) P(Y) / P(X)
representing the set of equations, each for specific values of the
variables
& finally, a version indicating conditionalizing on background
evidence e
P(Y | X, e) = P(X | Y, e) P(Y | e) / P(X | e)
Quantifying Uncertainty
44
Bayes' Rule
Bayes' rule
is the basis of most AI systems of probabilistic inference
we are often able to estimate the 3 RHS probabilities & so
compute the LHS
finding diagnostic probability from causal probability
P(effect|cause) specifies relationship in causal direction
P(cause|effect) describes diagnostic direction
P(cause|effect) = P(effect|cause) P(cause) / P(effect)
in the medical domain, it is common to have conditional
probabilities on causal relationships
P(symptoms | disease)
Quantifying Uncertainty
45
Bayes' Rule
Bayes' rule: a medical example
P(cause | effect) = P(effect | cause) P(cause) / P(effect)
here's a medical domain example
a patient presents with a stiff neck, a known symptom of the
disease meningitis
the physician "knows" the prior probabilities of stiff neck (P(s) =
0.01) & meningitis (P(m) = 0.00002)
in addition the physician knows that 70% of patients with
meningitis have a stiff neck: P(s|m) = 0.7
P(m|s) = P(s|m) P(m) / P(s)
= 0.7 × 0.00002 / 0.01
= 0.0014
Quantifying Uncertainty
46
Bayes' Rule Example
Bayes' rule & the meningitis example
P(m|s) = P(s|m) P(m) / P(s)
= 0.7 × 0.00002 / 0.01
= 0.0014
so, we should expect only 1 in 700 patients with a stiff neck
to have meningitis, reflecting the much higher prior
probability of stiff neck than of meningitis
note: normalization can be applied when using Bayes' Rule
P(Y|X) = <P(X|Y)P(Y)>
where is a normalization constant so entries in P(Y|X) sum to 1
Quantifying Uncertainty
Bayes' Rule: n Evidence Variables
47
Bayes' rule & the dental diagnosis: scaling up
for the combining of evidence from multiple
sources/variables, how does use of Bayes' Rule scale up,
compared to using the FJD?
the sample problem:
what does the dentist conclude about a cavity when the
patient has a toothache & the probe catches in the sore tooth
(13.16)
P(Cavity | toothache catch) = P(toothache catch | Cavity) P(Cavity)
there's not an issue with just 2 sources, but if there are n,
then we have 2n possible combinations of observed values &
we need to know the conditional probabilities for each (no
better than needing the full joint distribution)
Quantifying Uncertainty
Bayes' Rule: n Evidence Variables
48
Bayes' rule & the dental diagnosis: scaling up
we return to the idea of independence
in the example, Toothache & Catch are not absolutely
independent, but are independent given either the presence or
absence of a cavity (each is caused by the cavity but otherwise
they are independent)
expressing the conditional independence given Cavity we get
(13.17)
P(toothache catch | Cavity) = P(toothache | Cavity) P(catch | Cavity)
(13.16)
P(Cavity | toothache catch) = P(toothache catch | Cavity) P(Cavity)
substituting into 13.16 yields the following, reflecting the
conditional independence of Toothache and Catch
P(Cavity | toothache catch) = P(toothache | Cavity) P(catch | Cavity) P(Cavity)
Quantifying Uncertainty
Conditional Independence
49
the general form of the conditional independence rule
here are the most general & for the dental diagnosis domain
P(X,Y|Z) = P(X|Z) P(Y|Z)
(13.19) P(Toothache, Catch|Cavity) = P(Toothache|Cavity) P(Catch|Cavity)
conditional independence also allows decomposition
for the dental problem, algebraically, given 13.19, we have
P(Toothache, Catch, Cavity)
= P(Toothache, Catch |Cavity) P(Cavity) (product rule)
= P(Toothache|Cavity) P(Catch|Cavity) P(Cavity) (13.19)
Conditional Independence
50
implications of the conditional independence rule
P(Toothache, Catch, Cavity)
= P(Toothache, Catch |Cavity) P(Cavity) (product rule)
= P(Toothache|Cavity) P(Catch|Cavity) P(Cavity) (3.19)
we decompose the original large table, which has 23 – 1 = 7
independent entries, into 3 smaller tables
2 of the tables are of the form P(T|C) with 2 rows, each of which
must sum to 1 so has 1 independent number
1 table with 1 row for the prior distribution P(C) so having 1 more
independent number
for our Toothache, Catch, Cavity domain, we've gone from 7 to
5 independent values in total, a small gain for a small problem
but if there were n symptoms, all conditionally independent given
Cavity, the size of the resulting representation would be linear in n
instead of exponential
51
Conditional Independence
summary: conditional independence
allows scaling up to real problems since the representational
complexity can go from exponential to linear
is more often applicable than absolute independence
assertions
yields this net gain: the decomposition of large domains into
weakly connected subsets
is illustrated in a prototypical way by the dental domain: one
cause influences multiple effects, which are conditionally
independent, given that cause
Quantifying Uncertainty
52
Conditional Independence
summary: conditional independence
with multiple effects, which are conditionally independent,
given the cause, the full joint distribution then is rewritten as
P(Cause, Effect1, ..., Effectn) = P(Cause) i P(Effecti | Cause)
this is called the naïve Bayes model
it makes the simplifying assumption that all effects are
conditionally independent
it is naïve in that it is applied to many problems although the
effect variables are not precisely conditionally independent given
the cause variable
nevertheless, such systems often work well in practice
Quantifying Uncertainty
53
A Return to Wumpus World
recall the Wumpus World agent
the agent explores the grid world to grab the gold while
attempting to avoid being eaten by the Wumpus or falling into
a bottomless Pit
we used propositional logic for representation & inference
now we'll explore an example
that uses probability in Wumpus World
we'll simplify by restricting our WW hazards only to Pits
recall that
1. the percept of a breeze in a square indicates a pit in a
neighbouring square
2. the logical representation allowed some conclusions about
whether a square was safe but not a quantitative measure of risk
if not absolutely safe
the "is it safe" problem can be reformulated to use our new
probability tools
Quantifying Uncertainty
54
Wumpus World Revisited
the world
incomplete information about the presence of Pits leads to
uncertainty, & the agent should choose the best next move
here are the Random Variables in the problem
one per square, Pij = true iff [i,j] contains a pit
one per observed square, Bij = true iff [i,j] is breezy
the agent has visited only [1,1], [1,2], [2,1]
so we include only B1,1, B1,2, B2,1 in the probability model
Quantifying Uncertainty
Probabilities in Wumpus World
55
we begin with the full joint distribution
P(P1,1, ...,P4,4, B1,1, B1,2, B2,1)
applying the product rule yields
P(B1,1, B1,2, B2,1 | P1,1, ..., P4,4)P(P1,1, ..., P4,4)
1st term: the conditional probability of a breeze configuration
given a pit configuration (P(Effect | Cause))
Bi,j values in the first term are 1 if adjacent to a pit, 0 otherwise
2nd term: the prior probability of a pit configuration
pits are placed randomly, independent of each other, with
probability 0.2 for any square, so
4, 4
(13.20) P(P1,1, ..., P4,4) =
P(P
i , j1,1
i, j
)
which, for a particular configuration that has n pits is
P(P1,1, ..., P4,4) = 0.2n x 0.816-n
Quantifying Uncertainty
Probabilities in Wumpus World
in the example, we have
observed evidence
56
the example
a breeze or not in each
visited square + no pit in any
visited square, abbreviated
as b & known:
b = ¬b1,1 b1,2 b2,1
known = ¬p1,1 ¬p1,2 ¬p2,1
an example query concerns
the safety of other squares:
what's the probability of a pit
at [1,3], given the evidence
so far?
P(P1,3 |known, b)
we could answer by summing
over cells in the FJD
Quantifying Uncertainty
Probabilities in Wumpus World
57
to use summation over the FJD
let Unknown be the set of Pi,j variables for squares other than
Known & [1,3]
so from 13.9 we have
P(P1,3 |known, b) = unknownP(P1,3, unknown, known, b)
that is, we can just sum over the entries in the Full Joint
Distribution but with 12 unknown squares we have 212 terms in
the summation, so the calculation is exponential in the number
of squares
so we'll need to simplify from insight about independence
we note: not all unknown squares are equally relevant to the query
Probabilities in Wumpus World
since summations over the FJD are exponential
we need to simplify, given insight about independence
to begin, we note that not all unknown squares are equally
relevant to the query
first, some terminology about partitioning the pit variables
frontier are those pit variables (besides the query variable)
neighbouring the visited squares
other are the remaining pit variables
with this revision, we see that the observed breezes are
conditionally independent of the other variables, given the
known, frontier & query variables
58
Probabilities in Wumpus World
59
using conditional independence
P(b|P1,3, Known, Unknown) = P(b|P1,3, Known, Frontier)
note that the figures use Fringe, while the text uses Frontier to name
the relevant squares neighbouring the visited squares ([2,2] & [3,1])
then we'll need to manipulate our query into a form where we
can use this
the query: P(P1,3 |known, b) = unknownP(P1,3, unknown, known, b)
the world:
Using Conditional Independence
using the conditional independence simplification
P(P1,3 |known, b) = unknownP(P1,3, known, b, unknown) (the query, from 13.9)
then by the product rule
= unknownP(b | P1,3, known, unknown) P(P1,3, known, unknown)
then partitioning unknown into frontier & other
= frontierotherP(b | known, P1,3, frontier, other) P(P1,3, known, frontier, other)
then using the conditional independence of b from other
given known, P1,3 & frontier (& so dropping other from first term)
= frontierotherP(b | known, P1,3, frontier) P(P1,3, known, frontier, other)
since the 1st term now does not depend on other, move the
summation inward
= frontierP(b | known, P1,3, frontier) otherP(P1,3, known, frontier, other)
60
Using Conditional Independence
manipulating the query to get efficient computation
we began with
P(P1,3 |known, b) = unknownP(P1,3, unknown, known, b) (the query, from 13.9)
so far we have
= frontierP(b | known, P1,3, frontier) otherP(P1,3, known, frontier, other)
use independence as in 13.20 to factor the prior term
= frontierP(b | known, P1,3, frontier) otherP(P1,3) P(known) P(frontier) P(other)
then reorder terms
= P(known) P(P1,3) frontierP(b | known, P1,3, frontier) P(frontier) other P(other)
fold P(known) into the normalizing constant
& use other P(other) = 1
= P(P1,3) frontierP(b | known, P1,3, frontier) P(frontier)
61
Probabilities in Wumpus World
62
using conditional independence & independence
has yielded an expression with just 4 terms in the summation
over the frontier variables P2,2 & P3,1 eliminating other squares
P(P1,3 |known, b) = P(P1,3) frontierP(b | known, P1,3, frontier) P(frontier)
the expression P(b | known, P1,3, frontier) is 1 when the
frontier is consistent with the breeze observations, 0 otherwise
so to get each value of P1,3 we sum over the logical models for
frontier variables that are consistent with known facts
this figure shows the models & the associated priors P(frontier)
Probabilities in Wumpus World
sum over the logical models for frontier variables that
are consistent with known facts
P(P1,3 |known, b) = P(P1,3) frontier P(frontier)
P(P1,3 |known, b) = ´<0.2(0.04 + 0.16 + 0.16), 0.8(0.04 + 0.16)>
<0.31, 0.69>
consistent models for frontier variables P2,2 & P3,1with P(frontier)
for each model, for P1,3 = true & P1,3 = false
63
Using Conditional Independence
note that P1,3, P3,1 are symmetric
so by symmetry, [3,1] would contain a pit about 31% of the
time: P(P3,1 |known, b) = <0.31, 0.69>
& by a similar calculation, [2,2] can be shown to contain a pit
with about 0.86 probability: P(P2,2|known, b) <0.86, 0.14>
it is clear to the probabilistic agent where not to go next
64
Probabilities in Wumpus World
65
the logical agent & the probabilistic agent
strictly logical inferencing can only yield known safe/known
unsafe/unknown
the probabilistic agent knows which move is relatively safer,
relatively more dangerous
for efficient probabilistic solutions we can use independence &
conditional independence among variables to simplify the
summations involved
fortunately, these often match our natural understanding of how
the problem should be decomposed
our next topic considers
formal representations for these relationships
algorithms to operate on them to do efficient probabilistic
inferencing
Quantifying Uncertainty
66
Summary
uncertainty
is due to laziness &/or ignorance, and is unavoidable under
nondeterminism &/or partial observability
probabilities
describe an agent's inability to decide on the truth of a
sentence, summarizing belief relative to evidence
decision theory
combines beliefs & preferences, defining an optimal action as
one that maximizes expected utility
statements in probability
involve prior & conditional probabilities over propositions
axioms of probability
constrain the probabilities of propositions such that an agent
that ignores them behaves irrationally
Quantifying Uncertainty
67
Summary
the Full Joint Probability Distribution
specifies the probability for every assignment of values to
random variables, & when available allows summation over
possible worlds to answer queries, but has complexity
exponential in the number of variables
absolute independence
allows decomposition of a problem's random variables into
smaller joint distributions, reducing complexity, but is rare
Bayes' rule
allows computing probabilities typically of a cause, given an
effect, from known conditional probabilities, but does not
scale when there are many evidence variables
conditional independence
derives from shared causal relationships in the domain & may
allow factoring of the FJD into smaller conditional distributions
Quantifying Uncertainty
68
Summary
a naïve Bayes model
assumes conditional independence of all effects variables with
a single cause variable so complexity grows linearly with the
number of effects
in Wumpus World
by simplifying calculations via conditional independence the
agent may calculate probabilities for unobserved variables
and so do better than a purely logical agent
Quantifying Uncertainty