CMSC 671
Fall 2001
Class #20 – Thursday, November 8
Today’s class
• Conditional independence
• Bayesian networks
– Network structure
– Conditional probability tables
– Conditional independence
– Inference in Bayesian networks
Conditional Independence and Bayesian Networks
Chapters 13/14 (2/e)
Conditional independence
• Last time we talked about absolute independence:
– A and B are independent if P(A ∧ B) = P(A) P(B); equivalently,
P(A) = P(A | B) and P(B) = P(B | A)
• A and B are conditionally independent given C if
– P(A ∧ B | C) = P(A | C) P(B | C)
• This lets us decompose the joint distribution:
– P(A ∧ B ∧ C) = P(A | C) P(B | C) P(C)
• In the example from last time, Moon-Phase and Burglary
are conditionally independent given Light-Level
• Conditional independence is weaker than absolute
independence, but still useful in decomposing the full joint
probability distribution
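To make the decomposition concrete, here is a minimal Python sketch with made-up probabilities (not from the lecture): it builds the joint from P(A|C), P(B|C), and P(C) under the conditional-independence assumption and checks that the eight entries sum to one.

# Hypothetical probabilities, chosen only to illustrate
# P(A ∧ B ∧ C) = P(A | C) P(B | C) P(C).
P_C = {True: 0.3, False: 0.7}            # P(C)
P_A_given_C = {True: 0.8, False: 0.1}    # P(A = true | C)
P_B_given_C = {True: 0.6, False: 0.05}   # P(B = true | C)

def joint(a, b, c):
    """Joint probability, assuming A and B are conditionally independent given C."""
    pa = P_A_given_C[c] if a else 1 - P_A_given_C[c]
    pb = P_B_given_C[c] if b else 1 - P_B_given_C[c]
    return pa * pb * P_C[c]

total = sum(joint(a, b, c)
            for a in (True, False)
            for b in (True, False)
            for c in (True, False))
print(total)  # 1.0, so the factored form is a valid joint distribution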
Bayesian Belief Networks (BNs)
• Definition: BN = (DAG, CPD)
– DAG: directed acyclic graph (BN’s structure)
• Nodes: random variables (typically binary or discrete, but
methods also exist to handle continuous variables)
• Arcs: indicate probabilistic dependencies between nodes
(lack of link signifies conditional independence)
– CPD: conditional probability distribution (BN’s parameters)
• Conditional probabilities at each node, usually stored as a table
(conditional probability table, or CPT)
P(xi | πi), where πi is the set of all parent nodes of xi
– Root nodes are a special case – no parents, so just use priors
in the CPD:
πi = ∅, so P(xi | πi) = P(xi)
Example BN
Structure: a → b, a → c; b, c → d; c → e
P(A) = 0.001
P(B|A) = 0.3      P(B|~A) = 0.001
P(C|A) = 0.2      P(C|~A) = 0.005
P(D|B,C) = 0.1    P(D|B,~C) = 0.01    P(D|~B,C) = 0.01    P(D|~B,~C) = 0.00001
P(E|C) = 0.4      P(E|~C) = 0.002
Note that we only specify P(A) etc., not P(¬A), since they have to add to one
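One plausible way to write this network down in code is a parent list plus a CPT per node; the following Python sketch (the dict layout and the helper name p_node are illustrative choices, not course code) is reused in later examples.

# Example BN from the slide: a -> b, a -> c; b, c -> d; c -> e.
# Each CPT maps a tuple of parent values to P(node = true | parents).
parents = {'a': (), 'b': ('a',), 'c': ('a',), 'd': ('b', 'c'), 'e': ('c',)}
cpt = {
    'a': {(): 0.001},
    'b': {(True,): 0.3, (False,): 0.001},
    'c': {(True,): 0.2, (False,): 0.005},
    'd': {(True, True): 0.1, (True, False): 0.01,
          (False, True): 0.01, (False, False): 0.00001},
    'e': {(True,): 0.4, (False,): 0.002},
}

def p_node(node, value, assignment):
    """P(node = value | parents), with parent values looked up in `assignment`."""
    key = tuple(assignment[p] for p in parents[node])
    p_true = cpt[node][key]
    return p_true if value else 1 - p_true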
Topological semantics
• A node is conditionally independent of its nondescendants given its parents
• A node is conditionally independent of all other nodes in
the network given its parents, children, and children’s
parents (also known as its Markov blanket)
• The method called d-separation can be applied to decide
whether a set of nodes X is independent of another set Y,
given a third set Z
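Using the parent lists from the encoding sketched under the example network, the Markov-blanket definition turns into a few set operations; this is only an illustrative sketch.

def markov_blanket(node, parents):
    """Parents, children, and the children's other parents of `node`,
    given a dict mapping each node to a tuple of its parents."""
    children = {n for n, ps in parents.items() if node in ps}
    co_parents = {p for child in children for p in parents[child]} - {node}
    return set(parents[node]) | children | co_parents

# For the example network: markov_blanket('c', parents) == {'a', 'b', 'd', 'e'}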
Independence and chaining
• Independence assumption
– P(xi | πi, q) = P(xi | πi), where q is any set of variables (nodes) other than xi and its successors
– πi blocks the influence of other nodes on xi and its successors (q influences xi only through variables in πi)
– With this assumption, the complete joint probability distribution of all variables in the network can be represented by (recovered from) the local CPDs by chaining these CPDs:
P(x1, ..., xn) = ∏i=1..n P(xi | πi)
Chaining: Example
(Same network as the example above: a → b, a → c; b, c → d; c → e)
Computing the joint probability for all variables is easy:
P(a, b, c, d, e)
= P(e | a, b, c, d) P(a, b, c, d)   (product rule)
= P(e | c) P(a, b, c, d)   (by the independence assumption)
= P(e | c) P(d | a, b, c) P(a, b, c)
= P(e | c) P(d | b, c) P(c | a, b) P(a, b)
= P(e | c) P(d | b, c) P(c | a) P(b | a) P(a)
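Numerically, the same chain of CPT lookups can be evaluated with the hypothetical p_node helper and CPTs sketched earlier:

def joint_prob(assignment):
    """P(x1, ..., xn) as the product of P(xi | parents(xi)) over every node."""
    result = 1.0
    for node, value in assignment.items():
        result *= p_node(node, value, assignment)
    return result

# P(a, b, c, d, e) with every variable true:
print(joint_prob({'a': True, 'b': True, 'c': True, 'd': True, 'e': True}))
# = P(a) P(b|a) P(c|a) P(d|b,c) P(e|c) = 0.001 * 0.3 * 0.2 * 0.1 * 0.4 = 2.4e-06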
Direct inference with BNs
• Now suppose we just want the probability for one variable
• Belief update method
• Original belief (no variables are instantiated): use the prior
probability P(xi)
• If xi is a root, then P(xi) is given directly in the BN (CPT at
Xi)
• Otherwise,
– P(xi) = Σ πi P(xi | πi) P(πi)
• In this equation, P(xi | πi) is given in the CPT, but computing
P(πi) is complicated
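For a node with a single parent this sum has just two terms; e.g., for b in the example network (a worked instance, not from the slides):

# P(b) = P(b | a) P(a) + P(b | ~a) P(~a), using the example CPTs
p_a = 0.001
p_b = 0.3 * p_a + 0.001 * (1 - p_a)
print(p_b)  # 0.001299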
Computing πi: Example
(Same network as the example above: a → b, a → c; b, c → d; c → e)
• P(d) = Σb,c P(d | b, c) P(b, c)
• P(b, c) = P(a, b, c) + P(¬a, b, c)   (marginalizing over a)
  = P(b | a, c) P(a, c) + P(b | ¬a, c) P(¬a, c)   (product rule)
  = P(b | a) P(c | a) P(a) + P(b | ¬a) P(c | ¬a) P(¬a)   (b and c are conditionally independent given a)
• If some variables are instantiated, we can “plug that in” and reduce the amount of marginalization
• Still have to marginalize over all values of uninstantiated parents – not computationally feasible with large networks
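In code, this brute-force marginalization over the example network looks as follows (reusing the hypothetical joint_prob and parents from the earlier sketches); its cost grows exponentially with the number of uninstantiated variables.

from itertools import product

def marginal(node, value):
    """P(node = value), by summing the full joint over all other variables."""
    others = [n for n in parents if n != node]
    total = 0.0
    for values in product((True, False), repeat=len(others)):
        assignment = dict(zip(others, values))
        assignment[node] = value
        total += joint_prob(assignment)
    return total

print(marginal('d', True))  # P(d), marginalizing over a, b, c, e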
Representational extensions
• Compactly representing CPTs
– Noisy-OR
– Noisy-MAX
• Adding continuous variables
– Discretization
– Use density functions (usually mixtures of Gaussians) to build
hybrid Bayesian networks (with discrete and continuous variables)
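As a sketch of the first item, a noisy-OR node needs only one "causal strength" per parent instead of a full table; the parameter values below are made up for illustration.

def noisy_or(active_parents, cause_probs, leak=0.0):
    """P(effect = true) under a noisy-OR model: each active parent independently
    fails to cause the effect with probability 1 - cause_probs[parent]."""
    p_all_fail = 1.0 - leak
    for parent in active_parents:
        p_all_fail *= 1.0 - cause_probs[parent]
    return 1.0 - p_all_fail

# Two of three possible causes present:
print(noisy_or({'cold', 'flu'}, {'cold': 0.4, 'flu': 0.8, 'malaria': 0.9}))  # 0.88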
Inference tasks
• Simple queries: Compute the posterior marginal P(Xi | E=e)
– E.g., P(NoGas | Gauge=empty, Lights=on, Starts=false)
• Conjunctive queries:
– P(Xi, Xj | E=e) = P(Xi | E=e) P(Xj | Xi, E=e)
• Optimal decisions: Decision networks include utility
information; probabilistic inference is required to find
P(outcome | action, evidence)
• Value of information: Which evidence should we seek next?
• Sensitivity analysis: Which probability values are most
critical?
• Explanation: Why do I need a new starter motor?
Approaches to inference
• Exact inference
– Enumeration
– Variable elimination
– Clustering / join tree algorithms
• Approximate inference
– Stochastic simulation / sampling methods
– Markov chain Monte Carlo methods
– Genetic algorithms
– Neural networks
– Simulated annealing
– Mean field theory
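As one concrete instance of the sampling methods above, here is a forward-sampling estimate of P(d) for the example network (a rough sketch reusing the hypothetical p_node and parents from earlier; not code from the course).

import random

def sample_network():
    """Draw one complete assignment by sampling each node after its parents
    (the keys of `parents` are already in topological order)."""
    assignment = {}
    for node in parents:
        assignment[node] = random.random() < p_node(node, True, assignment)
    return assignment

samples = [sample_network() for _ in range(100_000)]
print(sum(s['d'] for s in samples) / len(samples))  # approximates P(d)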