Transcript Slide 1

Intro to Junction Tree
propagation and
adaptations for a Distributed
Environment
Thor Whalen
Metron, Inc.
This naive approach of updating the
network inherits oscillation problems!
[ Figure: a loopy network over nodes a, b, c, d; messages 1–8 circulate and conflict at two nodes. ]
Idea behind the Junction Tree
Algorithm
[ Figure: a clever algorithm transforms the Bayesian network over a–h into a tree whose nodes are clusters of variables (abd, ace, ade, def, ceg, egh) linked by separators (ad, ae, ce, de, eg). ]
Bayesian Network
• one-dim. random variables
• conditional probabilities
Secondary Structure/
Junction Tree
• multi-dim. random variables
• joint probabilities (potentials)
Variable Elimination
(General Idea)
• Write the query in the form
P(X_1) = Σ_{X_k} … Σ_{X_3} Σ_{X_2} Π_i P(X_i | par(X_i))
• Iteratively
– Move all irrelevant terms outside of innermost sum
– Perform innermost sum, getting a new term
– Insert the new term into the product
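As a sketch of the idea (not from the slides), the "move irrelevant terms out, perform the innermost sum" recipe can be checked numerically on a made-up chain a → b → c with random CPTs; the variable names and state counts here are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical chain a -> b -> c with made-up CPTs.
P_a = rng.dirichlet(np.ones(2))            # P(a), shape (2,)
P_b_a = rng.dirichlet(np.ones(3), size=2)  # P(b|a), shape (2, 3)
P_c_b = rng.dirichlet(np.ones(2), size=3)  # P(c|b), shape (3, 2)

# Naive: build the full joint, then sum out a and b.
joint = P_a[:, None, None] * P_b_a[:, :, None] * P_c_b[None, :, :]
P_c_naive = joint.sum(axis=(0, 1))

# Variable elimination: move irrelevant terms outside each sum.
f_a = (P_a[:, None] * P_b_a).sum(axis=0)   # innermost sum: f_a(b) = Σ_a P(a)P(b|a)
P_c = (f_a[:, None] * P_c_b).sum(axis=0)   # next sum over b, giving P(c)

assert np.allclose(P_c, P_c_naive)
```

Both routes agree, but elimination never materializes the full joint table.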
Example of Variable Elimination
• The “Asia” network:
[ Figure: the "Asia" network. Nodes: Visit to Asia (V), Smoking (S), Tuberculosis (T), Lung Cancer (L), Bronchitis (B), Abnormality in Chest (A), X-Ray (X), Dyspnea (D). ]

We are interested in P(d)
– Need to eliminate: v, s, x, t, l, a, b

Initial factors:
P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
Brute force:
P(d) = Σ_v Σ_s Σ_x Σ_t Σ_l Σ_a Σ_b P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
Eliminate variables in order: v → s → x → t → l → a → b

Initial factors:
P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate v:   f_v(t) = Σ_v P(v) P(t|v)
[ Note: f_v(t) = P(t). In general, the result of elimination is not necessarily a probability term ]
⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
Eliminate variables in order: v → s → x → t → l → a → b

Initial factors:
P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate s:   f_s(b,l) = Σ_s P(s) P(b|s) P(l|s)
[ Note: the result of elimination may be a function of several variables ]
⇒ f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|a,b)
Eliminate variables in order: v → s → x → t → l → a → b

Initial factors:
P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|a,b)

Eliminate x:   f_x(a) = Σ_x P(x|a)
[ Note: f_x(a) = 1 for all values of a ]
⇒ f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|a,b)
Eliminate variables in order: v → s → x → t → l → a → b

Initial factors:
P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|a,b)

Eliminate t:   f_t(a,l) = Σ_t f_v(t) P(a|t,l)
⇒ f_s(b,l) f_x(a) f_t(a,l) P(d|a,b)
Eliminate variables in order: v → s → x → t → l → a → b

Initial factors:
P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|a,b)
⇒ f_s(b,l) f_x(a) f_t(a,l) P(d|a,b)

Eliminate l:   f_l(a,b) = Σ_l f_s(b,l) f_t(a,l)
⇒ f_l(a,b) f_x(a) P(d|a,b)
Eliminate variables in order: v → s → x → t → l → a → b

Initial factors:
P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|a,b)
⇒ f_s(b,l) f_x(a) f_t(a,l) P(d|a,b)
⇒ f_l(a,b) f_x(a) P(d|a,b)

Eliminate a:   f_a(b,d) = Σ_a f_l(a,b) f_x(a) P(d|a,b)
⇒ f_a(b,d)
Eliminate variables in order: v → s → x → t → l → a → b

Initial factors:
P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|a,b)
⇒ f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|a,b)
⇒ f_s(b,l) f_x(a) f_t(a,l) P(d|a,b)
⇒ f_l(a,b) f_x(a) P(d|a,b)
⇒ f_a(b,d)

Eliminate b:   f_b(d) = Σ_b f_a(b,d)
⇒ f_b(d)
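The whole elimination run above can be reproduced with a small factor library. The slides give no numeric CPTs, so the binary tables below are made up; only the network structure and the elimination order v, s, x, t, l, a, b match the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

def cpt(parents):
    """Random binary CPT; axes ordered parents + (child,)."""
    t = rng.random((2,) * len(parents) + (2,))
    return t / t.sum(axis=-1, keepdims=True)

# factor = (tuple of variable names, ndarray with one binary axis per name)
factors = [
    (('v',), cpt(())),                 # P(v)
    (('s',), cpt(())),                 # P(s)
    (('v', 't'), cpt(('v',))),         # P(t|v)
    (('s', 'l'), cpt(('s',))),         # P(l|s)
    (('s', 'b'), cpt(('s',))),         # P(b|s)
    (('t', 'l', 'a'), cpt(('t', 'l'))),  # P(a|t,l)
    (('a', 'x'), cpt(('a',))),         # P(x|a)
    (('a', 'b', 'd'), cpt(('a', 'b'))),  # P(d|a,b)
]

def multiply(f1, f2):
    v1, t1 = f1
    v2, t2 = f2
    out_vars = tuple(dict.fromkeys(v1 + v2))  # order-preserving union
    def expand(vs, t):
        # add singleton axes for missing vars, then align to out_vars
        missing = [v for v in out_vars if v not in vs]
        t = t.reshape(t.shape + (1,) * len(missing))
        cur = vs + tuple(missing)
        return np.transpose(t, [cur.index(v) for v in out_vars])
    return out_vars, expand(v1, t1) * expand(v2, t2)

def eliminate(factors, var):
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    prod = touching[0]
    for f in touching[1:]:
        prod = multiply(prod, f)
    vs, t = prod
    return rest + [(tuple(v for v in vs if v != var), t.sum(axis=vs.index(var)))]

for var in ['v', 's', 'x', 't', 'l', 'a', 'b']:
    factors = eliminate(factors, var)

# Only f_b(d) remains, and it is the marginal P(d).
(vars_, P_d), = factors
assert vars_ == ('d',)
assert np.isclose(P_d.sum(), 1.0)
```

Each call to `eliminate` produces exactly the intermediate factors f_v(t), f_s(b,l), f_x(a), f_t(a,l), f_l(a,b), f_a(b,d), f_b(d) of the walkthrough.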
Intermediate factors
In our previous example (order v, s, x, t, l, a, b):
f_v(t), f_s(b,l), f_x(a), f_t(a,l), f_l(a,b), f_a(b,d), f_b(d)

With a different ordering (a, b, x, t, v, s, l):
g_a(l,t,d,b,x), g_b(l,t,d,x,s), g_x(l,t,d,s), g_t(l,t,s,v), g_v(l,d,s), g_s(l,d), g_l(d)

Complexity is exponential in the size of these factors!
Notes about variable elimination
• Actual computation is done in the
elimination steps
• Computation depends on the order of
elimination
• For each query we need to compute
everything again!
– Many redundant calculations
Junction Trees
• The junction tree algorithm “generalizes”
Variable Elimination to avoid redundant
calculations
• The JT algorithm compiles a class of
elimination orders into a data structure that
supports the computation of all possible
queries.
Building a Junction Tree
DAG → Moral Graph → Triangulated Graph → Identifying Cliques → Junction Tree
Step 1: Moralization

[ Figure: the DAG over a–h, the parent-marrying step, and the resulting moral graph GM. ]

G = (V, E)
1. For all w ∈ V: for all u, v ∈ pa(w), add an edge e = u–v.
2. Undirect all edges.
The result is the moral graph GM.
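A minimal sketch of the two moralization steps, assuming the a–h network used throughout these slides (the edge list below is inferred from the figures and may differ in detail):

```python
from itertools import combinations

def moralize(parents):
    """Moralize a DAG given as {node: parents}: marry each node's
    parents, then drop edge directions. Returns an undirected edge set."""
    edges = set()
    for w, pa in parents.items():
        for u in pa:                      # undirect each arc u -> w
            edges.add(frozenset((u, w)))
        for u, v in combinations(pa, 2):  # marry the parents of w
            edges.add(frozenset((u, v)))
    return edges

# Assumed DAG for the slides' a-h example.
parents = {'a': [], 'b': ['a'], 'c': ['a'], 'd': ['b'],
           'e': ['c'], 'f': ['d', 'e'], 'g': ['c'], 'h': ['e', 'g']}
GM = moralize(parents)

# Marrying the parents of f and of h adds d-e and e-g.
assert frozenset('de') in GM and frozenset('eg') in GM
```

The nine original arcs plus the two marriage edges give the eleven-edge moral graph.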
Step 2: Triangulation

[ Figure: the moral graph GM and the triangulated graph GT. ]

Add edges to GM such that there is no cycle of length ≥ 4 that does not contain a chord.
[ Figure: a chordless 4-cycle (NO) vs. the same cycle with a chord (YES). ]
Step 2: Triangulation (cont.)
• Each elimination ordering triangulates the graph, not necessarily in the same way:

[ Figure: the same graph over A–H triangulated under several different elimination orderings, each producing a different set of fill-in edges. ]
Step 2: Triangulation (cont.)
• Intuitively, triangulations with as few fill-ins as
possible are preferred
– Leaves us with small cliques (small probability tables)
• A common heuristic:
Repeat until no nodes remain:
– Find the node whose elimination would require the
least number of fill-ins (may be zero).
– Eliminate that node, and note the need for a fill-in
edge between any two non-adjacent neighbors.
• Add the fill-in edges to the original graph.
Eliminate the vertex that requires the least number of edges to be added.

[ Figure: the moral graph GM over a–h being eliminated step by step, yielding the triangulated graph GT. ]

step  vertex removed  induced clique  added edges
1     h               egh             –
2     g               ceg             –
3     f               def             –
4     c               ace             a–e
5     b               abd             a–d
6     d               ade             –
7     e               ae              –
8     a               a               –
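The min-fill heuristic can be sketched as below, run here on the moral graph of the a–h example (edge list inferred from the figures). Note that ties are broken arbitrarily, so a greedy run may pick a different, equally valid triangulation than the one tabulated in the slides:

```python
from itertools import combinations

def min_fill_triangulate(nodes, edges):
    """Greedy min-fill: repeatedly eliminate the node needing the fewest
    fill-in edges; return the fill-ins and the induced elimination cliques."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    fill_ins, cliques = set(), []
    remaining = set(nodes)

    def fills(v):
        # non-adjacent pairs among v's still-remaining neighbors
        nb = sorted(adj[v] & remaining - {v})
        return [(x, y) for x, y in combinations(nb, 2) if y not in adj[x]]

    while remaining:
        v = min(sorted(remaining), key=lambda u: len(fills(u)))
        for x, y in fills(v):                  # note the needed fill-ins
            adj[x].add(y)
            adj[y].add(x)
            fill_ins.add(frozenset((x, y)))
        cliques.append(frozenset(adj[v] & remaining | {v}))
        remaining.remove(v)
    return fill_ins, cliques

# Moral graph of the (assumed) a-h example: 9 arcs + marriages d-e, e-g.
moral_edges = [('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'e'), ('d', 'f'),
               ('e', 'f'), ('c', 'g'), ('e', 'h'), ('g', 'h'),
               ('d', 'e'), ('e', 'g')]
fill_ins, cliques = min_fill_triangulate('abcdefgh', moral_edges)
assert frozenset('def') in cliques   # f is simplicial, eliminated first
```

With this tie-breaking the run still needs only two fill-in edges, matching the count (though not necessarily the edges) of the slides' trace.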
Step 3: Junction Graph
• A junction graph for an undirected graph G
is an undirected, labeled graph.
• The nodes are the cliques in G.
• If two cliques intersect, they are joined in
the junction graph by an edge labeled with
their intersection.
[ Figure: from the Bayesian network G = (V, E) to the moral graph GM, the triangulated graph GT, and the junction graph GJ (not complete). The nodes of GJ are the cliques abd, ace, ade, def, ceg, egh; its edges are labeled with the separators, e.g. ceg ∩ egh = eg. ]
Step 4: Junction Tree
• A junction tree is a sub-graph of the junction graph that
– Is a tree
– Contains all the cliques (spanning tree)
– Satisfies the running intersection property:
for each pair of nodes U, V, all nodes on the path between U and V contain U ∩ V

Running intersection?
All vertices C and sepsets S along the path between any two vertices A and B contain the intersection A ∩ B.
Ex: A = {a,b,d}, B = {a,c,e} ⇒ A ∩ B = {a}
C = {a,d,e} ⊇ {a}, S1 = {a,d} ⊇ {a}, S2 = {a,e} ⊇ {a}

[ Figure: the junction tree; the path from A = abd to B = ace passes through S1 = ad, C = ade, and S2 = ae. ]
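The running intersection property can be checked mechanically. This sketch uses the junction tree from the slides (clique indices are arbitrary):

```python
from itertools import combinations

def has_running_intersection(cliques, tree_edges):
    """For every pair of cliques U, V, each clique on the U-V path
    must contain U ∩ V."""
    adj = {i: [] for i in range(len(cliques))}
    for i, j in tree_edges:
        adj[i].append(j)
        adj[j].append(i)

    def path(u, v):
        # depth-first search for the unique tree path u -> v
        stack, seen = [(u, [u])], {u}
        while stack:
            n, p = stack.pop()
            if n == v:
                return p
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append((m, p + [m]))

    return all(cliques[c] >= cliques[u] & cliques[v]
               for u, v in combinations(range(len(cliques)), 2)
               for c in path(u, v))

# The junction tree from the slides:
# abd -ad- ade -ae- ace -ce- ceg -eg- egh, plus ade -de- def.
cliques = [frozenset('abd'), frozenset('ade'), frozenset('ace'),
           frozenset('def'), frozenset('ceg'), frozenset('egh')]
edges = [(0, 1), (1, 2), (1, 3), (2, 4), (4, 5)]
assert has_running_intersection(cliques, edges)
```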
A few useful Theorems
• Theorem: An undirected graph is triangulated if and only if its junction graph has a junction tree.
• Theorem: A sub-tree of the junction graph of a triangulated graph is a junction tree if and only if it is a spanning tree of maximal weight (the weight of a link is the number of variables in its domain).
There are several methods to find a maximal spanning tree.
Kruskal's algorithm: successively choose a link of maximal weight unless it creates a cycle.
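A Kruskal-style sketch of the second theorem, run on the six cliques from the slides: connect cliques by separator size, largest first, skipping any edge that would close a cycle.

```python
from itertools import combinations

def junction_tree(cliques):
    """Maximal-weight spanning tree of the junction graph, where an
    edge's weight is the size of the separator (clique intersection)."""
    candidates = sorted(
        ((len(cliques[i] & cliques[j]), i, j)
         for i, j in combinations(range(len(cliques)), 2)
         if cliques[i] & cliques[j]),
        reverse=True)

    parent = list(range(len(cliques)))   # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    tree = []
    for w, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:                     # skip links that create a cycle
            parent[ri] = rj
            tree.append((i, j, cliques[i] & cliques[j]))
    return tree

cliques = [frozenset('abd'), frozenset('ade'), frozenset('ace'),
           frozenset('def'), frozenset('ceg'), frozenset('egh')]
tree = junction_tree(cliques)
assert len(tree) == 5                    # spanning tree over 6 cliques
```

For this example all five chosen separators have size two (ad, ae, de, ce, eg), reproducing the junction tree GJT of the slides.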
[ Figure: the junction graph GJ (not complete) over the cliques abd, ace, ade, def, ceg, egh, and the maximal-weight spanning tree chosen from it: the junction tree GJT with separators ad, ae, de, ce, eg. ]
Colorful example
• Compute the elimination cliques
(the order here is f, d, e, c, b, a).
• Form the complete junction graph over the maximal elimination cliques and find a maximum-weight spanning tree.
Principle of Inference
DAG
↓
Junction Tree
↓ Initialization
Inconsistent Junction Tree
↓ Propagation
Consistent Junction Tree
↓ Marginalization
P(V = v | E = e)
[ Figure: the triangulated graph over a–h and the junction tree GJT. In the junction tree, cliques become vertices (abd, ace, ade, def, ceg, egh) and the edges carry sepsets, e.g. ceg ∩ egh = eg. ]
Potentials
DEFINITION: A potential φ_A over a set of variables X_A is a function that maps each instantiation x_A into a non-negative real number.
A joint probability is a special case of a potential where Σ_{x_A} φ_A(x_A) = 1.
Ex: A potential φ_abc over the set of vertices {a,b,c}. X_a has four states, and X_b and X_c have three states.
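The example potential is just a non-negative table, one axis per variable; normalizing it yields the joint-probability special case. A minimal sketch with made-up numbers:

```python
import numpy as np

# A potential over {a, b, c}: X_a has 4 states, X_b and X_c have 3.
rng = np.random.default_rng(2)
phi_abc = rng.random((4, 3, 3))        # non-negative, arbitrary scale
assert (phi_abc >= 0).all()

# A joint probability is the special case that sums to one.
P_abc = phi_abc / phi_abc.sum()
assert np.isclose(P_abc.sum(), 1.0)
```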
The potentials in the junction tree are not consistent with each other, i.e. if we use marginalization to get the probability distribution for a variable X_u we will get different results depending on which clique we use.

[ Figure: the junction tree over cliques abd, ace, ade, def, ceg, egh. ]

P(X_a) = Σ_{d,e} φ_ade = (0.12, 0.33, 0.11, 0.03)
P(X_a) = Σ_{c,e} φ_ace = (0.02, 0.43, 0.31, 0.12)

The potentials might not even sum to one, so they are not joint probability distributions.
Propagating potentials
Message Passing from clique A to clique B
1. Project the potential of A into S_AB
2. Absorb the potential of S_AB into B

Projection:   φ_S^new = Σ_{A \ S} φ_A
Absorption:   φ_B^new = φ_B · φ_S^new / φ_S^old

Global Propagation
1. COLLECT-EVIDENCE (messages 1–5)
2. DISTRIBUTE-EVIDENCE (messages 6–10)

[ Figure: the junction tree with root abd; COLLECT-EVIDENCE messages 1–5 flow toward the root, DISTRIBUTE-EVIDENCE messages 6–10 flow back out. ]
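One projection-absorption pass can be sketched numerically. Here A = {x, y} and B = {y, z} share the separator S = {y}; the potentials are made up, with φ_B initialized like a CPT P(z|y) so that a single pass already makes the pair consistent:

```python
import numpy as np

rng = np.random.default_rng(3)
phi_A = rng.random((2, 2))                    # phi_A(x, y), arbitrary
phi_B = rng.random((2, 2))
phi_B /= phi_B.sum(axis=1, keepdims=True)     # phi_B(y, z), rows sum to 1
phi_S = np.ones(2)                            # separator over {y}, initially 1

# 1. Projection: marginalize phi_A onto the separator S = {y}.
phi_S_new = phi_A.sum(axis=0)                 # sum out x

# 2. Absorption: B multiplies in the update ratio phi_S_new / phi_S.
phi_B = phi_B * (phi_S_new / phi_S)[:, None]
phi_S = phi_S_new

# After the pass, A and B agree on the marginal over y.
assert np.allclose(phi_A.sum(axis=0), phi_B.sum(axis=1))
```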
A priori distribution
global propagation
⇒ potentials are consistent
⇒ marginalization gives probability distributions for the variables
Example: Create Join Tree
(this BN corresponds to an HMM with 2 time steps)

[ Figure: BN with hidden chain B → C and observations A (child of B) and D (child of C). ]

Junction Tree:
[A,B] --B-- [B,C] --C-- [C,D]
Example: Initialization
A,B
B
B,C
C
C,D
Variable
Associated
Cluster
A
A,B
A, B  P(B)
B
A,B
A,B  P(B)P( A | B)
C
B,C
B,C  P(C | B)
D
C,D
C , D  P(D | C)
Potential function
Example: Collect Evidence
• Choose an arbitrary clique, e.g. B,C, where all potential functions will be collected.
• Call neighboring cliques recursively for messages:
• 1. Call A,B:
– 1. Projection onto B:
φ_B ← Σ_A φ_{A,B} = Σ_A P(B) P(A|B) = P(B)
– 2. Absorption:
φ_{B,C} ← φ_{B,C} · φ_B / φ_B^old = P(C|B) P(B) = P(B,C)
Example: Collect Evidence
(cont.)
• 2. Call C,D:
– 1. Projection:
φ_C ← Σ_D φ_{C,D} = Σ_D P(D|C) = 1
– 2. Absorption:
φ_{B,C} ← φ_{B,C} · φ_C / φ_C^old = P(B,C)

[A,B] --B-- [B,C] --C-- [C,D]
Example: Distribute Evidence
• Pass messages recursively to neighboring nodes
• Pass message from B,C to A,B:
– 1. Projection:
φ_B ← Σ_C φ_{B,C} = Σ_C P(B,C) = P(B)
– 2. Absorption:
φ_{A,B} ← φ_{A,B} · φ_B^new / φ_B^old = P(B) P(A|B) · P(B)/P(B) = P(A,B)
Example: Distribute Evidence
(cont.)
• Pass message from B,C to C,D:
– 1. Projection:
φ_C ← Σ_B φ_{B,C} = Σ_B P(B,C) = P(C)
– 2. Absorption:
φ_{C,D} ← φ_{C,D} · φ_C^new / φ_C^old = P(D|C) · P(C)/1 = P(C,D)

[A,B] --B-- [B,C] --C-- [C,D]
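The whole collect/distribute run on this chain can be verified numerically. The binary CPTs below are made up (the slides give none); the cluster axes are ordered alphabetically:

```python
import numpy as np

rng = np.random.default_rng(4)
def norm(t):
    return t / t.sum(axis=-1, keepdims=True)

P_B = norm(rng.random(2))          # P(B)
P_A_B = norm(rng.random((2, 2)))   # P(A|B), rows indexed by B
P_C_B = norm(rng.random((2, 2)))   # P(C|B)
P_D_C = norm(rng.random((2, 2)))   # P(D|C)

# Initialization (clusters AB, BC, CD):
phi_AB = (P_A_B * P_B[:, None]).T  # phi_AB(a, b) = P(b) P(a|b)
phi_BC = P_C_B                     # phi_BC(b, c) = P(c|b)
phi_CD = P_D_C                     # phi_CD(c, d) = P(d|c)

# Collect to BC: message from AB over sepset {B}, then from CD over {C}.
msg_B = phi_AB.sum(axis=0)         # projection onto B, = P(b)
phi_BC = phi_BC * msg_B[:, None]   # absorption, = P(b, c)
msg_C = phi_CD.sum(axis=1)         # projection onto C, = 1
phi_BC = phi_BC * msg_C[None, :]

# Distribute from BC back out.
new_B = phi_BC.sum(axis=1)         # = P(b)
phi_AB = phi_AB * (new_B / msg_B)[None, :]
new_C = phi_BC.sum(axis=0)         # = P(c)
phi_CD = phi_CD * (new_C / msg_C)[:, None]

# Every clique potential is now the joint over its variables,
# and neighboring cliques agree on their sepset marginals.
assert np.allclose(phi_AB.sum(), 1.0)
assert np.allclose(phi_CD.sum(), 1.0)
assert np.allclose(phi_AB.sum(axis=0), phi_BC.sum(axis=1))  # agree on B
assert np.allclose(phi_BC.sum(axis=0), phi_CD.sum(axis=1))  # agree on C
```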
Netica’s Animal Characteristics BN
Subnet 1:
* JTnode 1: An,En
* JTnode 2: An,Sh
* JTnode 3: An,Cl
Subnet 2:
* JTnode 4: Cl,Yo
* JTnode 5: Cl,Wa
Subnet 3:
* JTnode 6: Cl,Bod