Transcript pps

Compiling Graphical Models
Adnan Darwiche
University of California, Los Angeles
UAI’06 Tutorial
Compilation: Historical
Motivation

Separate inference into two phases:




Offline: Compile model into a structure
Online: Use structure to answer queries
Goal: Push as much work into offline phase
to optimize online inference time
Best initial example:


Offline: Compile a Bayesian network into a jointree
Online: Use jointree to answer multiple queries
efficiently
Compilation: Modern
Motivation

Exploit model structure in inference:

Global structure:




Local structure:




Exhibited in model topology
Measured by treewidth
Exploited by most (non-compilation) algorithms
Exhibited in model parameters
Type 1: Determinism
Type 2: Context-specific independence
Local structure is best exploited in the
context of compilation: main theme
Compilation: Theoretical
Implications

Unifies inference paradigms




Variable elimination
Jointree (Tree clustering)
Conditioning
Compilation as a trace of classical
inference
Bayesian Networks
Battery Age
Alternator
Fan Belt
Charge Delivered
Battery
Fuel Pump
Starter
Battery Power
Fuel Line
Distributor
Gas
Local Knowledge
Spark Plugs
Gas Gauge
Radio
Lights
Engine Turn Over
Engine Start
Bayesian Networks
Battery Age
Alternator
Fan Belt
Charge Delivered
If Battery
Power = OK, then
Fuel Pump
Lights = ON (99%)
….
Battery
Starter
Battery Power
Lights
ON
OFF
.99
.01
Engine WEAK
Turn Over.20
.80
Battery
Power
Lights
Distributor
Gas
Gas Gauge
Radio
Fuel Line
OK
DEAD
0
1
Spark Plugs
Engine Start
Bayesian Networks
Battery Age
Alternator
Fan Belt
Charge Delivered
Battery
Fuel Pump
Starter
Fuel Line
Distributor
Gas
Battery Power
Spark Plugs
Gas Gauge
Radio
Lights
Engine Turn Over
Engine Start
Global Structure:
Treewidth w
O ( n exp( w ))
Local Structure:
CSI and Determinism
Battery Age
Alternator
Fan Belt
Charge Delivered
Battery
Fuel Pump
Starter
Fuel Line
Distributor
Gas
Battery Power
Spark Plugs
Gas Gauge
Radio
Lights
Engine Turn Over
Engine Start
Local Structure:
CSI and Determinism
Battery Age
Alternator
Fan Belt
Charge Delivered
Battery
Fuel Pump
Starter
Fuel Line
Distributor
Gas
Battery Power
Spark Plugs
Gas Gauge
Radio
Lights
Engine Turn Over
Context Specific Independence (CSI)
Engine Start
Local Structure:
CSI and Determinism
Battery Age
Alternator
Fan Belt
Charge Delivered
Battery
Fuel Pump
Starter
Fuel Line
Distributor
Gas
Battery Power
Lights
Spark Plugs
ON
OFF
Gas Gauge
Determinism
Battery
Power
Radio
OK
If Battery Power = Dead,
.99
then
= OFF
LightsLights Engine
Turn Over
WEAK
.20
DEAD
0
.01
.80
1
Engine Start
Today’s Models …

Characterized by:




Richness in local structure (determinism, CSI)
Massiveness in size (100,000’s variables not uncommon)
High connectivity (treewidth > 50, > 100)
Enabled by:


High level modeling tools: relational, first order
New application areas (synthesis):



Bioinformatics (e.g. linkage analysis)
Sensor networks
Exploiting local structure a must!
High Order Specifications:
Relational Models…
burglary(v)=0.005;
alarm(v)=(burglary(v):0.95,0.01);
calls(v,w)=
(neighbor(v,w):
(prankster(v)):
(alarm(w):0.9,0.05),
(alarm(w):0.9,0)),0);
alarmed(v)=
n-or{calls(w,v)|w:neighbor(w,v)}
Primula
Friends and Smokers
(Richardson & Domingos, 2004)




M individuals
Relations such as
smokes(p), cancer(p),
friend(p1,p2)
Logical constraints such
as: if one of p's friends
smokes, then p smokes.
Sample Query:
probability that given
person has cancer
M Network Treewidth*
params w
CNF
Vars
Cnf
Claus
1
34
3
12
31
4
1,552
13
414
1,390
7
7,714
36
1,995
6,916
10 21,760
70
5,565
19,525
13 46,930
118
11,934
42,133
16 86,464
172
21,912
77,656
19 143,602
244
36,309
129,01
22 221,584
316
55,935
199,11
25 323,650
412
81,600
290,87
28 453,040
528
114,114 407,21
29 502,802
560
126,614 451,96
Students
(Pasula & Russell, 2001)




P professors
S students
Various relations, such as
famous(p), wellfunded(p), success(s),
advises(p,s)
Sample Query:
probability a professor is
well-funded given
success of advised
students
Students- Network
Profs
params
Treewidth CNF
w
Vars
04-08
11,566
72
3,099
04-16
21,070
101
5,859
05-10
20,688
128
5,624
05-20
38,168
148
10,73
06-12
33,454
176
9,209
06-24
62,302
233
17,69
Genetic Linkage Analysis

Gene 1
Gene 2
Gene 3


Ordering genes on a
chromosome and
determining distance
between them
Useful for predicting
and detecting diseases
Associating functionality
of genes with their
location on the
chromosome
Pedigrees + Phenotype +
Genotype
DBNs from Speech Applications
Coding Networks
Tutorial Outline





Theoretical foundations
Online query answering algorithms
Offline compilation algorithms
Applications
Concluding remarks
Theoretical Foundations

Graphical Model (Bayesian, Markov Networks):


Compiled Model:


Is a Multi-Linear Function (MLF)
Is an Arithmetic Circuit (AC)
Compilation process:

Factoring MLF into AC
Multi-Linear Functions 
Arithmetic Circuits
f   a  b a  b | a   a  b  a  b | a   a  b a  b | a   a  b  a  b | a
A
Factoring
B
+
*
*
+
*
a
*
+
*
*
 a  b | a  b  b | a  b|a  b
*
 b |a
*
a
a
A Differential Approach to Inference in Bayesian Networks
JACM-03 (Darwiche)
Factoring Multi-linear
Functions (MLFs)
Arithmetic Circuit (AC)
*
Circuit Complexity:
Size of smallest AC that computes the MLF
+
MLF:
a + ad + abd + abcd
• An
A graphical
model
defines an MLF
MLF has an
exponential
of terms,
yet
it amay
be evidence
• number
Evaluating
the MLF
for
given
represented
by an AC
gives
the probability
of with
evidence
polynomial size!
• The inference problem can be formulated
as factoring the MLF of a graphical model
*
+
a
b
c
d
1
Graphical Models as MLFs
A
B
Pr(.)
true
true
.03
true
false
.27
false
true
.56
false
.14
false
false
false
Pr(a) = .03 + .27
= .3
Graphical Models as MLFs
A
B
Pr(.)
true
true
.03
true
false
.27
false
true
.56
false
.14
false
false
false
Pr(~b) = .27 + .14
= .41
Graphical Models as MLFs
A
B
Pr(.)
true
true
λ.03
a*λb * .03
true
false
.27
λa*
λ~b * .27
false
true
λ~a*
.56
λb * .56
false
λ~a*.14
λ~b* .14
false
false
false
F(λ~a, λ~b, λa, λb) =
.03λaλb + .27λaλ~b + .56λ~aλb + .14λ~a λ~b
F(λ~a, λ~b, λa, λb)
=.03λaλb + .27λaλ~b + .56λ~aλb + .14λ~a λ~b
Pr(a,~b)
= F(λ~a:0, λ~b:1, λa:1 , λb:0)
= .27
Pr(a)
= F(λ~a:0, λ~b:1, λa:1 , λb:1)
= .03+.27
θa
C
θc|a
B
θb|a
A
A B
C
Pr(.)
a
b
c
θa θb|a θc|a
a
b
~c θa θb|a θ~c|a
a
~b c
a
~b ~c θa θ~b|a θ~c|a
.
.
.
θa θ~b|a θc|a
…
θa
C
θc|a
B
θb|a
A
A B
C
Pr(.)
a
b
c
λa λb λc θa θb|a θc|a
a
b
a
~c λa λb λ~c θa θb|a θ~c|a
~b c
λa λ~b λc θa θ~b|a θc|a
a
~b ~c
.
.
.
λa λ~b λ~c θa θ~b|a θ~c|a
…
C
A
B
F=
λa λb λc θa θb|a θc|a +
λa λb λ~c θa θb|a θ~c|a +
λa λ~b λc θa θ~b|a θc|a +
λa λ~b λ~c θa θ~b|a θ~c|a
….
C
θa
A
θc|a
θb|a
D
θd|bc
B
F=
λa λb λc λd θa θb|a θc|a θd|bc +
λa λb λc λ~d θa θb|a θc|a θ~d|bc +
….
Each term has 2n variables (n indicators, n parameters)
Each variable has degree one (multi-linear function)
Multi-Linear Functions 
Arithmetic Circuits
f   a  b a  b | a   a  b  a  b | a   a  b a  b | a   a  b  a  b | a
A
Factoring
B
+
*
*
+
*
a
*
+
*
*
 a  b | a  b  b | a  b|a  b
*
 b |a
*
a
a
Online Query Answering

Complexity:


Time and space linear in the AC size
Queries:




Probability of evidence, with
evidence flipping/fast retraction
Variable and family marginals
MPE: most probable explanation
Sensitivity analysis (derivatives)
Evaluating the Polynomial
F ( e )  F (   ..)  Pr( e )
PR: Probability of Evidence
Alternator
Battery Age
Fan Belt
Charge Delivered
Battery
Fuel Pump
Starter
Fuel Line
Distributor
Gas
Battery Power
Spark Plugs
Gas Gauge
Radio
Pr(e)
Lights
Engine Turn Over
Engine Start
The Partial Derivatives
F
 x
( e )  Pr( e  X , x )
PR: Probability of Evidence
Flips
Battery Age
Alternator
Fan Belt
Charge Delivered
Battery
Fuel Pump
Starter
Fuel Line
Distributor
Gas
Battery Power
Spark Plugs
Gas Gauge
Radio
Lights
Pr(e)
Engine Turn Over
Engine Start
X
PR: Probability of Evidence
Flips
Battery Age
Alternator
Fan Belt
Charge Delivered
Battery
Fuel Pump
Starter
Fuel Line
Distributor
Gas
Battery Power
Spark Plugs
Gas Gauge
Radio
Lights
Engine Turn Over
Pr(e-X,x)
Engine Start
X
The Partial Derivatives
 x |u
F
  x |u
( e )  Pr( e , x , u )
PR: Family Marginals
Battery Age
Alternator
Charge Delivered
Battery
U
Fan Belt
X
Starter
Fuel Pump
Fuel Line
Distributor
Gas
Battery Power
Spark Plugs
Gas Gauge
Radio
Lights
Pr(e,x,u)
Engine Turn Over
Engine Start
Multi-Linear Functions 
Arithmetic Circuits
f   a  b a  b | a   a  b  a  b | a   a  b a  b | a   a  b  a  b | a
A
Factoring
B
+
*
*
+
*
a
*
+
*
*
 a  b | a  b  b | a  b|a  b
*
 b |a
*
a
a
Circuit Evaluation and
Differentiation: Marginals
.3
+
Pr( a )
1
.3
0
*1
*
1 only:
Two passes
.3
*
1
.3
a
1
1
1 flipping)
•probability of evidence
(with evidence
+
+
•Node marginals
.1 marginals
.9
.8
.2
•Family
•Sensitivity
*
*
*
*
.3
1
.1
.3
1
.9
0
0
1*
0
.8
1
 0b
  a.27
.2
0
 a  b | a  b . 03 b| a .f7 b|(aa ) fPr(
(ba )a, bPr(
)b | aa )  a
.3
.3
.03
.3
0
.7
.7
a
0
Efficient Eval/Diff
Schemes
Assume alternating levels of +/* nodes,
with one parent per *node

Method A:

Method B:

Method C:
Two registers per +node (no registers for *nodes)
One register per node (use for values in upward pass,
then override with derivatives in downward pass)
One register per node, one bit per *node
Circuit Optimization: MPE
.27
m
MPE (a )
.27
0
*
*
.9
.8
m
.3
.1
*
.3
a
*
1
.1
1
m
.9
.8
.2
0
*
*
*
*
.9
.8
1
 a  b | a  b  b | a  b|a  b
.2
 b |a
0
a
.7
a
Circuit Optimization: MPE
m
*
MPE : a , b
m
*
a
a
*

b |a
b
Custom Hardware for
Evaluating ACs
Adharapurapu, Ercegovac (2004)
Offline Compilation

Factoring MLFs into ACs:




Jointree: Embeds AC
Variable Elimination: Trace is an AC
Recursive Conditioning: Trace is an AC
Reduction to Logic: CNF to d-DNNF
compilation
Compiling using Jointrees

Classical Jointree Algorithm:



Convert model into jointree
Jointree propagation (two-passes)
Modern interpretation:


Jointree embeds an AC that factors MLF
Jointree propagation is
evaluating/differentiating embedded AC
A Jointree Embeds an AC…
root
AE
Inward-pass evaluates circuit
Outward-pass differentiates circuit
[Hugin, Shenoy Shafer,…]
 : ab
AB
A
AC
:a
:a
A
B
:a
 : ab
:a
:ab
 :b
:ab
 :b
AD
A Differential Semantics to Jointree Algorithms
AIJ-04 (with James Park)
Efficient Eval/Diff
Schemes
Assume alternating levels of +/* nodes,
with one parent per *node

Method A:

Method B:

Method C:
Two registers per +node (no registers for *nodes)
One register per node (use for values in upward pass,
then override with derivatives in downward pass)
One register per node, one bit per *node
Jointree Flavors



Shenoy-Shafer:
Method A
Hugin:
Method B (looses information)
Zero-Conscious Hugin (new):
Method C (best of A,B)
Compiling using Variable
Elimination (VE)

VE operates on factors:


VE performs two operations on factors:



Multiply two factors
Sum-Out a variable from factor
Factors have different representations:



Mappings from variable
instantiations to real numbers
Tables
More structured representations (decision
trees/graphs)
Overhead problem for structured factors
Tabular Factors
B
A
A
A
TA
true
.3
false
.7
B
TB
true
true
.1
true
false
.9
false
true
.8
false
.2
false
false
Structured Factors:
Algebraic Decision Diagrams (ADDs)
X
Z
Y
Z
.1 .9
.5
Networks with Local Structure
Network Max Clust
alarm
bm
7.2
Vars
Card Total Parms %Det %Distinct
37
2...4
752
0.9
24.6
20 1005
2...2
6972
99.6
100
diabetes
17.2
413
3...21
461069
78.2
17.6
hailfinder
11.7
56
2...11
3741
15.7
26.9
mildew
21.4
35 3...100
547158
93.2
25.1
2...2
8326
98.7
75
mm
23 1220
munin1
26.8
189
1...21
19466
66.5
61.2
munin2
18.6 1003
2...21
83920
63.3
69.5
munin3
17.8 1044
1...21
85855
63.1
71.3
munin4
21.4 1041
1...21
98183
64.5
65.3
pathfinder
15
109
2...63
97851
56.1
5.1
pigs
17.4
441
3...3
8427
56.2
23.9
students
22
376
2...2
2616
90.7
79.3
tcc4f
10
105
2...2
3236
0.4
35.6
water
19.9
32
3...4
13484
54
57
VE: Tabular vs ADD Representations of Factors
Tabular
ADD
Network
Time (ms)
Time (ms)
Improvement
alarm
31
360
0.086
barley
307
14,049
0.022
bm-5-3
4,892
658
7.435
diabetes
949
33,220
0.029
hailfinder
48
515
0.093
link
1,688
2,658
0.635
mm-3-8-3
2,166
843
2.569
mildew
72
92,602
0.001
munin1
155
1,255
0.124
munin2
204
3,170
0.064
munin3
350
5,049
0.069
munin4
406
4,361
0.093
pathfinder
51
5,213
0.01
pigs
69
597
0.116
st-3-2
186
362
0.514
tcc4f
29
153
0.19
water
76
1,015
0.075
Compiling using Variable
Elimination (VE)

By using symbolic factors and
corresponding operations:


VE with tabular factors:


VE compiles out an AC
Generates ACs similar to those
embedded in jointree
VE with structured factors:


Generates much smaller ACs
Overhead pushed into offline phase
Factors
B
A
A
A
TA
true
.3
false
.7
B
TB
true
true
.1
true
false
.9
false
true
.8
false
.2
false
false
Symbolic Factors
B
A
A
A
TA
B
TB
true
true
θb|a * λb
true
θa * λa
true
false
θ~b|a * λ~b
false
θ~a * λ~a
false
true
θb|~a * λb
false
θ~b|~a* λ~b
false
false
Summing out Variable B
A
true
true
B
TB
true
θb|a * λb
false
θ~b|a * λ~b
false true
false
false false
θb|~a * λb
θ~b|~a * λ~b
A
Summing out B true
T’B
θb|a *λb + θ~b|a* λ~b
false θb|~a*λb+θ~b|~a *λ~b
Multiplying Factors
A T’B
A
TA T’B
TA
true
true
A
θa*λa
false θ~a*λ~a
*
θb|a*λb + θ~b|a*λ~b
false θb|~a*λb + θ~b|~a*λ~b
true
=
θa *λa *(θb|a* λb + θ~b|a* λ~b)
false θ~a*λ~a*(θb|~a*λb + θ~b|~a*λ~b)
Summing out Variable A
A
true
TA T’B
θa * λa * (θb|a * λb + θ~b|a * λ~b)
false θ~a * λ~a* (θb|~a* λb + θ~b|~a* λ~b)
θa * λa* (θb|a* λb + θ~b|a* λ~b) + θ~a* λ~a (θb|~a* λb + θ~b|~a* λ~b)
VE factors MLF into AC
(Bottom up Construction)
f   a  b a  b | a   a  b  a  b | a   a  b a  b | a   a  b  a  b | a
A
Factoring
B
+
•Time and space complexity of generating AC is similar to
*
*
Variable Elimination: Exponential only in treewidth
+
+
•Generated
ACs
*
* similar *to those*embedded*in Jointree*
 •Recall:
 to answer
 bcan
 b| a multiple
a  AC
 queries!
be used
|a 
a
b
b |a
b
b |a
a
a
Structured Factors:
Algebraic Decision Diagrams (ADDs)
X
Z
Y
Z
.1 .9
.5
Structured Factors:
Algebraic Decision Diagrams (ADDs)
•Modify standard ADD operations (multiply,Xsum-out) to
operate on symbolic ADDs
•Run variable elimination with symbolic ADDs
Z
•Compile out an AC
Y
•Asymptotic complexity is no worse than variable
Z
elimination
•Overhead of ADDs is pushed into offline phase
1  2
•Generated AC can be much smaller
•Online inference can be much faster
3
Symbolic ADD
Networks with Local Structure
Network Max Clust
alarm
bm
7.2
Vars
Card Total Parms %Det %Distinct
37
2...4
752
0.9
24.6
20 1005
2...2
6972
99.6
100
diabetes
17.2
413
3...21
461069
78.2
17.6
hailfinder
11.7
56
2...11
3741
15.7
26.9
mildew
21.4
35 3...100
547158
93.2
25.1
2...2
8326
98.7
75
mm
23 1220
munin1
26.8
189
1...21
19466
66.5
61.2
munin2
18.6 1003
2...21
83920
63.3
69.5
munin3
17.8 1044
1...21
85855
63.1
71.3
munin4
21.4 1041
1...21
98183
64.5
65.3
pathfinder
15
109
2...63
97851
56.1
5.1
pigs
17.4
441
3...3
8427
56.2
23.9
students
22
376
2...2
2616
90.7
79.3
tcc4f
10
105
2...2
3236
0.4
35.6
water
19.9
32
3...4
13484
54
57
Tabular vs ADD: Standard VE
Tabular
ADD
Network
Time (ms)
Time (ms)
Improvement
alarm
31
360
0.086
barley
307
14,049
0.022
bm-5-3
4,892
658
7.435
diabetes
949
33,220
0.029
hailfinder
48
515
0.093
link
1,688
2,658
0.635
mm-3-8-3
2,166
843
2.569
mildew
72
92,602
0.001
munin1
155
1,255
0.124
munin2
204
3,170
0.064
munin3
350
5,049
0.069
munin4
406
4,361
0.093
pathfinder
51
5,213
0.01
pigs
69
597
0.116
st-3-2
186
362
0.514
tcc4f
29
153
0.19
water
76
1,015
0.075
Tabular vs ADD: VE Compilations
Time (s)
Network
Ace
AC size
ADD-VE
Improv.
Tabular-VE
ADD-VE
Improv.
alarm
0.3
3.9
0.1
3,534
3,030
1.2
barley
8,190.20
122.8
66.7
66,467,777
24,653,744
2.7
0.8
6
0.1
75,591,750
14,836
5095.2
1,710.00
110.3
15.5
34,728,957
17,219,042
2
0.7
1.2
0.5
72,755
25,992
2.8
127,262,777
89,097,450
1.4
bm-5-3
diabetes
hailfinder
link
-
699.7
mildew
3,125.20
218.9
14.3
16,094,592
3,352,330
4.8
1.5
11.9
0.1
36,635,566
108,428
337.9
munin1
1,005.10
316.7
3.2
1,260,407,123
31,409,970
40.1
munin2
198.4
31.7
6.3
20,295,426
5,662,218
3.6
munin3
188.4
17.6
10.7
16,987,088
3,503,242
4.8
munin4
205
37.8
5.4
76,028,532
6,869,760
11.1
4.9
5.8
0.9
796,588
44,468
17.9
23.1
10
2.3
4,925,388
2,558,680
1.9
st-3-2
0.5
2.4
0.2
19,374,934
22,070
877.9
tcc4f
0.9
1.1
0.8
33,408
22,612
1.5
3
20.7
0.1
15,996,054
170,428
93.9
mm-3-8-3
pathfinder
pigs
water
-
ADD-VE vs Jointree: Online Inference Time (ms)
Network
Jointree
ADD-VE
Improv.
alarm
166
32
5.2
barley
65,226
35,209
1.9
bm-5-3
89,593
83
1079.4
diabetes
29,316
20,421
1.4
hailfinder
245
70
3.5
link
223,542
175,769
1.3
munin1
669,915
37,451
17.9
munin2
17,857
7,180
2.5
munin3
13,351
4,945
2.7
munin4
42,754
8,683
4.9
pathfinder
1,332
102
13.1
pigs
3,020
2,814
1.1
st-3-2
17,536
82
213.9
tcc4f
281
73
3.8
water
16,676
251
66.4
Work on structured
is now much
mildew representations
10,077
4,522 of factors
2.2
more relevantmm-3-8-3
and practical.
34,001
198
171.7
Computing all marginals, for 16 pieces of random evidence
Compiling by Reduction to
Logic




Algebraic: MLFs / ACs
Logical: CNF / d-DNNF
Factoring MLF into AC can be reduced
to factoring CNF into d-DNNF
CNF to d-DNNF compilers are very
powerful (natural for exploiting
determinism and CSI)
Reduction to Logic
Compiler:
CNF
d-DNNF
http://reasoning.cs.ucla.edu/c2d
Encode
Multi-Linear
Function
Decode
Arithmetic
Circuit
MLFsACs
CNFsd-DNNF
Propositional theory:
Multi-linear function:
c ^ (a   b)
Encode
ac+abc+c
Compile



c


b
a
b
a
Smooth d-DNNF


Decode
c


b
a

1
1
Arithmetic Circuit
Deterministic,
Decomposable NNF
or
and
and A
or
and
or
and
A and
or
or
and
B
C
and
D
B
and
D
E
Deterministic,
Decomposable NNF
Deterministic:
Disjuncts are
logically disjoint
or
and
and A
or
and
or
and
A and
or
or
and
B
C
and
D
B
and
D
E
Deterministic,
Decomposable NNF
Decomposable:
Conjuncts share
no variables
or
and
and
A and
Compilingand
CNFsA
into d-DNNFs
AAAI-02, ECAI-04
or
or
or
Compiler ator
http://reasoning.cs.ucla.edu/c2d
and
and
B
C
and
D
B
and
D
E
Recursive Conditioning
for Compilation
ABC
ADE
A  B  C
A  D  E
or
and
and
A
A
BC
DE
BC
DE

and
BC
DE
and

BC

DE
Why Logic?

Encoding local structure is easy:

Determinism encoded by adding clauses:
C | A  0

CSI encoded by collapsing variables:
 C | AB   C | A  B

A natural environment to exploit local structure:



DD-backtracking, clause learning, …
Non-structural decomposition
Non-structural (formula) caching
Local Structure
A
B
S
C
s|abe
-Functional constraints
-Context-specific independence
Pr(S|A,B,C)
A
B
C
a
b
c
0.95
a
b
c
0.95
a
b
c
0.20
a
b
c
0.05
a
b
c
0.00
a
b
c
0.00
a
b
c
0.00
a
b
c
0.00
Tabular CPT
Determinism
A
B
C
Pr(S|A,B,E)
a
b
c
0.95
a
b
c
0.95
a
b
c
0.20
a
b
c
0.05
a
b
c
0.00
a
b
c
0.00
a
b
c
0.00
a
b
c
0.00
Tabular CPT
λ~a  λb  λc  λs ↔ θs|~abc
¬ λ~a  ¬ λb  ¬ λc  ¬ λs
Context-Specific
Independence
Pr(S|A,B,C)
A
B
C
a
b
c
0.95
a
b
c
0.95
a
b
c
0.20
a
b
c
0.05
a
b
c
0.00
a
b
c
0.00
a
b
c
0.00
a
b
c
0.00
Tabular CPT
λa  λb  λc  λs ↔ θs|abc
λa  λb  λ~c λs ↔ θs|ab~c
λa  λb  λs ↔ θs|ab
The Ace System:
http://reasoning.cs.ucla.edu/ace
Arithmetic Circuit
Belief network

X

Y





x ... y ... x|y ...
Smooth d-DNNF
CNF
x   x
x   x
x  y  x|y
...

….
….




1


x ... y ...

x|y ...

x
ADD-VE vs Logic (Ace): Compile Times
Time (s)
Network
Ace
AC size
ADD-VE
Improv.
Tabular-VE
ADD-VE
Improv.
alarm
0.3
3.9
0.1
3,534
3,030
1.2
barley
8,190.20
122.8
66.7
66,467,777
24,653,744
2.7
0.8
6
0.1
75,591,750
14,836
5095.2
1,710.00
110.3
15.5
34,728,957
17,219,042
2
0.7
1.2
0.5
72,755
25,992
2.8
127,262,777
89,097,450
1.4
bm-5-3
diabetes
hailfinder
link
-
699.7
mildew
3,125.20
218.9
14.3
16,094,592
3,352,330
4.8
1.5
11.9
0.1
36,635,566
108,428
337.9
munin1
1,005.10
316.7
3.2
1,260,407,123
31,409,970
40.1
munin2
198.4
31.7
6.3
20,295,426
5,662,218
3.6
munin3
188.4
17.6
10.7
16,987,088
3,503,242
4.8
munin4
205
37.8
5.4
76,028,532
6,869,760
11.1
4.9
5.8
0.9
796,588
44,468
17.9
23.1
10
2.3
4,925,388
2,558,680
1.9
st-3-2
0.5
2.4
0.2
19,374,934
22,070
877.9
tcc4f
0.9
1.1
0.8
33,408
22,612
1.5
3
20.7
0.1
15,996,054
170,428
93.9
mm-3-8-3
pathfinder
pigs
water
-
ADD-VE vs Logic (Ace)
Network
Nodes
Parameters
Max Cluster
mastermind_04_08_03
1418
9802
26
mastermind_06_08_03
1814
12754
37
mastermind_10_08_03
2606
18658
54
mastermind_03_08_04
2288
16008
31
mastermind_04_08_04
2616
18488
39
mastermind_03_08_05
3692
26186
40
students_03_02
376
2616
25
students_03_12
1346
9856
59
students_04_16
2827
21070
101
students_05_20
5064
38168
148
students_06_24
8201
62302
233
blockmap_05_03
1005
6972
23
blockmap_10_03
6848
48758
52
blockmap_15_03
18787
132436
68
blockmap_20_03
43356
307220
92
blockmap_22_03
59404
423452
104
ADD-VE vs Logic (Ace)
Network
Offline Time
(min)
AC Nodes
AC Edges
Online Inference
Time (s)
mastermind_04_08_03
1
71,666
541,356
0.05
mastermind_06_08_03
1
258,228
1,523,888
0.15
mastermind_10_08_03
3
1,293,323
4,315,566
0.68
mastermind_03_08_04
2
186,351
4,859,201
0.3
mastermind_04_08_04
5
932,355
19,457,308
1.73
mastermind_03_08_05
10
1,359,391
55,417,639
4.33
students_03_02
1
7,927
37,281
0.01
students_03_12
1
24,219
113,876
0.02
students_04_16
3
181,166
815,461
0.09
students_05_20
7
1,319,834
5,236,257
1.84
students_06_24
33
9,922,233
36,450,231
12.97
blockmap_05_03
1
2,833
20,636
0.01
blockmap_10_03
2
17,749
974,817
0.06
blockmap_15_03
6
47,475
7,643,307
0.38
blockmap_20_03
30
105,602
40,172,434
2.45
blockmap_22_03
61
144,136
76,649,302
4.67
Effect of Local Structure
Local
Pathfinder Water
Structure
Encoded
None
981,178
Munin4
13,777,166 116,136,985
Det + CSI 42,810
(4%)
134,140
(1%)
5,762,690
(5%)
Det
130,380
(13%)
138,501
(1%)
9,997,267
(9%)
CSI
200,787
(20%)
11,111,104
(81%)
17,612,036
(15%)
Compilation vs Direct
Inference

Grid problems here…
Compilation vs Direct
Inference
Grid
size
Treewidth
w
Det
Cachet Ace
Ace
Offline/
(sec)
offline online Online
(sec) (sec)
16x16 25
50% 2236
220
2.072 1079
22x22 36
75% 2757
349
2.178 2024
34x34 60
90% 1584
79
0.419 3783
Average over 10 random instances for each grid
Ace available at http://reasoning.cs.ucla.edu/ace
Applications



Relational Models
Diagnosis
Genetic Linkage Analysis
Primula/Ace: Upcoming
Release
burglary(v)=0.005;
alarm(v)=(burglary(v):0.95,0.01);
calls(v,w)=
(neighbor(v,w):
(prankster(v)):
(alarm(w):0.9,0.05),
(alarm(w):0.9,0)),0);
alarmed(v)=
n-or{calls(w,v)|w:neighbor(w,v)}
Friends and Smokers
(Richardson & Domingos, 2004)




M individuals
Relations such as
smokes(p), cancer(p),
friend(p1,p2)
Logical constraints such
as: if one of p's friends
smokes, then p smokes.
Sample Query:
probability that given
person has cancer
Friends & Smokers
M Network Treewidth
params w
CNF
Vars
Cnf
AC
Online Offline
Clauses Edges Time
Time
(sec)
(sec)
1
34
3
12
31
18
0
0.03
4
1,552
13
414
1,390
293
0.003
0.44
7
7,714
36
1,995
6,916
1,295
0.006
1.92
10
21,760
70
5,565
19,525
3,512
0.005
6.66
13
46,930
118
11,934
42,133
7,430
0.013
12.8
16
86,464
172
21,912
77,656
13,535
0.022
21.68
19
143,602
244
36,309
129,010
22,313
0.035
38.36
22
221,584
316
55,935
199,111
34,250
0.058
90.67
25
323,650
412
81,600
290,875
49,832
0.079
162.45
28
453,040
528
114,114
407,218
69,545
0.114
274.2
29
502,802
560
126,614
451,965
77,118
0.119
275.17
Students
(Pasula & Russell, 2001)




P professors
S students
Various relataios, such as
famous(p), wellfunded(p), success(s),
advises(p,s)
Sample Query:
probability a professor is
well-funded given
success of advised
students
Students
Students- Network Treewidth CNF
Profs
params
w
Vars
Cnf
AC
Clauses Edges
Online
Time
(sec)
Offline
Time
(min)
04-08
11,566
72
3,099
11,099
445,410
0.0530
2
04-16
21,070
101
5,859
21,115
815,461
0.0930
3
05-10
20,688
128
5,624
20,279
2,531,230
0.2885
3
05-20
38,168
148
10,734
38,889
5,236,257
1.8439
7
06-12
33,454
176
9,209
33,353
16,936,504
3.2120
14
06-24
62,302
233
17,693
64,325
36,450,231
12.9663
33
Diagnosis QMR-like: Effect
of Encoding Evidence



600 diseases (D) and
4100 features (F)
Feature Fj is a noisy-or
of parent diseases Di
(11 parents chosen
randomly)
Sample Query:
probability of disease
given partial evidence on
features.
D1
D2
F1
…
D3
F2
…
Dm
Fn
Diagnosis QMR-like: Effect
of Encoding Evidence
No. True AC
Features Edges
Online
Time
(sec)
Offline
Time
(sec)
0
48,100
0.05
23.73
3
52,830
0.05
23.86
6
57,638
0.05
23.81
9
62,547
0.05
23.82
12
67,632
0.05
24.19
15
73,321
0.04
23.6
18
81,629
0.05
24.95
21
109,335
0.05
30.95
25
434,445
0.08
155.12
27
1,141,674 0.17
469.7
28
1,691,833 0.23
728.52
29
2,352,820 0.3
1,046.93
Treewidth:
586-589
CNF variables:
94,900
CNF clauses:
188,600
Genetic Linkage Analysis

Gene 1
Gene 2
Gene 3


Ordering genes on a
chromosome and
determining distance
between them
Useful for predicting
and detecting diseases
Associating functionality
of genes with their
location on the
chromosome
Pedigrees + Phenotypes +
Genotypes
Arithmetic Circuit
Gene 1
Gene 2
State of the Art Linkage
Pedigree
Offline AC Edges
(sec)
Online
(sec)
Superlink
1.4
(sec)
EE33
25.33
2,070,707
0.59
1,046.72
EE37
61.29
1,855,410
0.39
1,381.61
EE30
376.78 27,997,686 8.37
815.33
EE23
89.47
1.08
502.02
EE18
283.96 23,632,200 6.63
248.11
3,986,816
Model Compilation:
Factoring MLFs into ACs

Classical algorithms factor
MLFs into ACs:





Jointree embeds AC
Variable elimination
constructs AC bottom up
Recursive conditioning
constructs AC
top down
Factoring MLFs into ACs
can be reduced to logical
reasoning
Exploiting local
structure to build
smaller ACs:



Compiling models with
very high treewidth is
common place
Boundary between exact
and approximate
inference is much
changed
Public systems now
available!