Lecture 15
Bayesian Networks in Computer
Vision
Gary Bradski
Sebastian Thrun
http://robots.stanford.edu/cs223b/index.html
1
What is a Bayesian Network?
It’s a Factored Joint Distribution and/or Causal Diagram
[Figure: example network over (random) variables W, A, C, F, R with CPDs P(W), P(A|W), P(C|W), P(F|C), P(R|C,A). The graph is directed and acyclic; a conditional probability distribution quantifies the effect of the parents on each node.]
A joint distribution, here p(W,C,A,F,R), is everything we can know about the problem, but it grows exponentially: here 2^5 - 1 = 31 parameters. Factoring the distribution in a Bayes net decreases the number of parameters, here from 31 to 11 (probabilities sum to one, which decreases the number of parameters to be specified).
2
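To make the factorization concrete, here is a minimal Python sketch of the example network; the CPT numbers are made up for illustration, and only the structure P(W)P(A|W)P(C|W)P(F|C)P(R|C,A) follows the slide.

```python
from itertools import product

# Made-up CPTs for five binary variables; only the factorization structure
# P(W) P(A|W) P(C|W) P(F|C) P(R|C,A) follows the slide's example network.
P_W = {1: 0.4, 0: 0.6}
P_A = {w: {1: p, 0: 1 - p} for w, p in [(1, 0.7), (0, 0.1)]}   # P(A | W=w)
P_C = {w: {1: p, 0: 1 - p} for w, p in [(1, 0.8), (0, 0.2)]}   # P(C | W=w)
P_F = {c: {1: p, 0: 1 - p} for c, p in [(1, 0.6), (0, 0.05)]}  # P(F | C=c)
P_R = {(c, a): {1: p, 0: 1 - p}
       for (c, a), p in [((1, 1), 0.9), ((1, 0), 0.7),
                         ((0, 1), 0.4), ((0, 0), 0.05)]}        # P(R | C=c, A=a)

def joint(w, c, a, f, r):
    """p(W,C,A,F,R) = P(W) P(A|W) P(C|W) P(F|C) P(R|C,A)."""
    return P_W[w] * P_A[w][a] * P_C[w][c] * P_F[c][f] * P_R[(c, a)][r]

# The factored joint still sums to 1 over all 2^5 assignments, but it needs
# only 1 + 2 + 2 + 2 + 4 = 11 free parameters instead of 2^5 - 1 = 31.
print(sum(joint(*x) for x in product([0, 1], repeat=5)))  # ~1.0
```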
Causality and Bayesian Nets
One can also think of Bayesian Networks as a “Circuit Diagram” of Probability Models
• The links indicate causal effect, not direction of information flow.
• Just as we can predict the effects of changes on the circuit diagram, we can predict the consequences of “operating” on our probability model diagram.
[Figure: circuit analogy — Mains, Transformer, Diode, Capacitor, Battery, Ammeter, with nodes marked Observed / Un-Observed.]
3
Inference
• Once we have a model, we need to make it consistent by “diffusing” the distributions around until they all agree with one another.
• Central algorithm for this: Belief Propagation
4
Belief Propagation
“Causal” message: going down the arrow, sum out the parent.
Bayes' law:
P(A | B) = P(B | A) P(A) / P(B)
“Diagnostic” message: going up the arrow, apply Bayes' law.
[Figure: messages passed along the arrows, normalized by 1/α.]
5
* some figures from: Peter Lucas BN lecture course
Belief Propagation
Bayes' law:
P(A | B) = P(B | A) P(A) / P(B)
Diagnostic message (sent against the arrow):
λ_{Vi}(Vj) = Σ_{Vi} P(Vi | Vj) λ(Vi)
Causal message (sent with the arrow):
π_{Vi}(Vj) = P(Vj)
6
* some figures from: Peter Lucas BN lecture course
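A tiny numeric sketch of the two message types on a single edge Vj → Vi; the distributions are made-up placeholders, and the code just applies the two formulas above (sum out the parent going down, Bayes' law going up).

```python
import numpy as np

# Made-up distributions on one edge Vj -> Vi (both binary).
P_Vj = np.array([0.7, 0.3])                # prior P(Vj)
P_Vi_given_Vj = np.array([[0.9, 0.1],      # row = value of Vj, column = value of Vi
                          [0.2, 0.8]])

# Causal (pi) message, with the arrow: predict Vi by summing out the parent.
P_Vi = P_Vj @ P_Vi_given_Vj
print("P(Vi) =", P_Vi)

# Diagnostic (lambda) message, against the arrow: observe Vi = 1 and apply
# Bayes' law, P(Vj | Vi=1) proportional to P(Vi=1 | Vj) P(Vj).
lam = P_Vi_given_Vj[:, 1] * P_Vj
print("P(Vj | Vi=1) =", lam / lam.sum())   # the 1/alpha normalization
```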
Inference in general graphs
• Belief propagation is only guaranteed to be
correct for trees
• A general graph should be converted to a
junction tree, by clustering nodes
• Computational complexity is exponential in the size of the resulting clusters (NP-hard)
7
Junction tree: BN → Junction Tree
Algorithm for turning a Bayesian Network with loops into a junction tree
1. “Moralize” the graph by connecting parents.
2. Drop the arrows.
3. Triangulate (connect nodes if a loop of length > 3 exists).
4. Put in intersection variables.
[Figure: the example graph on X1–X6 after each step (1)–(3), and the resulting junction tree.]
Image from
Sam Roweis
8
* Lauritzen 96
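A minimal sketch of step 1, moralization; the small DAG below is hypothetical (the slide's X1–X6 figure is not reproduced here), and the code simply marries each node's parents and drops the arrow directions.

```python
from itertools import combinations

# Hypothetical DAG given as node -> set of parents (not the slide's exact figure).
parents = {
    "X1": set(), "X2": set(),
    "X3": {"X1", "X2"},
    "X4": {"X2"},
    "X5": {"X3"},
    "X6": {"X3", "X4"},
}

# Moralize: keep every parent-child edge (undirected) and "marry" co-parents.
edges = set()
for child, ps in parents.items():
    for p in ps:
        edges.add(frozenset((p, child)))
    for a, b in combinations(sorted(ps), 2):
        edges.add(frozenset((a, b)))

print(sorted(tuple(sorted(e)) for e in edges))
```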
Global message passing: Two-pass
• Select one clique as the root
• Two pass message passing: first collect
evidence, then distribute evidence.
[Figure: collect pass toward the root, then distribute pass away from the root.]
9
Figure from P. Green
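A sketch of the two-pass schedule on a hypothetical clique tree: a post-order traversal collects messages toward the chosen root, then a pre-order traversal distributes them back out. Only the message ordering is shown, not the potential arithmetic.

```python
# Hypothetical clique tree as an adjacency list rooted at "root".
tree = {"root": ["c1", "c2"], "c1": ["c3", "c4"], "c2": [], "c3": [], "c4": []}

def collect(node):
    """Post-order pass: each clique sends its message toward the root."""
    order = []
    for child in tree[node]:
        order += collect(child)
        order.append((child, node))
    return order

def distribute(node):
    """Pre-order pass: the root sends messages back out to the leaves."""
    order = []
    for child in tree[node]:
        order.append((node, child))
        order += distribute(child)
    return order

print("collect:   ", collect("root"))
print("distribute:", distribute("root"))
```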
Junction Tree Inference
Image from
Cecil Huang
10
Global message passing:
Parallel, distributed version
• All nodes can send messages out simultaneously, once they have received the messages from all their parents
• Parallel processing (topology-level parallelism).
[Figure: Stage 1 and Stage 2 of parallel message passing among X1–X4.]
11
Details
Junction Tree Algorithm
12
Junction Tree Properties
Graph: a directed graph on nodes a, b, c, d, e; its moralized, triangulated version; and the resulting junction tree with clusters {a,b,c}, {c,d}, {c,e} joined through separators {c}.
A junction tree is an undirected graph whose vertices (clusters) are sets of variables, with three properties:
1. Singly connected property (only one path between any two clusters)
2. Potential property (all variables are represented)
3. Running intersection property (if a variable is in two clusters, all clusters on the path between them contain that variable)
A collect and a distribute pass are necessary for inference.
p(a, b, c, d, e) = (1/Z) ψ(a, b, c) ψ(c, d) ψ(c, e)
13
Junction Tree 1
Image from
Sam Roweis
14
Junction Tree 2
Image from
Sam Roweis
15
Message Passing in Junction Tree
• Potential
  – Ω_U, the space of U (a subset of the set of all nodes/vertices V), is the Cartesian product of the state sets of the nodes of U
  – A discrete potential on U is a mapping from Ω_U to the non-negative real numbers
  – Each clique and separator in the junction tree has a potential (actually a marginalized joint distribution on the nodes in the clique/separator)
• Propagation/message passing between two adjacent cliques C1, C2 (S0 is their separator)
  – Marginalize C1's potential to get a new potential for S0:
    φ*_{S0} = Σ_{C1 \ S0} φ_{C1}
  – Update C2's potential:
    φ*_{C2} = φ_{C2} · φ*_{S0} / φ_{S0}
  – Update S0's potential to its new potential
16
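A small NumPy sketch of the update above for two cliques C1 = {A, B} and C2 = {B, C} with separator S0 = {B}; the potential tables are arbitrary placeholders.

```python
import numpy as np

# Arbitrary placeholder potentials: phi_C1 over (A, B), phi_C2 over (B, C),
# separator S0 = {B}.
phi_C1 = np.array([[0.3, 0.7],
                   [0.6, 0.4]])      # axis 0 = A, axis 1 = B
phi_C2 = np.array([[0.5, 0.5],
                   [0.9, 0.1]])      # axis 0 = B, axis 1 = C
phi_S0 = np.ones(2)                  # initial separator potential over B

# 1. Marginalize C1's potential onto the separator: phi*_S0 = sum over C1\S0.
phi_S0_new = phi_C1.sum(axis=0)      # sum out A

# 2. Update C2's potential: phi*_C2 = phi_C2 * (phi*_S0 / phi_S0).
phi_C2 = phi_C2 * (phi_S0_new / phi_S0)[:, None]   # broadcast over the C axis

# 3. The separator keeps its new potential.
phi_S0 = phi_S0_new
print(phi_S0)
print(phi_C2)
```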
Message Passing General
• BayesNet forms a tree
  – Pearl's algorithm is message passing first out and then back in from a given node
• Not a tree (has loops)
  – Turn loops into cliques until the net is a tree, then use Pearl's algorithm
• Cliques turn out to be too big
  – Exact computation is exponential in the size of the largest cliques
  – Use approximation algorithms (many)
17
Towards Decisions
18
From Bayes’ Net to
Decision/Influence Network
Start out with a causal Bayesian Network; in this case, possible causes of leaf loss in an apple tree. We want to know what to do about this.
We duplicate the network because we are going to add an intervention: treating sickness.
The intervention will cost us, but might help with our utility: making a profit when we harvest.
Given the cost, we can now infer the optimal treat/no-treat policy.
19
Influence Example
[Figure: inference in the influence net. Observing no fever makes a cold less likely; no fever with a runny nose suggests an allergy, so treat; no fever and no runny nose suggests healthy, so don't treat.]
Replicate the cold net and add decision and cost/utility nodes.
20
General
21
Probabilistic graphical models
Probabilistic models ⊃ graphical models:
• Directed (Bayesian belief nets): Alarm network, state-space models, HMMs, naïve Bayes classifier, PCA/ICA
• Undirected (Markov nets): Markov random field, Boltzmann machine, Ising model, max-ent model, log-linear models
22
Graphical Models Taxonomy
23
Typical forms for the Conditional
Probability Distributions (CPDs)
at graph nodes
• For discrete-state nodes:
  – Tabular (CPT)
  – Decision tree
  – Deterministic CPD
  – SoftMax (logistic/sigmoid)
  – Noisy-OR
  – MLP
  – SVM?
• For continuous-state nodes:
  – Gaussian
  – Mixture of Gaussians
  – Linear Gaussian
  – Conditional Gaussian
  – Regression tree
24
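As a small illustration of the continuous case, a linear-Gaussian CPD makes the child a Gaussian whose mean is a linear function of its parents; the coefficients below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear-Gaussian CPD: Y | x  ~  N(w.x + b, sigma^2), with made-up parameters.
w = np.array([2.0, -1.0])
b, sigma = 0.5, 0.3

def sample_y(x):
    """Draw one sample of the child Y given its continuous parents x."""
    return rng.normal(w @ x + b, sigma)

print(sample_y(np.array([1.0, 2.0])))
```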
We can’t always compute exact
inference. We then use
Approximate Inference
Avi's categorization for approximate inference algorithms
Approximate inference algorithms
• Approximate computation on the exact model:
  – Sampling methods: importance sampling, MCMC
  – Search methods: beam search, A* search
• Exact computation on an approximate model:
  – Loopy propagation
  – Expectation propagation
  – Variational methods: projection, mean field
  – Minibuckets
  – Boyen-Koller method for DBNs
25
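As one example from the sampling branch, here is a minimal likelihood-weighting (importance-sampling) sketch that estimates P(A=1 | B=1) in a hypothetical two-node network A → B; all probabilities are made up.

```python
import random

# Hypothetical two-node network A -> B with made-up parameters.
P_A1 = 0.3                       # P(A = 1)
P_B1_given_A = {1: 0.8, 0: 0.1}  # P(B = 1 | A)

def likelihood_weighting(n_samples=100_000, evidence_b=1):
    """Estimate P(A=1 | B=evidence_b): sample the unobserved A from its prior
    and weight each sample by the likelihood of the observed evidence."""
    num = den = 0.0
    for _ in range(n_samples):
        a = 1 if random.random() < P_A1 else 0
        p_b1 = P_B1_given_A[a]
        w = p_b1 if evidence_b == 1 else 1.0 - p_b1
        num += w * a
        den += w
    return num / den

# Exact answer: 0.8*0.3 / (0.8*0.3 + 0.1*0.7) ≈ 0.774
print(likelihood_weighting())
```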
Software
Libraries
26
Append A
Bayesian Net Software
[Table: a comparison of Bayesian network software packages — Bassist, BayesiaLab, BNT, BNJ, BUGS, Deal, GDAGsim, Genie, GMRFsim, GMTk, Grappa, Hugin Expert, Hydra, Java Bayes, MIM, MSBNx, Netica, PMT, PNL, Pulcinella, RISO, Tetrad, UnBBayes, Vibes, WinMine, and XBAIES 2.0 — with columns Name, Authors, Src, API, Exec, Free (cost), Inference, and Comments.]
27
Append A
Compare All BayesNet Software
[Slides 28–31: screenshots of the full software comparison table, with key.]
Append C
BN Researchers
MAJOR RESEARCHERS
Microsoft: http://www.research.microsoft.com/research/dtg/
Heckerman & Chickering are big there, currently pushing uses of Dependency Networks
Prof. Russell (Berkeley): http://http.cs.berkeley.edu/~russell/
Wants more expressive probabilistic language. Currently pushing
Center for Intelligent Systems at Berkeley http://www.eecs.berkeley.edu/CIS
Brings together wide range of luminaries
Prof. Jordan (Berkeley): http://www.cs.berkeley.edu/~jordan/
Writing book, Data retrieval, structure learning, clustering. Variational methods, All.
Yair Weiss (Berkeley=>Hebrew U): http://www.cs.berkeley.edu/~yweiss/
Computationally tractable approximation. Vision, now at Hebrew U.
Prof. Koller (Stanford): http://robotics.stanford.edu/~koller/courses.html
Writing book, probabilistic relational models (PRMs) more expressive languages, All.
Prof. Frey (Waterloo): http://www.cs.toronto.edu/~frey/
Vision models, machine learning reformulations
Prof. Pearl (UCLA): http://bayes.cs.ucla.edu/jp_home.html
Founder. Causality theory
Bill Freeman (MIT, was MERL, Learning, vision): http://www.ai.mit.edu/people/wtf/ Low level vision, learning theory now at MIT
Peter Spirtes (CMU, Tetrad project): http://hss.cmu.edu/HTML/departments/philosophy/people/directory/Peter_Spirtes.html
Kevin Murphy (MIT, BN Toolkit): http://www.ai.mit.edu/~murphyk/
Toolboxes (BNT), computational speedups, tutorials
Jonathan Yedidia (MERL): http://www.merl.com/people/yedidia/
Learning theory
Pietro Perona (CalTech): http://www.erc.caltech.edu/
Vision
Center for NeuroMorphic information http://www.erc.caltech.edu/
Brings together machine learning, BN, vision, design etc
Ron Parr (Duke University) http://www.cs.duke.edu/~parr/
Game theory, reinforcement, multi-agent
Nir Friedman (Hebrew U): http://www.cs.huji.ac.il/~nirf/
Computational biology, efficient inference
Avi Pfeffer (Harvard): http://www.eecs.harvard.edu/~avi/
Richer probabilistic expressibility, intelligent systems
Zoubin Ghahramani (Gatsby Institute, London): http://www.gatsby.ucl.ac.uk/~zoubin Variational Bayes
Finn Jensen, (Hugin, Denmark): http://www.cs.auc.dk/~fvj
Classical (expert-system style) BNs
Uffe Kjaerulff, (Hugin, Denmark): http://www.cs.auc.dk/~uk
Ditto
Eric Horvitz, (Microsoft): http://research.microsoft.com/~horvitz/
Decision making, user interface
Tommi Jaakkola, (MIT): http://www.ai.mit.edu/people/tommi/tommi.html
Theory, structure learning from bio data
Ross Shachter, (Stanford): http://www.stanford.edu/dept/MSandE/faculty/shachter/
Influence diagrams
David Spiegelhalter, (Univ. College London): http://www.mrc-bsu.cam.ac.uk/BSUsite/AboutUs/People/davids.shtml
Bayesian and medical BNs
Steffen Lauritzen, (Europe): http://www.math.auc.dk/~steffen/
Statistical theory
Phil Dawid, (Univ College London): http://www.ucl.ac.uk/~ucak06d/
Statistical theory
Kathy Laskey, (George Mason): http://www.ucl.ac.uk/~ucak06d/
Object-oriented BNs, military applications
Jeff Bilmes, (U Washington): http://www.ee.washington.edu/faculty/bilmes/
DBNs for speech
Hagai Attias, (Microsoft): http://research.microsoft.com/users/hagaia/
Variational and sampling for (acoustic) signal processing
World wide list of Bayesians (not just networks):
http://bayes.stat.washington.edu/bayes_people.html
CONFERENCES
UAI:
http://robotics.stanford.edu/~uai01/
NIPS:
http://www.cs.cmu.edu/Groups/NIPS/
32
Append C
Present Library: PNL vs. Other Graphical Models Libraries
[Table: compares PNL (Intel, C++), BNT (Murphy, Matlab), GMTk (Bilmes, C++), Hugin (Hugin), BUGS (MRC), Genie (U. Pitt.), MSBN (Microsoft), WinMine (Microsoft), and JavaBayes (Cozman, Java) on columns Name, Author, Src, Cost, GUI, Un/directed, Utility, DBN, Gauss, Inference, and Learning (Param, Struct).]
Intel's library (PNL) is much more comprehensive.
33
Examples of Use
Applications
34
Face Modeling and Recognition Using Bayesian
Networks
Gang Song*, Tao Wang, Yimin Zhang, Wei Hu, Guangyou Xu*, Gary Bradski
System:
• Face feature finder (separate)
• Learn a Gabor filter “jet” at each point
• Add a pose switching variable
35
Face Modeling and Recognition Using Bayesian
Networks
Gang Song*, Tao Wang, Yimin Zhang, Wei Hu, Guangyou Xu*, Gary Bradski
Results (pose and recognition):
• BNPFR – Bayes net with pose
• BNFR – Bayes net w/o pose
• EHMM – Embedded HMM
• EGM – Gabor jets
36
The Segmentation Problem
Searching over all possible joint configurations J is computationally impractical.
Therefore, segmentation takes place in two stages. First, we segment the head and
torso, and determine the position of the neck. Then, we jointly segment the upper
arms, forearms and hands, and determine the position of the remaining joints.
Step I (head & torso), Step II (arms):
argmax_{J,Q} P(O_F | J, Q) ≈ argmax_{J,Q} Π_{(i,j)∈Q_HT} P(O_ij | q_ij, J_HT) · Π_{(i,j)∈Q_A} P(O_ij | q_ij, J_A)
where Q_A, Q_HT are the state assignments for the arm and head&torso regions, and J_A, J_HT are the joints for the arm and head&torso components.
37
Upper Body Model
P(O_F) = Π_{(i,j)∈F} { P(O_ij) u_ij } = Π_{(i,j)∈F} { Σ_{J, q_ij, C} P(O_ij | q_ij, J, A) P(q_ij | J, A) P(J | A) P(A) u_ij }
[Figure: upper body Bayes net. Anthropological measurements A (hand size Sh, forearm size Sf, upper-arm size Sa, head size Shd, torso size St) connect to the joints J (left/right shoulder Sl/Sr, elbow El/Er, wrist Wl/Wr, neck N), which connect to the body components C (left/right upper arm Ul/Ur, forearm Fl/Fr, hand Hl/Hr, head H, torso T), which generate the observations O_ij.]
38
Body Tracking Results
39
Audio-Visual Continuous Speech
Recognition. The Overall System
[Figure: overall system. The video signal feeds face detection → mouth detection → mouth tracking → visual features; the audio signal feeds acoustic features (MFCC); both streams feed the AV model for training and recognition.]
40
Speaker Independent AVCSR
AV Speech Reco
Audio observations of size 13, modeled with 3 states, 32 mixtures/state, and diagonal covariance matrices (39 English phonemes).
Visual observations of size 13, modeled with 3 states, 12 mixtures/state, and diagonal covariance matrices (13 English visemes).
A coupled HMM for audio-visual speech recognition.
41
AVCSR Experimental Results
• WER obtained on the XM2VTS database, 300 speakers, 10-digit enumeration sentences.
The system improves the recognition rate of acoustic-only speech recognition by over 55% at 0 dB SNR!
42
MRFs for Hyper-Resolution
Bill Freeman (MIT AI Lab) created a simple model of early visual processing: he presented blurred images and trained on the sharp originals, then tested on new images.
[Figure: comparison panels – Input, Cubic Spline, Bayesian Net, Actual.]
43
MRFs for Shape from Shading
The illumination, which changes with each frame, is factored from the reflectance, which stays the same.
[Figure: frames over time, illumination vs. reflectance.]
This model is then used to insert graphics with proper lighting.
44
Blei, Jordan, Malik
[Slides 45–48: example images.]
Example of learned models
(from Frey)
49
Example of learned models
(from Frey)
50