
Markov Random Fields
• Allows rich probabilistic models for images.
• But built in a local, modular way: learn local relationships, get global effects out.
MRF nodes as pixels
Winkler, 1995, p. 32
MRF nodes as patches
[Figure: image patches (observed) and scene patches (hidden); Φ(x_i, y_i) ties each scene patch to its image patch, Ψ(x_i, x_j) ties neighboring scene patches.]
Network joint probability

P(x, y) = \frac{1}{Z} \prod_{i,j} \Psi(x_i, x_j) \prod_i \Phi(x_i, y_i)

where x is the scene and y is the image; Ψ is the scene-scene compatibility function over neighboring scene nodes, and Φ is the image-scene compatibility function over the local observations.
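To make the factorized joint concrete, here is a minimal sketch (my own illustration, not the course code) that evaluates P(x, y) for a tiny three-node chain by brute force; the compatibility tables phi and psi are made-up placeholders.

```python
import numpy as np
from itertools import product

# Toy 3-node chain with binary hidden states; phi and psi are made-up random tables.
np.random.seed(0)
n_states = 2
phi = [np.random.rand(n_states) for _ in range(3)]            # Phi(x_i, y_i) for fixed observed y_i
psi = [np.random.rand(n_states, n_states) for _ in range(2)]  # Psi(x_i, x_{i+1}) for each chain edge

def unnormalized(x):
    """Product of all compatibility functions for a joint state x = (x1, x2, x3)."""
    p = 1.0
    for i in range(3):
        p *= phi[i][x[i]]
    for i in range(2):
        p *= psi[i][x[i], x[i + 1]]
    return p

# Z sums the unnormalized score over every joint state so that P sums to 1.
Z = sum(unnormalized(x) for x in product(range(n_states), repeat=3))
P = {x: unnormalized(x) / Z for x in product(range(n_states), repeat=3)}
print(max(P, key=P.get), "is the most probable joint state")
```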
In order to use MRFs:
• Given the observations y and the parameters of the MRF, how do we infer the hidden variables x?
• How do we learn the parameters of the MRF?
Outline of MRF section
• Inference in MRF’s.
– Gibbs sampling, simulated annealing
– Iterated conditional modes (ICM)
– Variational methods
– Belief propagation
– Graph cuts
• Vision applications of inference in MRF’s.
• Learning MRF parameters.
– Iterative proportional fitting (IPF)
Gibbs Sampling and Simulated
Annealing
• Gibbs sampling:
– A way to generate random samples from a (potentially
very complicated) probability distribution.
• Simulated annealing:
– A schedule for modifying the probability distribution so
that, at “zero temperature”, you draw samples only
from the MAP solution.
Reference: Geman and Geman, IEEE PAMI 1984.
Sampling from a 1-d function
1. Discretize the density function: f(x) → f(k).
2. Compute the distribution function F(k) from the density function f(k).
3. Sampling:
   draw a ~ U(0,1);
   for k = 1 to n
     if F(k) ≥ a, break;
   x = x_0 + k Δ   (map the bin index k back to a value of x)
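A minimal numpy sketch of the discretize-then-invert recipe above (my own illustration, not the course code): build the discrete CDF F(k) and draw samples by finding the first bin where F(k) ≥ a.

```python
import numpy as np

def sample_1d(f, x0, dx, n_samples=5, n_bins=1000, rng=np.random.default_rng(0)):
    """Draw samples from an unnormalized 1-d density f by discretizing and inverting its CDF."""
    xs = x0 + dx * np.arange(n_bins)          # 1. discretize the density
    fk = f(xs)
    Fk = np.cumsum(fk) / np.sum(fk)           # 2. distribution function from the density
    a = rng.uniform(size=n_samples)           # 3. draw a ~ U(0,1) and invert the CDF:
    ks = np.searchsorted(Fk, a)               #    first bin k with F(k) >= a
    return x0 + dx * ks

print(sample_1d(lambda x: np.exp(-0.5 * x**2), x0=-5.0, dx=0.01))
```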
Gibbs Sampling
x1(t 1) ~  ( x1 | x2(t ) , x3(t ) ,, xK(t ) )
x2(t 1) ~ π ( x2 | x1(t 1) , x3(t ) , , xK(t ) )
xK( t 1) ~  ( xK | x1( t 1) ,, xK( t 11) )
x2
Slide by Ce Liu
x1
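As a concrete instance of these conditional draws (an illustrative sketch of mine, not the course code), here is a Gibbs sampler for a small binary Ising-style MRF on a grid; each variable is resampled from its conditional given the current values of its neighbors.

```python
import numpy as np

def gibbs_sweep(x, J, h, T, rng):
    """One Gibbs sweep over a grid of +/-1 spins with coupling J and local field h, at temperature T."""
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            # Sum of the four neighbors (missing border neighbors contribute 0).
            nb = sum(x[a, b] for a, b in [(i-1, j), (i+1, j), (i, j-1), (i, j+1)]
                     if 0 <= a < H and 0 <= b < W)
            # Conditional P(x_ij = +1 | neighbors) for the Ising energy -J*x*nb - h*x.
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * (J * nb + h[i, j]) / T))
            x[i, j] = 1 if rng.random() < p_plus else -1
    return x

rng = np.random.default_rng(0)
h = rng.normal(size=(16, 16))                     # made-up local evidence
x = rng.choice([-1, 1], size=(16, 16))
for _ in range(50):
    x = gibbs_sweep(x, J=0.8, h=h, T=1.0, rng=rng)
```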
Gibbs sampling and simulated annealing

In simulated annealing, you gradually lower the “temperature” of the probability distribution, ultimately giving zero probability to all but the MAP estimate.

What’s good about it: it finds the global MAP solution.
What’s bad about it: it takes forever. Gibbs sampling is in the inner loop…
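A self-contained sketch of the annealing schedule (mine, on a 1-d chain of ±1 spins with made-up local evidence): Gibbs sweeps form the inner loop, and the temperature is lowered geometrically outside it.

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=50)                      # made-up local evidence on a 1-d chain of +/-1 spins
x = rng.choice([-1, 1], size=50)
J = 0.8

def gibbs_sweep(x, T):
    """One Gibbs sweep of the chain at temperature T (Ising energy -J*x_i*x_{i+1} - h_i*x_i)."""
    for i in range(len(x)):
        nb = (x[i - 1] if i > 0 else 0) + (x[i + 1] if i < len(x) - 1 else 0)
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * (J * nb + h[i]) / T))
        x[i] = 1 if rng.random() < p_plus else -1
    return x

# Annealing: lower the temperature geometrically; Gibbs sampling is the inner loop.
T = 4.0
while T > 0.05:
    for _ in range(5):
        x = gibbs_sweep(x, T)
    T *= 0.9
# As T -> 0, the samples concentrate on (near-)MAP configurations of x.
```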
Iterated conditional modes
• For each node:
– Condition on all the neighbors
– Find the mode
– Repeat.
Described in: Winkler, 1995. Introduced by Besag in 1986.
Winkler, 1995
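A minimal ICM sketch (my own illustration) for the same kind of ±1 grid MRF: each node is set to the mode of its conditional given its neighbors, and sweeps repeat until nothing changes.

```python
import numpy as np

def icm(x, J, h, n_sweeps=20):
    """Iterated conditional modes on a grid of +/-1 labels: greedy coordinate-wise energy descent."""
    H, W = x.shape
    for _ in range(n_sweeps):
        changed = False
        for i in range(H):
            for j in range(W):
                nb = sum(x[a, b] for a, b in [(i-1, j), (i+1, j), (i, j-1), (i, j+1)]
                         if 0 <= a < H and 0 <= b < W)
                # Mode of the conditional = label with lower local energy -J*x*nb - h*x.
                best = 1 if (J * nb + h[i, j]) > 0 else -1
                if best != x[i, j]:
                    x[i, j] = best
                    changed = True
        if not changed:     # converged: no node wants to change given its neighbors
            break
    return x

rng = np.random.default_rng(0)
h = rng.normal(size=(16, 16))                 # made-up local evidence
x = icm(rng.choice([-1, 1], size=(16, 16)), J=0.8, h=h)
```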
Variational methods
• Reference: Tommi Jaakkola’s tutorial on
variational methods,
http://www.ai.mit.edu/people/tommi/
• Example: mean field
– For each node
• Calculate the expected value of the node,
conditioned on the mean values of the neighbors.
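An illustrative mean-field sketch (mine, not from Jaakkola's tutorial) for a ±1 grid model: each node's expected value is updated from the current mean values of its neighbors until the updates settle.

```python
import numpy as np

def mean_field(h, J=0.8, n_iters=100):
    """Mean-field updates for a grid of +/-1 variables: m_i <- tanh(J * sum of neighbor means + h_i)."""
    m = np.zeros_like(h)                      # expected value of each node, initialized to 0
    for _ in range(n_iters):
        # Sum of neighbor means, with zero padding at the borders.
        nb = np.zeros_like(m)
        nb[1:, :] += m[:-1, :]
        nb[:-1, :] += m[1:, :]
        nb[:, 1:] += m[:, :-1]
        nb[:, :-1] += m[:, 1:]
        m = np.tanh(J * nb + h)               # expected value given the neighbors' current means
    return m

rng = np.random.default_rng(0)
m = mean_field(rng.normal(size=(16, 16)))     # made-up local evidence
```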
Outline of MRF section
• Inference in MRF’s.
– Gibbs sampling, simulated annealing
– Iterated conditional modes (ICM)
– Variational methods
– Belief propagation
– Graph cuts
• Vision applications of inference in MRF’s.
• Learning MRF parameters.
– Iterative proportional fitting (IPF)
Derivation of belief propagation

[Figure: three-node chain; each hidden node x_1, x_2, x_3 is tied to its observation y_1, y_2, y_3 by Φ(x_i, y_i), and neighboring hidden nodes are tied by Ψ(x_1, x_2) and Ψ(x_2, x_3).]

x_1^{MMSE} = \mathrm{mean}_{x_1} \sum_{x_2} \sum_{x_3} P(x_1, x_2, x_3, y_1, y_2, y_3)
The posterior factorizes

x_1^{MMSE} = \mathrm{mean}_{x_1} \sum_{x_2} \sum_{x_3} P(x_1, x_2, x_3, y_1, y_2, y_3)
          = \mathrm{mean}_{x_1} \sum_{x_2} \sum_{x_3} \Phi(x_1, y_1)\, \Phi(x_2, y_2)\, \Psi(x_1, x_2)\, \Phi(x_3, y_3)\, \Psi(x_2, x_3)
          = \mathrm{mean}_{x_1}\, \Phi(x_1, y_1) \sum_{x_2} \Phi(x_2, y_2)\, \Psi(x_1, x_2) \sum_{x_3} \Phi(x_3, y_3)\, \Psi(x_2, x_3)

[Figure: the same three-node chain with its Φ and Ψ factors.]
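A small numeric check (my own, with made-up compatibility tables) that pushing the sums inside the products, as above, gives the same posterior mean for x_1 as brute-force summation over x_2 and x_3.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n = 3                                               # states per hidden node
phi = [rng.random(n) for _ in range(3)]             # Phi(x_i, y_i) for fixed observations y_i
psi12, psi23 = rng.random((n, n)), rng.random((n, n))

# Brute force: sum the full joint over x2 and x3, then take the posterior mean of x1.
joint1 = np.zeros(n)
for x1, x2, x3 in product(range(n), repeat=3):
    joint1[x1] += phi[0][x1] * phi[1][x2] * psi12[x1, x2] * phi[2][x3] * psi23[x2, x3]
mmse_brute = np.dot(np.arange(n), joint1 / joint1.sum())

# Factored form: do the sum over x3 locally, then the sum over x2.
m32 = psi23 @ phi[2]                                # sum over x3 of Phi(x3,y3) Psi(x2,x3), per x2
m21 = psi12 @ (phi[1] * m32)                        # sum over x2 of Phi(x2,y2) Psi(x1,x2) m32(x2), per x1
b1 = phi[0] * m21
mmse_factored = np.dot(np.arange(n), b1 / b1.sum())

print(np.isclose(mmse_brute, mmse_factored))        # True
```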
Propagation rules

x_1^{MMSE} = \mathrm{mean}_{x_1} \sum_{x_2} \sum_{x_3} P(x_1, x_2, x_3, y_1, y_2, y_3)
          = \mathrm{mean}_{x_1} \sum_{x_2} \sum_{x_3} \Phi(x_1, y_1)\, \Phi(x_2, y_2)\, \Psi(x_1, x_2)\, \Phi(x_3, y_3)\, \Psi(x_2, x_3)
          = \mathrm{mean}_{x_1}\, \Phi(x_1, y_1) \sum_{x_2} \Phi(x_2, y_2)\, \Psi(x_1, x_2) \sum_{x_3} \Phi(x_3, y_3)\, \Psi(x_2, x_3)

[Figure: the same three-node chain with its Φ and Ψ factors.]
Propagation rules

x_1^{MMSE} = \mathrm{mean}_{x_1}\, \Phi(x_1, y_1) \sum_{x_2} \Phi(x_2, y_2)\, \Psi(x_1, x_2) \sum_{x_3} \Phi(x_3, y_3)\, \Psi(x_2, x_3)

M_{2 \to 1}(x_1) = \sum_{x_2} \Psi(x_1, x_2)\, \Phi(x_2, y_2)\, M_{3 \to 2}(x_2)

[Figure: the same three-node chain; the bracketed sums are the messages passed along it.]
Belief, and message updates

b_j(x_j) = \prod_{k \in N(j)} M_{k \to j}(x_j)

M_{j \to i}(x_i) = \sum_{x_j} \Psi_{ij}(x_i, x_j) \prod_{k \in N(j) \setminus i} M_{k \to j}(x_j)

[Figure: node j collects the messages from its neighbors k ∈ N(j)\i and sends the combined message to node i.]
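Here is a compact sum-product sketch (my own illustration, not the course code) for a pairwise MRF given as node evidence vectors and edge compatibility matrices. The local evidence Φ_j is folded directly into each outgoing message, matching the chain derivation above; on a chain or tree the resulting beliefs are the exact marginals.

```python
import numpy as np

def sum_product(phi, psi, edges, n_iters=10):
    """Synchronous sum-product BP for a pairwise MRF.
    phi[i]: local evidence vector for node i; psi[(i, j)]: compatibility matrix (states_i x states_j);
    edges: list of undirected pairs (i, j). Exact on chains/trees once messages have propagated."""
    msgs = {}                                 # msgs[(a, b)] = message from node a to node b
    nbrs = {i: [] for i in range(len(phi))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
        msgs[(i, j)] = np.ones(len(phi[j]))
        msgs[(j, i)] = np.ones(len(phi[i]))
    for _ in range(n_iters):
        new = {}
        for (j, i) in msgs:
            # M_{j->i}(x_i) = sum_{x_j} Psi(x_i, x_j) Phi(x_j) prod_{k in N(j)\i} M_{k->j}(x_j)
            prod = phi[j].copy()
            for k in nbrs[j]:
                if k != i:
                    prod = prod * msgs[(k, j)]
            A = psi[(i, j)] if (i, j) in psi else psi[(j, i)].T   # orient Psi as (x_i, x_j)
            m = A @ prod
            new[(j, i)] = m / m.sum()
        msgs = new
    beliefs = []
    for i in range(len(phi)):
        b = phi[i].copy()
        for k in nbrs[i]:
            b = b * msgs[(k, i)]
        beliefs.append(b / b.sum())
    return beliefs

# Three-node chain with made-up tables; beliefs match the brute-force marginals on a tree.
rng = np.random.default_rng(0)
phi = [rng.random(2) for _ in range(3)]
psi = {(0, 1): rng.random((2, 2)), (1, 2): rng.random((2, 2))}
print(sum_product(phi, psi, edges=[(0, 1), (1, 2)]))
```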
Optimal solution in a chain or tree:
Belief Propagation
• “Do the right thing” Bayesian algorithm.
• For Gaussian random variables over time:
Kalman filter.
• For hidden Markov models:
forward/backward algorithm (and MAP
variant is Viterbi).
No factorization with loops!

x_1^{MMSE} = \mathrm{mean}_{x_1}\, \Phi(x_1, y_1) \sum_{x_2} \Phi(x_2, y_2)\, \Psi(x_1, x_2) \sum_{x_3} \Phi(x_3, y_3)\, \Psi(x_2, x_3)\, \Psi(x_1, x_3)

The added factor Ψ(x_1, x_3) couples x_3 back to x_1, so the sum over x_3 no longer depends only on x_2 and cannot be passed along the chain as a local message.

[Figure: the three-node chain with an extra edge between x_1 and x_3, forming a loop.]
Justification for running belief propagation
in networks with loops
• Experimental results:
– Error-correcting codes Kschischang and Frey, 1998;
McEliece et al., 1998
– Vision applications
Freeman and Pasztor, 1999;
Frey, 2000
• Theoretical results:
– For Gaussian processes, means are correct.
Weiss and Freeman, 1999
– Large neighborhood local maximum for MAP.
Weiss and Freeman, 2000
– Equivalent to Bethe approx. in statistical physics.
Yedidia, Freeman, and Weiss, 2000
– Tree-weighted reparameterization
Wainwright, Willsky, Jaakkola, 2001
Statistical mechanics interpretation

Free energy: F = U - TS

U = average energy = \sum_{\text{states}} p(x_1, x_2, \ldots)\, E(x_1, x_2, \ldots)
T = temperature
S = entropy = -\sum_{\text{states}} p(x_1, x_2, \ldots)\, \ln p(x_1, x_2, \ldots)
Free energy formulation

Defining

\Psi_{ij}(x_i, x_j) = e^{-E(x_i, x_j)/T}, \qquad \Phi_i(x_i) = e^{-E(x_i)/T},

then the probability distribution P(x_1, x_2, \ldots) that minimizes the free energy is precisely the true probability of the Markov network,

P(x_1, x_2, \ldots) = \frac{1}{Z} \prod_{ij} \Psi_{ij}(x_i, x_j) \prod_i \Phi_i(x_i)
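To fill in the step the slide asserts, here is the standard one-line variational argument (my own addition, not from the slide) that the Boltzmann-form distribution minimizes the free energy:

```latex
% Minimize F[p] = U - TS over distributions p, subject to normalization:
\min_{p}\; F[p] = \sum_{x} p(x)\,E(x) \;+\; T \sum_{x} p(x)\ln p(x)
\quad\text{subject to}\quad \sum_{x} p(x) = 1 .
% Setting the derivative of the Lagrangian to zero for each state x:
E(x) + T\bigl(\ln p(x) + 1\bigr) + \gamma = 0
\;\;\Longrightarrow\;\;
p(x) = \frac{1}{Z}\, e^{-E(x)/T},
% and when E(x) is a sum of pairwise and single-node energies, e^{-E(x)/T} factorizes into
% \prod_{ij}\Psi_{ij}(x_i,x_j)\,\prod_i \Phi_i(x_i), the Markov-network form above.
```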
Approximating the Free Energy

Exact: F[p(x_1, x_2, \ldots, x_N)]
Mean Field Theory: F[b_i(x_i)]
Bethe Approximation: F[b_i(x_i), b_{ij}(x_i, x_j)]
Kikuchi Approximations: F[b_i(x_i), b_{ij}(x_i, x_j), b_{ijk}(x_i, x_j, x_k), \ldots]
Bethe Approximation

On tree-like lattices, exact formula:

p(x_1, x_2, \ldots, x_N) = \prod_{(ij)} p_{ij}(x_i, x_j) \prod_i [p_i(x_i)]^{1 - q_i}

F_{\text{Bethe}}(b_i, b_{ij}) = \sum_{(ij)} \sum_{x_i, x_j} b_{ij}(x_i, x_j)\big(E_{ij}(x_i, x_j) + T \ln b_{ij}(x_i, x_j)\big)
  + \sum_i (1 - q_i) \sum_{x_i} b_i(x_i)\big(E_i(x_i) + T \ln b_i(x_i)\big)
Gibbs Free Energy

F_{\text{Bethe}}(b_i, b_{ij})
  + \sum_{(ij)} \gamma_{ij} \Big\{ \sum_{x_i, x_j} b_{ij}(x_i, x_j) - 1 \Big\}
  + \sum_{(ij)} \sum_{x_j} \lambda_{ij}(x_j) \Big\{ \sum_{x_i} b_{ij}(x_i, x_j) - b_j(x_j) \Big\}
Gibbs Free Energy

F_{\text{Bethe}}(b_i, b_{ij})
  + \sum_{(ij)} \gamma_{ij} \Big\{ \sum_{x_i, x_j} b_{ij}(x_i, x_j) - 1 \Big\}
  + \sum_{(ij)} \sum_{x_j} \lambda_{ij}(x_j) \Big\{ \sum_{x_i} b_{ij}(x_i, x_j) - b_j(x_j) \Big\}

Set the derivative of the Gibbs Free Energy w.r.t. the b_{ij} and b_i terms to zero:

b_{ij}(x_i, x_j) = k\, \Psi_{ij}(x_i, x_j)\, \exp\!\big( (\lambda_{ij}(x_j) + \lambda_{ji}(x_i)) / T \big)

b_i(x_i) = k\, \Phi_i(x_i)\, \exp\!\Big( \sum_{j \in N(i)} \lambda_{ji}(x_i) / T \Big)
Belief Propagation = Bethe

Lagrange multipliers λ_{ij}(x_j) enforce the constraints

b_j(x_j) = \sum_{x_i} b_{ij}(x_i, x_j)

Bethe stationary conditions = message update rules, with

\lambda_{ij}(x_j) = T \ln \prod_{k \in N(j) \setminus i} M_{k \to j}(x_j)
Region marginal probabilities

b_i(x_i) = k\, \Phi(x_i) \prod_{k \in N(i)} M_{k \to i}(x_i)

b_{ij}(x_i, x_j) = k\, \Psi(x_i, x_j) \prod_{k \in N(i) \setminus j} M_{k \to i}(x_i) \prod_{k \in N(j) \setminus i} M_{k \to j}(x_j)
Belief propagation equations

Belief propagation equations come from the marginalization constraints.

M_{j \to i}(x_i) = \sum_{x_j} \Psi_{ij}(x_i, x_j) \prod_{k \in N(j) \setminus i} M_{k \to j}(x_j)

[Figure: node j gathers the messages from its other neighbors and passes the result to node i.]
Results from Bethe free energy analysis
• Fixed points of the belief propagation equations correspond exactly to stationary points of the Bethe
approximation.
• Belief propagation always has a fixed point.
• Connection with variational methods for inference: both
minimize approximations to Free Energy,
– variational: usually use primal variables.
– belief propagation: fixed pt. equs. for dual variables.
• Kikuchi approximations lead to more accurate belief
propagation algorithms.
• Other Bethe free energy minimization algorithms—
Yuille, Welling, etc.
Kikuchi message-update rules

Groups of nodes send messages to other groups of nodes.

Typical choice for Kikuchi cluster.

[Figure: update for single-node messages (nodes i, j) and update for cluster messages (nodes i, j, k, l).]
Generalized belief propagation
Marginal probabilities for nodes in one row
of a 10x10 spin glass
References on BP and GBP
• J. Pearl, 1985
– classic
• Y. Weiss, NIPS 1998
– Inspires application of BP to vision
• W. Freeman et al., Learning Low-Level Vision, IJCV 1999
– Applications in super-resolution, motion, shading/paint
discrimination
• H. Shum et al., ECCV 2002
– Application to stereo
• M. Wainwright, T. Jaakkola, A. Willsky
– Reparameterization version
• J. Yedidia, AAAI 2000
– The clearest place to read about BP and GBP.
Graph cuts
• Algorithm: uses node label swaps or expansions as moves to reduce the energy. Swaps many labels at once, not just one at a time as with ICM.
• Finds which pixel labels to swap using min-cut/max-flow algorithms from network theory.
• Can offer bounds on optimality.
• See Boykov, Veksler, Zabih, IEEE PAMI 23 (11)
Nov. 2001 (available on web).
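To make the min-cut reduction concrete, here is a sketch for the binary case (my own, not the Boykov-Veksler-Zabih code; it assumes the networkx package and a made-up function name binary_graph_cut): unary costs become edges to the source/sink terminals, neighbor penalties become edges between pixels, and the minimum cut recovers the exact MAP labeling.

```python
import numpy as np
import networkx as nx

def binary_graph_cut(unary0, unary1, lam):
    """Exact MAP for a binary MRF: cost = sum_p unary_{x_p}(p) + lam * (# neighbor pairs with different labels)."""
    H, W = unary0.shape
    G = nx.DiGraph()
    for i in range(H):
        for j in range(W):
            p = (i, j)
            G.add_edge('s', p, capacity=float(unary0[i, j]))   # this edge is cut if p takes label 0
            G.add_edge(p, 't', capacity=float(unary1[i, j]))   # this edge is cut if p takes label 1
            for q in [(i + 1, j), (i, j + 1)]:                 # 4-neighborhood, Potts penalty
                if q[0] < H and q[1] < W:
                    G.add_edge(p, q, capacity=lam)
                    G.add_edge(q, p, capacity=lam)
    _, (source_side, _) = nx.minimum_cut(G, 's', 't')
    labels = np.zeros((H, W), dtype=int)
    for i in range(H):
        for j in range(W):
            labels[i, j] = 1 if (i, j) in source_side else 0   # source side <-> label 1
    return labels

# Denoise a made-up binary image: unary cost is the squared distance of the noisy value to each label.
rng = np.random.default_rng(0)
clean = np.zeros((20, 20)); clean[5:15, 5:15] = 1
noisy = np.clip(clean + 0.4 * rng.normal(size=clean.shape), 0, 1)
labels = binary_graph_cut(unary0=noisy**2, unary1=(1 - noisy)**2, lam=0.5)
```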
Comparison of graph cuts and belief
propagation
Comparison of Graph Cuts with Belief
Propagation for Stereo, using Identical
MRF Parameters, ICCV 2003.
Marshall F. Tappen and William T. Freeman
Ground truth, graph cuts, and belief
propagation disparity solution energies
Graph cuts versus belief propagation
• Graph cuts consistently gave slightly lower energy solutions for that stereo-problem MRF, although BP ran faster (and there is now a faster graph cuts implementation than the one we used)…
• However, here’s why I still use Belief
Propagation:
– Works for any compatibility functions, not a restricted
set like graph cuts.
– I find it very intuitive.
– Extensions: sum-product algorithm computes MMSE,
and Generalized Belief Propagation gives you very
accurate solutions, at a cost of time.
MAP versus MMSE
Show program comparing some
methods on a simple MRF
testMRF.m
Outline of MRF section
• Inference in MRF’s.
– Gibbs sampling, simulated annealing
– Iterated conditional modes (ICM)
– Variational methods
– Belief propagation
– Graph cuts
• Vision applications of inference in MRF’s.
• Learning MRF parameters.
– Iterative proportional fitting (IPF)
Vision applications of MRF’s
• Stereo
• Motion estimation
• Labelling shading and reflectance
• Many others…
Vision applications of MRF’s
• Stereo
• Motion estimation
• Labelling shading and reflectance
• Many others…
Motion application
[Figure: MRF with image patches as the observations and scene patches as the hidden motion, as in the patch-based MRF shown earlier.]
What behavior should we see in a
motion algorithm?
• Aperture problem
• Resolution through propagation of
information
• Figure/ground discrimination
The aperture problem
The aperture problem
Program demo
Motion analysis: related work
• Markov network
– Luettgen, Karl, Willsky and collaborators.
• Neural network or learning-based
– Nowlan & T. J. Sejnowski; Sereno.
• Optical flow analysis
– Weiss & Adelson; Darrell & Pentland; Ju,
Black & Jepson; Simoncelli; Grzywacz &
Yuille; Hildreth; Horn & Schunck; etc.
Inference:
Motion estimation results
(maxima of scene probability distributions displayed)
Image data
Iterations 0 and 1
Initial guesses only
show motion at edges.
Motion estimation results
(maxima of scene probability distributions displayed)
Iterations 2 and 3
Figure/ground still
unresolved here.
Motion estimation results
(maxima of scene probability distributions displayed)
Iterations 4 and 5
Final result compares well with vector
quantized true (uniform) velocities.
Vision applications of MRF’s
• Stereo
• Motion estimation
• Labelling shading and reflectance
• Many others…
Forming an Image
Illuminate the surface to get:
Surface (Height Map)
Shading Image
The shading image is the interaction of the shape of the surface and the illumination.
Painting the Surface
[Figure: scene and corresponding image.]
Add a reflectance pattern to the surface. Points inside the squares should reflect less light.
Goal
[Figure: decompose the input image into a shading image and a reflectance image.]
Basic Steps
1. Compute the x and y image derivatives
2. Classify each derivative as being caused by
either shading or a reflectance change
3. Set derivatives with the wrong label to zero.
4. Recover the intrinsic images by finding the least-squares solution of the derivatives (a sparse least-squares sketch follows below).
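Step 4 can be made concrete with a small sparse least-squares sketch (mine, using a simple forward-difference derivative operator and the hypothetical function name recover_from_derivatives, not the authors' code): stack the x and y derivative constraints and solve for the image whose derivatives best match the kept, labeled derivatives.

```python
import numpy as np
from scipy.sparse import lil_matrix, vstack
from scipy.sparse.linalg import lsqr

def recover_from_derivatives(dx, dy, shape):
    """Recover an image (up to an additive constant) whose forward differences best match dx, dy.
    dx[i, j] ~ I(i, j+1) - I(i, j); dy[i, j] ~ I(i+1, j) - I(i, j); shape = (H, W) of the image."""
    H, W = shape
    idx = lambda i, j: i * W + j
    Ax = lil_matrix((H * (W - 1), H * W)); bx = []
    Ay = lil_matrix(((H - 1) * W, H * W)); by = []
    r = 0
    for i in range(H):
        for j in range(W - 1):
            Ax[r, idx(i, j + 1)] = 1.0; Ax[r, idx(i, j)] = -1.0; bx.append(dx[i, j]); r += 1
    r = 0
    for i in range(H - 1):
        for j in range(W):
            Ay[r, idx(i + 1, j)] = 1.0; Ay[r, idx(i, j)] = -1.0; by.append(dy[i, j]); r += 1
    A = vstack([Ax.tocsr(), Ay.tocsr()])
    b = np.concatenate([bx, by])
    return lsqr(A, b)[0].reshape(H, W)   # least-squares solution of the stacked derivative constraints

# Usage sketch: zero out derivatives labeled "reflectance change", keep the rest, and integrate
# to get the shading image; swap the labels to get (the log of) the reflectance image.
```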
Original x derivative image
Classify each derivative
(White is reflectance)
Learning the Classifiers
• Combine multiple classifiers into a strong classifier using AdaBoost (Freund and Schapire)
• Choose weak classifiers greedily, similar to (Tieu and Viola 2000)
• Train on synthetic images
• Assume the light direction is from the right
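As a rough, hypothetical stand-in for this training setup (not the authors' code; it assumes scikit-learn, and the feature vectors here are made-up placeholders for filter responses around each derivative), AdaBoost with its default decision-stump weak learners is fit on synthetic shading and reflectance-change examples.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Made-up placeholder features: in the actual system these would be filter responses
# computed around each derivative in synthetic shading / reflectance-change images.
rng = np.random.default_rng(0)
X_shading = rng.normal(loc=0.0, size=(500, 20))
X_reflect = rng.normal(loc=1.0, size=(500, 20))
X = np.vstack([X_shading, X_reflect])
y = np.array([0] * 500 + [1] * 500)          # 0 = shading, 1 = reflectance change

# AdaBoost combines many weak classifiers (by default, depth-1 decision stumps)
# into a strong classifier, reweighting the training examples after each round.
clf = AdaBoostClassifier(n_estimators=100)
clf.fit(X, y)
print(clf.score(X, y))
```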
Shading Training Set
Reflectance Change Training Set
Using Both Color and
Gray-Scale Information
Results without
considering gray-scale
Some Areas of the Image Are Locally Ambiguous
[Figure: input patch; is the change here better explained as shading or as a reflectance change?]
Propagating Information
• Can disambiguate areas by propagating
information from reliable areas of the image
into ambiguous areas of the image
Propagating Information
• Consider relationship between
neighboring derivatives
• Use Generalized Belief
Propagation to infer labels
Setting Compatibilities
• Set compatibilities according to image contours
– All derivatives along a contour should have the same label
• Derivatives along an image contour strongly influence each other

β = 0.5 … 1.0

\psi(x_i, x_j) = \begin{pmatrix} \beta & 1-\beta \\ 1-\beta & \beta \end{pmatrix}
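A tiny illustrative helper (mine; the function name and the contour-strength input are hypothetical) for building the 2x2 compatibility above: β close to 1.0 for derivative pairs that lie along the same image contour, and close to 0.5 (uninformative) otherwise.

```python
import numpy as np

def edge_compatibility(contour_strength):
    """2x2 compatibility for a pair of neighboring derivative labels (0 = shading, 1 = reflectance).
    contour_strength in [0, 1]: strong shared contour -> beta near 1.0 (labels should agree),
    no shared contour -> beta near 0.5 (no preference)."""
    beta = 0.5 + 0.5 * float(np.clip(contour_strength, 0.0, 1.0))
    return np.array([[beta, 1.0 - beta],
                     [1.0 - beta, beta]])

print(edge_compatibility(0.9))   # strongly prefers matching labels along a contour
```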
Improvements Using Propagation
Input Image
Reflectance Image
Without Propagation
Reflectance Image
With Propagation
(More Results)
Input Image
Shading Image
Reflectance Image
Outline of MRF section
• Inference in MRF’s.
– Gibbs sampling, simulated annealing
– Iterated conditional modes (ICM)
– Variational methods
– Belief propagation
– Graph cuts
• Vision applications of inference in MRF’s.
• Learning MRF parameters.
– Iterative proportional fitting (IPF)
Learning MRF parameters, labeled data
Iterative proportional fitting lets you make a maximum likelihood estimate of a joint distribution from observations of various marginal distributions.
True joint
probability
Observed
marginal
distributions
Initial guess at joint probability
IPF update equation
Scale the previous iteration’s estimate for the joint
probability by the ratio of the true to the predicted
marginals.
Gives gradient ascent in the likelihood of the joint
probability, given the observations of the marginals.
See: Michael Jordan’s book on graphical models
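A minimal IPF sketch (my own, for a 2-D joint with its two 1-D marginals observed): each sweep rescales the current estimate of the joint by the ratio of an observed marginal to the marginal the current joint predicts.

```python
import numpy as np

def ipf(target_marginal_rows, target_marginal_cols, n_iters=50):
    """Iterative proportional fitting of a 2-D joint p(a, b) to observed row and column marginals."""
    p = np.ones((len(target_marginal_rows), len(target_marginal_cols)))
    p /= p.sum()                                        # initial guess: uniform joint
    for _ in range(n_iters):
        # Scale by (true marginal / predicted marginal), one set of marginals at a time.
        p *= (target_marginal_rows / p.sum(axis=1))[:, None]
        p *= (target_marginal_cols / p.sum(axis=0))[None, :]
    return p

rows = np.array([0.2, 0.8])
cols = np.array([0.5, 0.3, 0.2])
p = ipf(rows, cols)
print(p.sum(axis=1), p.sum(axis=0))                     # matches the observed marginals
```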
Convergence to the correct marginals by the IPF algorithm
Convergence to the correct marginals by the IPF algorithm
IPF results for this example:
comparison of joint probabilities
True joint
probability
Initial guess
Final maximum
entropy estimate
Application to MRF parameter estimation
• Can show that for the ML estimate of the clique potentials, ψ_c(x_c), the empirical marginals equal the model marginals.
• This leads to the IPF update rule for ψ_c(x_c), given below.
• Performs coordinate ascent in the likelihood of the MRF parameters, given the observed data.
Reference: unpublished notes by Michael Jordan
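The update rule referred to above has the standard textbook form (written out here for completeness; it is not copied from the slide):

```latex
\psi_c^{(t+1)}(x_c) \;=\; \psi_c^{(t)}(x_c)\,
  \frac{\tilde{p}(x_c)}{p^{(t)}(x_c)} ,
% where \tilde{p}(x_c) is the empirical marginal on clique c from the labeled data and
% p^{(t)}(x_c) is the marginal that the current model assigns to that clique.
```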
More general graphical models than
MRF grids
• In this course, we’ve studied Markov chains, and
Markov random fields, but, of course, many other
structures of probabilistic models are possible and
useful in computer vision.
• For a nice on-line tutorial about Bayes nets, see
Kevin Murphy’s tutorial in his web page.
“Top-down” information: a representation for image context
[Figure: images and their 80-dimensional representation. Credit: Antonio Torralba]
“Bottom-up” information: labeled training data for object recognition.
• Hand-annotated 1200 frames of video from a wearable webcam
• Trained detectors for 9 types of objects: bookshelf, desk, screen (frontal), steps, building facade, etc.
• 100-200 positive patches, > 10,000 negative patches
Combining top-down with bottom-up: graphical model showing assumed statistical relationships between variables
[Figure: graphical model linking visual “gist” observations, the scene category (kitchen, office, lab, conference room, open area, corridor, elevator and street), object classes, particular objects, and local image features.]

Categorization of new places
ICCV 2003 poster by Torralba, Murphy, Freeman, and Rubin
[Figure: per-frame estimates of the specific location, the location category, and the indoor/outdoor label.]
Bottom-up detection: ROC curves
ICCV 2003 poster by Torralba, Murphy, Freeman, and Rubin
Generative/discriminative hybrids
• CMF’s: conditional Markov random fields.
– Used in the text analysis community.
– Used in a vision application by [name?] and Hebert, from CMU, as a poster in ICCV 2003.
• The idea: an ordinary MRF models P(x, y). But you may not care what the distribution of the images, y, is.
• It might be simpler to model P(x|y) with this graphical model. It combines the structured modeling of a generative model with the power of discriminative training.
Conditional Markov Random Fields
Another benefit of CMF’s: you can include long-range dependencies in the model without messing up inference by introducing many new loops.
[Figure: two graphical models over y_1, y_2, y_3 and x_1, x_2, x_3. In the generative MRF there are lots of interdependencies to deal with during inference; in the conditional model there are many fewer interdependencies, because everything is conditioned on the image data.]
2003 ICCV Marr prize winners
• This year, the winners were all in the
subject area of vision and learning
• This is an exciting time to be working on
these problems; researchers are making
progress.
Afternoon companion class
• Learning and vision: discriminative methods,
taught by Paul Viola and Chris Bishop.
Course web page
For powerpoint slides, references, example
code:
www.ai.mit.edu/people/wtf/learningvision
end
Markov chain Monte Carlo
See ICCV 2003 talk:
Image Parsing: Unifying Segmentation,
Detection, and Recognition
Zhuowen Tu, Xiangrong Chen, Alan L. Yuille,
Song-Chun Zhu