Segmentation and Fitting Using Probabilistic Methods

Segmentation and Fitting Using
Probabilistic Methods
Or, How Expectation-Maximization
Can Cure Your Computer Vision
System of Almost Anything
Well… maybe...
Departure Point
• Up to now, most of what we’ve done in the
grouping, segmentation arena has been local.
• Now we want to model things globally, and in
probabilistic terms.
• Explain a large collection of tokens with a few
parameters. (Hmmm… like the Hough transform?)
Missing Data Problems, Fitting,
Segmentation
• Often, if some parameters were known, the
maximum likelihood problem would be easy
– Fitting: If you know which line each token comes
from, getting the parameters is easy
– Segmentation: If you know the segment each pixel
comes from, the segment’s parameters are easily
determined
– Fundamental Matrix: If you know the
correspondences….
Missing Data Problem
• A missing data problem is one where…
– Some terms in a data vector are missing in
some instances, but present in others
– An inference problem can be made simpler by
rewriting it using some variables whose values
are unknown
• Algorithm Concept: Take an expectation
over the missing data
Missing Data Problems
• Strategy
– Estimate values for the missing data
– Plug these in, now estimate parameters
– Re-estimate values for missing data
– Continue to convergence
• For example (see the sketch below)
– Guess a mapping of points to lines
– Fit each line to its points
– Reallocate points to the fitted lines
– Loop to convergence
• Reminiscent of K-means, is it not?
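To make the example concrete, here is a minimal Python sketch of this alternation with hard assignments, in the k-means spirit. It assumes 2D points; the helper names (fit_line_tls, k_lines), the total-least-squares per-line fit, and the random initialization are illustrative choices, not something prescribed by the slides.

```python
import numpy as np

def fit_line_tls(pts):
    """Total-least-squares line fit: returns (unit normal n, offset c) with n.x + c = 0."""
    mean = pts.mean(axis=0)
    # The normal is the direction of least scatter about the mean.
    _, _, vt = np.linalg.svd(pts - mean)
    n = vt[-1]
    return n, -n @ mean

def k_lines(points, k, n_iters=100, seed=0):
    """Alternate between allocating points to lines and refitting (hard assignments)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(points))        # guess a mapping of points to lines
    for _ in range(n_iters):
        lines = []
        for l in range(k):                                # fit each line to its points
            pts = points[labels == l]
            if len(pts) >= 2:
                lines.append(fit_line_tls(pts))
            else:                                         # empty line: re-seed through a random point
                lines.append((np.array([0.0, 1.0]), -points[rng.integers(len(points)), 1]))
        dists = np.stack([np.abs(points @ n + c) for n, c in lines], axis=1)
        new_labels = dists.argmin(axis=1)                 # reallocate points to the fitted lines
        if np.array_equal(new_labels, labels):            # loop to convergence
            break
        labels = new_labels
    return lines, labels
```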
Refining the Strategy
• The problem has parameters to be
estimated, and missing variables (data)
• Iterate to convergence:
– Replace missing data with expected values,
given fixed parameter values
– Fix the missing data, do a maximum likelihood
estimate of the parameters, given that data
Refining the Example
• Allocate each point to a line with a weight
equal to the probability of the point, given
the line’s parameters
• Refit the lines to the weighted set of points
• Converges to local extremum (caution)
• Can be generalized…
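A sketch of the weighted ("soft") refinement, reusing the (normal, offset) line representation from the previous sketch. The helper names, the residual scale sigma, and the Gaussian residual model are assumptions made for illustration.

```python
import numpy as np

def soft_weights(points, lines, sigma=1.0):
    """Weight of point j on line l: Gaussian probability of its perpendicular residual,
    normalized across the lines for each point."""
    dists = np.stack([np.abs(points @ n + c) for n, c in lines], axis=1)
    w = np.exp(-0.5 * (dists / sigma) ** 2)
    return w / w.sum(axis=1, keepdims=True)

def weighted_line_fit(points, w):
    """Refit one line to the weighted set of points (weighted total least squares)."""
    mean = (w[:, None] * points).sum(axis=0) / w.sum()
    diff = points - mean
    scatter = (w[:, None] * diff).T @ diff
    evals, evecs = np.linalg.eigh(scatter)
    n = evecs[:, 0]                    # eigenvector with the smallest eigenvalue is the normal
    return n, -n @ mean
```

Iterating soft_weights and weighted_line_fit over every line is the soft version of the loop above; as the slide cautions, it converges only to a local extremum that depends on the initial guess.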
Image Segmentation
π_l: Probability of choosing segment l at random (a priori)
p(x | θ_l): Conditional density of the feature vector x, given that it comes from segment l, l = 1, …, g
[Figure: an image partitioned into four segments, labeled Segment 1, θ_1 through Segment 4, θ_4]
Model: p(x | θ_l) is Gaussian, with θ_l = (μ_l, Σ_l)
The total density for the feature vector of any pixel drawn at random is
$$p(x) = \sum_l p(x \mid \theta_l)\, \pi_l$$
This is known as a Mixture Model
Mixture Model: Generative
• To produce a pixel (feature vector)
– Pick an image segment l with prior probability π_l
– Draw a sample from p(x | θ_l)
• Density in x space is a set of g Gaussian blobs, one
per segment
• We want to determine
– The parameters of each blob (the μ_l and Σ_l values)
– The mixing weights (the π_l values)
– A mapping of pixels to components (the segmentation)
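The generative story can be written directly as sampling code. A minimal sketch; the function name and the NumPy random-generator choices are mine, not from the slides.

```python
import numpy as np

def sample_mixture(pis, mus, sigmas, n, seed=0):
    """Draw n feature vectors: pick segment l with prior pi_l, then sample from p(x | theta_l)."""
    rng = np.random.default_rng(seed)
    labels = rng.choice(len(pis), size=n, p=pis)                 # choose a segment per pixel
    x = np.stack([rng.multivariate_normal(mus[l], sigmas[l]) for l in labels])
    return x, labels
```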
Package all these things into a parameter vector:
$$\Theta = (\pi_1, \pi_2, \ldots, \pi_g, \theta_1, \theta_2, \ldots, \theta_g)$$
(first the mixing weights, then the blob parameters)
The mixture model becomes:
$$p(x \mid \Theta) = \sum_{l=1}^{g} \pi_l \, p_l(x \mid \theta_l)$$
With each component a multivariate Gaussian:
$$p_l(x \mid \theta_l) = \frac{1}{(2\pi)^{d/2}\, \det(\Sigma_l)^{1/2}} \exp\!\left(-\tfrac{1}{2}\,(x - \mu_l)^T \Sigma_l^{-1} (x - \mu_l)\right)$$
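For reference, a direct sketch of evaluating these two densities; gaussian_density and mixture_density are hypothetical names for illustration.

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """Multivariate normal p_l(x | theta_l), with theta_l = (mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff)) / norm

def mixture_density(x, pis, mus, sigmas):
    """p(x | Theta) = sum_l pi_l * p_l(x | theta_l)."""
    return sum(pi * gaussian_density(x, mu, sigma)
               for pi, mu, sigma in zip(pis, mus, sigmas))
```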
The Chicken and the Egg
• If we knew which pixel belonged to which
component, estimating Θ would be straightforward:
– Use Max Likelihood estimates for each θ_l
– Fraction of image in each component gives π_l
• If we knew Θ, then
– For each pixel, assign it to its most likely blob
• Unfortunately, we know neither
• That’s where Expectation-Maximization (EM)
comes in; iterate guesses until convergence
Formal Statement of Missing
Data Problems
X: the complete data space
f: the map from the complete data space to the incomplete data space
Y: the incomplete data space, Y = f(X)

Image segmentation: the complete data are the measurements at each pixel together with the set of variables matching pixels to mixture components; the incomplete data are just the measurements at each pixel.
Line fitting: the complete data are the measurements at each token together with the mapping of tokens to lines; the incomplete data are just the measurements at each token.
Missing, Formally
U: the parameter space, containing the mixing weights and the parameters (mean, covariance) of each mixture component (the parameters of each line, in the line-fitting example)
We want to obtain a maximum-likelihood estimate of
these parameters given incomplete data. If we had
complete data, then we could use the joint density function
for the complete data space, p_c(x; u).
Complete data log-likelihood:
$$L_c(x; u) = \log \prod_j p_c(x_j; u) = \sum_j \log p_c(x_j; u)$$
OK. We maximize this to estimate each segment’s parameters (image
segmentation) or the mixing weights and parameters of the lines, given the
mapping of the tokens to lines (for the line fitting example).
Problem. We don’t have complete data. The density for the incomplete
space is the marginal density of the complete space, where we’ve integrated
out the missing variables.
$$p_i(y; u) = \int_{\{x \,:\, f(x) = y\}} p_c(x; u)\, dx$$
$$L_i(y; u) = \log \prod_j p_i(y_j; u) = \sum_{j \in \text{observations}} \log p_i(y_j; u) = \sum_j \log \int_{\{x \,:\, f(x) = y_j\}} p_c(x; u)\, dx$$
This is a pain in the neck… We don’t know which of the many possible x
values that could correspond to the y values we observe is the correct one. We’ve
taken a projection (of some sort), and we cannot uniquely reconstruct the full
joint density. So we have to average over all those possibilities to make our
best guess.
But all is not lost… We have the following strategy:
1. Obtain some estimate of the missing data using a guess at the parameters.
2. Form a maximum likelihood estimate of the free parameters using the
estimate of the missing data.
3. Iterate to (hopefully) convergence.
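As a skeleton, the strategy is just an alternation. The function below is a generic sketch; estimate_missing and max_likelihood are placeholder callables you would supply for the problem at hand (lines, segments, correspondences), and the fixed iteration count stands in for a real convergence test.

```python
def missing_data_iterate(y, u0, estimate_missing, max_likelihood, n_iters=100):
    """Alternate: (1) estimate the missing data from the current parameters,
    (2) take a maximum-likelihood estimate of the parameters given that guess, (3) repeat."""
    u = u0
    z = None
    for _ in range(n_iters):
        z = estimate_missing(y, u)      # step 1: fill in the missing data
        u = max_likelihood(y, z)        # step 2: ML parameters given the filled-in data
    return u, z
```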
Strategy by Example
• Image segmentation
– Obtain an estimate of the component from which each
pixel comes, using an estimate of the θ_l
– Update the θ_l and the mixing weights using this
estimate
• Tokens and lines
– Obtain an estimate of the correspondence between
tokens and lines, using a guess at the line parameters
– Revise the estimate of the line parameters using the
estimated correspondences
Expectation-Maximization
For Mixture Models
• Assume the complete log-likelihood is linear in
the missing variables. (Common)
• Mixture model: Missing data indicate the mixture
component from which a data item is drawn.
• Represent this by associating with each data point
a bit vector z of g elements (one per component in
the mix).
$$z_{jl} = \begin{cases} 1 & \text{if the $j$th data point comes from the $l$th mixture component} \\ 0 & \text{otherwise} \end{cases}$$
About the z Vectors (matrix)
The z vectors stack into an n × g array: one row per data point (each row, one per observation, is a z vector), and one column per mixture component (one Gaussian per column). Entry (j, l) is 1 if pixel (token) j was produced by mixture component l, and 0 otherwise; its expectation is the probability of that event.

So our complete information can be written as x_j = (y_j, z_j).
We will think of the entries in z as probabilities, expectations.
Write the mixture model as (line example):
$$p(y) = \sum_l \pi_l \, p(y \mid a_l)$$
The complete data log-likelihood is:
$$L_c = \sum_{j \in \text{observations}} \left( \sum_{l=1}^{g} z_{jl} \log p(y_j \mid a_l) \right)$$
This is linear in the missing variables. Good news!
How did we ensure that that would happen?
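Because the log-likelihood is linear in the z_jl, plugging in expectations for the z entries is just a weighted sum. A one-line sketch; the array names are assumptions for illustration.

```python
import numpy as np

def complete_data_log_likelihood(z, log_p):
    """L_c = sum_j sum_l z_jl * log p(y_j | a_l).
    z and log_p are n-by-g arrays; z may hold expectations rather than 0/1 values."""
    return float(np.sum(z * log_p))
```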
EM: The Key Idea
• Obtain working values for the missing data,
and so for x by substituting the expectation
for each missing value.
• That is, fix the parameters, then compute
each expectation E[z_jl], given y_j and the
parameter values.
• Plug E[z_jl] into the complete data log-likelihood and find the parameters that maximize it.
• E[z_jl] has probably changed, so repeat.
More Formally
Given u^s, we form u^(s+1) by:
1. E-Step: Compute the expected value of the complete data using
the incomplete data and the current parameter estimates. We
know the expected value of y_j (the means of the current
Gaussian guesses) and only need the expected value of z_j for each
j. Denote these values as z_j^s; the superscript indicates that the
expectation depends on the current parameter values at step s.
2. M-Step: Maximize the complete data log-likelihood with
respect to u, using the expectation from the E-step:
$$u^{s+1} = \arg\max_u \, L_c(x^s; u) = \arg\max_u \, L_c\big((y, z^s); u\big)$$
Image Segmentation
In Practice
(Warning: Your text is a typo minefield)
Set up an n by g array of indicators I (each row is like a z vector).
E-Step: The (j, l) element of I is 1 if pixel j comes from blob l, so
E(I_jl) = Prob(pixel j comes from Gaussian blob l):
$$E\big[I_{jl} \mid \Theta^s\big] = I_{jl}^{\,s} = \frac{\pi_l^{(s)}\, p_l\big(x_j \mid \theta_l^{(s)}\big)}{\sum_{k=1}^{g} \pi_k^{(s)}\, p_k\big(x_j \mid \theta_k^{(s)}\big)}$$
Note: This is no longer a binary value!
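A sketch of this E-step using SciPy's multivariate normal density. X is assumed to be an n-by-d array of pixel feature vectors, and the parameter lists (pis, mus, sigmas) stand in for Θ^s; the function name is mine.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pis, mus, sigmas):
    """Responsibilities E[I_jl | Theta^s] as an n-by-g array whose rows sum to 1."""
    resp = np.stack([pi * multivariate_normal(mean=mu, cov=sigma).pdf(X)
                     for pi, mu, sigma in zip(pis, mus, sigmas)], axis=1)
    return resp / resp.sum(axis=1, keepdims=True)    # divide by the mixture density of each pixel
```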
Practice…
M-Step: Now form a maximum-likelihood estimate of Θ^(s+1):
$$\pi_l^{s+1} = \frac{1}{n} \sum_{j=1}^{n} E\big[I_{jl} \mid \Theta^s\big] \qquad \text{(average value in each column)}$$
$$\mu_l^{s+1} = \frac{\sum_{j=1}^{n} E\big[I_{jl} \mid \Theta^s\big]\, x_j}{\sum_{j=1}^{n} E\big[I_{jl} \mid \Theta^s\big]} \qquad \text{(weighted average feature vector for each column)}$$
$$\Sigma_l^{s+1} = \frac{\sum_{j=1}^{n} E\big[I_{jl} \mid \Theta^s\big]\, (x_j - \mu_l^{s})(x_j - \mu_l^{s})^T}{\sum_{j=1}^{n} E\big[I_{jl} \mid \Theta^s\big]} \qquad \text{(weighted average covariance matrix for each column)}$$
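And a matching M-step sketch, vectorized with NumPy. As one common implementation choice it uses the freshly updated means in the covariance update, where the slide writes μ_l^s; the function and argument names are assumptions.

```python
import numpy as np

def m_step(X, resp):
    """Weighted ML estimates (pi_l, mu_l, Sigma_l) from the responsibilities resp (n-by-g)."""
    n, d = X.shape
    weights = resp.sum(axis=0)                       # column sums: effective pixels per blob
    pis = weights / n                                # average value in each column
    mus = (resp.T @ X) / weights[:, None]            # weighted average feature vectors
    sigmas = []
    for l in range(resp.shape[1]):
        diff = X - mus[l]
        sigmas.append((resp[:, l, None] * diff).T @ diff / weights[l])   # weighted covariances
    return pis, mus, np.stack(sigmas)
```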
When it Converges...
• Can make a maximum a posteriori (MAP)
decision by assigning each pixel to the
Gaussian for which it has the highest E(I_jl).
• Can also keep the probabilities and work
with them in, for instance, a probabilistic
relaxation framework. (coming attractions)
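The MAP decision at convergence is a single argmax over the final responsibilities, while keeping the responsibility array itself preserves the per-pixel probabilities mentioned above. map_labels is an assumed name.

```python
import numpy as np

def map_labels(resp):
    """Assign each pixel to the Gaussian blob with the highest E(I_jl)."""
    return resp.argmax(axis=1)
```

Iterating the e_step and m_step sketches above until the responsibilities stop changing, and then calling map_labels, yields the segmentation.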