
Expectation Maximization
• First introduced in 1977
• Lots of mathematical derivation
• Problem: we are given a data set that is incomplete or has missing values.
• Goal: assuming the data come from an underlying distribution, estimate the most likely (maximum likelihood) parameters of that model.
Example

 

• Given a set of data points in $\mathbb{R}^2$: $\{X\} = (x_1, x_2, \ldots, x_n)$
• Assume the underlying distribution is a mixture of Gaussians
• Goal: estimate the parameters of each Gaussian distribution
• θ denotes the parameters, which here consist of the means and variances; k is the number of Gaussians in the mixture:
$\{\Theta\} = (\theta_1, \theta_2, \ldots, \theta_k)$
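To make the setup concrete, here is a minimal sketch of what "data from a mixture of Gaussians in R2" can look like. It is an illustration only: the function name `sample_mixture`, the example parameter values, the equal weighting of the components, and the use of a single variance per component (spherical Gaussians) are all assumptions, not part of the slides.

```python
import random

def sample_mixture(params, n, seed=0):
    """Draw n points in R^2 from a mixture of spherical Gaussians.

    params: list of (mean, variance) pairs, where mean is an (x, y) tuple.
    Assumptions: every component is equally likely, and each component
    uses the same variance in both coordinates.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        (mx, my), var = rng.choice(params)   # pick one component uniformly
        sd = var ** 0.5                      # standard deviation
        points.append((rng.gauss(mx, sd), rng.gauss(my, sd)))
    return points

# Hypothetical "true" parameters for k = 2 Gaussians
theta_true = [((0.0, 0.0), 1.0), ((5.0, 5.0), 0.5)]
X = sample_mixture(theta_true, n=200)
```

In the estimation problem only `X` is observed; which Gaussian produced each point is the missing information, and `theta_true` is what EM tries to recover.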
Steps of EM algorithm(1)
• Randomly pick initial values for each $\theta_k$ (mean and variance)
• For each $x_n$, associate it with a responsibility value r
• $r_{n,k}$: how likely it is that the nth point comes from (belongs to) the kth mixture
• How do we find r?
[Figure: the data are assumed to come from these two distributions]
Steps of EM algorithm(2)
 
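$$r_{n,k} = \frac{p(x_n \mid \theta_k)}{\sum_{i=1}^{k} p(x_n \mid \theta_i)}$$
• $p(x_n \mid \theta_k)$: probability that we observe $x_n$ in the data set, provided it comes from the kth mixture
• The distribution is defined by $\theta_k$
• The probability depends on the distance between $x_n$ and the center of the kth mixture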
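The responsibility formula can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the presenter's implementation: each Gaussian is taken to be spherical (one variance shared by both coordinates), the components are weighted equally as in the formula above, and the names `gaussian_pdf` and `responsibilities` are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a spherical 2-D Gaussian (same variance in each coordinate)."""
    sq_dist = (x[0] - mean[0]) ** 2 + (x[1] - mean[1]) ** 2
    return math.exp(-sq_dist / (2.0 * var)) / (2.0 * math.pi * var)

def responsibilities(X, theta):
    """Return r[n][k] = p(x_n | theta_k) / sum_i p(x_n | theta_i).

    No guard against numerical underflow: if every density rounds to 0.0
    for some point, the division fails.
    """
    r = []
    for x in X:
        dens = [gaussian_pdf(x, mean, var) for mean, var in theta]
        total = sum(dens)
        r.append([d / total for d in dens])
    return r

# Hypothetical data and parameters: two Gaussians centred at (0, 0) and (4, 4)
X = [(0.0, 0.0), (4.0, 4.0), (2.0, 2.0)]
theta = [((0.0, 0.0), 1.0), ((4.0, 4.0), 1.0)]
r = responsibilities(X, theta)
```

Each row of `r` sums to 1. The point (2, 2) is equally far from both centers, so it is split evenly between the two Gaussians, while (0, 0) is assigned almost entirely to the first.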
Steps of EM algorithm(3)
• Each data point is now associated with $(r_{n,1}, r_{n,2}, \ldots, r_{n,k})$, where $r_{n,k}$ is how likely the point is to belong to the kth mixture, $0 < r < 1$
• Using r, compute the weighted mean and variance for each Gaussian
• This gives a new θ; set it as the new parameter and iterate the process (find new r -> new θ -> ...)
• Each iteration consists of an expectation step (finding r) and a maximization step (finding the new θ)
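The whole loop (find r, then the new θ, and repeat) can be sketched as follows. As before, this is a simplified illustration rather than a reference implementation: the Gaussians are assumed spherical and equally weighted, the iteration count is fixed instead of testing for convergence, and the function names, the variance floor of `1e-6`, and the toy data are hypothetical choices.

```python
import math
import random

def pdf(x, mean, var):
    """Spherical 2-D Gaussian density (assumption: one variance per component)."""
    d2 = (x[0] - mean[0]) ** 2 + (x[1] - mean[1]) ** 2
    return math.exp(-d2 / (2.0 * var)) / (2.0 * math.pi * var)

def e_step(X, theta):
    """Expectation step: responsibilities r[n][j] under the current theta."""
    r = []
    for x in X:
        dens = [pdf(x, mean, var) for mean, var in theta]
        total = sum(dens)
        r.append([d / total for d in dens])
    return r

def m_step(X, r):
    """Maximization step: responsibility-weighted mean and variance per component."""
    theta = []
    for j in range(len(r[0])):
        w = [row[j] for row in r]
        w_sum = sum(w)
        mx = sum(wi * x[0] for wi, x in zip(w, X)) / w_sum
        my = sum(wi * x[1] for wi, x in zip(w, X)) / w_sum
        sq = sum(wi * ((x[0] - mx) ** 2 + (x[1] - my) ** 2) for wi, x in zip(w, X))
        # Divide by 2 because the squared distance sums two coordinates;
        # the floor keeps the variance from collapsing to zero.
        var = max(sq / (2.0 * w_sum), 1e-6)
        theta.append(((mx, my), var))
    return theta

def random_init(X, k, seed=0):
    """Pick k distinct data points as initial means, with unit variance."""
    rng = random.Random(seed)
    return [(x, 1.0) for x in rng.sample(X, k)]

def em(X, theta, iterations=50):
    """Alternate E and M steps for a fixed number of iterations."""
    for _ in range(iterations):
        r = e_step(X, theta)
        theta = m_step(X, r)
    return theta

# Toy data: two small clusters around (0.5, 0.5) and (10.5, 10.5)
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
     (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
theta = em(X, random_init(X, k=2))
```

The result depends on the starting values: EM only finds a local optimum, so an unlucky random start can leave both Gaussians sitting on the same spot. Running from several different starts and keeping the best fit is a common workaround.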
Ideas and Intuition
• Given a set of incomplete (observed) data
• Assume the observed data come from a specific model
• Formulate some parameters for that model and use them to guess the missing values/data (expectation step)
• From the missing data and the observed data, find the most likely parameters (maximization step)
• Iterate the expectation and maximization steps until convergence
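The two steps are commonly written in the following general form. The symbols are not from the slides and are introduced here only for illustration: X is the observed data, Z the missing data, $\Theta^{(t)}$ the parameter guess at iteration t, and Q the expected complete-data log-likelihood.

```latex
\begin{align*}
\text{E-step:}\quad & Q(\Theta \mid \Theta^{(t)})
    = \mathbb{E}_{Z \mid X,\, \Theta^{(t)}}\bigl[\log p(X, Z \mid \Theta)\bigr] \\
\text{M-step:}\quad & \Theta^{(t+1)}
    = \arg\max_{\Theta}\; Q(\Theta \mid \Theta^{(t)})
\end{align*}
```

For the Gaussian mixture example, Z records which Gaussian each point came from, the E-step reduces to computing the responsibilities r, and the M-step reduces to the weighted mean and variance.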
Application
• Parameter estimation for Gaussian mixture (demo)
• Baum-Welch algorithm used in Hidden Markov Models
Difficulties
• How to model the missing data?
• How to determine the number of Gaussians in the mixture?
• What model should be used?