
Image Modeling
& Segmentation
Aly Farag and Asem Ali
Lecture #3
Parametric methods
 These methods are useful when the underlying distribution is known in advance, or is simple enough to be modeled by a simple distribution function or a mixture of such functions.
o A location parameter simply shifts the graph left or right along the horizontal axis.
o A scale parameter stretches (>1) or compresses (<1) the pdf.
 The parametric model is very compact (low memory and CPU usage), since only a few parameters need to be fitted.
 The model’s parameters are estimated using methods such as maximum likelihood estimation, Bayesian estimation, and expectation maximization.
Parametric methods (1- Maximum Likelihood Estimator: MLE)
 Suppose that n samples x1, x2, …, xn are drawn independently and identically distributed (i.i.d.) from a distribution φ(θ) with parameter vector θ = (θ1, …, θr)
 Known: the data samples, the distribution type
Unknown: θ
 MLE Method estimates θ by maximizing the log likelihood of the data
\hat{\theta}_{MLE} = \arg\max_{\theta} \log p(x_1, x_2, \ldots, x_n \mid \theta)     (conditioning on \theta shows the dependence of p on \theta explicitly)
                   = \arg\max_{\theta} \log \prod_{i=1}^{n} p(x_i \mid \theta)       (by i.i.d.)
                   = \arg\max_{\theta} \sum_{i=1}^{n} \log p(x_i \mid \theta)        (by monotonicity of log)
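To make the arg-max formula concrete, here is a minimal Python sketch (my own illustration, not part of the lecture) that maximizes the summed log-likelihood over a grid of candidate parameters; the exponential model and all names and settings are assumptions made for this example.

```python
# Minimal sketch (assumed exponential model p(x|theta) = theta*exp(-theta*x); not from the lecture):
# estimate theta by maximizing sum_i log p(x_i | theta) over a grid of candidates.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.5, size=1000)          # i.i.d. samples, true theta = 2.5

thetas = np.linspace(0.1, 10.0, 2000)                   # candidate parameter values
log_lik = len(x) * np.log(thetas) - thetas * x.sum()    # sum_i log p(x_i | theta), in closed form

theta_mle = thetas[np.argmax(log_lik)]                  # arg max over the grid
print(theta_mle, 1 / x.mean())                          # grid estimate vs. analytic MLE 1/mean(x)
```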
Parametric methods (1- Maximum Likelihood Estimator: MLE)
 Let L = \sum_{i=1}^{n} \log p(x_i \mid \theta)
 Then calculate \frac{\partial L}{\partial \theta}
 Find \theta by letting \frac{\partial L}{\partial \theta} = \left[\frac{\partial L}{\partial \theta_1}, \frac{\partial L}{\partial \theta_2}, \ldots, \frac{\partial L}{\partial \theta_r}\right]^T = 0

 Coin Example ………………..
 In some cases we can find a closed form for θ
Example:
Suppose that n samples x1, x2, …, xn are drawn independently and identically distributed (i.i.d.) ~ 1D N(μ, σ); find the MLE of μ and σ.
L(\theta) = \sum_{i=1}^{n} \log p(x_i \mid \theta)
          = \sum_{i=1}^{n} \log \left[ \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} \right]
          = -n \log\left(\sqrt{2\pi}\,\sigma\right) - \sum_{i=1}^{n} \frac{(x_i-\mu)^2}{2\sigma^2}

\frac{\partial L}{\partial \theta} = \left[ \frac{\partial L}{\partial \mu}, \frac{\partial L}{\partial \sigma} \right]^T
 = \left[ \sum_{i=1}^{n} \frac{x_i - \mu}{\sigma^2},\; -\frac{n}{\sigma} + \sum_{i=1}^{n} \frac{(x_i-\mu)^2}{\sigma^3} \right]^T = 0

\hat{\mu}_{MLE} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2_{MLE} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu}_{MLE})^2
Matlab Demo
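The MATLAB demo itself is not reproduced here; as a stand-in, a minimal NumPy sketch of the closed-form estimates derived above (the data and all values are assumed for illustration):

```python
# Minimal sketch (assumed synthetic data; a stand-in for the MATLAB demo, not the original):
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=10_000)   # i.i.d. samples ~ N(mu=5, sigma=2)

mu_mle = x.mean()                                 # mu_MLE     = (1/n) * sum_i x_i
sigma2_mle = np.mean((x - mu_mle) ** 2)           # sigma2_MLE = (1/n) * sum_i (x_i - mu_MLE)^2
print(mu_mle, np.sqrt(sigma2_mle))                # should be close to 5 and 2
```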
Parametric methods (1- Maximum Likelihood Estimator: MLE)
 An estimator of a parameter is unbiased if the expected value of the estimate is the same as the true value of the parameter.
Example:
E[\hat{\mu}_{MLE}] = E\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[x_i] = \frac{1}{n}\, n\, E[x_i] = \mu \quad \Rightarrow \text{unbiased}
 An estimator of a parameter is biased if the expected value of the estimate is different from the true value of the parameter.
Example:
E[\hat{\sigma}^2_{MLE}] = E\left[\frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu}_{MLE})^2\right]
 = E\left[\frac{1}{n}\sum_{i=1}^{n} \left(x_i - \frac{1}{n}\sum_{j=1}^{n} x_j\right)^2\right]
 = \ldots\ldots \text{Complete the proof} \ldots\ldots
 = \left(1 - \frac{1}{n}\right)\sigma^2 = \frac{n-1}{n}\,\sigma^2 \quad \Rightarrow \text{biased}

The unbiased estimator is \tilde{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \hat{\mu}_{MLE})^2 (the correction doesn’t make much difference once n → large).
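The bias result above is easy to check empirically. A minimal sketch (sample size, variance, and number of trials are assumptions made for illustration):

```python
# Minimal sketch (illustrative settings): empirical check that the MLE variance
# estimator is biased by a factor (n-1)/n, and that Bessel's correction removes the bias.
import numpy as np

rng = np.random.default_rng(2)
n, trials, sigma2_true = 5, 100_000, 4.0

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2_true), size=(trials, n))
mu_hat = samples.mean(axis=1, keepdims=True)                 # mu_MLE per trial

var_mle = np.mean((samples - mu_hat) ** 2, axis=1)           # divide by n   (biased)
var_corrected = var_mle * n / (n - 1)                        # divide by n-1 (unbiased)

print(var_mle.mean())        # close to (n-1)/n * sigma2_true = 3.2
print(var_corrected.mean())  # close to sigma2_true = 4.0
```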
Parametric methods
What if there are distinct subpopulations in observed data?
Example
 In 1894, Pearson tried to model the distribution of the ratio between measurements of forehead and body length on crabs.
 He used a two-component mixture.
 It was hypothesized that the two-component structure was related to the possibility of this particular population of crabs evolving into two new subspecies.
Mixture Model
The underlying density is assumed to have the form
p(x) = \sum_{k=1}^{K} \alpha_k \, \phi(x;\, \theta_k)

The weights are constrained: \alpha_k \geq 0, \quad \sum_{k=1}^{K} \alpha_k = 1
What is the difference between
Mixture Model and the kernel-based
estimator?
Components of the mixture are densities and are parameterized by \theta_k
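For concreteness, a minimal sketch of evaluating such a mixture (the two components, their weights, and the 0 to 255 range are illustrative assumptions); note that, unlike a kernel-based estimator with one kernel per sample, the mixture uses only K parameterized components:

```python
# Minimal sketch (illustrative parameters): evaluate p(x) = sum_k alpha_k * phi(x; theta_k)
# for a K = 2 Gaussian mixture, with weights alpha_k >= 0 summing to 1.
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

alphas = np.array([0.3, 0.7])       # mixture weights (sum to 1)
mus    = np.array([60.0, 170.0])    # component parameters theta_k = (mu_k, sigma_k)
sigmas = np.array([15.0, 25.0])

x = np.linspace(0.0, 255.0, 256)
p = sum(a * gaussian_pdf(x, m, s) for a, m, s in zip(alphas, mus, sigmas))
print(p.sum() * (x[1] - x[0]))      # Riemann sum of the density; close to 1 over this range
```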
Parametric methods
Example
 Given n samples {xi, Ci1, Ci2} (complete data) drawn i.i.d. from two normal distributions
 xi is the observed value of the ith instance
 Ci1 and Ci2 indicate which of the two normal distributions was used to generate xi
 Cij = 1 if the jth distribution was used to generate xi, 0 otherwise
 MLE:
\hat{\mu}_1 = \frac{1}{n_1} \sum_{i:\, C_{i1}=1} x_i, \qquad \hat{\sigma}_1^2 = \frac{1}{n_1} \sum_{i:\, C_{i1}=1} (x_i - \hat{\mu}_1)^2
\hat{\mu}_2 = \frac{1}{n_2} \sum_{i:\, C_{i2}=1} x_i, \qquad \hat{\sigma}_2^2 = \frac{1}{n_2} \sum_{i:\, C_{i2}=1} (x_i - \hat{\mu}_2)^2
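A minimal sketch of these label-conditional (complete-data) estimates, with the synthetic data and component parameters assumed purely for illustration:

```python
# Minimal sketch (assumed synthetic data): MLE for two Gaussians when every sample x_i
# carries indicators C_i1, C_i2 saying which component generated it (complete data).
import numpy as np

rng = np.random.default_rng(3)
c1 = rng.integers(0, 2, size=2000)                 # C_i1 in {0, 1};  C_i2 = 1 - C_i1
x  = np.where(c1 == 1,
              rng.normal(2.0, 1.0, size=2000),     # samples from component 1
              rng.normal(8.0, 2.0, size=2000))     # samples from component 2

for k, mask in ((1, c1 == 1), (2, c1 == 0)):
    mu_k     = x[mask].mean()                      # (1/n_k) * sum over {i : C_ik = 1} of x_i
    sigma2_k = np.mean((x[mask] - mu_k) ** 2)      # (1/n_k) * sum over {i : C_ik = 1} of (x_i - mu_k)^2
    print(k, mu_k, sigma2_k)
```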
How can we estimate the parameters given incomplete data (don’t know Ci1
and Ci2 )?
Parametric methods (2- Expectation Maximization: EM)
 The EM algorithm is a general method of finding the maximum-likelihood
estimate of the parameters of an underlying distribution from a given data set
when the data is incomplete or has missing values.
EM Algorithm:
 Given initial parameters Θ0
 Repeatedly:
o Re-estimate the expected values of the hidden binary variables Cij
o Then recalculate the MLE of Θ using these expected values for the hidden variables
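A toy illustration of this loop (my own sketch, not the lecture's): two 1D Gaussian components with known unit variance and equal weights, so that only the means are re-estimated; the general update formulas appear on the later slides.

```python
# Toy sketch of the EM loop (assumed: two 1D Gaussians, unit variance, equal weights;
# only the means are unknown). Not from the slides.
import numpy as np

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])   # incomplete data: no labels

mu = np.array([-1.0, 1.0])                        # initial parameters Theta_0
for _ in range(50):                               # repeatedly:
    # E-step: expected value of the hidden indicators C_ij (soft assignment of x_i to component j)
    lik = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2)
    resp = lik / lik.sum(axis=1, keepdims=True)
    # M-step: MLE of the means, using the expected indicators as soft counts
    mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)

print(mu)                                         # approaches (-2, 3)
```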
 Note:
 EM is an unsupervised method, but MLE is supervised
 To use EM you must know:
 Number of classes K,
 Parametric form of the distribution.
Illustrative example
[Figures: complete data vs. incomplete data]
Illustrative example
[Figure: illustrative example, continued]
Parametric methods (2- Expectation Maximization: EM)
Assume a joint density function for the complete data set: p(x, c | Θ)
 The EM algorithm first finds the expected value of the complete-data log-likelihood with respect to the unknown data C, given the observed data x and the current parameter estimates Θ^{i-1}.
Q ( | 
i 1

)  E log p ( x , c |  ) | x , 
i 1

The current parameters
estimates that we used to
evaluate the expectation
The new parameters that we
optimize to maximize Q
 The evaluation of this expectation is called the E-step of the algorithm
 The second step (the M-step) of the EM algorithm is to maximize the expectation we computed in the first step:

\Theta^{i} = \arg\max_{\Theta} Q(\Theta \mid \Theta^{i-1})
 These two steps are repeated as necessary. Each iteration is guaranteed to
increase the log likelihood and the algorithm is guaranteed to converge to a local
maximum of the likelihood function.
Parametric methods (2- Expectation Maximization: EM)
 The mixture-density parameter estimation problem
p(x) = \sum_{k=1}^{K} \alpha_k \, \phi(x;\, \theta_k), \qquad \Theta = [\alpha_1, \ldots, \alpha_K, \theta_1, \ldots, \theta_K]
Q ( | 
i 1

)  E log p ( x , c |  ) | x , 
i 1
   log
p ( x, c |  ) p (c | x, 
i 1
)dc
c C
n
 p (c j | x j , 
i 1
j 1
)
 Using Bayes’s rule, we can compute:

p(c_j \mid x_j, \Theta^{i-1}) = \frac{\alpha_{c_j}^{\,i-1}\; p(x_j \mid \theta_{c_j}^{\,i-1})}{\sum_{k=1}^{K} \alpha_k^{\,i-1}\; p(x_j \mid \theta_k^{\,i-1})} \qquad \text{(E-step)}
 Then
Q ( | 
i 1
K
)
n
  log( 
k
p ( x j |  k )) p ( k | x j , 
i 1
)
k 1 j 1
 Grades Example ………………..
Parametric methods (2- Expectation Maximization: EM)
 For some distributions, it is possible to get analytical expressions for the parameters that maximize Q
 For example, if we assume d-dimensional Gaussian component distributions
with \theta_k = (\mu_k, \Sigma_k):

M-step:
\alpha_k^{\,i} = \frac{1}{n} \sum_{j=1}^{n} p(k \mid x_j, \Theta^{i-1})

\mu_k^{\,i} = \frac{\sum_{j=1}^{n} x_j \; p(k \mid x_j, \Theta^{i-1})}{\sum_{j=1}^{n} p(k \mid x_j, \Theta^{i-1})}

\Sigma_k^{\,i} = \frac{\sum_{j=1}^{n} p(k \mid x_j, \Theta^{i-1})\,(x_j - \mu_k^{\,i})(x_j - \mu_k^{\,i})^T}{\sum_{j=1}^{n} p(k \mid x_j, \Theta^{i-1})}

E-step (repeated from the previous slide):
p(c_j \mid x_j, \Theta^{i-1}) = \frac{\alpha_{c_j}^{\,i-1}\; p(x_j \mid \theta_{c_j}^{\,i-1})}{\sum_{k=1}^{K} \alpha_k^{\,i-1}\; p(x_j \mid \theta_k^{\,i-1})}
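Putting the E-step and these M-step updates together for the 1D case, a minimal self-contained sketch (the synthetic data, initialization, and iteration count are assumptions; this is not the lecture's MATLAB demo):

```python
# Minimal sketch of EM for a K = 2 mixture of 1D Gaussians (assumed synthetic data).
import numpy as np

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(50, 10, 600), rng.normal(170, 25, 1400)])   # observed data
n, K = len(x), 2

alphas = np.full(K, 1.0 / K)                       # initial parameters Theta_0
mus    = np.array([80.0, 150.0])
sigmas = np.array([30.0, 30.0])

for _ in range(100):
    # E-step: responsibilities p(k | x_j, Theta^{i-1})
    pdf  = np.exp(-0.5 * ((x[:, None] - mus) / sigmas) ** 2) / (np.sqrt(2 * np.pi) * sigmas)
    resp = alphas * pdf
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: closed-form updates from this slide (1D, so Sigma_k is a scalar variance)
    Nk     = resp.sum(axis=0)
    alphas = Nk / n                                            # alpha_k = (1/n) sum_j p(k | x_j, .)
    mus    = (resp * x[:, None]).sum(axis=0) / Nk              # responsibility-weighted mean
    sigmas = np.sqrt((resp * (x[:, None] - mus) ** 2).sum(axis=0) / Nk)   # weighted variance

print(alphas, mus, sigmas)   # approaches roughly (0.3, 0.7), (50, 170), (10, 25)
```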
Parametric methods (2- Expectation Maximization: EM)
Example:
 MATLAB demo
[Plot from the MATLAB demo: values from 0 to 0.02 over the range 0 to 250]