Transcript Slide 1
Image Modeling
& Segmentation
Aly Farag and Asem Ali
Lecture #3
Parametric methods
These methods are useful when the underlying distribution is known in advance, or is simple enough to be modeled by a standard distribution function or a mixture of such functions.
o A location parameter simply shifts the graph left or right on the horizontal axis.
o A scale parameter stretches (>1) or compresses (<1) the pdf (see the sketch below).
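As a quick illustration (my own sketch, not from the slides), the `loc` and `scale` arguments in scipy.stats play exactly these roles; the normal distribution and the parameter values below are arbitrary demo choices:

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-6, 6, 200)

# Baseline standard normal: loc=0, scale=1.
base = norm(loc=0, scale=1).pdf(x)

# Location parameter shifts the pdf along the horizontal axis.
shifted = norm(loc=2, scale=1).pdf(x)

# Scale parameter > 1 stretches the pdf; < 1 compresses it.
stretched = norm(loc=0, scale=2).pdf(x)
compressed = norm(loc=0, scale=0.5).pdf(x)

# The peak moves with loc and flattens as scale grows.
print(x[np.argmax(base)], x[np.argmax(shifted)])   # ~0.0 vs ~2.0
print(base.max(), stretched.max(), compressed.max())
```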
The parametric model is very compact (low memory and CPU usage): only a few parameters need to be fitted.
The model's parameters are estimated using methods such as maximum likelihood estimation, Bayesian estimation, and expectation maximization.
Parametric methods (1- Maximum Likelihood Estimator: MLE)
Suppose that $n$ samples $x_1, x_2, \ldots, x_n$ are drawn independently and identically distributed (i.i.d.) from a distribution $\varphi(\theta)$ with parameter vector $\theta = (\theta_1, \ldots, \theta_r)$.
Known: the data samples and the distribution type.
Unknown: $\theta$.
The MLE method estimates $\theta$ by maximizing the log-likelihood of the data:
$$\hat{\theta}_{MLE} = \arg\max_{\theta} \log p(x_1, x_2, \ldots, x_n \mid \theta)$$
(writing $p(\cdot \mid \theta)$ to show the dependence of $p$ on $\theta$ explicitly)
$$= \arg\max_{\theta} \log \prod_{i=1}^{n} p(x_i \mid \theta) \qquad \text{(by i.i.d.)}$$
$$= \arg\max_{\theta} \sum_{i=1}^{n} \log p(x_i \mid \theta) \qquad \text{(by monotonicity of log)}$$
Parametric methods (1- Maximum Likelihood Estimator: MLE)
Let
$$L = \sum_{i=1}^{n} \log p(x_i \mid \theta)$$
Then calculate $\nabla_\theta L$ and find $\theta$ by setting
$$\nabla_\theta L = \left[ \frac{\partial L}{\partial \theta_1}, \frac{\partial L}{\partial \theta_2}, \ldots, \frac{\partial L}{\partial \theta_r} \right]^T = 0$$
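When setting the gradient to zero has no closed-form solution, the same arg max can be found numerically. A minimal sketch (my own illustration; the Gamma distribution, seed, and starting point are arbitrary choices) that maximizes the log-likelihood by minimizing its negative with scipy.optimize:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

# Synthetic i.i.d. samples from a Gamma distribution (shape=2, scale=3).
rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=3.0, size=1000)

# Negative log-likelihood: -L(theta) = -sum_i log p(x_i | theta).
def neg_log_likelihood(theta):
    shape, scale = theta
    if shape <= 0 or scale <= 0:   # keep the optimizer in valid territory
        return np.inf
    return -np.sum(gamma.logpdf(x, a=shape, scale=scale))

# Setting the gradient to zero has no closed form for the Gamma shape,
# so we solve the arg max numerically instead.
result = minimize(neg_log_likelihood, x0=[1.0, 1.0], method="Nelder-Mead")
print(result.x)   # estimates close to (2, 3) for large n
```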
Coin Example ………………..
In some cases we can find a closed form for $\theta$.
Example: Suppose that $n$ samples $x_1, x_2, \ldots, x_n$ are drawn i.i.d. from a 1-D $N(\mu, \sigma^2)$; find the MLE of $\mu$ and $\sigma$.
$$L(\theta) = \sum_{i=1}^{n} \log p(x_i \mid \theta) = \sum_{i=1}^{n} \log\!\left( \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_i - \mu)^2}{2\sigma^2}} \right) = -n \log\!\left(\sqrt{2\pi}\,\sigma\right) - \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2\sigma^2}$$
$$\nabla_\theta L = \left[ \frac{\partial L}{\partial \mu}, \frac{\partial L}{\partial \sigma} \right]^T = \left[ \sum_{i=1}^{n} \frac{x_i - \mu}{\sigma^2}, \;\; -\frac{n}{\sigma} + \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma^3} \right]^T = 0$$
Solving gives
$$\hat{\mu}_{MLE} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2_{MLE} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu}_{MLE})^2$$
MATLAB demo
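The MATLAB demo itself is not part of the transcript; a minimal NumPy sketch of the closed-form estimates above (the true parameter values and seed are arbitrary demo choices):

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true, sigma_true = 5.0, 2.0
x = rng.normal(mu_true, sigma_true, size=10_000)

# Closed-form MLE for a 1-D Gaussian.
mu_mle = x.mean()                          # (1/n) * sum x_i
sigma2_mle = np.mean((x - mu_mle) ** 2)    # (1/n) * sum (x_i - mu)^2

print(mu_mle, np.sqrt(sigma2_mle))         # close to (5.0, 2.0)
```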
Parametric methods (1- Maximum Likelihood Estimator: MLE)
An estimator of a parameter is unbiased if the expected value of the estimate is the same as the true value of the parameter.
Example:
$$E[\hat{\mu}_{MLE}] = E\!\left[ \frac{1}{n} \sum_{i=1}^{n} x_i \right] = \frac{1}{n} \sum_{i=1}^{n} E[x_i] = \frac{1}{n}\, n\, E[x_i] = \mu \quad \Rightarrow \quad \text{unbiased}$$
An estimator of a parameter is biased if the expected value of the estimate is different from the true value of the parameter.
Example:
$$E[\hat{\sigma}^2_{MLE}] = E\!\left[ \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu}_{MLE})^2 \right] = E\!\left[ \frac{1}{n} \sum_{i=1}^{n} \Big( x_i - \frac{1}{n} \sum_{j=1}^{n} x_j \Big)^{2} \right] = \;\ldots \text{ (complete the proof)} \ldots\; = \frac{n-1}{n}\, \sigma^2$$
so $\hat{\sigma}^2_{MLE}$ is biased; the unbiased estimator is
$$\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \hat{\mu}_{MLE})^2$$
The correction doesn't make much difference once $n$ becomes large.
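A small simulation (my own sketch; the sample size, variance, and trial count are arbitrary choices) that makes the $(n-1)/n$ bias visible for small $n$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2_true, trials = 5, 4.0, 100_000

# Repeatedly draw small samples and average the MLE variance estimate.
samples = rng.normal(0.0, np.sqrt(sigma2_true), size=(trials, n))
mu_hat = samples.mean(axis=1, keepdims=True)
sigma2_mle = np.mean((samples - mu_hat) ** 2, axis=1)

print(sigma2_mle.mean())                # ~ (n-1)/n * sigma^2 = 3.2
print(sigma2_mle.mean() * n / (n - 1))  # bias-corrected, ~ 4.0
```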
Parametric methods
What if there are distinct subpopulations in the observed data?
Example
In 1894, Pearson tried to model the distribution of the ratio between measurements of forehead and body length on crabs.
He used a two-component mixture.
It was hypothesized that the two-component structure was related to the possibility of this particular population of crabs evolving into two new subspecies.
Mixture Model
The underlying density is assumed to have the form
$$p(x) = \sum_{k=1}^{K} \pi_k\, p(x; \theta_k)$$
The weights $\pi_k$ are constrained:
$$\pi_k \geq 0, \qquad \sum_{k=1}^{K} \pi_k = 1$$
What is the difference between the mixture model and the kernel-based estimator? The components of the mixture are densities and are parameterized by $\theta_k$ (see the sketch below).
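A minimal sketch of a two-component Gaussian mixture density (the weights and component parameters are arbitrary demo values); note that, unlike a kernel-based estimator with one kernel per sample, it is described by just a handful of parameters:

```python
import numpy as np
from scipy.stats import norm

# Weights must be non-negative and sum to one.
weights = np.array([0.3, 0.7])
means   = np.array([-2.0, 3.0])
sigmas  = np.array([1.0, 1.5])

def mixture_pdf(x):
    # p(x) = sum_k pi_k * p(x; theta_k)
    return sum(w * norm(m, s).pdf(x)
               for w, m, s in zip(weights, means, sigmas))

x = np.linspace(-6, 8, 5)
print(mixture_pdf(x))
```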
Parametric methods
Example
Given $n$ samples $\{x_i, C_{i1}, C_{i2}\}$ (complete data) drawn i.i.d. from two normal distributions:
$x_i$ is the observed value of the $i$th instance;
$C_{i1}$ and $C_{i2}$ indicate which of the two normal distributions was used to generate $x_i$:
$C_{ij} = 1$ if distribution $j$ was used to generate $x_i$, 0 otherwise.
The MLE estimates are
$$\hat{\mu}_1 = \frac{1}{n_1} \sum_{i:\, C_{i1}=1} x_i, \qquad \hat{\sigma}_1^2 = \frac{1}{n_1} \sum_{i:\, C_{i1}=1} (x_i - \hat{\mu}_1)^2$$
$$\hat{\mu}_2 = \frac{1}{n_2} \sum_{i:\, C_{i2}=1} x_i, \qquad \hat{\sigma}_2^2 = \frac{1}{n_2} \sum_{i:\, C_{i2}=1} (x_i - \hat{\mu}_2)^2$$
where $n_j$ is the number of samples generated by distribution $j$.
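With complete data the labels simply select each subsample; a minimal NumPy sketch (the ground-truth parameters, weights, and seed are arbitrary demo choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

# Complete data: each sample carries its generating component label.
c1 = rng.random(n) < 0.4                    # C_i1 = 1 with probability 0.4
x = np.where(c1, rng.normal(0.0, 1.0, n),   # component 1: mean 0, std 1
                 rng.normal(5.0, 2.0, n))   # component 2: mean 5, std 2

# Per-component MLE: just the sample mean/variance of each subsample.
for mask in (c1, ~c1):
    mu_hat = x[mask].mean()
    sigma2_hat = np.mean((x[mask] - mu_hat) ** 2)
    print(mu_hat, sigma2_hat)
```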
How can we estimate the parameters given incomplete data (where $C_{i1}$ and $C_{i2}$ are not known)?
Parametric methods (2- Expectation Maximization: EM)
The EM algorithm is a general method of finding the maximum-likelihood
estimate of the parameters of an underlying distribution from a given data set
when the data is incomplete or has missing values.
EM Algorithm:
Given initial parameters $\Theta^0$, repeat:
o Re-estimate the expected values of the hidden binary variables $C_{ij}$;
o Then recalculate the MLE of $\Theta$ using these expected values for the hidden variables.
Note: EM is an unsupervised method, whereas MLE is supervised.
To use EM you must know:
the number of classes $K$,
the parametric form of the distribution.
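A minimal EM sketch for a two-component 1-D Gaussian mixture, assuming (as the note requires) that $K = 2$ and the Gaussian form are known; the initialization, seed, and fixed iteration count are my own choices:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0.0, 1.0, 600), rng.normal(5.0, 2.0, 400)])

# Initial guesses Theta^0 (a common heuristic: extreme points as means).
pi_ = np.array([0.5, 0.5])
mu = np.array([x.min(), x.max()])
sigma = np.array([1.0, 1.0])

for _ in range(200):
    # E-step: expected values of the hidden labels C_ij (responsibilities).
    lik = pi_ * norm(mu, sigma).pdf(x[:, None])      # shape (n, 2)
    resp = lik / lik.sum(axis=1, keepdims=True)

    # M-step: MLE of Theta using the expected labels as soft counts.
    nk = resp.sum(axis=0)
    pi_ = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print(pi_, mu, sigma)   # should approach (0.6, 0.4), (0, 5), (1, 2)
```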
Illustrative example
[Figure: the complete-data case vs. the incomplete-data case]
Parametric methods (2- Expectation Maximization: EM)
Assume a joint density function $p(x, c \mid \Theta)$ for the complete data set.
The EM algorithm first finds the expected value of the complete-data log-likelihood with respect to the unknown data $C$, given the observed data $x$ and the current parameter estimates $\Theta^{i-1}$:
$$Q(\Theta \mid \Theta^{i-1}) = E\!\left[ \log p(x, c \mid \Theta) \;\middle|\; x, \Theta^{i-1} \right]$$
Here $\Theta^{i-1}$ are the current parameter estimates used to evaluate the expectation, and $\Theta$ are the new parameters that we optimize to maximize $Q$.
The evaluation of this expectation is called the E-step of the algorithm
The second step (the M-step) of the EM algorithm is to maximize the
expectation we computed in the first step.
$$\Theta^{i} = \arg\max_{\Theta}\, Q(\Theta \mid \Theta^{i-1})$$
These two steps are repeated as necessary. Each iteration is guaranteed to
increase the log likelihood and the algorithm is guaranteed to converge to a local
maximum of the likelihood function.
Parametric methods (2- Expectation Maximization: EM)
The mixture-density parameter estimation problem:
$$p(x) = \sum_{k=1}^{K} \pi_k\, p(x; \theta_k), \qquad \Theta = [\pi_1, \ldots, \pi_K, \theta_1, \ldots, \theta_K]$$
$$Q(\Theta \mid \Theta^{i-1}) = E\!\left[ \log p(x, c \mid \Theta) \;\middle|\; x, \Theta^{i-1} \right] = \int_{c \in C} \log p(x, c \mid \Theta)\; p(c \mid x, \Theta^{i-1})\, dc$$
where
$$p(c \mid x, \Theta^{i-1}) = \prod_{j=1}^{n} p(c_j \mid x_j, \Theta^{i-1})$$
Using Bayes’s rule, we can compute:
$$p(c_j \mid x_j, \Theta^{i-1}) = \frac{\pi_{c_j}^{i-1}\, p(x_j \mid \theta_{c_j}^{i-1})}{\sum_{k=1}^{K} \pi_k^{i-1}\, p(x_j \mid \theta_k^{i-1})} \qquad \text{(E-step)}$$
Then
$$Q(\Theta \mid \Theta^{i-1}) = \sum_{k=1}^{K} \sum_{j=1}^{n} \log\!\big( \pi_k\, p(x_j \mid \theta_k) \big)\; p(k \mid x_j, \Theta^{i-1})$$
Grades Example ………………..
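A short sketch of the E-step posterior and the resulting $Q$ for a 1-D Gaussian mixture; the current estimates `pi_old`, `mu_old`, `sigma_old` are hypothetical demo values, not from the slides:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, 100)

# Current estimates Theta^{i-1} (hypothetical values for the demo).
pi_old = np.array([0.5, 0.5])
mu_old = np.array([-1.0, 1.0])
sigma_old = np.array([1.0, 1.0])

# Bayes' rule: p(k | x_j, Theta^{i-1}) for every sample and component.
lik = pi_old * norm(mu_old, sigma_old).pdf(x[:, None])   # (n, K)
post = lik / lik.sum(axis=1, keepdims=True)

# Q(Theta | Theta^{i-1}) evaluated at candidate parameters.
def Q(pi_, mu, sigma):
    log_term = np.log(pi_ * norm(mu, sigma).pdf(x[:, None]))
    return np.sum(log_term * post)

print(Q(pi_old, mu_old, sigma_old))
```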
Parametric methods (2- Expectation Maximization: EM)
For some distributions it is possible to get analytical expressions for the update $\Theta^i$. For example, if we assume $d$-dimensional Gaussian component distributions:
M-step:
$$\pi_k^{i} = \frac{1}{n} \sum_{j=1}^{n} p(k \mid x_j, \Theta^{i-1})$$
$$\mu_k^{i} = \frac{\sum_{j=1}^{n} x_j\; p(k \mid x_j, \Theta^{i-1})}{\sum_{j=1}^{n} p(k \mid x_j, \Theta^{i-1})}$$
$$\Sigma_k^{i} = \frac{\sum_{j=1}^{n} p(k \mid x_j, \Theta^{i-1})\, (x_j - \mu_k^{i})(x_j - \mu_k^{i})^T}{\sum_{j=1}^{n} p(k \mid x_j, \Theta^{i-1})}$$
E-step:
$$p(k \mid x_j, \Theta^{i-1}) = \frac{\pi_k^{i-1}\, p(x_j \mid \theta_k^{i-1})}{\sum_{l=1}^{K} \pi_l^{i-1}\, p(x_j \mid \theta_l^{i-1})}$$
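A minimal sketch of the three M-step updates for $d$-dimensional Gaussian components, assuming the E-step posteriors $p(k \mid x_j, \Theta^{i-1})$ are already available as an array `post` (the function name and layout are my own choices):

```python
import numpy as np

def m_step(x, post):
    """x: (n, d) data; post: (n, K) E-step posteriors p(k | x_j)."""
    n, d = x.shape
    nk = post.sum(axis=0)                  # soft counts per component

    pi_ = nk / n                           # new mixing weights
    mu = (post.T @ x) / nk[:, None]        # new means, shape (K, d)

    # New covariances: posterior-weighted outer products of centered data.
    cov = np.empty((post.shape[1], d, d))
    for k in range(post.shape[1]):
        xc = x - mu[k]                     # (n, d), centered at new mean
        cov[k] = (post[:, k, None] * xc).T @ xc / nk[k]
    return pi_, mu, cov
```

Note that the covariance update uses the new mean $\mu_k^i$, matching the formula above.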
Parametric methods (2- Expectation Maximization: EM)
Example:
MATLAB demo
[Figure: estimated density; x-axis 0-250, y-axis 0-0.02]