
Comp. Genomics
Recitation 6
14/11/06
ML and EM
Outline
• Maximum likelihood estimation
• HMM Example
• EM
• Baum-Welch algorithm
Maximum likelihood
• One of the methods for parameter
estimation
• Likelihood: L=P(Data|Parameters)
• Simple example:
• Simple coin with P(head)=p
• 10 coin tosses
• 6 heads, 4 tails
• L = P(Data|Params) = (10 choose 6) p^6 (1-p)^4
Maximum likelihood
• We want to find the p that maximizes
L = (10 choose 6) p^6 (1-p)^4
• Infi 1, remember?
• Log is a monotonically increasing function,
so we can maximize logL = log[(10 choose 6) p^6 (1-p)^4] =
log(10 choose 6) + 6·log(p) + 4·log(1-p)
• Differentiating with respect to p and setting to zero: 6/p - 4/(1-p) = 0
• Estimate for p: 0.6 (makes sense?)
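A quick numerical check of this estimate (a sketch, not part of the original recitation): evaluate the log-likelihood on a grid of p values and pick the maximizer.

```python
import numpy as np
from math import comb, log

heads, tails = 6, 4

def log_likelihood(p):
    # logL = log C(10,6) + 6*log p + 4*log(1-p)
    return log(comb(heads + tails, heads)) + heads * log(p) + tails * log(1 - p)

# Evaluate on a fine grid of p values and pick the maximizer.
grid = np.linspace(0.001, 0.999, 999)
p_hat = grid[np.argmax([log_likelihood(p) for p in grid])]
print(p_hat)  # ~0.6, matching the analytic estimate heads/(heads+tails)
```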
ML in Profile HMMs
• Transition probabilities:
• Mi → Mi+1
• Mi → Di+1
• Mi → Ii
• Ii → Mi+1
• Ii → Ii
• Di → Di+1
• Di → Mi+1
• Di → Ii
• Emission probabilities:
• Mi → a
• Ii → a
http://www.cs.huji.ac.il/~cbio/handouts/Class6.ppt
Parameter Estimation for HMMs
Input: X1,…,Xn independent training sequences
Goal: estimation of θ = (A, E) (model parameters)
Note: P(X1,…,Xn | θ) = Πi=1…n P(Xi | θ)   (indep.)
l(x1,…,xn | θ) = log P(X1,…,Xn | θ) = Σi=1…n log P(Xi | θ)
Case 1 - Estimation When State Sequence is Known:
Akl = #(k→l transitions that occurred)
Ek(b) = #(emissions of symbol b that occurred in state k)

Max. Likelihood Estimators:
• akl = Akl / Σl’ Akl’
• ek(b) = Ek(b) / Σb’ Ek(b’)

Small-sample or prior-knowledge correction (pseudocounts):
• A’kl = Akl + rkl
• E’k(b) = Ek(b) + rk(b)
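A small sketch of these estimators in code (illustrative only; the count matrices below are made up, not the example on the next slides), using a uniform pseudocount r = 1 for the correction:

```python
import numpy as np

# Hypothetical raw counts (rows = "from" state; columns = "to" state / symbol).
A = np.array([[4.0, 1.0, 0.0],       # transition counts A_kl
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 2.0]])
E = np.array([[4.0, 0.0, 0.0, 0.0],  # emission counts E_k(b), symbols A,C,G,T
              [0.0, 0.0, 3.0, 0.0],
              [0.0, 4.0, 0.0, 0.0]])

r = 1.0  # assumed uniform pseudocount r_kl = r_k(b) = 1 (Laplace-style correction)

# a_kl = A'_kl / sum_l' A'_kl'  and  e_k(b) = E'_k(b) / sum_b' E'_k(b')
a = (A + r) / (A + r).sum(axis=1, keepdims=True)
e = (E + r) / (E + r).sum(axis=1, keepdims=True)
print(a)
print(e)
```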
Example
• Suppose we are given the aligned
sequences:

**---*
AG---C
A-AT-C
AG-AA-
--AAAC
AG---C
• Suppose also that the “match” positions are
marked...
http://www.cs.huji.ac.il/~cbio/handouts/Class6.ppt
Calculating A, E
count transitions and emissions:

**---*
AG---C
A-AT-C
AG-AA-
--AAAC
AG---C

[Table skeleton: transition counts (M-M, M-D, M-I, I-M, I-D, I-I, D-M, D-D, D-I) and emission counts (A, C, G, T for match states; A, C, T, G for insert states) at positions 0-3, filled in on the next slide.]
http://www.cs.huji.ac.il/~cbio/handouts/Class6.ppt
Calculating A, E
count transitions and emissions:

**---*
AG---C
A-AT-C
AG-AA-
--AAAC
AG---C

transitions      0    1    2    3
M-M              4    3    2    4
M-D              1    1    0    0
M-I              0    0    1    0
I-M              0    0    2    0
I-D              0    0    1    0
I-I              0    0    4    0
D-M              -    0    0    1
D-D              -    1    0    0
D-I              -    0    2    0

emissions (match states)    0    1    2    3
A                           -    4    0    0
C                           -    0    0    4
G                           -    0    3    0
T                           -    0    0    0

emissions (insert states)   0    1    2    3
A                           0    0    6    0
C                           0    0    0    0
T                           0    0    1    0
G                           0    0    0    0

http://www.cs.huji.ac.il/~cbio/handouts/Class6.ppt
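These counts can be reproduced mechanically. Here is a small sketch (not from the slides) that walks each of the five aligned sequences above through the profile HMM implied by the marked match columns and tallies transitions and emissions; the end transition is counted as a transition out of the last position, matching the tables.

```python
# Sketch: derive profile-HMM state paths from the marked alignment and tally
# transition/emission counts, reproducing the tables above.
from collections import Counter

mask = "**---*"          # '*' marks match columns
seqs = ["AG---C", "A-AT-C", "AG-AA-", "--AAAC", "AG---C"]

transitions, emissions = Counter(), Counter()

for seq in seqs:
    path = [("M", 0)]                       # begin state = M0
    m = 0                                   # current match-state index
    for col, ch in enumerate(seq):
        if mask[col] == "*":                # match column
            m += 1
            if ch != "-":
                path.append(("M", m))
                emissions[("M", m, ch)] += 1
            else:
                path.append(("D", m))
        elif ch != "-":                     # insert column with a residue
            path.append(("I", m))
            emissions[("I", m, ch)] += 1
    path.append(("M", m + 1))               # end state, counted as an M target
    for (s1, i1), (s2, i2) in zip(path, path[1:]):
        transitions[(f"{s1}-{s2}", i1)] += 1

print(transitions[("M-M", 0)])   # 4
print(transitions[("I-I", 2)])   # 4
print(emissions[("I", 2, "A")])  # 6
```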
Estimating Maximum Likelihood
probabilities using Fractions

emission counts (match states)    0    1    2    3
A                                 -    4    0    0
C                                 -    0    0    4
G                                 -    0    3    0
T                                 -    0    0    0

emission counts (insert states)   0    1    2    3
A                                 0    0    6    0
C                                 0    0    0    0
T                                 0    0    1    0
G                                 0    0    0    0

emission probabilities (match states)    0    1    2    3
A                                        -    1    0    0
C                                        -    0    0    1
G                                        -    0    1    0
T                                        -    0    0    0

emission probabilities (insert states)   0    1    2    3
A                                        .25  .25  .86  .25
C                                        .25  .25  0    .25
T                                        .25  .25  .14  .25
G                                        .25  .25  0    .25

http://www.cs.huji.ac.il/~cbio/handouts/Class6.ppt
Estimating ML probabilities
(contd)

transition counts    0    1    2    3
M-M                  4    3    2    4
M-D                  1    1    0    0
M-I                  0    0    1    0
I-M                  0    0    2    0
I-D                  0    0    1    0
I-I                  0    0    4    0
D-M                  -    0    0    1
D-D                  -    1    0    0
D-I                  -    0    2    0

transition probabilities    0    1    2    3
M-M                         .8   .75  .66  1.0
M-D                         .2   .25  0    0
M-I                         0    0    .33  0
I-M                         .33  .33  .28  .33
I-D                         .33  .33  .14  .33
I-I                         .33  .33  .57  .33
D-M                         -    0    0    1
D-D                         -    1    0    0
D-I                         -    0    1    0

http://www.cs.huji.ac.il/~cbio/handouts/Class6.ppt
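The step from the count tables to the probability tables is a per-row normalization, with rows that have no observations falling back to a uniform distribution. A minimal sketch (illustrative code, not from the slides), shown here on the insert-state emission counts:

```python
import numpy as np

def normalize_counts(counts):
    # Turn each row of counts into probabilities; empty rows become uniform.
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    uniform = np.full_like(counts, 1.0 / counts.shape[1])
    return np.where(totals > 0, counts / np.where(totals > 0, totals, 1), uniform)

# Insert-state emission counts per position (columns A, C, T, G), from the table above.
insert_counts = [[0, 0, 0, 0],   # position 0
                 [0, 0, 0, 0],   # position 1
                 [6, 0, 1, 0],   # position 2
                 [0, 0, 0, 0]]   # position 3
print(normalize_counts(insert_counts))
# position 2 -> [0.857, 0, 0.143, 0] (the .86 / .14 in the table); others -> .25 each
```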
EM - Mixture example
• Assume we are given the heights of 100 individuals
(men/women): x1,…,x100
• We know that:
• The men’s heights are normally distributed with
(μm,σm)
• The women’s heights are normally distributed with
(μw,σw)
• If we knew the genders – estimation is “easy”
(How?)
• But we don’t know the genders in our data!
• The genders y1,…,y100 are unknown
• P(w), P(m) are unknown
Mixture example
• Our goal: estimate the parameters
(μm,σm), (μw,σw), P(m)
• A classic “estimation with missing data” problem
• (In an HMM: we know the emissions,
but not the states!)
• Expectation-Maximization (EM):
• Compute the “expected” gender for every
sample height
• Estimate the parameters using ML
• Iterate
EM
• Widely used in machine learning
• Using ML for parameter estimation at
every iteration guarantees that the
likelihood never decreases
• Eventually we reach a local maximum
• A good starting point is important
Mixture example
• If we have a mixture of M Gaussians, each
with a weight αi and a density with
parameters θi=(μi,σi)
• Likelihood of the observations (X): written out below
• The “incomplete-data” log-likelihood of
the sample x1,…,xN: also below
• Difficult to maximize directly…
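The two formulas referenced on this slide were images in the original; in the standard mixture-model notation used here (weights αi, component densities pi with parameters θi) they are:

```latex
p(x \mid \Theta) = \sum_{i=1}^{M} \alpha_i \, p_i(x \mid \theta_i)
\qquad
\log L(\Theta \mid X) = \sum_{j=1}^{N} \log \sum_{i=1}^{M} \alpha_i \, p_i(x_j \mid \theta_i)
```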
Mixture example
• Now we introduce y1,…,y100: hidden
variables telling us which Gaussian each
sample came from
• If we knew the ys, the likelihood would be (see below):
• Of course, we do not know the ys…
• We’ll do EM, starting from θg=(α1g ,..,αMg,
μ1g,..,μMg,σ1g,.., σMg)
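The complete-data log-likelihood referred to here (standard form, reconstructed since the slide showed an image):

```latex
\log L(\Theta \mid X, Y)
  = \sum_{j=1}^{N} \log\!\big( \alpha_{y_j} \, p_{y_j}(x_j \mid \theta_{y_j}) \big)
```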
Estimation
• Given θg, we can estimate the ys!
• We want to find the expected complete-data
log-likelihood Q(θ, θg) (written out below)
• The expectation is over the states of y
• Bayes’ rule, P(Y|X) = P(X|Y)P(Y)/P(X), gives the
posterior of each y (see below)
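Spelling these two steps out in standard EM notation (the slide formulas were images): the expected complete-data log-likelihood Q and the Bayes-rule posterior of each hidden label under the current guess θg are

```latex
Q(\Theta, \Theta^g)
  = E\!\left[\log L(\Theta \mid X, Y) \;\middle|\; X, \Theta^g\right]
  = \sum_{y} \log L(\Theta \mid X, y)\, P(y \mid X, \Theta^g)

p(i \mid x_j, \Theta^g)
  = \frac{\alpha_i^{g}\, p_i(x_j \mid \theta_i^{g})}
         {\sum_{k=1}^{M} \alpha_k^{g}\, p_k(x_j \mid \theta_k^{g})}
```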
Estimation
• We write down the Q in full (see below)
• Daunting?
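Written out in full over all assignments y = (y1,…,yN), the Q on this slide takes the form (a standard expansion, reconstructed here since the slide showed an image):

```latex
Q(\Theta, \Theta^g)
  = \sum_{y} \sum_{j=1}^{N}
      \log\!\big( \alpha_{y_j}\, p_{y_j}(x_j \mid \theta_{y_j}) \big)
      \prod_{k=1}^{N} p(y_k \mid x_k, \Theta^g)
```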
Estimation
• Simplifying…
• Now the Q becomes (see below):
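After the simplification, the Q referred to above reduces to the standard form:

```latex
Q(\Theta, \Theta^g)
  = \sum_{i=1}^{M} \sum_{j=1}^{N}
      \big( \log \alpha_i + \log p_i(x_j \mid \theta_i) \big)\,
      p(i \mid x_j, \Theta^g)
```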
Maximization
• Now we want to find parameter
estimates that maximize Q (see below)
• Infi 2, remember?
• To impose the constraint Σiαi=1, we
introduce a Lagrange multiplier λ
• After summing both sides over l we get the
update for the αs (see below)
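The maximization step and the resulting update for the mixture weights (standard derivation, filled in here since the slide formulas were images):

```latex
\Theta^{g+1} = \arg\max_{\Theta} Q(\Theta, \Theta^g), \qquad
\frac{\partial}{\partial \alpha_l}
  \Big[ Q(\Theta,\Theta^g) + \lambda\big(\textstyle\sum_i \alpha_i - 1\big) \Big]
  = \sum_{j=1}^{N} \frac{p(l \mid x_j, \Theta^g)}{\alpha_l} + \lambda = 0
```

Summing both sides over l gives λ = -N, hence

```latex
\alpha_l^{g+1} = \frac{1}{N} \sum_{j=1}^{N} p(l \mid x_j, \Theta^g)
```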
Maximization
• Estimating μig+1, σig+1 is more difficult
• Out of scope here
• What comes out is actually quite
straightforward (see below):
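The resulting updates (the standard Gaussian-mixture M-step, reconstructed since the slide showed an image):

```latex
\mu_i^{g+1}
  = \frac{\sum_{j=1}^{N} x_j \, p(i \mid x_j, \Theta^g)}
         {\sum_{j=1}^{N} p(i \mid x_j, \Theta^g)}, \qquad
(\sigma_i^{g+1})^2
  = \frac{\sum_{j=1}^{N} p(i \mid x_j, \Theta^g)\, (x_j - \mu_i^{g+1})^2}
         {\sum_{j=1}^{N} p(i \mid x_j, \Theta^g)}
```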
What you need to know about
EM:
• When: If we want to estimate model
parameters, and some of the data is
“missing”
• Why: Maximizing likelihood directly is very
difficult
• How:
• Initial guess of the parameters
• Finding a proper expression for Q(θ, θg)
• Differentiating and finding the ML estimators
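Putting the pieces together for the two-Gaussian heights example, a minimal EM sketch (illustrative code with synthetic data and made-up starting values, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "heights": 50 men ~ N(175, 7), 50 women ~ N(162, 6); genders are hidden.
x = np.concatenate([rng.normal(175, 7, 50), rng.normal(162, 6, 50)])

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Initial guess theta^g = (alpha, mu, sigma) for the two components.
alpha = np.array([0.5, 0.5])
mu = np.array([170.0, 160.0])
sigma = np.array([10.0, 10.0])

for _ in range(50):
    # E-step: posterior p(i | x_j, theta^g) for each component i and sample j.
    dens = alpha[:, None] * normal_pdf(x[None, :], mu[:, None], sigma[:, None])
    post = dens / dens.sum(axis=0, keepdims=True)          # shape (2, N)

    # M-step: ML estimates weighted by the posteriors.
    Ni = post.sum(axis=1)
    alpha = Ni / len(x)
    mu = (post * x).sum(axis=1) / Ni
    sigma = np.sqrt((post * (x - mu[:, None]) ** 2).sum(axis=1) / Ni)

print(alpha, mu, sigma)
```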
EM estimation in HMMs
Input: X1,…,Xn independent training sequences
Baum-Welch alg. (1972):
(1) Expectation:
• compute expected # of k→l state transitions:
P(πi=k, πi+1=l | X, θ) =
[1/P(X)]·fk(i)·akl·el(xi+1)·bl(i+1)
Akl = Σj [1/P(Xj)] · Σi fkj(i) · akl · el(xj,i+1) · blj(i+1)
• compute expected # of symbol b appearances in state k:
Ek(b) = Σj [1/P(Xj)] · Σ{i | xj,i=b} fkj(i) · bkj(i)   (ex.)
(2) Maximization:
• re-compute new parameters from A, E using max.
likelihood.
repeat (1)+(2) until improvement ≤ ε
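A compact sketch of one Baum-Welch iteration for a plain (non-profile) HMM, using the forward values f and backward values b exactly as in the formulas above. The initial-state distribution pi0, the toy parameters, and the observation sequences are assumptions for illustration, and no scaling is done, so it only suits short sequences.

```python
import numpy as np

def forward(x, a, e, pi0):
    K, T = a.shape[0], len(x)
    f = np.zeros((T, K))
    f[0] = pi0 * e[:, x[0]]
    for t in range(1, T):
        f[t] = (f[t - 1] @ a) * e[:, x[t]]
    return f                      # f[t, k] = P(x_1..x_t, state_t = k)

def backward(x, a, e):
    K, T = a.shape[0], len(x)
    b = np.zeros((T, K))
    b[-1] = 1.0
    for t in range(T - 2, -1, -1):
        b[t] = a @ (e[:, x[t + 1]] * b[t + 1])
    return b                      # b[t, k] = P(x_{t+1}..x_T | state_t = k)

def baum_welch_iteration(seqs, a, e, pi0):
    A = np.zeros_like(a)          # expected transition counts A_kl
    E = np.zeros_like(e)          # expected emission counts E_k(b)
    for x in seqs:
        f, b = forward(x, a, e, pi0), backward(x, a, e)
        px = f[-1].sum()          # P(X | theta)
        for t in range(len(x) - 1):
            # P(state_t=k, state_{t+1}=l | X) = f_k(t)·a_kl·e_l(x_{t+1})·b_l(t+1) / P(X)
            A += np.outer(f[t], e[:, x[t + 1]] * b[t + 1]) * a / px
        for t in range(len(x)):
            E[:, x[t]] += f[t] * b[t] / px
    # Maximization: re-estimate the parameters from the expected counts.
    return A / A.sum(axis=1, keepdims=True), E / E.sum(axis=1, keepdims=True)

# Toy usage: 2 states, 2 symbols, a few short observation sequences.
a = np.array([[0.7, 0.3], [0.4, 0.6]])
e = np.array([[0.9, 0.1], [0.2, 0.8]])
pi0 = np.array([0.5, 0.5])
seqs = [np.array([0, 1, 1, 0]), np.array([1, 1, 0, 1, 0])]
for _ in range(10):
    a, e = baum_welch_iteration(seqs, a, e, pi0)
print(a)
print(e)
```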