Hidden Markov Model (continued)
Finite State Markov Chain
A discrete-time stochastic process, consisting of a domain D of m states {1,…,m} and:
1. An m-dimensional initial distribution vector (p(1),…,p(m)).
2. An m×m transition probability matrix M = (a_st).
• For each integer L, a Markov chain assigns a probability to sequences (x1,…,xL) over D (i.e., xi ∈ D) as follows:

p((x_1, x_2, \ldots, x_L)) = p(X_1 = x_1) \prod_{i=2}^{L} p(X_i = x_i \mid X_{i-1} = x_{i-1})
                           = p(x_1) \prod_{i=2}^{L} a_{x_{i-1} x_i}
                           \overset{\mathrm{def}}{=} \prod_{i=1}^{L} a_{x_{i-1} x_i}        (i.e., a_{0 x_1} \equiv p(x_1))

Similarly, (X1,…, Xi,…) is a sequence of probability distributions over D.
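As a concrete illustration of the formula above, here is a minimal sketch in Python; the two-state chain, its initial vector, and its transition matrix are hypothetical placeholders, not data from the lecture:

import numpy as np

# Hypothetical 2-state chain over D = {1, 2} (encoded as indices 0 and 1).
p_init = np.array([0.6, 0.4])          # initial distribution (p(1), p(2))
A = np.array([[0.7, 0.3],              # A[s, t] = a_st = p(X_i = t | X_{i-1} = s)
              [0.2, 0.8]])

def chain_probability(x):
    """p(x_1,...,x_L) = p(x_1) * prod_{i=2..L} a_{x_{i-1} x_i}"""
    prob = p_init[x[0]]
    for prev, cur in zip(x, x[1:]):
        prob *= A[prev, cur]
    return prob

print(chain_probability([0, 1, 1, 0]))  # probability of the sequence (1, 2, 2, 1)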
Use of Markov Chains:
Sequences with CpG Islands
In human genomes the pair CG often transforms to (methyl-C)G, which in turn often transforms to TG. Hence the pair CG appears less often than expected from the independent frequencies of C and G alone.
For biological reasons, this process is sometimes suppressed in short stretches of the genome, such as in the start regions of many genes. These areas are called CpG islands (p denotes "pair").
Modeling sequences with CpG Islands
The "+" model: use the transition matrix A+ = (a+_st), where a+_st = the probability that t follows s in a CpG island.
The "-" model: use the transition matrix A- = (a-_st), where a-_st = the probability that t follows s in a non-CpG island.
(Stationary) Markov Chains
[Diagram: Markov chain X1 → X2 → … → XL-1 → XL]

• Every variable Xi has a domain. For example, suppose the domain is the letters {a, c, t, g}.
• Every variable is associated with a local (transition) probability table p(Xi = xi | Xi-1 = xi-1), and p(X1 = x1).
• The joint distribution is given by

p(X_1 = x_1, \ldots, X_L = x_L) = p(X_1 = x_1) p(X_2 = x_2 \mid X_1 = x_1) \cdots p(X_L = x_L \mid X_{L-1} = x_{L-1})
                                = p(X_1 = x_1) \prod_{i=2}^{L} p(X_i = x_i \mid X_{i-1} = x_{i-1})

In short: p(x_1, \ldots, x_L) = \prod_{i=1}^{L} p(x_i \mid x_{i-1})

Stationary means that the transition probability tables do not depend on i.
Question 1: Using two Markov chains
For CpG islands:

[Diagram: Markov chain X1 → X2 → … → XL-1 → XL]

We need to specify pI(xi | xi-1), where I stands for CpG Island:

            Xi = A    Xi = C     Xi = T     Xi = G
Xi-1 = A     0.2       0.3        0.4        0.1
Xi-1 = C     0.4       p(C|C)     p(T|C)     high
Xi-1 = T     0.1       p(C|T)     p(T|T)     p(G|T)
Xi-1 = G     0.3       p(C|G)     p(T|G)     p(G|G)

Rows must add up to one; columns need not.
Question 1: Using two Markov chains
For non-CpG islands:

[Diagram: Markov chain X1 → X2 → … → XL-1 → XL]

We need to specify pN(xi | xi-1), where N stands for Non-CpG island:

            Xi = A    Xi = C     Xi = T     Xi = G
Xi-1 = A     0.2       0.3        0.25       0.25
Xi-1 = C     0.4       p(C|C)     p(T|C)     low
Xi-1 = T     0.1       p(C|T)     p(T|T)     high
Xi-1 = G     0.3       p(C|G)     p(T|G)     p(G|G)

Some entries may or may not change compared to pI(xi | xi-1).
Question 1: Log Odds-Ratio test
Comparing the two options via the log odds-ratio test yields

\log Q = \log \frac{p_I(x_1 \cdots x_L)}{p_N(x_1 \cdots x_L)} = \sum_i \log \frac{p_I(x_i \mid x_{i-1})}{p_N(x_i \mid x_{i-1})}

If log Q > 0, then a CpG island is more likely.
If log Q < 0, then a non-CpG island is more likely.
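A minimal sketch of this test in Python. The two transition tables below are illustrative placeholders (chosen only so that p(G|C) is high in the "+" model and low in the "-" model), not the course's numbers, and the initial-symbol term is dropped by assuming both models share the same initial distribution:

import numpy as np

BASES = "ACTG"
IDX = {b: i for i, b in enumerate(BASES)}

# Placeholder matrices: row = x_{i-1}, column = x_i; each row sums to one.
P_I = np.array([[0.20, 0.30, 0.25, 0.25],   # "+" model (CpG island)
                [0.15, 0.35, 0.10, 0.40],
                [0.15, 0.35, 0.35, 0.15],
                [0.15, 0.35, 0.20, 0.30]])
P_N = np.array([[0.30, 0.20, 0.30, 0.20],   # "-" model (non-CpG island)
                [0.30, 0.30, 0.30, 0.10],
                [0.20, 0.25, 0.30, 0.25],
                [0.25, 0.25, 0.30, 0.20]])

def log_odds(seq):
    """log Q = sum_i log( p_I(x_i | x_{i-1}) / p_N(x_i | x_{i-1}) )"""
    q = 0.0
    for prev, cur in zip(seq, seq[1:]):
        q += np.log(P_I[IDX[prev], IDX[cur]] / P_N[IDX[prev], IDX[cur]])
    return q

print(log_odds("CGCGCG"))   # > 0, so a CpG island is more likely under these toy tables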
Question 2: Finding CpG Islands
Given a long genomic string with possible CpG islands, we define a Markov chain over 8 states, all interconnected (hence it is ergodic):

[Diagram: eight fully interconnected states A+, C+, G+, T+ and A-, C-, G-, T-]

The problem is that we don't know the sequence of states that are traversed, but just the sequence of letters. Therefore we use here a Hidden Markov Model.
Hidden Markov Model
[Diagram: hidden states S1 → S2 → … → SL-1 → SL, connected by transition arrows M; each state Si emits a symbol xi through an emission arrow T]

A Markov chain (s1,…,sL):

p(s_1, \ldots, s_L) = \prod_{i=1}^{L} p(s_i \mid s_{i-1})

and for each state s and a symbol x we have p(Xi = x | Si = s).
Application in communication: the message sent is (s1,…,sm) but we receive (x1,…,xm). Compute the most likely message sent.
Application in speech recognition: the word said is (s1,…,sm) but we recorded (x1,…,xm). Compute the most likely word said.
Hidden Markov Model
Notations:
Markov Chain transition probabilities: p(Si+1= t|Si = s) = ast
Emission probabilities: p(Xi = b| Si = s) = es(b)
For Markov chains we know:

p(s) = p(s_1, \ldots, s_L) = \prod_{i=1}^{L} p(s_i \mid s_{i-1})
What is p(s,x) = p(s1,…,sL;x1,…,xL) ?
Hidden Markov Model
p(Xi = b | Si = s) = es(b) means that the probability of xi depends only on the value of si.
Formally, this is equivalent to the conditional independence assumption:

p(X_i = x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_L, s_1, \ldots, s_L) = e_{s_i}(x_i)

Thus

p(s, x) = p(s_1, \ldots, s_L; x_1, \ldots, x_L) = \prod_{i=1}^{L} p(s_i \mid s_{i-1}) \, e_{s_i}(x_i)
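A minimal sketch of this joint probability in Python; the two-state, two-symbol parameters are hypothetical placeholders, and row 0 of the transition table plays the role of the initial distribution a_{0,s}:

import numpy as np

# Hypothetical HMM: states 1 and 2, symbols 0 and 1.
a = np.array([[0.0, 0.5, 0.5],    # row 0: initial distribution a_{0,s}
              [0.0, 0.9, 0.1],    # rows 1, 2: a_st = p(S_{i+1} = t | S_i = s)
              [0.0, 0.2, 0.8]])
e = np.array([[0.0, 0.0],         # dummy row for the start state 0
              [0.7, 0.3],         # e_1(b)
              [0.4, 0.6]])        # e_2(b)

def joint_probability(states, symbols):
    """p(s, x) = prod_i p(s_i | s_{i-1}) * e_{s_i}(x_i), with s_0 = 0."""
    prob, prev = 1.0, 0
    for s, x in zip(states, symbols):
        prob *= a[prev, s] * e[s, x]
        prev = s
    return prob

print(joint_probability([1, 1, 2], [0, 1, 1]))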
Hidden Markov Model for CpG Islands
The states:
Domain(Si) = {+, -} × {A, C, T, G}   (8 values)

[Diagram: hidden states S1 → S2 → … → SL-1 → SL emitting X1, X2, …, XL-1, XL]
In this representation P(xi| si) = 0 or 1 depending on whether xi is
consistent with si . E.g. xi= G is consistent with si=(+,G) and with
si=(-,G) but not with any other state of si.
The query of interest:

(s_1^*, \ldots, s_L^*) = \arg\max_{(s_1, \ldots, s_L)} p(s_1, \ldots, s_L \mid x_1, \ldots, x_L)
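For the CpG model specifically, the 0/1 emission table described above can be written down directly; a minimal sketch in Python (the state and symbol encoding is my own, not from the slides):

# The 8 hidden states of the CpG HMM: each state is a (region, letter) pair.
STATES = [(region, letter) for region in "+-" for letter in "ACTG"]

# e_s(b) = 1 if the emitted letter b equals the letter component of state s,
# else 0: the emission is deterministic in this representation.
emission = {s: {b: (1.0 if s[1] == b else 0.0) for b in "ACTG"} for s in STATES}

print(emission[("+", "G")]["G"])   # 1.0: G is consistent with (+, G)
print(emission[("-", "G")]["G"])   # 1.0: ... and with (-, G)
print(emission[("+", "C")]["G"])   # 0.0: but not with any other state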
Hidden Markov Model
Questions:
Given the “visible” sequence x = (x1,…,xL), find:
1. A most probable (hidden) path.
2. The probability of x.
3. For each i = 1, .., L, and for each state k, p(si=k| x)
1. Most Probable state path
First Question: Given an output sequence x = (x1,…,xL), a most probable path s* = (s*1,…,s*L) is one which maximizes p(s | x):

s^* = (s_1^*, \ldots, s_L^*) = \arg\max_{(s_1, \ldots, s_L)} p(s_1, \ldots, s_L \mid x_1, \ldots, x_L)
Viterbi’s Algorithm for Most Probable Path
The task: compute

\arg\max_{(s_1, \ldots, s_L)} p(s_1, \ldots, s_L; x_1, \ldots, x_L)
Let the states be {1, …, m}
Idea: for i = 1,…,L and for each state l, compute:
vl(i) = the probability p(s1,…,si; x1,…,xi) of a most probable path up to i which ends in state l (i.e., si = l).
Viterbi’s algorithm for most probable path
[Diagram: a path s1, …, si-1 ending in state l at position i]

vl(i) = the probability p(s1,…,si; x1,…,xi) of a most probable path up to i which ends in state l.

Exercise: For i = 1,…,L and for each state l:

v_l(i) = e_l(x_i) \cdot \max_k \{ v_k(i-1) \cdot a_{kl} \}
Viterbi’s algorithm
We add the special initial state 0.
Initialization: v0(0) = 1 , vk(0) = 0 for k > 0
For i=1 to L do for each state l :
vl(i) = el(xi) · max_k {vk(i-1) · akl}
ptri(l) = arg max_k {vk(i-1) · akl}
[storing the previous state for reconstructing the path]
Termination:
Result: p(s1*,…,sL*; x1,…,xL) = max_k {vk(L)}
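A minimal Python sketch of the algorithm just described; the state/symbol encoding and the toy parameters are hypothetical placeholders (the special initial state 0 is represented implicitly by the vector init, so that init[l] = a_{0l}):

import numpy as np

def viterbi(x, init, trans, emit):
    """x: symbol indices; init[l] = a_{0l}; trans[k, l] = a_kl; emit[l, b] = e_l(b).
    Returns a most probable state path and its joint probability with x."""
    L, m = len(x), len(init)
    v = np.zeros((L, m))
    ptr = np.zeros((L, m), dtype=int)
    v[0] = init * emit[:, x[0]]                      # v_l(1) = a_0l * e_l(x_1)
    for i in range(1, L):
        for l in range(m):
            scores = v[i - 1] * trans[:, l]          # v_k(i-1) * a_kl over all k
            ptr[i, l] = int(np.argmax(scores))       # best previous state
            v[i, l] = emit[l, x[i]] * scores[ptr[i, l]]
    path = [int(np.argmax(v[-1]))]                   # termination: max_k v_k(L)
    for i in range(L - 1, 0, -1):                    # reconstruct the path backwards
        path.append(int(ptr[i, path[-1]]))
    return path[::-1], float(v[-1].max())

# Toy parameters (placeholders): 2 states, 2 symbols.
init  = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
emit  = np.array([[0.7, 0.3], [0.4, 0.6]])
print(viterbi([0, 1, 1], init, trans, emit))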
2. Computing p(x)
Given an output sequence x = (x1,…,xL), compute the probability that this sequence was generated:

p(x) = \sum_{s} p(x, s)

where the summation is taken over all state-paths s generating x.
Forward algorithm for computing p(x)
The task: compute

p(x) = \sum_{s} p(x, s)

Idea: for i = 1,…,L and for each state l, compute:
fl(i) = p(x1,…,xi; si = l), the total probability of all the paths which emit (x1,…,xi) and end in state si = l.
Use the recursive formula:

f_l(i) = e_l(x_i) \cdot \sum_{k} f_k(i-1) \cdot a_{kl}
Forward algorithm for computing p(x)
Similar to Viterbi’s algorithm:
Initialization: f0(0) := 1 , fk(0) := 0 for k>0
For i=1 to L do for each state l :
fl(i) = el(xi) ∑k fk(i-1)akl
Result: p(x1,…,xL) = \sum_k f_k(L)
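A minimal Python sketch of the forward algorithm, using the same placeholder encoding and toy parameters as the Viterbi sketch above:

import numpy as np

def forward(x, init, trans, emit):
    """Returns the table f[i, l] = f_l(i+1) and p(x) = sum_k f_k(L)."""
    L, m = len(x), len(init)
    f = np.zeros((L, m))
    f[0] = init * emit[:, x[0]]                        # f_l(1) = a_0l * e_l(x_1)
    for i in range(1, L):
        f[i] = emit[:, x[i]] * (f[i - 1] @ trans)      # f_l(i) = e_l(x_i) * sum_k f_k(i-1) a_kl
    return f, float(f[-1].sum())

init  = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
emit  = np.array([[0.7, 0.3], [0.4, 0.6]])
print(forward([0, 1, 1], init, trans, emit)[1])        # p(x) for the toy sequence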
3. The distribution of Si, given x
Given an output sequence x = (x1,…,xL), compute for each i = 1,…,L and for each state k the probability that si = k, i.e., p(si = k | x).
This helps to answer queries like: what is the probability that si is in a CpG island, etc.
Solution in two stages
1. For each i and each state k, compute p(si=k | x1,…,xL).
2. Do the same computation for every i = 1, .. , L but
without repeating the first task L times.
Computing for a single i:
p(s_i \mid x_1, \ldots, x_L) = \frac{p(s_i, x_1, \ldots, x_L)}{p(x_1, \ldots, x_L)}
Decomposing the computation
P(x1,…,xL, si) = P(x1,…,xi, si) · P(xi+1,…,xL | x1,…,xi, si)
(by the equality p(A,B) = p(A) p(B|A)).
P(x1,…,xi, si) = fsi(i) ≡ F(si), so we are left with the task of computing P(xi+1,…,xL | x1,…,xi, si) ≡ B(si).
Decomposing the computation
Exercise: Show from the definitions of Markov Chain
and Hidden Markov Chain that:
P(xi+1,…,xL | x1,…,xi,si) = P(xi+1,…,xL | si)
Denote P(xi+1,…,xL | si) ≡ B(si).
Decomposing the computation
Summary:
P(x1,…,xL, si) = P(x1,…,xi, si) · P(xi+1,…,xL | x1,…,xi, si)
             = P(x1,…,xi, si) · P(xi+1,…,xL | si) ≡ F(si) · B(si)
The second equality is due to the conditional independence of {xi+1,…,xL} and {x1,…,xi} given si (by the Exercise).
F(si): The Forward algorithm:
Initialization: F(0) = 1
For i = 1 to L, for each value of si:

F(s_i) = e_{s_i}(x_i) \cdot \sum_{s_{i-1}} F(s_{i-1}) \, a_{s_{i-1} s_i}

The algorithm computes F(si) = P(x1,…,xi, si) for i = 1,…,L (namely, considering the evidence up to time slot i).
B(si): The backward algorithm
[Diagram: states Si → Si+1 → … → SL-1 → SL emitting xi+1, …, xL]
The task: Compute B(si) = P(xi+1,…,xL|si) for i=L-1,…,1 (namely,
considering evidence after time slot i).
{First step, step L-1: compute B(sL-1).}

P(x_L \mid s_{L-1}) = \sum_{s_L} P(x_L, s_L \mid s_{L-1}) = \sum_{s_L} P(s_L \mid s_{L-1}) \, P(x_L \mid s_L)

{Step i: compute B(si) from B(si+1).}

\underbrace{P(x_{i+1}, \ldots, x_L \mid s_i)}_{B(s_i)} = \sum_{s_{i+1}} P(s_{i+1} \mid s_i) \, P(x_{i+1} \mid s_{i+1}) \, \underbrace{P(x_{i+2}, \ldots, x_L \mid s_{i+1})}_{B(s_{i+1})}
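A minimal Python sketch of this backward recursion, under the same placeholder encoding as the earlier sketches; b[i, k] stands for B(si = k), and the entries for position L are initialized to 1:

import numpy as np

def backward(x, trans, emit):
    """b[i, k] = B(s_i = k) = P(x_{i+1}, ..., x_L | s_i = k); b[L-1, :] = 1."""
    L, m = len(x), trans.shape[0]
    b = np.ones((L, m))
    for i in range(L - 2, -1, -1):
        # B(s_i) = sum over s_{i+1} of P(s_{i+1}|s_i) P(x_{i+1}|s_{i+1}) B(s_{i+1})
        b[i] = trans @ (emit[:, x[i + 1]] * b[i + 1])
    return b

trans = np.array([[0.9, 0.1], [0.2, 0.8]])
emit  = np.array([[0.7, 0.3], [0.4, 0.6]])
print(backward([0, 1, 1], trans, emit))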
The combined answer
1. To compute the probability that Si = si given {x1,…,xL}: run the forward algorithm to compute F(si) = P(x1,…,xi, si) and the backward algorithm to compute B(si) = P(xi+1,…,xL | si); the product F(si)B(si) = P(x1,…,xL, si), divided by p(x1,…,xL), is the answer (for every possible value si), as in the formula for p(si | x1,…,xL) above.
2. To compute these probabilities for every i, simply run the forward and backward algorithms once, storing F(si) and B(si) for every i (and every value of si), and compute F(si)B(si) for every i.
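Putting the two passes together, a minimal Python sketch of the combined computation (the toy parameters are placeholders; each returned row is p(si = · | x) and sums to one):

import numpy as np

def posteriors(x, init, trans, emit):
    """posterior[i, k] = p(s_i = k | x) = F(s_i) B(s_i) / p(x)."""
    L, m = len(x), len(init)
    f, b = np.zeros((L, m)), np.ones((L, m))
    f[0] = init * emit[:, x[0]]
    for i in range(1, L):                              # forward pass: F(s_i)
        f[i] = emit[:, x[i]] * (f[i - 1] @ trans)
    for i in range(L - 2, -1, -1):                     # backward pass: B(s_i)
        b[i] = trans @ (emit[:, x[i + 1]] * b[i + 1])
    px = f[-1].sum()                                   # p(x) from the forward table
    return f * b / px

init  = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
emit  = np.array([[0.7, 0.3], [0.4, 0.6]])
print(posteriors([0, 1, 1], init, trans, emit))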
Time and Space Complexity of the
forward/backward algorithms
Time complexity is O(m²L), where m is the number of states. It is linear in the length of the chain, provided the number of states is a constant.
Space complexity is also O(m²L).