Pattern Classification
All materials in these slides were taken from
Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000
with the permission of the authors and the publisher
Chapter 3 (Part 3): Maximum-Likelihood and Bayesian Parameter Estimation (Section 3.10)
• Hidden Markov Model: Extension of Markov Chains
Hidden Markov Model (HMM)
• Interaction of the visible states with the hidden states: $\sum_k b_{jk} = 1$ for all $j$, where $b_{jk} = P(v_k(t) \mid \omega_j(t))$ (see the sketch after this list).
• Three problems are associated with this model:
  • The evaluation problem
  • The decoding problem
  • The learning problem
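A quick illustration of the normalization constraint on the emission probabilities $b_{jk}$, as a minimal Python sketch (the matrix values are invented for illustration):

```python
import numpy as np

# Invented emission matrix with b_jk = P(v_k | omega_j): row j is the
# distribution over visible symbols emitted from hidden state omega_j,
# so every row must sum to 1.
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
assert np.allclose(B.sum(axis=1), 1.0)  # sum_k b_jk = 1 for all j
```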
• The evaluation problem
The probability that the model produces a sequence $V^T$ of visible states is:

$$P(V^T) = \sum_{r=1}^{r_{\max}} P(V^T \mid \omega_r^T)\, P(\omega_r^T)$$

where each $r$ indexes a particular sequence $\omega_r^T = \{\omega(1), \omega(2), \ldots, \omega(T)\}$ of $T$ hidden states. The two factors are:

$$P(V^T \mid \omega_r^T) = \prod_{t=1}^{T} P(v(t) \mid \omega(t)) \qquad (1)$$

$$P(\omega_r^T) = \prod_{t=1}^{T} P(\omega(t) \mid \omega(t-1)) \qquad (2)$$
Using equations (1) and (2), we can write:

$$P(V^T) = \sum_{r=1}^{r_{\max}} \prod_{t=1}^{T} P(v(t) \mid \omega(t))\, P(\omega(t) \mid \omega(t-1))$$

Interpretation: the probability that we observe the particular sequence of $T$ visible states $V^T$ is equal to the sum, over all $r_{\max}$ possible sequences of hidden states, of the conditional probability that the system has made a particular transition multiplied by the probability that it then emitted the visible symbol in our target sequence.

Example: Let $\omega_1, \omega_2, \omega_3$ be the hidden states; $v_1, v_2, v_3$ be the visible states; and $V^3 = \{v_1, v_2, v_3\}$ the sequence of visible states.

$$P(\{v_1, v_2, v_3\}) = P(\omega_1)\,P(v_1 \mid \omega_1)\,P(\omega_2 \mid \omega_1)\,P(v_2 \mid \omega_2)\,P(\omega_3 \mid \omega_2)\,P(v_3 \mid \omega_3) + \ldots$$

(possible terms in the sum = all possible $3^3 = 27$ cases!)
First possibility: $\omega_1(t=1) \to v_1$, $\omega_2(t=2) \to v_2$, $\omega_3(t=3) \to v_3$

Second possibility: $\omega_2(t=1) \to v_1$, $\omega_3(t=2) \to v_2$, $\omega_1(t=3) \to v_3$

$$P(\{v_1, v_2, v_3\}) = P(\omega_2)\,P(v_1 \mid \omega_2)\,P(\omega_3 \mid \omega_2)\,P(v_2 \mid \omega_3)\,P(\omega_1 \mid \omega_3)\,P(v_3 \mid \omega_1) + \ldots$$
Therefore:

$$P(\{v_1, v_2, v_3\}) = \sum_{\substack{\text{possible sequences}\\ \text{of hidden states}}} \prod_{t=1}^{3} P(v(t) \mid \omega(t))\, P(\omega(t) \mid \omega(t-1))$$
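To make this sum concrete, here is a minimal brute-force sketch in Python (the numeric values of $\pi$, $A$, $B$ are invented for illustration; only the structure of the sum comes from the equation above):

```python
import itertools
import numpy as np

# Invented example parameters: 3 hidden states, 3 visible symbols.
pi = np.array([0.5, 0.3, 0.2])        # pi_i = P(omega(1) = omega_i)
A = np.array([[0.6, 0.3, 0.1],        # a_ij = P(omega(t+1)=omega_j | omega(t)=omega_i)
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
B = np.array([[0.7, 0.2, 0.1],        # b_jk = P(v(t)=v_k | omega(t)=omega_j)
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

V = [0, 1, 2]  # the observed sequence v1, v2, v3, as 0-based symbol indices

# Sum over all 3^3 = 27 hidden-state sequences, term by term.
total = 0.0
for states in itertools.product(range(3), repeat=len(V)):
    p = pi[states[0]] * B[states[0], V[0]]       # first factor uses pi
    for t in range(1, len(V)):
        p *= A[states[t - 1], states[t]] * B[states[t], V[t]]
    total += p

print(f"P(V^3) = {total:.6f}")
```

Each pass through the loop computes one of the 27 terms, e.g. the "first possibility" and "second possibility" spelled out above.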
Evaluation
• HMM forward
• HMM backward
• Example 3

HMM Forward
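A minimal sketch of the forward recursion (the function name and layout are assumptions of the standard algorithm, not taken verbatim from the text):

```python
import numpy as np

def forward(pi, A, B, V):
    """Forward algorithm: alpha[t, j] = P(v(1..t), omega(t) = omega_j).

    Computes P(V^T) in O(T * c^2) operations instead of the O(c^T)
    terms of the brute-force sum above.
    """
    T, c = len(V), len(pi)
    alpha = np.zeros((T, c))
    alpha[0] = pi * B[:, V[0]]                      # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, V[t]]  # induction step
    return alpha[-1].sum()                          # termination: sum over final states
```

With the same invented $\pi$, $A$, $B$ and V = [0, 1, 2] as above, forward(pi, A, B, V) returns the same value as the 27-term brute-force sum.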
Left-to-Right model (speech recognition)
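In a left-to-right model, used in speech recognition, a state may only stay put or advance, never move back. A small illustrative sketch of such a transition matrix (the size and probabilities are invented):

```python
import numpy as np

# Invented 4-state left-to-right transition matrix: a_ij = 0 for j < i,
# so the state sequence can only stay in place or move forward.
A_left_to_right = np.array([
    [0.7, 0.3, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.8, 0.2],
    [0.0, 0.0, 0.0, 1.0],  # final (absorbing) state
])
```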
• The decoding problem (optimal state sequence)

Given a sequence of visible states $V^T$, the decoding problem is to find the most probable sequence of hidden states.

This problem can be expressed mathematically as: find the single "best" sequence of hidden states $\omega(1), \omega(2), \ldots, \omega(T)$ such that:

$$\{\omega(1), \omega(2), \ldots, \omega(T)\} = \arg\max_{\omega(1), \ldots, \omega(T)} P\big[\omega(1), \omega(2), \ldots, \omega(T), v(1), v(2), \ldots, v(T) \mid \lambda\big]$$

Note that the summation has disappeared, since we want to find only the one unique best case!
where $\lambda = [\pi, A, B]$ with:

$\pi = P(\omega(1) = \omega_i)$ (initial state probability)
$A = a_{ij} = P(\omega(t+1) = \omega_j \mid \omega(t) = \omega_i)$ (transition probabilities)
$B = b_{jk} = P(v(t) = v_k \mid \omega(t) = \omega_j)$ (emission probabilities)
In the preceding example, this computation corresponds to the selection of the best path amongst:

$\{\omega_1(t=1), \omega_2(t=2), \omega_3(t=3)\}$, $\{\omega_2(t=1), \omega_3(t=2), \omega_1(t=3)\}$,
$\{\omega_3(t=1), \omega_1(t=2), \omega_2(t=3)\}$, $\{\omega_3(t=1), \omega_2(t=2), \omega_1(t=3)\}$,
$\{\omega_2(t=1), \omega_1(t=2), \omega_3(t=3)\}$, ...
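A minimal sketch of this search as the standard Viterbi dynamic program (variable names are illustrative; the recursion is the usual max-product analogue of the forward algorithm):

```python
import numpy as np

def viterbi(pi, A, B, V):
    """Return the single best hidden-state path for observations V, and its probability."""
    T, c = len(V), len(pi)
    delta = np.zeros((T, c))           # delta[t, j]: best path probability ending in j at time t
    psi = np.zeros((T, c), dtype=int)  # psi[t, j]: best predecessor of state j at time t
    delta[0] = pi * B[:, V[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A        # scores[i, j]: come from i, move to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, V[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return list(reversed(path)), delta[-1].max()
```

With the invented $\pi$, $A$, $B$ from the evaluation sketch and V = [0, 1, 2], this returns the most probable of the 27 candidate paths together with its probability.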
Decoding
• HMM decoding
• Example 4
• Might have an invalid path
• The learning problem (parameter estimation)

This third problem consists of determining a method to adjust the model parameters $\lambda = [\pi, A, B]$ to satisfy a certain optimization criterion. We need to find the best model $\hat\lambda = [\hat\pi, \hat A, \hat B]$, i.e. the one that maximizes the probability of the observation sequence:

$$\max_{\lambda} P(V^T \mid \lambda)$$

We use an iterative procedure such as Baum-Welch or a gradient method to find this local optimum.
Learning
• The Forward-Backward Algorithm
• Baum-Welch Algorithm
(The Baum-Welch re-estimation formulas are given by Eqs. 140 and 141 in the text.)
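A compact sketch of one Baum-Welch (EM) re-estimation pass over a single observation sequence; all variable names are illustrative, and a practical implementation would rescale or work in log space to avoid underflow on long sequences:

```python
import numpy as np

def baum_welch_step(pi, A, B, V):
    """One EM re-estimation of (pi, A, B) from a single observation sequence V."""
    T, c = len(V), len(pi)

    # Forward pass: alpha[t, j] = P(v(1..t), omega(t) = omega_j)
    alpha = np.zeros((T, c))
    alpha[0] = pi * B[:, V[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, V[t]]

    # Backward pass: beta[t, i] = P(v(t+1..T) | omega(t) = omega_i)
    beta = np.zeros((T, c))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, V[t + 1]] * beta[t + 1])

    pV = alpha[-1].sum()  # P(V^T | lambda)

    # E-step: posterior state and transition probabilities.
    gamma = alpha * beta / pV             # gamma[t, i] = P(omega(t)=i | V)
    xi = np.zeros((T - 1, c, c))          # xi[t, i, j] = P(omega(t)=i, omega(t+1)=j | V)
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, V[t + 1]] * beta[t + 1])[None, :] / pV

    # M-step: re-estimate parameters from expected counts.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    obs = np.array(V)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B, pV
```

Iterating this step until $P(V^T \mid \lambda)$ stops increasing climbs to the local optimum mentioned above; each iteration provably does not decrease the likelihood.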