Transcript Document

Pattern Classification

All materials in these slides were taken from

Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000

with the permission of the authors and the publisher

Chapter 3 (Part 3): Maximum-Likelihood and Bayesian Parameter Estimation (Section 3.10)

• Hidden Markov Model: Extension of Markov Chains


Hidden Markov Model (HMM)

• Interaction of the visible states with the hidden states: $\sum_k b_{jk} = 1$ for all $j$, where $b_{jk} = P(v_k(t) \mid \omega_j(t))$; a small sketch of these quantities follows the list below.

• Three problems are associated with this model:
  • The evaluation problem
  • The decoding problem
  • The learning problem
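Below is a minimal sketch, not taken from the slides, of how these quantities can be stored in code: a transition matrix $A$ with entries $a_{ij}$ and an emission matrix $B$ with entries $b_{jk}$, each row a probability distribution. The specific numbers are made up for illustration.

```python
# Toy HMM parameters (illustrative values only, not from the text).
import numpy as np

A = np.array([[0.2, 0.3, 0.5],    # transitions out of hidden state omega_1
              [0.1, 0.6, 0.3],    # transitions out of hidden state omega_2
              [0.4, 0.4, 0.2]])   # transitions out of hidden state omega_3

B = np.array([[0.7, 0.2, 0.1],    # emissions from omega_1 over symbols v_1..v_3
              [0.1, 0.8, 0.1],    # emissions from omega_2
              [0.3, 0.3, 0.4]])   # emissions from omega_3

# The constraint sum_k b_jk = 1 (and likewise sum_j a_ij = 1) says each row
# is a probability distribution over next states / emitted symbols.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```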


• The evaluation problem: compute the probability that the model produces a sequence $V^T$ of visible states. This probability is:

$$P(V^T) = \sum_{r=1}^{r_{\max}} P(V^T \mid \omega_r^T)\, P(\omega_r^T)$$

where each $r$ indexes a particular sequence $\omega_r^T = \{\omega(1), \omega(2), \ldots, \omega(T)\}$ of $T$ hidden states.

$$P(V^T \mid \omega_r^T) = \prod_{t=1}^{T} P(v(t) \mid \omega(t)) \qquad (1)$$

$$P(\omega_r^T) = \prod_{t=1}^{T} P(\omega(t) \mid \omega(t-1)) \qquad (2)$$


Using equations (1) and (2), we can write:

$$P(V^T) = \sum_{r=1}^{r_{\max}} \prod_{t=1}^{T} P(v(t) \mid \omega(t))\, P(\omega(t) \mid \omega(t-1))$$

Interpretation: The probability that we observe the particular sequence of $T$ visible states $V^T$ is equal to the sum, over all $r_{\max}$ possible sequences of hidden states, of the conditional probability that the system has made a particular transition multiplied by the probability that it then emitted the visible symbol in our target sequence.

Example: Let $\omega_1, \omega_2, \omega_3$ be the hidden states; $v_1, v_2, v_3$ be the visible states; and $V^3 = \{v_1, v_2, v_3\}$ be the sequence of visible states.

$$P(\{v_1, v_2, v_3\}) = P(\omega_1)\,P(v_1 \mid \omega_1)\,P(\omega_2 \mid \omega_1)\,P(v_2 \mid \omega_2)\,P(\omega_3 \mid \omega_2)\,P(v_3 \mid \omega_3) + \ldots$$

(possible terms in the sum = all possible cases: $3^3 = 27$!)

First possibility: $\omega_1\,(t=1) \to v_1$, $\omega_2\,(t=2) \to v_2$, $\omega_3\,(t=3) \to v_3$

Second possibility: $\omega_2\,(t=1) \to v_1$, $\omega_3\,(t=2) \to v_2$, $\omega_1\,(t=3) \to v_3$

$$P(\{v_1, v_2, v_3\}) = P(\omega_2)\,P(v_1 \mid \omega_2)\,P(\omega_3 \mid \omega_2)\,P(v_2 \mid \omega_3)\,P(\omega_1 \mid \omega_3)\,P(v_3 \mid \omega_1) + \ldots$$

Therefore:

$$P(\{v_1, v_2, v_3\}) = \sum_{\text{possible sequences of hidden states}} \; \prod_{t=1}^{3} P(v(t) \mid \omega(t))\, P(\omega(t) \mid \omega(t-1))$$
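As a sanity check on the formula above, here is a hedged sketch that evaluates $P(V^T)$ by brute force, enumerating all $3^3 = 27$ hidden state sequences exactly as in the example. The values of $\pi$, $A$, $B$ and the observed sequence are illustrative assumptions, not from the text.

```python
# Brute-force evaluation: sum P(v(t)|omega(t)) * P(omega(t)|omega(t-1)) over
# every possible hidden state sequence.
import itertools
import numpy as np

pi = np.array([0.5, 0.3, 0.2])           # P(omega(1) = omega_i), toy values
A = np.array([[0.2, 0.3, 0.5],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])           # a_ij = P(omega_j | omega_i)
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])           # b_jk = P(v_k | omega_j)

obs = [0, 1, 2]                           # the visible sequence v_1, v_2, v_3 as indices

total = 0.0
for path in itertools.product(range(3), repeat=len(obs)):   # all 27 hidden sequences
    p = pi[path[0]] * B[path[0], obs[0]]                     # first step uses the prior
    for t in range(1, len(obs)):
        p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]    # transition then emission
    total += p

print(total)   # P(V^T); exponential in T, hence the need for the forward algorithm
```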


Evaluation

• HMM forward
• HMM backward
• Example 3

• HMM Forward
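A minimal sketch, under the same toy parameters as above, of the standard forward recursion $\alpha_j(t) = b_j(v(t)) \sum_i \alpha_i(t-1)\, a_{ij}$ with $P(V^T) = \sum_j \alpha_j(T)$; it computes the same probability as the brute-force sum but in time linear in $T$.

```python
# Forward algorithm sketch (illustrative parameters, not from the slides).
import numpy as np

def forward(pi, A, B, obs):
    """Return P(V^T) in O(c^2 T) time instead of O(c^T)."""
    alpha = pi * B[:, obs[0]]              # alpha_j(1) = pi_j * b_j(v(1))
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # propagate one time step
    return alpha.sum()

pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.2, 0.3, 0.5],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])

print(forward(pi, A, B, [0, 1, 2]))        # matches the 27-term brute-force sum
```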


Left-to-Right model

(speech recognition)
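A small illustrative sketch of what a left-to-right transition matrix looks like: transitions can only stay in a state or move toward later states, so $a_{ij} = 0$ for $j < i$. The probabilities shown are made up.

```python
# Left-to-right (Bakis) transition structure, as commonly used in speech recognition.
import numpy as np

A_left_to_right = np.array([
    [0.6, 0.3, 0.1, 0.0],   # from state 1: stay or move right
    [0.0, 0.7, 0.2, 0.1],   # from state 2
    [0.0, 0.0, 0.8, 0.2],   # from state 3
    [0.0, 0.0, 0.0, 1.0],   # final (absorbing) state
])
assert np.allclose(np.tril(A_left_to_right, k=-1), 0.0)   # nothing below the diagonal
```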


• The decoding problem (optimal state sequence)


Given a sequence of visible states $V^T$, the decoding problem is to find the most probable sequence of hidden states.

This problem can be expressed mathematically as:

find the single "best" sequence of hidden states $\omega(1), \omega(2), \ldots, \omega(T)$ such that:

$$\omega(1), \omega(2), \ldots, \omega(T) = \arg\max\; P\big[\omega(1), \omega(2), \ldots, \omega(T),\, v(1), v(2), \ldots, v(T) \mid \theta\big]$$

Note that the summation has disappeared, since we want to find only one unique best case!


Where:

$\theta = [\pi, A, B]$
$\pi = P(\omega(1) = \omega_i)$  (initial state probability)
$A = a_{ij} = P(\omega(t+1) = \omega_j \mid \omega(t) = \omega_i)$  (transition probabilities)
$B = b_{jk} = P(v(t) = v_k \mid \omega(t) = \omega_j)$  (emission probabilities)

In the preceding example, this computation corresponds to the selection of the best path amongst:

$\{\omega_1(t=1), \omega_2(t=2), \omega_3(t=3)\}$, $\{\omega_2(t=1), \omega_3(t=2), \omega_1(t=3)\}$,
$\{\omega_3(t=1), \omega_1(t=2), \omega_2(t=3)\}$, $\{\omega_3(t=1), \omega_2(t=2), \omega_1(t=3)\}$,
$\{\omega_2(t=1), \omega_1(t=2), \omega_3(t=3)\}$
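A hedged sketch of how this arg max is usually computed, via the standard Viterbi recursion: at each time step keep, for every state, only the best predecessor, then backtrace to recover the single best hidden sequence. The parameters and observations are the same illustrative toy values as before.

```python
# Viterbi decoding sketch (toy parameters, not from the slides).
import numpy as np

def viterbi(pi, A, B, obs):
    T, c = len(obs), len(pi)
    delta = np.zeros((T, c))                  # best path probability ending in each state
    psi = np.zeros((T, c), dtype=int)         # argmax back-pointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A    # scores[i, j] = delta_i(t-1) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):             # follow back-pointers
        path.insert(0, int(psi[t, path[0]]))
    return path, delta[-1].max()

pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.2, 0.3, 0.5],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])

best_path, best_prob = viterbi(pi, A, B, [0, 1, 2])
print(best_path, best_prob)                   # most probable hidden sequence and its joint probability
```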


Decoding

• HMM decoding
• Example 4

• Might have invalid path


• The learning problem (parameter estimation)

This third problem consists of determining a method to adjust the model parameters $\theta = [\pi, A, B]$ to satisfy a certain optimization criterion. We need to find the best model $\hat\theta = [\hat\pi, \hat A, \hat B]$ that maximizes the probability of the observation sequence:

$$\max_{\theta}\; P(V^T \mid \theta)$$

We use an iterative procedure such as Baum-Welch or gradient descent to find this local optimum.


Learning

The Forward-Backward Algorithm

• Baum-Welch Algorithm
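A compressed, illustrative sketch of one Baum-Welch (forward-backward) re-estimation step, written from the standard formulation rather than from the slides: the posteriors $\gamma_i(t) = P(\omega_i(t) \mid V^T, \theta)$ and $\xi_{ij}(t) = P(\omega_i(t), \omega_j(t+1) \mid V^T, \theta)$ are computed from forward and backward passes and then used to re-estimate $\pi$, $A$ and $B$. All numbers are toy values.

```python
# One Baum-Welch (EM) step: forward/backward passes, posteriors, re-estimation.
import numpy as np

def baum_welch_step(pi, A, B, obs):
    T, c = len(obs), len(pi)
    alpha = np.zeros((T, c))                              # forward pass
    beta = np.zeros((T, c))                               # backward pass
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()                          # P(V^T | theta)

    gamma = alpha * beta / likelihood                     # state posteriors
    xi = np.zeros((T - 1, c, c))                          # transition posteriors
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]) / likelihood

    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B, likelihood

pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.2, 0.3, 0.5], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]])
B = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]])
obs = [0, 1, 2, 1, 0]

for _ in range(20):                        # each step does not decrease P(V^T | theta)
    pi, A, B, lik = baum_welch_step(pi, A, B, obs)
print(lik)
```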
