Markov Models for Pattern Recognition

Part 1
Markov Models for Pattern Recognition – Introduction
CSE717, SPRING 2008
CUBS, Univ at Buffalo
Textbook
- Markov Models for Pattern Recognition: From Theory to Applications
  by Gernot A. Fink, 1st Edition, Springer, Nov 2007
Textbook
- Foundation of Math Statistics
- Vector Quantization and Mixture Density Models
- Markov Models
- Hidden Markov Model (HMM)
  - Model formulation
  - Classic algorithms in the HMM
  - Application domains of the HMM
- n-Gram
- Systems
  - Character and handwriting recognition
  - Speech recognition
  - Analysis of biological sequences
Preliminary Requirements
- Familiarity with probability theory and statistics
- Basic concepts of stochastic processes
Part 2a
Foundation of Probability Theory, Statistics & Stochastic Processes
CSE717, SPRING 2008
CUBS, Univ at Buffalo
Coin Toss Problem
- Coin toss result: $X \in S_X = \{\text{head}, \text{tail}\}$
- X: random variable
- head, tail: states
- $S_X$: set of states
- Probabilities: $\Pr_X(\text{head}) = \Pr_X(\text{tail}) = 0.5$
Discrete Random Variable
- A discrete random variable’s states are discrete: natural numbers, integers, etc.
- Described by the probabilities of its states $\Pr_X(s_1), \Pr_X(s_2), \ldots$
  $s_1, s_2, \ldots$: discrete states (possible values of x)
- Probabilities over all the states add up to 1:
  $\sum_i \Pr_X(s_i) = 1$
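These definitions map directly onto a small Python sketch: a discrete random variable as a table of state probabilities, using the coin-toss example from the slides. The names `pmf` and `sample` are illustrative, not from the slides.

```python
import random

# The coin-toss variable: Pr_X(head) = Pr_X(tail) = 0.5.
pmf = {"head": 0.5, "tail": 0.5}

# Probabilities over all the states add up to 1.
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# Draw one state according to the state probabilities.
def sample(pmf):
    states = list(pmf)
    weights = list(pmf.values())
    return random.choices(states, weights=weights, k=1)[0]

print(sample(pmf))  # either "head" or "tail"
```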
Continuous Random Variable
- A continuous random variable’s states are continuous: real numbers, etc.
- Described by its probability density function (p.d.f.): $p_X(s)$
- The probability of $a < X < b$ is obtained by the integral
  $\int_a^b p_X(s)\,ds$
- The integral from $-\infty$ to $\infty$ is 1:
  $\int_{-\infty}^{\infty} p_X(s)\,ds = 1$
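A quick numeric sketch of the integral definition, using an exponential density as an illustrative choice of p.d.f. (the function names `pdf` and `prob` are hypothetical, not from the slides):

```python
import math

# p.d.f. of an exponential random variable with rate 1 (illustrative choice).
def pdf(s):
    return math.exp(-s) if s >= 0 else 0.0

# Pr(a < X < b) approximated by a midpoint Riemann sum of the integral.
def prob(a, b, steps=100_000):
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# For this density, Pr(0 < X < 1) = 1 - exp(-1) ≈ 0.632.
print(round(prob(0.0, 1.0), 3))
```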
Joint Probability and Joint p.d.f.
- Joint probability of discrete random variables:
  $\Pr_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)$, where $x_i$ is any possible state of $X_i$
- Joint p.d.f. of continuous random variables:
  $p_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)$, where $x_i$ is any possible state of $X_i$
- Independence condition:
  $\Pr_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = \Pr_{X_1}(x_1) \cdot \Pr_{X_2}(x_2) \cdots \Pr_{X_n}(x_n)$
  $p_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = p_{X_1}(x_1) \cdot p_{X_2}(x_2) \cdots p_{X_n}(x_n)$
Conditional Probability and p.d.f.
- Conditional probability of discrete random variables:
  $\Pr_{X_2 | X_1}(x_2 | x_1) = \Pr_{X_1, X_2}(x_1, x_2) / \Pr_{X_1}(x_1)$
- Conditional p.d.f. of continuous random variables:
  $p_{X_2 | X_1}(x_2 | x_1) = p_{X_1, X_2}(x_1, x_2) / p_{X_1}(x_1)$
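The definition of conditional probability as a ratio of joint and marginal probabilities can be checked on a small joint table. The 2×2 rain/umbrella table below is an illustrative example, not from the slides.

```python
# Joint probability table Pr_{X1,X2}(x1, x2) for two discrete random variables.
joint = {
    ("rain", "umbrella"): 0.25,
    ("rain", "no_umbrella"): 0.05,
    ("dry", "umbrella"): 0.10,
    ("dry", "no_umbrella"): 0.60,
}

# Marginal Pr_X1(x1) is obtained by summing the joint over x2.
def marginal_x1(x1):
    return sum(p for (a, _), p in joint.items() if a == x1)

# Conditional Pr_{X2|X1}(x2 | x1) = Pr_{X1,X2}(x1, x2) / Pr_X1(x1).
def conditional(x2, x1):
    return joint[(x1, x2)] / marginal_x1(x1)

print(conditional("umbrella", "rain"))  # 0.25 / 0.30
```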
Statistics: Expected Value and Variance
- For a discrete random variable:
  $E\{X\} = \sum_i s_i \Pr_X(s_i)$
  $Var\{X\} = \sum_i (s_i - E\{X\})^2 \Pr_X(s_i)$
- For a continuous random variable:
  $E\{X\} = \int_{-\infty}^{\infty} x\, p_X(x)\,dx$
  $Var\{X\} = \int_{-\infty}^{\infty} (x - E\{X\})^2 p_X(x)\,dx$
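The discrete formulas are one-liners in Python; here they are applied to a fair six-sided die, an illustrative example not from the slides.

```python
# Probabilities of the six states of a fair die.
pmf = {s: 1 / 6 for s in range(1, 7)}

# E{X} = sum_i s_i * Pr_X(s_i)
mean = sum(s * p for s, p in pmf.items())
# Var{X} = sum_i (s_i - E{X})^2 * Pr_X(s_i)
var = sum((s - mean) ** 2 * p for s, p in pmf.items())

print(round(mean, 4), round(var, 4))  # 3.5 and 35/12 ≈ 2.9167
```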
Normal Distribution of a Single Random Variable $N(\mu, \sigma^2)$
- Notation:
  $X \sim N(\mu, \sigma^2)$
- p.d.f.:
  $p_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
- Expected value:
  $E\{X\} = \mu$
- Variance:
  $Var\{X\} = \sigma^2$
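The p.d.f. and the expected-value claim can be sanity-checked numerically with a Riemann sum; $\mu = 1$, $\sigma = 2$ are illustrative parameter choices.

```python
import math

# Normal p.d.f. as defined on the slide (illustrative parameters mu=1, sigma=2).
def normal_pdf(x, mu=1.0, sigma=2.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Riemann sums over a wide interval (about +/- 10 standard deviations).
xs = [-20 + i * 0.001 for i in range(42_000)]
total = sum(normal_pdf(x) * 0.001 for x in xs)   # integral of p.d.f., close to 1
mean = sum(x * normal_pdf(x) * 0.001 for x in xs)  # E{X}, close to mu

print(round(total, 3), round(mean, 3))
```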
Stochastic Process
- A stochastic process $\{X_t\} = \{\ldots, X_{t-1}, X_t, X_{t+1}, \ldots\}$ is a time series of random variables
- $X_t$: random variable
- t: time stamp
- Examples: audio signal, stock market
Causal Process
- A stochastic process is causal if it has a finite history
- A causal process can be represented by $X_1, X_2, \ldots, X_t, \ldots$
Stationary Process
- A stochastic process $\{X_t\}$ is stationary if its joint probabilities are invariant under time shifts, i.e., for any $n$, $X_{t_1}, X_{t_2}, \ldots, X_{t_n} \in \{X_t\}$ and any shift $\tau$,
  $\Pr_{X_{t_1}, X_{t_2}, \ldots, X_{t_n}}(x_1, x_2, \ldots, x_n) = \Pr_{X_{t_1+\tau}, X_{t_2+\tau}, \ldots, X_{t_n+\tau}}(x_1, x_2, \ldots, x_n)$
- A stationary process in this sense is sometimes referred to as strictly stationary, in contrast with weak or wide-sense stationarity
Gaussian White Noise
- White noise: the $X_t$ are independent and identically distributed (i.i.d.)
- Gaussian white noise: $X_t \sim N(\mu, \sigma^2)$
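Gaussian white noise is easy to generate and inspect; this sketch uses the standard library's `random.gauss`, with $\mu = 0$, $\sigma = 1$ and a fixed seed as illustrative choices.

```python
import random

# Gaussian white noise: an i.i.d. sequence with X_t ~ N(mu, sigma^2).
def gaussian_white_noise(n, mu=0.0, sigma=1.0, seed=42):
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

xs = gaussian_white_noise(10_000)
sample_mean = sum(xs) / len(xs)
sample_var = sum((x - sample_mean) ** 2 for x in xs) / len(xs)

# Sample mean and variance should be close to mu = 0 and sigma^2 = 1.
print(round(sample_mean, 2), round(sample_var, 2))
```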
Gaussian White Noise is a Stationary Process
Proof: for any $n$, $X_{t_1}, X_{t_2}, \ldots, X_{t_n} \in \{X_t\}$ and $\tau$,
$p_{X_{t_1}, X_{t_2}, \ldots, X_{t_n}}(x_1, x_2, \ldots, x_n)$
$= \prod_{i=1}^{n} p_{X_{t_i}}(x_i)$
$= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$
$= p_{X_{t_1}+\tau, X_{t_2}+\tau, \ldots, X_{t_n}+\tau}(x_1, x_2, \ldots, x_n)$
The first equality uses independence, the second the identical $N(\mu, \sigma^2)$ distribution; since the product does not depend on the time stamps $t_i$, the same expression is obtained for the shifted times $t_i + \tau$, which gives the last equality.
Temperature
Q1: Is the temperature within a day a stationary process?
Markov Chains
- A causal process $\{X_t\}$ is a Markov chain if
  $\Pr_{X_t | X_1, \ldots, X_{t-1}}(x_t | x_1, \ldots, x_{t-1}) = \Pr_{X_t | X_{t-k}, \ldots, X_{t-1}}(x_t | x_{t-k}, \ldots, x_{t-1})$
  for any $x_1, \ldots, x_t$; k is the order of the Markov chain
- First-order Markov chain:
  $\Pr_{X_t | X_1, \ldots, X_{t-1}}(x_t | x_1, \ldots, x_{t-1}) = \Pr_{X_t | X_{t-1}}(x_t | x_{t-1})$
- Second-order Markov chain:
  $\Pr_{X_t | X_1, \ldots, X_{t-1}}(x_t | x_1, \ldots, x_{t-1}) = \Pr_{X_t | X_{t-2}, X_{t-1}}(x_t | x_{t-2}, x_{t-1})$
Homogeneous Markov Chains
- A k-th order Markov chain $\{X_t\}$ is homogeneous if the state transition probability is the same over time, i.e.,
  $\Pr_{X_t | X_{t-k}, \ldots, X_{t-1}}(x_0 | x_k, \ldots, x_1) = \Pr_{X_\tau | X_{\tau-k}, \ldots, X_{\tau-1}}(x_0 | x_k, \ldots, x_1)$
  for any $t, \tau, x_0, \ldots, x_k$
- Q2: Does a homogeneous Markov chain imply a stationary process?
State Transition in Homogeneous Markov Chains
- Suppose $\{X_t\}$ is a homogeneous k-th order Markov chain and S is the set of all possible states (values) of $x_t$; then for any k+1 states $x_0, x_1, \ldots, x_k$, the state transition probability
  $\Pr_{X_t | X_{t-k}, \ldots, X_{t-1}}(x_0 | x_k, \ldots, x_1)$
  can be abbreviated to $\Pr(x_0 | x_k, \ldots, x_1)$
Example of Markov Chain
[State diagram: self-loops Rain→Rain 0.4 and Dry→Dry 0.8; arrows Rain→Dry 0.6 and Dry→Rain 0.2]
Two states: ‘Rain’ and ‘Dry’.
Transition probabilities:
Pr(‘Rain’|‘Rain’)=0.4, Pr(‘Dry’|‘Rain’)=0.6,
Pr(‘Rain’|‘Dry’)=0.2, Pr(‘Dry’|‘Dry’)=0.8
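The two-state weather chain can be simulated directly from its transition probabilities; the `transition` table below encodes exactly the four probabilities on the slide, while the `simulate` helper is an illustrative addition.

```python
import random

# Transition probabilities from the slide:
# Pr('Rain'|'Rain')=0.4, Pr('Dry'|'Rain')=0.6, Pr('Rain'|'Dry')=0.2, Pr('Dry'|'Dry')=0.8.
transition = {
    "Rain": {"Rain": 0.4, "Dry": 0.6},
    "Dry": {"Rain": 0.2, "Dry": 0.8},
}

# Simulate a first-order homogeneous Markov chain for n steps.
def simulate(start, n, seed=0):
    rng = random.Random(seed)
    state, path = start, [start]
    for _ in range(n):
        states = list(transition[state])
        weights = list(transition[state].values())
        state = rng.choices(states, weights=weights, k=1)[0]
        path.append(state)
    return path

print(simulate("Rain", 5))
```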
Short Term Forecast
[State diagram as on the previous slide]
Initial (say, Wednesday) probabilities:
Pr_Wed(‘Rain’)=0.3, Pr_Wed(‘Dry’)=0.7
What’s the probability of rain on Thursday?
Pr_Thur(‘Rain’) =
Pr_Wed(‘Rain’)×Pr(‘Rain’|‘Rain’) + Pr_Wed(‘Dry’)×Pr(‘Rain’|‘Dry’) =
0.3×0.4 + 0.7×0.2 = 0.26
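The one-step forecast is a matrix-vector product in disguise: propagate the current state distribution through the transition probabilities. This sketch reproduces the Wednesday-to-Thursday computation; the function name `step` is illustrative.

```python
# Transition probabilities of the weather chain from the slides.
transition = {
    "Rain": {"Rain": 0.4, "Dry": 0.6},
    "Dry": {"Rain": 0.2, "Dry": 0.8},
}

def step(dist):
    # Pr_next(s) = sum over s' of Pr_now(s') * Pr(s | s')
    return {s: sum(p * transition[prev][s] for prev, p in dist.items())
            for s in transition}

wednesday = {"Rain": 0.3, "Dry": 0.7}
thursday = step(wednesday)
print(round(thursday["Rain"], 2))  # 0.3*0.4 + 0.7*0.2 = 0.26
```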
Condition of Stationarity
Pr_t(‘Rain’) =
Pr_{t-1}(‘Rain’)×Pr(‘Rain’|‘Rain’) + Pr_{t-1}(‘Dry’)×Pr(‘Rain’|‘Dry’) =
Pr_{t-1}(‘Rain’)×0.4 + (1 − Pr_{t-1}(‘Rain’))×0.2 =
0.2 + 0.2×Pr_{t-1}(‘Rain’)
Setting Pr_t(‘Rain’) = Pr_{t-1}(‘Rain’) gives Pr_{t-1}(‘Rain’) = 0.25, Pr_{t-1}(‘Dry’) = 1 − 0.25 = 0.75:
the steady-state distribution
Steady-State Analysis
Pr_t(‘Rain’) = 0.2 + 0.2×Pr_{t-1}(‘Rain’)
Pr_t(‘Rain’) − 0.25 = 0.2×(Pr_{t-1}(‘Rain’) − 0.25)
Pr_t(‘Rain’) = 0.2^{t−1}×(Pr_1(‘Rain’) − 0.25) + 0.25
lim_{t→∞} Pr_t(‘Rain’) = 0.25 (converges to the steady-state distribution)
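The geometric convergence (shrink factor 0.2 per step) is easy to watch numerically; the starting probability 0.9 is an arbitrary illustrative choice.

```python
# Iterate Pr_t('Rain') = 0.2 + 0.2 * Pr_{t-1}('Rain') from an arbitrary start.
# The distance to the steady state 0.25 shrinks by a factor of 0.2 each step.
p = 0.9  # arbitrary initial Pr_1('Rain')
for t in range(2, 10):
    p = 0.2 + 0.2 * p
    print(t, round(p, 6))

# After a handful of steps, p is indistinguishable from 0.25.
```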
Periodic Markov Chain
[State diagram: Pr(‘Dry’|‘Rain’) = 1 and Pr(‘Rain’|‘Dry’) = 1; both self-loop probabilities are 0]
A periodic Markov chain never converges to a steady state
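The non-convergence is visible in two lines of arithmetic: for the deterministic chain with Pr(‘Dry’|‘Rain’) = Pr(‘Rain’|‘Dry’) = 1, the rain probability simply flips every step.

```python
# Deterministic two-state chain: all probability mass swaps states each step,
# so the state distribution oscillates instead of converging.
p_rain = 1.0  # start: certainly 'Rain'
history = []
for t in range(6):
    history.append(p_rain)
    p_rain = 1.0 - p_rain  # next-step rain probability

print(history)  # [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
```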