Markov Models for Pattern Recognition
Part 1
Markov Models for Pattern Recognition – Introduction
CSE717, SPRING 2008
CUBS, Univ at Buffalo
Textbook
Markov Models for Pattern Recognition: From Theory to Applications, by Gernot A. Fink, 1st Edition, Springer, Nov 2007
Textbook
Foundation of Math Statistics
Vector Quantization and Mixture Density Models
Markov Models
Hidden Markov Model (HMM)
Model formulation
Classic algorithms in the HMM
Application domain of the HMM
n-Gram
Systems
Character and handwriting recognition
Speech recognition
Analysis of biological sequences
Preliminary Requirements
Familiarity with probability theory and statistics
Basic concepts of stochastic processes
Part 2a
Foundation of Probability Theory, Statistics & Stochastic Process
Coin Toss Problem
Coin toss result: X \in S_X = {head, tail}
X: random variable
head, tail: states
S_X: set of states
Probabilities: Pr_X(head) = Pr_X(tail) = 0.5
Discrete Random Variable
A discrete random variable's states are discrete: natural numbers, integers, etc.
Described by the probabilities of its states: Pr_X(s_1), Pr_X(s_2), ...
s_1, s_2, ...: discrete states (possible values of X)
Probabilities over all the states add up to 1:
\sum_i Pr_X(s_i) = 1
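To make this concrete, a small Python sketch (not from the lecture; the fair-coin numbers mirror the coin-toss example above):

```python
# A discrete random variable stored as a state -> probability mapping
# (the fair-coin numbers mirror the coin-toss example above).
pmf = {"head": 0.5, "tail": 0.5}

# Probabilities over all the states must add up to 1.
print(sum(pmf.values()) == 1.0)  # True
```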
Continuous Random Variable
A continuous random variable's states are continuous: real numbers, etc.
Described by its probability density function (p.d.f.): p_X(s)
The probability of a < X < b is obtained by the integral
\int_a^b p_X(s) ds
The integral over the whole real line is 1:
\int_{-\infty}^{\infty} p_X(s) ds = 1
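A numeric illustration (my own example, not from the slides): integrating a p.d.f. with a simple trapezoidal rule recovers both interval probabilities and the normalization to 1. The exponential density and the helper names `exp_pdf` and `integrate` are assumptions made for this sketch:

```python
import math

# An exponential density, used here only as a concrete example of a
# continuous p.d.f.; any valid density would do (rate 1.5 is arbitrary).
def exp_pdf(s, lam=1.5):
    return lam * math.exp(-lam * s) if s >= 0 else 0.0

# Trapezoidal approximation of the integral of pdf over [a, b].
def integrate(pdf, a, b, n=100000):
    h = (b - a) / n
    interior = sum(pdf(a + i * h) for i in range(1, n))
    return h * (0.5 * (pdf(a) + pdf(b)) + interior)

# Pr(0.5 < X < 2) as the integral of the density over (0.5, 2)...
print(round(integrate(exp_pdf, 0.5, 2.0), 4))  # 0.4226
# ...and the integral over [0, 30] is already 1 to 4 decimals,
# since the tail beyond 30 is negligible.
print(round(integrate(exp_pdf, 0.0, 30.0), 4))  # 1.0
```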
Joint Probability and Joint p.d.f.
Joint probability of discrete random variables:
Pr_{X_1, X_2, ..., X_n}(x_1, x_2, ..., x_n), where x_i is any possible state of X_i
Joint p.d.f. of continuous random variables:
p_{X_1, X_2, ..., X_n}(x_1, x_2, ..., x_n), where x_i is any possible state of X_i
Independence Condition
Pr_{X_1, X_2, ..., X_n}(x_1, x_2, ..., x_n) = Pr_{X_1}(x_1) Pr_{X_2}(x_2) ... Pr_{X_n}(x_n)
p_{X_1, X_2, ..., X_n}(x_1, x_2, ..., x_n) = p_{X_1}(x_1) p_{X_2}(x_2) ... p_{X_n}(x_n)
Conditional Probability and p.d.f.
Conditional probability of discrete random variables:
Pr_{X_2 | X_1}(x_2 | x_1) = Pr_{X_1, X_2}(x_1, x_2) / Pr_{X_1}(x_1)
Conditional p.d.f. of continuous random variables:
p_{X_2 | X_1}(x_2 | x_1) = p_{X_1, X_2}(x_1, x_2) / p_{X_1}(x_1)
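The definitions above can be checked on a tiny joint table; the values and helper names below are invented for illustration (and chosen to be exact in binary floating point):

```python
# Joint probabilities Pr_{X1,X2}(x1, x2) for two binary variables
# (values invented for illustration).
joint = {
    ("a", "c"): 0.125, ("a", "d"): 0.375,
    ("b", "c"): 0.25,  ("b", "d"): 0.25,
}

def marginal_x1(x1):
    # Pr_{X1}(x1): sum the joint over all values of X2.
    return sum(p for (v1, _), p in joint.items() if v1 == x1)

def marginal_x2(x2):
    # Pr_{X2}(x2): sum the joint over all values of X1.
    return sum(p for (_, v2), p in joint.items() if v2 == x2)

def conditional(x2, x1):
    # Pr_{X2|X1}(x2 | x1) = Pr_{X1,X2}(x1, x2) / Pr_{X1}(x1)
    return joint[(x1, x2)] / marginal_x1(x1)

print(marginal_x1("a"))       # 0.5
print(conditional("d", "a"))  # 0.75

# Independence would require the joint to factorize into the marginals;
# here it does not, so X1 and X2 are dependent.
print(all(abs(joint[(x1, x2)] - marginal_x1(x1) * marginal_x2(x2)) < 1e-12
          for (x1, x2) in joint))  # False
```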
Statistics: Expected Value and Variance
For a discrete random variable:
E\{X\} = \sum_i s_i Pr_X(s_i)
Var\{X\} = \sum_i (s_i - E\{X\})^2 Pr_X(s_i)
For a continuous random variable:
E\{X\} = \int_{-\infty}^{\infty} x p_X(x) dx
Var\{X\} = \int_{-\infty}^{\infty} (x - E\{X\})^2 p_X(x) dx
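For instance, the discrete formulas applied to a fair six-sided die (an example of my own, not from the slides):

```python
# Expected value and variance of a fair six-sided die, applying
# E{X} = sum_i s_i Pr_X(s_i), Var{X} = sum_i (s_i - E{X})^2 Pr_X(s_i).
states = [1, 2, 3, 4, 5, 6]
pr = {s: 1 / 6 for s in states}

mean = sum(s * pr[s] for s in states)
var = sum((s - mean) ** 2 * pr[s] for s in states)

print(round(mean, 6))  # 3.5
print(round(var, 4))   # 2.9167 (exactly 35/12)
```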
Normal Distribution of a Single Random Variable N(\mu, \sigma^2)
Notation:
X ~ N(\mu, \sigma^2)
p.d.f.:
p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
Expected value:
E\{X\} = \mu
Variance:
Var\{X\} = \sigma^2
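A direct transcription of the density into code (an illustrative sketch; the function name is my own):

```python
import math

def normal_pdf(x, mu, sigma):
    """p_X(x) for X ~ N(mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# The density peaks at x = mu with height 1/sqrt(2*pi*sigma^2)
# and is symmetric about the mean.
peak = normal_pdf(0.0, 0.0, 1.0)
print(round(peak, 6))  # 0.398942
print(normal_pdf(-1.0, 0.0, 1.0) == normal_pdf(1.0, 0.0, 1.0))  # True
```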
Stochastic Process
A stochastic process \{X_t\} = \{..., X_{t-1}, X_t, X_{t+1}, ...\} is a time series of random variables
X_t: random variable
t: time stamp
Examples: audio signal, stock market
Causal Process
A stochastic process is causal if it has a finite history.
A causal process can be represented by X_1, X_2, ..., X_t, ...
Stationary Process
A stochastic process \{X_t\} is stationary if the joint distribution over any finite set of times is invariant under a time shift, i.e., for any n, any X_{t_1}, X_{t_2}, ..., X_{t_n} \in \{X_t\}, and any shift \tau,
Pr_{X_{t_1}, X_{t_2}, ..., X_{t_n}}(x_1, x_2, ..., x_n) = Pr_{X_{t_1+\tau}, X_{t_2+\tau}, ..., X_{t_n+\tau}}(x_1, x_2, ..., x_n)
A stationary process is sometimes referred to as strictly stationary, in contrast with weak or wide-sense stationarity.
Gaussian White Noise
White noise: the X_t are independent and identically distributed (i.i.d.)
Gaussian white noise: X_t ~ N(\mu, \sigma^2), i.i.d.
Gaussian White Noise is a Stationary
Process
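Gaussian white noise is straightforward to simulate, since every X_t is an independent draw from the same normal distribution; a sketch with an arbitrary seed and parameters:

```python
import random

random.seed(42)  # arbitrary seed, for reproducibility
mu, sigma = 0.0, 1.0  # arbitrary parameters

# Each X_t is an independent draw from N(mu, sigma^2), so the sequence
# is i.i.d. by construction -- Gaussian white noise.
noise = [random.gauss(mu, sigma) for _ in range(10000)]

sample_mean = sum(noise) / len(noise)
sample_var = sum((x - sample_mean) ** 2 for x in noise) / len(noise)
# The sample statistics land near the true mu and sigma^2.
print(abs(sample_mean - mu) < 0.1, abs(sample_var - sigma ** 2) < 0.1)  # True True
```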
Proof: for any n, X_{t_1}, X_{t_2}, ..., X_{t_n} \in \{X_t\} and any shift \tau, independence and the identical N(\mu, \sigma^2) marginals give
p_{X_{t_1}, ..., X_{t_n}}(x_1, ..., x_n) = \prod_{i=1}^{n} p_{X_{t_i}}(x_i) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) = p_{X_{t_1+\tau}, ..., X_{t_n+\tau}}(x_1, ..., x_n)
Temperature
Q1: Is the temperature within a day stationary?
Markov Chains
A causal process \{X_t\} is a Markov chain if
Pr_{X_t | X_1, ..., X_{t-1}}(x_t | x_1, ..., x_{t-1}) = Pr_{X_t | X_{t-k}, ..., X_{t-1}}(x_t | x_{t-k}, ..., x_{t-1})
for any x_1, ..., x_t
k is the order of the Markov chain
First-order Markov chain:
Pr_{X_t | X_1, ..., X_{t-1}}(x_t | x_1, ..., x_{t-1}) = Pr_{X_t | X_{t-1}}(x_t | x_{t-1})
Second-order Markov chain:
Pr_{X_t | X_1, ..., X_{t-1}}(x_t | x_1, ..., x_{t-1}) = Pr_{X_t | X_{t-2}, X_{t-1}}(x_t | x_{t-2}, x_{t-1})
Homogeneous Markov Chains
A k-th order Markov chain \{X_t\} is homogeneous if the state transition probability is the same over time, i.e.,
Pr_{X_t | X_{t-k}, ..., X_{t-1}}(x_0 | x_k, ..., x_1) = Pr_{X_\tau | X_{\tau-k}, ..., X_{\tau-1}}(x_0 | x_k, ..., x_1)
for any t, \tau, x_0, ..., x_k
Q2: Does homogeneous Markov chain imply
stationary process?
State Transition in Homogeneous
Markov Chains
Suppose \{X_t\} is a k-th order homogeneous Markov chain and S is the set of all possible states (values) of x_t; then for any k+1 states x_0, x_1, ..., x_k \in S, the state transition probability
Pr_{X_t | X_{t-k}, ..., X_{t-1}}(x_0 | x_k, ..., x_1)
can be abbreviated to Pr(x_0 | x_k, ..., x_1)
Example of Markov Chain
(State-transition diagram: Rain and Dry, with the probabilities listed below.)
Two states: 'Rain' and 'Dry'.
Transition probabilities:
Pr(‘Rain’|‘Rain’)=0.4 , Pr(‘Dry’|‘Rain’)=0.6 ,
Pr(‘Rain’|‘Dry’)=0.2, Pr(‘Dry’|‘Dry’)=0.8
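The chain above can be encoded as a transition table and sampled step by step; a minimal sketch (the seed and run length are arbitrary):

```python
import random

# Transition probabilities Pr(next | current) from the example.
transition = {
    "Rain": {"Rain": 0.4, "Dry": 0.6},
    "Dry":  {"Rain": 0.2, "Dry": 0.8},
}

def step(state, rng):
    # First-order Markov: the next state depends only on the current one.
    return "Rain" if rng.random() < transition[state]["Rain"] else "Dry"

rng = random.Random(0)  # arbitrary seed for reproducibility
state, rain_days, n = "Dry", 0, 100000
for _ in range(n):
    state = step(state, rng)
    rain_days += state == "Rain"

# Over a long run the fraction of rainy days approaches the
# long-run (steady-state) probability of rain, 0.25.
print(abs(rain_days / n - 0.25) < 0.02)  # True
```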
Short Term Forecast
Initial (say, Wednesday) probabilities:
PrWed(‘Rain’)=0.3, PrWed(‘Dry’)=0.7
What’s the probability of rain on Thursday?
Pr_Thur('Rain') = Pr_Wed('Rain') × Pr('Rain'|'Rain') + Pr_Wed('Dry') × Pr('Rain'|'Dry')
= 0.3 × 0.4 + 0.7 × 0.2 = 0.26
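The same one-step forecast, reproduced in code via the law of total probability:

```python
# One-step weather forecast by the law of total probability.
p_rain_given_rain, p_rain_given_dry = 0.4, 0.2  # Pr('Rain'|'Rain'), Pr('Rain'|'Dry')
pr_rain_wed, pr_dry_wed = 0.3, 0.7              # Wednesday's distribution

pr_rain_thur = pr_rain_wed * p_rain_given_rain + pr_dry_wed * p_rain_given_dry
print(round(pr_rain_thur, 2))  # 0.26
```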
Condition for Stationarity
Pr_t('Rain') = Pr_{t-1}('Rain') × Pr('Rain'|'Rain') + Pr_{t-1}('Dry') × Pr('Rain'|'Dry')
= Pr_{t-1}('Rain') × 0.4 + (1 − Pr_{t-1}('Rain')) × 0.2
= 0.2 + 0.2 × Pr_{t-1}('Rain')
Setting Pr_t('Rain') = Pr_{t-1}('Rain') gives Pr('Rain') = 0.25, Pr('Dry') = 1 − 0.25 = 0.75:
the steady-state distribution
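The fixed-point condition can be solved directly: for a two-state chain, Pr_t = Pr_{t-1} rearranges to p = Pr('Rain'|'Dry') / (1 − Pr('Rain'|'Rain') + Pr('Rain'|'Dry')). A quick check:

```python
# Steady state of the two-state chain: setting p = Pr(Rain) equal at
# t and t-1 in  p = p*Pr(R|R) + (1-p)*Pr(R|D)  gives
#   p = Pr(R|D) / (1 - Pr(R|R) + Pr(R|D)).
p_rain_given_rain, p_rain_given_dry = 0.4, 0.2

p_rain = p_rain_given_dry / (1 - p_rain_given_rain + p_rain_given_dry)
print(round(p_rain, 4), round(1 - p_rain, 4))  # 0.25 0.75
```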
Steady-State Analysis
Pr_t('Rain') = 0.2 + 0.2 × Pr_{t-1}('Rain')
Pr_t('Rain') − 0.25 = 0.2 × (Pr_{t-1}('Rain') − 0.25)
Pr_t('Rain') = 0.2^{t-1} × (Pr_1('Rain') − 0.25) + 0.25
lim_{t→∞} Pr_t('Rain') = 0.25 (converges to the steady-state distribution)
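Iterating the recursion confirms the convergence numerically (the starting point is chosen arbitrarily):

```python
# Iterate Pr_t(Rain) = 0.2 + 0.2 * Pr_{t-1}(Rain) from an arbitrary start.
p = 0.9
for _ in range(30):
    p = 0.2 + 0.2 * p
# The error shrinks by a factor of 0.2 per step (0.2**30 is negligible),
# so the iteration settles at the steady-state value.
print(round(p, 10))  # 0.25
```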
Periodic Markov Chain
Transition probabilities: Pr('Rain'|'Rain') = 0, Pr('Dry'|'Rain') = 1, Pr('Rain'|'Dry') = 1, Pr('Dry'|'Dry') = 0
A periodic Markov chain never converges to a steady-state distribution.
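In this chain Pr_t('Rain') = 1 − Pr_{t-1}('Rain'), so the distribution flips every step; a quick sketch showing the oscillation:

```python
# Periodic two-state chain: Pr(Dry|Rain) = 1 and Pr(Rain|Dry) = 1, so
# Pr_t(Rain) = 1 - Pr_{t-1}(Rain) and the distribution flips every step.
p = 0.3  # any start other than 0.5
seq = []
for _ in range(6):
    p = 1.0 - p
    seq.append(round(p, 1))
print(seq)  # [0.7, 0.3, 0.7, 0.3, 0.7, 0.3] -- oscillates, never converges
```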