
Hidden Markov Modelling and Handwriting Recognition
Csink László 2009
Types of Handwriting 1
1. BLOCK PRINTING
2. GUIDED CURSIVE HANDWRITING
Types of Handwriting 2
3. UNCONSTRAINED CURSIVE HANDWRITING
Clearly faster, but less legible, than 1 or 2.
ONLINE recognition for 3: some systems have been developed.
OFFLINE recognition for 3: much research has been done, still a lot to do.
Suen: "no simple scheme is likely to achieve high recognition and reliability rates, not to mention human performance"
Introduction to Hidden Markov Modelling (HMM): a simple example 1
Suppose we want to determine the average annual temperature at a specific location over a series of years, for a past era from which no measurements are available. We assume that only two kinds of years exist: hot (H) and cold (C), and we know that the probability of a cold year coming after a hot one is 0.3, and the probability of a cold year coming after a cold one is 0.6. Similar data are known about the probability of a hot year after a hot or a cold one, respectively. We assume that the probabilities are the same over the years. Then the data are expressed like this:

        H     C
  H    0.7   0.3
  C    0.4   0.6

We note that the row sums in this matrix are 1 (it is a row stochastic matrix). The transition process described by the matrix is a MARKOV PROCESS, as the next state depends only on the previous one.
Introduction to HMM: a simple example 2
We also suppose that there is a known correlation between the size of tree growth rings and temperature. We consider only 3 different ring sizes: Small (S), Medium (M) and Large (L). We know that in each year the following probabilistic relationship holds between the states H and C and the rings S, M and L:

        S     M     L
  H    0.1   0.4   0.5
  C    0.7   0.2   0.1

We note that the row sums in this matrix are also 1 (it is also a row stochastic matrix).
Introduction to HMM: a simple example 3
Since the past temperatures are unknown, that is, the past states are hidden, the above model is called a Hidden Markov Model (HMM).

A = [ 0.7  0.3
      0.4  0.6 ]          state transition matrix (Markov)

B = [ 0.1  0.4  0.5
      0.7  0.2  0.1 ]     observation matrix

π = [ 0.6  0.4 ]          initial state distribution (we assume this is also known)

A, B and π are all row stochastic.
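For later reference, the same model written out in Python with NumPy (a minimal sketch; the names A, B and pi simply mirror the matrices above, with state order (H, C) and observation order (S, M, L)):

import numpy as np

# State transition matrix, rows/columns ordered (H, C)
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Observation matrix, rows (H, C), columns (S, M, L) = (0, 1, 2)
B = np.array([[0.1, 0.4, 0.5],
              [0.7, 0.2, 0.1]])

# Initial state distribution
pi = np.array([0.6, 0.4])

# A, B and pi are row stochastic: every row sums to 1
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1) and np.isclose(pi.sum(), 1)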
Introduction to HMM: a simple example 4
Denote the rings S, M and L by 0, 1 and 2, respectively.
Assume that in a four-year period we observe O = (0, 1, 0, 2). We want to determine the most likely sequence of the Markov process given the observations O.
Dynamic Programming: the most likely sequence is the one with the highest probability among all possible state sequences of length four.
HMM solution: the most likely sequence is the one that maximizes the expected number of correct states.
These two solutions do not necessarily coincide!
Introduction to HMM: a simple example 5
Notations
In the previous example: T = 4 (length of the observation sequence), N = 2 (number of states), M = 3 (number of observation symbols),
Q = {H, C}, V = {0(=S), 1(=M), 2(=L)}, O = (0, 1, 0, 2).

State transition matrix A:
        H     C
  H    0.7   0.3
  C    0.4   0.6

Observation matrix B:
       0(S)   1(M)   2(L)
  H    0.1    0.4    0.5
  C    0.7    0.2    0.1

Initial state distribution π = [ 0.6  0.4 ]
State Sequence Probability
Consider a state sequence of length four X = (x_0, x_1, x_2, x_3) with observations O = (O_0, O_1, O_2, O_3).
Denote by π_{x_0} the probability of starting in state x_0; b_{x_0}(O_0) is the probability of initially observing O_0, and a_{x_0,x_1} is the probability of transiting from state x_0 to state x_1. We see that the probability of the state sequence X above is

P(X) = π_{x_0} b_{x_0}(O_0) · a_{x_0,x_1} b_{x_1}(O_1) · a_{x_1,x_2} b_{x_2}(O_2) · a_{x_2,x_3} b_{x_3}(O_3)
Probability of Sequence (H,H,C,C)
With the model

A:      H     C
  H    0.7   0.3
  C    0.4   0.6

B:     0(S)   1(M)   2(L)
  H    0.1    0.4    0.5
  C    0.7    0.2    0.1

π = [ 0.6  0.4 ],   O = (0, 1, 0, 2)

and the state sequence X = (x_0, x_1, x_2, x_3) = (H, H, C, C), the formula above gives

P(X) = π_H b_H(0) · a_{H,H} b_H(1) · a_{H,C} b_C(0) · a_{C,C} b_C(2)
P(HHCC) = 0.6 · 0.1 · 0.7 · 0.4 · 0.3 · 0.7 · 0.6 · 0.1 = 0.000212
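A minimal Python check of this computation (a sketch; it reuses the A, B, pi arrays from the earlier sketch, with state indices 0 = H, 1 = C and observations 0 = S, 1 = M, 2 = L):

import numpy as np

A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]])
pi = np.array([0.6, 0.4])

def sequence_probability(X, O):
    # P(X) = pi_{x0} b_{x0}(O0) * a_{x0,x1} b_{x1}(O1) * ...
    p = pi[X[0]] * B[X[0], O[0]]
    for t in range(1, len(X)):
        p *= A[X[t-1], X[t]] * B[X[t], O[t]]
    return p

H, C = 0, 1
print(sequence_probability([H, H, C, C], [0, 1, 0, 2]))   # about 0.000212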
Finding the Best Solution in the DP Sense
     state seq. (A)   Prob. (B)    Normalized prob. (C)
 2   HHHH             0.000412     0.042787
 3   HHHC             0.000035     0.003635
 4   HHCH             0.000706     0.073320
 5   HHCC             0.000212     0.022017
 6   HCHH             0.000050     0.005193
 7   HCHC             0.000004     0.000415
 8   HCCH             0.000302     0.031364
 9   HCCC             0.000091     0.009451
10   CHHH             0.001098     0.114031
11   CHHC             0.000094     0.009762
12   CHCH             0.001882     0.195451
13   CHCC             0.000564     0.058573
14   CCHH             0.000470     0.048811
15   CCHC             0.000040     0.004154
16   CCCH             0.002822     0.293073
17   CCCC             0.000847     0.087963
18   SUM              0.009629     1.000000

We compute the state sequence probabilities (column B) the same way as we computed row 5 (HHCC) on the previous slide. Writing =B2/B$18 into C2 and copying the formula downwards, we get the normalized probabilities.
Using the EXCEL function
=INDEX(A2:A17; MATCH(MAX(B2:B17);B2:B17;0))
[in the Hungarian version: =INDEX(A2:A17; HOL.VAN(MAX(B2:B17);B2:B17;0)) ]
we find that the sequence with the highest probability is CCCH. This gives the best solution in the Dynamic Programming (DP) sense.
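The same table can be reproduced programmatically. A small sketch (same toy model as before) that enumerates all 2^4 state sequences and picks the DP-optimal one:

from itertools import product
import numpy as np

A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]])
pi = np.array([0.6, 0.4])
O  = [0, 1, 0, 2]

def sequence_probability(X):
    p = pi[X[0]] * B[X[0], O[0]]
    for t in range(1, len(X)):
        p *= A[X[t-1], X[t]] * B[X[t], O[t]]
    return p

# probability of every state sequence of length 4 (0 = H, 1 = C)
probs = {X: sequence_probability(X) for X in product((0, 1), repeat=len(O))}
total = sum(probs.values())                       # about 0.009629, the SUM row above

best = max(probs, key=probs.get)
print("".join("HC"[s] for s in best))             # CCCH, the DP-sense optimum
print(round(probs[best] / total, 6))              # about 0.293073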
The table below repeats the state sequence probabilities and, in columns D, E, F and G, splits each sequence into its 1st, 2nd, 3rd and 4th states:

     state seq. (A)   Prob. (B)    Norm. prob. (C)   1st  2nd  3rd  4th
 2   HHHH             0.000412     0.042787           H    H    H    H
 3   HHHC             0.000035     0.003635           H    H    H    C
 4   HHCH             0.000706     0.073320           H    H    C    H
 5   HHCC             0.000212     0.022017           H    H    C    C
 6   HCHH             0.000050     0.005193           H    C    H    H
 7   HCHC             0.000004     0.000415           H    C    H    C
 8   HCCH             0.000302     0.031364           H    C    C    H
 9   HCCC             0.000091     0.009451           H    C    C    C
10   CHHH             0.001098     0.114031           C    H    H    H
11   CHHC             0.000094     0.009762           C    H    H    C
12   CHCH             0.001882     0.195451           C    H    C    H
13   CHCC             0.000564     0.058573           C    H    C    C
14   CCHH             0.000470     0.048811           C    C    H    H
15   CCHC             0.000040     0.004154           C    C    H    C
16   CCCH             0.002822     0.293073           C    C    C    H
17   CCCC             0.000847     0.087963           C    C    C    C

The HMM prob matrix:

           0        1        2        3
 P(H)    0.1882   0.5196   0.2288   0.8040
 P(C)    0.8118   0.4804   0.7712   0.1960

Using the EXCEL functions MID [KÖZÉP] and SUMIF [SZUMHA] we produced the columns D, E, F and G to show the 1st, 2nd, 3rd and 4th states. Then, summing up the normalized probabilities over columns D, E, F and G where the state is "H", we get the first row of the HMM prob matrix. The second row of the HMM prob matrix is computed similarly, using "C" instead of "H".
Taking the most probable state in each column gives CHCH: this is the solution in the HMM sense, and it indeed differs from the DP solution CCCH.
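The same two rows can be computed directly from the enumerated sequence probabilities; a short sketch continuing the previous one (same hypothetical variable names):

from itertools import product
import numpy as np

A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]])
pi = np.array([0.6, 0.4])
O  = [0, 1, 0, 2]

def sequence_probability(X):
    p = pi[X[0]] * B[X[0], O[0]]
    for t in range(1, len(X)):
        p *= A[X[t-1], X[t]] * B[X[t], O[t]]
    return p

probs = {X: sequence_probability(X) for X in product((0, 1), repeat=len(O))}
total = sum(probs.values())

# P(state at time t is H), by summing normalized probabilities of all sequences with H at t
marginal_H = [sum(p for X, p in probs.items() if X[t] == 0) / total for t in range(len(O))]
print([round(m, 4) for m in marginal_H])                        # about [0.1882, 0.5196, 0.2288, 0.804]
print("".join("H" if m > 0.5 else "C" for m in marginal_H))     # CHCH, the HMM-sense solution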
Three Problems
Problem 1
Given the model λ=(A,B,π) and a sequence of observations
O, find P(O| λ). In other words, we want to determine the
likelihood of the observed sequence O, given the model.
Problem 2
Given the model λ=(A,B,π) and a sequence of observations
O, find an optimal state sequence for the underlying
Markov process. In other words, we want to uncover the
hidden part of the Hidden Markov Model.
Problem 3
Given an observation sequence O and dimensions N and M, find the model λ=(A,B,π) that maximizes the probability of O. This can be viewed as training the model to best fit the observed data.
Solution to Problem 1
Let λ=(A,B,π) be a given model and let O=(O_0, O_1, …, O_{T-1}) be a series of observations. We want to find P(O|λ).
Let X=(x_0, x_1, …, x_{T-1}) be a state sequence. Then by the definition of B we have

P(O | X, λ) = b_{x_0}(O_0) b_{x_1}(O_1) ··· b_{x_{T-1}}(O_{T-1})

and by the definition of π and A we have

P(X | λ) = π_{x_0} a_{x_0,x_1} a_{x_1,x_2} ··· a_{x_{T-2},x_{T-1}}

Since

P(O, X | λ) = P(O ∩ X ∩ λ) / P(λ)

and

P(O | X, λ) P(X | λ) = [ P(O ∩ X ∩ λ) / P(X ∩ λ) ] · [ P(X ∩ λ) / P(λ) ] = P(O ∩ X ∩ λ) / P(λ),

we have P(O, X | λ) = P(O | X, λ) P(X | λ).
By summing over all possible state sequences we get

P(O | λ) = Σ_X P(O, X | λ) = Σ_X P(O | X, λ) P(X | λ)
         = Σ_X π_{x_0} b_{x_0}(O_0) a_{x_0,x_1} b_{x_1}(O_1) ··· a_{x_{T-2},x_{T-1}} b_{x_{T-1}}(O_{T-1})

As the length of the state sequence and of the observation sequence is T, there are N^T terms in this sum, and about 2T multiplications in each term, so the total number of multiplications is on the order of 2T·N^T. Fortunately, there exists a much faster algorithm as well.
The Forward α-pass Algorithm
α_t(i) is the probability of the partial observation sequence up to time t, where q_i is the state of the underlying Markov process at time t. Let

α_0(i) = π_i b_i(O_0)   for i = 0, 1, …, N-1.

For t = 1, 2, …, T-1 and i = 0, 1, …, N-1 compute

α_t(i) = [ Σ_{j=0}^{N-1} α_{t-1}(j) a_{ji} ] b_i(O_t)

We have to compute α T·N times and there are about N multiplications in each α, so this method needs about T·N² multiplications.
Finally,

P(O | λ) = Σ_{i=0}^{N-1} P(O_0, O_1, …, O_{T-1}, x_{T-1} = q_i | λ) = Σ_{i=0}^{N-1} α_{T-1}(i)
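A minimal NumPy sketch of the α-pass for the toy weather model (unscaled, so only suitable for short sequences; see the underflow remark on a later slide):

import numpy as np

A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]])
pi = np.array([0.6, 0.4])

def forward(O):
    # alpha[t, i] = P(O_0 ... O_t, x_t = q_i | lambda)
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                    # alpha_0(i) = pi_i b_i(O_0)
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]  # [sum_j alpha_{t-1}(j) a_ji] * b_i(O_t)
    return alpha

alpha = forward([0, 1, 0, 2])
print(alpha[-1].sum())    # P(O | lambda), about 0.009629, matching the brute-force sum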
Solution to Problem 2
Given the model λ=(A,B,π) and a sequence of observations O, our goal is to find the most likely state sequence, i.e. the one that maximizes the expected number of correct states. First we define the backward algorithm, called the β-pass.
For t = 0, 1, …, T-1 and i = 0, 1, …, N-1 define

β_t(i) = P(O_{t+1}, O_{t+2}, …, O_{T-1} | x_t = q_i, λ)

Let β_{T-1}(i) = 1 for i = 0, 1, …, N-1. For t = T-2, T-3, …, 0 and i = 0, 1, …, N-1 compute

β_t(i) = Σ_{j=0}^{N-1} a_{ij} b_j(O_{t+1}) β_{t+1}(j)

Then the probability of being in state q_i at time t, given the observations, is

γ_t(i) := P(x_t = q_i | O, λ) = P(x_t = q_i, O | λ) / P(O | λ) = α_t(i) β_t(i) / P(O | λ)

For each t, the most likely state is the q_i for which γ_t(i) is maximal.
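A continuation of the earlier sketch: the β-pass and the resulting γ values for the toy model (again unscaled; forward() is repeated so the snippet stands on its own):

import numpy as np

A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]])
pi = np.array([0.6, 0.4])

def forward(O):
    alpha = np.zeros((len(O), len(pi)))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, len(O)):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
    return alpha

def backward(O):
    # beta[t, i] = P(O_{t+1} ... O_{T-1} | x_t = q_i, lambda)
    T, N = len(O), len(pi)
    beta = np.ones((T, N))                          # beta_{T-1}(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])    # sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
    return beta

O = [0, 1, 0, 2]
alpha, beta = forward(O), backward(O)
gamma = alpha * beta / alpha[-1].sum()              # gamma_t(i) = alpha_t(i) beta_t(i) / P(O|lambda)
print(np.round(gamma[:, 0], 4))                     # P(H at time t): about [0.1882 0.5196 0.2288 0.804]
print("".join("H" if g > 0.5 else "C" for g in gamma[:, 0]))   # CHCH, matching the HMM prob matrix above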
Example (1996):
HMM-based Handwritten Symbol Recognition
Input: a sequence of strokes captured during writing. A stroke is a sequence of (x,y)-coordinates corresponding to pen positions; a stroke is the writing from pen-down to pen-up.
Slant correction: try to find a near-vertical part in each stroke and rotate the whole stroke so that this part becomes vertical.
Normalization of Strokes
Normalization: determine the x-length of each stroke. Denote by x10 the threshold below which 10 % of the strokes lie with respect to x-length, and by x90 the threshold above which 10 % of the strokes lie. Then compute the average of the x-lengths of all strokes that are between the two thresholds; denote this by x'.
Perform the above operations with respect to y-length too; compute y'.
Then normalize all strokes to x' and y'.
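One possible reading of this step in Python (a sketch under the assumption that each stroke is an (n, 2) array of pen positions and that the 10 % thresholds are the 10th and 90th percentiles of the stroke lengths; "normalize to x' and y'" is interpreted here simply as dividing by them):

import numpy as np

def reference_length(lengths):
    # average of the lengths lying between the 10 % and 90 % thresholds
    lo, hi = np.percentile(lengths, [10, 90])
    return float(np.mean([L for L in lengths if lo <= L <= hi]))

def normalize_strokes(strokes):
    # strokes: list of (n_i, 2) arrays of (x, y) pen positions
    x_ref = reference_length([np.ptp(s[:, 0]) for s in strokes])   # x' from the x-lengths
    y_ref = reference_length([np.ptp(s[:, 1]) for s in strokes])   # y' from the y-lengths
    return [s / np.array([x_ref, y_ref]) for s in strokes]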
The Online Temporal Feature Vector
Introduce a hidden stroke between the pen-up position of a stroke and the pen-down position of the next stroke (we assume that the strokes are sequenced according to time).
The unified sequence of strokes and hidden strokes is resampled at equispaced points along the trajectory, retaining the temporal order. For each point, the feature vector consists of: the local position, the sine and cosine of the angle between the x-axis and the vector connecting the current point and the origin, and a flag indicating whether the point belongs to a stroke or to a hidden stroke.
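A rough sketch of such a per-point feature vector (the field names are made up for illustration; the angle is measured between the x-axis and the vector from the origin to the point, as described above):

import math
from dataclasses import dataclass

@dataclass
class PointFeatures:
    x: float            # local position
    y: float
    sin_angle: float    # sine of the angle between the x-axis and the origin-to-point vector
    cos_angle: float
    is_hidden: bool     # True if the point lies on a hidden (pen-up) stroke

def features(x, y, is_hidden):
    r = math.hypot(x, y) or 1.0          # guard against the origin itself
    return PointFeatures(x, y, y / r, x / r, is_hidden)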
HMM Topology
For each symbol S_i of the alphabet {S_1, S_2, …, S_K} an HMM λ_i is generated. The HMM is such that P(s_j | s_i) = 0 for states j < i or j > i + 2.
The question is: how can we generate an HMM? The answer is given by the solution to Problem 3.
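A small sketch of what such a constrained left-to-right topology can look like when the state transition matrix is initialized: each state may stay or move at most two states forward, all other transitions get probability 0 (the uniform values over the allowed transitions are just one possible initial guess):

import numpy as np

def left_to_right_transitions(n_states, max_jump=2):
    # row-stochastic A with a_ij = 0 unless i <= j <= i + max_jump
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        allowed = range(i, min(i + max_jump, n_states - 1) + 1)
        for j in allowed:
            A[i, j] = 1.0 / len(allowed)
    return A

print(left_to_right_transitions(5))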
Solution of Problem 3
Now we want to adjust the model parameters to best fit the observations. The sizes N (number of states) and M (number of observation symbols) are fixed, but A, B and π are free; we only have to take care that they remain row stochastic.
For t = 0, 1, …, T-2 and i, j in {0, 1, …, N-1} define the probability of being in state q_i at time t and transiting to state q_j at time t+1:

γ_t(i, j) := P(x_t = q_i, x_{t+1} = q_j | O, λ) = α_t(i) a_{ij} b_j(O_{t+1}) β_{t+1}(j) / P(O | λ)

Then we have

γ_t(i) = Σ_{j=0}^{N-1} γ_t(i, j)
For i = 0, 1, …, N-1 let

π_i = γ_0(i)

For i = 0, 1, …, N-1 and j = 0, 1, …, N-1 compute

a_{ij} = [ Σ_{t=0}^{T-2} γ_t(i, j) ] / [ Σ_{t=0}^{T-2} γ_t(i) ]

For j = 0, 1, …, N-1 and k = 0, 1, …, M-1 compute

b_j(k) = [ Σ_{t ∈ {0,1,…,T-2}, O_t = k} γ_t(j) ] / [ Σ_{t=0}^{T-2} γ_t(j) ]
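A NumPy sketch of these re-estimation formulas. It assumes alpha and beta tables computed as in the earlier forward/backward sketches and follows the T-2 summation limits used above:

import numpy as np

def reestimate(A, B, pi, O, alpha, beta):
    # one re-estimation step; returns updated (A, B, pi)
    T, N = len(O), len(pi)
    prob_O = alpha[-1].sum()                         # P(O | lambda)

    # di_gamma[t, i, j] = gamma_t(i, j); gamma[t, i] = sum_j gamma_t(i, j)
    di_gamma = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        di_gamma[t] = alpha[t][:, None] * A * B[:, O[t+1]][None, :] * beta[t+1][None, :] / prob_O
    gamma = di_gamma.sum(axis=2)

    new_pi = gamma[0]                                               # pi_i = gamma_0(i)
    new_A = di_gamma.sum(axis=0) / gamma.sum(axis=0)[:, None]       # a_ij re-estimate
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                                     # b_j(k) re-estimate
        mask = np.array([o == k for o in O[:T-1]])
        new_B[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi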
The Iteration
1. First we initialize λ=(A,B,π) with a best guess, or choose random values such that π_i ≈ 1/N, a_ij ≈ 1/N, b_j(k) ≈ 1/M; π, A and B must be row stochastic.
2. Compute α_t(i), β_t(i), γ_t(i, j) and γ_t(i).
3. Re-estimate the model λ=(A,B,π).
4. If P(O|λ) increases, GOTO 2.
(the increase may be measured against a threshold, or a maximum number of iterations may be set)
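Put together, the training loop might look like the sketch below. It assumes forward, backward and reestimate are the functions from the earlier sketches, adapted to take the model (A, B, pi) as explicit arguments, and uses the simple stopping rule described above:

import numpy as np

def train(O, N, M, max_iter=100, rng=np.random.default_rng(0)):
    # 1. initialize with values close to uniform, kept row stochastic
    def noisy_rows(rows, cols):
        m = 1.0 / cols + 0.01 * rng.standard_normal((rows, cols))
        return m / m.sum(axis=1, keepdims=True)
    A, B, pi = noisy_rows(N, N), noisy_rows(N, M), noisy_rows(1, N)[0]

    prev = -np.inf
    for _ in range(max_iter):
        alpha = forward(O, A, B, pi)                     # 2. alpha and beta passes
        beta = backward(O, A, B)                         #    (gammas are formed inside reestimate)
        prob_O = alpha[-1].sum()
        if prob_O <= prev:                               # 4. stop once P(O|lambda) no longer increases
            break
        prev = prob_O
        A, B, pi = reestimate(A, B, pi, O, alpha, beta)  # 3. re-estimate the model
    return A, B, pi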
Practical Considerations
Be aware of the fact that α_t(i) tends to 0 as T increases. Therefore, a direct realization of the above formulas may lead to underflow.
Details, and pseudocode, may be found here:
http://www.cs.sjsu.edu/faculty/stamp/RUA/HMM.pdf
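A common remedy (used, for example, in the pseudocode of the tutorial linked above) is to scale each α_t to sum to 1 and to accumulate the logarithms of the scale factors; a sketch of the idea:

import numpy as np

def forward_scaled(O, A, B, pi):
    # alpha-pass with per-step scaling; returns the scaled alphas and log P(O | lambda)
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    log_prob = 0.0
    alpha[0] = pi * B[:, O[0]]
    for t in range(T):
        if t > 0:
            alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
        c = alpha[t].sum()          # scale factor c_t
        alpha[t] /= c               # scaled alphas sum to 1, so they never underflow
        log_prob += np.log(c)       # log P(O | lambda) = sum_t log c_t
    return alpha, log_prob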
Another example (2004):
Writer Identification Using HMM Recognizers
Writer identification is the task of determining the author of a handwriting sample from a set of writers.
Writer verification is the task of determining whether a given text has been written by a certain person.
If the text is predefined, it is text-dependent verification, otherwise it is text-independent verification.
Writer verification may be done online or offline.
It is generally believed that text-independent verification is more difficult than text-dependent verification.
For each writer, an individual HMM-based handwriting recognition system is trained using only data from that writer. Thus from n writers we get n different HMMs.
Given an arbitrary line of text as input, each HMM recognizer outputs a recognition result together with a recognition score.
It is assumed that
- correctly recognized words have a higher score than incorrectly recognized words,
- the recognition rate on input from the writer the system was trained on is higher than on input from other writers.
The scores produced by the different HMMs can be used to decide who has written the input text line.
After preprocessing (slant, skew, baseline location, height), a sliding window of one-pixel width is shifted from left to right.
The features are: number of black pixels in the window, center of gravity, second-order moment, position and contour direction of the uppermost and lowermost pixels, number of black-to-white transitions in the window, and distance between the uppermost and lowermost pixels.
Normalization may lead to a reduction of individuality; on the other hand, it supports recognition, which is important for the verification project.
For each upper- and lowercase character an individual HMM is built.
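A rough sketch of a subset of these window features for one binarized image column (black pixels assumed to be 1; the second-order moment is a simplified guess and the contour-direction features are omitted, so this is illustrative only):

import numpy as np

def window_features(col):
    # col: 1-D 0/1 array, one image column = one one-pixel-wide window
    ys = np.flatnonzero(col)                             # row indices of black pixels
    if len(ys) == 0:
        return np.zeros(7)
    transitions = np.count_nonzero(np.diff(col) == -1)   # black-to-white transitions
    return np.array([
        len(ys),              # number of black pixels
        ys.mean(),            # center of gravity
        (ys ** 2).mean(),     # second-order moment (simplified)
        ys.min(),             # uppermost black pixel position
        ys.max(),             # lowermost black pixel position
        ys.max() - ys.min(),  # distance between uppermost and lowermost pixels
        transitions,
    ])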
Related Concepts
The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states, called the Viterbi path.
The Baum-Welch algorithm is used to find the unknown parameters of an HMM. It makes use of the forward-backward algorithm used above.
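For completeness, a compact Viterbi sketch for the toy weather model (a plain-probability version, which is fine for T = 4; longer sequences would use log probabilities for the same underflow reason as before):

import numpy as np

A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]])
pi = np.array([0.6, 0.4])

def viterbi(O):
    # most likely state sequence (the Viterbi path) for observations O
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))              # best probability of any path ending in state i at time t
    back = np.zeros((T, N), dtype=int)    # best predecessor of state i at time t
    delta[0] = pi * B[:, O[0]]
    for t in range(1, T):
        trans = delta[t-1][:, None] * A                  # trans[i, j] = delta_{t-1}(i) * a_ij
        back[t] = trans.argmax(axis=0)
        delta[t] = trans.max(axis=0) * B[:, O[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print("".join("HC"[s] for s in viterbi([0, 1, 0, 2])))   # CCCH, the DP-sense optimum found earlier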
HMM-based Speech Recognition
Modern general-purpose speech recognition systems are generally based on Hidden Markov Models. Reason: speech can be thought of as a Markov model.
For further reference consult Rabiner: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
http://www.caip.rutgers.edu/~lrr/Reprints/tutorial%20on%20hmm%20and%20applications.pdf
Thank you for your attention!