Farsi Handwritten Word Recognition
Using Continuous Hidden Markov
Models and Structural Features
M. M. Haji
CSE Department
Shiraz University
January 2005
Outline
- Introduction
- Preprocessing
  - Text Segmentation
  - Document Image Binarization
  - Skew and Slant Correction
  - Skeletonization
- Structural Feature Extraction
- Multi-CHMM Recognition
- Conclusion and Discussion
Introduction
- One of the most challenging problems in Artificial Intelligence.
- Words are complex patterns, with great variability in handwriting style.
- The performance of handwriting recognition systems is still far from human performance, in both accuracy and speed.
Introduction
Previous Research:
- Dehghan et al. (2001). "Handwritten Farsi (Arabic) Word Recognition: A Holistic Approach Using Discrete HMM", Pattern Recognition, vol. 34, pp. 1057-1065.
- Dehghan et al. (2001). "Unconstrained Farsi Handwritten Word Recognition Using Fuzzy Vector Quantization and Hidden Markov Models", Pattern Recognition Letters, vol. 22, pp. 209-214.
- A maximum recognition rate of 65% for a 198-word lexicon!
Methodology
- Holistic Strategies
- Analytical Strategies
  - Explicit Segmentation
  - Implicit Segmentation
Holistic Strategies
- Recognition is performed on the whole representation of a word.
- No attempt is made to segment a word into its individual characters.
- It is still necessary to segment the text lines into words.
  - Intra-word space is sometimes greater than inter-word space!
Holistic Strategies
- A lexicon is used: a list of the allowed interpretations of the input word image.
- The error rate increases with the lexicon size.
- Successful for postal address recognition and bank check reading, where the lexicon is limited and small.
Analytical Strategies
Explicit Segmentation:
- Single letters are isolated and then recognized separately, usually by neural networks.
- Successful for English machine-printed text.
- Arabic/Farsi text, whether machine-printed or handwritten, is cursive.
- Cursiveness and character overlapping are the main challenges.
Analytical Strategies
Implicit Segmentation:
- The text (line or word) image is converted into a sequence of small units.
- Recognition is performed at this intermediate level rather than at the word or character level, usually by a Hidden Markov Model (HMM).
- Each unit may be a part of a letter, so a number of successive units can belong to a single letter.
Text Segmentation
Text Segmentation
- Detecting text regions in an image (removing non-text components).
- Applications in document image analysis and understanding, image compression and content-based image retrieval.
- Document image binarization and skew correction algorithms usually require a predominant text area in order to obtain an accurate estimate of text characteristics.
- Numerous methods have been proposed (an extensive literature).
- There is no general method to detect arbitrary text strings.
- In its most general form, detection must be:
  - insensitive to noise, background model and lighting conditions, and
  - invariant to text language, color, size, font and orientation, even within the same image!
Text Segmentation
- We believe that a text segmentation algorithm should have adaptation and learning capability.
- A learner usually needs much time and training data to achieve satisfactory results, which restricts its practicality.
- A simple procedure was developed for generating training data from manually segmented images.
- A Naive Bayes Classifier (NBC) was utilized, which is fast in both the training and application phases.
- Surprisingly, excellent results were obtained by this simple classifier!
Text Segmentation
- DCT-18 features
- 10,000 training instances
- Naive Bayes Classification:
v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid a_1, a_2, \ldots, a_n)
        = \arg\max_{v_j \in V} \frac{P(a_1, a_2, \ldots, a_n \mid v_j) \, P(v_j)}{P(a_1, a_2, \ldots, a_n)}
        = \arg\max_{v_j \in V} P(a_1, a_2, \ldots, a_n \mid v_j) \, P(v_j)

P(a_1, a_2, \ldots, a_n \mid v_j) = \prod_i P(a_i \mid v_j)
Text Segmentation
- Naive Bayes Classification:

v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)

- With equal priors, P(Text) = P(Non-text) = 0.5:

P(\text{Text}) = \frac{P(a_1 \mid v_1) \, P(a_2 \mid v_1) \cdots P(a_{18} \mid v_1)}{P(a_1 \mid v_1) \cdots P(a_{18} \mid v_1) + P(a_1 \mid v_2) \cdots P(a_{18} \mid v_2)}
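As a concrete illustration, here is a minimal sketch of this classifier, assuming each of the 18 DCT features is modeled by a per-class Gaussian density; the slides do not specify how P(a_i | v_j) is estimated, so the density model and the names used here are illustrative.

```python
import numpy as np

# Minimal Gaussian Naive Bayes for text / non-text blocks (a sketch;
# the per-feature Gaussian density model is an assumption).
class NaiveBayesTextSegmenter:
    def fit(self, X, y):
        # X: (n_samples, 18) DCT feature vectors; y: 0 = non-text, 1 = text.
        self.mu = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
        self.var = np.array([X[y == c].var(axis=0) + 1e-6 for c in (0, 1)])

    def p_text(self, x):
        # log P(a_1..a_18 | v_j), with equal priors P(Text) = P(Non-text) = 0.5.
        log_lik = -0.5 * (np.log(2 * np.pi * self.var)
                          + (x - self.mu) ** 2 / self.var).sum(axis=1)
        lik = np.exp(log_lik - log_lik.max())  # rescale for numerical stability
        return lik[1] / lik.sum()              # posterior of the text class
```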
Binarization
Binarization
- Converting gray-scale images into two-level images.
- Many vision algorithms and operators can only handle two-level images.
- Applied in the early steps of a vision algorithm.
- Amounts to selecting a proper threshold surface.
- Challenging for images with poor contrast, strong noise and variable modalities in histograms.
- Global vs. local (adaptive) algorithms.
- General vs. special-purpose algorithms.
Binarization
Four different algorithms for document image binarization were compared and contrasted:
- Otsu, N. (Jan. 1979). "A Threshold Selection Method from Gray Level Histograms", IEEE Trans. on Systems, Man and Cybernetics, vol. 9, pp. 62-66. [global, general-purpose]
- Niblack, W. (1989). An Introduction to Digital Image Processing, Prentice Hall, Englewood Cliffs, pp. 115-116. [local, general-purpose]
- Wu, V. and Manmatha, R. (Jan. 1998). "Document Image Clean-Up and Binarization", Proceedings of the SPIE Conference on Document Recognition. [local, special-purpose]
- Liu, Y. and Srihari, S. N. (May 1997). "Document Image Binarization Based on Texture Features", IEEE Trans. on PAMI, vol. 19(5), pp. 540-544. [global, special-purpose]
Binarization
[Figure: a sample input image and its histogram, binarized by Otsu, Niblack, Wu-Manmatha and Liu-Srihari.]
Binarization
- Quality improvement by preprocessing and postprocessing.
- Preprocessing:
  - Taylor, M. J. and Dance, C. R. (Sep. 1998). "Enhancement of Document Images from Cameras", Proceedings of the SPIE Conference on Document Recognition, pp. 230-241.
  [Figure: preprocessing pipeline: input, unsharp masking, 3x super-resolution, binarization, output.]
- Postprocessing:
  - Trier, Ø. D. and Taxt, T. (March 1995). "Evaluation of Binarization Methods for Document Images", IEEE Trans. on PAMI, vol. 17(3), pp. 312-315.
Skew Correction
Skew Correction
- The angle by which text lines deviate from the x-axis.
- Page decomposition techniques require properly aligned images as input.
- Three types:
  - global skew
  - multiple skew
  - non-uniform skew
- "Skew correction" is applied by a rotation after "skew detection".
Skew Correction
- Categories based on the underlying techniques:
  - Projection Profile
  - Correlation
  - Hough Transform
  - Mathematical Morphology
  - Fourier Transform
  - Artificial Neural Networks
  - Nearest-Neighbor Clustering
Skew Correction
- The projection profile at the global skew angle of the document has narrow peaks and deep valleys.
Skew Correction
- Projection profile technique:

\text{globalSkewAngle} = \arg\max_{\theta_{\min} \le \theta \le \theta_{\max}} f(\text{horizontalProjectionProfile}(\text{rotate}(I, \theta)))

with the goodness measure

f = SD = \sum_i \bigl(h(i) - h(i-1)\bigr)^2
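A minimal sketch of this technique follows, assuming a binary input image with text pixels equal to 1 and a brute-force grid over candidate angles; `profile_goodness` and `global_skew_angle` are illustrative names, not the thesis's implementation.

```python
import numpy as np
from scipy.ndimage import rotate

def profile_goodness(img, theta):
    # Horizontal projection profile h of the image rotated by theta degrees.
    h = rotate(img, theta, reshape=False, order=0).sum(axis=1)
    return np.sum(np.diff(h) ** 2)  # SD = sum_i (h(i) - h(i-1))^2

def global_skew_angle(img, theta_min=-15.0, theta_max=15.0, step=0.5):
    # Exhaustive search over the limited angle range.
    angles = np.arange(theta_min, theta_max + step, step)
    return max(angles, key=lambda t: profile_goodness(img, t))
```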
Skew Correction
- Limiting the range of skew angles.
- Binary search for finding the maximizer of the goodness function (see the sketch below).
- Computing the sum of pixels along parallel lines at an angle, instead of rotating the image by that angle.
- Reducing the size of the input image, as long as the structure of the text lines is preserved.
  - MIN/MAX downsampling
- Local skew correction, after line segmentation, by robust line fitting.
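The binary-search speed-up can be sketched as follows, under the assumption that the goodness measure is unimodal over the angle range; this replaces the grid scan in the previous sketch.

```python
def ternary_search_max(f, lo, hi, tol=0.05):
    # Narrow [lo, hi] around the maximizer of a unimodal function f.
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

# e.g. skew = ternary_search_max(lambda t: profile_goodness(img, t), -15, 15)
```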
Slant Correction
[Figure: examples of uniform and non-uniform slant.]
Slant Correction
- The deviation of the average near-vertical strokes from the vertical direction.
- Occurs in both handwritten and machine-printed text.
  [Example word: ‫اراک‬]
- Slant is non-informative.
- The average slant angle is estimated first, and then a shear transformation in the horizontal direction is applied to the word (or line) image to correct its slant.
Slant Correction
- The most effective methods are based on the analysis of vertical projection profiles (histograms) at various angles.
- Identical to the projection-profile methods for skew correction, except that:
  - the histograms are computed in the vertical rather than the horizontal direction, and
  - a shear transformation is used instead of rotation.
- Accurate results for handwritten words with uniform slant.
- Robust to noise.
Slant Correction
Slant Correction
- Projection profile technique:

\text{slantAngle} = \arg\max_{\theta_{\min} \le \theta \le \theta_{\max}} f(\text{verticalProjectionProfile}(\text{horizontalShear}(I, \theta)))

with the same goodness measure

f = SD = \sum_i \bigl(h(i) - h(i-1)\bigr)^2
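A sketch of the slant variant, assuming a horizontal shear x' = x + y·tan(θ) implemented with scipy's affine_transform; the exact shear implementation is not prescribed by the slides.

```python
import numpy as np
from scipy.ndimage import affine_transform

def shear_goodness(img, theta_deg):
    t = np.tan(np.radians(theta_deg))
    # The matrix maps output coordinates to input coordinates:
    # row stays the same, input column = output column + t * row.
    sheared = affine_transform(img, [[1.0, 0.0], [t, 1.0]], order=0)
    h = sheared.sum(axis=0)            # vertical projection profile
    return np.sum(np.diff(h) ** 2)     # same goodness measure as for skew

def slant_angle(img, theta_min=-45.0, theta_max=45.0, step=1.0):
    angles = np.arange(theta_min, theta_max + step, step)
    return max(angles, key=lambda t: shear_goodness(img, t))
```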
Slant Correction
- Postprocessing: smoothing jagged edges.
[Figure: 3x3 masks used to fill or delete the center pixel p; a part of a slanted word, after slant correction, and after smoothing.]
Skeletonization
Skeletonization
- Skeletonization, or the medial axis transform (MAT), of a shape has been one of the most surveyed problems in image processing and machine vision.
- A skeletonization (thinning) algorithm transforms a shape into arcs and curves of unit thickness, which are called the skeleton.
- An ideal skeleton has the following properties:
  - retains the basic structural properties of the original shape
  - well-centered
  - well-connected
  - precisely reconstructable
  - robust
Skeletonization
- Simplifies classification:
  - diminishes the variability and distortion of instances of one class;
  - reduces the amount of data to be handled.
- Proved to be effective in pattern recognition problems:
  - character recognition
  - fingerprint recognition
  - chromosome recognition
  - ...
- Provides compact representations and structural analysis of objects.
Skeletonization
Five different skeletonization algorithms were compared and contrasted, with the main focus on preserving text characteristics:
- Naccache, N. J. and Shinghal, R. (1984). "SPTA: A Proposed Algorithm for Thinning Digital Pictures", IEEE Trans. on Systems, Man and Cybernetics, vol. SMC-14(3), pp. 409-418.
- Zhang, T. Y. and Suen, C. Y. (1984). "A Fast Parallel Algorithm for Thinning Digital Patterns", Comm. ACM, vol. 27(3), pp. 236-239.
- Ji, L. and Piper, J. (1992). "Fast Homotopy-Preserving Skeletons Using Mathematical Morphology", IEEE Trans. on PAMI, vol. 14(6), pp. 653-664.
- Sajjadi, M. R. (Oct. 1996). "Skeletonization of Persian Characters", M.Sc. Thesis, Computer Science and Engineering Department, Shiraz University, Iran.
- Huang, L., Wan, G. and Liu, C. (2003). "An Improved Parallel Thinning Algorithm", Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR 2003), pp. 780-783.
Skeletonization
[Figures: two sample input word images skeletonized by SPTA, Zhang-Suen, Homotopy-Preserving, DTSA and Huang et al.]
Skeletonization
[Figure: robustness to border noise; a noisy input skeletonized by SPTA, DTSA and Huang et al.]
Skeletonization
- Postprocessing: removing spurious branches.
Skeletonization
- Modification: removing 4-connectivity while preserving 8-connectivity of the pattern.
[Figure: 3x3 neighborhood masks (0, 1, x = don't care) under which the center pixel p is modified.]
Structural Feature Extraction
- The connectivity number Cn classifies skeleton pixels:
  - Cn = 0: dot
  - Cn = 1: end-point
  - Cn = 2: connection point
  - Cn = 3: branch-point
  - Cn = 4: cross-point
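For illustration, here is a minimal sketch of computing Cn from a pixel's 8-neighborhood, assuming the common crossing-number definition (the number of 0→1 transitions in circular order); the slides do not spell out the exact formula.

```python
def connectivity_number(nbrs):
    # nbrs: the 8 neighbors of pixel p in clockwise order, each 0 or 1.
    return sum(nbrs[i] == 0 and nbrs[(i + 1) % 8] == 1 for i in range(8))

# connectivity_number maps a skeleton pixel to its type:
# 0 -> dot, 1 -> end-point, 2 -> connection point,
# 3 -> branch-point, 4 -> cross-point.
```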
Structural Feature Extraction
- Structural features are capable of tolerating much variation.
- Not robust to noise.
- Hard to extract.
- A 1D HMM needs a 1D observation sequence, so the 2D word image must be converted into a 1D signal:
  - speech recognition and online handwriting recognition deal with 1D signals;
  - offline handwriting recognition deals with a 2D signal.
Structural Feature Extraction
- Converting the word skeleton into a graph.
- Tracing the edges in a canonical order.
[Figure: a word skeleton graph with two end-points and a branch-point; edges numbered 1-7 in tracing order.]
Structural Feature Extraction
- Loop Extraction:
  - Loops are important distinctive features.
  - Extraction makes the number of strokes smaller:
    - easier modeling
    - lower computational cost
  - Different types of loops:
    - simple-loop
    - multi-link-loop
    - double-loop
  - A DFS algorithm was written to find complex loops in the word graph (see the sketch after this list).
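A minimal DFS sketch for loop detection in the word graph, assuming an undirected adjacency-map representation of the skeleton graph; handling multi-link and double loops (parallel edges between the same nodes) needs a multigraph and is omitted here.

```python
def find_loops(adj):
    # adj: dict mapping each node (end-/branch-point) to its neighbors.
    seen, loops = set(), []

    def dfs(u, parent, path):
        seen.add(u)
        path.append(u)
        for v in adj[u]:
            if v == parent:
                continue
            if v in path:                        # back edge closes a loop
                loops.append(path[path.index(v):] + [v])
            elif v not in seen:
                dfs(v, u, path)
        path.pop()

    for start in adj:
        if start not in seen:
            dfs(start, None, [])
    return loops
```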
Structural Feature Extraction
Examples of loop types in Farsi letters:
- double-loop: ‫هـ‬, ‫ـهـ‬, ...
- multi-link-loop: ‫ـصـ‬, ‫ـط‬, ‫ـمـ‬, ‫ـو‬, ‫ـه‬, ...
- simple-loop: ‫صـ‬, ‫ص‬, ‫ف‬, ‫مـ‬, ‫و‬, ...
Structural Feature Extraction
- Each edge is transformed into a 10D feature vector:
  - normalized length feature (f1)
  - curvature feature (f2)
  - slope feature (f3)
  - connection type feature (f4)
  - endpoint distance feature (f5)
  - number of segments feature (f6)
  - curved features (f7-f10)
- Independent of the baseline location.
- Invariant to scaling, translation and rotation.
Structural Feature Extraction
Example feature vectors:
1: [0.68, 1.00, 6, 0, 0.05, 1, 0.0, 0.0, 0.7, 0.0]
2: [0.11, 1.01, 6, 1, 0.23, 1, 0.0, 0.0, 0.0, 0.0]
3: [2.00, 3.00, 8, 10, 0.00, 0, 0.0, 0.0, 0.0, 0.0]
...
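The sketch below shows how such a vector could be assembled from one traced edge. The feature names follow the slides, but every formula here (arc/chord curvature, quarter-length "curved" features, and so on) is an assumption for illustration, not the thesis's definition.

```python
import numpy as np

def edge_features(points, word_height, conn_type, n_segments):
    # points: (n, 2) array of skeleton pixels along one traced edge.
    p = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(p, axis=0), axis=1)   # pixel-to-pixel steps
    length = seg.sum()
    chord = np.linalg.norm(p[-1] - p[0])
    f1 = length / word_height                  # normalized length
    f2 = length / max(chord, 1e-6)             # curvature as arc/chord ratio
    dy, dx = p[-1] - p[0]
    f3 = np.arctan2(dy, dx)                    # slope of the chord
    f4 = conn_type                             # connection type code
    f5 = chord / word_height                   # endpoint distance
    f6 = n_segments                            # number of straight segments
    quarters = np.array_split(seg, 4)          # assumed "curved" features
    f7_10 = [q.sum() / max(length, 1e-6) for q in quarters]
    return np.array([f1, f2, f3, f4, f5, f6, *f7_10])
```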
Hidden Markov Models
- Signal Modeling:
  - deterministic
  - stochastic: characterizing the signal by a parametric random process.
- HMM is a widely used statistical (stochastic) model:
  - the most widely used technique in modern ASR systems.
- Speech and handwritten text are similar:
  - symbols with ambiguous boundaries;
  - symbols with variations in appearance.
- The whole pattern is not modeled as a single feature vector; instead, the relationship between consecutive segments is explored.
Hidden Markov Models
- Nondeterministic finite state machines:
  - probabilistic state transitions;
  - each state is associated with a random function;
  - the state sequence is unknown;
  - only some probabilistic function of the state sequence can be observed.
[Figure: a three-state weather model (Sunny, Cloudy, Rainy) with transition probabilities.]
Hidden Markov Models
N: the number of states of the model
S = {s1, s2, ..., sN}: the set of states
∏ = {πi = P(si at t = 1)}: the initial state probabilities
A = {aij = P(sj at t+1 | si at t)}: the state transition probabilities
M: the number of observation symbols
V = {v1, v2, ..., vM}: the set of possible observation symbols
B = {bi(vk) = P(vk at t | si at t)}: the symbol emission probabilities
Ot: the observed symbol at time t
T: the length of the observation sequence
λ = (A, B, ∏): the compact notation for the HMM
Left-to-Right HMMs
[Figure: a 5-state left-to-right HMM (S1 through S5), and a 5-state left-to-right HMM with a maximum relative forward jump of 2.]
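A small sketch of constructing such a transition matrix, with uniform row initialization as an illustrative assumption:

```python
import numpy as np

def left_to_right_A(n_states=5, max_jump=2):
    # From state i the model may stay in i or jump forward up to max_jump.
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        j_hi = min(i + max_jump, n_states - 1)
        A[i, i:j_hi + 1] = 1.0 / (j_hi - i + 1)   # uniform, row-stochastic
    return A

print(left_to_right_A())  # banded upper-triangular 5x5 matrix
```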
Hidden Markov Models
The Three Fundamental Problems:
1. Given a model λ = (A, B, ∏), how do we compute P(O | λ), the probability of occurrence of the observation sequence O = O1, O2, ..., OT?
   The Forward-Backward Algorithm
2. Given the observation sequence O and a model λ, how do we choose a state sequence S = s1, s2, ..., sT so that P(O, S | λ) is maximized, i.e. find the state sequence that best explains the observations? (A sketch follows below.)
   The Viterbi Algorithm
3. Given the observation sequence O, how do we adjust the model parameters λ = (A, B, ∏) so that P(O | λ) or P(O, S | λ) is maximized, i.e. find the model that best explains the observed data?
   The Baum-Welch Algorithm, the Segmental K-means Algorithm
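To make problem 2 concrete, here is a compact log-domain Viterbi sketch for a discrete-emission HMM with parameters pi (N,), A (N, N) and B (N, M); the thesis's recognizer uses continuous emissions, so this is illustrative only.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    # Returns the best state sequence and its log-probability.
    N, T = len(pi), len(obs)
    logd = np.log(pi + 1e-300) + np.log(B[:, obs[0]] + 1e-300)
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        cand = logd[:, None] + np.log(A + 1e-300)  # score of each transition
        back[t] = cand.argmax(axis=0)              # best predecessor per state
        logd = cand.max(axis=0) + np.log(B[:, obs[t]] + 1e-300)
    states = [int(logd.argmax())]
    for t in range(T - 1, 0, -1):                  # backtrace
        states.append(int(back[t, states[-1]]))
    return states[::-1], float(logd.max())
```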
Hidden Markov Models
- Discrete HMM:
  - discrete observation sequences: V = {v1, v2, ..., vM};
  - a codebook obtained by Vector Quantization (VQ);
  - codebook size?
  - distortion: information loss due to the quantization error!
- Continuous Hidden Markov Model (CHMM):
  - overcomes the distortion problem;
  - requires more parameters → more memory;
  - needs more deliberate initialization techniques:
    - may diverge with randomly selected initial parameters!
Hidden Markov Models
Multivariate Gaussian mixture:

b_i(o_t) = \sum_{m=1}^{M} c_{im} \, \mathcal{N}(o_t; \mu_{im}, \Sigma_{im})
         = \sum_{m=1}^{M} \frac{c_{im}}{\sqrt{(2\pi)^K \, |\Sigma_{im}|}} \exp\Bigl(-\frac{1}{2} (o_t - \mu_{im}) \, \Sigma_{im}^{-1} \, (o_t - \mu_{im})^T\Bigr)

c_im: the m-th mixture gain coefficient in state i
μ_im: the mean of the m-th mixture in state i
∑_im: the covariance of the m-th mixture in state i
M: the number of mixtures used
K: the dimensionality of the observation space
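A direct transcription of this density into code, assuming state-i mixture weights c (M,), means mus (M, K) and covariances covs (M, K, K):

```python
from scipy.stats import multivariate_normal

def emission_prob(o, c, mus, covs):
    # b_i(o_t) = sum_m c_im * N(o_t; mu_im, Sigma_im)
    return sum(c[m] * multivariate_normal.pdf(o, mean=mus[m], cov=covs[m])
               for m in range(len(c)))
```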
The Block Diagram of the Recognition System
[Figure: the input word image passes through normalization and feature extraction to produce an observation sequence; the likelihoods P(O | λ1), P(O | λ2), ..., P(O | λn) are evaluated by the Viterbi algorithm against all word models to produce a ranked word list.]
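The scoring loop can be sketched as follows, reusing the viterbi() sketch above; the models dict mapping each lexicon word to its parameters is an illustrative structure.

```python
def rank_words(obs, models):
    # models: {word: (pi, A, B)}; score each model and sort best-first.
    scores = {w: viterbi(pi, A, B, obs)[1] for w, (pi, A, B) in models.items()}
    return sorted(scores, key=scores.get, reverse=True)
```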
The Class Diagram of the Experimental Recognition System
[Figure: UML class diagram. A WordClassifier holds a FeatureExtractor (-fe). WordClassifier is specialized by NNWordClassifier (with MLPWordClassifier) and HMMWordClassifier (with CHMMWordClassifier and DHMMWordClassifier); FeatureExtractor is specialized by FixedSizeFeatureExt. (with FourierFeatureExt.) and StructuralFeatureExt. A CHMMWordClassifier holds 2..* CHMMWordModel instances (-models); a DHMMWordClassifier holds a CodeBook (-codebook) and 2..* DHMMWordModel instances (-models).]
An Overview of the Complete System
[Flowchart: Input Image, Text Segmentation, Global Skew Correction, Line Extraction, Local Skew Correction, Slant Correction, Binarization, Denoising and Smoothing, Word Segmentation, Height Normalization, Skeletonization, Feature Extraction, Multi-CHMM Recognition, Output Text.]
- Two-stage skew correction.
- Postponed binarization.
Training Data
- The recognition system was trained and evaluated on a dataset of 100 city names of Iran.
- A pattern recognition problem with 100 classes was considered.
- Most samples in the dataset were automatically generated by a Java program that draws the input string with different fonts, sizes and orientations on the output image.
- The dataset contains 150 samples for each word.
Training Data
[Figures: sample word images from the training dataset.]
Experimental Results
[Figures: sample word images that were 1-best recognized.]
Experimental Results
[Figure: a word image that was 3-best recognized; ranked list: 1. ‫زنجان‬ 2. ‫اصفهان‬ 3. ‫دامغان‬]
Experimental Results
[Figure: a word image that was 4-best recognized; ranked list: 1. ‫قشم‬ 2. ‫قم‬ 3. ‫مرند‬ 4. ‫مشهد‬]
Experimental Results
[Figures: sample word images that were not N-best recognized for any N ≤ 20.]
Conclusion
- The first work to use CHMMs with structural features to recognize Farsi handwritten words.
- A complete offline recognition system for Farsi handwritten words.
- A new machine learning approach based on the NBC for text segmentation.
- Compared and contrasted different algorithms for:
  - binarization
  - skew and slant correction
  - skeletonization
- Excellent generalization performance.
- A maximum recognition rate of 82% on our 100-word dataset.
Thanks for your attention
Please feel free to ask any questions.
[email protected]