OCR for Cursive Scripts: A Survey



OCR: A Survey
Csink László, 2009
Problems to Solve
• Recognize good-quality printed text
• Recognize neatly written hand-printed text
• Recognize omnifont machine-printed text
• Deal with degraded, bad-quality documents
• Recognize unconstrained handwritten text
• Lower substitution error rates
• Lower rejection rates
2
OCR According to Nature of Input
3
Feature Extraction
• A large number of feature extraction methods are available in the literature for OCR
• Which method suits which application?
4
A Typical OCR System
1. Gray-level scanning (300-600 dpi)
2. Preprocessing
   – Binarization using a global or locally adaptive method
   – Segmentation to isolate individual characters
   – (optional) Conversion to another character representation (e.g. skeleton or contour curve)
3. Feature extraction
4. Recognition using classifiers
5. Contextual verification or post-processing
5
Feature Extraction (Devijver and Kittler)
• Feature extraction = the problem of extracting from the raw data the information which is most relevant for classification purposes, in the sense of minimizing the within-class variability while enhancing the between-class pattern variability
• Extracted features must be invariant to the expected distortions and variations
• Curse of dimensionality: if the training set is small, the number of features cannot be high either
• Rule of thumb: number of training patterns = 10 × (dimension of feature vector)
6
Some issues
• Do the characters have known orientation and size?
• Are they handwritten, machine-printed or typed?
• What is the degree of degradation?
• If a character may be written in two ways (e.g. ‘a’ or ‘α’), it might be represented by two patterns
7
Variations of the same character
• Size invariance can be achieved by normalization, but norming can cause discontinuities in the character
• Rotation invariance is important if characters may appear in any orientation (P or d?)
• Skew invariance is important for hand-printed text or multifont machine-printed text
8
Features Extracted from
Grayscale Images
Goal: locate candidate characters. If the image is binarized, one may find the connected components of expected character size by a flood-fill type algorithm (4-way recursive method, 8-way recursive method, non-recursive scanline method etc.; see http://www.codeproject.com/KB/GDI/QuickFill.aspx). Then the bounding box is found.
A grayscale method is typically used when recognition based on the binary representation fails; then localization may be difficult. Assuming that there is a standard size for a character, one may simply try all possible locations.
In a good case, after localization one has a subimage containing one character and no other objects.
9
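The connected-component search described on the slide above can be sketched with a non-recursive BFS flood fill. This is a minimal illustration, not the course's reference implementation: the image is assumed to be a list of 0/1 rows, components are 4-connected, and only bounding boxes are returned.

```python
from collections import deque

def connected_components(img):
    """Label 4-connected components of a binary image (list of 0/1 rows)
    with a non-recursive BFS flood fill; return their bounding boxes
    as (x0, y0, x1, y1) tuples."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(x, y)])
                x0, y0, x1, y1 = x, y, x, y
                while queue:
                    cx, cy = queue.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for nx, ny in ((cx+1, cy), (cx-1, cy), (cx, cy+1), (cx, cy-1)):
                        if 0 <= nx < w and 0 <= ny < h and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
                boxes.append((x0, y0, x1, y1))
    return boxes
```

Components of unexpected size (too small or too large for a character) would then be filtered out before recognition.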
Template Matching
(not often used in OCR systems for grayscale characters)
No feature extraction is used; the template character image itself is compared to the input character image:

D_j = Σ_{i=1..M} [Z(x_i, y_i) − T_j(x_i, y_i)]²
where the character Z and the template Tj are of
the same size and summation is taken over all
the M pixels of Z. The problem is to find j for
which Dj is minimal; then Z is identified with Tj.
10
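The matching rule above is easy to state in code. A minimal sketch, with images given as flat lists of gray values of equal length (the function names are illustrative, not from the slides):

```python
def template_distance(Z, T):
    """D_j: sum of squared pixel differences between character image Z
    and template T, both given as flat lists of the same length M."""
    return sum((z - t) ** 2 for z, t in zip(Z, T))

def best_template(Z, templates):
    """Return the index j for which D_j is minimal; Z is then
    identified with template T_j."""
    return min(range(len(templates)),
               key=lambda j: template_distance(Z, templates[j]))
```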
Limitations of Template Matching
• Characters and templates must be of the same size
• The method is not invariant to changes in illumination
• Very vulnerable to noise
In template matching, all pixels are used as features. It is a better idea to apply unitary (distance-preserving) transforms to character images, obtaining a reduction of features while preserving most of the information about the character shape.
11
The Radon Transform
The Radon transform computes
projections of an image matrix along
specified directions.
A projection of a two-dimensional
function f(x,y) is a set of line integrals.
The Radon function computes the line
integrals from multiple sources along
parallel paths, or beams, in a certain
direction.
The beams are spaced 1 pixel unit apart.
To represent an image, the Radon
function takes multiple, parallel-beam
projections of the image from different
angles by rotating the source around the
center of the image. The following figure
shows a single projection at a specified
rotation angle.
12
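A single parallel-beam projection, as described above, can be sketched by accumulating each pixel's value into the bin given by its rotated coordinate x' = x·cos θ + y·sin θ about the image center, with bins 1 pixel apart. This is a rough illustration of the idea, not a full Radon transform implementation:

```python
import math

def radon_projection(img, theta_deg):
    """Parallel-beam projection of a grayscale image (list of rows) at
    angle theta: each pixel is accumulated into the bin of its rotated
    coordinate x' = x*cos(theta) + y*sin(theta) about the image center,
    with beams spaced 1 pixel unit apart."""
    h, w = len(img), len(img[0])
    t = math.radians(theta_deg)
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    bins = {}
    for y in range(h):
        for x in range(w):
            xp = round((x - cx) * math.cos(t) + (y - cy) * math.sin(t))
            bins[xp] = bins.get(xp, 0) + img[y][x]
    return [bins.get(k, 0) for k in range(min(bins), max(bins) + 1)]
```

Taking such projections at many angles yields the set of features the slide refers to.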
Projections to Various Axes
13
14
Zoning
Consider a candidate area (connected set) surrounded by a bounding box. Divide it into 5×5 equal parts and compute the average gray level in each part, yielding a 25-length feature vector.
15
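The zoning step above can be sketched as follows; this assumes the subimage is at least n×n pixels and is given as a list of rows of gray values:

```python
def zoning_features(img, n=5):
    """Divide a grayscale subimage (list of rows, at least n x n pixels)
    into an n x n grid and return the average gray level of each zone
    as a flat feature vector of length n*n."""
    h, w = len(img), len(img[0])
    feats = []
    for zy in range(n):
        for zx in range(n):
            y0, y1 = zy * h // n, (zy + 1) * h // n
            x0, x1 = zx * w // n, (zx + 1) * w // n
            pixels = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(pixels) / len(pixels))
    return feats
```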
Thinning
• Thinning is possible both for grayscale and for binary images
• Thinning = skeletonization of characters
The informal definition of a skeleton is a line representation of an object that:
i) is one pixel thick,
ii) passes through the "middle" of the object, and
iii) preserves the topology of the object.
Advantage: few features, easy to extract
16
When No Skeleton Exists
a) It is impossible to generate a one-pixel-wide skeleton that lies in the middle
b) No pixel can be left out while preserving the connectedness
17
Possible Defects
Specific defects of the data may cause misrecognition:
• Small holes → loops in the skeleton
• Single-element irregularities → false tails
• Acute angles → false tails
18
How Thinning Works
• Most thinning algorithms rely on the erosion of the boundary while maintaining connectivity; see http://www.ph.tn.tudelft.nl/Courses/FIP/noframes/fipMorpholo.html for mathematical morphology
• To avoid defects, preprocessing is desirable
• As an example, in a black-and-white application:
– remove very small holes
– remove black elements having fewer than 3 black neighbours and having connectivity 1
19
An Example of Noise Removal
This pixel will be removed (N=1;
has 1 black neighbour)
20
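The removal rule from the previous slides can be sketched as follows. The connectivity test is implemented here as the standard crossing number (count of 0→1 transitions when walking the 8-neighbourhood in a circle); the slides do not define the measure precisely, so this is an assumption:

```python
def black_neighbours(img, x, y):
    """Count black (1) pixels among the 8 neighbours of (x, y)."""
    h, w = len(img), len(img[0])
    return sum(img[y + dy][x + dx]
               for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dx or dy) and 0 <= x + dx < w and 0 <= y + dy < h)

def connectivity(img, x, y):
    """Crossing number: 0->1 transitions when walking the 8-neighbourhood
    in a circle; 1 means the pixel sits on a simple edge or tail."""
    ring = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]
    h, w = len(img), len(img[0])
    vals = [img[y + dy][x + dx] if 0 <= x + dx < w and 0 <= y + dy < h else 0
            for dx, dy in ring]
    return sum(1 for i in range(8) if vals[i] == 0 and vals[(i + 1) % 8] == 1)

def is_noise(img, x, y):
    """The slide's rule: a black pixel with fewer than 3 black
    neighbours and connectivity 1 is removed."""
    return (img[y][x] == 1 and black_neighbours(img, x, y) < 3
            and connectivity(img, x, y) == 1)
```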
Generation of Feature Vectors Using
Invariant Moments

Given a grayscale subimage Z containing a character
candidate, the moments of order p+q are defined by
m_pq = Σ_{i=1..M} Z(x_i, y_i) · x_i^p · y_i^q

where the sum is taken over all M pixels of the subimage.
The translation-invariant central moments of order p+q are obtained by shifting the origin to the center of gravity:

μ_pq = Σ_{i=1..M} Z(x_i, y_i) · (x_i − x̄)^p · (y_i − ȳ)^q

where x̄ = m_10 / m_00 and ȳ = m_01 / m_00
21
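The moment definitions above translate directly into code. A minimal sketch for a grayscale subimage given as a list of rows:

```python
def moment(img, p, q):
    """Raw moment m_pq of a grayscale subimage Z (list of rows):
    sum of Z(x, y) * x**p * y**q over all pixels."""
    return sum(v * x**p * y**q
               for y, row in enumerate(img) for x, v in enumerate(row))

def central_moment(img, p, q):
    """Translation-invariant central moment mu_pq, computed about the
    center of gravity (x_bar, y_bar) = (m10/m00, m01/m00)."""
    m00 = moment(img, 0, 0)
    xb, yb = moment(img, 1, 0) / m00, moment(img, 0, 1) / m00
    return sum(v * (x - xb)**p * (y - yb)**q
               for y, row in enumerate(img) for x, v in enumerate(row))
```

By construction mu_10 = mu_01 = 0, which is a quick sanity check on any implementation.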
Hu’s (1962) Central Moments
η_pq = μ_pq / μ_00^γ, where p + q ≥ 2, γ = (p + q)/2 + 1, μ_00 = m_00

• The η_pq's are scale invariant
• The M_i's are rotation invariant
22
K-Nearest Neighbor Classification
Example of k-NN classification. The test sample (green circle) should be classified either to the first class of blue squares or to the second class of red triangles. If k = 3 it is classified to the second class because there are 2 triangles and only 1 square inside the inner circle. If k = 5 it is classified to the first class (3 squares vs. 2 triangles inside the outer circle).
Disadvantage in practice: the distance of the green circle to all blue squares and to all red triangles has to be computed, which may take much time.
23
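The voting scheme above can be sketched in a few lines. Note that, as the slide's disadvantage says, distances to all training samples are computed:

```python
from collections import Counter

def knn_classify(train, query, k):
    """Classify `query` by majority vote among the k nearest training
    samples; `train` is a list of (feature_vector, label) pairs.
    Squared Euclidean distance is used, and distances to ALL training
    samples are computed - the practical drawback noted on the slide."""
    ordered = sorted(train,
                     key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], query)))
    votes = Counter(label for _, label in ordered[:k])
    return votes.most_common(1)[0][0]
```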
From now on we will deal with
binary (black and white) images
only
24
Projection Histograms
• These methods are typically used for
– segmenting characters, words and text lines
– detecting if a scanned text page is rotated
• But they can also provide features for recognition!
• Using the same number of bins on each axis, and dividing by the total number of pixels, the features can be made scale independent
• Projection onto the y-axis is slant invariant, but projection onto the x-axis is not
• Histograms are very sensitive to rotation
25
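The scale-normalized projection histograms described above can be sketched as follows, assuming a binary image given as a list of 0/1 rows:

```python
def projection_histograms(img, bins):
    """Projection histograms of a binary image onto the x- and y-axes:
    black-pixel counts per bin, divided by the total number of black
    pixels so that the features become scale independent."""
    h, w = len(img), len(img[0])
    total = sum(sum(row) for row in img) or 1  # avoid division by zero
    hx, hy = [0.0] * bins, [0.0] * bins
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                hx[x * bins // w] += 1 / total
                hy[y * bins // h] += 1 / total
    return hx, hy
```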
Comparison of Histograms
It seems plausible to compare two histograms y1 and y2 (where n is the number of bins) in the following way:

d = Σ_{i=1..n} |y1(x_i) − y2(x_i)|

However, the dissimilarity using cumulative histograms is less sensitive to errors. Define the cumulative histogram Y as follows:

Y(x_k) = Σ_{i=1..k} y(x_i)

For the cumulative histograms Y1 and Y2 define D as:

D = Σ_{i=1..n} |Y1(x_i) − Y2(x_i)|
26
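A quick sketch of both distances makes the slide's point concrete: shifting a histogram's mass by one bin costs the full amount under d, but only one unit under the cumulative distance D.

```python
from itertools import accumulate

def histogram_distance(y1, y2):
    """d: sum of absolute bin-wise differences of two histograms."""
    return sum(abs(a - b) for a, b in zip(y1, y2))

def cumulative_distance(y1, y2):
    """D: the same distance applied to the cumulative histograms
    Y(x_k) = sum_{i<=k} y(x_i); less sensitive to small bin shifts."""
    return histogram_distance(list(accumulate(y1)), list(accumulate(y2)))
```

For y1 = [1, 0, 0] and y2 = [0, 1, 0] (the same mass shifted by one bin), d = 2 while D = 1.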
Zoning for Binary Characters 1
Contour extraction or thinning may be unusable for self-touching characters. This kind of error often occurs in degraded machine-printed texts (generations of photocopying). The self-touching problem may be healed by morphological opening.
27
Zoning for Binary Characters 2
Similarly to the grayscale case, we consider a candidate area (connected set) surrounded by a bounding box. Divide it into 5×5 equal parts and compute the number of black pixels in each part, yielding a 25-length feature vector.
28
Generation of Moments in the Binary Case
Given a binary subimage Z containing a character
candidate, the moments of order p+q are defined by
m_pq = Σ_{black pixels} x_i^p · y_i^q

where the sum is taken over all black pixels of the subimage.
The translation-invariant central moments of order p+q are obtained by shifting the origin to the center of gravity:

μ_pq = Σ_{black pixels} (x_i − x̄)^p · (y_i − ȳ)^q

where x̄ = m_10 / m_00 and ȳ = m_01 / m_00
29
The Central Moments can be used similarly to the grayscale case
η_pq = μ_pq / μ_00^γ, where p + q ≥ 2, γ = (p + q)/2 + 1, μ_00 = m_00

• The η_pq's are scale invariant
• The M_i's are rotation invariant
30
Contour Profiles
The profiles may be outer profiles or inner profiles. To construct
profiles, find the uppermost and lowermost pixels on the contour. The
contour is split at these points. To obtain the outer profiles, for each y
select the outermost x on each contour half.
Profiles to the other axis can be constructed similarly.
31
Features Generated by Contour
Profiles
• First differences of the profiles: x'_L(y) = x_L(y+1) − x_L(y)
• Width: w(y) = x_R(y) − x_L(y)
• Height / max_y w(y)
• Location of minima and maxima of the profiles
• Location of peaks in the first differences (which may indicate discontinuities)
32
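The profile-based features listed above can be sketched as follows, assuming the left and right profiles are already extracted as equal-length lists indexed by y:

```python
def profile_features(xL, xR):
    """Features from left/right contour profiles x_L(y), x_R(y), given as
    equal-length lists: widths w(y) = x_R(y) - x_L(y), first differences
    of the left profile, and the height / max-width ratio."""
    width = [r - l for l, r in zip(xL, xR)]
    first_diff = [xL[y + 1] - xL[y] for y in range(len(xL) - 1)]
    aspect = len(xL) / max(width)  # height divided by maximum width
    return width, first_diff, aspect
```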
Zoning on Contour Curves 1 (Kimura & Sridhar)
Enlarged zone
A feature vector of size (4×4) × 4 is generated
33
Zoning on Contour Curves 2 (Takahashi)
Contour codes were extracted from inner contours (if any) as well as outer contours; the feature vector had dimension (4×6×6×6) × 4 × (2) (size × four directions × (inner and outer))
34
Zoning on Contour Curves 3 (Cao)
When the contour curve is close to a zone border, small variations in the curve may lead to large variations in the feature vector.
Solution: fuzzy border
35
Zoning of Skeletons
Features: length of the character graph in each zone (9 or 3).
By dividing the length by the total length of the graph, size independence can be achieved.
Additional features: the presence or absence of junctions or endpoints.
36
The Neural Network
Approach for Digit
Recognition
Le Cun et al:
• Each character is scaled to a 16×16 grid
• Three intermediate hidden layers
• Training on a large set
Advantage:
• feature extraction is automatic
Disadvantage:
• We do not know how it works
• The output set (here 0-19) is small
37