Transcript Slide 1

CEDAR
Chaincode Generation
Input
Image
[Figure: Eight Contour Directions, numbered 0 to 7 on X and Y axes; contour separation extracted by algorithm; chain records with data and information fields (Status, Mode, Slope, Curvature), terminated by end of chain]
Output
Chaincode contour
Represented as an array of <x, y> coordinates and corresponding slopes (0..7) at each contour point
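The output format described above can be sketched as follows. This is a minimal illustration, not the CEDAR implementation: the direction numbering (0 along +x, then counterclockwise in 45 degree steps, with y pointing up) and the function name `chaincode` are assumptions.

```python
# Assumed numbering: 0 = +x, then counterclockwise in 45 degree steps (y up).
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}

def chaincode(points):
    """Return [(x, y, slope)] for a closed contour of 8-connected (x, y) points."""
    result = []
    for i, (x, y) in enumerate(points):
        nx, ny = points[(i + 1) % len(points)]  # wrap around: contour is closed
        result.append((x, y, DIRECTIONS[(nx - x, ny - y)]))
    return result

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(chaincode(square))  # [(0, 0, 0), (1, 0, 2), (1, 1, 4), (0, 1, 6)]
```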
Algorithm
Input
- Array of bytes representing pixels.
- Value 0 for black and 255 for white.
Start at upper right corner of image
Travel to the left, down a row, until you move from white pixel to black pixel
Is it marked?
No: New object, so mark pixel and store it. Travel counterclockwise around boundary, storing visited pixels, and marking pixels as necessary, until you return to the start of the contour
Yes: Already visited, so continue
At lower left corner?
No: Keep traveling
Yes: Output
Output
Contour representation of the image
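The scan-and-trace loop above might look like the sketch below. It is a simplified illustration using Moore-neighbor tracing, which is an assumption (the slide does not name the tracing rule); the names `RING`, `trace`, and `find_contours` are hypothetical. It stops as soon as the start pixel is reached again and marks only contour pixels, so it can miss contours that share pixels with an already traced one.

```python
# Neighbor offsets (d_row, d_col) in counterclockwise order as seen on screen.
RING = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def is_black(img, r, c):
    """Pixels outside the image count as white."""
    return 0 <= r < len(img) and 0 <= c < len(img[0]) and img[r][c] == 0

def trace(img, start, backtrack):
    """Follow the boundary from start; backtrack is the white pixel we came from."""
    contour, p, b = [start], start, backtrack
    while True:
        i = RING.index((b[0] - p[0], b[1] - p[1]))
        for k in range(1, 8):  # sweep counterclockwise, starting just after b
            dr, dc = RING[(i + k) % 8]
            q = (p[0] + dr, p[1] + dc)
            if is_black(img, *q):
                pr, pc = RING[(i + k - 1) % 8]  # last white neighbor examined
                b, p = (p[0] + pr, p[1] + pc), q
                break
        else:
            return contour  # isolated pixel: no black neighbor
        if p == start:
            return contour
        contour.append(p)

def find_contours(img):
    """Scan from the upper right corner, leftwards, row by row."""
    marked, contours = set(), []
    for r in range(len(img)):
        for c in range(len(img[0]) - 1, -1, -1):
            # White-to-black transition while traveling left, not yet marked.
            if is_black(img, r, c) and not is_black(img, r, c + 1) and (r, c) not in marked:
                contour = trace(img, (r, c), (r, c + 1))
                marked.update(contour)
                contours.append(contour)
    return contours

image = [[255] * 4 for _ in range(4)]
for r, c in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    image[r][c] = 0
print(find_contours(image))  # [[(1, 2), (1, 1), (2, 1), (2, 2)]]
```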
Pre-scan Digit Recognition
Input
Chaincode contour of connected components in address block
Use fast digit recognizer
- POLY or CP
on each appropriate component in address block
Output
- Recognition choice with confidence on each component
- Confidence of characters is typically low
- Confidence of “real” numerals is typically high
POLY Digit Recognizer
• Method
– 1240 binary pixel pair features used
– Linear discriminant classifier used
• Performance
– 1000 digits per second on an RS 6000
– 94% recognition rate on a standard test set
– useful in separating alpha characters and numerals
• Feature Extraction
– Set of 1240 binary (on, off) features
– Features are based on whether particular pixels or pairs of
pixels are BLACK
– Pixel pairs are empirically determined
– Consider distinguishing “7” and “2”
• Classification
– Uses linear discriminant functions
– Training:
• 1241 weights (one for each of the 1240 features, plus a constant) are determined for each of the 10 classes
– Testing: For each new test image do the following:
• For each digit class (0..9) create a sum consisting of all the
weights corresponding to a feature that is “on”, add in the
constant
• Compare the 10 sums and choose the largest value
• This is the top choice class
– Output:
• Ranked list of the 10 classes sorted by the sums
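The testing step above can be sketched as a few lines of code. This is only an illustration under stated assumptions: the function name `rank_classes` and the toy weights (3 classes, 4 features instead of 10 classes and 1240 features) are hypothetical.

```python
def rank_classes(on_features, weights, constants):
    """Rank classes by (constant + sum of weights of the features that are on).

    on_features: indices of the features that are "on"
    weights[c][f]: weight of feature f for class c
    constants[c]: constant term for class c
    """
    sums = [constants[c] + sum(weights[c][f] for f in on_features)
            for c in range(len(constants))]
    # Largest sum first: the first entry is the top choice class.
    return sorted(range(len(sums)), key=lambda c: sums[c], reverse=True)

# Toy example with made-up numbers.
weights = [[1, 0, 0, 0], [0, 2, 0, 1], [0, 0, 3, 0]]
constants = [0.5, 0.0, -1.0]
print(rank_classes([1, 3], weights, constants))  # [1, 0, 2]
```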
CP, Digit Recognizer
• Method
– combines a 3-layer back propagation neural network classifier using Curvature features with POLY
– Top 2 choices of POLY and top 2 choices of Curvature recognizer are combined using logistic regression
• Performance
– 170 digits per second on an RS 6000
– 96% recognition rate on standard test set
Curvature, Digit Recognizer
• Input
– Binary image of digit size normalized by imposing a 4x4 grid
on the image
– Since the features are region based (as opposed to pixel
based) this form of size normalization is effective
• Feature Extraction
– Set of 296 real-valued features
• 208 based on contour shape (slope and curvature)
– For each of the 16 regions in the 4x4 grid determine
• percent of pixels with each of the 8 possible slopes
• percent of pixels with each of 5 ranges of curvature computed over a 12-pixel neighborhood window
[Figure: Slopes (directions 0 to 7) and Curvatures (ranges -2 to 2)]
• 84 based on stroke transitions between regions
– Chaincode represents the contour as a sequence of boundary pixels, so there is a notion of “moving” from one part of the image to another
– In a 4x4 grid, there are 84 possible transitions
• 4 based on size, location, and number of interior contours
– Image is divided into 3 regions: UPPER, MIDDLE, and LOWER
– Determine the center of region bounded by interior contours
– Location of center determines which of 3 features is set
– Value of feature is ratio of “hole” area to area of bounding box
– Last feature stores number of interior contours present
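The count of 84 transitions in a 4x4 grid can be checked with a short computation, assuming a transition is an ordered move from one region to any of its 8-connected neighbors (that reading is an assumption; the slide only gives the total).

```python
def count_transitions(rows=4, cols=4):
    """Count ordered (from, to) pairs of 8-adjacent regions in a grid."""
    count = 0
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    inside = 0 <= r + dr < rows and 0 <= c + dc < cols
                    if (dr, dc) != (0, 0) and inside:
                        count += 1
    return count

# 4 corners x 3 + 8 edge regions x 5 + 4 inner regions x 8 = 84
print(count_transitions())  # 84
```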
• Classification
– Uses a 3-layer back propagation neural network
• 296 input nodes for feature values
• 80 hidden nodes
• 10 output nodes (1 for each digit class)
– Connections between nodes have associated weights
determined during training
– Output node reporting the highest value corresponds to
classifier’s top choice
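A forward pass through a 296-80-10 network of this shape could be sketched as below. The sigmoid activation, the bias terms, and the random weights are assumptions for illustration only; trained weights would come from back propagation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: weights[j] holds the weights into node j."""
    return [sigmoid(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]

def classify(features, w_hidden, b_hidden, w_out, b_out):
    """Return (top choice class, list of output node values)."""
    hidden = layer(features, w_hidden, b_hidden)
    outputs = layer(hidden, w_out, b_out)
    top = max(range(len(outputs)), key=lambda k: outputs[k])
    return top, outputs

rng = random.Random(0)

def random_matrix(rows, cols):
    # Placeholder weights; a real classifier would load trained values.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

w_hidden, b_hidden = random_matrix(80, 296), [0.0] * 80
w_out, b_out = random_matrix(10, 80), [0.0] * 10
features = [rng.random() for _ in range(296)]
top, outputs = classify(features, w_hidden, b_hidden, w_out, b_out)
print(top, len(outputs))
```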
Thresholding Performance Graph
[Figure: Rejection Rate (0 to 65) versus Error Rate (0 to 4) for the combinator and curvature recognizers]
GSC - Top Level
Input
Put bounding box around the image
Hyper GSC Recognizer
Is confidence level of first class 0?
NO: Output
YES: GSC Recognizer, then Output
GSC, Digit Recognizer
• Method
– 512 binary valued features representing Gradient, Structural,
Concavity characteristics of the image
– Uses a nearest neighbor classifier
• Performance
– 100 digits per second on an RS 6000
– 97% recognition rate on standard test set
• Image Processing
– Size normalization accomplished by imposing a 4x4 grid
• Grid is determined by partitioning the image horizontally and
vertically into 4 equal pixel mass partitions
[Figure: Uniform and Variable Gridding. % Reject (0 to 100) versus % Error (0 to 50) for the Original and Variable Grid curves]
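Partitioning into 4 equal pixel mass partitions along one axis can be sketched as follows. The function name `equimass_cuts` and the profile representation (black pixel count per column or row) are assumptions; the same routine would be applied once horizontally and once vertically.

```python
def equimass_cuts(profile, parts=4):
    """Return parts-1 cut indices splitting profile into equal-mass slices.

    profile[i] is the number of black pixels in column (or row) i.
    Cut k is placed right after the first index where the running total
    reaches k/parts of the total mass.
    """
    total = sum(profile)
    if total == 0:
        return []  # blank image: nothing to partition
    cuts, running, k = [], 0, 1
    for i, mass in enumerate(profile):
        running += mass
        while k < parts and running * parts >= k * total:
            cuts.append(i + 1)
            k += 1
    return cuts

profile = [0, 4, 4, 0, 0, 2, 2, 2, 2, 0]
print(equimass_cuts(profile))  # [2, 3, 7]
```

With these cuts the slices `profile[0:2]`, `profile[2:3]`, `profile[3:7]`, and `profile[7:10]` each hold 4 of the 16 black pixels, so dense parts of the image get narrower cells than sparse parts.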
• Feature Extraction
– Set of 512 binary features
– Choice of features motivated by belief that multi-scale
features have the best chance of capturing the difference
between classes of digits or characters
– 192 Gradient features (finest scale)
• Gradient is the angle perpendicular to local direction of the
contour boundary and is computed at every pixel
• Quantized to 12 different ranges of angles
• Histogram of occurrences of angles (ranges) for each of the 16
regions in the 4x4 grid are computed
• Histogram values that cross a threshold are turned “on”
– 192 Structural features (intermediate scale)
• 12 structures consisting of groups of pixels form mini-strokes
– horizontal stroke: upper and lower surfaces
– vertical stroke: left and right surfaces
– diagonal rising: upper and lower surfaces
– diagonal falling: upper and lower surfaces
– corners (4)
• If any pixel group falling in a region (4x4) satisfies the rule for a
mini-stroke, the feature is “on”
– 128 Concavity features (coarsest scale)
• 16 pixel density features
– Does the percentage of “on” pixels in region (4 x 4) exceed a threshold?
• 32 large stroke features
– Does region (4 x 4) contain a horizontal run or vertical run of “on” pixels greater in length than a threshold?
• 80 concavity features
– Does region (4 x 4) contain a concavity pointing
• up
• down
• left
• right
• enclosed “hole”
• Classification
– Identifies 6 nearest neighbors from among templates
– Takes the weighted vote of the neighbors where each
neighbor’s vote is weighted by its proximity to the test vector
– Performance of classifier is dependent on how representative
the templates are of the set of “all possible” digits
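A proximity-weighted vote over the nearest templates might be sketched as below. The Hamming distance on the binary feature vectors and the weight 1 / (1 + distance) are assumptions; the slide states only that 6 neighbors vote, weighted by proximity. The toy example uses 4-bit vectors and k = 3 instead of 512 bits and k = 6.

```python
from collections import defaultdict

def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_vote(test, templates, k=6):
    """templates: list of (feature_vector, label). Returns the winning label."""
    nearest = sorted(templates, key=lambda t: hamming(test, t[0]))[:k]
    votes = defaultdict(float)
    for vector, label in nearest:
        # Closer templates get a larger vote (assumed weighting).
        votes[label] += 1.0 / (1.0 + hamming(test, vector))
    return max(votes, key=votes.get)

templates = [
    ([0, 0, 0, 0], "0"), ([0, 0, 0, 1], "0"),
    ([1, 1, 1, 1], "1"), ([1, 1, 1, 0], "1"), ([1, 1, 0, 0], "1"),
]
print(knn_vote([0, 0, 0, 1], templates, k=3))  # 0
```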
Gradient Features
Input
Put a 4 x 4 non-uniform grid on the image by placing the sampling regions at equimass divisions of the histogram
Smooth the image by filtering
Convolve the image with 3x3 Sobel operators to
find the gradients
Divide the range of directions into 12 non-overlapping regions, each of 2*pi/12 radians
Do a histogram based thresholding for each
sampling region
In each of the 4x4 regions if there are no pixels
with gradient values in a particular range then
set the corresponding bit in feature vector to 1
(12 bit feature vector for each region
corresponding to 12 bins of directions)
Output
12x4x4 = 192 bit feature vector
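The Sobel and quantization steps can be sketched for a single pixel as follows. This is an illustration under assumptions: ink pixels are taken as 1 and background as 0, bin 0 starts at angle 0, and the function name `gradient_bin` is hypothetical. Smoothing, the per-region histogram, and the thresholding are left out.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def gradient_bin(img, r, c, bins=12):
    """Direction bin (0..bins-1) at interior pixel (r, c), or None if flat."""
    gx = sum(SOBEL_X[i][j] * img[r + i - 1][c + j - 1]
             for i in range(3) for j in range(3))
    gy = sum(SOBEL_Y[i][j] * img[r + i - 1][c + j - 1]
             for i in range(3) for j in range(3))
    if gx == 0 and gy == 0:
        return None  # no gradient at this pixel
    angle = math.atan2(gy, gx) % (2 * math.pi)   # map to [0, 2*pi)
    return int(angle / (2 * math.pi / bins)) % bins

vertical_edge = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
print(gradient_bin(vertical_edge, 1, 1))  # 0
```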
Structural Features
Input
Place a 4x4 fixed grid on the image
Apply a set of 12 rules to each pixel to
find the stroke and corner features
For each of the 4x4 regions, is the number of pixels satisfying a rule > the threshold set for the rule?
YES: Set the corresponding bit in feature vector to 1 (12 bit feature vector for each region signifying the 12 rules)
NO: Set the corresponding bit in feature vector to 0
Output
12x4x4 = 192 bit feature vector as structural features
Concavity Features
Input
Place 4x4 fixed grid on the image
Convolve the image with a starlike operator by shooting rays in 8 directions and determining what each ray hits
Define eight types of pixels depending on how the rays shot out from the pixel hit the boundary
For each type of pixel define a threshold. For each type of pixel set aside a bit in the feature vector for each of the regions
Is (number of pixels of the corresponding type)/(area of region) > threshold set for the type of pixel?
YES: Set the corresponding bit in the feature vector to 1
NO: Set the corresponding bit in the feature vector to 0
Output
8x4x4 = 128 bit feature vector as the concavity features
Word Recognition Control
Input
- Word Image
- Lexicon
- Word Recognizer - 1 (WMR)
- Word Recognizer - 2 (CMR)
Call WMR with expanded lexicon
WMR results
conf = LO: REJECT
conf = HI: ACCEPT WMR top choice
conf(top) = MED: Call CMR with n-best WMR choices (n<11)
CMR results
conf = LO: REJECT
conf = HI: ACCEPT CMR top choice
conf = MED: Classifiers concur?
NO: REJECT
YES: ACCEPT common top choice (conf = HI)
Output
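The control flow above can be sketched as a small function. The callables `wmr` and `cmr`, their return shape (ranked choices plus a confidence of "LO", "MED", or "HI"), and the name `recognize_word` are hypothetical; only the branching follows the flowchart.

```python
def recognize_word(image, lexicon, wmr, cmr, n_best=10):
    """Return the accepted word, or None for REJECT.

    wmr(image, lexicon) and cmr(image, lexicon) are assumed to return
    (ranked_choices, confidence) with confidence in {"LO", "MED", "HI"}.
    """
    wmr_choices, wmr_conf = wmr(image, lexicon)
    if wmr_conf == "LO":
        return None                      # REJECT
    if wmr_conf == "HI":
        return wmr_choices[0]            # ACCEPT WMR top choice
    # WMR confidence is MED: ask CMR, restricted to the n-best WMR choices.
    cmr_choices, cmr_conf = cmr(image, wmr_choices[:n_best])
    if cmr_conf == "LO":
        return None                      # REJECT
    if cmr_conf == "HI":
        return cmr_choices[0]            # ACCEPT CMR top choice
    # Both MED: accept only if the two classifiers concur.
    return wmr_choices[0] if wmr_choices[0] == cmr_choices[0] else None

# Stub recognizers standing in for the real ones.
stub_wmr = lambda image, lexicon: (["word", "ward"], "MED")
stub_cmr = lambda image, lexicon: (["word"], "MED")
print(recognize_word(None, ["word", "ward", "wood"], stub_wmr, stub_cmr))  # word
```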
WMR
Input
- Chaincode of Word Image
- Lexicon
Over-segmentation of word into characters so that
no two characters remain merged
Features extracted from each segment
Match one or more (up to four) segments with each
character of a single lexicon entry
Derive “goodness” of match between segments
and a lexicon entry
Score match for all lexicon entries
Output
Rank the lexicon based on matching score
CEDAR
WMR Features
• 74 chaincode based features are extracted: 2 global and 72 local features.
• Distribution of the 8 directional slopes for 9 (3 x 3) sub-images forms the 72 local features.
– global features
$F_{g_i} = \mathrm{sigmoid}\left(\frac{H_i - V_i}{V_i}\right)$ for $i = 1, 2$
where
$H_1 = X_{max} - X_{min}$, $V_1 = Y_{max} - Y_{min}$ for aspect ratio
$H_2 = N_{horizontal\_stroke}$, $V_2 = N_{vertical\_stroke}$ for aspect ratio
– local feature
$\frac{s_{ij}}{N_i S_j}$, with $S_j = \max_i \frac{s_{ij}}{N_i}$
WMR
[Figure: word image split into segments; matching graph over nodes 1 to 9 with edges labeled by character and distance, e.g. w[5.0], o[6.6], r[3.8], d[4.4]]
Distance between lexicon entry ‘word’ first character ‘w’ and the image between:
- segments 1 and 4 is 5.0
- segments 1 and 3 is 7.2
- segments 1 and 2 is 7.6
Find the best way of accounting for characters ‘w’, ‘o’, ‘r’, ‘d’ by consuming all segments 1 to 8 in the process
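Finding the best way to account for all characters while consuming all segments can be sketched with dynamic programming. The slide does not say which search is used, so the dynamic program is an assumption, as are the name `best_match`, the span convention (a character covers segments start..end inclusive), and the toy distance function, which stands in for the real segment-to-character distance.

```python
def best_match(word, n_segments, distance, max_span=4):
    """Return (total_distance, spans) for the cheapest segmentation.

    distance(ch, start, end): assumed cost of matching character ch against
    segments start..end (1-based, inclusive). Each character takes between
    1 and max_span segments, and every segment must be consumed.
    """
    inf = float("inf")
    # best[i][j]: cheapest way to match the first i characters to the first j segments
    best = [[(inf, None)] * (n_segments + 1) for _ in range(len(word) + 1)]
    best[0][0] = (0.0, [])
    for i, ch in enumerate(word, start=1):
        for j in range(1, n_segments + 1):
            for span in range(1, min(max_span, j) + 1):
                k = j - span
                prev_cost, prev_spans = best[i - 1][k]
                if prev_cost == inf:
                    continue
                cost = prev_cost + distance(ch, k + 1, j)
                if cost < best[i][j][0]:
                    best[i][j] = (cost, prev_spans + [(k + 1, j)])
    return best[len(word)][n_segments]

# Toy distance: every character "prefers" to cover exactly 2 segments.
toy = lambda ch, start, end: abs((end - start + 1) - 2)
print(best_match("word", 8, toy))  # (0.0, [(1, 2), (3, 4), (5, 6), (7, 8)])
```

Running this once per lexicon entry and sorting by the total distance would give a ranked lexicon.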
CMR
Input
- Chaincode of word image
- Lexicon
Over segmentation of characters so that no two
characters remain merged
Features extracted from each segment
Recognize one or more (up to four) segments as a
single character of the alphabet
Obtain character strings (ASCII) corresponding
to the segments in the word image
Derive “goodness” of match between character
string and lexicon entries
Output
Rank the lexicon based on “goodness” score
CMR
[Figure: word image split into segments 1 to 8; graph over the segment points with edges labeled by candidate characters and confidences, e.g. w[.7], o[.5], r[.4], d[.8]]
- Image from segment 1 to 3 is recognized as a character with 0.5 confidence
- Image from segment 1 to 4 is a ‘w’ with 0.7 confidence
- Image from segment 1 to 5 is a ‘w’ with 0.6 confidence and an ‘m’ with 0.3 confidence
Find the best path in graph from segment 1 to 8: word
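The best-path search over such a graph can be sketched as below. Scoring a path by the product of its edge confidences is an assumption (the slide does not give the scoring rule), and the edge list is a made-up example loosely modeled on the figure labels, not the actual graph.

```python
def best_path(edges, source, target):
    """Return (score, string) of the highest scoring path, or None.

    edges: list of (start, end, char, confidence) with start < end, so the
    graph is acyclic and can be processed in order of start node.
    """
    best = {source: (1.0, "")}
    for start, end, ch, conf in sorted(edges):
        if start in best:
            score = best[start][0] * conf
            if end not in best or score > best[end][0]:
                best[end] = (score, best[start][1] + ch)
    return best.get(target)

# Hypothetical edges between segment points 1..8.
edges = [
    (1, 4, "w", 0.7), (1, 5, "m", 0.3), (4, 6, "o", 0.5), (5, 6, "i", 0.8),
    (6, 7, "r", 0.4), (7, 8, "d", 0.8), (6, 8, "u", 0.3),
]
score, text = best_path(edges, 1, 8)
print(text, round(score, 3))  # word 0.112
```

The resulting character string would then be compared against the lexicon entries to derive the “goodness” score.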
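Hover System
[Figure: image features (img ftrs) and lexicon features (lex ftrs) feed a series of match stages (match length, match gaps, match word lengths, match ascenders, match descenders) at phrase level and word level; the match results are combined (+) and end in accept or reject]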