Transcript Slide 1
CEDAR Chaincode Generation
Input: image data
[Figure: the eight contour directions (0..7) in the X-Y plane; each contour point carries slope, curvature, status, and mode data, along with contour-separation and end-of-chain information]
Output: chaincode contour, represented as an array of <x, y> coordinates and the corresponding slope (0..7) at each contour point

Algorithm
Input: an array of bytes representing pixels, with value 0 for black and 255 for white
– Start at the upper right corner of the image.
– Travel to the left, moving down a row at the end of each row, until you move from a white pixel to a black pixel.
– Is that pixel already marked? If yes, it belongs to a contour that has already been traced, so continue scanning.
– If no, it is a new object: mark the pixel and store it, then travel counterclockwise around the boundary, storing visited pixels and marking pixels as necessary, until you return to the start of the contour.
– Repeat until you reach the lower left corner.
Output: contour representation of the image

Pre-scan Digit Recognition
Input: chaincode contours of the connected components in the address block
– Run a fast digit recognizer (POLY or CP) on each appropriate component in the address block.
Output: a recognition choice with a confidence for each component
– Confidence is typically low for alphabetic characters.
– Confidence is typically high for “real” numerals.

POLY Digit Recognizer
• Method
– 1240 binary pixel-pair features
– Linear discriminant classifier
• Performance
– 1000 digits per second on an RS/6000
– 94% recognition rate on a standard test set
– Useful for separating alphabetic characters from numerals
• Feature Extraction
– Set of 1240 binary (on/off) features
– Features are based on whether particular pixels or pairs of pixels are BLACK
– Pixel pairs are determined empirically
– Example: consider distinguishing “7” from “2”
• Classification
– Uses linear discriminant functions
– Training: 1241 weights (one for each of the 1240 features, plus a constant) are determined for each of the 10 classes
– Testing: for each new test image, do the following:
• For each digit class (0..9), create a sum of all the weights corresponding to features that are “on”,
then add in the constant.
• Compare the 10 sums and choose the largest value; this is the top choice class.
– Output: a ranked list of the 10 classes, sorted by their sums

CP Digit Recognizer
• Method
– Combines POLY with a 3-layer back-propagation neural network classifier that uses curvature features
– The top 2 choices of POLY and the top 2 choices of the curvature recognizer are combined using logistic regression
• Performance
– 170 digits per second on an RS/6000
– 96% recognition rate on a standard test set

Curvature Digit Recognizer
• Input
– Binary image of the digit, size-normalized by imposing a 4x4 grid on the image
– Since the features are region based (as opposed to pixel based), this form of size normalization is effective
• Feature Extraction
– Set of 296 real-valued features (208 + 84 + 4)
– 208 based on contour shape (slope and curvature): for each of the 16 regions in the 4x4 grid, determine
• the percentage of contour pixels with each of the 8 possible slopes
• the percentage of contour pixels in each of 5 ranges of curvature, computed over a 12-pixel window
[Figure: the 8 slope directions (0..7) and the 5 curvature ranges (-2, -1, 0, 1, 2)]
– 84 based on stroke transitions between regions
• The chaincode represents the contour as a sequence of boundary pixels, so there is a notion of “moving” from one part of the image to another
• In a 4x4 grid, there are 84 possible transitions between neighboring regions
– 4 based on the size, location, and number of interior contours
• The image is divided into 3 regions: UPPER, MIDDLE, and LOWER
• Determine the center of the region bounded by interior contours
• The location of that center determines which of the 3 features is set
• The value of that feature is the ratio of the “hole” area to the area of the bounding box
• The last feature stores the number of interior contours present
• Classification
– Uses a 3-layer back-propagation neural network
• 296 input nodes for feature values
• 80 hidden nodes
• 10 output nodes (1 for each digit class)
– Connections between nodes have associated weights determined during training
– The output node reporting the highest
value corresponds to the classifier’s top choice.

Thresholding Performance Graph
[Figure: rejection rate (0 to 60%) vs. error rate (0 to 4%) for the combinator and curvature recognizers]

GSC: Top Level
Input: put a bounding box around the image
– Run the Hyper GSC recognizer.
– Is the confidence level of the first class 0? If NO, output the Hyper GSC result; if YES, run the GSC recognizer and output its result.

GSC Digit Recognizer
• Method
– 512 binary-valued features representing Gradient, Structural, and Concavity characteristics of the image
– Uses a nearest neighbor classifier
• Performance
– 100 digits per second on an RS/6000
– 97% recognition rate on a standard test set
• Image Processing
– Size normalization is accomplished by imposing a 4x4 grid
• The grid is determined by partitioning the image horizontally and vertically into 4 partitions of equal pixel mass
[Figure: Uniform and Variable Gridding, % Reject vs. % Error for the original (uniform) grid and the variable grid]
• Feature Extraction
– Set of 512 binary features (192 + 192 + 128)
– The choice of features is motivated by the belief that multi-scale features have the best chance of capturing the differences between classes of digits or characters
– 192 Gradient features (finest scale)
• The gradient is the angle perpendicular to the local direction of the contour boundary and is computed at every pixel
• Angles are quantized into 12 ranges
• A histogram of occurrences of each angle range is computed for each of the 16 regions in the 4x4 grid
• Histogram values that cross a threshold are turned “on”
– 192 Structural features (intermediate scale)
• 12 structures, consisting of groups of pixels, form mini-strokes:
– horizontal stroke: upper and lower surfaces
– vertical stroke: left and right surfaces
– diagonal rising: upper and lower surfaces
– diagonal falling: upper and lower surfaces
– corners (4)
• If any pixel group falling in a region (4x4) satisfies the rule for a
mini-stroke, the feature is “on”.
– 128 Concavity features (coarsest scale)
• 16 pixel density features: does the percentage of “on” pixels in the region (4x4) exceed a threshold?
• 32 large stroke features: does the region (4x4) contain a horizontal or vertical run of “on” pixels longer than a threshold?
• 80 concavity features: does the region (4x4) contain a concavity pointing up, down, left, or right, or an enclosed “hole”?
• Classification
– Identifies the 6 nearest neighbors among the templates
– Takes a weighted vote of the neighbors, where each neighbor’s vote is weighted by its proximity to the test vector
– Performance of the classifier depends on how representative the templates are of the set of “all possible” digits

Gradient Features
Input
– Put a 4x4 non-uniform grid on the image by placing the sampling divisions at equal-mass divisions of the pixel histogram.
– Smooth the image by filtering.
– Convolve the image with 3x3 Sobel operators to find the gradients.
– Divide the range of directions into 12 non-overlapping ranges, each 2π/12 radians (30°) wide.
– Do histogram-based thresholding for each sampling region: in each of the 4x4 regions, if the number of pixels with gradient directions in a particular range exceeds the threshold, set the corresponding bit in the feature vector to 1 (a 12-bit feature vector for each region, corresponding to the 12 direction bins).
Output: 12x4x4 = 192-bit gradient feature vector

Structural Features
Input
– Place a 4x4 fixed grid on the image.
– Apply a set of 12 rules to each pixel to find the stroke and corner features.
– For each of the 4x4 regions: is the number of pixels satisfying a rule greater than the threshold set for that rule?
If YES, set the corresponding bit in the feature vector to 1; if NO, set it to 0 (a 12-bit feature vector for each region, one bit per rule).
Output: 12x4x4 = 192-bit structural feature vector

Concavity Features
Input
– Place a 4x4 fixed grid on the image.
– Convolve the image with a star-like operator that shoots rays in 8 directions from each pixel and determines what each ray hits.
– Define eight types of pixels depending on how the rays shot out from the pixel hit the boundary.
– For each type of pixel, define a threshold and set aside a bit in the feature vector for each of the regions.
– Is (number of pixels of that type)/(area of the region) greater than the threshold set for that type? If YES, set the corresponding bit in the feature vector to 1; if NO, set it to 0.
Output: 8x4x4 = 128-bit concavity feature vector

Word Recognition Control
Input: word image, lexicon, word recognizer 1 (WMR), word recognizer 2 (CMR)
– Call WMR with the expanded lexicon.
• conf = LO: REJECT the WMR results
• conf = HI: ACCEPT the WMR top choice
• conf(top) = MED: call CMR with the n-best WMR choices (n < 11)
– CMR result:
• conf = LO: REJECT the CMR results
• conf = HI: ACCEPT the CMR top choice
• conf = MED: do the classifiers concur? If YES, ACCEPT the common top choice (output with conf = HI); if NO, REJECT.

WMR
Input: chaincode of the word image, lexicon
– Over-segment the word into characters so that no two characters remain merged.
– Extract features from each segment.
– Match one or more (up to four) segments with each character of a single lexicon entry.
– Derive the “goodness” of match between the segments and the lexicon entry.
– Score the match for all lexicon entries.
Output: the lexicon ranked by matching score

WMR Features
• 74 chaincode-based features are extracted: 2 global and 72 local features.
• The distributions of the 8 directional slopes over 9 (3x3) sub-images form the 72 local features.
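The local-feature bullet above can be illustrated with a short Python sketch. It assumes the chaincode is available as (x, y, slope) triples, splits the contour’s bounding box into a 3x3 grid of equal-size cells, and normalizes each cell’s 8-bin slope histogram by the number of contour points in that cell. The function name `wmr_local_features`, the input format, and the equal-size split are assumptions for illustration, not the CEDAR implementation.

```python
from typing import List, Sequence, Tuple


def wmr_local_features(points: Sequence[Tuple[int, int, int]]) -> List[float]:
    """Hypothetical sketch of 72 WMR-style local features.

    points: (x, y, slope) triples along the chaincode contour, slope in 0..7
            (assumed input format).
    Returns 9 * 8 = 72 values: for each cell of a 3x3 grid over the contour's
    bounding box (row-major), the fraction of that cell's contour points having
    each of the 8 slopes.
    """
    if not points:
        return [0.0] * 72
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    width = max(xs) - x_min + 1
    height = max(ys) - y_min + 1

    # counts[cell][slope]: number of contour points with that slope in that cell
    counts = [[0] * 8 for _ in range(9)]
    for x, y, slope in points:
        col = 3 * (x - x_min) // width   # 0, 1, or 2 (equal-size split, assumed)
        row = 3 * (y - y_min) // height
        counts[row * 3 + col][slope] += 1

    features: List[float] = []
    for cell in counts:
        n = sum(cell)  # contour points in this sub-image (assumed normalizer)
        features.extend(c / n if n else 0.0 for c in cell)
    return features
```

Each non-empty cell contributes a distribution that sums to 1, and empty cells contribute eight zeros, so the output length is always 72 regardless of the image size.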
Global features:
$$F_{g_i} = \mathrm{sigmoid}\left(\frac{H_i - V_i}{V_i}\right), \quad i = 1, 2$$
where $H_1 = X_{\max} - X_{\min}$ and $V_1 = Y_{\max} - Y_{\min}$ (aspect ratio), and $H_2 = N_{\text{horizontal stroke}}$, $V_2 = N_{\text{vertical stroke}}$ (horizontal vs. vertical stroke counts).
Local features:
$$F_{l_{ij}} = \frac{s_{ij}}{N_i}$$
where $s_{ij}$ is the number of contour points with slope $j$ in sub-image $i$ and $N_i$ is the number of contour points in sub-image $i$.

WMR Example
[Figure: segmentation lattice for the word image, segments 1 to 8, with spans labeled by a candidate character and its distance, e.g. w[5.0], w[7.2], w[7.6]]
Distance between the first character ‘w’ of the lexicon entry ‘word’ and the image between:
– segments 1 and 4: 5.0
– segments 1 and 3: 7.2
– segments 1 and 2: 7.6
Find the best way of accounting for the characters ‘w’, ‘o’, ‘r’, ‘d’ by consuming all segments 1 to 8 in the process.

CMR
Input: chaincode of the word image, lexicon
– Over-segment the characters so that no two characters remain merged.
– Extract features from each segment.
– Recognize one or more (up to four) segments as a single character of the alphabet.
– Obtain the character strings (ASCII) corresponding to the segments in the word image.
– Derive the “goodness” of match between the character strings and the lexicon entries.
Output: the lexicon ranked by “goodness” score

CMR Example
[Figure: word image over-segmented into segments 1 to 8, and the corresponding graph whose edges carry candidate characters with confidences, e.g. w[.7], w[.6]/m[.3], i[.8]/l[.8], d[.8], o[.5], r[.4]]
– The image from segment 1 to 3 is a [?] with 0.5 confidence
– The image from segment 1 to 4 is a ‘w’ with 0.7 confidence
– The image from segment 1 to 5 is a ‘w’ with 0.6 confidence and an ‘m’ with 0.3 confidence
Find the best path in the graph from segment 1 to 8 (example word: ‘word’).

Hover System
[Figure: image features (img ftrs) and lexicon features (lex ftrs) are compared through a cascade of matches (length, gaps, word lengths, ascenders, descenders) at the word and phrase levels, ending in accept or reject]
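The WMR matching step described earlier (each character of a lexicon entry consumes one to four consecutive segments, and all segments must be consumed) can be framed as dynamic programming. The sketch below is a hypothetical illustration: `match_lexicon_entry`, the distance-callback signature, and summing the per-character distances into one match score are assumptions, since the slides do not say how the “goodness” of a match is aggregated.

```python
import math
from typing import Callable, List, Optional, Tuple

# Hypothetical callback: dist(char_index, first_seg, last_seg) returns the distance
# between one lexicon character and the image spanning segments first_seg..last_seg
# (0-based, inclusive).
DistFn = Callable[[int, int, int], float]


def match_lexicon_entry(n_chars: int, n_segments: int, dist: DistFn,
                        max_group: int = 4
                        ) -> Tuple[float, Optional[List[Tuple[int, int]]]]:
    """Assign each character, in order, to 1..max_group consecutive segments so
    that every segment is consumed, minimizing the summed distance (assumed score).
    Returns (cost, spans), or (inf, None) if no valid assignment exists."""
    inf = math.inf
    # best[k][s]: lowest cost of matching the first k characters to the first s segments
    best = [[inf] * (n_segments + 1) for _ in range(n_chars + 1)]
    back = [[0] * (n_segments + 1) for _ in range(n_chars + 1)]
    best[0][0] = 0.0
    for k in range(1, n_chars + 1):
        for s in range(1, n_segments + 1):
            for g in range(1, min(max_group, s) + 1):  # g = segments for character k-1
                prev = best[k - 1][s - g]
                if prev == inf:
                    continue
                cost = prev + dist(k - 1, s - g, s - 1)
                if cost < best[k][s]:
                    best[k][s] = cost
                    back[k][s] = g
    if best[n_chars][n_segments] == inf:
        return inf, None
    # Walk the back-pointers to recover which segments each character consumed.
    spans: List[Tuple[int, int]] = []
    s = n_segments
    for k in range(n_chars, 0, -1):
        g = back[k][s]
        spans.append((s - g, s - 1))
        s -= g
    spans.reverse()
    return best[n_chars][n_segments], spans
```

Ranking a lexicon would then amount to calling the function once per entry (for ‘word’, n_chars = 4 and n_segments = 8) and sorting entries by cost, lower being better; `dist` would supply values such as the 5.0 between ‘w’ and segments 1 and 4.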