Visual CAPTCHA with Handwritten
Image Analysis
Amalia Rusu and Venu Govindaraju
CEDAR
University at Buffalo
Background on CAPTCHA
Completely Automated Public Turing test to tell Computers and Humans
Apart – CAPTCHA
CAPTCHA should be automatically generated and graded
Tests should be taken quickly and easily by human users
Tests should accept virtually all human users and reject software agents
Tests should resist automatic attack for many years despite
advances in technology and prior knowledge of the algorithms
Exploits the difference in abilities between humans and machines
(e.g., text, speech or facial features recognition)
A new formulation of Alan Turing’s test - “Can machines think?”
Securing Cyberspace Using CAPTCHA
The user initiates the dialog and has to be
authenticated by the server
Authentication Server
User
Challenge
Response
Internet
User authentication
Automatic Authentication Session for Web Services.
Initialization
Handwritten CAPTCHA Challenge
User Response
Verification
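The four stages named above (initialization, challenge, user response, verification) can be sketched as a small server-side object. This is only an illustrative sketch: the class name `CaptchaSession`, its methods, and the one-answer-per-challenge rule are assumptions, not details from the slides, and the challenge image is a placeholder string.

```python
import hmac
import secrets


class CaptchaSession:
    """Hypothetical server-side state for one authentication session."""

    def __init__(self, lexicon):
        # Initialization: issue a session id and pick a truth word at random.
        self.session_id = secrets.token_hex(8)
        self.truth = secrets.choice(lexicon)
        self.used = False

    def challenge(self):
        # A real server would send a distorted handwritten image of the
        # truth word; a placeholder stands in for that image here.
        return {"session_id": self.session_id, "image": "<H-CAPTCHA image>"}

    def verify(self, response):
        # Verification: a challenge may be answered only once (assumption),
        # and the comparison ignores case and surrounding whitespace.
        if self.used:
            return False
        self.used = True
        answer = response.strip().lower().encode()
        return hmac.compare_digest(answer, self.truth.lower().encode())
```

A second call to `verify` on the same session always fails, so a software agent cannot retry guesses against a single challenge.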
Objective
Develop CAPTCHAs based on the ability gap between humans
and machines in handwriting recognition using Gestalt laws of perception
State-of-the-art in HR
Lexicon   Lexicon Driven                        Grapheme Model
size      time (secs)  Top 1 (%)  Top 2 (%)     time (secs)  Top 1 (%)  Top 2 (%)
10        0.027        96.53      98.73         0.021        96.56      98.77
100       0.044        89.22      94.13         0.031        89.12      94.06
1000      0.144        75.38      86.29         0.089        75.38      86.29
20000     1.827        58.14      66.56         0.994        58.14      66.49
Speed and accuracy of a handwriting recognizer (HR). Feature extraction time is excluded.
[Xue, Govindaraju 2002]
Testing platform is an Ultra-SPARC.
H-CAPTCHA Motivation
Machine recognition of handwriting is more difficult than printed
text
Handwriting recognition is a task that humans perform easily and
reliably
Several machine printed text based CAPTCHAs have been
already broken
Greg Mori and Jitendra Malik of UC Berkeley have written a program that can solve
Ez-Gimpy with 83% accuracy
Thayananthan, Stenger, Torr, and Cipolla of the Cambridge vision group have
written a program that can achieve 93% correct recognition rate against Ez-Gimpy
Gabriel Moy, Nathan Jones, Curt Harkless, and Randy Potter of Areté Associates
have written a program that can achieve 78% accuracy against Gimpy-R
Speech/visual features based CAPTCHAs are impractical
H-CAPTCHAs thus far unexplored by the research community
H-CAPTCHA Challenges
Generation of random and ‘infinitely many’ distinct
handwritten CAPTCHAs
Quantifying and exploiting the weaknesses of state-of-the-art handwriting recognizers and OCR systems
Controlling distortion - so that they are human readable
(conform to Gestalt laws) but not machine readable
Generation of random and infinitely many distinct
handwritten text images
Use handwritten word images that current recognizers cannot read
Handwritten US city name images available from postal applications
Collect new handwritten word samples
Create real (or nonsense) handwritten words and sentences by gluing isolated
upper and lower case handwritten characters or word images
Generation of random and infinitely many distinct
handwritten text images
Use handwriting distorter for generating “human-like” samples
Models that change the trajectory/shape of the letter in a controlled fashion (e.g.
Hollerbach’s oscillation model)
Original handwritten image (a). Synthetic images (b,c,d,e,f).
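To illustrate how a trajectory model can change letter shape in a controlled fashion, here is a minimal sketch based on a simplified form of the oscillation model as it is commonly stated: two orthogonal sinusoidal velocities plus a constant horizontal drift. The function name, parameter names, and the specific perturbation values are assumptions for illustration, not the distorter used by the authors.

```python
import math


def oscillation_trajectory(a_x, a_y, omega, phase, drift, steps=200, dt=0.05):
    """Integrate x' = a_x*sin(omega*t + phase) + drift and y' = a_y*sin(omega*t).

    Returns the pen positions as a list of (x, y) points.
    """
    x, y, points = 0.0, 0.0, []
    for i in range(steps):
        t = i * dt
        x += (a_x * math.sin(omega * t + phase) + drift) * dt
        y += a_y * math.sin(omega * t) * dt
        points.append((x, y))
    return points


# Baseline stroke pattern.
original = oscillation_trajectory(1.0, 2.0, 2 * math.pi, math.pi / 2, 0.5)
# Hypothetical distortion: slightly taller letters (a_y) and a shifted
# phase between the horizontal and vertical oscillations.
variant = oscillation_trajectory(1.0, 2.4, 2 * math.pi, math.pi / 2 + 0.3, 0.5)
```

Small random perturbations of the amplitudes and phase yield many distinct but similar-looking strokes from one set of base parameters.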
Exploit the Source of Errors for State-of-the-art
Handwriting Recognizers
Word Model Recognizer (WMR) [Kim, Govindaraju 1997]
Lexicon Driven Model
lexicon driven approach
chain code based image processing
pre-processing, segmentation, feature extraction, dynamic matching
Distance between the first character ‘w’ of lexicon entry ‘word’ and the image between:
- segments 1 and 4 is 5.0
- segments 1 and 3 is 7.2
- segments 1 and 2 is 7.6
Find the best way of accounting for characters ‘w’, ‘o’, ‘r’, ‘d’ by consuming all segments 1 to 8 in the process
[Figure: segments numbered 1 to 9 with character-to-segment distances such as w[5.0], o[6.0], r[3.8], d[4.4]]
Accuscript [Xue, Govindaraju 2002]
Grapheme Based Model
grapheme-based recognizer
extracts high-level structural features from characters such as loops, turns, junctions, arcs, without previous segmentation
uses a stochastic finite state automata model based on the extracted features
uses static lexicons in the recognition process
[Figure: word image annotated with structural features: End, Loops, Junction, Turns]
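The lexicon-driven idea of finding the best way of accounting for the characters of a lexicon entry by consuming all segments can be sketched as a dynamic program. This is a generic illustration, not the WMR implementation: `best_match`, `toy_cost`, and the `max_span` limit are hypothetical, and a real recognizer would compute the character-to-image distance from extracted features.

```python
def best_match(word, n_segments, cost, max_span=4):
    """Minimum total distance for matching `word` to segments 1..n_segments.

    Each character consumes between 1 and max_span consecutive segments and
    all segments must be consumed. cost(ch, start, end) is the distance
    between character ch and the image piece covering segments start..end.
    """
    inf = float("inf")
    # best[i][j]: cheapest way for the first i characters to use j segments.
    best = [[inf] * (n_segments + 1) for _ in range(len(word) + 1)]
    best[0][0] = 0.0
    for i, ch in enumerate(word, start=1):
        for j in range(1, n_segments + 1):
            for span in range(1, min(max_span, j) + 1):
                prev = best[i - 1][j - span]
                if prev < inf:
                    candidate = prev + cost(ch, j - span + 1, j)
                    best[i][j] = min(best[i][j], candidate)
    return best[len(word)][n_segments]


def toy_cost(ch, start, end):
    # Made-up distance: every character "prefers" exactly two segments.
    return float(abs((end - start + 1) - 2))
```

Ranking every lexicon entry by `best_match` and keeping the smallest distance gives the top choice, which is why recognition slows down and confusions increase as the lexicon grows.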
Source of Errors for State-of-the-art Handwriting
Recognizers
Image quality
Background noise, printing surface, writing styles
Image features
Variable stroke width, slope, rotations, stretching, compressing
Segmentation errors
Over-segmentation, merging, fragmentation, ligatures, scrawls
Recognition errors
Confusion with similar lexicon entries, large lexicons
Gestalt Laws
Gestalt psychology is based on the observation that we often
experience things that are not a part of our simple sensations
What we are seeing is an effect of the whole event, not contained
in the sum of the parts (holistic approach)
Organizing principles: Gestalt Laws
By no means restricted to perception only (e.g. memory)
Gestalt Laws
1. Law of closure
2. Law of similarity
OXXXXXX
XOXXXXX
XXOXXXX
XXXOXXX
XXXXOXX
XXXXXOX
XXXXXXO
3. Law of proximity
**************
**************
**************
4. Law of symmetry
[ ][ ][ ]
Gestalt Laws
5. Law of continuity
a) Ambiguous segmentation
b) Segmentation based on good continuity, follows the path of minimal curvature change
c) Perceptually implausible segmentation
6. Law of familiarity
a) Ambiguous segmentation
b) Perceptual segmentation
c) Segmentation based on good continuity proves to be erroneous
Gestalt Laws
7. Figure and ground
8. Memory
Control Overlaps
Gestalt laws: proximity, symmetry, familiarity, continuity, figure and ground
Create horizontal or vertical overlaps
For the same word, smaller distance overlaps
For different words, bigger distance overlaps
Control Occlusions
Gestalt laws: closure, proximity, familiarity
Add occlusions by circles, rectangles, lines with random angles
Ensure small enough occlusions such that they do not hide letters completely
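A minimal sketch of circle occlusions on a binary image (rows of 0/1 pixels, 1 meaning ink) follows. The function name, the `max_radius` cap as the way to keep occlusions smaller than a letter, and the choice of painting with the background value are all assumptions; the slides do not specify these details.

```python
import random


def occlude_with_circles(image, n_circles, max_radius, rng=None, value=0):
    """Return a copy of `image` with randomly placed filled circles.

    value=0 paints background-colored circles that erase ink. max_radius
    caps the circle size so that no occlusion hides a whole letter.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for _ in range(n_circles):
        r = rng.randint(1, max_radius)
        cx, cy = rng.randrange(width), rng.randrange(height)
        for y in range(max(0, cy - r), min(height, cy + r + 1)):
            for x in range(max(0, cx - r), min(width, cx + r + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                    out[y][x] = value
    return out
```

In practice `max_radius` would be tied to an estimate of the letter height or stroke width of the word image, so that closure still lets a human fill in the missing parts.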
Control Occlusions
Gestalt laws: closure, proximity, familiarity
Add occlusions by waves from left to right on entire image, with various
amplitudes / wavelength or rotate them by an angle
Choose areas with more foreground pixels, on the bottom part of the text image
(not too low, not too high)
Control Extra Strokes
Gestalt laws: continuity, figure and ground, familiarity
Add occlusions drawn with the same pixel value as the foreground (black pixels):
arcs or lines of various thicknesses
Curved strokes could be confused with part of a character
Use asymmetric strokes such that the pattern cannot be learned
Control Letter/Word Orientation
Gestalt laws: memory, internal metrics, familiarity of letters
vertical mirror
horizontal mirror
flip-flop
Change the word orientation entirely, or the orientation of a few letters only
Use variable rotation, stretching, compressing
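The orientation changes can be sketched on binary images stored as rows of pixels. The mapping of names to operations is an assumption: here a left-right mirror reverses each row, a top-bottom mirror reverses the row order, and flip-flop applies both (a 180 degree rotation). The helper `transform_letters` is hypothetical and assumes the word is available as a list of per-letter images.

```python
def mirror_left_right(image):
    """Mirror about the vertical axis: reverse every row."""
    return [row[::-1] for row in image]


def mirror_top_bottom(image):
    """Mirror about the horizontal axis: reverse the order of the rows."""
    return [row[:] for row in image[::-1]]


def flip_flop(image):
    """Apply both mirrors, which equals a 180 degree rotation."""
    return mirror_top_bottom(mirror_left_right(image))


def transform_letters(letter_images, transform, indices):
    """Apply `transform` only to the letters whose positions are in `indices`."""
    return [transform(img) if i in indices else img
            for i, img in enumerate(letter_images)]
```

Passing every index changes the orientation of the whole word, while a small index set reorients only a few letters.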
General H-CAPTCHA Generation Algorithm
Input.
Original (randomly selected) handwritten image (existing US city name
image or synthetic word image with length 5 to 8 characters or meaningful
sentence)
Lexicon containing the image’s truth word
Output.
H-CAPTCHA image
Method.
Randomly choose a number of transformations
Randomly establish the transformations corresponding to the given number
If more than one transformation is chosen then
An a priori order is assigned to each transformation based on experimental results
Sort the list of chosen transformations by their a priori order and apply them
in sequence, so that the effect is cumulative
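The method above can be sketched as follows. The function name `generate_h_captcha` and the representation of each transformation as a `(priority, name, function)` tuple are assumptions for illustration; the numeric priorities stand in for the experimentally determined a priori order, which the slides do not list.

```python
import random


def generate_h_captcha(image, transformations, rng=None):
    """Apply a random subset of transformations in their a priori order.

    `transformations` is a list of (priority, name, function) tuples, where
    each function takes an image and returns the transformed image.
    Returns the final image and the names of the applied transformations.
    """
    rng = rng or random.Random()
    # Randomly choose how many transformations to apply, then which ones.
    count = rng.randint(1, len(transformations))
    chosen = rng.sample(transformations, count)
    # Sort by a priori order so the cumulative effect is well defined.
    chosen.sort(key=lambda item: item[0])
    applied = []
    for _, name, transform in chosen:
        image = transform(image)
        applied.append(name)
    return image, applied
```

Fixing the order matters because the transformations do not commute in general: overlapping words after adding occlusions gives a different image than occluding the overlapped result.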
Testing Results on Machines
HW Recognizer                    WMR                  Accuscript
Lexicon Size                     4,000     40,000     4,000     40,000
Occlusion by circles             35.93%    20.28%     32.34%    17.37%
Vertical Overlap                 27.88%    14.36%     12.64%    3.94%
Horizontal Overlap (Small)       24.35%    10.70%     2.93%     0.60%
Black Waves                      16.36%    5.33%      1.57%     0.38%
Occlusion by waves               15.43%    7.00%      10.56%    4.28%
Horizontal Overlap (Large)       12.93%    3.56%      2.42%     0.36%
Overlap Different Words          3.80%     0.48%      4.43%     0.92%
Flip-Flop                        0.46%     0.14%      0.70%     0.19%
General Image Transformations    9.28%     N/A        4.41%     N/A
The accuracy of HR on images deformed using the Gestalt laws approach. The number of tested images is
4,127 for each type of transformation. HR running time increases from a few seconds per image for
a lexicon of 4,000 to several minutes per image for a lexicon of 40,000.
Testing Results on Humans
Human Tests                   Nr. of Tested Images    Accuracy
All Transforms                1069                    76.08%
Occlusion by circles          90                      67.78%
Vertical Overlap              88                      87.50%
Horizontal Overlap (Small)    90                      76.67%
Black Waves                   90                      80.00%
Occlusion by waves            87                      80.46%
Horizontal Overlap (Large)    89                      65.17%
The accuracy of human readers on images deformed using the Gestalt laws approach.
A word image is recognized correctly when all characters are recognized.
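The scoring rule stated above, where a word counts as correct only when every character is right, can be written as a short function. The name `word_accuracy` and the use of exact, case-sensitive string comparison are assumptions; the slides do not say how case was handled.

```python
def word_accuracy(responses, truths):
    """Percentage of word images whose response matches the truth exactly.

    A single wrong character makes the whole word count as an error.
    """
    correct = sum(r == t for r, t in zip(responses, truths))
    return 100.0 * correct / len(truths)


# Hypothetical responses: the last word has one wrong character.
responses = ["buffalo", "albany", "ithaca", "utika"]
truths = ["buffalo", "albany", "ithaca", "utica"]
score = word_accuracy(responses, truths)
```

Word-level scoring is stricter than character-level scoring, so the reported human accuracies are a lower bound on how many characters readers got right.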
[Chart: accuracy (0 to 90) by transformation for WMR(4000), WMR(40000), Accuscript(4000), Accuscript(40000), and Humans]
H-CAPTCHA Evaluation
No risk of image repetition
Image generation completely automated: words, images and distortions
chosen at random
The transformed images cannot be easily normalized or rendered
noise free by present computer programs, although original images
must be public knowledge
Deformed images do not pose problems to humans
Human subjects succeeded on our test images
Test against state-of-the-art: Word Model Recognizer, Accuscript
CAPTCHAs unbroken by state-of-the-art recognizers
Future Work
Develop general methods to attack H-CAPTCHA (e.g. pre- and
post-processing techniques)
Research lexicon-free approaches for handwriting recognition
Quantify the gap between humans and machines in reading
handwriting by category (of distortions & Gestalt laws)
Parameterize the difficulty levels of Gestalt based H-CAPTCHAs
Thank You
Questions?