The Limits of OCR - University at Buffalo


Image Understanding & Web Security

Henry Baird

Joint work with:

Richard Fateman, Allison Coates, Kris Popat, Monica Chew, Tom Breuel, & Mark Luk

A fast-emerging research topic

Human Interactive Proofs (HIPs; definition later):

– first instance in 1999
– research took hold in CS security theory field first
– intersects image understanding, cog sci, etc
– fast attracting researchers, engineers, & users

This talk:

– A brief history of HIPs
– Existing systems (w/ my critiques)
– Professional activities, so far (incl. the 1st Int’l Workshop)

In detail:

– PARC’s PessimalPrint & BaffleText

H. Baird & K. Popat, “Web Security & Document Image Analysis,” in J. Hu & A. Antonacopoulos (Eds.), Web Document Analysis, World Scientific, 2003 (in press).

DIAR, Madison, WI – June 21, 2003 (HSB) 2

Straws in the wind…

90’s: spammers trolling for email addresses

– in defense, people disguise them, e.g. “baird AT parc DOT com”

1997: abuse of ‘Add-URL’ feature at AltaVista

– some write programs to add their URL many times
– skewed the search rankings

Andrei Broder et al (then at DEC SRC):

– a user action which is legitimate when performed once becomes abusive when repeated many times
– no effective legal recourse
– how to block or slow down these programs …

An image of text, not ASCII

The first known instance…

Altavista’s AddURL filter

1999: “ransom note filter”

– randomly pick letters, fonts, rotations
– render as an image
– every user is required to read and type it in correctly
– reduced “spam add_URL” by “over 95%”

Weaknesses: isolated chars, filterable noise, affine deformations

M. D. Lillibridge, M. Abadi, K. Bharat, & A. Z. Broder, “Method for Selectively Restricting Access to Computer Systems,” U.S. Patent No. 6,195,698, Filed April 13, 1998, Issued February 27, 2001.
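A minimal sketch of how a "ransom note" challenge of this kind might be generated and graded. The font list, rotation range, and function names are illustrative assumptions, not details of the AltaVista filter, and rendering the glyphs into an image is left out.

```python
import random
import string

# Hypothetical values: the slide does not give the real font set or angles.
FONTS = ["Times", "Courier", "Helvetica", "Palatino"]
MAX_ROTATION_DEG = 30


def make_challenge(length=6, rng=None):
    """Pick a letter, a font, and a rotation at random for each glyph."""
    rng = rng or random.Random()
    return [
        {
            "char": rng.choice(string.ascii_uppercase),
            "font": rng.choice(FONTS),
            "rotation": rng.uniform(-MAX_ROTATION_DEG, MAX_ROTATION_DEG),
        }
        for _ in range(length)
    ]


def answer(challenge):
    """The string the user is expected to type."""
    return "".join(glyph["char"] for glyph in challenge)


def grade(challenge, response):
    """Case-insensitive comparison of the typed response with the answer."""
    return response.strip().upper() == answer(challenge)
```

Because each glyph is chosen and placed independently, an attacker can segment and classify the characters one at a time, which is the "isolated chars" weakness listed above.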

Yahoo!’s “Chat Room Problem”

September 2000

Udi Manber asked Prof. Manuel Blum’s group at CMU:

– programs impersonate people in chat rooms, then hand out ads – ugh!
– how can all machines be denied access to a Web site without inconveniencing any human users?

I.e., how to distinguish between machines and people on-line … a kind of ‘Turing test’!

Alan Turing (1912-1954)

1936: a universal model of computation
1940s: helped break Enigma (U-boat) cipher
1949: first serious uses of a working computer, including plans to read printed text (he expected it would be easy)
1950: proposed a test for machine intelligence

Turing’s Test for AI

How to judge that a machine can ‘think’:

– play an ‘imitation game’ conducted via teletypes
– a human judge & two invisible interlocutors: a human, and a machine ‘pretending’ to be human
– after asking any questions (challenges) he/she wishes, the judge decides which is human
– failure to decide correctly would be convincing evidence of machine intelligence (Turing asserted)

Modern GUIs invite richer challenges than teletypes….

A. Turing, “Computing Machinery & Intelligence,” Mind, Vol. 59(236), 1950.

“CAPTCHAs”:

Completely Automated Public Turing Tests to Tell Computers & Humans Apart

(M. Blum, L. A. von Ahn, J. Langford, et al, CMU-SCS)

– challenges can be generated & graded automatically (i.e. the judge is a machine)
– accepts virtually all humans, quickly & easily
– rejects virtually all machines
– resists automatic attack for many years (even assuming that its algorithms are known?)

NOTE: the machine administers, but cannot pass the test!

L. von Ahn, M. Blum, N.J. Hopper, J. Langford, “CAPTCHA: Using Hard AI Problems For Security,” Proc., EuroCrypt 2003, Warsaw, Poland, May 4-8, 2003 [to appear].

CMU’s ‘Gimpy’ CAPTCHA

– Randomly pick: English words, deformations, occlusions, backgrounds, etc
– Challenge user to type in any three of the words
– Designed by CMU team: tried out by Yahoo!
– Problem: users hated it; Yahoo! withdrew it

L. von Ahn, M. Blum, N. J. Hopper, J. Langford, The CAPTCHA Web Page, http://www.captcha.net.

Yahoo!’s present CAPTCHA: “EZ-Gimpy”

– Randomly pick: one English word, deformations, degradations, occlusions, colored backgrounds, etc
– Better tolerated by users
– Now used on a large scale to protect various services
– Weaknesses: a single typeface, English lexicon

PayPal’s CAPTCHA

– Nothing published
– Seems to use a single typeface
– Picks, at random: letters, overlain pattern
– Weaknesses: single typeface, simple grid, no image degradations, spaced apart

Cropping up everywhere…

In use today, to defend against:

– skewing search-engine rankings (Altavista, 1999)
– infesting chat rooms, etc (Yahoo!, 2000)
– gaming financial accounts (PayPal, 2001)
– robot spamming (MailBlocks, SpamArrest 2002)

In the last few months:

Overture, Chinese website, HotMail, CD rebate, TicketMaster, MailFrontier, Qurb, Madonnarama, … have you seen others?

On the horizon:

– ballot stuffing, password guessing, denial-of-service attacks
– ‘blunt force’ attacks (e.g. UT Austin break-in, Mar ’03)
– …many others

Similar problems w/ scrapers; also, likely on Intranets.

D. P. Baron, “eBay and Database Protection,” Case No. P-33, Case Writing Office, Stanford Graduate School of Business, Stanford Univ., 2001.

The Known Limits of Image Understanding Technology

There remains a large gap in ability between human and machine vision systems, even when reading printed text

Performance of OCR machines has been systematically studied: 7 year olds can consistently do better!

This ability gap has been mapped quantitatively

S. Rice, G. Nagy, T. Nartker, OCR: An Illustrated Guide to the Frontier, Kluwer Academic Publishers: 1999.

Image Degradation Modeling

Effects of printing & imaging:

[Figure: sample images under varying blur, threshold (thrs), and sensitivity (sens)]

We can generate challenging images pseudorandomly

H. Baird, “Document Image Defect Models,” in H. Baird, H. Bunke, & K. Yamamoto (Eds.), Structured Document Image Analysis, Springer-Verlag: New York, 1992.
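To make pseudorandom degradation concrete, here is a simplified sketch using three parameters named after those in the figure. It assumes a binary image stored as a list of rows (1 = black), uses a box blur as a stand-in for an optical point-spread function, and models sensitivity as per-pixel Gaussian noise added before thresholding. These are illustrative assumptions, not the cited defect model itself.

```python
import random


def box_blur(img, radius):
    """Mean filter over a (2r+1) x (2r+1) window; outside the image counts as white (0)."""
    h, w = len(img), len(img[0])
    n = (2 * radius + 1) ** 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
            out[y][x] = total / n
    return out


def degrade(img, blur_radius, threshold, sensitivity, rng):
    """Blur, add per-pixel noise of std. dev. `sensitivity`, then binarize."""
    blurred = box_blur(img, blur_radius)
    return [
        [1 if v + rng.gauss(0.0, sensitivity) > threshold else 0 for v in row]
        for row in blurred
    ]
```

In this sketch a low threshold thickens strokes until neighboring characters merge, while a high threshold thins and breaks them; drawing the parameters at random from chosen ranges yields an unlimited supply of distinct degraded images.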

Machine Accuracy is a Smooth Monotonic Function of Parameters

T. K. Ho & H. S. Baird, “Large Scale Simulation Studies in Image Pattern Recognition,” IEEE Trans. on PAMI, Vol. 19, No. 10, p. 1067-1079, October 1997.


Can You Read These Degraded Images?

Of course you can ….

but OCR machines cannot!


Experiments by PARC & UCB-CS

Pick words at random:

– 70 words commonly used on the Web
– w/out ascenders or descenders (cf. Spitz)

Vary physics-based image degradation parameters:

– blur, threshold, x-scale (within certain ranges)

Pick fonts at random from a large set:

– Times Roman (TR), Times Italic (TI), Palatino Roman (PR), Palatino Italic (PI), Courier Roman (CR), Courier Oblique (CO), etc

Test legibility on:

– ten human volunteers (UC Berkeley CS Dept grad students)
– three OCR machines: Expervision TR (E), ABBYY FineReader (A), IRIS Reader (I)

Results: OCR Accuracy, by machine

[Figure: bar chart of the fraction of words correct (0 to 1) for each OCR machine (Expervision, ABBYY, IRIS), broken down by typeface (Times R, Times I, Courier O, Palatino R, Palatino I) and in total]

Each machine has its peculiar blind spots

OCR Accuracy: varying blur & threshold

The machines share some blind spots

PessimalPrint:

exploiting image degradations

Three OCR machines fail when:

– blur = 0.0 & threshold in the range 0.02 - 0.08 (OCR outputs: ~~~.I~~~, ~~i1~~, N/A)
– threshold = 0.02 & any value of blur (OCR outputs: N/A, N/A, ~~I~~)

… but people find all these easy to read

A. Coates, H. Baird, R. Fateman, “Pessimal Print: A Reverse Turing Test,” Proc. 6th IAPR Int’l Conf. On Doc. Anal. & Recogn. (ICDAR’01), Seattle, WA, Sep 10-13, 2001.

High Time for a Workshop!

Manuel Blum proposes it, rounds up some key speakers

Henry Baird offers PARC as venue; Kris Popat helps run it

Goals:

– Invite all known principals: theory, systems, engineers, users
– Describe the state of the art
– Plan next steps for the field

Organization:

– ~30 attendees
– abstracts only, 1-5 pages, no refereeing, no archival publication
– 100% participation: everyone gives a (short) talk
– “mixing it up”: panel & working group discussions
– 2-1/2 days, lots of breaks for informal socializing
– plenary talk by John McCarthy, ‘Father of AI’

1st NSF Int’l Workshop on Human Interactive Proofs

PARC, Palo Alto, CA, January 9-11, 2002


HIP’2002 Participants

CMU SCS, Aladdin Center: Manuel Blum, Lenore Blum, Luis von Ahn, John Langford, Guy Blelloch, Nick Hopper, Ke Yang, Brighten Godfrey, Bartosz Przydatek, Rachel Rue

PARC SPIA/Security/Theory: Henry Baird, Kris Popat, Tom Breuel, Prateek Sarkar, Tom Berson, Dirk Balfanz, David Goldberg

UCB CS & SIMS: Richard Fateman, Allison Coates, Jitendra Malik, Doug Tygar, Alma Whitten, Rachna Dhamija, Monica Chew, Adrian Perrig, Dawn Song

RPI: George Nagy

Stanford: John McCarthy

NSF: Robert Sloan

Altavista: Andrei Broder

Yahoo!: Udi Manber

Bell Labs: Dan Lopresti

IBM T.J. Watson: Charles Bennett

InterTrust Star Labs: Stuart Haber

City Univ. of Hong Kong: Nancy Chan

Weizmann Institute: Moni Naor

RSA Security Laboratories: Ari Juels

Document Recognition Techs, Inc: Larry Spitz

Variations & Generalizations

CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart

HUMANOID: Text-based dialogue which an individual can use to authenticate that he/she is himself/herself (‘naked in a glass bubble’)

PHONOID: Individual authentication using spoken language

Human Interactive Proof (HIP): An automatically administered challenge/response protocol allowing a person to authenticate him/herself as belonging to a certain group over a network without the burden of passwords, biometrics, mechanical aids, or special training.

Highlights of HIP’2002

Theory

– some text-based CAPTCHAs are provably breakable

Ability Gaps

– vision: gestalt, segmentation, noise immunity, style consistency
– speech: noise of many kinds, clutter (cocktail party effect)
– intelligence: puzzles, analogical reasoning, weak logic
– gestures, reflexes, common knowledge, …

Applications

– subtle system-level vulnerabilities
– aggressive arms race with shadowy enemies

http://www.parc.com/istl/groups/did/HIP2002

Funding & Partnerships

NSF

– Robert Sloan, Dir, Theory of Computing Pgm
– strongly supportive of this newborn field
– encouraged grant proposals

Yahoo!

– willing to run field trials
– user acceptance laboratory
– able to detect intrusion

Disciplines

Participating now:

– Cryptography
– Security
– Pattern Recognition
– Computer Vision
– Artificial Intelligence
– eCommerce

Needed:

– Cognitive Science
– Psychophysics (esp. of Reading)
– Biometrics
– Business, Law, …
– ….?

Weaknesses of Existing Reading-Based CAPTCHAs

English lexicon is too predictable:

– dictionaries are too small
– only 1.2 bits of entropy per character (cf. Shannon)

Physics-based image degradations are vulnerable to well-studied image restoration attacks

Complex images irritate people:

– even when they can read them
– need user-tolerance experiments
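A rough worked comparison shows why the lexicon weakness matters. The sketch below assumes the 1.2 bits/character figure from the slide can be applied to a single word, which is only a heuristic since the figure describes running English text. It compares this with a string of independent, uniformly chosen lowercase letters.

```python
import math

BITS_PER_CHAR_ENGLISH = 1.2            # figure quoted on the slide (cf. Shannon)
BITS_PER_CHAR_RANDOM = math.log2(26)   # uniform choice among 26 lowercase letters


def guesses(bits_per_char, length):
    """Effective number of equally likely candidates: 2 ** (total entropy in bits)."""
    return 2 ** (bits_per_char * length)


english = guesses(BITS_PER_CHAR_ENGLISH, 8)
uniform = guesses(BITS_PER_CHAR_RANDOM, 8)
print(f"8-letter English word: ~{english:.0f} candidates")
print(f"8 random letters:      ~{uniform:.3g} candidates")
```

Under these assumptions an 8-letter English word offers fewer than a thousand effective candidates, against 26^8 (about 2 x 10^11) for random letters, so an attacker who can score candidate words against the image has a far smaller search.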

Strengths of Human Reading

Literature on the psychophysics of reading is relevant:

– familiarity helps, e.g. English words
– optimal word-image size (subtended angle) is known (0.3-2 degrees)
– optimal contrast conditions known
– other factors measured for the best performance: to achieve and sustain “critical reading speed”

BUT gives no answer to: where’s the optimal comfort zone?

G. E. Legge, D. G. Pelli, G. S. Rubin, & M. M. Schleske, “Psychophysics of Reading: I. normal vision,” Vision Research 25 (2), 1985.

A. J. Grainger & J. Segui, “Neighborhood Frequency Effects in Visual Word Recognition,” Perception & Psychophysics 47, 1990.

Designing a Stronger CAPTCHA: BaffleText principles

Nonsense words.

– generate ‘pronounceable’ (not ‘spellable’) words using a variable-length character n-gram Markov model
– they look familiar, but aren’t in any lexicon, e.g. ablithan, wouquire, quasis

Gestalt perception.

– force inference of a whole word-image from fragmentary or occluded characters
– using a single familiar typeface also helps

M. Chew & H. S. Baird, “BaffleText: A Human Interactive Proof,” Proc., SPIE/IS&T Conf. on Document Recognition & Retrieval X, Santa Clara, CA, January 23-24, 2003.
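The nonsense-word principle above can be sketched with a small character-level Markov generator. This is a simplified illustration: it uses a fixed two-character context in place of the variable-length n-gram model named on the slide, and the function names, length limits, and training lexicon are assumptions.

```python
import random
from collections import defaultdict


def train(words, order=2):
    """Map each `order`-character context to the characters observed after it."""
    model = defaultdict(list)
    for w in words:
        padded = "^" * order + w.lower() + "$"   # ^ = start padding, $ = end of word
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model


def generate(model, lexicon, order=2, min_len=5, max_len=9, rng=None, max_tries=1000):
    """Sample words from the model, rejecting real words and bad lengths."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        context, out = "^" * order, []
        while len(out) <= max_len:
            ch = rng.choice(model[context])
            if ch == "$":
                break
            out.append(ch)
            context = context[1:] + ch
        word = "".join(out)
        if min_len <= len(word) <= max_len and word not in lexicon:
            return word
    return None   # nothing acceptable found within max_tries
```

Trained on a large English word list, a generator like this tends to produce strings that follow English letter patterns; rejecting anything found in the lexicon keeps the challenge out of reach of dictionary-based recognition.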

Mask Degradations

Parameters of pseudorandom mask generator:

– shape type: square, circle, ellipse, mixed
– density: black-area / whole-area
– range of radii of shapes
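A minimal sketch of a mask generator driven by these three parameters. It assumes the mask is a binary grid (1 = black), that shapes are stamped at uniformly random positions until the requested density is reached, and that radii are integers of at least 1. How the mask is then combined with the word image is not covered here.

```python
import random


def make_mask(width, height, shape="circle", density=0.2, radii=(2, 5), rng=None):
    """Stamp random shapes until black-area / whole-area reaches `density`."""
    rng = rng or random.Random()
    mask = [[0] * width for _ in range(height)]
    black, target = 0, density * width * height
    while black < target:
        kind = rng.choice(["square", "circle", "ellipse"]) if shape == "mixed" else shape
        cx, cy = rng.randrange(width), rng.randrange(height)
        rx = rng.randint(*radii)
        ry = rng.randint(*radii) if kind == "ellipse" else rx
        # Visit the shape's bounding box, clipped to the image.
        for y in range(max(0, cy - ry), min(height, cy + ry + 1)):
            for x in range(max(0, cx - rx), min(width, cx + rx + 1)):
                inside = kind == "square" or ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1
                if inside and not mask[y][x]:
                    mask[y][x] = 1
                    black += 1
    return mask
```

The final density can overshoot the target by up to one shape's area, since the loop only checks the black fraction between stamps.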

BaffleText Experiments at PARC

Goal: map the margins of accurate & comfortable human reading on this family of images

Metrics:

– objective difficulty: accuracy
– subjective difficulty: rating
– response time
– exit survey: how tolerable overall

Participation:

– 41 individual sessions
– >1200 challenge/response trials
– 18 exit surveys

BaffleText challenge webpage


BaffleText user ratings


User Acceptance

% of subjects willing to solve a BaffleText…

17%   every time they send email
39%   … if it cut spam by 10x
89%   every time they register for an e-commerce site
94%   … if it led to more trustworthy recommendations
100%  every time they register for an email account

Out of 18 responses to the exit survey.

Subjective difficulty tracks objective difficulty


How to engineer BaffleText

When we generate a challenge,

– need to estimate its difficulty
– throw away if too easy or too hard

Apply an idea from the psychophysics of reading:

– image “complexity” metric: how hard to read
– simple to compute: perimeter² / black-area
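The complexity metric can be sketched for a binary image as follows. The perimeter here is measured by counting exposed pixel edges, which is an assumption: the slides do not say how perimeter was measured, so values from this sketch need not be on the same scale as those reported in the talk.

```python
def complexity(img):
    """Return perimeter**2 / black_area for a binary image (1 = black)."""
    h, w = len(img), len(img[0])
    area = perimeter = 0
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            area += 1
            # Each side of a black pixel that faces white or the image border
            # contributes one unit of perimeter.
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                yy, xx = y + dy, x + dx
                if not (0 <= yy < h and 0 <= xx < w) or not img[yy][xx]:
                    perimeter += 1
    if area == 0:
        raise ValueError("image has no black pixels")
    return perimeter ** 2 / area
```

The ratio is independent of scale for a solid shape (a filled square scores 16 at any size under this edge-counting rule), and it grows as the ink is broken into fragments, which is what masking does to a word image.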

Image complexity predicts objective difficulty


Image complexity predicts subjective difficulty


Engineering guidelines

For high performance, image complexity should fall in the range 50-100

[Figure: sample images at complexity 50 and 100]

Within this regime, BaffleText performs well:

– 100% human subjects willing to try to read it
– 89% accuracy by humans
– 0% accuracy by commercial OCR
– 3.3 difficulty rating, out of 10 (on average)
– 8.7 seconds / trial on average
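The guideline suggests a generate-and-filter loop: draw a candidate challenge, measure its complexity, and discard it unless it falls in the 50-100 range. In the sketch below, `generate` and `measure` are hypothetical callables supplied by the caller, and the retry limit is an assumption.

```python
def accept(complexity_value, low=50.0, high=100.0):
    """True if the complexity lies in the recommended range (inclusive)."""
    return low <= complexity_value <= high


def next_challenge(generate, measure, max_tries=100):
    """Draw candidates until one falls inside the accepted complexity range."""
    for _ in range(max_tries):
        candidate = generate()
        if accept(measure(candidate)):
            return candidate
    raise RuntimeError("no candidate in the target complexity range")
```

Keeping the filter separate from the generator means the acceptance range can be re-tuned from new user trials without touching the image-generation code.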

The latest serious (known or published) attack…

Greg Mori & Jitendra Malik (UCB-CS)

– Generalized Shape Context CV method
– requires known lexicon – else, fails completely
– expects known font (or fonts) – else, does worse

Results of Mori-Malik attacks (Dec 2002), given perfect foreknowledge of both lexicon and font:

CAPTCHA: attack success rate
– EZ-GIMPY (Yahoo! + CMU): 83%
– PessimalPrint (PARC + UCB): 40%
– BaffleText (PARC + UCB): 25%

G. Mori & J. Malik, “Recognizing Objects in Adversarial Clutter,” submitted to CVPR’03, Madison, WI, June 16-22, 2003.

BaffleText:

the strongest known CAPTCHA?

Resists many known algorithmic attacks:

– physics-based image restoration
– recognizing into a lexicon
– known-typeface targeting
– segmenting then recognizing

Exploits hard-to-automate human cognition powers:

– Gestalt perception
– “semi-linguistic” familiarity
– within typeface “style consistency”

Recent Microsoft CAPTCHA

– Random strings, local space-warping; plus meaningless curving strokes, both black (overlaid) and white (erasing)
– Fielded Dec 2002 on Passport (HotMail, etc)
– Immediate reduction in new Hotmail accounts, with virtually no user complaints

P. Y. Simard, R. Szeliski, J. Benaloh, J. Couvreur, I. Calinov, “Using Character Recognition and Segmentation to Tell Computer from Humans,” Proc., Int’l Conf. on Document Analysis & Recognition, Edinburgh, Scotland, August, 2003 [to appear].

PARC’s Leadership in R&D on Reading-based CAPTCHAs

First refereed article on CAPTCHAs:

– A. L. Coates, H. S. Baird, R. Fateman, “Pessimal Print: a Reverse Turing Test,” Proc., 6th IAPR Int’l Conf. On Document Analysis & Recognition, Seattle, WA, Sept. 10-13, 2001.

First professional HIP event, organized by PARC:

– 1st NSF Int’l Workshop on HIPs, Jan. 9-11, 2002, PARC, Palo Alto, CA.

First to ‘play both offense & defense’:

– builds high-performance OCR systems; attacks CAPTCHAs
– builds strong CAPTCHAs

First to validate using human-factors research:

– human-subject trials measuring both accuracy & tolerance
– PARC’s interdisciplinary tradition: social + computer sciences

The Arms Race

When will serious technical attacks be launched?

– ‘spam kings’ make $$ millions
– two spam-blocking e-commerce firms now use CAPTCHAs

How long can a CAPTCHA withstand attack?

– especially if its algorithms are published or guessed

Strategy: keep a pipeline of defenses in reserve:

– continuing partnership between R&D & users

Lots of Open Research Questions

What are the most intractable obstacles to machine vision?

– segmentation, occlusion, degradations, …?

Under what conditions is human reading most robust?

– linguistic & semantic context, Gestalt, style consistency…?

Where are ‘ability gaps’ located?

– quantitatively, not just qualitatively

How to generate challenges strictly within ability gaps?

– fully automatically
– an indefinitely long sequence of distinct challenges

HIP Research Community

– PARC CAPTCHA website: www.parc.com/istl/projects/captcha
– HIP’2002 Workshop: www.parc.com/istl/groups/did/HIP2002
– HIP Website at Aladdin Center, CMU-SCS: www.captcha.net
– Volunteers for a PARC CAPTCHA usability test?
– A 2nd HIP Workshop soon?

Alan Turing might have enjoyed the irony …

A technical problem – machine reading – which he thought would be easy, has resisted attack for 50 years, and now allows the first widespread practical use of variants of his test for artificial intelligence.

Contact

Henry S. Baird

www.parc.com/baird