Transcript Slide 1

Optical Character Recognition

By Vicky Shunkwiler, Heather Hurd, and Sam Stone

What is OCR?

OCR is the electronic identification and digital encoding of printed or handwritten characters by means of an optical scanner and specialized software.

Informational Video

OCR FOR DUMMIES

How does OCR work?

A scanner captures printed text as an image on a computer. The OCR software then interprets that image and converts it into machine-readable text, which can, for example, be "read" aloud through synthetic speech on the computer.

OCR Software

The OCR software analyzes the scanned page and determines whether each part of it contains images or text. It then identifies letters and words by recognizing the shapes of characters, matching them against repeated patterns of familiar forms.
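The shape-matching idea can be illustrated with a minimal template-matching sketch. It assumes each scanned character has already been isolated as a small black-and-white bitmap; the 3x3 templates and the names `TEMPLATES`, `similarity`, and `recognize` are hypothetical and chosen only for illustration. Real OCR systems use much larger images and more sophisticated features.

```python
# Hypothetical 3x3 bitmaps for a few letters ("1" = dark pixel, "0" = light pixel).
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def similarity(a, b):
    """Fraction of pixels that agree between two equally sized bitmaps."""
    total = 0
    same = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += 1
            same += pixel_a == pixel_b
    return same / total


def recognize(glyph):
    """Return the template letter whose shape best matches the scanned glyph."""
    return max(TEMPLATES, key=lambda letter: similarity(glyph, TEMPLATES[letter]))


# A slightly damaged "T": the bottom pixel of the stem is missing.
noisy_t = ("111", "010", "000")
print(recognize(noisy_t))  # prints "T" (8 of 9 pixels match)
```

Matching by "closest familiar form" rather than exact equality is what lets the software still read a character when a pixel is broken or faint.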

History of OCR

Originally developed in 1929 by G. Tauschek in Germany. In the 1950s, the US funded development of its own version of OCR, first through the American Bankers Association and the financial services industry, to process checks.

History

In 1953, David Shepard founded Intelligent Machines Research Corporation (IMR). Shepard developed "Gismo," which proved limited compared with later IMR systems that could scan and recognize most documents.

History

OCR/IMR systems were first used by Reader's Digest, IBM, Standard Oil Company, the US Air Force, and credit card companies. OCR also became widely used in the US, British, and Canadian postal services starting in the 1970s.

OCR Today

OCR software can distinguish most fonts and some handwritten text. The current price is between $3,000 and $10,000; however, prices are decreasing as OCR becomes more popular in businesses.

OCR Today

OCR software has become a popular aid for people with visual impairments because it scans in text and can read it aloud to them.

Factors Affecting OCR Accuracy

An accuracy rate exceeding 98% is necessary for OCR to be more effective than rekeying (retyping the text by hand).
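One way to see why the threshold matters is to count how many characters would still need manual correction. This short sketch assumes accuracy is measured per character and uses a hypothetical page length of 2,000 characters; neither figure comes from the slides.

```python
def expected_errors(characters, accuracy):
    """Expected number of misread characters at a given per-character accuracy."""
    return characters * (1 - accuracy)


PAGE_CHARS = 2000  # hypothetical page length, assumed for illustration only

for acc in (0.95, 0.98, 0.99):
    errors = expected_errors(PAGE_CHARS, acc)
    print(f"{acc:.0%} accuracy -> about {errors:.0f} errors per page")
# prints:
# 95% accuracy -> about 100 errors per page
# 98% accuracy -> about 40 errors per page
# 99% accuracy -> about 20 errors per page
```

Every one of those errors has to be found and fixed by a person, so as accuracy drops the proofreading effort can approach the cost of simply retyping the page.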

Hardware and Software Variables

Scanner quality
Recognition method and algorithm
Type of font
Scan resolution
Generation of the original
Type of binding

Paper Quality and Typeface Clarity

Pale, broken, or touching characters may not be recognized.
Stains, marks, or other non-character elements may be mistaken for characters and misinterpreted by OCR.
Shaded or colored backgrounds can interfere with recognition.
Variations in typeface may be lost or misunderstood.

Formatting

Unusual fonts or characters may not be in the software's catalog and therefore may not be recognized.
Typed characters are currently recognized most accurately. Research into OCR that accurately recognizes handwritten and cursive characters is underway.
Tables, indents, footnotes, and similar formatting may not be recognized.

Ray Kurzweil

Developed the first OCR system that could recognize text printed in virtually any font.
Has continually advanced technology for the blind.

Kurzweil Music Systems

Kurzweil and Stevie Wonder developed a synthesizer that could reproduce the sound of grand pianos and other instruments.

Original Kurzweil Reading Machine

Kurzweil-National Federation of the Blind Reader