Optical Data Capture: Optical Character Recognition (OCR) Intelligent Character Recognition (ICR) Intelligent Recognition UNSD-ESCWA Regional Workshop on Census Data Processing in the ESCWA region: Contemporary.


Optical Data Capture:
Optical Character Recognition (OCR)
Intelligent Character Recognition (ICR)
Intelligent Recognition
UNSD-ESCWA Regional Workshop on Census Data Processing in the ESCWA region:
Contemporary technologies for data capture, methodology and practice of data editing
Doha, State of Qatar, 18-22 May 2008
Summary
- Concept/Definition
- Forms Design
- Scanners & Software
- Storage
- Accuracy
- OCR/ICR Advantages and Disadvantages
- Intelligent Recognition (IR)
- Commercial Suppliers
Definition/Concept of OCR
- Gives scanning and imaging systems the ability to turn images of machine-printed characters into machine-readable characters.
- Images of the machine-printed characters are extracted from a bitmap of the scanned image.
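
The extraction step above can be illustrated with a minimal sketch. It assumes a binary bitmap (1 = ink, 0 = background) and separates characters wherever a fully blank column occurs. The function name `split_characters` and the tiny bitmap are hypothetical, and commercial recognition engines are likely to use more elaborate segmentation.

```python
def split_characters(bitmap):
    """Return (start, end) column ranges, one per character, for a binary bitmap.

    bitmap is a list of rows; each row is a list of 0/1 pixels (1 = ink).
    Characters are assumed to be separated by at least one blank column.
    """
    width = len(bitmap[0])
    # A column counts as inked if any row has ink in that column.
    inked = [any(row[x] for row in bitmap) for x in range(width)]

    spans, start = [], None
    for x, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = x                    # a character begins here
        elif not has_ink and start is not None:
            spans.append((start, x))     # the character ended at the previous column
            start = None
    if start is not None:                # character touching the right edge
        spans.append((start, width))
    return spans


bitmap = [
    [0, 1, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 0],
]
print(split_characters(bitmap))  # [(1, 3), (5, 7)]
```

Each column range would then be cropped and passed to the recognition engine as one character image.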
Definition/Concept of ICR
- Gives scanning and imaging systems the ability to turn images of handwritten characters into machine-readable characters.
- Images of the handwritten characters are extracted from a bitmap of the scanned image.
OCR and ICR Differences
- OCR is less accurate than OMR but more accurate than ICR
- ICR will require editing to achieve high data coverage
Forms
- OCR/ICR imposes less strict form-design requirements than OMR
- No timing tracks
- Has registration marks
- ICR requires boxes filled with hand-printed characters, one alphanumeric character per box
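
As a rough illustration of the one-character-per-box constraint, the sketch below computes the pixel rectangle of each box in a row of equally sized boxes, so that each box can be cropped and recognized separately. The function `box_rectangles`, its parameters and the sample coordinates are assumptions made for illustration, not part of any vendor's form definition.

```python
def box_rectangles(left, top, box_width, box_height, count, gap=0):
    """Return (left, top, right, bottom) for each box in a horizontal row of boxes.

    All values are pixel coordinates on the scanned image; gap is the
    horizontal space between neighbouring boxes.
    """
    rects = []
    for i in range(count):
        x = left + i * (box_width + gap)
        rects.append((x, top, x + box_width, top + box_height))
    return rects


# A hypothetical three-character field starting at pixel (100, 50).
for rect in box_rectangles(100, 50, 20, 30, 3, gap=4):
    print(rect)
```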
OCR
- Forms
  - OCR/ICR is more flexible since:
    - No timing tracks are required
    - The image can float on a page
  - The use of drop-out color reduces the size of the scanner's output and enhances accuracy
  - ICR/OCR technology often uses registration marks at the four corners of a document when recognizing an image
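
One way to picture how registration marks let the image "float" on the page is the sketch below. It compares where the four marks sit on the form template with where they were detected on the scan, estimates a simple shift, and applies that shift to a field's rectangle. It assumes a pure translation (no rotation or scaling), and all names and coordinates are hypothetical.

```python
def estimate_offset(template_marks, detected_marks):
    """Average (dx, dy) shift between template and detected registration marks."""
    n = len(template_marks)
    dx = sum(d[0] - t[0] for t, d in zip(template_marks, detected_marks)) / n
    dy = sum(d[1] - t[1] for t, d in zip(template_marks, detected_marks)) / n
    return dx, dy


def locate_field(field_rect, offset):
    """Shift a template field rectangle (left, top, right, bottom) by the offset."""
    dx, dy = offset
    left, top, right, bottom = field_rect
    return (left + dx, top + dy, right + dx, bottom + dy)


# Marks at the four corners of the template, and as found on one scanned page.
template = [(50, 50), (2430, 50), (50, 3450), (2430, 3450)]
detected = [(62, 43), (2442, 43), (62, 3443), (2442, 3443)]

offset = estimate_offset(template, detected)
print(offset)                                      # (12.0, -7.0)
print(locate_field((300, 400, 900, 460), offset))  # (312.0, 393.0, 912.0, 453.0)
```

A production system would typically also correct for skew and scale, which needs more than an averaged shift.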
OCR/ICR Scanners and Software
- Forms are fed through a scanner, and the recognition engine of the OCR/ICR system then interprets the images, turning images of handwritten or printed characters into ASCII data (machine-readable characters).
- Users can scan forms without performing the OCR step
- Speeds range from 85 to 160 sheets/min (depending on the recognition engine)
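
To give a feel for what the quoted speeds mean for planning, the sketch below converts a sheet count into nominal scanning hours. The workload of one million sheets is a made-up figure, and the calculation ignores paper jams, batch changes and operator breaks, so real elapsed time would be longer.

```python
def scanning_hours(total_sheets, sheets_per_minute, scanners=1):
    """Nominal hours needed to scan total_sheets at a given per-scanner speed."""
    minutes = total_sheets / (sheets_per_minute * scanners)
    return minutes / 60


total = 1_000_000  # hypothetical census workload, in sheets
for speed in (85, 160):
    hours = scanning_hours(total, speed)
    print(f"{speed} sheets/min: {hours:.1f} hours on one scanner")
```

Dividing the result by the number of scanner-hours available per day gives a first estimate of the number of scanners or shifts required.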
OCR/ICR Storage Characteristics
- Storage/Retrieval
  - Images are scanned, stored and maintained electronically
  - There is no need to store the paper forms as long as you safeguard the electronic files
  - With OCR/ICR technologies, images can be scanned, indexed, and written to optical media
Ideal OCR/ICR Accuracy Thresholds
- Accuracy:
  - The accuracy achieved by data entry clerks (~99.5%) is approximately equal to that of OCR/ICR in perfect tuning (~99.5%)
  - Up to 99.9% accuracy with editing (like OMR)
- The recognition engine must be tuned, tested and validated very carefully
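
The accuracy figures above translate into expected error counts as in the sketch below. The second function shows one common way editing is organized: fields whose recognition confidence falls below a threshold are routed to an operator. The per-field confidence score, the threshold value and the sample data are assumptions, since the slides do not describe how a particular engine reports confidence.

```python
def expected_errors(characters, accuracy):
    """Expected number of wrongly captured characters at a given accuracy rate."""
    return round(characters * (1.0 - accuracy))


def needs_editing(fields, threshold):
    """Names of fields whose confidence is below the threshold (hypothetical score)."""
    return [name for name, confidence in fields if confidence < threshold]


print(expected_errors(1_000_000, 0.995))  # 5000
print(expected_errors(1_000_000, 0.999))  # 1000

fields = [("age", 0.99), ("occupation", 0.62), ("sex", 0.97)]
print(needs_editing(fields, threshold=0.90))  # ['occupation']
```

Under these assumptions, moving from 99.5% to 99.9% accuracy removes four out of every five remaining character errors.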
OCR/ICR Advantages
- Recognition engines used with imaging can capture highly specialized data sets
- OCR/ICR recognize machine-printed or hand-printed characters
- Scanning and recognition allow efficient management and planning for the rest of the processing workload
- Quick retrieval for editing and reprocessing
OCR/ICR Disadvantages
- Technology is costly
- May require significant manual intervention
- Additional workload for data collectors: ICR has severe limitations when it comes to human handwriting
- Characters must be hand-printed or machine-printed as separate characters in boxes
- Ineffective when dealing with cursive characters
OMR-OCR/ICR Compared
OCR/ICR Challenges/Issues
- Has issues corresponding to those of OMR
- Algorithm development (preparation of memory dictionary)
- Processing time considerations due to the recognition engine
- Development costs
Definition/Concept of IR
State-of-the-art recognition technology
- Gives scanning and imaging systems the ability to turn images of handwritten and cursive characters into machine-readable characters
- Images of the handwritten and cursive characters are extracted from a bitmap of the scanned image
- The ability to capture cursive makes this method unique
Definition/Concept of IR
- Eight elements make up the trajectories of all cursive letters (figure 1)
Photo: Parascript LLC
Definition/Concept of IR
- Intelligent Recognition dynamically uses context
- Context is used during the recognition process, improving the accuracy of results
- Context helps to identify letters where the symbol segmentation of an image is ambiguous
Photo: Parascript LLC
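
The role of context can be sketched with a simplified example. Here the recognizer is assumed to return several candidate characters per position, each with a score, and a lexicon of valid answers supplies the context. The function `best_word`, the scores and the lexicon are hypothetical. The sketch covers only ambiguity between characters, not the segmentation ambiguity mentioned above, and it is not a description of any vendor's method.

```python
from itertools import product


def best_word(candidates, lexicon):
    """Pick the highest-scoring candidate combination that is a valid lexicon word.

    candidates: one dict per character position, mapping character -> score.
    Returns None if no combination appears in the lexicon.
    """
    best, best_score = None, 0.0
    for combo in product(*(position.items() for position in candidates)):
        word = "".join(char for char, _ in combo)
        if word not in lexicon:
            continue                      # context rules this reading out
        score = 1.0
        for _, char_score in combo:
            score *= char_score
        if score > best_score:
            best, best_score = word, score
    return best


candidates = [
    {"D": 0.55, "O": 0.45},
    {"0": 0.60, "O": 0.40},   # the digit zero scores higher than the letter O
    {"H": 0.70, "N": 0.30},
    {"A": 0.90, "R": 0.10},
]
lexicon = {"DOHA", "DUBAI"}   # hypothetical list of valid place names

print(best_word(candidates, lexicon))  # DOHA
```

Taking the top-scoring character at each position independently would give "D0HA"; the lexicon is what steers the result to a valid word.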
Technology Evolution
[Figure: technology evolution from OCR to ICR to Intelligent Recognition, by form type and text style]
Form types: specially designed for automatic recognition; constraining boxes or combs; drop-out ink for preprinted text & boxes; no special form design; no constraining boxes or combs; condensed strings; dirty & noisy forms; bad quality paper; legacy forms
Text styles: machine print; constrained handprint; unconstrained handprint; bad quality machine print; cursive
Illustration: Conference on Technology Options for 2011 Census
Major Commercial Suppliers
- Top Image Systems (TIS) (http://www.topimagesystems.com)
- ReadSoft (http://www.readsoft.com)
- Teleform (http://www.intelliscan.com/TeleForm1.htm)
- Scanner suppliers: Fujitsu, Canon, Bell & Howell, Kodak
THANK YOU!