Optical Data Capture:
Optical Character Recognition (OCR)
Intelligent Character Recognition (ICR)
Intelligent Recognition
UNSD-ESCWA Regional Workshop on Census Data Processing in the ESCWA region:
Contemporary technologies for data capture, methodology and practice of data editing
Doha, State of Qatar, 18-22 May 2008
Summary
Concept/Definition
Forms Design
Scanners & Software
Storage
Accuracy
OCR/ICR Advantages and Disadvantages
Intelligent Recognition (IR)
Commercial Suppliers
Definition/Concept of OCR
Gives scanning and imaging systems the ability to turn images of machine-printed characters into machine-readable characters.
Images of the machine-printed characters are extracted from a bitmap of the scanned image.
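As a rough illustration of what "extracting character images from a bitmap" can mean, the sketch below splits a tiny binary bitmap into per-character column ranges by looking for blank columns. The function name `split_characters` and the `#`/`.` bitmap encoding are assumptions made for this example; commercial recognition engines use far more robust segmentation.

```python
def split_characters(bitmap):
    """Return (start, end) column ranges of ink runs in a binary bitmap.

    `bitmap` is a list of equal-length strings where '#' marks ink and
    '.' marks background. Blank columns are treated as separators.
    """
    width = len(bitmap[0])
    # A column "has ink" if any row contains '#' at that position.
    ink_in_column = [any(row[x] == "#" for row in bitmap) for x in range(width)]

    spans, start = [], None
    for x, has_ink in enumerate(ink_in_column):
        if has_ink and start is None:
            start = x                      # a character begins
        elif not has_ink and start is not None:
            spans.append((start, x))       # a character ends at a blank column
            start = None
    if start is not None:
        spans.append((start, width))       # character touching the right edge
    return spans


# Hypothetical 5x8 bitmap containing the machine-printed letters "H" and "I".
bitmap = [
    "#..#.###",
    "#..#..#.",
    "####..#.",
    "#..#..#.",
    "#..#.###",
]
print(split_characters(bitmap))  # [(0, 4), (5, 8)]
```

Each column range would then be cropped and passed to the recognition step, which is not shown here.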
Definition/Concept of ICR
Gives scanning and imaging systems the ability to turn images of handwritten characters into machine-readable characters.
Images of the handwritten characters are extracted from a bitmap of the scanned image.
OCR and ICR Differences
OCR is less accurate than OMR but more accurate than ICR
ICR will require editing to achieve high data coverage
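One common way to organize the editing step is to send only low-confidence results to an operator. The sketch below assumes the recognition engine returns a confidence score per field; the function `route_fields`, the 0.9 threshold, and the sample values are all hypothetical and are not taken from any particular product.

```python
def route_fields(fields, threshold=0.9):
    """Split recognized fields into auto-accepted and manual-review groups.

    `fields` maps a field name to a (recognized_text, confidence) pair.
    The threshold is an assumed tuning parameter, not a recommended value.
    """
    accepted, needs_review = {}, {}
    for name, (text, confidence) in fields.items():
        target = accepted if confidence >= threshold else needs_review
        target[name] = text
    return accepted, needs_review


# Hypothetical engine output for one questionnaire.
fields = {
    "age": ("34", 0.98),
    "surname": ("A1-Sayed", 0.62),   # digit '1' misread for a letter
}
accepted, needs_review = route_fields(fields)
print(accepted)       # {'age': '34'}
print(needs_review)   # {'surname': 'A1-Sayed'}
```

Raising the threshold sends more fields to operators and lowers the risk of accepting a misread value; lowering it does the opposite.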
Forms
OCR/ICR has less strict form design requirements than OMR
No timing tracks
Has registration marks
ICR requires hand-printed entries in boxes, one alphanumeric character per box
OCR/ICR Forms
OCR/ICR is more flexible since:
No timing tracks are required
The image can float on a page
The use of drop-out color reduces the size of the scanner's output and enhances accuracy
ICR/OCR technology often uses registration marks on the four corners of a document to help recognize the image
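To illustrate how registration marks let the image "float" on the page, the sketch below maps a field position from the blank form template onto a scanned page using two detected corner marks. It assumes the scan is only shifted and scaled (no rotation or skew), and the function `make_mapper` and all coordinates are made up for the example.

```python
def make_mapper(template_tl, template_br, scanned_tl, scanned_br):
    """Build a function that maps template coordinates to scan coordinates.

    Each argument is an (x, y) pair: the top-left and bottom-right
    registration marks as printed on the template and as found in the scan.
    Assumes translation and scaling only (no rotation or skew).
    """
    scale_x = (scanned_br[0] - scanned_tl[0]) / (template_br[0] - template_tl[0])
    scale_y = (scanned_br[1] - scanned_tl[1]) / (template_br[1] - template_tl[1])

    def to_scan(point):
        x, y = point
        return (
            scanned_tl[0] + (x - template_tl[0]) * scale_x,
            scanned_tl[1] + (y - template_tl[1]) * scale_y,
        )

    return to_scan


# Hypothetical case: the page was fed 15 px to the right and 8 px lower
# than the template, at the same scale.
to_scan = make_mapper((10, 10), (210, 310), (25, 18), (225, 318))
print(to_scan((60, 110)))  # (75.0, 118.0)
```

With all four corner marks, a fuller transform that also corrects rotation and skew could be estimated; that is left out to keep the sketch short.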
OCR/ICR Scanners and Software
Forms are scanned through a scanner, and the recognition engine of the OCR/ICR system then interprets the images, turning images of handwritten or printed characters into ASCII data (machine-readable characters).
Users can scan forms without running OCR at the same time
Speeds range from 85 to 160 sheets/min (dependent on the recognition engine)
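The quoted speed range translates directly into a workload estimate. The short calculation below assumes a hypothetical census of one million sheets and continuous operation with no downtime, jams, or paper handling, so real durations would be longer.

```python
def scanning_hours(total_sheets, sheets_per_minute, scanners=1):
    """Hours of pure scanning time, ignoring downtime and paper handling."""
    return total_sheets / (sheets_per_minute * scanners * 60)


# Hypothetical workload of 1,000,000 sheets on a single scanner,
# at the low and high ends of the quoted speed range.
for speed in (85, 160):
    print(speed, "sheets/min:", round(scanning_hours(1_000_000, speed), 1), "hours")
# 85 sheets/min: 196.1 hours
# 160 sheets/min: 104.2 hours
```

Passing `scanners=4`, for example, divides these figures by four under the same idealized assumptions.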
OCR/ICR Storage Characteristics
Storage/Retrieval
Images are scanned, stored and maintained electronically
There is no need to store the paper forms as long as you safeguard the electronic files
With OCR/ICR technologies, images can be scanned, indexed, and written to optical media
Ideal OCR/ICR Accuracy Thresholds
Accuracy:
Accuracy achieved by data entry clerks (~99.5%) is approximately equal to that of OCR/ICR in perfect tuning (~99.5%)
Up to 99.9% accuracy with editing (like OMR)
The recognition engine must be tuned, tested and validated very carefully
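If the percentages above are read as per-character accuracy, they compound over multi-character fields. The sketch below shows the arithmetic under the simplifying assumption that character errors are independent; that assumption and the 10-character field length are illustrative and do not come from the workshop material.

```python
def field_accuracy(char_accuracy, chars_per_field):
    """Probability that every character in a field is correct,
    assuming independent errors per character."""
    return char_accuracy ** chars_per_field


# Hypothetical 10-character field at the two accuracy levels quoted above.
for acc in (0.995, 0.999):
    print(f"{acc:.3f} per character -> {field_accuracy(acc, 10):.3f} per field")
# 0.995 per character -> 0.951 per field
# 0.999 per character -> 0.990 per field
```

Under these assumptions, roughly 1 in 20 such fields would contain at least one error at 99.5%, against about 1 in 100 at 99.9%, which is one way to see why the editing step matters.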
OCR/ICR Advantages
Recognition engines used with imaging can capture highly specialized data sets
OCR/ICR recognizes machine-printed or hand-printed characters
Scanning and recognition allow efficient management and planning for the rest of the processing workload
Quick retrieval for editing and reprocessing
OCR/ICR Disadvantages
Technology is costly
May require significant manual intervention
Additional workload for data collectors: ICR has severe limitations when it comes to human handwriting
Characters must be hand-printed or machine-printed as separate characters in boxes
Ineffective when dealing with cursive characters
OMR-OCR/ICR Compared
OCR/ICR Challenges/Issues
Shares the corresponding issues of OMR
Algorithm development (preparation of memory dictionary)
Processing time considerations due to the recognition engine
Development costs
Definition/Concept of IR
State-of-the-art recognition technology
Gives scanning and imaging systems the ability to turn images of handwritten and cursive characters into machine-readable characters
Images of the handwritten and cursive characters are extracted from a bitmap of the scanned image
The ability to capture cursive makes this method unique
Definition/Concept of IR
Figure 1: Eight elements that make up the trajectories of all cursive letters
Photo: Parascript LLC
Definition/Concept of IR
Intelligent Recognition dynamically uses context
Context is used during the recognition process, improving the accuracy of results
Context helps to identify letters where the symbol segmentation of an image is ambiguous
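A toy version of context-driven recognition is sketched below: each character position has several candidate letters with scores, and a lexicon of expected answers decides between them. This is a simplified illustration only, not Parascript's algorithm; the function `best_word`, the scores, and the lexicon are hypothetical.

```python
from itertools import product


def best_word(candidates, lexicon):
    """Pick the highest-scoring letter combination that is a known word.

    `candidates` holds, for each character position, a list of
    (letter, score) alternatives. Returns None if no combination
    appears in `lexicon`.
    """
    best, best_score = None, 0.0
    for combo in product(*candidates):
        word = "".join(letter for letter, _ in combo)
        if word not in lexicon:
            continue                      # context rules this reading out
        score = 1.0
        for _, letter_score in combo:
            score *= letter_score         # combine per-letter scores
        if score > best_score:
            best, best_score = word, score
    return best


# Hypothetical ambiguous reading of a handwritten place name.
candidates = [
    [("D", 0.6), ("O", 0.4)],
    [("o", 0.9)],
    [("h", 0.5), ("b", 0.5)],   # 'h' and 'b' are equally likely in isolation
    [("a", 0.7), ("u", 0.3)],
]
lexicon = {"Doha", "Dubai", "Oman"}
print(best_word(candidates, lexicon))  # Doha
```

The third position is a tie when each letter is judged alone; the lexicon resolves it. Enumerating every combination is fine for a short field but would need a smarter search for longer text.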
Photo: Parascript LLC
Technology Evolution
Figure: Technology evolution from OCR to ICR to Intelligent Recognition
Illustration: Conference on Technology Options for 2011 Census
OCR and ICR
Form types: specially designed for automatic recognition; constraining boxes or combs; drop-out ink for preprinted text & boxes
Text styles: machine print (OCR); constrained handprint (ICR)
Intelligent Recognition
Form types: no special form design; no constraining boxes or combs; condensed strings; dirty & noisy forms; bad quality paper; legacy forms
Text styles: cursive; bad quality machine print; unconstrained handprint
Major Commercial Suppliers
Top Image Systems (TIS) (http://www.topimagesystems.com)
ReadSoft (http://www.readsoft.com)
Teleform (http://www.intelliscan.com/TeleForm1.htm)
Scanner Suppliers
Fujitsu, Canon, Bell & Howell, Kodak
THANK YOU!