Transcript PPT
UNSD Census Workshop
Day 2 - Session 8
Data Capture: Intelligent Character Recognition
Andy Tye – International Manager
DRS are Worldwide specialists in Census data capture
www.drs.co.uk
Data Capture
Intelligent Character Recognition (ICR)
Elements
• Form design
• Hardware/Software requirements
– Scanners
– Computer infrastructure
• Workflow
• Accuracy
• Advantages
• Disadvantages
Data Capture - ICR
Forms design
• Typical stock grade paper (90GSM)
• Corner Stones advised
• Dropout colour is recommended but not essential
Data Capture - ICR
Hardware requirements
• Image Scanners
– TWAIN or ISIS
• Database Server (Full redundancy)
• Storage Server – Terabytes
– (Raid 5, Mirrored, etc.)
• Network (Gb preferred)
• Administrator PC
• CS-Pro PCs
• Key correction PCs (Verification)
• Character Inspection PCs
– (Mass verification - optional)
• Scanner PCs
• Automatic data capture PCs
Software requirements
• MS-SQL or other database
• Data Storage, Archive and Retrieval
• Backup Software
• Software for Administrator PC
• CS-Pro for analysis and reporting PCs
• Software for Key correction PCs
• Software for Character inspection PCs
• Software for Scanner PCs
• Software for automatic data capture
Data Capture - ICR
Typical Workflow
ICR
Data Capture - ICR
Typical Workflow
Paper Movement – Processing Centre/s
Data Capture - ICR
Typical Workflow
Receiving
Data Capture - ICR
Typical Workflow
Logging/Checking
• Open Batch
• Verify Contents
• Register Batch
Data Capture - ICR
Typical Workflow
Sifting
• Orientation
• Other Forms
Data Capture - ICR
Typical Workflow
Spine removal
• Cut Booklets
• 30,000/day
Data Capture - ICR
Typical Workflow
Scanning
• Double Sided
• High Speed
• Double Detection
• Ease of Use
Data Capture - ICR
Typical Workflow
Scanning/sorting
• Automatic Identification
• Data Capture
Data Capture - ICR
Typical Workflow
Storage
• Conditions
• Retrieval
• Space
Data Capture - ICR
Typical Workflow
Image Movement/Data Extraction – Processing Centre/s
Data Capture - ICR
Typical Workflow
Image interpretation
• Automated Process
• Background Task
• Page Identification
• De-skew
• Image Clean up
• Pre-defined Areas
Data Capture - ICR
Typical Workflow
Character inspection
• Tiling
• High Confidence
• Operator Decision
• Field Context
• Tall to Short
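A minimal sketch of how a tiling screen might be assembled, assuming "Tall to Short" refers to the order in which character snippets are shown. Each character is represented here as a dict with hypothetical keys 'value' (the recognised character) and 'height' (snippet height in pixels); these names are illustrative and not taken from any DRS product. Characters recognised as the same value are grouped together so an operator can scan one screen of, say, every '7' and pick out the intruders.

```python
from collections import defaultdict

def build_tiles(chars):
    """Group recognised characters by value, tallest snippet first.

    `chars` is a list of dicts with the hypothetical keys 'value' and
    'height'. Returns a dict mapping each recognised value to its tiles.
    """
    tiles = defaultdict(list)
    for ch in chars:
        tiles[ch["value"]].append(ch)
    # Show each group tall to short so odd shapes stand out to the operator.
    for group in tiles.values():
        group.sort(key=lambda ch: ch["height"], reverse=True)
    return dict(tiles)

sample = [
    {"value": "7", "height": 31},
    {"value": "1", "height": 40},
    {"value": "7", "height": 38},
]
tiles = build_tiles(sample)
print({value: [ch["height"] for ch in group] for value, group in tiles.items()})
```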
Data Capture - ICR
Typical Workflow
Key correction
• Low Confidence
• Operator Decision
• From Context
• External Verification
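The split between character inspection (high confidence) and key correction (low confidence) can be sketched as a simple routing rule. This is an illustration only: the 0.90 threshold, the queue names and the 0-to-1 confidence scale are assumptions, and a real ICR product will expose its own confidence measure and settings.

```python
def route_character(confidence, threshold=0.90):
    """Return the (hypothetical) queue for one recognised character."""
    if confidence >= threshold:
        return "character_inspection"  # high confidence: mass verification by tiling
    return "key_correction"            # low confidence: operator keys the value

def route_form(char_confidences, threshold=0.90):
    """Split the character positions of one form between the two queues."""
    queues = {"character_inspection": [], "key_correction": []}
    for index, confidence in enumerate(char_confidences):
        queues[route_character(confidence, threshold)].append(index)
    return queues

queues = route_form([0.99, 0.42, 0.91, 0.88])
print(queues)
```

Lowering the threshold sends fewer characters to key correction but leaves more work for inspection to catch.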
Data Capture - ICR
Typical Workflow
Key Correction
• ASCII File
• CSV Format
• 1 Line/Form
• CSPro Import
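As a rough illustration of the "1 Line/Form" CSV output, the sketch below writes one line per form with Python's standard csv module. The field names and values are invented for the example, and the header row is an assumption included for readability; whether the import step expects one depends on how that import is set up.

```python
import csv
import io

def export_forms(forms, field_names):
    """Return CSV text with a header row and one line per captured form."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=field_names, lineterminator="\n")
    writer.writeheader()
    for form in forms:
        writer.writerow(form)
    return buffer.getvalue()

# Hypothetical fields; codes are kept as strings to preserve leading zeros.
fields = ["form_id", "household_size", "district_code"]
forms = [
    {"form_id": "000001", "household_size": "4", "district_code": "017"},
    {"form_id": "000002", "household_size": "2", "district_code": "017"},
]
csv_text = export_forms(forms, fields)
print(csv_text)
```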
Data Capture - ICR
Typical Workflow
ICR
Data Capture - ICR
Accuracy
This is always the first question
Handprint
• Numeric only in isolated fields 98%
• Numeric only in semi constrained fields 95-96%
• Alpha upper case only 90%
• Alpha lower case only 85-87%
• Alpha mixed case 75-80%
• Alpha/Numeric mixed case 50% or less
– reduce by 5% if there are special characters not a-z and 0-9
The accuracy level post data correction (i.e. the final output accuracy) should be 100% (subject to good operators)
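To see why these rates matter before correction, a rough worked example follows. It assumes the quoted percentages are per-character rates and that errors on different characters are independent; neither assumption is stated in the slides, so treat the result as indicative only.

```python
def field_accuracy(char_accuracy, field_length):
    """Chance that every character in a field is read correctly,
    assuming independent per-character errors (an assumption)."""
    return char_accuracy ** field_length

def expected_corrections(char_accuracy, total_characters):
    """Expected number of characters needing operator attention."""
    return (1.0 - char_accuracy) * total_characters

# A 6-digit isolated numeric field at the quoted 98% rate.
six_digit = field_accuracy(0.98, 6)
print(f"6-digit numeric field fully correct: {six_digit:.1%}")

# One million upper-case alpha characters at the quoted 90% rate.
print(f"Expected corrections: {expected_corrections(0.90, 1_000_000):,.0f}")
```

Under these assumptions, even a 98% character rate leaves a noticeable share of multi-character fields needing correction, which is why the verification steps carry so much of the workload.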
Data Capture - ICR
Accuracy continued…
The accuracy of all modern ICR engines is broadly comparable
The major differences between suppliers' solutions are the methods and workflow used in each offering
Detecting a false positive takes 10 times longer than entering a character recognised with low confidence – false positives (substitutions) are the most expensive errors
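The 10-to-1 ratio can be turned into a simple effort estimate. In the sketch below only the factor of 10 comes from the slide; the 2 seconds per low-confidence entry and the character counts are made-up illustrative figures.

```python
def correction_effort_seconds(low_confidence_chars, false_positive_chars,
                              seconds_per_entry=2.0, false_positive_factor=10):
    """Estimate operator time for one batch.

    seconds_per_entry is a hypothetical figure; false_positive_factor
    reflects the stated 10x cost of finding a substitution.
    """
    keying = low_confidence_chars * seconds_per_entry
    hunting = false_positive_chars * seconds_per_entry * false_positive_factor
    return keying + hunting

# 1,000 low-confidence characters and only 100 false positives.
total = correction_effort_seconds(1000, 100)
print(f"Estimated operator time: {total / 3600:.2f} hours")
```

In this example the 100 false positives cost as much operator time as the 1,000 low-confidence characters.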
Data Capture - ICR
Accuracy continued…
Accuracy can be improved by:
• Restricting the responses to any given question
• Using external verification
• Using multiple ICR engines to ‘vote’ (which is expensive)
• Training your ICR engines on local handwriting styles (if possible)
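A minimal sketch of engine voting, assuming each engine returns a single character for the same image snippet. The strict-majority rule and the fall-back to an operator are assumptions for illustration; commercial products may weight engines or use confidence scores instead.

```python
from collections import Counter

def vote(readings):
    """Return (character, agreed) for one snippet read by several engines.

    A character is accepted only if a strict majority of engines agree;
    otherwise (None, False) signals that an operator should decide.
    """
    winner, votes = Counter(readings).most_common(1)[0]
    if votes > len(readings) / 2:
        return winner, True
    return None, False

print(vote(["5", "5", "S"]))  # two of three engines agree
print(vote(["5", "S", "8"]))  # no majority, refer to an operator
```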
Data Capture - ICR
Advantages
• No specialist hardware required
• An image archive of every form can be produced automatically
• Very high speed scanning can be achieved
• Both OMR and ICR can be interpreted using ICR software
• Forms designed for ICR are relatively easy to fill in. Locally printed forms can be used.
• Allows capturing much more complex data than with OMR alone
Data Capture - ICR
Disadvantages
• Significant hardware/software and trained IT staff will be required
• Accuracy dependent on manual intervention
• High calibre IT staff are required to support the ICR system
• More complex cost/benefit analysis than with OMR alone
Data Capture - ICR
Indicative Costs
For a census of a population of 65 million (20M single-sided A4 household forms)
Processing period of 12 weeks (8 hours/day, 5 days/week)
• Hardware $800k-$1M in total
• Software $700k-$1.3M in total
Total Indicative Costs are $1.5M to $2.3M
• No. of Staff 100-190 in total
– 6-10 Managers
– 94-180 PC Operators
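The throughput these figures imply can be worked out directly. The sketch below uses only numbers from the slide; note that the cost per form covers hardware and software only, since no staff costs are quoted.

```python
FORMS = 20_000_000                      # single-sided A4 household forms
WEEKS, DAYS_PER_WEEK, HOURS_PER_DAY = 12, 5, 8
COST_LOW, COST_HIGH = 1_500_000, 2_300_000  # hardware + software, USD

working_days = WEEKS * DAYS_PER_WEEK
working_hours = working_days * HOURS_PER_DAY
forms_per_day = FORMS / working_days
forms_per_hour = FORMS / working_hours
cost_per_form_low = COST_LOW / FORMS
cost_per_form_high = COST_HIGH / FORMS

print(f"Working days: {working_days}, working hours: {working_hours}")
print(f"Forms per day: {forms_per_day:,.0f}")
print(f"Forms per hour: {forms_per_hour:,.0f}")
print(f"Hardware/software cost per form: ${cost_per_form_low:.3f} to ${cost_per_form_high:.3f}")
```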
Data Capture - ICR
Summary
ICR offers considerable flexibility at the cost of requiring more highly skilled IT personnel
The single most important factor for timely and accurate data capture is to make sure ‘the forms are filled in correctly and are returned in good condition’
UNSD Census Workshop
Day 2 - Session 8
Thank you for listening
Andy Tye – International Manager