Ethiopian 2007 CENSUS DATA
CAPTURING AND PROCESSING
CENTRAL STATISTICAL AGENCY
(CSA)
APRIL, 2008
Background Information
Population and Housing Census process is the
largest data capturing exercise a country can undertake.
It involves capturing millions of forms
The Central Statistical Agency (CSA) started using older
techniques, such as the punched card reader, as early as the 1960s.
Two Population and Housing Censuses have so
far been conducted in Ethiopia.
The first Population and Housing Census was
carried out in 1984.
Background Information Cont’d . . .
During the 1984 Census:
Data capture was done by manual keyboard
entry on a mainframe computer
FORMSPEC data entry system was used
It took more than 2 years to capture the data
for about 42 million people.
In the case of the 1994 Census:
Data capture was again done on manual
keyboard entry basis using PC’s
CENTRY data entry system (IMPS) was used
Background Information Cont’d . . .
It took about 18 months to capture the data for the
population of about 53 million.
About 180 data entry clerks were involved
Around 90 PC’s were used
The entry work was done on a two-shift basis
Some Limitations of the Keyboard Manual
Entry Method
Time consuming
Does not allow the availability of timely data
The data will be weaker in representing the
current or existing situation
Subject to additional non-sampling errors
Human error due to manual keying
Due to the volume of the data, a 100% verification,
as in the case of sample surveys, is difficult.
Limitations Cont . . .
Involves a great deal of human resource
management.
Large number of data entry operators
and equipment required
The Need for Alternative Solutions
The need to have timely census results and
the limitations discussed above forced the
Agency to look for other alternatives
This is especially important for large
volumes of data such as a census.
Hence the need to use the Scanning Technology
The Scanning Technology
The Scanning Technology in general implements
two basic techniques
Mark recognition, like the Optical Mark
Reader (OMR)
Character recognition, like the Optical
Character Recognition (OCR), and the
Intelligent Character Recognition (ICR)
Scanning Technology Cont . . .
OMR is the recognition of shaded marks (blobs) on the
forms
The positioning of these blobs on a form
determines the alphanumeric characters
they represent
Character recognition is the recognition of
alphanumeric characters on forms, and it is
of 2 types:
OCR, which is the recognition of machine-printed
characters, and . .
Scanning Technology Cont . . .
ICR, which refers to the capture of
hand-printed characters from a form
For scanning of the 2007 Census the Optical
Mark Reader (OMR) technique has been selected
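How the position of a shaded mark can stand for a character is sketched below. This is only an illustration, not the DRS decode format: the grid layout (one column of ten bubbles per digit) and the names `decode_omr_field` and `bubble_column` are assumptions.

```python
from typing import List, Optional


def decode_omr_field(columns: List[List[bool]]) -> Optional[str]:
    """Decode one numeric field from a hypothetical OMR grid.

    Each column holds ten bubbles (row index 0-9); True means shaded.
    Returns None when any column does not contain exactly one shaded
    bubble, so that the field can be referred for key-correction.
    """
    digits = []
    for column in columns:
        marked_rows = [row for row, shaded in enumerate(column) if shaded]
        if len(marked_rows) != 1:
            return None  # missing mark or multi-mark
        digits.append(str(marked_rows[0]))
    return "".join(digits)


def bubble_column(digit: int) -> List[bool]:
    """Build a column in which only the bubble for `digit` is shaded."""
    return [row == digit for row in range(10)]


# A two-digit field with 3 shaded in the first column and 7 in the second.
age_field = [bubble_column(3), bubble_column(7)]
print(decode_omr_field(age_field))  # prints 37
```

A field that returns None is exactly the kind of case an operator would later resolve from the stored image.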
The Scanning Technology we use:
PhotoScribe Series PS900 Scanners
(DRS Scanning Technology Product)
DRS
Photo Scribe Series PS900
High speed Imaging Mark Reader
Windows XP professional
CD-R/RW drive
Network connectivity
A TFT monitor, Keyboard, mouse
Speed: up to 8,500 forms / hour
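The rated speed allows a rough estimate of scanning time. The sketch below is plain arithmetic under stated assumptions: only the 8,500 forms/hour figure comes from the slide, while the function name `scanning_hours`, the `efficiency` parameter and the form count in the example are hypothetical.

```python
def scanning_hours(total_forms: int, scanners: int,
                   forms_per_hour: int = 8500,
                   efficiency: float = 1.0) -> float:
    """Estimate the hours needed to scan `total_forms`.

    `efficiency` (0 < efficiency <= 1) is a hypothetical factor for
    time lost to jams, cleaning, batch changes and power cuts.
    """
    if scanners <= 0 or forms_per_hour <= 0:
        raise ValueError("scanners and forms_per_hour must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return total_forms / (scanners * forms_per_hour * efficiency)


# Hypothetical workload: 850,000 forms on one scanner at rated speed.
print(scanning_hours(850_000, scanners=1))  # prints 100.0
```

The rated speed is an upper bound, so a realistic plan would use an efficiency well below 1.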
The Scanning Process in General
It mainly involves:
Scanning / Data Capture – including IMAGE capturing
Validation and Key-correction of scanned data
Exporting the scanned and key-corrected data
into ASCII or Text format
The format suitable for electronic processing
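The export step can be pictured as writing each validated record as one fixed-width line of text. The following is a minimal sketch, assuming a hypothetical record layout; the field names and widths are not taken from the census data dictionary.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (field name, width in characters).
LAYOUT: List[Tuple[str, int]] = [
    ("ea_id", 6),
    ("household", 3),
    ("age", 2),
    ("sex", 1),
]


def to_fixed_width(record: Dict[str, int]) -> str:
    """Format one record as a zero-padded, fixed-width ASCII line."""
    parts = []
    for name, width in LAYOUT:
        text = str(record[name]).zfill(width)
        if len(text) > width:
            raise ValueError(f"{name}={record[name]} exceeds width {width}")
        parts.append(text)
    return "".join(parts)


line = to_fixed_width({"ea_id": 1234, "household": 7, "age": 37, "sex": 2})
print(line)  # prints 001234007372
```

Fixed-width text of this kind can be read by a processing package once a matching data dictionary is defined.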
Learning from Experiences of
Other Countries
Study tour made to two African countries
Tanzania
To learn from their successes
Data capture of the 2002 Census of Tanzania
was done in about 26 days
General report tables were produced within
3 months from the start of the scanning
Experiences of Other Countries . . .
Ghana
To learn from their difficulties
Data capture of the 2000 Census took about
6 months - ( forms from 29,000 EAs)
3 Scanners were used (Kodak, Fujitsu)
The larger scanner was Kodak 500D
Speed: About 500 forms/min
Power failure was one of the major problems
Loss of some data occurred as a result
A large generator was installed to minimize
the effect of the frequent power cut
Major Benefits of the Scanning Technology
Significant decrease in time required to capture
the data
This helps to get timely data
Users’ need satisfied (policy makers, planners,
researchers, etc.)
No need to worry about storing millions of forms for
a long time in the future
Scanning captures the whole content of a
questionnaire in an electronic image format
Requirements for Effective Scanning
Proper training
Both on Hardware and Software
This helps to “own” the technology
Being able to use the technology after the
departure of the trainers / technical advisors
A reliable Network System
A well organized space for forms and data flow
is required
STRUCTURED SPACE FOR FILE FLOW
[Diagram: numbered flow (steps 1 to 8) of questionnaires through the Data Processing Center. Labeled areas: Warehouse; Receiving the Questionnaires; Registering & Organizing EA’s Received from the Field; Store; Retrieval; Registering EA’s for Scanning; Waiting Room; Scanning Room; Key-Correction Room; Processing Center.]
Requirements for Effective Scanning Cont’d . . .
Proper file management and care
Checking Batch (EA) IDs and orientation of
forms
Ensuring the EA code on each box is the
same as the one on the questionnaires
Proper recording of the incoming and outgoing questionnaires
Close attention in detecting errors in the
scanning process is required
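The check that the EA code on a box matches the code on every questionnaire inside it can be sketched as follows. The function name and the six-digit code format are assumptions made for illustration.

```python
from typing import List


def mismatched_forms(box_ea_code: str, form_ea_codes: List[str]) -> List[int]:
    """Return the positions (0-based) of forms whose EA code differs
    from the code written on the box."""
    return [
        position
        for position, code in enumerate(form_ea_codes)
        if code != box_ea_code
    ]


# Hypothetical box: the second form belongs to a different EA.
print(mismatched_forms("010203", ["010203", "010204", "010203"]))  # prints [1]
```

An empty list means the box can go to the scanner; any other result points to forms that should be pulled out and re-filed first.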
Requirements for Effective Scanning Cont’d . . .
Ensuring the proper paper throughput
through the scanner
Ensuring smooth running of the scanning machines
Maintenance
Cleaning (daily)
An arrangement to minimize the effect of Power
Interruption is required
Major Activities Accomplished in the
Course of the Census Taking
Data from the Pilot Census was successfully scanned
(OMR), key-corrected, exported to text format,
tabulated and tested.
One scanner (PS 900 Photo Scribe) was used to
capture the pilot data
Technical experts from the DRS company assisted in
capturing, validating and exporting the pilot data
Training in scanning technology was given:
16 professionals were trained
Major Activities Accomplished Cont’d . . .
Hardware and Software training conducted
The training in general took about 7 working days
SOSKITW for Windows, a DRS software package
for scanning, was introduced
Components of the SOSKITW software:
SOSGen: used to generate scanning
decodes for completed OMR forms (how
marks on forms are interpreted and stored)
SOSInp: used to scan, validate and export
scanned data
Major Activities Accomplished Cont’d . . .
Equipment purchased and installed
10 additional PS900 iM2 DRS Scanners
16 high capacity PC’s for key-correction
Census data processing work plan prepared
Recruitment of temporary staff
Staff training (scanning technology, CSPro)
Retrieval and organization of completed forms
Scanning and validation
Computer editing and tabulation
(For each activity: duration and responsible body are indicated)
Major Activities Accomplished Cont’d . . .
Census data processing teams organized
Batch header database group
Scanning and validation team
Technical desk heads
Shift supervisors
Two senior programmers responsible for
the overall scanning process
Other sub-professional staff assigned
4 batch header scanning technicians
16 data validation workers
Major Activities Accomplished Cont’d . . .
The scanning room organized
An air conditioner for the scanning room installed
A high capacity automatic generator installed to
ensure uninterrupted power supply
Batch Header Database organized
EA Control Forms completed in 2 parts during dispatch
Same EA ID on both parts of the control form
Same Enumerator Number on each part
No. of Households in the EA filled-in
The scannable part detached and scanned in office
Completed Census Forms
Completed forms retrieved from the field
(about 90,000 EA’s)
Reception and organization of filled-in forms
completed
About 33 teams for registering and
organizing forms were organized
3 persons assigned per team
Retrieval of each EA checked and registered
Presence of all form types checked (each EA)
Control forms are also used to check the
completeness of EA’s
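A completeness check driven by the control forms can be sketched as below. It assumes, for illustration only, that the household count on the control form is compared with the number of questionnaires received for the same EA, one questionnaire per household; the function name and status strings are hypothetical.

```python
from typing import Dict


def completeness_report(control_counts: Dict[str, int],
                        received_counts: Dict[str, int]) -> Dict[str, str]:
    """Compare households listed on each EA control form with the
    number of questionnaires received for that EA."""
    report = {}
    for ea_id, expected in sorted(control_counts.items()):
        received = received_counts.get(ea_id)
        if received is None:
            report[ea_id] = "not received"
        elif received < expected:
            report[ea_id] = f"short by {expected - received}"
        elif received > expected:
            report[ea_id] = f"over by {received - expected}"
        else:
            report[ea_id] = "complete"
    return report


# Hypothetical EA IDs and counts.
control = {"010101": 150, "010102": 120, "010103": 98}
received = {"010101": 150, "010102": 117}
print(completeness_report(control, received))
```

Only EAs reported as complete would be passed on; the others go back for tracing before scanning.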
Completed Census Forms Cont’d . . .
Types of the 2007 Census Forms
Short questionnaires
Long questionnaires
Household Listing Forms
Summary Forms
Community Level Forms
EA Control Forms
(Batch Header Forms)
EA ID’s and no. of households filled-in
Unique Enumerator No. assigned
Scanned to create EA Database
Long Questionnaire
Batch Control Form
Summary Form
Actual Scanning Process - Census Forms
Organized forms taken from store to the waiting room
Batch Header information printed and associated with
its respective EA box
The existence of each EA verified
Checked EAs sent to the scanning room
Scanned forms are finally sent back to the stores
Captured data are validated and key-corrected
Key-correction involved checking and correcting:
Missing marks
Multi-marks
Partial marks
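The three kinds of problem handled in key-correction can be illustrated with a simple classifier over bubble darkness readings for a single-answer question. This is a sketch under assumptions: the darkness scale (0 to 1) and both thresholds are hypothetical and do not describe how the DRS software scores marks.

```python
from typing import List


def classify_response(darkness: List[float],
                      full: float = 0.6,
                      faint: float = 0.2) -> str:
    """Classify a single-answer question from its bubble darkness values.

    A value >= `full` counts as a clear mark; a value between `faint`
    and `full` counts as a partial mark; anything lower is blank.
    """
    full_marks = sum(1 for value in darkness if value >= full)
    partial_marks = sum(1 for value in darkness if faint <= value < full)
    if full_marks > 1:
        return "multi-mark"
    if partial_marks > 0:
        return "partial mark"  # ambiguous shading, needs an operator
    if full_marks == 0:
        return "missing mark"
    return "ok"


print(classify_response([0.0, 0.9, 0.1]))  # prints ok
print(classify_response([0.9, 0.8, 0.0]))  # prints multi-mark
print(classify_response([0.0, 0.4, 0.0]))  # prints partial mark
print(classify_response([0.0, 0.1, 0.0]))  # prints missing mark
```

Every result other than "ok" would be queued for an operator, who decides from the form image.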
Actual Scanning Process Cont’d . . .
Scanned and validated data is exported to TEXT format
Format suitable for computer editing and tabulation
Backup of the scanned / captured data is taken:
on the Database Server
externally, on high capacity tape cartridges
HP Ultrium
Data Cartridge
400 GB
Actual Scanning Process Cont’d . . .
All Census forms have been scanned:
The scanning of the 10 sedentary Regions
was carried out from mid Aug. 2007 to
mid Dec. 2007
The scanning for Affar and Somali Regions
took about one month including checking
(mid Jan - mid Feb 2008)
44 scanning operators were assigned
11 scanners used
2 shifts per day, 7 days per week
Validation and key-correction of the scanned
data is done
Census Forms Scanning Process
Scanning
Key-Correction
Data Cleaning / Computer Editing
Scanned, key-corrected and exported data
Batch Edit Program based on Edit Specs provided by
subject matter specialists developed and run on the data.
The software used to edit the data is the Census
and Survey Processing System (CSPro)
The Batch Edit Application (.bch) is the CSPro
component used to clean the data through editing and
imputation processes
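What an edit with imputation does can be sketched in Python. This is not CSPro code and not the edit specification used for the census: the age rule, the upper limit of 120 and the simplified hot-deck approach (reuse the last valid value seen) are assumptions chosen for illustration.

```python
from typing import Any, Dict, List, Tuple


def hot_deck_age(records: List[Dict[str, Any]],
                 max_age: int = 120) -> Tuple[List[Dict[str, Any]], int]:
    """Replace missing or out-of-range ages with the last valid age seen.

    Returns the edited records and the number of imputed values.
    Records that come before any valid donor keep age = None.
    """
    donor = None
    edited: List[Dict[str, Any]] = []
    imputed_count = 0
    for record in records:
        age = record.get("age")
        if age is not None and 0 <= age <= max_age:
            donor = age  # remember this value as the next donor
            edited.append(dict(record))
        elif donor is not None:
            edited.append({**record, "age": donor, "age_imputed": True})
            imputed_count += 1
        else:
            edited.append({**record, "age": None})
    return edited, imputed_count


people = [{"id": 1, "age": 30}, {"id": 2, "age": 250}, {"id": 3, "age": None}]
cleaned, count = hot_deck_age(people)
print([person["age"] for person in cleaned], count)  # prints [30, 30, 30] 2
```

A production hot deck would normally pick donors with similar characteristics, for example the same sex and relationship to the household head, rather than simply the previous record.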
Report Generation / Tabulation
Raising factors attached to the edited long
questionnaire data
Tabulation programs (in CSPro) are prepared and
tested
Tables in accordance with the Tabulation Plan will be
produced
Final data will be organized in various formats
(ASCII, SPSS)
Final data will be sent to the Central Databank for
archiving and dissemination purposes.
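The raising factors attached to the long questionnaire data act as weights during tabulation: each sampled record counts for as many units as its factor. A minimal sketch follows; the field names and the factor values are hypothetical, and the actual tables were prepared in CSPro.

```python
from collections import defaultdict
from typing import Any, Dict, List


def weighted_table(records: List[Dict[str, Any]],
                   variable: str) -> Dict[Any, float]:
    """Sum the raising factors of the records in each category of
    `variable`, giving an estimated population count per category."""
    totals: Dict[Any, float] = defaultdict(float)
    for record in records:
        totals[record[variable]] += record["raising_factor"]
    return dict(totals)


# Hypothetical sample in which every record represents 5 persons.
sample = [
    {"literacy": "literate", "raising_factor": 5.0},
    {"literacy": "literate", "raising_factor": 5.0},
    {"literacy": "illiterate", "raising_factor": 5.0},
]
print(weighted_table(sample, "literacy"))
```

With a factor of 5.0 on every record, two sampled literate persons stand for an estimated ten in the population.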
Problems Encountered
I. Scanning :
A batch might slip through unscanned during data capture
A batch might also be scanned in parts only
Misplacement of scanned forms in wrong boxes
Limited storage space on the scanning machines
Scanner disks become full, which makes scanning difficult
Scanned images should constantly be moved to the
storage server
Scanned images sometimes cannot be located
on the storage server
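Two of the problems above, full scanner disks and images that cannot be located later, both suggest moving each batch to the server under a predictable path and recording where every image went. The sketch below is one possible arrangement, not the procedure used at the CSA; the directory layout, the `.tif` extension and the function name are assumptions.

```python
import shutil
from pathlib import Path
from typing import Dict


def archive_batch(local_dir: Path, server_root: Path,
                  ea_id: str) -> Dict[str, str]:
    """Move every image of one batch to <server_root>/<ea_id>/ and
    return an index mapping file name to its new location."""
    target = server_root / ea_id
    target.mkdir(parents=True, exist_ok=True)
    index: Dict[str, str] = {}
    # sorted() lists the files up front, before any of them is moved.
    for image in sorted(local_dir.glob("*.tif")):
        destination = target / image.name
        shutil.move(str(image), str(destination))
        index[image.name] = str(destination)
    return index
```

Calling `archive_batch(Path("scans"), Path("server"), "010203")` with hypothetical paths frees the scanner disk, and saving the returned index alongside the batch header record gives key-correction a direct way to find each image.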
Problems Encountered Cont’d . . .
II. Key Correction:
Problems in retrieving scanned images for key
correction were encountered
Key correction took a long time as it is done
manually
The key correction process, as stated earlier, was
based on fixing:
Missing marks
Multi-marks
Partial marks
Problems Encountered Cont’d . . .
III. Processing the data:
Large volume of data: processing takes a long time (8 hrs)
Frequent power failures severely affect the processing
sessions
The tabulation component of CSPro software
sometimes fails unpredictably
(It is a newly developed tabulation system)
In summary :
Registration and organization of all completed Census
Forms done
The scanning and key correction of the Census
questionnaires completed
The scanning of the Household Listing forms is done
Draft Census preliminary results have been produced
Additional Comment:
Quick manual review (editing and coding) of the
filled-in forms might be needed prior to the scanning
process