Towards a high quality 2011 census: the design of the census coverage survey

Design of the 2011 Census
Coverage Survey
Owen Abbott (ONS)
James Brown (Institute of Education)
Aim of the Presentation
• Role of the Census Coverage Survey (CCS)
• Review of the CCS design in 2001
• Basic structure of the design in 2011
• Evaluating the basic design decisions
• Issues to be resolved
Role of the CCS
• Provides the main data for assessing census
coverage
• To achieve this it must:
- operate ‘independently’ of the census
- have ‘complete’ coverage of the ‘population’
- achieve a high response rate (~90%)
- happen close to Census Day (probably 6/7 weeks in 2011 compared to 3/4 weeks in 2001)
• The design needs to capture variation in
census coverage (geographic, demographic)
2011 Coverage Assessment Overview
[Flow diagram: the 2011 Census and the Census Coverage Survey (CCS) feed into Matching, followed by Estimation (1. DSE, 2. Ratio estimation) and then Adjustment. Outputs: population estimates with CIs; the Census database.]
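The DSE step named in the overview is dual system estimation. As a minimal sketch, assuming the simplest (Lincoln-Petersen) form applied to a single sampled area, the calculation looks like this. The function name and all counts are hypothetical and do not come from the presentation; the estimator actually used for 2011 may be more elaborate.

```python
def dual_system_estimate(census_count: int, ccs_count: int, matched: int) -> float:
    """Simplest dual system (Lincoln-Petersen) estimate of the true population.

    census_count: people counted by the census in the sampled area
    ccs_count:    people counted by the coverage survey in the same area
    matched:      people found in both sources after matching
    """
    if matched <= 0:
        raise ValueError("need at least one matched record")
    if matched > min(census_count, ccs_count):
        raise ValueError("matched count cannot exceed either source count")
    return census_count * ccs_count / matched


# Hypothetical counts, for illustration only.
estimate = dual_system_estimate(census_count=940, ccs_count=900, matched=860)
# Estimated share of the population that the census counted.
census_coverage = 940 / estimate
```

The estimator assumes the two counts are independent, which is why the CCS must operate independently of the census.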
Structure of the 2001 CCS Design
• National ‘Hard-to-Count’ (HtC) index with
three levels
• Formed Estimation Areas (EAs) by grouping
contiguous LAs
• Stratification by HtC within each of 101 EAs
• Further stratification at design by 1991 age-sex structure
(which was ignored in the estimation)
• No direct control of the individual LA samples
• Clustering based on selecting 1991 Enumeration Districts (EDs) and then a fixed number of postcodes per ED
Evaluation of the 2001 CCS Design
• Based on rather ad-hoc data and intelligence
from 1991
HOWEVER
• The CCS worked well in most situations
• Problems with Manchester relate to the CCS
design information not reflecting change
• Have much more data on coverage patterns
for use in the 2011 design
Some key questions
• What should the sample unit be?
• How much clustering?
• What should the stratification structure be?
• How do we allocate the sample?
Basic Sampling Unit
• CCS must be independent of the Census
THEREFORE
• Implies postcodes must be the basic unit of
the design
• No list of households and/or addresses with complete coverage exists (independent of the Census)
• Problems identifying the boundaries but at
least postcodes are known by
householders…
How Much Do We Cluster?
• Cost against efficiency
- Clustering reduces statistical efficiency but can be offset by
reduced fieldwork costs allowing a larger sample
- Often makes fieldwork management easier as a single
postcode is too small for a single interviewer
• Some choices for cluster unit and number of
postcodes per cluster
• Part of the estimation strategy in 2001 used the
clustered structure
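The cost-against-efficiency trade-off can be illustrated with the standard Kish design-effect approximation, deff = 1 + (m - 1) * rho. This is a sketch under assumptions: the presentation does not say which efficiency measure was used, and the intra-cluster correlation and sample sizes below are hypothetical.

```python
def design_effect(postcodes_per_cluster: float, intra_cluster_corr: float) -> float:
    """Kish approximation: deff = 1 + (m - 1) * rho."""
    return 1.0 + (postcodes_per_cluster - 1.0) * intra_cluster_corr


def effective_sample_size(n_postcodes: int, postcodes_per_cluster: float,
                          intra_cluster_corr: float) -> float:
    """Number of unclustered postcodes giving the same precision."""
    return n_postcodes / design_effect(postcodes_per_cluster, intra_cluster_corr)


# Hypothetical comparison: clustering lowers the fieldwork cost per postcode,
# so the same budget is assumed to buy 14,000 postcodes instead of 10,000.
unclustered = effective_sample_size(10_000, postcodes_per_cluster=1, intra_cluster_corr=0.1)
clustered = effective_sample_size(14_000, postcodes_per_cluster=3, intra_cluster_corr=0.1)
```

With these made-up figures the clustered design has the larger effective sample size, which is the sense in which the efficiency loss "can be offset" by a larger sample.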
Stratification Structure (at design)
• Can be different at estimation (as long as we
think about it at design)
- How do we reflect LAs in the design?
- In 2011 do we still need to use Estimation Areas at
design?
• National versus local HtC index
- A national index gives more flexibility at estimation
- Allows LAs to be grouped
Sample Allocation
• Need to be careful that we are not highly
optimised for 2001 as 2011 could be quite
different
- Need to build on the likely patterns based on 2001
while being robust to change (sample everywhere)
• Driven by a ‘design variable’ to proxy for
under-count
- Trying different ones but the analysis in this
presentation based on imputed households in
2001
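One way to combine "driven by a design variable" with "sample everywhere" is to give every stratum a minimum sample and share the remainder in proportion to the design variable. The sketch below assumes that rule; the function name, strata, totals and minimum are hypothetical, and the allocation method actually adopted may differ.

```python
def allocate_sample(design_totals: dict[str, float], total_sample: int,
                    minimum: int) -> dict[str, int]:
    """Give each stratum `minimum` units, then share the rest in proportion
    to the design variable, using largest-remainder rounding."""
    strata = list(design_totals)
    remaining = total_sample - minimum * len(strata)
    if remaining < 0:
        raise ValueError("total_sample is too small for the minimum per stratum")
    grand_total = sum(design_totals.values())
    if grand_total <= 0:
        raise ValueError("design variable totals must sum to a positive value")

    shares = {s: remaining * design_totals[s] / grand_total for s in strata}
    allocation = {s: minimum + int(shares[s]) for s in strata}

    # Hand out units lost to rounding, largest fractional part first.
    leftover = total_sample - sum(allocation.values())
    by_fraction = sorted(strata, key=lambda s: shares[s] - int(shares[s]), reverse=True)
    for s in by_fraction[:leftover]:
        allocation[s] += 1
    return allocation


# Hypothetical strata with made-up totals of the design variable
# (for example, imputed households in 2001).
allocation = allocate_sample(
    {"easy": 200.0, "medium": 300.0, "hard": 500.0},
    total_sample=100,
    minimum=10,
)
```

The minimum guards against over-optimising for 2001 patterns: no stratum drops out of the sample even if its design variable was low in 2001.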
ANOVA Results
• ANOVA of the design variable across OAs
• Shows benefit of both HtC and geography
• Geography is most important
R-squared from ANOVA using the HtC index and geographic stratification options as the independent variables.

Hard to Count index       Local Authorities   2001 Estimation Areas   Non-contiguous Estimation Areas
40%, 40%, 20%             0.732               0.639                   0.641
60%, 20%, 10%, 8%, 2%     0.748               0.654                   0.658
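The R-squared values reported for the ANOVA measure how much of the OA-level variation in the design variable is explained by the stratification. A minimal sketch, assuming a one-way ANOVA in which each combined stratum cell is treated as one group (the model behind the reported figures may have been specified differently); the data and labels are hypothetical:

```python
from collections import defaultdict


def anova_r_squared(values, strata):
    """R-squared from a one-way ANOVA: between-stratum SS / total SS."""
    overall_mean = sum(values) / len(values)
    total_ss = sum((v - overall_mean) ** 2 for v in values)
    if total_ss == 0:
        raise ValueError("design variable has no variation")

    groups = defaultdict(list)
    for value, label in zip(values, strata):
        groups[label].append(value)

    between_ss = sum(
        len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups.values()
    )
    return between_ss / total_ss


# Hypothetical OA-level design variable and combined LA x HtC stratum labels.
design_variable = [2.0, 4.0, 10.0, 12.0, 20.0, 24.0]
stratum = ["LA1-HtC1", "LA1-HtC1", "LA1-HtC2", "LA1-HtC2", "LA2-HtC1", "LA2-HtC1"]
r2 = anova_r_squared(design_variable, stratum)
```

A stratification with a higher R-squared groups OAs with similar values of the design variable more tightly, which is the basis for comparing the geographic options.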
Level of Clustering
• OAs are the obvious choice for PSUs
- Have 2001 Census data
- Some external data
- Consistent size, homogeneous
- More consistent with postcodes
• Comparing with EDs and SOAs
• For fixed costs, OAs do well with one to three
postcodes
• Likely to have some clustering
- Helps DSE and field management
Performance of Designs Using LAs
RSEs for different geographical stratifications with national sample size of 5,500 Output Areas

Estimation Area option    RSE for household population   RSE for person population
Local Authorities         0.058%                         0.067%
2001 Estimation Areas     0.068%                         0.079%
Non-contiguous EAs v1     0.068%                         0.078%
Non-contiguous EAs v2     0.061%                         0.071%
Subgroups                 0.072%                         0.085%
Groups                    0.073%                         0.085%
National                  0.091%                         0.103%
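The RSEs reported for this comparison are relative standard errors: the standard error of the population estimate expressed as a percentage of the estimate. A minimal sketch of the calculation, with a hypothetical estimate and variance that are not taken from the presentation:

```python
import math


def relative_standard_error(estimate: float, variance: float) -> float:
    """RSE as a percentage: 100 * sqrt(variance) / estimate."""
    if estimate <= 0 or variance < 0:
        raise ValueError("estimate must be positive and variance non-negative")
    return 100.0 * math.sqrt(variance) / estimate


# Hypothetical national household estimate and its sampling variance.
rse = relative_standard_error(estimate=25_000_000, variance=225_000_000)
```

A lower RSE means a more precise estimate for the same sample size, so designs are compared row by row on this measure.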
Comparing LA designs with EA design
Performance of Designs Using LAs
• LA level designs appear to be ‘best’ for the total
population
- Agrees with ANOVA results
• This suggests we should break the direct link
between design and estimation
- Best to design by LA
- Then group LAs at estimation
• Can control LA sample size
- Benefits for small area estimation
- Easier to understand for users?
Issues to be Resolved
• Collapsing strata
• Minimum sample sizes
• Grouping of LAs at estimation
- Contiguous or non-contiguous
Summary - ‘Likely’ Basic Structure
• Stratified cluster sample of postcodes
• LA and HtC stratification at design
• Clustering based on 2001 OAs
• EAs formed from the LAs for estimation
- Pre-specified (not necessarily geographic) but checked after Census fieldwork
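The 'likely' structure can be sketched as a two-stage selection: OAs are drawn within each LA by HtC stratum, then postcodes are drawn within each selected OA. The sketch assumes simple random sampling at both stages, which the presentation does not specify; the function name, frame layout and all codes are hypothetical.

```python
import random


def select_ccs_sample(frame, oas_per_stratum, postcodes_per_oa, seed=0):
    """Two-stage stratified cluster sample.

    frame: {(la, htc_level): {oa_code: [postcode, ...]}}
    Returns {((la, htc_level), oa_code): [selected postcodes]}.
    """
    rng = random.Random(seed)  # seeded so the selection is reproducible
    sample = {}
    for stratum, oas in frame.items():
        # Stage 1: select OAs (the clusters) within the LA x HtC stratum.
        chosen_oas = rng.sample(sorted(oas), min(oas_per_stratum, len(oas)))
        for oa in chosen_oas:
            # Stage 2: select postcodes within each chosen OA.
            postcodes = oas[oa]
            sample[(stratum, oa)] = rng.sample(
                postcodes, min(postcodes_per_oa, len(postcodes))
            )
    return sample


# Hypothetical sampling frame with made-up LA, OA and postcode labels.
frame = {
    ("LA_A", 1): {"OA1": ["P1", "P2", "P3"], "OA2": ["P4", "P5"], "OA3": ["P6"]},
    ("LA_A", 2): {"OA4": ["P7", "P8", "P9", "P10"], "OA5": ["P11", "P12"]},
}
sample = select_ccs_sample(frame, oas_per_stratum=2, postcodes_per_oa=2)
```

Because every LA by HtC stratum contributes at least one OA, the sample size in each LA is controlled directly at design, and LAs can still be grouped into EAs afterwards for estimation.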
Summary
• The CCS is a key component of coverage assessment
• Design of the survey is critical
• Maintaining a robust approach
- Guarding against moving too far from 2001 design
- More weighted towards harder areas than in 2001
- But still sampling within every LA