Innovations on methods and survey process for the 2011 Italian population census
European Conference on Quality in Official Statistics 8-11 July, 2008
Giancarlo Carbonetti, Marco Fortini, Fabrizio Solari
Italian Census and municipal population registers
• The Italian census can be defined as "register improved"
  • conventional methodology
  • check against the population register, followed by a follow-up
• Population registers list the people dwelling in municipalities
  • are managed locally
  • record vital events and internal/external migration events
  • are used to update the population between two censuses
• The new census strategy aims at improving
  • integration between census and population registers
  • field organization and timeliness
Size as the most critical factor for data quality
• Organizational impact strongly dependent on the size of the municipality
• Demographic size was one of the most important risk factors in the past census round

TYPE   SIZE (inhabitants)          Municipalities   % Population (01.01.07)
A      > 50,000 and chief towns    165              35.8
B      20,000 – 49,999             339              17.0
C1     5,000 – 19,999              1,859            29.6
C2     < 5,000                     5,738            17.6

As a result, the innovation scheme will be modular, depending on the population size of the municipalities.
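The size classes listed above can be written as a small lookup. This is a minimal Python sketch; the function name `municipality_type` and the `is_chief_town` flag are illustrative assumptions. The slide does not say where a municipality with exactly 50,000 inhabitants falls, so the sketch assumes type A.

```python
def municipality_type(inhabitants: int, is_chief_town: bool = False) -> str:
    """Return the size class (A, B, C1 or C2) of a municipality."""
    # Type A: chief towns of any size and the largest municipalities.
    # Assumption: exactly 50,000 inhabitants is treated as type A.
    if is_chief_town or inhabitants >= 50_000:
        return "A"
    if inhabitants >= 20_000:
        return "B"
    if inhabitants >= 5_000:
        return "C1"
    return "C2"
```

For example, `municipality_type(3_000, is_chief_town=True)` gives type A, because chief towns follow the scheme for the largest municipalities regardless of size.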
General Scheme for largest municipalities (A and B)
Source 1) Municipal Population Register (called LAC)
Source 2) Auxiliary Sources Information Lists (called LIFA)
Source 3) Addresses Register (called RNC)

Units included in LAC
• mail-out of partially prefilled questionnaires
• multi-mode response (mail back, web, CATI)
• enumerators' field work:
  • collection of late respondents
  • check for over-coverage

Units not included in LAC
• option 1: complete search of missing units by enumerators on the basis of LIFA and RNC
• option 2: survey based on capture-recapture and adjustment of census counts
Use of samples of households for long form enumeration
• Mail-out of census forms requires lighter questionnaires so as to
  • increase the spontaneous response rate
  • reduce the response time delay as much as possible
• Fewer enumerators will be necessary to retrieve information
• But using only the short form would decrease the informative power of the census
• Census variables are partitioned into two mutually exclusive subsets
  • demographic variables, such as gender, date of birth and nationality
  • socio-economic variables, such as educational level, occupational status and commuting
• Random sampling from population registers
  • a sample of households is provided with long forms
  • the remaining households are provided with short forms
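The split between long-form and short-form households can be sketched as follows. The slide only says "random sampling from population registers", so simple random sampling without replacement is an assumption here, and the register extract, the identifiers and the `assign_forms` helper are hypothetical.

```python
import random

def assign_forms(household_ids, sampling_rate, seed=None):
    """Split register households into long-form and short-form groups."""
    rng = random.Random(seed)
    n_long = round(len(household_ids) * sampling_rate)
    # Assumption: simple random sampling without replacement.
    long_form = set(rng.sample(list(household_ids), n_long))
    short_form = [h for h in household_ids if h not in long_form]
    return sorted(long_form), short_form

# Hypothetical register extract with 1,000 households.
register = [f"HH{i:04d}" for i in range(1, 1001)]
long_ids, short_ids = assign_forms(register, sampling_rate=0.20, seed=1)
```

Every household ends up in exactly one group, so short-form variables are still collected from the whole register population.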
Sampling design and estimation domains
• Domains: national, regional, provincial, municipal and sub-municipal
• Sub-municipal areas define a partition of the municipal urban territory, each with about 15,000 inhabitants
• For the 504 Italian municipalities with 20,000 inhabitants or more (or chief towns), at least one urban area can be built
Sampling from population registers
• Municipalities with 20,000 inhabitants or more
  • urban areas: long forms sent to a sample of households and short forms sent to the remaining households
  • rural areas: long forms sent to all resident households
• Municipalities between 5,000 and 19,999 inhabitants
  • municipal level: long forms sent to a sample of households and short forms sent to the remaining households (not yet decided)
• In municipalities with fewer than 5,000 inhabitants, long forms are sent to all resident households
Experiments
Goal: evaluating the expected efficiency of sampling estimates
• Carried out on 2001 census data
• 40 municipalities with at least 10,000 inhabitants (15 with ≥ 100,000)
  • chosen non-randomly across the country
  • different population sizes
• 10% of Italian households considered in the analyses
• Various sampling rates considered (10%, 20%, 33%)
• Multivariate tables of short and long form variables together
• Calibrated estimates of 90 absolute and relative frequencies
• Domains ranging from sub-municipal (between 5,000 and 15,000 people) to regional
• Monte Carlo techniques simulating the sampling distribution of the estimates
• Comparisons carried out by means of the Coefficient of Variation (CV)
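The Monte Carlo idea can be illustrated with a much simpler setting than the one used in the experiments. The sketch below assumes simple random sampling of individuals without replacement and a plain expansion estimator, with no household clustering and no calibration; the function `simulate_cv` and all figures are hypothetical.

```python
import random
import statistics

def simulate_cv(population_size, true_count, sampling_rate,
                replications=1000, seed=0):
    """Monte Carlo CV (%) of the expanded estimate of an absolute frequency."""
    rng = random.Random(seed)
    # 1 marks a unit with the characteristic of interest, 0 any other unit.
    population = [1] * true_count + [0] * (population_size - true_count)
    n = round(population_size * sampling_rate)
    weight = population_size / n  # expansion weight, no calibration
    estimates = [weight * sum(rng.sample(population, n))
                 for _ in range(replications)]
    return 100 * statistics.pstdev(estimates) / statistics.mean(estimates)

# Hypothetical sub-municipal area of 5,000 people, true frequency T = 100.
cv_10 = simulate_cv(5_000, 100, 0.10)
cv_20 = simulate_cv(5_000, 100, 0.20)
print(f"CV at 10%: {cv_10:.1f}%, CV at 20%: {cv_20:.1f}%")
```

Repeating the draw many times approximates the sampling distribution of the estimate, and the ratio of its standard deviation to its mean gives the CV used for the comparisons.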
Main results
Coefficient of Variation (average and maximum expected values) for the estimates referred to census sub-municipal areas

Classes of absolute   sampling ratio = 10%     sampling ratio = 20%     sampling ratio = 33%
frequency T           cv%_average  cv%_max     cv%_average  cv%_max     cv%_average  cv%_max
<10                   143.3        191.8       101.4        123.7       66.5         95.8
10 ├ 30               75.9         85.1        48.4         54.6        33.8         38.5
30 ├ 50               51.8         57.1        31.8         37.1        23.4         25.6
50 ├ 100              38.6         41.3        22.3         28.4        17.4         19.1
100 ├ 250             25.4         28.5        15.7         19.6        11.4         12.8
250 ├ 500             16.1         18.3        10.4         12.5        7.5          8.1
500 ├ 1,000           11.8         12.8        7.5          8.2         5.3          5.9
1,000 ├ 2,500         7.5          8.9         4.7          5.9         3.3          3.9
2,500 ├ 5,000         4.9          5.4         3.0          3.6         2.0          2.5
5,000 ├ 10,000        3.2          3.8         2.0          2.5         1.3          1.9
Other results
• CV improvement when increasing the sampling rate from 10%
  • to 20%: 33-38%
  • to 33%: 53-58%
• Average CV improvement when the size of sub-municipal areas increases from less than 10,000 inhabitants
  • to 10,000-12,000: 14-20%
  • to more than 12,000: 22-28%

Percentage distribution of estimated absolute frequencies by class of cv% (areas with at least 12,000 people)

Sampling ratio   cv% < 10%   10% - 50%   > 50%
10%              29.80       54.23       15.97
20%              51.07       40.29       8.64
33%              65.42       29.52       5.05
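As a rough plausibility check on the gains from a higher sampling rate, consider simple random sampling without replacement with sampling fraction f. This is an assumption made only for the check: it ignores the household clustering and the calibration used in the experiments. Under it, the CV of an estimated frequency is proportional to the square root of (1 - f)/f, so that:

```latex
\[
\frac{\mathrm{CV}(f_2)}{\mathrm{CV}(f_1)}
  = \sqrt{\frac{(1-f_2)/f_2}{(1-f_1)/f_1}},
\qquad
\frac{\mathrm{CV}(0.20)}{\mathrm{CV}(0.10)}
  = \sqrt{\frac{4}{9}} \approx 0.67,
\qquad
\frac{\mathrm{CV}(0.33)}{\mathrm{CV}(0.10)}
  = \sqrt{\frac{0.67/0.33}{9}} \approx 0.47 .
\]
```

These ratios correspond to CV reductions of about 33% and about 53%, which sit at the lower end of the ranges reported for the move from 10% to 20% and from 10% to 33%.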
Tackling small domains and rare populations
• To increase the efficiency of the estimates, the standard small area estimators considered in the EURAREA project are being studied
• Both unit-level and area-level model-based estimators have been tested
• Early results show good performance when estimating relative frequencies as low as 1% (between 50 and 150 cases at sub-municipal level, depending on population size)
• The reduction in MSE is between 40% and 80% with respect to calibrated estimators
https://www.statistcs.gov.uk/eurarea
Undercounting evaluation and integration of missing units
• Population registers also have to be amended for undercount, i.e. for those units actually dwelling in the field that are not listed in the corresponding population register
• About 235,000 people were enrolled in municipal population registers after enumeration during the 2001 census
  • people already residing elsewhere in Italy
  • people coming from abroad (foreigners are underestimated)
• Two options are being studied to overcome the registers' undercount
  • complete search of missing units in the field: labour intensive
  • capture-recapture model based on a sample survey: depends on various statistical hypotheses
Option 1 - Complete search of missing units on the field
• Use of alternative lists of names and addresses (LIFA)
• List of street numbers (RNC) supplemented with information on the related housing units
  1. total number of dwellings (excluding housing units reserved for business activities)
  2. number of dwellings used as usual residence by households listed in the population register (LAC)
• LIFA is used to contact the households directly at their address
• RNC is used to search for new households only at street numbers where (1) – (2) > 0
• RNC is not available for municipalities under 20,000 inhabitants: there the complete search is supported by the list of households which have already sent back a short/long form
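The rule for targeting the search with RNC, i.e. visiting only street numbers where (1) – (2) > 0, can be sketched as a simple filter. The `StreetNumber` record, its field names and the sample addresses are hypothetical; the actual RNC layout is not described in the slides.

```python
from dataclasses import dataclass

@dataclass
class StreetNumber:
    address: str
    total_dwellings: int         # (1) dwellings, excluding business-only units
    lac_occupied_dwellings: int  # (2) dwellings used as usual residence by LAC households

def addresses_to_search(rnc):
    """Keep only street numbers where (1) - (2) > 0."""
    return [s for s in rnc if s.total_dwellings - s.lac_occupied_dwellings > 0]

# Hypothetical RNC extract, for illustration only.
rnc = [
    StreetNumber("Via Roma 1", 6, 6),
    StreetNumber("Via Roma 3", 8, 5),
    StreetNumber("Via Verdi 12", 4, 3),
]
targets = addresses_to_search(rnc)
```

Street numbers whose dwellings are all occupied by registered households are skipped, which is what keeps the field work focused on addresses where unregistered households could live.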
Option 1 - Critical aspects of complete search
• It works well only if vacant housing units are few or very clustered
• It can be used as a method of optimisation when lack of resources prevents the complete investigation of the field
  • in this case, however, an estimation procedure has to be considered
• Example: Florence, 2001 census data
  • 72% of buildings are fully occupied by resident households
  • 90% of the remaining buildings contain at most 3 vacant housing units
  • these buildings contain 8.5 housing units on average, against a general average of 5.6 housing units per building
• A higher risk of duplication of people and households is expected
  • demanding use of record linkage techniques
Option 2 - Capture recapture estimation based on sample survey
• Estimation, through a sample survey, of the population living in each municipality that is not listed in the related population register
  • not all the municipalities are included in the sample
  • simplified field activities and better control of costs
• First capture: population register corrected for over-coverage (A)
• Second capture: complete enumeration inside a sample of areas
• The true population count P in a given area can be estimated by

  $\hat{P} = A \, \dfrac{1 - t_{over}}{1 - \hat{t}_{under}}$

  where
  • $t_{over}$ is the rate of non-eligible people included in the register
  • $\hat{t}_{under}$ is the rate of residents not listed in the register among those enumerated during the sample survey
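The estimator on this slide appears to have the form P̂ = A (1 − t_over) / (1 − t̂_under). Under that reading, and treating A as the register count to which the over-coverage rate is applied (an assumption), the arithmetic can be sketched in Python. The function name and all figures are hypothetical.

```python
def estimate_population(register_count, over_rate, under_rate):
    """Dual-system style estimate: A * (1 - t_over) / (1 - t_under)."""
    if not 0 <= over_rate < 1 or not 0 <= under_rate < 1:
        raise ValueError("rates must lie in [0, 1)")
    # Numerator: register records that refer to eligible residents.
    # Denominator: share of enumerated residents found in the register.
    return register_count * (1 - over_rate) / (1 - under_rate)

# Hypothetical figures, for illustration only.
p_hat = estimate_population(register_count=100_000,
                            over_rate=0.03, under_rate=0.02)
```

When the two rates are equal the estimate coincides with the register count, since the people wrongly kept in the register are then balanced by the residents missing from it.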
Option 2 - A simplified scheme of the survey in a municipality
[Diagram: municipality areas, with a check for over-coverage in LAC and a check for under-coverage in LAC]
Some interesting features
• Both tasks can be carried out at the same time
• Operational independence between the two procedures is assured
• Unlike the usual capture-recapture approach, households are contacted only once
Option 2 - Sampling scheme
Quite large sample (300,000 – 1,000,000 people)

Primary sampling units                                        Secondary sampling units
All municipalities ≥ 50,000 inhabitants                       Sample of street numbers (RNC)
Sample of municipalities with 20,000-50,000 inhabitants       Sample of street numbers (RNC)
Sample of municipalities with less than 20,000 inhabitants    Sample of enumeration areas

• Direct dual system estimates down to the regional level
• Model-assisted estimation for municipal and sub-municipal estimates
Option 2 - Some early results on small area estimation
• Evaluated through 2001 census post-enumeration survey (PES) data
  • 180,000 people and 70,000 households in 98 Italian municipalities
• Unit-level small area estimation following Wolter (1986)

  $\hat{x}_D = \sum_{i \in D} x_i \, \dfrac{1}{p_{i1+}}$

  • $p_{i1+}$ estimated by means of a logistic model on the PES data
• Jackknife estimation of the variance is under study
• Next step: using an area-level approach
Example: Florence
Option 2 - Example: Coverage evaluation for Florence
Option 2 - Critical aspects of capture recapture approach
• Dependence on statistical hypotheses
  • independence between captures, homogeneity
• Influence of record linkage activities and errors on
  • timeliness
  • accuracy
  • identification of duplicates
• Quality evaluation of the estimates produced by the method
• Enumeration of housing units and present population
• Municipal population registers will not be directly amended for under-coverage
  • over-coverage: complete correction
  • under-coverage: estimated number of people to be included
Concluding remarks
• 2011 Census as a step toward a register-supported census
  • statistical policy and technical issues
• Many different, interrelated methodological issues
  • sampling
  • modelling
  • imputation
  • record linkage
• Find the trade-off among
  • innovations
  • comparability of results
  • stakeholder needs
• April 2009: pilot survey