Innovations on methods and survey process for the 2011 Italian population census

European Conference on Quality in Official Statistics 8-11 July, 2008

Giancarlo Carbonetti, Marco Fortini, Fabrizio Solari

Italian Census and municipal population registers

• The Italian census can be defined as "register improved"
    • Conventional methodology
    • Check against the population register, followed by follow-up
• Population registers
    • List people dwelling in municipalities
    • Are managed locally
    • Record vital events and internal/external migration events
    • Are used to update population counts between two censuses
• The new census strategy aims at improving
    • Integration between census and population registers
    • Field organization and timeliness

Size as the most critical factor for data quality

Organizational impact is strongly dependent on the size of the municipality: demographic size was one of the most important risk factors in the past census round.

TYPE   SIZE (inhabitants)          Municipalities   % Population (01.01.07)
A      > 50,000 and chief towns               165                      35.8
B      20,000 – 49,999                        339                      17.0
C1     5,000 – 19,999                       1,859                      29.6
C2     < 5,000                              5,738                      17.6

As a result, a modular innovation scheme will be considered, depending on the population size of the municipalities.
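The size classes in the table can be expressed as a small lookup, which is how a modular scheme would typically decide which procedure applies to a municipality. This is only an illustrative sketch: the function name `municipality_type` and the example sizes are hypothetical, and treating exactly 50,000 inhabitants as type A is an assumption (the table only says "> 50,000").

```python
def municipality_type(inhabitants: int, is_chief_town: bool = False) -> str:
    """Return the census type (A, B, C1 or C2) for a municipality."""
    # Assumption: a municipality with exactly 50,000 inhabitants counts as type A.
    if is_chief_town or inhabitants >= 50_000:
        return "A"
    if inhabitants >= 20_000:
        return "B"
    if inhabitants >= 5_000:
        return "C1"
    return "C2"


# Hypothetical municipalities: (inhabitants, chief town flag)
examples = [(366_000, True), (32_000, False), (12_500, False), (800, False), (15_000, True)]
for size, chief in examples:
    print(size, chief, municipality_type(size, chief))
```

Chief towns are assigned to type A regardless of size, as in the first row of the table.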

General Scheme for largest municipalities (A and B)

Source 1) Municipal Population Register (called LAC)
Source 2) Auxiliary Sources Information Lists (called LIFA)
Source 3) Addresses Register (called RNC)

Units included in LAC
• Mail-out of partially prefilled questionnaires
• Multi-mode response (web, CATI, mail back)
• Enumerators' field work:
    • collection of late respondents
    • check for over-coverage

Units not included in LAC
• Option 1: complete search of missing units by enumerators on the basis of LIFA and RNC
• Option 2: survey based on capture-recapture and adjustment of census counts

Use of samples of households for long form enumeration

• Mail-out of census forms requires lighter questionnaires so as to
    • Increase the spontaneous response rate
    • Reduce the response time delay as much as possible
• Fewer enumerators will be necessary to retrieve information
• But using only the short form would decrease the informative power of the census
• Census variables are partitioned into two mutually exclusive subsets
    • demographic variables, such as gender, date of birth and nationality
    • socio-economic variables, such as educational level, occupational status and commuting
• Random sampling from population registers
    • Sample of households provided with long forms
    • Remaining households provided with short forms

Sampling design and estimation domains

• Domains: national, regional, provincial, municipal and sub-municipal
• Sub-municipal areas define a partition of the municipal urban territory, each of about 15,000 inhabitants
• For the 504 Italian municipalities with 20,000 inhabitants or more (or chief towns), at least one urban area can be built

Sampling from population registers

• Municipalities with 20,000 inhabitants or more
    • urban areas: long forms sent to a sample of households and short forms sent to the remaining households
    • rural areas: long forms sent to all resident households
• Municipalities between 5,000 and 19,999 inhabitants
    • municipal level: long forms sent to a sample of households and short forms sent to the remaining households (not yet decided)
• Municipalities with fewer than 5,000 inhabitants
    • long forms sent to all resident households
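The allocation of long and short forms by random sampling from a population register can be sketched as follows. This is a minimal illustration, assuming simple random sampling of households without replacement; the slides do not specify the actual selection procedure, and the function name `allocate_forms`, the household identifiers and the 33% rate used here are hypothetical.

```python
import random


def allocate_forms(household_ids, sampling_rate, seed=0):
    """Assign 'long' to a simple random sample of households and 'short' to the rest."""
    rng = random.Random(seed)                       # fixed seed for reproducibility
    ids = list(household_ids)
    n_long = round(len(ids) * sampling_rate)        # number of long forms to mail out
    long_ids = set(rng.sample(ids, n_long))         # sampling without replacement
    return {h: ("long" if h in long_ids else "short") for h in ids}


# Hypothetical register of 1,000 households in one urban area
register = [f"HH{i:04d}" for i in range(1000)]
forms = allocate_forms(register, sampling_rate=0.33)
n_long = sum(1 for f in forms.values() if f == "long")
print(n_long, len(forms) - n_long)
```

In areas where long forms go to all resident households, the same function could be called with `sampling_rate=1.0`.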

Experiments

• Goal: evaluating the expected efficiency of sampling estimates
• Carried out on 2001 census data
• 40 municipalities with at least 10,000 inhabitants (15 of them ≥ 100,000)
    • Not chosen randomly across the country
    • Different population sizes
• 10% of Italian households considered in the analyses
• Various sampling rates considered (10%, 20%, 33%)
• Multivariate tables of short and long form variables together
• Calibrated estimates of 90 absolute and relative frequencies
• Domains ranging from sub-municipal (between 5,000 and 15,000 people) to regional
• Monte Carlo techniques simulating the sampling distribution of the estimates
• Comparisons carried out by means of the Coefficient of Variation (CV)
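The Monte Carlo evaluation described above can be illustrated with a simplified sketch. It assumes simple random sampling of households and a plain expansion estimator, not the calibrated estimators used in the actual experiments; the function `simulate_cv` and the population figures are hypothetical.

```python
import random
import statistics


def simulate_cv(population, sampling_rate, n_replicates=500, seed=0):
    """Monte Carlo CV (%) of the expansion estimate of a count under simple random sampling."""
    rng = random.Random(seed)
    n = round(len(population) * sampling_rate)
    weight = len(population) / n                    # expansion weight N / n
    # Each replicate draws a new sample and expands the sample count to the area level.
    estimates = [weight * sum(rng.sample(population, n)) for _ in range(n_replicates)]
    return 100 * statistics.pstdev(estimates) / statistics.mean(estimates)


# Hypothetical area: 6,000 households, 120 of which have the characteristic of interest
population = [1] * 120 + [0] * 5880
cv_by_rate = {rate: simulate_cv(population, rate) for rate in (0.10, 0.20, 0.33)}
for rate, cv in cv_by_rate.items():
    print(f"sampling rate {rate:.0%}: CV = {cv:.1f}%")
```

The simulated CV decreases as the sampling rate grows, which is the pattern the experiments quantify for the census estimates.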

Main results

Coefficient of Variation (average and maximum expected values) for the estimates referred to census sub-municipal areas

Classes of absolute    sampling ratio = 10%      sampling ratio = 20%      sampling ratio = 33%
frequency T            cv%_average   cv%_max     cv%_average   cv%_max     cv%_average   cv%_max
<10                          143.3     191.8           101.4     123.7            66.5      95.8
10 ├ 30                       75.9      85.1            48.4      54.6            33.8      38.5
30 ├ 50                       51.8      57.1            31.8      37.1            23.4      25.6
50 ├ 100                      38.6      41.3            22.3      28.4            17.4      19.1
100 ├ 250                     25.4      28.5            15.7      19.6            11.4      12.8
250 ├ 500                     16.1      18.3            10.4      12.5             7.5       8.1
500 ├ 1,000                   11.8      12.8             7.5       8.2             5.3       5.9
1,000 ├ 2,500                  7.5       8.9             4.7       5.9             3.3       3.9
2,500 ├ 5,000                  4.9       5.4             3.0       3.6             2.0       2.5
5,000 ├ 10,000                 3.2       3.8             2.0       2.5             1.3       1.9

Other results

• CV improvement when increasing the sampling rate from 10%
    • to 20%: (33-38%)
    • to 33%: (53-58%)
• Average CV improvement when the size of sub-municipal areas increases from less than 10,000 inhabitants
    • to 10,000-12,000: (14-20%)
    • to more than 12,000: (22-28%)

Percentage distribution of estimated absolute frequencies
Areas with at least 12,000 people

                    Sampling ratio
Classes of cv%      10%       20%       33%
< 10%             29.80     51.07     65.42
10% - 50%         54.23     40.29     29.52
> 50%             15.97      8.64      5.05
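As a rough cross-check of the CV improvements quoted above, one can compare them with what simple random sampling theory predicts. This is an assumption made for illustration only: under simple random sampling without replacement the CV of an estimated count is proportional to sqrt((1 - f) / f), where f is the sampling fraction, whereas the experiments used calibrated estimators. The function names below are hypothetical.

```python
import math


def cv_factor(f):
    """Part of the CV that depends on the sampling fraction f under simple random sampling."""
    return math.sqrt((1 - f) / f)


def cv_improvement(f_old, f_new):
    """Relative CV reduction (%) when the sampling fraction moves from f_old to f_new."""
    return 100 * (1 - cv_factor(f_new) / cv_factor(f_old))


improvement_20 = cv_improvement(0.10, 0.20)
improvement_33 = cv_improvement(0.10, 0.33)
print(f"10% -> 20%: {improvement_20:.1f}%")
print(f"10% -> 33%: {improvement_33:.1f}%")
```

Under this simplified assumption the theoretical reductions are about one third and a little over one half, close to the lower ends of the ranges reported from the experiments.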

Tackling small domains and rare populations

• To increase the efficiency of the estimates, standard small area estimators considered in the Eurarea Project are being studied
• Both unit level and area level model-based estimators have been tested
• Early results show good performance when estimating relative frequencies up to 1% (between 50 and 150 cases at sub-municipal level, depending on population size)
• The reduction in MSE is between 40% and 80% with respect to calibrated estimators

https://www.statistcs.gov.uk/eurarea

Undercounting evaluation and integration of missing units

• Population registers also have to be amended for undercount, i.e. units actually dwelling in the field that are not listed in the corresponding population register
• About 235,000 people were enrolled in municipal population registers after enumeration during the 2001 census
    • People already residing elsewhere in Italy
    • People coming from abroad (foreigners are underestimated)
• Two options are being studied to overcome the register undercount
    • Complete search of missing units in the field → labour intensive
    • Capture-recapture model based on a sample survey → depends on various statistical hypotheses

Option 1 - Complete search of missing units on the field

• Use of alternative lists of names and addresses (LIFA)
• List of street numbers (RNC) supplemented with information on the related housing units
    1. total number of dwellings (excluding housing units reserved for business activities)
    2. number of dwellings used as usual residence by households listed in the population register (LAC)
• LIFA is used to contact the households directly at their address
• RNC is used to search for new households only at street numbers where (1) – (2) > 0
• RNC is not available for municipalities under 20,000 inhabitants
    • complete search supported by the list of households that have already sent back a short/long form
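The rule for targeting the field search with RNC, visiting only street numbers where (1) minus (2) is positive, can be sketched as a simple filter. The record layout, field names and addresses below are hypothetical; the actual structure of the RNC is not described in the slides.

```python
from dataclasses import dataclass


@dataclass
class StreetNumber:
    address: str
    total_dwellings: int   # (1) dwellings, excluding units reserved for business activities
    lac_dwellings: int     # (2) dwellings used as usual residence by households in LAC


def street_numbers_to_search(rnc):
    """Keep only the street numbers where (1) - (2) > 0, i.e. where unregistered households may live."""
    return [s for s in rnc if s.total_dwellings - s.lac_dwellings > 0]


# Hypothetical RNC extract
rnc = [
    StreetNumber("Via Hypothetical 1", total_dwellings=6, lac_dwellings=6),
    StreetNumber("Via Hypothetical 3", total_dwellings=8, lac_dwellings=5),
    StreetNumber("Via Hypothetical 5", total_dwellings=4, lac_dwellings=3),
]
for s in street_numbers_to_search(rnc):
    print(s.address, s.total_dwellings - s.lac_dwellings)
```

Street numbers whose dwellings are all accounted for in LAC are skipped, which is what limits the enumerators' workload.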

Option 1 - Critical aspects of complete search

• It works well only if vacant housing units are few or highly clustered
• Can be used as an optimisation method when lack of resources prevents the complete investigation of the field
    • In this case, however, an estimation procedure has to be considered
• Example: Florence - 2001 Census data
    • 72% of buildings are fully occupied by resident households
    • 90% of the remaining buildings contain at most 3 vacant housing units
    • these buildings contain 8.5 housing units on average, against a general average of 5.6 housing units per building
• A higher risk of duplication of people and households is expected
    • Demanding use of record linkage techniques

Option 2 - Capture recapture estimation based on sample survey

• Estimation, through a sample survey, of the population living in each municipality but not listed in the related population register
    • Not all the municipalities are included in the sample
    • Simplified field activities and better control of costs
• First capture: population register corrected for over-coverage (A)
• Second capture: complete enumeration inside a sample of areas
• The true population count P in a given area can be estimated by

$$\hat{P} = A \, \frac{1 - t_{over}}{1 - \hat{t}_{under}}$$

where
    $t_{over}$: rate of non-eligible people included in the register
    $\hat{t}_{under}$: rate of residents not listed in the register among those enumerated during the sample survey
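The estimator above, read as the register count A multiplied by (1 - over-coverage rate) and divided by (1 - under-coverage rate), can be written as a short function. This is a sketch of that reading of the slide; the function name and all the figures in the example are hypothetical.

```python
def estimate_population(register_count, over_rate, under_rate):
    """Estimate the true population count from a register count and the two coverage rates."""
    if not 0 <= over_rate < 1 or not 0 <= under_rate < 1:
        raise ValueError("coverage rates must lie in [0, 1)")
    # Remove non-eligible people, then inflate for residents missing from the register.
    return register_count * (1 - over_rate) / (1 - under_rate)


# Hypothetical municipality: 100,000 registered people, 3% over-coverage, 2% under-coverage
p_hat = estimate_population(100_000, over_rate=0.03, under_rate=0.02)
print(round(p_hat))
```

When the two rates are equal the corrections cancel and the estimate coincides with the register count.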

Option 2 - A simplified scheme of the survey in a municipality

[Figure: municipality areas, with the check for over-coverage in LAC and the check for under-coverage in LAC]

Some interesting features
• Both tasks can be carried out at the same time
• Operational independence between the two procedures is assured
• Unlike the usual capture-recapture approach, households are contacted only once

Option 2 - Sampling scheme

• Quite large sample (300,000 – 1,000,000 people)

Primary sampling units                                        Secondary sampling units
All municipalities ≥ 50,000 inhabitants                       Sample of street numbers (RNC)
Sample of municipalities 20,000-50,000 inhabitants            Sample of street numbers (RNC)
Sample of municipalities with less than 20,000 inhabitants    Sample of enumeration areas

• Direct dual system estimates down to regional level
• Model-assisted estimation for municipal and sub-municipal estimates

Option 2 - Some early results on small area estimation

• Evaluated through 2001 census post enumeration survey (PES) data
• 180,000 people and 70,000 households in 98 Italian municipalities
• Unit level small area estimation following Wolter (1986)

$$\hat{x}_D = \sum_{i \in D} x_i \, \frac{1}{p_{i1+}}$$

• $p_{i1+}$ estimated by means of a logistic model on the PES data
• Jackknife estimation of the variance is under study
• Next step: using an area level approach
• Example: Florence

Option 2 - Example: Coverage evaluation for Florence

Option 2 - Critical aspects of capture recapture approach

• Dependence on statistical hypotheses
    • Independence between captures, homogeneity
• Influence of record linkage errors and activities on
    • Timeliness
    • Accuracy
    • Identification of duplicates
• Quality evaluation of the estimates produced by the method
• Enumeration of housing units and present population
• Municipal population registers will not be directly amended for under-coverage
    • Over-coverage: complete correction
    • Under-coverage: estimated number of people to be included

Concluding remarks

• 2011 Census as a step toward a register-supported census
• Statistical policy and technical issues
• Many different, interrelated methodological issues
    • Sampling
    • Modelling
    • Imputation
    • Record linkage
• Find the trade-off among
    • Innovations
    • Results comparability
    • Stakeholder needs
• April 2009: pilot survey