Challenges in small area estimation of poverty indicators
Risto Lehtonen, Ari Veijanen, Maria Valaste (University of Helsinki), and Mikko Myrskylä (Max Planck Institute for Demographic Research, Rostock)
Ameli 2010 Conference, 25-26 February 2010, Vienna
Outline
• Background
• Material and methods
• Results
• Discussion
• References
EU/FP7 Project AMELI
• Advanced Methodology for European Laeken Indicators (2008-2011)
  - The project is supported by European Commission funding from the Seventh Framework Programme for Research
• DoW: The study will include research on data quality including
  - Measurement of quality
  - Treatment of outliers and nonresponse
  - Small area estimation
  - The measurement of development over time
Material and methods
• Investigation of statistical properties (bias and accuracy) of estimators of selected Laeken indicators for population subgroups or domains and small areas
• Method: Design-based Monte Carlo simulation experiments based on real data
• Data: Statistical register data based on merging of administrative register data at the unit level (Finland)
Laeken indicators based on binary variables
• At-risk-of-poverty rate
• Direct estimators
  - Horvitz-Thompson estimators HT
• Indirect estimators
  - Model-assisted GREG and MC estimators
  - Model-based EBLUP and EB estimators
• Modelling framework
  - Generalized linear mixed models GLMM
• Lehtonen and Veijanen (2009)
• Rao (2003), Jiang and Lahiri (2006)
Laeken indicators based on medians or quantiles
• Indicators based on medians or quantiles of the cumulative distribution function of the underlying continuous variable
• Relative median at-risk-of-poverty gap
• Quintile share ratio (S80/S20 ratio)
• Gini coefficient
• Direct estimators DEFAULT
• Synthetic estimators SYN
• Expanded prediction SYN estimators EP-SYN
• Composite estimators COMP
• Simulation-based methods
Generalized linear mixed models
Model formulation with domain-specific (area-specific) random terms:
$$E_m(y_k \mid \mathbf{u}_d) = f(\mathbf{x}_k'(\boldsymbol{\beta} + \mathbf{u}_d)), \quad d = 1, \ldots, D,$$
where
$f(\cdot)$ refers to the chosen functional form
$\mathbf{x}_k = (1, x_{1k}, \ldots, x_{pk})'$
$\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)'$ are fixed effects
$\mathbf{u}_d = (u_{0d}, \ldots, u_{pd})'$ are random effects
Fitted values are $\hat{y}_k = f(\mathbf{x}_k'(\hat{\boldsymbol{\beta}} + \hat{\mathbf{u}}_d))$, $k \in U$
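As a minimal sketch of the fitted-value formula above, the snippet below evaluates $\hat{y}_k = f(\mathbf{x}_k'(\hat{\boldsymbol{\beta}} + \hat{\mathbf{u}}_d))$ for given estimates. It assumes that f is the logistic function (one possible choice of functional form) and that domains are coded as integers 0, ..., D-1. The function name `fitted_values` and its arguments are hypothetical, and estimation of the fixed and random effects is not shown.

```python
import numpy as np

def fitted_values(X, domain, beta_hat, u_hat):
    """Fitted values f(x_k'(beta_hat + u_hat_d)) with a logistic f (assumed).

    X        : (N, p+1) design matrix whose first column is 1
    domain   : (N,) integer domain index of each unit, coded 0..D-1 (assumed)
    beta_hat : (p+1,) estimated fixed effects
    u_hat    : (D, p+1) predicted domain-specific random effects
    """
    X = np.asarray(X, dtype=float)
    domain = np.asarray(domain)
    # Each unit gets the coefficient vector of its own domain.
    coef = np.asarray(beta_hat, dtype=float) + np.asarray(u_hat, dtype=float)[domain]
    eta = np.sum(X * coef, axis=1)          # linear predictor x_k'(beta + u_d)
    return 1.0 / (1.0 + np.exp(-eta))       # logistic function
```

With random intercepts only, all columns of `u_hat` except the first would simply be zero.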
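Design-based GREG type estimators for poverty rate
GREG estimators MLGREG
$$\hat{t}_{d;\mathrm{MLGREG}} = \sum_{k \in U_d} \hat{y}_k + \sum_{k \in s_d} a_k \hat{e}_k, \quad d = 1, \ldots, D,$$
where
$a_k = 1/\pi_k$
$\hat{e}_k = y_k - \hat{y}_k$
$\hat{y}_k = f(\mathbf{x}_k'\hat{\boldsymbol{\beta}} + \hat{u}_d)$, $k \in U$
$f$ refers to the logistic function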
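The MLGREG formula above can be sketched as follows, assuming the fitted values are already available for every population unit. The function name `mlgreg_total` and its argument layout are hypothetical, chosen only for illustration.

```python
import numpy as np

def mlgreg_total(y_hat_pop, dom_pop, y_s, y_hat_s, pi_s, dom_s, d):
    """MLGREG-type estimate of the total of y in domain d.

    y_hat_pop, dom_pop : fitted values and domain codes for all units in U
    y_s, y_hat_s, pi_s, dom_s : observed values, fitted values, inclusion
        probabilities and domain codes for the sampled units
    """
    y_hat_pop = np.asarray(y_hat_pop, dtype=float)
    # Sum of fitted values over the whole domain U_d.
    synthetic_part = y_hat_pop[np.asarray(dom_pop) == d].sum()
    in_d = np.asarray(dom_s) == d
    a = 1.0 / np.asarray(pi_s, dtype=float)[in_d]                # design weights a_k
    e = np.asarray(y_s, dtype=float)[in_d] - np.asarray(y_hat_s, dtype=float)[in_d]
    # Design-weighted sum of residuals over the sampled part s_d.
    return synthetic_part + np.sum(a * e)
```

Dividing the estimated total by the domain size N_d would give an estimated poverty rate, under the assumption that N_d is known.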
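Model-based estimators for poverty rate
EBLUP and EB type estimators
$$\hat{t}_{d;\mathrm{EBLUP}} = \sum_{k \in s_d} y_k + \sum_{k \in U_d - s_d} \hat{y}_k, \quad d = 1, \ldots, D,$$
where
$\hat{y}_k = f(\mathbf{x}_k'\hat{\boldsymbol{\beta}} + \hat{u}_d)$, $k \in U$
$f$ refers to the logistic function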
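A minimal sketch of the EBLUP/EB type total above: observed values are summed over the sampled units of the domain and fitted values over the non-sampled units. The function name `eb_domain_total` and the split of inputs into sampled and non-sampled arrays are assumptions made for illustration.

```python
import numpy as np

def eb_domain_total(y_sample, dom_sample, y_hat_rest, dom_rest, d):
    """Sum of observed y over s_d plus sum of fitted values over U_d - s_d."""
    y_sample = np.asarray(y_sample, dtype=float)
    y_hat_rest = np.asarray(y_hat_rest, dtype=float)
    observed = y_sample[np.asarray(dom_sample) == d].sum()     # k in s_d
    predicted = y_hat_rest[np.asarray(dom_rest) == d].sum()    # k in U_d - s_d
    return observed + predicted
```

Unlike the GREG-type estimator, no design weights enter this formula, which is consistent with its classification as model-based.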
Poverty gap for domains
• Relative median at-risk-of-poverty gap
• Poverty gap in domain d describes the difference between the poor people's median income and the at-risk-of-poverty threshold t
$$g_d = \frac{t - \mathrm{Md}\{y_k;\ y_k < t;\ k \in U_d\}}{t}, \quad d = 1, \ldots, D$$
Estimators of poverty gap
Default estimator for domain d is calculated from the sample values $y_k$:
$$\hat{g}_d = \frac{\hat{t}_{HT} - \mathrm{Md}\{y_k;\ y_k < \hat{t}_{HT};\ k \in s_d\}}{\hat{t}_{HT}}, \quad d = 1, \ldots, D,$$
where $\hat{t}_{HT}$ is the HT estimator of the poverty threshold for the whole population
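The default estimator can be sketched as below for one domain, given an already estimated threshold. Two points are assumptions rather than statements from the slides: design weights enter through a weighted median (under SRSWOR the weights are equal and this reduces to an ordinary median), and the median is taken as the lower weighted median. The names `weighted_median` and `poverty_gap` are hypothetical.

```python
import numpy as np

def weighted_median(values, weights):
    """Lower weighted median: smallest value whose cumulative weight
    reaches half of the total weight."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    cum = np.cumsum(weights[order])
    return values[order][np.searchsorted(cum, 0.5 * cum[-1])]

def poverty_gap(y, threshold, weights=None):
    """(threshold - median income of the poor) / threshold for one domain."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    poor = y < threshold
    if not poor.any():
        return float("nan")   # gap is undefined when nobody is below the threshold
    return (threshold - weighted_median(y[poor], w[poor])) / threshold
```

For example, incomes 4, 6, 8, 12, 20 with threshold 10 have a median of 6 among the poor, so the gap is (10 - 6) / 10 = 0.4.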
Estimators of poverty gap
Synthetic estimator for domain d is calculated from predicted values $\hat{y}_k$ so that people with prediction smaller than the estimated threshold are classified as poor:
$$\hat{g}_{d;\mathrm{SYN}} = \frac{\hat{t}_{HT} - \mathrm{Md}\{\hat{y}_k;\ \hat{y}_k < \hat{t}_{HT};\ k \in U_d\}}{\hat{t}_{HT}}, \quad d = 1, \ldots, D,$$
where $\hat{y}_k = \mathbf{x}_k'(\hat{\boldsymbol{\beta}} + \hat{\mathbf{u}}_d)$, $k \in U$, $d = 1, \ldots, D$
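Estimators of poverty gap
Composite estimator incorporates the default estimator and the synthetic estimator:
$$\hat{g}_{d;\mathrm{COMP}} = \hat{\lambda}_d \hat{g}_d + (1 - \hat{\lambda}_d)\, \hat{g}_{d;\mathrm{SYN}}$$
where $\hat{\lambda}_d$ is an average of
$$\frac{\widehat{\mathrm{MSE}}(\hat{g}_{d;\mathrm{SYN}})}{\widehat{\mathrm{MSE}}(\hat{g}_{d;\mathrm{SYN}}) + \widehat{\mathrm{MSE}}(\hat{g}_d)}$$
over a domain size class.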
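A minimal sketch of the composite weighting described above: the ratio of MSE estimates is computed per domain and then averaged within each domain size class. The function names and the use of a simple arithmetic mean within a class are assumptions made for illustration.

```python
import numpy as np

def composite_weights(mse_direct, mse_syn, size_class):
    """Weight on the direct estimator, averaged within each size class."""
    mse_direct = np.asarray(mse_direct, dtype=float)
    mse_syn = np.asarray(mse_syn, dtype=float)
    size_class = np.asarray(size_class)
    raw = mse_syn / (mse_syn + mse_direct)     # domain-level ratio
    lam = np.empty_like(raw)
    for c in np.unique(size_class):
        members = size_class == c
        lam[members] = raw[members].mean()     # average over the size class
    return lam

def composite_estimate(g_direct, g_syn, lam):
    """lam * direct + (1 - lam) * synthetic, elementwise over domains."""
    g_direct = np.asarray(g_direct, dtype=float)
    g_syn = np.asarray(g_syn, dtype=float)
    return lam * g_direct + (1.0 - lam) * g_syn
```

A large estimated MSE of the synthetic estimator relative to the direct one pushes the weight towards the direct estimator, and vice versa.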
Estimators of poverty gap
Alternative SYN estimator EP-SYN:
Expanded prediction SYN estimator $\hat{g}_{d;\mathrm{EP\text{-}SYN}}$
We transform the predictions $\hat{y}_k$ ($k \in U_d$) so that their histogram is similar to that of the observed values $y_k$ ($k \in s$)
$\hat{q}_c$: percentage points of the distribution of $\hat{y}_k$
$q_c$: percentage points of the sample values $y_k$
Find a linear transformation $y_k^* = a + b\hat{y}_k$, $k \in U_d$, so that $q_c^* = a + b\hat{q}_c$ are close to the corresponding $q_c$
(Ref. triple-goal estimation, e.g. Judkins and Liu 2000, Rao 2003)
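One way to read "close to" is ordinary least squares on the percentage points; that choice, the grid of percentage points, and the function name `expand_predictions` are assumptions of this sketch, not details given on the slide.

```python
import numpy as np

def expand_predictions(y_hat, y_sample, probs=None):
    """Linearly rescale predictions so that their percentage points
    approximate those of the observed sample values."""
    if probs is None:
        probs = np.linspace(0.05, 0.95, 19)    # assumed grid of percentage points
    y_hat = np.asarray(y_hat, dtype=float)
    q_hat = np.quantile(y_hat, probs)                               # q-hat_c
    q_obs = np.quantile(np.asarray(y_sample, dtype=float), probs)   # q_c
    # Least squares fit of q_c on q-hat_c; polyfit returns (slope, intercept).
    b, a = np.polyfit(q_hat, q_obs, deg=1)
    return a + b * y_hat                                            # y*_k
```

Because model predictions are typically less dispersed than the observed incomes, the fitted slope b would usually exceed 1, which stretches the predictions outwards.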
MSE estimation for direct estimator DEFAULT
Estimation of $\widehat{\mathrm{MSE}}(\hat{g}_d)$ by bootstrap:
An artificial population is generated by cloning each unit with frequency equal to the design weight.
Bootstrap samples are drawn with the original sampling design from the artificial population.
The variance of the default estimator is then estimated by the sample variance of estimates in the bootstrap samples.
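The three steps above can be sketched as follows. The sketch assumes that the original design is SRSWOR and that design weights are rounded to integers before cloning; a stratified design would require resampling within strata. The function name `bootstrap_variance` and its arguments are hypothetical.

```python
import numpy as np

def bootstrap_variance(y, weights, estimator, n_boot=200, seed=0):
    """Bootstrap variance of `estimator` from a cloned pseudo-population."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    # Step 1: clone each sampled unit as many times as its (rounded) design weight.
    copies = np.rint(np.asarray(weights, dtype=float)).astype(int)
    pseudo_population = np.repeat(y, copies)
    n = len(y)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Step 2: redraw a sample of the original size without replacement (SRSWOR assumed).
        resample = rng.choice(pseudo_population, size=n, replace=False)
        estimates[b] = estimator(resample)
    # Step 3: sample variance of the bootstrap estimates.
    return estimates.var(ddof=1)
```

Any function of a sample can be passed as `estimator`, for instance a function that computes the poverty gap of the resampled incomes for a fixed threshold.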
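MSE estimation for SYN estimator
Estimation of $\widehat{\mathrm{MSE}}(\hat{g}_{d;\mathrm{SYN}})$:
$$\widehat{\mathrm{MSE}}(\hat{g}_{d;\mathrm{SYN}}) = \left(\hat{g}_{d;\mathrm{SYN}} - \hat{g}_d\right)^2 - \widehat{\mathrm{MSE}}(\hat{g}_d)$$
Rao (2003 p. 52) and Fabrizi et al. (2007)
Alternative estimation of $\widehat{\mathrm{MSE}}(\hat{g}_{d;\mathrm{SYN}})$:
Parametric bootstrap similar to Molina and Rao (2009)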
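The first MSE formula above is simple enough to state directly in code. The function name `mse_synthetic` is hypothetical; note that for a single domain the result can be negative, since an estimated MSE is subtracted from a squared difference.

```python
import numpy as np

def mse_synthetic(g_syn, g_direct, mse_direct):
    """Squared difference between synthetic and direct estimates
    minus the estimated MSE of the direct estimator."""
    g_syn = np.asarray(g_syn, dtype=float)
    g_direct = np.asarray(g_direct, dtype=float)
    mse_direct = np.asarray(mse_direct, dtype=float)
    return (g_syn - g_direct) ** 2 - mse_direct
```

For instance, a synthetic estimate of 0.5, a direct estimate of 0.3 and a direct MSE estimate of 0.01 give 0.04 - 0.01 = 0.03.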
Monte Carlo simulation
• Fixed finite population of 1,000,000 persons
• D = 70 domains of interest
  - Cross-classification of NUTS 3 with sex and age group (7 x 2 x 5)
• Y-variables
  - Equivalized income (based on register data)
  - Binary indicator for persons in poverty
• X-variables (binary or continuous variables)
  - house_owner (binary)
  - education_level (7 classes) and educ_thh
  - lfs_code (3 classes) and empmohh
  - socstrat (6 classes)
  - sex_class and age_class (5 age classes)
  - NUTS3
Sampling designs
• SRSWOR sampling
  - Sample size n = 5,000 persons
• Stratified SRSWOR
  - Sample size n = 5,000 persons
  - Stratification by education level of HH head
  - H = 7 strata
  - Unequal inclusion probabilities
  - Design weights vary between strata (Min: 185, Max: 783)
• K = 1000 independent samples
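A stratified SRSWOR draw with its design weights can be sketched as below. The function name `stratified_srswor`, the dictionary of stratum sample sizes, and the toy stratum sizes in the usage note are hypothetical; the actual allocation used in the study is not given on the slide.

```python
import numpy as np

def stratified_srswor(strata, n_h, seed=0):
    """Draw an SRSWOR sample within each stratum.

    strata : stratum code of every population unit
    n_h    : dict mapping stratum code -> sample size in that stratum
    Returns the sampled unit indices and their design weights N_h / n_h.
    """
    rng = np.random.default_rng(seed)
    strata = np.asarray(strata)
    index_parts, weight_parts = [], []
    for h, n in n_h.items():
        members = np.flatnonzero(strata == h)             # units in stratum h
        chosen = rng.choice(members, size=n, replace=False)
        index_parts.append(chosen)
        weight_parts.append(np.full(n, len(members) / n))  # a_k = N_h / n_h
    return np.concatenate(index_parts), np.concatenate(weight_parts)
```

With two strata of 100 and 50 units and sample sizes 10 and 25, the weights are 10 and 2, and they sum to the population size of 150.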
Quality measures of estimators
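• Design bias
  - Absolute relative bias ARB (%)
$$\mathrm{ARB} = \left| \frac{1}{K} \sum_{k=1}^{K} \hat{\theta}_d(s_k) - \theta_d \right| \Big/ \theta_d$$
• Accuracy
  - Relative root mean squared error RRMSE (%)
$$\mathrm{RRMSE} = \sqrt{\frac{1}{K} \sum_{k=1}^{K} \left(\hat{\theta}_d(s_k) - \theta_d\right)^2} \Big/ \theta_d$$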
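Both quality measures can be computed from a K x D array of simulated estimates, as in this minimal sketch. The function name `arb_rrmse` and the array layout (rows are simulated samples, columns are domains) are assumptions.

```python
import numpy as np

def arb_rrmse(estimates, theta):
    """Per-domain ARB (%) and RRMSE (%) over K simulated samples.

    estimates : (K, D) array, estimate for domain d from sample s_k
    theta     : (D,) array of true domain values
    """
    estimates = np.asarray(estimates, dtype=float)
    theta = np.asarray(theta, dtype=float)
    arb = 100.0 * np.abs(estimates.mean(axis=0) - theta) / theta
    rrmse = 100.0 * np.sqrt(((estimates - theta) ** 2).mean(axis=0)) / theta
    return arb, rrmse
```

Averaging the returned per-domain values within a domain size class would give summary figures of the kind reported as "Average ARB (%)" and "Average RRMSE (%)".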
Table 1. Poverty rate estimators with logistic mixed model including NUTS3 level random intercepts
Unequal probability sampling: Stratified SRS (by education level)
Predictors: house_owner, age_class, sex_class, lfs_code, education
Domains: NUTS3 x age x sex (D = 70 domains)

                           Average ARB (%)             Average RRMSE (%)
                           Domain size class           Domain size class
Estimator                  Minor   Medium  Major       Minor   Medium  Major
                           20-49   50-99   100-        20-49   50-99   100-
Design-based estimators
  MLGREG                   2.2     2.3     1.3         48.8    31.9    21.8
Model-based estimators
  EBLUP (EB)               14.4    10.6    4.4         20.4    17.0    10.8
Table 2. Poverty gap estimators with linear mixed model fitted to log(income+1) including NUTS3 level random intercepts
SRSWOR sampling
Predictors: house_owner, educ_thh, empmohh, lfs_code, socstrat
Domains: NUTS3 x age x sex (D = 70 domains)

                                   Average ARB (%)             Average RRMSE (%)
                                   Domain size class           Domain size class
Estimator                          Minor   Medium  Major       Minor   Medium  Major
                                   20-49   50-99   100-        20-49   50-99   100-
Direct estimator
  DEFAULT                          12.1    4.4     1.8         65.8    43.6    27.3
Model-based estimators
  SYN                              40.1    43.4    57.5        61.5    57.1    62.1
  EP-SYN                           17.0    19.6    16.6        23.8    25.4    22.9
Composite estimator
  COMP (with DEFAULT and EP-SYN)   10.9    14.4    11.9        25.6    22.4    18.6
Discussion: Poverty rate
• Indirect design-based estimator MLGREG
  - Design unbiased
  - Large variance in small domains
  - Small variance in large domains
• Indirect model-based estimator EB
  - Design biased
  - Small variance also in small domains
  - Accuracy: EB outperformed MLGREG
  - Might be the best choice at least for small domains unless it is important to avoid design bias
Discussion: Poverty gap
• Direct estimator DEFAULT
  - Small design bias but large variance
• Indirect model-based SYN
  - Very large bias but small variance
• Indirect model-based EP-SYN based on expanded predictions
  - Much smaller bias and variance than in SYN
• Composite (DEFAULT with EP-SYN)
  - Small domains: good compromise
  - Large domains: bias can still dominate the MSE
References
Fabrizi, E., M. R. Ferrante and S. Pacei (2007). Comparing alternative distributional assumptions in mixed models used for small area estimation of income parameters. Statistics in Transition 8, 423-439.
Jiang, J. and P. Lahiri (2006). Mixed model prediction and small area estimation. Test (Sociedad de Estadística e Investigación Operativa) 15, 1-96.
Judkins, D. R. and J. Liu (2000). Correcting the bias in the range of a statistic across small areas. Journal of Official Statistics 16, 1-13.
Lehtonen, R. and A. Veijanen (2009). Design-based methods of estimation for domains and small areas. In: C. R. Rao and D. Pfeffermann (eds.), Handbook of Statistics 29B. Sample Surveys: Inference and Analysis. Elsevier.
Molina, I. and J. N. K. Rao (2009). Estimation of poverty measures in small areas. (Manuscript)
Rao, J. N. K. (2003). Small Area Estimation. John Wiley & Sons, New York.
Thank you for your attention!