No slide title - Vienna University of Technology
Challenges in small area estimation of poverty indicators
Risto Lehtonen, Ari Veijanen, Maria Valaste (University of Helsinki), and Mikko Myrskylä (Max Planck Institute for Demographic Research, Rostock)
Ameli 2010 Conference, 25-26 February 2010, Vienna
Outline
Background
Material and methods
Results
Discussion
References
EU/FP7 Project AMELI
Advanced Methodology for European Laeken Indicators (2008-2011)
The project is supported by European Commission funding from the Seventh Framework Programme for Research
DoW (Description of Work): The study will include research on data quality, including:
Measurement of quality
Treatment of outliers and nonresponse
Small area estimation
The measurement of development over time
Material and methods
Investigation of the statistical properties (bias and accuracy) of estimators of selected Laeken indicators for population subgroups or domains and small areas
Method: Design-based Monte Carlo simulation experiments based on real data
Data: Statistical register data based on merging of administrative register data at the unit level (Finland)
Laeken indicators based on binary variables
At-risk-of-poverty rate
Direct estimators
Horvitz-Thompson (HT) estimators
Indirect estimators
Model-assisted GREG and MC estimators
Model-based EBLUP and EB estimators
Modelling framework
Generalized linear mixed models (GLMM)
Lehtonen and Veijanen (2009)
Rao (2003), Jiang and Lahiri (2006)
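The direct HT estimator of a domain poverty rate weights each sampled person's 0/1 poverty indicator by $a_k = 1/\pi_k$. A minimal Python sketch, assuming the domain population size $N_d$ is known (for example from registers); the function and argument names are hypothetical:

```python
def ht_domain_total(y, pi, domain, d):
    """HT estimate of a domain total: sum of y_k / pi_k over sampled k in domain d."""
    return sum(y_k / pi_k for y_k, pi_k, d_k in zip(y, pi, domain) if d_k == d)


def ht_poverty_rate(y, pi, domain, d, N_d):
    """Direct estimate of the at-risk-of-poverty rate in domain d.

    y:      0/1 poverty indicators of the sampled persons
    pi:     inclusion probabilities pi_k of the sampled persons
    domain: domain label of each sampled person
    N_d:    domain population size (ASSUMED known, e.g. from registers)
    """
    return ht_domain_total(y, pi, domain, d) / N_d


# Four sampled persons, three of them in domain "A"
y = [1, 0, 1, 0]
pi = [0.01, 0.01, 0.02, 0.02]
domain = ["A", "A", "A", "B"]
print(ht_poverty_rate(y, pi, domain, "A", N_d=300))
```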
Laeken indicators based on medians or quantiles
Indicators based on medians or quantiles of the cumulative distribution function of the underlying continuous variable:
Relative median at-risk-of-poverty gap
Quintile share ratio (S80/S20 ratio)
Gini coefficient
Direct estimators (DEFAULT)
Synthetic estimators (SYN)
Expanded prediction SYN estimators (EP-SYN)
Composite estimators (COMP)
Simulation-based methods
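Direct (DEFAULT) estimators of these indicators replace the population distribution function by its design-weighted sample counterpart. A minimal Python sketch of a weighted quantile, the poverty threshold and the S80/S20 ratio; the "60 % of the median" threshold is the usual Eurostat convention and is assumed here, and the simplified tie handling at the quintile cut points is a choice of this sketch, not of the slides. All names are hypothetical:

```python
def weighted_quantile(values, weights, p):
    """Inverse of the weighted empirical CDF: smallest value whose
    cumulative weight share reaches p."""
    pairs = sorted(zip(values, weights))
    target = p * sum(weights)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= target:
            return v
    return pairs[-1][0]


def poverty_threshold(incomes, weights, share=0.6):
    """At-risk-of-poverty threshold as 60 % of the weighted median
    (usual Eurostat convention; ASSUMED, not stated on the slide)."""
    return share * weighted_quantile(incomes, weights, 0.5)


def quintile_share_ratio(incomes, weights):
    """S80/S20: weighted income of the top fifth over that of the bottom fifth.
    Units tied at a cut point are handled in a simplified way."""
    q20 = weighted_quantile(incomes, weights, 0.2)
    q80 = weighted_quantile(incomes, weights, 0.8)
    top = sum(w * y for y, w in zip(incomes, weights) if y > q80)
    bottom = sum(w * y for y, w in zip(incomes, weights) if y <= q20)
    return top / bottom
```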
Generalized linear mixed models
Model formulation with domain-specific (area-specific) random terms:
$$E_m(y_k \mid u_d) = f(x_k'(\beta + u_d)), \quad d = 1, \dots, D,$$
where
$f(\cdot)$ refers to the chosen functional form
$x_k = (1, x_{1k}, \dots, x_{pk})'$
$\beta = (\beta_0, \beta_1, \dots, \beta_p)'$ are fixed effects
$u_d = (u_{0d}, \dots, u_{pd})'$ are random effects
Fitted values are $\hat y_k = f(x_k'(\hat\beta + \hat u_d))$, $k \in U$
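For a binary $y$ with $f$ chosen as the logistic function, the fitted values can be computed as below. A minimal Python sketch: names are hypothetical, and $\hat\beta$ and $\hat u_d$ are assumed to come from a GLMM fitting routine that is not shown.

```python
import math


def logistic(z):
    """Logistic function f(z) = 1 / (1 + exp(-z)), in a numerically stable form."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fitted_value(x_k, beta_hat, u_hat_d, f=logistic):
    """y_hat_k = f(x_k'(beta_hat + u_hat_d)).

    x_k:      covariates with a leading 1 for the intercept
    beta_hat: estimated fixed effects (beta_0, ..., beta_p)
    u_hat_d:  predicted random effects of unit k's domain (u_0d, ..., u_pd);
              for a random-intercept model only u_0d is non-zero
    f:        chosen functional form (logistic for a binary y)
    """
    eta = sum(x * (b + u) for x, b, u in zip(x_k, beta_hat, u_hat_d))
    return f(eta)
```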
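Design-based GREG type estimators for poverty rate
GREG estimators MLGREG:
$$\hat t_d^{MLGREG} = \sum_{k \in U_d} \hat y_k + \sum_{k \in s_d} a_k \hat e_k, \quad d = 1, \dots, D,$$
where $a_k = 1/\pi_k$, $\hat e_k = y_k - \hat y_k$ and $\hat y_k = f(x_k'(\hat\beta + \hat u_d))$, $k \in U$; $f$ refers to the logistic function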
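The MLGREG total adds design-weighted residuals from the sample to the sum of predictions over the domain. A minimal Python sketch, assuming the inputs are already restricted to domain $d$; the names are hypothetical:

```python
def mlgreg_total(y_hat_domain_pop, y_sample, y_hat_sample, pi_sample):
    """MLGREG estimate of a domain total (the poverty count when y is 0/1).

    y_hat_domain_pop: fitted values y_hat_k for every unit k in U_d
    y_sample:         observed y_k for the sampled units k in s_d
    y_hat_sample:     fitted values for the same sampled units
    pi_sample:        inclusion probabilities pi_k, so that a_k = 1 / pi_k
    """
    # Sum of model predictions over the whole domain population
    prediction_part = sum(y_hat_domain_pop)
    # Design-weighted sum of residuals e_k = y_k - y_hat_k over the domain sample
    residual_part = sum(
        (y - y_hat) / pi for y, y_hat, pi in zip(y_sample, y_hat_sample, pi_sample)
    )
    return prediction_part + residual_part
```

Dividing the total by $N_d$ gives the poverty rate. The residual term keeps the estimator approximately design unbiased even when the model is misspecified, at the price of extra variance in small domains.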
Model-based estimators for poverty rate
EBLUP and EB type estimators:
$$\hat t_d^{EBLUP} = \sum_{k \in s_d} y_k + \sum_{k \in U_d \setminus s_d} \hat y_k, \quad d = 1, \dots, D,$$
where $\hat y_k = f(x_k'(\hat\beta + \hat u_d))$, $k \in U$; $f$ refers to the logistic function
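The EB-type total uses observed values for the sampled units and model predictions for the rest of the domain, with no design weights. A minimal Python sketch with hypothetical names, assuming units are identified by keys shared between the two mappings:

```python
def eb_domain_total(domain_units, y_hat, y_observed):
    """EB/EBLUP-type estimate of a domain total.

    domain_units: identifiers of all units k in U_d
    y_hat:        mapping unit id -> model prediction y_hat_k
    y_observed:   mapping unit id -> observed y_k, for sampled units in s_d only
    """
    # Observed values are used where available (k in s_d) ...
    observed_part = sum(y_observed[k] for k in domain_units if k in y_observed)
    # ... and predictions fill in the non-sampled units (k in U_d \ s_d)
    predicted_part = sum(y_hat[k] for k in domain_units if k not in y_observed)
    return observed_part + predicted_part
```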
Poverty gap for domains
Relative median at-risk-of-poverty gap
The poverty gap in domain d describes the difference between the at-risk-of-poverty threshold t and the median income of the poor, relative to the threshold:
$$g_d = \frac{t - \mathrm{Md}\{y_k;\ y_k < t;\ k \in U_d\}}{t}, \quad d = 1, \dots, D$$
Estimators of poverty gap
The default estimator for domain d is calculated from the sample values $y_k$:
$$\hat g_d = \frac{\hat t_{HT} - \mathrm{Md}\{y_k;\ y_k < \hat t_{HT};\ k \in s_d\}}{\hat t_{HT}}, \quad d = 1, \dots, D,$$
where $\hat t_{HT}$ is the HT estimator of the poverty threshold for the whole population
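A minimal Python sketch of the default estimator with hypothetical names. Two assumptions are labeled in the code: the HT threshold is taken as 60 % of the design-weighted median income (the usual Eurostat convention), and the within-domain median is the unweighted median of the sampled values, following the slide's notation.

```python
import statistics


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(weights)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= half:
            return v
    return pairs[-1][0]


def ht_threshold(incomes, weights, share=0.6):
    """Design-weighted poverty threshold for the whole population.
    ASSUMPTION: threshold = 60 % of the median (Eurostat convention)."""
    return share * weighted_median(incomes, weights)


def default_gap(domain_incomes, t_hat):
    """Relative median gap in one domain from the sampled incomes y_k, k in s_d.
    Returns None when no sampled person in the domain is below t_hat."""
    poor = [y for y in domain_incomes if y < t_hat]
    if not poor:
        return None
    # ASSUMPTION: unweighted median; statistics.median averages the two
    # middle values when the count is even
    return (t_hat - statistics.median(poor)) / t_hat
```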
Estimators of poverty gap
The synthetic estimator for domain d is calculated from the predicted values $\hat y_k$, so that people with a prediction smaller than the estimated threshold are classified as poor:
$$\hat g_{d;SYN} = \frac{\hat t_{HT} - \mathrm{Md}\{\hat y_k;\ \hat y_k < \hat t_{HT};\ k \in U_d\}}{\hat t_{HT}}, \quad d = 1, \dots, D,$$
where $\hat y_k = x_k'(\hat\beta + \hat u_d)$, $k \in U$, $d = 1, \dots, D$
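A minimal Python sketch of the synthetic estimator with hypothetical names. It assumes the linear-model predictions are already on the income scale; Table 2 fits the model to log(income+1), so a back-transformation step would be needed in practice.

```python
import statistics


def linear_prediction(x_k, beta_hat, u_hat_d):
    """y_hat_k = x_k'(beta_hat + u_hat_d) from a linear mixed model."""
    return sum(x * (b + u) for x, b, u in zip(x_k, beta_hat, u_hat_d))


def syn_gap(x_domain, beta_hat, u_hat_d, t_hat):
    """Synthetic relative median gap for domain d.

    x_domain: covariate vectors (leading 1) of all units k in U_d
    ASSUMPTION: predictions are on the income scale (no back-transformation).
    Returns None when no prediction falls below t_hat.
    """
    preds = [linear_prediction(x, beta_hat, u_hat_d) for x in x_domain]
    poor = [p for p in preds if p < t_hat]  # classified as poor by prediction
    if not poor:
        return None
    return (t_hat - statistics.median(poor)) / t_hat
```

Model predictions are typically less dispersed than observed incomes, which plausibly explains why classifying people as poor by their predictions makes SYN strongly biased.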
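Estimators of poverty gap
The composite estimator combines the default estimator and the synthetic estimator:
$$\hat g_{d;COMP} = \hat\phi_d\, \hat g_d + (1 - \hat\phi_d)\, \hat g_{d;SYN},$$
where $\hat\phi_d$ is an average of
$$\frac{\widehat{\mathrm{MSE}}(\hat g_{d;SYN})}{\widehat{\mathrm{MSE}}(\hat g_{d;SYN}) + \widehat{\mathrm{MSE}}(\hat g_d)}$$
over a domain size class.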
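A minimal Python sketch of the composite estimator with hypothetical names: the MSE ratio is computed per domain, averaged within each domain size class, and used as the weight $\hat\phi_d$ on the default estimator.

```python
def size_class_weights(mse_syn, mse_direct, size_class):
    """phi_hat_d: average of MSE_syn / (MSE_syn + MSE_direct) over the
    domains in the same size class. All arguments map domain -> value."""
    totals, counts = {}, {}
    for d, c in size_class.items():
        ratio = mse_syn[d] / (mse_syn[d] + mse_direct[d])
        totals[c] = totals.get(c, 0.0) + ratio
        counts[c] = counts.get(c, 0) + 1
    return {d: totals[c] / counts[c] for d, c in size_class.items()}


def composite_gap(g_direct, g_syn, phi):
    """g_hat_d;COMP = phi * g_hat_d + (1 - phi) * g_hat_d;SYN."""
    return phi * g_direct + (1.0 - phi) * g_syn
```

A large MSE of the default estimator relative to the synthetic one pushes $\hat\phi_d$ toward 0, that is, toward the synthetic estimator.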
Estimators of poverty gap
Alternative SYN estimator EP-SYN: the expanded prediction SYN estimator $\hat g_{d;EP\text{-}SYN}$
We transform the predictions $\hat y_k$ ($k \in U_d$) so that they have a histogram similar to that of the observed values $y_k$ ($k \in s$):
$\hat q_c$: percentage points of the distribution of $\hat y_k$
$q_c$: percentage points of the sample values $y_k$
Find a linear transformation $y_k^* = a + b\,\hat y_k$ ($k \in U_d$) so that $q_c^* = a + b\,\hat q_c$ are close to the corresponding $q_c$
(Ref. triple-goal estimation, e.g. Judkins and Liu 2000, Rao 2003)
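A minimal Python sketch of the expansion step. The slide does not state how "close" is measured, so this sketch assumes ordinary least squares over the 5 %, 10 %, ..., 95 % points returned by the standard-library function statistics.quantiles; the number of percentage points and all names are hypothetical.

```python
import statistics


def fit_expansion(predictions, sample_values, n_points=20):
    """Fit y* = a + b * y_hat so that the percentage points of the transformed
    predictions are close to those of the observed sample values.

    ASSUMPTION: "close" means ordinary least squares over the cut points
    returned by statistics.quantiles (5 %, 10 %, ..., 95 % for n_points=20).
    """
    q_hat = statistics.quantiles(predictions, n=n_points)  # points of y_hat_k
    q = statistics.quantiles(sample_values, n=n_points)    # points of y_k
    mean_x, mean_y = statistics.fmean(q_hat), statistics.fmean(q)
    sxx = sum((x - mean_x) ** 2 for x in q_hat)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(q_hat, q))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def ep_syn_gap(domain_predictions, a, b, t_hat):
    """Relative median gap computed from expanded predictions y*_k, k in U_d."""
    expanded = [a + b * y_hat for y_hat in domain_predictions]
    poor = [y for y in expanded if y < t_hat]
    if not poor:
        return None
    return (t_hat - statistics.median(poor)) / t_hat
```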
MSE estimation for direct estimator DEFAULT
Estimation of $\widehat{\mathrm{MSE}}(\hat g_d)$ by bootstrap:
An artificial population is generated by cloning each unit with frequency equal to the design weight
Bootstrap samples are drawn with the original sampling design from the artificial population
The variance of the default estimator is then estimated by the sample variance of the estimates in the bootstrap samples
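The three bootstrap steps can be sketched in Python as follows. This assumes design weights are rounded to integer clone counts and, for simplicity, an SRSWOR design; for a stratified design the resampling would be done within strata. Names are hypothetical.

```python
import random
import statistics


def bootstrap_variance(sample, weights, estimator, n_boot=500, seed=2010):
    """Bootstrap variance of a direct estimator.

    sample:    observations (e.g. incomes) of the n sampled persons
    weights:   design weights; ASSUMED rounded to integer clone counts
    estimator: function mapping a list of observations to an estimate
    ASSUMPTION: the original design is SRSWOR of size n.
    """
    rng = random.Random(seed)
    # 1. Artificial population: clone each unit round(weight) times
    pseudo_population = [y for y, w in zip(sample, weights) for _ in range(round(w))]
    n = len(sample)
    # 2. Redraw samples with the (assumed) original design: SRSWOR of size n
    estimates = [estimator(rng.sample(pseudo_population, n)) for _ in range(n_boot)]
    # 3. Sample variance of the bootstrap estimates
    return statistics.variance(estimates)
```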
MSE estimation for SYN estimator
Estimation of $\widehat{\mathrm{MSE}}(\hat g_{d;SYN})$:
$$\widehat{\mathrm{MSE}}(\hat g_{d;SYN}) = (\hat g_{d;SYN} - \hat g_d)^2 - \widehat{\mathrm{MSE}}(\hat g_d)$$
Rao (2003, p. 52) and Fabrizi et al. (2007)
Alternative estimation of $\widehat{\mathrm{MSE}}(\hat g_{d;SYN})$:
Parametric bootstrap similar to Molina and Rao (2009)
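The MSE formula above is a one-liner; a minimal Python sketch follows. Because the raw expression can be negative for individual domains, the sketch truncates it at zero, which is an assumed safeguard and not part of the slide. The function name is hypothetical.

```python
def mse_syn(g_syn, g_direct, mse_direct, floor=0.0):
    """MSE estimate of a synthetic estimator:
    (g_syn - g_direct)^2 - mse(g_direct).

    ASSUMPTION: the raw value is truncated at `floor` because it can be
    negative; the slide does not specify this step.
    """
    raw = (g_syn - g_direct) ** 2 - mse_direct
    return max(raw, floor)
```

The instability of this per-domain estimate may be one reason the composite weights are averaged over a domain size class.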
Monte Carlo simulation
Fixed finite population of 1,000,000 persons
D = 70 domains of interest
Cross-classification of NUTS 3 with sex and age group (7 × 2 × 5)
Y-variables
Equivalized income (based on register data)
Binary indicator for persons in poverty
X-variables (binary or continuous variables)
house_owner (binary)
education_level (7 classes) and educ_thh
lfs_code (3 classes) and empmohh
socstrat (6 classes)
sex_class and age_class (5 age classes)
NUTS3
Sampling designs
SRSWOR sampling
Sample size n = 5,000 persons
Stratified SRSWOR
Sample size n = 5,000 persons
Stratification by education level of HH head
H = 7 strata
Unequal inclusion probabilities
Design weights vary between strata (min 185, max 783)
K = 1000 independent samples
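Under stratified SRSWOR the design weight of a sampled unit in stratum $h$ is $N_h/n_h$, so weights differ between strata unless the allocation is proportional. A minimal Python sketch; the allocation used in the study is not given here, so the sample sizes passed in are hypothetical, as are the names:

```python
import random


def stratified_srswor(units_by_stratum, n_by_stratum, seed=None):
    """Draw a stratified SRSWOR sample.

    units_by_stratum: stratum label -> list of population unit ids
    n_by_stratum:     stratum label -> sample size n_h (hypothetical allocation)
    Returns (unit, stratum, design_weight) triples with weight N_h / n_h.
    """
    rng = random.Random(seed)
    sample = []
    for h, units in units_by_stratum.items():
        n_h = n_by_stratum[h]
        weight = len(units) / n_h  # inverse of the inclusion probability n_h / N_h
        for unit in rng.sample(units, n_h):
            sample.append((unit, h, weight))
    return sample
```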
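Quality measures of estimators
Design bias: absolute relative bias
$$\mathrm{ARB}(\%) = \frac{\left| \frac{1}{K} \sum_{k=1}^{K} \hat\theta_d(s_k) - \theta_d \right|}{\theta_d} \times 100$$
Accuracy: relative root mean squared error
$$\mathrm{RRMSE}(\%) = \frac{\sqrt{\frac{1}{K} \sum_{k=1}^{K} \left(\hat\theta_d(s_k) - \theta_d\right)^2}}{\theta_d} \times 100$$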
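The two quality measures, plus a toy version of the design-based Monte Carlo loop, can be sketched in Python as below. Names are hypothetical, and the toy loop uses SRSWOR and a simple mean rather than the study's estimators; the tables report these measures averaged over the domains within each size class.

```python
import math
import random


def arb_percent(estimates, true_value):
    """Absolute relative bias (%) over K simulated samples."""
    mean_estimate = sum(estimates) / len(estimates)
    return 100.0 * abs(mean_estimate - true_value) / true_value


def rrmse_percent(estimates, true_value):
    """Relative root mean squared error (%) over K simulated samples."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return 100.0 * math.sqrt(mse) / true_value


def simulate(population, n, K, estimator, seed=0):
    """Toy design-based Monte Carlo: draw K SRSWOR samples of size n from a
    fixed population and return the K estimates."""
    rng = random.Random(seed)
    return [estimator(rng.sample(population, n)) for _ in range(K)]
```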
Table 1. Poverty rate estimators with logistic mixed model including NUTS3-level random intercepts
Unequal probability sampling: stratified SRS (by education level)
Predictors: house_owner, age_class, sex_class, lfs_code, education
Domains: NUTS3 × age × sex (D = 70 domains)
                            Average ARB (%)           Average RRMSE (%)
Domain size class           Minor   Medium  Major     Minor   Medium  Major
                            20-49   50-99   100-      20-49   50-99   100-
Design-based estimators
  MLGREG                    2.2     2.3     1.3       48.8    31.9    21.8
Model-based estimators
  EBLUP (EB)                14.4    10.6    4.4       20.4    17.0    10.8
Table 2. Poverty gap estimators with linear mixed model fitted to log(income+1) including NUTS3-level random intercepts
SRSWOR sampling
Predictors: house_owner, educ_thh, empmohh, lfs_code, socstrat
Domains: NUTS3 × age × sex (D = 70 domains)
                                 Average ARB (%)           Average RRMSE (%)
Domain size class                Minor   Medium  Major     Minor   Medium  Major
                                 20-49   50-99   100-      20-49   50-99   100-
Direct estimator
  DEFAULT                        12.1    4.4     1.8       65.8    43.6    27.3
Model-based estimators
  SYN                            40.1    43.4    57.5      61.5    57.1    62.1
  EP-SYN                         17.0    19.6    16.6      23.8    25.4    22.9
Composite estimator
  COMP (DEFAULT with EP-SYN)     10.9    14.4    11.9      25.6    22.4    18.6
Discussion: Poverty rate
Indirect design-based estimator MLGREG
Design unbiased
Large variance in small domains
Small variance in large domains
Indirect model-based estimator EB
Design biased
Small variance also in small domains
Accuracy: EB outperformed MLGREG
EB might be the best choice, at least for small domains, unless it is important to avoid design bias
Discussion: Poverty gap
Direct estimator DEFAULT
Small design bias but large variance
Indirect model-based SYN
Very large bias but small variance
Indirect model-based EP-SYN, based on expanded predictions
Much smaller bias and variance than SYN
Composite (DEFAULT with EP-SYN)
Small domains: good compromise
Large domains: bias can still dominate the MSE
References
Fabrizi, E., M. R. Ferrante and S. Pacei (2007). Comparing alternative distributional assumptions in mixed models used for small area estimation of income parameters. Statistics in Transition 8, 423-439.
Jiang, J. and P. Lahiri (2006). Mixed model prediction and small area estimation. Test (Sociedad de Estadística e Investigación Operativa) 15, 1-96.
Judkins, D. R. and J. Liu (2000). Correcting the bias in the range of a statistic across small areas. Journal of Official Statistics 16, 1-13.
Lehtonen, R. and A. Veijanen (2009). Design-based methods of estimation for domains and small areas. In: C. R. Rao and D. Pfeffermann (eds.), Handbook of Statistics 29B. Sample Surveys: Inference and Analysis. Elsevier.
Molina, I. and J. N. K. Rao (2009). Estimation of poverty measures in small areas. (Manuscript)
Rao, J. N. K. (2003). Small Area Estimation. John Wiley & Sons, New York.
Thank you for your attention!