A new version of the CALMAR calibration adjustment program
The CALMAR2 macros
I.1. Background
CALMAR = CALibration on MARgins
CALMAR 1 = SAS macro program, written in 1992-1993 at France's INSEE by Sautory
Scope: implementing the calibration methods developed by Deville & Särndal (JASA, 1992)
CALMAR 2 = SAS macro, written in 2000 at France's INSEE
Scope: implementing the generalized calibration method for handling total non-response (Deville, 1998)
I.2. What’s new in CALMAR 2
• Simultaneous calibration at 2 or 3 levels
• Total non-response adjustment using generalized calibration
• Handling of collinearities between auxiliary variables
• A 5th distance function: generalized hyperbolic sine
• Interactive screens for entering parameters, provided by CALMAR2_GUIDE
Simultaneous calibration
II.1. The method
Information is collected at several levels of observation:
• households + every member of each household,
or: firms + every establishment of each firm
i.e. a cluster sampling survey, including questions about the clusters
• households + some of their members (Kish individuals)
i.e. two-stage sampling, including questions about the primary units (P.U.)
• households + every member of each household + Kish units
with auxiliary information available at every level
How should calibration be performed?
• Independent calibration at every level of observation
• Simultaneous (or "integrated") calibration:
- same weights for all members of a household
- consistency between statistics obtained from the various data files
Simultaneous calibration method
A single calibration is performed at the P.U. level, after computing, for each P.U., the totals of the calibration variables defined at the secondary levels (Sautory, 1996)
II.2. An example
• households (sample $s_M$)
• all the members of the selected households (sample $s_I$)
• one member $k_m$ (Kish individual) in each selected household $m$, chosen by simple random sampling among the $e_m$ eligible members of the household (sample $s_K$)
Weight of household $m$:
$d_m = 1/\pi_m$
Weight of member $i$ of household $m$:
$d_{m,i} = d_m \quad \forall i \in \text{household } m$
Weight of the Kish individual of household $m$:
$d_{k_m} = d_m e_m$
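The three design weights above can be illustrated with a minimal Python sketch. The household records, identifiers and field names (`pi`, `members`, `eligible`, `kish`) are hypothetical and chosen only for illustration; they are not CALMAR 2 inputs.

```python
# Hypothetical toy data: inclusion probability pi_m, member identifiers,
# number of eligible members e_m and the selected Kish individual.
households = {
    "m1": {"pi": 0.10, "members": ["m1-1", "m1-2", "m1-3"], "eligible": 2, "kish": "m1-2"},
    "m2": {"pi": 0.05, "members": ["m2-1"], "eligible": 1, "kish": "m2-1"},
}

def design_weights(households):
    """Return the household, individual and Kish design weights."""
    d_hh, d_ind, d_kish = {}, {}, {}
    for m, h in households.items():
        d_hh[m] = 1.0 / h["pi"]                       # d_m = 1 / pi_m
        for i in h["members"]:
            d_ind[i] = d_hh[m]                        # d_{m,i} = d_m
        d_kish[h["kish"]] = d_hh[m] * h["eligible"]   # d_{k_m} = d_m * e_m
    return d_hh, d_ind, d_kish

d_hh, d_ind, d_kish = design_weights(households)
```

Every member inherits the household weight, while the Kish individual's weight is inflated by the number of eligible members, which is the inverse of its selection probability within the household.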
Auxiliary information
$x_m$ = vector of auxiliary variables for each household $m$ in $s_M$
$X = \sum_{m \in U_M} x_m$ = vector of the known totals of the auxiliary variables in the household population $U_M$
$z_{m,i}$ = vector of auxiliary variables for each individual $(m,i)$ in $s_I$
$Z = \sum_{i \in U_I} z_i$ = vector of the known totals in the population of individuals $U_I$
$v_{k_m}$ = vector of auxiliary variables for the Kish individual $k_m$ in $s_K$
$V = \sum_{i \in U_{eI}} v_i$ = vector of the known totals in the population of Kish units $U_{eI}$
For each household $m$ we compute:
• the totals of the individual variables: $z_m = \sum_{(m,i) \in \text{household } m} z_{m,i}$
• the estimated totals of the Kish-individual variables: $\hat v_m = e_m v_{k_m}$
Vector of the calibration variables for household $m$: $(x_m, z_m, \hat v_m)$
Vector of the totals: $(X, Z, V)$
Calibration equations:
$\sum_{m \in s_M} d_m F(x_m'\lambda + z_m'\mu + \hat v_m'\gamma)\,(x_m, z_m, \hat v_m) = (X, Z, V)$
⇒ weights
$w_m$ = weight of household $m$ in $s_M$
$w_{m,i} = w_m$ = weight of individual $(m,i)$ of household $m$ in $s_I$
$w_{k_m} = w_m e_m$ = weight of the Kish individual $k_m$ of household $m$ in $s_K$
The 3 samples are correctly calibrated on the totals X, Z and V:
$\sum_{i \in s_I} w_{m,i} z_{m,i} = \sum_{m \in s_M} w_m \sum_{i \in \text{household } m} z_{m,i} = \sum_{m \in s_M} w_m z_m = Z$
$\sum_{k_m \in s_K} w_{k_m} v_{k_m} = \sum_{k_m \in s_K} w_m e_m v_{k_m} = \sum_{k_m \in s_K} w_m \hat v_m = V$
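The reduction to a single household-level calibration can be sketched in Python as follows. This is an illustration under stated assumptions, not the CALMAR 2 implementation: it uses the linear method $F(u) = 1 + u$, a small hand-written linear solver, and made-up data and totals. All names (`sample`, `totals`, `household_vector`, `solve`) are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Hypothetical households: design weight d, household variable x (a constant,
# so X is the number of households), members' z vectors (male, female
# indicators), e eligible members, Kish individual's v (male indicator).
sample = [
    {"d": 10, "x": [1], "z": [[1, 0], [0, 1]], "e": 2, "v": [1]},
    {"d": 10, "x": [1], "z": [[0, 1]], "e": 1, "v": [0]},
    {"d": 12, "x": [1], "z": [[1, 0], [1, 0], [0, 1]], "e": 2, "v": [0]},
    {"d": 12, "x": [1], "z": [[1, 0]], "e": 1, "v": [1]},
    {"d": 8, "x": [1], "z": [[1, 0], [0, 1], [0, 1], [0, 1]], "e": 3, "v": [1]},
]
totals = [55.0, 70.0, 75.0, 50.0]   # (X, Z, V): made-up population totals

def household_vector(h):
    z_m = [sum(col) for col in zip(*h["z"])]   # z_m = sum of the z_{m,i}
    v_hat = [h["e"] * v for v in h["v"]]       # v_hat_m = e_m * v_{k_m}
    return h["x"] + z_m + v_hat

c = [household_vector(h) for h in sample]
p = len(totals)
# Linear method F(u) = 1 + u: (sum d c c') lambda = totals - sum d c
gram = [[sum(h["d"] * ci[a] * ci[b] for h, ci in zip(sample, c))
         for b in range(p)] for a in range(p)]
gap = [totals[a] - sum(h["d"] * ci[a] for h, ci in zip(sample, c))
       for a in range(p)]
lam = solve(gram, gap)
w = [h["d"] * (1 + sum(l * x for l, x in zip(lam, ci)))
     for h, ci in zip(sample, c)]

# Totals re-estimated from each file with the final weights.
hh_total = sum(w)
ind_totals = [sum(wm * z[j] for wm, h in zip(w, sample) for z in h["z"])
              for j in range(2)]
kish_total = sum(wm * h["e"] * h["v"][0] for wm, h in zip(w, sample))
```

With the household weights `w`, the household file reproduces X, the individual file (each member carrying `w_m`) reproduces Z, and the Kish file (weight `w_m * e_m`) reproduces V, which is the consistency property stated above.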
Calmar 2 performs such simultaneous calibrations.
The user must provide the input tables for the various levels (sample data files and files of calibration variable totals): the program performs all the operations needed to reduce the process to a single calibration, and creates the various calibrated weight files.
An example of simultaneous calibration
The survey
• Sampling design: two-stage sampling
– primary units = households, selected by stratified sampling with S.R.S. within each stratum
– secondary units (Kish units) = one member per selected household, drawn by S.R.S. among the members over 14 years old
• Questionnaire
– the variables of interest are measured on the Kish units
– questions about the dwelling and the whole family
– questions about each member of the household (age, sex, profession)
• Calibration variables (xk)
– Households: household size + professional group of the head of household + stratum (~ agglomeration size)
– All individuals: sex + age group
– Kish individuals: sex + age group
• Population totals (X) come from the sampling frame
The program
%CALMAR2 (datamen=base.echant_menages,
          marmen=base.marge_men,
          poids=poids1,
          ident=ident,
          dataind=base.echant_indiv,
          marind=base.marge_ind,
          ident2=id,
          datakish=base.echant_kish,
          markish=base.marge_kish,
          poidkish=nbelig,
          m=1,
          datapoi=poidsmen,
          datapoi2=poidsind,
          datapoi3=poidskish,
          poidsfin=w3,
          labelpoi=calage 3 niveaux,
          poidskishfin=w3k,
          labelpoikish=poids kish total,
          edition=3)
The output
**********************************
***      MACRO PARAMETERS      ***
**********************************
INPUT TABLE(S):
  LEVEL 1 DATA TABLE                      DATAMEN      = BASE.ECHANT_MENAGES
  LEVEL 1 IDENTIFIER                      IDENT        = IDENT
  LEVEL 2 DATA TABLE                      DATAIND      = BASE.ECHANT_INDIV
  LEVEL 2 IDENTIFIER                      IDENT2       = ID
  KISH INDIVIDUALS TABLE                  DATAKISH     = BASE.ECHANT_KISH
  INITIAL WEIGHT                          POIDS        = POIDS1
  SCALE FACTOR                            ECHELLE      = 1
  QK WEIGHT                               PONDQK       = __UN
  KISH WEIGHT                             POIDKISH     = NBELIG
MARGIN TABLE(S):
  LEVEL 1                                 MARMEN       = BASE.MARGE_MEN
  LEVEL 2                                 MARIND       = BASE.MARGE_IND
  KISH LEVEL                              MARKISH      = BASE.MARGE_KISH
  MARGINS IN PERCENTAGES                  PCT          = NON
POPULATION SIZE:
  LEVEL 1 ELEMENTS                        POPMEN       =
  LEVEL 2 ELEMENTS                        POPIND       =
  KISH ELEMENTS                           POPKISH      =
METHOD USED                               M            = 1
LOWER BOUND                               LO           =
UPPER BOUND                               UP           =
HYPERBOLIC SINE COEFFICIENT               ALPHA        = 1
STOPPING THRESHOLD                        SEUIL        = 0.0001
MAXIMUM NUMBER OF ITERATIONS              MAXITER      = 15
HANDLING OF COLLINEARITIES                COLIN        = NON
TABLE(S) CONTAINING THE FINAL WEIGHT:
  LEVEL 1                                 DATAPOI      = POIDSMEN
  LEVEL 2                                 DATAPOI2     = POIDSIND
  KISH LEVEL                              DATAPOI3     = POIDSKISH
UPDATE OF TABLE(S) DATAPOI(2)(3)          MISAJOUR     = OUI
FINAL WEIGHT                              POIDSFIN     = W3
LABEL OF THE FINAL WEIGHT                 LABELPOI     = CALAGE 3 NIVEAUX
FINAL WEIGHT OF THE KISH UNITS            POIDSKISHFIN = W3K
LABEL OF THE KISH WEIGHT                  LABELPOIKISH = POIDS KISH TOTAL
CONTENTS OF TABLE(S) DATAPOI(2)(3)        CONTPOI      = OUI
PRINTING OF THE RESULTS                   EDITION      = 3
PRINTING OF THE WEIGHTS                   EDITPOI      = NON
STATISTICS ON THE WEIGHTS                 STAT         = OUI
CHECKS                                    CONT         = OUI
TABLE CONTAINING THE DISCARDED OBS.       OBSELI       = NON
SAS NOTES                                 NOTES        = NON
COMPARISON BETWEEN THE MARGINS DERIVED FROM THE SAMPLE (INITIAL WEIGHTS)
AND THE MARGINS IN THE POPULATION (CALIBRATION MARGINS)
VARIABLE  CATEGORY    SAMPLE MARGIN  POPULATION MARGIN  SAMPLE %  POPULATION %
NBIND     01                1525.60               1539     26.30         26.53
NBIND     02                1914.37               1860     33.00         32.06
NBIND     03                 797.71               1000     13.75         17.24
NBIND     04                 930.78                885     16.05         15.26
NBIND     05                 365.18                361      6.30          6.22
NBIND     06                 267.36                156      4.61          2.69
PCSPR     1                   80.70                124      1.39          2.14
PCSPR     2                  191.78                290      3.31          5.00
PCSPR     3                  822.81                624     14.18         10.76
PCSPR     4                  832.34                870     14.35         15.00
PCSPR     5                  569.41                682      9.82         11.76
PCSPR     6                 1279.53               1237     22.06         21.32
PCSPR     7                 1839.32               1831     31.71         31.56
PCSPR     8                  185.11                143      3.19          2.47
STRATE    0                 1453.00               1453     25.05         25.05
STRATE    1                  966.00                966     16.65         16.65
STRATE    2                  805.00                805     13.88         13.88
STRATE    3                 1689.00               1689     29.12         29.12
STRATE    4                  888.00                888     15.31         15.31
AGE       00-14 ans         3245.32               2857     21.46         19.52
AGE       15-24 ans         2217.86               2044     14.67         13.96
AGE       25-59 ans         6699.70               6800     44.31         46.45
AGE       60- ? ans         2957.50               2939     19.56         20.08
SEXE      1                 7546.69               7108     49.91         48.55
SEXE      2                 7573.69               7532     50.09         51.45
AGEK      A15               2155.94               2044     18.28         17.35
AGEK      A25               6752.61               6800     57.25         57.71
AGEK      A60               2885.84               2939     24.47         24.94
SEXEK     1                 5596.30               5673     47.45         48.15
SEXEK     2                 6198.09               6110     52.55         51.85
METHOD: LINEAR
FIRST SUMMARY TABLE OF THE ALGORITHM
VALUE OF THE STOPPING CRITERION AND NUMBER OF NEGATIVE WEIGHTS AFTER EACH ITERATION
ITERATION  STOPPING CRITERION  NEGATIVE WEIGHTS
1                     1.31960                 1
2                     0.00000                 1
METHOD: LINEAR
SECOND SUMMARY TABLE OF THE ALGORITHM
COEFFICIENTS OF THE VECTOR LAMBDA OF LAGRANGE MULTIPLIERS AFTER EACH ITERATION
VARIABLE  CATEGORY     LAMBDA1   LAMBDA2
NBIND     01          -0.15325  -0.15325
NBIND     02          -0.24295  -0.24295
NBIND     03           0.00562   0.00562
NBIND     04          -0.17355  -0.17355
NBIND     05          -0.00502  -0.00502
NBIND     06          -0.44773  -0.44773
PCSPR     1            0.92036   0.92036
PCSPR     2            0.50376   0.50376
PCSPR     3           -0.18514  -0.18514
PCSPR     4            0.15354   0.15354
PCSPR     5            0.36019   0.36019
PCSPR     6            0.08424   0.08424
PCSPR     7            0.16042   0.16042
PCSPR     8                  .         .
STRATE    0           -0.14172  -0.14172
STRATE    1           -0.07338  -0.07338
STRATE    2           -0.12634  -0.12634
STRATE    3           -0.03106  -0.03106
STRATE    4                  .         .
AGE       00-14 ans   -0.03549  -0.03549
AGE       15-24 ans   -0.65576  -0.65576
AGE       25-59 ans   -0.52872  -0.52872
AGE       60- ? ans   -0.64430  -0.64430
SEXE      1           -0.08395  -0.08395
SEXE      2                  .         .
AGEK      A15          0.67198   0.67198
AGEK      A25          0.68366   0.68366
AGEK      A60          0.74262   0.74262
SEXEK     1            0.01727   0.01727
SEXEK     2                  .         .
COMPARISON BETWEEN THE FINAL MARGINS IN THE SAMPLE (WITH THE FINAL WEIGHTS)
AND THE MARGINS IN THE POPULATION (CALIBRATION MARGINS)
VARIABLE  CATEGORY    SAMPLE MARGIN  POPULATION MARGIN  SAMPLE %  POPULATION %
NBIND     01                   1539               1539     26.53         26.53
NBIND     02                   1860               1860     32.06         32.06
NBIND     03                   1000               1000     17.24         17.24
NBIND     04                    885                885     15.26         15.26
NBIND     05                    361                361      6.22          6.22
NBIND     06                    156                156      2.69          2.69
PCSPR     1                     124                124      2.14          2.14
PCSPR     2                     290                290      5.00          5.00
PCSPR     3                     624                624     10.76         10.76
PCSPR     4                     870                870     15.00         15.00
PCSPR     5                     682                682     11.76         11.76
PCSPR     6                    1237               1237     21.32         21.32
PCSPR     7                    1831               1831     31.56         31.56
PCSPR     8                     143                143      2.47          2.47
STRATE    0                    1453               1453     25.05         25.05
STRATE    1                     966                966     16.65         16.65
STRATE    2                     805                805     13.88         13.88
STRATE    3                    1689               1689     29.12         29.12
STRATE    4                     888                888     15.31         15.31
AGE       00-14 ans            2857               2857     19.52         19.52
AGE       15-24 ans            2044               2044     13.96         13.96
AGE       25-59 ans            6800               6800     46.45         46.45
AGE       60- ? ans            2939               2939     20.08         20.08
SEXE      1                    7108               7108     48.55         48.55
SEXE      2                    7532               7532     51.45         51.45
AGEK      A15                  2044               2044     17.35         17.35
AGEK      A25                  6800               6800     57.71         57.71
AGEK      A60                  2939               2939     24.94         24.94
SEXEK     1                    5673               5673     48.15         48.15
SEXEK     2                    6110               6110     51.85         51.85
STATISTICS ON THE WEIGHT RATIOS (= FINAL WEIGHTS / INITIAL WEIGHTS)
AND ON THE FINAL WEIGHTS
The UNIVARIATE Procedure
Variable: _F_ (WEIGHT RATIO)
Basic Statistical Measures
Location                Variability
Mean     1.000000       Std Deviation          0.24564
Median   0.996533       Variance               0.06034
Mode     0.991339       Range                  2.32886
                        Interquartile Range    0.21258
Quantiles (Definition 5)
Quantile       Estimate
100% Max       2.009262
99%            1.745002
95%            1.377982
90%            1.278637
75% Q3         1.105492
50% Median     0.996533
25% Q1         0.892917
10%            0.749877
5%             0.613091
1%             0.251528
0% Min        -0.319601
Extreme Observations
-------------Lowest------------    ------------Highest-----------
Value        IDENT        Obs      Value     IDENT        Obs
-0.3196012   1163032100    27      1.76397   5363019600   293
 0.0374385   7363016270   365      1.79618   7463000450   381
 0.1498661   1169040310    73      1.85813   2369004180   129
 0.1872096   7269001420   348      1.97094   5463007950   326
 0.2314417   7363017990   366      2.00926   5263016110   268
STATISTICS ON THE WEIGHT RATIOS (= FINAL WEIGHTS / INITIAL WEIGHTS)
AND ON THE FINAL WEIGHTS
The UNIVARIATE Procedure
Variable: _F_ (WEIGHT RATIO)
       Histogram                                      #  Boxplot
  2.05+*                                              1     *
      .*                                              1     *
      .*                                              1     *
      .*                                              3     0
      .*                                              3     0
      .**                                             5     0
      .***                                            7     0
      .*********                                     26     |
      .*********                                     27     |
      .********************                          59  +-----+
      .*************************************        110  |  +  |
      .*******************************************  128  *-----*
  0.85+*******************                           57  +-----+
      .***********                                   33     |
      .******                                        17     |
      .***                                            8     0
      .**                                             5     0
      .*                                              3     0
      .*                                              2     0
      .*                                              2     *
      .*                                              1     *
      .
      .
      .
 -0.35+*                                              1     *
       ----+----+----+----+----+----+----+----+--
       * may represent up to 3 counts
STATISTICS ON THE WEIGHT RATIOS (= FINAL WEIGHTS / INITIAL WEIGHTS)
AND ON THE FINAL WEIGHTS
The UNIVARIATE Procedure
Variable: __WFIN (FINAL WEIGHT)
Basic Statistical Measures
Location                Variability
Mean     11.60200       Std Deviation           4.62597
Median   10.11949       Variance               21.39957
Mode      9.57633       Range                  32.03263
                        Interquartile Range     5.70090
Quantiles (Definition 5)
Quantile       Estimate
100% Max       29.19457
99%            25.69548
95%            20.11085
90%            18.04434
75% Q3         13.98763
50% Median     10.11949
25% Q1          8.28672
10%             7.15056
5%              6.41373
1%              2.50660
0% Min         -2.83806
Extreme Observations
-------------Lowest-----------    ------------Highest-----------
Value       IDENT        Obs      Value     IDENT        Obs
-2.838058   1163032100    27      25.7604   5369016540   317
 0.543982   7363016270   365      26.0985   7463000450   381
 1.330811   1169040310    73      28.6378   5463007950   326
 1.808444   7269001420   348      28.6643   8269018030   421
 2.235727   7363017990   366      29.1946   5263016110   268
STATISTICS ON THE WEIGHT RATIOS (= FINAL WEIGHTS / INITIAL WEIGHTS)
AND ON THE FINAL WEIGHTS
The UNIVARIATE Procedure
Variable: __WFIN (FINAL WEIGHT)
     Histogram                                        #  Boxplot
  29+*                                                3     0
    .*                                                1     0
    .**                                               4     0
    .***                                              8     0
    .****                                            11     |
    .*********                                       25     |
    .**************                                  41     |
    .***********                                     32     |
  13+***********************                         67  +-----+
    .**********************                          64  *--+--*
    .*********************************************  134  +-----+
    .******************************                  88     |
    .*****                                           14     |
    .**                                               4     |
    .*                                                3     |
    .
  -3+*                                                1     0
     ----+----+----+----+----+----+----+----+----+
     * may represent up to 3 counts
METHOD: LINEAR
MEAN WEIGHT RATIOS (FINAL WEIGHTS / INITIAL WEIGHTS) FOR EACH VALUE OF THE VARIABLES
VARIABLE  CATEGORY  NUMBER OF LEVEL 1 OBSERVATIONS  WEIGHT RATIO
NBIND     01                                   133       1.00152
NBIND     02                                   167       0.97304
NBIND     03                                    69       1.24647
NBIND     04                                    79       0.95151
NBIND     05                                    31       0.99271
NBIND     06                                    21       0.58818
PCSPR     1                                      6       1.55064
PCSPR     2                                     15       1.52001
PCSPR     3                                     73       0.76645
PCSPR     4                                     73       1.04281
PCSPR     5                                     51       1.20566
PCSPR     6                                    111       0.96902
PCSPR     7                                    157       0.99429
PCSPR     8                                     14       0.76191
STRATE    0                                    100       1.00000
STRATE    1                                    100       1.00000
STRATE    2                                    100       1.00000
STRATE    3                                    100       1.00000
STRATE    4                                    100       1.00000
ALL                                            500       1.00000
METHOD: LINEAR
MEAN WEIGHT RATIOS (FINAL WEIGHTS / INITIAL WEIGHTS) FOR EACH VALUE OF THE VARIABLES
VARIABLE  CATEGORY  NUMBER OF LEVEL 2 OBSERVATIONS  WEIGHT RATIO
AGE       00-14 an                             274       0.88664
AGE       15-24 an                             184       0.93210
AGE       25-59 an                             581       1.01758
AGE       60- ? an                             249       0.99088
SEXE      1                                    640       0.94443
SEXE      2                                    648       0.99993
ALL                                           1288       0.97235
VARIABLE  CATEGORY  NUMBER OF KISH INDIVIDUALS      WEIGHT RATIO
AGEK      A15                                   66       0.95043
AGEK      A25                                  283       1.01108
AGEK      A60                                  151       1.00090
SEXEK     1                                    232       0.98540
SEXEK     2                                    268       1.01264
ALL                                            500       1.00000
METHOD: LINEAR
CONTENTS OF TABLE poidsmen CONTAINING THE NEW WEIGHT w3
The CONTENTS Procedure
#  Variable  Type  Len  Pos  Label
1  IDENT     Char   10    8
2  w3        Num     8    0  calage 3 niveaux
CONTENTS OF TABLE poidsind CONTAINING THE NEW WEIGHT w3
#  Variable  Type  Len  Pos  Label
2  IDENT     Char   10   20
1  id        Char   12    8
3  w3        Num     8    0  calage 3 niveaux
CONTENTS OF TABLE poidskish CONTAINING THE NEW WEIGHT w3
#  Variable  Type  Len  Pos  Label
2  ID        Char   12   26
1  IDENT     Char   10   16
3  w3        Num     8    0  calage 3 niveaux
4  w3k       Num     8    8  poids kish total
*********************
***    SUMMARY    ***
*********************
DATE: 24 AUGUST 2005    TIME: 11:12
*************************************
INPUT TABLE: BASE.ECHANT_MENAGES
*************************************
NUMBER OF OBSERVATIONS IN THE INPUT TABLE : 500
NUMBER OF OBSERVATIONS DISCARDED          : 0
NUMBER OF OBSERVATIONS KEPT               : 500
WEIGHT VARIABLE: POIDS1
NUMBER OF CATEGORICAL VARIABLES: 3
LIST OF THE CATEGORICAL VARIABLES AND THEIR NUMBERS OF CATEGORIES:
nbind (6) pcspr (8) strate (5)
SUM OF THE INITIAL WEIGHTS : 5801
POPULATION SIZE            : 5801
***********************************
INPUT TABLE: BASE.ECHANT_INDIV
***********************************
NUMBER OF OBSERVATIONS IN THE INPUT TABLE : 1288
NUMBER OF OBSERVATIONS DISCARDED          : 0
NUMBER OF OBSERVATIONS KEPT               : 1288
NUMBER OF CATEGORICAL VARIABLES: 2
LIST OF THE CATEGORICAL VARIABLES AND THEIR NUMBERS OF CATEGORIES:
age (4) sexe (2)
SUM OF THE INITIAL WEIGHTS : 15120
POPULATION SIZE            : 14640
***********************************
INPUT TABLE: BASE.ECHANT_KISH
***********************************
NUMBER OF OBSERVATIONS IN THE INPUT TABLE : 500
NUMBER OF OBSERVATIONS DISCARDED          : 0
NUMBER OF OBSERVATIONS KEPT               : 500
CONDITIONAL WEIGHT VARIABLE                  : NBELIG
MAXIMUM NUMBER OF SECONDARY UNITS PER P.U.   : 1
NUMBER OF CATEGORICAL VARIABLES: 2
LIST OF THE CATEGORICAL VARIABLES AND THEIR NUMBERS OF CATEGORIES:
agek (3) sexek (2)
SUM OF THE INITIAL WEIGHTS : 11794
POPULATION SIZE            : 11783
METHOD USED: LINEAR
THE CALIBRATION WAS COMPLETED IN 2 ITERATIONS
THERE IS 1 NEGATIVE WEIGHT
THE WEIGHTS HAVE BEEN STORED IN THE VARIABLE W3 OF TABLE POIDSMEN
AND OF TABLE POIDSIND
AND OF TABLE POIDSKISH
THE WEIGHTS OF THE KISH UNITS HAVE BEEN STORED IN THE VARIABLE W3K
OF TABLE POIDSKISH
Handling total non-response with generalized calibration
III.1. Generalized calibration
Calibration functions: $F(z_k'\lambda)$ such that $F(0) = 1$
where $\lambda$: vector of p adjustment parameters
$z_k$: vector of p variables known in s
Calibration equations:
$\sum_{k \in s} d_k F(z_k'\lambda)\, x_k = X$
Solving for $\lambda$ ⇒ $w_k = d_k F(z_k'\lambda)$
Basic result
$\hat Y_w = \sum_{k \in s} w_k y_k$ is asymptotically equivalent to
$\hat Y_{regi} = \hat Y_{HT} + (X - \hat X_{HT})' \hat B_{szx}$
where $\hat B_{szx} = \left(\sum_{k \in s} d_k z_k x_k'\right)^{-1} \left(\sum_{k \in s} d_k z_k y_k\right)$
$\hat B_{szx}$ = parameter estimates of the instrumental regression of $y_k$ on $x_k$ with $z_k$ as instrumental variables, weighted by $d_k$
If $F(z_k'\lambda) = 1 + z_k'\lambda$, then $\hat Y_w = \hat Y_{regi}$
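The exact equality in the linear case can be checked numerically. The Python sketch below is a toy verification with made-up data and p = 2, not CALMAR 2 code; the names `sample`, `X`, `solve2` and `wsum` are hypothetical. With $F(u) = 1 + u$ the calibration equations reduce to the linear system $(\sum_s d_k x_k z_k')\,\lambda = X - \hat X_{HT}$.

```python
def solve2(a, b):
    """Solve a 2x2 linear system a @ t = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]

# Hypothetical sample: (d_k, x_k, z_k, y_k) with p = 2 variables each.
sample = [
    (10.0, [1.0, 2.0], [1.0, 1.0], 5.0),
    (10.0, [1.0, 3.0], [1.0, 4.0], 7.0),
    (20.0, [1.0, 1.0], [1.0, 2.0], 4.0),
    (20.0, [1.0, 4.0], [1.0, 3.0], 9.0),
]
X = [65.0, 160.0]   # made-up known totals of x

def wsum(f):
    """Design-weighted sample sum of f(x_k, z_k, y_k)."""
    return sum(d * f(x, z, y) for d, x, z, y in sample)

x_ht = [wsum(lambda x, z, y, j=j: x[j]) for j in range(2)]
y_ht = wsum(lambda x, z, y: y)
gap = [X[j] - x_ht[j] for j in range(2)]

# Calibration: sum d (1 + z'lam) x = X  <=>  (sum d x z') lam = X - X_HT
t_xz = [[wsum(lambda x, z, y, a=a, b=b: x[a] * z[b]) for b in range(2)]
        for a in range(2)]
lam = solve2(t_xz, gap)
w = [d * (1 + z[0] * lam[0] + z[1] * lam[1]) for d, x, z, y in sample]
y_w = sum(wk * y for wk, (d, x, z, y) in zip(w, sample))

# Instrumental regression: (sum d z x') B = sum d z y
t_zx = [[t_xz[b][a] for b in range(2)] for a in range(2)]
t_zy = [wsum(lambda x, z, y, a=a: z[a] * y) for a in range(2)]
b_hat = solve2(t_zx, t_zy)
y_regi = y_ht + gap[0] * b_hat[0] + gap[1] * b_hat[1]
```

Here `y_w` and `y_regi` agree up to rounding, and the weights `w` reproduce the totals `X`. For the other calibration functions the equivalence holds only asymptotically.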
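Precision
$AV(\hat Y_w) = \sum_{k \in U} \sum_{\ell \in U} \Delta_{k\ell}\,(d_k E_k)(d_\ell E_\ell)$
$E_k = y_k - x_k' B_{zx}$, with $B_{zx}$ such that $\left(\sum_U z_k x_k'\right) B_{zx} = \sum_U z_k y_k$
= residual of the regression of Y on X in U with Z as instrumental variables
$\hat V(\hat Y_w) = \sum_{k \in s} \sum_{\ell \in s} \frac{\Delta_{k\ell}}{\pi_{k\ell}}\,(d_k e_k)(d_\ell e_\ell)$
$e_k = y_k - x_k' \hat B_{szx}$
Note: the instruments are equal to
$\operatorname{grad} F(z_k'\lambda)\big|_{\lambda=0} = F'(0)\, z_k = z_k$ if $F'(0) = 1$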
III.2. Calibration in case of total non-response
(1) Calibration after adjustment for non-response
1.a. Adjustment for non-response
Response probabilities (conditionally on s):
$p_k = P(k \in r \mid k \in s)$
$p_k$ is estimated by means of a response model and an estimation method (LM) ⇒ $\hat p_k$
Expansion estimator:
$\hat Y_{exp} = \sum_{k \in r} d_k \frac{1}{\hat p_k} y_k$
Examples
• Uniform response model: $\hat p_k = r/n$
• Homogeneous response groups: $\hat p_k = r_h / n_h$ if $k \in$ group $h$
• Generalized linear model: $p_k = 1 / H(z_k'\beta)$
$z_k$ = vector of explanatory variables for non-response
Note: to estimate $p_k$, $z_k$ must be known for both respondents AND NON-RESPONDENTS
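As a small illustration of the homogeneous response groups model and the expansion estimator, here is a Python sketch on made-up data. The record layout and the group labels "A" and "B" are hypothetical; the response rates are the unweighted ratios $r_h / n_h$ given above.

```python
from collections import Counter

# Hypothetical sample: (design weight d_k, response group h, responded?, y_k).
# y_k is observed only for respondents (None otherwise).
sample = [
    (10.0, "A", True, 3.0), (10.0, "A", True, 5.0), (10.0, "A", False, None),
    (10.0, "A", True, 4.0), (20.0, "B", True, 8.0), (20.0, "B", False, None),
]

n_h = Counter(h for _, h, _, _ in sample)              # sample size per group
r_h = Counter(h for _, h, resp, _ in sample if resp)   # respondents per group
p_hat = {h: r_h[h] / n_h[h] for h in n_h}              # p_hat_k = r_h / n_h

# Expansion estimator: sum over respondents of d_k * y_k / p_hat_k
y_exp = sum(d * y / p_hat[h] for d, h, resp, y in sample if resp)
```

The group membership must be known for every sampled unit, respondent or not, which is the constraint stated in the note above. A group with no respondent would give a zero response rate and cannot be corrected this way.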
1.b. Calibration
We start from the corrected weights
$d_k^* = d_k / \hat p_k$
Conventional calibration:
$\sum_{k \in r} d_k^* F^*(x_k'\mu)\, x_k = X$
(2) Direct conventional calibration
$\sum_{k \in r} d_k F(x_k'\lambda)\, x_k = X$
(2) is equivalent to (1) with a uniform non-response model.
Comparison between (1) and (2) (Dupont, 1993)
Suppose that:
- N.R. is corrected by a GLM in which H is one of the usual calibration functions F: $p_k = 1 / F(z_k'\beta)$
- the non-response variables $z_k$ are included in the set of calibration variables $x_k$.
Then (1) and (2) are "similar".
(1) and (2) are identical when:
(a) $x_k = z_k$ and $F = F^* = H = \exp$
(b) N.R. is corrected by an HRG model based on a categorical variable X, and the sample is calibrated on the number of units in U for each level of X:
(1) = (2) = formal post-stratification on U
(3) Direct generalized calibration
(E) $\sum_{k \in r} d_k H(z_k'\beta)\, x_k = X$
Interpretation
Response model: $p_k = 1 / H(z_k'\beta_0)$
(E) can be written:
$X = \sum_{k \in r} d_k H(z_k'\hat\beta)\, x_k = \sum_{k \in r} d_k H(z_k'\beta_0) \frac{H(z_k'(\beta_0 + \lambda))}{H(z_k'\beta_0)}\, x_k = \sum_{k \in r} \frac{d_k}{p_k} F(z_k'\lambda)\, x_k$
with $\hat\beta = \beta_0 + \lambda$
So, if the $p_k$ were known:
(E) = generalized calibration equation, with:
$d_k / p_k$ = initial weights
F = calibration function
F is defined as $F(z_k'\lambda) = \frac{H(z_k'(\beta_0 + \lambda))}{H(z_k'\beta_0)}$, which is such that $F(0) = 1$
Instruments: $\operatorname{grad} F(z_k'\lambda)\big|_{\lambda=0} = \frac{H'(z_k'\beta_0)}{H(z_k'\beta_0)}\, z_k = z_k^*$
Precision
• $AV(\hat Y_w)$ uses the residuals in the population
$E_k = y_k - x_k' B_{z^*x}$
where $\sum_U z_k^* (y_k - x_k' B_{z^*x}) = 0$
• $\hat V(\hat Y_w)$ uses the residuals of the instrumental regression in r, weighted by the $d_k H(z_k'\beta_0)$:
$e_k = y_k - x_k' \hat B_{rz^*x0}$
where $\sum_r d_k H(z_k'\beta_0)\, z_k^* (y_k - x_k' \hat B_{rz^*x0}) = 0$
$\hat B_{rz^*x0}$ = estimator of $B_{z^*x}$ if the response probabilities $H^{-1}(z_k'\beta_0)$ were known
Response probabilities are unknown
⇒ "estimate" $\hat B_{rz^*x0}$ and the residuals:
$e_k = y_k - x_k' \hat B_{rz^*x}$
where $\sum_r d_k H(z_k'\hat\beta)\, z_k^* (y_k - x_k' \hat B_{rz^*x}) = 0$
i.e. an instrumental regression weighted by the final weights $w_k = d_k H(z_k'\hat\beta)$
Note: $\hat V(\hat Y_w)$ has the form
$\hat V(\hat Y_w) = Q_1(e_k) + Q_2(e_k)$
$Q_1$ = estimated variance of the 1st phase (selection of the sample s)
$Q_2$ = estimated variance of the 2nd phase ("selection" of the respondents r)
Properties of the method
• allows non-response correction even when the explanatory variables are known only for respondents
• handles the particular situation in which the variables explaining non-response are variables of interest (non-ignorable response mechanism)
• reduces the bias caused by non-response thanks to the variables $z_k$, and reduces the variance thanks to the variables $x_k$
This method is implemented in Calmar 2.
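To make the first property concrete, here is a one-dimensional Python sketch of direct generalized calibration with $H = \exp$, solved by Newton's method. It is an illustration under simplifying assumptions (a single calibration variable $x_k = 1$ and a single instrument $z_k$ observed only for respondents), not the algorithm used inside Calmar 2. The data, the total `X` and the function name `calibrate` are made up.

```python
import math

# Hypothetical respondents: (d_k, z_k), with z_k observed for respondents only.
respondents = [(10.0, 0.0), (10.0, 0.0), (10.0, 1.0),
               (10.0, 1.0), (10.0, 2.0), (10.0, 2.0)]
X = 100.0   # made-up known population size (x_k = 1 for every unit)

def calibrate(respondents, total, tol=1e-12, max_iter=50):
    """Solve sum_r d_k * exp(z_k * beta) = total for beta by Newton's method."""
    beta = 0.0
    for _ in range(max_iter):
        f = sum(d * math.exp(z * beta) for d, z in respondents) - total
        if abs(f) < tol * total:
            break
        slope = sum(d * z * math.exp(z * beta) for d, z in respondents)
        beta -= f / slope
    return beta

beta_hat = calibrate(respondents, X)
# Final weights w_k = d_k H(z_k beta_hat) and implied response
# probabilities 1 / H(z_k beta_hat).
weights = [d * math.exp(z * beta_hat) for d, z in respondents]
p_implied = [math.exp(-z * beta_hat) for _, z in respondents]
```

No value of `z` is needed for non-respondents: the response model parameter is identified through the calibration equation alone. The implied response probabilities decrease with `z` in this toy example because the respondents' design weights sum to less than `X`.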
An example of generalized calibration
The survey
• Sampling frame: population census (1990)
• Sampling design: cluster sampling
– clusters = households
– secondary units = all members of the selected households
• Response model
– H.R.G.
– response variables = household size (alone or not)
+ profession of the head of household (6 levels)
+ stratum (~ agglomeration size)
• Calibration variables (xk)
– Households: the same as before (in the sampling frame)
– Individuals: sex + age group (in the sampling frame)
– Simultaneous calibration at two levels
• Instrumental variables (zk)
– Response variables as they are measured in the survey, that is, in 1996
The population totals data
Constraint: the xk and zk vectors must have the same dimension
• Primary units (households)
var        n  R  mar1  mar2  mar3  mar4  mar5  mar6
strate90   5  0  1314   833   704  1477   777     .
seul90     2  0  3933  1172     .     .     .     .
cs90       6  0   457   470   537   435  1254  1952
strate96   5  1     .     .     .     .     .     .
seul96     2  1     .     .     .     .     .     .
cs96       6  1     .     .     .     .     .     .
• Secondary units (individuals)
var        n  R  mar1  mar2  mar3  mar4
sexe       2  0  6255  6628     .     .
age        4  0  2514  1799  5984  2586
sexe_bis   2  1     .     .     .     .
age_bis    4  1     .     .     .     .
%calmar2_guide
[Screenshots of the CALMAR2_GUIDE interactive screens, not reproduced in this transcript]
Thank you for your attention!