An Active Collection using Intermediate Estimates to Manage Follow-Up of Non-Response and Measurement Errors
Jeannine Claveau, Serge Godbout and Claude Turmelle
Statistics Canada
International Total Survey Error Workshop
Québec, June 20, 2011
Outline
➢ Introduction
➢ Quality Indicators (QI)
➢ Measure of Impact (MI) Scores
➢ Future Work
Statistics Canada • Statistique Canada
18/07/2015
Unified Enterprise Surveys (UES)
➢ UES consists of 58 annual business surveys integrated in terms of content, collection and data processing
➢ Collect information on enterprise financial variables
➢ Collection period: February to early October
➢ Telephone pre-contact used for new units in the sample
➢ Mail questionnaires for initial data gathering
➢ Telephone follow-up conducted to collect data from non-respondents and to resolve failed edits
Unified Enterprise Surveys (UES)
➢ Score function is used to prioritize telephone follow-up for non-response
• Score based on weighted sampling revenue
➢ For most of the UES surveys: no score function used for failed edits follow-up
➢ Collection Processing System: Blaise
➢ Paradata in Blaise Transaction History files
Integrated Business Statistics Program (IBSP)
➢ IBSP is under development to redesign and expand UES to integrate other enterprise surveys and sub-annual surveys
➢ Goal:
• Reduce operating costs
• Enhance quality assurance
➢ IBSP will integrate 120 surveys by 2016 (phase 1: 2014)
➢ Electronic questionnaire (electronic data collection) will be the principal collection mode offered to enterprises
Current UES – Processing Model
[Diagram: Sampling → Collection → Processing → Analysis → Dissemination]
➢ Collection, processing and analysis are run sequentially
➢ Estimates produced at very end only
➢ Collection ends at set date
IBSP – Estimates Model
➢ Collection, processing and analysis will be run in parallel
➢ Estimates will be produced and re-run periodically
➢ Collection could end earlier when pre-specified quality target has been met
[Diagram: Sampling, Collection, Processing, Analysis, Dissemination]
Active Collection
➢ Role: Manage follow-up of non-response and measurement errors (failed edits)
➢ Responsive design (Laflamme and Karaganis, 2010) or dynamic adaptive approach (Schouten, Calinescu and Luiten, 2011) that uses data available during collection to modify the collection strategy
➢ Estimates and quality indicators will be produced periodically throughout collection, e.g. on a monthly basis
➢ Scores measuring impact on estimates and on quality indicators are then calculated to allocate and prioritize telephone follow-up
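The periodic review described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the quality target is taken to be a CV threshold, units are ranked by the absolute value of their score, and the names `plan_follow_up`, `cv_target` and `capacity` are hypothetical, not part of the IBSP design.

```python
def plan_follow_up(cv, cv_target, scores, capacity):
    """One periodic review of an active collection (illustrative only).

    cv        : current CV of a key estimate (assumed quality indicator)
    cv_target : pre-specified quality target (assumed to be a CV threshold)
    scores    : dict mapping unit id -> impact score
    capacity  : number of telephone follow-ups available this period
    Returns (stop_collection, units_to_call).
    """
    if cv <= cv_target:
        # Target met: collection could end early, no follow-up needed.
        return True, []
    # Otherwise call the units with the largest absolute impact first.
    ranked = sorted(scores, key=lambda unit: abs(scores[unit]), reverse=True)
    return False, ranked[:capacity]

# Hypothetical scores for three units.
scores = {"u1": 0.02, "u2": -0.10, "u3": 0.05}
decision_open = plan_follow_up(0.08, 0.05, scores, capacity=2)
decision_closed = plan_follow_up(0.04, 0.05, scores, capacity=2)
print(decision_open)    # (False, ['u2', 'u3'])
print(decision_closed)  # (True, [])
```

In practice the stopping rule would probably combine several quality indicators rather than a single CV.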
Basic Collection Strategy
[Diagram: Production of Intermediate Estimates. An initial sample S passes through successive designs d0, d1, d2, ..., di-2, di-1; each design yields observed non-response and response sets (NR1, R1), (NR2, R2), (NR3, R3), ..., (NRi-1, Ri-1), (NRi, Ri).]
Parameter and Estimator
➢ Variables of interest: Set of I key variables $y_i$
➢ Parameters of interest: $t_{y_i} = \sum_{k \in U} y_{ik}$
➢ Stratified expansion estimators: $\hat{t}_{y_i} = \sum_{h=1}^{H} \left( p_h^{-1} \sum_{k=1}^{n_h} y_{ik} \right)$
➢ Sampling variances (under a stratified Bernoulli design): $\hat{V}(\hat{t}_{y_i}) = \sum_{h=1}^{H} \left( p_h^{-1} (p_h^{-1} - 1) \sum_{k=1}^{n_h} y_{ik}^2 \right)$
Where i, k and h identify respectively the I variables, the $N_h$ units and the H strata
$N_h$ = stratum population size
$p_h$ = unit sampling probability within stratum
$n_h$ = the stratum sample size
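As a minimal sketch of the expansion estimator and its Bernoulli variance estimator, the following computes both for one variable of interest. The data values and the function name `expansion_estimate` are hypothetical; complete response is assumed.

```python
import math

def expansion_estimate(sample):
    """Return (total, variance) for one variable of interest.

    sample: iterable of (p_h, y_ik) pairs, one per sampled unit, where p_h
    is the sampling probability of the unit's stratum.
    """
    total = 0.0
    variance = 0.0
    for p_h, y in sample:
        w = 1.0 / p_h                      # design weight p_h^{-1}
        total += w * y                     # expansion estimator term
        variance += w * (w - 1.0) * y * y  # stratified Bernoulli variance term
    return total, variance

# Hypothetical data: two units in a stratum with p_h = 0.5, one with p_h = 0.25.
sample = [(0.5, 10.0), (0.5, 20.0), (0.25, 4.0)]
t_hat, v_hat = expansion_estimate(sample)
cv = math.sqrt(v_hat) / t_hat
print(t_hat, v_hat, round(cv, 3))  # 76.0 1192.0 0.454
```

The CV derived here is the kind of item-based indicator that could be recomputed each time intermediate estimates are produced.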
Non-Response
➢ Response propensity model:
• Auxiliary data and paradata would be used to estimate response propensities
➢ Estimation:
• In case of non-response, we will either use imputation or reweighting to account for missing data
• Response propensities could be used to form homogeneous imputation or reweighting classes to reduce the non-response bias (Haziza and Beaumont, 2007)
➢ Stratified expansion estimators: $\hat{t}_{y_i} = \sum_{c=1}^{C} \left[ \left( \sum_{k \in R_c} p_k^{-1} y_{ik} \right) \left( \sum_{k \in S_c} p_k^{-1} \right) \left( \sum_{k \in R_c} p_k^{-1} \right)^{-1} \right]$
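The reweighted estimator can be illustrated with a short sketch. It assumes that, within each class c, the sums run over the sampled units and the respondents of that class; the dictionary keys, the function name `reweighted_total` and the data are hypothetical.

```python
def reweighted_total(units):
    """Expansion estimator reweighted within classes (illustrative sketch).

    units: list of dicts with keys
      'cls'       : class label c
      'p'         : sampling probability p_k
      'responded' : True if the unit responded
      'y'         : reported value (ignored for non-respondents)
    """
    classes = {}
    for u in units:
        acc = classes.setdefault(u["cls"], {"wy_r": 0.0, "w_s": 0.0, "w_r": 0.0})
        w = 1.0 / u["p"]
        acc["w_s"] += w                 # sum of weights over the sample in class c
        if u["responded"]:
            acc["w_r"] += w             # sum of weights over respondents in class c
            acc["wy_r"] += w * u["y"]   # weighted respondent total in class c
    # Classes without respondents are skipped here; in practice they would
    # have to be collapsed with another class.
    return sum(a["wy_r"] * a["w_s"] / a["w_r"]
               for a in classes.values() if a["w_r"] > 0)

units = [
    {"cls": 1, "p": 0.5, "responded": True, "y": 10.0},
    {"cls": 1, "p": 0.5, "responded": True, "y": 20.0},
    {"cls": 1, "p": 0.5, "responded": False, "y": None},
    {"cls": 2, "p": 0.25, "responded": True, "y": 4.0},
]
total = reweighted_total(units)
print(total)  # 106.0
```

Class 1 contributes 60 × 6 / 4 = 90 and class 2 contributes 16, which gives the total of 106.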
Quality Indicators (QI)
➢ Role:
• Monitor collection progress
• Help to allocate and prioritize collection efforts
➢ Can be item-based
• Specific to a variable of interest
• Variance, CV
• Item response rate of a variable of interest
• Bias: $\hat{B}(\hat{t}_{y_i} \mid \hat{\rho}_X) = N \, \widehat{\mathrm{Cov}}(y_i, \hat{\rho}_X) / \hat{\bar{\rho}}_X$
• MSE: $\widehat{\mathrm{MSE}}(\hat{t}_{y_i} \mid \hat{\rho}_X) = \hat{B}^2(\hat{t}_{y_i} \mid \hat{\rho}_X) + \hat{V}(\hat{t}_{y_i} \mid \hat{\rho}_X)$
Quality Indicators (QI)
➢ Can be covariate-based
• Derived from statistics on the estimated response propensities given the covariates X
• Independent from the variables of interest
➢ Examples of covariate-based QIs (Schouten, 2011):
• Mean response propensity: $\hat{\bar{\rho}}$
• R-indicator: $\hat{R}(\hat{\rho}) = 1 - 2 \hat{S}(\hat{\rho})$
• Standardized Maximal Bias: $\hat{B}_m(\hat{\rho}_X) = N \hat{S}(\hat{\rho}_X) / \hat{\bar{\rho}}_X$
• Standardized Maximal Variance: $\hat{V}_m(\hat{\rho}_X) = N^2 n^{-1} \hat{\bar{\rho}}_X^{-1}$
• Standardized Maximal MSE: $\widehat{\mathrm{MSE}}_m(\hat{\rho}_X) = \hat{B}_m^2(\hat{\rho}_X) + \hat{V}_m(\hat{\rho}_X)$
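The covariate-based indicators listed above can be computed directly from a vector of estimated response propensities. The sketch below makes simplifying assumptions that are not stated on the slide: all units carry equal weight, and the standard deviation uses an n - 1 divisor. The function name `covariate_qis` and the data are hypothetical.

```python
import math

def covariate_qis(propensities, N):
    """Covariate-based quality indicators from estimated response propensities.

    propensities : estimated response propensities of the n sampled units
    N            : population size
    Assumes equal weights and an (n - 1) divisor for the standard deviation.
    """
    n = len(propensities)
    mean = sum(propensities) / n
    s = math.sqrt(sum((r - mean) ** 2 for r in propensities) / (n - 1))
    max_bias = N * s / mean            # standardized maximal bias
    max_var = N ** 2 / (n * mean)      # standardized maximal variance
    return {
        "mean_propensity": mean,
        "r_indicator": 1.0 - 2.0 * s,
        "max_bias": max_bias,
        "max_variance": max_var,
        "max_mse": max_bias ** 2 + max_var,
    }

qis = covariate_qis([0.2, 0.4, 0.6, 0.8], N=100)
for name, value in qis.items():
    print(name, round(value, 3))
```

A higher mean propensity lowers the maximal variance, and more homogeneous propensities raise the R-indicator and lower the maximal bias.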
Measure of Impact (MI) Scores
➢ Types of Scores
• Common types: Edit-related and estimate-related score functions
• Example: Predicted difference in estimates (Hedlin, 2008): $\hat{\delta}_{ik} = w_k (\tilde{y}_{ik} - y_{ik}) / \hat{t}_{y_i}$
➢ Proposal: Generalize the MI Score to include quality-related score functions
• For an estimated parameter $\hat{\theta}$ (estimate or quality indicator)
➢ Definition: $MI_k(\hat{\theta}) = (\tilde{\theta}_k - \hat{\theta}) / \lambda$
• Where $\tilde{\theta}_k$ is the estimated parameter after changing reported values and/or covariates of unit k respectively to $\tilde{y}_k$ and/or $\tilde{x}_k$, and $\lambda$ is a scaling factor
Measure of Impact (MI) Scores
➢ MI Score for an estimated total: $MI_k(\hat{t}_{y_i}) = (\tilde{t}_{y_i,k} - \hat{t}_{y_i}) / \lambda_i = p_h^{-1} (\tilde{y}_{ik} - y_{ik}) / \lambda_i$
• Requires predicted values to compare to reported values
• Proposal: Use imputation to obtain predicted values
• Used to prioritize units for failed edit follow-up
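A minimal sketch of the MI score for an estimated total, used to rank units that failed edits. Two choices here are assumptions rather than part of the proposal: the scaling factor is set to the current estimated total, and units are ranked by the absolute value of the score. All names and data are hypothetical.

```python
def mi_total(p_h, y_reported, y_predicted, scale):
    """Change in the estimated total if unit k's reported value were
    replaced by its predicted (e.g. imputed) value, divided by a scaling factor."""
    return (y_predicted - y_reported) / (p_h * scale)

# Hypothetical units that failed edits: (sampling probability, reported, predicted).
failed_edits = {
    "u1": (0.5, 100.0, 60.0),
    "u2": (0.1, 10.0, 12.0),
    "u3": (1.0, 500.0, 450.0),
}
t_hat = 1000.0  # assumed current estimate, used as the scaling factor

scores = {unit: mi_total(p, y, y_pred, t_hat)
          for unit, (p, y, y_pred) in failed_edits.items()}
priority = sorted(scores, key=lambda unit: abs(scores[unit]), reverse=True)
print(priority)  # ['u1', 'u3', 'u2']
```

Unit u2 has the largest weight but the smallest discrepancy, so it ends up last in the follow-up queue.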
Measure of Impact (MI) Scores
➢ MI Score for item-based quality indicators
• MI Score for estimated sampling variance for expansion estimators: $MI_k(\hat{V}(\hat{t}_{y_i})) = (\tilde{V}(\tilde{t}_{y_i,k}) - \hat{V}(\hat{t}_{y_i})) / \lambda_i = p_h^{-1} (p_h^{-1} - 1)(\tilde{y}_{ik}^2 - y_{ik}^2) / \lambda_i$
• Specific to a variable of interest
• Also use imputation to obtain predicted values
• Linked directly to quality of output estimates
• Prioritize units for failed edit follow-up
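The closed form of the variance score can be checked against a brute-force recomputation of the variance estimator. This sketch assumes the stratified Bernoulli variance estimator for expansion estimators and uses the current variance estimate as the scaling factor, which is an assumption; names and data are hypothetical.

```python
def variance_estimate(sample):
    """Stratified Bernoulli variance estimator; sample is a list of (p_h, y)."""
    return sum((1.0 / p) * (1.0 / p - 1.0) * y * y for p, y in sample)

def mi_variance(p_h, y_reported, y_predicted, scale):
    """Closed-form MI score for the estimated sampling variance."""
    w = 1.0 / p_h
    return w * (w - 1.0) * (y_predicted ** 2 - y_reported ** 2) / scale

sample = [(0.5, 10.0), (0.5, 20.0), (0.25, 4.0)]
v_hat = variance_estimate(sample)

# Replace the reported value of the second unit (20.0) by a predicted value (15.0).
changed = list(sample)
changed[1] = (0.5, 15.0)
brute_force = (variance_estimate(changed) - v_hat) / v_hat
closed_form = mi_variance(0.5, 20.0, 15.0, scale=v_hat)
print(round(brute_force, 4), round(closed_form, 4))  # -0.2936 -0.2936
```

Because only unit k's term changes, the score needs no recomputation of the full variance, which keeps the calculation cheap when it is repeated for every unit.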
Measure of Impact (MI) Scores
➢ MI Score for item-based quality indicator: $MI_k = (\widetilde{\mathrm{MSE}}_k - \widehat{\mathrm{MSE}}) / \lambda$, with $\widehat{\mathrm{MSE}}(\hat{t}_{y_i} \mid \hat{\rho}_X) = \hat{B}^2(\hat{t}_{y_i} \mid \hat{\rho}_X) + \hat{V}(\hat{t}_{y_i} \mid \hat{\rho}_X)$
➢ MI Score for covariate-based quality indicator: $MI_k = (\widetilde{\mathrm{MSE}}_k - \widehat{\mathrm{MSE}}) / \lambda$, with $\widehat{\mathrm{MSE}}_m(\hat{\rho}_X) = \hat{B}_m^2(\hat{\rho}_X) + \hat{V}_m(\hat{\rho}_X)$
➢ Used to prioritize units for both non-response and failed edit follow-up
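For the covariate-based indicator, the score can be sketched by recomputing the standardized maximal MSE after changing one unit's response propensity, for example to the value expected after a follow-up call. The assumptions are the same simplifications as before (equal weights, n - 1 divisor) plus a scaling factor equal to the current MSE; the function names and data are hypothetical.

```python
import math

def max_mse(propensities, N):
    """Standardized maximal MSE: squared maximal bias plus maximal variance."""
    n = len(propensities)
    mean = sum(propensities) / n
    s = math.sqrt(sum((r - mean) ** 2 for r in propensities) / (n - 1))
    return (N * s / mean) ** 2 + N ** 2 / (n * mean)

def mi_max_mse(propensities, k, new_propensity, N):
    """MI score of unit k when its propensity is changed to new_propensity."""
    current = max_mse(propensities, N)
    changed = list(propensities)
    changed[k] = new_propensity
    return (max_mse(changed, N) - current) / current  # scaled by current MSE

rho = [0.2, 0.4, 0.6, 0.8]
mi_low = mi_max_mse(rho, 0, 0.5, N=100)    # raise the lowest propensity
mi_high = mi_max_mse(rho, 3, 0.95, N=100)  # raise the highest propensity
print(round(mi_low, 3), round(mi_high, 3))
```

With these numbers, following up the low-propensity unit lowers the maximal MSE (negative score), while raising an already high propensity increases it, because the propensities become less homogeneous.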
Active Collection Management
➢ A large number of variables to monitor
• Monitoring all of them will be a challenge
• Not all equally important
➢ Identify a limited number of key variables
➢ For each key variable
• Quality monitored using item-based QIs and MI Scores
➢ For the non-key variables
• Quality controlled using covariate-based QIs
Active Collection Management
➢ MI scores for each estimated parameter and quality indicator are considered local scores
➢ In order to prioritize units for telephone follow-up, a global score per unit is needed
➢ Derive global MI Score (Hedlin, 2008)
• Sum, maximum or Euclidean distance could be used
• Some QIs are appropriate for evaluating the impact of non-response and others for the impact of edit failures
• Derive one global score for non-response follow-up and one global score for failed edit follow-up
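The three ways of combining local scores into a global score can be sketched as follows. Taking absolute values before summing or taking the maximum is an assumption made here so that positive and negative impacts do not cancel; the function name and data are hypothetical.

```python
import math

def global_scores(local_scores):
    """Combine a unit's local MI scores into candidate global scores."""
    magnitudes = [abs(s) for s in local_scores]
    return {
        "sum": sum(magnitudes),
        "max": max(magnitudes),
        "euclidean": math.sqrt(sum(s * s for s in local_scores)),
    }

# Hypothetical local scores of one unit for three estimated parameters.
combined = global_scores([0.3, -0.4, 0.0])
print({name: round(value, 3) for name, value in combined.items()})
# {'sum': 0.7, 'max': 0.4, 'euclidean': 0.5}
```

The maximum flags a unit as soon as one parameter is strongly affected, whereas the sum and the Euclidean distance reward moderate impacts spread over several parameters.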
Control Quality with Covariate-Based QIs
• Goal: Increase the average of the response propensities while improving their homogeneity.
[Figure, shown in three builds: estimated response propensities $\hat{\rho}_k$ plotted by unit k, with their mean $\hat{\bar{\rho}}$ and spread $\hat{S}(\hat{\rho})$ marked. The second build contrasts $\hat{\rho}_k$ with modified values $\tilde{\rho}_k$ for units 4, 6, 7, 15 and 17; the third keeps only units 4 and 6.]
Summary: Current Approach vs. Proposed Approach
➢ Follow-up and editing
• Current: A score function with no link with estimates; prioritization based on frame (static) information
• Proposed: Follow-up and editing for influential units based on estimates and quality; prioritization based on frame, paradata (dynamic) and estimates
➢ Processing
• Current: Results (and quality measures) are known only at the end of the process
• Proposed: Produce results (and quality measures) during collection to manage collection
➢ Cut-off collection
• Current: Based on weighted response rate
• Proposed: Based on achieved quality of estimates
Summary: Quality Indicators (QI) vs. Measure of Impact (MI) Scores
➢ Quality Indicators (QI)
• Quality (accuracy) specific to a domain and an estimate
• Monitor collection and analysis progress
• Proactively identify problems
• Assess quality of produced estimates
• Close active collection
➢ Measure of Impact (MI) Scores
• Impact of a unit on an estimate or on a quality indicator
• Allocate and prioritize collection and analysis efforts
• Non-response and failed-edit follow-up
Summary: Covariate-based QIs vs. Item-based QIs
➢ Covariate-based QIs
• Independent of survey variables
• Used for all variables
• Mean response propensity
• R-indicator
• Standardized Maximal Bias, Variance and MSE
• Other…
➢ Item-based QIs
• Related to survey variables
• Used with MI Scores to monitor specified key variables
• Item response rate
• Variance, CV
• Bias, MSE
• Other…
Summary: MI Scores by Type of Follow-Up
➢ Non-response follow-up MI Scores (one global score)
• Item response rate
• Mean response propensity
• R-indicator
• Variance, CV
• Item-based MSE
• Standardized Maximal Bias, Variance and MSE
➢ Failed edit follow-up MI Scores (one global score)
• Estimated total
• Estimated sampling variance
• Variance, CV
• Item-based MSE
• Standardized Maximal Bias, Variance and MSE
Future Work
➢ Methodology development
• Response propensity model: development of a model based on data and paradata
• Item-based and covariate-based QIs
➢ Validation of the proposed strategy
• Conduct simulation studies and develop prototypes using current UES environment
• Summer 2011 prototype: response rates, imputation rate, CV and MI scores
• Next prototype: Other local and global MI scores and QIs
Discussion
➢ What quality indicators are appropriate to measure the risks of potential bias in the estimates?
➢ What is the best way to use a quality indicator (e.g. the R-indicator) to monitor collection of highly skewed business surveys?
➢ The proposed approach obviously affects the response propensities throughout collection. Although we can adjust the estimator later on to take this into account, is it something we should move away from? Or should we take advantage of it?
➢ In the proposed approach, are there any additional aspects that should be considered?
Thank You
➢ For more information, please contact:
Jeannine Claveau [email protected]
Serge Godbout [email protected]
Claude Turmelle [email protected]