Evaluating the Quality of Editing and Imputation: the Simulation Approach
M. Di Zio, U. Guarnera, O. Luzi, A. Manzari
ISTAT – Italian Statistical Institute
UN/ECE Work Session on Statistical Data Editing, Ottawa, 16-18 May 2005
Outline
• Introduction
• The simulation approach
• Performance indicators
• An example: the Istat software ESSE
Quality of E&I =
Accuracy
accuracy at micro level
Capability of editing to correctly identify errors, and capability of imputation to correctly recover the true data
accuracy at macro level
Capability of editing/imputation to preserve the data distributions and target estimates
The quality of E&I in terms of accuracy can be measured only when the edited and imputed data can be compared with the corresponding true data
Why evaluate the quality of E&I
• Analyse the performance of an editing/imputation method for a specific type of data/error under different data/error scenarios
• Improve the performance of an editing/imputation method for a specific type of data/error
• Choose among alternative editing/imputation methods for a specific type of data/error
The evaluation framework
“E&I represent additional sources of non sampling errors in the statistical production process”
True values (super-population / finite population)
  ↓ Error/missing mechanisms
Observed (corrupted) values
  ↓ Editing model
Localized errors
  ↓ Imputation model
Final values
Evaluating the quality of E&I
The quality of editing and/or imputation has to be evaluated taking into account the other mechanisms involved in the statistical production process.
This corresponds to measuring the effects on the data induced by the editing and/or imputation mechanisms, conditionally on the other mechanisms influencing the survey results.
The simulation approach
Artificial generation of some of the key elements of the evaluation framework based on predefined mechanisms/models
• Controlled experiments on:
– data distributions and data relations
– error and missing data mechanisms
– error and missing data incidence
• Variability due to each stochastic mechanism (repeated simulations)
• Low cost
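The idea of repeated simulations can be illustrated with a minimal Python sketch. It is not taken from the presentation: the function names, the MCAR deletion step, and mean imputation as the method under test are all assumptions chosen for brevity. The true data are held fixed, the missing data mechanism is re-drawn in each replication, and the bias and spread of the final estimate of the mean are recorded.

```python
import random
import statistics


def simulate_once(true_values, missing_rate, rng):
    """One replication: delete values completely at random, then impute the respondent mean."""
    observed = [v if rng.random() >= missing_rate else None for v in true_values]
    respondents = [v for v in observed if v is not None]
    # Hypothetical fallback if every value happens to be deleted.
    fill = statistics.fmean(respondents) if respondents else 0.0
    final = [fill if v is None else v for v in observed]
    return statistics.fmean(final)


def run_experiment(true_values, missing_rate, n_replications, seed=0):
    """Repeat the deletion + imputation step and summarise the estimates of the mean."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(true_values)
    estimates = [simulate_once(true_values, missing_rate, rng)
                 for _ in range(n_replications)]
    bias = statistics.fmean(estimates) - true_mean   # average deviation from the truth
    spread = statistics.stdev(estimates)             # variability across replications
    return bias, spread


# Illustrative "true" data drawn from an assumed lognormal model.
data_rng = random.Random(42)
true_values = [data_rng.lognormvariate(3.0, 0.5) for _ in range(500)]
bias, spread = run_experiment(true_values, missing_rate=0.2, n_replications=200)
print(f"bias={bias:.4f} sd={spread:.4f}")
```

Because the true values are known by construction, the spread across replications isolates the variability due to the simulated mechanism alone.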
The simulation approach
• High modelling effort:
– true data
– raw data
Simulation of true data
Let $(X_1, \ldots, X_p)$ be a random variable following the probability function $F(x_1, \ldots, x_p; \theta)$.
When $F(x_1, \ldots, x_p; \theta)$ is unknown:
• parametric approaches (specify a data model; estimate parameters; re-sampling techniques)
• non parametric approaches (no assumptions; re-sampling techniques)
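The two routes can be contrasted in a short Python sketch. Everything here is illustrative: the lognormal model, the reference records, and the function names are assumptions, not part of the presentation. The parametric route fits a model and draws from it; the non parametric route resamples whole records with replacement, which keeps the observed joint relations between variables.

```python
import math
import random
import statistics


def simulate_parametric(sample, n, rng):
    """Assume a lognormal data model: estimate its parameters on the log scale, then draw."""
    logs = [math.log(v) for v in sample]
    mu = statistics.fmean(logs)
    sigma = statistics.stdev(logs)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def simulate_nonparametric(records, n, rng):
    """No model: resample whole records with replacement (a simple bootstrap)."""
    return [rng.choice(records) for _ in range(n)]


rng = random.Random(2005)
# Hypothetical reference records: (turnover, costs).
reference = [(120.0, 90.0), (300.0, 210.0), (80.0, 75.0), (950.0, 600.0)]
turnover_only = [rec[0] for rec in reference]

synthetic_turnover = simulate_parametric(turnover_only, 1000, rng)
synthetic_records = simulate_nonparametric(reference, 1000, rng)
print(len(synthetic_turnover), len(synthetic_records))
```

The parametric draw shown here is univariate; extending it to several variables requires a joint model.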
Simulation of true data
Additional problems:
• Modelling multivariate distributions (reproducing joint relations/dependencies between variables)
• Modelling asymmetric multivariate distributions
• Modelling under edit constraints
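One simple way to respect edit constraints when generating true data is rejection sampling: draw candidate records from the data model and keep only those that pass every edit. The Python sketch below assumes a hypothetical two-variable model and three made-up edits; it is one possible technique, not the method used by the authors.

```python
import random

# Hypothetical edit rules: each returns True when the record is acceptable.
EDITS = [
    lambda r: r["turnover"] >= 0,
    lambda r: r["costs"] >= 0,
    lambda r: r["costs"] <= 1.5 * r["turnover"],
]


def passes_edits(record):
    """A record is consistent when it satisfies every edit."""
    return all(edit(record) for edit in EDITS)


def draw_candidate(rng):
    """Assumed data model: skewed turnover, costs dependent on turnover."""
    turnover = rng.lognormvariate(5.0, 1.0)
    costs = turnover * rng.gauss(0.8, 0.4)
    return {"turnover": turnover, "costs": costs}


def simulate_under_edits(n, rng, max_tries=100_000):
    """Rejection sampling: keep drawing until n records pass all edits."""
    accepted = []
    tries = 0
    while len(accepted) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("edits reject too many candidates")
        candidate = draw_candidate(rng)
        if passes_edits(candidate):
            accepted.append(candidate)
    return accepted


simulated = simulate_under_edits(500, random.Random(3))
print(len(simulated))
```

Rejection truncates the model distribution to the acceptance region, so the accepted records no longer follow the original model exactly.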
Simulation of raw data
Parametric/non parametric approaches:
Generating missing data
Generating errors (deviations from true data)
Simulation of missing data
• Assumptions on non response mechanisms (MCAR, MAR, NMAR)
• Assumptions on the incidence of non response (non response rates)
• In multivariate contexts, modelling patterns of non response:
– assumptions on multivariate non response mechanisms (e.g. independence)
– assumptions on rates of non response patterns
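The difference between MCAR and MAR can be made concrete with a small Python sketch. The variable names, the logistic form of the MAR mechanism, and its coefficients are assumptions for illustration. Under MCAR every value has the same probability of being deleted; under MAR that probability depends only on a fully observed covariate.

```python
import math
import random


def make_mcar(records, var, rate, rng):
    """MCAR: each value of `var` is deleted with the same probability `rate`."""
    out = []
    for rec in records:
        rec = dict(rec)  # copy, so the true data stay untouched
        if rng.random() < rate:
            rec[var] = None
        out.append(rec)
    return out


def make_mar(records, var, covariate, rng, intercept=-2.0, slope=1.0):
    """MAR: the deletion probability depends on an observed covariate (assumed logistic form)."""
    out = []
    for rec in records:
        rec = dict(rec)
        p_missing = 1.0 / (1.0 + math.exp(-(intercept + slope * rec[covariate])))
        if rng.random() < p_missing:
            rec[var] = None
        out.append(rec)
    return out


def missing_rate(records, var):
    return sum(r[var] is None for r in records) / len(records)


# Hypothetical true data: size_class is always observed, income may go missing.
records = [{"size_class": 5 * (i % 2), "income": 100.0 + i} for i in range(2000)]
mcar_data = make_mcar(records, "income", 0.2, random.Random(7))
mar_data = make_mar(records, "income", "size_class", random.Random(7))

small = [r for r in mar_data if r["size_class"] == 0]
large = [r for r in mar_data if r["size_class"] == 5]
print(missing_rate(mcar_data, "income"),
      missing_rate(small, "income"), missing_rate(large, "income"))
```

An NMAR mechanism could be sketched the same way by letting the probability depend on the value of `income` itself.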
Simulation of errors
• Assumptions on error mechanism (EAR, ECAR, ENAR)
• Assumptions on the incidence of errors (error rates)
• Assumptions on the intensity of errors (error magnitude; intermittent nature of errors)
• In a multivariate context, modelling error patterns:
– assumptions on multivariate error mechanisms (e.g. independence)
– assumptions on rates of error patterns
• Overlapping mechanisms (e.g. stochastic + systematic)
• Simulation of errors under constraints
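A minimal Python sketch of how error incidence, error intensity, and overlapping mechanisms might be combined is given below. The two error models (multiplicative random noise, and a unit-of-measure error by a factor of 1000), the rates, and all names are assumptions made for illustration. Both mechanisms are completely at random and independent of each other.

```python
import random


def add_random_errors(values, error_rate, rel_sd, rng):
    """Stochastic errors: with probability error_rate, perturb a value by relative noise."""
    raw, flags = [], []
    for v in values:
        hit = rng.random() < error_rate
        raw.append(v * (1.0 + rng.gauss(0.0, rel_sd)) if hit else v)
        flags.append(hit)
    return raw, flags


def add_unit_errors(values, error_rate, rng, factor=1000):
    """Systematic errors: with probability error_rate, multiply a value by a fixed factor."""
    raw, flags = [], []
    for v in values:
        hit = rng.random() < error_rate
        raw.append(v * factor if hit else v)
        flags.append(hit)
    return raw, flags


rng = random.Random(11)
true_values = [float(v) for v in range(1, 201)]

# Overlapping mechanisms: random errors first, systematic errors on top.
step1, random_flags = add_random_errors(true_values, 0.10, 0.25, rng)
raw_values, unit_flags = add_unit_errors(step1, 0.02, rng)

# A value is in error when at least one mechanism touched it.
in_error = [a or b for a, b in zip(random_flags, unit_flags)]
print(sum(in_error), "of", len(true_values), "values corrupted")
```

Here `error_rate` controls the incidence and `rel_sd` or `factor` controls the intensity; keeping the flags makes the true error locations available for later evaluation.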
How to measure: evaluation indicators under the simulation approach
Evaluation objectives
• Accuracy at micro level
• Accuracy w.r.t. distributions and target estimates
Indicators
• Level (micro/macro; local/global)
• Identification
• Priority
An Istat tool for evaluating E&I under the simulation approach
ESSE (Editing Systems Standard Evaluation) system (SAS language + SAS/AF environment)
Module for raw data simulation
Module for evaluation
Module for raw data simulation
Approach: non parametric
Missing data mechanisms: MCAR, MAR and independent non responses
Error mechanisms: Completely At Random (ECAR) and independent errors (e.g. misplacement errors, interchange of values, interchange errors, loss or addition of zeroes, …)
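ESSE itself is written in SAS and its code is not shown here. The Python sketch below only illustrates how error types of this kind could be generated completely at random; the function names, the variable names, and the exact definition of each error type are assumptions. Values are assumed to be non-negative integers.

```python
import random


def interchange_values(record, var_a, var_b):
    """Interchange of values: swap the contents of two variables in a record."""
    out = dict(record)
    out[var_a], out[var_b] = record[var_b], record[var_a]
    return out


def swap_adjacent_digits(value, rng):
    """Swap two adjacent digits of a non-negative integer (unchanged if single-digit)."""
    digits = list(str(value))
    if len(digits) < 2:
        return value
    i = rng.randrange(len(digits) - 1)
    digits[i], digits[i + 1] = digits[i + 1], digits[i]
    return int("".join(digits))


def add_or_drop_zero(value, rng):
    """Loss or addition of a zero: drop a trailing zero half the time if present, else append one."""
    if value % 10 == 0 and value != 0 and rng.random() < 0.5:
        return value // 10
    return value * 10


def corrupt(records, variables, error_rate, rng):
    """ECAR: each record is hit independently, with one error type chosen at random."""
    out = []
    for rec in records:
        rec = dict(rec)
        if rng.random() < error_rate:
            kind = rng.choice(["interchange", "digits", "zeroes"])
            if kind == "interchange":
                var_a, var_b = rng.sample(variables, 2)
                rec = interchange_values(rec, var_a, var_b)
            else:
                var = rng.choice(variables)
                fn = swap_adjacent_digits if kind == "digits" else add_or_drop_zero
                rec[var] = fn(rec[var], rng)
        out.append(rec)
    return out


true_records = [{"employees": 12 + i, "turnover": 3400 + 10 * i} for i in range(100)]
raw_records = corrupt(true_records, ["employees", "turnover"], 0.1, random.Random(5))
print(sum(t != r for t, r in zip(true_records, raw_records)), "records changed")
```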
Module for evaluation
Assumptions
Editing is a classification procedure that assigns each raw value to one of two states:
– (1) acceptable
– (2) not acceptable
Imputation affects only values previously classified by the editing process as unacceptable.
Imputation is successful if the new assigned value is equal to the original one
Module for evaluation
Evaluation objective: assessing the accuracy of E&I at micro level (capability to detect as many errors as possible; capability to restore the true values)
Evaluation approach: single application of E&I (no variability)
Evaluation level: micro level
Indicators: local indicators (hit rates) based on the number of detected, undetected, introduced and corrected errors
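The counts behind such hit rates can be sketched in Python as follows. The definitions used here are assumptions consistent with the slide's wording, not ESSE's actual formulas: an error is a raw value that differs from the true one; it is detected if editing flags it and corrected if the final value equals the true one; an error is introduced when a correct raw value ends up with a wrong final value.

```python
def evaluate(true, raw, flagged, final):
    """Count detected, undetected, introduced and corrected errors, value by value."""
    counts = {"errors": 0, "correct": 0, "detected": 0,
              "undetected": 0, "introduced": 0, "corrected": 0}
    for t, r, is_flagged, z in zip(true, raw, flagged, final):
        if r != t:                       # the raw value is in error
            counts["errors"] += 1
            counts["detected" if is_flagged else "undetected"] += 1
            if z == t:                   # imputation restored the true value
                counts["corrected"] += 1
        else:                            # the raw value is correct
            counts["correct"] += 1
            if z != t:                   # E&I changed it to a wrong value
                counts["introduced"] += 1
    return counts


def hit_rates(counts):
    """Turn counts into rates; None when the denominator is zero."""
    def ratio(num, den):
        return num / den if den else None
    return {
        "detection_rate": ratio(counts["detected"], counts["errors"]),
        "correction_rate": ratio(counts["corrected"], counts["errors"]),
        "introduction_rate": ratio(counts["introduced"], counts["correct"]),
    }


# Tiny worked example with five values of one variable.
true_vals = [10, 20, 30, 40, 50]
raw_vals = [10, 25, 30, 44, 5]                # three errors: positions 1, 3, 4
flags = [False, True, True, False, True]      # editing outcome
final_vals = [10, 20, 31, 44, 50]             # after imputation

counts = evaluate(true_vals, raw_vals, flags, final_vals)
rates = hit_rates(counts)
print(counts)
print(rates)
```

In this example two of the three errors are detected and corrected, one goes undetected, and one correct value is wrongly flagged and changed.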
Future work at ISTAT
• Identify standard measures to assess the accuracy of E&I at macro level
• Simulate multivariate patterns of errors/missing values (dependent errors/non response)
• Evaluate the impact of E&I on variability at micro/macro level