UNECE Work Session on Statistical Data Editing Vienna April 2008 Topic ii – Editing Administrative Data and Combined Sources.

UNECE Work Session on
Statistical Data Editing
Vienna
April 2008
Topic ii – Editing Administrative Data and
Combined Sources
Introduction
• Statistical Agencies rely on administrative data to
improve the quality of statistics, reduce costs and
response burden
• Administrative data are not originally designed for
use as statistical data and need to undergo
extensive processing and editing
• In recent years, more emphasis has been placed on the
use of tax data to augment business statistics and register
data to augment social and economic data
Introduction
• Combining multiple sources of data presents new
challenges: ensuring quality in line with statistical
standards and coherence across different sources.
• Papers cover:
– Effective methods for adjusting administrative data to
statistical use
– Improving the usability of business and population
registers
– Construction of quality statistical databases using
effective E&I strategies which ensure correct coverage,
consistent and clean records
– Enhancing the quality and efficiency of estimates from
surveys
Introduction
• Papers:
– WP.8 Italy: The editing process in the Italian short-term
survey on Labour Cost based on administrative data
– WP.9 New Zealand: E&I of administrative data
used for producing business statistics
– WP.10 Norway: Role of edit and imputation in
integration of sources for structural business statistics
– WP.11 Norway: Prediction and imputation in ISEE:
tools for more efficient use of combined data sources
Introduction
• Papers:
– WP.12 Austria: Quality of administrative data – a
challenge for the maintenance of the statistical business
register
– WP.13 France: The future system of French
structural business statistics: the role of the estimates
– WP.14 Italy: Combining survey and administrative
data in the Italian EU-SILC experience: positive and
critical aspects
– WP.15 Netherlands: Editing Strategies for VAT Data
Presentations
The editing process in the Italian short-term survey
on labour cost based on administrative data
• M. Carla Congia, Silvia Pacini and Donatella Tuzi –
Italian National Statistical Institute (Istat)
• Steps in process:
– Preliminary checks on administrative data and retrieval
of statistical variables
– Micro data editing (cross sectional and longitudinal
checks)
– Imputation of eligible unit non-responses
– Large enterprise checks and combination with survey
data
– Macro editing based on time series analysis
The editing process in the Italian short-term survey
on labour cost based on administrative data
• Interesting points
– Integration of administrative and survey data and
identifying errors
– Combining many processes in an integrated setting
– Recognition of the importance of metadata for
administrative data – changes in concepts and definitions
– Macro editing using time series methods for automated
detection of outliers
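As an illustration of the macro editing step, the sketch below flags unusual movements in an aggregate series. The slides do not say which time series method the paper uses, so this is a generic robust rule and not the authors' procedure: it looks at period-on-period changes and scores them with the median and the median absolute deviation (MAD). The function name, the 0.6745 scaling constant and the 3.5 threshold are assumptions following a common convention for modified z-scores.

```python
from statistics import median


def flag_outliers(series, threshold=3.5):
    """Return positions whose change from the previous period is extreme.

    Hypothetical helper: scores each period-on-period change with a
    modified z-score based on the median and the MAD of all changes.
    """
    changes = [b - a for a, b in zip(series, series[1:])]
    med = median(changes)
    mad = median([abs(c - med) for c in changes])
    if mad == 0:
        # No spread in the changes: the score is undefined, flag nothing.
        return []
    flagged = []
    for position, change in enumerate(changes, start=1):
        score = 0.6745 * (change - med) / mad
        if abs(score) > threshold:
            flagged.append(position)
    return flagged
```

Because the rule works on changes, an isolated spike is normally flagged twice: once for the jump and once for the return to the usual level. A production system would also need to handle seasonality, which this sketch ignores.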
E&I of administrative data used for producing
business statistics
• Vera Costa, Frances Krsinich and Rudi Van der
Mescht – Statistics New Zealand
• Challenges with using administrative data from the
private sector
– Electronic Card Transactions Data obtained as
aggregated data from switch companies
– Must rely on companies for ensuring quality data
– Discussion of time series models for identifying outliers
and carrying out imputation
E&I of administrative data used for producing
business statistics
• Building a longitudinal business database
– Integrating survey data, tax data and business sampling
frame using a deterministic record linkage process
– Donor imputation for missing/erroneous data from tax
files
– Expanding methods of imputation to take into account
historical values and other fine-tuning mechanisms
• Interesting points
– Advantages and disadvantages of time series methods
for macro editing on aggregated administrative data
– Need to consider practicality and feasibility for large
scale production systems when analyzing imputation
methods
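The linkage and donor imputation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Statistics New Zealand's system: the field names (`id`, `industry`, `employees`, `sales`) are hypothetical, linkage is an exact match on a shared identifier, and the donor is the unit in the same industry with the closest employee count.

```python
def link_records(frame, tax, survey):
    """Deterministic linkage: join sources to the frame on an exact identifier."""
    tax_by_id = {rec["id"]: rec for rec in tax}
    survey_by_id = {rec["id"]: rec for rec in survey}
    linked = []
    for unit in frame:
        uid = unit["id"]
        linked.append({
            "id": uid,
            "industry": unit["industry"],
            "employees": unit["employees"],
            # None marks a unit with no matching record in that source.
            "tax_sales": tax_by_id.get(uid, {}).get("sales"),
            "survey_sales": survey_by_id.get(uid, {}).get("sales"),
        })
    return linked


def impute_from_donor(records, target="tax_sales"):
    """Fill missing target values from the nearest donor in the same industry."""
    # Donors are fixed before imputing, so imputed values are never reused.
    donors = [rec for rec in records if rec[target] is not None]
    for rec in records:
        if rec[target] is not None:
            continue
        pool = [d for d in donors if d["industry"] == rec["industry"]]
        if not pool:
            continue  # no donor available: leave the value missing
        donor = min(pool, key=lambda d: abs(d["employees"] - rec["employees"]))
        rec[target] = donor[target]
        rec["imputed_from"] = donor["id"]
    return records
```

Recording the donor identifier keeps the imputation auditable. Using the unit's own historical values, as the slide mentions, would be a natural refinement that this sketch leaves out.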
The Role of E&I in integration of sources for
structural business statistics
• Svein Gasemyr, Svein Nordbotten and Morton
Anderson – Statistics Norway
• Integrated longitudinal business database from
multiple sources
– Estimate enterprise accounts distribution for complex
enterprises
– Aggregations from Job files
– Imputation of input and output production variables,
imputation of non-response and out of survey units
– Corrections to enhance record linkage
– Need for more computer based methods and support for
editing
The Role of E&I in integration of sources for
structural business statistics
• Standardized modules for editing and estimation
– Imputation and estimation carried out interactively
– Inspect effect of changed values on the estimates
• Interesting points
– Quality information for integrated databases as opposed
to single source databases with emphasis on errors in
linking data and inconsistencies
Prediction and imputation in ISEE: tools for
more efficient use of combined data sources
• Li-Chun Zhang and Svein Nordbotten – Statistics
Norway
• Standardization of data processing for combined
data sources
– Editing individual data
– Estimation of population parameters
• Integrating multiple sources by constructing a
complete population data file
– Imputation for non-response and out of sample
– Nearest neighbour imputation method with restrictions
on totals
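One simple way to combine nearest neighbour imputation with a restriction on a total is sketched below. The slides do not describe how the paper enforces the restriction, so the proportional rescaling used here is an assumption, as are the field names `x` (auxiliary variable known for all units) and `y` (target variable) and the function name.

```python
def nn_impute_with_total(units, known_total):
    """Impute missing y from the nearest neighbour on x, then rescale the
    imputed values so that observed plus imputed y sums to known_total."""
    observed = [u for u in units if u["y"] is not None]
    missing = [u for u in units if u["y"] is None]
    for unit in missing:
        donor = min(observed, key=lambda d: abs(d["x"] - unit["x"]))
        unit["y_imp"] = donor["y"]
    observed_sum = sum(u["y"] for u in observed)
    imputed_sum = sum(u["y_imp"] for u in missing)
    if missing and imputed_sum > 0:
        # Only imputed values are adjusted; observed values stay untouched.
        factor = (known_total - observed_sum) / imputed_sum
        for unit in missing:
            unit["y_imp"] *= factor
    return units
```

Imputed values are stored in a separate `y_imp` field so that observed and imputed data remain distinguishable in the completed population file.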
Prediction and imputation in ISEE: tools for
more efficient use of combined data sources
• Interesting points
– Good review and discussion of imputation methods
– Innovative new method for imputing out of sample units
– Development of a generic statistical application
The Future System of French Structural
Business Statistics: the Role of Estimates
• Philippe Brion – INSEE
• Combining administrative sources and a statistical
survey
– Breakdown of turnover and NACE code only available in
the sample
• Analysis of statistical estimates produced by
mass-imputation versus weighting
– Imputation of APE code for out of sample units can be
biased
– Weights calibrated to 3-digit APE code with adjustments
based on the survey outcome at the 4-digit level
The Future System of French Structural
Business Statistics: the Role of Estimates
• Interesting points
– Good discussion of advantages and disadvantages of
mass imputation versus weighting to obtain population
estimates
– Consideration of editing strategies: micro edits and
selective editing based on scores and “jack-knifed”
ratios
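The contrast between weighting and mass imputation can be made concrete with a toy example. The sketch below is not the INSEE estimator: it assumes a register variable `x` known for every unit, a survey variable `y` observed only in the sample, equal design weights, and a simple ratio model for imputation. All names are hypothetical.

```python
def calibrated_ratio_estimate(sample, design_weight, register_total_x):
    """Weighting: scale the weighted total of y so that the weighted
    total of x reproduces the known register total."""
    weighted_y = sum(design_weight * u["y"] for u in sample)
    weighted_x = sum(design_weight * u["x"] for u in sample)
    return weighted_y * register_total_x / weighted_x


def mass_imputation_estimate(sample, non_sample):
    """Mass imputation: predict y = R * x for each out-of-sample unit,
    where R is the sample ratio of y to x, then sum over all units."""
    ratio = sum(u["y"] for u in sample) / sum(u["x"] for u in sample)
    imputed = [ratio * u["x"] for u in non_sample]
    return sum(u["y"] for u in sample) + sum(imputed)
```

With equal weights and the same ratio model, the two estimates of the total coincide algebraically. Differences appear once weights vary, estimates are needed for domains, or the imputed variable is categorical, which is where the bias concern about imputing the APE code for out-of-sample units arises.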
Combining Survey and Administrative data in
the Italian EU-SILC Experience
• Claudio Ceccarelli, Lucia Coppola, Andrea Cutillo
and Davide Di Laurea
• Use of administrative data in the social survey EU-
SILC
– Tracking individuals for a longitudinal survey
– Linking tax registers to reduce impact of item non-response
and other measurement errors (recall effects,
telescoping, etc.)
• Problems related to timeliness and comparability of
data sources
– Need for integrated processing systems, understanding
of complexities, more time to process data
Combining Survey and Administrative data in
the Italian EU-SILC Experience
• Interesting points
– A good discussion is provided on the advantages and
disadvantages of incorporating administrative data at
different stages of the survey process
– Interesting analysis of estimation methods for
calculating survey weights
Quality of Administrative Data – a challenge for the
maintenance of the Statistical Business Register
• Norbert Rainer – Statistics Austria
• Main administrative data sources for the Business
Register
• Quality issues
– Linking data sources
– Different definitions
– Continuity procedures
– Missing data
• Improvement strategies
Quality of Administrative Data – a challenge for the
maintenance of the Statistical Business Register
• Interesting points
– Data issues leading to a need for using imputation
• Data available only for a higher aggregation level
within the business
• Timeliness
• Annual data only, when monthly data are needed
• Not all activities covered
• Data available only for enterprises above a certain
threshold
Editing Strategies for VAT Data
• Peter Kruiskamp – Statistics Netherlands
• VAT data
– Current use: auxiliary variable
– Future use: source for turnover data (small / medium
businesses)
• Editing strategy
– Micro editing on the fiscal units level
– Data handling on the statistical units level
– Macro editing on the aggregates level
Editing Strategies for VAT Data
• Data used for the production of the Short-Term
Statistics – data frequency
• Interesting points
– Consideration of time series model for estimating VAT to
overcome seasonal effects
– Discussion of cut off points for identifying outliers
– Need to move from using VAT data as an auxiliary variable
to using VAT data as a source of data
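The first two levels of the editing strategy (micro editing on fiscal units, then data handling on statistical units) can be sketched as below. This is an illustration only: the cut-off points `low` and `high`, the record fields and the function names are assumptions, and the slides do not give the thresholds actually discussed in the paper.

```python
from collections import defaultdict


def micro_edit(fiscal_records, low=0.5, high=2.0):
    """Flag fiscal units whose turnover ratio to the previous period
    falls outside the cut-off interval [low, high]."""
    for rec in fiscal_records:
        prev, cur = rec["prev_turnover"], rec["turnover"]
        # Without a positive previous value the ratio is undefined: do not flag.
        rec["suspect"] = prev > 0 and not (low <= cur / prev <= high)
    return fiscal_records


def aggregate_to_statistical_units(fiscal_records):
    """Sum fiscal-unit turnover per statistical unit and count suspect inputs."""
    units = defaultdict(lambda: {"turnover": 0.0, "n_suspect": 0})
    for rec in fiscal_records:
        unit = units[rec["stat_unit"]]
        unit["turnover"] += rec["turnover"]
        unit["n_suspect"] += int(rec["suspect"])
    return dict(units)
```

Carrying the count of suspect fiscal units up to the statistical unit lets the macro editing step on aggregates trace an implausible total back to the records that caused it.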
Questions for discussion
• Editing Administrative Data
– Combine collections then edit versus edit each collection
then combine
– Editing / imputation of back data
– Keeping track of changes in administrative data definitions
– Parameters for outlier detection under multiple sources of
data
– Use of time series models to identify outliers and
impute/estimate unit record values: advantages and
disadvantages
– Automation of macro editing when a large number of
series are produced
Questions for discussion
• Assessing quality
– New statistical tools/methods for assessing coherency
between sources and linking errors
– Types of quality indicators for integrated databases
– The impact of the timeliness of the data sources on the
quality of the data
– Does the need for practical and feasible production
systems reduce the quality of the data?
– Variance estimation in combined data sources, especially
when an auxiliary is an estimate or when massive
imputation is carried out
Questions for discussion
• Weighting versus mass-imputation
– Most papers opted for mass imputation and square
datasets and a few papers opted for weighting – pros and
cons
– Methods for building “square” datasets when linking
administrative sources to survey data
Thank you for your attention