UNECE Work Session on Statistical Data Editing, Vienna, April 2008
Topic ii – Editing Administrative Data and Combined Sources
Introduction
• Statistical agencies rely on administrative data to improve the quality of statistics and to reduce costs and response burden
• Administrative data are not originally designed for use as statistical data and need to undergo extensive processing and editing
• In recent years, there has been more emphasis on using tax data to augment business statistics and register data to augment social and economic data
• Combining multiple sources of data presents new challenges: ensuring quality in line with statistical standards and coherence across different sources
• Papers cover:
  – Effective methods for adjusting administrative data to statistical use
  – Improving the usability of business and population registers
  – Construction of quality statistical databases using effective E&I strategies which ensure correct coverage and consistent, clean records
  – Enhancing the quality and efficiency of estimates from surveys
• Papers:
  – WP.8 Italy: The editing process in the Italian short-term survey on Labour Cost based on administrative data
  – WP.9 New Zealand: E&I of administrative data used for producing business statistics
  – WP.10 Norway: Role of edit and imputation in integration of sources for structural business statistics
  – WP.11 Norway: Prediction and imputation in ISEE: tools for more efficient use of combined data sources
  – WP.12 Austria: Quality of administrative data – a challenge for the maintenance of the statistical business register
  – WP.13 France: The future system of French structural business statistics: the role of the estimates
  – WP.14 Italy: Combining survey and administrative data in the Italian EU-SILC experience: positive and critical aspects
  – WP.15 Netherlands: Editing Strategies for VAT Data

Presentations

The editing process in the Italian short-term survey on labour cost based on administrative data
• M. Carla Congia, Silvia Pacini and Donatella Tuzi – Italian National Statistical Institute (Istat)
• Steps in process:
  – Preliminary checks on administrative data and retrieval of statistical variables
  – Micro data editing (cross-sectional and longitudinal checks)
  – Imputation of eligible unit non-responses
  – Large enterprise checks and combination with survey data
  – Macro editing based on time series analysis
• Interesting points
  – Integration of administrative and survey data and identifying errors
  – Combining many processes in an integrated setting
  – Recognition of the importance of metadata for administrative data: changes in concepts and definitions
  – Macro editing using time series methods for automated detection of outliers

E&I of administrative data used for producing business statistics
• Vera Costa, Frances Krsinich and Rudi Van der Mescht – Statistics New Zealand
• Challenges with using administrative data from the private sector
  – Electronic Card Transactions Data obtained as aggregated data from switch companies
  – Must rely on companies for ensuring quality data
  – Discussion of time series models for identifying outliers and carrying out imputation
• Building a longitudinal business database
  – Integrating survey data, tax data and business sampling frame using a deterministic record linkage process
  – Donor imputation for missing/erroneous data from tax files
  – Expanding methods of imputation to take into account historical values and other fine-tuning mechanisms
• Interesting points
  – Advantages and disadvantages of time series methods for macro editing on aggregated administrative data
  – Need to consider practicality and feasibility for large-scale production systems when analyzing imputation methods

The Role of E&I in integration of sources for structural business statistics
• Svein Gasemyr, Svein Nordbotten and Morton Anderson – Statistics Norway
• Integrated longitudinal business database from multiple sources
  – Estimate enterprise accounts distribution for complex enterprises
  – Aggregations from Job files
  – Imputation of input and output production variables, imputation of non-response and out-of-survey units
  – Corrections to enhance record linkage
  – Need for more computer-based methods and support for editing
• Standardized modules for editing and estimation
  – Imputation and estimation carried out interactively
  – Inspect effect of changed values on the estimates
• Interesting points
  – Quality information for integrated databases as opposed to single-source databases, with emphasis on errors in linking data and inconsistencies

Prediction and imputation in ISEE: tools for more efficient use of combined data sources
• Li-Chun Zhang and Svein Nordbotten – Statistics Norway
• Standardization of data processing for combined data sources
  – Editing individual data
  – Estimation of population parameters
• Integrating multiple sources by constructing a complete population data file
  – Imputation for non-response and out-of-sample units
  – Nearest neighbour imputation method with restrictions on totals
• Interesting points
  – Good review and discussion of imputation methods
  – Innovative new method for imputing out-of-sample units
  – Development of a generic statistical application

The Future System of French Structural Business Statistics: the Role of Estimates
• Philippe Brion – INSEE
• Combining administrative sources and a statistical survey
  – Breakdown of turnover and NACE code only available in the sample
• Analysis of statistical estimates produced by mass-imputation versus weighting
  – Imputation of APE code for out-of-sample units can be biased
  – Weights calibrated to 3-digit APE code with adjustments based on the survey outcome at the 4-digit level
• Interesting points
  – Good discussion of advantages and disadvantages of mass imputation versus weighting to obtain population estimates
  – Consideration of editing strategies: micro edits and selective editing based on scores and “jack-knifed” ratios

Combining Survey and Administrative data in the Italian EU-SILC Experience
• Claudio Ceccarelli, Lucia Coppola, Andrea Cutillo and Davide Di Laurea
• Use of administrative data in the social survey EU-SILC
  – Tracking individuals for a longitudinal survey
  – Linking tax registers to reduce impact of item nonresponse and other measurement errors (recall effects, telescoping, etc.)
• Problems related to timeliness and comparability of data sources
  – Need for integrated processing systems, understanding of complexities, more time to process data
• Interesting points
  – A good discussion is provided on the advantages and disadvantages of incorporating administrative data at different stages of the survey process
  – Interesting analysis of estimation methods for calculating survey weights

Quality of Administrative Data – a challenge for the maintenance of the Statistical Business Register
• Norbert Rainer – Statistics Austria
• Main administrative data sources for the Business Register
• Quality issues
  – Linking data sources
  – Different definitions
  – Continuity procedures
  – Missing data
• Improvement strategies
• Interesting points
  – Data issues leading to a need for using imputation
    • Data available only for a higher aggregation level within the business
    • Timeliness
    • Annual data only, when monthly data is needed
    • Not all activities covered
    • Data available only for enterprises above a certain threshold

Editing Strategies for VAT Data
• Peter Kruiskamp – Statistics Netherlands
• VAT data
  – Current use: auxiliary variable
  – Future use: source for turnover data (small / medium businesses)
• Editing strategy
  – Micro editing on the fiscal units level
  – Data handling on the statistical units level
  – Macro editing on the aggregates level
• Data used for the production of the Short-Term Statistics
  – Data frequency
• Interesting points
  – Consideration of time series model for estimating VAT to overcome seasonal effects
  – Discussion of cut-off points for identifying outliers
  – Need to move from using VAT data as an auxiliary variable to using VAT data as a source of data

Questions for discussion
• Editing Administrative Data
  – Combine collections then edit versus edit each collection then combine
  – Editing / imputation of back data
  – Keeping track of changes in administrative data definitions
  – Parameters for outlier detection under multiple sources of data
  – Use of time series models to identify outliers and impute/estimate unit record values: advantages and disadvantages
  – Automation of macro editing when a large number of series are produced
• Assessing quality
  – New statistical tools/methods for assessing coherency between sources and linking errors
  – Types of quality indicators for integrated databases
  – The impact of the timeliness of the data sources on the quality of the data
  – Does the need for practical and feasible production systems reduce the quality of the data?
  – Variance estimation in combined data sources, especially when an auxiliary variable is an estimate or when massive imputation is carried out
• Weighting versus mass-imputation
  – Most papers opted for mass imputation and square datasets and a few papers opted for weighting: pros and cons
  – Methods for building “square” datasets when linking administrative sources to survey data

Thank you for your attention
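
Illustration: weighting versus mass-imputation
The weighting versus mass-imputation question raised for discussion can be made concrete with a toy example. The sketch below is not taken from any of the papers. It assumes a register variable (here called x, e.g. administrative turnover) is known for every unit, while the survey variable y is observed only for sampled units. All unit labels, values and function names are hypothetical. Weighting is shown as a simple expansion estimator, and mass imputation as a ratio imputation that fills in y for every out-of-sample unit to produce a "square" dataset.

```python
# Hypothetical register: x is known for all units (administrative source).
population_x = {"A": 100.0, "B": 200.0, "C": 300.0,
                "D": 400.0, "E": 500.0, "F": 600.0}

# Hypothetical survey responses: y is observed for sampled units only.
sample_y = {"A": 12.0, "C": 33.0, "F": 60.0}


def weighted_total(sample_y, population_size):
    """Expansion estimator: every sampled unit carries weight N / n."""
    weight = population_size / len(sample_y)
    return weight * sum(sample_y.values())


def mass_impute(sample_y, population_x):
    """Build a 'square' dataset: keep observed y, impute y = R * x elsewhere.

    R is the ratio of the sample total of y to the sample total of x.
    """
    ratio = sum(sample_y.values()) / sum(population_x[u] for u in sample_y)
    return {u: sample_y.get(u, ratio * x) for u, x in population_x.items()}


total_by_weighting = weighted_total(sample_y, len(population_x))
square_dataset = mass_impute(sample_y, population_x)
total_by_imputation = sum(square_dataset.values())

print(f"weighting:       {total_by_weighting:.1f}")
print(f"mass imputation: {total_by_imputation:.1f}")
```

With these made-up figures the two totals differ (about 210 versus about 220.5) because the ratio imputation uses the register values of the out-of-sample units, whereas the expansion weight ignores them. The square dataset also allows totals for any subgroup to be tabulated directly, at the cost of relying on the imputation model.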