Quality Challenges in Processing
Administrative Data to Produce
Short-term Labour Cost Statistics
M. Carla Congia, Silvia Pacini,
Donatella Tuzi ([email protected])
Istat - Italy
European Conference on Quality 2008 in Official Statistics
Session on Administrative data.
Rome, 8–11 July 2008
Administrative
data
Session
Presentation Outline
• The Italian Oros Survey
• The peculiarities of the administrative source used
• The quality strategy in a context of timely and extensive use of administrative data
• Final remarks
Q2008. Rome, 8-11 July 2008
The Oros Survey
Since 2003 the Italian NSI has released quarterly indicators on gross
wages and total labour cost (Oros Survey) covering enterprises of all sizes
in the private non-agricultural sector. Indices are released 70 days after
the end of the reference quarter.
In the past this information was collected monthly, and only for large firms,
through the Survey on Large Enterprises (> 500 employees).
The Oros Survey was designed to fill this gap in Italian statistics, using
administrative data (employees’ social contribution declarations to the
National Social Security Institute - INPS) for Small and Medium
Enterprises, integrated with the survey data on Large Enterprises (LES).
Today the Oros Survey is an innovative example in Italy of
administrative data used extensively to produce timely business
statistics.
The Administrative Sources
All Italian non-agricultural firms in the private sector, with at least one
employee (roughly 12 million employees and 1.3 million employers per
year) have to pay monthly social security contributions to INPS.
INPS administrative register (AR)
Contains structural information for each administrative unit
(administrative id., fiscal code, name, legal form, dates of registration
and cancellation, etc.). About 4 million records each quarter.
Transmitted at the end of the reference quarter.
Employers' monthly declaration (DM10 form)
Highly detailed grid organized by administrative codes, with information
on employment by type, paid days, wage bills, social contributions,
credit terms and tax reliefs. Each DM10 spans several records (on
average 8 records per unit). About 10 million records each month.
Transmitted 35 days after the end of the reference quarter.
Peculiarities of the Administrative Source
Unlike survey data, the use of an administrative source:
• reduces the financial costs of direct collection and avoids further
response burden on enterprises;
• satisfies the growing demand for timely and detailed statistical
information, for multiple statistical aims.
Yet data collection is beyond the NSI's control, so the NSI needs information
about the quality of the administrative data used.
Close relationships and coordination with the administrative institutions
help to reduce the risk of data quality problems arising from
dependence on the data supplier.
In this, the Oros Survey does not differ from other register-based
statistics.
Peculiarities of the Administrative Source (2)
What sets the Oros Survey apart from other register-based statistics is the
timeliness of its release, which obliged Istat to acquire completely raw data,
without any prior checks or aggregation.
This raises unusual statistical quality issues:
• the processing of a huge quantity of complex data in a very short
time;
• the lack of standardized metadata to translate administrative
information;
• the continuous changes in administrative definitions and concepts.
The acquisition of raw information allows Istat to monitor most of the
processing aspects, but considerable work is needed to guarantee a high
standard of quality.
A pervasive quality strategy has been implemented, covering the
whole Oros production process.
The Quality Strategy in the Oros Production Process
[Flow diagram]
Inputs: DM10 micro data; Administrative Register (AR); Metadata Database
Processing steps, in order:
• Preliminary checks and retrieval of the statistical variables
• Treatment of measurement errors (micro editing)
• Treatment of non-response errors (imputation of temporary employment agencies)
• The large firms: integration with survey data
• Checks on macro data
Output: Oros Survey indicators
The Administrative Register
The AR is used as a representation of the current population.
But:
• it suffers from over-coverage (temporary suspensions and
firm closures are under-recorded);
• the economic activity code is drawn from the Italian Business
Register (BR) (for 90% of the Oros active units);
• considerable work is needed to define the estimation frame (exclusion of
units not belonging to the Oros target population);
• special attention is paid to the quality of the fiscal code, the main
matching variable.
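As an illustration only, the frame definition could be sketched as a filter over AR records. The field names (`registered`, `cancelled`, `nace_section`) and the set of target sections are hypothetical, and this is not the Istat procedure. Because closures are under-recorded in the AR, a filter of this kind cannot remove over-coverage on its own.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical target sections, for illustration only.
TARGET_SECTIONS = {"C", "D", "E", "F", "G", "H", "I", "J", "K"}

@dataclass
class ARUnit:
    admin_id: str
    fiscal_code: str
    registered: date
    cancelled: Optional[date]     # None when no cancellation is recorded
    nace_section: Optional[str]   # None when no activity code could be linked

def in_frame(unit: ARUnit, quarter_start: date, quarter_end: date) -> bool:
    """True if the unit is registered by quarter end, not cancelled
    before quarter start, and classified in a target section."""
    if unit.registered > quarter_end:
        return False
    if unit.cancelled is not None and unit.cancelled < quarter_start:
        return False
    return unit.nace_section in TARGET_SECTIONS
```

Units with a missing activity code are excluded here; in practice they would more likely be routed to a separate check than dropped.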
Preliminary Checks and Retrieval of the Statistical Variables
Meta-information on laws, regulations, contribution rates, codes and
other technical aspects of Social Security is promptly collected and
updated in a standardized metadata database built in-house. This
database is needed to carry out:
• preliminary checks on raw data and correction of errors in codes,
record duplications and inconsistencies with current legislation;
• translation of the administrative data into statistical variables,
through complex additions and subtractions of a huge number of
wage and contribution items identified by numerous administrative
codes (currently more than 5,000);
• estimation of some components for which information is not
available in the administrative form (e.g. employers’ injury insurance
premiums and severance payments).
In this step each DM10 is reorganized into a single record.
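The translation step can be pictured as a signed sum over coded items, driven by the metadata. The sketch below is a toy version under stated assumptions: the codes (`W001`, `R020`, ...), the variable names and the mapping are invented for illustration, whereas the real metadata database covers thousands of codes.

```python
from collections import defaultdict

# Hypothetical metadata: administrative code -> (statistical variable, sign).
CODE_MAP = {
    "W001": ("wages", +1),
    "W002": ("wages", +1),
    "C010": ("social_contributions", +1),
    "R020": ("social_contributions", -1),  # a rebate reduces contributions
}

def to_statistical_record(unit_id, dm10_rows, code_map=CODE_MAP):
    """Collapse the (code, amount) rows of one DM10 into a single record.

    Codes missing from the metadata are not guessed: they are collected
    so that they can be checked against current legislation.
    """
    record = defaultdict(float)
    unknown = []
    for code, amount in dm10_rows:
        if code not in code_map:
            unknown.append(code)
            continue
        variable, sign = code_map[code]
        record[variable] += sign * amount
    return {"unit_id": unit_id, **record, "unknown_codes": unknown}
```

Keeping the mapping as data rather than code means that a change in administrative definitions requires a metadata update, not a change to the processing logic.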
Treatment of Measurement Errors
Once the statistical data are available, a more traditional
micro editing procedure is applied.
Given the huge number of units, however, it relies heavily on selective
criteria. A score function assigns to each of the 1.3 million units the
probability that an error occurs in the target variables.
Cut-off thresholds are fixed to select anomalous values, but setting
them is strongly affected by the heavy tails in the distribution
of the target variables:
• very low per capita wages (e.g. units with only supplementary
earnings);
• negative per capita other labour costs (e.g. social contribution
rebates).
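The slides do not give the form of the Oros score function, so the sketch below uses a stand-in: a robust distance from the median, scaled by the median absolute deviation (MAD), with a hypothetical cut-off. It returns a score rather than a probability, and it only illustrates how a cut-off selects units for review when the distribution has heavy tails.

```python
from statistics import median

def robust_scores(values):
    """|x - median| / MAD for each value (all 0.0 when the MAD is zero)."""
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]
    return [abs(v - med) / mad for v in values]

def select_for_review(unit_values, cutoff=5.0):
    """Return the ids of units whose score exceeds the (hypothetical) cut-off."""
    ids = list(unit_values)
    scores = robust_scores(unit_values[i] for i in ids)
    return [i for i, s in zip(ids, scores) if s > cutoff]
```

A selected unit is only a candidate for review: a negative per capita value may be a legitimate rebate rather than an error.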
Figure 1 – Distribution of the per capita other labour costs (euro values) in the Oros
manufacturing small and medium enterprises – July 2007
[Histogram. x-axis: per capita other labour costs; y-axis: %.
Mean = 450; Median = 430; Max = 6,900; Min = -1,350]
Treatment of Measurement Errors (2)
The edit and imputation rules are based on known functional relations
among the analyzed variables. They aim to assess and preserve, at unit
record level, both cross-sectional and longitudinal
consistency, using information on the closest months.
The number of monthly edits is generally small, but even a single
oversight may have a significant effect.
[Figure: Quarterly changes of the Oros wage index in the Wholesale and retail trade
sector (G), 2005Q1 to 2007Q3. Two lines: series with measurement error and corrected series.]
In the third quarter 2007, the number of employees of a unit was
affected by a measurement error: part-time workers 73,000. Imputed data: 2.
This would have implied a change of 0.8% instead of 3%.
This step is mainly interactive. Given the nature of the data, experience
has shown that automatic corrections are best avoided.
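A longitudinal edit of the kind described above can be sketched as a comparison between a unit's current value and the median of its closest months. The tolerance `max_ratio` and the neighbouring values in the usage note are hypothetical; only the 73,000 versus 2 figures come from the example on this slide.

```python
from statistics import median

def longitudinal_flag(current, neighbours, max_ratio=3.0):
    """Flag `current` when it differs from the median of the neighbouring
    months by more than a factor of `max_ratio`, in either direction."""
    reference = median(neighbours)
    if reference <= 0 or current <= 0:
        # No meaningful ratio can be formed: flag any discrepancy.
        return current != reference
    ratio = current / reference
    return ratio > max_ratio or ratio < 1 / max_ratio
```

For a unit reporting 2, 2, 3 and 2 part-time workers in nearby months, `longitudinal_flag(73000, [2, 2, 3, 2])` flags the record for interactive review; consistent with the approach on this slide, nothing is corrected automatically.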
Treatment of Non-response Errors
In the Oros Survey, non-responses are units that deliver the DM10
late. Nevertheless, 95-98% of the Oros population is
represented in the preliminary administrative data.
Since tests indicate that the missing units are missing at random (MAR)
and they are few in the preliminary data, they do not significantly affect
the Oros wage and other labour cost changes.
Units belonging to Temporary Employment Agencies (TEA) are an
exception because of their distinctive profile:
about 100 units account for 3% of total employment in the
private sector (20% in sector K - Real estate, renting and business
activities).
The absence of even a few of these units may significantly affect
changes in the per capita indicators.
Treatment of Non-response Errors (2)
Singling out TEA unit non-responses is not an easy task:
• the population under study is represented by the current AR, which
suffers from over-coverage (a list of respondents is not
available). It follows that the active status of a unit must be predicted
through a longitudinal analysis of its activity in the nearby
quarters;
• given the highly dynamic nature of TEA, considerable work is necessary
to follow their frequent changes (e.g. mergers, split-ups, etc.) over
time and so separate real non-responses from non-active units.
Imputation of missing data is deterministic and largely based on
past information on non-respondents and panel information on
the current respondents.
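One plausible reading of this deterministic imputation, offered as an assumption rather than as the Oros rule, is to carry forward a non-respondent's last available value, scaled by the growth observed in the panel of units that responded in both periods. The function names and the dictionary inputs are hypothetical.

```python
def panel_growth(previous, current):
    """Ratio of totals over the units present in both periods (the panel)."""
    panel = previous.keys() & current.keys()
    base = sum(previous[u] for u in panel)
    if not panel or base == 0:
        return 1.0  # no usable panel: carry past values forward unchanged
    return sum(current[u] for u in panel) / base

def impute_missing(previous, current, expected_active):
    """Impute units that are expected to be active but have not yet reported.

    `expected_active` stands for the outcome of the longitudinal
    analysis of active status; units without past data are left out.
    """
    growth = panel_growth(previous, current)
    return {
        unit: previous[unit] * growth
        for unit in expected_active
        if unit not in current and unit in previous
    }
```

The imputed values would still be flagged and the pre-imputation state archived, so that late declarations can replace them in later revisions.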
Integration with Survey Data on Large Enterprises
In the Oros estimates special attention is given to Large
Enterprises (LE: firms with more than 500 employees). In the Italian
non-agricultural sector LE account for about 1,000 units employing 2
million workers.
In the past, the integration of survey data on LE was motivated mainly by
the poor coverage of these units in the preliminary
administrative data.
Today the INPS source guarantees good coverage of these
units but, as experience has shown, the statistical
source provides higher quality data:
• enterprises can be re-contacted in case of non-response or suspected
measurement errors;
• the frequent legal changes these units are subject to (e.g. mergers,
split-ups, acquisitions etc.) are managed more rapidly and efficiently.
Integration with Survey Data on Large Enterprises (2)
Combining survey and administrative data raises specific quality
issues:
• harmonisation of variables;
• record matching: the fiscal code is the main linking variable, but
ambiguities may arise because of formal errors or different
update timing in the two sources (mergers, hive-offs and split-ups might
be recorded in different periods). Considerable effort goes into avoiding
omissions and duplications, using supplementary information (legal
name, number of employees etc.).
About 12% of LES employment is manually reviewed and
matched to the corresponding administrative firms.
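The matching logic described above can be sketched as a two-stage rule: an exact match on the normalized fiscal code, then a fallback on legal name and number of employees, with everything ambiguous sent to manual review. The thresholds, the field names and the use of `difflib` for name similarity are assumptions made for this sketch.

```python
from difflib import SequenceMatcher

def normalize_code(code):
    """Upper-case the fiscal code and drop spaces and punctuation."""
    return "".join(ch for ch in code.upper() if ch.isalnum())

def name_similarity(a, b):
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def match_unit(survey_unit, admin_units, min_similarity=0.85, max_emp_gap=0.10):
    """Return (status, admin_unit); status is 'matched', 'matched_on_name'
    or 'manual_review' (with None as the unit)."""
    code = normalize_code(survey_unit["fiscal_code"])
    exact = [a for a in admin_units if normalize_code(a["fiscal_code"]) == code]
    if len(exact) == 1:
        return "matched", exact[0]
    # Fallback on supplementary information: legal name and employees.
    candidates = []
    for a in admin_units:
        gap = abs(a["employees"] - survey_unit["employees"]) / max(survey_unit["employees"], 1)
        if name_similarity(a["name"], survey_unit["name"]) >= min_similarity and gap <= max_emp_gap:
            candidates.append(a)
    if len(candidates) == 1:
        return "matched_on_name", candidates[0]
    return "manual_review", None
```

Returning `manual_review` for zero or multiple candidates, rather than picking the closest one, is a way to limit both omissions and duplications after mergers or split-ups.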
Checks on Macro Data
Final checks on macro data are a key step in the quality strategy: they
identify possible residual errors that may affect the estimates. These
checks are mainly based on:
• analytic and graphical inspection of the time series at sub-population level: predefined statistical measures must fall within acceptance boundaries;
• automatic detection of outliers based on TERROR, an application of
the software TRAMO-SEATS, where the detection of suspected errors
is based on REG-ARIMA model estimates;
• comparison with figures from other statistical sources (e.g. National
Accounts, Indices of wages according to collective agreements, etc.);
• relationships between variables, whose coherence has to be guaranteed (e.g.
the ratio of other labour costs to wages, etc.).
If any error is detected, a drill-down to micro data may be necessary.
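Two of the checks above, acceptance boundaries and coherence between variables, lend themselves to a very small sketch. It does not reproduce TERROR or any REG-ARIMA modelling, and all bounds shown are hypothetical values chosen for illustration.

```python
def quarterly_changes(index):
    """Percentage change of each index value on the previous quarter."""
    return [100.0 * (b - a) / a for a, b in zip(index, index[1:])]

def out_of_bounds(index, lower=-2.0, upper=4.0):
    """Positions in the change series that fall outside [lower, upper]."""
    return [i for i, c in enumerate(quarterly_changes(index)) if not lower <= c <= upper]

def ratio_incoherent(other_costs, wages, low=0.30, high=0.50):
    """True when other labour costs / wages leaves the accepted range."""
    ratio = other_costs / wages
    return not low <= ratio <= high
```

A flagged position or ratio is a prompt to drill down to the micro data for that sub-population, not evidence of an error in itself.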
Internal Oros Quality Reporting
Documenting and updating the Oros production process every quarter
is a fundamental task in the overall quality strategy:
• metadata are archived;
• methodological information is documented;
• imputed data are flagged (and pre-imputation data are archived);
• quality indicators on the impact of imputation are calculated.
The documentation of the Oros process guarantees
its reproducibility and repeatability.
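The slides do not list the imputation indicators actually computed, so the three below are generic examples: the share of imputed units, the share of the final total that comes from imputed values, and the relative change in the total due to imputation. The record layout (`final`, `pre_imputation`, `imputed`) is hypothetical and relies on imputed data being flagged and pre-imputation data archived.

```python
def imputation_indicators(records):
    """Summarize the impact of imputation on one variable.

    Each record is a dict with 'final' (value after imputation),
    'pre_imputation' (archived value, None if it was missing) and
    'imputed' (flag). Returns None for an empty input.
    """
    if not records:
        return None
    total_final = sum(r["final"] for r in records)
    total_pre = sum(r["pre_imputation"] or 0.0 for r in records)
    imputed_total = sum(r["final"] for r in records if r["imputed"])
    return {
        "imputation_rate": sum(1 for r in records if r["imputed"]) / len(records),
        "imputed_value_share": imputed_total / total_final if total_final else 0.0,
        "relative_impact": (total_final - total_pre) / total_pre if total_pre else 0.0,
    }
```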
Final Remarks
The Oros Survey was:
• developed without any previous experience in the use of administrative
data for the production of short-term official statistics;
• implemented gradually, learning by doing.
High timeliness, frequent changes in Social Security laws and
regulations, and highly detailed raw data raise significant and unusual
quality problems, managed through:
• close relationships and coordination with the administrative institution;
• a pervasive quality strategy along the whole production process;
• highly skilled human resources to handle the wide-ranging and non-conventional processing aspects, which are subject to frequent modifications;
• systematic documentation of the production steps.
Less “standardizable” than a traditional survey quality strategy?
References
Baldi C., Ceccato F., Cimino E., Congia M.C., Pacini S., Rapiti F., Tuzi D.
(2004) Use of Administrative Data to produce Short Term Statistics on
Employment, Wages and Labour Cost. Essays, n. 15/2004, Istat, Rome.
Caporello G., Maravall A. (2002) A tool for quality control of time series data.
Program TERROR. Bank of Spain.
Eurostat (2003) Quality assessment of administrative data for statistical
purposes. Doc. Eurostat/A4/Quality/03/item6, available at:
http://epp.eurostat.ec.europa.eu/pls/portal/docs/PAGE/PGP_DS_QUALITY/TAB47141301/DEFINITION_2.PDF
Istat, CBS, SFSO, Eurostat (2007) Recommended Practices for Editing and
Imputation in Cross-Sectional Business Surveys, available at:
http://edimbus.istat.it/dokeos/document/document.php?openDir=%2FRPM_EDIMBUS
Thank you for your attention
Donatella Tuzi
[email protected]