Quality in statistics:
the BR case
Session: Quality indicators and quality measurement
of Statistical Registers
10 July 2008
Monica Consalvi – Giuseppe Garofalo – Caterina Viviano
Italian National Statistical Institute
Business Register vs Statistical Survey
BRs are statistical products with their own specificities:
• Extensive use of Administrative data
• Heterogeneity and variability of inputs
• Relevance of technological aspects
• Output specificity (dissemination of micro data)
• Heterogeneity of users
• Continuous data updating
Business Register vs Statistical Survey – Quality specificities
Extensive use of Administrative data
Compared with statistical surveys, the quality problem arises in a
different context: it can be resolved only ex post, because the data
are known but not the way they were generated
Heterogeneity and variability of inputs
Quality indicators for specific subsets of units and for
different variables are necessary
Relevance of technological aspects
Huge amounts of data, complex procedures for data integration
and for applying methodologies, and changes over time in the
applied rules (e.g. changes in classifications or in the contents of
administrative sources)
Output specificity
Because micro data are disseminated, the assumption that “errors
annul each other on average” no longer holds. In a BR, errors add
up (e.g. over-coverage and under-coverage)
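A minimal sketch of why over-coverage and under-coverage add up at the unit level even when they cancel in the aggregate count. The unit identifiers and both sets are hypothetical, invented only for illustration.

```python
# Hypothetical unit identifiers: the true population and the register content.
true_population = {"A", "B", "C", "D", "E"}
register = {"A", "B", "C", "X", "Y"}  # X and Y do not belong; D and E are missing

over_coverage = register - true_population    # units wrongly included
under_coverage = true_population - register   # units wrongly excluded

# Aggregate view: the two errors cancel, so the total count looks perfect.
net_count_error = len(register) - len(true_population)

# Micro-data view: every wrong unit is an error, so the two kinds add up.
unit_level_errors = len(over_coverage) + len(under_coverage)

print(net_count_error)    # 0
print(unit_level_errors)  # 4
```

A user who only needs the total number of units sees no error here, while a user who receives the micro data is exposed to four wrong records.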
Business Register vs Statistical Survey – Quality specificities
Heterogeneity of users
• The BR’s reference universe and updating period differ depending on
whether it is used for short-term statistics (STS) or for structural
business statistics (SBS).
• If Value Added is estimated with reference to the BR’s universe, the
quality of the large units (e.g. activity code and size) is fundamental.
• If Business Demography indicators take the BR as reference, the
quality of the smaller units is very important.
Continuous data updating
Need to identify actual and spurious changes:
- Structural development of the economy
• demographic aspects
• changes in size
• changes in economic activity
- Process of revision of the register
• the BR may acquire data referring to a previous time
• actual changes recorded at a later time
• delay in recording births/deaths or in recording changes in
characteristics in the administrative registers
The BR quality indicators
The system of quality indicators refers to three
dimensions:
1. The phases of the BR’s updating process
2. A framework of components of the quality
3. The factors for the building up of the indicators
The phases of the BR’s updating process
The BR is the result of a conceptual and physical
integration of several administrative and statistical
input sources
1) Quality of the INPUT (input sources)
2) Quality of the process (matching, merging, editing, updating)
3) Quality of the OUTPUT
A framework of components of the quality
To monitor the BR quality the most frequently used
components are:
- Coverage in terms of both units and variables
- Timeliness in terms of delay in updating
- Completeness
- Accuracy
A methodological process for assessing variables coming from administrative sources
The quality of the ASIA register
The factors for the building up of the indicators
Five factors for defining a BR quality indicator:
time, scope, subpopulation, variable and criterion
The most important factor is the criterion: a method for evaluating,
unit by unit, the correctness of the values of the variables of interest
• Compliance
• Internal Consistency
• Temporal Consistency
• Metadata
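One way to picture the five factors is as the fields of a small record, with the criterion held as a function applied unit by unit. This is only a sketch: the class name `IndicatorSpec`, the helper `percent_correct`, the field types and the sample units are all hypothetical, and the slides do not define what "scope" contains, so it is kept as a free-text label.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class IndicatorSpec:
    """Hypothetical container for the five factors of one quality indicator."""
    time: int                          # reference year, e.g. 2005
    scope: str                         # free-text label (meaning assumed)
    subpopulation: str                 # subset of units the indicator covers
    variable: str                      # BR variable being assessed
    criterion: Callable[[dict], bool]  # True when a unit's value looks correct


def percent_correct(spec: IndicatorSpec, units: Sequence[dict]) -> Optional[float]:
    """Share of units (in %) whose value passes the criterion, unit by unit."""
    if not units:
        return None
    passed = sum(1 for unit in units if spec.criterion(unit))
    return 100.0 * passed / len(units)


# Invented example: an address counts as correct when it is not empty.
spec = IndicatorSpec(
    time=2005,
    scope="input",
    subpopulation="all units",
    variable="address",
    criterion=lambda unit: bool(unit.get("address")),
)
units = [
    {"address": "Via Roma 1"},
    {"address": ""},
    {"address": "Via Po 3"},
    {"address": "Corso Italia 9"},
]
result = percent_correct(spec, units)
print(result)  # 75.0
```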
Criteria (1)
1. Compliance
The value of a BR unit can be considered correct if it is
sufficiently “close” to a reference value taken from an external source.
Compliance determines whether or not the BR agrees with an
external source.
Compliance approximates reliability when the true
value is not known.
2. Internal consistency
A value is deemed “correct” if it is coherent with the
other variables of the same unit.
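The first two criteria can be sketched as simple unit-level checks. The function names, the tolerance and the consistency rule (an inactive unit should report no employees) are assumptions made for illustration; the slides do not specify how "close" is measured or which rules are applied.

```python
def is_compliant(br_value: float, reference_value: float, tolerance: float) -> bool:
    """Compliance: the BR value is close enough to an external reference value."""
    return abs(br_value - reference_value) <= tolerance


def is_internally_consistent(unit: dict) -> bool:
    """Internal consistency: variables of the same unit must agree.

    Assumed rule: a unit flagged as inactive should not report employees.
    """
    return unit["active"] or unit["employees"] == 0


# Invented values: the BR reports 12 employees, the external source 10.
close_enough = is_compliant(12, 10, tolerance=3)   # difference of 2 is accepted
too_far = is_compliant(12, 10, tolerance=1)        # difference of 2 is rejected

coherent = is_internally_consistent({"active": True, "employees": 5})
incoherent = is_internally_consistent({"active": False, "employees": 5})

print(close_enough, too_far)   # True False
print(coherent, incoherent)    # True False
```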
Criteria (2)
3. Temporal consistency
Quality is defined by comparing two values of the same variable in
two different periods.
Big changes over short time lags are treated as
impossible or less plausible.
4. Quality without ‘witness’ (use of metadata)
Information already held in the BR is used to measure
quality without a reference value or any element of
comparison: BR management variables or the
metadata system (validity date, estimation methodology, origin
of data, data validation process).
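The temporal consistency criterion (point 3) can be sketched as a threshold on the relative change between two periods. The function name, the 50% threshold and the handling of a zero starting value are assumptions for illustration, not rules taken from the slides.

```python
def is_temporally_consistent(previous: float, current: float,
                             max_relative_change: float = 0.5) -> bool:
    """Return False when the change between two periods is implausibly large."""
    if previous == 0:
        # Assumed convention: with no base to compare against,
        # only an unchanged value is accepted.
        return current == 0
    relative_change = abs(current - previous) / abs(previous)
    return relative_change <= max_relative_change


# Invented employment figures for the same unit in two consecutive years.
plausible = is_temporally_consistent(100, 120)    # +20% is accepted
implausible = is_temporally_consistent(100, 400)  # +300% is flagged

print(plausible, implausible)  # True False
```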
Phase: Input / Component: timeliness / Factor: temporal consistency
Source: Social Security
Indicator: Percentage of records with declared employees by month
[Chart: percentage of records with declared employees by month (JAN to DEC), series Supply_2004 and Supply_2005; labelled values: 57% and 71%]
Phase: Input / Component: coverage / Factor: temporal consistency
Source: Chamber of Commerce
Indicator: Loss of information in dates of cessation
Supply's year         2001      2002      2003      2004      2005      2006
BR reference year     2000      2001      2002      2003      2004      2005    I(t)%
Cessation date
2000               332,878        19   374,341       100        19        20
2001               194,634   350,462   384,199       178        31        19     -9.6
2002                    14    30,055   408,291   419,144        36        19     -2.7
2003                     -         -   129,661   358,822   369,815        35     -3.1
2004                     -         -         4   130,247   357,907   380,778     -6.4
2005                     -         -         -        28    79,721   384,272
I(t)% = 1 - [N(t+1)/N(t)]
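The I(t)% values on this slide can be reproduced under one reading of the formula: I(t) = (1 - N(t+1)/N(t)) × 100, where N(t) and N(t+1) are taken to be the record counts for the same cessation year in two consecutive supplies. That interpretation and the pairing of counts below are inferred from the figures, not stated on the slide, so treat the sketch as an assumption that happens to match the reported percentages.

```python
def loss_indicator(n_t: int, n_t_plus_1: int) -> float:
    """I(t)% = (1 - N(t+1)/N(t)) * 100; negative values mean the count grew."""
    return (1 - n_t_plus_1 / n_t) * 100


# (N(t), N(t+1), I(t)% as reported on the slide); pairing inferred from the data.
pairs = [
    (350_462, 384_199, -9.6),
    (408_291, 419_144, -2.7),
    (358_822, 369_815, -3.1),
    (357_907, 380_778, -6.4),
]

computed = [round(loss_indicator(n_t, n_next), 1) for n_t, n_next, _ in pairs]
print(computed)  # [-9.6, -2.7, -3.1, -6.4]
```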
Phase: Process / Component: accuracy / Factor: metadata
Indicator: Variables Edit and Imputation
VAR     INDICATOR                     I(t=2005)   I(t=2005) %
NACE    N° edit                         202,333        1.85 %
        N° imputation                    87,628       43.31 %
        N° edit without imputation      114,705       56.69 %
Empl.   N° edit                          74,312        0.68 %
        N° imputation                    72,768       97.92 %
        N° edit without imputation        1,544        2.08 %
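The percentages for imputations and for edits without imputation appear to be shares of the number of edits for each variable; the sketch below recomputes them from the counts on the slide. The dictionary layout and the function name are hypothetical. The percentage attached to "N° edit" itself (1.85 % and 0.68 %) needs a denominator that the slide does not give, so it is not recomputed.

```python
# Counts for t = 2005 as shown on the slide.
edit_counts = {
    "NACE": {"edit": 202_333, "imputation": 87_628, "edit_without_imputation": 114_705},
    "Empl.": {"edit": 74_312, "imputation": 72_768, "edit_without_imputation": 1_544},
}


def shares_of_edits(counts: dict) -> dict:
    """Express imputations and non-imputed edits as a percentage of all edits."""
    total = counts["edit"]
    return {
        key: round(100 * counts[key] / total, 2)
        for key in ("imputation", "edit_without_imputation")
    }


nace_shares = shares_of_edits(edit_counts["NACE"])
empl_shares = shares_of_edits(edit_counts["Empl."])
print(nace_shares)  # {'imputation': 43.31, 'edit_without_imputation': 56.69}
print(empl_shares)  # {'imputation': 97.92, 'edit_without_imputation': 2.08}
```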
Phase: Process / Component: accuracy / Factor: internal consistency
Source: Tax Authority
Indicator: out-of-date classification
INDICATOR: N° records with out-of-date classification that are not decoded using NACE Rev 1.1
I(t=2005): 725,697
I(t=2005) %: 9.53 %
Var_I[t-(t-1)]: 0.84 %
Phase: Output / Component: accuracy / Factor: compliance
Source: SME sample survey
Indicator: differences in address and activity status
[Chart: differences in address and in activity status, 1999 to 2005; series: Address, Activity Status; vertical axis from 0.0 to 10.0]
Phase: Output / Component: accuracy / Factor: temporal consistency
Indicator: coherence in activity status time series
(t-2)_(t-1)_(t)   Population                            N    error
001               Entries                         442,352      0
000               Out never active              2,275,196      0
111               Active                        3,597,559      0
110               Exits                           313,413      0
100               Exits in t-1 and not active     225,868      0
011               Entries in t-1 and active       365,097      0
010               Deactivations                    54,848    1.5
101               Reactivations                    52,567    1.5
$I_j = 100 - \left[ \frac{\sum_k x_{kj} \, e_k}{\sum_k x_{kj}} \times 100 \right]$
$I_{2005} = 97.8$
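The reported value I2005 = 97.8 can be reproduced if the formula is read with sums over the activity-status patterns k: the error-weighted count is divided by the total count, multiplied by 100, and subtracted from 100. The summation reading, the variable names and the dictionary layout are assumptions; the counts and error weights are those shown on the slide.

```python
# Pattern (t-2)(t-1)(t) -> (number of units N, error weight) for 2005.
patterns_2005 = {
    "001": (442_352, 0.0),    # entries
    "000": (2_275_196, 0.0),  # out, never active
    "111": (3_597_559, 0.0),  # active
    "110": (313_413, 0.0),    # exits
    "100": (225_868, 0.0),    # exits in t-1 and not active
    "011": (365_097, 0.0),    # entries in t-1 and active
    "010": (54_848, 1.5),     # active only in t-1
    "101": (52_567, 1.5),     # reactivations
}


def coherence_indicator(patterns: dict) -> float:
    """I_j = 100 - [sum_k(x_kj * e_k) / sum_k(x_kj) * 100]."""
    weighted_errors = sum(n * error for n, error in patterns.values())
    total_units = sum(n for n, _ in patterns.values())
    return 100 - weighted_errors / total_units * 100


i_2005 = round(coherence_indicator(patterns_2005), 1)
print(i_2005)  # 97.8
```

Only the two patterns with a non-zero weight (010 and 101) lower the indicator, which is why a small share of incoherent sequences still leaves the value close to 100.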
The BR’s Quality Declaration (QD)
The QD is a complex system of quality indicators.
The QD is based on the concept of transparency:
it supplies all the meaningful and useful tools for
measuring the different quality components at
each stage of the process. It consists of rich
documentation built on a set of key direct
and indirect indicators, each with a time dimension, for
data, sources and variables.
The QD contains:
- metadata
- a set of easily interpretable indicators
The BR’s Quality Declaration (QD)
1. Phases of the process
2. Components
3. Factors
Input
C: timeliness, coverage, completeness
F: temporal consistency, internal consistency
Process
C: coverage, accuracy
F: temporal consistency, internal consistency, metadata
Output
C: timeliness, coverage, completeness, accuracy
F: compliance, internal consistency, metadata
The BR’s Quality Declaration (QD)
37 indicators have been identified, cross-classified by phase, criterion and component:
Criteria: Compliance, Int. Consistency, Temp. Consistency, Metadata
Components: Timeliness, Coverage, Completeness, Accuracy
Cell counts by phase:
INPUT: 2, 7, 1, 3
PROCESS: 2, 2, 1, 3
OUTPUT: 2, 2, 4, 6, 1, 1
The BR’s Quality Declaration (QD)
1. Quality of Input
Component – 1.1 Completeness
1.1.1) Address, s=CCIAA: Number of records (% weight) with missing information
INDICATOR: Records with missing address (CCIAA)
COMPUTATION: % weight (abs. number of records)
I(t=2005): 0.49 (37,408)
VI = I(t=2005) - I(t=2004): -0.03
2. Quality of process
Component – 2.1 Coverage
1) Number of records, s=CCIAA, not matched with the base MEF
INDICATOR: Not matched records (CCIAA)
COMPUTATION: % weight (abs. number of records)
I(t=2005): 5.25 (338,304)
VI = I(t=2005) - I(t=2004): 0.03
3. Quality of output
Component – 3.2 Timeliness
3.2 Lag, in days, between dissemination time of BR and reference year of data
INDICATOR: Timeliness of BR dissemination
COMPUTATION: Days of delay between the dissemination time and the reference year of data
I(t=2005): 492
VI = I(t=2005) - I(t=2004): +24
The BR’s Quality Declaration (QD)
The QD was disseminated to internal users
for the first time in 2007.
Problems not yet solved:
1. Dissemination of a different version for
external users, containing only metadata and
indicators on the quality of output.
2. The need for a synthetic view of the
proposed indicators through “compound
indicators”.
3. Internal users were involved in the discussion
of the QD, but their suggestions have not yet
been analysed in depth.