QUALITY ASSESSMENT OF THE REGISTER-BASED SLOVENIAN CENSUS 2011 Rudi Seljak, Apolonija Flander Oblak Statistical Office of the Republic of Slovenia.


QUALITY ASSESSMENT OF
THE REGISTER-BASED
SLOVENIAN CENSUS 2011
Rudi Seljak, Apolonija Flander Oblak
Statistical Office of the Republic of
Slovenia
Outline of the presentation
• Administrative data in the statistical process
• Quality assessment concepts
• Quality assessment and statistics based on
administrative sources
• Slovenian plans for the register-based census and its
quality assessment
• Conclusions
Administrative data in the statistical
process
• Administrative data have long been used in official
statistics.
• At first, administrative data were used mostly for
sampling frame construction and sample selection.
• Later, developments in statistical theory introduced
the use of these data in the estimation process in
order to improve the accuracy of the statistical results
(calibration techniques).
• More recently, administrative data have increasingly been
used as a direct data source as well.
• Using administrative data as a direct data source
usually causes significant changes in the statistical
process.
Administrative data at the Statistical Office of
the Republic of Slovenia (SORS)
• SORS has a long history of using administrative data
in the statistical process.
• Many administrative registers that are now maintained
by other authorities were set up, and for some time
also maintained, by SORS.
• Administrative data are used in many surveys as a
supplementary data source (e.g. EU-SILC).
• The 2002 Census of Population and Housing was still a
combined one, while the next one, in 2011, is
planned to be fully register-based.
Quality assessment and reporting in the
ESS
• The general quality assessment framework, which has been
widely accepted within the European statistical system over
the last decade, is based on the definition of six quality
components.
• For each component, a set of standard quality indicators
was defined to provide a numerical assessment of the
quality.
• The six components are: relevance; accuracy; timeliness and
punctuality; accessibility and clarity; comparability; coherence.
Costs and burden is an additional (seventh) component.
• For presentation purposes, we divide the components
into two groups: product oriented and process oriented.
Quality components - the model
[Diagram: the quality components divided into product oriented and process oriented. Components shown: relevance; coherence; comparability; accuracy; timeliness and punctuality; accessibility and clarity; costs and burden.]
Process oriented components and the
“classical” statistical process
• Sampling frame construction, sample selection: coverage errors, sampling errors (accuracy)
• Data collection: response errors, measurement errors (accuracy)
• Data processing: processing errors (accuracy)
• Dissemination of the statistical results: timeliness of the first results, number of means used for the dissemination (timeliness and punctuality, accessibility and clarity)
Process oriented components and the
“register-based” statistical process
• Sampling frame construction, sample selection: relevance of the variables, up-to-dateness of the register data (relevance)
• Data collection: matching errors, consistency errors (accuracy)
• Data processing: processing errors (accuracy)
• Dissemination of the statistical results: timeliness of the first results, number of means used for the dissemination (timeliness and punctuality, accessibility and clarity)
The forthcoming Slovenian census
• The most recent Slovenian census, in 2002, was already partly
based on administrative data sources.
• The 2011 census is planned to be fully register-based.
• The main administrative sources that will define the
target populations will be:
– Central Population Register
– e-Database of Households
– Register of Dwellings
• Many other data sources will be linked to the target
populations in the data integration phase.
The register-based census – main
obstacles
• At the moment, the main obstacles concerning the data
sources for the register-based census are:
– The Register of Dwellings is still in the establishment phase.
– The quality of the data in the e-Database of Households
(maintained by the Ministry of the Interior) is not satisfactory,
since the database was established only recently.
– Data on relationships among persons in the Central Population
Register (CPR) are not complete enough to enable SORS to
easily identify all relationships among persons in families.
The register-based census – the foreseen
process
[Flowchart of the foreseen process. Elements shown:]
• Source labels: RU, RD, RTU, SREP, . . ., RH, CPR, RT
• Registration of sources
• Conversion of identifiers to SID
• Formal consistency checks
• Target population (CPR) and target population (RD)
• Data integration: linking and matching of data (shown twice in the diagram)
• Missing and erroneous data imputation
• Consistency and distributional checks
• Data disclosure procedures
• Dissemination databases: population; households and families; buildings and dwellings
Census quality assessment – prior
activities
• Identification of the relevant administrative and statistical
sources.
• For each source the following information will be documented:
– name of the source,
– description of the source:
  • date of the establishment,
  • legal basis,
  • identifiers used,
  • system of updating,
  • structure,
– method and frequency of acceptance of the source at SORS,
– data editing at SORS.
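The documentation items listed above could be kept as one structured record per source, which makes it easy to see which items are still undocumented. The following is only a minimal sketch: the class names, field names, the `missing_items` helper and the placeholder values are all hypothetical and are not taken from any SORS system.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import List


@dataclass
class SourceDescription:
    # Sub-items of "description of the source" (field names are hypothetical).
    date_of_establishment: str
    legal_basis: str
    identifiers_used: List[str]
    system_of_updating: str
    structure: str


@dataclass
class SourceRecord:
    # One documentation record per administrative or statistical source.
    name: str
    description: SourceDescription
    acceptance_method_and_frequency: str
    data_editing_at_sors: str


def missing_items(record, prefix=""):
    """Return the names of documentation items that are still empty."""
    missing = []
    for f in fields(record):
        value = getattr(record, f.name)
        if is_dataclass(value):
            # Descend into the nested description record.
            missing.extend(missing_items(value, prefix + f.name + "."))
        elif not value:
            missing.append(prefix + f.name)
    return missing


# Placeholder values only; they do not describe a real register.
example = SourceRecord(
    name="Example register",
    description=SourceDescription(
        date_of_establishment="",
        legal_basis="(placeholder)",
        identifiers_used=["person_id"],
        system_of_updating="",
        structure="(placeholder)",
    ),
    acceptance_method_and_frequency="",
    data_editing_at_sors="(placeholder)",
)

print(missing_items(example))
```

A record with an empty list of missing items would then be considered fully documented for the purposes of the quality report.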
Census quality assessment – the
process
• Registration, conversion of the identifiers, validity checks
– Information on the registration of the source, record of the
failed validity checks
– Consistency with the predefined range of variables in the source
• Set-up of the two basic target populations (persons; houses
and dwellings)
– Consistency with the theoretical description of the population
– Detection of duplicates
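Two of the checks named on this slide, consistency with a predefined range of values and detection of duplicates, can be illustrated with a short sketch. The record layout, the field names `sid` and `sex`, the allowed value set and the sample data are assumptions made for illustration only.

```python
from collections import Counter


def range_violations(records, variable, allowed):
    """Return the records whose value of `variable` is outside the allowed set."""
    return [r for r in records if r.get(variable) not in allowed]


def duplicate_ids(records, id_field="sid"):
    """Return the identifiers that occur more than once, in sorted order."""
    counts = Counter(r[id_field] for r in records)
    return sorted(i for i, n in counts.items() if n > 1)


# Invented sample data: identifier 2 is duplicated, "X" is outside the range.
records = [
    {"sid": 1, "sex": "M"},
    {"sid": 2, "sex": "F"},
    {"sid": 2, "sex": "F"},
    {"sid": 3, "sex": "X"},
]

failed = range_violations(records, "sex", allowed={"M", "F"})
failure_rate = len(failed) / len(records)  # share of records failing the check
duplicates = duplicate_ids(records)

print(failed, failure_rate, duplicates)
```

The share of failed records and the list of duplicated identifiers are the kind of numerical evidence that could be logged for each source when it is registered.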
Census quality assessment – the
process cont’d
• Integration of the different sources
– The emphasis will be on the numerical assessment of quality.
– At the moment, standard quality indicators are lacking
for this phase of the process.
– Two examples of possible quality indicators: the matching
rate and the relative consistency rate
• Matching rate: the share of records successfully matched (directly
or statistically) to the reference population.
• Relative consistency rate: the share of units for which the
condition |Y_R - Y_A| / Y_R < p (e.g. p = 0.01) is fulfilled.
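The two indicators can be computed as simple proportions. The sketch below is illustrative only and makes several assumptions: matching is done directly on an identifier (statistical matching is not covered), each unit supplies a pair of values standing for Y_R and Y_A, and the condition is evaluated as |Y_R - Y_A| < p * |Y_R|, which is equivalent to the slide's formula for positive Y_R and counts a unit with Y_R = 0 as inconsistent. The function names and sample values are invented.

```python
def matching_rate(source_ids, reference_ids):
    """Share of source records whose identifier is found in the reference population."""
    source = list(source_ids)
    if not source:
        raise ValueError("source_ids must not be empty")
    reference = set(reference_ids)
    matched = sum(1 for i in source if i in reference)
    return matched / len(source)


def relative_consistency_rate(pairs, p=0.01):
    """Share of units with |y_r - y_a| < p * |y_r|, given (y_r, y_a) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("pairs must not be empty")
    consistent = sum(1 for y_r, y_a in pairs if abs(y_r - y_a) < p * abs(y_r))
    return consistent / len(pairs)


# Invented example: four of the five source identifiers exist in the reference.
m_rate = matching_rate([1, 2, 3, 4, 5], [1, 2, 3, 4, 9])

# Invented example: the second unit differs by 5 percent and fails at p = 0.01.
c_rate = relative_consistency_rate(
    [(100, 100.5), (200, 210), (50, 50), (1000, 1005)], p=0.01
)

print(m_rate, c_rate)  # 0.8 0.75
```

Avoiding the division keeps the check well defined when Y_R is zero; whether such units should instead be excluded is a design choice the slides do not settle.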
Census quality assessment – posterior
activities
• Comparability of the results. The main focus will be on
comparability with the 2002 (mostly) conventional census.
• Coherence with the results of some other, “classical”
statistical surveys such as EU-SILC, LFS
and HBS.
• The costs and burden will be estimated and compared
with those of the previous census.
• Finally, a comprehensive quality report covering all the
components mentioned above will be prepared
and published on the internet.
Conclusions
• Moving from a conventional to a register-based
statistical survey demands an adjusted approach to data
quality assessment.
• The relevance of the administrative and statistical sources,
and of the statistical variables derived from
these sources, is the crucial component of the new approach.
• Exhaustive documentation describing the
different aspects of the incoming sources should become an
important part of the quality report.
• A set of standard quality indicators should be adopted for
register-based censuses.