TAX DATA AS A MEANS FOR THE ESSENTIAL REDUCTION OF THE SHORT-TERM SURVEYS RESPONSE BURDEN

Rudi Seljak, Metka Zaletel Statistical Office of the Republic of Slovenia

Introduction

In recent years, national statistical institutes have constantly faced two challenges, which are especially pressing for short-term business surveys:
- how to improve the timeliness of the published data;
- how to decrease the response burden and the survey costs.
One of the most frequently used ways of meeting at least some of these demands is the appropriate use of various types of administrative data.

In recent years, many offices have been exploring the possibility of using tax data, originally collected for the monthly settlement of value added tax (VAT), to estimate turnover indices.

Introduction cont’d

The Statistical Office of the Republic of Slovenia (SORS) began the first systematic studies in this area in 2005.

In 2005 a feasibility study was carried out to explore the possibility of using VAT data to estimate turnover indices for wholesale trade. The foundations of the new methodology were laid on the basis of its results.

This methodology was then adopted and applied to some other areas.

Main features of the new methodology

One of the most significant changes in the new methodology is the move from random sampling to cut-off sampling. The sampling error is thus “replaced” by the bias due to omitting part of the population, and one of the goals of the feasibility study was to estimate the size of this bias. The new methodology combines two types of data: a classical postal survey is still carried out for a small number of the largest units, while VAT data are used for the majority of units. As a result, statistical data processing has changed significantly.
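To illustrate what this bias means for a month-to-month index, the following Python sketch compares the index computed from all units with the index computed only from units above a cut-off threshold. It assumes, as in a simulation study, that turnover is known for the whole population. The function names, the use of previous-month turnover as the cut-off variable and the data are hypothetical; the actual selection criteria are based on semi-annual turnover and employment.

```python
from typing import List, Tuple

# Each unit is a hypothetical (previous-month, current-month) turnover pair.
Unit = Tuple[float, float]


def index(units: List[Unit]) -> float:
    """Month-to-month index of total turnover (previous month = 100)."""
    return 100 * sum(c for _, c in units) / sum(p for p, _ in units)


def cutoff_relative_bias(units: List[Unit], threshold: float) -> float:
    """Relative bias of the index when units below `threshold`
    (previous-month turnover, an illustrative cut-off variable) are omitted."""
    full = index(units)
    kept = [u for u in units if u[0] >= threshold]
    return (index(kept) - full) / full


units = [(1000.0, 1050.0), (800.0, 820.0), (50.0, 40.0), (30.0, 24.0)]
print(round(cutoff_relative_bias(units, threshold=100.0), 4))  # 0.0099
```

In this toy example the omitted small units shrink while the large ones grow, so the cut-off index overstates the growth by about 1%.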

Feasibility study

In the feasibility study we simulated the data collection process under the new methodology for all months of 2003-2005 and then compared the “new” results with the originally published results.

The turnover levels sometimes differed substantially, but the movements, expressed as indices, were in most cases quite coherent.

We used the correlation coefficient as the main indicator of coherence between the two index time series. With the exception of some smaller domains, the coefficient was around 0.9. For the “problematic” domains we increased the number of units to be surveyed.
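The correlation coefficient between the index series produced by the old and the new methodology can be computed as in the following Python sketch. The index values are hypothetical, not the published results. On Python 3.10 and later, `statistics.correlation` returns the same Pearson coefficient.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two index series of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical month-to-month indices for one domain.
old_method = [101.2, 98.5, 104.3, 99.0, 102.7]
new_method = [100.8, 97.9, 105.1, 98.6, 103.5]
print(round(pearson(old_method, new_method), 3))  # 0.995
```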

Comparison of time series obtained by two different methodologies

[Chart: Month-to-month indices in Wholesale Trade activity, February 2003 to August 2005 (y-axis 60 to 130); series: New methodology, Old methodology]

Main steps of the process

- Selection of the set of observational units
- Selection of the set of units to be surveyed
- Collection and editing of survey data
- Merging of survey and administrative data
- Detection of outlying values using the Hidiroglou-Berthelot method
- Imputation for non-response
- Aggregation and calculation of processing quality indicators

Selection process

The whole procedure is carried out in two steps. In the first step the units of the target population are determined; in the second step the units for which data will still be obtained by the “classical survey” are selected. In the first step, units that fulfill at least one of the following criteria are selected:
- the semi-annual turnover of the unit is more than 100,000 EUR;
- the semi-annual turnover of the unit is more than 50,000 EUR and the unit has at least 3 employees;
- the unit has at least 6 employees.
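The three criteria of the first step can be written as a single predicate. The function and parameter names are hypothetical, and the sketch assumes that semi-annual turnover (in EUR) and the number of employees are already available for every unit.

```python
def in_target_population(semiannual_turnover_eur: float, employees: int) -> bool:
    """First selection step: a unit enters the target population if it
    meets at least one of the three criteria."""
    return (
        semiannual_turnover_eur > 100_000
        or (semiannual_turnover_eur > 50_000 and employees >= 3)
        or employees >= 6
    )


print(in_target_population(60_000, 3))   # True: second criterion
print(in_target_population(100_000, 2))  # False: no criterion met
```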

Selection process cont’d

For a smaller part of the units, the data are still obtained by the postal survey.

To select the units to be surveyed, the target population in each activity group is first sorted by descending turnover. The largest units of the group are then selected until the share of the selected units in the group's total turnover exceeds the target share. The target share differs slightly between activity groups, but is generally between 50% and 60%.

The number of units to be surveyed is approximately 2% of the whole target population.

Selection process - schematic presentation

[Diagram: Business register, Admin. data and Survey data; Sorted data; Selected population, split into Units to be surveyed and Units for the admin. data]

Merging data from different sources

The data enter the process through two different channels. Each data set is first edited separately using consistency checks.

The data from the different sources are then merged into one table, and each turnover value is assigned an appropriate status.

The status records the data collection method and whether the value was corrected during editing.

Status values are assigned according to the standard 4-digit classification used at SORS.
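A minimal sketch of the merging step, assuming each edited data set is a list of records. Instead of reproducing the SORS 4-digit status classification, which is not described here, each record simply carries its collection channel and a correction flag. The `Record` type, its field names and the rule that survey data take precedence for a unit present in both sets are assumptions for illustration; the IDs and turnover values are borrowed from the schematic.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    unit_id: int
    turnover: Optional[float]  # None marks a missing value
    source: str                # collection channel: "survey" or "tax" (hypothetical labels)
    corrected: bool = False    # True if the value was changed during editing


def merge_sources(survey: List[Record], tax: List[Record]) -> List[Record]:
    """Merge the separately edited survey and tax data into one table.

    Assumption: if a unit occurs in both sets, its survey record is kept.
    """
    merged: Dict[int, Record] = {r.unit_id: r for r in tax}
    for r in survey:
        merged[r.unit_id] = r  # survey record overrides the tax record
    return [merged[k] for k in sorted(merged)]


table = merge_sources(
    survey=[Record(10230, 572.0, "survey")],
    tax=[Record(10010, 124323.0, "tax", corrected=True)],
)
for rec in table:
    print(rec.unit_id, rec.turnover, rec.source, rec.corrected)
```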

Merging data from different sources – schematic presentation

[Diagram: Tax data database and Survey data are merged into one table]

ID      TURNOVER   STATUS
10010   124323     11.12
10230   572        21.11

Statistical editing

Once the data are merged, we use the Hidiroglou-Berthelot method to detect outliers.

The method examines the distribution of month-to-month growth rates to find extreme values. The main goal is to detect “extreme leaps” in the turnover estimated from VAT data. These leaps are usually a consequence of methodological differences between administrative and statistical data. Such problems most often occur when an enterprise sells real property: the proceeds are reported to the tax authorities but should not be included in turnover.
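The Hidiroglou-Berthelot method can be sketched as follows for one domain, using its standard formulation. The month-to-month ratios are centred on their median, scaled by unit size through the exponent U, and flagged when they fall outside a band around the median of these scaled values. The band width is set by the quartile distances and the parameter C. The default values U = 0.5, A = 0.05 and C = 4 are illustrative assumptions, not the parameters used at SORS. The sketch also assumes strictly positive turnover in both months.

```python
import statistics
from typing import List


def hb_outliers(prev: List[float], curr: List[float],
                u: float = 0.5, a: float = 0.05, c: float = 4.0) -> List[bool]:
    """Flag outlying month-to-month changes with the Hidiroglou-Berthelot method.

    prev and curr hold the (strictly positive) turnover of the same units in
    the previous and the current month. u, a and c are tuning parameters;
    the defaults are illustrative assumptions.
    """
    ratios = [x1 / x0 for x0, x1 in zip(prev, curr)]
    r_med = statistics.median(ratios)

    # Centring: decreases and increases become comparable around 0.
    s = [1 - r_med / r if r < r_med else r / r_med - 1 for r in ratios]

    # Size effect: changes of large units weigh more than those of small ones.
    e = [si * max(x0, x1) ** u for si, x0, x1 in zip(s, prev, curr)]

    # Quartiles of the effects define an asymmetric acceptance band.
    q1, e_med, q3 = statistics.quantiles(e, n=4)
    d_low = max(e_med - q1, abs(a * e_med))
    d_high = max(q3 - e_med, abs(a * e_med))

    return [ei < e_med - c * d_low or ei > e_med + c * d_high for ei in e]


prev = [100, 200, 150, 120, 300, 250, 180, 90]
curr = [105, 198, 160, 118, 310, 2600, 175, 95]
print(hb_outliers(prev, curr))
# [False, False, False, False, False, True, False, False]
```

In this hypothetical example, only the jump of the sixth unit from 250 to 2600 is flagged. This is the kind of leap a sale of real property would cause.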

Imputation procedures

In the imputation process we impute missing values as well as values flagged as extreme during statistical editing. Three imputation methods are used:
- estimation of monthly data from quarterly data (only at the end of each quarter);
- the historical trend method (only for units with data for the previous month);
- the mean value method.

For each imputed value, the status records both the reason for imputation and the imputation method used.
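The source does not spell out the formulas, so the following Python sketch assumes common textbook forms of the two monthly methods. The historical trend method multiplies the unit's previous-month value by the growth of the total turnover of reporting units in the same group. The mean value method uses the mean of the reporting units' current values. The estimation of monthly data from quarterly data is not covered. The function name and data are hypothetical.

```python
from statistics import mean
from typing import Dict, Optional, Set, Tuple


def impute_group(prev: Dict[int, Optional[float]],
                 curr: Dict[int, Optional[float]],
                 flagged: Set[int]) -> Dict[int, Tuple[float, str]]:
    """Impute current-month turnover for one imputation group (assumed formulas).

    A unit is imputed if its current value is missing (None) or was flagged
    as an outlier. Returns {unit_id: (imputed_value, method)}, so the method
    can be recorded in the unit's status.
    """
    # Reported, non-flagged current values serve as donors.
    usable = {k: v for k, v in curr.items() if v is not None and k not in flagged}
    trend_units = [k for k in usable if prev.get(k) is not None]
    trend = (sum(usable[k] for k in trend_units) / sum(prev[k] for k in trend_units)
             if trend_units else None)
    group_mean = mean(list(usable.values())) if usable else None

    imputed: Dict[int, Tuple[float, str]] = {}
    for k in curr:
        if k in usable:
            continue  # reported value is kept
        if trend is not None and prev.get(k) is not None:
            # Historical trend: previous value times the group growth rate.
            imputed[k] = (prev[k] * trend, "historical trend")
        elif group_mean is not None:
            # Mean value: no previous-month data for this unit.
            imputed[k] = (group_mean, "mean value")
    return imputed


prev = {1: 100.0, 2: 200.0, 3: 50.0, 4: 80.0}
curr = {1: 110.0, 2: 220.0, 3: None, 4: 900.0, 5: None}
for unit, (value, method) in sorted(impute_group(prev, curr, flagged={4}).items()):
    print(unit, round(value, 2), method)
```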

Editing and imputation – schematic presentation

[Diagram: the merged data pass through the H-B method (detection of outliers) and the imputations to give the imputed data]

Merged data:
ID      TURNOVER   STATUS   OUTLIER
10010   124323     11.12    Y
10230   572        21.11    N
13213   Null       Null     Null

Imputed data:
ID      TURNOVER   STATUS
10010   19345      12.13
10230   572        21.11
13213   28122      41.14

Quality indicators

Using the status values, in which all “process changes” are recorded, a set of quality indicators is calculated automatically. Two types of indicators are calculated: micro and macro indicators. An example of a micro indicator is the imputation rate, defined as the share of values that were imputed during processing. An example of a macro indicator is the relative difference between the index calculated from all data and the index calculated from non-imputed data only.
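Both example indicators can be computed from per-unit imputation flags, as in the following Python sketch. It assumes that the index is the ratio of current to previous-month total turnover (times 100). It also assumes that the relative difference is taken with respect to the index from non-imputed data. Both definitions, the `Unit` type and the data are illustrative assumptions.

```python
from typing import List, NamedTuple


class Unit(NamedTuple):
    prev: float    # turnover in the previous month
    curr: float    # turnover in the current month (reported or imputed)
    imputed: bool  # True if `curr` was imputed


def imputation_rate(units: List[Unit]) -> float:
    """Micro indicator: share of current-month values that were imputed."""
    return sum(u.imputed for u in units) / len(units)


def index(units: List[Unit]) -> float:
    """Month-to-month index of total turnover (previous month = 100)."""
    return 100 * sum(u.curr for u in units) / sum(u.prev for u in units)


def imputation_impact(units: List[Unit]) -> float:
    """Macro indicator: relative difference between the index from all data
    and the index from non-imputed data only."""
    observed = index([u for u in units if not u.imputed])
    return (index(units) - observed) / observed


units = [Unit(100.0, 110.0, False), Unit(200.0, 210.0, False),
         Unit(50.0, 70.0, True), Unit(150.0, 160.0, False)]
print(imputation_rate(units))               # 0.25
print(round(imputation_impact(units), 5))   # 0.03125
```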

Quality indicators cont’d

All quality indicators are calculated automatically and inserted into an Excel spreadsheet template. The indicators for the last 13 months can also be presented graphically.

Response rate

Month   Domain1  Domain2  Domain3  Domain4
MAR06   97.6%    99.5%    95.2%    99.1%
APR06   97.5%    99.4%    95.1%    99.1%
MAY06   97.4%    99.3%    94.9%    99.1%
JUN06   97.2%    99.3%    94.8%    99.1%
JUL06   97.2%    98.8%    95.4%    97.5%
AUG06   97.3%    98.8%    95.1%    97.5%
SEP06   97.1%    98.5%    94.9%    97.0%
OCT06   97.1%    98.4%    94.8%    97.0%
NOV06   96.8%    98.0%    94.5%    96.6%
DEC06   93.7%    94.7%    91.5%    94.1%
JAN07   76.4%    92.2%    54.6%    90.4%
FEB07   76.1%    91.5%    54.4%    89.1%
MAR07   73.3%    87.8%    53.1%    87.0%

[Chart: Response rate for Domain4, February 2006 to March 2007 (y-axis 80% to 105%)]

Quality indicators cont’d

One of the macro indicators compares the indices calculated from the whole data set with the indices calculated only from the “survey data” and only from the “admin data”.

[Table and chart: Domain1 month-to-month indices, February 2006 to March 2007, calculated from all data, from VAT data and from field (survey) data]

Benefits of the new system

The new methodology represents a radical change in the production of short-term indices. Although the new system has some deficiencies, its benefits far outweigh them. The largest benefit is the substantial reduction of the response burden, together with the reduction of survey costs. To quantify these benefits, we estimated the burden and cost reductions, both expressed in man-days.

The estimation was done for two areas, “Hotels and restaurants” and “Services”. The chart compares the costs and burden in 2006, when the old methodology was still in use, with 2007, when the new methodology was launched.

Response burden and cost reduction

[Chart: Response burden and costs at SORS in man-days (y-axis 0 to 1200), 2006 vs 2007, for Hotels and restaurants and for Services]

Conclusions

     SORS started to implement the new methodology for the estimation of the monthly turnover indices in 2006.

The new methodology combines two different sources: survey data for a smaller part of the units and administrative data for the larger part. Although there are differences in the methodological definitions of turnover, all the studies showed that administrative data can be used well for short-term statistics. The new methodology brings a substantial decrease in costs and response burden.

The new methodology is planned to be extended to retail trade in 2008.
