Transcript ac258 P18 e
Backcasting
United Nations Statistics Division
Overview
Any change in classifications creates a break in time series, since they are suddenly based on differently formed categories
Backcasting is a process to describe data collected before the “break” in terms of the new classification
Overview
There is no single “best method”
Factors influencing a decision include:
type of statistical series that requires backcasting (raw data, aggregates, indices, growth rates, ...)
statistical domain of the time series
availability of micro-data
availability of "dual coded" micro-data (i.e. businesses are classified according to both the old and the new classification)
length of the "dual coded" period
frequency of the existing time series
required level of detail of the backcast series
cost / resource considerations
Main methods
“Micro-data approach” (re-working of individual data)
“Macro-data approach” (proportional approach)
Hybrids thereof
Micro-data approach
Consists of assigning a new activity code (= new classification) to all units in every period in the past (as far back as backcasting is desired)
No other change is required
Statistics are then compiled by standard aggregation
Census vs. survey (weight adjustment issue)
Micro-data approach
Census
All in-scope units are selected and therefore have a weight of one.
Each unit is recoded, and then the re-aggregation can take place.
Survey
The non-observed units in the population influence the outcome via the sampling weights
Therefore all units in the population (both observed and non-observed) need to be coded
Re-aggregation of the sample units under the new classification can then take place.
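A minimal sketch of the census and survey cases, assuming post-stratified weights (weight = N_h / n_h for new code h) as the weighting scheme; the slides do not prescribe one, and the function name and figures are hypothetical. It illustrates why the non-observed units must be recoded too: their new codes enter the estimate through the population counts N_h. In a census every unit is observed, so N_h = n_h and each weight is one.

```python
from collections import Counter, defaultdict


def poststratified_totals(sample, population_codes):
    """Estimate totals of y by NEW activity code.

    sample: list of (new_code, y) pairs for the observed units,
            after they have been recoded to the new classification.
    population_codes: new code of EVERY population unit, observed or not.
    Weights are post-stratified (an assumption): N_h / n_h per new code h.
    """
    pop_size = Counter(population_codes)               # N_h
    sample_size = Counter(code for code, _ in sample)  # n_h
    totals = defaultdict(float)
    for code, y in sample:
        totals[code] += pop_size[code] / sample_size[code] * y
    return dict(totals)


# Census: all units observed, so N_h == n_h and every weight is 1.
census = [("1B", 88.0), ("1B", 135.0), ("2B", 51.0)]
print(poststratified_totals(census, ["1B", "1B", "2B"]))
# {'1B': 223.0, '2B': 51.0}

# Survey: 5 of 10 units observed; the unobserved ones enter via N_h.
sample = [("1B", 10.0), ("1B", 14.0), ("2B", 5.0), ("2B", 7.0), ("2B", 9.0)]
print(poststratified_totals(sample, ["1B"] * 4 + ["2B"] * 6))
# {'1B': 48.0, '2B': 42.0}
```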
Micro-data approach
Requires detailed information from past periods (for all units to be recoded)
This information must be more detailed than just the old code
If the information is available, results are more reliable than those from macro-approaches
Micro-data approach
Issues:
Resource intensive
Need solutions if unit information is not available for a period (not collected, no response)
• Nearest neighbor
– Back calculation for the elementary unit is made in the same way as for the “closest unit”.
• Transition matrix approach
– Using conversion coefficients at the elementary level
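A sketch of the nearest-neighbor option, under a hypothetical definition of “closest unit”: a donor with the same old code and the smallest difference in employment. The function and field names are illustrative; the donor values are units 1, 5, 14, 15 and 16 of the hypothetical dual-coded register later in this section.

```python
def nearest_neighbor_code(target, donors):
    """Return the new code of the donor closest to `target`.

    target: dict with 'old_code' and 'emp' for a unit whose own
            information is missing in the period being backcast.
    donors: dicts with 'old_code', 'emp' and 'new_code' for units
            that could be recoded from their own information.
    "Closest" is assumed to mean: same old code, smallest absolute
    difference in employment (a hypothetical choice of distance).
    """
    candidates = [d for d in donors if d["old_code"] == target["old_code"]]
    if not candidates:
        raise ValueError("no donor with the same old code")
    closest = min(candidates, key=lambda d: abs(d["emp"] - target["emp"]))
    return closest["new_code"]


donors = [
    {"old_code": "1A", "emp": 2, "new_code": "1B"},   # unit 1
    {"old_code": "1A", "emp": 11, "new_code": "1B"},  # unit 5
    {"old_code": "1A", "emp": 13, "new_code": "2B"},  # unit 14
    {"old_code": "1A", "emp": 21, "new_code": "2B"},  # unit 15
    {"old_code": "2A", "emp": 3, "new_code": "1B"},   # unit 16
]
print(nearest_neighbor_code({"old_code": "1A", "emp": 20}, donors))  # 2B
```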
Macro-data approach
Also called “proportional method”
This method calculates a ratio (“proportion”, “conversion coefficients”) in a fixed dual-coding period that is then applied to all previous periods
The ratios are calculated at the macro level
Could be based on number of units (counts) or size variables such as turnover or employment
Results have a more approximate character
Macro-data approach
In simple form, applies growth rates of the former time series to the revised level for the whole historical period
More sophisticated methods may use adjustments based on experts’ knowledge
Example: mobile phones
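A minimal sketch of the simple form, assuming the revised (new-classification) level is known only for the latest period and the old series is complete; the function name and figures are hypothetical. Each step back divides by the old series' growth factor, which is equivalent to scaling every old value by revised_level / old_series[-1].

```python
def backcast_with_old_growth(old_series, revised_level):
    """Carry the revised level back in time using the period-to-period
    growth of the old-classification series.

    old_series: old-classification values, oldest first; the LAST value
                refers to the period for which revised_level is known.
    Returns the backcast series, oldest first.
    """
    backcast = [revised_level]
    for t in range(len(old_series) - 1, 0, -1):
        growth = old_series[t] / old_series[t - 1]  # old growth factor t-1 -> t
        backcast.append(backcast[-1] / growth)      # step one period back
    return backcast[::-1]


print(backcast_with_old_growth([100.0, 200.0, 400.0], 1000.0))
# [250.0, 500.0, 1000.0]
```

Because the ratio between the old and the revised level is held fixed, the old classification's growth pattern passes unchanged into the backcast; any experts' adjustments would be applied on top.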
Macro-data approach
Assumes that the same set of coefficients applies to all periods
This means it is assumed that the distribution of the variable of interest between the old and the new classification categories has not changed over time
Applied to aggregates; does not consider micro-data
Relatively simple and cheap to implement
Macro-data approach: Steps
1 – estimation of conversion coefficients
• Done for dual-coding period
– Longer/multiple periods help in overcoming “infant problems” of the new classification and allow for correction of data
– Based on selection of a specific variable
2 – calculation of aggregates using the conversion coefficients
• Weighted linear combination
3 – linking the different segments
• Old – overlap – new series
• Breaks caused mainly by changes in the field of observation
– Simple factor or “wedging”
4 – final adjustment
• Seasonal etc.
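Step 3 with a simple factor can be sketched as follows, assuming the factor is the ratio of the new to the old series summed over the overlap periods; the function name and data are hypothetical. “Wedging”, which spreads the adjustment gradually over time, is not shown.

```python
def link_simple_factor(old, new, overlap):
    """Link an old segment to a new one with a single factor.

    old, new: dicts mapping period -> value for each segment.
    overlap: periods present in both segments.
    The old segment is multiplied by sum(new)/sum(old) over the overlap;
    in periods covered by the new segment, the new values are kept.
    """
    factor = sum(new[p] for p in overlap) / sum(old[p] for p in overlap)
    linked = {p: v * factor for p, v in old.items() if p not in new}
    linked.update(new)
    return dict(sorted(linked.items()))


old = {2005: 50.0, 2006: 60.0, 2007: 80.0}
new = {2007: 100.0, 2008: 110.0}
print(link_simple_factor(old, new, [2007]))
# {2005: 62.5, 2006: 75.0, 2007: 100.0, 2008: 110.0}
```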
Macro-data approach: Hypothetical example
Basics of conversion matrices
Makes use of a simple, artificial example
Convert from classification A to classification B:
a = 3 (codes 1A, 2A, 3A)
b = 5 (codes 1B, 2B, 3B, 4B, 5B)
N (count) = 115
Dual-coded business register (excerpt: units 1 to 21)
Auxiliary variables x: Count, Reg. Emp, Reg. Turn

Unit, i  Class. A code  Class. B code  Count  Reg. Emp  Reg. Turn
1        1A             1B             1      2         88
2        1A             1B             1      5         135
3        1A             1B             1      8         483
4        1A             1B             1      9         418
5        1A             1B             1      11        386
6        1A             2B             1      1         51
7        1A             2B             1      3         153
8        1A             2B             1      3         231
9        1A             2B             1      4         353
10       1A             2B             1      5         331
11       1A             2B             1      6         219
12       1A             2B             1      6         239
13       1A             2B             1      8         76
14       1A             2B             1      13        167
15       1A             2B             1      21        1538
16       2A             1B             1      3         130
17       2A             1B             1      3         246
18       2A             1B             1      5         161
19       2A             1B             1      5         425
20       2A             1B             1      6         249
21       2A             1B             1      8         783
Derive summary totals

N          1A     2A     3A  Total
1B          5     20            25
2B         10            30     40
3B                16            16
4B                 4             4
5B                       30     30
Total      15     40     60    115

OR

Emp        1A     2A     3A  Total
1B         35    533           568
2B         70           281    351
3B               651           651
4B                53            53
5B                      984    984
Total     105   1237   1265   2607
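The summary totals are cross-tabulations of the register by (A code, B code). A sketch using the 21 units reproduced above (the function name is hypothetical); because only part of the 115 units is listed, only the 1A cells, whose 15 units are all shown, can be checked against the tables: 5 units with Emp 35 in 1A/1B and 10 units with Emp 70 in 1A/2B.

```python
from collections import defaultdict

# Units 1-21 of the dual-coded register: (A code, B code, Reg. Emp, Reg. Turn)
register = [
    ("1A", "1B", 2, 88), ("1A", "1B", 5, 135), ("1A", "1B", 8, 483),
    ("1A", "1B", 9, 418), ("1A", "1B", 11, 386),
    ("1A", "2B", 1, 51), ("1A", "2B", 3, 153), ("1A", "2B", 3, 231),
    ("1A", "2B", 4, 353), ("1A", "2B", 5, 331), ("1A", "2B", 6, 219),
    ("1A", "2B", 6, 239), ("1A", "2B", 8, 76), ("1A", "2B", 13, 167),
    ("1A", "2B", 21, 1538),
    ("2A", "1B", 3, 130), ("2A", "1B", 3, 246), ("2A", "1B", 5, 161),
    ("2A", "1B", 5, 425), ("2A", "1B", 6, 249), ("2A", "1B", 8, 783),
]


def cross_tab(rows):
    """Sum Count, Emp and Turn for every (A code, B code) cell."""
    cells = defaultdict(lambda: {"count": 0, "emp": 0, "turn": 0})
    for a_code, b_code, emp, turn in rows:
        cell = cells[(a_code, b_code)]
        cell["count"] += 1
        cell["emp"] += emp
        cell["turn"] += turn
    return dict(cells)


tab = cross_tab(register)
print(tab[("1A", "1B")])  # {'count': 5, 'emp': 35, 'turn': 1510}
print(tab[("1A", "2B")])  # {'count': 10, 'emp': 70, 'turn': 3358}
```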
Conversion matrix A to B:
Conversion is via coefficients (beta), here based on counts, and linear combination; e.g. the conversion from 1A to 1B is 5 / 15 = .33

beta       1A     2A     3A
1B        .33    .50
2B        .67           .50
3B               .40
4B               .10
5B                      .50
Total       1      1      1
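Each coefficient is a column share: the value in cell (old, new) divided by the total of the old code. A sketch reproducing the count-based matrix and, for comparison, the coefficients implied by the Emp table (the function name is hypothetical):

```python
def conversion_coefficients(cross):
    """beta[(old, new)] = value in cell (old, new) / total of the old code.

    cross: dict (old_code, new_code) -> total of the chosen variable
           in the dual-coding period (counts, Emp, turnover, ...).
    """
    old_totals = {}
    for (old, _new), value in cross.items():
        old_totals[old] = old_totals.get(old, 0) + value
    return {(old, new): value / old_totals[old]
            for (old, new), value in cross.items()}


counts = {("1A", "1B"): 5, ("1A", "2B"): 10,
          ("2A", "1B"): 20, ("2A", "3B"): 16, ("2A", "4B"): 4,
          ("3A", "2B"): 30, ("3A", "5B"): 30}
emp = {("1A", "1B"): 35, ("1A", "2B"): 70,
       ("2A", "1B"): 533, ("2A", "3B"): 651, ("2A", "4B"): 53,
       ("3A", "2B"): 281, ("3A", "5B"): 984}

beta_n = conversion_coefficients(counts)
beta_emp = conversion_coefficients(emp)
for key in counts:
    print(key, round(beta_n[key], 2), round(beta_emp[key], 2))
```

With employment instead of counts, the 1A coefficients stay at .33 and .67, but 2A and 3A change (for example 3A to 5B becomes about .78 instead of .50), which is why the choice of variable matters.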
Conversion is via linear combinations:

$$
\begin{aligned}
t_{y,1B} &= 0.33\,t_{y,1A} + 0.50\,t_{y,2A}\\
t_{y,2B} &= 0.67\,t_{y,1A} + 0.50\,t_{y,3A}\\
t_{y,3B} &= 0.40\,t_{y,2A}\\
t_{y,4B} &= 0.10\,t_{y,2A}\\
t_{y,5B} &= 0.50\,t_{y,3A}
\end{aligned}
$$

… and the aggregate totals are the same:

$$
t_{y,1B} + t_{y,2B} + t_{y,3B} + t_{y,4B} + t_{y,5B} = t_{y,1A} + t_{y,2A} + t_{y,3A}
$$
Apply these proportions to
each time point …
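Applying the proportions to each time point means evaluating the linear combinations period by period with the same coefficients. A sketch assuming the count-based coefficients of the conversion matrix A to B and invented old-classification totals for two past periods (function names and data are hypothetical):

```python
BETA = {  # count-based coefficients from the conversion matrix A to B
    ("1A", "1B"): 5 / 15, ("1A", "2B"): 10 / 15,
    ("2A", "1B"): 20 / 40, ("2A", "3B"): 16 / 40, ("2A", "4B"): 4 / 40,
    ("3A", "2B"): 30 / 60, ("3A", "5B"): 30 / 60,
}


def convert_period(old_totals, beta=BETA):
    """t_y,new = sum over old codes of beta(old, new) * t_y,old."""
    new_totals = {}
    for (old, new), b in beta.items():
        new_totals[new] = new_totals.get(new, 0.0) + b * old_totals.get(old, 0.0)
    return new_totals


def backcast(old_series, beta=BETA):
    """Apply the same coefficients to every past period."""
    return {period: convert_period(totals, beta)
            for period, totals in old_series.items()}


# Hypothetical old-classification totals for two past periods.
old_series = {2008: {"1A": 150.0, "2A": 400.0, "3A": 600.0},
              2009: {"1A": 165.0, "2A": 420.0, "3A": 580.0}}
for period, totals in backcast(old_series).items():
    print(period, {code: round(v, 1) for code, v in sorted(totals.items())})
```

Because every column of coefficients sums to one, the total over all new codes equals the total over all old codes in each period (1,150 for 2008 here).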
Example – ISIC Rev.3 to Rev.4
Conversion at the Section level
Denote turnover (y) of ISIC Rev.3 Sections C, D & E and of out-of-scope units (Z) by $y_{C,r3}$, $y_{D,r3}$, $y_{E,r3}$ & $y_{Z,r3}$
Denote turnover (y) of ISIC Rev.4 Sections B, C, D & E and of out-of-scope units (Z) by $y_{B,r4}$, $y_{C,r4}$, $y_{D,r4}$, $y_{E,r4}$ & $y_{Z,r4}$
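Conversion matrix
Conversion coefficient from Rev.3 Section C to Rev.4 Section B: $\beta_{C,r3}^{B,r4}$

$$
\underbrace{\begin{pmatrix} y_{B,r4}\\ y_{C,r4}\\ y_{D,r4}\\ y_{E,r4}\\ y_{Z,r4} \end{pmatrix}}_{5\times 1}
=
\underbrace{\begin{pmatrix}
\beta_{C,r3}^{B,r4} & \beta_{D,r3}^{B,r4} & \beta_{E,r3}^{B,r4} & \beta_{Z,r3}^{B,r4}\\
\beta_{C,r3}^{C,r4} & \beta_{D,r3}^{C,r4} & \beta_{E,r3}^{C,r4} & \beta_{Z,r3}^{C,r4}\\
\beta_{C,r3}^{D,r4} & \beta_{D,r3}^{D,r4} & \beta_{E,r3}^{D,r4} & \beta_{Z,r3}^{D,r4}\\
\beta_{C,r3}^{E,r4} & \beta_{D,r3}^{E,r4} & \beta_{E,r3}^{E,r4} & \beta_{Z,r3}^{E,r4}\\
\beta_{C,r3}^{Z,r4} & \beta_{D,r3}^{Z,r4} & \beta_{E,r3}^{Z,r4} & \beta_{Z,r3}^{Z,r4}
\end{pmatrix}}_{5\times 4}
\underbrace{\begin{pmatrix} y_{C,r3}\\ y_{D,r3}\\ y_{E,r3}\\ y_{Z,r3} \end{pmatrix}}_{4\times 1}
$$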
Turnover: Summary table
The turnover value of activities that are classified in
• Old classification: Rev.3 Section C
• New classification: Rev.4 Section B
is 19,829,202 (first cell of the table).

Rows: Rev.4 sections; columns: Rev.3 sections
Rev.4               C              D              E              Z          Total
B          19,829,202              0              0         14,178     19,843,380
C             211,632  1,297,621,607          2,142      4,975,276  1,302,810,657
D                   0        101,624    147,814,407     25,793,423    173,709,454
E               6,834      7,712,001      8,342,747     18,977,634     35,039,216
Z             101,654     44,961,905        783,298  3,152,252,617  3,198,099,474
Total      20,149,322  1,350,397,137    156,942,594  3,202,013,128  4,729,502,181
Conversion matrix
Of the Rev.3 Section C activities,
• 98.41% is reclassified to Rev.4 Section B
• 1.05% is reclassified to Rev.4 Section C, and so on
Rev.4 Section C activities are a combination of 1.05% of Rev.3 Section C, 96.09% of Rev.3 Section D, and 0.16% of Rev.3 activities that do not belong to the Rev.3 industrial sector (Z)

Rev.4 / Rev.3       C       D       E       Z
B              0.9841  0.0000  0.0000  0.0000
C              0.0105  0.9609  0.0000  0.0016
D              0.0000  0.0001  0.9418  0.0081
E              0.0003  0.0057  0.0532  0.0059
Z              0.0050  0.0333  0.0050  0.9845
Total weight   1.0000  1.0000  1.0000  1.0000
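The conversion matrix can be reproduced from the turnover summary table by dividing each cell by its Rev.3 column total. A sketch in plain Python (helper names are hypothetical); as a consistency check, applying the matrix to the Rev.3 totals of the dual-coded period returns the Rev.4 row totals of the summary table.

```python
# Turnover in the dual-coded period: rows = Rev.4 sections B, C, D, E, Z;
# columns = Rev.3 sections C, D, E, Z (values from the summary table).
TURNOVER = [
    [19_829_202, 0, 0, 14_178],
    [211_632, 1_297_621_607, 2_142, 4_975_276],
    [0, 101_624, 147_814_407, 25_793_423],
    [6_834, 7_712_001, 8_342_747, 18_977_634],
    [101_654, 44_961_905, 783_298, 3_152_252_617],
]


def column_totals(table):
    """Rev.3 totals: the sum of each column."""
    return [sum(row[j] for row in table) for j in range(len(table[0]))]


def conversion_matrix(table):
    """Divide every cell by its Rev.3 column total, so each column sums to 1."""
    totals = column_totals(table)
    return [[value / totals[j] for j, value in enumerate(row)] for row in table]


def convert(matrix, y_rev3):
    """y_rev4 = matrix x y_rev3: one weighted linear combination per Rev.4 row."""
    return [sum(b * y for b, y in zip(row, y_rev3)) for row in matrix]


beta = conversion_matrix(TURNOVER)
print([round(b, 4) for b in beta[1]])  # Rev.4 Section C row
# [0.0105, 0.9609, 0.0, 0.0016]
print([round(v) for v in convert(beta, column_totals(TURNOVER))])
# [19843380, 1302810657, 173709454, 35039216, 3198099474]
```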
Conversion via linear combination
Equations for converting total series from Rev.3 to Rev.4 are (coefficients rounded to three decimals; terms that round to 0.000 are omitted):

$$
\begin{aligned}
y_{B,r4} &= 0.984\,y_{C,r3}\\
y_{C,r4} &= 0.011\,y_{C,r3} + 0.961\,y_{D,r3} + 0.002\,y_{Z,r3}\\
y_{D,r4} &= 0.942\,y_{E,r3} + 0.008\,y_{Z,r3}\\
y_{E,r4} &= 0.006\,y_{D,r3} + 0.053\,y_{E,r3} + 0.006\,y_{Z,r3}
\end{aligned}
$$
Comparison
Micro-data approach better retains the structural evolution of the economy
Micro-data approach does not require the choice of a specific variable
Macro-data approach reflects evolution based on fixed ratios for a fixed variable
Seasonal patterns may be distorted
Macro-data approach is more cost-efficient
No consideration of micro-data necessary
Assumptions underlying the macro-data approach become invalid over longer periods
“Benchmark years” might help to measure the effect, if data are available
Other options
Combinations of both approaches are possible
Ratios for the macro-data approach could be calculated for shorter periods only
Micro-data approach could be used for specific years and the macro-data approach for interpolation between these years
• E.g. based on availability of census data
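One possible reading of this hybrid, shown as a sketch under that assumption: coefficients are derived from micro-data (e.g. census) in two benchmark years and linearly interpolated for the years in between, then applied as in the macro-data approach. The function name and coefficient values are hypothetical.

```python
def interpolate_coefficients(beta_start, beta_end, year_start, year_end, year):
    """Linearly interpolate conversion coefficients between two benchmark years.

    beta_start, beta_end: dicts (old_code, new_code) -> coefficient, e.g.
    derived from micro-data in the two benchmark years.
    Each result is a weighted average of two coefficient sets whose columns
    sum to 1, so the interpolated columns also sum to 1.
    """
    if not year_start <= year <= year_end:
        raise ValueError("year lies outside the benchmark interval")
    w = (year - year_start) / (year_end - year_start)
    keys = set(beta_start) | set(beta_end)
    return {k: (1 - w) * beta_start.get(k, 0.0) + w * beta_end.get(k, 0.0)
            for k in keys}


beta_2000 = {("1A", "1B"): 0.4, ("1A", "2B"): 0.6}  # hypothetical benchmark
beta_2010 = {("1A", "1B"): 0.2, ("1A", "2B"): 0.8}  # hypothetical benchmark
beta_2005 = interpolate_coefficients(beta_2000, beta_2010, 2000, 2010, 2005)
print({k: round(v, 2) for k, v in sorted(beta_2005.items())})
# {('1A', '1B'): 0.3, ('1A', '2B'): 0.7}
```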
Many factors can influence the choice (see the Overview at the beginning), but data availability is a key practical factor