Transcript ac258 P18 e

Backcasting
United Nations Statistics Division
Overview
Any change in classifications creates a
break in time series, since they are suddenly
based on differently formed categories
Backcasting is a process to describe data
collected before the “break” in terms of the
new classification
Overview
• There is no single “best method”
• Factors influencing a decision include:
  – type of statistical series that requires backcasting (raw data, aggregates, indices, growth rates, ...)
  – statistical domain of the time series
  – availability of micro-data
  – availability of "dual coded" micro-data (i.e. businesses are classified according to both the old and the new classification)
  – length of the "dual coded" period
  – frequency of the existing time series
  – required level of detail of the backcast series
  – cost / resource considerations
Main methods
“Micro-data approach” (re-working of
individual data)
“Macro-data approach” (proportional
approach)
Hybrids thereof
Micro-data approach
• Consists of assigning a new activity code (= new classification) to all units in every period in the past (as far back as backcasting is desired)
• No other change is required
• Statistics are then compiled by standard aggregation
• Census vs. survey (weight adjustment issue)
Micro-data approach
• Census
  – All in-scope units are selected and therefore have a weight of one.
  – Each unit is therefore recoded and then the re-aggregation can take place.
• Survey
  – The non-observed units in the population influence the outcome via the sampling weights
  – Therefore all units in the population (both observed and non-observed) need to be coded
  – Re-aggregation of the sample units under the new classification can then occur.
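The recode-then-re-aggregate idea can be sketched as follows. This is only an illustration under stated assumptions: the records, weights and values are made up, the function name `aggregate` is hypothetical, and the sampling weights are kept unchanged after recoding, which sidesteps the weight adjustment issue mentioned for surveys.

```python
from collections import defaultdict

# Hypothetical micro-data: (unit id, old code, new code, sampling weight, turnover).
# A census is the special case in which every weight equals 1.
units = [
    ("u1", "1A", "1B", 1.0, 88.0),
    ("u2", "1A", "2B", 2.5, 51.0),
    ("u3", "2A", "1B", 2.5, 130.0),
    ("u4", "2A", "3B", 4.0, 40.0),
]


def aggregate(records, code_index):
    """Weighted turnover total per classification code found at code_index."""
    totals = defaultdict(float)
    for rec in records:
        weight, turnover = rec[3], rec[4]
        totals[rec[code_index]] += weight * turnover
    return dict(totals)


old_totals = aggregate(units, 1)  # totals under the old classification
new_totals = aggregate(units, 2)  # same units, re-aggregated under the new codes
```

Because only the grouping changes, the grand total is the same under both classifications; only its breakdown by category differs.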
Micro-data approach
• Requires detailed information from past periods (for all units to be recoded)
  – More detailed than just the old code
• If information is available, results are more reliable than those from macro-approaches
Micro-data approach
• Issues:
  – Resource intensive
  – Need solutions if unit information is not available for a period (not collected, not responded)
    • Nearest neighbor
      – Back calculation of the elementary unit is made in the same way as for the “closest unit”.
    • Transition matrix approach
      – Using conversion coefficients at the elementary level
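A minimal sketch of the nearest-neighbor idea, assuming that "closest" means the unit with the same old code and the most similar employment. That distance rule, the sample records and the function name `impute_new_code` are assumptions made for illustration; the slides do not specify how closeness is measured.

```python
# Hypothetical units: old code, employment, and new code (None = not available).
units = [
    {"id": 1, "old": "1A", "emp": 2, "new": "1B"},
    {"id": 2, "old": "1A", "emp": 21, "new": "2B"},
    {"id": 3, "old": "2A", "emp": 5, "new": "1B"},
    {"id": 4, "old": "1A", "emp": 4, "new": None},
]


def impute_new_code(target, donors):
    """Copy the new code of the donor with the same old code and the closest employment."""
    candidates = [
        d for d in donors
        if d["new"] is not None and d["old"] == target["old"]
    ]
    if not candidates:
        return None  # no usable donor: leave the unit uncoded
    nearest = min(candidates, key=lambda d: abs(d["emp"] - target["emp"]))
    return nearest["new"]


for unit in units:
    if unit["new"] is None:
        unit["new"] = impute_new_code(unit, units)
```

Here unit 4 receives code 1B from unit 1, the 1A unit closest to it in employment.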
Macro-data approach
• Also called “proportional method”
• This method calculates a ratio (“proportion”, “conversion coefficients”) in a fixed dual coding period that is then applied to all previous periods
• The ratios are calculated at the macro level
• Could be based on number of units (counts) or size variables such as turnover or employment
• Has a more approximate character
Macro-data approach
• In simple form, applies growth rates of former time series to the revised level for the whole historical period
• More sophisticated methods may use adjustments based on experts’ knowledge
  – Example: mobile phones
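The simple form can be sketched as a backward chain of growth rates. The sketch assumes the revised (new-classification) level is known for the last period of the old series; the function name and the numbers are hypothetical.

```python
def backcast_with_growth_rates(old_series, revised_level):
    """Carry the old series' growth rates back from a revised level.

    old_series: values under the old classification, oldest first.
    revised_level: value under the new classification for the last
    period of old_series (assumed to be the overlap period).
    """
    backcast = [revised_level]
    # Walk backwards, dividing by the old series' period-on-period growth factor.
    for t in range(len(old_series) - 1, 0, -1):
        growth = old_series[t] / old_series[t - 1]
        backcast.append(backcast[-1] / growth)
    backcast.reverse()
    return backcast


old = [100.0, 110.0, 121.0]                    # old classification, 10% growth per period
new = backcast_with_growth_rates(old, 242.0)   # roughly [200.0, 220.0, 242.0]
```

The backcast series has the revised level in the last period and the same growth rates as the old series before it.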
Macro-data approach
• Assumes that the same set of coefficients applies to all periods
  – This means it is assumed that the distribution of the variable of interest has not changed between the old and the new classification
• Applied to aggregates; does not consider micro-data
• Relatively simple and cheap to implement
Macro-data approach: Steps




1 – estimation of conversion coefficients
  • Done for dual-coding period
    – Longer/multiple periods help in overcoming “infant problems” of the new classification and allow for correction of data
    – Based on selection of specific variable
2 – calculation of aggregates using the conversion coefficients
  • Weighted linear combination
3 – linking the different segments
  • Old – overlap – new series
  • Breaks caused mainly by change in field of observation
    – Simple factor or “wedging”
4 – final adjustment
  • Seasonal etc.
Macro-data approach:
Hypothetical example
Basics of conversion matrices
Makes use of a simple, artificial example
Convert from A to B.
a = 3 (codes 1A, 2A, 3A)
b = 5 (codes 1B, 2B, 3B, 4B, 5B)
N (Count) = 115
Dual-coded business register (units 1 to 21 shown)

                                           Auxiliary, x
Unit, i   Class. A code   Class. B code   Count   Reg. Emp   Reg. Turn
   1            1               1           1         2          88
   2            1               1           1         5         135
   3            1               1           1         8         483
   4            1               1           1         9         418
   5            1               1           1        11         386
   6            1               2           1         1          51
   7            1               2           1         3         153
   8            1               2           1         3         231
   9            1               2           1         4         353
  10            1               2           1         5         331
  11            1               2           1         6         219
  12            1               2           1         6         239
  13            1               2           1         8          76
  14            1               2           1        13         167
  15            1               2           1        21        1538
  16            2               1           1         3         130
  17            2               1           1         3         246
  18            2               1           1         5         161
  19            2               1           1         5         425
  20            2               1           1         6         249
  21            2               1           1         8         783
Derive summary totals

N (counts)
          1A     2A     3A   Total
1B         5     20             25
2B        10            30      40
3B               16             16
4B                4              4
5B                      30      30
Total     15     40     60     115

OR

Emp (employment)
          1A     2A     3A   Total
1B        35    533            568
2B        70           281     351
3B              651            651
4B               53             53
5B                     984     984
Total    105   1237   1265    2607
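As an illustration of how such summary totals are derived, the sketch below cross-tabulates only the 21 register units shown in the register excerpt, so it reproduces the 1A cells (5 units with employment 35, 10 units with employment 70) but only part of the 2A to 1B cell. The variable names are hypothetical.

```python
from collections import defaultdict

# Employment of the 21 listed register units, grouped by their (old, new) codes.
emp_1a_1b = [2, 5, 8, 9, 11]                    # units 1-5
emp_1a_2b = [1, 3, 3, 4, 5, 6, 6, 8, 13, 21]    # units 6-15
emp_2a_1b = [3, 3, 5, 5, 6, 8]                  # units 16-21
register = (
    [("1A", "1B", e) for e in emp_1a_1b]
    + [("1A", "2B", e) for e in emp_1a_2b]
    + [("2A", "1B", e) for e in emp_2a_1b]
)

counts = defaultdict(int)      # number of units per (old code, new code) cell
employment = defaultdict(int)  # employment total per (old code, new code) cell
for old, new, emp in register:
    counts[(old, new)] += 1
    employment[(old, new)] += emp
```

Either table can serve as the basis for conversion coefficients, depending on the variable selected.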
Conversion matrix A to B:
Conversion is via linear combination

N (counts)
          1A     2A     3A
1B         5     20
2B        10            30
3B               16
4B                4
5B                      30
Total     15     40     60

beta (conversion coefficients; e.g. .33 is the conversion coefficient from 1A to 1B)
          1A     2A     3A
1B       .33    .50
2B       .67           .50
3B              .40
4B              .10
5B                     .50
Total      1      1      1
Conversion is via linear combinations
\[
\begin{aligned}
t_{y,1B} &= 0.33\, t_{y,1A} + 0.50\, t_{y,2A} \\
t_{y,2B} &= 0.67\, t_{y,1A} + 0.50\, t_{y,3A} \\
t_{y,3B} &= 0.40\, t_{y,2A} \\
t_{y,4B} &= 0.10\, t_{y,2A} \\
t_{y,5B} &= 0.50\, t_{y,3A}
\end{aligned}
\]
… and the aggregate totals are the same:
\[
t_{y,1B} + t_{y,2B} + t_{y,3B} + t_{y,4B} + t_{y,5B} = t_{y,1A} + t_{y,2A} + t_{y,3A}
\]
Apply these proportions to
each time point …
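A minimal sketch of this step, using the counts from the hypothetical example: the coefficients are the column shares of the count table, and each new-classification total is a linear combination of old-classification totals. The old-classification totals `t_y_old` are invented for illustration, and the names `beta` and `convert` are assumptions.

```python
# Counts of dual-coded units by (old code, new code), from the example.
counts = {
    ("1A", "1B"): 5, ("1A", "2B"): 10,
    ("2A", "1B"): 20, ("2A", "3B"): 16, ("2A", "4B"): 4,
    ("3A", "2B"): 30, ("3A", "5B"): 30,
}

# Total number of units per old code (15, 40, 60).
old_counts = {}
for (old, _new), n in counts.items():
    old_counts[old] = old_counts.get(old, 0) + n

# Conversion coefficient: share of each old code's units that moves to each new code.
beta = {pair: n / old_counts[pair[0]] for pair, n in counts.items()}


def convert(old_totals):
    """Linear combination of old-classification totals for one time point."""
    new_totals = {}
    for (old, new), b in beta.items():
        new_totals[new] = new_totals.get(new, 0.0) + b * old_totals[old]
    return new_totals


# Hypothetical totals of a variable y under classification A for one past period.
t_y_old = {"1A": 300.0, "2A": 1000.0, "3A": 800.0}
t_y_new = convert(t_y_old)
```

Since the coefficients for each old code sum to one, the grand total is preserved; calling `convert` once per period applies the same proportions to every time point.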
Example – ISIC Rev3 to Rev.4
Conversion at the Section level
Denote turnover (y) of ISIC Rev.3 Sections C, D & E and out-of-scope units (Z) by
$y_{C,r3},\ y_{D,r3},\ y_{E,r3}$ & $y_{Z,r3}$
Denote turnover (y) of ISIC Rev.4 Sections B, C, D & E and out-of-scope units (Z) by
$y_{B,r4},\ y_{C,r4},\ y_{D,r4},\ y_{E,r4}$ & $y_{Z,r4}$
Conversion matrix
\[
\begin{pmatrix} y_{B,r4} \\ y_{C,r4} \\ y_{D,r4} \\ y_{E,r4} \\ y_{Z,r4} \end{pmatrix}_{5 \times 1}
=
\begin{pmatrix}
\beta^{C,r3}_{B,r4} & \beta^{D,r3}_{B,r4} & \beta^{E,r3}_{B,r4} & \beta^{Z,r3}_{B,r4} \\
\beta^{C,r3}_{C,r4} & \beta^{D,r3}_{C,r4} & \beta^{E,r3}_{C,r4} & \beta^{Z,r3}_{C,r4} \\
\beta^{C,r3}_{D,r4} & \beta^{D,r3}_{D,r4} & \beta^{E,r3}_{D,r4} & \beta^{Z,r3}_{D,r4} \\
\beta^{C,r3}_{E,r4} & \beta^{D,r3}_{E,r4} & \beta^{E,r3}_{E,r4} & \beta^{Z,r3}_{E,r4} \\
\beta^{C,r3}_{Z,r4} & \beta^{D,r3}_{Z,r4} & \beta^{E,r3}_{Z,r4} & \beta^{Z,r3}_{Z,r4}
\end{pmatrix}_{5 \times 4}
\begin{pmatrix} y_{C,r3} \\ y_{D,r3} \\ y_{E,r3} \\ y_{Z,r3} \end{pmatrix}_{4 \times 1}
\]
Here $\beta^{C,r3}_{B,r4}$ is the conversion coefficient from Rev3 Section C to Rev4 Section B.
Turnover: Summary table
Example (row B, column C): the turnover value of activities that are classified in
• Old classification: Rev.3 Section C
• New classification: Rev.4 Section B

Rev.4 \ Rev.3            C               D             E               Z           Total
B               19,829,202               0             0          14,178      19,843,380
C                  211,632   1,297,621,607         2,142       4,975,276   1,302,810,657
D                        0         101,624   147,814,407      25,793,423     173,709,454
E                    6,834       7,712,001     8,342,747      18,977,634      35,039,216
Z                  101,654      44,961,905       783,298   3,152,252,617   3,198,099,474
Total           20,149,322   1,350,397,137   156,942,594   3,202,013,128   4,729,502,181
Conversion matrix
Of the Rev.3 Section C activities,
• 98.41% is reclassified to Rev.4 Section B
• 1.05% is reclassified to Rev.4 Section C, and so on
Rev.4 Section C activities are a combination of 1.05% of Rev.3 Section C, 96.09% of Rev.3 Section D, and 0.16% of Rev.3 activities that do not belong to the Rev.3 industrial sector

Rev.4 \ Rev.3       C        D        E        Z
B              0.9841   0.0000   0.0000   0.0000
C              0.0105   0.9609   0.0000   0.0016
D              0.0000   0.0001   0.9418   0.0081
E              0.0003   0.0057   0.0532   0.0059
Z              0.0050   0.0333   0.0050   0.9845
Total weight   1.0000   1.0000   1.0000   1.0000
Conversion via linear
combination
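• Equations for converting total series from Rev.3 to Rev.4 (coefficients from the conversion matrix rounded to three decimals; terms whose coefficient rounds to 0.000 are omitted) are:
\[
\begin{aligned}
y_{B,r4} &= 0.984\, y_{C,r3} \\
y_{C,r4} &= 0.011\, y_{C,r3} + 0.961\, y_{D,r3} + 0.002\, y_{Z,r3} \\
y_{D,r4} &= 0.942\, y_{E,r3} + 0.008\, y_{Z,r3} \\
y_{E,r4} &= 0.006\, y_{D,r3} + 0.053\, y_{E,r3} + 0.006\, y_{Z,r3}
\end{aligned}
\]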
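The ISIC coefficients can be reproduced from the turnover summary table by dividing each cell by its Rev.3 column total. The sketch below does this with the table's figures; the names `coeff` and `to_rev4` are hypothetical, and the function is one possible way to apply the matrix to a vector of Rev.3 totals.

```python
REV3 = ["C", "D", "E", "Z"]        # columns: old classification
REV4 = ["B", "C", "D", "E", "Z"]   # rows: new classification

# Turnover by Rev.4 section (key) and Rev.3 section (list position, order of REV3).
turnover = {
    "B": [19_829_202, 0, 0, 14_178],
    "C": [211_632, 1_297_621_607, 2_142, 4_975_276],
    "D": [0, 101_624, 147_814_407, 25_793_423],
    "E": [6_834, 7_712_001, 8_342_747, 18_977_634],
    "Z": [101_654, 44_961_905, 783_298, 3_152_252_617],
}

# Rev.3 column totals, then column shares = conversion coefficients.
col_totals = [sum(turnover[r4][j] for r4 in REV4) for j in range(len(REV3))]
coeff = {
    r4: [turnover[r4][j] / col_totals[j] for j in range(len(REV3))]
    for r4 in REV4
}


def to_rev4(y_rev3):
    """Convert a list of Rev.3 totals (order of REV3) into Rev.4 totals."""
    return {r4: sum(c * y for c, y in zip(coeff[r4], y_rev3)) for r4 in REV4}


# Applied to the dual-coded period itself, the Rev.4 row totals are recovered.
check = to_rev4(col_totals)
```

For earlier periods the same `coeff` would be applied to that period's Rev.3 totals, which is exactly where the fixed-coefficient assumption enters.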
Comparison
• Micro-data approach better retains structural evolution of the economy
• Micro-data approach does not require choice of a special variable
• Macro-data approach reflects evolution based on fixed ratio for a fixed variable
  – Seasonal patterns may be distorted
• Macro-data approach is more cost-efficient
  – No consideration of micro-data necessary
• Assumptions underlying the macro-data approach become invalid over longer periods
  – “Benchmark years” might help to measure the effect, if data is available
Other options
• Combinations of both approaches are possible
  – Ratios for the macro-data approach could be calculated for shorter periods only
  – Micro-data approach could be used for specific years and the macro-data approach for interpolation between these years
    • E.g. based on availability of census data
• Many factors can influence the choice (see beginning) but data availability is a key practical factor