Transcript ac258 P18 e
Backcasting
United Nations Statistics Division

Overview
Any change in classification creates a break in time series, since the series are suddenly based on differently defined categories.
Backcasting is the process of describing data collected before the "break" in terms of the new classification.

There is no single "best method". Factors influencing the decision include:
• type of statistical series that requires backcasting (raw data, aggregates, indices, growth rates, ...)
• statistical domain of the time series
• availability of micro-data
• availability of "dual-coded" micro-data (i.e. businesses classified according to both the old and the new classification)
• length of the "dual-coded" period
• frequency of the existing time series
• required level of detail of the backcast series
• cost / resource considerations

Main methods
• "Micro-data approach" (re-working of individual data)
• "Macro-data approach" (proportional approach)
• Hybrids of the two

Micro-data approach
Consists of assigning a new activity code (under the new classification) to all units in every past period, as far back as backcasting is desired. No other change is required: statistics are then compiled by standard aggregation.
Census vs. survey (weight adjustment issue):
• Census: all in-scope units are selected and therefore have a weight of one. Each unit is recoded, and then re-aggregation can take place.
• Survey: the non-observed units in the population influence the outcome via the sampling weights. Therefore all units in the population (both observed and non-observed) need to be coded; re-aggregation of the sample units under the new classification can then occur.
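The census vs. survey distinction for the micro-data approach can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure prescribed in the slides: the frame `population_codes` (every unit, observed or not, mapped to its new activity code) and the function name are hypothetical, and the survey weights are adjusted by simple post-stratification on the new code (weight = N_h / n_h), which is only one possible weight adjustment. It shows why non-observed units must be coded too: N_h cannot be counted otherwise.

```python
from collections import Counter, defaultdict


def recode_and_aggregate(population_codes, observed, census=False):
    """Re-aggregate one past period under the new classification (hypothetical helper).

    population_codes: unit id -> new activity code for EVERY unit in the frame
    observed:         unit id -> value of the variable (e.g. turnover) for the
                      units observed in that period
    census:           if True, every observed unit has a weight of one
    """
    if census:
        weights = {u: 1.0 for u in observed}
    else:
        # Assumed weight adjustment: post-stratify on the NEW code, so each sampled
        # unit in new class h represents N_h / n_h units. N_h needs the new code of
        # non-observed units as well, hence the whole population must be recoded.
        pop_counts = Counter(population_codes.values())
        sample_counts = Counter(population_codes[u] for u in observed)
        weights = {
            u: pop_counts[population_codes[u]] / sample_counts[population_codes[u]]
            for u in observed
        }
    # Standard aggregation by the new code.
    totals = defaultdict(float)
    for u, y in observed.items():
        totals[population_codes[u]] += weights[u] * y
    return dict(totals)


# Hypothetical frame: 6 units already recoded to the new classification.
population_codes = {1: "1B", 2: "1B", 3: "1B", 4: "1B", 5: "2B", 6: "2B"}

# Survey case: only units 1, 2 and 5 were observed in the past period.
survey_totals = recode_and_aggregate(population_codes, {1: 10.0, 2: 20.0, 5: 7.0})

# Census case: all units observed, each with a weight of one.
census_totals = recode_and_aggregate(
    population_codes, {u: float(u) for u in population_codes}, census=True
)
print(survey_totals)
print(census_totals)
```

In this sketch a new class with no sampled unit receives no estimate at all; a production system would need collapsing of strata or another estimator for that case.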
Micro-data approach (continued)
• Requires detailed information from past periods for all units to be recoded, i.e. more detail than just the old code.
• If this information is available, results are more reliable than those from macro-data approaches.
Issues:
• Resource intensive.
• Solutions are needed when unit information is not available for a period (not collected, or no response):
  – Nearest neighbor: the elementary unit is back-calculated in the same way as its "closest unit".
  – Transition matrix approach: conversion coefficients are applied at the elementary level.

Macro-data approach
Also called the "proportional method". It calculates ratios ("proportions", "conversion coefficients") in a fixed dual-coding period and then applies them to all previous periods.
• The ratios are calculated at the macro level.
• They can be based on the number of units (counts) or on size variables such as turnover or employment.
• The results are more approximate than with the micro-data approach.
In its simple form, the method applies the growth rates of the former time series to the revised level for the whole historical period. More sophisticated methods may use adjustments based on expert knowledge (example: mobile phones).
• Assumes that the same set of coefficients applies to all periods, i.e. that the way the variable of interest is distributed between the old and the new classes has not changed over time.
• Applied to aggregates; does not consider micro-data.
• Relatively simple and cheap to implement.

Macro-data approach: steps
1 – Estimation of conversion coefficients
  • Done for the dual-coding period
    – Longer or multiple periods help overcome "infant problems" of the new classification and allow data to be corrected
    – Based on the selection of a specific variable
2 – Calculation of aggregates using the conversion coefficients
  • Weighted linear combination
3 – Linking the different segments
  • Old – overlap – new series
  • Breaks caused mainly by changes in the field of observation
    – Simple factor or "wedging"
4 – Final adjustment
  • Seasonal adjustment, etc.

Macro-data approach: hypothetical example
Basics of conversion matrices, using a simple, artificial example: convert from classification A to classification B.
• A has a = 3 classes (codes 1A, 2A, 3A)
• B has b = 5 classes (codes 1B, 2B, 3B, 4B, 5B)
• N (count of units) = 115

Dual-coded business register (excerpt: units 1–21 of 115; auxiliary variables x: count, registered employment, registered turnover)

Unit i  Class A  Class B  Count  Reg. emp.  Reg. turnover
1       1A       1B       1      2          88
2       1A       1B       1      5          135
3       1A       1B       1      8          483
4       1A       1B       1      9          418
5       1A       1B       1      11         386
6       1A       2B       1      1          51
7       1A       2B       1      3          153
8       1A       2B       1      3          231
9       1A       2B       1      4          353
10      1A       2B       1      5          331
11      1A       2B       1      6          219
12      1A       2B       1      6          239
13      1A       2B       1      8          76
14      1A       2B       1      13         167
15      1A       2B       1      21         1538
16      2A       1B       1      3          130
17      2A       1B       1      3          246
18      2A       1B       1      5          161
19      2A       1B       1      5          425
20      2A       1B       1      6          249
21      2A       1B       1      8          783

Derived summary totals

Count (N)
A \ B   1B   2B   3B   4B   5B   Total
1A       5   10                    15
2A      20        16    4          40
3A           30             30     60
Total   25   40   16    4   30    115

Registered employment (Emp)
A \ B   1B    2B    3B    4B    5B    Total
1A       35    70                      105
2A      533         651    53         1237
3A           281                 984  1265
Total   568   351   651    53    984  2607

Conversion matrix A to B (coefficients β estimated from counts)
A \ B   1B    2B    3B    4B    5B    Total
1A      .33   .67                     1
2A      .50         .40   .10         1
3A            .50               .50   1

Conversion is via linear combinations:
$$
\begin{aligned}
t_{y,1B} &= 0.33\,t_{y,1A} + 0.50\,t_{y,2A}\\
t_{y,2B} &= 0.67\,t_{y,1A} + 0.50\,t_{y,3A}\\
t_{y,3B} &= 0.40\,t_{y,2A}\\
t_{y,4B} &= 0.10\,t_{y,2A}\\
t_{y,5B} &= 0.50\,t_{y,3A}
\end{aligned}
$$
... and the aggregate totals are the same:
$$
t_{y,1B} + t_{y,2B} + t_{y,3B} + t_{y,4B} + t_{y,5B} = t_{y,1A} + t_{y,2A} + t_{y,3A}
$$
These proportions are then applied at each time point.

Example: ISIC Rev.3 to Rev.4, conversion at the Section level
Denote the turnover (y) of ISIC Rev.3 Sections C, D and E and of out-of-scope units (Z) by $y_{C,r3}$, $y_{D,r3}$, $y_{E,r3}$ and $y_{Z,r3}$.
Denote the turnover of ISIC Rev.4 Sections B, C, D and E and of out-of-scope units (Z) by $y_{B,r4}$, $y_{C,r4}$, $y_{D,r4}$, $y_{E,r4}$ and $y_{Z,r4}$.

Conversion matrix, where $\beta_{C,r3}^{B,r4}$ is the conversion coefficient from Rev.3 Section C to Rev.4 Section B, and so on:
$$
\underbrace{\begin{pmatrix} y_{B,r4}\\ y_{C,r4}\\ y_{D,r4}\\ y_{E,r4}\\ y_{Z,r4} \end{pmatrix}}_{5\times 1}
=
\underbrace{\begin{pmatrix}
\beta_{C,r3}^{B,r4} & \beta_{D,r3}^{B,r4} & \beta_{E,r3}^{B,r4} & \beta_{Z,r3}^{B,r4}\\
\beta_{C,r3}^{C,r4} & \beta_{D,r3}^{C,r4} & \beta_{E,r3}^{C,r4} & \beta_{Z,r3}^{C,r4}\\
\beta_{C,r3}^{D,r4} & \beta_{D,r3}^{D,r4} & \beta_{E,r3}^{D,r4} & \beta_{Z,r3}^{D,r4}\\
\beta_{C,r3}^{E,r4} & \beta_{D,r3}^{E,r4} & \beta_{E,r3}^{E,r4} & \beta_{Z,r3}^{E,r4}\\
\beta_{C,r3}^{Z,r4} & \beta_{D,r3}^{Z,r4} & \beta_{E,r3}^{Z,r4} & \beta_{Z,r3}^{Z,r4}
\end{pmatrix}}_{5\times 4}
\underbrace{\begin{pmatrix} y_{C,r3}\\ y_{D,r3}\\ y_{E,r3}\\ y_{Z,r3} \end{pmatrix}}_{4\times 1}
$$

Turnover: summary table
Each cell is the turnover of activities classified in a given Rev.3 Section (column, old classification) and a given Rev.4 Section (row, new classification). For example, 19,829,202 is the turnover classified in Rev.3 Section C and in Rev.4 Section B.

Rev.4 \ Rev.3  C            D              E            Z              Total
B              19,829,202   0              0            14,178         19,843,380
C              211,632      1,297,621,607  2,142        4,975,276      1,302,810,657
D              0            101,624        147,814,407  25,793,423     173,709,454
E              6,834        7,712,001      8,342,747    18,977,634     35,039,216
Z              101,654      44,961,905     783,298      3,152,252,617  3,198,099,474
Total          20,149,322   1,350,397,137  156,942,594  3,202,013,128  4,729,502,181

Conversion matrix (each Rev.3 column sums to 1)
Rev.4 \ Rev.3  C       D       E       Z
B              0.9841  0.0000  0.0000  0.0000
C              0.0105  0.9609  0.0000  0.0016
D              0.0000  0.0001  0.9418  0.0081
E              0.0003  0.0057  0.0532  0.0059
Z              0.0050  0.0333  0.0050  0.9845
Total weight   1.0000  1.0000  1.0000  1.0000

Reading the matrix:
• By column: of the Rev.3 Section C activities, 98.41% are reclassified to Rev.4 Section B, 1.05% to Rev.4 Section C, and so on.
• By row: Rev.4 Section C is a combination of 1.05% of Rev.3 Section C, 96.09% of Rev.3 Section D, and 0.16% of the Rev.3 activities that are out of scope (Z), i.e. outside the Rev.3 industrial sector.

Conversion via linear combination
Equations for converting total series from Rev.3 to Rev.4 (coefficients rounded to three decimals; terms that round to zero omitted):
$$
\begin{aligned}
y_{B,r4} &= 0.984\,y_{C,r3}\\
y_{C,r4} &= 0.011\,y_{C,r3} + 0.961\,y_{D,r3} + 0.002\,y_{Z,r3}\\
y_{D,r4} &= 0.942\,y_{E,r3} + 0.008\,y_{Z,r3}\\
y_{E,r4} &= 0.006\,y_{D,r3} + 0.053\,y_{E,r3} + 0.006\,y_{Z,r3}
\end{aligned}
$$

Comparison
• Micro-data approach
  – Better retains the structural evolution of the economy
  – Does not require the choice of a special variable
• Macro-data approach
  – Reflects evolution based on a fixed ratio for a fixed variable
  – Seasonal patterns may be distorted
  – More cost-efficient
  – No consideration of micro-data necessary
  – Its underlying assumptions become invalid over longer periods; "benchmark years" might help to measure the effect, if data are available

Other options
• Combinations of both approaches are possible:
  – Ratios for the macro-data approach could be calculated for shorter periods only
  – The micro-data approach could be used for specific years and the macro-data approach for interpolation between these years (e.g. based on the availability of census data)
• Many factors can influence the choice (see the overview at the beginning), but data availability is a key practical factor.
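Steps 1 to 3 of the macro-data approach can be sketched in Python using the dual-coded counts from the hypothetical A-to-B example. This is a minimal sketch, not a reference implementation: the function names, the two past periods "t1"/"t2" and their totals, and the observed overlap value 693.0 are all hypothetical. Step 3 shows only one reading of the "simple factor" link (rescale the backcast series so it matches the value observed under the new classification in the overlap period); "wedging" is not shown.

```python
from collections import defaultdict


def conversion_coefficients(dual_coded):
    """Step 1: beta[a][b] = share of old class a's auxiliary total (count,
    employment, turnover, ...) falling in new class b in the dual-coding period.
    The coefficients of each old class sum to 1."""
    beta = {}
    for old, row in dual_coded.items():
        row_total = sum(row.values())
        beta[old] = {new: x / row_total for new, x in row.items()}
    return beta


def convert(old_totals, beta):
    """Step 2: each new-class total is a weighted linear combination of the
    old-class totals, so the grand total is preserved."""
    new_totals = defaultdict(float)
    for old, t in old_totals.items():
        for new, b in beta[old].items():
            new_totals[new] += b * t
    return dict(new_totals)


def backcast(old_series, beta):
    """Apply the same fixed coefficients to every past period
    (the key assumption of the macro-data approach)."""
    return {period: convert(totals, beta) for period, totals in old_series.items()}


def link_simple_factor(series, observed_in_overlap, overlap_period):
    """Step 3, simple-factor option (hypothetical helper): rescale one new class's
    backcast series so it equals the observed value in the overlap period."""
    factor = observed_in_overlap / series[overlap_period]
    return {period: v * factor for period, v in series.items()}


# Dual-coded counts from the hypothetical example: old code -> {new code: units}.
counts = {
    "1A": {"1B": 5, "2B": 10},
    "2A": {"1B": 20, "3B": 16, "4B": 4},
    "3A": {"2B": 30, "5B": 30},
}
beta = conversion_coefficients(counts)

# Hypothetical old-classification totals for two past periods (illustrative only).
old_series = {
    "t1": {"1A": 300.0, "2A": 1000.0, "3A": 200.0},
    "t2": {"1A": 330.0, "2A": 1100.0, "3A": 180.0},
}
new_series = backcast(old_series, beta)

# Hypothetical: class 1B observed as 693.0 under the new classification in overlap period t2.
linked_1b = link_simple_factor({p: v["1B"] for p, v in new_series.items()}, 693.0, "t2")

for period, totals in new_series.items():
    print(period, {k: round(v, 1) for k, v in sorted(totals.items())})
print("1B linked:", {p: round(v, 1) for p, v in linked_1b.items()})
```

Passing employment or turnover totals instead of counts to `conversion_coefficients` gives the variable-specific coefficients discussed above; the same function applied to a column of the ISIC turnover table reproduces its conversion coefficients.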