WORKSHOP ON SCANNER DATA Geneva 10 May 2010 Joint presentation by Ragnhild Nygaard (Statistics Norway) and Heymerik van der Grient (Statistics Netherlands)
Download ReportTranscript WORKSHOP ON SCANNER DATA Geneva 10 May 2010 Joint presentation by Ragnhild Nygaard (Statistics Norway) and Heymerik van der Grient (Statistics Netherlands)
WORKSHOP ON SCANNER DATA Geneva 10 May 2010 Joint presentation by Ragnhild Nygaard (Statistics Norway) and Heymerik van der Grient (Statistics Netherlands) Historical overview – NL Supermarkets Mid 90s: first contacts with chain(s) 2002: first implementation: 1/2 chain(s) Yearly Laspeyres (labour intensive) Construction of yearly basket of items Manual linking of items to COICOP-groups Manual replacement of disappearing items Reduction of ca 10 000 monthly price quotes in field survey Historical overview – NL, cont Supermarkets 2010: extension: 6 chains Monthly chained Jevons (efficient process) No manual linking of items No explicit replacements Extra reduction of ca 5 000 monthly price quotes in field survey Historical overview – N 1997: first contact with one chain Gradually contact with more chains Implementation in the CPI only price information of specific representative items 2002: scanner data from all the chains (no questionnaires - big incentive) Aug 2005: expanded use for COICOP 01 price and quantity information for all items in representative outlets Questions to be answered when dealing with scanner data How/Where require scanner data? Which statistical method? How to link items to COICOP? How to deal with all kind of particularities in data? Development of new computer system? Source of scanner data Market research companies Cleaned data (very) expensive Two-stage delivery chain (timeliness) Companies/Chains Raw data Cheap (NL/N do not pay) Direct contact with original supplier Negotiations with companies Time consuming process Negotiations can take up to a year or more including meetings, sending test data, analysing data etc. Be aware of some company establishing costs e.g. preparing the data extractions Can company provide what you want/need? E.g. information to link items to COICOP automatically Negotiations with companies, cont. Focus on advantages for companies Minor costs once established (just a copy of their sales administration) No questionnaires or monthly visits of price collectors Other incentives for companies? Money – not likely Information E.g. company price development compared to overall price development Negotiations with companies, cont. Establishing good routines with the companies are essential Strict time schedules No changes in formats when implemented Pre - production work Take your time analyzing the data Enormous amount of data N: Over 300 000 price observations each month divided into about 14 000 items Build shadow system (prototype) Compare the new price indexes based on scanner data with the old method for a certain period of time before implementation Discover possible problems in advance Unexpected situations will arise for sure Pre - production work Ideas for analysing the data: Is same EAN always same item? Extreme price changes Specific price development at beginning or end life cycle EAN structurally Risk of bias! All kind of dynamics in data Missing prices Do properties of data change over time Etc Methodology / IT-system Find methodology that: Delivers good indexes (e.g. no bias) Can deal with all particularities in data Build IT-system that supports the chosen methodology Learn from experiences other countries using scanner data Properties of data Consequences for methodology NL and N High attrition rate of items 70 60 50 40 30 20 10 # matches (200501,t) # matches (t-1, t) 07 05 20 08 03 20 08 01 20 08 11 20 08 09 20 07 07 20 07 05 # matches (t, 200808) 20 07 03 20 07 01 20 07 11 20 07 09 20 06 07 20 06 05 20 06 03 20 06 01 20 06 11 20 06 09 20 05 07 20 05 05 20 05 03 20 05 20 05 20 05 01 0 Properties of data, cont. Consequences for methodology NL and N How to deal with high attrition rate of items NL : monthly chained index N : monthly chained index 30000 25000 01 0 20 5 05 0 20 9 05 1 20 3 05 1 20 7 05 2 20 1 05 2 20 5 05 2 20 9 05 3 20 3 05 3 20 7 05 4 20 1 05 4 20 5 05 4 20 9 06 0 20 1 06 0 20 5 06 0 20 9 06 1 20 3 06 1 20 7 06 2 20 1 06 2 20 5 06 2 20 9 06 3 20 3 06 3 20 7 06 4 20 1 06 4 20 5 06 4 20 9 07 01 20 05 20 05 0 20 1 05 0 20 5 05 0 20 9 05 1 20 3 05 1 20 7 05 2 20 1 05 2 20 5 05 2 20 9 05 3 20 3 05 3 20 7 05 4 20 1 05 4 20 5 05 4 20 9 06 0 20 1 06 0 20 5 06 0 20 9 06 1 20 3 06 1 20 7 06 2 20 1 06 2 20 5 06 2 20 9 06 3 20 3 06 3 20 7 06 4 20 1 06 4 20 5 06 4 20 9 07 01 20 05 Properties of data, cont. Consequences for methodology NL and N Sales: low prices combined with enormous increase in quantities sold 7.00 Unit values 6.50 6.00 5.50 5.00 4.50 4.00 3.50 3.00 2.50 10000 Quantities 8000 6000 4000 2000 0 Expenditures Properties of data, cont. Consequences for methodology NL and N Consequences of sales: Single observations can have extremely high influence on elementary index Risk of bias applying monthly chaining and explicit weights Properties of data, cont. Consequences for methodology NL and N Bias not just theoretically! Example for detergents Formula Weekly index I(200835; 200501=100) Laspeyres 7 794 207.27 Monthly index I(200808; 200501=100) 11 301.04 Paasche 0.0000033 0.88 Fisher 5.10 99.89 Törnqvist 7.40 101.53 Jevons 78.76 91.75 Walsh 33.78 107.72 Properties of data, cont. Consequences for methodology NL and N How to deal with sales? NL N crude weighting on item level: w=0 or 1 Manual checks of price ratios that contribute most to elementary results: “critical observations” Properties of data, cont. Consequences for methodology NL and N Implausible price changes NL price changes (pt/pt-1) of more than a factor 4 are deleted N Changes of +5000% and -99% do actually occur price changes (pt/pt-1) of more than a factor 3 are deleted 05 20 08 03 01 20 08 20 08 11 09 20 07 20 07 07 05 20 07 20 07 03 01 20 07 20 07 11 09 20 06 20 06 07 05 20 06 Temporarily 20 06 03 01 20 06 20 06 11 09 20 05 20 05 07 05 20 05 20 05 03 01 20 05 20 05 Properties of data, cont. Consequences for methodology NL and N missing prices 10 Prices of item toilet paper 9 8 7 6 5 Properties of data, cont. Consequences for methodology NL and N How to deal with temporarily missing prices: NL: imputation of missing prices N: no adjustments, but imputing prices is considered for the near future Properties of data, cont. Consequences for methodology NL and N Quality differences Items with same EAN are considered to be identical Items with different EAN are treated as different items (no matching) How to deal with quality differences: NL N Only adjustment in exceptional cases: manual interference No adjustment Actual method - NL Data received: For each item each week: EAN Short description (Chain specific) product group Used to link items to COICOP automatically Expenditures Quantities sold Actual method – NL, cont. Price of item: Unweighted price index elementary level: Unit value based on first three weeks of month Monthly chained Jevons on selection of items Weighted price index higher aggregates: Yearly chained Laspeyres Weights based on scanner data of all 52 weeks of previous year Actual method – NL, cont. Item selection at elementary level Items with low expenditures Other items Threshold of low (average) expenditure share: si : w=0 : w=1 j , m / m 1 1 j ,m Nl Example: threshold =1% for χ=2 and N=50 Actual method – NL, cont. Determination of threshold value Simulations lead to: Optimal value: χ=1.25 Ca 50% of items is excluded (on average) Elementary index based on 80 à 85% of total expenditures Elementary level (chain dependent) comparable with COICOP6 Actual method – NL, cont. Refinements: Extreme price changes are excluded (factor 4) Missing prices are imputed Dump prices at end lifecycle item are excluded (see paper) Actual method – NL. What advantages were achieved? Indexes are of higher quality Response burden for companies is lower Compared with old method scanner data Compared with field survey No price collection in the shops Efficiency gains? Yes: more or less automatic production process Investment costs (IT-system) were (very) high 20 08 0 20 1 08 0 20 2 08 0 20 3 08 0 20 4 08 0 20 5 08 0 20 6 08 0 20 7 08 0 20 8 08 0 20 9 08 1 20 0 08 1 20 1 08 1 20 2 09 0 20 1 09 0 20 2 09 0 20 3 09 0 20 4 09 0 20 5 09 0 20 6 09 0 20 7 09 0 20 8 09 0 20 9 09 1 20 0 09 1 20 1 09 12 Illustrations Price indexes based on five supermarkets 106 Coicop 010000: Food and non-alcoholic beverages 105 104 103 102 101 100 99 98 Field survey Scanner data 20 08 0 20 1 08 0 20 2 08 0 20 3 08 0 20 4 08 0 20 5 08 0 20 6 08 0 20 7 08 0 20 8 08 0 20 9 08 1 20 0 08 1 20 1 08 1 20 2 09 0 20 1 09 0 20 2 09 0 20 3 09 0 20 4 09 0 20 5 09 0 20 6 09 0 20 7 09 0 20 8 09 0 20 9 09 1 20 0 09 1 20 1 09 12 20 08 0 20 1 08 0 20 2 08 0 20 3 08 0 20 4 08 0 20 5 08 0 20 6 08 0 20 7 08 0 20 8 08 0 20 9 08 1 20 0 08 1 20 1 08 1 20 2 09 0 20 1 09 0 20 2 09 0 20 3 09 0 20 4 09 0 20 5 09 0 20 6 09 0 20 7 09 0 20 8 09 0 20 9 09 1 20 0 09 1 20 1 09 12 Illustrations Price indexes based on five supermarkets 130 Coicop 11930: Soups, broths 120 110 100 90 Field survey Field survey Scanner data 120 Coicop 11150: Pasta products 115 110 105 100 95 Scanner data Actual method - N Data received: For each item in the midweek of the month: EAN/PLU Short description (Chain specific) product group Calculated average price Quantity sold Expenditure Actual method – N, cont. Sample of representative outlets Matching EAN/PLU with COICOP6 Weighted Jevons price index on elementary level with expenditures shares of current and base period; Stratified by chain and concept Monthly chained Törnqvist index Scanner data weights between the COICOP6 groups Actual method – N, cont. Higher aggregates: Yearly chained Laspeyres Weights from HES (NR as of 2011) Exclude strongly seasonal items only available for a certain period of the year Manual control and possibly exclusion of extreme contributions to elementary results Actual method – N What advantages were achieved? Indexes of higher quality? Low response burden for companies New methodology led to reduction of e.g sampling and measurement errors, but also to new biases Much more data – more detailed price indexes Considering both prices and quantities Many indexes have improved, others have not No questionnaires Efficiency gains? Automatic production process which requires some manual interference Resources demanded not much higher than before High investment costs (IT-system) New methodology Newly developed index (Ivancic, Diewert, Fox) Rolling year GEKS price index Source: GEKS-algorithm of purchasing power parities (International Comparison Programme) GEKS index transitive by construction chained index equals direct index no chain drift A geometric mean of direct superlative price indexes New methodology, cont. P jl and P kl : bilateral indexes (Törnqvist or Fisher) between entities j and l (l=1..M) and between entities k and l, respectively Purchasing power parities Scanner data M jk GEKS P P /P l 1 jl kl 1 / M : entity is country : entity is month M P P l 1 jl lk 1 / M New methodology, cont. Expanding time period leads to revising all previous GEKS indexes Solution: rolling version (chaining) 0,12 PGEKS : start of series 0,13 RGEKS P P P 13 0,12 GEKS 12 , t / P13,t 1 / 13 : first chaining t 1 0,14 RGEKS P P 0,13 GEKS P 14 13, t t 2 etc / P14,t 1 / 13 : sec ond chaining RYGEKS and NL RYGEKS specifically developed for Statistics Netherlands as remedy for not-weighting at elementary level Not (yet) applied in practice Used as benchmark Finding optimal value threshold Current method (NL) resembles RYGEKS quite well (on average) No bias found RYGEKS and NL: Illustrations Coicop 11130: Bread and other bakery products 108 106 104 102 100 98 09 0 7 20 09 0 5 20 09 0 3 former method 20 09 0 1 20 09 0 1 20 08 1 9 new method 20 08 0 7 20 08 0 5 20 08 0 3 20 08 0 1 20 08 0 1 07 1 9 20 07 0 7 20 07 0 5 20 07 0 3 20 07 0 1 20 07 0 20 20 rolling-year GEKS 9 96 RYGEKS and NL: Illustrations Coicop 012240: Syrups (Lemonade) 115 110 105 100 95 90 85 rolling-year GEKS new method former method 09 09 20 09 07 20 09 05 20 09 03 20 09 01 20 08 11 20 08 09 20 08 07 20 08 05 20 08 03 20 08 01 20 07 11 20 07 09 20 07 07 20 07 05 20 07 03 20 20 07 01 80 RYGEKS and NL: Illustrations Coicop 56120: Detergents 130 120 110 100 90 80 rolling-year GEKS 9 09 0 7 20 09 0 5 20 09 0 3 former method 20 09 0 1 20 09 0 1 20 08 1 9 new method 20 08 0 7 20 20 08 0 5 08 0 3 20 08 0 1 20 08 0 1 20 07 1 9 20 07 0 7 20 07 0 5 20 07 0 3 20 07 0 20 20 07 0 1 70 RYGEKS and NL, cont. Plans for near future: Shadow system based on RYGEKS indexes Continuous benchmark for current method Implementation when RYGEKS is widely accepted? More (international) analysis needed RYGEKS and N RYGEKS indexes tested on Norwegian scanner data on different levels; EAN, elementary and aggregated COICOP levels For COICOP 01 compared a monthly chained Törnqvist index with a monthly chained RYGEKS index The results indicate some bias in the Törnqvist index RYGEKS and N, cont. Small deviations for many COICOP aggregates Milk, Cheese and eggs, Oils and fats, Vegetables, Fish COICOP 0114: M ilk, che e se and e ggs 130,0 RYGEKS Törnqvist 125,0 120,0 115,0 110,0 105,0 100,0 95,0 90,0 20 06 20 07 06 20 09 06 20 11 07 20 01 07 20 03 07 20 05 07 20 07 07 20 09 07 20 11 08 20 01 08 20 03 08 20 05 08 20 07 08 20 09 08 20 11 09 20 01 09 20 03 09 20 05 09 20 07 09 20 09 09 11 RYGEKS and N, cont. COICOP 0111: Bread and cereals 125,0 RYGEKS Törnqvist 120,0 115,0 110,0 105,0 100,0 95,0 20 06 20 07 06 20 09 06 20 11 07 20 01 07 20 03 07 20 05 07 20 07 07 20 09 07 20 11 08 20 01 08 20 03 08 20 05 08 20 07 08 20 09 08 20 11 09 20 01 09 20 03 09 20 05 09 20 07 09 20 09 09 11 90,0 While others show more deviations Meat, Sugar, jam and chocolate 20 06 07 20 06 1 20 0 07 0 20 1 07 04 20 07 0 20 7 07 1 20 0 08 01 20 08 0 20 4 08 0 20 7 08 10 20 09 0 20 1 09 04 20 09 0 20 7 09 10 RYGEKS and N, cont. COICOP 0112: Meat 120,0 115,0 Törnqvist RYGEKS 110,0 105,0 100,0 95,0 90,0 RYGEKS and N, cont. Causing bias; Missing prices Seasonal items (not excluded) Price and quantity oscillating over time Shadow system for calculating RYGEKS indexes on monthly basis established Too early to be implemented Scanner data in other branches? NL: Expanding to other branches desirable Data available (e.g. durables) Problem of quality changes Analysis needed N: Continuously working to expand scanner data Data available for pharmaceutical products, wine and spirits (state monopoly) and petrol Increasing pressure from chains and outlets Mostly price information implemented Have tried to cover clothing, but matched item model unsuccessful