WORKSHOP ON SCANNER DATA Geneva 10 May 2010 Joint presentation by Ragnhild Nygaard (Statistics Norway) and Heymerik van der Grient (Statistics Netherlands)

Download Report

Transcript WORKSHOP ON SCANNER DATA Geneva 10 May 2010 Joint presentation by Ragnhild Nygaard (Statistics Norway) and Heymerik van der Grient (Statistics Netherlands)

WORKSHOP ON SCANNER DATA
Geneva
10 May 2010
Joint presentation by
Ragnhild Nygaard
(Statistics Norway)
and
Heymerik van der Grient
(Statistics Netherlands)
Historical overview – NL
Supermarkets


Mid 90s: first contacts with chain(s)
2002: first implementation: 1/2
chain(s)

Yearly Laspeyres (labour intensive)




Construction of yearly basket of items
Manual linking of items to COICOP-groups
Manual replacement of disappearing items
Reduction of ca 10 000 monthly price
quotes in field survey
Historical overview – NL, cont
Supermarkets

2010: extension: 6 chains

Monthly chained Jevons (efficient
process)



No manual linking of items
No explicit replacements
Extra reduction of ca 5 000 monthly
price quotes in field survey
Historical overview – N

1997: first contact with one chain




Gradually contact with more chains
Implementation in the CPI
 only price information of specific
representative items
2002: scanner data from all the chains
(no questionnaires - big incentive)
Aug 2005: expanded use for COICOP 01

price and quantity information for all items in
representative outlets
Questions to be answered when
dealing with scanner data





How/Where require scanner data?
Which statistical method?
How to link items to COICOP?
How to deal with all kind of
particularities in data?
Development of new computer
system?
Source of scanner data

Market research companies




Cleaned data
(very) expensive
Two-stage delivery chain (timeliness)
Companies/Chains



Raw data
Cheap (NL/N do not pay)
Direct contact with original supplier
Negotiations with companies

Time consuming process



Negotiations can take up to a year or more
including meetings, sending test data, analysing
data etc.
Be aware of some company establishing costs
e.g. preparing the data extractions
Can company provide what you
want/need?

E.g. information to link items to COICOP
automatically
Negotiations with companies, cont.

Focus on advantages for companies



Minor costs once established (just a copy
of their sales administration)
No questionnaires or monthly visits of
price collectors
Other incentives for companies?


Money – not likely
Information

E.g. company price development compared to
overall price development
Negotiations with companies, cont.

Establishing good routines with the
companies are essential


Strict time schedules
No changes in formats when
implemented
Pre - production work

Take your time analyzing the data


Enormous amount of data
 N: Over 300 000 price observations each
month divided into about 14 000 items
Build shadow system (prototype)
 Compare the new price indexes based on
scanner data with the old method for a certain
period of time before implementation
 Discover possible problems in advance

Unexpected situations will arise for sure
Pre - production work

Ideas for analysing the data:







Is same EAN always same item?
Extreme price changes
Specific price development at beginning or end
life cycle EAN structurally
 Risk of bias!
All kind of dynamics in data
Missing prices
Do properties of data change over time
Etc
Methodology / IT-system

Find methodology that:




Delivers good indexes (e.g. no bias)
Can deal with all particularities in data
Build IT-system that supports the
chosen methodology
Learn from experiences other
countries using scanner data
Properties of data
Consequences for methodology NL and N
High attrition rate of items

70
60
50
40
30
20
10
# matches (200501,t)
# matches (t-1, t)
07
05
20
08
03
20
08
01
20
08
11
20
08
09
20
07
07
20
07
05
# matches (t, 200808)
20
07
03
20
07
01
20
07
11
20
07
09
20
06
07
20
06
05
20
06
03
20
06
01
20
06
11
20
06
09
20
05
07
20
05
05
20
05
03
20
05
20
05
20
05
01
0
Properties of data, cont.
Consequences for methodology NL and N

How to deal with high attrition rate
of items

NL
: monthly chained index

N
: monthly chained index
30000
25000
01
0
20 5
05
0
20 9
05
1
20 3
05
1
20 7
05
2
20 1
05
2
20 5
05
2
20 9
05
3
20 3
05
3
20 7
05
4
20 1
05
4
20 5
05
4
20 9
06
0
20 1
06
0
20 5
06
0
20 9
06
1
20 3
06
1
20 7
06
2
20 1
06
2
20 5
06
2
20 9
06
3
20 3
06
3
20 7
06
4
20 1
06
4
20 5
06
4
20 9
07
01
20
05
20
05
0
20 1
05
0
20 5
05
0
20 9
05
1
20 3
05
1
20 7
05
2
20 1
05
2
20 5
05
2
20 9
05
3
20 3
05
3
20 7
05
4
20 1
05
4
20 5
05
4
20 9
06
0
20 1
06
0
20 5
06
0
20 9
06
1
20 3
06
1
20 7
06
2
20 1
06
2
20 5
06
2
20 9
06
3
20 3
06
3
20 7
06
4
20 1
06
4
20 5
06
4
20 9
07
01
20
05
Properties of data, cont.
Consequences for methodology NL and N
Sales: low prices combined with enormous increase in
quantities sold
7.00
Unit values
6.50
6.00
5.50
5.00
4.50
4.00
3.50
3.00
2.50
10000
Quantities
8000
6000
4000
2000
0
Expenditures
Properties of data, cont.
Consequences for methodology NL and N

Consequences of sales:


Single observations can have extremely
high influence on elementary index
Risk of bias applying monthly chaining
and explicit weights
Properties of data, cont.
Consequences for methodology NL and N

Bias not just theoretically!

Example for detergents
Formula
Weekly index
I(200835; 200501=100)
Laspeyres 7 794 207.27
Monthly index
I(200808; 200501=100)
11 301.04
Paasche
0.0000033
0.88
Fisher
5.10
99.89
Törnqvist
7.40
101.53
Jevons
78.76
91.75
Walsh
33.78
107.72
Properties of data, cont.
Consequences for methodology NL and N

How to deal with sales?

NL

N
crude weighting on item level:
w=0 or 1
Manual checks of price ratios
that contribute most to
elementary results:
“critical observations”
Properties of data, cont.
Consequences for methodology NL and N

Implausible price changes

NL
price changes (pt/pt-1) of more
than a factor 4 are deleted


N
Changes of +5000% and -99% do actually occur
price changes (pt/pt-1) of more
than a factor 3 are deleted
05
20
08
03
01
20
08
20
08
11
09
20
07
20
07
07
05
20
07
20
07
03
01
20
07
20
07
11
09
20
06
20
06
07
05
20
06
Temporarily
20
06
03
01
20
06
20
06
11
09
20
05
20
05
07
05
20
05
20
05
03
01
20
05
20
05
Properties of data, cont.
Consequences for methodology NL and N
missing prices
10
Prices of item toilet paper
9
8
7
6
5
Properties of data, cont.
Consequences for methodology NL and N

How to deal with temporarily
missing prices:

NL:
imputation of missing prices

N:
no adjustments, but imputing
prices is considered for the near
future
Properties of data, cont.
Consequences for methodology NL and N

Quality differences



Items with same EAN are considered to be
identical
Items with different EAN are treated as different
items (no matching)
How to deal with quality differences:

NL

N
Only adjustment in exceptional
cases: manual interference
No adjustment
Actual method - NL

Data received:

For each item each week:
EAN
 Short description
 (Chain specific) product group


Used to link items to COICOP automatically
Expenditures
 Quantities sold

Actual method – NL, cont.

Price of item:


Unweighted price index elementary level:


Unit value based on first three weeks of month
Monthly chained Jevons on selection of items
Weighted price index higher aggregates:


Yearly chained Laspeyres
Weights based on scanner data of all 52 weeks
of previous year
Actual method – NL, cont.

Item selection at elementary level



Items with low expenditures
Other items
Threshold of low (average) expenditure share:
si

: w=0
: w=1
j , m / m 1
1

j ,m
  Nl
Example: threshold =1% for χ=2 and N=50
Actual method – NL, cont.

Determination of threshold value

Simulations lead to:


Optimal value: χ=1.25
 Ca 50% of items is excluded (on average)
 Elementary index based on 80 à 85% of total
expenditures
Elementary level (chain dependent)
comparable with COICOP6
Actual method – NL, cont.

Refinements:



Extreme price changes are excluded
(factor 4)
Missing prices are imputed
Dump prices at end lifecycle item are
excluded (see paper)
Actual method – NL.
What advantages were achieved?

Indexes are of higher quality



Response burden for companies is lower


Compared with old method scanner data
Compared with field survey
No price collection in the shops
Efficiency gains?


Yes: more or less automatic production
process
Investment costs (IT-system) were (very) high

20
08
0
20 1
08
0
20 2
08
0
20 3
08
0
20 4
08
0
20 5
08
0
20 6
08
0
20 7
08
0
20 8
08
0
20 9
08
1
20 0
08
1
20 1
08
1
20 2
09
0
20 1
09
0
20 2
09
0
20 3
09
0
20 4
09
0
20 5
09
0
20 6
09
0
20 7
09
0
20 8
09
0
20 9
09
1
20 0
09
1
20 1
09
12
Illustrations
Price indexes based on five supermarkets
106
Coicop 010000: Food and non-alcoholic beverages
105
104
103
102
101
100
99
98
Field survey
Scanner data
20
08
0
20 1
08
0
20 2
08
0
20 3
08
0
20 4
08
0
20 5
08
0
20 6
08
0
20 7
08
0
20 8
08
0
20 9
08
1
20 0
08
1
20 1
08
1
20 2
09
0
20 1
09
0
20 2
09
0
20 3
09
0
20 4
09
0
20 5
09
0
20 6
09
0
20 7
09
0
20 8
09
0
20 9
09
1
20 0
09
1
20 1
09
12

20
08
0
20 1
08
0
20 2
08
0
20 3
08
0
20 4
08
0
20 5
08
0
20 6
08
0
20 7
08
0
20 8
08
0
20 9
08
1
20 0
08
1
20 1
08
1
20 2
09
0
20 1
09
0
20 2
09
0
20 3
09
0
20 4
09
0
20 5
09
0
20 6
09
0
20 7
09
0
20 8
09
0
20 9
09
1
20 0
09
1
20 1
09
12
Illustrations
Price indexes based on five supermarkets
130
Coicop 11930: Soups, broths
120
110
100
90
Field survey
Field survey
Scanner data
120
Coicop 11150: Pasta products
115
110
105
100
95
Scanner data
Actual method - N

Data received:

For each item in the midweek of the
month:
EAN/PLU
 Short description
 (Chain specific) product group
 Calculated average price
 Quantity sold
 Expenditure

Actual method – N, cont.

Sample of representative outlets



Matching EAN/PLU with COICOP6
Weighted Jevons price index on
elementary level with expenditures shares
of current and base period;


Stratified by chain and concept
Monthly chained Törnqvist index
Scanner data weights between the
COICOP6 groups
Actual method – N, cont.

Higher aggregates:




Yearly chained Laspeyres
Weights from HES (NR as of 2011)
Exclude strongly seasonal items only
available for a certain period of the year
Manual control and possibly exclusion of
extreme contributions to elementary
results
Actual method – N
What advantages were achieved?

Indexes of higher quality?





Low response burden for companies


New methodology led to reduction of e.g sampling and
measurement errors, but also to new biases
Much more data – more detailed price indexes
Considering both prices and quantities
Many indexes have improved, others have not
No questionnaires
Efficiency gains?

Automatic production process which requires some
manual interference


Resources demanded not much higher than before
High investment costs (IT-system)
New methodology

Newly developed index (Ivancic, Diewert, Fox)

Rolling year GEKS price index
 Source:



GEKS-algorithm of purchasing power parities
(International Comparison Programme)
GEKS index transitive by construction
 chained index equals direct index
 no chain drift
A geometric mean of direct superlative price
indexes
New methodology, cont.
P jl and P kl :
bilateral indexes (Törnqvist or Fisher) between
entities j and l (l=1..M) and between entities k and l,
respectively
Purchasing power parities
Scanner data
M
jk
GEKS
P

 P /P
l 1
jl

kl 1 / M
: entity is country
: entity is month
M

 P P
l 1
jl

lk 1 / M
New methodology, cont.


Expanding time period leads to revising all
previous GEKS indexes
Solution: rolling version (chaining)
0,12
PGEKS
: start of series
0,13
RGEKS
P
P
 P
13
0,12
GEKS
12 , t
/ P13,t

1 / 13
: first chaining
t 1
0,14
RGEKS
P
P
0,13
GEKS
 P
14
13, t
t 2
etc
/ P14,t

1 / 13
: sec ond chaining
RYGEKS and NL

RYGEKS specifically developed for Statistics
Netherlands as remedy for not-weighting at
elementary level



Not (yet) applied in practice
Used as benchmark
 Finding optimal value threshold
Current method (NL) resembles RYGEKS quite
well (on average)
 No bias found
RYGEKS and NL: Illustrations
Coicop 11130: Bread and other bakery products
108
106
104
102
100
98
09
0
7
20
09
0
5
20
09
0
3
former method
20
09
0
1
20
09
0
1
20
08
1
9
new method
20
08
0
7
20
08
0
5
20
08
0
3
20
08
0
1
20
08
0
1
07
1
9
20
07
0
7
20
07
0
5
20
07
0
3
20
07
0
1
20
07
0
20
20
rolling-year GEKS
9
96
RYGEKS and NL: Illustrations
Coicop 012240: Syrups (Lemonade)
115
110
105
100
95
90
85
rolling-year GEKS
new method
former method
09
09
20
09
07
20
09
05
20
09
03
20
09
01
20
08
11
20
08
09
20
08
07
20
08
05
20
08
03
20
08
01
20
07
11
20
07
09
20
07
07
20
07
05
20
07
03
20
20
07
01
80
RYGEKS and NL: Illustrations
Coicop 56120: Detergents
130
120
110
100
90
80
rolling-year GEKS
9
09
0
7
20
09
0
5
20
09
0
3
former method
20
09
0
1
20
09
0
1
20
08
1
9
new method
20
08
0
7
20
20
08
0
5
08
0
3
20
08
0
1
20
08
0
1
20
07
1
9
20
07
0
7
20
07
0
5
20
07
0
3
20
07
0
20
20
07
0
1
70
RYGEKS and NL, cont.

Plans for near future:



Shadow system based on RYGEKS
indexes
Continuous benchmark for current
method
Implementation when RYGEKS is
widely accepted?

More (international) analysis needed
RYGEKS and N

RYGEKS indexes tested on Norwegian
scanner data on different levels;



EAN, elementary and aggregated COICOP
levels
For COICOP 01 compared a monthly
chained Törnqvist index with a monthly
chained RYGEKS index
The results indicate some bias in the
Törnqvist index
RYGEKS and N, cont.
Small deviations for many COICOP
aggregates

Milk, Cheese and eggs, Oils and fats,
Vegetables, Fish
COICOP 0114: M ilk, che e se and e ggs
130,0
RYGEKS
Törnqvist
125,0
120,0
115,0
110,0
105,0
100,0
95,0
90,0
20
06
20 07
06
20 09
06
20 11
07
20 01
07
20 03
07
20 05
07
20 07
07
20 09
07
20 11
08
20 01
08
20 03
08
20 05
08
20 07
08
20 09
08
20 11
09
20 01
09
20 03
09
20 05
09
20 07
09
20 09
09
11

RYGEKS and N, cont.
COICOP 0111: Bread and cereals
125,0
RYGEKS
Törnqvist
120,0
115,0
110,0
105,0
100,0
95,0
20
06
20 07
06
20 09
06
20 11
07
20 01
07
20 03
07
20 05
07
20 07
07
20 09
07
20 11
08
20 01
08
20 03
08
20 05
08
20 07
08
20 09
08
20 11
09
20 01
09
20 03
09
20 05
09
20 07
09
20 09
09
11
90,0

While others show more deviations

Meat, Sugar, jam and chocolate
20
06
07
20
06
1
20 0
07
0
20 1
07
04
20
07
0
20 7
07
1
20 0
08
01
20
08
0
20 4
08
0
20 7
08
10
20
09
0
20 1
09
04
20
09
0
20 7
09
10
RYGEKS and N, cont.
COICOP 0112: Meat
120,0
115,0
Törnqvist
RYGEKS
110,0
105,0
100,0
95,0
90,0
RYGEKS and N, cont.

Causing bias;




Missing prices
Seasonal items (not excluded)
Price and quantity oscillating over time
Shadow system for calculating RYGEKS
indexes on monthly basis established

Too early to be implemented
Scanner data in other branches?

NL:





Expanding to other branches desirable
Data available (e.g. durables)
Problem of quality changes
Analysis needed
N:

Continuously working to expand scanner data


Data available for pharmaceutical products, wine
and spirits (state monopoly) and petrol


Increasing pressure from chains and outlets
Mostly price information implemented
Have tried to cover clothing, but matched item
model unsuccessful