A System For Scanner Data

Some Implementation Issues of Scanner Data
Muhanad Sammar, Anders Norberg & Can Tongur
Some Background
• 3 major outlet chains in Sweden
• Statistics Sweden has received scanner data since 2009
• First principal issue: to decide how to use S.D.
• The Swedish CPI Board approved the use of scanner data in 2011
• Second principal issue: how to aggregate data
The First Principal Issue
– How to Use Scanner Data
A. Replace the manually collected price data with
scanner data for the sample of outlets and
products
B. Use scanner data as auxiliary information
C. Compute index based on scanner data all
products (and outlets)
D. Use scanner data for auditing and quality
control
The First Principal Issue
– How to Use Scanner Data
A. Replace the manually collected price data
with scanner data for the sample of outlets
and products
Sample of 32 supermarkets and local shops
and 4 hypermarkets
3 negatively coordinated samples of 500
products, each identified by EAN
A. is the Swedish idea
The First Principal Issue
– How to Use Scanner Data
A. Replace the manually collected price data
with scanner data for the sample of outlets
and products
B. Use scanner data as auxiliary information
Index = [Index(M.C.P.) / Index(S.D., small sample)] * Index(S.D., big sample)
The First Principal Issue
– How to Use Scanner Data
A. Replace the manually collected price data
with scanner data for the sample of outlets
and products
B. Use scanner data as auxiliary information
Index = [Index(S.D., big sample) / Index(S.D., small sample)] * Index(M.C.P.)
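The two ways of writing the estimator for option B are algebraically the same. A minimal sketch follows; the index values and the function name `auxiliary_index` are illustrative assumptions, not figures or code from the presentation.

```python
def auxiliary_index(index_mcp_small, index_sd_small, index_sd_big):
    """Ratio-type estimator: scale the big-sample scanner data index by the
    ratio of the manually collected index to the scanner data index, both
    computed on the small sample."""
    return index_mcp_small / index_sd_small * index_sd_big


# Hypothetical index values (base = 100), made up for illustration.
index_mcp_small = 101.2  # manually collected prices, small sample
index_sd_small = 100.8   # scanner data, same small sample
index_sd_big = 100.5     # scanner data, big sample

form_1 = auxiliary_index(index_mcp_small, index_sd_small, index_sd_big)

# Second way of writing it: (S.D. big / S.D. small) * M.C.P.
form_2 = index_sd_big / index_sd_small * index_mcp_small

print(round(form_1, 3), round(form_2, 3))
```

Both forms give the same number; they differ only in whether the scanner data ratio is read as a correction to the manually collected index or the other way round.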
The First Principal Issue
– How to Use Scanner Data
A. Replace the manually collected price data with
scanner data for the sample of outlets and
products
B. Use scanner data as auxiliary information
C. Compute index based on scanner data all
products (and outlets)
Problems:
- COICOP classification of all products
- Products with deposits must be identified
- New products might hide price changes
The First Principal Issue
– How to Use Scanner Data
A. Replace the manually collected price data with
scanner data for the sample of outlets and
products
B. Use scanner data as auxiliary information
C. Compute index based on scanner data all
products (and outlets)
D. Use scanner data for auditing and quality
control.
We have seen variation between price collectors
as regards quality of delivery
The Second Principal Issue
– Data Aggregation
• Scanner data are weekly aggregates of data
for each product and outlet in the sample
• Each week has ca. 8 500 price observations
• Weekly data require aggregation to monthly values
Natural choices of aggregation:
i. Unweighted Geometric Mean value or
ii. Quantity-Weighted Arithmetic Mean value
Motives
i. In line with rest of CPI for daily necessities
ii. In line with data
The Two Mean Values
• The geometric mean value:
$$P_t^G = \left( \prod_{w=1}^{W} p_{j,k,w} \right)^{1/W}$$
• The weighted arithmetic mean value:
$$P_t^A = \frac{\sum_{w=1}^{W} p_{j,k,w} \, q_{j,k,w}}{\sum_{w=1}^{W} q_{j,k,w}}$$
• We compared the two methods irrespective of their inherent differences
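As a small worked example of the two mean values, the sketch below aggregates four weekly observations for one product in one outlet to a monthly price. The weekly prices and quantities are made up, and the function names are hypothetical; the reading of p and q as weekly price and quantity sold, with W the number of weeks in the month, is an assumption based on the slide on data aggregation.

```python
import math


def geometric_mean(prices):
    """Unweighted geometric mean of the weekly prices."""
    return math.exp(sum(math.log(p) for p in prices) / len(prices))


def weighted_arithmetic_mean(prices, quantities):
    """Quantity-weighted arithmetic mean (turnover divided by quantity)."""
    turnover = sum(p * q for p, q in zip(prices, quantities))
    return turnover / sum(quantities)


# Hypothetical month: a price cut in week 3 attracts most of the sales.
weekly_prices = [20.0, 20.0, 15.0, 20.0]
weekly_quantities = [100, 110, 400, 90]

p_geo = geometric_mean(weekly_prices)
p_arith = weighted_arithmetic_mean(weekly_prices, weekly_quantities)

print(round(p_geo, 2), round(p_arith, 2))
```

In this made-up month the quantity-weighted mean lies below the unweighted geometric mean, because the low-price week carries most of the quantity.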
Some Statistics
• 2% Geometric > Arithmetic in base while
Geometric=Arithmetic in Jan, Feb, Mar
• 3% Geometric = Arithmetic in base while
Geometric > Arithmetic in Jan, Feb, Mar
• > 98% of observations (weekly prices) without variations
between days
• Ca. 9% of monthly prices had variations between weeks
Figure 5.1 in the paper: Logarithmic ratios of mean prices in current month relative to base
period. Unweighted geometric mean on vertical axis and quantity-weighted arithmetic mean
on horizontal axis. Eight sectors are numbered for analysis purposes.
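[Figure 5.1 annotation: sector conditions comparing P_t^A with P_t^G and P_Base^A with P_Base^G, together with the weekly prices and quantities (P_t,w, Q_t,w) and (P_Base,w, Q_Base,w); the comparison operators are not recoverable.]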
Figure 5.2 in the paper. Monthly price indices for product groups in supermarkets
and hypermarkets, based on geometric and arithmetic mean prices per month.
[Scatter plot: Index_G on the vertical axis against Index_A on the horizontal axis, both from 80 to 120.]
Indices by Different Methods
Period      Unw. Geom.   W. Arith.   W. Geom.   Unw. Arith.
January     100          99.815      99.785     100.038
February    100          99.998      99.996     100.000
March       100          100.000     100.000    100.003
April       100          99.969      99.963     100.008
Quantity weighting seems to have a small impact…
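To see why the four methods can differ, the sketch below computes a price ratio (current month over base month, times 100) for a single product-offer under each method. The data and the helper `mean_price` are hypothetical, and the sketch does not reproduce how the presentation aggregates across products and outlets.

```python
import math


def mean_price(prices, quantities, kind, weighted):
    """Monthly mean of weekly prices: arithmetic or geometric,
    quantity-weighted or unweighted."""
    weights = quantities if weighted else [1.0] * len(prices)
    total = sum(weights)
    if kind == "arithmetic":
        return sum(w * p for w, p in zip(weights, prices)) / total
    if kind == "geometric":
        return math.exp(
            sum(w * math.log(p) for w, p in zip(weights, prices)) / total
        )
    raise ValueError(f"unknown kind: {kind}")


# Hypothetical weekly data: stable base month, one sale week in current month.
base_prices, base_quantities = [10.0, 10.0, 10.0, 10.0], [50, 50, 50, 50]
cur_prices, cur_quantities = [10.0, 8.0, 10.0, 10.0], [50, 200, 50, 50]

ratios = {}
for kind in ("geometric", "arithmetic"):
    for weighted in (False, True):
        base = mean_price(base_prices, base_quantities, kind, weighted)
        cur = mean_price(cur_prices, cur_quantities, kind, weighted)
        ratios[(kind, weighted)] = 100.0 * cur / base

for key, value in ratios.items():
    print(key, round(value, 2))
```

With these made-up numbers the weighted variants fall furthest, since the sale week dominates the quantities, while the unweighted arithmetic mean reacts least.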
Figure 5.3 in the paper. Distribution of price changes during January –
April 2012 with base in December 2011. Unweighted geometric mean.
[Histogram: frequency (0 to 3000) on the vertical axis, price ratio (-0.80 to 0.85) on the horizontal axis.]
Data Quality
Variation between outlets for scanner data (left) and manually collected data (right).
Individual prices on vertical axis and monthly average prices per product on horizontal axis.
The year 2010.
Data Quality (2)
Scanner Data (S.D.) and Manually Collected Prices (M.C.P.) in
comparison. Product-offers, outlets and weeks. January –
December, 2009 and 2010.
Matching categories              2009 (%)   2010 (%)
Neither in M.C.P. nor in S.D.    1.5        0.6
In M.C.P. but not in S.D.        4.5        5.3
In S.D. but not in M.C.P.        1.5        0.9
M.C.P. = S.D.                    83.4       86.2
M.C.P. > S.D.                    4.3        3.7
M.C.P. < S.D.                    4.8        3.3
Number of comparable product-offers is 36 102 and 38 786 respectively.
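A comparison of this kind can be sketched as a simple classification of matched product-offers. Everything below is an assumption for illustration: the example prices, the use of `None` for a missing price, and the rule of comparing prices rounded to two decimals are not taken from the presentation.

```python
from collections import Counter


def match_category(mcp_price, sd_price):
    """Classify one product-offer by comparing the manually collected
    price (M.C.P.) with the scanner data price (S.D.)."""
    if mcp_price is None and sd_price is None:
        return "Neither in M.C.P. nor in S.D."
    if sd_price is None:
        return "In M.C.P. but not in S.D."
    if mcp_price is None:
        return "In S.D. but not in M.C.P."
    # Assumed rule: prices are equal if they agree to two decimals.
    mcp, sd = round(mcp_price, 2), round(sd_price, 2)
    if mcp == sd:
        return "M.C.P. = S.D."
    return "M.C.P. > S.D." if mcp > sd else "M.C.P. < S.D."


# Hypothetical (M.C.P. price, S.D. price) pairs for one outlet and week.
offers = [
    (19.90, 19.90), (None, 12.50), (24.00, None), (15.00, 14.50),
    (9.90, 10.90), (None, None), (30.00, 30.00),
]

counts = Counter(match_category(m, s) for m, s in offers)
shares = {cat: 100.0 * n / len(offers) for cat, n in counts.items()}

for cat, share in shares.items():
    print(f"{cat}: {share:.1f}%")
```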
EAN code maintenance
• S.D. = Vast Amounts of Data ≠ Large Samples
• Data extraction = EAN code probing
• Yearly EAN survival rate (base-to-base) 70-80%
• Some 500 products identified and maintained
• Until now, 35 of 538 products changed EAN code during 2012 (= 6.5%)
• Fixed basket implication - Always up to date with S.D.!
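One way to read the survival rate is as the share of EAN codes present in one base period that are still present in the next. The sketch below uses that reading as an assumption; the function name and the EAN codes are hypothetical. Only the 35 of 538 figure comes from the slide.

```python
def ean_survival_rate(base_eans, next_base_eans):
    """Share of EAN codes in the base period that still exist in the
    following base period."""
    base = set(base_eans)
    if not base:
        raise ValueError("base period contains no EAN codes")
    return len(base & set(next_base_eans)) / len(base)


# Hypothetical EAN codes, for illustration only.
base_period = ["7310000000011", "7310000000028", "7310000000035", "7310000000042"]
next_base_period = ["7310000000011", "7310000000028", "7310000000042", "7310000000059"]

rate = ean_survival_rate(base_period, next_base_period)
print(f"survival rate: {rate:.0%}")

# Share of maintained products that changed EAN code during 2012 (from the slide).
changed_share = 100.0 * 35 / 538
print(f"changed EAN: {changed_share:.1f}%")
```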