TRANSCRIPT
-
The use of scanner data in Denmark: a case study
Session 2.1
1
-
First step: Simple check systems in SAS (missing records, proper dates, number of records and format)
Second step: Scanner data (SD) is linked to the COICOP classification based on the supermarkets' own classification and a search-word process created in Excel and linked to SAS
Third step: Further cleaning and aggregation of scanner data
Data are validated in 3 steps
-
Check for missing records
Check for proper dates
Check for number of records
Check for format
Step 1: What is the output of the validation system?
3
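The four checks listed on this slide can be sketched in Python as follows. This is only an illustration, not the SAS system used in production: the record layout (`gtin`, `week_start`, `turnover`, `quantity`), the date format and the `expected_min_records` threshold are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical record layout: these field names are assumptions for illustration.
REQUIRED_FIELDS = ("gtin", "week_start", "turnover", "quantity")


def validate_delivery(records, expected_min_records):
    """Return a list of human-readable problems found in one data delivery."""
    problems = []

    # Check for number of records.
    if len(records) < expected_min_records:
        problems.append(f"too few records: {len(records)} < {expected_min_records}")

    for i, rec in enumerate(records):
        # Check for missing values (here: empty or absent required fields).
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) in (None, "")]
        if missing:
            problems.append(f"record {i}: missing {', '.join(missing)}")
            continue

        # Check for proper dates (assumed ISO format YYYY-MM-DD).
        try:
            datetime.strptime(rec["week_start"], "%Y-%m-%d")
        except (ValueError, TypeError):
            problems.append(f"record {i}: bad date {rec['week_start']!r}")

        # Check for format: turnover and quantity must be numeric.
        try:
            float(rec["turnover"])
            float(rec["quantity"])
        except (ValueError, TypeError):
            problems.append(f"record {i}: non-numeric turnover or quantity")

    return problems
```

An empty list means the delivery passed all checks; otherwise the list is the output of the validation step and can be sent back to the data supplier.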
-
What kind of check systems would you like to have for scanner data?
EXERCISE 1 (5 minutes)
4
-
The linking of GTIN codes to COICOP is done at a 6-digit level for each type of supermarket. At the moment we use the existing 8-digit level where possible.
Step 2: Linking of scanner data to ECOICOP
5
-
First we have the nomenclature, which can be updated:
How is the linking done?
6
C6 | C6_description | C6-description_extra
11110 | Rice | Rice, porridge of rice, rice desserts
11121 | Flour and grain | Wheat flour, wheat grains, other types of flour
11131 | Rye bread | All kinds of rye bread
11199 | Residual group of bread and bakery products |
-
Then we have the nomenclature linking (reviewed once a year):
….
7
xxx_VNR xxx_C6
xxxx 11110
xxxxx 11199
-
C6 | C6_description | Searchword1 | Searchword2 | New_C6 | New_C6_description
11199 | Residual group of bread and bakery products | %BASMATI% | | 11110 | Rice
11199 | Residual group of bread and bakery products | %HVEDEMEL% | | 11121 | Flour and grain
What do we do with the residual groups?
8
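A minimal sketch of how such a search-word rule could be applied is shown here. It assumes that `%` behaves like the SQL `LIKE` wildcard (any run of characters) and that matching is case-insensitive; neither is stated on the slide. The function names are hypothetical, and only Searchword1 is used.

```python
import re

# Rules copied from the slide: (current C6, search word, new C6).
RULES = [
    ("11199", "%BASMATI%", "11110"),   # residual bread group -> Rice
    ("11199", "%HVEDEMEL%", "11121"),  # residual bread group -> Flour and grain
]


def like_to_regex(pattern):
    """Translate a LIKE-style pattern (% = any run of characters) into a regex."""
    parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile("^" + ".*".join(parts) + "$", re.IGNORECASE | re.DOTALL)


COMPILED_RULES = [(c6, like_to_regex(word), new_c6) for c6, word, new_c6 in RULES]


def relink(c6, product_text):
    """Return the new C6 code if a search word matches, else the original code."""
    for rule_c6, regex, new_c6 in COMPILED_RULES:
        if c6 == rule_c6 and regex.match(product_text):
            return new_c6
    return c6
```

With these rules, a product text containing BASMATI that currently sits in residual group 11199 is moved to 11110, while products in other groups are left untouched.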
-
How does this look in real life?
9
-
How do we monitor which new search words are needed?
10
Sheet OMS_TEST_RESTGR:
Date | C6 | C6_description | aggr_turnover | turnover_c2_aggr | turnover share of C2 level (PCT) | turnover_c4_aggr | turnover share of C4 level (PCT)
1738 | 22099 | Residual group: tobacco | 163,110 | 165,113 | 98.79 | 165,113 | 98.79
1738 | 11399 | Residual group: fish | 3,202,789 | 1,447,052,237 | 0.22 | 52,538,203 | 6.1
1738 | 11198 | Residual group: bakery | 8,491,246 | 1,447,052,237 | 0.59 | 184,288,644 | 4.61
1738 | 11299 | Residual group: meat and poultry | 10,576,361 | 1,447,052,237 | 0.73 | 269,208,928 | 3.93
1738 | 21399 | Residual group: beer and alcopops | 1,623,644 | 300,046,876 | 0.54 | 53,736,029 | 3.02
1738 | 11279 | Residual group: cold cuts | 7,389,919 | 1,447,052,237 | 0.51 | 269,208,928 | 2.75
1738 | 12299 | Residual group: soft drinks, mineral water and juice | 2,436,005 | 1,447,052,237 | 0.17 | 91,984,358 | 2.65
1738 | 11499 | Residual group: milk, cheese and eggs | 3,982,204 | 1,447,052,237 | 0.28 | 219,864,342 | 1.81
1738 | 11899 | Residual group: sugar confectionery, jam, chocolate, ice cream etc. | 2,305,291 | 1,447,052,237 | 0.16 | 138,308,077 | 1.67
1738 | 11799 | Residual group: vegetables | 2,484,182 | 1,447,052,237 | 0.17 | 155,944,438 | 1.59
1738 | 11939 | Residual group: ketchup, remoulade and mayonnaise | 1,912,255 | 1,447,052,237 | 0.13 | 141,295,090 | 1.35
1738 | 11699 | Residual group: fruit | 1,153,241 | 1,447,052,237 | 0.08 | 104,870,145 | 1.1
1738 | 21199 | Residual group: spirits | 336,175 | 300,046,876 | 0.11 | 32,928,135 | 1.02
1738 | 11199 | Residual group: bakery and cereal products | 1,454,275 | 1,447,052,237 | 0.1 | 184,288,644 | 0.79
1738 | 12199 | Residual group: coffee, cocoa, tea | 290,527 | 1,447,052,237 | 0.02 | 46,823,438 | 0.62
1738 | 11599 | Residual group: butter, cooking oil and margarine | 220,932 | 1,447,052,237 | 0.02 | 41,926,575 | 0.53
1738 | 21299 | Residual group: wine | 401,483 | 300,046,876 | 0.13 | 91,927,445 | 0.44
1738 | 11259 | Residual group: poultry | 673,569 | 1,447,052,237 | 0.05 | 269,208,928 | 0.25

Sheet engelsk (English):
Date | C6 | C6_description | aggr_turn | turn_c2_aggr | Turnover share of C2-level turnover (PCT) | turn_c4_aggr | Turnover share of C4-level turnover (PCT)
1452 | 11279 | Residual group: cold cuts | 6,834,956 | 1,147,502,757 | 0.6 | 226,381,342 | 3.02
1452 | 21299 | Residual group: wine | 2,906,263 | 247,481,207 | 1.17 | 97,614,169 | 2.98
1452 | 11899 | Residual group: sugar confectionery, jam, chocolate, ice cream etc. | 3,926,868 | 1,147,502,757 | 0.34 | 144,025,701 | 2.73
1452 | 11198 | Residual group: bakery | 2,704,609 | 1,147,502,757 | 0.24 | 124,836,471 | 2.17
1452 | 12299 | Residual group: soft drinks, mineral water and juice | 1,450,421 | 1,147,502,757 | 0.13 | 73,728,957 | 1.97
1452 | 11199 | Residual group: bakery and cereal products | 2,343,294 | 1,147,502,757 | 0.2 | 124,836,471 | 1.88
1452 | 11799 | Residual group: vegetables | 2,216,934 | 1,147,502,757 | 0.19 | 127,325,319 | 1.74
1452 | 11399 | Residual group: fish | 935,436 | 1,147,502,757 | 0.08 | 58,241,740 | 1.61
1452 | 11299 | Residual group: meat and poultry | 3,440,437 | 1,147,502,757 | 0.3 | 226,381,342 | 1.52
1452 | 21399 | Residual group: beer and alcopops | 593,391 | 247,481,207 | 0.24 | 50,462,618 | 1.18
1452 | 12199 | Residual group: coffee, cocoa, tea | 291,888 | 1,147,502,757 | 0.03 | 27,374,696 | 1.07
1452 | 11699 | Residual group: fruit | 795,484 | 1,147,502,757 | 0.07 | 101,144,643 | 0.79
1452 | 11259 | Residual group: poultry | 1,337,112 | 1,147,502,757 | 0.12 | 226,381,342 | 0.59
1452 | 11499 | Residual group: milk, cheese and eggs | 887,954 | 1,147,502,757 | 0.08 | 167,286,241 | 0.53
1452 | 21199 | Residual group: spirits | 94,080 | 247,481,207 | 0.04 | 32,259,173 | 0.29
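The share columns in the monitoring table are the residual group's turnover divided by the turnover of its C2 or C4 level, in percent. The sketch below recomputes a few of them from the figures in the table for Date 1738. The 3 percent threshold and the function names are hypothetical; the slides do not state at what share a residual group triggers a search for new search words.

```python
def turnover_share_pct(residual_turnover, level_turnover):
    """Residual-group turnover as a percentage of a higher-level turnover."""
    return round(100.0 * residual_turnover / level_turnover, 2)


# (C6, residual turnover, C2-level turnover, C4-level turnover), from the table.
ROWS = [
    ("22099", 163_110, 165_113, 165_113),
    ("11399", 3_202_789, 1_447_052_237, 52_538_203),
    ("11198", 8_491_246, 1_447_052_237, 184_288_644),
    ("11259", 673_569, 1_447_052_237, 269_208_928),
]


def flag_residual_groups(rows, threshold_pct):
    """Return (C6, C4 share) for residual groups above the threshold, largest first."""
    flagged = []
    for c6, residual, _c2_total, c4_total in rows:
        c4_share = turnover_share_pct(residual, c4_total)
        if c4_share > threshold_pct:
            flagged.append((c6, c4_share))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical threshold: residual groups above 3 % of their C4 level are reviewed.
to_review = flag_residual_groups(ROWS, threshold_pct=3.0)
```

Sorting by the C4-level share reproduces the ordering of the monitoring table, so the groups most in need of new search words come first.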
-
Eancode | Product description | C6 | C6 description | Ean-turnover | The EAN code's share of the residual group | The whole residual group's share of the C2 level
1234567891011 | LAKS | 11399 | Residual group: fish | 299.3 | 8,06 | 0,22
1234567891011 | MOERKSEJFILET | 11399 | Residual group: fish | 183.9 | 4,95 | 0,22
What isn’t linked then?
11
-
We use the search-word list shown before
And what do we do?
12
C6 | C6_description | Searchword1 | Searchword2 | New_C6 | New_C6_description
11399 | Residual group of fish | %MOERKSEJFILET% | | 11311 | Cod and the like
-
This includes making amounts uniform
It also includes cleaning other variables (product text, unit, amount…) at the lowest possible level
For example, GTINs with more than one amount are recalculated into a numeraire
By default, all PLUs with more than one product text, amount or unit are deleted here!
Step 3: Aggregation
13
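The two cleaning rules on this slide can be sketched as follows: amounts are first converted to a common unit, then every item code that still appears with more than one product text, amount or unit is dropped. The field names and the unit conversion table are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical conversion table: unit -> (base unit, divisor).
UNIT_DIVISORS = {"g": ("kg", 1000), "kg": ("kg", 1), "ml": ("l", 1000), "l": ("l", 1)}


def normalize(record):
    """Return a copy of the record with its amount expressed in the base unit."""
    base_unit, divisor = UNIT_DIVISORS[record["unit"]]
    return {**record, "amount": record["amount"] / divisor, "unit": base_unit}


def drop_inconsistent_items(records):
    """Drop every item code seen with more than one product text, amount or unit."""
    variants = defaultdict(set)
    for rec in records:
        variants[rec["code"]].add((rec["text"], rec["amount"], rec["unit"]))
    # More than one distinct (text, amount, unit) combination means that at
    # least one of the three attributes varies for that code.
    return [rec for rec in records if len(variants[rec["code"]]) == 1]
```

Normalizing before the consistency check matters: an item reported as 500 g one week and 0.5 kg the next is kept, while an item whose product text changes is removed.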
-
It also includes removing prices that are more than a factor of 4 away from the median price for a given GTIN
Now: The prices are aggregated into one price per GTIN per type of shop
…
14
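A sketch of the median filter and the aggregation step is given below. The factor 4 comes from the slide. Aggregating to a unit value (turnover divided by quantity) is an assumption based on the later remark that unit values are applicable, and the tuple layout of the observations is hypothetical.

```python
from collections import defaultdict
from statistics import median

FACTOR = 4  # from the slide: prices more than a factor 4 from the median are removed


def filter_and_aggregate(observations):
    """observations: list of (gtin, shop_type, price, quantity) tuples.

    Returns one unit value per (gtin, shop_type) after the median filter.
    """
    prices_by_gtin = defaultdict(list)
    for gtin, _shop_type, price, _quantity in observations:
        prices_by_gtin[gtin].append(price)
    medians = {gtin: median(prices) for gtin, prices in prices_by_gtin.items()}

    turnover = defaultdict(float)
    quantity_sum = defaultdict(float)
    for gtin, shop_type, price, quantity in observations:
        med = medians[gtin]
        if price > FACTOR * med or price < med / FACTOR:
            continue  # outlier relative to the GTIN's median price
        turnover[(gtin, shop_type)] += price * quantity
        quantity_sum[(gtin, shop_type)] += quantity

    # Unit value = total turnover / total quantity (assumed aggregation rule).
    return {key: turnover[key] / quantity_sum[key]
            for key in turnover if quantity_sum[key] > 0}
```

The median is taken per GTIN across all shop types, as the slide refers to the median price for a given GTIN; a per-shop-type median would be an equally plausible reading.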
-
The final step: Several weeks are aggregated together (including cleaning) to form the data for one month
Application of filters:
Prices which have changed by more than a factor of 10 since last month are removed
Items for which both price and volume have fallen since last month are removed
15
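The two monthly filters can be expressed as a single predicate. This is a sketch under the literal reading of the slide: any fall in both price and volume removes the observation. A production filter may well use thresholds that the slide does not show. The function name and argument layout are hypothetical.

```python
def passes_monthly_filters(prev_price, prev_qty, cur_price, cur_qty, factor=10):
    """Return True if a monthly price observation survives both filters."""
    ratio = cur_price / prev_price
    # Filter 1: price changed by more than the given factor since last month.
    if ratio > factor or ratio < 1 / factor:
        return False
    # Filter 2: both price and volume have fallen since last month.
    if cur_price < prev_price and cur_qty < prev_qty:
        return False
    return True
```

A price that falls while volume rises, as in an ordinary sale, passes the second filter; a price and volume that fall together are typical of a product being cleared out of the assortment.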
-
CPI via scanner data
Reception of scanner data
Data is validated
Data is ready for processing
The representative basket (a sample) for month t-1 is drawn
Each good in the representative basket is checked: is it still sold (yes/no)?
Yes
No
The representative basket for month t is drawn
Replacement goods are chosen
-
- Goods in the representative basket need to have been sold in all 12 months of 2011 (2015) and constitute the top 50% of turnover.
- There are individual criteria for cases with either too many or too few observations compared to the existing sample.
Selection criteria (1)
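One possible reading of the first criterion is sketched below: keep only goods with sales in every one of the 12 months, rank them by annual turnover, and take goods from the top until half of the turnover is covered. Whether the 50% is measured against eligible goods or against all goods is not stated on the slide; the sketch assumes eligible goods. The data layout and function name are hypothetical.

```python
def select_basket(monthly_turnover, share=0.5):
    """monthly_turnover: dict mapping item code -> list of 12 monthly turnovers.

    Returns the item codes that were sold in all 12 months and that together
    make up the top `share` of turnover among those eligible items.
    """
    eligible = {
        code: sum(months)
        for code, months in monthly_turnover.items()
        if len(months) == 12 and all(t > 0 for t in months)
    }
    total = sum(eligible.values())

    basket, cumulative = [], 0.0
    ranked = sorted(eligible.items(), key=lambda item: item[1], reverse=True)
    for code, turnover in ranked:
        if cumulative >= share * total:
            break
        basket.append(code)
        cumulative += turnover
    return basket
```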
-
Replacement goods need to have been part of the sample for 4 months
They are then picked based on highest turnover and stability of turnover
Selection criteria (2)
18
-
System for maintenance of the representative basket
-
Calculation is based on Laspeyres-type indices (Jevons), done at all levels following the existing calculation methods
February 2015: We began calculating RYGEKS indices, so that we have a basis for comparison.
→ The results suggest a small underestimation of price development, but the RYGEKS method is not perfect
On the calculation of SD-indices
20
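For reference, the Jevons elementary index named on this slide is the unweighted geometric mean of price relatives. The sketch below computes it for the goods present in both periods; the dictionary layout is an assumption, and the RYGEKS calculation is not shown.

```python
from math import exp, log


def jevons_index(base_prices, current_prices):
    """Jevons index (base = 100) over items priced in both periods.

    base_prices, current_prices: dicts mapping item code -> positive price.
    """
    common = [code for code in base_prices if code in current_prices]
    if not common:
        raise ValueError("no items are priced in both periods")
    # Geometric mean of price relatives, computed via the mean of logs.
    mean_log_relative = sum(
        log(current_prices[code] / base_prices[code]) for code in common
    ) / len(common)
    return 100.0 * exp(mean_log_relative)
```

For two goods where one price rises by 10% and the other is unchanged, the index is 100 times the square root of 1.1, roughly 104.9.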
-
Documentation?
21
-
CPI
22
-
COICOP 1
23
-
COICOP 2
24
-
No significant differences
25
-
Scanner data is a better source
26
-
Scanner data is a different source
27
-
The complexity of the data can involve the risk of a suboptimal sample
Data is not delivered on time
Worries
28
-
We use two weeks for the calculation of SD-indices giving us a chance of switching weeks.
We have developed SAS programs for the continuation of prices for one or more SD chains
We've signed contracts with the chains on the delivery of data. These include a clause requiring them to let us know if the product structure changes
Emergency systems
29
-
1. Scanner data is complex. Cleaning of data and an assumption of diversity of barcodes is necessary
2. The differences when handling scanner data as compared to traditional price collection need to be implemented in the IT systems in a proper way and thought through as early as possible
3. A common framework is important
Lessons learnt
30
-
In general, unit values are applicable for the calculation of SD-indices
But:
- The data needs to be cleaned in other variables beforehand (product text, unit, amount…) at the lowest possible level
- The price development has to be monitored both within the month and between months
On unit values
31
-
What are your worries with the processing of scanner data?
End discussion
32