![Page 1: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/1.jpg)
02 December 2019
Using alternative data sources to produce consumer price indices
Liam and Lefteris
![Page 2: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/2.jpg)
4 December 2019
Liam Greenhough
Consumer Prices Methods Transformation
Overview of the Alternative Data Sources Project
![Page 3: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/3.jpg)
How price statistics are measured
In January, select 700 “items” to track over the year. This is known
as the fixed basket. Each year the basket is “refreshed” to
account for changing consumer behaviour.
![Page 4: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/4.jpg)
How price statistics are measured
For each item, select a group
of products to track over the
year.
Each item is an aggregate –
but is also a “subset” of higher
aggregates.
![Page 5: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/5.jpg)
How price statistics are measured
Collect prices of products each month. These are
collected:
• Locally, and
• Centrally
Approximately 180,000 price quotes are collected per
month.
![Page 6: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/6.jpg)
How price statistics are measured
Use index formulae to compare prices of products across months.
The most common index is the Jevons.
Use weights to aggregate upwards to higher-level indices.
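The two steps above can be sketched in a few lines. This is a minimal illustration, not the production method: the Jevons index is the geometric mean of price relatives, and the aggregation here is assumed to be a weighted arithmetic mean of item indices. The function names, product names, prices and weights are all hypothetical.

```python
import math


def jevons_index(base_prices, current_prices):
    """Jevons index: geometric mean of price relatives, with the base month set to 100."""
    # Only products priced in both months can be compared.
    matched = [p for p in base_prices if p in current_prices]
    log_relatives = [math.log(current_prices[p] / base_prices[p]) for p in matched]
    return 100 * math.exp(sum(log_relatives) / len(log_relatives))


def aggregate(indices, weights):
    """Weighted arithmetic mean of lower-level indices (an assumed aggregation rule)."""
    total_weight = sum(weights[item] for item in indices)
    return sum(indices[item] * weights[item] for item in indices) / total_weight


# Hypothetical prices for two items in a base month and a later month.
apples_base = {"gala": 1.00, "braeburn": 2.00}
apples_now = {"gala": 1.10, "braeburn": 2.20}
bread_base = {"white": 1.00, "brown": 1.50}
bread_now = {"white": 1.00, "brown": 1.50}

item_indices = {
    "apples": jevons_index(apples_base, apples_now),
    "bread": jevons_index(bread_base, bread_now),
}
item_weights = {"apples": 1.0, "bread": 3.0}  # illustrative weights only
higher_level = aggregate(item_indices, item_weights)
```

Working in logs keeps the geometric mean numerically stable when an item contains many products.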
![Page 7: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/7.jpg)
[Chart: CPIH compared to the current Bank of England inflation target. Series: CPIH; Inflation target. Horizontal axis: January 1996 to January 2017. Vertical axis: 0 to 5.]
![Page 8: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/8.jpg)
Consumer Price Statistics: Alternative Data Sources
![Page 9: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/9.jpg)
Alternative Data
Looking to implement two new data sources:
• Scanner data – transactional data from large retailers
• Web scraped data – data scraped from online retailers
The aim is to use these in conjunction with traditional data sources.
![Page 10: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/10.jpg)
Alternative Data – targeted items
![Page 11: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/11.jpg)
| Data dimension | Traditional | Scanner data | Web scraping |
| --- | --- | --- | --- |
| Data acquisition | Manual | Automated | Automated |
| Completeness/scope | Sample from all retailers | All transactions (bulk) from medium to large retailers | Bulk or sample from online retailers |
| Metadata | Item description | Item description + limited attributes | Item description + attributes |
| Quantity data | N/A | Quantities sold | N/A |
| Timing | Single collection day | Daily | Daily |
![Page 12: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/12.jpg)
Big data
The system needs to process big data.
Traditional data sources: ~180,000 price quotes per month
Scanner data: ~100,000,000 price quotes per month
![Page 13: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/13.jpg)
The Team
![Page 14: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/14.jpg)
(Prices) Data Transformation
(Prices) Methods Transformation
Emerging Platforms
Methodology
![Page 15: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/15.jpg)
Some of the research
![Page 16: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/16.jpg)
Scalability
It is not possible to manually scrutinise big data,
e.g. for classification.
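As an illustration of what automating classification could look like, here is a minimal keyword-rule sketch. The slides do not say which classification method is used, so the rules, item names and the `classify` function are all assumptions made for the example.

```python
# Hypothetical keyword rules: each item maps to words expected in its product descriptions.
RULES = {
    "apples": {"apple", "apples"},
    "bread": {"bread", "loaf"},
}


def classify(description, rules=RULES):
    """Return the first item whose keywords appear in the description, else None."""
    words = set(description.lower().split())
    for item, keywords in rules.items():
        if words & keywords:
            return item
    return None  # unmatched products would need another route, e.g. review


descriptions = ["Gala Apples 6 pack", "Wholemeal loaf 800g", "Orange juice 1l"]
labels = [classify(d) for d in descriptions]
```

Rules like these scale to millions of quotes, but only the unmatched or ambiguous cases remain for a person to check.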
![Page 17: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/17.jpg)
Product Churn – synthetic
[Chart: vertical axis 0 to 100; horizontal axis from Jan:Jan to Jan:Dec.]
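One plausible reading of the axis labels is the percentage of January products still on sale in each later month. The sketch below computes that measure on made-up data; the function name, the product identifiers and the interpretation of the chart are assumptions.

```python
def share_of_base_products(products_by_month, base="Jan"):
    """Percentage of the base month's products that are also present in each month."""
    base_products = products_by_month[base]
    return {
        f"{base}:{month}": 100 * len(base_products & products) / len(base_products)
        for month, products in products_by_month.items()
    }


# Synthetic product identifiers available in each month.
products_by_month = {
    "Jan": {"a", "b", "c", "d"},
    "Feb": {"a", "b", "c", "e"},
    "Mar": {"a", "e", "f"},
}
shares = share_of_base_products(products_by_month)
```

A falling share means fewer matched products are available for a fixed-basket comparison against January.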
![Page 18: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/18.jpg)
The combination problem
There are many steps in calculating indices.
Different methods can be used at each step.
This leads to many potential combinations!
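The growth in combinations can be shown with a short sketch. The step names and method options below are hypothetical, since the slide does not list them; the point is only that the number of combinations is the product of the number of options at each step.

```python
from itertools import product
from math import prod

# Hypothetical steps and method options; the slide does not list them.
methods_by_step = {
    "classification": ["rules", "model"],
    "outlier_removal": ["none", "filter_a", "filter_b"],
    "index_formula": ["jevons", "other_formula"],
}

# Every way of picking one method per step.
combinations = [
    dict(zip(methods_by_step, choice))
    for choice in product(*methods_by_step.values())
]

# The count is the product of the number of options at each step.
expected_count = prod(len(options) for options in methods_by_step.values())
```

With only three steps and two or three options each, there are already 12 combinations to compare.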
![Page 19: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/19.jpg)
Our plans
![Page 20: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/20.jpg)
May be brought forward with quarterly publications if feasible to do so.
Item coverage depends on data availability.
![Page 21: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/21.jpg)
Lefteris Karachalias
Emerging Platforms Development and Support Team
Consumer Prices Data Transformation: Development
![Page 22: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/22.jpg)
Overview
• The system
o Overall system architecture
o User interaction
• Development framework
o Development project delivery team
o Tools
o Documentation
o Dev&Test
![Page 23: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/23.jpg)
Overall system architecture
Data supplier
Research/Publication
DAP: Workspace zone
Staged Processed Analysis
DAP: Landing zone
Raw
Core pipeline
Analysis pipeline
Validation
Staging
Classification
Decision rules
Retailer
Expenditure weights
FTP
Raw
Data Engineers
![Page 24: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/24.jpg)
Core pipeline
![Page 25: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/25.jpg)
Multiple configuration scenarios
Staged data
pipeline config 1: Stage 1 output, Stage 2 output, Stage 3 output
pipeline config 2: Stage 1 output, Stage 2 output, Stage 3 output
pipeline config 3: Stage 1 output, Stage 2 output, Stage 3 output
Processed data:
| scenario | stage |
| --- | --- |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 2 | 2 |
| 3 | 2 |
| 1 | 3 |
| 2 | 3 |
| 3 | 3 |
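The diagram suggests that the same staged data is run under several configurations, with each stage's output stored alongside a scenario and stage identifier. A minimal sketch of that idea follows, in plain Python rather than the PySpark used on the platform. The function `run_scenarios`, the two example stages and the config keys are hypothetical.

```python
def run_scenarios(staged, configs, stages):
    """Run every stage under every config, stacking the outputs per stage."""
    processed = {name: [] for name, _ in stages}
    for scenario, config in enumerate(configs, start=1):
        data = staged
        for stage_no, (name, step) in enumerate(stages, start=1):
            data = step(data, config)  # each stage consumes the previous stage's output
            processed[name].append(
                {"scenario": scenario, "stage": stage_no, "output": data}
            )
    return processed


# Hypothetical two-stage pipeline: drop high prices, then average what is left.
def filter_prices(prices, config):
    return [p for p in prices if p <= config["max_price"]]


def mean_price(prices, config):
    return sum(prices) / len(prices)


stages = [("filter", filter_prices), ("mean", mean_price)]
configs = [{"max_price": 10.0}, {"max_price": 1000.0}]
processed = run_scenarios([1.0, 2.0, 3.0, 100.0], configs, stages)
```

Keeping all scenarios for a stage in one place makes it straightforward to compare the effect of different configurations on the same input.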
![Page 26: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/26.jpg)
User interaction
• UI: CDSW / HUE
• Manual
• Configuration
• Mappers (BAU)
• Dashboard (cannot share VDI)
• Output tables
![Page 27: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/27.jpg)
Development framework (1)
• Project delivery team
• Development phase: Between Discovery and Alpha phase
• Agile, Jira
• DAP, PySpark, HDFS, HIVE (sensitivity)
• Git, GitLab
![Page 28: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/28.jpg)
Project delivery team
Project Specialist
Technical Lead
Methods Specifier
Business Analyst
Configuration Engineer/Developer
Configuration Architect
Tester
Product Owner
![Page 29: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/29.jpg)
Development framework (2)
• Unit testing, CI with Jenkins, UAC
• Documentation: Sphinx, user manuals
• Business Analysis: models, Sparx
• Business Architecture: pushing to the SML
• Synthetic data, Dev&Test, packaging
![Page 30: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/30.jpg)
Statistical process model
![Page 31: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/31.jpg)
Data (journey) model
Input Processing Output
![Page 32: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/32.jpg)
Dev and Test environment
Real data
Prod
Dev&Test
Synthetic data
package
CDSW prod
Synthetic data
CDSW dev
package
CI Jenkins
Output
![Page 33: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/33.jpg)
Thank you!
![Page 34: Using alternative data sources to produce consumer price](https://reader034.vdocuments.site/reader034/viewer/2022052106/62877918076f175b1441076b/html5/thumbnails/34.jpg)
Any questions?