
Page 1: Using alternative data sources to produce consumer price

02 December 2019

Using alternative data sources to produce consumer price indices

Liam and Lefteris

Page 2

4 December 2019

Liam Greenhough

Consumer Prices Methods Transformation

Overview of the Alternative Data Sources Project

Page 3

How price statistics are measured

In January, 700 “items” are selected to track over the year. This is known as the fixed basket. Each year the basket is “refreshed” to account for changing consumer behaviours.


Page 4

How price statistics are measured

For each item, a group of products is selected to track over the year.

Each item is an aggregate of these products, but is also a “subset” of higher-level aggregates.
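The item hierarchy can be pictured as a tree: products roll up into items, and items roll up into higher-level aggregates. A minimal sketch follows, assuming just two levels above the product; the aggregate names, item names and product IDs are hypothetical placeholders, not real basket entries, and the real hierarchy has more levels.

```python
# Hypothetical hierarchy: higher-level aggregate -> items -> tracked products.
# All names are illustrative placeholders, not real basket entries.
HIERARCHY = {
    "Food": {
        "Bread": ["loaf_white_800g", "loaf_brown_800g"],
        "Milk": ["milk_semi_2pt", "milk_whole_2pt"],
    },
    "Clothing": {
        "Men's jeans": ["jeans_a", "jeans_b"],
    },
}


def aggregate_for_item(item: str) -> str:
    """Return the higher-level aggregate that an item is a subset of."""
    for aggregate, items in HIERARCHY.items():
        if item in items:
            return aggregate
    raise KeyError(f"unknown item: {item}")


def products_in_aggregate(aggregate: str) -> list[str]:
    """Flatten every product tracked under a higher-level aggregate."""
    return [p for products in HIERARCHY[aggregate].values() for p in products]


print(aggregate_for_item("Milk"))          # Food
print(products_in_aggregate("Clothing"))   # ['jeans_a', 'jeans_b']
```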

Page 5

How price statistics are measured

Product prices are collected each month:

• Locally, and

• Centrally

Approximately 180,000 price quotes are collected per month.


Page 6

How price statistics are measured

Index formulae are used to compare the prices of products across months. The most common index is the Jevons.

Weights are used to aggregate upwards to higher-level indices.
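The Jevons index for an item is the geometric mean of the price relatives (current price divided by base price) of the products priced in both periods. A minimal sketch follows, assuming prices are held in plain dictionaries keyed by hypothetical product IDs. The higher-level step shown is a weighted arithmetic mean of item indices, one standard way of applying weights; the weights and prices are made up.

```python
import math


def jevons(base_prices: dict[str, float], current_prices: dict[str, float]) -> float:
    """Jevons index (base = 100): geometric mean of matched price relatives."""
    matched = base_prices.keys() & current_prices.keys()
    if not matched:
        raise ValueError("no products priced in both periods")
    log_relatives = [math.log(current_prices[p] / base_prices[p]) for p in matched]
    return 100 * math.exp(sum(log_relatives) / len(log_relatives))


def aggregate(indices: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted arithmetic mean of lower-level indices (weights need not sum to 1)."""
    total_weight = sum(weights[k] for k in indices)
    return sum(indices[k] * weights[k] for k in indices) / total_weight


# Synthetic January (base) and February prices for two items.
bread = jevons({"a": 1.00, "b": 2.00}, {"a": 1.10, "b": 2.20})            # every price +10%
milk = jevons({"c": 0.50, "d": 1.00}, {"c": 0.50, "d": 1.00, "e": 0.80})  # "e" is new, so ignored
food = aggregate({"Bread": bread, "Milk": milk}, {"Bread": 3.0, "Milk": 1.0})
print(round(bread, 2), round(milk, 2), round(food, 2))  # 110.0 100.0 107.5
```

Working in logs keeps the geometric mean numerically stable when an item has many products.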


Page 7

[Figure: CPIH compared to the current Bank of England inflation target. Series: CPIH; Inflation target. X-axis ticks from January 1996 to January 2017; y-axis from 0 to 5.]

Page 8

Consumer Price Statistics: Alternative Data Sources

Page 9

Alternative Data

We are looking to implement two new data sources:

• Scanner data – transactional data from large retailers

• Web-scraped data – data scraped from online retailers

The aim is to use them in conjunction with traditional data sources!


Page 10

Alternative Data – targeted items

Page 11

Data dimension      | Traditional                | Scanner data                                            | Web scraping
--------------------|----------------------------|---------------------------------------------------------|--------------------------------------
Data acquisition    | Manual                     | Automated                                               | Automated
Completeness/scope  | Sample from all retailers  | All transactions (bulk) from medium to large retailers  | Bulk or sample from online retailers
Metadata            | Item description           | Item description + limited attributes                   | Item description + attributes
Quantity data       | N/A                        | Quantities sold                                         | N/A
Timing              | Single collection day      | Daily                                                   | Daily
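Because scanner data carry quantities sold and arrive daily, one common approach in price statistics is to summarise each product's month as a unit value: total expenditure divided by total quantity. A minimal sketch follows; the record layout (`Transaction` with product ID, day, expenditure and quantity) is a hypothetical assumption, not the retailers' or the project's actual schema.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Transaction(NamedTuple):
    """One hypothetical daily scanner-data record for a single product."""
    product_id: str
    day: int            # day of the month
    expenditure: float  # total sales value that day
    quantity: float     # units sold that day


def monthly_unit_values(rows: Iterable[Transaction]) -> dict[str, float]:
    """Unit value per product = total expenditure / total quantity over the month."""
    spend: dict[str, float] = defaultdict(float)
    qty: dict[str, float] = defaultdict(float)
    for row in rows:
        spend[row.product_id] += row.expenditure
        qty[row.product_id] += row.quantity
    # Skip products with no recorded quantity to avoid dividing by zero.
    return {p: spend[p] / qty[p] for p in spend if qty[p] > 0}


rows = [
    Transaction("milk_2pt", 1, 10.00, 10),   # 10 units at 1.00
    Transaction("milk_2pt", 15, 9.00, 10),   # 10 units at 0.90 (promotion)
    Transaction("bread_800g", 3, 12.00, 8),  # 8 units at 1.50
]
unit_values = monthly_unit_values(rows)
print(unit_values)
```

Unlike a price observed on a single collection day, the milk unit value (0.95) reflects the promotion that ran mid-month.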

Page 12

Big data

The system needs to process big data:

Traditional data sources: ~180,000 price quotes per month

Scanner data: ~100,000,000 price quotes per month, roughly 550 times as many

Page 13

The Team

Page 14

(Prices) Data Transformation

(Prices) Methods Transformation

Emerging Platforms

Methodology

Page 15

Some of the research

Page 16

Scalability

It is not possible to manually scrutinise big data, for example when classifying products.
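At scanner-data volumes, assigning products to basket items has to be automated, with people reviewing only what the automation cannot handle. The slides do not say how the project classifies products, so the sketch below uses the simplest possible technique, keyword rules on the product description, purely as an illustration; the rules and item names are hypothetical, and a production system might use machine-learning classifiers instead.

```python
import re
from typing import Optional

# Hypothetical keyword rules: item -> pattern matched against the description.
RULES = {
    "Milk": re.compile(r"\bmilk\b", re.IGNORECASE),
    "Bread": re.compile(r"\b(loaf|bread)\b", re.IGNORECASE),
    "Men's jeans": re.compile(r"\bmens?\b.*\bjeans\b", re.IGNORECASE),
}


def classify(description: str) -> Optional[str]:
    """Return the first item whose rule matches, or None if unclassified."""
    for item, pattern in RULES.items():
        if pattern.search(description):
            return item
    return None


def split_for_review(descriptions: list[str]) -> tuple[dict[str, str], list[str]]:
    """Auto-classify what the rules can; return the rest for manual scrutiny."""
    classified: dict[str, str] = {}
    unmatched: list[str] = []
    for d in descriptions:
        item = classify(d)
        if item is None:
            unmatched.append(d)
        else:
            classified[d] = item
    return classified, unmatched


classified, unmatched = split_for_review(
    ["Semi Skimmed Milk 2PT", "White Loaf 800g", "Mens Slim Jeans 32R", "Garden Hose 15m"]
)
print(classified)
print(unmatched)  # ['Garden Hose 15m']
```

A rule this naive would also file "Milk Chocolate Bar" under Milk, which illustrates why automated classification still needs quality checks even when full manual scrutiny is impossible.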


Page 17

Product Churn – synthetic

[Figure: product churn on synthetic data. X-axis: January compared with each month, Jan:Jan to Jan:Dec; y-axis from 0 to 100.]
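The chart's axes suggest it tracks, for each month, how much of January's product range is still on sale. Assuming that reading, the measure is the percentage of January product IDs that also appear in each later month. The product sets below are synthetic, like the chart's data, and the function name is hypothetical.

```python
def churn_profile(products_by_month: dict[str, set[str]], base: str = "Jan") -> dict[str, float]:
    """Percentage of base-month products still present in each month."""
    base_set = products_by_month[base]
    if not base_set:
        raise ValueError("base month has no products")
    return {
        f"{base}:{month}": 100 * len(base_set & products) / len(base_set)
        for month, products in products_by_month.items()
    }


synthetic = {
    "Jan": {"a", "b", "c", "d"},
    "Feb": {"a", "b", "c", "e"},  # d delisted, e launched
    "Mar": {"a", "b", "e", "f"},  # c delisted, f launched
}
print(churn_profile(synthetic))
# {'Jan:Jan': 100.0, 'Jan:Feb': 75.0, 'Jan:Mar': 50.0}
```

A falling profile means that an index comparing each month with January using matched products only rests on an ever-smaller sample, which is why churn matters for the choice of index method.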

Page 18

The combination problem

There are lots of steps involved in calculating indices, with different methods available at each step.

This leads to many potential combinations!
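The number of end-to-end configurations is the product of the number of candidate methods at each step, so it grows quickly. A minimal sketch follows; the four steps and their method names are hypothetical illustrations, not the project's actual option list.

```python
from itertools import product
from math import prod

# Hypothetical steps and candidate methods at each step.
OPTIONS = {
    "classification": ["keyword_rules", "ml_model"],
    "cleaning": ["none", "outlier_filter", "low_sales_filter"],
    "index_formula": ["jevons", "geks_tornqvist", "clustered_jevons"],
    "splicing": ["movement", "window", "half"],
}

# Total combinations = product of the option counts per step.
n_configs = prod(len(methods) for methods in OPTIONS.values())
print(n_configs)  # 2 * 3 * 3 * 3 = 54

# Enumerate every configuration as a dict of step -> chosen method.
configs = [dict(zip(OPTIONS, choice)) for choice in product(*OPTIONS.values())]
print(configs[0])
```

Adding a single extra method at one step multiplies, rather than adds to, the number of combinations to evaluate.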


Page 19

Our plans

Page 20

May be brought forward with quarterly publications if feasible to do so.

Item coverage depends on data availability.

Page 21

Lefteris Karachalias

Emerging Platforms Development and Support Team

Consumer Prices Data Transformation: Development

Page 22

Overview

• The system

o Overall system architecture

o User interaction

• Development framework

o Development project delivery team

o Tools

o Documentation

o Dev&Test


Page 23

Overall system architecture

[Figure: overall system architecture. Labelled elements: Data supplier; FTP; DAP: Landing zone (Raw); DAP: Workspace zone (Staged, Processed, Analysis); Core pipeline; Analysis pipeline; Validation; Staging; Classification; Decision rules; Retailer; Expenditure weights; Research/Publication; Data Engineers.]
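The architecture names Validation, Staging, Classification and Decision rules as processing steps, but the diagram's arrows did not survive extraction, so the order used below is an assumption. A minimal sketch of chaining such steps as functions over a batch of records follows; the record fields and each step's logic are hypothetical. The slides list PySpark on DAP as the actual stack; plain Python lists are used here only to keep the sketch self-contained.

```python
from typing import Callable

Record = dict
Stage = Callable[[list[Record]], list[Record]]


def validate(records: list[Record]) -> list[Record]:
    """Drop records with a missing or non-positive price (hypothetical rule)."""
    return [r for r in records if r.get("price") is not None and r["price"] > 0]


def standardise(records: list[Record]) -> list[Record]:
    """Staging: bring fields into a common layout (hypothetical)."""
    return [{**r, "description": r["description"].strip().lower()} for r in records]


def classify(records: list[Record]) -> list[Record]:
    """Attach a basket item using a toy keyword rule (hypothetical)."""
    return [{**r, "item": "Milk" if "milk" in r["description"] else None} for r in records]


def apply_decision_rules(records: list[Record]) -> list[Record]:
    """Keep only records that were classified (hypothetical decision rule)."""
    return [r for r in records if r["item"] is not None]


# Assumed stage order; each stage takes and returns a list of records.
CORE_PIPELINE: tuple[Stage, ...] = (validate, standardise, classify, apply_decision_rules)


def run(records: list[Record], stages: tuple[Stage, ...] = CORE_PIPELINE) -> list[Record]:
    for step in stages:
        records = step(records)
    return records


raw = [
    {"description": "  Semi Skimmed MILK 2pt ", "price": 1.10},
    {"description": "Garden hose", "price": 12.00},
    {"description": "Whole milk 1pt", "price": None},
]
print(run(raw))
```

Keeping every step with the same signature makes it easy to swap a step's method without touching the others.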

Page 24

Core pipeline

Page 25

Multiple configuration scenarios

[Figure: multiple configuration scenarios. Staged data is run through pipeline config 1, 2 and 3. Each config produces a Stage 1, Stage 2 and Stage 3 output, labelled by (scenario, stage) pairs from (1, 1) to (3, 3), ending in Processed data.]
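The figure shows the same staged data run through several pipeline configurations, with every stage's output kept and labelled by (scenario, stage). A minimal sketch of that bookkeeping follows; the three stages and the config parameters (`min_price`, `formula`) are hypothetical, chosen only so that the scenarios give different results.

```python
from statistics import fmean, geometric_mean

# Hypothetical staged data: product -> (base-period price, current price).
STAGED = {"a": (1.00, 1.10), "b": (2.00, 2.00), "c": (0.05, 0.50)}

# Hypothetical configurations, one per scenario.
CONFIGS = {
    1: {"min_price": 0.00, "formula": "jevons"},
    2: {"min_price": 0.10, "formula": "jevons"},
    3: {"min_price": 0.10, "formula": "carli"},
}


def stage1_filter(data, cfg):
    """Drop products whose base price is below the configured threshold."""
    return {p: prices for p, prices in data.items() if prices[0] >= cfg["min_price"]}


def stage2_relatives(data, cfg):
    """Price relative (current / base) for each remaining product."""
    return {p: current / base for p, (base, current) in data.items()}


def stage3_index(data, cfg):
    """Elementary index: geometric mean (Jevons) or arithmetic mean (Carli) of relatives."""
    mean = geometric_mean if cfg["formula"] == "jevons" else fmean
    return 100 * mean(data.values())


STAGES = [stage1_filter, stage2_relatives, stage3_index]


def run_scenarios(staged, configs):
    """Run every config over the same staged data; key each output by (scenario, stage)."""
    outputs = {}
    for scenario, cfg in configs.items():
        data = staged
        for stage_no, stage_fn in enumerate(STAGES, start=1):
            data = stage_fn(data, cfg)
            outputs[(scenario, stage_no)] = data
    return outputs


results = run_scenarios(STAGED, CONFIGS)
for scenario in CONFIGS:
    print(scenario, round(results[(scenario, 3)], 2))
```

Scenario 1 keeps product c, whose tenfold price rise dominates the index; the threshold in scenarios 2 and 3 removes it. Keeping every intermediate output shows at which stage the scenarios diverge.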

Page 26

User interaction

• UI: CDSW / HUE

• Manual

• Configuration

• Mappers (BAU)

• Dashboard (cannot share VDI)

• Output tables


Page 27

Development framework (1)

• Project delivery team

• Development phase: between Discovery and Alpha

• Agile, Jira

• DAP, PySpark, HDFS, HIVE (sensitivity)

• Git, GitLab


Page 28

Project delivery team

Project Specialist

Technical Lead

Methods Specifier

Business Analyst

Configuration Engineer/Developer

Configuration Architect

Tester

Product Owner

Page 29

Development framework (2)

• Unit testing, CI with Jenkins, UAC

• Documentation: Sphinx, user manuals

• Business analysis models: Sparx

• Business Architecture: pushing to the SML

• Synthetic data, Dev&Test, packaging
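Unit tests run in CI against synthetic rather than real data. As a hedged illustration using Python's standard `unittest` module, the tests below build small synthetic datasets with a known answer and check a Jevons function against them. The function, the data generator (`make_synthetic`) and the test cases are hypothetical, not taken from the project's code base.

```python
import math
import random
import unittest


def jevons(base, current):
    """Jevons index (base = 100) over products priced in both periods."""
    matched = base.keys() & current.keys()
    return 100 * math.exp(sum(math.log(current[p] / base[p]) for p in matched) / len(matched))


def make_synthetic(n_products, inflation, seed=0):
    """Synthetic prices where every product rises by the same factor."""
    rng = random.Random(seed)
    base = {f"p{i}": round(rng.uniform(0.5, 20.0), 2) for i in range(n_products)}
    current = {p: price * inflation for p, price in base.items()}
    return base, current


class TestJevons(unittest.TestCase):
    def test_uniform_inflation_is_recovered(self):
        # If every price rises by 5%, the Jevons index must be 105.
        base, current = make_synthetic(1000, 1.05)
        self.assertAlmostEqual(jevons(base, current), 105.0, places=9)

    def test_unmatched_products_are_ignored(self):
        base, current = make_synthetic(10, 1.02)
        current["new_product"] = 99.0  # launched after the base period
        self.assertAlmostEqual(jevons(base, current), 102.0, places=9)

    def test_no_change_gives_100(self):
        base, _ = make_synthetic(50, 1.0)
        self.assertAlmostEqual(jevons(base, base), 100.0, places=9)
```

Saved as a file such as `test_jevons.py` (a hypothetical name), the tests run with `python -m unittest test_jevons`, which is the kind of command a CI server such as Jenkins would invoke. Because the synthetic data has a known answer, a failure points at the code rather than at the data.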


Page 30

Statistical process model

Page 31

Data (journey) model

[Figure: data journey model with stages Input, Processing, Output.]

Page 32

Dev and Test environment

[Figure: Dev and Test environment. Labelled elements: Prod; Dev&Test; Real data; Synthetic data; package; CDSW prod; CDSW dev; CI (Jenkins); Output.]

Page 33

Thank you!


Page 34

Any questions?