Modernisation of Price Collection at Statistics Netherlands

Els Hoogteijling ESS Modernisation Workshop, Bucharest, March 2016 Modernisation of Price Collection at Statistics Netherlands


Page 1: Modernisation of Price Collection at Statistics Netherlands

Els Hoogteijling ESS Modernisation Workshop, Bucharest, March 2016

Modernisation of Price Collection at Statistics Netherlands

Page 2: Modernisation of Price Collection at Statistics Netherlands

Overview

Why modernise price collection for CPI?

Scanner data

Webscrapers

Robot assisted price collection

Critical success factors and lessons learned

Page 3: Modernisation of Price Collection at Statistics Netherlands

Why modernise price collection?

– Dynamics of consumer market

– Internet purchases

– Reduction of administrative burden

– Cost effective

– Improved quality of CPI/HICP

– More detail

Page 4: Modernisation of Price Collection at Statistics Netherlands

Price collection at Statistics Netherlands

Before 2000
– Mainly price collection in shops
– Questionnaires
– Price collection by telephone interviews

2000-2010
– Introduction of scanner data
– Introduction of price collection on the internet
– Reduction of price collection in shops

From 2010
– More scanner data
– More internet data
– Registers and administrative data
– Strong reduction of price collection in shops

[Figure: Number of shops visited by interviewers (per month); vertical axis from 0 to 8000]

Page 5: Modernisation of Price Collection at Statistics Netherlands

Scanner data, transaction data, administrative data

Started in 2003 - Strong growth from 2010

• Scanner data
  – 14 supermarkets; no price collection by interviewers
  – 2 DIY shops; more DIY shops in 2016/2017
  – 2 drugstores; more drugstores in 2016
  – 1 department store; more department stores in 2016/2017

• Transaction data: travel agencies

• Transaction data: fuels

• Registers energy prices

[Figure: Scanner data in CPI production, by year from 2000 to 2018; vertical axis from 0 to 40]

Page 6: Modernisation of Price Collection at Statistics Netherlands

Internet data by webscrapers

– Why internet as a data source
– Internet robots, how do they work?
– Price collection from webshops – clothing
– Classification, computation, results
– Monitoring

Page 7: Modernisation of Price Collection at Statistics Netherlands

Robots / crawlers / bots / spiders / scrapers: how do they work? (1)

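[Diagram: You give commands to a browser. The browser sends requests over the internet to the website. The website returns code, images, style, data, etc. The browser shows you the graphical markup.]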

Page 8: Modernisation of Price Collection at Statistics Netherlands

Robots / crawlers / bots / spiders / scrapers: how do they work? (2)

[Diagram: A robot/spider/crawler handles the navigation and sends requests over the internet to the website. The website returns code, images, style, data, etc. The robot delivers data to you.]

Page 9: Modernisation of Price Collection at Statistics Netherlands

Robots / crawlers / bots / spiders / scrapers: how do they work? (3)

[Diagram: Robots/spiders/crawlers handle the navigation and send requests over the internet to websites. The websites return code, images, style, data, etc. Each robot delivers data.]

Monitor actively

Generic software for:
- site navigation
- product details
- monitoring
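The two generic tasks named on this slide, site navigation and product details, can be sketched in a few lines of Python. This is a minimal illustration, not the Statistics Netherlands framework: the page markup, the CSS class names (`product`, `name`, `price`, `next`) and the `parse_page` helper are all hypothetical, and the page is a hard-coded string so that no real website is contacted.

```python
from html.parser import HTMLParser

# Hypothetical page, standing in for what a webshop would return.
SAMPLE_PAGE = """
<html><body>
  <div class="product"><span class="name">Blazer</span><span class="price">49.99</span></div>
  <div class="product"><span class="name">Jeans</span><span class="price">29.95</span></div>
  <a class="next" href="/women?page=2">next</a>
</body></html>
"""


class ProductPageParser(HTMLParser):
    """Collects product details and the link to the next page (navigation)."""

    def __init__(self):
        super().__init__()
        self.products = []     # one dict per product found on the page
        self.next_url = None   # where the robot would navigate next
        self._field = None     # product field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css = attrs.get("class")
        if tag == "div" and css == "product":
            self.products.append({})
        elif tag == "span" and css in ("name", "price") and self.products:
            self._field = css
        elif tag == "a" and css == "next":
            self.next_url = attrs.get("href")

    def handle_data(self, data):
        if self._field is not None:
            value = data.strip()
            if self._field == "price":
                value = float(value)
            self.products[-1][self._field] = value

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def parse_page(html):
    """Return (products, next_url) for one downloaded page."""
    parser = ProductPageParser()
    parser.feed(html)
    parser.close()
    return parser.products, parser.next_url


products, next_url = parse_page(SAMPLE_PAGE)
```

In a real robot only the site-specific part (which tags hold the name and the price) would differ per website, which is presumably why the slide stresses generic software.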

Page 10: Modernisation of Price Collection at Statistics Netherlands

Webscrapers: legal aspects

– Netiquette
  Robots identify as “CBSBot, Statistics Netherlands”
  Robots operate during night / morning
  Robots minimize load: wait for a second between requests

– Communication
  Statistics Netherlands informs web site owners in case of considerable data retrieval

– Database law / intellectual property rights
  Statistics Netherlands operates under the Dutch statistics law and does not use the data for any other means than specified in that legislation.
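Two of the netiquette rules above, identifying the robot and waiting a second between requests, can be sketched as follows. The identification string and the one-second delay come from the slide; the `PoliteFetcher` class, its parameters and the 30-second timeout are assumptions made for this illustration. Restricting operation to night and morning hours is not shown, because the slides give no exact times.

```python
import time
import urllib.request

USER_AGENT = "CBSBot, Statistics Netherlands"  # identification quoted on the slide
MIN_DELAY_SECONDS = 1.0                        # "wait for a second between requests"


class PoliteFetcher:
    """Hypothetical helper: identifies itself and spaces out its requests."""

    def __init__(self, delay=MIN_DELAY_SECONDS, clock=time.monotonic,
                 sleep=time.sleep, opener=urllib.request.urlopen):
        # clock, sleep and opener are injectable so the class can be
        # exercised without real waiting or real network traffic.
        self._delay = delay
        self._clock = clock
        self._sleep = sleep
        self._opener = opener
        self._last_request = None

    def build_request(self, url):
        """Attach the identifying User-Agent header to the request."""
        return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})

    def _wait_if_needed(self):
        """Sleep until at least `delay` seconds have passed since the last request."""
        if self._last_request is not None:
            remaining = self._delay - (self._clock() - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
        self._last_request = self._clock()

    def fetch(self, url):
        """Download one page, respecting the delay between requests."""
        self._wait_if_needed()
        with self._opener(self.build_request(url), timeout=30) as response:
            return response.read()
```

Typical use would be a single `PoliteFetcher()` shared by all requests to one website, so that the delay applies across the whole crawl.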

Page 11: Modernisation of Price Collection at Statistics Netherlands

Webscrapers for CPI

Started in 2012:
‐ robots collect all products and prices from webshops daily
‐ including product descriptions and classification characteristics
‐ major webshops for clothing

Data analysed for some years; classification and methodology developed

Now:
‐ 15 websites are scraped daily
‐ 3 websites are used for computation of the CPI
‐ automated collection, monitoring, postprocessing, transport and storage
‐ daily/weekly monitoring

Future:
‐ 20-30 websites in 2018?

Page 12: Modernisation of Price Collection at Statistics Netherlands

Price collection from webshops

Page 13: Modernisation of Price Collection at Statistics Netherlands

Inspection of data – very volatile

[Figure: Number of articles, and number of articles on sale]

Page 14: Modernisation of Price Collection at Statistics Netherlands

From data to statistics

Challenges:
‐ from volatile data to stable statistics
‐ how to classify multiple less structured data sources

Seasonal pattern

Page 15: Modernisation of Price Collection at Statistics Netherlands

Product characteristics


Page 16: Modernisation of Price Collection at Statistics Netherlands

From product characteristics to classification

Brand: H&M
Division: women
Type: jacket
Article description: blazer
Fabric: leather
Color: dark grey
Size: medium
……..
……..
Price: € 49.99

Classification per website
DIVISION: men, women, children
LAYER: underwear, upperwear
FUNCTION: regular, special occasions; nightwear; sports
PART OF BODY: upper body, lower body; legs, arms, head, …

Price
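The step from product characteristics to classification can be illustrated with a simple rule table. This is a sketch under stated assumptions: the slides do not say how the classification is implemented, and the `TYPE_RULES` table, its entries and the `classify` function are hypothetical. Only the four dimension names and the example product come from the slide.

```python
# Dimension values listed on the slide.
DIVISIONS = {"men", "women", "children"}

# Hypothetical rule table: article type -> remaining classification dimensions.
TYPE_RULES = {
    "jacket": {"layer": "upperwear", "function": "regular", "part_of_body": "upper body"},
    "briefs": {"layer": "underwear", "function": "regular", "part_of_body": "lower body"},
    "leggings": {"layer": "upperwear", "function": "sports", "part_of_body": "legs"},
}


def classify(product):
    """Map scraped product characteristics to the classification dimensions.

    Returns None when the division or article type is unknown, so that the
    product can be flagged for manual review instead of being misclassified.
    """
    division = product.get("division", "").lower()
    rule = TYPE_RULES.get(product.get("type", "").lower())
    if division not in DIVISIONS or rule is None:
        return None
    return {"division": division, **rule}


# The example product from the slide.
blazer = {
    "brand": "H&M",
    "division": "women",
    "type": "jacket",
    "description": "blazer",
    "fabric": "leather",
    "color": "dark grey",
    "size": "medium",
    "price": 49.99,
}
classification = classify(blazer)
```

Returning `None` for unknown types is a deliberate choice here: new types of articles appear regularly on webshops, and an explicit "unclassified" result is easier to monitor than a silent default.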

Page 17: Modernisation of Price Collection at Statistics Netherlands

Monitoring

Websites change constantly and unexpectedly

Monitoring of collected data is a must:
• Articles per division
• Articles in sale
• New types of articles
• New structure of website

DevOps team: Development (IT experts) and Operations (CPI) working closely together
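Two of the monitoring checks listed on this slide, articles per division and new types of articles, might look like the sketch below. The `monitor` function, the record layout and the 25% threshold are assumptions for illustration; the slides do not specify thresholds or how alerts are raised.

```python
from collections import Counter


def monitor(previous, current, max_relative_change=0.25):
    """Compare two scrapes of one website and return a list of alert messages.

    `previous` and `current` are lists of dicts with at least the keys
    "division" and "type" (a hypothetical record layout).
    """
    alerts = []

    # Check 1: number of articles per division should not jump suddenly.
    before_counts = Counter(article["division"] for article in previous)
    after_counts = Counter(article["division"] for article in current)
    for division in sorted(set(before_counts) | set(after_counts)):
        before, after = before_counts[division], after_counts[division]
        if before == 0 or abs(after - before) / before > max_relative_change:
            alerts.append(f"division {division}: {before} -> {after} articles")

    # Check 2: article types that were never seen in the previous scrape.
    known_types = {article["type"] for article in previous}
    new_types = {article["type"] for article in current} - known_types
    for article_type in sorted(new_types):
        alerts.append(f"new article type: {article_type}")

    return alerts


yesterday = (
    [{"division": "women", "type": "jacket"}] * 4
    + [{"division": "men", "type": "jeans"}] * 4
)
today = (
    [{"division": "women", "type": "jacket"}] * 4
    + [{"division": "men", "type": "jeans"}]
    + [{"division": "men", "type": "poncho"}]
)
alerts = monitor(yesterday, today)
```

A sudden drop in one division often points to a change in the structure of the website rather than a real change in the assortment, so an alert like this would normally trigger a check of the robot itself.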

Page 18: Modernisation of Price Collection at Statistics Netherlands

Some results

[Figure: three panels, monthly from 201412 to 201601, vertical axis from 60 to 160. Panels: 031210 Garments for men; 031220 Garments for women; 031230 Garments for infants and children. Series: Website‐X and all shops]

Page 19: Modernisation of Price Collection at Statistics Netherlands

There is more than webscraping

• Webscrapers are suited to collecting many prices from few sites
• In the CPI we also collect few prices from many sites, for example: driving lessons, cinema tickets, pizza delivery services
• It is not feasible to build a robot for every single site: too expensive, too much monitoring and maintenance

Start of robot assisted data collection

Page 20: Modernisation of Price Collection at Statistics Netherlands

Robot assisted data collection

The robot tool automatically checks whether prices have changed

Traffic light indicates status:

• Green: nothing changed, the price is saved in the database
• Red: something changed, needs the attention of a statistician
• Two clicks to keep the old price or store a new one
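The traffic-light logic can be sketched as a comparison between the stored price and the price found on the page. This is an assumption about how such a check could work, not a description of the actual robot tool: the `extract_price` and `traffic_light` functions and the price pattern are hypothetical.

```python
import re

# Hypothetical pattern: a euro sign followed by an amount such as 35,00 or 49.99.
PRICE_PATTERN = re.compile(r"€\s*(\d+(?:[.,]\d{2})?)")


def extract_price(page_text):
    """Return the first euro amount found in the text, or None if there is none."""
    match = PRICE_PATTERN.search(page_text)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))


def traffic_light(stored_price, page_text):
    """Return ("green", price) when the page still shows the stored price.

    Any difference, including a page on which no price can be found,
    gives "red" so that a statistician looks at it.
    """
    observed = extract_price(page_text)
    if observed is not None and observed == stored_price:
        return "green", observed
    return "red", observed


unchanged = traffic_light(35.0, "Driving lesson: € 35,00 per hour")
changed = traffic_light(35.0, "Driving lesson: € 37,50 per hour")
missing = traffic_light(35.0, "Page not found")
```

Only the green case is handled without a person: for red, the tool would show the page and leave the decision (keep the old price or store a new one) to the statistician.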

Page 21: Modernisation of Price Collection at Statistics Netherlands

Robot assisted data collection – impact on organisation

• Specialists who used to collect prices manually from the internet now use the robot tool
• More prices collected in less time (80% productivity improvement)
• Better quality and less rework (reduced chance of making errors)
• Work is more interesting
• No need for organisational changes

Page 22: Modernisation of Price Collection at Statistics Netherlands

Robot assisted data collection – try it yourself

The robot tool is available on request for other NSIs: http://research.cbs.nl

Page 23: Modernisation of Price Collection at Statistics Netherlands

Critical success factors

Close cooperation between methodologists, IT and CPI experts

Sense of urgency, wish to change

• Adapt quickly to changes in the website (using an efficient framework for the robots)

• Automatic classification of the data

• Methodology to calculate price indices

• Patience: you can’t change overnight

• Balance between impatience and cautiousness

Page 24: Modernisation of Price Collection at Statistics Netherlands

Lessons learned

• Monitor the data weekly, even if not in production for CPI
• Implement the new methods gradually, no ‘big bang’
  o Learn from the robots in production
  o Improve monitoring
  o Improve classification algorithms
  o Lower risks

• In traditional methods the collection of prices is the end of a process; with internet robots it is just the beginning

Robots are not perfect, neither is price collection in shops

Page 25: Modernisation of Price Collection at Statistics Netherlands

Conclusion

Scanner data and internet data:
• Reduce administrative burden (85% less price collection in shops)
• Cost effective
• Better quality by using millions of prices
• Can be done without large impact on organisation

[Figure: Price collection by weighting share in CPI. Categories: Scanner data; Electronic questionnaire; Rents survey; E‐data energy and fuels; Internet prices and pricelists; E‐data travel agencies; Electronic/paper questionnaires; Price collection by telephone; Price collection in shops]

Page 26: Modernisation of Price Collection at Statistics Netherlands

Thank you for your attention! Questions? Discussion