Olav ten Bosch, MSIS, Dublin, 14-16 April 2014

Post on 26-Feb-2016


Olav ten Bosch
MSIS, Dublin, 14-16 April 2014

On the use of internet robots for official statistics

Overview

– Why internet as a data source (IAD)?
– Internet robots, how do they work?
– Applications:
  ‐ Airline tickets
  ‐ Housing market
  ‐ Clothing
  ‐ “Robot assisted data collection”
– Conclusion

Why IAD? (1)

Administrative sources
– Tax, social security services
– Municipalities / provinces
– Supermarkets

Surveys

Internet sources

Less!!!

Faster, better, more efficient

New indicators

Which content is original, reliable, stable, representative and accessible?

Why IAD? (2)

– Internet prices for CPI?
– Real estate sites for housing statistics?
– Internet vacancies for job statistics?
– Social media sentiment for consumer confidence?
– Trade in second-hand goods as economic indicators?
– Travel activity for tourism statistics?

Robots / crawlers / bots / spiders / scrapers: how do they work? (1)

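[Diagram: you give commands to a browser; the browser sends requests over the internet to a website and receives code, images, style, data, etc., which it presents to you as graphical markup.]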

Robots / crawlers / bots / spiders / scrapers: how do they work? (2)

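[Diagram: a robot (spider, crawler) sends requests over the internet to a website and handles the navigation itself; from the returned code, images, style, data, etc. it passes only the data on to you.]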
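The request-and-extract loop in the diagram can be sketched in a few lines of Python. This is only an illustration, not the software used in the presentation: the `price` class name, the user-agent string, and the sample HTML are all assumptions made up for the example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch(url, user_agent="example-statistics-robot/0.1"):
    """Request a page the way a browser would, and return its text.

    The user-agent string is a hypothetical name for this sketch.
    """
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class PriceExtractor(HTMLParser):
    """Keeps the first text found inside each element with class 'price'.

    The class name 'price' is an assumption; a real site needs its own rule.
    """

    def __init__(self):
        super().__init__()
        self._capture = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._capture = "price" in classes

    def handle_data(self, data):
        if self._capture and data.strip():
            self.prices.append(data.strip())
            self._capture = False


def extract_prices(html):
    """Throw away markup and styling, keep only the price data."""
    parser = PriceExtractor()
    parser.feed(html)
    parser.close()
    return [float(text) for text in parser.prices]


# Invented page fragment, standing in for what fetch() would return.
SAMPLE_HTML = """
<ul>
  <li>Amsterdam - Milano <span class="price">129.00</span></li>
  <li>Amsterdam - Rome <span class="price">98.50</span></li>
</ul>
"""

if __name__ == "__main__":
    print(extract_prices(SAMPLE_HTML))
```

In real use, `extract_prices(fetch(url))` would be run on a schedule, and the extraction rule would have to be maintained as the site changes.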

Robots / crawlers / bots / spiders / scrapers: how do they work? (3)

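[Diagram: as in (2), a robot (spider, crawler) sends requests over the internet to a website, navigates it, and extracts data from the returned code, images, style, data, etc. The robot is monitored actively and delivers data repeatedly over time.]

Generic software for:
- site navigation
- product details
- monitoring

Agile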

Airline tickets (1)

Robot collection versus manual collection

[Chart: ticket price Amsterdam - Milano, 11 Feb to 10 Aug, vertical axis 0 to 250; series: Robot, Manual.]

Airline tickets (2)

Price of a ticket over time

[Chart: price relative to average (-80% to 60%) against days before departure (-120 to 0); series: Barcelona, London, Milan, Rome.]
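The quantity on the vertical axis, price relative to average, could be computed along these lines. The slide does not say how the average is defined, so this sketch assumes it is the mean of all observed prices for one route; the numbers in the usage example are invented.

```python
from statistics import mean


def relative_to_average(observations):
    """Express each price as a deviation from the route's mean price.

    observations: list of (days_before_departure, price) pairs for one route.
    Returns (days_before_departure, deviation) pairs, where 0.2 means 20%
    above the mean and -0.2 means 20% below it.
    Assumption: 'average' is the plain mean over all observations.
    """
    average = mean(price for _, price in observations)
    return [(days, price / average - 1.0) for days, price in observations]


if __name__ == "__main__":
    # Invented observations, not data from the presentation.
    route = [(-90, 80.0), (-30, 100.0), (0, 120.0)]
    for days, deviation in relative_to_average(route):
        print(f"{days:>5} days: {deviation:+.0%}")
```

Normalising each route by its own average is what makes destinations with different price levels comparable on one chart.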

Housing market (1)

Housing market (2)

Dynamics of the ‘database behind’ become visible

Clothing (1):

2 sites: very volatile data

Clothing (2):

Challenges:
- from volatile data to stable statistics
- how to classify multiple less structured data sources

Seasonal pattern
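The slides do not say how volatile observations are turned into stable figures. One simple possibility, shown here purely as an illustration and not as the method used in the presentation, is a rolling median with a short window: it damps isolated spikes while leaving a slower seasonal movement largely intact. The function name and window size are choices made for this sketch.

```python
from statistics import median


def rolling_median(values, window=3):
    """Replace each value by the median of its neighbourhood.

    window must be a positive odd number; at the edges of the series the
    neighbourhood is simply truncated, so the first and last values are
    less reliable than the rest.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - half)
        stop = min(len(values), i + half + 1)
        smoothed.append(median(values[start:stop]))
    return smoothed


if __name__ == "__main__":
    # Invented daily average prices with two outliers (20 and 90).
    daily = [50, 52, 20, 51, 53, 90, 54]
    print(rolling_median(daily))
```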

Robot-assisted data collection (1)

– Use case: few price observations on many sites
– Example: price of a cinema ticket
– “Robot tool” to automatically check whether prices have changed
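The idea behind such a tool can be sketched as follows: the robot compares the price it finds on each site with the last known value and flags only the sites that need a human look. The regular expression, the site names, and the page texts are assumptions invented for this example; the actual tool is not described in the slides.

```python
import re

# Assumed price format: digits, a dot or comma, two decimals (e.g. 9,50).
PRICE_PATTERN = re.compile(r"(\d+[.,]\d{2})")


def extract_price(page_text):
    """Return the first price found in the text, or None if there is none."""
    match = PRICE_PATTERN.search(page_text)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))


def check_sites(last_known, pages):
    """Compare current prices with the last known ones.

    last_known: dict mapping site name to the previously recorded price.
    pages: dict mapping site name to the page text the robot retrieved.
    Returns a dict with a reason for every site that needs manual attention.
    """
    flagged = {}
    for site, text in pages.items():
        current = extract_price(text)
        previous = last_known.get(site)
        if current is None:
            flagged[site] = "price not found"
        elif current != previous:
            flagged[site] = f"changed from {previous} to {current}"
    return flagged


if __name__ == "__main__":
    # Hypothetical cinemas and page texts.
    last_known = {"cinema_a": 9.50, "cinema_b": 11.00, "cinema_c": 8.00}
    pages = {
        "cinema_a": "Ticket: EUR 9,50",
        "cinema_b": "Ticket: EUR 11.50",
        "cinema_c": "Sold out",
    }
    print(check_sites(last_known, pages))
```

Sites where nothing changed produce no output, so the collector only has to look at the exceptions.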

Robot-assisted data collection (2)


Conclusion

– Using the internet as a data source, we can measure statistical phenomena in a completely different way

– It is powerful to combine fast internet data with reliable (but slower) administrative data

– We should redesign statistics with the possibilities of internet data in mind

Challenges:
– Legal framework
– The internet changes continuously: how to turn volatile data sources into reliable statistics?
– We need advanced statistical methods, processes and IT
