sc6 workshop 1: big data (phenomenon) challenges and requirements in official statistics -...

Big data (phenomenon) challenges and requirements in official statistics. Fernando Reis, Big Data Task-Force, European Commission (Eurostat). Big Data Europe Workshop, Luxembourg, 18th November 2015

Upload: bigdataeurope

Post on 13-Apr-2017


TRANSCRIPT

Page 1: SC6 Workshop 1: Big data (phenomenon) challenges and requirements in official statistics - BigDataEurope, SC6 Workshop

Big data (phenomenon) challenges and requirements in official statistics

Fernando Reis, Big Data Task-Force European Commission (Eurostat)

Big Data Europe Workshop Luxembourg, 18th November 2015

Page 2

Defining big data in 1 minute

•  Data deluge
   •  High Volume, High Velocity, High Variety
•  Data-driven analytical applications
   •  Statistical modelling
   •  Visualisation
•  Data-driven economy
   •  Official statistics no longer has a near monopoly on statistics

Page 3

The data deluge

© Copyright Brett Ryder 2010

Page 4

Digital footprint

Datafication

Sensors

Page 5

The data deluge

•  Everything is data (data ubiquity)
•  Examples: text, sound, images, video
•  Emergent use of unstructured data
•  Exhaust data and “reality mining”
•  Types of big data sources
•  Signal-to-noise ratio (the 4th ‘V’: Value)
•  The next age: Internet of Things

Page 6

The data deluge

•  Organic data / exhaust data / digital footprint
•  Data ubiquity: text, sound, images, video
•  Emergent use of unstructured data
•  Reality mining

Page 7

The data deluge

•  Communication: mobile phone data, social media
•  WWW: web searches, businesses' websites, e-commerce websites, job advertisements, real estate websites
•  Sensors: traffic loops, smart meters, vessel identification, satellite images
•  Process generated data: flight booking transactions, supermarket cashier data, financial transactions
•  Crowd sourcing: VGI websites (OpenStreetMap), community pictures collection

Page 8

Analytics

This photo, “Cartoon: Big Data” is copyright (c) 2014 Thierry Gregorius and made available under an Attribution 2.0 Generic license.

Page 9

Analytics

•  How to deal with exhaust data?
   •  Handled by machine learning / predictive analytics
•  Massive datasets
   •  Foster machine learning
•  Data science: a new discipline?
   •  Signal processing (audio, image, video)
   •  Natural Language Processing (NLP)
   •  Network data
   •  Distributed computing
   •  Multiple inference
   •  Over-fitting
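The "multiple inference" item can be made concrete with a small sketch. It is an illustration only, not something from the slides: it assumes a Bonferroni correction as one simple way to control false positives when many hypotheses are tested on the same massive dataset. The p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if it is still rejected
    after comparing each p-value with alpha / m (Bonferroni correction)."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values from screening several candidate indicators.
p_values = [0.001, 0.012, 0.03, 0.04, 0.2]

# Testing each hypothesis at 5% without any correction.
naive = [p <= 0.05 for p in p_values]

# Testing with the corrected threshold 0.05 / 5 = 0.01.
corrected = bonferroni(p_values)

print(sum(naive), "rejections without correction")
print(sum(corrected), "rejections with Bonferroni correction")
```

Bonferroni is conservative; it is used here only because it fits in a few lines.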

Page 10

Population statistics

Figure panels: mobile phone frequent locations; mobile phone commute map

Csáji, Balázs Cs, et al. "Exploring the mobility of mobile phone users." Physica A: Statistical Mechanics and its Applications 392.6 (2013): 1459-1473.

Page 11

Population Mapping Using Mobile Phone Data

Deville, Pierre, et al. "Dynamic population mapping using mobile phone data." Proceedings of the National Academy of Sciences 111.45 (2014): 15888-15893.

https://www.youtube.com/watch?v=qsUDH5dUnvY
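As a rough illustration of how mobile phone activity can be turned into a population estimate, the sketch below assumes a power-law relation between mobile user density (sigma) and population density (rho), rho = alpha * sigma ** beta, fitted by least squares on a log-log scale. The functional form, the function names and the data are assumptions made for this example; the data are synthetic and this is not a reproduction of the cited study.

```python
import math


def fit_power_law(user_density, population_density):
    """Fit rho = alpha * sigma**beta by ordinary least squares on logs."""
    xs = [math.log(s) for s in user_density]
    ys = [math.log(r) for r in population_density]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = math.exp(mean_y - beta * mean_x)
    return alpha, beta


def predict_population_density(alpha, beta, sigma):
    """Apply the fitted relation to a new mobile user density."""
    return alpha * sigma ** beta


# Synthetic calibration data: areas where both densities are known
# (for example from a census). Generated with alpha=3.0, beta=0.8.
sigma = [1.0, 2.0, 4.0, 8.0, 16.0]
rho = [3.0 * s ** 0.8 for s in sigma]

alpha, beta = fit_power_law(sigma, rho)
estimate = predict_population_density(alpha, beta, 10.0)
```

In practice the calibration areas would come from a reference source such as a census, and the fitted relation would then be applied to areas or time periods with no reference data.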

Page 12

An emergent market

Page 13

An emergent market

•  Monetisation of data: data is the new oil
•  Data as a new factor of production (competitive differentiating factor for businesses)
•  A threat to official statistics? (e.g. Argentina)
•  Data ecosystem
•  The cases of Google and Facebook

Page 14

What does big data mean for official statistics?

•  Change of paradigm
   •  From: finite population sampling methodology
   •  To: additional statistical modelling and machine learning
   •  From designers of data collection processes to designers of statistical products
•  Privacy
   •  Use of digital footprint
   •  Data subjects' lack of control over their data
   •  High data detail and insight from analytics
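For readers less familiar with the "finite population sampling methodology" side of the paradigm change, here is a minimal sketch of a design-based estimate. It assumes the Horvitz-Thompson estimator of a population total, in which each sampled value is weighted by the inverse of its inclusion probability. The estimator choice, function name and numbers are illustrative assumptions, not taken from the slides.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total from a probability sample by
    weighting each observed value by 1 / inclusion probability."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    if any(not (0.0 < p <= 1.0) for p in inclusion_probs):
        raise ValueError("inclusion probabilities must be in (0, 1]")
    return sum(y / p for y, p in zip(values, inclusion_probs))


# Hypothetical sample of three units with known inclusion probabilities.
sample_values = [10.0, 20.0, 30.0]
inclusion_probs = [0.1, 0.2, 0.5]

total_estimate = horvitz_thompson_total(sample_values, inclusion_probs)
print(total_estimate)
```

The estimate is valid because the statistician designed the sample and knows the inclusion probabilities. With big data sources the selection mechanism is usually unknown, which is one reason additional modelling is needed.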

Page 15

Multisource statistics and multipurpose sources

•  One source, several statistics: Mobile Phone Data → Tourism Statistics, Population Statistics, Migration Statistics, Traffic Statistics, Commuting Statistics
•  One statistic, several sources: Population Statistics ← Mobile phone data, Smart Meters, VGI websites, Satellite Images

Page 16

ESS big data action plan

•  Policy
•  Quality
•  Skills
•  Experience sharing
•  Legislation
•  IT Infrastructures
•  Methods
•  Ethics / Communication
•  Pilots

Page 17

Challenges for data management

•  Size of datasets (storage and processing)
•  Lack of control on data sources
•  Data ownership / licensing
•  Volatility / sustainability of data sources
•  Data integration (variety of data sources)
•  Open data (do we need to store it?)
•  Data types (natural language, images, geo-location)
•  Level of detail of the data

Page 18

Challenges for data management

•  Technological change (tools change frequently)
•  Privacy (anonymization methods)
•  Data security (which data to share)
•  Data interface with production / research
•  Metadata (are current standards enough?)
•  Replicability / auditability
•  Data versioning
•  Data handling methodologies
•  Applications (e.g. network analysis)
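To illustrate the kind of check that sits behind "anonymization methods", the sketch below measures k-anonymity. It is one possible criterion, chosen here as an example and not named in the slides: k is the size of the smallest group of records that share the same quasi-identifier values. The records and field names are hypothetical.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records sharing the same
    combination of quasi-identifier values (0 for an empty dataset)."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical microdata; "income" is the sensitive attribute.
records = [
    {"age_band": "30-39", "region": "LU", "income": 50},
    {"age_band": "30-39", "region": "LU", "income": 62},
    {"age_band": "40-49", "region": "LU", "income": 48},
    {"age_band": "40-49", "region": "LU", "income": 55},
    {"age_band": "40-49", "region": "BE", "income": 51},
]

# With both quasi-identifiers, one record is unique in its group.
k_detailed = k_anonymity(records, ["age_band", "region"])

# Dropping "region" (a coarser release) enlarges the smallest group.
k_coarse = k_anonymity(records, ["age_band"])
```

The comparison shows the usual trade-off: reducing the level of detail of the data raises k at the cost of analytical value.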

Page 19

Thank you for your attention

Fernando Reis

Eurostat Task Force on Big Data

https://github.com/reisfe/

https://twitter.com/reisfe/

https://linkedin.com/in/reisfe/
