sc6 workshop 1: big data (phenomenon) challenges and requirements in official statistics -...
Big data (phenomenon) challenges and requirements in official statistics
Fernando Reis, Big Data Task-Force European Commission (Eurostat)
Big Data Europe Workshop Luxembourg, 18th November 2015
Defining big data in 1 minute
• Data deluge
• High Volume, High Velocity, High Variety
• Data-driven analytical applications
• Statistical modelling
• Visualisation
• Data-driven economy
• Official statistics no longer has a near monopoly on statistics
The data deluge
© Copyright Brett Ryder 2010
Digital footprint
Datafication
Sensors
The data deluge
• Everything is data (data ubiquity)
• Examples: Text, sound, images, video
• Emergent use of unstructured data
• Exhaust data and “reality mining”
• Types of big data sources
• Signal to noise ratio (the 4th ‘V’: Value)
• The next age: Internet of things
The data deluge
• Organic data / exhaust data / digital footprint
• Data ubiquity: text, sound, images, video
• Emergent use of unstructured data
• Reality mining
Communication: Mobile phone data, Social Media
WWW: Web Searches, Businesses' Websites, E-commerce websites, Job advertisements, Real estate websites
Sensors: Traffic loops, Smart meters, Vessel Identification, Satellite Images
Process generated data: Flight Booking transactions, Supermarket Cashier Data, Financial transactions
Crowd sourcing: VGI websites (OpenStreetMap), Community pictures collection
The data deluge
Analytics
This photo, “Cartoon: Big Data” is copyright (c) 2014 Thierry Gregorius and made available under an Attribution 2.0 Generic license.
• How to deal with exhaust data?
• Dealt with by machine learning / predictive analytics
• Massive datasets
• Foster machine learning
• Data science: a new discipline?
• Signal processing (audio, image, video)
• Natural Language Processing (NLP)
• Network data
• Distributed computing
• Multiple inference
• Over-fitting
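The "multiple inference" item above refers to the risk of false discoveries when many hypotheses are tested on the same large dataset. A minimal sketch of two standard corrections (Bonferroni and Holm) follows. The p-values are made up for illustration, and the function names are hypothetical rather than taken from the presentation.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m (m = number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down procedure; rejects at least as much as Bonferroni."""
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # every remaining p-value is larger, so stop here
    return reject


# Made-up p-values from four hypothetical tests on the same dataset.
p = [0.001, 0.015, 0.04, 0.30]
naive = [x <= 0.05 for x in p]  # no correction at all

print("naive:     ", naive)
print("bonferroni:", bonferroni(p))
print("holm:      ", holm(p))
```

Without correction three of the four tests look significant; after correction only one (Bonferroni) or two (Holm) remain.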
Analytics
Csáji, Balázs Cs, et al. "Exploring the mobility of mobile phone users." Physica A: Statistical Mechanics and its Applications 392.6 (2013): 1459-1473.
Figure panels: Population statistics; Mobile phone frequent locations; Mobile phone commute map
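The frequent-locations and commute-map panels can be illustrated with a small sketch. It is not the method of the cited paper: it assumes a hypothetical record format `(user_id, hour, cell_id)` and assumed time windows (night hours for "home", office hours for "work"), and simply takes each user's most frequent cell in each window.

```python
from collections import Counter, defaultdict

# Assumed time windows (hours of the day); these are illustrative choices.
NIGHT = set(range(0, 7)) | set(range(20, 24))
WORK = set(range(9, 17))


def most_frequent_cell(records, hours):
    """Map each user to the cell they appear in most often during `hours`."""
    counts = defaultdict(Counter)
    for user, hour, cell in records:
        if hour in hours:
            counts[user][cell] += 1
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


def commute_flows(records):
    """Count users per (home cell, work cell) pair."""
    home = most_frequent_cell(records, NIGHT)
    work = most_frequent_cell(records, WORK)
    flows = Counter()
    # Only users observed in both windows contribute to the commute map.
    for user in home.keys() & work.keys():
        flows[(home[user], work[user])] += 1
    return flows


# Synthetic records: (user_id, hour, cell_id).
records = [
    ("u1", 22, "A"), ("u1", 23, "A"), ("u1", 2, "B"),
    ("u1", 10, "C"), ("u1", 14, "C"),
    ("u2", 21, "A"), ("u2", 11, "C"),
    ("u3", 3, "B"),
]
print(commute_flows(records))
```

Real call detail records would need far more care (sparse observations, cell coverage areas, privacy safeguards); the sketch only shows the aggregation step.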
Population Mapping Using Mobile Phone Data
Deville, Pierre, et al. "Dynamic population mapping using mobile phone data." Proceedings of the National Academy of Sciences 111.45 (2014): 15888-15893.
https://www.youtube.com/watch?v=qsUDH5dUnvY
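Population mapping from mobile phone data is often approached by relating mobile user density to population density. The sketch below assumes a power-law relation, rho = alpha * sigma^beta, fitted by ordinary least squares on log-transformed values. The data are synthetic and the function names are hypothetical, so this is an illustration of the idea rather than the procedure of the paper cited above.

```python
import math


def fit_power_law(sigma, rho):
    """Fit rho = alpha * sigma**beta by least squares in log-log space."""
    xs = [math.log(s) for s in sigma]
    ys = [math.log(r) for r in rho]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = math.exp(mean_y - beta * mean_x)
    return alpha, beta


def predict_density(sigma, alpha, beta):
    """Estimate population density for each area from user density."""
    return [alpha * s ** beta for s in sigma]


# Synthetic training areas: night-time mobile users per km2 and the
# census population density for the same areas (noise-free on purpose).
user_density = [1.0, 10.0, 100.0, 1000.0]
census_density = [3.0 * s ** 0.8 for s in user_density]

alpha, beta = fit_power_law(user_density, census_density)
print(round(alpha, 3), round(beta, 3))
print(predict_density([50.0], alpha, beta))
```

In practice the fitted estimates would typically be rescaled so that totals match known census figures for larger regions.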
An emergent market
• Monetisation of data: Data is the new oil
• Data as a new factor of production (competitive differentiating factor for businesses)
• A threat to official statistics? (ex: Argentina)
• Data ecosystem
• The cases of Google and Facebook
What does big data mean for official statistics?
• Change of paradigm
• From: finite population sampling methodology
• To: additional statistical modelling and machine learning
• From designers of data collection processes to designers of statistical products
• Privacy
• Use of digital footprint
• Data subjects' lack of control over their data
• High data detail and insight from analytics
Mobile phone data: Tourism Statistics, Population Statistics, Migration Statistics, Traffic Statistics, Commuting Statistics
Population Statistics: Mobile phone data, Smart Meters, VGI websites, Satellite Images
Multisource statistics and multipurpose sources
ESS big data action plan
• Policy
• Quality
• Skills
• Experience sharing
• Legislation
• IT Infrastructures
• Methods
• Ethics / Communication
• Pilots
Challenges for data management
• Size of datasets (storage and processing)
• Lack of control on data sources
• Data ownership / licensing
• Volatility / sustainability of data sources
• Data integration (variety of data sources)
• Open data (do we need to store it?)
• Data types (natural language, images, geo-location)
• Level of detail of the data
Challenges for data management
• Technological change (tools change frequently)
• Privacy (anonymization methods)
• Data security (which data to share)
• Data interface with production / research
• Metadata (are current standards enough?)
• Replicability / auditability
• Data versioning
• Data handling methodologies
• Applications (e.g. network analysis)
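As one concrete example of the anonymization methods mentioned under privacy, the sketch below checks k-anonymity: every combination of quasi-identifier values must be shared by at least k records. The column names and records are invented for illustration, and k-anonymity is only one of several possible disclosure-control criteria.

```python
from collections import Counter


def group_sizes(rows, quasi_identifiers):
    """Count records per combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def k_anonymity(rows, quasi_identifiers):
    """Return the largest k for which the (non-empty) dataset is k-anonymous."""
    return min(group_sizes(rows, quasi_identifiers).values())


def violating_groups(rows, quasi_identifiers, k):
    """Return the quasi-identifier combinations shared by fewer than k records."""
    sizes = group_sizes(rows, quasi_identifiers)
    return {key: n for key, n in sizes.items() if n < k}


# Invented records; "age_band" and "region" play the role of quasi-identifiers.
rows = [
    {"age_band": "20-29", "region": "North", "usage_kwh": 310},
    {"age_band": "20-29", "region": "North", "usage_kwh": 280},
    {"age_band": "30-39", "region": "North", "usage_kwh": 450},
    {"age_band": "30-39", "region": "North", "usage_kwh": 390},
    {"age_band": "30-39", "region": "South", "usage_kwh": 500},
]
qi = ["age_band", "region"]

print(k_anonymity(rows, qi))
print(violating_groups(rows, qi, k=2))
```

Groups reported by `violating_groups` would need further generalisation or suppression before the data could be shared under a k = 2 requirement.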
Thank you for your attention
Fernando Reis
Eurostat Task Force on Big Data
https://github.com/reisfe/
https://twitter.com/reisfe/
https://linkedin.com/in/reisfe/