Twitter: A source of Big Data in a NSO. Experimental Indicators in Sentiment and Mobility The ESS Big Data Workshop 2016

Upload: abel-alejandro-coronado-iruegas

Post on 13-Apr-2017


Page 1: INEGI ESS big data workshop

Twitter: A source of Big Data in a NSO.

Experimental Indicators in Sentiment and Mobility

The ESS Big Data Workshop 2016

Page 2: INEGI ESS big data workshop

Objective

To share the experience of INEGI in the use of Twitter as a big data source.

Page 3: INEGI ESS big data workshop

Types of Big Data Sources

• Meters and smart sensors. Traffic cameras, GPS devices, power consumption meters, IoT, smartwatches, smartphones, etc.

• Social interactions. Conversations and publications on social networks like Twitter, Facebook, FourSquare, etc.

• Business transactions. Credit card movements, scanned data, cell phone records, etc.

• Electronic files. Documents which are available in electronic formats such as PDF files, websites, videos, audio, images, photos, etc.

• Broadcast media. Digital video and audio streamed in real time.

Page 4: INEGI ESS big data workshop

Types of Big Data Sources

• Meters and smart sensors. Traffic cameras, GPS devices, power consumption meters, IoT, smartwatches, smartphones, etc.

• Social interactions. Conversations and publications on social networks like Twitter, Facebook, FourSquare, etc.

• Business transactions. Credit card movements, electronic cash registers, cell phone records, etc.

• Electronic files. Documents which are available in electronic formats such as PDF files, websites, videos, audio, digital media broadcasting.

• Broadcast media. Digital video and audio streamed in real time.

Page 5: INEGI ESS big data workshop

Process Followed by INEGI (until now)

Page 6: INEGI ESS big data workshop

Production process

Page 7: INEGI ESS big data workshop

Study Case

• Initial objective of INEGI’s Big Data Project: To generate experimental indicators using Big Data techniques with social media data, to complement statistical information obtained from traditional methods and sources.

• Initial Goal: To obtain indicators of subjective wellbeing from social media data sources.

Page 8: INEGI ESS big data workshop

Why did we choose Twitter as a data source?

• It is a widely adopted social network where you can find content written by ordinary people

• Tweets are public, so we can use them without concerns about privacy

• There is a free API which allows retrieving up to 1% of the tweets that are being produced in real time ( https://dev.twitter.com/streaming )

Page 9: INEGI ESS big data workshop

In the beginning…

February 2014 – October 2015, 24/7

Page 10: INEGI ESS big data workshop

Collection / Analysis infrastructure

Since October 2015 - … (EMC / VMWare)

• Logstash: location query, free access ( https://twittercommunity.com/t/stream-filter-sample-size/30865 )

• Apache Spark: cleaning and sentiment analysis

• Tweets: daily processing (4 a.m.), 300 K geo-tweets, minimal representation

• ~260 million geo-tweets, ~130 million inside Mexico, ~2 years and 8 months, ~24/7
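The location filter and the "minimal representation" step can be sketched as follows. This is only an illustration, not INEGI's Spark job: it assumes tweets arrive as one JSON object per line in the classic Twitter REST/streaming layout (`coordinates`, `created_at`, `id_str`, `user.id_str`, `text`), and the bounding box values for Mexico are rough, assumed numbers. The function names are hypothetical.

```python
import json
from datetime import datetime

# Rough bounding box for Mexico (assumed values, for illustration only):
# (min_lon, min_lat, max_lon, max_lat)
MEXICO_BBOX = (-118.6, 14.3, -86.5, 32.8)


def minimal_representation(raw_line):
    """Return a compact dict for a geo-tagged tweet, or None if it has no point."""
    tweet = json.loads(raw_line)
    point = tweet.get("coordinates")
    if not point or point.get("type") != "Point":
        return None  # not all tweets are georeferenced
    lon, lat = point["coordinates"]  # GeoJSON order: longitude first
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        "id": tweet["id_str"],
        "user": tweet["user"]["id_str"],
        "text": tweet["text"],
        "lon": lon,
        "lat": lat,
        "ts": created.isoformat(),
    }


def inside_bbox(record, bbox=MEXICO_BBOX):
    """True if the record's point falls inside the bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= record["lon"] <= max_lon and min_lat <= record["lat"] <= max_lat
```

A bounding box is only a coarse filter; keeping exactly the tweets inside the country would need a point-in-polygon test against the national boundary.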

Page 11: INEGI ESS big data workshop

Multivariate Stratification of blocks in Apache Spark

% Internet access, % PC, % Cell phone, % Automobile

Page 12: INEGI ESS big data workshop

Software Stack (2016)

Page 13: INEGI ESS big data workshop

Tweet Structure

Page 14: INEGI ESS big data workshop

Tweets Map

Page 15: INEGI ESS big data workshop

Tweets Map

Page 16: INEGI ESS big data workshop

We found that

• The JSON structure is easy to process (Apache Spark)

• The content is text that we can examine to perform sentiment analysis

• Geographical coordinates can be used to filter the tweets and obtain only those of interest (warning: not all tweets are georeferenced)

• We can perform mobility analysis based on the tweets' location and time (a serendipitous finding)
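The mobility idea in the last bullet can be illustrated with a small sketch: order each user's georeferenced tweets by time and measure the great-circle distance between consecutive points. The record layout `(user, timestamp, lon, lat)` and the function names are assumptions made for this example; the slides do not describe how the mobility indicator is actually computed.

```python
from collections import defaultdict
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def user_displacements(records):
    """Map each user to a list of (t_start, t_end, km) between consecutive tweets.

    records: iterable of (user, timestamp, lon, lat); timestamps must be sortable.
    """
    by_user = defaultdict(list)
    for user, ts, lon, lat in records:
        by_user[user].append((ts, lon, lat))
    moves = {}
    for user, points in by_user.items():
        points.sort()  # chronological order per user
        moves[user] = [
            (a[0], b[0], haversine_km(a[1], a[2], b[1], b[2]))
            for a, b in zip(points, points[1:])
        ]
    return moves
```

Aggregating these displacements by origin and destination area would give a simple origin-destination view of movement.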

Page 17: INEGI ESS big data workshop

Exploring Big Data Base

Page 18: INEGI ESS big data workshop

Supervised Training

Manual Tagging

5000 people (TecMilenio students), 100 tweets tagged by each one, each tweet was tagged nine times, about 40,000 different tagged tweets, interpreted according to regional idioms

http://cienciadedatos.inegi.org.mx/animotuitero/

They’re not enchiladas!
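Since each tweet was tagged nine times, the individual tags have to be reduced to one training label per tweet. The slides do not say how this was done; the sketch below assumes a simple majority vote that discards tweets without a clear winner. The label names and the `min_agreement` threshold are hypothetical.

```python
from collections import Counter


def consensus_label(tags, min_agreement=0.5):
    """Return the most frequent tag if its share exceeds min_agreement, else None."""
    if not tags:
        return None
    label, votes = Counter(tags).most_common(1)[0]
    if votes / len(tags) > min_agreement:
        return label
    return None  # annotators disagree too much; drop from the training set
```

For example, five "positive" votes out of nine pass the default threshold, while a 4-4-1 split does not.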

Page 19: INEGI ESS big data workshop

Training and modeling

Collaborative Research

Python Implementation

Page 20: INEGI ESS big data workshop

Sentiment Visualization (2015) (Monthly Indicator)

http://www.inegi.org.mx/inegi/contenidos/investigacion/experimentales/animotuitero/default.aspx

{JSON:File}

Page 21: INEGI ESS big data workshop

Sentiment Visualization (Late 2016) (Daily Indicator)

C# {RESTful:API}

{NoSQL}
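The slides do not define how the daily indicator is calculated. One plausible definition, shown here only as an assumption, is the ratio of positive to negative tweets per day. The input layout `(date_string, label)` and the label names are hypothetical.

```python
from collections import defaultdict


def daily_ratio(classified):
    """Return {day: positive/negative ratio}, or None for days with no negatives.

    classified: iterable of (date_string, label), label in {"positive", "negative"}.
    Other labels are ignored.
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for day, label in classified:
        if label in counts[day]:
            counts[day][label] += 1
    return {
        day: (c["positive"] / c["negative"] if c["negative"] else None)
        for day, c in sorted(counts.items())
    }
```

A dictionary of this shape maps directly to the JSON payload a RESTful endpoint could serve to the visualization.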

Page 22: INEGI ESS big data workshop

Mobility (Late 2016)

Page 23: INEGI ESS big data workshop

Integration of other sources

Big Spatial Join (5 M Economic Units with +60 M Tweets)

https://github.com/syoummer/SpatialSpark
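The idea behind a spatial join of this size can be shown with a small, single-machine sketch. It does not use SpatialSpark or reproduce its API; it assumes a plain uniform grid index and counts the tweets that fall within `size` degrees (in both longitude and latitude) of each economic unit. All names and the default cell size are hypothetical.

```python
from collections import defaultdict
from math import floor


def grid_cell(lon, lat, size):
    """Integer grid cell containing the point, for square cells of `size` degrees."""
    return (floor(lon / size), floor(lat / size))


def count_tweets_near_units(units, tweets, size=0.01):
    """Count tweets within `size` degrees (per axis) of each economic unit.

    units: iterable of (unit_id, lon, lat); tweets: iterable of (lon, lat).
    """
    index = defaultdict(list)
    for lon, lat in tweets:
        index[grid_cell(lon, lat, size)].append((lon, lat))

    counts = {}
    for unit_id, ulon, ulat in units:
        cx, cy = grid_cell(ulon, ulat, size)
        n = 0
        # Any point within `size` degrees lies in the unit's cell or a neighbour.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for lon, lat in index.get((cx + dx, cy + dy), ()):
                    if abs(lon - ulon) <= size and abs(lat - ulat) <= size:
                        n += 1
        counts[unit_id] = n
    return counts
```

The grid avoids comparing every unit with every tweet, which is the same reason distributed spatial joins partition the data before matching.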

Page 24: INEGI ESS big data workshop

Some applications

• Tourism

• Migration

• Use of roads

• Regional influence of big cities

• Mobility patterns

• Business activity patterns

• Subjective wellbeing

• Inequities impact

• Impact analysis of relevant news

• Mental health

• Misogynist/discriminatory language use

• SDG indicators?

Page 25: INEGI ESS big data workshop

Collaboration

• International
  – UNECE
    • ICHEC
  – UNSD
  – LAMBDoop
  – University of Pennsylvania

• National
  – KioNetworks
  – Dattlas
  – TecMilenio
  – INFOTEC
  – Centro Geo
  – CIDE
  – CIMAT
  – Sectur

• Internal
  – INEGI General Directorates

Page 26: INEGI ESS big data workshop

Questions?

[email protected] M.Sc. Abel Coronado

@abxda

Page 27: INEGI ESS big data workshop

Conociendo México

01 800 111 46 34

[email protected]

@inegi_informa INEGI Informa