
Inferring Activities From Social Media Data

Emmanouil Chaniotakis and Constantinos Antoniou

Chair of Transportation Systems Engineering
Department of Civil, Geo and Environmental Engineering
Technical University of Munich (TUM)

Constantinos Antoniou (TUM) | Workshop on Smart Mobility | 1-2 June 2017


Motivation

• Facebook: the 3rd most visited website
• Twitter: the 13th most visited website
• Facebook: 1.09 billion daily active users
• Twitter: 100 million daily active users

[Figure labels: Facebook, Travel Diary Survey, Foursquare, Twitter]


Social Media in Transportation: Challenges

Two major challenges:
• Functional use of Social Media
• Data availability


Social Media Functionalities

Chaniotakis, Antoniou, Pereira. "Mapping Social Media for Transportation Studies." IEEE Intelligent Systems 31.6 (2016): 64-70.


Social Media Data Availability

Chaniotakis, Antoniou, Pereira. "Mapping Social Media for Transportation Studies." IEEE Intelligent Systems 31.6 (2016): 64-70.


Social Media in Transportation

Chaniotakis, Antoniou, Pereira. "Mapping Social Media for Transportation Studies." IEEE Intelligent Systems 31.6 (2016): 64-70.


Social Media for Travel Demand Data

Empirical, exploratory study for understanding Social Media use

Goals: empirically compare travel demand from Social Media and a conventional travel survey

Activities (based on destination type)

Temporal characteristics
• Distribution comparison
• Correlations in periods

Spatial distribution
• Zonal visits
• Concentration of arrivals

Chaniotakis, Antoniou, Salanova, Dimitriou, "Can Social Media data augment travel demand survey data?" Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on. IEEE, 2016.
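The temporal comparison listed above (distribution comparison and correlations in periods) can be illustrated with a small sketch. It assumes hourly bins and a Pearson correlation; the slides do not state which binning or correlation measure the study used, and the function names `hourly_shares` and `pearson` are illustrative only.

```python
import math
from collections import Counter
from datetime import datetime


def hourly_shares(timestamps):
    """Share of observations falling in each hour of the day (24 values)."""
    counts = Counter(ts.hour for ts in timestamps)
    total = sum(counts.values())  # assumes at least one timestamp
    return [counts.get(hour, 0) / total for hour in range(24)]


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Made-up timestamps standing in for social media posts and survey arrivals.
posts = [datetime(2016, 1, 1, 8), datetime(2016, 1, 1, 8), datetime(2016, 1, 1, 18)]
survey = [datetime(2016, 1, 1, 8), datetime(2016, 1, 1, 18), datetime(2016, 1, 1, 18)]
r = pearson(hourly_shares(posts), hourly_shares(survey))
```

Working with shares rather than raw counts keeps the two sources comparable even though their sample sizes differ.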


1st Case Study: SM vs. Surveys

Datasets
• 3 Social Media sources: Facebook, Twitter, Foursquare
• Recent travel diary survey

Study Area
• Thessaloniki, Greece
• 2nd largest city in Greece
• 1m inhabitants (metropolitan area)
• Moderate Social Media use

[Map label: City Centre]

Chaniotakis, Antoniou, Salanova, Dimitriou, "Can Social Media data augment travel demand survey data?" Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on. IEEE, 2016.


Social Media Data

Data from 3 SM sources:
• Twitter
• Facebook
• Foursquare

In all 3 cases:
• Geographically bounded data collection
• Use of public APIs
• Automatic data collection programs
• Periodic collection of data

Chaniotakis, Antoniou, Salanova, Dimitriou, "Can Social Media data augment travel demand survey data?" Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on. IEEE, 2016.


Twitter Data

Data collection of both real-time and historical (per user) data:
• 7,856 Twitter users using real-time data collection (~1 year)
• User's timeline data collection (tweets not restricted to Thessaloniki)
• 49,169 geotagged tweets (only in Thessaloniki)

Type of information:
• Unique user ID
• Unique tweet ID
• Tweet text
• Time of tweet
• (lon, lat)

Chaniotakis, Antoniou, Salanova, Dimitriou, "Can Social Media data augment travel demand survey data?" Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on. IEEE, 2016.
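As an illustration of the record layout above and of restricting timeline tweets to the study area, the following is a minimal sketch. The class name `Tweet`, the field names, and the bounding box are assumptions made for this example; the box is a rough illustrative rectangle around Thessaloniki, not the boundary used in the study.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Tweet:
    """One geotagged tweet with the five fields listed on the slide."""
    user_id: str
    tweet_id: str
    text: str
    created_at: datetime
    lon: float
    lat: float


# Illustrative bounding box (lon_min, lat_min, lon_max, lat_max); an assumption.
STUDY_AREA = (22.85, 40.55, 23.05, 40.70)


def in_study_area(tweet, bbox=STUDY_AREA):
    """True if the tweet's coordinates fall inside the bounding box."""
    lon_min, lat_min, lon_max, lat_max = bbox
    return lon_min <= tweet.lon <= lon_max and lat_min <= tweet.lat <= lat_max


timeline = [
    Tweet("u1", "t1", "coffee by the sea", datetime(2016, 3, 1, 9, 0), 22.94, 40.64),
    Tweet("u1", "t2", "weekend trip", datetime(2016, 3, 5, 14, 0), -0.12, 51.50),
]
local = [t for t in timeline if in_study_area(t)]
```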


Facebook & Foursquare Data

Data collection of venue check-ins (public data):
• 30 min query interval
• Area-based search → venues & characteristics
• Centre (lat, lon) & radius (150 m)
• Only in the city centre of Thessaloniki (due to API request number limitations)

7,511 venues from Facebook
9,135 venues from Foursquare

Type of information:
• Venue ID
• Venue name
• Venue category
• Total number of visitors

Chaniotakis, Antoniou, Salanova, Dimitriou, "Can Social Media data augment travel demand survey data?" Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on. IEEE, 2016.
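An area-based search with a centre and a 150 m radius implies covering the study area with query circles. The sketch below shows one way to generate such centres, assuming a square grid and a flat-earth approximation that is reasonable at city scale. The slides do not say how the centres were actually laid out; `query_centres` and the example box are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def query_centres(lat_min, lat_max, lon_min, lon_max, radius_m=150.0):
    """Return (lat, lon) centres whose circles of radius_m cover the box."""
    # On a square grid, a spacing of r * sqrt(2) makes each circle cover
    # its own grid cell (the cell's half-diagonal equals r).
    step_m = radius_m * math.sqrt(2)
    dlat = math.degrees(step_m / EARTH_RADIUS_M)
    # Use the latitude closest to the equator so the longitude step is
    # conservative (assumes the box does not straddle the equator).
    ref_lat = math.radians(min(abs(lat_min), abs(lat_max)))
    dlon = math.degrees(step_m / (EARTH_RADIUS_M * math.cos(ref_lat)))
    n_lat = math.ceil((lat_max - lat_min) / dlat)
    n_lon = math.ceil((lon_max - lon_min) / dlon)
    return [
        (lat_min + (i + 0.5) * dlat, lon_min + (j + 0.5) * dlon)
        for i in range(n_lat)
        for j in range(n_lon)
    ]


# Illustrative box only; not the study's actual city-centre boundary.
centres = query_centres(40.62, 40.65, 22.93, 22.96)
```

The number of centres grows with the square of the area's side length, which is consistent with the slide's note that request limits confined the collection to the city centre.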


Activities

Chaniotakis, Antoniou, Salanova, Dimitriou, "Can Social Media data augment travel demand survey data?" Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on. IEEE, 2016.


Temporal Analysis (1)

Chaniotakis, Antoniou, Salanova, Dimitriou, "Can Social Media data augment travel demand survey data?" Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on. IEEE, 2016.


Temporal Analysis (2)

Chaniotakis, Antoniou, Salanova, Dimitriou, "Can Social Media data augment travel demand survey data?" Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on. IEEE, 2016.


2nd Case Study: Inferring Activities from Social Media

• Text and geolocation are offered by one SM platform
• Location characterization is offered by another
• The connection between them may be available (URL)
• A user-centric approach is possible

Example tweet: "Possibly the quietest Wagamama I've ever been in - and in the middle of London!! https://t.co/pY3UVIV4iw"

[Flowchart. Box labels: Twitter Dataset; {subset}; Recurrence Classification (per user); Data Enrichment; Twitter & 4SQ Dataset; Location Type Text Classification; {trained classification}]

Chaniotakis, E., Antoniou, C., Aifadopoulou, G., & Dimitriou, L., 2017. "Inferring Activities from Social Media Data." In Transportation Research Record (accepted).


Inferring Activities from Social Media (2)

[Flowchart of the methodology. Box labels, in extraction order:]
• Location Characterization
• Enrichment
• Twitter Dataset
• {trained classification}
• Definition of DBSCAN parameters
• Per user DBSCAN application
• Characterization of tweets' clusters
• Twitter Data Enrichment (Figure 1)
• Tweets dataset classified on recurrence
• Combined Twitter & Foursquare Dataset
• Enriched, Classified Dataset
• Document Term Matrix Creation
• Text cleaning
• Machine Learning Algorithms application
• Start
• Identified Activities
• Foursquare Dataset
• {subset}
• End

Chaniotakis, E., Antoniou, C., Aifadopoulou, G., & Dimitriou, L., 2017. "Inferring Activities from Social Media Data." In Transportation Research Record (accepted).


Datasets (1)

• 2 years' data collected from London (Twitter API)
• 482,883 unique users
• Collected timeline (for a random sample of 90,000 users)
• 11,060,814 tweets in total

Chaniotakis, E., Antoniou, C., Aifadopoulou, G., & Dimitriou, L., 2017. "Inferring Activities from Social Media Data." In Transportation Research Record (accepted).


Datasets (2)

Subset formation
• 101.8 tweets on average, with a standard deviation of 311.4
• 0.8% standard deviation of percentage posted per day (close to uniform)

In London Twitter Dataset
• 482,883 users
• 8,141,996 geotagged tweets
• 3,764,230 URLs
• 220,118 Foursquare URLs

{subset}: nTweet > 22 & mean dist > 1

Examined Dataset
• 50,344 users
• 5,080,362 geotagged tweets
• 2,127,071 URLs
• 145,192 Foursquare URLs

Chaniotakis, E., Antoniou, C., Aifadopoulou, G., & Dimitriou, L., 2017. "Inferring Activities from Social Media Data." In Transportation Research Record (accepted).
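The subset rule (nTweet > 22 and mean dist > 1) can be sketched as a per-user filter. The slide gives neither the definition nor the unit of "mean dist"; this example assumes it is the mean great-circle distance in kilometres between a user's consecutive geotagged tweets. The names `haversine_km` and `select_users` are illustrative.

```python
import math
from collections import defaultdict


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def select_users(tweets, min_tweets=22, min_mean_dist=1.0):
    """Keep users with more than min_tweets tweets and mean dist > threshold.

    tweets: iterable of (user_id, lat, lon), in posting order per user.
    "Mean dist" is ASSUMED to be the mean distance between consecutive tweets.
    """
    by_user = defaultdict(list)
    for user_id, lat, lon in tweets:
        by_user[user_id].append((lat, lon))
    kept = set()
    for user_id, points in by_user.items():
        if len(points) <= min_tweets:
            continue
        dists = [haversine_km(a, b) for a, b in zip(points, points[1:])]
        if sum(dists) / len(dists) > min_mean_dist:
            kept.add(user_id)
    return kept


# Made-up data: "a" moves about 11 km between posts, "b" never moves,
# "c" has too few tweets.
sample = (
    [("a", 51.5 + 0.1 * (i % 2), -0.12) for i in range(23)]
    + [("b", 51.5, -0.12) for _ in range(23)]
    + [("c", 51.5 + i, -0.12) for i in range(5)]
)
selected = select_users(sample)
```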


Recurrence Classification (1)

• 8.3% of the subset have posted at least one 4SQ link
• 39% of Twitter-4SQ users' posts include a 4SQ link

Density-Based Spatial Clustering of Applications with Noise (DBSCAN):
• Point parameter (Eps) = 0.002 (GPS accuracy)
• Clustering > 5 posts per location

Chaniotakis, E., Antoniou, C., Aifadopoulou, G., & Dimitriou, L., 2017. "Inferring Activities from Social Media Data." In Transportation Research Record (accepted).
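The per-user clustering step can be illustrated with a compact DBSCAN written in plain Python. This is a sketch under assumptions, not the study's implementation: Eps = 0.002 is read as a distance in coordinate degrees, and "> 5 posts per location" is read as a minimum of 6 points (counting the point itself) for a dense neighbourhood. The function names are illustrative.

```python
from collections import defaultdict

NOISE = -1


def dbscan(points, eps=0.002, min_pts=6):
    """Label each (lon, lat) point with a cluster id, or NOISE."""
    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels


def cluster_per_user(tweets, eps=0.002, min_pts=6):
    """Run DBSCAN separately on each user's (lon, lat) points."""
    by_user = defaultdict(list)
    for user_id, lon, lat in tweets:
        by_user[user_id].append((lon, lat))
    return {user: dbscan(pts, eps, min_pts) for user, pts in by_user.items()}


# Made-up data: six posts at nearly the same spot, one isolated post.
posts = [("u", -0.1276 + 0.0001 * i, 51.5074) for i in range(6)]
posts.append(("u", 0.0, 51.6))
labels = cluster_per_user(posts)
```

Posts labelled with a cluster id would count as recurrent locations for that user, while NOISE marks one-off locations. The neighbour search here is quadratic, which is acceptable for a single user's posts but would need a spatial index at dataset scale.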


Recurrence Classification (2)

• Average number of clusters = 2.41, with a standard deviation of 3.18
• A significant number of users tend to post only in locations
  • Activities posted are performed scarcely
• From 65,806 tweets with activities → 172,675 known tweets

Chaniotakis, E., Antoniou, C., Aifadopoulou, G., & Dimitriou, L., 2017. "Inferring Activities from Social Media Data." In Transportation Research Record (accepted).


Data Enrichment

Combine 4SQ & Twitter Data

[Flowchart. Box labels: Start; Twitter API; Twitter Posts; Tweet Post Text; URL Extraction (regex); URL Extension; 4SQ link? (Yes/No); 4SQ query; 4SQ API; JSON parser; Combined Information; Tweets with Venue Information; n<N (Yes/No); End]

Chaniotakis, E., Antoniou, C., Aifadopoulou, G., & Dimitriou, L., 2017. "Inferring Activities from Social Media Data." In Transportation Research Record (accepted).
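The URL extraction and "4SQ link?" steps of the enrichment flow can be sketched as follows. The regular expression, the set of Foursquare host names, and the `expand` callback (standing in for the URL extension step, which needs network access to follow short-link redirects) are all assumptions made for this example. The later 4SQ query and JSON parsing steps are not shown.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+")

# Host names ASSUMED to indicate a Foursquare/Swarm check-in link.
FOURSQUARE_HOSTS = {
    "foursquare.com", "www.foursquare.com",
    "4sq.com",
    "swarmapp.com", "www.swarmapp.com",
}


def extract_urls(text):
    """Find URLs in tweet text and strip trailing punctuation."""
    return [url.rstrip(".,!?)") for url in URL_RE.findall(text)]


def foursquare_links(text, expand=lambda url: url):
    """Return expanded URLs in the text that point to a Foursquare host.

    expand: hypothetical callable resolving a short URL to its target;
    the default leaves the URL unchanged.
    """
    links = []
    for short_url in extract_urls(text):
        full_url = expand(short_url)
        if urlparse(full_url).hostname in FOURSQUARE_HOSTS:
            links.append(full_url)
    return links


tweet = ("Possibly the quietest Wagamama I've ever been in - "
         "and in the middle of London!! https://t.co/pY3UVIV4iw")

# Fake resolver for the example; the real target of this short link is unknown.
fake_targets = {"https://t.co/pY3UVIV4iw": "https://www.swarmapp.com/c/example"}
links = foursquare_links(tweet, expand=lambda u: fake_targets.get(u, u))
```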


4SQ Activities Distribution

• Manually aggregated into 14 categories
• Tendency towards leisure activities
• Education and work are more strongly represented on weekdays

Chaniotakis, E., Antoniou, C., Aifadopoulou, G., & Dimitriou, L., 2017. "Inferring Activities from Social Media Data." In Transportation Research Record (accepted).


Location Type Classification

• Linked tweet text with activities
• Removed stop words and punctuation

Automatic text classification:
• Support Vector Machine (SVM)
• Generalized Linear Model via Penalized Maximum Likelihood (GLMPML)
• Maximum Entropy (MAXENT)

• RTextTools library
• Repeated 10 times
• 20,000 randomly selected cases
• 85% (17,000 entries) training set
• 15% (3,000 remaining entries) testing set

Chaniotakis, E., Antoniou, C., Aifadopoulou, G., & Dimitriou, L., 2017. "Inferring Activities from Social Media Data." In Transportation Research Record (accepted).
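The preparation steps before classification (text cleaning, document-term matrix, 85/15 split) can be sketched in plain Python. The study used the RTextTools library in R; this example is not a port of it. The stop-word list is a tiny illustrative one, dropping URLs during cleaning is an assumption, and all function names are hypothetical. The classifiers themselves are not shown.

```python
import random
import re
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a full list.
STOP_WORDS = {"a", "an", "and", "at", "i", "in", "is", "it", "of", "the", "to"}


def clean(text):
    """Lowercase, drop URLs (an assumption), punctuation and stop words."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (vocabulary, rows); rows[d][v] counts term v in document d."""
    token_lists = [clean(doc) for doc in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def train_test_split(n_cases, train_share=0.85, seed=0):
    """Shuffle case indices and split them into training and testing sets."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_cases * train_share))
    return indices[:cut], indices[cut:]


vocab, rows = document_term_matrix(
    ["Quiet lunch at the pub", "Pub quiz!! https://t.co/x"]
)
train_idx, test_idx = train_test_split(20_000)
```

Repeating the split with ten different seeds would mirror the "repeated 10 times" setup on the slide.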


Classification Results

Acceptable results for MAXENT and GLM classification
• Limited text in Twitter
• Little manual intervention
• Limited cases used

Classifier   Avg. Precision   Precision SD   Avg. Recall   Recall SD   Avg. F-Score   F-Score SD
SVM          0.6374           0.0104         0.5808        0.0132      0.6006         0.0117
MAXENT       0.8060           0.0112         0.7773        0.0081      0.7881         0.0080
GLM          0.8392           0.0078         0.6752        0.0068      0.7249         0.0064

Chaniotakis, E., Antoniou, C., Aifadopoulou, G., & Dimitriou, L., 2017. "Inferring Activities from Social Media Data." In Transportation Research Record (accepted).
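For reference, precision, recall and F-score per class can be derived from a confusion matrix as sketched below. The slides do not state how the reported averages were formed (per class, per run, or both), so the macro-average here is an assumption, and the small matrix is made-up data rather than a result from the study.

```python
def per_class_scores(confusion):
    """Precision, recall and F-score for each class.

    confusion[i][j] = number of cases of true class i predicted as class j.
    """
    n = len(confusion)
    scores = []
    for k in range(n):
        true_pos = confusion[k][k]
        predicted = sum(confusion[i][k] for i in range(n))  # column sum
        actual = sum(confusion[k])                          # row sum
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f_score))
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall and F-score over classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Made-up two-class example (e.g. "bar-pub" vs "restaurant").
example = [[8, 2],
           [1, 9]]
scores = per_class_scores(example)
avg_precision, avg_recall, avg_f = macro_average(scores)
```

Because the F-score is averaged after being computed per class, the average F-score generally differs from the harmonic mean of the average precision and average recall.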

[Figure: Max Entropy]


Classification Results

• Confusion matrix for Max Entropy
• Most confused: Bar–Pub and Restaurant pair

[Figure: Max Entropy]

Chaniotakis, E., Antoniou, C., Aifadopoulou, G., & Dimitriou, L., 2017. "Inferring Activities from Social Media Data." In Transportation Research Record (accepted).


Conclusions

• Increased data availability from fused, diverse datasets
• Data enrichment based on both user-centric spatial and link information
• Unsupervised text classification for activity generalization
• Decoded social media data into useful, inexpensive information concerning activities
• Large span of time in Social Media data → long-term activity chains
• Can be used to supplement travel surveys
• Transferability across places and platforms


Future Work (short-term)

• Non-activity-related tweets
• Cross-validation of activities and user-generated content information
• Use of other classification methods
• Expand the framework to other activity-related aspects of tweets
• Identify SM-related biases


Future Work (mid-term)

Infer mobility patterns and information from various data sources:
• Traditional surveys
• Social media data
• Vehicle data (passenger, taxis, commercial, public transport, …)
• Telco data (cell-phone, wifi, etc.)
• Ride-sharing, car-sharing data
• …


Thank you for your attention!

Prof. Constantinos Antoniou

Chair of Transportation Systems Engineering

Department of Civil, Geo and Environmental Engineering

Technical University of Munich

[email protected]