Twitter Intelligent Sensor Agent

Post on 07-Jul-2015

DESCRIPTION

An overview of University of Athens' work on INSIGHT's Twitter Intelligent Sensor Agent.

TRANSCRIPT

INSIGHT: Intelligent Synthesis and Real Time Response using Massive Streaming of Heterogeneous Data

TwitterISA

Ioannis Katakis Univ. of Athens

Contents

The Twitter ISA

Classifying Traffic Related Tweets in Dublin, and the Twitter ISA

Complementarity of Event Detection Methods

Identifying Noisy Hashtags

Evaluating the Sample Quality

Purpose of the Twitter ISA

Advantages of Social Sensors

Richer information about the event (description in natural language)

Multi-modal content (text, image, sound, video)

Mobile

Low cost. People will volunteer.

Can be asked any question (by crowdsourcing)

Analyze the Social Stream in Real-Time and Identify Events

Twitter ISA in the INSIGHT Architecture

Twitter-ISA Architecture

[Architecture diagram. Labels: Twitter Streaming API; Real-time JSON Stream; Historical Data; Twitter Process; Twitter Agent; Twitter Model (Traffic + Floods Classifier); Round Table Manager; round tables RT1, RT2, …, RTN; Join Table; Leave Table; Query; data; discussion; anomaly]

Classifying Traffic Related Tweets in Dublin

Current Situation & Problem

Twitter Services that inform people about traffic issues (@LiveDrive, @RoadWatch, @GardaTraffic)

Citizen Tweets about traffic

> Can we automatically identify citizen traffic related tweets?

Solution

Training the Text Classifier

We could use service tweets (they talk about traffic), but they are not formatted like citizen tweets

Assumption: Tweets that @mention one of the services are talking about traffic

Build a classifier on those tweets

Extend the Twitter Dublin-Stream by following users from Dublin

Precision: 70%

> A classifier that identifies traffic related tweets
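The training idea above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the slides do not name the learning algorithm, so a small multinomial Naive Bayes model is assumed here, and the function names (`weak_label`, `features`, `train`, `is_traffic`) are hypothetical. Only the three service handles come from the slides.

```python
import math
import re
from collections import Counter

# Service accounts named on the slides, lower-cased for matching.
SERVICES = {"@livedrive", "@roadwatch", "@gardatraffic"}

def tokenize(text):
    """Lower-case a tweet and split it into words, @mentions and #hashtags."""
    return re.findall(r"[@#]?\w+", text.lower())

def weak_label(text):
    """Labelling assumption: a tweet that @mentions a service is about traffic."""
    return any(tok in SERVICES for tok in tokenize(text))

def features(text):
    """Drop the service mentions so the model cannot simply memorise them."""
    return [tok for tok in tokenize(text) if tok not in SERVICES]

def train(tweets):
    """Fit a multinomial Naive Bayes model (an assumed learner) on weak labels."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text in tweets:
        label = weak_label(text)
        class_counts[label] += 1
        word_counts[label].update(features(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, class_counts, vocab

def is_traffic(model, text):
    """Compare Laplace-smoothed log-probabilities of the two classes."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (True, False):
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log((class_counts[label] + 1) / (total + 2))
        for tok in features(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return scores[True] > scores[False]
```

Once trained, `is_traffic(model, text)` can be applied to citizen tweets that mention no service at all, which is the case the classifier is meant to cover.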

Dimitrios Kotzias, Theodoros Lappas, Dimitrios Gunopulos: Addressing the Sparsity of Location Information on Twitter. EDBT/ICDT Workshops 2014: 339-346

Complementarity of Event Detection Methods

Problem

Study a set of event detection techniques using different sources of information of the same stream

Active Users

Sentiment Analysis

Social Graph

London Dataset (10 Days, 700K Tweets)

Activating Users

> Correlation between active users and events

> Events motivate users to say something on Twitter

Unique Users Participating in each time segment

Severe thunderstorms in Germany (9/6 & 11/6)
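A minimal sketch of how the "unique users per time segment" series could be computed. The input layout (pairs of Unix timestamp and user id), the one-hour segment length, and the mean-plus-k-standard-deviations burst rule are all assumptions made for illustration; the slides only state that the number of active users correlates with events.

```python
from collections import defaultdict
from statistics import mean, pstdev

def unique_users_per_segment(tweets, segment_seconds=3600):
    """Map each segment index to the number of distinct users tweeting in it.

    `tweets` is an iterable of (unix_timestamp, user_id) pairs (assumed layout).
    """
    users = defaultdict(set)
    for timestamp, user_id in tweets:
        users[int(timestamp // segment_seconds)].add(user_id)
    return {segment: len(ids) for segment, ids in sorted(users.items())}

def flag_bursts(series, k=2.0):
    """Return segments whose count exceeds mean + k * standard deviation.

    This threshold rule is a hypothetical stand-in for an event detector.
    """
    counts = list(series.values())
    if len(counts) < 2:
        return []
    threshold = mean(counts) + k * pstdev(counts)
    return [segment for segment, count in series.items() if count > threshold]
```

A user who tweets several times in the same segment is counted once, so the series measures participation rather than raw tweet volume.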

Sentiment Analysis

> Mostly sad events…

> Independent of the number of users participating

[Chart labels: Positive, Negative]

Emotion Change Detection for Event Identification

Online detection of changes in the emotional data distribution

Anger, fear, disgust, happiness, sadness, surprise.

Valkanas G., Gunopulos D., "How the Live Web Feels About Events", CIKM 2013

User to User Interactions

1. Extract the social graph (of each time segment) based on the reply tweets (reply = connection)

2. Display the largest connected component of this graph as a time series

Largest Connected Component vs. Time
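The two steps above can be sketched as follows. The input layout (timestamp, replying user, replied-to user) and the one-hour segment length are assumptions, and the function names are hypothetical; replies are treated as undirected connections, and component sizes are found with a union-find structure.

```python
from collections import defaultdict

def largest_component_size(edges):
    """Size of the largest connected component of an undirected graph."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Step 1: each reply connects two users, so merge their components.
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = defaultdict(int)
    for node in list(parent):
        sizes[find(node)] += 1
    return max(sizes.values(), default=0)

def lcc_time_series(replies, segment_seconds=3600):
    """Step 2: largest-component size for each time segment.

    `replies` is an iterable of (unix_timestamp, replying_user, replied_to_user).
    """
    by_segment = defaultdict(list)
    for timestamp, src, dst in replies:
        by_segment[int(timestamp // segment_seconds)].append((src, dst))
    return {seg: largest_component_size(edges)
            for seg, edges in sorted(by_segment.items())}
```

Segments without any reply tweets are absent from the returned series; a caller plotting it against time may want to fill those gaps with zero.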

> Detection Methods are Complementary

Identifying noisy hashtags

D. Kotsakos, P. Sakkos, I. Katakis, D. Gunopulos, “#tag: Meme or Event?”, The 2014 IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM 2014), Beijing, China, August 17-20, 2014.

#Hashtags

Add valuable meta-knowledge to text that is by nature limited in length

#Events

Track events using hashtags #worldcup2014

#Memes

Promote certain ideas or discussions

Celebrity fans – target trends list of the platform

Advertising

Hashtags, Events and Memes

Memes vs Events

Events can be traced back to both the news stream and the social stream, whereas memes appear only in the social stream

Memes are not inherently detrimental. However, due to their volume they can be noise for some tasks (e.g. event detection)

Many event detection applications are affected by these noisy meme-#hashtags

We developed a method that distinguishes between Event-Hashtags and Meme-Hashtags by using machine learning classifiers

Features for the Classifier

Text Features

TokensPerTweet

hashTagsPerTweet

urlsPerTweet

mediasPerTweet

favoritesPerTweet

retweetsPerTweet

Social Features

replyTweets

mentionsPerTweet

tweetsPerUser

uniqueUsersCount

userFollowersPerUser

userFriendsPerUser

listedCountPerUser

avgVerifiedUsers
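A minimal sketch of how a few of these per-hashtag features could be computed from the tweets that carry one hashtag. The slides list only the feature names, so the definitions below (simple averages and counts over whitespace-separated tokens) and the tweet layout (dictionaries with hypothetical keys `text`, `user_id`, `is_reply`) are assumptions.

```python
def hashtag_features(tweets):
    """Compute a subset of the listed features for one hashtag's tweets.

    Each tweet is a dict with assumed keys: "text", "user_id", "is_reply".
    """
    n = len(tweets)
    if n == 0:
        raise ValueError("need at least one tweet")
    tokens = [t["text"].split() for t in tweets]
    users = {t["user_id"] for t in tweets}

    def per_tweet(predicate):
        """Average number of tokens per tweet that satisfy the predicate."""
        return sum(sum(1 for tok in toks if predicate(tok)) for toks in tokens) / n

    return {
        # Text features
        "TokensPerTweet": sum(len(toks) for toks in tokens) / n,
        "hashTagsPerTweet": per_tweet(lambda tok: tok.startswith("#")),
        "urlsPerTweet": per_tweet(lambda tok: tok.startswith(("http://", "https://"))),
        # Social features
        "mentionsPerTweet": per_tweet(lambda tok: tok.startswith("@")),
        "replyTweets": sum(1 for t in tweets if t["is_reply"]),
        "tweetsPerUser": n / len(users),
        "uniqueUsersCount": len(users),
    }
```

The remaining features (for example userFollowersPerUser or avgVerifiedUsers) would need user profile fields that this toy layout does not carry.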

Results

We can accurately distinguish between event hashtags and meme hashtags

Most informative features (Information Gain)
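For reference, information gain measures how much knowing a feature reduces the entropy of the class label (event vs. meme). The sketch below computes it for a numeric feature split at a single threshold; the thresholding step and the function names are assumptions for illustration, since the slides do not say how features were discretized.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy of the labels minus their entropy after splitting at `threshold`."""
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part)
                      for part in (low, high) if part)
    return entropy(labels) - conditional
```

A feature that separates the two classes perfectly at the chosen threshold yields a gain equal to the label entropy, while a feature whose split leaves both sides equally mixed yields zero.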

Is the Sample Good Enough?

G. Valkanas, I. Katakis, D. Gunopulos, A. Stefanidis, “Mining Twitter Data with Resource Constraints”, The 2014 IEEE / WIC / ACM International Conference on Web Intelligence, 11-14 August 2014, Warsaw, Poland.

Main Research Question

We compare against the 10% sample (Garden Hose)

Is the 1% sample provided by the Twitter API sufficient for spatio-temporal analysis tasks? … which tasks?

Problem: Even though we use methods to extend the Twitter stream (e.g. following specific users), the 1% constraint remains an issue for many tasks.

Tasks we look into

Sentiment Analysis

Geo-located information

Popular tweets

Social Graph Evolution

Linguistic Analysis

Results & Conclusions

The two streams are similar when it comes to geo-located information, sentiment analysis, and the social graph

… but they differ when it comes to looking into details (e.g. if you try to find the most re-tweeted tweets)

(An Experiment…)

1. Identify the most retweeted tweets by analyzing both samples.

2. Compare these lists against the ground truth (since this information is included in the tweet)

How similar the top-N most retweeted tweets extracted from the samples are to the ground truth. The 10% sample approximates the ground truth better.
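The comparison in this experiment can be sketched as below. The slides do not state which similarity measure was plotted, so the fraction of the true top-N that also appears in a sample's top-N is used here as an assumed stand-in; the input layout (a mapping from tweet id to retweet count) and the function names are hypothetical.

```python
def top_n_ids(retweet_counts, n):
    """Ids of the n most retweeted tweets; ties are broken by id for determinism."""
    ranked = sorted(retweet_counts.items(), key=lambda item: (-item[1], item[0]))
    return [tweet_id for tweet_id, _ in ranked[:n]]

def top_n_overlap(sample_counts, truth_counts, n):
    """Fraction of the true top-n tweets that also appear in the sample's top-n.

    `sample_counts` holds retweets observed in a sample; `truth_counts` holds
    the retweet counts reported inside the tweets themselves.
    """
    truth = set(top_n_ids(truth_counts, n))
    if not truth:
        return 0.0
    found = set(top_n_ids(sample_counts, n))
    return len(found & truth) / len(truth)
```

Evaluating `top_n_overlap` for increasing N on each sample gives one curve per sample, which is the kind of comparison the caption above describes.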

Thank You! Questions?
