sentiment knowledge discovery in twitter streaming … · introduction methods results discussion...
TRANSCRIPT
![Page 1: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/1.jpg)
Introduction Methods Results Discussion
Sentiment Knowledge Discovery in TwitterStreaming Data
Albert Bifet and Eibe Frank (2010)
Angermuller Christof
LMU, Lehr- und Forschungseinheit fur Datenbanksysteme
November 14, 2012
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 1/36
![Page 2: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/2.jpg)
Introduction Methods Results Discussion
1 Introduction
2 Methods
3 Results
4 Discussion
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 2/36
![Page 3: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/3.jpg)
Introduction Methods Results Discussion
Social networking - microblogging service
Tweets
up to 140 characters long text message@runtastic: user name#iPhone: subject or categoryRT I like my iPhone: retweet - repetition of tweet
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 3/36
![Page 4: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/4.jpg)
Introduction Methods Results Discussion
Twitter: Tweets per day
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 4/36
![Page 5: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/5.jpg)
Introduction Methods Results Discussion
Twitter: Statistics
Statistics
More than 140 mil users
More than 400 mil tweets per day
Exponential growth
Makes Twitter attractive for
Companies
Politicians
Data mining
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 5/36
![Page 6: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/6.jpg)
Introduction Methods Results Discussion
Sentiment analysis
Definition (Sentiment analysis)
Given a tweet
Predict its sentiment
positive(neutral)negative
Twitter sentiment analysis
sentiment140.com
tweetfeel.com
tweettone.com
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 6/36
![Page 7: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/7.jpg)
Introduction Methods Results Discussion
Challenges learning from Twitter streams
Twitter API provides access to all tweets
Incremental learning instead of Batch learning required
1 Time efficiency: samples arrive with high rate
2 Memory efficiency: infinite training set
3 Concept drift: sentiments change dynamically
4 Evaluation: concept drift and unbalanced classes
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 7/36
![Page 8: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/8.jpg)
Introduction Methods Results Discussion
1 Introduction
2 Methods
3 Results
4 Discussion
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 8/36
![Page 9: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/9.jpg)
Introduction Methods Results Discussion
Classification
Problem description
Given: training set{
(x (1), y (1)), ..., (x (m), y (m))}
x (i): data (unigram, bigram)y (i): class (y (i) = +1: positive, y (i) = −1: negative)
Wanted: hypothesis hθ(x)→ y
minimizes some cost function J(θ)
Models for sentiment classification
Naıve Bayes
SVM
Hoeffding tree
Logistic regression
...
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 9/36
![Page 10: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/10.jpg)
Introduction Methods Results Discussion
Classification
Problem description
Given: training set{
(x (1), y (1)), ..., (x (m), y (m))}
x (i): data (unigram, bigram)y (i): class (y (i) = +1: positive, y (i) = −1: negative)
Wanted: hypothesis hθ(x)→ y
minimizes some cost function J(θ)
Models for sentiment classification
Naıve Bayes
SVM
Hoeffding tree
Logistic regression
...
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 9/36
![Page 11: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/11.jpg)
Introduction Methods Results Discussion
Classification
Problem description
Given: training set{
(x (1), y (1)), ..., (x (m), y (m))}
x (i): data (unigram, bigram)y (i): class (y (i) = +1: positive, y (i) = −1: negative)
Wanted: hypothesis hθ(x)→ y
minimizes some cost function J(θ)
Models for sentiment classification
Naıve Bayes
SVM
Hoeffding tree
Logistic regression
...
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 9/36
![Page 12: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/12.jpg)
Introduction Methods Results Discussion
Naıve Bayes
Prediction
P(y |x) =P(x |y)P(y)
P(x)∝
n∏j=1
P(wj |y)xjP(y)
hθ(x) = argmaxy ′P(y ′|x)
Training
P(wj |y) Prob. of word wj in tweet with sentiment y
P(y) Prior prob. of sentiment y
Can be easily approximated by the relative frequency
Count word wj sentiment y
⇒ Simple incremental implementation
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 10/36
![Page 13: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/13.jpg)
Introduction Methods Results Discussion
Batch Gradient Descent
Idea
Minimize cost function J(θ) =∑m
i=1 l(x(i), y (i), θ)
Go into direction of steepest descent ∇J(θ)
Algorithm
Initialize θ0 randomly
Loop until convergence:
θt+1 = θt − α∇J(θt)
= θt − αm∑i=1
∇l(x (i), y (i), θt)
J
dJ
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 11/36
![Page 14: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/14.jpg)
Introduction Methods Results Discussion
Stochastic Gradient Descent
Idea
Compute gradient of a single random sample
More efficient than Batch Gradient Descent
Algorithm
Initialize θ0 randomly
Loop until convergence:
Shuffle training set randomlyfor i = 1 to m
θt+1 = θt − α∇l(x (i), y (i), θt)
⇒ Simple incremental update for new (x (i), y (i))
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 12/36
![Page 15: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/15.jpg)
Introduction Methods Results Discussion
Stochastic Gradient Descent
Idea
Compute gradient of a single random sample
More efficient than Batch Gradient Descent
Algorithm
Initialize θ0 randomly
Loop until convergence:
Shuffle training set randomlyfor i = 1 to m
θt+1 = θt − α∇l(x (i), y (i), θt)
⇒ Simple incremental update for new (x (i), y (i))
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 12/36
![Page 16: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/16.jpg)
Introduction Methods Results Discussion
SVM: Support Vector Machine
Prediction
Given weight vector θ
hθ(x) =
{+1 θT x ≥ 0
−1 θT x < 0
Training
Hinge loss function:
J(θ) =1
m
m∑i=1
max{
1− y (i)θT x (i), 0}
θ can be updated incrementally viaSGD
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 13/36
![Page 17: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/17.jpg)
Introduction Methods Results Discussion
Hoeffding TreePrediction
Each node tests an attribute
Each edge refers to an attribute value
hθ(x) =‘label y of leaf to which x is assigned to’
x = (6, 2)
x1 < 5
x2 < 6 x2 < 3
θ1 θ2 θ3 θ4
YesNo
Yes N
o Yes N
o
x1
x2
5
6
3 θ1
θ2
θ3
θ4
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 14/36
![Page 18: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/18.jpg)
Introduction Methods Results Discussion
Hoeffding TreePrediction
Each node tests an attribute
Each edge refers to an attribute value
hθ(x) =‘label y of leaf to which x is assigned to’
x = (6, 2) x1 < 5
x2 < 6 x2 < 3
θ1 θ2 θ3 θ4
YesNo
Yes N
o Yes N
o
x1
x2
5
6
3 θ1
θ2
θ3
θ4
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 14/36
![Page 19: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/19.jpg)
Introduction Methods Results Discussion
Hoeffding TreeIncremental tree induction
1 Assign (x , y) to leaf2 Use Hoeffding bound as split criterion
Top-down model tree induction
x = (No,No,Yes) A1
A2 θ3
θ1 θ2
YesNo
Yes N
o
θ3
No
A3
θ4 θ5
Yes Yes
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 15/36
![Page 20: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/20.jpg)
Introduction Methods Results Discussion
Hoeffding TreeIncremental tree induction
1 Assign (x , y) to leaf
2 Use Hoeffding bound as split criterion
Top-down model tree induction
x = (No,No,Yes) A1
A2 θ3
θ1 θ2
YesNo
Yes N
o
θ3
No
A3
θ4 θ5
Yes Yes
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 15/36
![Page 21: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/21.jpg)
Introduction Methods Results Discussion
Hoeffding TreeIncremental tree induction
1 Assign (x , y) to leaf2 Use Hoeffding bound as split criterion
Top-down model tree induction
x = (No,No,Yes) A1
A2 θ3
θ1 θ2
YesNo
Yes N
o
θ3
No
A3
θ4 θ5
Yes Yes
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 15/36
![Page 22: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/22.jpg)
Introduction Methods Results Discussion
Hoeffding TreeIncremental tree induction
1 Assign (x , y) to leaf2 Use Hoeffding bound as split criterion
∆G = G (Xa)− G (Xb) > ε
ε = ln
√R2ln(1/α)
2n
⇒ Hypothesis testing with significance level α
Top-down model tree induction
x = (No,No,Yes) A1
A2 θ3
θ1 θ2
YesNo
Yes N
o
θ3
No
A3
θ4 θ5
Yes Yes
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 15/36
![Page 23: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/23.jpg)
Introduction Methods Results Discussion
Labeling tweets
Question
How to label millions of tweets?
Answer
Emoticons :-)
Positive sentiment
x =‘I love my iPhone more than my girlfriend :-)‘
y = +1
Negative sentiment
x =‘My iPhone is broken :-(‘
y = −1
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 16/36
![Page 24: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/24.jpg)
Introduction Methods Results Discussion
Labeling tweets
Question
How to label millions of tweets?
Answer
Emoticons :-)
Positive sentiment
x =‘I love my iPhone more than my girlfriend :-)‘
y = +1
Negative sentiment
x =‘My iPhone is broken :-(‘
y = −1
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 16/36
![Page 25: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/25.jpg)
Introduction Methods Results Discussion
Data preparation (Go et al., sentiment140.com)
1 Retrieve tweets via Twitter API
2 Discard tweets without emoticons
3 Define y = +1/− 1 with positive/negative emoticon4 Feature reduction
Remove emoticons@john → USERwww.google.com → URLheeeello → hello
5 Define x with unigrams/bigrams
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 17/36
![Page 26: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/26.jpg)
Introduction Methods Results Discussion
Training and Test set
Holdout validation
Prequential / Interleaved test-then-train
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 18/36
![Page 27: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/27.jpg)
Introduction Methods Results Discussion
Training and Test set
Holdout validation
1 Divide data into training set (60%) and test set (40%)
2 Train model on training set
3 Measure performance on test set
⇒ Static test set
Prequential / Interleaved test-then-train
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 18/36
![Page 28: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/28.jpg)
Introduction Methods Results Discussion
Training and Test set
Holdout validation
1 Divide data into training set (60%) and test set (40%)
2 Train model on training set
3 Measure performance on test set
⇒ Static test set
Prequential / Interleaved test-then-train
New sample (x , y) arrives...
1 Test hθ(x)?= y and update performance metric
2 Update θ with (x , y)
⇒ Dynamic test set
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 18/36
![Page 29: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/29.jpg)
Introduction Methods Results Discussion
Accuracy
f (x)
hθ(x)
1 · · · L1 o1,1 · · · o1,L r1...
.... . .
......
L oL,1 · · · oL,L rLc1 · · · cL m
Definition (Accuracy)
p0 =1
m
L∑i=1
oi ,i
⇒ Fraction of correct predictions
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 19/36
![Page 30: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/30.jpg)
Introduction Methods Results Discussion
Accuracy
Unbalanced classes
Unbalanced (skewed) class distribution
Can lead to high accuracy
Cancer prediction
y = −1: person has no cancer (99%)
y = +1: person has cancer (1%)
hθ(x) = −1, i.e. predict always no cancer
⇒ p0 = 99%
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 20/36
![Page 31: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/31.jpg)
Introduction Methods Results Discussion
Accuracy
Unbalanced classes
Unbalanced (skewed) class distribution
Can lead to high accuracy
Cancer prediction
y = −1: person has no cancer (99%)
y = +1: person has cancer (1%)
hθ(x) = −1, i.e. predict always no cancer
⇒ p0 = 99%
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 20/36
![Page 32: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/32.jpg)
Introduction Methods Results Discussion
κ statistic (Cohen, 1960)
Definition (kappa statistic)
rim
= predicted frequency of class i
cim
= actual frequency of class i
pc =L∑
i=1
rim
cim
= probability of being correct by chance
κ =p0 − pc1− pc
κ: accuracy relative to change prediction
po = pc → κ = 0
po = 1→ κ = 1Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 21/36
![Page 33: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/33.jpg)
Introduction Methods Results Discussion
κ statistic
Cancer prediction
y = −1: person has no cancer (99%)
y = +1: person has cancer (1%)
hθ(x) = −1, i.e. predict always no cancer
p0 = 0.99
pc = 1.00 ∗ 0.99 + 0.00 ∗ 0.01 = 0.99
⇒ κ = p0−pc1−pc = 0.0
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 22/36
![Page 34: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/34.jpg)
Introduction Methods Results Discussion
κ statisticIncremental update
Variables
Only 2L + 1 variables
ri , ci for i = 1, ..., L: row, column sums
p0: accuracy
Can be updated incrementally for a new (x , y)
Only consider most recent samples
Sliding window: store last w samples
Fading factors: down weight older samples
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 23/36
![Page 35: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/35.jpg)
Introduction Methods Results Discussion
1 Introduction
2 Methods
3 Results
4 Discussion
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 24/36
![Page 36: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/36.jpg)
Introduction Methods Results Discussion
Training sets
sentiment140.com (Go et al.)
800.000 positive tweets800.000 negative tweetsBalanced classes - not representative
Edinburgh corpus
1.800.000 positive tweets325.000 negative tweetsUnbalanced classes - representative
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 25/36
![Page 37: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/37.jpg)
Introduction Methods Results Discussion
sentiment140.comAccuracy
Only positives
⇒ High accuracy for unbalances classes
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 26/36
![Page 38: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/38.jpg)
Introduction Methods Results Discussion
sentiment140.comκ statistic
Only positives
⇒ κ statistic detects unbalanced classes
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 27/36
![Page 39: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/39.jpg)
Introduction Methods Results Discussion
sentiment140.com
Accuracy κ Time
1 SGD 86.80% 62.60% 219.54 sec.2 Naıve Bayes 75.05% 50.10% 116.62, sec.3 Hoeffding Tree 73.11% 46.23% 5525.51 sec.
SGD scores best (Accuracy, κ)
Hoeffding tree scores worst (Accuracy, κ, Time)
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 28/36
![Page 40: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/40.jpg)
Introduction Methods Results Discussion
Edinburgh corpusAccuracy
⇒ Similar accuracy
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 29/36
![Page 41: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/41.jpg)
Introduction Methods Results Discussion
Edinburgh corpusκ statistic
⇒ κ discriminates better
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 30/36
![Page 42: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/42.jpg)
Introduction Methods Results Discussion
Edinburgh corpus
Accuracy κ Time
1 Naıve Bayes 86.11% 36.15% 173.28, sec.2 SGD 86.26% 31.88% 293.98 sec.3 Hoeffding Tree 84.76% 20.40% 6151.51 sec.
Naıve Bayes scores best (κ, Time)
Hoeffding tree scores worst (Accuracy, κ, Time)
κ lower than sentiment140.com due to unbalanced classes
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 31/36
![Page 43: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/43.jpg)
Introduction Methods Results Discussion
Changing feature weights
Tag Middle End Variance
apple 0.3 0.7 0.4microsoft -0.4 -0.1 0.3facebook -0.3 0.4 0.7mcdonalds 0.5 0.1 -0.4google 0.3 0.6 0.3disney 0.0 0.0 0.0bmw 0.0 -0.2 -0.2pepsi 0.1 -0.6 -0.7dell 0.2 0.0 -0.2gucci -0.4 0.6 1.0amazon -0.1 -0.4 -0.3
High weight: more frequent in positive tweets
Low weight: more frequent in negative tweets
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 32/36
![Page 44: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/44.jpg)
Introduction Methods Results Discussion
1 Introduction
2 Methods
3 Results
4 Discussion
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 33/36
![Page 45: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/45.jpg)
Introduction Methods Results Discussion
κ statistic
Pros
Suitable for unbalanced classes
Simple computation
Suitable for incremental learning
Cons
Independence assumption for computing pc often invalid
Conservative estimate
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 34/36
![Page 46: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/46.jpg)
Introduction Methods Results Discussion
Furture improvements
Additional features
EmoticonsNumber of friends/followers
Different languages
Neural sentiments (three classes)
Semantics
Foo beats BarBar beats Foo
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 35/36
![Page 47: Sentiment Knowledge Discovery in Twitter Streaming … · Introduction Methods Results Discussion Sentiment Knowledge Discovery in Twitter Streaming Data Albert Bifet and Eibe Frank](https://reader031.vdocuments.site/reader031/viewer/2022021514/5b1548ee7f8b9a45448b6587/html5/thumbnails/47.jpg)
Appendix
Sources
A. Bifet and E. Frank (2010)Sentiment knowledge discovery in twitter streaming data.
A. Go et al. (2009)Twitter Sentiment Classification using Distant Supervision.
J. Gama et al. (2009)Issues in evaluation of stream learning algorithms.
J. Cohen (1960)A coefficient of agreement for nominal scales.
Angermuller Christof Sentiment Knowledge Discovery in Twitter Streams 36/36