Forecasting with Twitter data
Presented by: Thusitha Chandrapala (20064923)
MARTA ARIAS, ARGIMIRO ARRATIA, and RAMON XURIGUERA
What information do Twitter messages contain?
•Twitter information
▫Sentiment analysis: Are people happy or unhappy about a certain topic?
▫Volume: Number of tweets about a given topic
•Does Twitter really help in predicting time series data?
▫Moving stream of info.
Motivation of the paper
•Use three different forecasting model families, vary parameters systematically, and analyze under which conditions Twitter information is actually useful
•Test for non-linearity and causality between Twitter data and the target
•Introduce the summary tree
Related work
•Stock market prediction
▫Bollen et al.: Twitter -> sentiment -> predict Dow Jones Industrial Average
▫Wolfram et al.: Twitter as an additional source of features, no sentiment analysis
•Movie box office income
▫Mishne et al.: correlation, blog posts
▫Asur et al.: predict sales
Workflow
1) Collecting data
2) Cleaning and preprocessing
3) Sentiment analysis
4) Prediction model
Preprocessing:
•Language detection
•Negation handling: treating “I like this…” and “I don’t like this…” as 2 different features
•Relevance filtering and topic classification: using LDA (Latent Dirichlet Allocation)
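A minimal sketch of one common way to do negation handling, so that "like" and "don't like" end up as different features. The NOT_ prefix convention, the cue list and the function name mark_negation are assumptions for illustration, not necessarily what the paper does.

```python
import re

# Hypothetical, deliberately short list of negation cues.
NEGATION_CUES = {"not", "no", "never", "don't", "doesn't", "didn't",
                 "isn't", "won't", "can't"}
PUNCTUATION = set(".,!?;:")

def mark_negation(text):
    """Return word features; words after a negation cue get a 'NOT_' prefix
    until the next punctuation mark. The cue itself is dropped."""
    text = text.lower().replace("\u2019", "'")   # normalise curly apostrophes
    tokens = re.findall(r"[a-z']+|[.,!?;:]", text)
    features, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False                      # punctuation ends the scope
        elif tok in NEGATION_CUES:
            negated = True                       # negate the following words
        elif negated:
            features.append("NOT_" + tok)
        else:
            features.append(tok)
    return features
```

With this sketch, "I like this" yields the feature `like`, while "I don't like this" yields `NOT_like`, so a classifier sees them as two separate features.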
Sentiment classification
•Whether the text contains negative or positive impressions on a given subject
•Approach 1:
▫Automatic tagging to extract training instances
 :) :D - Happy sentiment
 :( - Unhappy sentiment
▫Binary classification problem: Use naïve Bayes to train the classifier
▫Use different dictionaries as features
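The approach above can be sketched as follows: emoticons supply the training labels, and a small multinomial naïve Bayes classifier with add-one smoothing is trained on the remaining words. This is a toy illustration under stated assumptions; the function names, the smoothing choice and the whitespace tokenisation are hypothetical and not taken from the paper.

```python
import math
from collections import Counter

HAPPY = (":)", ":D")     # emoticons listed on the slide
UNHAPPY = (":(",)

def auto_label(tweet):
    """Return (label, tokens) using emoticons as labels, or None when the
    tweet has no emoticon or has both kinds."""
    has_pos = any(e in tweet for e in HAPPY)
    has_neg = any(e in tweet for e in UNHAPPY)
    if has_pos == has_neg:
        return None
    for e in HAPPY + UNHAPPY:
        tweet = tweet.replace(e, " ")            # strip the label from the text
    return ("pos" if has_pos else "neg"), tweet.lower().split()

def train_nb(labelled):
    """labelled: list of (label, tokens). Collect the counts NB needs."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    class_counts = Counter()
    for label, tokens in labelled:
        class_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, class_counts, vocab

def classify(model, tokens):
    """Pick the class with the highest smoothed log-probability."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in ("pos", "neg"):
        n_words = sum(word_counts[label].values())
        score = math.log((class_counts[label] + 1) / (total + 2))
        for tok in tokens:
            if tok in vocab:                     # words never seen are ignored
                score += math.log((word_counts[label][tok] + 1)
                                  / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Invented example tweets, for illustration only.
tweets = ["love this phone :)", "great movie :D", "hate the delay :(",
          "terrible service :(", "just landed"]
model = train_nb([item for item in map(auto_label, tweets) if item])
```

Calling `classify(model, ["great", "phone"])` on this toy model returns `"pos"`. The tweet without an emoticon is simply left out of the training set.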
Sentiment index
•A time-series of sentiment values
▫The daily value is calculated based on the daily % of +/- tweets over the total number of messages on a specific topic
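A minimal sketch of building such a daily series, assuming each tweet has already been assigned a day and a "pos"/"neg" label. Using the fraction of positive tweets as the daily value is one possible reading of the definition above, not a confirmed formula; the function name and input layout are hypothetical.

```python
from collections import defaultdict

def sentiment_index(labelled_tweets):
    """labelled_tweets: iterable of (day, label) with label 'pos' or 'neg'.
    Returns {day: fraction of that day's tweets labelled 'pos'}."""
    positive = defaultdict(int)
    total = defaultdict(int)
    for day, label in labelled_tweets:
        total[day] += 1
        if label == "pos":
            positive[day] += 1
    # Sorting the days gives a time-ordered series.
    return {day: positive[day] / total[day] for day in sorted(total)}
```

The per-day tweet count collected in `total` is the volume series, so both Twitter series can come from the same pass over the data.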
Training the model
•ARMA: Auto Regressive Moving Average
▫y[t] = a.x[t] + b.x[t-1] + … + m.y[t-1] + n.y[t-2] + …
•Simplified prediction:
▫A binary prediction, which says if y[t] > y[t-1]
▫Use past values of self, and Twitter time series
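The simplified binary task can be sketched as a dataset-building step: each row holds the most recent values of the target series y and the Twitter series x, and the label says whether y went up. The function name and the default of two lags are assumptions for illustration, and this sketch uses only past values of both series.

```python
def make_dataset(y, x, lags=2):
    """Build (features, target) rows for the up/down prediction task.

    features: the last `lags` values of y followed by the last `lags`
              values of the Twitter series x (both strictly before t).
    target:   1 if y[t] > y[t-1], otherwise 0.
    """
    if len(y) != len(x):
        raise ValueError("both series must have the same length")
    rows = []
    for t in range(lags, len(y)):
        features = list(y[t - lags:t]) + list(x[t - lags:t])
        target = 1 if y[t] > y[t - 1] else 0
        rows.append((features, target))
    return rows
```

Dropping the x columns from `features` gives the baseline without Twitter data, which is the comparison needed to judge whether the Twitter series helps.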
Model parameters
•Target time series
▫Share market: Returns
▫Movie box office: Revenue
•Twitter series
▫Volume
▫Sentiment index
•Forecasting model family
▫Linear models
▫Support vector machines
▫Neural networks
•Result: Does including Twitter data increase classification accuracy by 5%?
Study details
•Stock market prediction targets
▫Companies: Apple, Google, …
▫General market indices: S&P 100, S&P 500
•Box office data
▫Daily sales revenue series
Summary Tree
•Helps to identify model parameters that lead to consistently positive or negative results
•Decision tree structure
▫Nodes are different parameters
▫Leaves: Result
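To make the idea concrete, here is a small sketch that builds a decision tree over experiment settings, with the +/- outcome at the leaves. It uses a plain entropy-based split (ID3 style), which is an assumption; how the paper actually constructs its summary tree is not described here. The parameter names and every row of the table are invented placeholders, not results from the study.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, params):
    """rows: list of (settings dict, outcome). Returns a nested dict keyed by
    (parameter, value), or an outcome label at a leaf."""
    outcomes = [outcome for _, outcome in rows]
    if len(set(outcomes)) == 1 or not params:
        return Counter(outcomes).most_common(1)[0][0]

    def split_entropy(param):
        groups = {}
        for settings, outcome in rows:
            groups.setdefault(settings[param], []).append(outcome)
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

    best = min(params, key=split_entropy)        # most informative parameter
    rest = [p for p in params if p != best]
    tree = {}
    for value in sorted({settings[best] for settings, _ in rows}):
        subset = [(s, o) for s, o in rows if s[best] == value]
        tree[(best, value)] = build_tree(subset, rest)
    return tree

# Invented placeholder experiments: "+" means Twitter data helped.
experiments = [
    ({"family": "F1", "series": "S1"}, "-"),
    ({"family": "F1", "series": "S2"}, "-"),
    ({"family": "F2", "series": "S1"}, "+"),
    ({"family": "F2", "series": "S2"}, "-"),
    ({"family": "F3", "series": "S1"}, "+"),
    ({"family": "F3", "series": "S2"}, "+"),
]
summary_tree = build_tree(experiments, ["family", "series"])
```

Reading the resulting tree from the root shows which parameter separates good from bad outcomes first, which is the kind of summary the tree is meant to give.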
Results: Stock market data
•Summary of prediction results:
▫Generally, linear models do not provide a significant performance improvement with either Twitter volume or sentiment-based information.
▫Non-linear models can give an improvement!
▫Neural network based models gave the best performance
Results: Movie box office
•Summary:
▫Sentiment analysis did not have a positive impact
▫Volume information had a positive impact with linear regression and SVM
Conclusion
•In general, Twitter information, when used with non-linear models, increases the prediction accuracy for long-term stock market predictions
•Twitter volume had a linear relationship with movie sales, but sentiment analysis had none
Appendix
•Logarithmic returns of the series
R_t = (P_t - P_{t-1}) / P_{t-1}
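A minimal sketch of turning a price series P into returns. It computes both the simple return (P_t - P_{t-1}) / P_{t-1} and the logarithmic return ln(P_t / P_{t-1}); which of the two the study feeds to its models is an assumption left open here, and the function names are hypothetical.

```python
import math

def simple_returns(prices):
    """(P_t - P_{t-1}) / P_{t-1} for each consecutive pair of prices."""
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]

def log_returns(prices):
    """ln(P_t / P_{t-1}) for each consecutive pair of prices."""
    return [math.log(cur / prev) for prev, cur in zip(prices, prices[1:])]
```

For small price changes the two values are close, since ln(1 + r) is approximately r when r is near zero.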
Testing model adequacy
•Testing the relationship between the Twitter time series and the time series to be forecast
•Neglected nonlinearity
▫Are the 2 time series non-linearly related?
•Granger causality
▫X->Y or Y->X?
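The Granger causality question can be sketched as a comparison of two least-squares regressions: predict y from its own past, then from its own past plus the past of x, and measure how much the residual error drops. The code below computes only the F statistic with NumPy. The lag count, the function names and the omission of a p-value are simplifications for illustration, not the paper's procedure.

```python
import numpy as np

def lagged(series, lags, n):
    """Columns series[t-1], ..., series[t-lags] for t = lags .. n-1."""
    return np.column_stack([series[lags - k:n - k] for k in range(1, lags + 1)])

def granger_f(x, y, lags=2):
    """F statistic for H0: past values of x do not help predict y
    once past values of y are already used."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    target = y[lags:]
    ones = np.ones((n - lags, 1))
    restricted = np.hstack([ones, lagged(y, lags, n)])   # past y only
    full = np.hstack([restricted, lagged(x, lags, n)])   # past y and past x

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_f = rss(restricted), rss(full)
    dof = (n - lags) - full.shape[1]                     # residual degrees of freedom
    return ((rss_r - rss_f) / lags) / (rss_f / dof)
```

Running `granger_f(x, y)` and `granger_f(y, x)` addresses the two directions X->Y and Y->X; a large statistic in one direction only suggests that the first series carries predictive information about the second.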