SocInfo 2014 - On the Feasibility of Predicting News Popularity at Cold Start
On the Feasibility of Predicting News Popularity at Cold Start Ioannis Arapakis, B. Barla Cambazoglu, Mounia Lalmas Yahoo Labs, Barcelona
Background Information
§ Until now, news popularity prediction has relied for the most part on:
• early-stage measurements
• user-generated content
§ Cold-start prediction has been investigated mostly in the context of recommender systems*
*R. Bandari, S. Asur, and B. A. Huberman. The pulse of news in social media: Forecasting popularity. In Proc. 6th Int'l Conf. Weblogs and Social Media, 2012.
Scope
§ We follow the same experimental setting and reproduce the performance results reported in Bandari et al.
§ We improve the methodology and integrate the right performance metrics in a step-by-step fashion
§ We introduce a large number of new features which may further help predict future article popularity
§ In addition to tweet counts, we also use the view counts of article pages
News Dataset
§ News corpus of 13,319 news articles from Yahoo News, crawled over a period of two weeks
§ To quantify the popularity of news we considered two metrics:
• number of times an article was posted/shared on Twitter (tweets)
• number of times an article was viewed by users (page views)
§ For each crawled article we sampled these metric values every 30 minutes over a period of one week after the article's publication
§ This yields 337 observations per article
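As a quick sanity check (ours, not from the paper), the observation count per article follows directly from the sampling schedule: one sample every 30 minutes for seven days, plus the observation at publication time.

```python
# One sample every 30 minutes over one week, plus the sample at t = 0,
# gives the 337 observations per article reported above.
SAMPLES_PER_HOUR = 2          # a sample every 30 minutes
HOURS_PER_WEEK = 7 * 24       # one week of sampling

observations = SAMPLES_PER_HOUR * HOURS_PER_WEEK + 1  # +1 for publication time
print(observations)  # 337
```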
Fig. 1: Tweet counts of articles (number of tweets vs. article rank, log-log scale).
Fig. 2: Tweet counts over time (normalized and raw tweet counts over the seven days after publication).
Feature Engineering
§ Time
§ News source
§ Genre
§ Length
§ NLP
§ Sentiment analysis
§ Entity extraction
§ Wikipedia
§ Twitter
§ Web search
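To illustrate the flavor of such features, here is a minimal sketch (the function and field names are ours; the actual feature set used in the paper is far richer) computing a few shallow time, source, and length features for an article:

```python
from datetime import datetime

def basic_features(title: str, body: str, source: str, published: datetime) -> dict:
    """A few illustrative cold-start features available at publication time."""
    return {
        "publication_hour": published.hour,         # time feature
        "publication_weekday": published.weekday(), # time feature
        "source": source,                           # news-source feature
        "title_length": len(title.split()),         # length feature (words)
        "body_length": len(body.split()),           # length feature (words)
    }

feats = basic_features("Example headline here", "Some body text here",
                       "yahoo-news", datetime(2014, 2, 3, 14, 30))
print(feats["publication_hour"], feats["title_length"])  # 14 3
```

All of these can be computed before any tweet or page view is observed, which is what makes them candidates for cold-start prediction.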
Experiments
§ We start by reproducing the classification results presented in Bandari et al. for tweets
§ We split two weeks of articles into three classes based on their tweet counts:
• A (low popularity): [1, 20]
• B (medium popularity): (20, 100]
• C (high popularity): (100, ∞)
§ We experiment with the same classifiers (NB, Bagging, J48, SVM) and include a baseline (majority class)
§ We make predictions for one hour, one day, and one week after an article is published
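The class boundaries above can be sketched as a simple binning function (the function name is ours, for illustration only):

```python
def popularity_class(tweet_count: int) -> str:
    """Map a raw tweet count to the popularity classes
    A: [1, 20], B: (20, 100], C: (100, inf)."""
    if tweet_count <= 20:
        return "A"  # low popularity
    if tweet_count <= 100:
        return "B"  # medium popularity
    return "C"      # high popularity

print([popularity_class(c) for c in [5, 20, 21, 100, 101, 5000]])
# ['A', 'A', 'B', 'B', 'C', 'C']
```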
Results

Classifier   Tweets
             Hour   Day    Week
Baseline     .840   .710   .703
NB           .693   .581   .574
Bagging      .858   .749   .741
J48          .856   .781   .775
SVM          .859   .802   .797
Table 1: Accuracy (ten-fold cross validation, without zero-popularity articles)
Classifier   Tweets
             Hour   Day    Week
Baseline     .839   .706   .698
NB           .735   .589   .584
Bagging      .858   .737   .740
J48          .852   .779   .774
SVM          .861   .803   .798
Table 2: Accuracy (training/test split, without zero-popularity articles)
Results
Classifier   Tweets
             Hour   Day    Week
Baseline     .871   .746   .740
NB           .772   .642   .633
Bagging      .886   .780   .769
J48          .883   .805   .804
SVM          .890   .829   .825
Table 3: Accuracy (training/test split, with zero-popularity articles)
Class   Tweets
        Hour   Day    Week
A       .871   .746   .740
B       .125   .227   .231
C       .004   .027   .029
Table 4: Fraction of instances in each of the three popularity classes
Results
Actual   Predicted
         A       B     C
A        4,698   247   0
B        728     812   0
C        98      96    0
Table 5: The confusion matrix for (Tweets, Week)
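Per-class recall can be read directly off this confusion matrix (rows are actual classes, columns are predicted classes); a minimal sketch in plain Python:

```python
# Table 5's confusion matrix: rows = actual class, columns = predicted class.
confusion = {
    "A": {"A": 4698, "B": 247, "C": 0},
    "B": {"A": 728,  "B": 812, "C": 0},
    "C": {"A": 98,   "B": 96,  "C": 0},
}

# Recall for a class = correctly predicted instances / all actual instances.
recalls = {cls: row[cls] / sum(row.values()) for cls, row in confusion.items()}
for cls, r in recalls.items():
    print(cls, round(r, 3))  # A 0.95, B 0.527, C 0.0
```

No instance is ever predicted as class C, so recall for the highly popular articles is zero, which is precisely the failure mode discussed in the conclusions.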
Regressor   Tweets
            Hour    Day     Week
BaselineR   1.701   1.931   1.950
LR          1.132   1.270   1.305
KNNR        1.537   1.720   1.753
SVM         1.135   1.278   1.315
Table 6: Root mean squared error (training/test split, with zero-popularity articles)
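For reference, the metric reported in Table 6 is the standard root mean squared error; a minimal, self-contained sketch (with made-up example values, not data from the paper):

```python
import math

def rmse(predicted, actual):
    """Root mean squared error over paired predictions and ground truth."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

print(round(rmse([2.0, 3.0, 4.0], [2.0, 5.0, 4.0]), 3))  # 1.155
```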
Results

          Tweets                 Pageviews
          Hour   Day    Week     Hour   Day    Week
R@10      .000   .000   .000     .000   .000   .000
R@100     .240   .110   .090     .010   .020   .060
R@1000    .578   .557   .548     .212   .173   .245
Table 7: Performance in terms of the Kendall tau and recall@k metrics
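A hedged sketch of the recall@k metric used above: the fraction of the truly top-k articles that also appear in the predicted top-k. This is our formulation for illustration; the paper may handle ties differently.

```python
def recall_at_k(predicted_scores: dict, true_scores: dict, k: int) -> float:
    """Fraction of the true top-k items recovered in the predicted top-k."""
    top_true = set(sorted(true_scores, key=true_scores.get, reverse=True)[:k])
    top_pred = set(sorted(predicted_scores, key=predicted_scores.get, reverse=True)[:k])
    return len(top_true & top_pred) / k

# Toy example: item "a" is in both top-2 lists, "b" is missed -> recall@2 = 0.5
true = {"a": 900, "b": 500, "c": 40, "d": 10}
pred = {"a": 700, "b": 30,  "c": 600, "d": 5}
print(recall_at_k(pred, true, 2))  # 0.5
```

An R@10 of .000 therefore means that none of the ten most popular articles were ranked in the predicted top ten.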
Conclusions
§ Predicting news popularity at cold start is not a solved problem
§ Classifiers are biased toward learning unpopular articles due to the imbalanced class distribution
§ Highly popular articles could not be accurately detected, rendering the predictions not useful in most practical scenarios
§ News popularity may be predicted more accurately if early-stage popularity measurements are incorporated into the prediction models as features
§ Increasing the duration of such measurements will increase the accuracy of the predictions but decrease their usefulness, leading to an interesting trade-off
Questions?
This work was supported by MULTISENSOR project, partially funded by the European Commission, under the contract number FP7-610411
Slides: http://www.slideshare.net/iarapakis/