prediction of box office revenue of movies using hype analysis of twitter data
Post on 22-Jul-2015
PREDICTION OF BOX OFFICE SUCCESS OF MOVIES USING HYPE ANALYSIS OF TWITTER DATA
(PREDICTING THE FUTURE)
By
SAMEER THIGALE, TUSHAR PRASAD
MIT COLLEGE OF ENGINEERING, PUNE
Internal Guide:
PROF. REENA PAGARE
Sponsored Organization:
PERSISTENT SYSTEMS LIMITED
A BRIEF OUTLINE
• Presence of “rich insights” in social networks
• The Hypothesis:
“A Movie Well Talked About is Well Watched”
• Pre-release buzz: a success factor
LITERATURE SURVEY
REFERENCE DESCRIPTION
[1] FORECASTING: Methods and Applications, by Spyros M., Steven W., Rob H., 3rd Edition, Wiley Publication (book)
– Basic concepts of statistics such as correlation
– Study of forecasting models
– Linear regression
– Time series regression
[2] Predicting the Future with Social Media, S. Asur, B. Huberman, HP Labs, HP Journal, Jan 2012
– Factors that could be considered when calculating the success rate include attention seeking, distribution, polarity, type of film, etc.
– Prediction can be made using linear regression
EXISTING MODELS
• HOLLYWOOD STOCK EXCHANGE (HSX.COM)
– Uses Virtual Stocks to predict revenue
– Accuracy 90%, confidence: medium
• INTERNET MOVIE DB (IMDB.COM)
– Uses clicks, reviews, blogs, star casts to predict
• BoxOfficeMojo.com
– Uses clicks, reviews, blogs, star casts to predict
But none of the leading movie database sites uses social media to make predictions. Why?
PROBLEM DEFINITION
• To demonstrate that the amount of attention a subject receives correlates strongly with its future ranking.
• To show that a simple regression model built from Twitter chatter can outperform market-based predictions.
• To demonstrate how the model can also be extended to other products of consumer interest.
Technical Keywords: Statistical prediction, Social network analysis, Regression
THE DATASET
• 100,000+ unique users
• Dataset spanning 6 weeks
• 4 million tweets
MOVIE NAME
Jupiter Ascending
Shamitabh
SpongeBob: Sponge Out of Water
LoveSick
Fifty Shades of Grey
Birdman
American Sniper
Foxcatcher
Hot Tub Time Machine 2
Chappie Movie
Badlapur
MODEL EMPLOYED
• MULTIPLE LINEAR REGRESSION
– BASED ON FINDING “A STRAIGHT LINE PREDICTING Y (INCOME)”
MODEL EMPLOYED
A: AVG COUNT OF TWEETS PER HOUR
P: CALCULATED USING SENTIMENT ANALYSIS. RANGE: 0 TO 4 (0: VERY NEGATIVE, 4: VERY POSITIVE)
D: NUMBER OF THEATRES THE MOVIE IS RELEASED IN
C: CATEGORY OF MOVIE: ACTION, THRILLER, COMEDY, ANIMATION, ROMANCE
E: STAR CAST, DIVIDED INTO 3 CATEGORIES DEPENDING ON TWITTER FOLLOWERS
S: SEQUEL. RANGE: 0 IF NOT SEQUEL, 1 IF SEQUEL
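As an illustration of how a multiple linear regression over these variables could be fitted, here is a minimal sketch in plain Python. It is not the project's actual code. The training rows and revenue figures are invented for the example, the function names `solve`, `fit` and `predict` are hypothetical, and for brevity only A, P, D and S are used (C would need one-hot encoding and E an ordinal code, neither of which is shown).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit(rows, y):
    """Least-squares fit via the normal equations.

    Returns [intercept, b_A, b_P, b_D, b_S] for rows of the form [A, P, D, S].
    """
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs, row):
    """Evaluate the fitted straight-line model for one movie."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


# Invented training data (NOT from the presentation): [A, P, D, S] per movie.
rows = [
    [1200, 3.1, 3600, 0],
    [40, 2.8, 300, 0],
    [650, 2.5, 3200, 1],
    [300, 3.4, 2100, 0],
    [900, 2.0, 2900, 1],
    [150, 3.0, 1200, 0],
    [500, 2.2, 2500, 0],
]
# Invented opening revenues in USD, one per row above.
revenue = [82e6, 0.3e6, 41e6, 19e6, 47e6, 7e6, 28e6]

coeffs = fit(rows, revenue)
print(predict(coeffs, [700, 2.9, 3000, 0]))  # estimate for a hypothetical new release
```

With real data the same `fit` call would take one row per movie in the dataset; a numerical library would normally replace the hand-written solver.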
CONTRIBUTION
• Our model uses multiple linear regression for forecasting, which gives an accurate outcome without resorting to complicated neural networks, pattern recognition and other AI concepts.
• The model is robust and can be extended to other consumer products simply by changing the regression parameters.
DEMO
SYSTEM ARCHITECTURE
PLATFORM AND TECHNOLOGY
• OPERATING SYSTEM AND ARCHITECTURE INDEPENDENT
– TESTED ON WINDOWS XP+, UBUNTU 12.04 LTS+
– BOTH 32-BIT AND 64-BIT ARCHITECTURE
• SOFTWARE REQUIREMENTS (MINIMUM):
– JDK 8
– MYSQL 5+
SALIENT FEATURES
• Client-server architecture
• Accurate prediction
• Displays
– Sentiment of tweets
– Tag cloud of tweets
– Location of tweets
– Rate of tweets per hour
PROUDLY BUILT ON THE OPEN SOURCE MODEL. ALL OPEN-SOURCE TOOLS USED. SOFTWARE LICENSED UNDER GNU GPL.
RESULTS
Features                          R²
Avg tweet rate                    0.02
Avg tweet rate + theatre count    0.91
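For readers unfamiliar with the metric, the coefficient of determination reported above can be computed as one minus the ratio of residual to total sum of squares. The sketch below shows that standard definition; the function name `r_squared` is hypothetical and the snippet is not taken from the project.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((a - mean) ** 2 for a in actual)                  # total variation
    return 1.0 - ss_res / ss_tot
```

A value near 1 means the chosen features explain most of the variation in revenue; a value near 0 means the model does little better than predicting the mean.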
Movie Name                 Release Date   What we predicted (in USD)   What actually happened!
Fifty Shades of Grey       13-Feb-2015    80,214,910                   85,043,000
Shamitabh                  06-Feb-2015    243,661                      241,720
Kingsman: Secret Service   13-Feb-2015    34,345,613                   36,225,000
Hot Tub Time Machine 2     20-Feb-2015    30,255,168                   ???? (IMDB SAYS 25M)
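The gap between each prediction and the reported figure in the table above can be expressed as a signed percentage error. The short sketch below does this for the three movies whose actual revenue is given; Hot Tub Time Machine 2 is left out because its actual figure is not stated with certainty. The helper name `percent_error` is hypothetical.

```python
# (title, predicted USD, actual USD) as listed in the results table.
results = [
    ("Fifty Shades of Grey", 80_214_910, 85_043_000),
    ("Shamitabh", 243_661, 241_720),
    ("Kingsman: Secret Service", 34_345_613, 36_225_000),
]


def percent_error(predicted, actual):
    """Signed error of the prediction, as a percentage of the actual value."""
    return 100.0 * (predicted - actual) / actual


for title, predicted, actual in results:
    print(f"{title}: {percent_error(predicted, actual):+.2f}%")
```

A negative value means the model under-predicted the revenue; a positive value means it over-predicted.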
APPLICATIONS
• Forecasting products of consumer interest given the chatter
– Movies
– Elections
– ICC World Cup
– Epidemiology (Google Flu Trends)
• For theatre owners, to predict the number of shows to be scheduled
– Similarly for retailers of the respective products
LIMITATIONS
• Data cleaning limitations
– Presence of references to two or more movies
– Presence of sarcastic tweets
– Emoticons
• CONSTRAINTS:
– Due to Twitter API limitations only 1% of tweets can be caught (can be improved by Firehose access)
– Only tweets in the English language are accepted
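The first data-cleaning limitation above, tweets that reference two or more movies, can at least be detected mechanically. The following is a minimal sketch under stated assumptions: the set of tracked hashtags is hypothetical, the function names are invented for the example, and sarcasm and emoticons are not handled at all.

```python
import re

# Hypothetical set of tracked movie hashtags (lowercase); not the project's real list.
TRACKED_TAGS = {"#shamitabh", "#birdman", "#foxcatcher", "#chappie"}

HASHTAG = re.compile(r"#\w+")


def tracked_movies(tweet):
    """Return the tracked hashtags that appear in the tweet text."""
    return {tag.lower() for tag in HASHTAG.findall(tweet)} & TRACKED_TAGS


def is_ambiguous(tweet):
    """True when the tweet mentions two or more tracked movies."""
    return len(tracked_movies(tweet)) >= 2


print(is_ambiguous("Loved #Birdman way more than #Foxcatcher"))
print(is_ambiguous("I <3 d mve #Shamitabh"))
```

Ambiguous tweets could then be dropped or counted separately, since their sentiment cannot be attributed to a single movie.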
Such a wonderful movie #Humshakal is!
I <3 d mve #Shamitabh
FUTURE SCOPE
• Estimating from “negative hype”
– E.g. revenue of #PK increased due to the #PKDebate
• Correlating success of songs to success of the movie
– Famous example of the song “Tum Hi Ho”
• Correlating “structure” of retweets and “favorited” tweets
THANK YOU!