Biostatistics Seminar Oct 7, 2014
Shahriar Shams, Changchang Xu
SSC Case Study Competition: Solving a Puzzle with Multiple Solutions
SSC case study competition: FREE FOOD on top of knowledge transfer
Venue for 2015: Nova Scotia
SSC conference
∗ Annual meeting of the Statistical Society of Canada (SSC)
∗ Student conference and the main conference
∗ Student conference
  ∗ Keynote speaker, career panel, individual research poster presentations (e.g. your practicum work)
∗ Main conference
  ∗ Presentations by students and professionals (academia/industry)
  ∗ Case study competition
Case study competition
∗ Two case studies
∗ Data will be uploaded on the SSC conference website
∗ Three or four research questions will be given with each dataset to keep a uniform focus
∗ Final output is a poster (this year the maximum dimensions were 6’ wide by 3’ high)
∗ No restriction on group size
∗ A faculty mentor (or senior PhD students as mentors) is allowed
∗ http://www.ssc.ca/en/meetings/2014/case-studies
∗ Registration fees ($230 early bird)
∗ Cost of printing posters: $60 to $80
∗ Reimbursement: Dr. Wendy Lou
Identification of Factors Associated with Revenue from a Social Mobile Game
Gabriel Lau, Lei Miao, Shahriar Shams, Kwong-Him To, Changchang Xu, Ruoyong Xu
Biostatistics Division, Dalla Lana School of Public Health, University of Toronto
Background on the social mobile game
∗ Data on 300,000 social mobile game users and their behavioural characteristics
∗ The data cover:
  Platform: the platform(s) the players used to play the game
  Dates of reaching a stage, first purchase, first prize awarded and first gift received: a comprehensive representation of each user’s progress in the game
  Revenue: the amount each user paid for in-game purchases
∗ Observation period: user behaviours were recorded for a predefined period after game installation
Objectives
• Identifying the predictors of revenue
• Determining the best timing for introducing certain free in-game features that would lead to higher revenue
Methods
Data description
Revenue: the revenue obtained from each user, scaled by an unknown constant
Prize: the day on which the first prize was awarded:
I. Within 2 days of installation II. More than 2 days after installation III. No prize awarded
Purchase: the day on which the first free in-game purchase was made:
I. On the day of installation II. After the day of installation
Games: the number of games played (1 unit = 10 games)
Platform: the number of hardware platforms used to access the game:
I. One platform II. Two or more platforms
Total: the total number of items purchased during the observation period
Facebook: whether the user connected to Facebook
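The variables above enter the models as indicators (prize1, prize2, purchase, platform, facebook) plus the numeric games and total. A minimal sketch of that encoding follows; the function and field names are hypothetical, "within 2 days" is assumed to mean days 0 to 2, and users with no purchase are assumed to be coded 0 for purchase.

```python
from typing import Optional


def encode_user(games_played: int,
                first_prize_day: Optional[int],
                first_purchase_day: Optional[int],
                n_platforms: int,
                items_purchased: int,
                facebook_connected: bool) -> dict:
    """Return the model covariates for one user (hypothetical field names).

    Day arguments count days since installation (0 = installation day);
    None means the event did not occur during the observation period.
    """
    prize_awarded = first_prize_day is not None
    return {
        # 1 unit = 10 games, as in the data description
        "games": games_played / 10,
        # Prize dummies; "no prize awarded" is the reference category.
        # Assumption: "within 2 days" means day 0, 1 or 2.
        "prize1": int(prize_awarded and first_prize_day <= 2),
        "prize2": int(prize_awarded and first_prize_day > 2),
        # 1 if the first free in-game purchase happened on the installation day;
        # assumption: users with no purchase are also coded 0.
        "purchase": int(first_purchase_day == 0),
        # 1 if two or more hardware platforms were used
        "platform": int(n_platforms >= 2),
        "total": items_purchased,
        "facebook": int(facebook_connected),
    }
```

For example, `encode_user(35, 1, 0, 2, 4, True)` gives games = 3.5 and sets prize1, purchase, platform and facebook to 1.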
Methods (cont’d)
Statistical Modelling
• Logistic regression model: predicts whether a user pays any amount
• Inverse Gaussian model: identifies covariates associated with an increase in mean revenue
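The inverse Gaussian distribution is defined only for strictly positive values, so the revenue model can only be fit to users who paid something. One common way to combine the two models is a two-part (hurdle) prediction, E[revenue] = P(revenue > 0) × E[revenue | revenue > 0]. The sketch below assumes that combination and a log link for the inverse Gaussian part (suggested by the exponentiated estimates reported later); neither is stated on the slides, and the function names are hypothetical.

```python
import math
from typing import Mapping


def linear_predictor(coefs: Mapping[str, float], x: Mapping[str, float]) -> float:
    """beta_0 + sum_j beta_j * x_j, with beta_0 stored under the key "intercept"."""
    return coefs["intercept"] + sum(
        beta * x[name] for name, beta in coefs.items() if name != "intercept"
    )


def prob_paying(logit_coefs: Mapping[str, float], x: Mapping[str, float]) -> float:
    """Part 1 (logistic regression): P(revenue > 0) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(logit_coefs, x)))


def mean_revenue_if_paying(ig_coefs: Mapping[str, float], x: Mapping[str, float]) -> float:
    """Part 2 (inverse Gaussian GLM, log link assumed): E[revenue | revenue > 0] = exp(eta)."""
    return math.exp(linear_predictor(ig_coefs, x))


def expected_revenue(logit_coefs, ig_coefs, x) -> float:
    """Two-part combination: E[revenue] = P(revenue > 0) * E[revenue | revenue > 0]."""
    return prob_paying(logit_coefs, x) * mean_revenue_if_paying(ig_coefs, x)
```

Splitting the problem this way lets each part use a distribution suited to its response: a binary outcome for "pays at all" and a right-skewed positive distribution for the amount paid.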
EDA: revenue distribution on other variables
[Figure: Number of in-game purchases vs Revenue. x-axis: number of distinct in-game purchases; y-axis: mean revenue]
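The plot summarizes mean revenue at each number of distinct in-game purchases. A minimal pure-Python sketch of that summary is below; the records in the usage note are made-up toy values, not the competition data.

```python
from collections import defaultdict


def mean_revenue_by_purchases(records):
    """Group (n_purchases, revenue) pairs and return {n_purchases: mean revenue}, sorted by key."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for n_purchases, revenue in records:
        sums[n_purchases] += revenue
        counts[n_purchases] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}
```

With toy records `[(0, 0.0), (1, 2.0), (1, 4.0), (3, 9.0)]` this returns `{0: 0.0, 1: 3.0, 3: 9.0}`.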
Results: Logistic Modelling
Parameter | OR (95% CI) | p-value
Games played (1 unit = 10 games) | 1.018 (1.017, 1.018) | <0.0001
Date of first prize: within 2 days of installation vs no prize | 11.840 (10.892, 12.871) | <0.0001
Date of first prize: more than 2 days after installation vs no prize | 9.775 (8.924, 10.706) | <0.0001
Number of platforms: 2 or more vs 1 | 1.783 (1.603, 1.982) | <0.0001
Facebook connection: connected vs not connected | 1.889 (1.777, 2.009) | <0.0001
$$\operatorname{logit} P(\text{revenue} > 0) = \beta_0 + \beta_1\,\text{games} + \beta_2\,\text{prize1} + \beta_3\,\text{prize2} + \beta_4\,\text{platform} + \beta_5\,\text{facebook}$$
• Number of games played, date of first prize, number of platforms and Facebook connection were all significant predictors (p < 0.0001) of whether a game user contributed any revenue
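Because the model is linear on the log-odds scale, each odds ratio in the table equals exp(β), effects of several units multiply (OR raised to the number of units), and an approximate standard error can be backed out of a 95% CI as (ln upper − ln lower) / (2 × 1.96). The sketch below applies this to the published values; it assumes Wald-type intervals, and rounding in the table (e.g. the games CI) limits the precision.

```python
import math

# Odds ratios and 95% CIs (OR, lower, upper) copied from the table above.
LOGISTIC_OR = {
    "games": (1.018, 1.017, 1.018),
    "prize1": (11.840, 10.892, 12.871),
    "prize2": (9.775, 8.924, 10.706),
    "platform": (1.783, 1.603, 1.982),
    "facebook": (1.889, 1.777, 2.009),
}

Z_975 = 1.959964  # standard normal 97.5% quantile


def log_odds_and_se(or_, lower, upper):
    """Return beta = ln(OR) and an approximate SE, assuming CI = exp(beta +/- 1.96 * SE)."""
    beta = math.log(or_)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    return beta, se


def scaled_or(or_per_unit, units):
    """Odds ratio for a change of `units` in a covariate: OR ** units."""
    return or_per_unit ** units
```

Since 1 unit of games is 10 games, `scaled_or(1.018, 10)` (about 1.195) is the odds multiplier for 100 additional games, holding the other covariates fixed.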
Results: Inverse Gaussian Modelling
Parameter | exp(Estimate) (95% CI) | p-value
Intercept | 0.269 (0.240, 0.302) | <0.0001
Games | 1.015 (1.013, 1.017) | <0.0001
First in-game purchase on the installation day | 1.080 (1.006, 1.158) | 0.0328
First prize received within 2 days of installation | 1.148 (1.045, 1.261) | 0.0039
First prize received more than 2 days after installation | 0.988 (0.893, 1.093) | 0.8133
Total number of items purchased | 1.422 (1.375, 1.471) | <0.0001
$$\log E(\text{revenue}) = \beta_0 + \beta_1\,\text{games} + \beta_2\,\text{purchase} + \beta_3\,\text{prize1} + \beta_4\,\text{prize2} + \beta_5\,\text{total}$$
• Number of games played, date of first in-game purchase, date of first prize and total number of items purchased were all significantly associated with higher mean revenue, although receiving the first prize more than 2 days after installation was not significant (p = 0.8133)
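With a log link, exp(β0 + Σ βj xj) = exp(β0) × Π exp(βj)^xj, so the exponentiated estimates in the table act as multipliers on mean revenue. The sketch below uses the published values and assumes a log link; outputs are in the data's scaled revenue units (the scaling constant is unknown), so ratios between profiles are the interpretable quantity, and the intercept alone (games = 0, total = 0) may lie outside the observed data.

```python
# exp(estimate) values copied from the inverse Gaussian table above (log link assumed).
IG_EXP_EST = {
    "intercept": 0.269,
    "games": 1.015,
    "purchase": 1.080,
    "prize1": 1.148,
    "prize2": 0.988,
    "total": 1.422,
}


def predicted_mean_revenue(x):
    """Predicted mean revenue (scaled units) for a covariate profile x.

    Multiplies exp(beta_0) by exp(beta_j) ** x_j for each covariate supplied;
    covariates left out of x are treated as 0.
    """
    mean = IG_EXP_EST["intercept"]
    for name, value in x.items():
        mean *= IG_EXP_EST[name] ** value
    return mean
```

For instance, raising total from 3 to 4 items while holding the other covariates fixed multiplies the predicted mean by 1.422.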
Discussions
People who did not receive a prize were much less likely to pay, so it was recommended that the game give out more prizes in the early stages.
Connecting to Facebook was associated with a greater chance of paying.
People who used two or more platforms were more likely to pay, so multi-platform use should be encouraged.
In general, the earlier users made free in-game purchases and received prizes, the higher their predicted revenue.
Discussions (cont’d)
More demographic information would be needed to properly investigate the predictors of higher revenue.
In particular, it would be interesting to analyze age.
The return player variable provided was not a good representation of retention because it only indicated whether a user returned on a single day.
Future work
Retention could be investigated further by constructing a more representative and quantifiable retention variable.
Instead of modelling aggregate revenue, users’ spending patterns over the observation period should be explored.
Acknowledgement
∗ Dr. Wendy Lou
∗ Dr. Paul Corey
∗ Dr. Billy Chang
∗ And Changchang Xu for putting this presentation together
Experience from last year
∗ Start a bit early (the end of April is crazy for Master’s students)
∗ Try to form a balanced group
∗ There is no need to tackle all the questions unless you really want to
∗ Learn from your peers as much as you can
Thank you!