Predicting Customer Lifetime Value
Using machine learning algorithms

Matilda Karlsson

HT 2016
Examensarbete (Master's thesis), 30 hp
Supervisor: Patrik Eklund
Examiner: Henrik Björklund
Civilingenjörsprogrammet i teknisk datavetenskap, 300 hp



Abstract

Spending money to acquire new customers can be a risk, since new players never immediately pay off. In this thesis three machine learning algorithms (neural network, Bayesian network and regression) are used to investigate whether it is possible to determine early how much a user will spend in the game, in order to minimize the risk.

The results showed that the neural network performed badly, mostly because there might not be a strong correlation between how a player plays, or where he comes from, and how much he will spend.

Because of how Bayesian networks work, it was hard to answer the question, but the network still gave a good indication of what kind of players spend money in the game.

Regression showed that a player should have paid off around 50% of the advertisement cost by around day six or seven, or the player will most likely never pay off.

Page 4: Predicting Customer Lifetime Value - umu.se · Predicting Customer Lifetime Value Using machine learning algorithms Matilda Karlssson Matilda Karlssson HT 2016 Examensarbete, 30 hp
Page 5: Predicting Customer Lifetime Value - umu.se · Predicting Customer Lifetime Value Using machine learning algorithms Matilda Karlssson Matilda Karlssson HT 2016 Examensarbete, 30 hp

Contents

1 Introduction
  1.1 Partners
  1.2 Purpose

2 Problem description
  2.1 Problem
  2.2 Mad Skills Motocross 2
  2.3 Goals and purposes
    2.3.1 In-depth study
    2.3.2 Data

3 Machine learning algorithms
  3.1 Bayesian network
    3.1.1 Sprinkler, rain and wet grass
    3.1.2 Hugin Expert
  3.2 Neural network
    3.2.1 Back propagation
  3.3 Regression

4 Results
  4.1 Result of neural network
  4.2 Result of Bayesian network
  4.3 Result of Regression

5 Conclusion and future work
  5.1 Conclusion and limitation
  5.2 Future work

A All other Bayesian networks


1 Introduction

Having customers is an essential part of any profitable company. Without paying customers, most companies would quickly go bankrupt. Therefore most companies need a way of acquiring new customers.

Advertisement could be an efficient way to gain new customers. But advertisement costs money, and if the company wants to profit it needs to acquire enough customers to cover the advertisement costs. A new customer might have a lifetime of over a year, and during this time the customer will hopefully pay off. However, when acquiring new customers we do not want to wait a full year before knowing whether or not the advertisement was successful. The best case would be to know already after a week or two, so that we could quickly stop advertising in case it costs more than it will ever return. It could therefore be valuable to be able to predict a customer's lifetime value. Knowing the customer lifetime value, a company also knows how much it can be willing to spend to acquire new customers.

Turborilla AB [23] develops games for iOS and Android and wishes to advertise their game with Pay per Click advertisement. Turborilla would have to pay for every click on their advertisement, and hopefully gain a customer for every click as well. In Turborilla's game Mad Skills Motocross 2, they earn money from customers watching ads and from customers buying certain in-game features, such as a new bike or some kind of assist to beat a track.

1.1 Partners

The idea for this thesis was provided by Turborilla AB, who also provided test data and resources to use when implementing and testing the algorithms. They helped a lot during the work on this thesis with knowledge, information and a workspace.

1.2 Purpose

The purpose of this thesis is to test if it is possible to predict the lifetime value of a customer within a confidence interval:

The customer X will spend Y amount within 180 days with 95% confidence.

Preferably as soon as possible after the customer was acquired.

This can be divided into several subtasks. First we need to figure out which attributes about a customer correlate to their lifetime value. Looking at the device they play on might give an indication of how much they are willing to spend on a mobile game. It could be that someone with a brand new phone is willing to spend real money on a mobile game while someone on an older phone is not, or that how well they perform in the game correlates to how much they are willing to spend.

It is also essential that we are able to measure the confidence of the predictions. A model that predicts the lifetime value of customers is useless unless it also gives some kind of confidence for the prediction. Using a 95% confidence interval means that we know with a certainty of 95% that the actual value will lie within the interval. Any percentage could be used for the interval; 90%, 95% and 99% are commonly chosen.


2 Problem description

This chapter explains more about the game in question and the problem, and describes the data that Turborilla currently saves about their customers.

2.1 Problem

Turborilla does not currently have any way of calculating their customers' lifetime value, meaning that this thesis does not depend on any previous work at Turborilla.

The requirements for the prediction of lifetime value are that it should run fairly quickly. With Turborilla's 29 million users it can be expected to take some time, but this should still be minimized as much as possible. The software also needs to make confident predictions: it needs to predict the right value with reasonable certainty to be of any use at all.

2.2 Mad Skills Motocross 2

The main focus of this thesis is how Turborilla AB profits from their game Mad Skills Motocross 2 (MSM2).


Figure 1: Screenshot from the game MSM2, where two players race against each other.

MSM2 is a 2D side-scrolling game where players compete against each other or against a computer controlled player. Figure 1 is a screenshot from the game where two players race against each other. The game is free to download and can be played for free if the user wishes to. New bikes and bike parts can be unlocked when the player is skilled enough, or they can be bought at any time for real money.

A player who has never spent any money in the game is shown an advertisement once per day, meaning that Turborilla will profit from users even if they have not spent any money in the game.

2.3 Goals and purposes

This thesis consists of two parts.

• Make an in-depth study of machine learning algorithms and of how other companies calculate lifetime value, and study Turborilla's data.

• Use machine learning methods to test if it is possible to predict players' lifetime value.

2.3.1 In-depth study

The first part of the in-depth study consists of reading about how other companies evaluate their customers' lifetime value, and of studying Turborilla's data. Most other calculations of customer lifetime value use an equation that depends on a customer's average weekly spend and on how long the customer is expected to stay with the company:

Customer Lifetime Value = weekly spend * number of weeks with the company

When checking this against Turborilla's data it was quite easy to see that this approach would not be sufficient. This is the main reason why this thesis focuses on using machine learning algorithms to solve this problem.
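As a small illustration of the naive formula above (the function and the example numbers are mine, not from the thesis or from Turborilla's data):

```python
def naive_clv(weekly_spend: float, expected_weeks: float) -> float:
    """Naive customer lifetime value: average weekly spend times
    the expected number of weeks the customer stays with the company."""
    return weekly_spend * expected_weeks

# A customer spending $0.50 per week, expected to stay 26 weeks:
print(naive_clv(0.50, 26))  # 13.0
```

The weakness this thesis points out is visible even in the sketch: the formula assumes spending is constant per week, which free-to-play game data rarely satisfies.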

The main focus of the in-depth study was therefore machine learning algorithms. The algorithms studied were Bayesian networks, neural networks and regression. The study of each algorithm shows what kind of data it is usually applied to: neural networks tend to work best with continuous data, while Bayesian networks also work on categorical data.

2.3.2 Data

There are two kinds of data available, one for aggregated players, and one for a single player.

Turborilla today has around 29 million users, meaning that the user data over every single player is huge. The data is also split into two parts: one for Android users and one for Apple users, with around two thirds of the users being Android users and one third Apple users.

Aggregated data

The aggregated user data is grouped by the date players started playing, and for each such group it is possible to see how much the players together spent on a specific date.

Table 1 A table over aggregated data showing date, players and total spend on each day after start

Date    Players     0    1    2    3    4   ...   30
1 Jan   5241        $    $    $    $    $   ...   $
2 Jan   4613        $    $    $    $    $   ...   $
3 Jan   4166        $    $    $    $    $   ...   $
4 Jan   2554        $    $    $    $    $   ...   $
...
26 Feb  5048        $    $    $    $    $
27 Feb  4512        $    $    $    $
28 Feb  5623        $    $    $
29 Feb  5241        $    $
30 Feb  1245        $

Table 1 shows what the aggregated data looks like. The numbers of users starting on a specific day are made up, and in the real data there are numbers instead of $, but Turborilla wanted to keep these numbers hidden.

For every date, 1 Jan, 2 Jan, and so on, the table shows how many players installed the game on that specific date; the remaining columns show how much they spent in total on day x after they started. So for the 5241 players who started on January 1, the 0 column is how much they spent on January 1, and the 1 column shows how much they spent on January 2. For the 4613 players who started on January 2, the 0 column shows how much they spent on January 2, and the 1 column shows how much they spent on January 3. This data is only collected up until day 30.
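A minimal sketch of how such a cohort table might be represented and queried; the structure and all numbers below are illustrative, not Turborilla's actual data:

```python
# Each cohort maps an install date to the number of players and a list of
# total spend per day offset (day 0 = install day, day 1 = the day after, ...).
cohorts = {
    "2016-01-01": {"players": 5241, "spend": [120.0, 80.0, 55.0]},
    "2016-01-02": {"players": 4613, "spend": [95.0, 60.0]},
}

def cumulative_spend(cohort: dict, up_to_day: int) -> float:
    """Total spend of a cohort from day 0 up to and including up_to_day."""
    return sum(cohort["spend"][: up_to_day + 1])

print(cumulative_spend(cohorts["2016-01-01"], 1))  # 200.0
```

Shifting every cohort to day offsets, as Turborilla's table does, is what makes cohorts that started on different dates directly comparable.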


Single user data

The single user data is collected in massive database files, where every row is one user. The data is also split into several different files. Every player in this data has their own ID tag to keep track of them between files, which is also used as a primary key if the data is loaded into a database.

This data only saves whether, and how many times, a player has done something, much like a counter, and not any dates when it happened. So if a player has lost track one nine times, the database simply saves a '9'. The only dates it does save are the starting date of each player and the last time they were online.

Table 2 Some example columns from the database

userID             device   timezone          bike customized  bike unlocked  bought anything  track one lost
sdf45ER234gFGGFGH  iPhone7  America/Honolulu  True             8              False            8
qwe234SDXV23aqwer  iPhone4  America/Toronto   True             3              True             7

Table 2 shows some example columns of what the data might look like. This is a heavily condensed version of the database; in reality the data is a lot larger, and the data in Table 2 is made up. As mentioned, the columns are also a concatenation of what can be seen in several different files; in reality, game progress and device are not saved in the same file.
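Since the userID works as a primary key across files, the per-user files can be joined on it. A small sketch of such a join (the file contents and column names are invented for the example):

```python
# Two per-user "files", both keyed by the same userID.
devices = {"sdf45ER234gFGGFGH": {"device": "iPhone7"},
           "qwe234SDXV23aqwer": {"device": "iPhone4"}}
progress = {"sdf45ER234gFGGFGH": {"track_one_lost": 8},
            "qwe234SDXV23aqwer": {"track_one_lost": 7}}

def join_on_user_id(*tables: dict) -> dict:
    """Merge several userID-keyed tables into one record per user."""
    merged: dict = {}
    for table in tables:
        for user_id, columns in table.items():
            merged.setdefault(user_id, {}).update(columns)
    return merged

users = join_on_user_id(devices, progress)
print(users["qwe234SDXV23aqwer"])  # {'device': 'iPhone4', 'track_one_lost': 7}
```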


3 Machine learning algorithms

Machine learning is teaching computers to do things without being explicitly programmed to do so [13]. Machine learning has developed from pattern recognition and is today used in several different fields. Banks use it to gain insights into investment opportunities, websites use it to recommend items users might like based on previous purchases, and transportation services analyse data to find patterns that make routes more efficient, and therefore profitable [5].

In this thesis, three different machine learning algorithms were used, which are explained in detail below.

3.1 Bayesian network

A Bayesian network is used to find patterns of influence among a set of variables [12]. A Bayesian network consists of a directed acyclic graph where the nodes are random variables, like attributes or features, and the arcs between nodes represent dependencies between them. Since an arc has a direction, it indicates that A 'causes' B when the arc goes from A to B [14]. Bayesian networks are also called probabilistic networks because they use classic probabilistic calculus.

The probability of an event A, denoted P(A), is a number in the interval [0, 1]. The basic theorems of probabilistic calculus are:

P(A) = 1 if and only if A is certain

If A and B are mutually exclusive then:

P(A ∨ B) = P(A) + P(B)

A basic concept in Bayesian networks is conditional probability. A lot of statements have unsaid prerequisites; "the probability of a die turning up 6 is 1/6" has the unsaid prerequisite that the die is fair. This is denoted

P(A | B) = x

Given the event B, the probability of A is x. This means that if B is true and everything else known is irrelevant for A, then P(A) = x.

Another fundamental rule for probability calculus is:


P(A | B) P(B) = P(A, B)

where P(A, B) is the probability of both A and B occurring, that is A ∧ B. From this it follows that

P(A | B) P(B) = P(B | A) P(A)

P(B | A) = P(A | B) P(B) / P(A)

If the formula is dependent on a context C, it becomes like equation 3.2 [10]:

P(A | B, C) P(B | C) = P(A, B | C)    (3.1)

P(B | A, C) = P(A | B, C) P(B | C) / P(A | C)    (3.2)

3.1.1 Sprinkler, rain and wet grass

A classical example of a Bayesian network is the cause of wet grass. The wet grass could be caused by either rain or sprinklers, but seldom by both. If the grass is known to be wet and the sprinkler is on, the probability of rain decreases.

        cloudy
       /      \
  sprinkler   rain
       \      /
      wet grass

Figure 2: Bayesian network over weather, sprinkler, rain and wet grass, showing that the wet grass depends on the sprinklers and rain, but not directly on the weather

Figure 2 shows how the wet grass depends on sprinklers and rain.


Table 3 Several tables: first the chance of cloud, then the chance of the sprinklers being on given cloud, then the chance of rain given cloud, and last the chance of wet grass given sprinklers and rain

P(Cloudy=F)  P(Cloudy=T)
0.2          0.8

Cloudy  P(Sprinkler=F)  P(Sprinkler=T)
T       0.9             0.1
F       0.5             0.5

Cloudy  P(Rain=F)  P(Rain=T)
T       0.2        0.8
F       0.9        0.1

Sprinkler  Rain  P(Wet grass=F)  P(Wet grass=T)
F          F     1.0             0.0
T          F     0.1             0.9
F          T     0.1             0.9
T          T     0.01            0.99

Table 3 shows the probabilities for cloudy weather, for sprinklers and rain in the case of cloud or no cloud, and for wet grass depending on sprinklers and rain.

From this we can make predictions: if the sprinklers are on, the grass is probably wet. Abductions, reasoning from effect to cause: if someone slips on the grass, it is probably wet; if the grass is wet, it is more likely that either the sprinkler is on or it is raining. Explaining away: if the sprinklers are known to be on, the likelihood of it also raining is reduced [17].

To calculate something in a Bayesian network, equation 3.2 is used. For example P(W=T | C=T), the probability that the grass is wet given that it is cloudy:

P(W=T | C=T) = Σ_{S∈{T,F}} Σ_{R∈{T,F}} P(C=T, S, R, W=T) / P(C=T)    (3.3)

Because of the conditional dependencies, P(C, S, R, W) = P(C) P(S | C) P(R | C) P(W | S, R).

Therefore equation 3.3 becomes:

Σ_{S∈{T,F}} Σ_{R∈{T,F}} P(C=T, S, R, W=T) / P(C=T)
= [P(T,T,T,T) + P(T,F,T,T) + P(T,T,F,T) + P(T,F,F,T)] / P(C=T)    (3.4)

= (0.8 · 0.1 · 0.8 · 0.99 + 0.8 · 0.9 · 0.8 · 0.9 + 0.8 · 0.1 · 0.2 · 0.9 + 0) / 0.8 = 0.7452    (3.5)

In equation 3.5 the last term becomes 0 because P(W=T | R=F, S=F) is 0. The result 0.7452 means that P(W=T | C=T) is about 75%.
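The calculation of P(W=T | C=T) can be checked by brute-force enumeration over all states, using the probabilities from Table 3:

```python
from itertools import product

# Conditional probability tables from Table 3 (True/False states).
P_C = {True: 0.8, False: 0.2}
P_S_given_C = {True: {True: 0.1, False: 0.9}, False: {True: 0.5, False: 0.5}}
P_R_given_C = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
P_W_given_SR = {(False, False): 0.0, (True, False): 0.9,
                (False, True): 0.9, (True, True): 0.99}  # P(W=T | S, R)

def joint(c: bool, s: bool, r: bool, w: bool) -> float:
    """P(C, S, R, W) = P(C) P(S|C) P(R|C) P(W|S,R)."""
    p_w = P_W_given_SR[(s, r)]
    return P_C[c] * P_S_given_C[c][s] * P_R_given_C[c][r] * (p_w if w else 1 - p_w)

# P(W=T | C=T) = sum over S, R of P(C=T, S, R, W=T), divided by P(C=T).
numerator = sum(joint(True, s, r, True) for s, r in product([True, False], repeat=2))
print(round(numerator / P_C[True], 4))  # 0.7452
```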


3.1.2 Hugin Expert

For this thesis, the program Hugin Expert [9] has been used to create the Bayesian networks. Hugin Expert takes the data, in this case all players and their attributes, and creates a network with both probabilities and correlations between nodes. In the example about weather, sprinklers, rain and wet grass, the input data would be the state of each of the four variables every hour or day.

When training a network it is possible to choose between two different learning algorithms,PC and NPC.

PC algorithm

The PC algorithm is named after its inventors [8], the first two authors of Spirtes et al. (2000) [22]. In Hugin, a variant of this algorithm is used. However, the algorithm has some drawbacks. It will generally not be able to determine in which direction nodes should be linked (i.e., that fever causes sweat, and not the other way around), meaning that the result needs to be inspected for links that seem counterintuitive, which requires some knowledge of the domain.

NPC algorithm

NPC stands for Necessary Path Condition and was developed by researchers at Siemens. The basics of the NPC algorithm are the same as for the PC algorithm. The PC algorithm does not work well with limited data sets, and NPC tries to repair this deficiency [15].

When PC is uncertain of what direction an arc should have, it will simply choose one at random, whereas NPC lets the user interact and choose the direction [3].

Both PC and NPC produce correct structures when the data is infinite and the test sets are perfect. However, this is often not the case, and with limited data the algorithms often derive conditional dependencies that are not correct, or leave out other dependencies.

In most cases the NPC algorithm is therefore the one to prefer over PC, as it will better map the relations represented in the data, especially when the dataset is small [16].

3.2 Neural network

There are a lot of problems that are easy for a computer to solve but hard for humans, for example taking the square root of a large number. There are also a lot of problems that are easy for a human to solve but really hard for a computer. For example, show a kid a picture of a cat and a dog and he or she will quickly be able to tell which one is which; for a computer this is really hard [21]. Neural networks are based on the way the human brain performs computations. They are great at fitting nonlinear functions and recognizing patterns, the kind of problems that are easy for humans but hard for computers to solve [7].

An artificial neural network is built in layers. Every layer consists of neurons, or nodes: first one input layer, then a few hidden layers and a final output layer. Between the layers are connections, each of which has an associated number called a weight [24]. In figure 3 a small neural network is shown. If the output layer has only one node, the network can make a prediction that is a continuous number; with several nodes in the output layer it can instead work as a classifier. This network has one input layer with three neurons, one hidden layer with five neurons and one output layer with three neurons.


Figure 3: A neural network with one hidden layer.

The output of a node in the hidden layer or the output layer is the weighted sum of all its inputs, passed through an activation function:

h = σ( Σ_{i=1}^{n} w_i x_i + θ )    (3.6)

θ is the bias added to the weighted sum [11], and σ is an activation function. A common activation function is the sigmoid, which is also the one used during this thesis, but others can be used as well, such as the arc tangent or the hyperbolic tangent.

σ(u) = 1 / (1 + exp(−u))    (3.7)
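Equations 3.6 and 3.7 can be sketched directly in code; the weights, inputs and bias below are made-up values, not parameters from the thesis:

```python
import math

def sigmoid(u: float) -> float:
    """Equation 3.7: the sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-u))

def node_output(inputs: list, weights: list, bias: float) -> float:
    """Equation 3.6: weighted sum of the inputs plus bias, through the activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

# With all-zero weights and zero bias, the weighted sum is 0 and sigmoid(0) = 0.5.
print(node_output([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], 0.0))  # 0.5
```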

The key element of neural networks is their ability to learn, or adapt. This is done by changing the values of the weights. If the output from the network is poor, it is possible to alter the weights to fit the training data better. There are several strategies to train the network.

Supervised learning

Supervised learning has some kind of cheat sheet with the answers. If the network is supposed to recognize handwritten numbers in a picture, it will first make its initial guess, then compare it to the right answer, make adjustments depending on the error, and repeat this process until all guesses are correct (or some other stop criterion, like a maximum number of iterations, is fulfilled).

An example of a supervised learning technique is the back propagation algorithm, explainedin section 3.2.1.


Unsupervised learning

Unsupervised learning is required when the training set does not provide the correct answers. Here the neural network will instead rely on clustering the data set into small groups that are similar.

Reinforcement learning

Reinforcement learning is common in robotics. A robot will do something, say turn left or right in a maze, and observe the new environment. If the new observation is always negative when turning left, and always positive when turning right, the robot will over time learn to always turn right.

3.2.1 Back propagation

The most commonly used algorithm for training neural networks, when the correct answers are provided, is back propagation. The basic idea behind back propagation is the repeated application of the chain rule: compute how much each weight influences the network with respect to an arbitrary error function E, and minimize the error function using gradient descent [19]:

w_ij(t+1) = w_ij(t) − ε ∂E/∂w_ij    (3.8)

where w_ij is the weight between nodes i and j, and ε is the learning rate. The learning rate plays a big part in the success of the network: too big a learning rate makes it possible to miss a minimum by jumping over it, while too small a learning rate makes it take too long to find a minimum at all [18].
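A minimal sketch of the weight update in equation 3.8, applied to a single weight of a toy error function E(w) = (w − 2)²; the error function and all values are illustrative only, not the thesis's network:

```python
def grad_E(w: float) -> float:
    """Derivative of the toy error function E(w) = (w - 2)**2."""
    return 2.0 * (w - 2.0)

def train(w: float, learning_rate: float, steps: int) -> float:
    """Repeatedly apply w(t+1) = w(t) - eps * dE/dw (equation 3.8)."""
    for _ in range(steps):
        w -= learning_rate * grad_E(w)
    return w

# With a moderate learning rate the weight converges to the minimum at w = 2.
print(round(train(0.0, 0.1, 100), 3))  # 2.0
```

Rerunning with a learning rate above 1.0 makes the updates overshoot and diverge, which is the "jumping over the minimum" behaviour the text describes.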

3.3 Regression

Regression analysis is used to model the relationship between variables. Regression is used to answer questions such as: does a change in class size affect the students' results? We have an independent variable, X, in the example above the class size, and a dependent variable, Y, the results of the students. The goal is to find some connection between X and Y.


Figure 4: A scatter plot and a regression model. Picture by Christopher Bare [2].

Figure 4 shows an example of a linear regression between an independent variable X and a dependent variable Y. The line drawn through the scattered points is the linear fit to the data. This can later be used to predict Y for different X. It is, however, not necessary to have a linear fit to the data; the relationship might as well be exponential or logarithmic [26].

A regression model consists of an unknown parameter β, an independent variable X and a dependent variable Y, which relate to each other as [20]:

Y = f(X, β)    (3.9)

If β is a vector of length k, it takes at least k observations to solve the system. If there are exactly k observations and the system is linear, it can be solved exactly; with more observations the system is overdetermined. An overdetermined system is almost always inconsistent. To solve it, least squares fitting is commonly used: a mathematical procedure to find the best fitting curve to a set of points [25].

Least squares fitting

Least squares fitting is a mathematical procedure used to find the best fitting curve to a set of points. The best fitting curve is the curve that minimizes the sum of the squared residuals, the distances between the curve and the real values. Using the sum of squares, instead of the sum of absolute values, allows the residuals to be treated as a continuous quantity whose derivative can be found. The disadvantage of using squares instead of absolute values is that an outlying point has a disproportionate effect on the curve [4].
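Linear least squares fitting of a straight line y = a·x + b has a closed-form solution via the normal equations, which can be sketched as follows; the data points are invented for the example:

```python
def fit_line(xs: list, ys: list) -> tuple:
    """Least squares fit of y = a*x + b, minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

# Points lying exactly on y = 2x + 1 are recovered exactly.
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b)  # 2.0 1.0
```

With noisy data the same code returns the line minimizing the squared residuals, which is what the regression experiments in chapter 4 rely on.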

The least squares problem divides into two kinds: linear and non-linear least squares. Linear least squares fitting is simpler and most commonly applied. It provides a solution to the problem of finding the best fitting straight line through a set of points. It is common to transform the data so that it will fit a straight line, say by plotting T vs. √t instead of T vs. t.

Non-linear least squares fitting may be applied iteratively until convergence is achieved. It is often possible to linearise a nonlinear function and therefore avoid an iterative method. Depending on the type of fit and how the initial parameters are chosen, a non-linear fit may have poor convergence [25].


4 Results

This chapter will present the results of the different machine learning algorithms.

For all tests, users were filtered to those who started after 2015-04-20, because a big update happened just before this date that changed the game significantly; users starting before it are therefore not relevant. The users in the tests must also not have been online since 2016-08-30, so that their "lifetime" in the game is over and a lifetime value can be computed.

4.1 Result of neural network

To make sure that the network worked at all, some initial tests were made using data from the Fundamentals of Artificial Intelligence course at Umeå University [1].

The neural network performed badly in this thesis. Initial tests showed that there is a difference in how much users spend depending on what kind of device they are on: someone using an iPhone7 tended to spend more money than someone on an iPhone4. The country someone is from also tended to somewhat indicate whether they would spend money in the game: a user from a rich country tended to spend more money than someone from a poorer country. [1]

Since the correct answer, how much a player has spent, is known, the network was trained using supervised learning. Section 3.2.1 explains the back propagation algorithm, which is the most common way to train neural networks and is also what was used here.

The first neural network was trained on user properties like their device, where they are from, how long they have spent in the game, the number of days they have played, and an average time per day.

A second test was made, training the network on how the player performed on the tracks at the start of the game. The hypothesis is that a player who performs really well will not have to spend any money in the game, and a player who performs really badly most likely does not want to spend any money.

The tests were performed with one or two hidden layers, with between five and 30 nodes in each layer. The number of training examples was between 500 and 5000, and the number of test examples between 100 and 1000.

[1] The game is not released in China, yet has quite a few players there. Those players tend to have really low lifetime value.


However, none of the tests gave any good results. At first the results were horrible: because players who never spend anything at all vastly outnumber players who spend money, the network always guessed 0 spent.

When removing players with 0 spent, the results were still poor. This shows that there is no clear correlation between the input and the amount spent in the game. As said in section 2.3.1, neural networks tend to work best with continuous data, and not all the data used was necessarily continuous. Some data, like the device, was still sorted by when it was made, and probably therefore also by the cost of the device, but perhaps this did not suffice.

4.2 Result of Bayesian network

The Bayesian network performed a lot better than the neural network.

Figure 5: Pictures showing a Bayesian network and different states of it

Figure 5 shows the result of the best Bayesian network. When one of the columns is marked red, only those players are considered. This means that the average time played by a player in the "very low" spending category is 308 h.


Table 4 An overview of the result of the Bayesian network

Spent, $      Time spent [h]
very low      308
low           430
medium        600
high          712
very high     1031

Table 4 shows the same thing as Figure 5, but with a better overview. It is easy to see that the time spent in the game correlates with the money spent in the game in the expected way; more time means more money.
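The conditioning behind Table 4 amounts to selecting the players in one spending category and averaging their playtime. A minimal sketch, with invented player records (only the "very low" average is chosen to reproduce the table's 308 h):

```python
from collections import defaultdict

# Invented player records for illustration; not real Turborilla data
players = [
    {"category": "very low", "hours": 290},
    {"category": "very low", "hours": 326},
    {"category": "low", "hours": 430},
    {"category": "high", "hours": 712},
]

# Group playtimes by spending category, then average within each group
hours_by_category = defaultdict(list)
for p in players:
    hours_by_category[p["category"]].append(p["hours"])

averages = {cat: sum(hs) / len(hs) for cat, hs in hours_by_category.items()}
print(averages["very low"])  # → 308.0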

The other networks, shown in Appendix A, show that if the player does something in the game, for example plays versus mode more, the chance of the player spending more money in the game increases. All of these networks use the NPC algorithm explained in section 3.1.2. During the phase where the user can change the direction of the arrows in the NPC algorithm, nothing was ever changed. This can be seen in several of the networks in Appendix A, where the networks claim that P(X|spent), and not the other way around, which would most of the time be more intuitive.

In these tests, players who have not spent any money at all are filtered out. They make up such a large part of the player base that it was hard to see any difference at all when they were included. This can be seen in Figure 8 in Appendix A.

However, this does not really answer the original question: to find out early in a player's lifetime how much that player will spend in the game. It does give an indication that an engaged player will spend more money in the game, but because of how Bayesian networks work, it was not possible to say exactly how much a player will spend.

4.3 Result of Regression

The regression tests used the aggregated data, which shows what all players starting on a specific day have spent in the game. This is explained further in section 2.3.2.

The plot of how players on average spent money on each day after they started is shown in Figure 6.

Figure 6: Graph of how players spend money, where the x axis is days after installing the game, the blue line is the exact numbers, and the red line is the fitted curve.

In Figure 6 the numbers on the y axis are removed, at Turborilla's wish. The exact numbers are not important in this case anyway. What should be noticed is that the average player spends most of what they will ever spend in the first few days. The fitted curve is a two-term exponential model [6], General model Exp2: model(x) = a*exp(b*x) + c*exp(d*x).

Figure 6 does what is explained in section 3.3, where a line is drawn through a scatter plot. To get the right curve, a version of least squares fitting, explained in section 3.3, called Trust-region-reflective, is used.
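The fitting procedure can be sketched with SciPy's trust-region-reflective least-squares solver. The thesis appears to use MATLAB's curve-fitting tools, so this is only an illustration of the same idea; the daily-spend data and starting guesses below are synthetic, not Turborilla's actual numbers.

```python
import numpy as np
from scipy.optimize import curve_fit

# General model Exp2: a*exp(b*x) + c*exp(d*x)
def exp2(x, a, b, c, d):
    return a * np.exp(b * x) + c * np.exp(d * x)

# Synthetic daily-spend curve: a fast-decaying plus a slow-decaying component
days = np.arange(0, 31, dtype=float)
true_curve = exp2(days, 0.19, -0.35, 0.04, -0.02)
rng = np.random.default_rng(0)
observed = true_curve + rng.normal(0.0, 0.002, size=days.shape)

# Trust-region-reflective least squares, matching the solver named above
params, _ = curve_fit(exp2, days, observed,
                      p0=[0.1, -0.5, 0.05, -0.05], method="trf")
fitted = exp2(days, *params)
print("max residual:", np.max(np.abs(fitted - observed)))
```

The fast component captures the large early spendings and the slow component the long tail, which matches the shape described for Figure 6.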

From the data for every single player, the average lifetime value and the average lifetime were computed. This value was compared to the sum of what the model in Figure 6 predicted the players to spend from day zero to their average lifetime length. These numbers fit extremely well with each other.

Figure 7: Percentage spent every day, together with lower and upper confidence intervals.

From the expected total spend in Figure 6, Figure 7 was created. The x axis shows days after a player installed the game, and the y axis shows the percentage of the player's total spending on the game up until that day: the sum of what they spent up until that day, divided by the average total spending. The numbers are shown in Table 5. The table shows that the average player will already have spent 50% of what they will ever spend by day six.
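The 50%-by-day-six observation can be read directly from the mean column of Table 5. A minimal sketch of that lookup, using the table's published values for days zero through ten:

```python
# Mean column of Table 5: share of lifetime spend reached by days 0-10
mean_fraction = [0.1880, 0.2682, 0.3283, 0.3774, 0.4214, 0.4633,
                 0.5010, 0.5363, 0.5670, 0.5937, 0.6166]

# First day on which the cumulative share passes 50%
half_day = next(day for day, f in enumerate(mean_fraction) if f >= 0.5)
print(half_day)  # → 6
```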

Table 5 can be used to find out whether an advertisement campaign will be successful or not. If the new players have not paid back 50% of the advertisement cost by day six or seven, the campaign will most likely never profit from these new players. This also means that Turborilla never has to run an advertisement for more than about a week before knowing whether it will be profitable, which was the goal from the start.
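The break-even rule above can be sketched as a simple check. The day-six fraction comes from the mean column of Table 5; the advertisement cost and cohort revenue below are invented figures for illustration.

```python
DAY6_FRACTION = 0.5010  # mean share of lifetime spend reached by day six (Table 5)

def projected_lifetime_revenue(revenue_by_day6: float) -> float:
    """Scale the revenue observed by day six up to expected lifetime revenue."""
    return revenue_by_day6 / DAY6_FRACTION

def campaign_will_profit(ad_cost: float, revenue_by_day6: float) -> bool:
    """Rule of thumb from the thesis: a cohort should have repaid about
    50% of the ad cost around day six, or it will most likely never pay off."""
    return projected_lifetime_revenue(revenue_by_day6) >= ad_cost

print(campaign_will_profit(1000.0, 520.0))  # cohort ahead of schedule
print(campaign_will_profit(1000.0, 300.0))  # lagging cohort
```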

Table 5 Percentage of their total spending that a player will have spent by each day

Day   Lower CI   Mean     Upper CI
0     0.1795     0.1880   0.1965
1     0.2573     0.2682   0.2792
2     0.3160     0.3283   0.3406
3     0.3641     0.3774   0.3908
4     0.4074     0.4214   0.4354
5     0.4485     0.4633   0.4781
6     0.4854     0.5010   0.5167
7     0.5198     0.5363   0.5528
8     0.5496     0.5670   0.5843
9     0.5756     0.5937   0.6118
10    0.5981     0.6166   0.6352
11    0.6207     0.6397   0.6586
12    0.6436     0.6626   0.6816
13    0.6665     0.6857   0.7050
14    0.6888     0.7083   0.7278
15    0.7087     0.7288   0.7489
16    0.7263     0.7468   0.7673
17    0.7421     0.7628   0.7836
18    0.7576     0.7785   0.7993
19    0.7731     0.7943   0.8156
20    0.7886     0.8098   0.8309
21    0.8029     0.8245   0.8461
22    0.8159     0.8380   0.8600
23    0.8293     0.8518   0.8742
24    0.8424     0.8651   0.8878
25    0.8538     0.8768   0.8998
26    0.8658     0.8890   0.9123
27    0.8775     0.9010   0.9244
28    0.8897     0.9132   0.9368
29    0.9003     0.9242   0.9481
30    0.9105     0.9346   0.9587

5 Conclusion and future work

This chapter discusses the results, restrictions and future work.

5.1 Conclusion and limitation

This thesis shows that it is possible to find what an average player will spend during their lifetime. The regression model was close to perfect when predicting what an average player will spend during their lifetime. It seems that the way the game is designed makes players spend most of what they will ever spend at the beginning, making it easy to say that a player quickly has to pay back the advertisement cost, even if the player will continue to play for several months or years.

It was never possible to pinpoint what a specific individual will spend during their lifetime, because the algorithms that work on individuals, the neural network and the Bayesian network, did not work as well as hoped. But even though the original question was never answered, the purpose behind the question was: Turborilla's wish was to minimize the risk when using advertisement, which is possible using the regression model.

From the neural network we can draw the conclusion that looking only at how a player plays the game, or where they are from, is not enough to find a correlation to how much they will spend in the game. Neural networks were also fairly time-consuming to construct: the training took quite some time, and finding the right number of hidden layers and the right training speed is also time-consuming. From the Bayesian network we can say that an engaged player will spend more than a non-engaged player.

The Bayesian network gave some great results in showing which attributes correlate to spending. It is unfortunate, though, that the Hugin software requires a licence, and that the results need to be analysed manually instead of producing a simple output saying how much someone will spend.

5.2 Future work

Looking at the regression gives a hint that people seem to spend money in the same way. If this is true, it would be helpful to know when a player bought something, and to see whether early spending correlates with total spending. However, saving the date of every purchase is quite unrealistic; the database files are already huge, even without this.

A somewhat more realistic step, which could make it possible to check whether the regression model is correct, would be to save more than just the first 30 days of a player's spending. Saving 45 or 60 days would make it possible to further verify the model.

Acknowledgements

Thanks to Turborilla for welcoming me into their office and letting me do this thesis. Thanks to my supervisor at the department of computer science for contributing with good ideas, and a special thanks to Simon Sandstrom for your support.

A All other bayesian networks

As explained in section 3.1.2, the NPC algorithm lets the user change the direction of the arrows if necessary. No arrow was ever changed, so the results are always what Hugin guessed. Some arrows therefore seem counter-intuitive.

Figure 8: Showing the difference in spending depending on whether or not a user has customized their bike in any way, for example changed the color of the bike. These pictures also include everyone who never spent money, to the right.

Figure 9: Pictures showing the number of days between installation and last played, and the correlation to spendings.

Figure 10: Showing how wins and losses on the first nine tracks correlate to each other and to spendings

Figure 11: Showing whether a user has played jam or not, and the correlation to spending

Figure 12: Showing what division of jam the player is in, and how this correlates to spending. Note that a lower division is better than a higher.

Figure 13: Showing how the usage of rockets correlates to spendings.

Figure 14: Showing what kind(s) of social media the user has signed into the game with, and how this correlates to spendings.

Figure 15: Showing if the player has used the time attack feature, and how this correlatesto spendings.

Figure 16: Showing logins to Turbonet, Turborilla's server with forums and other features, and the correlation to spendings.

Figure 17: Showing whether a user has automatic throttle or regular, and how this correlates to spendings.

Figure 18: Figures showing how spendings and winning and losing versus mode correlate, when maxing out different spending categories.

Figure 19: Figures showing how winning and losing versus mode correlate to spendings, when maxing out versus lost and won.

Figure 20: Figures showing how the usage of xpboosts correlates to spendings.

Bibliography

[1] Assignment 2 - Happy, Sad, Mischievous or Mad? URL: https://www8.cs.umu.se/kurser/5DV121/HT13/assignment2/index.html (visited on 11/04/2016).

[2] Christopher Bare. Linear regression by gradient descent. June 26, 2012. URL: https://www.r-bloggers.com/linear-regression-by-gradient-descent/ (visited on 11/04/2016).

[3] Riccardo Bellazzi, Ameen Abu-Hanna, and Jim Hunter. Artificial Intelligence in Medicine: 11th Conference on Artificial Intelligence in Medicine in Europe, AIME 2007, Amsterdam, The Netherlands, July 7-11, 2007, Proceedings. Vol. 4594. Springer Science & Business Media, 2007.

[4] Waldemar Dos Passos. Numerical Methods, Algorithms and Tools in C. CRC press,2016.

[5] Evolution of machine learning. URL: http://www.sas.com/en_us/insights/analytics/machine-learning.html (visited on 10/23/2016).

[6] Exponential Models. URL: https://se.mathworks.com/help/curvefit/exponential.html#zmw57dd0e6701 (visited on 11/06/2016).

[7] Martin T Hagan et al. Neural network design. Vol. 20. PWS Publishing Company, Boston, 1996.

[8] Naftali Harris and Mathias Drton. “PC algorithm for nonparanormal graphical models.” In: Journal of Machine Learning Research 14.1 (2013), pp. 3365–3383.

[9] Hugin Expert. URL: http://www.hugin.com/.

[10] Finn V Jensen. An introduction to Bayesian networks. Vol. 210. UCL Press, London, 1996.

[11] Piotr Miernikowski and Krzysztof Ziaja. Biased and non biased neurons. URL: http://galaxy.agh.edu.pl/~vlsi/AI/bias/bias_eng.html (visited on 11/02/2016).

[12] Steffen L Lauritzen and David J Spiegelhalter. “Local computations with probabilities on graphical structures and their application to expert systems”. In: Journal of the Royal Statistical Society. Series B (Methodological) (1988), pp. 157–224.

[13] Machine Learning. URL: https://www.coursera.org/learn/machine-learning (visited on 10/23/2016).

[14] Kevin Murphy. A Brief Introduction to Graphical Models and Bayesian Networks. 1998. URL: http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html (visited on 11/02/2016).

[15] Hugin Help Pages. NPC Algorithm. URL: http://download.hugin.com/webdocs/manuals/Htmlhelp/descr_NPC_algorithm_pane.html (visited on 11/03/2016).

[16] Hugin Help Pages. PC Algorithm. URL: http://download.hugin.com/webdocs/manuals/Htmlhelp/descr_PC_algorithm_pane.html (visited on 11/02/2016).

[17] Judea Pearl. “Bayesian networks”. In: Department of Statistics, UCLA (2011).

[18] Martin Riedmiller and Heinrich Braun. “A direct adaptive method for faster backpropagation learning: The RPROP algorithm”. In: Neural Networks, 1993. IEEE International Conference on. IEEE, 1993, pp. 586–591.

[19] Raul Rojas. Neural networks: a systematic introduction. Springer Science & Business Media, 2013.

[20] C Ronald. Analysis of variance, design, and regression. 1996.

[21] Daniel Shiffman, Shannon Fry, and Zannah Marsh. The nature of code. D. Shiffman,2012.

[22] Peter Spirtes, Clark N Glymour, and Richard Scheines. Causation, prediction, andsearch. MIT press, 2000.

[23] Turborilla. URL: http://www.turborilla.com/.

[24] Sun-Chong Wang. “Artificial neural network”. In: Interdisciplinary Computing in Java Programming. Springer, 2003, pp. 81–100.

[25] Eric Weisstein. Least Squares Fitting. URL: http://mathworld.wolfram.com/LeastSquaresFitting.html (visited on 11/16/2016).

[26] Eric Weisstein. Regression. URL: http://mathworld.wolfram.com/Regression.html (visited on 11/15/2016).
