DWIT COLLEGE
DEERWALK INSTITUTE OF TECHNOLOGY
Tribhuvan University
Institute of Science and Technology
PREMIER LEAGUE GAME RESULT PREDICTION
A PROJECT REPORT
Submitted to
Department of Computer Science and Information Technology
DWIT College
In partial fulfillment of the requirements for the Bachelor’s Degree in Computer Science and Information Technology
Submitted by
Sumit Shrestha
August, 2016
DWIT College
DEERWALK INSTITUTE OF TECHNOLOGY
Tribhuvan University
SUPERVISOR’S RECOMMENDATION
I hereby recommend that this project prepared under my supervision by SUMIT
SHRESTHA entitled “PREMIER LEAGUE GAME RESULT PREDICTION” in
partial fulfillment of the requirements for the degree of B.Sc. in Computer Science and
Information Technology be processed for the evaluation.
…………………………………………
Sarbin Sayami
Assistant Professor
Institute of Science and Technology
Tribhuvan University
i
DWIT College
DEERWALK INSTITUTE OF TECHNOLOGY
Tribhuvan University
LETTER OF APPROVAL
This is to certify that this project prepared by SUMIT SHRESTHA entitled “PREMIER
LEAGUE GAME RESULT PREDICTION” in partial fulfillment of the requirements
for the degree of B.Sc. in Computer Science and Information Technology has been well
studied. In our opinion it is satisfactory in the scope and quality as a project for the
required degree.
…………………………………… Sarbin Sayami [Supervisor]
Assistant Professor IOST, Tribhuvan University
………………………………………… Hitesh Karki
Chief Academic Officer DWIT College
………………………………………….. Jagdish Bhatta [External Examiner]
IOST, Tribhuvan University
………………………………………….. Rituraj Lamsal [Internal Examiner]
Lecturer DWIT College
ii
ACKNOWLEDGEMENTS
I would like to give my thanks to our highly respected guide Asst. Prof. Sarbin Sayami
for his valuable guidance who gave me great encouragement for this work and his
suggestions were very much helpful. I would like to express my sincere thanks to Mr.
Hitesh Karki, Chief Academic Officer of DWIT College, who helped me directly or
indirectly during this project work.
Sumit Shrestha
TU Exam Roll no: 1817/069
iii
STUDENT’S DECLARATION
I hereby declare that I am the only author of this work and that no sources other than the
listed here have been used in this work.
... ... ... ... ... ... ... ...
Sumit Shrestha
TU Exam Roll No: 1817/069
Date: ........................
iv
ABSTRACT
Prediction is a statement of an uncertain event which uses past data, analyzes it and
predicts the future. This application focuses on predicting game result of the premier
league on the basis of Back Propagation Algorithm. The input parameters chosen were
Home Teams, Away Teams, Home Team Goals, Away Team Goals and Goal
Differences. The accuracy of the system was found to be 47 percent.
Keywords: Prediction, Back Propagation, Premier League
v
TABLE OF CONTENTS
LETTER OF APPROVAL .................................................................................................. i
ACKNOWLEDGEMENTS................................................................................................ ii
STUDENT’S DECLARATION ........................................................................................ iii
ABSTRACT ...................................................................................................................... iv
LIST OF FIGURES .......................................................................................................... vii
LIST OF TABLES........................................................................................................... viii
LIST OF ABBREVATIONS ............................................................................................. ix
CHAPTER 1:INTRODUCTION ......................................................................................... 1
1.1 Background ............................................................................................................... 1
1.2 Problem Statement .................................................................................................... 2
1.3 Objectives .................................................................................................................. 2
1.4 Scope ......................................................................................................................... 2
1.5 Limitation .................................................................................................................. 2
1.6 Outline of Document ................................................................................................. 3
CHAPTER 2: REQUIREMENT ANALYSIS AND FEASIBILITY STUDY ................... 4
2.1 Literature Review ...................................................................................................... 4
2.1.1 Logistic regression concepts ............................................................................... 4
2.1.2 Bayesian network concepts ................................................................................ 5
2.1.3 Probabilistic concepts ......................................................................................... 6
2.1.4 Markov chain and Monte carlo concept ............................................................. 7
2.2 Requirement Analysis ............................................................................................... 8
2.2.1 Functional requirement ....................................................................................... 8
2.2.2 Non-functional requirement ............................................................................... 8
2.3 Feasibility Analysis ................................................................................................... 9
2.3.1 Technical feasibility ........................................................................................... 9
2.3.2 Operational feasibility ........................................................................................ 9
2.3.3 Schedule feasibility............................................................................................. 9
CHAPTER 3: SYSTEM DESIGN .................................................................................... 10
vi
3.1 Methodology ........................................................................................................... 10
3.1.1 Data collection and normalization .................................................................... 11
3.1.2 Training ............................................................................................................ 13
3.1.3 Testing .............................................................................................................. 14
3.1.4 Prediction .......................................................................................................... 16
3.1.5 Algorithm.......................................................................................................... 17
3.2 System Design ......................................................................................................... 18
3.2.1 Sequence diagram ............................................................................................. 18
3.2.2 Event diagram ................................................................................................... 19
3.2.3 Class diagram ................................................................................................... 20
CHAPTER 4: IMPLEMENTATION AND TESTING ..................................................... 21
4.1 Implementation ........................................................................................................ 21
4.1.1 Tools used ......................................................................................................... 23
4.1.2 Description of major classes ............................................................................. 24
4.2 Testing ..................................................................................................................... 27
4.2.1 Unit testing ....................................................................................................... 27
CHAPTER 5: MAINTENANCE AND SUPPORT .......................................................... 29
5.1 Adaptive Maintenance ............................................................................................. 29
5.2 Perfective Maintenance ........................................................................................... 29
CHAPTER 6: CONCLUSION AND RECOMMENDATION ......................................... 30
6.1 Conclusion ............................................................................................................... 30
6.2 Recommendation ..................................................................................................... 30
APPENDIX ....................................................................................................................... 31
REFERENCES .................................................................................................................. 36
vii
LIST OF FIGURES
Figure 1- Project block diagram .......................................................................................... 3
Figure 2 - Use case diagram of Premier League Game Result Prediction ........................... 8
Figure 3 - Activity network diagram of Premier League Game Result Prediction ............. 9
Figure 4 - Method for prediction ....................................................................................... 11
Figure 5 - Neural network of Premier League Game Result Prediction ............................ 16
Figure 6 - Back propagation algorithm .............................................................................. 18
Figure 7 - Sequence diagram of Premier League Game Result Prediction ....................... 18
Figure 8 - Event diagram of Premier League Game Result Prediction ............................. 19
Figure 9 - Class diagram of Premier League Game Result Prediction .............................. 20
viii
LIST OF TABLES
Table 1 - Functional and Non-functional requirement ........................................................ 8
Table 2 - Original data taken from website ....................................................................... 12
Table 3 - Selected data ....................................................................................................... 13
Table 4 - Normalized data .................................................................................................. 13
Table 5 - Sample training data ........................................................................................... 14
Table 6 - Trained result output ........................................................................................... 14
Table 7 - Sample testing data ............................................................................................. 15
Table 8 - Test output result ................................................................................................ 15
Table 9 - Sample trained data used for implementation .................................................... 21
Table 10 - Trained data result used for implementation .................................................... 22
Table 11 - Test data used for implementation ................................................................... 22
Table 12 - Test Data Result Used For Implementation ..................................................... 23
Table 13 - Unit Testing Of Premier League Game Result Prediction ............................... 27
ix
LIST OF ABBREVATIONS
PLS : Pseudo-likelihood Statistic
KNN: K Nearest Neighbors
ANN: Artificial Neural Network
LR: Logistic Regression
MLP: Multilayer Perceptron
JSP: Java Server Page
IDE: Integrated Development Environment
CSS: Cascading Style Sheet
DWIT: Deerwalk Institute of Technology
HTML: Hyper Text Markup Language
Premier League Game Result Prediction
1
CHAPTER 1:INTRODUCTION
1.1 Background
Football is one of the most famous games for the people worldwide. It is a game played
by two teams, who needs to hit the ball on its opponent team’s goal post. The way the
football is getting worldwide and the fans are also crazy about it. Audience wants to know
who wins the game and who loses. Prediction of such football game is hence popular.
Also many people do the betting for the favourite team. They simply see their luck
keeping the money on the team and they actually don’t know who will win and who will
lose. But, it’s actually difficult to predict the result of the game. Many of the football
experts also finds difficult to predict the score line or result of the game. The league that
is famous among all the football lovers is English Premier League where 20 English
teams compete with each other to win the league. (Reilly, Thomas; Gilbourne, 2003)
The craziness is increasing day by day and each time the teams focuses on the league.
The point system of the game is simple whichever team wins get 3 points and the team
that loses get 0 point and if the game is draw then each team will share one point each.
The team with highest point at the last wins the league. Each time different winner is seen
in every year, but the most dominant team is Manchester United with 13 Premier
League’s trophies. The bottom three teams are relegated and replaced by other teams
from lower leagues who perform better. Each team plays every other team twice, once at
home and once away. Premier League is one of the most-watched football (soccer)
leagues today within the world which is the league between the states of England clubs. It
is the league between 20 teams where bottom 3 teams gets relegated to another sub
division league of England. Prediction makes the game more really exciting and focused
to the supporting team for the wait of the chances of the team to win the game. The
prediction will be focused within the teams of English Premier League only. Premier
League Game Result Prediction predicts the result of
Premier League Game Result Prediction
2
the game between two teams performed in every weekend. (F. Halicioglu, 2005)
predicted Euro 2000 winner and conformed that the prediction is possible and hence the
prediction was started.
1.2 Problem Statement
Premier League fans are more excited to know the results of the game. They are more
enthusiastic to know the result before the game. However, the main problem is to make
the accurate result of the prediction between the teams.
1.3 Objectives
The main objective of the Premier League Game Result Prediction is:
a. To predict the result of each game of the team of the premier league.
b. To implement Back Propagation algorithm for training and testing the model.
1.4 Scope
This prediction will be related to English Premier League only.
1.5 Limitation
The limitation of the Premier League Game Result Prediction is:
a. It can't predict the result of leagues and cups matches like Spanish League, French
League, German League, World Cup, Euro Cup, etc.
b. It can't predict the score line.
c. It doesn't contains parameters like individual players details, key players, player
transfer details.
Premier League Game Result Prediction
3
1.6 Outline of Document
The remaining part of the document consists of following:
Figure 1- Project block diagram
Premier League Game Result Prediction
4
CHAPTER 2: REQUIREMENT ANALYSIS AND
FEASIBILITY STUDY
2.1 Literature Review
(Aditya Srinivas Timmaraju, Aditya Palnitkar, VikeshKhanna,2014) estimated that 4.7
billion people watched 2010-11 season. (FarzinOwramipur, ParinazEskandarain, and
Faezeh Sadat Mozneb,2015) used Bayesian Network to predict the score of Spanish
Team, Barcelona. They found the model probability and relationship among the given
domain. They collected information from valid websites that offered the statistics of the
football. They also used two factors for the match prediction psychological and non-
psychological factors through which they can predict the final result.
2.1.1 Logistic regression concepts
(Igiri, Chinwe Peace, Nwachukwu, Enoch Okechukwu, 2014) used Artificial Neural
Network (ANN)and logistic regression (LR) techniques with Rapid Miner as data mining
tool which result of 85% and 93% prediction accuracy respectively. They compared the
existing system of prediction with their system and found that their system were twice
more accurate than the current existing system. They took certain factors that affects the
result of the match like Home advantage effect on team’s performance, the effect of
injuries of key players on team performance, effect of external cup on league
performance. They have seen these factors and related that the effect of the certain factors
can also be seen in the team. (YueWengMak,2013) used a three multilayer perceptron
concept and took factors like last 5 matches of each team and their last 3 encounters
between the team and added home advantage and ranking inputs to predict the result
which predicted that the more the consideration of the factors the more likely the
prediction will be correct.
Premier League Game Result Prediction
5
2.1.2 Bayesian network concepts
(A. Joseph, N.E. Fenton, M. Neil, 2006) used Bayesian network approach to predict the
result of Spurs only. They predicted containing certain factors of Bayesian network like
MC4 Learner, Naïve Bayesian Network, Hugin Bayesian Network, Expert Bayesian
network and KNN (K Nearest Neighbour) algorithm. They used the process of machine
learning with two tangible benefits, understanding and prediction. The MC4 learner
identifies those attributes, which have the largest effect on the outcome of the game. It
shows their relationships to each other in terms of their effect on the outcome of the
game. This is a very simplified model of the game itself. The naïve Bayesian learner
doesn’t construct a model as such its model is predefined. The learning process for the
naïve Bayesian learner is then simply one of discovering the relative strength, and
polarity, of the effect of each attribute with respect to the result. KNN does not construct
a model as such it simply uses the existing data and provides a likeness comparison with
any test data. Thus KNN doesn't significantly enhance our understanding. The expert
constructed Bayesian Network represents the knowledge of the expert, that is, it is a
model is the expert’s belief of the interrelationships between the attributes and their
relative importance.
(Jeffrey Alan Logan Snyder, 2013) reviewed past and current research efforts in soccer
prediction, categorizing their approaches and conclusion. It treats a match as a result-
producing black box, ignoring the noisy, but beautifully complex processes that
contribute to each shot and goal. As XY data makes its way into the hands of researches,
more detailed models of in-game processes may be built. This will in turn open up new
problems and avenues for analysis, hopefully leading to a deeper understanding of the
game itself. It also has begun the process of determining the relative importance of these
types of data in prediction and analysis. They also have presented an approximately
optimal betting strategy for use in betting simultaneously on multiple games with
mutually-exclusive outcomes, which performs substantially better than other strategies
used in academic betting simulations.
Premier League Game Result Prediction
6
2.1.3 Probabilistic concepts
(AdityaSrinivasTimmaraju, Aditya Palnitkar, VikeshKhanna,2014)selected the important
characteristics for a feature they are Incorporate a notion of the competing nature of the
problem, Be reflective of the recent form of a particular team and Manifest the home
advantage factor. They took three approaches and they are:-
a. Approach 1:-The first approach they took was using Multinomial Logistic Regression.
They considered the performance metrics derived from the current match, rather than
taking the average over the last “k” matches. During testing, they predicted the match
outcome of team A vs team B, where they arrived at the feature vector using KPP.
b. Approach 2 :-In second approach, they trained the same way and more precisely, in the
training phase too, instead of using the feature vector as the performance metric vector
corresponding to the current match, they used KPP. This meant that the trained
parameters now inform their beliefs about the result of a match based on the performance
in last “k” matches. In this approach, they also used TGKPP instead of KPP.
c. Approach 3 :-All approaches they tried will find a global set of parameters, which were
independent of the competing teams. So, the given past “k” performances of the team
playing home and the team playing away, our model was agnostic to the identity of the
actual teams playing. They felt they were missing some team-specific trends using this
method. So, in this approach, they trained different models for the different teams.
However, this approach placed a limitation on the data they could use to test/train our
model. They could no longer combine data from two different seasons due to the form of
the team varying between seasons, and major players being traded between teams. So,
due to the limited data, and the increased noise induced by increasing the granularity in
the model, they ended up getting a lower accuracy (an average of 47%).
With these approaches, they proposed a metric, which is computed as the geometric mean
of the predicted probabilities of actual outcomes, which is called PLS. They obtain a PLS
of 0.357 for EPL which had the best value of PLS as 0.36007. (AlbinaYezus, 2014)
divided the work in the steps as choosing match set that is to be analyzed, deciding on key
features, data extraction, testing various machine learning algorithms and improving the
implemented algorithms. He found out that it is possible to find a classifier that predicts
the outcome of soccer matches with the precision of more than 60%.
Premier League Game Result Prediction
7
2.1.4 Markov chain and Monte carlo concept
(Havard Rue and Oyvind Salvesen, 1999) used MCMC (Markov Chain and Monte Carlo
Methods) in which they produced as an irreducible aperiodic transition kernel. They
presented that this approach seems superior to the earlier attempts to model the football
games as it allows for coherent inference of the properties between the teams easily
account for the joint uncertainty in the variables which in important in prediction allows
for doing various interesting retrospective analysis of a season and finally provides a
framework, where is it easy to change parts or parameterization in the model. They also
have improved on the data, parameter estimation, goal model and home field advantage,
which make that the point of the prediction highly rises than before with these
parameters. (DouweBuursma, 2013) found out the selection of the feature set led to the
important insights and the performance of the classifier kept improving the bigger set of
recent matches which showed the optimal number of matches to lie around 10, 20 and
performance keeps improving. The history of opponents of home teams does not seem to
play as important a role as the history of opponents of the away team. He trained a
number of data, which made the system fare, which makes that the analysis can use more
data. The feature set like Match History, Classifiers helps to get the correct model and it
also helps to make the data more useful for the training set. The Classifiers classifies all
matches as home wins, draws or away wins, depending on the features belonging to that
match. He used classifiers like ClassificationViaRegression, MultiClassClassifier,
Rotation Forest, LogitBoost, BayesNet, NaiveBayes and Home Wins. He concluded that
the football matches would always be very hard to predict. (Francisco Louzadal, Adriano
K. Suzuki and Luis E.B. Salasar, 2014) & (Brijesh Kumar Bhardwaj, Saurabh Pal, 2011)
used the data mining concept for the prediction over the improvement using
classification. In data mining process, they first prepared the data over the size of 300 and
they selected only the field that was necessary for data mining. The predictor and
variables were derived from the database and they used Bayesian Classification for the
implementation of the mining model. (GianlucaBaio, Marta A. Blangiardo, 2010) used
Bayesian network and poisson distribution for the analysis of the model where the
parameters defined was home advantage, the scoring intensity. They have showed that the
home team have greater potential to win and the team with the low points are to concede
more goals at home and lose. The estimation of the team was poor when bivariate poisson
was used and the models parameters perform better than bivariate poisson.
Premier League Game Result Prediction
8
2.2 Requirement Analysis
Table 1 - Functional and Non-functional requirement
2.2.1 Functional requirement 2.2.2 Non-functional requirement
Predicts the result of the game between two
teams.
Prediction is based on the basis of team
name.
Compares the result of the data that are
trained.
A result will be shown on which team will
win, lose or draw the game.
The basic functionality of this application is that it predicts the result of the game between
two teams. The results are then compared with the data that are being trained which is
described in Table 1. The non-functional requirement includes displaying of the input
field and a predict result button, then a result will be shown comparing the data with the
trained data which is described in Table 1.
Figure 2 - Use case diagram of Premier League Game Result Prediction
The Figure 2 describes about the use case diagram of the system. The figure illustrates
that user can first select the team names and sees the prediction results.
Premier League Game Result Prediction
9
2.3 Feasibility Analysis
2.3.1 Technical feasibility
Premier League Game Result Prediction is an application based on the implementation of
the Back Propagation algorithm. It uses gradient descent, multilayer perceptron concept
and Java Servlet and Java Server Page (JSP) as a front end to display the content for the
user. All of the technology that are needed by Premier League Game Result Prediction
are available and are open source which can be accessed freely so it is technically
feasible.
2.3.2 Operational feasibility
Premier League Game Result Prediction has a simple and user friendly user interface
which makes it easy to use. The user can see the result by inserting the team names in the
availability of the internet which makes it operationally feasible.
2.3.3 Schedule feasibility
The time allocated of the Premier League Game Result Prediction application is within 47
days and shown as below:
Figure 3 - Activity network diagram of Premier League Game Result Prediction
In Figure 3, we can see that Premier League Game Result Prediction application was
completed within47dayswhich is within schedule and hence it is schedule feasible.
Premier League Game Result Prediction
10
CHAPTER 3: SYSTEM DESIGN
3.1 Methodology
With all the literature review, the most common thing was choosing the correct parameter
is the first way to get the prediction. The more the parameters, the more chances of
getting the result of the prediction correct. The prediction that was difficult for the experts
have made some easy task due to several prediction methods. The parameters or factors
such as home advantage, injuries of the players, cup game effect on league, team recent
form and the head-to-head matches between the opponent need to be analyzed which
adversely affect the result of the match. (Igiri, Chinwe Peace, Nwachukwu, Enoch
Okechukwu, 2014) used the Hidden Markov Process Model and Ordered Probit
Regression model with only three parameters like home advantage, injury of the key
players and cup game effect on league. I will be using the same models but with more
parameters like the team recent form and the head-to-head matches between the
opponents which might affect the accuracy of the prediction then (Igiri, Chinwe Peace,
Nwachukwu, Enoch Okechukwu, 2014). With lots of data, the data-mining tool is used to
extract the information. Artificial Neural Network (ANN) and Regression techniques are
two data-mining techniques that will be used. I will first collect the previous results of the
matches with every history of the team and the teams they played with. Then, from the
collected data will extract the features such as Home and Away Goal Difference, Points,
Attack and Defense Skills, which is not needed for the techniques. Then a collective
database will be built to collect all the necessary data and will be kept in MS Excel
spreadsheet. Using an algorithm will use the parameters to adjust the training algorithms
to make it more useful. The data-mining tool, which holds the better results and improves
the learning rate, also finds the optimal values and an improved model will be built. The
two models Artificial Neural Network (ANN) and Logistic Regression (LR) are used to
make the improved models for the system. So, the record of the match and the result of
the match will be predicted by the use of ANN and LR model. The accuracy was 75.04%
according to (Igiri, Chinwe Peace, Nwachukwu, Enoch Okechukwu, 2014)but after the
Premier League Game Result Prediction
11
addition of the parameters that helps to estimates the model more might increase the
accuracy by 85%.
So, with the known of the several factors, I came to use different input variables, output
variables, algorithms for the process of prediction.
Figure 4 - Method for prediction
Figure 4 illustrates the method that needs to be followed while making the prediction.
First, the data is collected and it is being normalized by using sigmoid function. Then the
normalized data is trained by using neural network and the trained data are being tested
for the prediction accuracy. Then after the testing the prediction can be done.
3.1.1 Data collection and normalization
The data are collected from the official premier league website i.e.
www.premierleague.com& http://www.football-data.co.uk/englandm.php. The premier
league started from 1993 and there are overall 9000 data of the premier league starting
from 1993. The data that are collected contains unnecessary parameters that need to be
filtered out. The necessary data are only collected. Then, the collected data are
normalized. The process adopted for normalizing the data is given below:
Normalized value = (xi−min(x)) / (max(x)−min(x))
Premier League Game Result Prediction
12
After normalization of the data the input and output parameter are selected which have
major impacts on the accuracy of the prediction.
The input variables chosen for training will have a heavy impact on the prediction of the
result. The input variables that were used are
a. Team Name
b. Previous game home score
c. Previous game away score
d. Head-to-head home games
e. Head-to-head away games
The neural network architecture was trained with the given data set with the above five
variables as input parameters. Supervised learning approach was adopted to train the
network and the network can predict the result based on the output combination of three
nodes of output layer. The pattern of the output would determine the result whether it is
the case of win, lose, or the draw game.
Table 2 - Original data taken from website
Table 2 shows the original data taken from the websites which needs to be refined and
only usable parameters are selected.
Premier League Game Result Prediction
13
Table 3 - Selected data
Table 3 shows the refined data with only the necessary parameters but without
normalizing the data. Home Team, Away Team, Home Team Goals, Away Team Goals
and Goal Differences are the necessary parameters chosen from the refined data.
Table 4 - Normalized data
Table 4 shows the refined data that are normalized.
3.1.2 Training
The data that were collected from official premier league website i.e.
www.premierleague.com&http://www.football-data.co.uk/englandm.php of which 80%
of the data were selected in this application for the training purpose. The normalized data
were divided into two parts as an input and output parameter.
Premier League Game Result Prediction
14
Table 5 - Sample training data
Table 5 shows the sample input data that are being trained. The selected data are kept are
the input parameter for the training set.
Table 6 - Trained result output
Table 6 shows the output of the three nodes in the output layer which determines the
result of the game. The above output represents win, loss, and draw depending upon the
pattern of output nodes. The three combinations as 000 represents loss, 010 represents
draw, and 100 represents win case during the prediction.
3.1.3 Testing
From the available 9000 data sets, 20% of the data were selected in this application for
the testing purpose. The normalized data were divided into two parts as an input and
output parameter. Since, it is a supervised learning, the output and input is already known.
Premier League Game Result Prediction
15
Table 7 - Sample testing data
Table 7 shows the sample input data that are being tested. The data are kept are the input
parameter for testing purpose.
Table 8 - Test output result
Table 8 shows the output that are been seen while testing data. The data represents win,
lose and draw. 010 represents draw, 100 represents win and 000 represents lose.These
estimated output condition is then compared with the original target output condition to
check the validity of the system. Output condition given by these test cases data set must
be similar with the target output condition for all the data set of testing data for the
accuracy of the system.
Premier League Game Result Prediction
16
3.1.4 Prediction
After the completion of the training and testing phase, the next phase is prediction which
predicts the result of the game by adjusting weight neural network and by entering the
team names in the input field.
A Multilayer perceptron was chosen because we have to train the input variables and
generate the output variables that are significantly originated by the hidden layers used in
the multilayer perceptron. Multilayer Perceptrons (MLPs) are layered feed forward
network. Multilayer Perceptron helps to make the mapping of input and output but it
requires more set of training data and also requires more time. In this multilayer
perceptron¸ the hidden layers are selected by hit and trail and more number of hidden
layers will have low processing capability. During the training phase, first the hidden
layer is set to the minimal and checking the value of cost functions, the number of hidden
layers and number of nodes in each hidden layers is determined. Finally, the hidden layer
with number of nodes in each layer is determined as such that the cost function reached
minimum and the process is stopped and the network topology is set up for the prediction.
Figure 5 - Neural network of Premier League Game Result Prediction
Figure 5 shows that this application consists of five input parameters as home team, away
team, home team goals, away team goals and goal difference with two hidden layers and
the output layer contains three nodes. The result predicted by this network is listed in the
appendices 1, 2, and 3 respectively.
Premier League Game Result Prediction
17
3.1.5 Algorithm
Back propagation algorithm was used for training the neural network.
Steps for back propagation:
a. Initialization of the weight to random values.
b. Feed the training sample through the network and determine the final output.
c. Computation of the error for each output unit
for unit k, it is
δk = (tk – yk)*f'(yink) = (tk – yk)*f(yink)*[1-(f(yink))]
d. Calculation of the weight correction term for each output unit
for unit k, it is
∆θjk = αδkZj
e. Propagate the delta terms (errors) back through the weights of hidden units where the
delta input for jth hidden unit is
δj= (δink)*f'(Zink) = (δink)*f(Zink)*[1-(f(Zink))]
f. Calculate the weight correction term or parameters for the hidden units
∆θij = αδj xi
g. Update the weights or parameter
θjk (new) = θjk (old)+ ∆θjk
h. Test for the stoppage
Premier League Game Result Prediction
18
Figure 6 - Back propagation algorithm
3.2 System Design
3.2.1 Sequence diagram
Figure 7 - Sequence diagram of Premier League Game Result Prediction
Premier League Game Result Prediction
19
Figure 7 illustrates the sequence diagram of the application Premier League Game Result
Prediction. It gives the sequence of the application from the user to the system and vice-
versa. When the user chooses the team names from the available teams with trained set of
data, it gets interpreted and the system returns the predicted result to the user. Further it is
displayed on the application where admin gives trained and tested result.
3.2.2 Event diagram
Figure 8 - Event diagram of Premier League Game Result Prediction
Figure 8 illustrates the event diagram of the Premier League Game Result Prediction
which displays events and the process that are interrelated to each other. First, admin
trains the data which is registered for the testing purpose and then test is performed. After
the test is performed, a user-friendly User Interface is built and the user enters the team
names into the input fields and clicks the predict button from which the result can be seen
by the user.
Premier League Game Result Prediction
20
3.2.3 Class diagram
Figure 9 - Class diagram of Premier League Game Result Prediction
Figure 9shows the class diagram of this application which consists of a Main Class only
where a function named as predictResult() represents the prediction of the result. The
parameters or variables include Home Team Name, Away Team Name, Home Team
Goals, Away Team Goals and Goal Differences.
Premier League Game Result Prediction
21
CHAPTER 4: IMPLEMENTATION AND TESTING
4.1 Implementation
Premier League Game Result Prediction is the ability to predict the game result. The main
purpose of implementing Premier League Game Result Prediction is to predict result of
matches before the game. The past dataset needs to be trained and tested and with the
trained dataset, we need to predict the result. In the implementation process, we have
around 9000 data of which 80% are used for the training purpose whereas remaining 20%
are used for the testing purpose.
Training is done by using Back Propagation algorithm in octave application where the
data are differentiated into Home Team, Away Team, Home Team Goals, Away Team
Goals and Goal Differences in one file and the result is set in another file. The supervised
learning is done so the test and trained result output must be given before processing the
application.
Table 9 - Sample trained data used for implementation
Table 9 shows the sample input data that are being trained. The data are kept are the input
parameter for the training set.
Premier League Game Result Prediction
22
Table 10 - Trained data result used for implementation
Table 10 shows the output that is seen while data are trained. The data represents win,
lose and draw. 010 represents draw, 100 represents win and 000 represents lose. Testing
is done as per the result obtained from the training sets in octave application where the
output pattern is already known by the application.
Table 11 - Test data used for implementation
Table 11 shows the input data that are being tested. The data are kept are the input
parameter for the testing purpose.
Premier League Game Result Prediction
23
Table 12 - Test Data Result Used For Implementation
Table 12 shows the output that is seen while testing the data. The data represents win,
lose and draw. 010 represents draw, 100 represents win and 000 represents lose.Testing is
done as per the result obtained from training sets in octave where the output pattern is
already known by the application.
4.1.1 Tools used
Different tools, applications and technologies have been used in this project. Some of
them are discussed below.
Octave
It is a high-level interpreted language, primarily intended for numerical computations. It
provides capabilities for the numerical solution of linear and nonlinear problems, and for
performing other numerical experiments. Basically, Octave is used to train and test the
data sets which are useful in predicting the result. Octave uses different functions and
some custom functions which sets up the input, hidden layers and generate output. The
main reason for using octave is it makes easy to interpret the result and used for testing
and training the dataset in the project.
Creately
It is the diagramming and designing software which helps to design the various technical
diagrams. It is a cloud-based diagram tool built on Adobe's Flex/Flash technologies and
provides a visual communication platform for virtual teams. It can be used to create info-
graphics, flowcharts, Gantt charts, organizational charts, website wireframes, UML
Premier League Game Result Prediction
24
designs, mind maps, circuit board designs, doodle art and many other diagram types. In
our project, there are many technical diagrams like UML class diagram, event diagram,
sequence diagram and use case diagram which are built from creately.com which helps to
make the easy diagram simply by drag and drop.
HTML/CSS/JQuery
HTML(Hyper Text Mark Up Language) and CSS(Cascading Style Sheet) is used to give
the view of the application or websites. It is used to make the user interface for the user
which makes them user friendly to use it. JQuery is used to hold the answer, process it
and display the answer in this application.
Java Servlet
It is used to create the web-based application in java using Java Server Page (JSP). In this
project, the training set of data is kept as the theta values where the Java Servlet is used to
interpret it and display the result of the prediction.
IDEA Intellij
It is a Java integrated development environment (IDE) for developing computer software.
The project is written, compiled and run in this software which made easy to write code
of the application.
4.1.2 Description of major classes
Main Class
It is the main class where the main program is included. This class consists of a method
which interpret the values that are coming from the trained data and processing it to the
user input and displaying the output which consists of predictResult() as the main
function
predictResult() method predicts the result of the user input on which team will win the
game with interpreting the values of theta with the help of matrix multiplication.
double [][] theta2Transpose = new double[11][3];
//Transpose of a matrix
Premier League Game Result Prediction
25
for(int i=0;i<3;i++){
for(int j=0;j<11;j++){
theta2Transpose[j][i] = theta2DoubleArray[i][j];
}
}
double [][] finalResult = new
double[firstResultAfterBias.length][theta2Transpose[0].length];
System.out.println(finalResult.length + "x" + finalResult[0].length);
for(int i=0;i<firstResultAfterBias.length;i++) {
for(int j=0;j<theta2Transpose[0].length;j++){
finalResult[i][j] = 0.0;
for(int k=0;k<theta2Transpose[0].length;k++) {
finalResult[i][j] += firstResultAfterBias[i][k] * theta2Transpose[k][j];
}
}
}
double [][] secondModifiedresult = new
double[firstResultAfterBias.length][theta2Transpose[0].length];
for(int i = 0; i<finalResult.length; i++){
for(int j = 0; j <finalResult[0].length; j++){
secondModifiedresult [i][j]= (1 / (1 + Math.pow(Math.E, (-1 * finalResult[i][j]))));
System.out.println(secondModifiedresult[i][j]);}}
The above code shows the interpretation of theta values that are obtained from the
training set. It interprets the theta values doing matrix multiplication. The two
Premier League Game Result Prediction
26
dimensional matrix multiplications is used for analyzing the theta values and the result is
being displayed. For the multiplication of matrix, the size of the multidimensional array
are determined and if it doesn't matches the matrix as of the theta value then transpose of
the matrix is used.
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, num_labels,
X, y, lambda);
The code calculates the cost function with given input layer, hidden layer, the output. y
represents what will be the output look like and X is the training set of data.
Premier League Game Result Prediction
27
4.2 Testing
4.2.1 Unit testing
Each unit of the system was tested for its correct and proper functionality.
Table 13 - Unit Testing Of Premier League Game Result Prediction
Test Id. Unit Test
Summary
Expected
Result
Test
Outcome
Evidence
1 Cost
Function
Cost
function
minimizing
test
Cost
function
should be
minimum in
every
iteration
Pass Test 1.1
2 Error Check Target value
test
Output
should be
similar to
target value
of testing
data set
Pass Test 1.2
3 Predicted
Value Check
Output value
test
Output
should be in
pattern
either 000,
100, or 010
Pass Test 1.3
Premier League Game Result Prediction
28
Test Case 1
Test 1.1: Cost function successfully minimized
Input: Values of attributes variables like Home Team, Away Team, Home Team Goals,
Away Team Goals and Goal Differences are provided to the neural network as input
variables.
Output: After 20,000 iterations, cost functions was finally stabilized with a minimum
value of 5.3247832.
Test 1.2: Output for testing dataset successfully implemented
Input: Values of attributes variables like Home Team, Away Team, Home Team Goals,
Away Team Goals and Goal Differences from testing data set was provided as an input
for validating the output of system.
Output: Output obtained with the target values of testing data set.
Test 1.3: Predicted Output Checked.
Input: Two teams were selected from the given team names by the user for the prediction
of the game.
Output: Result obtained meet the criteria of the result as home team wins, away team
wins or draw as shown to the user.
Premier League Game Result Prediction
29
CHAPTER 5: MAINTENANCE AND SUPPORT
This application will be maintained over a period of time to provide the future needs.
Some of the technique for maintenance is:
5.1 Adaptive Maintenance
This application needs to update the data and it needs to be trained within every year
which makes the prediction more accurate.
5.2 Perfective Maintenance
The interface for user can be changed frequently and more user friendly design is
developed. The game result can be predicted with the score line rather than win, lose or
draw.
Premier League Game Result Prediction
30
CHAPTER 6: CONCLUSION AND RECOMMENDATION
6.1 Conclusion
This application estimates the prediction of the result of premier league game between
two teams as win, lose or draw of the game. The result is overall satisfactory with 47% of
the accuracy of the prediction using Back Propagation algorithm.
6.2 Recommendation
This application consists of the few number of parameters which definitely cannot predict
more accurate result. However, more number of attributes cannot improve accuracy of the
prediction. Thus, selection of good attributes can only improve the accuracy of the
prediction system. Hence, the best parameters still need to be selected for the higher
improvement of accuracy of predicting the game result.
Premier League Game Result Prediction
31
APPENDIX
This screenshot shows the home page or the landing page which consists of the current
premier league table standings of the club. In the middle section, the teams are being
selected through the dropdown. The right side consists of the upcoming game fixtures in
the premier league.
Premier League Game Result Prediction
32
This screenshot shows the selection procedure of the teams to predict their results.
Premier League Game Result Prediction
33
This screenshot shows the result of the game when home team wins.
Premier League Game Result Prediction
34
This screenshot shows the result of the game when the result is draw.
Premier League Game Result Prediction
35
This screenshot shows the result of the game when away team wins.
Premier League Game Result Prediction
36
REFERENCES
A. Joseph, N. F. (2006, April 6). Predicting football results using Bayesian nets and other machine learning techniques. p. 10.
Aditya Srinivas Timmaraju, A. P. (2013). Game ON! Predicting English Premier League Match Outcomes.
Blundell, J. D. (2014). Numerical Algorithms for Predicting Sports Results. School of Computing, Faculty of Engineering.
Brijesh Kumar Bhardwaj, S. P. (April, 2011 ). Data Mining: A prediction for performance improvement using classification . (IJCSIS) International Journal of Computer Science and Information Security, .
Buursma, D. (2013). Predicting sports events from past results towards effective betting on football matches. Netherland: University of Twente.
Byungho Min, C. C. A Compound Approach for Football Result Prediction. Seoul, Korea: School of Computer Science and Engineering.
Farzin Owramipur, P. E. (October, 2013). Football Result Prediction with Bayesian Network in Spanish League-Barcelona Team. International Journal of Computer Theory and Engineering, Vol. 5, No. 5 .
Francisco Louzada, A. K. (2014). Predicting Match Outcomes in the English Premier League:Which Will Be the Final Rank? Journal of Data Science , 235 - 254.
Gianluca Baio, M. A. (2013). Bayesian hierarchical model for the prediction of football results. London: University College London.
Håvard Rue, Ø. S. (1997). Prediction and Retrospective analaysis of Soccer Matches in a league. Trondheim, Norway: Norges Teknisk-Naturvitenskapelige Universitet.
Igiri, C. P. (December, 2014). An Improved Prediction System for Football a Match Result . IOSR Journal of Engineering (IOSRJEN) , 12 - 20.
Johanne Birgitte Linde, M. L. (June, 2014). Predicting Outcomes of Association Football Matches Based on Individual Players' Performance. Norwegian University of Science and Technology.
Premier League Game Result Prediction
37
LANGSETH, H. (2014). Beating the bookie: A look at statistical models for prediction of football matches. Trondheim, Norway: Department of Computer and Information Science, Norwegian University of Science and Technology.
Mak, Y. W. (2013). Prediction on Soccer Matches using MultiLayer Perceptron.
Snyder, J. A. (2013). What Actually Wins Soccer Matches: Prediction of the 2011-2012 Premier League for Fun and Profit.
Stylianos Kampakis, A. A. (n.d.). Using Twitter to predict football outcomes.
Yezus, A. (2014). Predicting outcome of soccer matches using machine learning. Saint-Petersburg State University, Mathematics and Mechanics Faculty.