Predicting Football Match Results
Vahndi Minah
Machine Learning Methods Comparison


Context
- Professional football is a multi-billion pound industry, and many bookmakers make a lot of money from running odds on each game.
- Organisations such as OPTA collect hundreds of thousands of statistics each year, which they sell to bookmakers to produce their prediction models.
- If a prediction algorithm more accurate than the bookmakers' could be produced using only publicly available data, money could be made without paying OPTA for their information.

Data
- English Premier League, 2003-2015
- 37 teams, 49 referees, 4,329 matches
- Odds for Home Win, Away Win & Draw from 4 bookies
- Additional info from each team's previous match: odds, # yellow and red cards, # corners, # fouls, # shots and # shots on target
- Categorical variables (e.g. team names) converted to dummy variables, as sketched below: 1,173 features in total
- Data from football-data.co.uk
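As a rough sketch of the dummy-variable step with pandas (the column names here are hypothetical stand-ins for the football-data.co.uk fields):

    import pandas as pd

    # Hypothetical example rows standing in for the real match data
    matches = pd.DataFrame({
        'HomeTeam': ['Arsenal', 'Chelsea'],
        'AwayTeam': ['Chelsea', 'Everton'],
        'Referee': ['M Dean', 'H Webb'],
        'HomeShots': [14, 9],
    })

    # One 0/1 column per category level; numeric columns pass through.
    # Applied to 37 teams, 49 referees, etc., this is how a handful of raw
    # columns grows into the ~1,173 features mentioned above.
    features = pd.get_dummies(matches, columns=['HomeTeam', 'AwayTeam', 'Referee'])
    print(features.columns.tolist())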

[1] Understanding Logistic Regression Analysis: Sandro Sperandei, 2013

Models and Software
- Logistic Regression and Random Forest classifiers
- Implemented in Python:
  - Scikit-Learn: machine learning models
  - Pandas: input data conversion and model optimisation
  - NumPy: random settings generation and variation
  - Matplotlib: plotting of results
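A minimal, self-contained sketch of this stack in use; the data here is random placeholder data, not the project's feature matrix:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Placeholder for the 4,329 x 1,173 feature matrix; y has 3 classes
    # (home win / draw / away win).
    rng = np.random.RandomState(0)
    X = rng.rand(200, 20)
    y = rng.randint(0, 3, size=200)

    for model in (LogisticRegression(max_iter=1000),
                  RandomForestClassifier(n_estimators=100)):
        scores = cross_val_score(model, X, y, cv=5)
        print(type(model).__name__, round(scores.mean(), 3))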

Model Types: Logistic Regression
- Iterative linear parametric model
- Relative importance of inputs can be interpreted by inspecting the coefficients
- Can be slow to converge, depending on the chosen number of iterations and regularisation
- Iterative, so not parallelisable
- Assumes independence between inputs

Random Forests
- Combination of decision tree predictors
- Easy to see which features are most important (see the sketch after this list)
- Majority vote wins
- Generalises well with a large number of trees
- More trees = longer run time
- Since the trees are independent, training can use multithreading
- No independence assumption
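For the importance-inspection points above, a sketch of how each model exposes this in scikit-learn (fitted on placeholder data; n_jobs=-1 is the multithreading option for forests):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    X, y = rng.rand(300, 10), rng.randint(0, 3, size=300)

    # Logistic regression: one coefficient per (class, feature) pair;
    # inspect the magnitudes to gauge relative input importance.
    lr = LogisticRegression(max_iter=1000).fit(X, y)
    print(np.abs(lr.coef_).mean(axis=0))

    # Random forest: trees are independent, so training parallelises
    # across cores; impurity-based importances sum to 1.
    rf = RandomForestClassifier(n_estimators=200, n_jobs=-1).fit(X, y)
    print(rf.feature_importances_)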

Hypothesis Statement
An ensemble classifier with no assumption of independence between inputs will outperform a classifier which assumes independence at predicting football match results.

Model Parameters

Logistic Regression:
- Penalisation norm: l1 / l2
- Inverse regularisation strength: positive float
- Fit intercept: bool
- Intercept scaling: float
- Class weights: 3 floats
- Maximum iterations: int
- Solver: newton-cg / lbfgs / liblinear
- Multi-class: ovr / multinomial

Random Forest:
- Number of trees: int
- Split criterion: gini / entropy
- Min samples split: int
- Min leaf samples: int
- Bootstrapping: bool
- OOB score: bool
- Class weights: 3 floats
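Read alongside scikit-learn, the spaces above correspond roughly to the constructor arguments below; the candidate values are illustrative guesses, since the slides do not give the ranges actually searched:

    # Logistic Regression (sklearn.linear_model.LogisticRegression)
    lr_space = {
        'penalty': ['l1', 'l2'],              # penalisation norm
        'C': [0.01, 0.1, 1.0, 10.0],          # inverse regularisation strength
        'fit_intercept': [True, False],
        'intercept_scaling': [0.1, 1.0, 10.0],
        'max_iter': [100, 500, 1000],
        'solver': ['newton-cg', 'lbfgs', 'liblinear'],
        'multi_class': ['ovr', 'multinomial'],
    }

    # Random Forest (sklearn.ensemble.RandomForestClassifier)
    rf_space = {
        'n_estimators': [50, 100, 300],       # number of trees
        'criterion': ['gini', 'entropy'],     # split criterion
        'min_samples_split': [2, 5, 10],
        'min_samples_leaf': [1, 5, 10],
        'bootstrap': [True, False],
        'oob_score': [True, False],
    }

    # The per-class weights (3 floats) map to the class_weight argument,
    # e.g. class_weight={0: 1.0, 1: 2.0, 2: 1.5}, sampled separately.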

Training Strategy
2 rounds of training:
- Round 1: randomised [grid] search
- Round 2: optimisation of the top 100 models
Pros:
- Less user intervention
- Relatively independent testing of each parameter
Cons:
- Slightly more complicated implementation
- Longer run-times
- More post-processing
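One way to approximate the two rounds with scikit-learn is RandomizedSearchCV for the search, followed by a second search over narrowed ranges around the best settings found; this is a hedged sketch, not the project's custom control loops:

    import numpy as np
    from scipy.stats import randint
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RandomizedSearchCV

    rng = np.random.RandomState(0)
    X, y = rng.rand(300, 10), rng.randint(0, 3, size=300)

    # Round 1: randomised search over wide ranges.
    search = RandomizedSearchCV(
        RandomForestClassifier(),
        param_distributions={'n_estimators': randint(50, 300),
                             'min_samples_leaf': randint(1, 20)},
        n_iter=30, cv=3, random_state=0)
    search.fit(X, y)
    best = search.best_params_

    # Round 2: re-sample within narrow ranges around the best settings,
    # playing the role of the "Random Settings Variator" in the diagram below.
    narrowed = {
        'n_estimators': randint(max(10, best['n_estimators'] - 25),
                                best['n_estimators'] + 25),
        'min_samples_leaf': randint(max(1, best['min_samples_leaf'] - 2),
                                    best['min_samples_leaf'] + 3),
    }
    refine = RandomizedSearchCV(RandomForestClassifier(), narrowed,
                                n_iter=30, cv=3, random_state=1)
    refine.fit(X, y)
    print(search.best_score_, refine.best_score_)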

Training Implementation
[Diagram: two control loops around a CV Model Trainer. Control Loop 1 (Random Search): a Random Settings Generator feeds the CV Model Trainer, with outputs written to Results/Settings Store 1. Control Loop 2 (Optimisation): a Random Settings Variator, subject to range constraints, feeds the CV Model Trainer, with outputs written to Results/Settings Store 2.]

Numbers of Training Runs

Algorithm           | Random Search Runs | Optimisation Runs
--------------------|--------------------|------------------
Logistic Regression | 3,000              | 3,000
Random Forest       | 3,000              | 10,000

Parameters: Class Weights
- Common to both model types
- "auto" option to set weights according to class sizes (see the sketch below)
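What the auto option computes is weights inversely proportional to class frequencies; a minimal sketch of the calculation (current scikit-learn calls this option 'balanced'):

    import numpy as np

    def auto_class_weights(y):
        # Weights inversely proportional to class frequency:
        # n_samples / (n_classes * count_per_class)
        classes, counts = np.unique(y, return_counts=True)
        return dict(zip(classes, len(y) / (len(classes) * counts)))

    y = np.array(['W', 'W', 'W', 'D', 'L', 'L'])
    print(auto_class_weights(y))  # {'D': 2.0, 'L': 1.0, 'W': 0.667}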

RF Parameters: Split Samples
- Min # samples to split a node or create a leaf
- Avoid low settings for Min Leaf Samples

RF Parameters: Forest Size
- # Trees and max # features to consider at a split
- Minimum # of estimators: 50
- A lower # of split features seems to be better; this is consistent with Breiman's finding that one or two features gives near-optimum results [2]

[2] Random Forests: Leo Breiman, 2001

RF: Categorical Parameters
- Gini index with bootstrapping and no out-of-bag samples performs best (by a slim margin)

LR: Categorical Parameters
- Categorical parameters do not appear to have a significant effect on the quality of results, although the liblinear solver with the ovr multi-class option and l1 regularisation has high variance
- This seems to contradict Ng's result that l1 regularisation performs better than l2 regularisation in the presence of irrelevant features [3]

[3] Feature Selection, L1 vs. L2 Regularization and Rotational Invariance: Andrew Ng, 2004

LR: Regularisation and Intercept Scaling
- Too much regularisation appears to slightly impair model performance
- Intercept scaling has no real effect

Search vs. Optimisation
- The plan was that the optimisation stage would improve results
- In actuality it gave a distribution around the existing best results, perhaps due to the large number of search runs

Results

Logistic Regression (% of all matches; rows = actual, columns = predicted):

            | Pred. Win | Pred. Draw | Pred. Loss
Actual Win  | 28.6%     | 10.7%      | 9.4%
Actual Draw | 9.8%      | 6.7%       | 8.3%
Actual Loss | 3.5%      | 6.4%       | 16.6%
Accuracy    | 68.3%     | 28.2%      | 48.5%

Overall accuracy: 52.0%

Random Forest (% of all matches; rows = actual, columns = predicted):

            | Pred. Win | Pred. Draw | Pred. Loss
Actual Win  | 31.8%     | 9.8%       | 7.2%
Actual Draw | 11.8%     | 6.2%       | 6.8%
Actual Loss | 5.3%      | 5.8%       | 15.4%
Accuracy    | 65.0%     | 28.6%      | 52.4%

Overall accuracy: 53.3%
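Tables like the two above can be reproduced with sklearn.metrics; a sketch with placeholder predictions (rows actual, columns predicted, matching the layout above):

    import numpy as np
    from sklearn.metrics import accuracy_score, confusion_matrix

    labels = ['Win', 'Draw', 'Loss']
    y_true = np.array(['Win', 'Draw', 'Loss', 'Win', 'Win', 'Loss'])
    y_pred = np.array(['Win', 'Win', 'Loss', 'Win', 'Draw', 'Loss'])

    cm = confusion_matrix(y_true, y_pred, labels=labels)
    print(100 * cm / cm.sum())                   # % of all matches
    print(100 * cm.diagonal() / cm.sum(axis=0))  # per-predicted-class accuracy
    print(100 * accuracy_score(y_true, y_pred))  # overall accuracy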

Analysis: Random Forest
- Referees and Home / Away Teams not important (