Predicting the Rugby World Cup with a Log Linear Score Model

Email: Marc.Box@hl.co.uk

Hargreaves Lansdown

Bristol Data Scientists, October 2019

M. Box Predicting the Rugby World Cup

Outline

1 Introduction
    My background in data science
    Rugby World Cup: what and why?

2 Modelling
    Data, ideas, assumptions
    Sketch of the model
    Methods
    Model validation

3 Outcomes
    What does the model tell us?
    Predictions: group stage
    Predictions: knock-out

My Background in Data Science

Mathematics BSc from University of the West of England (genetic algorithms in combinatorial optimisation)
Bristol Centre for Complexity Sciences, University of Bristol
PhD on statistical modelling of spike trains (Box, M., Jones, M.W. and Whiteley, N., 2016. A hidden Markov model for decoding and the analysis of replay in spike trains. Journal of Computational Neuroscience, 41(3), pp.339-366)
Data Scientist at Hargreaves Lansdown

My Background in Data Science
Past attempts at sports modelling and prediction

Horse racing
Rugby
Football

Rugby World Cup
Rugby union - about

Figure: Rugby: Running
Figure: Rugby: A try
Figure: Rugby: Kicking
Figure: Rugby: Tackling

Rugby World Cup
Details of the competition

20 teams
Two stages: group (pool) stage, knockout stage
4 pools of 5 teams each ($4 \times \binom{5}{2} = 40$ matches)
4 quarter finals, 2 semi finals, final, third place playoff (8 matches)
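The match counts on this slide can be checked with a few lines of arithmetic. This is a small sketch, assuming every pair of teams within a pool meets exactly once (a round robin); the variable names are illustrative only.

```python
from math import comb

pools = 4
teams_per_pool = 5

# Round robin within each pool: one match per unordered pair of teams.
pool_matches = pools * comb(teams_per_pool, 2)

# Quarter finals, semi finals, final, third place playoff.
knockout_matches = 4 + 2 + 1 + 1

total_matches = pool_matches + knockout_matches
print(pool_matches, knockout_matches, total_matches)
```

Under that assumption the tournament has 40 pool matches and 8 knockout matches, 48 in total.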

Modelling
Data

Match  Team      Opponent  Score  Feature_1  Feature_2  Feature_k
1      Wales     England   19     2011       1          ...
1      England   Wales     26     2011       0          ...
2      France    Scotland  34     2011       1          ...
2      Scotland  France    21     2011       0          ...
...    ...       ...       ...    ...        ...        ...

Table: Example data

Sources: www.scorespro.com/rugby-union/, www.oddsportal.com/rugby-union/
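The example table stores each match twice, once from each team's point of view, so that every row carries a single score to model. Below is a minimal sketch of that reshaping. The function name, the input tuple layout and the reading of Feature_1 as the match year are assumptions made for illustration; the slides do not say how the table was built.

```python
def to_long_format(matches):
    """Turn one record per match into one row per (match, team)."""
    rows = []
    for match_id, team_a, team_b, score_a, score_b, year in matches:
        rows.append({"match": match_id, "team": team_a, "opponent": team_b,
                     "score": score_a, "year": year})
        rows.append({"match": match_id, "team": team_b, "opponent": team_a,
                     "score": score_b, "year": year})
    return rows


# The two matches shown in the example table.
matches = [
    (1, "Wales", "England", 19, 26, 2011),
    (2, "France", "Scotland", 34, 21, 2011),
]
rows = to_long_format(matches)
for row in rows:
    print(row)
```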

Figure: Scores for and against, all matches (one panel per team: Argentina, Australia, Canada, England, Fiji, France, Georgia, Ireland, Italy, Japan, Namibia, New Zealand, Russia, Samoa, Scotland, South Africa, Tonga, Uruguay, USA, Wales; horizontal axis 2012 to 2020, vertical axis −100 to 100)

Figure: Margin, running mean, all matches (one panel for each of the 20 teams; horizontal axis 2012 to 2020, vertical axis −20 to 40)

Modelling
Modelling decisions

Boys, R.J. and Philipson, P.M., 2019. On the ranking of test match batsmen. Journal of the Royal Statistical Society: Series C (Applied Statistics), 68(1), pp.161-179.

Score of team i against opponent j to be the dependent variable.
Circumstances of the match used as predictive features.
Want to use past performance as a guide, but make it time-dependent.

Modelling
Modelling assumptions

Score of team i in match m is conditionally independent of all other scores in all matches given the circumstances of match m.
Given the circumstances of match m and the year t(m), score of team i in match m is distributed as

\[ S_{i,m} \sim \mathrm{Pois}(\mu_{i,m}) \]

Further assumptions below.

Sketch of the Model
Log-linear mean score model

\[ \mu_{i,m} = f(x_{i,m}, \theta) \in \mathbb{R}_{>0} \]

\[ \log(\mu_{i,m}) = \eta(x_{i,m}, \theta) \in \mathbb{R} \]

\[ \log(\mu_{i,m}) = a_{i,t(m)} - b_{j,t(m)} + \theta_{i,1} x_{i,m} - \theta_{j,2} x_{j,m} \]
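To make the log-linear form concrete, the sketch below evaluates the linear predictor for one team in one match and exponentiates it to get the Poisson mean. The function name and all numeric values are hypothetical, as is the use of a single scalar feature per team; only the formula itself comes from the slide.

```python
import math


def mean_score(attack_i, defence_j, theta_i1, x_i, theta_j2, x_j):
    """Poisson mean for team i against opponent j.

    eta = a_i - b_j + theta_{i,1} * x_i - theta_{j,2} * x_j, mu = exp(eta)
    """
    eta = attack_i - defence_j + theta_i1 * x_i - theta_j2 * x_j
    return math.exp(eta)


# Hypothetical values: attack strength 3.2, opponent defence strength 0.3,
# and one feature per team (for example a 0/1 indicator).
mu = mean_score(attack_i=3.2, defence_j=0.3,
                theta_i1=0.1, x_i=1.0,
                theta_j2=0.05, x_j=0.0)
print(round(mu, 2))
```

Because the mean is the exponential of the linear predictor, it is positive for any parameter values, which is what a Poisson mean requires.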

Sketch of the Model
Hidden autoregressive process for attack strength, defence strength

\[ a_{i,t} = \alpha_{i,1} a_{i,t-1} + \alpha_{i,2} a_{i,t-2} + \varepsilon_{i,t}, \qquad \varepsilon_{i,t} \sim N(0, \sigma_\alpha^2). \]

\[ b_{i,t} = \beta_{i,1} b_{i,t-1} + \beta_{i,2} b_{i,t-2} + \tau_{i,t}, \qquad \tau_{i,t} \sim N(0, \sigma_\beta^2). \]
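The second-order autoregressive recursion can be illustrated by simulating one team's strength forward in time. This is a sketch with made-up coefficients and starting values, not the fitted process from the talk; `simulate_ar2` is a hypothetical helper.

```python
import random


def simulate_ar2(coef1, coef2, sigma, n_steps, init=(0.0, 0.0), seed=0):
    """Simulate a_t = coef1 * a_{t-1} + coef2 * a_{t-2} + eps_t, eps_t ~ N(0, sigma^2).

    init holds the two starting values in time order (a_{t-2}, a_{t-1}).
    """
    rng = random.Random(seed)
    path = list(init)
    for _ in range(n_steps):
        noise = rng.gauss(0.0, sigma)
        path.append(coef1 * path[-1] + coef2 * path[-2] + noise)
    return path


# Hypothetical attack-strength path over 8 further years.
attack_path = simulate_ar2(coef1=0.6, coef2=0.3, sigma=0.1,
                           n_steps=8, init=(3.0, 3.0))
print([round(a, 3) for a in attack_path])
```

The defence strength process has the same form, with its own coefficients and noise variance.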

Sketch of the Model
Hierarchical model

Figure: DAG of the hierarchical model

Fitting the Model
Bayesian inference

Sample from posterior using MCMC (adaptive Gibbs-within-Metropolis).
Algorithm coded in R.
Posterior predictive distribution used for score predictions.

\[ S^{h}_{i,m} \sim \mathrm{Pois}\left(e^{\eta(x_{i,m}, \theta^{h})}\right), \quad h = 1, 2, \ldots, H \]

\[ S_{i,m} = \frac{1}{H} \sum_{h=1}^{H} S^{h}_{i,m} \]
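The posterior predictive step can be sketched as follows: for each posterior draw of the linear predictor, draw one Poisson score, then average the draws to get a point prediction. The talk's algorithm was coded in R; this Python version is only an illustration. The list `eta_draws` stands in for values of η evaluated at MCMC draws, which are not reproduced here, and the Poisson sampler uses Knuth's multiplication method as a stand-in for a library routine.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson variate (Knuth's method; adequate for rugby-sized means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def posterior_predictive(eta_draws, seed=0):
    """One simulated score per posterior draw, plus their average."""
    rng = random.Random(seed)
    score_draws = [sample_poisson(math.exp(eta), rng) for eta in eta_draws]
    point_prediction = sum(score_draws) / len(score_draws)
    return score_draws, point_prediction


# Hypothetical stand-in for eta evaluated at H = 3000 posterior draws.
eta_draws = [2.9, 3.0, 3.1] * 1000
score_draws, point_prediction = posterior_predictive(eta_draws)
print(round(point_prediction, 1))
```

Keeping the full set of simulated scores, rather than only the average, is what makes posterior quantiles and intervals available later.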

Model Validation
Training and posterior predictive checks for validation

Train the model using first T − 1 years’ data. Evaluate performance using the T-th year.
Compute posterior mean and posterior quantiles of score.
Compare with actual score using mean absolute error (MAE) and look at distributions.
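The two validation summaries described here, MAE against the posterior mean and the share of actual scores falling inside a posterior interval, can be sketched as below. The helper names, the simple rank-based quantile and the toy numbers are all assumptions for illustration; they are not the talk's test data.

```python
def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def quantile(sorted_draws, q):
    """Simple rank-based quantile of an already sorted list."""
    index = round(q * (len(sorted_draws) - 1))
    index = min(len(sorted_draws) - 1, max(0, index))
    return sorted_draws[index]


def interval_coverage(actual, draws_per_score, level=0.90):
    """Fraction of actual scores inside the central posterior interval."""
    tail = (1.0 - level) / 2.0
    hits = 0
    for score, draws in zip(actual, draws_per_score):
        ordered = sorted(draws)
        lower = quantile(ordered, tail)
        upper = quantile(ordered, 1.0 - tail)
        hits += lower <= score <= upper
    return hits / len(actual)


# Toy held-out scores, posterior means and posterior predictive draws.
actual = [19, 26, 34]
posterior_means = [21.0, 24.0, 30.0]
draws_per_score = [list(range(10, 31)) for _ in actual]

mae = mean_absolute_error(actual, posterior_means)
coverage = interval_coverage(actual, draws_per_score, level=0.90)
print(round(mae, 2), round(coverage, 2))
```

If the model were well calibrated, the coverage of a 90% interval on held-out scores would be close to 0.90.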

Model Validation
Posterior predictive checks: findings

95% posterior interval corresponds to about 3 tries; 90% interval to about 2.6 tries.
78% of scores fall in the 90% posterior interval for score.

Figure: Test data posterior predictive distribution (score, 0 to 75, for Actual and Predicted)

Figure: 6 Nations 2019 results, test data (two panels, one row per fixture; axis ticks 10 to 40 and 20 to 60)
Panel 1 fixtures: Wales/France, England/Ireland, Italy/Scotland, Ireland/Scotland, Wales/Italy, France/England, England/Wales
Panel 2 fixtures: Scotland/France, Ireland/Italy, Italy/England, Wales/Scotland, France/Ireland, Scotland/England, France/Italy, Ireland/Wales

Model Validation
Posterior predictive checks: findings

MAE in test data: 6.1

        Correct  Incorrect
Home    37       6
Away    20       4

Table: Confusion matrix for predictions, test data
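The confusion matrix can be summarised as an overall hit rate. The sketch below only redoes the arithmetic from the four counts in the table; it assumes each row gives the numbers of correct and incorrect outcome predictions for that row's label.

```python
# (correct, incorrect) counts copied from the table.
confusion = {"Home": (37, 6), "Away": (20, 4)}

correct = sum(c for c, _ in confusion.values())
total = sum(c + w for c, w in confusion.values())
accuracy = correct / total

per_row = {label: c / (c + w) for label, (c, w) in confusion.items()}

print(f"{correct}/{total} correct, accuracy {accuracy:.3f}")
for label, rate in per_row.items():
    print(f"{label}: {rate:.3f}")
```

That is 57 correct out of 67 predictions, roughly 85%.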

Interesting Results
Posterior distribution of attack strength AR process

Figure: Attack strength for each team

Interesting Results
Posterior distribution of defence strength AR process

Figure: Defence strength for each team

World Cup Predictions
Group stage 1

(Plot: one row per fixture; horizontal axis 0 to 150.)

Fixture                     Model’s odds                 Bookmakers’ odds
Russia / Japan              <1/1000, >9999/1, >9999/1    1/100, 48/1, 469/20
Argentina / France          7/1000, 479/1, 210/1         73/100, 497/25, 123/100
Fiji / Australia            6/1000, 592/1, 251/1         11/100, 1633/50, 23/4
South Africa / New Zealand  263/100, 19/1, 5/10          43/100, 533/25, 201/100
Tonga / England             <1/1000, >9999/1, >9999/1    1/100, 543/10, 3013/100
Scotland / Ireland          36/100, 4/1, 20/1            27/100, 2293/100, 31/10
Namibia / Italy             <1/1000, >9999/1, >9999/1    1/100, 219/4, 1823/100
Georgia / Wales             <1/1000, >9999/1, >9999/1    1/100, 2467/50, 81/5
Samoa / Russia              1795/100, 52/1, 1/10         1629/100, 1201/25, 1/50
Uruguay / Fiji              49/100, 26/1, 2/1            1/100, 461/10, 422/25
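Fractional odds like those in the table can be turned back into implied probabilities. The sketch below assumes each entry is odds against in the form a/b, so that the implied probability is b / (a + b), and that the three entries for a fixture cover the three possible results. Both points are assumptions; the slides do not spell out the convention. The helper name is hypothetical.

```python
from fractions import Fraction


def implied_probability(fractional_odds):
    """Convert odds against, written 'a/b', to a probability b / (a + b)."""
    numerator, denominator = (int(part) for part in fractional_odds.split("/"))
    return Fraction(denominator, numerator + denominator)


# Model's odds for one fixture, copied from the table.
model_odds = ["263/100", "19/1", "5/10"]
probabilities = [implied_probability(odds) for odds in model_odds]
total = float(sum(probabilities))

for odds, p in zip(model_odds, probabilities):
    print(odds, round(float(p), 3))
print("sum:", round(total, 3))
```

Under this reading the three probabilities for that fixture sum to about 0.99, which fits three outcomes whose odds were rounded for display.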

World Cup Predictions
Group stage 2

(Plot: one row per fixture; horizontal axis 0 to 100.)

Fixture                 Model’s odds                 Bookmakers’ odds
Canada / Italy          <1/1000, >9999/1, >9999/1    2/25, 1999/50, 7/1
USA / England           <1/1000, >9999/1, >9999/1    1/100, 4771/100, 2007/100
Tonga / Argentina       <1/1000, 4479/1, 2280/1      3/100, 2399/50, 318/25
Ireland / Japan         6669/100, 216/1, <1/10       177/20, 1042/25, 3/50
Namibia / South Africa  <1/1000, >9999/1, >9999/1    -, 1911/25, 5613/100
Wales / Australia       11044/100, 262/1, <1/10      111/100, 1043/50, 21/25
Uruguay / Georgia       50/100, 23/1, 2/1            1/5, 784/25, 37/10
Samoa / Scotland        3/1000, 855/1, 622/1         3/25, 887/25, 269/50
Canada / New Zealand    <1/1000, >9999/1, >9999/1    -, 105/1, 8343/100
USA / France            <1/1000, >9999/1, >9999/1    3/100, 4353/100, 291/25

World Cup Predictions
Group stage 3

(Plot: one row per fixture; horizontal axis 0 to 75.)

Fixture                Model’s odds                 Bookmakers’ odds
Fiji / Georgia         108/100, 17/1, 1/1           251/100, 521/25, 17/50
Russia / Ireland       <1/1000, >9999/1, >9999/1    -, 2247/25, 99/2
Italy / South Africa   7/100, 71/1, 18/1            1/50, 4413/100, 309/20
Argentina / England    <1/1000, >9999/1, >9999/1    1/10, 3331/100, 347/50
Uruguay / Australia    1/1000, 6021/1, 2510/1       -, 6841/100, 1939/50
Samoa / Japan          3/100, 126/1, 56/1           9/100, 2009/50, 168/25
Tonga / France         <1/1000, >9999/1, >9999/1    3/100, 861/20, 611/50
Namibia / New Zealand  <1/1000, >9999/1, >9999/1    -, 181/2, 353/4
Canada / South Africa  <1/1000, >9999/1, >9999/1    -, 8583/100, 159/2
USA / Argentina        1/100, 326/1, 110/1          2/25, 192/5, 141/20

World Cup Predictions
Group stage 4

(Plot: one row per fixture; horizontal axis 0 to 75.)

Fixture              Model’s odds                 Bookmakers’ odds
Fiji / Wales         <1/1000, >9999/1, >9999/1    2/25, 906/25, 703/100
Russia / Scotland    <1/1000, >9999/1, >9999/1    1/100, 273/5, 592/25
Georgia / Australia  1/100, 302/1, 125/1          1/100, 2273/50, 801/50
France / England     4/100, 98/1, 34/1            -
Samoa / Ireland      1/1000, 2886/1, 2482/1       1/50, 159/5, 404/25
Italy / New Zealand  19/100, 37/1, 7/1            -
Canada / Namibia     178/100, 18/1, 7/10          23/25, 16/1, 93/100
Scotland / Japan     3084/100, 115/1, <1/10       73/50, 159/5, 404/25
Tonga / USA          16/100, 44/1, 7/1            161/100, 471/25, 11/20
Uruguay / Wales      <1/1000, >9999/1, >9999/1    -, 58/1, 145/4

World Cup Predictions
Knockout stage: quarter finals and semi finals

(Plot: one row per fixture; horizontal axis 0 to 60.)

Fixture                Model’s odds               Bookmakers’ odds
Australia / England    <1/1000, 7416/1, 3313/1    31/100, 614/25, 139/50
Ireland / New Zealand  278/100, 21/1, 4/10        1/5, 686/25, 197/50
France / Wales         30/100, 19/1, 5/1          7/20, 562/25, 259/100
South Africa / Japan   22140/100, 495/1, <1/10    551/100, 3329/100, 13/100
New Zealand / England  2/100, 182/1, 57/1         239/100, 2307/100, 37/100
South Africa / Wales   34/100, 20/1, 4/1          257/100, 2473/100, 33/100

World Cup Predictions
Predictions for the final and third place playoff

Figure: Predictions: Final and third place

Summary

Poisson model for score: a little underdispersed (as always).
Hidden autoregressive processes filter past performance.
Predictions surprisingly good considering not many predictors.
But not that good (e.g. New Zealand).
Future work: more predictor variables, try negative binomial model.
