Because It's the Cup: Predicting the Stanley Cup Playoffs
Mason Swofford, Shuvam Chakraborty, Vineet Kosaraju
The Stanley Cup playoffs have long been known for their drama: most games are close, upsets are common, and teams not considered among the best can win the cup. However, current predictions are based on traditional NHL statistics, which are not indicative of success due to luck-influenced outcomes. Our project has two main goals: 1) predict regular season and playoff game results, and 2) construct a gambling agent to optimize returns.
Background Features Models Results & Error Analysis
Data Collection
Two main datasets:
• Regular season & playoff games
• Training set amalgamated from data sources #2 (stats: features) and #3 (game results: labels)
• Each training example is 1 game, label = winning team (0 = A, 1 = H)
• Features for each training example consist of team statistics averaged over past games in the season (see right)
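The feature construction described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout (`home`, `away`, `winner`, `home_stats`, `away_stats`) and the function name are hypothetical, and games are assumed to arrive in chronological order within one season. Games where either team has no earlier games are skipped, since no averages exist yet.

```python
from collections import defaultdict

def season_to_date_features(games, stat_names):
    """Build one (features, label) row per game from each team's averages
    over its earlier games. Record layout is a hypothetical assumption."""
    totals = defaultdict(lambda: defaultdict(float))  # team -> stat -> running sum
    played = defaultdict(int)                         # team -> games played so far
    rows = []
    for g in games:  # assumed chronological
        home, away = g["home"], g["away"]
        # Only emit a row once both teams have at least one earlier game.
        if played[home] > 0 and played[away] > 0:
            feats = {}
            for side, team in (("home", home), ("away", away)):
                for s in stat_names:
                    feats[f"{side}_{s}"] = totals[team][s] / played[team]
            label = 1 if g["winner"] == home else 0  # 0 = away (A), 1 = home (H)
            rows.append((feats, label))
        # Update running totals only after the row is built, so a game's own
        # stats never leak into its features.
        for side, team in (("home", home), ("away", away)):
            for s in stat_names:
                totals[team][s] += g[f"{side}_stats"][s]
            played[team] += 1
    return rows
```

Updating the totals after emitting the row keeps each example restricted to information available before the game was played.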
“Basic” Features

Feature  Description
CF       Corsi For, shot attempts for a team, including blocked shots and shots not on goal
CA       Corsi Against, shot attempts against a team, including blocked and not on goal shots
GF       Goals For
GA       Goals Against
xGF      Expected Goals For, based on quality of shot attempts for
xGA      Expected Goals Against, based on quality of shot attempts against
PENT     Penalties Taken
PEND     Penalties Drawn

“Advanced” Features

Feature  Description
PDO      Shooting % + Save % (rough measure of luck)
FF       Fenwick For (unblocked shot attempts)
FA       Fenwick Against
SF       Shots For (on goal)
SA       Shots Against
xPDO     Expected PDO
dPDO     PDO difference
OZS      Offensive Zone Starts
DZS      Defensive Zone Starts
NZS      Neutral Zone Starts
ZSR      Zone Start Ratio
FOW      Faceoffs Won
FOL      Faceoffs Lost
GVA      Giveaways
TKA      Takeaways
HF       Hits For
HA       Hits Against
%Win     Winning Percentage
Regarding Goal 1, classification models attempted include:
• Logistic, softmax regression
• SVM (rbf, linear, poly, sigmoid)
• ANNs (varying hidden layers, activation functions)
Features were chosen using basic feature selection and PCA. Predicting whether team A wins a playoff series is done with a binomial distribution, where p is the probability that A wins a single game.
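The poster's own formula did not survive in this transcript. One standard way to turn a per-game probability into a series probability with a binomial distribution is sketched below. It assumes a best-of-seven series, a constant p for every game, and independent games; the function name and the `wins_needed` parameter are illustrative, not taken from the poster.

```python
from math import comb

def series_win_prob(p, wins_needed=4):
    """Probability that team A wins a best-of-(2*wins_needed - 1) series,
    assuming each game is won independently with probability p."""
    n = 2 * wins_needed - 1  # 7 games for a best-of-seven
    # Winning the series is equivalent to winning at least `wins_needed`
    # of n games if all n were hypothetically played.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(wins_needed, n + 1))
```

Under these assumptions a series amplifies a per-game edge: a team that is favored in each game is favored more strongly over the whole series.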
Regarding Goal 2, the gambling problem was formulated as a Markov Decision Process.
• State: (currentMoney, game).
• Start state: (initialMoney, 0).
• Action: (money, team). Can bet up to current money on Home/Away; betting amounts discretized.
• T(s, a, s'): Probabilities of transitions are given by our ML model.
• isEnd(s): If we run out of money, or we have reached the last game.
• R(s, a, s'): 1 if we have reached an end state and have greater than or equal to DesiredAmount, and 0 otherwise.
• Discount: Set to 1.
User Parameters:
• Payoff: A number greater than 1 that corresponds to how much you get back for each dollar bet.
• BucketSize: Discretization size for betting.
• DesiredAmount: Minimum money we want to finish with.
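The formulation above can be solved by backward induction over games. The sketch below is one possible reading rather than the authors' implementation: `win_probs` (the model's home-win probability per game) and all function and variable names are assumptions, and a single Payoff is applied to every bet. Because the reward is 1 only when finishing with at least DesiredAmount and the discount is 1, the value of a state is the probability of reaching that target.

```python
from functools import lru_cache

def solve_betting_mdp(win_probs, initial_money, desired_amount,
                      payoff=2.0, bucket_size=100):
    """Return (success probability, best first action) for the betting MDP.
    An action is (bet, team); a bet of 0 has team None."""
    n_games = len(win_probs)

    @lru_cache(maxsize=None)
    def value(money, game):
        # isEnd: out of money, or the last game has been played.
        if money <= 0 or game == n_games:
            return (1.0 if money >= desired_amount else 0.0), None
        # Option of not betting on this game.
        best = (value(money, game + 1)[0], (0, None))
        p_home = win_probs[game]
        bet = bucket_size
        while bet <= money:  # discretized bets up to current money
            for team, p in (("home", p_home), ("away", 1.0 - p_home)):
                win_money = money + bet * (payoff - 1.0)  # net gain on a win
                lose_money = money - bet
                v = (p * value(win_money, game + 1)[0]
                     + (1.0 - p) * value(lose_money, game + 1)[0])
                if v > best[0]:
                    best = (v, (bet, team))
            bet += bucket_size
        return best

    return value(initial_money, 0)
```

For example, `solve_betting_mdp([0.6], 100, 200)` considers a single game in which doubling the bankroll is the only way to reach the target, so the best action is to bet everything on the home team.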
[Figure 1: Game Result Prediction Accuracy using Basic Features. Models: Baseline, Logistic, Softmax, SVM (Rbf), SVM (Poly), SVM (Sigmoid), SVM (Linear). Series: Training Set, Validation Set. Accuracy axis ticks: 44 to 62.]
[Figure 2: Game Result Prediction Accuracy using Basic + Advanced Features. Models: Baseline, Logistic, SVM (Rbf), SVM (Linear), ANN (h=5, relu), ANN (h=15, logistic), ANN (h=5/10, tanh), ANN (h=5/10, identity). Series: Training Set, Validation Set. Accuracy axis ticks: 44 to 62.]
Figures 1, 2: Training and validation accuracies reported using 10-fold cross-validation. For the best model, the accuracy on the test set of playoffs was 54.66% (for reference, ESPN experts were ~51% accurate).
Ablative analysis of basic features demonstrated that more advanced ones were needed; however, even advanced features didn't help. The literature mentions a theoretical limit for predicting the result of a single hockey game due to luck/variability.
This limit of 60-63% was confirmed in a Monte Carlo simulation running 1000 trials, suggesting games can't be directly predicted (right). This model is similar to those used in the NFL.
Conclusions
• Hockey is a very challenging sport to predict due to the variability inherent to the sport.
• Perfect stats could allow reaching the theoretical accuracy limit, but incremental progress is needed.
• Reached 70% using SVM on playoff data, so the model could be fine-tuned.
• Applications in other leagues (other than NHL) or sports (baseball, etc.).
References
Pischedda, Gianni. Predicting NHL Match Outcomes with ML Models. citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.735.795&rep=rep1&type=pdf.
Weissbock, Joshua, et al. Use of Performance Metrics to Forecast Success in the National Hockey League. ceur-ws.org/Vol-1969/paper-06.pdf.
Data sources:
1. Playoff Results
2. Daily Team Stats
3. Betting Odds
[Figure 3: Monte Carlo Simulation of Season. Series: SD from Luck, Observed SD. Axis ticks: 0 to 0.35 and 0 to 100.]
Figure 3: Luck accounts for 73% of results, so when making predictions we can accurately predict 27% of games and guess with 50% accuracy on the other 73%, which gives us a 63.5% ceiling (0.27 + 0.73 × 0.5 = 0.635).
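The arithmetic in the caption, along with a simplified version of a luck-only season simulation, can be sketched as below. This is an illustration under stated assumptions, not the authors' simulation: the league size of 30 teams is assumed, each team's 82 games are simulated as independent fair coin flips rather than as paired matchups, and the 73% luck share is taken as an input because its derivation is not in this transcript.

```python
import random
import statistics

def luck_only_sd(n_teams=30, n_games=82, n_trials=1000, seed=0):
    """Average standard deviation of team win percentage across simulated
    seasons in which every game is a fair coin flip (pure luck)."""
    rng = random.Random(seed)
    sds = []
    for _ in range(n_trials):
        win_pcts = [
            sum(rng.random() < 0.5 for _ in range(n_games)) / n_games
            for _ in range(n_teams)
        ]
        sds.append(statistics.pstdev(win_pcts))
    return statistics.mean(sds)

def accuracy_ceiling(luck_fraction):
    """Best achievable accuracy if the non-luck share of games is predicted
    perfectly and the luck share is a 50/50 guess."""
    return (1.0 - luck_fraction) + 0.5 * luck_fraction
```

With `luck_fraction=0.73`, `accuracy_ceiling` reproduces the 63.5% figure in the caption. Comparing `luck_only_sd()` against the observed spread of win percentages is one way to gauge how much of the standings luck could explain.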
[Figure: MDP example. Dollar amounts shown: $200, $0, $300; + $675, - $200, +/- $0.]
Total: $1475 in 3 games.
“Basic” features: shot attempt/quality features.
“Advanced” features: includes shot attempt features, and adds overall team-based metrics.
[Figure: Accuracy vs. "Closeness" of Game. x-axis ticks: 0.01, 0.02, 0.03, 0.04, 0.05, 0.075, 0.1, 0.2, 0.5. Series: Samples (axis 0 to 1500), Accuracy (axis 0 to 0.8).]
Team Type    Number Games   Accuracy
Both Good    1088           0.5588
Both Bad     1306           0.5628
Good & Bad   3146           0.5950

Team Type                    Accuracy
Both Good Teams              0.524
Both Bad Teams               0.547
One Good Team, One Bad       0.572
Baseline Predictions