MATLAB COMPUTATIONAL FINANCE CONFERENCE 2017
Quantitative Sports Analytics using MATLAB
Robert Kissell, [email protected]
September 28, 2017
Important Email and Web Addresses
• AlgoSports23/MATLAB Competition
Are you smarter than the Algo?
Email: [email protected]
Website: AlgoSports23.com
Please check the website for data updates, and contact [email protected] for further information.
Presentation Outline
• Quantitative Sports Modeling
• Modeling Techniques from:
• “Optimal Sports Math, Statistics, and Fantasy”
• Probability Models
• Rank Sports Teams
• Estimate Winning Probability
• Calculate Winning Margin
• Computing Probability of Beating a Spread
• AlgoSports23/MATLAB Competition
• Are you smarter than the Algo!
Transaction Cost Analysis and Algorithmic Trading
• A suite of TCA Models and Optimizers has been fully integrated into MATLAB’s Trading Toolbox.
• These tools are being used for Algorithmic Trading and Portfolio Management.
• These include:
• Market Impact Estimation
• Pre-Trade
• Post-Trade
• Trade Schedule Optimization
• Liquidation Cost Analysis
• Portfolio Optimization with TCA
• Various Libraries are Available
• Access to a full suite of TCA libraries and MI Data is available upon request.
• Contact: [email protected] or [email protected]
Optimal Sports Math, Statistics, and Fantasy
Key items addressed include:
• Accurately rank sports teams
• Compute winning probability
• Demystify the black-box world of computer models
• Provide insight into the BCS and RPI selection process.
• Select optimal mix of players for a fantasy league competition
• Evaluate player skill and forecast future player performance
• Select team rosters
• Assist in salary negotiation
• Determine Hall of Fame eligibility
• Sabermetrics on Steroids!
What is Quantitative Finance?
• Quantitative Finance is the application of methods and analyses
from the different sciences to solve financial problems.
• These include: Math, Statistics, Physics, Engineering, Economics,
Computer Science, Biology, Psychology, Business, etc.
• Quantitative Finance is all about proper utilization of the
“Scientific Method” and drawing statistically significant
conclusions.
Scientist or Engineer
• A Scientist is someone who “loves” surprises. This is an
opportunity to learn and make further advancements. The goal is
to learn, improve, and progress.
• An Engineer is someone who “hates” surprises. Surprises are
usually an indication that something “failed” or went wrong and
often result in a loss or slowing of progress.
What about a Quant?
• A Quant is someone who learns from a proper application of the
scientific method by finding “Scientific” surprises and “profit”
opportunities.
• Quants go to great lengths to learn the cause of these
surprises and to ensure that these relationships are statistically
significant.
• Quants then seek to implement these scientific surprises without
suffering any “Engineering” surprises and losses.
The Scientific Method in Practice
• Scientist: Data, Data, Data → Statistically Significant Conclusion
• Attorney: Desired Outcome → Find supporting data (Data Mining): Data, Data, Data
• Doctor: Educated Guess → Test Data (Worst-Case Scenario?): Data?, Data?, Data?
Moral of the Story:
Be a Scientist!
Don’t be that Anti-Scientist!
Quantitative Sports Modeling
What is Quantitative Sports Modeling?
• The application of quantitative tools and analytics, and sound
scientific methods, to sports related problems and questions.
• Quantitative sports modeling uses the same tools as
quantitative finance: mathematics, statistics,
engineering, machine learning, economics, business, etc.
• Sports Modeling is based on the same framework as Quantitative
Finance, but solves a different set of problems.
What do we want to solve?
• Expected Winning Team
• Probability of Winning
• Expected Winning Margin
• Probability of Beating a Specified Margin
• Future Player Performance
• Roster of Players (Best set of Complementary Players)
• Best Mix of Players given Opponent
• Salaries & Salary Negotiation
Sports Modeling Data: What we want to Predict (LHS)
• Win/Loss
• Win Margin
• Probability of winning by more than X points
• Player Statistics (Fantasy Sports)
• Evaluating Player Ability
• Roster Selection
• Salary and Salary Negotiations
• Line-up and Match-ups
• Player Trades
• Hall of Fame Selection
Sports Modeling Data: Explanatory Factors Data (RHS)
• Win/Loss Result
• Game Scores
• Game Data
• Team Statistics (AVG, OBP, ERA, HR, Comp. Ratio)
• Venue Location (Home Field Advantage)
• Momentum
• Players, Injuries
• Career Statistics
• Salary
• Age
• Teammates & Roster
• Principal Component Analysis
Different Sports Prediction Models
• Probability Models
• Non-Linear Regression
• Non-Parametric Statistics
• Neural Networks / Machine Learning
• Sabermetrics on Steroids!
Head-to-Head Competitions – How do we Rank Teams
[Figure: diagram of teams A, B, C, D, E, F]
Ranking: A; B & C; D & E; F
Head-to-Head Competitions – How do we Rank Teams
[Figure: diagram of teams A, B, C]
Ranking: A, B, C
Head-to-Head Competitions – How do we Rank Teams
[Figure: diagram of teams A, B, C, D, E, F, G]
Ranking: A & G; B & C; D & E; F
Ranking: A; B & C & G; D & E; F
Head-to-Head Competitions – How do we Rank Teams
[Figure: diagram of teams A, B, C, D, E, F, H]
Ranking: A; B & C; D & E; F & H
Ranking: A; B & C; D & E & H; F
Sports Models To Discuss Today
Probability Models: Probability (X>Y)
• Power Function: $\frac{\lambda_x}{\lambda_x + \lambda_y}$
• Logit Regression: $b_0 + b_h - b_a = \ln\left(\frac{F^{-1}(z)}{1 - F^{-1}(z)}\right)$
• In probability models, the LHS variable is (0,1)!
Power Function
The Power function is derived from the Exponential Distribution.
Let,
$f(x) \sim \lambda_x e^{-\lambda_x t}$
$f(y) \sim \lambda_y e^{-\lambda_y t}$
Then,
$\mathrm{Prob}(x > y) = \frac{\lambda_x}{\lambda_x + \lambda_y}$
where $\lambda_k$ = Team “k” Rating
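One way to read this step, offered as an assumption and not as the speaker's own derivation: treat $T_x$ and $T_y$ as independent exponential waiting times with rates $\lambda_x$ and $\lambda_y$, and say team x "wins" when its event arrives first. The standard competing-exponentials result then gives the same ratio:

```latex
\begin{align*}
\Pr(T_x < T_y)
  &= \int_0^\infty \lambda_x e^{-\lambda_x t}\,\Pr(T_y > t)\,dt \\
  &= \int_0^\infty \lambda_x e^{-\lambda_x t}\,e^{-\lambda_y t}\,dt \\
  &= \lambda_x \int_0^\infty e^{-(\lambda_x + \lambda_y)t}\,dt
   = \frac{\lambda_x}{\lambda_x + \lambda_y}.
\end{align*}
```

Under this reading a larger rating means a faster arrival rate, so the higher-rated team is the one more likely to arrive first.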
Power Function with Home Field Advantage
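Let X be Home Team
$\mathrm{Prob}(X > Y) = \frac{\lambda_x + \lambda_0}{\lambda_x + \lambda_y + \lambda_0}$
Let Y be Away Team
$\mathrm{Prob}(Y > X) = \frac{\lambda_y}{\lambda_x + \lambda_y + \lambda_0}$
$\lambda_k$ = Team “k” Rating
$\lambda_0$ = Home Field Advantage (HFA)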
Power Function: Solving Parameters
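Function
$G = \begin{cases} \dfrac{\lambda_x + \lambda_0}{\lambda_x + \lambda_y + \lambda_0} & \text{if home team wins game} \\ \dfrac{\lambda_y}{\lambda_x + \lambda_y + \lambda_0} & \text{if away team wins game} \end{cases}$
Max $L = \prod_i G_i$
Max $\log L = \sum_i \log G_i$
Solve using Maximum Likelihood Estimates (“MLE”)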
Power Function: Estimate Spread
Run Second Regression,
$\mathrm{Spread} = d_0 + d_1 \cdot \mathrm{Probability}$
Results: $d_0$, $d_1$, $se_Y$
MATLAB – Solving Power Function Parameters
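% Power Function Model
% Num   = matrix of winning team and location (HFA if at home)
% Denom = matrix of all teams including HFA
% b0, LB, UB = starting values and lower/upper bounds for the parameters
% options    = fmincon options; all four must be defined before this call
[b,fval,exitflag,output] = fmincon(@(b) myPower(b,Num,Denom),...
    b0,[],[],[],[],LB,UB,...
    [],...
    options);
if exitflag <= 0   % fmincon did not report convergence
    warning('fmincon stopped with exitflag %d',exitflag);
end

function f = myPower(b,Num,Denom)
% Negative log-likelihood: Z holds the model probability of each observed result
Z = (Num*b)./(Denom*b);
f = -sum(log(Z));
end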
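The comments above describe `Num` and `Denom` only in words. The sketch below shows one way those matrices could be laid out so that `(Num*b)./(Denom*b)` reproduces the home-win and away-win probabilities. The toy schedule, the column order (teams first, HFA last), and the trial ratings are all assumptions made for illustration, not data from the talk.

```matlab
% Hypothetical toy schedule: each row is [homeTeam awayTeam homeWon]
games  = [1 2 1;
          2 3 1;
          3 1 0;
          2 1 0];
nTeams = 3;
nGames = size(games,1);
hfaCol = nTeams + 1;                  % assumed layout: last parameter is lambda_0 (HFA)

Num   = zeros(nGames, nTeams + 1);
Denom = zeros(nGames, nTeams + 1);
for g = 1:nGames
    home = games(g,1);
    away = games(g,2);
    Denom(g,[home away hfaCol]) = 1;  % lambda_x + lambda_y + lambda_0
    if games(g,3) == 1
        Num(g,[home hfaCol]) = 1;     % home win: lambda_x + lambda_0
    else
        Num(g,away) = 1;              % away win: lambda_y
    end
end

b = [2; 1; 1; 0.5];                   % arbitrary trial ratings, HFA last
Z = (Num*b)./(Denom*b);               % model probability of each observed result
negLogL = -sum(log(Z));               % the quantity myPower would return for this b
```

With this layout, row 1 gives (2 + 0.5)/(2 + 1 + 0.5) for the home win and row 3 gives 2/(1 + 2 + 0.5) for the away win, matching the two branches of G.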
Steps to Solve Power Function
• Set up Objective Function:
• Estimate Team Ratings using MLE
• Compute Winning Probabilities using Power Function Formula
• Run Regression of Home Team Win Margin (“Spread”) as function of
Predicted Home Team Winning Probability (“Prob”):
• $\mathrm{Spread} = d_0 + d_1 \cdot \mathrm{Prob}$
• This provides:
• 1) Probability that Home Team Wins Game
• 2) Expected Home Team Win Margin
• 3) Teams can be ranked based on Model Parameter (from highest to lowest)
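The regression step in the list above can be sketched with ordinary least squares in base MATLAB. The probabilities and margins here are made-up inputs, and reading `se_Y` as the residual standard error of the regression is an assumption.

```matlab
% Hypothetical inputs: predicted home-win probabilities and actual home margins
prob   = [0.85; 0.62; 0.48; 0.30; 0.71; 0.55];
spread = [14; 3; -1; -7; 10; -3];

n = numel(prob);
X = [ones(n,1) prob];                  % intercept column plus probability
d = X \ spread;                        % OLS estimates [d0; d1]
d0 = d(1);
d1 = d(2);

resid = spread - X*d;
seY   = sqrt(sum(resid.^2) / (n - 2)); % residual standard error (assumed meaning of se_Y)

predSpread = d0 + d1 * 0.85;           % expected margin for an 85% home favorite
```

The backslash operator is used instead of `regstats` only so the sketch has no toolbox dependency; the fitted line is the same.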
Logit Regression
Logit Regression Model
Start with Logistic Distribution Function:
$\frac{1}{1 + \exp(-(b_0 + b_h - b_a))} = z_1$
$s$ = Home Pts − Away Pts = Home Team Spread, (−inf, +inf)
$z = \frac{s - avg(s)}{stdev(s)}$, (−inf, +inf)
$z_1 = F^{-1}(z) = normcdf(z)$, (0,1)
Logit Regression Model
We transform the logistic function into the logit regression:
$b_0 + b_h - b_a = \ln\left(\frac{z_1}{1 - z_1}\right)$
where $s$, $z$, and $z_1$ are defined as before.
Steps to Solve Logit Spread Regression (Part 1)
• Calculate LHS Spread Values: $s$ = Home Team Spread, (−inf, +inf);
$z = \frac{s - avg(s)}{stdev(s)}$, (−inf, +inf); $z_1 = F^{-1}(z) = normcdf(z)$, (0,1)
• Solve parameters from OLS
• $b_0 + b_h - b_a = \ln\left(\frac{z_1}{1 - z_1}\right)$
• Estimate Home Team Win Margin
• $z_1 = F^{-1}(z) = \frac{1}{1 + \exp(-(b_0 + b_h - b_a))}$
• $z = norminv(z_1)$
• $s = z \cdot stdev(s) + avg(s)$
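As a sanity check on the transform chain in these steps, the sketch below maps a handful of hypothetical margins to the logit scale and back again. The two anonymous functions are base-MATLAB stand-ins for `normcdf` and `norminv` (built from `erfc` and `erfcinv`) so that no toolbox is needed; the margins are invented.

```matlab
s  = [7; -3; 14; -10; 3; 21];                   % hypothetical home-team margins
mu = mean(s);
sd = std(s);

normcdfBase = @(x) 0.5 * erfc(-x ./ sqrt(2));   % standard normal CDF
norminvBase = @(p) -sqrt(2) * erfcinv(2 .* p);  % inverse of the CDF above

% Forward: margin -> z-score -> (0,1) -> logit scale (the OLS left-hand side)
z  = (s - mu) ./ sd;
z1 = normcdfBase(z);
Y  = log(z1 ./ (1 - z1));

% Backward: logit scale -> (0,1) -> z-score -> margin
z1Back = 1 ./ (1 + exp(-Y));
zBack  = norminvBase(z1Back);
sBack  = zBack .* sd + mu;
```

The round trip returns the original margins up to rounding, which is why a fitted value of b0 + bh − ba can be turned back into an estimated spread.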
Steps to Solve Logit Spread Regression (Part 2)
• Run second regression:
• $\mathrm{Actual\ Spread} = d_0 + d_1 \cdot \mathrm{Estimated\ Spread}$
• $Y = d_0 + d_1 \cdot s$
• Results: $d_0$, $d_1$, $se_Y$
• Compute Home Team Win Probability
• $\mathrm{Prob}(\mathrm{Spread} > 0)$
• $\mathrm{Prob}(Y > 0)$
• $Y \sim N(s, se_Y)$
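The last step can be sketched as a normal tail probability. This assumes `se_Y` is used as the standard deviation of Y around the estimated spread; the numbers are hypothetical. The same expression with a non-zero threshold gives the probability of beating a specified margin.

```matlab
normcdfBase = @(x) 0.5 * erfc(-x ./ sqrt(2));   % base-MATLAB stand-in for normcdf

sHat = 3.5;      % hypothetical estimated home-team spread
seY  = 12;       % hypothetical regression standard error, used as a std. deviation

% Prob(Y > 0): home team wins outright
pHomeWin = normcdfBase((sHat - 0) / seY);

% Prob(Y > margin): home team wins by more than a given number of points
margin     = 6.5;                               % hypothetical margin to beat
pBeatMargin = normcdfBase((sHat - margin) / seY);

% When the margin equals the estimate, the probability is one half
pAtEstimate = normcdfBase((sHat - sHat) / seY);
```

With the Statistics and Machine Learning Toolbox the first probability could equally be written `1 - normcdf(0, sHat, seY)`.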
MATLAB – Logit Regression
% Logit Regression
% s = home team win margin (one entry per game)
%   s>0, home team won game by s
%   s<0, home team lost game by |s|
% X = matrix of games, home team = +1, away team = -1
mu = mean(s);
stdev = std(s);
z = (s - mu)./stdev;            % same as zscore(s)
Finv = normcdf(z);              % maps z into (0,1)
Y = log(Finv./(1-Finv));        % logit transform, element-wise
whichstats = {'beta','tstat','r','yhat','mse','rsquare'};
myStats = regstats(Y,X,'linear',whichstats);
beta = myStats.tstat.beta;      % [intercept; team coefficients]
beta = [beta(2:end);beta(1)];   % move the intercept to the end
TeamRating = beta;
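A minimal end-to-end sketch of the same regression on a toy schedule, written without toolbox calls. Two things here are assumptions and not the speaker's method: the game list is invented, and `pinv` replaces `regstats` because every row of a +1/−1 design matrix sums to zero, which makes the team columns linearly dependent. The minimum-norm solution resolves that by centering the team ratings at zero.

```matlab
% Hypothetical games: each row is [homeTeam awayTeam homeMargin]
games  = [1 2  7;
          2 3 -3;
          3 1 10;
          1 3 14;
          2 1 -6;
          3 2  4];
nTeams = 3;
nGames = size(games,1);

% Design matrix: +1 for the home team, -1 for the away team
X = zeros(nGames, nTeams);
for g = 1:nGames
    X(g, games(g,1)) =  1;
    X(g, games(g,2)) = -1;
end

% Left-hand side: margin -> z-score -> (0,1) -> logit
s  = games(:,3);
z  = (s - mean(s)) ./ std(s);
z1 = 0.5 * erfc(-z ./ sqrt(2));       % same values as normcdf(z)
Y  = log(z1 ./ (1 - z1));

% Least squares with an intercept; pinv handles the rank deficiency
A    = [ones(nGames,1) X];
beta = pinv(A) * Y;
b0         = beta(1);                  % intercept (home-field term)
TeamRating = beta(2:end);              % one rating per team, summing to zero

[~, order] = sort(TeamRating, 'descend');  % ranking, best team first
```

Only differences between ratings enter the model, so centering them does not change any fitted value.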
NFL
NFL Data: Only Three Weeks of Games (47 Games)
Power Function: Estimating Spreads
$prob = \frac{\lambda_x + \lambda_0}{\lambda_x + \lambda_y + \lambda_0}$
$spread = d_0 + d_1 \cdot prob$
NFL - Power Function
Estimating Home Team Win Probability:
$prob = \frac{\lambda_x + \lambda_0}{\lambda_x + \lambda_y + \lambda_0}$
Estimating Home Team Spread:
$s = d_0 + d_1 \cdot prob = -12.601 + 28.154 \cdot prob$
Example: Power Function
New England (Home) vs. Carolina (Away)
New England = 28.954
Carolina = 5.1099
HFA = 0.01
$prob = \frac{28.954 + 0.01}{28.954 + 5.1099 + 0.01} = 85\%$
Estimating Home Team Spread:
$s = -12.601 + 28.154 \cdot 0.85 = +11.3$ (need to adjust)
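The arithmetic in this example can be reproduced directly from the ratings and regression coefficients shown on the slides. The variable names are illustrative.

```matlab
% Values quoted in the example
lambdaHome = 28.954;    % New England rating
lambdaAway = 5.1099;    % Carolina rating
lambda0    = 0.01;      % home field advantage
d0 = -12.601;           % regression intercept
d1 = 28.154;            % regression slope

prob   = (lambdaHome + lambda0) / (lambdaHome + lambdaAway + lambda0);
spread = d0 + d1 * prob;

fprintf('Home win probability: %.1f%%\n', 100*prob);
fprintf('Estimated home spread: %+.1f\n', spread);
```

Both values round to the figures on the slide, 85% and +11.3.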
Logit Regression: Estimating Spreads
$\mathrm{Est.\ Spread} = b_0 + b_h - b_a$
$\mathrm{Act.\ Spread} = d_0 + d_1 \cdot \mathrm{Est.\ Spread}$
NFL – Logit Regression
Estimating Home Team Win Probability:
$\ln\left(\frac{z_1}{1 - z_1}\right) = b_0 + b_h - b_a$
Estimating Home Team Spread
$Y$ (Actual Spread) $= d_0 + d_1 \cdot$ Estimated Spread $s$
Results: $d_0$, $d_1$, $se_Y$
$\mathrm{Prob}(Y > 0) = 1 - normcdf(0, s, se_Y)$
Example: Logit Regression
New England (Home) vs. Carolina (Away)
New England = 1.0079
Carolina = 0.4869
HFA = -0.0592
Estimating Home Team Spread:
$z_1 = \frac{1}{1 + \exp(-(1.0079 - 0.4869 - 0.0592))}$, which maps to $s = +6.7$ after applying $z = norminv(z_1)$ and $s = z \cdot stdev(s) + avg(s)$
Estimating Home Team Win Probability:
$p = f(6.7) = 74\%$
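The first part of this example can be checked numerically. The slide does not give avg(s), stdev(s), or se_Y, so the sketch stops at the z-score and then backs out the standard error that would be consistent with the quoted 74%. That back-calculation assumes Prob(Y > 0) = normcdf(s / se_Y) and is an inference, not a figure from the talk.

```matlab
norminvBase = @(p) -sqrt(2) * erfcinv(2 .* p);  % base-MATLAB stand-in for norminv

% Parameters quoted in the example
bHome = 1.0079;     % New England
bAway = 0.4869;     % Carolina
b0    = -0.0592;    % intercept, labelled HFA on the slide

z1 = 1 / (1 + exp(-(b0 + bHome - bAway)));      % logistic value in (0,1)
z  = norminvBase(z1);                           % z-score before rescaling to points

% Standard error implied by the slide's spread and win probability (assumption)
sHat      = 6.7;
pWin      = 0.74;
seImplied = sHat / norminvBase(pWin);

fprintf('z1 = %.4f, z = %.3f, implied se_Y = %.1f\n', z1, z, seImplied);
```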
NFL - Predictions
NCAA College Football
College Football: Only Four Weeks of Games (286 Games)
Games with Div 1-FBS Teams Only
NCAA Football: Only Four Weeks of Games
NCAA Football - FBS: Model Results
NCAA Football - FBS: Algorithmic Rankings (after 4 weeks)
NCAA Football - FBS: Week 5 Predictions (Part 1)
NCAA Football - FBS: Week 5 Predictions (Part 2)
AlgoSports23/MATLAB Competition
AlgoSports23 / MATLAB Competition
• Are you Smarter than the Algo!
• Can you Beat the Algo!
AlgoSports23 / MATLAB Competition
Two Important Emails:
AlgoSports23 / MATLAB Competition
• Rules of the Competition
• All Analysis & Programming in MATLAB
• Game Results Data will be Posted Weekly
• Game Prediction File will be Posted Weekly
• Return Model Predictions by Specified Date
• Top 23 performing Algorithms each week will be included in
the AlgoSports23 Computer Rankings and Prediction
• National Media Attention!
• Are you smarter than the Algo?
AlgoSports23 / MATLAB Competition
Your program and submission need to include the following:
1) Ranking of Teams
2) Prediction of Home Team Winning Margin for all games in a week
Models are measured based on:
1) RMSE
2) Avg Difference
3) Number of Wins
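The three measures above are not defined further in the slides. The sketch below uses one plausible reading of each, and these definitions are assumptions: RMSE of predicted versus actual home margin, average absolute difference, and the number of games where the predicted winner matches the actual winner. The margins are invented.

```matlab
% Hypothetical actual and predicted home-team winning margins for one week
actual = [7; -3; 14; -10; 3];
pred   = [4.5; 1; 9; -6; -2];

err = pred - actual;

rmse    = sqrt(mean(err.^2));                 % root mean squared error
avgDiff = mean(abs(err));                     % average absolute difference (assumed meaning)
nWins   = sum(sign(pred) == sign(actual));    % games with the correct winner picked

fprintf('RMSE = %.2f, Avg Difference = %.2f, Wins = %d of %d\n', ...
    rmse, avgDiff, nWins, numel(actual));
```

Here the model picks the correct winner in games 1, 3, and 4, so the win count is 3 of 5.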
AlgoSports23 / MATLAB Competition
• Top 23 performing Algorithms each week will be included in the
AlgoSports23 Computer Rankings and Prediction!
• National Media Attention!
• Bragging Rights!