Neural Network Prediction of NFL Football Games
Joshua Kahn
ECE539 – Fall 2003
Overview
Introduction
Work Performed
  Data Collection
  Preliminary Study
  Training and Prediction Set Creation
  Data Preprocessing
  Making Predictions
Results
Conclusion
Introduction
The National Football League (NFL) is a multi-billion dollar business
Many web sites claim to be able to predict the outcome of NFL games
Some of these sites are trustworthy, others are downright seedy
Which ones are actually correct?
Project Goal
Most prognostications are based on human opinion
  Invariably, some degree of bias enters in
This project aims to create a completely objective, statistics-based system for predicting the outcome of NFL games
  The difficulty lies in the “intangible” aspects of the game
  Even so, it seems plausible to create such a statistical system
Why a Neural Network?
Teams can win in a variety of ways
  No linear mapping exists to determine the outcome
The problem essentially boils down to pattern classification
  Neural networks are very good at solving such problems
  A neural network provides a non-linear mapping
Data Collection
Data had to be available from a typical NFL box score
A large data set was required to represent the large number of ways to win
Collected from NFL.com
  Used Excel’s web query feature to acquire tabular data, such as box scores and team averages
Data Collection
Data was extracted from the box scores using a Perl script
  Perl provides an Excel interface
Statistics could be selected from the box scores as desired
  Perl also allowed additional data processing
Needed to determine which statistics to use
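The slides do not show the Perl script itself. As a rough illustration of the selection step, here is a minimal Python sketch (not the author's code) that keeps a chosen subset of statistics from one team's box-score row and converts time of possession from `MM:SS` into seconds, the unit used later in the study. The dictionary layout and the field names (`total_yards`, `rushing_yards`, `time_of_possession`, `turnovers`) are hypothetical and do not reflect NFL.com's actual table format.

```python
# Hypothetical field names: NFL.com's real box-score columns are not reproduced here.
SELECTED_STATS = ("total_yards", "rushing_yards", "time_of_possession", "turnovers")


def possession_to_seconds(mmss: str) -> int:
    """Convert a time-of-possession string such as '31:24' into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def select_stats(box_score_row: dict) -> dict:
    """Keep only the chosen statistics, with time of possession expressed in seconds."""
    stats = {name: box_score_row[name] for name in SELECTED_STATS}
    stats["time_of_possession"] = possession_to_seconds(stats["time_of_possession"])
    return stats


# Made-up numbers for one team in one game.
row = {"total_yards": 412, "rushing_yards": 120, "passing_yards": 292,
       "time_of_possession": "31:24", "turnovers": 1}
print(select_stats(row))
# {'total_yards': 412, 'rushing_yards': 120, 'time_of_possession': 1884, 'turnovers': 1}
```

Statistics that are not selected (here, `passing_yards`) are simply dropped, which makes it easy to try different candidate sets during the preliminary study.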
Preliminary Study
Data was analyzed using Matlab to look for dependency, redundant data, etc.
Statistical analysis showed that no hyperplane separates wins from losses
[Figure: 3-D scatter plot of games; axes: Total Yardage Differential, Time of Possession Differential, Turnover Differential]
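The analysis itself was done in Matlab. The Python/NumPy sketch below shows one simple way to look for redundant inputs: compute pairwise Pearson correlations between candidate statistics and flag pairs that move together almost perfectly. The 0.9 threshold and the toy data are assumptions for illustration; the slides do not say which test was used. Low correlation between features says nothing about whether wins and losses are linearly separable, which is a separate question.

```python
import numpy as np


def redundant_pairs(X: np.ndarray, names: list, threshold: float = 0.9) -> list:
    """List feature pairs whose absolute Pearson correlation exceeds `threshold`.

    Rows of X are games, columns are candidate statistics.
    """
    corr = np.corrcoef(X, rowvar=False)  # columns are treated as variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(corr[i, j]), 3)))
    return pairs


# Toy data: column "b" is almost exactly twice column "a", so one of them is redundant.
X = np.array([[1, 2.0, 5],
              [2, 4.0, 1],
              [3, 6.0, 4],
              [4, 8.0, 2],
              [5, 10.1, 3]])
print(redundant_pairs(X, ["a", "b", "c"]))
```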
Preliminary Study Results
Determined that the following statistics were most predictive:
  Total yardage differential
  Rushing yardage differential
  Time of possession differential (in seconds)
  Turnover differential
  Home or away
Differential statistics provide insight into offensive and defensive performance
Scoring data was excluded as it would bias the network’s output toward a single feature
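A minimal Python sketch of how the five inputs above might be assembled for one game, with the final score used only to produce the 1/-1 outcome label described on the next slide and never as an input. The `TeamGame` record, its field names, the team-minus-opponent sign convention for every differential (including turnovers), and the +1/-1 home/away encoding are all assumptions, not details from the project.

```python
from dataclasses import dataclass


@dataclass
class TeamGame:
    """One team's box-score line for a single game (field names are hypothetical)."""
    total_yards: int
    rushing_yards: int
    possession_seconds: int
    turnovers: int
    points: int  # used only to label the outcome, never as a network input


def differential_features(team: TeamGame, opponent: TeamGame, is_home: bool) -> list:
    """The five inputs: four team-minus-opponent differentials plus a home/away flag."""
    return [
        team.total_yards - opponent.total_yards,
        team.rushing_yards - opponent.rushing_yards,
        team.possession_seconds - opponent.possession_seconds,
        team.turnovers - opponent.turnovers,  # same sign convention as the others (assumed)
        1.0 if is_home else -1.0,             # assumed encoding: +1 home, -1 away
    ]


def training_vectors(home: TeamGame, away: TeamGame) -> list:
    """One labelled vector per team: 1 = win, -1 = loss."""
    if home.points == away.points:
        raise ValueError("ties are not handled in this sketch")
    home_label = 1 if home.points > away.points else -1
    return [
        (differential_features(home, away, True), home_label),
        (differential_features(away, home, False), -home_label),
    ]


# Made-up game: the home team wins 24-17.
home = TeamGame(350, 120, 1900, 1, 24)
away = TeamGame(300, 90, 1700, 2, 17)
for features, label in training_vectors(home, away):
    print(features, label)
```

Each game therefore contributes two mirror-image training vectors, one from each team's point of view.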
Training and Prediction Sets
Training sets include the statistics for both teams for each game
Each training vector also includes the outcome of the game
  Outcome marked for both teams: 1 = win, -1 = loss
Two prediction sets were created:
  One based on team season averages
  The other based on the average of the prior 3 weeks
  Both sets were applied to determine their effectiveness
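The two prediction sets differ only in which games are averaged. A minimal NumPy sketch, assuming each team's history is a matrix with one row per game played (oldest first) and columns for total yards, rushing yards, possession seconds, and turnovers; it treats "prior 3 weeks" as the last three games played, ignoring bye weeks. The function names and the team-minus-opponent differential of averages are illustrative assumptions.

```python
import numpy as np

# Each row: one game's [total yards, rushing yards, possession seconds, turnovers]
# for a single team, oldest game first. The layout is an assumption.


def season_average(history: np.ndarray) -> np.ndarray:
    """Mean of every game the team has played so far this season."""
    return history.mean(axis=0)


def three_week_average(history: np.ndarray, weeks: int = 3) -> np.ndarray:
    """Mean of the team's most recent `weeks` games (or all games, if fewer)."""
    return history[-weeks:].mean(axis=0)


def prediction_vector(team_hist, opp_hist, is_home, average=season_average) -> np.ndarray:
    """Team-minus-opponent differential of the chosen averages, plus a home/away flag."""
    diff = average(team_hist) - average(opp_hist)
    return np.append(diff, 1.0 if is_home else -1.0)


# Made-up statistics for two teams' first four games.
team = np.array([[300, 100, 1800, 2],
                 [400, 150, 2000, 0],
                 [350, 120, 1700, 1],
                 [310,  90, 1900, 3]], dtype=float)
opp = np.array([[280, 110, 1750, 1],
                [330, 100, 1600, 2],
                [290,  95, 1800, 1],
                [360, 130, 1850, 0]], dtype=float)
print(prediction_vector(team, opp, True))                              # season averages
print(prediction_vector(team, opp, True, average=three_week_average))  # prior 3 weeks
```

The season average smooths over more games, while the three-week average reacts faster to recent form at the cost of a much smaller sample.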
Neural Network Selection
Back-propagation multi-layer perceptron provides a great deal of flexibility
  Good pattern classifier
  Supervised learning
Network parameters and structure were determined based on testing
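The slides settle on a back-propagation multi-layer perceptron but do not give its size, activation functions, or learning rate, since these were found by testing. Below is a minimal NumPy sketch under assumed choices: one hidden layer of tanh units, a tanh output so that targets of 1 and -1 are reachable, squared-error loss, and plain batch gradient descent. It illustrates the technique only and is not the network used in the project; Haykin (1999) covers the back-propagation derivation.

```python
import numpy as np


class MLP:
    """One-hidden-layer perceptron: tanh hidden units, tanh output, batch back-propagation."""

    def __init__(self, n_in: int, n_hidden: int, lr: float = 0.1, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def forward(self, X: np.ndarray) -> np.ndarray:
        self.h = np.tanh(X @ self.W1 + self.b1)           # hidden activations
        return np.tanh(self.h @ self.W2 + self.b2)[:, 0]  # output in (-1, 1), like the labels

    def train_step(self, X: np.ndarray, y: np.ndarray) -> float:
        """One gradient step on E = 0.5 * mean((output - y)**2); returns E before the update."""
        out = self.forward(X)
        err = out - y
        # Output-layer local gradient dE/d(pre-activation), using tanh' = 1 - tanh^2.
        delta2 = (err * (1.0 - out ** 2))[:, None] / len(y)
        # Back-propagate through W2 to the hidden layer (computed before W2 is updated).
        delta1 = (delta2 @ self.W2.T) * (1.0 - self.h ** 2)
        self.W2 -= self.lr * self.h.T @ delta2
        self.b2 -= self.lr * delta2.sum(axis=0)
        self.W1 -= self.lr * X.T @ delta1
        self.b1 -= self.lr * delta1.sum(axis=0)
        return float(0.5 * np.mean(err ** 2))


# Toy, already-scaled inputs with +/-1 labels (illustrative only).
X = np.array([[-2.0, 0.5], [-1.0, -1.0], [1.0, 0.3], [2.0, -0.7]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
net = MLP(n_in=2, n_hidden=4, lr=0.2)
losses = [net.train_step(X, y) for _ in range(3000)]
print(losses[0], losses[-1], np.sign(net.forward(X)))
```

The hidden layer is what gives the non-linear mapping the earlier slides call for; the number of hidden units, learning rate, and training length are exactly the kind of parameters the project tuned by testing.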
Data Preprocessing
Processed all data using singular value decomposition (SVD)
  Gives additional weight to the most pertinent features prior to network input
  Makes training more effective
Performed using Matlab’s svd function
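The slides say only that Matlab's `svd` was applied to all data. One common way to use an SVD for preprocessing, shown here as a NumPy sketch with `numpy.linalg.svd` standing in for Matlab's `svd`, is to centre the training vectors and project every vector onto the leading right singular vectors, which is equivalent to principal component analysis. Whether the original work centred the data, kept a reduced number of components `k`, or rescaled by the singular values is not stated, so those choices are assumptions.

```python
import numpy as np


def fit_svd_projection(X_train: np.ndarray, k: int):
    """Centre X_train, take its SVD, and return a transform onto the top-k right singular vectors."""
    mean = X_train.mean(axis=0)
    _, S, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    basis = Vt[:k].T  # columns: leading right singular vectors (largest singular values first)

    def transform(X: np.ndarray) -> np.ndarray:
        # Apply the training mean and basis so prediction vectors land in the same space.
        return (X - mean) @ basis

    return transform, S


rng = np.random.default_rng(1)
X_train = rng.normal(size=(20, 5))  # 20 made-up training vectors with 5 inputs each
transform, S = fit_svd_projection(X_train, k=3)
Z = transform(X_train)
print(Z.shape)  # (20, 3)
print(S)        # singular values, largest first
```

Because yardage, seconds of possession, and turnover counts live on very different scales, the raw possession-seconds column would dominate an unscaled SVD; standardising each column first (for example to zero mean and unit variance) is a common precaution.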
Making Predictions
Trained the network using the training data
Applied the prediction data three times
  Used both the season and three-week averages to compare the effectiveness of the two
Averaged the outputs of the three trials
Classified the winner/loser of each game
  Winner had the higher network output
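The decision rule above fits in a few lines. This Python sketch assumes each trial yields one network output per team (for example, from a separately trained network with different random initial weights, which the slides do not confirm); the function name, team names, and output values are hypothetical.

```python
from statistics import mean


def pick_winner(trials_a, trials_b, team_a: str, team_b: str) -> str:
    """Average each team's network output over the trials; the higher average is the predicted winner."""
    avg_a, avg_b = mean(trials_a), mean(trials_b)
    if avg_a == avg_b:
        raise ValueError("identical averages: no winner can be picked")
    return team_a if avg_a > avg_b else team_b


# Hypothetical outputs from three trials for each team's prediction vector.
print(pick_winner([0.42, 0.37, 0.51], [-0.10, 0.05, -0.22], "Team A", "Team B"))  # Team A
```

Comparing the two teams' outputs, rather than thresholding each at zero, guarantees exactly one predicted winner per game even when both outputs share the same sign.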
Results
Neural network classification was correct 94% of the time when actual (not predicted) statistics were used
NFL teams seem to be consistent over the long term
Prediction Rate
Week      Season Average Data   Three-Week Average Data
Week 14   75%                   62.5%
Week 15   75%                   37.5%
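As a check on the arithmetic, assuming 16 games in each week (the number listed on the results slide), the rates correspond to whole-game counts:

```latex
\text{prediction rate} = \frac{\text{games predicted correctly}}{\text{games played}}, \qquad
\frac{12}{16} = 75\%, \quad \frac{10}{16} = 62.5\%, \quad \frac{6}{16} = 37.5\%
```

With season averages, 12 + 12 = 24 of 32 games were predicted correctly across both weeks, leaving 8 misclassified games.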
Results
Week 14
Indianapolis def. Atlanta
Tennessee def. Buffalo
Kansas City def. Detroit
Tampa Bay def. Houston
New England def. Jacksonville
Minnesota def. Chicago
New York Jets def. Pittsburgh
St. Louis def. Seattle
Cincinnati def. San Francisco
Oakland def. Baltimore
Denver def. Cleveland
Carolina def. Arizona
Dallas def. Washington
Green Bay def. San Diego
New Orleans def. New York Giants
Philadelphia def. Miami

Week 15
Green Bay def. Chicago
Baltimore def. Cincinnati
Philadelphia def. Dallas
Jacksonville def. Houston
Indianapolis def. Tennessee
Pittsburgh def. Oakland
San Diego def. Detroit
Minnesota def. Seattle
Tampa Bay def. New Orleans
New York Giants def. Washington
San Francisco def. Arizona
Denver def. Kansas City
New England def. Miami
Buffalo def. New York Jets
Atlanta def. Carolina
St. Louis def. Cleveland
Baseline Study
Prediction Rate
Week      Neural Network   ESPN.com
Week 14   75%              57%
Week 15   75%              87%
The neural network was more accurate on average
Previous neural network predictors were accurate for 63% of games
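"More accurate on average" here is the unweighted mean of the two weekly rates; the slides do not state how many games ESPN.com picked each week, so a game-weighted comparison is not possible:

```latex
\bar{p}_{\text{NN}} = \frac{75\% + 75\%}{2} = 75\%, \qquad
\bar{p}_{\text{ESPN}} = \frac{57\% + 87\%}{2} = 72\%
```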
Conclusions
Each of the eight misclassifications can be subjectively assigned to one of three categories
Game                            Misclassification Reasoning
Philadelphia def. Dallas        Misclassification
San Diego def. Detroit          Too close to call
Atlanta def. Carolina           Upset
Minnesota def. Seattle          Too close to call
New England def. Jacksonville   Misclassification
New York Jets def. Pittsburgh   Too close to call
Cincinnati def. San Francisco   Too close to call
Oakland def. Baltimore          Upset
Conclusions
Prediction rate could be improved by adding the “human element”
  Take immeasurable factors into consideration
  Las Vegas betting lines
  Subjective team rankings
Training set could be based on previous season data
  The ways in which teams win presumably do not change over time
Demonstrates that a statistically based system can be developed to predict the outcome of NFL games
References
Haykin, S. (1999). Neural Networks: A Comprehensive Foundation. Upper Saddle River, New Jersey: Prentice-Hall, Inc.
ESPN.com, http://www.espn.com [Retrieved Dec 2003].
Purucker, M.C. (1996). Neural Network Quarterbacking. IEEE Potentials, vol. 15, no. 3, pp. 9-15.
NFL.com, http://www.nfl.com [Retrieved Dec 2003].
Questions?
Thank you…