TRANSCRIPT
In Search of “Sure Things”
An Analysis of Market Efficiency in the NFL Betting Market
By: Rick Johnson
Advisor: Professor Eric Schulz
Submitted for Honors in Mathematical Methods in the Social Sciences Spring, 2007
Abstract
The market for betting on National Football League games is well established and can be
observed on a weekly basis for four months every year. Despite the visibility of the market, past
research has tested its efficiency from several angles and reported a variety of inefficient areas.
This thesis uses more recent data and a system of probit models to test if some of these
inefficiencies continue to exist and if they can be consistently exploited to earn above average
returns. While a model that incorporates measures of overall team strength and recent
performance can be used to reliably predict in-sample results, its ability to predict out-of-sample
games is significantly weaker and it can only be used to generate excess profits if the gambler
observes an extreme cut-point strategy. Furthermore, this model is also applied using a varying
number of years to predict outcomes during the following season. The results suggest that using a
mid-range number of years, such as three, may generate the most successful model, as
inefficiencies evolve over time and a model that bases its predictions on more years may not
capture recent trends as well.
Acknowledgements
I would first like to thank my advisor, Professor Eric Schulz, for his help and guidance
throughout the MMSS thesis process. I walked into his office with broad ideas and he helped me
focus them and made me think deeper about the results I found. In addition, I would like to
thank my technical advisors, Kisa Watanabe and Scott Payseur, for providing additional advice
on the Econometric methods I use. Furthermore, I would like to thank all my colleagues at
Deloitte who not only were willing to work around the unique constraints placed on me while
finishing my thesis, but also provided an immense amount of support along the way.
From a more personal standpoint, I would like to thank my parents, siblings, and the rest
of my family, who have always supported my academics and pushed me to succeed.
Additionally, I would like to thank my friends and girlfriend for all the good times they have
provided me with over the past four years.
Table of Contents
I. INTRODUCTION..............................................................................................................1
II. THE BETTING MARKET FOR FOOTBALL ..............................................................3
III. REVIEW OF EXISTING LITERATURE ......................................................................6
IV. DATA ................................................................................................................................15
V. DATA ANALYSIS AND RESULTS ..............................................................................18
VI. DISCUSSION ...................................................................................................................34
VII. CONCLUSION ................................................................................................................37
VIII. WORKS CITED...............................................................................................................38
Johnson 1
I. Introduction
Every year an increasing amount of money is gambled on sports in America. While one
could argue the market used to have significant barriers to entry, that case can no longer be
made. Sports gambling used to only be legally available in select locations in the United States,
such as Las Vegas and Atlantic City, or through use of illegal channels, such as a “bookie”—a
person who takes bets. However, since 1995, reputable internet gambling sites have grown and
become popular; so popular, in fact, that internet gambling has become a major focus of United
States legislative bodies. Although this scrutiny has caused the industry to adapt and find loopholes
in the law, internet gambling is still readily available to anyone with a credit card and an internet connection.
While internet gambling provides lower barriers to entry, a relatively small amount of
money is wagered online—an estimated $11.9 billion, including all forms of gambling such as
poker and other casino games. On the other hand, $2.43 billion was wagered in Nevada’s
sportsbooks in 2006 and the National Gambling Impact Study Commission estimates as much as
$380 billion was wagered illegally on sports.¹ This shows that although the industry is easy to
access through the internet, the majority still prefer more traditional, yet illegal, forums.
With an industry so widely accessible, one must wonder if the market truly is efficient or
if there are arbitrage opportunities that persist due to inherent patterns of irrationality on the part
of bettors—or investors. Therefore, I have attempted to find answers to the questions: Are
there market inefficiencies apparent in historical data that can be systematically exploited to earn
a profit? If these inefficiencies exist, are they similar to most other arbitrage opportunities that
disappear over time, or can they actually be applied to out-of-sample data to achieve similar
results?

¹ http://www.americangaming.org/
In order to explore these questions, I employed a variety of econometric methods and
tested the applicability of other papers’ findings to present day data and found some of the
inefficiencies reported still exist, although they have weakened over time. In addition, I tested
the predictive power of models generated using historical results and found the only way these
inefficiencies can be exploited is by observing extreme cut-point strategies—which would limit
the number of betting opportunities a gambler would have over the course of a season.
II. The Betting Market for Football
Betting on football is unlike other forms of betting. Rather than betting on a single event,
or a combination of events, like most other forms of betting, the bettor chooses a team to “win
against the spread.” In theory, the spread (S) is a value that if added to the underdog’s number of
points at the end of the game (U) would equal the favorite’s number of points (F). Or, in
mathematical terms,
F = U + S
Therefore, in betting on the favorite to beat the spread, the bettor is predicting:
F > U + S
And the opposite is true if the bettor predicts the underdog will beat the spread. This method of
betting is different from most other sports, which use “odds betting”—meaning, given a fixed
wager, one event will pay a different amount than another if it is the correct outcome. For
example, if football used the odds betting system and the San Francisco 49ers were favored over
the Green Bay Packers, the 49ers might have odds of -310 and the Packers might have odds of
+255. This means if you bet on the correct team to win outright, you would have to bet $310 on
the 49ers to win $100 ($100 would win $32.26) or $100 on the Packers to win $255. However,
in a spread system, a correct bet on one team to cover the spread would typically earn the same
amount as if you had bet correctly on the other team. While the spread is also being used more
commonly in basketball betting, it is not widely used in low-scoring sports, such as baseball and
soccer; sports with multiple competitors, such as races or golf; or sports where defining a margin
of victory is difficult, such as boxing.
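The American-odds arithmetic above can be sketched in a few lines of Python (an illustrative helper, not part of the original analysis; the function name is my own):

```python
def win_amount(odds, stake):
    """Profit on a winning bet at American odds for a given stake."""
    if odds < 0:
        return stake * 100 / -odds   # favorite: risk |odds| to win 100
    return stake * odds / 100        # underdog: risk 100 to win odds

# The hypothetical 49ers/Packers lines from the text:
print(round(win_amount(-310, 100), 2))  # a $100 bet on the 49ers wins $32.26
print(win_amount(-310, 310))            # a $310 bet on the 49ers wins $100
print(win_amount(255, 100))             # a $100 bet on the Packers wins $255
```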
Another interesting characteristic of betting markets is the role played by the bookie, or
the “house” when betting occurs in a sportsbook—a legal forum for placing sports bets.
Rather than betting against the bettor, the bookie can be better characterized as a conduit for
betting against other bettors, and the bookie charges a commission for his services. Typically, in
a game in which you are betting on the spread, the payout for both teams will be -110—meaning
whichever team you bet on, you must wager $110 to win $100. In the world of betting, this is
commonly referred to as the “11 for 10” rule and the commission the bookie is earning is
referred to as the “vigorish.” This means if you give the bookie a bet of $110 and win, he will
give you $210 back; however, if you lose he will give you nothing back. Therefore, we can
define WP* as the winning percentage necessary to “break even” and calculate accordingly:
(WP* x 210) + [(1 – WP*) x 0] = 110
WP* = 52.4%
So, a bettor must predict 52.4 percent of games correctly in order to break even. In this situation,
the bookie is essentially placing an even money bet between two people of $105 and charging
each person $5 to place the bet—a commission of 4.8 percent per bet. Therefore, as long as the
bookie can keep the amount of money wagered equal on each team, he earns a 4.8 percent
commission, or 4.5 percent of the total money wagered, with absolutely no risk.
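The break-even computation above is easy to verify directly (a minimal sketch of the arithmetic in the text):

```python
# Under the "11 for 10" rule a winning $110 bet returns $210 and a
# losing bet returns nothing, so the break-even winning percentage
# WP* solves (WP* * 210) + ((1 - WP*) * 0) = 110.
wp_star = 110 / 210
print(round(100 * wp_star, 1))   # 52.4 percent

# With balanced action the bookie holds an even-money $105 bet per side
# and charges each bettor $5 to place it.
commission_per_bet = 5 / 105     # roughly 4.8 percent per bet
share_of_handle = 10 / 220       # roughly 4.5 percent of all money wagered
print(round(100 * commission_per_bet, 1), round(100 * share_of_handle, 1))
```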
While the bookie’s task may sound easy, he must be sure to keep an equal amount of
money on each side in order to avoid gambling himself. Therefore, lines for an upcoming
game are typically released early in the week, Tuesday or Wednesday for a Sunday game, and
allowed to move if there is too much money wagered on one side. However, this can create a
unique problem in which the bookie may get “middled.” For example, if the opening line in the
49ers and Packers game is the 49ers by seven points and the majority of the public feels they will
win by more than this and bet accordingly, the line will move upwards. However, if the line
moves too far, to ten for example, 49ers’ supporters may stop betting and Packers’ supporters
may start betting on them. Thus, the majority of bets will be on either the 49ers at -7 or the
Packers at +10, and there will be very few on the 49ers at -10 and the Packers at +7. This is the
worst possible situation for a bookie to be in because if the actual game spread is 8 or 9, the
bookie will lose the vast majority of the bets he accepted. Therefore, the opening lines are
determined by experienced “oddsmakers” and are changed reluctantly—especially by a substantial
margin.
One final characteristic of spread betting worth mentioning is the importance of the
“half-point.” While it may not make sense to say a team is favored by ten and a half points, the
majority of lines use this terminology. Thus, instead of the 49ers being favored by ten points,
they might be favored by nine and a half points. In these cases, if the 49ers win by ten, a bettor
who bet on them to cover would win; however, if they only win by nine, a bettor who bet on the
Packers to cover would win. The party that derives the biggest advantage from this convention
is the bookie. Traditionally, a game in which the actual spread equals the betting line results in a
tie, or “no action.” In these cases, the bookie refunds both bettors’ money and earns nothing.
However, if the line could have been set at nine and a half points instead of ten without
dramatically changing the level of action on each side, the bookie would earn his commission on
all bets. Therefore, it is in the bookie’s best interest to use a line with a half-point whenever
possible.
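The settlement rules described in this section, including the “no action” push on whole-number lines, can be summarized in a short sketch (the function and its defaults are my own illustration):

```python
def settle(favorite_score, underdog_score, line, bet_on_favorite, stake=110):
    """Net result of a spread bet under the "11 for 10" rule.

    A push (the favorite wins by exactly the line) refunds the stake,
    which is why bookies prefer half-point lines.
    """
    margin = favorite_score - underdog_score
    if margin == line:
        return 0.0                        # "no action": money refunded
    favorite_covers = margin > line
    won = favorite_covers == bet_on_favorite
    return stake * 100 / 110 if won else -stake

print(settle(27, 17, 10, True))           # push on a 10-point line: 0.0
print(settle(27, 17, 9.5, True))          # favorite covers -9.5: +100.0
print(settle(26, 17, 9.5, True))          # favorite wins by only 9: -110
```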
III. Review of Existing Literature
The pool of existing literature on the football betting market, and the betting market as a
whole, is fairly limited. In addition, the majority of empirical research that has been done is not
current and there is no guarantee certain trends still exist due to changes in the market structure.
Therefore, one of the primary goals of this paper is to examine models and results from existing
literature, see if their results are still applicable in the present day, and attempt to improve on
their theories.
Spread vs. Odds
One important feature of the football betting market, previously mentioned, is its reliance
on “spread” betting compared to the more conventional “odds” betting. As this is a major
characteristic of the market, multiple authors have investigated why this is the case and, although
these models will not be critiqued or improved upon in this paper, their results are worth noting.
Prior to these analyses, the conventional wisdom was that spreads were the dominant medium for
betting because they appeal to bettor sentiment by making games more interesting to watch. For example, in
football there are often heavy favorites and points come in large chunks, generally three or
seven, rather than one at a time. A byproduct of this is games can exhibit large point swings in a
short amount of time. Therefore, if a bettor is watching a game with a point spread, he is
essentially watching two games in one—the competition to be the absolute winner and the
competition to be the winner against the spread—which is likely to generate more interest.
Spread betting also makes one-sided games more interesting because even if the favorite is
winning and the losing team does not have a realistic chance of winning the game, they have a
better chance of beating the spread and, therefore, the game might still come down to the last
minute in the eyes of the bettor even though the outcome has been determined.
While Bassett does not reject these theories of preference towards spread betting, and
even believes they do have influence, he presents a model where the bookie is allowed to set the
spread and odds for a given game. He demonstrates that, given unimodal and symmetric belief
functions for the bettors, presenting the bettor with a spread, rather than odds, is the profit-
maximizing behavior of a bookie. However, Woodland and Woodland challenge Bassett’s
assumption of risk neutrality, which rests on the premise that bettors do not gamble a significant
proportion of their wealth. Woodland and Woodland argue this unnecessarily restricts the size of the bet a
bettor can choose in Bassett’s model. They present a model that demonstrates the spread system
is indeed the profit-maximizing choice for a bookie, but primarily due to the risk aversion in
bettors. They show in games where neither team is thought to have a significant advantage, the
amounts wagered will be similar under spread and odds betting. However, if one team is thought
to have an inherent advantage, less will be wagered by a risk-averse bettor under the odds
system than the spread system. Therefore, because bookies, in theory, make a percentage of total
bets placed, they will make more under the spread system.
Predicting the Winner of a Game
Another focus of literature on the betting market has been developing ways to predict the
absolute winner of a given game. While these papers are not direct tests of market efficiency,
they are relevant to research as they demonstrate the predictive value of the spread. Harville
uses a complex statistical algorithm to demonstrate he can use public information to predict the
winner of a game. When applied to actual results, Harville predicts the absolute winner of a
game 70.3 percent of the time, based on 1,320 games from 1971 to 1977. However, Harville’s
success rate is overshadowed by the fact that the team favored by the spread is the absolute
winner in 72.1 percent of games in the same sample.
Stern also attempts to model a team’s probability of winning a given game; however, he
uses far simpler methods. Using data from the 1981 to 1984 seasons, Stern calculates the margin
of victory over the spread—defined as the favorite’s actual score minus the underdog’s actual
score minus the spread—for each game. This statistic for Stern’s sample is normally distributed
with a standard deviation of 13.86 points and a mean of .07, which he rounds to zero for
simplicity. Stern conducts a Chi-squared test to show a normal distribution with a mean of zero
and standard deviation of 13.86 cannot be rejected. Therefore, the probability of a favorite
(underdog) winning a game is the area under the normal curve centered at the point spread, and
with standard deviation of 13.86, and below (above) zero. To test his results Stern simulates the
1984 regular season ten thousand times and is able to correctly predict five of six division
winners. However, Stern does not attempt to apply his simulation to out-of-sample games.
Perhaps the most relevant piece of Stern’s work is that he is not able to reject a normal
distribution centered on zero for the margin of victory over the spread.
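Stern’s result lends itself to a compact calculation: if the margin of victory is normally distributed with mean equal to the spread and standard deviation 13.86, the probability the favorite wins outright is the mass above zero (a sketch using only the standard library):

```python
from statistics import NormalDist

def p_favorite_wins(spread, sigma=13.86):
    """Stern's model: margin of victory ~ Normal(spread, sigma), so the
    favorite wins outright whenever the margin exceeds zero."""
    return 1 - NormalDist(mu=spread, sigma=sigma).cdf(0)

# A pick-em game is a coin flip; larger spreads win more often.
print(p_favorite_wins(0))            # 0.5
print(round(p_favorite_wins(3), 3))
print(round(p_favorite_wins(10), 3))
```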
Efficiency of the NFL Betting Market
Turning to the focus of this paper, there is a substantial amount of literature on the
efficiency of the NFL betting market. On one hand, there is research that claims to identify
systematic errors in the betting market that can be exploited, even when looking at out-of-sample
data. However, there are just as many pieces of literature either commenting on these papers, or
presenting data and models of their own, in support of market efficiency. In the middle, there are
papers that identify large errors in the market but argue they cannot be systematically exploited.
One such paper is by Pankoff, and is widely regarded as the first piece of literature to
address the efficiency of the betting market. To test the efficiency of the market, Pankoff relies
on an analysis of the mean square error of the market forecasts from 1963 to 1965. He attempts
to adjust the unadjusted mean square error by subtracting a “mean error” and a “variance error”.
The mean error is intended to account for home-field advantage not being accurately valued and
the variance error—which incorporates constants from regressing actual spreads on forecasted
spreads—is intended to account for the market’s inability to accurately judge the teams’ average
abilities to score points. However, subtracting these error terms from the mean square error only
reduces the mean square error by one percent. Therefore, Pankoff concludes “market forecasts
will not be amenable to much, if any, statistical improvement” (Pankoff). He elaborates on this
conclusion by arguing the large mean square error implies the market is not successful in
forecasting game outcomes, however there are not patterns that would allow this inefficiency to
be exploited.
Although Pankoff does not believe these errors can be statistically exploited, he does not
rule out the possibility of superior analysts being able to consistently beat the market. In fact, he
presents an example where the predicted spreads of five different analysts, and the median of
their predictions, were compared to the betting spread. A strategy of placing a bet if the absolute
value of the analyst’s prediction minus the spread is greater than a cut-point, which varies, is
employed. Pankoff finds that as the cut-point increases, so does the probability an analyst will
place a correct bet and, at the highest level of the cut-point, four of the five analysts and the
median of them earn a profit. Therefore, while Pankoff does not believe a statistical model can
outperform the market, he does not eliminate the possibility that some bettors may be able to
outperform it.
Another early paper is by Vergin and Scriabin, who look for statistical significance in a
variety of common betting methods in games from 1969 to 1974. Some “best bet of the week”
methods, such as placing a bet on the team which outscored its opponent by the most points in
the last n weeks or betting on the team that beat the spread by the most points in the past n
weeks, showed some statistically significant positive returns, depending on what parameters are
used. However, a similar method of betting on the team that won by the largest margin in the
previous week’s game actually provides negative, albeit not significant, results. Perhaps the
most consistent and applicable method they analyze is that of betting on the underdog. They
segment games by the value of the spread into categories of [0, 5], (5, 10], (10, 15], and >15, and
find betting on the favorite tended to yield positive returns only when it is a small favorite, with a
significance level of .098. On the other hand, betting on underdogs in the (5, 10], (10, 15], and
>15 categories yields positive returns with significance levels of .085, .378, and .049,
respectively. However, upon further examination, they conclude it is logical to group all games
with a spread >5 together, as these are all games with a “heavy” favorite. Betting on any
underdog given at least five points yields a winning percentage of 54.6 percent, with a
significance level of .017, which is enough to earn a profit.
Another test of efficiency is done on a weekly basis by Zuber, Gandar, and Bowers.
Using data from the 1983 season, they regress the actual spread on the betting spread and test the
null of efficiency—the constant equal to zero and the coefficient on betting spread equal to one.
They are unable to reject the null for 13 of the 16 weeks, which might imply an efficient market.
However, they also use the same model to test against the null of both the constant and the
coefficient being equal to zero—which would suggest no correlation between the actual game
result and the predicted spread. In this test they are unable to reject the null in 15 of the 16
weeks, which suggests an inefficient market and, thus, they search for a different way to test
efficiency. To do this, they regress the actual spread on several explanatory variables which are
the differences of the home team’s value minus the visitor’s value for a variety of statistics, such
as yards rushed, yards passed, number of wins, etc., for games through the first eight weeks of
the season. Then, for the next eight weeks, they use the generated model and collected statistics
to compute their projected point spread for each game. If the difference between their point
spread and the betting spread is greater than a value, λ, they place a bet. Although they float λ to
higher levels, at its lowest level of .5, the model generates 102 bets, 60 of which are correct. If
one were to assume the probability of picking a winner is .5, there is less than a five percent
chance of picking 60 out of 102 games correctly (Zuber, et al). Although their model generates
results, Zuber, et al, concede they do not prove market inefficiency because their data is limited,
there is a significant cost to collecting and analyzing the data which would have to be included,
and, as gambling was illegal in 1983, there might be additional transaction costs that are not
included.
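The “less than a five percent chance” claim can be checked with a one-sided binomial test against a fair-coin null; a normal approximation with continuity correction suffices (a sketch of the standard calculation, not Zuber et al.’s own):

```python
from math import erf, sqrt

def p_value(wins, n, p0=0.5):
    """Upper-tail probability of at least `wins` successes in `n` trials
    if the true success rate were p0 (normal approximation with
    continuity correction)."""
    mean, sd = n * p0, sqrt(n * p0 * (1 - p0))
    z = (wins - 0.5 - mean) / sd
    return 0.5 * (1 - erf(z / sqrt(2)))

# Zuber et al.'s record: 60 correct bets out of 102.
print(round(p_value(60, 102), 3))   # just under .05, as the text states
```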
While Zuber, et al, admit limitations of their model, Sauer, Brajer, Ferris, and Marr wrote
a comment on their paper in defense of the efficient markets hypothesis. They criticize the fact
that Zuber, et al, attempt to reject the null hypotheses on a weekly basis because the small
sample size within each week would make it very hard to reject a null hypothesis. Therefore,
Sauer, et al, run the same regressions using data from the entire 1983 season as one sample.
They are not able to reject the null of the efficient market hypothesis, but are able to reject the
null that both the constant and coefficient are zero, which demonstrates there is a correlation
between actual outcome and the betting spread.
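The efficiency regression both papers rely on (actual spread regressed on the betting spread, with the null of a zero constant and a unit coefficient) can be illustrated on simulated data; this is a sketch under the assumption that the market is efficient and forecast errors have Stern’s 13.86-point standard deviation:

```python
import random

def ols(x, y):
    """One-regressor OLS; returns the intercept b0 and slope b1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Simulate an efficient market: the actual spread equals the betting
# spread plus mean-zero noise, so the fitted line should be near y = x.
random.seed(7)
betting = [random.uniform(-14, 14) for _ in range(3000)]
actual = [s + random.gauss(0, 13.86) for s in betting]
b0, b1 = ols(betting, actual)
print(round(b0, 2), round(b1, 2))   # near 0 and 1 under the null
```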
In addition, Sauer, et al, criticize the construction of Zuber’s, et al’s, model. They argue
the efficient markets hypothesis states the betting line incorporates all public information and
cannot be improved. Therefore, the spread should be included in the model in addition to the
explanatory variables, and if any coefficient on another explanatory variable is significantly
different from zero the efficient markets hypothesis can be rejected. Sauer, et al, run this
regression and are not able to reject the null that the coefficient is equal to zero for any of the
additional explanatory variables, thus further supporting the efficient markets hypothesis.
Finally, Sauer, et al, assume Zuber’s, et al’s, model for 1983 and run a similar test for
1984. They calculate the coefficients based on the first eight weeks of 1984, and apply their
model to the second half. However, unlike Zuber’s, et al’s, simulation, Sauer, et al, only predict
39 of 101 bets correctly, thus concluding Zuber, et al, “have not constructed a model that can
consistently produce profitable gambling opportunities” (Sauer, et al).
Another inefficiency that has been reported, but will not be addressed by this paper, is the
movement in the line from when it opens to game time. Gandar, Zuber, O’Brien, and Russo
examine the patterns of lines moving to see if the general public moved the spread in the correct
direction or away from the eventual result. They find a simple betting strategy of betting against
the public—betting on a favorite that became less of a favorite or betting on an underdog that
became a bigger underdog—was successful 54.9 percent of the time from 1980 to 1985, which
has a significance value of .002. Avery and Chevalier conduct a similar study of games from
1984 to 1994. They calculate a model to predict the movement of a line over the course of the
week and bet against the predicted public movement. Using this strategy, they are able to win
50.5 percent of bets when betting on all games, but if they segment the sample and only bet on
larger predicted swings, their winning percentage increases. However, they also run a test
similar to Gandar, et al, and bet against the actual movement in the spread and are not able to
generate a profit. Therefore, this is yet another speculative area in which the research cannot
reach a consensus.
As for the focus of this paper, it will mainly expand on the work done by Gray and Gray
in their paper, “Testing Market Efficiency: Evidence from the NFL Sports Betting Market”.
Gray and Gray do not concern themselves with the typical explanatory models to test market
efficiency that are primarily rooted in the OLS framework. Rather, they develop a variety of
probit models to predict the probability a given team will cover the point spread. Their primary
model is specified by regressing a binomial variable, 1 if the home team wins and 0 if they lose,
on the season winning percentage against the spread of both teams, the number of wins against
the spread each team has in the past four games, and a dummy variable that is 1 if the home team
is the favorite. The logic behind using both overall winning percentages and recent winning
percentages is analogous to the stock market theory that a stock which is overall valuable, but
has performed poorly recently, will be underpriced and a stock that has recently done well, but is
not actually valuable, will be overpriced. Although the coefficients of all variables are not
independently significant—the home winning percentage and the away wins in last four games
do not reach significance—Gray and Gray argue the two winning percentage variables should be
considered jointly, as well as the two streak variables. The logic for this is:
“a team that has been performing well (relative to expectations), over the course of a
season to date is more likely to beat the spread than a team that has been performing
poorly. This is consistent with the idea that the market is slow to realize that a particular
team is having a particularly good (or bad) year.”
As for the rationalization of the joint test for the streak variables, Gray and Gray argue:
“a team that has been performing well (relative to expectations), over the past four games
is less likely to beat the spread than a team that has been performing poorly over the last
four games. This is consistent with the idea that the market overreacts to recent form,
discounting the performance of a team over the season as a whole.”
In addition to these sets of variables being jointly significant, the coefficient on the dummy
variable for the home team being the favorite is negative and statistically significant—which
implies if a home team is the underdog it is more likely to beat the spread.
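The structure of Gray and Gray’s primary model can be written out directly: the probability the home team covers is the standard normal CDF of a linear index in the five regressors. The coefficients below are purely illustrative placeholders chosen only to match the signs the text describes; they are not Gray and Gray’s estimates:

```python
from statistics import NormalDist

def p_home_covers(b, home_wpct, away_wpct, home_wins4, away_wins4, home_fav):
    """Probit form: P(home covers) = Phi(b0 + b1*HWPCT + b2*VWPCT
    + b3*HWIN4 + b4*VWIN4 + b5*HOMEFAV)."""
    index = (b[0] + b[1] * home_wpct + b[2] * away_wpct
             + b[3] * home_wins4 + b[4] * away_wins4 + b[5] * home_fav)
    return NormalDist().cdf(index)

# Placeholder coefficients: season-long ATS strength helps (+), recent
# ATS streaks hurt (-), and a home favorite is less likely to cover (-).
b = [0.0, 0.5, -0.5, -0.05, 0.05, -0.1]
print(round(p_home_covers(b, 0.60, 0.45, 2, 3, 1), 3))
```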
The coefficients for this model are specified using data from the 1976 to 1994 seasons—
which is by far the largest sample analyzed—and the model is able to correctly predict the
winner of 54.6 percent of the games within the sample. However, when only specifying the
model using data from 1976 to 1992 and predicting the outcomes for 1993 and 1994, the model
predicts the correct winner 56.0 percent of the time, which has a significance level of .013 when
compared to 50.0 percent and .091 when compared to 52.4 percent—the minimum to earn real
profits. In addition, Gray and Gray run similar tests using different cut-point strategies. While
returns tend to increase, and become more significant, for the in-sample data, the results are mixed
for out-of-sample games.
IV. Data
The primary source of data for this paper is The Gold Sheet—a popular resource for
sports bettors. The Gold Sheet archives provide complete scores, closing lines, dates, and
location for every regular season and playoff game from 1993 to 2005. However, because
playoff games may be atypical due to certain intangible factors, these games are excluded
from the sample, which leaves a sample of 3,176 games.
In addition to eliminating playoff games from the sample, games in which the actual
spread equaled the betting spread (“no-action games”) have to be given special consideration.
Because the focus of this paper, like Gray and Gray’s, is developing probit models to predict the
probability a team will win a game, these games cannot be reliably incorporated into the
sample. However, because most explanatory
variables are based on what has already happened that season—whether it be a season-long
variable like total winning percentage or a streak variable like the number of wins in the last four
games—deciding whether or not to include these games can influence the result. While Gray
and Gray do not specify how they treat this issue, I calculate historical statistics before excluding
these no-action games in order to capture the true influence of the variables. After these
calculations, eliminating these no-action games cuts the sample by 86 games, or 2.7 percent, to
3,090 games.
Table 1 contains summary statistics of the primary data sample used for this analysis.
                     Mean   Std. Dev.   Lower Bound   Lower Quartile   Median   Upper Quartile   Upper Bound
Winning Score       26.51        8.77             6               20       26               31            59
Losing Score        14.99        7.94             0               10       14               20            48
Margin of Victory   11.51        9.02             0                4        9               17            49

Table 1: Summary statistics of the 3,090 NFL games analyzed from 1993 to 2005.
As point spreads are a focus of this paper, it is also important to examine them.
However, as pointed out by Gray and Gray, calculating any statistics of a point spread also
presents a unique problem as it is defined relative to one team and is not an absolute. For
example, when looking at a game with an away favorite of ten points, summary statistics can be
computed by using the absolute value of the spread (10), which is in effect defining it from the
perspective of the favorite; by using the number of points the home team is favored by (-10); or
by choosing the “Team of Record” (Gray and Gray) randomly. Gray and Gray believe the most
accurate way of looking at the summary statistics is taking the mean and standard deviation of
the betting spread, actual spread, and difference by determining the Team of Record randomly.
However, while using a random Team of Record might provide the most accurate standard
deviation, the mean is close to zero, and therefore does not convey much information.
Therefore, examining the spread by using the favorite as the Team of Record will provide the
most meaningful mean—an average spread of 5.2 points—as it gives a true sense of what the
spread is in a randomly chosen game. Finally, using the home team as the Team of Record may
not provide better summary statistics, but it should provide an estimate of how many points
home-field advantage is worth, both in bettors’ minds (2.47) and in actuality (2.81).
Therefore, summary statistics for the point spread computed using all three methods are
presented in Table 2.
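The three Team of Record conventions are easy to illustrate on synthetic lines. The sketch below (NumPy; all numbers are illustrative assumptions, not the paper's data) draws hypothetical home-relative lines and recomputes them under each convention:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical lines quoted relative to the home team (negative = home favored),
# loosely calibrated to the paper's home-team mean of -2.47 and SD of 5.70.
home_line = rng.normal(-2.5, 5.7, 3090)

fav_line = np.abs(home_line)                                  # favorite as Team of Record
rand_line = rng.choice([-1, 1], home_line.size) * home_line   # random Team of Record

for name, line in [("favorite", fav_line), ("home", home_line), ("random", rand_line)]:
    print(f"{name:8s} mean={line.mean():6.2f}  sd={line.std():5.2f}")
```

As in the paper's discussion, the random convention yields a mean near zero (uninformative but an unbiased standard deviation), while the favorite convention yields a strictly positive, interpretable mean.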
                       Favorite    Home     Random
Betting Spread
  Mean                   5.20      -2.47     -0.09
  Standard Deviation     3.39       5.70      6.21
  Lower Quartile         3         -6        -4
  Median                 4         -3         0
  Upper Quartile         7          2         4
Actual Spread
  Mean                   5.33      -2.81     -0.02
  Standard Deviation    13.62      14.34     14.62
  Lower Quartile        -3        -12        -9
  Median                 5         -3        -1
  Upper Quartile        14          6         9
Difference
  Mean                   0.13      -0.34      0.07
  Standard Deviation    13.13      13.12     13.14
  Lower Quartile        -8         -9        -9
  Median                 0          0         0
  Upper Quartile         9          8         8

Table 2. Summary statistics of the betting and actual point spreads of the 3,090 NFL games analyzed, varying based on the Team of Record.
In addition to data on past football games, this analysis also incorporates metropolitan
population as a proxy for fan support. Population estimates for each year from 1993 to 2005
were obtained from the online Census database for each metropolitan area with an NFL team.
Similarly, team revenue was collected as an alternative measure of fan support; these data
were gathered from Forbes.com's annual valuations of NFL franchises.
V. Data Analysis and Results
This paper attempted to examine the efficiency of the NFL betting market from multiple
angles. Most notably, the data was developed to update and test Gray and Gray’s results.
However, this data can also be used to conduct other widespread tests of efficiency, as well as
to incorporate new variables in the search for a systematic inefficiency that can be exploited.
General Test of Market Efficiency
The most general test of market efficiency in the NFL betting market is to see whether,
on an aggregate basis, the actual spreads are highly correlated to the predicted spreads and do not
show a systematic bias. This model takes the basic OLS form of:
Ai = b0 + b1Si + εi
This OLS regression simply regresses the actual game spread (A) on the predicted spread (S) and
estimates the values of b0, a constant, and b1, a coefficient. If the spreads truly are accurate
forecasts of actual game spreads, b0 should take on a value of zero as a positive (or negative)
value would suggest actual spreads are consistently larger (or smaller) on an aggregate basis, and
b1 should take a value of one, in order to show perfect correlation to the actual spreads.
Therefore, the hypotheses for this test of efficiency are:
H0: b0 = 0 and b1 = 1
HA: b0 ≠ 0 or b1 ≠ 1
This model was run using the entire data set, including the games that were exactly predicted,
and the results are presented in Table 3. From the results, neither condition of efficiency can be
rejected and, therefore, the market appears to be efficient on an aggregate level. However, this
does not necessarily imply systematic inefficiencies do not exist on a lower level and, therefore,
this paper will continue to examine other existing models and variables for exploitable trends.
Coefficient   Estimate
b0            0.52 (0.23)
b1            1.03 (0.04)

Table 3. Basic test of market efficiency estimating Ai = b0 + b1Si + εi using the complete sample of 3,176 games from 1993 to 2005.
Gray and Gray’s Probit Models
The goal of probit models, like those developed by Gray and Gray, is to take exogenous
variables and estimate the probability that a binary variable takes the desired value. The first
model developed by Gray and Gray is a simple probit model of the form:
Wini = b0 + b1Homei + b2Favi + εi
The explanatory variables, Home and Fav, are dummy variables that take a value of one if the
team of record is the home team or the favorite, respectively, for a given observation. Similarly,
as the model is designed to predict the probability of a win against the spread and is not
concerned with margin of victory, the dependent dummy variable is one if the team of record
beats the spread and zero if it does not. This model was run using the sample of 3,090 games
and the results are presented in Table 4.
From the probit regression, the coefficient on Home is positive and the coefficient on Fav
is negative, meaning a home team is more likely to beat the spread and a favorite is less likely to
win against the spread. According to these results, a home underdog is the most likely to beat
the spread and an away favorite is the most likely to lose.

Coefficient   Estimate
b0            0.02 (0.04)
b1            0.03 (0.05)
b2           -0.09 (0.05)

Table 4. Simple probit test of market efficiency estimating Wini = b0 + b1Homei + b2Favi + εi using the sample of 3,090 games from 1993 to 2005.

Although Gray and Gray conclude from
their model the coefficient on the Fav variable is significant, and the two variables are jointly
significant, the coefficients resulting from the updated data are neither individually, nor jointly,
significant even though they share the same signs as those generated by Gray and Gray’s
regression. In addition, while Gray and Gray’s model correctly predicts 52.51% of in-sample
games, which is enough to earn a slim profit, the updated model only predicts 51.52% of in-
sample games correctly, which would earn a loss once the bookie’s commission is taken into
account.
Although Gray and Gray conclude the simple probit model exhibits systematic
inefficiencies, they also develop what they term an “augmented probit model” to incorporate
additional variables and increase prediction accuracy. As previously mentioned, this model
attempts to model commonly speculated stock market inefficiencies that good stocks with poor
recent performance are undervalued and bad stocks with good recent performance are
overvalued. In other words, momentum is overvalued in bettors’ minds and outweighs the actual
strength of a team. In order to test this, Gray and Gray’s model takes the following form:
Wini = b0 + b1HWPi + b2AWPi + b3HWP4i + b4AWP4i + b5HFavi + εi
Here, the home team is always the team of record, so the dependent binomial variable, Win, is
defined as it was previously except it equals one if the home team beats the spread and zero if it
does not. The HWP and AWP variables are the season-long winning percentages of the home
and away teams, respectively, which are intended to serve as proxy variables for a team’s actual
strength. On the other hand, HWP4 and AWP4 represent the home and away teams’ winning
percentages in the past four games, respectively, which are intended to represent what type of
momentum a team is exhibiting. Because HWP4 and AWP4 rely on each team’s results from the
previous four games, the first four games of each season were eliminated, thereby cutting the
sample to 2,321 games. The probit model was run using this sample and the results are
presented in Table 5.
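Regressors like HWP and HWP4 can be built from a per-team game log. The pandas sketch below uses a hypothetical two-team log (column names are my own) to compute the season-to-date and trailing four-game winning percentages entering each game; the first four weeks come out undefined, matching the paper's exclusion of each season's opening games:

```python
import pandas as pd

# Hypothetical game log: one row per (team, week) with a win indicator.
log = pd.DataFrame({
    "team": ["A"] * 6 + ["B"] * 6,
    "week": list(range(1, 7)) * 2,
    "win":  [1, 1, 0, 1, 0, 1,  0, 0, 1, 1, 1, 0],
}).sort_values(["team", "week"])

g = log.groupby("team")["win"]
# Season-to-date winning percentage entering each game (shift excludes the current game).
log["wp"] = g.transform(lambda s: s.shift(1).expanding().mean())
# Trailing four-game winning percentage entering each game; NaN for the first
# four weeks of each season.
log["wp4"] = g.transform(lambda s: s.shift(1).rolling(4).mean())
print(log)
```

The `shift(1)` is the important detail: each game's regressors must use only results available before kickoff, or the model would leak the outcome it is trying to predict.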
Although Gray and Gray find b2, b4, and b5 are individually significant, the results from
the probit estimation using updated data only shows b2 is individually significant. In addition,
Gray and Gray find HWP and AWP are jointly significant, and the same is true for HWP4 and
AWP4. In this sample, the likelihood ratio test (LRT) statistic for HWP and AWP’s joint
significance is 12.11 (distributed χ² with two degrees of freedom under the null), which is
significant at any reasonable level, and the LRT statistic for HWP4 and AWP4’s joint
significance is 5.33 (also χ² with two degrees of freedom under the null), which is
significant at the 90 percent confidence level; for purposes of comparison, Gray and Gray’s LRT statistics
are 6.14 and 12.27, respectively. While this estimation seems to imply less significant results,
the most interesting aspect is the coefficients b1, b2, b3, and b4 have opposite signs as those
estimated by Gray and Gray. The coefficients on HWP and AWP are negative and positive,
respectively. This can be interpreted that a team that has performed well over the course of a
season is overestimated when at home and underestimated when on the road—perhaps implying
home-field advantage is given too much weight.

Coefficient   Estimate
b0           -0.08 (0.12)
b1           -0.26 (0.22)
b2            0.71 (0.22)
b3            0.14 (0.14)
b4           -0.28 (0.14)
b5           -0.08 (0.06)

Table 5. Augmented probit test of market efficiency estimating Wini = b0 + b1HWPi + b2AWPi + b3HWP4i + b4AWP4i + b5HFavi + εi using the sample, excluding the first four games of each season, of 2,321 games from 1993 to 2005.

Conversely, the coefficients on HWP4 and
AWP4 are positive and negative, respectively, which can be interpreted that the market is
actually slow to react to recent performance and resists changing its opinion on a team’s strength
despite recent evidence to the contrary. Therefore, these results contradict Gray and Gray’s;
however, they also have reasonable explanations. While one can infer from Gray and Gray’s
results that the market lacks long-term memory and over-reacts to recent events, these results
suggest the market actually clings to its beliefs about a team’s strength—even though a team
may develop or deteriorate over the course of a season.
Another way to compare Gray and Gray’s model with the model generated by updated
data is by examining the actual success rates generated by the models. In their paper, Gray and
Gray show their model predicts the correct winner 54.6 percent of the time—which provides an
excess return of 2.2 percent after accounting for the bookie’s vigorish. However, as expected
given the lower values of significance, the updated model only predicts 53.8 percent of games
correctly—an excess return of 1.4 percent.
Out-of-Sample Results
While both Gray and Gray’s model and my model show results that indicate it would
have been possible to systematically earn a profit over the time span of the data samples, this, in
itself, does not imply a bettor can use these models to earn a profit in the future. In other words,
the predictive value of out-of-sample results is much more important than the success rate for in-
sample results.
With this in mind, Gray and Gray conduct a simple test of out-of-sample success. They
use their data from 1976 to 1992 to estimate the augmented model and apply the results to games
from 1993 and 1994. In this test, Gray and Gray find their model predicts 56.0 percent of games
correctly—an excess return of 3.6 percent, which is even greater than their in-sample results. In
order to similarly test the updated results, I estimated the model using ten years of data and two
time spans—1993 to 2002 and 1994 to 2003—and applied the estimates, as applicable, to three
sample periods—2003 to 2004, 2004 to 2005, and 2003 to 2005. The results of the success rates
are shown in Table 6. Unlike Gray and Gray’s, these out-of-sample tests show success rates of
less than 50 percent in two cases, and the prediction of 2003 and 2004 results was successful
52.5 percent of the time—which is only marginally, and not statistically, greater than the break-
even point of 52.4 percent, and is not significantly different from 50 percent when tested under a
binomial distribution. Therefore, I cannot conclude the model accurately predicts out of sample
results.
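The significance checks described here amount to exact binomial tests. For example, the 197-of-375 record for the 2003 and 2004 predictions can be tested against both the 50 percent and the 52.4 percent break-even benchmarks (scipy is assumed):

```python
from scipy.stats import binomtest

wins, bets = 197, 375  # 2003-2004 out-of-sample record (52.53%)

# One-sided tests: is the success rate greater than each benchmark?
vs_coin = binomtest(wins, bets, p=0.50, alternative="greater")
vs_breakeven = binomtest(wins, bets, p=0.524, alternative="greater")
print(round(vs_coin.pvalue, 3), round(vs_breakeven.pvalue, 3))
```

Neither p-value comes close to conventional significance, which is the basis for the conclusion that this out-of-sample success could easily be chance.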
Years Predicted   Estimated Using   Bets   Wins   Win %
2004-2005         1994-2003         377    178    47.21%
2003-2004         1993-2002         375    197    52.53%
2003-2005         1993-2002         562    277    49.29%

Table 6. Success of out-of-sample predictions using games from the specified years to predict outcomes of games in the specified years.
Dynamic Predictive Models
Due to the contradiction between Gray and Gray’s out-of-sample tests and mine, I was
concerned only running the tests using one or two estimations and relatively small sample sizes
may not be an accurate representation of my model’s success. In addition, as market sentiment
may change over time, inefficiencies may shift or change as was witnessed with the signs of
coefficients in the augmented probit results. For example, if the market develops an opinion that
home favorites are generally favored by too much (i.e., the coefficient is negative), money will
shift to the away teams. However, given that data on these subjects is not readily available, there
may be an information lag and the lines will shift too far in the direction of road underdogs
(i.e., the coefficient on HFav would become positive). Therefore, I analyzed the data set using a
variety of dynamic models which took data from the past one, three, five, and ten years to predict
game results for the next season. In addition to varying the number of seasons the model was
estimated using, I also varied the cut-point for placing bets. In other words, using a cut-point of
c would mean the bettor would place a bet on the home team if the model assigned a probability
of at least c to the home team winning, and would bet on the away team if the model predicted
the home team would win with a probability less than 1-c. This system of models was run and
the results are presented in Table 7.
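The betting rule itself is a two-sided threshold on the model's predicted probability; a minimal sketch:

```python
from typing import Optional

def place_bet(p_home_win: float, c: float = 0.60) -> Optional[str]:
    """Cut-point rule: bet the home team if the model's probability of a home
    win is at least c; bet the away team if it is below 1 - c; otherwise abstain."""
    if p_home_win >= c:
        return "home"
    if p_home_win < 1 - c:
        return "away"
    return None
```

At c = 0.5 every game draws a bet; as c rises toward 1, more predictions fall into the abstention band and the bettor acts only on the model's most confident games.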
An analysis of the total number of bets placed and bets won under each strategy provides
some interesting results. First, the sample using three years to estimate the models provides the
best results, and the sample using ten years of data provides the worst. In addition, the sample
using three years is the only sample that significantly predicts more than 50 percent of games
correctly. However, upon further examination, this trend might be attributed to the fact that there
are only three years of data to predict using the ten year models and 2005 seems to have been a
difficult year to predict across the board. In fact, when looking at all of the models’ predictive
value over the 2003 to 2005 time span, there are a few data points slightly above 50 percent and
the rest are below. Therefore, no conclusive results can be reached on the optimal number of
years’ results to use to estimate the model.
Cut-Point   Years of Data   Years Predicted   Bets Placed   Bets Won   Success Rate   P-value (>50%)   P-value (>52.4%)   2003-2005 Success Rate
0.50              1               12             2,152        1,098       51.02%          0.166            0.896               48.30%
0.50              3               10             1,808          948       52.43%          0.018            0.479               49.55%
0.50              5                8             1,459          726       49.76%          0.562            0.977               47.41%
0.50             10                3               559          275       49.19%          0.632            0.930               49.19%
0.55              1               12             1,961        1,001       51.05%          0.171            0.881               48.13%
0.55              3               10             1,647          855       51.91%          0.057            0.645               48.91%
0.55              5                8             1,325          673       50.79%          0.273            0.874               48.59%
0.55             10                3               509          251       49.31%          0.605            0.911               49.31%
0.60              1               12             1,762          903       51.25%          0.142            0.827               48.04%
0.60              3               10             1,489          778       52.25%          0.039            0.536               49.13%
0.60              5                8             1,176          600       51.02%          0.233            0.821               49.11%
0.60             10                3               458          226       49.34%          0.592            0.897               49.34%
0.65              1               12             1,557          807       51.83%          0.071            0.665               48.29%
0.65              3               10             1,293          667       51.59%          0.121            0.712               48.21%
0.65              5                8             1,045          531       50.81%          0.289            0.840               49.63%
0.65             10                3               408          205       50.25%          0.441            0.795               50.25%
0.70              1               12             1,327          684       51.54%          0.124            0.725               47.67%
0.70              3               10             1,102          571       51.81%          0.108            0.640               48.65%
0.70              5                8               892          455       51.01%          0.262            0.788               49.14%
0.70             10                3               358          178       49.72%          0.521            0.832               49.72%
0.75              1               12             1,119          584       52.19%          0.067            0.544               46.29%
0.75              3               10               912          470       51.54%          0.168            0.688               47.87%
0.75              5                8               749          380       50.73%          0.331            0.810               49.31%
0.75             10                3               312          157       50.32%          0.433            0.751               50.32%
0.80              1               12               890          474       53.26%          0.024            0.293               49.77%
0.80              3               10               737          382       51.83%          0.151            0.607               46.93%
0.80              5                8               607          317       52.22%          0.128            0.519               49.57%
0.80             10                3               256          125       48.83%          0.623            0.860               48.83%
0.85              1               12               642          346       53.89%          0.022            0.213               48.70%
0.85              3               10               562          303       53.91%          0.029            0.223               49.13%
0.85              5                8               435          233       53.56%          0.062            0.297               49.70%
0.85             10                3               196           95       48.47%          0.639            0.849               48.47%
0.90              1               12               421          230       54.63%          0.026            0.167               52.69%
0.90              3               10               372          211       56.72%          0.004            0.042               49.07%
0.90              5                8               278          158       56.83%          0.010            0.061               52.25%
0.90             10                3               116           60       51.72%          0.321            0.522               51.72%
0.95              1               12               199          111       55.78%          0.044            0.153               51.22%
0.95              3               10               158           93       58.86%          0.010            0.044               45.65%
0.95              5                8               112           66       58.93%          0.023            0.069               53.33%
0.95             10                3                43           26       60.47%          0.063            0.112               60.47%

Table 7. Summary of dynamic modeling success rates using varying cut-point strategies. Columns give the cut-point; the number of years of data used for the regression; the number of years predicted by the model; total bets placed and won; the overall success rate with p-values testing it against 50 percent and the 52.4 percent break-even rate; and the success rate over 2003 to 2005 alone.
Another interesting trend in the data pertains to using different cut-point strategies. Gray
and Gray examine using cut-point strategies near .5 and do not reach conclusive results.
Therefore, I examined similar cut-points, as well as the results when more extreme values are
used, and the results are found in the same tables. As the tables show, for the various models,
the success rate stays fairly constant across cut-points for most “reasonable” values. In fact, it is
not until cut-points of .85 and above that we start to observe changes in the success rate, and
success rates that are significantly greater than 50 percent. Furthermore, using cut-points of .90
and .95 actually provides success rates that are significantly greater than the break-even point of
52.4 percent in the samples that rely on three- and five-year results to estimate the models
(significant at a 5 percent level for three years and a 7.5 percent level for five years). However,
observing an extreme cut-point strategy drastically limits the number of bets a gambler can place
each year, which would inevitably increase the variance in the returns the bettor would realize.
Therefore, it is
possible these profits can be seen as compensation for this additional risk incurred by using an
extreme cut-point.
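For reference, the 52.4 percent break-even rate follows from standard 11-to-10 pricing of point-spread bets (an assumption consistent with, though not spelled out in, the text): a bettor risks 11 units to win 10, so the break-even probability solves p * 10 = (1 - p) * 11:

```python
# Standard point-spread pricing (assumed): risk 11 units to win 10.
risk, payout = 11.0, 10.0
# Break-even: p * payout - (1 - p) * risk = 0  =>  p = risk / (risk + payout)
breakeven = risk / (risk + payout)
print(f"{breakeven:.4f}")  # 0.5238, i.e. the 52.4 percent cited in the text
```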
The Effects of Team Popularity on Probit Models
Another commonly speculated area of inefficiency in the betting market is the issue of
team popularity. The belief is teams with a larger fan base will have a disproportionate amount
of money on them as bettors tend to bet on their own team. The increased amount of money on a
team would have the effect of moving the line in that team’s direction, thereby making them
either a greater favorite or lesser underdog. If this is the case, it would be beneficial to bet on the
team that has less fan support in order to capture the inefficiency. For the purposes of testing
this theory, this paper attempts to use the two variables previously described in Section IV:
population and team revenue.
In order to incorporate population as a proxy for fan support in the model, this paper tests
the potential significance both individually and as part of the augmented model. The individual
model takes the form:
Wini = b0 + b1LnPopHi + b2LnPopAi + εi
In this model, the Win variable continues to represent whether the home team wins, and the
LnPopH and LnPopA variables represent the natural logarithm of the home and away cities’
populations. Similarly, these variables are added to Gray and Gray’s augmented probit model
and it takes the form:
Wini = b0 + b1HWPi + b2AWPi + b3HWP4i + b4AWP4i + b5LnPopHi + b6LnPopAi + b7HFavi + εi
These models were run using the data set, incorporating the set of population estimates for
each NFL city from 1993 to 2005, and the results for each are presented in Table 8.
Analyzing the results from these regressions shows using city populations does not
provide a significant opportunity to gain an advantage over the betting market. In fact, there
appears to be absolutely no advantage to using this data as each coefficient’s absolute value is
.01 and each standard deviation is .04—yielding p-values of greater than .75 for each.
As mentioned, another way of incorporating fan support was through team revenues.
Therefore, similar models were produced:
Wini = b0 + b1LnRevHi + b2LnRevAi + εi
Wini = b0 + b1HWPi + b2AWPi + b3HWP4i + b4AWP4i + b5LnRevHi + b6LnRevAi + b7HFavi + εi
These models were run and the results are also presented in Table 8. While the results show an
improvement over using population, there is still not a significant advantage gained by using
team revenues to evaluate prospective bets. Therefore, these tests do not corroborate the
commonly held belief that there is an inefficiency to be captured by betting against popular
teams.
Coefficient   Estimates Using Population   Estimates Using Revenue
b0             0.04 (0.90)                 -0.85 (1.54)
b1             0.02 (0.04)                  0.28 (0.21)
b2            -0.02 (0.04)                 -0.11 (0.21)

Probit test of the impact of fan support on market efficiency estimating Wini = b0 + b1LnFanHi + b2LnFanAi + εi using the sample of 2,321 games from 1993 to 2005, where Fan refers to Pop or Rev.

Coefficient   Estimates Using Population   Estimates Using Revenue
b0            -0.04 (0.91)                 -1.05 (1.54)
b1            -0.24 (0.22)                 -0.23 (0.22)
b2             0.71 (0.22)                  0.71 (0.22)
b3             0.11 (0.14)                  0.10 (0.14)
b4            -0.28 (0.14)                 -0.28 (0.14)
b5             0.01 (0.04)                  0.30 (0.21)
b6            -0.01 (0.04)                 -0.11 (0.21)
b7            -0.06 (0.06)                 -0.07 (0.06)

Probit test of market efficiency estimating Wini = b0 + b1HWPi + b2AWPi + b3HWP4i + b4AWP4i + b5LnFanHi + b6LnFanAi + b7HFavi + εi using the sample of 2,321 games from 1993 to 2005, where Fan refers to Pop or Rev.

Table 8. Results from various tests examining the impact of fan support on efficiency.
VI. Discussion
I found that while Gray and Gray’s augmented probit model still provides a significant
framework for analyzing historical data, the variables are less significant using updated data,
which would imply a lower level of inefficiency. In addition, while the model using recent
results is significant, one of the most interesting findings is that the coefficients suggest the market is
slow to incorporate recent performance in its decision making, which is a contradiction to Gray
and Gray’s findings. This finding could mean agents in the market are now slower to
incorporate recent information and cling to their original beliefs on team strengths. Perhaps this
can be attributed to the ever-increasing amount of information available to bettors in the
Information Age. Thus, when bettors form opinions on teams before, or early in, the season they
may be reluctant to change that opinion given the immense amount of information they used to
form it. This would be consistent with the Behavioral Economic research of Slovic, who
conducted an experiment using horse handicappers and found that as more information was
provided to the handicappers, their confidence in their predictions increased while their success
rates stayed stagnant.
Another possible explanation for the market being slow to react is not that it clings to its
beliefs, but, rather, teams’ abilities actually change too fast for the market to capture. The high
level of NFL parity in recent years is a popular discussion in sports; every year, teams that were
terrible the previous year make the playoffs, and vice versa. This high level of “class mobility”
is a relatively recent trend attributed to changes in the NFL’s structure in the past decade and a
half. For example, free agency—the process by which a player can switch teams at the end of
his contract and is not bound to a particular team—was started in a limited sense in 1989, and the
current policy was implemented in 1993; and a hard salary cap—the device that sets a
league-wide maximum payroll each team may have—was introduced in 1994. These important
developments are each given credit for making the NFL more competitive and they were both
created in the first half of the 1990s. Therefore, this level of parity did not exist during the time
period Gray and Gray take their data from, but it is present in my sample. Assume, as an
extreme example, the value a bettor places on recent performance has not changed since the
beginning of Gray and Gray’s sample. It is possible that value was too high during Gray and
Gray’s time span when it was difficult for teams to change their true strength. At the same time,
it is possible the same value is too low for the present day NFL where teams become strong very
quickly. This situation would explain the apparent contradiction between Gray and Gray’s
results and mine. In addition, it is possible bettors do not place a high enough value on recent
performance because there is a lag between the increase in parity and when the bettor
incorporates this trend.
In addition to testing the continued success of other models, I also attempted to analyze
the out-of-sample success of these models in greater detail. I found that using a mid-range
number of years, such as three, to estimate models for the next year may be more successful than
relying on models using complete historical data, as it could better capture recent market trends.
However, while I have enough data to analyze the success of using three years to estimate the
model, a larger data set is needed to accurately compare these results to the results that would be
obtained using ten years of data. Therefore, I cannot prove using three years is better than using
ten years, and this is left as a potential extension of this paper. I also examined the use of
different cut-point strategies when dealing with probit predictions to decide which games to bet
on and found using extreme cut-points, such as .90 or higher, provides success rates significantly
greater than 52.4%. However, this strategy would not provide a bettor with as many betting
opportunities—an average of one or two per week—and this low number of bets would expose
them to higher risk and more fluctuations in their returns.
Finally, I attempted to test the common layperson belief that more popular teams
will have too much money wagered on them and that, therefore, betting against popular teams could
provide profitable results. Using both city population and team revenue as measures of fan
support, I found no significant evidence of an exploitable inefficiency relating to fan
support. One argument against those who promote this theory is that the
majority of bettors are not making betting decisions as fans betting on their own teams; rather,
they are people who believe they can predict the success of games better than the market.
VII. Conclusion
The purpose of this paper was to investigate whether inefficiencies that have been
identified in the NFL betting market continue to exist when analyzing present day data. In
addition, I attempted to see if these inefficiencies could be systematically exploited to predict
more than 50 percent of games correctly. Furthermore, as an actual gambler in the betting
market has to pay a commission for placing bets, I analyzed whether these models could achieve
success rates of greater than 52.4 percent, which is the success rate needed to break even when
accounting for a typical commission.
Using a complete sample of regular season games from 1993 to 2005, I have shown
historical data can be analyzed to find significant trends and what appear to be inefficiencies in
the market. However, the opportunities for applying these findings to future results are limited
as they do not provide good predictive power unless one is using extreme cut-points. Therefore,
a patient bettor who is willing to invest in building these models (which is an expensive process,
given the limited amount of data) can use them to generate consistent profits if they are willing
to only make, on average, one or two bets per week and incur potential risk due to the fact they
place so few bets. Otherwise, if the bettor wishes to relax his cut-point and increase the number
of bets he places, any profits he earns can be more correctly attributed to luck rather than
Econometrics.
VIII. Works Cited
Avery, Christopher, and Judith Chevalier. "Identifying Investor Sentiment from Price Paths: The Case of Football Betting." Journal of Business 72 (1999): 607-636.
Bassett, Gilbert W., Jr. "Point Spreads Versus Odds." Journal of Political Economy 89 (1981): 752-768.
Gandar, John, Richard Zuber, Thomas O'Brien, and Ben Russo. "Testing Rationality in the Point Spread Betting Market." Journal of Finance 43 (1988): 995-1007.
Gray, Philip K., and Stephen F. Gray. "Testing Market Efficiency: Evidence From the NFL Sports Betting Market." Journal of Finance 52 (1997): 1725-1737.
Harville, David. "Predictions for National Football League Games Via Linear-Model Methodology." Journal of the American Statistical Association 75 (1980): 516-524.
Pankoff, Lyn D. "Market Efficiency and Football Betting." The Journal of Business 41 (1968): 203-214.
Sauer, Raymond D., Vic Brajer, Stephen P. Ferris, and M. Wayne Marr. "Hold Your Bets: Another Look At the Efficiency of the Gambling Market for National Football League Games." Journal of Political Economy 96 (1988): 206-213.
Slovic, Paul. “Behavioral Problems of Adhering to a Decision Policy” and [Handout]. Institute for Quantitative Research in Finance, May 1973. <http://www.decisionresearch.org/About/People/slovic.html>.
Stern, Hal. "On the Probability of Winning a Football Game." The American Statistician 45 (1991): 179-183.
Vergin, Roger C., and Michael Scriabin. "Winning Strategies for Wagering on National Football League Games." Management Science 24 (1981): 809-818.
Woodland, Bill M., and Linda M. Woodland. "The Effects of Risk Aversion on Wagering: Point Spread Versus Odds." Journal of Political Economy 99 (1991): 638-653.
Zuber, Richard A., John M. Gandar, and Benny D. Bowers. "Beating the Spread: Testing the Efficiency of the Gambling Market for National Football League Games." Journal of Political Economy 93 (1985): 800-806.