eco 231 empirical project mets attendance

Empirical Project:

Mets Attendance (1986-2011)

Kevin Mulcahey

ECO-231

Dr. Letcher

The College of New Jersey


I. Statement of the Problem

This empirical study seeks to discover the explanatory variables that best reflect the

stadium attendance of the New York Mets from 1986 to 2011. The explanatory variables that I

have decided to study are the years since last playoff appearance, payroll (adjusted for inflation),

number of all-stars, winning percentage, and average batter’s age. In order to discover the

relationship between these explanatory variables and the dependent variable of attendance,

multiple regression, residual plots, normal probability plots, and various other statistical analyses

will be employed.

II. Review of Literature Related to the Variables

Before beginning the study, statistical journals and analyses regarding Major League

Baseball attendance, payroll, winning percentages, and other variables will be consulted. The

first journal article, written by Don N. MacDonald and Morgan O. Reynolds of Texas A&M University,

analyzes the relationship between players and their marginal revenue product by using many of the same

explanatory variables from my empirical study. Marginal revenue product is defined as the

additional revenue earned by the company by hiring one extra unit of labor.

Research from the 1986 and 1987 Major League Baseball seasons indicates that players

are paid for what they earn for their respective baseball organizations in sales. The reason for

payroll correlating directly to revenue for MLB organizations has a lot to do with the institutions

of free agency and final offer arbitration. Free agency allows players to test the market and seek

the best offer for their abilities among all major league teams. Arbitration allows for pay

increases during the season, based upon performance. These two contractual outlets for players

allow their marginal revenue product to correlate more directly with ticket sales, attendance, and


team revenue. These findings relate to my data very closely, as my dependent variable is

attendance, and my explanatory variable with the lowest p-value is payroll (adjusted for

inflation). My data also begins in 1986, just like this study performed by MacDonald and

Reynolds. The allowance for arbitration officially began in 1970, when a second MLB collective

bargaining agreement allowed for an impartial arbitrator in settling player contract

disagreements, rather than the commissioner of baseball. In the season of 1985, arbitrators

discovered that owners across Major League Baseball were colluding to keep baseball player

salaries artificially low, thus reducing competitive bidding. For their collusion, owners were

fined $280 million in damages, and baseball team payrolls have steadily increased each

season since.

The journal article by MacDonald and Reynolds also relates to another one of my

variables. Winning percentage, they say, is a less significant predictor of attendance than

statistics that forecast a team’s success. For example, once a team’s winning percentage has

risen from .500 to .550, a further increase from .550 to .600 will not have a noticeable impact

on stadium attendance. This is because fans view entertainment on an “ex ante” basis rather

than an “ex post” basis. In other words, the forecast of a team’s success is more conducive to

sales and attendance than past performance. People are more likely

to buy more tickets when they expect a team to perform well, rather than once they are already

doing well. In relating this idea to my variables, number of All-Stars would be a more significant

predictor of attendance than winning percentage. Payroll and All-Stars are similarly related in

that the more popular and high-quality a roster is, the more attendance will increase in a given

season.


Another statistical journal article, written by Michael C. Davis of the University of Missouri-Rolla,

analyzes the interaction between baseball attendance and winning percentage. According to

Davis, the interaction between baseball attendance and winning may not be completely obvious.

It is expected that as a team performs well, the organization’s “bandwagon effect” will come to

fruition and a team should “therefore expect to see an increase in attendance during and

following seasons in which the team played well on the field.” This journal article implies that

although winning percentage affects attendance directly because winning has become

increasingly important to fans in recent years, attendance also could affect winning percentage.

When a team is successful and generates superior attendance and revenue, winning percentage

should rise. Successful organizations have more room in their budget to attain high quality

players. In my regression, I chose to place payroll and winning percentage as the explanatory

variables, and attendance as the dependent variable.

Interestingly enough, according to the article, only about half of the National League

teams in the MLB had “up-ticks” in attendance. This would indicate that winning percentage as a

significant predictor of attendance varies by team. As for the American League, some

teams such as the Yankees actually showed a negative shock response to winning, as it may be

possible that fans have almost become indifferent to the team’s consistent winning nature.

However, the study concluded that in the long term, all ten of the sampled

teams (Cubs, Reds, Yankees, White Sox, Phillies, Pirates, Indians, Tigers, Cardinals, Red Sox)

showed positive attendance growth in response to winning percentage. The study also found that

for only one team, the Indians, did attendance have a positive effect on winning

percentage. This would indicate that my chosen dependent variable, attendance,


is the better choice of the two. Winning percentage is a better explanatory variable in Major

League Baseball.

A final statistical analysis, conducted by market research analyst David P. Kronheim,

takes into account another variable that has affected New York Mets attendance in the past three years.

This analysis discusses the effect of the stadium, which matters when considered alongside

the numerical data I employed in my analysis. Kronheim raises the point that when

the Mets moved to their new stadium in 2009, Citi Field, attendance was going to decrease

regardless of performance. The total number of seats in Shea Stadium was 57,365, whereas Citi

Field has only 41,800 seats. From 2005 to 2007, the Mets had gains in attendance of more than

470,000 per year, which left them within the top 2 teams in the National League in regard to

attendance increases. The Mets were very competitive during these last few years at Shea

Stadium. Kronheim notes that, “If the Mets had sold every single ticket possible in 2009,

including player and ‘comp’ tickets, their attendance still would have fallen by 656,243.”

However, the Mets still had quality attendance in 2009 at Citi Field, as 3,168,571 spectators

attended.

The large drop-off of 576,166 from 2009 to 2010 (an 18.4% decline) has to do with the

smaller number of seats, as well as other statistics. The Mets fell below a .500 winning percentage

again in 2009-2011 and had fewer all-stars. In fact, “the smallest attendance at any Mets home

game in 2008 was 45,321, which is more than 3,500 higher than Citi Field’s capacity.” The Mets

were playoff contenders in their last year at Shea Stadium, although they missed out on the

playoffs on the last game of the season. Attendance that year was 4,042,045. I decided not to

include the type of stadium in my personal regression analysis, because the data dates back to

1986, and the Mets have only been at Citi Field since 2009. The majority of the analysis comes


from Shea Stadium from 1986-2008, and the new stadium statistics would only appear to be

outliers. However, I wanted to include this market research journal in my report because it could

partially explain the drastic drop in the most recent data I have compiled (2009-2011).

III. Data Sources and Descriptions

In compiling the information for my Mets data set from 1986 to 2011, I used two main

sources. For the dependent variable of attendance, as well as the explanatory variables of

winning percentage, years since last playoff appearance, payroll, and average batter’s age, I used

Baseball-Reference.com. In order to discover the number of all-stars per year for the New York

Mets, I used Mets.com. I also decided to adjust the payroll for each year from 1986 to 2010 for

inflation in order to have the most accurate comparison possible. The inflation calculator on

bls.gov aided me in this process.
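
The inflation adjustment described above can be sketched as a simple ratio of price index values. This is only an illustration of the arithmetic, not the calculator's actual method: the function name and the index values are hypothetical placeholders, not figures taken from bls.gov.

```python
def adjust_for_inflation(nominal_amount, index_then, index_base):
    """Convert a nominal dollar amount into base-year dollars.

    index_then: price index for the year the amount was paid
    index_base: price index for the year used as the common base
    """
    if index_then <= 0 or index_base <= 0:
        raise ValueError("price index values must be positive")
    return nominal_amount * (index_base / index_then)


# Hypothetical index values chosen for easy arithmetic, NOT actual BLS data.
example_index_then = 100.0
example_index_base = 150.0

real_payroll = adjust_for_inflation(20_000_000, example_index_then, example_index_base)
print(f"{real_payroll:,.2f}")
```

Expressing every season's payroll in the same base-year dollars is what makes payrolls from different years comparable in the regression.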

As mentioned earlier, my data organizes the effect of years since last playoff appearance,

payroll (adjusted for inflation), number of all-stars, winning percentage, and average batter’s age

on stadium attendance for the New York Mets from 1986 to 2011 (Figure 1). For the first few

years of the data, namely 1986 to 1990, the Mets were very successful. With playoff appearances

in 1986 and 1988, and a World Series win in 1986, total season stadium attendance

ranged from 2.7 million to 3 million. In these five years, the Mets had high winning percentages

ranging from .537 to .667, and a total of 19 all-stars, which is an extremely high number.

In direct contrast to the years of 1986-1990, the Mets performed poorly from

1991 to 1998. The average batter’s age during these years was much lower than during

successful years (27 as opposed to 30 in 2000 when they made it to the World Series), winning

percentages were in the dismal range of .364 to .478, and payroll was much lower, highlighted


by the $35,015,247.14 team payroll in 1996. Attendance in these years was very low, rarely

breaking 2 million. The Mets performed poorly again from 2001 to 2005, performed well from

2006 to 2008, and performed poorly again from 2009 to 2011. These Mets eras indicate that

the dependent variable of attendance fluctuates along with most of the explanatory variables.

IV. Regression and Analysis

I have identified my explanatory variables, or X-variables, in this study as years since last

playoff appearance (YSLPA), payroll adjusted for inflation (Payroll), winning percentage (Win

%), number of all-stars (All-stars), and average batter’s age (Avg Batt. Age). My dependent

variable, or Y-variable, is attendance (Attendance). My data set is made up entirely of numeric

explanatory variables. First, individual scatter plots of each explanatory variable were created.

The scatter plot of the X-variable YSLPA against the Y-variable attendance is shown below.

[Figure 2: YSLPA v. Attendance. Scatter plot with x-axis Years Since Last Playoff Appearance (0 to 12) and y-axis Attendance (0 to 4,500,000). Trend lines shown: logarithmic, f(x) = −712380.93056898 ln(x) + 3285693.67286353, R² = 0.499236624667553; power, f(x) = 3339126.98358037 x^(−0.304548335135481), R² = 0.485690393216885; cubic, f(x) = 12415.3785763 x³ − 159529.959663 x² + 290506.684811 x + 2967486.31414, R² = 0.63413029095782]

The scatter plot of YSLPA against Attendance indicates that as the number of years since

the Mets last reached the playoffs increases, attendance decreases. Originally, I tried a linear

trend line to fit the data. The line fit the data relatively well, aside from three data points


from years 8 through 10. The R-square for the linear fit was only .4, however, so I opted to

try a quadratic or cubic equation to reflect the curvature of the data in years 8, 9, and 10. With

the cubic equation, the R-square improved dramatically from .4 to .634. Although it would appear

that a negatively sloped straight line should fit the data, there is a possible explanation for the

curvature. In years 8, 9, and 10 of missed playoff berths, according to the data set, the Mets were

starting to come out of their decade-long slump and becoming possible playoff contenders. It is

possible that the fans, in anticipation of the improved play of the team, started to attend more

games.
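
The comparison of trend lines by R-square described above can be sketched as follows. This is a minimal illustration, assuming NumPy is available; the helper name is hypothetical and the data values are made up for demonstration. They are NOT the Mets data set, so the printed R-squares will not match the figures reported in this paper.

```python
import numpy as np


def r_squared_for_degree(x, y, degree):
    """Fit a polynomial of the given degree by least squares and return its R-square."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)        # least-squares polynomial coefficients
    fitted = np.polyval(coeffs, x)           # fitted values at the observed x
    ss_res = np.sum((y - fitted) ** 2)       # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation around the mean
    return 1.0 - ss_res / ss_tot


# Made-up illustrative values (hypothetical), NOT the Mets data.
years_since = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
attendance = np.array([3.0, 2.9, 2.6, 2.2, 1.9, 1.7, 1.6, 1.6, 1.8, 2.3, 2.7]) * 1e6

for degree in (1, 2, 3):
    r2 = r_squared_for_degree(years_since, attendance, degree)
    print(f"degree {degree}: R-square = {r2:.3f}")
```

Because each higher-order polynomial contains the lower-order one as a special case, R-square can only stay the same or rise as the order increases, so a higher R-square alone does not show that the more complex curve is the better model.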

Payroll, my second X-variable, has a scatter plot against the Y-variable that also reveals

some curvature. By using the R-square as a measure of fit again, I decided that a linear fit was

not appropriate. A linear equation had an R-square of .1, while the fourth-order (quartic) equation

had an R-square of .48. In choosing a fourth-order equation, I took into account the R-squares of

the quadratic and cubic equations and decided that parsimony did not apply. Each increase in

order improved the R-square by .08 to .10. Therefore, the gains in

fit were too large to justify stopping at a quadratic or cubic equation.

[Figure 3: Payroll v. Attendance. Scatter plot with x-axis Payroll (0 to 200,000,000) and y-axis Attendance (0 to 4,500,000). Trend line shown: quartic, f(x) = 1.8609402E-25 x⁴ − 7.187176E-17 x³ + 9.79745286E-09 x² − 0.538350076 x + 11955827.293, R² = 0.481238298393091]
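
One common way to check whether added polynomial terms are worth their cost is the adjusted R-square, which penalizes each extra term. The sketch below is a supplementary check, not a calculation performed in this study. It uses the linear and quartic R-squares quoted above and assumes 26 observations (one per season from 1986 to 2011); that count is an assumption, as is the function name.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-square: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    df_error = n_obs - n_predictors - 1
    if df_error <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / df_error


N_SEASONS = 26  # assumption: one observation per season, 1986 through 2011

linear_adj = adjusted_r_squared(0.10, N_SEASONS, 1)   # linear fit: 1 term in x
quartic_adj = adjusted_r_squared(0.48, N_SEASONS, 4)  # quartic fit: 4 terms in x

print(f"linear adjusted R-square:  {linear_adj:.3f}")
print(f"quartic adjusted R-square: {quartic_adj:.3f}")
```

Under these assumptions the quartic's adjusted value (about .38) stays well above the linear fit's (about .06), which is consistent with the decision not to stop at a lower-order equation.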


For my third X-variable, All-stars, I analyzed the coefficient of determination, the

R-square, once again. Although the linear fit was stronger than it was for payroll, at .397, I still opted

for a cubic polynomial equation. The R-square for the fit of this equation was .419.

[Figure 4: All-stars v. Attendance. Scatter plot with x-axis Number of All-stars (0.5 to 5.5) and y-axis Attendance (0 to 4,500,000). Trend line shown: cubic, f(x) = −41835.174790099 x³ + 270645.06697743 x² − 49071.849825538 x + 1795789.1082761, R² = 0.419157873670009]

Clearly, as a team acquires more talented players, attendance increases. However,

there is still some curvature. A possible explanation for this curvature could be that as a team gains

its first 1, 2, or 3 all-stars, the team’s prospects for attendance rise dramatically, but the excitement fans

have for all-stars 4, 5, and 6 increases at a slower rate. While attendance is still higher,

this may suggest that a team only performs marginally better with more than 3 or 4 all-stars.
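
The curvature discussed above can be inspected by evaluating the cubic trend line from Figure 4 at each whole number of all-stars. The coefficients below are the ones reported in that figure; the helper name and the choice to evaluate at 1 through 5 all-stars (the range of the plot's x-axis) are assumptions made for this sketch.

```python
# Cubic trend line coefficients reported in Figure 4, highest power first.
ALL_STAR_CUBIC = (-41835.174790099, 270645.06697743, -49071.849825538, 1795789.1082761)


def evaluate_polynomial(coeffs, x):
    """Evaluate a polynomial (coefficients highest power first) using Horner's rule."""
    result = 0.0
    for coefficient in coeffs:
        result = result * x + coefficient
    return result


previous = None
for n_all_stars in range(1, 6):
    fitted = evaluate_polynomial(ALL_STAR_CUBIC, n_all_stars)
    change = "" if previous is None else f" (change {fitted - previous:+,.0f})"
    print(f"{n_all_stars} all-stars: fitted attendance {fitted:,.0f}{change}")
    previous = fitted
```

On this fitted curve the gain from the third to the fourth all-star is smaller than the gain from the second to the third, and the fitted value at five all-stars falls below the value at four. The downturn at the upper end may be an artifact of the cubic form rather than a real effect, so it should be read with caution.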

The fourth X-variable, Win%, has a somewhat scattered plot. The goodness of

fit, regardless of the type of equation, seems to be relatively low. The highest R-square I was

able to attain was .284 with a cubic equation. The curvature indicates that attendance is low

for winning percentages of .35 to .45, with little increase across that range. This may suggest that even at

the higher winning percentages in this range, fans still do not wish to attend games because the team is

not competitive in the championship season. There are, however, dramatic increases from


winning percentages of .45 to .55. With these percentages, the Mets have a realistic chance at

the playoffs.

[Figure 5: Win% v. Attendance. Scatter plot with x-axis Winning Percentage (0.3 to 0.7) and y-axis Attendance (0 to 4,500,000). Trend line shown: cubic, f(x) = −243251228.120716 x³ + 370083180.274333 x² − 179730140.437141 x + 30221937.0151557, R² = 0.283880070614012]

There are still modest increases in attendance from .55 to .6, before it levels off. The fitted curve

then shows attendance actually decreasing at the winning percentage of .667, but

this could reflect an outlier, because the majority of the data has already leveled off from .6

to .65.

My final X-variable is the Mets’ average batter’s age, Avg. Batt. Age, for each season

from 1986 to 2011.

[Figure 7: Avg. Batt. Age V. Attendance. Scatter plot with x-axis Average Batter's Age (27 to 31) and y-axis Attendance (0 to 5,000,000). Trend line shown: quartic, f(x) = −93160.63899996 x⁴ + 11018274.10692 x³ − 488154740.5889 x² + 9602092047.389 x − 70754078648.07, R² = 0.333701062726698]


For this fifth variable, a polynomial equation was appropriate once again. A quartic equation

appeared to have the highest R-square, with a value of .334. A linear equation would not have

been as appropriate, because it appears that attendance rises from the average batter’s ages of

27.5 to 28, then levels off from 28.5 to 29.5, and rises dramatically from the age of 30 on. A

possible explanation for this curvature could be that an older lineup may have more experience,

reflect better performance, and thus affect attendance. This variable, however, proved to be

an insignificant predictor of attendance, as shown by the multiple regression performed in the

subsequent portion of this study.

Additional statistical analysis, aside from scatter plots and goodness of fit, is required in

order to discover significant predictors of attendance. A multiple regression including each

explanatory variable against the dependent variable of attendance indicated that a form of

variable selection was necessary (Figure 8). First, a global F-test was run on the full set of

variables to decide whether any of them were significant predictors. The hypothesis test for the

global F-test is as follows:

H0: B1 = B2 = B3 = B4 = B5 = 0

Ha: At least one of the Betas ≠ 0

For this hypothesis test, I decided to use an alpha level of .15. My

reasoning is that when studies of social sciences are conducted with human beings, there is more

variation. Therefore, I do not want to reject any variables that could be found significant. The

global F-test showed a very low P-value of .00002. Because this P-value is below alpha, I rejected

the null hypothesis and concluded that at least one of the explanatory variables was significant.
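
For reference, the global F statistic behind this test can be computed from the multiple regression's R-square, the number of observations, and the number of explanatory variables. The sketch below only illustrates that formula. The R-square of .75 is a hypothetical placeholder, not the value from Figure 8, and the count of 26 observations assumes one per season from 1986 to 2011.

```python
def global_f_statistic(r_squared, n_obs, n_predictors):
    """Return (F, df_model, df_error) for the global F-test of a regression.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    df_model = n_predictors
    df_error = n_obs - n_predictors - 1
    if df_error <= 0:
        raise ValueError("need more observations than estimated coefficients")
    f_stat = (r_squared / df_model) / ((1.0 - r_squared) / df_error)
    return f_stat, df_model, df_error


# Hypothetical R-square for illustration only; NOT the value from this study.
f_stat, df_model, df_error = global_f_statistic(0.75, n_obs=26, n_predictors=5)
print(f"F = {f_stat:.2f} with {df_model} and {df_error} degrees of freedom")
```

The P-value is the upper-tail probability of this F statistic under an F distribution with the two degrees of freedom shown, and the null hypothesis is rejected when that P-value falls below the chosen alpha of .15.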
