Senior Design Final Report
TopGolf Inc: Demand Forecasting Prepared by: Kaitlyn Farmer, Janie Flowers, Zach Taher May 8th, 2014
Table of Contents
Management Summary
Background and Description of Project
Analysis of the Situation
Technical Description of the Model
  Dallas R Software Output
  Dallas R Software Output (No 2014)
  Austin R Software Outputs
Analysis and Managerial Interpretation
  Austin, TX Site Regression Chart
  Dallas, TX Site Regression Chart
  Dallas, TX Site (No 2014) Regression Chart
Conclusions and Critique
Appendix A - Austin Diagnosis
Appendix B - Austin Plot 2
Appendix C - Dallas Plot
Appendix D - Dallas Diagnostics
Appendix E - Dallas Plot (No 2014)
Appendix F - Dallas Diagnostics (No 2014)
Management Summary
TopGolf is a unique entertainment facility that provides guests with a new type of golfing
experience, allowing players of all ages and skill levels to participate in a fun point-scoring golf
game. Players hit microchipped golf balls at targets located in the outfield and track their hits with
the balls' tracking system. TopGolf is also a premier event venue that provides guests with
upscale food and beverage options, VIP memberships, special events, private event services, and
much more. The company currently has 10 locations in two countries and plans to open 90 more sites
within the next five years. That is 900% growth in five years. Despite this success and growth,
TopGolf currently has no revenue forecasting models or programs to
help predict its daily business operations. TopGolf's quick growth and massive
success have allowed it to get by without forecasting its customers' spending patterns.
However, TopGolf indicated to us that the lack of demand forecasting models makes many daily
business decisions difficult, such as optimizing the staffing schedule.
With this in mind, our team decided that our senior design project would be to create a type
of forecasting model to help TopGolf better track revenue data and predict daily demand. Our team
was also interested in finding out if weather conditions had any effect on customer play and sales at
TopGolf. Our project would look at these variables and create a future revenue forecasting model to
be implemented to help staff scheduling, so that TopGolf can better meet Key Performance
Indicators, such as sales per employee hour.
Our team was given over 1.8 million data points from TopGolf's revenue data collection file.
We set out to identify potential factors that could influence revenue and to build an accurate model
that forecasts future demand with respect to these factors. Over the course of the project, our
team used AWK, a text-processing tool, to break down the large revenue file into multiple files based on
specific site data. We also used Microsoft Excel to organize, manipulate, and run basic regressions
on the weather and revenue data. Finally, we used R software to create models, analyze the data,
and build our final forecasting models.
The first step of analysis was discovering which variables would be significant in
forecasting revenue. We looked at TopGolf’s revenue data, day of the week, month of the year, and
historical weather data to create a forecasting model. The data was inconclusive for the Dallas, TX
location due to the new collection process; however, the Austin, TX site had a good collection of
data we were able to use to create a model. We concluded using the Austin data that day of the
week plays the largest role in determining revenue for a given day at TopGolf. Through our model,
we found that weekends yield better results than the weeknights. In addition, we found that
TopGolf also performs very well when the weather is between 51 and 70 degrees Fahrenheit.
Background and Description of Project
TopGolf is a new company, with operations spanning only five years. After Ken May,
Chief Operating Officer, came on board with the company, he made it his mission to incorporate
Business Intelligence strategies into daily business operations. Less than a year ago, Ken May
created a Business Intelligence and Analytics group at the company to start tracking and modeling
data. Coming into this project, the Business Intelligence and Analytics group at TopGolf was still in
the preliminary stages of consolidating data that had been collected but left
untouched for the entire life of the company. TopGolf had many problems when it came to
Business Intelligence. Its data was not being tracked, organized, or put into any
standardized models. The company has no forecasting software and forecasts daily sales
manually in spreadsheets. With all of these problems, it is very difficult for TopGolf to
standardize daily business decisions such as creating optimal staffing
schedules for its multiple sites.
Our senior design team was motivated to apply our expertise in Management Science
modeling and analytics to help TopGolf solve this problem. The purpose of our
study is to help TopGolf predict customer trends by creating a demand forecasting model, which
will allow the company to make more informed business decisions.
Our team first had to decide what software we would use to create the forecasting model.
TopGolf was using an advanced Microsoft Business Package, so we decided to start
running our models in Excel. We ran preliminary regressions in Excel to see if there was any
correlation. However, Excel's regression tool is only able to run regressions using 16 variables. After seeing some
correlation in the data using Excel and speaking with TopGolf about what software they could
use, we converted the data to CSV format and loaded it into R, which is able to handle
more variables.
Another decision we had to make was how to group the variables. We decided to group the
variables by day of the week, month of the year, and temperature range. We ran these
variables against the TopGolf revenue data to analyze whether they affected the revenue for a given day.
We chose these variables by reasoning from personal experience to hypothesize
what the largest influencers on revenue would be.
There were many variables to consider when deciding what weather data to include in the
study. We were able to find historical daily weather data that included a great deal of information, from
humidity to wind direction. Our team decided to focus on only a few key weather factors that we
believed might have an impact on customers' use of TopGolf.
This model will affect the Business Intelligence (BI) group because they will be able to use
our model to inform future staffing decisions. Our results will help the BI group
compare the trends they have seen in-house with what we found. As TopGolf employees become
more aware of customer trends, they will be able to better forecast revenues and become a more
efficient company.
Analysis of the Situation
Our senior design team's goal in this project was to create a forecasting
model that looked at the daily correlation between date, climate factors, and daily revenues
collected at specific TopGolf sites. Our team's general approach to the problem of demand
forecasting for TopGolf began with mining the revenue data we received. We converted this
data to a .txt file so that it could later be loaded into Excel. After reviewing the data file, we found
that it had over 1.8 million data points. To make the data manageable, we first had to break it
down, and we decided the most practical way to divide up the
data was by TopGolf site location. We used AWK to
split the data by site. Once we had separated the data for the different TopGolf
sites, we were able to load it into Excel for further data scrubbing.
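The site-by-site split described above was done with AWK. As a rough illustration of the same idea, and not the team's actual script, the Python sketch below groups delimited rows by a site-ID column in a single pass. The comma delimiter, the column position, and the sample rows are all assumptions made for the example; the report does not describe the layout of the real export.

```python
import csv
import io
from collections import defaultdict


def split_by_site(lines, site_col=0, delimiter=","):
    """Group delimited rows by the value in the site-ID column.

    `site_col` and `delimiter` are assumptions; the report does not
    describe the layout of the real export.
    """
    groups = defaultdict(list)
    for row in csv.reader(lines, delimiter=delimiter):
        if not row:  # skip blank lines
            continue
        groups[row[site_col]].append(row)
    return dict(groups)


# Made-up rows: site ID, Date ID, amount
sample = io.StringIO("5,20130411,42.50\n7,20130411,18.00\n5,20130412,27.25\n")
by_site = split_by_site(sample)
print(sorted(by_site))    # site IDs found in the sample
print(len(by_site["5"]))  # number of rows for site 5
```

For a file of several hundred megabytes, one would more likely write each row straight to a per-site output file instead of holding every row in memory.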
First, we compared weather data and revenue data against each other in basic
Excel pivot charts and graphs. Next, our team created regression models in Excel using the Dallas,
TX site revenue data and Dallas weather data for further analysis. After analyzing the results, we
saw a large spike in 2014 revenue data. TopGolf told us that this spike was
due to its new revenue collection method, in which customers rent bays by the hour rather than
purchasing physical quantities of balls. Our team decided to first look at the results
without 2014 data, to see the effect the new pricing strategy had on the regression. Surprisingly,
these results still provided a poor basis for forecasting revenue. After finally
finding a somewhat significant correlation with the Austin, TX site, we decided to start running
the data in R software to create models using larger amounts of data. Our team started running the
Dallas and Austin, Texas site data in R. We decided not to use the Houston, TX site data
because it showed poor correlation. It seems that the reporting for the Houston site was off, because
the total revenues were very low, often below $10,000 for certain days. This had a significant
impact on the results: the Houston data could not produce R² values above 0.1 in Excel, despite
being run many different ways. Fortunately, the results of the Austin data were promising, achieving
R² values between 0.6 and 0.7. The only factor that reduces the credibility of this result is that the Austin,
Texas TopGolf site data covers only about nine months.
Our team was able to find distinct weekly and monthly trends in this data, which can be seen
in the graphs in the Appendix. The Dallas regressions were not as promising, with an R² value
of about 0.2, so we decided to run the regression twice. The first time, we used the data as it was
provided to our team by TopGolf. However, as the Dallas plot (Appendix C) shows,
there is an obvious change in reporting between 2013 and 2014 that throws off the regression.
To account for this change, we removed all of the 2014 data (see Appendix E) and ran the
regression again. Unfortunately, the results still did not show strong correlation, as can be
seen from the actual vs. forecasted plotted points (Appendix E). The full models appear
later in this report, along with the variables, technical analysis, and project results.
Technical Description of the Model
Data was initially received from TopGolf in a Crystal Reports format. Crystal Reports is an
SAP program that pulls database query results into a viewable report. Another reason files
are exported in this format is so that the data can be compressed and large amounts of data do not
consume computer processing power. We initially tried to install a free report viewer for the data;
however, the data would not open in that program, and we were forced to look for other methods
to view it. By changing the file extension to .txt, we were able to open the file with
Notepad. The data file was very large, over 800 megabytes in size, and contained over 1.8 million
rows of data, because the file contained every transaction for the last two years
at all 10 operating TopGolf sites.
Once we were able to view the data, we needed a method to clean
it up and find the portions that would be useful to us. Initially we read the data into Excel;
however, Excel would only take slightly over one million rows, leaving nearly half of our data file
unused. Fortunately, the data was ordered by site ID. The TopGolf site ID for the Dallas location
was 5, and we were able to run data validation in Excel, which left us with only the data for the
Dallas location. A problem we ran into was that the date recorded for each entry was not split into
month, day, and year. Instead it was provided as a “Date ID” with a value such as “20130411”. By
analyzing multiple lines of data, we were able to infer that the first four digits of the Date ID
contained the year, the next two digits contained the day of the month, and the last two digits
were the month number. We used Excel’s =RIGHT(string, #characters), =MID(string,
start character #, #characters), and =LEFT(string, #characters) functions to split the Date ID into
separate year, month, and day columns. Using this data we were then able to use the
=WEEKDAY(date) function in Excel to determine the weekday for each data point; the weekdays became
7 of the variables that we used in our analysis.
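The Date ID split above was done with Excel text functions. A minimal Python sketch of the same step follows. By default it uses the digit order described in this report (year, then day, then month); the `day_before_month` switch is a hypothetical parameter added here in case the file actually uses the more common year-month-day layout, which the report does not confirm either way.

```python
from datetime import date


def parse_date_id(date_id, day_before_month=True):
    """Split an 8-digit Date ID into a datetime.date.

    day_before_month=True follows the layout described in this report
    (year, then day, then month). Pass False for a year-month-day
    layout if that is how the file is actually stored.
    """
    text = str(date_id)
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"unexpected Date ID: {date_id!r}")
    year = int(text[:4])
    first, second = int(text[4:6]), int(text[6:8])
    day, month = (first, second) if day_before_month else (second, first)
    return date(year, month, day)


d = parse_date_id("20130411")
# weekday() returns 0 for Monday through 6 for Sunday
print(d.isoformat(), d.weekday())
```

Note that Python's `weekday()` numbers Monday as 0, whereas Excel's =WEEKDAY numbers Sunday as 1 by default, so the two are not interchangeable without an offset.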
The data still required more scrubbing before it could be used in a regression. We
needed to delete all of the entries with no reported revenue. These showed up as NULL and could
not be used, as they would greatly skew any regression models. Unneeded columns such as
transaction type and transaction amount were deleted. We considered using transaction type as a
variable; however, there were over 25 different transaction types, and it was not clear how we
would assign numerical values to each one. Lastly, we put the revenue into a
pivot table so that we would have daily sums.
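The NULL filtering and pivot-table step can be sketched in a few lines of Python. This is only an illustration of the aggregation, not the team's Excel workbook; the tuple layout and the sample amounts are made up for the example.

```python
from collections import defaultdict


def daily_revenue(transactions):
    """Sum transaction amounts per Date ID, skipping NULL or blank amounts."""
    totals = defaultdict(float)
    for date_id, amount in transactions:
        if amount is None or str(amount).strip().upper() in ("", "NULL"):
            continue  # no reported revenue for this entry
        totals[date_id] += float(amount)
    return dict(totals)


# Made-up transactions: (Date ID, amount as exported text)
rows = [
    ("20130411", "42.50"),
    ("20130411", "NULL"),
    ("20130411", "7.50"),
    ("20130412", "20.00"),
]
totals = daily_revenue(rows)
print(totals)
```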
After cleaning up all the data, we needed to add weather data. We pulled the data from
www.wunderground.com, which had daily reports of weather statistics for the past five years. Our
data only went back three years, so we had weather data for every TopGolf operating day. The weather
data files contained much more data than we needed, including minimum temperature,
maximum temperature, average temperature, wind speed, humidity, visibility, and precipitation.
We decided that temperature and precipitation would have the greatest
impact and used these in our model. This data was added to the TopGolf data
using multiple VLOOKUP formulas in Excel. To illustrate, the formula takes the date in the
TopGolf revenue sheet, looks for the corresponding date in the weather
spreadsheet, and then returns the value in the specified column (temperature or precipitation) for that row.
This accurately and quickly pulled all of the relevant weather data into our main TopGolf data file.
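The date lookup described above behaves like an exact-match join on the date. A small Python sketch of that join is shown below; the field names (`avg_temp_f`, `precip_in`) and all values are hypothetical. Unlike a bare lookup, it also reports revenue dates that have no weather record, which is one way the kind of gaps later found in the precipitation data could be surfaced early.

```python
def attach_weather(revenue_by_date, weather_by_date):
    """Join daily revenue to weather on the date key (exact-match lookup).

    Dates with no weather record are returned separately rather than
    silently dropped, so gaps in the weather file are visible.
    """
    joined, missing = [], []
    for day, revenue in sorted(revenue_by_date.items()):
        weather = weather_by_date.get(day)
        if weather is None:
            missing.append(day)
            continue
        joined.append({"date": day, "revenue": revenue, **weather})
    return joined, missing


# Made-up values for illustration only
revenue = {"2013-11-04": 5200.0, "2013-11-05": 4800.0, "2013-11-06": 5100.0}
weather = {
    "2013-11-04": {"avg_temp_f": 58, "precip_in": 0.0},
    "2013-11-05": {"avg_temp_f": 63, "precip_in": 0.12},
}
joined, missing = attach_weather(revenue, weather)
print(joined)
print(missing)
```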
After the data had been cleaned, we needed to select variables that appeared to have a
significant impact on revenue. To help us decide on these variables, we graphed the raw daily
revenue data, which showed obvious seasonality and weather trends. We had also predicted
that weather would have a significant impact on revenue and decided to use
temperature and precipitation as variables as well. Originally, and mistakenly, we used the
variables as ordinary numerical values. Thus we essentially had only four variables: weekday (with
values ranging from 1-7), month of year (values ranging from 1-12), precipitation, and temperature
(ranging from 40-100+). We ran an initial regression on the data using Excel’s regression analysis
tool and came away with an R² value less than 0.1, indicating poor correlation. We concluded that
this was poor model design due to our variables. With numerical coding, the model assumes
that the later in the week, the later in the year, and the hotter it is, the greater revenue will be. Under
this assumption, a 110 degree day on a Sunday in December would produce the highest revenue.
Thus we decided to change our variables into binary variables. Each day of the week
and each month became its own independent variable. We stratified the temperature data into ranges
and used IF formulas in Excel to assign either a 1 or a 0 depending on whether the average
temperature fell within each range. We kept precipitation as a numerical variable, assuming that it
would have a negative coefficient and that the more it rained in a day, the less revenue TopGolf
would recognize.
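The switch from numeric codes to binary variables can be sketched as follows. This Python example is illustrative only: the temperature ranges follow the list given later in this report, but the variable names and the treatment of fractional temperatures (for example, 50.5 falling into the 51-60 range) are assumptions, since the report does not say how the Excel IF formulas handled boundaries.

```python
from datetime import date

TEMP_BINS = [  # (label, inclusive upper bound in degrees F)
    ("below_50", 50),
    ("51_60", 60),
    ("61_70", 70),
    ("71_80", 80),
    ("81_90", 90),
]
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def temp_bin(avg_temp_f):
    """Return the label of the first range whose upper bound covers the value."""
    for label, upper in TEMP_BINS:
        if avg_temp_f <= upper:
            return label
    return "above_90"


def encode_day(day, avg_temp_f):
    """Return a dict of 0/1 indicators for weekday, month and temperature range."""
    row = {name: int(day.weekday() == i) for i, name in enumerate(WEEKDAYS)}
    row.update({f"month_{m}": int(day.month == m) for m in range(1, 13)})
    labels = [label for label, _ in TEMP_BINS] + ["above_90"]
    row.update({label: int(temp_bin(avg_temp_f) == label) for label in labels})
    return row


encoded = encode_day(date(2013, 11, 4), avg_temp_f=58)
# Show only the indicators that are switched on for this day
print({k: v for k, v in encoded.items() if v == 1})
```

Each day ends up with exactly one weekday, one month, and one temperature indicator set to 1, for 7 + 12 + 6 = 25 indicator variables in total.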
We again attempted to run the regression in Excel; however, Excel’s regression tool will only
take 16 variables. We used day of the week, temperature data, and precipitation for a total of
16 variables. The regression gave us an R² value of 0.28, which was still poor. The
coefficient for the precipitation data was, strangely, positive, indicating that with more rain
there would be more revenue. This seemed wrong, and after reassessing the precipitation data, we
determined that the data itself was incorrect: there were days on which other sources reported rain
but our data did not. As a result we decided to remove the
precipitation data from our model. This increased our R² value to 0.3, which was a slight
improvement.
We noticed there were a few data points that were large outliers. There were certain days
with revenue over $500,000 when a typical day sat between $50,000 and $100,000 in revenue. We
decided to remove these days from our model. We also noticed a large jump in sales in the 2014 data:
daily revenue was almost 4 times what it was in previous years. This was not simply a result of
increased business but a reporting difference. We deleted the 2014 entries and ran another
regression, which resulted in R² = 0.35, another improvement.
In an effort to get better results, we decided to look at another site. We chose
the Austin site because it opened with the new reporting system already in place. The
methods above were repeated to clean the Austin data, and after running a regression we were able
to achieve R² = 0.65, which showed that there was a real correlation between the daily revenue
and the variables that we selected. We still had not run the regression with all 25 variables because of
Excel’s variable limit, so we decided to use the R statistical package.
To get the data into R, we saved the Excel files in CSV format, which can easily be read into
R using the read.csv(csv name) command. Each file was read into the program and saved as its own
data frame. We then used R’s lm command to fit a linear regression and specified which
variables we wanted to use. The variables analyzed in the regression are below:
Temperature range (50 and below, 51-60 degrees, 61-70 degrees, 71-80 degrees, 81-90
degrees, above 90 degrees)
Day of the Week (Monday, Tuesday, Wednesday, etc)
Month of the year (January, February, March, etc)
The regression was run three different times, once for all of the Dallas data, once for the Dallas data
without 2014 revenue, and once for the Austin location. The outputs of R’s linear regression are
below:
Dallas R Software Output
Call:
lm(formula = tgd$Total.Amount ~ tgd$Below.50 + tgd$X51.60 + tgd$X61.70 +
tgd$X71.80 + tgd$X81.90 + tgd$ABOVE.90 + tgd$Monday + tgd$Tuesday +
tgd$Wednesday + tgd$Thursday + tgd$Friday + tgd$Saturday +
tgd$Sunday + tgd$January + tgd$February + tgd$March + tgd$April +
tgd$May + tgd$June + tgd$July + tgd$August + tgd$September +
tgd$October + tgd$November + tgd$December)
Residuals:
Min 1Q Median 3Q Max
-10332 -2551 -71 1049 60006
Coefficients: (2 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7402.8 2307.2 3.209 0.001400 **
tgd$Below.50 -1076.8 2141.7 -0.503 0.615273
tgd$X51.60 741.3 2134.4 0.347 0.728465
tgd$X61.70 245.9 2158.8 0.114 0.909337
tgd$X71.80 -881.5 2322.0 -0.380 0.704334
tgd$X81.90 -696.6 2456.5 -0.284 0.776832
tgd$ABOVE.90 -886.8 2738.1 -0.324 0.746151
tgd$Monday -593.8 1039.2 -0.571 0.567940
tgd$Tuesday -42.1 1044.0 -0.040 0.967842
tgd$Wednesday 207.8 1041.4 0.200 0.841934
tgd$Thursday 430.1 1044.6 0.412 0.680701
tgd$Friday 2052.2 1041.0 1.971 0.049108 *
tgd$Saturday 3652.9 1037.2 3.522 0.000459 ***
tgd$Sunday NA NA NA NA
tgd$January -6666.0 1300.8 -5.125 3.94e-07 ***
tgd$February 576.4 1338.5 0.431 0.666884
tgd$March 1090.6 1398.4 0.780 0.435731
tgd$April -7034.9 1692.5 -4.156 3.67e-05 ***
tgd$May -5736.4 1754.1 -3.270 0.001131 **
tgd$June -6266.9 1871.1 -3.349 0.000857 ***
tgd$July -6370.5 1908.2 -3.338 0.000891 ***
tgd$August -6465.6 1920.7 -3.366 0.000807 ***
tgd$September -6597.4 1772.3 -3.722 0.000214 ***
tgd$October -7311.5 1451.8 -5.036 6.17e-07 ***
tgd$November 277.6 1346.8 0.206 0.836777
tgd$December NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7199 on 647 degrees of freedom
Multiple R-squared: 0.2278, Adjusted R-squared: 0.2004
F-statistic: 8.3 on 23 and 647 DF, p-value: < 2.2e-16
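The note "not defined because of singularities" in the output above is what R prints when some predictor columns are exact linear combinations of others. One common cause, and a likely one here although the output alone does not confirm it, is that a complete set of indicator variables (all seven weekdays, or all twelve months) sums to 1 in every row and therefore duplicates the intercept. The Python sketch below demonstrates that effect on a made-up one-week design matrix; the small `matrix_rank` helper is written for this example and is not part of any library.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# One row per day of a made-up week: intercept column plus 7 weekday indicators
full = [[1] + [int(i == d) for i in range(7)] for d in range(7)]
# Same design with the last indicator (Sunday) dropped as the baseline
reduced = [row[:-1] for row in full]

print(len(full[0]), matrix_rank(full))        # columns vs. rank, full set
print(len(reduced[0]), matrix_rank(reduced))  # columns vs. rank, one dropped
```

With the full set, the matrix has one more column than its rank, so one coefficient cannot be estimated. Dropping one indicator per group makes that level the baseline against which the others are measured, which is consistent with Sunday and December being reported as NA.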
Dallas R Software Output (No 2014)
Call:
lm(formula = tgd$Total.Amount ~ tgd$Below.50 + tgd$X51.60 + tgd$X61.70 +
tgd$X71.80 + tgd$X81.90 + tgd$ABOVE.90 + tgd$Monday + tgd$Tuesday +
tgd$Wednesday + tgd$Thursday + tgd$Friday + tgd$Saturday +
tgd$Sunday + tgd$January + tgd$February + tgd$March + tgd$April +
tgd$May + tgd$June + tgd$July + tgd$August + tgd$September +
tgd$October + tgd$November + tgd$December)
Residuals:
Min 1Q Median 3Q Max
-1657.9 -381.0 -35.0 255.5 6455.2
Coefficients: (2 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1662.86 274.89 6.049 2.87e-09 ***
tgd$Below.50 -831.70 254.58 -3.267 0.001163 **
tgd$X51.60 -681.00 245.29 -2.776 0.005705 **
tgd$X61.70 -567.34 246.78 -2.299 0.021920 *
tgd$X71.80 -445.17 257.40 -1.730 0.084340 .
tgd$X81.90 -487.23 268.99 -1.811 0.070697 .
tgd$ABOVE.90 -473.90 294.07 -1.612 0.107704
tgd$Monday -454.83 116.93 -3.890 0.000114 ***
tgd$Tuesday -491.54 117.24 -4.193 3.27e-05 ***
tgd$Wednesday -429.33 116.45 -3.687 0.000252 ***
tgd$Thursday -324.84 116.74 -2.783 0.005597 **
tgd$Friday 145.94 116.49 1.253 0.210851
tgd$Saturday 601.05 116.18 5.173 3.34e-07 ***
tgd$Sunday NA NA NA NA
tgd$January 245.35 182.08 1.348 0.178434
tgd$February 185.24 187.48 0.988 0.323611
tgd$March 507.19 182.88 2.773 0.005758 **
tgd$April 254.09 189.55 1.340 0.180712
tgd$May 796.26 194.63 4.091 5.01e-05 ***
tgd$June 226.26 205.13 1.103 0.270555
tgd$July -11.34 208.43 -0.054 0.956623
tgd$August -38.72 209.47 -0.185 0.853425
tgd$September -123.22 196.21 -0.628 0.530309
tgd$October -185.41 176.51 -1.050 0.294014
tgd$November 59.00 188.43 0.313 0.754335
tgd$December NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 707.3 on 496 degrees of freedom
Multiple R-squared: 0.3243, Adjusted R-squared: 0.293
F-statistic: 10.35 on 23 and 496 DF, p-value: < 2.2e-16
Austin R Software Outputs
Residuals:
Min 1Q Median 3Q Max
-61981 -9307 106 10391 56856
Coefficients: (4 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37867 13806 2.743 0.006516 **
tga$MONDAY -11384 4360 -2.611 0.009553 **
tga$TUESDAY -2987 4335 -0.689 0.491455
tga$WEDNESDAY 3502 4360 0.803 0.422497
tga$THURSDAY 11532 4360 2.645 0.008675 **
tga$FRIDAY 23180 4366 5.309 2.37e-07 ***
tga$SATURDAY 40430 4374 9.243 < 2e-16 ***
tga$SUNDAY NA NA NA NA
tga$BELOW.50 34155 14310 2.387 0.017716 *
tga$X51.60 49210 14492 3.396 0.000792 ***
tga$X61.70 47933 14593 3.285 0.001162 **
tga$X71.80 -2500 5325 -0.469 0.639165
tga$X81.90 -1034 7665 -0.135 0.892813
tga$Above.90 6951 10313 0.674 0.500899
tga$January -20957 5032 -4.164 4.25e-05 ***
tga$February -52789 5184 -10.183 < 2e-16 ***
tga$March -54662 5911 -9.247 < 2e-16 ***
tga$April NA NA NA NA
tga$May NA NA NA NA
tga$June -61289 9510 -6.445 5.62e-10 ***
tga$July -24679 9012 -2.738 0.006603 **
tga$August -25144 9270 -2.712 0.007125 **
tga$September -28664 8418 -3.405 0.000767 ***
tga$October -32609 5885 -5.541 7.38e-08 ***
tga$November -35586 5166 -6.889 4.25e-11 ***
tga$December NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 19280 on 259 degrees of freedom
Multiple R-squared: 0.618, Adjusted R-squared: 0.587
F-statistic: 19.95 on 21 and 259 DF, p-value: < 2.2e-16
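As a check on how the reported figures relate to one another, the adjusted R-squared in each output can be recomputed from the multiple R-squared and the degrees of freedom using the standard formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1). The Python sketch below uses the rounded values printed in the outputs, so it only agrees with R to about three decimal places.

```python
def adjusted_r_squared(r_squared, residual_df, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    With an intercept, residual_df = n - p - 1, so n = residual_df + p + 1.
    """
    n = residual_df + n_predictors + 1
    return 1 - (1 - r_squared) * (n - 1) / residual_df


# (R^2, residual degrees of freedom, estimated predictors) from the R outputs
outputs = {
    "Dallas": (0.2278, 647, 23),
    "Dallas (no 2014)": (0.3243, 496, 23),
    "Austin": (0.618, 259, 21),
}
for name, (r2, df, p) in outputs.items():
    print(name, round(adjusted_r_squared(r2, df, p), 3))
```

The gap between R² and adjusted R² is largest for Austin, which has the fewest observations relative to the number of estimated coefficients.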
The results of the regression were a little worse than what we had achieved in Excel. We
then used R to graph the forecasted revenue against the actual revenue for each scenario and
to produce regression diagnostic plots, which illustrate the accuracy or inaccuracy of our models.
These graphs can be found in the Appendix.
Analysis and Managerial Interpretation
We came up with the following forecasting models for each of the three scenarios.
Austin, TX Site Regression Chart
Austin Regression
Day of Week               Month                     Temperature
Variable     Coefficient  Variable     Coefficient  Variable           Coefficient
Monday           -11,384  January          -20,957  50 and Below            34,155
Tuesday           -2,987  February         -52,789  51-60 Degrees           49,210
Wednesday          3,502  March            -54,662  61-70 Degrees           47,933
Thursday          11,532  April                N/A  71-80 Degrees           -2,500
Friday            23,180  May                  N/A  81-90 Degrees           -1,034
Saturday          40,430  June             -61,289  Above 90 Degrees         6,951
Sunday               N/A  July             -24,679
                          August           -25,144
                          September        -28,664
                          October          -32,609
                          November         -35,586
                          December             N/A
Intercept of 37867
*Coefficients with N/A are due to singularities
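To show how a model of this form turns into a daily forecast, the Python sketch below adds the Austin intercept to one coefficient from each group. The coefficients are copied from the Austin R output. Treating Sunday and December as baseline levels with a coefficient of 0 is the usual reading of dropped indicator columns, but it is an assumption here; April and May are left out entirely because the output gives no estimate for them. The `forecast` function and the `AUSTIN` dictionary are hypothetical names for this example.

```python
AUSTIN = {
    "intercept": 37867,
    "weekday": {"Monday": -11384, "Tuesday": -2987, "Wednesday": 3502,
                "Thursday": 11532, "Friday": 23180, "Saturday": 40430,
                "Sunday": 0},  # assumed baseline level (reported as N/A)
    "month": {"January": -20957, "February": -52789, "March": -54662,
              "June": -61289, "July": -24679, "August": -25144,
              "September": -28664, "October": -32609, "November": -35586,
              "December": 0},  # assumed baseline level (reported as N/A)
    "temperature": {"50 and Below": 34155, "51-60": 49210, "61-70": 47933,
                    "71-80": -2500, "81-90": -1034, "Above 90": 6951},
}


def forecast(weekday, month, temp_range, model=AUSTIN):
    """Add the intercept and one coefficient from each group."""
    try:
        return (model["intercept"] + model["weekday"][weekday]
                + model["month"][month] + model["temperature"][temp_range])
    except KeyError as missing:
        raise ValueError(f"no coefficient estimated for {missing}") from None


print(forecast("Saturday", "March", "51-60"))
print(forecast("Monday", "March", "71-80"))
```

The second call produces a negative revenue figure, which shows how large coefficients estimated from less than a year of data can give implausible forecasts for some combinations of inputs.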
Dallas, TX Site Regression Chart
Dallas Regression
Day of Week               Month                     Temperature
Variable     Coefficient  Variable     Coefficient  Variable           Coefficient
Monday              -594  January           -6,666  50 and Below            -1,077
Tuesday              -42  February             576  51-60 Degrees              741
Wednesday            208  March              1,091  61-70 Degrees              246
Thursday             430  April             -7,035  71-80 Degrees             -882
Friday             2,052  May               -5,736  81-90 Degrees             -697
Saturday           3,653  June              -6,267  Above 90 Degrees          -887
Sunday               N/A  July              -6,371
                          August            -6,466
                          September         -6,597
                          October           -7,312
                          November             278
                          December             N/A
Intercept of 7402.8
*Coefficients with N/A are due to singularities
Dallas, TX Site (No 2014) Regression Chart
Dallas Regression (No 2014 Data)
Day of Week               Month                     Temperature
Variable     Coefficient  Variable     Coefficient  Variable           Coefficient
Monday              -455  January              245  50 and Below              -832
Tuesday             -492  February             185  51-60 Degrees             -681
Wednesday           -429  March                507  61-70 Degrees             -567
Thursday            -325  April                254  71-80 Degrees             -445
Friday               146  May                  796  81-90 Degrees             -487
Saturday             601  June                 226  Above 90 Degrees          -474
Sunday               N/A  July                 -11
                          August               -39
                          September           -123
                          October             -185
                          November              59
                          December             N/A
Intercept of 1662.9
*Coefficients with N/A are due to singularities
The results for the Dallas location showed poor correlation between our selected variables
and the revenue reported. This is most likely due to the change in reporting at the Dallas site. The
system that TopGolf had in place prior to January of 2014 reported revenue on transactions poorly,
and this showed in the results, generating a poor regression model. This is one of the reasons we
decided to repeat our methods for Austin, where the new reporting software had been used since the
opening of the facility. Our results from the Austin model show that there is some validity to our
forecasting model. With time, as TopGolf collects more revenue data, it is likely that more
trends will begin to emerge and it will be easier to forecast revenue accurately.
The Austin location is the best site to look at if we are going to draw any conclusions from
the model. First, we can look at the coefficients for day of the week. It is clear that
revenues are much higher on weekends than on weekdays. In fact, as the week goes on,
revenue steadily increases to a peak on Saturday, decreases on Sunday, and then declines sharply
on Monday of the following week. Another conclusion that can be drawn is that revenue seems to be
higher when it is 70 degrees or below, or above 90 degrees. This could be due to customers knowing
that TopGolf has heated and air-conditioned bays. Our hypothesis is that when the weather is
ideal, people would rather be outside doing other things. They know that if it is cold outside, they can
play TopGolf with the heaters on when they could not play golf on a golf course. The same applies to
high temperatures, when customers may want to stay out of the blistering Texas heat.
Another possible reason for increased revenue at high temperatures is that these temperatures
occur in the summer, when customers are more likely to go out for entertainment than during the
school year.
We believe that the Austin regression model should be applied, but cautiously. The issues
with the model are that it comes from less than a year of data, the fit is moderate
rather than strong, and the coefficients are large, which could cause very large variations
in day-to-day predictions. The best application of this model would be to consider these variables
when TopGolf implements its new forecasting software in the future. We would also recommend
looking into many more variables and combinations of variables for these forecasts. Revenue
prediction is as much an art as it is a science, and it takes many iterations of trial and error to find
an accurate revenue model. It would be interesting to look at information such as how prior-day
revenue affects next-day revenue. Other variables such as events in the area, promotions
(coupons, sales, etc.), the number of memberships sold, and corporate bookings may have large
impacts on revenue and should be considered. In addition to the many variables that need to be
considered, new data is added every day, which may change the model. This means that the analysis
will have to be repeated as new data is received. This is a large and complex issue that
requires a lot of dedication to solve.
Conclusions and Critique
Based on our analysis and testing of various cases, we conclude that a usable
correlation can be found, as shown in the Austin, TX site data, and that this information can be used to
forecast future demand using our model. However, at this time we recommend that TopGolf
continue to gather revenue data for all sites in order to create stronger models for all site locations.
Hopefully, after multiple years of data collection, the company will have more knowledge to
explain customer behaviors and to identify outliers clearly, so that future
forecasting models can account for them and become very accurate.
The potential for creating accurate forecasting models with future site data would allow the business
intelligence team to supply the rest of the company with demand forecasts that help the business make
better daily operating decisions, such as staffing. These results also suggest that
future models could be built into software that automatically calculates the
number of employees that should be staffed based on the weather forecast and date. This could
create a huge opportunity for TopGolf to reduce its operating costs and move toward
becoming a more efficient workplace.
As a senior design team we were faced with a lot of unavoidable problems. One area we
believe we could have improved upon would be researching outlying events that could affect the data
before deciding to move forward with a project. It would have been helpful to know, before running
multiple forecasts with the Dallas, TX site data, that the whole company had switched over to a new
revenue collection model. TopGolf provided us with many years of data that we thought would
allow us to create very accurate forecasting models; however, our team did not know that the
company had changed its revenue collection over to a new system a few years ago. Luckily we were
able to discover this early on, because any revenue data from before the change in
revenue collection would be of no use in forecasting future revenue and predicting demand.
However, this had a large impact on forecasting future revenues under the new pricing model,
because very little data of this type existed. If our team had known about this change
beforehand, it would have been easy to conclude that the data would not be mature enough to provide
sound results for TopGolf to implement in its daily business operations. As TopGolf continues
to collect data from its sites over the next few years, we hope that the company will be able to
implement a more accurate version of the revenue forecasting models we created.