Business Forecasting with Prof. Shmuéli, Term 7 2012
Planning Ahead: Forecasting US Airways’ international passenger traffic in PHOENIX: How do historical information and other hubs’ performance influence prediction? Team 5: Megan Hanson, Delfin Rico, Filippo Sclafani, Yating Yu
BSFC - US Airways Final Project Megan Hanson, Delfin de las Heras Rico, Philip Sclafani, Yating Yu
EXECUTIVE SUMMARY
Our analysis draws on monthly international passenger traffic data from the three major
US Airways hubs, Charlotte (CLT), Philadelphia (PHL), and Phoenix (PHX), to forecast traffic
at the newest international hub, Phoenix. Operations at the PHX international hub began more
recently (2007) than at the other two hubs (2000), leading us to focus on creating a
forecasting model that predicts the monthly flow of international passengers to and from the
PHX hub with greater accuracy by combining insights from all three hubs.
The analysis showed that, even after de-seasonalizing data from the other two hubs using
different methods, every model or combination of models that used data from the other two
hubs predicted the flow of international passengers in Phoenix worse than models based on
PHX data only. Test results showed that the passenger flow series at PHL and CLT are not
statistically significant predictors of the passenger flow series at PHX. This conclusion is
also intuitive, considering that each hub runs flights to different destinations, has
different trend patterns, and reaches unique seasonal peaks.
Based on our research, we recommend a model that combines three forecasting techniques.
Together they are parsimonious, yet robust and flexible enough to react to unique events
like the travel advisory to Mexico during peak season 2011 and still produce a reliable
prediction. The first component is an additive multiple linear regression. Next, the
autocorrelation within seasons is removed by adjusting the errors using ARIMA techniques.
Finally, the 12-month-ahead forecast is adjusted using naïve error forecasting. With this
last component, the model can easily be used on a month-to-month rolling basis as data
becomes available, which matters given the high degree of fluctuation in the airline
industry. Consistent updating is necessary, and it adds further value when coordinating the
level of multipurpose ground staff and airport services with the expected number of
passengers.
PROBLEM DESCRIPTION
Compared to the established hubs in Philadelphia and Charlotte, Phoenix Sky Harbor
International Airport is US Airways’ most recent and fastest-growing hub for international
flights. US Airways’ international flights to and from Phoenix began in October 2007. As a
result, US Airways has limited historical data on its international passenger traffic in
Phoenix.
The goal of our project is to understand the patterns that describe international
passenger traffic in Phoenix using current forecasting techniques and publicly available
airline data from the Bureau of Transportation Statistics. Our research covers data from all
three of US Airways’ international hubs, Phoenix, Charlotte, and Philadelphia, to examine
whether the evolution of international passenger traffic at the established hubs can be used
to better predict Phoenix traffic. Through our analysis we seek to provide US Airways, our
principal stakeholder, with a time series model that helps forecast its international
passenger traffic at the Phoenix airport for the next 12 months. Secondary stakeholders are
airport facility advisors, support services provided by Phoenix airport, and secondary
service providers that rely on the flow of international passengers, such as foreign
language bookshops and souvenir shops. Each stakeholder should be able to use our model to
adapt its services more predictably to the evolving needs of US Airways’ international
passengers.
As the fastest-growing and most cost-effective of the five largest U.S. airlines, US
Airways is particularly sensitive to increasing operational efficiency. If US Airways can
use historical information and other hubs’ performance to better predict Phoenix air
traffic, it will be able to better manage its sales and operations, because our predictions
can inform the future use of airplanes, the deployment of flexible ground crews, and the
scheduling of prized international flights, a major component of the airline’s growth.
As consultants to US Airways, we want to ensure that our model is sophisticated enough
to predict the airline’s international air traffic better than an internal analyst could,
demonstrating both the value of competent forecasting and our own credibility. At the same
time, we want to create a dynamic model that the airline can use to help forecast
international air traffic each month, independently of our efforts. Because stakeholders
include air traffic and operations management, it is possible to adjust the model to
slightly over- or under-forecast, correct it for external predictors, and create flexibility
to account for historical “black swan” unpredictability, allowing the airline to optimize
its use of planes and services.
DATA DESCRIPTION
As mentioned above, we utilized US Airways’ monthly international passenger data
from all three of its major hubs over the time periods available from bts.gov sources.
International passenger data for Charlotte and Philadelphia was available from January 2000
through July 2011. In Phoenix, because of the airport’s more recent introduction of international
service, data was available from October 2007 through July 2011. Our analysis utilized all three
data sets, but our core analysis focuses on our primary data set, Phoenix, given our goals.
Our first visualization (below) of the Phoenix data helped us understand the
multiplicative seasonality and upward trend reflected in the data. Seasonality patterns in
Phoenix follow a twelve-month cycle. At PHX, a peak in the spring is followed by a
consistent drop in international passenger traffic every September. An additional note of
interest is the anomalous variation towards the end of the time series (highlighted in
yellow below). Upon further research, we determined that this was due to a unique, historic
event: in the spring of 2011, Departments of Public Safety across the United States warned
U.S. citizens against traveling in Mexico because of violence related to drug trafficking.
As a result, we believe travel through Phoenix was negatively impacted [1]. Despite this
anomaly, we decided to leave in all of the data points to account for the relative
frequency of similar events.
[1] http://www.dallasnews.com/news/state/headlines/20110301-texas-dps-tells-spring-breakers-to-avoid-mexico-because-of-drug-cartel-violence-and-other-crime.ece
TECHNICAL SUMMARY
Method: To account for seasonality and trend, we modeled the data using several
approaches: Naïve Forecasting (to establish a benchmark for our subsequent models), Additive
and Multiplicative Multiple Linear Regression (MLR), 2nd Degree Polynomial regression, and
Holt-Winter’s Smoothing.
Naïve Forecast (full details in Appendix A)
Description: After running all of the models, reviewing the performance statistics (MAE,
Avg. Error, MAPE, RMSE) and, most importantly, evaluating the residual charts (full charts
available for every model in Appendices B-E), we determined that the Additive MLR model was
the best predictor for Phoenix international passenger traffic (independent of the other two
hubs). However, the chosen model, MLR – Additive Seasonality, did not adequately account
for seasonality: its residuals remained autocorrelated (per the comparison of residual
charts below). We tried different variations using ARIMA and found that the most accurate
solution was to adjust for the error using autoregressive (AR) methods.
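The residual check described above can be sketched in a few lines. This is a minimal illustration, not the tool used for the report: the function names, the synthetic residual series, and the plus/minus 1.96/sqrt(n) significance band are all assumptions made for the example.

```python
import math

def acf(series, max_lag=12):
    """Sample autocorrelation of a series for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

def white_noise_band(n, z=1.96):
    """Approximate 95% band for the ACF of white noise (assumed convention)."""
    return z / math.sqrt(n)

# Hypothetical residuals with a leftover 12-month pattern (not the report's data).
residuals = [1000 * math.sin(2 * math.pi * t / 12) for t in range(48)]

r = acf(residuals)
band = white_noise_band(len(residuals))
flagged = [k for k in range(1, 13) if abs(r[k]) > band]
```

Lags that land in `flagged` suggest structure the regression has not captured, which is the situation that motivates an AR adjustment of the errors.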
[Charts: residual ACF plots, Original MLR – Additive Seasonality and Adjusted MLR – Additive Seasonality]

Performance statistics by model:

              Naive        MLR – Additive   MLR – Multiplicative   2nd Degree    Holt-Winter’s
                           Seasonality      Seasonality            Polynomial
MAE           7,732.50     5,711.19         6,718.70               7,036.27      18,229.08
Avg. Error    -4,870.33    -113.12          187.99                 -3,363.21     17,836.09
MAPE          13.07%       9.19%            10.51%                 10.92%        27.87%
RMSE          8,744.07     6,674.69         7,966.12               9,145.30      23,591.47
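For readers who want to recompute the four statistics in this table on their own validation data, a minimal sketch follows. The sign convention (error = actual minus forecast), the function name, and the three-point example series are assumptions for illustration, not values from the report.

```python
import math

def forecast_metrics(actual, forecast):
    """Return MAE, average error, MAPE (in percent) and RMSE."""
    errors = [a - f for a, f in zip(actual, forecast)]  # assumed sign convention
    n = len(errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "AvgError": sum(errors) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
    }

# Hypothetical validation data, chosen only to make the arithmetic easy to follow.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
m = forecast_metrics(actual, forecast)
```

An average error close to zero alongside a sizeable MAE, as in the additive MLR column, indicates errors that largely cancel rather than a systematic over- or under-forecast.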
After choosing the best model for predicting US Airways’ international passenger traffic
at Phoenix, we layered on Philadelphia and Charlotte international passenger traffic in
several ways to see if and how other international hubs can help forecast Phoenix’s
international passenger traffic. First, we added inputs for each hub representing Lag-1
(the previous month’s traffic). Second, we added inputs for each hub representing the naïve
forecast (essentially Lag-12), which uses data from twelve months earlier. Upon running the
regression, we found that all of the variables relating to these two hubs were statistically
insignificant and thus concluded that there is no detectable relationship between Phoenix
international traffic and traffic at US Airways’ other major hubs. Full details are
available in Appendix F, as well as the full spectrum of Philadelphia and Charlotte data,
models, and charts (Appendix G – Philadelphia, Appendix H – Charlotte).
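The two kinds of hub inputs described above (Lag-1 and the Lag-12 naïve forecast) could be built roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the three series are assumed to be aligned on the same months, and the numbers are placeholders rather than BTS data.

```python
def lagged(series, lag):
    """Shift a series forward by `lag` periods, padding the start with None."""
    return [None] * lag + list(series[:-lag]) if lag > 0 else list(series)

def build_rows(phx, phl, clt):
    """Rows of (PHX, Philly L1, Char L1, Philly NF, Char NF); incomplete rows are dropped."""
    phl_l1, clt_l1 = lagged(phl, 1), lagged(clt, 1)
    phl_nf, clt_nf = lagged(phl, 12), lagged(clt, 12)  # naive forecast = value 12 months earlier
    rows = []
    for t in range(len(phx)):
        feats = (phl_l1[t], clt_l1[t], phl_nf[t], clt_nf[t])
        if None not in feats:
            rows.append((phx[t],) + feats)
    return rows

# Placeholder series covering the same 24 months (not real passenger counts).
phx = list(range(24))
phl = list(range(100, 124))
clt = list(range(200, 224))
rows = build_rows(phx, phl, clt)
```

Because the Lag-12 inputs need a full year of history, the first twelve months of any aligned window drop out, which is a real cost when the Phoenix series is already short.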
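Equation: (the additive MLR model + adjustment for error with AR(1))
$$y_t = b_0 + b_1 t + b_2 D_2 + b_3 D_3 + b_4 D_4 + b_5 D_5 + b_6 D_6 + b_7 D_7 + b_8 D_8 + b_9 D_9 + b_{10} D_{10} + b_{11} D_{11} + \left(c + \mathrm{AR1} \cdot \varepsilon_{t-1} + \mathrm{SAR1}_t\right)$$
where $c$ is the ARIMA constant term, $\varepsilon_{t-1}$ is the error from the previous period, and $\mathrm{SAR1}_t$ is the seasonal AR(1) term.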
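To make the regression part of the equation concrete, the sketch below encodes one observation as a row of inputs: an intercept, the month index t, and the seasonal dummies. It follows the Season_1 to Season_11 inputs listed in Appendix I; treating the twelfth month as the reference category, the function names, and the coefficient values are assumptions for illustration.

```python
def design_row(t, season, n_seasons=12):
    """Return [1, t, D_1, ..., D_11]; season 12 is the assumed reference category."""
    dummies = [1.0 if season == k else 0.0 for k in range(1, n_seasons)]
    return [1.0, float(t)] + dummies

def predict(row, coefs):
    """Linear prediction: sum of input times coefficient."""
    return sum(x * b for x, b in zip(row, coefs))

# Hypothetical coefficients: intercept 10, trend 2, season k adds k.
coefs = [10.0, 2.0] + [float(k) for k in range(1, 12)]

august = design_row(140, 8)     # a month-8 observation switches on exactly one dummy
december = design_row(144, 12)  # the reference month switches on none
example = predict(design_row(3, 5), coefs)
```

At most one dummy is active in any row, so each seasonal coefficient reads as that month's offset from the reference month after removing the linear trend.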
[Chart: ACF Plot for Residual; x-axis: Lags; y-axis: ACF; series: ACF, UCI, LCI]
ACF Values
Lag   ACF
0     1
1     0.72789294
2     0.43858048
3     0.1726654
4     -0.03343721
5     -0.06151973
6     -0.13661568
7     -0.23889127
8     -0.26340884
9     -0.37127
10    -0.41827866
11    -0.49331972
12    -0.54856366

[Chart: ACF Plot for New Residual; x-axis: Lags; y-axis: ACF; series: ACF, UCI, LCI]
ACF Values
Lag   ACF
0     1
1     0.28593364
2     0.19717161
3     -0.02901737
4     -0.22454795
5     -0.04297523
6     -0.05233738
7     -0.24819994
8     -0.09859764
9     -0.33851564
10    -0.223617
11    -0.18280546
12    -0.2484265

Forecast: Using our chosen model, we forecasted the twelve months following our Phoenix
data set (i.e., Months 140-151, August 2011 through July 2012). This forecast is immediately
relevant to US Airways and Phoenix airport management for shifting or increasing resources
day by day and week by week. As more data becomes available, more uses can be derived.
Because international passenger traffic evolves, the model should be verified, and if
necessary revised, regularly. We suggest updating it at least every six months in the
interest of using the most recent data, while keeping the forecasting fairly parsimonious
and easy to use for US Airways’ analytic team. Our full forecast and prediction interval is
found in Appendix I. The chart below depicts our modeled forecast beyond the actual data
available through July 2011.
[Chart: modeled forecast beyond the actual data, with the Forecasted Section marked]
Appendix A. Naïve Forecasting.
Appendix B. Additive MLR.
[Chart: ACF Plot for Residual; x-axis: Lags; y-axis: ACF; series: ACF, UCI, LCI]
ACF Values
Lag   ACF
0     1
1     0.72789294
2     0.43858048
3     0.1726654
4     -0.03343721
5     -0.06151973
6     -0.13661568
7     -0.23889127
8     -0.26340884
9     -0.37127
10    -0.41827866
11    -0.49331972
12    -0.54856366
Appendix C. Multiplicative MLR
[Chart: ACF Plot for Residual; x-axis: Lags; y-axis: ACF; series: ACF, UCI, LCI]
ACF Values
Lag   ACF
0     1
1     0.72185647
2     0.45910639
3     0.22034875
4     0.02120245
5     -0.00815037
6     -0.1231553
7     -0.24962151
8     -0.26019239
9     -0.3694323
10    -0.41087016
11    -0.50170225
12    -0.60175997
Appendix D. 2nd Degree Polynomial
[Chart: ACF Plot for Actual Value; x-axis: Lags; y-axis: ACF; series: ACF, UCI, LCI]
ACF Values
Lag   ACF
0     1
1     0.66437823
2     0.36259064
3     0.05719153
4     -0.20881195
5     -0.48920912
6     -0.60557556
7     -0.4902662
8     -0.24329659
9     -0.06238198
10    0.16637181
11    0.38075876
12    0.44822583
Appendix E. Holt-Winter’s Smoothing.
[Chart: ACF Plot for Residuals; x-axis: Lags; y-axis: ACF; series: ACF, UCI, LCI]
ACF Values
Lag   ACF
0     1
1     0.82090104
2     0.6239146
3     0.43887016
4     0.29316851
5     0.2008404
6     0.14655854
7     0.08311611
8     0.0556566
9     0.01059124
10    -0.01017164
11    0.0168524
12    0.04724447
Appendix F. Model of Philadelphia and Charlotte data to predict Phoenix international passenger traffic
The Regression Model

Input variables   Coefficient    Std. Error    p-value      SS
Constant term     34059.65625    36204.42969   0.36000499   94254340000
Month             418.0504456    122.8245697   0.003381     518875100
Season_1          9855.724609    6510.30127    0.14842966   354820000
Season_2          9713.866211    6724.275879   0.16675071   451665000
Season_3          8229.101563    7449.505371   0.28470105   1071529000
Season_4          9889.301758    11357.90625   0.39604846   580195800
Season_5          -8905.42676    13937.03418   0.53135455   17472270
Season_6          -9111.32129    23054.61719   0.69760585   18091630
Season_7          -4168.74414    24546.38086   0.86714745   2297245
Season_8          -10351.6738    26320.81055   0.69899482   14793040
Season_9          -24691.2012    32295.86523   0.45503291   769933900
Season_10         -10468.335     23223.64648   0.65785742   228943200
Season_11         -7385.17529    8559.124023   0.40023336   175488500
Philly Int L1     -0.31917939    0.26289284    0.24130528   43256210
Char Int L1       -0.06092461    0.27132195    0.82500821   326264.3438
Philly Int NF     0.31263313     0.41695362    0.46362436   14620290
Char Int NF       -0.2271006     0.55385715    0.68690151   4448868

Residual df           17
Multiple R-squared    0.904626274
Std. Dev. estimate    5144.038574
Residual SS           449839300
Appendix G. Philadelphia Hub Data, Visualization, and Models
Visualization
Naïve Forecast
[Chart: US Airways International Passengers - Philadelphia, Actual vs. Predicted; x-axis: Month (Jan 2000-Jul 2011); y-axis: International Passengers]
[Chart: Residual of the Validation Set; y-axis: International Passengers]
Linear Regression
[Chart: International Passengers by Day - Philadelphia - Linear Regression, Predicted vs. Actual; x-axis: Month (Jan 2000 - Jul 2011); y-axis: International Passengers]
[Chart: Error of the Validation Set; y-axis: International Passengers]
ACF Chart
[Chart: ACF Plot for Error; x-axis: Lags; y-axis: ACF; series: ACF, UCI, LCI]
ACF Values
Lag   ACF
0     1
1     0.77911079
2     0.56524676
3     0.37568012
4     0.18133438
5     0.05195445
6     0.01205189
7     0.07016511
8     0.16100189
9     0.22681832
10    0.24804096
11    0.33942914
12    0.33980766
Second-Degree Polynomial
[Chart: International Passengers - Philadelphia - Polynomial Regression, Predicted vs. Actual; x-axis: Month (Jan 2000-Jul 2011); y-axis: International Passengers]
[Chart: Error of the Validation Set; y-axis: International Passengers]
[Chart: ACF Plot for Error; x-axis: Lags; y-axis: ACF; series: ACF, UCI, LCI]
ACF Values
Lag   ACF
0     1
1     0.72542846
2     0.44838876
3     0.20319137
4     -0.0289657
5     -0.17879435
6     -0.1873638
7     -0.09018978
8     0.06311636
9     0.17497697
10    0.21438864
11    0.29617006
12    0.25871855
Holt-Winter’s Smoothing Model
[Chart: International Passenger Data - Philadelphia - Holt-Winter’s Smoothing Model, Actual vs. Forecast; x-axis: Month (Jan 2000 - Jul 2011); y-axis: International Passengers]
[Chart: Error of the Validation Set; y-axis: International Passengers]
[Chart: ACF Plot for Residuals; x-axis: Lags; y-axis: ACF; series: ACF, UCI, LCI]
ACF Values
Lag   ACF
0     1
1     0.74590003
2     0.52399069
3     0.29148918
4     0.06940814
5     -0.08927917
6     -0.13041377
7     -0.10053726
8     -0.02711842
9     0.0059786
10    0.00366089
11    0.06072585
12    0.07369822
Appendix H. Charlotte Hub Data, Visualization, and Models
Visualization
Naïve Forecast
[Chart: residuals of the validation set]
Linear Regression
[Chart: residuals of the validation set]
[Chart: ACF Plot for Residuals; x-axis: Lags; y-axis: ACF; series: ACF, UCI, LCI]
ACF Values
Lag   ACF
0     1
1     0.83273637
2     0.67208695
3     0.51210541
4     0.39560503
5     0.32560411
6     0.29256868
7     0.26353756
8     0.28503656
9     0.33524078
10    0.36550918
11    0.36055109
12    0.30536404
2nd Degree Polynomial Regression
[Chart: residuals of the validation set]
[Chart: ACF Plot for Residuals; x-axis: Lags; y-axis: ACF; series: ACF, UCI, LCI]
ACF Values
Lag   ACF
0     1
1     0.74510783
2     0.50364894
3     0.28580889
4     0.12056043
5     -0.00243946
6     -0.0682682
7     -0.12661576
8     -0.09720911
9     -0.0305656
10    0.00851029
11    0.00981937
12    -0.05770537
Holt-Winter’s Smoothing Model
[Chart: residuals of the validation set]
[Chart: ACF Plot for Residuals; x-axis: Lags; y-axis: ACF; series: ACF, UCI, LCI]
ACF Values
Lag   ACF
0     1
1     0.68883801
2     0.40600601
3     0.14395323
4     -0.10607167
5     -0.27533561
6     -0.31884003
7     -0.29581624
8     -0.17832907
9     0.02421694
10    0.14626601
11    0.20824964
12    0.26288953
Appendix I. Model Forecast and Useful Prediction Intervals
The Regression Model

Input variables   Coefficient
Constant term     13579.98242
Month             411.3641968
Season_1          2993.385742
Season_2          1503.771484
Season_3          6685.907227
Season_4          757.5430908
Season_5          -9556.07129
Season_6          -8846.43555
Season_7          -10158.2998
Season_8          -16366.2285
Season_9          -31625.9258
Season_10         -14514.5215
Season_11         -9440.38574

ARIMA Model

                  ARIMA Coefficient
Const. term       -0.00035575
AR1               0.66156042
SAR1              -0.12542911

Month   MLR Prediction   Previous Mo. Error   Predicted Error   Model prediction
140     54,805           2,410                1,594             56,399
141     39,956           -4,522               -2,992            36,965
142     57,479           -2,686               -1,777            55,702
143     62,965           -3,710               -2,454            60,510
144     72,816           -3,373               -2,232            70,585
145     76,221           589                  390               76,611
146     75,143           -1,169               -774              74,369
147     80,736           1,396                924               81,660
148     75,219           1,061                702               75,921
149     65,317           2,315                1,531             66,848
150     66,438           2,928                1,937             68,375
151     65,538           4,241                2,806             68,343
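The forecast figures in this appendix can be retraced from the listed coefficients. The sketch below is a reader's check rather than the original workbook: it assumes that month index 1 is January 2000 (consistent with Month 140 being August 2011), that December is the reference season, and that the predicted error is simply AR1 times the previous error, leaving out the small constant and SAR1 terms. The function names are hypothetical.

```python
CONST = 13579.98242
TREND = 411.3641968
SEASON = {
    1: 2993.385742, 2: 1503.771484, 3: 6685.907227, 4: 757.5430908,
    5: -9556.07129, 6: -8846.43555, 7: -10158.2998, 8: -16366.2285,
    9: -31625.9258, 10: -14514.5215, 11: -9440.38574,
}
AR1 = 0.66156042

def month_of_year(index):
    """Calendar month (1-12) for a month index, assuming index 1 = January 2000."""
    return (index - 1) % 12 + 1

def mlr_prediction(index):
    """Additive MLR: constant + trend * index + seasonal offset (0 for December)."""
    return CONST + TREND * index + SEASON.get(month_of_year(index), 0.0)

def adjusted_prediction(index, prev_error):
    """MLR prediction plus the AR(1)-scaled previous error (simplifying assumption)."""
    return mlr_prediction(index) + AR1 * prev_error

august_2011 = round(mlr_prediction(140))
august_2011_adjusted = round(adjusted_prediction(140, 2410))
```

Under these assumptions the August 2011 row comes out at 54,805 for the regression alone and 56,399 after the error adjustment, matching the first row of the forecast table; a few other rows can differ by one passenger because the table's errors are rounded.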
[Chart: model forecast, with the Forecasted Section marked]
Final comparison of three graph lines: Naïve, Our Model and Actual Data