
BAYESIAN TRAVEL TIME RELIABILITY MODELS

Morgan State University The Pennsylvania State University

University of Maryland University of Virginia

Virginia Polytechnic Institute & State University West Virginia University

The Pennsylvania State University The Thomas D. Larson Pennsylvania Transportation Institute

Transportation Research Building University Park, PA 16802-4710 Phone: 814-865-1891 Fax: 814-863-3707

www.mautc.psu.edu

Bayesian Travel Time Reliability Models

By Feng Guo, Dengfeng Zhang, and Hesham Rakha

Mid-Atlantic Universities Transportation Center Final Report

Virginia Tech Transportation Institute, Department of Statistics,

Virginia Polytechnic Institute and State University

June 30, 2015

DISCLAIMER

The contents of this report reflect the views of the authors, who are responsible for the facts and the accuracy of the information presented herein. This document is disseminated under the sponsorship of the U.S. Department of Transportation’s University Transportation Centers Program, in the interest of information exchange. The U.S. Government assumes no liability for the contents or use thereof.

1. Report No. 2. Government Accession No. 3. Recipient’s Catalog No.

4. Title and Subtitle Bayesian Travel Time Reliability Models

5. Report Date June 30, 2015

6. Performing Organization Code Virginia Tech

7. Author(s) Feng Guo, Dengfeng Zhang, and Hesham Rakha

8. Performing Organization Report No.

9. Performing Organization Name and Address Virginia Tech Transportation Institute 3500 Transportation Research Plaza Blacksburg, VA 24061

10. Work Unit No. (TRAIS)

11. Contract or Grant No.

12. Sponsoring Agency Name and Address

US Department of Transportation Research & Innovative Technology Admin UTC Program, RDT-30 1200 New Jersey Ave., SE Washington, DC 20590

13. Type of Report and Period Final Report, 6/2012-6/2015 14. Sponsoring Agency Code

15. Supplementary Notes

16. Abstract Travel time reliability is a stochastic process affected by multiple factors, with traffic volume being the most important one. This study built up and advanced the multi-state models by proposing regressions on the proportions and distribution parameters for underlying traffic states. The Bayesian analysis provides valid credible intervals for each parameter without asymptotic assumption. Two alternative approaches were proposed and evaluated. The first approach is a Bayesian multi-state travel time regression model which provides a regression for key model parameters to traffic volume; the second approach is a hidden Markov regression which not only provides a link between key model parameters and traffic volume, but also incorporates the dependency structure among traffic volume in adjacent time windows. Both approaches provide advanced methodology for modeling traffic time reliability under complex stochastic scenarios.


17. Key Words Traffic simulation, traffic modeling, driver behavior, car following

19. Security Classif. (of this report) 20. Security Classif. (of this page) 21. No. of Pages 22. Price 47

ABSTRACT

Travel time reliability is a stochastic process affected by multiple factors, with traffic volume being the most important one. This study built up and advanced the multi-state models by proposing regressions on the proportions and distribution parameters for underlying traffic states. The Bayesian analysis provides valid credible intervals for each parameter without asymptotic assumption. Two alternative approaches were proposed and evaluated. The first approach is a Bayesian multi-state travel time regression model which provides a regression for key model parameters to traffic volume; the second approach is a hidden Markov regression which not only provides a link between key model parameters and traffic volume, but also incorporates the dependency structure among traffic volume in adjacent time windows. Both approaches provide advanced methodology for modeling traffic time reliability under complex stochastic scenarios.

Contents

1 Introduction

2 Travel Time Reliability: The Bayesian Multi-state Travel Time Regression Model
    2.1 Introduction and model specification
    2.2 Model Fitting using Markov Chain Monte Carlo Algorithm
        2.2.1 Model 1
        2.2.2 Model 2
    2.3 Simulation Study
        2.3.1 Model comparison
        2.3.2 Simulation Evaluation for Model 2
        2.3.3 Robustness of Misspecified θs
    2.4 Model Application to Field-collected Data
    2.5 Summary

3 Travel Time Reliability: Hidden Markov Model
    3.1 Introduction
    3.2 Autocorrelation
    3.3 Theoretical Background
        3.3.1 Model Specification
        3.3.2 Model Estimation
        3.3.3 Bootstrap and Confidence Interval
        3.3.4 Determine the Number of Components
        3.3.5 Goodness of Fit
        3.3.6 Prediction
    3.4 Simulation Study
        3.4.1 No Covariate
        3.4.2 With Covariate
    3.5 Application for Field-Collected Data
    3.6 Summary

4 Summary


List of Figures

2.1 Illustration of Data Collection
2.2 Average Traffic Volume by Hour of a Day
2.3 Probability of Congested State versus Traffic Volume in Simulation Studies
2.4 Model 1 vs. Model 2: Coverage Probabilities Comparison in Five Settings
2.5 Model 2: Coverage Probabilities Comparison
2.6 Misspecified and True Model Comparison
2.7 Theoretical, Misspecified and True Model Comparison
2.8 Parameter Estimates under Different θs's
2.9 Probability in Congested State and Traffic Volume: Real Data

3.1 Autocorrelation Comparison
3.2 Box-Cox Transformation
3.3 Hidden Markov Model: An Illustration
3.4 Illustration of Two States Markov Chain
3.5 Hidden Markov Model: Flow Chart
3.6 Confidence Interval by Profile Likelihood
3.7 Hidden Markov Model: Estimation
3.8 HMM vs. Traditional 1
3.9 HMM vs. Traditional 2
3.10 HMM vs. Traditional 3
3.11 HMM vs. Traditional 4
3.12 95% C.I. of HMM
3.13 Illustration of Low Sampling Rate
3.14 Illustration of Potential Improvement
3.15 Histogram of the Log Likelihood Ratio
3.16 χ2 and Empirical Distributions
3.17 Illustration of Three States Markov Chain
3.18 Residual Check


List of Tables

2.1 Variance of Priors
2.2 Models 1 and 2: Average of Posterior Means Comparison
2.3 Models 1 and 2: Coverage Probabilities Comparison
2.4 Model 2 between Settings 2 and 3: Coverage Probabilities Comparison
2.5 More Results of Model 2: Coverage Probabilities
2.6 Misspecified Models: Average of Posterior Means Comparison
2.7 Misspecified Models: Coverage Probabilities Comparison
2.8 Results from Real Data with Different θs's

3.1 HMM vs. Traditional: No Covariate 1
3.2 HMM vs. Traditional: No Covariate 2
3.3 Parameter Estimation of HMM
3.4 Kolmogorov-Smirnov Test Result
3.5 Parameter Estimation for Real Data


Chapter 1

Introduction

The objective of this study is to develop Bayesian multi-state travel time reliability models for evaluating travel time uncertainty under various traffic conditions. The reliability of travel time is a key performance index of the transportation system and has been a major transportation research area. Reliability is one of the four key focus areas of the Second Strategic Highway Research Program (SHRP2). The Federal Highway Administration (FHWA) defines travel time reliability as “consistency or dependability in travel times, as measured from day-to-day or across different times of day.” Understanding the nature of travel time reliability will help individual travelers with trip planning and trip decision making, and will help transportation management agencies improve the efficiency of the transportation system.

Travel time is affected by multiple factors such as traffic condition, weather, and incidents. Many of these factors are random in nature, and stochastic models should be used to quantify the uncertainty associated with travel time. Traditionally, unimodal distributions have been adopted for travel time reliability modeling, and the log-normal distribution has been the most popular model (Emam and Al-Deek 2006, Tu et al. 2008). A number of candidate distributions have been discussed and compared: lognormal, gamma, Weibull, and exponential distributions. However, these approaches cannot accommodate the high level of heterogeneity commonly present in travel time data. Thus, single-mode distributions usually yield poor model fit under complex travel conditions, especially during peak hours of a day (Guo et al. 2010).

Compared to single-mode distributions, mixture distributions can accommodate data with multiple modes and are flexible in modeling data generated from complex systems (Fowlkes 1979). The multi-state travel time reliability model has been demonstrated to provide superior data fitting, scientifically sound interpretation, and a close relationship with the underlying traffic flow characteristics (Guo et al. 2012, Park et al. 2010). The advantages of mixture normal and mixture lognormal distributions have been demonstrated in the application to field-collected data (Guo et al. 2012). This study is based on the multi-state travel time model framework.

One of the most attractive features of the multi-state model is its capability to associate travel time distribution with underlying traffic conditions. Park et al. (2010) showed that travel time states are related to the fundamental diagram, i.e., traffic flow, speed, and density. Two levels of uncertainty can be quantitatively assessed: the probability of a given traffic condition, for example, congested or free-flow, and the variation of travel time within each traffic condition.

Besides the free-flow and congested states, the model can also accommodate delay caused by traffic incidents (Park et al. 2011). However, one of the most important factors affecting travel time, the traffic volume, has yet to be incorporated into the multi-state model. This study advanced the previous methods using two alternative approaches to incorporate the influence of traffic volume: a Bayesian mixed-effect travel time regression model and a hidden Markov model.

The traffic volume, defined as the number of vehicles traveling through a specific segment of the road within a specific time period, plays an essential role in the present research. The study extended the multi-state travel time model by incorporating the effects from traffic volume. The proposed models were applied to field data collected along a section of the I-35 freeway in San Antonio, Texas (hereinafter I-35 data). The study covers a 16-kilometer section with an average daily traffic volume of around 150,000 vehicles. The travel time was collected when vehicles tagged by a radio frequency device passed the identification stations on New Braunfels Ave. (Station no. 42) and O’Connor Rd. (Station no. 49). We set the time period to summarize the traffic volume as one hour, and collected the traffic volume from [0:00, 0:59] to [23:00, 23:59] for more than 20 weekdays.

We proposed several Bayesian multi-state regression models to incorporate traffic volume into the estimation of the probability of encountering a congested traffic state. The models were fitted using Markov Chain Monte Carlo (MCMC) algorithms, which enable us to obtain the posterior distribution of the model parameters as well as the uncertainty of estimation (Lenk and DeSarbo 2000). We adopted the probit link function, which is more convenient in the Bayesian context than the logit function because the corresponding Gibbs sampler is easier to implement (Geweke and Keane 1997).

The Bayesian multi-state regression models discussed above are based on the assumption that all of the observations are conditionally independent. The independence assumption is typically not satisfied for travel times in periods close to each other. We proposed a hidden Markov model to incorporate the dependency structure among travel time data collected in adjacent time units (Baum and Petrie 1966). The hidden Markov model can be seen as a mixture model which relaxes the independence assumption (Qi et al. 2007). It is able to incorporate the dependency structure of observations, and also includes the traditional mixture model as a special case (Scott 2002).

Hidden Markov models have been applied in a wide variety of applications, including speech recognition (Rabiner 1989), biometrics (Albert 1991), econometrics (Hamilton 1989), and computational biology (Krogh et al. 1994). We developed hidden Markov models for travel time reliability evaluation. The proposed model incorporates the impact of traffic volume in the transition matrix of the Markov process. The results show that the hidden Markov model outperforms traditional mixture models.

Chapter 2

Travel Time Reliability: The Bayesian Multi-state Travel Time Regression Model

2.1 Introduction and model specification

Travel time of vehicles contains substantial variability. The Federal Highway Administration has formally defined travel time reliability as “consistency or dependability in travel times, as measured from day-to-day or across different times of day.” Understanding the nature of travel time reliability will help individual travelers with trip planning and trip decision making, and will help transportation management agencies improve the efficiency of the transportation system.

The multi-state model has been developed for modeling travel time reliability, and one of its most attractive features is its capability to associate travel time with underlying traffic conditions. In the Gaussian mixture model, the travel time variable, $Y$, is assumed to follow a two-component mixture distribution with density function:

$$f(y \mid \lambda, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2) = \lambda f_N(y \mid \mu_1, \sigma_1^2) + (1-\lambda) f_N(y \mid \mu_2, \sigma_2^2)$$

where $f_N$ represents the density function of a normal distribution with mean $\mu_i$ and variance $\sigma_i^2$. Without loss of generality, we assume that $\mu_1 < \mu_2$. Under this condition, $\mu_1$ and $\mu_2$ indicate the mean travel time under the free-flow state and the congested state, respectively. Accordingly, $\lambda$ and $1-\lambda$ are the probabilities of the free-flow state and the congested state, and $\sigma_1^2$ and $\sigma_2^2$ represent the variance of travel time under the free-flow state and the congested state.
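The mixture density above can be evaluated directly. The following Python sketch is illustrative only: the function name `mixture_pdf` and all numeric values are assumptions chosen for the example, not estimates from the report. It evaluates the two-component density on a grid and checks numerically that it integrates to approximately one.

```python
import numpy as np
from scipy.stats import norm


def mixture_pdf(y, lam, mu1, mu2, sigma1, sigma2):
    """Two-component normal mixture density f(y | lam, mu1, mu2, sigma1^2, sigma2^2)."""
    free_flow = lam * norm.pdf(y, loc=mu1, scale=sigma1)
    congested = (1.0 - lam) * norm.pdf(y, loc=mu2, scale=sigma2)
    return free_flow + congested


# Hypothetical parameter values, chosen only for illustration.
lam, mu1, mu2, sigma1, sigma2 = 0.7, 500.0, 900.0, 10.0, 100.0
step = 0.5
grid = np.arange(0.0, 2000.0, step)
density = mixture_pdf(grid, lam, mu1, mu2, sigma1, sigma2)
total_mass = density.sum() * step  # Riemann sum over the grid
print(round(total_mass, 4))
```

The narrow free-flow component and the wide congested component produce the bimodal shape that a single-mode distribution cannot reproduce.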

The probability of the free-flow state, denoted by $\lambda$, has support in $(0, 1)$. To link $\lambda$ with traffic volume $x$, a common approach is to use the logit link function:

$$\log\left(\frac{\lambda}{1-\lambda}\right) = \beta_0 + \beta_1 x,$$

or, more generally,

$$\log\left(\frac{\lambda}{1-\lambda}\right) = X\beta,$$

where the covariate matrix $X$ contains 1's in its first column, and $\beta$ is a vector of regression coefficients.

The traffic volume is defined as the number of vehicles traveling through a specific segment of road within a given time period. An alternative to the logit link function is the probit link function, which is the inverse of the standard normal cumulative distribution function:

$$\Phi^{-1}(1-\lambda) = X\beta$$

For Bayesian models, the probit function is preferred due to its ease in Markov Chain Monte Carlo simulation to generate the posterior distribution. In the probit model, a latent variable $w_i \in \mathbb{R}$ is introduced for each observation to indicate which group the observation belongs to:

$$y_i \in \begin{cases} \text{Group 1} & \text{if } w_i < 0 \\ \text{Group 2} & \text{otherwise} \end{cases}$$

Assume the latent variable $w_i \sim N(X_i\beta, 1)$, where $X_i$ is the $i$th row of matrix $X$. It can be shown that:

$$\lambda = 1 - \Phi(X\beta) = P(w < 0 \mid \mu = X\beta, \sigma^2 = 1)$$

This setting establishes the relationship between the proportion of the two latent groups and the covariate(s). The likelihood function is correspondingly $f_N(y \mid \mu_1, \sigma_1^2)^{I(w<0)} f_N(y \mid \mu_2, \sigma_2^2)^{I(w \ge 0)}$.
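The identity $\lambda = 1 - \Phi(X\beta)$ can be checked by simulating the latent variable. The Python sketch below is illustrative: the function name `free_flow_probability` and the traffic volume of 800 are assumptions, while $\beta_0 = -3$ and $\beta_1 = 0.005$ are the values used in the simulation settings of Section 2.3.

```python
import numpy as np
from scipy.stats import norm


def free_flow_probability(x, beta0, beta1):
    """lambda = 1 - Phi(beta0 + beta1 * x), i.e. the probability that w < 0."""
    return 1.0 - norm.cdf(beta0 + beta1 * x)


beta0, beta1 = -3.0, 0.005  # values from the report's simulation settings
x = 800.0                   # hypothetical hourly traffic volume

# Draw the latent variable w ~ N(X beta, 1) and count how often it is negative.
rng = np.random.default_rng(0)
w = rng.normal(loc=beta0 + beta1 * x, scale=1.0, size=200_000)
lam_exact = free_flow_probability(x, beta0, beta1)
lam_simulated = np.mean(w < 0)
print(lam_exact, lam_simulated)
```

Under these coefficients the free-flow probability falls as volume rises and equals 0.5 at a volume of 600, where $X\beta = 0$.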

As shown by Guo et al. (2012), the variability in the mean travel speed in the congested state, $\mu_2$, can be substantial. From an engineering perspective, there exist certain relationships between $\mu_2$ and the traffic volume $x_i$. Two alternative models were proposed to relate $\mu_2$ with traffic volume:

(1) $\mu_{2i} = \theta_0 + \theta_1 x_i = X_i\theta$

(2) $\mu_{2i} = \theta_s \mu_1 + \theta x_i$

The first model assumes that $\mu_1$ and $\mu_2$ are estimated independently. The second model assumes that the intercept is proportional to $\mu_1$ with a predetermined scale parameter $\theta_s$. With proper selection of $\theta_s$, the second model can ensure that the estimated mean travel times for the free-flow and congested conditions are sufficiently separated.
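The two specifications coincide when $\theta_0 = \theta_s \mu_1$ and $\theta_1 = \theta$. The short Python sketch below illustrates this with the values of Setting 1 in the simulation study ($\mu_1 = 500$, $\theta_0 = 600$, $\theta_s = 1.2$, slope 0.6); the function names and the volume grid are assumptions made for the example.

```python
import numpy as np


def mu2_model1(x, theta0, theta1):
    """Model 1: congested-state mean with a free intercept theta0."""
    return theta0 + theta1 * x


def mu2_model2(x, mu1, theta_s, theta):
    """Model 2: intercept tied to the free-flow mean through theta_s * mu1."""
    return theta_s * mu1 + theta * x


x = np.array([0.0, 400.0, 800.0, 1200.0])  # hypothetical hourly volumes
m1 = mu2_model1(x, theta0=600.0, theta1=0.6)
m2 = mu2_model2(x, mu1=500.0, theta_s=1.2, theta=0.6)
print(m1, m2)
```

The difference is in estimation: Model 1 estimates the intercept freely, whereas Model 2 fixes its ratio to $\mu_1$ in advance, which removes one parameter.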

Following the convention of the Bayesian approach, we use the precision parameter $\psi_j$ to denote the inverse of the variance of the two components (i.e., $\psi_j = 1/\sigma_j^2$, $j = 1, 2$).

Two levels of uncertainty are quantitatively assessed in the proposed model. The first level of uncertainty is the probability of a given traffic condition, for example, congested or free-flow; the second level of uncertainty is the variation of travel time for each traffic condition.

To complete the Bayesian model setup, the following non-informative priors are adopted according to Yang and Berger (1996):

$$\pi(\mu_1) \propto 1,\quad \pi(\beta_0) \propto 1,\quad \pi(\beta_1) \propto 1,\quad \pi(\theta_0) \propto 1,\quad \pi(\theta_1) \propto 1,\quad \pi(\psi_1) \propto 1/\psi_1,\quad \pi(\psi_2) \propto 1/\psi_2$$

It is desirable that a Bayesian model not be sensitive to the choice of prior distributions. Several alternative priors, such as normal distributions with different variances, were tested, as shown in Table 2.1. The results are not significantly influenced by the choice and are quite similar to those from non-informative priors, most likely due to the large sample size. Therefore, non-informative priors are used in the model.

Table 2.1: Variance of Priors

σ²_β    ∞    10    100    1000
σ²_θ    ∞    10    100    1000

2.2 Model Fitting using Markov Chain Monte Carlo Algorithm

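The conclusions of the study are based on the posterior distribution of the parameters, as shown below:

$$
\begin{aligned}
f(\mu_1, \psi, \beta, \theta, w \mid X, y) &\propto f(y \mid \mu_1, \psi, \beta, \theta, w, X)\, f(\mu_1, \psi, \beta, \theta, w \mid X) \\
&\propto f(y \mid \mu_1, \psi, w, \theta)\, f(w \mid X, \beta)\, f(\mu_1, \psi, \beta, \theta \mid X) \\
&\propto f(y \mid \mu_1, \psi, w, \theta)\, f(w \mid X, \beta)\, \pi(\mu_1)\pi(\psi_1)\pi(\psi_2)\pi(\beta)\pi(\theta),
\end{aligned}
$$

where $f(y \mid \mu_1, \psi, w, \theta)$ is the density function of the multi-state normal distribution, $f_N(y \mid \mu_1, 1/\psi_1)^{I(w<0)} f_N(y \mid X\theta, 1/\psi_2)^{I(w \ge 0)}$, and $f(w \mid X, \beta)$ is the multivariate normal with mean $X\beta$ and covariance matrix $I$.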

Since there is no closed-form solution for the above posterior distribution, a simulation-based Markov Chain Monte Carlo algorithm is used to estimate it. The MCMC algorithm samples from the posterior distribution by drawing each parameter in turn from its full conditional distribution. The conditional distributions are developed in the following subsections.

2.2.1 Model 1

The full conditional distribution for each parameter in Model 1 is shown below.


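1. The full conditional for $w$:

$$f(w \mid \ldots) \propto \prod_{i=1}^{n} \Big( f_N(y_i \mid \mu_1, 1/\psi_1) I(w_i < 0) + f_N(y_i \mid X_i\theta, 1/\psi_2) I(w_i \ge 0) \Big) f_N(w_i \mid X_i\beta, 1)$$

This is the multi-state truncated normal. Define $a = f_N(y_i \mid \mu_1, 1/\psi_1)$ and $b = f_N(y_i \mid X_i\theta, 1/\psi_2)$; then with probability $\frac{a}{a+b}$, $w_i$ is sampled from $f_N(w_i \mid X_i\beta, 1)$ truncated at $w_i < 0$; with probability $\frac{b}{a+b}$, $w_i$ is sampled from $f_N(w_i \mid X_i\beta, 1)$ truncated at $w_i \ge 0$.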

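2. The full conditional for $\mu_1$:

$$
\begin{aligned}
f(\mu_1 \mid \ldots) &\propto \prod_{i=1}^{n} \Big( f_N(y_i \mid \mu_1, 1/\psi_1) I(w_i < 0) + f_N(y_i \mid X_i\theta, 1/\psi_2) I(w_i \ge 0) \Big) \\
&\propto \prod_{i: w_i < 0} f_N(y_i \mid \mu_1, 1/\psi_1) \\
&\sim N\left( \frac{\sum_{i: w_i < 0} y_i}{n_1}, \frac{1}{n_1 \psi_1} \right)
\end{aligned}
$$

This is a univariate normal distribution, where $n_1$ is the number of $w_i$'s that are smaller than 0. Corresponding to the model assumption $\mu_1 < \mu_2$, we right-truncate this distribution at $\min(X_i\theta)$.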

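3. The full conditional for $\psi_1$:

$$
\begin{aligned}
f(\psi_1 \mid \ldots) &\propto \psi_1^{-1} \prod_{i=1}^{n} f_N(y_i \mid \mu_1, 1/\psi_1)^{I(w_i < 0)} f_N(y_i \mid X_i\theta, 1/\psi_2)^{I(w_i \ge 0)} \\
&\propto \psi_1^{\frac{n_1}{2} - 1} \exp\left( -\frac{1}{2} \psi_1 \sum_{i: w_i < 0} (y_i - \mu_1)^2 \right)
\end{aligned}
$$

This is the Gamma distribution with shape parameter $\frac{n_1}{2}$ and rate parameter $\frac{1}{2} \sum_{i: w_i < 0} (y_i - \mu_1)^2$.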


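4. The full conditional for $\psi_2$:

$$
\begin{aligned}
f(\psi_2 \mid \ldots) &\propto \psi_2^{-1} \prod_{i=1}^{n} f_N(y_i \mid \mu_1, 1/\psi_1)^{I(w_i < 0)} f_N(y_i \mid X_i\theta, 1/\psi_2)^{I(w_i \ge 0)} \\
&\propto \psi_2^{\frac{n_2}{2} - 1} \exp\left( -\frac{1}{2} \psi_2 \sum_{i: w_i \ge 0} (y_i - X_i\theta)^2 \right)
\end{aligned}
$$

where $n_2$ is the number of $w_i$'s that are greater than or equal to 0. This is the Gamma distribution with shape parameter $\frac{n_2}{2}$ and rate parameter $\frac{1}{2} \sum_{i: w_i \ge 0} (y_i - X_i\theta)^2$.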


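5. The full conditional for $\beta$:

$$
\begin{aligned}
f(\beta \mid \ldots) &\propto \prod_{i=1}^{n} f(w_i \mid X_i, \beta) \\
&\propto \exp\left( -\frac{\sum_{i=1}^{n} (w_i - X_i\beta)^2}{2} \right)
\end{aligned}
$$

This is the bivariate normal distribution with mean $(X^T X)^{-1} X^T w$ and covariance matrix $(X^T X)^{-1}$.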

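6. The full conditional for $\theta$:

$$
\begin{aligned}
f(\theta \mid \ldots) &\propto \prod_{i=1}^{n} \Big( f_N(y_i \mid \mu_1, 1/\psi_1) I(w_i < 0) + f_N(y_i \mid X_i\theta, 1/\psi_2) I(w_i \ge 0) \Big) \\
&\propto \prod_{i: w_i \ge 0} f_N(y_i \mid X_i\theta, 1/\psi_2)
\end{aligned}
$$

Define: $\Sigma_+$ is the $n_2 \times n_2$ diagonal matrix with diagonal elements $1/\psi_2$; $X_+$ is the submatrix of $X$ consisting of the rows $i$ such that $w_i \ge 0$; $y_+$ is the subvector of $y$ consisting of the elements $i$ such that $w_i \ge 0$. Then:

$$
\begin{aligned}
f(\theta \mid \ldots) &\propto |\Sigma_+|^{-\frac{1}{2}} \exp\left( -\frac{1}{2} (y_+ - X_+\theta)^T \Sigma_+^{-1} (y_+ - X_+\theta) \right) \\
&\sim N\left( (X_+^T \Sigma_+^{-1} X_+)^{-1} X_+^T \Sigma_+^{-1} y_+,\ (X_+^T \Sigma_+^{-1} X_+)^{-1} \right)
\end{aligned}
$$

This is the bivariate normal with mean $(X_+^T \Sigma_+^{-1} X_+)^{-1} X_+^T \Sigma_+^{-1} y_+$ and covariance matrix $(X_+^T \Sigma_+^{-1} X_+)^{-1}$.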
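The six full conditionals of Model 1 can be combined into a single Gibbs sweep. The Python sketch below is illustrative and is not the authors' implementation: the function name `gibbs_model1`, the starting values, the iteration counts, and the simulated data are all assumptions. For the draw of $w_i$, the sketch weights $a$ and $b$ by the truncated-normal masses $\Phi(-X_i\beta)$ and $\Phi(X_i\beta)$, which is what integrating the displayed full conditional over each half-line gives; this is an assumption of the sketch rather than the $a/(a+b)$ shorthand in item 1. It also assumes both states stay non-empty throughout the run.

```python
import numpy as np
from scipy.special import expit
from scipy.stats import norm


def gibbs_model1(y, x, n_iter=600, burn=200, seed=0):
    """Gibbs sampler sketch for Model 1.

    Returns post-burn-in draws of (mu1, psi1, psi2, beta0, beta1, theta0, theta1).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    chol_beta = np.linalg.cholesky(XtX_inv)
    # Crude starting values (assumptions of this sketch, not from the report).
    w = np.where(y > y.min() + 0.25 * (y.max() - y.min()), 0.5, -0.5)
    psi1 = psi2 = 1.0 / y.var()
    draws = []
    for it in range(n_iter):
        g2 = w >= 0   # congested state
        g1 = ~g2      # free-flow state
        n1, n2 = g1.sum(), g2.sum()
        # Item 6: theta ~ N((X+'X+)^-1 X+'y+, (X+'X+)^-1 / psi2)
        A = np.linalg.inv(X[g2].T @ X[g2])
        theta = A @ X[g2].T @ y[g2] + np.linalg.cholesky(A / psi2) @ rng.standard_normal(2)
        # Item 2: mu1 ~ N(mean of y in state 1, 1/(n1 psi1)), right-truncated at min(X theta)
        m1, s1 = y[g1].mean(), 1.0 / np.sqrt(n1 * psi1)
        upper = (X @ theta).min()
        mu1 = m1 + s1 * norm.ppf(rng.uniform() * norm.cdf((upper - m1) / s1))
        # Items 3 and 4: Gamma(shape = n/2, rate = sum of squares / 2)
        psi1 = rng.gamma(n1 / 2.0, 2.0 / np.sum((y[g1] - mu1) ** 2))
        psi2 = rng.gamma(n2 / 2.0, 2.0 / np.sum((y[g2] - X[g2] @ theta) ** 2))
        # Item 5: beta ~ N((X'X)^-1 X'w, (X'X)^-1)
        beta = XtX_inv @ X.T @ w + chol_beta @ rng.standard_normal(2)
        # Item 1: choose the state, then draw w_i from the matching truncated normal.
        m = X @ beta
        log_a = norm.logpdf(y, mu1, psi1 ** -0.5) + norm.logcdf(-m)
        log_b = norm.logpdf(y, X @ theta, psi2 ** -0.5) + norm.logcdf(m)
        s = np.where(rng.uniform(size=n) < expit(log_b - log_a), 1.0, -1.0)
        u = rng.uniform(size=n)
        w = m - s * norm.ppf(u * norm.cdf(s * m))  # inverse-CDF truncated normal draw
        if it >= burn:
            draws.append([mu1, psi1, psi2, *beta, *theta])
    return np.array(draws)


# Simulated data using the parameter values of Setting 5 in the simulation study;
# the uniform volume range and the sample size are assumptions.
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(0.0, 1200.0, n)
congested = rng.normal(-3.0 + 0.005 * x, 1.0) >= 0   # latent probit state
y = np.where(congested,
             rng.normal(750.0 + 0.3 * x, 100.0),     # theta0 = 750, theta1 = 0.3, psi2 = 1e-4
             rng.normal(500.0, 10.0, n))             # mu1 = 500, psi1 = 0.01
post = gibbs_model1(y, x)
print(post.mean(axis=0))
```

Posterior means and 95% credible intervals follow from the columns of `post`, for example with `np.percentile(post, [2.5, 97.5], axis=0)`.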

2.2.2 Model 2

Compared to Model 1, this model has one fewer parameter, and the full conditional distributions change accordingly.

1. The full conditional for $w$:

$$f(w \mid \ldots) \propto \prod_{i=1}^{n} \Big( f_N(y_i \mid \mu_1, 1/\psi_1) I(w_i < 0) + f_N(y_i \mid \theta_s \mu_1 + \theta x_i, 1/\psi_2) I(w_i \ge 0) \Big) f_N(w_i \mid X_i\beta, 1)$$

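2. The full conditional for $\mu_1$:

$$
\begin{aligned}
f(\mu_1 \mid \ldots) &\propto \prod_{i=1}^{n} \Big( f_N(y_i \mid \mu_1, 1/\psi_1) I(w_i < 0) + f_N(y_i \mid \theta_s \mu_1 + \theta x_i, 1/\psi_2) I(w_i \ge 0) \Big) \\
&\sim N\left( \frac{\psi_1 \sum_{i: w_i < 0} y_i + \theta_s \psi_2 \sum_{i: w_i \ge 0} (y_i - \theta x_i)}{n_1 \psi_1 + \theta_s^2 n_2 \psi_2},\ \frac{1}{n_1 \psi_1 + \theta_s^2 n_2 \psi_2} \right)
\end{aligned}
$$

This is still a univariate normal distribution, but the parameters differ from those in Model 1. The precision carries the factor $\theta_s^2$ because $\mu_1$ enters the congested-state mean through the term $\theta_s \mu_1$.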


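3. The full conditional for $\psi_1$:

$$
\begin{aligned}
f(\psi_1 \mid \ldots) &\propto \psi_1^{-1} \prod_{i=1}^{n} f_N(y_i \mid \mu_1, 1/\psi_1)^{I(w_i < 0)} f_N(y_i \mid \theta_s \mu_1 + \theta x_i, 1/\psi_2)^{I(w_i \ge 0)} \\
&\propto \psi_1^{\frac{n_1}{2} - 1} \exp\left( -\frac{1}{2} \psi_1 \sum_{i: w_i < 0} (y_i - \mu_1)^2 \right)
\end{aligned}
$$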


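4. The full conditional for $\psi_2$:

$$
\begin{aligned}
f(\psi_2 \mid \ldots) &\propto \psi_2^{-1} \prod_{i=1}^{n} f_N(y_i \mid \mu_1, 1/\psi_1)^{I(w_i < 0)} f_N(y_i \mid \theta_s \mu_1 + \theta x_i, 1/\psi_2)^{I(w_i \ge 0)} \\
&\propto \psi_2^{\frac{n_2}{2} - 1} \exp\left( -\frac{1}{2} \psi_2 \sum_{i: w_i \ge 0} (y_i - \theta_s \mu_1 - \theta x_i)^2 \right)
\end{aligned}
$$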

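5. The full conditional for $\beta$:

$$
\begin{aligned}
f(\beta \mid \ldots) &\propto \prod_{i=1}^{n} f(w_i \mid X_i, \beta) \\
&\propto \exp\left( -\frac{\sum_{i=1}^{n} (w_i - X_i\beta)^2}{2} \right)
\end{aligned}
$$

This is the bivariate normal distribution with mean $(X^T X)^{-1} X^T w$ and covariance matrix $(X^T X)^{-1}$.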

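6. The full conditional for $\theta$:

$$
\begin{aligned}
f(\theta \mid \ldots) &\propto \prod_{i=1}^{n} \Big( f_N(y_i \mid \mu_1, 1/\psi_1) I(w_i < 0) + f_N(y_i \mid \theta_s \mu_1 + \theta x_i, 1/\psi_2) I(w_i \ge 0) \Big) \\
&\propto \prod_{i: w_i \ge 0} f_N(y_i \mid \theta_s \mu_1 + \theta x_i, 1/\psi_2) \\
&\sim N\left( \frac{\sum_{i: w_i \ge 0} (y_i - \theta_s \mu_1) x_i}{\sum_{i: w_i \ge 0} x_i^2},\ \frac{1}{\psi_2 \sum_{i: w_i \ge 0} x_i^2} \right)
\end{aligned}
$$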
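As a check on item 2 of Model 2, the conditional for $\mu_1$ can be obtained by completing the square. The following is a sketch under the flat prior $\pi(\mu_1) \propto 1$ adopted earlier, ignoring any truncation; the shorthand $S_1$ and $S_2$ is introduced here for convenience and does not appear in the report.

$$
\begin{aligned}
\log f(\mu_1 \mid \ldots) &= -\frac{\psi_1}{2} \sum_{i: w_i < 0} (y_i - \mu_1)^2 - \frac{\psi_2}{2} \sum_{i: w_i \ge 0} (y_i - \theta x_i - \theta_s \mu_1)^2 + \text{const} \\
&= -\frac{1}{2} \Big[ (n_1 \psi_1 + \theta_s^2 n_2 \psi_2)\, \mu_1^2 - 2 \mu_1 (\psi_1 S_1 + \theta_s \psi_2 S_2) \Big] + \text{const},
\end{aligned}
$$

with $S_1 = \sum_{i: w_i < 0} y_i$ and $S_2 = \sum_{i: w_i \ge 0} (y_i - \theta x_i)$. Matching this to the kernel of a normal density gives precision $n_1 \psi_1 + \theta_s^2 n_2 \psi_2$ and mean $(\psi_1 S_1 + \theta_s \psi_2 S_2) / (n_1 \psi_1 + \theta_s^2 n_2 \psi_2)$. Setting $\theta_s = 0$ recovers the Model 1 result $N(S_1 / n_1, 1/(n_1 \psi_1))$.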

2.3 Simulation Study

We conducted a simulation study to examine the proposed models based on the data set collected on Interstate 35 (I-35) near San Antonio, Texas (Guo et al. 2010). The study corridor covered a 16-kilometer section with an average daily traffic volume of around 150,000 vehicles. The travel time was collected when vehicles tagged with a radio frequency device passed the automatic vehicle identification stations on New Braunfels Avenue (Station no. 42) and O’Connor Road (Station no. 49). Figure 2.1 illustrates the setup of the data collection sites.

Figure 2.1: Illustration of Data Collection

The traffic volume is the number of vehicles traveling through the road segment during a specific time period. The analysis unit is set to one hour, and the hourly traffic volume from [0:00, 0:59] to [23:00, 23:59] is calculated. Although hourly traffic volume is adopted in this analysis, in theory any time unit can be used if there are sufficient data within the time unit. The data set contains 237 distinct hours of observations. We average the traffic volume by the hours of a day. Figure 2.2 illustrates the range of average traffic volume by hour.

Figure 2.2: Average Traffic Volume by Hour of a Day

In the original data, only vehicles equipped with electronic tags were counted. These vehicles account for a proportion of the total traffic. In order to estimate the real traffic volume, we simulated new data sets according to the shape of the original data and extended it by a scale:

$Y_{ij}$: simulated traffic volume of hour $i$ on day $j$, with $Y_{ij} = [c \mu_i + \varepsilon_{ij}]_+$ and $\varepsilon_{ij} \sim N(0, d^2)$

$X_{ij}$: original traffic volume of hour $i$ on day $j$, with $i = 0, \ldots, 23$ and $j = 1, \ldots, 10$ or $11$

$\mu_i$: average traffic volume of hour $i$, with $\mu_i = \dfrac{\sum_k X_{ik}}{\text{number of days}}$

Based on historical data and engineering judgment, we selected $d = 100$ and $c = 50$. The traffic volume and the travel time data sets were generated according to the following procedure.
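The volume simulation $Y_{ij} = [c \mu_i + \varepsilon_{ij}]_+$ can be sketched in a few lines of Python. The function name `simulate_volume` and the tagged-vehicle counts below are hypothetical stand-ins for the field data; only $c = 50$ and $d = 100$ come from the text.

```python
import numpy as np


def simulate_volume(tag_counts, c=50.0, d=100.0, n_days=10, seed=0):
    """Simulate Y_ij = [c * mu_i + eps_ij]_+ with eps_ij ~ N(0, d^2).

    tag_counts has shape (24, days) and holds tagged-vehicle counts X_ij.
    """
    rng = np.random.default_rng(seed)
    mu = tag_counts.mean(axis=1)                      # mu_i: average over days
    eps = rng.normal(0.0, d, size=(len(mu), n_days))  # eps_ij
    return np.maximum(c * mu[:, None] + eps, 0.0)     # [.]_+ sets negative values to 0


# Hypothetical tagged-vehicle counts for 24 hours over 10 days.
rng = np.random.default_rng(42)
hourly_profile = 10.0 + 30.0 * np.sin(np.linspace(0.0, np.pi, 24))
tag_counts = rng.poisson(hourly_profile[:, None], size=(24, 10))
volume = simulate_volume(tag_counts)
print(volume.shape, bool(volume.min() >= 0))
```

The positive-part operator only matters for hours in which $c \mu_i$ is within a few multiples of $d$ of zero.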

For a given model and a set of predetermined parameters, the simulation study is conducted as follows:

1. Set n = number of simulations to run.

2. For (i in 1:n) {
       Generate data
       Do {
           Markov Chain Monte Carlo
       } until convergence
       Record whether the 95% credible intervals cover the true values
   }

For each round of simulation, we ran more than 5,000 MCMC iterations and ensured convergence of the MCMC chains. The inference statistics, such as the posterior mean, 95% credible interval, and coverage probabilities, are calculated. The analysis focuses on the comparison between Model 1 and Model 2, model performance, and robustness under misspecified parameters.
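The structure of the simulation loop can be sketched in Python with a deliberately simple stand-in model. This is an assumption made to keep the example short: instead of the multi-state model and an MCMC run, the sketch uses normal data with known standard deviation and a flat prior on the mean, for which the posterior can be sampled directly. The function name `coverage_probability` and all default values are hypothetical.

```python
import numpy as np


def coverage_probability(true_mean=500.0, sd=10.0, n_obs=50,
                         n_sim=1000, n_draws=2000, seed=0):
    """Share of simulations whose 95% credible interval covers true_mean."""
    rng = np.random.default_rng(seed)
    covered = 0
    for _ in range(n_sim):
        y = rng.normal(true_mean, sd, n_obs)        # generate data
        # Stand-in for MCMC: with a flat prior and known sd, the posterior of
        # the mean is N(ybar, sd^2 / n_obs) and can be sampled directly.
        post = rng.normal(y.mean(), sd / np.sqrt(n_obs), n_draws)
        lo, hi = np.percentile(post, [2.5, 97.5])   # 95% credible interval
        covered += int(lo <= true_mean <= hi)       # record coverage
    return covered / n_sim


print(coverage_probability())
```

For a well-specified model the returned share should be close to 0.95; large departures, like those reported for some settings below, indicate that the intervals are too narrow or misplaced.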

2.3.1 Model comparison

The main difference between Model 1 and Model 2 is that Model 2 provides a mechanism to control the difference between the free-flow mean travel time $\mu_1$ and the baseline of the congested mean travel time $\mu_2$ via the parameter $\theta_s$. The value of $\theta_s$ could be determined via engineering expertise and preliminary data analysis. It is tempting to set $\theta_s = 1$, which corresponds to the scenario in which the intercept for the congested state equals the free-flow travel time. However, initial analyses indicate that a model identifiability issue arises when $\theta_s$ is too close to 1. The issue is caused by the fact that when $\theta_s$ is small, the mean travel times of the two component distributions, i.e., free flow and congested flow, are too close to each other. To provide sufficient separation between the two component distributions, we set the minimum $\theta_s$ value at $\theta_s = 1.2$ in the simulation study.

The values of $\mu_1$, $\psi_1$, $\psi_2$, $\beta_0$, and $\beta_1$ are set according to historical data. Figure 2.3 shows the relationship between the probability of the congested state and the traffic volume.


Figure 2.3: Probability of Congested State versus Traffic Volume in Simulation Studies

Five different settings of $\theta_0$ ($\theta_s$) and $\theta_1$ are evaluated. The results are summarized in Table 2.2. As can be seen, the point estimates of the parameters are generally very close to the true values. Model 2 seems to be slightly better than Model 1, but the difference is minimal.

One key criterion for evaluating model performance is the coverage probability. Ideally, the posterior credible intervals should cover the true model parameter at the nominal level; i.e., for 1,000 simulations, the resulting 95% posterior credible intervals should cover the true model parameter about 95% of the time (950 out of 1,000). As can be seen in Table 2.3 and Figure 2.4, the coverage probability for parameter $\mu_1$ is close to 0.95 in most cases. However, the coverage probability for $\beta_0$, $\beta_1$, and $\theta_1$ could be quite off, especially when the two component distributions are similar to each other. The overall performance of Model 2 is better than that of Model 1, especially when the two component distributions are similar to each other.


Table 2.2: Models 1 and 2: Average of Posterior Means Comparison

            µ1      ψ1       ψ2       β0      β1       θ0 (θs)      θ1
Setting 1   500     0.01     1.0e-4   -3      0.005    600 (1.2)    0.6
Model 1     499.9   0.010    9.8e-5   -2.93   0.0049   592.9        0.61
Model 2     499.9   0.010    9.8e-5   -2.94   0.0049   -            0.60
Setting 2   500     0.01     1.0e-4   -3      0.005    650 (1.3)    0.6
Model 1     499.9   0.010    9.3e-5   -2.98   0.0049   648.1        0.60
Model 2     500.0   0.010    9.9e-5   -2.98   0.0050   -            0.60
Setting 3   500     0.01     1.0e-4   -3      0.005    650 (1.3)    0.3
Model 1     499.9   0.0093   9.3e-5   -2.83   0.0047   628.2        0.32
Model 2     500.0   0.010    9.5e-5   -2.88   0.0048   -            0.30
Setting 4   500     0.01     1.0e-4   -3      0.005    700 (1.4)    0.3
Model 1     499.9   0.010    9.8e-5   -2.96   0.0049   694.8        0.31
Model 2     500.0   0.010    9.8e-5   -2.96   0.0049   -            0.30
Setting 5   500     0.01     1.0e-4   -3      0.005    750 (1.5)    0.3
Model 1     500.0   0.010    9.9e-5   -2.98   0.0049   747.1        0.30
Model 2     500.0   0.010    9.9e-5   -3.00   0.0050   -            0.30

Table 2.3: Models 1 and 2: Coverage Probabilities Comparison

            µ1     ψ1     ψ2       β0     β1      θ0 (θs)      θ1
Setting 1   500    0.01   1.0e-4   -3     0.005   600 (1.2)    0.6
Model 1     0.94   0.87   0.85     0.71   0.77    0.64         0.72
Model 2     0.93   0.91   0.79     0.69   0.78    -            0.97
Setting 2   500    0.01   1.0e-4   -3     0.005   650 (1.3)    0.6
Model 1     0.96   0.95   0.93     0.93   0.91    0.91         0.93
Model 2     0.97   0.96   0.95     0.95   0.93    -            0.97
Setting 3   500    0.01   1.0e-4   -3     0.005   650 (1.3)    0.3
Model 1     0.97   0.57   0.23     0.17   0.24    0.11         0.17
Model 2     0.95   0.76   0.36     0.36   0.49    -            0.90
Setting 4   500    0.01   1.0e-4   -3     0.005   700 (1.4)    0.3
Model 1     0.93   0.84   0.92     0.85   0.89    0.86         0.86
Model 2     0.96   0.90   0.89     0.86   0.90    -            0.95
Setting 5   500    0.01   1.0e-4   -3     0.005   750 (1.5)    0.3
Model 1     0.94   0.97   0.97     0.91   0.92    0.88         0.93
Model 2     0.98   0.99   0.92     0.96   0.96    -            0.93


Figure 2.4: Model 1 vs. Model 2: Coverage Probabilities Comparison in Five Settings

Table 2.4 shows an additional investigation for Model 2 under Settings 2 and 3, in which the value of $\theta_s$ is set to 1.3. Since the value of $\theta_1$ plays an important role in the coverage probability, we selected two more points between 0.3 and 0.6. In general, when $\theta_1$ increases, the coverage probabilities are closer to the target 95% rate.

Table 2.4: Model 2 between Settings 2 and 3: Coverage Probabilities Comparison

Value of θ1   µ1     ψ1     ψ2     β0     β1     θ1
0.3           0.95   0.76   0.36   0.36   0.49   0.90
0.4           0.89   0.87   0.75   0.67   0.74   0.93
0.5           0.94   0.92   0.84   0.90   0.91   0.94
0.6           0.97   0.96   0.95   0.95   0.93   0.97

2.3.2 Simulation Evaluation for Model 2

To evaluate the robustness of Model 2, we conducted a more complicated simulation study based on combinations of various parameters. The following parameters are fixed among all simulation setups:

µ1 = 500, ψ1 = 0.01, β0 = −3, ψ2 = 0.0001


For the parameters of interest, we tested multiple levels for each parameter.

β1 ∈ {0.004, 0.0045, 0.005}
θ ∈ {0.3, 0.6}
θs ∈ {1.2, 1.3, 1.4, 1.5}

$\beta_1$ represents the relationship between the proportion of the congested state and the traffic volume. $\theta$ indicates the relationship between travel time under the congested state and the traffic volume. There are 24 different settings for simulation. The outputs are summarized in Table 2.5. The results show that the coverage probabilities are generally good when the two components are well separated. The model performance can be more clearly identified in Figure 2.5. The dashed line denotes the 95% nominal level for reference. As can be seen, the larger $\theta$ and $\theta_s$ are, the higher the coverage probabilities will be.
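The 24 settings are the full factorial combination of the three parameter sets ($3 \times 2 \times 4$). A minimal Python sketch of the enumeration is given below; the variable names are assumptions, and the loop order is chosen so that the numbering matches the ID column of Table 2.5 ($\theta_s$ varies fastest, then $\theta$, then $\beta_1$).

```python
from itertools import product

beta1_levels = [0.004, 0.0045, 0.005]
theta_levels = [0.3, 0.6]
theta_s_levels = [1.2, 1.3, 1.4, 1.5]

# product() varies its last argument fastest, which reproduces the ID ordering.
settings = [
    {"id": k, "beta1": b1, "theta": th, "theta_s": ts}
    for k, (b1, th, ts) in enumerate(
        product(beta1_levels, theta_levels, theta_s_levels), start=1
    )
]
print(len(settings))
```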

Table 2.5: More Results of Model 2: Coverage Probabilities

ID   β1       θ     θs    Cov of µ1   Cov of ψ1   Cov of ψ2   Cov of β0   Cov of β1   Cov of θ
1    0.004    0.3   1.2   0.78        0.09        0           0           0           0.23
2    0.004    0.3   1.3   0.89        0.71        0.09        0.09        0.24        0.78
3    0.004    0.3   1.4   0.91        0.94        0.61        0.84        0.88        0.92
4    0.004    0.3   1.5   0.95        0.95        0.93        0.96        0.96        0.94
5    0.004    0.6   1.2   0.91        0.87        0.71        0.69        0.71        0.96
6    0.004    0.6   1.3   0.95        0.94        0.91        0.89        0.89        0.96
7    0.004    0.6   1.4   0.95        0.93        0.93        0.93        0.94        0.93
8    0.004    0.6   1.5   0.93        0.98        0.96        0.94        0.97        0.93
9    0.0045   0.3   1.2   0.79        0.18        0           0           0.01        0.41
10   0.0045   0.3   1.3   0.91        0.77        0.24        0.19        0.29        0.84
11   0.0045   0.3   1.4   0.93        0.94        0.92        0.93        0.93        0.74
12   0.0045   0.3   1.5   0.94        0.92        0.93        0.91        0.94        0.92
13   0.0045   0.6   1.2   0.93        0.89        0.78        0.74        0.75        0.94
14   0.0045   0.6   1.3   0.97        0.99        0.94        0.93        0.93        0.96
15   0.0045   0.6   1.4   0.98        0.96        0.89        0.88        0.89        0.98
16   0.0045   0.6   1.5   0.97        0.94        0.95        0.96        0.94        0.89
17   0.005    0.3   1.2   0.83        0.26        0.1         0           0           0.66
18   0.005    0.3   1.3   0.95        0.76        0.36        0.36        0.49        0.9
19   0.005    0.3   1.4   0.96        0.9         0.89        0.86        0.9         0.95
20   0.005    0.3   1.5   0.98        0.99        0.92        0.96        0.96        0.93
21   0.005    0.6   1.2   0.93        0.91        0.79        0.69        0.78        0.97
22   0.005    0.6   1.3   0.97        0.96        0.95        0.95        0.93        0.97
23   0.005    0.6   1.4   0.92        0.93        0.95        0.92        0.93        0.93
24   0.005    0.6   1.5   0.94        0.94        0.94        0.95        0.96        0.87


Figure 2.5: Model 2: Coverage Probabilities Comparison

2.3.3 Robustness of Misspecified θs

The parameter θs is one of the key parameters of Model 2 and is predetermined instead of estimated from data. The true value of θs is typically unknown. Therefore, the robustness of the model with respect to misspecification of θs is critical for applications. For example, will the results change substantially if the true value of θs is 1.2 but the model is fitted with θs = 1.3? We evaluated this issue with four different settings.

The results of the simulation are shown in Table 2.6. As can be seen, the point estimates of the parameters are generally stable. In the case of misspecified θs, the model would generally overestimate the mean of component 2. From Table 2.7, it can be concluded that when the two components are not well separated (i.e., θs and θ are small), the misspecified models can sometimes have better coverage of the regression coefficients for the mixture proportion (ψ1, β0 and β1).

Table 2.6: Misspecified Models: Average of Posterior Means Comparison

                   µ1      ψ1      ψ2        β0      β1       θs    θ1     µ2
Setting 1          500     0.01    1.0e-4    -3      0.005    1.2   0.3    769
  True Model       499.9   0.010   9.29e-5   -2.72   0.0046   1.2   0.30   774
  Misspecified 1   499.9   0.010   9.26e-5   -2.84   0.0048   1.3   0.24   782
  Misspecified 2   499.9   0.010   8.89e-5   -2.92   0.0049   1.4   0.18   800
Setting 2          500     0.01    1.0e-4    -3      0.005    1.2   0.6    939
  True Model       500.0   0.010   9.81e-5   -2.94   0.0049   1.2   0.60   947
  Misspecified     500.0   0.010   9.63e-5   -2.96   0.0050   1.3   0.54   958
Setting 3          500     0.01    1.0e-4    -3      0.005    1.3   0.3    819
  True Model       500.0   0.010   9.52e-5   -2.87   0.0048   1.3   0.30   828
  Misspecified 1   500.0   0.010   9.52e-5   -2.95   0.0048   1.4   0.24   837
  Misspecified 2   500.0   0.010   9.03e-5   -2.98   0.0050   1.5   0.18   852
Setting 4          500     0.01    1.0e-4    -3      0.005    1.3   0.6    989
  True Model       500.0   0.010   9.91e-5   -2.98   0.0050   1.3   0.60   989
  Misspecified     500.0   0.010   9.74e-5   -2.99   0.0050   1.4   0.54   1005

Note: The µ2 is estimated by θs ∗ µ1 + θ1 ∗ X

Table 2.7: Misspecified Models: Coverage Probabilities Comparison

                   µ1     ψ1     ψ2       β0     β1      θs    θ1
Setting 1          500    0.01   1.0e-4   -3     0.005   1.2   0.3
  True Model       0.91   0.30   0.09     0      0       1.2   0.71
  Misspecified 1   0.78   0.67   0.09     0.18   0.28    1.3   0.28
  Misspecified 2   0.78   0.81   0        0.61   0.67    1.4   0
Setting 2          500    0.01   1.0e-4   -3     0.005   1.2   0.6
  True Model       0.96   0.94   0.78     0.73   0.77    1.2   0.95
  Misspecified     0.90   0.95   0.59     0.91   0.95    1.3   0
Setting 3          500    0.01   1.0e-4   -3     0.005   1.3   0.3
  True Model       0.95   0.73   0.42     0.35   0.42    1.3   0.88
  Misspecified 1   0.90   0.90   0.41     0.84   0.85    1.4   0
  Misspecified 2   0.89   0.95   0.02     0.88   0.92    1.5   0
Setting 4          500    0.01   1.0e-4   -3     0.005   1.3   0.6
  True Model       0.94   0.88   0.89     0.94   0.95    1.3   0.93
  Misspecified     0.92   0.92   0.71     0.92   0.91    1.4   0

Figure 2.6 shows the coverage probabilities comparison between true and misspecified models in four different settings. The dashed line is used to denote 95% for reference. It can be observed that the true models are superior in estimating θ1 and ψ2, while the misspecified models perform better in ψ1, β0 and β1.

Figure 2.6: Misspecified and True Model Comparison

Although misspecified models are generally not good at estimating θ1, it is the mean of the congested state, µ2 = θs ∗ µ1 + θ1 ∗ x, that is of ultimate interest. In order to evaluate the influence of a misspecified θs on µ2, we examined the relationship between traffic volume and the corresponding µ2 under the theoretical result, the true model estimate, and the misspecified model estimate. Figure 2.7 shows that the misspecified model estimates are close to the theoretical results when the traffic volume is high, which is the range directly linked to the congested state. Therefore, the application of these models is still robust when θs is misspecified.
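The compensation between θs and θ1 can be illustrated with the formula in the note under Table 2.6, µ2 = θs ∗ µ1 + θ1 ∗ X. The sketch below plugs in the true values of Setting 1 and the posterior-mean θ1 of the first misspecified model from Table 2.6; the traffic volumes in the loop are assumed for illustration and are not the volumes used in the study.

```python
def mu2(theta_s, mu1, theta1, x):
    """Congested-state mean: theta_s * mu1 + theta1 * x."""
    return theta_s * mu1 + theta1 * x


mu1 = 500.0
true_model = (1.2, 0.30)       # (theta_s, theta1), true values of Setting 1
misspecified = (1.3, 0.24)     # theta_s fixed at 1.3, average posterior mean of theta1

gaps = []
for x in (200, 400, 600, 800):  # assumed traffic volumes
    true_mean = mu2(true_model[0], mu1, true_model[1], x)
    miss_mean = mu2(misspecified[0], mu1, misspecified[1], x)
    gaps.append(miss_mean - true_mean)
    print(x, true_mean, miss_mean)
```

Under these inputs the gap between the two means equals 50 − 0.06 ∗ X, so it shrinks as the volume grows, which is the pattern described for Figure 2.7.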


Figure 2.7: Theoretical, Misspecified and True Model Comparison

2.4 Model Application to Field-collected Data

We applied Model 2 to the field data collected on I-35 near San Antonio, Texas, as introduced before. Models with different values of θs were fitted. The results are shown in Table 2.8.


Table 2.8: Results from Real Data with Different Values of θs

Parameter   θs = 1    θs = 1.1   θs = 1.2   θs = 1.3   θs = 1.4
µ1          578.6     578.5      578.3      578.3      578.2
ψ1          0.00083   0.00083    0.00083    0.00082    0.00081
ψ2          1.01e-5   9.99e-6    9.68e-6    9.29e-6    8.86e-6
β0          -0.97     -1.00      -1.04      -1.09      -1.13
β1          0.031     0.033      0.036      0.038      0.040
θ           15.90     12.25      8.68       5.19       1.76
µ2          758.6     774.7      792.0      810.3      829.4

Note: The µ2 is estimated by θs ∗ µ1 + θ ∗ X

Figure 2.8 shows the relationship between θs and other critical model parameters. Both the means and the standard deviations of the two components are quite stable with respect to the change of θs.


Figure 2.8: Parameter Estimates under Different Values of θs

Figure 2.9 shows the relationship between the probability of being in the congested state and traffic volume under different settings of θs. As can be seen, the difference is most pronounced for traffic volumes of about 60 to 70. It should be noted that the traffic volume here is a subset of the actual traffic volume and thus should be interpreted as an indicator of traffic condition.


Figure 2.9: Probability in Congested State and Traffic Volume: Real Data

2.5 Summary

The multi-state model provides a flexible and efficient framework for modeling travel time reliability, especially under complex traffic conditions. Guo et al. (2012) illustrated that the multi-state model outperforms single-state models in congested or near-congested traffic conditions and the advantage is substantial in high traffic volume conditions.

The objective of this study is to quantitatively evaluate the influence of traffic volume on the mixture of two components. The study advances the multi-state models by proposing regressions on the proportions and distribution parameters for underlying traffic states. The Bayesian analysis also provides feasible credible intervals for each parameter without asymptotic assumption.

Previous studies usually modeled the travel time independently without establishing the relationship between travel time and important transportation statistics such as traffic volume. The models developed can also be easily extended to include more covariates in either linear or nonlinear forms.

The application results indicate that there is a negative relationship between the proportion of the free-flow state and the traffic volume, which confirms the statement by Guo et al. (2012) that for low traffic volume conditions, there might exist only one travel time state and single-state models will be sufficient. The estimation for the congested state indicates that the travel time under such conditions exhibits substantial variability and is positively related to traffic volume, which also agrees with the phenomenon found by Guo et al. (2012).

There are several potential extensions to the current research. The current research only includes lognormal and normal distributions. A number of other distributions, e.g., Gamma and extreme value distributions, can also be investigated. One of the assumptions of the existing Bayesian mixture model is that all the observations are independent, which could be relaxed in the hidden Markov model.

Chapter 3

Travel Time Reliability: Hidden Markov Model

3.1 Introduction

The Bayesian mixture regression model discussed in the previous chapter is based on the assumption that the observations are independent. In most cases, the travel times of vehicles are collected chronologically, so the travel times in adjacent time periods are likely to be correlated because of the continuity of traffic flow. Although it is possible to apply autocorrelated error terms to handle this problem (Cochrane and Orcutt 1949), the interpretation under this scenario is unclear.

In order to accommodate the dependency structure of the data, we adopt a more general methodology: the hidden Markov model (HMM). The basic concept of the hidden Markov model was introduced by Baum and Petrie (1966). It can be shown that the traditional mixture model is a special case of the HMM (Scott 2002).

Hidden Markov models are popular in a wide variety of applications (Couvreur 1996), including speech recognition (Rabiner 1989), biometrics (Albert 1991), econometrics (Hamilton 1989), computational biology (Krogh et al. 1994), fault detection (Smyth 1994), and many other areas.

3.2 Autocorrelation

As an exploratory analysis, we treat the I-35 data as a time series and calculate the autocorrelation among time windows separated by different lags. Autocorrelation is a measure of similarity between observations separated by a given time lag (Wiener 1930).


For a sequence {Xt}, the autocorrelation is defined as:

\[ \mathrm{ACF}(t, s) = \frac{E\left[(X_t - \mu_t)(X_s - \mu_s)\right]}{\sigma_t \sigma_s} \]

By assuming that {Xt} is second-order stationary (Wold 1938), the autocorrelation can be written as a function of the lag s alone:

\[ \mathrm{ACF}(s) = \frac{E\left[(X_t - \mu)(X_{t+s} - \mu)\right]}{\sigma^2} \]

For an independent sequence, ACF(s) should be close to zero for every lag s ≥ 1. For a sequence such as {Xt : Xt = 0.5 ∗ Xt−1 + εt}, ACF(s) will be quite large when s = 1 and will decrease gradually as s increases.
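The contrast between the two sequences can be checked numerically. The sketch below is an illustration only: it uses a plain moment estimator of the ACF and simulated Gaussian noise (both assumptions, not the estimator or data of the study). For the autoregressive sequence with coefficient 0.5 and independent noise, the theoretical value is ACF(s) = 0.5^s.

```python
import random


def sample_acf(x, lag):
    """Moment estimator of the autocorrelation at a given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n
    return cov / var


random.seed(1)
n = 20000

# Independent sequence.
iid = [random.gauss(0.0, 1.0) for _ in range(n)]

# X_t = 0.5 * X_{t-1} + e_t with independent standard normal noise.
ar1 = [0.0]
for _ in range(n - 1):
    ar1.append(0.5 * ar1[-1] + random.gauss(0.0, 1.0))

for lag in (1, 2, 3):
    print(lag, round(sample_acf(iid, lag), 3), round(sample_acf(ar1, lag), 3))
```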

Figure 3.1 shows two plots, one for independent data and one for the field-collected data. The x-axis is the time lag, while the y-axis is the ACF. The plot on the left shows the ACF of the independent sequence, while the plot on the right is estimated from the I-35 data. It is clear that the observed travel times are not independent.

Figure 3.1: Autocorrelation Comparison

A formal test of the autocorrelation is the Durbin–Watson test (Durbin and Watson 1950). The Durbin–Watson statistic is defined as:

\[ d = \frac{\sum_{t=2}^{T} (X_t - X_{t-1})^2}{\sum_{t=1}^{T} X_t^2} \]


The value of d is between 0 and 4. If the Durbin–Watson statistic is substantially less than 2, there might be positive correlation; if it is substantially greater than 2, there might be negative correlation. Under the normal assumption, the null distribution of the Durbin–Watson statistic is a linear combination of chi-squared variables.
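The behavior of the statistic can be seen on two tiny sequences. This is a sketch of the formula above only, assuming the input has already been centered (as residuals would be); it does not compute the p-value.

```python
def durbin_watson(e):
    """Sum of squared successive differences divided by the sum of squares."""
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    denominator = sum(v ** 2 for v in e)
    return numerator / denominator


persistent = [1.0, 1.0, -1.0, -1.0]    # neighbors tend to agree
alternating = [1.0, -1.0, 1.0, -1.0]   # neighbors always disagree

print(durbin_watson(persistent), durbin_watson(alternating))
```

The persistent pattern gives a value below 2 and the alternating pattern a value above 2, in line with the interpretation given in the text.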

To satisfy the normal assumption, we use the Box-Cox transformation (Choongrak 1959):

\[ x \rightarrow \frac{x^{\lambda} - 1}{\lambda} \]

Figure 3.2 indicates that λ = −4 is the optimal choice.
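A minimal sketch of the transformation is given below. The function name is hypothetical, the λ = 0 branch uses the usual logarithmic limit, and the two travel times are illustrative values rather than observations from the I-35 data.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive value x; lam = 0 is the log limit."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# With lam = -4 the transform is still increasing in x,
# but it compresses large travel times toward the upper bound 1/4.
for seconds in (500.0, 1000.0):
    print(seconds, box_cox(seconds, -4.0))
```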

Figure 3.2: Box-Cox Transformation

The Durbin–Watson statistic from the (transformed) travel time is 0.8244, which yields a p-value close to zero. This confirms the high positive autocorrelation among the travel time data.

3.3 Theoretical Background

3.3.1 Model Specification

A hidden Markov model consists of two sequences: the observed sequence {xt}, t = 1, 2, ..., n and the latent state sequence {st}, t = 1, 2, ..., n. Given st, the distribution of the observed data xt is fully determined by the value of st. For example, if we denote the travel time in seconds as {xt}, t = 1, 2, ..., n, then the sequence st could be defined as:

\[ s_t = \begin{cases} 1, & \text{if the road is under free-flow} \\ 2, & \text{if the road is under congestion} \end{cases} \]

Figure 3.3: Hidden Markov Model: An Illustration

The two values of st represent the two travel time states, which correspond to the two components in a mixture distribution. Given the state, the observed data xt follows one of the distributions:

\[ f(x_t \mid s_t) = \begin{cases} f(x \mid \Theta_1), & \text{if } s_t = 1 \\ f(x \mid \Theta_2), & \text{if } s_t = 2 \end{cases} \]

The form of the distribution f(x|Θ) could be normal, Gamma, Poisson, multinomial, or others. For example, xt|st = 1 ∼ N(1000, 100²) and xt|st = 2 ∼ N(500, 30²).

The term "Hidden" indicates that {st} is a latent sequence that cannot be observed. The term "Markov" indicates an important property of {st}:

P (st|st−1, ..., s1) = P (st|st−1),∀t ≥ 2

Thus, {st} is a Markov chain with a transition probability matrix. For a two-state sequence, the transition matrix is as follows:

\[ P = \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix}, \]

where Pij = P(st+1 = j | st = i) is the probability of moving from state i to state j.

It can be shown that if {st} is a trivial Markov chain, i.e., i.i.d., then the hidden Markov model is equivalent to a traditional mixture model. Figure 3.4 is an illustration of the two-state Markov chain for traffic states.
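The data-generating mechanism of a two-state HMM with normal emissions can be sketched as follows. All numerical values here are assumed for illustration, and states are indexed 0 and 1 instead of 1 and 2. When both rows of the transition matrix are identical, the states are i.i.d. and the output is a draw from an ordinary two-component mixture.

```python
import random


def simulate_hmm(n, P, means, sds, init, seed=0):
    """Simulate n steps of a two-state HMM with normal emissions.

    P[i][j] is the probability of moving from state i to state j;
    init holds the probabilities of the first state.
    """
    rng = random.Random(seed)
    state = 0 if rng.random() < init[0] else 1
    states, observations = [], []
    for _ in range(n):
        states.append(state)
        observations.append(rng.gauss(means[state], sds[state]))
        state = 0 if rng.random() < P[state][0] else 1
    return states, observations


# Persistent chain: long runs of the same state.
sticky_states, _ = simulate_hmm(20000, [[0.95, 0.05], [0.10, 0.90]],
                                [500.0, 1000.0], [30.0, 100.0], [0.5, 0.5])

# Identical rows: the chain is i.i.d., i.e. a traditional mixture.
iid_states, _ = simulate_hmm(20000, [[0.3, 0.7], [0.3, 0.7]],
                             [500.0, 1000.0], [30.0, 100.0], [0.3, 0.7])

switches = sum(1 for a, b in zip(sticky_states, sticky_states[1:]) if a != b)
print(switches, iid_states.count(0) / len(iid_states))
```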

Figure 3.4: Illustration of Two States Markov Chain

The basic properties of a Markov chain are listed below:

Irreducible: It is possible to get to any state from any state.

Aperiodic: A state i has period k (k ≥ 2) if {n : p_ii^(n) > 0} = {k ∗ d : d ≥ 1}. If none of the states is periodic, the chain is aperiodic.

Positive recurrent: A state i is positive recurrent if the expected time for the chain to return to state i is finite.

If every state in an irreducible chain is positive recurrent, there exists a unique stationary distribution π that satisfies:

πP = π

If an irreducible chain is positive recurrent and aperiodic, it is said to have a limiting distribution φ:

\[ \lim_{n \to +\infty} p_{ij}^{(n)} = \varphi_j, \]

where p_ij^(n) = P(st+n = j | st = i) is the n-step transition probability.


A limiting distribution, when it exists, is always a stationary distribution, but the converse is not necessarily true. The stationary or limiting distribution can be used to describe the long-term behavior of a Markov chain. For example, suppose a hidden Markov model has the transition matrix:

\[ P = \begin{pmatrix} 0.8 & 0.2 \\ 0.1 & 0.9 \end{pmatrix} \]

The stationary distribution is obtained by solving:

\[ \begin{cases} \pi_1 = 0.8\,\pi_1 + 0.1\,\pi_2 \\ \pi_1 + \pi_2 = 1 \end{cases} \]

The solution is π1 = 1/3, π2 = 2/3. Roughly speaking, 1/3 of the observations will be in state 1 and 2/3 of the observations will be in state 2 in the long run. This relates the hidden Markov model to the traditional mixture model.
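The same answer can be obtained numerically in two ways: from the closed form for a two-state chain, π1 = P21 / (P12 + P21), and by applying the transition matrix repeatedly to an arbitrary starting distribution. The sketch below uses the matrix of the example; the function names are hypothetical.

```python
def stationary_two_state(P):
    """Closed-form stationary distribution of a two-state chain."""
    p12, p21 = P[0][1], P[1][0]
    pi1 = p21 / (p12 + p21)
    return [pi1, 1.0 - pi1]


def step(dist, P):
    """One step of the chain: row vector times transition matrix."""
    return [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]


P = [[0.8, 0.2], [0.1, 0.9]]
pi = stationary_two_state(P)

dist = [1.0, 0.0]            # start in state 1 with certainty
for _ in range(200):
    dist = step(dist, P)

print(pi, dist)
```

Both results agree, which also illustrates that for this chain the limiting distribution does not depend on the starting state.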

One of the primary interests of travel time uncertainty research is to evaluate the influence of traffic volume. In the HMM framework, we build regression models on the transition probabilities using traffic volume data. Hereafter we use y to denote the observed data and x to denote the covariate.

When the HMM has only two states, the transition matrix can be modeled in the style of logistic regression models. The transition probability matrix is a 2 × 2 matrix. Due to the constraints P11 + P12 = P21 + P22 = 1, the matrix has two free parameters. Chung et al. (2007) discussed a similar model. One logistic regression model with one covariate can be used for each row of the transition matrix:

\[ \log\left(\frac{P_{12}}{P_{11}}\right) = \beta_{0,1} + \beta_{1,1} x \]

\[ \log\left(\frac{P_{22}}{P_{21}}\right) = \beta_{0,2} + \beta_{1,2} x \]

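When the Markov chain has more than two states, a multinomial logistic regression model can be applied. The first column is typically chosen as the baseline. For example, the three-state model is:

\[
\begin{aligned}
\log\left(\frac{P_{12}}{P_{11}}\right) &= \beta_{0,1} + \beta_{1,1} x, &
\log\left(\frac{P_{13}}{P_{11}}\right) &= \beta_{0,2} + \beta_{1,2} x \\
\log\left(\frac{P_{22}}{P_{21}}\right) &= \beta_{0,3} + \beta_{1,3} x, &
\log\left(\frac{P_{23}}{P_{21}}\right) &= \beta_{0,4} + \beta_{1,4} x \\
\log\left(\frac{P_{32}}{P_{31}}\right) &= \beta_{0,5} + \beta_{1,5} x, &
\log\left(\frac{P_{33}}{P_{31}}\right) &= \beta_{0,6} + \beta_{1,6} x
\end{aligned}
\]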
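Turning these log-ratio equations into an actual transition matrix amounts to a softmax over each row with the first column as the baseline. The following sketch shows one way to do this; the function name and all coefficient values are hypothetical and are not estimates from the report.

```python
import math


def transition_row(x, coefs):
    """Row of a transition matrix from baseline-category logits.

    coefs holds one (intercept, slope) pair for each non-baseline column;
    the first column is the baseline with linear predictor 0.
    """
    etas = [0.0] + [b0 + b1 * x for b0, b1 in coefs]
    shift = max(etas)                       # guard against overflow in exp
    weights = [math.exp(eta - shift) for eta in etas]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical coefficients for a three-state chain, one list per row.
coefs_by_row = [
    [(-4.0, 0.10), (-6.0, 0.10)],
    [(1.0, 0.05), (-2.0, 0.10)],
    [(0.5, 0.02), (2.0, 0.05)],
]

x = 10.0
P = [transition_row(x, row_coefs) for row_coefs in coefs_by_row]
for row in P:
    print([round(p, 4) for p in row])
```

With a single (intercept, slope) pair per row, the same function reproduces the two-state logistic case.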

Figure 3.5 illustrates the basic structure of the HMM. Both the historical data (traffic volume and travel time) and the observed data (real-time traffic volume) can be used for prediction.


Figure 3.5: Hidden Markov Model: Flow Chart

3.3.2 Model Estimation

There are several methods to estimate the parameters of the HMM. One of the popular methods is the EM algorithm (Baum et al. 1970, Bilmes 1998). Alternatives include the Viterbi and gradient algorithms (Rabiner 1989). A Bayesian method (Jean-Luc and Chin-Hui 1991) has also been proposed. There are existing software packages specifically for fitting the HMM in R (Visser and Speekenbrink 2010). Although a Bayesian approach to HMM analysis does show some advantages in complex models, Rydén (2008) claimed that the results are generally similar and that it is sufficient to use the EM algorithm in most practical problems.

Define:

\[ L_k(t) = P(s_t = k \mid X) \]

\[ H_{k,l}(t) = P(s_t = k, s_{t+1} = l \mid X) \]

Lk(t) is the conditional probability of being in state k at time t given the entire observed sequence X. Hk,l(t) is the conditional probability of being in state k at time t and in state l at time t + 1 given the entire observed sequence X.


The initial probabilities of state k (k = 1, . . . , M) can be estimated by:

\[ P(s_1 = k) \propto \sum_{t=1}^{T} L_k(t), \qquad \sum_{k=1}^{M} P(s_1 = k) = 1 \]

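The EM algorithm can be described as follows (Li and Gray 2000):

• E step: Compute Lk(t) and Hk,l(t) under the current parameter values.

• M step: Update the parameters.

Mean:

\[ \mu_k = \frac{\sum_{t=1}^{T} L_k(t)\, x_t}{\sum_{t=1}^{T} L_k(t)} \]

Covariance:

\[ \Sigma_k = \frac{\sum_{t=1}^{T} L_k(t)\,(x_t - \mu_k)(x_t - \mu_k)^{T}}{\sum_{t=1}^{T} L_k(t)} \]

Transition probability:

\[ P(s_{t+1} = l \mid s_t = k) = \frac{\sum_{t=1}^{T-1} H_{k,l}(t)}{\sum_{t=1}^{T-1} L_k(t)} \]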
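The M-step formulas are weighted averages, which the following sketch makes explicit for the univariate case (so the covariance reduces to a variance). The function name and the toy inputs are hypothetical; the toy posteriors are hard 0/1 assignments so that the result can be checked by hand, and states are indexed from 0.

```python
import math


def m_step(xs, L, H):
    """One M-step for a univariate Gaussian HMM.

    L[t][k] plays the role of L_k(t) and H[t][k][l] the role of H_{k,l}(t).
    Returns the updated means, standard deviations and transition matrix.
    """
    T, M = len(xs), len(L[0])
    means, sds, P = [], [], []
    for k in range(M):
        weight = sum(L[t][k] for t in range(T))
        mu = sum(L[t][k] * xs[t] for t in range(T)) / weight
        var = sum(L[t][k] * (xs[t] - mu) ** 2 for t in range(T)) / weight
        means.append(mu)
        sds.append(math.sqrt(var))
        visits = sum(L[t][k] for t in range(T - 1))
        P.append([sum(H[t][k][l] for t in range(T - 1)) / visits
                  for l in range(M)])
    return means, sds, P


# Toy input: the first two observations belong to state 0, the last two to state 1.
xs = [1.0, 3.0, 10.0, 12.0]
L = [[1, 0], [1, 0], [0, 1], [0, 1]]
H = [
    [[1, 0], [0, 0]],   # between observations 1 and 2: state 0 -> state 0
    [[0, 1], [0, 0]],   # between observations 2 and 3: state 0 -> state 1
    [[0, 0], [0, 1]],   # between observations 3 and 4: state 1 -> state 1
]

means, sds, P = m_step(xs, L, H)
print(means, sds, P)
```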

The forward-backward algorithm can be used in the E step.

Define:

\[ a_k(x_1, \ldots, x_t) = P(x_1, \ldots, x_t, s_t = k) \]

\[ b_k(x_{t+1}, \ldots, x_T) = P(x_{t+1}, \ldots, x_T \mid s_t = k) \]

The forward algorithm is:

\[ a_k(x_1) = P(s_1 = k)\, f_k(x_1) \]

\[ a_k(x_1, \ldots, x_t) = f_k(x_t) \sum_{i=1}^{M} a_i(x_1, \ldots, x_{t-1})\, p_{ik} \]

where fk is the probability density of component k and pik is the transition probability from state i to state k.

The backward algorithm is:

\[ b_k(x_{T+1}, \ldots, x_T) = 1 \quad \text{(arbitrary setting)} \]

\[ b_k(x_{t+1}, \ldots, x_T) = \sum_{i=1}^{M} p_{ki}\, f_i(x_{t+1})\, b_i(x_{t+2}, \ldots, x_T) \]


Then Lk(t) and Hk,l(t) can be computed as:

\[ L_k(t) = \frac{a_k(x_1, \ldots, x_t)\, b_k(x_{t+1}, \ldots, x_T)}{\sum_{i=1}^{M} a_i(x_1, \ldots, x_t)\, b_i(x_{t+1}, \ldots, x_T)} \]

\[ H_{k,l}(t) = \frac{a_k(x_1, \ldots, x_t)\, p_{kl}\, f_l(x_{t+1})\, b_l(x_{t+2}, \ldots, x_T)}{\sum_{i=1}^{M} \sum_{j=1}^{M} a_i(x_1, \ldots, x_t)\, p_{ij}\, f_j(x_{t+1})\, b_j(x_{t+2}, \ldots, x_T)} \]

The function Lk(t) can be used to estimate the state to which an observation belongs. However, estimating the states one observation at a time can cause unwanted issues. For example, the result may show st = 1 and st+1 = 2 even though p12 = 0, which makes the estimated sequence impossible under the model. The Viterbi algorithm (Viterbi 1967) can be applied to obtain the state sequence with the largest posterior probability.
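The E step can be sketched compactly for normal emission densities. The version below is an illustration under stated assumptions, not the implementation used in the study: it rescales the forward and backward quantities at every step to avoid numerical underflow (a standard device that leaves Lk(t) and Hk,l(t) unchanged), it indexes states and times from 0, and all parameter values in the example are assumed.

```python
import math


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def forward_backward(xs, init, P, means, sds):
    """Return (L, H, loglik) for a Gaussian HMM.

    L[t][k]    = P(s_t = k | X)
    H[t][k][l] = P(s_t = k, s_{t+1} = l | X)
    """
    M, T = len(init), len(xs)
    f = [[normal_pdf(x, means[k], sds[k]) for k in range(M)] for x in xs]

    # Forward pass; each row is normalized and the constant is stored.
    a, scale = [], []
    for t in range(T):
        if t == 0:
            row = [init[k] * f[0][k] for k in range(M)]
        else:
            row = [f[t][k] * sum(a[t - 1][i] * P[i][k] for i in range(M))
                   for k in range(M)]
        c = sum(row)
        a.append([v / c for v in row])
        scale.append(c)

    # Backward pass, rescaled with the same constants.
    b = [[1.0] * M for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for k in range(M):
            b[t][k] = sum(P[k][i] * f[t + 1][i] * b[t + 1][i]
                          for i in range(M)) / scale[t + 1]

    L = [[a[t][k] * b[t][k] for k in range(M)] for t in range(T)]
    H = [[[a[t][k] * P[k][l] * f[t + 1][l] * b[t + 1][l] / scale[t + 1]
           for l in range(M)] for k in range(M)] for t in range(T - 1)]
    loglik = sum(math.log(c) for c in scale)
    return L, H, loglik


# Assumed example: state 0 around 500 s, state 1 around 1000 s.
xs = [480.0, 510.0, 990.0, 1100.0, 520.0]
L, H, loglik = forward_backward(xs, [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]],
                                [500.0, 1000.0], [30.0, 100.0])
print([[round(p, 3) for p in row] for row in L])
```

The outputs `L` and `H` are exactly the quantities required by the M step, so alternating this function with the M-step updates gives one possible EM loop.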

3.3.3 Bootstrap and Confidence Interval

The confidence interval is a focus of classical statistical inference. Visser et al. (2000) proposed several ways to obtain confidence intervals for hidden Markov models: a finite-difference approximation of the Hessian, profile likelihood, and the bootstrap. They claimed that the intervals from the first method are usually too narrow. Therefore, we evaluated the other two methods.

The profile likelihood method is based on the profile likelihood ratio and the χ² distribution. The basic idea is to evaluate the change of the log-likelihood caused by a single parameter while treating all the other parameters as nuisance parameters (Meeker and Escobar 1995). The profile likelihood for a parameter β is the likelihood maximized over all the other parameters δ for each fixed value of β:

\[ PL(\beta) = \max_{\delta} L(\beta; \delta) \]

Suppose the MLE of β is β̂. It can be shown that:

\[ -2\left(\log PL(\beta) - \log PL(\hat{\beta})\right) \sim \chi^2(1) \quad \text{asymptotically.} \]

Based on the χ²(1) distribution, the lower and upper bounds of the confidence interval can be derived easily. Figure 3.6 is an intuitive illustration, where Bm is the MLE while Bu and Bl are the upper and lower bounds.


Figure 3.6: Confidence Interval by Profile Likelihood

The bootstrap is a popular technique for obtaining confidence intervals (Efron and Tibshirani 1994). However, a naive resampling method is not appropriate for the hidden Markov model because it breaks the dependency structure of the original data. The parametric bootstrap can be applied to handle this issue. There are generally three ways to do a parametric bootstrap with a hidden Markov model.

1. Based on parameter estimation.

2. Based on original data.

3. Mixture of 1 and 2.

For the first approach, the parameters are estimated from the original data and a new data set is simulated solely based on the parameter estimates. The basic assumption of this method is that the model is correctly specified.

For the second approach, the residuals are collected after model fitting:

\[ r_i = y_i - \hat{y}_i \]

Sampling with replacement is done within the set of residuals and new observations are generated as follows:

\[ y_i^{new} = \hat{y}_i + r_i^{new}, \qquad r_i^{new} \in \{r_1, r_2, \ldots, r_N\} \]

The sample size of original data should be sufficiently large.


For the third approach, we assume that the random errors are i.i.d. normal:

\[ y_i^{new} = \hat{y}_i + r_i^{new}, \qquad r_i^{new} \sim N(0, \sigma^2) \]

However, it is worth noting that the variance of the error terms might contain substantial heterogeneity. For example, suppose there are two states in a hidden Markov model. It is highly possible that the distributions in the two states have different variance structures. Therefore, adjustments must be applied for methods 2 and 3 (Bandeen-Roche et al. 1997, Wang et al. 2005):

1. Assign each observation to a group based on posterior probability.

2. Within each group, do the resampling of residuals.

3. Repeat 1 and 2.

By bootstrapping, new data sets can be generated and the parameters re-estimated on each of them. After that, there are two ways to generate a 95% confidence interval for a parameter β: either as β̂ ± 1.96 ∗ σ̂β, where σ̂β is the standard deviation of the bootstrap estimates, or from the empirical 2.5% and 97.5% quantiles of the bootstrap estimates. The results should be close for sufficiently large data sets.
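The two interval constructions can be compared on a deliberately simple stand-in model. The sketch below applies the first bootstrap approach (simulation from the fitted parameters) to a single normal distribution instead of a full HMM, which keeps the refitting step to one line; the sample size, the number of replicates and the parameter values are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_intervals(boot_estimates, point_estimate):
    """Normal-approximation and percentile 95% intervals."""
    se = statistics.stdev(boot_estimates)
    normal_ci = (point_estimate - 1.96 * se, point_estimate + 1.96 * se)
    cuts = statistics.quantiles(boot_estimates, n=40)  # 2.5%, 5%, ..., 97.5%
    percentile_ci = (cuts[0], cuts[-1])
    return normal_ci, percentile_ci


rng = random.Random(0)

# "Original" data and the fitted parameters of the stand-in model.
data = [rng.gauss(580.0, 41.0) for _ in range(400)]
mu_hat = sum(data) / len(data)
sd_hat = statistics.stdev(data)

# Parametric bootstrap: simulate from the fitted model and re-estimate.
boot = []
for _ in range(1000):
    resample = [rng.gauss(mu_hat, sd_hat) for _ in range(len(data))]
    boot.append(sum(resample) / len(resample))

normal_ci, percentile_ci = bootstrap_intervals(boot, mu_hat)
print(normal_ci, percentile_ci)
```

For an HMM the only change in structure is that each replicate is simulated from the fitted chain and emission distributions and then refitted, for example by EM.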

3.3.4 Determine the Number of Components

A challenging issue in the hidden Markov model is to choose the proper number of components. Several criteria and procedures have been proposed. In this section, we present three general approaches to address this issue: the likelihood ratio test, criteria-based model selection, and cross-validation.

The likelihood ratio test (Neyman and Pearson, 1933) is well known as an efficient way of model selection. Under certain regularity conditions, the log likelihood ratio under the null hypothesis can be tested through the χ² test. However, it has been shown that two mixture distributions with different numbers of components do not satisfy those regularity conditions (Wolfe, 1971). Wolfe claimed that a modified version of the likelihood ratio test could possibly be applied:

H0 : n = c0

H1 : n = c1, (c1 > c0)

\[ -\frac{2}{N}\left(N - 1 - d - \frac{c_1}{2}\right)\left(\log L(c_0) - \log L(c_1)\right) \sim \chi^2\left(\frac{2d}{c_1 - c_0}\right) \]

where N is the sample size, n is the number of components, and d = c1 − c0. The assumption that c1 > c0 is based on the statistical version of "Occam's razor": we always prefer a simple model that might work. Unless there is strong evidence that a more complicated model is significantly better, we will stick to the simple model. Wolfe's approach provides only a rough approximation but was easy to implement in the years when computing resources were limited.

McLachlan (1987) proposed that the bootstrap can be applied to obtain the approximate distribution of the log likelihood ratio test statistic under the null hypothesis. The idea is to generate random samples from a mixture distribution with c0 components, calculate the log likelihood ratio test statistic for each sample, and then establish the empirical distribution of these test statistics. The resulting empirical distribution can be used to calculate the p-value for the original data.

The criteria-based model selection method has also gained popularity for assessing the number of components in mixture models. The likelihood function, interpreted as a measure of goodness of fit, cannot be used by itself as a criterion to select the number of components in a mixture model because of its tendency to choose more complicated models (Biernacki et al. 2000). A number of criteria have been discussed to handle this issue. Usually these criteria add a penalty term to the likelihood to represent the trade-off between model complexity and utility, for example, the AIC criterion (Akaike 1974). Hurvich and Tsai (1989) suggested the use of AICc (corrected AIC) instead of AIC, since AIC tends to overfit the data. The AICc is defined as:

\[ \mathrm{AICc} = -2 \log L + 2k\left(1 + \frac{k + 1}{N - k - 1}\right) \]

where k is the number of parameters.

Another popular criterion is BIC (Schwarz 1978):

BIC = −2 ∗ logL+ k ∗ log(N)

BIC generally imposes a larger penalty on model complexity than AIC, and it has been shown that BIC is equivalent to the Minimum Description Length (MDL) criterion (Rissanen 1978).
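Both criteria are simple functions of the maximized log-likelihood, the number of parameters k, and the sample size N. The sketch below evaluates them for two candidate models; the log-likelihoods and parameter counts are hypothetical values chosen only to show the mechanics, and in this orientation smaller values are preferred.

```python
import math


def aicc(loglik, k, n):
    """Corrected AIC: -2 log L + 2k (1 + (k + 1) / (N - k - 1))."""
    return -2.0 * loglik + 2.0 * k * (1.0 + (k + 1.0) / (n - k - 1.0))


def bic(loglik, k, n):
    """BIC: -2 log L + k log(N)."""
    return -2.0 * loglik + k * math.log(n)


n = 5000
# Hypothetical (log-likelihood, number of parameters) for two candidates.
candidates = {"two-state": (-31250.0, 6), "three-state": (-31100.0, 12)}

for name, (loglik, k) in candidates.items():
    print(name, round(aicc(loglik, k, n), 1), round(bic(loglik, k, n), 1))
```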

Another useful criterion is Minimum Message Length (MML). MML (Wallace and Boulton 1968) was derived from the perspective of information theory. The process of modeling is considered as encoding the data, and the model parameters can be considered as the extra cost of the encoding.

Therefore, the length of an encoded message can be described as:

Length(θ, Y ) = Length(θ) + Length(Y |θ)

If Length(θ) is short, then the model is simple but correspondingly Length(Y |θ) will be long.

The cross-validation approach was proposed by Celeux and Durand (2008). It is based on half-sampling. Celeux and Durand showed that if we pick only the odd-numbered or only the even-numbered observations from a data set generated by a hidden Markov model, the result is still a hidden Markov chain. Therefore, we may simply use the "odd subset" of the original sample to fit the model and calculate the likelihood of the "even subset" of the original sample. This likelihood can be used as a criterion for model selection.

3.3.5 Goodness of Fit

Assessing the goodness of fit of a given hidden Markov model is an important topic. As in regular regression models, the residuals can be used. However, due to the inherent heterogeneity of the hidden Markov model, the residuals must be adjusted by classes, as we have seen in the previous section. Wang et al. (2005) showed that the class-adjusted residuals are asymptotically equivalent to the distributions of residuals from the latent classes. Zucchini and MacDonald (2009) proposed a different approach using the pseudo-residual. The pseudo-residual is defined as the probability of seeing a less extreme response than observed given all observations except that at time t:

\[ u_t = P(Y_t \le y_t \mid y_i, \forall i \ne t) \]

For well-fitted models, the pseudo-residuals should be approximately Uniform[0, 1] distributed.

MacKay Altman (2004) provides an intuitive graphical approach similar to the Q-Q plot. By plotting the estimated distribution against the empirical distribution, the lack of fit can be detected with high probability for a large sample size. The estimated distribution is given by:

\[ F(y \mid \theta) = \sum_{i=1}^{K} \pi_i F_i(y \mid \theta_i) \]

If the model is correctly specified, the plot of the empirical against the estimated distribution should be close to a straight line.
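The comparison behind this plot can be sketched as follows for normal components. The weights πi are treated here as known state proportions, the data are drawn independently from the same mixture, and all numerical values are assumed for illustration; with real data, the fitted parameters and the observed travel times would take their place.

```python
import math
import random


def normal_cdf(y, mu, sd):
    return 0.5 * (1.0 + math.erf((y - mu) / (sd * math.sqrt(2.0))))


def mixture_cdf(y, weights, means, sds):
    """Estimated distribution F(y) = sum_i pi_i * F_i(y)."""
    return sum(w * normal_cdf(y, m, s) for w, m, s in zip(weights, means, sds))


def empirical_cdf(sample, y):
    return sum(1 for v in sample if v <= y) / len(sample)


weights, means, sds = [0.9, 0.1], [580.0, 1035.0], [41.0, 371.0]

rng = random.Random(0)
sample = []
for _ in range(5000):
    k = 0 if rng.random() < weights[0] else 1
    sample.append(rng.gauss(means[k], sds[k]))

# Pairs (estimated, empirical); under a correct model they lie near the diagonal.
grid = [500.0, 580.0, 650.0, 900.0, 1400.0]
pairs = [(mixture_cdf(y, weights, means, sds), empirical_cdf(sample, y)) for y in grid]
for estimated, empirical in pairs:
    print(round(estimated, 3), round(empirical, 3))
```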

3.3.6 Prediction

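Predicting the future travel time from historical data in a hidden Markov model can be conducted according to the following Markov property.

Based on the model specification,

\[ f(y_t \mid s_t) = \begin{cases} N(\mu_1, \sigma_1^2), & \text{if } s_t = 1 \\ N(\mu_2, \sigma_2^2), & \text{if } s_t = 2 \end{cases} \]

we have:

\[ E(y_t \mid s_t) = \begin{cases} \mu_1, & \text{if } s_t = 1 \\ \mu_2, & \text{if } s_t = 2 \end{cases} \]

If the model contains a regression in the mean parameter,

\[ f(y_t \mid s_t, x_t) = \begin{cases} N(\mu_1, \sigma_1^2), & \text{if } s_t = 1 \\ N(\theta_0 + \theta_1 x_t, \sigma_2^2), & \text{if } s_t = 2 \end{cases} \]

we have:

\[ E(y_t \mid s_t, x_t) = \begin{cases} \mu_1, & \text{if } s_t = 1 \\ \theta_0 + \theta_1 x_t, & \text{if } s_t = 2 \end{cases} \]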

Given the states, the expected value of yt can be used as the predicted value of travel time. However, the state is unobservable, so our prediction is actually the expected future travel time, E(yt|y1, ..., yt−1, x1, ..., xt−1). The key to this problem is to predict st using historical data. Assume the initial distribution of the Markov chain st is the row vector:

\[ A = \begin{pmatrix} P(s_0 = 1) & P(s_0 = 2) \end{pmatrix} = \begin{pmatrix} p_0 & 1 - p_0 \end{pmatrix} \]

The transition matrix is:

\[ T = \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix} \]

Then the distribution of s1 is:

\[ A\,T \]

The distribution of s2 is:

\[ A\,T^2 \]

and so on.

The marginal distribution of st at any time t can be computed through the Markov property. The transition matrix is estimated from the data. The initial distribution can be either set manually or estimated from the last observed travel time in the previous time period.

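If the transition matrix is modeled through regression:

\[ \log\left(\frac{P_{12}}{P_{11}}\right) = \beta_{0,1} + \beta_{1,1} x \]

\[ \log\left(\frac{P_{22}}{P_{21}}\right) = \beta_{0,2} + \beta_{1,2} x \]

then, using P11 + P12 = 1 and P21 + P22 = 1, it is straightforward to show that:

\[ P_{11} = \frac{1}{\exp(\beta_{0,1} + \beta_{1,1} x) + 1}, \qquad P_{12} = \frac{\exp(\beta_{0,1} + \beta_{1,1} x)}{\exp(\beta_{0,1} + \beta_{1,1} x) + 1} \]

\[ P_{21} = \frac{1}{\exp(\beta_{0,2} + \beta_{1,2} x) + 1}, \qquad P_{22} = \frac{\exp(\beta_{0,2} + \beta_{1,2} x)}{\exp(\beta_{0,2} + \beta_{1,2} x) + 1} \]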


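Consider a simple example. Suppose that during the time interval [7:00-7:59], the traffic volume is 8, and the transition probabilities are modeled as:

\[ \log\left(\frac{P_{12}}{P_{11}}\right) = -6 + 0.1\,x \]

\[ \log\left(\frac{P_{22}}{P_{21}}\right) = 0.6 + 0.15\,x \]

At x = 8 the two log-odds are −5.2 and 1.8, so the transition matrix is:

\[ T(8) = \begin{pmatrix} 0.9945 & 0.0055 \\ 0.1419 & 0.8581 \end{pmatrix} \]

Since the state of the first vehicle is unknown, we might use the non-informative prior:

\[ A = \begin{pmatrix} 0.5 & 0.5 \end{pmatrix} \]

To predict the state of the first vehicle in the next time interval, i.e. s8, it can be shown that:

\[ A\,T(8)^7 = \begin{pmatrix} 0.811 & 0.189 \end{pmatrix} \]

That is, y8 has an 81.1% probability of being in the free-flow state, so free-flow is the more likely state and µ1 can be used as a predicted value.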
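The propagation step of this example can be sketched in a few lines. The code below builds T(x) directly from the two log-odds equations of the example and applies it seven times to the initial distribution. The state means used for the expected travel time are assumed values for illustration, and the function names are hypothetical.

```python
import math


def transition_matrix(x, b01=-6.0, b11=0.10, b02=0.6, b12=0.15):
    """Two-state transition matrix from the log-odds regressions."""
    eta1 = b01 + b11 * x                   # log(P12 / P11)
    eta2 = b02 + b12 * x                   # log(P22 / P21)
    p12 = 1.0 / (1.0 + math.exp(-eta1))
    p22 = 1.0 / (1.0 + math.exp(-eta2))
    return [[1.0 - p12, p12], [1.0 - p22, p22]]


def propagate(dist, T, steps):
    """Multiply the row vector dist by T the given number of times."""
    for _ in range(steps):
        dist = [dist[0] * T[0][j] + dist[1] * T[1][j] for j in range(2)]
    return dist


T8 = transition_matrix(8.0)
state_probs = propagate([0.5, 0.5], T8, 7)

mu1, mu2 = 500.0, 1000.0                   # assumed state means in seconds
expected_time = state_probs[0] * mu1 + state_probs[1] * mu2

print([[round(p, 4) for p in row] for row in T8])
print([round(p, 3) for p in state_probs], round(expected_time, 1))
```

Weighting the state means by the predicted state probabilities gives the expected travel time, which is an alternative to reporting the mean of the most likely state only.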

Figure 3.7 is an overview of the prediction procedures in the hidden Markov model.

Figure 3.7: Hidden Markov Model: Estimation


3.4 Simulation Study

3.4.1 No Covariate

First, consider a simple case in which the covariate is not present in the model. We simulate 1,000 data sets, each with 5,000 observations, according to the transition matrix (values are based on estimates from real data):

\[ \begin{pmatrix} 0.992 & 0.008 \\ 0.068 & 0.932 \end{pmatrix} \]

Each data set is fitted by both the traditional mixture model and the hidden Markov model. As can be seen from Table 3.1, the confidence interval estimates of the HMM are slightly narrower.

Table 3.1: HMM vs. Traditional: No Covariate 1

Name   True   Traditional   95% C.I.          HMM      95% C.I.
µ1     580    579.9         (578.4, 581.2)    579.9    (578.4, 581.1)
µ2     1035   1037.1        (989.8, 1067.1)   1036.6   (1001.2, 1062.5)
σ1     41     40.9          (40.0, 42.1)      40.9     (40.1, 42.1)
σ2     371    369.8         (348.4, 389.0)    370.1    (350.3, 388.9)


Figure 3.8: HMM vs. Traditional 1

Figure 3.8 indicates that the log-likelihoods of the HMMs are larger than those of the traditional models. The mean difference in log-likelihood is around 951, which is a substantial difference.

We also tried another set of parameters and Table 3.2 also implies that HMM can generateslightly narrower confidence intervals.

Table 3.2: HMM vs. Traditional: No Covariate 2

Name   True   Traditional   95% C.I.         HMM     95% C.I.
µ1     580    579.9         (578.7, 581.0)   579.8   (578.8, 581.2)
µ2     750    751.6         (708.1, 793.4)   751.2   (710.1, 788.0)
σ1     41     41.0          (40.1, 42.1)     41.0    (40.0, 41.9)
σ2     371    369.2         (341.3, 396.3)   369.2   (340.6, 394.0)


3.4.2 With Covariate

When the covariate is considered in the model, the simulation also indicated that the HMM is superior to the traditional mixture model. We simulated 500 data sets, each with 5,000 observations, according to the following parameter settings (values are based on estimates from real data):

\[ \log(y) \sim \begin{cases} N(\log(500), \sigma_1 = 0.07), & \text{if } s_t = 1 \\ N(\log(1000), \sigma_2 = 0.31), & \text{if } s_t = 2 \end{cases} \]

\[ \log\left(\frac{P_{12}}{P_{11}}\right) = -6 + 0.1\,x \]

\[ \log\left(\frac{P_{22}}{P_{21}}\right) = 0.6 + 0.15\,x \]

Due to computing issues, we use the log transform of the original data, and the log-likelihood values change accordingly.

Figure 3.9 indicates that the log-likelihoods of the HMMs are larger than those of the traditional models. The mean difference in log-likelihood is around 997, which is a substantial difference.



Figures 3.10 and 3.11 clearly demonstrate the advantage of the HMM. Both the mean estimates and the variance estimates from the HMM are superior.



Figure 3.12 illustrates the 95% confidence intervals of several parameter estimates of the hidden Markov model. The estimates from different samples are relatively symmetric and centered at the true values.


Figure 3.12: 95% C.I. of HMM

Table 3.3 provides the numbers in Figure 3.12.


Table 3.3: Parameter Estimation of HMM

Name   True   95% C.I.
µ1     500    (499, 501)
µ2     1000   (977.7, 1026.7)
σ1     0.07   (0.070, 0.071)
σ2     0.3    (0.28, 0.32)
β0,1   -6     (-7.04, -5.07)
β1,1   0.1    (0.03, 0.16)
β0,2   0.6    (-0.56, 1.52)
β1,2   0.15   (0.09, 0.23)

3.5 Application for Field-Collected Data

The proposed model was applied to the data collected on I-35 near San Antonio, Texas. The actual travel time of each vehicle was measured when vehicles equipped with radio frequency tags passed automatic vehicle identification (AVI) stations. The AVI data collection approach yields accurate travel times but only measures the travel time of vehicles equipped with electronic radio frequency equipment. Therefore, the collected data represent the actual traffic flow up to a scale factor. Figure 3.13 illustrates the sampling scheme; the observations can be considered proportional to the actual traffic flow. In order to predict the travel time with higher precision, the sampling rate could be scaled up (Figure 3.14), but the basic modeling steps are identical.

Figure 3.13: Illustration of Low Sampling Rate

Figure 3.14: Illustration of Potential Improvement

The number of hidden states in real data can be determined through the likelihood ratio test. We first evaluate whether two states are sufficient to describe the hidden structure in the data. Three or more states are considered if the two-state model does not provide a sufficient fit:

H0 : n = 2

H1 : n = 3

Since the log likelihood ratio does not follow a χ2 distribution, the bootstrap sampling method was adopted. Figure 3.15 shows the histogram of the log likelihood ratio from 500 samples.

Figure 3.15: Histogram of the Log Likelihood Ratio

The observed log likelihood ratio is around 577.2, with a p-value smaller than 0.05. Therefore, we reject the null hypothesis.
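The bootstrap procedure can be sketched generically as follows. This is an illustration under stated assumptions, not the authors' implementation: the helper `bootstrap_lrt` and its callable arguments are hypothetical, the statistic is taken as twice the difference in maximized log-likelihoods, and the toy example tests a normal mean (H0: mean 0 against a free mean) instead of refitting two-state and three-state HMMs, which would require a full HMM fitting routine.

```python
import math
import random


def bootstrap_lrt(data, fit_null, fit_alt, simulate_null, n_boot=500, seed=0):
    """Parametric bootstrap for a likelihood ratio test.

    fit_null(data) and fit_alt(data) return (params, maximized log-likelihood).
    simulate_null(params, n, rng) draws n observations from the fitted H0 model.
    """
    rng = random.Random(seed)
    null_params, ll_null = fit_null(data)
    _, ll_alt = fit_alt(data)
    observed = 2.0 * (ll_alt - ll_null)
    boot = []
    for _ in range(n_boot):
        sample = simulate_null(null_params, len(data), rng)
        _, b_null = fit_null(sample)
        _, b_alt = fit_alt(sample)
        boot.append(2.0 * (b_alt - b_null))
    # Share of bootstrap statistics at least as large as the observed one;
    # the +1 terms count the observed sample itself.
    exceed = sum(1 for s in boot if s >= observed)
    return observed, boot, (exceed + 1) / (n_boot + 1)


# Toy stand-in models (assumptions): unit-variance normal, H0 fixes the mean at 0.
def normal_ll(data, mu):
    return sum(-0.5 * math.log(2.0 * math.pi) - 0.5 * (v - mu) ** 2 for v in data)


def fit_null(data):
    return 0.0, normal_ll(data, 0.0)


def fit_alt(data):
    mean = sum(data) / len(data)
    return mean, normal_ll(data, mean)


def simulate_null(mu, n, rng):
    return [rng.gauss(mu, 1.0) for _ in range(n)]


data_rng = random.Random(42)
data = [data_rng.gauss(1.0, 1.0) for _ in range(50)]  # hypothetical data
observed, boot, p_value = bootstrap_lrt(
    data, fit_null, fit_alt, simulate_null, n_boot=200
)
```

For the test in the text, `fit_null` and `fit_alt` would fit the two-state and three-state models, and `boot` would correspond to the 500 values summarized in Figure 3.15.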

We further tested whether the empirical distribution follows χ2. The Kolmogorov-Smirnov test was used to test the null hypothesis that the empirical distribution follows χ2 with several alternative degrees of freedom. The results are shown in Table 3.4.

Table 3.4: Kolmogorov-Smirnov Test Result

| Degrees of Freedom | P-value |
|--------------------|---------|
| 4 | 1.598e-10 |
| 5 | 0.03336 |
| 6 | 1.788e-10 |
| 7 | < 2.2e-16 |
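A comparison of this kind can be sketched with the standard library alone. The code below is an illustration, not the procedure used to produce Table 3.4: the sample is a hypothetical draw from χ2(5) standing in for the bootstrap log likelihood ratios, the helper names are assumptions, and the p-value uses the asymptotic Kolmogorov series, which is reasonable only for large samples.

```python
import math
import random


def chi2_cdf(x, df):
    """Chi-square CDF for a positive integer df, via a recurrence on df."""
    if x <= 0.0:
        return 0.0
    half = x / 2.0
    if df % 2 == 0:
        cdf, k = 1.0 - math.exp(-half), 2  # closed form for df = 2
    else:
        cdf, k = math.erf(math.sqrt(half)), 1  # closed form for df = 1
    while k < df:
        # F(x; k + 2) = F(x; k) - (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        cdf -= half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return cdf


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_p_value(d, n, terms=100):
    """Asymptotic Kolmogorov p-value for statistic d and sample size n."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 1.0  # the p-value is numerically 1 in this range
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return max(0.0, min(1.0, 2.0 * total))


# Hypothetical sample: 500 draws from chi-square(5), built from normals.
rng = random.Random(0)
sample = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(5)) for _ in range(500)]

results = {}
for df in (4, 5, 6, 7):
    d = ks_statistic(sample, lambda v, df=df: chi2_cdf(v, df))
    results[df] = (d, ks_p_value(d, len(sample)))
```

Each entry of `results` holds the KS statistic and its approximate p-value for one candidate number of degrees of freedom, in the same layout as Table 3.4.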

The results in the table above indicate that none of these χ2 distributions can be considered a good approximation of the empirical distribution. Compared to χ2(5), the most similar alternative, the empirical distribution has heavier mass on the left (heavier tail) (Figure 3.16).

Figure 3.16: χ2 and Empirical Distributions

The BIC values for the two-component and three-component models are -8572.2 and -9090.8, respectively. This also implies that the three-component model is superior in fitting the data. For the Message Length criterion, the values are -4280.6 and -4548.8. We also tried AICc and obtained similar results. Based on the half-sampling cross validation, the log likelihoods are 2066.8 and 2178.6. In sum, all these model comparison metrics indicate that the three-component model fits the data better.
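For reference, the usual definitions of AIC, AICc, and BIC (smaller is better) can be written as below. The log-likelihoods, parameter counts, and sample size in the usage lines are hypothetical placeholders and do not reproduce the values reported above, since the report does not give the inputs in this section.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for k free parameters."""
    return -2.0 * loglik + 2.0 * k


def aicc(loglik, k, n):
    """Small-sample corrected AIC for n observations."""
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)


def bic(loglik, k, n):
    """Bayesian information criterion for n observations."""
    return -2.0 * loglik + k * math.log(n)


# Hypothetical inputs: maximized log-likelihoods, parameter counts, sample size.
n_obs = 4000
candidates = {
    "two-state": {"loglik": 4300.0, "k": 10},
    "three-state": {"loglik": 4600.0, "k": 21},
}
scores = {
    name: {
        "AIC": aic(m["loglik"], m["k"]),
        "AICc": aicc(m["loglik"], m["k"], n_obs),
        "BIC": bic(m["loglik"], m["k"], n_obs),
    }
    for name, m in candidates.items()
}
best_by_bic = min(scores, key=lambda name: scores[name]["BIC"])
```

The extra parameters of the larger model are penalized by `k * log(n)`, so it is preferred only when its gain in log-likelihood outweighs that penalty.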

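The three-state model specification is as follows:

$$
\log\left(\frac{P_{12}}{P_{11}}\right) = \beta_{0,1} + \beta_{1,1}x, \qquad \log\left(\frac{P_{13}}{P_{11}}\right) = \beta_{0,2} + \beta_{1,2}x
$$

$$
\log\left(\frac{P_{22}}{P_{21}}\right) = \beta_{0,3} + \beta_{1,3}x, \qquad \log\left(\frac{P_{23}}{P_{21}}\right) = \beta_{0,4} + \beta_{1,4}x
$$

$$
\log\left(\frac{P_{32}}{P_{31}}\right) = \beta_{0,5} + \beta_{1,5}x, \qquad \log\left(\frac{P_{33}}{P_{31}}\right) = \beta_{0,6} + \beta_{1,6}x
$$

$$
f(y_t \mid s_t) = \begin{cases} N(\mu_1, \sigma_1^2), & \text{if } s_t = 1 \\ N(\mu_2, \sigma_2^2), & \text{if } s_t = 2 \\ N(\mu_3, \sigma_3^2), & \text{if } s_t = 3 \end{cases}
$$

$$
\mu_i = \theta_{0,i} + \theta_{1,i}x
$$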

Table 3.5 shows the parameter estimation for the three component model.

Table 3.5: Parameter Estimation for Real Data

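Transition regression coefficients:

| i | β0,i | β1,i |
|---|------|------|
| 1 | -4.89 | 0.056 |
| 2 | -5.37 | 0.383 |
| 3 | 1.09 | 0.063 |
| 4 | -2.24 | 0.10 |
| 5 | -1.3 | 0.25 |
| 6 | 0.46 | 0.33 |

State distribution parameters:

| State i | θ0,i | θ1,i | σi |
|---------|------|------|----|
| 1 | 6.35 | 0* | 0.066 |
| 2 | 6.46 | 0.005 | 0.092 |
| 3 | 7.03 | 0* | 0.27 |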

* Note: 0 means not significant.

The results indicate that when the traffic volume is higher, the free-flow state is more likely to move to the congested state and the congested state is more likely to persist. It is worth noting that there is a medium state between free-flow and congested. The average value of P31 is around $10^{-14}$, so it is very unlikely to move directly from the congested state to the free-flow state. Most of the time the chain passes through the medium state as an “interim period,” as shown in Figure 3.17.
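To make the direction of these effects concrete, the sketch below converts the point estimates in Table 3.5 into a transition matrix for a given traffic volume x. In each row, state 1 is the reference category of the logit specification. The function name and the volume values used in the usage lines are assumptions; the units and observed range of x are not given in this section.

```python
import math

# Point estimates from Table 3.5: (intercept, slope) for log(P_ij / P_i1).
BETA = {
    (1, 2): (-4.89, 0.056), (1, 3): (-5.37, 0.383),
    (2, 2): (1.09, 0.063), (2, 3): (-2.24, 0.10),
    (3, 2): (-1.3, 0.25), (3, 3): (0.46, 0.33),
}


def transition_matrix(x):
    """Return the 3x3 transition matrix implied by the logit model at volume x."""
    rows = []
    for i in (1, 2, 3):
        # State 1 is the reference category, so its linear predictor is 0.
        eta = [0.0] + [BETA[(i, j)][0] + BETA[(i, j)][1] * x for j in (2, 3)]
        top = max(eta)  # subtract the maximum for numerical stability
        weights = [math.exp(e - top) for e in eta]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


low = transition_matrix(0.0)    # hypothetical low volume
high = transition_matrix(20.0)  # hypothetical high volume
# low[0][2] and high[0][2] are P13 at the two volumes;
# low[2][2] and high[2][2] are P33.
```

With these estimates, both P13 (free-flow to congested) and P33 (congested to congested) grow as x increases, which matches the interpretation in the text.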

Figure 3.17: Illustration of Three States Markov Chain

The marginal distribution of the three states, estimated by the Viterbi algorithm, is: free-flow 84.4%, medium 8.2%, congested 7.4%.
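The decoding step can be illustrated with a compact Viterbi implementation for Gaussian emissions. This sketch makes several simplifying assumptions that differ from the fitted model: the transition matrix is fixed instead of depending on traffic volume, the state means use only the intercepts θ0,i from Table 3.5, and the observations, initial distribution, and transition probabilities are hypothetical. States are indexed 0 (free-flow), 1 (medium), and 2 (congested).

```python
import math
from collections import Counter


def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2.0 * sigma ** 2)


def viterbi(obs, init, trans, mus, sigmas):
    """Most likely state path for a Gaussian HMM with a fixed transition matrix."""
    n_states = len(init)
    score = [
        math.log(init[k]) + normal_logpdf(obs[0], mus[k], sigmas[k])
        for k in range(n_states)
    ]
    back = []
    for y in obs[1:]:
        prev = score
        pointers, score = [], []
        for k in range(n_states):
            # Best predecessor of state k at this time step.
            best = max(range(n_states), key=lambda j: prev[j] + math.log(trans[j][k]))
            pointers.append(best)
            score.append(
                prev[best] + math.log(trans[best][k]) + normal_logpdf(y, mus[k], sigmas[k])
            )
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


mus = [6.35, 6.46, 7.03]        # intercepts from Table 3.5 (log scale)
sigmas = [0.066, 0.092, 0.27]   # standard deviations from Table 3.5
init = [0.8, 0.1, 0.1]          # hypothetical initial distribution
trans = [                       # hypothetical fixed transition matrix
    [0.95, 0.04, 0.01],
    [0.15, 0.75, 0.10],
    [0.001, 0.199, 0.80],
]
obs = [6.34, 6.36, 6.33, 6.50, 6.48, 7.20, 7.00, 6.90, 6.45, 6.35]  # hypothetical

path = viterbi(obs, init, trans, mus, sigmas)
counts = Counter(path)
shares = {k: counts.get(k, 0) / len(path) for k in range(3)}
```

The `shares` dictionary gives the proportion of time steps assigned to each state, which is the kind of summary reported above.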

The pseudo class-adjusted residual plots indicate that the free-flow and medium states are very close to the normal distribution, while the congested state is slightly skewed. The standardized residuals plot implies that the residuals are generally within the range (−3, 3) and do not increase significantly over time.

Figure 3.18: Residual Check

3.6 Summary

In this project, we applied the hidden Markov model to the travel time reliability problem in order to accommodate the dependency structure of observations and to understand how traffic volume influences the travel time of vehicles.

We focused on the model with two possible states of the hidden Markov chain, “free-flow” and “congested.” The parameters and proportions of the two states were estimated. Moreover, we applied the well-known logit function in the transition matrix to include the covariate of traffic volume. The modeling result shows that traffic volume has a positive effect on the proportion of the “congested” condition as well as on the mean parameter of that condition.

We compared the fit of the hidden Markov model with that of an ordinary mixture model and documented a significant improvement. To sum up, the hidden Markov model is superior for interpreting the data without sacrificing model simplicity.

Chapter 4

Summary

The multi-state model provides a flexible and efficient framework for modeling travel time reliability, especially under complex traffic conditions. Guo et al. (2012) noted that the multi-state model outperforms single-state models in congested or near-congested traffic conditions, and that the advantage is substantial under high traffic volume.

The objective of this study was to quantitatively evaluate the influence of traffic volume on the mixture of two component distributions. Our work advanced the multi-state models by proposing regressions on the proportions and distribution parameters for the underlying traffic states. The Bayesian analysis also provides feasible credible intervals for each parameter without asymptotic assumptions.

Previous studies usually modeled travel time alone, without establishing the relationship between travel time and important transportation statistics such as traffic volume. Our model can also be easily extended to include more covariates in either linear or nonlinear forms.

The modeling results indicated a negative relationship between the proportion of the free-flow state and traffic volume, which confirms the statement by Guo et al. (2012) that under low traffic volume there might exist only one travel time state, so that single-state models are sufficient. The estimation for the congested state indicates that the travel time under such conditions exhibits substantial variability and is positively related to traffic volume, which also agrees with the phenomenon found by Guo et al. (2012).

We applied the hidden Markov model to the travel time reliability problem in order to accommodate the dependency structure of observations and to understand how traffic volume influences the travel time of vehicles.

Regarding the model specification, we considered two possible states of the hidden Markov chain: the “free-flow” and “congested” states. The parameters and proportions of the two states were estimated. Moreover, we applied the logit function in the transition matrix to include the covariate of traffic volume. The modeling result shows that traffic volume has a positive effect on the proportion of the congested condition as well as on the mean parameter of that condition.

We compared the fit of the hidden Markov model with that of an ordinary mixture model and documented a significant improvement. We also summarized several issues in this model, such as the technique used to determine the number of components. To sum up, the hidden Markov model proved superior for interpreting the data without sacrificing model simplicity.

Bibliography

[1] Akaike, H. (1974). A new look at the statistical model identification. Automatic Control, IEEE Transactions on, 19(6):716–723.

[2] Albert, P. S. (1991). A two-state Markov mixture model for a time series of epileptic seizure counts. Biometrics, 47(4):1371–1381.

[3] Bandeen-Roche, K., Miglioretti, D. L., Zeger, S. L., and Rathouz, P. J. (1997). Latent variable regression for multiple discrete outcomes. Journal of the American Statistical Association, 92(440):1375–1386.

[4] Baum, L. E. and Petrie, T. (1966). Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics, 37(6):1554–1563.

[5] Baum, L. E., Petrie, T., Soules, G., and Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 41(1):164–171.

[6] Biernacki, C., Celeux, G., and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(7):719–725.

[7] Bilmes, J. A. (1998). A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. International Computer Science Institute, 4(510):126.

[8] Celeux, G. and Durand, J.-B. (2008). Selecting hidden Markov model state number with cross-validated likelihood. Computational Statistics, 23(4):541–564.

[9] Choongrak, K. (1959). A note on Box-Cox transformation diagnostics. Technometrics, 38(2):178.

[10] Chung, H., Walls, T., and Park, Y. (2007). A latent transition model with logistic regression. Psychometrika, 72(3):413–435.

[11] Cochrane, D. and Orcutt, G. H. (1949). Application of least squares regression to relationships containing auto-correlated error terms. Journal of the American Statistical Association, 44(245):32–61.

[12] Couvreur, C. (1996). Hidden Markov models and their mixtures. Dept. Math., Université Catholique de Louvain, Louvain, Belgium.

[13] Durbin, J. and Watson, G. S. (1950). Testing for serial correlation in least squares regression: I. Biometrika, 37(3/4):409–428.

[14] Efron, B. and Tibshirani, R. J. (1994). An introduction to the bootstrap, volume 57. CRC Press.

[15] Emam, E. and Al-Deek, H. (2006). Using real-life dual-loop detector data to develop new methodology for estimating freeway travel time reliability. Transportation Research Record: Journal of the Transportation Research Board, 1959(-1):140–150.

[16] Fowlkes, E. B. (1979). Some methods for studying the mixture of two normal (lognormal) distributions. Journal of the American Statistical Association, 74(367):561–575.

[17] Geweke, J. F. and Keane, M. P. (1997). Mixture of normals probit models. Federal Reserve Bank of Minneapolis. Staff Report.

[18] Guo, F., Li, Q., and Rakha, H. (2012). Multistate travel time reliability models with skewed component distributions. Transportation Research Record: Journal of the Transportation Research Board, 2315(-1):47–53.

[19] Guo, F., Rakha, H., and Park, S. (2010). Multistate model for travel time reliability. Transportation Research Record: Journal of the Transportation Research Board, 2188(-1):46–54.

[20] Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica: Journal of the Econometric Society, pages 357–384.

[21] Hurvich, C. M. and Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika, 76(2):297–307.

[22] Jean-Luc, G. and Chin-Hui, L. (1991). Bayesian learning of Gaussian mixture densities for hidden Markov models. Association for Computational Linguistics. 112457: 272-277.

[23] Krogh, A., Brown, M., Mian, I. S., Sjölander, K., and Haussler, D. (1994). Hidden Markov models in computational biology: Applications to protein modeling. Journal of Molecular Biology, 235(5):1501–1531.

[24] Lenk, P. and DeSarbo, W. (2000). Bayesian inference for finite mixtures of generalized linear models with random effects. Psychometrika, 65(1):93–119.

[25] Li, J. and Gray, R. M. (2000). Image segmentation and compression using hidden Markov models. Springer.

[26] MacKay Altman, R. (2004). Assessing the goodness-of-fit of hidden Markov models. Biometrics, 60(2):444–450.

[27] McLachlan, G. J. (1987). On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. Applied Statistics, pages 318–324.

[28] Meeker, W. Q. and Escobar, L. A. (1995). Teaching about approximate confidence regions based on maximum likelihood estimation. The American Statistician, 49(1):48–53.

[29] Neyman, J. and Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Springer.

[30] Park, S., Rakha, H., and Guo, F. (2010). Multi-state travel time reliability model: Model calibration issues. In 89th Transportation Research Board Annual Meeting. Transportation Research Board.

[31] Park, S., Rakha, H., and Guo, F. (2011). Multi-state travel time reliability model: Impact of incidents on travel time reliability. In Intelligent Transportation Systems (ITSC), 2011 14th International IEEE Conference on, pages 2106–2111. IEEE.

[32] Qi, Y., Paisley, J. W., and Carin, L. (2007). Music analysis using hidden Markov mixture models. Signal Processing, IEEE Transactions on, 55(11):5209–5224.

[33] Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286.

[34] Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14(5):465–471.

[35] Rydén, T. (2008). EM versus Markov chain Monte Carlo for estimation of hidden Markov models: A computational perspective. Bayesian Analysis, 3(4):659–688.

[36] Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2):461–464.

[37] Scott, S. L. (2002). Bayesian methods for hidden Markov models: Recursive computing in the 21st century. Journal of the American Statistical Association, 97(457):337–351.

[38] Smyth, P. (1994). Hidden Markov models for fault detection in dynamic systems. Pattern Recognition, 27(1):149–164.

[39] Tu, H., Van Lint, J., and van Zuylen, H. J. (2008). Travel time reliability model on freeways. In Transportation Research Board 87th Annual Meeting.

[40] Visser, I., Raijmakers, M. E., and Molenaar, P. (2000). Confidence intervals for hidden Markov model parameters. British Journal of Mathematical and Statistical Psychology, 53(2):317–327.

[41] Visser, I. and Speekenbrink, M. (2010). depmixS4: An R package for hidden Markov models. Journal of Statistical Software, 36(7):1–21.

[42] Viterbi, A. J. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. Information Theory, IEEE Transactions on, 13(2):260–269.

[43] Wallace, C. S. and Boulton, D. M. (1968). An information measure for classification. The Computer Journal, 11(2):185–194.

[44] Wang, C.-P., Hendricks Brown, C., and Bandeen-Roche, K. (2005). Residual diagnostics for growth mixture models: Examining the impact of a preventive intervention on multiple trajectories of aggressive behavior. Journal of the American Statistical Association, 100(471):1054–1076.

[45] Wiener, N. (1930). The auto-correlation function. Acta. Math, 55:273.

[46] Wold, H. (1938). A study in the analysis of stationary time series. Uppsala.

[47] Wolfe, J. H. (1971). A Monte Carlo Study of the Sampling Distribution of the Likelihood Ratio for Mixtures of Multinormal Distributions. Naval Personnel and Training Research Lab, San Diego.

[48] Yang, R. and Berger, J. O. (1996). A catalog of noninformative priors. Institute of Statistics and Decision Sciences, Duke University.

[49] Zucchini, W. and MacDonald, I. L. (2009). Hidden Markov models for time series: An introduction using R. CRC Press.
