Bayesian Regression Methodology for Project Prioritization
International Journal of Pavement Engineering
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/gpav20
Bayesian Regression Methodology for Project Prioritization
Mustaque Hossain (a), Tanveer Chowdhury (b) & Andrew J. Gisi (c)
(a) Department of Civil Engineering, Kansas State University, Manhattan, KS, 66506, USA
(b) Pavement Management Engineer, Virginia Department of Transportation, Richmond District, Colonial Heights, VA, 23834, USA
(c) Kansas Department of Transportation, Bureau of Materials and Research, 2300 Van Buren, Topeka, KS, 66611, USA
Published online: 27 Oct 2010.
To cite this article: Mustaque Hossain, Tanveer Chowdhury & Andrew J. Gisi (2002) Bayesian Regression Methodology for Project Prioritization, International Journal of Pavement Engineering, 3:3, 185-195, DOI: 10.1080/1029843021000067836
To link to this article: http://dx.doi.org/10.1080/1029843021000067836
Bayesian Regression Methodology for Project Prioritization
MUSTAQUE HOSSAIN (a,*), TANVEER CHOWDHURY (b) and ANDREW J. GISI (c)
(a) Department of Civil Engineering, Kansas State University, Manhattan, KS 66506, USA; (b) Pavement Management Engineer, Richmond District, Virginia Department of Transportation, Colonial Heights, VA 23834, USA; (c) Kansas Department of Transportation, Bureau of Materials and Research, 2300 Van Buren, Topeka, KS 66611, USA
(Received 2 June 2001; Revised 16 July 2002)
In highway infrastructure planning, results from pavement evaluation need to be aggregated into a composite or combined measure of quality for project selection at the network level. In the project prioritization process of the Kansas Department of Transportation (KDOT), a pavement structural rating attribute, Pavement Structural Evaluation (PSE), is used. Currently these ratings are done subjectively based on the condition of the pavement indicated by the visual distresses, maintenance history, and engineering judgement because KDOT does not collect any deflection data during network-level distress survey. This paper describes the application of classical and Bayesian regression methodologies for better estimation of PSE values using the results from the Falling Weight Deflectometer (FWD) tests and network-level distress survey for project prioritization purposes.
Keywords: Pavement; Pavement management system; Pavement evaluation; Bayesian regression; FWD
INTRODUCTION
Pavements are the most costly elements in the highway
infrastructure system (AASHTO, 1986). A World Bank
study in 1987 estimated that one quarter of the paved roads
outside urban areas in developing countries were in need
of reconstruction, and that an additional 40% of paved
roads required strengthening then or in the next few years
(Paterson, 1987). Similar situations have arisen in
developed countries, to varying degrees, since the 1980s.
For example, accelerated deterioration of the federally-
aided highways in the United States (U.S.) required a 44%
increase in funding in 1982 to meet the repair and
rehabilitation costs of the system (NAPA, 1998). It is well
accepted that the interstate highway system in the U.S. is
deteriorating fast. The system already carries two and a half
times the traffic it did in 1975, and congestion is still
increasing. In the past seven years, highway capacity has
grown 2% while traffic has increased by 37% (NAPA,
1998). In May 1998, the Congress passed the Transpor-
tation Equity Act for the 21st Century (TEA-21), the six-
year $216 billion highway bill for roads, bridges and mass
transit. Until the year 2003, the bill is believed to
guarantee that all incoming revenues to the Highway Trust
Fund can only be used for highway and mass transit
investments. It is also believed that even if the entire $216
billion is spent on repairing interstate highways, it would
not be enough to restore, upgrade, and maintain them
(NAPA, 1998).
Such projections at the international and national
levels exemplify the problems facing the highway
planners, financiers, managers and engineers everywhere
at national or local levels to varying degrees. The
problem concerns deterioration of an aging highway
infrastructure and how best to control it, taking into
account the best interests and constraints of the economy
and resources. Largely because of the worldwide need for
extensive rehabilitation programs in the 1980s and 1990s,
and in order to avoid such sharp peaks in highway
expenditure, increasing efforts are being made to develop
and implement improved road management and planning
tools. These tools are required for evaluating the
allocation of funds for road maintenance and
rehabilitation programs, for evaluating the design and
maintenance standards appropriate for the funding
available to the highway sector, and for planning
and prioritizing works in the program (Paterson, 1987).
All such planning and projections depend upon an
evaluation of the road network that is consistent across
the network and over time.
ISSN 1029-8436 print/ISSN 1477-268X online © 2002 Taylor & Francis Ltd
DOI: 10.1080/1029843021000067836
*Corresponding author. E-mail: [email protected]
The International Journal of Pavement Engineering, 2002 Vol. 3 (3), pp. 185–195
Pavement evaluation in pavement management systems
(PMS) is generally directed toward the following objec-
tives (Haas et al., 1994):
1. Selection of projects and treatment strategies at the
network level.
2. Identification of specific maintenance requirements at
the project level.
Each of these objectives requires pavement evaluation
information at a greater or lesser level of detail. Where
less detail is needed, the individual measures are commonly
aggregated into a composite or combined measure of
pavement quality.
Such a combined measure for each pavement section
is helpful at the network level for technical decisions,
e.g. project prioritization.
PAVEMENT STRUCTURAL EVALUATION (PSE)
In the priority ranking procedure of the Kansas Department
of Transportation (KDOT), a measure of pavement
structural quality, PSE, is used (KDOT, 1984). Pavements
are rated on the basis of “control section” on a scale of
0–10 for PSE, 10 being the best or no work required. A
control section is the basic reporting, identifying and
analysis unit for KDOT (KDOT, 1998). It is defined as a
segment of roadway with reasonably uniform geometric,
traffic, surface and base characteristics for its entire length.
These sections are also used in project prioritization
purposes. In the ranking process during prioritization of
candidate roadway projects for reconstruction, PSE is
expected to be an indicator of structural deficiency of the
control section. The other major attribute is observed
surface condition. Currently, however, the relative weight
of the PSE attribute in the interstate roadway priority
formula is 0.208, twice that of the next most heavily weighted
attribute, observed condition (weight = 0.104). The other attributes
in the formula include rideability, lane width, shoulder width,
volume-to-capacity ratio, commercial traffic, and number of
narrow structures per mile. The PSE rating attribute for
non-interstate roadways also has a very high weight.
Annual PSE ratings are furnished by the district offices
of KDOT and are based on the condition and strength of
the base and surface as indicated by the maintenance history,
visual distresses, and subgrade failures, and on the judged
ability of the section to provide an adequate surface for the
type of traffic. The Geotechnical unit of KDOT provides a
possible range of PSE values for the control sections based
on algorithms developed by experts using the distress
survey data. However, these values did not appear to be
helpful to the districts and, in some cases, led to confusion.
Discrepancies were also observed in the annual PSE
ratings on the same route across district lines. Since
KDOT does not collect any deflection data during the
network-level distress survey, the PSE computation
process does not directly take into account any structural
evaluation. However, some of the distresses, such as
fatigue cracking, are considered structure-related.
OBJECTIVE
The primary objective of this study was to derive an
improved methodology to estimate the PSE values from
the Falling Weight Deflectometer (FWD) tests and the
network-level distress survey results.
STUDY APPROACH
A number of variables were investigated as potential
predictors of the PSE values. Multiple linear regression
models were developed with these independent variables
to objectively quantify the decrease of the PSE value. The
Bayesian regression methodology (Smith et al., 1979;
Jackson and Mahoney, 1991; Kaweski and Nickeson,
1997) was also used to develop regression models with
the same variables as the classic multiple linear regression
analysis. Finally, models developed by the classical and
Bayesian regression methodologies were tested on a
different set of pavements in a different KDOT District,
and appropriate models were recommended for global use
on the KDOT network.
DATA COLLECTION
KDOT maintains two types of asphalt pavements—Full-
Design and Partial-Design Bituminous Pavements. Full-
Design Bituminous (FDBIT) pavements were designed for
current and projected traffic, and usually carry heavier
traffic than the Partial-Design Bituminous (PDBIT)
pavements. PDBIT pavements resulted from the paving
and maintenance of the farm-to-market roads in the 1940s
and 1950s. The structural behavior of these two types of
pavements is different and hence, they were analyzed
separately. Forty-two FDBIT and 112 PDBIT “control”
sections on the non-interstate routes of Districts I and IV of
KDOT were selected in this study. The sections were at
least three kilometers long, and some extended to several kilometers.
Deflection data was collected on the study sections by a
Dynatest 8000 FWD in 1993, 1994 and 1995. Tests were
also done in District II in 1996 and 1997. Ten FWD tests per
1.6 km (1 mile) were performed on the outer wheel path of
the travel lane. Other information, e.g. thickness history,
age and cumulative ESAL’s since the last rehabilitation
action, number of transverse cracks per roadway mile,
and PSE values assigned by the districts for the selected
sections, were collected from the PMS database of KDOT.
FWD DATA ANALYSIS
The structural capacity of flexible pavements declines
with time and traffic. In the 1993 AASHTO Design Guide,
the structural capacity for existing flexible pavement is
expressed as the effective structural number, SNeff
(AASHTO, 1993). The primary objective of structural
evaluation is to determine the effective structural capacity
of existing pavements. The evaluation process must
consider the current condition of the existing pavement
materials, and also consider how those materials will
behave in the future. In this study, the approach suggested
in the 1993 AASHTO Guide was followed to calculate the
effective structural number of the control sections. The
FWD first-sensor deflection was normalized to a 40 kN
(9000 lb) load and corrected to a temperature of 20°C
(68°F). The FWD deflections were then used to
backcalculate the subgrade resilient modulus (Mr). The
effective pavement modulus (Ep) value was determined
from the equation suggested by AASHTO. Details of the
calculations can be found elsewhere (Chowdhury, 1998).
Once the Ep value was determined, the effective structural
number could be calculated by the following formula
provided by AASHTO (1993):
$$\mathrm{SN}_{eff} = 0.0045 \times D \times (E_p)^{1/3} \qquad (1)$$
where D is the total thickness of the pavement above
subgrade.
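Equation (1) can be checked by hand or in a few lines of code. The sketch below assumes the customary units of the AASHTO Guide (D in inches, Ep in psi), which this section does not state explicitly. The function name and the sample values are illustrative only and are not taken from the study data.

```python
def effective_structural_number(total_thickness_in: float, ep_psi: float) -> float:
    """Eq. (1): SNeff = 0.0045 * D * Ep^(1/3).

    Assumes D in inches and Ep in psi (an assumption; the text gives no units here).
    """
    if total_thickness_in <= 0 or ep_psi <= 0:
        raise ValueError("thickness and modulus must be positive")
    return 0.0045 * total_thickness_in * ep_psi ** (1.0 / 3.0)


# Hypothetical section: 10 in. of pavement above subgrade, Ep = 125,000 psi.
sn_eff = effective_structural_number(10.0, 125_000.0)
print(round(sn_eff, 2))  # 2.25
```

Because SNeff grows only with the cube root of Ep, a large loss of effective modulus is needed to produce a modest loss of structural number, while the dependence on thickness is linear.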
ASSESSMENT OF PAVEMENT STRUCTURAL
DETERIORATION
The major factors contributing to the structural deterio-
ration of asphalt pavements are traffic and climate. Three
variables were selected to predict the decrease in structural
number (ΔSN = Design SN − SNeff) to assess structural
deterioration at the network level:
1. Age (in years) of the pavements since the last
rehabilitation action,
2. Cumulative number of ESAL’s that have passed over
the pavement since the last rehabilitation action, and
3. Thickness (in inches) of the pavements.
Different models were developed for the FDBIT and
PDBIT pavements since the structural behavior of these
pavements is different. Simple linear regression
analysis showed that the decrease in structural
number was highly correlated with age, cumulative
number of ESALs and thickness for the FDBIT pavements,
and with age and cumulative ESALs for the PDBIT pavements.
However, the variables age and cumulative ESALs were
found to be highly correlated with each other. Thus,
only the variable age was retained in the analysis since this
parameter is more accurate than the calculated ESALs. It
should be noted that for the secondary roadways studied in
this project, traffic growth is very uniform. The following
models were developed based on the multiple linear
regression analysis. More details of the statistical analyses
can be found elsewhere (Hossain et al., 2000).
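FDBIT Pavements:
$$\Delta\mathrm{SN} = 0.0218 \times \mathrm{age} + 0.001 \times \mathrm{thickness} \quad (R^2 = 0.81) \qquad (2)$$
PDBIT Pavements:
$$\Delta\mathrm{SN} = 0.0166 \times \mathrm{age} \quad (R^2 = 0.72) \qquad (3)$$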
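As a minimal sketch, Eqs. (2) and (3) can be wrapped in one function that returns the predicted decrease in structural number. The function name and the string labels for the two pavement types are hypothetical; age is in years and thickness in inches, as defined in the variable list above.

```python
def predicted_delta_sn(pavement_type: str, age_years: float, thickness_in: float = 0.0) -> float:
    """Predicted decrease in structural number from Eq. (2) or Eq. (3)."""
    if age_years < 0 or thickness_in < 0:
        raise ValueError("age and thickness must be non-negative")
    if pavement_type == "FDBIT":
        # Eq. (2): age and thickness are both predictors.
        return 0.0218 * age_years + 0.001 * thickness_in
    if pavement_type == "PDBIT":
        # Eq. (3): age is the only predictor; thickness is ignored.
        return 0.0166 * age_years
    raise ValueError(f"unknown pavement type: {pavement_type!r}")


# Hypothetical sections, 10 years after the last rehabilitation action.
print(predicted_delta_sn("FDBIT", 10.0, 8.0))
print(predicted_delta_sn("PDBIT", 10.0))
```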
CLASSICAL REGRESSION ANALYSIS TO
PREDICT ΔPSE
The major objective of this research was to objectively and
quantitatively determine the PSE values of the pavements
since the last rehabilitation action. However, the decrease
in PSE value was taken as the dependent variable because
it somewhat represents a “normalized” value. Classical
multiple regression analysis was performed to estimate
the decrease in the PSE (ΔPSE) values. One of the most
important aspects of classical regression analysis is the
selection of independent variables that are strong indica-
tors of the dependent variable. The selection was done in
two steps (Ott, 1993):
1. Enumerating the independent variables, and
2. Evaluating and selecting independent variables sub-
jectively or by analyzing correlation.
Selection of Independent Variables for Prediction of
Change in PSE Values (ΔPSE)
An extensive literature search was conducted to select the independent
variables to predict the decrease in PSE values.
Expert opinion was also sought for this purpose. Since
PSE ratings are based on the condition of the base and
surface, as indicated by the maintenance costs, subgrade
failures, and ability of the section to provide an adequate
surface for the prevailing traffic, the following variables
were selected to reflect those conditions:
(1) Age of the pavement since the last rehabilitation
action (in years) (AGE),
(2) Cumulative ESAL’s that have passed over the
pavement since the last action,
(3) AC layer thickness (in inches) (TH),
(4) PSE value assigned to the pavement immediately
after the last action (PSE),
(5) Decrease in structural number (ΔSN), and
(6) Distress level due to transverse cracking (DL).
The selected variables were plotted on scatter plots
against the dependent variable, ΔPSE, and were
inspected for possible trends. Also, correlation coefficients
for different pairs were determined. It was apparent
from the scatter plots that age and ΔSN were not linearly
related to ΔPSE. In the case of age, the rationale is
that PSE values do not decrease at the same rate with time.
During the initial years this rate is lower, but after a certain
period, the PSE values start to decrease drastically. A
trial-and-error approach was followed to determine the
transformed functional form for an independent variable
(Chowdhury, 1998). After several trials, the variable age
was transformed to (age)^1.5. With this transformation,
Pearson's correlation coefficient between the dependent
variable, ΔPSE, and the independent variable, age, improved
from 0.35 to 0.68 for the FDBIT and from 0.39 to 0.56 for the
PDBIT pavements. Similarly, the variable decrease in structural
number, ΔSN, was transformed to exp(ΔSN), which improved
the correlation coefficient of the relationship from 0.49
to 0.61 for the FDBIT and from 0.48 to 0.55 for the PDBIT
pavements. The variable AC layer thickness was dropped
from the PDBIT model as a predictor since the thickness
of this type of pavement was not designed to carry the
expected traffic. The variables age and cumulative ESALs
are highly correlated with each other
(correlation coefficient of 0.65 for FDBIT and 0.58 for
PDBIT). Therefore, only one of them (age) was included
in the model to avoid possible multicollinearity or
overspecification of the model (Chowdhury, 1998).
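The effect of a power transformation on Pearson's correlation coefficient can be illustrated with a small sketch. The data below are synthetic and noise-free, generated so that the response accelerates with age; they are not the study data, and the function name is hypothetical. With real, noisy ratings the gain is far larger than here, as the coefficients reported above show.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic example: the response grows slowly at first, then faster.
ages = list(range(1, 16))
delta_pse = [0.05 * a ** 1.5 for a in ages]

r_raw = pearson_r(ages, delta_pse)                      # age used as-is
r_trans = pearson_r([a ** 1.5 for a in ages], delta_pse)  # age transformed to age^1.5

print(f"raw age: r = {r_raw:.4f}, transformed age: r = {r_trans:.4f}")
```

A trial-and-error search of the kind described above would repeat this comparison for several candidate exponents and keep the one giving the strongest linear relationship.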
Transverse cracking was included in the model as a
binary (0 or 1) variable. No rutting was included since it is
believed that the amount of rutting on these sections (most
below 12.5 mm) did not influence the PSE ratings by
KDOT. Transverse cracking on the pavements in Kansas
is measured by the number of equivalent roadway-width
cracks. According to the KDOT PMS rating guide (KDOT,
1996), crack severity is categorized using three severity
codes:
Code 1 or DL1: No roughness, 6 mm (0.25 in.) or wider
with no secondary cracking; or any width with secondary
cracking less than 1.2 m (4 ft) per lane.
Code 2 or DL2: Any width crack with noticeable
roughness due to depression or bump. Also includes
cracks that have greater than 1.2 m (4 ft) of secondary
cracking but no roughness.
Code 3 or DL3: Any width crack with significant
roughness due to depression or bump. Secondary cracking
will be more severe than code 2.
Different combinations of the coded cracks will result
in different distress levels due to transverse cracking
(KDOT, 1996).
MODEL SELECTION
Criteria Used to Select a Model
The following criteria were used to select a model:
i) Minimize the mean square error (MSE): The
smallest MSE will result in the narrowest confidence
intervals and largest test statistics. The model with
the smallest MSE involving the least number of
independent variables can generally be considered as
the best model (Ott, 1993).
ii) Maximize the coefficient of determination (R²): R²
is a measure of how well the estimated model fits the
observed data. The best model selected is generally
the one with the largest R².
iii) Minimum increase of R²: The best model is selected
as the model associated with the smallest increase in
R² with the addition of an extra variable.
iv) Mallows' Cp statistic: The best model is usually
thought to have a Cp value closest to p, where p is
the number of regression coefficients. Models associated
with Cp greater than p are usually thought to
be biased or misspecified models (Ott, 1993).
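The criteria above can be computed for any candidate model with a short script. The sketch below uses NumPy and synthetic data; the function names are hypothetical, and the Cp formula, Cp = SSE_p / MSE_full − (n − 2p), is the standard textbook definition rather than one quoted in this paper. Both design matrices are assumed to include a column of ones.

```python
import numpy as np


def sse_of_fit(X, y):
    """Sum of squared residuals of a least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def selection_stats(X_sub, X_full, y):
    """MSE, R^2 and Mallows' Cp for a candidate model nested in X_full."""
    n, p = X_sub.shape                      # p = number of regression coefficients
    sse_p = sse_of_fit(X_sub, y)
    mse_full = sse_of_fit(X_full, y) / (n - X_full.shape[1])
    sst = float(((y - y.mean()) ** 2).sum())
    return {
        "MSE": sse_p / (n - p),
        "R2": 1.0 - sse_p / sst,
        "Cp": sse_p / mse_full - (n - 2 * p),
    }


# Synthetic data (not study data): y depends on x1 and x2 but not on x3.
rng = np.random.default_rng(0)
n = 40
x1 = rng.uniform(1.0, 15.0, n)
x2 = rng.uniform(0.0, 1.0, n)
x3 = rng.normal(size=n)
y = 0.2 * x1 + 1.0 * x2 + rng.normal(scale=0.3, size=n)

ones = np.ones(n)
X_full = np.column_stack([ones, x1, x2, x3])
X_sub = np.column_stack([ones, x1, x2])

stats_sub = selection_stats(X_sub, X_full, y)
stats_full = selection_stats(X_full, X_full, y)
print("subset model:", stats_sub)
print("full model:  ", stats_full)
```

By construction the full model always has Cp equal to its own p, so Cp is informative only for the smaller candidate models compared against it.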
Models Obtained
FDBIT Pavements: Detailed analyses and summary
statistics of the model development can be found
elsewhere (Chowdhury, 1998). For FDBIT pavements,
the selected models are:
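Distress Level 1
$$\Delta\mathrm{PSE} = 0.216 \times (\mathrm{AGE})^{1.5} - 20.82 \times \exp[\Delta\mathrm{SN}] + 0.138 \times \mathrm{TH} + 0.328 \times \mathrm{PSE} + 17.65 \times \mathrm{DL1} \qquad (4)$$
Distress Level 2
$$\Delta\mathrm{PSE} = 0.216 \times (\mathrm{AGE})^{1.5} - 20.82 \times \exp[\Delta\mathrm{SN}] + 0.138 \times \mathrm{TH} + 0.328 \times \mathrm{PSE} + 18.06 \times \mathrm{DL2} \qquad (5)$$
Distress Level 3
$$\Delta\mathrm{PSE} = 0.216 \times (\mathrm{AGE})^{1.5} - 20.82 \times \exp[\Delta\mathrm{SN}] + 0.138 \times \mathrm{TH} + 0.328 \times \mathrm{PSE} + 18.38 \times \mathrm{DL3} \qquad (6)$$
[R² = 78.4%; RMSE = 0.48; n = 38]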
The statistical p-values for the parameters imply that
all variables are significant at a level of more than
95%. The Analysis of Variance (ANOVA) results
showed that the model had an F-value of 37 and its
significance value was 0.0001. Since the selected model
had a high F-value and a very low p-value, it satisfied
the model selection criteria mentioned earlier. Also,
the estimated root mean square error (σ) value for the
model was 0.47, which indicates that the
selected model would predict the decrease in PSE
values at a variability of ±2σ or ±0.94 with a
confidence of 99%.
PDBIT Pavements: The selected models are:
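Distress Level 1
$$\Delta\mathrm{PSE} = 0.024 \times (\mathrm{AGE})^{1.5} - 1.145 \times \exp[\Delta\mathrm{SN}] + 0.171 \times \mathrm{PSE} + 0.229 \times \mathrm{DL1} \qquad (7)$$
Distress Level 2
$$\Delta\mathrm{PSE} = 0.024 \times (\mathrm{AGE})^{1.5} - 1.145 \times \exp[\Delta\mathrm{SN}] + 0.171 \times \mathrm{PSE} + 0.958 \times \mathrm{DL2} \qquad (8)$$
Distress Level 3
$$\Delta\mathrm{PSE} = 0.024 \times (\mathrm{AGE})^{1.5} - 1.145 \times \exp[\Delta\mathrm{SN}] + 0.171 \times \mathrm{PSE} + 2.27 \times \mathrm{DL3} \qquad (9)$$
[R² = 86.6%; RMSE = 0.42; n = 104]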
Again, the p-values for the parameters indicated that all
the variables were significant at a level of more than 95%.
The ANOVA results showed that the model had an
F-value of 132 and its significance value was 0.0001
(Chowdhury, 1998). Thus, these models met the model
selection criteria discussed earlier.
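The classical models in Eqs. (4) to (9) share one functional form and differ only in their coefficients, so they can be evaluated with a single lookup table. The sketch below assumes that the binary distress-level variable equals 1 for the observed level, so that the last term reduces to its coefficient. The dictionary layout, the function name and the sample inputs are hypothetical and are not taken from the study data.

```python
import math

# Coefficients transcribed from Eqs. (4)-(6) (FDBIT) and Eqs. (7)-(9) (PDBIT).
CLASSICAL_COEFFS = {
    "FDBIT": {"age15": 0.216, "exp_dsn": -20.82, "th": 0.138, "pse": 0.328,
              "dl": {1: 17.65, 2: 18.06, 3: 18.38}},
    "PDBIT": {"age15": 0.024, "exp_dsn": -1.145, "th": 0.0, "pse": 0.171,
              "dl": {1: 0.229, 2: 0.958, 3: 2.27}},
}


def delta_pse(pavement_type, age_years, delta_sn, initial_pse,
              distress_level, thickness_in=0.0):
    """Predicted decrease in PSE from the classical regression models.

    Assumes the distress-level indicator is 1 for the observed level (1, 2 or 3).
    Thickness has no term in the PDBIT models, so it is ignored there.
    """
    c = CLASSICAL_COEFFS[pavement_type]
    return (c["age15"] * age_years ** 1.5
            + c["exp_dsn"] * math.exp(delta_sn)
            + c["th"] * thickness_in
            + c["pse"] * initial_pse
            + c["dl"][distress_level])


# Hypothetical PDBIT section: 4 years old, no loss of SN, initial PSE of 10.
for level in (1, 2, 3):
    print(level, round(delta_pse("PDBIT", 4.0, 0.0, 10.0, level), 3))
```

Within each pavement type, the three equations differ only in the distress-level term, so a higher distress level always gives a larger predicted decrease for otherwise identical inputs.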
Table I shows the individual contribution of the variables
used to develop the above models. The results
show that the variable exp[ΔSN] had the highest contribution
to the models for the FDBIT pavements. Also,
for these pavements, initial PSE and the transformed age
variable, (age)^1.5, had similar contributions. For the
PDBIT pavements, initial PSE had the highest contribution.
Distress level also contributed significantly for
these pavements.
BAYESIAN REGRESSION ANALYSIS
It is well established that prediction equations are very
important tools for the PMS. However, databases to
support development and updating of these models are
often inadequate, noisy or altogether lacking. Conven-
tional statistical modeling tools, such as classical
regression analysis, may have limited success in these
applications (Kurlanda and Kajner, 1996). A promising
solution lies in the use of Bayesian regression, which
explicitly allows expert judgement to be used to supplement
poor-quality data. Bayesian regression methodology was
adopted by the Canadian Strategic Highway Research
Program (C-SHRP) for the Canadian Long Term Pavement
Performance (C-LTPP) monitoring program (Nesbitt and
Sparks, 1990).
An Overview of the Bayesian Regression Approach
In its simplest sense, Bayesian regression is a specialized
adaptation of Bayes' theorem involving development of
multivariate regression models which explicitly consider
two disparate sources of information (Kaweski and
Nickeson, 1997):
1. Prior Information: Information that is known prior
to an experiment.
2. Experimental Data: Information that is derived from
an experiment.
The interpretation and conclusion drawn from the
experimental data can be quite different depending on
what other evidence exists on the subject at hand.
However, this difference in interpretation does not simply
mean biasing a result. Interpretation of results using Bayes
Theorem is a mathematically consistent way to interpret
new evidence/information.
The Bayesian statistical method for model develop-
ment systematically combines prior knowledge and
experience with data to improve the predictive relation-
ship (Smith et al., 1979; Jackson and Mahoney, 1991;
Kaweski and Nickeson, 1997). The Bayes Theorem
calculates a meaningful and credible answer without
relying solely on a small database. In doing so, the Bayes
technique allows decisions to be made in the short term
while improvements to the data, judgement and the
model continue to be made. The Bayesian regression
achieves a balance between two solutions based on the
data or judgement alone. In assembling information for
Bayesian regression, data collected in the traditional
manner is supplemented with prior knowledge. The
so-called “prior” may be drawn from expert judgement,
“old” data sets, or knowledge that is generally accepted
in the field. Expert judgement can also be encoded by
polling experts and asking them to estimate the value of
the dependent variable for a combination of contributory
variables. Once collected, the experts’ observations are
treated in the same way as the traditional data (Kaweski and
Nickeson, 1997). The predicted variable can then be
expressed as:
$$Y = b_{pr0} + b_{pr1}X_1 + b_{pr2}X_2 + \ldots + b_{prk}X_k + e_{pr} \qquad (10)$$
where:
Y is the dependent variable,
$X_i$ are the predictor variables,
k is the number of independent variables,
$b_{pri}$ are the regression coefficients for the prior, and
$e_{pr}$ is the random error term for the prior, with the
degrees of freedom given by
$$v_{pr} = n_{pr} - k - 1$$

TABLE I Individual contribution of different variables towards the models

Variables         FDBIT pavements                PDBIT pavements
                  Individual R² (%)    RMSE      Individual R² (%)    RMSE
Exp[Δ(SN)]        44.1                 0.68      30.1                 0.91
Initial PSE       42.4                 0.69      50.2                 0.77
(Age)^1.5         38.2                 0.70      30.3                 0.89
Distress level    22.4                 0.79      39.6                 0.82
Thickness         22.1                 0.80      N.A.                 N.A.
Developing Prior and Assembling Sample Data
The evidence/information known prior to collecting new
data is known in Bayesian terminology as “the prior.” The
prior can be derived either subjectively using expert
judgement or objectively based on the existing data or
models. Both approaches require that prior information be
put into either an N-prior or G-prior format. Both N-prior
and G-prior summarize a linear regression, which
represents the prior state of knowledge in the Bayesian
regression calculation. The prior includes the coefficients
of the linear regression equation along with the
corresponding regression statistics such as the variance
of the regression coefficients. The regression statistics
indicate the certainty of the prior and are used to weigh the
balance between the prior and the data in the Bayesian
regression calculation. The N-prior uses the variance–covariance
matrix for the prior in the Bayesian regression
calculations (Kaweski and Nickeson, 1997). A list of
information needed to define both N-prior and G-prior is
provided in Table II.
The G-prior option is typically used when the
coefficient ‘means’ have been estimated directly by
experts. The G-prior derives the variance/covariance
matrix for the coefficient means based on a set of
independent variable data. The G-prior factor, denoted by
g, is used to increase or decrease the influence of the prior
in the calculation of the posterior. A typical value for g is
1. This essentially gives the prior variance/covariance
matrix weight equal to that for the experimental data. The
greater the value of g, the more influence the prior will
have on the posterior. Since the pseudo/prior data used in
this research were not derived from expert opinion only,
the N-prior option of Bayesian regression was used.
The precision matrix for the prior, A, is calculated for
the N-prior as (Zellner, 1987; Press, 1989):
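$$A = \left[ \mathrm{Var/Covar}_{pr} \left( \frac{v_{pr} - 2}{v_{pr}\, S_{e,pr}^2} \right) \right]^{-1} \qquad (11)$$
where
$$S_{e,pr}^2 = \mathrm{var}(e_{pr}) = \frac{\sum_{i=1}^{n} \left( Y_{i,pr\text{-}actual} - Y_{i,pr\text{-}predicted} \right)^2}{v}$$
and
$$\mathrm{Var/Covar} = \begin{bmatrix} \sigma^2_{b_0,pr} & \gamma_{b_0 b_1,pr} & \cdots & \gamma_{b_0 b_k,pr} \\ \gamma_{b_1 b_0,pr} & \sigma^2_{b_1,pr} & \cdots & \gamma_{b_1 b_k,pr} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{b_k b_0,pr} & \gamma_{b_k b_1,pr} & \cdots & \sigma^2_{b_k,pr} \end{bmatrix}$$
where $\sigma^2_{b_i}$ indicates the variance of a coefficient, and $\gamma_{b_i b_j}$
indicates the covariance between two coefficients.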
With the experimental data, the experiment precision
matrix, H, is (Zellner, 1987; Press, 1989):
$$H = X^t X \qquad (12)$$
where X is the matrix of observed values, and $X^t$ denotes
the transpose of matrix X. The coefficient vector, b, is then:
$$b = H^{-1} X^t Y \qquad (13)$$
where Y is the vector of the dependent data.
The posterior precision matrix, M, is calculated by
adding the prior precision matrix to the experimental precision matrix:
$$M = A + H \qquad (14)$$
The posterior regression coefficients are calculated
as a precision-weighted average of the prior regression coefficients
and the regression coefficients from the experimental data:
$$b_{pos} = M^{-1}(A b_{pr} + H b) \qquad (15)$$
The posterior degrees of freedom are calculated by
adding the prior degrees of freedom, the experimental
degrees of freedom, and the number of coefficients
in the functional form (Zellner, 1987; Press, 1989):
$$v_{pos} = v_{pr} + v + k + 1 \qquad (16)$$
The number “1” in Eq. (16) is added assuming that a
constant $b_0$ is present in the regression equation (Zellner,
1987; Press, 1989).
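The N-prior calculation in Eqs. (11) to (16) can be sketched in a few lines of NumPy. This is a minimal illustration of the equations as written above, not a reproduction of XLBayes. The function names, the prior values and the synthetic sample data are all hypothetical, and the design matrix X is assumed to carry a leading column of ones so that it has k + 1 columns.

```python
import numpy as np


def prior_precision(cov_pr, dof_pr, resid_var_pr):
    """Eq. (11): A = [Var/Covar_pr * (v_pr - 2) / (v_pr * Se^2_pr)]^-1."""
    if dof_pr <= 2:
        raise ValueError("prior degrees of freedom must exceed 2")
    scale = (dof_pr - 2.0) / (dof_pr * resid_var_pr)
    return np.linalg.inv(np.asarray(cov_pr, dtype=float) * scale)


def bayes_posterior(b_pr, A, dof_pr, X, y):
    """Combine an N-prior (b_pr, A, dof_pr) with sample data (X, y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape                                  # p = k + 1 coefficients
    H = X.T @ X                                     # Eq. (12)
    b = np.linalg.solve(H, X.T @ y)                 # Eq. (13)
    M = A + H                                       # Eq. (14)
    b_pos = np.linalg.solve(M, A @ b_pr + H @ b)    # Eq. (15)
    dof_data = n - p                                # v = n - k - 1
    dof_pos = dof_pr + dof_data + p                 # Eq. (16)
    return b, b_pos, dof_pos


# Synthetic illustration (not study data): one predictor plus an intercept.
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 15.0, size=12)
X = np.column_stack([np.ones_like(x), x])
y = 0.5 + 0.2 * x + rng.normal(scale=0.3, size=x.size)

b_pr = np.array([0.4, 0.25])                        # hypothetical prior means
cov_pr = np.array([[0.04, 0.0], [0.0, 0.0004]])     # hypothetical prior Var/Covar
A = prior_precision(cov_pr, dof_pr=20, resid_var_pr=0.09)

b, b_pos, dof_pos = bayes_posterior(b_pr, A, 20, X, y)
print("data only:", b)
print("posterior:", b_pos, "degrees of freedom:", dof_pos)
```

Two limiting cases help to read Eq. (15): if A is a zero matrix, the posterior coefficients equal the data-only coefficients, and if A equals H, they are the simple average of the prior and data-only coefficients.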
Bayesian Regression to Predict the Decrease in the PSE
Values
The Bayesian regression analysis in this study used the
XLBayes software, a Bayesian regression program based
in the EXCEL environment (Kaweski and Nickeson,
1997). In this program, prior data needs to be combined
with the sample data to obtain the desired posteriors
(Kaweski and Nickeson, 1997). The prior data can be
drawn from expert judgement, old data sets or
knowledge that is generally accepted in the field. In this
study, the data set for a number of pavements from
KDOT Districts I and IV for 1993 and 1994 was used as
prior data, and the data for 1995 was used as the sample
data. The same functional form and transformations of the
independent variables as in the classical regression were
used.

TABLE II Required prior information (after Kaweski and Nickeson, 1997)

Prior information             Required for N-prior    Required for G-prior
Means vector                  Yes                     Yes
Variance/Covariance Matrix    Yes                     No
G-prior data set              No                      Yes
G-prior factor                No                      Yes
Residual variance             Yes                     Yes
Degrees of freedom            Yes                     Yes
Results of Bayesian Regression and Selected Posterior Models
The selected posterior models using N-prior Bayesian
regression analysis are presented below:
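FDBIT Pavements:
Distress Level 1
$$\Delta\mathrm{PSE} = 0.123 \times (\mathrm{AGE})^{1.5} - 9.329 \times \exp[\Delta\mathrm{SN}] + 0.106 \times \mathrm{TH} + 0.374 \times \mathrm{PSE} + 5.89 \times \mathrm{DL1} \qquad (17)$$
Distress Level 2
$$\Delta\mathrm{PSE} = 0.123 \times (\mathrm{AGE})^{1.5} - 9.329 \times \exp[\Delta\mathrm{SN}] + 0.106 \times \mathrm{TH} + 0.374 \times \mathrm{PSE} + 6.04 \times \mathrm{DL2} \qquad (18)$$
Distress Level 3
$$\Delta\mathrm{PSE} = 0.123 \times (\mathrm{AGE})^{1.5} - 9.329 \times \exp[\Delta\mathrm{SN}] + 0.106 \times \mathrm{TH} + 0.374 \times \mathrm{PSE} + 6.47 \times \mathrm{DL3} \qquad (19)$$
PDBIT Pavements:
Distress Level 1
$$\Delta\mathrm{PSE} = 0.021 \times (\mathrm{AGE})^{1.5} - 1.873 \times \exp[\Delta\mathrm{SN}] + 0.303 \times \mathrm{PSE} + 0.392 \times \mathrm{DL1} \qquad (20)$$
Distress Level 2
$$\Delta\mathrm{PSE} = 0.021 \times (\mathrm{AGE})^{1.5} - 1.873 \times \exp[\Delta\mathrm{SN}] + 0.303 \times \mathrm{PSE} + 0.881 \times \mathrm{DL2} \qquad (21)$$
Distress Level 3
$$\Delta\mathrm{PSE} = 0.021 \times (\mathrm{AGE})^{1.5} - 1.873 \times \exp[\Delta\mathrm{SN}] + 0.303 \times \mathrm{PSE} + 1.974 \times \mathrm{DL3} \qquad (22)$$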
The classical regression results using pseudo data,
development of the N-prior and the posterior regression
coefficients for the FDBIT and PDBIT pavements have
been reported elsewhere (Chowdhury, 1998).
MODEL EVALUATION
The purpose of model evaluation is to draw conclusions
about the Bayesian posterior results. Evaluation empha-
sizes comparisons among the data, prior, and posterior.
These comparisons may be used for additional iterations
for the analysis later on. The statistical performance of a
classical regression model is typically measured by
evaluating the standard error (Se), coefficient of determination
(R²), F-statistic, and t-statistic. In Bayesian
regression, only Se and the t-statistic can be evaluated.
Neither R² nor the F-statistic can be calculated because
they rely on the experimental data, which do not exist for
the posterior results (Kaweski and Nickeson, 1997).
Data, Prior, and Posterior PDF Plots
An important output of XLBayes is the Probability
Density Function (PDF) plot for each coefficient in the
model. These plots graphically compare the distributions
of the same coefficient when based on the data alone, the
prior alone, or the Bayesian posterior. Figure 1 shows a
sample PDF plot for the coefficient of the transformed
variable (AGE)^1.5 used in the model.
Under the assumptions of both classical multiple linear
and Bayesian regressions, the model coefficients follow a
t-distribution. The width of the bell-shaped curve shows the
confidence in estimating the coefficients. The PDF plots of
all coefficients reveal that the probability distribution for
the posterior estimate is “tighter” than either the prior or
the data. This is intuitively reasonable as the prior and the
data reinforce each other with similar estimates of
coefficients. Bayesian regression models can always be
updated by adding more data to the model, which makes
the posterior increasingly definitive (Kaweski and
Nickeson, 1997).
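The "tightening" of the posterior can be illustrated with a deliberately simplified case. The sketch below combines a prior estimate and a data estimate of a single coefficient by precision weighting, assuming both are normal with known standard deviations. This is a textbook simplification rather than the t-distribution calculation performed by XLBayes, and the numbers in the usage note are hypothetical.

```python
import math


def combine_normal(mean_prior, sd_prior, mean_data, sd_data):
    """Precision-weighted combination of two normal estimates.

    Assumption: known variances and a single coefficient; this is only
    meant to show why the posterior is narrower than either input.
    """
    w_prior = 1.0 / sd_prior ** 2   # precision of the prior estimate
    w_data = 1.0 / sd_data ** 2     # precision of the data estimate
    var_post = 1.0 / (w_prior + w_data)
    mean_post = var_post * (w_prior * mean_prior + w_data * mean_data)
    return mean_post, math.sqrt(var_post)
```

For example, `combine_normal(0.10, 0.05, 0.14, 0.05)` returns a posterior mean midway between the two estimates and a posterior standard deviation smaller than 0.05, because the two precisions add.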
Student’s t-statistic
The Student’s t-test is used to determine whether a
regression coefficient is statistically significant, that is,
whether one can be confident that the variable in question
really influences the model results. The larger the absolute
value of t, the greater the confidence in the coefficient’s
value and significance. If the regression coefficients
in the prior and posterior are not statistically
significant, it may be useful to rerun the analysis after
excluding the insignificant variable. If the standard error
term does not increase significantly, the excluded variable
may not be a statistically significant contributory variable.
The ideal result is for the data and prior to reinforce each
other, resulting in a posterior coefficient that has a smaller
standard error than either one individually. This is not
always the case, however, and the posterior may, in fact,
have a larger standard error. Irrespective of how much the
variance has changed, it is desirable that the coefficients in
the posterior model all be statistically significant
(Kaweski and Nickeson, 1997).
The t-statistics and standard deviations of the
coefficients of the models in this study are presented in
Table III. The t-value for a regression coefficient is calculated
by dividing the mean of the regression coefficient by
its standard deviation:
$$t = b_i / s_{b_i} \tag{23}$$
The t-statistics of all selected variables fall outside the
range of −1.96 to 1.96, which means that
they are significant at the 5% level of significance.
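Equation (23) and the 5% significance check can be reproduced in a few lines. The sketch below uses the FDBIT posterior coefficients from Eqs. (17)-(19) and the standard deviations from Table III; the function names are hypothetical, and the recomputed t-values differ slightly from the tabulated ones because the published coefficients and standard deviations are rounded.

```python
def t_statistic(coef_mean, coef_sd):
    """Eq. (23): coefficient mean divided by its standard deviation."""
    return coef_mean / coef_sd


def is_significant(t, t_crit=1.96):
    """Two-sided check against the critical value used in the text."""
    return abs(t) > t_crit


# (posterior coefficient, standard deviation) for the FDBIT models
fdbit = {
    "(AGE)^1.5": (0.123, 0.034),
    "TH": (0.106, 0.041),
    "exp[dSN]": (-9.329, 4.240),
    "PSE": (0.374, 0.107),
    "DL1": (5.89, 2.979),
    "DL2": (6.04, 2.876),
    "DL3": (6.47, 2.424),
}

for name, (b, s) in fdbit.items():
    t = t_statistic(b, s)
    print(f"{name:10s} t = {t:6.3f} significant: {is_significant(t)}")
```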
Standard Error of the Residuals (Se)
The standard error of the residuals, Se, is a basic measure
of regression model performance. The standard error
(or the standard deviation) of the residuals is simply the
square root of the residual variance, Se². The lower the Se,
the closer the predictions made by the model are to the
actual observations of the dependent variable, and
therefore, the better the model. The posterior residual
variance in Bayesian regression is calculated by combining the
prior and experimental-data residual variances, each weighted
by its degrees of freedom, and adding two further terms that
account for the deviation of the posterior regression
coefficients from the experimental coefficients and from the
prior regression coefficients; the sum is divided by the
posterior degrees of freedom (Kaweski and Nickeson, 1997):
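$$S_{e,\mathrm{pos}}^2 = \frac{(v_{\mathrm{pr}} S_{e,\mathrm{pr}}^2 + v S_e^2) + (b - b_{\mathrm{pos}})^t H (b - b_{\mathrm{pos}}) + (b_{\mathrm{pr}} - b_{\mathrm{pos}})^t A (b_{\mathrm{pr}} - b_{\mathrm{pos}})}{v_{\mathrm{pos}}} \tag{24}$$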
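A minimal sketch of Eq. (24) in plain Python follows. The matrices H and A are passed in as given, since this section does not define them; in the usual conjugate formulation they would be the data and prior weighting (precision-type) matrices, but that reading is an assumption here. The posterior degrees of freedom are also taken as an input rather than derived. The function names are hypothetical.

```python
def quad_form(d, M):
    """Return d^t M d for a vector d and a square matrix M (nested lists)."""
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def posterior_residual_variance(v_pr, s2_pr, v, s2, b, b_pr, b_pos, H, A, v_pos):
    """Eq. (24): posterior residual variance.

    v_pr, s2_pr : prior degrees of freedom and residual variance
    v, s2       : data degrees of freedom and residual variance
    b, b_pr, b_pos : data, prior and posterior coefficient vectors
    H, A        : weighting matrices of Eq. (24) (assumed supplied by the user)
    v_pos       : posterior degrees of freedom (assumed supplied by the user)
    """
    d_data = [x - y for x, y in zip(b, b_pos)]
    d_prior = [x - y for x, y in zip(b_pr, b_pos)]
    numerator = (v_pr * s2_pr + v * s2
                 + quad_form(d_data, H)
                 + quad_form(d_prior, A))
    return numerator / v_pos
```

The two quadratic forms vanish when the data, prior and posterior coefficients coincide, in which case the posterior variance is simply the degrees-of-freedom-weighted pooling of the prior and data variances.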
Under the assumptions of the regression analysis, the
residual has a mean of zero and is normally distributed.
FIGURE 1 Comparison of normal probability plots for age.
TABLE III Standard deviation and t-statistic of the posterior coefficients
Pavement type   Variable      Std. deviation   t-value   Res. Var. (Se²)
FDBIT           (Age)^1.5     0.034            3.620     0.32
                Thickness     0.041            2.547
                Exp[Δ(SN)]    4.240            −2.200
                PSE           0.107            3.486
                DL1           2.979            1.98
                DL2           2.876            2.101
                DL3           2.424            2.670
PDBIT           (Age)^1.5     0.008            2.349     0.20
                Exp[Δ(SN)]    0.500            −3.746
                PSE           0.038            7.850
                DL1           0.196            1.990
                DL2           0.383            2.301
                DL3           0.466            4.234
Table III gives residual variances of 0.32 and 0.20 for the
FDBIT and PDBIT models, which correspond to standard errors,
Se, of 0.56 and 0.44, respectively. Therefore, the selected
models in this study will predict the ΔPSE values within
±1.1 units of actual ratings for FDBIT and ±0.88 units for
PDBIT pavements (about two standard errors)
with 95% confidence.
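The arithmetic behind these bands can be checked with a short sketch. It assumes the 95% band is taken as roughly two standard errors, which is consistent with the reported ±1.1 and ±0.88 but is not stated explicitly in the paper; results differ slightly from the quoted figures because the tabulated variances are rounded.

```python
import math


def standard_error(residual_variance):
    """Se is the square root of the residual variance."""
    return math.sqrt(residual_variance)


def approx_95_band(residual_variance, multiplier=2.0):
    """Half-width of an approximate 95% band (assumed to be 2 * Se)."""
    return multiplier * standard_error(residual_variance)


for name, s2 in {"FDBIT": 0.32, "PDBIT": 0.20}.items():
    print(name, round(standard_error(s2), 2), round(approx_95_band(s2), 2))
```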
RESULTS AND DISCUSSION
Prediction of ΔPSE Values Using Proposed Models
As mentioned earlier, data from 1993, 1994 and 1995 were
used in the regression analysis. The statistical tests
performed on the models (t-statistics and standard error of
the residuals) yielded satisfactory results. To test the
models, data were selected from a different set of
control sections collected in different years. These control
sections were not included in the regression analysis. For
1996, 12 FDBIT and 26 PDBIT sections, and for 1997, 10 FDBIT
and 19 PDBIT sections, were chosen randomly to test the models
developed in this study. Both classical and Bayesian
regression models were used to predict the ΔPSE values
on these control sections. At the same time, the rated
decreases in the PSE values assigned by the KDOT
engineers were also collected. Figures 2–5 compare the
predicted and rated ΔPSE values graphically.
FIGURE 2 Comparison of rated and predicted Δ(PSE) values (FDBIT pavements, 1996 data).
FIGURE 3 Comparison of rated and predicted Δ(PSE) values (FDBIT pavements, 1997 data).
The PSE values are always assigned by the districts as
integers. Because neither the coefficients of the regression
equations nor the independent variables are integers,
the outputs from the models are decimals.
The predicted ΔPSE values were therefore
rounded up or down to the nearest integer to mimic
district ratings. The figures indicate that the predicted
ΔPSE values closely approximate the rated ΔPSE
values on most of the control sections. In a few cases,
discrepancies were observed in the KDOT district
ratings. For example, on Route K-68 (Project No. 14 in
Fig. 4), the PSE rating has increased by 2 although no
rehabilitation action has been taken on this pavement
since the last action four years ago. On the other hand,
both the Bayesian and classical regression models
suggest that the PSE value should decrease by 2.
Similarly, other discrepancies in the present rating
system were rationally and objectively addressed by the
selected models.
Range of the Independent Variables
As with all regression equations, there is a range of each
independent variable for which the selected models are
expected to predict the dependent variable with sufficient
accuracy. The prediction interval band will be wider
outside that range, and it is statistically unsound to
extrapolate the models in those cases. The suggested ranges
of the independent variables are:
1. Age since last rehabilitation action: 1–18 years,
2. Thickness: 100–760 mm (4–30 in),
3. PSE rating in the base year: 2–10,
4. Decrease in structural number, ΔSN: 0.001–2.5,
5. Distress level due to transverse cracking: 1–3.
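The suggested ranges lend themselves to a simple guard that flags inputs falling outside the calibration range before a model is applied. The sketch below is one possible way to do this; the dictionary keys and the function name are hypothetical, and thickness is checked in millimetres.

```python
# Suggested ranges of the independent variables, as listed above.
RANGES = {
    "age_years": (1, 18),
    "thickness_mm": (100, 760),
    "pse_base_year": (2, 10),
    "delta_sn": (0.001, 2.5),
    "distress_level": (1, 3),
}


def out_of_range(inputs):
    """Return the names of inputs that fall outside the suggested ranges."""
    flagged = []
    for name, value in inputs.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not (low <= value <= high):
                flagged.append(name)
    return flagged


section = {"age_years": 20, "thickness_mm": 300, "pse_base_year": 6,
           "delta_sn": 0.4, "distress_level": 2}
print(out_of_range(section))  # the 20-year age exceeds the 1-18 year range
```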
FIGURE 4 Comparison of rated and predicted Δ(PSE) values (PDBIT pavements, 1996 data).
FIGURE 5 Comparison of rated and predicted Δ(PSE) values (PDBIT pavements, 1997 data).
Paired t-test Results
Paired t-tests were performed to determine whether data
from two different sources have the same mean, that is,
whether they are statistically similar (Ott, 1993). Rated
decreases in the PSE values were compared with the
predicted decreases derived from both classical and
Bayesian regression models. The results of these t-tests
are tabulated in Table IV. For both FDBIT and PDBIT
pavements and for all regression models, the absolute
value of t is below the two-tailed critical value, so there
was no significant difference
between the two sets of data. From the sum of the
squared errors, it can be concluded that for the FDBIT
pavements, the Bayesian and classical regression models
yielded similar results (16.74 versus 16.87), while for the
PDBIT pavements, the Bayesian regression models appeared to
be slightly more accurate (7.78 versus 12.41).
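The comparison reported in Table IV can be sketched as follows: a paired t-statistic on the differences between rated and predicted values, plus the sum of squared errors. The rated and predicted lists below are hypothetical placeholders, not the study data, and the critical value is passed in (for example, from Table IV) rather than computed.

```python
import math
import statistics


def paired_t(rated, predicted):
    """Return (t, sum of squared errors) for paired observations."""
    diffs = [r - p for r, p in zip(rated, predicted)]
    n = len(diffs)
    # t = mean difference / standard error of the mean difference
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    sse = sum(d * d for d in diffs)
    return t, sse


def no_significant_difference(t, t_crit):
    """Two-tailed decision: the means are not significantly different."""
    return abs(t) < t_crit


# Hypothetical rated and predicted decreases in PSE for four sections.
rated = [2, 1, 3, 2]
predicted = [1, 1, 2, 3]
t, sse = paired_t(rated, predicted)
print(round(t, 3), sse)
```

Applied to the values in Table IV, `no_significant_difference(-1.46, 2.079)` and `no_significant_difference(-1.93, 2.015)` both return `True`, matching the conclusion that rated and predicted values do not differ significantly.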
CONCLUSIONS
1. The current PSE rating system used by KDOT is
subjective, and discrepancies were observed in the
rating process. The regression models proposed in this
study predict the PSE values by taking into account the
FWD deflection data, age, thickness, and distress level
of pavements; hence, the model outputs are
representative of the actual structural condition of the
pavement sections. Independent tests showed that the
proposed models closely approximate the present
PSE ratings obtained at the KDOT district level.
2. The models obtained from the classical and Bayesian
regression methodologies were very similar in form
and yielded statistically similar results when tested on
a different set of pavements. Both models appear to be
statistically sound from the viewpoints of prediction
capability and model utility. The Bayesian regression
models yielded slightly better results during testing. It
should be noted that Bayesian regression is a
continuous process of updating the existing “partial
state of knowledge.” As the existing database is
enriched with more data, the Bayesian regression will
result in a posterior with an even smaller confidence
interval. Hence, it is highly recommended that the
existing models be updated every other year with more
recent data. Also, the models are applicable to Kansas
conditions only. However, this study illustrates
how such models can be developed for other states or
regions.
Acknowledgements
The authors would like to thank VEMAX Management,
Inc. and C-SHRP for providing the XLBayes software
used in this study. Assistance of Ms Lea Ann Caffrey,
formerly with KDOT, in data collection is gratefully
acknowledged.
References
AASHTO (1986) Guide for Design of Pavement Structures (American Association of State Highway and Transportation Officials, Washington, DC).
AASHTO (1993) Guide for Design of Pavement Structures (American Association of State Highway and Transportation Officials, Washington, DC).
Chowdhury, T. (1998) Bayesian regression methodology for network level pavement project rating, M.S. Thesis, Department of Civil Engineering, Kansas State University, Manhattan.
Haas, R., Hudson, R.W. and Zaniewski, J.P. (1994) Modern Pavement Management, Krieger Publishing Co., Malabar, FL, pp 161–165.
Hossain, M., Chowdhury, T. and Gisi, A.J. (2000) Network-Level Pavement Structural Evaluation, The ASTM Journal of Testing and Evaluation, May.
Jackson, N. and Mahoney, J. (1991) Course Notes, Chapter 6, An Advanced Course in Pavement Management Systems, Federal Highway Administration, Washington, DC, pp 9–10.
KDOT (1984) Development of a Highway Improvement Priority System for Kansas, Division of Planning and Development, Kansas Department of Transportation, Topeka.
KDOT (1996) Kansas NOS Condition Survey Report, Bureau of Materials and Research, Kansas Department of Transportation, Topeka, Attachments I and II.
KDOT (1998) CANSYS: Control Section Analysis System, Division of Planning and Development, Kansas Department of Transportation, Topeka.
Kaweski, D. and Nickeson, M. (1997) C-SHRP Bayesian Modeling: A User’s Guide, Transportation Association of Canada, Ottawa.
Kurlanda, M.H. and Kajner, L. (1996) Predicting Roughness Progression of Asphalt Overlays, Transportation Research Board, Washington, DC, Record 1539, pp 125–131.
Nesbitt, D. and Sparks, G. (1990) Design of Long Term Pavement Monitoring System for the Canadian Strategic Highway Research Program, Canadian Strategic Highway Research Program, Ottawa, Canada.
Ott, R.L. (1993) An Introduction to Statistical Methods and Data Analysis, Duxbury Press, Belmont, CA.
Paterson, W.D.O. (1987) Road Deterioration and Maintenance Effects: Models for Planning and Management, The Johns Hopkins University Press, Maryland and London, Published for the World Bank.
Press, S. (1989) Bayesian Statistics: Principles, Models and Applications, Wiley, New York.
Smith, W., Finn, F., Kulkarni, R., Saraf, C. and Nair, K. (1979) Bayesian Methodology for Verifying Recommendations to Minimize Asphalt Pavement Distress, Transportation Research Board, Washington, DC, NCHRP Report No. 213.
Zellner, A. (1987) An Introduction to Bayesian Inference in Econometrics, Krieger Publishing Co., Malabar, Florida.
TABLE IV Results of paired t-tests
Pavement type   Statistic            Bayesian   Classical
FDBIT           tcrit (two tail)     2.079      2.079
                t                    −1.46      −1.89
                Sum of sq. err.      16.74      16.87
PDBIT           tcrit (two tail)     2.015      2.015
                t                    −1.39      −1.93
                Sum of sq. err.      7.78       12.41