bayesian regression methodology for project prioritization


International Journal of Pavement Engineering. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/gpav20

Bayesian Regression Methodology for Project Prioritization
Mustaque Hossain (a), Tanveer Chowdhury (b) & Andrew J. Gisi (c)
(a) Department of Civil Engineering, Kansas State University, Manhattan, KS, 66506, USA
(b) Pavement Management Engineer, Virginia Department of Transportation, Richmond District, Colonial Heights, VA, 23834, USA
(c) Kansas Department of Transportation, Bureau of Materials and Research, 2300 Van Buren, Topeka, KS, 66611, USA
Published online: 27 Oct 2010.

To cite this article: Mustaque Hossain, Tanveer Chowdhury & Andrew J. Gisi (2002) Bayesian Regression Methodology for Project Prioritization, International Journal of Pavement Engineering, 3:3, 185-195, DOI: 10.1080/1029843021000067836

To link to this article: http://dx.doi.org/10.1080/1029843021000067836


Bayesian Regression Methodology for Project Prioritization

MUSTAQUE HOSSAIN (a,*), TANVEER CHOWDHURY (b) and ANDREW J. GISI (c)

(a) Department of Civil Engineering, Kansas State University, Manhattan, KS 66506, USA; (b) Pavement Management Engineer, Richmond District, Virginia Department of Transportation, Colonial Heights, VA 23834, USA; (c) Kansas Department of Transportation, Bureau of Materials and Research, 2300 Van Buren, Topeka, KS 66611, USA

(Received 2 June 2001; Revised 16 July 2002)

In highway infrastructure planning, results from pavement evaluation need to be aggregated into a composite or combined measure of quality for project selection at the network level. In the project prioritization process of the Kansas Department of Transportation (KDOT), a pavement structural rating attribute, Pavement Structural Evaluation (PSE), is used. Currently these ratings are assigned subjectively based on the condition of the pavement indicated by the visual distresses, maintenance history, and engineering judgement, because KDOT does not collect any deflection data during the network-level distress survey. This paper describes the application of classical and Bayesian regression methodologies for better estimation of PSE values using the results from the Falling Weight Deflectometer (FWD) tests and the network-level distress survey for project prioritization purposes.

Keywords: Pavement; Pavement management system; Pavement evaluation; Bayesian regression; FWD

INTRODUCTION

Pavements are the most costly elements in the highway infrastructure system (AASHTO, 1986). A World Bank study in 1987 estimated that one quarter of the paved roads outside urban areas in developing countries were in need of reconstruction, and that an additional 40% of paved roads required strengthening then or in the next few years (Paterson, 1987). Similar situations have arisen in developed countries to varying degrees since the 1980s. For example, accelerated deterioration of the federally-aided highways in the United States (U.S.) required a 44% increase in funding in 1982 to meet the repair and rehabilitation costs of the system (NAPA, 1998). It is well accepted that the interstate highway system in the U.S. is deteriorating fast. The system already carries two and a half times the traffic it did in 1975, and congestion is still increasing. In the past seven years, highway capacity has grown 2% while traffic has increased by 37% (NAPA, 1998). In May 1998, Congress passed the Transportation Equity Act for the 21st Century (TEA-21), the six-year $216 billion highway bill for roads, bridges and mass transit. Until the year 2003, the bill is believed to guarantee that all incoming revenues to the Highway Trust Fund can only be used for highway and mass transit investments. It is also believed that even if the entire $216 billion were spent on repairing interstate highways, it would not be enough to restore, upgrade, and maintain them (NAPA, 1998).

Such projections at the international and national levels exemplify the problems facing highway planners, financiers, managers and engineers everywhere, at national or local levels, to varying degrees. The problem concerns deterioration of an aging highway infrastructure and how best to control it, taking into account the best interests and constraints of the economy and resources. Largely because of the worldwide need for extensive rehabilitation programs in the 1980s and 1990s, and in order to avoid such sharp peaks in highway expenditure, increasing efforts are being made to develop and implement improved road management and planning tools. These tools are required for evaluating the financial needs of road maintenance and rehabilitation programs, for evaluating the design and maintenance standards appropriate for the funding available to the highway sector, and for planning and prioritizing works in the program (Paterson, 1987). All such planning and projections depend upon an evaluation of the road network that is consistent across the network and over time.

ISSN 1029-8436 print/ISSN 1477-268X online © 2002 Taylor & Francis Ltd

DOI: 10.1080/1029843021000067836

*Corresponding author. E-mail: [email protected]

The International Journal of Pavement Engineering, 2002 Vol. 3 (3), pp. 185–195


Pavement evaluation in pavement management systems (PMS) is generally directed toward the following objectives (Haas et al., 1994):

1. Selection of projects and treatment strategies at the network level.

2. Identification of specific maintenance requirements at the project level.

Each of these objectives requires pavement evaluation information to greater or lesser degrees of detail. Where less detail is needed, the individual measures comprising the information are commonly aggregated into a composite or combined measure of pavement quality. Such a combined measure for each pavement section is helpful at the network level for technical decisions, e.g. project prioritization.

PAVEMENT STRUCTURAL EVALUATION (PSE)

In the priority ranking procedure of the Kansas Department of Transportation (KDOT), a measure of pavement structural quality, PSE, is used (KDOT, 1984). Pavements are rated by “control section” on a scale of 0–10 for PSE, with 10 being the best (no work required). A control section is the basic reporting, identifying and analysis unit for KDOT (KDOT, 1998). It is defined as a segment of roadway with reasonably uniform geometric, traffic, surface and base characteristics for its entire length. These sections are also used for project prioritization. In the ranking process during prioritization of candidate roadway projects for reconstruction, PSE is expected to be an indicator of structural deficiency of the control section. The other major attribute is observed surface condition. Currently, the relative weight of the PSE attribute in the interstate roadway priority formula is 0.208, twice that of the next most heavily weighted attribute, observed condition (weight = 0.104). The other attributes in the formula are rideability, lane width, shoulder width, volume to capacity ratio, commercial traffic, number of narrow structures per mile, etc. The PSE rating attribute for non-interstate roadways also has a very high weight.

Annual PSE ratings are furnished by the district offices of KDOT and are based on the condition and strength of the base and surface as indicated by the maintenance history, visual distresses, subgrade failures, and the judged ability of the section to provide an adequate surface for the type of traffic. The Geotechnical unit of KDOT provides a possible range of PSE values for the control sections based on algorithms developed by experts using the distress survey data. However, these values did not appear to be helpful to the districts and, in some cases, led to confusion. Discrepancies were also observed in the annual PSE ratings on the same route across district lines. Since KDOT does not collect any deflection data during the network-level distress survey, the PSE computation process does not directly take into account any structural evaluation. However, some of the distresses, such as fatigue cracking, are considered structure related.

OBJECTIVE

The primary objective of this study was to derive an improved methodology to estimate the PSE values from the Falling Weight Deflectometer (FWD) tests and the network-level distress survey results.

STUDY APPROACH

A number of variables were investigated as potential predictors of the PSE values. Multiple linear regression models were developed with these independent variables to objectively quantify the decrease of the PSE value. The Bayesian regression methodology (Smith et al., 1979; Jackson and Mahoney, 1991; Kaweski and Nickeson, 1997) was also used to develop regression models with the same variables as the classical multiple linear regression analysis. Finally, models developed by the classical and Bayesian regression methodologies were tested on a different set of pavements in a different KDOT District, and appropriate models were recommended for global use on the KDOT network.

DATA COLLECTION

KDOT maintains two types of asphalt pavements: Full-Design and Partial-Design Bituminous Pavements. Full-Design Bituminous (FDBIT) pavements were designed for current and projected traffic, and usually carry heavier traffic than the Partial-Design Bituminous (PDBIT) pavements. PDBIT pavements resulted from the paving and maintenance of the farm-to-market roads in the 1940s and 1950s. The structural behavior of these two types of pavements is different and hence, they were analyzed separately. Forty-two FDBIT and 112 PDBIT “control” sections on the non-interstate routes of Districts I and IV of KDOT were selected in this study. The sections ranged in length from at least three kilometers to several kilometers.

Deflection data were collected on the study sections by a Dynatest 8000 FWD in 1993, 1994 and 1995. Tests were also done in District II in 1996 and 1997. Ten FWD tests per 1.6 km (1 mile) were performed on the outer wheel path of the travel lane. Other information, e.g. thickness history, age and cumulative ESALs since the last rehabilitation action, number of transverse cracks per roadway mile, and PSE values assigned by the districts for the selected sections, was collected from the PMS database of KDOT.

FWD DATA ANALYSIS

The structural capacity of flexible pavements declines with time and traffic. In the 1993 AASHTO Design Guide, the structural capacity for existing flexible pavement is expressed as the effective structural number, SNeff (AASHTO, 1993). The primary objective of structural evaluation is to determine the effective structural capacity of existing pavements. The evaluation process must consider the current condition of the existing pavement materials, and also consider how those materials will behave in the future. In this study, the approach suggested in the 1993 AASHTO Guide was followed to calculate the effective structural number of the control sections. The FWD first sensor deflection was normalized to a 40 kN (9000 lb) load and was corrected to a temperature of 20°C (68°F). The FWD deflections were then used to backcalculate the subgrade resilient modulus (Mr). The effective pavement modulus (Ep) value was determined from the equation suggested by AASHTO. Details of the calculations can be found elsewhere (Chowdhury, 1998). Once the Ep value was determined, the effective structural number could be calculated by the following formula provided by AASHTO (1993):

$$SN_{eff} = 0.0045 \times D \times (E_p)^{1/3} \tag{1}$$

where D is the total thickness of the pavement above subgrade.
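As a minimal sketch of Eq. (1), the calculation can be written in Python. The units are an assumption taken from the AASHTO (1993) convention (D in inches, Ep in psi), since they are not stated above; the function name and the example section are hypothetical and are not taken from the study data.

```python
def effective_structural_number(total_thickness_in: float, ep_psi: float) -> float:
    """Eq. (1): SNeff = 0.0045 * D * Ep^(1/3).

    Assumed units (AASHTO 1993 convention): D in inches, Ep in psi.
    """
    if total_thickness_in <= 0 or ep_psi <= 0:
        raise ValueError("thickness and modulus must be positive")
    return 0.0045 * total_thickness_in * ep_psi ** (1.0 / 3.0)


# Hypothetical section: 12 in. of pavement above subgrade, Ep = 64,000 psi.
sn_eff = effective_structural_number(12.0, 64_000.0)
print(f"SNeff = {sn_eff:.2f}")
```

Because SNeff grows only with the cube root of Ep, doubling the effective modulus raises SNeff by about 26%, whereas doubling the thickness doubles it.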

ASSESSMENT OF PAVEMENT STRUCTURAL

DETERIORATION

The major factors contributing to the structural deterioration of asphalt pavements are traffic and climate. Three variables were selected to predict the decrease in structural number (ΔSN = Design SN − SNeff) to assess structural deterioration at the network level:

1. Age (in years) of the pavements since the last rehabilitation action,

2. Cumulative number of ESALs that have passed over the pavement since the last rehabilitation action, and

3. Thickness (in inches) of the pavements.

Different models were developed for the FDBIT and PDBIT pavements since the structural behavior of these pavements is different. Simple linear regression analysis showed that the decrease in structural number was highly correlated with age, cumulative number of ESALs and thickness for the FDBIT pavements, and with age and cumulative ESALs for the PDBIT pavements. However, the variables age and cumulative ESALs were found to be highly correlated with each other. Thus, only the variable age was retained in the analysis, since this parameter is more accurate than the calculated ESALs. It should be noted that for the secondary roadways studied in this project, traffic growth is very uniform. The following models were developed based on the multiple linear regression analysis. More details of the statistical analyses can be found elsewhere (Hossain et al., 2000).

FDBIT Pavements:

$$\Delta SN = 0.0218 \times \text{age} + 0.001 \times \text{thickness} \quad (R^2 = 0.81) \tag{2}$$

PDBIT Pavements:

$$\Delta SN = 0.0166 \times \text{age} \quad (R^2 = 0.72) \tag{3}$$
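The two network-level models in Eqs. (2) and (3) can be sketched as a small Python helper. The function name, the argument names and the example values are hypothetical; only the coefficients come from the equations above, with age in years and thickness in inches as listed.

```python
def delta_sn(pavement_type: str, age_years: float, thickness_in: float = 0.0) -> float:
    """Predicted decrease in structural number (Design SN - SNeff).

    Coefficients are those of Eqs. (2) and (3); thickness is used only
    by the FDBIT model.
    """
    if pavement_type == "FDBIT":
        return 0.0218 * age_years + 0.001 * thickness_in   # Eq. (2)
    if pavement_type == "PDBIT":
        return 0.0166 * age_years                          # Eq. (3)
    raise ValueError("pavement_type must be 'FDBIT' or 'PDBIT'")


# Hypothetical sections, both 10 years after the last rehabilitation action.
fdbit_loss = delta_sn("FDBIT", age_years=10.0, thickness_in=12.0)
pdbit_loss = delta_sn("PDBIT", age_years=10.0)
print(f"FDBIT: {fdbit_loss:.3f}, PDBIT: {pdbit_loss:.3f}")
```

In both models the age term dominates: for the hypothetical FDBIT section, age accounts for 0.218 of the predicted loss and thickness for only 0.012.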

CLASSICAL REGRESSION ANALYSIS TO PREDICT ΔPSE

The major objective of this research was to objectively and quantitatively determine the PSE values of the pavements since the last rehabilitation action. However, the decrease in PSE value was taken as the dependent variable because it somewhat represents a “normalized” value. Classical multiple regression analysis was performed to estimate the decrease in the PSE (ΔPSE) values. One of the most important aspects of classical regression analysis is the selection of independent variables that are strong indicators of the dependent variable. The selection was done in two steps (Ott, 1993):

1. Enumerating the independent variables, and

2. Evaluating and selecting independent variables subjectively or by analyzing correlation.

Selection of Independent Variables for Prediction of Change in PSE Values (ΔPSE)

An extensive literature search was done to select the independent variables to predict the decrease in PSE values. Expert opinion was also sought for this purpose. Since PSE ratings are based on the condition of the base and surface, as indicated by the maintenance costs, subgrade failures, and ability of the section to provide an adequate surface for the prevailing traffic, the following variables were selected to reflect those conditions:

(1) Age of the pavement since the last rehabilitation action (in years) (AGE),

(2) Cumulative ESALs that have passed over the pavement since the last action,

(3) AC layer thickness (in inches) (TH),

(4) PSE value assigned to the pavement immediately after the last action (PSE),

(5) Decrease in structural number (ΔSN), and

(6) Distress level due to transverse cracking (DL).

The selected variables were plotted on scatter plots against the dependent variable, ΔPSE, and were inspected for possible trends. Correlation coefficients for different pairs were also determined. It was apparent from the scatter plots that age and ΔSN were not linearly related to ΔPSE. In the case of age, the rationale is that PSE values do not decrease at the same rate with time. During the initial years this rate is lower, but after a certain period, the PSE values start to decrease drastically. A trial-and-error approach was followed to determine the transformed functional form for an independent variable (Chowdhury, 1998). After several trials, the variable age was transformed to (age)^1.5. For the relationship between the dependent variable, ΔPSE, and the independent variable, age, the Pearson's correlation coefficient improved from 0.35 to 0.68 for the FDBIT and from 0.39 to 0.56 for the PDBIT pavements when the transformation was performed. Similarly, the variable decrease in structural number, ΔSN, was transformed to exp(ΔSN) to improve the correlation coefficient of the relationship from 0.49 to 0.61 for the FDBIT, and from 0.48 to 0.55 for the PDBIT pavements. The variable AC layer thickness was dropped from the PDBIT model as a predictor since the thickness of this type of pavement was not designed to carry the expected traffic. The variables age and cumulative ESALs are highly correlated with each other (correlation coefficient of 0.65 for FDBIT and 0.58 for PDBIT). Therefore, only one of them (age) was included in the model to avoid possible multicollinearity or overspecification of the model (Chowdhury, 1998).
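The effect of such a transformation on Pearson's correlation coefficient can be illustrated with a short Python sketch. The data below are synthetic and noise-free, generated only to show the mechanism; they are not the KDOT observations, and the resulting coefficients do not reproduce the values reported above.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic example: a rating loss that accelerates with age (illustrative only).
ages = [1, 2, 4, 6, 8, 10, 12]
delta_pse = [0.05 * a ** 1.5 for a in ages]

r_raw = pearson_r(ages, delta_pse)
r_transformed = pearson_r([a ** 1.5 for a in ages], delta_pse)
print(f"r with age: {r_raw:.4f}, r with (age)^1.5: {r_transformed:.4f}")
```

With real, noisy data the improvement is smaller and the best exponent has to be found by trial and error, as described above.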

Transverse cracking was included in the model as a binary (0 or 1) variable. Rutting was not included since it is believed that the amount of rutting on these sections (mostly below 12.5 mm) did not influence the PSE ratings by KDOT. Transverse cracking on the pavements in Kansas is measured by the number of equivalent roadway-width cracks. According to the KDOT PMS rating guide (KDOT, 1996), the crack severity is categorized using three severity codes:

Code 1 or DL1: No roughness, 6 mm (0.25 in.) or wider with no secondary cracking; or any width with secondary cracking less than 1.2 m (4 ft) per lane.

Code 2 or DL2: Any width crack with noticeable roughness due to depression or bump. Also includes cracks that have greater than 1.2 m (4 ft) of secondary cracking but no roughness.

Code 3 or DL3: Any width crack with significant roughness due to depression or bump. Secondary cracking will be more severe than code 2.

Different combinations of the coded cracks will result in different distress levels due to transverse cracking (KDOT, 1996).

MODEL SELECTION

Criteria Used to Select a Model

The following criteria were used to select a model:

i) Minimize the mean square error (MSE): The smallest MSE will result in the narrowest confidence intervals and largest test statistics. The model with the smallest MSE involving the least number of independent variables can generally be considered as the best model (Ott, 1993).

ii) Maximize the Coefficient of Determination (R²): R² is a measure of how well the estimated model fits the observed data. The best model selected is generally the one with the largest R².

iii) Minimum increase of R²: The best model is the one for which the addition of an extra variable produces only a small further increase in R².

iv) Mallows Cp statistic: The best model is usually thought to have a Cp value closest to p, where p is the number of regression coefficients. Models associated with Cp greater than p are usually thought to be biased or misspecified models (Ott, 1993).
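The criteria above can be computed from the sums of squares of a candidate model, as in the following Python sketch. The formulas are the standard textbook definitions, which are assumed here because the text does not state them: MSE = SSE/(n − p), R² = 1 − SSE/SST, and Cp = SSE/MSE_full − (n − 2p), where MSE_full is the mean square error of the model containing all candidate variables. The function name and the numerical inputs are hypothetical.

```python
def fit_statistics(sse: float, sst: float, n: int, p: int, mse_full: float):
    """Return (MSE, R^2, Mallows Cp) for a candidate model.

    sse      -- residual sum of squares of the candidate model
    sst      -- total sum of squares of the dependent variable
    n        -- number of observations
    p        -- number of regression coefficients in the candidate model
    mse_full -- MSE of the model that contains all candidate variables
    """
    mse = sse / (n - p)
    r_squared = 1.0 - sse / sst
    mallows_cp = sse / mse_full - (n - 2 * p)
    return mse, r_squared, mallows_cp


# Hypothetical values for a model with six coefficients fitted to 38 sections.
mse, r_squared, cp = fit_statistics(sse=8.0, sst=40.0, n=38, p=6, mse_full=0.25)
print(f"MSE = {mse:.3f}, R2 = {r_squared:.2f}, Cp = {cp:.1f}")
```

In this example the candidate is itself the full model, so Cp equals p exactly; a reduced model that omits an important variable would show Cp noticeably larger than its own p.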

Models Obtained

FDBIT Pavements: Detailed analyses and summary statistics of the model development can be found elsewhere (Chowdhury, 1998). For FDBIT pavements, the selected models are:

Distress Level 1

$$\Delta PSE = 0.216 \times (AGE)^{1.5} - 20.82 \times \exp[\Delta SN] + 0.138 \times TH + 0.328 \times PSE + 17.65 \times DL1 \tag{4}$$

Distress Level 2

$$\Delta PSE = \ldots + 0.138 \times TH + 0.328 \times PSE + 18.06 \times DL2 \tag{5}$$

Distress Level 3

$$\Delta PSE = \ldots + 0.138 \times TH + 0.328 \times PSE + 18.38 \times DL3 \tag{6}$$

[R² = 78.4%; RMSE = 0.48; n = 38]

The statistical p-values for the parameters imply that all variables are significant at a confidence level of more than 95%. The Analysis of Variance (ANOVA) results showed that the model had an F-value of 37 and its significance value was 0.0001. Since the selected model had a high F-value and a very low p-value, it satisfied the model selection criteria mentioned earlier. Also, the estimated root mean square error (σ) value for the model was 0.47, which indicates that the selected model would predict the decrease in PSE values within a variability of ±2σ, or ±0.94, with a confidence of approximately 95%.
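A prediction with the FDBIT Distress Level 1 model, Eq. (4), together with the ±2σ band, can be sketched in Python as follows. Only Eq. (4) is coded, using the coefficients and the σ of 0.47 quoted in the text; the function name and the example section are hypothetical.

```python
import math

SIGMA_FDBIT = 0.47  # root mean square error quoted in the text


def delta_pse_fdbit_dl1(age_years, delta_sn, ac_thickness_in, initial_pse, dl1=1):
    """Eq. (4): predicted decrease in PSE for an FDBIT section at distress level 1.

    dl1 is the binary (0 or 1) transverse-cracking variable.
    """
    return (0.216 * age_years ** 1.5
            - 20.82 * math.exp(delta_sn)
            + 0.138 * ac_thickness_in
            + 0.328 * initial_pse
            + 17.65 * dl1)


# Hypothetical section: 10 years old, delta SN = 0.23, 12 in. of AC, initial PSE = 9.
predicted = delta_pse_fdbit_dl1(10.0, 0.23, 12.0, 9.0)
band = (predicted - 2 * SIGMA_FDBIT, predicted + 2 * SIGMA_FDBIT)
print(f"Predicted decrease in PSE: {predicted:.2f} "
      f"(band {band[0]:.2f} to {band[1]:.2f})")
```

The current PSE estimate would then be the initial PSE minus the predicted decrease, which is an interpretation of the ΔPSE definition rather than a step spelled out in the text.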

PDBIT Pavements: The selected models are:

Distress Level 1

$$\Delta PSE = \ldots + 0.171 \times PSE + 0.229 \times DL1 \tag{7}$$

Distress Level 2

$$\Delta PSE = \ldots + 0.171 \times PSE + 0.958 \times DL2 \tag{8}$$

Distress Level 3

$$\Delta PSE = \ldots + 0.171 \times PSE + 2.27 \times DL3 \tag{9}$$

[R² = 86.6%; RMSE = 0.42; n = 104]

Again, the p-values for the parameters indicated that all the variables were significant at a confidence level of more than 95%. The ANOVA results showed that the model had an F-value of 132 and its significance value was 0.0001 (Chowdhury, 1998). Thus, these models met the model selection criteria discussed earlier.

Table I shows the individual contribution of the variables used to develop the above models. The results show that the variable exp[ΔSN] had the highest contribution to the models for the FDBIT pavements. Also, for these pavements, initial PSE and the transformed age variable, (age)^1.5, had similar contributions. For the PDBIT pavements, initial PSE had the highest contribution. Distress level also contributed significantly for these pavements.

BAYESIAN REGRESSION ANALYSIS

It is well established that prediction equations are very important tools for the PMS. However, databases to support development and updating of these models are often inadequate, noisy or altogether lacking. Conventional statistical modeling tools, such as classical regression analysis, may have limited success in these applications (Kurlanda and Kajner, 1996). A promising solution lies in the use of Bayesian regression, which explicitly allows expert judgement to supplement poor quality data. Bayesian regression methodology was adopted by the Canadian Strategic Highway Research Program (C-SHRP) for the Canadian Long Term Pavement Performance (C-LTPP) monitoring program (Nesbitt and Sparks, 1990).

An Overview of the Bayesian Regression Approach

In its simplest sense, Bayesian regression is a specialized adaptation of Bayes' theorem involving the development of multivariate regression models which explicitly consider two disparate sources of information (Kaweski and Nickeson, 1997):

1. Prior information: Information that is known prior to an experiment.

2. Experimental data: Information that is derived from an experiment.

The interpretation and conclusions drawn from the experimental data can be quite different depending on what other evidence exists on the subject at hand. However, this difference in interpretation does not simply mean biasing a result. Interpretation of results using Bayes' theorem is a mathematically consistent way to interpret new evidence/information.

The Bayesian statistical method for model development systematically combines prior knowledge and experience with data to improve the predictive relationship (Smith et al., 1979; Jackson and Mahoney, 1991; Kaweski and Nickeson, 1997). Bayes' theorem yields a meaningful and credible answer without relying solely on a small database. In doing so, the Bayes technique allows decisions to be made in the short term while improvements to the data, judgement and the model continue to be made. Bayesian regression achieves a balance between the two solutions that would be obtained from the data or judgement alone. In assembling information for Bayesian regression, data collected in the traditional manner are supplemented with prior knowledge. The so-called “prior” may be drawn from expert judgement, “old” data sets, or knowledge that is generally accepted in the field. Expert judgement can also be encoded by polling experts and asking them to estimate the value of the dependent variable for a combination of contributory variables. Once collected, the experts' observations are treated in the same way as traditional data (Kaweski and Nickeson, 1997). The predicted variable can then be expressed as:

$$Y = \beta_{pr0} + \beta_{pr1} X_1 + \beta_{pr2} X_2 + \ldots + \beta_{prk} X_k + \varepsilon_{pr} \tag{10}$$

where:

$Y$ is the dependent variable,

$X_i$ are the predictor variables,

$k$ is the number of independent variables,

$\beta_{pri}$ are the regression coefficients for the prior, and

$\varepsilon_{pr}$ is the random error term for the prior, with the degrees of freedom given by

$$\nu_{pr} = n_{pr} - k - 1$$

TABLE I Individual contribution of different variables towards the models

Variables         FDBIT pavements               PDBIT pavements
                  Individual R² (%)   RMSE      Individual R² (%)   RMSE
Exp[Δ(SN)]        44.1                0.68      30.1                0.91
Initial PSE       42.4                0.69      50.2                0.77
(Age)^1.5         38.2                0.70      30.3                0.89
Distress level    22.4                0.79      39.6                0.82
Thickness         22.1                0.80      N.A.                N.A.

Developing Prior and Assembling Sample Data

The evidence/information known prior to collecting new data is known in Bayesian terminology as “the prior.” The prior can be derived either subjectively using expert judgement or objectively based on the existing data or models. Both approaches require that prior information be put into either an N-prior or G-prior format. Both the N-prior and the G-prior summarize a linear regression, which represents the prior state of knowledge in the Bayesian regression calculation. The prior includes the coefficients of the linear regression equation along with the corresponding regression statistics, such as the variance of the regression coefficients. The regression statistics indicate the certainty of the prior and are used to weigh the balance between the prior and the data in the Bayesian regression calculation. The N-prior uses the variance–covariance matrix for the prior in the Bayesian regression calculations (Kaweski and Nickeson, 1997). A list of information needed to define both the N-prior and the G-prior is provided in Table II.

The G-prior option is typically used when the coefficient ‘means’ have been estimated directly by experts. The G-prior derives the variance/covariance matrix for the coefficient means based on a set of independent variable data. The G-prior factor, denoted by g, is used to increase or decrease the influence of the prior in the calculation of the posterior. A typical value for g is 1, which essentially gives the prior variance/covariance matrix a weight equal to that for the experimental data. The greater the value of g, the more influence the prior will have on the posterior. Since the pseudo/prior data used in this research were not derived from expert opinion only, the N-prior option of Bayesian regression was used.

The precision matrix for the prior, A, is calculated for the N-prior as (Zellner, 1987; Press, 1989):

$$A = \left[ \mathrm{Var/Covar}_{pr} \left( \frac{\nu_{pr} - 2}{\nu_{pr} S_{e,pr}^2} \right) \right]^{-1} \tag{11}$$

where

$$S_{e,pr}^2 = \mathrm{var}(\varepsilon_{pr}) = \frac{\sum_{i=1}^{n_{pr}} \left( Y_{i,pr\text{-}actual} - Y_{i,pr\text{-}predicted} \right)^2}{\nu_{pr}}$$

and

$$\mathrm{Var/Covar}_{pr} = \begin{bmatrix} \sigma^2_{\beta_0 pr} & \gamma_{\beta_0 \beta_1 pr} & \cdots & \gamma_{\beta_0 \beta_k pr} \\ \gamma_{\beta_1 \beta_0 pr} & \sigma^2_{\beta_1 pr} & \cdots & \gamma_{\beta_1 \beta_k pr} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{\beta_k \beta_0 pr} & \gamma_{\beta_k \beta_1 pr} & \cdots & \sigma^2_{\beta_k pr} \end{bmatrix}$$

where $\sigma^2_{\beta_i}$ indicates the variance of coefficient $\beta_i$, and $\gamma_{\beta_i \beta_j}$ indicates the covariance between two coefficients $\beta_i$ and $\beta_j$.

With the experimental data, the experiment precision matrix, H, is (Zellner, 1987; Press, 1989):

$$H = X^t X \tag{12}$$

where X is the matrix of observed values, and $X^t$ denotes the transpose of matrix X. The coefficient vector estimated from the experimental data, $\beta$, is then:

$$\beta = H^{-1} X^t Y \tag{13}$$

where Y is the vector of the dependent data.

The posterior precision matrix, M, is calculated by adding the prior matrix to the experimental data matrix as:

$$M = A + H \tag{14}$$

The posterior regression coefficients are calculated using a weighted average of the prior regression coefficients and the regression coefficients estimated from the experimental data as:

$$\beta_{pos} = M^{-1} \left( A \beta_{pr} + H \beta \right) \tag{15}$$

The posterior degrees of freedom are calculated by adding the prior degrees of freedom and the experimental degrees of freedom, and adding the number of coefficients in the functional form (Zellner, 1987; Press, 1989):

$$\nu_{pos} = \nu_{pr} + \nu + k + 1 \tag{16}$$

The number “1” in Eq. (16) is added assuming that a constant $\beta_0$ is present in the regression equation (Zellner, 1987; Press, 1989).
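The N-prior calculation in Eqs. (11) to (16) can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the XLBayes code: the helper functions, the argument names and the small one-predictor data set are all hypothetical, and the first column of X is assumed to be a column of ones for the constant term.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def n_prior_posterior(beta_pr, var_cov_pr, s2_pr, nu_pr, X, Y):
    """Posterior coefficients and degrees of freedom for an N-prior."""
    scale = (nu_pr - 2) / (nu_pr * s2_pr)
    A = inverse([[v * scale for v in row] for row in var_cov_pr])      # Eq. (11)
    Xt = transpose(X)
    H = matmul(Xt, X)                                                  # Eq. (12)
    beta = matmul(inverse(H), matmul(Xt, [[y] for y in Y]))            # Eq. (13)
    size = len(H)
    M = [[A[i][j] + H[i][j] for j in range(size)] for i in range(size)]  # Eq. (14)
    a_beta_pr = matmul(A, [[b] for b in beta_pr])
    h_beta = matmul(H, beta)
    rhs = [[p[0] + q[0]] for p, q in zip(a_beta_pr, h_beta)]
    beta_pos = matmul(inverse(M), rhs)                                 # Eq. (15)
    k = len(beta_pr) - 1            # number of independent variables
    nu = len(Y) - k - 1             # experimental degrees of freedom
    nu_pos = nu_pr + nu + k + 1                                        # Eq. (16)
    return [row[0] for row in beta_pos], nu_pos


# Hypothetical example: data lie exactly on Y = 1 + 2x; the prior says 0.5 + 1.5x.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
Y = [1.0, 3.0, 5.0, 7.0]
b_pos, nu_pos = n_prior_posterior(
    beta_pr=[0.5, 1.5], var_cov_pr=[[0.25, 0.0], [0.0, 0.25]],
    s2_pr=1.0, nu_pr=10, X=X, Y=Y)
print(b_pos, nu_pos)
```

In this example the posterior intercept and slope fall between the prior values and the least-squares values, and increasing the prior variances moves the posterior toward the least-squares estimate, which is the balancing behaviour described in the text.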

Bayesian Regression to Predict the Decrease in the PSE Values

The Bayesian regression analysis in this study used the XLBayes software, a Bayesian regression program based in the EXCEL environment (Kaweski and Nickeson, 1997). In this program, prior data need to be combined with the sample data to obtain the desired posteriors (Kaweski and Nickeson, 1997). The prior data can be drawn from expert judgement, old data sets or knowledge that is generally accepted in the field. In this study, the data set for a number of pavements from KDOT Districts I and IV for 1993 and 1994 was used as prior data, and the data for 1995 were used as the sample data. The same functional form and transformations of the independent variables as in the classical regression were used.

TABLE II Required prior information (after Kaweski and Nickeson, 1997)

Prior information             Required for N-prior   Required for G-prior
Means vector                  ✓                      ✓
Variance/covariance matrix    ✓                      –
G-prior data set              –                      ✓
G-prior factor                –                      ✓
Residual variance             ✓                      ✓
Degrees of freedom            ✓                      ✓

Results of Bayesian Regression and Selected Posterior Models

The selected posterior models using N-prior Bayesian regression analysis are presented below:

FDBIT Pavements:

Distress Level 1

$$\Delta PSE = \ldots + 0.106 \times TH + 0.374 \times PSE + 5.89 \times DL1 \tag{17}$$

Distress Level 2

$$\Delta PSE = \ldots + 0.106 \times TH + 0.374 \times PSE + 6.04 \times DL2 \tag{18}$$

Distress Level 3

$$\Delta PSE = \ldots + 0.106 \times TH + 0.374 \times PSE + 6.47 \times DL3 \tag{19}$$

PDBIT Pavements:

Distress Level 1

$$\Delta PSE = \ldots + 0.303 \times PSE + 0.392 \times DL1 \tag{20}$$

Distress Level 2

$$\Delta PSE = \ldots + 0.303 \times PSE + 0.881 \times DL2 \tag{21}$$

Distress Level 3

$$\Delta PSE = \ldots + 0.303 \times PSE + 1.974 \times DL3 \tag{22}$$

The classical regression results using pseudo data, the development of the N-prior, and the posterior regression coefficients for the FDBIT and PDBIT pavements have been reported elsewhere (Chowdhury, 1998).

MODEL EVALUATION

The purpose of model evaluation is to draw conclusions about the Bayesian posterior results. Evaluation emphasizes comparisons among the data, prior, and posterior; these comparisons may prompt additional iterations of the analysis. The statistical performance of a classical regression model is typically measured by evaluating the standard error (Se), coefficient of determination (R²), F-statistic, and t-statistic. In Bayesian regression, only Se and the t-statistic can be evaluated. Neither R² nor the F-statistic can be calculated because they rely on experimental data, which do not exist for the posterior results (Kaweski and Nickeson, 1997).

Data, Prior, and Posterior PDF Plots

An important output of XLBayes is the Probability

Density Function (PDF) plot for each coefficient in the

model. These plots graphically compare the distributions

of the same coefficient when based on the data alone, the

prior alone, or the Bayesian posterior. Figure 1 shows a

sample PDF plot for the coefficient of the transformed

variable (age)^1.5 used in the model.

Under the assumptions of both classical multiple linear and Bayesian regressions, the model coefficients follow a t-distribution. The width of the bell-shaped curve reflects the confidence in estimating the coefficients: the narrower the curve, the more precise the estimate. The PDF plots of

all coefficients reveal that the probability distribution for

the posterior estimate is “tighter” than either the prior or

the data. This is intuitively reasonable as the prior and the

data reinforce each other with similar estimates of

coefficients. Bayesian regression models can always be

updated by inserting more data in the model which makes

the posterior more and more definitive (Kaweski and

Nickeson, 1997).
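A minimal sketch can illustrate why the posterior curve is narrower when prior and data agree. It makes a simplifying assumption that is not in the source: each source of information is treated as an independent normal estimate of the same coefficient with a known standard deviation, so that precisions (inverse variances) add. The function name and the numbers are hypothetical.

```python
import math


def combine_std(std_a, std_b):
    """Standard deviation after pooling two independent normal estimates.

    Precisions (1 / variance) add, so the pooled spread is never wider
    than the tighter of the two inputs.
    """
    precision = 1.0 / std_a ** 2 + 1.0 / std_b ** 2
    return 1.0 / math.sqrt(precision)


# Hypothetical spreads for one coefficient; not values from the study.
std_prior, std_data = 0.05, 0.04
std_posterior = combine_std(std_prior, std_data)

# Treating the posterior as the new prior and adding another data set
# of the same quality tightens the estimate further.
std_updated = combine_std(std_posterior, 0.04)

print(std_prior, std_data, std_posterior, std_updated)
```

This idealized case always tightens. In the full regression the residual variance is re-estimated as well, so a prior and a data set that disagree can still produce a wider posterior.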

Student’s t-statistic

The Student’s t-test is used to determine whether a regression coefficient is statistically significant. When it is, one can be confident that the variable in question has a real influence on the model results. The higher the absolute value of t, the greater the confidence in the coefficient’s value and significance. If a regression coefficient is not statistically significant in the prior and posterior, it may be useful to rerun the analysis after excluding that variable. If the standard error term does not increase significantly, the excluded variable is probably not a significant contributor to the model.

The ideal result is for the data and prior to reinforce each

other, resulting in a posterior coefficient that has a smaller

standard error than either one individually. This is not

always the case, however, and the posterior may, in fact,



have a larger standard error. Irrespective of how much the

variance has changed, it is desirable that the coefficients in

the posterior model all be statistically significant

(Kaweski and Nickeson, 1997).

The t-statistics and standard deviations of different

coefficients of the models in this study are presented in

Table III. The t-value for a regression coefficient is calculated by dividing the mean of the regression coefficient by

its standard deviation:

$$t = \frac{b_i}{s_{b_i}} \tag{23}$$

The t-statistics of all selected variables lie outside the range of −1.96 to 1.96, which means that they are significant at the 5% level of significance.
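The calculation in Eq. (23) and the comparison against ±1.96 can be sketched in a few lines. The function names are hypothetical; the example uses the FDBIT coefficient for PSE from Eqs. (17)–(19) and its standard deviation from Table III.

```python
def t_statistic(coef_mean, coef_std):
    """Eq. (23): mean of the coefficient divided by its standard deviation."""
    if coef_std <= 0:
        raise ValueError("standard deviation must be positive")
    return coef_mean / coef_std


def is_significant(t_value, t_crit=1.96):
    """Two-sided check at the 5% level, using the large-sample critical value."""
    return abs(t_value) > t_crit


# FDBIT coefficient of PSE: mean 0.374, standard deviation 0.107.
t_pse = t_statistic(0.374, 0.107)
print(round(t_pse, 3), is_significant(t_pse))
```

The result is close to the t-value listed for PSE in Table III; the small difference presumably comes from rounding of the published coefficient and standard deviation.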

Standard Error of the Residuals (Se)

The standard error of the residuals, Se, is a basic measure of regression model performance. The standard error (or standard deviation) of the residuals is simply the square root of the residual variance, Se². The lower the Se, the closer the predictions made by the model are to the actual observations of the dependent variable and, therefore, the better the model. The posterior residual variance in Bayesian regression is calculated by pooling the prior and experimental-data residual variances, each weighted by its degrees of freedom, adding two terms that account for the deviation of the posterior regression coefficients from the experimental coefficients and from the prior coefficients, and dividing by the posterior degrees of freedom (Kaweski and Nickeson, 1997):

$$S_e^2 = \frac{\left(v_{pr} S_{e_{pr}}^2 + v S_e^2\right) + (b - b_{pos})^{T} H (b - b_{pos}) + (b_{pr} - b_{pos})^{T} A (b_{pr} - b_{pos})}{v_{pos}} \tag{24}$$

Under the assumptions of the regression analysis, the

residual has a mean of zero and is normally distributed.
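Eq. (24) can be evaluated directly once its ingredients are known. The sketch below is one possible transcription, not the XLBayes implementation. The posterior degrees of freedom, v_pos, is passed in as an input because its definition is not given in this section, and all argument names and example numbers are hypothetical.

```python
def quadratic_form(d, M):
    """Compute d^T M d for a vector d and a square matrix M (lists of floats)."""
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def posterior_residual_variance(v_pr, s2_pr, v, s2, b, b_pr, b_pos, H, A, v_pos):
    """Transcription of Eq. (24).

    v_pr, s2_pr : prior degrees of freedom and prior residual variance
    v, s2       : data degrees of freedom and data residual variance
    b, b_pr     : data and prior coefficient vectors
    b_pos       : posterior coefficient vector
    H, A        : weighting matrices for the data and the prior
    v_pos       : posterior degrees of freedom (supplied by the caller)
    """
    d_data = [x - y for x, y in zip(b, b_pos)]
    d_prior = [x - y for x, y in zip(b_pr, b_pos)]
    numerator = (
        v_pr * s2_pr
        + v * s2
        + quadratic_form(d_data, H)
        + quadratic_form(d_prior, A)
    )
    return numerator / v_pos


# Hypothetical one-coefficient example: prior and data disagree (1 vs 3).
print(posterior_residual_variance(
    v_pr=10, s2_pr=0.4, v=20, s2=0.25,
    b=[3.0], b_pr=[1.0], b_pos=[2.0],
    H=[[2.0]], A=[[2.0]], v_pos=30,
))
```

When prior, data, and posterior coefficients coincide, the two quadratic terms vanish and the result reduces to the pooled variance; disagreement between prior and data inflates it.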

FIGURE 1 Comparison of normal probability plots for age.

TABLE III Standard deviation and t-statistic of the posterior coefficients

Pavement type   Variable      Std. deviation   t-value   Res. var. (Se²)
FDBIT           (Age)^1.5     0.034             3.620    0.32
                Thickness     0.041             2.547
                Exp[Δ(SN)]    4.240            −2.200
                PSE           0.107             3.486
                DL1           2.979             1.98
                DL2           2.876             2.101
                DL3           2.424             2.670
PDBIT           (Age)^1.5     0.008             2.349    0.20
                Exp[Δ(SN)]    0.500            −3.746
                PSE           0.038             7.850
                DL1           0.196             1.990
                DL2           0.383             2.301
                DL3           0.466             4.234



Table III gives residual variances of 0.32 and 0.20 for the FDBIT and PDBIT models, so the standard errors, Se, are 0.56 and 0.44, respectively. Therefore, the selected models in this study will predict the ΔPSE values within ±1.1 units of actual ratings for FDBIT and ±0.88 units for PDBIT pavements with 95% confidence.
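The arithmetic behind these limits can be checked with a short sketch. It assumes that Se is the square root of the residual variance listed in Table III and that the 95% band is taken as roughly two standard errors on either side of the prediction. The multiplier of 2 is an assumption, chosen because it approximately reproduces the quoted limits; the function names are hypothetical.

```python
import math


def standard_error(residual_variance):
    """Se is the square root of the residual variance."""
    return math.sqrt(residual_variance)


def band_half_width(residual_variance, multiplier=2.0):
    """Approximate 95% half-width, assuming normally distributed residuals.

    The multiplier is an assumption (about two standard errors).
    """
    return multiplier * standard_error(residual_variance)


# Residual variances from Table III.
for name, variance in (("FDBIT", 0.32), ("PDBIT", 0.20)):
    se = standard_error(variance)
    print(name, round(se, 3), round(band_half_width(variance), 3))
```

The computed values are close to, though not identical with, the figures quoted in the text; the differences presumably reflect rounding of the published values.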

RESULTS AND DISCUSSION

Prediction of ΔPSE Values Using Proposed Models

As mentioned earlier, data from 1993, 1994 and 1995 were used in the regression analysis, and the statistical tests performed on the resulting models were satisfactory. To test the models, data were selected from a different set of control sections, collected in different years, that were not included in the regression analysis. For 1996, 12 FDBIT and 26 PDBIT sections and for 1997, 10 FDBIT and 19 PDBIT sections were chosen randomly. Both classical and Bayesian regression models were used to predict the ΔPSE values on these control sections. At the same time, the rated decreases in the PSE values assigned by the KDOT engineers were also collected. Figures 2–5 compare the predicted and rated ΔPSE values graphically.

The PSE values are always assigned by the districts as integers. Since neither the coefficients of the regression equations nor the independent variables are integers, the model outputs are generally decimals. The predicted ΔPSE values were therefore rounded up or down to the nearest integer to mimic district ratings. The figures indicate that the predicted ΔPSE values very closely approximate the rated ΔPSE values on most of the control sections. In a few cases, discrepancies were observed in the KDOT district ratings. For example, on Route K-68 (Project No. 14 in Fig. 4), the PSE rating has increased by 2 although no

FIGURE 2 Comparison of rated and predicted Δ(PSE) values (FDBIT pavements, 1996 data).

FIGURE 3 Comparison of rated and predicted Δ(PSE) values (FDBIT pavements, 1997 data).



rehabilitation action has been taken on this pavement

since the last action four years ago. On the other hand,

both the Bayesian and classical regression models

suggest that the PSE value should decrease by 2.

Similarly, other discrepancies in the present rating

system were rationally and objectively addressed by the

selected models.

Range of the Independent Variables

As with all regression equations, there is a range of each independent variable over which the selected models are expected to predict the dependent variable with sufficient accuracy. Outside that range the prediction interval widens, and it is statistically unsound to use the models in those cases. The suggested ranges of the independent variables are:

1. Age since last rehabilitation action: 1–18 years,

2. Thickness: 100–760 mm (4–30 in),

3. PSE rating in the base year: 2–10,

4. Decrease in structural number, ΔSN: 0.001–2.5,

5. Distress level due to transverse cracking: 1–3.
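A simple guard against extrapolation is to check each input against the ranges listed above before applying a model. The sketch below encodes those ranges; the dictionary keys and the function name are hypothetical labels chosen for illustration.

```python
# Suggested ranges from the list above (thickness in millimetres).
SUGGESTED_RANGES = {
    "age_years": (1, 18),
    "thickness_mm": (100, 760),
    "pse_base_year": (2, 10),
    "delta_sn": (0.001, 2.5),
    "distress_level": (1, 3),
}


def out_of_range(inputs):
    """Return the names of inputs that fall outside the suggested ranges."""
    problems = []
    for name, (low, high) in SUGGESTED_RANGES.items():
        value = inputs[name]
        if not low <= value <= high:
            problems.append(name)
    return problems


# Hypothetical pavement section; an age of 25 years is outside the range.
section = {
    "age_years": 25,
    "thickness_mm": 300,
    "pse_base_year": 6,
    "delta_sn": 0.5,
    "distress_level": 2,
}
print(out_of_range(section))
```

A non-empty result signals that a prediction for that section would be an extrapolation.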

Paired t-test Results

Paired t-tests were performed to determine whether data

from two different sources have the same mean or

whether they are statistically similar (Ott, 1993). Rated

FIGURE 4 Comparison of rated and predicted Δ(PSE) values (PDBIT pavements, 1996 data).

FIGURE 5 Comparison of rated and predicted Δ(PSE) values (PDBIT pavements, 1997 data).



decreases in the PSE values were compared with the

predicted decreases derived from both classical and

Bayesian regression models. The results of these t-tests

are tabulated in Table IV. The results indicate that for

both FDBIT and PDBIT pavements and for all

regression models, there was no significant difference

between the two sets of data. From the sum of the

squared errors, it can be concluded that for the FDBIT

pavements, the Bayesian and classical regression models

yielded similar results while for the PDBIT pavements,

the Bayesian regression models appeared to be slightly

more accurate.
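The paired comparison can be sketched as follows: the t-statistic is the mean of the section-by-section differences divided by its standard error, with n − 1 degrees of freedom, and the sum of squared errors is accumulated from the same differences. The data below are invented for illustration, and the sign convention (rated minus predicted) is an assumption.

```python
import math
import statistics


def paired_t(rated, predicted):
    """Return (t, degrees of freedom, sum of squared errors) for paired data."""
    if len(rated) != len(predicted) or len(rated) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [r - p for r, p in zip(rated, predicted)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    sse = sum(d * d for d in diffs)
    return t, len(diffs) - 1, sse


# Hypothetical rated and predicted decreases on five sections.
rated = [3, 2, 4, 1, 2]
predicted = [2, 2, 3, 1, 1]
t_value, dof, sse = paired_t(rated, predicted)
print(round(t_value, 3), dof, sse)
```

The difference is judged not significant when the absolute value of t is below the two-tailed critical value for the given degrees of freedom. The critical values in Table IV appear consistent with n − 1 degrees of freedom for the pooled 1996 and 1997 sections (22 FDBIT and 45 PDBIT).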

CONCLUSIONS

1. The current PSE rating system used by KDOT is

subjective and discrepancies were observed in the

rating process. The regression models proposed in this

study predict the PSE values by taking into account the

FWD deflection data, age, thickness, and distress level

of pavements; hence, the model outputs are

representative of the actual structural condition of the

pavement sections. Independent tests showed that the

proposed models very closely approximate the present

PSE ratings obtained at the KDOT district level.

2. The models obtained from the classical and Bayesian

regression methodologies were very similar in form

and yielded statistically similar results when tested on

a different set of pavements. Both models appear to be

statistically sound from the viewpoints of prediction

capabilities and model utility. The Bayesian regression

models yielded slightly better results during testing. It

should be noted that the Bayesian regression is a

continuous process of updating the existing “partial

state of knowledge.” As the existing database gets

enriched with more data, the Bayesian regression will

result in a posterior with an even smaller confidence

interval. Hence, it is highly recommended that the

existing models be updated every other year with more

recent data. Also, the models are applicable to Kansas conditions only. However, this study illustrates the

ways to develop such models for other states or

regions.

Acknowledgements

The authors would like to thank VEMAX Management,

Inc. and C-SHRP for providing the XLBayes software

used in this study. Assistance of Ms Lea Ann Caffrey,

formerly with KDOT, in data collection is gratefully

acknowledged.

References

AASHTO (1986) Guide for Design of Pavement Structures (American Association of State Highway and Transportation Officials, Washington, DC).

AASHTO (1993) Guide for Design of Pavement Structures (American Association of State Highway and Transportation Officials, Washington, DC).

Chowdhury, T. (1998) Bayesian regression methodology for network level pavement project rating, M.S. Thesis, Department of Civil Engineering, Kansas State University, Manhattan.

Haas, R., Hudson, R.W. and Zaniewski, J.P. (1994) Modern Pavement Management, Krieger Publishing Co., Malabar, FL, pp 161–165.

Hossain, M., Chowdhury, T. and Gisi, A.J. (2000) Network-Level Pavement Structural Evaluation, The ASTM Journal of Testing and Evaluation, May.

Jackson, N. and Mahoney, J. (1991) Course Notes, Chapter 6, An Advanced Course in Pavement Management Systems, Federal Highway Administration, Washington, DC, pp 9–10.

KDOT (1984) Development of a Highway Improvement Priority System for Kansas, Division of Planning and Development, Kansas Department of Transportation, Topeka.

KDOT (1996) Kansas NOS Condition Survey Report, Bureau of Materials and Research, Kansas Department of Transportation, Topeka, Attachments I and II.

KDOT (1998) CANSYS: Control Section Analysis System, Division of Planning and Development, Kansas Department of Transportation, Topeka.

Kaweski, D. and Nickeson, M. (1997) C-SHRP Bayesian Modeling: A User’s Guide, Transportation Association of Canada, Ottawa.

Kurlanda, M.H. and Kajner, L. (1996) Predicting Roughness Progression of Asphalt Overlays, Transportation Research Board, Washington, DC, Record 1539, pp 125–131.

Nesbitt, D. and Sparks, G. (1990) Design of Long Term Pavement Monitoring System for the Canadian Strategic Highway Research Program, Canadian Strategic Highway Research Program, Ottawa, Canada.

Ott, R.L. (1993) An Introduction to Statistical Methods and Data Analysis, Duxbury Press, Belmont, CA.

Paterson, W.D.O. (1987) Road Deterioration and Maintenance Effects: Models for Planning and Management, The Johns Hopkins University Press, Maryland and London, Published for the World Bank.

Press, S. (1989) Bayesian Statistics: Principles, Models and Applications, Wiley, New York.

Smith, W., Finn, F., Kulkarni, R., Saraf, C. and Nair, K. (1979) Bayesian Methodology for Verifying Recommendations to Minimize Asphalt Pavement Distress, Transportation Research Board, Washington, DC, NCHRP Report No. 213.

Zellner, A. (1987) An Introduction to Bayesian Inference in Econometrics, Krieger Publishing Co., Malabar, Florida.

TABLE IV Results of paired t-tests

Pavement type   Statistic           Bayesian   Classical
FDBIT           tcrit (two tail)     2.079      2.079
                t                   −1.46      −1.89
                Sum of sq. err.     16.74      16.87
PDBIT           tcrit (two tail)     2.015      2.015
                t                   −1.39      −1.93
                Sum of sq. err.      7.78      12.41

