aggregate travel demand forecasting from...

14
AGGREGATE TRAVEL DEMAND FORECASTING FROM DISAGGREGATED BEHAVIORAL MODELS Daniel McFadden and Fred Reid, University of California, Berkeley This paper derives an expression for obtaining aggregate (interzonal) travel forecasts given a probit disaggregate demand model and zonal averages and intrazonal variances of the independent variables. It also derives expres- sions for the biases in aggregate model calibrations resulting from zonal homogeneity assumptions in the variables. The conditions under which these biases are important are discussed. Expressions are also determined for obtaining consistent, unbiased estimates for both aggregate and dis- aggregate models that take into account nonhomogeneous zones and prac- tical data limitations. •FORECASTS of aggregate travel demand between zones, determined as a function of policy instruments, are a basic input to the transportation planning process. Tradi- tionally, these forecasts have been based on models of aggregate interzonal flows, calibrated by using zonal average trip attributes and socioeconomic characteristics. Interzonal travel demand is, tautologically, the result of aggregating individual travel decisions for the zonal population. Traditional aggregate models do not attempt to make this connection. By contrast, the recent travel demand literature has empha- sized the individual decision-making unit and the behavioral foundations of travel de- mand (1, 2, 4, 8, 9). Aggregate and disaggregate travel demand models are sometimes viewed by trans- portation planners as mutually exclusive or competitive approaches to the forecasting problem. We shall argue that, to the contrary, they are complementary. [A good analogy can be made between the problem of modeling travel demand and the physical theory of a perfect gas. The observable macroparameters of a gas, such as pressure and temperature, are linked by empirical laws such as Boyle's law; however, the be- havior of individual gas molecules is described by the kinetic theory of gases. If the gas is in equilibrium, then Boyle's law predicts macrobehavior accurately, and it is unnecessary to consider the molecular structure of the gas. However, the empirical laws fail for gases in disequilibrium, and it is necessary to turn to the molecular theory. This requires the further step of describing the distribution of molecular be- havior to obtain the desired macroaverages; this is provided by Maxwell statistics. Equilibrium in the analogy to travel demand is identified with the concept of an identical choice environment for all individuals in a zone. Aggregate demand models, like the empirical gas laws, should forecast accurately under this homogeneity condition but cannot be expected to succeed when zones are heterogeneous. Disaggregated behavioral models, therefore, should have the properties of implying the aggregate models under the homogeneity conditions and of forecasting correctly under heterogeneity conditions. The specification of the distribution of individual behavior in the population (as with Maxwell statistics applied to gases in disequilibrium) will play a critical role in the disaggregate travel demand model.] The disaggregate models provide a theoretical foundation for the aggregate models and provide conditions under which the aggregate models will give valid forecasts. The aggregate models may provide the most conve- nient means of forecasting when zonal homogeneity conditions are met. This paper demonstrates that the aggregate and disaggregate models have a common foundation and that it may be possible to use a synthesis of the models to facilitate cal- ibration and improve forecasting accuracy. The following questions are answered: 24

Upload: others

Post on 26-Feb-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: AGGREGATE TRAVEL DEMAND FORECASTING FROM …onlinepubs.trb.org/Onlinepubs/trr/1975/534/534-003.pdf · AGGREGATE TRAVEL DEMAND FORECASTING FROM DISAGGREGATED BEHAVIORAL MODELS Daniel

AGGREGATE TRAVEL DEMAND FORECASTING FROM DISAGGREGATED BEHAVIORAL MODELS Daniel McFadden and Fred Reid, University of California, Berkeley

This paper derives an expression for obtaining aggregate (interzonal) travel forecasts given a probit disaggregate demand model and zonal averages and intrazonal variances of the independent variables. It also derives expres­sions for the biases in aggregate model calibrations resulting from zonal homogeneity assumptions in the variables. The conditions under which these biases are important are discussed. Expressions are also determined for obtaining consistent, unbiased estimates for both aggregate and dis­aggregate models that take into account nonhomogeneous zones and prac­tical data limitations.

•FORECASTS of aggregate travel demand between zones, determined as a function of policy instruments, are a basic input to the transportation planning process. Tradi­tionally, these forecasts have been based on models of aggregate interzonal flows, calibrated by using zonal average trip attributes and socioeconomic characteristics. Interzonal travel demand is, tautologically, the result of aggregating individual travel decisions for the zonal population. Traditional aggregate models do not attempt to make this connection. By contrast, the recent travel demand literature has empha­sized the individual decision-making unit and the behavioral foundations of travel de­mand (1, 2, 4, 8, 9).

Aggregate and disaggregate travel demand models are sometimes viewed by trans­portation planners as mutually exclusive or competitive approaches to the forecasting problem. We shall argue that, to the contrary, they are complementary. [A good analogy can be made between the problem of modeling travel demand and the physical theory of a perfect gas. The observable macroparameters of a gas, such as pressure and temperature, are linked by empirical laws such as Boyle's law; however, the be­havior of individual gas molecules is described by the kinetic theory of gases. If the gas is in equilibrium, then Boyle's law predicts macrobehavior accurately, and it is unnecessary to consider the molecular structure of the gas. However, the empirical laws fail for gases in disequilibrium, and it is necessary to turn to the molecular theory. This requires the further step of describing the distribution of molecular be­havior to obtain the desired macroaverages; this is provided by Maxwell statistics. Equilibrium in the analogy to travel demand is identified with the concept of an identical choice environment for all individuals in a zone. Aggregate demand models, like the empirical gas laws, should forecast accurately under this homogeneity condition but cannot be expected to succeed when zones are heterogeneous. Disaggregated behavioral models, therefore, should have the properties of implying the aggregate models under the homogeneity conditions and of forecasting correctly under heterogeneity conditions. The specification of the distribution of individual behavior in the population (as with Maxwell statistics applied to gases in disequilibrium) will play a critical role in the disaggregate travel demand model.] The disaggregate models provide a theoretical foundation for the aggregate models and provide conditions under which the aggregate models will give valid forecasts. The aggregate models may provide the most conve­nient means of forecasting when zonal homogeneity conditions are met.

This paper demonstrates that the aggregate and disaggregate models have a common foundation and that it may be possible to use a synthesis of the models to facilitate cal­ibration and improve forecasting accuracy. The following questions are answered:

24

Page 2: AGGREGATE TRAVEL DEMAND FORECASTING FROM …onlinepubs.trb.org/Onlinepubs/trr/1975/534/534-003.pdf · AGGREGATE TRAVEL DEMAND FORECASTING FROM DISAGGREGATED BEHAVIORAL MODELS Daniel

25

1. Given a correctly calibrated disaggregated behavioral model, how can aggregate interzonal flows be forecast?

2. What biases are introduced in the calibration of aggregate models when zones are not homogeneous? How do these biases affect demand forecasts? What correc­tions can be applied?

3. What biases are introduced in the calibration of disaggregated behavioral models when some independent variables are approximated by zonal averages? What correc­tions can be applied?

DISAGGREGATED BEHAVIORAL MODELS

The axioms of disaggregated behavioral demand modeling are that individuals represent the basic decision-making unit and that each individual will choose one alternative among those available that he or she finds most desirable or useful. This depends on the attributes of the alternative and the socioeconomic characteristics of the individual. For simplicity, we will develop this model and the results only for the classical prob­lem of modal split for a work trip. The substance of our conclusions continues to hold for more complex aspects of travel behavior. The full power and generality of the dis­aggregated approach become apparent primarily in forecasting complex demand sys­tems.

Suppose an urban area is partitioned into zones, indexed 1, ... , J. Let k = 1, ... , K index the individuals in the population, and let N1J denote the set of indi­viduals commuting from zone i to zone j . Each individual is assumed to face a binary modal choice, and a and bare the modes. Now, consider a specific individual k having a vector of socioeconomic characteristics sk and facing vectors of attributes x•k and xbk for the two modes. According to the behavioral model, the individual will have a utility function of u = Uk(x, sk), which summarizes the desirability of a mode with attributes x, and will choose the alternative that gives the higher utility; i.e., mode a will be chosen if Uk(x•k, sk) > Uk(xbk, sk). Not all attributes of alternatives and socioeconomic charac­teristics determining tastes can be measured. Consequently, the utility function of an individual drawn randomly from the population can be thought of as containing a random component reflecting his or her unmeasured idiosyncrasies in tastes. We therefore write the utility function in the form

Uk(x, s) = V(x, s) + Ek(x, s) (1)

where V(x, s) is common to all members of the population and can be interpreted as the representative utility, and Ek(x, s) is the random component. The condition for mode a to be chosen by individual k can now be written

(2)

The unmeasured term on the left side of Eq. 2 has a statistical distribution in the population and a cumulative distribution function G. Consider an individual drawn ran­domly from the subpopulation who has socioeconomic characteristics sk and who faces alternative trip attribute vectors x•k and xbk. The probability that this individual will choose mode a is given by

(3)

Page 3: AGGREGATE TRAVEL DEMAND FORECASTING FROM …onlinepubs.trb.org/Onlinepubs/trr/1975/534/534-003.pdf · AGGREGATE TRAVEL DEMAND FORECASTING FROM DISAGGREGATED BEHAVIORAL MODELS Daniel

26

The methodology of disaggregate behavioral demand analysis is to specify paramet­ric functional forms for the representative utility function V and the distribution func­tion G. [In general, the parameters of the distribution function G will be functions of the measured trip attributes and socioeconomic characteristics of the subpopulation. We suppress these arguments and later assume G to be independent of these variables. This is a strong restriction which is inconsistent with some descriptions of the struc­ture of taste variation in the population. For example, the disaggregated behavioral model of Quandt (8) is incompatible with this restriction. A more complete discussion of this specification is given in Domencich and McFadden (4).] Observed choices from a sample are treated as drawings from binomial distributions, and the probabilities are given in Eq. 3. Statistical methods such as the maximum likelihood procedure are used to calibrate the unknown parameters. Following this procedure, we first specify that the representative utility function be linear in unknown parameters, and the form is

L V(x, s) = I: f3QZQ(x, s)

l=l (4)

where the ZQ(x, s) are numerical functions. [Numerical functions ZQ can be simple or complex functions of the trip attributes and socioeconomic characteristics. For ex­ample, Z 2 may be a trip attribute such as trip cost or on-vehicle time or a transfor­mation such as the logarithm or square of one of these variables. It may be a dummy variable that is one for mode a and zero for mode b, which corresponds to a pure mode preference effect. It may involve interactions of trip attributes and socioeconomic variables such as trip cost divided by trip time, trip cost divided by wage, or trip walking time times an index of physical health. It may involve interactions of trip at­tributes or socioeconomic variables with mode dummy variables, for example, a vari­able that is income for mode a and zero for mode b or on-vehicle time for mode band zero for mode a. Because the utility comparison involves only differences of the ZQ variables, choice will be influenced only by factors that vary between modes. This means a pure socioeconomic characteristic such as income, unless interacted with a pure mode preference dummy, will not influence tastes and should not be included as a Z2 variable. However, it should be noted that an arbitrary smooth utility function V(x, s) can be approximated as closely as desired by the form in Eq. 3. An important class of models will be those in which no pure mode effects appear either singly or in interactions with other variables. These generic models can be used to forecast new mode demand.]

Defining z~ = ZQ(xak, sk) - Z2(xbk, sk) and zk = z~, ... , z~' and letting f3 denote the column vector of unknown parameters, we rewrite Eq. 3 as

(5)

Equation 5, derived from the behavioral model, has a simple conventional interpreta­tion. Vector zk measures the differential attributes of the two modes, which are weighted to account for the effect of measured socioeconomic differences. For ex­ample, z~ might be the cost of mode a less the cost of mode b, and z~ the on-vehicle time of mode a less the on-vehicle time of mode b, which is multiplied by the individ­ual's after-tax wage. The coefficient vector f3 weights the components of zk into a single measure {3'zk of the differential impedance of the two modes, and G represents the response curve that gives the proportion of individuals choosing mode a at each level of relative impedance. To simplify further analysis, we will assume that the distribution function G is standard normal. This condition allows us to obtain simple closed formulas but is not critical for our general conclusions. If ~ denotes the cu­mulative standard normal distribution, Eq. 5 has a final form

Page 4: AGGREGATE TRAVEL DEMAND FORECASTING FROM …onlinepubs.trb.org/Onlinepubs/trr/1975/534/534-003.pdf · AGGREGATE TRAVEL DEMAND FORECASTING FROM DISAGGREGATED BEHAVIORAL MODELS Daniel

27

(6)

that determines the probability ·of the choice of mode a by an individual drawn randomly from the subpopulation of individuals facing a vector zk of differential attributes of the two modes. This form in statistics and transportation demand analysis is known as the probit model.

AGGREGATE MODAL SPLITS FROM DISAGGREGATED MODELS

Suppose that the disaggregated behavioral model in Eq. 6 is a correct specification of the distribution of individual modal choices in the population and that the parameter vector {3 has been calibrated accurately in a statistical study. We now wish to deter­mine the aggregate modal split between zones i and j. The formula for this aggregate is straightforward. Recall that N

1 J is the set of individuals k traveling from i to j and

let n 1J be the nwnber of individuals in this set. Because Pk = «i({3'zk) is our best pre­diction that an individual with a measured choice environment described by zk will choose mode a, the best prediction for the aggregate of individuals making the trip is

(7)

Equation 7 is just the expectation of the response probability of the empirical distri­bution of the vector of the independent variables z. When n1J is large, this formula is closely approximated by the expectation of the response probability of the underlying distribution of the independent variables. We asswne that the distribution of z for in­dividuals in N1J is normal; the mean is z1l and the covariance matrix is A1J. (This is a plausible large sample approximation even when the individual observations are clearly nonnormal, e.g., discrete. It should be noted that some of the conclusions, e.g., the result concerning the consistency of disaggregated models calibrated from zonal aver­ages, depend critically on the symmetry of this distribution and would change substan­tially if the distribution were skewed.} Then, the differential impedance y = {3'z is dis­tributed normally with mean {3'z1J and variance a~J = f3' NJ{3. Therefore, the expecta­tion of the response probability is

(8)

Equation 8 can be simplified further by using the convolution properties of normal distributions. In general, i.f W1 and W2 are independent normal random variables with mean u1 and variance a~ for i = 1, 2, one has W 1 - W2 distributed normally with mean IJ.1 - µ.2 and variance ai + a~, and this implies

(9)

Page 5: AGGREGATE TRAVEL DEMAND FORECASTING FROM …onlinepubs.trb.org/Onlinepubs/trr/1975/534/534-003.pdf · AGGREGATE TRAVEL DEMAND FORECASTING FROM DISAGGREGATED BEHAVIORAL MODELS Daniel

28

Setting w = µ1 = O, a1 = 1, aa = aq, and µ2 = fj'ziJ, Eq. 9 implies a final form for the aggregate modal split between zones i and j:

(10)

Equation 10 can also be obtained directly by returning to the condition given in Eq. 2 for an individual to choose mode a. Given tllat V(xa\ sk) - V(xbk, sk) = {3 1 zk, this con­dition becomes

(11)

The first two terms in Eq. 11 taken together are assumed to have a standard normal distribution, and fj'zk is assumed to be normal with mean {3'z1J and variance a=. There­fore, the left side of Eq. 11 is normal with mean -fj'-ziJ and variance 1 + a~ 3 • This im­plies that the probability of the event in Eq. 11 is given by Eq. 10.

By comparing Eqs. 10 and 6 for the individual response probability, one sees that the components of differential impedance have the same relative weights, given by the component of (3, but that the effect of the mean differential impedance for the zones is

attenuated by the factor ..j 1 + a;J, which reflects the degree of heterogeneity of the variables facing the individuals in the zone. If the zones are homogeneous, so that each individual has the same socioeconomic characteristics and faces the same trip attributes yielding a~3 = f3' N 3{3 = 0, then the aggregate and disaggregate models co­incide. This conclusion provides a condition under which the disaggregated model can be calibrated directly from interzonal data.

Equation 10 provides a method of forecasting interzonal modal splits from a knowl­edge of the zonal average z1J of the variables entering differential impedance and of the intrazonal covariances A1 J of these variables. It is important to note that one does not require a sample of individuals going from i to j or observations on individual data points, although an alternative approach to computing the aggregate flows is to use Eq. 7 directly for a random sample from the population. The zonal means z1J are often available from transportation surveys; the covariances NJ are usually not re­ported but could be constructed from the underlying data. The effect of a policy change can be forecast from Eq. 10, provided the effect of the policy on z13 and A1J can be de­termined. The most straightforward case is a policy change that has a homogeneous impact on the zone, as for example a $0.05 increase in the basic transit fare. This changes the cost component of zu and leaves N; unchanged. A more complex example wbuld be an increase in parking charges given that zones are heterogeneous with re­spect to the availability of free parking to different individuals. This increase would change the corresponding component of z1J but would also spread the distribution of parking charges more widely, which would increase the variance of this component in A13 and possibly also change the covariance of this component and other variables. Specifying the precise effect of a policy change on NJ may be challenging; various ap­proximations may become necessary, including the extreme approximation implicit in conventional aggregate models that NJ is always zero and is therefore unchanged by policy. The error introduced by this last approximation may be substantial.

An empirical example shows the order of magnitude of the factor ..J 1 +~Jin Eq. 10. We consider an automobile-bus modal-split model calibrated on 160 individual workers in the San Francisco Bay area (7, model 1). This model (7) has one pure socioeconomic variable (income), one pure transportation variable (cost difference), and four mixed variables defined by the after-tax wage times the time difference for on-vehicle, walk, initial wait, and transfer wait times. Table 1 gives means and standard errors of the

Page 6: AGGREGATE TRAVEL DEMAND FORECASTING FROM …onlinepubs.trb.org/Onlinepubs/trr/1975/534/534-003.pdf · AGGREGATE TRAVEL DEMAND FORECASTING FROM DISAGGREGATED BEHAVIORAL MODELS Daniel

Table 1. Overall means and standard errors for population sample.

Variable Standard

Number Description Mean Error

1 Family income, ~ $10,000/year 8,670 2,238.3 2 Out-of-pocket cost per round trip", cents 0.88 102.5 3 On-vehicle time", one way x net wage, -51. 7

$ / hour 65.0

4 Walk time" x net wage, $/hour -40.2 64.6 5 Initial wait time• x net wage, $/hour -37 .4 33.2 6 Transfer wait time• x net wage, $/hour -28.3 42.1 7 Automobile dummy 1

1 For automobile and bus trips.

Table 2. Total correlation matrix.

Variable

variable 2 4 5 6

l 1 0.224 -0.307 -0.372 -0.303 -0.27 2 0.224 1 0.195 -0.294 0.206 0.146 3 -0.307 0.195 1 0.0913 0.518 0.597 ~ -0.372 -0.294 0.0913 1 0.0746 0.0597 5 -0.303 0.206 0.518 0.0746 1 0.642 6 -0.27 0.146 0.597 0.0597 0.642 1

Table 3. Average intrazonal covariance matrix as proportion of total covariance matrix for population sample.

variable

variable 2 4 5 6

1 0.91 -0.332 2.25 0.324 1.19 0.464 2 -0.332 0.136 0.156 -0.173 -0.14 -0.223 3 2.25 0.156 0.652 1.07 0.568 0.191 4 0.324 -0.173 1.07 0.168 0.88 0.756 5 1.19 -0.14 0.568 0.88 0.45 0.4 6 0.464 -0.223 0.191 0.756 0.4 0.26

29

Normalized Standard Error

Probit of Intrazonal Coefficient Variances

0.0000391 1.46 -0.00551 6.98 -0.00514 1.40

-0.000055 5.39 -0.0103 1.39 -0.0114 3.32 0.0898

variables and their probit coefficients (3. Table 2 gives the overall correlation matrix of these variables. Using the MTC 440 zone network for the San Francisco Bay area, we grouped the sample into zones. Assuming a common intrazonal covariance matrix A= A1 J, we obtained the estimate of A (expressed as a proportion of the overall co­variance matrix), which is gi-ven in Table 3. We note that 91 percent of .iilcome vari­ation is intrazonal. The percentage is only 13.6 for cost difference variation but is as high as 65 for one of the variables for mixed wage times cost. For this A matrix,

a~J = {3' A{3 = 0.485 and the factor '11 + a~J = 1.22. This implies that the elasticity of any independent variable of aggregate demand for the first mode between zones i and j will be 82 percent of the average of the corresponding individual demand elasticities for trips from i to j. [This is a conservative estimate of the bias in typical modal­split models because (a) the calibration on which our calculation is based underweighs transit walk time because of a failure to distinguish walk access and automobile access transit trips and (b) a variety of additional automobile availability and socioeconomic variables are excluded from the model. Jn each of these cases, the contribution of intrazonal variance can be expected to be a high proportion of the total.] We conclude that direct application to aggregate modal splits of elasticities calculated from disag­gregate models will tend to overestimate the magnitude of demand response, and

Page 7: AGGREGATE TRAVEL DEMAND FORECASTING FROM …onlinepubs.trb.org/Onlinepubs/trr/1975/534/534-003.pdf · AGGREGATE TRAVEL DEMAND FORECASTING FROM DISAGGREGATED BEHAVIORAL MODELS Daniel

30

elasticities calculated from aggregate data will underestimate, on the average, indiv­idual demand elasticities.

To determine whether the assumption of a common intrazonal matrix A= A1J was a good one, we computed the standard errors over the origin-destination pairs in our sample of the diagonal variance elements in the AiJ· These standard errors divided by their means (i.e., the diagonal elements of A) are given in Table 1. We conclude that there is substantial variation in the AiJ over origin-destination pairs and that it would be a poor practice in applications to assume a common A matrix. Finally, we note that over half the contribution to the variance a~J = {3A' f3 in our example is con­tributed by off-diagonal (covariance) terms. Thus, it is not sufficient to look only at the variance elements of A. Because of the small data base, these conclusions are necessarily tentative and are intended only to suggest orders of magnitude.

With any sigmoid-shaped choice function, attempts to represent aggregate modal splits by the disaggregated function with homogeneous zone assumptions will similarly bias the results. The bias will be toward the extreme possible values of the modal split-too low for values of Eq. 6 below 0.5, too high above 0.5. Only in the special cases where zones are homogeneous or where interzonal splits are equal to 0.5 and their differential impedances are distributed symmetrically about their mean would there be no bias. The magnitude of the bias found in the example is sufficient to re­quire that the zone variances be accounted for in the aggregation process.

CALIBRATION OF AGGREGATE MODELS

We next examine the effects of calibrating an aggregate model directly from interzonal flows when, in fact, individual responses conform to the disaggregate behavioral model of Eq. 6. The typical aggregate modal-split model assumes that the frequency of choice of mode a for trips between i and j is given by a function of the differential impedance of the two modes, and differential impedance is measured in turn by a weighted com­bination of the zonal average differential attributes of the two modes. Given that the response curve is normal, this model is

(12)

where 'Y is the vector of unknown parameters weighting the components of differential trip attributes into a measure of differential impedance. Although Eq. 12 is identical to Eq. 6 for individual response probability, it would be justified in traditional aggre­gate demand analysis on the basis of its success as an empirical law rather than on behavioral grounds. Calibration of the model of Eq. 12 from data on interzonal flows would typically be done by applying least squares for origin-destination pairs to the equation

(13)

where w-1 is the inverse cumulative standard normal distribution, Tj1j is an unobserved error, and P1J are the observed frequencies of choice of mode a between zones. Ex­cept for possible adjustments introduced by use of weighted least squares, the result­ing estimator is

(14)

Page 8: AGGREGATE TRAVEL DEMAND FORECASTING FROM …onlinepubs.trb.org/Onlinepubs/trr/1975/534/534-003.pdf · AGGREGATE TRAVEL DEMAND FORECASTING FROM DISAGGREGATED BEHAVIORAL MODELS Daniel

31

Now, suppose that travel demand behavior is in fact determined by the disaggregated model in Eq. 6. Aside from sampling error, which is inversely proportional to the number of trips between zones, and which can be ignored when the total number of trips is large, the observed interzonal modal-split frequencies Pu from Eq. 10 will satisfy the following:

(15)

or

(16)

If one substitutes Eq. 16 in Eq. 14, the value of .Y converges, with a probability of one as sample size increases, to

'Y=[.~ziJz!J']-1

[~.ziJz!Jl/v1 + a~J]f3 1, J 1, J

(17)

From Eq. 17 we can draw inferences on the biases introduced in the estimation of 'Y by the presence of heterogeneity within the zones. First, we note that, in homogeneous zones where a~3 = O, .Y is a consistent estimator of {3. When zones are heterogeneous,

the effect of the term ~ will gene1·ally be to bias the estimates .Y downward in magnitude. If large magnitu~es of a component of z1 J tend to be associated with large values of a~ J' the bias in the corresponding coefficient will tend to be larger than for a coefficient for which the component of ziJ and a~J tends to be uncorrelated. Given that a~J is a constant a2 for all zones, Eq. 17 reduces to a si mple expression for the bias

y = 1 f3 ~

(18)

The magnitudes calculated previously suggest that y will underestimate f3 in magnitude by 18 percent or more in typical aggregate models.

The seriousness of the bias introduced in the calibration of .Y from interzonal flows depends on the extent to which these biases introduce errors in demand forecasts. Consider, first, the relative bias in various components of .Y and take the example of transit access time versus cost. In a typical transportation study, some zones will be densely serviced by transit, and this will result in low average transit access time and a low intrazonal variance for this variable. Other zones will be sparsely serviced, and this will lead to a high average transit access time and a high variance. This pattern will substantially bias the coefficient of transit access time; however, transit cost will be relatively homogeneous within zones, and this will result in less bias for the coef­ficient of cost. Therefore, the aggregate model will undervalue transit access time relative to cost and will therefore underpredict the net increase in transit demand, which would occur because of a policy change increasing the density of transit routes as well as transit cost.

Page 9: AGGREGATE TRAVEL DEMAND FORECASTING FROM …onlinepubs.trb.org/Onlinepubs/trr/1975/534/534-003.pdf · AGGREGATE TRAVEL DEMAND FORECASTING FROM DISAGGREGATED BEHAVIORAL MODELS Daniel

32

Aside from the distortions introduced by differential biases in parameter estimates, the forecasting success of the calibrated aggregate model will depend on whether the policy change wider study affects the intrazonal covariance matrix for the independent variables. Given that a~J is constant for all zones and that a policy change affects ziJ

but leaves A1J wichanged, the aggregate model forecasts correctly. Consider, for ex­ample, a wiiform increase in the base transit fare by comparing Eqs. 6 and 10, in which 'Y is set equal to the calibrated value from Eq. 18. The aggregate model fore­casts will err for any policy change that affects NJ; this will be the case particularly for policies that have a heterogeneous impact within the zone. For example, a policy change reducing access times by increasing the number of transit stops will reduce the access-time component in z1J and the corresponding variance in NJ. The true re­sponse given by Eq. 10 Will differ from the response p1·edicted by the calibr ated aggre­gate model by a term r eflecting the effect of this policy on ~J' If the differential im­pedances are distributed so that the frequency of extreme response probabilities in favor of transit is lower than that in favor of the alternative mode, the effect of de­creasing a~J will usually be to decrease transit patronage. As a result, the calibrated aggregate model will overpredict the rise in transit patronage.

The preceding paragraphs have pointed out that, by using the aggregate model of Eq. 12, direct calibration and forecasting can result in substantial prediction biases. This is true except when the zones are homogeneous or when policy change has a ho­mogeneous effect on the zone, which leaves the covariance matrix of the independent variables wichanged. However, when consistent estimates of the intrazonal variances a 1 J are available, the estimator of Eq. 17 can be modified to

(19)

Equation 19 gives a consis tent estimate of the parameter vector fj. Then, Eq. 10 can be used with the estimates y ancJ a~J to correctly forecast the effects of policy changes that are either homogeneous or heterogeneous in their impact on zones.

In practice, it may be feasible to obtain a consistent estimate of the intrazonal co­variance matrix A1J from external sources, but it may not be feasible to obtain the initial consistent estimators of fj necessary to construct estimators of a~J = {3 ' A1Jfj. An alternative is to consider the estimator in Eq. 19 as an implicit fwlction of y:

(20)

Solution of this system of equations by iterative methods provides a consistent estima­tor y* of fj. Given that NJ is wiiform across zones, this equation has the solution

(21)

where

M = "z1J z1J 1 Z.2. L#I

i, j

Page 10: AGGREGATE TRAVEL DEMAND FORECASTING FROM …onlinepubs.trb.org/Onlinepubs/trr/1975/534/534-003.pdf · AGGREGATE TRAVEL DEMAND FORECASTING FROM DISAGGREGATED BEHAVIORAL MODELS Daniel

and

M.y = :E ziJ ~-1cP1J) i, j

33

The difference in the denominator of Eq. 21 is likely to result in rather unsatisfactory statistical properties of this estimator in small samples. In particular, this estimator may fail to exist. Equation 21 may provide a useful initial step for iterative solutions of Eq. 20 for unequal NJ if we initially use some average A of the NJ. We conclude that an efficient and practical procedure for demand forecasting may be (a) to obtain consistent estimators of the NJ covariance matrices and the behavioral parameters {3,

(b) to use these estimates to obtain estimates of a~J = {3' A1 Jf3 and ,,/ 1 + a~P (c) to use interzonal flows in the adjusted aggregate model estimator in Eq. 19 to obtain a more precise estimator of (3, and (d) to use these estimates in Eq. 10 to obtain consistent forecasts of the effect of a transportation policy.

CALIBRATION OF DISAGGREGATED MODELS WITH AGGREGATE DATA

Given data for a sample of individuals, which include accurate measurements of the attributes of the transportation alternatives faced by each person, the disaggregated model of Eq. 6 can be calibrated by straightforward application of a variety of statis­tical techniques. These techniques include maximum likelihood procedures and, under suitable circumstances, minimum chi-square procedures (3, 6). A second approach to calibration is to use the conclusions of the preceding discussfOn to estimate the behav­ioral model from zonal data when the intrazonal covariance matrix of the independent variables is known or can be estimated consistently from external sources.

In practice, a third approach is important. Data on choices, socioeconomic char­acteristics, and some attributes of alternative trips are collected for a sample of in­dividuals, and the remaining attributes of trips are measured only by zonal averages. For example, individual data may be collected on income, age, and travel costs, and zonal averages obtained from transportation grids may be used for access and on­vehicle travel times. This introduces a measurement error in the independent vari­ables, which may bias the estimates obtained by applying the statistical methods or­dinarily used in calibrating disaggregated models . (The reason calibration of a mixed model with individual response frequencies and interzonal average explanatory variables gives consistent estimates and calibration of an aggregate model with zonal average re­sponse frequencies does not give such estimates is that an arithmetic average of inverse cumulative normal transformations of frequencies does not equal the inverse cumulative normal transformation of the arithmetic average of the frequencies. It is this nonlin­earity of the response curve that makes it necessary to distinguish disaggregate and aggregate models.)

We will now determine the conditions under which biases will occur and derive cor­rection formulas. The first useful conclusion is that replacing all independent vari­ables by zonal averages yields consistent estimates of the parameters of the behavioral model when the usual statistical methods are applied. (This conclusion depends criti­cally on the assumption that the intrazonal distributions of the independent variables are not skewed. If, to the contrary, these distributions are skewed, and modes lie between the zonal mean vectors and the total population mean vector, then the use of zonal aver­ages introduces a regression-to-the-mean effect, Which generally biases estimates downward in magnitude.) (Note that individual observations are used as the dependent variables and, thus, differ from the zonal average frequencies used in the previous section.) The loss of efficiency in estimation resulting from use of this procedure may be substantial because not all information is being used. However, in very large

Page 11: AGGREGATE TRAVEL DEMAND FORECASTING FROM …onlinepubs.trb.org/Onlinepubs/trr/1975/534/534-003.pdf · AGGREGATE TRAVEL DEMAND FORECASTING FROM DISAGGREGATED BEHAVIORAL MODELS Daniel

34

samples, this loss may be offset by the saving in cost of providing accurate measures of the travel attributes for each individual. The second result provides a straightfor­ward linear transformation of the estimates obtained from the usual statistical proce­dures, which makes the resulting estimators consistent. These correction formulas require external estimates of the intrazonal variance of the independent variables.

The behavioral model of Eq. 6 can be written as

(22)

Partition zk = (zl, zU, where z~ is the (possibly empty) subvector of components that can be measured for each individual and z~ is the subvector of components that will be approximated in the estimation process by the zonal averages. Then,

(23)

where (3 is partitioned commensurately. It is convenient to cast the calibration process into a regression format by assuming there are a large number of individuals of each type k and that Pk is the response frequency of this homogeneous group. Then,

(24)

where (k are error terms with a zero mean, which converge to zero with a probability of one as the numbers of members of the groups increase. Suppose now that z~ is re­placed by the zonal average z1J and apply ordinary least squares to the equation.

(25)

This general estimation procedure is known as Berkson's method, which under a cor­rect specification of the independent variables provides estimates of the parameters, which are consistent and equivalent in asymptotically large samples to the maximum likelihood estimator.

We rewrite Eq. 25 in vector notation as

(26)

where y and ( are column vectors with components yk and (k respectively, and z1 and "2"2 are matrices with rows z~' and z~J' respectively. The ordinary least squares es­timates from Eq. 26 then satisfy

(27)

But Eq. 24, written in the vector notation used above, is y = z1 {31 + z2{32 + (, and this implies

Page 12: AGGREGATE TRAVEL DEMAND FORECASTING FROM …onlinepubs.trb.org/Onlinepubs/trr/1975/534/534-003.pdf · AGGREGATE TRAVEL DEMAND FORECASTING FROM DISAGGREGATED BEHAVIORAL MODELS Daniel

35

(28)

The last term in this expression converges to zero with a probability of one as the num­ber of individuals of each type becomes large. Then &1, &.i converge (with a probability of one) in this limit to values a1, a:i and satisfy

(29)

We will now consider further the structure of the matrices in this expression. De­fine E to be a square block-diagonal matrix with a block for each zone pair i, j of the form (1/n1J) e 1Je1J ', where e1J is a column vector of ones of length n!J. Then, E satis­fies

Z2 = Ez2 (30)

and

E 2 = E

The average of the intrazonal covariance matrices A1J introduced previously will be denoted by A and can be defined in terms of the matrix E:

nA =I: n 13 A1J = I: I: (zk - Z1J)(zk - -z1l)' i, j i, j kE"N1J

(31)

Partition

_(An A12) A - ~1 A22 (32)

commensurately with (z1, Z2). Then, define MkQ from Eq. 31 as

(33)

Page 13: AGGREGATE TRAVEL DEMAND FORECASTING FROM …onlinepubs.trb.org/Onlinepubs/trr/1975/534/534-003.pdf · AGGREGATE TRAVEL DEMAND FORECASTING FROM DISAGGREGATED BEHAVIORAL MODELS Daniel

36

(34)

Equation 29 can be written

(35)

(36)

(°'1)- _! (M11 M12 + A12)-

1

(0 nA12) (a1) a2 n M21 M22 0 0 a2

(37)

Define

(38}

Then, writing out the expression for the inverse of a partitioned matrix, we obtain

(f31) (°'1) (M.lf(M12 + A12)C-

1 M21 + 111) _1

R = ,.. - c-1 M M11 A12°'2 1'2 ""2 - 21

(39)

The second term in Eq. 39 reflects the bias in the estimators of the disaggregated model introduced by using zonal average values for the variables z2. First, we note that, if the intrazonal correlations of the variables measured by zonal averages are all zero, i.e., A12 = O, then the usual estimators are consistent, and no correction is nec­essary. This is true, in particular, if all variables are measured as zonal averages so that z1 is empty. Second, we note that the magnitude of the bias is determined by the degree of intrazonal variability relative to total variability. If, for example, A12 = 8M12 for some e with O ~ e < 1, then

(40)

where C = M22 - (1 + 8)M21M;fM12. Third, Eq. 39 provides a correction for the usual disaggregate estimators, which makes them consistent for this problem. Application of this correction requires that consistent estimates of A12 be obtained from external sources. Estimates of A might be obtained from previous transportation surveys or from more limited data sets when it is possible to limit, a priori, the structure of A. An example of the latter construction would be an estimate of access-time covariance with income in each zone based on the geometry of the zone, the location of transit routes, and census block statistics on income.

Page 14: AGGREGATE TRAVEL DEMAND FORECASTING FROM …onlinepubs.trb.org/Onlinepubs/trr/1975/534/534-003.pdf · AGGREGATE TRAVEL DEMAND FORECASTING FROM DISAGGREGATED BEHAVIORAL MODELS Daniel

37

SUMMARY

This paper has established links between aggregate travel demand models and disaggre­gate behavioral models. Equation 10 provides a formula for computing interzonal flows, given a calibrated disaggregate model, and zonal averages and intrazonal variances for the independent variables. Biases resulting from direct calibration of an aggregate model when individual behavior conforms to the disaggregate model are derived. The implications of these biases for forecasting are discussed. Equation 20 provides a method for obtaining consistent estimates of the parameters of the disaggregate model from interzonal flow data. Biases introduced in calibration of disaggregate models when some independent variables are approximated by zonal averages are also dis­cussed. It is shown that the use of zonal averages for all independent variables results in consistent estimates of the parameters of the behavioral model. Equation 39 gives a formula for correcting the bias in the estimates of the behavioral parameters when a mix of individual and zonal average variables is used, and an estimate of the intra­zonal covariance matrix can be obtained from external sources.

ACKNOWLEDGMENT

This research has been supported by a grant from the National Science Foundation. This study has not been carried out as part of the research program of the Metropolitan Transportation Commission or the U.S. Department of Transportation, and the views expressed herein do not necessarily reflect the opinions of these agencies.

REFERENCES

1. M. Ben-Akiva. Structure of Travel Demand Models. Transportation Systems Division, Department of Civil Engineering, Massachusetts Institute of Technology, 1972, unpublished.

2. D. Brand. The State of the Art of Travel Demand Forecasting: A Critical Review. Harvard Univ., 1972, unpublished.

3. D. Cox. Analysis of Binary Data. Methuen, 1970. 4. T. Domencich and D. McFadden. Urban Travel Demand-A Behavioral Analysis.

North Holland, 1974. 5. T. Lisco. The Value of Commuters' Travel Time-A Study in Urban Transporta­

tion. PhD dissertation, Univ. of Chicago, 1967. 6. D. McFadden. Conditional Logit Analysis of Qualitative Choice Behavior. In

Frontiers of Econometrics (P. Zarembka, ed.), Academic Press. -7. D. McFadden. The Measurement of Urban Travel Demand. Journal of Public

Economics, 1974. 8. R. Quandt. The Demand for Travel: Theory and Measurement. D. C. Heath

and Company, 1970. 9. P. Stopher and T. Lisco. Modelling Travel Demand: A Disaggregate Behavioral

Approach-Issues and Applications. Proc., Transportation Research Forum, 1970, pp. 195-214.

10. F. Reid. Travel Time Data Base for the BART Impact Study. BART Impact Program, Metropolitan Transportation Commission, Working paper 2-1-74, 1974.