Post on 19-Dec-2015
Estimating Demand
Outline
•Where do demand functions come from?
•Sources of information for demand estimation
•Cross-sectional versus time series data
•Estimating a demand specification using the ordinary least squares (OLS) method.
•Goodness of fit statistics.
The goal of forecasting
To transform available data into equations that provide the best possible forecasts of economic variables—e.g., sales revenues and costs of production—that are crucial for management.
Demand for air travel Houston to Orlando
Recall that our demand function was estimated as follows:
$Q = 25 + 3Y + P_0 - 2P$ [4.1]
Where Q is the number of seats sold; Y is a regional income index; P0 is the fare charged by a rival airline, and P is the airline’s own fare.
Now we will explain how we estimated this demand equation.
Questions managers should ask about a forecasting equation
1. What is the “best” equation that can be obtained (estimated) from the available data?
2. What does the equation not explain?
3. What can be said about the likelihood and magnitude of forecast errors?
4. What are the profit consequences of forecast errors?
How do we get the data to estimate demand forecasting equations?
•Customer surveys and interviews.
•Controlled market studies.
•Uncontrolled market data.
Campbell’s soup estimates demand functions from data obtained from a survey of more than 100,000 consumers.
Survey pitfalls
•Sample bias
•Response bias
•Response accuracy
•Cost
Time-series data: historical data, i.e., the data sample consists of a series of daily, monthly, quarterly, or annual data for variables such as prices, income, employment, output, car sales, stock market indices, exchange rates, and so on.
Cross-sectional data: All observations in the sample are taken from the same point in time and represent different individual entities (such as households, houses, etc.)
Types of data
Time series data: Daily observations, Korean Won per dollar
Year Month Day Won per Dollar
1997 3 10 877
1997 3 11 880.5
1997 3 12 879.5
1997 3 13 880.5
1997 3 14 881.5
1997 3 17 882
1997 3 18 885
1997 3 19 887
1997 3 20 886.5
1997 3 21 887
1997 3 24 890
1997 3 25 891
Student ID Sex Age Height Weight
777672431 M 21 6’1” 178 lbs.
231098765 M 28 5’11” 205 lbs.
111000111 F 19 5’8” 121 lbs.
898069845 F 22 5’4” 98 lbs.
000341234 M 20 6’2” 183 lbs
Example of cross sectional data
Estimating demand equations using regression analysis
Regression analysis is a statistical technique that allows us to quantify the relationship between a
dependent variable and one or more independent or “explanatory” variables.
[Figure: scatter plot of Y against X, with the points X1 and X2 marked on the horizontal axis]
X and Y are not perfectly correlated. However, there is on average a positive relationship between Y and X.
Regression theory
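[Figure: the line E(Y|X) plotted against X, showing the observed value Y1 and its conditional expected value E(Y|X1) at X1; the vertical gap between them is the disturbance]
$E(Y \mid X_i) = \beta_0 + \beta_1 X_i$
$\varepsilon_1 = Y_1 - E(Y \mid X_1)$
We assume that expected conditional values of Y associated with alternative values of X fall on a line.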
Our model is specified as follows:
Q = f (P) where Q is ticket sales and P is the fare
Specifying a single variable model
Q is the dependent variable; that is, we think that variations in Q can be explained by variations in P, the “explanatory” variable.
$Q_i = \beta_0 + \beta_1 P_i$ [1]
$\beta_0$ and $\beta_1$ are called parameters or population parameters.
We estimate these parameters using the data we have available.
Estimating the single variable model
Since the data points are unlikely to fall exactly on a line, [1] must be modified to include a disturbance term ($\varepsilon_i$):
$Q_i = \beta_0 + \beta_1 P_i + \varepsilon_i$ [2]
Estimated Simple Linear Regression Equation
The estimated simple linear regression equation:
$\hat{y} = b_0 + b_1 x$
•$\hat{y}$ is the estimated value of y for a given x value.
•$b_1$ is the slope of the line.
•$b_0$ is the y intercept of the line.
•The graph is called the estimated regression line.
Estimation Process
Regression Model: $y = \beta_0 + \beta_1 x + \varepsilon$
Regression Equation: $E(y) = \beta_0 + \beta_1 x$
Unknown Parameters: $\beta_0$, $\beta_1$
Sample Data: pairs $(x_1, y_1), \ldots, (x_n, y_n)$
Estimated Regression Equation: $\hat{y} = b_0 + b_1 x$
Sample Statistics: $b_0$, $b_1$
$b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$.
Least Squares Method
Least Squares Criterion
$\min \sum (y_i - \hat{y}_i)^2$
where:
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation
Slope for the Estimated Regression Equation
$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
y-Intercept for the Estimated Regression Equation
$b_0 = \bar{y} - b_1 \bar{x}$
where:
$x_i$ = value of independent variable for ith observation
$y_i$ = value of dependent variable for ith observation
$\bar{x}$ = mean value for independent variable
$\bar{y}$ = mean value for dependent variable
n = total number of observations
Line of best fit
The line of best fit is the one that minimizes the sum of the squared vertical distances of the sample points from the line.
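The slope and intercept formulas for the estimated regression equation follow from this minimization. A brief sketch of the derivation, using the symbols $b_0$, $b_1$, $x_i$, $y_i$ from the least squares slides (the function name $S$ is introduced here only for illustration):

```latex
% Sum of squared vertical distances as a function of the two coefficients
S(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2

% First-order conditions (the "normal equations")
\frac{\partial S}{\partial b_0} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
\quad\Longrightarrow\quad b_0 = \bar{y} - b_1 \bar{x}

\frac{\partial S}{\partial b_1} = -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0
\quad\Longrightarrow\quad
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

The second result uses the first: substituting $b_0 = \bar{y} - b_1 \bar{x}$ into the condition for $b_1$ and collecting terms gives the ratio of the cross-product sum to the sum of squared deviations of x.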
1. Specification
2. Estimation
3. Evaluation
4. Forecasting
The 4 steps of demand estimation using regression
Table 4-2
Ticket Prices and Ticket Sales along an Air Route
Year and Quarter   Average Number of Coach Seats   Average Fare
97-1      64.8    250
97-2      33.6    265
97-3      37.8    265
97-4      83.3    240
98-1      111.7   230
98-2      137.5   225
98-3      109.6   225
98-4      96.8    220
99-1      59.5    230
99-2      83.2    235
99-3      90.5    245
99-4      105.5   240
00-1      75.7    250
00-2      91.6    240
00-3      112.7   240
00-4      102.2   235
Mean      87.3    239.7
Std. Dev. 27.9    13.1
Simple linear regression begins by plotting Q-P values on a scatter diagram to determine if there exists an approximate linear relationship:
[Figure: Scatter plot diagram. Fare (vertical axis, 210 to 290) plotted against Passengers (horizontal axis, 20 to 160)]
[Figure: Scatter plot diagram with possible line of best fit. Average One-way Fare (vertical axis) plotted against Number of Seats Sold per Flight (horizontal axis, 0 to 150), with the demand curve labeled Q = 330 - P]
Note that we use X to denote the explanatory variable and Y to denote the dependent variable.
So in our example Sales (Q) is the “Y” variable and Fares (P) is the “X” variable.
Q = Y
P = X
Computing the OLS estimators
We estimated the equation using the statistical software package SPSS. It generated the following output:
Coefficients (a)
Model 1       B (unstandardized)   Std. Error   Beta (standardized)   t        Sig.
(Constant)    478.690              88.036                             5.437    .000
FARE          -1.633               .367         -.766                 -4.453   .001
a. Dependent Variable: PASS
Reading the SPSS Output
From this table we see that our estimate of $\beta_0$ is 478.7 and our estimate of $\beta_1$ is -1.63.
Thus our forecasting equation is given by:
$\hat{Q}_i = 478.7 - 1.63 P_i$
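The coefficients in the SPSS output can be checked by applying the least squares slope and intercept formulas directly to the Table 4-2 data. The sketch below is plain Python rather than SPSS; the function name `ols_fit` and the variable names are illustrative choices, not part of the original slides.

```python
# Fares (P) and seats sold (Q) by quarter, transcribed from Table 4-2.
fares = [250, 265, 265, 240, 230, 225, 225, 220,
         230, 235, 245, 240, 250, 240, 240, 235]
seats = [64.8, 33.6, 37.8, 83.3, 111.7, 137.5, 109.6, 96.8,
         59.5, 83.2, 90.5, 105.5, 75.7, 91.6, 112.7, 102.2]


def ols_fit(x, y):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-products of deviations over sum of squared x deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = ols_fit(fares, seats)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

Up to rounding, the printed values should agree with the (Constant) and FARE entries in column B of the coefficients table.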
Step 3: Evaluation
Now we will evaluate the forecasting equation using standard goodness of fit statistics, including:
1. The standard errors of the estimates.
2. The t-statistics of the estimates of the coefficients.
3. The standard error of the regression (s)
4. The coefficient of determination (R2)
•We assume that the regression coefficients are normally distributed variables.
•The standard error (or standard deviation) of the estimates is a measure of the dispersion of the estimates around their mean value.
•As a general principle, the smaller the standard error, the better the estimates (in terms of yielding accurate forecasts of the dependent variable).
Standard errors of the estimates
The following rule-of-thumb is useful: The standard error of the regression coefficient should be less than half of the size of the corresponding regression coefficient.
$s_{\hat{\beta}_1} < \frac{|\hat{\beta}_1|}{2}$
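Computing the standard error of $\hat{\beta}_1$
Let $s_{\hat{\beta}_1}$ denote the standard error of our estimate of $\beta_1$.
Thus we have:
$s_{\hat{\beta}_1} = \sqrt{\frac{\sum e_i^2}{(n - k) \sum x_i^2}}$
Where:
$x_i = X_i - \bar{X}$
and
$e_i = Q_i - \hat{Q}_i$
and
k is the number of estimated coefficients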
By reference to the SPSS output, we see that the standard error of our estimate of $\beta_1$ is 0.367, whereas the absolute value of our estimate of $\beta_1$ is 1.63. Hence our estimate is about 4 ½ times the size of its standard error.
The SPSS output tells us that the t statistic for the fare coefficient (P) is -4.453. The t test is a way of comparing the error suggested by the null hypothesis to the standard error of the estimate.
The t test
To test for the significance of our estimate of $\beta_1$, we set the following null hypothesis, H0, and the alternative hypothesis, H1:
H0: $\beta_1 \geq 0$
H1: $\beta_1 < 0$
The t distribution is used to test for statistical significance of the estimate:
$t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{-1.633 - 0}{0.367} \approx -4.45$
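The standard error of the fare coefficient and its t statistic can be recomputed from the Table 4-2 data. This is a minimal sketch in plain Python (variable names are illustrative); it takes k = 2 because the model estimates an intercept and one slope.

```python
import math

# Fares (P) and seats sold (Q) by quarter, transcribed from Table 4-2.
fares = [250, 265, 265, 240, 230, 225, 225, 220,
         230, 235, 245, 240, 250, 240, 240, 235]
seats = [64.8, 33.6, 37.8, 83.3, 111.7, 137.5, 109.6, 96.8,
         59.5, 83.2, 90.5, 105.5, 75.7, 91.6, 112.7, 102.2]

n = len(fares)
k = 2  # estimated coefficients: intercept and slope

# Least squares estimates.
p_bar = sum(fares) / n
q_bar = sum(seats) / n
s_xx = sum((p - p_bar) ** 2 for p in fares)
s_xy = sum((p - p_bar) * (q - q_bar) for p, q in zip(fares, seats))
b1 = s_xy / s_xx
b0 = q_bar - b1 * p_bar

# Residuals e_i = Q_i - Qhat_i and their sum of squares.
residuals = [q - (b0 + b1 * p) for p, q in zip(fares, seats)]
ess = sum(e ** 2 for e in residuals)

# Standard error of the slope and the t statistic against a hypothesized value of 0.
se_b1 = math.sqrt(ess / ((n - k) * s_xx))
t_stat = (b1 - 0) / se_b1

print(f"se(b1) = {se_b1:.3f}, t = {t_stat:.3f}")
```

The results should match the Std. Error and t entries for FARE in the SPSS coefficients table, up to rounding.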
The coefficient of determination, R2, is defined as the proportion of the total variation in the dependent variable (Y) "explained" by the regression of Y on the independent variable (X). The total variation in Y or the total sum of squares (TSS) is defined as:
$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} y_i^2$
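The explained variation in the dependent variable (Y) is called the regression sum of squares (RSS) and is given by:
$RSS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = \sum_{i=1}^{n} \hat{y}_i^2$
Note: $y_i = Y_i - \bar{Y}$ and $\hat{y}_i = \hat{Y}_i - \bar{Y}$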
Coefficient of determination (R2)
What remains is the unexplained variation in the dependent variable or the error sum of squares (ESS):
$ESS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$
We can say the following:
•TSS = RSS + ESS, or
•Total variation = Explained variation + Unexplained variation
R² is defined as:
$R^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS} = \frac{\sum_{i=1}^{n} \hat{y}_i^2}{\sum_{i=1}^{n} y_i^2} = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} y_i^2}$
We see from the SPSS model summary table that R2 for this model is .586
ANOVA (b)
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   6863.624         1    6863.624      19.826   .001 (a)
Residual     4846.816         14   346.201
Total        11710.440        15
a. Predictors: (Constant), FARE
b. Dependent Variable: PASS
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .766 (a)   .586       .557                18.6065
a. Predictors: (Constant), FARE
Note that: 0 ≤ R² ≤ 1
If R² = 0, the fitted regression line is horizontal and explains none of the variation in Y (for example, when the sample points are scattered in a circle)
If R² = 1, the sample points all lie on the regression line
In our case, R² = 0.586, meaning that 58.6 percent of the variation in the dependent variable (seats sold) is explained by the regression.
Notes on R2
This is not a particularly good fit based on R2 since 41.4 percent of the variation in the dependent variable is unexplained.
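The decomposition TSS = RSS + ESS and the resulting R² can be verified from the Table 4-2 data. The following is a sketch in plain Python with illustrative variable names; it recomputes the fitted values rather than reading them from SPSS.

```python
# Fares (P) and seats sold (Q) by quarter, transcribed from Table 4-2.
fares = [250, 265, 265, 240, 230, 225, 225, 220,
         230, 235, 245, 240, 250, 240, 240, 235]
seats = [64.8, 33.6, 37.8, 83.3, 111.7, 137.5, 109.6, 96.8,
         59.5, 83.2, 90.5, 105.5, 75.7, 91.6, 112.7, 102.2]

n = len(fares)
p_bar = sum(fares) / n
q_bar = sum(seats) / n

# Least squares estimates and fitted values.
b1 = (sum((p - p_bar) * (q - q_bar) for p, q in zip(fares, seats))
      / sum((p - p_bar) ** 2 for p in fares))
b0 = q_bar - b1 * p_bar
fitted = [b0 + b1 * p for p in fares]

# Total, explained (regression) and unexplained (error) sums of squares.
tss = sum((q - q_bar) ** 2 for q in seats)
rss = sum((f - q_bar) ** 2 for f in fitted)
ess = sum((q - f) ** 2 for q, f in zip(seats, fitted))

r_squared = rss / tss
print(f"TSS = {tss:.3f}, RSS = {rss:.3f}, ESS = {ess:.3f}")
print(f"R squared = {r_squared:.3f}")
```

The three sums of squares should line up with the Total, Regression and Residual rows of the ANOVA table, and the ratio with the R Square entry of the model summary.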
The standard error of the regression (s) is given by:
$s = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - k}}$
Standard error of the regression
The model summary tells us that s = 18.6
Regression is based on the assumption that the error term is normally distributed, so that about 68.27% of the actual values of the dependent variable (seats sold) should be within one standard error (18.6 seats in our example) of their fitted value.
Also, 95.45% of the observed values of seats sold should be within 2 standard errors (37.2 seats) of their fitted values.
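As a rough check of these coverage figures, the sketch below recomputes s from the Table 4-2 data and counts how many of the 16 residuals fall within one and two standard errors. It is plain Python with illustrative variable names; with only 16 observations the observed shares can only approximate the normal-distribution percentages.

```python
import math

# Fares (P) and seats sold (Q) by quarter, transcribed from Table 4-2.
fares = [250, 265, 265, 240, 230, 225, 225, 220,
         230, 235, 245, 240, 250, 240, 240, 235]
seats = [64.8, 33.6, 37.8, 83.3, 111.7, 137.5, 109.6, 96.8,
         59.5, 83.2, 90.5, 105.5, 75.7, 91.6, 112.7, 102.2]

n = len(fares)
k = 2  # estimated coefficients: intercept and slope

p_bar = sum(fares) / n
q_bar = sum(seats) / n
b1 = (sum((p - p_bar) * (q - q_bar) for p, q in zip(fares, seats))
      / sum((p - p_bar) ** 2 for p in fares))
b0 = q_bar - b1 * p_bar

# Residuals and the standard error of the regression.
residuals = [q - (b0 + b1 * p) for p, q in zip(fares, seats)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - k))

# Count observations whose actual value is within 1 and 2 standard errors
# of the fitted value.
within_one = sum(1 for e in residuals if abs(e) <= s)
within_two = sum(1 for e in residuals if abs(e) <= 2 * s)

print(f"s = {s:.4f}")
print(f"within 1 s: {within_one} of {n}; within 2 s: {within_two} of {n}")
```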
Step 4: Forecasting
Recall the equation obtained from the regression results is:
$\hat{Q}_i = 478.7 - 1.63 P_i$
Our first step is to perform an “in-sample” forecast.
At the most basic level, forecasting consists of inserting forecasted values of the explanatory variable P (fare) into the forecasting equation to obtain forecasted values of the dependent variable Q (passenger seats sold).
In-Sample Forecast of Airline Sales
Year and Quarter   Actual Sales (Q)   Predicted Sales (Q*)   Q* - Q   (Q* - Q) squared
97-1   64.8    70.44    5.64     31.81
97-2   33.6    45.94    12.34    152.28
97-3   37.8    45.94    8.14     66.26
97-4   83.3    86.77    3.47     12.04
98-1   111.7   103.1    -8.6     73.96
98-2   137.5   111.26   -26.24   688.54
98-3   109.6   111.26   1.66     2.76
98-4   96.8    119.43   22.63    512.12
99-1   59.5    103.1    43.6     1900.96
99-2   83.2    94.94    11.74    137.83
99-3   90.5    78.61    -11.89   141.37
99-4   105.5   86.77    -18.73   350.81
00-1   75.7    70.44    -5.26    27.67
00-2   91.6    86.77    -4.83    23.33
00-3   112.7   86.77    -25.93   672.36
00-4   102.2   94.94    -7.26    52.71
Sum of Squared Errors 4846.80
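The in-sample forecast amounts to plugging each quarter's fare into the estimated equation and comparing the result with actual sales. A minimal sketch in plain Python follows; it uses the coefficients as reported in the SPSS output (478.690 and -1.633), and the function name `forecast_seats` is a hypothetical helper introduced for illustration.

```python
# Quarter labels, fares (P) and actual seats sold (Q) from Table 4-2.
quarters = ["97-1", "97-2", "97-3", "97-4", "98-1", "98-2", "98-3", "98-4",
            "99-1", "99-2", "99-3", "99-4", "00-1", "00-2", "00-3", "00-4"]
fares = [250, 265, 265, 240, 230, 225, 225, 220,
         230, 235, 245, 240, 250, 240, 240, 235]
actual = [64.8, 33.6, 37.8, 83.3, 111.7, 137.5, 109.6, 96.8,
          59.5, 83.2, 90.5, 105.5, 75.7, 91.6, 112.7, 102.2]

# Coefficients as reported (rounded) in the SPSS coefficients table.
B0, B1 = 478.690, -1.633


def forecast_seats(fare):
    """Predicted seats sold for a given fare, using the estimated equation."""
    return B0 + B1 * fare


predicted = [forecast_seats(p) for p in fares]
errors = [q_star - q for q_star, q in zip(predicted, actual)]
sse = sum(e ** 2 for e in errors)

for label, q, q_star, e in zip(quarters, actual, predicted, errors):
    print(f"{label}  actual={q:6.1f}  predicted={q_star:7.2f}  error={e:7.2f}")
print(f"Sum of squared errors = {sse:.2f}")
```

Because the coefficients are rounded, individual entries may differ from the table in the last digit, but the sum of squared errors should be close to the residual sum of squares in the ANOVA table.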
[Figure: In-Sample Forecast of Airline Sales. Actual and fitted Passengers (vertical axis, 20 to 160) plotted by Year/Quarter (97.1 to 00.3)]
Our ability to generate accurate forecasts of the dependent variable depends on two factors:
•Do we have good forecasts of the explanatory variable?
•Does our model exhibit structural stability, i.e., will the causal relationship between Q and P expressed in our forecasting equation hold up over time? After all, the estimated coefficients are average values for a specific time interval (1997-2000). While the past may be a serviceable guide to the future in the case of purely physical phenomena, the same principle does not necessarily hold in the realm of social phenomena (to which the economy belongs).
Can we make a good forecast?
Single Variable Regression Using Excel
We will estimate an equation and use it to predict home prices in two cities. Our data set is on the next slide.
City Income Home Price
Akron, OH 74.1 114.9
Atlanta, GA 82.4 126.9
Birmingham, AL 71.2 130.9
Bismarck, ND 62.8 92.8
Cleveland, OH 79.2 135.8
Columbia, SC 66.8 116.7
Denver, CO 82.6 161.9
Detroit, MI 85.3 145
Fort Lauderdale, FL 75.8 145.3
Hartford, CT 89.1 162.1
Lancaster, PA 75.2 125.9
Madison, WI 78.8 145.2
Naples, FL 100 173.6
Nashville, TN 77.3 125.9
Philadelphia, PA 87 151.5
Savannah, GA 67.8 108.1
Toledo, OH 71.2 101.1
Washington, DC 97.4 191.9
•Income (Y) is average family income in 2003
•Home Price (HP) is the average price of a new or existing home in 2003.
Model Specification
$HP = b_0 + b_1 Y$
[Figure: Scatter Diagram: Income and Home Prices. Home Prices (vertical axis, 80 to 200) plotted against Income (horizontal axis, 50 to 110)]
Excel Output
Regression Statistics
Multiple R          0.906983447
R Square            0.822618973
Adjusted R Square   0.811532659
Standard Error      11.22878416
Observations        18
             Coefficients    Standard Error   t Stat
Intercept    -48.11037724    21.58459326      -2.228922114
Income       2.332504769     0.270780116      8.614017895
ANOVA
             df   SS
Regression   1    9355.715502
Residual     16   2017.369498
Total        17   11373.085
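Equation and prediction
$\widehat{HP} = -48.11 + 2.33Y$
With income and home prices measured in thousands of dollars, as in the data set, the equation gives:
City            Income     Predicted HP
Meridian, MS    $59,600    $90,758.00
Palo Alto, CA   $121,000   $233,820.00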
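The Excel regression can be reproduced from the 18-city data set with the same least squares formulas. The sketch below is plain Python, not Excel; variable and function names are illustrative. It assumes, based on the magnitudes in the data table, that income and home price are both recorded in thousands of dollars, so the two cities' incomes are entered as 59.6 and 121.0.

```python
# Income (Y) and home price (HP) for the 18 cities in the data set,
# in the order listed (Akron through Washington, DC).
income = [74.1, 82.4, 71.2, 62.8, 79.2, 66.8, 82.6, 85.3, 75.8,
          89.1, 75.2, 78.8, 100.0, 77.3, 87.0, 67.8, 71.2, 97.4]
home_price = [114.9, 126.9, 130.9, 92.8, 135.8, 116.7, 161.9, 145.0, 145.3,
              162.1, 125.9, 145.2, 173.6, 125.9, 151.5, 108.1, 101.1, 191.9]

n = len(income)
y_bar = sum(income) / n
hp_bar = sum(home_price) / n

# Least squares slope and intercept for HP = b0 + b1 * Y.
s_xy = sum((y - y_bar) * (hp - hp_bar) for y, hp in zip(income, home_price))
s_xx = sum((y - y_bar) ** 2 for y in income)
b1 = s_xy / s_xx
b0 = hp_bar - b1 * y_bar

# Goodness of fit: share of the variation in home prices explained.
tss = sum((hp - hp_bar) ** 2 for hp in home_price)
ess = sum((hp - (b0 + b1 * y)) ** 2 for y, hp in zip(income, home_price))
r_squared = 1 - ess / tss


def predict_home_price(income_thousands):
    """Predicted home price (thousands of dollars) for a given income
    (thousands of dollars). The units are an assumption, see the lead-in."""
    return b0 + b1 * income_thousands


print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, R squared = {r_squared:.4f}")
for city, y in [("Meridian, MS", 59.6), ("Palo Alto, CA", 121.0)]:
    print(f"{city}: predicted home price = {predict_home_price(y):.1f} thousand")
```

The predictions here use the unrounded coefficients, so they differ slightly from figures obtained with the rounded equation. Note also that Palo Alto's income lies well outside the range of incomes in the sample, so that forecast is an extrapolation.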