Topic 1. Definitions - Rice University (batsell/Part 4 Linear Model.pdf)

PART IV LINEAR MODELS

Topic 1. Definitions

4-1

1. Scalar A scalar is a number.

2. Vector A vector is a column of numbers.

3. Linear combination A scalar times a vector plus a scalar times a vector, plus a scalar times a vector . . . etc.

4. Adding two vectors To add two (column) vectors, they must be of the same length, the same number of observations; adding two vectors together is accomplished by adding the contents of each row, one at a time, to form a new vector.

5. Multiplying a scalar by a vector To multiply a scalar by a vector, one simply creates a new vector the same length as the old vector, where every new value is calculated as the value in the old vector multiplied by the scalar.

A + B = C

    A  B  C
    1  0  1
    1  0  1
    1  0  1
    1  0  1
    0  1  1
    0  1  1

2A + 3B = D

    A  B  D
    1  0  2
    1  0  2
    1  0  2
    1  0  2
    0  1  3
    0  1  3
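The two worked examples above can be sketched in a few lines of Python (plain lists; the helper names are ours, not part of the handout):

```python
def add_vectors(a, b):
    """Add two column vectors row by row; they must be the same length."""
    assert len(a) == len(b), "vectors must have the same number of rows"
    return [x + y for x, y in zip(a, b)]

def scale_vector(s, v):
    """Multiply every entry of vector v by the scalar s."""
    return [s * x for x in v]

A = [1, 1, 1, 1, 0, 0]
B = [0, 0, 0, 0, 1, 1]

C = add_vectors(A, B)                                    # [1, 1, 1, 1, 1, 1]
D = add_vectors(scale_vector(2, A), scale_vector(3, B))  # [2, 2, 2, 2, 3, 3]
print(C, D)
```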


Topic 2. Linear Independence

4-2

1. Definition A set of vectors is said to be linearly independent if no vector in the set can be expressed as a linear combination of others in the set.

2. How to Determine Linear Independence

When faced with a set of vectors, it will sometimes be necessary to determine how many of the vectors are linearly independent. The steps below can be followed:

1) Determine if any vector in the set of N can be written as a linear combination of the rest.

If not, there are N linearly independent vectors. (Stop.)

2) If any one vector can be expressed as a linear combination of the rest (any scalars including 0 are permissible), then eliminate that vector.

3) Of the remaining N-1 vectors, determine if any one can be written as a linear combination of the rest. If not, then among the N vectors, N-1 are linearly independent. If yes, eliminate the vector and proceed with the set of N-2 vectors.

4) Continue the process until all vectors remaining in the set are linearly independent. If k vectors have been eliminated, there are (N-k) vectors that are linearly independent.

3. Examples

1)
    A  B  C
    1  1  0
    1  1  0
    1  1  0
    1  0  1
    1  0  1

2)
    A  B  C  D  E
    1  1  0  1  0
    1  1  0  0  1
    1  0  1  1  0
    1  0  1  0  1
    1  0  1  1  0

3)
    A  B  C  D  E
    1  1  0  1  1
    1  1  0  1  0
    1  0  1  0  1
    1  0  1  0  1
    1  0  1  0  0
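The elimination procedure above amounts to computing the rank of the set of vectors. A sketch in Python (the function name is ours), using exact fractions so the "is this entry zero?" tests are reliable:

```python
from fractions import Fraction

def num_independent(vectors):
    """Count linearly independent vectors by Gaussian elimination.

    `vectors` is a list of column vectors (each a list of numbers);
    exact Fraction arithmetic avoids floating-point rank mistakes.
    """
    rows = [[Fraction(x) for x in v] for v in vectors]  # treat each vector as a row
    rank, ncols = 0, len(rows[0])
    for col in range(ncols):
        # Find a row (at or below the current rank) with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Zero out this column in every other row.
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

# Example 1 above: A = B + C, so only 2 of the 3 vectors are independent.
A = [1, 1, 1, 1, 1]
B = [1, 1, 1, 0, 0]
C = [0, 0, 0, 1, 1]
print(num_independent([A, B, C]))  # 2
```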


Topic 3. Simple Example – Media Test

4-3

1. Simple Example – Media, No Unit Vector: Problem

The Data

    Sales  Media
     37      1
     35      1
     39      1
     42      2
     44      2
     44      2
     46      2

A) Full Model

S = a1M1 + a2M2

Sales = a1M1 + a2M2 + Error

    Sales   a1M1   a2M2   Error
     37     ___    ___    ___
     35     ___    ___    ___
     39     ___    ___    ___
     42     ___    ___    ___
     44     ___    ___    ___
     44     ___    ___    ___
     46     ___    ___    ___

1) Fill in the 7 values for M1 and M2 above.

2) Solve for a1 and a2: a1 = _____ a2 = _____

3) Fill in the 7 values in the error vector above.

4) Solve for the error sum-of-squares for the Full Model. ESSF = _____.

5) Write down:

Value of expected value for Sales for media 1:

Value of expected value for Sales for media 2:


Topic 3. Media Test (Continued)

4-4

B) Restricted Model

S = bU

Sales = bU + Error

    Sales    U     Error
     37     ___     ___
     35     ___     ___
     39     ___     ___
     42     ___     ___
     44     ___     ___
     44     ___     ___
     46     ___     ___

1) Fill in the 7 values for U above.

2) Solve for b: b = _____

3) Fill in the 7 values in the error vector above.

4) Solve for the error sum-of-squares for the Restricted Model. ESSR = _____.

5) Write down:

Value of expected value for sales for media 1:

Value of expected value for sales for media 2:

C) Solve for F

F = [(ESSR - ESSF) / (NLF - NLR)] / [ESSF / (NOB - NLF)]

Answer for F: _____________


Topic 3. Media Test (Continued)

4-5

2. Simple Example – Media, No Unit Vector: Answer

The Data

    Sales  Media
     37      1
     35      1
     39      1
     42      2
     44      2
     44      2
     46      2

A) Full Model

1) & 3) The Model

    Sales   a1M1   a2M2   Error
     37      1      0       0
     35      1      0      -2
     39      1      0      +2
     42      0      1      -2
     44      0      1       0
     44      0      1       0
     46      0      1      +2

2) a1 = 37    a2 = 44

4) ESSF = 16

5) Value of expected value for Sales for Media 1: ___37____

Value of expected value for Sales for Media 2: ___44____


Topic 3. Media Test (Continued)

4-6

Linear restriction: a1 = a2 = b

Sales = a1M1 + a2M2 = bM1 + bM2 = b(M1 + M2) = bU

B) Restricted Model

1) & 3) The Model

    Sales    bU    Error
     37      1      -4
     35      1      -6
     39      1      -2
     42      1      +1
     44      1      +3
     44      1      +3
     46      1      +5

2) b = 41

4) ESSR = 100

5) Value of expected value for Sales for Media 1: ___41____

Value of expected value for Sales for Media 2: ___41____

C) Solve for F

F = [(ESSR - ESSF) / (NLF - NLR)] / [ESSF / (NOB - NLF)]

  = [(100 - 16) / (2 - 1)] / [16 / (7 - 2)]

  = 84 / 3.2

  = 26.25
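The worksheet answers above can be checked numerically. A minimal sketch in plain Python (the variable names are ours, not part of the handout):

```python
sales = [37, 35, 39, 42, 44, 44, 46]
media = [1, 1, 1, 2, 2, 2, 2]

def mean(xs):
    return sum(xs) / len(xs)

# Full model: one weight per media group (a1 = group-1 mean, a2 = group-2 mean).
a1 = mean([s for s, m in zip(sales, media) if m == 1])
a2 = mean([s for s, m in zip(sales, media) if m == 2])
ess_f = sum((s - (a1 if m == 1 else a2)) ** 2 for s, m in zip(sales, media))

# Restricted model: one weight on the unit vector (the grand mean).
b = mean(sales)
ess_r = sum((s - b) ** 2 for s in sales)

nlf, nlr, nob = 2, 1, 7
f = ((ess_r - ess_f) / (nlf - nlr)) / (ess_f / (nob - nlf))
# a1 = 37.0, a2 = 44.0, b = 41.0, ESSF = 16.0, ESSR = 100.0, F ≈ 26.25
print(a1, a2, b, ess_f, ess_r, round(f, 2))
```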


Topic 3. Media Test (Continued)

4-7

3. Simple Example – Media, With Unit Vector: Problem

A) Full Model

S = a0U + a3M1

Sales = a0U + a3M1 + Error

    Sales    U     M1    Error
     37     ___    ___    ___
     35     ___    ___    ___
     39     ___    ___    ___
     42     ___    ___    ___
     44     ___    ___    ___
     44     ___    ___    ___
     46     ___    ___    ___

1) Fill in the 7 values for U and M1 above.

2) Solve for a0 and a3: a0 = _____ a3 = _____

3) Fill in the 7 values in the error vector above.

4) Solve for the error sum-of-squares for this Full Model, ESSF = _____.

5) Write down:

Value of expected value for sales, media 1:

Value of expected value for sales, media 2:

B) Restricted Model:

Note: it is the same as the Restricted Model in the above example with no unit vectors.

C) Solve for F.

F = _________________________________________________________


Topic 3. Media Test (Continued)

4-8

4. Simple Example – Media, With Unit Vector: Answer

A) Full Model

1) & 3)

Sales = a0U + a3M1 + Error

    Sales    U     M1    Error
     37      1      1      0
     35      1      1     -2
     39      1      1     +2
     42      1      0     -2
     44      1      0      0
     44      1      0      0
     46      1      0     +2

2) a0 = 44    a3 = -7

4) ESSF = 16

5) Value of expected value for Sales for Media 1:

a0(1) + a3(1) = 44(1) - 7(1) = 37

Value of expected value for Sales for Media 2: a0(1) + a3(0) = a0 = 44

Linear restriction a3 = 0

B) Restricted Model

S = a0U    (Same as in the example with no unit vector.)

C) Solve for F.

F = [(100 - 16) / (2 - 1)] / [16 / (7 - 2)] = 84 / 3.2 = 26.25


Topic 3. Media Test (Continued)

4-9

5. Simple Example – Media, With Unit Vector: SPSS

A) Data

sales media media1

1 37.00 1.00 1.00

2 35.00 1.00 1.00

3 39.00 1.00 1.00

4 42.00 2.00 .00

5 44.00 2.00 .00

6 44.00 2.00 .00

7 46.00 2.00 .00
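The regression SPSS runs on this data can be reproduced by hand with the usual one-predictor least-squares formulas (slope = covariance over variance, intercept from the means). A sketch in Python (the variable names are ours; this is not the SPSS workflow itself):

```python
sales = [37, 35, 39, 42, 44, 44, 46]
media1 = [1, 1, 1, 0, 0, 0, 0]   # the MEDIA1 dummy from the data listing above
n = len(sales)

# Ordinary least squares for S = a0*U + a3*M1.
mx = sum(media1) / n
my = sum(sales) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(media1, sales))
sxx = sum((x - mx) ** 2 for x in media1)
a3 = sxy / sxx        # ≈ -7  (media-1 mean minus media-2 mean)
a0 = my - a3 * mx     # ≈ 44  (media-2 mean, since M1 = 0 there)
print(a0, a3)
```

These match the B column of the SPSS coefficients table that follows.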


Topic 3. Media Test (Continued)

4-10

B) Regression Output

Variables Entered/Removed(b)

    Model 1   Variables Entered: MEDIA1(a)   Variables Removed: (none)   Method: Enter

    a. All requested variables entered.
    b. Dependent Variable: SALES

Model Summary

    Model 1   R: .917(a)   R Square: .840   Adjusted R Square: .808   Std. Error of the Estimate: 1.7889

    a. Predictors: (Constant), MEDIA1

ANOVA(b)

    Model 1       Sum of Squares   df   Mean Square      F       Sig.
    Regression        84.000        1      84.000      26.250   .004(a)
    Residual          16.000        5       3.200
    Total            100.000        6

    a. Predictors: (Constant), MEDIA1
    b. Dependent Variable: SALES

Coefficients(a)

    Model 1      Unstandardized B   Std. Error   Standardized Beta      t       Sig.
    (Constant)       44.000            .894                           49.193    .000
    MEDIA1           -7.000           1.366           -.917           -5.123    .004

    a. Dependent Variable: SALES

(Reading the output: the Regression sum of squares is ESSR - ESSF = 84, the Residual sum of squares is ESSF = 16, and the Total is ESSR = 100; the B column gives a0 = 44 and a3 = -7.)


Topic 4. Simple Example – Price, 3 Levels

4-11

(A) The Model

Simple Example

Now we illustrate the one independent variable test using a simple example with only seven observations. The dependent variable is sales and the independent variable is price with 3 levels: $5, $10, or $15. Here is the raw data.

    Trade Area   Unit Sales   Price Charged
        1           140            $5
        2           136            $5
        3           122           $10
        4           124           $10
        5           104           $15
        6           108           $15
        7           106           $15

Full Model

1. Full Model in theory

We construct the Full model in which we allow a different estimate for sales at each level of price:

S = a1P5 + a2P10 + a3P15   .................. Model (1)

2. Full Model in SPSS

Since SPSS automatically adds the unit vector to our model, we must drop one of the three binary predictor vectors, either P5, P10 or P15. (We must drop one of the vectors because Unit = P5 + P10 + P15, and this would introduce a linear dependency into the model. Dropping the vector is not a problem, however, because model (2) below and model (1) above are equivalent models.)

S = a0U + a1P5 + a2P10   .................. Model (2)

In this model (2), which we call our full model, sales is the dependent variable (measured at the interval or ratio level), U is the unit vector of all ones, a0 is the weight on the unit vector,


Topic 4. Price, 3 Levels (Continued)

4-12

P5 is a binary predictor vector, a1 is the weight on P5, P10 is a binary predictor vector and its weight is a2. What is in P5? P5 has ones and zeros. It has a one when the sales for the row came from a trade area where we charged $5 and a zero otherwise. What is in P10? P10 has ones and zeros. It has a one when the sales for the row came from a trade area where we charged $10 and a zero otherwise.

3. Converting raw observations into Full Model

The above observations about P5 and P10 are illustrated in the following complete depiction of the model.

S = a0U + a1P5 + a2P10

    S      U   P5   P10
    140    1    1    0
    136    1    1    0
    122    1    0    1
    124    1    0    1
    104    1    0    0
    108    1    0    0
    106    1    0    0

(The binary predictor vectors P5 and P10 are created using the recode function in SPSS and the data on whether $5 or $10 was charged.)

4. Calculating expected value (EV) of Sales using Full Model

From our full model, what is the expected value of sales given that $5 (or $10) was charged? To answer that question we first have to say what is in each vector of the model under the condition “$5 was charged.” U has a 1, P5 has a 1 and P10 has a 0.

EV(S: $5) = a0U + a1P5 + a2P10 = a0(1) + a1(1) + a2(0) = a0 + a1

When “$10 was charged”, U still has a 1, but P5 has a 0 and P10 has a 1.

EV(S: $10) = a0U + a1P5 + a2P10 = a0(1) + a1(0) + a2(1) = a0 + a2


Topic 4. Price, 3 Levels (Continued)

4-13

When “$15 was charged”, U still has a 1, but P5 has a 0, and P10 also has a 0.

EV(S: $15) = a0U + a1P5 + a2P10 = a0(1) + a1(0) + a2(0) = a0

5. Parameters’ estimation in SPSS

The parameters a0, a1 and a2 are estimated in SPSS by using the Regression function under the Statistics menu where sales is the dependent variable and P5 and P10 are the independent variables. (Don’t worry about where they come from, this will be explained later.)

In our example, a0 = 106, a1 = 32, a2 = 17. (Details as per the SPSS outputs shown later.) So our full model can be written as:

S = (106)U + (32)P5 + (17)P10

6. Restating the model with error term E1

• Using the parameter estimates, we can restate the model with the error term E1 as follows:

S = (106)U + (32)P5 + (17)P10 + E1

    S      U   P5   P10    E1
    140    1    1    0     +2
    136    1    1    0     -2
    122    1    0    1     -1
    124    1    0    1     +1
    104    1    0    0     -2
    108    1    0    0     +2
    106    1    0    0      0

• How can we get the value of E1 in the above model?

First, we need to get the expected values of Sales at the different price levels: $5, $10, and $15. To do so, we simply plug the parameter estimates into our solutions in item 4 earlier.

EV(S: $5) = a0 + a1, so our estimate for sales at $5 is 106 + 32 = 138.
EV(S: $10) = a0 + a2, so our estimate at $10 is 106 + 17 = 123.
EV(S: $15) = a0, so our estimate at $15 is 106.

Then, by comparing the estimated value and the raw observations at each price level, we can get the values in E1.
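These expected values follow mechanically from the fitted weights. A quick check in Python (the dictionary layout is ours):

```python
a0, a1, a2 = 106, 32, 17   # SPSS estimates from item 5 above

# Expected value of sales at each price level = a0*U + a1*P5 + a2*P10,
# with the (U, P5, P10) pattern for that level plugged in.
patterns = {5: (1, 1, 0), 10: (1, 0, 1), 15: (1, 0, 0)}
ev = {price: a0 * u + a1 * p5 + a2 * p10
      for price, (u, p5, p10) in patterns.items()}
print(ev)  # {5: 138, 10: 123, 15: 106}
```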


Topic 4. Price, 3 Levels (Continued)

4-14

7. Error sum-of-squares of full model (ESSF)

The error sum-of-squares of our full model is simply the sum of the squared errors in E1:

ESSF = (2)² + (-2)² + (-1)² + (1)² + (-2)² + (2)² + (0)²

= 4 + 4 + 1 + 1 + 4 + 4 + 0

= 18

Restricted Model

1. The hypothesis in our test

The hypothesis we wish to test is whether our sample could have come from a population where there is no relationship between price and sales. Put another way: could our sample have come from a population where the sales at all three price levels were equal?

In other words, if there is no relationship between price and sales, the expected value of sales would be the same at every price level. So we can state this hypothesis in null form:

EV(S: $5) = EV(S: $10) = EV(S: $15)

Substituting the appropriate parameters, it can be re-written as:

a0 + a1 = a0 + a2 = a0

Note that the one and only condition under which the above equation is true is a1 = a2 = 0.

2. Linear restriction

So the linear restriction we impose on the Full Model (2) is a1 = a2 = 0. This is what we are testing. Could our sample have come from a population where a1 = a2 = 0?

3. Restricted model

The linear restriction gives us our restricted model

S = a0′U   .................. Model (3)

(We write a0′ because when SPSS automatically runs the restricted model with just a weight on the unit vector, the value for a0′ in such a model will almost always be different than the value for a0 in Model (2), the full model. The least squares estimate for a0′ in Model (3) is 120. Note: for a model in which only the unit vector is present, the weight on the unit vector will simply be the average of the dependent variable.)


Topic 4. Price, 3 Levels (Continued)

4-15

Rewriting Model (3) with the error vector E2, we have:

S = 120 U + E2

    S      U     E2
    140    1    +20
    136    1    +16
    122    1     +2
    124    1     +4
    104    1    -16
    108    1    -12
    106    1    -14

Note that in this model we estimate sales to be the same (every time our estimate is 120), regardless of the price charged.

4. Error sum-of-squares of restricted model (ESSR)

The error sum-of-squares of our restricted model is simply the sum of the squared errors in E2:

ESSR = (20)² + (16)² + (2)² + (4)² + (-16)² + (-12)² + (-14)²

= 400 + 256 + 4 + 16 + 256 + 144 + 196

= 1,272

F Statistic Calculation

1. Now we calculate our F statistic with the following numbers:

• There are 3 linearly independent predictor vectors in the full model (NLF = 3);
• There is 1 linearly independent predictor vector in the restricted model (NLR = 1);
• There are 7 observations in our example (NOB = 7);
• The error sum-of-squares of the full model is 18 (ESSF = 18); the error sum-of-squares of the restricted model is 1,272 (ESSR = 1,272).

F = [(1,272 - 18) / (3 - 1)] / [18 / (7 - 3)] = [1,254 / 2] / [18 / 4] = 627 / 4.5 = 139.33
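The whole chain of calculations for this example can be reproduced in a few lines. A sketch in plain Python (the names are ours, not part of the handout):

```python
sales = [140, 136, 122, 124, 104, 108, 106]
price = [5, 5, 10, 10, 15, 15, 15]

def group_mean(level):
    vals = [s for s, p in zip(sales, price) if p == level]
    return sum(vals) / len(vals)

# Full model: a separate expected value at each price level.
fit = {lvl: group_mean(lvl) for lvl in set(price)}   # 138.0 at $5, 123.0 at $10, 106.0 at $15
ess_f = sum((s - fit[p]) ** 2 for s, p in zip(sales, price))   # 18.0

# Restricted model: the grand mean only.
grand = sum(sales) / len(sales)                      # 120.0
ess_r = sum((s - grand) ** 2 for s in sales)         # 1272.0

nlf, nlr, nob = 3, 1, 7
f = ((ess_r - ess_f) / (nlf - nlr)) / (ess_f / (nob - nlf))
print(round(f, 2))  # 139.33
```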


Topic 4. Price, 3 Levels (Continued)

4-16

2. Interpretation

From the SPSS output, the probability that we would observe an F of 139.33 or larger in a sample taken from a population where the true F is zero is very low: less than .0005. Since these odds are so small, we conclude that our sample did not come from a population where the linear restriction is true (equivalently, the F in the population is not zero). And if the linear restriction is not true in the population, then sales differ when the price differs, and we must think carefully about the price we charge. Perhaps we can use cost and margin data to figure out the optimal price.

(B) SPSS Data

area sales price p5 p10

1 1.00 140.00 5.00 1.00 .00

2 2.00 136.00 5.00 1.00 .00

3 3.00 122.00 10.00 .00 1.00

4 4.00 124.00 10.00 .00 1.00

5 5.00 104.00 15.00 .00 .00

6 6.00 108.00 15.00 .00 .00

7 7.00 106.00 15.00 .00 .00


Topic 4. Price, 3 Levels (Continued)

4-17

Output Interpretation

Regression

Variables Entered/Removed(b)

    Model 1   Variables Entered: P10, P5(a)   Variables Removed: (none)   Method: Enter

    a. All requested variables entered.
    b. Dependent Variable: SALES

Model Summary

    Model 1   R: .993(a)   R Square: .986   Adjusted R Square: .979   Std. Error of the Estimate: 2.1213

    a. Predictors: (Constant), P10, P5

ANOVA(b)

    Model 1       Sum of Squares   df   Mean Square      F        Sig.
    Regression      1254.000        2     627.000     139.333    .000(a)
    Residual          18.000        4       4.500
    Total           1272.000        6

    a. Predictors: (Constant), P10, P5
    b. Dependent Variable: SALES

Coefficients(a)

    Model 1      Unstandardized B   Std. Error   Standardized Beta      t       Sig.
    (Constant)      106.000           1.225                           86.549    .000
    P5               32.000           1.936           1.072           16.525    .000
    P10              17.000           1.936            .570            8.779    .001

    a. Dependent Variable: SALES

(Reading the output: the Regression sum of squares is ESSR - ESSF = 1,254, the Residual sum of squares is ESSF = 18, and the Total is ESSR = 1,272; the B column gives a0 = 106, a1 = 32 and a2 = 17.)


Topic 5. More On The Concept of Linear Models And F Statistics Test

4-18

Example Data Base

Assume we are working with the data from the example file in which we test marketing a new product. In the test market, we systematically varied prices ($5 or $10), advertising (equivalent to $10,000 per market area or $20,000 per market area) and secret ingredient X (essentially, at 4 different levels). There were 96 different test market areas, each roughly equivalent in terms of size, income, and all other relevant characteristics, and we recorded the sales of the product in each market area after a suitable interval.

In this data set, it is easy to identify the dependent variable (Sales) because everything else was part of the carefully controlled experiment. So what we want to do is to test for relationships between each of the controlled variables and Sales. Does price affect sales? Does advertising affect sales? Does the level of the secret ingredient affect sales? Assume for the moment that our data is really data from a population of interest (and not the sample that it is).

The Logic

What would be true if there was a relationship between the dependent variable Sales and the independent variable Price? We would observe that for different values of Price we obtained different values for Sales. If this were a product which was price sensitive, then we would expect Sales to be higher when Price was lower. Since we are dealing with 48 observations for each price level, we would expect the average Sales for the price of $5 to be higher than the average Sales for the price of $10.

One way to test this would be to calculate the 2 averages and compare them. (Remember, we assumed this was our population of interest, so if the averages are different, we conclude there exists a relationship.) But simply comparing averages will not work for all of the hypotheses we wish to test. There are many fairly complex hypotheses we wish to test that require us to think differently than simply in terms of averages.

Linear Models

1. The Concept of Linear Model

In Topic 1 of Part IV, we introduced the concept of a linear combination of a set of vectors. It is simply the sum of a weight times a vector, plus a weight times a vector, ... etc. Put most simply: A linear model is a linear combination of a set of predictor vectors.


Topic 5. Linear Models & F Test (Continued)

4-19

It is a model in the sense that it is intended to reproduce (or fit) the values for one variable (we call it the dependent variable) given the values on 1 or more other variables (we call them the independent variables). For example, we might create a linear model to predict Sales as a function of Price. Or Advertising. Or Price and Advertising. Or Price, Advertising and our secret Ingredient X.

2. Full Model and Restricted Model

To test our hypotheses, we need to create 2 models -- a full model and a restricted model -- and compare them in terms of their fit to a set of data. The restricted model is created by imposing a linear restriction on the weights in the full model. If the linear restriction is true, then the restricted model will fit the data almost as well as the full model. If the linear restriction is not true, then the restricted model will not fit the data as well as the full model.

3. Example Demonstration

Now we use an example to illustrate the full and restricted model. Suppose we wish to test for a relationship between Price and Sales.

• Full model

We know that Price has 2 levels ($5 and $10). So we first create a full model in which we express Sales as a function of Price. Because Price has 2 levels, we form 2 predictor vectors: 1 to be associated with Sales values that resulted when Price was at $5 and the other to be associated with Sales values that resulted when Price was at $10. The predictor vectors will be binary, i.e., they will contain zeros or ones, and they can be thought of as membership vectors in the sense that they indicate whether a particular sales result is a "member of" the $5 price condition or the $10 price condition. The full model looks like this:

S = a1(P5) + a2(P10)

Where:
S is the sales value;
P5 is the binary predictor vector which will contain i) a one if the observed sales value came from a test market

area where $5 was charged, and ii) a zero otherwise;
P10 is the binary predictor vector which will contain i) a one if the observed sales value came from a test market area where $10 was charged, and ii) a zero otherwise;
a1 is the weight (to be estimated) for predictor vector P5, and
a2 is the weight (to be estimated) for predictor vector P10.


Topic 5. Linear Models & F Test (Continued)

4-20

If we submitted the above model and data to a software package, it would produce estimates for a1 and a2 equal to 134.83 and 122.31, respectively. (Incidentally, it turns out in this simple case that the estimate for a1 will be equal to the average sales when the price is $5 and the estimate for a2 will be equal to the average sales when the price is $10.)

Some definitions:
i) We call a1 the expected value for sales at a price of $5. We call 134.83 the value of the expected value for sales at a price of $5.
ii) We call a2 the expected value for sales at a price of $10. We call 122.31 the value of the expected value for sales at a price of $10.

• Full model with an error vector

Because we almost never have a model which fits the data perfectly, we must add an error vector E1 to our model. So the full model with an error vector looks like this:

S = a1(P5) + a2(P10) + E1

Using the estimates of a1 and a2, as well as the observations from our data base, we can get the values of this error vector.

To calculate the value for the error term in row 1 we would have:

1    102 = 134.83(0) + 122.31(1) - 20.31

For rows 2, 3, 4, 95, and 96 we would have:

2    120 = 134.83(0) + 122.31(1) - 2.31

S = 134.83(P5) + 122.31(P10) + E1

    Row    S     P5   P10     E1
    1     102     0    1    -20.31
    2     120     0    1     -2.31
    3     137     1    0     +2.17
    4     125     1    0     -9.83
    ...
    95    134     1    0     -0.83
    96    140     0    1    +17.69


Topic 5. Linear Models & F Test (Continued)

4-21

3    137 = 134.83(1) + 122.31(0) + 2.17
4    125 = 134.83(1) + 122.31(0) - 9.83
95   134 = 134.83(1) + 122.31(0) - 0.83
96   140 = 134.83(0) + 122.31(1) + 17.69

Focus for a moment on the error vector. The weights, a1 = 134.83 and a2 = 122.31, are chosen so as to minimize the sum of the squares of the error terms. There is no other set of values for a1 and a2 that would produce a lower error sum of squares. The error sum-of-squares is a measure of how well our model "fits" the data. In our full model, the error sum-of-squares is ESSF = 8,778.98.

• Restricted model

Remember that our model has allowed for one estimate for sales at a price of $5 (134.83) and another estimate for sales at a price of $10 (122.31). The fact that there is a difference between the averages suggests there is a relationship. But our way of testing this is to now create a restricted model which does not allow for differences in the estimates for sales at price = $5 and price = $10, and compare the new error sum-of-squares to the old error sum-of-squares.

To create our restricted model, we need to impose a linear restriction on the weights in the full model that embodies our hypothesis. In this case our hypothesis (stated in the "null" form) is that:

There is no relationship between the price charged and the resulting sales. In terms of the expected values for the full model, our hypothesis is that:

The expected value for sales at price = $5 is equal to the expected value for sales at price = $10: EV(S:P5) = EV(S:P10)

But in our full model, the expected value for sales at price = $5 is a1 and the expected value for sales at price = $10 is a2. So in terms of the weights, the hypothesis EV(S:P5) = EV(S:P10) is represented by the linear restriction: a1 = a2

When we impose the linear restriction on the weights in the full model (let a1 = a2 = c), we get our restricted model (with error vector): S = c(P5 + P10) + E2


Topic 5. Linear Models & F Test (Continued)

4-22

But P5 + P10 gives us a vector with all ones. We label such a vector the unit vector, U. So our restricted model is S = c(U) + E2. Our least-squares estimate for c is 128.57. (Incidentally, when the restricted model is just the unit vector, the weight will be the average of all of the values for the dependent variable.) The error sum-of-squares for the restricted model is ESSR = 12,541.49.

• Analysis

By imposing the linear restriction, the ESS went from 8,778.98 to 12,541.49. Thus, the restricted model is not nearly as good a fit as the full model.

F Statistics

1. The concept

But we can't use ESS alone as our index of fit. Differences between ESS for a full model and a restricted model, although affected by differences in fit, can also be affected by differences in the number of parameters being estimated. For this reason we need to construct an index which takes all relevant factors into consideration and provides one single summary of the difference between the full model and the restricted model. We call our index the F statistic and it is calculated using the following formula:

F = [(ESSR - ESSF) / (NLF - NLR)] / [ESSF / (NOB - NLF)]

Where:
ESSR is the error sum-of-squares for the restricted model;
ESSF is the error sum-of-squares for the full model;
NLF is the number of linearly independent predictor vectors in the full model;
NLR is the number of linearly independent predictor vectors in the restricted model;
NOB is the number of observations on which the two models are based.

Note that, all other things equal, the greater the difference between ESSR and ESSF, the greater will be the value for F. Also note that when ESSR = ESSF (ESSR can never be less than ESSF), F equals zero.
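The formula can be wrapped in a small helper and checked against the worked examples in this part. A sketch in Python (the function name is ours):

```python
def f_statistic(ess_r, ess_f, nlf, nlr, nob):
    """F for comparing a full model against a restricted model.

    ess_r, ess_f : error sums-of-squares of the restricted and full models
    nlf, nlr     : numbers of linearly independent predictors in each model
    nob          : number of observations
    """
    df1 = nlf - nlr          # numerator degrees of freedom
    df2 = nob - nlf          # denominator degrees of freedom
    return ((ess_r - ess_f) / df1) / (ess_f / df2)

# The worked examples in this part:
print(f_statistic(100, 16, 2, 1, 7))             # media test: ≈ 26.25
print(f_statistic(1272, 18, 3, 1, 7))            # price test: ≈ 139.33
print(f_statistic(12541.49, 8778.98, 2, 1, 96))  # this topic's sample: ≈ 40.29
```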


Topic 5. Linear Models & F Test (Continued)

4-23

2. Sampling error concern

Now suppose we reintroduce the fact that our data is really a sample. If no relationship exists between price and sales in the population, then ESSR will equal ESSF in the population. That is, the average sales for both price levels will be the same, so it won't matter whether we allow 2 estimates (as we do in the full model) or 1 estimate (as we do in the restricted model). If ESSR = ESSF in the population, then F = 0 in the population. Thus, when there is no relationship between Price and Sales in the population, the F will be zero.

But because we are taking samples, it would be possible for us to obtain sample F values that were not zero even though the true F for the population was zero. So we need to know the sampling distribution for the F statistic. The sampling distribution for F depends on degrees of freedom. But this time, instead of only 1, there are 2: DF1 and DF2.

DF1 = NLF - NLR, the denominator in the numerator of the formula for F;
DF2 = NOB - NLF, the denominator in the denominator of the formula for F.

Once we know DF1 and DF2, we can draw the sampling distribution for F.

3. Example Demonstration

In our example

F = [(12,541.49 − 8,778.98) / (2 − 1)] / [8,778.98 / (96 − 2)] = 3,762.51 / (8,778.98 / 94) = 3,762.51 / 93.39 = 40.29

The probability that, with DF1 = 1 and DF2 = 94, we would get an F of 40.29 or larger in a sample taken from a population where the true F was 0, is .0001. Since this probability is so low, we can conclude that our linear restriction a1 = a2 is probably not true in the population from which this sample was taken. Thus, the average sales in the population where we charge $5 would not be the same as the average sales where we charge $10, so there must be a relationship between price and sales.
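The F computation above can be sketched in a few lines of code. This is a minimal illustration; the function and variable names are ours, not from the course materials.

```python
# A general F for comparing a full model to a restricted model, as defined above.
def f_statistic(essr, essf, nlf, nlr, nob):
    """essr/essf: error sum-of-squares of restricted/full model;
    nlf/nlr: linearly independent predictors in full/restricted model;
    nob: number of observations."""
    df1 = nlf - nlr                      # DF1, the numerator's denominator
    df2 = nob - nlf                      # DF2, the denominator's denominator
    f = ((essr - essf) / df1) / (essf / df2)
    return f, df1, df2

# The price example: ESSR = 12,541.49, ESSF = 8,778.98, NLF = 2, NLR = 1, NOB = 96
f, df1, df2 = f_statistic(12541.49, 8778.98, 2, 1, 96)
print(round(f, 2), df1, df2)   # → 40.29 1 94
```

This reproduces the 40.29 with DF1 = 1 and DF2 = 94 used in the sampling-distribution argument above.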

Topic 6. Steps For One Variable Test

Suggested Steps for Conducting One Independent Variable Test

1. Pick two variables where you believe one variable is dependent on (i.e. is possibly caused by) the other. Label the two variables as dependent and independent, respectively. The dependent variable must be at least interval scaled. (An exception for this will be made in this class for the Fail3/Fail4 database where the dependent variable is binary, 1 or 0.)

2. Now inspect the values for the dependent variable. If a plot of the values for the dependent variable reveals that a few values are clearly outliers – that is, a few are very large or very small and clearly set apart from the rest of the observations – then create a new working file in which the entire row for each of these "outlier" observations has been deleted.

3. With the observations that remain after step 2, now focus on the values for the independent variable. If the independent variable is nominal and/or takes on only a few discrete values, then proceed to step 4. But if the independent variable is continuous, then try to divide its values into roughly 4 to 7 groups where the interval widths are equal. To group your observations:
a) Decide on the number of groups you would like to have;
b) Ignoring the extreme values of the independent variable, calculate the interval width as (Max – Min)/(# of intervals desired).

4. For each different group on the independent variable, use the recode feature to create a binary predictor vector (a membership vector). Make sure to recode missing values on the independent variable for a row into missing values in the binary predictor vector for that row.

5. Make certain you have at least 5 observations per group. If you don’t, you need to recode differently and go back to step 4. Checking for at least 5 observations per group can be accomplished by running frequencies or descriptives on the binary predictor vectors.

6. Use regression under the statistics menu to run the model.

7. Pull the appropriate numbers from the output to complete the tables illustrated in the example one-variable test assignment.
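Steps 3 through 5 above can be sketched in code. This is an illustrative sketch (the helper names are ours); missing data are represented as None, mirroring the system-missing handling described in step 4.

```python
# Group a continuous independent variable into equal-width intervals (step 3)
# and build one binary predictor (membership) vector per group (step 4).
def make_binary_vectors(x, n_groups=4):
    observed = [v for v in x if v is not None]
    lo, hi = min(observed), max(observed)
    width = (hi - lo) / n_groups              # step 3b: equal interval widths
    vectors = {}
    for g in range(n_groups):
        name = "G%d" % (g + 1)
        vectors[name] = []
        for v in x:
            if v is None:
                vectors[name].append(None)    # missing stays missing
            else:
                grp = min(int((v - lo) // width), n_groups - 1)  # max value joins last group
                vectors[name].append(1 if grp == g else 0)
    return vectors

def group_sizes(vectors):
    """Step 5: count the observations (ones) in each binary predictor vector."""
    return {name: sum(v for v in col if v is not None)
            for name, col in vectors.items()}

vecs = make_binary_vectors([1.0, 1.5, 2.0, 3.0, None, 2.2], n_groups=4)
print(group_sizes(vecs))
```

A real analysis would then insist on at least 5 observations per group before running the regression, regrouping if necessary, exactly as step 5 requires.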

Topic 7. Two Independent Variable Test

(A) The Two Independent Variable Test With Binary Predictor Vectors

1. In this test, we select two independent variables and create binary predictor vectors for both. When you create the binary predictor vectors, make sure they contain either ones, zeros, or the system-missing value indicator. Create such vectors for all levels of both variables, not just the N-1 levels you have been creating.

2. Now create new binary predictor vectors by multiplying (Transform/Compute) every binary predictor vector on the first independent variable by every binary predictor vector on the second independent variable. If the first variable has N levels and the second has K levels, in this step you will be creating N times K vectors. For example, in testdata, two levels on price crossed with four levels on X gives 8 new binary predictor vectors. P5x1 would be a vector with ones where the sales came from a trade area where $5 was charged and xlevel was 1 gram, and zeros otherwise. P5x2 would have ones where $5 was charged and there were 1.5 grams of the secret ingredient. This continues all the way up to P10x4, the last of the eight vectors, which would have ones where $10 was charged and there were 3 grams of the secret ingredient. If you were testing a 2-level variable by a 2-level variable, you would be creating 4 binary predictor vectors.

3. Now you need to create the full model (Model 1). To run the model we must drop one of our 8 (xlevel by price example) binary predictor vectors because SPSS is going to add the unit vector. We run this model by submitting the N*K-1 predictor vectors. We get this model’s error sum-of-squares from the residual line of the output and the parameters from the output just like before.

4. SPSS automatically tests the linear restriction that all parameters (except the weight on the unit vector) are zero.

5. We now want to test to see if price mattered in our Full model and we perform that test by forcing the information on price out of our model and just running with the binary predictor vectors for xlevel. The results for this model (Model 2) are compared to the results of the Full model in the form of an F test and using the F tables.

6. To test to see if xlevel mattered in our Full model we force the information on xlevel out of our model and see how much worse the model (Model 3) with just price fits in the form of an F test.


7. Note: If you have some rows missing data on one of your two independent variables but not the other, it will be necessary to create the binary predictor vectors for models 2 and 3 by summing up the appropriate vectors from the full model. This is the only way that all 3 models will be run with exactly the same set of observations. For example, to create the vectors for running just price you need to recreate p5 by summing p5x1+p5x2+p5x3+p5x4. x1 would be created by summing p5x1+p10x1, and so on.
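The crossing in step 2 and the re-summing in step 7 can be sketched as follows. The vector names follow the text's example, but the data here are made up, and missing values are represented as None.

```python
# Step 2: cross two binary predictor vectors into an interaction vector.
def cross(v1, v2):
    """Elementwise product; a missing value (None) in either vector
    stays missing in the product."""
    return [None if a is None or b is None else a * b
            for a, b in zip(v1, v2)]

# Step 7: rebuild a one-variable vector, e.g. P5 = P5X1 + P5X2 + P5X3 + P5X4,
# so all three models are run on exactly the same set of observations.
def vsum(*vectors):
    """Row-wise sum across several binary predictor vectors."""
    return [None if any(v is None for v in row) else sum(row)
            for row in zip(*vectors)]

p5 = [1, 1, 0, 0, None]
x1 = [1, 0, 1, 0, 1]
p5x1 = cross(p5, x1)
print(p5x1)   # → [1, 0, 0, 0, None]
```

Because None propagates through both helpers, a row missing on either original variable is missing in every derived vector, which is the point of step 7.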

(B) SPSS Outputs

Descriptives

Because the mean of a column of 1's and 0's is the proportion of 1's, the descriptives output can be used to easily calculate how many 1's are in each binary predictor vector. For example, 96 × .1250 = 12. Thus all 8 binary predictor vectors (BPVs) have exactly 12 observations. Given our knowledge of the data, this is what we would have expected.

Descriptive Statistics

                      N     Mean
P5X1                 96    .1250
P5X2                 96    .1250
P5X3                 96    .1250
P5X4                 96    .1250
P10X1                96    .1250
P10X2                96    .1250
P10X3                96    .1250
P10X4                96    .1250
Valid N (listwise)   96


Model 1 Regression

Variables Entered/Removed(b)
Model 1 — Variables Entered: P10X3, P10X2, P10X1, P5X4, P5X3, P5X2, P5X1(a); Method: Enter
a. All requested variables entered.
b. Dependent Variable: SALES

Model Summary
Model 1: R = .891(a), R Square = .795, Adjusted R Square = .778, Std. Error of the Estimate = 5.5306
a. Predictors: (Constant), P10X3, P10X2, P10X1, P5X4, P5X3, P5X2, P5X1

ANOVA(b)
Model 1      Sum of Squares    df   Mean Square        F     Sig.
Regression        10411.958     7      1487.423   48.629  .000(a)
Residual           2691.667    88        30.587
Total             13103.625    95
a. Predictors: (Constant), P10X3, P10X2, P10X1, P5X4, P5X3, P5X2, P5X1
b. Dependent Variable: SALES

Coefficients(a)
Model 1           B    Std. Error    Beta         t    Sig.
(Constant)  129.250         1.597            80.957   .000
P5X1          -.583         2.258   -.017     -.258   .797
P5X2          4.667         2.258    .132     2.067   .042
P5X3          8.583         2.258    .243     3.802   .000
P5X4         -2.417         2.258   -.068    -1.070   .287
P10X1       -24.250         2.258   -.686   -10.740   .000
P10X2       -12.000         2.258   -.340    -5.315   .000
P10X3         8.500         2.258    .241     3.765   .000
a. Dependent Variable: SALES
(B = Unstandardized Coefficients; Beta = Standardized Coefficients)


Model 2 Regression

Variables Entered/Removed(b)
Model 1 — Variables Entered: RXLEVEL3, RXLEVEL2, RXLEVEL1(a); Method: Enter
a. All requested variables entered.
b. Dependent Variable: SALES

Model Summary
Model 1: R = .639(a), R Square = .408, Adjusted R Square = .389, Std. Error of the Estimate = 9.1806
a. Predictors: (Constant), RXLEVEL3, RXLEVEL2, RXLEVEL1

ANOVA(b)
Model 1      Sum of Squares    df   Mean Square        F     Sig.
Regression         5349.542     3      1783.181   21.157  .000(a)
Residual           7754.083    92        84.284
Total             13103.625    95
a. Predictors: (Constant), RXLEVEL3, RXLEVEL2, RXLEVEL1
b. Dependent Variable: SALES

Coefficients(a)
Model 1           B    Std. Error    Beta         t    Sig.
(Constant)  128.042         1.874            68.326   .000
RXLEVEL1    -11.208         2.650   -.415    -4.229   .000
RXLEVEL2     -2.458         2.650   -.091     -.928   .356
RXLEVEL3      9.750         2.650    .361     3.679   .000
a. Dependent Variable: SALES


Model 3 Regression

Variables Entered/Removed(b)
Model 1 — Variables Entered: P5(a); Method: Enter
a. All requested variables entered.
b. Dependent Variable: SALES

Model Summary
Model 1: R = .407(a), R Square = .165, Adjusted R Square = .156, Std. Error of the Estimate = 10.7869
a. Predictors: (Constant), P5

ANOVA(b)
Model 1      Sum of Squares    df   Mean Square        F     Sig.
Regression         2166.000     1      2166.000   18.615  .000(a)
Residual          10937.625    94       116.358
Total             13103.625    95
a. Predictors: (Constant), P5
b. Dependent Variable: SALES

Coefficients(a)
Model 1           B    Std. Error    Beta         t    Sig.
(Constant)  122.313         1.557            78.559   .000
P5            9.500         2.202    .407     4.315   .000
a. Dependent Variable: SALES

Topic 8. F Tables

Steps In Testing Hypotheses Using The F Tables

1. Run the Full and Restricted Models, calculate the F statistic using the appropriate error sum-of-squares, and note the two degrees of freedom. Say our sample's calculated F was 7.2.

2. Pick the probability you wish to use for this test: .01, .025, .05, .10, then use the 1% table, the 2.5% table, the 5% table, or the 10% table respectively.

3. For your test's degrees of freedom, look up the F value.

4. What does the F value from the table indicate? Say we are working with the 5% table and the F we pull from the table is 3.0.

This means that (only) 5% of the time would one get an F of 3.0 or larger from a sample taken from a population where the true F was 0.

Put another way: We would say (only) 5% of the time would one get an F of 3.0 or larger from a sample taken from a population where the linear restriction on the parameters of the Full Model to get the Restricted Model was true.

(In the case of the one independent variable test) Put another way: (Only) 5% of the time would one get an F of 3.0 or larger from a sample taken from a population where the average for the dependent variable was the same across all levels of the independent variable.

5. Since our calculated F of 7.2 is larger than the table F of 3.0, the odds in all three statements of 4 above are less than 5% for our sample.

6. Thus, we can conclude that:

The F for the population from which our sample came is probably not zero. Or,

Our sample probably did not come from a population where the linear restriction is true. Or,

(In the case of the one independent variable test) Our sample probably did not come from a population where the average for the dependent variable was the same across all levels of the independent variable.

7. Thus, we conclude there is probably a relationship between the two variables in the population from which our sample came.
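The decision rule in steps 5 through 7 amounts to a single comparison, sketched below with the example numbers from the text (calculated F = 7.2, table F = 3.0 at the 5% level).

```python
# Compare the sample's calculated F to the critical ("table") F.
def reject_restriction(calculated_f, table_f):
    """True: the calculated F exceeds the table F, so reject the linear
    restriction and conclude a relationship probably exists."""
    return calculated_f > table_f

print(reject_restriction(7.2, 3.0))   # → True
```

In practice the table F comes from the F table matching your chosen probability (.01, .025, .05, or .10) and your two degrees of freedom.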

Topic 9: Test For Linearity

The Logic

In the test for linearity, we first specify a full model in which we create binary predictor vectors for each of several different levels (at least three) of an independent variable. We want to find out if constant increases in the independent variable result in constant increases in the dependent variable. For example, assume that when the value for the independent variable increases from 1 to 2 (an increase of 1 unit), the value for the dependent variable increases from 15 to 30 (an increase of 15 units). If it is also true that for any other one-unit increase on the independent variable the dependent variable increases by approximately 15 units, and for 1/2-unit increases the dependent variable increases by 7.5 units, then the relationship is probably linear. But to know whether this sample could have come from a population where the relationship is linear, we must do a statistical test.

Hypothesis for Test

In reality we don't believe that XLEVEL is linearly related to Sales. But the null hypothesis is that XLEVEL is linearly related to Sales. We will test this hypothesis by comparing, with an F statistic, the ESS for the full model (with binary vectors) to the ESS of a restricted model in which the relationship is forced to be linear.

Full Model

1. Full model with unit vector

The dependent variable is sales, the independent variable is Xlevel with 4 levels. The full model with unit vector is:

S = a0U + a2X2 + a3X3 + a4X4

where X2 contains a 1 if the sales figure came from an area where the level of ingredient "X" was 1.5, and a zero otherwise; and so on through X4. X2 has 24 1's and 72 0's. The same is true for X3 and X4. The unit vector, of course, has all 1's.

2. Expected value of Sales at each xlevel in full model

EV(S: X1) = a0(1) + a2(0) + a3(0) + a4(0) = a0
EV(S: X2) = a0(1) + a2(1) + a3(0) + a4(0) = a0 + a2
EV(S: X3) = a0(1) + a2(0) + a3(1) + a4(0) = a0 + a3
EV(S: X4) = a0(1) + a2(0) + a3(0) + a4(1) = a0 + a4


3. SPSS output – full model

Regression

Variables Entered/Removed(b)
Model 1 — Variables Entered: XL4, XL3, XL2(a); Method: Enter
a. All requested variables entered.
b. Dependent Variable: SALES

Model Summary
Model 1: R = .639(a), R Square = .408, Adjusted R Square = .389, Std. Error of the Estimate = 9.1806
a. Predictors: (Constant), XL4, XL3, XL2

ANOVA(b)
Model 1      Sum of Squares    df   Mean Square        F     Sig.
Regression         5349.542     3      1783.181   21.157  .000(a)
Residual           7754.083    92        84.284
Total             13103.625    95
a. Predictors: (Constant), XL4, XL3, XL2
b. Dependent Variable: SALES

Coefficients(a)
Model 1           B    Std. Error    Beta         t    Sig.
(Constant)  116.833         1.874            62.345   .000
XL2           8.750         2.650    .324     3.302   .001
XL3          20.958         2.650    .777     7.908   .000
XL4          11.208         2.650    .415     4.229   .000
a. Dependent Variable: SALES

(The Residual sum-of-squares, 7,754.083, is ESSF; the (Constant), XL2, XL3, and XL4 coefficients are the estimates of a0, a2, a3, and a4.)


4. For this full model, the R-square is .408, adjusted R-square is .389, and the standard error of the estimate is 9.1806. Using the parameters estimated by SPSS, we can calculate the expected value of Sales.

EV(S: X1) = a0 = 116.833
EV(S: X2) = a0 + a2 = 116.833 + 8.750 = 125.583
EV(S: X3) = a0 + a3 = 116.833 + 20.958 = 137.791
EV(S: X4) = a0 + a4 = 116.833 + 11.208 = 128.041

Restricted Model

1. If the relationship is linear, equal increases in "X" must produce proportional increases in expected Sales. Letting c be the increase in Sales per one-gram increase in "X" (the levels are 1, 1.5, 2, and 3 grams):

EV(S: X2) − EV(S: X1) = (a0 + a2) − (a0) = a2 = .5c
EV(S: X3) − EV(S: X2) = (a0 + a3) − (a0 + a2) = a3 − a2 = .5c
EV(S: X4) − EV(S: X3) = (a0 + a4) − (a0 + a3) = a4 − a3 = c

So: a2 = .5c; a3 = a2 + .5c = .5c + .5c = c; a4 = a3 + c = c + c = 2c

2. Restricted model

S = a0′U + .5cX2 + cX3 + 2cX4 = a0′U + c(.5X2 + X3 + 2X4)

3. Expected value of Sales at each xlevel in restricted model

EV(S: X1) = a0′(1) + c[.5(0) + (0) + 2(0)] = a0′
EV(S: X2) = a0′(1) + c[.5(1) + (0) + 2(0)] = a0′ + .5c
EV(S: X3) = a0′(1) + c[.5(0) + (1) + 2(0)] = a0′ + c
EV(S: X4) = a0′(1) + c[.5(0) + (0) + 2(1)] = a0′ + 2c


4. SPSS output - restricted model

Regression

Variables Entered/Removed(b)
Model 1 — Variables Entered: LINVECT(a); Method: Enter
a. All requested variables entered.
b. Dependent Variable: SALES

Model Summary
Model 1: R = .346(a), R Square = .120, Adjusted R Square = .110, Std. Error of the Estimate = 11.0787
a. Predictors: (Constant), LINVECT

ANOVA(b)
Model 1      Sum of Squares    df   Mean Square        F     Sig.
Regression         1566.201     1      1566.201   12.760  .001(a)
Residual          11537.424    94       122.739
Total             13103.625    95
a. Predictors: (Constant), LINVECT
b. Dependent Variable: SALES

Coefficients(a)
Model 1           B    Std. Error    Beta         t    Sig.
(Constant)  122.283         1.752            69.808   .000
LINVECT       5.462         1.529    .346     3.572   .001
a. Dependent Variable: SALES

(The Residual sum-of-squares, 11,537.424, is ESSR; the (Constant) estimates a0′ and the weight on LINVECT estimates c.)


4. For this restricted model, the R-square is .120, adjusted R-square is .110, and the standard error of the estimate is 11.0787.

Using the parameters estimated by SPSS, we can calculate the expected value of Sales.

EV(S: X1) = a0′ = 122.283
EV(S: X2) = a0′ + .5c = 122.283 + .5(5.462) = 125.014
EV(S: X3) = a0′ + c = 122.283 + 5.462 = 127.745
EV(S: X4) = a0′ + 2c = 122.283 + 2(5.462) = 133.207

Analysis

1. Expected value of Sales

2. F-statistic calculation

ESSR = 11,537.424, ESSF = 7,754.083, NLF = 4, NLR = 2, NOB = 96

F = [(11,537.424 − 7,754.083) / (4 − 2)] / [7,754.083 / (96 − 4)] = (3,783.341 / 2) / (7,754.083 / 92) = 1,891.6705 / 84.2835 = 22.44
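The same arithmetic can be checked in a couple of lines, using the quantities reported above.

```python
# Re-computing the linearity-test F from the restricted and full model ESS.
essr, essf = 11537.424, 7754.083
nlf, nlr, nob = 4, 2, 96
f = ((essr - essf) / (nlf - nlr)) / (essf / (nob - nlf))
print(round(f, 2))   # → 22.44
```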

[Figure: Test for Linearity — Sales (in 1000's, roughly 102 to 147) plotted against X level (1 gram, 1.5 grams, 2 grams, 3 grams), comparing the expected values of the Full Model and the Restricted Model.]


3. Conclusion

Reject the linear restriction. We would observe an F of 4.88 or larger (with degrees of freedom df1 = 2 and df2 = 92) from a sample taken from a population where the true F was zero only 1% of the time. Since the above computed F is much larger, we would observe it, or one larger, even less often. Therefore, the sample probably did not come from a population where the linear restriction would have been true.

The relationship is not linear. The peak at the third level of XLEVEL (about two grams of the secret ingredient), after which sales go down, is probably not just a chance occurrence. There does appear to be an ideal level of "X".

Topic 10. Steps For Linearity Test

Suggested Steps For Conducting The Linearity Test

1. Pick two variables that you have already established are related to each other. As always in linear models, the dependent variable must be measured at the interval or ratio level. (We still include the exception where the dependent is two-level, coded 1 or 0, as in Fail3/Fail4.) In this test, however, the independent variable should be measured at the interval or ratio level as well. (If the independent variable is only ordinal but has 6 or more levels (i.e., 6 or more binary predictor vectors), then an exception can be made that allows use of this ordinal-level variable in the linearity test, but be sure to tell the reader.)

2. Now inspect the values for the dependent variable. If a plot of the values for the dependent variable reveals that a few values are clearly outliers – that is, a few are very large or very small and clearly set apart from the rest of the observations – then create a new working file in which the entire row for each of these "outlier" observations has been deleted.

3. With the observations that remain after step 2, now focus on the values for the independent variable. If there exist natural breaks or meaningful categories, then use them to decide on the separate levels for the independent variable. You are seeking 4 to 7 different groups; the absolute minimum number of different groups for running the linearity test is 3. If there are no natural breaks or logically meaningful categories, then:
a) Decide on the number of groups you would like to have;
b) Ignoring the extreme values of the independent variable, calculate the interval width as (Max – Min)/(# of intervals desired).

4. For each different group on the independent variable, use the recode feature to first create a recoded independent variable. Then use the recode feature and the recoded independent variable to create a binary predictor vector (a membership vector) for each level on the independent variable. Make sure to recode missing values on the independent variable for a row into missing values in the binary predictor vector for that row.

5. Make certain you have at least 5 observations per group. If you don't, you need to recode differently and go back to step 4. Checking for at least 5 observations per group can be accomplished by running frequencies or descriptives on the binary predictor vectors.

6. Use regression under the statistics menu to run this model with binary predictor vectors.

7. Pull ESS (residual) as ESS for this full model, R-square, adjusted R-square, and the parameter estimates from the output. Use the parameter estimates and the values of 1 or 0 in the predictor vectors to calculate the expected values for the full model for each level on the independent variable.

8. For each level on the independent variable, write down a number that serves to indicate a typical number on the independent variable for that level. This can be the midpoint of the numbers in the range, or the average of the values for the independent variable in that range; or, if the distribution of values in the range is skewed, you can make a rough estimate of where the median would be.

9. Use the indicator value (from step 8) for each range to create LINVEC. For the Xlevel example, remember that the indicator values were 1, 1.5, 2, and 3 and LINVEC had 0, .5, 1, and 2; essentially, LINVEC will have a 0 for the first level, and then the difference from the first level to each of the other levels. For levels 2 through 4 in the Xlevel example, LINVEC has 1.5 − 1 = .5; 2 − 1 = 1; 3 − 1 = 2.

10. Run the restricted model with LINVEC as the only independent variable. Of course SPSS adds the unit vector.

11. Pull ESS (residual) as ESS for this restricted model, R-square, adjusted R-square, and the two parameter estimates from the output. Use the parameter estimates and the values in U and LINVEC for the various levels to calculate the expected values for this restricted model.

12. Calculate the F statistic. Select the critical probability you are using for this test and go to the table (one of 4: p=.10, p=.05, p=.025, or p=.01) for that critical probability. Use the 2 degrees of freedom for your calculated F to find the critical F in the table.

13. Compare your calculated F to the critical F in the table. If your calculated F is larger than the critical F, reject the linear restriction; if not, accept the linear restriction. If you reject, you are concluding that the sample probably did not come from a population where the relationship is linear. If you accept, you are concluding that the sample probably did come from a population where the relationship is linear.
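The LINVEC construction described above can be sketched as follows. The names are ours: `groups` is the recoded independent variable (a 0-based group index per observation, None for missing) and `indicators` holds each group's typical value.

```python
# Build LINVEC: 0 for the first level, then each level's distance
# (in indicator-value units) from the first level.
def linvec(groups, indicators):
    base = indicators[0]
    return [None if g is None else indicators[g] - base for g in groups]

# The Xlevel example: indicator values 1, 1.5, 2 and 3 grams
print(linvec([0, 1, 2, 3, None], [1, 1.5, 2, 3]))   # → [0, 0.5, 1, 2, None]
```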

Topic 11: Building Regression Models

Introduction

In previous assignments, the models have been constructed with binary predictor vectors to represent different levels of an independent variable. The independent variable could be any level of measurement: nominal, ordinal, interval or ratio. If the variable was continuous, it had to be recoded based on ranges, which then constituted each of the levels on the independent variable, and these were reflected in binary predictor vectors. The models we will now build, however, can have both binary predictor vectors and the raw values of the independent variables. We may also create what are called "pseudo-variables" by squaring the independent variables or taking the product between pairs of independent variables.

Variables In The Model

• The Dependent Variable

The dependent variable should be measured at the interval or ratio level. (Exceptions are sometimes made if the dependent is nominal but only has two levels of 0 and 1, or if the variable is ordinal but has many levels.)

It is a good idea to examine a histogram (or frequency distribution) of the dependent variable and identify any outliers. Outliers are values extremely far removed (either much lower or much higher) from the bulk of roughly continuous values. In examining the distribution of the dependent variable, it is also sometimes useful to have the mean and standard deviation of the dependent variable and consider how many standard deviations away from the mean a particular value is. Essentially, if you have a few values on the dependent variable that are very far removed from the other values, then you probably need to delete these observations from the dataset before building your regression model.

• The Independent Variables

The independent variables with which you build your regression model can contain either the original, raw values of the independent variables, or they can be binary predictor vectors representing each of several different levels of an independent variable. If the independent variable is nominal, or if it is ordinal with only a few levels, then binary predictor vectors must be created to represent the different levels. If the independent variable is interval or ratio, then you can use either the raw values, or binary predictor vectors representing the various levels.


• Curvilinear Relationships

If you use the raw values on the independent variable, then you are essentially assuming that the best way to capture the relationship between the dependent variable and the independent variable is linear, or in the form of a straight line. If, however, you believe the nature of the relationship between the two variables is best represented by a curve, then you need to include both the raw values on the independent variable and a pseudo-variable that contains the square of the raw values of the independent variable.

• Interaction Effects

If one has several independent variables in a regression model, one way to describe the modeling of the effect of the independent variables on the dependent is that the effects are additive. That is, we can capture the overall effect by simply adding up the various effects from each variable separately. Sometimes, however, the combined or joint effects of the independent variables cannot be captured in a simple additive form. In these cases one needs to construct new pseudo-variables that are the product of two independent variables.

Example Data Base

We will use the testdata (X level) database to illustrate the points above as well as how to interpret the output and how to build a regression model.

• Test Objectives

Recall that in testdata we had 96 observations. Unit sales is our dependent variable and advertising, price and X level are our independent variables. We wish to build and test regression models to accomplish two objectives:

1) We want to test whether, from the experiment-generated data, advertising, price or X level have an impact on sales; and,

2) We want to build a model that will allow us to predict sales for any particular combination of values for advertising, price and X level.

• Create New Independent Variables

Based on our earlier tests we have good reason to believe that Sales may be related to X level in a curvilinear fashion. So, before running the regression, we create a new variable called "XLSQ" which is the square of X level. We also have reason to believe that to fully capture the effect of X level and price in terms of the way in which they affect sales we need an interaction term. So, we create another new variable called "XLPR" which is the product of price and X level.
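The pseudo-variable construction just described amounts to elementwise squaring and multiplication; the values below are illustrative, not the actual testdata rows.

```python
# Hypothetical raw values for X level and price
xlevel = [1.0, 1.5, 2.0, 3.0]
price = [5, 10, 5, 10]

xlsq = [x * x for x in xlevel]                   # XLSQ: square of X level
xlpr = [x * p for x, p in zip(xlevel, price)]    # XLPR: price-by-X-level interaction

print(xlsq)   # → [1.0, 2.25, 4.0, 9.0]
print(xlpr)   # → [5.0, 15.0, 10.0, 30.0]
```

In SPSS the same step is done with Transform/Compute, as described earlier for crossing binary predictor vectors.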


• Run The Regression

We now submit all of the variables to SPSS in a regression run where we label "SALES" as the dependent variable and "ADVER", "PRICE", "XLEVEL", "XLSQ" and "XLPR" as the independent variables. When the output (details as per the attached Output 1) comes back, we first note the F and Sig. (significance) in the ANOVA table. SPSS automatically compares our model to a restricted model with just the unit vector. This is essentially a test of the linear restriction that the weight on every one of our variables (except the constant) is zero. The large F of 49.41 and low probability (.000) show that our model does better than chance in fitting sales. The probability is zero (at least to three decimal places) that we would have observed an F of 49.41 or larger in a sample taken from a population where the true F was zero. So we know our model is worth pursuing. Indeed, we note from the model summary table that our model explains 73.3% of the variance in sales and has an adjusted R-square of .718.

• Examine The Significance

But now we want to know if every one of our variables is making a significant contribution to our model. To address this question we examine each of the t-statistics and their significance numbers. Recall that t² = F for a test of a single linear restriction. We can think of each t statistic and its probability as a test of whether that variable, by itself, is contributing to our model. It is essentially a test of the linear restriction that the weight on that variable is zero. For example, note that the significance of the t for ADVERT is .724. This means that one would get a sample with a t of .354 or larger, or −.354 or smaller, 72.4% of the time from a population where the true weight is zero. Because this probability is so large, we cannot reject the possibility that the weight is zero. Therefore, we conclude that this sample could have come from a population where the weight on ADVERT is zero, and thus ADVERT has no effect on SALES. All of the other t's for all of the other variables have significance values that are really low. Consequently, they are significant contributors to our model and we should not drop the corresponding variables from the model.

• Steps In Dropping Variables

1) Run the Full model with the variables (and any pseudo-variables) that you think should be in the model. Remember that any nominal independent variable with N levels must be represented by N-1 binary predictor variables. (The N-1 is because the unit vector will automatically be added to the model and would create a linear dependency unless we drop one of our N binary predictor vectors.)

2) Examine the significance probabilities for every variable except the constant. If all of the probabilities are below your cutoff probability, stop; you have your model. If one or more probabilities are above your cutoff, pick the variable with the largest probability and rerun the model with that variable eliminated. (If you did a manual F test comparing the two models at this point, you would see that the square of the t for the dropped variable in the Full model equals the manually calculated F.)


3) Continue dropping variables from the model, one at a time, until all of the significance probabilities are below your cutoff.
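The dropping procedure in steps 1-3 can be sketched in code. This is a rough illustration of the loop, not the course's SPSS workflow: it uses numpy and scipy on made-up data, and the helper name fit_pvalues is my own.

```python
# A minimal sketch of backward elimination: fit, drop the least significant
# variable, refit, until every remaining variable clears the cutoff.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 96
# Three candidate predictors; only x1 and x2 truly matter in this fake data.
x1, x2, x3 = rng.normal(size=(3, n))
y = 5.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=1.0, size=n)

cols = {"x1": x1, "x2": x2, "x3": x3}
cutoff = 0.05

def fit_pvalues(active):
    """OLS fit; returns two-tailed p-values for each non-constant term."""
    X = np.column_stack([np.ones(n)] + [cols[v] for v in active])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = n - X.shape[1]
    sigma2 = resid @ resid / df
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    t = beta / se
    p = 2 * stats.t.sf(np.abs(t), df)
    return dict(zip(active, p[1:]))   # skip the constant

active = ["x1", "x2", "x3"]
while True:
    pvals = fit_pvalues(active)
    worst = max(pvals, key=pvals.get)
    if pvals[worst] < cutoff:
        break                        # every remaining variable is significant
    active.remove(worst)             # drop one variable at a time and re-fit

print(active)
```

As in the text, the constant always stays in the model and only one variable is removed per pass.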

• The Final Model

After we drop ADVERT in the testdata example, the output (details as per the attached Output 2) shows the significance probability for every t to be below any cutoff we might use. The model at this point is:

SALES = 119.845 - 7.047(PRICE) + 40.229(XLEVEL) - 13.293(XLSQ) + 2.721(XLPR)

This model has a significant F equal to 62.331 and explains 73.3% of the variance in SALES. The adjusted R-square is .721.

• Deriving Estimates For Sales

We now derive estimates for SALES at various values for PRICE and XLEVEL. Recall that the values in PRICE are 5 or 10 and the values in XLEVEL are around 1, 1.5, 2 and 3. We round the weights in the model to hundredths to simplify our task.

1) What would Sales (in units) be if XLEVEL = 1? Let PRICE = 5 and XLEVEL = 1. We have Sales = 119.85 - 7.05(5) + 40.23(1) - 13.29(1)² + 2.72(5 × 1) = 125.14.

If we increase PRICE to $10, we get

Sales = 119.85 - 7.05(10) + 40.23(1) - 13.29(1)² + 2.72(10 × 1) = 103.49.

Therefore, when XLEVEL = 1, a $5 increase in price results in a decrease in sales of 21,650 units. (Because our sales numbers are in thousands.)

2) What would Sales (in units) be if XLEVEL = 2?

Now let PRICE = 5 and XLEVEL = 2. Sales = 119.85 - 7.05(5) + 40.23(2) - 13.29(2)² + 2.72(5 × 2) = 139.10.

If we increase PRICE to $10, we get

Sales = 119.85 - 7.05(10) + 40.23(2) - 13.29(2)² + 2.72(10 × 2) = 131.05.

Note that the loss of sales from a $5 price increase is smaller when XLEVEL = 2 than when XLEVEL = 1: 8,050 versus 21,650 units.

3) How about the total revenue?

When XLEVEL = 2 and PRICE = $5 we get Revenue = 139.10($5) = $695.50 (in thousands). If XLEVEL = 2 and PRICE = $10 we get Revenue = 131.05($10) = $1,310.50 (in thousands).
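These hand calculations are easy to double-check in code with the rounded weights. A small sketch (the function name is mine):

```python
# Evaluating the final model at the PRICE / XLEVEL combinations above,
# using the weights rounded to hundredths as in the text.
def sales(price, xlevel):
    return (119.85 - 7.05 * price + 40.23 * xlevel
            - 13.29 * xlevel ** 2 + 2.72 * price * xlevel)

low, high = sales(5, 1), sales(10, 1)
print(round(low, 2), round(high, 2))      # 125.14 and 103.49
print(round((low - high) * 1000))         # 21650 units lost (sales are in thousands)

low2, high2 = sales(5, 2), sales(10, 2)
print(round(low2, 2), round(high2, 2))    # 139.1 and 131.05
print(round(high2 * 10, 2))               # revenue at $10: 1310.5 (thousands of $)
```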


It would appear that if we get XLEVEL right, we could charge $10 instead of $5; the extra margin more than makes up for the lost unit sales.

Format For Reporting Regression Results

1) Describe the dependent variable, including the nature of its distribution and the number of observations.

2) Describe the independent variables, the values they take on, any pseudo-variables that were created, and any hypotheses you have about the direction and nature of relationships.

3) Show the output from the SPSS regression run on the entire set of variables.

4) In Word, create a table that lists each variable that was dropped, in the order it was dropped, along with its probability when it was dropped and the R-square and adjusted R-square for the model before it was dropped.

5) Show the output from the SPSS regression run on the final model.

6) Include any simulation results and graphics that help demonstrate how changes in the values for the original independent variables affect values for the dependent variable.


Regression — Output 1

Variables Entered/Removed
  Model 1: Variables Entered: XLPR, ADVERT, PRICE, XLSQ, XLEVEL; Variables Removed: (none); Method: Enter
  All requested variables entered. Dependent Variable: SALES

Model Summary
  Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
  1       .856   .733       .718                6.2352
  Predictors: (Constant), XLPR, ADVERT, PRICE, XLSQ, XLEVEL

ANOVA
  Model 1       Sum of Squares   df   Mean Square   F        Sig.
  Regression    9604.664          5   1920.933      49.410   .000
  Residual      3498.961         90   38.877
  Total         13103.625        95
  Predictors: (Constant), XLPR, ADVERT, PRICE, XLSQ, XLEVEL. Dependent Variable: SALES

Coefficients (Dependent Variable: SALES)
  Model 1       B           Std. Error   Beta     t         Sig.
  (Constant)    119.200     7.935                 15.021    .000
  PRICE         -7.047      .704         -1.508   -10.004   .000
  ADVERT        4.502E-02   .127         .019     .354      .724
  XLEVEL        40.199      6.478        2.557    6.205     .000
  XLSQ          -13.287     1.421        -3.549   -9.349    .000
  XLPR          2.722       .343         1.778    7.942     .000


Regression — Output 2

Variables Entered/Removed
  Model 1: Variables Entered: XLPR, PRICE, XLSQ, XLEVEL; Variables Removed: (none); Method: Enter
  All requested variables entered. Dependent Variable: SALES

Model Summary
  Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
  1       .856   .733       .721                6.2051
  Predictors: (Constant), XLPR, PRICE, XLSQ, XLEVEL

ANOVA
  Model 1       Sum of Squares   df   Mean Square   F        Sig.
  Regression    9599.801          4   2399.950      62.331   .000
  Residual      3503.824         91   38.504
  Total         13103.625        95
  Predictors: (Constant), XLPR, PRICE, XLSQ, XLEVEL. Dependent Variable: SALES

Coefficients (Dependent Variable: SALES)
  Model 1       B          Std. Error   Beta     t         Sig.
  (Constant)    119.845    7.686                 15.593    .000
  PRICE         -7.047     .701         -1.508   -10.052   .000
  XLEVEL        40.229     6.447        2.559    6.240     .000
  XLSQ          -13.293    1.414        -3.551   -9.399    .000
  XLPR          2.721      .341         1.778    7.980     .000


Topic 12. Steps For Regression Model


Suggested Steps Before Running Regression

1. If you think the nature of the relationship is curvilinear, square the independent variable and put both the original value of the independent variable and its square into the model.

2. If, on the graph of your two-variable test, the gap between the lines is clearly not a constant, create a new variable which is the product of the two independent variables and add it to the regression. This new product variable will help capture interaction.

3. In a time series, you may want to use a value for the independent variable which occurs before (earlier than) the value of the dependent variable. You can use the lag function in SPSS to accomplish this.

4. In some situations where the dependent variable is highly skewed, you may want to work with the log of the dependent variable.
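Steps 1-3 above can be sketched with pandas. The column names follow the testdata example, but the tiny data frame here is a made-up stand-in, and the derived column names are mine:

```python
# Building a squared term, an interaction term, and a lagged variable,
# mirroring the SPSS variable-construction steps described above.
import pandas as pd

df = pd.DataFrame({
    "PRICE":  [5, 10, 5, 10],
    "XLEVEL": [1.0, 1.5, 2.0, 3.0],
    "SALES":  [125, 118, 139, 120],
})

df["XLSQ"] = df["XLEVEL"] ** 2            # step 1: curvilinear (squared) term
df["XLPR"] = df["XLEVEL"] * df["PRICE"]   # step 2: interaction (product) term
df["SALES_LAG"] = df["SALES"].shift(1)    # step 3: lag, as SPSS's lag function does
```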

Steps In Developing Your Regression Model And Predictive Test

1. Select approximately 20% of your observations at random, copy them to a hold-out data set and delete them from the working version of your file. (Make sure you have the complete file safely backed up somewhere.) (The hand-out on SPSS describes one method for selecting and eliminating the observations at random.)

2. Develop any interaction variables (see item 2 above), squared variables, etc.

3. Run the entire model.

4. Drop variables whose t statistics are not significant (probability above your cutoff), one at a time. (If you are applying the one-tail test because you hypothesized the sign on the coefficient before the test, divide the output probability (significance) by 2 to arrive at your appropriate probability.) Make sure you only drop variables one at a time. Also leave the constant in the equation, no matter what its probability is.

5. When you reach your final model, all probabilities must be below your cutoff.


6. Take the coefficients from the output and apply them to the values for the appropriate independent variables in the hold-out data set to calculate a true predicted value for each of the observations in the hold-out data set.

7. Use correlation (Statistics/Correlate/Bivariate) to calculate the correlation and R-square (if the correlation is positive) between the observed and predicted values in the hold-out data set.

8. In reporting on regression models, show the SPSS output from the very first model fit, create a table in Word listing each variable that was dropped, its probability and the R-square and adjusted R-square value at each step, and then show the SPSS output for the final model.

9. Summarize the goodness of fit of the model and the model's performance in the hold-out data set.

10. Graph and describe any simulations that you performed.
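Steps 1, 6 and 7 amount to a simple hold-out test. A rough numpy sketch on synthetic data (no SPSS involved; all names here are mine):

```python
# Hold-out validation: set aside ~20% of observations, fit on the rest,
# then correlate observed and predicted values in the hold-out set.
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

# Step 1: select ~20% of the observations at random as the hold-out set.
idx = rng.permutation(n)
hold, work = idx[:20], idx[20:]

# Fit the model on the working set only.
Xw = np.column_stack([np.ones(work.size), x[work]])
beta, *_ = np.linalg.lstsq(Xw, y[work], rcond=None)

# Step 6: true predicted values for the hold-out observations.
y_pred = beta[0] + beta[1] * x[hold]

# Step 7: correlation (and its square) between observed and predicted.
r = np.corrcoef(y[hold], y_pred)[0, 1]
r_squared = r ** 2
```

A model that generalizes well should show a high positive correlation in the hold-out set, not just a high R-square on the data it was fit to.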


Topic 13: Assessing The Fit


For any model, once its parameters are estimated, it is important to assess the fit of the model. To assess the fit, we use 3 indices: R-Square, Adjusted R-Square, and the Standard Error of the Estimate.

The Correlation Coefficient, R

Consider two variables, X and Y, with the values indicated below:

  Xi   Yi
   1    4
   3   13
   5   15
   7   24

One interesting question that can be asked is, "To what extent do the values for these variables vary together?" That is, to what extent do they "co-vary"? A measure of the degree to which two variables vary together (linearly) is the correlation coefficient. The correlation coefficient is given by:

R = \frac{\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N}(X_i - \bar{X})^2 \sum_{i=1}^{N}(Y_i - \bar{Y})^2}}

The correlation coefficient for X and Y is calculated below (X̄ = 4, Ȳ = 14):

  Xi   Yi   Xi - X̄   Yi - Ȳ   (Xi - X̄)(Yi - Ȳ)   (Xi - X̄)²   (Yi - Ȳ)²
   1    4     -3       -10            30               9          100
   3   13     -1        -1             1               1            1
   5   15      1         1             1               1            1
   7   24      3        10            30               9          100
                              sum:    62              20          202

R = \frac{62}{\sqrt{(20)(202)}} = \frac{62}{\sqrt{4040}} = \frac{62}{63.56} = .975441
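The arithmetic above is easy to verify in code. A quick check with numpy:

```python
# Computing R for the four (X, Y) pairs, following the deviation-score
# formula term by term, then comparing against numpy's built-in version.
import numpy as np

x = np.array([1, 3, 5, 7])
y = np.array([4, 13, 15, 24])

num = np.sum((x - x.mean()) * (y - y.mean()))   # 62
den = np.sqrt(np.sum((x - x.mean()) ** 2) *
              np.sum((y - y.mean()) ** 2))      # sqrt(20 * 202) = sqrt(4040)
r = num / den

print(round(r, 6))   # 0.975441, the same value np.corrcoef(x, y)[0, 1] gives
```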


For any two variables, R will be between -1 and +1. If R is negative, the two variables' values vary inversely: as the values for one go up, the values for the other go down. If R is positive, they vary directly: as the values for one go up, the values for the other go up. The closer R is to -1 (or +1), the more the two variables are linearly related.

Percent of the Variance That is Explained

• Scenario 1: using Ȳ to predict Y

Suppose we are interested in "predicting" Y. One measure we might use to "predict" Y is the mean of Y (Ȳ). If we use the mean of Y to "predict" Y, then a summary measure of how well we did is the "average squared error," calculated below:

Average squared error = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})^2}{N} = 202/4 = 50.5

The average squared error when we use the mean of Y to "predict" Y is 50.5. (You may recognize this value as the measure of dispersion for Y which is called the variance of Y.) Notice that because the average error is 0, 50.5 is also what we would get if we calculated the variance of the errors, V(Y_i - \bar{Y}). In a sense, 50.5 is a measure of the variance of our errors when we use the mean of Y to "predict" Y.

• Scenario 2: using the linear model Ŷ = 2 + 3X to predict Y

Now suppose we wish to create a model in which we assume a linear relationship between X and Y, and we wish to use X to "predict" Y. Assume we fit such a "linear" model and obtained Ŷ = 2 + 3X.

Now use this model to "predict" Y, label the predicted values Ŷᵢ, and calculate the average squared error as below.

(Working table for Scenario 1, predicting each Yᵢ with Ȳ = 14:)

  Yi   Ȳ    Yi - Ȳ   (Yi - Ȳ)²
   4   14    -10        100
  13   14     -1          1
  15   14      1          1
  24   14     10        100
                 sum:   202


Average squared error = \frac{\sum_{i=1}^{N}(Y_i - \hat{Y}_i)^2}{N} = 10/4 = 2.5

The average squared error when we use X to "predict" Y is 2.5. Notice that it is the variance of our error terms. In a sense, 2.5 is a measure of the variance of our errors when we use X to "predict" Y. The variance of the errors when we use the mean of Y to "predict" Y is 50.5. Trying to "predict" Y with its mean is our worst-case scenario. If another variable is even slightly related to Y, we can improve our ability to "predict" Y by using our estimated model with a weight on the unit vector and a weight on the other variable. And the amount by which we reduce the variance of the errors (in this example, from 50.5 to 2.5) is an indication of how much knowledge of the other variable (in this case X) can be used to reduce the variance of our errors in predicting Y.

• Calculating the percent of variance explained

Now suppose we label our worst-case error variance as Total Variance (TV), and we label the error variance which still remains even after we use X to "predict" Y as Unexplained Variance (UV). [We call it unexplained because it is the error variance which remains left over (i.e., unexplained) even after we use X to "predict" Y.] In this example, TV = 50.5 and UV = 2.5. Thus Explained Variance (EV), which equals TV - UV, is 48. Of the total variance in Y (50.5) we have explained 48, or 48/50.5 = 95%. So we can say that "X explains 95% of the variance in Y."

• Relationship between R-square and percent of variance explained

Remember that the correlation between X and Y was calculated to be .975441. What do you think the square of this correlation would turn out to be? In general, the square of the correlation between two variables will be equal to the percentage of the variance in one that can be "explained" in the form of a linear relationship with the other. Furthermore, when we extend the model to include several independent variables and calculate the square of the correlation between the observed values for the dependent variable and the values given by our model, this quantity (the correlation squared) reflects the percentage of the variance in the dependent variable that is "explained" by the model.

(Working table for Scenario 2, predicting each Yᵢ with Ŷᵢ = 2 + 3Xᵢ:)

  Yi   Xi   Ŷi   Yi - Ŷi   (Yi - Ŷi)²
   4    1    5      -1          1
  13    3   11      +2          4
  15    5   17      -2          4
  24    7   23      +1          1
                       sum:    10
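The TV/UV/EV bookkeeping, and its relationship to the squared correlation, can be checked numerically. A short sketch using the four observations and the Ŷ = 2 + 3X model from the text:

```python
# Total, unexplained, and explained variance for the worked example,
# plus the squared correlation for comparison.
import numpy as np

x = np.array([1, 3, 5, 7])
y = np.array([4, 13, 15, 24])

tv = np.mean((y - y.mean()) ** 2)   # total variance: "predict" with the mean -> 50.5
y_hat = 2 + 3 * x                   # the text's linear model
uv = np.mean((y - y_hat) ** 2)      # unexplained variance left over -> 2.5
ev = tv - uv                        # explained variance -> 48.0
pct = ev / tv                       # about .95: "X explains 95% of the variance in Y"

r_xy = np.corrcoef(x, y)[0, 1]
# r_xy**2 is about .9515 while pct is about .9505; both round to 95%. They
# differ slightly only because Y-hat = 2 + 3X is not the exact least-squares
# line (that line is Y-hat = 1.6 + 3.1X, for which the two quantities agree).
```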


Adjusted R-Square

Rationale: if the number of observations is equal to the number of parameters, then one can fit the data perfectly. As the number of parameters gets closer and closer to the number of observations, our ability to fit the data is reflected in R² values that are closer and closer to 1. Because our ability to fit the data is more and more exaggerated as the number of parameters approaches the number of observations, we must correct for the degree of this exaggeration. The way to make this correction is through adjusted R²:

Adjusted R^2 = 1 - (1 - R^2)\frac{N - 1}{N - NL}

where R² = the square of the correlation;

N = the number of observations; and

NL = the number of linearly independent predictor vectors in the model (including the unit vector).
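Plugging the two SPSS runs from Topic 11 into this formula reproduces the adjusted R-square values in the outputs. A quick check (assuming NL counts the constant plus the predictors, i.e. 6 for the full model and 5 for the final model):

```python
# Adjusted R-square for the two models in Outputs 1 and 2.
def adjusted_r2(r2, n, nl):
    return 1 - (1 - r2) * (n - 1) / (n - nl)

print(round(adjusted_r2(0.733, 96, 6), 3))   # full model (5 predictors + constant): 0.718
print(round(adjusted_r2(0.733, 96, 5), 3))   # final model (4 predictors + constant): 0.721
```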

Standard Error of the Estimate

The perspective of R-Square is how closely we reproduce the observed data with our model after we have estimated its parameters. But we should also assess just how far off our model is. The measure that reflects how far off we are is the standard error of the estimate. For any model, the standard error of the estimate is given by:

s = \sqrt{\frac{ESS}{N - NL}}

where ESS = the error sum of squares for the model;

N = the number of observations; and

NL = the number of linearly independent predictor vectors in the model.

This index can be thought of as the square root of the average squared error. Because we take the square root, the measure is expressed in the original units, and it reflects how far off our model estimates are, on average, from the actual observations.
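The standard errors of the estimate in the two SPSS outputs can likewise be recovered from the residual sums of squares in their ANOVA tables. A quick check:

```python
# Standard error of the estimate: sqrt(ESS / (N - NL)) for each model,
# using the residual sums of squares from the ANOVA tables above.
import math

def std_error(ess, n, nl):
    return math.sqrt(ess / (n - nl))

print(round(std_error(3498.961, 96, 6), 4))   # Output 1: 6.2352
print(round(std_error(3503.824, 96, 5), 4))   # Output 2: 6.2051
```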