12 Simple Linear Regression and Correlation
Chapter 12, Stat 4570/5570
Material from Devore’s book (Ed 8), and Cengage


Page 1:

12 Simple Linear Regression and Correlation

Chapter 12 Stat 4570/5570

Material from Devore’s book (Ed 8), and Cengage

Page 2:

So far, we tested whether subpopulations’ means are equal.

What happens when we have multiple subpopulations?

For example, so far we can answer:

Is the fever-reduction effect of a certain drug the same for children vs. adults? Our hypothesis would be: H0 : µ1 = µ2

But what happens if we now wish to consider many age groups: infants (0-1), toddlers (1-3), pre-school (3-6), early school (6-9), … old age (65 and older)? I.e.:

H0 : µ1 = µ2 = µ3 = µ4 = … = µ15

How do we do this? What would some of the alternatives be?

Page 3:

What would some of the alternatives be?

Ha : µ1 < µ2 < µ3 < µ4 < … < µ15

This is doable, but it would be a cumbersome test.

Instead, we can postulate:

Ha : µ1 = µ2 – b = µ3 – 2b = … = µ15 – 14b

Then, H0: b = 0

In other words, we postulate a specific relationship between the means.

Page 4:

Modeling relations: Simple Linear Models

(relationship status: it’s not complicated)

Page 5:

The Simple Linear Regression Model

The simplest deterministic mathematical relationship between two variables x and y is a linear relationship

y = β0 + β1x.

The set of pairs (x, y) for which y = β0 + β1x determines a straight line with slope β1 and y-intercept β0.

The objective of this section is to develop an equivalent linear probabilistic model.

If the two variables are probabilistically related, then for a fixed value of x, there is uncertainty in the value of the second variable.

Page 6:

The Simple Linear Regression Model

The variable whose value is fixed (by the experimenter) will be denoted by x and will be called the independent, predictor, or explanatory variable.

For fixed x, the second variable will be random; we denote this random variable and its observed value by Y and y, respectively, and refer to it as the dependent or response variable.

Usually observations will be made for a number of settings of the independent variable.

Page 7:

The Simple Linear Regression Model

Let x1, x2, …, xn denote values of the independent variable for which observations are made, and let Yi and yi, respectively, denote the random variable and observed value associated with xi.

The available bivariate data then consists of the n pairs (x1, y1), (x2, y2), …, (xn, yn).

A picture of this data, called a scatter plot, gives preliminary impressions about the nature of any relationship. In such a plot, each (xi, yi) is represented as a point plotted on a two-dimensional coordinate system.

Page 8:

Example 1

The order in which observations were obtained was not given, so for convenience the data are listed in increasing order of x values:

Thus (x1, y1) = (.40, 1.02), (x5, y5) = (.57, 1.52), and so on.

cont’d

Page 9:

Example 1

The scatter plot (OSA = y, palwidth = x):

cont’d

Page 10:

Example 1

• There is a strong tendency for y to increase as x increases. That is, larger values of OSA tend to be associated with larger values of width—a positive relationship between the variables.

• It appears that the value of y could be predicted from x by finding a line that is reasonably close to the points in the plot (the authors of the cited article superimposed such a line on their plot).

cont’d

Page 11:

A Linear Probabilistic Model

For the deterministic model y = β0 + β1x, the actual observed value of y is a linear function of x.

The appropriate generalization of this to a probabilistic model assumes that

the average of Y is a linear function of x

-- i.e., that for fixed x the actual value of the variable Y differs from its expected value by a random amount.

Page 12:

A Linear Probabilistic Model

Definition: The Simple Linear Regression Model

There are parameters β0, β1, and σ², such that for any fixed value of the independent variable x, the dependent variable is a random variable related to x through the model equation

Y = β0 + β1x + ε    (12.1)

The quantity ε in the model equation is the ERROR -- a random variable, assumed to be symmetrically distributed with

E(ε) = 0 and V(ε) = σ².

Page 13:

A Linear Probabilistic Model

The variable ε is sometimes referred to as the random deviation or random error term in the model.

Without ε, any observed pair (x, y) would correspond to a point falling exactly on the line y = β0 + β1x, called the true (or population) regression line.

The inclusion of the random error term allows (x, y) to fall either above the true regression line (when ε > 0) or below the line (when ε < 0).

Page 14:

A Linear Probabilistic Model

The points (x1, y1), …, (xn, yn) resulting from n independent observations will then be scattered about the true regression line:

Page 15:

A Linear Probabilistic Model

On occasion, the appropriateness of the simple linear regression model may be suggested by theoretical considerations (e.g., there is an exact linear relationship between the two variables, with ε representing measurement error).

Much more frequently, the reasonableness of the model is indicated by the data -- a scatter plot exhibiting a substantial linear pattern.

Page 16:

A Linear Probabilistic Model

If we think of an entire population of (x, y) pairs, then µY|x∗ is the mean of all y values for which x = x∗, and σ²Y|x∗ is a measure of how much these values of y spread out about the mean value.

If, for example, x = age of a child and y = vocabulary size, then µY|5 is the average vocabulary size for all 5-year-old children in the population, and σ²Y|5 describes the amount of variability in vocabulary size for this part of the population.

Page 17:

A Linear Probabilistic Model

Once x is fixed, the only randomness on the right-hand side of the linear model equation is in the random error ε. Recall that its mean value and variance are 0 and σ², respectively, whatever the value of x. This implies that

µY|x∗ = E(β0 + β1x∗ + ε) = ?
σ²Y|x∗ = ?

Replacing x∗ in µY|x∗ by x gives the relation µY|x = β0 + β1x, which says that the mean value (expected value, average) of Y, rather than Y itself, is a linear function of x.

Page 18:

A Linear Probabilistic Model

The true regression line y = β0 + β1x is thus the line of mean values; its height for any particular x value is the expected value of Y for that value of x.

The slope β1 of the true regression line is interpreted as the expected (average) change in Y associated with a 1-unit increase in the value of x.

(This is equivalent to saying “the change in average (expected) Y associated with a 1-unit increase in the value of x.”)

The variance (amount of variability) of the distribution of Y values is the same at each different value of x (homogeneity of variance).

Page 19:

The variance parameter σ² determines the extent to which each normal curve spreads out about the regression line.

When errors are normally distributed…

[Figure: (a) distribution of ε; (b) distribution of Y for different values of x]

Page 20:

A Linear Probabilistic Model

When σ² is small, an observed point (x, y) will almost always fall quite close to the true regression line, whereas observations may deviate considerably from their expected values (corresponding to points far from the line) when σ² is large.

Thus, this variance can be used to tell us how good the linear fit is (we’ll explore this notion of model fit later).

Page 21:

Example 2

Suppose the relationship between applied stress x and time-to-failure y is described by the simple linear regression model with true regression line y = 65 – 1.2x and σ = 8.

Then for any fixed value x∗ of stress, time-to-failure has a normal distribution with mean value 65 – 1.2x∗ and standard deviation 8.

Roughly speaking, in the population consisting of all (x, y) points, the magnitude of a typical deviation from the true regression line is about 8.

Page 22:

Example 2

For x = 20, Y has mean value…?

What is P(Y > 50 when x = 20)? P(Y > 50 when x = 25)?

cont’d
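As a quick R check (a sketch; these numbers follow directly from the stated model, under which Y is normal with mean 65 – 1.2x and standard deviation 8):

# Mean of Y at x = 20 under the stated model
65 - 1.2 * 20                                                # 41
# P(Y > 50) at x = 20 and at x = 25
pnorm(50, mean = 65 - 1.2 * 20, sd = 8, lower.tail = FALSE)  # about .13
pnorm(50, mean = 65 - 1.2 * 25, sd = 8, lower.tail = FALSE)  # about .03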

Page 23:

Example 2

These probabilities are illustrated in the following figure:

cont’d

Page 24:

Example 2

Suppose that Y1 denotes an observation on time-to-failure made with x = 25 and Y2 denotes an independent observation made with x = 24.

Then Y1 – Y2 is normally distributed

with mean value E(Y1 – Y2) = (β0 + β1(25)) – (β0 + β1(24)) = β1 = –1.2,

variance V(Y1 – Y2) = σ² + σ² = 128,

and standard deviation √128 ≈ 11.31.

Page 25:

Example 2

The probability that Y1 exceeds Y2 is

P(Y1 – Y2 > 0) = P(Z > (0 – (–1.2))/11.31)

= P(Z > .11)

= .4562

That is, even though we expect Y to decrease when x increases by 1 unit, it is not unlikely that the observed Y at x + 1 will be larger than the observed Y at x.

cont’d

Page 26:

A couple of caveats with linear relationships…

“no linear relationship” does not mean “no relationship”

Here there is no linear relation between x and y, but a perfect quadratic one.

[Scatter plot: Y vs. X, a perfect quadratic (no linear) relationship]

Page 27:

A couple of caveats…

Relation not linear – but an ok approximation in this range of x

[Scatter plot with fitted line: Y2 vs. X2, a curved relation reasonably approximated by a line over this range of x]

Page 28:

A couple of caveats…

Too dependent on that one point out there

[Scatter plot with fitted line: Y3 vs. X3, fit pulled by a single outlying point]

Page 29:

A couple of caveats…

Totally dependent on that one point out there

[Scatter plot with fitted line: Y4 vs. X4, fit determined entirely by a single extreme point]

Page 30:

A couple of caveats…

Nice!

[Scatter plot with fitted line: Y1 vs. X1, a well-behaved linear relationship]
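These four panels mirror Anscombe’s famous quartet, which ships with R as the anscombe data set (the X1–X4/Y1–Y4 naming above matches it); a minimal sketch to reproduce the effect, assuming that data set is indeed the source:

# All four data sets give nearly the same fitted line and r^2,
# but only the first is well described by a line.
data(anscombe)
par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, main = paste("Data set", i))
  abline(lm(y ~ x), col = "red")
}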

Page 31:

Estimating Model Parameters

Page 32:

Estimating Model Parameters

The values of β0, β1, and σ² will almost never be known to an investigator.

Instead, sample data consists of n observed pairs (x1, y1), …, (xn, yn), from which the model parameters and the true regression line itself can be estimated.

The data (pairs) are assumed to have been obtained independently of one another.

Page 33:

Estimating Model Parameters

That is, yi is the observed value of Yi, where Yi = β0 + β1xi + εi

and the n deviations ε1, ε2, …, εn are independent rv’s.

Independence of Y1, Y2, …, Yn follows from independence of the εi’s.

According to the model, the observed points will be distributed around the true regression line in a random manner.

Page 34:

Estimating Model Parameters

The figure shows a typical plot of observed pairs along with two candidates for the estimated regression line.

Page 35:

Estimating Model Parameters

The “best fit” line is motivated by the principle of least squares, which can be traced back to the German mathematician Gauss (1777–1855):

a line provides the best fit to the data if the sum of the squared vertical distances (deviations) from the observed points to that line is as small as it can be.

Page 36:

Estimating Model Parameters

The sum of squared vertical deviations from the points (x1, y1), …, (xn, yn) to the line y = b0 + b1x is then

f(b0, b1) = Σ [yi – (b0 + b1xi)]²

The point estimates of β0 and β1, denoted by β̂0 and β̂1, are called the least squares estimates – they are those values that minimize f(b0, b1).

That is, β̂0 and β̂1 are such that f(β̂0, β̂1) ≤ f(b0, b1) for any b0 and b1.

Page 37:

Estimating Model Parameters

The estimated regression line or least squares line is then the line whose equation is y = β̂0 + β̂1x.

The minimizing values of b0 and b1 are found by taking partial derivatives of f(b0, b1) with respect to both b0 and b1, equating them both to zero [analogously to f′(b) = 0 in univariate calculus], and solving the equations

∂f(b0, b1)/∂b0 = Σ 2(yi – b0 – b1xi)(–1) = 0
∂f(b0, b1)/∂b1 = Σ 2(yi – b0 – b1xi)(–xi) = 0

Page 38:

Estimating Model Parameters

Cancellation of the –2 factor and rearrangement gives the following system of equations, called the normal equations:

nb0 + (Σxi)b1 = Σyi

(Σxi)b0 + (Σxi²)b1 = Σxiyi

These equations are linear in the two unknowns b0 and b1.

Provided that not all xi’s are identical, the least squares estimates are the unique solution to this system.

Page 39:

Estimating Model Parameters

The least squares estimate of the slope coefficient β1 of the true regression line is

β̂1 = Sxy / Sxx    (12.2)

Shortcut formulas for the numerator and denominator of β̂1 are

Sxy = Σxiyi – (Σxi)(Σyi)/n and Sxx = Σxi² – (Σxi)²/n

Page 40:

Estimating Model Parameters

The least squares estimate of the intercept β0 of the true regression line is

β̂0 = ȳ – β̂1x̄    (12.3)

The computational formulas for Sxy and Sxx require only the summary statistics Σxi, Σyi, Σxi², and Σxiyi (Σyi² will be needed shortly).

In computing β̂0, use extra digits in β̂1 because, if x̄ is large in magnitude, rounding will affect the final answer. In practice, the use of a statistical software package is preferable to hand calculation and hand-drawn plots.

Page 41:

Example

The cetane number is a critical property in specifying the ignition quality of a fuel used in a diesel engine.

Determination of this number for a biodiesel fuel is expensive and time-consuming.

The article “Relating the Cetane Number of Biodiesel Fuels to Their Fatty Acid Composition: A Critical Study” (J. of Automobile Engr., 2009: 565–583) included the following data on x = iodine value (g) and y = cetane number for a sample of 14 biofuels.

Page 42:

Example

The iodine value is the amount of iodine necessary to saturate a sample of 100 g of oil. Fit the simple linear regression model to this data.

cont’d

Page 43:

Example

Scatter plot with the least squares line superimposed.

cont’d

Page 44:

Estimating σ² and σ

Page 45:

Estimating σ² and σ

The parameter σ² determines the amount of spread about the true regression line. Two separate examples:

Page 46:

Estimating σ² and σ

An estimate of σ² will be used in confidence interval (CI) formulas and hypothesis-testing procedures presented in the next two sections.

Because the equation of the true line is unknown, the estimate is based on the extent to which the sample observations deviate from the estimated line.

Many large deviations (residuals) suggest a large value of σ², whereas deviations all of which are small in magnitude suggest that σ² is small.

Page 47:

Estimating σ² and σ

Definition

The fitted (or predicted) values ŷ1, ŷ2, …, ŷn are obtained by successively substituting x1, …, xn into the equation of the estimated regression line: ŷi = β̂0 + β̂1xi.

The residuals are the differences y1 – ŷ1, y2 – ŷ2, …, yn – ŷn between the observed and fitted y values.

Note – the residuals are estimates of the true errors, since they are deviations of the y’s from the estimated regression line, while the errors are deviations from the true regression line.

Page 48:

Estimating σ² and σ

In words, the predicted value ŷi is:
- The value of y that we would predict or expect when using the estimated regression line with x = xi;
- The height of the estimated regression line above the value xi for which the ith observation was made;
- The estimated mean for that population where x = xi.

The residual is the vertical deviation between the point (xi, yi) and the least squares line—a positive number if the point lies above the line and a negative number if it lies below the line.

Page 49:

Estimating σ² and σ

When the estimated regression line is obtained via the principle of least squares, the sum of the residuals should in theory be zero, if the error distribution is symmetric (and it is, due to our assumption).

Page 50:

Example

Relevant summary quantities (summary statistics) are

Σxi = 2817.9, Σyi = 1574.8, Σxi² = 415,949.85,

Σxiyi = 222,657.88, and Σyi² = 124,039.58,

from which x̄ = 140.895, ȳ = 78.74, Sxx = 18,921.8295, Sxy = 776.434.
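As a check, a minimal R sketch reproducing these quantities and the resulting least squares estimates from the sums alone (n = 20 here, since x̄ = Σxi/n):

n   <- 20
Sxy <- 222657.88 - (2817.9 * 1574.8) / n   # 776.434
Sxx <- 415949.85 - 2817.9^2 / n            # 18921.8295
b1  <- Sxy / Sxx                           # slope estimate, about 0.04103
b0  <- 78.74 - b1 * 140.895                # intercept estimate, about 72.96
c(b0 = b0, b1 = b1)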

Page 51:

Example

All predicted values (fits) and residuals appear in the accompanying table.

cont’d

Page 52:

Estimating σ² and σ

The error sum of squares (equivalently, residual sum of squares), denoted by SSE, is

SSE = Σ(yi – ŷi)²

and the estimate of σ² is

σ̂² = s² = SSE / (n – 2)

Page 53:

Estimating σ² and σ

The divisor n – 2 in s² is the number of degrees of freedom (df) associated with SSE and the estimate s².

This is because to obtain s², the two parameters β0 and β1 must first be estimated, which results in a loss of 2 df (just as µ had to be estimated in one-sample problems, resulting in an estimated variance based on n – 1 df).

Replacing each yi in the formula for s² by the rv Yi gives the estimator S².

It can be shown that S² is an unbiased estimator for σ².

Page 54:

Example, cont.

The residuals for the filtration rate–moisture content data were calculated previously.

The corresponding error sum of squares is?

The estimate of σ² is?
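A sketch of the answers in R, using the computational formula from the next slide together with the summary statistics and estimates above:

b0  <- 72.958547; b1 <- 0.04103377                 # estimates from this example
SSE <- 124039.58 - b0 * 1574.8 - b1 * 222657.88    # Σyi² − β̂0Σyi − β̂1Σxiyi, about 7.97
s2  <- SSE / (20 - 2)                              # estimate of σ², about 0.44
s   <- sqrt(s2)                                    # about 0.67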

Page 55:

Estimating σ² and σ

Computation of SSE from the defining formula involves much tedious arithmetic, because both the predicted values and residuals must first be calculated.

Use of the following computational formula does not require these quantities:

SSE = Σyi² – β̂0Σyi – β̂1Σxiyi

Page 56:

The Coefficient of Determination

Page 57:

The Coefficient of Determination

Different variability in observed y values:

Using the model to explain y variation:
(a) data for which all variation is explained;
(b) data for which most variation is explained;
(c) data for which little variation is explained.

Page 58:

The Coefficient of Determination

The error sum of squares SSE can be interpreted as a measure of how much variation in y is left unexplained by the model—that is, how much cannot be attributed to a linear relationship.

What would SSE be in the first plot?

A quantitative measure of the total amount of variation in observed y values is given by the total sum of squares

SST = Σ(yi – ȳ)²

Page 59:

The Coefficient of Determination

The total sum of squares is the sum of squared deviations about the sample mean of the observed y values – when no predictors are taken into account.

Thus the same number ȳ is subtracted from each yi in SST, whereas SSE involves subtracting each different predicted value ŷi from the corresponding observed yi.

In some sense, SST is as large as SSE can get when there is no regression model (i.e., the slope is 0).

Page 60:

The Coefficient of Determination

Just as SSE is the sum of squared deviations about the least squares line y = β̂0 + β̂1x, SST is the sum of squared deviations about the horizontal line at height ȳ (since then the vertical deviations are yi – ȳ):

Figure 12.11

Sums of squares illustrated: (a) SSE = sum of squared deviations about the least squares line; (b) SST = sum of squared deviations about the horizontal line

Page 61:

The Coefficient of Determination

Definition

The coefficient of determination, denoted by r², is given by

r² = 1 – SSE/SST

It is interpreted as the proportion of observed y variation that can be explained by the simple linear regression model (attributed to an approximate linear relationship between y and x).

The higher the value of r², the more successful is the simple linear regression model in explaining y variation.

Page 62:

The Coefficient of Determination

When regression analysis is done by a statistical computer package, either r² or 100r² (the percentage of variation explained by the model) is a prominent part of the output.

What do you do if r2 is small?

Page 63:

Example

The scatter plot of the iodine value–cetane number data in Figure 12.8 portends a reasonably high r² value.

Figure 12.8

Scatter plot for Example 4 with least squares line superimposed, from Minitab

Page 64:

Example

The coefficient of determination is then

r² = 1 – SSE/SST = 1 – (78.920)/(377.174) = .791

How do we interpret this value?

cont’d

Page 65:

The Coefficient of Determination

The coefficient of determination can be written in a slightly different way by introducing a third sum of squares—the regression sum of squares, SSR—given by

SSR = Σ(ŷi – ȳ)² = SST – SSE.

The regression sum of squares is interpreted as the amount of total variation that is explained by the model.

Then we have

r² = 1 – SSE/SST = (SST – SSE)/SST = SSR/SST,

the ratio of explained variation to total variation.
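A minimal R sketch of this decomposition (the data here are made up purely for illustration):

x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)
SSE <- sum(residuals(fit)^2)             # unexplained variation
SST <- sum((y - mean(y))^2)              # total variation
SSR <- sum((fitted(fit) - mean(y))^2)    # explained variation
c(SSR + SSE, SST)                        # identity: SSR + SSE = SST
1 - SSE / SST                            # r^2; matches summary(fit)$r.squared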

Page 66:

Inference About the Slope Parameter β1

Page 67:

Example 10

The following data is representative of that reported in the article “An Experimental Correlation of Oxides of Nitrogen Emissions from Power Boilers Based on Field Data” (J. of Engr. for Power, July 1973: 165–170), x = burner-area liberation rate (MBtu/hr-ft²) and y = NOx emission rate (ppm).

There are 14 observations, made at the x values 100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, and 400, respectively.

Page 68:

Simulation experiment

Suppose that the slope and intercept of the true regression line are β1 = 1.70 and β0 = –50, with σ = 35.

Let’s fix the x values 100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, and 400.

We then generate a sample of random deviations from a normal distribution with mean 0 and standard deviation 35, and add each deviation to β0 + β1xi to obtain 14 corresponding y values.

LS calculations were then carried out to obtain the estimated slope, intercept, and standard deviation for this sample of 14 pairs (xi, yi).

cont’d

Page 69:

Simulation experiment

This process was repeated a total of 20 times, resulting in the values:

There is clearly variation in the values of the estimated slope and estimated intercept, as well as the estimated standard deviation.
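A minimal R sketch of this experiment (the seed is arbitrary, not from the slides):

set.seed(1)
x  <- c(100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400)
b0 <- -50; b1 <- 1.70; sigma <- 35
sims <- t(replicate(20, {
  y   <- b0 + b1 * x + rnorm(length(x), mean = 0, sd = sigma)
  fit <- lm(y ~ x)
  c(coef(fit), s = summary(fit)$sigma)   # intercept, slope, estimated sd
}))
head(sims)                               # one row per simulated sample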

cont’d

Page 70:

Simulation experiment: graphs of the true regression line and 20 least squares lines

Page 71:

Inferences About the Slope Parameter β1

The estimators are:

β̂1 = Sxy / Sxx = Σ(xi – x̄)(Yi – Ȳ) / Σ(xi – x̄)²

β̂0 = Ȳ – β̂1x̄

Page 72:

Inferences About the Slope Parameter β1

Invoking properties of a linear function of random variables, as discussed earlier, leads to the following results.

1. The mean value of β̂1 is …
2. The variance and standard deviation of β̂1 are …
3. The distribution of β̂1 is …

Page 73:

Inferences About the Slope Parameter β1

The variance of β̂1 equals the variance σ² of the random error term—or, equivalently, of any Yi—divided by Sxx = Σ(xi – x̄)². This denominator is a measure of how spread out the xi’s are about x̄.

Making observations at xi values that are quite spread out results in a more precise estimator of the slope parameter.

If the xi’s are spread out too far, a linear model may not be appropriate throughout the range of observation.

Page 74:

Inferences About the Slope Parameter β1

Theorem

The assumptions of the simple linear regression model imply that the standardized variable

T = (β̂1 – β1) / S_β̂1, where S_β̂1 = S / √Sxx,

has a t distribution with n – 2 df.

Page 75:

A Confidence Interval for β1

Page 76:

A Confidence Interval for β1

As in the derivation of previous CIs, we begin with a probability statement:

P(–tα/2,n – 2 < (β̂1 – β1)/S_β̂1 < tα/2,n – 2) = 1 – α

Manipulation of the inequalities inside the parentheses to isolate β1 and substitution of estimates in place of the estimators gives the CI formula

β̂1 ± tα/2,n – 2 · s_β̂1
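In R, confint() computes exactly this t-based interval; a sketch using the Hooke data that appears later in these slides:

x <- c(0, 2, 4, 6, 8, 10)
y <- c(439.00, 439.12, 439.21, 439.31, 439.40, 439.50)
fit <- lm(y ~ x)
confint(fit, "x", level = 0.95)            # built-in CI for the slope
b1  <- coef(summary(fit))["x", "Estimate"]
se1 <- coef(summary(fit))["x", "Std. Error"]
b1 + c(-1, 1) * qt(.975, df = 4) * se1     # the same interval by hand (n - 2 = 4 df)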

Page 77:

Example

Variations in clay brick masonry weight have implications not only for structural and acoustical design but also for design of heating, ventilating, and air conditioning systems.

The article “Clay Brick Masonry Weight Variation” (J. of Architectural Engr., 1996: 135–137) gave a scatter plot of y = mortar dry density (lb/ft³) versus x = mortar air content (%) for a sample of mortar specimens, from which the following representative data was read:

Page 78:

Example

The scatter plot of this data in Figure 12.14 certainly suggests the appropriateness of the simple linear regression model; there appears to be a substantial negative linear relationship between air content and density, one in which density tends to decrease as air content increases.

Scatter plot of the data from Example 11

cont’d

Page 79:

Hypothesis-Testing Procedures

Page 80:

Hypothesis-Testing Procedures

The most commonly encountered pair of hypotheses about β1 is H0: β1 = 0 versus Ha: β1 ≠ 0. When this null hypothesis is true, µY|x = β0, independent of x. Then knowledge of x gives no information about the value of the dependent variable.

Null hypothesis: H0: β1 = β10

Test statistic value: t = (β̂1 – β10) / s_β̂1

Page 81:

Hypothesis-Testing Procedures

Alternative Hypothesis | Rejection Region for a Level α Test

Ha: β1 > β10 | t ≥ tα,n – 2

Ha: β1 < β10 | t ≤ –tα,n – 2

Ha: β1 ≠ β10 | either t ≥ tα/2,n – 2 or t ≤ –tα/2,n – 2

A P-value based on n – 2 df can be calculated just as was done previously for t tests.

For H0: β1 = 0, the test statistic value is the t ratio t = β̂1 / s_β̂1.
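This t ratio and its two-sided P-value are what summary(lm(...)) reports; a sketch, assuming x and y hold the Hooke data from the R slides below:

tab <- coef(summary(lm(y ~ x)))                            # coefficient table
t_ratio <- tab["x", "Estimate"] / tab["x", "Std. Error"]   # 48.98
2 * pt(abs(t_ratio), df = 4, lower.tail = FALSE)           # 1.04e-06, as in the output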

Page 82:

Regression analysis in R

Page 83:

Regression analysis in R

Fitting Linear Models in R

Description

lm is used to fit linear models (regression)

Command: lm(formula, data, subset, ...)

Example: Model1 = lm(outcome ~ predictor)

Page 84:

Example: Regression analysis in R

Robert Hooke (England, 1635–1703) was able to assess the relationship between the length of a spring and the load placed on it. He just hung weights of different sizes on the end of a spring, and watched what happened. When he increased the load, the spring got longer. When he reduced the load, the spring got shorter. And the relationship was more or less linear.

Let b be the length of the spring with no load. When a weight of x kilograms is tied to the end of the spring, the spring stretches to a new length.

Page 85:

Example: Regression analysis in R

According to Hooke's law, the amount of stretch is proportional to the weight x. So the new length of the spring is

y = mx + b

In this equation, m and b are constants which depend on the spring.

Their values are unknown and have to be estimated using experimental data.

Page 86:

Hooke’s Law in R

The data below come from an experiment in which weights of various sizes were loaded on the end of a length of piano wire.

Weight (kg) | Length (cm)
0           | 439.00
2           | 439.12
4           | 439.21
6           | 439.31
8           | 439.40
10          | 439.50

The first column shows the weight of the load. The second column shows the measured length. With 20 pounds of load, this "spring" only stretched about 0.2 inch (10 kg is approximately 22 lb, 0.5 cm is approximately 0.2 in).

Piano wire is not very stretchy!

439 cm is about 14.4 feet (the room needed to have a fairly high ceiling).

Page 87:

Regression analysis in R

x = c(0, 2, 4, 6, 8, 10)
y = c(439.00, 439.12, 439.21, 439.31, 439.40, 439.50)
plot( x, y, xlab = 'Load (kg)', ylab = 'Length (cm)' )

[Scatter plot of Length (cm) vs. Load (kg)]

Page 88:

Regression analysis in R

Hooke.lm = lm( y ~ x )
summary(Hooke.lm)

Residuals:
        1         2         3         4         5         6
-0.010952  0.010762  0.002476  0.004190 -0.004095 -0.002381

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)
(Intercept) 4.390e+02  6.076e-03 72254.92  < 2e-16 ***
x           4.914e-02  1.003e-03    48.98 1.04e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.008395 on 4 degrees of freedom
Multiple R-squared: 0.9983, Adjusted R-squared: 0.9979
F-statistic: 2399 on 1 and 4 DF, p-value: 1.04e-06

Page 89:

Regression analysis in R

coefficients(Hooke.lm)
#  (Intercept)            x
# 439.01095238   0.04914286

Hooke.coefs = coefficients(Hooke.lm)

Hooke.coefs[1]
# 439.011

Hooke.coefs[2]
# 0.04914286

# expected length for x = 5 kg
Hooke.coefs[1] + Hooke.coefs[2] * 5
# 439.2567

If we knew that line for certain, then the estimated standard deviation of the actual length around that expected value would be the residual standard error: 0.008395.

Page 90:

Regression analysis in R

coefficients(Hooke.lm)
#  (Intercept)            x
# 439.01095238   0.04914286

So, in Hookeʼs law, m ≈ 0.05 cm per kg and b ≈ 439.01 cm. The method of least squares estimates the length of the spring under no load to be 439.01 cm.

And each kilogram of load makes the spring stretch by an amount estimated as 0.05 cm.

Note that there is no question here about what is causing what: correlation is not causation, but in this controlled experimental setting, the causation is clear and simple. This is the exception more than the rule in statistical modeling.

Page 91:

Regression analysis in R

The method of least squares estimates the length of the spring under no load to be 439.01 cm. This is a bit longer than the measured length at no load (439.00 cm).

A statistician would trust the least squares estimate over the measurement. Why?

example taken from analytictech.com

Page 92:

Regression analysis in R

y.hat <- Hooke.coefs[1] + Hooke.coefs[2] * x
# 439.0110 439.1092 439.2075 439.3058 439.4041 439.5024

> predict(Hooke.lm)
# 439.0110 439.1092 439.2075 439.3058 439.4041 439.5024

e.hat <- y - y.hat

mean( e.hat )
# -2.842171e-14

cbind( y, y.hat, e.hat )
#           y    y.hat        e.hat
# [1,] 439.00 439.0110 -0.010952381
# [2,] 439.12 439.1092  0.010761905
# [3,] 439.21 439.2075  0.002476190
# [4,] 439.31 439.3058  0.004190476
# [5,] 439.40 439.4041 -0.004095238
# [6,] 439.50 439.5024 -0.002380952

Page 93:

Regression analysis in R

plot( x, y, xlab = 'Load (kg)', ylab = 'Length (cm)' )
lines( x, y.hat, lty = 1, col = 'red' )

[Scatter plot of Length (cm) vs. Load (kg) with the fitted least squares line]

Page 94:

Regression analysis in R

plot( y.hat, e.hat, xlab = 'Predicted y values', ylab = 'Residuals' )
abline( h = 0, col = 'red' )

This plot doesn't look good, actually.

[Plot of residuals vs. predicted y values, with a horizontal line at 0]

Page 95:

Regression analysis in R

qqnorm( e.hat, main = 'Residual Normal QQplot' )
abline( mean( e.hat ), sd( e.hat ), col = 'red' )

This plot looks good.

[Normal QQ plot of the residuals with reference line]

Page 96:

Inferences Concerning µY|x∗ and the Prediction of Future Y Values

Page 97:

Inference Concerning Mean of Future Y

Hooke.coefs = coefficients(Hooke.lm)
#  (Intercept)            x
# 439.01095238   0.04914286

# expected length for x = 5 kg
Hooke.coefs[1] + Hooke.coefs[2] * 5
# 439.2567

If we knew that line for certain, then the estimated standard deviation of the actual length around that expected value would be:

Residual standard error: 0.008395

But we donʼt know m and b for certain – weʼve just estimated them, and we know that their estimators are Normal random variables.

Page 98:

Inference Concerning Mean of Future Y

Let x∗ denote a specified value of the independent variable x.

Once the estimates β̂0 and β̂1 have been calculated, β̂0 + β̂1x∗ can be regarded either as a point estimate of µY|x∗ (the expected or true average value of Y when x = x∗) or as a prediction of the Y value that will result from a single observation made when x = x∗.

The point estimate or prediction by itself gives no information concerning how precisely µY|x∗ has been estimated or Y has been predicted.

Page 99:

Inference Concerning Mean of Future Y

This can be remedied by developing a CI for µY|x∗ and a prediction interval (PI) for a single Y value.

Before we obtain sample data, both β̂0 and β̂1 are subject to sampling variability—that is, they are both statistics whose values will vary from sample to sample.

Suppose, for example, that β0 = 439 and β1 = 0.05.

Then a first sample of (x, y) pairs might give β̂0 = 439.35, β̂1 = 0.048; a second sample might result in β̂0 = 438.52, β̂1 = 0.051; and so on.

Page 100:

Inference Concerning Mean of Future Y

It follows that Ŷ = β̂0 + β̂1x∗ itself varies in value from sample to sample, so it is a statistic.

If the intercept and slope of the population line are the aforementioned values 439 and 0.05, respectively, and x∗ = 5 kg, then this statistic is trying to estimate the value

439 + 0.05(5) = 439.25

The estimate from a first sample might be 439.35 + 0.048(5) = 439.59, from a second sample might be 438.52 + 0.051(5) = 438.775, and so on.

Page 101:

Inference Concerning Mean of Future Y

Consider the value x∗ = 10. Recall that because 10 is farther from the “center of the data” (x̄ = 5) than x∗ = 5 is, the estimated “y.hat” values there are more variable.

Page 102:

Inference Concerning Mean of Future Y

In the same way, inferences about the mean Y value β0 + β1x∗ are based on properties of the sampling distribution of the statistic β̂0 + β̂1x∗.

Substitution of the expressions for β̂0 and β̂1 into β̂0 + β̂1x∗, followed by some algebraic manipulation, leads to the representation of β̂0 + β̂1x∗ as a linear function of the Yi’s:

β̂0 + β̂1x∗ = Σ di Yi, where di = 1/n + (x∗ – x̄)(xi – x̄)/Sxx

The coefficients d1, d2, …, dn in this linear function involve the xi’s and x∗, all of which are fixed.

Page 103:

Inference Concerning Mean of Future Y

Application of the rules to this linear function gives the following properties.

Proposition

Let Ŷ = β̂0 + β̂1x∗, where x∗ is some fixed value of x. Then

1. The mean value of Ŷ is

E(Ŷ) = E(β̂0 + β̂1x∗) = β0 + β1x∗

Thus β̂0 + β̂1x∗ is an unbiased estimator for β0 + β1x∗ (i.e., for µY|x∗).

Page 104:

Inference Concerning Mean of Future Y

2. The variance of Ŷ is

V(Ŷ) = σ² [ 1/n + (x∗ – x̄)² / Sxx ]

And the standard deviation is the square root of this expression. The estimated standard deviation of β̂0 + β̂1x∗, denoted by s_Ŷ, results from replacing σ by its estimate s:

s_Ŷ = s √( 1/n + (x∗ – x̄)² / Sxx )

3. Ŷ has a normal distribution.

Page 105:

Inference Concerning Mean of Future Y

The variance of β̂0 + β̂1x∗ is smallest when x∗ = x̄ and increases as x∗ moves away from x̄ in either direction.

Thus the estimator of µY|x∗ is more precise when x∗ is near the center of the xi’s than when it is far from the values at which observations have been made. This will imply that both the CI and PI are narrower for an x∗ near x̄ than for an x∗ far from x̄.

Most statistical computer packages will provide both β̂0 + β̂1x∗ and s_Ŷ for any specified x∗ upon request.

Page 106:

Inference Concerning Mean of Future Y

Hooke.coefs = coefficients(Hooke.lm)
#  (Intercept)            x
# 439.01095238   0.04914286

> predict(Hooke.lm)
# 439.0110 439.1092 439.2075 439.3058 439.4041 439.5024

# expected length for x = 5 kg
Hooke.coefs[1] + Hooke.coefs[2] * 5
# 439.2567

> predict(Hooke.lm, se.fit=TRUE)
$fit
       1        2        3        4        5        6
439.0110 439.1092 439.2075 439.3058 439.4041 439.5024

$se.fit
[1] 0.006075862 0.004561497 0.003571111 0.003571111 0.004561497 0.006075862

Page 107:

Inference Concerning Mean of Future Y

Just as inferential procedures for β1 were based on the t variable obtained by standardizing β̂1, a t variable obtained by standardizing β̂0 + β̂1x∗ leads to a CI and test procedures here.

Theorem

The variable

T = (β̂0 + β̂1x∗ – (β0 + β1x∗)) / S_Ŷ

has a t distribution with n – 2 df.

Page 108:

Inference Concerning Mean of Future Y

A probability statement involving this standardized variable can now be manipulated to yield a confidence interval for µY|x∗.

A 100(1 – α)% CI for µY|x∗, the expected value of Y when x = x∗, is

β̂0 + β̂1x∗ ± tα/2,n – 2 · s_Ŷ

This CI is centered at the point estimate for µY|x∗ and extends out to each side by an amount that depends on the confidence level and on the extent of variability in the estimator on which the point estimate is based.

Page 109:

Inference Concerning Mean of Future Y

preds = predict(Hooke.lm, se.fit=TRUE)

$fit
439.0110 439.1092 439.2075 439.3058 439.4041 439.5024

$se.fit
0.00608 0.00456 0.00357 0.00357 0.00456 0.00608

> plot( x, y, xlab = 'Load (kg)', ylab = 'Length (cm)', cex = 2 )
> lines( x, y.hat, lty = 1, col = 'red' )
> preds = predict(Hooke.lm, se.fit=TRUE)
> CI.ub = preds$fit + preds$se.fit * qt(.975, 6 - 2)
> CI.lb = preds$fit - preds$se.fit * qt(.975, 6 - 2)
> lines( x, CI.ub, lty = 2, col = "red" )
> lines( x, CI.lb, lty = 2, col = "red" )

Page 110:

Inference Concerning Mean of Future Y

Page 111:

Example: concrete

Corrosion of steel reinforcing bars is the most important durability problem for reinforced concrete structures.

Carbonation of concrete results from a chemical reaction that also lowers the pH value by enough to initiate corrosion of the rebar.

Representative data on x = carbonation depth (mm) and y = strength (MPa) for a sample of core specimens taken from a particular building follows.

Page 112:

Example -- concrete cont’d

Page 113:

Example -- concrete

Let’s now calculate a 95% confidence interval for the mean strength of all core specimens having a carbonation depth of 45 mm. The interval is centered at

The estimated standard deviation of the statistic is

The 16 df t critical value for a 95% confidence level is 2.120, from which we determine the desired interval to be

cont’d

Page 114:

A Prediction Interval for a Future Value of Y

Page 115:

A Prediction Interval for a Future Value of Y

Rather than calculate an interval estimate for µY|x∗, an investigator may wish to obtain a range or an interval of plausible values for the value of Y associated with some future observation when the independent variable has value x∗.

Consider, for example, relating vocabulary size y to age of a child x. The CI with x∗ = 6 would provide a range that covers with 95% confidence the true average vocabulary size for all 6-year-old children.

Alternatively, we might wish an interval of plausible values for the vocabulary size of a particular 6-year-old child. How can you tell that a child is “off the chart”, for example?

Page 116:

A Prediction Interval for a Future Value of Y

What is the difference between a confidence interval and a prediction interval?

Page 117:

A Prediction Interval for a Future Value of Y

The error of prediction is Y − Ŷ, a difference between two random variables. Because the future value Y is independent of the observed Yi's, the variances add:

V(Y − Ŷ) = V(Y) + V(Ŷ) = σ² + σ²[1/n + (x∗ − x̄)²/Sxx]


A Prediction Interval for a Future Value of Y

What is the distribution of the prediction error Y − Ŷ?

It can be shown that the standardized variable

T = (Y − Ŷ) / ( s · sqrt(1 + 1/n + (x∗ − x̄)²/Sxx) )

has a t distribution with n – 2 df.


A Prediction Interval for a Future Value of Y

Manipulating to isolate Y between the two inequalities yields the following interval.

A 100(1 – α)% PI for a future Y observation to be made when x = x∗ is

ŷ ± t_{α/2, n−2} · s · sqrt(1 + 1/n + (x∗ − x̄)²/Sxx)


A Prediction Interval for a Future Value of Y

The interpretation of the prediction level 100(1 – α)% is similar to that of previous confidence levels: if the PI is used repeatedly, in the long run the resulting intervals will actually contain the observed y values 100(1 – α)% of the time.

As n → ∞, the width of the CI approaches 0, whereas the width of the PI does not (because even with perfect knowledge of β0 and β1, there will still be randomness in prediction).


Example -- concrete

Let's return to the carbonation depth-strength data example and calculate a 95% PI for the strength value that would result from selecting a single core specimen whose depth is 45 mm. The relevant quantities carry over from that example.

For a prediction level of 95% based on n – 2 = 16 df, the t critical value is 2.120, exactly what we previously used for a 95% confidence level.
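As a minimal R sketch (hypothetical names: assume the data sit in a data frame concrete with columns depth and strength), both the CI for the mean strength and the PI for a single new specimen at 45 mm come from predict():

> concrete.lm = lm(strength ~ depth, data = concrete)  # fit the SLR (assumed data)
> new.x = data.frame(depth = 45)                       # the x* of interest
> # 95% CI for the mean strength at depth 45 mm
> predict(concrete.lm, newdata = new.x, interval = "confidence")
> # 95% PI for the strength of one new specimen at depth 45 mm
> predict(concrete.lm, newdata = new.x, interval = "prediction")

Note that the PI is always wider than the CI at the same x*, since it carries the extra "1 +" term under the square root.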


Diagnostic Plots


Diagnostic Plots

The basic plots that many statisticians recommend for an assessment of model validity and usefulness are the following (an R sketch follows the list):

1. ei∗ (or ei) on the vertical axis versus xi on the horizontal axis

2. ei∗ (or ei) on the vertical axis versus the fitted values ŷi on the horizontal axis

3. ŷi on the vertical axis versus yi on the horizontal axis

4. A histogram of the standardized residuals
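A minimal sketch of these four plots in R, assuming a fitted model fit = lm(y ~ x) with hypothetical variables y and x; rstandard() returns the standardized residuals ei∗:

> fit = lm(y ~ x)             # hypothetical fitted SLR
> e.star = rstandard(fit)     # standardized residuals
> par(mfrow = c(2, 2))        # 2 x 2 grid of plots
> plot(x, e.star)             # 1. e* versus x
> plot(fitted(fit), e.star)   # 2. e* versus fitted values
> plot(y, fitted(fit))        # 3. fitted values versus observed y
> hist(e.star)                # 4. histogram of standardized residuals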


Example


Difficulties and Remedies


Difficulties and Remedies

Although we hope that our analysis will yield plots like these, quite frequently the plots will suggest one or more of the following difficulties:

1. A nonlinear relationship between x and y is appropriate.

2. The variance of ∈ (and of Y) is not a constant σ² but depends on x.

3. The selected model fits the data well except for a very few discrepant or outlying data values, which may have greatly influenced the choice of the best-fit function.


Difficulties and Remedies

4. The error term ∈ does not have a normal distribution.

5. When the subscript i indicates the time order of the observations, the ∈i’s exhibit dependence over time.

6. One or more relevant independent variables have been omitted from the model.


Difficulties and Remedies

Plots that indicate abnormality in data: (a) nonlinear relationship; (b) nonconstant variance.


Difficulties and Remedies

Plots that indicate abnormality in data (cont'd): (c) discrepant observation; (d) observation with large influence.


Difficulties and Remedies

Plots that indicate abnormality in data (cont'd): (e) dependence in errors; (f) variable omitted.


Difficulties and Remedies

If the residual plot exhibits a curved pattern, as in plot (a) above, then a nonlinear function of x may be used.


Difficulties and Remedies

The residual plot below suggests that, although a straight-line relationship may be reasonable, the assumption that V(Yi) = σ² for each i is of doubtful validity.

In that case, using more advanced methods, such as weighted least squares (WLS) or more flexible models, is recommended for inference.
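For instance, a hedged sketch: if the residual spread appears to grow roughly in proportion to x, one common (assumed, not unique) choice is WLS with weights proportional to 1/x²; lm() takes these through its weights argument:

> # WLS: lm() minimizes the weighted sum of squares sum(w * e^2)
> fit.wls = lm(y ~ x, weights = 1 / x^2)  # assumes SD of Y grows like x
> summary(fit.wls)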


Difficulties and Remedies

When plots or other evidence suggest that the data set contains outliers or points having large influence on the resulting fit, one possible approach is to omit these outlying points and re-compute the estimated regression equation.

This would certainly be correct if it were found that the outliers resulted from errors in recording data values or from experimental errors.

If no assignable cause can be found for the outliers, it is still desirable to report the estimated equation both with and without the outliers omitted.


Difficulties and Remedies

Yet another approach is to retain possible outliers but to use an estimation principle that puts relatively less weight on outlying values than does the principle of least squares.

One such principle is MAD (minimize absolute deviations), which selects b0 and b1 to minimize Σ | yi – (b0 + b1xi) |.

Unlike the least squares estimates, there are no nice formulas for the MAD estimates; their values must be found by an iterative computational procedure, as in the sketch below.
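One such iterative procedure is available in R through the quantreg add-on package (assumed installed): median regression, rq() with tau = 0.5, minimizes exactly this sum of absolute deviations. A sketch, assuming y and x exist:

> library(quantreg)               # install.packages("quantreg") if needed
> fit.lad = rq(y ~ x, tau = 0.5)  # minimizes sum |yi - (b0 + b1*xi)|
> coef(fit.lad)                   # the MAD (least absolute deviations) estimates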


Difficulties and Remedies

Such procedures are also used when it is suspected that the ∈i's have a distribution that is not normal but instead has "heavy tails" (making it much more likely than for the normal distribution that discrepant values will enter the sample); robust regression procedures are those that produce reliable estimates for a wide variety of underlying error distributions.
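As one illustration, a sketch of a widely used robust procedure: M-estimation via rlm() in the MASS package, which uses Huber weighting by default and so downweights large residuals (variable names again hypothetical):

> library(MASS)
> fit.rob = rlm(y ~ x)  # iteratively reweighted least squares
> summary(fit.rob)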


So far we have learned about LR…

1. How to fit SLR

2. How to do inference in SLR

3. How to check for assumption violations: we learned to use standardized residuals instead of ordinary residuals

4. We learned about outliers

5. We learned about the sneakier kind: outliers in the X space. They almost always have a small ordinary residual (that residual's variance is tiny, since their X is far away from the mean of all X's), so they won't be visible on the ordinary residual plot (but they may stand out on the standardized residual plot). They are called leverage points (points with high leverage).

6. If removing a leverage point results in a very different line, it is called an influential point.


Today we will talk about MLR

1. How to fit MLR: estimate the true linear relationship (regression surface) between the outcome and the predictors. Interpret the slope on X1 as the average change in Y when X1 increases by 1 unit, holding all other X's constant.

2. How to do inference for one slope, several slopes (is a subset of slopes significant, i.e., does this subset of variables "matter"?), or all slopes.

3. Same approach to checking for assumption violations (non-constant variance, non-linear patterns in residuals)

4. Outliers (outliers in Y space: those points with large residuals)

5. Must think about *any* outliers in the X space (leverage points); they may have a small ordinary residual.


Multiple Regression Analysis

Definition
The multiple regression model equation is

Y = β0 + β1x1 + β2x2 + ... + βkxk + ∈

where E(∈) = 0 and V(∈) = σ².

In addition, for purposes of testing hypotheses and calculating CIs or PIs, it is assumed that ∈ is normally distributed.

This is not a regression line any longer, but a regression surface.
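In R, fitting this model is the same lm() call with more terms on the right-hand side; a sketch with hypothetical names y, x1, x2, x3 in a data frame dat:

> fit.mlr = lm(y ~ x1 + x2 + x3, data = dat)
> summary(fit.mlr)  # t test for each slope; F test for all slopes at once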


Multiple Regression Analysis

The regression coefficient β1 is interpreted as the expected change in Y associated with a 1-unit increase in x1 while x2, ..., xk are held fixed.

Analogous interpretations hold for β2, ..., βk.

Thus, these coefficients are called partial or adjusted regression coefficients.

In contrast, the simple regression slope is called the marginal (or unadjusted) coefficient.
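The distinction is easy to see by comparing two fits (hypothetical names again): the marginal slope for x1 generally differs from its partial slope once x2 enters the model:

> coef(lm(y ~ x1, data = dat))       # marginal (unadjusted) slope for x1
> coef(lm(y ~ x1 + x2, data = dat))  # partial slope for x1, adjusted for x2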


Models with Interaction and Quadratic Predictors

For example, if an investigator has obtained observations on y, x1, and x2, one possible model is

Y = β0 + β1x1 + β2x2 + ∈.

However, other models can be constructed by forming predictors that are mathematical functions of x1 and/or x2.

For example, with x3 = x1^2 and x4 = x1x2, the model

Y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ∈

also has the general MR form.


Models with Interaction and Quadratic Predictors

In general, it is not only permissible for some predictors to be mathematical functions of others but also often desirable, in the sense that the resulting model may be much more successful in explaining variation in y than any model without such predictors. This is still "linear regression", even though the relationship between outcome and predictors may not be linear.

For example, the model

Y = β0 + β1x + β2x^2 + ∈

is still MLR with k = 2, x1 = x, and x2 = x^2.
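In an R formula, such predictors can be built on the fly: I() protects arithmetic like squaring, and x1:x2 forms the product term (a sketch with assumed names):

> fit.quad = lm(y ~ x + I(x^2), data = dat)                 # quadratic model: still linear in the betas
> fit.full = lm(y ~ x1 + x2 + I(x1^2) + x1:x2, data = dat)  # x3 = x1^2, x4 = x1*x2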


Example -- travel time

The article "Estimating Urban Travel Times: A Comparative Study" (Trans. Res., 1980: 173–175) described a study relating the dependent variable y = travel time between locations in a certain city and the independent variable x2 = distance between locations.

Two types of vehicles, passenger cars and trucks, were used in the study.

Let x1 = 1 if the vehicle is a truck and x1 = 0 if it is a passenger car.


Example – travel time

One possible multiple regression model is

Y = β0 + β1x1 + β2x2 + ∈

The mean value of travel time depends on whether a vehicle is a car or a truck:

mean time = β0 + β2x2 when x1 = 0 (cars)

mean time = β0 + β1 + β2x2 when x1 = 1 (trucks)

cont’d


Example -- travel time

The coefficient β1 is the difference in mean times between trucks and cars with distance held fixed; if β1 > 0, then on average it will take trucks β1 longer than cars to traverse any particular distance.

A second possibility is a model with an interaction predictor:

Y = β0 + β1x1 + β2x2 + β3x1x2 + ∈

cont’d
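A sketch of fitting both models in R, assuming a data frame travel with columns time, distance, and a 0/1 indicator truck (all names hypothetical):

> fit.add = lm(time ~ truck + distance, data = travel)  # parallel lines: common slope
> fit.int = lm(time ~ truck * distance, data = travel)  # truck*distance expands to both main effects plus the interaction
> anova(fit.add, fit.int)                               # F test: does the interaction term matter?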


Example – travel time

Now the mean times for the two types of vehicles are

mean time = β0 + β2x2 when x1 = 0

mean time = β0 + β1 + (β2 + β3)x2 when x1 = 1

Does it make sense to have different intercepts for cars and trucks? (Think about what the intercept actually means.)

So, what would be a third model you could consider?

cont’d


Example – travel time

For each model, the graph of the mean time versus distance is a straight line for either type of vehicle, as illustrated in the figure.

Figure: regression functions for models with one dummy variable (x1) and one quantitative variable x2: (a) no interaction; (b) interaction.

cont’d