linear regression 12

Upload: ashigaik

Post on 06-Apr-2018

220 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/3/2019 Linear Regression 12

    1/65

    2003 Prentice-Hall, Inc. Chap 10-1

    Business Statistics:A First Course

    (3rd Edition)

    Chapter 10

    Simple Linear Regression

  • 8/3/2019 Linear Regression 12

    2/65

    2003 Prentice-Hall, Inc.Chap 10-2

    Chapter Topics

    Types of Regression Models

    Determining the Simple Linear Regression

    Equation Measures of Variation

    Assumptions of Regression and Correlation

    Residual Analysis Measuring Autocorrelation

    Inferences about the Slope

  • 8/3/2019 Linear Regression 12

    3/65

    2003 Prentice-Hall, Inc.Chap 10-3

    Chapter Topics

    Correlation - Measuring the Strength of theAssociation

    Estimation of Mean Values and Prediction ofIndividual Values

    Pitfalls in Regression and Ethical Issues

    (continued)

  • 8/3/2019 Linear Regression 12

    4/65

    2003 Prentice-Hall, Inc.Chap 10-4

    Purpose of Regression Analysis

    Regression Analysis is Used Primarily to ModelCausality and Provide Prediction

    Predict the values of a dependent (response)variable based on values of at least oneindependent (explanatory) variable

    Explain the effect of the independent variables on

    the dependent variable

  • 8/3/2019 Linear Regression 12

    5/65

    2003 Prentice-Hall, Inc.Chap 10-5

    Types of Regression Models

    Positive Linear Relationship

    Negative Linear Relationship

    Relationship NOT Linear

    No Relationship

  • 8/3/2019 Linear Regression 12

    6/65

    2003 Prentice-Hall, Inc.Chap 10-6

    Simple Linear Regression Model

    Relationship Between Variables is Describedby a Linear Function

    The Change of One Variable Causes the OtherVariable to Change

    A Dependency of One Variable on the Other

  • 8/3/2019 Linear Regression 12

    7/65 2003 Prentice-Hall, Inc.

    Chap 10-7

    PopulationRegressionLine

    (conditional mean)

    Simple Linear Regression Model

    Population regression line is a straight line thatdescribes the dependence of the average valueaverage value

    (conditional mean)(conditional mean) of one variable on the other

    PopulationY intercept

    PopulationSlopeCoefficient

    RandomError

    Dependent(Response)Variable

    Independent(Explanatory)Variable

    ii iY X 0 1+ +=|Y X

    (continued)

  • 8/3/2019 Linear Regression 12

    8/65 2003 Prentice-Hall, Inc.Chap 10-8

    Simple Linear Regression Model(continued)

    ii iY X

    0 1+ +=

    = Random Error

    Y

    X

    (Observed Value ofY) =

    Observed Value ofY

    |Y X iX 0 1= +

    i

    0

    1

    (Conditional Mean)

  • 8/3/2019 Linear Regression 12

    9/65 2003 Prentice-Hall, Inc.Chap 10-9

    Sample regression line provides an estimateestimateofthe population regression line as well as apredicted value of Y

    Linear Regression Equation

    Sample

    Y Intercept

    Sample

    Slope

    Coefficient

    Residual0 1i iib bY X e+ +=

    0 1Y b b X = + =

    Simple Regression Equation

    (Fitted Regression Line, Predicted Value)

  • 8/3/2019 Linear Regression 12

    10/65 2003 Prentice-Hall, Inc.Chap 10-10

    Linear Regression Equation

    and are obtained by finding the valuesof and that minimizes the sum of the

    squared residuals

    provides an estimateestimateof

    provides and estimateestimateof

    0b 1b

    0b 1b

    0b 01b 1

    (continued)

    ( )2

    2

    1 1

    n n

    i i i

    i i

    Y Y e= =

    =

  • 8/3/2019 Linear Regression 12

    11/65 2003 Prentice-Hall, Inc.Chap 10-11

    Linear Regression Equation(continued)

    Y

    X

    Observed Value

    |Y X iX 0 1= +

    i

    0

    1

    ii iY X 0 1+ +=

    0 1

    i iY b b X = +

    ie

    0 1i iib bY X e+ +=

    1b

    0b

  • 8/3/2019 Linear Regression 12

    12/65 2003 Prentice-Hall, Inc.Chap 10-12

    Interpretation of the Slope andIntercept

    is the average value of Y when

    the value of X is zero.

    measures the change in the

    average value of Y as a result of a one-unitchange in X.

    | 0Y X 0 ==

    |

    1

    Y X

    X

    =

  • 8/3/2019 Linear Regression 12

    13/65 2003 Prentice-Hall, Inc.Chap 10-13

    Interpretation of the Slope and

    Intercept

    is the estimatedestimatedaverage value

    of Y when the value of X is zero.

    is the estimatedestimatedchange in the

    average value of Y as a result of a one-unit

    change in X.

    (continued)

    | 0

    Y Xb

    0 ==

    |

    1

    Y X

    bX

    =

  • 8/3/2019 Linear Regression 12

    14/65 2003 Prentice-Hall, Inc.Chap 10-14

    Simple Linear Regression:Example

    You wish to examinethe linear dependency

    of the annual sales ofproduce stores on theirsizes in square footage.Sample data for 7

    stores were obtained.Find the equation ofthe straight line thatfits the data best.

    AnnualStore Square Sales

    Feet ($1000)

    1 1,726 3,6812 1,542 3,395

    3 2,816 6,653

    4 5,555 9,543

    5 1,292 3,318

    6 2,208 5,563

    7 1,313 3,760

  • 8/3/2019 Linear Regression 12

    15/65 2003 Prentice-Hall, Inc.Chap 10-15

    Scatter Diagram: Example

    0

    2000

    4000

    6000

    8000

    10000

    12000

    0 1000 2000 3000 4000 5000 6000

    Sq u a re F e e t

    A

    nnualSales

    ($000)

    Excel Output

  • 8/3/2019 Linear Regression 12

    16/65 2003 Prentice-Hall, Inc.Chap 10-16

    Simple Linear RegressionEquation: Example

    0 1

    1636.415 1.487

    i i

    i

    Y b b X

    X

    = +

    = +

    From Excel Printout:

    Coefficients

    Intercept 1636.414726

    X Variable 1.486633657

  • 8/3/2019 Linear Regression 12

    17/65 2003 Prentice-Hall, Inc.Chap 10-17

    Graph of the Simple LinearRegression Equation: Example

    0

    2000

    4000

    6000

    8000

    10000

    12000

    0 1000 2000 3000 4000 5000 6000

    Square Fee t

    A

    nnualSales

    ($000)

    Yi=16

    36.415

    +1.48

    7Xi

  • 8/3/2019 Linear Regression 12

    18/65 2003 Prentice-Hall, Inc.

    Chap 10-18

    Interpretation of Results:Example

    The slope of 1.487 means that each increase of oneunit in X, we predict the average of Y to increase byan estimated 1.487 units.

    The equation estimates that foreach increase of 1square footin the size of the store, the expectedannual sales are predictedto increase by $1487.

    1636.415 1.487i iY X= +

  • 8/3/2019 Linear Regression 12

    19/65 2003 Prentice-Hall, Inc.

    Chap 10-19

    Simple Linear Regression inPHStat

    In Excel, use PHStat | Regression | Simple

    Linear Regression

    EXCEL Spreadsheet of Regression Sales onFootage

    Microsoft Excel

    Worksheet

    f

  • 8/3/2019 Linear Regression 12

    20/65 2003 Prentice-Hall, Inc.

    Chap 10-20

    Measures of Variation:

    The Sum of Squares

    SST = SSR + SSE

    Total

    Sample

    Variability

    = Explained

    Variability

    + Unexplained

    Variability

    f i i

  • 8/3/2019 Linear Regression 12

    21/65 2003 Prentice-Hall, Inc.

    Chap 10-21

    Measures of Variation:

    The Sum of Squares

    SST = Total Sum of Squares Measures the variation of the Yi values around

    their mean,

    SSR = Regression Sum of Squares Explained variation attributable to the relationship

    between Xand Y

    SSE = Error Sum of Squares Variation attributable to factors other than the

    relationship between Xand Y

    (continued)

    Y

  • 8/3/2019 Linear Regression 12

    22/65 2003 Prentice-Hall, Inc.

    Chap 10-22

    Measures of Variation:The Sum of Squares

    (continued)

    Xi

    Y

    X

    Y

    SST=(Yi

    -Y)2

    SSE=(Yi-Yi)2

    SSR= (Yi-Y)2

    _

    _

    _

  • 8/3/2019 Linear Regression 12

    23/65 2003 Prentice-Hall, Inc.

    Chap 10-23

    The ANOVA Table in Excel

    ANOVAdf SS MS F Significance

    FRegression k SSR MSR =SSR/k

    MSR/MSE P-value of

    the F Test

    Residuals n-k-1 SSE MSE=SSE/(n-k-1)Total n-1 SST

  • 8/3/2019 Linear Regression 12

    24/65

    2003 Prentice-Hall, Inc.Chap 10-24

    Measures of VariationThe Sum of Squares: Example

    A NO V A

    d f SS MS F Signi ficance F

    Regress ion 1 3 0380 456 .12 303 8045 6 8 1.17 909 0.00 028 120

    Residual 5 1 87 11 99 .5 95 3 74 23 9.9 2

    Total 6 32251655.71

    Excel Output for Produce Stores

    SSR

    SSERegression (explained) df

    Degrees of freedom

    Error (residual) df

    Total df

    SST

  • 8/3/2019 Linear Regression 12

    25/65

    2003 Prentice-Hall, Inc.Chap 10-25

    The Coefficient of Determination

    Measures the proportion of variation in Y thatis explained by the independent variable X inthe regression model

    2 Regression Sum of Squares

    Total Sum of Squares

    SSRr

    SST= =

  • 8/3/2019 Linear Regression 12

    26/65

    2003 Prentice-Hall, Inc.Chap 10-26

    Coefficients of Determination (r2)and Correlation (r)

    r2 = 1, r2 = 1,

    r2 = .81, r2 = 0,Y

    Yi= b0 + b1XiX

    ^

    Y

    Yi= b0 + b1Xi

    X

    ^Y

    Yi= b0 + b1XiX

    ^

    Y

    Yi= b0 + b1XiX

    ^

    r= +1 r= -1

    r= +0.9 r= 0

  • 8/3/2019 Linear Regression 12

    27/65

    2003 Prentice-Hall, Inc.Chap 10-27

    Standard Error of Estimate

    The standard deviation of the variation ofobservations around the regression equation

    ( )2

    1

    2 2

    n

    i

    i

    YX

    Y YSSE

    Sn n

    =

    = =

    f

  • 8/3/2019 Linear Regression 12

    28/65

    2003 Prentice-Hall, Inc.Chap 10-28

    Measures of Variation:Produce Store Example

    Regression Statistics

    Multiple R 0.9705572

    R Square 0.94198129

    Adjusted R Square 0.93037754

    Standard Error 611.751517

    Observations 7

    Excel Output for Produce Stores

    r2 = .94

    94% of the variation in annual sales can beexplained by the variability in the size of the

    store as measured by square footage

    Syx

  • 8/3/2019 Linear Regression 12

    29/65

    2003 Prentice-Hall, Inc.Chap 10-29

    Linear Regression Assumptions

    Normality Y values are normally distributed for each X

    Probability distribution of error is normal 2. Homoscedasticity (Constant Variance)

    3. Independence of Errors

    V i ti f E A d th

  • 8/3/2019 Linear Regression 12

    30/65

    2003 Prentice-Hall, Inc.Chap 10-30

    Y values are normally distributed

    around the regression line.

    For eachXvalue, the spread or

    variance around the regression line isthe same.

    Variation of Errors Around theRegression Line

    X1

    X2

    X

    Y

    f()

    Sample Regression Line

  • 8/3/2019 Linear Regression 12

    31/65

    2003 Prentice-Hall, Inc.Chap 10-31

    Residual Analysis

    Purposes Examine linearity

    Evaluate violations of assumptions Graphical Analysis of Residuals

    Plot residuals vs. X and time

  • 8/3/2019 Linear Regression 12

    32/65

    2003 Prentice-Hall, Inc.Chap 10-32

    Residual Analysis for Linearity

    Not Linear Linear

    X

    e eX

    Y

    X

    Y

    X

    R id l A l i f

  • 8/3/2019 Linear Regression 12

    33/65

    2003 Prentice-Hall, Inc.Chap 10-33

    Residual Analysis forHomoscedasticity

    Heteroscedasticity Homoscedasticity

    SR

    X

    SR

    X

    Y

    X X

    Y

  • 8/3/2019 Linear Regression 12

    34/65

    2003 Prentice-Hall, Inc.Chap 10-34

    Residual Plot

    0 1000 2000 3000 4000 5000 6000

    Square Feet

    Residual Analysis:Excel Outputfor Produce Stores Example

    Excel Output

    Observation Predicted Y Residuals1 4202.344417 -521.3444173

    2 3928.803824 -533.8038245

    3 5822.775103 830.2248971

    4 9894.664688 -351.6646882

    5 3557.14541 -239.1454103

    6 4918.90184 644.09816037 3588.364717 171.6352829

    f

  • 8/3/2019 Linear Regression 12

    35/65

    2003 Prentice-Hall, Inc.Chap 10-35

    Residual Analysis forIndependence

    The Durbin-Watson Statistic Used when data is collected over time to detect

    autocorrelation (residuals in one time period are

    related to residuals in another period) Measures violation of independence assumption

    2

    1

    2

    2

    1

    ( )n

    i i

    i

    n

    i

    i

    e e

    D

    e

    =

    =

    =

    Should be close to 2.If not, examine the modelfor autocorrelation.

    D bi W t St ti ti i

  • 8/3/2019 Linear Regression 12

    36/65

    2003 Prentice-Hall, Inc.Chap 10-36

    Durbin-Watson Statistic inPHStat

    PHStat | Regression | Simple LinearRegression Check the box for Durbin-Watson Statistic

    Obt i i th C iti l V l f

  • 8/3/2019 Linear Regression 12

    37/65

    2003 Prentice-Hall, Inc.Chap 10-37

    5 = .

    k=1 k=2

    n dL dU dL dU

    15 1.08 1.36 .95 1.54

    16 1.10 1.37 .98 1.54

    Obtaining the Critical Values ofDurbin-Watson Statistic

    Table 13.4 Finding critical values of Durbin-Watson Statistic

    U i th D bi W t

  • 8/3/2019 Linear Regression 12

    38/65

    2003 Prentice-Hall, Inc.Chap 10-38

    Accept H0(no autocorrelatin)

    Using the Durbin-WatsonStatistic

    : No autocorrelation (error terms are independent)

    : There is autocorrelation (error terms are notindependent)

    0H

    1H

    0 42dL

    4-dL

    dU

    4-dU

    Reject H0(positive

    autocorrelation)

    Inconclusive Reject H0(negative

    autocorrelation)

    Resid al Anal sis fo

  • 8/3/2019 Linear Regression 12

    39/65

    2003 Prentice-Hall, Inc.Chap 10-39

    Residual Analysis forIndependence

    Not Independent Independente e

    TimeTime

    Residual Is Plotted Against Time to Detect Any Autocorrelation

    No Particular PatternCyclical Pattern

    Graphical Approach

    Inference about the Slope:

  • 8/3/2019 Linear Regression 12

    40/65

    2003 Prentice-Hall, Inc.Chap 10-40

    Inference about the Slope:t Test

    t Test for a Population Slope Is there a linear dependency ofYon X?

    Null and Alternative Hypotheses

    H0: 1 = 0 (No Linear Dependency) H1: 1 0 (Linear Dependency)

    Test Statistic

    1

    1

    1 1

    2

    1

    where

    ( )

    YX

    bn

    b

    i

    i

    b St SS

    X X

    =

    = =

    . . 2d f n=

  • 8/3/2019 Linear Regression 12

    41/65

    2003 Prentice-Hall, Inc.Chap 10-41

    Example: Produce Store

    Data for 7 Stores:

    Estimated RegressionEquation:

    The slope of thismodel is 1.487.

    Does Square FootageAffect Annual Sales?

    AnnualStore Square Sales

    Feet ($000)

    1 1,726 3,681

    2 1,542 3,395

    3 2,816 6,653

    4 5,555 9,5435 1,292 3,318

    6 2,208 5,563

    7 1,313 3,760

    1636.415 1.487 iY X= +

    Inferences about the Slope:

  • 8/3/2019 Linear Regression 12

    42/65

    2003 Prentice-Hall, Inc.Chap 10-42

    Inferences about the Slope:t Test Example

    H0: 1 = 0H1: 1 0= .05df= 7 - 2 = 5Critical Value(s):

    Test Statistic:

    Decision:Conclusion:

    There is evidence that

    square footage affects

    annual sales.

    t0 2.5706-2.5706

    .025

    Reject Reject

    .025

    From Excel Printout

    Reject H0

    Coef fic ientsS tandard Errort Stat P -value

    In te rc e pt 1 63 6. 41 47 4 51 .4 95 3 3 .6 24 4 0 .0 15 1F ootage 1.4866 0.1650 9.0099 0.0002

    1b 1bS t

    Inferences about the Slope:

  • 8/3/2019 Linear Regression 12

    43/65

    2003 Prentice-Hall, Inc.Chap 10-43

    Inferences about the Slope:Confidence Interval Example

    Confidence Interval Estimate of the Slope:

    11 2n bb t S

    Excel Printout for Produce Stores

    At 95% level of confidence the confidence interval forthe slope is (1.062, 1.911). Does not include 0.

    Conclusion:There is a significant linear dependencyof annual sales on the size of the store.

    Lower 95% Upper 95%

    Intercept 475.810926 2797.01853

    X Va riable 1.06249037 1.91077694

    Inferences about the Slope:

  • 8/3/2019 Linear Regression 12

    44/65

    2003 Prentice-Hall, Inc.Chap 10-44

    Inferences about the Slope:F Test

    F Test for a Population Slope Is there a linear dependency ofYon X?

    Null and Alternative Hypotheses

    H0: 1 = 0 (No Linear Dependency) H1: 1 0 (Linear Dependency)

    Test Statistic

    Numerator d.f.=1, denominator d.f.=n-2

    ( )

    1

    2

    SSR

    FSSE

    n

    =

    Relationship between a t Test

  • 8/3/2019 Linear Regression 12

    45/65

    2003 Prentice-Hall, Inc.Chap 10-45

    Relationship between a t Testand an F Test

    Null and Alternative Hypotheses H0: 1 = 0 (No Linear Dependency)

    H1: 1 0 (Linear Dependency)

    ( )2

    2 1, 2n nt F =

    Inferences about the Slope:

  • 8/3/2019 Linear Regression 12

    46/65

    2003 Prentice-Hall, Inc.Chap 10-46

    A N O V A

    d f S S M S F S ig n i f i c a n

    R e g r e ssi o n 1 3 0 3 8 0 4 5 6 . 1 23 0 3 8 0 4 5 6 . 1 28 1 .1 79 0 . 0 0 0 2

    R e si d u a l 5 1 8 7 1 1 9 9 .5 9 53 7 4 2 3 9 . 9 1 9

    T o ta l 6 3 2 2 5 1 6 5 5 . 7 1

    Inferences about the Slope:F Test Example

    Test Statistic:

    Decision:

    Conclusion:

    H0: 1= 0H1: 1 0= .05numeratordf = 1denominatordf= 7 - 2 = 5

    There is evidence that

    square footage affects

    annual sales.

    From Excel Printout

    Reject H0

    0 6.61

    Reject

    = .05

    1, 2nF

  • 8/3/2019 Linear Regression 12

    47/65

    2003 Prentice-Hall, Inc.Chap 10-47

    Purpose of Correlation Analysis

    Correlation Analysis is Used to MeasureStrength of Association (Linear Relationship)Between 2 Numerical Variables Only Strength of the Relationship is Concerned No Causal Effect is Implied

  • 8/3/2019 Linear Regression 12

    48/65

    2003 Prentice-Hall, Inc.Chap 10-48

    Purpose of Correlation Analysis

    Population Correlation Coefficient(Rho) isUsed to Measure the Strength between the

    Variables

    Sample Correlation Coefficient r is anEstimate of and is Used to Measure theStrength of the Linear Relationship in the

    Sample Observations

    (continued)

    Sample of Observations from

  • 8/3/2019 Linear Regression 12

    49/65

    2003 Prentice-Hall, Inc.Chap 10-49r= .6 r= 1

    Sample of Observations fromVarious r Values

    Y

    X

    Y

    X

    Y

    X

    Y

    X

    Y

    X

    r= -1 r= -.6 r= 0

  • 8/3/2019 Linear Regression 12

    50/65

    2003 Prentice-Hall, Inc.Chap 10-50

    Features of and r

    Unit Free

    Range between -1 and 1

    The Closer to -1, the Stronger the NegativeLinear Relationship

    The Closer to 1, the Stronger the PositiveLinear Relationship

    The Closer to 0, the Weaker the LinearRelationship

  • 8/3/2019 Linear Regression 12

    51/65

    2003 Prentice-Hall, Inc.Chap 10-51

    t Test for Correlation

    Hypotheses H0:= 0 (No Correlation)

    H1: 0 (Correlation)

    Test Statistic

    ( ) ( )

    ( ) ( )

    2

    2 1

    2 2

    1 1

    where

    2

    n

    i i

    i

    n n

    i i

    i i

    rt

    r

    n

    X X Y Y

    r r

    X X Y Y

    =

    = =

    =

    1

    = =

  • 8/3/2019 Linear Regression 12

    52/65

    2003 Prentice-Hall, Inc.Chap 10-52

    Example: Produce Stores

    Regression Statistics

    M ult iple R 0.9705572

    R S quare 0.94198129

    A djus ted R S quare 0.93037754

    S t andard E rror 611.751517

    O bs ervations 7

    From Excel Printout r

    Is there anyevidence of linear

    relationship betweenAnnual Sales of astore and its SquareFootage at .05 levelof significance? H0:= 0 (No association)

    H1: 0 (Association)

    = .05df= 7 - 2 = 5

    Example: Produce Stores

  • 8/3/2019 Linear Regression 12

    53/65

    2003 Prentice-Hall, Inc.Chap 10-53

    Example: Produce StoresSolution

    0 2.5706-2.5706

    .025Reject Reject.025

    Critical Value(s):

    Conclusion:There is evidence of a linear

    relationship at 5% level of

    significance

    Decision:Reject H0

    2

    .97069.0099

    1 .9420

    52

    rt

    r

    n

    = = =

    1

    The value of the t statistic isexactly the same as the tstatistic value for test on theslope coefficient

  • 8/3/2019 Linear Regression 12

    54/65

    2003 Prentice-Hall, Inc.Chap 10-54

    Estimation of Mean Values

    Confidence Interval Estimate for :

    The Mean ofYgiven a particular Xi

    2

    22

    1

    ( )1

    ( )

    ii n YX n

    i

    i

    X XY t S n

    X X

    =

    +

    t value from table

    with df=n-2

    Standard error

    of the estimate

    Size of interval vary according todistance away from mean, X

    | iY X X

    =

  • 8/3/2019 Linear Regression 12

    55/65

    2003 Prentice-Hall, Inc.Chap 10-55

    Prediction of Individual Values

    Prediction Interval for IndividualResponse Yi at a Particular Xi

    Addition of 1 increases width of intervalfrom that for the mean of Y

    2

    2

    2

    1

    ( )1 1

    ( )

    ii n YX n

    i

    i

    X XY t Sn

    X X

    =

    + +

    Inte l E tim te fo Diffe ent

  • 8/3/2019 Linear Regression 12

    56/65

    2003 Prentice-Hall, Inc.Chap 10-56

    Interval Estimates for DifferentValues ofX

    Y

    X

    Prediction Interval

    for a individual Yi

    A givenX

    Confidence

    Interval for the

    mean ofY

    Yi=b0+

    b1Xi

    X

  • 8/3/2019 Linear Regression 12

    57/65

    2003 Prentice-Hall, Inc.Chap 10-57

    Example: Produce Stores

    Data for 7 Stores:

    Regression Equation

    Obtained:

    Consider a storewith 2000 squarefeet.

    AnnualStore Square Sales

    Feet ($000)

    1 1,726 3,681

    2 1,542 3,395

    3 2,816 6,653

    4 5,555 9,543

    5 1,292 3,318

    6 2,208 5,563

    7 1,313 3,760

    1636.415 1.487i

    Y X= +

    Estimation of Mean Values:

  • 8/3/2019 Linear Regression 12

    58/65

    2003 Prentice-Hall, Inc.Chap 10-58

    Estimation of Mean Values:Example

    Find the 95% confidence interval for the average annualsales for stores of 2,000 square feet

    2

    22

    1

    ( )1 4610.45 612.66

    ( )

    ii n YX n

    i

    i

    X XY t S

    nX X

    =

    + =

    Predicted Sales

    Confidence Interval Estimate for | iY X X =

    ( ) 1636.415 1.487 4610.45 $000iY X= + =

    2 52350.29 611.75 2.5706YX nX S t t= = = =

    Prediction Interval for Y :

  • 8/3/2019 Linear Regression 12

    59/65

    2003 Prentice-Hall, Inc.Chap 10-59

    Prediction Interval for Y:Example

    Find the 95% prediction interval for annual sales of oneparticular store of 2,000 square feet

    Predicted Sales)

    2

    2

    2

    1

    ( )1 1 4610.45 1687.68

    ( )

    ii n YX n

    i

    i

    X XY t S

    nX X

    =

    + + =

    Prediction Interval for IndividualiX X

    Y =

    ( ) 1636.415 1.487 4610.45 $000iY X= + =

    2 52350.29 611.75 2.5706YX nX S t t= = = =

    Estimation of Mean Values and Prediction

  • 8/3/2019 Linear Regression 12

    60/65

    2003 Prentice-Hall, Inc.Chap 10-60

    of Individual Values in PHStat

    In Excel, use PHStat | Regression | Simple

    Linear Regression

    Check the Confidence and Prediction Interval forX= box

    EXCEL Spreadsheet of Regression Sales on

    Footage

    Microsoft Excel

    Worksheet

  • 8/3/2019 Linear Regression 12

    61/65

    2003 Prentice-Hall, Inc.Chap 10-61

    Pitfalls of Regression Analysis

    Lacking an Awareness of the AssumptionsUnderlining Least-squares Regression

    Not Knowing How to Evaluate the Assumptions Not Knowing What the Alternatives to Least-

    squares Regression are if a ParticularAssumption is Violated

    Using a Regression Model Without Knowledgeof the Subject Matter

    Strategy for Avoiding the Pitfalls

  • 8/3/2019 Linear Regression 12

    62/65

    2003 Prentice-Hall, Inc.Chap 10-62

    Strategy for Avoiding the Pitfallsof Regression

    Start with a scatter plot of X on Y to observepossible relationship

    Perform residual analysis to check theassumptions

    Use a histogram, stem-and-leaf display, box-

    and-whisker plot, or normal probability plot ofthe residuals to uncover possible non-normality

    Strategy for Avoiding the Pitfalls

  • 8/3/2019 Linear Regression 12

    63/65

    2003 Prentice-Hall, Inc.Chap 10-63

    Strategy for Avoiding the Pitfallsof Regression

    If there is violation of any assumption, usealternative methods (e.g., least absolutedeviation regression or least median of squares

    regression) to least-squares regression oralternative least-squares models (e.g.,curvilinear or multiple regression)

    If there is no evidence of assumption violation,then test for the significance of the regressioncoefficients and construct confidence intervalsand prediction intervals

    (continued)

  • 8/3/2019 Linear Regression 12

    64/65

    2003 Prentice-Hall, Inc.Chap 10-64

    Chapter Summary

    Introduced Types of Regression Models

    Discussed Determining the Simple LinearRegression Equation

    Described Measures of Variation

    Addressed Assumptions of Regression andCorrelation

    Discussed Residual Analysis

    Addressed Measuring Autocorrelation

  • 8/3/2019 Linear Regression 12

    65/65

    Chapter Summary

    Described Inference about the Slope

    Discussed Correlation - Measuring theStrength of the Association

    Addressed Estimation of Mean Values andPrediction of Individual Values

    Discussed Pitfalls in Regression and Ethical

    Issues

    (continued)