Post on 19-Dec-2015

Correlation and Simple Regression

Introduction to Business Statistics, 5e

Kvanli/Guynes/Pavur

(c)2000 South-Western College Publishing

Bivariate Data

X = family income
Y = square footage of their home

Figure 14.1

Coefficient of Correlation

The sample coefficient of correlation, r, measures the strength of the linear relationship that exists within a sample of n bivariate data.

Coefficient of Correlation Value

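$$r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2}\,\sqrt{\sum (y-\bar{y})^2}} = \frac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{\sum x^2 - (\sum x)^2/n}\,\sqrt{\sum y^2 - (\sum y)^2/n}}$$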
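
As a rough illustration, the two forms of the formula above can be checked against each other in a few lines of Python. The function names and the five data points are hypothetical (they do not come from the textbook); this is a sketch, not the authors' own procedure.

```python
import math

def correlation(xs, ys):
    """Sample correlation r from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    scp = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return scp / math.sqrt(ss_x * ss_y)

def correlation_shortcut(xs, ys):
    """Same r from the raw sums (the computational form)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    num = sum_xy - sum_x * sum_y / n
    den = math.sqrt(sum_x2 - sum_x ** 2 / n) * math.sqrt(sum_y2 - sum_y ** 2 / n)
    return num / den

# Hypothetical toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_def = correlation(xs, ys)
r_short = correlation_shortcut(xs, ys)
print(round(r_def, 4), round(r_short, 4))  # 0.7746 0.7746
```

Both forms give the same value; the shortcut form only needs running totals, which is why it is convenient for hand calculation.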

Coefficient of Correlation Properties

• r ranges from -1.0 to 1.0.

• The larger |r| is, the stronger the linear relationship.

• r near zero indicates little or no linear relationship. When r = 0, X and Y are uncorrelated.

• r = 1 or -1 implies that a perfect linear pattern exists between the two variables.

Coefficient of Correlation Properties

• The sign of r tells you whether the relationship between X and Y is a positive (direct) or a negative (inverse) relationship.

• The value of r tells you very little about the slope of the line, except for its sign: if r is positive the slope of the line is positive, and if r is negative the slope is negative.
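
The last property can be illustrated with a small sketch. The three lines below are hypothetical, perfectly linear data sets with very different slopes; the helper name `correlation` is an assumption for this example.

```python
import math

def correlation(xs, ys):
    """Sample correlation r from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    scp = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return scp / math.sqrt(ss_x * ss_y)

xs = [1, 2, 3, 4, 5]
gentle = [x + 3 for x in xs]          # slope 1
steep = [100 * x - 7 for x in xs]     # slope 100
falling = [-2 * x + 10 for x in xs]   # slope -2

r_gentle = correlation(xs, gentle)
r_steep = correlation(xs, steep)
r_falling = correlation(xs, falling)
print(r_gentle, r_steep, r_falling)
```

The gentle and steep lines both give r = 1 even though their slopes differ by a factor of 100, while the falling line gives r = -1: only the sign of the slope is reflected in r.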

Various Values of r

Figure 14.2

Covariance

• The sample covariance between two variables, cov(X,Y), is a measure of the joint variation of the two variables X and Y and is defined as:

$$\operatorname{cov}(X,Y) = \frac{1}{n-1}\sum (x-\bar{x})(y-\bar{y}) = \frac{1}{n-1}\,SCP_{XY}$$
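
A minimal sketch of this definition in Python, assuming hypothetical toy data and an illustrative function name:

```python
def sample_covariance(xs, ys):
    """cov(X, Y) = SCP_XY / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # SCP_XY: sum of cross-products of deviations about the means
    scp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return scp_xy / (n - 1)

# Hypothetical toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
cov_xy = sample_covariance(xs, ys)
print(cov_xy)  # 1.5
```

Here SCP_XY = 6 and n - 1 = 4, so the covariance is 1.5. Unlike r, this value changes if X or Y is rescaled.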

Least Squares Line

The least squares line is the line through the data that minimizes the sum of the squared vertical differences between the observations and the line:

$$\sum d^2 = d_1^2 + d_2^2 + d_3^2 + \dots + d_n^2$$

Least Squares Line

Figure 14.6

$$\hat{Y} = b_0 + b_1 X$$

$$b_1 = \frac{SCP_{XY}}{SS_X} \quad\text{and}\quad b_0 = \bar{y} - b_1\bar{x}$$
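
The slope and intercept formulas can be sketched as follows. The function name `least_squares_line` and the data are assumptions made for illustration.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) with b1 = SCP_XY / SS_X and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    scp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = scp_xy / ss_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)
y_hat_at_4 = b0 + b1 * 4   # predicted Y at X = 4
print(round(b0, 4), round(b1, 4), round(y_hat_at_4, 4))  # 2.2 0.6 4.6
```

With these numbers SCP_XY = 6 and SS_X = 10, so the fitted line is roughly Y-hat = 2.2 + 0.6X.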

Sum of Squares of Error

$$\sum d^2 = SSE = \sum (y-\hat{y})^2$$

$$SSE = \sum (y-\hat{y})^2 = SS_Y - \frac{(SCP_{XY})^2}{SS_X}$$
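
To see that the two expressions for SSE agree, one can compute the residuals directly and compare with the shortcut form. This is only a sketch on hypothetical data; the variable names are illustrative.

```python
# Hypothetical toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_x = sum((x - x_bar) ** 2 for x in xs)
ss_y = sum((y - y_bar) ** 2 for y in ys)
scp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Fit the least squares line, then sum the squared residuals.
b1 = scp_xy / ss_x
b0 = y_bar - b1 * x_bar
sse_direct = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Shortcut form: SSE = SS_Y - SCP_XY^2 / SS_X
sse_shortcut = ss_y - scp_xy ** 2 / ss_x

print(round(sse_direct, 4), round(sse_shortcut, 4))  # 2.4 2.4
```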

Assumptions for the Simple Regression Model

• The mean of each error component is zero.

• Each error component (random variable) is approximately normally distributed.

• The variance of the error component is the same for each value of X.

• The errors are independent of each other.

Simple Linear Regression Model Assumptions

Figure 14.10

Hypothesis Test on the Slope of the Regression Line

$$H_0: \beta_1 = 0 \quad \text{(X provides no information)}$$

$$H_a: \beta_1 \neq 0 \quad \text{(X does provide information)}$$

Figure 14.11

Hypothesis Test on the Slope of the Regression Line

$$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$$

$$t = \frac{b_1 - \beta_1}{s/\sqrt{SS_X}} = \frac{b_1 - \beta_1}{s_{b_1}}$$

Reject $H_0$ if $|t| > t_{\alpha/2,\,n-2}$
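
The test can be sketched in Python under two assumptions that are not spelled out on the slide: s is taken to be the usual residual standard error, sqrt(SSE / (n - 2)), and the critical value is read from a t table rather than computed. The function name and data are hypothetical.

```python
import math

def slope_t_statistic(xs, ys, beta1_null=0.0):
    """t = (b1 - beta1) / (s / sqrt(SS_X)) for the simple regression slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    scp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = scp_xy / ss_x
    sse = ss_y - scp_xy ** 2 / ss_x
    s = math.sqrt(sse / (n - 2))   # assumption: s is the residual standard error
    s_b1 = s / math.sqrt(ss_x)     # standard error of the slope
    return (b1 - beta1_null) / s_b1

# Hypothetical toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
t_stat = slope_t_statistic(xs, ys)
t_crit = 3.182                     # t table value for alpha/2 = 0.025, df = n - 2 = 3
reject_h0 = abs(t_stat) > t_crit
print(round(t_stat, 4), reject_h0)  # 2.1213 False
```

With only five observations the slope of 0.6 is not significantly different from zero at the 5% level, so H0 is not rejected in this toy case.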

$(1-\alpha)\cdot 100\%$ Confidence Interval for $\beta_1$

$$b_1 - t_{\alpha/2,\,n-2}\, s_{b_1} \quad\text{to}\quad b_1 + t_{\alpha/2,\,n-2}\, s_{b_1}$$
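
A short sketch of this interval, again assuming s = sqrt(SSE / (n - 2)) and a critical value supplied from a t table. The function name `slope_confidence_interval` and the data are hypothetical.

```python
import math

def slope_confidence_interval(xs, ys, t_crit):
    """Return (lower, upper) = b1 -/+ t_crit * s_b1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    scp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = scp_xy / ss_x
    sse = ss_y - scp_xy ** 2 / ss_x
    s = math.sqrt(sse / (n - 2))   # assumption: residual standard error
    s_b1 = s / math.sqrt(ss_x)
    return b1 - t_crit * s_b1, b1 + t_crit * s_b1

# Hypothetical toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
lower, upper = slope_confidence_interval(xs, ys, t_crit=3.182)  # 95%, df = 3
print(round(lower, 3), round(upper, 3))  # -0.3 1.5
```

The interval contains zero, which matches the outcome of a two-sided test of the slope at the same significance level.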

Measuring the Strength of the Model

$$r = b_1\sqrt{\frac{SS_X}{SS_Y}}, \qquad b_1 = \frac{SCP_{XY}}{SS_X}$$

$$r = \frac{SCP_{XY}}{\sqrt{SS_X}\,\sqrt{SS_Y}}$$

Hypothesis Test to Determine the Significance of the Model

$$H_0: \beta_1 = 0 \quad \text{(no linear relationship exists)}$$

$$H_a: \beta_1 \neq 0 \quad \text{(a linear relationship exists)}$$

$$t = \frac{r}{\sqrt{\dfrac{1-r^2}{n-2}}}$$

Reject $H_0$ if $|t| > t_{\alpha/2,\,n-2}$
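
As a sketch, the statistic can be computed directly from r and n. The helper name `t_from_r` and the data are hypothetical, and the critical value is assumed to come from a t table.

```python
import math

def t_from_r(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

# Hypothetical toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_x = sum((x - x_bar) ** 2 for x in xs)
ss_y = sum((y - y_bar) ** 2 for y in ys)
scp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = scp_xy / math.sqrt(ss_x * ss_y)
t_model = t_from_r(r, n)
reject_h0 = abs(t_model) > 3.182   # t table value for alpha/2 = 0.025, df = 3
print(round(r, 4), round(t_model, 4), reject_h0)  # 0.7746 2.1213 False
```

On the same data, this t value equals the slope-based statistic b1 / s_b1, so the two tests lead to the same decision.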

Coefficient of Determination

$$SSE = SS_Y - \frac{(SCP_{XY})^2}{SS_X} \quad\text{and}\quad r^2 = \frac{(SCP_{XY})^2}{SS_X\,SS_Y}$$

$$r^2 = \text{coefficient of determination} = 1 - \frac{SSE}{SS_Y}$$

= percentage of explained variation in the dependent variable using the simple linear regression model
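
The two routes to the coefficient of determination can be compared numerically. This is a sketch on hypothetical data with illustrative variable names.

```python
# Hypothetical toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_x = sum((x - x_bar) ** 2 for x in xs)
ss_y = sum((y - y_bar) ** 2 for y in ys)
scp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

sse = ss_y - scp_xy ** 2 / ss_x
r2_from_sums = scp_xy ** 2 / (ss_x * ss_y)   # r^2 = SCP_XY^2 / (SS_X * SS_Y)
r2_from_sse = 1 - sse / ss_y                 # r^2 = 1 - SSE / SS_Y

print(round(r2_from_sums, 4), round(r2_from_sse, 4))  # 0.6 0.6
```

In this toy case about 60% of the variation in Y is explained by the fitted line.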

Total Variation

Figure 14.18

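$(1-\alpha)\cdot 100\%$ Confidence Interval for $\mu_{Y|x_0}$

$$\hat{Y} - t_{\alpha/2,\,n-2}\; s\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{SS_X}} \quad\text{to}\quad \hat{Y} + t_{\alpha/2,\,n-2}\; s\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{SS_X}}$$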
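
A minimal sketch of this interval for the mean of Y at a chosen x0, assuming s = sqrt(SSE / (n - 2)) and a critical value taken from a t table. The function name `mean_response_interval` and the data are hypothetical.

```python
import math

def mean_response_interval(xs, ys, x0, t_crit):
    """Confidence interval for the mean of Y at X = x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    scp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = scp_xy / ss_x
    b0 = y_bar - b1 * x_bar
    s = math.sqrt((ss_y - scp_xy ** 2 / ss_x) / (n - 2))  # assumption: residual std. error
    y_hat = b0 + b1 * x0
    half_width = t_crit * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_x)
    return y_hat - half_width, y_hat + half_width

# Hypothetical toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
ci_low, ci_high = mean_response_interval(xs, ys, x0=4, t_crit=3.182)  # 95%, df = 3
print(round(ci_low, 3), round(ci_high, 3))
```

The interval is narrowest at x0 equal to the sample mean of X and widens as x0 moves away from it, because of the (x0 - x-bar)^2 term.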

Prediction Interval for $Y_{x_0}$

$$\hat{Y} \pm t_{\alpha/2,\,n-2}\; s\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{SS_X}}$$
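
The prediction interval differs from the confidence interval for the mean only by the extra 1 under the square root. The sketch below makes that visible on hypothetical data; the function name, the `individual` flag, and the table-based critical value are assumptions, as is s = sqrt(SSE / (n - 2)).

```python
import math

def interval_half_width(xs, ys, x0, t_crit, individual):
    """Half-width at x0; individual=True adds the extra 1 for a single new Y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    scp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s = math.sqrt((ss_y - scp_xy ** 2 / ss_x) / (n - 2))  # assumption: residual std. error
    extra = 1.0 if individual else 0.0
    return t_crit * s * math.sqrt(extra + 1 / n + (x0 - x_bar) ** 2 / ss_x)

# Hypothetical toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
pi_half = interval_half_width(xs, ys, x0=4, t_crit=3.182, individual=True)
ci_half = interval_half_width(xs, ys, x0=4, t_crit=3.182, individual=False)
print(round(pi_half, 3), round(ci_half, 3))
```

The prediction interval is always wider, since it accounts for the variability of an individual observation in addition to the uncertainty in the estimated line.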

Checking Model Assumptions

Figure 14.22

Checking Model Assumptions

Figure 14.23