Post on 19-Dec-2015
Correlation and Simple Regression
Introduction to Business Statistics, 5e
Kvanli/Guynes/Pavur
(c)2000 South-Western College Publishing
Bivariate Data
X = family income
Y = square footage of their home
Figure 14.1
Coefficient of Correlation
The sample coefficient of correlation, r, measures the strength of the linear relationship that exists within a sample of n bivariate observations.
Coefficient of Correlation Value
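$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \frac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{\sum x^2 - (\sum x)^2/n}\,\sqrt{\sum y^2 - (\sum y)^2/n}}$$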
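As a rough check that the two forms of r agree, the sketch below evaluates both on a small made-up data set. The data values and all variable names are assumptions for illustration and do not come from the slides.

```python
import math

# Made-up illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Definitional form: built from deviations about the means.
scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)
r_def = scp_xy / (math.sqrt(ss_x) * math.sqrt(ss_y))

# Computational form: built from raw sums only.
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)
r_comp = (sum_xy - sum_x * sum_y / n) / (
    math.sqrt(sum_x2 - sum_x ** 2 / n) * math.sqrt(sum_y2 - sum_y ** 2 / n)
)

print(round(r_def, 4), round(r_comp, 4))  # 0.7746 0.7746
```

The computational form avoids computing the means first, which is convenient for hand calculation; both forms give the same r up to floating-point rounding.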
Coefficient of Correlation Properties
• r ranges from -1.0 to 1.0.
• The larger |r| is, the stronger the linear relationship.
• r near zero indicates little or no linear relationship; when r = 0, X and Y are uncorrelated.
• r = 1 or -1 implies that a perfect linear pattern exists between the two variables.
• The sign of r tells you whether the relationship between X and Y is a positive (direct) or a negative (inverse) relationship.
• The value of r tells you very little about the slope of the line, except for its sign: if r is positive the slope of the line is positive, and if r is negative the slope is negative.
Various Values of r
Figure 14.2
Covariance
• The sample covariance between two variables, cov(X, Y), is a measure of the joint variation of the two variables X and Y and is defined to be:
$$\operatorname{cov}(X, Y) = \frac{1}{n - 1} \sum (x - \bar{x})(y - \bar{y}) = \frac{1}{n - 1}\,\mathrm{SCP}_{XY}$$
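The definition can be sketched in a few lines of Python. The data are made up for illustration, and the names `scp_xy` and `cov_xy` are assumptions chosen to mirror the symbols in the formula.

```python
# Made-up illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# SCP_XY: sum of cross products of the deviations from the means.
scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# The sample covariance divides by n - 1, not n.
cov_xy = scp_xy / (n - 1)

print(scp_xy, cov_xy)  # 6.0 1.5
```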
Least Squares Line
The least squares line is the line through the data that minimizes the sum of the squared differences between the observations and the line:
$$\sum d^2 = d_1^2 + d_2^2 + d_3^2 + \dots + d_n^2$$
Least Squares Line
Figure 14.6
$$\hat{Y} = b_0 + b_1 X$$
$$b_1 = \frac{\mathrm{SCP}_{XY}}{\mathrm{SS}_X} \quad \text{and} \quad b_0 = \bar{y} - b_1 \bar{x}$$
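A minimal sketch of these two formulas, assuming SS_X denotes the sum of squared deviations of x about its mean. The data are made up, and the helper `sum_sq_dev` is a hypothetical name introduced only to check that nudging the coefficients makes the fit worse.

```python
# Made-up illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)

b1 = scp_xy / ss_x        # slope
b0 = y_bar - b1 * x_bar   # intercept


def sum_sq_dev(intercept, slope):
    """Sum of squared vertical deviations from the line intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


best = sum_sq_dev(b0, b1)

# Moving either coefficient away from the least squares values
# should only increase the sum of squared deviations.
nudged = [
    sum_sq_dev(b0 + db0, b1 + db1)
    for db0 in (-0.1, 0.0, 0.1)
    for db1 in (-0.1, 0.0, 0.1)
    if (db0, db1) != (0.0, 0.0)
]

print(round(b0, 4), round(b1, 4))  # 2.2 0.6
print(best < min(nudged))          # True
```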
Sum of Squares of Error
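$$\sum d^2 = \mathrm{SSE} = \sum (y - \hat{y})^2$$
$$\mathrm{SSE} = \sum (y - \hat{y})^2 = \mathrm{SS}_Y - \frac{(\mathrm{SCP}_{XY})^2}{\mathrm{SS}_X}$$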
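The sketch below computes SSE both ways, once from the residuals and once from the shortcut formula, on made-up data. It assumes SS_Y is the sum of squared deviations of y about its mean; the variable names are illustrative.

```python
# Made-up illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)

b1 = scp_xy / ss_x
b0 = y_bar - b1 * x_bar

# Direct form: sum of squared residuals about the fitted line.
y_hat = [b0 + b1 * xi for xi in x]
sse_direct = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

# Shortcut form: no fitted values needed.
sse_shortcut = ss_y - scp_xy ** 2 / ss_x

print(round(sse_direct, 6), round(sse_shortcut, 6))  # 2.4 2.4
```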
Assumptions for the Simple Regression Model
• The mean of each error component is zero.
• Each error component (random variable) approximately follows a normal distribution.
• The variance of the error component is the same for each value of X.
• The errors are independent of each other.
Simple Linear Regression Model Assumptions
Figure 14.10
Hypothesis Test on the Slope of the Regression Line
$H_0: \beta_1 = 0$ (X provides no information)
$H_a: \beta_1 \neq 0$ (X does provide information)
Figure 14.11
Hypothesis Test on the Slope of the Regression Line
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
$$t = \frac{b_1 - \beta_1}{s / \sqrt{\mathrm{SS}_X}} = \frac{b_1 - \beta_1}{s_{b_1}}$$
Reject $H_0$ if $|t| > t_{\alpha/2,\, n-2}$
$(1 - \alpha) \cdot 100\%$ Confidence Interval for $\beta_1$
$$b_1 - t_{\alpha/2,\, n-2}\, s_{b_1} \quad \text{to} \quad b_1 + t_{\alpha/2,\, n-2}\, s_{b_1}$$
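The slope test and the interval for the slope can be sketched together. The slides use s without defining it on these pages; this sketch assumes the usual standard error of the estimate, s = sqrt(SSE / (n - 2)). The data are made up, and the critical value 3.182 is the standard t-table entry for a two-tailed 5% test with 3 degrees of freedom.

```python
import math

# Made-up illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)

b1 = scp_xy / ss_x
sse = ss_y - scp_xy ** 2 / ss_x

# Assumption: s is the standard error of the estimate, sqrt(SSE / (n - 2)).
s = math.sqrt(sse / (n - 2))
s_b1 = s / math.sqrt(ss_x)

# Test statistic under H0: beta_1 = 0.
t_stat = (b1 - 0) / s_b1

# t_{alpha/2, n-2} for alpha = 0.05 and n - 2 = 3 degrees of freedom,
# read from a standard t table.
t_crit = 3.182

reject_h0 = abs(t_stat) > t_crit
ci_low = b1 - t_crit * s_b1
ci_high = b1 + t_crit * s_b1

print(round(t_stat, 4), reject_h0)
print(round(ci_low, 3), round(ci_high, 3))
```

With these five points the test statistic is about 2.12, which is below 3.182, so H0 is not rejected. The 95% interval runs from roughly -0.30 to 1.50 and contains 0, which agrees with the test.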
Measuring the Strength of the Model
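$$r = b_1 \sqrt{\frac{\mathrm{SS}_X}{\mathrm{SS}_Y}}$$
$$b_1 = \frac{\mathrm{SCP}_{XY}}{\mathrm{SS}_X}$$
$$r = \frac{\mathrm{SCP}_{XY}}{\sqrt{\mathrm{SS}_X}\,\sqrt{\mathrm{SS}_Y}}$$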
Hypothesis Test to Determine the Significance of the Model
$H_0: \beta_1 = 0$ (no linear relationship exists)
$H_a: \beta_1 \neq 0$ (a linear relationship exists)
$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
Reject $H_0$ if $|t| > t_{\alpha/2,\, n-2}$
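The t statistic written in terms of r is algebraically the same as the slope-based statistic b1 / s_b1, and r itself can be recovered from the slope. The sketch below checks both on made-up data. It assumes s = sqrt(SSE / (n - 2)), which these slides use without stating, and the variable names are illustrative.

```python
import math

# Made-up illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)

# r computed directly and recovered from the slope.
r = scp_xy / (math.sqrt(ss_x) * math.sqrt(ss_y))
b1 = scp_xy / ss_x
r_from_slope = b1 * math.sqrt(ss_x / ss_y)

# t statistic written in terms of r.
t_from_r = r / math.sqrt((1 - r ** 2) / (n - 2))

# t statistic written in terms of the slope (assumed definition of s).
sse = ss_y - scp_xy ** 2 / ss_x
s = math.sqrt(sse / (n - 2))
t_from_slope = b1 / (s / math.sqrt(ss_x))

print(round(r, 4), round(r_from_slope, 4))
print(round(t_from_r, 4), round(t_from_slope, 4))
```

Because the two statistics coincide, testing the significance of the model in simple regression leads to the same decision as testing the slope.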
Coefficient of Determination
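$$\mathrm{SSE} = \mathrm{SS}_Y - \frac{(\mathrm{SCP}_{XY})^2}{\mathrm{SS}_X} \quad \text{and} \quad r^2 = \frac{(\mathrm{SCP}_{XY})^2}{\mathrm{SS}_X\,\mathrm{SS}_Y}$$
$$r^2 = \text{coefficient of determination} = 1 - \frac{\mathrm{SSE}}{\mathrm{SS}_Y}$$
= percentage of explained variation in the dependent variable using the simple linear regression model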
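A short sketch showing that the two expressions for r² agree, using made-up data and illustrative variable names.

```python
# Made-up illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)

sse = ss_y - scp_xy ** 2 / ss_x

# Two equivalent expressions for the coefficient of determination.
r2_from_scp = scp_xy ** 2 / (ss_x * ss_y)
r2_from_sse = 1 - sse / ss_y

print(round(r2_from_scp, 4), round(r2_from_sse, 4))  # 0.6 0.6
print(f"{100 * r2_from_scp:.0f}% of the variation in y is explained")
```

For this toy data set the line explains 60% of the variation in y, and the remaining 40% is SSE as a share of SS_Y.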
Total Variation
Figure 14.18
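$(1 - \alpha) \cdot 100\%$ Confidence Interval for $\mu_{Y|x_0}$
$$\hat{Y} - t_{\alpha/2,\, n-2}\, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\mathrm{SS}_X}} \quad \text{to} \quad \hat{Y} + t_{\alpha/2,\, n-2}\, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\mathrm{SS}_X}}$$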
Prediction Interval for $Y_{x_0}$
$$\hat{Y} \pm t_{\alpha/2,\, n-2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\mathrm{SS}_X}}$$
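The confidence interval for the mean of Y at x0 and the prediction interval for an individual Y at x0 differ only by the extra 1 under the square root. The sketch below computes both half-widths on made-up data. The value x0 = 4 is hypothetical, s is assumed to be sqrt(SSE / (n - 2)), and 3.182 is the standard t-table value for a 95% interval with 3 degrees of freedom.

```python
import math

# Made-up illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)

b1 = scp_xy / ss_x
b0 = y_bar - b1 * x_bar
sse = ss_y - scp_xy ** 2 / ss_x
s = math.sqrt(sse / (n - 2))  # assumed definition of s

x0 = 4                 # hypothetical value of X at which to estimate
y_hat0 = b0 + b1 * x0  # point estimate at x0
t_crit = 3.182         # t_{0.025, 3} from a standard t table

# Half-width of the confidence interval for the mean of Y at x0.
half_mean = t_crit * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_x)

# Half-width of the prediction interval for a single Y at x0.
half_pred = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_x)

print(round(y_hat0 - half_mean, 3), round(y_hat0 + half_mean, 3))
print(round(y_hat0 - half_pred, 3), round(y_hat0 + half_pred, 3))
```

The prediction interval is always the wider of the two, since it accounts for the scatter of an individual observation around the line as well as the uncertainty in the line itself. Both intervals are narrowest when x0 equals the sample mean of x.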
Checking Model Assumptions
Figure 14.22