math30-6 lecture 4.pptx
TRANSCRIPT
-
8/10/2019 MATH30-6 Lecture 4.pptx
MATH30-6
Probability and Statistics
Multivariate Analysis
-
Objectives
At the end of the lesson, the students are expected to:
Construct a scatter diagram;
Use simple linear regression to build empirical models of engineering and scientific data;
Understand how the method of least squares is used to estimate the parameters in a linear regression model; and
Interpret the different values obtained.
-
Deterministic Relationship
A model that predicts a variable perfectly
Example:
The displacement ($d_t$) of a particle at a certain time is related to its velocity:
$d_t = d_0 + vt$
where $d_0$ = displacement of the particle from the origin at time $t = 0$; and $v$ = velocity.
-
Regression Analysis
The collection of statistical tools that are used to model and explore relationships between variables that are related in a nondeterministic manner
Used because there are many situations where the relationship between variables is not deterministic
Examples:
- The electrical energy consumption of a house ($y$) is related to the size of the house ($x$, in ft$^2$).
- The fuel usage of an automobile ($y$) is related to the vehicle weight ($x$).
-
Simple Linear Regression
Single regressor variable or predictor variable $x$ and a dependent or response variable $Y$
The expected value of $Y$ for each value of $x$ is
$E(Y \mid x) = \beta_0 + \beta_1 x$,
where the intercept $\beta_0$ and slope $\beta_1$ are unknown regression coefficients.
We assume $Y$ can be described by the model
$Y = \beta_0 + \beta_1 x + \epsilon$ (Equation 11-2),
where $\epsilon$ is a random error with mean zero and (unknown) variance $\sigma^2$.
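To see what the model in Equation 11-2 says about data, the following minimal Python sketch simulates responses scattered around a straight line. The coefficient values, the choice of a normal error distribution, and the function name `simulate_responses` are illustrative assumptions; the slide itself only requires errors with mean zero and constant variance.

```python
import random

def simulate_responses(xs, beta0, beta1, sigma, seed=0):
    """Draw one Y for each x from Y = beta0 + beta1*x + eps.

    eps is drawn from a normal distribution with mean 0 and standard
    deviation sigma (normality is an assumption made for this sketch).
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

xs = [1.0, 2.0, 3.0, 4.0, 5.0]

# Noisy responses: each y deviates from the line by a random error.
ys = simulate_responses(xs, beta0=2.0, beta1=0.5, sigma=1.0)

# With sigma = 0 the error term vanishes and the relationship is deterministic.
exact = simulate_responses(xs, beta0=2.0, beta1=0.5, sigma=0.0)
```

Setting `sigma` to zero recovers the deterministic case from the earlier slide, which is one way to see the error term as the only difference between the two kinds of model.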
-
Simple Linear Regression
The random errors corresponding to different observations are also assumed to be uncorrelated random variables.
A regression model may be thought of as an empirical model.
-
Method of Least Squares
Suppose that we have $n$ pairs of observations, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. See Fig. 11-3.
The estimates of $\beta_0$ and $\beta_1$ should result in a line that is (in some sense) a "best fit" to the data.
German scientist Karl Gauss (1777-1855) proposed estimating the parameters $\beta_0$ and $\beta_1$ in Equation 11-2 to minimize the sum of the squares of the vertical deviations in Fig. 11-3.
This criterion for estimating the regression coefficients
is called the method of least squares.
-
Method of Least Squares
-
Method of Least Squares
Using Equation 11-2 ($Y = \beta_0 + \beta_1 x + \epsilon$), we may express the $n$ observations in the sample as
$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \ldots, n$
Equation (11-3)
and the sum of the squares of the deviations of the observations from the true regression line is
$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$
Equation (11-4)
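Equation 11-4 can be evaluated directly for any candidate line. The sketch below is illustrative only: the function name `sum_squared_deviations` and the three data points are made up for the example.

```python
def sum_squared_deviations(xs, ys, beta0, beta1):
    """Return L = sum over i of (y_i - beta0 - beta1*x_i)**2 (Equation 11-4)."""
    return sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical data lying close to the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.5, 5.0]

# A line near the data gives a small L; a line far from the data gives a large L.
loss_near = sum_squared_deviations(xs, ys, beta0=1.0, beta1=2.0)
loss_far = sum_squared_deviations(xs, ys, beta0=0.0, beta1=0.0)
```

Least squares picks the pair of coefficients for which this quantity is as small as possible.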
-
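Method of Least Squares
The least squares estimators of $\beta_0$ and $\beta_1$, say $\hat\beta_0$ and $\hat\beta_1$, must satisfy
$\left.\dfrac{\partial L}{\partial \beta_0}\right|_{\hat\beta_0, \hat\beta_1} = -2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$
$\left.\dfrac{\partial L}{\partial \beta_1}\right|_{\hat\beta_0, \hat\beta_1} = -2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)\, x_i = 0$
Equations (11-5)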
-
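Method of Least Squares
Simplifying Equations (11-5)
$n \hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$
$\hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i$
Equations 11-6 (least squares normal equations)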
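The normal equations are two linear equations in two unknowns, so they can be solved in closed form. The following sketch uses Cramer's rule; the function name `solve_normal_equations` and the sample points are assumptions for illustration, not part of the lecture.

```python
def solve_normal_equations(xs, ys):
    """Solve the least squares normal equations (11-6) for (b0, b1).

    n*b0      + b1*sum(x)    = sum(y)
    b0*sum(x) + b1*sum(x**2) = sum(x*y)
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    det = n * sum_x2 - sum_x * sum_x  # zero when all x values are identical
    if det == 0:
        raise ValueError("x values must not all be equal")
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    return b0, b1

# Hypothetical points that lie exactly on y = 1 + 2x.
b0, b1 = solve_normal_equations([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

Because the sample points lie exactly on a line, the solution recovers that line's intercept and slope.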
-
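Least Squares Estimates
$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$
Equation 11-7
$\hat\beta_1 = \dfrac{\sum_{i=1}^{n} y_i x_i - \dfrac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$
Equation 11-8
where $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ and $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.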
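As a sketch of the algebra behind Equations 11-7 and 11-8 (the intermediate steps are reconstructed here rather than quoted from the slides): dividing the first normal equation in 11-6 by $n$ gives the intercept, and substituting that intercept into the second normal equation isolates the slope.

```latex
\begin{align*}
n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i
  &\;\Longrightarrow\; \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} \\
(\bar{y} - \hat\beta_1 \bar{x}) \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i
  &\;\Longrightarrow\; \hat\beta_1
  = \frac{\sum_{i=1}^{n} y_i x_i - \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}
         {\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}
\end{align*}
```

The last step uses $\bar{x} \sum x_i = (\sum x_i)^2 / n$ and $\bar{y} \sum x_i = (\sum y_i)(\sum x_i) / n$.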
-
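Least Squares Estimates
Notationally, it is occasionally convenient to give special symbols to the numerator and denominator of Equation 11-8. Given data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, let
$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$
Equation 11-10 (denominator) and
$S_{xy} = \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) = \sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$
Equation 11-11 (numerator)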
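Equations 11-10 and 11-11 each give two algebraically equivalent forms: a centered sum and a computational shortcut. The sketch below checks that the two forms agree on a small made-up data set and forms the slope as their ratio; all function names and data values are assumptions for illustration.

```python
def s_xx_centered(xs):
    """Equation 11-10, centered form: sum of (x_i - x_bar)**2."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)

def s_xx_shortcut(xs):
    """Equation 11-10, shortcut form: sum(x**2) - (sum x)**2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)

def s_xy_centered(xs, ys):
    """Equation 11-11, centered form: sum of (y_i - y_bar)*(x_i - x_bar)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))

def s_xy_shortcut(xs, ys):
    """Equation 11-11, shortcut form: sum(x*y) - (sum x)*(sum y) / n."""
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / len(xs)

# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

slope = s_xy_centered(xs, ys) / s_xx_centered(xs)  # numerator over denominator of 11-8
```

The shortcut forms need only running totals, which suits hand calculation; the centered forms tend to be less sensitive to rounding when the values are large relative to their spread.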
-
Fitted or Estimated Regression Line
11.2/398 The grades of a class of 9 students on a midterm report ($x$) and on the final examination ($y$) are as follows:
x   77  50  71  72  81  94  96  99  67
y   82  66  78  34  47  85  99  99  68
(a) Estimate the linear regression line.
(b) Estimate the final examination grade of a student who received a grade of 85 on the midterm report.
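One possible way to check a hand solution of this exercise is the short Python sketch below. It applies Equations 11-7 and 11-8 to the grades in the table; the variable names are assumptions chosen for readability.

```python
midterm = [77, 50, 71, 72, 81, 94, 96, 99, 67]  # x, midterm report grades
final = [82, 66, 78, 34, 47, 85, 99, 99, 68]    # y, final examination grades

n = len(midterm)
x_bar = sum(midterm) / n
y_bar = sum(final) / n

# Centered sums of squares and cross-products (Equations 11-10 and 11-11).
s_xx = sum((x - x_bar) ** 2 for x in midterm)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(midterm, final))

b1 = s_xy / s_xx                # slope, Equation 11-8
b0 = y_bar - b1 * x_bar         # intercept, Equation 11-7
predicted_final = b0 + b1 * 85  # part (b): midterm grade of 85

print(f"(a) y_hat = {b0:.4f} + {b1:.4f} x")
print(f"(b) predicted final grade: {predicted_final:.1f}")
```

The fitted line comes out close to $\hat{y} = 12.06 + 0.777x$, which gives a predicted final grade of about 78 for part (b).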
-
Fitted or Estimated Regression Line
10-11/424 An article in the Journal of Monetary Economics assesses the relationship between percentage growth in wealth over a decade and a half of savings for baby boomers of age 40 to 55 with these people's income quartiles. The article presents a table showing five income quartiles, and for each quartile there is a reported percentage growth in wealth. The data are as follows.
Income quartile      1     2     3     4     5
Wealth growth (%)    17.3  23.6  40.2  45.8  56.8
Run a simple linear regression of these five pairs of numbers and estimate a linear relationship between income and percentage growth in wealth.
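A worked sketch of this exercise, treating the income quartile number as $x$ and the wealth growth as $y$ (that assignment follows the wording of the problem; the arithmetic below is not taken from the slides). Because $\sum (x_i - \bar{x}) = 0$, the cross-product sum can be computed as $\sum (x_i - \bar{x})\, y_i$.

```latex
\begin{align*}
\bar{x} &= 3, \qquad \bar{y} = \frac{183.7}{5} = 36.74 \\
S_{xx} &= (-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2 = 10 \\
S_{xy} &= (-2)(17.3) + (-1)(23.6) + (0)(40.2) + (1)(45.8) + (2)(56.8) = 101.2 \\
\hat\beta_1 &= \frac{S_{xy}}{S_{xx}} = 10.12, \qquad
\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = 36.74 - 30.36 = 6.38
\end{align*}
```

Under these assumptions the estimated relationship is $\hat{y} = 6.38 + 10.12x$: each step up in income quartile is associated with roughly 10 additional percentage points of wealth growth.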
-
Fitted or Estimated Regression Line
10-12/424 A financial analyst at Goldman Sachs ran a regression analysis of monthly returns on a certain investment ($Y$) versus returns for the same month on the Standard & Poor's index ($X$). The regression results included $S_{xx} = 765.98$ and $S_{xy} = 934.49$. Give the least-squares estimate of the regression slope parameter.
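A one-line sketch of the calculation, assuming the two reported values are the sum of squares for $X$ (765.98) and the sum of cross-products of $X$ and $Y$ (934.49), in that order, written here in the $S_{xx}$ and $S_{xy}$ notation of Equations 11-10 and 11-11:

```latex
\[
\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{934.49}{765.98} \approx 1.22
\]
```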
-
Correlation
The degree of linear association between the two random variables $X$ and $Y$
Indicated by the correlation coefficient
$\rho$ is the population (true) correlation coefficient, estimated by $r$, the sample correlation coefficient or Pearson product-moment correlation coefficient
$\rho$ can take on any value from $-1$, through 0, to 1.
-
Possible Interpretations of $\rho$
1. When $\rho$ is equal to zero, there is no correlation. That is, there is no linear relationship between the two random variables.
2. When $\rho = 1$, there is a perfect, positive, linear relationship between the two variables. That is, whenever one of the variables, $X$ or $Y$, increases, the other variable also increases; and whenever one of the variables decreases, the other one must also decrease.
3. When $\rho = -1$, there is a perfect negative linear relationship between $X$ and $Y$. When $X$ or $Y$ increases, the other variable decreases; and when one decreases, the other one must increase.
-
Possible Interpretations of $\rho$
4. When the value of $\rho$ is between 0 and 1 in absolute value, it reflects the relative strength of the linear relationship between the two variables. For example, a correlation of 0.90 implies a relatively strong positive relationship between the two variables. A correlation of $-0.70$ implies a weaker, negative (as indicated by the minus sign), linear relationship. A correlation $\rho = 0.30$ implies a relatively weak (positive) linear relationship between $X$ and $Y$.
-
Correlation
-
Correlation
-
Sample Correlation Coefficient
The estimate of $\rho$
Also referred to as the Pearson product-moment
correlation coefficient
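The standard formula for the Pearson product-moment coefficient is $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$, where $S_{yy}$ is defined for $y$ in the same way as $S_{xx}$ is for $x$. The sketch below implements it; the function name `sample_correlation` and the test data are assumptions for illustration.

```python
import math

def sample_correlation(xs, ys):
    """Pearson product-moment correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data on an exactly increasing and an exactly decreasing line.
r_up = sample_correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_down = sample_correlation([1, 2, 3, 4], [8, 6, 4, 2])
```

The two calls illustrate the extreme cases: points on an increasing line give $r = 1$ and points on a decreasing line give $r = -1$.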
-
Sample Correlation Coefficient
Interpretations of $r$
±1.00            perfect positive (negative) correlation
±0.91 to ±0.99   very high positive (negative) correlation
±0.71 to ±0.90   high positive (negative) correlation
±0.51 to ±0.70   moderate positive (negative) correlation
±0.31 to ±0.50   low positive (negative) correlation
±0.01 to ±0.30   negligible positive (negative) correlation
0.00             no correlation
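The table above can be turned into a small lookup. This is a sketch under two assumptions not stated on the slide: the bands apply to the absolute value of $r$, with the sign giving the direction, and values that fall between two listed bands (for example 0.905) are assigned to the band above. The function name `describe_r` is hypothetical.

```python
def describe_r(r):
    """Map a sample correlation r to the verbal labels in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size == 0.0:
        return "no correlation"
    sign = "positive" if r > 0 else "negative"
    # Band edges follow the table; gaps between bands are an assumption.
    if size == 1.0:
        strength = "perfect"
    elif size > 0.90:
        strength = "very high"
    elif size > 0.70:
        strength = "high"
    elif size > 0.50:
        strength = "moderate"
    elif size > 0.30:
        strength = "low"
    else:
        strength = "negligible"
    return f"{strength} {sign} correlation"

label = describe_r(-0.75)
```

These verbal bands are a rule of thumb; the same value of $r$ may be judged differently depending on the field and the sample size.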
-
Coefficient of Determination
Denoted by $r^2$
A descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data
Ordinarily, we do not use $r^2$ for inference about $\rho^2$.
-
Coefficient of Determination
11-13/400 A study of the amount of rainfall and the quantity of air pollution removed produced the following data:
Daily Rainfall, $x$ (0.01 cm)    Particulate Removed, $y$ ($\mu$g/m$^3$)
4.3                              126
4.5                              121
5.9                              116
5.6                              118
6.1                              114
5.2                              118
3.8                              132
2.1                              141
7.5                              108
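As a hedged illustration of how $r$ and $r^2$ could be obtained for these nine observations, the sketch below applies the standard formula $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$. The variable names are assumptions, and the slide itself does not state which quantities it asks for.

```python
import math

rainfall = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1, 7.5]  # x, units of 0.01 cm
removed = [126, 121, 116, 118, 114, 118, 132, 141, 108]   # y, particulate removed

n = len(rainfall)
x_bar = sum(rainfall) / n
y_bar = sum(removed) / n

s_xx = sum((x - x_bar) ** 2 for x in rainfall)
s_yy = sum((y - y_bar) ** 2 for y in removed)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(rainfall, removed))

r = s_xy / math.sqrt(s_xx * s_yy)  # sample correlation coefficient
r_squared = r ** 2                 # coefficient of determination

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

The correlation is strongly negative (about $-0.98$), so $r^2$ is about 0.96: roughly 96% of the variability in particulate removed is accounted for by the linear relationship with daily rainfall.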
-
Coefficient of Determination
11-43/436 With reference to Exercise 11.13 on page 400, assume a bivariate normal distribution for $x$ and $y$.
(a) Calculate $r$.
(b) Test the null hypothesis that $\rho = -0.5$ against the
alternative that
-
Summary
A scatter diagram displays observations on two variables, $x$ and $y$. Each observation is represented by a point showing its x-y coordinates. The scatter diagram can be very effective in revealing the joint variability of $x$ and $y$ or the nature of the relationship between them.
The method of least squares is used to estimate the parameters of a system by minimizing the sum of the squares of the differences between the observed values and the fitted or predicted values from the system.
-
Summary
Generally, correlation is a measure of the interdependence among data. The concept may include more than two variables. The term is most commonly used in a narrow sense to express the relationship between quantitative variables or ranks.
The correlation coefficient ($r$) is a dimensionless measure of the linear association between two variables, lying in the interval from $-1$ to $+1$, with zero indicating the absence of correlation (but not necessarily the independence of the two variables).
-
Summary
The coefficient of determination ($r^2$) is often used to judge the adequacy of a regression model. Its value indicates that the model accounts for $100r^2$% of the variability in the data.
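One way to make the "share of variability" reading concrete is to compute $r^2$ as one minus the ratio of residual variation to total variation. The sketch below does this for a small made-up data set; the function name `r_squared_from_fit`, the names `ss_total` and `ss_error`, and the data are assumptions for illustration.

```python
def r_squared_from_fit(xs, ys):
    """Return 1 - SS_error / SS_total for the least squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Total variation of y about its mean, and variation left over after the fit.
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_error = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_error / ss_total

# Hypothetical data.
share = r_squared_from_fit([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 6.0])
```

For this data set the fitted line leaves only 2% of the variation in $y$ unexplained, so the model accounts for about 98% of the variability.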
-
References
Aczel and Sounderpandian. Business Statistics, 7th Ed. 2008.
Montgomery and Runger. Applied Statistics and Probability for Engineers, 5th Ed. 2011.
Walpole, et al. Probability and Statistics for Engineers and Scientists, 9th Ed. 2012, 2007, 2002.