math30-6 lecture 4.pptx

8/10/2019 MATH30-6 Lecture 4.pptx

1/32

MATH30-6

Probability and Statistics

Multivariate Analysis


2/32

Objectives

At the end of the lesson, the students are expected to

Construct a scatter diagram;

Use simple linear regression for building empirical models to engineering and scientific data;

Understand how the method of least squares is used to estimate the parameters in a linear regression model; and

Interpret the different values obtained.


3/32

Deterministic Relationship

A model that predicts a variable perfectly

Example:

The displacement ($d_t$) of a particle at a certain time is related to its velocity: $d_t = d_0 + vt$, where

$d_0$ = displacement of the particle from the origin at time $t = 0$; and

$v$ = velocity.


4/32

Regression Analysis

The collection of statistical tools that are used to model

and explore relationships between variables that are

related in a nondeterministic manner

Used because there are many situations where the relationship between variables is not deterministic

Examples:

- The electrical energy consumption of a house (y) is related to the size of the house (x, in ft²).

- The fuel usage of an automobile (y) is related to the vehicle weight (x).


5/32

Simple Linear Regression

Single regressor variable or predictor variable x and a dependent or response variable Y

The expected value of Y for each value of x is $E(Y \mid x) = \beta_0 + \beta_1 x$, where the intercept $\beta_0$ and slope $\beta_1$ are unknown regression coefficients.

We assume Y can be described by the model $Y = \beta_0 + \beta_1 x + \epsilon$ (Equation 11-2), where $\epsilon$ is a random error with mean zero and (unknown) variance $\sigma^2$.
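
The model in Equation 11-2 can be made concrete with a small simulation. This is only a sketch: the parameter values are made up, and drawing the errors from a normal distribution is an assumption of the example (the slide only requires mean zero and constant variance).

```python
import random


def simulate_responses(xs, beta0, beta1, sigma, seed=0):
    """Draw one response Y = beta0 + beta1*x + eps for each x.

    eps is drawn from a normal distribution with mean 0 and standard
    deviation sigma (normality is an assumption of this sketch).
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
# Hypothetical parameter values, chosen only for illustration.
noisy = simulate_responses(xs, beta0=2.0, beta1=0.5, sigma=1.0)
exact = simulate_responses(xs, beta0=2.0, beta1=0.5, sigma=0.0)
print(noisy)  # scattered around the line 2.0 + 0.5*x
print(exact)  # with sigma = 0 the relationship is deterministic
```

Setting sigma to zero removes the random error and recovers a deterministic relationship, which is the contrast the two definitions draw.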


6/32

Simple Linear Regression

The random errors corresponding to different

observations are also assumed to be uncorrelated

random variables.

The regression model may be thought of as an empirical model.


7/32

Method of Least Squares

Suppose that we have n pairs of observations, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. See Fig. 11-3.

The estimates of $\beta_0$ and $\beta_1$ should result in a line that is (in some sense) a best fit to the data.

German scientist Karl Gauss (1777-1855) proposed estimating the parameters $\beta_0$ and $\beta_1$ in Equation 11-2 to minimize the sum of squares of the vertical deviations in Fig. 11-3.

This criterion for estimating the regression coefficients

is called the method of least squares.


8/32



9/32


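Using Equation 11-2 ($Y = \beta_0 + \beta_1 x + \epsilon$), we may express the n observations in the sample as

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \ldots, n$$

Equation (11-3)

and the sum of the squares of the deviations of the observations from the true regression line is

$$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

Equation (11-4)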
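
The sum of squared deviations in Equation 11-4 can be evaluated for any candidate line. The sketch below uses four made-up data points (not from the lecture) and compares two candidate lines; the function name is hypothetical.

```python
def sum_squared_deviations(beta0, beta1, xs, ys):
    """L = sum over i of (y_i - beta0 - beta1*x_i)^2, as in Equation 11-4."""
    return sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))


# Made-up data, for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

loss_a = sum_squared_deviations(0.0, 1.9, xs, ys)  # the least squares line for these data
loss_b = sum_squared_deviations(0.0, 2.0, xs, ys)  # a nearby candidate line
print(loss_a, loss_b)
```

For these points the line y = 1.9x gives the smaller value of L, which is what the least squares criterion selects.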


10/32


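The least squares estimators of $\beta_0$ and $\beta_1$, say $\hat{\beta}_0$ and $\hat{\beta}_1$, must satisfy

$$\left. \frac{\partial L}{\partial \beta_0} \right|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$

$$\left. \frac{\partial L}{\partial \beta_1} \right|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) x_i = 0$$

Equations (11-5)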


11/32


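Simplifying Equations (11-5)

$$n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

$$\hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i$$

Equations 11-6 (least squares normal equations)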
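
The normal equations are two linear equations in two unknowns, so they can be solved directly. A minimal sketch using Cramer's rule follows; the function name and the four data points are made up for illustration.

```python
def solve_normal_equations(xs, ys):
    """Solve the least squares normal equations for (b0, b1) by Cramer's rule."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # System being solved:
    #   n*b0     + sum_x*b1  = sum_y
    #   sum_x*b0 + sum_x2*b1 = sum_xy
    det = n * sum_x2 - sum_x * sum_x
    if det == 0:
        raise ValueError("all x values are equal; the slope cannot be estimated")
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    return b0, b1


# Made-up data, for illustration only.
b0, b1 = solve_normal_equations([1, 2, 3, 4], [2, 4, 5, 8])
print(b0, b1)
```

The determinant is zero only when every x value is the same, in which case no unique line exists.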


12/32

Least Squares Estimates

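$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

Equation 11-7

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_i - \frac{\left( \sum_{i=1}^{n} y_i \right) \left( \sum_{i=1}^{n} x_i \right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n}}$$

Equation 11-8

where $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ and $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.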


13/32

Least Squares Estimates

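Notationally, it is occasionally convenient to give special symbols to the numerator and denominator of Equation 11-8. Given data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, let

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n}$$

Equation 11-10 (denominator) and

$$S_{xy} = \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) = \sum_{i=1}^{n} x_i y_i - \frac{\left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{n}$$

Equation 11-11 (numerator)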
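
Each of these quantities has a definitional form and a computational shortcut, and the slope estimate of Equation 11-8 is their ratio, $S_{xy} / S_{xx}$. The sketch below checks that both forms agree; the helper names and the four data points are made up for illustration.

```python
def s_xx(xs):
    """S_xx from its definition: sum of squared deviations of x from its mean."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs)


def s_xy(xs, ys):
    """S_xy from its definition: sum of cross products of deviations."""
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    return sum((y - ybar) * (x - xbar) for x, y in zip(xs, ys))


def s_xx_shortcut(xs):
    """S_xx from the computational form: sum of x^2 minus (sum of x)^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


def s_xy_shortcut(xs, ys):
    """S_xy from the computational form: sum of x*y minus (sum x)(sum y) / n."""
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / len(xs)


# Made-up data, for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
slope = s_xy(xs, ys) / s_xx(xs)
print(s_xx(xs), s_xx_shortcut(xs))
print(s_xy(xs, ys), s_xy_shortcut(xs, ys))
print(slope)
```

The shortcut forms need only running totals, which is why they are the usual choice for hand calculation.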


14/32


15/32

Fitted or Estimated Regression Line

11.2/398 The grades of a class of 9 students on a midterm report (x) and on the final examination (y) are as follows:

x 77 50 71 72 81 94 96 99 67

y 82 66 78 34 47 85 99 99 68

(a) Estimate the linear regression line.

(b) Estimate the final examination grade of a student who received a grade of 85 on the midterm report.
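
One way to work this exercise numerically is sketched below, taking the midterm grade as x and the final examination grade as y. The function name `fit_line` is made up for the example; the arithmetic follows the least squares formulas for the slope and intercept.

```python
midterm = [77, 50, 71, 72, 81, 94, 96, 99, 67]  # x
final = [82, 66, 78, 34, 47, 85, 99, 99, 68]    # y


def fit_line(xs, ys):
    """Return (b0, b1), the least squares intercept and slope."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    b1 = sxy / sxx
    b0 = sum(ys) / n - b1 * sum(xs) / n
    return b0, b1


b0, b1 = fit_line(midterm, final)
predicted = b0 + b1 * 85  # part (b): midterm grade of 85
print(f"fitted line: y-hat = {b0:.4f} + {b1:.4f} x")
print(f"predicted final grade at x = 85: {predicted:.1f}")
```

With these data the fitted line is roughly y-hat = 12.06 + 0.777x, so a midterm grade of 85 corresponds to a predicted final grade of about 78.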


16/32


10-11/424 An article in the Journal of Monetary Economics assesses the relationship between percentage

growth in wealth over a decade and a half of savings for

baby boomers of age 40 to 55 with these people's income

quartiles. The article presents a table showing five income

quartiles, and for each quartile there is a reported

percentage growth in wealth. The data are as follows.

Run a simple linear regression of these five pairs of

numbers and estimate a linear relationship between

income and percentage growth in wealth.

Income quartile 1 2 3 4 5

Wealth growth (%) 17.3 23.6 40.2 45.8 56.8
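
A short sketch of this regression, treating the income quartile number as x and the percentage growth in wealth as y. Variable names are made up for the example.

```python
quartile = [1, 2, 3, 4, 5]                  # x: income quartile
growth = [17.3, 23.6, 40.2, 45.8, 56.8]     # y: wealth growth (%)

n = len(quartile)
xbar = sum(quartile) / n
ybar = sum(growth) / n

# Sums of squares and cross products about the means.
sxx = sum((x - xbar) ** 2 for x in quartile)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(quartile, growth))

slope = sxy / sxx
intercept = ybar - slope * xbar
print(f"wealth growth = {intercept:.2f} + {slope:.2f} * quartile")
```

The estimated relationship is approximately growth = 6.38 + 10.12 (quartile): each step up in income quartile goes with about 10 more percentage points of wealth growth in this sample.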


17/32


10-12/424 A financial analyst at Goldman Sachs ran a regression analysis of monthly returns on a certain investment (Y) versus returns for the same month on the Standard & Poor's index (X). The regression results included $SS_X = 765.98$ and $SS_{XY} = 934.49$. Give the least-squares estimate of the regression slope parameter.
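
A worked version of this calculation, assuming the two reported values are the sum of squares for the index returns, $SS_X = 765.98$, and the sum of cross products, $SS_{XY} = 934.49$ (that reading of the two numbers is an assumption). The slope estimate is the ratio of the two:

```latex
b_1 = \frac{SS_{XY}}{SS_X} = \frac{934.49}{765.98} \approx 1.22
```

Under that reading, a one-point increase in the index return goes with an estimated 1.22-point increase in the investment's return.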


18/32

Correlation

The degree of linear association between the two random variables X and Y

Indicated by the correlation coefficient

$\rho$ is the population (true) correlation coefficient, estimated by r, the sample correlation coefficient or Pearson product-moment correlation coefficient

$\rho$ can take on any value from $-1$, through 0, to 1.


19/32

Possible Interpretations of $\rho$

1. When $\rho$ is equal to zero, there is no correlation. That is, there is no linear relationship between the two random variables.

2. When $\rho = 1$, there is a perfect, positive, linear relationship between the two variables. That is, whenever one of the variables, X or Y, increases, the other variable also increases; and whenever one of the variables decreases, the other one must also decrease.

3. When $\rho = -1$, there is a perfect negative linear relationship between X and Y. When X or Y increases, the other variable decreases; and when one decreases, the other one must increase.


20/32

Possible Interpretations of $\rho$

4. When the value of $\rho$ is between 0 and 1 in absolute value, it reflects the relative strength of the linear relationship between the two variables. For example, a correlation of 0.90 implies a relatively strong positive relationship between the two variables. A correlation of $-0.70$ implies a weaker, negative (as indicated by the minus sign), linear relationship. A correlation $\rho = 0.30$ implies a relatively weak (positive) linear relationship between X and Y.


21/32

Correlation


22/32

Correlation


23/32

Sample Correlation Coefficient

The estimate of $\rho$

Also referred to as the Pearson product-moment

correlation coefficient
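
The sample correlation coefficient can be computed from the same sums of squares used in regression, as $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$. A minimal sketch follows; the function name and the small data sets are made up for illustration.

```python
import math


def sample_correlation(xs, ys):
    """Pearson product-moment correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Made-up data, for illustration only.
r_up = sample_correlation([1, 2, 3], [2, 4, 6])    # points on a rising line
r_down = sample_correlation([1, 2, 3], [6, 4, 2])  # points on a falling line
r_mixed = sample_correlation([1, 2, 3, 4], [2, 4, 5, 8])
print(r_up, r_down, r_mixed)
```

Points that fall exactly on a line give r of 1 or -1 depending on the direction of the slope; scatter around the line pulls r toward zero.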


24/32

Sample Correlation Coefficient

Interpretations of r

±1.00 perfect positive (negative) correlation

±0.91 - ±0.99 very high positive (negative) correlation

±0.71 - ±0.90 high positive (negative) correlation

±0.51 - ±0.70 moderate positive (negative) correlation

±0.31 - ±0.50 low positive (negative) correlation

±0.01 - ±0.30 negligible positive (negative) correlation

0.00 no correlation
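
The table can be turned into a small lookup. This is a sketch with a hypothetical function name; it rounds r to two decimals before classifying, which is an assumption made here so that every value falls into exactly one band.

```python
def describe_correlation(r):
    """Return the verbal label for r, using the bands in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = round(abs(r), 2)  # bands are stated to two decimals
    if size == 0.0:
        return "no correlation"
    sign = "positive" if r > 0 else "negative"
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.91:
        strength = "very high"
    elif size >= 0.71:
        strength = "high"
    elif size >= 0.51:
        strength = "moderate"
    elif size >= 0.31:
        strength = "low"
    else:
        strength = "negligible"
    return f"{strength} {sign} correlation"


print(describe_correlation(0.60))
print(describe_correlation(-0.979))
```

These labels are a descriptive convention only; they say nothing about statistical significance.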


25/32

Coefficient of Determination

Denoted by $r^2$

A descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data

Ordinarily, we do not use $r^2$ for inference about $\rho^2$.


26/32

Coefficient of Determination

11-13/400 A study of the amount of rainfall and the quantity of air pollution removed produced the following data:

Daily Rainfall, x (0.01 cm)    Particulate Removed, y (μg/m³)

4.3    126

4.5    121

5.9    116

5.6    118

6.1    114

5.2    118

3.8    132

2.1    141

7.5    108
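
A sketch of how the coefficient of determination could be computed for these data, with the nine (rainfall, particulate removed) pairs entered as two lists. It fits the least squares line and then takes $r^2 = 1 - SS_E / SS_T$, where $SS_E$ is the residual sum of squares and $SS_T$ the total sum of squares of y; those two symbols and the variable names are introduced here for the example.

```python
rainfall = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1, 7.5]   # x
removed = [126, 121, 116, 118, 114, 118, 132, 141, 108]    # y

n = len(rainfall)
xbar = sum(rainfall) / n
ybar = sum(removed) / n

sxx = sum((x - xbar) ** 2 for x in rainfall)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(rainfall, removed))
ss_total = sum((y - ybar) ** 2 for y in removed)

# Least squares fit.
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Residual sum of squares about the fitted line.
ss_error = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(rainfall, removed))

r_squared = 1 - ss_error / ss_total
print(f"y-hat = {b0:.3f} + ({b1:.3f}) x")
print(f"r^2 = {r_squared:.3f}")
```

The fitted line is roughly y-hat = 153.18 - 6.32x with r² of about 0.958, so the line accounts for close to 96% of the variation in particulate removed.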


27/32


28/32

Coefficient of Determination

11-43/436 With reference to Exercise 11.13 on page 400, assume a bivariate normal distribution for x and y.

(a) Calculate r.

(b) Test the null hypothesis that $\rho$ = 0.5 against the alternative that


29/32

Summary

A scatter diagram displays observations on two variables, x and y. Each observation is represented by a point showing its x-y coordinates. The scatter diagram can be very effective in revealing the joint variability of x and y or the nature of the relationship between them.

The method of least squares is used to estimate the parameters of a system by minimizing the sum of the squares of the differences between the observed values and the fitted or predicted values from the system.


30/32

Summary

Generally, correlation is a measure of the

interdependence among data. The concept may

include more than two variables. The term is most

commonly used in a narrow sense to express the

relationship between quantitative variables or ranks.

The correlation coefficient (r) is a dimensionless measure of the linear association between two variables, lying in the interval from $-1$ to $+1$, with zero indicating the absence of correlation (but not necessarily the independence of the two variables).


31/32

Summary

The coefficient of determination ($r^2$) is often used to judge the adequacy of a regression model. Expressed as a percentage, its value says that the model accounts for $100 r^2$% of the variability in the data.


32/32

References

Aczel and Sounderpandian. Business Statistics, 7th Ed. 2008

Montgomery and Runger. Applied Statistics and Probability for Engineers, 5th Ed. 2011

Walpole, et al. Probability and Statistics for Engineers and Scientists, 9th Ed. 2012, 2007, 2002
