Ch11 Curve Fitting
Dr. Deshi Ye ([email protected])
Outline
The Method of Least Squares
Inferences Based on the Least Squares Estimators
Curvilinear Regression
Multiple Regression
11.1 The Method of Least Squares
We study the case where a dependent variable is to be predicted in terms of a single independent variable. The random variable Y depends on a variable x. The regression curve of Y on x is the relationship between x and the mean of the corresponding distribution of Y.
Linear regression
Linear regression: for any x, the mean of the distribution of the Y's is given by $\alpha + \beta x$.
In general, Y will differ from this mean, and we denote the difference by $\varepsilon$:
$Y = \alpha + \beta x + \varepsilon$
$\varepsilon$ is a random variable, and we can choose $\alpha$ so that the mean of the distribution of $\varepsilon$ is equal to zero.
EX
x:  1   2   3   4   5   6   7    8    9   10   11   12
y: 16  35  45  64  86  96  106  124  134  156  164  182
Analysis
$\hat{y}_i = a + b x_i, \qquad e_i = y_i - \hat{y}_i$
Make $\sum_{i=1}^{n} e_i$ as close as possible to zero.
Principle of least squares
Choose a and b so that
$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - (a + b x_i))^2$
is minimum. The procedure of finding the equation of the line which best fits a given set of paired data is called the method of least squares. Some notation:
$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n} x_i\Big)^2$
$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n} y_i\Big)^2$
$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\Big(\sum_{i=1}^{n} x_i\Big)\Big(\sum_{i=1}^{n} y_i\Big)$
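The shortcut ("computational") forms above can be checked with a short Python sketch (the slides themselves contain no code; Python is assumed here), using the x and y data from the EX slide:

```python
# Sketch: computing Sxx, Syy, Sxy with the shortcut formulas
# for the EX data x = 1..12, y = 16, 35, ..., 182.

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [16, 35, 45, 64, 86, 96, 106, 124, 134, 156, 164, 182]
n = len(x)

# Sxx = sum of x_i^2 minus (sum of x_i)^2 / n, and similarly for Syy, Sxy
Sxx = sum(v * v for v in x) - sum(x) ** 2 / n
Syy = sum(v * v for v in y) - sum(y) ** 2 / n
Sxy = sum(u * v for u, v in zip(x, y)) - sum(x) * sum(y) / n
```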
Least squares estimators
$b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\bar{x}$, where $\bar{x}$, $\bar{y}$ are the means of the $x_i$ and $y_i$.
Fitted (or estimated) regression line: $\hat{y} = a + b x$
Residuals: observation $-$ fitted value $= y_i - (a + b x_i)$
The minimum value of the sum of squares is called the residual sum of squares or the error sum of squares. We will show that
$SSE = \sum_{i=1}^{n} (y_i - a - b x_i)^2 = S_{yy} - S_{xy}^2 / S_{xx}$
EX solution
The fitted line is $\hat{y} = 14.8x + 4.35$.
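As a check, the estimator formulas can be applied to the EX data in a short Python sketch (not part of the original slides); the coefficients on this slide are rounded values of the exact least-squares solution:

```python
# Sketch: least-squares slope and intercept for the EX data,
# to verify the fitted line y = 14.8x + 4.35 (slide values are rounded).

x = list(range(1, 13))
y = [16, 35, 45, 64, 86, 96, 106, 124, 134, 156, 164, 182]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((u - xbar) ** 2 for u in x)
Sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))

b = Sxy / Sxx        # slope, = 2119/143
a = ybar - b * xbar  # intercept
```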
X and Y
X-axis       Y-axis
independent  dependent
predictor    predicted
carrier      response
input        output
Example
You’re a marketing analyst for Hasbro Toys. You gather the following data:
Ad $:          1  2  3  4  5
Sales (Units): 1  1  2  2  4
What is the relationship between sales & advertising?
[Scattergram: Sales vs. Advertising (Sales on the Y-axis, Advertising on the X-axis)]
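The slides do not show the fitted line for this example; as an illustrative Python sketch (not from the slides), the same least-squares formulas applied to the advertising data give it:

```python
# Sketch: least-squares fit for the Hasbro Toys advertising data.
# Ad $ is the predictor x, Sales (Units) is the response y.

x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n          # 3.0 and 2.0
Sxx = sum((u - xbar) ** 2 for u in x)        # 10
Sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))  # 7

b = Sxy / Sxx        # slope = 0.7
a = ybar - b * xbar  # intercept = -0.1
```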
The Least Squares Estimators
11.2 Inference based on the Least Squares Estimators
We assume that the regression is linear in x and, furthermore, that the n random variables $Y_i$ are independently normally distributed with means $\alpha + \beta x_i$. Statistical model for straight-line regression:
$Y_i = \alpha + \beta x_i + \varepsilon_i$
where the $\varepsilon_i$ are independent normally distributed random variables having zero means and the common variance $\sigma^2$.
Standard error of estimate
The i-th deviation is $y_i - (a + b x_i)$, and the estimate of $\sigma^2$ is
$s_e^2 = \frac{1}{n-2} \sum_{i=1}^{n} [y_i - (a + b x_i)]^2$
The estimate of $\sigma^2$ can also be written as
$s_e^2 = \frac{S_{yy} - S_{xy}^2 / S_{xx}}{n-2}$
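The two expressions for $s_e^2$ should agree; as an illustration (a Python sketch, not part of the slides), both can be evaluated on the earlier EX data:

```python
# Sketch: the definitional and shortcut forms of the standard error of
# estimate give the same value, checked on the EX data x = 1..12.

x = list(range(1, 13))
y = [16, 35, 45, 64, 86, 96, 106, 124, 134, 156, 164, 182]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((u - xbar) ** 2 for u in x)
Syy = sum((v - ybar) ** 2 for v in y)
Sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))
b = Sxy / Sxx
a = ybar - b * xbar

# Definition: average squared residual with n - 2 in the denominator
se2_direct = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y)) / (n - 2)
# Shortcut form in terms of Syy, Sxy, Sxx
se2_short = (Syy - Sxy ** 2 / Sxx) / (n - 2)
```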
Statistics for inference: based on the assumptions made concerning the distribution of the values of Y, the following theorem holds.
Theorem. The statistics
$t = \frac{a - \alpha}{s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}} \qquad\text{and}\qquad t = \frac{(b - \beta)\sqrt{S_{xx}}}{s_e}$
are values of random variables having the t distribution with $n - 2$ degrees of freedom.
Confidence intervals:
$\alpha:\quad a \pm t_{\alpha/2}\, s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$
$\beta:\quad b \pm t_{\alpha/2}\, \frac{s_e}{\sqrt{S_{xx}}}$
Example
The following data pertain to number of computer jobs per day and the central processing unit (CPU) time required.
Number of jobs, x: 1  2  3  4  5
CPU time, y:       2  5  4  9  10
EX
1) Obtain a least squares fit of a line to the observations on CPU time.
$b = \frac{S_{xy}}{S_{xx}} = \frac{20}{10} = 2, \qquad a = \bar{y} - b\bar{x} = 6 - 2 \cdot 3 = 0$
Fitted line: $\hat{y} = 2x$
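As a check on this example, the fit can be reproduced in a short Python sketch (not part of the slides):

```python
# Sketch: least-squares fit for the CPU-time example.
# x = number of jobs, y = CPU time; expected result is y-hat = 2x.

x = [1, 2, 3, 4, 5]
y = [2, 5, 4, 9, 10]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n          # 3.0 and 6.0
Sxx = sum((u - xbar) ** 2 for u in x)        # 10
Sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))  # 20

b = Sxy / Sxx        # slope = 2
a = ybar - b * xbar  # intercept = 0
```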
Example
2) Construct a 95% confidence interval for $\alpha$.
$s_e^2 = \frac{S_{yy} - S_{xy}^2/S_{xx}}{n-2} = \frac{46 - 400/10}{3} = 2$
With $n - 2 = 3$ degrees of freedom, $t_{\alpha/2} = t_{0.025} = 3.182$.
The 95% confidence interval for $\alpha$ is
$a \pm t_{\alpha/2}\, s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} = 0 \pm 3.182 \cdot \sqrt{2} \cdot \sqrt{\frac{1}{5} + \frac{9}{10}} = \pm 4.72$
Example
3) Test the null hypothesis $\beta = 1$ against the alternative hypothesis $\beta > 1$ at the 0.05 level of significance.
Solution: the t statistic is given by
$t = \frac{(b - \beta)\sqrt{S_{xx}}}{s_e} = \frac{(2 - 1)\sqrt{10}}{\sqrt{2}} = 2.236$
Criterion: reject the null hypothesis if $t > t_{0.05} = 2.353$ (3 degrees of freedom).
Decision: since $2.236 < 2.353$, we cannot reject the null hypothesis.
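The test statistic for this example works out to $\sqrt{5}$; a Python sketch (not from the slides) makes the computation explicit:

```python
import math

# Sketch: t statistic for H0: beta = 1 in the CPU-time example,
# t = (b - beta0) * sqrt(Sxx) / s_e with b = 2, Sxx = 10, s_e = sqrt(2).

b, beta0, Sxx, se = 2.0, 1.0, 10.0, math.sqrt(2)
t = (b - beta0) * math.sqrt(Sxx) / se  # = sqrt(5)
t_crit = 2.353                          # t_{0.05}, 3 degrees of freedom
reject = t > t_crit                     # False: cannot reject H0
```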
11.3 Curvilinear Regression
The regression curve is nonlinear. Polynomial regression:
$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p$
If the regression of Y on x is exponential, the mean of the distribution of values of Y is given by
$\mu_{Y|x} = \alpha \beta^x$
Taking logarithms, we have $\log \mu = \log \alpha + x \log \beta$.
Thus we can estimate $\log \alpha$ and $\log \beta$ by fitting a line to the pairs of values $(x_i, \log y_i)$.
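The log-transform idea can be sketched in Python (the data below are hypothetical, generated exactly from $\alpha = 2$, $\beta = 3$, so the transform should recover those values):

```python
import math

# Sketch: fitting y = alpha * beta**x by regressing log y on x.
# Hypothetical data generated from alpha = 2, beta = 3.

x = [0, 1, 2, 3, 4]
y = [2 * 3 ** v for v in x]      # 2, 6, 18, 54, 162
logy = [math.log(v) for v in y]

n = len(x)
xbar, lbar = sum(x) / n, sum(logy) / n
Sxx = sum((u - xbar) ** 2 for u in x)
Sxl = sum((u - xbar) * (w - lbar) for u, w in zip(x, logy))

slope = Sxl / Sxx                 # estimates log beta
intercept = lbar - slope * xbar   # estimates log alpha

beta_hat = math.exp(slope)
alpha_hat = math.exp(intercept)
```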
Polynomial regression
If there is no clear indication about the functional form of the regression of Y on x, we assume it is polynomial regression:
$Y = a_0 + a_1 x + a_2 x^2 + \cdots + a_k x^k$
Polynomial Fitting
• Really just a generalization of the previous case
• Exact solution
• Just big matrices
11.4 Multiple Regression
The mean of Y on $x_1, \ldots, x_k$ is given by
$b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$
Minimize
$\sum_{i=1}^{n} [y_i - (b_0 + b_1 x_{1i} + \cdots + b_k x_{ki})]^2$
We can solve it when $k = 2$ by the following normal equations:
$\sum y = n b_0 + b_1 \sum x_1 + b_2 \sum x_2$
$\sum x_1 y = b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2$
$\sum x_2 y = b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2$
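As an illustration (a Python sketch, not from the slides), the $k = 2$ normal equations can be assembled and solved with Cramer's rule; the hypothetical data below are generated exactly from $y = 1 + 2x_1 + 3x_2$, so those coefficients should be recovered:

```python
# Sketch: solving the k = 2 normal equations by Cramer's rule.
# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.

x1 = [0, 1, 0, 1, 2]
x2 = [0, 0, 1, 1, 1]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]
n = len(y)

s11 = sum(u * u for u in x1)
s22 = sum(v * v for v in x2)
s12 = sum(u * v for u, v in zip(x1, x2))

# Coefficient matrix and right-hand side of the normal equations
A = [[n,       sum(x1), sum(x2)],
     [sum(x1), s11,     s12],
     [sum(x2), s12,     s22]]
r = [sum(y),
     sum(u * w for u, w in zip(x1, y)),
     sum(v * w for v, w in zip(x2, y))]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

d = det3(A)
coeffs = []
for j in range(3):                 # Cramer's rule, one column at a time
    Aj = [row[:] for row in A]
    for i in range(3):
        Aj[i][j] = r[i]
    coeffs.append(det3(Aj) / d)

b0, b1, b2 = coeffs
```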
Example
P365.
Multiple Linear Fitting
$X_1(x), \ldots, X_M(x)$ are arbitrary fixed functions of x (they can be nonlinear), called the basis functions.
The normal equations of the least squares problem can be put in matrix form and solved.
Correlation Models
1. How strong is the linear relationship between 2 variables?
2. Coefficient of correlation used
The population correlation coefficient is denoted $\rho$; values range from $-1$ to $+1$.
Correlation
Standardized observation:
$\frac{\text{Observation} - \text{Sample mean}}{\text{Sample standard deviation}} = \frac{x_i - \bar{x}}{s_x}$
The sample correlation coefficient r:
$r = \frac{1}{n-1} \sum_{i=1}^{n} \Big(\frac{x_i - \bar{x}}{s_x}\Big)\Big(\frac{y_i - \bar{y}}{s_y}\Big)$
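This standardized-observation form of r can be sketched in Python (not part of the slides), using the CPU-time data from the earlier example:

```python
import math

# Sketch: sample correlation coefficient as the average product of
# standardized observations, for the CPU-time data (x = jobs, y = CPU time).

x = [1, 2, 3, 4, 5]
y = [2, 5, 4, 9, 10]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sx = math.sqrt(sum((u - xbar) ** 2 for u in x) / (n - 1))  # sample std dev
sy = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))

r = sum(((u - xbar) / sx) * ((v - ybar) / sy)
        for u, v in zip(x, y)) / (n - 1)
```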
Coefficient of Correlation Values
-1.0-1.0 +1.0+1.000-.5-.5 +.5+.5
No No CorrelationCorrelation
Increasing degree of Increasing degree of negative correlationnegative correlation
Increasing degree of Increasing degree of positive correlationpositive correlation