Ch11 Curve Fitting
Dr. Deshi Ye ([email protected])
Outline
The Method of Least Squares
Inferences Based on the Least Squares Estimators
Curvilinear Regression
Multiple Regression
11.1 The Method of Least Squares
We study the case where a dependent variable is to be predicted in terms of a single independent variable. The random variable Y depends on a variable x. The regression curve of Y on x is the relationship between x and the mean of the corresponding distribution of Y.
Linear regression
Linear regression: for any x, the mean of the distribution of the Y's is given by $\alpha + \beta x$.
In general, Y will differ from this mean, and we denote the difference by $\varepsilon$:
$Y = \alpha + \beta x + \varepsilon$
$\varepsilon$ is a random variable, and we can choose $\alpha$ so that the mean of the distribution of $\varepsilon$ is equal to zero.
EX
x:  1   2   3   4   5   6   7    8    9   10   11   12
y: 16  35  45  64  86  96  106  124  134  156  164  182
Analysis
$\hat{y}_i = a + b x_i, \qquad e_i = y_i - \hat{y}_i$
Make $\sum_{i=1}^{n} e_i$ as close as possible to zero.
Principle of least squares
Choose a and b so that
$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - (a + b x_i))^2$
is minimum. The procedure of finding the equation of the line which best fits a given set of paired data is called the method of least squares. Some notation:
$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n} x_i\Big)^2$
$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n} y_i\Big)^2$
$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\Big(\sum_{i=1}^{n} x_i\Big)\Big(\sum_{i=1}^{n} y_i\Big)$
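The shortcut ("computational") forms above can be checked with a short Python sketch (the slides themselves contain no code; Python is assumed here), using the x and y data from the EX slide:

```python
# Sketch: computing Sxx, Syy, Sxy with the shortcut formulas
# for the EX data x = 1..12, y = 16, 35, ..., 182.

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [16, 35, 45, 64, 86, 96, 106, 124, 134, 156, 164, 182]
n = len(x)

# Sxx = sum of x_i^2 minus (sum of x_i)^2 / n, and similarly for Syy, Sxy
Sxx = sum(v * v for v in x) - sum(x) ** 2 / n
Syy = sum(v * v for v in y) - sum(y) ** 2 / n
Sxy = sum(u * v for u, v in zip(x, y)) - sum(x) * sum(y) / n
```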
Least squares estimators
$b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\bar{x}$, where $\bar{x}$, $\bar{y}$ are the means of the $x_i$ and $y_i$.
Fitted (or estimated) regression line: $\hat{y} = a + b x$
Residuals: observation $-$ fitted value $= y_i - (a + b x_i)$
The minimum value of the sum of squares is called the residual sum of squares or the error sum of squares. We will show that
$SSE = \sum_{i=1}^{n} (y_i - a - b x_i)^2 = S_{yy} - S_{xy}^2 / S_{xx}$
EX solution
The fitted line is $\hat{y} = 14.8x + 4.35$.
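As a check, the estimator formulas can be applied to the EX data in a short Python sketch (not part of the original slides); the coefficients on this slide are rounded values of the exact least-squares solution:

```python
# Sketch: least-squares slope and intercept for the EX data,
# to verify the fitted line y = 14.8x + 4.35 (slide values are rounded).

x = list(range(1, 13))
y = [16, 35, 45, 64, 86, 96, 106, 124, 134, 156, 164, 182]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((u - xbar) ** 2 for u in x)
Sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))

b = Sxy / Sxx        # slope, = 2119/143
a = ybar - b * xbar  # intercept
```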
X and Y
X-axis       Y-axis
independent  dependent
predictor    predicted
carrier      response
input        output
Example
You’re a marketing analyst for Hasbro Toys. You gather the following data:
Ad $:          1  2  3  4  5
Sales (Units): 1  1  2  2  4
What is the relationship between sales & advertising?
[Scattergram: Sales vs. Advertising (Sales on the Y-axis, Advertising on the X-axis)]
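The slides do not show the fitted line for this example; as an illustrative Python sketch (not from the slides), the same least-squares formulas applied to the advertising data give it:

```python
# Sketch: least-squares fit for the Hasbro Toys advertising data.
# Ad $ is the predictor x, Sales (Units) is the response y.

x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n          # 3.0 and 2.0
Sxx = sum((u - xbar) ** 2 for u in x)        # 10
Sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))  # 7

b = Sxy / Sxx        # slope = 0.7
a = ybar - b * xbar  # intercept = -0.1
```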
The Least Squares Estimators
11.2 Inference based on the Least Squares Estimators
We assume that the regression is linear in x and, furthermore, that the n random variables $Y_i$ are independently normally distributed with means $\alpha + \beta x_i$. Statistical model for straight-line regression:
$Y_i = \alpha + \beta x_i + \varepsilon_i$
where the $\varepsilon_i$ are independent normally distributed random variables having zero means and the common variance $\sigma^2$.
Standard error of estimate
The i-th deviation is $y_i - (a + b x_i)$, and the estimate of $\sigma^2$ is
$s_e^2 = \frac{1}{n-2} \sum_{i=1}^{n} [y_i - (a + b x_i)]^2$
The estimate of $\sigma^2$ can also be written as
$s_e^2 = \frac{S_{yy} - S_{xy}^2 / S_{xx}}{n-2}$
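The two expressions for $s_e^2$ should agree; as an illustration (a Python sketch, not part of the slides), both can be evaluated on the earlier EX data:

```python
# Sketch: the definitional and shortcut forms of the standard error of
# estimate give the same value, checked on the EX data x = 1..12.

x = list(range(1, 13))
y = [16, 35, 45, 64, 86, 96, 106, 124, 134, 156, 164, 182]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((u - xbar) ** 2 for u in x)
Syy = sum((v - ybar) ** 2 for v in y)
Sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))
b = Sxy / Sxx
a = ybar - b * xbar

# Definition: average squared residual with n - 2 in the denominator
se2_direct = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y)) / (n - 2)
# Shortcut form in terms of Syy, Sxy, Sxx
se2_short = (Syy - Sxy ** 2 / Sxx) / (n - 2)
```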
Statistics for inference: based on the assumptions made concerning the distribution of the values of Y, the following theorem holds.
Theorem. The statistics
$t = \frac{a - \alpha}{s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}} \qquad\text{and}\qquad t = \frac{(b - \beta)\sqrt{S_{xx}}}{s_e}$
are values of random variables having the t distribution with $n - 2$ degrees of freedom.
Confidence intervals:
$\alpha:\quad a \pm t_{\alpha/2}\, s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$
$\beta:\quad b \pm t_{\alpha/2}\, \frac{s_e}{\sqrt{S_{xx}}}$
Example
The following data pertain to number of computer jobs per day and the central processing unit (CPU) time required.
Number of jobs, x: 1  2  3  4  5
CPU time, y:       2  5  4  9  10
EX
1) Obtain a least squares fit of a line to the observations on CPU time.
$b = \frac{S_{xy}}{S_{xx}} = \frac{20}{10} = 2, \qquad a = \bar{y} - b\bar{x} = 6 - 2 \cdot 3 = 0$
Fitted line: $\hat{y} = 2x$
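As a check on this example, the fit can be reproduced in a short Python sketch (not part of the slides):

```python
# Sketch: least-squares fit for the CPU-time example.
# x = number of jobs, y = CPU time; expected result is y-hat = 2x.

x = [1, 2, 3, 4, 5]
y = [2, 5, 4, 9, 10]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n          # 3.0 and 6.0
Sxx = sum((u - xbar) ** 2 for u in x)        # 10
Sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))  # 20

b = Sxy / Sxx        # slope = 2
a = ybar - b * xbar  # intercept = 0
```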
Example
2) Construct a 95% confidence interval for $\alpha$.
$s_e^2 = \frac{S_{yy} - S_{xy}^2/S_{xx}}{n-2} = \frac{46 - 400/10}{3} = 2$
With $n - 2 = 3$ degrees of freedom, $t_{\alpha/2} = t_{0.025} = 3.182$.
The 95% confidence interval for $\alpha$ is
$a \pm t_{\alpha/2}\, s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} = 0 \pm 3.182 \cdot \sqrt{2} \cdot \sqrt{\frac{1}{5} + \frac{9}{10}} = \pm 4.72$
Example
3) Test the null hypothesis $\beta = 1$ against the alternative hypothesis $\beta > 1$ at the 0.05 level of significance.
Solution: the t statistic is given by
$t = \frac{(b - \beta)\sqrt{S_{xx}}}{s_e} = \frac{(2 - 1)\sqrt{10}}{\sqrt{2}} = 2.236$
Criterion: reject the null hypothesis if $t > t_{0.05} = 2.353$ (3 degrees of freedom).
Decision: since $2.236 < 2.353$, we cannot reject the null hypothesis.
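The test statistic for this example works out to $\sqrt{5}$; a Python sketch (not from the slides) makes the computation explicit:

```python
import math

# Sketch: t statistic for H0: beta = 1 in the CPU-time example,
# t = (b - beta0) * sqrt(Sxx) / s_e with b = 2, Sxx = 10, s_e = sqrt(2).

b, beta0, Sxx, se = 2.0, 1.0, 10.0, math.sqrt(2)
t = (b - beta0) * math.sqrt(Sxx) / se  # = sqrt(5)
t_crit = 2.353                          # t_{0.05}, 3 degrees of freedom
reject = t > t_crit                     # False: cannot reject H0
```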
11.3 Curvilinear Regression
The regression curve is nonlinear. Polynomial regression:
$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p$
If the regression of Y on x is exponential, the mean of the distribution of values of Y is given by
$\mu_{Y|x} = \alpha \beta^x$
Taking logarithms, we have $\log \mu = \log \alpha + x \log \beta$.
Thus we can estimate $\log \alpha$ and $\log \beta$ by fitting a line to the pairs of values $(x_i, \log y_i)$.
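The log-transform idea can be sketched in Python (the data below are hypothetical, generated exactly from $\alpha = 2$, $\beta = 3$, so the transform should recover those values):

```python
import math

# Sketch: fitting y = alpha * beta**x by regressing log y on x.
# Hypothetical data generated from alpha = 2, beta = 3.

x = [0, 1, 2, 3, 4]
y = [2 * 3 ** v for v in x]      # 2, 6, 18, 54, 162
logy = [math.log(v) for v in y]

n = len(x)
xbar, lbar = sum(x) / n, sum(logy) / n
Sxx = sum((u - xbar) ** 2 for u in x)
Sxl = sum((u - xbar) * (w - lbar) for u, w in zip(x, logy))

slope = Sxl / Sxx                 # estimates log beta
intercept = lbar - slope * xbar   # estimates log alpha

beta_hat = math.exp(slope)
alpha_hat = math.exp(intercept)
```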
Polynomial regression
If there is no clear indication about the functional form of the regression of Y on x, we assume it is polynomial regression:
$Y = a_0 + a_1 x + a_2 x^2 + \cdots + a_k x^k$
Polynomial Fitting
• Really just a generalization of the previous case
• Exact solution
• Just big matrices
11.4 Multiple Regression
The mean of Y on $x_1, \ldots, x_k$ is given by
$b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$
Minimize
$\sum_{i=1}^{n} [y_i - (b_0 + b_1 x_{1i} + \cdots + b_k x_{ki})]^2$
We can solve it when $k = 2$ by the following normal equations:
$\sum y = n b_0 + b_1 \sum x_1 + b_2 \sum x_2$
$\sum x_1 y = b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2$
$\sum x_2 y = b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2$
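As an illustration (a Python sketch, not from the slides), the $k = 2$ normal equations can be assembled and solved with Cramer's rule; the hypothetical data below are generated exactly from $y = 1 + 2x_1 + 3x_2$, so those coefficients should be recovered:

```python
# Sketch: solving the k = 2 normal equations by Cramer's rule.
# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.

x1 = [0, 1, 0, 1, 2]
x2 = [0, 0, 1, 1, 1]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]
n = len(y)

s11 = sum(u * u for u in x1)
s22 = sum(v * v for v in x2)
s12 = sum(u * v for u, v in zip(x1, x2))

# Coefficient matrix and right-hand side of the normal equations
A = [[n,       sum(x1), sum(x2)],
     [sum(x1), s11,     s12],
     [sum(x2), s12,     s22]]
r = [sum(y),
     sum(u * w for u, w in zip(x1, y)),
     sum(v * w for v, w in zip(x2, y))]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

d = det3(A)
coeffs = []
for j in range(3):                 # Cramer's rule, one column at a time
    Aj = [row[:] for row in A]
    for i in range(3):
        Aj[i][j] = r[i]
    coeffs.append(det3(Aj) / d)

b0, b1, b2 = coeffs
```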
Example
P365.
Multiple Linear Fitting
$X_1(x), \ldots, X_M(x)$ are arbitrary fixed functions of x (they can be nonlinear), called the basis functions.
The normal equations of the least squares problem can be put in matrix form and solved.
Correlation Models
1. How strong is the linear relationship between 2 variables?
2. Coefficient of correlation used
The population correlation coefficient is denoted $\rho$; values range from $-1$ to $+1$.
Correlation
Standardized observation:
$\frac{\text{Observation} - \text{Sample mean}}{\text{Sample standard deviation}} = \frac{x_i - \bar{x}}{s_x}$
The sample correlation coefficient r:
$r = \frac{1}{n-1} \sum_{i=1}^{n} \Big(\frac{x_i - \bar{x}}{s_x}\Big)\Big(\frac{y_i - \bar{y}}{s_y}\Big)$
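This standardized-observation form of r can be sketched in Python (not part of the slides), using the CPU-time data from the earlier example:

```python
import math

# Sketch: sample correlation coefficient as the average product of
# standardized observations, for the CPU-time data (x = jobs, y = CPU time).

x = [1, 2, 3, 4, 5]
y = [2, 5, 4, 9, 10]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sx = math.sqrt(sum((u - xbar) ** 2 for u in x) / (n - 1))  # sample std dev
sy = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))

r = sum(((u - xbar) / sx) * ((v - ybar) / sy)
        for u, v in zip(x, y)) / (n - 1)
```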
Coefficient of Correlation Values
-1.0-1.0 +1.0+1.000-.5-.5 +.5+.5
No No CorrelationCorrelation
Increasing degree of Increasing degree of negative correlationnegative correlation
Increasing degree of Increasing degree of positive correlationpositive correlation