-
Regression Analysis
Chapter 3: Simple Linear Regression (Matrix Version)

Dr. Bisher M. Iqelan ([email protected])
Department of Mathematics, The Islamic University of Gaza
2010-2011, Semester 2
Dr. Bisher M. Iqelan (Department of Math.), 3: Simple Linear Regression (Matrix Version), 2010-2011, Semester 2, 1 / 77
-
Overview

The yield $Y_i$ of a process in which amount $X_i$ of a material is used is recorded on $n$ different occasions. It is assumed that the mean yield depends linearly on the amount of material, so that

\[
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \dots, n, \tag{1}
\]

where $\beta_0$ and $\beta_1$ are unknown quantities, $E(\varepsilon_i) = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2$ for all $i$, and the $Y_i$'s are uncorrelated.

The system of equations (1) can be written in matrix form as

\[
Y = X\beta + \mathcal{E},
\]

where

\[
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad
\mathcal{E} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.
\]
-
Review of Matrices

- A matrix: a rectangular array of elements arranged in rows and columns.
- An example, with 3 rows and 2 columns:

            Column 1   Column 2
    Row 1        10         20
    Row 2       100        200
    Row 3      1000       2000
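As a quick aside (not in the notes), a matrix like the example above can be stored in Python as a list of rows; with zero-based indexing, element $a_{ij}$ is `A[i-1][j-1]`:

```python
# The 3x2 example matrix from the slide, stored as a list of rows
# (an illustrative representation, not something the notes prescribe).
A = [[10, 20],
     [100, 200],
     [1000, 2000]]

rows = len(A)      # number of rows r
cols = len(A[0])   # number of columns c
print(rows, cols)  # 3 2
print(A[1][0])     # element in row 2, column 1: 100
```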
-
A matrix with r rows and c columns

\[
A = \begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1j} & \dots & a_{1c} \\
a_{21} & a_{22} & \dots & a_{2j} & \dots & a_{2c} \\
\vdots & \vdots &       & \vdots &       & \vdots \\
a_{i1} & a_{i2} & \dots & a_{ij} & \dots & a_{ic} \\
\vdots & \vdots &       & \vdots &       & \vdots \\
a_{r1} & a_{r2} & \dots & a_{rj} & \dots & a_{rc}
\end{pmatrix}
\]

- In short notation we write $A = [a_{ij}]$, $i = 1, \dots, r$; $j = 1, \dots, c$.
- $r$ and $c$ together ($r \times c$) are called the dimension of the matrix.
-
Square matrix and Vector

- Square matrix: equal number of rows and columns, for example

\[
\begin{pmatrix} 1 & 7 \\ 5 & -2 \end{pmatrix}, \quad
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\]

- Vector: a matrix with only one row or one column, for example

\[
A = \begin{pmatrix} 3 & -2 & 5 \end{pmatrix}, \quad
B = \begin{pmatrix} 2 \\ 6 \\ 10 \end{pmatrix}
\]
-
Transpose of a matrix and equality of matrices

- The transpose of a matrix $A$ is another matrix, denoted by $A^T$, obtained by interchanging rows and columns:

\[
A = \begin{pmatrix} 2 & 5 \\ -4 & 0 \\ 6 & 1 \end{pmatrix}, \quad
A^T = \begin{pmatrix} 2 & -4 & 6 \\ 5 & 0 & 1 \end{pmatrix}
\]

- Two matrices are equal if they have the same dimension and all the corresponding elements are equal. Suppose, for example,

\[
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}, \quad
B = \begin{pmatrix} 12 & 50 \\ -3 & 10 \\ 16 & 21 \end{pmatrix}
\]

If $A = B$, then $a_{11} = 12$, $a_{12} = 50$, and so on.
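The transpose operation is easy to sketch in Python (an illustrative helper, not part of the notes): rows become columns.

```python
def transpose(M):
    """Return M^T: element (i, j) of M becomes element (j, i)."""
    return [[M[i][j] for i in range(len(M))] for j in range(len(M[0]))]

# The 3x2 example from the slide:
A = [[2, 5],
     [-4, 0],
     [6, 1]]
print(transpose(A))  # [[2, -4, 6], [5, 0, 1]]
```

Applying `transpose` twice returns the original matrix, matching the fact $(A^T)^T = A$ stated later in the notes.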
-
Matrix addition and subtraction

- Adding or subtracting two matrices requires that they have the same dimension.

\[
A = \begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{pmatrix}, \quad
B = \begin{pmatrix} 2 & 5 & 1 \\ 3 & 6 & 7 \end{pmatrix}
\]

\[
A + B = \begin{pmatrix} 1+2 & 3+5 & 5+1 \\ 2+3 & 4+6 & 6+7 \end{pmatrix}
      = \begin{pmatrix} 3 & 8 & 6 \\ 5 & 10 & 13 \end{pmatrix}
\]

\[
A - B = \begin{pmatrix} 1-2 & 3-5 & 5-1 \\ 2-3 & 4-6 & 6-7 \end{pmatrix}
      = \begin{pmatrix} -1 & -2 & 4 \\ -1 & -2 & -1 \end{pmatrix}
\]
-
Matrix multiplication

- Multiplication of a matrix by a scalar:

\[
A = \begin{pmatrix} 5 & 2 & 5 \\ 3 & 4 & 0 \\ 1 & 6 & 7 \end{pmatrix}, \quad
4A = A \cdot 4 = \begin{pmatrix} 20 & 8 & 20 \\ 12 & 16 & 0 \\ 4 & 24 & 28 \end{pmatrix}
\]

- Multiplication of a matrix by a matrix: if $A$ has dimension $r \times c$ and $B$ has dimension $c \times s$, the product $AB$ is a matrix of dimension $r \times s$ whose element in the $i$th row and $j$th column is

\[
\sum_{k=1}^{c} a_{ik} b_{kj}.
\]
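The row-by-column formula above translates directly into Python (an illustrative sketch, not from the notes):

```python
def matmul(A, B):
    """Multiply an r x c matrix A by a c x s matrix B.

    Entry (i, j) of the product is sum_k A[i][k] * B[k][j],
    exactly the formula on the slide.
    """
    r, c, s = len(A), len(B), len(B[0])
    assert all(len(row) == c for row in A), "inner dimensions must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(c)) for j in range(s)]
            for i in range(r)]

# The data of Example 1 on the examples slide:
A = [[2, 4, 0],
     [3, 1, 5]]
B = [[1, 2],
     [1, 0],
     [0, 3]]
print(matmul(A, B))  # [[6, 4], [4, 21]]
```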
-
Matrix multiplication: Examples

- Example 1:

\[
\begin{pmatrix} 2 & 4 & 0 \\ 3 & 1 & 5 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 3 \end{pmatrix}
= \begin{pmatrix}
2 \cdot 1 + 4 \cdot 1 + 0 \cdot 0 & 2 \cdot 2 + 4 \cdot 0 + 0 \cdot 3 \\
3 \cdot 1 + 1 \cdot 1 + 5 \cdot 0 & 3 \cdot 2 + 1 \cdot 0 + 5 \cdot 3
\end{pmatrix}
= \begin{pmatrix} 6 & 4 \\ 4 & 21 \end{pmatrix}
\]

- Example 2:

\[
\begin{pmatrix} 5 & 3 \\ 2 & 6 \end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}
= \begin{pmatrix} 5a_1 + 3a_2 \\ 2a_1 + 6a_2 \end{pmatrix}
\]
-
Regression Examples

- One can easily check that

\[
\begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}
= \begin{pmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \vdots \\ \beta_0 + \beta_1 X_n \end{pmatrix}
\]

- Now let

\[
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad
\mathcal{E} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
\]
-
Regression Models

The regression model

\[
\begin{aligned}
Y_1 &= \beta_0 + \beta_1 X_1 + \varepsilon_1 \\
Y_2 &= \beta_0 + \beta_1 X_2 + \varepsilon_2 \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 X_n + \varepsilon_n
\end{aligned}
\]

can be written as

\[
Y = X\beta + \mathcal{E}
\]
-
Regression Models: Important Calculations

- Other calculations:

\[
X^T X =
\begin{pmatrix} 1 & 1 & \dots & 1 \\ X_1 & X_2 & \dots & X_n \end{pmatrix}
\begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}
= \begin{pmatrix}
n & \sum_{i=1}^{n} X_i \\
\sum_{i=1}^{n} X_i & \sum_{i=1}^{n} X_i^2
\end{pmatrix}
\]
-
Regression Models: Important Calculations (Cont.)

\[
X^T Y =
\begin{pmatrix} 1 & 1 & \dots & 1 \\ X_1 & X_2 & \dots & X_n \end{pmatrix}
\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
= \begin{pmatrix} \sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} X_i Y_i \end{pmatrix}
\]

and

\[
Y^T Y =
\begin{pmatrix} Y_1 & Y_2 & \dots & Y_n \end{pmatrix}
\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
= \sum_{i=1}^{n} Y_i^2
\]
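Since every entry of $X^T X$, $X^T Y$, and $Y^T Y$ is just a sum, they are cheap to compute directly. The data values below are hypothetical, chosen only to illustrate the formulas:

```python
# Hypothetical data (not from the notes), used to evaluate the
# entries of X^T X, X^T Y and Y^T Y via the sums on the slides.
X_vals = [1.0, 2.0, 3.0, 4.0]
Y_vals = [2.1, 3.9, 6.2, 7.8]
n = len(X_vals)

XtX = [[n,           sum(X_vals)],
       [sum(X_vals), sum(x * x for x in X_vals)]]
XtY = [sum(Y_vals), sum(x * y for x, y in zip(X_vals, Y_vals))]
YtY = sum(y * y for y in Y_vals)

print(XtX)  # [[4, 10.0], [10.0, 30.0]]
```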
-
Special types of matrices

- Symmetric matrix: $A = A^T$, for example

\[
A = \begin{pmatrix} 1 & 3 & 5 \\ 3 & 2 & 4 \\ 5 & 4 & 9 \end{pmatrix}
\]

- Diagonal matrix: a square matrix whose off-diagonal elements are all zero.

\[
B = \begin{pmatrix} b_{11} & 0 & 0 \\ 0 & b_{22} & 0 \\ 0 & 0 & b_{33} \end{pmatrix}
\]

- Identity matrix:

\[
I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]

Facts: for any matrices of appropriate dimension, $AI = A$ and $IB = B$.
-
Special types of matrices

- Zero vector and unit vector:

\[
\mathbf{0} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad
\mathbf{1} = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}
\]

- Inverse of a square matrix: the inverse of a square matrix $A$ is another square matrix, denoted by $A^{-1}$, such that $AA^{-1} = A^{-1}A = I$. Since

\[
\begin{pmatrix} -0.1 & 0.4 \\ 0.3 & -0.2 \end{pmatrix}
\begin{pmatrix} 2 & 4 \\ 3 & 1 \end{pmatrix}
= I =
\begin{pmatrix} 2 & 4 \\ 3 & 1 \end{pmatrix}
\begin{pmatrix} -0.1 & 0.4 \\ 0.3 & -0.2 \end{pmatrix}
\]

for $A = \begin{pmatrix} 2 & 4 \\ 3 & 1 \end{pmatrix}$ we have

\[
A^{-1} = \begin{pmatrix} -0.1 & 0.4 \\ 0.3 & -0.2 \end{pmatrix}
\]
-
Finding the Inverse of a matrix

- For a $2 \times 2$ matrix we can easily find the inverse: if

\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\]

then

\[
A^{-1} = \begin{pmatrix} d/D & -b/D \\ -c/D & a/D \end{pmatrix},
\quad \text{where } D = ad - bc
\]

- For a matrix of higher dimension, the inverse is not easy to calculate by hand.
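The $2 \times 2$ inverse formula translates into a few lines of code. This is a minimal sketch; the function name and the singularity check are my additions:

```python
def inv2x2(A):
    """Inverse of a 2x2 matrix [[a, b], [c, d]] via D = ad - bc."""
    (a, b), (c, d) = A
    D = a * d - b * c
    if D == 0:
        raise ValueError("matrix is singular (D = 0)")
    return [[d / D, -b / D],
            [-c / D, a / D]]

# The matrix from the inverse example on the previous slide:
A = [[2, 4],
     [3, 1]]
print(inv2x2(A))  # [[-0.1, 0.4], [0.3, -0.2]]
```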
-
Regression Example (continued)

- To find the inverse of the matrix

\[
X^T X = \begin{pmatrix}
n & \sum_{i=1}^{n} X_i \\
\sum_{i=1}^{n} X_i & \sum_{i=1}^{n} X_i^2
\end{pmatrix}
\]

compute its determinant:

\[
D = n \sum_{i=1}^{n} X_i^2 - \Big(\sum_{i=1}^{n} X_i\Big)^2
  = n \left[ \sum_{i=1}^{n} X_i^2 - \frac{\big(\sum_{i=1}^{n} X_i\big)^2}{n} \right]
  = n \sum_{i=1}^{n} (X_i - \bar{X})^2
\]
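The determinant identity $n \sum X_i^2 - (\sum X_i)^2 = n \sum (X_i - \bar{X})^2$ can be spot-checked numerically; the data values below are hypothetical:

```python
# Numerical check of the determinant identity on illustrative data
# (the values are my own, not from the notes).
X_vals = [2.0, 5.0, 7.0, 11.0]
n = len(X_vals)
Xbar = sum(X_vals) / n

D_left = n * sum(x * x for x in X_vals) - sum(X_vals) ** 2
D_right = n * sum((x - Xbar) ** 2 for x in X_vals)
print(D_left, D_right)  # both equal 171.0
```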
-
Regression Example (continued)

So

\[
(X^T X)^{-1} =
\begin{pmatrix}
\dfrac{\sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n} (X_i - \bar{X})^2} &
\dfrac{-\sum_{i=1}^{n} X_i}{n \sum_{i=1}^{n} (X_i - \bar{X})^2} \\[2ex]
\dfrac{-\sum_{i=1}^{n} X_i}{n \sum_{i=1}^{n} (X_i - \bar{X})^2} &
\dfrac{n}{n \sum_{i=1}^{n} (X_i - \bar{X})^2}
\end{pmatrix}
\]
-
Use of Inverse Matrix

- Suppose we want to solve the two equations

\[
\begin{aligned}
2Y_1 + 4Y_2 &= 20 \\
3Y_1 + Y_2 &= 10
\end{aligned}
\]

Rewriting the equations in matrix notation,

\[
\begin{pmatrix} 2 & 4 \\ 3 & 1 \end{pmatrix}
\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}
= \begin{pmatrix} 20 \\ 10 \end{pmatrix}
\]

the solution to the equations is

\[
\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}
= \begin{pmatrix} 2 & 4 \\ 3 & 1 \end{pmatrix}^{-1}
\begin{pmatrix} 20 \\ 10 \end{pmatrix}
= \begin{pmatrix} -0.1 & 0.4 \\ 0.3 & -0.2 \end{pmatrix}
\begin{pmatrix} 20 \\ 10 \end{pmatrix}
= \begin{pmatrix} 2 \\ 4 \end{pmatrix}
\]

- Estimating a regression model requires solving linear equations, and the inverse matrix is very useful for this.
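The worked system above can be reproduced in a few lines, reusing the $2 \times 2$ inverse formula from earlier (an illustrative sketch; the helper names are my own):

```python
def inv2x2(A):
    # 2x2 inverse via D = ad - bc, as in the notes
    (a, b), (c, d) = A
    D = a * d - b * c
    return [[d / D, -b / D], [-c / D, a / D]]

def matvec(A, v):
    # Multiply a matrix by a column vector
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# The system  2*Y1 + 4*Y2 = 20,  3*Y1 + Y2 = 10:
A = [[2, 4],
     [3, 1]]
b = [20, 10]
Y = matvec(inv2x2(A), b)
print(Y)  # [2.0, 4.0]
```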
-
Other basic facts for matrices

- $A + B = B + A$
- $C(A + B) = CA + CB$
- $(A^T)^T = A$
- $(AB)^T = B^T A^T$
- $(A^{-1})^{-1} = A$
- $(AB)^{-1} = B^{-1} A^{-1}$
- $(A^T)^{-1} = (A^{-1})^T$
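Identities like $(AB)^T = B^T A^T$ can be spot-checked numerically; the matrices below are arbitrary illustrative choices, not from the notes:

```python
def transpose(M):
    return [[M[i][j] for i in range(len(M))] for j in range(len(M[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Spot-check (AB)^T = B^T A^T on small matrices with made-up entries.
A = [[1, 2], [3, 4], [5, 6]]   # 3 x 2
B = [[7, 8, 9], [0, 1, 2]]     # 2 x 3
lhs = transpose(matmul(A, B))
rhs = matmul(transpose(B), transpose(A))
print(lhs == rhs)  # True
```

Note the order reversal: $B^T$ is $3 \times 2$ and $A^T$ is $2 \times 3$, so $B^T A^T$ is defined while $A^T B^T$ would not be conformable here.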
-
Random vectors and matrices

- Random vector:

\[
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix}
\]

- Expectation of a random vector:

\[
E(Y) = \begin{pmatrix} E(Y_1) \\ E(Y_2) \\ E(Y_3) \end{pmatrix}
\]

- For any random vectors

\[
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix}, \quad
Z = \begin{pmatrix} Z_1 \\ Z_2 \\ Z_3 \end{pmatrix}
\]

we have

\[
E(Y + Z) = E(Y) + E(Z)
\]
-
Random vectors and matrices

- Variance-covariance matrix of a random vector:

\[
\mathrm{Var}(Y) = E\big[(Y - E(Y))(Y - E(Y))^T\big]
= \begin{pmatrix}
\mathrm{Var}(Y_1) & \mathrm{Cov}(Y_1, Y_2) & \mathrm{Cov}(Y_1, Y_3) \\
\mathrm{Cov}(Y_2, Y_1) & \mathrm{Var}(Y_2) & \mathrm{Cov}(Y_2, Y_3) \\
\mathrm{Cov}(Y_3, Y_1) & \mathrm{Cov}(Y_3, Y_2) & \mathrm{Var}(Y_3)
\end{pmatrix}
\]

- In the simple linear regression model the errors are uncorrelated, so $\mathrm{Var}(\mathcal{E}) = \sigma^2 I$. For example, for $n = 3$,

\[
\mathrm{Var}(\mathcal{E}) = \begin{pmatrix}
\sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2
\end{pmatrix}
\]
-
Some basic facts

- If a random vector $W$ equals a random vector $Y$ multiplied by a constant matrix $A$,

\[
W = AY,
\]

we have

\[
E(W) = A\,E(Y), \qquad
\mathrm{Var}(W) = \mathrm{Var}(AY) = A\,\mathrm{Var}(Y)\,A^T
\]

- If $c$ is a constant vector, then

\[
E(c + AY) = c + A\,E(Y)
\]

and

\[
\mathrm{Var}(c + AY) = \mathrm{Var}(AY) = A\,\mathrm{Var}(Y)\,A^T
\]
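The rule $\mathrm{Var}(AY) = A\,\mathrm{Var}(Y)\,A^T$ can be checked on concrete numbers. Below, $A$ is the $2 \times 2$ matrix used in the illustration that follows, and $S$ is a hypothetical variance-covariance matrix of my own choosing:

```python
def transpose(M):
    return [[M[i][j] for i in range(len(M))] for j in range(len(M[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, -1],
     [1, 1]]
S = [[4, 1],
     [1, 9]]   # hypothetical: Var(Y1)=4, Var(Y2)=9, Cov(Y1,Y2)=1

VarW = matmul(matmul(A, S), transpose(A))
print(VarW)  # [[11, -5], [-5, 15]]
```

The result agrees with the scalar rules: $\mathrm{Var}(Y_1 - Y_2) = 4 + 9 - 2 \cdot 1 = 11$ and $\mathrm{Var}(Y_1 + Y_2) = 4 + 9 + 2 \cdot 1 = 15$, and the output is symmetric, as a variance-covariance matrix must be.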
-
An illustration: Example

- Let $W = AY$ be such that

\[
\begin{pmatrix} W_1 \\ W_2 \end{pmatrix}
= \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}
\]

Then

\[
E\left[\begin{pmatrix} W_1 \\ W_2 \end{pmatrix}\right]
= \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} E(Y_1) \\ E(Y_2) \end{pmatrix}
= \begin{pmatrix} E(Y_1) - E(Y_2) \\ E(Y_1) + E(Y_2) \end{pmatrix}
\]

- Also

\[
\mathrm{Var}\left[\begin{pmatrix} W_1 \\ W_2 \end{pmatrix}\right]
= \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix}
\mathrm{Var}(Y_1) & \mathrm{Cov}(Y_1, Y_2) \\
\mathrm{Cov}(Y_2, Y_1) & \mathrm{Var}(Y_2)
\end{pmatrix}
\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}
\]
-
An illustration: Example 2

- In the simple linear regression model $Y = X\beta + \mathcal{E}$, it follows from the above that

\[
\mathrm{Var}(Y) = \mathrm{Var}(X\beta + \mathcal{E}) = \mathrm{Var}(\mathcal{E})
= \begin{pmatrix}
\sigma^2 & 0 & \dots & 0 \\
0 & \sigma^2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & \sigma^2
\end{pmatrix}
= \sigma^2 I
\]
-
Simple linear regression model (matrix version)

The model

\[
\begin{aligned}
Y_1 &= \beta_0 + \beta_1 X_1 + \varepsilon_1 \\
Y_2 &= \beta_0 + \beta_1 X_2 + \varepsilon_2 \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 X_n + \varepsilon_n
\end{aligned}
\]

with assumptions

1. $E(\varepsilon_i) = 0$, $i = 1, 2, \dots, n$
2. $\mathrm{Var}(\varepsilon_i) = \sigma^2$, $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $1 \le i \ne j \le n$
3. $\varepsilon_i \sim N(0, \sigma^2)$, $i = 1, \dots, n$, are independent
-
Simple linear regression model (matrix version)

Recall, the model can be written as

\[
Y = X\beta + \mathcal{E}
\]

Note that

\[
E(\mathcal{E}) = \mathbf{0}, \quad
\mathrm{Var}(\mathcal{E}) = \begin{pmatrix}
\sigma^2 & 0 & \dots & 0 \\
0 & \sigma^2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & \sigma^2
\end{pmatrix}
= \sigma^2 I
\]

The assumptions can be rewritten as

1. $E(\mathcal{E}) = \mathbf{0}$
2. $\mathrm{Var}(\mathcal{E}) = \sigma^2 I$
3. $\mathcal{E} \sim N(\mathbf{0}, \sigma^2 I)$
-
Simple linear regression model (matrix version)

Thus the model

\[
Y = X\beta + \mathcal{E}
\]

is such that

\[
E(Y) = X\beta \quad \text{and} \quad \mathrm{Var}(Y) = \sigma^2 I
\]

The model (with assumptions 1, 2, and 3) can also be written as

\[
Y \sim N(X\beta, \sigma^2 I)
\]

or

\[
Y = X\beta + \mathcal{E}, \quad \mathcal{E} \sim N(\mathbf{0}, \sigma^2 I)
\]
-
Linear Dependence and Rank of Matrix

Consider the following matrix:

\[
A = \begin{pmatrix}
1 & 2 & 5 & 1 \\
2 & 2 & 10 & 6 \\
3 & 4 & 15 & 1
\end{pmatrix}
\]

Note that the third column vector is a multiple of the first column vector:

\[
\begin{pmatrix} 5 \\ 10 \\ 15 \end{pmatrix}
= 5 \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
\]

We say that the columns of $A$ are linearly dependent. If no vector in the set can be so expressed, we define the set of vectors to be linearly independent.
Dr. Bisher M. Iqelan (Department of Math.)3: Simple Linear Regression (Matrix Version) 2010-2011, Semester 2 29 / 77
-
Linear Dependence and Rank of Matrix (Cont..)
Definition
When c scalars k1, ..., kc, not all zero, can be found such that

    k1·C1 + k2·C2 + ⋯ + kc·Cc = 0,

where 0 denotes the zero column vector, the c column vectors are linearly dependent. If the only set of scalars for which the equality holds is k1 = 0, ..., kc = 0, the set of c column vectors is linearly independent.

To illustrate for our example, k1 = 5, k2 = 0, k3 = −1, k4 = 0 leads to:

         1          2           5          1      0
    5 ·  2   + 0 ·  2   − 1 ·  10   + 0 ·  6   =  0
         3          4          15          1      0

Hence, the column vectors are linearly dependent. Note that some of the kj equal zero here. For linear dependence, it is only required that not all kj be zero.
Dr. Bisher M. Iqelan (Department of Math.)3: Simple Linear Regression (Matrix Version) 2010-2011, Semester 2 30 / 77
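The linear-dependence check above is easy to verify numerically. A minimal sketch, using NumPy (the library choice is mine, not the slides'): the combination k1·C1 + ⋯ + k4·C4 is just the matrix-vector product A·k.

```python
import numpy as np

# The matrix A from the slide above
A = np.array([[1, 2, 5, 1],
              [2, 2, 10, 6],
              [3, 4, 15, 1]])

# The scalars from the slide: k1 = 5, k2 = 0, k3 = -1, k4 = 0
k = np.array([5, 0, -1, 0])

# k1*C1 + k2*C2 + k3*C3 + k4*C4 is exactly the matrix-vector product A @ k
combo = A @ k
print(combo)  # -> [0 0 0]: the columns are linearly dependent
```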
-
Rank of Matrix
Definition (Rank of Matrix)
The rank of a matrix is defined to be the maximum number of linearly independent columns in the matrix.

We know that the rank of A in our earlier example cannot be 4, since the four columns are linearly dependent. We can, however, find three columns (1, 2, and 4) which are linearly independent: there are no scalars k1, k2, k4 such that k1C1 + k2C2 + k4C4 = 0 other than k1 = k2 = k4 = 0. Thus, the rank of A in our example is 3.

The rank of a matrix is unique and can equivalently be defined as the maximum number of linearly independent rows. It follows that the rank of an r × c matrix cannot exceed min(r, c), the minimum of the two values r and c.
Dr. Bisher M. Iqelan (Department of Math.)3: Simple Linear Regression (Matrix Version) 2010-2011, Semester 2 31 / 77
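The rank of the example matrix can be confirmed numerically; a quick sketch, again assuming NumPy:

```python
import numpy as np

A = np.array([[1, 2, 5, 1],
              [2, 2, 10, 6],
              [3, 4, 15, 1]])

# rank(A) cannot exceed min(r, c) = min(3, 4) = 3; here it is exactly 3,
# because columns 1, 2, and 4 are linearly independent
r = np.linalg.matrix_rank(A)
print(r)  # -> 3
```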
-
Least Squares Estimation of Regression Parameters
As we have shown, the normal equations are

    Σ eᵢ = 0      ≡    n β̂0 + β̂1 ΣXᵢ = ΣYᵢ

    Σ Xᵢeᵢ = 0    ≡    β̂0 ΣXᵢ + β̂1 ΣXᵢ² = ΣXᵢYᵢ

In matrix notation, the normal equations are

    XᵀX  β̂  =  XᵀY                                   (2)
    2×2  2×1    2×1

where β̂ is the vector of the least squares regression coefficients:

    β̂ = (β̂0, β̂1)ᵀ
Dr. Bisher M. Iqelan (Department of Math.)3: Simple Linear Regression (Matrix Version) 2010-2011, Semester 2 32 / 77
-
Least Squares Estimation of Regression Parameters
To see this, recall that we obtained

    XᵀX = ( n      ΣXᵢ  )         XᵀY = ( ΣYᵢ   )
          ( ΣXᵢ    ΣXᵢ² ),              ( ΣXᵢYᵢ )

Equation (2) thus states:

    ( n      ΣXᵢ  ) ( β̂0 )     ( ΣYᵢ   )
    ( ΣXᵢ    ΣXᵢ² ) ( β̂1 )  =  ( ΣXᵢYᵢ )

or equivalently:

    ( n β̂0 + β̂1 ΣXᵢ     )     ( ΣYᵢ   )
    ( β̂0 ΣXᵢ + β̂1 ΣXᵢ²  )  =  ( ΣXᵢYᵢ )

These are precisely the normal equations we derived before.
Dr. Bisher M. Iqelan (Department of Math.)3: Simple Linear Regression (Matrix Version) 2010-2011, Semester 2 33 / 77
-
Further, let X̄ = (1/n) ΣXᵢ and Ȳ = (1/n) ΣYᵢ, and define

    sxx = Σ (Xᵢ − X̄)² = ΣXᵢ² − nX̄² = ΣXᵢ² − (ΣXᵢ)²/n

    sxy = Σ (Xᵢ − X̄)(Yᵢ − Ȳ) = ΣXᵢYᵢ − nX̄Ȳ = ΣXᵢYᵢ − (ΣXᵢ)(ΣYᵢ)/n

- sxx is the corrected sum of squares of the X-values.
- sxy is the corrected sum of products of the X- and Y-values.
Dr. Bisher M. Iqelan (Department of Math.)3: Simple Linear Regression (Matrix Version) 2010-2011, Semester 2 34 / 77
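The two forms of sxx and sxy above (the definition and the "corrected sum" shortcut) can be checked against each other numerically. A sketch using NumPy on the Example 1 data from later in these slides:

```python
import numpy as np

# X- and Y-values of Example 1 from these slides (without the intercept column)
X = np.array([4., 1., 2., 3., 3., 4.])
Y = np.array([16., 5., 10., 15., 13., 22.])
n = len(X)

# Definition form vs. computational ("corrected sum") form of sxx and sxy
sxx_def = np.sum((X - X.mean())**2)
sxx_alt = np.sum(X**2) - X.sum()**2 / n
sxy_def = np.sum((X - X.mean()) * (Y - Y.mean()))
sxy_alt = np.sum(X * Y) - X.sum() * Y.sum() / n

print(sxx_def, sxx_alt)  # both forms agree
print(sxy_def, sxy_alt)
```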
-
With this notation we can write

    XᵀX = ( n      ΣXᵢ  )     ( n     nX̄        )
          ( ΣXᵢ    ΣXᵢ² )  =  ( nX̄    sxx + nX̄² )

    XᵀY = ( ΣYᵢ   )     ( nȲ         )
          ( ΣXᵢYᵢ )  =  ( sxy + nX̄Ȳ  )

Hence

    (XᵀX)⁻¹ = (1 / (n·sxx)) ( sxx + nX̄²    −nX̄ )
                            ( −nX̄            n  )

            = ( 1/n + X̄²/sxx    −X̄/sxx )
              ( −X̄/sxx           1/sxx  )
Dr. Bisher M. Iqelan (Department of Math.)3: Simple Linear Regression (Matrix Version) 2010-2011, Semester 2 35 / 77
-
Then, assuming that rank(X) = 2 (i.e., that the Xᵢ's are not all equal), we get

    β̂ = ( β̂0 ) = (XᵀX)⁻¹ XᵀY                         (3)
        ( β̂1 )

      = ( 1/n + X̄²/sxx    −X̄/sxx ) ( nȲ        )
        ( −X̄/sxx           1/sxx  ) ( sxy + nX̄Ȳ )

      = ( Ȳ − (sxy/sxx) X̄ )
        ( sxy/sxx          )

i.e.

    β̂0 = Ȳ − β̂1 X̄    and    β̂1 = sxy / sxx
Dr. Bisher M. Iqelan (Department of Math.)3: Simple Linear Regression (Matrix Version) 2010-2011, Semester 2 36 / 77
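Equation (3) and the closed-form expressions for β̂0 and β̂1 should give the same answer, which is easy to confirm numerically. A sketch with NumPy (the data are the Example 1 values from the next slide; any x-values that are not all equal would do):

```python
import numpy as np

# Any data with not-all-equal x-values works; these are the Example 1 values
x = np.array([4., 1., 2., 3., 3., 4.])
y = np.array([16., 5., 10., 15., 13., 22.])
n = len(x)

# Matrix route: solve the normal equations (X'X) beta = X'y
X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Closed-form route from the slide: beta1 = sxy/sxx, beta0 = ybar - beta1*xbar
sxx = np.sum((x - x.mean())**2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = sxy / sxx
b0 = y.mean() - b1 * x.mean()

print(beta)      # the two routes agree
print(b0, b1)
```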
-
Estimated Regression Coefficients: An example
Example 1: In some study, consider the following information:

         16               1  4
          5               1  1
    Y =  10   ;     X =   1  2
         15               1  3
         13               1  3
         22               1  4

Now, let us do the required calculations:

    XᵀX = ( 6   17 )          XᵀY = (  81 )
          ( 17  55 );               ( 261 )

Finally,

    β̂ = ( 6   17 )⁻¹ (  81 )     ( 0.439 )
        ( 17  55 )   ( 261 )  =  ( 4.610 )

Hence, β̂0 = 0.439 and β̂1 = 4.610.
Dr. Bisher M. Iqelan (Department of Math.)3: Simple Linear Regression (Matrix Version) 2010-2011, Semester 2 37 / 77
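The arithmetic of Example 1 can be reproduced in a few lines; a sketch assuming NumPy:

```python
import numpy as np

# Example 1: design matrix (intercept column plus X-values) and response
X = np.array([[1, 4], [1, 1], [1, 2], [1, 3], [1, 3], [1, 4]], dtype=float)
Y = np.array([16, 5, 10, 15, 13, 22], dtype=float)

XtX = X.T @ X                     # [[6, 17], [17, 55]]
XtY = X.T @ Y                     # [81, 261]
beta = np.linalg.solve(XtX, XtY)  # exactly (18/41, 189/41)

print(beta)  # beta0 ~ 0.439, beta1 ~ 4.610, matching the slide
```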
-
Estimated Regression Coefficients: An example
Example 2: For the ozone data used before,

         242               1  0.02
    Y =  237   ;     X =   1  0.07
         231               1  0.11
         201               1  0.15

give

    XᵀX = ( 4       0.3500 )          XᵀY = ( 911   )
          ( 0.3500  0.0399 ),               ( 76.99 )

and then

    (XᵀX)⁻¹ = (  1.07547    −9.43396  )
              ( −9.43396   107.81671  )

Hence, the estimates of the regression coefficients are

    β̂ = (XᵀX)⁻¹ XᵀY = (  253.434 )
                      ( −293.531 )
Dr. Bisher M. Iqelan (Department of Math.)3: Simple Linear Regression (Matrix Version) 2010-2011, Semester 2 38 / 77
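Example 2 can be reproduced the same way; a NumPy sketch:

```python
import numpy as np

# Example 2 (ozone data)
X = np.array([[1, 0.02], [1, 0.07], [1, 0.11], [1, 0.15]])
Y = np.array([242., 237., 231., 201.])

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ Y

print(XtX_inv)  # ~ [[1.07547, -9.43396], [-9.43396, 107.81671]]
print(beta)     # ~ [253.434, -293.531], matching the slide
```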
-
Properties of β̂
β̂ has the following properties when XᵀX is invertible:

- E(β̂) = β (i.e., β̂ is unbiased).
- Var(β̂) = σ²(XᵀX)⁻¹
    E(β̂) = E((XᵀX)⁻¹XᵀY) = (XᵀX)⁻¹Xᵀ E(Y)
         = (XᵀX)⁻¹XᵀXβ = (XᵀX)⁻¹(XᵀX)β = β
    Var(β̂) = Var((XᵀX)⁻¹XᵀY)
            = (XᵀX)⁻¹Xᵀ Var(Y) ((XᵀX)⁻¹Xᵀ)ᵀ
            = (XᵀX)⁻¹Xᵀ (σ²I) ((XᵀX)⁻¹Xᵀ)ᵀ
            = σ²(XᵀX)⁻¹Xᵀ ((XᵀX)⁻¹Xᵀ)ᵀ
            = σ²(XᵀX)⁻¹XᵀX(XᵀX)⁻¹ = σ²(XᵀX)⁻¹
Dr. Bisher M. Iqelan (Department of Math.)3: Simple Linear Regression (Matrix Version) 2010-2011, Semester 2 39 / 77
Under normality, β̂ ∼ N₂(β, σ²(XᵀX)⁻¹).
Dr. Bisher M. Iqelan (Department of Math.)3: Simple Linear Regression (Matrix Version) 2010-2011, Semester 2 39 / 77
-
Quiz
Use matrix notation for the simple linear model to find Var(β̂0), Var(β̂1), and Cov(β̂0, β̂1).

Solution: We have shown that

    Var(β̂) = σ²(XᵀX)⁻¹ = σ² ( 1/n + X̄²/sxx    −X̄/sxx )
                             ( −X̄/sxx           1/sxx  )

Hence,

    Var(β̂0) = σ² (1/n + X̄²/sxx)

    Var(β̂1) = σ² · (1/sxx) = σ²/sxx

    Cov(β̂0, β̂1) = −(X̄/sxx) σ²
Dr. Bisher M. Iqelan (Department of Math.)3: Simple Linear Regression (Matrix Version) 2010-2011, Semester 2 40 / 77
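The quiz answer can be checked numerically by comparing the entries of (XᵀX)⁻¹ with the closed-form expressions. A NumPy sketch on the Example 1 x-values; since σ² only scales Var(β̂), we compare (XᵀX)⁻¹ directly:

```python
import numpy as np

# Example 1 x-values; sigma^2 only scales Var(beta-hat), so it drops out
x = np.array([4., 1., 2., 3., 3., 4.])
n = len(x)
X = np.column_stack([np.ones(n), x])

sxx = np.sum((x - x.mean())**2)
cov = np.linalg.inv(X.T @ X)  # equals Var(beta-hat) / sigma^2

print(cov[0, 0], 1/n + x.mean()**2 / sxx)  # Var(beta0-hat)/sigma^2, both ways
print(cov[1, 1], 1/sxx)                    # Var(beta1-hat)/sigma^2
print(cov[0, 1], -x.mean() / sxx)          # Cov(beta0-hat, beta1-hat)/sigma^2
```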
-
The Ŷ and Residuals Vectors
Let the vector of the fitted values Ŷᵢ be denoted by Ŷ:

    Ŷ = (Ŷ1, Ŷ2, ..., Ŷn)ᵀ

In matrix notation, we then have:

    Ŷ   =  X   β̂                                     (4)
    n×1    n×2 2×1

because:

    ( Ŷ1 )     ( 1  X1 )            ( β̂0 + β̂1X1 )
    ( Ŷ2 )  =  ( 1  X2 )  ( β̂0 ) =  ( β̂0 + β̂1X2 )
    ( ⋮  )     ( ⋮   ⋮ )  ( β̂1 )    (     ⋮     )
    ( Ŷn )     ( 1  Xn )            ( β̂0 + β̂1Xn )
Dr. Bisher M. Iqelan (Department of Math.)3: Simple Linear Regression (Matrix Version) 2010-2011, Semester 2 41 / 77
-
Hat Matrix
We can express the matrix result for Ŷ in (4) as follows, using the expression (3) for β̂:

    Ŷ = Xβ̂ = X(XᵀX)⁻¹XᵀY

or, equivalently:

    Ŷ   =  H   Y                                     (5)
    n×1    n×n n×1

i.e., H puts a hat on Y! Here

    H = X(XᵀX)⁻¹Xᵀ

The square n × n matrix H is called the hat matrix and plays an important role in the theory of linear models. It is clear that H involves only the observations on the predictor variable X. We see from (5) that the fitted values Ŷᵢ can be expressed as linear combinations of the response variable observations Yᵢ, with the coefficients being elements of the matrix H.
Dr. Bisher M. Iqelan (Department of Math.)3: Simple Linear Regression (Matrix Version) 2010-2011, Semester 2 42 / 77
-
Hat Matrix: Example
For the ozone data used in Example 2,

H = X(XTX)−1XT

  [ 1 0.02 ]
= [ 1 0.07 ]  [  1.0755   −9.4340 ]  [ 1    1    1    1    ]
  [ 1 0.11 ]  [ −9.4340  107.8167 ]  [ 0.02 0.07 0.11 0.15 ]
  [ 1 0.15 ]

  [  .741240  .377358  .086253  −.204852 ]
= [  .377358  .283019  .207547   .132075 ]
  [  .086253  .207547  .304582   .401617 ]
  [ −.204852  .132075  .401617   .671159 ]
Thus, for example,
Ŷ1 = .741Y1 + .377Y2 + .086Y3 − .205Y4.
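As a quick numerical check, the hat matrix above can be reproduced in a few lines of NumPy. The predictor values 0.02, 0.07, 0.11, 0.15 are taken from the example; everything else follows directly from H = X(XTX)−1XT.

```python
import numpy as np

# Design matrix for the ozone example: a column of ones plus the X values
X = np.column_stack([np.ones(4), [0.02, 0.07, 0.11, 0.15]])

# Hat matrix H = X (X^T X)^{-1} X^T
XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T

print(np.round(XtX_inv, 4))  # the 2x2 inverse shown in the example
print(np.round(H, 6))        # the 4x4 hat matrix shown in the example
```

The first row of H gives exactly the coefficients .741, .377, .086, −.205 used to form Ŷ1 above. Note also that the trace of H equals 2, the number of columns of X.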
-
Properties of the Hat matrix
- H is symmetric and idempotent (the latter means H2 = H).
- I − H is symmetric and idempotent.
- HX = X.
- (I − H)X = 0.
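All four properties are easy to verify numerically for any full-rank design matrix. A minimal sketch with NumPy, using an arbitrary simulated design:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # arbitrary full-rank design
H = X @ np.linalg.inv(X.T @ X) @ X.T
I = np.eye(n)

assert np.allclose(H, H.T)                     # H is symmetric
assert np.allclose(H @ H, H)                   # H is idempotent
assert np.allclose((I - H) @ (I - H), I - H)   # I - H is idempotent
assert np.allclose(H @ X, X)                   # HX = X
assert np.allclose((I - H) @ X, 0)             # (I - H)X = 0
```

The last two properties say that H leaves the column space of X fixed, while I − H annihilates it; this is exactly what one expects of an orthogonal projection onto the column space of X.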
-
Properties of the fitted vector
Vector of fitted values: Ŷ = Xβ̂.

- EŶ = Xβ
- Var Ŷ = σ2H

E(Ŷ) = E(HY) = HE(Y) = HXβ = Xβ.
The variance-covariance matrix of Ŷ can be derived using either the relationship Ŷ = Xβ̂ or Ŷ = HY. Applying the rules for variances of linear functions to the first relationship gives

Var(Ŷ) = X[Var(β̂)]XT
       = X[σ2(XTX)−1]XT
       = X(XTX)−1XT σ2
       = Hσ2
The derivation using the second relationship gives

Var(Ŷ) = H[Var(Y)]HT = HHT σ2 = Hσ2,

since H is symmetric and idempotent, so HHT = HH = H.
When E is normally distributed, Ŷ ∼ Nn(Xβ, Hσ2).
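Both routes to Var(Ŷ) can be checked numerically. A small sketch using the ozone design matrix from the earlier example and an assumed (illustrative) error variance σ2 = 2.5:

```python
import numpy as np

sigma2 = 2.5  # assumed error variance, for illustration only
X = np.column_stack([np.ones(4), [0.02, 0.07, 0.11, 0.15]])
XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T

# Route 1: Var(Y-hat) = X [sigma^2 (X^T X)^{-1}] X^T
var_route1 = X @ (sigma2 * XtX_inv) @ X.T

# Route 2: Var(Y-hat) = H [sigma^2 I] H^T, using H H^T = H
var_route2 = H @ (sigma2 * np.eye(4)) @ H.T

# Both reduce to sigma^2 H
assert np.allclose(var_route1, sigma2 * H)
assert np.allclose(var_route2, sigma2 * H)
```

The diagonal entries of σ2H are the variances of the individual fitted values Ŷi, so fitted values at high-leverage points (large Hii) are the most variable.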