Lecture 10A: Matrix Algebra


Page 1:

Lecture 10A: Matrix Algebra

Matrices: an array of elements
Vectors: column vector, row vector
Square matrix
Dimensionality of a matrix: r x c (rows x columns)

Page 2:

Matrices: an array of elements

$$\mathbf{a} = \begin{pmatrix} 12 \\ 13 \\ 47 \end{pmatrix} \quad
\mathbf{b} = \begin{pmatrix} 2 & 0 & 5 & 21 \end{pmatrix} \quad
\mathbf{C} = \begin{pmatrix} 3 & 1 & 2 \\ 2 & 5 & 4 \\ 1 & 1 & 2 \end{pmatrix} \quad
\mathbf{D} = \begin{pmatrix} 0 & 1 \\ 3 & 4 \\ 2 & 9 \end{pmatrix}$$

Vectors: a is a column vector (3 x 1) and b is a row vector (1 x 4). Vectors are usually written in bold lowercase.

General matrices: C is a square matrix (3 x 3), and D is (3 x 2). General matrices are usually written in bold uppercase.

Dimensionality of a matrix: r x c (rows x columns) -- think of "Railroad Car".

A matrix is defined by a list of its elements. B has ij-th element B_ij -- the element in row i and column j.
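This shape bookkeeping is easy to check numerically. A minimal sketch assuming NumPy is available (the array values follow the reconstructed slide examples above):

```python
import numpy as np

a = np.array([[12], [13], [47]])           # column vector, (3 x 1)
b = np.array([[2, 0, 5, 21]])              # row vector, (1 x 4)
C = np.array([[3, 1, 2],
              [2, 5, 4],
              [1, 1, 2]])                  # square matrix, (3 x 3)
D = np.array([[0, 1],
              [3, 4],
              [2, 9]])                     # (3 x 2)

print(a.shape, b.shape, C.shape, D.shape)  # (3, 1) (1, 4) (3, 3) (3, 2)
print(C[1, 2])  # the ij-th element C_23 (row 2, column 3, 1-based): 4
```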

Page 3:

Addition and Subtraction of Matrices

If two matrices have the same dimension (same number of rows and columns), then matrix addition and subtraction simply follow by adding (or subtracting) on an element-by-element basis.

Matrix addition: (A+B)_ij = A_ij + B_ij

Matrix subtraction: (A-B)_ij = A_ij - B_ij

Examples:

$$\mathbf{A} = \begin{pmatrix} 3 & 0 \\ 1 & 2 \end{pmatrix} \quad \text{and} \quad \mathbf{B} = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$$

$$\mathbf{C} = \mathbf{A} + \mathbf{B} = \begin{pmatrix} 4 & 2 \\ 3 & 3 \end{pmatrix} \quad \text{and} \quad \mathbf{D} = \mathbf{A} - \mathbf{B} = \begin{pmatrix} 2 & -2 \\ -1 & 1 \end{pmatrix}$$
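A quick NumPy check of the example (element-by-element, as defined above):

```python
import numpy as np

A = np.array([[3, 0], [1, 2]])
B = np.array([[1, 2], [2, 1]])
print(A + B)   # [[4 2] [3 3]]
print(A - B)   # [[ 2 -2] [-1  1]]
```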

Page 4:

C=0@[email protected]¢¢¢¢¢¢¢¢¢¢¢¢2...541...121CCCCA=µabdB∂( )… ……a=(3);b=(12);d=µ21∂;B=µ5412∂( () )

Partitioned MatricesIt will often prove useful to divide (or partition) the elements of a matrix into a matrix whose elements areitself matrices.

One can partition a matrix in many ways• A row vector whose elements are column vectorsC=(c1c2c3)wherec1=0@3211A;c2=0@1511A;c3=0@2421AA column vector whose elements are row vectorsC=0@b1b2b31Ab1=(312);b2=(254);b3=(112)
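The same partition can be expressed with array slicing; a minimal NumPy sketch (np.block reassembles the blocks):

```python
import numpy as np

C = np.array([[3, 1, 2],
              [2, 5, 4],
              [1, 1, 2]])
a, b = C[:1, :1], C[:1, 1:]    # a = (3), b = (1 2)
d, B = C[1:, :1], C[1:, 1:]    # d = (2 1)^T, B = [[5 4], [1 2]]
assert (np.block([[a, b], [d, B]]) == C).all()
```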

Page 5:

Towards Matrix Multiplication: Dot Products

The dot (or inner) product of two vectors (both of length n) is defined as follows:

$$\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^n a_i b_i$$

Example: for

$$\mathbf{a} = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} \quad \text{and} \quad \mathbf{b} = (4 \;\; 5 \;\; 7 \;\; 9),$$

a . b = 1*4 + 2*5 + 3*7 + 4*9 = 71
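In NumPy the same computation is np.dot (or the @ operator):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 5, 7, 9])
print(np.dot(a, b))   # 1*4 + 2*5 + 3*7 + 4*9 = 71
print(a @ b)          # same result
```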

Page 6:

Towards Matrix Multiplication: Systems of Equations

Matrix multiplication is defined in such a way as to compactly represent and solve linear systems of equations. The least-squares solution for the linear model

$$y = \mu + \beta_1 z_1 + \cdots + \beta_n z_n$$

yields the following system of equations for the β_i:

$$\begin{aligned}
\sigma(y, z_1) &= \beta_1 \sigma^2(z_1) + \beta_2 \sigma(z_1, z_2) + \cdots + \beta_n \sigma(z_1, z_n) \\
\sigma(y, z_2) &= \beta_1 \sigma(z_1, z_2) + \beta_2 \sigma^2(z_2) + \cdots + \beta_n \sigma(z_2, z_n) \\
&\;\;\vdots \\
\sigma(y, z_n) &= \beta_1 \sigma(z_1, z_n) + \beta_2 \sigma(z_2, z_n) + \cdots + \beta_n \sigma^2(z_n)
\end{aligned}$$

This can be more compactly written in matrix form as

$$\begin{pmatrix}
\sigma^2(z_1) & \sigma(z_1,z_2) & \cdots & \sigma(z_1,z_n) \\
\sigma(z_1,z_2) & \sigma^2(z_2) & \cdots & \sigma(z_2,z_n) \\
\vdots & \vdots & \ddots & \vdots \\
\sigma(z_1,z_n) & \sigma(z_2,z_n) & \cdots & \sigma^2(z_n)
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{pmatrix}
=
\begin{pmatrix} \sigma(y,z_1) \\ \sigma(y,z_2) \\ \vdots \\ \sigma(y,z_n) \end{pmatrix}$$

Page 7:

Matrix Multiplication: Quick overview

The order in which matrices are multiplied affects the matrix product; in general, AB ≠ BA.

For the product of two matrices to exist, the matrices must conform. For AB, the number of columns of A must equal the number of rows of B.

The matrix C = AB has the same number of rows as A and the same number of columns as B:

C_(r x c) = A_(r x k) B_(k x c)

Inner indices must match: columns of A = rows of B. Outer indices give the dimensions of the resulting matrix, with r rows (from A) and c columns (from B).

The ij-th element of C is the dot product of the elements in the i-th row of A with the elements in the j-th column of B:

$$C_{ij} = \sum_{l=1}^k A_{il} B_{lj}$$
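A short NumPy sketch of conformability and non-commutativity (random matrices, used only for their shapes):

```python
import numpy as np

A = np.random.rand(3, 2)    # (r x k) = (3 x 2)
B = np.random.rand(2, 4)    # (k x c) = (2 x 4)
C = A @ B
print(C.shape)              # (3, 4): r rows from A, c columns from B
# B @ A would raise ValueError: inner indices (4 vs. 3) do not match

S, T = np.random.rand(2, 2), np.random.rand(2, 2)
print(np.allclose(S @ T, T @ S))   # almost surely False: order matters
```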

Page 8:

More formally, consider the product L = MN, where M is r x c and N is c x b. Express the matrix M as a column vector of row vectors:

$$\mathbf{M} = \begin{pmatrix} \mathbf{m}_1 \\ \mathbf{m}_2 \\ \vdots \\ \mathbf{m}_r \end{pmatrix}
\quad \text{where} \quad \mathbf{m}_i = (M_{i1} \;\; M_{i2} \;\; \cdots \;\; M_{ic})$$

Likewise, express N as a row vector of column vectors:

$$\mathbf{N} = (\mathbf{n}_1 \;\; \mathbf{n}_2 \;\; \cdots \;\; \mathbf{n}_b)
\quad \text{where} \quad \mathbf{n}_j = \begin{pmatrix} N_{1j} \\ N_{2j} \\ \vdots \\ N_{cj} \end{pmatrix}$$

The ij-th element of L is the inner product of M's row i with N's column j:

$$\mathbf{L} = \begin{pmatrix}
\mathbf{m}_1 \cdot \mathbf{n}_1 & \mathbf{m}_1 \cdot \mathbf{n}_2 & \cdots & \mathbf{m}_1 \cdot \mathbf{n}_b \\
\mathbf{m}_2 \cdot \mathbf{n}_1 & \mathbf{m}_2 \cdot \mathbf{n}_2 & \cdots & \mathbf{m}_2 \cdot \mathbf{n}_b \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{m}_r \cdot \mathbf{n}_1 & \mathbf{m}_r \cdot \mathbf{n}_2 & \cdots & \mathbf{m}_r \cdot \mathbf{n}_b
\end{pmatrix}$$

Page 9:

Examples:

$$\mathbf{AB} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} e & f \\ g & h \end{pmatrix}
= \begin{pmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{pmatrix}$$

$$\mathbf{BA} = \begin{pmatrix} e & f \\ g & h \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix}
= \begin{pmatrix} ae + cf & be + df \\ ag + ch & bg + dh \end{pmatrix}$$

Page 10:

The Transpose of a Matrix

The transpose of a matrix exchanges the rows and columns: (A^T)_ij = A_ji.

Useful identities:

(AB)^T = B^T A^T
(ABC)^T = C^T B^T A^T

For column vectors a and b, each of length n:

Inner product = a^T b = a^T (1 x n) times b (n x 1). Indices match, the matrices conform, and the dimension of the resulting product is 1 x 1 (i.e., a scalar):

$$\mathbf{a}^T \mathbf{b} = (a_1 \;\; \cdots \;\; a_n) \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = \sum_{i=1}^n a_i b_i$$

Note that b^T a = (a^T b)^T = a^T b.

Outer product = a b^T = a (n x 1) times b^T (1 x n). The resulting product is an n x n matrix:

$$\mathbf{a}\mathbf{b}^T = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} (b_1 \;\; \cdots \;\; b_n)
= \begin{pmatrix}
a_1 b_1 & a_1 b_2 & \cdots & a_1 b_n \\
a_2 b_1 & a_2 b_2 & \cdots & a_2 b_n \\
\vdots & \vdots & \ddots & \vdots \\
a_n b_1 & a_n b_2 & \cdots & a_n b_n
\end{pmatrix}$$
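NumPy distinguishes the two products directly; a minimal sketch:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
print(np.inner(a, b))   # a^T b = 32.0, a scalar (1 x 1)
print(np.outer(a, b))   # a b^T, a 3 x 3 matrix with ij-th element a_i * b_j
```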

Page 11:

The Identity Matrix, I

The identity matrix serves the role of the number 1 in matrix multiplication: AI = A, IA = A.

I is a diagonal matrix, with all diagonal elements being one and all off-diagonal elements zero:

I_ij = 1 for i = j, 0 otherwise

$$\mathbf{I}_{3 \times 3} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

Page 12:

The Inverse Matrix, A^-1

For a square matrix A, define the inverse of A, A^-1, as the matrix satisfying A^-1 A = A A^-1 = I. For

$$\mathbf{A} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad
\mathbf{A}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$

If this quantity (the determinant) is zero, the inverse does not exist.

If det(A) is not zero, A^-1 exists and A is said to be non-singular. If det(A) = 0, A is singular, and no unique inverse exists (generalized inverses do).
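A numerical check of the 2 x 2 formula against the library inverse (NumPy sketch, reusing the matrix A from Page 3):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [1.0, 2.0]])
det = A[0, 0]*A[1, 1] - A[0, 1]*A[1, 0]       # ad - bc = 6
Ainv = (1.0/det) * np.array([[ A[1, 1], -A[0, 1]],
                             [-A[1, 0],  A[0, 0]]])
assert np.allclose(Ainv, np.linalg.inv(A))
assert np.allclose(A @ Ainv, np.eye(2))       # A A^{-1} = I
```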

Page 13:

Useful identities

(AB)^-1 = B^-1 A^-1
(A^T)^-1 = (A^-1)^T

Key use of inverses --- solving systems of equations:

Ax = c

Here A is a matrix of known constants, c is a vector of known constants, and x is the vector of unknowns we wish to solve for. Multiplying both sides by A^-1 gives A^-1 A x = A^-1 c. Note that A^-1 A x = I x = x. Hence, x = A^-1 c.

For example, the least-squares system from Page 6,

$$\begin{pmatrix}
\sigma^2(z_1) & \sigma(z_1,z_2) & \cdots & \sigma(z_1,z_n) \\
\sigma(z_1,z_2) & \sigma^2(z_2) & \cdots & \sigma(z_2,z_n) \\
\vdots & \vdots & \ddots & \vdots \\
\sigma(z_1,z_n) & \sigma(z_2,z_n) & \cdots & \sigma^2(z_n)
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{pmatrix}
=
\begin{pmatrix} \sigma(y,z_1) \\ \sigma(y,z_2) \\ \vdots \\ \sigma(y,z_n) \end{pmatrix}$$

is of the form Vβ = c, so β = V^-1 c.
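In code, np.linalg.solve is the standard way to get x = A^-1 c without forming the inverse explicitly; a sketch with made-up values:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [1.0, 2.0]])
c = np.array([6.0, 8.0])
x = np.linalg.solve(A, c)                     # solves A x = c
assert np.allclose(x, np.linalg.inv(A) @ c)   # same as x = A^{-1} c
print(x)                                      # [2. 3.]
```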

Page 14:

Example: solve the OLS for β in y = μ + β_1 z_1 + β_2 z_2 + e. Here

$$\mathbf{c} = \begin{pmatrix} \sigma(y, z_1) \\ \sigma(y, z_2) \end{pmatrix}, \qquad
\mathbf{V} = \begin{pmatrix} \sigma^2(z_1) & \sigma(z_1, z_2) \\ \sigma(z_1, z_2) & \sigma^2(z_2) \end{pmatrix}, \qquad
\boldsymbol{\beta} = \mathbf{V}^{-1} \mathbf{c}$$

Since σ(z_1, z_2) = ρ_12 σ(z_1) σ(z_2), it is more compact to use

$$\mathbf{V}^{-1} = \frac{1}{\sigma^2(z_1)\,\sigma^2(z_2)\,(1 - \rho_{12}^2)}
\begin{pmatrix} \sigma^2(z_2) & -\sigma(z_1, z_2) \\ -\sigma(z_1, z_2) & \sigma^2(z_1) \end{pmatrix}$$

giving

$$\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \frac{1}{\sigma^2(z_1)\,\sigma^2(z_2)\,(1 - \rho_{12}^2)}
\begin{pmatrix} \sigma^2(z_2) & -\sigma(z_1, z_2) \\ -\sigma(z_1, z_2) & \sigma^2(z_1) \end{pmatrix}
\begin{pmatrix} \sigma(y, z_1) \\ \sigma(y, z_2) \end{pmatrix}$$

Page 15:

Multiplying out,

$$\beta_1 = \frac{1}{1 - \rho_{12}^2} \left[ \frac{\sigma(y, z_1)}{\sigma^2(z_1)} - \rho_{12}\, \frac{\sigma(y, z_2)}{\sigma(z_1)\,\sigma(z_2)} \right]
\qquad
\beta_2 = \frac{1}{1 - \rho_{12}^2} \left[ \frac{\sigma(y, z_2)}{\sigma^2(z_2)} - \rho_{12}\, \frac{\sigma(y, z_1)}{\sigma(z_1)\,\sigma(z_2)} \right]$$

If ρ_12 = 0, these reduce to the two univariate slopes,

$$\beta_1 = \frac{\sigma(y, z_1)}{\sigma^2(z_1)} \quad \text{and} \quad \beta_2 = \frac{\sigma(y, z_2)}{\sigma^2(z_2)}$$
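A numeric check (NumPy sketch, with made-up covariance values) that these closed forms agree with β = V^-1 c:

```python
import numpy as np

s1, s2, rho = 2.0, 3.0, 0.5          # sigma(z1), sigma(z2), rho_12
c = np.array([1.2, 0.9])             # sigma(y,z1), sigma(y,z2)
V = np.array([[s1**2,      rho*s1*s2],
              [rho*s1*s2,  s2**2    ]])
beta = np.linalg.solve(V, c)         # beta = V^{-1} c

b1 = (c[0]/s1**2 - rho*c[1]/(s1*s2)) / (1 - rho**2)
b2 = (c[1]/s2**2 - rho*c[0]/(s1*s2)) / (1 - rho**2)
assert np.allclose(beta, [b1, b2])
```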

Page 16:

Quadratic and Bilinear Forms

Quadratic product: for A (n x n) and x (n x 1),

$$\mathbf{x}^T \mathbf{A} \mathbf{x} = \sum_{i=1}^n \sum_{j=1}^n a_{ij}\, x_i x_j$$

which is a scalar (1 x 1).

Bilinear form (a generalization of the quadratic product): for A (m x n), a (n x 1), and b (m x 1), their bilinear form is b^T (1 x m) A (m x n) a (n x 1):

$$\mathbf{b}^T \mathbf{A} \mathbf{a} = \sum_{i=1}^m \sum_{j=1}^n A_{ij}\, b_i a_j$$

Note that b^T A a = a^T A^T b.

Page 17:

Covariance Matrices for Transformed Variables

What is the variance of the linear combination c_1 x_1 + c_2 x_2 + ... + c_n x_n? (Note this is a scalar.)

$$\sigma^2(\mathbf{c}^T \mathbf{x}) = \sigma^2\!\left(\sum_{i=1}^n c_i x_i\right)
= \sigma\!\left(\sum_{i=1}^n c_i x_i,\; \sum_{j=1}^n c_j x_j\right)
= \sum_{i=1}^n \sum_{j=1}^n \sigma(c_i x_i,\, c_j x_j)
= \sum_{i=1}^n \sum_{j=1}^n c_i c_j\, \sigma(x_i, x_j)
= \mathbf{c}^T \mathbf{V} \mathbf{c}$$

Likewise, the covariance between two linear combinations can be expressed as a bilinear form,

$$\sigma(\mathbf{a}^T \mathbf{x},\, \mathbf{b}^T \mathbf{x}) = \mathbf{a}^T \mathbf{V} \mathbf{b}$$
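The quadratic form c^T V c can be checked by simulation; a minimal NumPy sketch with made-up values:

```python
import numpy as np

rng = np.random.default_rng(0)
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])            # covariance matrix of x
c = np.array([1.0, -2.0])
x = rng.multivariate_normal([0.0, 0.0], V, size=200_000)
print(c @ V @ c)                      # exact variance: 4.0
print(np.var(x @ c))                  # simulated variance, close to 4.0
```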

Page 18:

Now suppose we transform one vector of random variables into another vector of random variables.

Transform x into (i) y (k x 1) = A (k x n) x (n x 1), and (ii) z (m x 1) = B (m x n) x (n x 1).

The covariance matrix between the elements of these two transformed vectors is the k x m matrix A V B^T.

For example, the covariance between yi and yj

is given by the ij-th element of AVAT

Likewise, the covariance between yi and zj

is given by the ij-th element of AVBT

Page 19:

The Multivariate Normal Distribution (MVN)

Consider the pdf for n independent normal random variables, the i-th of which has mean μ_i and variance σ_i^2:

$$p(\mathbf{x}) = \prod_{i=1}^n (2\pi)^{-1/2}\, \sigma_i^{-1} \exp\!\left( -\frac{(x_i - \mu_i)^2}{2\sigma_i^2} \right)
= (2\pi)^{-n/2} \left( \prod_{i=1}^n \sigma_i \right)^{-1} \exp\!\left( -\sum_{i=1}^n \frac{(x_i - \mu_i)^2}{2\sigma_i^2} \right)$$

This can be expressed more compactly in matrix form.

Page 20:

Define the covariance matrix V for the vector x of the n normal random variables by

$$\mathbf{V} = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$

Fun fact! For a diagonal matrix D, the determinant det = |D| = the product of the diagonal elements, so

$$|\mathbf{V}| = \prod_{i=1}^n \sigma_i^2$$

Define the mean vector by μ = (μ_1, μ_2, ..., μ_n)^T. Then

$$\sum_{i=1}^n \frac{(x_i - \mu_i)^2}{\sigma_i^2} = (\mathbf{x} - \boldsymbol{\mu})^T \mathbf{V}^{-1} (\mathbf{x} - \boldsymbol{\mu})$$

Hence, in matrix form the MVN pdf becomes

$$p(\mathbf{x}) = (2\pi)^{-n/2}\, |\mathbf{V}|^{-1/2} \exp\!\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \mathbf{V}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]$$

Now notice that this holds for any mean vector μ and symmetric, positive-definite matrix V (pd: a^T V a = Var(a^T x) > 0 for all nonzero a).
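The matrix form can be checked against a library implementation; a sketch assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, 2.0])
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])
x = np.array([0.5, 2.5])

n = len(mu)
d = x - mu
p = (2*np.pi)**(-n/2) * np.linalg.det(V)**(-0.5) \
    * np.exp(-0.5 * d @ np.linalg.solve(V, d))
assert np.isclose(p, multivariate_normal(mu, V).pdf(x))
```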

Page 21:

Properties of the MVN - I

1) If x is MVN, any subset of the variables in x is also MVN

2) If x is MVN, any linear combination of the elements of x is also MVN. If x is MVN(μ, V):

for y = x + a, y is MVN_n(μ + a, V)

for y = a^T x = Σ_{i=1}^n a_i x_i, y is N(a^T μ, a^T V a)

for y = Ax, y is MVN_m(Aμ, A V A^T)

Page 22:

Properties of the MVN - II

3) Conditional distributions are also MVN. Partition x into two components, x_1 (an m-dimensional column vector) and x_2 (an (n-m)-dimensional column vector):

$$\mathbf{x} = \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix}, \qquad
\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \qquad
\mathbf{V} = \begin{pmatrix} \mathbf{V}_{x_1 x_1} & \mathbf{V}_{x_1 x_2} \\ \mathbf{V}_{x_1 x_2}^T & \mathbf{V}_{x_2 x_2} \end{pmatrix}$$

x_1 | x_2 is MVN with m-dimensional mean vector

$$\boldsymbol{\mu}_{x_1 | x_2} = \boldsymbol{\mu}_1 + \mathbf{V}_{x_1 x_2} \mathbf{V}_{x_2 x_2}^{-1} (\mathbf{x}_2 - \boldsymbol{\mu}_2)$$

and m x m covariance matrix

$$\mathbf{V}_{x_1 | x_2} = \mathbf{V}_{x_1 x_1} - \mathbf{V}_{x_1 x_2} \mathbf{V}_{x_2 x_2}^{-1} \mathbf{V}_{x_1 x_2}^T$$
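The two formulas translate directly into code; a NumPy sketch with made-up numbers (x_1 one-dimensional, x_2 two-dimensional):

```python
import numpy as np

mu1, mu2 = np.array([0.0]), np.array([0.0, 0.0])
V11 = np.array([[1.0]])
V12 = np.array([[0.5, 0.5]])          # V_{x1,x2}, a (1 x 2) block
V22 = np.array([[1.0, 0.2],
                [0.2, 1.0]])
x2 = np.array([1.0, -1.0])

# conditional mean and covariance of x1 given x2
mu_cond = mu1 + V12 @ np.linalg.solve(V22, x2 - mu2)
V_cond = V11 - V12 @ np.linalg.solve(V22, V12.T)
print(mu_cond, V_cond)
```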

Page 23:

Properties of the MVN - III

4) If x is MVN, the regression of any subset of x on another subset is linear and homoscedastic:

$$\mathbf{x}_1 = \boldsymbol{\mu}_{x_1 | x_2} + \mathbf{e}
= \boldsymbol{\mu}_1 + \mathbf{V}_{x_1 x_2} \mathbf{V}_{x_2 x_2}^{-1} (\mathbf{x}_2 - \boldsymbol{\mu}_2) + \mathbf{e}$$

where e is MVN with mean vector 0 and variance-covariance matrix V_{x1|x2}.

The regression is linear because it is a linear function of x_2.

The regression is homoscedastic because the variance-covariance matrix is a constant that does not change with the particular value of the random vector x_2.

Page 24:

Example: Regression of Offspring value on Parental values

Assume the vector of offspring value and the values of both its parents is MVN. Then, from the correlations among (outbred) relatives,

$$\begin{pmatrix} z_o \\ z_s \\ z_d \end{pmatrix} \sim \text{MVN}\!\left[ \begin{pmatrix} \mu_o \\ \mu_s \\ \mu_d \end{pmatrix},\;
\sigma_z^2 \begin{pmatrix} 1 & h^2/2 & h^2/2 \\ h^2/2 & 1 & 0 \\ h^2/2 & 0 & 1 \end{pmatrix} \right]$$

Let

$$\mathbf{x}_1 = (z_o), \qquad \mathbf{x}_2 = \begin{pmatrix} z_s \\ z_d \end{pmatrix}$$

so that

$$\mathbf{V}_{x_1,x_1} = \sigma_z^2, \qquad
\mathbf{V}_{x_1,x_2} = \frac{h^2 \sigma_z^2}{2}\, (1 \;\; 1), \qquad
\mathbf{V}_{x_2,x_2} = \sigma_z^2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

Hence,

$$z_o = \mu_o + \frac{h^2 \sigma_z^2}{2}\, (1 \;\; 1)\, \sigma_z^{-2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} z_s - \mu_s \\ z_d - \mu_d \end{pmatrix} + e
= \mu_o + \frac{h^2}{2}(z_s - \mu_s) + \frac{h^2}{2}(z_d - \mu_d) + e$$

which is just μ_1 + V_{x1x2} V^{-1}_{x2x2} (x_2 - μ_2) + e, where e is normal with mean zero and variance

$$\sigma_e^2 = \sigma_z^2 - \frac{h^2 \sigma_z^2}{2}\, (1 \;\; 1)\, \sigma_z^{-2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \frac{h^2 \sigma_z^2}{2} \begin{pmatrix} 1 \\ 1 \end{pmatrix}
= \sigma_z^2 \left( 1 - \frac{h^4}{2} \right)$$

Page 25:

Hence, the regression of offspring trait value given the trait values of its parents is

z_o = μ_o + (h^2/2)(z_s - μ_s) + (h^2/2)(z_d - μ_d) + e

where the residual e is normal with mean zero and Var(e) = σ_z^2 (1 - h^4/2).

Similar logic gives the regression of offspring breeding value on parental breeding values as

A_o = μ_o + (A_s - μ_s)/2 + (A_d - μ_d)/2 + e = A_s/2 + A_d/2 + e

where the residual e is normal with mean zero and Var(e) = σ_A^2 / 2.

Page 26:

Lecture 10B: Linear Models

Page 27:

Quick Review of the Major Points

The general linear model can be written as

y = Xβ + e

where y is the vector of observed dependent values; X is the design matrix, containing the observations of the variables in the assumed linear model; β is the vector of unknown parameters to estimate; and e is the vector of residuals, usually assumed MVN with mean zero.

If the residuals are homoscedastic and uncorrelated, so that we can write the covariance matrix of e as Cov(e) = σ^2 I, then we use the OLS estimate,

b_OLS = (X^T X)^-1 X^T y

If the residuals are heteroscedastic and/or dependent, Cov(e) = V, and we must use the GLS solution for β,

b_GLS = (X^T V^-1 X)^-1 X^T V^-1 y

Page 28:

Linear Models

One tries to explain a dependent variable y as a linear function of a number of independent (or predictor) variables. A multiple regression is a typical linear model:

$$y = \mu + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + e$$

Here e is the residual, or deviation between the true value observed and the value predicted by the linear model.

As with a univariate regression (y = a + bx + e), the model parameters are typically chosen by least squares, wherein they are chosen to minimize the sum of squared residuals, Σ e_i^2.

The (partial) regression coefficients are interpreted as follows: a unit change in x_i while holding all other variables constant results in a change of β_i in y.

Page 29:

Predictor and Indicator Variables

In a regression, the predictor variables are typically continuous, although they need not be.

Suppose we are measuring the offspring of p sires. One linear model would be

y_ij = μ + s_i + e_ij

where y_ij is the trait value of offspring j from sire i.

μ is the overall mean. This term is included to give the s_i terms a mean value of zero, i.e., they are expressed as deviations from the mean.

s_i is the effect for sire i (the mean of its offspring). Recall that the variance in the s_i estimates Cov(half sibs) = V_A/4.

e_ij is the deviation of the j-th offspring from the family mean of sire i. The variance of the e's estimates the within-family variance.

Note that the predictor variables here are the s_i, something that we are trying to estimate.

We can write this in linear model form by using indicator variables,

x_ik = 1 if sire k = i, 0 otherwise

giving the above model as

$$y_{ij} = \mu + \sum_{k=1}^p s_k x_{ik} + e_{ij}$$

Page 30:

Models consisting entirely of indicator variables are typically called ANOVA, or analysis of variance, models.

Models that contain no indicator variables (other than for the mean), but rather consist of observed values of continuous or discrete variables, are typically called regression models.

Both are special cases of the General Linear Model (or GLM):

y_ijk = μ + s_i + d_ij + βx_ijk + e_ijk

Here the s_i and d_ij terms form the ANOVA part of the model, and the βx_ijk term is the regression part. Example: a nested half-sib/full-sib design with an age correction on the trait.

Page 31:

Linear Models in Matrix Form

Suppose we have 3 variables in a multiple regression, with four (y, x) vectors of observations:

y_i = μ + β_1 x_i1 + β_2 x_i2 + β_3 x_i3 + e_i

In matrix form, y = Xβ + e, with

$$\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}, \qquad
\boldsymbol{\beta} = \begin{pmatrix} \mu \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}, \qquad
\mathbf{e} = \begin{pmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{pmatrix}, \qquad
\mathbf{X} = \begin{pmatrix}
1 & x_{11} & x_{12} & x_{13} \\
1 & x_{21} & x_{22} & x_{23} \\
1 & x_{31} & x_{32} & x_{33} \\
1 & x_{41} & x_{42} & x_{43}
\end{pmatrix}$$

Here β is the vector of parameters to estimate.

The design matrix X. Details of both the experimental design and the observed values of the predictor variables all reside solely in X

Page 32:

The Sire Model in Matrix Form

The model here is y_ij = μ + s_i + e_ij.

Consider three sires. Sire 1 has 2 offspring, sire 2 has one, and sire 3 has three. The GLM form is

$$\mathbf{y} = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{31} \\ y_{32} \\ y_{33} \end{pmatrix}, \qquad
\mathbf{X} = \begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1
\end{pmatrix}, \qquad
\boldsymbol{\beta} = \begin{pmatrix} \mu \\ s_1 \\ s_2 \\ s_3 \end{pmatrix}, \qquad
\mathbf{e} = \begin{pmatrix} e_{11} \\ e_{12} \\ e_{21} \\ e_{31} \\ e_{32} \\ e_{33} \end{pmatrix}$$

Page 33:

Ordinary Least Squares (OLS)

When the covariance structure of the residuals has a certain form, we solve for the vector β using OLS.

If the residuals are homoscedastic and uncorrelated, σ^2(e_i) = σ_e^2 and σ(e_i, e_j) = 0, each residual is equally weighted, and the sum of squared residuals can be written as

$$\sum_{i=1}^n \hat{e}_i^2 = \hat{\mathbf{e}}^T \hat{\mathbf{e}} = (\mathbf{y} - \mathbf{X}\mathbf{b})^T (\mathbf{y} - \mathbf{X}\mathbf{b})$$

Taking (matrix) derivatives shows this is minimized by

$$\mathbf{b} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$$

This is the OLS estimate of the vector β, and the predicted values of the y's are given by Xb. The variance-covariance estimate for the sample estimates is

$$\mathbf{V}_b = (\mathbf{X}^T \mathbf{X})^{-1} \sigma_e^2, \qquad
\hat{\sigma}_e^2 = \frac{1}{n - \text{rank}(\mathbf{X})} \sum_{i=1}^n \hat{e}_i^2$$

If the residuals follow a MVN distribution, the OLS solution is also the ML solution.
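The whole OLS recipe in a few lines of NumPy (a sketch on simulated data; the true parameter values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 + 1.5*x1 - 0.5*x2 + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), x1, x2])      # design matrix
b = np.linalg.solve(X.T @ X, X.T @ y)          # OLS estimate of beta
e_hat = y - X @ b                              # residuals
s2_e = e_hat @ e_hat / (n - np.linalg.matrix_rank(X))
V_b = np.linalg.inv(X.T @ X) * s2_e            # var-cov of the estimates
print(b, s2_e)
```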

Page 34:

Example: Regression Through the Origin

y_i = βx_i + e_i

Here

$$\mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \qquad
\mathbf{X} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad
\boldsymbol{\beta} = (\beta)$$

$$\mathbf{X}^T\mathbf{X} = \sum_{i=1}^n x_i^2, \qquad \mathbf{X}^T\mathbf{y} = \sum_{i=1}^n x_i y_i$$

giving

$$b = \left(\mathbf{X}^T\mathbf{X}\right)^{-1} \mathbf{X}^T\mathbf{y} = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad
\sigma^2(b) = \left(\mathbf{X}^T\mathbf{X}\right)^{-1} \sigma_e^2 = \frac{\sigma_e^2}{\sum x_i^2}$$

$$\hat{\sigma}_e^2 = \frac{1}{n-1} \sum (y_i - b\,x_i)^2, \qquad
\sigma^2(b) = \frac{1}{n-1} \cdot \frac{\sum (y_i - b\,x_i)^2}{\sum x_i^2}$$

Page 35:

Polynomial Regressions and Interaction Effects

The GLM can easily handle any function of the observed predictor variables, provided the parameters to estimate are still linear, e.g.

y = α + β_1 f(x) + β_2 g(x) + ... + e

Quadratic regression:

$$y_i = \alpha + \beta_1 x_i + \beta_2 x_i^2 + e_i, \qquad
\boldsymbol{\beta} = \begin{pmatrix} \alpha \\ \beta_1 \\ \beta_2 \end{pmatrix}, \qquad
\mathbf{X} = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{pmatrix}$$

Interaction terms (e.g., sex x age) are handled similarly:

$$y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i1} x_{i2} + e_i, \qquad
\boldsymbol{\beta} = \begin{pmatrix} \alpha \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}, \qquad
\mathbf{X} = \begin{pmatrix}
1 & x_{11} & x_{12} & x_{11}x_{12} \\
1 & x_{21} & x_{22} & x_{21}x_{22} \\
\vdots & \vdots & \vdots & \vdots \\
1 & x_{n1} & x_{n2} & x_{n1}x_{n2}
\end{pmatrix}$$

With x_1 held constant, a unit change in x_2 changes y by β_2 + β_3 x_1. Likewise, a unit change in x_1 changes y by β_1 + β_3 x_2.
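Building these design matrices is just column-stacking; a NumPy sketch with made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
X_quad = np.column_stack([np.ones_like(x), x, x**2])          # 1, x, x^2

x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([0.0, 1.0, 0.0, 1.0])                           # e.g. a sex code
X_inter = np.column_stack([np.ones_like(x1), x1, x2, x1*x2])  # 1, x1, x2, x1*x2
print(X_quad.shape, X_inter.shape)   # (4, 3) (4, 4)
```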

Page 36:

Fixed vs. Random Effects

In linear models we are trying to accomplish two goals: estimate the values of the model parameters and estimate any appropriate variances.

For example, in the simplest regression model, y = α + βx + e, we estimate the values for α and β and also the variance of e. We can, of course, also estimate the e_i = y_i - (α + βx_i).

Note that α and β are fixed constants we are trying to estimate (fixed factors or fixed effects), while the e_i values are drawn from some probability distribution (typically normal with mean 0 and variance σ_e^2). The e_i are random effects.

Page 37:

“Mixed” models contain both fixed and random factors

This distinction between fixed and random effects is extremely important in terms of how we analyze a model. If a parameter is a fixed constant we wish to estimate, it is a fixed effect. If a parameter is drawn from some probability distribution and we are trying to make inferences about either the distribution and/or specific realizations from this distribution, it is a random effect.

We generally speak of estimating fixed factors andpredicting random effects.

Page 38:

Example: Sire model

y_ij = μ + s_i + e_ij

Is the sire effect fixed or random? It depends. If we have (say) 10 sires and are ONLY interested in the values of these particular 10 sires, and don't care to make any other inferences about the population from which the sires were drawn, then we can treat them as fixed effects. In this case, the model is fully specified by the covariance structure for the residuals. Thus, we need to estimate μ, s_1 to s_10, and σ_e^2, and we write the model as

y_ij = μ + s_i + e_ij, with σ^2(e_ij) = σ_e^2

Conversely, if we are not only interested in these 10 particular sires but also wish to make some inference about the population from which they were drawn (such as the additive variance, since σ_A^2 = 4σ_s^2), then the s_i are random effects. In this case we wish to estimate μ and the variances σ_s^2 and σ_w^2. Since 2s_i also estimates (or predicts) the breeding value for sire i, we also wish to estimate (predict) these as well. Under a random-effects interpretation, we write the model as

y_ij = μ + s_i + e_ij, with σ^2(e_ij) = σ_e^2 and σ^2(s_i) = σ_s^2

Page 39:

Generalized Least Squares (GLS)

Suppose the residuals no longer have the same variance (i.e., display heteroscedasticity). Clearly we do not wish to minimize the unweighted sum of squared residuals, because those residuals with smaller variance should receive more weight.

Likewise, in the event the residuals are correlated, we also wish to take this into account (i.e., perform a suitable transformation to remove the correlations) before minimizing the sum of squares.

Either of the above settings leads to a GLS solution in place of an OLS solution.

Page 40:

In the GLS setting, the covariance matrix for the vector e of residuals is written as R, where R_ij = σ(e_i, e_j).

The linear model becomes y = Xβ + e, cov(e) = R.

The GLS solution for β is

$$\mathbf{b} = \left( \mathbf{X}^T \mathbf{R}^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{R}^{-1} \mathbf{y}$$

The variance-covariance of the estimated model parameters is given by

$$\mathbf{V}_b = \left( \mathbf{X}^T \mathbf{R}^{-1} \mathbf{X} \right)^{-1} \sigma_e^2$$
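A generic GLS routine is a direct transcription of the formula; a minimal sketch (solve() is used in place of explicit inverses for numerical stability):

```python
import numpy as np

def gls(X, y, R):
    """GLS for y = X beta + e with cov(e) proportional to R."""
    RiX = np.linalg.solve(R, X)          # R^{-1} X
    Riy = np.linalg.solve(R, y)          # R^{-1} y
    return np.linalg.solve(X.T @ RiX, X.T @ Riy)
```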

Page 41:

Example

One common setting is where the residuals are uncorrelated but heteroscedastic, with σ^2(e_i) = σ_e^2 / w_i.

For example, if sample i is the mean value of n_i individuals, then σ^2(e_i) = σ_e^2 / n_i, so here w_i = n_i. In this case,

$$\mathbf{R} = \text{Diag}(w_1^{-1}, w_2^{-1}, \ldots, w_n^{-1})$$

Consider the model y_i = α + βx_i + e_i, so that

$$\mathbf{X} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad
\boldsymbol{\beta} = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}$$

Page 42:

Here

$$\mathbf{R}^{-1} = \text{Diag}(w_1, w_2, \ldots, w_n)$$

giving

$$\mathbf{X}^T \mathbf{R}^{-1} \mathbf{y} = w \begin{pmatrix} \bar{y}_w \\ \overline{xy}_w \end{pmatrix}, \qquad
\mathbf{X}^T \mathbf{R}^{-1} \mathbf{X} = w \begin{pmatrix} 1 & \bar{x}_w \\ \bar{x}_w & \overline{x^2}_w \end{pmatrix}$$

where

$$w = \sum_{i=1}^n w_i; \quad
\bar{x}_w = \sum_{i=1}^n \frac{w_i x_i}{w}; \quad
\overline{x^2}_w = \sum_{i=1}^n \frac{w_i x_i^2}{w}; \quad
\bar{y}_w = \sum_{i=1}^n \frac{w_i y_i}{w}; \quad
\overline{xy}_w = \sum_{i=1}^n \frac{w_i x_i y_i}{w}$$

Page 43:

This gives the GLS estimators of α and β as

$$a = \bar{y}_w - b\,\bar{x}_w, \qquad
b = \frac{\overline{xy}_w - \bar{x}_w \bar{y}_w}{\overline{x^2}_w - \bar{x}_w^2}$$

Likewise, the resulting variances and covariance for these estimators are

$$\sigma^2(a) = \frac{\sigma_e^2 \cdot \overline{x^2}_w}{w\left( \overline{x^2}_w - \bar{x}_w^2 \right)}, \qquad
\sigma^2(b) = \frac{\sigma_e^2}{w\left( \overline{x^2}_w - \bar{x}_w^2 \right)}, \qquad
\sigma(a, b) = \frac{-\sigma_e^2\, \bar{x}_w}{w\left( \overline{x^2}_w - \bar{x}_w^2 \right)}$$
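A check (NumPy sketch, with made-up data and weights) that these closed forms match the matrix GLS solution with R = Diag(1/w_i):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
x = rng.normal(size=n)
w = rng.integers(1, 10, size=n).astype(float)    # w_i, e.g. group sizes n_i
y = 1.0 + 2.0*x + rng.normal(scale=1.0/np.sqrt(w))

X = np.column_stack([np.ones(n), x])
R = np.diag(1.0/w)
b_gls = np.linalg.solve(X.T @ np.linalg.solve(R, X),
                        X.T @ np.linalg.solve(R, y))

# weighted means as defined on Page 42
wsum = w.sum()
xw, yw = (w*x).sum()/wsum, (w*y).sum()/wsum
x2w, xyw = (w*x*x).sum()/wsum, (w*x*y).sum()/wsum
b = (xyw - xw*yw) / (x2w - xw**2)
a = yw - b*xw
assert np.allclose(b_gls, [a, b])
```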