
Non-full rank models

Remedies to deal with the rank deficiency of the design matrix

Testing hypotheses

Linear models and their mathematical foundations: Factorial models

Steffen Unkel

Department of Medical Statistics, University Medical Center Göttingen

Winter term 2018/19



Motivating example

Suppose the interest is in the corn yield when different fertilizers are available and corn is planted in different soil types. The questions one is interested in answering are:

1 Does fertilizer type have an effect on crop yield?

2 Does soil type have an effect on crop yield?

3 Do the two treatment factors interact? For instance, there may be no difference between fertilizer 1 and fertilizer 2 in soil type 1, but fertilizer 1 may produce a greater corn yield than fertilizer 2 in soil type 2.

Factorial experiments, also known as factorial designs, are used to answer such questions.

The analysis of data arising from such experiments involves factorial models, which facilitate an analysis of the effects due to the treatment factors on some response.





What are factorial models?

In factorial models the response (e.g. yield) is considered to be expressible as the sum of...

1 effects due to individual factors (e.g. fertilizer and soil type) acting one at a time,

2 effects due to pairs of factors, and

3 effects beyond their separate combinations (two-factor interactions), and so on.

A factor has a limited number of variations that are used in the experiment, known as factor levels.

We will only consider balanced models that have an equal number of observations in each factor level or factor combination.





The design matrix in factorial models

The term “factorial” refers to a particular class of design matrices $X$.

In the case of factorial models, $X$ is a matrix of zeros and ones only, and is sometimes called an incidence matrix.

Factorial models are often called analysis of variance (ANOVA) models, to distinguish them from linear regression models, for which the covariates are continuous.

We shall use the terms factorial models and regression models as descriptors for different kinds of matrices $X$.





One-way ANOVA model

The one-way balanced model can be expressed as

$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad i = 1, \dots, I; \; j = 1, \dots, J,$$

where $\mu$ is the grand mean, $\alpha_1, \alpha_2, \dots, \alpha_I$ represent the effects of $I$ treatments, each of which is applied to $J$ experimental units, and $y_{ij}$ is the response of the $j$th observation among the $J$ units that receive the $i$th treatment.

The random error terms are denoted by $\varepsilon_{ij}$.

In some experimental situations, the $I$ groups may represent samples from $I$ populations whose means we wish to compare, populations that are not created by applying treatments.
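As a small worked illustration of the one-way model equation, the case $I = 2$, $J = 2$ can be written out observation by observation in matrix form. The sizes are chosen here purely for illustration and are not taken from the slides.

```latex
\[
\begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 0 \\
1 & 1 & 0 \\
1 & 0 & 1 \\
1 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \end{pmatrix}
\]
```

The matrix contains only zeros and ones, and its first column is the sum of the other two, so it has three columns but rank two.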





One-way ANOVA: assumptions

To complete the model, we make the following assumptions:

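A1 $E(\varepsilon_{ij}) = 0$ for all $i, j$.

A2 $\mathrm{Var}(\varepsilon_{ij}) = \sigma^2$ for all $i, j$.

A3 $\mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{rs}) = 0$ for all $(i, j) \neq (r, s)$.

Occasionally, we will make use of the following additional assumption:

A4 $\varepsilon_{ij} \sim N(0, \sigma^2)$ for all $i, j$.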

Any of these assumptions may fail to hold with real data.





Alternative one-way ANOVA formulation

The mean for the $i$th treatment or population can be denoted by $\mu_i$.

Thus, using assumption A1 we have $E(y_{ij}) = \mu_i = \mu + \alpha_i$.

We can rewrite the one-way model equation as

$$y_{ij} = \mu_i + \varepsilon_{ij}, \quad i = 1, \dots, I; \; j = 1, \dots, J.$$

This is called the cell-means model formulation.





Two-way ANOVA model with interaction

The two-way balanced model with interaction can be expressed as

$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}, \quad i = 1, \dots, I; \; j = 1, \dots, J; \; k = 1, \dots, K.$$

The effect of factor A at the $i$th level is $\alpha_i$, and the term $\beta_j$ is due to the $j$th level of factor B.

The term $\gamma_{ij}$ represents the interaction AB between the $i$th level of A and the $j$th level of B.

If $\gamma_{ij}$ is omitted, we have a two-way ANOVA model without interaction.





Two-way ANOVA model: assumptions

To complete the model, we make the following assumptions:

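A1 $E(\varepsilon_{ijk}) = 0$ for all $i, j, k$.

A2 $\mathrm{Var}(\varepsilon_{ijk}) = \sigma^2$ for all $i, j, k$.

A3 $\mathrm{Cov}(\varepsilon_{ijk}, \varepsilon_{rst}) = 0$ for all $(i, j, k) \neq (r, s, t)$.

Occasionally, we will make use of the following additional assumption:

A4 $\varepsilon_{ijk} \sim N(0, \sigma^2)$ for all $i, j, k$.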

Any of these assumptions may fail to hold with real data.





Alternative two-way ANOVA formulation

Let $\mu_{ij} = E(y_{ijk})$ denote the mean of a random observation in the $(ij)$th cell.

Using assumption A1 we have $E(y_{ijk}) = \mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$.

We can rewrite the two-way model equation in the cell-means formulation as

$$y_{ijk} = \mu_{ij} + \varepsilon_{ijk}, \quad i = 1, \dots, I; \; j = 1, \dots, J; \; k = 1, \dots, K.$$





Experimental situations for two-way ANOVA

There are two experimental situations in which the two-way ANOVA model seems appropriate:

1 Factors A and B represent two types of treatment, for example, various levels of fertilizer and soil type applied in an agricultural experiment. We apply each of the combinations of the levels of A and B to a number of (randomly) selected experimental units.

2 In another situation, the populations may exist naturally, for example, gender (males and females) and political preference (e.g. Democrats, Republicans). A (random) sample of a number of observations is obtained from each of the $I \times J$ populations.





Crossed versus nested factors

Crossed factor structure:

Recall that with crossed factors we see every level of factor A at every level of factor B.

This means that level 1 of factor A has the same meaning across all levels of factor B.

Nested factor structure:

We call factor B nested in factor A if we have different levels of B within each level of A.

Examples:

1 Patients are nested in hospitals.

2 Samples are nested in batches.

3 Students are nested in classes. Classes are nested in schools.

A factorial design can have both crossed and nested factors.





Example of a nested factorial design

Suppose that we want to analyze student performance. Data from different classes from different schools (on a student level) are available.

Questions of interest:

What is the grade variability between different schools?

What is the grade variability between classes within the same school?

What is the grade variability between students within the same class?

This is a nested factorial design, as classes are clearly not crossed with schools, and similarly for students.

We will revisit nested factor structures in the lectures on linear mixed-effects modelling.





Rank deficient design matrices

We can write an ANOVA model in matrix form as $y = X\beta + \varepsilon$.

ANOVA models are often expressed with more parameters than can be estimated, which results in $X$ being rank deficient.

If $X$ is rank deficient, then $X^\top X$ is singular.

The normal equations $X^\top X \hat{\beta} = X^\top y$ do not have a unique solution.
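The rank deficiency can be seen numerically in a small one-way layout. The following is a minimal sketch, assuming $I = 3$ levels with $J = 2$ observations each and the parameter order $(\mu, \alpha_1, \alpha_2, \alpha_3)$; the sizes and variable names are illustrative and not part of the slides.

```python
import numpy as np

# Illustrative sizes (assumed): I = 3 treatment levels, J = 2 observations each.
I, J = 3, 2

# Columns of X: intercept (mu), then one indicator column per level (alpha_1..alpha_I).
intercept = np.ones((I * J, 1))
level_indicators = np.kron(np.eye(I), np.ones((J, 1)))
X = np.hstack([intercept, level_indicators])

p = X.shape[1]                    # number of parameters
k = np.linalg.matrix_rank(X)      # rank of the design matrix
XtX = X.T @ X

print("p =", p, "rank(X) =", k)
print("rank(X'X) =", np.linalg.matrix_rank(XtX))

# The intercept column equals the sum of the indicator columns; this linear
# dependency is what makes X rank deficient and X'X singular.
print(np.allclose(X[:, 0], X[:, 1:].sum(axis=1)))
```

Here $X$ has $p = 4$ columns but rank $k = 3$, so $X^\top X$ cannot be inverted in the usual sense.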




Estimation of β

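If $X$ is $n \times p$ with $\mathrm{rank}(X) = k < p \le n$, then the system of normal equations $X^\top X \hat{\beta} = X^\top y$ is consistent.

Since the normal equations are consistent, a solution is given by

$$\hat{\beta} = (X^\top X)^{-} X^\top y,$$

where $(X^\top X)^{-}$ is any generalized inverse of $X^\top X$.

For a particular generalized inverse $(X^\top X)^{-}$, $E(\hat{\beta}) = (X^\top X)^{-} X^\top X \beta$.

The expression $(X^\top X)^{-} X^\top X \beta$ is not invariant to the choice of $(X^\top X)^{-}$.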
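The dependence of $\hat{\beta}$ on the chosen generalized inverse can be checked on a toy data set. This sketch assumes a one-way layout with three levels and two observations per level, made-up responses, and two particular generalized inverses (the Moore-Penrose pseudoinverse and one built by inverting a nonsingular block); none of these choices come from the slides.

```python
import numpy as np

# One-way layout with 3 levels and 2 observations per level (assumed sizes).
X = np.hstack([np.ones((6, 1)), np.kron(np.eye(3), np.ones((2, 1)))])
y = np.array([4.0, 6.0, 7.0, 9.0, 1.0, 3.0])   # made-up responses

XtX = X.T @ X
Xty = X.T @ y

# Generalized inverse 1: the Moore-Penrose pseudoinverse.
G1 = np.linalg.pinv(XtX)

# Generalized inverse 2: invert the nonsingular lower-right 3 x 3 block of X'X
# and fill the remaining row and column with zeros.
G2 = np.zeros((4, 4))
G2[1:, 1:] = np.linalg.inv(XtX[1:, 1:])

beta1 = G1 @ Xty
beta2 = G2 @ Xty

print("solution from G1:", beta1)
print("solution from G2:", beta2)
print("both solve the normal equations:",
      np.allclose(XtX @ beta1, Xty), np.allclose(XtX @ beta2, Xty))
print("fitted values agree:", np.allclose(X @ beta1, X @ beta2))
```

The two solution vectors differ, yet both satisfy the normal equations and give the same fitted values $X\hat{\beta}$.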



Estimation of β (2)

Suppose there is a $p \times n$ matrix $A$ such that $E(Ay) = \beta$. If so, then

$$\beta = E(Ay) = E(A(X\beta + \varepsilon)) = AX\beta.$$

Since this must hold for all $\beta$, we have $AX = I_p$.

But $\mathrm{rank}(AX) \le \mathrm{rank}(X) = k < p$, hence $AX$ cannot be equal to $I_p$.

Conclusion: there are no linear functions of $y$ that yield an unbiased estimator of $\beta$.




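Estimation of $\sigma^2$

We define

$$\mathrm{SSE} = (y - X\hat{\beta})^\top (y - X\hat{\beta}) = y^\top y - \hat{\beta}^\top X^\top y = y^\top \left[ I - X (X^\top X)^{-} X^\top \right] y,$$

where $\hat{\beta}$ is any solution to the normal equations $X^\top X \hat{\beta} = X^\top y$.

For an estimator of $\sigma^2$, we define

$$s^2 = \frac{\mathrm{SSE}}{n - k},$$

where $n$ is the number of rows of $X$ and $k = \mathrm{rank}(X)$.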





Estimation of $\sigma^2$ (2)

For $s^2 = \mathrm{SSE}/(n - k)$, the following properties hold:

i. $E(s^2) = \sigma^2$.

ii. The estimator $s^2$ is invariant to the choice of $\hat{\beta}$ or to the choice of the generalized inverse $(X^\top X)^{-}$.
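Property ii can be illustrated numerically. The sketch below assumes the same toy one-way layout (three levels, two made-up observations each) and two hypothetical generalized inverses; the helper name `sse_from` is introduced here for illustration only.

```python
import numpy as np

# One-way layout with 3 levels and 2 observations per level (assumed sizes).
X = np.hstack([np.ones((6, 1)), np.kron(np.eye(3), np.ones((2, 1)))])
y = np.array([4.0, 6.0, 7.0, 9.0, 1.0, 3.0])   # made-up responses

n = X.shape[0]
k = np.linalg.matrix_rank(X)
XtX = X.T @ X

def sse_from(G):
    """SSE = y'[I - X G X']y for a generalized inverse G of X'X."""
    M = np.eye(n) - X @ G @ X.T
    return float(y @ M @ y)

# Two different generalized inverses of X'X.
G1 = np.linalg.pinv(XtX)
G2 = np.zeros((4, 4))
G2[1:, 1:] = np.linalg.inv(XtX[1:, 1:])

sse1 = sse_from(G1)
sse2 = sse_from(G2)
s2 = sse1 / (n - k)

print("SSE via G1:", sse1)
print("SSE via G2:", sse2)
print("s^2 =", s2)
```

Both generalized inverses lead to the same SSE, because $X(X^\top X)^{-}X^\top$ is the same projection matrix in either case.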




Maximum likelihood estimation

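Let $y \sim N_n(X\beta, \sigma^2 I)$, where $X$ is $n \times p$ of rank $k < p \le n$. Then the maximum likelihood estimators of $\beta$ and $\sigma^2$ are

$$\hat{\beta} = (X^\top X)^{-} X^\top y, \qquad \hat{\sigma}^2 = \frac{1}{n} (y - X\hat{\beta})^\top (y - X\hat{\beta}).$$

It holds that $\hat{\beta} \sim N_p\left( (X^\top X)^{-} X^\top X \beta, \; \sigma^2 (X^\top X)^{-} X^\top X (X^\top X)^{-} \right)$.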



Linear combinations of the parameters

Having established that we cannot estimate $\beta$, we next inquire as to whether we can estimate any linear combination of the parameters, say $\lambda^\top \beta$.

A linear function of parameters, $\lambda^\top \beta$, is said to be estimable if there exists a linear combination of the observations with an expected value equal to $\lambda^\top \beta$.

In other words, $\lambda^\top \beta$ is estimable if there exists a vector $a$ such that $E(a^\top y) = \lambda^\top \beta$.





Is a particular function $\lambda^\top \beta$ estimable?

Conditions for estimability

A particular linear function $\lambda^\top \beta$ is estimable if and only if any one of the following equivalent conditions holds:

i. $\lambda^\top$ is a linear combination of the rows of $X$; that is, there exists a vector $a$ such that $a^\top X = \lambda^\top$.

ii. $\lambda^\top$ is a linear combination of the rows of $X^\top X$ or $\lambda$ is a linear combination of the columns of $X^\top X$; that is, there exists a vector $r$ such that $r^\top X^\top X = \lambda^\top$ or $X^\top X r = \lambda$.

iii. $\lambda$ or $\lambda^\top$ is such that

$$X^\top X (X^\top X)^{-} \lambda = \lambda \quad \text{or} \quad \lambda^\top (X^\top X)^{-} X^\top X = \lambda^\top,$$

where $(X^\top X)^{-}$ is any (symmetric) generalized inverse of $X^\top X$.
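Condition iii lends itself to a direct numerical check. The following sketch assumes the one-way layout with three levels and two observations per level, parameter order $(\mu, \alpha_1, \alpha_2, \alpha_3)$, and uses the Moore-Penrose pseudoinverse as the symmetric generalized inverse; the function name `is_estimable` is hypothetical.

```python
import numpy as np

def is_estimable(X, lam, tol=1e-10):
    """Check condition iii: X'X (X'X)^- lam equals lam, with the pseudoinverse as (X'X)^-."""
    XtX = X.T @ X
    G = np.linalg.pinv(XtX)       # a symmetric generalized inverse of X'X
    return bool(np.allclose(XtX @ G @ lam, lam, atol=tol))

# One-way layout with 3 levels and 2 observations per level (assumed sizes).
X = np.hstack([np.ones((6, 1)), np.kron(np.eye(3), np.ones((2, 1)))])

# Parameter order: (mu, alpha_1, alpha_2, alpha_3).
candidates = {
    "alpha_1 - alpha_2": np.array([0.0, 1.0, -1.0, 0.0]),
    "mu + alpha_1":      np.array([1.0, 1.0, 0.0, 0.0]),
    "alpha_1":           np.array([0.0, 1.0, 0.0, 0.0]),
    "mu":                np.array([1.0, 0.0, 0.0, 0.0]),
}

results = {name: is_estimable(X, lam) for name, lam in candidates.items()}
for name, ok in results.items():
    print(name, "estimable:", ok)
```

In this layout the contrast $\alpha_1 - \alpha_2$ and the cell mean $\mu + \alpha_1$ pass the check, while $\mu$ and $\alpha_1$ on their own do not.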





Linearly independent functions of β

A set of functions $\lambda_1^\top \beta, \lambda_2^\top \beta, \dots, \lambda_m^\top \beta$ is said to be linearly independent if the coefficient vectors $\lambda_1, \lambda_2, \dots, \lambda_m$ are linearly independent.

In the non-full-rank model $y = X\beta + \varepsilon$, the number of linearly independent estimable functions of $\beta$ is equal to the rank of $X$.

All estimable functions can be obtained from $X\beta$ or $X^\top X \beta$.

Thus we can examine linear combinations of the rows of $X$ or of $X^\top X$ to see what functions of the parameters are estimable.





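Estimators of $\lambda^\top \beta$

From what has been stated previously, we have the following estimators of $\lambda^\top \beta$:

E1 $a^\top y$, where $a^\top$ satisfies $\lambda^\top = a^\top X$.

E2 $r^\top X^\top y$, where $r^\top$ satisfies $\lambda^\top = r^\top X^\top X$.

E3 $\lambda^\top \hat{\beta}$, where $\hat{\beta}$ is a solution of $X^\top X \hat{\beta} = X^\top y$.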





Properties

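The estimators E1–E3 have the following properties:

i. $E(a^\top y) = E(r^\top X^\top y) = E(\lambda^\top \hat{\beta}) = \lambda^\top \beta$.

ii. $\mathrm{Var}(r^\top X^\top y) = \sigma^2 r^\top X^\top X r = \sigma^2 r^\top \lambda$.

iii. $\mathrm{Var}(\lambda^\top \hat{\beta}) = \sigma^2 \lambda^\top (X^\top X)^{-} \lambda$.

iv. The estimators $\lambda^\top \hat{\beta}$ and $r^\top X^\top y$ are BLUE.

v. The estimator $a^\top y$ is not guaranteed to have minimum variance.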
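Properties iii and v can be made concrete by comparing two unbiased estimators of $\alpha_1 - \alpha_2$ in a toy one-way layout. The sketch assumes three levels with two observations each and a hypothetical vector $a$ that uses only one observation per group; variances are reported in units of $\sigma^2$ under assumptions A2 and A3.

```python
import numpy as np

# One-way layout with 3 levels and 2 observations per level (assumed sizes).
X = np.hstack([np.ones((6, 1)), np.kron(np.eye(3), np.ones((2, 1)))])
XtX = X.T @ X

# lambda for alpha_1 - alpha_2, parameter order (mu, alpha_1, alpha_2, alpha_3).
lam = np.array([0.0, 1.0, -1.0, 0.0])

# E1 with a hypothetical a: first observation of group 1 minus first observation of group 2.
a = np.array([1.0, 0.0, -1.0, 0.0, 0.0, 0.0])
print("a'X equals lambda':", np.allclose(a @ X, lam))

# Variances divided by sigma^2.
var_a = float(a @ a)                                   # Var(a'y) / sigma^2
var_blue = float(lam @ np.linalg.pinv(XtX) @ lam)      # Var(lambda' beta_hat) / sigma^2

print("Var(a'y) / sigma^2            =", var_a)
print("Var(lambda'beta_hat) / sigma^2 =", var_blue)
```

Both estimators are unbiased for $\alpha_1 - \alpha_2$, but this particular $a^\top y$ ignores half of the relevant data and so has twice the variance of $\lambda^\top \hat{\beta}$.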




Purpose of reparameterization

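In reparameterization, we transform the non-full-rank model $y = X\beta + \varepsilon$, where $X$ is $n \times p$ of rank $k < p \le n$, to the full-rank model

$$y = Z\gamma + \varepsilon,$$

where $Z$ is $n \times k$ of rank $k$ and $\gamma = U\beta$ is a set of $k$ linearly independent estimable functions of $\beta$, with $U$ being a $k \times p$ matrix of rank $k < p$.

Thus $Z\gamma = X\beta$ and we can write $Z\gamma = ZU\beta = X\beta$, where $X = ZU$.

Then $ZUU^\top = XU^\top$ and, since $UU^\top$ is nonsingular, $Z = XU^\top (UU^\top)^{-1}$.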




Full-rank model

To establish that $Z$ is full-rank, note that $\mathrm{rank}(Z) \ge \mathrm{rank}(ZU) = \mathrm{rank}(X) = k$.

However, $Z$ cannot have rank greater than $k$ since $Z$ has $k$ columns.

Therefore, $\mathrm{rank}(Z) = k$ and the model $y = Z\gamma + \varepsilon$ is a full-rank model.





Estimation in the full-rank model

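For the full-rank model we can use the normal equations $Z^\top Z \hat{\gamma} = Z^\top y$ to obtain the unique solution $\hat{\gamma} = (Z^\top Z)^{-1} Z^\top y$.

An unbiased estimator of $\sigma^2$ is given by

$$s^2 = \frac{1}{n - k} (y - Z\hat{\gamma})^\top (y - Z\hat{\gamma}).$$

It holds that $Z\hat{\gamma} = X\hat{\beta}$ and

$$(y - X\hat{\beta})^\top (y - X\hat{\beta}) = (y - Z\hat{\gamma})^\top (y - Z\hat{\gamma}).$$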
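The reparameterization can be traced through on a toy example. This sketch assumes the one-way layout with three levels and two made-up observations per level, and takes $U$ so that $\gamma = U\beta$ collects the cell means $\mu + \alpha_i$; that choice of $U$ is an illustration, not something prescribed by the slides.

```python
import numpy as np

# One-way layout with 3 levels and 2 observations per level (assumed sizes).
X = np.hstack([np.ones((6, 1)), np.kron(np.eye(3), np.ones((2, 1)))])
y = np.array([4.0, 6.0, 7.0, 9.0, 1.0, 3.0])   # made-up responses

# Rows of U give the estimable functions mu + alpha_i (assumed choice of U).
U = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 1.0]])

# Z = X U' (U U')^{-1}; U has full row rank, so U U' is invertible.
Z = X @ U.T @ np.linalg.inv(U @ U.T)

# Full-rank model: unique solution of Z'Z gamma = Z'y.
gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)

# One solution of the original normal equations, via the pseudoinverse.
beta_hat = np.linalg.pinv(X.T @ X) @ X.T @ y

print("rank(Z) =", np.linalg.matrix_rank(Z))
print("gamma_hat =", gamma_hat)
print("Z gamma_hat equals X beta_hat:", np.allclose(Z @ gamma_hat, X @ beta_hat))
```

With this $U$, $Z$ is the matrix of group indicators and $\hat{\gamma}$ contains the group means, so the fitted values of the full-rank and the non-full-rank model coincide.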





Indeterminacy of estimable functions

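The set $U\beta = \gamma$ is only one set of linearly independent estimable functions.

Let $V\beta = \delta$ be another set. Then there exists a matrix $W$ such that $y = W\delta + \varepsilon$.

Now an estimable function $\lambda^\top \beta$ can be expressed as

$$\lambda^\top \beta = b^\top \gamma = c^\top \delta.$$

Hence $\lambda^\top \hat{\beta} = b^\top \hat{\gamma} = c^\top \hat{\delta}$, and either reparameterization gives the same estimator of $\lambda^\top \beta$.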




Purpose of side conditions

Side conditions provide (linear) constraints that make the parameters unique.

Side conditions must be non-estimable functions of $\beta$.

Since $\mathrm{rank}(X) = k < p \le n$, the rank deficiency of $X$ is $p - k$.

In order to obtain a unique solution vector $\hat{\beta}$, we must define side conditions that make up this deficiency in rank.



Defining side conditions

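We define side conditions $T\beta = 0$, where $T$ is a $(p - k) \times p$ matrix of rank $p - k$ such that $T\beta$ is a set of non-estimable functions.

If $y = X\beta + \varepsilon$, where $X$ is $n \times p$ of rank $k < p \le n$, then there is a unique vector $\hat{\beta}$ that satisfies both $X^\top X \hat{\beta} = X^\top y$ and $T\hat{\beta} = 0$.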
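A side condition can be imposed numerically as follows. The sketch assumes the toy one-way layout with the sum-to-zero condition $\alpha_1 + \alpha_2 + \alpha_3 = 0$, and it computes the solution as $(X^\top X + T^\top T)^{-1} X^\top y$; that formula is a standard device brought in here as an assumption, since the slides do not state how the unique solution is obtained.

```python
import numpy as np

# One-way layout with 3 levels and 2 observations per level (assumed sizes).
X = np.hstack([np.ones((6, 1)), np.kron(np.eye(3), np.ones((2, 1)))])
y = np.array([4.0, 6.0, 7.0, 9.0, 1.0, 3.0])   # made-up responses

# Side condition alpha_1 + alpha_2 + alpha_3 = 0; here p - k = 1, so T has one row.
T = np.array([[0.0, 1.0, 1.0, 1.0]])

XtX = X.T @ X
Xty = X.T @ y

# X'X is singular, but X'X + T'T is nonsingular when T beta is non-estimable
# and T has rank p - k (assumed route to the constrained solution).
beta_hat = np.linalg.solve(XtX + T.T @ T, Xty)

print("beta_hat =", beta_hat)
print("normal equations satisfied:", np.allclose(XtX @ beta_hat, Xty))
print("side condition satisfied:", np.allclose(T @ beta_hat, 0.0))
```

Under this side condition $\hat{\mu}$ is the grand mean and each $\hat{\alpha}_i$ is the deviation of the $i$th group mean from it.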


Testable hypotheses

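We consider hypotheses about the regression coefficients in the model $y = X\beta + \varepsilon$, where $y \sim N_n(X\beta, \sigma^2 I)$ and $X$ is an $n \times p$ design matrix of rank $k < p \le n$.

A hypothesis $H_0$ is said to be testable if there exists a set of linearly independent estimable functions $\lambda_1^\top \beta, \lambda_2^\top \beta, \dots, \lambda_t^\top \beta$ such that $H_0$ is true if and only if $\lambda_1^\top \beta = \lambda_2^\top \beta = \dots = \lambda_t^\top \beta = 0$.

Often the subset of $\beta$'s whose equality we wish to test is such that every contrast $\sum_i c_i \beta_i$ is estimable; $\sum_i c_i \beta_i$ is a contrast if $\sum_i c_i = 0$.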



Example of a testable hypothesis

Suppose that we have the model

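$$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}, \quad i, j = 1, 2, 3,$$

and the hypothesis of interest is $H_0$: $\alpha_1 = \alpha_2 = \alpha_3$.

By taking linear combinations of the rows of $X\beta$, we can obtain two linearly independent estimable functions $\alpha_1 - \alpha_2$ and $\alpha_1 + \alpha_2 - 2\alpha_3$.

$H_0$ is true if and only if $\alpha_1 - \alpha_2$ and $\alpha_1 + \alpha_2 - 2\alpha_3$ are simultaneously equal to zero.

Therefore, $H_0$ is testable and is equivalent to

$$H_0: \begin{pmatrix} \alpha_1 - \alpha_2 \\ \alpha_1 + \alpha_2 - 2\alpha_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$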
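The estimability claims in this example can be verified with a rank check based on condition i. The sketch assumes one observation per $(i, j)$ cell, the parameter order $(\mu, \alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_2, \beta_3)$, and a hypothetical helper `in_row_space`.

```python
import numpy as np

ones3 = np.ones((3, 1))

# Design matrix of the additive two-way model with one observation per cell;
# rows are ordered (1,1), (1,2), (1,3), (2,1), ..., (3,3).
X = np.hstack([
    np.ones((9, 1)),             # mu
    np.kron(np.eye(3), ones3),   # alpha_1..alpha_3 (i varies slowest)
    np.kron(ones3, np.eye(3)),   # beta_1..beta_3 (j varies fastest)
])

def in_row_space(X, lam):
    """Condition i: lam' is a linear combination of the rows of X."""
    stacked = np.vstack([X, lam])
    return bool(np.linalg.matrix_rank(stacked) == np.linalg.matrix_rank(X))

# Rows of C: alpha_1 - alpha_2 and alpha_1 + alpha_2 - 2 alpha_3.
C = np.array([[0.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, -2.0, 0.0, 0.0, 0.0]])
alpha1_only = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])

print("rank(X) =", np.linalg.matrix_rank(X))
print("rows of C estimable:", [in_row_space(X, row) for row in C])
print("rows of C linearly independent:", np.linalg.matrix_rank(C) == 2)
print("alpha_1 alone estimable:", in_row_space(X, alpha1_only))
```

Both contrasts lie in the row space of $X$ and are linearly independent, whereas $\alpha_1$ on its own is not estimable.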



General linear hypothesis approach

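As illustrated, a hypothesis such as $H_0$: $\alpha_1 = \alpha_2 = \alpha_3$ can be expressed in the form $H_0$: $C\beta = 0$.

We can test this hypothesis in a manner analogous to that used for the general linear hypothesis test for the full-rank model.

We assume that $y \sim N_n(X\beta, \sigma^2 I)$, where $X$ is $n \times p$ of rank $k < p \le n$, that $C$ is an $m \times p$ matrix of rank $m \le k$ such that $C\beta$ is a set of $m$ linearly independent estimable functions, and that $\hat{\beta} = (X^\top X)^{-} X^\top y$.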



General linear hypothesis test

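If $H_0$: $C\beta = 0$ is true, the test statistic

$$F = \frac{\mathrm{SSH}/m}{\mathrm{SSE}/(n - k)} = \frac{(C\hat{\beta})^\top \left[ C (X^\top X)^{-} C^\top \right]^{-1} (C\hat{\beta}) / m}{\mathrm{SSE}/(n - k)}$$

is distributed as $F(m, n - k)$.

Reject $H_0$ if $F \ge F_{\alpha; m; n-k}$, where $F_{\alpha; m; n-k}$ is the upper $\alpha$ percentage point of the (central) $F$ distribution with $m$ and $n - k$ degrees of freedom.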
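The test statistic can be computed step by step on a toy data set. This sketch assumes a one-way layout with three levels and two made-up observations per level, tests $H_0$: $\alpha_1 = \alpha_2 = \alpha_3$ with the two contrasts used earlier, and takes the pseudoinverse as the generalized inverse; all numbers are illustrative.

```python
import numpy as np

# One-way layout with 3 levels and 2 observations per level (assumed sizes).
X = np.hstack([np.ones((6, 1)), np.kron(np.eye(3), np.ones((2, 1)))])
y = np.array([4.0, 6.0, 7.0, 9.0, 1.0, 3.0])   # made-up responses

n = X.shape[0]
k = np.linalg.matrix_rank(X)

G = np.linalg.pinv(X.T @ X)        # a generalized inverse of X'X
beta_hat = G @ X.T @ y             # one solution of the normal equations

# H0: alpha_1 = alpha_2 = alpha_3, written as C beta = 0 with m = 2 rows.
C = np.array([[0.0, 1.0, -1.0, 0.0],
              [0.0, 1.0, 1.0, -2.0]])
m = C.shape[0]

Cb = C @ beta_hat
SSH = float(Cb @ np.linalg.solve(C @ G @ C.T, Cb))   # (Cb)'[C G C']^{-1}(Cb)
residuals = y - X @ beta_hat
SSE = float(residuals @ residuals)

F = (SSH / m) / (SSE / (n - k))
print("SSH =", SSH, "SSE =", SSE, "F =", F)
```

The resulting $F$ would then be compared with the upper $\alpha$ point of the $F(m, n - k)$ distribution, which can be looked up in tables or, if SciPy is available, obtained from `scipy.stats.f.ppf(1 - alpha, m, n - k)`.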

