

Likelihood Ratio Testing under Non-identifiability: Theory and Biomedical Applications

Kung-Yee Liang and Chongzhi Di
Department of Biostatistics
Johns Hopkins University

July 9-10, 2009
National Taiwan University


Outline

• Challenges associated with likelihood inference
• Nuisance parameters absent under null hypothesis
  – Some biomedical examples
  – Statistical implications
• Class I: alternative representation of LR test statistic
  – Implications
• Class II
  – Asymptotic null distribution of LR test statistic
  – Some alternatives
• A genetic linkage example
• Discussion


Likelihood Inference

Likelihood inference has been successful in a variety of scientific fields

• LOD score method for genetic linkage
  – BRCA1 for breast cancer: Hall et al. (1990) Science
• Poisson regression for environmental health
  – Fine air particles (PM10) for increased mortality in total causes and in cardiovascular and respiratory causes: Samet et al. (2000) NEJM
• ML image reconstruction estimate for nuclear medicine
  – Diagnoses for myocardial infarction and cancers


Challenges for Likelihood Inference

• In the absence of sufficient substantive knowledge, the likelihood function may be difficult to fully specify
  – Genetic linkage for complex traits
  – Genome-wide association with thousands of SNPs
  – Gene expression data for tumor cells
• There are computational issues as well for high-dimensional observations
  – High throughput data


Challenges for Likelihood Inference (con’t)

• Impacts of nuisance parameters
  – Inconsistency of MLE with many nuisance parameters (Neyman-Scott problem)
  – Different scientific conclusions with different nuisance parameter values
  – Ill-behaved likelihood function
• Asymptotic approximation not ready


Figure: Parametric linkage analyses, 123 multiplex families, 4 models. HLOD (vertical axis, 0 to 3) against genetic distance in centiMorgans (horizontal axis, 0 to 170) for four models: dominant1, recessive1, dominant2, recessive2


Figure: Beta-binomial log-likelihood (vertical axis: log likelihood; horizontal axis: 0.0 to 1.0)


Challenges for Likelihood Inference (con’t)

• There are situations where some of the “regularity” conditions may be violated
  – Boundary issue (variance components, genetic linkage, etc.): Self & Liang (1987) JASA
  – Discrete parameter space: Lindsay & Roeder (1986) JASA
  – Singular information matrix (admixture model): Rotnitzky et al. (2000) Bernoulli


Nuisance Parameters Absent under Null

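Ex. I.1 (Polar coordinate for bivariate normal)

$$\begin{pmatrix} Y_{1i} \\ Y_{2i} \end{pmatrix} \overset{\text{i.i.d.}}{\sim} \mathrm{BVN}\left( \delta \begin{pmatrix} \cos\gamma \\ \sin\gamma \end{pmatrix},\ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right), \quad i = 1, \ldots, n$$

Under H0: δ = 0, γ is absent

Davies (1977) Biometrika

Andrews and Ploberger (1994) Econometrica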


Examples (con’t)

Ex. I.2 (Stereotype model for ordinal categorical response)

For j = 2, …, C,

$$\log\frac{\Pr(Y = j)}{\Pr(Y = 1)} = \alpha_j + \beta_j^t x = \alpha_j + \phi_j \beta^t x$$

0 = φ1 ≤ φ2 ≤ … ≤ φC = 1

Under H0: β = 0, the φj’s are absent

Anderson (1984) JRSSB


Examples (con’t)

Ex. I.3 (Variance component models)

In certain situations, the covariance matrix of continuous and multivariate observations could be expressed as

δM(γ) + λ1M1 + … + λqMq

A hypothesis of interest is H0: δ = 0 (γ is absent)

Ritz and Skovgaard (2005) Biometrika


Examples (con’t)

Ex. I.4 (Gene-gene interactions)

Consider the following logistic regression model:

$$\operatorname{logit} \Pr(Y = 1 \mid S_1, S_2) = \alpha + \sum_k \delta_k S_{1k} + \sum_j \lambda_j S_{2j} + \gamma \sum_k \sum_j \delta_k \lambda_j S_{1k} S_{2j}$$

To test for genetic association with gene one (S1) while taking into account potential interaction with gene two (S2), the hypothesis of interest is

H0: δ1 = … = δK = 0 (γ is absent)

Chatterjee et al. (2005) American Journal of Human Genetics


Examples (con’t)

Ex. II.1 (Admixture models)

f(y; δ, γ) = δ p(y; γ) + (1 – δ) p(y; γ0)

δ: proportion of linked families

γ: recombination fraction (γ0= 0.5)

Smith (1963) Annals of Human Genetics

The null hypothesis of no genetic linkage can be cast as

H0: δ = 0 (γ is absent) or H0: γ = γ0 (δ is absent)
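The two ways of writing the null can be checked numerically. The following is a minimal sketch, assuming p(y; γ) is the Binomial(m, γ) probability (the form used for the admixture illustration later in the talk); the data vector and the function names are hypothetical.

```python
import numpy as np
from math import comb

def binom_pmf(y, m, p):
    """Binomial(m, p) probability of y successes."""
    return comb(m, y) * p**y * (1 - p)**(m - y)

def admixture_loglik(delta, gamma, y, m, gamma0=0.5):
    """Log-likelihood of delta * p(y; gamma) + (1 - delta) * p(y; gamma0)."""
    mix = np.array([
        delta * binom_pmf(int(k), m, gamma) + (1 - delta) * binom_pmf(int(k), m, gamma0)
        for k in y
    ])
    return float(np.sum(np.log(mix)))

# Hypothetical data: number of recombinants out of m = 4 meioses per family.
y = [0, 1, 1, 2, 3, 2, 4, 1]
m = 4

null_value = admixture_loglik(0.0, 0.5, y, m)
# delta = 0: the likelihood does not depend on gamma (gamma is absent).
same_under_delta0 = [admixture_loglik(0.0, g, y, m) for g in (0.1, 0.3, 0.5)]
# gamma = gamma0: the likelihood does not depend on delta (delta is absent).
same_under_gamma0 = [admixture_loglik(d, 0.5, y, m) for d in (0.2, 0.6, 1.0)]
```

Every entry of both lists equals `null_value` up to rounding, which is the non-identifiability that invalidates the usual chi-square theory.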


Examples (con’t)

Ex. II.2 (Change point)

logit Pr(Y = 1|x) = β0 + βx + δ(x – γ)+

(x – γ)+ = x – γ if x – γ > 0 and 0 otherwise

• Alcohol consumption is protective for MI when consuming less than γ, but harmful when exceeding the threshold

Pastor et al. (1998) American Journal of Epidemiology

The hypothesis that no threshold exists can be cast as

H0: δ = 0 (γ is absent) or H0: γ = ∞ (δ is absent)
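As a small illustration of why either parameter disappears under the null, the sketch below evaluates the linear predictor of the change-point model. The function names and the numeric values are hypothetical.

```python
import numpy as np

def hinge(x, gamma):
    """(x - gamma)+ : equals x - gamma when positive and 0 otherwise."""
    return np.maximum(np.asarray(x, dtype=float) - gamma, 0.0)

def linear_predictor(x, beta0, beta, delta, gamma):
    """logit Pr(Y = 1 | x) = beta0 + beta * x + delta * (x - gamma)+."""
    x = np.asarray(x, dtype=float)
    return beta0 + beta * x + delta * hinge(x, gamma)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
# delta = 0: the predictor is the same for any threshold gamma.
eta_a = linear_predictor(x, -1.0, 0.5, 0.0, 1.0)
eta_b = linear_predictor(x, -1.0, 0.5, 0.0, 3.0)
# gamma = infinity: the hinge term vanishes, so delta plays no role.
eta_c = linear_predictor(x, -1.0, 0.5, 0.8, np.inf)
```

All three predictors coincide with β0 + βx, so the data carry no information about the absent parameter under H0.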


Examples (con’t)

Ex. II.3 (Non-linear alternative)

logit Pr(Y = 1|x) = β0 + βx + δh(x; γ)

e.g., h(x; γ) = exp(xγ) – 1

• The effect of alcohol consumption on the risk of MI, through the log odds, is non-linear if γ ≠ 0

The hypothesis of a linear relationship against a specific non-linear alternative can be cast as

H0: δ = 0 (γ is absent) or H0: γ = 0 (δ is absent)

Gallant (1977) JASA


Characteristics of Examples

The majority of examples can be characterized as

$f(y, x;\ \delta h_{y,x}(\gamma, \beta),\ \beta)$

• Class I: H0: δ = 0
• Class II: H0: δ = 0 or γ = γ0 ($h_{y,x}(\gamma_0, \beta) = 0$)


Figure: expected log likelihood function for three cases


Class I: Asymptotic

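For H0: δ = 0,

1. $$\mathrm{LRT} = 2\{\log L(\hat\delta, \hat\gamma) - \log L(0, \bullet)\} = \sup_\gamma 2\{\log L(\hat\delta(\gamma), \gamma) - \log L(0, \bullet)\} = \sup_\gamma \mathrm{LRT}(\gamma),$$

where $\hat\delta(\gamma)$ maximizes $\log L(\delta, \gamma)$ over δ for fixed γ

2. $$\mathrm{LRT}(\gamma) = S(\gamma)^t I^{-1}(\gamma) S(\gamma) + o_p(1) = W^2(\gamma) + o_p(1),$$

where $S(\gamma) = \partial \log L(\delta, \gamma)/\partial\delta\,|_{\delta = 0}$, $I(\gamma) = \mathrm{var}\{S(\gamma)\}$, $W(\gamma) = I^{-1/2}(\gamma) S(\gamma)$,

and W(γ) is a Gaussian process in γ with mean 0, variance 1 and autocorrelation ρ(γ1, γ2) = cov{W(γ1), W(γ2)}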


Class I: Asymptotic (con’t)

• Results were derived previously by Davies (1977, Biometrika)

• No analytical form available in general

• Approximation, simulation or resampling methods
  – Kim and Siegmund (1989) Biometrika
  – Zhu and Zhang (2006) JRSSB

Q.: Can simplification take place for

• Asymptotic null distribution?

• Approximating the p-value?


Class I: Principal Component Representation

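Principal component decomposition

$$W(\gamma) = \sum_{k=1}^{K} \xi_k\, \omega_k(\gamma), \qquad \rho(\gamma_1, \gamma_2) = \sum_{k=1}^{K} \lambda_k\, \omega_k(\gamma_1)\, \omega_k(\gamma_2)$$

• K could be finite or ∞
• {ξ1, …, ξK} are independent r.v.’s
• ξk ~ N(0, λk), k = 1, …, K
• ρ(γ, γ) = Σk λk ωk(γ)² = 1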


Class I: Principal Component Representation (con’t)

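$$W^2(\gamma) = \Big\{\sum_k \xi_k\, \omega_k(\gamma)\Big\}^2 \le \Big\{\sum_k \xi_k^2/\lambda_k\Big\} \Big\{\sum_k \lambda_k\, \omega_k(\gamma)^2\Big\} = \sum_k \xi_k^2/\lambda_k$$

Consequently, one has

$$\sup_\gamma W^2(\gamma) = \sup_\gamma \Big\{\sum_k \xi_k\, \omega_k(\gamma)\Big\}^2 \le \sum_k \xi_k^2/\lambda_k \sim \chi^2_K$$

• The asymptotic distribution of LRT under H0 is bounded by $\chi^2_K$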
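The inequality on this slide is the Cauchy-Schwarz inequality after rescaling each term by $\sqrt{\lambda_k}$. A worked version of that step, using only the quantities defined in the principal component representation, might read:

```latex
\begin{align*}
W^2(\gamma)
  &= \Big\{ \sum_{k=1}^{K} \frac{\xi_k}{\sqrt{\lambda_k}} \cdot \sqrt{\lambda_k}\,\omega_k(\gamma) \Big\}^2 \\
  &\le \Big\{ \sum_{k=1}^{K} \frac{\xi_k^2}{\lambda_k} \Big\}
       \Big\{ \sum_{k=1}^{K} \lambda_k\,\omega_k(\gamma)^2 \Big\}
   = \sum_{k=1}^{K} \frac{\xi_k^2}{\lambda_k},
\end{align*}
% the last equality uses \rho(\gamma, \gamma) = \sum_k \lambda_k \omega_k(\gamma)^2 = 1.
% Since \xi_k / \sqrt{\lambda_k} \sim N(0, 1) independently, the bound is a sum of K
% independent squared standard normals, i.e. \chi^2_K.
% Equality holds when \xi_k / \sqrt{\lambda_k} \propto \sqrt{\lambda_k}\,\omega_k(\gamma),
% i.e. when \lambda_k \omega_k(\gamma) / \xi_k is the same for all k.
```

The equality condition in the last comment is what determines whether the supremum over γ attains the bound.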


Class I: Simplification

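• Simplify to $\chi^2_K$ if K < ∞ and, for almost every (ξ1, …, ξK), there exists γ such that

λ1ω1(γ)/ξ1 = … = λKωK(γ)/ξK

Ex. I.1 (Polar coordinate for bivariate normal)

$$W(\gamma) = n^{1/2}(\bar Y_1 \cos\gamma + \bar Y_2 \sin\gamma), \qquad \rho(\gamma_1, \gamma_2) = \cos(\gamma_1 - \gamma_2)$$

$$W^2(\gamma) = n(\bar Y_1 \cos\gamma + \bar Y_2 \sin\gamma)^2 \le n(\bar Y_1^2 + \bar Y_2^2)$$

For any $(\bar Y_1, \bar Y_2) \in R^2$, there exists γ ∈ [0, π) such that

$$\cos\gamma / \bar Y_1 = \sin\gamma / \bar Y_2$$

• LRT ~ $\chi^2_2$ instead of $\chi^2_1$

H0: δ = 0 ↔ H0: μ1 = μ2 = 0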
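A quick simulation can illustrate the claim for Ex. I.1. The sketch below, with hypothetical sample size, replication count and grid, draws data under H0, maximizes W²(γ) over a grid on [0, π), and compares the result with the closed form n(Ȳ1² + Ȳ2²).

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_rep = 50, 2000                      # hypothetical sample size and replications
gammas = np.linspace(0.0, np.pi, 2001, endpoint=False)

# Under H0 (delta = 0) each observation is BVN(0, I_2); keep only the sample means.
ybar = rng.standard_normal((n_rep, n, 2)).mean(axis=1)       # shape (n_rep, 2)

# W(gamma) = sqrt(n) * (Ybar1 * cos(gamma) + Ybar2 * sin(gamma)) on the grid.
W = np.sqrt(n) * (ybar[:, [0]] * np.cos(gammas) + ybar[:, [1]] * np.sin(gammas))
sup_W2 = (W ** 2).max(axis=1)            # grid approximation of sup over gamma

# Closed form of the supremum: n * (Ybar1^2 + Ybar2^2), chi-square with 2 d.f.
closed_form = n * (ybar ** 2).sum(axis=1)
mean_lrt = float(sup_W2.mean())
```

The grid supremum matches the closed form to grid accuracy, and the simulated mean is close to 2, the mean of a chi-square with 2 degrees of freedom rather than 1.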


Class I: Simplification (con’t)

• Simplify to $\chi^2_1$ if S(γ) = h(γ) g(Y)
  – ρ(γ1, γ2) = 1

Ex.: Modified admixture models

γ p(y; δ) + (1 – γ) p(y; δ0), γ ∈ [a, 1] with a > 0 fixed

H0: δ = δ0 (γ is absent) and the score function for δ at δ0 is

S(γ) = γ ∂log p(y; δ0)/∂δ

• Known as restricted LRT for testing H0: δ = δ0
• Has been used in genetic linkage studies
  – Lamdeni and Pons (1993) Biometrics
  – Shoukri and Lathrop (1993) Biometrics


Class I: Approximation for P-values

When simplification fails:

Step 1: Calculate W(γ) and ρ(γ1, γ2)

Step 2: Estimate eigenvalues {λ1, …, λK} and eigenfunctions {ω1, …, ωK}, where K is chosen so that the first K components explain more than 95% of the variation

Step 3: Choose a dense grid {γ1, …, γM} and, for i = 1, …, N, repeat the following steps:
  – Simulate ξik ~ N(0, λk) for k = 1, …, K
  – Calculate Wi(γm) = {Σk ξik ωk(γm)}² for each m
  – Find the maximum of {Wi(γ1), …, Wi(γM)}, Ri say

{R1, …, RN} approximates the null distribution of LRT
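The three steps can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the correlation ρ has already been evaluated on the grid as an M × M matrix, it takes the eigenvectors of that matrix as the discretized eigenfunctions, and the function names, grid size and observed value 6.0 are hypothetical. The demonstration uses ρ(γ1, γ2) = cos(γ1 – γ2) from Ex. I.1, where the answer is known to be chi-square with 2 degrees of freedom.

```python
import numpy as np

def simulate_sup_null(rho, n_sim=5000, var_explained=0.95, seed=0):
    """Simulate R_i = max_m W_i(gamma_m)^2 from the grid correlation matrix rho (M x M)."""
    rho = np.asarray(rho, dtype=float)
    # Step 2: eigen-decomposition of the correlation evaluated on the grid.
    evals, evecs = np.linalg.eigh(rho)
    order = np.argsort(evals)[::-1]                 # largest eigenvalue first
    lam = np.clip(evals[order], 0.0, None)          # lambda_k (tiny negatives set to 0)
    omega = evecs[:, order]                         # column k holds omega_k(gamma_m)
    frac = np.cumsum(lam) / lam.sum()
    # Smallest K whose components explain more than var_explained of the variation.
    K = min(int(np.searchsorted(frac, var_explained, side="right")) + 1, len(lam))
    # Step 3: simulate xi_ik ~ N(0, lambda_k), rebuild W_i on the grid, take the maximum.
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((n_sim, K)) * np.sqrt(lam[:K])
    W = xi @ omega[:, :K].T                         # shape (n_sim, M)
    return (W ** 2).max(axis=1), K

def p_value(lrt_obs, R):
    """Proportion of simulated maxima at least as large as the observed LRT."""
    return float(np.mean(R >= lrt_obs))

# Demonstration with the correlation of Ex. I.1 on a grid over [0, pi).
gammas = np.linspace(0.0, np.pi, 200, endpoint=False)
rho = np.cos(gammas[:, None] - gammas[None, :])
R, K = simulate_sup_null(rho)
p = p_value(6.0, R)                                 # 6.0 is a hypothetical observed LRT
```

For this correlation two components explain essentially all the variation, so `K` is 2 and `p` should be near the chi-square (2 d.f.) tail probability exp(–3) ≈ 0.05.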


Class II: Some New Results

• Consider the family

$f(y, x;\ \delta h_{y,x}(\gamma, \beta),\ \beta)$,

where $h_{y,x}(\gamma_0, \beta) = 0$ for all y and x

H0: δ = 0 or γ = γ0

Tasks:
1. Derive asymptotic distribution of LRT under H0
2. Present alternative approaches

• Illustrate through Ex. II.1 (Admixture models)

δ Binom(m, γ) + (1 – δ) Binom(m, γ0), δ ∈ [0, 1] and γ ∈ [0, 0.5], $h_{y,x}(\gamma, \beta) = p(y; \gamma) - p(y; \gamma_0)$

  – For simplicity, assuming β is absent


Class II: LRT Representation

Under H0, f(y, x; δ, γ) = f(y, x; 0, •) = f(y, x; •, γ0),

$$\mathrm{LRT} = \sup_{\delta,\gamma} 2\{\log L(\delta, \gamma) - \log L(0, \bullet)\} = \sup_{\delta,\gamma} \mathrm{LRT}(\delta, \gamma) = \max\{\sup_{1,4} \mathrm{LRT}(a),\ \sup_{2,4} \mathrm{LRT}(b),\ \sup_{3} \mathrm{LRT}(a, b)\},$$

where, for fixed a, b > 0 and γ0 = 0.5:

Region 1: δ ∈ [a, 1], γ ∈ [0.5 – b, 0.5]

Region 2: δ ∈ [0, a], γ ∈ [0, 0.5 – b]

Region 3: δ ∈ [0, a], γ ∈ [0.5 – b, 0.5]

Region 4: δ ∈ [a, 1], γ ∈ [0, 0.5 – b]


Class II: Four Sub-Regions of Parameter Spaces


Class II: Regions 1 & 4

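With δ ∈ [a, 1], this reduces to Class I, and

$$\sup_{1,4} \mathrm{LRT}(a) = \sup_\delta \{W_1^2(\delta) + o_p(1)\}$$

$W_1(\delta) = I_1^{-1/2}(\delta) S_1(\delta)$

$S_1(\delta) = \partial \log L(\delta, \gamma)/\partial\gamma\,|_{\gamma = \gamma_0}$, $I_1(\delta) = \mathrm{var}(S_1(\delta))$

For the admixture models,

• S1(δ) = δ ∂log p(y; γ0)/∂γ, which is proportional to δ

• $W_1^2(\delta)$ is independent of δ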


Class II: Regions 2 & 4

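With γ ∈ [0, γ0 – b], this reduces to Class I, and

$$\sup_{2,4} \mathrm{LRT}(b) = \sup_\gamma \{W_2^2(\gamma) + o_p(1)\}$$

$W_2(\gamma) = I_2^{-1/2}(\gamma) S_2(\gamma)$

$S_2(\gamma) = \partial \log L(\delta, \gamma)/\partial\delta\,|_{\delta = 0}$, $I_2(\gamma) = \mathrm{var}(S_2(\gamma))$

For the admixture models,

• S2(γ) = {p(y; γ) – p(y; γ0)}/p(y; γ0)

• W2(γ) → W1, the standardized score ∂log p(y; γ0)/∂γ, as γ → γ0 (or b → 0)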


Class II: Region 3

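With δ ∈ [0, a], γ ∈ [0.5 – b, 0.5], expand at (δ, γ) = (0, 0.5):

$$\sup_{3} \mathrm{LRT}(a, b) = \sup_{\delta,\gamma} \{W_3^2 + o_p(1)\}$$

$W_3 = I_3^{-1/2} S_3$

$S_3 = \partial^2 \log f(y; \delta, \gamma)/\partial\delta\,\partial\gamma\,|_{(\delta,\gamma) = (0,\,0.5)}$, $I_3 = \mathrm{var}(S_3)$

For the admixture models,

• S3 = ∂log p(y; γ0)/∂γ and W3 = W1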


Class II: Asymptotic Distribution of LRT

Combining the three regions and letting a, b → 0,

$$\mathrm{LRT} = \max\{\sup_{1,4} \mathrm{LRT}(a),\ \sup_{2,4} \mathrm{LRT}(b),\ \sup_{3} \mathrm{LRT}(a, b)\} = \max\{W_1^2,\ \sup_\gamma [W_2(\gamma)]^2,\ W_3^2\} \to \sup_\gamma \{W_2(\gamma)\}^2,$$

where $W_2(0.5) = \lim_{\gamma \to \gamma_0} W_2(\gamma) = W_1 = W_3$

• The asymptotic null distribution of LRT is the supremum of a squared Gaussian process w.r.t. γ
• Simplification (null distribution and approximation to p-values) can be adopted from Class I


Class II: Alternatives

Question: Can one find alternative test statistics with conventional asymptotic null distributions?

1. Restricted LRT: limit the range for δ to [a, 1] with a > 0 (Regions 1 and 4)

$$T_R(a) = \sup_{1,4} \mathrm{LRT}(a) = W_1^2 + o_p(1) \sim \chi^2_1$$

• TR(a) decreases in a

• How to choose a?
  – The smaller the a, the better
  – Chi-square approximation may be in doubt
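The monotonicity of the restricted LRT in a can be seen numerically. The following is a rough grid-search sketch for the binomial admixture model with γ0 = 0.5; the family counts, grids and function names are hypothetical, and a nested δ grid is used so that a larger a always searches a subset of the points searched for a smaller a.

```python
import numpy as np
from math import comb, log

def loglik(delta, gamma, counts, m, gamma0=0.5):
    """Admixture log-likelihood; counts[k] = number of families with y = k."""
    total = 0.0
    for k, n_k in enumerate(counts):
        c = comb(m, k)
        mix = (delta * c * gamma**k * (1 - gamma)**(m - k)
               + (1 - delta) * c * gamma0**k * (1 - gamma0)**(m - k))
        total += n_k * log(mix)
    return total

def restricted_lrt(counts, m, a):
    """T_R(a): LRT with delta restricted to [a, 1], maximized over a grid."""
    deltas_all = np.linspace(0.0, 1.0, 101)
    deltas = deltas_all[deltas_all >= a - 1e-12]    # nested grids across values of a
    gammas = np.linspace(0.01, 0.5, 50)             # includes gamma0 = 0.5
    null = loglik(0.0, 0.5, counts, m)
    best = max(loglik(d, g, counts, m) for d in deltas for g in gammas)
    return 2.0 * (best - null)

counts = [12, 30, 33, 18, 7]                        # hypothetical data, m = 4
m = 4
tr_values = [restricted_lrt(counts, m, a) for a in (0.1, 0.5, 0.9)]
```

The entries of `tr_values` are non-increasing, which is why a small a is attractive for power even though the chi-square approximation becomes less reliable as a approaches 0.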


Class II: Alternatives (con’t)

2. Smooth version (penalized LRT)

Instead of excluding (δ, γ) values in Regions 2 & 3, they are “penalized” toward δ = 0 by considering the penalized log-likelihood:

PL(δ, γ; c) = log L(δ, γ) + c g(δ),

where g(δ) ≤ 0 is a smooth penalty with $\lim_{\delta \to 0} g(\delta) = -\infty$, maximized at δ0, and c > 0 controls the magnitude of the penalty

• Bayesian interpretation
  – g(δ) could be a “prior” on δ
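To make the penalized log-likelihood concrete, here is a minimal sketch for the binomial admixture model. It assumes the penalty g(δ) = log δ, which is one choice satisfying the stated conditions (non-positive on (0, 1], tending to –∞ as δ → 0, maximized at δ0 = 1); the data and function names are hypothetical.

```python
import numpy as np
from math import comb

def admixture_loglik(delta, gamma, y, m, gamma0=0.5):
    """log L(delta, gamma) for delta * Binom(m, gamma) + (1 - delta) * Binom(m, gamma0)."""
    y = np.asarray(y)
    coef = np.array([comb(m, int(k)) for k in y], dtype=float)
    p_alt = coef * gamma**y * (1 - gamma)**(m - y)
    p_null = coef * gamma0**y * (1 - gamma0)**(m - y)
    return float(np.sum(np.log(delta * p_alt + (1 - delta) * p_null)))

def penalized_loglik(delta, gamma, y, m, c, gamma0=0.5):
    """PL(delta, gamma; c) = log L(delta, gamma) + c * g(delta), with assumed g = log."""
    return admixture_loglik(delta, gamma, y, m, gamma0) + c * float(np.log(delta))

y = [0, 1, 1, 2, 3, 2, 4, 1]                        # hypothetical data, m = 4
m = 4
pl_at_one = penalized_loglik(1.0, 0.2, y, m, c=3.0)     # no penalty at delta0 = 1
pl_at_half = penalized_loglik(0.5, 0.2, y, m, c=3.0)    # penalized by 3 * log(0.5)
pl_near_zero = penalized_loglik(1e-8, 0.2, y, m, c=3.0) # heavily penalized
```

With this penalty, values of δ near 0 are discouraged smoothly instead of being cut off at a hard threshold a.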


Class II: Penalized likelihood

Define the penalized LR test statistic for H0: γ = γ0

PLRT(c) = 2 {sup PL(δ, γ; c) – PL(δ0, γ0; c)}

Under H0: PLRT(c) → $\chi^2_1$ as n → ∞

• Proof is more demanding*
• Similar concerns to restricted LRT
  – Decreasing in c
  – Chi-square approximation may be invalid with small c
  – Loses power if δ is small (the proportion of linked families is small)

*A different approach for mixture/admixture models was provided by Chen et al. (2001, 2004) and Fu et al. (2006)
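The behaviour of PLRT(c) in c can be illustrated on simulated data. The sketch below is a grid-search approximation, not the authors' code: it assumes the penalty g(δ) = log δ with δ0 = 1 and γ0 = 0.5, and the simulation settings, grids and function names are hypothetical.

```python
import numpy as np
from math import comb

def loglik_grid(y, m, deltas, gammas, gamma0=0.5):
    """Return log L on the (delta, gamma) grid and the null log-likelihood."""
    counts = np.bincount(np.asarray(y), minlength=m + 1).astype(float)
    k = np.arange(m + 1)
    coef = np.array([comb(m, int(j)) for j in k], dtype=float)
    pmf_alt = (coef[:, None] * gammas[None, :] ** k[:, None]
               * (1.0 - gammas[None, :]) ** (m - k[:, None]))     # shape (m+1, G)
    pmf_null = coef * gamma0 ** k * (1.0 - gamma0) ** (m - k)     # shape (m+1,)
    mix = (deltas[:, None, None] * pmf_alt[None, :, :]
           + (1.0 - deltas[:, None, None]) * pmf_null[None, :, None])
    ll = np.einsum("k,dkg->dg", counts, np.log(mix))              # shape (D, G)
    return ll, float(counts @ np.log(pmf_null))

def plrt(y, m, c, deltas, gammas):
    """PLRT(c) = 2 {sup PL - PL(delta0, gamma0)}; the penalty vanishes at delta0 = 1."""
    ll, ll_null = loglik_grid(y, m, deltas, gammas)
    pl = ll + c * np.log(deltas)[:, None]
    return 2.0 * (pl.max() - ll_null)

# Hypothetical simulation: 100 families, m = 4, 40% linked with gamma = 0.1.
rng = np.random.default_rng(1)
m, n_fam = 4, 100
linked = rng.random(n_fam) < 0.4
y = rng.binomial(m, np.where(linked, 0.1, 0.5))

deltas = np.linspace(0.01, 1.0, 100)
gammas = np.linspace(0.01, 0.5, 50)
plrt_values = [plrt(y, m, c, deltas, gammas) for c in (0.01, 0.5, 3.0)]
```

Because log δ ≤ 0, each penalized surface is pointwise non-increasing in c, so the entries of `plrt_values` are non-increasing as well; this is the "decreasing in c" property noted on the slide.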


Application: Genetic Linkage for Schizophrenia

• Conducted by Ann Pulver at Hopkins

• 486 individuals from 54 multiplex families

• Interested in marker D22S942 on chromosome 22

• Schizophrenia is relatively high in prevalence with strong evidence of genetic heterogeneity

• To take this phenomenon into account, consider

δ Binom(m, γ) + (1 – δ) Binom(m, γ0),

where δ ∈ [0, 1], γ ∈ [0, 0.5] with γ0 = 0.5


Genetic Linkage Study of Schizophrenia (con’t)

With this admixture model considered,

• $\hat\delta = 0.4$, $\hat\gamma = 0.06$
• LRT = 6.86, p-value = 0.007
• PLRT with g(δ) = log δ ($\lim_{\delta \to 0} \log\delta = -\infty$, δ0 = 1)
  – PLRT(3.0) = 5.36, p-value = 0.010
  – PLRT(0.5) = 5.49, p-value = 0.009
  – PLRT(0.01) = 6.84, p-value = 0.004
• Which p-value should we trust more?


Figure: asymptotic vs empirical distribution of the LRT for genetic linkage example


Discussion

• The issue considered, namely nuisance parameters absent under the null, is common in practice
• Examples can be classified into Class I and II
  – Class I: H0 can only be specified through δ (= 0)
  – Class II: H0 can be specified either in δ (= 0) or γ (= γ0)
• For Class I, the asymptotic distribution of LRT is well known, and through the principal component representation
  – Deriving sufficient conditions for simple null distribution
  – Proposing a means to approximate p-values


Discussion (con’t)

• For Class II, less well developed
  – Deriving asymptotic null distribution of LRT
• Through this derivation, we observe
  – Connection with Class I
  – Connection with RLRT and PLRT
    • Proof on asymptotics of PLRT is non-trivial
    • Shedding light on why the penalty is applied to δ, not to γ
    • Pointing out some peculiar features and shortcomings of these two approaches

• A genetic linkage example on schizophrenia was presented for illustration


Discussion (con’t)

Some future work:

• Constructing confidence intervals/regions

• Generalizing to partial/conditional likelihood
  – Cox PH model with change point (nuisance function)

• Extending to estimating function approach in the absence of likelihood function to work with
  – Linkage study of IBD sharing for affected sibpairs

$$E(S(t)) = 1 + (1 - 2\theta_{t,\gamma})^2 \{E(S(\gamma)) - 1\} = 1 + (1 - 2\theta_{t,\gamma})^2\, \delta$$

$$\theta_{t,\gamma} = \{1 - \exp(-0.02\,|t - \gamma|)\}/2$$
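The IBD-sharing mean function has the same structure as the earlier examples, which a short sketch can make explicit. The function names and the numeric positions are hypothetical; t and γ are taken to be map positions in centiMorgans.

```python
import numpy as np

def recomb_fraction(t, gamma):
    """theta_{t,gamma} = {1 - exp(-0.02 |t - gamma|)} / 2."""
    return (1.0 - np.exp(-0.02 * np.abs(t - gamma))) / 2.0

def expected_ibd(t, gamma, delta):
    """E(S(t)) = 1 + (1 - 2 * theta_{t,gamma})^2 * delta."""
    theta = recomb_fraction(t, gamma)
    return 1.0 + (1.0 - 2.0 * theta) ** 2 * delta

at_locus = expected_ibd(30.0, 30.0, 0.25)           # theta = 0 at the trait locus
# delta = 0: the mean is 1 wherever the locus gamma is placed (gamma is absent).
null_means = [expected_ibd(30.0, g, 0.0) for g in (0.0, 50.0, 120.0)]
far_theta = float(recomb_fraction(0.0, 1.0e4))      # approaches 1/2 far from the locus
```

Under δ = 0 the mean equals 1 for every γ, so the locus position is a nuisance parameter absent under the null, as in Class I.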