Identification of Molecular Mechanisms that Drive Interindividual Variability Using Mediation Analysis

Joshua Millstein, Assistant Professor

Biostatistics, Preventive Medicine

University of Southern California

[email protected] 1

Conceptual Model

[Diagram nodes: DNA; Molecular Mediators; Exposure (shown twice); Outcome]

Example: Food Allergy

[Diagram nodes: rs7192, rs9275596; HLA-DRB1, HLA-DQB1 DNA Methylation; HLA-DRB1, HLA-DQB1 Gene Expression?; Peanuts; Other Exposures? (shown twice); IgE reactivity]

Hong, et al. 2015.

Example: Food Allergy

Hong, et al. 2015.

DNA : Outcome (GWAS)

DNA : Meth, Meth : Outcome

Causal Inference Test (CIT)

Example: Personalized Diabetes Therapy

[Diagram nodes: rs552668; ADRA2A Expression; Yohimbine; Metformin; Other Exposures?; T2D Control]

Tang, et al. 2014.

Example: Personalized Diabetes Therapy

Tang, et al. 2014.

DNA : Expr, DNA : Outcome, Expr : Outcome

CIT

Yohimbine

Example: Immune Response to Influenza Vaccination

[Diagram nodes: Mult. SNPs; Expr 20 Genes; Other Exposures?; Infl. Vaccine; Antibody Response]

Franco, et al. 2013.

Example: Immune Response to Influenza Vaccination

Franco, et al. 2013.

Vaccine : mRNA, DNA : mRNA

CIT

mRNA : Antibody Response

Causality Conditions

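1) L causes T: $L \rightarrow T$

2) L explains variation in G not explained by T: $L \rightarrow G \mid T$

3) G explains variation in T not explained by L: $G \rightarrow T \mid L$

4) The predictive power of L on T is explained by G: $L \perp T \mid G$

[Diagrams: one graph of L, G, and T per condition; one graph is marked "?"]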

Millstein, et al. 2009. BMC Genetics

Component Tests

A working mathematical definition of ‘causal’ is described by a set of conditions within a linear modeling framework, evaluated with four component hypothesis tests.

Standard F-tests (partial F-tests) can be used for tests 1-3; test 4, however, is an equivalence testing problem.

Equivalence testing requires defining boundaries within which the parameter is sufficiently close to the target.

Non-significance does not equate to significant equivalence.
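To make the distinction between non-significance and equivalence concrete, here is a minimal sketch of a generic two one-sided tests (TOST) procedure under a normal approximation. This is a textbook equivalence test swapped in for illustration, not the procedure the CIT uses for test 4; the function name `tost_p_value` and the boundary `delta` are assumptions.

```python
from statistics import NormalDist

def tost_p_value(estimate, std_error, delta):
    """Two one-sided tests (TOST) p-value for H0: |beta| >= delta.

    Normal approximation; `delta` is a hypothetical equivalence boundary.
    """
    z = NormalDist()
    # H0: beta <= -delta versus H1: beta > -delta
    p_lower = 1.0 - z.cdf((estimate + delta) / std_error)
    # H0: beta >= delta versus H1: beta < delta
    p_upper = z.cdf((estimate - delta) / std_error)
    # Equivalence is claimed only if both one-sided nulls are rejected.
    return max(p_lower, p_upper)

# A precise estimate near zero versus an imprecise estimate of exactly zero.
precise = tost_p_value(estimate=0.02, std_error=0.03, delta=0.1)
imprecise = tost_p_value(estimate=0.0, std_error=0.1, delta=0.1)
print(f"precise estimate:   p = {precise:.4f}")
print(f"imprecise estimate: p = {imprecise:.4f}")
```

The second estimate is exactly zero, so an ordinary test could never reject a zero effect, yet its equivalence p-value is about 0.16: the data are too imprecise to show that the parameter lies within the boundaries.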

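Linear models:

$$T = \alpha_1 + \beta_1 L + \varepsilon_1$$

$$G = \alpha_2 + \beta_2 L + \beta_3 T + \varepsilon_2$$

$$T = \alpha_3 + \beta_4 G + \beta_5 L + \varepsilon_3$$

Component hypotheses:

1) $H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$

2) $H_0: \beta_2 = 0$, $H_1: \beta_2 \neq 0$

3) $H_0: \beta_4 = 0$, $H_1: \beta_4 \neq 0$

4) $H_0: \beta_5 \neq 0$, $H_1: \beta_5 = 0$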
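The three regressions can be illustrated on simulated data. The sketch below assumes a causal chain L → G → T with arbitrary effect sizes (0.8 and 0.7) and fits each model by ordinary least squares in pure Python. It reports point estimates only; it does not carry out the F-tests or the equivalence test, and the helper `ols` is hypothetical, not part of the ‘cit’ package.

```python
import random

def ols(columns, y):
    """Least-squares coefficients for y ~ 1 + columns (intercept first)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * b for a, b in zip(A[r], A[p])]
    return [A[r][k] / A[r][r] for r in range(k)]

# Simulate the causal model L -> G -> T (assumed effect sizes).
random.seed(1)
n = 5000
L = [float(random.choice([0, 1, 2])) for _ in range(n)]
G = [0.8 * l + random.gauss(0, 1) for l in L]
T = [0.7 * g + random.gauss(0, 1) for g in G]

a1, b1 = ols([L], T)          # T = a1 + b1*L + e1
a2, b2, b3 = ols([L, T], G)   # G = a2 + b2*L + b3*T + e2
a3, b4, b5 = ols([G, L], T)   # T = a3 + b4*G + b5*L + e3

print(f"b1 = {b1:.3f}  (L associated with T)")
print(f"b2 = {b2:.3f}  (L associated with G given T)")
print(f"b4 = {b4:.3f}  (G associated with T given L)")
print(f"b5 = {b5:.3f}  (L and T given G; near zero under the causal model)")
```

Under this simulated causal model, the first three coefficients are clearly nonzero while the last is close to zero, which is the pattern the four conditions describe.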




Intersection/union test:

• Union of the acceptance regions

• Intersection of the rejection regions

• Conservative test of the union of null hypotheses

• P-value is the max of component test p-values

Software freely available from CRAN:

• R package, ‘cit’

• https://cran.r-project.org/web/packages/cit/index.html

• Continuous outcome

• Continuous potential mediator

• Single instrumental variable with values, {0, 1, 2}

$p_{CIT} = \max(p_1, p_2, p_3, p_4)$
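As a small illustration of the intersection/union rule, the sketch below combines four component p-values by taking their maximum. The function name `cit_p_value` and the example p-values are made up for illustration; this is not the ‘cit’ package interface.

```python
def cit_p_value(component_p_values):
    """Intersection/union p-value: the largest component p-value."""
    if len(component_p_values) != 4:
        raise ValueError("expected four component p-values")
    return max(component_p_values)

# Hypothetical p-values for component tests 1-4.
p_cit = cit_p_value([1e-6, 0.003, 0.0004, 0.04])
print(p_cit)
```

Because the overall p-value equals the weakest component result, the combined test rejects only when every component test rejects.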


Casella & Berger, 2002.





CIT False Discovery Rate (qCIT)


Remaining issues:

• How to adjust for multiple testing?

• What if parametric assumptions fail?

• Not obvious how to estimate empirical null distribution using permutation

Solution: novel intersection/union type test with FDR

• Let P[ TD1 ] denote P[ true discovery for component test 1 | FDR = q1 ]

• True discovery union = P[ all TDs ] = P[ TD1 ] * P[ TD2 ] * …

• P[ TD1 ] = 1 – P[ FD1 ] = 1 – q1

• qCIT = P[ any false discovery ] = 1 – P[ all TDs ]

$$q_{CIT} = 1 - (1 - q_1)(1 - q_2)(1 - q_3)(1 - q_4) \geq \max(q_1, q_2, q_3, q_4)$$
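The qCIT formula can be checked numerically. The sketch below is a direct transcription of the product formula; the function name `q_cit` and the example q-values are hypothetical.

```python
import math

def q_cit(component_q_values):
    """Combine component FDR values: 1 - product of (1 - q_i)."""
    return 1.0 - math.prod(1.0 - q for q in component_q_values)

# Hypothetical FDR estimates for component tests 1-4.
qs = [0.01, 0.02, 0.03, 0.04]
q = q_cit(qs)
print(f"qCIT = {q:.6f}, max(q) = {max(qs):.2f}")
```

The combined value is never smaller than the largest component q-value, matching the inequality above.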


Millstein FDR estimators

Parametric:

$$\widehat{FDR} = \frac{m\alpha}{S} \cdot \frac{1 - S/m}{1 - \alpha}$$

Permutation-Based (non-parametric):

$$\widehat{FDR} = \frac{S^*}{S} \cdot \frac{1 - S/m}{1 - S^*/m}$$

m: total number of tests conducted

α: significance level

S: number of tests with p-value < α

S*: number of tests with p-value < α from a permuted replicate dataset
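A minimal sketch of the two estimators, using the quantities m, α, S, and S* defined above. The function names and the example counts are hypothetical, and the permutation version takes S* from a single permuted replicate, as on the slide; this is not the ‘fdrci’ package interface.

```python
def fdr_parametric(S, alpha, m):
    """Parametric estimator: (m*alpha / S) * (1 - S/m) / (1 - alpha)."""
    if S == 0:
        raise ValueError("no significant tests: the estimator is undefined")
    return (m * alpha / S) * (1 - S / m) / (1 - alpha)

def fdr_permutation(S, S_star, m):
    """Permutation-based estimator: (S*/S) * (1 - S/m) / (1 - S*/m)."""
    if S == 0:
        raise ValueError("no significant tests: the estimator is undefined")
    return (S_star / S) * (1 - S / m) / (1 - S_star / m)

# Hypothetical counts: 1000 tests, 100 significant in the observed data,
# 40 significant in one permuted replicate, significance level 0.05.
m, S, S_star, alpha = 1000, 100, 40, 0.05
fdr_perm = fdr_permutation(S, S_star, m)
fdr_par = fdr_parametric(S, alpha, m)
print(f"permutation-based FDR = {fdr_perm:.4f}")
print(f"parametric FDR        = {fdr_par:.4f}")
```

The two functions differ only in how the expected number of null discoveries is obtained: the parametric form uses mα, while the permutation form uses the observed count S* from the permuted data.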


Advantages:

1. Non-parametric

2. Powerful

3. Confidence Intervals for FDR

4. R package, ‘fdrci’

5. https://cran.r-project.org/web/packages/fdrci/index.html

Millstein & Volfson, 2013. Frontiers in Genetics

Permutation-Based FDR Example






Extensions to the CIT software:

1. Binary as well as continuous outcomes

2. Multiple binary and/or continuous instrumental variables

3. Permutation-based FDR

4. Parametric FDR

5. Adjustment covariates

Millstein & Volfson, 2013. Frontiers in Genetics


[Figure: candidate models relating L, G, and T: Causal, Reactive, Independent]

Underlying Model: Causal
