algorithms for association mapping of complex diseases with ancestral recombination graphs

RECOMB Satellite Workshop, 2007. Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs. Yufeng Wu, UC Davis


Page 1: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

RECOMB Satellite Workshop, 2007

Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

Yufeng Wu, UC Davis

Page 2: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs


Association (or LD) Mapping

• Given a subset of SNPs from unrelated individuals, find unobserved genetic variations that strongly discriminate individuals with the trait (cases) and those without the trait (controls)

• Complex Diseases: difficult to map

Page 3: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs


Illustration (Zollner and Pritchard, Genetics, 2005)

[Figure: sampled sequences grouped into Cases and Controls, shown at six SNP markers]

1: 001101
2: 110000
3: 001110
4: 001000
5: 000010
6: 111101
7: 100011
8: 110001
9: 110010
10: 100011
11: 010000
12: 101101

Page 4: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

Some Challenges in Association Mapping

Page 5: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

The Genealogy Approach

• “..the best information that we could possibly get about association is to know the full coalescent genealogy…” – Zollner and Pritchard

• Goal: infer genealogy from marker data with recombination
  – Approximation (e.g. in Zollner and Pritchard)

Page 6: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

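Ancestral Recombination Graph (ARG)

[Figure: an ARG with node sequences 00, 01, 10 and 11; edges are annotated with mutations, and one node is a recombination.]

Sequences: S1 = 00, S2 = 01, S3 = 10, S4 = 11 (one copy of the list in the slide shows S4 = 10)

Assumption: at most one mutation per site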
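To make the recombination node concrete, here is a minimal Python sketch. It assumes a single crossover that copies a prefix from one parent and the remaining suffix from the other; the function name `recombine` and the `cut` convention are illustrative and do not come from the slides.

```python
def recombine(prefix_parent: str, suffix_parent: str, cut: int) -> str:
    """Single-crossover recombination (assumed model): copy sites
    [0, cut) from prefix_parent and sites [cut, end) from suffix_parent."""
    if len(prefix_parent) != len(suffix_parent):
        raise ValueError("parents must have the same number of sites")
    if not 0 < cut < len(prefix_parent):
        raise ValueError("cut must fall strictly between two sites")
    return prefix_parent[:cut] + suffix_parent[cut:]


# S4 = 11 can arise from S3 = 10 (prefix) and S2 = 01 (suffix).
print(recombine("10", "01", cut=1))
```

Under the one-mutation-per-site assumption, 11 cannot be reached from 00, 01 and 10 by mutation alone, which is why the example needs a recombination.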

Page 7: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

Full-ARG Approaches

• First full ARG mapping method (Minichiello and Durbin)
  – Use full plausible ARG, but heuristic
  – Less complex disease model

• Our results (Wu, 2007)
  – Sampling full ARGs with provable property, and work on more complex disease model
  – Focus on parsimonious history
    • minARGs: ARGs that use the minimum number of recombinations
    • Near minimum ARGs
  – Uniform sampling of minARGs

Page 8: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs


Special Case: ARG with Only Input Sequences

• Self-derivability (SD) Problem: construct an ARG with only the input sequences

• In fact, such an ARG, if it exists, must be a minARG

• Runs in O(2^n) time

• Heuristics to extend to non-self-derivable data
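One way to picture the self-derivability problem is as a recursion that removes a "last derived" row and recurses on the reduced matrix, memoized over subsets of rows (so at most 2^n subproblems). The sketch below is written under assumptions that are not stated on the slides: a row may be derived last either by one mutation at a site where it is the only sequence carrying its allele, or by one crossover of two other rows (counted per ordered parent pair), and any single remaining row is accepted as the root. The names `derivation_ways` and `count_self_derived` are hypothetical.

```python
from functools import lru_cache


def derivation_ways(row: str, others: frozenset) -> int:
    """Count the ways `row` can be derived last from `others` (assumed rules)."""
    ways = 0
    # (a) One mutation: row differs from a parent at exactly one site, and no
    #     other sequence already carries row's allele at that site.
    for parent in others:
        diff = [i for i, (a, b) in enumerate(zip(row, parent)) if a != b]
        if len(diff) == 1 and all(o[diff[0]] != row[diff[0]] for o in others):
            ways += 1
    # (b) One crossover: a prefix of x followed by a suffix of y, with x != y.
    for x in others:
        for y in others:
            if x != y and any(row == x[:k] + y[k:] for k in range(1, len(row))):
                ways += 1
    return ways


@lru_cache(maxsize=None)
def count_self_derived(rows: frozenset) -> int:
    """Count derivation histories that use only the sequences in `rows`."""
    if len(rows) == 1:
        return 1  # assumption: any single remaining sequence may be the root
    total = 0
    for last in rows:
        rest = rows - {last}
        ways = derivation_ways(last, rest)
        if ways:
            total += ways * count_self_derived(rest)
    return total


# The data are self-derivable (under these rules) iff the count is positive.
print(count_self_derived(frozenset({"00", "01", "10", "11"})) > 0)
print(count_self_derived(frozenset({"00", "11"})) > 0)
```

The counts this sketch returns depend on the assumed derivation rules and root convention, so they are not claimed to reproduce the numbers reported in the talk.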

Page 9: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

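Counting Self-derived ARGs

Input sequences: 00000, 01000, 01100, 01101, 11000, 00010, 11011, 00011

[Figure: the input matrix and two reduced matrices, one for each choice of the last row to derive; the two choices are labeled 1 and 2.]

• Last row 11011 (label 1): N1 = 164 for the reduced matrix
• Last row 01101 (label 2): N2 = 76 for the reduced matrix

N = 164*1 + 76*2 = 316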
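The arithmetic on this slide is consistent with a recurrence over the choice of the last derived row. As a sketch, with notation that is assumed here rather than taken from the slides, let N(M) be the number of self-derived ARGs for matrix M and w(r, M) the number of ways row r can be derived last from the other rows of M:

```latex
N(M) = \sum_{r \in M} w(r, M)\, N\bigl(M \setminus \{r\}\bigr),
\qquad
N = \underbrace{1 \cdot 164}_{r = 11011} + \underbrace{2 \cdot 76}_{r = 01101} = 316
```

Rows that cannot be derived last, or whose reduced matrix admits no self-derived ARG, contribute zero to the sum.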

Page 10: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

[Figure: the same input matrix and its two reduced matrices, with counts 164 (last row 11011) and 76 (last row 01101); total 316.]

Select 11011 with prob = 164/316 = 0.52, and 01101 with prob = 76*2/316 = 0.48

1. Random value Rnd = 0.3 < 0.52
2. Pick seq = 11011 as last row to derive
3. Move to reduced matrix
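The selection step above can be sketched in Python as a weighted choice driven by one uniform random draw. The function name `pick_last_row` is hypothetical; the weights are the numbers shown on the slide.

```python
import random


def pick_last_row(weights: dict, rnd: float) -> str:
    """Pick a key with probability proportional to its weight, given a
    uniform draw rnd in [0, 1)."""
    total = sum(weights.values())
    cumulative = 0.0
    for row, weight in weights.items():
        cumulative += weight / total
        if rnd < cumulative:
            return row
    return row  # guard against floating-point round-off when rnd is near 1


# Weights from the slide: (count for the reduced matrix) * (ways to derive row).
weights = {"11011": 164 * 1, "01101": 76 * 2}

print(pick_last_row(weights, rnd=0.3))              # the slide's draw
print(pick_last_row(weights, rnd=random.random()))  # a fresh random draw
```

Repeating this choice on each reduced matrix until a single row remains yields one sampled history; when the counts are exact, each history is chosen with probability proportional to 1, which is the sense in which the sampling is uniform.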

Page 11: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

An ARG Represents a Set of Marginal Trees

• Clear separation of cases/controls: NOT expected for complex diseases!

Page 12: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs


Disease Model (Zollner & Pritchard)

Disease mutations: Poisson Process

Two alleles: wild-type and mutant

[Figure: a tree with numeric labels, six of 0.05 and two of 0.1.]

Page 13: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs


Disease Penetrance (Zollner & Pritchard)

P_{A,1}: probability that a mutant sequence becomes a case; P_{C,1} = 1.0 - P_{A,1}

P_{A,0}: probability that a wild-type sequence becomes a case; P_{C,0} = 1.0 - P_{A,0}

[Figure: the same labeled tree (labels 0.05 and 0.1), with leaves marked as Case or Control.]

Page 14: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

Phenotype Likelihood (Zollner and Pritchard)

• Given a tree Tx at position x and the case/control phenotypes Φ of its leaves, what is the probability Pr(Φ | Tx) of observing Φ on Tx? (Zollner & Pritchard)
  – Sum over all subsets of mutated edges

• Adopted in this work
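The "sum over all subsets of mutated edges" can be illustrated with a brute-force Python sketch. It is not the method of Zollner and Pritchard or of this talk, and it rests on simplifying assumptions: each edge independently carries a disease mutation with a given probability, a leaf is mutant if any edge above it is mutated, and each leaf becomes a case with the penetrance for its mutant or wild-type status. The function name, the tree encoding, and all numbers in the example are hypothetical.

```python
from itertools import product


def phenotype_likelihood(edges, is_case, p_case_mutant, p_case_wild):
    """Brute-force Pr(phenotypes | tree) by summing over mutated-edge subsets.

    edges: list of (mutation_probability, leaves_below_edge) pairs.
    is_case: dict mapping each leaf name to True (case) or False (control).
    """
    likelihood = 0.0
    for mutated in product([False, True], repeat=len(edges)):
        # Probability of this particular set of mutated edges.
        prob = 1.0
        mutant_leaves = set()
        for (p_edge, below), hit in zip(edges, mutated):
            prob *= p_edge if hit else 1.0 - p_edge
            if hit:
                mutant_leaves |= set(below)
        # Penetrance term for every leaf, given its mutant/wild-type status.
        for leaf, case in is_case.items():
            p_case = p_case_mutant if leaf in mutant_leaves else p_case_wild
            prob *= p_case if case else 1.0 - p_case
        likelihood += prob
    return likelihood


# Hypothetical 3-leaf tree ((a, b), c); the edge probabilities are made up.
edges = [(0.05, ["a"]), (0.05, ["b"]), (0.1, ["a", "b"]), (0.1, ["c"])]
phenotypes = {"a": True, "b": True, "c": False}
print(phenotype_likelihood(edges, phenotypes, p_case_mutant=0.9, p_case_wild=0.1))
```

The enumeration is exponential in the number of edges, so it is only useful for checking small cases by hand.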

Page 15: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

Expected Phenotype Likelihood

• Need for assessing statistical significance.
• Null model: randomly permute case/control labels.
• Our result: O(n^3) algorithm for computing expected value of phenotype likelihood.
  – Exact, fully deterministic method.
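For intuition about the null model, the expectation can be written as an average over every way of assigning the case labels to the leaves. The Python sketch below does this by exhaustive enumeration; it is a baseline for tiny inputs, not the O(n^3) algorithm mentioned above, and the names `expected_under_permutation` and `toy_likelihood` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def expected_under_permutation(likelihood, leaves, n_cases):
    """Average `likelihood` over all labelings with exactly n_cases cases.

    Every distinct labeling arises from the same number of label
    permutations, so this equals the expectation under random permutation.
    """
    values = []
    for cases in combinations(leaves, n_cases):
        labels = {leaf: leaf in cases for leaf in leaves}
        values.append(likelihood(labels))
    return mean(values)


def toy_likelihood(labels):
    """Stand-in for a phenotype likelihood: 1.0 when leaf 'a' is a case."""
    return 1.0 if labels["a"] else 0.0


print(expected_under_permutation(toy_likelihood, ["a", "b", "c"], n_cases=1))
```

The number of labelings grows combinatorially with the number of leaves, which is the motivation for an exact polynomial-time method.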

Page 16: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

Diploid Penetrance

Diploid: two sequences per individual

Diploid penetrance:
P_{A,00}: prob. that an individual with two wild-type sequences becomes a case
P_{A,01}: …, P_{A,11}: …

[Figure legend: Case, Control]

Efficient computation of phenotype likelihood: stated but unresolved in Zollner and Pritchard

Our result (Wu, 2007): computing phenotype likelihood with diploid penetrance is NP-hard

Page 17: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

Simulation Results

Comparison: TMARG (uniform), TMARG (pathway), LATAG, MARGARITA

[Chart: 50 ARGs per data; categories Uniform, Pathway, LATAG, MARGARITA; vertical axis from 0.1 to 0.24]

[Chart: 50/5000 ARGs per data; categories n50, n5000, LATAG, MARGARITA; vertical axis from 0.1 to 0.24]

Page 18: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs


Acknowledgement

• Software available at: http://wwwcsif.cs.ucdavis.edu/~wuyu

• I want to thank
  – Dan Gusfield
  – Dan Brown
  – Chuck Langley
  – Yun S. Song