Page 1

Exploring complex diseases using genome-wide association: challenges and strategies

Li Jin, Ph.D.

Fudan University

CAS-MPG Partner Institute for Computational Biology

HGM2006, Helsinki

Page 2

Positional Cloning

[Diagram: codons AGC (Ser) and GGC (Gly)]

Page 3

Linkage Disequilibrium

Linkage

Page 4

Daly et al. Nature Genetics, 2001

Page 5

Genome-wide Association Study

Candidate Gene/Region Association Study

Select tagSNPs

Genotyping tagSNPs

Association analysis

Page 6

Challenges

• Adjustment for multiple testing and power

• Portability of tagging SNPs between populations

• Population stratification

• Mapping the mutation

• Exploring gene-gene interaction

Page 7

Challenges

• Adjustment for multiple testing and power

• Portability of tagging SNPs between populations

• Population stratification

• Mapping the mutation

• Exploring gene-gene interaction

Page 8

Multiple Testing

• Large number of SNPs

– The number of tagging SNPs remains large ($10^6$)

• Multiple testing problem:

– Stringent p-value ($10^{-6}$ – $10^{-7}$)

– Genome-wide significance level $P_{gw}$: $5 \times 10^{-7}$

– Freimer and Sabatti (2004)

– Sample size and power

• Association: $T = (P^A - P)^{T}\,\Sigma^{-1}\,(P^A - P)$

– Linear transformation: T is invariant

– Nonlinear transformation
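As a rough illustration of why the p-value has to be so stringent, the sketch below applies a plain Bonferroni correction. The slide does not say which correction was used, so the choice of Bonferroni, the family-wise level of 0.05, and the function name `bonferroni_threshold` are assumptions made for this example. With $10^6$ tests the per-test level is $0.05 / 10^6 = 5 \times 10^{-8}$.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha.

    Bonferroni is assumed here for illustration; the slides do not name a method.
    """
    return alpha / n_tests


# Per-test thresholds for two plausible numbers of tagging SNPs.
for n_tests in (10**5, 10**6):
    threshold = bonferroni_threshold(0.05, n_tests)
    print(f"{n_tests:>9,d} tests -> per-test threshold {threshold:.1e}")
```

Bonferroni treats the tests as independent, so it is conservative when tagging SNPs remain correlated.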

Page 9

Motivation

$h(P^A) - h(P)$ compared with $P^A - P$

• Statistics based on $P^A - P$: low power

• Statistics based on $h(P^A) - h(P)$: higher power?

Page 10

Nonlinear Transformations

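Function | $h(x)$ | Derivative $h'(x)$
Entropy | $-x \log x$ | $-(1 + \log x)$
Exponential | $e^{x}$ | $e^{x}$
Polynomial | $x^2 + x + 1$ | $2x + 1$
Sigmoid | $\frac{1}{1 + e^{-x}}$ | $\frac{e^{-x}}{(1 + e^{-x})^2}$
Gaussian | $e^{-\frac{(x-c)^2}{2\sigma^2}}$ | $-\frac{x-c}{\sigma^2}\, e^{-\frac{(x-c)^2}{2\sigma^2}}$
Reciprocal | $\frac{1}{x}$ | $-\frac{1}{x^2}$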
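The slide pairs each function with its derivative. One plausible reason, which is an assumption here and not stated on the slide, is the delta method: the variance of a transformed frequency $h(\hat{p})$ is approximately $h'(p)^2 \, \mathrm{Var}(\hat{p})$. The sketch below builds a one-degree-of-freedom statistic for a single biallelic SNP on that basis. The function names, the single-SNP simplification, and the unpooled binomial variance are all choices made for illustration, not the speaker's implementation.

```python
import math


def delta_method_statistic(p_case, p_ctrl, n_case, n_ctrl, h, dh):
    """1-df statistic comparing h(allele frequency) between cases and controls."""
    # Binomial variance of an allele-frequency estimate from 2N chromosomes.
    var_case = p_case * (1 - p_case) / (2 * n_case)
    var_ctrl = p_ctrl * (1 - p_ctrl) / (2 * n_ctrl)
    # Delta method: Var[h(p_hat)] is approximately h'(p)^2 * Var[p_hat].
    # Note: this is zero (and the ratio undefined) where h'(p) = 0,
    # e.g. the entropy transform at p = 1/e.
    var_diff = dh(p_case) ** 2 * var_case + dh(p_ctrl) ** 2 * var_ctrl
    return (h(p_case) - h(p_ctrl)) ** 2 / var_diff


# A few (function, derivative) pairs taken from the table on this slide.
TRANSFORMS = {
    "linear": (lambda x: x, lambda x: 1.0),
    "entropy": (lambda x: -x * math.log(x), lambda x: -(1 + math.log(x))),
    "exponential": (math.exp, math.exp),
    "reciprocal": (lambda x: 1 / x, lambda x: -1 / x ** 2),
}


def compare_transforms(p_case, p_ctrl, n_case, n_ctrl):
    """Return the statistic under each transform for one SNP."""
    return {
        name: delta_method_statistic(p_case, p_ctrl, n_case, n_ctrl, h, dh)
        for name, (h, dh) in TRANSFORMS.items()
    }


if __name__ == "__main__":
    for name, value in compare_transforms(0.3, 0.2, 100, 100).items():
        print(f"{name:12s} {value:.3f}")
```

With the identity transform the statistic reduces to the squared allele-frequency difference divided by its variance, which gives a baseline against which the nonlinear versions can be compared.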

Page 11

Power (Case-Control)

[Plot: expected noncentrality parameter (y-axis, 0 to 1.5) against allele frequency (x-axis, 0.1 to 0.9); series: Entropy, χ2, Exp, Quadratic, Sigmoid, Gaussian, Reciprocal]

Expected noncentrality parameters of the nonlinear test statistics

$N_A = N_G = 100$, $P_D = 0.5$

Page 12

Association Studies

Association test of MMP-2 gene with esophageal carcinoma

Statistic | P value
entropy | $3.2 \times 10^{-8}$
exponential | $2.3 \times 10^{-7}$
polynomial | $1.9 \times 10^{-7}$
sigmoid | $2.0 \times 10^{-7}$
reciprocal | $5.1 \times 10^{-6}$
χ2 | $7.0 \times 10^{-6}$

Yu C, et al. Cancer Res 2004, 64: 7622-7628

Page 13

Challenges

• Adjustment for multiple testing and power

• Portability of tagging SNPs between populations

• Population stratification

• Mapping the mutation

• Exploring gene-gene interaction

Page 14

How are LD patterns compared between populations?

• Step 1: Infer haplotype blocks for each population

• Step 2: Compare the boundaries of LD blocks between populations

[Diagram: Pop A, Pop B, target SNP]

Page 15

Page 16

Factors Influencing Block Inferences

• Sample size

• Criterion and thresholds

• Genotyping error

• Gene flow

• Search algorithm

Page 17

[Diagram labels: Af, As, Eu; Daic (Thai) ?]

Page 18

Samples

• Uyghur: 45

• Han: 50

• Wa: 45

• Zhuang: 44

• Hmong: 46

• European: 40

• African American: 48

• Samoan: 50

Page 19

SNP Selection and Genotyping

• Selected from dbSNP (build 117)

• Most of them are double-hits

• 26,112 SNPs on chromosome 21

• 1 SNP for every 1.3 kb (Golden Path build 34)

• Illumina BeadLab platform

• 17 oligonucleotide primer sets

• Three QA criteria

– Samples

– SNP: trios & duplicates

– SNP: Hardy-Weinberg expectation
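The Hardy-Weinberg criterion can be illustrated with a standard one-degree-of-freedom chi-square check on genotype counts. The slide does not give the test or cutoff actually used, so the function name `hwe_chi_square` and the choice of a plain chi-square (without continuity correction or an exact test) are assumptions for this sketch.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for departure from Hardy-Weinberg proportions.

    Arguments are observed genotype counts for AA, AB and BB at one SNP.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic SNPs).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))  # genotype counts in exact HW proportions
    print(hwe_chi_square(50, 0, 50))   # no heterozygotes at all
```

A SNP would be flagged when the statistic exceeds a chosen critical value; with small samples such as the 40 to 50 individuals per population here, an exact test may be a better choice.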

Page 20

[Figure panels: Zhuang, Han, Hmong, Samoan, Uyghur, Wa, European, African American]

Page 21

Phylogeny of Human Populations

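[Figure: population tree scaled by genetic distance (FST). Tip labels and legend: HMJ (Hmong), CCY (Zhuang), HAN (Han), WBM (Wa), UIG (Uyghur), EUR (European), AA (African). Values shown on the tree: 0.0684, 0.0372, 0.0093, 0.0133, 0.0093, 0.0202, 0.0039, 0.0103, 0.0341, 0.0016, 0.0023, 0.01]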

Page 22

Measurement of LD Sharing

• SNPs present in both Pop A and Pop B

• SNPs with MAF ≥ 0.1 were included

• In LD if $r^2 \ge c$ (threshold c = 0.1, 0.5, 0.8)

[Diagram: Pop A, Pop B, target SNP, 200 kb window]

a = # LD in A

b = # LD in B

c = # LD in A & B

$S_{AB} = c/a$

$S_{BA} = c/b$
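The sharing measures above can be sketched for a single target SNP as follows. The input format (a dictionary mapping each neighbouring SNP within the window to its $r^2$ with the target SNP), the function name `ld_sharing`, and the use of `>=` for the threshold comparison are assumptions for illustration. MAF filtering and the 200 kb window are taken as already applied to the input.

```python
def ld_sharing(r2_a, r2_b, threshold=0.5):
    """LD sharing around one target SNP between populations A and B.

    r2_a, r2_b: dict mapping neighbour SNP id -> r^2 with the target SNP.
    Returns (a, b, c, S_AB, S_BA) using the slide's notation.
    """
    common = r2_a.keys() & r2_b.keys()  # SNPs present in both populations
    in_ld_a = {snp for snp in common if r2_a[snp] >= threshold}
    in_ld_b = {snp for snp in common if r2_b[snp] >= threshold}
    a = len(in_ld_a)
    b = len(in_ld_b)
    c = len(in_ld_a & in_ld_b)  # in LD with the target SNP in both populations
    s_ab = c / a if a else float("nan")
    s_ba = c / b if b else float("nan")
    return a, b, c, s_ab, s_ba


if __name__ == "__main__":
    pop_a = {"s1": 0.9, "s2": 0.6, "s3": 0.2, "s4": 0.7}
    pop_b = {"s1": 0.8, "s2": 0.3, "s3": 0.6, "s5": 0.9}
    print(ld_sharing(pop_a, pop_b, threshold=0.5))
```

Averaging $S_{AB}$ over many target SNPs would give one value per population pair, which is the kind of quantity that could then be plotted against FST.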

Page 23

[Plot: LD sharing S (y-axis, 0.5 to 0.9) against Fst (x-axis, 0.00 to 0.12); series: r2 > 0.1 and r2 > 0.5]

$S_{AB} \sim F_{ST}$

FST increases with time after divergence (t)

In non-Africans

Page 24

[Diagram: Pop A, Pop B, target SNP, 200 kb window]

a = # LD in A

b = # LD in B

c = # LD in A & B

$S_{AB} = c/a$

$S_{BA} = c/b$

Correlation of LD between Populations = corr(a, b)
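A minimal sketch of corr(a, b), assuming it is a Pearson correlation taken across target SNPs, where each target SNP contributes one count a (SNPs in LD with it in Pop A) and one count b (the same in Pop B). The slide does not specify the type of correlation, so Pearson and the function name `pearson` are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical LD counts for five target SNPs in two populations.
    a_counts = [12, 3, 7, 20, 9]
    b_counts = [10, 4, 6, 15, 11]
    print(round(pearson(a_counts, b_counts), 3))
```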

Page 25

Correlation of LD Between Populations and Genetic Distance (FST)

[Plot: correlation of LD between populations (y-axis, 0.5 to 1) against FST (x-axis, 0 to 0.15)]

Page 26

Portability of tagging SNPs ($R_{AB}$)

$R_{AB} = \dfrac{\text{Number of SNPs captured by tagSNPs}}{\text{Total number of SNPs}}$

[Diagram: Pop A, Pop B]

Portability from A to B = $R_{AB}$
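The ratio can be sketched as a two-step calculation: choose tagSNPs in Pop A, then count how many SNPs those tags capture in Pop B. The slide does not say how tags were selected, so the greedy pairwise-$r^2$ selection below, the threshold of 0.8, the function names, and the convention that a tag counts as capturing itself are all assumptions for illustration.

```python
def greedy_tags(r2, threshold=0.8):
    """Pick tag SNPs greedily from a symmetric r^2 matrix (list of lists)."""
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs;
        # ties go to the lowest index so the result is deterministic.
        best = max(
            sorted(untagged),
            key=lambda i: sum(r2[i][j] >= threshold for j in untagged),
        )
        tags.append(best)
        captured = {j for j in untagged if r2[best][j] >= threshold}
        untagged -= captured | {best}
    return tags


def portability(tags, r2_target, threshold=0.8):
    """Fraction of SNPs in the target population captured by the given tags."""
    n = len(r2_target)
    captured = sum(
        any(t == j or r2_target[t][j] >= threshold for t in tags)
        for j in range(n)
    )
    return captured / n


if __name__ == "__main__":
    # Hypothetical r^2 matrices for four SNPs in two populations.
    r2_a = [[1.0, 0.9, 0.1, 0.1], [0.9, 1.0, 0.1, 0.1],
            [0.1, 0.1, 1.0, 0.85], [0.1, 0.1, 0.85, 1.0]]
    r2_b = [[1.0, 0.9, 0.1, 0.1], [0.9, 1.0, 0.1, 0.1],
            [0.1, 0.1, 1.0, 0.3], [0.1, 0.1, 0.3, 1.0]]
    tags = greedy_tags(r2_a)
    print(tags, portability(tags, r2_b))
```

In the toy data the second LD pair has broken down in Pop B, so one SNP is no longer captured by the tags chosen in Pop A.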

Page 27

[Plot: axis labeled Rab (y-axis, 0.05 to 0.25) against Fst (x-axis, 0.00 to 0.12); series: r2 > 0.1 and r2 > 0.5]

$1 - R_{AB} \sim F_{ST}$

• R can be estimated using FST

• FST can be estimated using a small number of SNPs

• Conclusion: R can be approximately estimated by typing a small number of SNPs
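To make the second bullet concrete, the sketch below estimates FST for two populations from allele frequencies at a handful of SNPs. It uses the heterozygosity form $F_{ST} = (H_T - H_S)/H_T$ with a ratio of sums across loci and equal weighting of the two populations. The slides do not state which estimator was used, so this choice and the function name `fst_two_populations` are assumptions; no sample-size correction is applied.

```python
def fst_two_populations(freqs):
    """Estimate FST from per-SNP allele frequencies in two populations.

    freqs: iterable of (p1, p2) pairs, one pair per biallelic SNP.
    """
    h_s_total = 0.0  # mean within-population heterozygosity, summed over SNPs
    h_t_total = 0.0  # heterozygosity of the pooled population, summed over SNPs
    for p1, p2 in freqs:
        p_bar = (p1 + p2) / 2
        h_s_total += (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        h_t_total += 2 * p_bar * (1 - p_bar)
    if h_t_total == 0:
        return float("nan")  # every SNP is monomorphic
    return (h_t_total - h_s_total) / h_t_total


if __name__ == "__main__":
    # Hypothetical allele frequencies at three SNPs.
    print(fst_two_populations([(0.2, 0.8), (0.5, 0.5), (0.1, 0.3)]))
```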

Page 28

[Diagram: divergence time t, $R_{AB}$, FST]

Page 29

Conclusions

• Substantial LD sharing between populations: ancestral LDs

• tagSNPs are generally portable between populations, at least within Asia

• Portability of tagSNPs from one population to another can be estimated empirically using a small set of SNPs

Page 30

Challenges

• Adjustment for multiple testing and power

• Portability of tagging SNPs between populations

• Population stratification

• Mapping the mutation

• Exploring gene-gene interaction

Page 31

Population Stratification

• 209 languages belonging to 6 linguistic families

• Consistent observation of south-north differentiation

• Affects the power of association studies; false positives

• Different loci show different levels of differentiation: is there an adequate adjustment?

Page 32

Individual tree

Chromosome 21

20,288 SNPs

Page 33

Cluster Decomposition of Chinese Populations

Page 34

Geographic Genetic Clines Based on Principal Components

• Y chromosomes: 143 populations

• mtDNA: 91 populations

• CODIS STRs: 79 populations

• HLA-A: 107 populations

Page 35

Distributions of mtDNA Haplogroups

Page 36

Distributions of Y Haplogroups

Page 37

All haplogroups

All haplogroups

Major haplogroups

Page 38

Uyghurs

Page 39

Uyghurs

Page 40

Population Stratification

• Different loci show different levels of differentiation

• Admixture does exist in at least some of the populations

• Adjustment for population stratification using average differentiation is not adequate

Page 41

Challenges

• Adjustment for multiple testing and power

• Portability of tagging SNPs between populations

• Population stratification

• Mapping the mutation

• Exploring gene-gene interaction

Page 42

Perfect Phylogeny Approach

• No recombination and no recurrent mutation

• No loop in network

• Not necessarily continuous

• Objective: Group SNPs into PP sets

PP(A), PP(B), PP(C)

Page 43

Inference of Phylogeny

[Diagram: tree over haplotypes 1 to 5]

site 1: (1, 2, 3) (4, 5)

site 2: (2, 3) (1, 4, 5)

site 3: (1, 2, 3, 5) (4)

site 4: (2) (1, 3, 4, 5)
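Under the stated conditions (no recombination, no recurrent mutation), each site splits the haplotypes into two groups, and a set of sites fits a single tree only if every pair of splits is compatible, meaning at least one of the four pairwise intersections is empty. The sketch below checks this for the splits listed on this slide, assuming each pair of groups is read as one site in the order given. The function names are illustrative and this is a generic compatibility check, not the HaploTree algorithm.

```python
from itertools import combinations


def compatible(split_a, split_b):
    """True if two bipartitions of the haplotypes can sit on one tree."""
    # Compatible when at least one of the four intersections is empty
    # (equivalent to the four-gamete condition for two biallelic sites).
    return any(not (x & y) for x in split_a for y in split_b)


def is_perfect_phylogeny_set(splits):
    """True if all splits are pairwise compatible."""
    return all(compatible(a, b) for a, b in combinations(splits, 2))


# Splits of haplotypes 1 to 5, as read from the slide.
SLIDE_SPLITS = [
    ({1, 2, 3}, {4, 5}),     # site 1
    ({2, 3}, {1, 4, 5}),     # site 2
    ({1, 2, 3, 5}, {4}),     # site 3
    ({2}, {1, 3, 4, 5}),     # site 4
]

if __name__ == "__main__":
    print(is_perfect_phylogeny_set(SLIDE_SPLITS))
    # A hypothetical extra site that conflicts with site 1.
    print(is_perfect_phylogeny_set(SLIDE_SPLITS + [({1, 4}, {2, 3, 5})]))
```

Grouping SNPs into PP sets could then be approached as finding subsets of sites that are mutually compatible in this sense.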

Page 44

Comparison of Different Algorithms

Sample size | HaploTree accuracy | HaploTree run time | PHASE 2.0.2 accuracy | PHASE 2.0.2 run time | PPH accuracy | PPH run time
25 | 94.81% | 0.36 s | 94.55% | 12.23 s | 92.50% | 0.14 s
50 | 97.44% | 0.58 s | 97.37% | 14.37 s | 96.48% | 0.23 s
100 | 98.78% | 0.82 s | 98.74% | 18.42 s | 98.07% | 0.62 s

Page 45

Inference of Phylogeny

[Diagram: tree over haplotypes 1 to 5]

site 1: (1, 2, 3) (4, 5)

site 2: (2, 3) (1, 4, 5)

site 3: (1, 2, 3, 5) (4)

site 4: (2) (1, 3, 4, 5)

Page 46

Identification of Disease Mutation

• Each PP allows a stepwise search to localize the most likely branch (edge) carrying the mutation.

• The best PP can be determined based on the likelihood (with adjustment for degrees of freedom)

PP(A), PP(B), PP(C)

Page 47

Challenges

• Adjustment for multiple testing and power

• Portability of tagging SNPs between populations

• Population stratification

• Mapping the mutation

• Exploring gene-gene interaction

Page 48

A Study of CAD

• Coronary Atherosclerosis in Chinese Populations

• 123 candidate genes belonging to several pathways, including antioxidant, inflammation, and coagulation

• 1,518 tagSNPs typed

• 916 samples (492 cases and 424 controls)
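The scale of a gene-gene interaction search follows directly from the number of tagSNPs. As a hedged illustration, the sketch below counts all SNP pairs and applies a Bonferroni correction. The slides do not say whether an exhaustive pairwise scan or Bonferroni was used, so both are assumptions for this example. With 1,518 tagSNPs there are 1,151,403 pairs.

```python
import math

N_TAG_SNPS = 1518  # number of tagSNPs typed, from the slide

# Number of unordered SNP pairs in an exhaustive two-locus scan (assumed design).
n_pairs = math.comb(N_TAG_SNPS, 2)

# Bonferroni per-test level at a family-wise 0.05 (assumed for illustration).
pairwise_threshold = 0.05 / n_pairs

if __name__ == "__main__":
    print(f"{n_pairs:,d} SNP pairs")
    print(f"per-pair threshold {pairwise_threshold:.2e}")
```

The pairwise search in this candidate-gene study therefore carries a multiple-testing burden comparable to a single-SNP scan of about a million markers.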

Page 49

Page 50

[Figure: gene-gene interaction network. Groups: Anti-oxidation Pathway, Inflammatory Pathway. Edge types: within-pathway (PW) interaction, between-pathway (PW) interaction. Gene labels shown: CD36, MMP8, PDGFC, DSCR1, ITGB1, ITGA2, PDGFB, SELL, CCR2, ITGA6, LAMA4, EDN1, SELE, TGFB3, VEGF, MSR1, NFKB1, MMP9, IL1B, ACE, PON2, PON3, PON1, GPX3, SOD2, TXN, HMOX1, GSR, GCLM, NOS3, GSS, NPR3]

Page 51

Credits

• University of Texas – Houston

– Momiao Xiong, Jinying Zhao

• Chinese Human Genome Center at Shanghai

– Wei Huang, Haifeng Wang, Ying Wang, Zhu Chen, Guoping Zhao

• Fudan University & CAS-MPG Institute of Computational Biology

– Shuhua Xu, Fuzhong Xue, Yungang He, Yi Wang, Ming Lu, Ji Qian, Bo Wen, Hui Li, Wenqing Fu, Li Jin
