computational genetics winter 2013 lecture 9genetics.cs.ucla.edu/cs124/lecture/lecture9.pdf ·...


Computational Genetics Winter 2013 Lecture 9

Eleazar Eskin University of California, Los Angeles


Ancestry Inference

Lecture 9. February 13th, 2013

(Slides from Eran Halperin)

Ancestry and ALL relapse

- Native American ancestry correlates with relapse of acute lymphoblastic leukemia (ALL).
- Children with more than 10% Native American ancestry have a considerably higher chance of relapse.

Yang et al., Nature Genetics, 2011

Learn about your ancestry

Novembre et al., Nature, 2008

What is a population?

- Defined by frequencies of mutations.

            SNP A   SNP B   SNP C
European    0.3     0.1     0.4
African     0.2     0.2     0.3
Asian       0.2     0.1     0.5

Modeling Mutation Frequency

[Figure: mutation frequency (axis ticks 0.2, 0.5, 0.8) plotted for Spain, Germany, and Russia; two positions are marked "?".]

Likelihood from Population

- SNP has alleles A, G.
- Population frequencies:

  Spain: P(A) = 0.2    Germany: P(A) = 0.5    Russia: P(A) = 0.8

- Genotype likelihoods:

        Spain          Germany     Russia
  AA    (0.2)^2        (0.5)^2     (0.8)^2
  AG    2(0.2)(0.8)    2(0.5)^2    2(0.8)(0.2)
  GG    (0.8)^2        (0.5)^2     (0.2)^2

With $g$ = number of A alleles and $p = P(A)$:

Likelihood: $p^g (1-p)^{2-g}$ (up to the constant factor $\binom{2}{g}$, which does not depend on $p$)

Log likelihood: $g \ln p + (2-g) \ln(1-p)$
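As a small illustration of the table above, the following sketch evaluates the genotype likelihoods for the three populations. The function names are hypothetical and chosen only for this example; the frequencies are the ones given on the slide.

```python
from math import comb, log

# P(A) per population, as listed on the slide.
POP_FREQ = {"Spain": 0.2, "Germany": 0.5, "Russia": 0.8}

def genotype_likelihood(g, p):
    """P(genotype with g copies of A | P(A) = p), including the binomial factor."""
    return comb(2, g) * p**g * (1 - p) ** (2 - g)

def log_likelihood(g, p):
    """Log likelihood without the constant binomial term: g ln p + (2 - g) ln(1 - p)."""
    return g * log(p) + (2 - g) * log(1 - p)

def most_likely_population(g):
    """Population whose allele frequency gives genotype g the highest likelihood."""
    return max(POP_FREQ, key=lambda pop: genotype_likelihood(g, POP_FREQ[pop]))

for g, name in [(2, "AA"), (1, "AG"), (0, "GG")]:
    row = {pop: round(genotype_likelihood(g, p), 2) for pop, p in POP_FREQ.items()}
    print(name, row, "->", most_likely_population(g))
```

With a single SNP the ranking is coarse; in practice the log likelihoods would be summed over many SNPs.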

Spatial Mutation Frequency

[Figure: mutation frequency (axis ticks 0.2, 0.5, 0.8) across Spain, Germany, and Russia, with France also marked.]

2D Mutation Frequency Functions

- Mutation frequency function over a map

$$f(x) = \frac{1}{1 + \exp(-a^T x - b)}$$
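The logistic form above can be evaluated directly. A minimal sketch, assuming positions are 2D map coordinates; the values of `a` and `b` are made up for illustration.

```python
import numpy as np

def frequency(x, a, b):
    """f(x) = 1 / (1 + exp(-a^T x - b)) for one position (2,) or many positions (n, 2)."""
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    return 1.0 / (1.0 + np.exp(-(x @ a) - b))

# Illustrative parameters: frequency rises from west to east, flat north to south.
a, b = [2.0, 0.0], -1.0
positions = np.array([[0.0, 0.5], [0.5, 0.5], [1.0, 0.5]])
print(frequency(positions, a, b))
```

The vector `a` sets the direction and steepness of the frequency gradient, and `b` shifts where the frequency crosses 0.5.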

Fitting mutation frequency functions

- Fitting frequency functions
  - Log likelihood:

    $$\sum_i \sum_j \left[ g_{ij} \ln f_j(x_i) + (2 - g_{ij}) \ln(1 - f_j(x_i)) \right]$$

  - Fitting mutation frequency:

    $$\max_{a_j, b_j} \sum_i \left[ g_{ij} \ln f_j(x_i) + (2 - g_{ij}) \ln(1 - f_j(x_i)) \right], \qquad f_j(x) = \frac{1}{1 + \exp(-a_j^T x - b_j)}$$

  - Can be solved with convex optimization (equivalently, minimize the negative log likelihood)
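One way to carry out this fit is plain gradient ascent on the per-SNP log likelihood. The sketch below is only an illustration under that assumption (the slides say "convex optimization" without naming a solver); the simulated data, step size, and function names are hypothetical. It uses the fact that the derivative of the log likelihood with respect to $a_j^T x_i + b_j$ is $g_{ij} - 2 f_j(x_i)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(a, b, X, g):
    """Sum over individuals of g ln f(x) + (2 - g) ln(1 - f(x)) for one SNP."""
    f = sigmoid(X @ a + b)
    return float(np.sum(g * np.log(f) + (2 - g) * np.log(1 - f)))

def fit_snp(X, g, steps=2000, lr=0.5):
    """Gradient ascent for one SNP. X: (n, 2) known positions, g: (n,) genotypes in {0, 1, 2}."""
    n, d = X.shape
    a, b = np.zeros(d), 0.0
    for _ in range(steps):
        resid = g - 2.0 * sigmoid(X @ a + b)  # d(log likelihood)/dz per individual
        a += lr * (X.T @ resid) / n
        b += lr * resid.sum() / n
    return a, b

# Simulated example: individuals on the unit square, frequency increasing along the first axis.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 2))
true_a, true_b = np.array([3.0, 0.0]), -1.5
g = rng.binomial(2, sigmoid(X @ true_a + true_b))
a_hat, b_hat = fit_snp(X, g)
print(a_hat, b_hat)
```

Because each SNP has its own $(a_j, b_j)$, the fits are independent and could be run per SNP in parallel.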

Placing Individuals

- Given known frequency functions
- Given individual genotypes
- Maximum likelihood placement:

  $$\max_{x_i} \sum_j \left[ g_{ij} \ln f_j(x_i) + (2 - g_{ij}) \ln(1 - f_j(x_i)) \right]$$

  - Convex optimization

[Figure: frequency function plots for SNP A, SNP B, and SNP C; all axes range from 0 to 1.]
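The placement step can be sketched the same way as the fitting step, this time holding the frequency functions fixed and moving the position. This is a hedged illustration using gradient ascent, not necessarily the solver used by the authors; the simulated slopes, intercepts, and names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def place_individual(g, A, b, steps=1000, lr=0.1):
    """Maximum likelihood position for one individual.
    g: (m,) genotypes in {0, 1, 2}; A: (m, 2) slopes a_j; b: (m,) intercepts b_j."""
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        resid = g - 2.0 * sigmoid(A @ x + b)  # per-SNP derivative w.r.t. a_j^T x + b_j
        x += lr * (A.T @ resid) / len(g)
    return x

# Simulated example: 2000 SNPs with known (made-up) frequency functions.
rng = np.random.default_rng(1)
m = 2000
A = rng.normal(0.0, 2.0, size=(m, 2))
b = rng.normal(0.0, 1.0, size=m)
true_x = np.array([0.3, 0.7])
g = rng.binomial(2, sigmoid(A @ true_x + b))
x_hat = place_individual(g, A, b)
print(x_hat)
```

With many SNPs the estimate concentrates near the true position; with few SNPs the placement is much noisier.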

Chicken or egg...

[Figure: frequency function plots for SNP A, SNP B, and SNP C; all axes range from 0 to 1.]

Localization through a probabilistic model

- The allele frequency at SNP $j$ for an individual at position $(x,y)$ is given by $f_j(x,y)$.
- To find $(x,y)$ for each individual and $f_j$ for each SNP $j$, one can optimize the likelihood by alternating two steps:
  - Maximize the likelihood of the positions given the slope functions.
  - Maximize the likelihood of the slope functions given the positions.
- Can be used to localize mixed individuals.

POPRES Data

- 3,192 individuals
- 500,568 SNPs
- Each individual has 4 grandparents from the same country
- [Novembre et al., 2008]
  - PCA Method

SPatial Ancestry Analysis (SPA)

- 3,192 individuals
- Each with ancestry from a single country in Europe

Globe Mapping

- Frequency functions defined on a sphere

Genomes and World Geography

[Figure: Human Genome Diversity Panel samples by region: Africa, Europe, Middle East, Central South Asia, East Asia, Oceania, America.]

Detecting Selection

- Sharp allele frequency changes may indicate selection

$$f(x) = \frac{1}{1 + \exp(-a^T x - b)}$$

[Figure label: Possible Selection]

LCT Gene Region

[Figure: Chromosome 2]

Genes under selection

[Figure panels: Typical Gene, LCT Gene]

Most extreme frequency changes

Principal Component Analysis

- Each individual can be thought of as a vector in $\{0,1,2\}^n$, where $n$ is the number of SNPs.
  - We get a matrix $G$ of genotypes.
- PCA searches for the $n$-dimensional direction $v$ such that the projection of the genotypes onto $v$ has maximum variance.
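A minimal sketch of this idea, assuming the genotype matrix $G$ has one row per individual and one column per SNP, and using a singular value decomposition of the column-centered matrix. The two simulated populations and their allele frequencies are made up for illustration.

```python
import numpy as np

def top_pcs(G, k=2):
    """Project individuals onto the top-k variance-maximizing directions.
    G: (individuals, SNPs) genotype matrix with entries in {0, 1, 2}."""
    centered = G - G.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    directions = Vt[:k]                  # each row is a direction v of length n (SNPs)
    return centered @ directions.T, directions

# Two simulated populations with clearly different allele frequencies.
rng = np.random.default_rng(0)
n_snps = 200
p1 = rng.uniform(0.05, 0.45, size=n_snps)
p2 = p1 + 0.5
G = np.vstack([rng.binomial(2, p1, size=(40, n_snps)),
               rng.binomial(2, p2, size=(40, n_snps))])
scores, directions = top_pcs(G, k=2)
pc1 = scores[:, 0]
print(pc1[:3], pc1[-3:])
```

In this toy setting the first principal component separates the two populations; real data with subtler differences generally needs many more SNPs.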

PCA Overview

Principal Component Analysis

Plotting the data on the one-dimensional line along which the variance is maximized.

PCA Mapping

HapMap PCA 1-2

[Figure: scatter plot of principal components 1 and 2. Legend: African ancestry in Southwest USA; Utah residents with Northern and Western European ancestry; Han Chinese in Beijing, China; Chinese in Metropolitan Denver, Colorado; Gujarati Indians in Houston, Texas; Japanese in Tokyo, Japan; Luhya in Webuye, Kenya; Mexican ancestry in Los Angeles, California; Maasai in Kinyawa, Kenya; Toscans in Italy; Yoruba in Ibadan, Nigeria (West Africa).]

HapMap PCA 1-3

[Figure: plot of principal components 1-3. Legend: African ancestry in Southwest USA; Utah residents with Northern and Western European ancestry; Han Chinese in Beijing, China; Chinese in Metropolitan Denver, Colorado; Gujarati Indians in Houston, Texas; Japanese in Tokyo, Japan; Luhya in Webuye, Kenya; Mexican ancestry in Los Angeles, California; Maasai in Kinyawa, Kenya; Toscans in Italy; Yoruba in Ibadan, Nigeria (West Africa).]

HapMap PCA 1,2,4

[Figure: plot of principal components 1, 2, and 4. Legend: African ancestry in Southwest USA; Utah residents with Northern and Western European ancestry; Han Chinese in Beijing, China; Chinese in Metropolitan Denver, Colorado; Gujarati Indians in Houston, Texas; Japanese in Tokyo, Japan; Luhya in Webuye, Kenya; Mexican ancestry in Los Angeles, California; Maasai in Kinyawa, Kenya; Toscans in Italy; Yoruba in Ibadan, Nigeria (West Africa).]


PCA localization of mixed individuals

Recently Admixed Populations

[Figure: percent racial admixture (0% to 100%) for individual subjects 1-90, Puerto Rican Population (GALA study, E. Burchard). Ancestry components: European, Native American, African.]

Ancestral diversity

[Figure: two panels of ancestry proportions (axis 0.0 to 1.0; components Native Am, YRI, CEU): GALA Mexicans (Founders) and GALA Puerto Ricans (Founders).]

Differences in ancestry across populations can reveal historical patterns.


Recently Admixed Populations

After generation 1


Recently Admixed Populations

After generation 10

Admixture Mapping

Admixture Mapping: Finding regions with disproportional local ancestry.

Example: (Reich et al., Nature Genetics, 2005)

- Multiple Sclerosis (MS) is more prevalent in northern Europeans than in Africans.
- Consider African-American individuals with MS.
- Look for regions that are significantly enriched with European ancestry.

The incorporation of locus-specific ancestry into the statistic adds 10%-50% power (Pasaniuc et al., PLOS Genetics, 2011).

Admixture Mapping

Admixture Mapping: Finding regions with disproportional local ancestry.

Admixture Mapping has been widely performed (mostly on African-Americans) for BMI, hypertension, end-stage renal disease, prostate cancer, and others.

Inference of locus-specific ancestry

- For each individual, the ancestry can be described as a vector over {0,1,2}.

Breakpoints are Poisson distributed: λ ~ number of generations × recombination rate

Sankararaman et al., American J. Human Genet., 2008

Inferring Ancestries in Windows

Local predictions

With high probability, there is no breakpoint in the window (length << 1/λ).

The ancestry is decided by majority vote across overlapping windows.

Sankararaman et al., American J. Human Genet., 2008
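The two ingredients above can be sketched as follows: the Poisson probability that a window contains no breakpoint, and a per-SNP majority vote over overlapping window calls. This is an illustrative sketch, not the published implementation; the function names, the rate of 1e-8 recombinations per base pair per generation, and the 7 generations are assumptions chosen only for the example.

```python
from collections import Counter
from math import exp

def prob_no_breakpoint(lam, length):
    """P(zero Poisson events) in a window; lam is the expected breakpoints per unit length."""
    return exp(-lam * length)

def majority_vote(n_snps, window_calls):
    """Combine overlapping window predictions into one ancestry call per SNP.
    window_calls: list of (start, end, ancestry) with end exclusive."""
    votes = [Counter() for _ in range(n_snps)]
    for start, end, ancestry in window_calls:
        for i in range(start, min(end, n_snps)):
            votes[i][ancestry] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

# Assumed values: 7 generations since admixture, 1e-8 recombinations per bp per generation.
lam = 7 * 1e-8
print(prob_no_breakpoint(lam, 100_000))   # probability that a 100 kb window has no breakpoint

calls = [(0, 6, 0), (2, 8, 0), (4, 10, 1), (6, 12, 1)]
print(majority_vote(12, calls))
```

Shorter windows make the no-breakpoint assumption safer but leave fewer SNPs per window to distinguish the ancestries, which is the trade-off behind the condition length << 1/λ.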

Supervised Inference of Local Ancestry

- For each individual, the ancestry can be described as a vector over {0,1,2}.
- The allele frequencies of the ancestral populations are known.

Supervised Inference of Local Ancestry

- For each individual, the ancestry can be described as a vector over {0,1,2}.

Breakpoints are Poisson distributed: λ ~ number of generations × recombination rate

Pasaniuc et al., Bioinformatics, 2009

Inferring Ancestries in Windows

Local predictions

With high probability, there is at most one breakpoint in the window.

Pasaniuc et al., Bioinformatics, 2009

Breakpoint calculation

[Figure: breakpoint diagram with labels R, As1, At1, At2, As2.]

- $F(s,t,r)$: the probability of having $A_s, A_t$ in the first $r$ SNPs
- $B(s,t,r)$: the probability of having $A_s, A_t$ in SNPs $r, r+1, \ldots$
- $F$ and $B$ can be computed in linear time using dynamic programming
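One possible reading of $F$ and $B$, offered only as an assumption since the slide does not spell out the model: treat each SNP's genotype as independent given the ancestry pair $(s,t)$, so that $F$ is a running prefix sum and $B$ a running suffix sum of per-SNP log likelihoods. The best single breakpoint then maximizes the prefix score under one ancestry pair plus the suffix score under another. All names and the toy frequencies below are hypothetical.

```python
import numpy as np

def genotype_prob(g, ps, pt):
    """P(genotype g | one allele drawn from population s, the other from population t)."""
    p0 = (1 - ps) * (1 - pt)
    p2 = ps * pt
    return np.where(g == 0, p0, np.where(g == 2, p2, 1 - p0 - p2))

def prefix_suffix(g, ps, pt):
    """Log-space F and B, each computed in one linear pass.
    F[r]: log probability of the first r SNPs; B[r]: log probability of SNPs r, r+1, ... (0-indexed)."""
    ll = np.log(genotype_prob(g, ps, pt))
    F = np.concatenate(([0.0], np.cumsum(ll)))
    B = np.concatenate((np.cumsum(ll[::-1])[::-1], [0.0]))
    return F, B

def best_breakpoint(g, freqs, before, after):
    """before/after: (s, t) population index pairs; freqs: (K, m) allele frequencies."""
    F, _ = prefix_suffix(g, freqs[before[0]], freqs[before[1]])
    _, B = prefix_suffix(g, freqs[after[0]], freqs[after[1]])
    total = F + B                       # total[r]: breakpoint after the first r SNPs
    return int(total.argmax()), float(total.max())

# Toy window of 12 SNPs: population 0 has P(A) = 0.9, population 1 has P(A) = 0.1.
freqs = np.array([[0.9] * 12, [0.1] * 12])
g = np.array([2] * 6 + [0] * 6)
r, score = best_breakpoint(g, freqs, before=(0, 0), after=(1, 1))
print(r, score)
```

In this toy window the best breakpoint falls after the sixth SNP, where the genotypes switch from AA to GG.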

Ancestry using Hidden Markov Models

A description of the haplotype generation as a Markov model.

[Figure: Markov model diagram over consecutive SNPs, with allele labels (A, C, G) at the states.]

- Structure (Pritchard et al., 2000)
- Admixmap (Hoggart et al., 2003)
- Ancestrymap (Patterson et al., 2004)
- SABER (Tang et al., 2006)

Ancestry using Hidden Markov Models

- A set of states per SNP
- Transition probabilities from SNP to SNP

[Figure: states at SNP $S_i$ connected to states at SNP $S_{i+1}$, with example transition probabilities 0.7 and 0.3.]

Ancestry using Hidden Markov Models

- Emission probabilities

[Figure: a state at SNP $S_i$ with example emission probabilities A: 0.9, G: 0.1.]

Ancestry using Hidden Markov Models

- Hidden Markov Model per population

[Figure: Markov model diagram over consecutive SNPs, with allele labels (A, C, G) at the states.]

Ancestry using Hidden Markov Models

[Figure: two per-population Markov model diagrams, one above the other, with allele labels (A, C, G) at the states.]

Transitions between populations are based on recombination rates and the number of generations.
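A heavily simplified sketch of the idea, assuming one hidden state per ancestral population (rather than the multi-state per-population models in the figures), emissions given by each population's allele frequency, and a single switch probability between adjacent SNPs. The Viterbi decoder below and all names are illustrative assumptions, not the cited methods.

```python
import numpy as np

def viterbi_ancestry(hap, freqs, switch_prob):
    """Most likely ancestry path for one haplotype (assumes K >= 2 populations).
    hap: (m,) alleles in {0, 1}; freqs: (K, m) P(allele = 1) per population and SNP;
    switch_prob: probability of an ancestry switch between adjacent SNPs."""
    K, m = freqs.shape
    log_emit = np.log(np.where(hap == 1, freqs, 1.0 - freqs))   # (K, m)
    trans = np.full((K, K), np.log(switch_prob / (K - 1)))
    np.fill_diagonal(trans, np.log(1.0 - switch_prob))
    score = np.log(1.0 / K) + log_emit[:, 0]
    back = np.zeros((m, K), dtype=int)
    for i in range(1, m):
        cand = score[:, None] + trans          # cand[prev, cur]
        back[i] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[:, i]
    path = np.empty(m, dtype=int)
    path[-1] = score.argmax()
    for i in range(m - 1, 0, -1):
        path[i - 1] = back[i, path[i]]
    return path

# Toy example: population 0 mostly carries allele 1, population 1 mostly allele 0.
freqs = np.array([[0.95] * 20, [0.05] * 20])
hap = np.array([1] * 10 + [0] * 10)
path = viterbi_ancestry(hap, freqs, switch_prob=0.05)
print(path)
```

One way to tie `switch_prob` to the slide's statement is 1 - exp(-generations × genetic distance between the SNPs), which is again an assumption made for this sketch.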

Estimating the model parameters

Estimate the transition probabilities and emission probabilities using the Baum-Welch algorithm.

[Figure: binary haplotype strings (e.g. 010010001010100110...) shown alongside the two per-population Markov model diagrams.]

[Figure: the two per-population Markov model diagrams, applied to a genotype.]

Genotype: 010010222001020…..

Ancestry: 00000000000000000111111111111111122222222222222……..

Compressed vs. uncompressed model

- The number of states can be small (5-10).
- In some cases the number of states is simply the number of haplotypes in the reference population, and the emission probabilities are based on these haplotypes.

Accuracy of the methods

- Current methods reach around $r^2 = 0.99$ for African American populations and 0.94 for Latinos.
- Accurate methods can be used to detect recombination rates (Hinch et al., 2011; Wegmann et al., 2011) and selection forces (Tang et al., 2007).
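The slide does not define $r^2$; a common reading, assumed here, is the squared Pearson correlation between the true and the inferred local ancestry vectors over {0,1,2}. A minimal sketch with made-up ancestry calls:

```python
import numpy as np

def ancestry_r2(true_anc, inferred_anc):
    """Squared Pearson correlation between two ancestry vectors over {0, 1, 2}."""
    r = np.corrcoef(true_anc, inferred_anc)[0, 1]
    return float(r ** 2)

# Hypothetical example: one SNP out of eight is called incorrectly.
true_anc = [0, 0, 1, 1, 2, 2, 1, 0]
inferred = [0, 0, 1, 1, 2, 1, 1, 0]
print(ancestry_r2(true_anc, inferred))
```

This measure is undefined when either vector is constant, so it only makes sense for individuals or regions with some variation in ancestry.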