
Page 1: Comparative genomics 2 - Washington University Department ...genetics.wustl.edu/.../04/Comparative_genomics_2.pdf · Human Diseases Comparison of disease susceptibility between chimps

Primate Comparative Genomics

“…man’s position in the animate world is an indispensable preliminary to the proper understanding of his relations to the universe – and this again resolves itself, in the long run, into an inquiry into the nature and the closeness of the ties which connect him with those singular creatures (the Great Apes) whose history has been sketched in the preceding pages.”

-Thomas H. Huxley, Man’s Place in Nature, 1894


Humans and Chimps

Homo sapiens vs. Homo sapiens → 99.9% identical
Homo sapiens vs. Pan troglodytes → 99.0% identical


Why sequence chimps?

Two white papers: http://www.genome.gov/11008056


Chimps Are Resistant To Many Human Diseases

Comparison of disease susceptibility between chimps and humans

Condition                        Human            Chimp
HIV progression to AIDS          common           very rare
Influenza A symptoms             moderate/severe  mild
Hepatitis B/C complications      moderate/severe  mild
Plasmodium falciparum malaria    susceptible      resistant
Menopause                        universal        rare
E. coli K99 gastroenteritis      resistant        sensitive
Alzheimer’s disease pathology    complete         incomplete
Epithelial cancers               common           rare

Source: Olson, M.V. et al. White paper advocating the complete sequencing of the common chimpanzee, Pan troglodytes (2002)


Chimp sequence can inform our unique population history

Kaessmann et al. (2001) Nat. Genet. 27: 155-56


Chimps can inform our unique population history

• Fixation of deleterious alleles during bottlenecks

• Chimp genome might offer a “fix” to common diseases

[Figure: trait states speech +/−, hypertension +/−, obesity +/−, bipedal +/−]


Chimp sequence can help detect selection

• Important to know the ancestral allele
• Over-representation of the non-ancestral allele can suggest selection

[Figure: the A allele is fixed in chimps; A and B are polymorphic in humans]
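Using the chimp as an outgroup to polarize a site can be sketched in a few lines of Python. This is a minimal sketch, assuming a single biallelic site and that the allele fixed in the chimp sample is ancestral (a parsimony argument that ignores recurrent mutation and incomplete lineage sorting); the function name `polarize` and the allele counts are hypothetical.

```python
from collections import Counter


def polarize(human_alleles, chimp_alleles):
    """Return (ancestral, derived, derived_frequency) for one biallelic site.

    Assumes the allele fixed in the chimp sample is ancestral; if chimps are
    polymorphic the ancestral state is ambiguous and a ValueError is raised.
    """
    chimp_states = set(chimp_alleles)
    if len(chimp_states) != 1:
        raise ValueError("outgroup is polymorphic; ancestral state is ambiguous")
    ancestral = chimp_states.pop()
    counts = Counter(human_alleles)
    derived_states = [allele for allele in counts if allele != ancestral]
    if len(derived_states) > 1:
        raise ValueError("more than one non-ancestral allele at this site")
    if not derived_states:
        return ancestral, None, 0.0  # humans carry only the ancestral allele
    derived = derived_states[0]
    return ancestral, derived, counts[derived] / len(human_alleles)


# Hypothetical sample: A is fixed in chimps; A and B segregate in humans.
humans = list("AABBBBBBBBBB")
chimps = list("AAAAAA")
ancestral, derived, freq = polarize(humans, chimps)
print(ancestral, derived, round(freq, 2))  # A B 0.83
```

A derived allele at high frequency, as in this toy sample, is the pattern the slide flags as possible evidence of selection; on its own it cannot exclude drift.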


Only species appropriate for comparing fast-evolving regions

• Pericentric duplications
• Subtelomeric repeats
• Y chromosome
• 5-7% of the genome is in large segmental duplications


What does the genome tell us?

• (Roughly) same genome size (3.1 Gb)
• (Roughly) same number of genes (~20,500)
• (Roughly) same genes
• Large number of papers reporting specific differences between humans and chimps
• Many papers also claim to detect positive selection on specific human genes

Not too much yet…


Let’s do the math

How many differences do we need to look at?
(3 × 10^9 bp) × (1% divergence) × (50% on the human lineage) = 15 million bp

In coding DNA?
(15 million bp) × (1.5% coding) × (75% non-synonymous) ≈ 169,000 bp, or about 8 non-synonymous changes per gene (with ~20,500 genes)

Non-coding DNA?
(15 million bp) × (3.5% under selection) = 525,000 bp
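The back-of-the-envelope arithmetic can be checked in a few lines of Python. The constants are the slide's rough estimates; the per-gene figure uses the ~20,500 genes quoted on the "What does the genome tell us?" slide, and the variable names are illustrative.

```python
# Each factor is the slide's rough estimate, not a measured value.
GENOME_BP = 3e9                  # haploid genome size
DIVERGENCE = 0.01                # human-chimp single-nucleotide divergence
HUMAN_LINEAGE_SHARE = 0.5        # half of the changes occurred on the human branch
CODING_FRACTION = 0.015
NONSYNONYMOUS_FRACTION = 0.75
CONSTRAINED_NONCODING = 0.035    # non-coding fraction assumed to be under selection
GENES = 20_500

human_changes = GENOME_BP * DIVERGENCE * HUMAN_LINEAGE_SHARE
nonsyn_changes = human_changes * CODING_FRACTION * NONSYNONYMOUS_FRACTION
noncoding_selected = human_changes * CONSTRAINED_NONCODING
per_gene = nonsyn_changes / GENES

print(f"human-lineage changes: {human_changes:,.0f} bp")                 # 15,000,000 bp
print(f"non-synonymous changes: {nonsyn_changes:,.0f} bp")               # 168,750 bp
print(f"non-synonymous changes per gene: {per_gene:.1f}")                # 8.2
print(f"constrained non-coding changes: {noncoding_selected:,.0f} bp")   # 525,000 bp
```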


What are the possibilities?

• Gene loss
• Gene gain
• Gene mutation (a few or many)
• Gene regulation
• Something else?


Inter- versus Intraspecific Variation

He (man) resembles them (apes) as they resemble one another – he differs from them as they differ from one another.

-Thomas Huxley, Man’s Place in Nature, 1894


Gene Loss

Hypothesis: Humans have lost (one or more) genes compared to chimps, and it is the loss of those functions that accounts for our “humanness”


Sialic Acid Biology: an example of database mining

Chou et al. (1998) Proc. Natl. Acad. Sci. USA 95, 11751-11756

• Apes have lots of Neu5Gc, humans very little
• Neu5Gc is located on the surface of epithelial cells
• Neu5Gc is present at very low levels in the brain, even in animals that have lots of Neu5Gc

[Figure: CMP-Neu5Ac hydroxylase sequence in human, chimp, gorilla, and mouse]

A 92-bp deletion in the CMP-Neu5Ac hydroxylase gene is specific to the human lineage
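Why a 92-bp deletion is so disruptive can be shown with a toy example: 92 is not a multiple of 3 (92 = 3 × 30 + 2), so every codon downstream of the deletion is read in a new frame. This sketch uses an ATGATG… repeat purely as a toy sequence; the helper names are hypothetical and it only illustrates codon boundaries, not the real hydroxylase sequence.

```python
def codons(seq: str) -> list[str]:
    """Split a sequence into complete codons in frame 1 (a trailing partial codon is dropped)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def delete(seq: str, start: int, length: int) -> str:
    """Remove `length` bases beginning at 0-based position `start`."""
    return seq[:start] + seq[start + length:]


def shifts_frame(deletion_length: int) -> bool:
    """A deletion shifts the downstream reading frame unless its length is a multiple of 3."""
    return deletion_length % 3 != 0


toy = "ATGATGATGATG"
print(codons(toy))                # ['ATG', 'ATG', 'ATG', 'ATG']
print(codons(delete(toy, 3, 3)))  # ['ATG', 'ATG', 'ATG']  (in frame: one codon lost)
print(codons(delete(toy, 3, 2)))  # ['ATG', 'GAT', 'GAT']  (frameshift)
print(shifts_frame(92))           # True
```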


Indels are ~50% of human-chimp differences
Frazer et al. (2003) Genome Res. 13: 341-346
Locke et al. (2003) Genome Res. 13: 347-357


Gene Gain

Hypothesis: Humans have gained (one or more) genes compared to chimps, and it is the gain of these new functions that accounts for our “humanness.”


Morpheus Gene Family

Johnson et al. (2001) Nature 413:514-519


Morpheus Gene family

Johnson et al. (2001) Nature 413:514-519

• 20-kb duplicated segment on the short arm of chromosome 16

• 98% identity in introns/non-coding DNA, 81% identity in exonic DNA

• Ka/Ks tests indicate (the possibility of) extreme positive selection

• Gene family has no homology to known genes
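The logic of a Ka/Ks test can be sketched as follows: correct the observed proportions of non-synonymous (Ka) and synonymous (Ks) differences for multiple hits and take their ratio; values well above 1 are the usual signature of positive selection. This is a minimal sketch assuming the Jukes-Cantor correction and pre-counted sites (real analyses count sites with methods such as Nei-Gojobori or use likelihood models). The exon-versus-intron ratio uses the identities quoted above as a rough stand-in for Ka and Ks; it is only an illustration, not the published analysis, and the Ka/Ks counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected substitutions per site from the observed proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs: int, nonsyn_sites: float, syn_diffs: int, syn_sites: float) -> float:
    """Ka/Ks from pre-counted differences and sites (site counting itself is not shown)."""
    if syn_diffs == 0:
        raise ValueError("Ks is zero; the ratio is undefined")
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    return ka / ks


# Rough proxy from the slide: exons 81% identical (p = 0.19), introns 98% identical (p = 0.02).
exon_vs_intron = jukes_cantor(0.19) / jukes_cantor(0.02)
print(f"exon/intron divergence ratio: {exon_vs_intron:.1f}")  # 10.8

# Hypothetical counts: 30 of 300 non-synonymous sites and 2 of 100 synonymous sites differ.
print(f"Ka/Ks: {ka_ks(30, 300, 2, 100):.1f}")  # 5.3
```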


Morpheus Gene Family


Gene Mutation

Hypothesis: Humans acquired (one or more) substitutions in the coding regions of their genes that alter the functions of those proteins so as to account for our “humanness.”


What about organism-specific substitutions?

http://sayer.lab.nig.ac.jp/~silver/

C-C chemokine receptor (nucleotides 1 to 60)

Human_1 ATGGATTATCAAGTGTCAAGTCCAATCTATGACATCAATTATTATACATCGGAGCCCTGC
Human_2 ATGGATTATCAAGTGTCAAGTCCAATCTATGACATCAATTATTATACATCGGAGCCCTGC
Human_3 ATGGATTATCAAGTGTCAAGTCCAATCTATGACATCAATTATTATACATCGGAGCCCTGC
Human_4 ATGGATTATCAAGTGTCAAGTCCAATCTATGACATCAATTATTATACATCGGAGCCCTGC
Chimp_1 ATGGATTATCAAGTGTCAAGTCCAATCTATGACATCGATTATTATACATCGGAGCCCTGC
Chimp_2 ATGGATTATCAAGTGTCAAGTCCAATCTATGACATCGATTATTATACATCGGAGCCCTGC
Chimp_3 ATGGATTATCAAGTGTCAAGTCCAATCTATGACATCGATTATTATACATCGGAGCCCTGC
Goril_1 ATGGATTATCAAGTGTCAAGTCCAACCTATGACATCGATTATTATACATCGGAGCCCTGC
Goril_2 ATGGATTATCAAGTGTCAAGTCCAACCTATGACATCGATTATTATACATCGGAGCCCTGC
Goril_3 ATGGATTATCAAGTGTCAAGTCCAACCTATGACATCGATTATTATACATCGGAGCCCTGC
        ************************* ********** ***********************

Problem: How can we draw a conclusion from a single substitution?
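One way to make the per-species comparison explicit is a parsimony scan: with gorilla as the outgroup, a column where one species is fixed for a base that all the other sequences share is assigned to that species' lineage. The sketch below applies that rule to the 60-nt C-C chemokine receptor alignment above. The helper names are hypothetical, and it assumes the alignment starts at the first codon (it begins with ATG). It finds one human-specific and one gorilla-specific change and reports the affected codons; by itself it cannot answer whether a single change was selected.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# The alignment above; the sampled sequences are identical within each species.
HUMAN = "ATGGATTATCAAGTGTCAAGTCCAATCTATGACATCAATTATTATACATCGGAGCCCTGC"
CHIMP = "ATGGATTATCAAGTGTCAAGTCCAATCTATGACATCGATTATTATACATCGGAGCCCTGC"
GORILLA = "ATGGATTATCAAGTGTCAAGTCCAACCTATGACATCGATTATTATACATCGGAGCCCTGC"
ALIGNMENT = {"Human": [HUMAN] * 4, "Chimp": [CHIMP] * 3, "Gorilla": [GORILLA] * 3}


def lineage_specific_sites(alignment):
    """Yield (position, species, new_base, other_base) for 0-based columns where one
    species is fixed for a base and every other sampled sequence shares a different base."""
    length = len(next(iter(alignment.values()))[0])
    for pos in range(length):
        states = {sp: {seq[pos] for seq in seqs} for sp, seqs in alignment.items()}
        for sp, own in states.items():
            others = set().union(*(s for o, s in states.items() if o != sp))
            if len(own) == 1 and len(others) == 1 and own != others:
                yield pos, sp, next(iter(own)), next(iter(others))


for pos, sp, new, old in lineage_specific_sites(ALIGNMENT):
    start = pos - pos % 3  # first base of the codon containing this site
    other_seq = next(seqs[0] for o, seqs in ALIGNMENT.items() if o != sp)
    old_codon, new_codon = other_seq[start:start + 3], ALIGNMENT[sp][0][start:start + 3]
    print(f"nt {pos + 1}: {sp} {old}->{new}, codon {start // 3 + 1} "
          f"{old_codon}({CODON_TABLE[old_codon]}) -> {new_codon}({CODON_TABLE[new_codon]})")
# nt 26: Gorilla T->C, codon 9 ATC(I) -> ACC(T)
# nt 37: Human G->A, codon 13 GAT(D) -> AAT(N)
```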


Detecting Selective Sweeps

• Selective sweeps are (thought to be) accompanied by a local reduction in diversity
• Test for an overabundance of low-frequency alleles (Tajima’s D)

Adapted from Carroll, S. (2003) Nature 422:849-57

[Figure: beneficial mutation arises → selection drives mutation to fixation → mutation/recombination]
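Tajima's D compares two estimates of the population mutation rate: the mean number of pairwise differences (π), which is dominated by intermediate-frequency variants, and Watterson's estimate S/a1 from the number of segregating sites S, which weights all variants equally. An excess of rare alleles, as expected after a sweep, pushes π below S/a1 and D below zero. Below is a minimal sketch of the standard Tajima (1989) formula, assuming equal-length aligned haplotypes and the infinite-sites model; the toy samples are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (infinite-sites assumption)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")  # the variance terms vanish for n <= 3
    length = len(seqs[0])
    seg_sites = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Hypothetical samples of 4 haplotypes at one variable site.
rare = ["A", "A", "A", "T"]          # singleton variant: excess of rare alleles
intermediate = ["A", "A", "T", "T"]  # variant at 50% frequency
print(round(tajimas_d(rare), 2), round(tajimas_d(intermediate), 2))  # -0.61 1.63
```

Real analyses use many sites and compare D with its distribution under a demographic model, since bottlenecks and expansions also shift D.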


FOXP2, The Human Speech Gene?

1) Mapped in families with inherited speech defects (normal IQ)
2) Forkhead transcription factor

FOXP2 Nucleotide Substitutions

Enard et al. (2002) Nature 418, 869-72


FOXP2, The Human Speech Gene?

Enard et al. (2002) Nature 418, 869-72

• Sequencing of adjacent non-coding DNA revealed an excess of low-frequency alleles relative to what would be expected for neutral DNA in a randomly mating population of constant size

• Tajima’s D = -2.20 (P<0.01)


Gene Expression

Hypothesis: It is not structural differences in proteins, but rather differences in their expression between humans and chimps, that account for our “humanness.”


Differences in Gene Expression in the Brain?
Enard et al. (2002) Science 296, 340-343.

[Figure panels: microarrays; 2D gels]


Neutral Theory of Gene Expression?

• Consider how one might construct a neutral theory of gene expression akin to the neutral theory of gene mutation


1) What is the sequence of the normal Human Genome?

2) What accounts for the genetic differences between individuals?


Finding Segmental Duplications in the Human Genome

Bailey et al (2002) Science 297:1003-07


Segmental Duplications in the Human Genome

Bailey et al (2002) Science 297:1003-07


Polymorphism in Segmental Duplications

Iafrate et al (2004) Nat Genet 36:949-51


Polymorphism in Segmental Duplications

• CGH studies find many copy-number polymorphisms in segmental duplications (~12 per individual)
• Rare and common polymorphisms
• Many overlap coding regions
• Critical for the interpretation of amplifications in cancers
• Responsible for phenotypic differences between people?


SNPs/HapMap/1000 Genomes

The International HapMap Project is a multi-country effort to identify and catalog genetic similarities and differences in human beings. Using the information in the HapMap, researchers will be able to find genes that affect health, disease, and individual responses to medications and environmental factors. The Project is a collaboration among scientists and funding agencies from Japan, the United Kingdom, Canada, China, Nigeria, and the United States. All of the information generated by the Project will be released into the public domain.


Questions

1. How many sub-populations best partition the data?

2. How strong is the evidence for the clusters?

3. Do the inferred clusters correspond to our notions of race, ethnicity, ancestry, or geography?

4. Given the inferred clusters, can we accurately classify new individuals?

5. Can we identify population admixture or migration events?


Attempts to group humans by genotype


π and Fst

1. π, average nucleotide diversity (~1 difference per 1,000 bp)

2. Fst, proportion of genetic variation that can be ascribed to differences between populations (~10%)
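Both quantities are simple to compute. This is a minimal sketch, assuming π is averaged over all pairs of aligned sequences and Fst is the heterozygosity-based (HT − HS)/HT for one biallelic locus with equal-sized populations; other estimators (for example Weir and Cockerham's) differ in detail. The sequences and frequencies are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """π: mean number of pairwise differences per site among aligned sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * len(seqs[0]))


def fst(freqs):
    """(HT - HS) / HT for one biallelic locus; freqs holds one allele frequency per population."""
    hs = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population heterozygosity
    p_bar = sum(freqs) / len(freqs)
    ht = 2 * p_bar * (1 - p_bar)                           # heterozygosity of the pooled population
    if ht == 0:
        raise ValueError("locus is monomorphic in the pooled sample")
    return (ht - hs) / ht


seqs = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGTAC", "ACGTACGTAC"]
print(nucleotide_diversity(seqs))  # 0.05 (3 differing pairs at 1 site / (6 pairs x 10 sites))
print(round(fst([0.4, 0.6]), 3))   # 0.04: modest frequency differences give a small Fst
```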


Summary of Findings

• π and Fst are small
• Diversity within “African” populations is highest
• Unsupervised clustering tends to support either 3 or 4 sub-populations, depending on the number and type of markers and individuals included in the study, but the composition of the groups is often different in different studies


A contradiction?

• Although they differed on the extent and composition of sub-populations, so far all studies have found evidence of significant sub-structure in human populations

• And yet, all studies agree that Fst is small (3-15%)

See review by Jorde and Wooding (2004) Nature Genet. 36: S28-S33


Small Fst does not imply lack of structure

[Figure: scattered genotype labels A1/A2, B1/B2, C1/C2, D1/D2, E1/E2]
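The slide's point can be reproduced with a small simulation under hypothetical parameters: two populations whose allele frequencies differ by only ±0.15 around 0.5 at each of 200 independent loci, giving Fst ≈ 0.09 per locus. Each locus barely distinguishes the populations, but summing log-likelihoods across all loci should assign nearly every simulated individual to its population of origin. Assumptions: Hardy-Weinberg genotypes, independent loci, and known population allele frequencies; the function names are illustrative.

```python
import math
import random


def population_freqs(n_loci=200, delta=0.15):
    """Hypothetical allele-1 frequencies: 0.5 +/- delta, shifted in opposite directions."""
    pop0 = [0.5 + delta if locus % 2 == 0 else 0.5 - delta for locus in range(n_loci)]
    pop1 = [1 - p for p in pop0]
    return [pop0, pop1]


def mean_fst(freqs):
    """Average (HT - HS) / HT over loci for two equal-sized populations."""
    total = 0.0
    for p0, p1 in zip(*freqs):
        p_bar = (p0 + p1) / 2
        ht = 2 * p_bar * (1 - p_bar)
        hs = (2 * p0 * (1 - p0) + 2 * p1 * (1 - p1)) / 2
        total += (ht - hs) / ht
    return total / len(freqs[0])


def sample_genotype(freqs, rng):
    """Diploid genotype under Hardy-Weinberg: copies (0, 1 or 2) of allele 1 at each locus."""
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]


def log_likelihood(genotype, freqs):
    """Log-probability of a genotype, omitting the binomial constant shared by all populations."""
    return sum(k * math.log(p) + (2 - k) * math.log(1 - p) for k, p in zip(genotype, freqs))


rng = random.Random(42)
freqs = population_freqs()
samples = [(pop, sample_genotype(freqs[pop], rng)) for pop in (0, 1) for _ in range(50)]
correct = sum(
    pop == max((0, 1), key=lambda j: log_likelihood(geno, freqs[j])) for pop, geno in samples
)
print(f"mean Fst = {mean_fst(freqs):.2f}, correctly assigned: {correct}/{len(samples)}")
```

Reducing the number of loci in this sketch makes assignment noticeably less reliable, which is why the amount of data matters as much as Fst itself.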


Clustering human populations by genotype

K-means clustering of gene expression data

• Pick a number (k) of cluster centers

• Assign every gene to its nearest cluster center

• Move each cluster center to the mean of its assigned genes

• Repeat steps 2-3 until convergence

EM-based clustering of genotype data

• Pick a number (k) of sub-populations

• Assign every individual to the sub-population whose allele frequencies make its genotype most likely

• Recalculate the allele frequencies in each sub-population

• Repeat steps 2-3 until convergence
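The K-means steps above can be sketched in Python as follows. Each point stands for one gene's expression profile (a tuple of measurements); the toy one-dimensional values and starting centers are hypothetical, and letting an empty cluster keep its previous center is one of several common conventions.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centers, max_iter=100):
    """Plain K-means from given starting centers; returns (centers, labels)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Step 2: assign every point (gene) to its nearest cluster center.
        labels = [min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
                  for p in points]
        # Step 3: move each center to the mean of its assigned points.
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean_point(members) if members else c)
        if new_centers == centers:  # converged: the centers stopped moving
            break
        centers = new_centers
    return centers, labels


genes = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)]
print(*kmeans(genes, [(1.0,), (2.0,)]))  # [(2.0,), (11.0,)] [0, 0, 0, 1, 1, 1]
```

The result depends on the starting centers, so in practice K-means is usually run from several random starts.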


An Example

I1 = (A1,B1,C2)
I2 = (A1,B1,C2)
I3 = (A1,B2,C2)
I4 = (A2,B2,C1)
I5 = (A1,B1,C1)
I6 = (A1,B1,C2)
I7 = (A1,B1,C2)
I8 = (A2,B2,C2)
I9 = (A1,B2,C1)
I10 = (A2,B1,C2)
I11 = (A2,B2,C2)
I12 = (A2,B2,C2)

12 individuals genotyped at three independent biallelic loci


k1: I1 = (A1,B1,C2), I2 = (A1,B1,C2), I3 = (A1,B2,C2), I4 = (A2,B2,C1)

k2: I5 = (A1,B1,C1), I6 = (A1,B1,C2), I7 = (A1,B1,C2), I8 = (A2,B2,C2)

k3: I9 = (A1,B2,C1), I10 = (A2,B1,C2), I11 = (A2,B2,C2), I12 = (A2,B2,C2)

F(A1)k1 = 0.75, F(B1)k1 = 0.5, F(C1)k1 = 0.25

F(A1)k2 = 0.75, F(B1)k2 = 0.75, F(C1)k2 = 0.25

F(A1)k3 = 0.25, F(B1)k3 = 0.25, F(C1)k3 = 0.25

Consider individual I1= (A1,B1,C2)

P(I1 in k1) = (.75)(.5)(.75) = 0.28
P(I1 in k2) = (.75)(.75)(.75) = 0.42
P(I1 in k3) = (.25)(.25)(.75) = 0.047

(the last factor in each product is F(C2) = 1 − F(C1) = 0.75)

Therefore reassign I1 to k2
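The reassignment procedure, including the worked example above, can be sketched in Python. Like the slide's steps, this sketch makes hard assignments (each individual moves to its most likely sub-population), whereas a full EM algorithm works with probabilistic (soft) assignments. Alleles are coded 1 or 2 at loci A, B, C; the starting clusters are the three rows above; the function names and the fallback for an empty cluster are choices of this sketch.

```python
from math import prod

# Allele (1 or 2) carried at loci A, B, C by each individual in the example above.
GENOTYPES = {
    "I1": (1, 1, 2), "I2": (1, 1, 2), "I3": (1, 2, 2), "I4": (2, 2, 1),
    "I5": (1, 1, 1), "I6": (1, 1, 2), "I7": (1, 1, 2), "I8": (2, 2, 2),
    "I9": (1, 2, 1), "I10": (2, 1, 2), "I11": (2, 2, 2), "I12": (2, 2, 2),
}


def allele1_freqs(members, genotypes):
    """Frequency of allele 1 at each locus among a cluster's members."""
    n_loci = len(next(iter(genotypes.values())))
    if not members:
        return [0.5] * n_loci  # arbitrary fallback for an empty cluster
    return [sum(genotypes[m][i] == 1 for m in members) / len(members) for i in range(n_loci)]


def likelihood(genotype, freqs):
    """Product over independent loci of the frequency of the observed allele."""
    return prod(f if allele == 1 else 1 - f for allele, f in zip(genotype, freqs))


def cluster_freqs(assignment, genotypes, k):
    return [allele1_freqs([i for i, c in assignment.items() if c == j], genotypes)
            for j in range(k)]


def hard_em(genotypes, assignment, k, max_iter=100):
    """Alternate reassignment and frequency re-estimation until assignments stop changing."""
    for _ in range(max_iter):
        freqs = cluster_freqs(assignment, genotypes, k)
        new = {i: max(range(k), key=lambda j: likelihood(g, freqs[j]))
               for i, g in genotypes.items()}
        if new == assignment:
            break
        assignment = new
    return assignment


# Starting clusters: k1 = I1-I4, k2 = I5-I8, k3 = I9-I12 (indices 0, 1, 2).
start = {f"I{n}": (n - 1) // 4 for n in range(1, 13)}
freqs = cluster_freqs(start, GENOTYPES, 3)
print(*(likelihood(GENOTYPES["I1"], f) for f in freqs))  # 0.28125 0.421875 0.046875
print(hard_em(GENOTYPES, start, 3))
```

With only three loci the final grouping depends heavily on the starting assignment, so this toy run is best read as an illustration of the update steps rather than of meaningful clusters.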


An example
Bamshad et al. (2003) Am. J. Hum. Genet. 72:578-89


But…
Bamshad et al. (2003) Am. J. Hum. Genet. 72:578-89


Genes mirror geography in Europe
Novembre et al. (2008) Nature 456, 98-101


Pharmacogenomics

• Many drugs never reach the market because of side effects in a small minority of patients
• Many drugs on the market are efficacious in only a small fraction of the population
• This variation is (in part) due to genetic determinants
  – Iressa → EGFR mutations
  – Codeine → cytochrome P450 alleles


Question: Is race, ancestry, ethnicity, geography, or genetic substructure a reasonable proxy for genotype at alleles relevant for drug metabolism?

Answer: So far… no. It still looks as if we will have to genotype the relevant loci before making any guesses.


Population genetic structure of variable drug response.

Wilson et al (2001) Nat Genet. 29: 265-269

A = African

B = European

C = Asian

[Figure: drug-metabolism loci CYP1A2, GSTM1, CYP2C19, DIA4, NAT2, and CYP2D6, shown for groups A, B, and C]


Evidence for Archaic Asian Ancestry on the Human X Chromosome
Garrigan et al. (2005) Mol. Biol. Evol. 22:189-192

1) Pseudogene on the X chromosome
2) 18 substitutions between human and chimp
3) 15 substitutions between two human alleles
4) Assuming a molecular clock, the split between the two human alleles dates to about 2 million years ago
5) Both alleles found in southern Asia, only one allele found in Africa
6) Only human gene tree to “root” in Asia


Garrigan et al. (2005) Mol. Biol. Evol. 22:189-192


Garrigan et al. (2005) Mol. Biol. Evol. 22:189-192


Human evolution in a nutshell

[Figure: timeline with chimps, H. sapiens, H. ergaster, H. erectus, and H. neanderthalensis; time points 5-6 mya, 1 mya, 0.5 mya, 0.2 mya]


Human evolution in a nutshell

[Figure: the same timeline (chimps, H. sapiens, H. ergaster, H. erectus, H. neanderthalensis; 5-6 mya, 1 mya, 0.5 mya), with the 0.2 mya point now marked “?”]


So what happened?

1. Strong selection for the Asian allele in southern Asia
- not likely, since this is a pseudogene locus
- fails Tajima’s D test

2. Gene flow between H. sapiens and H. erectus in southern Asia

- branch lengths are about right for 2 million years of divergence
- H. erectus (or its possible descendant, H. floresiensis) was in southeast Asia until ~18,000 years ago (Morwood et al. and Brown et al., Nature (2004) vol. 431)
- supporting evidence from genetic analysis of lice and other human parasites (Reed et al. (2004) PLoS Biol. 2:1972-83)