bioc4700 2014 guest lecture

PROTEIN EVOLUTION: Function and Human Health

Daniel Gaston, PhD, October 30th, 2014

Upload: dan-gaston

Post on 15-Jul-2015



WHY DO WE CARE? Why does all of this evolution stuff matter anyway?

Why it matters

• Pure scientific curiosity

• Knowledge is intrinsically valuable, regardless of applications

• Critical for truly understanding function

• Translating research/knowledge between model organisms

• Evolution shapes population genetics

• Critical for understanding how mutations cause disease

Why it matters

• Ecology, ecological interactions, diversity

• Antibiotic resistance

• Microbiome

• Cancer

Major Groups of Organisms

Bacteria

Archaea

Eukaryotes


Major Groups of Eukaryotes

You are here

A Brief History of Life on Earth

Time

4.5B: Origin of the Earth

3 – 4B: Origin of Life

2.7B: Bacteria

1.5B: Eukaryotes

1B: Animals

Definitions

• Homology

• Descent from a common ancestor

• All or nothing, no such thing as percent homology

• Divergence

• Change in two sequences over time, after splitting from a common ancestor

• Convergence

• Similarity due to independent evolutionary events

• On the amino acid level: rare and difficult to prove
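The point that there is no such thing as "percent homology" is easier to see next to the quantity that can be measured: percent identity between two aligned sequences. The sketch below is illustrative only; the function name is an assumption, and the two sequences are the point-mutation example used later in the slides. Identity is a number, while homology is a yes-or-no inference that a score like this may support.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions carrying the same residue.

    Assumes the two sequences are already aligned to equal length.
    Gap characters ("-") never count as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b and a != "-" for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# 14 of 15 positions match between these two sequences.
identity = percent_identity("AGTCCAAGGCCTTAA", "AGTTCAAGGCCTTAA")
print(f"{identity:.1f}% identical")
```

Two sequences can be 93% identical; they are either homologous or they are not.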

EVOLUTION IN PROTEINS: Processes

Two Groups of Processes

• Mutation

• Provides raw material of evolution

• Many different processes and mechanisms

• Happens within individuals

• Selection and Drift

• Happens within populations of organisms

• Affect the frequency of mutations within populations of organisms over time

point mutation: AGTCCAAGGCCTTAA -------------> AGTTCAAGGCCTTAA

insertion (CCTTA): AGTCCAAGGCCTTAA -------------> AGTCCAAGGCCTTACCTTAA

deletion (AAGG): AGTCCAAGGCCTTAA -------------> AGTCC-CCTTAA

inversion: AGTCCAAGGCCTTAA -------------> AGTCCCCTTCCTTAA

translocation: AGTCCAAGGCCTTAA + GGTCCTGGAATTCAG -------------> AGTCCAAGGCC + GGTCCTGGAATTCAGTTAA

duplication: AGTCCAAGGCC --------------> AGTCCAAGGCCAGTCCAAGGCC

recombination (AAGG, AGGC): AGTCCAAGGCCTTAA ---------------> AGTCCAAAGGCTTAA
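Most of the mutation types illustrated above can be sketched as simple string operations. The function names and the positions passed to them are assumptions chosen to reproduce the slide examples; the recombination example is left out because the slide does not make the exchanged segments unambiguous.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def point_mutation(seq: str, pos: int, base: str) -> str:
    """Replace the base at 0-based position pos."""
    return seq[:pos] + base + seq[pos + 1:]


def insertion(seq: str, pos: int, fragment: str) -> str:
    """Insert fragment before 0-based position pos."""
    return seq[:pos] + fragment + seq[pos:]


def deletion(seq: str, start: int, length: int) -> str:
    """Remove length bases starting at 0-based position start."""
    return seq[:start] + seq[start + length:]


def inversion(seq: str, start: int, length: int) -> str:
    """Replace a segment with its reverse complement."""
    segment = seq[start:start + length]
    return seq[:start] + reverse_complement(segment) + seq[start + length:]


def translocation(seq_a: str, seq_b: str, cut: int) -> tuple[str, str]:
    """Move the tail of seq_a (from cut onward) onto the end of seq_b."""
    return seq_a[:cut], seq_b + seq_a[cut:]


def duplication(seq: str) -> str:
    """Tandem duplication of the whole segment."""
    return seq + seq


original = "AGTCCAAGGCCTTAA"
print(point_mutation(original, 3, "T"))
print(insertion(original, 14, "CCTTA"))
print(deletion(original, 5, 4))
print(inversion(original, 5, 4))
print(translocation(original, "GGTCCTGGAATTCAG", 11))
print(duplication("AGTCCAAGGCC"))
```

In the inversion example the segment AAGG is replaced by its reverse complement CCTT, which is why the result reads AGTCCCCTTCCTTAA.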

Exon Shuffling

Figure: a three-exon gene (Exon 1, Exon 2, Exon 3) encoding Domain 1 and Domain 2, shown with a three-exon gene encoding Domain 2 and Domain A.

Genomic Scale Mutations

Gene 1 Gene 2 Gene 3

Genomic Scale Mutations

Gene 1 Gene 2 Gene 1a

Mutational Processes

• Arise generally as unrepaired mismatches during DNA replication

• Some repair processes introduce mutation

• Chemical processes change non-replicating DNA

• Multi-cellularity buffers from all acquired (somatic) mutations being hereditary

• Humans:

• de novo mutation rate of 1.2 x 10^-8 per nucleotide per generation

• ~70 per child

• Majority of paternal origin
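The "~70 per child" figure follows from the per-nucleotide rate once it is multiplied across the genome. A rough back-of-envelope check is sketched below; the haploid genome size of about 3.0 x 10^9 base pairs is an assumption not stated on the slide, and the variable and function names are illustrative.

```python
MUTATION_RATE = 1.2e-8        # per nucleotide per generation (from the slide)
HAPLOID_GENOME_SIZE = 3.0e9   # assumption: approximate human haploid genome, in bp
PLOIDY = 2                    # one genome copy inherited from each parent


def expected_de_novo_mutations(rate: float, haploid_size: float, ploidy: int = 2) -> float:
    """Expected number of new mutations per child: rate x sites x copies."""
    return rate * haploid_size * ploidy


expected = expected_de_novo_mutations(MUTATION_RATE, HAPLOID_GENOME_SIZE, PLOIDY)
print(f"Expected de novo mutations per child: about {expected:.0f}")
```

Under these assumptions the estimate lands in the low seventies, consistent with the slide's figure.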

SELECTION AND DRIFT: Polymorphisms and Populations, Oh My!

Mutations, Polymorphisms, Substitutions

• Mutations: Appear in individuals within a population

• Sometimes in human genetics used to specifically describe pathogenic or disease-causing variation

• Polymorphism: An unfixed mutation of varying frequency within a population

• In human genetics generally used to describe functionally neutral/benign variation. Often must have a frequency of >5%

• Substitution: A fixed mutation. All individuals within a population have the mutation

• Most often used when comparing two or more species

Selection and Drift

• Fitness

• Measured in terms of the number of offspring that survive to themselves reproduce

• Positive Selection

• Rare

• Mutation confers some fitness advantage

• Negative Selection

• Frequent

• Mutation confers a fitness disadvantage

• Neutral

• Mutation has little to no impact on fitness

• Most frequent

Nearly Neutral Theory

Genetic Drift in Action
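One common way to see drift in action is a Wright-Fisher simulation, in which each generation is a random sample of allele copies from the previous one. The sketch below is a minimal version under stated assumptions (a diploid population of constant size, no selection, no new mutation); it is not taken from the lecture, and the function name and parameters are illustrative.

```python
import random


def wright_fisher(pop_size: int, start_freq: float, generations: int,
                  rng: random.Random) -> list[float]:
    """Simulate the frequency of one neutral allele under pure drift.

    pop_size is the number of diploid individuals, so there are
    2 * pop_size allele copies. Stops early once the allele is
    lost (frequency 0) or fixed (frequency 1).
    """
    n_copies = 2 * pop_size
    count = round(start_freq * n_copies)
    trajectory = [count / n_copies]
    for _ in range(generations):
        freq = count / n_copies
        # Each copy in the next generation is drawn at random from the current pool.
        count = sum(rng.random() < freq for _ in range(n_copies))
        trajectory.append(count / n_copies)
        if count in (0, n_copies):
            break
    return trajectory


rng = random.Random(2014)
for size in (10, 100, 1000):
    path = wright_fisher(size, 0.5, 200, rng)
    print(f"N={size}: {len(path) - 1} generations simulated, final frequency {path[-1]:.2f}")
```

Smaller populations tend to lose or fix the allele sooner, which is the sense in which drift is stronger when populations are small.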

Examples of Positive Selection

• MHC Genes

• Balancing selection: favours diversity at loci

• Many genes involved in metabolism and digestion

• Accelerated evolution over last ~10,000 years

• Adaptation to Agriculture

• Human adaptations to high altitude

• EPAS1, PPARA, EGLN1 (Tibetans)

• CBARA1, VAV3, ARNT2, THRB (Ethiopian Highlanders)

• EGLN1 (Andean Peruvians)

Mutation at the Codon Level

Synonymous (Silent) Mutation: Codon still codes for the same amino acid

Non-Synonymous Mutation: Codon now codes for a different amino acid (missense), premature stop codon (nonsense), or alters a start codon
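The categories above can be made concrete by translating the codon before and after a mutation with the standard genetic code. The sketch below is illustrative: the function name, the label strings, and the `is_start_codon` flag are assumptions, and only the standard (nuclear) code is considered.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref: str, alt: str, is_start_codon: bool = False) -> str:
    """Classify a single-codon change using the categories on the slide."""
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if is_start_codon and ref == "ATG" and alt != "ATG":
        return "start codon altered"
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_codon_change("GAT", "GAC"))  # Asp -> Asp
print(classify_codon_change("GAT", "TAT"))  # Asp -> Tyr
print(classify_codon_change("TAT", "TAA"))  # Tyr -> stop
```

A change that removes a stop codon is lumped in with missense here for brevity; a fuller classifier would report it separately.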

PROTEIN FUNCTION AND STRUCTURE: Impacts on Evolution

Evolutionary Rates and Constraints

• Evolution is only partially random

• Mutations (quasi-random, non-uniform distribution of possibilities)

• Drift (Random)

• Selection (Non-random)

• Evolutionary rate at the protein level is the number of fixed amino acid substitutions over evolutionary time

• Measured between one or more species-level comparisons
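As a rough illustration of how such a rate might be estimated from one pairwise comparison, the sketch below computes the proportion of differing sites, applies a Poisson correction for multiple substitutions at the same site, and divides by twice the divergence time (both lineages have been accumulating changes). The Poisson correction is one simple choice among many, and the sequences and divergence time are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned, ungapped sites that differ."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(p: float) -> float:
    """Estimated substitutions per site, correcting for multiple hits."""
    return -math.log(1.0 - p)


def substitution_rate(distance: float, divergence_time_years: float) -> float:
    """Substitutions per site per year along a single lineage."""
    return distance / (2.0 * divergence_time_years)


# Hypothetical aligned protein fragments and divergence time.
species_1 = "MKVLAAGIVG"
species_2 = "MRVLAAGLVG"
p = p_distance(species_1, species_2)
d = poisson_distance(p)
print(f"p-distance: {p:.2f}, corrected distance: {d:.3f}")
print(f"rate: {substitution_rate(d, 1.0e8):.2e} substitutions/site/year")
```

Real analyses use many species, a phylogeny, and a substitution model rather than a single corrected pairwise distance.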

Evolutionary Rates and Constraints

• Different proteins have different overall rates of evolution

• Functional necessity

• Structural necessity

• Number of protein-protein interactions

• Different regions within a protein have different rates of evolution

• Functional constraint

• Structural constraint

Evolutionary Rates and Constraints

Figure: Site rates across all eukaryotes (63 taxa) mapped on lobster enolase; low rates blue, high rates red. Panels show site rate categories 1 and 2 (slowest sites), 3 and 4, 5 and 6, and 7 and 8 (fastest sites).

Evolutionary Rate: Structure/Function Relationship

• Pattern of evolution is that rates are slowest near the centre, fastest on exterior

• Distance to catalytic centre

• Hydrophobic packing of the interior

• Spatial/size constraints in interior

• More loops and alpha-helices on exterior

• How does this change for structural proteins like tubulin or actin?

PRACTICAL APPLICATIONS

Identifying Disease Causing Genes

• Lynch Syndrome

• Autosomal dominant cancer syndrome

• Defective mismatch repair

• Increased risk of many cancers, particularly colorectal

Identifying the Gene using Evolutionary Reasoning

• Inactivation of genes known to be involved in mismatch repair in E. coli and yeast leads to a ‘mutator’ phenotype

• Microsatellite instability observed

• Searched for homologous genes in humans based on microsatellite instability

• Identified MLH1 and MSH2

• Sequenced genes in Lynch syndrome patients and identified mutations

Identifying Likely Pathogenic Mutations

• Needle in a stack of needles (Exome and Genome Sequencing)

• Individual humans carry ~70 new mutations

• Can be hundreds to thousands of shared variants between small numbers of individuals in a family

Evolutionary Profile of Pathogenic Mutations

• Highly conserved amino acids more likely to be functionally important

• Highly conserved genes more likely to be indispensable

• Conservation alone can be misleading

• Factor in evolutionary history and relatedness of species being compared

• Best tools use many sources of information and high-level machine learning
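A very simple way to put a number on "highly conserved" is the Shannon entropy of each column in a multiple sequence alignment: zero means every sequence has the same residue, and higher values mean more variation. The sketch below uses a hypothetical toy alignment and illustrative function names. It deliberately ignores the relatedness of the species, which is exactly the weakness the slide warns about.

```python
import math
from collections import Counter


def column_entropy(column: tuple[str, ...]) -> float:
    """Shannon entropy (bits) of the residues in one alignment column."""
    total = len(column)
    return sum(
        (n / total) * math.log2(total / n)
        for n in Counter(column).values()
    )


def conservation_profile(alignment: list[str]) -> list[float]:
    """Entropy of every column; lower values mean stronger conservation."""
    return [column_entropy(col) for col in zip(*alignment)]


# Hypothetical aligned fragments from four species.
toy_alignment = [
    "MKVLA",
    "MKILA",
    "MRVLA",
    "MKVLG",
]
for position, entropy in enumerate(conservation_profile(toy_alignment), start=1):
    print(f"site {position}: entropy {entropy:.2f}")
```

A variant at site 1 or 4 of this toy alignment would hit a fully conserved position, while sites 2, 3, and 5 tolerate at least one alternative residue.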

Exome Sequencing for Disease: Gastric Cancer

o Older age of diagnosis

o Often diagnosed at later stages as symptoms similar to many common diseases

o 3rd leading cause of cancer death worldwide: 730,000 deaths per year

o 90% of cases are sporadic

o Most cases of familial clustering due to shared environmental factors

o 60% of hereditary cases caused by mutations in the gene CDH1

Genomic Regions    Number of Exomes    Number of Variants <5% Allele Frequency in Regions of Interest    Number With Medium or High Impact
All Affected       3                   14                                                                0
Siblings Only      2                   9550                                                              525

All Variants in Exome

Variants in Shared Regions

Variant Frequency in Population

Variant Impact

Candidates
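The filtering funnel above can be sketched as a chain of simple filters. Everything in this example is hypothetical: the `Variant` fields, the gene names, and the records are invented for illustration, and the 5% frequency cut-off is taken from the table heading rather than from any stated pipeline setting.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str                    # hypothetical gene label
    in_shared_region: bool       # lies in a region shared by affected relatives
    population_frequency: float  # allele frequency in a reference population
    impact: str                  # predicted impact: "low", "medium", or "high"


def filter_candidates(variants: list[Variant], max_frequency: float = 0.05) -> list[Variant]:
    """Apply the funnel: shared regions, then rarity, then predicted impact."""
    shared = [v for v in variants if v.in_shared_region]
    rare = [v for v in shared if v.population_frequency < max_frequency]
    return [v for v in rare if v.impact in {"medium", "high"}]


# Invented records standing in for all variants called in an exome.
all_variants = [
    Variant("GENE_A", True, 0.001, "high"),
    Variant("GENE_B", True, 0.200, "high"),    # too common in the population
    Variant("GENE_C", False, 0.001, "high"),   # outside the shared regions
    Variant("GENE_D", True, 0.010, "low"),     # predicted low impact
    Variant("GENE_E", True, 0.030, "medium"),
]
for candidate in filter_candidates(all_variants):
    print(candidate.gene, candidate.population_frequency, candidate.impact)
```

Each step only removes variants, so the order of the filters changes the intermediate counts but not the final candidate list.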

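MAP3K6

Figure: MAP3K6 protein diagram showing the protein kinase domain (ATP binding site and proton acceptor) and a coiled-coil region, annotated with the variants D200Y, V207G, H506Y*, P946L, P958T, and F849Sfs*142.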

Functional Divergence

• Duplicated genes (paralogs)

• Can diverge in function as well as sequence

Gene 1 Gene 2 Gene 1a

Types of Functional Divergence

• Subfunctionalization

• Specialize and retain only a subset of ancestral function

• Neofunctionalization

• Gain a new function, lose ancestral

• Subneofunctionalization

• Specialize and elaborate

Functional Divergence and Protein Families

Detecting Functional Divergence

Glyceraldehyde-3-Phosphate Dehydrogenase

Cytosol: Glycolysis

Glyceraldehyde-3-Phosphate + NAD+ + Pi <-> 1,3-Bisphosphoglycerate + NADH + H+

Plastid: Calvin Cycle

Glyceraldehyde-3-Phosphate + NADP+ + Pi <-> 1,3-Bisphosphoglycerate + NADPH + H+

GAPDH Structure

Divergent and Convergent Evolution in GAPDH

• Many sites predicted to be functionally divergent

• 69 in the green group (GapA/B)

• 26 in GapC1

• 20 in both GapC1 and GapA/B
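One simple heuristic for flagging candidate functionally divergent sites between two groups of paralogs is to look for alignment columns that are fully conserved within each group but carry a different residue in each. The sketch below is only that heuristic, applied to hypothetical toy sequences; it is not the method behind the site counts above, and real approaches are statistical and account for the phylogeny.

```python
def group_specific_sites(group_a: list[str], group_b: list[str]) -> list[int]:
    """Return 0-based columns conserved within each group but different between them.

    Both groups must come from the same alignment, so all sequences
    share one length.
    """
    sites = []
    for index, (col_a, col_b) in enumerate(zip(zip(*group_a), zip(*group_b))):
        residues_a, residues_b = set(col_a), set(col_b)
        if len(residues_a) == 1 and len(residues_b) == 1 and residues_a != residues_b:
            sites.append(index)
    return sites


# Hypothetical aligned fragments standing in for two paralog groups.
paralog_group_1 = ["MKVLA", "MKVLA", "MKVLA"]
paralog_group_2 = ["MRVLA", "MRVIA", "MRVLA"]
print(group_specific_sites(paralog_group_1, paralog_group_2))
```

In this toy case only the second column qualifies: it is K throughout one group and R throughout the other, while the fourth column varies within the second group and is therefore not flagged.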

GAPDH Functional Residues