bioc4700 2014 guest lecture

Post on 15-Jul-2015


PROTEIN EVOLUTION: Function and Human Health

Daniel Gaston, PhD October 30th, 2014

WHY DO WE CARE? Why does all of this evolution stuff matter anyway?

Why it matters

• Pure scientific curiosity

• Knowledge is intrinsically valuable, regardless of applications

• Critical for truly understanding function

• Translating research/knowledge between model organisms

• Evolution shapes population genetics

• Critical for understanding how mutations cause disease

Why it matters

• Ecology, ecological interactions, diversity

• Antibiotic resistance

• Microbiome

• Cancer

Major Groups of Organisms

Bacteria

Archaea

Eukaryotes


Major Groups of Eukaryotes

You are here

A Brief History of Life on Earth

Time

4.5B: Origin of the Earth

3 – 4B: Origin of Life

2.7B: Bacteria

1.5B: Eukaryotes

1B: Animals

Definitions

• Homology

• Descent from a common ancestor

• All or nothing, no such thing as percent homology

• Divergence

• Change in two sequences over time, after splitting from a common ancestor

• Convergence

• Similarity due to independent evolutionary events

• On the amino acid level: rare and difficult to prove

EVOLUTION IN PROTEINS: Processes

Two Groups of Processes

• Mutation

• Provides raw material of evolution

• Many different processes and mechanisms

• Happens within individuals

• Selection and Drift

• Happens within populations of organisms

• Affect the frequency of mutations within organisms over time

point mutation: AGTCCAAGGCCTTAA -------------> AGTTCAAGGCCTTAA

insertion (CCTTA): AGTCCAAGGCCTTAA -------------> AGTCCAAGGCCTTACCTTAA

deletion (AAGG): AGTCCAAGGCCTTAA -------------> AGTCC-CCTTAA

inversion: AGTCCAAGGCCTTAA -------------> AGTCCCCTTCCTTAA

translocation: AGTCCAAGGCCTTAA + GGTCCTGGAATTCAG -------------> AGTCCAAGGCC + GGTCCTGGAATTCAGTTAA

duplication: AGTCCAAGGCC -------------> AGTCCAAGGCCAGTCCAAGGCC

recombination (AGGC exchanged for AAGG): AGTCCAAGGCCTTAA -------------> AGTCCAAAGGCTTAA
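The sequence-level mutations illustrated above can be reproduced with a few string operations. This is only a toy sketch: the function names and the 0-based coordinates are assumptions made for illustration, and real mutations are of course not applied at hand-picked positions.

```python
# Toy versions of the mutation types on the slides, applied to DNA strings.
# Function names and 0-based, end-exclusive coordinates are illustrative choices.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def point_mutation(seq, pos, base):
    """Replace the base at pos with a different base."""
    return seq[:pos] + base + seq[pos + 1:]


def insertion(seq, pos, fragment):
    """Insert fragment before position pos."""
    return seq[:pos] + fragment + seq[pos:]


def deletion(seq, start, end):
    """Remove seq[start:end]."""
    return seq[:start] + seq[end:]


def inversion(seq, start, end):
    """Replace seq[start:end] with its reverse complement."""
    segment = seq[start:end].translate(COMPLEMENT)[::-1]
    return seq[:start] + segment + seq[end:]


def duplication(seq, start, end):
    """Insert a tandem copy of seq[start:end] right after the original."""
    return seq[:end] + seq[start:end] + seq[end:]


def translocation(donor, recipient, start, end):
    """Move donor[start:end] to the end of recipient; return both products."""
    return deletion(donor, start, end), recipient + donor[start:end]


seq = "AGTCCAAGGCCTTAA"
print(point_mutation(seq, 3, "T"))        # AGTTCAAGGCCTTAA
print(insertion(seq, 14, "CCTTA"))        # AGTCCAAGGCCTTACCTTAA
print(deletion(seq, 5, 9))                # AGTCCCCTTAA
print(inversion(seq, 5, 9))               # AGTCCCCTTCCTTAA
print(duplication("AGTCCAAGGCC", 0, 11))  # AGTCCAAGGCCAGTCCAAGGCC
print(translocation(seq, "GGTCCTGGAATTCAG", 11, 15))
```

The inputs are the example sequences from the slides, so the printed results can be compared directly with the diagrams.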

Exon Shuffling

[Figure: exon shuffling. Labels: Exon 1, Exon 2, Exon 3; Domain 1, Domain 2; Exon 1, Exon 2, Exon 3; Domain 2, Domain A]

Genomic Scale Mutations

Gene 1 Gene 2 Gene 3

Genomic Scale Mutations

Gene 1 Gene 2 Gene 1a

Mutational Processes

• Arise generally as unrepaired mismatches during DNA replication

• Some repair processes introduce mutation

• Chemical processes change non-replicating DNA

• Multi-cellularity buffers from all acquired (somatic) mutations being hereditary

• Humans:

• de novo mutation rate of 1.2 x 10^-8 per nucleotide per generation

• ~70 per child

• Majority of paternal origin
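As a quick check of how the per-nucleotide rate relates to the per-child figure, the arithmetic can be sketched as below. The haploid genome size of roughly 3.1 billion base pairs is an assumption not stated on the slide, as is the function name.

```python
# Back-of-the-envelope estimate of de novo mutations per child.
MUTATION_RATE = 1.2e-8       # per nucleotide per generation (from the slide)
HAPLOID_GENOME_SIZE = 3.1e9  # ASSUMPTION: approximate human haploid genome, in bp


def expected_de_novo_mutations(rate, haploid_size, ploidy=2):
    """Expected new mutations per child: rate x number of inherited nucleotides."""
    return rate * haploid_size * ploidy


estimate = expected_de_novo_mutations(MUTATION_RATE, HAPLOID_GENOME_SIZE)
print(f"about {estimate:.0f} new mutations per child")
```

Under these assumptions the estimate lands in the low 70s, in line with the ~70 per child quoted above.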

SELECTION AND DRIFT: Polymorphisms and Populations, Oh My!

Mutations, Polymorphisms, Substitutions

• Mutations: Appear in individuals within a population

• Sometimes in human genetics used to specifically describe pathogenic or disease-causing variation

• Polymorphism: An unfixed mutation of varying frequency within a population

• In human genetics generally used to describe functionally neutral/benign variation. Often must have a frequency of >5%

• Substitution: A fixed mutation. All individuals within a population have the mutation

• Most often used when comparing two or more species

Selection and Drift

• Fitness

• Measured in terms of the number of offspring that survive to themselves reproduce

• Positive Selection

• Rare

• Mutation confers some fitness advantage

• Negative Selection

• Frequent

• Mutation confers a fitness disadvantage

• Neutral

• Mutation has little to no impact on fitness

• Most frequent

Nearly Neutral Theory

Genetic Drift in Action
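Drift and selection can be explored with a small simulation. The sketch below uses a haploid Wright-Fisher model, which is a standard textbook model chosen here for illustration and not necessarily what the original figure showed; the function name, parameters, and population sizes are all assumptions.

```python
import random


def wright_fisher(pop_size, p0, generations, s=0.0, seed=None):
    """Allele-frequency trajectory in a haploid Wright-Fisher population.

    s is the selection coefficient of the tracked allele (s = 0 is pure drift,
    s > 0 positive selection, -1 < s < 0 negative selection).
    """
    rng = random.Random(seed)
    freq = p0
    trajectory = [freq]
    for _ in range(generations):
        # Selection shifts the expected frequency in the next generation.
        expected = freq * (1 + s) / (freq * (1 + s) + (1 - freq))
        # Drift: each of pop_size offspring samples an allele at random.
        carriers = sum(rng.random() < expected for _ in range(pop_size))
        freq = carriers / pop_size
        trajectory.append(freq)
    return trajectory


# Same starting frequency, different population sizes (hypothetical values).
for n in (20, 200, 2000):
    final = wright_fisher(pop_size=n, p0=0.5, generations=100, seed=1)[-1]
    print(f"N={n}: frequency after 100 generations = {final:.3f}")
```

Running it with several seeds shows the usual pattern: neutral alleles in small populations wander to loss or fixation quickly, while in large populations the frequency changes slowly unless s is non-zero.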

Examples of Positive Selection

• MHC Genes

• Balancing selection: favours diversity at loci

• Many genes involved in metabolism and digestion

• Accelerated evolution over last ~10,000 years

• Adaptation to Agriculture

• Human adaptations to high altitude

• EPAS1, PPARA, EGLN1 (Tibetans)

• CBARA1, VAV3, ARNT2, THRB (Ethiopian Highlanders)

• EGLN1 (Andean Peruvians)

Mutation at the Codon Level

Synonymous (Silent) Mutation: Codon still codes for the same amino acid

Non-Synonymous Mutation: Codon now codes for a different amino acid (missense), creates a premature stop codon (nonsense), or alters a start codon
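The codon-level categories above can be made concrete with a small classifier built on the standard genetic code. This is a minimal sketch: the function name, the `is_start_codon` flag, and the "stop-loss" label (a category not named on the slide) are assumptions added for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon, is_start_codon=False):
    """Classify a single-codon change using the categories from the slide."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if is_start_codon:
        return "start codon altered"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"  # not listed on the slide; included for completeness
    return "missense"


print(classify_codon_change("GAT", "GAC"))  # synonymous (Asp -> Asp)
print(classify_codon_change("GAT", "TAT"))  # missense (Asp -> Tyr)
print(classify_codon_change("TGG", "TGA"))  # nonsense (Trp -> stop)
print(classify_codon_change("ATG", "ATA", is_start_codon=True))
```

Whether a codon is the start codon depends on its position in the transcript, which is why that information is passed in as a flag here.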

PROTEIN FUNCTION AND STRUCTURE: Impacts on Evolution

Evolutionary Rates and Constraints

• Evolution is only partially random

• Mutations (quasi-random, non-uniform distribution of possibilities)

• Drift (Random)

• Selection (Non-random)

• Evolutionary rate at the protein level is the number of fixed amino acid substitutions over evolutionary time

• Measured between one or more species-level comparisons
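One simple way to turn this definition into a number is to count differences between two aligned protein sequences, correct for multiple substitutions at the same site, and divide by the time separating the lineages. The sketch below uses the proportion of differing sites with a Poisson correction, one common simple choice; the sequences and the divergence time are made up for illustration.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned, ungapped sites at which the amino acids differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(p):
    """Poisson-corrected substitutions per site, allowing for repeated hits."""
    return -math.log(1 - p)


def rate_per_site_per_year(distance, divergence_time_years):
    """Substitutions accumulate along both lineages, hence the factor of 2."""
    return distance / (2 * divergence_time_years)


# HYPOTHETICAL aligned fragments and divergence time, for illustration only.
species_a = "MKTAYIAKQR"
species_b = "MKSAYIAKHR"
p = p_distance(species_a, species_b)
d = poisson_distance(p)
print(f"p = {p:.2f}, d = {d:.3f}")
print(f"rate = {rate_per_site_per_year(d, 100e6):.2e} substitutions/site/year")
```

Real analyses typically use a phylogeny and a substitution model over many species, so this two-sequence calculation is only a first approximation.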

Evolutionary Rates and Constraints

• Different proteins have different overall rates of evolution

• Functional necessity

• Structural necessity

• Number of protein-protein interactions

• Different regions within a protein have different rates of evolution

• Functional constraint

• Structural constraint

Evolutionary Rates and Constraints

[Figure: site rates across all eukaryotes (63 taxa) mapped onto lobster enolase; low rates in blue, high rates in red. Panels: site rate categories 1 and 2 (slowest sites); categories 3 and 4; categories 5 and 6; categories 7 and 8 (fastest sites).]

Evolutionary Rate: Structure/Function Relationship

• Pattern of evolution is that rates are slowest near the centre, fastest on exterior

• Distance to catalytic centre

• Hydrophobic packing of the interior

• Spatial/size constraints in interior

• More loops and alpha-helices on exterior

• How does this change for structural proteins like tubulin or actin?

PRACTICAL APPLICATIONS

Identifying Disease Causing Genes

• Lynch Syndrome

• Autosomal dominant cancer syndrome

• Defective mismatch repair

• Increased risk of many cancers, particularly colorectal

Identifying the Gene using Evolutionary Reasoning

• Inactivation of genes known to be involved in mismatch repair in E. coli and yeast leads to a ‘mutator’ phenotype

• Microsatellite instability observed

• Searched for homologous genes in humans based on microsatellite instability

• Identified MLH1 and MSH2

• Sequenced genes in Lynch syndrome patients and identified mutations

Identifying Likely Pathogenic Mutations

• Needle in a stack of needles (Exome and Genome Sequencing)

• Individual humans carry ~70 new mutations

• Can be hundreds to thousands of shared variants between small numbers of individuals in a family

Evolutionary Profile of Pathogenic Mutations

• Highly conserved amino acids more likely to be functionally important

• Highly conserved genes more likely to be indispensable

• Conservation alone can be misleading

• Factor in evolutionary history and relatedness of species being compared

• Best tools use many sources of information and high-level machine learning
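To make "highly conserved" concrete, a per-column score can be computed from a multiple sequence alignment. The sketch below uses Shannon entropy, a deliberately simple measure chosen for illustration: it ignores the relatedness of the species, which is exactly the limitation the slide warns about, and it is not how the best tools mentioned above work. The alignment is hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; 0 means fully conserved."""
    residues = [r for r in column if r != "-"]  # ignore gaps
    total = len(residues)
    if total == 0:
        return 0.0
    counts = Counter(residues)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropies(alignment):
    """Entropy of every column in a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*alignment)]


# HYPOTHETICAL alignment of four short sequences, for illustration only.
alignment = ["MKDLV", "MKDIV", "MRDLA", "MKDMV"]
for position, h in enumerate(alignment_entropies(alignment), start=1):
    print(f"site {position}: entropy = {h:.2f} bits")
```

Sites with entropy near zero are candidates for functional importance, but a column of closely related species can look conserved simply because too little time has passed for changes to accumulate.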

Exome Sequencing for Disease: Gastric Cancer

o Older age of diagnosis

o Often diagnosed at later stages as symptoms similar to many common diseases

o 3rd leading cause of cancer death worldwide: 730,000 deaths per year

o 90% of cases are sporadic

o Most cases of familial clustering due to shared environmental factors

o 60% of hereditary cases caused by mutations in the gene CDH1

Genomic Regions | Number of Exomes | Number of Variants <5% Allele Frequency in Regions of Interest | Number With Medium or High Impact

All Affected | 3 | 14 | 0

Siblings Only | 2 | 9550 | 525

All Variants in Exome

Variants in Shared Regions

Variant Frequency in Population

Variant Impact

Candidates
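The filtering funnel above can be sketched as a chain of filters. The `Variant` fields, the impact labels, and the example records are all hypothetical; only the 5% allele-frequency cut-off and the medium/high impact criterion come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str                # hypothetical gene label
    in_shared_region: bool   # lies in a region shared by the affected relatives
    allele_frequency: float  # frequency in the general population
    impact: str              # predicted impact: "LOW", "MEDIUM" or "HIGH"


def filter_candidates(variants, max_af=0.05, impacts=("MEDIUM", "HIGH")):
    """Apply the funnel: shared regions -> rare in population -> impactful."""
    shared = [v for v in variants if v.in_shared_region]
    rare = [v for v in shared if v.allele_frequency < max_af]
    return [v for v in rare if v.impact in impacts]


# HYPOTHETICAL variants, for illustration only.
exome = [
    Variant("GENE_A", True, 0.30, "HIGH"),     # too common
    Variant("GENE_B", True, 0.001, "HIGH"),    # passes every filter
    Variant("GENE_C", False, 0.001, "HIGH"),   # not in a shared region
    Variant("GENE_D", True, 0.002, "LOW"),     # low predicted impact
]
print([v.gene for v in filter_candidates(exome)])  # ['GENE_B']
```

Each step discards variants, so the order of the filters does not change the final candidate list, only how many variants remain at each intermediate stage.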

MAP3K6

Domains and sites: Protein Kinase, ATP Binding, Proton Acceptor, Coiled-Coil

Variants: D200Y, V207G, H506Y*, P946L, P958T, F849Sfs*142

Functional Divergence

• Duplicated genes (paralogs)

• Can diverge in function as well as sequence

Gene 1 Gene 2 Gene 1a

Types of Functional Divergence

• Subfunctionalization

• Specialize and retain only a subset of ancestral function

• Neofunctionalization

• Gain a new function, lose ancestral

• Subneofunctionalization

• Specialize and elaborate

Functional Divergence and Protein Families

Detecting Functional Divergence

Glyceraldehyde-3-Phosphate Dehydrogenase

Cytosol: Glycolysis

Glyceraldehyde-3-Phosphate + NAD+ + Pi <-> 1,3-Bisphosphoglycerate + NADH + H+

Plastid: Calvin Cycle

Glyceraldehyde-3-Phosphate + NADP+ + Pi <-> 1,3-Bisphosphoglycerate + NADPH + H+

GAPDH Structure

Divergent and Convergent Evolution in GAPDH

• Many sites predicted to be functionally divergent

• 69 in the green group (GapA/B)

• 26 in GapC1

• 20 in both GapC1 and GapA/B

GAPDH Functional Residues
