Posted on 20-Dec-2015

1

Detecting selection using phylogeny

2

Evaluation of prediction methods: comparing our results to experimentally verified sites

Rows: is the prediction correct? Columns: what our prediction gives.

         Positive (hit)                  Negative
True     True-positive                   True-negative
False    False-positive (false alarm)    False-negative (miss)
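The four cells of the table can be tallied by comparing each prediction with its experimentally verified label. A minimal sketch, assuming predictions and verified labels are given as parallel lists of booleans (`True` = positive); the function name `confusion_counts` and the example data are hypothetical:

```python
def confusion_counts(predicted, verified):
    """Tally true/false positives/negatives.

    predicted[i] -- our prediction for site i (True = positive/hit)
    verified[i]  -- experimentally verified label for site i
    """
    if len(predicted) != len(verified):
        raise ValueError("predicted and verified must have the same length")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for pred, real in zip(predicted, verified):
        if pred and real:
            counts["TP"] += 1   # correct hit
        elif pred and not real:
            counts["FP"] += 1   # false alarm
        elif not pred and not real:
            counts["TN"] += 1   # correct rejection
        else:
            counts["FN"] += 1   # miss
    return counts

# Hypothetical example with 6 sites
predicted = [True, True, False, False, True, False]
verified = [True, False, False, True, True, False]
print(confusion_counts(predicted, verified))  # {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}
```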

3

Calibrating the method

Most prediction methods have a parameter (a cutoff) that can be calibrated to improve the accuracy of the method.

For example: the E-value cutoff in BLAST.

4

Calibrating the E-value cutoff

Rows: is the prediction correct? Columns: what our prediction gives (is this a homolog?).

         Positive (hit)                                  Negative
True     True-positive (real homolog)                    True-negative (real non-homolog)
False    False-positive (false alarm: not a homolog)     False-negative (missed a homolog)

5

Calibrating the E-value

What will happen if we raise the E-value cutoff (for instance, work with all hits with an E-value < 10)?


6

Calibrating the E-value

On the other hand, what if we lower the E-value cutoff (look only at hits with an E-value < 10⁻⁸)?


7

Improving prediction

There is a trade-off between specificity and sensitivity.

8

Sensitivity vs. specificity

$\text{Sensitivity} = \frac{\text{True positive}}{\text{True positive} + \text{False negative}}$: how well we hit real homologs. The denominator represents all the proteins that are really homologous.

$\text{Specificity} = \frac{\text{True negative}}{\text{True negative} + \text{False positive}}$: how well we avoid real non-homologs. The denominator represents all the proteins that are really NOT homologous.
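The two formulas translate directly into code. A minimal sketch; the counts are hypothetical, and both functions assume a non-zero denominator:

```python
def sensitivity(tp, fn):
    """Fraction of the real homologs that we hit: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of the real non-homologs that we avoid: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts: 40 real homologs, 60 real non-homologs
tp, fn = 30, 10
tn, fp = 54, 6
print(sensitivity(tp, fn))  # 0.75
print(specificity(tn, fp))  # 0.9
```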

9

Raising the E-value cutoff to 10: sensitivity ↑, specificity ↓

Lowering the E-value cutoff to 10⁻⁸: sensitivity ↓, specificity ↑
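The trade-off can be reproduced on a toy data set. A minimal sketch, assuming a hypothetical list of BLAST hits, each with an E-value and a known answer to "is this really a homolog?"; a hit counts as a predicted homolog when its E-value is below the cutoff (the names `HITS` and `evaluate` are illustrative):

```python
# Hypothetical BLAST results: (E-value, is the subject really a homolog?)
HITS = [
    (1e-50, True), (1e-20, True), (1e-9, True), (1e-3, True), (0.5, True),
    (1e-10, False), (0.01, False), (2.0, False), (8.0, False), (50.0, False),
]


def evaluate(hits, cutoff):
    """Predict 'homolog' when E-value < cutoff; return (sensitivity, specificity)."""
    tp = sum(1 for e, real in hits if e < cutoff and real)
    fn = sum(1 for e, real in hits if e >= cutoff and real)
    tn = sum(1 for e, real in hits if e >= cutoff and not real)
    fp = sum(1 for e, real in hits if e < cutoff and not real)
    return tp / (tp + fn), tn / (tn + fp)


for cutoff in (10, 1e-8):
    sens, spec = evaluate(HITS, cutoff)
    print(f"E-value < {cutoff:g}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these made-up hits, the cutoff of 10 gives sensitivity 1.00 and specificity 0.20, while 10⁻⁸ gives 0.60 and 0.80.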

10

Functional prediction in proteins

(purifying and positive selection)

11

Darwin – the theory of natural selection

Adaptive evolution:

Favorable traits will become more frequent in the population

12

Adaptive evolution

When natural selection favors a single allele, the allele frequency shifts continuously in one direction.
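This directional shift can be illustrated with the textbook deterministic model of selection in a haploid population. A minimal sketch under stated assumptions (no genetic drift, no mutation, a constant selection coefficient `s` favoring one allele); the function names and parameter values are hypothetical:

```python
def next_frequency(p, s):
    """One generation of haploid selection.

    The favored allele (frequency p) has relative fitness 1 + s, the other 1.
    Mean fitness is 1 + s*p, so p' = p * (1 + s) / (1 + s*p).
    """
    return p * (1 + s) / (1 + s * p)


def trajectory(p0, s, generations):
    """Allele frequencies from generation 0 up to `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


freqs = trajectory(p0=0.01, s=0.1, generations=100)
print(f"{freqs[0]:.3f} -> {freqs[50]:.3f} -> {freqs[-1]:.3f}")
```

For any `s > 0` the frequency rises every generation, which is the one-directional shift described above; with `s = 0` it stays constant.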

13

Kimura – the theory of neutral evolution

Neutral evolution:

Most molecular changes are selectively neutral: they do not change the phenotype and spread by random genetic drift.

Purifying selection: selection operates to preserve a trait (no change).

15

Purifying selection (conservation) at the molecular level: histone H3

17

Synonymous vs. non-synonymous substitutions

Purifying selection: excess of synonymous substitutions

Synonymous substitution: GUU → GUC (Val → Val)

Non-synonymous substitution: GUU → GCU (Val → Ala)
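Whether a substitution is synonymous can be checked by translating both codons with the standard genetic code. A minimal sketch (RNA alphabet, standard code only; the names `CODON_TABLE` and `substitution_type` are illustrative):

```python
from itertools import product

# Standard genetic code: codons enumerated in UCAG order (UUU, UUC, UUA, UUG, UCU, ...)
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")  # '*' = stop codon
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def substitution_type(codon_from, codon_to):
    """Classify a codon change as synonymous or non-synonymous."""
    aa_from = CODON_TABLE[codon_from.upper()]
    aa_to = CODON_TABLE[codon_to.upper()]
    return "synonymous" if aa_from == aa_to else "non-synonymous"


print(substitution_type("GUU", "GUC"))  # synonymous (Val -> Val)
print(substitution_type("GUU", "GCU"))  # non-synonymous (Val -> Ala)
```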

18

Conservation as a means of predicting function

Infer the rate of evolution at each site

Low rate of evolution → constraints on the site to prevent disruption of function: active sites, protein-protein interactions, etc.

19

Conservation as a means of predicting function

Site:            1234567
Human            DMAAHAM
Chimp            DEAAGGC
Cow              DQAAWAP
Fish             DLAACAL
S. cerevisiae    DDGAFAA
S. pombe         DDGALGE

20

Which site is more conserved?

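A naïve answer scores each column of the alignment on its own, ignoring how the species are related. A minimal sketch using Shannon entropy as the per-column score (our illustrative choice, not ConSurf's method; 0 bits means the column is invariant):

```python
from collections import Counter
from math import log2

ALIGNMENT = {
    "Human":         "DMAAHAM",
    "Chimp":         "DEAAGGC",
    "Cow":           "DQAAWAP",
    "Fish":          "DLAACAL",
    "S. cerevisiae": "DDGAFAA",
    "S. pombe":      "DDGALGE",
}


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one column; 0 = fully conserved."""
    counts = Counter(column)
    n = len(column)
    return sum((c / n) * log2(n / c) for c in counts.values())


seqs = list(ALIGNMENT.values())
for pos in range(len(seqs[0])):
    column = [s[pos] for s in seqs]
    print(pos + 1, "".join(column), f"{column_entropy(column):.2f}")
```

Sites 1 and 4 score 0 (invariant), while sites 3 and 6 get the same score (about 0.92 bits) because each has four A and two G, so this score alone cannot say which of the two is more conserved.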

21

Use phylogenetic information

[Figure: the residues of site 6 (A, G, A, A, A, G) and of site 3 (A, A, A, A, G, G) for Human, Chimp, Cow, Fish, S. cerevisiae and S. pombe, placed on the leaves of the species tree]
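Placing the residues on a tree separates sites 3 and 6: both contain four A and two G, but at site 3 the two G residues are in the two yeasts, which are sister species, while at site 6 they are in Chimp and S. pombe. A minimal sketch that counts the minimum number of changes per site with Fitch parsimony; this is a simpler stand-in for the model-based rate estimation used by tools such as ConSurf, and the tree topology below is an assumption for illustration:

```python
# Assumed species tree as nested pairs: ((((Human, Chimp), Cow), Fish), (yeasts))
TREE = (((("Human", "Chimp"), "Cow"), "Fish"), ("S. cerevisiae", "S. pombe"))

ALIGNMENT = {
    "Human":         "DMAAHAM",
    "Chimp":         "DEAAGGC",
    "Cow":           "DQAAWAP",
    "Fish":          "DLAACAL",
    "S. cerevisiae": "DDGAFAA",
    "S. pombe":      "DDGALGE",
}


def fitch(tree, states):
    """Fitch small-parsimony pass: return (possible ancestral states, min changes)."""
    if isinstance(tree, str):                   # leaf: its observed residue
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:                                  # children can agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one change needed


def min_changes(site):
    """Minimum number of changes at a 1-based alignment site on TREE."""
    states = {species: seq[site - 1] for species, seq in ALIGNMENT.items()}
    return fitch(TREE, states)[1]


for site in (3, 6):
    print(f"site {site}: at least {min_changes(site)} change(s) on the tree")
```

On this assumed tree, site 3 needs at least one change and site 6 at least two, so site 3 is the more conserved of the two.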

22

Prediction of conserved residues by estimating evolutionary rates at each site

ConSurf/ConSeq web servers:

23

Working process

1. Input a protein with a known 3D structure (PDB ID or a file provided by the user)
2. Find homologous protein sequences (PSI-BLAST)
3. Perform a multiple sequence alignment (removing duplicate sequences)
4. Construct an evolutionary tree
5. Calculate the conservation score for each site
6. Project the results onto the 3D structure

24

The KcsA potassium channel

An outstanding mystery: how does the KcsA potassium channel conduct only K+ ions and not Na+, even though Na+ is the smaller ion?

25

The Kcsa potassium channel structure

The structure of the KcsA channel was solved in 1998. KcsA is a homotetramer with a four-fold symmetry axis about its pore.

26

The KcsA potassium selectivity filter

The selectivity filter mimics the water molecules bound to K+ (its hydration shell), so K+ can shed its water and pass through. The water bound to Na+ is not mimicked in the same way: no passage.

27

Conservation analysis of KcsA

Use ConSurf to study KcsA conservation.

28

ConSurf results

29

ConSeq

ConSeq performs the same analysis as ConSurf but displays the results on the sequence. It also predicts whether each residue is buried or exposed:

Exposed & conserved → functionally important site

Buried & conserved → structurally important site

30

ConSeq analysis

• Exposed & conserved → functionally important site
• Buried & conserved → structurally important site
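The ConSeq rule can be sketched as a small decision function. This is a hypothetical illustration: it assumes each residue already has a ConSurf-style conservation grade (1 = most variable, 9 = most conserved) and a buried/exposed prediction, and the threshold `conserved_min=8` is an arbitrary choice, not ConSeq's actual setting:

```python
def conseq_label(grade, exposed, conserved_min=8):
    """Return 'functional', 'structural', or None for one residue.

    grade         -- conservation grade, 1 (variable) .. 9 (conserved)
    exposed       -- True if the residue is predicted to be exposed
    conserved_min -- hypothetical grade threshold for "conserved"
    """
    if not 1 <= grade <= 9:
        raise ValueError("grade must be between 1 and 9")
    if grade < conserved_min:
        return None                      # not conserved enough to flag
    return "functional" if exposed else "structural"


# Hypothetical residues: (position, grade, exposed?)
residues = [(1, 9, True), (2, 9, False), (3, 4, True), (4, 8, False)]
for pos, grade, exposed in residues:
    print(pos, conseq_label(grade, exposed))
```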

31

Positive selection & drug resistance

32

Darwin – the theory of natural selection

Adaptive evolution:

Favorable traits will become more frequent in the population

33

Adaptive evolution on the molecular level

34

Adaptive evolution on the molecular level

Look for changes which confer an advantage

35

Naïve detection

Observe the multiple sequence alignment: are variable regions = adaptive evolution?

36

Naïve detection

The problem: how do we know which variable sites are simply sites with no selection pressure (“non-important” sites) and which are under adaptive evolution?

37

Solution – look at the DNA

synonymous

non-synonymous

38

Solution – look at the DNA

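Purifying selection: Syn > Non-syn

Adaptive evolution = positive selection: Non-syn > Syn

Neutral evolution: Syn = Non-syn

(Syn and Non-syn are substitution rates per synonymous and per non-synonymous site, not raw counts of substitutions.)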

39

Also known as… Ka/Ks (or dN/dS, or ω)

Ka = non-synonymous substitution rate; Ks = synonymous substitution rate

Purifying selection: Ka < Ks (Ka/Ks < 1)
Neutral evolution: Ka = Ks (Ka/Ks = 1)
Positive selection: Ka > Ks (Ka/Ks > 1)
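Ka and Ks can be estimated from two aligned coding sequences by counting synonymous and non-synonymous sites and differences. A minimal sketch in the spirit of the Nei-Gojobori (1986) counting method with a Jukes-Cantor correction; it assumes in-frame DNA sequences without stop codons and the standard genetic code, and it counts point mutations to stop codons as non-synonymous (a simplification). It is not how Selecton works (Selecton estimates site-specific Ka/Ks with codon models along a phylogeny), and the naive ratio comparison at the end stands in for the statistical test a real analysis needs:

```python
from itertools import permutations, product
from math import log

# Standard genetic code (DNA alphabet), codons enumerated in TCAG order
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Per position, the fraction of the 3 point mutations that keep the amino acid."""
    sites = 0.0
    for pos in range(3):
        for base in "TCAG":
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """(syn, non-syn) differences averaged over pathways that avoid stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    syn = nonsyn = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, s, n = c1, 0, 0
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*":
                break                         # pathway passes through a stop codon
            if CODE[step] == CODE[current]:
                s += 1
            else:
                n += 1
            current = step
        else:                                 # pathway completed without a stop
            syn, nonsyn, n_paths = syn + s, nonsyn + n, n_paths + 1
    if n_paths == 0:
        raise ValueError(f"no valid pathway between {c1} and {c2}")
    return syn / n_paths, nonsyn / n_paths


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple hits."""
    return -0.75 * log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks) for two aligned, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    codons = [(seq1[i:i + 3], seq2[i:i + 3]) for i in range(0, len(seq1), 3)]
    S = sum((synonymous_sites(a) + synonymous_sites(b)) / 2 for a, b in codons)
    N = 3 * len(codons) - S
    Sd = Nd = 0.0
    for a, b in codons:
        s, n = codon_differences(a, b)
        Sd += s
        Nd += n
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)


def selection_regime(ka, ks):
    """Naive reading of Ka/Ks (no significance test)."""
    if ks == 0:
        return "undefined (no synonymous changes)"
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


ka, ks = ka_ks("ATGGTTGCTAAAGGTCTGCGT", "ATGGTCGCTAAAGGTCTGCGT")
print(selection_regime(ka, ks))  # purifying (one synonymous change only)
```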

40

Examples of positive selection

• Proteins involved in the immune system
• Proteins involved in the host-pathogen interaction ‘arms race’
• Proteins following gene duplication
• Proteins involved in reproductive systems

41

Selecton – a server for the detection of purifying and positive selection

http://selecton.bioinfo.tau.ac.il

42

Detecting drug resistance using Selecton

43

HIV: molecular evolution paradigm

Rapidly evolving virus:

1. High mutation rate (low fidelity of reverse transcriptase)
2. High replication rate

44

HIV Protease

Protease is an essential enzyme for viral replication.

Drugs against protease are typically part of the drug “cocktail”.

45

Ritonavir inhibitor

Ritonavir (RTV) is a specific HIV protease inhibitor (drug).

C37H48N6O5S2

46

Drug resistance

No drug vs. drug

Adaptive evolution (positive selection)

47

Selecton was used to analyse HIV-1 protease gene sequences from patients who were treated with RTV only.

48

49

Example: HIV protease

Primary mutations

Secondary mutations

Novel predictions (experimental validation)

50

Summary

Sequence analysis can provide valuable information about protein function:

• Conservation on the amino acid level: http://consurf.tau.ac.il
• Positive (“Darwinian”) selection and purifying selection: http://selecton.bioinfo.tau.ac.il