alignments


Upload: glenna

Post on 24-Feb-2016


Page 1: Alignments

Alignments Why do Alignments?

Page 2: Alignments

Detecting Selection

Evolution of Drug Resistance in HIV

Page 3: Alignments

Selection on Amino Acid Properties

TreeSAAP (2003)
Wu Method (Sainudiin et al. 2005)

Page 4: Alignments

TreeSAAP Properties

Alpha-helical tendencies
Average number of surrounding residues
Beta-structure tendencies
Bulkiness
Buriedness
Chromatographic index
Coil tendencies
Composition
Compressibility
Equilibrium constant (ionization of COOH)
Helical contact area
Hydropathy
Isoelectric point
Long-range non-bonded energy
Mean r.m.s. fluctuation displacement
Molecular volume
Molecular weight
Normalized consensus hydrophobicity
Partial specific volume
Polar requirement
Polarity
Power to be at the C-terminal
Power to be at the middle of alpha-helix
Power to be at the N-terminal
Refractive index
Short and medium range non-bonded energy
Solvent accessible reduction ratio
Surrounding hydrophobicity
Thermodynamic transfer hydrophobicity
Total non-bonded energy
Turn tendencies

Page 5: Alignments

TreeSAAP

Page 6: Alignments

Rhinoviruses

Page 7: Alignments

Selected Sites

Page 8: Alignments

3D Mapping

Page 9: Alignments

OPSIN: Model System for Molecular Evolution

[Figure: diagram linking PHENOTYPE, GENOTYPE, and ENVIRONMENT, with a wavelength axis running from 400 to 700 nm (UV at the short end, IR at the long end)]

CRLAKIAMTTVALWFIAWTPYLLINWVGMFARSYLSPVYTIWGYVFAKANAVYNPIVYAISHPKYRAAMEKKLPCLSCKTESDDVSESASTTTSS

Page 10: Alignments

Is λmax Correlated with Ecological Differences?

Microscopic thin beam of spectral light: INPUT, OUTPUT
INPUT – OUTPUT = pigment absorbance
Detect light not absorbed by the photopigment
400-700 nm at 1 nm intervals
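The measurement on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `lambda_max` is hypothetical, and the input and output beams are synthetic (a flat input and an output dimmed most strongly near 530 nm), not real data. It uses the slide's simple definition, INPUT minus OUTPUT, and reports the wavelength where that difference is greatest.

```python
import math


def lambda_max(wavelengths, input_light, output_light):
    """Return (peak wavelength, absorbance list) using INPUT - OUTPUT."""
    absorbance = [i - o for i, o in zip(input_light, output_light)]
    # Index of the largest absorbance value.
    peak = max(range(len(absorbance)), key=absorbance.__getitem__)
    return wavelengths[peak], absorbance


# Scan range from the slide: 400-700 nm at 1 nm intervals.
wavelengths = list(range(400, 701))

# Synthetic, hypothetical beams: flat input, output dimmed around 530 nm.
input_light = [1.0] * len(wavelengths)
output_light = [
    1.0 - 0.6 * math.exp(-((w - 530) ** 2) / (2 * 40.0 ** 2))
    for w in wavelengths
]

peak_nm, absorbance = lambda_max(wavelengths, input_light, output_light)
print(peak_nm)
```

With real spectra the curve is noisy, so a fitted template or smoothing step would normally replace the plain maximum used here.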

Page 11: Alignments

Invertebrate Opsin Evolution

[Figure: PHYML amino acid ML tree, scale bar 0.1. Branch thickness indicates bootstrap support; the thickest branches indicate bootstrap values > 90%.]

Clade labels:
Insect LWS, 508-575 nm
Crustacean LWS, 496-533 nm
Insect UV, 345-375 nm
Cephalopod Rh, 480-499 nm
Crustacean MWS (480)
Chelicerate LWS (520)
Insect MWS, 420-490 nm
Insect BL, 430-460 nm

Tip labels, in the order shown:
Heliconius erato, Heliconius sara, Bicyclus anynana, Junonia coenia, Vanessa cardui, Papilio xuthus Rh1, Papilio xuthus Rh3, Pieris rapae, Manduca sexta, Galleria mellonella, Spodoptera exigua, Papilio xuthus Rh2
Osmia rufa, Bombus terrestris, Apis mellifera, Camponotus abdominalis, Cataglyphis bombycinus, Schistocerca gregaria, Sphodromantis sp.
Drosophila melanogaster Rh6, Drosophila melanogaster Rh1, Calliphora erythrocephala Rh1, Drosophila melanogaster Rh2
Neogonodactylus oerstedii Rh3, Neogonodactylus oerstedii Rh1, Neogonodactylus oerstedii Rh2, Homarus gammarus, Neomysis americana, Holmesimysis costata
Procambarus milleri, Orconectes virilis, Procambarus clarkii, Cambarus ludovicianus, Cambarellus schufeldtii, Euphausia superba, Mysis relicta sp. IV, Archaeomysis grebnitzkii
Limulus polyphemus, Limulus polyphemus, Hemigrapsus sanguineus, Hemigrapsus sanguineus
Camponotus abdominalis, Cataglyphis bombycinus, Apis mellifera, Manduca sexta, Papilio xuthus Rh5, Drosophila melanogaster Rh4, Drosophila melanogaster Rh3
Apis mellifera, Schistocerca gregaria, Papilio xuthus Rh4, Manduca sexta, Drosophila melanogaster Rh5
Loligo pealii, Loligo forbesi, Loligo subulata, Sepia officinalis, Todarodes pacificus, Enteroctopus dofleini
Gallus gallus pineal, Anolis carolinensis pineal, Bos taurus rhodopsin, Homo sapiens melatonin 1A, Homo sapiens GPR52

Page 12: Alignments

Coil Tendencies, Compressibility, Alpha-Helix

Page 13: Alignments

TreeSAAP

[Figure: four panels plotting Z-score (y-axis) against amino acid alignment number (x-axis, up to 260) for Coil Tendencies, Compressibility, Power to be at mid alpha, and Refractive Index, with transmembrane regions TMI to TMVI marked]

Page 14: Alignments

Invertebrate Opsin Evolution

[Figure: the PHYML amino acid ML tree from Page 11, shown again]

Page 15: Alignments

Homology

Page 16: Alignments

Homology definitions

Homology is an evolutionary relationship that either exists or does not. It cannot be partial.
An ortholog is a homolog that arose through a speciation event.
A paralog is a homolog that arose through a gene duplication event. Paralogs often have divergent function.
Similarity is a measure of the quality of alignment between two sequences. High similarity is evidence for homology. Similar sequences may be orthologs or paralogs.

Page 17: Alignments

One More Homology Type

Xenology: homology resulting from horizontal gene transfer (HGT)
How do you discover this?

Page 18: Alignments

Alignment Problem

(Optimal) pairwise alignment consists of considering all possible alignments of two sequences and choosing the optimal one.
Sub-optimal (heuristic) alignment algorithms are also very important, e.g., BLAST.
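Considering every possible alignment explicitly is infeasible for all but very short sequences, so optimal methods use dynamic programming instead. The sketch below follows the Needleman-Wunsch approach, which the slides do not name, and returns only the optimal global score. The function name and the match, mismatch, and gap values are illustrative assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Bottom-right cell covers both sequences end to end.
    return score[-1][-1]


print(global_alignment_score("ACGT", "AGT"))
```

Each cell holds the best score for a pair of prefixes, so the table implicitly covers all alignments in time proportional to the product of the two lengths.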

Page 19: Alignments

Key Issues

Types of alignments (local vs. global)
The scoring system
The alignment algorithm
Measuring alignment significance

Page 20: Alignments

Types of Alignment

Global: sequences aligned from end-to-end.
Local: alignments may start in the middle of either sequence.
Ungapped: no insertions or deletions are allowed.
Other types: overlap alignments, repeated match alignments.

Page 21: Alignments

Local vs. Global Pairwise Alignments

A global alignment includes all elements of the sequences and includes gaps.
A global alignment may or may not include "end gap" penalties.
Global alignments are better indicators of homology and take longer to compute.
A local alignment includes only subsequences, and sometimes is computed without gaps.
Local alignments can find shared domains in divergent proteins and are fast to compute.
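A local alignment score can be sketched with the Smith-Waterman recurrence, a technique the slides do not name. Any cell may reset to zero, so the best-scoring subsequence pair is found wherever it sits. The function name and scoring values are illustrative assumptions.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between any substrings of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                           # start a fresh local alignment
                score[i - 1][j - 1] + pair,  # extend diagonally
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            best = max(best, score[i][j])
    return best


# The shared ACGT block is found despite unrelated flanking residues.
print(local_alignment_score("XXXACGTYYY", "ZZACGTWW"))
```

The zero floor is what lets a shared domain score well even when the rest of the two sequences has diverged.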

Page 22: Alignments

How do you compare alignments?

Scoring scheme
What events do we score? Matches, mismatches, gaps
What scores will you give these events?
What assumptions are you making?
Score your alignment
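Scoring an alignment that is already laid out can be sketched as a sum over its columns. The function name, the use of "-" as the gap character, and the three score values are assumptions chosen for illustration; changing them changes which alignment looks best.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Sum column scores for two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")

    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        if x == "-" or y == "-":
            total += gap        # insertion or deletion
        elif x == y:
            total += match      # identical residues
        else:
            total += mismatch   # substitution
    return total


# Four matches (+4) and two gap columns (-4) give a total of 0.
print(score_alignment("ACGT-A", "AC-TTA"))
```

This version charges every gap column equally; separate gap-opening and gap-extension costs are a common alternative.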

Page 23: Alignments

Scoring Matrices

How do you determine scores?
What is out there already for your use?
DNA versus amino acids?

TTACGGAGCTTC
CTGAGATCC
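For DNA, one simple way to build a scoring matrix is to penalize transitions (purine to purine, or pyrimidine to pyrimidine) less than transversions. The sketch below is a hypothetical example with made-up values, not a published matrix. For amino acids, published matrices such as PAM and BLOSUM are the usual starting point.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def dna_score(x, y, match=1, transition=-1, transversion=-2):
    """Score one aligned pair of nucleotides."""
    if x == y:
        return match
    pair = {x, y}
    # Same chemical class means a transition, otherwise a transversion.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return transition
    return transversion


def build_dna_matrix():
    """Return a dict mapping every (x, y) nucleotide pair to its score."""
    bases = "ACGT"
    return {(x, y): dna_score(x, y) for x in bases for y in bases}


matrix = build_dna_matrix()
print(matrix[("A", "G")], matrix[("A", "C")])
```

The assumption built into these values is that transitions are more common than transversions and should therefore cost less.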

Page 24: Alignments

Multiple Sequence Alignment

Global versus local alignments
Progressive alignment
Estimate guide tree
Do pairwise alignment on subtrees
ClustalX
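The guide-tree step can be sketched with UPGMA clustering on a pairwise distance matrix. UPGMA is used here only as a simple stand-in, since the slide does not say which clustering method builds the guide tree. The function name and the distances are hypothetical.

```python
def upgma_guide_tree(names, dist):
    """Build a nested-tuple guide tree.

    names: list of sequence labels.
    dist:  dict mapping (i, j) index pairs to the distance between
           names[i] and names[j]; every pair must be present once.
    """
    trees = {i: name for i, name in enumerate(names)}
    sizes = {i: 1 for i in trees}
    d = {frozenset(pair): value for pair, value in dist.items()}
    next_id = len(names)

    while len(trees) > 1:
        closest = min(d, key=d.get)          # nearest pair of clusters
        a, b = sorted(closest)
        new = next_id
        next_id += 1
        for c in trees:
            if c in (a, b):
                continue
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            # Size-weighted average distance to the merged cluster.
            d[frozenset((new, c))] = (
                sizes[a] * dac + sizes[b] * dbc
            ) / (sizes[a] + sizes[b])
        del d[closest]
        trees[new] = (trees.pop(a), trees.pop(b))
        sizes[new] = sizes.pop(a) + sizes.pop(b)

    return next(iter(trees.values()))


tree = upgma_guide_tree(["A", "B", "C"], {(0, 1): 0.1, (0, 2): 0.5, (1, 2): 0.6})
print(tree)
```

A progressive aligner would then walk this tree from the tips inward, aligning the closest sequences first and merging the resulting profiles.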

Page 25: Alignments

Improvements

Consistency-based algorithms
T-Coffee: consistency-based objective function to minimize potential errors
Generates pairwise global (Clustal) and local (Lalign) alignments
Then combine, reweight, progressive alignment

Page 26: Alignments

Iterative Algorithms

Estimate draft progressive alignment (uncorrected distances)
Improved progressive (re-estimate guide tree using Kimura 2-parameter)
Refinement: divide into 2 subtrees, estimate two profiles, then re-align the 2 profiles
Continue refinement until convergence
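The difference between an uncorrected distance and a Kimura 2-parameter distance can be sketched for two aligned DNA sequences. With P the proportion of transitions and Q the proportion of transversions, the Kimura 2-parameter distance is d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)). The function names are hypothetical, and this nucleotide form is an assumption about what the slide intends; a given program may apply a different correction.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_sites(seq1, seq2):
    """Return (compared sites, transitions, transversions)."""
    sites = transitions = transversions = 0
    for x, y in zip(seq1, seq2):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous characters
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return sites, transitions, transversions


def p_distance(seq1, seq2):
    """Uncorrected distance: proportion of differing sites."""
    sites, ts, tv = count_sites(seq1, seq2)
    return (ts + tv) / sites


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance for two aligned DNA sequences."""
    sites, ts, tv = count_sites(seq1, seq2)
    p, q = ts / sites, tv / sites
    inside = (1 - 2 * p - q) * math.sqrt(1 - 2 * q)
    if inside <= 0:
        raise ValueError("sequences too divergent for this correction")
    return -0.5 * math.log(inside)


print(p_distance("AAAAAAAAAA", "GAAAAAAAAC"))
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAC"))
```

The corrected value is larger than the uncorrected one because it allows for multiple substitutions at the same site, which is why re-estimating the guide tree with it can change the tree.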

Page 27: Alignments

Software

Clustal
T-Coffee
MUSCLE (limited models)
MAFFT (wide variety of models)

Page 28: Alignments

Comparisons

Speed: MUSCLE > MAFFT > CLUSTALW > T-COFFEE
Accuracy: MAFFT > MUSCLE > T-COFFEE > CLUSTALW

Lots more work to do here!