alignments
Alignments
Why do Alignments?
Detecting Selection
Evolution of Drug Resistance in HIV
Selection on Amino Acid Properties: TreeSAAP (2003); Wu Method (Sainudiin et al. 2005)
TreeSAAP Properties: Alpha-helical tendencies; Average number of surrounding residues; Beta-structure tendencies; Bulkiness; Buriedness; Chromatographic index; Coil tendencies; Composition; Compressibility; Equilibrium constant (ionization of COOH); Helical contact area; Hydropathy; Isoelectric point; Long-range non-bonded energy; Mean r.m.s. fluctuation displacement; Molecular volume; Molecular weight; Normalized consensus hydrophobicity; Partial specific volume; Polar requirement; Polarity; Power to be at the C-terminal; Power to be at the middle of alpha-helix; Power to be at the N-terminal; Refractive index; Short and medium range non-bonded energy; Solvent accessible reduction ratio; Surrounding hydrophobicity; Thermodynamic transfer hydrophobicity; Total non-bonded energy; Turn tendencies
TreeSAAP
Rhinoviruses
Selected Sites
3D Mapping
PHENOTYPE, GENOTYPE, ENVIRONMENT
OPSIN: Model System for Molecular Evolution
[Figure: light spectrum from UV to IR; wavelength axis 400 to 700 nm]
CRLAKIAMTTVALWFIAWTPYLLINWVGMFARSYLSPVYTIWGYVFAKANAVYNPIVYAISHPKYRAAMEKKLPCLSCKTESDDVSESASTTTSS
Is λmax Correlated with Ecological Differences?
microscopic thin beam of spectral light
INPUT OUTPUT
INPUT – OUTPUT = pigment absorbance
Detect light not absorbed by the photopigment
Measured from 400 to 700 nm at 1 nm intervals
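The measurement described above can be sketched in a few lines. This is a minimal illustration that takes the slide's definition literally (absorbance as INPUT minus OUTPUT at each wavelength) and reports the wavelength of peak absorbance. The function names and the synthetic spectrum are assumptions made for the example, not data from the talk.

```python
import math


def absorbance_spectrum(input_light, output_light):
    """Per-wavelength difference INPUT - OUTPUT, as defined on the slide."""
    if len(input_light) != len(output_light):
        raise ValueError("spectra must have the same number of samples")
    return [i - o for i, o in zip(input_light, output_light)]


def lambda_max(wavelengths, absorbance):
    """Wavelength (nm) at which the absorbance is highest."""
    peak_index = max(range(len(absorbance)), key=absorbance.__getitem__)
    return wavelengths[peak_index]


# 400 to 700 nm at 1 nm intervals, as on the slide.
wavelengths = list(range(400, 701))

# Hypothetical data: a flat input beam and an output beam that dips near 530 nm.
input_light = [1.0] * len(wavelengths)
output_light = [1.0 - 0.5 * math.exp(-((w - 530) / 40.0) ** 2) for w in wavelengths]

absorbance = absorbance_spectrum(input_light, output_light)
peak = lambda_max(wavelengths, absorbance)
```

With the synthetic dip centred at 530 nm, `peak` is 530. Real recordings are noisy, so in practice a template curve would usually be fitted rather than taking the raw maximum.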
[Figure: Invertebrate Opsin Evolution. PHYML amino acid ML tree; scale bar 0.1. Thick branches indicate bootstrap values > (threshold not preserved); thicker branches indicate bootstrap values > 90%.]
Clade labels: Insect LWS, 508-575 nm; Crustacean LWS, 496-533 nm; Insect UV, 345-375 nm; Cephalopod Rh, 480-499 nm; Crustacean MWS (480); Chelicerate LWS (520); Insect MWS, 420-490 nm; Insect BL, 430-460 nm
Tip labels, in order of appearance:
Heliconius erato, Heliconius sara, Bicyclus anynana, Junonia coenia
Vanessa cardui, Papilio xuthus Rh1, Papilio xuthus Rh3
Pieris rapae, Manduca sexta, Galleria mellonella, Spodoptera exigua, Papilio xuthus Rh2
Osmia rufa, Bombus terrestris, Apis mellifera
Camponotus abdominalis, Cataglyphis bombycinus
Schistocerca gregaria, Sphodromantis sp.
Drosophila melanogaster Rh6, Drosophila melanogaster Rh1, Calliphora erythrocephala Rh1
Drosophila melanogaster Rh2, Neogonodactylus oerstedii Rh3, Neogonodactylus oerstedii Rh1
Neogonodactylus oerstedii Rh2, Homarus gammarus
Neomysis americana, Holmesimysis costata
Procambarus milleri, Orconectes virilis, Procambarus clarkii, Cambarus ludovicianus, Cambarellus schufeldtii, Euphausia superba
Mysis relicta sp. IV, Archaeomysis grebnitzkii
Limulus polyphemus, Limulus polyphemus, Hemigrapsus sanguineus, Hemigrapsus sanguineus
Camponotus abdominalis, Cataglyphis bombycinus, Apis mellifera
Manduca sexta, Papilio xuthus Rh5
Drosophila melanogaster Rh4, Drosophila melanogaster Rh3
Apis mellifera, Schistocerca gregaria
Papilio xuthus Rh4, Manduca sexta
Drosophila melanogaster Rh5, Loligo pealii, Loligo forbesi, Loligo subulata
Sepia officinalis, Todarodes pacificus
Enteroctopus dofleini, Gallus gallus pineal, Anolis carolinensis pineal
Bos taurus rhodopsin, Homo sapiens melatonin 1A
Homo sapiens GPR52
Coil Tendencies, Compressibility, Alpha-Helix
[Figure: four panels plotting Z-score against amino acid alignment number (10 to 260): Coil Tendencies, Compressibility, Power to be at mid alpha, Refractive Index. Transmembrane domains TMI, TMII, TMIII, TMIV, TMV and TMVI are marked along the alignment.]
TreeSAAP
Homology
Homology definitions
Homology is an evolutionary relationship that either exists or does not. It cannot be partial.
An ortholog is a homolog that arose through a speciation event.
A paralog is a homolog that arose through a gene duplication event. Paralogs often have divergent function.
Similarity is a measure of the quality of alignment between two sequences. High similarity is evidence for homology. Similar sequences may be orthologs or paralogs.
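Similarity, unlike homology, can be put on a numeric scale. One simple measure is percent identity over an existing alignment, sketched below. The slides do not say which similarity measure they have in mind, and the choice to ignore gapped columns in the denominator is an assumption of this example; `percent_identity` is a hypothetical helper name.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped alignment columns in which both residues match.

    Both inputs must already be aligned (equal length, gaps as '-').
    Columns containing a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0
```

For example, `percent_identity("ACGT", "ACGA")` gives 75.0. A high value is evidence for homology but does not by itself say whether the two sequences are orthologs or paralogs.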
One More Homology Type
Xenology: homology arising from horizontal gene transfer (HGT)
How do you discover this?
Alignment Problem
(Optimal) pairwise alignment consists of considering all possible alignments of two sequences and choosing the optimal one.
Sub-optimal (heuristic) alignment algorithms are also very important, e.g. BLAST.
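Considering every possible alignment does not require listing them one by one: dynamic programming covers them all implicitly. The sketch below is a minimal Needleman-Wunsch global aligner with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions for illustration, not values from the slides.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill: best of substitution, gap in b, gap in a.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Traceback from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))
```

The table has (len(a)+1) × (len(b)+1) cells, so time and memory grow with the product of the sequence lengths. That cost is one reason heuristic tools such as BLAST matter for database-scale searches.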
Key Issues
Types of alignments (local vs. global)
The scoring system
The alignment algorithm
Measuring alignment significance
Types of Alignment
Global: sequences are aligned from end to end.
Local: alignments may start in the middle of either sequence.
Ungapped: no insertions or deletions are allowed.
Other types: overlap alignments, repeated match alignments
Local vs. Global Pairwise Alignments
A global alignment includes all elements of the sequences and includes gaps.
A global alignment may or may not include "end gap" penalties.
Global alignments are better indicators of homology and take longer to compute.
A local alignment includes only subsequences, and sometimes is computed without gaps.
Local alignments can find shared domains in divergent proteins and are fast to compute.
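To make the local case concrete, here is a minimal Smith-Waterman scoring sketch. It differs from a global aligner in two ways: cell scores are floored at zero, so an alignment can start anywhere, and the result is the best cell anywhere in the table rather than the corner. The scoring values are assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local (Smith-Waterman) alignment score between a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a fresh local alignment here
                prev[j - 1] + sub,   # extend with a match or mismatch
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

With these scores, `local_alignment_score("TTTTACGTAAAA", "GGGACGTCCC")` is 8: the shared `ACGT` block is found even though the flanking regions do not match, which is the shared-domain situation the slide describes.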
How do you compare alignments?
Scoring scheme
What events do we score? Matches, mismatches, gaps
What scores will you give these events?
What assumptions are you making?
Score your alignment
Scoring Matrices
How do you determine scores?
What is out there already for your use?
DNA versus amino acids?
TTACGGAGCTTC
CTGAGATCC
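Scoring a given alignment is a column-by-column sum over the three event types. The sketch below applies a simple scheme to the two sequences from the slide. The gap placement in `bottom` is one hypothetical alignment chosen for illustration (not claimed to be optimal), and the scores of +1, -1 and -2 are assumptions.

```python
def score_alignment(aligned_a, aligned_b, match=1, mismatch=-1, gap=-2):
    """Sum per-column scores for matches, mismatches and gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


# The slide's sequences, with gaps inserted by hand (hypothetical placement).
top = "TTACGGAGCTTC"
bottom = "CT---GAGATCC"
total = score_alignment(top, bottom)
```

This alignment has 6 matches, 3 mismatches and 3 gap columns, so `total` is 6 - 3 - 6 = -3. Changing the gap penalty changes which alignment of the same two sequences scores best, which is why the assumptions behind a scoring scheme matter.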
Multiple Sequence Alignment
Global versus Local Alignments
Progressive alignment
Estimate guide tree
Do pairwise alignment on subtrees
ClustalX
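The guide-tree step can be sketched as distance-based clustering. The example below is not Clustal's actual procedure: it swaps in a k-mer (Jaccard) distance and average-linkage clustering as simple stand-ins, and all function names are hypothetical. The result is a nested tuple whose innermost pairs would be aligned first.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """All overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=3):
    """1 - Jaccard similarity of the two k-mer sets (0 = same k-mer content)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def guide_tree(seqs, k=3):
    """Average-linkage clustering of a {name: sequence} dict into nested tuples."""
    clusters = {name: [name] for name in seqs}  # tree node -> member names
    dist = {
        frozenset((a, b)): kmer_distance(seqs[a], seqs[b], k)
        for a, b in combinations(seqs, 2)
    }

    def average_distance(c1, c2):
        pairs = [dist[frozenset((x, y))] for x in clusters[c1] for y in clusters[c2]]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two closest clusters; the join order is the alignment order.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average_distance(*p))
        clusters[(c1, c2)] = clusters.pop(c1) + clusters.pop(c2)
    return next(iter(clusters))


# Toy sequences, made up for the example.
example = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "TTTTGGGGCC"}
tree = guide_tree(example)
```

Here `a` and `b` share most of their 3-mers and are joined first, and `c` is attached last. A progressive aligner would then align `a` with `b` and align the resulting profile with `c`.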
Improvements
Consistency-based Algorithms
T-Coffee: consistency-based objective function to minimize potential errors
Generates pairwise global (Clustal) and local (Lalign) alignments
Then combine, reweight, progressive alignment
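The reweighting step can be illustrated with a small sketch of triplet-based library extension. A residue pair keeps its own weight and gains support from every third residue aligned to both, with each contribution taken as the smaller of the two linking weights. The data layout here is an assumption: residues are `(sequence_name, position)` tuples and weights are arbitrary numbers. This is a simplified illustration, not T-Coffee's code.

```python
from collections import defaultdict


def extend_library(primary):
    """Reweight residue pairs by their consistency with third sequences.

    primary maps (residue_x, residue_y) -> weight, where each residue is a
    (sequence_name, position) tuple and the two residues of a pair come from
    different sequences.
    """
    # Symmetric lookup: residue -> {aligned residue: weight}.
    neighbors = defaultdict(dict)
    for (x, y), w in primary.items():
        neighbors[x][y] = w
        neighbors[y][x] = w

    extended = defaultdict(float)
    for (x, y), w in primary.items():
        extended[frozenset((x, y))] += w  # direct evidence

    # Indirect evidence: x and y are both aligned to the same residue z.
    for z, linked in neighbors.items():
        partners = list(linked.items())
        for i in range(len(partners)):
            for j in range(i + 1, len(partners)):
                (x, wx), (y, wy) = partners[i], partners[j]
                if x[0] == y[0]:
                    continue  # same sequence: not an alignable pair
                extended[frozenset((x, y))] += min(wx, wy)
    return dict(extended)


# Toy library: residue 0 of sequences A, B and C, with made-up weights.
a0, b0, c0 = ("A", 0), ("B", 0), ("C", 0)
library = extend_library({(a0, b0): 80, (a0, c0): 60, (c0, b0): 90})
```

In this toy library the A/B pair rises from 80 to 140, because C supports it with min(60, 90) = 60. Pairs that agree with the other pairwise alignments gain weight, so they are favoured during the progressive stage.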
Iterative Algorithms
Estimate draft progressive alignment (uncorrected distances)
Improved progressive (re-estimate guide tree using Kimura 2-parameter)
Refinement: divide into 2 subtrees, estimate two profiles, then re-align the 2 profiles
Continue refinement until convergence
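The distance correction named above can be written out directly. For two aligned nucleotide sequences with transition proportion P and transversion proportion Q, the Kimura 2-parameter distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The sketch below follows the slide's wording; which correction a particular program applies is not stated in the slides, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(aligned_a, aligned_b):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    sites = transitions = transversions = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    term_a = 1.0 - 2.0 * p - q
    term_b = 1.0 - 2.0 * q
    if term_a <= 0 or term_b <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term_a) - 0.25 * math.log(term_b)
```

The corrected distance is always at least as large as the raw proportion of differing sites, since it accounts for multiple substitutions at the same site. A guide tree re-estimated from these distances can therefore differ from the draft tree built on uncorrected distances.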
Software
Clustal
T-Coffee
MUSCLE (limited models)
MAFFT (wide variety of models)
Comparisons
Speed: MUSCLE > MAFFT > CLUSTALW > T-COFFEE
Accuracy: MAFFT > MUSCLE > T-COFFEE > CLUSTALW
Lots more work to do here!