1 hw clarifications homology implies shared ancestry partial sequence identity does not necessarily...
Post on 22-Dec-2015
HW Clarifications
• Homology implies shared ancestry
• Partial sequence identity does not necessarily imply homology
• A high coverage of sequence identity can imply homology
Identity and Homology
HW Clarifications
Insertions and Deletions
Prediction of functional/structural sites in a protein using conservation and hyper-variation (ConSeq, ConSurf, Selecton)
Empirical findings of conservation variation among sites:
Functional/structural sites evolve more slowly than nonfunctional/nonstructural sites
Conservation = functional/structural importance
Histone 3 protein
Xenopus  MALWMQCLP-LVLVLLFSTPNTEALANQHL
Bos      MALWTRLRPLLALLALWPPPPARAFVNQHL
         **** :  * *.*: *:..* :. : ****

Xenopus  CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
Bos      CGSHLVEALYLVCGERGFFYTPKARREVEG
         **************:***** ** :*::*

Xenopus  AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
Bos      PQVG---ALELAGGPGAGGLEGPPQKRGIV
         .**.     ** *       *    *****

Xenopus  EQCCHSTCSLFQLENYCN
Bos      EQCCASVCSLYQLENYCN
         **** *.***:*******
Alignment pre-pro-insulin
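As a rough illustration of what the `*` marks summarize, the sketch below counts identical columns in the last block of the pre-pro-insulin alignment. The function name `percent_identity` and the choice of denominator (all alignment columns) are assumptions made for this example; other definitions of percent identity exist.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same (non-gap) residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    # A column counts as a match only if both residues agree and are not gaps.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Last alignment block from the slide (Xenopus vs. Bos).
xenopus = "EQCCHSTCSLFQLENYCN"
bos = "EQCCASVCSLYQLENYCN"
identity = percent_identity(xenopus, bos)
print(f"{identity:.1f}% identical columns")
```

High identity over a short block like this one is not, on its own, evidence of homology; identity has to hold over a large part of the sequence.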
Conservation based inference
Conserved sites:
• Important for the function or structure
• Not allowed to mutate
• “Slow evolving” sites
• Low rate of evolution
Variable sites:
• Less important (usually)
• Change more easily
• “Fast evolving” sites
• High rate of evolution
Detecting conservation: Evolutionary rates
$$r = \frac{d}{2T}$$
• Rate = distance/time
• Distance = number of substitutions per site
• Time = 2*#years (doubled because the sequences evolved independently)
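The rate definition can be sketched in a few lines. The function name and the input numbers below are hypothetical and chosen only for illustration; the distance is assumed to be in substitutions per site and the time in years since the two sequences diverged.

```python
def evolutionary_rate(distance: float, years_since_divergence: float) -> float:
    """Rate = distance / (2 * years): both lineages evolved independently."""
    if years_since_divergence <= 0:
        raise ValueError("years_since_divergence must be positive")
    return distance / (2.0 * years_since_divergence)


# Hypothetical example: 0.2 substitutions per site, divergence 100 million years ago.
rate = evolutionary_rate(0.2, 100e6)
print(f"{rate:.2e} substitutions per site per year")
```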
Rate computation
Site           1 2 3 4 5 6 7
Human          D M A A H A M
Chimp          D E A A G G C
Cow            D Q A A W A P
Fish           D L A A C A L
S. cerevisiae  D D G A F A A
S. pombe       D D G A L G E
MSA
Phylogeny
Evolutionary Model
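To get a feel for which columns of the toy alignment are conserved, the sketch below computes the Shannon entropy of each column. This is a simple stand-in and not the rate computation described on the slide, which also uses the phylogeny and an evolutionary model; the sequences are typed in as read from the slide, and the names `msa` and `column_entropy` are made up for this example.

```python
import math
from collections import Counter

# Toy alignment from the slide: species -> residues at sites 1..7.
msa = {
    "Human": "DMAAHAM",
    "Chimp": "DEAAGGC",
    "Cow": "DQAAWAP",
    "Fish": "DLAACAL",
    "S. cerevisiae": "DDGAFAA",
    "S. pombe": "DDGALGE",
}


def column_entropy(column) -> float:
    """Shannon entropy (bits) of one column; 0 means fully conserved."""
    counts = Counter(column)
    total = len(column)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


columns = list(zip(*msa.values()))  # transpose rows into columns
entropies = [column_entropy(col) for col in columns]
for site, h in enumerate(entropies, start=1):
    print(f"site {site}: entropy = {h:.2f} bits")
```

Sites 1 and 4 come out fully conserved, while sites 5 and 7 show a different residue in every species.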
http://conseq.tau.ac.il
Site-specific rate computation tool
Locating the active site of Pyruvate kinase
Glycolysis pathway
Conservation scores:
• The scores are standardized: the average score of all residues is 0, and the standard deviation is 1
• Negative values: slowly evolving (= low evolutionary rate), conserved sites. The most conserved site in the protein has the lowest score
• Positive values: rapidly evolving (= fast evolutionary rate), variable sites. The most variable site in the protein has the highest score
Scores are relative to the protein and cannot be compared between different proteins!
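The standardization described here is an ordinary z-score. The sketch below shows it on made-up per-site rates; the values, the function name `standardize`, and the use of the population standard deviation are assumptions for illustration and are not taken from the ConSeq implementation.

```python
import statistics


def standardize(rates):
    """Rescale rates to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(rates)
    sd = statistics.pstdev(rates)  # population SD; an assumption of this sketch
    if sd == 0:
        raise ValueError("all rates are identical; scores are undefined")
    return [(r - mean) / sd for r in rates]


# Hypothetical raw rates for five sites of one protein.
raw_rates = [0.1, 0.5, 1.0, 2.4, 1.0]
scores = standardize(raw_rates)
most_conserved = scores.index(min(scores))  # lowest score
most_variable = scores.index(max(scores))   # highest score
print(scores, most_conserved, most_variable)
```

Because the mean and standard deviation come from a single protein, the same score in two proteins can correspond to very different absolute rates.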
SWISS-PROT
Combining protein structure
• Each protein has a particular 3D structure that determines its function
• Protein structure is better conserved than protein sequence and more closely related to function
• Analyzing a protein structure is more informative than analyzing its sequence for function inference
Protein core: structurally constrained - usually conserved
Active site: functionally constrained - usually conserved
Surface: tolerant to mutations - usually variable
Conservation in the structure
[Figure: protein structure with labels Core, Surface, Active site]
http://consurf.tau.ac.il
Same algorithm as ConSeq, but here the results are projected onto the 3D structure of the protein
The structure-function of the potassium channel transmembrane region
cytoplasm
ConSeq/ConSurf user intervention (advanced options)
1. Choosing the method for calculating the amino-acid conservation scores: (Bayesian/Maximum Likelihood)
2. Entering your own MSA file
3. Performing the MSA using: (MUSCLE/CLUSTALW)
4. Collecting the homologs from: (SWISS-PROT/UniProt)
5. Max. number of homologs: (50)
6. No. of PSI-BLAST iterations: (1)
7. PSI-BLAST E-value cutoff: (0.001)
8. Model of substitution for proteins: (JTT/Dayhoff/mtREV/cpREV/WAG)
9. Entering your own PDB file
10. Entering your own TREE file
Codon-level selection
ConSeq/ConSurf:
• Compute the evolutionary rate of amino-acid sites → the data are amino acids
• Compute only the rate of non-synonymous substitutions
UUU → UUC (Phe → Phe): synonymous
UUU → CUU (Phe → Leu): non-synonymous
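The two codon examples can be checked mechanically by translating both codons and comparing the amino acids. The sketch below assumes the standard genetic code and uses one-letter amino-acid codes (F = Phe, L = Leu); the names `CODON_TABLE` and `classify_substitution` are made up for this example.

```python
# Standard genetic code, codons ordered by base U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(codon_before: str, codon_after: str) -> str:
    """Return 'synonymous' if both codons encode the same amino acid."""
    # Accept DNA or RNA spelling by mapping T to U.
    before = codon_before.upper().replace("T", "U")
    after = codon_after.upper().replace("T", "U")
    same = CODON_TABLE[before] == CODON_TABLE[after]
    return "synonymous" if same else "non-synonymous"


print(classify_substitution("UUU", "UUC"))  # Phe -> Phe
print(classify_substitution("UUU", "CUU"))  # Phe -> Leu
```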
Synonymous vs. non-synonymous substitutions
For most proteins, the rate of synonymous substitutions is much higher than the non-synonymous rate
This is called purifying selection (= conservation in ConSeq/ConSurf)
There are rare cases where the non-synonymous rate is much higher than the synonymous rate
This is called positive (Darwinian) selection
Synonymous vs. nonsynonymous substitutions
Positive Selection
Examples:
• Pathogen proteins evading the host immune system
• Proteins of the immune system detecting pathogen proteins
• Pathogen proteins that are drug targets
• Proteins that are products of gene duplication
• Proteins involved in the reproductive system
The hypothesis: positive selection promotes the fitness of the organism
Computing synonymous and non-synonymous rates
Evolutionary Model
Codon MSA
Phylogeny
Inferring positive selection
Look at the ratio between the non-synonymous rate (Ka) and the synonymous rate (Ks):
$$\frac{K_a}{K_s}$$
Inferring positive selection
Ka/Ks < 1: purifying selection
Ka/Ks > 1: positive selection
Ka/Ks = 1: no selection (neutral)
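The three cases translate directly into a small decision function. This is only a sketch of the rule of thumb: the function name and the input values are hypothetical, and a ratio above 1 still needs a statistical test before positive selection is claimed.

```python
def selection_regime(ka: float, ks: float) -> str:
    """Classify a site or gene by its Ka/Ks ratio."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if ratio < 1:
        return "purifying selection"
    if ratio > 1:
        return "positive selection"
    return "no selection (neutral)"


# Hypothetical (Ka, Ks) pairs.
for ka, ks in [(0.02, 0.2), (0.3, 0.1), (0.1, 0.1)]:
    print(f"Ka/Ks = {ka / ks:.2f}: {selection_regime(ka, ks)}")
```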
Our evolutionary model assumes there is positive selection in the data
By chance alone we expect our model to find a few sites with Ka/Ks > 1
Is this really indicative of positive selection or plain randomness?
Maybe there’s no positive selection after all?
[Figure: codon MSA, phylogeny, and evolutionary model; Ka/Ks distribution with the axis starting at 0]
Solution: statistically compare between hypotheses
H0: There’s no positive selection
H1: There is positive selection
H0: compute the probability (likelihood) of the data using a model that does not account for positive selection
[Figure: Ka/Ks distribution, axis from 0 to 1]
H1: compute the probability (likelihood) of the data using a model that does account for positive selection
[Figure: Ka/Ks distribution, axis starting at 0]
Perform a statistical test to accept or reject H0 (likelihood ratio test):
$$2\ln\left(\frac{L(\mathrm{Data}\mid M(H_1))}{L(\mathrm{Data}\mid M(H_0))}\right) \sim \chi^2$$
P-value < 0.05: reject H0
P-value > 0.05: do not reject H0
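The likelihood ratio test can be sketched with the standard library alone. The log-likelihood values below are invented, and the sketch assumes one degree of freedom for the chi-square distribution; the correct number depends on how many extra parameters the H1 model has, which the slide does not state.

```python
import math


def likelihood_ratio_test(log_l_h0: float, log_l_h1: float):
    """Return (statistic, p-value) for nested models, assuming 1 degree of freedom."""
    # 2 * ln(L1 / L0) equals 2 * (lnL1 - lnL0); clamp tiny negative values to 0.
    statistic = max(0.0, 2.0 * (log_l_h1 - log_l_h0))
    # Survival function of chi-square with 1 d.f.: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods of the data under each model.
statistic, p_value = likelihood_ratio_test(log_l_h0=-1000.0, log_l_h1=-995.0)
reject_h0 = p_value < 0.05
print(f"statistic = {statistic:.1f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

A small p-value means the model that allows positive selection fits the data significantly better, so H0 (no positive selection) is rejected.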
Note: saturation of synonymous substitutions
• Human and wheat are too evolutionarily remote → saturation of synonymous substitutions
• Pick closer sequences for positive selection analysis
[Figure labels: Syn., Nonsyn.]
http://selecton.tau.ac.il
Selecton input
Codon-level sequences!
• Coding sequences - only ORFs
• No stop codons
• If an MSA is provided it must be codon aligned (RevTrans)
• The user must provide the sequences - no PSI-BLAST option
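Before submitting sequences, the input requirements can be checked with a small script. The sketch below is a hypothetical pre-check, not Selecton's own validation: it assumes ungapped DNA coding sequences and the standard stop codons, and the name `check_coding_sequence` is made up for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def check_coding_sequence(sequence: str) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    seq = sequence.upper().replace("U", "T")
    problems = []
    if set(seq) - set("ACGT"):
        problems.append("non-ACGT characters")
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if any(codon in STOP_CODONS for codon in codons):
        problems.append("contains a stop codon")
    return problems


print(check_coding_sequence("ATGGCTTTC"))  # passes all checks
print(check_coding_sequence("ATGTAAGCT"))  # in-frame stop codon
```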
Positive selection in the primate TRIM5α
Primate TRIM5α
TRIM5α from humans, rhesus monkeys, and African green monkeys are all unable to restrict retroviruses isolated from their own species, yet are able to restrict retroviruses from the other species
TRIM5α is an important natural barrier to cross-species retrovirus transmission
TRIM5α is in an antagonistic conflict with the retroviral capsid proteins
TRIM5α is under positive selection
Positive selection analysis
Positive selection analysis in Selecton
H0
H1
Comparing H0 and H1 in Selecton
Comparing H0 and H1 in Selecton
Selecton results:
Results
Human rhesus swaps at sites 332, 335-340 (SPRY) significantly elevate human resistance to HIV and rhesus resistance to SIV