Multiple Mapping Method with Multiple Templates (M4T): optimizing sequence-to-structure alignments and combining unique information from multiple templates
András Fiser
Department of Biochemistry and Seaver Center for Bioinformatics
Albert Einstein College of Medicine
Bronx, New York, USA
Comparative protein structure modeling
START → Template Search → Target-Template Alignment → Model Building → Model Evaluation → END
Diagram annotations: Multiple Templates; Multiple Mapping Method; Loop, side chain modeling; Statistical potential
Why do we need sequence alignments?
• Sequence vs. structure:
– To generate the input alignment for comparative modeling / threading
• Sequence vs. databases:
– Querying sequence databases
• Sequence vs. sequence:
– Establishing residue equivalencies between two proteins to locate conserved/variable regions
Ranking of models built on alternative alignments
Problem: None of the currently available methods produces consistently superior results in all cases
Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKK
Target CLW  DWTDAERAAIKALWGKIDVGEIGP--QALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM
Target A2D  DWTDAERAAIKALWGKI--DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM

Template    GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target CLW  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----
Target A2D  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKF-G---PSAFTPEIHEAWQKFLAVVVSALGRQYH

Example: Template: 1a6m; Target: 1spg, chain B
~21% sequence identity
Instead of relying on just one alignment method, one should combine results of several alternative techniques
Alternative solutions vs. sequence similarity
Multiple Mapping Method
• Idea:
– Improve the accuracy of sequence-to-structure alignment by optimally splicing alternative inputs.
• Three components:
– Sampling
– Algorithm
– Scoring function
MMM scoring function: increasing the dimensionality of input information
Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAIL
Target CLW  DWTDAERAAIKALWGKIDVGEIGP--QALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAV

Template    KKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target CLW  QNMDNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----

Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAIL
Target A2D  DWTDAERAAIKALWGKI--DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAV

Template    KKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target A2D  QNMDNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKF-G---PSAFTPEIHEAWQKFLAVVVSALGRQYH
Each alternative mapping places a target residue in a different structural environment of the template
Assess the “fitness” of each mapping
Multiple Mapping Method: Algorithm
Step 1: Identify variable regions from the consensus alignment of the input set
Step 2: Select the best-scoring variable segments and combine them with the core region of the alignment.
Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKK
Target CLW  DWTDAERAAIKALWGKIDVGEIGP--QALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM
Target A2D  DWTDAERAAIKALWGKI--DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM

Template    GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target CLW  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----
Target A2D  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKF-G---PSAFTPEIHEAWQKFLAVVVSALGRQYH

Example: Template 1a6m; Target 1spg, chain B
21% sequence identity
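The two algorithm steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MMM implementation: each input alignment is assumed to be given as two gapped rows of equal length, the segment scoring function is supplied by the caller, and the function names (`mapping_from_alignment`, `variable_regions`, `splice`), the toy sequences, and the toy `fit` table are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# target residue index -> template residue index (None = aligned to a gap)
Mapping = List[Optional[int]]


def mapping_from_alignment(template_row: str, target_row: str) -> Mapping:
    """Convert two gapped rows of equal length into a target->template mapping."""
    if len(template_row) != len(target_row):
        raise ValueError("alignment rows must have equal length")
    mapping: Mapping = []
    t_idx = 0  # index of the current template residue (gaps excluded)
    for t_char, q_char in zip(template_row, target_row):
        if q_char != "-":
            mapping.append(t_idx if t_char != "-" else None)
        if t_char != "-":
            t_idx += 1
    return mapping


def variable_regions(mappings: Sequence[Mapping]) -> List[Tuple[int, int]]:
    """Step 1: half-open [start, end) target ranges where the inputs disagree."""
    n = len(mappings[0])
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i in range(n):
        agree = all(m[i] == mappings[0][i] for m in mappings)
        if not agree and start is None:
            start = i
        elif agree and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, n))
    return regions


def splice(mappings: Sequence[Mapping],
           score: Callable[[Mapping, int, int], float]) -> Mapping:
    """Step 2: keep the consensus core, take the best-scoring variable segments."""
    result = list(mappings[0])  # core positions are identical in every input
    for start, end in variable_regions(mappings):
        best = max(mappings, key=lambda m: score(m, start, end))
        result[start:end] = best[start:end]
    return result


# Toy input: two alignments that differ only in where residue "Y" is placed.
m1 = mapping_from_alignment("ABCDEFGHI", "WXY--ZKLM")
m2 = mapping_from_alignment("ABCDEFGHI", "WX--YZKLM")

# Hypothetical fitness table: (target index, template index) -> score.
fit = {(2, 2): 0.2, (2, 4): 1.0}


def toy_score(mapping: Mapping, start: int, end: int) -> float:
    return sum(fit.get((i, mapping[i]), 0.0) for i in range(start, end))


spliced = splice([m1, m2], toy_score)
```

Because every variable segment is bounded by positions on which all inputs agree, segments taken from different input alignments can be combined without breaking the residue order of the final mapping.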
Experimental
ClustalW, RMSD 2.0 Å
Align2D, RMSD 2.7 Å
Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKK
Target MMM  DWTDAERAAIKALWGKI--DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM

Template    GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target MMM  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----

Experimental
MMM, RMSD 1.8 Å
MMM example using ideal scoring function
CLUSTALW 2.6 Å; ALIGN2D 6.1 Å
CLUSTALW 4.6 Å; ALIGN2D 1.1 Å
Multiple Mapping Method: scoring function (1)
A composite scoring function to assess the compatibility/fit of alternative variable segments in the template structural environment.
• The composite scoring function consists of three mostly non-overlapping components:
1. Environment-specific substitution matrices (FUGUE [1]).
2. A scoring scheme based on a comparison (PHD vs. DSSP) of the secondary structure types (H3P2 [2]).
3. Statistically derived residue-residue contact energy (Rykunov and Fiser [3]).
[1] Shi et al., J. Mol. Biol. (2001) 310, 243-257
[2] Rice et al., J. Mol. Biol. (1997) 267, 1026-1038
[3] Rykunov & Fiser, Proteins (2007) 67, 559-68
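How three component scores might be combined into one number can be sketched as follows. The slides do not state how the components are weighted or normalized, so the equal weights, the sign conventions (higher substitution and secondary-structure scores are better, lower contact energy is better), and the names `SegmentScores`, `composite_score`, and `best_segment` are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class SegmentScores:
    """Component scores of one candidate variable segment (hypothetical container)."""
    substitution: float    # environment-specific substitution score, higher = better
    sec_struct: float      # predicted vs. observed secondary structure, higher = better
    contact_energy: float  # residue-residue contact energy, lower = better


def composite_score(s: SegmentScores,
                    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three components; the weights are assumed, not published."""
    w_sub, w_ss, w_contact = weights
    # Contact energy enters with a minus sign so that a higher total means a better fit.
    return w_sub * s.substitution + w_ss * s.sec_struct - w_contact * s.contact_energy


def best_segment(candidates: Sequence[SegmentScores]) -> int:
    """Index of the candidate segment with the highest composite score."""
    return max(range(len(candidates)), key=lambda i: composite_score(candidates[i]))


# Toy values for two alternative placements of the same variable segment.
segment_a = SegmentScores(substitution=2.0, sec_struct=1.0, contact_energy=-0.5)
segment_b = SegmentScores(substitution=1.0, sec_struct=0.5, contact_energy=0.5)
winner = best_segment([segment_b, segment_a])
```

In practice the components would need to be put on a comparable scale before summing; that step is left out here because the slides do not describe it.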
MMM performance on 1400 pairs
MMM performance on 87 pairs, meta-servers
ESypred3D Consensus
Sampling vs. Scoring
Summary
• Multiple Mapping Method optimally combines alternative alignments obtained from different methods or scoring functions.
• On a benchmark dataset of 6635 protein pair structural alignments, comparative models built using MMM alignments are approximately 0.3 Å and 0.5 Å more accurate on average across the whole spectrum and in the <30% target-template sequence identity region, respectively, than the average accuracy of models built using the alternative input alignments (~3 and ~4 Å).
Optimally combining multiple templates
Selecting multiple templates
• Templates for the target sequence are searched by PSI-BLAST.
• Hits are selected if their sequence overlap with the target is > 60% of the actual SCOP domain length, or > 75% of the PDB chain length when a SCOP classification is missing.
• An iterative clustering procedure identifies the most suitable templates to combine. Templates are selected or discarded according to a hierarchical selection procedure that accounts for:
– sequence identity between templates and target sequence,
– sequence identity among templates,
– crystal resolution of the templates,
– contribution of templates to the target sequence (i.e. if a region is covered by several templates or by a single template only).
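The coverage criterion for PSI-BLAST hits stated above can be written as a small filter. This is a sketch only: the function name `passes_coverage` and its parameters are hypothetical, and the overlap is assumed to be measured in residues.

```python
from typing import Optional


def passes_coverage(overlap_len: int,
                    pdb_chain_len: int,
                    scop_domain_len: Optional[int] = None) -> bool:
    """Return True if a hit covers enough of the template to be kept.

    Uses the > 60% SCOP domain length rule when a SCOP classification exists,
    otherwise the > 75% PDB chain length rule. Integer arithmetic avoids
    floating-point edge cases at the thresholds.
    """
    if scop_domain_len is not None:
        return overlap_len * 100 > 60 * scop_domain_len
    return overlap_len * 100 > 75 * pdb_chain_len


# A hit overlapping 70 residues of a 100-residue SCOP domain is kept,
# while the same overlap on a 200-residue chain without SCOP data is not.
kept_with_scop = passes_coverage(70, pdb_chain_len=200, scop_domain_len=100)
kept_without_scop = passes_coverage(70, pdb_chain_len=200)
```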
Single versus multiple templates
Using a dataset of 765 proteins with known structure, two sets of models were built: (1) using one template (best E-value hit; light bars), (2) using multiple templates (grey bars)
And… increased coverage
Histogram of the difference in model length. Each query sequence is modeled using single and multiple templates. The histogram shows the frequency of (Lm – Ls), where Lm is the length of the model built using multiple templates and Ls is the length of the model built using a single template.
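The quantity plotted in this histogram can be computed as below. The model lengths and the bin width of 10 residues are made-up illustration values, and `length_difference_histogram` is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable, Tuple


def length_difference_histogram(pairs: Iterable[Tuple[int, int]],
                                bin_width: int = 10) -> Counter:
    """Count (Lm - Ls) differences per bin; each bin is labeled by its lower edge."""
    histogram: Counter = Counter()
    for l_multi, l_single in pairs:
        diff = l_multi - l_single
        # Floor division also places negative differences in the correct bin.
        histogram[(diff // bin_width) * bin_width] += 1
    return histogram


# Toy (Lm, Ls) pairs for four query sequences.
toy_pairs = [(120, 100), (105, 100), (100, 100), (98, 100)]
toy_histogram = length_difference_histogram(toy_pairs)
```

Positive bins correspond to queries for which the multiple-template model is longer than the single-template model.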
The x-ray structure, the model built with multiple templates, and the model built with a single template are shown in grey, red, and blue, respectively.
The model built using multiple templates agrees with the x-ray structure much better in two exposed regions, A and B, than the model built using a single template.
Increased Coverage
The x-ray structure, the model built with multiple templates, and the model built with a single template are shown in grey, red, and blue, respectively.
The addition of extra templates made it possible to build a longer model that includes an extra beta-turn-beta-turn region (20 amino acids), depicted in ribbon.
Acknowledgement
• Lab members:
– Dmitrij Rykunov
– Rotem Rubinstein
– J. Eduardo Fajardo
– Carlos J. Madrid-Aliste
– Veena Venkatagiriyappa
– Joseph Dybas
– Mario Pujato
– Brajesh Rai
– Narcis Fernandez-Fuentes
– Elliot Sternberger
Http://www.fiserlab.org/servers