comparative modelling threading baldomero oliva miguel universitatpompeufabra
TRANSCRIPT
Comparative Comparative ModellingModellingThreadingThreading
Baldomero Oliva MiguelBaldomero Oliva Miguel
UUnniivveerrssiittaatt
PPoommppeeuu
FFaabbrraa
EVOLUTIONEVOLUTIONEVOLUTIONEVOLUTION
DiferencesDiferences(proteins)(proteins)
Genetic informationGenetic information(DNA)(DNA)
DNA secuence 3DDNA secuence 3DDNA secuence 3DDNA secuence 3D
The Sanger Centre
U.S. Genome Project ( 1990 )U.S. Genome Project ( 1990 )Identify human genesIdentify human genes
Store the information in Data BanksStore the information in Data Banks
Development of tools for analisisDevelopment of tools for analisis
Venter, Patrinos & CollinsVenter, Patrinos & Collins
1998199819981998
DOEDOEDOEDOE
NHGRINHGRINHGRINHGRINIHNIHNIHNIH
Increase of sequencesIncrease of sequencesIncrease of sequencesIncrease of sequences
1050000010500000GenBank (DNA)GenBank (DNA)378152378152 TrEMBLTrEMBL9221192211 Swiss ProtSwiss Prot
100
1'000
10'000
100'000
1'000'000
1986 1988 1990 1992 1994 1996 1998 2000 2002 2004
TrEMBL
SwissProt
PDB
Public Database Holdings:
3D known (20%)
Homologs (30%)
Remotes (10%)
Unknown function (20%)
NO homology (20%)
sec. 3D
Knowing the structure Knowing the structure helps to: helps to:
Knowing the structure Knowing the structure helps to: helps to:
• TO KNOWTO KNOW the function: annotation
• IMPROVEIMPROVE the function: genetic engineering
• INHIBITINHIBIT the function: drugs
Structure FactoryStructure FactoryStructure FactoryStructure Factoryexpress & purify
cristallize
X-ray
analises
structure
Objetive
Enoughinformation
Not accurate
NOinformation
3D
3D 3D
3D
How does modelling help?How does modelling help?How does modelling help?How does modelling help?
Structural GenomicsStructural Genomics
X-rayX-ray
Can we predict protein
structures from genome
sequences?
Study of known Study of known structuresstructures
Study of known Study of known structuresstructures
Assigning structure to Assigning structure to sequences with sequences with
unknown structureunknown structure
Assigning structure to Assigning structure to sequences with sequences with
unknown structureunknown structure
secondary
Tertiary
MetMet
ValVal
LeuLeu
TyrTyr
AspAsp
SerSer
ThrThr
•••
sequence quaternary
super-secondary
domain
Caracterize the conformationCaracterize the conformation
Ligand docking
Triose Triose Fosfate Fosfate
IsomeraseIsomerase
CarboxypeptidaseCarboxypeptidase
GlucanaseGlucanaseRibonucleaseRibonuclease
MioglobinMioglobin
The number of different protein folds is limited:
New Folds
Known Folds
The number of different protein folds is limited:
[ last update: Oct 2001 ]
Nº sequences > Nº folds
Structura 3ª
Classification
• UCL, Janet Thornton & Christine Orengo
• Class (C), Architecture(A), Topology(T), Homologous superfamily (H)
Protein Structure Classification
CATH - Protein Structure Classification[ http://www.biochem.ucl.ac.uk/bsm/cath_new/ ]
SCOP - Structural Classification of Proteins
• MRC Cambridge (UK), Alexey Murzin, Brenner S. E., Hubbard T., Chothia C.
• created by manual inspection
• comprehensive description of the structural and evolutionary relationships
[ http://scop.mrc-lmb.cam.ac.uk/scop/ ]
• Class(C)
derived from secondary structure
content is assigned automatically
• Architecture(A)
describes the gross orientation of
secondary structures,
independent of connectivity.
• Topology(T)
clusters structures according to
their topological connections and
numbers of secondary structures
• Homologous superfamily (H)
TSS toxinTSS toxin
8.8% Id8.8% Id
RemoteRemote
Cholera toxinCholera toxin
80% Id80% Id
HomologHomologSCOP SCOP
• Family• Superfamily• Fold• Class
EnterotoxinEnterotoxin
AminoacylAminoacyltRNA synthetasetRNA synthetase
4.4% Id4.4% Id
AnalogAnalog
MNIFEMLRID EGLRLKIYKD TEGYYTIGIG HLLTKSPSLN AAKSELDKAI GRNCNGVITKDEAEKLFNQD VDAAVRGILR NAKLKPVYDS LDAVRRCALI NMVFQMGETG VAGFTNSLRMLQQKRWDEAA VNLAKSRWYN QTPNRAKRVI TTFRTGTWDA YKNL
Can we predict protein
structures ?
Homology modeling
Idea: Extrapolation of the structure for a new (target) sequence from the known 3D-structures of related family members (templates).
Alignments
Nº sequences > Nº folds
Step 1 Selection of templatesStep 1 Selection of templates
Step 2 Alignment with templatesStep 2 Alignment with templates
.
0
20
40
60
80
100
0 50 100 150 200 250
identity
Number of residues aligned
Perc
enta
ge s
equence
identi
ty/s
imila
rity
(B.Rost, Columbia, NewYork)
Sequence identity implies structural similarity
Don’t know region .....
Sequence similarity implies structural similarity?
Hidropaticity Plot
2D Structure
2D prediction
Nº sequences > Nº folds
2D prediction (PHD)2D prediction (PHD)
Alignments
Nº sequences > Nº folds
Structure 3D
Classification- Homologs- Remotes- Analogs
Hidropaticity Plot
2D Structure
2D prediction
Comparative Modelling Comparative Modelling ((homologyhomology))
• Fold is more conserved than sequence.
• Secondary structure is the most conserved structure
• Loops have the higher variability in structure.
Comparative modelling - Composer - ModelerSCR & VR
SCR
VR
- Ring Closure- DB Search- Classification
Alignments
Nº sequences > Nº folds
Structure 3D
Classification- Homologs- Remotes- Analogs
Hidropaticity Plot
2D Structure
2D prediction
MODELMODEL
Comparative modelling - Composer - ModelerSCR & VR
SCR
VR
- Ring Closure- DB Search- Classification
Alignments
Nº sequences > Nº folds
Structure 3D
Classification- Homologs- Remotes- Analogs
Hidropaticity Plot
2D Structure
2D prediction
Step 3 Model BuildingStep 3 Model Building
Fragment ExtractionFragment Extraction
Rigid-body Assembly
Loop ConstructionLoop Construction
Spatial Constrains
Optimisation MethodsOptimisation Methods
Satisfaction of Spatial ConstrainsSatisfaction of Spatial Constrains
• averaging core template backbone atoms
(weighted by local sequence similarity with the target sequence)
• Leave non-conserved regions (loops) for later ….
a) Build conserved core framework
Rigid Body assembly
• use the “spare part” algorithm to find
compatible fragments in a Loop-Database
• “ab-initio” rebuilding of loops (Monte Carlo,
molecular dynamics, genetic algorithms, etc.)
b) Loop modeling
Rigid Body assembly
EF-HandEF-HandCalcium bindingCalcium binding
aa{baalal}bbaa{baalal}bbXh{DXDpDG}XhXh{DXDpDG}Xhaa{baalal}bbaa{baalal}bbXh{Xh{DDXXDDppDDG}XhG}Xh
D 100%
D/N 100% D/N 100%
P-loop GTP bindingP-loop GTP bindingConformation Conformation bb{eppgag}aabb{eppgag}aaMotif sequence Motif sequence hh{GhXXpG}Kphh{GhXXpG}Kp
P-loop GTP bindingP-loop GTP bindingConformation Conformation bb{eppgag}aabb{eppgag}aaMotif sequence Motif sequence hh{GhXXpG}hh{GhXXpG}KpKp
NAD(P)/FADNAD(P)/FADbindingbinding
bb{eab}aabb{eab}aahh{GhG}hXhh{GhG}hXbb{eab}aabb{eab}aahh{hh{GGhhGG}hX}hX
Ring closure searchRing closure search
Distance restraintsDistance restraints
BuildingBuildingCA SuperpositionCA Superposition
BuildingBuildingCA SuperpositionCA Superposition
A. Sali & T. Blundel (1993) JMB 234, 779 - 815
Modeling by Satisfaction of Spatial restraints
M
A
T
EA
F
TS
G
Q
Feature properties can be associated with
• a protein (e.g. X-ray resolution)
• residues (e.g. solvent accessibility)
• pairs of residues (e.g. Ca - Ca distance)
• other features (e.g. main chain classes)
How can we derive modeling restraints from this data?
A restraint is defined as probability density function (pdf) p(x):
1
2
)()21(x
x
dxxpxxxp1)( dxxp
with
0)( xp
Modeling by Satisfaction of Spatial restraints
combine basis pdfs to molecular probability density functions
4.0'2.0 s
4.0''2.0 s
4.0'2.0 s 6.0''4.0 s 4.0''2.0 s6.0'4.0 s
Modeling by Satisfaction of Spatial restraints
Satisfaction of spatial restraints
• Find the protein model with the highest probability
Variable target function:
• Start with e.g. a random conformation model and use only local restraints
• minimize some steps using a conjugate gradient optimization
• repeat with introducing more and more long range restraints until
all restraints are used
Modeling by Satisfaction of Spatial restraints
Threading
Idea: Find the optimal structure for a new (target) sequence in the set of known 3D-structures (templates) by threading the target sequence.
Nº sequences > Nº folds
Classification- Homologs- Remotes- Analogs
Step 1 Step 1 Selection of templatesSelection of templatesFOLD ASSIGNMENTFOLD ASSIGNMENT
Structure 3D
PseudoPotencial
Nº sequences > Nº folds
Structure 3D
Classification- Homologs- Remotes- Analogs
Pseudo-potencialPseudo-potencial
Principles
Z
ezyxP
KTzyxE /),,(
),,(
P(x)
P(x/(y,z))kT)ΔE(x/(y,z)
E(x);E(x/(y,z)))ΔE(x/(y,z)
ln
Potential of Mean Force:
Applied on proteins
Boltzman lawBoltzman law
Use frequencies instead of probabilities.They are taken from non-homologous proteins on the PDB.
Fold recognition / Threading
Principle: Find a compatible fold for a given sequence ....
>Protein XYMSTLYEKLGGTTAVDLAVDKFYERVLQDDRIKHFFADVDMAKQRAHQKAFLTYAFGGTDKYDGRYMREAHKELVENHGLNGEHFDAVAEDLLATLKEMGVPEDLIAEVAAVAGAPAHKRDVLNQ
?
Using ...• 1D – 3D profile matching,• secondary structure predictions, • position specific scoring matrices (PSSM),• mean force potentials, • keyword statistics, • ....
Predicción de“fold”
PseudoPotencial
Nº sequences > Nº folds
Structure 3D
Classification- Homologs- Remotes- Analogs
“fold” prediction
Predicción del “fold”Predicción del “fold”
Combine sequence with all folds
Evaluate energiesEvaluate energies
Data BasePDB
Data BasePDB
Energy < threshold ?Energy < threshold ?
SequenceSequence
Are energy profiles native-like ?Are energy profiles native-like ?YES
NO
NO
Fold recognition from Fold recognition from Secondary structure predictionSecondary structure prediction
TOPITSTOPITS
MODELMODEL
Predicción de“fold”
PseudoPotencial
Nº sequences > Nº folds
Structure 3D
Classification- Homologs- Remotes- Analogs
“fold” prediction
Hidropaticity Plot
2D prediction
MODELMODEL
Predicción de“fold”
PseudoPotencial
Nº sequences > Nº folds
Structure 3D
Classification- Homologs- Remotes- Analogs
“fold” prediction
2D Structure
Comparative modelling - Composer - ModelerSCR & VR
SCR
VR
- Ring Closure- DB Search- Classification
Alignments
Hidropaticity Plot
2D prediction
Evaluate de model
MODELMODEL
Predicción de“fold”
PseudoPotencial
Nº sequences > Nº folds
Structure 3D
Classification- Homologs- Remotes- Analogs
“fold” prediction
2D Structure
Comparative modelling - Composer - ModelerSCR & VR
SCR
VR
- Ring Closure- DB Search- Classification
Alignments
Why Model Evaluation ?
Just because it’s pretty doesn’t make it right !
Topics:
• correct fold
• model coverage (%)
• C - deviation (rmsd)
• alignment accuracy (%)
• side chain placement
1. Errors in side-chain packing .
2. Shifts of correctly aligned residues .
3. Regions without template .
4. Errors due to misalignments .
5. Errors produced by incorrect templates .
Types of ErrorsTypes of Errors
EVA
Evaluation of Automatic protein structure prediction
[ Burkhard Rost, Andrej Sali, http://maple.bioc.columbia.edu/eva/ ]
CASPCommunity Wide Experiment on the Critical Assessment of Techniques for Protein Structure Predictionhttp://PredictionCenter.llnl.gov/casp5/
3D - Crunch
Very Large Scale Protein Modelling Project
http://www.expasy.org/swissmod/SM_LikelyPrecision.html
Model Accuracy Evaluation
Hidropaticity Plot
2D prediction
Evaluate de model
MODELMODEL
Predicción de“fold”
PseudoPotencial
Nº sequences > Nº folds
Structure 3D
Classification- Homologs- Remotes- Analogs
“fold” prediction
2D Structure
Comparative modelling - Composer - ModelerSCR & VR
SCR
VR
- Ring Closure- DB Search- Classification
Alignments
Hidropaticity Plot
2D prediction
Evaluate de model
MODELMODEL
Predicción de“fold”
PseudoPotencial
Nº sequences > Nº folds
Structure 3D
Classification- Homologs- Remotes- Analogs
“fold” prediction
2D Structure
Comparative modelling - Composer - ModelerSCR & VR
SCR
VR
- Ring Closure- DB Search- Classification
Alignments
Side-chains- Multi-Copy
c) Side Chain placement
Find the most probable side chain
conformation, using
• homologues structures • back-bone dependent rotamer libraries
• energetic and packing criteria
Rigid Body assembly
Hidropaticity Plot
2D prediction
Evaluate de model
MODELMODEL
Predicción de“fold”
PseudoPotencial
Nº sequences > Nº folds
Structure 3D
Classification- Homologs- Remotes- Analogs
“fold” prediction
2D Structure
Comparative modelling - Composer - ModelerSCR & VR
SCR
VR
- Ring Closure- DB Search- Classification
Alignments
Side-chains- Multi-Copy
Hidropaticity Plot
2D prediction
Evaluate de model
MODELMODEL
Predicción de“fold”
PseudoPotencial
Nº sequences > Nº folds
Structure 3D
Classification- Homologs- Remotes- Analogs
“fold” prediction
2D Structure
Comparative modelling - Composer - ModelerSCR & VR
SCR
VR
- Ring Closure- DB Search- Classification
Alignments
Side-chains- Multi-Copy
Optimization- Minimization E- Simulated Annealing
• modeling will produce unfavorable contacts and bonds
idealization of local bond and angle geometry
• extensive energy minimization will move coordinates away
keep it to a minimum
• SwissModel is using GROMOS 96 force field for a steepest descent
d) Energy minimization
Rigid Body assembly
OPTIMIZATIONOPTIMIZATION
1- Minimization by Newton-Rapson
Xi+1 = Xi + grad(E)2- Simulated Annealing
Optimization schedule and progress
[ A. Sali & T. Blundel (1993) JMB 234, 779 – 815 ]
Modeling by Satisfaction of Spatial restraints
Hidropaticity Plot
2D prediction
Evaluate de model
MODELMODEL
Predicción de“fold”
PseudoPotencial
Nº sequences > Nº folds
Structure 3D
Classification- Homologs- Remotes- Analogs
“fold” prediction
2D Structure
Comparative modelling - Composer - ModelerSCR & VR
SCR
VR
- Ring Closure- DB Search- Classification
Alignments
Side-chains- Multi-Copy
Optimization- Minimization E- Simulated Annealing
Hidropaticity Plot
2D prediction
Evaluate de model
MODELMODEL
Predicción de“fold”
PseudoPotencial
Nº sequences > Nº folds
Structure 3D
Classification- Homologs- Remotes- Analogs
“fold” prediction
2D Structure
Comparative modelling - Composer - ModelerSCR & VR
SCR
VR
- Ring Closure- DB Search- Classification
Alignments
Side-chains- Multi-Copy
Optimization- Minimization E- Simulated Annealing
Molecular Dynamics
MOLECULAR DYNAMICSMOLECULAR DYNAMICS
Finite differences algorithm
iiiii t
rmR
r
V
m
dt
t
rdtrr
2
2
2/11
Coupling to athermal bath
Statistical Mechanics
d
de
eXdXXX
RTE
RTE
/)(
/)(
Ergodic Hipothesis
XX
EXAMPLEEXAMPLE
clevageclevage
++
Activation segmentActivation segment
activationactivationEnzimeEnzime
PRO-CARBOXIPEPTIDASAPRO-CARBOXIPEPTIDASA
PRO-CARBOXYPEPTIDASESPRO-CARBOXYPEPTIDASES
Porcine Porcine Pro Carboxypeptidase BPro Carboxypeptidase B
PCPBpPCPBp
Porcine Porcine Pro Carboxypeptidase A1Pro Carboxypeptidase A1
PCPA1pPCPA1p
Bovine Bovine Pro Carboxypeptidase A1Pro Carboxypeptidase A1
PCPA1bPCPA1b
10 20 30 40 50 60 70 80-ETFVGDQVLEIVPSNEEQIKNLLQLEAQEHLQLDFWKSPTTPGETAHVRVPFVNVQAVKVFLESQGIAYSIMIEDVQVL PCPA2hPCPA2hKEDFVGHQVLRITAADEAEVQTVKELEDLEHLQLDFWRGPGQPGSPIDVRVPFPSLQAVKVFLEAHGIRYRIMIEDVQSL PCPA1bPCPA1b KEDFVGHQVLRISVDDEAQVQKVKELEDLEHLQLDFWRGPARPGFPIDVRVPFPSIQAVKVFLEAHGIRYTIMIEDVQLL PCPA1pPCPA1p
90 100 110 120 130 140 150 160LDKENEEMLFNRRRERSGN-FNFGAYHTLEEISQEMDNLVAEHPGLVSKVNIGSSFENRPMNVLKFSTGG-DKPAIWLDA PCPA2hPCPA2hLDEEQEQMFASQSRARSTNTFNYATYHTLDEIYDFMDLLVAEHPQLVSKLQIGRSYEGRPIYVLKFSTGGSNRPAIWIDL PCPA1bPCPA1bLDEEQEQMFASQGRARTTSTFNYATYHTLEEIYDFMDILVAEHPALVSKLQIGRSYEGRPIYVLKFSTGGSNRPAIWIDS PCPA1pPCPA1p
170 180 190 200 210 220 230 240GIHAREWVTQATALWTANKIVSDYGKDPSITSILDALDIFLLPVTNPDGYVFSQTKNRMWRKTRSKVSGSLCVGVDPNRN PCPA2hPCPA2hGIHSREWITQATGVWFAKKFTEDYGQDPSFTAILDSMDIFLEIVTNPDGFAFTHSQNRLWRKTRSVTSSSLCVGVDANRN PCPA1bPCPA1bGIXSRXWITQASGVWFAKKITENYGQNSSFTAILDSMDIFLEIVTNPNGFAFTHSDNRLWRKTRSKASGSLCVGSDSNRN PCPA1pPCPA1p
250 260 270 280 290 300 310 320WDAGFGGPGASSNPCSDSYHGPSANSEVEVKSIVDFIKSHGKVKAFIILHSYSQLLMFPYGYKCTKLDDFDELSEVAQKA PCPA2hPCPA2hWDAGFGKAGASSSPCSETYHGKYANSEVEVKSIVDFVKDHGNFKAFLSIHSYSQLLLYPYGYTTQSIPDKTELNQVAKSA PCPA1bPCPA1bWDAGFGGAGASSSPCAETYHGKYPNSEVEVKSITDFVKNNGNIKAFISIXSYSQLLLYPYGYKTQSPADKSELNQIAKSA PCPA1pPCPA1p
330 340 350 360 370 380 390 400AQSLRSLHGTKYKVGPICSVIYQASGGSIDWSYDYGIKYSFAFELRDTGRYGFLLPARQILPTAEETWLGLKAIMEHVRD PCPA2hPCPA2hVEALKSLYGTSYKYGSIITTIYQASGGSIDWSYNQGIKYSFTFELRDTGRYGFLLPASQIIPTAQETWLGVLTIMEHTLN PCPA1bPCPA1bVAALKSLYGTSYKYGSIITVIYQASGGVIDWTYNQGIKYSFSFELRDTGRRGFLLPASQIIPTAQETWLALLTIMEHTLN PCPA1pPCPA1p
SEQUENCE ALIGNMENT OF PRO-CARBOXYPEPTIDASESSEQUENCE ALIGNMENT OF PRO-CARBOXYPEPTIDASES
Comparative Comparative ModellingModelling
Model building Model building of conserved of conserved regionsregions
Baldomero Oliva MiguelBaldomero Oliva Miguel
UUnniivveerrssiittaatt
PPoommppeeuu
FFaabbrraa
Baldomero Oliva MiguelBaldomero Oliva Miguel
UUnniivveerrssiittaatt
PPoommppeeuu
FFaabbrraa
Comparative Comparative ModellingModelling
Model building Model building of conserved of conserved regionsregions
Add and model Add and model build loop build loop regionsregions
Baldomero Oliva MiguelBaldomero Oliva Miguel
UUnniivveerrssiittaatt
PPoommppeeuu
FFaabbrraa
Comparative Comparative ModellingModelling
Model building Model building of conserved of conserved regionsregions
Baldomero Oliva MiguelBaldomero Oliva Miguel
UUnniivveerrssiittaatt
PPoommppeeuu
FFaabbrraa
Add and model Add and model build loop build loop regionsregions
Comparative Comparative ModellingModelling
Model building Model building of conserved of conserved regionsregions
Secondary Structure Prediction and Multiple Secondary Structure Prediction and Multiple Alignment of Pro-CarboxypeptidasesAlignment of Pro-Carboxypeptidases
10 20 30 40 50 60 70 80 EEEEE HHHHHHHHHHHHHHH EEEE EEEEE HHHHHHHHHH EEEEHHHHHHHHEEEEE HHHHHHHHHHHHHHH EEEE EEEEE HHHHHHHHHH EEEEHHHHHHHH PCPA2h PHDPCPA2h PHD-ETFVGDQVLEIVPSNEEQIKNLLQLEAQEHLQLDFWKSPTTPGETAHVRVPFVNVQAVKVFLESQGIAYSIMIEDVQVL PCPA2hPCPA2hKEDFVGHQVLRITAADEAEVQTVKELEDLEHLQLDFWRGPGQPGSPIDVRVPFPSLQAVKVFLEAHGIRYRIMIEDVQSL PCPA1bPCPA1b KEDFVGHQVLRISVDDEAQVQKVKELEDLEHLQLDFWRGPARPGFPIDVRVPFPSIQAVKVFLEAHGIRYTIMIEDVQLL PCPA1pPCPA1p
90 100 110 120 130 140 150 160HHHHHHHHHHHHHHHHHHHHHHHHHHHH PCPA2h PHDPCPA2h PHDLDKENEEMLFNRRRERSGN-FNFGAYHTLEEISQEMDNLVAEHPGLVSKVNIGSSFENRPMNVLKFSTGG-DKPAIWLDA PCPA2hPCPA2hLDEEQEQMFASQSRARSTNTFNYATYHTLDEIYDFMDLLVAEHPQLVSKLQIGRSYEGRPIYVLKFSTGGSNRPAIWIDL PCPA1bPCPA1bLDEEQEQMFASQGRARTTSTFNYATYHTLEEIYDFMDILVAEHPALVSKLQIGRSYEGRPIYVLKFSTGGSNRPAIWIDS PCPA1pPCPA1p
170 180 190 200 210 220 230 240GIHAREWVTQATALWTANKIVSDYGKDPSITSILDALDIFLLPVTNPDGYVFSQTKNRMWRKTRSKVSGSLCVGVDPNRN PCPA2hPCPA2hGIHSREWITQATGVWFAKKFTEDYGQDPSFTAILDSMDIFLEIVTNPDGFAFTHSQNRLWRKTRSVTSSSLCVGVDANRN PCPA1bPCPA1bGIXSRXWITQASGVWFAKKITENYGQNSSFTAILDSMDIFLEIVTNPNGFAFTHSDNRLWRKTRSKASGSLCVGSDSNRN PCPA1pPCPA1p
250 260 270 280 290 300 310 320WDAGFGGPGASSNPCSDSYHGPSANSEVEVKSIVDFIKSHGKVKAFIILHSYSQLLMFPYGYKCTKLDDFDELSEVAQKA PCPA2hPCPA2hWDAGFGKAGASSSPCSETYHGKYANSEVEVKSIVDFVKDHGNFKAFLSIHSYSQLLLYPYGYTTQSIPDKTELNQVAKSA PCPA1bPCPA1bWDAGFGGAGASSSPCAETYHGKYPNSEVEVKSITDFVKNNGNIKAFISIXSYSQLLLYPYGYKTQSPADKSELNQIAKSA PCPA1pPCPA1p
330 340 350 360 370 380 390 400AQSLRSLHGTKYKVGPICSVIYQASGGSIDWSYDYGIKYSFAFELRDTGRYGFLLPARQILPTAEETWLGLKAIMEHVRD PCPA2hPCPA2hVEALKSLYGTSYKYGSIITTIYQASGGSIDWSYNQGIKYSFTFELRDTGRYGFLLPASQIIPTAQETWLGVLTIMEHTLN PCPA1bPCPA1bVAALKSLYGTSYKYGSIITVIYQASGGVIDWTYNQGIKYSFSFELRDTGRRGFLLPASQIIPTAQETWLALLTIMEHTLN PCPA1pPCPA1p
Pseudo-energyPseudo-energy(threading)(threading)
Wrong model
EvaluationEvaluation
G (C’)G (C’)N (C’’)N (C’’)
S (Ccap )S (Ccap )
R (C1)R (C1) E (C2)E (C2)
R (C3)R (C3)
R (C5)R (C5)
R (C4)R (C4)
Schellman Motif ProfileSchellman Motif Profile
C´´ » C3 / C´ GlyC´´ » C3 / C´ Gly
C3 C2 C1 Ccap C’ C’’C3 C2 C1 Ccap C’ C’’
L Q E H G IL Q E H G I A R A K G VA R A K G V M K K L G KM K K L G K
pont d’ hidrògenpont d’ hidrògen
Interacció hidrofòbicaInteracció hidrofòbica
N R R R E R S G N F NN R R R E R S G N F N
ponts d’hidrògenponts d’hidrògen
C3 Ccap C’’C3 Ccap C’’^ ^ ^
-Helix C-cap-Helix C-cap Schellman MotifSchellman Motif
Refinement of the ModelRefinement of the Model
ps
eud
o-e
ne
rgy
ps
eud
o-e
ne
rgy
residueresidue
IMPROVEDIMPROVED
Protein Structure Resources
PDB http://www.pdb.orgPDB – Protein Data Bank of experimentally solved structures (RCSB)
CATH http://www.biochem.ucl.ac.uk/bsm/cath/Hierarchical classification of protein domain structures
SCOP http://scop.mrc-lmb.cam.ac.uk/scop/Alexey Murzin’s Structural Classification of proteins
DALI http://www2.ebi.ac.uk/dali/Lisa Holm and Chris Sander’s protein structure comparison server
SS-Prediction and Fold Recognition
PHD http://cubic.bioc.columbia.edu/predictprotein/Burkhard Rost’s Secondary Structure and Solvent Accessibility Prediction Server
3DPSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/ Fold Recognition Server using 1D and 3D Sequence Profiles coupled with Secondary Structure and Solvation Potential Information.
Protein Homology Modeling Resources
SWISS MODEL: http://www.expasy.ch/swissmod/
Deep View - SPDBV:homepage: http://www.expasy.ch/spdbv/Tutorials http://www.usm.maine.edu/~rhodes/SPVTut/
http://www.bbsrc.ac.uk/molbiol/
WhatIf http://www.cmbi.kun.nl/whatif/Gert Vriend’s protein structure modeling analysis program WhatIf
Modeller: http://guitar.rockefeller.edu/modeller/Andrej Sali's homology protein structure modelling by satisfaction of spatial restraints
FAMS: http://physchem.pharm.kitasato-u.ac.jp/FAMS/fams.htmlFull Automatic Modelling System (FAMS); Kitasato University; Tokyo, Japan
3D-JIGSAW: http://www.bmm.icnet.uk/people/paulb/3dj/form.htmlComparative Modelling Server; Imperial Cancer Research Fund; London, UK
CPHmodels: http://www.cbs.dtu.dk/services/CPHmodels/Centre for Biological Sequence Analysis; The Technical University of Denmark; Denmark
SDSC1: http://cl.sdsc.edu/hm.htmlSDSC Structure Homology Modelling Server; San Diego Supercomputing Centre
Protein Homology Modeling Resources
SWISS MODEL: http://www.expasy.ch/swissmod/
Example
Dear baldo,
Title of your Request: subtilisin
Swiss-Model '(SWISS-MODEL Version 36.0003)' started on 27-Oct-2003
Process identification is AAAa08uiF
The modelling procedure is now in progress, and its results should besent to you shortly.
In case of difficulties, please contact:
Torsten Schwede <[email protected]> Nicolas Guex
Protein Homology Modeling Resources
SWISS MODEL: http://www.expasy.ch/swissmod/
Example
Length of target sequence: 379 residues
Searching sequences of known 3D structuresFound 1c3lA.pdb with P(N)=0Found 1be8_.pdb with P(N)=0Found 1cseE.pdb with P(N)=0Found 1scd_.pdb with P(N)=0Found 1sbc_.pdb with P(N)=0Found 1be6_.pdb with P(N)=0…….
Protein Homology Modeling Resources
3D JigSaw
Example
This is an automatic split of your sequence in protein domains
Follow the link below and select the modelling templates for each modelable domain
2 domain/s found in your query sequence (2 have homologous templates). See them at:http://www.bmm.icnet.uk/~3djigsaw/dom_fish/display2.cgi?output/590e3988 (this file will be erased in a week time)
Your submission:>subtilisinMMRKKSFWFGMLTAFMLVFTMEFSDSASAAQPGKNVEKDYFVGFKSGVKTASVKKDIIKESGGKVDKQFRIINAAKATLDKEALKEVKNDPDVAYVEEDHVAHALGQTVPYGIPLIKADKVQAQGFKGANVKVAVLDTGIQASHPDLNVVGGASFVAGEAYNTDGNGHGTHVAGTVAALDNTTGVLGVAPSVSLFAVKVLNSSGSGSYSGIVSGIEWATTNGMDVINMSLGGPSGSTAMKQAVDNAYSKGVVPVAAAGNSGSSGYTNTIGYPAKYDSVIAVGAVDSNSNRASFSSVGAELEVMAPGAGVYSTYPTNTYATLNGTSMASPHVAGAAALILSKHPNLSASQVRNRLSSTATYLGSSFYYGKGLINVEAAAQSend commentsmailto:[email protected] you soon
http://www.bmm.icnet.uk/~3djigsaw/