dalla sequenza alla struttura
DESCRIPTION
Dalla sequenza alla struttura. Mauro Fasano Dipartimento di Biologia Strutturale e Funzionale Centro di Neuroscienze Università dell’Insubria – Busto Arsizio [email protected] http://fisio.dipbsf.uninsubria.it/cns/fasano. Dalla sequenza alla struttura. VLSEGEWQLVLV. O 2. - PowerPoint PPT PresentationTRANSCRIPT
Dalla sequenza Dalla sequenza alla strutturaalla struttura
Mauro Fasano
Dipartimento di Biologia Strutturale e FunzionaleCentro di Neuroscienze
Università dell’Insubria – Busto [email protected]
http://fisio.dipbsf.uninsubria.it/cns/fasano
Dalla sequenza alla struttura
O2
VLSEGEWQLVLV . . .Sequenza Struttura Funzione
Che informazioni offre la struttura?
• Conformazione dei siti attivi e di legame• Orientazione dei residui conservati• Interpretazione di meccanismi• Visualizzazione di cavità• Calcolo di potenziale elettrostatico• …
Esempio
• FtsZ – divisione cellulare in procarioti, mitocondri e cloroplasti.
• Tubulina – componente strutturale dei microtubuli – comunicazione intracellulare e divisione cellulare.
• FtsZ e Tubulina hanno bassa similarità di sequenza e non sembrerebbero omologhe.
Burns, R., Nature 391:121-123Picture from E. Nogales
FtsZ e tubulina sono omologhe?
• Proteine che hanno conservato la struttura tridimensionale possono derivare da un progenitore comune anche se la divergenza della sequenza non permette più di riconoscere l’omologia.
Un altro esempio
• α-lattalbumina e lisozima possiedono:– Stesso fold– Moderata similarità– Diversa funzione
Metodi sperimentali:
• Diffrazione dei raggi x
• Risonanza magnetica nucleare
Cristallografia a raggi X
• Ottenere cristalli della proteina– 0.3-1.0 mm– Le singole molecole sono ordinate in modo
periodico, ripetitivo. • La struttura è determinata dai dati di
diffrazione.
Image from http://www-structure.llnl.gov/Xray/101index.html
Schmid, M. Trends in Microbiology, 10:s27-s31.
• Le proteine devono cristallizzare– Grande quantità– Solubili
• Accesso a radiazione adatta• Tempo di calcolo per risolvere la struttura
Cristallografia a raggi X
Risonanza Magnetica Nucleare (NMR)
• Proteine in soluzione• Limite di dimensione ~ 40 kDa• Proteine stabili a lungo• Marcatura con 15N, 13C, 2H.• Strumentazione molto costosa• Tempo per assegnare le risonanze
Il Protein Data Bank
Crescita del PDB
Motivi strutturali depositati ogni anno
Percentuale di nuovi motivi strutturali
HEADER BINDING PROTEIN 01-JUN-95 1HXN 1HXN 2COMPND MOL_ID: 1; 1HXN 3COMPND 2 MOLECULE: HEMOPEXIN; 1HXN 4COMPND 3 CHAIN: NULL; 1HXN 5COMPND 4 DOMAIN: C-TERMINAL DOMAIN; 1HXN 6COMPND 5 SYNONYM: HPX; 1HXN 7COMPND 6 HETEROGEN: PO4 1HXN 8SOURCE MOL_ID: 1; 1HXN 9SOURCE 2 ORGANISM_SCIENTIFIC: ORYCTOLAGUS CUNICULUS; 1HXN 10SOURCE 3 ORGANISM_COMMON: RABBIT; 1HXN 11SOURCE 4 TISSUE: SERUM 1HXN 12KEYWDS HEME 1HXN 13EXPDTA X-RAY DIFFRACTION 1HXN 14AUTHOR H.R.FABER,E.N.BAKER 1HXN 15REVDAT 1 15-OCT-95 1HXN 0 1HXN 16JRNL AUTH H.R.FABER,C.R.GROOM,H.BAKER,W.MORGAN,A.SMITH, 1HXN 17JRNL AUTH 2 E.N.BAKER 1HXN 18JRNL TITL 1.8 ANGSTROMS CRYSTAL STRUCTURE OF THE C-TERMINAL 1HXN 19JRNL TITL 2 DOMAIN OF RABBIT SERUM HEMOPEXIN 1HXN 20JRNL REF TO BE PUBLISHED 1HXN 21JRNL REFN 0353 1HXN 22REMARK 1 1HXN 23
ATOM 1 CA GLU 225 -0.900 -1.002 39.233 1.00 70.00 1HXN 170ATOM 2 C GLU 225 -0.185 0.146 39.970 1.00 70.00 1HXN 171ATOM 3 O GLU 225 -0.514 1.329 39.758 1.00 70.00 1HXN 172ATOM 4 N SER 226 0.788 -0.203 40.823 1.00 70.00 1HXN 173ATOM 5 CA SER 226 1.534 0.805 41.594 1.00 70.00 1HXN 174ATOM 6 C SER 226 2.231 1.806 40.681 1.00 68.89 1HXN 175ATOM 7 O SER 226 1.883 1.952 39.514 1.00 70.00 1HXN 176ATOM 8 CB SER 226 2.572 0.130 42.515 1.00 70.00 1HXN 177ATOM 9 OG SER 226 3.237 -0.941 41.848 1.00 70.00 1HXN 178ATOM 10 N THR 227 3.242 2.478 41.223 1.00 65.51 1HXN 179ATOM 11 CA THR 227 3.989 3.417 40.410 1.00 70.00 1HXN 180ATOM 12 C THR 227 4.274 2.705 39.080 1.00 56.25 1HXN 181ATOM 13 O THR 227 4.179 3.296 38.022 1.00 44.63 1HXN 182ATOM 14 CB THR 227 5.354 3.797 41.074 1.00 70.00 1HXN 183ATOM 15 OG1 THR 227 5.114 4.682 42.172 1.00 70.00 1HXN 184ATOM 16 CG2 THR 227 6.256 4.492 40.065 1.00 70.00 1HXN 185
http://www.expasy.ch/spdbv
Esegui
Classificazione delle proteine:
• SCOP (Structural Classification of Proteins, scop.mrc-lmb.cam.ac.uk/scop/, Murzin et. al.):
548 folds (major structural similarity in terms of secondary structures e.g. globin-like, Rossman fold); 1296 families (clear evolutionary relationship or homology e.g. globins, Ras)
• CATH (Class, Architecture, Topology, Homologous Superfamily, www.biochem.ucl.ac.uk/bsm/cath/, Orengo et. al):
35 architectures (gross arrangment of secondary structures e.g. non-bundle, sandwich); 580 topologies (connectivity of secondary structures e.g. globin-like, Rossman fold); 1846 families (clear homology, same function)
Structural Classification Of Proteins
Predizione della struttura secondaria e terziaria
Metodi predittivi
»Comparative modeling> 30% similitudine
»Threading/Fold recognition0 – 30% similitudine
»Ab initionessun omologo
Qualità del modello comparativo
Identità di sequenza:
60-100% Confrontabile con NMR media risoluzioneSpecificità di substrato
30-60% Molecular replacement in cristallografiaPartenza per site-directed mutagenesis
<30% Gravi errori
Building by homology (Homology modelling)
---
G
---
Y
---
M
AAAA
KSTA
AGGG
YFFY
LEDA
VVVV
LVI
L
SEDS
Allineamento con proteine a struttura nota
Modello strutturale
Fold recognition (Threading)
Sequenza:
+Motivi strutturali noti
SLVAYGAAM
Modello strutturale
Ab initio
Sequenza
SLVAYGAAM
Modello strutturale
General Flowchart
Un numero grandissimo di polipeptidi si struttura in un numero finito (e relativamente piccolo) di folds
Almeno una proteina su due di quelle presenti nel database ha un omologo (identità > 30%) che quasi sempre ha lo stesso fold.
Building by homology
Costruire il modello comparativo
1) Cercare il massimo numero di omologhi che possiedano una entry nel PDB. Strumenti che utilizzano PSSM sono più sensibili. In questo caso vengono utilizzate sequenze senza struttura per costruire la PSSM.
2) Costruire un accurato allineamento multiplo tra la sequenza da modellare e tutte le entries che verranno utilizzate come templato.
Trovare strutture di proteine la cui sequenza è simile
allineamento
Modello strutturale
Verifica
OK!
Costruire il modello stesso
Determinare la struttura secondaria in base all’allineamento
Costruire le regioni conservate. Per ciascuna regione possiamo prendere le coordinate del frammento con la maggior similarità di sequenza.
Costruire le regioni variabili, solitamente loops.
Usando raccolte di loops osservati in strutture note, in base alla loro lunghezza ed alla loro sequenza
Costruendo la conformazione del loop ab initio. Vengono generate numerose conformazioni casuali e si calcola l’energia in un opportuno campo di forze.
Costruzione dei loops:
Alcuni siti web di homology modeling
COMPOSER – felix.bioccam.ac.uk/soft-base.html
MODELLER – guitar.rockefeller.edu/modeller/modeller.html
WHAT IF – www.sander.embl-heidelberg.de/whatif/
SWISS-MODEL – www.expasy.ch/SWISS-MODEL.html
Swiss-Modelhttp://www.expasy.ch/swissmod/SWISS-MODEL.html
http://guitar.rockefeller.edu/modeller/about_modeller.shtml
Modeller
Advanced program for homology modeling
Based on distance constraints
Implemented in several popular modelling packages such as InsightIIThe source is available for unix platforms at the above URL
Threading (fold recognition)
La sequenza di input viene confrontata con una libreria di folds noti
Si calcola un punteggio che esprima la compatibilità tra la sequenza e ciascun fold considerato
Punteggi statisticamente significativi indicano che la sequenza ha una certa probabilità di assumere la stessa struttura 3D del fold considerato
Input:
Sequenza
Donatore HAccettore HGlyIdrofobico
Collezione di folds di proteine note
Input:
Sequenza
Donatore HAccettore HGlyIdrofobico
Collezione di folds di proteine note
S=20S=5S=-2Z=5Z=1.5Z= -1
Donatore HAccettore HGlyIdrofobico
Chain/Domain Library
Scoring functions for fold recognition
Ci sono due metodi per valutare la compatibilità sequenza-struttura (1D-3D)
In methods based on structural profile, for every fold a profile is built based on structural features of the fold and compatability of every amino acid to the features.
The structural features of each position are determined based on the combination of secondary structure, solvent accessibility and the property of the local environment (hydrophobic/hydrophilic)
The profile is a defined mathematical structure, adjusted for pair-wise comparisons and dynamic programming
10100N
::::::::
10100167-9987-242
10100-80101-50101
GextGopY…DCA
Amino acid typePo
sitio
n on
sequ
ence
Contact potentials
This method is based on predefined tables which include pseudo-energetic scores to each pairwise interaction of two amino acids.
This method makes use of distance matrix for representation of different folds
For each pair of amino acids which are close in space the interaction energy is summed. The total sum is the indication for the fitness of the sequence into that structure
Scoring Function
…YKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEW…
Tendenza a stare in un certo ambiente: E_s
(singleton term)
Tendenza a stare vicini: E_p
(pairwise term)
Alignment gap penalty: E_g
Energia totale: E_m + E_p + E_s + E_g
Descrive quanto la sequenza assomiglia al templato
Qualità dell’allineamento in una certa posizione: E_m
(mutation term)
• • • • • • • • • • • • • •Amino acid index
Am
ino
acid
inde
x
••••
1
N1 N
•••• • •
••
Expected Performance
Predicted model
X-raystructure
PROSPECT prediction in CASP4:12 out 19 folds (no homology) recognized
Web sites for fold recognition
3D-PSSM - http://www.bmm.icnet.uk/~3dpssm
Profiles:
Libra I - http://www.ddbj.nig.ac.jp/htmls/E-mail/libra/LIBRA_I.html
UCLA DOE - http://www.doe-mbi.ucla.edu/people/frsvr/frsvr.html
Contact potentials
123D - http://www-Immb.ncifcrf.gov/~nicka/123D.html
Profit - http://lore.came.sbg.ac.at/home.html
Risultati
Ab initio methods for modelling
This field is of great theoretical interest but, so far, of very little practical applications. Here there is no use of sequence alignments and no direct use of known structures
The basic idea is to build empirical function that simulates real physical forces and potentials of chemical contacts
If we will have perfect function and we will be able to scan all the possible conformations, then we will be able to detect the correct fold
Algorithms for Ab initio prediction include:A. Searching procedure that scans many possible structures (conformations)B. Scoring function to evaluate and rank the structures
Due to the large search space, heuristic methods are usually applied
The parameters in the searching procedure are the dihedral angles which specify the exact fold of the polypeptide chain
A
C
B
E
D
A
C
B
ED
New Fold Methods
• Since almost all predictors use sequence and structural databases in some form, there is no longer an “ab initio” category
• Assessment is sometimes difficult to communicate due to the complexity of the protein structure and completeness of prediction
• Methods are still somewhat limited to smaller proteins
Rosetta-David Baker
• Based on the assumption that the distribution of conformations sampled by a local segment of the polypeptide chain is reasonably approximated by the distribution of structures adopted by that sequence and closely related sequences in known protein structures.
• Fragment libraries for all possible three and nine residue segments of the chain are extracted from PDB by profile methods
Rosetta-Simulation Procedure
• Information on fragments from secondary structure prediction methods compiled and scored based on equation for local secondary structure propensity
• Conformational space defined by these fragments is then searched by a Monte Carlo procedure with an energy function that favors compact structures with paired beta strands and buried hydrophobic residues, refinement of procedure late in simulation
• Thousands of structures generated• Filters to remove bad structures• Remaning structures clustered and cluster center taken
as the prediction.
Force fields- collection of terms that simulate the forces act between atoms
Terms based on probabilities to find pairs of amino acids or atoms within specific distances
Terms based on surface area and overlapping volume of spheres representing atoms
Methods to evaluate structures are based on
In homology modelling, construction of the side chains is done using the template structures when there is high similarity between the built protein and the templates
In spite of the huge size of the problem (because each side chain influences its neighbours) there are quite succesful algorithms to this problem.
Side chain construction
Without such similarity the construction can be done using rotamer libraries
A compromise between the probability of the rotamer and its fitness in specific position determines the score. Comparing the scores of all the rotamer for a given amino acid determines the preferred rotamer.
In this work we examined differences in structures of amino- acid side chains around point mutations.
Phe
AsnConformation - a given setof dihedral angle which defines a structure.
Rotamer - energetically favourable conformation.
SER 59.6 41.0 SER -62.5 26.4SER 179.6 32.6
Example to library of rotamers
TYR 63.6 90.5 21.0TYR 68.5 -89.6 16.4TYR 170.7 97.8 13.3TYR -175.0 -100.7 20.0TYR -60.1 96.6 10.0TYR -63.0 -101.6 19.3
Model evaluation
The main approaches for model evaluation are:A. Use of internal information (such as the one that used for the model construction)B. Use of external information derived from the databases
After the model is built we can check it by various methods.
If the model turns out to be bad, it is necessary to repeat several stages of the model building
Usually algorithms are checked by building models for proteins which have already solved structure and comparison between the model and the native structure
It is always possible that information from the native structure will be used in direct or indirect ways for model building
A more objective test is prediction of structures before they are publicly distributed (this is the idea of the CASP competitions)