Posted on 18-Jul-2015
Application and implementation of probabilistic
profile-profile comparison methods
for protein fold recognition
M.Sc. Eng. Jakub Paś
Dissertation supervisor: dr hab. Marcin Hoffmann
Auxiliary supervisor: dr Krystian Eitner
Introduction
● The purpose of this PhD thesis is to show that profile-profile methods may
outperform other fold recognition approaches in the comparison and analysis
of distantly related proteins.
● The work presents the advantages of using probabilistic profile-profile
methods over comparable fold recognition techniques in research
performed by the author.
● The thesis is based on several of the author's publications in the areas of gene
identification, detection of distant homologues, domain boundary
detection, protein modeling, evolutionary analysis and protein-ligand
interaction.
● The work shows both applications and implementations of such methods in
molecular biology software.
What is fold recognition?
● Fold recognition methods detect a protein's fold and predict its tertiary
structure when the protein lacks homologous sequences of known fold and
structure deposited in the Protein Data Bank (PDB).
● They are based on the assumption that there is a strictly limited number of
different protein folds in nature, mostly as a result of evolution and of
basic physical and chemical constraints on polypeptide chains.
● Fold recognition methods are useful for protein structure prediction,
evolutionary analysis, metabolic pathways analysis, enzymatic efficiency
prediction, molecular docking and drug design.
Sequence comparison methods used for fold recognition
Sequence-based comparison methods:
● Smith-Waterman
● BLAST
Sequence-profile methods:
● PSI-BLAST
● RPS-BLAST
Profile-profile methods:
● FFAS
● Meta-BASIC
● ORFeus
Other (non-sequence-based) fold recognition methods:
● Threading (HHpred, Raptor)
● Ab initio (Rosetta)
What are probabilistic profile-profile comparison methods?
● A profile, or position-specific scoring matrix (PSSM), is simply a set of vectors,
where each vector contains the frequency of each amino acid type at a particular
position of the multiple sequence alignment.
● In profile-profile alignments, two frequency vectors have to be compared.
This can be done in several ways, including calculating the sum of pairs,
the dot product, or a correlation coefficient between the two vectors.
● The performance of profile-profile methods depends on how the score between
two profile positions is calculated, on the alignment methodology, and on the
evolutionary distance between the two sequences studied.
● Because profiles contain more information, profile-based methods are more
sensitive and provide better alignments than sequence-sequence methods.
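The three ingredients described above (a profile as per-column frequency vectors, ways of comparing two columns, and an alignment step) can be sketched in Python as follows. This is a minimal illustration under assumptions chosen here, not the scoring used by FFAS, Meta-BASIC or ORFeus: frequencies are raw counts without pseudocounts or sequence weighting, the alignment column score is a dot product minus a hypothetical `shift`, and the linear `gap` penalty is arbitrary. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
Vector = Sequence[float]  # one frequency per residue type


def frequency_profile(msa: Sequence[str]) -> List[List[float]]:
    """One frequency vector per alignment column; gaps and non-standard symbols are ignored."""
    profile = []
    for col in range(len(msa[0])):
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append([counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS])
    return profile


def dot_product(p: Vector, q: Vector) -> float:
    """Dot product of two frequency vectors over the same residue alphabet."""
    return sum(a * b for a, b in zip(p, q))


def sum_of_pairs(p: Vector, q: Vector, subst: Sequence[Sequence[float]]) -> float:
    """Sum of pairs: every residue pair (i, j) weighted by a substitution score subst[i][j]."""
    return sum(p[i] * q[j] * subst[i][j] for i in range(len(p)) for j in range(len(q)))


def pearson(p: Vector, q: Vector) -> float:
    """Pearson correlation coefficient between two frequency vectors."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    norm = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mq) ** 2 for b in q))
    return cov / norm if norm else 0.0


def align_profiles(P, Q, gap: float = 0.2, shift: float = 0.1) -> Tuple[float, List[Tuple[int, int]]]:
    """Smith-Waterman-style local alignment of two profiles with a linear gap penalty.

    Column score = dot product - shift (hypothetical offset so unrelated columns score < 0).
    Returns the best local score and the aligned (i, j) column pairs.
    """
    def score(i: int, j: int) -> float:
        return dot_product(P[i], Q[j]) - shift

    n, m = len(P), len(Q)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0.0, H[i - 1][j - 1] + score(i - 1, j - 1),
                          H[i - 1][j] - gap, H[i][j - 1] - gap)
            if H[i][j] > best:
                best, pos = H[i][j], (i, j)
    pairs, (i, j) = [], pos
    while i > 0 and j > 0 and H[i][j] > 0:  # trace back until a zero cell is reached
        if H[i][j] == H[i - 1][j - 1] + score(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return best, pairs[::-1]
```

For example, `align_profiles(frequency_profile(msa_a), frequency_profile(msa_b))` returns the best local score and the aligned column pairs for two alignments `msa_a` and `msa_b`. Real tools typically use log-odds scores and affine gap penalties instead of these simplifications.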
Improvement and assessment of fold recognition methods
Benchmarking of fold recognition methods:
● CASP (Critical Assessment of protein Structure Prediction)
● CAFASP (Critical Assessment of Fully Automated Structure Prediction)
● LIVEBENCH (Live Benchmark)
● EVA
Application vs implementation
Implementation: the realization of an application, or the execution of
a plan, idea, model, design, specification, standard, algorithm
or policy.
E.g. implementing profile-profile comparison methods to
create “tools” (molecular biology software).
Application: putting something into effect or into use after it
has been implemented.
E.g. using such software for scientific discovery
(operating the “tools”).
Implementations of profile-profile comparison methods
GRDB
J Pas, P Stepniak, L Rychlewski, GRDB – Gene
Relational DataBase. Bioinfobank Library Acta, 2011. 2659.
ELM
CM Gould, F Diella, A Via, P Puntervoll, C Gemünd, S
... J Pas, S … ELM: the status of the 2010 eukaryotic
linear motif resource. Nucleic Acids Res, 2010. 38: p. D167-D180.
ORFeus
K Ginalski, J Pas, LS Wyrwicz, M von Grotthuss, JM
Bujnicki, L Rychlewski. ORFeus: Detection of distant
homology using sequence profiles and predicted
secondary structure.
Nucleic Acids Res, 2003. 31(13): p. 3804-3807.
PDB-Preview
D Fischer, J Pas, and L Rychlewski, The PDB-Preview database:
a repository of in-silico models of 'on-hold' PDB entries.
Bioinformatics, 2004. 20(15): p. 2482-4.
Applications of profile-profile comparison methods
ELM: the status of the 2010 eukaryotic linear motif resource
CM Gould, F Diella, A Via, P Puntervoll, C Gemünd, S ... J Pas et al.
Nucleic Acids Res, 2010. 38 (suppl 1): p. D167-D180.
Predicting protein structures accurately
M von Grotthuss, LS Wyrwicz, J Pas, L Rychlewski
Science, 2004. 304(5677): p. 1597-1599.
How unique is the rice transcriptome?
LS Wyrwicz, M von Grotthuss, J Pas, L Rychlewski
Science, 2004. 303(5655): p. 168; author reply 168.
Structure prediction, evolution and ligand interaction of CHASE domain.
J Pas, LS Wyrwicz, L Rychlewski, J Barciszewski
FEBS Lett, 2004. 576(3): p. 287-90.
Linear motif identification in distant proteins
Protein structure prediction
Gene identification
Protein topology detection
Two sequences encoding chalcone synthase in yellow lupin
(Lupinus luteus l.) may have evolved by gene duplication
D Narożna, J Pas, J Schneider, CJ Mądrzak
Cell Mol Biol Lett, 2004. 9(1): p. 95-105.
Molecular phylogenetics of the RrmJ/fibrillarin superfamily of
ribose 2'-O-methyltransferases
M Feder, J Pas, LS Wyrwicz, JM Bujnicki
Gene, 2003. 302(1-2): p. 129-138.
Application of 3D-Jury, GRDB, and Verify3D in fold recognition.
M von Grotthuss, J Pas, L Wyrwicz, K Ginalski, L Rychlewski
Proteins, 2003. 53 Suppl 6: p. 418-23.
Predicting protein structures accurately
LS Wyrwicz, M von Grotthuss, J Pas, L Rychlewski
Science 304(5677): p. 1597.
Gene duplication detection
Molecular phylogeny
Fold recognition
Distant homologue detection
Gene identification and detection of distant homologues
D Narożna, J Pas, J Schneider, CJ Mądrzak, Two sequences encoding
chalcone synthase in yellow lupin (Lupinus luteus l.) may have evolved by gene
duplication. Cell Mol Biol Lett, 2004. 9(1): p. 95-105.
● In the study of chalcone synthase (CHS) sequences encoded in yellow lupin,
profile-profile fold recognition methods were used to detect two full
copies of CHS.
● Using molecular clock calibration, the gene duplication was estimated
to have occurred about 16 million years ago.
● An initial multiple alignment of 52 distantly homologous plant CHS
sequences was created using multiple profile-profile comparison
methods.
● Amino acids involved in ligand binding were detected.
● Catalytic, evolutionarily conserved amino acids helped to produce
a full multiple sequence alignment of the known sequences of the
whole CHS family.
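Two calculations mentioned on these slides, dating a gene duplication with a molecular clock and picking out conserved alignment columns, can be illustrated with a short Python sketch. The slides do not say which distance model, substitution rate or conservation measure was used. The Jukes-Cantor correction, the strict-clock relation T = K / (2r) and the Shannon-entropy cut-off below are standard textbook choices picked here as assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites between two aligned sequences, skipping gapped sites."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected nucleotide distance K (substitutions per site)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def divergence_time(k: float, rate: float) -> float:
    """Years since duplication under a strict clock: T = K / (2r).

    rate is substitutions per site per year along one lineage; the factor 2
    accounts for both gene copies accumulating changes after the split.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return k / (2.0 * rate)


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (bits) of one alignment column; gaps are ignored.

    An all-gap column returns infinity so it is never reported as conserved.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return math.inf
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def conserved_columns(msa: Sequence[str], max_entropy: float = 0.5) -> List[int]:
    """Indices of columns whose entropy is at most max_entropy (a hypothetical cut-off)."""
    return [i for i in range(len(msa[0]))
            if column_entropy([seq[i] for seq in msa]) <= max_entropy]
```

For instance, a purely hypothetical K = 0.2 and r = 1e-8 give `divergence_time(0.2, 1e-8)` of about 1e7 years. These are not the values behind the 16-million-year estimate.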
Detection of domain boundaries and modeling of complex proteins
● Domain boundaries and the homologues of the tenascin-C (Tn-C) domains were
identified using the Gene Relational DataBase (GRDB).
● Characteristic profiles were computed for every domain using
protein families collected from Pfam, COG and other genomic
sources.
● The target families were compared with about 100,000 other
families using the Meta-BASIC program.
● Models of the tenascin-C domains were built.
J Pas, et al., Analysis of structure and function of tenascin-C. Int J Biochem
Cell Biol, 2006. 38(9): p. 1594-602.
● Sensitive profile-profile sequence comparison made it possible to detect the order of
functional elements in the large multidomain tenascin-C protein, including all variable parts
of the molecule and all isoforms.
● The number of putative fibronectin repeats was corrected, and a previously unidentified HSP33
domain was described.
Profile-profile comparison revealed conservation of sequence patterns and
secondary structure despite low amino acid identity, which helped to study
the evolution of the PYP-like family.
J Pas, et al., Structure prediction, evolution and ligand interaction of
CHASE domain. FEBS Lett, 2004. 576(3): p. 287-90.
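"Low amino acid identity" usually refers to pairwise sequence identity. A minimal Python sketch of one common definition follows. Conventions differ (for example, dividing by the number of ungapped aligned positions versus the length of the shorter sequence), and the slides do not state which one was used. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over aligned positions where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment (equal length, '-' = gap).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


print(percent_identity("AC-DE", "ACGDF"))  # 75.0
```

Profile-profile methods matter precisely in the range where this number is low (often cited as below roughly 20-30%), because there plain sequence comparison struggles to separate true relatives from chance matches.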
Evolutionary analysis and protein-ligand interaction
● Molecular models of the CRE1
receptor from A. thaliana
docked with (a) trans-zeatin and (b)
kinetin confirm that the ligands are
entirely buried.
● The visible side chain belongs
to threonine 278, whose
mutation is responsible for loss
of function.
● Molecular modeling and
docking confirmed that the ligand
was entirely buried.
● The importance of threonine 278 for
the catalytic activity of the
enzyme was confirmed.
J Pas, et al., Structure prediction, evolution and ligand interaction of CHASE
domain. FEBS Lett, 2004. 576(3): p. 287-90.
Implementation: PDB-Preview
● Not all the entries in the Protein Data Bank (PDB) are publicly available.
● A new structure can be deposited as an “on-hold” entry, inaccessible to
the public until its final release.
● Before release, relatively accurate automated computational models of the
3D structure can be generated in most cases.
● PDB-Preview provides biologists with relatively accurate 3D models of
not-yet-released PDB entries shortly after they are deposited in the PDB,
well before the experimental structure is released.
● Additionally, the resulting PDB-CAFASP analysis provides computational
biologists with a continuous blind evaluation of their methods.
D Fischer, J Pas, and L Rychlewski, The PDB-Preview database: a
repository of in-silico models of 'on-hold' PDB entries. Bioinformatics,
2004. 20(15): p. 2482-4.
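Blind evaluation of this kind compares each model with the experimental structure once that structure is released. The slides do not name the measures used by PDB-CAFASP. As an illustration only, the Python sketch below computes two common ones over equivalent Cα atoms: RMSD and a simplified GDT-style score. It assumes the model and experimental coordinates are already superimposed and residue-matched (real GDT_TS searches over many superpositions), requires Python 3.8+ for `math.dist`, and uses hypothetical function names.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def _check(model: Sequence[Coord], native: Sequence[Coord]) -> None:
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty, residue-matched coordinate lists")


def rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """Root-mean-square deviation of equivalent atoms; no superposition is performed."""
    _check(model, native)
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, native))
    return math.sqrt(squared / len(model))


def gdt_like(model: Sequence[Coord], native: Sequence[Coord],
             cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Mean fraction of residues within each distance cutoff (Angstrom), fixed superposition."""
    _check(model, native)
    dists = [math.dist(a, b) for a, b in zip(model, native)]
    return sum(sum(d <= c for d in dists) / len(dists) for c in cutoffs) / len(cutoffs)
```

RMSD is dominated by the worst-placed residues, while the cutoff-based score rewards models that get most of the chain right. This is why fold recognition assessments often favor the latter.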
Implementation: Gene Relational DataBase (GRDB)
● GRDB is a web service dedicated to searching for distant homologues
of protein sequences that may not be detected by other approaches,
such as direct sequence search.
● It compares the target family with 100,000 other families (from sources such
as SCOP, CATH and COG) using Meta-BASIC profile-profile comparison.
● In contrast to other methods, it accepts a manually built profile as input
and compares whole protein families.
● GRDB was successfully used for the comprehensive classification of protein
folds and the identification of novel families and their representatives in
humans (Kuchta et al., 2009).
J Pas, et al., GRDB – Gene Relational DataBase. Bioinfobank Library Acta,
2011. 2659.
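Searching a target family against a library of family profiles, as GRDB does on a much larger scale, amounts to scoring the target against every library profile and ranking the hits. The Python below is a minimal sketch under stated assumptions, not Meta-BASIC. Profiles are lists of per-column frequency vectors. The score is a Smith-Waterman-style local alignment score with a shifted dot product per column and a linear gap penalty (`gap` and `shift` are arbitrary). No statistical significance (E-value) is estimated, and the function and family names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Profile = Sequence[Sequence[float]]  # one frequency vector per alignment column


def local_score(P: Profile, Q: Profile, gap: float = 0.2, shift: float = 0.1) -> float:
    """Best local alignment score between two profiles (shifted dot-product columns, linear gaps)."""
    prev = [0.0] * (len(Q) + 1)
    best = 0.0
    for p in P:
        cur = [0.0]
        for j, q in enumerate(Q, start=1):
            match = prev[j - 1] + sum(a * b for a, b in zip(p, q)) - shift
            cur.append(max(0.0, match, prev[j] - gap, cur[j - 1] - gap))
            best = max(best, cur[j])
        prev = cur  # only the previous row is needed for the score
    return best


def search_library(target: Profile, library: Dict[str, Profile],
                   top: int = 5) -> List[Tuple[str, float]]:
    """Score the target against every family profile; return the top hits, best first."""
    hits = [(name, local_score(target, prof)) for name, prof in library.items()]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top]
```

Keeping only two rows of the dynamic-programming matrix makes each comparison use memory proportional to one profile's length. That matters when a single target is compared with tens of thousands of families.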
Conclusions
● Profile-profile sequence comparison methods are usually superior to
sequence-sequence methods and may have more possible applications in molecular
biology.
● PDB statistics show that in recent years only a limited number of
completely new protein folds have appeared, although several thousand new
structures have been deposited in the PDB. Most single-domain proteins
can be aligned to a protein already deposited in the PDB.
● Experimental determination of protein structures by X-ray
crystallography and NMR spectroscopy is still expensive and
time-consuming despite the efforts of structural genomics.
● As genome projects produce more and more novel sequences,
profile-based methods can be expected to become even more sensitive
(aided also by new alignment and scoring methods, etc.).
Acknowledgements
Dissertation supervisor: dr hab. Marcin Hoffmann
Auxiliary supervisor: dr Krystian Eitner
My wife: Agnieszka Paś
Review: Prof. Tadeusz Kuliński
● Impact factor and citation counts. Unfortunately, the author does not in any
way distinguish the papers that form a thematically coherent set of
articles and actually make up the dissertation.
● Typos and language errors.
● Substantive error. In the publication: Two sequences encoding chalcone synthase in yellow lupin (Lupinus luteus l.) may have
evolved by gene duplication
D Narożna, J Pas, J Schneider, CJ Mądrzak
Cell Mol Biol Lett, 2004. 9(1): p. 95-105.
the analysis actually concerned 52 plant chalcone synthase sequences,
including 2 from L. luteus.