![Page 1: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/1.jpg)
Rita Casadio
BIOCOMPUTING GROUP
University of Bologna, Italy
Prediction of protein function from sequence analysis
![Page 2: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/2.jpg)
The “omic” era
Genome Sequencing Projects (update: January 2010):
Archaea: 74 species; in progress: 52
Bacteria: 973 species; in progress: 2266 species
Eukaryotic: complete: 23; draft assembly: 318; in progress: 359
http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html
![Page 3: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/3.jpg)
The Data Bases of Biological Sequences and Structures
>BGAL_SULSO BETA-GALACTOSIDASE Sulfolobus solfataricus.
MYSFPNSFRFGWSQAGFQSEMGTPGSEDPNTDWYKWVHDPENMAAGLVSGDLPENGPGYWGNYKTFHDNAQKMGLKIARLNVEWSRIFPNPLPRPQNFDESKQDVTEVEINENELKRLDEYANKDALNHYREIFKDLKSRGLYFILNMYHWPLPLWLHDPIRVRRGDFTGPSGWLSTRTVYEFARFSAYIAWKFDDLVDEYSTMNEPNVVGGLGYVGVKSGFPPGYLSFELSRRHMYNIIQAHARAYDGIKSVSKKPVGIIYANSSFQPLTDKDMEAVEMAENDNRWWFFDAIIRGEITRGNEKIVRDDLKGRLDWIGVNYYTRTVVKRTEKGYVSLGGYGHGCERNSVSLAGLPTSDFGWEFFPEGLYDVLTKYWNRYHLYMYVTENGIADDADYQRPYYLVSHVYQVHRAINSGADVRGYLHWSLADNYEWASGFSMRFGLLKVDYNTKRLYWRPSALVYREIATNGAITDEIEHLNSVPPVKPLRH
Update: January 2009
GenBank: 108,431,692 sequences; 106,533,156,756 nucleotides
SwissProt: 514,212 sequences; 180,900,945 residues
PDB: 60,654 structures; membrane proteins <2%
NR(*): 10,381,779 sequences; 3,542,056,219 residues
(*) CDS translations + PDB + SwissProt + PIR + PRF
35.5 HGE!
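The "HGE" figure on this slide presumably expresses the GenBank nucleotide count in human genome equivalents. The arithmetic can be checked with a minimal sketch, assuming a haploid human genome of roughly 3.0 × 10^9 base pairs (a round value that is not given on the slide):

```python
# Figure quoted on the slide.
GENBANK_NUCLEOTIDES = 106_533_156_756

# Assumption: ~3.0e9 bp per haploid human genome (round value, not from the slide).
HUMAN_GENOME_BP = 3.0e9


def human_genome_equivalents(nucleotides, genome_bp=HUMAN_GENOME_BP):
    """Express a nucleotide count as a multiple of one human genome."""
    return nucleotides / genome_bp


hge = human_genome_equivalents(GENBANK_NUCLEOTIDES)
print(f"GenBank holds about {hge:.1f} human genome equivalents")
```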
![Page 4: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/4.jpg)
From Genotype to Phenotype
Genes in DNA...
(about 30,000 in the human genome)
>protein kinase
acctgttgatggcgacagggactgtatgctgatctatgctgatgcatgcatgctgactactgatgtgggggctattgacttgatgtctatc....
…code for proteins...
…proteins correspond to functions...
…when they are expressed
From 5,000 to 10,000 proteins per tissue
Proteins interact
…in metabolic pathways
…with different effects depending on variability
Over 20 million single mutations are known in genes
![Page 5: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/5.jpg)
http://string.embl.de
STRING 8: a global view on proteins and their functional interactions in 630 organisms. Jensen et al., 2009, Nucleic Acids Research, Vol. 37.
The Human Interactome in STRING
22,937 proteins and 1,482,533 interactions
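To put these two numbers in perspective, the sketch below derives the average number of partners per protein and the fraction of all possible protein pairs that are linked. It assumes that interactions are undirected, counted once each, and exclude self-interactions; the slide does not state this.

```python
# Figures quoted on the slide for the human interactome in STRING.
proteins = 22_937
interactions = 1_482_533

# Assumption: undirected pairwise links, no self-interactions.
possible_pairs = proteins * (proteins - 1) // 2

# Each interaction contributes to the degree of two proteins.
average_degree = 2 * interactions / proteins

# Fraction of all possible pairs that are reported as interacting.
density = interactions / possible_pairs

print(f"average partners per protein: {average_degree:.1f}")
print(f"network density: {density:.4f}")
```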
![Page 6: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/6.jpg)
One problem of the “omic era”:
Protein functional annotation
![Page 7: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/7.jpg)
The Protein Data Bank
http://www.rcsb.org/pdb/home/home.do
Number of proteins with known structure: 57,529
![Page 8: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/8.jpg)
SCOP: Structural Classification of Proteins
Domains are hierarchically classified:
- class
- fold: proteins with secondary structures in the same arrangement and with the same topological connections
- superfamily: structural and functional features suggest a common evolutionary origin
- family: proteins with sequence identity ≥30%, or with identity <30% but with similar structures and functions
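The four levels above can be read directly from a SCOP concise classification string (sccs) of the form `class.fold.superfamily.family`. The following is a minimal sketch; the identifiers used in the example are illustrative values chosen only to show the format, and the helper names are hypothetical.

```python
from typing import NamedTuple, Optional


class ScopLineage(NamedTuple):
    scop_class: str
    fold: str
    superfamily: str
    family: str


def parse_sccs(sccs: str) -> ScopLineage:
    """Split an sccs string such as 'c.1.8.4' into its four nested levels."""
    parts = sccs.strip().split(".")
    if len(parts) != 4 or not parts[0].isalpha() or not all(p.isdigit() for p in parts[1:]):
        raise ValueError(f"not a valid sccs string: {sccs!r}")
    cls, fold, superfamily, family = parts
    return ScopLineage(
        cls,
        f"{cls}.{fold}",
        f"{cls}.{fold}.{superfamily}",
        f"{cls}.{fold}.{superfamily}.{family}",
    )


def deepest_shared_level(a: str, b: str) -> Optional[str]:
    """Return the most specific level two domains share, or None if even the class differs."""
    shared = None
    for level, x, y in zip(ScopLineage._fields, parse_sccs(a), parse_sccs(b)):
        if x != y:
            break
        shared = level
    return shared


# Illustrative identifiers (format examples only).
print(parse_sccs("c.1.8.4"))
print(deepest_shared_level("c.1.8.4", "c.1.8.1"))
```

Two domains that share a superfamily but not a family would, by the definitions above, be expected to have a common evolutionary origin without necessarily having ≥30% sequence identity.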
![Page 9: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/9.jpg)
From the Protein Sequence to the Structure and Function space
Lesk A., 2004
![Page 10: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/10.jpg)
From the Protein Sequence to the Structure space
[Figure: methods for mapping a sequence onto PDB structures, arranged along a sequence identity (%) axis marked 0%, 30%, and 100%]
• Sequence comparison
• Fold recognition
• Machine-learning aided alignment
• Threading
New folds:
• Ab initio and de novo modelling
• Machine-learning prediction of structural features
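One possible reading of this diagram is a decision rule keyed on the sequence identity between the query and the best match in the PDB. The sketch below encodes that reading. Assigning sequence comparison to identities at or above 30%, fold recognition and threading to lower identities, and ab initio methods to the case with no usable template is an assumption about the figure, and the function name is hypothetical.

```python
from typing import Optional

# The 30% mark is taken from the identity axis on the slide.
IDENTITY_THRESHOLD = 30.0


def suggest_strategy(best_template_identity: Optional[float]) -> str:
    """Suggest a structure-prediction strategy from the identity (%) to the best PDB match.

    Pass None when no related structure is found at all (a possible new fold).
    """
    if best_template_identity is None:
        return "ab initio / de novo modelling"
    if not 0.0 <= best_template_identity <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if best_template_identity >= IDENTITY_THRESHOLD:
        return "sequence comparison"
    return "fold recognition / threading"


for identity in (85.0, 30.0, 12.5, None):
    print(identity, "->", suggest_strategy(identity))
```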
![Page 11: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/11.jpg)
What is protein function?
From the Protein Sequence to the Structure and Function space
![Page 12: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/12.jpg)
What is a function?
For enzymes: function can be defined on the basis of the catalysed molecular reaction.
e.g. aspartic aminotransferase (AST)
![Page 13: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/13.jpg)
In biochemistry, a transaminase or an aminotransferase is an enzyme that catalyzes a type of reaction between an amino acid and an α-keto acid.
Specifically, this reaction (transamination) involves removing the amino group from the amino acid, leaving behind an α-keto acid, and transferring it to the reactant α-keto acid, which is thereby converted into an amino acid. The enzymes are important in the production of various amino acids, and measuring the concentrations of various transaminases in the blood is important in diagnosing and tracking many diseases. Transaminases require the coenzyme pyridoxal-phosphate, which is converted into pyridoxamine in the first phase of the reaction, when an amino acid is converted into a keto acid.
Enzyme-bound pyridoxamine in turn reacts with pyruvate, oxaloacetate, or alpha-ketoglutarate, giving alanine, aspartic acid, or glutamic acid, respectively.
The presence of elevated transaminases can be an indicator of liver damage.
![Page 14: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/14.jpg)
Enzyme Commission (E.C.) classification
A hierarchical classification for enzymes
![Page 15: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/15.jpg)
EC 2.6 Transferring nitrogenous groups
EC 2.6.1 Transaminases
EC 2.6.1.1 Aspartate transaminase
Other name(s): glutamic-oxaloacetic transaminase; glutamic-aspartic transaminase; transaminase A; AAT; AspT; 2-oxoglutarate-glutamate aminotransferase; aspartate α-ketoglutarate transaminase; aspartate aminotransferase; aspartate-2-oxoglutarate transaminase; aspartic acid aminotransferase; aspartic aminotransferase; aspartyl aminotransferase; AST; glutamate-oxalacetate aminotransferase; glutamate-oxalate transaminase; glutamic-aspartic aminotransferase; glutamic-oxalacetic transaminase; glutamic oxalic transaminase; GOT (enzyme); L-aspartate transaminase; L-aspartate-α-ketoglutarate transaminase; L-aspartate-2-ketoglutarate aminotransferase; L-aspartate-2-oxoglutarate aminotransferase; L-aspartate-2-oxoglutarate-transaminase; L-aspartic aminotransferase; oxaloacetate-aspartate aminotransferase; oxaloacetate transferase; aspartate:2-oxoglutarate aminotransferase; glutamate oxaloacetate transaminase
Systematic name: L-aspartate:2-oxoglutarate aminotransferase
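Because the E.C. classification is hierarchical, every prefix of an E.C. number names a broader group. A minimal sketch of walking that hierarchy for aspartate transaminase follows. The names for EC 2.6, EC 2.6.1 and EC 2.6.1.1 are taken from the slide; the top-level name "Transferases" for EC 2 is not on the slide and is added here as the standard class name. The function name is hypothetical.

```python
from typing import List

# Names for the levels shown on the slide, plus the top-level class (added, not on the slide).
EC_NAMES = {
    "EC 2": "Transferases",
    "EC 2.6": "Transferring nitrogenous groups",
    "EC 2.6.1": "Transaminases",
    "EC 2.6.1.1": "Aspartate transaminase",
}


def ec_lineage(ec_number: str) -> List[str]:
    """Return all levels of an E.C. number, from the class down to the full number."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if not 1 <= len(parts) <= 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a valid E.C. number: {ec_number!r}")
    return ["EC " + ".".join(parts[:i]) for i in range(1, len(parts) + 1)]


for level in ec_lineage("EC 2.6.1.1"):
    print(level, "-", EC_NAMES.get(level, "(name not listed)"))
```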
![Page 16: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/16.jpg)
Problems:
Isoforms: e.g., how to differentiate the function of the cytoplasmic aspartate aminotransferase from that of the mitochondrial isoform?
Non-enzymatic proteins
![Page 17: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/17.jpg)
The Ontologies
• Cellular component
• Biological process
• Molecular function
GO function vocabulary: http://www.geneontology.org/
![Page 18: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/18.jpg)
Gene Ontology classification: the human cytoplasmic aspartate transaminase
GO:0005829
GO:0006533
GO:0004069
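Each of the three identifiers above belongs to one of the three GO ontologies. The sketch below validates the identifier format and groups the terms by ontology. The term names and ontology assignments are not given on the slide; they are filled in here as an assumption and should be checked against http://www.geneontology.org/ before being relied on.

```python
import re
from collections import defaultdict

# GO identifiers are "GO:" followed by seven digits.
GO_ID = re.compile(r"^GO:\d{7}$")

# Assumption: names and ontologies below are not from the slide; verify against the GO site.
ANNOTATIONS = {
    "GO:0005829": ("cellular_component", "cytosol"),
    "GO:0006533": ("biological_process", "aspartate catabolic process"),
    "GO:0004069": ("molecular_function", "L-aspartate:2-oxoglutarate aminotransferase activity"),
}


def group_by_ontology(annotations):
    """Group GO identifiers by ontology, rejecting malformed identifiers."""
    grouped = defaultdict(list)
    for go_id, (ontology, name) in annotations.items():
        if not GO_ID.match(go_id):
            raise ValueError(f"malformed GO identifier: {go_id!r}")
        grouped[ontology].append((go_id, name))
    return dict(grouped)


for ontology, terms in group_by_ontology(ANNOTATIONS).items():
    for go_id, name in terms:
        print(f"{ontology}: {go_id} ({name})")
```

Unlike a single E.C. number, this triple also records where the protein acts, which is one way the cytoplasmic and mitochondrial isoforms can be told apart.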
![Page 19: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/19.jpg)
One BIG problem of the “omic era”:
Protein functional annotation
![Page 20: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/20.jpg)
Functional annotation in silico by homology search
Sequence comparison is performed with alignment programs
ADH1_SULSO ----------MRAVRLVEIGKP--LSLQEIGVPKPKGPQVLIKVEAAGVCHSDVHMRQGRFGNLRIVE
ADH_CLOBE  ----------MKGFAMLGINKLG---WIEKERPVAGSYDAIVRPLAVSPCTSDIHTVFEGA-------
ADH_THEBR  ----------MKGFAMLSIGKVG---WIEKEKPAPGPFDAIVRPLAVAPCTSDIHTVFEGA-------
ADH1_SOLTU MSTTVGQVIRCKAAVAWEAGKP--LVMEEVDVAPPQKMEVRLKILYTSLCHTDVYFWEAKG-------
ADH2_LYCES MSTTVGQVIRCKAAVAWEAGKP--LVMEEVDVAPPQKMEVRLKILYTSLCHTDVYFWEAKG-------
ADH1_ASPFL ----MSIPEMQWAQVAEQKGGP--LIYKQIPVPKPGPDEILVKVRYSGVCHTDLHALKGDW-------
Sequence identity 40%
Similar structure and function (??)
Methods for similarity searches:
BLAST, Psi-BLAST (http://www.ncbi.nlm.nih.gov/BLAST/) sequence
Altschul et al., (1990) J Mol Biol 215:403-410
Altschul et al., (1997) Nucleic Acids Res. 25:3389-3402
Pfam (http://pfam.wustl.edu/hmmsearch.shtml) sequence/structure
Bateman et al., (2000) Nucleic Acids Research 28:263-266
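The sequence identity quoted on this slide is computed from an alignment like the one shown. As a minimal sketch, the function below counts identical residues over the columns where both sequences have a residue. This denominator is one common convention and an assumption here; other definitions (for example, dividing by the shorter sequence length) give different values. The rows are copied from the slide's alignment fragment.

```python
GAP = "-"

# Rows copied from the alignment fragment on the slide.
ROWS = {
    "ADH_CLOBE": "----------MKGFAMLGINKLG---WIEKERPVAGSYDAIVRPLAVSPCTSDIHTVFEGA-------",
    "ADH_THEBR": "----------MKGFAMLSIGKVG---WIEKEKPAPGPFDAIVRPLAVAPCTSDIHTVFEGA-------",
    "ADH1_SOLTU": "MSTTVGQVIRCKAAVAWEAGKP--LVMEEVDVAPPQKMEVRLKILYTSLCHTDVYFWEAKG-------",
    "ADH2_LYCES": "MSTTVGQVIRCKAAVAWEAGKP--LVMEEVDVAPPQKMEVRLKILYTSLCHTDVYFWEAKG-------",
}


def percent_identity(row_a: str, row_b: str) -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    aligned = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == GAP or b == GAP:
            continue  # skip columns with a gap in either sequence
        aligned += 1
        if a == b:
            matches += 1
    if aligned == 0:
        return 0.0
    return 100.0 * matches / aligned


for first, second in (("ADH_CLOBE", "ADH_THEBR"), ("ADH_CLOBE", "ADH1_SOLTU")):
    value = percent_identity(ROWS[first], ROWS[second])
    print(f"{first} vs {second}: {value:.1f}% identity over this fragment")
```

These values describe only the fragment shown, not the full-length proteins, so they should not be compared directly with a whole-sequence identity.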
![Page 21: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/21.jpg)
Function annotation transfer from sequence through homology
Transfer by inheritance:
![Page 22: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/22.jpg)
http://www.uniprot.org/
![Page 23: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/23.jpg)
The annotation process at UniProt
PDB
![Page 24: Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis](https://reader035.vdocuments.site/reader035/viewer/2022062504/5a4d1b5b7f8b9ab0599ab32c/html5/thumbnails/24.jpg)
Open problems of “inheritance through homology”
• Not all UniProt files are GO annotated
• The optimal threshold value of sequence identity for function transfer is not known
• Proteins contain multiple domains
• Proteins can share common domains without necessarily sharing the same function
• In proteins, different combinations of shared domains lead to different biological roles
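The last three points can be made concrete by comparing proteins as ordered lists of domains rather than as whole sequences. The sketch below is an illustration only: the domain names (`domA`, `domB`, `domC`), the helper functions, and the wording of the advice are all hypothetical and do not come from any database or from the slides.

```python
from typing import Sequence, Set


def shared_domains(a: Sequence[str], b: Sequence[str]) -> Set[str]:
    """Domains present in both proteins, ignoring order and repeats."""
    return set(a) & set(b)


def same_architecture(a: Sequence[str], b: Sequence[str]) -> bool:
    """True only if both proteins have the same domains in the same order."""
    return list(a) == list(b)


def transfer_advice(query: Sequence[str], annotated: Sequence[str]) -> str:
    """A cautious rule of thumb for annotation transfer (hypothetical wording)."""
    if same_architecture(query, annotated):
        return "same domain architecture: transfer is more plausible"
    if shared_domains(query, annotated):
        return "shared domain(s) only: do not transfer the whole-protein function"
    return "no shared domains: no basis for transfer"


# Hypothetical proteins described by their domain order.
annotated_protein = ["domA", "domB"]
print(transfer_advice(["domA", "domB"], annotated_protein))
print(transfer_advice(["domA", "domC"], annotated_protein))
print(transfer_advice(["domC"], annotated_protein))
```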