computational proteomics - nbpgr...protparam tool protparam is a tool which allows the computation...

Post on 02-Oct-2020

Computational Proteomics

Proteomics Tools

Basic Tools

Primary Structure Analysis Tools

Secondary Structure Analysis Tools

Post-translation Modification Prediction Tools

Topology Prediction Tools

Pattern and Profile Searches Tools

Similarity Searches Tools

RNA Structure Prediction Tools

Protein Identification and Characterization Tools

Basic Tools

Translate Tool

Translate Tool Output:

Reverse Translate Tool

Transcription and Translation Tool

Primary Structure Analysis Tools

ProtParam Tool

ProtParam is a tool which allows the computation of various physical and chemical parameters for a given protein stored in Swiss-Prot or TrEMBL or for a user-entered sequence.

The computed parameters include the molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY).
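Three of the listed parameters are simple enough to sketch directly. The example below is an illustration only, not ProtParam's own code: it assumes the Kyte-Doolittle hydropathy scale for GRAVY and the commonly cited aliphatic index formula (mole percent of Ala + 2.9 × Val + 3.9 × (Ile + Leu)). The function names are hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values, assumed here as the scale behind GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Return the mole percent of each residue present in seq."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    comp = composition(seq)
    return (comp.get("A", 0.0)
            + 2.9 * comp.get("V", 0.0)
            + 3.9 * (comp.get("I", 0.0) + comp.get("L", 0.0)))
```

Calling gravy("AV") averages the hydropathy values of alanine and valine; a positive result indicates a hydrophobic sequence, a negative one a hydrophilic sequence.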

ProtParam Output:

ScanSite pI/MW Tool

ScanSite pI/MW Output:

RADAR (Rapid Automatic Detection and Alignments of Repeats in Protein Sequence)

RADAR Output:

COILS - Prediction of Coiled Coil Regions in Proteins

2ZIP Server

2ZIP Server Output:

ePESTfind

epestfind allows rapid and objective identification of PEST motifs in protein target sequences. Briefly, the PEST hypothesis was based on a literature survey that combined information on protein stability with protein primary sequence information. Initially, the study relied on 12 short-lived proteins with well-known properties [1], but it was continually extended later [2,3]. The initial group of proteins included E1A, c-myc, p53, c-fos, v-myb, P730 phytochrome, heat shock protein 70 (HSP 70), HMG-CoA reductase, tyrosine aminotransferase (TAT), ornithine decarboxylase (ODC), alpha-Casein and beta-Casein. Although these proteins have quite different cellular functions, it became apparent that they share high local concentrations of the amino acids proline (P), glutamic acid (E), serine (S), threonine (T) and, to a lesser extent, aspartic acid (D). From this it was concluded that PEST motifs dramatically reduce the half-lives of proteins and hence target them for proteolytic degradation.
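The idea of a "high local concentration" of P, E, S, T and D can be illustrated with a sliding window. This is a minimal sketch, not the epestfind scoring scheme: the window length, the threshold and the function names are hypothetical choices made for the example.

```python
# Residues enriched in PEST regions according to the text above.
PEST_RESIDUES = set("PESTD")


def pest_fraction(segment):
    """Fraction of residues in the segment that are P, E, S, T or D."""
    return sum(aa in PEST_RESIDUES for aa in segment) / len(segment)


def pest_rich_windows(seq, window=12, min_fraction=0.5):
    """Return (start, fraction) for every window at or above the threshold.

    window and min_fraction are illustrative defaults, not epestfind values.
    """
    hits = []
    for start in range(len(seq) - window + 1):
        frac = pest_fraction(seq[start:start + window])
        if frac >= min_fraction:
            hits.append((start, frac))
    return hits
```

Overlapping hits would normally be merged into regions before reporting; that step is left out to keep the sketch short.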

ePESTfind Output:

ProtScale

ProtScale Output:

Secondary Structure Analysis Tools

GOR IV A Secondary Structure Predicting Server

GOR IV Output:

SSP (Protein Secondary Structure)

SSP (Protein Secondary Structure) Output:

PDISORDER prediction

PDISORDER V. 1.0 is a program for predicting ordered and disordered regions in protein sequences. The minimum required sequence length is 40.

It is increasingly evident that intrinsically unstructured protein regions play key roles in cell signaling, regulation and cancer (Iakoucheva et al., J. Mol. Biol. (2002) 323, 573–584), which makes them extremely useful for the discovery of anticancer drugs. Intrinsic structural disorder has been shown to be required for many protein functions; see, for instance, Dunker et al., Biochemistry (2002) 41 (21), 6573–6582.

PDISORDER prediction Output:

Post-translation Modification Prediction Tools

PATS - Prediction of apicoplast targeted sequences

PATS Output:

NetOGlyc - Prediction of O-GalNAc (mucin type) glycosylation sites in mammalian proteins

YinOYang - O-beta-GlcNAc attachment sites in eukaryotic protein sequences

Myristoylator

Myristoylator Output:

Sulfinator

Sulfinator Output:

ProP

ProP Output:

SignalP

SignalP 3.0 server predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks and hidden Markov models.

SignalP Output:

NetAcet Server

The NetAcet 1.0 server predicts substrates of N-acetyltransferase A (NatA). The method was trained on yeast data but, as mentioned in the article describing the method, it obtains similar performance values on mammalian substrates acetylated by NatA orthologs.

NetAcet Server Output:

NetPhosYeast

NetPhosYeast 1.0 server predicts serine and threonine phosphorylation sites in yeast proteins. This service is closely related to NetPhos and NetPhosK.

NetPhosYeast Output:

DictyOGlyc1.1 Server

The DictyOGlyc server produces neural network predictions for GlcNAc O-glycosylation sites in Dictyostelium discoideum proteins.

DictyOGlyc1.1 Server Output:

NetCGlyc1.0 Server

NetCGlyc 1.0 produces neural network predictions of C-mannosylation sites in mammalian proteins.

NetCGlyc1.0 Server Output:

NetOGlyc3.1 Server

The NetOGlyc server produces neural network predictions of mucin type GalNAc O-glycosylation sites in mammalian proteins.

NetOGlyc3.1 Server Output:

NetGlycate1.0 Server

The NetGlycate 1.0 server predicts glycation of ε amino groups of lysines in mammalian proteins.

NetGlycate1.0 Server Output:

NetPicoRNA1.0 Server

The NetPicoRNA 1.0 server produces neural network predictions of cleavage sites of picornaviral proteases.

NetPicoRNA1.0 Server Output:

NetCorona1.0 Server

NetCorona predicts coronavirus 3C-like proteinase (or protease) cleavage sites using artificial neural networks on amino acid sequences. Every potential site is scored and a list is compiled in addition to a graphical representation. Refer to publication for more detailed information and performance values.

NetCorona1.0 Server Output:

Topology Prediction Tools

NetNES

NetNES Output:

TargetP

TargetP Output:

TMHMM Server

TMHMM Server Output:

Pattern and Profile Searches Tools

InterProScan

InterProScan Output:

ProDom

ProDom Output:

ScanProsite

ScanProsite Output:

Similarity Searches Tools

BLAST

WU-BLAST

WU-BLAST Output:

Fasta3

Fasta3 Output:

PropSearch

Common protein sequence alignment programs are at present not capable of detecting functional and/or structural homologs if the sequence identity is below the significance threshold of about 25%. PROPSEARCH was designed to find the putative protein family when querying a new sequence with alignment methods has failed. PROPSEARCH neglects the order of amino acid residues in a sequence and uses the amino acid composition instead. In addition, other properties such as molecular weight, content of bulky residues, content of small residues, average hydrophobicity, average charge and so on, and the content of selected dipeptide groups, are calculated from the sequence as well. 144 such properties are weighted individually and used as the query vector. The weights have been trained on a set of protein families with known structures, using a genetic algorithm. Sequences in the database are transformed into vectors as well, and the Euclidean distance between the query and each database sequence is calculated. Distances are rank-ordered, and the sequences with the lowest distance are reported at the top.
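The composition-and-distance idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not PROPSEARCH itself: it uses only the 20 amino acid fractions rather than the 144 properties, uniform weights instead of trained ones, and applies each weight to the squared difference. All function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(seq):
    """Order-independent vector of amino acid fractions."""
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def weighted_distance(u, v, weights):
    """Euclidean distance with one weight per property (assumed form)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(u, v, weights)))


def rank_by_composition(query, database, weights=None):
    """Return (distance, name) pairs, lowest distance first.

    database maps a sequence name to its sequence; weights default to 1.0,
    standing in for the trained weights described in the text.
    """
    if weights is None:
        weights = [1.0] * len(AMINO_ACIDS)
    q = composition_vector(query)
    scored = [(weighted_distance(q, composition_vector(s), weights), name)
              for name, s in database.items()]
    return sorted(scored)
```

Because the vector ignores residue order, a shuffled copy of the query has distance 0, which shows both why the approach can find relatives that alignment misses and why its hits need further checking.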

PropSearch Output:

Structure Assignment With Text Description (SAWTED)

SAWTED stands for Structure Assignment With Text Description. It is a method to improve the coverage of the detection of remote homologues of known structure by sequence searches (e.g. PSI-BLAST) and fold recognition programs. When sequence database searches return only hits with scores worse than an accepted threshold for reliability, the user will often compare what is known about the function of the query sequence with what is known about the poor-scoring hits. Some hits may appear more sensible than others and deserve closer inspection. In SAWTED this comparison is made automatically, using an algorithm to compare the text of SWISS-PROT annotations related to the query and to the poor-scoring hits. A single E-value is given for the user to assess the similarity of function. SAWTED is currently implemented to enhance PSI-BLAST searches against the PDB, and as part of our 3D-PSSM fold recognition server. We use the words from the comments (CC) and keywords (KW) sections of the SWISS-PROT entry. The score between two SWISS-PROT entries is calculated using the vector cosine model of text retrieval.
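The vector cosine model mentioned above can be sketched as follows. This is a minimal illustration, assuming plain word counts with whitespace tokenisation; SAWTED's actual term weighting and its conversion of scores to E-values are not reproduced here, and the function names are hypothetical.

```python
import math
from collections import Counter


def word_vector(text):
    """Bag-of-words vector: lowercase word -> count."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty annotation shares nothing with any other
    return dot / (norm_a * norm_b)
```

Two annotations with identical wording score 1, and annotations with no words in common score 0.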

RNA Structure Prediction Tools

RNA Secondary Structure Prediction Tool

RNA Secondary Structure Prediction Tool Output:

pknotsRG

pknotsRG is a tool for folding RNA secondary structures, including the class of simple recursive pseudoknots. The program runs in O(n^4) time and O(n^2) space; its application here on the BiBiserv is therefore limited to sequences of length up to 800 bases. The energy parameters for structures containing no pseudoknots are the same as in mfold 3.1. The energy for pseudoknots is computed with a model similar to that used by Rivas & Eddy in pknots. The folding temperature is fixed at 37 °C.

Currently, there are three different modes available:

• pknotsRG-mfe: computes the structure of the input sequence s (knotted or not) of minimal free energy.

• pknotsRG-enf: enforces a pseudoknot: it delivers the energetically best complete structure of s that includes at least one pseudoknot somewhere.

• pknotsRG-loc: delivers the best local pseudoknot that can be an element of any structure of s, where "best" is defined by free energy per nucleotide. The rest of s remains unfolded.

pknotsRG Output:

Protein Identification and Characterization Tools

FindMod

FindMod is a tool that can predict potential protein post-translational modifications (PTM) and find potential single amino acid substitutions in peptides. The experimentally measured peptide masses are compared with the theoretical peptides calculated from a specified Swiss-Prot/TrEMBL entry or from a user-entered sequence, and mass differences are used to better characterise the protein of interest.
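The mass-difference comparison can be sketched as a table lookup. This is an illustration only, not FindMod's rule set: the four monoisotopic mass shifts are well-known values, but the selection of modifications, the tolerance and the function name are assumptions made for the example.

```python
# Monoisotopic mass shifts in Da for a few common modifications.
# FindMod considers many more; this short table is illustrative.
PTM_DELTAS = {
    "phosphorylation": 79.966331,
    "acetylation": 42.010565,
    "methylation": 14.015650,
    "oxidation": 15.994915,
}


def candidate_modifications(measured, theoretical, tolerance=0.01):
    """Name the modifications whose mass shift explains the difference.

    measured and theoretical are peptide masses in Da; tolerance is a
    hypothetical absolute mass tolerance in Da.
    """
    diff = measured - theoretical
    return sorted(name for name, delta in PTM_DELTAS.items()
                  if abs(diff - delta) <= tolerance)
```

A wide tolerance returns several candidates for the same peptide, which is why the result is a list of possibilities rather than a single assignment.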

MASCOT

Mascot is a powerful search engine which uses mass spectrometry data to identify proteins from primary sequence databases. While a number of similar programs are available, Mascot is unique in that it integrates all of the proven methods of searching. These different search methods can be categorised as follows:

• Peptide Mass Fingerprint, in which the only experimental data are peptide mass values.

• Sequence Query, in which peptide mass data are combined with amino acid sequence and composition information. A super-set of a sequence tag query.

• MS/MS Ion Search, using uninterpreted MS/MS data from one or more peptides.

The general approach for all types of search is to take a small sample of the protein of interest and digest it with a proteolytic enzyme, such as trypsin. The resulting digest mixture is analysed by mass spectrometry. Different types of mass spectrometer have different capabilities. A simple instrument will measure a set of molecular weights for the intact mixture of peptides. An instrument with MS/MS capability can additionally provide structural information by recording the fragment ion spectrum of a peptide. Usually, the digest mixture will be separated by chromatography prior to MS/MS analysis, so that MS/MS spectra from individual peptides can be measured.
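The digestion step has a simple in silico counterpart, which is how theoretical peptide masses are obtained for a peptide mass fingerprint. The sketch below is not Mascot's implementation: it assumes the usual trypsin rule (cleave after K or R, but not before P), no missed cleavages and unmodified residues. The function names are hypothetical.

```python
# Monoisotopic residue masses in Da, plus one water per peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def tryptic_digest(seq):
    """Cleave after K or R unless the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mass_fingerprint(seq):
    """Theoretical (peptide, mass) list for one protein sequence."""
    return [(p, peptide_mass(p)) for p in tryptic_digest(seq)]
```

A search engine would compare such theoretical mass lists for every database entry against the measured masses and score the agreement; the scoring itself is outside this sketch.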

MASCOT Results:

pepMAPPER

Mapper is the UMIST protein search tool, which uses mass spectrometry data produced by the digestion of a protein to identify a match to a protein from a database.
