

__________________________________________________________________________________________________ 12/6/2013 GCBA 815

Tools and Algorithms in Bioinformatics GCBA815, Fall 2013

Week-14: Protein Structure and PTM Analysis Tools

Babu Guda, Department of Genetics, Cell Biology and Anatomy

University of Nebraska Medical Center


Structural Bioinformatics


Binding of a drug compound to a cancer-related protein, MDM2

Human cancer-related protein MDM2 with a bound small-molecule drug compound (nutlin). MDM2 is shown as sticks; nutlin is shown as small cyan spheres (van der Waals radii).

Picture taken from BayeNetwork


Structural View of Biology

•  The function of a biological macromolecule is highly dependent on its structural conformation

•  Deciphering the double-helix structure of DNA revolutionized biological research

•  Similarly, enzymes are highly specific, and their function is regulated by the proper orientation of their active sites

•  While many proteins act as enzymes, a number of structural proteins support cellular and tissue-level infrastructure and aid in intra- and intercellular communication


Examples

•  Actin: Supports the size, shape, structure and motion of cells

•  Cadherin: Adhesive proteins that glue cells together

•  Clathrin: Vesicular trafficking

•  Collagen: About 25% of all protein in our body

•  Integrins: On the cell surface, linking cells

•  Vaults: Symmetrical shells made of vault proteins

•  PDB-101: http://www.pdb.org/pdb/101/structural_view_of_biology.do


The 20 natural amino acids


Structural Forms of Proteins

•  Primary structure: The linear amino acid sequence of the polypeptide (PP) chain, including post-translational modifications and disulfide bonds.

•  Secondary structure: Local structure of linear segments of the PP backbone atoms, without regard to the conformation of the side chains.

•  Tertiary structure: The three-dimensional arrangement of all atoms in a single PP chain.

•  Quaternary structure: The arrangement of separate PP chains (subunits) into the functional protein.

Bovine mitochondrial F1-ATPase (ATP synthase chain, heart isoform; EC 3.6.1.34). Chains α: A, B, C; chains β: D, E, F; chain γ: G

Calcium/calmodulin-dependent protein kinase


Protein Data Bank (PDB) http://www.rcsb.org/pdb

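Entries by experimental method (rows) and molecule type (columns):

| Exp. Method | Proteins | Nucleic Acids | Protein/NA Complexes | Other | Total |
|---|---|---|---|---|---|
| X-ray | 79224 | 1496 | 4125 | 4 | 84849 |
| NMR | 8949 | 1054 | 197 | 7 | 10207 |
| Electron Microscopy | 493 | 51 | 162 | 0 | 706 |
| Other | 208 | 7 | 8 | 14 | 237 |
| Total | 88874 | 2608 | 4492 | 25 | 95999 |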


Rodes, 2006


Protein structure data format: PDB


PDB IDs

•  Four-character code for the compound, case-insensitive (e.g., 2HHB)

•  Always starts with a digit, followed by three alphanumeric characters

•  Each compound may have multiple chains; a chain is denoted by the compound ID followed by ‘:’ and the chain identifier (e.g., 2HHB:A)

•  If the compound has only one chain (monomer), ‘_’ denotes the chain position (e.g., 1BBS:_)
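The ID conventions above are simple enough to check mechanically. Below is a minimal Python sketch that encodes only the rules stated on this slide; `parse_pdb_ref` and `PDB_ID` are hypothetical names, not part of any PDB tool, and the sketch assumes one-character chain identifiers.

```python
import re
from typing import Optional, Tuple

# Rule taken from the slide: four characters, the first a digit and the
# remaining three alphanumeric; IDs are case-insensitive.
PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def parse_pdb_ref(ref: str) -> Tuple[str, Optional[str]]:
    """Split '2HHB', '2hhb:A' or '1BBS:_' into (entry ID, chain ID).

    The entry ID is upper-cased because IDs are case-insensitive. A missing
    chain, or the '_' placeholder of a single-chain entry, gives None.
    """
    entry, sep, chain = ref.partition(":")
    if not PDB_ID.fullmatch(entry):
        raise ValueError(f"not a valid PDB ID: {entry!r}")
    if sep and len(chain) != 1:
        # Assumption: chain identifiers are a single character.
        raise ValueError(f"chain identifier must be one character: {chain!r}")
    if not sep or chain == "_":
        return entry.upper(), None
    return entry.upper(), chain


for ref in ["2HHB", "2hhb:A", "1BBS:_"]:
    print(ref, "->", parse_pdb_ref(ref))
```

Running it prints `2HHB -> ('2HHB', None)`, `2hhb:A -> ('2HHB', 'A')` and `1BBS:_ -> ('1BBS', None)`; an ID such as `HHB2`, which does not start with a digit, raises `ValueError`.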


Structure Alignments

•  Structural alignment involves establishing equivalences between residues in two or more proteins based on their 3-D coordinates

•  The 3-D coordinates of C-α atoms are most commonly used to calculate distances in structural alignments

[Figure: example structurally equivalent residues, LFKR / IFGR and LFKR / LWGP]
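To make the role of C-α coordinates concrete, here is a small Python sketch that pairs residues from two structures by C-α distance and reports the RMSD over the pairs. It assumes the two structures are already superimposed (full structural alignment programs also search for the best superposition); the mutual-nearest-neighbour rule, the 3.5 Å cutoff and all coordinates are invented for illustration, and only the residue labels LFKR and IFGR come from the slide's example. Consecutive C-α atoms lie about 3.8 Å apart, which is why the toy traces use that step.

```python
import math

# Hypothetical C-alpha coordinates (in Å) for two short segments, assumed to be
# already superimposed; residue labels follow the slide's LFKR / IFGR example.
SEQ_A, CA_A = "LFKR", [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
SEQ_B, CA_B = "IFGR", [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0), (20.0, 5.0, 0.0)]


def nearest(point, coords):
    """Index of the coordinate in `coords` closest to `point`."""
    return min(range(len(coords)), key=lambda k: math.dist(point, coords[k]))


def equivalences(ca_a, ca_b, cutoff=3.5):
    """Residue pairs (i, j) whose C-alpha atoms are mutual nearest neighbours
    and lie closer than `cutoff` Å (a simple, hypothetical criterion)."""
    pairs = []
    for i, a in enumerate(ca_a):
        j = nearest(a, ca_b)
        if nearest(ca_b[j], ca_a) == i and math.dist(a, ca_b[j]) < cutoff:
            pairs.append((i, j))
    return pairs


def rmsd(ca_a, ca_b, pairs):
    """Root-mean-square deviation of C-alpha positions over the given pairs."""
    total = sum(math.dist(ca_a[i], ca_b[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))


pairs = equivalences(CA_A, CA_B)
for i, j in pairs:
    print(SEQ_A[i], "<->", SEQ_B[j])
print(f"RMSD over {len(pairs)} equivalent residues: {rmsd(CA_A, CA_B, pairs):.2f} Å")
```

With these toy coordinates the first three residues pair up (L with I, F with F, K with G) at 1 Å each, giving an RMSD of 1.00 Å; the last residue of the second segment is too far away to be matched.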


Protein 3-D Visualization Tools

•  Jmol (http://jmol.sourceforge.net)

•  Simple viewer (PDB)

•  Protein workshop (PDB)

•  QuickPDB viewer (PDB)

•  DeepView - Swiss-Pdb Viewer (http://spdbv.vital-it.ch/)

•  PyMOL (http://www.pymol.org)

•  KiNG viewer (http://kinemage.biochem.duke.edu/software/king.php)

Visualization of Protein Structures

•  All alpha: Haemoglobin (1BAB); K+ channel protein (1BL8)

•  All beta: Porin (2POR)

•  Mixed alpha/beta: TIM barrel (1YPI)


Educational resources

•  PDB: http://www.rcsb.org/pdb

•  http://public.csusm.edu/jayasinghe

•  Expasy tools: http://expasy.org


Predicting Post-translational Modification (PTM) Sites of Proteins


General Method for PTM site Prediction

•  PROSITE provides consensus patterns for a number of PTM sites. However, whether a modification actually occurs depends on the structural or environmental context of the site in the protein fold

•  For this reason, methods based on regular expressions (regex) or local alignment produce a large number of false positives

•  Almost all PTM site prediction methods use artificial neural networks (ANNs) or hidden Markov models (HMMs)

•  General procedure:

•  Prepare datasets with experimentally known PTM sites

•  Separate the dataset into training and testing data

•  Train a network on the training data and evaluate it on the test data; iterate until the model is well refined

•  A sufficient number of training sequences and good-quality data are important for the success of any neural network method
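The general procedure above (labelled data, a train/test split, then iterative training) can be sketched in plain Python. To keep it short, a single-layer perceptron stands in for the multi-layer ANNs used by real predictors, the data set is synthetic, and the labelling rule (a site counts as modified when the residue three positions downstream is D or E, loosely echoing the casein kinase II consensus) is a hypothetical assumption. All function names are illustrative.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NON_ACIDIC = "".join(a for a in AMINO_ACIDS if a not in "DE")


def one_hot_window(seq, center, flank=4):
    """One-hot encode residues center-flank .. center+flank (20 values each).
    Positions past either end of the sequence stay all-zero (padding)."""
    vec = []
    for pos in range(center - flank, center + flank + 1):
        aa = seq[pos] if 0 <= pos < len(seq) else None
        vec.extend(1.0 if aa == x else 0.0 for x in AMINO_ACIDS)
    return vec


def make_toy_dataset(n, flank=4, seed=0):
    """Synthetic windows centred on S/T (needs flank >= 3). Hypothetical rule:
    label +1 when the residue at +3 is D or E, otherwise -1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        left = "".join(rng.choice(AMINO_ACIDS) for _ in range(flank))
        right = [rng.choice(AMINO_ACIDS) for _ in range(flank)]
        right[2] = rng.choice("DE") if rng.random() < 0.5 else rng.choice(NON_ACIDIC)
        seq = left + rng.choice("ST") + "".join(right)
        label = 1 if right[2] in "DE" else -1
        data.append((one_hot_window(seq, flank, flank), label))
    return data


def train_test_split(data, test_fraction=0.2, seed=1):
    """Shuffle, then hold out `test_fraction` of the examples for testing."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]


def score(model, x):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_perceptron(train, max_epochs=100):
    """Repeat passes over the training set until no window is misclassified
    (or max_epochs is reached): the 'iterate until refined' step."""
    model = ([0.0] * len(train[0][0]), 0.0)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in train:
            if y * score(model, x) <= 0:  # wrong side of the boundary: update
                w, b = model
                model = ([wi + y * xi for wi, xi in zip(w, x)], b + y)
                mistakes += 1
        if mistakes == 0:
            break
    return model


def accuracy(model, data):
    hits = sum(1 for x, y in data if (1 if score(model, x) > 0 else -1) == y)
    return hits / len(data)


data = make_toy_dataset(200)
train, test = train_test_split(data)
model = train_perceptron(train)
print(f"train accuracy: {accuracy(model, train):.2f}, test accuracy: {accuracy(model, test):.2f}")
```

Because the toy rule is linearly separable, the perceptron convergence theorem guarantees that training stops with 100% training accuracy within the epoch limit; the accuracy on the 40 held-out windows is the figure that actually measures generalization, which is why the dataset is split before training.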


Different Post-translational Modifications (PTMs)

•  Glycosylation: N-glycosylation at Asn (N) residues (NetNGlyc) and O-glycosylation (NetOGlyc)

•  Sulfation (Sulfinator)

•  Phosphorylation (NetPhos)

•  Myristoylation/palmitoylation (addition of a lipid group)

•  SUMOylation (attachment of small ubiquitin-like modifier proteins)

•  S-nitrosylation


Prediction of Phosphorylation Sites (NetPhos: http://www.cbs.dtu.dk/services/NetPhos/)

•  Protein kinases are a very large family of enzymes that catalyze phosphorylation

•  NetPhos produces neural network predictions of serine (S), threonine (T) and tyrosine (Y) phosphorylation sites in eukaryotic proteins; phosphorylation at these sites affects a multitude of cellular signaling processes

•  Phosphorylation by tyrosine (Y) kinases

•  S or T phosphorylation by casein kinase II

•  Because these patterns are very short, the amino acids surrounding a phosphorylated residue are important in determining whether a particular site can be phosphorylated
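The casein kinase II case shows why context matters. In the minimal Python sketch below, a regular expression for the CK2 consensus [ST]-x(2)-[DE] (PROSITE notation) lists every candidate S/T, and the residues around each hit are cut out as a fixed-width window, the kind of input a context-aware predictor scores. The peptide, the function name and the default window half-width of 7 are hypothetical; NetPhos's own network and window size are not reproduced here.

```python
import re

# Casein kinase II consensus [ST]-x(2)-[DE] (PROSITE notation) as a regex.
# The zero-width lookahead lets overlapping candidate sites all be reported.
CK2_SITE = re.compile(r"(?=([ST])..[DE])")


def ck2_candidates(seq, flank=7):
    """Return (1-based position, residue, context window) for every S/T that
    fits the consensus. The window holds `flank` residues on each side and is
    padded with '-' at the termini."""
    padded = "-" * flank + seq + "-" * flank
    hits = []
    for m in CK2_SITE.finditer(seq):
        i = m.start(1)  # 0-based index of the candidate S/T
        hits.append((i + 1, seq[i], padded[i : i + 2 * flank + 1]))
    return hits


# Hypothetical peptide, not a real protein.
for pos, residue, window in ck2_candidates("MSAEDSGTEEKS", flank=3):
    print(pos, residue, window)
```

For the toy peptide the sketch reports S2 (window `--MSAED`) and S6 (window `AEDSGTE`). A pattern this short matches often in real proteins, which is exactly the false-positive problem that motivates scoring the window rather than trusting the match.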


Prediction of Glycosylation Sites (NetNGlyc, NetOGlyc)

•  Glycoproteins are formed by covalent attachment of oligosaccharides to certain proteins at asparagine residues (N-glycosylation) or at serine or threonine residues (O-glycosylation).

•  They are usually exported to extracellular destinations, e.g., mucins in the alimentary tract or glycoprotein hormones of the anterior pituitary gland.

•  N-glycosylation

•  O-glycosylation: no consensus pattern; the SEA domain is associated with it
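For N-glycosylation there is a well-known consensus, the sequon, written in PROSITE notation as N-{P}-[ST]-{P}. A minimal Python sketch that scans a sequence for it follows; the sequence and function name are hypothetical. A sequon match alone is not sufficient, since many sequons are never glycosylated, which is why NetNGlyc scores the surrounding context with neural networks. O-glycosylation has no such consensus pattern, so no comparable scan applies.

```python
import re

# PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P}: Asn, any residue
# except Pro, Ser or Thr, then any residue except Pro. The lookahead allows
# overlapping sequons to be reported.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")


def n_glyc_sequons(seq):
    """Return (1-based position of the Asn, matched motif) for each sequon."""
    return [(m.start() + 1, m.group(1)) for m in N_GLYC.finditer(seq)]


print(n_glyc_sequons("MKNGSANPTLNVTQ"))  # hypothetical sequence
```

This prints `[(3, 'NGSA'), (11, 'NVTQ')]`; the Asn at position 7 is skipped because it is followed by Pro.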


Prediction of Sulfation Sites

•  Tyrosine (Y) sulfation is an important post-translational modification for proteins that pass through the secretory pathway. It regulates several protein-protein interactions and modulates the binding affinity of TM peptide receptors

•  Based on the rules described above, HMMs can be trained to recognize protein sequences whose patterns abide by these rules


Sulfinator Algorithm (http://us.expasy.org/tools/sulfinator/)

•  Sulfinator employs four different HMMs to recognize sulfated tyrosines in N-terminal (HMM-N), internal (HMM-I) and C-terminal (HMM-C) positions, and within tyrosine clusters (HMM-Y)
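Sulfinator's HMMs are trained on known sulfation sites and are not reproduced here. To show how an HMM labels a sequence, this Python sketch decodes a toy two-state model (background B and a sulfation-like region S) with the Viterbi algorithm; the reduced alphabet (acidic, Y, other) and every probability are invented for illustration.

```python
import math

# Toy two-state HMM with invented parameters (not Sulfinator's models):
# "B" = background residues, "S" = a sulfation-like acidic/tyrosine region.
STATES = ("B", "S")
START = {"B": 0.9, "S": 0.1}
TRANS = {"B": {"B": 0.9, "S": 0.1}, "S": {"B": 0.2, "S": 0.8}}
EMIT = {
    "B": {"acidic": 0.10, "Y": 0.05, "other": 0.85},
    "S": {"acidic": 0.50, "Y": 0.30, "other": 0.20},
}


def symbol(aa):
    """Collapse an amino acid into the three-symbol toy alphabet."""
    if aa in "DE":
        return "acidic"
    return "Y" if aa == "Y" else "other"


def viterbi(seq):
    """Most probable hidden-state path for `seq`, computed in log space."""
    log = math.log
    v = [{s: log(START[s]) + log(EMIT[s][symbol(seq[0])]) for s in STATES}]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for aa in seq[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: v[-1][p] + log(TRANS[p][s]))
            col[s] = v[-1][prev] + log(TRANS[prev][s]) + log(EMIT[s][symbol(aa)])
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    state = max(STATES, key=lambda s: v[-1][s])
    path = [state]
    for ptr in reversed(back):  # trace the best predecessors back to the start
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("GEYDGG"))  # hypothetical peptide
```

For the hypothetical peptide `GEYDGG` the most probable path is `BSSSBB`: the acidic residues and the tyrosine are assigned to the S state, the flanking glycines to background. Working in log space avoids numerical underflow on long sequences.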


Prediction of Protein Subcellular Localization

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=mcb.figgrp.4668

Protein Sorting in Eukaryotic Cells


ngLOC: An n-gram-based Bayesian method (King and Guda, Genome Biology, 2007)
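ngLOC is described here only as an n-gram-based Bayesian method. As a rough illustration of that idea, the Python sketch below is a generic multinomial naive Bayes classifier over peptide n-grams with Laplace smoothing. The choice n = 2, the smoothing, the toy training proteins and the class names (borrowed from the NUC and EXC abbreviations in the ngLOC proteome table) are assumptions; this is not ngLOC's published model.

```python
import math
from collections import Counter


def ngrams(seq, n=2):
    """All overlapping substrings of length n (peptide n-grams)."""
    return [seq[i : i + n] for i in range(len(seq) - n + 1)]


class NGramNaiveBayes:
    """Multinomial naive Bayes over peptide n-grams with Laplace smoothing."""

    def __init__(self, n=2):
        self.n = n
        self.counts = {}       # location -> Counter of n-grams
        self.docs = Counter()  # location -> number of training proteins
        self.vocab = set()

    def fit(self, labelled):
        """labelled: iterable of (sequence, location) pairs."""
        for seq, loc in labelled:
            grams = ngrams(seq, self.n)
            self.docs[loc] += 1
            self.counts.setdefault(loc, Counter()).update(grams)
            self.vocab.update(grams)
        return self

    def log_scores(self, seq):
        """log P(loc) + sum of log P(n-gram | loc); higher means more likely."""
        total_docs = sum(self.docs.values())
        scores = {}
        for loc, counts in self.counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            s = math.log(self.docs[loc] / total_docs)
            for g in ngrams(seq, self.n):
                s += math.log((counts[g] + 1) / denom)  # add-one smoothing
            scores[loc] = s
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


# Toy, hypothetical training proteins labelled with two localization classes.
train = [("KKRKRK", "NUC"), ("PKKKRKV", "NUC"), ("LLALLA", "EXC"), ("MLLSLLA", "EXC")]
model = NGramNaiveBayes(n=2).fit(train)
print(model.predict("KKRK"), model.predict("SLLA"))
```

The lysine/arginine-rich query is assigned to NUC and the leucine-rich one to EXC. Log probabilities are summed rather than multiplying raw probabilities, so long sequences do not underflow.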


Predicting subcellular proteomes using ngLOC

| | Yeast (S. cerevisiae) | Worm (Nematode) | Fruitfly (D. melanogaster) | Mosquito (A. gambiae) | Zebrafish (D. rerio) | Chicken (G. gallus) | Mouse (M. musculus) | Human (H. sapiens) | Range |
|---|---|---|---|---|---|---|---|---|---|
| Proteome size | 5799 | 22400 | 13649 | 15145 | 13803 | 5394 | 33043 | 38149 | |
| GO annotated | 5486 | 12357 | 9997 | 8847 | 10106 | 4363 | 23744 | 24638 | |
| % ngLOC coverage | 97.48 | 94.92 | 96.73 | 97.94 | 98.64 | 99.82 | 94.79 | 94.52 | 94.79-99.82 |
| Proteome estimated | 5653 | 21262 | 13203 | 14833 | 13616 | 5384 | 31320 | 36059 | |
| % CYT | 15.22 | 14.80 | 12.74 | 14.43 | 15.01 | 13.66 | 13.44 | 14.14 | 12.74-15.22 |
| % CSK | 1.07 | 1.19 | 1.05 | 1.11 | 1.31 | 1.24 | 1.50 | 1.48 | 1.05-1.50 |
| % END | 2.71 | 3.47 | 2.85 | 3.25 | 3.34 | 2.53 | 2.99 | 3.04 | 2.53-3.47 |
| % EXC | 8.88 | 12.60 | 12.26 | 14.28 | 9.91 | 12.65 | 11.52 | 11.71 | 8.88-14.28 |
| % GOL | 1.48 | 1.31 | 1.40 | 1.07 | 1.68 | 1.47 | 1.52 | 1.56 | 1.07-1.68 |
| % LYS | 0.11 | 0.58 | 0.55 | 0.53 | 0.65 | 0.44 | 0.59 | 0.67 | 0.11-0.67 |
| % MIT | 9.55 | 5.84 | 4.86 | 5.52 | 4.72 | 4.16 | 4.24 | 4.80 | 4.16-9.55 |
| % NUC | 33.53 | 29.75 | 37.38 | 29.50 | 30.31 | 28.24 | 27.35 | 28.38 | 27.35-37.38 |
| % PLA | 16.19 | 24.41 | 20.06 | 21.36 | 21.66 | 22.78 | 27.18 | 24.08 | 16.19-27.18 |
| % POX | 0.54 | 0.66 | 0.42 | 0.48 | 0.51 | 0.25 | 0.44 | 0.46 | 0.25-0.66 |
| % Single-localized | 89.29 | 94.60 | 93.59 | 91.53 | 89.11 | 87.42 | 90.77 | 90.32 | |
| % Multi-localized | 10.71 | 5.40 | 6.41 | 8.47 | 10.89 | 12.58 | 9.23 | 9.68 | |
| % CYT-NUC | 6.49 | 2.36 | 2.76 | 3.44 | 5.40 | 6.27 | 4.51 | 4.74 | |


ngLOC: Web Server