Protein Structure Prediction and Structural Genomics Computer Science Department North Dakota State University Fargo, ND

Upload: jerome-mckenzie

Post on 03-Jan-2016


Protein Structure Prediction and Structural Genomics

Computer Science Department, North Dakota State University

Fargo, ND


Outline

Structure of Protein

Prediction Methods

CASP Cup


Protein

Proteins are synthesized as linear chains of amino acids, but they quickly fold into a compact, globular structure.

Polypeptide sequence


Each amino acid has two parts, a backbone and a side chain. The side chain, R, distinguishes the different amino acids. The backbone is the same for all 20 amino acids: an amino (-NH2) group, an alpha carbon, and a carboxylic acid (-COOH) group.

Peptide bond formation:


The Amino Acids list

ID  Code  Name
1   Gly   Glycine
2   Ala   Alanine
3   Val   Valine
4   Leu   Leucine
5   Ile   Isoleucine
6   Ser   Serine
7   Thr   Threonine
8   Cys   Cysteine
9   Met   Methionine
10  Pro   Proline
11  Asp   Aspartic acid
12  Asn   Asparagine
13  Glu   Glutamic acid
14  Gln   Glutamine
15  His   Histidine
16  Lys   Lysine
17  Arg   Arginine
18  Phe   Phenylalanine
19  Tyr   Tyrosine
20  Trp   Tryptophan


Protein Primary structure:

Protein primary sequences can be written with the 3-letter codes for the 20 amino acids (above) or with 1-letter codes. Example: human insulin

A-Chain: GIVEQCCTSICSLYQLENYCN

B-Chain: FVNQHLCGSHLVEALYLVCGERGFFYTPKT
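The two notations can be converted mechanically. The following is a minimal sketch, assuming the standard 1-letter codes for the 20 amino acids listed earlier; the function names and the hyphen-separated 3-letter format are illustrative choices, not part of the slides.

```python
# Standard 3-letter -> 1-letter codes for the 20 amino acids.
THREE_TO_ONE = {
    "Gly": "G", "Ala": "A", "Val": "V", "Leu": "L", "Ile": "I",
    "Ser": "S", "Thr": "T", "Cys": "C", "Met": "M", "Pro": "P",
    "Asp": "D", "Asn": "N", "Glu": "E", "Gln": "Q", "His": "H",
    "Lys": "K", "Arg": "R", "Phe": "F", "Tyr": "Y", "Trp": "W",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_three_letter(sequence):
    """Convert a 1-letter sequence to hyphen-separated 3-letter codes."""
    return "-".join(ONE_TO_THREE[residue] for residue in sequence)


def to_one_letter(sequence):
    """Convert hyphen-separated 3-letter codes back to a 1-letter sequence."""
    return "".join(THREE_TO_ONE[code] for code in sequence.split("-"))


# Human insulin chains as given on the slide.
INSULIN_A = "GIVEQCCTSICSLYQLENYCN"
INSULIN_B = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"

print(to_three_letter(INSULIN_A[:5]))  # Gly-Ile-Val-Glu-Gln
```

A lookup table is enough here because both codes are fixed one-to-one mappings; an unknown residue raises `KeyError`, which makes typos in a sequence visible.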


Protein Secondary structure

Protein secondary structure refers to regular, repeated patterns of folding of the protein backbone.

These patterns result from regular hydrogen bonding between backbone atoms.


Protein Secondary Structure

The two most common folding patterns are the alpha helix and the beta sheet.


Figure: α-helix and antiparallel β-sheet

Two elements of secondary structure are alpha helices (φ = ψ = -60°) and beta strands (φ = -135°, ψ = 135°), which associate with other beta strands to form parallel or anti-parallel beta sheets.


Only two rotatable backbone bonds per residue

The bond between the amide nitrogen and the alpha carbon; the rotation about it is the φ (phi) angle.

The bond between the alpha carbon and the carbonyl carbon; the rotation about it is the ψ (psi) angle.

Secondary Structure
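Given atomic coordinates, the phi and psi angles are ordinary dihedral (torsion) angles over four consecutive backbone atoms: C(i-1), N(i), CA(i), C(i) for phi and N(i), CA(i), C(i), N(i+1) for psi. The sketch below computes a signed dihedral with the usual atan2 formulation; the function name and the toy coordinates are illustrative assumptions, not taken from the slides.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for rotation about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n2 = _cross(b2, b3)                      # normal of the p1-p2-p3 plane
    x = _dot(_cross(b1, b2), n2)             # cosine-like term
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)  # sine-like term
    return math.degrees(math.atan2(y, x))


# Toy geometry (not real protein coordinates): a quarter turn about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For phi, pass the coordinates of C(i-1), N(i), CA(i), C(i); for psi, pass N(i), CA(i), C(i), N(i+1).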


Protein Tertiary Structure

The final shapes of proteins are determined and stabilized by chemical bonds and forces, including weak interactions such as hydrogen bonds, ionic bonds, van der Waals interactions, and hydrophobic attractions.

Tertiary structure of ribonuclease A, a globular protein:

Alpha helices, beta sheets, and turns contribute to the ribonuclease A tertiary structure.


Protein Quaternary Structure

The arrangement of the individual subunits of a protein with multiple polypeptide subunits gives the protein its quaternary structure.

Example: hemoglobin has 2 alpha and 2 beta subunits.

Only proteins with multiple polypeptide subunits can have quaternary structure.


Different protein structure formation:


The Goal of Protein Structure Prediction

“The goal of fold assignment and comparative modeling is to assign, using computational methods, each new genome sequence to the known protein fold or structure that it most closely resembles.”

In other words, to classify structures into families that share similar folds or motifs and to construct phylogenies.


Significance

Identifying these shared structural motifs can provide significant insight into the functional mechanisms of the protein family.

“The key to understanding the inner workings of cells is to learn the structure of Proteins that form their architecture and carry out their metabolism.”

Comparing proteomics with genomics, it is fair to say that “genes were easy” and the real work of bioinformatics has just begun.


Protein Classification: Families and superfamilies

By definition, proteins that are more than 50% identical in amino acid sequence across their entire length are said to be members of a single family.

Superfamilies are groups of protein families that are related by lower but still detectable levels of sequence similarity (and therefore have a common but more ancient evolutionary origin).
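The 50% rule can be checked directly on a pairwise alignment. This is a minimal sketch, assuming the two sequences are already aligned to equal length with "-" for gaps and that identity is measured over all alignment columns; the function names and the second example sequence are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns in which both sequences share a residue."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)


def same_family(aligned_a, aligned_b, threshold=50.0):
    """Apply the 'more than 50% identical' family criterion."""
    return percent_identity(aligned_a, aligned_b) > threshold


# First ten residues of the insulin A-chain against a hypothetical variant.
print(percent_identity("GIVEQCCTSI", "GIVEQCCASV"))  # 80.0
print(same_family("GIVEQCCTSI", "GIVEQCCASV"))       # True
```

Other conventions divide by the shorter sequence length or by the number of ungapped columns, so reported identities are only comparable when the denominator is stated.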


Protein Classification: Folds

Proteins are said to have a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections. Folds are in turn grouped into broad classes, for example all-alpha proteins, all-beta proteins, alpha/beta proteins, and membrane and cell surface proteins.

In many respects, the term fold is used synonymously with structural motif but generally refers to larger combinations of secondary structures.


Protein Classification: Enzyme nomenclature

Each enzyme can be assigned a numerical code, such as 3.2.1.14, where the first number specifies the main class, the second and third numbers correspond to specific subclasses, and the final number represents the serial listing of the enzyme in its subclass.
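The four-part code is easy to handle programmatically. The sketch below splits a code such as 3.2.1.14 into its fields; the type and function names are illustrative, and the table of main class names follows the standard Enzyme Commission classification rather than anything stated on the slide.

```python
from typing import NamedTuple

# Main EC classes from the standard Enzyme Commission scheme.
EC_MAIN_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


class ECNumber(NamedTuple):
    main_class: int
    subclass: int
    sub_subclass: int
    serial: int


def parse_ec(code):
    """Split a complete EC number 'a.b.c.d' into its four integer fields."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {code!r}")
    return ECNumber(*(int(part) for part in parts))


ec = parse_ec("3.2.1.14")
print(ec, EC_MAIN_CLASSES[ec.main_class])
```

Partial codes such as `3.2.1.-` are deliberately rejected here; supporting them would need a separate representation for the unassigned fields.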


Experimental Techniques

X-ray crystallography

NMR spectroscopy

2D electrophoresis

Mass spectrometry

Protein microarrays


Two Prediction Methods

Protein Folding Model: simulates the protein folding process at various levels of abstraction, which provides insight into the forces that determine protein structure and the folding process.

No algorithm developed to date can determine the native structure of a protein accurately.

Comparative Modeling: sometimes called homology modeling, seeks to predict the structure of a target protein via comparison with the structures of related proteins.


Comparative Modeling Algorithms

DALI (Holm 1993)

STRUCTAL (Gerstein 1996)

VAST (Gibrat 1996)

MINAREA (Falicov 1996)

LOCK (Singh 1997)

3dSEARCH (Singh 1998)


Prediction Algorithm: 3dSEARCH

Designed to compute fast but approximate alignments of protein structures based on secondary structure elements alone.

The fundamental idea is to represent all secondary structure vectors from all target proteins in a large, highly redundant hash table. Each secondary structure vector from a given query structure can be simultaneously compared to the entire table.

It performed surprisingly well given the simplicity of its technique.
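The hash-table idea can be illustrated with a much-simplified geometric hashing sketch. It is not the published 3dSEARCH procedure: the key (element types, binned angle, binned midpoint distance), the bin sizes, the vote counting, and all names and coordinates below are assumptions chosen for illustration. Each secondary structure element is a tuple `(type, start_xyz, end_xyz)`.

```python
import math
from collections import defaultdict
from itertools import permutations


def pair_key(sse_a, sse_b, angle_bin=30.0, dist_bin=4.0):
    """Quantized key for a pair of SSE vectors, invariant to rotation and translation."""
    (type_a, start_a, end_a), (type_b, start_b, end_b) = sse_a, sse_b
    va = [e - s for s, e in zip(start_a, end_a)]
    vb = [e - s for s, e in zip(start_b, end_b)]
    cos = sum(x * y for x, y in zip(va, vb)) / (math.hypot(*va) * math.hypot(*vb))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos))))
    mid_a = [(s + e) / 2 for s, e in zip(start_a, end_a)]
    mid_b = [(s + e) / 2 for s, e in zip(start_b, end_b)]
    dist = math.dist(mid_a, mid_b)
    return (type_a, type_b, int(angle // angle_bin), int(dist // dist_bin))


def build_table(targets):
    """Store every ordered SSE pair of every target protein under its key."""
    table = defaultdict(set)
    for name, sses in targets.items():
        for a, b in permutations(sses, 2):
            table[pair_key(a, b)].add(name)
    return table


def query(table, sses):
    """Count, per target, how many query SSE pairs hit a matching table entry."""
    votes = defaultdict(int)
    for a, b in permutations(sses, 2):
        for name in table.get(pair_key(a, b), ()):
            votes[name] += 1
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Toy targets: two parallel helices, and an antiparallel strand pair.
targets = {
    "helix_pair": [("H", (0, 0, 0), (0, 0, 10)), ("H", (8, 0, 0), (8, 0, 10))],
    "hairpin": [("E", (0, 0, 0), (0, 0, 10)), ("E", (5, 0, 10), (5, 0, 0))],
}
table = build_table(targets)

# Query: the helix pair shifted by (1, 1, 1); the keys are unchanged by the shift.
query_sses = [("H", (1, 1, 1), (1, 1, 11)), ("H", (9, 1, 1), (9, 1, 11))]
print(query(table, query_sses))  # [('helix_pair', 2)]
```

Storing all ordered pairs makes the table redundant, but it lets each query pair be looked up in constant time regardless of how many target proteins are indexed.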


Prediction Algorithm: VAST

Aligns secondary structure elements using graph theory.

Steps of the VAST algorithm:

All element pairs (one from each protein) that have the same type are represented as nodes.

Two nodes are connected if the distances and angles between their elements agree within some threshold.

Find the maximum fully connected subgraph (clique), which gives the pairwise alignment.

Compute the alignment score as well as its P-value.
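The first three steps can be sketched as a correspondence graph plus a maximum clique search. This is an illustrative reduction, not VAST itself: the geometric test (angle and midpoint distance between element vectors), the thresholds, the plain Bron-Kerbosch search, and the toy coordinates are all assumptions, and the scoring and P-value step is omitted. Each element is a tuple `(type, start_xyz, end_xyz)`.

```python
import math
from itertools import combinations


def _geometry(sse_a, sse_b):
    """Angle in degrees and midpoint distance between two SSE vectors."""
    (_, s1, e1), (_, s2, e2) = sse_a, sse_b
    v1 = [e - s for s, e in zip(s1, e1)]
    v2 = [e - s for s, e in zip(s2, e2)]
    cos = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos))))
    dist = math.dist([(s + e) / 2 for s, e in zip(s1, e1)],
                     [(s + e) / 2 for s, e in zip(s2, e2)])
    return angle, dist


def correspondence_graph(prot_a, prot_b, max_angle=20.0, max_dist=2.0):
    """Nodes are same-type element pairs; edges join geometrically compatible nodes."""
    nodes = [(i, j) for i, a in enumerate(prot_a)
             for j, b in enumerate(prot_b) if a[0] == b[0]]
    edges = {node: set() for node in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # an element may be matched only once
        angle_a, dist_a = _geometry(prot_a[i], prot_a[k])
        angle_b, dist_b = _geometry(prot_b[j], prot_b[l])
        if abs(angle_a - angle_b) <= max_angle and abs(dist_a - dist_b) <= max_dist:
            edges[(i, j)].add((k, l))
            edges[(k, l)].add((i, j))
    return edges


def max_clique(edges):
    """Largest fully connected node set, found by basic Bron-Kerbosch recursion."""
    best = []

    def expand(clique, candidates, excluded):
        nonlocal best
        if not candidates and not excluded:
            if len(clique) > len(best):
                best = clique
            return
        for node in list(candidates):
            expand(clique + [node], candidates & edges[node], excluded & edges[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand([], set(edges), set())
    return sorted(best)


# Toy proteins: B is A shifted by (10, 10, 10) with one extra strand inserted.
prot_a = [("H", (0, 0, 0), (0, 0, 10)), ("E", (6, 0, 0), (6, 0, 8)),
          ("H", (0, 9, 10), (0, 9, 0))]
prot_b = [("H", (10, 10, 10), (10, 10, 20)), ("E", (16, 10, 10), (16, 10, 18)),
          ("E", (40, 40, 40), (40, 48, 40)), ("H", (10, 19, 20), (10, 19, 10))]
print(max_clique(correspondence_graph(prot_a, prot_b)))  # [(0, 0), (1, 1), (2, 3)]
```

Maximum clique search is exponential in the worst case; it stays practical here because a protein has only a handful of secondary structure elements.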


Prediction Algorithm: DALI

Attempts to find the optimal set of similar contact patterns in the 2-D distance matrices of the two proteins.

Uses a branch-and-bound algorithm to find an approximate solution.
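The distance-matrix representation is useful because it does not change when a structure is rotated or translated, so two proteins can be compared without superposing them first. The sketch below builds such a matrix from Cα coordinates and compares two matrices over a given residue correspondence; the mean absolute difference is an illustrative stand-in rather than DALI's actual score, the search for the best correspondence is not shown, and the coordinates are toy values.

```python
import math


def distance_matrix(ca_coords):
    """All-against-all distances between the C-alpha atoms of one protein."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]


def pattern_difference(dm_a, dm_b, idx_a, idx_b):
    """Mean absolute difference between matched entries of two distance matrices.

    idx_a[k] in protein A is taken to correspond to idx_b[k] in protein B.
    """
    if len(idx_a) != len(idx_b) or len(idx_a) < 2:
        raise ValueError("need two equally long lists of at least two residues")
    total, count = 0.0, 0
    for p in range(len(idx_a)):
        for q in range(p + 1, len(idx_a)):
            total += abs(dm_a[idx_a[p]][idx_a[q]] - dm_b[idx_b[p]][idx_b[q]])
            count += 1
    return total / count


# Toy coordinates: B is A rotated 90 degrees about z and shifted by (1, 2, 3).
coords_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
coords_b = [(1.0, 2.0, 3.0), (1.0, 5.8, 3.0), (-2.8, 5.8, 3.0)]
dm_a, dm_b = distance_matrix(coords_a), distance_matrix(coords_b)
print(pattern_difference(dm_a, dm_b, [0, 1, 2], [0, 1, 2]))  # close to 0
```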


Prediction Algorithm: STRUCTAL

Minimizes the root-mean-square deviation (RMSD) between two protein backbones.

Uses dynamic programming to find the residue alignment that achieves the minimum.
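The dynamic programming step can be sketched as a global alignment over a similarity matrix derived from inter-atomic distances. The sketch assumes the two backbones are already superposed and performs a single pass; a full method would alternate alignment and re-superposition. The similarity form `max_score / (1 + (d / d0)^2)` and the values 20, 2.24 and gap penalty 10 are assumptions used for illustration, as are the function name and toy coordinates.

```python
import math


def align_by_distance(coords_a, coords_b, max_score=20.0, d0=2.24, gap=10.0):
    """Global DP alignment of two superposed C-alpha traces; returns (score, pairs)."""
    n, m = len(coords_a), len(coords_b)

    def sim(i, j):
        d = math.dist(coords_a[i], coords_b[j])
        return max_score / (1.0 + (d / d0) ** 2)

    # score[i][j]: best score for the first i residues of A and first j of B.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap * i
    for j in range(1, m + 1):
        score[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sim(i - 1, j - 1),
                              score[i - 1][j] - gap,
                              score[i][j - 1] - gap)

    # Trace back from the corner to recover the matched residue pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + sim(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Toy traces: B is A with its second residue deleted.
trace_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
trace_b = [(0.0, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
print(align_by_distance(trace_a, trace_b))
```

In the toy case the alignment skips the deleted residue instead of pairing atoms that are 3.8 apart, because three exact matches minus one gap outscore any ungapped pairing.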


Prediction Algorithm: MINAREA

Computes a triangulation between the Cα atoms of the two proteins that minimizes the stretched surface area between their backbones.

Uses dynamic programming (DP) to find the minimum.


Prediction Algorithm: LOCK

Attempts to find the optimal rigid-body superposition of two structures such that the root-mean-square deviation (RMSD) between the aligned Cα atoms is minimized.

An iterative approach that performs a greedy search to the nearest local minimum in alignment space.
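The quantity being minimized is simple to compute once atoms are paired. The sketch below evaluates the RMSD over paired Cα atoms and applies only the translational part of a rigid-body superposition, which amounts to moving both centroids to the origin. The rotational part (for example via the Kabsch algorithm) and the greedy iterative search are left out; the names and coordinates are illustrative assumptions.

```python
import math


def centroid(coords):
    """Mean position of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(point[k] for point in coords) / n for k in range(3))


def center(coords):
    """Translate the points so that their centroid sits at the origin."""
    cx, cy, cz = centroid(coords)
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over paired atoms, without any fitting."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equally long, non-empty coordinate lists")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Toy coordinates: B is A shifted by (5, -2, 1).
atoms_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
atoms_b = [(5.0, -2.0, 1.0), (8.8, -2.0, 1.0), (8.8, 1.8, 1.0)]
print(rmsd(atoms_a, atoms_b))                  # about 5.48 before centering
print(rmsd(center(atoms_a), center(atoms_b)))  # close to 0 after centering
```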


Gold Standard for Evaluation

The SCOP database is widely used and has been recognized as a current standard in structural classification. (http://pdb.wehi.edu.au/scop)

It has been constructed by visual inspection of all structures in the Protein Data Bank (PDB).

It has four levels: ‘class’, ‘fold’, ‘superfamily’, and ‘family’. A ‘class’ groups structures with similar overall secondary structure content.
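When SCOP serves as the reference, a predicted structural match is usually judged by the deepest level the two domains share. The sketch below assumes domains are labeled with dot-separated identifiers of the form class.fold.superfamily.family, such as a.1.1.2; the type and function names are illustrative.

```python
from typing import NamedTuple

LEVELS = ("class", "fold", "superfamily", "family")


class ScopId(NamedTuple):
    cls: str
    fold: int
    superfamily: int
    family: int


def parse_sccs(sccs):
    """Split an identifier such as 'a.1.1.2' into its four levels."""
    parts = sccs.strip().split(".")
    if len(parts) != 4 or not parts[0].isalpha():
        raise ValueError(f"unexpected identifier: {sccs!r}")
    return ScopId(parts[0], int(parts[1]), int(parts[2]), int(parts[3]))


def deepest_shared_level(sccs_a, sccs_b):
    """Name of the deepest level two domains have in common, or None."""
    shared = None
    for level, x, y in zip(LEVELS, parse_sccs(sccs_a), parse_sccs(sccs_b)):
        if x != y:
            break
        shared = level
    return shared


print(deepest_shared_level("a.1.1.2", "a.1.1.3"))  # superfamily
print(deepest_shared_level("a.1.1.2", "b.1.1.2"))  # None
```

An evaluation can then count a hit as correct when the shared level is at least "fold" or "superfamily", depending on how strict the benchmark is meant to be.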


CASP Competition

CASP competition (Critical Assessment of Techniques for Protein Structure Prediction) http://predictioncenter.llnl.gov/

Its goal is to help advance the methods of identifying protein structure from sequence.