Protein Structure Bioinformatics

Session 1: Introduction

Rehab Ahmed

CBSB, Faculty of Science, University of Khartoum

Faculty of Pharmacy, University of Khartoum

Introduction to Bioinformatics online course: IBT_2016

Introduction to Bioinformatics online course: IBT_2016 Protein Structural Bioinformatics, Trainer: Rehab Ahmed

Learning Objectives

• To recap some basics of amino acids and proteins

• To study the different levels of protein structures

• To shed light on how protein structures are determined.

• To learn about some relevant databases, file formats and file viewers.

Learning Outcomes

By the end of this session and practical, students are expected to be able to:

• Explore some resources and tools in the PDB database.

• Use some web servers to predict protein secondary structure.

Structure of Amino Acid

https://www.mun.ca/biology/scarr/iGen3_06-01.html

Aliphatic R Groups

• Name

• 3 letter

• One letter

Aromatic R Groups

Sulfur-containing R Groups

Side Chains with Polar Alcohol Groups

Basic R Groups

Acidic R Groups

http://iweb.langara.bc.ca/biology/mario/Biol2315notes/biol2315chap3.html

https://online.science.psu.edu/sites/default/files/biol110/tutorial16_R_groups.jpg

Molecular interactions: Bonds and protein structures

https://researchpeptides.com/images/misc/peptide-bond-animation.gif

Intermolecular Forces

• Dipole interactions

• Hydrogen bonds

• van der Waals forces

• hydrophobic interactions

• Others.

http://www.chem.ucla.edu/~harding/IGOC/D/disulfide_bridge.html

https://researchpeptides.com/images/misc/Structures-Proteins.jpg

Structure is instructed in the sequence!!

• Anfinsen's dogma

Christian B. Anfinsen (1916–1995), U.S. biochemist; Nobel Prize in Chemistry 1972.

• Anfinsen CB (1973). "Principles that Govern the Folding of Protein Chains". Science. 181 (4096): 223–230.

https://online.science.psu.edu/biol011_sandbox_7239/node/7390

Secondary structure

α-helix

Linus Pauling (1901–1994), Nobel Prizes in Chemistry and Peace

Other types of helices

• Alpha helix: hydrogen bonds between residues (i, i+4)

• Others:

- 3-10 helix: (i, i+3)

- π-helix: (i, i+5)
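The (i, i+n) notation above can be made concrete with a small sketch. It lists the idealized backbone hydrogen-bond partners (C=O of residue i to N-H of residue i+n) for a helical segment. The function name and the residue numbering are illustrative assumptions, not part of the original slides, and real helices deviate from this ideal pattern.

```python
# Offsets n for the backbone hydrogen bond between residue i and residue i+n,
# as listed on the slide.
HELIX_OFFSETS = {"3-10 helix": 3, "alpha helix": 4, "pi helix": 5}


def hbond_pairs(helix_type, start, length):
    """Return idealized (i, i+n) residue pairs for one helical segment.

    start  -- number of the first residue in the segment (hypothetical numbering)
    length -- number of residues in the segment
    """
    n = HELIX_OFFSETS[helix_type]
    end = start + length - 1
    # Residue i can only pair with i+n if i+n is still inside the segment.
    return [(i, i + n) for i in range(start, end - n + 1)]


if __name__ == "__main__":
    for helix_type in HELIX_OFFSETS:
        print(helix_type, hbond_pairs(helix_type, start=1, length=8))
```

For the same eight-residue segment, the 3-10 helix yields more pairs than the α-helix, and the π-helix fewer, simply because the offset n changes.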

https://en.wikipedia.org/wiki/File:Pi-helix_within_an_alpha-helix.jpg

Beta Strands (β-strands)

Parallel and anti-parallel Beta sheets

Hairpin

Crossover

Loops/turns

Motifs in Proteins (Super-Secondary Structure)

• http://swift.cmbi.ru.nl/gv/students/mtom/hmotif.jpg

Motifs in Proteins (Super-Secondary Structure)

• Psi-loop

https://en.wikipedia.org/wiki/File:5CPAgood.png

DSSP (Dictionary of protein secondary structure)

• Criteria for secondary structure.

• Programmed as a pattern-recognition process of hydrogen-bonded and geometrical features extracted from X-ray coordinates.

• Kabsch W, Sander C (1983). "Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features". Biopolymers. 22 (12): 2577–637

DSSP (Helix, Strand and loops)

DSSP (Dictionary of protein secondary structure)

Secondary structure        Symbol

Alpha helix                H

3-10 helix                 G

π-helix                    I

Beta bridge                B

Beta strand                E

Turn                       T

High curvature (bend)      S

Space/no rule applies      C
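Prediction tools usually work with three states (helix, strand, coil) rather than the eight DSSP symbols. The sketch below shows one commonly used reduction (H, G, I to helix; E, B to strand; everything else to coil). This particular mapping is an assumption for illustration; other reductions exist, and the function name is hypothetical.

```python
# One common 8-state to 3-state reduction (an assumption; conventions differ).
DSSP_TO_3STATE = {
    "H": "H",  # alpha helix
    "G": "H",  # 3-10 helix
    "I": "H",  # pi helix
    "E": "E",  # beta strand
    "B": "E",  # beta bridge
    "T": "C",  # turn
    "S": "C",  # bend (high curvature)
    "C": "C",  # no rule applies
    " ": "C",  # blank, as written by DSSP itself
}


def reduce_dssp(ss_string):
    """Map a DSSP string to helix (H), strand (E), or coil (C) symbols."""
    # Unknown symbols are treated as coil rather than raising an error.
    return "".join(DSSP_TO_3STATE.get(symbol, "C") for symbol in ss_string)


if __name__ == "__main__":
    print(reduce_dssp("HHGG TTEEBS"))
```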

Experimental determination of Secondary Structure

• Spectroscopy

• UV circular dichroism (CD)

• IR Spectroscopy

• NMR

http://www.ap-lab.com/images/CD_STANDARDS.gif

Secondary structure prediction

• Early/empirical methods:

• Probabilities, and pre-computed residues preferences.

• Chou-Fasman method (~60% accurate)

• Chou PY, Fasman GD (Jan 1974). "Prediction of protein conformation". Biochemistry. 13 (2): 222–245.

• CFSSP: Chou & Fasman Secondary Structure Prediction Server

• http://www.biogem.org/tool/chou-fasman/

Secondary structure prediction

• For instance, helical propensity of residue type X

• Pα(X) = frequency (X in helix) / frequency (X)

• Pα > 1 = favours helix (e.g., Pα(Glu)=1.51)

• Pα < 1 = disfavours helix (e.g., Pα(Gly)=0.57)

Gerard J. Kleywegt’s slide

Secondary structure prediction

• Database of 2000 residues

• 100 are Alanines

• 500 residues are in a helix

• 50 alanines are in a helix

• What is the propensity for Ala to be in a helix? Is Ala a good helix former?

Gerard J. Kleywegt’s slide

Secondary structure prediction

• Pα(X) = frequency (X in helix) / frequency (X)

• Pα (Ala) = freq (Ala, α) / freq (Ala)

• freq (Ala, α) = 50/500 = 0.1

• freq (Ala) = 100/2000 = 0.05

• Pα (Ala) = 0.1/0.05 = 2.0

• Ala is a good helix former!

Gerard J. Kleywegt’s slide
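The propensity calculation above can be reproduced in a few lines. This is a minimal sketch using the counts from the worked example (2000 residues, 100 alanines, 500 helical residues, 50 helical alanines); the function and argument names are illustrative assumptions.

```python
def helix_propensity(n_x_in_helix, n_helix, n_x, n_total):
    """Propensity P_alpha(X) = freq(X in helix) / freq(X).

    n_x_in_helix -- residues of type X found in helices
    n_helix      -- all residues found in helices
    n_x          -- all residues of type X in the database
    n_total      -- all residues in the database
    """
    freq_x_in_helix = n_x_in_helix / n_helix  # share of helical residues that are X
    freq_x = n_x / n_total                    # share of all residues that are X
    return freq_x_in_helix / freq_x


if __name__ == "__main__":
    p_ala = helix_propensity(n_x_in_helix=50, n_helix=500, n_x=100, n_total=2000)
    verdict = "favours helix" if p_ala > 1 else "disfavours helix"
    print(f"P_alpha(Ala) = {p_ala:.2f} ({verdict})")
```

A value above 1 means the residue type is over-represented in helices relative to its overall abundance.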

Secondary structure prediction

• Current, machine learning-based methods employ information from multiple sequence alignments, information theory, and machine learning algorithms such as artificial neural networks and Bayesian networks, or a combination of those.

• E.g., PSIPRED:

• http://bioinf.cs.ucl.ac.uk/psipred/

Tertiary structure

• The tertiary structure is the final specific geometric shape that a protein assumes.

• It is determined by a variety of bonding interactions between the "side chains" on the amino acids.

• Interactions involved: hydrogen bonding, salt bridges, disulfide bonds, and non-polar hydrophobic interactions.

http://chemistry.elmhurst.edu/vchembook/567tertprotein.html

Methods of 3D structure Determination

Information on 3D structure can be obtained by

• X-ray crystallography,

• NMR spectroscopy, or

• Cryo-electron microscopy.

Structures determined by these methods are submitted to the Protein Data Bank (PDB) by biologists and biochemists from around the world, and are freely accessible on the Internet via the websites of its member organizations.

X-ray Crystallography

• According to the Online Dictionary of Crystallography, the term resolution is used to describe the ability to distinguish between neighboring features in an electron density map.

• The R factor is one measure of model quality: the level of agreement between the structure-factor amplitudes calculated from the model and those observed in the experiment. Values range from about 0 to 0.6.

• A value above 0.5 is considered poor quality.
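As a rough illustration of what the R factor measures, the sketch below uses the standard definition R = Σ| |F_obs| − |F_calc| | / Σ|F_obs| on made-up amplitudes. The numbers are toy values invented for the example, and the scaling between observed and calculated data that refinement programs apply is left out.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R factor from observed and calculated amplitudes.

    Both arguments are equal-length sequences of structure-factor amplitudes,
    one value per reflection. No scale factor is applied in this sketch.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


if __name__ == "__main__":
    # Toy amplitudes, invented for illustration only.
    observed = [100.0, 50.0, 25.0, 25.0]
    calculated = [90.0, 55.0, 30.0, 25.0]
    print(f"R = {r_factor(observed, calculated):.2f}")
```

A model that reproduces the observed amplitudes exactly gives R = 0; larger values mean poorer agreement.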

X-ray Crystallography

Resolution   Evaluation   Interpretation

1.2 Å        Excellent    Backbone and most side chains very clear. Some hydrogens may be resolved.

2.5 Å        Good         Backbone and many side chains clear.

3.5 Å        OK           Backbone and bulky side chains mostly clear.

5.0 Å        Poor         Backbone mostly clear; side chains not clear.

http://proteopedia.org/wiki/index.php/Resolution

Databases

wwPDB

RCSB PDB

• Repository of information about the 3D structures of large biological molecules.

• Was established in 1971 at Brookhaven National Laboratory

• Research Collaboratory for Structural Bioinformatics (RCSB) became responsible for the management of the PDB in 1998

PDB ID(s)

• A 4-character ID, e.g., 8CAT.

• Unique, immutable identifier.

• The IDs are automatically assigned and do not have meaning.
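A small sketch of how a script might check a 4-character PDB ID before using it. The pattern (a digit 1-9 followed by three letters or digits) and the download URL layout on files.rcsb.org are assumptions about current RCSB conventions rather than something stated in the slides; the function names are illustrative.

```python
import re

# Assumed shape of a classic PDB ID: a digit 1-9, then three alphanumerics.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_valid_pdb_id(candidate):
    """Return True if the string looks like a 4-character PDB ID."""
    return PDB_ID_PATTERN.fullmatch(candidate) is not None


def pdb_download_url(pdb_id):
    """Build a download URL for a PDB-format entry file (assumed URL layout)."""
    if not is_valid_pdb_id(pdb_id):
        raise ValueError(f"not a valid PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


if __name__ == "__main__":
    print(is_valid_pdb_id("8CAT"))
    print(pdb_download_url("8cat"))
```

Because the IDs carry no meaning, validation can only check the shape of the string, not whether the entry actually exists.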

Domains

• The domain is the basic building block of a protein structure

• 1- A spatially separated unit of the protein structure

• 2- May have sequence and/or structural resemblance to another protein structure or domain.

• 3- May have a specific function associated with it.

http://www.proteinstructures.com/Structure/Structure/protein-domains.html

Pfam

• Pfam 30.0 (June 2016): 16,306 entries.

• Information about protein families, represented by hidden Markov models (HMMs).

• Annotations.

• Links to other databases: RCSB PDB, CATH, SCOP, Proteopedia, etc.

CATH

The domains are classified within the CATH structural hierarchy:

• Class (C) level: classification based on secondary structure content, i.e. all alpha, all beta, a mixture of alpha and beta, or little secondary structure.

• Architecture (A) level: based on the arrangement of secondary structures in three-dimensional space.

• Topology/fold (T) level: how the secondary structure elements are connected and arranged.

• Homologous superfamily (H) level: assignments are made if there is good evidence that the domains are related by evolution, i.e. they are homologous.

• http://www.cathdb.info/wiki
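The four levels map directly onto the dotted CATH codes, where each number adds one level (C.A.T.H). The sketch below splits such a code into its levels; the code 3.40.50.300 is used only as an illustrative example, and the class names cover just the four classes described above (an assumption about naming, not taken from the slides).

```python
# Class numbers for the four classes described on the slide (assumed naming).
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


def parse_cath_code(code):
    """Split a C.A.T.H code into the identifier at each hierarchy level."""
    parts = code.split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    c, a, t, _h = parts
    return {
        "class": c,
        "class_name": CATH_CLASSES.get(int(c), "unknown"),
        "architecture": f"{c}.{a}",
        "topology": f"{c}.{a}.{t}",
        "homologous_superfamily": code,
    }


if __name__ == "__main__":
    for level, value in parse_cath_code("3.40.50.300").items():
        print(f"{level}: {value}")
```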

CATH v4.1

PDB release:      01-01-2015

Domains:          308,999

Superfamilies:    2,737

Annotated PDBs:   108,378

Proteopedia

• Wiki web resource whose pages have embedded three-dimensional structures surrounded by descriptive text.

• http://proteopedia.org/wiki/index.php/Main_Page

File formats

• Sequence files: FASTA.

• Secondary structure files: FASTA-formatted file ("ss.txt").

• PDB entry files (PDB, PDBx/mmCIF, XML).

• Small molecule files (PDB, CIF, SDF, etc.).

• Large structures are represented in PDBx/mmCIF (those containing more than 62 chains and/or more than 99,999 ATOM records).

FASTA-formatted file ("ss.txt")

>101M:A:sequence

MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRVKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGNFGADAQGAMNKALELFRKDIAAKYKELGYQG

>101M:A:secstr

HHHHHHHHHHHHHHGGGHHHHHHHHHHHHHHH GGGGGG TTTTT SHHHHHH HHHHHHHHHHHHHHHHHHTTTT HHHHHHHHHHHHHTS HHHHHHHHHHHHHHHHHH GGG SHHHHHHHHHHHHHHHHHHHHHHHHTT

>102L:A:sequence

MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL

>102L:A:secstr

HHHHHHHHH EEEEEE TTS EEEETTEEEESSS TTTHHHHHHHHHHTS TTB HHHHHHHHHHHHHHHHHHHHH TTHHHHHHHS HHHHHHHHHHHHHHHHHHHHT HHHHHHHHTT HHHHHHHHHSSHHHHHSHHHHHHHHHHHHHSSSGGG
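Records in this layout can be read with a few lines of code. The sketch below assumes headers of the form >ID:chain:sequence and >ID:chain:secstr, as in the example above, and keeps spaces in the secondary-structure line because a blank stands for "no assignment". The toy record (ID 1ABC) is invented for illustration and is not a real entry.

```python
def parse_ss_records(text):
    """Parse ss.txt-style text into {(pdb_id, chain, kind): string}.

    kind is "sequence" or "secstr", taken from the header line.
    """
    records = {}
    current_key = None
    for line in text.splitlines():
        if line.startswith(">"):
            pdb_id, chain, kind = line[1:].strip().split(":")
            current_key = (pdb_id, chain, kind)
            records[current_key] = ""
        elif current_key is not None:
            # Do not strip: spaces are meaningful in secstr lines.
            records[current_key] += line
    return records


if __name__ == "__main__":
    # Hypothetical toy record, not taken from the PDB.
    toy = ">1ABC:A:sequence\nMKTAYIAK\n>1ABC:A:secstr\n HHHHH  \n"
    parsed = parse_ss_records(toy)
    sequence = parsed[("1ABC", "A", "sequence")]
    secstr = parsed[("1ABC", "A", "secstr")]
    # Sequence and secondary structure should align residue by residue.
    for residue, state in zip(sequence, secstr):
        print(residue, repr(state))
```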

PDB File formats

Molecular Graphics Software

• Cn3D http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml

• iCn3D http://www.ncbi.nlm.nih.gov/Structure/icn3d/docs/icn3d_about.html

• UCSF Chimera http://www.cgl.ucsf.edu/chimera/index.html

• Visual molecular dynamics (VMD) http://www.ks.uiuc.edu/Research/vmd/

• PyMOL https://www.pymol.org/

• Etc…

Molecular Representation

• What do we mean by structural bioinformatics?

• Why protein structure bioinformatics?

• Structural bioinformatics is a branch of bioinformatics that deals with the structure of biological macromolecules (DNA, RNA and proteins). "Deals with" covers analysis, storage, visualization, prediction, etc.

Structural bioinformatics

• Proteins are the building blocks of all cells.

• In the world of proteins, structure = function!?

• DNA encodes life, yes! But proteins carry out life processes: replication, reproduction, defense, etc.

Why Protein Structure bioinformatics

• This first structural bioinformatics (SB) session is meant to cover some basics and fundamentals and to put us all on the same page.

Resources/References

• The Anatomy and Taxonomy of Protein Structure (by Jane S. Richardson): http://kinemage.biochem.duke.edu/teaching/anatax/

• http://www.rcsb.org/

• http://sbkb.org/

• http://www.proteinstructures.com/index.html

• http://proteopedia.org/wiki/index.php/Main_Page
