
The Anatomy and Taxonomy of Protein Structure

• First few lectures:

• how do we look at protein structures?

• how do we classify and compare them?

• Today, a little about the protein backbone or main chain.

Backbone geometry in proteins

[Figure: Ramachandran plot for 1HEW. Axes: phi and psi, each from -180° to 180°.]

Yellow and blue delineate sterically allowed conformations. Red shows residues in helical secondary structure, cyan in beta-sheet, and black other. Squares indicate glycines.

Angle ω is almost always close to 180°: the peptide bond is planar and trans. φ and ψ may vary but are limited to certain combinations, as shown at right.
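As a minimal sketch of how the backbone angles can be computed from coordinates, the function below evaluates a signed dihedral angle from four points using the standard atan2 formula. The function name and the tuple-based coordinates are illustrative assumptions, not part of the lecture.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    0 is cis, +/-180 is trans; the sign follows the usual IUPAC convention.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal to the plane of p0, p1, p2
    n2 = _cross(b2, b3)   # normal to the plane of p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For residue i, phi would be dihedral(C of i-1, N, CA, C), psi would be dihedral(N, CA, C, N of i+1), and omega would be dihedral(CA, C, N of i+1, CA of i+1).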

Hydrogen bond geometry

• A hydrogen bond is not really a covalent “bond”--there is not much orbital overlap.

• Model as an electrostatic interaction between two dipoles consisting of the H-N bond and the O sp2 lone pair. In electrostatic theory, the optimal orientation of two such dipoles is head-to-tail. The energy of such an arrangement should decrease as the head and tail are brought together as long as atomic van der Waals radii are not violated (then repulsive forces quickly take over).

• “Ideal” hydrogen bond in this model would have r ≈ 3.0 Å, p = 180°, and the two angles describing the C=O…H geometry equal to 0° and ±60°. Convince yourself of this.

• In small molecule crystals, this is approximately what is observed, though there is a lot of variation in the two C=O…H angles. Thus the precise C=O…H angle parameters are not critical.

• Main chain-main chain hydrogen bonds found in proteins will show various deviations from this geometry, partly due to the topological constraints imposed by forming secondary structures.

Criteria for identifying hydrogen bonds in protein structures

• What is a reasonable hydrogen bond? Criteria for identifying hydrogen bonds are somewhat arbitrary and many have been used. Here are a couple of examples.

• Geometric criteria: Often H-bonds are just identified by two parameters, the O…N (acceptor-donor) distance r, and an O…H-N angle p. The angles describing the C=O…H geometry are ignored. Typical cutoffs: p > 120° and r < 3.5 Å. (Baker & Hubbard, 1984)

• Electrostatic criteria: One of the most commonly used criteria is a potential function based on a pure electrostatic model (Kabsch & Sander, 1983). Place partial positive and negative charges on the C,O (+q1,-q1) and N,H (-q2,+q2) atoms and compute a binding energy as the sum of repulsive and attractive interactions between these four atoms:

E=q1q2(1/r(ON)+1/r(CH)-1/r(OH)-1/r(CN))*f

where q1=0.42e and q2=0.20e, f is a dimensional factor (=332) to convert E to kcal/mol, and r(AB) is the interatomic distance between atoms A and B.

A hydrogen bond is then identified by a binding energy less than some arbitrary cutoff, e.g. E< -0.5 kcal/mol.

• Note that the criteria defined above are only applicable when hydrogen atom positions are available. Crystal structures usually do not include hydrogens; however, their positions can be computed in many cases.
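The two kinds of criteria above can be sketched in a few lines. The constants and cutoffs are the ones quoted in the text; the function names, argument order and coordinate tuples (in Å) are assumptions made for illustration.

```python
import math

Q1, Q2, F = 0.42, 0.20, 332.0  # partial charges (units of e) and conversion factor

def hbond_energy(c, o, n, h):
    """Electrostatic H-bond energy in kcal/mol from C, O, N, H coordinates."""
    r = math.dist
    return Q1 * Q2 * (1 / r(o, n) + 1 / r(c, h) - 1 / r(o, h) - 1 / r(c, n)) * F

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    cos_t = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def is_hbond_geometric(o, n, h, r_max=3.5, p_min=120.0):
    """Geometric test: O...N distance below r_max and O...H-N angle above p_min."""
    return math.dist(o, n) < r_max and angle_deg(o, h, n) > p_min

def is_hbond_electrostatic(c, o, n, h, cutoff=-0.5):
    """Electrostatic test: energy below the (arbitrary) cutoff."""
    return hbond_energy(c, o, n, h) < cutoff
```

A linear arrangement with an O…N distance of 3.0 Å passes both tests, while the same groups moved 10 Å apart pass neither.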

Secondary Structure Identification

• Next week we’ll learn about predicting the locations of secondary structures along the amino acid sequence of a protein from the sequence information alone. To evaluate whether such a prediction is correct, one has to be able to identify secondary structures from an experimentally determined set of protein coordinates: i.e. how do you define where a secondary structure element begins and ends?

• A “trivial but difficult” problem (Richardson, 1981)

• There is no single correct algorithm for assigning secondary structure type.

• Most commonly used criteria are backbone conformation (phi,psi) and hydrogen bonding pattern.

• DSSP (Kabsch & Sander, 1983) and STRIDE (Frishman & Argos, 1995) are two of the more common programs, though there are many ways of defining secondary structure boundaries.

DSSP: turn and helix definitions

3-turn:

       >      3      3      <        notation
    -N-C-C--N-C-C--N-C-C--N-C-C-     residues
     H   O  H   O  H   O  H   O
         >----------------<          H-bond

4-turn:

       >      4      4      4      <        notation
    -N-C-C--N-C-C--N-C-C--N-C-C--N-C-C-     residues
     H   O  H   O  H   O  H   O  H   O
         >-----------------------<          H-bond

5-turn: just an elaboration of the 3- and 4-turn.

A minimal helix is two consecutive n-turns. For a minimal 4-helix from residue i to i+3:

    i-1   i    i+1  i+2  i+3  i+4     residue
     >    4    4    4    <            4-turn starting at i-1
          >    4    4    4    <       4-turn starting at i
     >    >    4    4    <    <       overlap
          H    H    H    H            helix from i to i+3

‘H’ is the notation for a residue in a 4-helix. Notice that the helix does not include the residues involved in the terminal H-bonds.

Longer helices are overlapping minimal helices.
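The turn and helix rules above can be sketched as follows. This is a simplified illustration rather than the DSSP program itself: it assumes the hydrogen bonds are already known and given as a set of pairs (i, j), meaning the C=O of residue i is bonded to the N-H of residue j, and the function name is hypothetical.

```python
def assign_helix(n_res, hbonds, n=4):
    """Mark residues in n-helices with 'H', others with '-'.

    An n-turn starts at residue i when (i, i + n) is in hbonds. Two
    consecutive n-turns starting at i-1 and i define a minimal helix
    covering residues i .. i+n-1.
    """
    turn = [(i, i + n) in hbonds for i in range(n_res)]
    state = ["-"] * n_res
    for i in range(1, n_res):
        if turn[i - 1] and turn[i]:
            for k in range(i, min(i + n, n_res)):
                state[k] = "H"
    return "".join(state)
```

With two consecutive 4-turns starting at residues 0 and 1, the residues 1 to 4 are marked as helix, and the residues at the ends of the terminal H-bonds (0 and 5) are left out. Adding a third consecutive turn extends the helix by one residue, which is the sense in which longer helices are overlapping minimal helices.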

DSSP: bridge, ladder and sheet definitions

parallel bridge (‘x’ notation):

    -N-C-C--N-C-C--N-C-C-     residues
     H   O  H   O  H   O
         \  .   .  /          H-bonds
          \.     ./           (\ and /,
          .\     /.            or .)
         .  \   /  .
     H   O  H   O  H   O
    -N-C-C--N-C-C--N-C-C-     residues

antiparallel bridge (‘X’ notation):

    -N-C-C--N-C-C--N-C-C-     residues
     H   O  H   O  H   O
         .  !   !  .          H-bonds
         .  !   !  .          (! or .)
         .  !   !  .
         .  !   !  .
     O   H  O   H  O   H
    -C-C-N--C-C-N--C-C-N-     residues

ladder = set of one or more consecutive bridges of identical type

sheet = set of one or more ladders connected by shared residues
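The bridge diagrams above correspond to pairs of hydrogen bonds that can be tested directly. The sketch below follows the bridge definitions given by Kabsch & Sander (1983), again assuming a precomputed set of pairs (i, j) meaning C=O of residue i bonded to N-H of residue j. The function name is hypothetical, and the sketch does not check that the two strands are far enough apart in sequence.

```python
def bridge_type(i, j, hbonds):
    """Return 'parallel', 'antiparallel' or None for residues i and j."""
    def hb(a, b):
        return (a, b) in hbonds

    # Parallel: the \ and / pair, or the dotted pair, in the diagram.
    if (hb(i - 1, j) and hb(j, i + 1)) or (hb(j - 1, i) and hb(i, j + 1)):
        return "parallel"
    # Antiparallel: the ! pair, or the dotted pair, in the diagram.
    if (hb(i, j) and hb(j, i)) or (hb(i - 1, j + 1) and hb(j - 1, i + 1)):
        return "antiparallel"
    return None
```

Consecutive bridges of the same type returned by such a test would then be merged into ladders, and ladders sharing residues into sheets.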

STRIDE (2ndary STRucture IDEntification)

• Uses what is known as a “knowledge-based” potential--we as a community of scientists know intuitively how to define secondary structures, we just can’t put our finger on it!

• So how do we quantify what we already know?

• Set of qualitative criteria--most common criteria used by crystallographers are backbone conformation and hydrogen bonding.

• “standard of truth”--collective wisdom of crystallographers--2ndary structure assignments made by crystallographers when they submitted structures to the Protein Databank.

• STRIDE makes potential energy functions for H-bonding and backbone conformation but leaves floating parameters which are adjusted to best reproduce crystallographers’ assignments.

Boundaries of a helix

[Figure: phi/psi plots for p22cro_m, axes from -180° to 180°, with residues 2, 10, 11 and 12 labelled.]

Is 10 in the helix? How about 11? How about 2?

Side chain conformation

• side chains differ in their number of degrees of conformational freedom (some don’t have any)

• but side chains of very different size can have the same number of chi angles.

Names of canonical side chain conformations

t=trans, g=gauche

name of conformation

IUPAC nomenclature: http://www.chem.qmw.ac.uk/iupac/misc/biop.html

Rotamers

• a particular combination of angles χ1, χ2, etc. for a particular residue is known as a rotamer.

• for example, for aspartate, if one considers only the canonical staggered forms, there are nine (3^2) possible rotamers: g+g-, g+g+, g-g-, g-g+, tg+, g+t, tg-, g-t, tt

• not all rotamers are equally likely.

• for example, valine prefers its t rotamer.
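The count of canonical rotamers follows from three staggered states per chi angle. A small sketch that enumerates them (the function name is an assumption for illustration):

```python
from itertools import product

STAGGERED = ("g+", "t", "g-")  # the three canonical staggered states

def canonical_rotamers(n_chi):
    """All combinations of staggered states for a side chain with n_chi chi angles."""
    return ["".join(combo) for combo in product(STAGGERED, repeat=n_chi)]
```

For a side chain with two chi angles, as in the aspartate example, this gives 3^2 = 9 names; each extra chi angle multiplies the count by three, which is why rotamer libraries have to be chosen with care.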

[Figure: distribution of valine rotamers in protein structures, plotted against χ1 from 0° to 360° (from Ponder & Richards, 1987).]

Rotamer libraries

• one of the problems in designing and modelling/predicting protein structures is how to construct an appropriate group of rotamers to represent the possible side chain conformations observed in proteins without using so many as to make the problem computationally intractable.

• such groups of rotamers are known as rotamer libraries (Ponder & Richards, 1987).

• the probability of finding a particular rotamer is affected by what the backbone angles for that residue are (phi, psi). For instance, the g+ conformation is very rarely found in a helix. Thus, backbone-dependent rotamer libraries are also sometimes used.

• We’ll delve into this in more depth in about a week when we do homology modelling

side chain rotamers are not limited to the canonical staggered forms--there are many subtly different rotamers

an “x degree rotamer” in this figure means that at least one side chain angle differs by x degrees.

from Xiang & Honig, 2001

Surface and interior of proteins

• do proteins have a lot of holes/empty space inside?

• how much of a protein’s molecular surface is in contact with the surrounding solvent (water in the case of globular, soluble proteins)?

• are certain residues more likely to be in contact with solvent than others?

• Lee & Richards, 1971; Shrake & Rupley, 1973

• First, represent atoms as spheres with appropriate van der Waals radii

• eliminate overlapping parts of spheres

• This gives a space-filling model similar to the picture at right

Calculating Solvent Accessible Surface Area

• Now roll a sphere of a given radius all around the van der Waals surface

• the sphere will not make contact with the entire van der Waals surface

• its center will trace out a continuous surface as it rolls

Now look at a cross-section: Inner surfaces here are van der Waals. Outer surface is that traced out by the center of the sphere as it rolls around the van der Waals surface. If any part of the arc around a given atom is traced out, that atom is accessible to solvent. The solvent accessible surface of the atom is defined as the sum of the arcs traced around that atom.

[Figure: cross-section showing the van der Waals surface, the solvent accessible surface, and the arc traced around an atom; there’s not much solvent accessible surface in the middle (from Lee & Richards, 1971).]
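The rolling-sphere picture can be approximated numerically in the spirit of Shrake & Rupley (1973): place test points on each atom's expanded sphere (van der Waals radius plus probe radius) and count the points that do not fall inside any other expanded sphere. The sketch below is a minimal, unoptimized version; the 1.4 Å probe radius, the number of test points and the golden-spiral point placement are assumptions, not values from the lecture.

```python
import math

def sphere_points(n):
    """n roughly evenly spaced points on the unit sphere (golden spiral)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return pts

def sasa(atoms, probe=1.4, n_points=200):
    """Per-atom solvent accessible area.

    atoms: list of ((x, y, z), vdw_radius) in angstroms.
    Returns a list of areas in square angstroms.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(atoms):
        big_r = ri + probe  # radius of the surface traced by the probe center
        exposed = 0
        for ux, uy, uz in unit:
            p = (ci[0] + big_r * ux, ci[1] + big_r * uy, ci[2] + big_r * uz)
            # The point is accessible if no other expanded sphere contains it.
            if all(math.dist(p, cj) >= rj + probe
                   for j, (cj, rj) in enumerate(atoms) if j != i):
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas
```

The cost grows with the square of the number of atoms, so a real calculation would normally restrict the inner check to nearby atoms.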

Fractional accessibility

• calculate total solvent accessible surface of protein structure (also can calculate solvent accessible surface for individual residues/sidechains within the protein)

• can also model the accessible surface area in an unfolded protein using accessible surface area calculations on model tripeptides such as Ala-X-Ala or Gly-X-Gly.

• from these we can calculate what fraction of the surface is buried (inaccessible to solvent) by virtue of being within the folded, native structure of the protein.

• this is done by dividing the accessible surface area in the native protein structure by the accessible surface in the modelled unfolded protein. That’s the fractional accessibility. The residue fractional accessibility and side chain fractional accessibility refer to the same thing calculated for individual residues/sidechains within the structure.
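The ratio described above is simple to compute once both areas are available. In this sketch the function names are hypothetical, and the caller supplies the native and unfolded-model areas; the numbers in the usage note are made up purely to show the arithmetic.

```python
def fractional_accessibility(asa_native, asa_unfolded):
    """Accessible area in the native structure divided by that in the unfolded model."""
    if asa_unfolded <= 0:
        raise ValueError("unfolded-state accessible area must be positive")
    return asa_native / asa_unfolded

def fraction_buried(asa_native, asa_unfolded):
    """Fraction of the unfolded-state accessible area lost on folding."""
    return 1.0 - fractional_accessibility(asa_native, asa_unfolded)
```

For example, a residue with a hypothetical 30 Å² exposed in the native structure and 120 Å² in the tripeptide model would have a fractional accessibility of 0.25.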

Accessible surface area in protein structures

• accessible surface area $A_s$ (in Å²) in native states of proteins is a non-linear function of molecular weight (Miller, Janin, Lesk & Chothia, 1987):

$A_s = 6.3\,M_r^{0.73}$

where $M_r$ is molecular weight.

This is an empirical correlation, but it comes close to the expected two-thirds power law relating surface area to volume or mass. Why is the exponent a little larger?

How much surface area is buried when a protein folds?

• estimate accessible surface area in unfolded proteins using the accessible surface areas in Gly-X-Gly or Ala-X-Ala models. This is a linear function of molecular weight

$A_t = 1.48\,M_r + 21$

• the total fractional accessibility is $A_s/A_t$, and the fraction of surface area buried is $1 - A_s/A_t$

• what fraction of surface area is typically buried for a protein of molecular weight 5000 daltons? 30,000 daltons?
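One way to explore the question above is to evaluate the two empirical fits directly. The function names below are illustrative; the coefficients are the ones quoted from Miller et al. (1987).

```python
def native_asa(mr):
    """Empirical accessible surface area of a folded protein of molecular weight mr."""
    return 6.3 * mr ** 0.73

def unfolded_asa(mr):
    """Empirical accessible surface area of the unfolded-chain model."""
    return 1.48 * mr + 21.0

def buried_fraction(mr):
    """Fraction of the unfolded-state surface that is buried on folding."""
    return 1.0 - native_asa(mr) / unfolded_asa(mr)

for mr in (5000, 30000):
    print(mr, round(buried_fraction(mr), 2))
```

Because the native-state area grows more slowly than linearly with molecular weight while the unfolded-state area grows linearly, the buried fraction rises with protein size.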

Distribution of residue fractional accessibilities

[Figure: histogram of residue fractional accessibilities (from Miller et al., 1987).]

• note that a sizable group are completely buried (hatched) or nearly completely buried

• note broad distribution among non-buried residues, and mean accessibility for non-buried residues of around 0.5

• note that few residues are completely exposed to solvent, but that fractional accessibility of >1 is possible

Buried residues in proteins

    size class   mean Mr   fraction of buried residues
                           0% ASA    5% ASA
    small          8000    0.070     0.154
    medium        16000    0.107     0.240
    large         25000    0.139     0.309
    XL            34000    0.155     0.324
    all                    0.118     0.257

• the fraction of buried residues (defined by 0% or 5% ASA cutoffs) increases as a function of molecular weight--for your average protein around 25% of the residues will be buried. These form the core.

Core of 434 cro

8% accessibility cutoff

Residue fractional accessibility correlates with free energies of transfer for amino acids between water and organic solvents

• (Miller, Janin, Lesk & Chothia, 1987)

• (Fauchere & Pliska, 1983)

• the interior of a protein is akin to a nonpolar solvent in which the nonpolar sidechains are buried. Polar sidechains, on the other hand, are usually on the surface.

Hydropathy scales

• the correlation between a residue being polar or nonpolar and its tendency to be buried is a sequence-structure relationship--a number of such relationships can be seen from examining protein structures. As we will see next week, such relationships are useful in trying to predict protein structure from amino acid sequence.

• many scientists have tried to develop hydrophobicity or hydropathy scales to quantify the tendency of residues to be buried. Most such scales are based on partitioning of the amino acid between water and some nonpolar solvent, or between the surface and interior of proteins.

Kyte-Doolittle Hydropathy

(Kyte & Doolittle, 1982)

[Figure: Kyte-Doolittle hydropathy values for the amino acids, grouped as nonpolar, polar/charged, and “on the bubble”.]
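A hydropathy scale is typically applied by averaging values over a sliding window along the sequence. The sketch below uses the published Kyte-Doolittle values; the function name and the default window length are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values, keyed by one-letter amino acid code.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

def hydropathy_profile(seq, window=7):
    """Mean hydropathy over each window of consecutive residues.

    Returns len(seq) - window + 1 values; positive values are more nonpolar.
    """
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[k:k + window]) / window
            for k in range(len(scores) - window + 1)]
```

Stretches with a high average are candidates for buried segments, while strongly negative stretches are more likely to be on the surface.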

Buried polar residues in proteins

• while most of the protein interior is made up of nonpolar side chains, the average protein will have a few buried polar residues, even ones which are capable of carrying a formal charge, e.g. Lys, Arg, Glu, Asp.

• buried charged residues are almost always paired with other charged residues to make salt bridges, or hydrogen bonded to other polar groups.

• in general, a key rule of protein structure anatomy is that you rarely see buried hydrogen bond donors/acceptors not paired to other acceptors/donors.

[Figure: buried salt bridge and hydrogen bond to main chain; labelled residues Glu35, Arg10 and Arg5.]

Cavities in proteins

• protein interiors generally have high packing densities such that not much void space is present.

• nonetheless, proteins do sometimes have interior cavities big enough to fit water molecules.
