
ANRV345-BI77-18 ARI 6 May 2008 15:23

Analyzing Protein Interaction Networks Using Structural Information

Christina Kiel,1 Pedro Beltrao,2 and Luis Serrano1

1EMBL-CRG Systems Biology Unit, Center de Regulacio Genomica, Barcelona 08003, Spain; email: [email protected], [email protected]

2European Molecular Biology Laboratory, 69117 Heidelberg, Germany; email: [email protected]

Annu. Rev. Biochem. 2008. 77:415–41

First published online as a Review in Advance on February 5, 2008

The Annual Review of Biochemistry is online at biochem.annualreviews.org

This article’s doi: 10.1146/annurev.biochem.77.062706.133317

Copyright © 2008 by Annual Reviews. All rights reserved

0066-4154/08/0707-0415$20.00

Key Words

interface modeling, interaction types, protein complexes, structural proteomics

Abstract

Determining protein interaction networks and predicting network changes in time and space are crucial to understanding and modeling a biological system. In the past few years, the combination of experimental and computational tools has allowed great progress toward reaching this goal. Experimental methods include the large-scale determination of protein interactions using two-hybrid or pull-down analysis as well as proteomics. The latter is especially valuable when changes in protein concentrations over time are recorded. Computational tools include methods to predict and validate protein interactions on the basis of structural information and bioinformatics tools that analyze and integrate data for the same purpose. In this review, we focus on the use of structural information in combination with computational tools to predict new protein interactions, to determine which interactions are compatible with each other, to obtain some functional insight into single and multiple mutations, and to estimate equilibrium and kinetic parameters. Finally, we discuss the importance of establishing criteria to biologically validate protein interactions.


Annu. Rev. Biochem. 2008.77:415-441. Downloaded from arjournals.annualreviews.org by Universitat Pompeu Fabra on 06/05/08. For personal use only.


Sequence alignments: the arrangement of amino acid sequences from organisms in a way that aligns areas sharing common properties

Contents

1. INTRODUCTION
2. MACROVIEW OF STRUCTURE
3. HOMOLOGY INTERFACE MODELING
3.1. Quality of Template Structures
3.2. Multiple Sequence Alignments: A Tree of Methods
3.3. Limits of Structural Coverage
4. LEVELS OF PREDICTION
4.1. Protein-Protein Interactions
4.2. Domain-Peptide Interactions
4.3. Nonatomic Detail Prediction
5. STRUCTURAL INFORMATION AS A TOOL TO ANALYZE PROTEIN INTERACTION NETWORKS
6. MINING FOR BIOLOGICAL CONTEXT
6.1. Sequence-Based Methods
6.2. Domain Interaction Propensity
6.3. Graph Theory Methods
6.4. Integration of Different Methods
7. LIMITATIONS

1. INTRODUCTION

Determining protein-protein and protein-ligand (i.e., DNA, RNA, and small-molecule) interactions is essential if we want a systems biology understanding of living organisms and/or cells. Much effort has been made in the past years in experimentally determining protein-protein interactions on a large scale. Among the different methods used, we mention the two-hybrid approach with all its varieties [see Rual et al. (1), Uetz et al. (2), Ito et al. (3), and Stelzl et al. (4)], the pull-down experiments using different tags (5–7), and more recently protein chips [8; reviewed in Kung & Snyder (9)]. Each of the different methods has its advantages and disadvantages, which we do not discuss here [see Cusick et al. (10), Fields (11), and Berggard et al. (12)]. In parallel, over many years different groups have attempted to predict protein-protein interactions using homology modeling and docking software. However, although we have seen significant progress in the field, the accuracy of these methods is not yet good enough to attempt large-scale prediction of interaction networks [Aloy & Russell (17)]. With the advent and progress of different structural genomics projects, a new methodology, complementary to the above activities and termed “interface modeling,” was developed. This methodology depends on the availability of several good-quality three-dimensional (3D) structures, on correct (structure-based) sequence alignments, and on careful structural inspection of domains and sequences (16). Essentially, it needs the structure of a complex at high resolution, and it is based on the fact that proteins belonging to the same family, if they interact, normally do so in the same way and using similar positions (15), thus avoiding the docking problem (Figure 1) (18). Other simplifications are that only the interacting secondary structure elements are considered and the target sequences are modeled on them (19, 61; G. Fernandez, P. Beltrao, L. Serrano, submitted for publication). Even simpler versions of this method just consider the nature of the residues that form the complex interface and score the putative complex of related proteins (15). Complementary to this approach, other methodologies, such as the molecular dynamics of protein-peptide complexes, have been used to identify putative peptide ligands for a globular domain (110). However, even a successful prediction of an interaction from a biophysical point of view (e.g., the two proteins bind with nanomolar affinity) does not guarantee that the interaction is physiological. Thus, other computational


tools are needed to add some biological credibility to the predicted interactions.

[Figure 1 labels: Protein domains & structures (Dom 1, Dom 2, Dom 3); Protein complex structures (Dom A, Dom B; Domain family A, Domain family B); Protein interface modeling; Large-scale experimental interaction network; Large-scale in silico interaction network; Excluding interactions; Simultaneous interactions; Network branching; Prediction of Kd, kon, koff]

Figure 1 Summary of the main concepts discussed in this review. Structural biology in combination with computational tools can be used to validate interactions that have been found experimentally in a complex and to predict new protein interactions in silico. Furthermore, structural information is used to determine which interactions are compatible with each other and which ones exclude each other and to estimate equilibrium and kinetic constants.

Transient protein complex: short-lived protein complexes; the high dissociation rate constant makes binding usually weak

Information regarding protein interactions is often depicted in interaction network diagrams, where proteins are usually represented by dots and connections to interacting proteins are shown as lines. This concept is very powerful for exploring global properties of network topologies. However, treating proteins as dots neglects the important biophysical properties of proteins and protein complexes, which are crucial in mediating their cellular function: Do the interacting proteins form transient or stable protein complexes? Which domains mediate the interaction? What are the affinities and kinetic constants? Which interactions exclude each other and which ones can happen


simultaneously? What will be the effects of particular mutations, found in human populations, on the interactions?

Stable protein complex: a group of two or more permanently associated proteins; they are a form of quaternary structure

UBD: ubiquitin-like domain

Traditional high-throughput methods do not answer these questions; for example, they cannot distinguish between incompatible complexes sharing one or more components (13). This has prompted various groups to develop methodologies to distinguish them. For example, when doing pull-down experiments, one can tag all the elements of the complexes, and by finding particular associations, in principle, it should be possible to decide about incompatibilities (7). Other types of information, such as colocalization, can also be used to distinguish between different complexes sharing common elements (14). Alternatively, partial or total answers to these questions can be obtained from structural information, as pointed out by the pioneering work of Aloy & Russell (15, 17), who postulated that “Structural details can turn abstract system representation into models that more accurately reflect reality.” Regarding which interactions exclude each other and which ones can happen simultaneously, the M.B. Gerstein lab (104) was the first to apply structural information on a large-scale experimental network (the human proteome). By using 3D-structural exclusion, they could distinguish overlapping from nonoverlapping interfaces and thus provide insights into network evolution (104). Similarly, this information on conserved domain-domain interactions (and interface types) can be used to find possible binary interactions if experiments discover several proteins in a complex. Furthermore, testing whether domain interactions are simultaneously possible or whether they exclude each other can increase the information content of a given network, allowing subdivision into mutually excluding complexes (A. Campagna, C. Kiel & L. Serrano, manuscript in preparation). With respect to the prediction of binding parameters, the pioneering work of Schreiber and coworkers (51) showed that, using structural information, it was possible to obtain an estimate of the kon for the formation of a protein complex. This work was later followed by other groups (19) and ours, showing that quite accurate estimates of kon can be obtained for mutant variants and for members of the same families within protein complexes (C. Kiel, N. Kuemmerer, L. Serrano, manuscript in preparation). Baker’s group showed that it was possible to obtain reasonable estimates of Kd values for protein interactions (20). More recently, our group has shown that for the Ras-RBD family of protein complexes computer methods can provide quite accurate estimates of the Kd values (19). Finally, protein design algorithms, such as FoldX and others (86–91), based on X-ray structures and interface models, can reliably predict the effect of mutations on protein complex stabilities, which allows interpretation of the effect of SNP variants on protein functionality and complex formation (21).
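The separation of mutually exclusive from simultaneously possible interactions mentioned above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the interface each partner uses on a shared hub protein is available as a set of hub residue numbers. The partner names, the residue numbers, and the overlap cutoff are hypothetical; the cited studies rely on 3D structural exclusion rather than on this simplified residue-set comparison.

```python
from itertools import combinations


def classify_partner_pairs(interfaces, min_shared=1):
    """Label each pair of partners of one hub protein.

    interfaces: dict mapping partner name -> set of hub residue numbers
                contacted by that partner (hypothetical input format).
    min_shared: number of shared hub residues at which two interfaces are
                treated as overlapping (an assumed, adjustable cutoff).
    """
    labels = {}
    for a, b in combinations(sorted(interfaces), 2):
        shared = interfaces[a] & interfaces[b]
        if len(shared) >= min_shared:
            labels[(a, b)] = "mutually exclusive"
        else:
            labels[(a, b)] = "simultaneously possible"
    return labels


# Hypothetical hub with three partners; residue numbers are made up.
hub_interfaces = {
    "partner_A": {25, 26, 37, 38, 40},
    "partner_B": {37, 38, 39, 41},
    "partner_C": {101, 104, 108},
}

pair_labels = classify_partner_pairs(hub_interfaces)
for pair, label in sorted(pair_labels.items()):
    print(pair, label)
```

In this toy case, partners A and B compete for the same surface patch, whereas partner C binds elsewhere and could in principle bind together with either of them.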

In this review, we introduce and summarize the main applications of structural information and interface modeling to predict and analyze protein interaction networks, as summarized above (Figure 1), and show the strengths and weaknesses of the methodology. We do not discuss other approaches for the prediction of protein-protein interactions based on docking (18) nor the nonhomology-based structural prediction methods (22). Similarly, we do not review current methodologies for structure prediction (23).

2. MACROVIEW OF STRUCTURE

A protein (especially in eukaryotes) usually consists of many domains, and very often one domain can interact with different types of domains using different areas of its surface, as shown for the Ras-like G domain fold, which can interact with many different partner domains of various structures (Figure 2a). However, proteins belonging to the same family [e.g., ubiquitin-like domain (UBD)] usually interact in a similar way with a particular protein family (e.g., Ras G proteins) (17) (Figure 2b). In all Ras/UBD complexes,



the binding mode is similar and involves mainly two antiparallel β-sheets of the UBD and the Ras domain, respectively, as well as the first helix of the UBD (19). However, a recent large-scale structural classification of protein-protein interfaces has shown that 24% of protein hetero-interactions between homologues associate in multiple orientations (24). Thus, further information on the target protein families is needed before we can assume that they will interact the same way as their homologues.

Structural proteomics efforts have already found 700–800 different folds (25) of the predicted 1000 domain folds in nature (26) and ∼2000 of the predicted 10,000 domain-domain interaction types (25). It is estimated that more than 20 years is needed to obtain a representative structure for each interaction type (25). One important attempt to reach this goal is the so-called “Pfam5000” structural genomics effort, which aims to solve the structure of the 5000 most important domain families found in the Pfam database (27). The identification of domain families from sequences is a very mature field in bioinformatics. In Table 1, we list the current Web tools and databases used to find domains on the basis of sequence similarity, protein order/disorder prediction, domain databases and classifications, and domain interaction type databases (28–48).

Table 1 Tools to analyze protein sequences, domains, and domain interaction types

Type and name of tool                    References
Sequence similarity search
  BLAST/PSI-BLAST                        Altschul et al. (28, 29)
Protein order and disorder prediction
  GlobPlot                               Linding et al. (30)
  IUPred                                 Dosztányi et al. (31, 32)
  DisProt                                Peng et al. (33)
  VSL2                                   Peng et al. (148)
  PrDOS                                  Ishida & Kinoshita (149)
  POODLE-L                               Hirose et al. (150)
Domain databases
  SMART                                  Schultz et al. (34), Letunic et al. (35)
  Pfam                                   Bateman et al. (36)
  PROSITE                                Hulo et al. (37)
  CDD                                    Marchler-Bauer et al. (38)
Domain classification
  CATH                                   Orengo et al. (39), Pearl et al. (40)
  SCOP                                   Murzin et al. (41), Andreeva et al. (42)
Domain interaction type databases
  iPfam                                  Finn et al. (43)
  3did                                   Stein et al. (44)
  SCOPPI                                 Winter et al. (45)
  PRISM                                  Ogmen et al. (46)
  SNAPPI-DB                              Jefferson et al. (47)
  PIBASE                                 Davis & Sali (48)

RA: Ras-association domain

RBD: Ras-binding domain

However, it is not the fold per se that determines whether two domains can interact, but rather epitopes on the surface of the domains (49) (Figure 2c). Thus, although proteins of the same families interact through equivalent surfaces, there is no guarantee that any Ras protein will interact with an effector containing a UBD. It is often the case that one or two differences at key positions of the interaction surface are enough to prevent binding (e.g., Ras and Rap with RalGDS and Raf). In fact, it has been shown for Ras-effector interactions that the thermodynamic and kinetic properties can vary by orders of magnitude (50), as shown for Ras in complex with RalGDS-RA, Raf-RBD, and PLCe-RA2 (Figure 2d–f). Depending on the residues on the UBD fold, the affinities in complex with Ras can be in the nano- or micromolar range. Enthalpy (ΔH) and entropy (TΔS) contributions to binding energy can both be favorable, or large favorable enthalpy contributions are compensated by small, but unfavorable, entropy contributions. Similarly, the contributions of association and dissociation rate constants (kon and koff) to affinity (Kd = koff/kon) vary significantly. Estimates for kon can be readily obtained from the structure of the complexes (51, 52). It is not yet clear to what extent kinetic constants have an important biological role, or whether it is the Kd that is important.
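The relations between the kinetic, equilibrium, and thermodynamic quantities named above can be made concrete with a short Python sketch. It uses Kd = koff/kon from the text together with the standard relations ΔG = RT ln(Kd) (relative to a 1 M standard state) and ΔG = ΔH − TΔS. The rate constants and enthalpy/entropy values below are hypothetical illustrations, not the measured values shown in Figure 2, and the function names are introduced here only for this example.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def dissociation_constant(k_off, k_on):
    """Kd = koff / kon; with koff in s^-1 and kon in M^-1 s^-1, Kd is in M."""
    return k_off / k_on


def binding_free_energy_from_kd(kd_molar, temperature=298.15):
    """Binding free energy dG = RT ln(Kd) in kcal/mol (1 M standard state)."""
    return R_KCAL * temperature * math.log(kd_molar)


def binding_free_energy_from_parts(delta_h, t_delta_s):
    """dG = dH - T*dS, with all terms in kcal/mol."""
    return delta_h - t_delta_s


# Hypothetical rate constants (not measured values).
kd = dissociation_constant(k_off=10.0, k_on=1.0e7)  # about 1e-6 M, i.e. 1 µM
dg = binding_free_energy_from_kd(kd)
print(f"Kd = {kd:.2e} M, dG = {dg:.2f} kcal/mol")

# Hypothetical case of a favorable enthalpy partly offset by an
# unfavorable entropy term (T*dS negative).
dg_parts = binding_free_energy_from_parts(delta_h=-10.0, t_delta_s=-2.0)
print(f"dG from dH and T*dS = {dg_parts:.1f} kcal/mol")
```

The same Kd can arise from very different combinations of kon and koff, which is why the kinetic constants carry information beyond the affinity alone.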

In summary, although in the majority of cases interaction surfaces tend to be conserved between domains belonging to the same families, it is the details of the interaction that determine the binding



[Figure 2 panel labels. (a) Ras G domain fold with partner folds: PH domain, coiled coil, Ub fold, immunoglobulin like, dimer of 3-helix coiled-coils, CRIB, HEAT & armadillo repeats, armadillo repeats, cholera toxin, α + β barrel. (b) Ub domains, Ras. (c) RalGDS-RA, Ras. (d) Kd (µM) for RalGDS-RA, Raf-RBD, and PLCe-RA2. (e) ΔH and TΔS (kcal mol–1) for RalGDS-RA, Raf-RBD, and PLCe-RA2. (f) kon (µM–1 s–1) and koff (s–1) for RalGDS-RA, Raf-RBD, and PLCe-RA2.]


affinity and kinetic properties. Thus, when predicting if two proteins will interact using the structural information of a complex involving related domains, an atomic model of the complex is needed. We discuss this process below with an emphasis on the difficulties associated with interface modeling.

3. HOMOLOGY INTERFACE MODELING

Binding affinity: the strength of noncovalent chemical binding between two proteins or ligands, measured by the dissociation constant (Kd) of the complex

Loop: loops in proteins are polypeptides connecting secondary structure elements, and they can vary in size and sequence between homologous proteins

There are different methodologies to perform homology modeling [for reviews, see Goldsmith-Fischmann & Honig (53) and Ginalski (54)], and different automatic servers are available, e.g., SWISS-MODEL, WHAT IF, and MODELLER (http://swissmodel.expasy.org; http://swift.cmbi.kun.nl/WIWWWI/; http://www.salilab.org/modeller) (55–57). When the sequence homology is high and the loops have the same length, homology modeling just involves replacing the residues that differ in the template structure with those of the sequence to be modeled. This is a side chain modeling problem, which has been tackled by a number of algorithms (58–60). Side chain modeling relies on the discretization of the conformational space of each amino acid into so-called rotamer states and on a search algorithm to find the best combination of these states. This is followed by energy evaluation to see if there is any structural incompatibility of the new sequence side chains with the rest of the protein, which could indicate that the backbone of the template structure should move to accommodate the new side chains. Sometimes even a single amino acid difference can slightly change the backbone. One example of this is a UBD sequence with a proline residue in a central position of a β-strand or α-helix, which changes the backbone of these secondary structure elements (61) (see Figure 3a). Another example is the stabilization of a long loop, which is involved in the interface between an E2 and an E3 RING (really interesting new gene) domain, by a large bulky amino acid. A tryptophan residue in the E3 RING domain stabilizes the conformation of the loop, which is involved in complex formation with its E2 partner. A RING sequence with a small residue at this position cannot be modeled reliably using this complex structure as a modeling template (62) because the loop is expected to have a different conformation (Figure 3b). If placement of the new sequence on the structural template does not result in any major incompatibility, or if the problems lie far away from the interaction surface, then the model can be evaluated.
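The side chain modeling step described above (discrete rotamer states plus a search for the best combination) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the search is a brute-force enumeration, which is only feasible for a handful of positions, and the rotamer labels and energy values are invented for the example. The algorithms cited in the text (58–60) use physical force fields and far more efficient search strategies.

```python
from itertools import product


def best_rotamer_combination(rotamers, self_energy, pair_energy):
    """Exhaustively search all rotamer combinations; return the lowest-energy one.

    rotamers:    dict position -> list of rotamer labels
    self_energy: function(position, rotamer) -> float
    pair_energy: function(pos_i, rot_i, pos_j, rot_j) -> float
    """
    positions = sorted(rotamers)
    best_combo, best_energy = None, float("inf")
    for combo in product(*(rotamers[p] for p in positions)):
        # Sum of single-residue terms for this combination.
        energy = sum(self_energy(p, r) for p, r in zip(positions, combo))
        # Add pairwise terms between all residue pairs.
        for i in range(len(positions)):
            for j in range(i + 1, len(positions)):
                energy += pair_energy(positions[i], combo[i],
                                      positions[j], combo[j])
        if energy < best_energy:
            best_combo, best_energy = dict(zip(positions, combo)), energy
    return best_combo, best_energy


# Toy, made-up energies (arbitrary units) for two neighbouring positions.
toy_rotamers = {10: ["g+", "t", "g-"], 12: ["g+", "t"]}
toy_self = {(10, "g+"): 0.5, (10, "t"): 0.0, (10, "g-"): 1.0,
            (12, "g+"): 0.2, (12, "t"): 0.0}


def toy_self_energy(pos, rot):
    return toy_self[(pos, rot)]


def toy_pair_energy(pos_i, rot_i, pos_j, rot_j):
    # Pretend that two trans rotamers clash sterically.
    return 5.0 if (rot_i, rot_j) == ("t", "t") else 0.0


combo, energy = best_rotamer_combination(toy_rotamers, toy_self_energy,
                                         toy_pair_energy)
print(combo, energy)
```

Although each position individually prefers the "t" rotamer, the pairwise clash makes the combination of "t" at position 10 and "g+" at position 12 the lowest in energy, which illustrates why the states have to be optimized jointly rather than one residue at a time.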

Figure 2 Macroview of structure. (a) Different binding modes of Ras-effector interactions, where the G domain-like fold uses different areas of its surface to mediate binding to various domains. (b) Similar domains usually interact in a similar way, as shown by an overlay of four Ras-effector complexes. Ras is shown in gray, and the Ras-binding domains of RalGDS, Raf, PLCe, and PI3-kinase are in yellow, red, blue, and green, respectively. (c) The detailed amino acid contacts in the interface of the two domains determine specificity and affinity toward other domains. Ras and the Ras-binding domain of RalGDS are shown. (d) The affinities of three Ras-binding domains in complex with Ras (49). (e) The thermodynamic properties of three Ras-binding domains in complex with Ras (49). (f) The kinetic properties of three Ras-binding domains in complex with Ras. Experimental binding information for panels d, e, and f was taken from Wohlgemuth et al. (49). Abbreviations: ΔH, enthalpy; kon, association rate constant; koff, dissociation rate constant; TΔS, entropy; Ub, ubiquitin.

When the homology is low [sequence identity below 30%, the so-called 30% rule (53); see the structural explanation for this by Chung & Subbiah (63)], and/or the loops have different lengths, the backbone is locally, or globally, different from that in the reference structure. For particular domain families, the sequence identity threshold may vary. In some



protein families, very similar structures have low sequence homology, and some clear sequence rules determine the local conformation (e.g., the SH3 family), whereas in other cases, significant changes in conformation take place at the same homology level (e.g., the UBD family). In cases of low sequence identity, different methodologies are used to model backbone movements: molecular dynamics (64), fragment libraries (65–67), or the building of chimeras using different parts of proteins belonging to the same family (G. Fernandez, in preparation); however, modeling requires a careful examination of the protein family to which the target belongs.

Loops frequently create a homology modeling problem. Insertion or deletion of one or more residues, or simply changing a critical residue in a loop, can result in large conformational changes that are difficult to predict. If possible, loops that are not involved in the interaction between proteins, or between protein and ligand, should not be modeled (16). When it is necessary to model them, the best solution is to build chimeras using loop information from other members of the same family. If this is not possible, then automatic methods like MODELLER (57) or WHAT IF (56) can be used, but the results should be regarded with caution.

[Figure 3 panel labels. (a) Ras domain, Ras-binding domain, β2. (b) RING domain, UBCc domain, TRP.]

Figure 3 Examples of residues stabilizing a fold (backbone). (a) Overlay of β2 (β-strand 2) of two ubiquitin-like domains without and with a proline residue: β2 of Raf-RBD [Protein Data Bank (PDB) entry: 1GUA], and PLCe-RA2 (PDB entry: 2C5L). The β-strand of PLCe-RA2 has to bend to incorporate the proline residue. The complex of Ras and Raf-RBD is gray (PDB entry: 1GUA). (b) Stabilization of a loop by a tryptophan residue (W408) in RING domains (PDB entry: 1FBV). Ribbon representation of the cCbl-UBCH7 complex in gray (PDB entry: 1FBV). Included in this panel are the helix containing the TRP residue in 1FBV and the following loop as well as the helix of another RING domain (PDB entry: 1RMD) with a Cys residue at this position and the following loop involved in the interaction.

3.1. Quality of Template Structures

When homology modeling is used for sequences with a high level of sequence identity, success is dependent on the quality of the template structure, the quality of the rotamer library, the search engine, and the force field employed. Usually, a higher resolution of the structure results in a better prediction. Ideally, structures with a resolution better than 2.2 Å should be used. However, in some concrete cases where information from other members of the family is available, a lower resolution can be considered. One way to assess the quality of the structure of a complex is to calculate the interaction energy after refinement with a protein design algorithm. If the ΔΔG-binding energy



is not negative and is equal to or higher than the experimental data, the quality of the template is not good. In some cases, the problem could be local (for example, a particular residue with bad ϕ and ψ angles, i.e., combinations of angles resulting in van der Waals clashes), whereas in others, it could be more global.

Further validation could be done by performing an in silico alanine-scanning mutagenesis with the original X-ray structures and comparing the results with experimental mutagenesis data. If the prediction of single alanine mutants is successful, this is an indication that the X-ray structure is of high quality (the opposite is not necessarily true because local conformational changes in response to the mutation could result in mispredictions). In the case of Ras-effector interactions, alanine mutants were predicted with a correlation of 0.7/0.8 for the complexes of Ras with RalGDS-RA and Raf-RBD (68).
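The comparison between predicted and experimental alanine-scanning data described above amounts to computing a correlation coefficient over a set of mutants. A minimal Python sketch follows. The Pearson correlation is assumed here as the measure (the text does not specify which correlation was used), and the energy values are invented placeholders rather than data from the cited study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical changes in binding energy upon mutation to alanine
# (kcal/mol), one entry per mutant; values are made up.
predicted = [0.1, 1.2, 0.4, 2.0, 0.8]
experimental = [0.0, 1.0, 0.6, 1.7, 1.1]

r = pearson_r(predicted, experimental)
print(f"correlation between predicted and experimental values: {r:.2f}")
```

A high correlation supports the quality of the template, whereas, as noted above, a low correlation does not by itself prove that the structure is poor.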

A typical problem encountered when doing side chain replacement at the protein, or DNA, level is that the position of the CB (carbon atom β) in the case of amino acids, or of the N9 or N1 atoms in the case of DNA, is not always constant with respect to the backbone. This means that placement of standard side chains on a protein or DNA backbone could result in large deviations at the tip of the residue or base. This problem can be solved by always superimposing structures on the CB atoms (or N9 or N1 in the case of DNA). However, this assumes that the deviations of these atoms with respect to standard side chains are due to structure determination artifacts and are not real. If the deviations are real and if the atoms have some small capacity for displacement, then the correct side chain conformation, which fits with a small displacement of the CB atom, may be missed.

MSA: multiple sequence alignment

In any case, before doing any homology modeling, it is always recommended that the structural template be repaired. This means flipping the Asn, Gln, and His side chains back and forth by 180° to see if they are correctly placed, because in X-ray structures it is not possible to distinguish the CO and NH2 groups of Asn and Gln or the CD2 and ND1 atoms of His. Side chains should be moved slightly to eliminate van der Waals clashes, and if residues on the surface are part of the crystal contacts, other rotamers should be explored.
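The flip test for Asn, Gln, and His described above can be sketched as follows. The sketch approximates the 180° flip by exchanging the coordinates of the atoms that cannot be told apart in the electron density, using standard PDB atom names, and keeps whichever orientation scores lower with a user-supplied energy function. The coordinates, the neighbouring acceptor atom, and the toy energy function are all hypothetical; a real repair step would use the energy function of the protein design algorithm employed for the subsequent modeling.

```python
import math

# Atom-name pairs whose positions are exchanged by a 180-degree flip of the
# terminal side-chain group (standard PDB atom names).
FLIP_PAIRS = {
    "ASN": [("OD1", "ND2")],
    "GLN": [("OE1", "NE2")],
    "HIS": [("ND1", "CD2"), ("CE1", "NE2")],
}


def flipped(resname, coords):
    """Return a copy of coords (atom name -> (x, y, z)) with the flip applied."""
    new = dict(coords)
    for a, b in FLIP_PAIRS.get(resname, []):
        new[a], new[b] = coords[b], coords[a]
    return new


def repair_residue(resname, coords, energy_fn):
    """Keep whichever orientation the supplied energy function scores lower."""
    alternative = flipped(resname, coords)
    return alternative if energy_fn(alternative) < energy_fn(coords) else coords


# Hypothetical Asn side chain next to a hypothetical hydrogen-bond acceptor.
asn = {"CG": (0.0, 0.0, 0.0), "OD1": (1.2, 0.0, 0.0), "ND2": (-0.7, 1.1, 0.0)}
acceptor = (4.0, 0.0, 0.0)


def toy_energy(coords):
    # Toy score: reward placing the amide nitrogen (donor) near the acceptor.
    return math.dist(coords["ND2"], acceptor)


repaired = repair_residue("ASN", asn, toy_energy)
print(repaired)
```

In this toy case the flipped orientation is kept because it brings the amide nitrogen within hydrogen-bonding distance of the acceptor.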

NMR structures should not be used unless they have been refined to very high resolution using dipolar couplings and other techniques (69, 70). If NMR structures are used, the recommended strategy is to repair them using the same protein design algorithm that will be used later for homology modeling and to select the structure with lowest energy.

3.2. Multiple Sequence Alignments: A Tree of Methods

One of the crucial steps in homology modeling that still requires further development is the alignment of the target sequence to be modeled with the structural template. In particular, when sequence identity between the target sequence and template sequence is low (<30%), the accuracy of the alignment and of the resulting model is poor (71).
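Because the 30% sequence identity threshold recurs throughout this section, a minimal Python sketch of how percent identity can be computed from an existing pairwise alignment may be useful. The definition assumed here counts identical residues over the alignment columns in which neither sequence has a gap; other definitions (e.g., dividing by the length of the shorter sequence) exist and give different numbers. The two aligned sequences are invented for the example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned target and template fragments (same length, '-' = gap).
target = "MK-LVEALYDY"
template = "MKALVQAL-DF"

identity = percent_identity(target, template)
below_threshold = identity < 30.0  # flag alignments needing extra care
print(f"{identity:.1f}% identity, below 30% threshold: {below_threshold}")
```

An identity value computed this way depends on the alignment itself, so a poor alignment at low identity can both understate and misplace the true correspondence between target and template.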

Multiple sequence alignment (MSA) tools have been under development for decades (72, 74, 75). Wallace and colleagues (73) reported that, in 2005 alone, 20 new publications describing new methods were found in the literature. They showed that the different methodologies can be combined in a meta-alignment method (dubbed M-Coffee) by careful selection of independent methods (76).

Perhaps one of the most significant developments for the use of MSAs in homology modeling has been the incorporation of structural information [see the descriptions of 3DCoffee (77), Staccato (78), and SAlign (79)]. Given the robustness of protein structure against amino acid substitution, homologous proteins are more likely to retain structural similarity than sequence identity over time (80). For 3DCoffee, it was shown that



the inclusion of protein structures among distantly related sequences increased MSA accuracy by 4% per added structure (77). The use of sequence alignment tools for the purpose of homology modeling at low target-to-template sequence identity was recently benchmarked (81). Dalton & Jackson (81) showed that, when using the same modeling tool (MODELLER 8v2), 3D(T)Coffee + MODELLER performs best. Improvements were achieved for two- to four-template modeling, giving a significant advantage over one-template modeling.

Packages that bundle various programs and speed the creation of different modeling protocols are useful. One of the few examples of currently available packages is Biskit (82), a Python library for structural bioinformatics. Even using the best automatic sequence alignment methods, it is, nevertheless, necessary at the end to inspect the alignment manually using structural information to ensure that it is correct (83).

3.3. Limits of Structural Coverage

In 2005, Chandonia & Brenner (27) estimated that the available protein structures would cover from 35% to 51% of the proteomes of model organisms. This estimate was calculated as the fraction of proteins containing at least one domain belonging to a family that has one of its members resolved structurally. As we discussed above, to create an accurate model using homology modeling, stricter requirements are needed. Some domain families are more easily modeled than others. SH3 domains, for example, retain structural similarity at low sequence identity. For this reason, SH3 domains serve as a best-case scenario to test how much structural coverage is achievable with current homology modeling methods. We set up an automatic modeling pipeline using Biskit (82) as a wrapper for 3DCoffee (77) and MODELLER (57) to test the extensibility of current structural coverage for modeling SH3 domains. We used single- and two-template modeling of seven SH3 domains of known structure to benchmark the procedure (see Figure 4a). Using two templates and the best current practices, reasonable SH3 models were obtained when the average sequence identity of the templates was above 30%. At this threshold, 98% of the models had an average CA-RMSD (carbon α-root mean square deviation) of less than 1.5 Å, and 65% of the models had an average CA-RMSD below 1.0 Å. For single-template modeling, a higher identity threshold is required. Using single templates of 40% or better sequence identity, 88% of the models produced had a CA-RMSD below 1.5 Å, and 60% had a CA-RMSD below 1.0 Å. As expected, the deviations from the known structure were not uniformly distributed over the whole structure but were more likely to occur in the loop regions (see Figure 4b). Therefore, the larger the number of structures of a particular complex and of its protein constituents, the larger the probability of success (19, 61). We tested the structural coverage for human SH3 domains using stringent cutoffs and the 79 nonredundant (at a 95% identity cutoff) SH3 domains of known X-ray structure that were available in the Protein Data Bank (PDB). Out of a total of 453 human SH3 domains, 64 (or 14%) have been solved, and 97 (or 22%) have at least 60% identity to known structures and should be possible to model with very high accuracy. Another 140 domains (31%) have at least 40% average identity to two of these templates, which also makes a very accurate model likely. Current structural coverage and automatic homology modeling methods allow modeling of close to 67% of all human SH3 domains (see Figure 5). This is, however, a very favorable case, which likely represents an upper bound of the current possibilities for protein modeling. For the majority of domains, structural coverage is scarce, and therefore, there are sequences that cannot be modeled reliably.
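The coverage figures quoted above follow from simple arithmetic on the domain counts, which the following Python sketch reproduces. Only the counts given in the text are used; the category labels are shorthand introduced for this example. The unrounded fractions come out slightly different from the rounded percentages in the text, so the latter should be read as approximate.

```python
# Counts taken from the text: human SH3 domains by modeling category.
total_sh3 = 453
counts = {
    "solved": 64,                       # structure determined experimentally
    "single template, >=60% id": 97,    # close homologue of known structure
    "two templates, >=40% avg id": 140, # modelable with two templates
}

# Percentage of all human SH3 domains in each category.
percentages = {name: 100.0 * n / total_sh3 for name, n in counts.items()}
covered = sum(counts.values())
covered_pct = 100.0 * covered / total_sh3

for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}%")
print(f"covered in total: {covered} of {total_sh3} ({covered_pct:.1f}%)")
```

The remaining domains, roughly one third of the total, fall below these identity cutoffs and would require additional templates before they could be modeled reliably.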

Figure 4 Automatic modeling of SH3 domains. (a) Seven SH3 domains of known structure were modeled by the current best practices for homology modeling [Dalton & Jackson (81)] using either single-template (circles) or two-template modeling (diamonds). For these seven domains, most of the models produced were of good quality, even for sequence identities close to 30%. (b) For each domain, the average CA-RMSD per residue can be plotted back onto the known structure to determine which regions are harder to model. For all domains, difficulties in modeling were generally located in the loop regions, primarily the n-Src loop, the distal loop, and the tip of the RT loop. In panel b, low = 0, and high = >2. Abbreviations: CA-RMSD, Cα root mean square deviation; RMSD, root mean square deviation.

4. LEVELS OF PREDICTION

In this section, we discuss the advances in the use of empirical and energy potentials to predict protein interactions. We discuss the analysis of domain-peptide interactions separately to emphasize some of the difficulties associated with the prediction of this type of interaction.

4.1. Protein-Protein Interactions

Assuming that similar sequences have a similar fold and that domains with a similar fold interact through the same surface (17), much progress has been made in predicting protein-protein interactions on the basis of structural information (17) and homology modeling (19, 61). Figure 6 shows the different template structures that can be used, depending on the actual modeling problem. Template structures for homology modeling are generated from the original X-ray structure. How the original template structure is modified depends on the application and on the sequence similarity between the domain to be modeled and the template.

■ If the sequences are very similar, and the loops between secondary structure elements have identical lengths, the complete X-ray structure is used as a template structure (Figure 6a) (68). The predictions are very accurate because they take into consideration regions at the edge of the interface, which contribute a small amount to the overall binding energy and thus are included.

■ If sequences have loop lengths different from those in the original X-ray structure and the loops are not, or are only minimally, involved in the interaction, 3D structures are modified by deleting the structural parts not involved in the interaction (Figure 6b) (19, 61).

■ If sequences have different loop lengths and the loops contribute significantly to binding, the loops are generated using the WHAT IF library (56), or any other homology modeling tool, e.g., MODELLER (57), which contains loops of different lengths from a database of X-ray loop fragments (Figure 6c). This was successful in the prediction of the Ras-effector interaction (61).

■ If the structure of one of the partners to be modeled into the complex is known and differs from the template, or if the sequence analysis suggests important conformational changes at the interaction surface, then it is necessary to generate new templates (Figure 6d). There are two ways of doing this. One can generate complex chimeras by superimposing the known 3D structure of a single domain on a 3D complex structure. Alternatively, if there is clear evidence that particular sequences in a protein family determine certain specific local changes, a domain chimera is built by using different parts from proteins belonging to the same family. This was successful for the prediction of the interactions of SH3 domains with their target peptides (84).

[Figure 4 plots: (a) template sequence identity (%) versus CA-RMSD for single-template and two-template models; (b) CA-RMSD color scale from low (0) to high (>2).]

Figure 5 Modeling human SH3 domains. We built a phylogenetic tree for 453 human SH3 domains from the SMART database (http://smart.embl.de) and 79 nonredundant SH3 domains of known X-ray structure from the Protein Data Bank (http://www.pdb.org/pdb/home/home.do). Those domains that either have a known structure or that should be possible to model using available structures are indicated (diamonds). Legend: SH3 domains with above 60% identity to at least one known structure; human SH3 domains with above 40% average identity to two known structures; human SH3 domains that are not so easily modeled with current structural coverage.

Figure 6 Template structures for homology modeling. (a) Single-/multiple-point mutants. Complete 3D complex structures can be used if the sequences to be modeled have similar loop lengths. (b) Homology modeling without loops. To model sequences with different loop lengths, where the loops are not involved in the interaction, 3D complex structures are modified by deleting loops and secondary structural elements not involved in the interaction. (c) Homology modeling of loops. To model sequences with different loop lengths when the loops are involved in the interaction, loop template structures are generated using the WHAT IF library (see text). (d) Homology modeling with backbone movements or generating chimera. If the sequences modeled are expected to have significant backbone changes, new chimera template structures are generated by superimposing the 3D complex structures with a 3D structure of a single domain of a similar sequence. Another possibility is to generate new template structures with a changed backbone using molecular dynamics simulations.

TNF: tumor necrosis factor
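The four cases in the list above amount to a small decision procedure. The sketch below encodes it under simplifying assumptions: the three boolean inputs, the enum, and the function name are hypothetical labels introduced here for illustration, and in practice each input reflects a judgment made from alignments and structures rather than a ready-made flag.

```python
from enum import Enum


class TemplateStrategy(Enum):
    """Template preparation options, labeled after the panels of Figure 6."""
    FULL_COMPLEX = "a: use the complete complex structure"
    DELETE_LOOPS = "b: delete parts not involved in the interaction"
    MODEL_LOOPS = "c: build loops from a loop-fragment library"
    NEW_TEMPLATE = "d: build a chimera or a new backbone template"


def choose_template_strategy(same_loop_lengths: bool,
                             loops_in_interface: bool,
                             backbone_change_expected: bool) -> TemplateStrategy:
    """Pick a template preparation strategy (illustrative decision order)."""
    # Expected conformational changes at the interface override everything
    # else, because the existing template backbone is then unsuitable.
    if backbone_change_expected:
        return TemplateStrategy.NEW_TEMPLATE
    # Similar sequences with identical loop lengths: keep the whole complex.
    if same_loop_lengths:
        return TemplateStrategy.FULL_COMPLEX
    # Loop lengths differ: model the loops only if they matter for binding.
    if loops_in_interface:
        return TemplateStrategy.MODEL_LOOPS
    return TemplateStrategy.DELETE_LOOPS


if __name__ == "__main__":
    print(choose_template_strategy(False, True, False).value)
```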

Is there an ideal domain interaction type for the prediction of interactions on the basis of structural information? Ras proteins mediate their binding to effector domains using β-sheet interactions and loops. The structural flexibility in these interfaces is low because of the backbone hydrogen bond (H-bond) interactions. In fact, the main structural changes occur in helix α1 of the Ras-binding domain, which can adopt significantly different conformations depending on the complex. Thus, the accuracy of the predictions is very good, although only six different template structures have been used (61). Ligand-receptor binding in the tumor necrosis factor (TNF) family seems to be ideal for structure-based design because TNF family members adopt a very similar tertiary structure, and the diverse features of surface residues mediate the specificity between different TNF family members (85). Interactions through the formation of a β-sheet seem easy to predict, probably because the backbone-backbone H-bonds restrict the possibility of slightly different conformations in various complexes. Surfaces that involve few or no main chain H-bonds are more problematic for the simple reason that side chain mutations could slightly change the interaction geometry between molecules A and B, and therefore the number of templates required for successful prediction increases enormously. The flexibility in protein-peptide complexes is much higher, and thus prediction methods for protein-peptide interactions are discussed separately below.

Large-scale predictions were done for the Ras-effector system (61). Because in this approach the backbone is kept constant and just the side chains are replaced, the problem of possible backbone movements is taken into consideration by using several template structures to model a sequence and to calculate the binding energy. Protein design algorithms, such as FoldX (86–88), usually predict changes in affinity accurately, but the interaction energies for different protein complexes are not easily transferable into affinity constants, as derived from quantitative binding experiments. Indeed, when a set of 20 Ras proteins was modeled in complex with Ras and Rap, a low correlation between the experimental and calculated affinities was found (19). Therefore, a successful approach to analyzing the interaction energies and deciding whether two proteins interact was to select, among the models generated using different template structures, the one with the lowest interaction energy (19, 61). However, if the total energy (protein stability) of a sequence modeled with a particular template structure is very unfavorable (high energy) as a result of severe van der Waals clashes, the model should be discarded. Such a result indicates that the sequence and the template structure are not compatible, and thus the homology model will not be reliable, even though its interaction energy might be favorable. After the best model has been selected, one needs energy thresholds to decide whether this sequence will bind. These thresholds are defined by calibrating energy sum values using experimental information (61). Using this threshold information, new binding and nonbinding domains can be successfully predicted, and the remaining predictions fall into the “twilight zone” (61). The accuracy of predicting binding and nonbinding domains is very high (∼0.8) using this method (61).
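The selection procedure described above (discard incompatible models, keep the lowest interaction energy, then compare against calibrated thresholds) can be sketched as follows. This is a minimal illustration, not the published implementation: the `Model` record, the cutoff on total energy, and both threshold values are assumptions that would have to come from a calibration against experimental binding data.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Model:
    """One homology model built on one template (hypothetical record)."""
    template: str
    interaction_energy: float  # kcal/mol, more negative = more favorable
    total_energy: float        # kcal/mol, stability of the modeled sequence


def select_best_model(models: Sequence[Model],
                      max_total_energy: float) -> Optional[Model]:
    """Drop models whose total energy signals clashes, then take the model
    with the lowest interaction energy; None if nothing is compatible."""
    compatible = [m for m in models if m.total_energy <= max_total_energy]
    if not compatible:
        return None
    return min(compatible, key=lambda m: m.interaction_energy)


def classify(energy: float, binding_threshold: float,
             nonbinding_threshold: float) -> str:
    """Assign 'binding', 'nonbinding', or 'twilight' from two thresholds."""
    if binding_threshold >= nonbinding_threshold:
        raise ValueError("binding threshold must lie below nonbinding threshold")
    if energy <= binding_threshold:
        return "binding"
    if energy >= nonbinding_threshold:
        return "nonbinding"
    return "twilight"


if __name__ == "__main__":
    models = [Model("t1", -20.0, 5.0), Model("t2", -30.0, 500.0)]
    best = select_best_model(models, max_total_energy=50.0)
    print(best, classify(best.interaction_energy, -18.0, -8.0))
```

Note that the model on template `t2` has the most favorable interaction energy but is rejected first because of its high total energy, which mirrors the caveat in the text.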

Interface modeling, although accurate, is a time-consuming process, especially when predictions are done on a genome-wide scale. Consequently, a new method, the prediction of protein-protein interactions on the basis of energy matrices, was developed and successfully applied to the prediction of Ras-effector interactions (93). In this method, position-specific energy matrices are generated: using different Ras-effector X-ray template structures, all amino acids in the Ras-binding domain are sequentially mutated to all other amino acid residues, and the effect on binding energy is calculated. The precalculated matrices are then used to score the binding of any Ras or effector sequence. Using these matrices, the sequences of putative Ras-binding domains are scanned quickly to calculate an energy sum value. By calibrating energy sum values with quantitative experimental binding data, thresholds are defined, and nonbinding domains are excluded quickly. Sequences that have energy sum values above this threshold are considered potential binding domains and can be further analyzed using homology interface modeling.
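Scoring with a precalculated position-specific energy matrix reduces to a table lookup and a sum. The sketch below assumes that the query sequence is already aligned to the template positions and that the matrix stores one energy contribution per residue type and position; the data layout, the function names, and the sign convention of the prefilter are assumptions made here, since the text does not specify them.

```python
from typing import Dict, Sequence

# One column per template position: residue letter -> energy contribution.
EnergyMatrix = Sequence[Dict[str, float]]


def energy_sum(sequence: str, matrix: EnergyMatrix) -> float:
    """Sum the per-position energy contributions of an aligned sequence."""
    if len(sequence) != len(matrix):
        raise ValueError("sequence must be aligned to the matrix positions")
    total = 0.0
    for position, (residue, column) in enumerate(zip(sequence, matrix)):
        if residue not in column:
            raise KeyError(f"no energy for residue {residue!r} at position {position}")
        total += column[residue]
    return total


def passes_prefilter(value: float, threshold: float,
                     favorable_is_lower: bool = True) -> bool:
    """Compare an energy sum with a calibrated threshold. The direction of
    the comparison depends on how the energy sum is defined, so it is left
    as an explicit parameter rather than assumed."""
    return value <= threshold if favorable_is_lower else value >= threshold


if __name__ == "__main__":
    toy_matrix = [{"A": 0.0, "G": 1.5}, {"K": -0.5, "E": 2.0}]  # toy values
    print(energy_sum("AK", toy_matrix), energy_sum("GE", toy_matrix))
```

Because each sequence costs only one lookup per position, this kind of scan is suited to the fast exclusion step, with full interface modeling reserved for the sequences that pass.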

In Figure 7, the prediction success, as judged from a range of energy thresholds, is compared for a set of 50 ubiquitin-like domains (UBDs) modeled in complex with Ras and Rap using three different methods: homology modeling with loops, homology modeling without loops, and energy matrices. The highest discrimination power is obtained using homology models that explicitly take loop modeling into consideration (Figure 7a). After adding experimental binding information, there are two clear thresholds for predicting binding and nonbinding domains, leaving only a very narrow twilight zone of 5 kcal/mol. Discrimination power decreases somewhat when Ras-effector interactions are modeled without taking loops into consideration (Figure 7b). Here, we also observe clear thresholds for binding and nonbinding domains; however, the twilight zone is a little larger (10 kcal/mol). The results for predictions with position energy matrices using the test set of Ras/Rap-effector interactions (Figure 7c) show a very good discriminative power for nonbinding domains. However, false positives can occur at low energy values, which makes defining the threshold for predicting binding domains difficult.

Protein domain: an element of overall protein structure that is self-stabilizing and often folds independently of the rest of the protein chain

4.2. Domain-Peptide Interactions

The studies mentioned above focused on predicting domain-domain interactions and usually did not consider domain-peptide interactions. That is, the interaction is assumed to occur between two folded structures with relatively large binding interfaces rather than between a domain and a linear peptide, as is the case for several protein domains (SH3, SH2, WW, 14-3-3 domains). The binding specificity of peptide-binding domains can be characterized experimentally by many different approaches that use oriented peptide libraries. These peptide libraries can be presented by phage display (94), spotted on cellulose membranes (95, 96), or allowed to interact with protein arrays containing the binding domains (97). These experimental approaches are time consuming and do not necessarily provide an understanding of the structural properties that define the specificity of each domain. Developments in structure-based predictions of domain-peptide interactions are poised to circumvent these deficits. We discuss these developments, with emphasis on the difficulties associated with the larger conformational variability of peptide ligands and the lack of specificity of peptide-binding domains.

Some of the first attempts to predict domain-peptide interactions with the aid of structural information were made in the mid- to late 1990s (105). In this study, a simple energy potential function, specially developed for particular domain-peptide interactions, was fitted using structures of known complexes and experimentally determined binding energies. This approach requires significant prior knowledge and is not easily applicable to other cases. In the early 2000s, different groups built upon this idea to determine the binding specificity of domain families by analysis of complexes and known binding peptides for SH3 domains (106), protein kinase domains (107), and SH2 domains (108). The structures of several domains of the same family, in complex with a peptide, were analyzed together with information on known binding peptides. These authors determined the residues that are important for binding from the structural analysis and then used different methods to correlate the binding residues in the domain with the target residues in the peptide. With these rules, it is possible to take any sufficiently similar protein domain of the family under study (SH3, kinase, or SH2) and match its key residues with a predicted binding specificity. Predicting the binding specificity of new domains in this way can be very accurate but is only applicable to domains that are significantly similar to the ones already studied (109). By this means, current knowledge of binding specificity is extended to domains of similar sequence, but several complex structures and experimentally determined peptide-binding data must be available for the domain family chosen. These procedures are also related to the approach used by Aloy & Russell (15), described above, for Ras/RBD interactions.

[Figure 7 plots: panels a, b, and c show the number of models as a function of the calculated ΔGint of the homology models (kcal mol−1), with regions marked as binding (100%), twilight (50% binding), and nonbinding (100%). Models are grouped by experimental binding energy (kcal mol−1) in bins from −6.0 to −10, plus a class labeled ΔH = 0.]

More recently, our group and the lab of Wei Wang have used general-purpose energy force fields to predict the binding specificity of peptide-binding domains (84, 110, 111; N. Sanchez, under review; G. Fernandez-Ballester, in preparation). These methods have the great advantage that no domain-specific information is required aside from a model of the complex for the domain under investigation. This might not be a difficult requirement because, in some cases, this model can be obtained from homology modeling, as shown by Gregorio and colleagues (manuscript in preparation) and G. Fernandez-Ballester (manuscript in preparation). As described above for Ras-effector interactions, it is possible to use these force fields to do in silico mutagenesis of the target ligand. One advantage of predicting protein-peptide interactions is that the target ligand usually adopts an extended conformation, and another is that ligand positions can be assumed to be mostly independent. From this computational analysis, a position-specific scoring matrix (PSSM) is created containing the information on the preferred residues at each position in the ligand. Given that the target is a peptide ligand and not a folded domain, using these ligand matrices requires no domain information or modeling for the target protein analyzed.
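Because ligand positions are treated as mostly independent, a ligand PSSM can be applied to a protein sequence with a simple sliding window. The sketch below is illustrative only: it assumes the PSSM is stored as one dictionary per ligand position with higher scores meaning preferred residues, and it scores residues missing from a column as 0.0, which is an arbitrary choice made here.

```python
from typing import Dict, List, Sequence, Tuple

# One column per ligand position: residue letter -> preference score.
PSSM = Sequence[Dict[str, float]]


def scan_sequence(sequence: str, pssm: PSSM) -> List[Tuple[int, str, float]]:
    """Score every window of the sequence against the PSSM.

    Returns (start index, window, score) tuples sorted by decreasing score.
    Position independence makes the window score a plain sum over columns.
    """
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column.get(residue, 0.0)
                    for column, residue in zip(pssm, window))
        hits.append((start, window, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    # Toy matrix favoring proline at the first and last of four positions.
    toy_pssm = [{"P": 2.0}, {}, {}, {"P": 2.0}]
    for hit in scan_sequence("AAPLLPAA", toy_pssm)[:3]:
        print(hit)
```

Only the peptide side is scanned, which matches the point in the text that no modeling of the target protein is needed once the ligand matrix exists.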

The main difficulty in applying these approaches to protein-peptide predictions is the large conformational variability possible for the ligand. SH3 ligands, for example, are usually referred to as class I or class II ligands depending on their orientation (112). Soon after the initial peptide studies, a third group for less common types of ligands was suggested (113). Our own recent analysis of SH3 domain complexes retrieved 29 SH3 ligands of different conformations (G. Fernandez-Ballester & L. Serrano, in preparation). This large variability makes it harder to evaluate the binding of putative targets because different amino acids might be accommodated by movements of the peptide backbone. As detailed above, it is possible to model each conformation separately and analyze the energy of all conformations, but the problem becomes harder to tackle. Future improvements in domain-peptide predictions from structure have much to gain from programs capable of predicting backbone movements upon mutation. This can be achieved by incorporating molecular dynamics algorithms (110) or fragment libraries (65).

Figure 7 Prediction accuracy for a test set of 50 ubiquitin-like domains modeled in complex with Ras and the Ras-like protein, Rap1. (a) Homology modeling using loops. (b) Homology modeling without loops. (c) Prediction using position energy matrices. Abbreviation: ΔGint, interaction energy.



Another disadvantage of domain-peptide target prediction, when compared to protein-protein prediction, is the lower specificity defined by the binding surface. A domain-peptide interaction is usually determined by a small number of residues in the target peptide (114), which makes true target sites difficult to identify in the proteome. However, many of these target sites are thought to occur in unstructured regions of the proteome (115). Interactions with unstructured protein segments might be functionally important for different reasons (e.g., decoupling of specificity and affinity, clustering of multiple binding sites, faster rates of association and dissociation), and many computational strategies have been developed to predict these sites of disorder [see the review by Radivojac et al. (116) and Table 1]. Therefore, improvements in prediction accuracy can be achieved by restricting the search for putative binding sites to protein segments predicted to be intrinsically disordered (117).
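Restricting candidate sites to disordered segments can be expressed as a simple filter. The sketch below assumes that per-residue disorder scores between 0 and 1 are already available from some disorder predictor; the function name, the half-open site coordinates, and the 0.5 cutoff on the mean score are all hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple


def filter_sites_by_disorder(sites: Sequence[Tuple[int, int]],
                             disorder: Sequence[float],
                             min_mean_disorder: float = 0.5
                             ) -> List[Tuple[int, int]]:
    """Keep candidate sites whose mean predicted disorder reaches the cutoff.

    `sites` are (start, end) pairs, 0-based and half-open; `disorder` holds
    one predicted score per residue of the same protein.
    """
    kept = []
    for start, end in sites:
        if not 0 <= start < end <= len(disorder):
            raise ValueError(f"site {(start, end)} lies outside the sequence")
        window = disorder[start:end]
        if sum(window) / len(window) >= min_mean_disorder:
            kept.append((start, end))
    return kept


if __name__ == "__main__":
    scores = [0.1] * 5 + [0.9] * 5  # toy profile: ordered, then disordered
    print(filter_sites_by_disorder([(0, 4), (5, 9)], scores))
```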

4.3. Nonatomic Detail Prediction

In principle, predictions based on sequence alone are not possible because a single residue change in the interface could lead to complete loss of binding affinity. However, aside from homology modeling, which quantitatively describes the interaction in atomic-level detail, nonatomic-detail methods have been successfully used to predict the interaction of proteins (15, 98). These methods, such as Interaction Prediction through Tertiary Structure (InterPreTS) (http://www.russell.embl.de/cgi-bin/interprets2) (99) and MULTIPROSPECTOR (100), use empirical pair potentials, which describe how well a homologous pair of sequences fits into a complex structure. Both were successfully applied to predict the specificities of large domain families, e.g., the complexes between fibroblast growth factors and their receptors (15).

5. STRUCTURAL INFORMATION AS A TOOL TO ANALYZE PROTEIN INTERACTION NETWORKS

As discussed above, structural information in combination with protein design algorithms can be used to predict new protein interactions. However, there are other possible uses of structural information in the analysis of interaction networks. The relative interaction energy, which is based on the structures of protein complexes, can be predicted in a fast and accurate way with existing protein design algorithms. Successful predictions of binding affinities for wild-type and mutant complexes have been carried out using the protein design algorithms FoldX (86–88) and Rosetta (89–91). Examples are the prediction of Ras-effector interactions (19, 61, 68) and of alanine mutations at the interface of a member of the TNF-related apoptosis-inducing ligand (TRAIL) family in complex with its receptor, DR5 (92). Prediction of the binding of ubiquitin to ubiquitin-interacting motifs also gives the qualitatively correct trend (83). Structural information in combination with bioinformatic tools (118, 119) and/or protein design algorithms (21) can be used to predict the functional effect of human single-nucleotide polymorphisms.

As mentioned above, protein design algorithms usually predict changes in affinity accurately. However, prediction of the absolute value of a binding constant or kinetic parameter is more difficult. Thus, the interaction energies for different protein complexes are not easily transferable into affinity constants, as derived from quantitative binding experiments. However, for proteins of the same family, and for complexes between proteins that do not undergo conformational changes upon binding, rough estimates can be obtained for the binding constant (19, 20) and kon values (19, 51), provided that a calibration with experimental data has been done first.
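For orientation, the link between a calibrated binding free energy and a dissociation constant is the standard thermodynamic relation ΔG = RT ln Kd (with a 1 M standard state). The sketch below applies it; the function names are illustrative, and the inputs are assumed to be experimentally calibrated free energies, not raw force-field energies, which the text warns are not directly transferable.

```python
from math import exp

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def kd_from_delta_g(delta_g: float, temperature: float = 298.15) -> float:
    """Dissociation constant (M) from a binding free energy (kcal/mol).

    Uses delta_g = R * T * ln(Kd), so more negative energies give
    smaller dissociation constants (tighter binding).
    """
    return exp(delta_g / (R_KCAL * temperature))


def fold_change_in_kd(delta_delta_g: float, temperature: float = 298.15) -> float:
    """Factor by which Kd changes for a change in binding free energy."""
    return exp(delta_delta_g / (R_KCAL * temperature))


if __name__ == "__main__":
    print(f"Kd at -8.2 kcal/mol: {kd_from_delta_g(-8.2):.2e} M")
    print(f"Kd fold change for +1.36 kcal/mol: {fold_change_in_kd(1.36):.1f}")
```

At room temperature, a destabilization of roughly 1.4 kcal/mol corresponds to about a tenfold weaker affinity, which is why relative changes are much easier to predict than absolute constants.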



Finally, we should mention the possibility of using structural information to partition protein interaction networks into structurally compatible subnetworks. It is now quite obvious that many proteins have more partners than surface available for interaction and that some interactions must therefore be mutually exclusive. In principle, if enough information is available regarding the complexes made by one protein or domain with several other proteins or domains, one could easily decide which interactions are simultaneously possible by superimposing the complexes and looking for excluded surfaces (104). In the absence of structural information, it is possible to look for complexes involving homologous domains and, assuming that proteins of the same family will in general interact in the same way, do the same exclusion exercise (A. Campagna, C. Kiel, & L. Serrano, manuscript in preparation). With more structures and complexes being deposited every day, dissecting protein interaction networks into functional subnetworks is becoming more plausible and will be one of the main contributions of structural biology to systems biology.
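A much simplified version of the exclusion exercise can be written in terms of interface residues. The sketch below assumes that, for one hub protein, the set of hub residues contacted by each partner is already known (for example from superimposed complexes); it replaces the 3D superimposition and clash check described in the text with a plain set overlap, so it only illustrates the idea. All names and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def mutually_exclusive_pairs(interfaces: Dict[str, Set[int]],
                             max_shared: int = 0) -> List[Tuple[str, str]]:
    """Return partner pairs that compete for the same hub surface.

    `interfaces` maps each partner name to the hub residue numbers it
    contacts. Two partners are called mutually exclusive when they share
    more than `max_shared` hub residues.
    """
    exclusive = []
    for a, b in combinations(sorted(interfaces), 2):
        if len(interfaces[a] & interfaces[b]) > max_shared:
            exclusive.append((a, b))
    return exclusive


if __name__ == "__main__":
    toy = {  # toy data: hub residues contacted by three partners
        "partner1": {25, 26, 37, 38},
        "partner2": {37, 38, 40},
        "partner3": {101, 102},
    }
    print(mutually_exclusive_pairs(toy))
```

Pairs not returned by the function are candidates for simultaneous binding, which is the information needed to split a hub's interactions into compatible subnetworks.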

6. MINING FOR BIOLOGICAL CONTEXT

The structure-based methods described above predict binding affinities in the same way one would obtain them from in vitro assays. Knowledge of binding affinities alone is not sufficient to determine whether proteins interact inside the cell. What determines binding in a living organism is a combination of factors, such as expression levels, localization, complex formation (i.e., scaffolding), posttranslational modifications, splicing forms, and association with small compounds. To make inroads into in vivo binding predictions, we need what one could call the biological context information, or a way to predict it. Some recent developments have started to tackle this problem by using integrative probabilistic approaches that try to weight different information sources and by combining them with protein-binding specificity information (61, 120). Below, we explore some of the most commonly used methods to predict protein-protein interactions and ways to combine these into a single scoring function.

Protein fusion: in protein evolution analysis, a protein containing two sequence blocks that are homologous to proteins in other species

6.1. Sequence-Based Methods

Some early attempts to predict protein-protein association came from the early comparative genomics analyses. For example, it was observed that conserved proximity of two open reading frames correlates with an increase in likelihood of protein interaction between the coded proteins (121).

In similar fashion, it was shown that phylogenetic association of protein pairs also signals functional linkage. That is, if two proteins are always present or absent together (not necessarily in close vicinity in the genome) in many different species, then the two proteins are likely part of the same complex or pathway (122). Another sequence-based method relies on the determination of protein fusion events. In 1999, Marcotte and colleagues (123) showed that if two proteins are found fused into one contiguous protein in some species, then they are very likely related in function and, therefore, also more likely to interact. These methods have the advantage that they only require the simple analysis of a large number of genomes, but they predict functional association rather than direct protein interaction. Another disadvantage is that these methods are not very effective in eukaryotic species, given that eukaryotes have a more complex genome structure and fewer of their genomes are available to study.
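The phylogenetic association idea can be illustrated by comparing presence/absence profiles across genomes. The sketch below is a minimal version under stated assumptions: profiles are tuples of 0 and 1 over the same ordered list of species, similarity is the plain fraction of matching entries, and the agreement cutoff is arbitrary. Published methods use more careful statistics, and all names here are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def profile_agreement(profile_a: Sequence[int], profile_b: Sequence[int]) -> float:
    """Fraction of species in which two proteins are both present or both absent."""
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(profile_a, profile_b) if a == b)
    return matches / len(profile_a)


def linked_pairs(profiles: Dict[str, Sequence[int]],
                 min_agreement: float = 1.0) -> List[Tuple[str, str]]:
    """Protein pairs whose profiles agree in at least `min_agreement` of species."""
    return [(a, b) for a, b in combinations(sorted(profiles), 2)
            if profile_agreement(profiles[a], profiles[b]) >= min_agreement]


if __name__ == "__main__":
    toy_profiles = {  # toy presence (1) / absence (0) across five genomes
        "protA": (1, 0, 1, 1, 0),
        "protB": (1, 0, 1, 1, 0),
        "protC": (0, 1, 1, 0, 1),
    }
    print(linked_pairs(toy_profiles))
```

A protein present in every genome agrees trivially with every other ubiquitous protein, which is one reason such profiles indicate functional association at best and need many diverse genomes to be informative.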

A different method to predict protein interaction from sequence information was developed by Pazos and colleagues (124, 125). Analyzing alignments of interacting proteins, the authors showed that correlated mutations between the two proteins are identifiable signals for protein-protein interaction. This method not only identifies direct protein-protein association, but it also determines the protein-binding regions.

Since these works by Pazos et al., many other metrics of correlated mutations have been proposed [reviewed by Halperin et al. (126)], although most have been directed at studies of intramolecular interactions instead of prediction of intermolecular contacts. Halperin and colleagues (126) tested the capacity of these different measures to predict protein-protein contacts and found most to be weak predictors.

A method related to the study of correlated mutations is the analysis of the coevolution of entire sequences, known as “mirror tree” methods, pioneered by Goh and colleagues in 2000 (127) and further improved later [see the review by Shoemaker & Panchenko (128)]. In this approach, proteins from interacting protein families are aligned, and the correlation of the obtained pairwise distances is used as a signal of coevolution.
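The core of a mirror tree comparison is a correlation between two distance matrices. The sketch below assumes that both matrices are symmetric, built from alignments of the two families, and ordered over the same species; it uses the Pearson correlation of the upper-triangle entries, which is the commonly described form of the score. Function names and the toy distances are illustrative.

```python
from math import sqrt
from typing import List, Sequence


def upper_triangle(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / sqrt(var_x * var_y)


def mirror_tree_score(dist_a: Sequence[Sequence[float]],
                      dist_b: Sequence[Sequence[float]]) -> float:
    """Correlate pairwise evolutionary distances of two protein families."""
    if len(dist_a) != len(dist_b):
        raise ValueError("both matrices must cover the same species")
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


if __name__ == "__main__":
    family_a = [[0.0, 0.1, 0.4], [0.1, 0.0, 0.3], [0.4, 0.3, 0.0]]
    family_b = [[0.0, 0.2, 0.8], [0.2, 0.0, 0.6], [0.8, 0.6, 0.0]]
    print(mirror_tree_score(family_a, family_b))
```

A high score by itself does not separate coevolution at the interface from the shared species tree, which is the caveat raised in the following paragraph of the text.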

In both the correlated mutations and mirror tree methods, it is assumed that the evolution of the protein sequences of the interacting families will be mostly driven by the correlated changes in the binding epitopes. Given that most of the variance in protein evolutionary rates can be explained by the level of protein expression (129), it is possible that the predictive power of these approaches is limited (130).

6.2. Domain Interaction Propensity

From the analysis of the different experimentally determined interaction networks, one can extract the likelihood that any two given protein domains might interact. This knowledge of domain interaction propensities can then be used to direct the prediction of new protein-protein interactions (101–103). These approaches were further improved by advances in the statistical analysis (131), with the current top-performing algorithm being the InSite program (132). A great advantage of these approaches is that they not only predict protein-protein interactions but also directly indicate the putative domains involved.

6.3. Graph Theory Methods

One early approach in the study of large cellular interaction maps was to simplify the information into a graph form. Each component was represented as a node, and each interaction as an edge in the graph (133). This graph abstraction is, in many respects, too large a simplification of what we already know of proteins and cellular functions, but it allowed for a vast number of studies regarding interaction networks (133–136). One interesting observation coming from these graph studies is that protein interactions, or edges, can be predicted just by looking at the graphs of current incomplete interactomes (137, 138). Some recent advances in protein-protein interaction prediction have come from comparative graph analysis between the interactomes of different species (139, 140). This type of approach, dubbed comparative interactomics (141), can be thought of as the analog of comparative genomics.

6.4. Integration of Different Methods

The proliferation of experimental and computational methods to study protein-protein interactions has prompted comparative studies of the different approaches (142). These analyses have shown that the different experimental and computational methods overlap little, and all suffer from low accuracy and low coverage when benchmarked against a trusted dataset. It was also demonstrated that interactions observed in more than one of the analyses were more likely to be true interactions. From these first efforts to compare the different methods came the idea that more reliable information can be obtained from the combination of different experimental and computational observations.



One year after the comparative analysis from von Mering and colleagues (142), two different groups provided the first examples of a statistical combination of the various approaches within a Bayesian framework (14, 143). This strategy was used in 2005 to integrate different information sources to predict human protein interactions with considerable success (10,000 predictions with a 20% false-positive rate) (144). Given the interdependences that occur between the different datasets used, there is a limit to the benefit obtained from this type of integration (145).
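A naive Bayes combination of evidence sources can be written in a few lines. The sketch below assumes that each source has already been converted into a likelihood ratio (how much more often the observation occurs for true interactions than for non-interactions in a trusted benchmark) and that the sources are conditionally independent, which is exactly the assumption that the interdependences mentioned above violate. The prior and the likelihood ratios in the example are invented for illustration.

```python
from math import prod
from typing import Iterable


def posterior_probability(prior: float, likelihood_ratios: Iterable[float]) -> float:
    """Combine independent evidence sources with a naive Bayes update.

    posterior odds = prior odds * product of the likelihood ratios.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    ratios = list(likelihood_ratios)
    if any(ratio < 0 for ratio in ratios):
        raise ValueError("likelihood ratios must be non-negative")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(ratios)
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical numbers: a rare-event prior and two supporting sources.
    print(posterior_probability(0.001, [10.0, 50.0]))
```

When two sources are correlated, multiplying their likelihood ratios counts the same evidence twice and overstates the posterior, which is one way to read the limit on the benefit of integration noted in the text.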

These probabilistic weighting schemes can be used to increase the reliability of interactions determined by the prediction of binding specificities. Recently this approach was used to improve the prediction of human kinase targets (120). In this work, known phosphorylation targets were linked to the most likely kinase by combining information on binding specificity with predictions of functional interactions taken from the STRING database (146). In our lab, we have used a naive Bayes network to weigh different interaction predictors with a set of in vivo interactions retrieved from the Human Protein Reference Database (147). This combined predictor was then used to attribute confidence scores to putative Ras/RBDs (61).
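A naive Bayes combination of evidence can be sketched as follows. Each evidence type contributes a likelihood ratio estimated from a reference set of known interacting and noninteracting pairs, and the ratios are multiplied with the prior odds. This is a generic illustration of the technique, not the implementation used in (14, 61, 143); the function names, counts, prior, and pseudocount are assumptions.

```python
import math


def likelihood_ratio(pos_with, pos_total, neg_with, neg_total, pseudocount=1.0):
    """Estimate P(evidence | interacting) / P(evidence | not interacting).

    Counts come from a reference set of positives and negatives; the
    pseudocount avoids division by zero for rarely observed evidence.
    """
    p_pos = (pos_with + pseudocount) / (pos_total + 2 * pseudocount)
    p_neg = (neg_with + pseudocount) / (neg_total + 2 * pseudocount)
    return p_pos / p_neg


def posterior_probability(prior, ratios):
    """Combine prior odds with independent likelihood ratios (naive Bayes)."""
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(r) for r in ratios)
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


# Hypothetical reference-set counts for two evidence types.
lr_structure = likelihood_ratio(pos_with=9, pos_total=18, neg_with=1, neg_total=98)
lr_coexpression = likelihood_ratio(pos_with=7, pos_total=18, neg_with=9, neg_total=98)

# Hypothetical prior probability that a random protein pair interacts.
confidence = posterior_probability(0.01, [lr_structure, lr_coexpression])
print(f"posterior probability of interaction: {confidence:.3f}")
```

The multiplication of ratios assumes that the evidence types are conditionally independent. When the underlying datasets are correlated, the same signal is counted more than once, which is one way to understand the limit on the benefit of integration noted in (145).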

We believe that the community should make an effort to establish prediction servers that are constantly updated to integrate meaningful datasets and computational methods. These servers, in combination with putative binding specificity, should allow us to predict biological pathways with great accuracy.

7. LIMITATIONS

Structure-based prediction of protein-protein interactions is becoming a mature field. The recent advances in protein design algorithms and MSAs, in combination with other bioinformatic approaches, allow genome-wide prediction for particular domain interactions. However, there are some serious limitations that preclude its broad general use. The most important one is the lack of sufficient structural templates in many cases, both at the level of isolated domains and at the level of protein complexes. The quality of the predictions is strictly related to the abundance of different templates that explore the conformational space available for a particular complex. Molecular dynamics and chimeras can partly compensate for this lack, but they require some expertise and are time consuming. It is important that, in the future, structural genomics project scientists focus on filling the structural gaps in families. Although it makes little sense to determine the structure of a domain if there is already a PDB record of a closely related protein, filling the structural gaps is critical to solving the structures of sequences with low homology.

Even with the best methodology (experimental and computational), a detailed bioinformatic or experimental biological validation should be done. Any computationally or experimentally determined Kd is meaningless in the absence of a biological context. Thus, a micromolar Kd could be relevant if the concentration is high or if one of the two proteins is localized. Similarly, a nanomolar Kd is meaningless if the two proteins never see each other.
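The dependence of a Kd on its concentration context can be made concrete with the standard expression for a simple 1:1 binding equilibrium. The sketch below assumes a well-mixed compartment, no competing partners, and no localization effects; the concentrations are illustrative values, not measurements.

```python
import math


def complex_concentration(a_total, b_total, kd):
    """Equilibrium [AB] for A + B <-> AB, all quantities in the same molar units.

    Solves Kd = [A][B]/[AB] with [A] = a_total - [AB] and [B] = b_total - [AB],
    taking the physically meaningful (smaller) root of the quadratic.
    """
    s = a_total + b_total + kd
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0


def fraction_bound(a_total, b_total, kd):
    """Fraction of protein A that is in complex with B at equilibrium."""
    return complex_concentration(a_total, b_total, kd) / a_total


# Micromolar Kd, both proteins abundant (10 uM each): most of A is bound.
weak_but_abundant = fraction_bound(1e-5, 1e-5, 1e-6)

# Nanomolar Kd, both proteins scarce (1 pM each): almost nothing is bound.
tight_but_scarce = fraction_bound(1e-12, 1e-12, 1e-9)

print(f"Kd = 1 uM at 10 uM each: {weak_but_abundant:.2f} of A bound")
print(f"Kd = 1 nM at 1 pM each:  {tight_but_scarce:.4f} of A bound")
```

Under these assumptions the weaker interaction yields far more complex than the tighter one, simply because of concentration. Local enrichment of one partner acts in the same direction by raising the effective concentration.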

SUMMARY POINTS

1. Methods to predict protein interactions on the basis of structural information can complement large-scale experimental protein interaction networks.

2. All prediction methods are based on the finding that structurally similar domains usually interact in a similar way.



3. However, it is not the fold per se that determines whether two proteins can interact; rather, certain amino acid residues on the surface of this conserved fold determine the affinity and specificity between the domains.

4. Homology modeling is used to generate complex models for sequences that are homologous to the template (X-ray) structure, and interaction energies can be calculated using protein design force fields.

5. Energy matrices contain the binding energy contributions for all 20 amino acid residues for every position in the interface. They can be used to quickly scan sequences and evaluate whether they are potential binding domains and need to be modeled, and they can exclude sequences that cannot bind.

6. Structural information can also be used to partition interaction networks into structurally compatible subnetworks.

7. Crude estimates of binding (equilibrium and kinetic) parameters as well as functional consequences of point mutations can be obtained from structural analysis.

8. Bioinformatics data integration helps validate predictions and provides further knowledge about expression levels and localization.

9. The main limitations of structure-based predictions are the quality of template structures, slightly different binding modes, and backbone changes of the participating domains.
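The use of energy matrices to scan sequences (summary point 5) can be sketched as a position-specific lookup. Each interface position holds one energy contribution per amino acid, and a candidate is scored by summing the contributions of its interface residues. The additive scoring, the matrix values (nominally in kcal/mol), the cutoff, and the candidate names are all assumptions for illustration, not values from a real force field.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_matrix(n_positions, overrides, default=0.0):
    """Build one {amino acid: energy} column per interface position.

    overrides maps (position, amino_acid) to a hypothetical energy
    contribution; all other entries take the default value.
    """
    matrix = [{aa: default for aa in AMINO_ACIDS} for _ in range(n_positions)]
    for (position, aa), energy in overrides.items():
        matrix[position][aa] = energy
    return matrix


def score(matrix, interface_residues):
    """Sum the per-position contributions for one candidate sequence."""
    if len(interface_residues) != len(matrix):
        raise ValueError("candidate must supply one residue per interface position")
    return sum(column[aa] for column, aa in zip(matrix, interface_residues))


def scan(matrix, candidates, cutoff):
    """Keep candidates whose summed energy is at or below the cutoff.

    More negative totals are treated as more favorable binding.
    """
    kept = {}
    for name, residues in candidates.items():
        total = score(matrix, residues)
        if total <= cutoff:
            kept[name] = total
    return kept


# Hypothetical three-position interface with a few non-default entries.
matrix = make_matrix(3, {
    (0, "K"): -1.5, (0, "E"): 2.0,
    (1, "R"): -1.0,
    (2, "L"): -0.5, (2, "D"): 1.8,
})
candidates = {"domain_x": "KRL", "domain_y": "EAD", "domain_z": "KAG"}
binders = scan(matrix, candidates, cutoff=-1.0)
print(binders)
```

Because the lookup is a single pass over the interface positions, many sequences can be screened quickly, and only those passing the cutoff need to be modeled in full.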

FUTURE ISSUES

1. In the area of homology/interface modeling, the following important issues need to be solved: backbone moves, loop modeling, and automation of template selection, which will require the determination of compatibility between sequences and specific template structures.

2. Structure-based sequence alignments need to be optimized.

3. Current structural proteomics approaches will solve the structure of more protein complexes and thus increase the number of template structures (∼10,000 domain-domain interaction types are predicted; so far only ∼2,000 are known).

4. The existing methods to predict domains based on the amino acid sequence need to be optimized by taking secondary structure prediction and structural information into account.

5. Simulations have to be performed to integrate protein interaction network changes (time and localization).

6. Prediction of thermodynamic and kinetic parameters (Kd, kon, koff) on the basis of structure needs to be further developed.

7. The roles of different Kds/specificity for signal transduction and signaling flow have to be studied.

8. There is a need for bioinformatic servers that are capable of integrating predicted binding affinity with other experimental resources to determine biologically relevant protein interactions.



DISCLOSURE STATEMENT

The authors are not aware of any biases that might be perceived as affecting the objectivity of this review.

ACKNOWLEDGMENT

We thank the EU for financial support (Interaction Proteome, grant number LSHG-CT-2003-505520).

LITERATURE CITED

1. Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, et al. 2005. Nature 437:1173–78
2. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, et al. 2000. Nature 403:623–27
3. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, et al. 2001. Proc. Natl. Acad. Sci. USA 98:4569–74
4. Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, et al. 2005. Cell 122:957–68
5. Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, et al. 2002. Nature 415:141–47
6. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, et al. 2002. Nature 415:180–83
7. Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, et al. 2006. Nature 440:631–36
8. Zhu H, Bilgin M, Bangham R, Hall D, Casamayor A, et al. 2001. Science 293:2101–5
9. Kung LA, Snyder M. 2006. Nat. Rev. Mol. Cell Biol. 7:617–22
10. Cusick ME, Klitgord N, Vidal M, Hill DE. 2005. Hum. Mol. Genet. 14:R171–81
11. Fields S. 2005. FEBS J. 272:5391–99
12. Berggard T, Linse S, James P. 2007. Proteomics 7:2833–42
13. Devos D, Russell RB. 2007. Curr. Opin. Struct. Biol. 17:370–77
14. Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, et al. 2003. Science 302:449–53
15. Aloy P, Russell RB. 2002. Proc. Natl. Acad. Sci. USA 99:5896–901
16. Kiel C, Serrano L. 2008. In Structural Proteomics, ed. J Sussman, I Silman, pp. 29–46. Singapore: World Sci. In press
17. Aloy P, Russell RB. 2006. Nat. Rev. Mol. Cell Biol. 3:188–97
18. Lensink MF, Mendez R, Wodak SJ. 2007. Proteins 69:704–18
19. Kiel C, Wohlgemuth S, Rousseau F, Schymkowitz J, Ferkinghoff-Borg J, et al. 2005. J. Mol. Biol. 348:759–75
20. Kortemme T, Joachimiak LA, Bullock AN, Schuler AD, Stoddard BL, Baker D. 2004. Nat. Struct. Mol. Biol. 11:371–79
21. Pey AL, Stricher F, Serrano L, Martinez A. 2007. Am. J. Hum. Genet. 81:1006–24
22. Zhang Y. 2007. Proteins 69:108–17
23. Moult J, Fidelis K, Kryshtafovych A, Rost B, Hubbard T, Tramontano A. 2007. Proteins 69:3–9
24. Kim WK, Henschel A, Winter C, Schroeder M. 2006. PLoS Comput. Biol. 2:1151–64
25. Aloy P, Russell RB. 2004. Nat. Biotechnol. 22:1317–21
26. Chothia C. 1992. Nature 357:543–44
27. Chandonia J-M, Brenner SE. 2005. Proteins 58:166–79
28. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. J. Mol. Biol. 215:403–10
29. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. 1997. Nucleic Acids Res. 25:3389–402



30. Linding R, Russell RB, Neduva V, Gibson TJ. 2003. Nucleic Acids Res. 31:3701–8
31. Dosztanyi Z, Csizmok V, Tompa P, Simon I. 2005. J. Mol. Biol. 347:827–39
32. Dosztanyi Z, Csizmok V, Tompa P, Simon I. 2005. Bioinformatics 21:3433–34
33. Peng K, Vucetic S, Radivojac P, Brown CJ, Dunker AK, et al. 2005. J. Bioinform. Comput. Biol. 3:35–60
34. Schultz J, Milpetz F, Bork P, Ponting CP. 1998. Proc. Natl. Acad. Sci. USA 95:5857–64
35. Letunic I, Copley RR, Pils B, Pinkert S, Schultz J, et al. 2006. Nucleic Acids Res. 34:D257–60
36. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, et al. 2004. Nucleic Acids Res. 32:D138–41
37. Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, et al. 2006. Nucleic Acids Res. 34:D227–30
38. Marchler-Bauer A, Anderson JB, Cherukuri PF, DeWeese-Scott C, Geer LY, et al. 2005. Nucleic Acids Res. 33:D192–96
39. Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, et al. 1997. Structure 5:1093–108
40. Pearl F, Todd A, Sillitoe I, Dibley M, Redfern O, et al. 2005. Nucleic Acids Res. 33:D247–51
41. Murzin AG, Brenner SE, Hubbard T, Chothia C. 1995. J. Mol. Biol. 247:536–40
42. Andreeva A, Howorth D, Brenner SE, Hubbard TJP, Chothia C, et al. 2004. Nucleic Acids Res. 32:D226–29
43. Finn RD, Marshall M, Bateman A. 2005. Bioinformatics 21:410–12
44. Stein A, Russell RB, Aloy P. 2005. Nucleic Acids Res. 33:D413–17
45. Winter C, Henschel A, Kim WK, Schroeder M. 2006. Nucleic Acids Res. 34:D310–14
46. Ogmen U, Keskin O, Aytuna AS, Nussinov R, Gursoy A. 2005. Nucleic Acids Res. 33:W331–36
47. Jefferson ER, Walsh TP, Roberts TJ, Barton GJ. 2007. Nucleic Acids Res. 35:D580–89
48. Davis FP, Sali A. 2005. Bioinformatics 21:1901–7
49. Wohlgemuth S, Kiel C, Kraemer A, Serrano L, Wittinghofer F, et al. 2005. J. Mol. Biol. 348:741–58
50. Kiel C, Serrano L. 2007. Curr. Chem. Biol. 1:215–25
51. Selzer T, Albeck S, Schreiber G. 2000. Nat. Struct. Biol. 7:537–41
52. Kiel C, Selzer T, Shaul Y, Schreiber G, Herrmann C. 2004. Proc. Natl. Acad. Sci. USA 101:9223–28
53. Goldsmith-Fischmann S, Honig B. 2003. Protein Sci. 12:1813–21
54. Ginalski K. 2006. Curr. Opin. Struct. Biol. 16:172–77
55. Schwede T, Kopp J, Guex N, Peitsch MC. 2003. Nucleic Acids Res. 31:3381–85
56. Vriend G. 1990. J. Mol. Graph. 8:52–56
57. Sali A, Blundell TL. 1993. J. Mol. Biol. 234:779–815
58. Mendes J, Baptista AM, Carrondo MA, Soares CM. 1999. Proteins 37:530–43
59. Allen BD, Mayo SL. 2006. J. Comput. Chem. 27:1071–75
60. Santana R, Larranaga P, Lozano JA. 2007. Artif. Intell. Med. 39:49–63
61. Kiel C, Foglierini M, Kuemmerer N, Beltrao N, Serrano L. 2007. J. Mol. Biol. 370:1020–32
62. Zheng N, Wang P, Jeffrey PD, Pavletich NP. 2000. Cell 102:533–39
63. Chung SY, Subbiah S. 1996. Structure 4:1123–27
64. Holyoake J, Caulfeild V, Baldwin SA, Sansom MS. 2006. Biophys. J. 91:L84–86
65. Simons KT, Bonneaum R, Ruczinski I, Baker D. 1999. Proteins 3(Suppl.):171–76
66. Bradley P, Malmstrom L, Qian B, Schonbrun J, Chivian D, et al. 2005. Proteins 61(Suppl.):128–34



67. Yarov-Yarovoy V, Schonbrun J, Baker D. 2006. Proteins 62:1010–25
68. Kiel C, Serrano L, Herrmann C. 2004. J. Mol. Biol. 340:1039–58
69. Wedemeyer WJ, Baker D. 2003. Proteins 53:262–72
70. Jensen PR, Axelsen JB, Lerche MH, Poulsen FM. 2004. J. Biomol. NMR 28:31–41
71. Kopp J, Schwede T. 2004. Pharmacogenomics 5:405–16
72. Lipman DJ, Altschul SF, Kececioglu JD. 1989. Proc. Natl. Acad. Sci. USA 86:4412–15
73. Wallace IM, Blackshields G, Higgins DG. 2005. Curr. Opin. Struct. Biol. 15:261–66
74. Edgar RC. 2004. Nucleic Acids Res. 32:1792–97
75. Do CB, Mahabhashyam MS, Brudno M, Batzoglou S. 2005. Genome Res. 15:330–40
76. Wallace IM, O’Sullivan O, Higgins DG, Notredame C. 2006. Nucleic Acids Res. 34:1692–99
77. O’Sullivan O, Suhre K, Abergel C, Higgins DG, Notredame C. 2004. J. Mol. Biol. 340:385–95
78. Shatsky M, Dror O, Schneidman-Duhovny D, Nussinov R, Wolfson HJ. 2004. Nucleic Acids Res. 32:W503–7
79. Madhusudhan MS, Marti-Renom MA, Sanchez R, Sali A. 2006. Protein Eng. Des. Sel. 19:129–33
80. Chothia C, Lesk AM. 1986. EMBO J. 5:823–26
81. Dalton JA, Jackson RM. 2007. Bioinformatics 23:1901–8
82. Grunberg R, Nilges M, Leckner J. 2007. Bioinformatics 23:769–70
83. Kiel C, Serrano L. 2006. J. Mol. Biol. 355:821–44
84. Musi V, Birdsall B, Fernandez-Ballester G, Guerrini R, Salvatori S, et al. 2006. Protein Sci. 4:795–807
85. Zhang G. 2004. Curr. Opin. Struct. Biol. 14:1–7
86. Guerois R, Nielsen JE, Serrano L. 2002. J. Mol. Biol. 320:369–87
87. Schymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, et al. 2005. Nucleic Acids Res. 33:W382–88
88. Schymkowitz JWH, Rousseau F, Martins IC, Ferkinghoff-Borg J, Stricher F, et al. 2005. Proc. Natl. Acad. Sci. USA 102:10147–52
89. Simons KT, Kooperberg C, Huang E, Baker D. 1997. J. Mol. Biol. 268:209–25
90. Simons KT, Ruczinski I, Kooperberg C, Fox B, Bystroff C, et al. 1999. Proteins 34:82–95
91. Kortemme T, Baker D. 2002. Proc. Natl. Acad. Sci. USA 99:14116–21
92. Van der Sloot AM, Tur V, Szegezdi E, Mullally MM, Cool RH, et al. 2006. Proc. Natl. Acad. Sci. USA 103:8634–39
93. Kiel C, Serrano L. 2007. Bioinformatics 23:2226–30
94. Tong AH, Drees B, Nardelli G, Bader GD, Brannetti B, et al. 2002. Science 295:321–24
95. Landgraf C, Panni S, Montecchi-Palazzi L, Castagnoli L, Schneider-Mergener J, et al. 2004. PLoS Biol. 2:E14
96. Huang H, Li L, Wu C, Schibli D, Colwill K, et al. 2008. Mol. Cell. Proteomics 7:768–84
97. Jones RB, Gordus A, Krall JA, MacBeath G. 2006. Nature 439:168–74
98. Lu L, Arakaki AK, Lu H, Skolnick J. 2003. Genome Res. 13:1146–54
99. Aloy P, Russell RB. 2003. Bioinformatics 19:161–62
100. Lu L, Lu H, Skolnick J. 2002. Proteins 49:350–64
101. Sprinzak E, Margalit H. 2001. J. Mol. Biol. 311:681–92
102. Wojcik J, Schachter V. 2001. Bioinformatics 17(Suppl. 1):296–305
103. Deng M, Mehta S, Sun F, Chen T. 2002. Genome Res. 12:1540–48
104. Kim PM, Lu LJ, Xia Y, Gerstein MB. 2006. Science 314:1938–41
105. Zhou Y, Abagyan R. 1998. Fold Des. 3:513–22



106. Brannetti B, Via A, Cestra G, Cesareni G, Helmer-Citterich M. 2000. J. Mol. Biol. 298:313–28
107. Brinkworth RI, Breinl RA, Kobe B. 2003. Proc. Natl. Acad. Sci. USA 100:74–79
108. Sheinerman FB, Al-Lazikani B, Honig B. 2003. J. Mol. Biol. 334:823–41
109. Ferraro E, Via A, Ausiello G, Helmer-Citterich M. 2006. Bioinformatics 22:2333–39
110. Hou T, Chen K, McLaughlin WA, Lu B, Wang W. 2006. PLoS Comput. Biol. 2:e1
111. McLaughlin WA, Hou T, Wang W. 2006. J. Mol. Biol. 357:1322–34
112. Zarrinpar A, Bhattacharyya RP, Lim WA. 2003. Sci. STKE 2003:RE8
113. Cesareni G, Panni S, Nardelli G, Castagnoli L. 2002. FEBS Lett. 513:38–44
114. Pawson T, Raina M, Nash P. 2002. FEBS Lett. 513:2–10
115. Fuxreiter M, Tompa P, Simon I. 2007. Bioinformatics 23:950–56
116. Radivojac P, Iakoucheva LM, Oldfield CJ, Obradovic Z, Uversky VN, Dunker AK. 2007. Biophys. J. 92:1439–56
117. Beltrao P, Serrano L. 2005. PLoS Comput. Biol. 3:e26
118. Money S. 2006. Bioinformatics 6:44–56
119. Torkamani A, Schork NJ. 2007. Bioinformatics 23:2918–25
120. Linding R, Jensen LJ, Ostheimer GJ, van Vugt MA, Jorgensen C, et al. 2007. Cell 129:1415–26
121. Dandekar T, Snel B, Huynen M, Bork P. 1998. Trends Biochem. Sci. 23:324–28
122. Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO. 1999. Proc. Natl. Acad. Sci. USA 96:4285–88
123. Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, et al. 1999. Science 285:751–53
124. Pazos F, Helmer-Citterich M, Ausiello G, Valencia A. 1997. J. Mol. Biol. 271:511–23
125. Pazos F, Valencia A. 2002. Proteins 47:219–27
126. Halperin I, Wolfson H, Nussinov R. 2006. Proteins 63:832–45
127. Goh CS, Bogan AA, Joachimiak M, Walther D, Cohen FE. 2000. J. Mol. Biol. 299:283–93
128. Shoemaker BA, Panchenko AR. 2007. PLoS Comput. Biol. 3:e43
129. Drummond DA, Raval A, Wilke CO. 2006. Mol. Biol. Evol. 23:327–37
130. Hakes L, Lovell SC, Oliver SG, Robertson DL. 2007. Proc. Natl. Acad. Sci. USA 104:7999–8004
131. Riley R, Lee C, Sabatti C, Eisenberg D. 2005. Genome Biol. 6:R89
132. Wang H, Segal E, Ben-Hur A, Li QR, Vidal M, Koller D. 2007. Genome Biol. 8:R192
133. Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL. 2000. Nature 407:651–64
134. Jeong H, Mason SP, Barabasi AL, Oltvai ZN. 2001. Nature 411:41–42
135. Wagner A. 2001. Mol. Biol. Evol. 18:1283–92
136. Wuchty S, Oltvai ZN, Barabasi AL. 2003. Nat. Genet. 35:176–79
137. Goldberg DS, Roth FP. 2003. Proc. Natl. Acad. Sci. USA 100:4372–76
138. King AD, Przulj N, Jurisica I. 2004. Bioinformatics 20:3013–20
139. Kelley BP, Sharan R, Karp RM, Sittler T, Root DE, et al. 2003. Proc. Natl. Acad. Sci. USA 100:11394–99
140. Sharan R, Suthram S, Kelley RM, Kuhn T, McCuine S, et al. 2005. Proc. Natl. Acad. Sci. USA 102:1974–79
141. Cesareni G, Ceol A, Gavrila C, Palazzi LM, Persico M, et al. 2005. FEBS Lett. 579:1828–33
142. von Mering C, Krause R, Snel B, Cornell M, Oliver SG, et al. 2002. Nature 417:399–403
143. Troyanskaya OG, Dolinski K, Owen AB, Altman RB, Botstein D. 2003. Proc. Natl. Acad. Sci. USA 100:8348–53
144. Rhodes DR, Tomlins SA, Varambally S, Mahavisno V, Barrette T, et al. 2005. Nat. Biotechnol. 23:951–59



145. Lu LJ, Xia Y, Paccanaro A, Yu H, Gerstein M. 2005. Genome Res. 15:945–53
146. Snel B, Lehmann G, Bork P, Huynen MA. 2000. Nucleic Acids Res. 28:3442–44
147. Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, et al. 2003. Genome Res. 13:2363–71
148. Peng K, Radivojac P, Vucetic S, Dunker AK, Obradovic Z. 2006. BMC Bioinformatics 7:208
149. Ishida T, Kinoshita K. 2007. Nucleic Acids Res. 35(Web Server Issue):W460–4
150. Hirose S, Shimizu K, Kanai S, Kuroda Y, Noguchi T. 2007. Bioinformatics 23:2046–53


