
Approximate Inference and Side-chain Prediction

Chen Yanover and Yair Weiss
School of Computer Science and Engineering
The Hebrew University of Jerusalem
91904 Jerusalem, Israel
{cheny,yweiss}@cs.huji.ac.il

Abstract

Side-chain prediction is an important subtask in the protein-folding problem. We show that finding a minimal energy side-chain configuration is equivalent to performing inference in an undirected graphical model. The graphical model is relatively sparse yet has many cycles. We used this equivalence to assess the performance of approximate inference algorithms in a real-world setting. Specifically, we were interested in two questions: (1) which approximate inference algorithms give superior performance and (2) how does this performance compare to the state-of-the-art in computational biology.

We looked at three tasks in side-chain graphical models: finding the minimal energy configuration, finding the M best configurations, and approximating the free energy and conformational entropy. In all three subtasks we found that belief propagation gave the best results among the approximate inference algorithms, and in many cases it outperformed state-of-the-art algorithms developed in the computational biology field.


1 Introduction

Exact inference in graphical models scales exponentially with the number of variables. Since many real-world applications involve hundreds of variables, it has been impossible to utilize the powerful mechanism of probabilistic inference in these applications. Despite the significant progress achieved in approximate inference, some practical questions remain open: it is not yet known which algorithm to use for a given problem, nor is it understood what the advantages and disadvantages of each technique are. We address these questions in the context of a real-world protein-folding application, the side-chain prediction problem.

Predicting side-chain conformations given the backbone structure is a central problem in protein-folding and molecular design. It arises both in ab-initio protein-folding (where the task is often separated into an initial backbone estimation stage and a second side-chain placement stage [Huang et al., 1998]) and in homology modeling schemes, where the backbone is assumed to be conserved among the homologs but the side-chain configuration needs to be found.

Side-chain prediction has several attractive properties as a test-bed for approximate inference algorithms. First, for many applications it is not enough to calculate a single, most-probable side-chain configuration; rather, we are interested in the distribution of low-energy configurations. This provides an opportunity for testing approximate inference algorithms for finding the M most probable states and for evaluating the partition function. Second, due to the importance of this problem in computational biology, a large number of special purpose algorithms have been developed [Leach and Lemon, 1998, Kuhlman and Baker, 2000, Canutescu et al., 2003]. This provides an opportunity for comparing general purpose approximate inference algorithms to state-of-the-art algorithms developed specifically for this task.

In this paper we empirically assess the performance of various approximate inference algorithms (belief propagation, mean field approximation and Gibbs sampling) on the side-chain prediction task. We compare this performance to special purpose programs (SCWRL [Canutescu et al., 2003] and Rosetta [Kuhlman and Baker, 2000]) and algorithms (dead end elimination [Goldstein, 1994] and A* [Leach and Lemon, 1998]) for these tasks. Our results indicate that belief propagation performs remarkably well on this task and that in many cases it outperforms the state-of-the-art.


Figure 1: Cow actin binding protein (PDB code 1pne, left) and a closer view of its 6 C-terminal residues (center). Given the protein backbone (black) and the amino acid sequence, the native side-chain conformation (gray) is searched for. The right-hand panel shows the problem representation as a graphical model for those C-terminal residues (nodes located at Cα atom positions, edges drawn in black).

1.1 Side-chain Prediction

Proteins are chains of simpler molecules called amino acids. All amino acids have a common structure: a central carbon atom (Cα) to which a hydrogen atom, an amino group (NH2) and a carboxyl group (COOH) are bonded. In addition, each amino acid has a chemical group called the side-chain, bound to Cα. This group distinguishes one amino acid from another. Amino acids are joined end to end during protein synthesis by the formation of peptide bonds. An amino acid unit in a protein is called a residue. The formation of a succession of peptide bonds generates the backbone (consisting of Cα and its adjacent atoms, N and CO, of each residue), from which the side-chains hang (Figure 1).

The side-chain prediction problem is defined as follows: given the 3-dimensional structure of the backbone, we wish to predict the placements of the side-chains. This problem is considered of central importance in protein-folding and molecular design and has been tackled extensively using a wide variety of methods. Typically, an energy function is defined over a discretization of the side-chain angles and search algorithms are used to find the global minimum. Even when the energy function contains only pairwise interactions, the configuration space grows exponentially and it can be shown that the prediction problem is NP-complete [Fraenkel, 1997, Pierce and Winfree, 2002].


Formally, we denote by $R_i$, $1 \le i \le N$, the set of energetically preferred conformations (termed rotamers) that each residue can adopt. We wish to minimize an energy function that is typically defined in terms of pairwise interactions among nearby residues, $E_{i,j}(r_i, r_j)$, and interactions between a residue and the backbone, $E_i(r_i)$:

$$E(r) = \sum_{i,j} E_{i,j}(r_i, r_j) + \sum_i E_i(r_i), \tag{1}$$

where $r = (r_1, \ldots, r_N)$, $r_i \in R_i$, denotes an assignment of rotamers for all residues.

Since we have a discrete optimization problem and the energy function is a sum of pairwise interactions, we can transform the problem into a graphical model with pairwise potentials. Each node corresponds to a residue, and the state of each node represents the configuration of the side-chain of that residue. Then the probability of a given configuration, $P(r)$, is:

$$P(r) = \frac{1}{Z} e^{-\frac{1}{T} E(r)} = \frac{1}{Z} e^{-\frac{1}{T}\left[\sum_i E_i(r_i) + \sum_{i,j} E_{i,j}(r_i, r_j)\right]} = \frac{1}{Z} \prod_i \Psi_i(r_i) \prod_{i,j} \Psi_{i,j}(r_i, r_j) \tag{2}$$

where $Z$ is an explicit normalization factor and $T$ is the system "temperature" (used as a free parameter). The local potential $\Psi_i(r_i)$ takes into account the prior probability of the rotamer, $p_i(r_i)$ (taken from the rotamer library), and the energy of the interactions between that rotamer and the backbone:

$$\Psi_i(r_i) = p_i(r_i)\, e^{-\frac{1}{T} E_i(r_i)} \tag{3}$$

Equation 2 requires multiplying $\Psi_{i,j}$ for all pairs of residues $i, j$. However, since most energy functions (see for example equation 13) give zero energy for atoms that are sufficiently far apart, we only need to calculate the pairwise interactions for nearby residues. To define the topology of the undirected graph, we examine all pairs of residues $i, j$ and check whether there exists an assignment $r_i \in R_i$, $r_j \in R_j$ for which the energy is nonzero. If one exists, we connect nodes $i$ and $j$ in the graph and set the potential to be:

$$\Psi_{i,j}(r_i, r_j) = e^{-\frac{1}{T} E_{i,j}(r_i, r_j)} \tag{4}$$
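As a rough illustration of equations 3 and 4 and of the edge rule just described, the following Python sketch builds the potentials for a toy three-residue model. The data layout (per-rotamer energy lists, a dictionary of pairwise energy tables keyed by residue pair), the numbers, and the names `build_model` and `unnormalized_prob` are assumptions made for this example and are not taken from the authors' implementation.

```python
import math

def build_model(unary_energy, pair_energy, prior, T=1.0):
    """Convert energies into potentials (eqs. 3 and 4).

    unary_energy[i][a]        : E_i for rotamer a of residue i
    pair_energy[(i, j)][a][b] : E_ij for rotamers a, b (hypothetical layout)
    prior[i][a]               : rotamer-library prior p_i
    """
    psi_i = [[p * math.exp(-e / T) for p, e in zip(prior[i], unary_energy[i])]
             for i in range(len(unary_energy))]
    psi_ij = {}
    for (i, j), table in pair_energy.items():
        # Add edge (i, j) only if some rotamer pair has nonzero energy.
        if any(e != 0.0 for row in table for e in row):
            psi_ij[(i, j)] = [[math.exp(-e / T) for e in row] for row in table]
    return psi_i, psi_ij

def unnormalized_prob(r, psi_i, psi_ij):
    """Product of potentials in eq. 2, omitting the 1/Z factor."""
    p = 1.0
    for i, ri in enumerate(r):
        p *= psi_i[i][ri]
    for (i, j), table in psi_ij.items():
        p *= table[r[i]][r[j]]
    return p

# Toy model: three residues with two rotamers each (made-up numbers).
unary_energy = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.0]]
prior = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
pair_energy = {
    (0, 1): [[0.0, 2.0], [2.0, 0.0]],
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],  # residues too far apart: all zeros
}
psi_i, psi_ij = build_model(unary_energy, pair_energy, prior)
print(sorted(psi_ij))  # only residue pairs with a nonzero interaction remain
```

Dropping all-zero tables is what keeps the graph sparse: a pair whose potential is identically 1 contributes nothing to the product in equation 2.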


Figure 1 shows a subgraph of the undirected graph. The graph is relatively sparse (each node is connected to nodes that are close in 3D space) but contains many small loops. A typical protein in the data set gives rise to a model with hundreds of loops of size 3.

For each protein, we built two graphical models: (1) using the approximate SCWRL energy function [Dunbrack and Karplus, 1993] and (2) using the more elaborate energy function used in the Rosetta program [Kuhlman and Baker, 2000]. In both cases we used Dunbrack and Karplus's backbone dependent rotamer library to define up to 81 configurations for each side-chain.

2 Algorithms

We are interested in algorithms for the following tasks:

• Finding the minimal energy configuration.

• Finding the M best configurations.

• Finding the partition function $Z$, defined by $Z = \sum_r e^{-E(r)/T}$, or, equivalently, calculating the protein free energy and conformational entropy.

For each of these tasks, we compared the performance of approximate inference algorithms and state-of-the-art, biologically motivated algorithms.
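To make the three tasks concrete, here is a brute-force reference sketch in Python for a tiny model. It enumerates every configuration, so it is only a definition-by-example and not one of the algorithms compared in this paper. The toy energies and the function names are assumptions for illustration; the free energy uses the standard identity $F = -T \ln Z$.

```python
import itertools
import math

def energy(r, unary, pair):
    """E(r) of eq. 1 for an assignment r (one rotamer index per residue)."""
    e = sum(unary[i][ri] for i, ri in enumerate(r))
    e += sum(table[r[i]][r[j]] for (i, j), table in pair.items())
    return e

def brute_force(unary, pair, M=2, T=1.0):
    """Enumerate every configuration; only viable for tiny models."""
    configs = itertools.product(*[range(len(u)) for u in unary])
    scored = sorted((energy(r, unary, pair), r) for r in configs)
    best = scored[0]                      # task 1: minimal energy configuration
    top_m = scored[:M]                    # task 2: the M best configurations
    Z = sum(math.exp(-e / T) for e, _ in scored)
    free_energy = -T * math.log(Z)        # task 3: F = -T ln Z
    return best, top_m, Z, free_energy

# Toy model: two residues, two rotamers each (made-up numbers).
unary = [[0.0, 1.0], [0.5, 0.0]]
pair = {(0, 1): [[0.0, 2.0], [2.0, 0.0]]}
best, top_m, Z, F = brute_force(unary, pair)
print(best, top_m)
```

With up to 81 rotamers per residue and hundreds of residues, this enumeration is hopeless on real proteins, which is why the approximate and special purpose algorithms below are needed.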

2.1 Finding the best configuration

For the task of finding the minimal energy configuration, the approximate algorithms we tested were max-product belief propagation (BP), naive mean field (MF) and a greedy search. The biologically motivated algorithms used were dead end elimination (DEE) [Desmet et al., 1992, Goldstein, 1994], the SCWRL heuristic search [Dunbrack and Karplus, 1993] and the Monte Carlo simulated annealing process used in the Rosetta program [Kuhlman and Baker, 2000].

Belief propagation is an exact and efficient local message passing algorithm for inference in singly connected graphs [Yedidia et al., 2004]. Its essential idea is replacing the exponential enumeration (either summation or maximization) over the unobserved nodes with a series of local enumerations (a process called "elimination" or "peeling"). Once the messages have converged, they can be used to approximate the most probable configurations, marginal probabilities over individual nodes, the free energy and the conformational entropy of the distribution. Loopy BP, that is, applying BP to multiply connected graphical models, may not converge due to circulation of messages through the loops [Pearl, 1988]. However, many groups have recently reported excellent results using loopy BP as an approximate inference algorithm [Freeman and Pasztor, 1999, Murphy et al., 1999, Frey et al., 2001]. We used the max-product version with an asynchronous update schedule and allowed it to run for 2000 iterations or until numerical convergence.
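The message passing idea can be sketched in a few lines of Python. The version below works in the energy (negative log) domain, where max-product becomes min-sum, and updates messages asynchronously. It is a minimal sketch under stated assumptions: the data layout, the convergence tolerance, the per-message normalization and the per-node decoding rule are choices made for this example, not details taken from the authors' code.

```python
def max_product_bp(unary, pair, n_iter=2000, tol=1e-9):
    """Min-sum form of max-product BP; returns one rotamer index per node.

    unary[i][a] is E_i, pair[(i, j)][a][b] is E_ij (hypothetical layout).
    """
    n = len(unary)
    nbrs = {i: [] for i in range(n)}
    for (i, j) in pair:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def e_pair(i, a, j, b):
        return pair[(i, j)][a][b] if (i, j) in pair else pair[(j, i)][b][a]

    # msg[(i, j)][b]: message from node i to node j about state b of j.
    msg = {(i, j): [0.0] * len(unary[j]) for i in nbrs for j in nbrs[i]}
    for _ in range(n_iter):
        delta = 0.0
        for (i, j) in list(msg):       # asynchronous: updates are used at once
            new = [min(unary[i][a] + e_pair(i, a, j, b)
                       + sum(msg[(k, i)][a] for k in nbrs[i] if k != j)
                       for a in range(len(unary[i])))
                   for b in range(len(unary[j]))]
            shift = min(new)           # normalize so messages stay bounded
            new = [v - shift for v in new]
            delta = max(delta, max(abs(x - y) for x, y in zip(new, msg[(i, j)])))
            msg[(i, j)] = new
        if delta < tol:                # numerical convergence
            break
    assignment = []
    for i in range(n):
        belief = [unary[i][a] + sum(msg[(k, i)][a] for k in nbrs[i])
                  for a in range(len(unary[i]))]
        assignment.append(belief.index(min(belief)))
    return tuple(assignment)

# Three-residue chain (singly connected, so BP is exact); made-up energies.
unary = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.0]]
pair = {(0, 1): [[0.0, 2.0], [2.0, 0.0]], (1, 2): [[0.0, 1.0], [1.0, 0.0]]}
print(max_product_bp(unary, pair))
```

On a graph with loops the same code runs unchanged, but the loop may hit `n_iter` without converging, and the decoded assignment is then only a heuristic answer.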

The naive mean field approach tries to approximate the joint distribution in equation 2 as a product of independent marginals $q_i(r_i)$. The marginals $q_i(r_i)$ can be found by iterating:

$$q_i(r_i) \leftarrow \alpha \Psi_i(r_i) \exp\left( \sum_{j \in N_i} \sum_{r_j} q_j(r_j) \log \Psi_{i,j}(r_i, r_j) \right), \tag{5}$$

where $\alpha$ denotes a normalization constant and $N_i$ denotes the set of all nodes neighboring $i$. We initialized $q_i(r_i)$ to $\Psi_i(r_i)$ and chose a random update ordering for the nodes. For each protein we repeated this minimization 10 times (each time with a different update order) and chose the local minimum that gave the lowest energy.
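The update in equation 5 translates almost line by line into code. The Python sketch below is illustrative only: it assumes strictly positive potentials (so the logarithm is defined), normalizes the initial $q_i$, uses a fixed number of sweeps, and takes the update order as a parameter instead of drawing it at random. The names and the toy model are hypothetical.

```python
import math

def naive_mean_field(psi_i, psi_ij, n_sweeps=200, order=None):
    """Iterate eq. 5; returns the list of marginals q_i."""
    n = len(psi_i)
    nbrs = {i: [] for i in range(n)}
    for (i, j) in psi_ij:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def log_psi(i, a, j, b):
        if (i, j) in psi_ij:
            return math.log(psi_ij[(i, j)][a][b])
        return math.log(psi_ij[(j, i)][b][a])

    # Start from the local potentials, normalized so each q_i sums to 1.
    q = [[v / sum(row) for v in row] for row in psi_i]
    order = list(range(n)) if order is None else list(order)
    for _ in range(n_sweeps):
        for i in order:
            w = [psi_i[i][a] * math.exp(sum(q[j][b] * log_psi(i, a, j, b)
                                            for j in nbrs[i]
                                            for b in range(len(psi_i[j]))))
                 for a in range(len(psi_i[i]))]
            z = sum(w)                      # 1 / alpha in eq. 5
            q[i] = [v / z for v in w]
    return q

# Toy model with T = 1: potentials are exp(-energy); numbers are made up.
unary = [[0.0, 1.0], [0.5, 0.0]]
pair = {(0, 1): [[0.0, 2.0], [2.0, 0.0]]}
psi_i = [[math.exp(-e) for e in row] for row in unary]
psi_ij = {k: [[math.exp(-e) for e in row] for row in t] for k, t in pair.items()}
q = naive_mean_field(psi_i, psi_ij)
assignment = tuple(qi.index(max(qi)) for qi in q)
print(assignment)
```

Reading off a configuration by taking the most probable state of each $q_i$ is one simple decoding choice; passing different `order` values mimics the repeated runs with different update orders described above.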

The greedy search algorithm repeatedly sets the state of each residue to the one that minimizes the configuration energy, until a local minimum is reached. For each protein, we repeated this greedy minimization 10 times, each time with a different starting point and update order, and chose the local minimum that gave the lowest energy.
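A minimal Python sketch of such a greedy search with random restarts follows. The seed handling, the tolerance used to accept a move and the toy model are assumptions for this example; the function recomputes the full energy for every trial move, which is simple but far from efficient.

```python
import random

def greedy_search(unary, pair, n_restarts=10, seed=0):
    """Coordinate-wise greedy descent from random starts; returns (config, energy)."""
    rng = random.Random(seed)
    n = len(unary)

    def energy(r):
        e = sum(unary[i][ri] for i, ri in enumerate(r))
        return e + sum(t[r[i]][r[j]] for (i, j), t in pair.items())

    best_r, best_e = None, float("inf")
    for _ in range(n_restarts):
        r = [rng.randrange(len(u)) for u in unary]   # random starting point
        order = list(range(n))
        rng.shuffle(order)                           # random update order
        improved = True
        while improved:                              # stop at a local minimum
            improved = False
            for i in order:
                for a in range(len(unary[i])):
                    trial = r[:i] + [a] + r[i + 1:]
                    if energy(trial) < energy(r) - 1e-12:
                        r, improved = trial, True
        if energy(r) < best_e:
            best_r, best_e = tuple(r), energy(r)
    return best_r, best_e

# Toy model with two local minima: (0, 0) at 0.5 and (1, 1) at 1.0.
unary = [[0.0, 1.0], [0.5, 0.0]]
pair = {(0, 1): [[0.0, 2.0], [2.0, 0.0]]}
best_r, best_e = greedy_search(unary, pair)
print(best_r, best_e)
```

Each accepted move strictly lowers the energy, so every restart terminates, but only at a local minimum; which one is reached depends on the starting point and update order.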

Many existing methods for minimizing $E(r)$ are based on Dead End Elimination (DEE). This is a search algorithm that tries to reduce the search space until it becomes suitable for an exhaustive search. It is based on a simple condition that identifies rotamers that cannot be members of the global minimum energy conformation [Desmet et al., 1992, Goldstein, 1994]. If enough rotamers can be eliminated, the global minimum energy conformation can be found by an exhaustive search of the remaining rotamers.

The "Goldstein DEE criterion" [Goldstein, 1994] states the following: if for a pair of rotamers $k, l$ of residue $i$ the inequality

$$E_i(R_i = k) - E_i(R_i = l) + \sum_{j, j \ne i} \min_s \left[ E_{i,j}(R_i = k, R_j = s) - E_{i,j}(R_i = l, R_j = s) \right] > 0 \tag{6}$$

holds, then the rotamer $k$ is incompatible with the global minimum energy conformation and can, therefore, be eliminated.
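The criterion in equation 6 can be applied repeatedly until no further rotamer is eliminated. The Python sketch below shows one way to do this; the iteration scheme, the data layout and the names are assumptions made for illustration, and residue pairs without an entry in `pair` are treated as having zero interaction energy.

```python
def goldstein_eliminate(unary, pair):
    """Apply eq. 6 until a fixed point; returns the surviving rotamers per residue."""
    n = len(unary)
    alive = [set(range(len(u))) for u in unary]

    def e_pair(i, a, j, b):
        if (i, j) in pair:
            return pair[(i, j)][a][b]
        if (j, i) in pair:
            return pair[(j, i)][b][a]
        return 0.0                      # no edge: zero interaction energy

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for k in sorted(alive[i]):
                for l in sorted(alive[i]):
                    if l == k:
                        continue
                    # Left-hand side of eq. 6, minimized over surviving rotamers s.
                    gap = unary[i][k] - unary[i][l]
                    gap += sum(min(e_pair(i, k, j, s) - e_pair(i, l, j, s)
                                   for s in alive[j])
                               for j in range(n) if j != i)
                    if gap > 0:
                        alive[i].discard(k)   # k can never beat l: dead end
                        changed = True
                        break
    return alive

# Toy model (made-up numbers) whose global minimum is (0, 0).
unary = [[0.0, 5.0], [0.0, 0.0]]
pair = {(0, 1): [[0.0, 1.0], [1.0, 0.0]]}
alive = goldstein_eliminate(unary, pair)
print(alive)
```

In this toy case elimination leaves a single rotamer per residue, so no exhaustive search is needed; on real proteins the surviving sets are usually larger and a search over them is still required.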

The Side-Chain placement With a Rotamer Library (SCWRL) program is considered one of the leading algorithms for predicting side-chain conformations [Dunbrack and Karplus, 1993]. It uses a van der Waals energy function, which approximates the repulsive portion of the Lennard-Jones 12-6 potential (see also equation 13), and a heuristic search strategy. Finally, the state-of-the-art Rosetta package uses a Monte Carlo simulated annealing procedure to search for the lowest energy configuration.

2.2 Finding the top M configurations

For the task of finding the M best configurations we compared the configurations found using 3 approximate inference algorithms (Best Max Marginal First (BMMF) [Yanover and Weiss, 2004], Gibbs sampling and greedy search) to those obtained using 3 biologically motivated algorithms: generalized DEE, A* [Leach and Lemon, 1998] and Rosetta's simulated annealing [Kuhlman and Baker, 2000].

The BMMF algorithm [Yanover and Weiss, 2004] provably finds the M most probable configurations (or, equivalently, the lowest energy configurations) if max-marginals can be calculated exactly. It has been shown that the algorithm continues to perform well when the max-marginals are approximated using max-product loopy BP or GBP. Herein, we use the max-product BP algorithm to approximate the max-marginals. As competing approximate inference algorithms, we used the configurations obtained during Gibbs sampling and greedy search as described above. The Gibbs sampling process starts in the best configuration found by max-product BP (when it converges) and uses all subsequent samples (that is, the burn-in time is set to 0 and the sampling interval to 1). In the simulations, we stopped the Gibbs sampling after 50,000 rounds, chosen so that the run time would be similar to that of the other (deterministic) algorithms.


One can generalize the DEE criterion by replacing the right hand side of equation (6) with a constant ε. Eliminating in this way is guaranteed to preserve all configurations whose energy is less than the minimal energy plus ε. Unfortunately, the generalized DEE reduces the search space far less than the original DEE criterion, and so the algorithm is limited to relatively small proteins. Note also that we usually do not know a priori which ε to use for a given number of required configurations M. Leach and Lemon [Leach and Lemon, 1998] used generalized DEE to reduce the state space and then applied A* to find (in decreasing order) all side-chain configurations whose energy was less than the minimal energy plus ε. Finally, we collected the lowest energy configurations obtained during Rosetta's simulated annealing.

2.3 Calculating the partition function

For the task of calculating the partition function (or, equivalently, the free energy and the conformational entropy), we compared the Bethe [Yedidia et al., 2004], MF and Gibbs approximations.

Bethe approximation: We used the sum-product BP algorithm to calculate beliefs and pairwise beliefs, $b_i(r_i)$ and $b_{i,j}(r_i, r_j)$. The Bethe free energy, $F_{Bethe}$, is:

$$F_{Bethe} = U_{Bethe} - H_{Bethe}, \tag{7}$$

where the Bethe average energy is given by:

$$U_{Bethe} = \sum_{i,j} \sum_{r_i, r_j} E_{i,j}(r_i, r_j)\, b_{i,j}(r_i, r_j) + \sum_i \sum_{r_i} E_i(r_i)\, b_i(r_i) \tag{8}$$

and the Bethe entropy is defined as:

$$H_{Bethe} = -\sum_{i,j} \sum_{r_i, r_j} b_{i,j}(r_i, r_j) \ln b_{i,j}(r_i, r_j) + \sum_i (d_i - 1) \sum_{r_i} b_i(r_i) \ln b_i(r_i) \tag{9}$$
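Equations 7 to 9 can be checked numerically on a small singly connected model, where the Bethe free energy evaluated at the exact marginals equals $-\ln Z$. The Python sketch below does this with brute-force marginals standing in for the sum-product BP beliefs. It assumes $T = 1$ and takes $d_i$ to be the number of neighbors of node $i$ (the usual convention; the symbol is not defined in the text above). All names and numbers are illustrative.

```python
import itertools
import math

def exact_marginals(unary, pair):
    """Brute-force beliefs b_i, b_ij and Z for P(r) proportional to exp(-E(r))."""
    sizes = [len(u) for u in unary]
    b_i = [[0.0] * s for s in sizes]
    b_ij = {(i, j): [[0.0] * sizes[j] for _ in range(sizes[i])] for (i, j) in pair}
    Z = 0.0
    for r in itertools.product(*[range(s) for s in sizes]):
        e = sum(unary[i][a] for i, a in enumerate(r))
        e += sum(t[r[i]][r[j]] for (i, j), t in pair.items())
        w = math.exp(-e)
        Z += w
        for i, a in enumerate(r):
            b_i[i][a] += w
        for (i, j) in pair:
            b_ij[(i, j)][r[i]][r[j]] += w
    b_i = [[v / Z for v in row] for row in b_i]
    b_ij = {k: [[v / Z for v in row] for row in t] for k, t in b_ij.items()}
    return b_i, b_ij, Z

def xlogx(p):
    return p * math.log(p) if p > 0.0 else 0.0

def bethe_free_energy(unary, pair, b_i, b_ij):
    """F_Bethe = U_Bethe - H_Bethe (eqs. 7-9) with T = 1."""
    degree = [0] * len(unary)            # d_i: number of neighbors of node i
    for (i, j) in pair:
        degree[i] += 1
        degree[j] += 1
    U = sum(pair[k][a][b] * b_ij[k][a][b]
            for k in pair
            for a in range(len(pair[k])) for b in range(len(pair[k][a])))
    U += sum(e * p for ei, bi in zip(unary, b_i) for e, p in zip(ei, bi))
    H = -sum(xlogx(v) for t in b_ij.values() for row in t for v in row)
    H += sum((degree[i] - 1) * sum(xlogx(v) for v in b_i[i])
             for i in range(len(unary)))
    return U - H

# Three-residue chain with made-up energies.
unary = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.0]]
pair = {(0, 1): [[0.0, 2.0], [2.0, 0.0]], (1, 2): [[0.0, 1.0], [1.0, 0.0]]}
b_i, b_ij, Z = exact_marginals(unary, pair)
F = bethe_free_energy(unary, pair, b_i, b_ij)
print(F, -math.log(Z))
```

On graphs with loops the two printed numbers generally differ, and the beliefs come from loopy BP instead of exact enumeration; the size of that gap is what the experiments in section 3.6 measure.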

Given the independent marginals calculated using the MF approximation (equation (5)), the mean field free energy, $F_{MF}$, is:

$$F_{MF} = U_{MF} - H_{MF}, \tag{10}$$

where the MF average energy is given by:

$$U_{MF} = \sum_{i,j} \sum_{r_i, r_j} E_{i,j}(r_i, r_j)\, b_i(r_i)\, b_j(r_j) + \sum_i \sum_{r_i} E_i(r_i)\, b_i(r_i) \tag{11}$$

and the MF entropy is defined as:

$$H_{MF} = -\sum_i \sum_{r_i} b_i(r_i) \ln b_i(r_i) \tag{12}$$
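Equations 10 to 12 are simple to evaluate for any set of independent marginals. The Python sketch below computes $F_{MF}$ for a toy model with $T = 1$ and compares it with the exact free energy $-\ln Z$ obtained by enumeration; for any product distribution the mean field value is an upper bound on the exact one. The uniform marginals, the names and the numbers are assumptions for illustration.

```python
import itertools
import math

def mean_field_free_energy(unary, pair, b):
    """F_MF = U_MF - H_MF (eqs. 10-12) with T = 1; b[i][a] are the marginals."""
    U = sum(pair[(i, j)][a][c] * b[i][a] * b[j][c]
            for (i, j) in pair
            for a in range(len(b[i])) for c in range(len(b[j])))
    U += sum(e * p for ei, bi in zip(unary, b) for e, p in zip(ei, bi))
    H = -sum(p * math.log(p) for bi in b for p in bi if p > 0.0)
    return U - H

def exact_free_energy(unary, pair):
    """-ln Z by exhaustive enumeration (tiny models only)."""
    Z = 0.0
    for r in itertools.product(*[range(len(u)) for u in unary]):
        e = sum(unary[i][a] for i, a in enumerate(r))
        e += sum(t[r[i]][r[j]] for (i, j), t in pair.items())
        Z += math.exp(-e)
    return -math.log(Z)

# Toy model with made-up energies, evaluated at uniform marginals.
unary = [[0.0, 1.0], [0.5, 0.0]]
pair = {(0, 1): [[0.0, 2.0], [2.0, 0.0]]}
uniform = [[0.5, 0.5], [0.5, 0.5]]
F_mf = mean_field_free_energy(unary, pair, uniform)
F_exact = exact_free_energy(unary, pair)
print(F_mf, F_exact)
```

Running the mean field iteration of equation 5 before evaluating $F_{MF}$ tightens the bound, since each update lowers (or leaves unchanged) this same quantity.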

Following [Leach and Lemon, 1998], we also used the samples obtained during Gibbs sampling to directly evaluate the partition function and hence the probability of each sampled configuration. Given these probabilities, it is straightforward to calculate the average energy and entropy.
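One plausible reading of this procedure, sketched below in Python, is to sum the Boltzmann weights of the distinct configurations visited by the sampler and treat that sum as an estimate of $Z$. This interpretation, the fixed sweep order, the default starting configuration and all names are assumptions made for this example and may differ from what was actually run.

```python
import math
import random

def gibbs_partition_estimate(unary, pair, n_rounds=2000, T=1.0, seed=0, start=None):
    """Estimate Z, average energy U and entropy H from visited configurations."""
    rng = random.Random(seed)
    n = len(unary)
    nbrs = {i: [] for i in range(n)}
    for (i, j) in pair:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def e_pair(i, a, j, b):
        return pair[(i, j)][a][b] if (i, j) in pair else pair[(j, i)][b][a]

    def energy(r):
        e = sum(unary[i][a] for i, a in enumerate(r))
        return e + sum(t[r[i]][r[j]] for (i, j), t in pair.items())

    r = list(start) if start is not None else [0] * n
    seen = {tuple(r): energy(r)}          # distinct configurations and energies
    for _ in range(n_rounds):
        for i in range(n):
            # Conditional energies of each state of node i given its neighbors.
            local = [unary[i][a] + sum(e_pair(i, a, j, r[j]) for j in nbrs[i])
                     for a in range(len(unary[i]))]
            weights = [math.exp(-(e - min(local)) / T) for e in local]
            r[i] = rng.choices(range(len(local)), weights=weights)[0]
        seen[tuple(r)] = energy(r)
    Z_est = sum(math.exp(-e / T) for e in seen.values())
    probs = {c: math.exp(-e / T) / Z_est for c, e in seen.items()}
    U = sum(probs[c] * seen[c] for c in seen)
    H = -sum(p * math.log(p) for p in probs.values())
    return Z_est, U, H

# Toy model with made-up energies; the exact Z is available by enumeration.
unary = [[0.0, 1.0], [0.5, 0.0]]
pair = {(0, 1): [[0.0, 2.0], [2.0, 0.0]]}
Z_est, U, H = gibbs_partition_estimate(unary, pair)
Z_exact = math.exp(-0.5) + math.exp(-1.0) + math.exp(-2.0) + math.exp(-3.5)
print(Z_est, Z_exact)
```

Under this reading the estimate can only miss weight, never add it, so it is a lower bound on $Z$ that becomes tight once the sampler has visited all configurations carrying non-negligible probability.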

3 Experiments

We compared the performance of the approximate and the biologically motivated algorithms for the various tasks on a dataset comprising 370 single chain proteins, using 2 different energy functions, as described below.

3.1 Datasets

As a dataset we used 370 X-ray crystal structures with resolution better than or equal to 2 Å, R factor below 20% and mutual sequence identity less than 50%. Each protein consisted of a single chain with up to 1,000 residues. Protein structures were acquired from the Protein Data Bank site¹. For each protein, we built two graphical models: (1) using the approximate SCWRL energy function [Canutescu et al., 2003] and (2) using the more elaborate energy function used in the Rosetta program [Kuhlman and Baker, 2000]. In both cases, we used Dunbrack and Karplus's backbone dependent rotamer library [Dunbrack and Karplus, 1993] to define up to 81 configurations for each side-chain.

In order to compare the performance of approximate inference algorithms, we wanted to perform exact inference on a subset of the dataset. The standard algorithm for exact inference in graphical models is the Junction Tree algorithm [Cowell, 1998, Jordan and Weiss, 2002], which we briefly review here. It comprises four steps:

1. Triangulating the graph by adding edges until the graph contains no chordless cycles of more than 3 nodes.

2. Constructing a new clique graph from the cliques of the triangulated graph. The nodes of the clique graph are cliques of the triangulated graph, and two nodes are connected if they have a nonempty intersection. The weight of an edge between two nodes is the cardinality of the intersection.

3. Finding the maximal spanning tree of the clique graph. This tree is the junction tree.

4. Performing message passing on the junction tree.

¹ http://www.rcsb.org/pdb

The problem of optimal triangulation of a graph is known to be NP-hard, but good heuristics are often used. In practice, the most computationally intensive part of the junction tree algorithm is the last step, where message passing is performed on the junction tree. This message passing step requires marginalizing local distributions over the cliques, and the amount of computation depends on the number of possible configurations of the variables in the clique. We define a junction tree as "infeasible" when the number of configurations in a clique exceeds $10^8$.
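The feasibility test can be illustrated with a greedy elimination ordering, which triangulates the graph as a side effect. The Python sketch below repeatedly eliminates the node whose clique (the node plus its current neighbors) has the fewest joint configurations and reports the largest clique encountered. The heuristic, the function name and the example graphs are assumptions for this illustration; the text does not say which triangulation heuristic was used.

```python
FEASIBLE_LIMIT = 10 ** 8   # threshold on clique configurations quoted above

def max_clique_configurations(n_states, edges):
    """Largest number of joint configurations in any elimination clique."""
    adj = {v: set() for v in range(len(n_states))}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)

    def clique_configs(v):
        size = n_states[v]
        for u in adj[v]:
            size *= n_states[u]
        return size

    worst = 1
    while adj:
        # Greedy choice: cheapest clique first, ties broken by node index.
        v = min(adj, key=lambda x: (clique_configs(x), x))
        worst = max(worst, clique_configs(v))
        nb = list(adj[v])
        for a in nb:
            adj[a].discard(v)
            for b in nb:
                if a != b:
                    adj[a].add(b)   # fill-in edge added by triangulation
        del adj[v]
    return worst

# A 4-cycle of 3-state nodes: triangulation yields cliques of 3 nodes.
cycle = max_clique_configurations([3, 3, 3, 3], [(0, 1), (1, 2), (2, 3), (3, 0)])
# Five mutually connected residues with 81 rotamers each: one clique of 81**5.
k5_edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
dense = max_clique_configurations([81] * 5, k5_edges)
print(cycle <= FEASIBLE_LIMIT, dense <= FEASIBLE_LIMIT)
```

The second example shows how quickly 81-state variables exhaust the budget: five mutually interacting residues already give about $3.5 \times 10^9$ configurations in a single clique.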

For more than 96% of the proteins in the dataset, running junction tree (JT) on the original, full-size graphical model is infeasible. We have, therefore, created another, reduced-size dataset in which each side-chain has only 3 possible configurations, corresponding to 3 distinct $\chi_1$ values. As the interaction energy, we used the average energy over all $(\chi_2, \chi_3, \chi_4)$ configurations with similar $\chi_1$ values. With the SCWRL energy function, running JT was feasible for 223 proteins in the reduced-size dataset. Using the Rosetta energy function, building a JT was possible for only 25 proteins. Note that the reduced-size models preserve the original models' topology.

3.2 Energy functions

SCWRL energy function: SCWRL [Canutescu et al., 2003, Dunbrack and Karplus, 1993] uses a van der Waals energy function, which approximates the repulsive portion of the Lennard-Jones 12-6 potential. For a pair of atoms, $a$ and $b$, the energy of interaction is given by:

$$E(a, b) = \begin{cases} 0 & : d > R_0 \\ -k_2 \frac{d}{R_0} + k_2 & : R_0 \ge d \ge k_1 R_0 \\ E_{max} & : k_1 R_0 > d \end{cases} \tag{13}$$


Figure 2: Histogram of neighborhood size using the SCWRL and Rosetta energy functions. The latter takes into account more distant interactions and, therefore, resulted in larger neighborhoods. (Panels: SCWRL, Rosetta; x-axis: number of neighbors; y-axis: % residues.)

where $E_{max} = 10$, $k_1 = 0.8254$ and $k_2 = E_{max}/(1 - k_1)$; $d$ denotes the distance between $a$ and $b$, and $R_0$ is the sum of their radii. Constant radii were used for the protein's atoms (carbon: 1.6 Å; nitrogen and oxygen: 1.3 Å; sulfur: 1.7 Å). For two sets of atoms, the interaction energy is the sum of the pairwise atom interactions. Note that the SCWRL energy function always assigns zero energy to atoms that are more than 3.4 Å apart, and therefore gives rise to sparse graphical models (see figure 2).
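Equation 13 and the constants above fit in a few lines of Python. The sketch below is an illustration only: the element-keyed radius table, the `(element, coordinates)` atom representation and the function names are assumptions for this example and do not reflect SCWRL's actual code.

```python
import math

E_MAX = 10.0
K1 = 0.8254
K2 = E_MAX / (1.0 - K1)
RADII = {"C": 1.6, "N": 1.3, "O": 1.3, "S": 1.7}   # in Angstrom

def atom_pair_energy(elem_a, elem_b, d):
    """Piecewise-linear repulsive energy of eq. 13 for two atoms at distance d."""
    r0 = RADII[elem_a] + RADII[elem_b]
    if d > r0:
        return 0.0
    if d >= K1 * r0:
        return -K2 * d / r0 + K2        # linear ramp from 0 up to E_MAX
    return E_MAX

def set_energy(atoms_a, atoms_b):
    """Sum of pairwise energies between two atom sets.

    Each atom is a tuple (element, (x, y, z)); hypothetical representation.
    """
    return sum(atom_pair_energy(ea, eb, math.dist(pa, pb))
               for ea, pa in atoms_a for eb, pb in atoms_b)

side_chain_a = [("C", (0.0, 0.0, 0.0))]
side_chain_b = [("C", (1.0, 0.0, 0.0)), ("N", (5.0, 0.0, 0.0))]
print(set_energy(side_chain_a, side_chain_b))
```

The ramp is continuous at both breakpoints: it is 0 at $d = R_0$ and reaches $E_{max}$ at $d = k_1 R_0$, which is exactly what the definition of $k_2$ ensures. The largest possible $R_0$ is 3.4 Å (two sulfur atoms), matching the cutoff stated above.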

Rosetta energy function: We used the Kuhlman and Baker full-atom potential, as implemented in Rosetta's rotamer packing module, to calculate the local (side-chain to backbone) and pairwise (side-chain to side-chain) interactions [Kuhlman and Baker, 2000]. The complete full-atom potential comprises: (1) the attractive portion of the 12-6 Lennard-Jones potential, (2) a linear repulsive term, less repulsive than the 12-6 Lennard-Jones potential (to compensate for the use of a fixed backbone and rotamer set), (3) solvation energies calculated using the model of Lazaridis and Karplus [Lazaridis and Karplus, 1999], (4) an approximation to electrostatic interactions in proteins based on PDB statistics, (5) a hydrogen-bonding potential [Kortemme et al., 2003] and (6) backbone dependent internal free energies of the rotamers estimated from PDB statistics. The Rosetta energy function accounts for distant interactions and therefore gives rise to denser graphical models (figure 2) than SCWRL's.


Figure 3: Convergence rates for the max- and sum-product BP algorithms, using the SCWRL and Rosetta energy functions. (Panels: SCWRL, Rosetta; x-axis: protein length [aa]; y-axis: convergence rate [%]; series: max-BP, sum-BP.)

3.3 BP convergence rate

For the loopy belief propagation algorithm, the sequence of messages is not guaranteed to converge to a fixed point. We therefore started by testing the convergence rate of max- and sum-product BP on the full-size side-chain prediction dataset, using the SCWRL and Rosetta energy functions (figure 3). As can be seen, sum-product converges more often than max-product, and the difference in convergence rate increases as the inference problems get harder (that is, for larger proteins or using Rosetta's more elaborate energy function). Note, however, that even max-product converges for the majority of proteins up to 600 amino acids long, using both energy functions.

3.4 Finding the minimal energy configuration

For all models where we could find the global minimum (using DEE and the junction tree exact inference algorithm, see below), BP always found it. Next, we compared the performance of approximate inference algorithms (BP, MF and Gibbs sampling) for the task of finding the best configuration. Figure 4 shows that BP, when it converges, always gave a lower energy.

We then compared the performance of BP and state-of-the-art algorithms. We performed DEE using the algorithm of [Goldstein, 1994] and, when it resulted in a small enough search space, we used the max-junction tree algorithm [Cowell, 1998] to find the most likely configuration of the variables


Figure 4: The energies obtained using naive MF (top row) and greedy search (bottom row) are always higher than or equal to max-product BP energies, both for the SCWRL and Rosetta energy functions. (Panels: SCWRL, Rosetta; x-axis: E(Mean field) − E(max-BP) in the top row and E(Greedy) − E(max-BP) in the bottom row; y-axis: % proteins.)


Figure 5: Search space (top row) and junction tree max clique size (bottom row) of the original side-chain prediction problem and the one obtained using the Goldstein DEE criterion. The dotted lines indicate the maximal feasible clique size. (Panels: SCWRL, Rosetta; x-axis: protein length [aa]; y-axis: log[search space] in the top row and log[max clique] in the bottom row; series: DEE, Original.)


Figure 6: The energies obtained using the SCWRL and Rosetta packages are always higher than or equal to max-product BP energies. (Panels: SCWRL, with x-axis E(SCWRL) − E(max-BP), and Rosetta, with x-axis E(SA) − E(max-BP); y-axis: % proteins.)

(and hence the global minimum of the energy function). Murphy's implementation of the JT algorithm in his BN toolbox for Matlab was used [Murphy, 2001]. This method was feasible for 327 (88.4%) of the graphical models that were built using the SCWRL energy function but only for 118 (31.9%) of those built using Rosetta's energy function (see figure 5).

Finally, we compared the energies found by max-product BP to those found by the SCWRL and Rosetta programs using identical energy functions. Figure 6 shows that BP always found a lower energy.

3.5 Finding the M best configurations

For the task of finding the best 100 configurations, we first compared the performance of the approximate inference algorithms: BMMF using max-product BP, Gibbs sampling and greedy search. Since finding the correct top configurations was infeasible for most proteins in our datasets, we sorted the configurations found by all these algorithms (based on their energies) and used the top 100 as the "best" ones. Figures 7 and 8 show that BMMF, when it converges, always found better configurations.

Following [Leach and Lemon, 1998], we applied the generalized DEE criterion (that is, replacing the right hand side of equation (6) with a constant ε > 0). Eliminating in this way is guaranteed to preserve all configurations whose energy is less than the minimal energy plus ε. Unfortunately, the generalized DEE reduces the search space far less than the original DEE criterion (see figure 9). Note also that we usually do not know a priori which ε to use for a given number of required configurations. When enough rotamers are eliminated, a junction tree can be built, allowing the efficient enumeration of all side-chain configurations within ε [Nilsson, 1998]. For SCWRL graphical models and ε = 1, building a JT was possible for 81.2% of the proteins up to 300 residues long and for only 33.3% of the longer proteins. For the graphical models built using the Rosetta energy function and ε = 0.1, running JT was feasible only for 10.2% of the proteins shorter than 300 amino acids.

Figure 7: The top 100 configurations found by loopy-BMMF compared to those obtained using Gibbs sampling and greedy local search for a 295 amino-acid protein (PDB code 1cqw). To keep plots informative, we omitted most of the greedy search energies. (Panels: SCWRL, Rosetta; x-axis: configuration number; y-axis: energy; series: BMMF, Gibbs, Greedy.)

Figure 8: Average percentage of best configurations found using the BMMF algorithm, Gibbs sampling and a greedy search, for each protein in the dataset (bars denote standard deviation). (Panels: SCWRL, Rosetta; bars: BMMF, Gibbs, Greedy; y-axis: % best.)

Figure 9: Search space size of the original side-chain prediction problem and those obtained using the Goldstein DEE criterion with different ε's. (Panels: SCWRL, Rosetta; x-axis: protein length [aa]; y-axis: log[search space]; series: ε=0, ε=0.1, ε=1, Original.)

Leach and Lemon [Leach and Lemon, 1998] used a combination of 2 search algorithms, generalized DEE and the A* heuristic search algorithm, to find the M lowest energy configurations. We compared the top 100 correct configurations obtained using their algorithm to those found using the loopy BMMF algorithm, using BP. In all cases where A* was feasible, loopy BMMF found the correct configurations. Also, the BMMF algorithm converged more often (for example, for proteins shorter than 300 amino acids and using the SCWRL energy function, 96.3% compared to 76.3%, see figure 10) and ran much faster.

Finally, we compared the configurations encountered during Rosetta's simulated annealing procedure to those found by the BMMF algorithm. Here, again, we do not know the correct top configurations and therefore we sorted the configurations found by the algorithms (based on their energies)


Figure 10: BMMF compared to state-of-the-art algorithms for finding the best configurations. Left: convergence rates of BMMF, generalized DEE followed by junction tree and the A* algorithm, using the SCWRL energy function. Right: average percentage of best configurations found using the BMMF algorithm compared to those obtained during Rosetta's simulated annealing process (bars denote standard deviation). (Left panel, SCWRL: bars BMMF, DEE+JT, A*; y-axis: % convergence. Right panel, Rosetta: bars BMMF, SA; y-axis: % best.)

and used the top 100 as the best ones. Figure 10 shows that BMMF, when it converges, always found better configurations.

3.6 Approximating the partition function

When running junction tree is feasible, exact algorithms can easily calculate the free energy and conformational entropy of a given protein. As mentioned above, the generalized DEE cannot be used for this task, and therefore running JT is infeasible for all but a small number of proteins from the full-size dataset (27 of the SCWRL graphical models and 4 of Rosetta's). We therefore used the reduced-size dataset and calculated the free energies using the JT algorithm. We then compared these exact values to those obtained using BP, MF and Gibbs sampling. The results are shown in figure 11. Note that for the vast majority of the proteins, the free energy errors obtained by the Bethe approximation were negligible and much smaller than those obtained by the other methods.


Figure 11: Left (SCWRL; x-axis: free energy error, y-axis: proteins [%]): histogram of free energy errors obtained by the Bethe (BP), MF and Gibbs sampling approximations, compared to the exact JT free energies, using the SCWRL energy function. Right (Rosetta; x-axis: JT free energy, y-axis: approximate free energy): free energies calculated with the Bethe (sum-BP), MF and Gibbs sampling approximations, plotted against the exact JT free energies, using the Rosetta energy function.

3.7 Prediction success

In comparing the performance of the algorithms, we have focused on the energy of the found configuration, since this is the quantity the algorithms seek to optimize. A more realistic performance measure is how well the algorithms predict the native structure of the protein. This can be quantified by the success rate of the algorithms in predicting the native structure. The dihedral angle χi is deemed correct when it is within 30° of the native (crystal) structure and χ1 to χi−1 are correct. The success rate is defined as the fraction of correctly predicted dihedral angles.
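The correctness criterion above can be sketched in a few lines. This is a minimal illustration and not the benchmark server’s implementation: it assumes angles are given in degrees, that differences are measured on the circle (so 175° and −178° are 7° apart), and that “within 30°” includes the boundary; the function names and the toy angles are hypothetical.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def correct_chi_flags(predicted, native, tolerance=30.0):
    """flags[k] is True when chi_(k+1) and all earlier chi angles are correct."""
    flags = []
    previous_correct = True
    for p, n in zip(predicted, native):
        ok = previous_correct and angular_difference(p, n) <= tolerance
        flags.append(ok)
        previous_correct = ok
    return flags


def success_rate(residues, depth=1, tolerance=30.0):
    """Fraction of residues whose chi_1..chi_depth are all correct.

    residues is a list of (predicted_chis, native_chis) pairs; residues with
    fewer than `depth` chi angles are skipped.
    """
    eligible = [(p, n) for p, n in residues if min(len(p), len(n)) >= depth]
    if not eligible:
        raise ValueError("no residue has enough chi angles")
    hits = sum(correct_chi_flags(p, n, tolerance)[depth - 1] for p, n in eligible)
    return hits / len(eligible)


# Toy residues: (predicted chi angles, native chi angles), in degrees.
residues = [
    ([62.0, 175.0], [60.0, -178.0]),   # chi1 off by 2, chi2 off by 7
    ([-65.0, 90.0], [-170.0, 95.0]),   # chi1 off by 105, so chi2 is not counted
    ([180.0], [-175.0]),               # single chi angle, off by 5
]
chi1_rate = success_rate(residues, depth=1)
chi12_rate = success_rate(residues, depth=2)
print(chi1_rate, chi12_rate)
```

Because χi counts only when χ1 to χi−1 are also correct, the (χ1, χ2) rate can never exceed the χ1 rate computed over the same residues.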

We compiled a new version of the Rosetta program that uses BP, instead of simulated annealing, to find the minimal energy side-chain configuration (when BP fails to converge, it runs simulated annealing instead). We then used the RosettaDesign Benchmark web-server (http://benchmark.med.unc.edu/) to compare the success rates obtained using simulated annealing (as implemented in the original version of Rosetta) and max-product BP (implemented in our modified version). The RosettaDesign Benchmark web-server runs a given Rosetta executable file on a benchmark of 276 proteins (64397 residues) and outputs the χ1 and (χ1, χ2) success rates for each amino acid. Figure 12 shows the χ1 and (χ1, χ2) success rates obtained using simulated annealing and BP. As can be seen, on average, max-product BP performed only slightly better than simulated annealing. This is despite the fact that BP always found a lower energy configuration.

Figure 12: Max-product BP and simulated annealing (SA) χ1 (left) and (χ1, χ2) (right) success rates [%] per amino acid type and overall, as reported by the RosettaDesign Benchmark web-server. As may be seen, on average, BP performed slightly better than simulated annealing.

4 Discussion

Recent years have seen much progress in approximate inference. We believe that the comparison of different approximate inference algorithms is best done in the context of a real-world problem. In this paper we have shown that for a real-world problem with many loops, the performance of belief propagation is excellent. In problems where exact inference was possible, max-product BP always found the global minimum of the energy function, BMMF (using max-product BP) obtained the correct top configurations, and sum-product BP best approximated the partition function. In cases where exact inference was infeasible, max-product BP always found lower energy configurations than the other algorithms tested.


As shown, by using inference algorithms we obtained lower energy conformations than existing algorithms. However, this leads only to a modest increase in prediction accuracy. Using an energy function that gives a better approximation to the “true” physical energy (and, in particular, assigns the lowest energy to the native structure) should significantly improve the success rate. A promising direction for future research is to learn the energy function from examples. Inference algorithms such as BP may play an important role in the learning procedure.

References

Canutescu, A. A., Shelenkov, A. A., and Dunbrack, Jr., R. L. (2003). A graph-theory algorithm for rapid protein side-chain prediction. Protein Sci, 12(9):2001–2014.

Cowell, R. (1998). Advanced inference in Bayesian networks. In Jordan, M., editor, Learning in Graphical Models. MIT Press.

Desmet, J., Maeyer, M. D., Hazes, B., and Lasters, I. (1992). The dead-end elimination theorem and its use in protein side-chain positioning. Nature, 356:539–542.

Dunbrack, Jr., R. L. and Karplus, M. (1993). Backbone-dependent rotamer library for proteins: Application to side-chain prediction. J. Mol. Biol, 230:543–574.

Fraenkel, A. S. (1997). Protein folding, spin glass and computational complexity. In Proceedings of the 3rd DIMACS Workshop on DNA Based Computers, held at the University of Pennsylvania, June 23–25, 1997, pages 175–191.

Freeman, W. and Pasztor, E. (1999). Learning to estimate scenes from images. In Kearns, M., Solla, S., and Cohn, D., editors, Adv. Neural Information Processing Systems 11. MIT Press.

Frey, B., Koetter, R., and Petrovic, N. (2001). Very loopy belief propagation for unwrapping phase images. In Adv. Neural Information Processing Systems 14. MIT Press.


Goldstein, R. F. (1994). Efficient rotamer elimination applied to protein side-chains and related spin glasses. Biophys. J., 66(5):1335–1340.

Huang, E. S., Koehl, P., Levitt, M., Pappu, R. V., and Ponder, J. W. (1998). Accuracy of side-chain prediction upon near-native protein backbones generated by ab initio folding methods. Proteins, 33(2):204–217.

Jordan, M. and Weiss, Y. (2002). Graphical models: probabilistic inference. In Arbib, M. A., editor, Handbook of Neural Networks and Brain Theory. MIT Press, 2nd edition.

Kortemme, T., Morozov, A. V., and Baker, D. (2003). An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. Journal of Molecular Biology, 326(4):1239–1259.

Kuhlman, B. and Baker, D. (2000). Native protein sequences are close to optimal for their structures. PNAS, 97(19):10383–10388.

Lazaridis, T. and Karplus, M. (1999). Effective energy function for proteins in solution. Proteins: Structure, Function, and Genetics, 35(2):133–152.

Leach, A. R. and Lemon, A. P. (1998). Exploring the conformational space of protein side chains using dead-end elimination and the A* algorithm. Proteins: Structure, Function, and Genetics, 33(2):227–239.

Murphy, K. (2001). The Bayes Net Toolbox for Matlab. Computing Science and Statistics, 33.

Murphy, K., Weiss, Y., and Jordan, M. (1999). Loopy belief propagation for approximate inference: an empirical study. In Proceedings of Uncertainty in AI.

Nilsson, D. (1998). An efficient algorithm for finding the M most probable configurations in probabilistic expert systems. Statistics and Computing, 8:159–173.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.


Pierce, N. A. and Winfree, E. (2002). Protein design is NP-hard. Protein Eng., 15(10):779–782.

Yanover, C. and Weiss, Y. (2004). Finding the M most probable configurations in arbitrary graphical models. In Thrun, S., Saul, L., and Schölkopf, B., editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA.

Yedidia, J., Freeman, W., and Weiss, Y. (2004). Constructing free energy approximations and generalized belief propagation algorithms. Technical Report 2004-040, MERL. http://www.merl.com/reports/docs/TR2004-040.pdf.
