Suchendra M. Bhandarkar and Jinling Huang
Department of Computer Science
The University of Georgia, Athens, Georgia 30602-7404, USA
suchi@cs.uga.edu, jinling@cs.uga.edu

Jonathan Arnold
Department of Genetics
The University of Georgia, Athens, Georgia 30602-7223, USA
arnold@arches.uga.edu

Abstract

Reconstructing a physical map of a chromosome from a genomic library presents a central computational problem in genetics. Physical map reconstruction in the presence of errors is a problem of high computational complexity. Parallel Monte Carlo methods for a maximum likelihood estimation-based approach to physical map reconstruction are presented. The estimation procedure entails gradient descent search for determining the optimal spacings between probes for a given probe ordering. The optimal probe ordering is determined using a simulated Monte Carlo algorithm. A two-tier parallelization strategy is proposed wherein the gradient descent search is parallelized at the lower level and the simulated Monte Carlo algorithm is simultaneously parallelized at the higher level. Implementation and experimental results on a network of shared-memory symmetric multiprocessors (SMPs) are presented.

Table 1: An example of clone-probe hybridization data in the absence of errors

1 Introduction

Figure 1: An example of clone-probe ordering along a chromosome

Generation of entire chromosomal maps is a central problem in genetics. Chromosomal maps fall into two broad categories: genetic maps and physical maps. Genetic maps are typically of low resolution (1-10 million base pairs (Mb)) and represent an ordering of genetic markers along a chromosome where the distance between two genetic markers is related to their recombination frequency. A physical map is an ordering of distinguishable DNA fragments called clones or contigs by their position along the entire chromosome, where the clones may or may not contain genetic markers. A physical map has a much higher resolution (10-100 thousand base pairs (Kb)) than a genetic map of the same chromosome. While genetic maps enable a scientist to narrow the search for genes to a particular chromosomal region, it is a physical map that ultimately allows the recovery and molecular manipulation of genes of interest.

Proceedings of the IEEE Computer Society Bioinformatics Conference (CSB'02) 0-7695-1653-X/02 $17.00 © 2002 IEEE

The physical mapping protocol essentially determines the nature of clonal data and the probe selection procedure. The physical mapping protocol used in this project is the one based on sampling without replacement [5]. Under this protocol, a maximal set $\mathcal{P}$ of non-overlapping equal-length clones from a library is selected as the probe set. The remaining clones $\mathcal{C}$ in the library are hybridized to the probe set, resulting in a digital hybridization signature for each clone. The clone-probe overlap pattern is represented by a binary hybridization matrix $H$ where $H_{ij} = 1$ if the $i$th clone hybridizes to the $j$th probe and $H_{ij} = 0$ otherwise (Table 1). If the probes in $\mathcal{P}$ are ordered with respect to their position along a chromosome, then by selecting from $H$ a common overlapping clone for each pair of adjacent probes, a minimal set of clones and probes that covers the entire chromosome (i.e., a minimal tiling) can be obtained (Figure 1). The minimal tiling, in conjunction with the sequencing of each individual clone/probe in the tiling and a sequence assembly procedure that determines the overlaps between successive sequenced clones/probes in the tiling [10], can then be used to reconstruct the DNA sequence of the entire chromosome. In reality, $H$ can be expected to contain false positives and false negatives. An entry $H_{ij}$ is a false positive if $H_{ij} = 1$ is observed when the true value is 0. Conversely, $H_{ij}$ is a false negative if $H_{ij} = 0$ is observed when the true value is 1. In this paper, we confine ourselves to errors in the form of false positives and false negatives.
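The hybridization matrix and the two error types described above can be illustrated with a minimal Python sketch. The function names, the coordinates and the rule that a clone hybridizes to a probe exactly when the two equal-length intervals overlap are assumptions made for illustration; they are not taken from the paper's implementation.

```python
import random

def hybridization_matrix(clone_starts, probe_starts, M):
    """Error-free H: H[i][j] = 1 iff clone i overlaps probe j.

    Assumption: clones and probes all have length M, so two intervals
    overlap exactly when their left ends differ by less than M.
    """
    return [[1 if abs(c - p) < M else 0 for p in probe_starts]
            for c in clone_starts]

def add_errors(H, fp_rate, fn_rate, rng):
    """Flip entries independently: 0 -> 1 with probability fp_rate
    (false positive), 1 -> 0 with probability fn_rate (false negative)."""
    noisy = []
    for row in H:
        new_row = []
        for h in row:
            u = rng.random()
            if h == 1:
                new_row.append(0 if u < fn_rate else 1)
            else:
                new_row.append(1 if u < fp_rate else 0)
        noisy.append(new_row)
    return noisy

# Hypothetical toy instance: three non-overlapping probes, five clones.
M = 10
probe_starts = [5, 20, 38]
clone_starts = [0, 12, 25, 33, 50]
H = hybridization_matrix(clone_starts, probe_starts, M)
H_noisy = add_errors(H, fp_rate=0.02, fn_rate=0.02, rng=random.Random(0))
```

In this toy instance the second clone overlaps the first two probes, so it could serve as the common overlapping clone linking them in a minimal tiling.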

In this paper we briefly describe a maximum likelihood (ML) estimator proposed in [3, 11, 18] which determines the ordering of probes in the probe set $\mathcal{P}$ and also the inter-probe spacings under a probabilistic model of hybridization errors consisting of false positives and false negatives. The estimation procedure involves a combination of discrete and continuous optimization: determining the probe ordering entails discrete (i.e., combinatorial) optimization, whereas determining the inter-probe spacings for a particular probe ordering entails continuous optimization. We propose a two-tier parallelization strategy for efficient implementation of the above estimator. The upper level comprises parallel discrete optimization using simulated Monte Carlo methods, whereas the lower level comprises parallel conjugate gradient descent. The resulting parallel algorithms are implemented on a network of shared-memory symmetric multiprocessors (SMPs) using a combination of the Message Passing Interface (MPI) environment [16] and multi-threaded programming [2]. Convergence, speedup and scalability characteristics of the parallel algorithms are analyzed and discussed.

2 Mathematical Formulation of the ML Estimator

The ML estimator reconstructs the ordering of probes in the probe set $\mathcal{P}$ and the inter-probe spacings under a probabilistic model of hybridization errors consisting of false positives and false negatives. The probe ordering problem can be formally stated as follows. Given a set $\mathcal{P} = \{P_1, P_2, \ldots, P_n\}$ of $n$ probes and a set $\mathcal{C} = \{C_1, C_2, \ldots, C_k\}$ of $k$ clones generated using the sampling-without-replacement protocol, and the $k \times n$ clone-probe hybridization matrix $H$ containing both false positives and false negatives with predefined probabilities, reconstruct the correct ordering $\Pi = (\pi_1, \pi_2, \ldots, \pi_n)$ of the probes and also the correct spacing $Y = (Y_1, Y_2, \ldots, Y_n)$ between the probes. The ordering $\Pi$ is a permutation of $(1, \ldots, n)$ that gives the labels (indices) of the probes in left-to-right order across the chromosome. In the inter-probe spacing vector $Y$, $Y_1$ denotes the space between the left end of the first probe $P_{\pi_1}$ and the left end of the chromosome, and $Y_i$ the spacing between the right end of probe $P_{\pi_{i-1}}$ and the left end of probe $P_{\pi_i}$ (where $2 \le i \le n$). The spacing between the right end of probe $P_{\pi_n}$ and the right end of the chromosome is given by $Y_{n+1} = N - nM - \sum_{i=1}^{n} Y_i$, where $N$ is the length of the chromosome and $M$ is the length of each clone/probe. Recall that our protocol requires that all probes and clones be of the same length.

The problem as stated above is ill-posed since the underlying constraints do not imply a unique solution. Hence the problem is formulated as one of determining a probe ordering and the inter-probe spacings that maximize the likelihood of the observed hybridization matrix $H$ given predefined probabilities for false positives and false negatives.

2.1 Mathematical Notation

The mathematical notation used in the formulation of the ML estimator is given below:
$N$: length of the chromosome,
$M$: length of a clone/probe,
$n$: number of probes,
$k$: number of clones,
$\rho$: probability of a false positive,
$\eta$: probability of a false negative,
$H = ((h_{i,j}))_{1 \le i \le k,\, 1 \le j \le n}$: clone-probe hybridization matrix, where

$$h_{i,j} = \begin{cases} 1 & \text{if clone } C_i \text{ hybridizes with probe } P_j, \\ 0 & \text{otherwise,} \end{cases}$$

$H_i$: $i$th row of the hybridization matrix,
$\Pi = (\pi_1, \ldots, \pi_n)$: permutation of $\{1, 2, \ldots, n\}$ which denotes the probe labels in the ordering when scanned from left to right along the chromosome,
$P_i = \sum_{j=1}^{n} h_{i,j}$: number of 1's in $H_i$,
$P = \sum_{i=1}^{k} P_i$: total number of 1's in $H$,
$Y = (Y_1, Y_2, \ldots, Y_n)$: vector of inter-probe spacings, where $Y_i$ is the spacing between the right end of $P_{\pi_{i-1}}$ and the left end of $P_{\pi_i}$ ($2 \le i \le n$), and $Y_1$ is the spacing between the left end of $P_{\pi_1}$ and the left end of the chromosome, and
$\mathcal{F} \subseteq \mathbb{R}^n$: set of feasible inter-probe spacings $Y = (Y_1, \ldots, Y_n)$ such that $Y_i \ge 0$, $1 \le i \le n$, and $N - nM - \sum_{i=1}^{n} Y_i \ge 0$.

2.2 The ML Model

Given a vector of inter-probe spacings $Y = (Y_1, \ldots, Y_n)$, there are $2^{n+1}$ possible cases to consider depending on whether $0 \le Y_i \le M$ or $Y_i > M$, where $1 \le i \le n+1$. It can be shown that the $2^{n+1}$ cases can be analyzed based on the clone-probe overlap pattern [3, 18]. In general, the clone-probe overlap pattern results in three different types of regions, namely:
Type 1: The Both region $R_B(P_{\pi_j}, P_{\pi_{j+1}})$ between probes $P_{\pi_j}$ and $P_{\pi_{j+1}}$, for $j = 1, \ldots, n-1$. An intervening clone hybridizes to both probes if its left end falls in this region (Figure 2).
Type 2: The Only region $R_O(P_{\pi_j})$ of probe $P_{\pi_j}$, for $j = 1, \ldots, n$. A clone will hybridize to $P_{\pi_j}$ only if its left end falls in this region (Figure 3).
Type 3: The None region $R_N(P_{\pi_j})$ after probe $P_{\pi_j}$, for $j = 0, \ldots, n$. A clone will hybridize to no probe if its left end falls in this region. Here probe $P_{\pi_0}$ denotes the beginning of the chromosome (Figure 4).

Figure 2: Inter-probe spacings: Type 1

Figure 3: Inter-probe spacings: Type 2

Figure 4: Inter-probe spacings: Type 3

Let $l(R)$ denote the length of region $R$. It can be shown that for $j = 1, \ldots, n-1$,

$$l(R_B(P_{\pi_j}, P_{\pi_{j+1}})) = M - \min(Y_{j+1}, M) \tag{1}$$

for $j = 1, \ldots, n$,

$$l(R_O(P_{\pi_j})) = \min(Y_j, M) + \min(Y_{j+1}, M) \tag{2}$$

and for $j = 0, \ldots, n$,

$$l(R_N(P_{\pi_j})) = Y_{j+1} - \min(Y_{j+1}, M) \tag{3}$$

We assume that the left ends of the clones are uniformly distributed over the interval $[0, N - M]$. Therefore it can be shown that for $j = 1, \ldots, n-1$, the probability $P_{Both}$ that a randomly chosen clone will fall in the region $R_B(P_{\pi_j}, P_{\pi_{j+1}})$ is given by:

$$P_{Both} = \frac{M - \min(Y_{j+1}, M)}{N - M} \tag{4}$$

for $j = 1, \ldots, n$ the probability $P_{Only}$ that a randomly chosen clone will fall in the region $R_O(P_{\pi_j})$ is given by:

$$P_{Only} = \frac{\min(Y_j, M) + \min(Y_{j+1}, M)}{N - M} \tag{5}$$

and for $j = 0, \ldots, n$ the probability $P_{None}$ that a randomly chosen clone will fall in the region $R_N(P_{\pi_j})$ is given by [3, 18]:

$$P_{None} = \frac{Y_{j+1} - \min(Y_{j+1}, M)}{N - M} \tag{6}$$

Let $O_{i,j}$ be the event that clone $i$ falls in the region $R_O(P_{\pi_j})$, $B_{i,j}$ the event that clone $i$ falls in the region $R_B(P_{\pi_j}, P_{\pi_{j+1}})$, and $N_{i,j}$ the event that clone $i$ falls in the region $R_N(P_{\pi_j})$. Then the conditional probability of observing a clonal signature $H_i$ (i.e., the $i$th row in $H$) given a probe ordering $\Pi$ and an inter-probe spacing vector $Y$ is given by

$$\begin{aligned} P(H_i \mid \Pi, Y) = {} & \sum_{j=1}^{n} P(H_i \mid \Pi, Y, O_{i,j})\, P(O_{i,j} \mid \Pi, Y) \\ & + \sum_{j=1}^{n-1} P(H_i \mid \Pi, Y, B_{i,j})\, P(B_{i,j} \mid \Pi, Y) \\ & + \sum_{j=0}^{n} P(H_i \mid \Pi, Y, N_{i,j})\, P(N_{i,j} \mid \Pi, Y) \end{aligned} \tag{7}$$

Conditioning on $\Pi$, $Y$ and $O_{i,j}$ implies that, in the absence of errors, only $h_{i,\pi_j} = 1$ and all the remaining entries in row $H_i$ are 0. In other words, $h_{i,\pi_j} \ne 1$ implies a false negative, and a 1 in any other column position in the row $H_i$ implies a false positive. That is,

$$h_{i,\pi_j} = \begin{cases} 0 & \text{with probability } \eta \\ 1 & \text{with probability } (1 - \eta) \end{cases} \tag{8}$$

and for $k = 1, \ldots, n$ where $k \ne j$,

$$h_{i,\pi_k} = \begin{cases} 0 & \text{with probability } (1 - \rho) \\ 1 & \text{with probability } \rho. \end{cases} \tag{9}$$
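As a quick check on the region probabilities in equations (4) to (6), the following Python sketch computes them for a given spacing vector and confirms that they sum to 1, i.e., that the Both, Only and None regions together cover the interval $[0, N - M]$. The function name and the numeric values are hypothetical and chosen only for illustration.

```python
def region_probabilities(Y, N, M):
    """Return (both, only, none) probability lists per equations (4)-(6).

    Y holds Y_1..Y_n; Y_{n+1} = N - n*M - sum(Y) is derived from it.
    """
    n = len(Y)
    Y_last = N - n * M - sum(Y)
    if min(Y) < 0 or Y_last < 0:
        raise ValueError("spacing vector is not feasible")
    Yx = [None] + list(Y) + [Y_last]          # 1-based: Yx[j] = Y_j

    def m(j):
        return min(Yx[j], M)

    both = [(M - m(j + 1)) / (N - M) for j in range(1, n)]            # j = 1..n-1
    only = [(m(j) + m(j + 1)) / (N - M) for j in range(1, n + 1)]     # j = 1..n
    none = [(Yx[j + 1] - m(j + 1)) / (N - M) for j in range(0, n + 1)]  # j = 0..n
    return both, only, none

# Hypothetical example: chromosome length 100, clone/probe length 10.
both, only, none = region_probabilities([5, 4, 15], N=100, M=10)
total = sum(both) + sum(only) + sum(none)
```

The sum equals 1 for any feasible spacing vector, because the region lengths in equations (1) to (3) add up to $N - M$.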


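We assume that the false positive and false negative errors at different positions along the clonal signature $H_i$ are independent of each other. Hence

$$P(H_i \mid \Pi, Y, O_{i,j}) = (1-\eta)^{h_{i,\pi_j}} \cdot \eta^{(1-h_{i,\pi_j})} \cdot \rho^{(P_i - h_{i,\pi_j})} \cdot (1-\rho)^{(n-1)-(P_i - h_{i,\pi_j})} \tag{10}$$

Following the same argument we can show that

$$P(H_i \mid \Pi, Y, B_{i,j}) = (1-\eta)^{(h_{i,\pi_j} + h_{i,\pi_{j+1}})} \cdot \eta^{(2 - h_{i,\pi_j} - h_{i,\pi_{j+1}})} \cdot \rho^{(P_i - h_{i,\pi_j} - h_{i,\pi_{j+1}})} \cdot (1-\rho)^{(n-2)-(P_i - h_{i,\pi_j} - h_{i,\pi_{j+1}})} \tag{11}$$

and

$$P(H_i \mid \Pi, Y, N_{i,j}) = \rho^{P_i} \cdot (1-\rho)^{(n-P_i)} \tag{12}$$

Hence we get

$$\begin{aligned} P(H_i \mid \Pi, Y) = {} & \sum_{j=1}^{n} \left[ (1-\eta)^{h_{i,\pi_j}} \cdot \eta^{(1-h_{i,\pi_j})} \cdot \rho^{(P_i - h_{i,\pi_j})} \cdot (1-\rho)^{(n-1)-(P_i - h_{i,\pi_j})} \cdot \frac{\min(Y_j, M) + \min(Y_{j+1}, M)}{N - M} \right] \\ & + \sum_{j=1}^{n-1} \left[ (1-\eta)^{(h_{i,\pi_j} + h_{i,\pi_{j+1}})} \cdot \eta^{(2 - h_{i,\pi_j} - h_{i,\pi_{j+1}})} \cdot \rho^{(P_i - h_{i,\pi_j} - h_{i,\pi_{j+1}})} \cdot (1-\rho)^{(n-2)-(P_i - h_{i,\pi_j} - h_{i,\pi_{j+1}})} \cdot \frac{M - \min(Y_{j+1}, M)}{N - M} \right] \\ & + \sum_{j=0}^{n} \left[ \rho^{P_i} \cdot (1-\rho)^{(n-P_i)} \cdot \frac{Y_{j+1} - \min(Y_{j+1}, M)}{N - M} \right] \end{aligned} \tag{13}$$

We assume that the clones in $\mathcal{C}$ are independently distributed along the chromosome, i.e., each row of $H$ is independent of the other rows. Hence $P(H \mid \Pi, Y) = \prod_{i=1}^{k} P(H_i \mid \Pi, Y)$, which gives us

$$P(H \mid \Pi, Y) = \prod_{i=1}^{k} C_i \left\{ R_i - \sum_{j=1}^{n+1} (a_{i,\pi_j} - 1)(a_{i,\pi_{j-1}} - 1) \min(Y_j, M) \right\} \tag{14}$$

where

$$a_{i,j} = \begin{cases} \dfrac{\eta}{1-\rho} & \text{if } h_{i,j} = 0 \text{ and } j = 1, \ldots, n, \\[1ex] \dfrac{1-\eta}{\rho} & \text{if } h_{i,j} = 1 \text{ and } j = 1, \ldots, n, \\[1ex] 0 & \text{otherwise,} \end{cases} \tag{15}$$

$$C_i = \frac{\rho^{P_i} (1-\rho)^{(n-P_i)}}{N - M} \tag{16}$$

and

$$R_i = N - nM + M \sum_{j=1}^{n-1} a_{i,\pi_j}\, a_{i,\pi_{j+1}} \tag{17}$$

The goal therefore is to determine $\Pi$ and $Y$ that maximize $P(H \mid \Pi, Y)$ as given in equation (14), that is, determine $(\hat{\Pi}, \hat{Y})$ where

$$(\hat{\Pi}, \hat{Y}) = \arg\max_{(\Pi, Y)} P(H \mid \Pi, Y) \tag{18}$$

Alternatively we could consider the negative log-likelihood function $f(\Pi, Y)$ given by

$$f(\Pi, Y) = -\ln P(H \mid \Pi, Y) \tag{19}$$

Since $\ln x$ is a monotonically increasing function of $x$ for all $x > 0$, it follows that

$$(\hat{\Pi}, \hat{Y}) = \arg\max_{(\Pi, Y)} P(H \mid \Pi, Y) = \arg\min_{(\Pi, Y)} f(\Pi, Y) \tag{20}$$

2.3 Computation of the ML Estimate

Computing the values of $\hat{\Pi}$ and $\hat{Y}$ involves a two-stage procedure:
Stage 1: We first determine the optimal spacing $\hat{Y}_\Pi$ for a given probe ordering $\Pi$, i.e., determine $\hat{Y}_\Pi = (\hat{Y}_1, \ldots, \hat{Y}_n)$ such that for a given $\Pi$,

$$f(\Pi, \hat{Y}_\Pi) = \min_{Y} f(\Pi, Y) = \min_{Y} f_\Pi(Y) \tag{21}$$

Here the minimum is taken over all feasible solutions $Y$ that satisfy the constraints $Y_i \ge 0$, $i = 1, \ldots, n$, and $\sum_{i=1}^{n} Y_i \le N - nM$.
Stage 2: We determine $\hat{\Pi}$ for which

$$f(\hat{\Pi}, \hat{Y}_{\hat{\Pi}}) = \min_{\Pi} f(\Pi, \hat{Y}_\Pi) = \min_{\Pi} f_{\hat{Y}}(\Pi) \tag{22}$$

Here the minimum is taken over all $\Pi$, where $\Pi$ is a permutation of $\{1, \ldots, n\}$. The resulting values of $\hat{\Pi}$ and $\hat{Y}_{\hat{\Pi}}$ are termed the ML estimates of the true probe ordering and the inter-probe spacings, respectively.

2.3.1 Computation of $\hat{Y}_\Pi$

It can be shown that for a given probe ordering $\Pi$, $f_\Pi(Y)$ is convex in a finite number of closed regions in $\mathcal{F} \subseteq \mathbb{R}^n$ and therefore possesses a unique local minimum which is also a global minimum [3, 18]. A region $\mathcal{D} \subseteq \mathbb{R}^n$ is deemed to be convex if for any pair of points $p, q \in \mathcal{D}$, all points along the line segment $\alpha p + (1-\alpha) q$ lie in $\mathcal{D}$, where $0 \le \alpha \le 1$. A function $h : \mathcal{D} \to \mathbb{R}$ defined on a convex set $\mathcal{D}$ is deemed convex if for all points $\alpha p + (1-\alpha) q \in \mathcal{D}$ where $0 \le \alpha \le 1$, $h(\alpha p + (1-\alpha) q) \le \alpha h(p) + (1-\alpha) h(q)$. Alternatively, if the second derivative of $h$ is non-negative along the line segment $\alpha p + (1-\alpha) q$, then the function $h$ can be shown to satisfy the above condition for convexity [18]. Furthermore, a region $\mathcal{D} \subseteq \mathcal{F}$ is considered good if for all $Y \in \mathcal{D}$, $Y_i \ne M$, $1 \le i \le n+1$. The significance of a good region is that $f_\Pi(Y)$ is differentiable within it.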
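To make the likelihood formulation concrete, the sketch below evaluates the negative log-likelihood of equation (19) using the closed form of equations (14) to (17), and cross-checks it against the direct sum over Only, Both and None regions in equation (13). It is a minimal illustration, not the authors' implementation: the function names are hypothetical, probe labels are 0-based, and `order` lists the probe labels from left to right.

```python
import math

def a_coeffs(row, order, rho, eta):
    """a_{i,pi_j} for j = 0..n+1 as in equation (15); both end values are 0."""
    a = [0.0]
    for label in order:
        a.append((1 - eta) / rho if row[label] == 1 else eta / (1 - rho))
    a.append(0.0)
    return a

def neg_log_likelihood(H, order, Y, N, M, rho, eta):
    """f(Pi, Y) = -ln P(H | Pi, Y) via the closed form (14), (16), (17)."""
    n = len(order)
    Yx = [None] + list(Y) + [N - n * M - sum(Y)]     # 1-based, includes Y_{n+1}
    total = 0.0
    for row in H:
        P_i = sum(row)
        C_i = rho ** P_i * (1 - rho) ** (n - P_i) / (N - M)
        a = a_coeffs(row, order, rho, eta)
        R_i = N - n * M + M * sum(a[j] * a[j + 1] for j in range(1, n))
        s = sum((a[j] - 1) * (a[j - 1] - 1) * min(Yx[j], M)
                for j in range(1, n + 2))
        total -= math.log(C_i * (R_i - s))
    return total

def row_likelihood_direct(row, order, Y, N, M, rho, eta):
    """P(H_i | Pi, Y) summed term by term as in equation (13)."""
    n = len(order)
    Yx = [None] + list(Y) + [N - n * M - sum(Y)]
    h = [None] + [row[label] for label in order]     # h[j] = h_{i,pi_j}
    P_i = sum(row)

    def m(j):
        return min(Yx[j], M)

    total = 0.0
    for j in range(1, n + 1):                         # Only regions
        e = h[j]
        cond = ((1 - eta) ** e * eta ** (1 - e)
                * rho ** (P_i - e) * (1 - rho) ** ((n - 1) - (P_i - e)))
        total += cond * (m(j) + m(j + 1)) / (N - M)
    for j in range(1, n):                             # Both regions
        e = h[j] + h[j + 1]
        cond = ((1 - eta) ** e * eta ** (2 - e)
                * rho ** (P_i - e) * (1 - rho) ** ((n - 2) - (P_i - e)))
        total += cond * (M - m(j + 1)) / (N - M)
    for j in range(0, n + 1):                         # None regions
        total += (rho ** P_i * (1 - rho) ** (n - P_i)
                  * (Yx[j + 1] - m(j + 1)) / (N - M))
    return total
```

The closed form needs only one pass over the $n+1$ spacings per clone, which is why it is the convenient form to minimize over $Y$ for a fixed ordering.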

The objective function $f_\Pi(Y)$ can be expressed as

$$f_\Pi(Y) = C - \sum_{i=1}^{k} \ln f_i(Y) \tag{23}$$

where $C = -\sum_{i=1}^{k} \ln C_i$ collects the terms that do not depend on $Y$, and

$$f_i(Y) = R_i - \sum_{j=1}^{n+1} (a_{i,\pi_j} - 1)(a_{i,\pi_{j-1}} - 1) \min(Y_j, M) \tag{24}$$

Consider a good convex region $\mathcal{D} \subseteq \mathcal{F}$ where $Y_j \ne M$ for $1 \le j \le n$. Consider all points $Y = p + sV$ for $s > 0$ which lie on a ray originating at a given point $p \in \mathcal{D}$ in the direction $V$. In $\mathcal{D}$ the derivative of $f_\Pi$ along the ray is given by

$$\frac{d}{ds} f_\Pi(Y) = -\sum_{i=1}^{k} \frac{1}{f_i(Y)} \frac{d}{ds} f_i(Y) \tag{25}$$

where

$$\frac{d}{ds} f_i(Y) = -\sum_{j=1}^{n+1} (a_{i,\pi_j} - 1)(a_{i,\pi_{j-1}} - 1)\, I(Y_j)\, V_j \tag{26}$$

with $V_{n+1} = -\sum_{j=1}^{n} V_j$ (since $Y_{n+1} = N - nM - \sum_{j=1}^{n} Y_j$), and $I(x)$ is a unit step function defined as

$$I(x) = \begin{cases} 1 & \text{if } x < M, \\ 0 & \text{if } x > M, \\ \text{undefined} & \text{if } x = M. \end{cases} \tag{27}$$

Using the fact that $\frac{d^2}{ds^2} f_i(Y) = 0$ along the ray, it can be shown that

$$\frac{d^2}{ds^2} f_\Pi(Y) = \sum_{i=1}^{k} \left( \frac{1}{f_i(Y)} \frac{d}{ds} f_i(Y) \right)^2 \ge 0 \tag{28}$$

This implies that $f_\Pi(Y)$ is convex in every good convex region $\mathcal{D}$ and therefore possesses a unique local minimum which is also a global minimum. Consequently, this minimum can be reached using continuous local search-based techniques such as steepest descent or conjugate gradient descent [4, 12, 17].

Consider the four disjoint subregions $\mathcal{F}_{+1,+1}$, $\mathcal{F}_{+1,-1}$, $\mathcal{F}_{-1,+1}$ and $\mathcal{F}_{-1,-1}$ within $\mathcal{F}$ where

$$\mathcal{F}_{a,b} = \{ Y \in \mathcal{F} : aY_1 \le aM;\; Y_i \le M,\, 2 \le i \le n;\; bY_{n+1} \le bM \} \tag{29}$$

Each of these regions is convex since it results from the intersection of half spaces. Also, since the derivative of $f_\Pi(Y)$ is defined in the interior of each subregion, each subregion is good. Note that we can define the derivative on the boundary of each subregion $\mathcal{F}_{a,b}$, $a, b \in \{-1, +1\}$, based on the direction in which the boundary point is approached. Thus by selecting a starting point in each of the subregions (or as many subregions as possible without violating any feasibility constraints), one can compute a local minimum for $f_\Pi(Y)$ in each of the subregions and select the minimum of these local minima to be the global minimum of $f_\Pi(Y)$ [11].

The local minimum of $f_\Pi(Y)$ in each of the aforementioned four disjoint subregions within $\mathcal{F}$ can be reached using continuous local search-based techniques such as the steepest descent search [8]. The steepest descent search is a simple iterative procedure which consists of three steps: (i) determine the initial value of $Y$, (ii) compute the downhill gradient at $Y$, and (iii) update the current value of $Y$ using the computed value of the downhill gradient. Steps (ii) and (iii) are repeated until the gradient vanishes or, in practice, until the gradient magnitude is less than a prespecified threshold. The local downhill gradient is given by:

$$U = (U_1, \ldots, U_n) = -\nabla f(\Pi, Y)\Big|_{Y = Y_{old}} = -\left( \frac{\partial f(\Pi, Y)}{\partial Y_1}, \ldots, \frac{\partial f(\Pi, Y)}{\partial Y_n} \right)\Bigg|_{Y = Y_{old}} \tag{30}$$

The current value $Y = Y_{old}$ is updated by moving along the downhill gradient direction $U$. The new value $Y = Y_{new}$ is given by $Y_{new} = Y_{old} + sU$. The problem, therefore, is to find an optimal value of $s$, say $s^*$, such that

$$s^* = \arg\min_{s} f_\Pi(Y_{old} + sU) \tag{31}$$

Having obtained the value of $s^*$, the new inter-probe spacings are given by $Y_{new} = Y_{old} + s^* U$.

To determine an optimal value of $s = s^*$ we exploit the convexity of $f_\Pi(Y)$, which implies that the local optimum for $s$ is also a global optimum. Using the constraints that the spacings are non-negative, the clones and probes are of fixed length and the total length of the chromosome is fixed, we compute the upper and lower bounds on the values of $s$ and use the bisection method [17] to find the optimal value of $s = s^*$. If any of the boundary conditions (represented as hyperplanes) on the $Y_i$'s for $i = 1, \ldots, n$ are violated, the gradient vector $U$ is projected onto the admissible region, which is represented as the intersection of the $k$ hyperplanes corresponding to the $k$ violated constraints [4] (here $k$ denotes the number of violated constraints, not the number of clones). The minimization procedure then proceeds along the projected gradient direction $U_{proj}$ instead of $U$. In the limiting case when $k = n$, the minimization procedure has reached an extremal vertex of the admissible region and $U_{proj} = 0$. In this case, the extremal vertex is the desired minimum within the admissible region. Thus the minimization procedure is halted when $U$ vanishes or when an extremal vertex is reached (i.e., $U_{proj}$ vanishes), depending on which situation is encountered first.

2.3.2 Computation of $\hat{\Pi}$

Determining the optimal probe ordering $\hat{\Pi}$ entails a combinatorial search through the discrete space of all possible permutations of $\{1, \ldots, n\}$. The problem of coming up with such an optimal ordering is isomorphic to the classical NP-complete Traveling Salesman Problem (TSP), for which no polynomial-time algorithm for determining the optimal solution is known [6]. One could use a simulated Monte Carlo search method such as Simulated Annealing (SA) [7] or the Large Step Markov Chain (LSMC) [14], both of which are known to be robust in the presence of local optima in the solution space and give near-optimal solutions in average polynomial time.

A single iteration of SA consists of three phases: (i) perturb, (ii) evaluate, and (iii) decide. In the perturb phase, the probe ordering is systematically perturbed by reversing the ordering within a block of probes where the endpoints of the block are chosen at random. This perturbation is referred to as a 2-opt heuristic in the context of the TSP [13]. In the evaluate phase, $f(\Pi, \hat{Y}_\Pi)$ is computed. In the decide phase, the new probe ordering is accepted and replaces the current probe ordering probabilistically using a stochastic decision function such as the Metropolis decision function [15] or the Boltzmann decision function [1]. After several iterations at a particular value of temperature (termed an annealing step), the stochastic decision function is annealed in a manner such that the optimization process resembles a random search in the earlier stages and a greedy local search or a deterministic hill-climbing search in the latter stages. The SA algorithm, starting from an initial solution, generates in the limit an ergodic Markov chain of solution states which asymptotically converges to a stationary Boltzmann distribution [1]. The Boltzmann distribution asymptotically converges to a globally optimal solution when subject to the annealing process [7].

The LSMC algorithm, on the other hand, combines the stochastic decision function with exhaustive local search using the 2-opt heuristic. At any point in the search space, an exhaustive local search is performed using the 2-opt heuristic. The locally optimum solution is considered to be the current solution. The current solution is subject to a non-local perturbation termed a double-bridge kick [14], which results in a transition to a non-local point in the search space. An exhaustive local 2-opt search is performed starting from this new point, yielding a new local optimum. The choice between the new local optimum and the current solution is then made using the Metropolis decision function or the Boltzmann decision function as in the case of SA. The exhaustive local search using the 2-opt heuristic would, strictly speaking, entail the evaluation of the objective function $f(\Pi, Y)$ after each 2-opt perturbation. This would cause the LSMC algorithm to be computationally extremely intensive. As an effective compromise, the exhaustive local search is performed using a modified objective function. The column in the hybridization matrix $H$ corresponding to a given probe can be considered as a binary hybridization signature of that probe. The modified objective function $f_D(\Pi)$ computes the sum of the Hamming distances between the binary hybridization signatures of successive probes in a given probe ordering $\Pi$. The local minimum of $f_D(\Pi)$ is sought using the 2-opt heuristic. Since the modified objective function $f_D(\Pi)$ is much easier to compute than the original objective function $f(\Pi, Y)$, the exhaustive local search is very fast. The LSMC algorithm is illustrated in Figure 5. Note that whereas the SA algorithm samples the entire search space, the LSMC algorithm samples only the space of locally optimal solutions. The Metropolis decision function or the Boltzmann decision function in the case of the LSMC algorithm is annealed in a manner similar to the SA algorithm.

The annealing schedule needed for asymptotic convergence of SA or LSMC is computationally intensive. This provides the motivation for the parallel computation of the ML estimator. We refer the interested reader to [7] and [14] for a more in-depth treatment of the SA and LSMC algorithms.
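The perturb, evaluate and decide phases of SA can be sketched in Python as follows. This is a simplified illustration rather than the authors' implementation: to stay short it uses the Hamming-distance objective $f_D(\Pi)$ in the evaluate phase instead of the full $f(\Pi, \hat{Y}_\Pi)$, it uses the Metropolis decision function, and it keeps track of the best ordering seen. The function names, the `max_steps` safeguard and the choice to count only cost-changing accepted moves as successful perturbations are assumptions.

```python
import math
import random

def hamming_cost(H, order):
    """f_D: sum of Hamming distances between columns of successive probes."""
    return sum(sum(row[a] != row[b] for row in H)
               for a, b in zip(order, order[1:]))

def reverse_block(order, rng):
    """Perturb phase: reverse a block whose endpoints are chosen at random."""
    i, j = sorted(rng.sample(range(len(order)), 2))
    return order[:i] + order[i:j + 1][::-1] + order[j + 1:]

def simulated_annealing(H, order, T=0.5, alpha=0.95, seed=0, max_steps=1000):
    rng = random.Random(seed)
    n = len(order)
    cur, cur_cost = list(order), hamming_cost(H, order)
    best, best_cost = list(cur), cur_cost
    for _ in range(max_steps):                  # one pass = one annealing step
        successes = 0
        for _ in range(100 * n):
            cand = reverse_block(cur, rng)
            delta = hamming_cost(H, cand) - cur_cost          # evaluate
            # Decide: Metropolis rule accepts downhill moves always and
            # uphill moves with probability exp(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / T):
                cur, cur_cost = cand, cur_cost + delta
                if delta != 0:
                    successes += 1
                if cur_cost < best_cost:
                    best, best_cost = list(cur), cur_cost
            if successes >= 10 * n:
                break
        if successes == 0:                      # frozen: stop annealing
            break
        T *= alpha                              # geometric cooling
    return best, best_cost

# Hypothetical toy data: columns are probes whose true order is 0, 1, 2, 3.
H = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1]]
start = [2, 0, 3, 1]
best, best_cost = simulated_annealing(H, start)
```

An LSMC variant would replace the single block reversal with a double-bridge kick followed by an exhaustive 2-opt descent on `hamming_cost` before the decide phase.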


3 Parallel Computation of the ML Estimator

We propose a two-tier parallel computation of the ML estimator corresponding to the two stages of optimization as shown in Figure 6.
Level 1: Parallel computation of the optimal inter-probe spacing $\hat{Y}_\Pi$ for a given probe ordering $\Pi$ that minimizes $f(\Pi, \hat{Y}_\Pi)$. This entails parallelization of the gradient descent search procedure for constrained optimization in the continuous domain.
Level 2: Parallel computation of the optimal probe ordering $\hat{\Pi}$ for which $f(\hat{\Pi}, \hat{Y}_{\hat{\Pi}})$ is minimum. This entails parallelization of the simulated Monte Carlo search procedure (SA or LSMC) for optimization in the discrete domain.
The parallel algorithms were implemented on a cluster of shared-memory symmetric multiprocessors (SMPs) using a combination of MPI and multi-threaded programming.

Figure 5: The LSMC Algorithm (C: current locally optimal solution; I: intermediate solution; N: new locally optimal solution; DB: double-bridge perturbation; 2-opt: exhaustive 2-opt search)

Figure 6: The Two-level Parallel Computation of the Maximum Likelihood Estimator

3.1 Parallel Monte Carlo Search

We have formulated and implemented two models of parallel SA (PSA) and parallel LSMC (PLSMC) algorithms based on the distribution of the Markov chain of solution states on an MPI cluster. These models incorporate control parallelism with multiple interacting or non-interacting searches of the solution space and are described below:
(i) The Non-Interacting Local Markov chain (NILM) PSA and PLSMC algorithms.
(ii) The Periodically Interacting Local Markov chain (PILM) PSA and PLSMC algorithms.

In the NILM PSA/PLSMC algorithms, each SMP runs an independent version of the serial SA/LSMC algorithm. Each Markov chain of solution states is local to a given SMP. The SA/LSMC algorithms run concurrently but asynchronously on each SMP. The evaluation function and the decision function are executed concurrently on the solution state within each SMP. On termination of the annealing processes on all the processors, the best solution is selected from among all the solutions available on the individual SMPs. The NILM model is essentially that of multiple independent (i.e., non-interacting) searches.

The PILM PSA/PLSMC algorithms are similar to their NILM counterparts except for the fact that just before the temperature parameter is updated using the annealing function, the best candidate solution from among those in all the SMPs is selected and duplicated on all the SMPs. This focuses the search in the more promising regions of the solution space. The PILM model is essentially that of multiple periodically interacting searches.

In the case of all the above PSA/PLSMC algorithms, a master process is used as the overall controlling process. The master process runs on one of the SMPs and spawns child processes on each SMP within the MPI system, broadcasts the data subsets needed by each child process, collects the final results from each child process and terminates the child processes. In the case of the PILM PSA/PLSMC algorithms, at each annealing step, the master process collects the results from each child process and broadcasts the best result to all the child processes. On convergence, the master process collects the final results from each of the child processes, selects the best result as the final solution and terminates the child processes.

Each child process in the PILM PSA/PLSMC algorithm receives the initial parameters from the master process and runs its local version of the SA/LSMC algorithm. At the end of each annealing step each child process conveys its result to the master process, receives the best result thus far from the master process and replaces its result with the best result thus far before proceeding with the next annealing step. On convergence each child process conveys its result to the master process. The master and child processes for the NILM PSA/PLSMC algorithms are similar to those of their PILM counterparts except for the absence of the periodic interaction at the end of each annealing step.
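The difference between the NILM and PILM models can be sketched as follows. This Python illustration runs the chains one after another in a single process purely to show the control flow; in the paper the chains run concurrently on separate SMPs under MPI. The function names, the fixed number of annealing steps and the toy cost function are assumptions, not details from the paper.

```python
import math
import random

def anneal_one_step(order, cost, T, iters, rng):
    """Run one annealing step (iters Metropolis iterations at temperature T)."""
    cur, cur_cost = list(order), cost(order)
    for _ in range(iters):
        i, j = sorted(rng.sample(range(len(cur)), 2))
        cand = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]
        delta = cost(cand) - cur_cost
        if delta <= 0 or rng.random() < math.exp(-delta / T):
            cur, cur_cost = cand, cur_cost + delta
    return cur

def parallel_chains(order, cost, num_chains, interacting,
                    steps=50, T=0.5, alpha=0.95, iters=200, seed=0):
    """NILM if interacting is False, PILM if True; returns the best state seen."""
    rngs = [random.Random(seed + c) for c in range(num_chains)]
    states = [list(order) for _ in range(num_chains)]
    best = list(order)
    for _ in range(steps):
        # Each "child" advances its own local Markov chain by one step.
        states = [anneal_one_step(s, cost, T, iters, r)
                  for s, r in zip(states, rngs)]
        step_best = min(states, key=cost)
        if cost(step_best) < cost(best):
            best = list(step_best)
        if interacting:
            # PILM: the "master" duplicates the best candidate on all
            # children just before the temperature is updated.
            states = [list(step_best) for _ in range(num_chains)]
        T *= alpha
    return best

# Toy cost (assumption): total jump between neighbouring labels.
def toy_cost(order):
    return sum(abs(a - b) for a, b in zip(order, order[1:]))

start = [3, 0, 4, 1, 2, 5]
nilm = parallel_chains(start, toy_cost, num_chains=4, interacting=False)
pilm = parallel_chains(start, toy_cost, num_chains=4, interacting=True)
```

In an MPI setting the duplication step would correspond to a gather of candidate solutions at the master followed by a broadcast of the best one.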

3.2 Parallel Gradient Descent Search

Steepest descent search and conjugate gradient descent (CGD) search are generally used for unconstrained optimization in the continuous domain. The steepest descent search, in our case, has been adapted to the fact that the solution space of the inter-probe spacings is constrained since $0 \le Y_i \le M$ for $i = 1, \ldots, n$. We have used the CGD search instead of the steepest descent search since the former is known to be one of the fastest in the class of gradient descent-based optimization methods [8].

The CGD search is very similar to the steepest descent procedure, the only difference being that different directions are followed while minimizing the objective function. Instead of consistently following the local downhill gradient direction, a set of $n$ mutually conjugate direction vectors is generated from the downhill gradient vector, where $n$ is the dimensionality of the solution space [17]. Unlike the steepest descent algorithm, the CGD algorithm guarantees convergence to a local minimum within $n$ steps (exactly so for a quadratic objective function).

Due to the inherently sequential nature of the CGD algorithm, we deemed data parallelism to be appropriate for the parallel CGD algorithm. The $Y$ and $U$ vectors are distributed amongst the different processors within a single SMP, and each processor performs the gradient vector computation and updates to the inter-probe spacing vector using its local subvectors $Y_{loc}$ and $U_{loc}$ concurrently with the other processors within the SMP. Here, $|Y_{loc}| = |Y|/N_p$ and $|U_{loc}| = |U|/N_p$, where $N_p$ is the number of processors within each SMP. A multi-threaded programming approach [2] was used with a single thread running on a single processor within the SMP. Since the individual subvectors have to be periodically distributed amongst the processors and also periodically gathered to compute a global value for $s$ during the bisection procedure, the threads have to be periodically synchronized using a barrier. Updates to global values in shared memory (such as the global sum of the vector components) are controlled using mutex (mutual exclusion) locks.

3.3 A Two-tier Parallelization of the ML Estimator

In order to ensure a scalable implementation, two tiers of parallelism were incorporated in the computation of the ML estimator. The finer or lower level of parallelism pertains to the computation of $\hat{Y}$ for a given probe ordering $\Pi$ using the parallel multi-threaded CGD algorithm for continuous optimization. The coarser or upper level of parallelization pertains to the computation of $\hat{\Pi}$ using a simulated Monte Carlo algorithm for discrete optimization. The multi-threaded CGD algorithm is embedded within each of the parallel Monte Carlo algorithms and, as such, the parallelization of the CGD algorithm at the finer level is transparent to the parallel Monte Carlo algorithm at the coarser level. When the parallel CGD procedure is invoked from within the master or child Monte Carlo process, a new set of child CGD processes (i.e., threads) is spawned on the available processors (within an SMP), whereas the master CGD process (i.e., thread) runs on the same processor as the Monte Carlo process (master or child). The master and child CGD processes (i.e., threads) cooperate to evaluate and minimize the value of $f(\Pi, \hat{Y}_\Pi)$. Once $f(\Pi, \hat{Y}_\Pi)$ is minimized, the child CGD processes (i.e., threads) terminate and the corresponding processors within an SMP are available for future computation. The two-tier parallelism approach induces a logical tree-shaped interconnection network on the processors within the SMP cluster.
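The barrier-and-mutex pattern used by the multithreaded CGD procedure of Section 3.2 can be sketched with Python's standard `threading` module. The sketch only illustrates the synchronization: each thread reduces its own subvector, adds its partial result to a shared global value under a mutex lock, and waits at a barrier before reading the global value. The function name and the sum-of-squares reduction are assumptions chosen for illustration, and CPython threads are used here to show the pattern rather than to obtain a speedup.

```python
import threading

def parallel_sum_of_squares(vector, num_threads):
    """Data-parallel reduction over subvectors with a mutex and a barrier."""
    total = [0.0]                                # shared global value
    lock = threading.Lock()                      # mutex guarding `total`
    barrier = threading.Barrier(num_threads)
    seen = [None] * num_threads                  # what each thread reads back
    chunk = (len(vector) + num_threads - 1) // num_threads

    def worker(t):
        local = vector[t * chunk:(t + 1) * chunk]   # this thread's subvector
        partial = sum(x * x for x in local)
        with lock:                               # mutually exclusive update
            total[0] += partial
        barrier.wait()                           # all partial sums are in
        seen[t] = total[0]                       # safe to read the global sum

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return total[0], seen

# Hypothetical usage: squared magnitude of a small gradient vector.
norm_sq, seen = parallel_sum_of_squares([1, 2, 3, 4, 5], num_threads=2)
```

Without the barrier, a thread could read the global value before the other threads had contributed their partial sums, which is the situation the periodic synchronization described above is meant to prevent.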

Experimental Results4

The parallel algorithms were implemented on an 8-node dedicated cluster of SMPs running Solaris-x86.Each node is an SMP consisting of 4 Intel Xeon 700MHz processors with 1 MB cache per processor and1 GB of shared memory (RAM). The serial SA andLSMC algorithms were implemented with followingparameters: the initial value for the temperature Twas chosen to be 0.5, the maximum number of itera-tions D for each annealing step was chosen to be 100.n.The current annealing step was terminated when themaximum number of iterations was reached or whenthe number of successful perturbations equaled 10 .nwhichever was encountered first. The temperature


25Nun-D:&BJS

Figure 7: Parallel (multithreaded) CGD algorithmSpeedup curves Figure 8: Parallel (multithreaded) CGD algorithm

Efficiency curves

Figure 9: Parallel SA on simulated data: Speedupcurves

was systematically reduced using a geometric anneal-ing schedule of the form Tnext = (Y. .Tprev , with theannealing factor (Y. = 0.95. The algorithm was termi-nated when the number of successful perturbations inany annealing step equaled 0.

In the case of the parallel SA and LSMC algorithmsthe product ofNsMP (the number ofSMPs) and themaximum number of iterations D performed by anSMP in a single annealing step was kept constant i.e.,D = (lOO. n)/NsMP. This ensured that the overallworkload remained constant as the number of proces-sors was varied, thus enabling one to examine the scal-ability of the speedup and efficiency of the algorithmsfor a given problem size with increasing number ofprocessors. The other parameters for the parallel al-gorithms were identical to those of their serial coun-terparts. In the NILM PSA/PLSMC algorithms, eachprocess was independently terminated when the num-ber of successful perturbations in any annealing stepfor that process equaled 0. In the PILM PSA/PLSMCalgorithms, each process was terminated when thenumber of successful perturbations in an annealingstep equaled 0 for a// the processes. This conditionwas checked during the synchronization phase at theend of each annealing step.

The parallel CGD algorithm was tested on simulated chromosomal data sets with a varying number of probes and clones, (n, k) = (50, 300), (200, 1300) and (500, 3250). Figure 7 shows the resulting speedup curves and Figure 8 the resulting efficiency curves. These results are in conformity with our expectations, since the inter-thread synchronization overhead and the wait times tend to increasingly dominate the overall execution time with an increasing number of processors for a given value of n. The payoff in the parallelization of the CGD algorithm is better realized for larger values of n (i.e., larger problem sizes).
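For readers who want to reproduce curves of this kind from their own timings, the following Python sketch computes speedup and efficiency. It assumes the conventional definitions (speedup as serial time divided by parallel time, efficiency as speedup divided by the number of processors); this section does not restate the definitions used for the figures, and the function names are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """Conventional speedup: serial run time over parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_processors):
    """Conventional efficiency: speedup per processor."""
    return speedup(t_serial, t_parallel) / n_processors
```

As an invented illustration (not a measurement from the paper), a job taking 120 s serially and 40 s on 4 processors has a speedup of 3.0 and an efficiency of 0.75; growing synchronization overhead and wait times show up as an efficiency that falls as processors are added.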

Figure 10: Parallel SA on simulated data: Efficiency curves

Figure 11: Parallel SA on real data from cosmid 2 and cosmid 3 of Neurospora crassa: Speedup curves

Figure 12: Parallel SA on real data from cosmid 2 and cosmid 3 of Neurospora crassa: Efficiency curves

Figure 13: Parallel LSMC on simulated data: Speedup curves

Figure 14: Parallel LSMC on simulated data: Efficiency curves

Figure 15: Parallel LSMC on real data from cosmid 2 and cosmid 3 of Neurospora crassa: Speedup curves

Figure 16: Parallel LSMC on real data from cosmid 2 and cosmid 3 of Neurospora crassa: Efficiency curves

Figure 17: Comparison of convergence rates of the serial SA and serial LSMC algorithms on simulated data with (n, k) = (50, 300)

In the case of the PSA and PLSMC algorithms, we experimented with simulated chromosomal data with a varying number of probes and clones. The false positive and false negative rates were assumed to be 2%. Figures 9 and 10 show the speedup and efficiency, respectively, of the PSA algorithm on simulated data. Figures 13 and 14 show the speedup and efficiency, respectively, of the PLSMC algorithm on simulated data. The PSA and PLSMC algorithms were also tested on real data derived from cosmid 2 (n = 109, k = 2046) and cosmid 3 (n = 111, k = 1937) of the fungus Neurospora crassa. Figures 11 and 12 show the speedup and efficiency, respectively, of the PSA algorithm on real data. Figures 15 and 16 show the speedup and efficiency, respectively, of the PLSMC algorithm on real data. As can be observed, the PSA and PLSMC algorithms exhibit consistent and scalable speedup with an increasing number of processors. As expected, the speedup scales better with an increasing number of processors for larger values of n (i.e., larger problem sizes). Overall, the PSA algorithm was seen to scale better than the PLSMC algorithm. The reason for this is that the serial LSMC algorithm is much faster than the serial SA algorithm, as shown in Figures 17 and 18 on simulated and real data respectively. This implies that the PLSMC algorithm is better suited for larger problem instances than the PSA algorithm.

The absolute root mean squared error (RMSE) $X$ between the true inter-probe spacings $Y$ and the estimated inter-probe spacings $\hat{Y}$ is defined as $X = \sqrt{\|Y - \hat{Y}\|^2 / n}$. The RMSE value is typically expressed as a percentage of $N$ (the chromosome length). In our experiments, the percent RMSE value was observed to lie in the range [0.34%, 2.63%] for the simulated data. The percent RMSE value was also observed to asymptotically approach 0 in the limit $n \to \infty$, which is in conformity with the statistical theory underlying maximum likelihood (ML) estimation [9].
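The percent RMSE reported above can be computed with a short Python sketch. It assumes the usual root-mean-square form, in which the squared differences between true and estimated spacings are averaged over the number of spacings before taking the square root; the exact normalizing constant used by the authors is an assumption here, and the function and argument names are hypothetical.

```python
import math


def percent_rmse(true_spacings, estimated_spacings, chromosome_length):
    """RMSE between true and estimated inter-probe spacings, expressed as a
    percentage of the chromosome length N."""
    if len(true_spacings) != len(estimated_spacings):
        raise ValueError("spacing vectors must have the same length")
    squared_error = sum(
        (y - y_hat) ** 2 for y, y_hat in zip(true_spacings, estimated_spacings)
    )
    # Assumed normalizer: the number of spacings being compared.
    rmse = math.sqrt(squared_error / len(true_spacings))
    return 100.0 * rmse / chromosome_length
```

With invented spacings of 10, 20, 30 and 40 units, estimates of 12, 18, 32 and 38 units, and a chromosome length of 100 units, every spacing is off by 2 units, so the RMSE is 2 units, or 2% of the chromosome length.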

Figure 18: Comparison of convergence rates of the serial SA and serial LSMC algorithms on real data from cosmid 2 of Neurospora crassa

5 Conclusions and Future Directions

In this paper we presented a maximum likelihood (ML) estimation-based approach to physical map reconstruction under a probabilistic model of hybridization errors consisting of false positives and false negatives. The ML estimate reconstructs the optimal probe ordering and optimal inter-probe spacings when used in conjunction with the sampling-without-replacement experimental protocol. The estimation procedure was shown to entail continuous optimization for determining the optimal inter-probe spacings for a given probe ordering and combinatorial optimization for determining the optimal probe ordering. A two-tier parallelization strategy was proposed wherein the CGD search algorithm for continuous optimization is parallelized at the lower level and the SA or LSMC algorithm for combinatorial optimization is simultaneously parallelized at the higher level. The parallel ML estimation algorithm was shown to be amenable to efficient implementation on a network of SMPs, where the CGD search is parallelized on a single SMP using shared-memory, multithreaded programming whereas the SA-based or LSMC-based Monte Carlo search is parallelized on the SMP network using the distributed-memory, message-passing programming paradigm within the MPI environment.

Future research will investigate extensions of the maximum likelihood function that also encapsulate errors due to repeat DNA sequences in addition to false positives and false negatives. The current implementation of the ML estimator is targeted towards a homogeneous platform such as a network of identical SMPs. Future research will explore and address issues that deal with the parallelization of the ML estimator on a heterogeneous platform, such as a network of SMPs that differ in processing speed and memory capacity, since that is a scenario that is more likely to be encountered in the real world.

Acknowledgments: This research was supported in part by an NRICGP grant by the US Department of Agriculture to Dr. Bhandarkar and Dr. Arnold.

References

[1] E.H.L. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing, Wiley, New York, 1989.

[2] G. Andrews, Foundations of Multithreaded, Parallel, and Distributed Programming, Addison Wesley Pub. Co., Reading, MA, 2000.

[3] S.M. Bhandarkar, S.A. Machaka, S.S. Shete and R.N. Kota, Parallel Computation of a Maximum Likelihood Estimator of a Physical Map, Genetics, special issue on Computational Biology, Vol. 157, No. 3, pp. 1021-1043, March 2001.

[4] C.N. Dorny, A Vector Space Approach to Models and Optimization, R.E. Krieger Publishing Company, Huntington, NY, 1980.

[5] Y.X. Fu, W.E. Timberlake and J. Arnold, On the Design of Genome Mapping Experiments using Short Synthetic Oligonucleotides, Biometrics, Vol. 48, pp. 337-359, 1992.

[6] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman, New York, NY, 1979.

[7] S. Geman and D. Geman, Stochastic Relaxation, Gibbs Distribution and the Bayesian Restoration of Images, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 6, pp. 721-741, 1984.

[8] M. Hestenes and E. Stiefel, Methods of Conjugate Gradients for Solving Linear Systems, Journal of Research of the National Bureau of Standards, Vol. 49, pp. 409-436, 1952.

[9] R.V. Hogg and A.T. Craig, Introduction to Mathematical Statistics, Fifth Edition, Prentice Hall, New Jersey, 1995.

[10] J.D. Kececioglu and E.W. Myers, Combinatorial Algorithms for DNA Sequence Assembly, Algorithmica, Vol. 13, pp. 7-51, 1995.

[11] J.D. Kececioglu, S.S. Shete and J. Arnold, Reconstructing Distances in Physical Maps of Chromosomes With Nonoverlapping Probes, Proc. 4th ACM Conf. Comp. Mol. Biol. (RECOMB), Tokyo, Japan, pp. 183-192, April 2000.

[12] D. Kincaid and W. Cheney, Numerical Analysis: Mathematics of Scientific Computing, Brooks/Cole Publishing Company, Pacific Grove, CA, 1991.

[13] S. Lin and B. Kernighan, An Effective Heuristic Search Algorithm for the Traveling Salesman Problem, Operations Research, Vol. 21, pp. 498-516, 1973.

[14] O. Martin, S.W. Otto and E.W. Felten, Large-Step Markov Chains for the Traveling Salesman Problem, Complex Systems, Vol. 5, No. 3, pp. 299-326, 1991.

[15] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller and E. Teller, Equation of state calculations by fast computing machines, Jour. Chemical Physics, Vol. 21, pp. 1087-1092, 1953.

[16] P. Pacheco, Parallel Programming with MPI, Morgan Kaufmann Publishers, San Francisco, CA, 1996.

[17] W.H. Press, B.P. Flannery, S.A. Teukolsky and W.T. Vetterling, Numerical Recipes in C, Cambridge University Press, New York, 1988.

[18] S.S. Shete, Estimation Problems in Physical Mapping of a Chromosome and in a Branching Process with Immigration, Ph.D. Dissertation, Department of Statistics, University of Georgia, Athens, Georgia, August 1998.

