
CIRCUITS SYSTEMS SIGNAL PROCESS, VOL. 7, NO. 2, 1988

GEOMETRIC PROBLEMS ON TWO-DIMENSIONAL ARRAY PROCESSORS*

Mi Lu^1 and Peter Varman^2

Abstract. Parallel algorithms for solving geometric problems on two array processor models, the mesh-connected computer (MCC) and a two-dimensional systolic array, are presented. We illustrate a recursive divide-and-conquer paradigm for MCC algorithms by presenting a time-optimal solution for the problem of finding the nearest neighbors of a set of planar points represented by their Cartesian coordinates. The algorithm executes on a $\sqrt{n} \times \sqrt{n}$ MCC and requires an optimal $O(\sqrt{n})$ time. An algorithm for constructing the convex hull of a set of planar points and an update algorithm for the disk placement problem on an $n^{2/3} \times n^{2/3}$ two-dimensional systolic array are presented. Both these algorithms require $O(n^{2/3})$ time steps. The advantage of the systolic solutions lies in their suitability for direct hardware implementation.

I. Introduction

Several diverse application areas such as computer graphics, pattern recognition, computer-aided design, and robotics provide computationally demanding problems that are inherently geometric in nature. Computational geometry is concerned with finding efficient algorithmic solutions for computing geometric properties such as the convex hull of a set of multidimensional points and various distance or intersection properties of a set of geometric objects. A recent survey of the area appears in [14].

Most of the work in this area has been devoted to designing efficient serial algorithms for geometric problems. With the growing availability of multiprocessor systems based on VLSI technology, it is possible to consider reducing the time required to solve these problems by employing parallel

* Received May 22, 1987. This research was partially supported by an IBM Faculty Development Award.

1 Department of Electrical Engineering, Texas A&M University, College Station, Texas 77843, USA.

2 Department of Electrical and Computer Engineering, Rice University, Houston, Texas 77251-1892, USA.


processing. Fast execution times are desirable in several situations where these problems arise naturally as in air-traffic control and robotics applications. Unfortunately, several of the techniques used in designing efficient serial algorithms appear inherently sequential and hence inappropriate for parallel processors. Consequently, there has been a recent interest in determining efficient parallel solutions for geometric problems [2], [5], [8]-[12].

A cost-effective solution is possible if the parallel system is composed of a large number of concurrently operating, simple, and (ideally) identical cells, interconnected in a regular geometrical pattern (ideally with only nearest neighbor connections). Two-dimensional arrays, in which each cell is connected to at most four neighboring cells, meet several of the constraints of a cost-effective design. The limited number of interconnections on a two-dimensional array, however, makes it challenging to design efficient algorithms on such a model. Not only do the traditional sequential algorithms have to be redesigned to distribute the computation among several concurrently operating cells, but the distribution must be done in a manner that minimizes the communication required between the cells.

In this paper we present a number of algorithms for geometric problems that can be efficiently mapped onto a two-dimensional array of processing cells. We consider two models of two-dimensional arrays: the mesh-connected computer (MCC) and the systolic array [7]. Several geometric problems have been recently solved using a divide-and-conquer strategy on an MCC [5], [8]-[12]. We give an example of one such problem.

2. Finding nearest neighbors on MCCs

We present a parallel MCC algorithm for the nearest-neighbor problem for a given set $S$ of $n$ planar points. Suppose the coordinates of point $i$ are $(x_i, y_i)$. Let $d(i, j)$ denote the distance between points $i$ and $j$ under the $L_2$ metric, i.e., $d(i, j) = ((x_i - x_j)^2 + (y_i - y_j)^2)^{1/2}$. A point $v$ is the nearest neighbor of point $u$ if $d(u, v)$ is no more than $d(u, j)$ for all other points $j \in S$. The all-nearest-neighbors problem is to find a nearest neighbor for each point. The best sequential algorithm for this problem has an optimal $O(n \log n)$ time performance [1].
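As a point of reference for the problem statement, the definition can be checked with a quadratic-time brute-force search. This is only a minimal serial sketch for validating results, not the MCC algorithm; the function name `all_nearest_neighbors` and the sample points are illustrative assumptions.

```python
import math

def all_nearest_neighbors(points):
    """For each point index u, return the index of a nearest other point (L2 metric)."""
    nearest = []
    for u, (xu, yu) in enumerate(points):
        best, best_d = None, math.inf
        for j, (xj, yj) in enumerate(points):
            if j == u:
                continue  # a point is not its own neighbor
            d = math.hypot(xu - xj, yu - yj)
            if d < best_d:
                best, best_d = j, d
        nearest.append(best)
    return nearest

# Hypothetical sample data.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.5)]
print(all_nearest_neighbors(points))
```

Such a check is useful mainly as an oracle when testing a faster implementation on small inputs.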

2.1. Preliminaries

An $n \times m$ MCC consists of $N = nm$ processors (PEs) arranged on a two-dimensional grid with processors at the grid points and connections between every pair of horizontally adjacent and vertically adjacent PEs. For convenience we shall assume that $n$ and $m$ are powers of 2, and $N = 2^k$, $k \ge 0$. Each PE has a distinct index between 0 and $2^k - 1$, and the PE with index $j$ is denoted by PE($j$). The PEs are indexed in shuffled row-major order as shown in Figure 1 for a $4 \times 4$ MCC. The shuffled row-major order is obtained from the standard row-major ordering by shuffling the binary representation of


Figure 1

the PE indices. The binary representation of an integer $j$, $0 \le j < 2^k$, will be denoted by $(j_{k-1} j_{k-2} \cdots j_1 j_0)$.
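Since Figure 1 is not reproduced here, the indexing can be illustrated with a small sketch. It assumes the common convention in which the bits of the row and column numbers are interleaved with the row bit first; the function name `shuffled_row_major` is an illustrative choice, and the figure in the original may differ in detail.

```python
def shuffled_row_major(row, col, bits):
    """Interleave the bits of row and col (row bit first) to form the PE index.

    Assumption: `bits` is the number of bits in each of row and col,
    i.e. the mesh is 2**bits by 2**bits.
    """
    index = 0
    for t in reversed(range(bits)):
        index = (index << 1) | ((row >> t) & 1)
        index = (index << 1) | ((col >> t) & 1)
    return index

# Index layout of a 4 x 4 mesh under this convention.
grid = [[shuffled_row_major(r, c, 2) for c in range(4)] for r in range(4)]
for line in grid:
    print(line)
```

Under this convention, PEs whose indices share their leading bits occupy a contiguous block of the mesh, which is what makes the submesh definition that follows convenient.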

A submesh of size $2^{k-r}$ consists of the set of PEs whose indices agree on the $r$ most significant bits (MSBs) of their binary representation, for some $0 \le r \le k$. Submeshes $A$ and $B$ of size $2^{k-r}$ are said to be adjacent if the indices of a PE in submesh $A$ and a PE in submesh $B$ agree on the first $(r - 1)$ MSBs but differ in the $r$th MSB.

We assume familiarity with the algorithms for performing the following data routing operations:

(1) Sort
    Input: $mn$ elements of a totally ordered set distributed equally among the PEs of an $m \times n$ MCC.
    Output: Route the elements so that PE($i$) receives the element with rank equal to $i$, $0 \le i < mn$.
    Time complexity: $O(\max(m, n))$ (see [15]).

(2) Random Access Write/Read (RAW/RAR)
    Input: A set of records distributed one per PE of an $m \times n$ MCC, and a (possibly many-to-one) function $f : \{0, \ldots, mn - 1\} \to \{0, \ldots, mn - 1\}$.
    Output: Route the record from PE($i$) to PE($f(i)$) (RAW) or from PE($f(i)$) to PE($i$) (RAR).
    Time complexity: RAR: $O(\max(m, n))$; RAW: $O(\max(m, n))$ if $f$ is one to one (see [13]).
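The intended effect of the two random access operations can be stated as a serial simulation. The sketch below models only their input/output behavior, not the mesh routing of [13]; the function names are illustrative, and the rule that the smallest key wins a write conflict is an assumption made here to match the way conflicts are resolved later in the nearest-neighbor algorithm.

```python
def random_access_read(records, f):
    """RAR: PE(i) receives a copy of the record held by PE(f(i))."""
    return [records[f(i)] for i in range(len(records))]

def random_access_write(records, f, key=None):
    """RAW: PE(i) sends its record to PE(f(i)).

    Assumption: when several records share a destination, the one with the
    smallest key is kept. PEs that receive nothing hold None.
    """
    key = key or (lambda rec: rec)
    out = [None] * len(records)
    for i, rec in enumerate(records):
        dest = f(i)
        if out[dest] is None or key(rec) < key(out[dest]):
            out[dest] = rec
    return out

# Hypothetical example with a many-to-one destination function.
records = [30, 10, 20, 40]
dest = [2, 2, 0, 1]
print(random_access_read(records, lambda i: dest[i]))
print(random_access_write(records, lambda i: dest[i]))
```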

2.2. Nearest-neighbor algorithm

In this section we present a parallel algorithm to solve the all-nearest-neighbors problem for a set of $n$ planar points, represented by their Cartesian coordinates. Our algorithm can be readily implemented on a $\sqrt{n} \times \sqrt{n}$ MCC with $O(\sqrt{n})$ time performance. This time is optimal up to constant factors.

The algorithm is based on a "divide-and-conquer" strategy and determines the nearest neighbor for each of the points simultaneously. An optimal


serial algorithm based on this strategy was discovered by Bentley and Shamos [1]. For convenience, assume that the number of points $n$ is $2^k$ for some even $k \ge 0$. Partition the set of points into two equal-sized subsets $\alpha$, $\beta$ by a horizontal line $L$, so that all points in $\alpha$ have y-coordinates less than or equal to those of points in $\beta$. Next, partition the points in $\alpha$ into two equal-sized subsets by a vertical line $L_\alpha$ and similarly those in $\beta$ by a vertical line $L_\beta$. Let $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$ be the subsets of $\alpha$, $\beta$ so induced such that all points in $\alpha_1$ ($\beta_1$) have x-coordinates no greater than the x-coordinates of points in $\alpha_2$ ($\beta_2$). Iteratively partition each of the quadrants into four subquadrants until each slab contains exactly one point (Figure 2(a)).

The main procedure consists of $\log_2 n$ iterations. At the end of the $j$th iteration, $1 \le j \le \log_2 n$, each point would have determined its nearest neighbor in a subset of $2^j$ points of $S$. On termination, then, every point would

Figure 2, panels (a) and (b)


have found its nearest neighbor among all the points in $S$. The iterations alternate between a horizontal and a vertical merge step, beginning with a horizontal merge. In a horizontal (vertical) merge, each point in a $2^j$-block of points finds its nearest neighbor among the $2^{j+1}$-block of points contained in its slab and the slab immediately to the right/left of (respectively below/above) it. Figure 2(b) indicates the four merge steps on a set of 16 points. With respect to this figure, at the end of iteration 2, point O has found its nearest neighbor among points {F, H, C} and at the end of iteration 3 its nearest neighbor among the points {F, H, C, A, N, K, G}. We now describe the steps involved in a horizontal merge of two subsets of points. (The processing for a vertical merge is symmetrical with the roles of the x and y coordinates reversed.)

Let $\alpha$, $\beta$ be the two subsets of points partitioned by a vertical line $L$, and assume that at the beginning of the iteration each point $p$ in $\alpha$ (respectively $\beta$) knows the distance $\delta_p$ to its nearest neighbor in $\alpha$ (respectively $\beta$). In this iteration, every point in $\alpha$ and $\beta$ will determine its nearest neighbor in the set $\alpha \cup \beta$.

We concentrate on finding the nearest neighbors of all points in $\alpha$. A symmetrical set of steps would be used to find the nearest neighbors of all the points in $\beta$. A point $p$ in $\alpha$ can have a neighbor in $\beta$ that is closer than the one already found only if the horizontal separation between $p$ and $L$ is less than $\delta_p$. We denote these points of $\alpha$ as candidate points. Consider an arbitrary point $q$ in $\beta$ (see Figure 3). From the density of planar point packing it is known [1] that at most four points in $\alpha$ (under the $L_2$ metric) can have $q$ as their nearest neighbor. This follows from the observation that if $q$ is the nearest neighbor of points $p_1$ and $p_2$, $p_1, p_2 \in \alpha$, then $d(p_1, q) \le \delta_{p_1}$ and $d(p_2, q) \le \delta_{p_2}$. Since $d(p_1, p_2) \ge \delta_{p_1}$ and $d(p_1, p_2) \ge \delta_{p_2}$, it follows that $d(p_1, p_2) \ge d(p_1, q)$ and $d(p_1, p_2) \ge d(p_2, q)$. Hence, the angle $\angle p_1 q p_2 \ge 60^\circ$. For each point $q$ of $\beta$, we need therefore examine only a constant number (four) of points in $\alpha$ to find if $q$ is the nearest neighbor of any of these


Figure 3


points. Of course, a given point in $\alpha$ may be examined by several points in $\beta$.

To determine which four points in $\alpha$ must be examined by a point in $\beta$, each candidate point in $\alpha$ and all points in $\beta$ are projected onto the vertical dividing line $L$. Each point in $\beta$ must examine the two closest points in $\alpha$ before and after it in the projection. In a sequential implementation [1], this is easily accomplished by scanning the list of projected points. However, such a strategy would result in an inefficient parallel algorithm owing to the strictly serial nature of the scan. To allow each of the points in $\beta$ to concurrently determine the points in $\alpha$ that it is to examine, we adopt the following strategy. Sort all candidate points in $\alpha$ into an array $L_1[0 \ldots s-1]$ and all points in $\beta$ into an array $L_2[0 \ldots t-1]$ in increasing order of their y-coordinates. Let local($q$) denote the index of a point $q$ in array $L_2$. Merge the arrays $L_1$ and $L_2$ to obtain a sorted array $L[0 \ldots s+t-1]$, and let global($q$) denote the index of $q$ in $L$. Then point $q$ must examine the four points in $L_1$ with indices

$$(\mathrm{global}(q) - \mathrm{local}(q) - 2), \ldots, (\mathrm{global}(q) - \mathrm{local}(q) + 1).$$
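The index arithmetic works because global($q$) − local($q$) is exactly the number of $L_1$ entries that precede $q$ in the merged array. A minimal serial sketch follows; the function name `alpha_windows`, the use of `bisect` in place of a parallel merge, and the tie rule are all assumptions made for illustration.

```python
import bisect

def alpha_windows(l1_y, l2_y):
    """For each q in L2, list the indices of L1 that q must examine.

    l1_y, l2_y: y-coordinates of L1 and L2, each sorted in increasing order.
    Assumption: on equal y-coordinates, L1 entries precede L2 entries in L.
    """
    s = len(l1_y)
    windows = []
    for local, y in enumerate(l2_y):
        before = bisect.bisect_right(l1_y, y)  # L1 entries placed before q in L
        global_index = before + local          # index of q in the merged array L
        lo = global_index - local - 2
        hi = global_index - local + 1
        # Indices outside 0..s-1 do not correspond to any point and are skipped.
        windows.append([i for i in range(lo, hi + 1) if 0 <= i < s])
    return windows

# Hypothetical y-coordinates.
print(alpha_windows([1.0, 3.0, 5.0, 7.0, 9.0], [0.0, 4.0, 10.0]))
```

Each window holds at most four indices: the two candidates just below $q$ in the projection and the two just above it.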

The details of the algorithm are presented below. Each point is initially available in the array $P[0 \ldots n-1]$, $n = 2^k$. $P[i]$ consists of three initialized fields $P[i].id = i$, $P[i].x$, and $P[i].y$, where the latter two are the coordinates of point $i$. We describe the algorithm in terms of operations on arrays. We will later show how to map the operations on the arrays onto the PEs of a $2^{k/2} \times 2^{k/2}$ MCC. Each element of the arrays used in the algorithm consists of a record with several fields. We use the notation A[i].(field) to refer to a field in the $i$th element of array A[ ].

Initialization
For b := 0 to (k − 1) do
    For each j, 0 ≤ j < 2^b, do in parallel
        Sort P[j·2^(k−b) .. (j+1)·2^(k−b) − 1] into nondecreasing order using either the keys P[i].x or P[i].y depending on whether b is odd or even.
    End for
For each i, 0 ≤ i < 2^k, do in parallel
    /* NN[i] and DIST[i] are the id and distance to the nearest neighbor of P[i] */
    NN[i] = nil; DIST[i] = ∞;
End for
End

Main Procedure
For b := 1 to k do
    For all j, 0 ≤ j < 2^(k−b), do in parallel
        Case b:
            b odd:  Horizontal-Merge(P[u..v], NN[u..v], DIST[u..v], u = j·2^b, v = (j+1)·2^b − 1);
            b even: Vertical-Merge(P[u..v], NN[u..v], DIST[u..v], u = j·2^b, v = (j+1)·2^b − 1);
    End for
End

Procedure Horizontal-Merge(P[u..v], NN[u..v], DIST[u..v], u = 0, v = 2^b − 1)
/* Determine the nearest neighbor of all points P[0 .. 2^(b−1) − 1] in slab α */
/* A symmetrical set of steps will determine the nearest neighbors of points P[2^(b−1) .. 2^b − 1] in slab β */

1. δ = Max {P[i].x : 0 ≤ i < 2^(b−1)}   /* coordinate of dividing line */

2. For each i, 0 ≤ i < 2^b, do in parallel
       T[i] ← P[i];
       if 0 ≤ i < 2^(b−1) then   /* P[i] is in α */
           if |δ − P[i].x| > DIST[i] then T[i].y ← ∞;   /* P[i] is not a candidate */
   End for

3. In parallel do   /* Project points onto vertical dividing line */
       Sort T[0 .. 2^(b−1) − 1] into nondecreasing order using T[i].y as the key.
       Sort T[2^(b−1) .. 2^b − 1] into nondecreasing order using T[i].y as the key.

4. For each i, 0 ≤ i < 2^b, do in parallel
       S[i] ← T[i];
       Case i:
           0 ≤ i < 2^(b−1):    S[i].local = null;
           2^(b−1) ≤ i < 2^b:  S[i].local = i − 2^(b−1);   /* Obtain local rank of points in β */
       End case
   End for

5. Merge the sorted subarrays S[0 .. 2^(b−1) − 1] and S[2^(b−1) .. 2^b − 1] into a single sorted array S[0 .. 2^b − 1] using S[i].y as the key.

6. For r := −2 .. 1 do   /* Each point in β will examine four points in α */
   (a) For each i, 0 ≤ i < 2^b, do in parallel:
           N[i] ← null;   /* Temporary array */
           D[i] ← null;   /* Temporary array */
           if (S[i].id ≥ 2^(b−1))   /* Only points in β are active */
           {   index := i − S[i].local + r;   /* index of desired point in sorted array T[ ] of candidates in α */
               if (0 ≤ index < 2^(b−1))
       (*)     {   N[i] ← T[index].id;   /* id of desired point */
                   D[i] ← d(N[i], S[i].id);   /* distance between points */
               }
           }
       End for
   (b) For each j, 0 ≤ j < 2^(b−1), do in parallel:   /* Each candidate point in α determines its nearest neighbor in β */
           Determine k such that D[k] = Min {D[s] | N[s] = j, 0 ≤ s < 2^b};
           if D[k] < DIST[j] then {DIST[j] ← D[k]; NN[j] ← k;}
       End for
End

2.3. MCC implementation

The points are initially distributed randomly to the PEs, one point per PE. Without loss of generality we refer to the point in PE(i) after the sorting step of the Initialization as the ith point and associate it with index i in the arrays P[ ], NN[ ], and DIST[ ]. At the termination of the algorithm, PE(i) will contain the index of the PE containing the nearest neighbor of i, and the distance between the two points.

Each iteration of the main procedure is carried out independently by the PEs in disjoint submeshes of size $2^{\lfloor b/2 \rfloor} \times 2^{\lceil b/2 \rceil}$. At the end of the $j$th iteration, PE($i$) contains the nearest neighbor (in a $2^j$-block) of point $i$.

We next consider the steps involved in an implementation of procedure Horizontal-Merge described previously. Step 1 can be performed by implementing a standard "binary tree" computation and distributing the value of $\delta$ to all PEs in the submesh. Each PE creates a record of T[ ] in step 2 in $O(1)$ time based on data locally available to the PE. Following the sort in step 3, the records of T[ ] are redistributed among the PEs and the record in PE($i$) following the sort is then referred to as T[i]. Similarly, in step 4 each PE creates a record of S[ ] based on local information; these records are redistributed following the merge in step 5, and S[i] following this step then refers to the record in PE($i$). All computations in step 6(a) except for that marked by (*) are performed by PE($i$) using locally available data. The computation marked (*) is accomplished by performing a Random Access Read from PE(index) to obtain the record T[index] stored in that PE. The implementation of step 6(b) requires an explanation. When PE($i$) updates N[i] and D[i] in step 6(a) (marked by (*)), it also creates a record


(N[i], D[i]). PE($i$) transmits this record to PE(NN[i]) by performing a Random Access Write to that PE. Several PEs may attempt to write into the same PE. The second field of the record is used to determine which record actually gets written into the destination PE. Of all the records with the same destination, the one with the smallest D[ ] value succeeds. (See [13] for an implementation of Random Access Writes with conflicts.)

Time complexity. Steps 2 and 4 of Procedure Horizontal-Merge require $O(1)$ time, and steps 1, 3, 5, and 6 can be performed in time bounded by $c_1 2^{\lceil b/2 \rceil}$. The time complexity for the main procedure is thus bounded by

$$\sum_{b=1}^{k} \left( c_1 2^{\lceil b/2 \rceil} + c_2 \right) \le c\, 2^{k/2},$$

which is $O(\sqrt{n})$. The time for initialization is similarly bounded by

$$\sum_{b=0}^{k-1} c_3 2^{\lceil (k-b)/2 \rceil},$$

which is $O(\sqrt{n})$. The time complexity for the entire algorithm is therefore $O(\sqrt{n})$ time steps.
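The bound on the main procedure is a geometric sum. One way to verify it, assuming $n = 2^k$ and using only $\lceil b/2 \rceil \le (b+1)/2$ and $k \le 2 \cdot 2^{k/2}$, is sketched below; the resulting constant $c$ is one valid choice and not necessarily the one intended by the authors.

```latex
\begin{align*}
\sum_{b=1}^{k}\left(c_1 2^{\lceil b/2\rceil} + c_2\right)
  &\le \sqrt{2}\,c_1 \sum_{b=1}^{k} 2^{b/2} + c_2 k \\
  &= \sqrt{2}\,c_1 \cdot \frac{\sqrt{2}\,\bigl(2^{k/2}-1\bigr)}{\sqrt{2}-1} + c_2 k \\
  &\le \left(\frac{2c_1}{\sqrt{2}-1} + 2c_2\right) 2^{k/2}
   = c\,\sqrt{n}.
\end{align*}
```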

3. Solving geometric problems on systolic arrays

A systolic array [7] is a network consisting of a large number of identical simple modules, called cells or PEs. PEs are provided with a fixed amount of local storage, and are laid out with simple and regular interconnections. The systolic network passes data rhythmically along these fixed interconnections. With each pulse of a clock, a PE performs a simple constant-time computation using the data passing through it. Data flows from the memory of the host computer into the array, passing through many processing elements before it returns to memory. In this section we describe the use of a two-dimensional systolic array for solving a number of geometric problems efficiently. In particular we will discuss algorithms for counting the number of intersections of a set of line segments, for constructing the convex hull of a set of planar points and an update algorithm for the fixed-size disk placement problem. All these algorithms use a common paradigm which can be used to solve other geometric problems like finding the nearest neighbors of a set of multidimensional data points by straightforward modifications.

The systolic algorithms presented in this section use $n^{2/3} \times n^{2/3}$ cells and require $O(n^{2/3})$ time steps. The best known serial algorithm for the problem of counting the number of intersections of a set of line segments requires $O(n^{1.695})$ time [3], and the construction of a planar convex hull can be done serially in optimal $O(n \log n)$ time [6]. A well-known serial algorithm to solve the update problem for a given set of $n$ points requires $O(n^2)$ time [4].


3.1. Overview

Let $A = [a_1, a_2, \ldots, a_n]$, $B = [b_1, b_2, \ldots, b_n]$, and $C = [c_1, c_2, \ldots, c_n]$, where $c_i$, $1 \le i \le n$, is defined by the recurrence

$$c_i^{(0)} = 0, \qquad c_i^{(k)} = c_i^{(k-1)} \oplus f(a_k, b_i), \quad 1 \le i \le n. \tag{1}$$

In the above equation $\oplus$ is an associative binary operation and both $f(\,)$ and $\oplus$ can be computed in $O(1)$ time.

The computation of $c_i$, $1 \le i \le n$, can be readily mapped onto a two-dimensional systolic network of cells as indicated in Figure 4 for the case $n = 4$. The cell at position $(k, i)$ computes the value of $c_i^{(k)}$ using $c_i^{(k-1)}$ and $b_i$ passed to it from cell $(k-1, i)$ and $a_k$ from cell $(k, i-1)$. Assuming that cell (1, 1) begins computing at time 1, the computation of $c_n^{(n)}$ by cell $(n, n)$ occurs at time $(3n - 2)$. It may be noted that the network has a period of

Figure 4


one cycle. That is, the computation of a new $C$ vector using a different set of $A$ and $B$ vectors may be started immediately after the initiation of the first. We will use this observation to design a more time-efficient systolic network to solve the problem using a smaller number of cells.
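A serial reference evaluation of recurrence (1) makes the role of each cell concrete: iteration $(k, i)$ of the double loop below performs the work assigned to cell $(k, i)$. This is a minimal sketch that ignores timing and data movement; the function name `solve_recurrence` and the sample choice of $f$ are illustrative assumptions.

```python
from operator import add

def solve_recurrence(a, b, f, op, identity):
    """Evaluate c_i = identity (+) f(a_1, b_i) (+) ... (+) f(a_n, b_i) for every b_i."""
    c = [identity] * len(b)
    for k in range(len(a)):        # row k of the array handles a_k
        for i in range(len(b)):    # column i carries b_i and the partial result c_i
            c[i] = op(c[i], f(a[k], b[i]))
    return c

# Hypothetical instance: count the elements of A that are smaller than each b_i.
values = [4, 1, 3, 2]
print(solve_recurrence(values, values, lambda ak, bi: 1 if ak < bi else 0, add, 0))
```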

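Let $\alpha$ and $\beta$ be two integers, $\alpha > \beta$, and let $\alpha\beta = n$. Define auxiliary variables $x_{ij}^{(k)}$ as follows:

$$x_{ij}^{(0)} = 0,$$

$$x_{ij}^{(k)} = x_{ij}^{(k-1)} \oplus f(a_{k+(j-1)\alpha}, b_i), \quad \forall i, j, k, \; 1 \le i \le n, \; 1 \le j \le \beta, \; 1 \le k \le \alpha.$$

It may be verified that $c_i$, $1 \le i \le n$, may be computed from the $x_{ij}^{(\alpha)}$ using the recurrence

$$c_i^{(0)} = 0,$$

$$c_i^{(k)} = c_i^{(k-1)} \oplus x_{ik}^{(\alpha)}, \quad 1 \le k \le \beta,$$

$$c_i = c_i^{(\beta)}.$$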

The above recurrence can be solved on a systolic network of size $(\alpha + 1) \times \alpha$ as shown in Figure 5. The last row of cells is used for computing the $c_i$'s, and the $\alpha \times \alpha$ network, similar to that of Figure 4, computes the $x_{ij}$'s.

The computation is performed in $\beta$ passes. In the $j$th pass the $\alpha$ results, $c_r$, $(j-1)\alpha < r \le j\alpha$, are computed in the last row of cells, one value per cell. Each pass consists of $\beta$ stages. In the $k$th stage of the $j$th pass, the value $c_r^{(k)}$ is accumulated by the appropriate cell in the last row using the value of $x_{rk}^{(\alpha)}$ passed to it by the $\alpha \times \alpha$ network. The $k$th stage of the $j$th pass is performed by feeding the values $b_r$ and $a_s$, $(j-1)\alpha < r \le j\alpha$, $(k-1)\alpha < s \le k\alpha$, into the network.

The time complexity of the computation can be determined as follows. Assume that the first computation (that of $x_{11}^{(1)}$) begins at time 1 at cell (1, 1). The last computation at cell (1, 1) (the one for the last stage of the last pass) will begin at time $\beta^2$. Consequently, the last computation at cell $(\alpha + 1, \alpha)$ will be completed at time $(\beta^2 + 2\alpha - 1)$. By choosing $\alpha = \beta^2$, we have $\alpha = n^{2/3}$ and $\beta = n^{1/3}$. Thus, the time complexity is $O(n^{2/3})$ steps.
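The decomposition can be checked serially: folding $\alpha$ elements of $A$ at a time into an auxiliary value and then combining the $\beta$ block results gives the same $c_i$ as the direct recurrence. The sketch below verifies only this algebraic equivalence, not the pass and stage schedule of the network; it assumes the initial value is an identity for the operation, and the function names are illustrative.

```python
from operator import add

def solve_direct(a, b, f, op, identity):
    """Direct evaluation of recurrence (1)."""
    c = [identity] * len(b)
    for ak in a:
        c = [op(ci, f(ak, bi)) for ci, bi in zip(c, b)]
    return c

def solve_blocked(a, b, f, op, identity, alpha, beta):
    """Blocked evaluation through the auxiliary values x_ij."""
    n = len(a)
    assert alpha * beta == n and len(b) == n
    c = [identity] * n
    for j in range(beta):                      # j-th block of alpha elements of A
        block = a[j * alpha:(j + 1) * alpha]
        for i in range(n):
            x = identity                       # x_ij^(0)
            for ak in block:                   # x_ij^(k), k = 1..alpha
                x = op(x, f(ak, b[i]))
            c[i] = op(c[i], x)                 # fold x_ij^(alpha) into c_i
    return c

# Hypothetical instance with n = 8, alpha = 4, beta = 2.
values = [5, 2, 7, 1, 8, 3, 6, 4]
less = lambda ak, bi: 1 if ak < bi else 0
print(solve_blocked(values, values, less, add, 0, alpha=4, beta=2))
print(solve_direct(values, values, less, add, 0))
```

As a worked instance of the timing formula, $n = 64$ gives $\alpha = 16$ and $\beta = 4$, so $\beta^2 + 2\alpha - 1 = 16 + 32 - 1 = 47$ steps.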

3.2. Optimality

In this subsection we show that the network described in the previous section computes the $c_i$, $1 \le i \le n$, defined by (1) in optimal time (up to constant factors). We define the model more precisely for the lower-bound proof.

Define $\sigma(d)$ to be the minimum number of cells in a network that can be reached from an arbitrary cell in $d$ or fewer data movement steps. For the two-dimensional networks under consideration, $\sigma(d) \le c d^2$ for some constant $c > 0$. The elements of $A$ and $B$ are assumed to be initially stored without duplication in the cells of the network. We refer to the cell in which an element $u \in A, B$ is stored initially as $h(u)$, the home cell of $u$.



Let $\delta$, $\delta \ge 1$, denote the maximum number of data movement steps that are made by any $a_i$ (or $b_i$), $1 \le i \le n$, in computing a product term $f(a_i, b_k)$ (or $f(a_k, b_i)$). Let $u$ be an arbitrary element of $A$ or $B$, and without loss of generality assume it is from $A$. Since $u$ is involved in a product computation with all $b_i$, $h(b_i)$ must be reachable from $h(u)$ in $2\delta$ or fewer data movement steps, for all $1 \le i \le n$. Consider an arbitrary element $b_s$. All products $f(a_k, b_s)$ must be computed in cells reachable from $h(b_s)$ in $\delta$ or fewer data movement steps, and hence within $3\delta$ data movement steps from $h(u)$. Since $s$ was arbitrary, all $n^2$ product terms ($f(a_k, b_i)$, $1 \le i, k \le n$) must be computed in cells reachable from $h(u)$ in $3\delta$ or fewer data movement steps. Thus, all $n^2$ products must be computed in at most $\sigma(3\delta)$ cells, which is no more than $9c\delta^2$. Assuming unit time for the computation of a single product and for one data movement step, the time $T$ for the computation is bounded by

$$T \ge \frac{n^2}{9c\delta^2}, \qquad T \ge \delta.$$

The minimum time, $T$, to perform the computation is obtained when

$$\frac{n^2}{9c\delta^2} = \delta, \qquad \delta = \frac{n^{2/3}}{(9c)^{1/3}} = c_1 n^{2/3},$$

hence,

$$T = \Omega(n^{2/3}).$$

To claim that our implementation is optimal, we must ensure that the network satisfies the constraints imposed by the lower-bound model. In particular, the assumption that all elements of $A$ and $B$ are stored without duplication should hold. In the network implementation shown, each element enters the network several times. However, note that it always enters the same cell, either in the first row or the first column, and hence, for purposes of the proof, could be considered as being permanently stored in the cell through which it enters.

A final note concerns the external I/O bandwidth requirement of our solution. Since the computation requires $c_1 n$ data items and completes in $c_2 n^{2/3}$ time steps, an external bandwidth of at least $c n^{1/3}$ is necessary. The implementation of the network appears to require an external bandwidth of $2n^{2/3}$. However, if we examine the data flow, it can be verified that in any cycle at most $n^{1/3}$ distinct elements need to be fed into the array in each of the vertical and horizontal directions. If an element of $B$ is held in the first row of cells for $n^{1/3}$ cycles, the bandwidth required for the $B$ elements is bounded by $n^{1/3}$. Similarly, if $n^{1/3}$ elements of $A$ are buffered


in each of the cells in the first column of the network, the bandwidth requirement for the $A$ elements is also bounded by $n^{1/3}$. In summary, the network that solves the recurrence equation (1) achieves the least possible time ($O(n^{2/3})$), uses the smallest possible number of cells ($O(n^{4/3})$), and requires the minimum possible I/O bandwidth ($O(n^{1/3})$). Hence, the design is optimal.

3.3. Applications

In the following sections we show how a number of geometric problems can be formulated as equation (1), and hence can be solved on an $(n^{2/3} + 1) \times n^{2/3}$ systolic array in $O(n^{2/3})$ time steps.

3.3.1. Segment intersection counting. Given a set $L = \{l_1, \ldots, l_n\}$ of line segments, the segment intersection counting problem determines for each $l_i$, $1 \le i \le n$, the number of segments it intersects. Each segment is represented by the coordinates of its endpoints. A cell can determine whether $l_i$ and $l_j$ intersect in $O(1)$ time.

Let $a_i = b_i = l_i$, the $i$th line segment. Define

$$f(a_j, b_i) = \begin{cases} 1 & \text{if } l_j, l_i \text{ intersect}, \\ 0 & \text{else}, \end{cases}$$

$$x_{ij}^{(0)} = 0, \quad \forall\, 1 \le i \le n, \; 1 \le j \le \beta,$$

and $\oplus$ is ordinary addition. Each $c_i$, $1 \le i \le n$, is the number of segments that $l_i$ intersects. The same approach may be used to determine the number of intersections of any set of $n$ $k$-gons ($k$ fixed) in $O(n^{2/3})$ time.
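The constant-time test performed by a cell can be sketched with the standard orientation predicate. This is a serial illustration only; the helper names are illustrative, segments that merely touch are treated as intersecting, and the sketch assumes a segment is not counted as intersecting itself, a detail the definition of $f$ leaves open.

```python
def orient(p, q, r):
    """Sign of the turn p -> q -> r: 1 counterclockwise, -1 clockwise, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def on_segment(p, q, r):
    """For r collinear with pq: does r lie within the bounding box of pq?"""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

def segments_intersect(s, t):
    """The O(1) predicate behind f: do closed segments s and t share a point?"""
    (p1, q1), (p2, q2) = s, t
    o1, o2 = orient(p1, q1, p2), orient(p1, q1, q2)
    o3, o4 = orient(p2, q2, p1), orient(p2, q2, q1)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and on_segment(p1, q1, p2)) or (o2 == 0 and on_segment(p1, q1, q2))
            or (o3 == 0 and on_segment(p2, q2, p1)) or (o4 == 0 and on_segment(p2, q2, q1)))

def count_intersections(segments):
    """c_i = sum over j of f(a_j, b_i), with ordinary addition as the operation."""
    counts = []
    for i, li in enumerate(segments):
        c = 0
        for j, lj in enumerate(segments):
            # Assumption: a segment is not counted as intersecting itself.
            if j != i and segments_intersect(lj, li):
                c += 1
        counts.append(c)
    return counts

# Hypothetical segments, each given by its two endpoints.
segs = [((0, 0), (2, 2)), ((0, 2), (2, 0)), ((5, 5), (6, 6)), ((1, 0), (1, 3))]
print(count_intersections(segs))
```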

3.3.2. Ranking a subset of elements. Given a vector $S[1 \ldots n]$ of elements, a total order $<$ on the elements, and a Boolean vector $F[1 \ldots n]$, the ranking problem is defined as follows. For each element $S[i]$ such that $F(i)$ is true, determine the number of elements $S[j]$ such that $F(j)$ is true and $S[j] < S[i]$.

The problem can readily be cast into the form of our recurrence by making the following definitions:

$$a_i = b_i = (S[i], F[i]),$$

$$f(a_j, b_i) = \begin{cases} 1 & \text{if } F(i) \wedge F(j) \wedge (S[j] < S[i]), \\ 0 & \text{else}, \end{cases}$$

$$x_{ij}^{(0)} = 0, \quad 1 \le i \le n, \; 1 \le j \le \beta,$$

and $\oplus$ is ordinary addition. Each $c_i$, $1 \le i \le n$, is the rank of a flagged element $S[i]$ among all the flagged elements in $S$.
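A direct serial evaluation of these definitions is shown below as a minimal sketch. The function name `rank_flagged` and the sample data are illustrative; unflagged elements receive the value 0, since $f$ is 0 whenever $F(i)$ is false.

```python
def rank_flagged(s, flags):
    """c_i = number of flagged S[j] with S[j] < S[i], for flagged S[i]; 0 otherwise."""
    ranks = []
    for si, fi in zip(s, flags):
        c = 0
        for sj, fj in zip(s, flags):
            if fi and fj and sj < si:   # f(a_j, b_i) = 1
                c += 1                  # ordinary addition
        ranks.append(c)
    return ranks

# Hypothetical data: the element 3 is not flagged.
print(rank_flagged([5, 3, 9, 1], [True, False, True, True]))
```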


3.3.3. Convex hull. Let $S = \{p_i \mid i = 1, \ldots, n\}$ be a set of planar points represented by their Cartesian coordinates and $u$ an arbitrary point in the plane. Let $S_k$, $1 < k \le n$, denote the set $\{u\} \cup \{p_i \mid i = 1, \ldots, k\}$. By iteratively examining each of the points $p_i$, $1 \le i \le n$, we may determine whether $u$ is a convex point of the set $S$, as follows. Assume inductively that we have determined whether $u$ is a convex point of $S_k$; if so, assume that we have determined points $x_k, y_k \in S_k$ such that all other points of $S_k - \{u\}$ lie within the convex angle $\angle x_k u y_k$. We refer to $x_k$ and $y_k$ as the extreme points of $S_k$. The loop of procedure CH( ) below details the steps in determining whether $u$ is a convex point of $S_{k+1}$. In the procedure below, CP is a flag that is set to false only if $u$ is not a convex point of $S_n$.

Procedure CH(S, u)
/* Determine if u lies on the convex hull of {u} ∪ S. */
Begin
    CP = true; x = p_1; y = p_2;
    Let Θ denote the convex angle xuy.
    For k = 3, ..., n do
        if CP then
        Begin
            if p_k lies within the opposite angle of Θ then
                CP = false;   /* (Figure 6(a)) */
            else
            Begin   /* u is a convex point of S_k */
                if p_k does not lie within the angle Θ then
                Begin   /* Update extreme points of S_k, Figure 6(b). */
                    if p_k and x lie on the same half-plane of line uy
                        then x = p_k;
                        else y = p_k;
                End   /* Else no change in extreme point, Figure 6(c). */
            End
        End
    Return (CP, x, y);
End
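The geometric tests in procedure CH can be written with cross products. The sketch below is a serial illustration under stated assumptions: the points are in general position (no two points collinear with $u$, at least two points besides $u$), boundary cases of the opposite-angle test count as "not convex", and the names `cross`, `ch`, and `hull_flags` are illustrative rather than taken from the paper.

```python
def cross(a, b):
    """z-component of the cross product of 2-D vectors a and b."""
    return a[0] * b[1] - a[1] * b[0]

def ch(points, u):
    """Return (CP, x, y): CP is True iff u is a convex point of {u} + points."""
    vec = lambda p: (p[0] - u[0], p[1] - u[1])
    x, y = points[0], points[1]
    if cross(vec(x), vec(y)) < 0:       # orient so that y is counterclockwise from x
        x, y = y, x
    for p in points[2:]:
        a, b, v = vec(x), vec(y), vec(p)
        if cross(a, v) <= 0 and cross(v, b) <= 0:
            return False, x, y          # p lies within the opposite angle
        if cross(a, v) >= 0 and cross(v, b) >= 0:
            continue                    # p lies within the angle: no change
        if cross(v, b) > 0:             # p and x on the same side of line uy
            x = p
        else:
            y = p
    return True, x, y

def hull_flags(points):
    """Flag each point that is a convex point of the whole set."""
    return [ch(points[:i] + points[i + 1:], p)[0] for i, p in enumerate(points)]

# Hypothetical data: four corners of a square and its center.
print(hull_flags([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]))
```

Because the angle is kept oriented with $y$ counterclockwise from $x$, the "same half-plane as $x$" test reduces to the sign of a single cross product.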

For the systolic implementation on an $n \times n$ network, let $a_i = b_i = p_i$, and let $c_i^{(k)}$ denote the pair $(L_i^{(k)}, F_i^{(k)})$. $F_i^{(k)}$ is a flag that is true iff $b_i$ is a convex point of $S_k$ (where $u$ is the point $b_i$). $L_i^{(k)}$ is the triple $(x_k, b_i, y_k)$ where $x_k$ and $y_k$ are the extreme points of $S_k$. The computation performed by each cell consists of the steps in one iteration of the loop of procedure CH. The flag $F_i^{(n)}$, $1 \le i \le n$, in $c_i^{(n)}$ will be true if and only if $p_i$ is on the convex hull of the given set $S$.

Figure 6, panels (a), (b), and (c)

We now show that this problem can be decomposed for solution on an $(\alpha + 1) \times \alpha$ systolic network. Let $S^j$, $1 \le j \le \beta$, denote the set $\{u\} \cup \{p_{(j-1)\alpha+1}, \ldots, p_{j\alpha}\}$. Using CH($S^j$, $u$) detailed previously, we can determine whether $u$ is a convex point of the set $S^j$. If so, let $x^j$, $y^j$ be the extreme points of $S^j$. To determine whether $u$ is a convex point of $S$, we need to determine only if $u$ is a convex point of the set $\{u\} \cup \{x^j, y^j \mid j = 1, \ldots, \beta\}$. This step is implemented on the $(\alpha + 1)$th row of the systolic network. The cell computing $c_i$, $1 \le i \le n$, receives on $\beta$ consecutive cycles the extreme points $x^j$, $y^j$ of $S^j$ (where $u$ is point $b_i$), and a flag indicating whether $b_i$ is a convex point of $S^j$. Using the procedure CH( ) for each of the extreme points $x^j$, $y^j$ it receives, the cell can determine if $b_i$ is on the convex hull of $\{b_i\} \cup \{x^j, y^j \mid j = 1, \ldots, \beta\}$, and hence on the convex hull of $S$.

The above computation makes it possible to flag all points on the convex hull of $S$ in $O(n^{2/3})$ time steps. Usually, the expected output for the planar hull problem is an ordered list of the vertices on the convex hull. We now explain how the $n^{2/3} \times n^{2/3}$ network of cells can put the points on the hull into clockwise order in $O(n^{2/3})$ time steps.


Let $CH$ denote the set of points in the convex hull, $|CH| \le n$. Determine the leftmost and rightmost points (say $L$ and $R$, respectively) of $CH$. The points that lie above the line $LR$ ($CH_U$) should be ordered by increasing x-coordinates and those below $LR$ ($CH_L$) by decreasing x-coordinates.

The determination of $L$ and $R$ and flagging each point as a member of $CH_U$ or $CH_L$ can be performed straightforwardly on the network in $O(n^{2/3})$ time. Using the procedure Rank described earlier, each point in $CH_U$ ($CH_L$) can be ranked to indicate its position in its ordered set. We explain how to order the set by permuting the points on the basis of their rank.

Let rank($i$) denote the rank of point $i$ in its set. Initially, the unordered data points are stored in the first $n^{1/3}$ rows of the array, $n^{2/3}$ elements per row. Each cell finds an integer $k$ such that $k n^{2/3} \le \mathrm{rank}(i) < (k + 1) n^{2/3}$.

Step 1. Perform a vertical rotation in each column to bring the data in row $r$ to row $(r + k n^{1/3})$.

Step 2. Perform a horizontal rotation in each row to bring point $i$ with rank($i$) to column $[\mathrm{rank}(i) \bmod n^{2/3} + 1]$.

Step 3. Move the data in row $r$ to row $[(r - 1)/n^{1/3}] + 1$ in the same column. This results in the sorted sequence being stored in the first $n^{1/3}$ rows of the array, $n^{2/3}$ elements per row.

All these data movement steps need no more than $n^{2/3}$ time each. Thus the time needed for the permutation is $O(n^{2/3})$. Figure 7 shows the steps involved in permuting a set of eight items. For a given set of $n$ points, even in the worst case where all the $n$ points are on the convex hull, it takes only $O(n^{2/3})$ time to order them and place them in clockwise order.

3.3.4. Fixed-size disk placement updating. The fixed-size disk placement problem is defined as follows: given a set of points $p_i$, $i = 1, 2, \ldots, n$, in the plane and a fixed disk of radius r, find a location for the disk such that the number of points it covers is maximized. The problem is equivalent to the following: given a set C of n circles with the same radius r, centered at $p_1, p_2, \ldots, p_n$, find a largest subset of C whose members have a nonempty common intersection. Furthermore, the points can be weighted, in which case the total weight of the points covered by the disk is to be maximized. As an example of an application, suppose that n cities with different populations and a radio station of fixed transmission power are given. The optimization problem is to find the site for the station so that the maximum possible population can receive its signal.

The unweighted problem can be solved by finding the intersection points of each circle with the other $n - 1$ circles. For each circle, scan its intersection points in counterclockwise (or clockwise) order and count the number of intersecting circles. This is done by incrementing a counter when we encounter an intersection point at which we enter the interior of the circle that contributes the intersection point, and decrementing the counter when we leave that circle.

[Figure 7: the steps involved in permuting a set of eight items; the final panel shows the rows 0 1 2 3 and 4 5 6 7.]

Now, suppose that we have already determined the center of the given disk so that it covers the maximum number of points in a given set. When a new point $p_{n+1}$ is added to the set, we attempt to update the placement of the fixed-size disk, i.e., decide whether there is a new center at which the disk can cover more points.

Let $C_i$ denote the circle with the given fixed radius centered at $p_i$. Find all the intersection points between $C_{n+1}$ and $C_i$, $1 \le i \le n$, their coordinates, and the angles $\angle t_j p_{n+1} m$, where m is the rightmost point on $C_{n+1}$. There are at most 2n such intersection points $t_j$, $1 \le j \le 2n$. Two kinds of points may be encountered when we travel along $C_{n+1}$ in a counterclockwise direction.


[Figure 8: the circle $C_{n+1}$ with type 1 intersection points marked "o" and type 2 intersection points marked "x".]

Some points are of type 1 (marked as "o" in Figure 8), at which we enter the interior of a circle intersecting $C_{n+1}$. The other intersection points are of type 2 (marked as "x" in Figure 8), at which we leave that circle.

For each point $t_i$, $1 \le i \le 2n$, determine $count_i$, equal to the number of circles $C_j$, $1 \le j \le n$, whose type 1 intersection with $C_{n+1}$ lies within the angle $\angle t_i p_{n+1} m$ (measured counterclockwise) and whose type 2 intersection lies outside $\angle t_i p_{n+1} m$ (Figure 8). The maximum value of $count_i$, $1 \le i \le 2n$, is the largest number of circles overlapping a point of $C_{n+1}$. If this is greater than the number of points covered by an optimally placed circle before the update, the new optimal placement has been found. Otherwise, no updating of the placement is needed.
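The quantity being maximized can be illustrated sequentially. The following Python sketch is a brute-force check rather than the systolic computation: it represents the part of $C_{n+1}$ inside each $C_i$ as an arc and evaluates the overlap at every type 1 point. The function names, the tolerance `EPS`, and the decision to skip coincident centers are assumptions made for the example. Arcs that wrap past the reference point m are handled by modular arithmetic on the angles.

```python
import math

TWO_PI = 2.0 * math.pi
EPS = 1e-12  # tolerance for comparisons at arc endpoints (an assumption)


def arcs_on_new_circle(points, new_point, r):
    """Return (start_angle, width) of the arc of C_{n+1} lying inside each C_i."""
    arcs = []
    for x, y in points:
        dx, dy = x - new_point[0], y - new_point[1]
        d = math.hypot(dx, dy)
        if d == 0.0 or d > 2.0 * r:
            continue  # coincident centers or non-intersecting circles are skipped
        phi = math.atan2(dy, dx)         # direction from p_{n+1} toward p_i
        beta = math.acos(d / (2.0 * r))  # half-width of the covered arc
        # Type 1 point at phi - beta (enter), type 2 point at phi + beta (leave).
        arcs.append(((phi - beta) % TWO_PI, 2.0 * beta))
    return arcs


def max_overlap_on_new_circle(points, new_point, r):
    """Largest number of circles C_i overlapping a single point of C_{n+1}."""
    arcs = arcs_on_new_circle(points, new_point, r)
    best = 0
    for theta, _ in arcs:  # the maximum is attained at some type 1 point
        count = sum(1 for start, width in arcs
                    if (theta - start) % TWO_PI <= width + EPS)
        best = max(best, count)
    return best
```

The returned value is then compared with the previous optimum as described in the text; that comparison is left to the caller here.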

The mapping of this computation onto the $(a + 1) \times a$ systolic array is straightforward. The intersection points $t_i$, $1 \le i \le 2n$, can be obtained by adapting the segment intersection algorithm presented earlier. To determine $count_i$, $1 \le i \le 2n$, we make four passes of the data through the network as described below ($f_j$ is a flag that indicates whether $t_j$ is type 1 or type 2):

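Pass 1. $a_i = b_i = (t_i, f_i)$.
Pass 2. $a_i = (t_{i+n}, f_{i+n})$, $b_i = (t_i, f_i)$.
Pass 3. $a_i = (t_i, f_i)$, $b_i = (t_{i+n}, f_{i+n})$.
Pass 4. $a_i = (t_{i+n}, f_{i+n})$, $b_i = (t_{i+n}, f_{i+n})$,

where

$$
f(a_j, b_i) =
\begin{cases}
0 & \text{if } \angle a_j p_{n+1} m > \angle b_i p_{n+1} m, \\
1 & \text{if } \angle a_j p_{n+1} m \le \angle b_i p_{n+1} m \text{ and } a_j \text{ is type 1}, \\
-1 & \text{if } \angle a_j p_{n+1} m \le \angle b_i p_{n+1} m \text{ and } a_j \text{ is type 2}.
\end{cases}
$$

The combining operation of the network is ordinary addition. The values $count_i$, $1 \le i \le n$, are obtained from the last row of cells after pass 2, and $count_i$, $n < i \le 2n$, are obtained after pass 4.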
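The four-pass schedule can be mimicked sequentially to see why the counts for the first n points are complete after pass 2 and the rest after pass 4. The Python sketch below is only an illustration: it assumes that f contributes 0 when $a_j$ lies at a larger angle than $b_i$, and +1 or -1 otherwise according to the type of $a_j$; each event is a hypothetical `(angle, type)` pair, and the function names are not taken from the paper. The definition is applied literally, with no special treatment of arcs that wrap past m.

```python
def f(a, b):
    """Contribution of event a to the count at event b (assumed form of f)."""
    angle_a, type_a = a
    angle_b, _ = b
    if angle_a > angle_b:
        return 0
    return 1 if type_a == 1 else -1


def counts_in_four_passes(events):
    """Accumulate count_i over the four block passes; len(events) must be 2n."""
    n = len(events) // 2
    counts = [0] * (2 * n)
    # (offset of the a-block, offset of the b-block) for passes 1 to 4.
    passes = [(0, 0), (n, 0), (0, n), (n, n)]
    for a_off, b_off in passes:
        for i in range(n):      # b_i stays associated with one output cell
            for j in range(n):  # every a_j of the block meets every b_i
                counts[b_off + i] += f(events[a_off + j], events[b_off + i])
    return counts
```

Passes 1 and 2 pair the first block of b values with both blocks of a values, so those n sums are final after pass 2; passes 3 and 4 do the same for the second block.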

The weighted problem of fixed-size disk placement updating can be solved using the same algorithm with a minor modification. Let $w_i$ represent the weight of point $p_i$, the center of circle $C_i$. If $t_j$ is an intersection point of $C_i$ and $C_{n+1}$, then associate $w_i$ with $t_j$. The input data to the systolic network now consists of the triple $(t_j, w_j, f_j)$, where $w_j$ denotes the weight associated with $t_j$. The operator $f(\cdot)$ is modified so that 1 and $-1$ are replaced by $w_j$ and $-w_j$, respectively.

4. Conclusion

The need for rapid computation of geometric properties, such as finding the nearest neighbors or the convex hull of a set of points, arises in several applications such as air-traffic control and robotics. Parallel processing offers a cost-effective means of obtaining desired execution speeds for computationally demanding problems. We have presented parallel algorithms for solving some geometric problems on two array processor models: the MCC and the two-dimensional systolic array. While a global control unit is often provided on an MCC to permit broadcasting of data and instructions, the systolic array does not have global control, and each cell performs its computations on the basis of purely local information.

Using a $\sqrt{n} \times \sqrt{n}$ MCC, our algorithm for finding the nearest neighbors of a set of planar points has $O(\sqrt{n})$ time complexity, which is optimal up to constant factors.

A systolic algorithm is presented for the convex hull construction problem, and a ranking-and-permuting technique is used to output the convex hull points in clockwise order. In addition, we have presented a systolic update algorithm for the disk placement problem. These algorithms execute on an $n^{2/3} \times n^{2/3}$ two-dimensional systolic array and require $O(n^{2/3})$ time steps.

References

[1] J. L. Bentley and M. I. Shamos, Divide-and-conquer in multidimensional space, in Proc. of 8th Ann. ACM Symp. on Theory of Computing, 1976, pp. 220-230.

[2] B. M. Chazelle, Computational geometry on a systolic chip, IEEE Trans. Comput., 33 (1984), 774-785.

[3] B. M. Chazelle, Intersecting is easier than sorting, in Proc. of 16th ACM Symp. on Theory of Computing, 1984, pp. 125-134.

[4] B. M. Chazelle and D. T. Lee, On a circle placement problem, in Proc. of Conf. on Information Systems Science, 1984.

[5] F. Dehne, $O(n^{1/2})$ algorithms for maximal elements and ECDF searching problem on a mesh-connected parallel computer, Inform. Process. Lett., 22 (1986), 303-306.

[6] R. L. Graham, An efficient algorithm for determining the convex hull of a finite planar set, Inform. Process. Lett., 1 (1972), 132-133.

[7] H. T. Kung, Why systolic architectures?, Computer, 15 (1982), 37-46.

[8] M. Lu, Constructing the Voronoi diagram on a mesh-connected computer, in Proc. of 1986 Int. Conf. on Parallel Processing, 1986, pp. 806-811.

[9] M. Lu and P. Varman, Solving geometric proximity problems on mesh-connected computers, in Proc. of 1985 IEEE Comp. Soc. Workshop on Computer Architecture for Pattern Analysis and Image Database Management, 1985, pp. 248-255.

[10] M. Lu and P. Varman, Mesh-connected computer algorithms for rectangle-intersection problems, in Proc. of 1986 Int. Conf. on Parallel Processing, 1986, pp. 301-307.

[11] R. Miller and Q. F. Stout, Computational geometry on a mesh-connected computer, in Proc. of 1984 Int. Conf. on Parallel Processing, 1984, pp. 66-73.


[12] R. Miller and Q. F. Stout, Mesh Computer Algorithms for Computational Geometry, Technical Report 86-18, Department of Computer Science, State University of New York at Buffalo, July 1986.

[13] D. Nassimi and S. Sahni, Data broadcasting in SIMD computers, IEEE Trans. Comput., 30 (1981), 101-106.

[14] F. P. Preparata and D. T. Lee, Computational geometry--a survey, IEEE Trans. Comput., 33 (1984), 1072-1100.

[15] C. D. Thompson and H. T. Kung, Sorting on a mesh-connected parallel computer, Comm. ACM, 20 (1977), 263-271.
