Quick recovery of two embedded complete binary trees in a hypercube

C.-C. HSU Y.-W. Liu

Indexing terms: Complete binary tree, Dilation, Embedding, Hypercube, Processor utilisation

Abstract: In the paper, the authors propose a novel approach for embedding a (d - 1) level and a (d - 2) level complete binary tree (CBT) into a d-dimensional hypercube (d-cube). Moreover, free processors are used as spare processors to recover a single fault in the two trees. The primary results are that the (d - 1)-CBT can be recovered in at most two steps and the (d - 2)-CBT in one step. The dilation of the recovered embedding is at most two and the processor utilisation is near 75%.

1 Introduction

As the number of processors in parallel computers becomes large, models in which processors are never faulty become increasingly unrealistic. However, when a processor fails, it requires hours of computation to recover the fault if there is no proper recovery strategy.

A recovery scheme involves reconfiguration and process migration [6, 7, 8]. Reconfiguration algorithms can reallocate the data in faulty processors to spare processors so that the communication structure (embedded structure) can be recovered. Recovery cost is measured by the number of data reallocations, and performance is measured by the dilation of the embedded structure, i.e. the maximum distance between the processor pairs to which adjacent nodes in an application graph are mapped.

The application graphs considered in this paper are complete binary trees (CBTs), which are important in parallel algorithms such as parallel prefix sum, divide-and-conquer, parallel sorting etc. [1, 9]. Recently, several researchers have studied the embedding of complete binary trees into hypercubes. In this research, it is important to ensure that the embeddings preserve the adjacency relationships among tree nodes in order to perform data communication efficiently. It has been shown that it is impossible to embed a d-CBT into a d-cube (expansion 1) and still preserve adjacency (dilation 1) [4]. Different methods have been proposed for embedding a (d - 1)-CBT into a d-cube while preserving adjacency relationships among tree nodes. Wu [4] gave a recursively defined, bottom-up algorithm for determining a proper embedding. Johnsson [3], in contrast, presented a top-down algorithm to define the labelling scheme of the tree nodes. Wu also considered the property of ‘free-free neighbour’ and proved that a taller tree can be embedded using this property. Later, in Leis’s paper [2], the formal transformation formula of the ‘free-free neighbour’ was proposed. However, the methods described above only considered embedding in a fault-free hypercube. In other words, there are no recovery strategies in their embedding methods. To avoid this disadvantage, some fault-tolerant embeddings have been proposed. Provost [5] developed a top-down and distributed algorithm to embed a (d - 1)-CBT into a d-cube. This embedding can preserve adjacency relationships and tolerate a faulty section of the (d - 1)-CBT using a remapping method. Unfortunately, the recovery cost of the remapping method is large and the dilation of the recovered embedding is two. Lee [6] proposed a quick recovery strategy, which can recover the embedding with dilation two in one step.

© IEE, 1994. Paper 11SOE (Cl), first received 23rd June 1993 and in revised form 19th January 1994. The authors are with the Department of Information Management, National Taiwan Institute of Technology, Taipei, Taiwan, Republic of China

IEE Proc.-Comput. Digit. Tech., Vol. 141, No. 4, July 1994

However, all of the above fault-tolerant algorithms have the same disadvantage: a lot of processors (about 50%) in the hypercube are not used to embed a CBT. In other words, the processor utilisation is low. Deshpande and Jenevein [10] developed a distributed tree set-up algorithm to embed a double-rooted CBT into a hypercube. They also proved that two (d - 1) level CBTs can be simultaneously mapped to a d-cube, giving a processor utilisation near 100%. Nevertheless, they must pay a high price to recover a fault since there is only one spare processor in the d-cube. In the worst case, they need to remap all the processors of the d-cube to recover a fault. This process is time-consuming.

In order to overcome the above shortcomings, we propose an approach for embedding two CBTs ((d - 1)-CBT and (d - 2)-CBT) into a hypercube (d-cube). Since there are $2^k - 1$ nodes in a k-CBT, the processor utilisation is $[(2^{d-1} - 1) + (2^{d-2} - 1)]/2^d$ (near 75%). Furthermore, we use the free processors after embedding to recover a single fault in the two trees. The (d - 1)-CBT can be recovered in at most two steps and the (d - 2)-CBT in one step. The dilation of the recovered embedding is at most two.
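The utilisation formula can be evaluated for a few cube sizes to see how it approaches 75%. The following is a minimal sketch; the function names are illustrative and not taken from the paper.

```python
from fractions import Fraction


def cbt_nodes(levels):
    """Number of nodes in a complete binary tree with `levels` levels."""
    return 2 ** levels - 1


def utilisation(d):
    """Fraction of the 2^d processors occupied by a (d-1)-CBT and a (d-2)-CBT."""
    used = cbt_nodes(d - 1) + cbt_nodes(d - 2)
    return Fraction(used, 2 ** d)


for d in (4, 5, 10, 20):
    print(d, utilisation(d), float(utilisation(d)))
```

The shortfall from 3/4 is exactly $2/2^d$, so it halves with every additional dimension.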

2 Preliminaries

In embedding, the application graph is the topology to be embedded, and the host graph is the topology used to embed the application graph. In this paper, the application graph is two CBTs and the host graph is a hypercube.

The authors would like to thank the referees for their helpful comments that greatly improved the presentation of this paper.

A hypercube with degree d (d-cube) is an undirected graph with $2^d$ nodes labelled from 0 to $2^d - 1$. There is an edge between a pair of nodes if and only if their binary representations differ in exactly one bit (i.e. the Hamming distance is one).
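The adjacency rule can be written with integer node labels and bitwise XOR. This is a small sketch; the helper names are assumptions made for illustration.

```python
def hamming(a, b):
    """Number of bit positions in which the labels a and b differ."""
    return bin(a ^ b).count("1")


def is_edge(a, b):
    """Two hypercube nodes are adjacent iff their labels differ in one bit."""
    return hamming(a, b) == 1


def neighbours(node, d):
    """All d neighbours of `node` in a d-cube, one per dimension."""
    return [node ^ (1 << i) for i in range(d)]


print([format(n, "04b") for n in neighbours(0b0101, 4)])
```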

An embedding $\varphi$ of an application graph $G_a$ into a host graph $G_h$ is a function which maps $V_a$ to $V_h$, where $V_a$ and $V_h$ are the sets of nodes in $G_a$ and $G_h$, respectively. Embedding $\varphi$ is isomorphic if it is injective and, for each $(v_i, v_j) \in E_a$, $(\varphi(v_i), \varphi(v_j)) \in E_h$. The expansion of $\varphi$ is $|V_h| / |V_a|$. If $\varphi$ is injective, then processor utilisation is equal to the inverse of the expansion. The dilation of $\varphi$ is $\max \text{distance}(\varphi(v_i), \varphi(v_j))$ over every $(v_i, v_j) \in E_a$. The dilation of $\varphi$ represents the maximum communication delay between adjacent nodes.
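These definitions translate directly into code when the host graph is a hypercube, where distance is the Hamming distance. The sketch below uses a three-node path as a toy application graph; the graph and both mappings are made up for illustration.

```python
from fractions import Fraction


def hamming(a, b):
    """Distance between two processors of a hypercube."""
    return bin(a ^ b).count("1")


def dilation(edges, phi):
    """Largest host distance between the images of adjacent application nodes."""
    return max(hamming(phi[u], phi[v]) for u, v in edges)


def expansion(phi, d):
    """|V_h| / |V_a| for an embedding into a d-cube."""
    return Fraction(2 ** d, len(phi))


edges = [("a", "b"), ("b", "c")]              # path a - b - c
good = {"a": 0b00, "b": 0b01, "c": 0b11}      # adjacency preserved
bad = {"a": 0b00, "b": 0b11, "c": 0b01}       # a and b end up two hops apart

print(dilation(edges, good), dilation(edges, bad), expansion(good, 2))
```

For an injective mapping, `1 / expansion(phi, d)` gives the processor utilisation.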

3 A new approach for embedding two complete binary trees into a hypercube

In this section we propose an approach which embeds two CBTs into a hypercube. First, a (d - 1)-CBT is embedded in a d-cube, then a mirror CBT ((d - 2)-CBT) is embedded in the remaining nodes of the d-cube.

First, we divide the d-cube into four (d - 2)-subcubes (xx...xx00), (xx...xx01), (xx...xx10) and (xx...xx11), which are referred to as the 00, 01, 10 and 11 subcubes, respectively. Let k = d - 1. Then, we intend to embed a k-CBT and a (k - 1)-CBT into a (k + 1)-cube in the following way:

(1) each node in the two trees will be mapped to a distinct processor

(2) the nodes in the top (k - 2) levels of the main k-CBT will be embedded into the 00 subcube

(3) all nodes except leaves in the mirror (k - 1)-CBT will be embedded into the 11 subcube, and

(4) the nodes in the bottom two levels of the main CBT and the leaves of the mirror CBT will be embedded into the 10 and 01 subcubes.

Fig. 1A shows the partitions of the four subcubes. In this figure, the two trees are embedded into a hypercube in the way described above.

Fig. 1A The two trees in the four subcubes (00, 01, 10 and 11)

The embedding of a tree with height k is accomplished by merging two subtrees with height (k - 1) in different subcubes with an additional root node which resides in a free processor. That is, our embedding method embeds a tree in a bottom-up manner. This bottom-up embedding can be achieved if the root node is allocated to a free processor in one of the two subcubes such that the root node connects the two subtrees across the highest dimension. Fig. 1B shows the embedding of two subtrees into a tree.

In fact, in order to achieve our bottom-up embedding, we use the 4-cube as the smallest cube. In spite of this limitation, it is easy to embed a 2-tree and a 1-tree into a 3-cube.


Fig. 1B The process of embedding two subtrees into a tree (two (d - 1)-subcubes joined across transfer dimension d)

3.1 Main CBT embedding

In our embedding method, we can treat the mirror CBT as a transformation of the top (k - 1) levels of the main CBT, which will be explained in the next subsection. In this subsection, we embed a main CBT in a hypercube using a labelling scheme. This labelling scheme is used to decide which processor in the hypercube is allocated to a node in the main CBT.

Let x and y be two adjacent nodes in a CBT, and let $\varphi(x)$ and $\varphi(y)$ represent the mapped processors of x and y under an embedding $\varphi$, respectively. $\varphi$ is stated by labelling the link between x and y with a number representing the dimension in which $\varphi(x)$ and $\varphi(y)$ differ. Consider a k level tree as shown in Fig. 2. In this tree, $x_R$ is the root node with level 1. If node x is located in the mth level, $l_{k-m}$ and $r_{k-m}$ denote the labels of its left and right links, respectively. Note that all nodes in the same level have the same left and right link-labels.

Fig. 2 A k level tree labelled by $\varphi$

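We define the labelling scheme of a CBT as follows:

$$
l_j = \begin{cases}
3 & \text{if } j = 1 \\
1 & \text{if } j = 2 \\
4 & \text{if } j = 3 \\
j & \text{if } j \text{ is odd and } j \ge 4 \\
l_{j-3} & \text{if } j \text{ is even and } j \ge 4
\end{cases}
\qquad
r_j = \begin{cases}
4 & \text{if } j = 1 \\
2 & \text{if } j = 2 \\
j + 2 & \text{if } j \ge 3
\end{cases}
$$

Some of the link-labels $l_j$ and $r_j$ are listed below:

j      1   2   3   4   5   6   7   8   9   10  ...
l_j    3   1   4   3   5   4   7   5   9   7   ...
r_j    4   2   5   6   7   8   9   10  11  12  ...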
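The link-labels can be generated programmatically. The sketch below encodes the rule as it can be read off the listed values and the later statements that $l_j = l_{j-3}$ for even j and $r_j = j + 2$; the function names are illustrative.

```python
def left_label(j):
    """Left link-label l_j."""
    if j <= 3:
        return {1: 3, 2: 1, 3: 4}[j]
    # Odd indices map to themselves, even indices reuse the label three steps back.
    return j if j % 2 == 1 else left_label(j - 3)


def right_label(j):
    """Right link-label r_j."""
    return {1: 4, 2: 2}.get(j, j + 2)


print("j  ", list(range(1, 11)))
print("l_j", [left_label(j) for j in range(1, 11)])
print("r_j", [right_label(j) for j in range(1, 11)])
```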


Fig. 3A illustrates a 3-CBT which is labelled by $\varphi$, and Fig. 3B shows how $\varphi$ embeds the 3-CBT into a 4-cube. Let us denote a path $P = \varphi(x_i) \mid (l_1, l_2, \ldots, l_j)$ as a route in a hypercube from processor $\varphi(x_i)$ to a destination processor via dimensions $l_1, l_2, \ldots, l_j$. The address of the destination processor is obtained by complementing the $l_1$th, $l_2$th, ..., $l_j$th address bits of $\varphi(x_i)$, where address bits are numbered from the right starting at 1. For example, in Fig. 3B a path $P = \varphi(x_R) \mid (1, 3)$ represents a route from the embedded root node $\varphi(x_R) = 0000$ to the left-most embedded node 0101 via dimensions 1 and 3 in the 4-cube. In such a way, we can allocate each tree node to a processor in a hypercube.
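The path notation and the resulting allocation of tree nodes can be sketched as follows. The heap numbering of tree nodes (1 is the root, node n has children 2n and 2n + 1) and the choice of processor 0 for the root are assumptions made for this illustration; the label functions follow the listed link-labels.

```python
def left_label(j):
    if j <= 3:
        return {1: 3, 2: 1, 3: 4}[j]
    return j if j % 2 == 1 else left_label(j - 3)


def right_label(j):
    return {1: 4, 2: 2}.get(j, j + 2)


def flip(address, dims):
    """Follow a path: complement the given address bits (bit 1 is the rightmost)."""
    for dim in dims:
        address ^= 1 << (dim - 1)
    return address


def embed_main(k):
    """Map heap-numbered nodes of a k-CBT to processors, with the root at 0."""
    phi = {1: 0}
    for node in range(1, 2 ** (k - 1)):      # internal nodes only
        j = k - node.bit_length()            # a node in level m carries l_{k-m}, r_{k-m}
        phi[2 * node] = flip(phi[node], [left_label(j)])
        phi[2 * node + 1] = flip(phi[node], [right_label(j)])
    return phi


phi = embed_main(3)
print({node: format(p, "04b") for node, p in phi.items()})
```

With these assumptions the left-most leaf (node 4) lands on 0101, matching the path $\varphi(x_R) \mid (1, 3)$ from the example.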

Fig. 3A 3-CBT with labels

Fig. 3B Embeddings of a 3-CBT and a 2-CBT into a 4-cube

There exist some properties for this embedding method, which will be used in the later proofs. From the definitions of $l_j$ and $r_j$, we know that if j is even, $l_j = l_{j-3}$ and $r_j = j + 2$ for j > 3. Therefore, $l_j \ne l_m$ and $r_j \ne r_m$ for m > j ≥ 3. It is still true when j = 2. Thus, we obtain the following property.

Property 1: If j is even, $l_j \ne l_m$ and $r_j \ne r_m$ for m > j. If j is odd, $l_{j+1} \ne l_m$ and $r_{j+1} \ne r_m$ for m > (j + 1).

We also know from the definition of $l_j$ that $l_j$ must be odd except $l_3 = l_6 = 4$. In addition, if j is even, $r_j$ is even and is not equal to four. Hence, we derive that $r_j \ne l_m$ for m > j and j is even. Furthermore, when j is even, $r_m > r_j = j + 2 > j - 3 = l_j$ for m > j ≥ 8. It is obvious that $l_j \ne r_m$ for m > j is true when j = 2, 4 and 6. As a result, $l_j \ne r_m$ for m > j and j is even. From the above discussions, we have property 2.

Property 2: If j is even, $l_j \ne r_m$ and $r_j \ne l_m$ for m > j. If j is odd, $l_{j+1} \ne r_m$ and $r_{j+1} \ne l_m$ for m > (j + 1).

We know from the definitions of $r_j$ and $l_j$ that

(1) $r_j \ne r_{j+1}$, $l_j \ne l_{j+1}$ and $r_j \ne l_j$ for all j.

If j is odd, $r_j = j + 2 > l_{j+1} = l_{(j+1)-3} = (j + 1) - 3 = j - 2$ for j ≥ 7. It is still true that $r_j > l_{j+1}$ when j = 1, 3 and 5. Therefore, $r_j \ne l_{j+1}$ when j is odd. If j is even, $r_j = j + 2 > l_{j+1} = j + 1$ for j ≥ 4. It is obvious that $r_j \ne l_{j+1}$ if j = 2. So we understand that $r_j \ne l_{j+1}$ when j is even. In summary, we have

(2) $r_j \ne l_{j+1}$ for all j.

On the other hand, if j is odd, $r_{j+1} = (j + 1) + 2 = j + 3 > j = l_j$ for j ≥ 5. It is obvious that $r_{j+1} \ne l_j$ for j = 1 and 3. So $r_{j+1} \ne l_j$ when j is odd. If j is even, $r_{j+1} = (j + 1) + 2 = j + 3 > l_j = l_{j-3} = j - 3$ for j ≥ 8. In addition, it is obvious that $r_{j+1} \ne l_j$ for j = 2, 4 and 6. Therefore, $r_{j+1} \ne l_j$ when j is even. In summary, we derive

(3) $r_{j+1} \ne l_j$ for all j.

From (1), (2) and (3), we obtain the following property.

Property 3: $l_{j+1} \ne l_j \ne r_{j+1} \ne r_j$ for all j.
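Properties 1 to 3 can be spot-checked numerically over a finite range of indices. This is a sanity check rather than a proof; the bound `n` is arbitrary, and Property 3 is read here as saying that the four labels of two consecutive levels are pairwise distinct, which is what (1), (2) and (3) give together.

```python
def left_label(j):
    if j <= 3:
        return {1: 3, 2: 1, 3: 4}[j]
    return j if j % 2 == 1 else left_label(j - 3)


def right_label(j):
    return {1: 4, 2: 2}.get(j, j + 2)


def property_1(n):
    """For even j, l_j and r_j are not repeated by any later l_m or r_m respectively."""
    return all(left_label(j) != left_label(m) and right_label(j) != right_label(m)
               for j in range(2, n, 2) for m in range(j + 1, n + 1))


def property_2(n):
    """For even j, l_j differs from every later r_m and r_j from every later l_m."""
    return all(left_label(j) != right_label(m) and right_label(j) != left_label(m)
               for j in range(2, n, 2) for m in range(j + 1, n + 1))


def property_3(n):
    """The labels of two consecutive levels are pairwise distinct."""
    return all(len({left_label(j), right_label(j),
                    left_label(j + 1), right_label(j + 1)}) == 4
               for j in range(1, n))


print(property_1(60), property_2(60), property_3(60))
```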

In addition, if the root node is located in one processor of the 00 subcube, we can embed the nodes in the top (k - 2) levels of the k-CBT into the 00 subcube because dimensions 1 and 2 never appear in $l_j$ and $r_j$ for j > 2. Since either $l_2 = 1$ or $r_2 = 2$ appears exactly once in the path traversing from the root to a node in the bottom two levels, all nodes in the bottom two levels will be mapped to the 01 and 10 subcubes. Therefore, the 11 subcube must be free. From the discussions, we have properties 4 and 5.

Property 4 : If the root node is located in one processor of the 00 subcube, then the 11 subcube must be free.

Property 5: If the root node is located in one processor of the 00 subcube, then all nodes in the top (k - 2) levels must be located in the 00 subcube and all nodes in the bottom two levels must be located in the 01 and 10 subcubes.
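Properties 4 and 5 can be checked for small trees by classifying every mapped processor by its last two address bits. The sketch assumes heap-numbered tree nodes and a root placed at processor 0 (which lies in the 00 subcube); both are illustrative choices.

```python
def left_label(j):
    if j <= 3:
        return {1: 3, 2: 1, 3: 4}[j]
    return j if j % 2 == 1 else left_label(j - 3)


def right_label(j):
    return {1: 4, 2: 2}.get(j, j + 2)


def embed_main(k):
    """Heap-numbered k-CBT mapped to processors, root at 0."""
    phi = {1: 0}
    for node in range(1, 2 ** (k - 1)):
        j = k - node.bit_length()
        phi[2 * node] = phi[node] ^ (1 << (left_label(j) - 1))
        phi[2 * node + 1] = phi[node] ^ (1 << (right_label(j) - 1))
    return phi


def subcube(address):
    """Name of the subcube, i.e. the last two address bits."""
    return format(address & 0b11, "02b")


def subcubes_by_level(k):
    """Subcubes used by the top (k - 2) levels and by the bottom two levels."""
    phi = embed_main(k)
    top = {subcube(p) for n, p in phi.items() if n.bit_length() <= k - 2}
    bottom = {subcube(p) for n, p in phi.items() if n.bit_length() >= k - 1}
    return top, bottom


for k in range(3, 9):
    print(k, subcubes_by_level(k))
```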

From the labelling scheme, we realise that $r_j = j + 2 = d$ (d > 4) is the highest dimension of the d-cube. For example, when we embed a 5-CBT into a 6-cube, the root node will connect two 4-CBTs with link-labels $r_4 = 6$ and $l_4 = 3$. In other words, these two subtrees are mapped to different subcubes across the highest dimension d = 6. Therefore, we have the following property.

Property 6: For k ≥ 3, the root node of a (k + 1)-CBT will connect two k-CBTs which are mapped to different subcubes across the highest dimension $r_k$ (= k + 2).

Using the above properties, we can prove the following theorem.

Theorem 1: $\varphi$ can embed a (d - 1)-CBT into a d-cube without conflicts and still preserve adjacency relationships.

Proof: By induction on d.

Induction base: For d = 4, i.e. embedding a 3-CBT into a 4-cube, we can see in Fig. 3B that $\varphi$ can embed a 3-CBT into a 4-cube without conflicts and still preserve adjacency relationships.

Induction hypothesis: Assume that the theorem holds for d = k ≥ 4. It means that $\varphi$ can embed a (k - 1)-CBT into a k-cube without conflicts and still preserve adjacency relationships.

Induction step: Now consider the case of d = k + 1, i.e. embedding a k-CBT into a (k + 1)-cube. From the induction hypothesis, we know that $\varphi$ can embed two (k - 1)-CBTs into two different k-subcubes without conflicts and still preserve adjacency relationships in the two subtrees. So we only need to prove that the mapped processor of the root node, $\varphi(x_R)$, does not conflict with any of the processors allocated to the two (k - 1)-CBTs. From the definition of link-labels, we understand that nodes differ from their children in one dimension. Thus, $\varphi(x_R)$ is still adjacent to the mapped processors of its children. Suppose there is a non-root node x which is mapped to $\varphi(x_R)$. Let P be the path traversing from $\varphi(x_R)$ to $\varphi(x)$ via links $l_{k-1}$ or $r_{k-1}$, $l_{k-2}$ or $r_{k-2}$, ..., $l_{j+1}$ or $r_{j+1}$, $l_j$ or $r_j$. P is a loop in the hypercube since the source processor is the destination processor in P. Hence, each label $l_{k-1}$ or $r_{k-1}$, $l_{k-2}$ or $r_{k-2}$, ..., $l_{j+1}$ or $r_{j+1}$, and $l_j$ or $r_j$ in P appears an even number of times.

Let m be the index of a link-label except j in P. Since j < m ≤ (k - 1), we know from properties 1 and 2 that if j is even, $l_j$ or $r_j$ is a distinct label in P; if j is odd, $l_{j+1}$ or $r_{j+1}$ is also a distinct label in P. From property 3, we realise that $l_{j+1} \ne l_j \ne r_{j+1} \ne r_j$. This means that, for any j, each of the possible Ps contains at least one label which appears only once. This leads to a contradiction. The theorem follows.

Theorem 1 implies that $\varphi$ is an isomorphic embedding [6].

3.2 Mirror CBT embedding

In this subsection, we will present a transformation function c(x) for transforming all nodes in the top (k - 1) levels of the main CBT into mirror nodes. Furthermore, these mirror nodes will be mapped to free processors and form a (k - 1)-CBT which preserves adjacency relationships.

Let $c(x) = \varphi(x) \mid (1, 2, 3, 4)$ be the transformation which complements the last four address bits of $\varphi(x)$. Note that the order in which the four address bits are complemented does not influence the final c(x) result. Using this function c(x), we will prove the following lemma and theorem.

Lemma 1: The transformation function c can transform all nodes in the top (k - 1) levels of the embedded k-CBT to a mirror (k - 1)-CBT without conflicts and still preserve adjacency relationships.

Proof: Assume the root node of the k-CBT is mapped to one processor of the 00 subcube. From property 5, we understand that all nodes of the top (k - 2) levels are located in the 00 subcube. c(x) transforms these nodes to the 11 subcube because both the first and second address bits of $\varphi(x)$ are complemented in c(x). From property 4, we know that these processors in the 11 subcube must be free.

Consider a node x in the (k - 1)th level of the k-CBT. From property 5, we know that the 01 and 10 subcubes contain the mapped processors of the nodes in the bottom two levels of the k-CBT and some free processors. Thus, c(x) is located in the 10 or 01 subcube. We only need to prove that c(x) does not conflict with any of the mapped processors of all nodes in the bottom two levels of the k-CBT.

Suppose, as shown in Fig. 4, x is a child of x', and x'' is the brother of x. From this figure, we realise that $c(x) = \varphi(x) \mid (1, 2, 3, 4) \ne \varphi(z)$ where z is x'' or one of the four children of x and x''. In other words, c(x) does not conflict with any of the mapped processors of the five nodes.

Fig. 4 Part of the bottom levels of a k-CBT embedded by $\varphi$, showing the (k - 2) level, the (k - 1) level and the leaves

Next, we would like to prove that $c(x) \ne \varphi(z)$ where z is a node (except the above five nodes) in the (k - 1)th or the kth level of the main CBT. Suppose z is a leaf node. The traversal, denoted by P, from c(x) to $\varphi(z)$ is accomplished by the following two steps. First, traverse upward from c(x) via links $r_1$ ($l_1$), $l_1$ ($r_1$), $l_2$ or $r_2$, $l_3$ or $r_3$, ..., $l_{j-1}$ or $r_{j-1}$, $l_j$ ($r_j$) to an ancestor in the (k - j)th level. Then, traverse downward from this ancestor via links $r_j$ ($l_j$), $l_{j-1}$ or $r_{j-1}$, ..., $l_3$ or $r_3$, $l_2$ or $r_2$, $l_1$ or $r_1$ to $\varphi(z)$. Both $l_j$ and $r_j$ appear exactly once in this traversal and j ≥ 3. From property 6 and the labelling scheme, we know that $r_j$ is the only and the largest link-label in the (j + 1)-CBTs, thus $r_j \ge 5$ for j ≥ 3 is the only and the largest link-label in P. In other words, there exists at least one link-label $r_j \ge 5$ that does not appear an even number of times. It indicates that P is not a cycle and hence $c(x) \ne \varphi(z)$. We obtain that, for a node x in the (k - 1)th level of the main CBT, c(x) does not conflict with any of the processors allocated to all leaves, i.e. nodes in the kth level.

The traversal from c(x) to a node z' (≠ x'') in the (k - 1)th level can be easily derived by deleting only the last link-label of the downward traversal, i.e. $l_1$ or $r_1$. Since $l_1 = 3$, $r_1 = 4$, and $r_j \ge 5$, the deletion of the last link-label will not make the new path become a cycle. That is, c(x) does not conflict with $\varphi(z')$. We already know that $c(x) \ne \varphi(x'')$. As a consequence, c(x) does not conflict with any of the processors allocated to all nodes in the (k - 1)th level.

Moreover, we know from Theorem 1 that the adjacency relationships of the main CBT are preserved, thus the adjacency relationships of the mirror CBT are also preserved. As a result, transformation function c maps each node in the top (k - 2) levels of the mirror (k - 1)-CBT to a free processor in the 11 subcube, and each node in the (k - 1)th level to a free processor in the 01 or 10 subcube.

Fig. 5 Embeddings of a 4-CBT and a 3-CBT into a 5-cube using $\varphi$ and c

Theorem 2: A k-CBT and its mirror (k - 1)-CBT can be embedded into a (k + 1)-cube by $\varphi$ and c in such a way that each node in the two trees is mapped to a distinct processor and adjacency relationships are preserved.

Proof: It can be directly derived from Theorem 1 and Lemma 1.

Fig. 5 shows a 4-CBT and its mirror 3-CBT embedded into a 5-cube using $\varphi$ and c.
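The statement of Theorem 2 can be verified by brute force for small k. The sketch below builds the main tree from the link-labels and the mirror tree by complementing the last four address bits, then checks distinctness and adjacency. Heap numbering of tree nodes and a root at processor 0 are assumptions of this illustration, and the check only covers the sizes it is run on.

```python
def left_label(j):
    if j <= 3:
        return {1: 3, 2: 1, 3: 4}[j]
    return j if j % 2 == 1 else left_label(j - 3)


def right_label(j):
    return {1: 4, 2: 2}.get(j, j + 2)


def embed_main(k):
    """Heap-numbered k-CBT mapped to processors, root at 0."""
    phi = {1: 0}
    for node in range(1, 2 ** (k - 1)):
        j = k - node.bit_length()
        phi[2 * node] = phi[node] ^ (1 << (left_label(j) - 1))
        phi[2 * node + 1] = phi[node] ^ (1 << (right_label(j) - 1))
    return phi


def embed_mirror(k):
    """c(x): complement the last four bits of the top (k - 1) levels."""
    return {n: p ^ 0b1111 for n, p in embed_main(k).items()
            if n.bit_length() <= k - 1}


def hamming(a, b):
    return bin(a ^ b).count("1")


def check_two_trees(k):
    """True if both trees fit in a (k + 1)-cube, distinct and with dilation one."""
    main, mirror = embed_main(k), embed_mirror(k)
    used = list(main.values()) + list(mirror.values())
    distinct = len(set(used)) == len(used)
    in_cube = all(0 <= p < 2 ** (k + 1) for p in used)
    adjacent = all(hamming(tree[n], tree[n // 2]) == 1
                   for tree in (main, mirror) for n in tree if n > 1)
    return distinct and in_cube and adjacent


print([check_two_trees(k) for k in (3, 4, 5)])
print(sorted(format(p, "05b") for p in embed_mirror(4).values()))
```

For k = 4 the two trees occupy 15 + 7 = 22 of the 32 processors, which leaves 10 free processors for recovery.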

4 Recovery strategy

In this section, we will prove that we can recover a single fault in the two embedded CBTs (k-CBT and (k - 1)-CBT) in at most two steps with dilation at most two.

Before discussing further, let $T_k$ represent the main k-CBT embedded in a (k + 1)-cube under $\varphi$.

Lemma 2: Suppose there exists only $T_k$ in a (k + 1)-cube, and $l_j$ and $r_j$ are the left and right link-labels of a node x in the main k-CBT. Then $Y = \varphi(x) \mid (l_j, r_j)$ is not a processor in $T_k$.

Proof: Induction base: It is obvious that k = 2 is true.

Induction hypothesis: Suppose that it is true when k > 2.

Induction step: Consider k + 1. The proof is divided into two parts: one contains the root node only and the other contains the two subtrees of the root node.

Root node: First, we would like to prove that $Y = \varphi(x_R) \mid (l_k, r_k)$ does not conflict with any processor in $T_{k+1}$. From property 3, we know $l_k \ne r_k$, thus Y does not conflict with $\varphi(x_R)$. Assume there is a node z in the two k-CBTs whose mapped processor conflicts with Y. Let P be the path traversing from Y to $\varphi(z)$ via

$$
\begin{aligned}
\varphi(x_1) &= Y \mid r_k = \varphi(x_R) \mid (l_k, r_k, r_k) \\
\varphi(x_2) &= \varphi(x_1) \mid l_k = \varphi(x_R) \mid (l_k, r_k, r_k, l_k) = \varphi(x_R) \\
\varphi(x_3) &= \varphi(x_2) \mid (l_{k-1} \text{ or } r_{k-1}) \\
&\;\;\vdots \\
\varphi(x_{k-j+1}) &= \varphi(x_{k-j}) \mid (l_{j+1} \text{ or } r_{j+1}) \\
\varphi(z) &= Y = \varphi(x_{k-j+1}) \mid (l_j \text{ or } r_j)
\end{aligned}
$$

P is a loop in the (k + 1)-cube since the source processor is the destination processor in P. Hence, each label $r_k$, $l_k$, $l_{k-1}$ or $r_{k-1}$, ..., $l_{j+1}$ or $r_{j+1}$, and $l_j$ or $r_j$ in P appears an even number of times.

We know from properties 1 and 2 that if j is even, $l_j$ is a distinct label in P; if j is odd, $l_{j+1}$ or $r_{j+1}$ is a distinct label in P. From property 3, we know that $l_{j+1} \ne l_j \ne r_{j+1} \ne r_j$. This means that, for any possible j, each of all possible Ps contains at least one unique label. This contradicts our assumption. As a result, $\varphi(x_R) \mid (l_k, r_k)$ does not conflict with any processor in $T_{k+1}$.

The two subtrees: We know from the induction hypothesis that the mapped processor $\varphi(x)$ in a $T_k$ of $T_{k+1}$ does not conflict with $Y = \varphi(x) \mid (l_j, r_j)$. From property 6, we know that the two k-CBTs are mapped to two different (k + 1)-subcubes. Moreover, Y is located in the (k + 1)-subcube to which $\varphi(x)$ belongs, thus Y is different from any processor in the two $T_k$s. Thus, we only need to prove that Y does not conflict with $\varphi(x_R)$. The path traversing from $\varphi(x_R)$ to Y is (1) first traversing from $\varphi(x_R)$ to $\varphi(x)$ via links $l_k$ or $r_k$, ..., and $l_{j+1}$ or $r_{j+1}$; (2) then traversing from $\varphi(x)$ to Y via links $l_j$ and $r_j$. It can be proved in the same way as in part one that this path contains at least one unique label. As a consequence, for any node x in the two k-subtrees, $\varphi(x) \mid (l_j, r_j)$ does not conflict with any processor in $T_{k+1}$.

Let $T'_{k-1}$ represent the mirror (k - 1)-CBT embedded in a (k + 1)-cube under c. From Lemma 2, we can directly derive the following lemma since it can be proved in a similar way.

Lemma 3: Suppose there exists only $T'_{k-1}$ in a (k + 1)-cube, and $l_j$ and $r_j$ are the left and right link-labels of a node x in the mirror (k - 1)-CBT. Then $Y = c(x) \mid (l_j, r_j)$ is not a processor in $T'_{k-1}$.

In what follows, we will present our recovery strategies and show their advantages.

Theorem 3: A faulty processor in the mapped mirror (k - 1)-CBT can be recovered in one step with dilation at most two.

Proof: We divide the mirror CBT into two parts for discussion: one contains all leaf nodes, i.e. all nodes in the (k - 1)th level, and the other is the subtree containing all nodes in the top (k - 2) levels of the mirror CBT.

Subtree of mirror CBT: Let x be the faulty node in the subtree described above. If x is the root node $x_R$, then x is replaced with $Y = c(x_R) \mid (l_{k-1}, r_{k-1})$ in one step. Since the two children of $x_R$ are mapped to $c(x_R) \mid l_{k-1}$ and $c(x_R) \mid r_{k-1}$, dilation is still one. Consider the case of $x \ne x_R$. Suppose x connects its parent node $x_p$ with link-label $L_1$, and $x_p$ connects the other child with link-label $L_2$. We replace c(x) with the free processor $Y = c(x_p) \mid (L_1, L_2)$ in one step. Since Y is adjacent to c(x) (= $c(x_p) \mid L_1$), the distances of Y to the mapped processors of x's parent and children are all two, i.e. dilation is two. From Lemma 3, we know that Y does not conflict with any mapped processor of this mirror CBT. In the proof of Lemma 1, we know that the 11 subcube only contains free processors and the mapped processors of the nodes in the top (k - 2) levels of the mirror CBT. Furthermore, Y is also a processor in the 11 subcube because $L_1$ and $L_2$ are neither 1 nor 2. Therefore, processor Y must be free.

Leaf nodes: Recall that the smallest mirror tree has height 2. In the case of height = 2, the faulty leaf node can be replaced with the free processor 0111, which is adjacent to $c(x_R) = 1111$ as shown in Fig. 3B. Thus, recovery cost is one step and dilation is one. Consider the case of height > 2. Suppose that x is the faulty leaf node, node $x_1$ is its father (parent), and node $x_2$ is its grandfather. In this case, since leaf node x in the mirror CBT is transformed from a node in the (k - 1)th level of the main CBT, $x_2$ connects its children with link-labels $l_3 = 4$ and $r_3 = 5$. We utilise $x_2$'s replacement processor, $Y = c(x_2) \mid (l_3, r_3)$, to replace the faulty processor c(x). Since $c(x_1) = c(x_2) \mid (l_3 \text{ or } r_3)$, Y is still adjacent to $c(x_1)$, i.e. dilation is one. Moreover, because $x_2$ is a node in the subtree discussed in part one, Y is free.

According to the above discussions of the two parts, the theorem follows.
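The replacement rules used in the proof of Theorem 3 can be tried out on small cubes. The sketch below covers mirror trees of height greater than two (k ≥ 4) and omits the height-2 special case. Heap numbering of tree nodes, a main root at processor 0 and all function names are assumptions of this illustration.

```python
def left_label(j):
    if j <= 3:
        return {1: 3, 2: 1, 3: 4}[j]
    return j if j % 2 == 1 else left_label(j - 3)


def right_label(j):
    return {1: 4, 2: 2}.get(j, j + 2)


def flip(address, dims):
    for dim in dims:
        address ^= 1 << (dim - 1)
    return address


def embed_main(k):
    phi = {1: 0}
    for node in range(1, 2 ** (k - 1)):
        j = k - node.bit_length()
        phi[2 * node] = flip(phi[node], [left_label(j)])
        phi[2 * node + 1] = flip(phi[node], [right_label(j)])
    return phi


def embed_mirror(k):
    return {n: flip(p, (1, 2, 3, 4)) for n, p in embed_main(k).items()
            if n.bit_length() <= k - 1}


def mirror_replacement(node, k):
    """Spare processor Y for a faulty mirror node (k >= 4)."""
    mirror = embed_mirror(k)
    level = node.bit_length()
    if node == 1:
        anchor = 1              # root: use its own two link-labels
    elif level <= k - 2:
        anchor = node // 2      # inner node: use the parent's link-labels
    else:
        anchor = node // 4      # leaf: use the grandfather's link-labels (4 and 5)
    j = k - anchor.bit_length()
    return flip(mirror[anchor], (left_label(j), right_label(j)))


def hamming(a, b):
    return bin(a ^ b).count("1")


# Faulty leaf 00101 of the mirror 3-CBT in a 5-cube is heap node 5 here.
print(format(mirror_replacement(5, 4), "05b"))
```

With these assumptions the faulty leaf 00101 is moved to 10111, the same processor obtained from 01111 | (4, 5).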

Theorem 4: A faulty processor in the mapped main k-CBT can be recovered in at most two steps with dilation at most two.

Proof: Suppose the faulty node is x. If $x = x_R$, then the replacement ‘node’ r = x; else r = x's father. From Lemma 2, we understand that $\varphi(x)$ can be replaced with $Y = \varphi(r) \mid (l_j, r_j)$ in one step. If $x = x_R$, then dilation is one because the mapped processors of $x_R$'s children are $\varphi(r) \mid l_j$ and $\varphi(r) \mid r_j$; else dilation is two because Y is adjacent to the original $\varphi(x) = \varphi(r) \mid (l_j \text{ or } r_j)$. In addition, we also know from Lemma 2 that Y does not conflict with any mapped processors of nodes in the main CBT. But Y may be free or conflict with the mapped processor of a node in the mirror CBT. If Y is free, $\varphi(x)$ is recovered in one step with dilation ≤ 2. In the case that Y conflicts with a processor in the embedded mirror CBT, say c(y), we still replace $\varphi(x)$ with Y (= c(y)). Moreover, we treat c(y) as a pseudo faulty processor and replace c(y) with the processor derived from Theorem 3 in another step with dilation ≤ 2. As a consequence, the recovery cost is at most two steps and the dilation of the recovered embedding is at most two.

Fig. 6 Recovering a single fault of the mirror CBT: (a) in one step; (b) in two steps

Fig. 6 shows how to recover a faulty node in a 4-CBT and a 3-CBT which are embedded into a 5-cube. Fig. 6a illustrates the recovery of a faulty leaf node in the mirror CBT. The mapped processors of the faulty leaf node, its father, and its grandfather are 00101, 00111, and 01111 respectively. In our embedding, the grandfather node connects its children with the two link-labels 4 and 5. From the proof of Theorem 3, the replacement processor is Y = 01111 | (4, 5) = 10111, which recovers this fault in one step with dilation one.
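The numbers of Fig. 6a can be reproduced with a few lines of Python. This is a minimal sketch assuming that a link-label i refers to the ith address bit counted from the right, starting at 1, which is the reading under which 01111 | (4, 5) evaluates to 10111; the helper names flip and hamming are hypothetical.

```python
def flip(addr: int, *labels: int) -> int:
    """Flip the bits of addr named by 1-based link-labels (assumed: label 1 = rightmost bit)."""
    for label in labels:
        addr ^= 1 << (label - 1)
    return addr


def hamming(a: int, b: int) -> int:
    """Hypercube distance between two processor addresses."""
    return bin(a ^ b).count("1")


# Addresses quoted in the text for Fig. 6a.
faulty_leaf = 0b00101
father = 0b00111
grandfather = 0b01111

# Replacement processor Y = c(x2) | (4, 5).
y = flip(grandfather, 4, 5)
print(format(y, "05b"))    # 10111

# Y is one link away from the father, so the recovered edge has dilation one.
print(hamming(y, father))  # 1
```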

Fig. 6b illustrates the recovery of a faulty node in the main CBT. The mapped processors of the faulty node and its father are 00010 and 01010 respectively. In addition, its father connects its children with the two link-labels 3 and 4. From Lemma 2, we can find the replacement processor Y = 01010 | (3, 4) = 00110 in one step with dilation two. Nevertheless, Y conflicts with the mapped processor of a node in the mirror CBT. Hence, we treat the latter as a pseudo faulty processor. From Theorem 3, we can find a free processor Y’ = 01111 | (4, 5) = 10111 as the replacement processor of this pseudo faulty processor (00110) in one step. The pseudo faulty processor then replaces the actually faulty processor (00010). Therefore, the recovery cost is two steps and the dilation of the recovered embedding is two.
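The two-step recovery of Fig. 6b can be traced in the same way. The sketch below is illustrative only: it assumes that link-label i denotes the ith bit from the right, and it models the mirror CBT by the single occupied address mentioned in the text (00110) rather than by the full embedding. The names flip, hamming, mirror_occupied and steps are hypothetical.

```python
def flip(addr: int, *labels: int) -> int:
    """Flip the bits of addr named by 1-based link-labels (assumed: label 1 = rightmost bit)."""
    for label in labels:
        addr ^= 1 << (label - 1)
    return addr


def hamming(a: int, b: int) -> int:
    """Hypercube distance between two processor addresses."""
    return bin(a ^ b).count("1")


# Addresses quoted in the text for Fig. 6b.
faulty = 0b00010
father = 0b01010
# Assumption: only the one mirror-CBT processor named in the text is modelled.
mirror_occupied = {0b00110}

# Lemma 2: candidate replacement Y = phi(r) | (3, 4).
y = flip(father, 3, 4)

steps = []  # list of (vacated processor, processor receiving its data)
if y in mirror_occupied:
    # Y is in use by the mirror CBT: treat it as pseudo faulty and move its
    # data to the free processor given by Theorem 3, Y' = 01111 | (4, 5).
    y_prime = flip(0b01111, 4, 5)
    steps.append((y, y_prime))
# Y is now available and takes over the data of the actually faulty processor.
steps.append((faulty, y))

for src, dst in steps:
    print(format(src, "05b"), "->", format(dst, "05b"))
print("distance from Y to father:", hamming(y, father))
```

The order of the two moves matters in this sketch: the mirror node is relocated first so that Y is vacant before it receives the data of the faulty node.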

5 Conclusions

In this paper, we propose an approach for embedding two complete binary trees into a hypercube, and present recovery strategies with low cost and high performance for recovering a single fault.

The primary results of this paper are that the processor utilisation is near 75% (0.75), i.e. the expansion is about 1.33 (= 1/0.75), the dilation of the recovered embedding is at most two, and the number of steps required to recover a single fault is also at most two. In Table 1, we can observe that the proposed embedding method has some advantages over the others in expansion (processor utilisation) and in recovery steps. Future work can be directed towards strategies that recover multiple faults with few recovery steps and small dilation [11].

Table 1: Comparison between different embedding and recovery schemes

Scheme      Application graph          Expansion   Recovery steps   Dilation of recovered embedding
Wu          (d-1)-CBT                  2           -                -
Desphande   two (d-1)-CBTs             1           26               2
Lee         (d-1)-CBT                  2           1                2
Proposed    (d-1)-CBT and (d-2)-CBT    1.33        2                2
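The utilisation and expansion figures quoted in the conclusions can be checked with a short calculation. The sketch below assumes that a k-level CBT has 2^k - 1 nodes, so that the two trees together occupy (2^(d-1) - 1) + (2^(d-2) - 1) of the 2^d processors; the function names are hypothetical.

```python
def cbt_nodes(levels: int) -> int:
    """Node count of a complete binary tree (assumed: a k-level CBT has 2**k - 1 nodes)."""
    return 2 ** levels - 1


def utilisation(d: int) -> float:
    """Fraction of the d-cube's processors used by the (d-1)-CBT and the (d-2)-CBT."""
    used = cbt_nodes(d - 1) + cbt_nodes(d - 2)
    return used / 2 ** d


def expansion(d: int) -> float:
    """Ratio of hypercube processors to tree nodes, the reciprocal of the utilisation."""
    return 1 / utilisation(d)


for d in (5, 10, 20):
    print(d, round(utilisation(d), 4), round(expansion(d), 4))
```

Under this assumption the utilisation equals 3/4 - 2^(1-d), so it approaches 0.75 from below as d grows; for the 5-cube of Fig. 6 it is 22/32.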

6 References

1 RANKA, S., and SAHNI, S.: ‘Hypercube algorithms’ (Springer-Verlag, New York, 1990)

2 LEISS, E.L., and REDDY, H.N.: ‘Embedding complete trees into hypercubes’, Inf. Process. Lett., 1991, 38, pp. 197-199

3 JOHNSSON, S.L.: ‘Communication efficient basic linear algebra computations on hypercube architectures’, J. Parallel Distrib. Comput., 1987, 4, pp. 133-172

4 WU, A.: ‘Embedding of tree networks into hypercubes’, J. Parallel Distrib. Comput., 1985, 2, pp. 238-249

IEE Proc.-Comput. Digit. Tech., Vol. 141, No. 4, July 1994


5 PROVOST, F.J., and MELHEM, R.: ‘Distributed fault tolerant embedding of binary trees and rings in hypercube’. Proc. Int. Workshop on Defect and Fault Tolerant in VLSI Systems, 1988, pp. 399-346

6 LEE, T.C.: ‘Quick recovery of embedded structures in hypercube computers’. Proc. 5th Distributed Memory Computer Conference, 1990, pp. 1426-1435

7 LIU, J., and McMILLIN, B.M.: ‘A divide and conquer ring embedding scheme in hypercubes with efficient recovery ability’. Proc. Int. Conf. Parallel Processing, 1992, pp. III: 38-45

8 YANNEY, R.M., and HAYES, J.P.: ‘Distributed recovery in fault-tolerant multiprocessor networks’, IEEE Trans. Comput., 1986, C-35, (10), pp. 871-879

9 LEE, J.G., and KIM, H.J.: ‘Partial sum problem mapping into a hypercube’, Inf. Process. Lett., 1990, 36, pp. 221-224

10 DESPHANDE, S.R., and JENEVEIN, R.M.: ‘Scalability of a binary tree on a hypercube’, Proc. Int. Conf. on Parallel Processing, 1986, pp. 661-668

11 YANG, P.J., and RAGHAVENDRA, C.S.: ‘Embedding and reconfiguration of binary trees in faulty hypercubes’, Proc. Int. Parallel Processing Symp., 1992, pp. 2-9
