

Information Processing Letters 24 (1987) 189-192 13 February 1987 North-Holland

ON A RECURSION CONNECTED WITH TREE BALANCING ALGORITHMS

D.C. VAN LEIJENHORST and Th.P. VAN DER WEIDE

Faculty of Mathematics and Natural Sciences, Catholic University, 6525 ED Nijmegen, The Netherlands

Communicated by M.A. Harrison
Received 3 March 1986

We derive an explicit solution for a class of recursions, connected with the complexity of various algorithms (e.g., dynamic tree balancing and median sort).

Keywords: Binary search tree, balanced tree, time complexity, recurrence relation

1. Introduction

Binary trees have proven to be an excellent tool for set manipulation. Their average-case behaviour is within a constant factor of the optimal case. In order to bound worst-case behaviour, various balancing schemes have been introduced (see [1]). Let us call a binary tree balanced if the weights of its subtrees differ by at most one, and both subtrees are also balanced. The balanced trees considered here are trees of minimal depth, and therefore optimal for retrieval. However, insertion into such a tree may require extensive reorganisation to maintain balance.

When inserting a key into a balanced tree, it will be necessary in the worst case to transfer a key from the overweight subtree in order to restore balance. In this case, the new key will be entered into the subtree, while deleting its maximal (or minimal) key. For the complexity of this operation we introduce

$C_n$ = worst-case cost to insert a key into a balanced tree of $n$ keys while deleting the maximal (or minimal) key, and maintaining balance.

Thus, $C_n$ has to satisfy a recursion equation of the form

\[
C_0 = \alpha, \qquad C_{2n+1} = \beta_n + 2C_n \quad (n \ge 0), \qquad C_{2n} = \gamma_n + C_{n-1} + C_n \quad (n > 0), \tag{1}
\]

where $\alpha$, $\beta_n$, $\gamma_n$ are 'overhead costs' depending on the actual representation for trees. For instance, when the difference of the weights of subtrees is directly represented in each node, $\beta_n$ and $\gamma_n$ are constants.

Remarks. (a) When $\alpha = 0$ and $\beta_n = \gamma_n = 1$ we have $C_n = n$.

(b) The worst-case costs $I_n$ of insertion alone are related to $C_n$ by a recursion of the form $I_0 = \sigma_0$, $I_{2n+1} = \sigma_n + I_n$, and $I_{2n} = \varphi_n + I_{n-1} + C_n$.

(c) Recursion (1) also occurs in the analysis of other algorithms, e.g., in median sort where $\beta_n$ and $\gamma_n$ are functions of order $O(n)$ [6], or in off-line balancing of binary trees [5]. Recursion (1) may be seen as a refinement of [1, p. 295].
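Recursion (1) translates directly into a memoised function. The following is a minimal sketch, not the authors' code: the names `make_cost` and `unit_cost` are hypothetical, and the overhead costs $\beta_n$, $\gamma_n$ are assumed to be supplied as ordinary Python callables. The last lines illustrate Remark (a).

```python
from functools import lru_cache
from typing import Callable


def make_cost(alpha: float,
              beta: Callable[[int], float],
              gamma: Callable[[int], float]) -> Callable[[int], float]:
    """Return a memoised function n -> C_n defined by recursion (1)."""

    @lru_cache(maxsize=None)
    def cost(n: int) -> float:
        if n == 0:
            return alpha                          # C_0 = alpha
        half, odd = divmod(n, 2)
        if odd:                                   # n = 2*half + 1
            return beta(half) + 2 * cost(half)
        # n = 2*half with half > 0
        return gamma(half) + cost(half - 1) + cost(half)

    return cost


# Remark (a): alpha = 0 and beta_n = gamma_n = 1 should give C_n = n.
unit_cost = make_cost(0, lambda n: 1, lambda n: 1)
print([unit_cost(n) for n in range(8)])   # prints [0, 1, 2, 3, 4, 5, 6, 7]
```

Each call only descends to indices near $n/2$, so the recursion depth is logarithmic in $n$.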

0020-0190/87/$3.50 © 1987, Elsevier Science Publishers B.V. (North-Holland)


Volume 24, Number 3 INFORMATION PROCESSING LETTERS 13 February 1987

In [7], a recursion, occurring in connection with multi-dimensional divide-and-conquer, is solved and the solution is interpreted as the number of paths in a certain two-dimensional graph. A similar interpretation for recursion (1) is possible but seems less elegant.

2. The general solution

The general solution of recursion (1) can be obtained in terms of certain arithmetical functions as follows. Let $\alpha_{2n} = \gamma_n$, $\alpha_{2n-1} = \beta_{n-1}$ $(n \ge 1)$, and $A(x) = \sum_{n=1}^{\infty} \alpha_n x^n$ (the 'overhead cost inventory'). Let $F(x) = \sum_{n=0}^{\infty} C_n x^n$ be the generating function of the $C_n$. One easily checks the following lemma.

2.1. Lemma

\[
F(x) = (1 + x)^2 F(x^2) + A(x). \tag{2}
\]
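The check behind Lemma 2.1 amounts to comparing coefficients. A sketch, using only recursion (1) and the definition of the $\alpha_n$ (the bracket notation $[x^n]$ for "coefficient of $x^n$" is introduced here for convenience and is not the paper's):

```latex
\begin{align*}
(1+x)^2 F(x^2) &= \sum_{n \ge 0} C_n x^{2n} + 2 \sum_{n \ge 0} C_n x^{2n+1} + \sum_{n \ge 0} C_n x^{2n+2}, \\
[x^{2n+1}]:\quad C_{2n+1} - 2 C_n &= \beta_n = \alpha_{2n+1} \quad (n \ge 0), \\
[x^{2n}]:\quad C_{2n} - C_n - C_{n-1} &= \gamma_n = \alpha_{2n} \quad (n \ge 1), \\
[x^{0}]:\quad C_0 - C_0 &= 0 .
\end{align*}
```

The differences on the left are exactly the coefficients of $F(x) - (1+x)^2 F(x^2)$, and the right-hand sides are the coefficients of $A(x)$, which has no constant term.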

Similar equations occur when counting trees (see [2, p. 55]). It seems, however, that explicit solutions are found only rarely.

The general solution of (2) can be obtained by the expedient of 'repeated substitution':

\[
\begin{aligned}
F(x) &= (1 + x)^2 F(x^2) + A(x) \\
     &= (1 + x)^2 (1 + x^2)^2 F(x^4) + (1 + x)^2 A(x^2) + A(x) \\
     &= \cdots = F(0) \prod_{j=0}^{\infty} \bigl(1 + x^{2^j}\bigr)^2 + \sum_{m=0}^{\infty} A\bigl(x^{2^m}\bigr) \prod_{j=0}^{m-1} \bigl(1 + x^{2^j}\bigr)^2 .
\end{aligned}
\]

One has

\[
F(0) = \alpha, \qquad \prod_{j=0}^{m-1} \bigl(1 + x^{2^j}\bigr) = \frac{1 - x^{2^m}}{1 - x}, \qquad \prod_{j=0}^{\infty} \bigl(1 + x^{2^j}\bigr) = \frac{1}{1 - x}
\]

(these identities go back to Euler and express the fact that each integer has a unique binary representation). After some rearrangements one obtains the following lemma.

2.2. Lemma

\[
F(x) = \frac{\alpha}{(1 - x)^2} + \sum_{m=1}^{\infty} \alpha_m \psi_m(x), \tag{3}
\]

where

\[
\psi_m(x) = \sum_{k=0}^{\infty} x^{m 2^k} \, \frac{\bigl(1 - x^{2^k}\bigr)^2}{(1 - x)^2}. \tag{4}
\]
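The finite Euler product used on the way to Lemma 2.2 can be confirmed by multiplying out polynomials. This is an illustrative sketch only; `poly_mul` and `binary_product` are hypothetical helper names, and polynomials are represented as plain coefficient lists (index = exponent).

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def binary_product(m):
    """Coefficients of prod_{j=0}^{m-1} (1 + x^(2^j))."""
    prod = [1]
    for j in range(m):
        factor = [0] * (2 ** j + 1)
        factor[0] = factor[-1] = 1        # the polynomial 1 + x^(2^j)
        prod = poly_mul(prod, factor)
    return prod


# (1 - x^(2^m)) / (1 - x) = 1 + x + ... + x^(2^m - 1): all coefficients are 1.
print(binary_product(3))   # prints [1, 1, 1, 1, 1, 1, 1, 1]
```

Each exponent below $2^m$ appears exactly once, which is the unique-binary-representation statement in coefficient form.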

From this lemma we derive that each $C_n$ is a linear combination of the $(\alpha_j)_{j \le n}$, where the coefficients in the linear form are related to binary representations of integers, as shown in the following theorem.

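2.3. Theorem. The general solution of recurrence (1) is

\[
C_n = (n + 1)\alpha + \sum_{k=0}^{\lfloor \log n \rfloor} (n \bmod 2^k + 1)\, \alpha_{\chi_k} + \sum_{k=0}^{\lfloor \log n \rfloor} (2^k - n \bmod 2^k - 1)\, \alpha_{\chi_k - 1},
\]

where $\chi_k = \chi_k(n) = \lfloor n / 2^k \rfloor$.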


Proof. Formula (4) can be rewritten as

\[
\psi_m(x) = \sum_{k=0}^{\infty} x^{m 2^k} \bigl(1 + x + x^2 + \cdots + x^{2^k - 1}\bigr)^2 = \sum_{k=0}^{\infty} x^{m 2^k} \sum_{j=0}^{\infty} q^k_j x^j ,
\]

say. Now note that

\[
(1 + x + x^2 + \cdots + x^c)^2 = \sum_{j=0}^{c} (j + 1)\, x^j + \sum_{j=c+1}^{2c} (2c - j + 1)\, x^j. \tag{5}
\]

If we use this result and substitute into (3), we find

\[
F(x) = \frac{\alpha}{(1 - x)^2} + \sum_{n=1}^{\infty} \Bigl\{ \sum_{m 2^k + j = n} \alpha_m q^k_j \Bigr\} x^n = \sum_{n=0}^{\infty} (n + 1)\alpha x^n + \sum_{n=1}^{\infty} \sum_{m 2^k \le n} \alpha_m q^k_{n - m 2^k}\, x^n ,
\]

where (by (5))

\[
q^k_{n - m 2^k} =
\begin{cases}
n - m 2^k + 1 & \text{for } 0 \le n - m 2^k \le 2^k - 1, \\
2^{k+1} - 1 - n + m 2^k & \text{for } 2^k \le n - m 2^k \le 2(2^k - 1), \\
0 & \text{for } 2(2^k - 1) < n - m 2^k .
\end{cases} \tag{6}
\]

Rather surprisingly, most of the terms disappear. In (6), first let $0 \le n - m 2^k \le 2^k - 1$ where $k$ is fixed. Put $n = a 2^k + b$, $0 \le b < 2^k$. Then $a - 1 + (b + 1)/2^k \le m \le a + b/2^k$, so the only contribution to $F(x)$ occurs when $m = a = \lfloor n / 2^k \rfloor = \chi_k$. This contribution equals $\alpha_m q^k_{n - m 2^k} = \alpha_{\chi_k} (b + 1) = \alpha_{\chi_k} (n \bmod 2^k + 1)$, which yields the first sum in the theorem. The second sum is found similarly, taking the second range of indices in (6). □
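Theorem 2.3 can be cross-checked numerically against recursion (1). The sketch below makes two assumptions that are not spelled out in the text: the list `overhead` holds $\alpha_m$ at index $m$ with `overhead[0] = 0` (since $A(x)$ has no constant term), and $\log$ is read as the base-2 logarithm. The function names are hypothetical.

```python
from functools import lru_cache


def cost_by_recursion(alpha, overhead):
    """C_n from recursion (1); overhead[m] plays the role of alpha_m (m >= 1)."""

    @lru_cache(maxsize=None)
    def cost(n):
        if n == 0:
            return alpha
        half, odd = divmod(n, 2)
        if odd:                                   # alpha_{2h+1} = beta_h
            return overhead[n] + 2 * cost(half)
        return overhead[n] + cost(half - 1) + cost(half)   # alpha_{2h} = gamma_h

    return cost


def cost_by_theorem(alpha, overhead, n):
    """C_n from the closed form of Theorem 2.3 (assumes overhead[0] == 0)."""
    total = (n + 1) * alpha
    k = 0
    while (1 << k) <= n:                  # k = 0 .. floor(log2 n)
        chi = n >> k                      # chi_k = floor(n / 2^k)
        r = n & ((1 << k) - 1)            # n mod 2^k
        total += (r + 1) * overhead[chi]
        total += ((1 << k) - r - 1) * overhead[chi - 1]
        k += 1
    return total


# Arbitrary integer overhead costs, chosen only for the comparison.
overhead = [0] + [(7 * m * m + 3) % 11 for m in range(1, 200)]
cost = cost_by_recursion(5, overhead)
print(all(cost(n) == cost_by_theorem(5, overhead, n) for n in range(200)))
```

The closed form touches only $O(\log n)$ entries of `overhead`, whereas the recursion visits every index it reaches by repeated halving.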

3. A consequence of the general solution

As an interesting consequence of Theorem 2.3 we see how the terms $\alpha_k$ propagate into $C_n$, which may be useful when the costs $\alpha_k$ have a burst character. This occurs, for example, if $\alpha_k \ne 0$ only when $k \bmod p = 0$ for some integer $p$ (viz. periodically occurring costs for page faults). The case $p = 2$ has a more direct solution which we present in the next section.
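One way to make this propagation concrete is to read off, from Theorem 2.3, the coefficient with which each $\alpha_m$ enters $C_n$. The sketch below is illustrative; `propagation_weights` and `burst_cost` are hypothetical names, and the convention $\alpha_0 = 0$ is assumed, so index 0 is skipped.

```python
from collections import Counter


def propagation_weights(n):
    """Map m -> coefficient of alpha_m in C_n, read off from Theorem 2.3."""
    weights = Counter()
    k = 0
    while (1 << k) <= n:
        chi = n >> k                      # chi_k = floor(n / 2^k)
        r = n & ((1 << k) - 1)            # n mod 2^k
        weights[chi] += r + 1
        second = (1 << k) - r - 1
        if chi > 1 and second > 0:        # alpha_0 is taken to be 0
            weights[chi - 1] += second
        k += 1
    return dict(weights)


def burst_cost(n, p, unit=1):
    """Overhead part of C_n when alpha_m = unit for p | m and 0 otherwise."""
    return unit * sum(w for m, w in propagation_weights(n).items() if m % p == 0)


print(propagation_weights(6))   # alpha_6, alpha_3, alpha_2 once each, alpha_1 three times
print(burst_cost(6, 2))         # prints 2
```

For $n = 6$ this reproduces $C_6 = 7\alpha + \alpha_6 + \alpha_3 + \alpha_2 + 3\alpha_1$, which can be verified by unfolding recursion (1) by hand.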

4. Explicit solution of a special case

The simplest case of recurrence (1) occurs when $\beta_n = \beta$ and $\gamma_n = \gamma$ are constants. Then $C_n = (n + 1)\alpha + n\beta + (\gamma - \beta)\rho_n$, where

\[
\rho_0 = 0, \qquad \rho_{2n+1} = 2\rho_n \quad (n \ge 0), \qquad \rho_{2n} = 1 + \rho_{n-1} + \rho_n \quad (n \ge 1).
\]

We can write the result of the repeated substitution in the proof of Theorem 2.3 in a slightly different way. Let $F(x) = \sum_{n \ge 0} \rho_n x^n$, then

\[
F(x) = \frac{x}{(1 - x)^2} + \frac{1}{(1 - x)^2} \sum_{i=0}^{\infty} x^{2^i} - \frac{2}{(1 - x)^2} \sum_{i=0}^{\infty} \frac{x^{2^i}}{1 + x^{2^i}} .
\]


The first two terms are easy. The third can be rewritten as follows:

\[
\sum_{i=0}^{\infty} \frac{x^{2^i}}{1 + x^{2^i}} = \sum_{\substack{i \ge 0,\ j \ge 1 \\ n = 2^i j}} (-1)^{j+1} x^n = \sum_{n=0}^{\infty} (1 - \nu_n)\, x^n ,
\]

where $\nu_n$ is the number of 'trailing zeros' in the binary expansion of $n$ (taking $\nu_0 = 1$, so that the $n = 0$ term vanishes). If we twice apply the rule

\[
\frac{1}{1 - x} \sum_{n=0}^{\infty} u_n x^n = \sum_{n=0}^{\infty} \Bigl( \sum_{i=0}^{n} u_i \Bigr) x^n ,
\]

we get

\[
\frac{1}{(1 - x)^2} \sum_{i=0}^{\infty} \frac{x^{2^i}}{1 + x^{2^i}} = \sum_{n=0}^{\infty} S_n x^n
\]

with $S_n$ = the sum of the numbers of ones in the binary expansions of the integers $1, 2, \ldots, n$, using $\sum_{k=0}^{n} (1 - \nu_k)$ = the number of ones in the binary expansion of $n$. Let $b_n$ be the length of $n$ in binary. It follows that

\[
\rho_n = (n + 1)(b_n + 1) - 2^{b_n} - 2 S_n .
\]
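The closed form for $\rho_n$ can be compared with the defining recursion directly. A minimal sketch, assuming $b_n$ is Python's `int.bit_length()` (so $b_0 = 0$) and computing $S_n$ by brute force; the function names are hypothetical.

```python
def rho_by_recursion(limit):
    """Return [rho_0, ..., rho_limit] from the recursion of Section 4."""
    rho = [0] * (limit + 1)
    for n in range(1, limit + 1):
        half, odd = divmod(n, 2)
        if odd:
            rho[n] = 2 * rho[half]                  # rho_{2h+1} = 2 rho_h
        else:
            rho[n] = 1 + rho[half - 1] + rho[half]  # rho_{2h} = 1 + rho_{h-1} + rho_h
    return rho


def rho_closed_form(n):
    """rho_n = (n + 1)(b_n + 1) - 2^{b_n} - 2 S_n."""
    b = n.bit_length()                                              # b_n
    ones_total = sum(bin(k).count("1") for k in range(1, n + 1))    # S_n
    return (n + 1) * (b + 1) - 2 ** b - 2 * ones_total


print(rho_by_recursion(8))                        # prints [0, 0, 1, 0, 2, 2, 2, 0, 3]
print([rho_closed_form(n) for n in range(9)])     # prints [0, 0, 1, 0, 2, 2, 2, 0, 3]
```

The zeros at $n = 1, 3, 7$ reflect that a tree with $2^k - 1$ keys only ever splits into equal halves, so the $\gamma$ cost is never incurred.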

4.1. Corollary. $S_n = \tfrac{1}{2} n b_n + O(n)$, where $b_n$ is the length of the binary representation of $n$.
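A quick numerical look at the corollary: the sketch below evaluates $(S_n - \tfrac{1}{2} n b_n)/n$ for a few values of $n$. It is only an illustration of the $O(n)$ error term, not a proof; `ones_up_to` and `normalised_gap` are hypothetical names, and $S_n$ is computed by brute force.

```python
def ones_up_to(n):
    """S_n: total number of one bits in the binary expansions of 1..n."""
    return sum(bin(k).count("1") for k in range(1, n + 1))


def normalised_gap(n):
    """(S_n - n * b_n / 2) / n, which the corollary says stays bounded."""
    return (ones_up_to(n) - n * n.bit_length() / 2) / n


for n in (10, 100, 1000, 10000):
    print(n, round(normalised_gap(n), 4))
```

At a power of two, $n = 2^k$, one has $S_n = k 2^{k-1} + 1$ and $b_n = k + 1$, so the normalised gap is $1/n - 1/2$, close to $-1/2$.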

This corollary is well known (see [6,7]).

Acknowledgment

We wish to thank the (unknown) referee, who suggested a shorter proof of Theorem 2.3 and also drew our attention to some of the references.

References

[1] A.V. Aho, J.E. Hopcroft and J.D. Ullman, Data Structures and Algorithms (Addison-Wesley, Reading, MA, 1983).

[2] L. Comtet, Advanced Combinatorics (Reidel, Dordrecht, The Netherlands, 1974).

[3] H. Delange, Sur la fonction sommatoire de la fonction "somme des chiffres", Enseignement Math. 21 (1975) 31-47.

[4] P. Flajolet and L. Ramshaw, A note on Gray code and odd-even merge, SIAM J. Comput. 9 (1980) 142-158.

[5] K. Kleine, Offline balancing binary trees using O(log n) temporary storage, Tech. Memo, Forschungszentrum Informatik, Karlsruhe, Fed. Rep. Germany, October 1985.

[6] D.E. Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching (Addison-Wesley, Reading, MA, 1972).

[7] L. Monier, Combinatorial solutions of multidimensional divide-and-conquer recurrences, J. Algorithms 1 (1980) 60-75.
