
B.Sc. Engg. Thesis

Efficient Enumeration of Combinatorial Objects

By

Muhammad Abdullah Adnan

Student No.: 0005010

Submitted to

Department of Computer Science and Engineering

in partial fulfilment of the requirements for the degree of

Bachelor of Science in Computer Science and Engineering

Department of Computer Science and Engineering

Bangladesh University of Engineering and Technology (BUET)

Dhaka-1000

November 13, 2006


Certificate

This is to certify that the work presented in this thesis entitled “Study of Enumeration

Problems” is the outcome of the investigation carried out by me under the supervision of

Professor Dr. Md. Saidur Rahman in the Department of Computer Science and Engineering,

Bangladesh University of Engineering and Technology (BUET), Dhaka. It is also declared that

neither this thesis nor any part thereof has been submitted or is being currently submitted

anywhere else for the award of any degree or diploma.

(Supervisor)
Dr. Md. Saidur Rahman
Professor
Department of Computer Science and Engineering (BUET), Dhaka-1000.

(Author)
Muhammad Abdullah Adnan
Student No.: 0005010
Department of Computer Science and Engineering (BUET), Dhaka-1000.

Contents

Certificate i

Acknowledgements ix

Abstract x

1 Introduction 1

1.1 Enumeration Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.1.1 Order of output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.1.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.1.3 Goals of an Enumeration Algorithm . . . . . . . . . . . . . . . . . . . . . 7

1.2 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.2.1 Time Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.2.2 Avoiding Duplications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.2.3 I/O Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.2.4 Exhaustive Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.3 Algorithms for Enumeration Problems . . . . . . . . . . . . . . . . . . . . . . . 9

1.3.1 Combinatorial Gray Code Approach . . . . . . . . . . . . . . . . . . . . 9

1.3.2 Family Tree Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.4 Scope of this Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.4.1 Distribution of Objects to Bins . . . . . . . . . . . . . . . . . . . . . . . 12

1.4.2 Distribution of Distinguishable Objects to Bins . . . . . . . . . . . . . . 13

1.4.3 Evolutionary Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13


1.4.4 Labeled and Ordered Evolutionary Trees . . . . . . . . . . . . . . . . . . 14

1.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2 Preliminaries 17

2.1 Basic Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.1.1 Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.1.2 Paths and Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.1.3 Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.1.4 Binary Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.1.5 Family Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.1.6 Recursion Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.1.7 Evolutionary Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.1.8 Integer Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.1.9 Set Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.1.10 Multiset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.1.11 Simpleset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.2 Algorithms and Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.2.1 The notation O(n) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.2.2 Polynomial algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.2.3 Constant Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.2.4 Average Constant Time . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.2.5 Amortized Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.3 Graph Traversal Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.4 Catalan Families . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3 Distribution of Objects to Bins 26

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.3 Generating Distribution of Objects to Bins . . . . . . . . . . . . . . . . . . . . . 32

3.3.1 The Family Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32


3.3.2 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.4 Efficient Tree Traversal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.4.1 Relationship Between Left Sibling and Right Sibling . . . . . . . . . . . . 37

3.4.2 Leaf-Ancestor Relationship . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.4.3 The Efficient Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.5 Distributions in Anti-lexicographic Order . . . . . . . . . . . . . . . . . . . . . . 43

3.6 Generating Distributions with Priorities to Bins . . . . . . . . . . . . . . . . . . 44

3.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

4 Distribution of Distinguishable Objects to Bins 45

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

4.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.3 Generating Distribution of Distinguishable Objects . . . . . . . . . . . . . . . . 51

4.3.1 The Family Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4.3.2 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

4.4 Efficient Tree Traversal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

4.4.1 Relationship Between Left Sibling and Right Sibling . . . . . . . . . . . . 58

4.4.2 Leaf-Ancestor Relationship . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4.4.3 Representation of a Distribution in D(n,m, k) . . . . . . . . . . . . . . . 61

4.4.4 The Efficient Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

4.5 Generating Distributions with Priorities to Bins . . . . . . . . . . . . . . . . . . 64

4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

5 Evolutionary Trees 66

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

5.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.3 Generating Labeled Evolutionary Trees . . . . . . . . . . . . . . . . . . . . . . . 69

5.4 The Recursion Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

5.4.1 Parent-Child Relationship . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5.4.2 Child-Parent Relationship . . . . . . . . . . . . . . . . . . . . . . . . . . 74


5.4.3 The Recursion Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5.5 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

6 Labeled and Ordered Evolutionary Trees 79

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

6.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

6.3 Representation of Evolutionary Trees . . . . . . . . . . . . . . . . . . . . . . . . 84

6.4 The Family Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

6.4.1 Parent-Child Relationship . . . . . . . . . . . . . . . . . . . . . . . . . . 86

6.4.2 Child-Parent Relationship . . . . . . . . . . . . . . . . . . . . . . . . . . 87

6.4.3 The Family Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

6.5 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

6.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

7 Conclusion 92

References 94

List of Publications 97

Index 98

List of Figures

1.1 Lexicographic order vs Gray code order for binary strings. . . . . . . . . . . . . 5

1.2 Generating permutations using gray code approach: Johnson-Trotter scheme. . 10

1.3 Illustration of the family tree for all set partitions. . . . . . . . . . . . . . . . . 12

2.1 Illustration of a graph. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.2 Illustration of a tree. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.3 Illustration of a binary tree. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.4 Illustration of a family tree of 15 nodes. . . . . . . . . . . . . . . . . . . . . . . . 20

3.1 The Family Tree T4,3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.2 Representation of a distribution of 4 objects to 3 bins. . . . . . . . . . . . . . . 31

3.3 Efficient Traversal of the family tree T4,4. . . . . . . . . . . . . . . . . . . . . . . 38

3.4 Efficient Traversal of T4,3 keeping extra information. . . . . . . . . . . . . . . . . 40

3.5 Use of stack for tree traversal (T4,4). . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.6 A Gray code for D(4, 3). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3.7 Illustration of generation of D(4, 3) in anti-lexicographic order. . . . . . . . . . . 43

4.1 The Family Tree T3,3,2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.2 Representation of a distribution of 3 objects to 2 bins where the objects fall into

two classes and 2 objects from class 1 and 1 object from class 2. . . . . . . . . . 50

4.3 The sequence ((0, 0), (2, 1)) has five children. . . . . . . . . . . . . . . . . . . . . 54

4.4 Efficient Traversal of the family tree T3,3,2. . . . . . . . . . . . . . . . . . . . . . 58

4.5 Efficient Traversal of T4,3 keeping extra information. . . . . . . . . . . . . . . . . 61


4.6 Illustration of data structure that we use to represent a distribution for distin-

guishable objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4.7 A Gray code for D(3, 3, 2). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

5.1 The evolutionary tree having four species. . . . . . . . . . . . . . . . . . . . . . 67

5.2 All possible evolutionary trees having three species. . . . . . . . . . . . . . . . . 67

5.3 Representation of evolutionary tree in terms of complete binary tree. . . . . . . 69

5.4 Two evolutionary trees of (a) and (b) are mirror image of one another. . . . . . 70

5.5 Two evolutionary trees of (a) and (b) are sibling equivalent to one another. . . . 70

5.6 The Recursion Tree R4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

5.7 Illustration of a sequence of subtree A ∈ S(6). . . . . . . . . . . . . . . . . . . . 72

5.8 Illustration of Type I and Type II child of a sequence of subtree A ∈ S(6). . . . 74

6.1 The evolutionary tree having four species. . . . . . . . . . . . . . . . . . . . . . 80

6.2 The Family Tree F4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

6.3 Representation of evolutionary tree in terms of complete binary tree. . . . . . . 83

6.4 Representation of an evolutionary tree having five species. . . . . . . . . . . . . 84

6.5 Illustration of Family Tree F5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

6.6 Representation of Family Tree F5. . . . . . . . . . . . . . . . . . . . . . . . . . . 88

List of Tables

1.1 Results on distribution of identical objects to bins. . . . . . . . . . . . . . . . . 15

1.2 Results on distribution of distinguishable objects to bins. . . . . . . . . . . . . . 15

1.3 Results on generating evolutionary trees. . . . . . . . . . . . . . . . . . . . . . . 15


Acknowledgments

First of all, I would like to thank my supervisor Professor Dr. Md. Saidur Rahman for

introducing me to the field of enumeration of combinatorial objects, and for teaching me how

to carry on a research work. I have learned from him how to write, speak and present well. I

thank him for his patience in reviewing my so many inferior drafts, for correcting my proofs and

language, suggesting new ways of thinking and encouraging me to continue my research work.

I again express my heart-felt and most sincere gratitude to him for his constant supervision,

valuable advice and continual encouragement, without which this thesis would have not been

possible.

I would like to express my utmost gratitude to Professor Shin-ichi Nakano for encouraging

me to work in this area and for giving us valuable comments on the manuscripts of my papers.

I would like to thank Shin-ichiro Kawano for helpful discussions.

I would also like to thank Professor Dr. Muhammad Masroor Ali, Head, Department of

Computer Science and Engineering, BUET, for the provision of laboratory facilities.

I would like to acknowledge with sincere thanks the all-out cooperation and services rendered

by the members of our research group. They gave me valuable suggestions and

listened to all of my presentations.


Abstract

One of the problems addressed in the area of combinatorial algorithms is to enumerate all

items of a particular combinatorial class efficiently in such a way that each item is generated

exactly once. Enumeration algorithms have many applications in optimization, clustering, data mining, and machine learning, so efficient algorithms are needed, in the sense of both theory and practice. In this thesis, we study different enumeration problems and

approaches to solve them. In particular, we focus on the applications of enumeration algorithms

to Bioinformatics and Combinatorics. We concentrate on the efficiency improvement of the

algorithms and invent new approaches to solve enumeration problems such that each solution

is generated in constant time in the ordinary sense.

A well known counting problem in combinatorics is counting the number of ways objects

can be distributed among bins. In this thesis, we consider it as an enumeration problem

and give efficient algorithms to generate all distributions of n objects to m bins. We give elegant algorithms for both identical and distinguishable objects. Generating all distributions

has practical applications in computer networks, distributed architecture, CPU scheduling,

memory management, etc. Our algorithms generate each distribution in constant time without

repetition. We also introduce a new elegant and efficient tree traversal algorithm that generates

each solution in O(1) time in the ordinary sense.

In this thesis, we also deal with the problem of generating all evolutionary trees. Generating

all evolutionary trees among different species has many applications in Bioinformatics, Genetic

Engineering, Archaeology, Biochemistry and Molecular Biology. In these applications, to find

a better prediction, sometimes it is necessary to generate all possible evolutionary trees among

different species. We give an algorithm to generate all such probable evolutionary trees having

n ordered species without repetition. We also find an efficient representation of such

evolutionary trees such that each tree is generated in constant time on average.


Chapter 1

Introduction

In computer science, we frequently need to count things and generate solutions. The science of

enumerating is captured by a branch of mathematics called combinatorics. One of the problems

addressed in the area of combinatorial algorithms is to generate all items of a particular combi-

natorial class efficiently in such a way that each item is generated exactly once. To solve many

practical problems it is required to generate samples of random objects from a combinatorial

class. Sometimes a list of objects of a particular class is useful to search for a counter-example

to some conjecture, to find the best solution among all solutions, or to experimentally measure

the average performance of an algorithm over all possible inputs. Early work in combinatorics focused on counting, because generating all objects requires a huge amount of computation. With the aid of fast computers it has now become feasible to list the objects in combinatorial classes. However, in order to generate the entire list of objects from a class of even moderate size, extremely efficient algorithms are required even with the fastest computers. For the reason mentioned above, many researchers have recently concentrated their attention on developing efficient algorithms

to generate all objects of a particular class without repetitions [JWW80, S97]. Examples of

such exhaustive generation of combinatorial objects include generating all integer partitions

and set partitions, enumerating all binary trees, generating permutations and combinations,

enumerating spanning trees, etc. [J63, KN05, NU03, NU04, NU05, ZS98].

In this thesis, we study different enumeration problems and approaches to solve them.


In particular, we focus on the applications of generation algorithms to Bioinformatics and

Combinatorics. We also give efficient algorithms to solve two of these problems. We concentrate

on the efficiency improvement of the algorithms and invent new approaches to solve enumeration

problems such that each solution is generated in constant time (in the ordinary sense). The main feature of our algorithms is that they generate each solution in constant time, which is a very important requirement for generation problems.

A well known counting problem in combinatorics is counting the number of ways objects

can be distributed among bins [AU95, R00, AR06]. The paradigm problem is counting the

number of ways of distributing fruits to children. For example, Kathy, Peter and Susan are

three children. We have four fruits to distribute among them without cutting the fruits into

parts. In how many ways can the children receive the fruits? The fruits, or the objects that we want to distribute, may be identical or of different kinds. Based on this criterion, the problem can be subdivided into two cases: the identical case and the non-identical case. In this thesis, we consider

it as an enumeration problem and give algorithms to generate all distributions without repeti-

tion. Generating all distributions has practical applications in channel allocation in computer

networks, client-server broker distributed architecture, CPU scheduling, memory management,

etc. [T02, T04]. Our algorithms generate each distribution in constant time with linear space

complexity. We also present an efficient tree traversal algorithm that generates each solution in

O(1) time. To the best of our knowledge, our algorithm is the first algorithm which generates

each solution in O(1) time in the ordinary sense. By modifying our algorithm, we can generate

the distributions in anti-lexicographic order. Finally, we extend our algorithms for the case

when the bins have priorities associated with them. As a byproduct of our algorithm, we get

a new algorithm to enumerate all set partitions when the number of partitions is fixed and the

partitions are numbered. The main feature of all our algorithms is that they generate each solution in constant time, which is a very important requirement for enumeration problems.

In this thesis, we also deal with the problem of generating all evolutionary trees. Gener-

ating all evolutionary trees among different species has many applications in Bioinformatics

[JP04], Genetic Engineering [KR03], Archaeology, Biochemistry and Molecular Biology. In


these applications, to find a better prediction, sometimes it is necessary to generate all possible

evolutionary trees among different species. To a mathematician, such a tree is simply a cycle-

free connected graph, but to a biologist it represents a series of hypotheses about evolutionary

events. In this thesis, we are concerned with generating all such probable evolutionary trees, which can guide biologists' research across biological subdisciplines. We give an algorithm to

generate all evolutionary trees having n species without repetition. We also find an efficient

representation of such evolutionary trees such that each tree is generated in constant time on

average. For the purposes of biologists, we also give a new algorithm to generate evolutionary

trees having ordered species.

In this chapter we provide the necessary background and motivation for this study on

enumeration problems. Section 1.1 serves as an introduction to enumeration problems.

Section 1.2 addresses the algorithmic challenges that any efficient enumeration algorithm must

resolve. Section 1.3 deals with the well known techniques for solving enumeration problems.

In Section 1.4 we describe the scope of this thesis. Finally, Section 1.5 gives a summary of the

results we have found and compares our algorithms with other related algorithms.

1.1 Enumeration Problems

In this section we discuss enumeration problems and their applications in different areas.

In mathematics and theoretical computer science, an enumeration of a set is a procedure

for listing all members of the set in some definite sequence. An enumeration algorithm is an

algorithm that exhaustively lists all members of a set, so that each instance is listed exactly

once. Often the set under consideration is the set of all solutions of a practical problem and

hence has a huge number of members. Since an enumeration algorithm must list a huge number of solutions without repetition, to devise an enumeration algorithm we must have the following

considerations:

• Representation (how do we represent the object?)

• Efficiency (how fast is the algorithm?)


• Order of output (Lexicographic, Gray code, etc.)

First, we must be able to represent the object that we want to generate. The representation

must be simple and must require as little memory as possible. Then we must concentrate on the efficiency, i.e., the time complexity of our algorithm must be minimized. Finally, we must determine an order in which our listing of objects will be generated. The former two are more problem-specific and

hence we discuss them with the description of the problems in the following chapters. In the

following subsections we describe the order of output for enumeration problems and also the

applications of enumeration problems.

1.1.1 Order of output

There are two common ways to order the output of enumeration problems: lexicographic order and Gray code order.

Lexicographic Order

Lexicographic order of combinatorial objects is defined as follows. If P = (p1, p2, . . . , ps′) and

Q = (q1, q2, . . . , qs′′) are representations of objects, then P precedes Q lexicographically if and

only if, for some j ≥ 1, pi = qi when i < j, and pj precedes qj. For example, integer partitions

of 5 in lexicographic order are: 11111, 2111, 221, 311, 32, 41, 5 (note that the + sign is omitted).

Lexicographic order is desirable as it is the natural (dictionary) order and can be easily

characterized and traced manually. The anti-lexicographic order is the reverse of the order of

lexicographic one. For example, integer partitions of 5 in anti-lexicographic order are: 5, 41,

32, 311, 221, 2111, 11111.

Gray Code Order

A listing of combinatorial objects is said to be in Gray code order if each successive object in the listing differs from the preceding one by only a constant amount of change, for example the swapping of two elements or the flipping of a bit. In Figure 1.1, the second list is known as the Binary Reflected Gray Code. Each

binary string differs by a single bit flip from the previous string.

Lexicographic    Gray code
000              000
001              001
010              011
011              010
100              110
101              111
110              101
111              100

Figure 1.1: Lexicographic order vs Gray code order for binary strings.
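As an illustration of this order (our sketch, not part of the thesis), the binary reflected Gray code can be produced by a short recursive procedure: the code for n bits is the code for n − 1 bits prefixed with 0, followed by its reflection prefixed with 1.

    def gray_code(n):
        # Binary reflected Gray code: successive strings differ in exactly one bit.
        if n == 0:
            return [""]
        previous = gray_code(n - 1)
        return ["0" + s for s in previous] + ["1" + s for s in reversed(previous)]

    print(gray_code(3))
    # ['000', '001', '011', '010', '110', '111', '101', '100']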

1.1.2 Applications

Enumeration problems have many applications in optimization, clustering, data mining, and machine learning, so efficient algorithms are needed, in the sense of both theory

and practice. Some instances where a generation algorithm may be very useful are discussed

below.

Maximal Clique Enumeration

A maximal clique is a complete subgraph that is not contained in any other complete subgraph.

Among all maximal cliques, the largest one is the maximum clique. The clique problem is one

of the basic NP-complete problems. Here, we want to consider the problem of enumerating

all maximal cliques in a graph, the clique enumeration problem. In contrast to the maximum

clique problem, which is NP-complete, the clique enumeration problem is NP-hard.

Graph algorithms have often been used to help in understanding biology. Clique enumeration is a core component in many biological applications, such as gene expression network analysis, cis-regulatory motif finding, and the study of quantitative trait loci for high-throughput molecular

phenotypes.


Generating Subsets

A subset describes a selection of objects, where the order among them does not matter. Many

algorithmic problems seek the best subset of a group of things: vertex cover seeks the smallest

subset of vertices to touch each edge in a graph; knapsack seeks the most profitable subset of

items of bounded total size; and set packing seeks the smallest subset of subsets that together

cover each item exactly once. There are 2^n distinct subsets of an n-element set, including the

empty set as well as the set itself. This grows exponentially, but at a considerably smaller rate

than the n! permutations of n items. For example, the set {1, 2, 3} has 8 subsets:

{}, {1}, {2}, {3}, {1, 2}, {2, 3}, {1, 3}, {1, 2, 3}
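For illustration only (this sketch is ours, not the thesis's), the 2^n subsets can be listed by letting an integer mask run from 0 to 2^n − 1 and reading bit i as "include the i-th element".

    def all_subsets(items):
        # Bit i of the mask decides whether items[i] belongs to the subset.
        n = len(items)
        for mask in range(2 ** n):
            yield [items[i] for i in range(n) if mask & (1 << i)]

    print(list(all_subsets([1, 2, 3])))
    # [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]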

Generating Partitions

There are two different types of combinatorial objects denoted by the term “partition”, namely

integer partitions and set partitions. These terms are described below:

• Integer partitions of n are multisets of positive integers that add up to exactly n. For example, the seven distinct integer partitions of 5 are {5}, {4, 1}, {3, 2}, {3, 1, 1}, {2, 2, 1}, {2, 1, 1, 1}, and {1, 1, 1, 1, 1}. An interesting application that requires the generation of

integer partitions is in a simulation of nuclear fission. When an atom is smashed, the

nucleus of protons and neutrons is broken into a set of smaller clusters. The sum of the

particles in the set of clusters must equal the original size of the nucleus. As such, the

integer partitions of this original size represent all the possible ways to smash the atom.

• Set partitions divide the elements 1, . . . , n into nonempty subsets. For example, there

are fifteen distinct set partitions of n = 4: {1234}, {123, 4}, {124, 3}, {12, 34}, {12, 3, 4}, {134, 2}, {13, 24}, {13, 2, 4}, {14, 23}, {1, 234}, {1, 23, 4}, {14, 2, 3}, {1, 24, 3}, {1, 2, 34}, and {1, 2, 3, 4}. The problem of set partitions has many applications in vertex coloring, connected components, etc.; a small generation sketch follows this list.
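The following sketch (ours, purely illustrative; it is not the algorithm developed later in the thesis) generates set partitions through restricted growth strings: element 1 always starts block 0, and each later element either joins an existing block or opens a new one.

    def set_partitions(n):
        # Yield each set partition of {1, ..., n} as a list of blocks.
        def extend(code, max_block):
            if len(code) == n:
                blocks = [[] for _ in range(max_block + 1)]
                for element, block in enumerate(code, start=1):
                    blocks[block].append(element)
                yield blocks
                return
            # The next element joins one of the existing blocks or a new one.
            for block in range(max_block + 2):
                yield from extend(code + [block], max(max_block, block))
        yield from extend([0], 0)

    print(sum(1 for _ in set_partitions(4)))   # 15, matching the list above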


1.1.3 Goals of an Enumeration Algorithm

Any algorithm for generating all objects of a particular combinatorial class has to achieve a

number of goals or aims. We list the most important ones below.

• Reduce the time complexity,

• Minimize the usage of memory,

• Reduce the amount of output,

• Avoid duplications, and

• Avoid omissions.

In this thesis, we have considered each of these goals while we develop our algorithms. To

achieve the goals, we have developed efficient representations of objects, efficient data structures

for storage, and clever algorithmic techniques. We will address the issues mentioned above

while we describe our algorithms in detail in later chapters.

1.2 Challenges

In this section we discuss the main challenges that any algorithm for enumerating combinatorial

objects must face [S97]. We have considered all these challenges while developing our algorithms

in this thesis and have given algorithmic techniques that successfully resolve the difficulties

mentioned in the following subsections.

1.2.1 Time Complexity

The number of different objects is very large in many cases. For example, the number of

different permutations of n numbers is exponential. Therefore, to generate all the objects of

a particular combinatorial class, we may have to generate an exponential number of objects. That means the overall time complexity of the algorithm is at least exponential, so the generation of each individual object must be very efficient. There are a number of techniques that

accomplish the task. We mention some of those techniques in Section 1.3.

1.2.2 Avoiding Duplications

In any enumeration algorithm, we must have a way to avoid generation of redundant objects.

One way to avoid duplications of objects is to store each object generated so far and check each

newly generated object against all the previous ones to find whether the newly generated one is a

duplication. This way of checking duplications has two problems. First, the time complexity

goes up. Second, the space requirement becomes very high. We mention some alternatives for

avoiding duplications in Section 1.3.

1.2.3 I/O Operations

Algorithms that solve enumeration problems are generally I/O intensive and the output of the

algorithm dominates the running time. This is because the number of objects generated is

exponential in many cases and each of these objects must be output to an output device. Since

I/O is slower than computation, the more I/O operations an algorithm performs the slower it

becomes. For this reason, reducing the amount of output is essential.

1.2.4 Exhaustive Generation

While we exhaustively generate combinatorial objects, we must have an efficient way to deter-

mine the end of generation. One solution to this problem is that we count the number of objects

generated so far and check whether we have explored all the possibilities. But this works only

in the case where we know in advance the total number of distinct objects to be generated and

have an efficient way for detecting repetitions. For many problems, it may be difficult to know

or calculate the exact number objects that will be generated. For example, it is not trivial to

count the number of different triangulations of a given arbitrary plane graph.


1.3 Algorithms for Enumeration Problems

There are a number of standard methods that are in use for solving enumeration problems. As

mentioned in previous sections, there are some difficulties that any enumeration algorithm must

resolve somehow. These challenges include reducing the amount of output, efficient checking

for duplications and omissions, space complexity, etc. Different methods have different ways of

dealing with these challenges.

Algorithms based on the classical method first generate combinatorial objects allowing duplications, but output an object only if it has not been output yet. These methods require huge space to store

the list of objects generated so far. Furthermore, checking whether the newly generated object

will be output takes a lot of time.

Algorithms based on orderly methods [M98] need not store the list of objects generated so far; they output an object only if it is a canonical representative of an isomorphism class.

Algorithms based on the reverse search method also need not store the list. The idea is to implicitly

define a connected graph H such that the vertices of H correspond to the graphs with the given

property, and the edges of H correspond to some relation between the graphs. By traversing

an implicitly defined spanning tree of H, one can find all the vertices of H, which correspond

to all the graphs with the given property.

In the following two subsections, we describe in more detail two other methods for solving

enumeration problems and address the techniques employed by these methods for resolving the

challenges mentioned above.

1.3.1 Combinatorial Gray Code Approach

To generate all the objects of a particular class, one approach is to try to generate the objects as

a list in which successive elements differ only in a small way. The term Combinatorial Gray Code

first appeared in [JWW80] and is now used to refer to any method for generating combinatorial

objects so that successive objects differ in some prespecified, usually small, way. Savage [S97]

gives a description of the state of the art of the area. The advantages anticipated by such a Gray code approach are manifold. First, generation of successive objects is faster, since each

object is generated from the preceding one by making a constant number of changes. Secondly,

the number of objects in a particular class is generally exponential. Generating algorithms

thus produce huge outputs in general, and the output dominates the running time. If we can

reduce the amount of output, the efficiency of the algorithm improves considerably. So in the Gray code approach, each object is output as a difference from the preceding one, thus removing the necessity to output the entire object. Thirdly, Gray codes typically involve elegant recursive constructions that provide new insights into the structure of combinatorial families.

There are many problems that can be solved using the combinatorial Gray code approach. We

list some of them below.

1. Listing all permutations of 1, . . . , n,
2. Listing all k-element subsets of an n-element set,
3. Listing all binary trees,
4. Listing all spanning trees of a graph,
5. Listing all partitions of an integer n, and
6. Listing linear extensions of certain posets.

n = 2:  12, 21
n = 3:  123, 132, 312, 321, 231, 213
n = 4:  1234, 1243, 1423, 4123, 4132, 1432, 1342, 1324, 3124, 3142, 3412, 4312,
        4321, 3421, 3241, 3214, 2314, 2341, 2431, 4231, 4213, 2413, 2143, 2134

Figure 1.2: Generating permutations using gray code approach: Johnson-Trotter scheme.


One particular algorithm for generating all permutations of n elements, based on the combinatorial Gray code approach, is the Johnson-Trotter algorithm. Johnson and Trotter independently showed that it is possible to generate permutations by transpositions even when the two elements exchanged are required to be in adjacent positions [T62, J63]. The recursive scheme, as shown in Figure 1.2, inserts the element n into each permutation on the list for n − 1, in each of the n possible positions, moving alternately from right to left and then from left to right.
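A direct rendering of this recursive scheme is sketched below (an illustration under our naming, not the thesis's code); for n = 3 it reproduces the order shown in Figure 1.2, and every two consecutive permutations differ by an adjacent transposition.

    def insertion_permutations(n):
        # Insert n into each permutation of 1..n-1, sweeping alternately
        # right-to-left and left-to-right, as in the Johnson-Trotter scheme.
        if n == 1:
            return [[1]]
        result = []
        for index, perm in enumerate(insertion_permutations(n - 1)):
            positions = range(n - 1, -1, -1) if index % 2 == 0 else range(n)
            for pos in positions:
                result.append(perm[:pos] + [n] + perm[pos:])
        return result

    print(insertion_permutations(3))
    # [[1, 2, 3], [1, 3, 2], [3, 1, 2], [3, 2, 1], [2, 3, 1], [2, 1, 3]]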

1.3.2 Family Tree Approach

In the family tree or genealogical tree approach, a hierarchical structure or tree structure is established among the members of a particular combinatorial class. The idea is to find a unique

parent-child relationship among the objects such that one object can be generated from its

parent by making a minimal amount of changes. The main feature of this approach is that the

entire list of objects need not be in the memory at once for checking duplications. The objects

are generated in the order they are present in the family tree, and the generation rule itself ensures

that no omissions occur. The space complexity for this approach is also linear in the size of

an individual object. The main challenge in solving an enumeration problem by family tree

approach is to establish a unique parent-child relationship among the objects of interest. For

many problems, finding a suitable parent-child relationship may be extremely difficult.

There are a number of problems that have been solved by the family tree approach [KN05,

NU03, NU04]. Figure 1.3 illustrates the family tree developed by Kawano and Nakano [KN05]

for their algorithm for generating all set partitions.

The drawback of the family tree approach is that to build a family tree we have to define both the parent-child and the child-parent relationship. Moreover, the recursive traversal of a family tree yields an average constant time algorithm, but for enumeration problems we usually want a constant time solution in the ordinary sense. Hence intensive research has been devoted to the traversal of family trees. Kawano and Nakano [KN05] gave a tree traversal algorithm which generates each solution in constant time, but the overall time complexity is the same as that of the ordinary traversal.


Figure 1.3: Illustration of the family tree for all set partitions.

In this thesis we present a new elegant tree traversal algorithm which generates each solution in constant time and whose overall time complexity is less than that of the ordinary traversal.
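Abstractly, the family tree approach can be sketched as follows (a hypothetical illustration, not the traversal algorithm of this thesis): given the root object and a rule that returns the children of an object, a traversal of the implicit tree lists every object exactly once without storing the set of objects generated so far.

    def enumerate_family_tree(root, children):
        # 'children(obj)' must implement the parent-child rule, so that every
        # object other than the root has exactly one parent; then each object
        # is reported exactly once, without keeping the whole list in memory.
        stack = [root]
        while stack:
            obj = stack.pop()
            yield obj
            stack.extend(children(obj))

    # Toy parent-child rule: the children of a bit string s are s+'0' and s+'1'.
    short_strings = enumerate_family_tree("", lambda s: [s + "0", s + "1"] if len(s) < 2 else [])
    print(list(short_strings))   # ['', '1', '11', '10', '0', '01', '00']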

1.4 Scope of this Thesis

In this section we list the algorithms we have developed in this thesis. We follow the family

tree approach for solving enumeration problems and invent new techniques to establish a Gray code among the solutions. We concentrate on the efficiency improvement of the algorithms and invent new approaches to solve enumeration problems such that each solution is generated in constant time (in the ordinary sense).

1.4.1 Distribution of Objects to Bins

The first problem that we consider is to generate all distributions of n identical objects to m

bins. In this thesis, we give an algorithm to generate all such distributions without repetition.

Our algorithm generates each distribution in constant time with linear space complexity. We

also present an efficient tree traversal algorithm that generates each solution in O(1) time. To

the best of our knowledge, our algorithm is the first algorithm which generates each solution

in O(1) time in the ordinary sense. By modifying our algorithm, we can generate the distributions

in anti-lexicographic order. Finally, we extend our algorithm for the case when the bins have

priorities associated with them. The overall space complexity of our algorithm is O(m), where m

is the number of bins. We give the detailed algorithms in Chapter 3.

1.4.2 Distribution of Distinguishable Objects to Bins

The second problem that we consider in this thesis is to generate all distributions of distin-

guishable objects to bins. Our algorithm generates each distribution in constant time without

repetition. To the best of our knowledge, our algorithm is the first algorithm which generates

each solution in O(1) time in the ordinary sense. As a byproduct of our algorithm, we get a new

algorithm to enumerate all multiset partitions when the number of partitions is fixed and the

partitions are numbered. In this case, our algorithm generates each multiset partition in con-

stant time (in the ordinary sense). Finally, we extend our algorithm for the case when the bins have priorities associated with them. The overall space complexity of our algorithm is O(km), where

there are m bins and the objects fall into k different classes. We give the detailed algorithm in

Chapter 4.

1.4.3 Evolutionary Trees

In this thesis, we also deal with the problem of generating all evolutionary trees. Generating

all evolutionary trees among different species has many applications in Bioinformatics [JP04],

Genetic Engineering [KR03], Archaeology, Biochemistry and Molecular Biology. We first give

an algorithm to generate all such evolutionary trees with n species. Our algorithm is simple

and generates each tree in linear time without repetition (O(1) time in the amortized sense).

give the detailed algorithm in Chapter 5.


1.4.4 Labeled and Ordered Evolutionary Trees

In this thesis, we also give an efficient algorithm to generate all evolutionary trees with a fixed number of ordered leaves. The order of the species is based on evolutionary relationship

and phylogenetic structure. A species is more related to its preceding and following species

in the sequence of species than to other species in the sequence. We also find a suitable

representation of such trees. We represent a labeled and ordered evolutionary tree with n

leaves by a sequence of (n − 2) numbers. Our algorithm generates all such trees in constant

time (on average) without repetition. We give the detailed algorithm in Chapter 6.

1.5 Summary

In this thesis we develop efficient algorithms for generating all distributions of objects to bins.

We generate distributions of both identical and distinguishable objects in O(1) time per distri-

bution. We also develop a technique for efficient family tree traversal such that each solution

is generated in constant time (in the ordinary sense). We also give an algorithm to generate all evolutionary trees having n species without repetition. We find an efficient representation

of such evolutionary trees such that each tree is generated in constant time on average. For the

purposes of biologists, we also give two new algorithms to generate evolutionary trees having

ordered species and satisfying some distance constraints. Our main results can be divided into

three parts.

The first part of the results is about the distributions of identical objects to bins. We give

an efficient algorithm that generates all distributions of n objects to m bins where the objects

are identical. The algorithm generates each distribution in constant time (in the ordinary sense).

We also present an efficient tree traversal algorithm that generates each solution in O(1) time.

Our new results together with known ones are listed in Table 1.1.

The second part of the results deals with generating all distributions of n distinguishable

objects to m bins where the objects fall into k different classes. The algorithm generates each distribution in constant time (in the ordinary sense) from its previous one using linear space only.


Criteria                       Klingsberg [K82]       Our algorithm
Generation time per object     Average constant       Ordinary constant
Space complexity               O(m)                   O(m)
Requires searching?            YES                    NO

Table 1.1: Results on distribution of identical objects to bins.

Criteria                       Kawano and Nakano [KN06]     Our algorithm
Generates                      Multiset partitions          Distributions of distinguishable objects to bins
Generation time per object     O(k)                         Ordinary constant
Space complexity               O(km)                        O(km)

Table 1.2: Results on distribution of distinguishable objects to bins.

Criteria                       Nakano and Uno [NU04]        Our algorithm
Generates                      Rooted trees with n nodes    Labeled and ordered evolutionary trees
Generation time per object     Average constant             Average constant
Redundant objects              YES                          NO

Table 1.3: Results on generating evolutionary trees.


This new result together with known ones is listed in Table 1.2.

The third part of our results is on bioinformatics. We give a linear time algorithm to

generate all evolutionary trees. We also find an efficient representation of an evolutionary tree having ordered species. We give a new algorithm to generate all evolutionary trees having n ordered species. The algorithm is simple and generates each tree in constant time on average.

Our new results together with known ones are listed in Table 1.3.

Chapter 2

Preliminaries

In this chapter we define some basic terms of graph theory and algorithms. Definitions which

are not included in this chapter will be introduced as they are needed. We start, in Section 2.1,

by giving definitions of some standard graph theoretical terms used throughout the remainder

of this thesis. We describe some notions from complexity theory in Section 2.2. Section 2.3 deals with a well-known graph traversal algorithm. Finally, Section 2.4 deals with the Catalan

Families of combinatorial objects.

2.1 Basic Terminology

In this section we give definitions of some theoretical terms used throughout the remainder of

this thesis.

2.1.1 Graphs

A graph G is a structure (V,E) which consists of a finite set of vertices V and a finite set of

edges E; each edge is an unordered pair of vertices. We denote the set of vertices of

G by V (G) and the set of edges by E(G). Figure 2.1 illustrates an example of a graph. An

edge connecting vertices vi and vj in V is denoted by (vi, vj). An edge (vi, vj) is called a loop if


vi = vj. A graph is called a simple graph if there are no loops or multiple edges between any two

vertices in G. The degree of a vertex v is the number of edges incident to v in G.

Figure 2.1: Illustration of a graph.

2.1.2 Paths and Cycles

A v0−vl walk, v0, e1, v1, . . . , vl−1, el, vl, in G is an alternating sequence of vertices and edges of G,

beginning and ending with a vertex, in which each edge is incident to the two vertices immediately

preceding and following it. If the vertices v0, v1, . . . , vl are distinct (except possibly v0, vl), then

the walk is called a path and usually denoted either by the sequence of vertices v0, v1, . . . , vl or

by the sequence of edges e1, e2, . . . , el. The length of the path is l, one less than the number of

vertices on the path. A path or walk is closed if v0 = vl. A closed path containing at least one

edge is called a cycle.

2.1.3 Trees

A tree is a connected graph containing no cycle. Figure 2.2 is an example of a tree. The

vertices in a tree are usually called nodes. A rooted tree is a tree in which one of the nodes is

distinguished from the others. The distinguished node is called the root of the tree. The root

of a tree is generally drawn at the top. In Figure 2.2, the root is v1. Every node u other than

the root is connected by an edge to some other node p called the parent of u. We also call u

a child of p. We draw the parent of a node above that node. For example, in Figure 2.2, v1 is

the parent of v2, v3 and v4, while v2 is the parent of v5 and v6; v2, v3 and v4 are children of v1,


while v5 and v6 are children of v2. A leaf is a node of a tree that has no children. An internal

node is a node that has one or more children. Thus every node of a tree is either a leaf or an

internal node. In Figure 2.2, the leaves are v4, v5, v6, v7 and v8, and the nodes v1, v2 and v3

are internal nodes.


Figure 2.2: Illustration of a tree.

The parent-child relationship can be extended naturally to ancestors and descendants. Sup-

pose that u1, u2, . . . , ul is a sequence of nodes in a tree such that u1 is the parent of u2, which

is a parent of u3, and so on. Then node u1 is called an ancestor of ul and node ul a descendant

of u1. The root is an ancestor of every node in a tree and every node is a descendant of the

root. In Figure 2.2, the seven other nodes are all descendants of v1, and v1 is an ancestor of all of them.

The height of a node u in a tree is the length of a longest path from u to a leaf. The height

of the tree is the height of the root. The depth of a node u in a tree is the length of a path from

the root to u. The level of a node u in a tree is the height of the tree minus the depth of u. In

Figure 2.2, for example, node v2 is of height 1, depth 1 and level 1. The tree in Figure 2.2 has

height 2.

2.1.4 Binary Trees

A binary tree is either a single node or consists of a node and two subtrees rooted at the node, both of which are binary trees. Figure 2.3 illustrates a binary tree.

A complete binary tree is a rooted tree with each internal node having exactly two children.


Figure 2.3: Illustration of a binary tree.

2.1.5 Family Trees

A family tree is a rooted tree with a parent-child relationship. The vertices of a family tree have levels associated with them. The root has the lowest level, i.e., 0, and the level of any other node is one more than that of its parent. Vertices with the same parent v are called siblings. The

siblings may be ordered as c1, c2, . . . , cl where l is the number of children of v. If the siblings

are ordered then ci−1 is the left sibling of ci for 1 < i ≤ l and ci+1 is the right sibling of ci for

1 ≤ i < l. The ancestors of a vertex other than the root are the vertices in the path from the

root to this vertex, excluding the vertex and including the root itself. The descendants of a

vertex v are those vertices that have v as an ancestor. A leaf in a family tree has no children.

Figure 2.4 illustrates a family tree of 15 nodes.

(Figure content: a family tree with levels 0, 1 and 2, whose fifteen nodes are labeled with triples such as (4,0,0), (3,1,0), (2,2,0), ..., (0,1,3), (0,0,4).)

Figure 2.4: Illustration of a family tree of 15 nodes.

2.1.6 Recursion Trees

A recursion tree is a family tree where each leaf is a solution and each internal node is a partial

solution, e.g., a set of subtrees. Along the path from the root to a leaf we move towards a solution.


2.1.7 Evolutionary Trees

An evolutionary tree is a graphical representation of the evolutionary relationship among three

or more species. In a rooted evolutionary tree, the root corresponds to the most ancient ancestor

in the tree and the path from the root to a leaf in the rooted tree is called an evolutionary

path. Leaves of evolutionary trees correspond to the existing species while internal vertices

correspond to hypothetical ancestral species.

2.1.8 Integer Partition

Given an integer n, it is possible to represent it as the sum of one or more positive integers xi,

i.e., n = x1 + x2 + . . . + xm for 1 ≤ m ≤ n. This representation is called an integer partition if

x1 ≥ x2 ≥ . . . ≥ xm. For example, there are seven distinct partitions of the integer 5:

5, 4+1, 3+2, 3+1+1, 2+2+1, 2+1+1+1, 1+1+1+1+1.
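A minimal recursive sketch (ours, for illustration only) generates these partitions in the non-increasing representation defined above, in anti-lexicographic order:

    def integer_partitions(n, largest=None):
        # Yield the partitions of n as non-increasing lists x1 >= x2 >= ... >= xm.
        if largest is None:
            largest = n
        if n == 0:
            yield []
            return
        for first in range(min(n, largest), 0, -1):
            for rest in integer_partitions(n - first, first):
                yield [first] + rest

    print(list(integer_partitions(5)))
    # [[5], [4, 1], [3, 2], [3, 1, 1], [2, 2, 1], [2, 1, 1, 1], [1, 1, 1, 1, 1]]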

2.1.9 Set Partition

For a positive integer n and k < n, a set partition is a partition of {1, 2, . . . , n} into k non-empty subsets. For instance, for n = 4 and k = 2 there are seven such partitions:
{1, 2, 3} ∪ {4}, {1, 2, 4} ∪ {3}, {1, 3, 4} ∪ {2}, {2, 3, 4} ∪ {1}, {1, 2} ∪ {3, 4}, {1, 3} ∪ {2, 4}, {1, 4} ∪ {2, 3}.

2.1.10 Multiset

A multiset is a collection of elements in which not all the elements need be distinct. The elements of a multiset fall into different classes, where the elements in the same class are identical but are distinguishable from those of other classes. For example, {1, 1, 2, 3, 1, 3, 2, 2} is a multiset.


2.1.11 Simpleset

A simple set is a set of elements where all the elements are identical. By simple set we naturally

mean an ordinary set; for example, a set of apples, a set of graphs, etc.

2.2 Algorithms and Complexity

In this section we briefly introduce some terminologies related to complexity of algorithms.

The most widely accepted complexity measure for an algorithm is the running time, which

is expressed by the number of operations it performs before producing the final answer. The

number of operations required by an algorithm is not the same for all problem instances. Thus,

we consider all inputs of a given size together, and we define the complexity of the algorithm

for that input size to be the worst case behavior of the algorithm on any of these inputs. Then

the running time is a function of the size n of the input.

2.2.1 The notation O(n)

In analyzing the complexity of an algorithm, we are often interested only in the "asymptotic behavior", that is, the behavior of the algorithm when applied to very large inputs. To deal with such a property of functions we shall use the following notation for asymptotic running time. Let f(n) and g(n) be functions from the positive integers to the positive reals. We write f(n) = O(g(n)) if there exist positive constants c1 and c2 such that f(n) ≤ c1 g(n) + c2 for all n. Thus the running time of an algorithm may be bounded from above by a phrase like "takes time O(n^2)".

2.2.2 Polynomial algorithms

An algorithm is said to be polynomially bounded (or simply polynomial) if its complexity is

bounded by a polynomial in the size of a problem instance. Examples of such complexities are O(n), O(n log n), O(n^100), etc. The remaining algorithms are usually referred to as exponential or non-polynomial. Examples of such complexities are O(2^n), O(n!), etc.

When the running time of an algorithm is bounded by O(n), we call it a linear time algorithm

or simply a linear algorithm.

2.2.3 Constant Time

In computational complexity theory, constant time refers to the computation time of a problem

when the time needed to solve that problem doesn’t depend on the size of the data that is given

as input. Constant time is denoted by O(1).

For example, accessing an element of an array takes constant time, since we can pick up the element using its index and start working with it. However, finding the minimum value in an array is not a constant time operation, as we need to scan each element of the array and then decide the minimum of those elements. Hence it is a linear time operation and takes O(n) time.
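The two examples above can be made concrete with a short sketch (ours, purely illustrative):

    def element_at(array, i):
        return array[i]            # O(1): a single index computation, independent of len(array)

    def minimum(array):
        smallest = array[0]
        for value in array:        # O(n): every element must be examined
            if value < smallest:
                smallest = value
        return smallest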

2.2.4 Average Constant Time

In computational complexity theory, average constant time refers to the computation time of

a problem when the time needed to generate all the solutions of that problem depends on the

size of the data that is given as input but the computational time per solution is constant when

averaged over all solutions.

For example, in a depth-first search (DFS) traversal of a tree, the time required to visit all nodes in the tree depends on the size of the tree, but the computation time per node is constant when averaged over all nodes. Hence DFS traversal takes average constant time per node.

2.2.5 Amortized Time

In analysis of algorithms, amortized analysis refers to finding the average running time per

operation over a worst-case sequence of operations. Amortized analysis differs from average-

case performance in that probability is not involved; amortized analysis guarantees the time

per operation over worst-case performance.
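A standard illustration (ours, not taken from the thesis) is a dynamic array that doubles its capacity when full: a single append may trigger an O(n) copy, but any sequence of n appends costs O(n) time in total, so each append takes O(1) amortized time.

    class DynamicArray:
        # Append in O(1) amortized time: the occasional O(n) copy is paid for
        # by the many cheap appends that preceded it.
        def __init__(self):
            self.capacity = 1
            self.size = 0
            self.slots = [None]

        def append(self, value):
            if self.size == self.capacity:
                self.capacity *= 2
                self.slots = self.slots + [None] * (self.capacity - self.size)  # copy step
            self.slots[self.size] = value
            self.size += 1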


2.3 Graph Traversal Algorithm

When designing algorithms on graphs, we often need a method for exploring the vertices and

edges of a graph. In this section we describe such a method named depth first search (DFS). In

DFS each edge is traversed exactly once in the forward and reverse directions and each vertex

is visited. Thus DFS runs in linear time. We now describe the method.

Consider visiting the vertices of a graph G in the following way. We select and visit a

starting vertex v. Then we select any edge (v, w) incident on v and visit w. In general, suppose

x is the most recent visited vertex. The search is continued by selecting some unexplored edge

(x, y) incident on x. If y has been previously visited, we find another new edge incident on

x. If y has not been visited previously, then we visit y and begin a new search starting at

y. After completing the search through all paths beginning at y, the search returns to x, the

vertex from which y was first reached. The process of selecting unexplored edges incident to x

is continued until the list of these edges is exhausted. This method is called depth first search

since we continue searching in the deeper direction as long as possible.

If the graph G is a tree, then we can order the vertices based on the way the edges are

chosen to be traversed. Consider a vertex v from which a new edge would be explored and

another vertex would be reached. We mark a vertex u when we first reach u and call the label

of u the rank of u. The rank of the root of the tree is 0. So the rank of a vertex u is the

number of vertices explored before u is reached for the first time. Such a traversal is called a

pre-order traversal of the vertices of the tree. If a vertex u is labeled after all other vertices located in the subtree rooted at u are labeled, then the traversal is called post-order traversal. In the case of a binary tree, if the vertex u is labeled after all vertices located in the left subtree of u are labeled, but before all vertices located in the right subtree of u are labeled, then the traversal is called in-order traversal.
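To make the method concrete, here is a small Python sketch (our own illustration, not code from the thesis) that performs a DFS on a tree given as an adjacency list and records the pre-order rank of every vertex:

def dfs_preorder_ranks(adj, root):
    # Depth first search of a tree given as an adjacency list.
    # rank[u] = number of vertices visited before u; the root gets rank 0.
    rank = {}

    def visit(u):
        rank[u] = len(rank)              # pre-order: label u when it is first reached
        for w in adj[u]:
            if w not in rank:            # skip edges leading back to visited vertices
                visit(w)

    visit(root)
    return rank

# A small tree rooted at vertex 0.
adj = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
print(dfs_preorder_ranks(adj, 0))        # {0: 0, 1: 1, 3: 2, 4: 3, 2: 4}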


2.4 Catalan Families

In several families of combinatorial objects, the size of the class is bounded by the Catalan

Numbers, defined for n ≥ 0 by

C_n = \frac{1}{n+1} \binom{2n}{n}.   (2.1)

These include binary trees on n vertices, well-formed sequences of 2n parentheses, and triangulations of a labeled convex polygon with n + 2 vertices. There exist bijections between the members of the Catalan family [CLR90]. Therefore, an enumeration algorithm for one member of the family implicitly gives a listing scheme for every other member of the family.
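As a quick numerical check (a Python sketch of ours, not part of the thesis), Equation (2.1) yields the familiar values 1, 1, 2, 5, 14, 42, . . . :

from math import comb

def catalan(n):
    # Equation (2.1): C_n = binomial(2n, n) / (n + 1); the division is always exact.
    return comb(2 * n, n) // (n + 1)

print([catalan(n) for n in range(6)])    # [1, 1, 2, 5, 14, 42]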

Chapter 3

Distribution of Objects to Bins

3.1 Introduction

In computer science, we frequently need to count things and generate solutions. The science of

counting is captured by a branch of mathematics called combinatorics. A well known counting

problem is counting the number of ways objects can be distributed among bins [AU95, R00].

The paradigm problem is counting the number of ways of distributing fruits to children. For

example, Kathy, Peter and Susan are three children. We have four apples to distribute among

them without cutting the apples into parts. In how many ways can the children receive the apples?

To solve the counting problem mentioned above, we take four A's representing the apples and two *'s which represent partitions between the apples belonging to different children. We order the A's and *'s as we like and interpret all A's before the first * as apples belonging to Kathy, the A's between the two *'s as apples belonging to Peter, and the A's after the second * as apples belonging to Susan. For instance, AA*A*A represents the distribution (2, 1, 1), where Kathy gets two apples and the other two children get one each. Thus, each

distribution of apples to bins is associated with a unique string of four A’s and two *’s. How

many such strings are there? The number of such strings is equal to the number of permutations of those 6 letters, which is 6!/(4! · 2!) = 15. So, the solution for m bins and n objects is (n + m − 1)!/(n!(m − 1)!) [AU95, R00]. Thus we count the number of distributions. However, in this thesis we are not


interested in counting the number of distributions, rather we are interested in generating all

distributions.

Let D(n,m) represent the set of all distributions of n objects to m bins where each bin gets

zero or more objects. For the previous example, we have D(4, 3) representing all distributions.

Now, let, (i, j, k) represent the situation in which Kathy receives i apples, Peter receives j, and

Susan receives k. The 6!/(4! · 2!) = 15 possibilities are:

(0,0,4) (0,1,3) (0,2,2) (0,3,1) (0,4,0) (1,0,3) (1,1,2) (1,2,1) (1,3,0) (2,0,2) (2,1,1) (2,2,0) (3,0,1) (3,1,0) (4,0,0)
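The count can be checked directly (a Python one-off of ours, not part of the thesis):

from math import comb, factorial

n, m = 4, 3
print(factorial(n + m - 1) // (factorial(n) * factorial(m - 1)))   # 15
print(comb(n + m - 1, m - 1))                                      # 15, the same value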

It is useful to have the complete list of all solutions. One can use such a list to search for a counter-example to some conjecture, to find the best solution among all solutions, or to test and analyze an algorithm for its correctness or computational complexity. Many algorithms to generate a particular class of objects without repetition are already known [KN05, ZS98, NU03, YN04, FL79, NU04, NU05, BS94].

There are many applications of distributing objects to bins. In these days of automation, machines may need to distribute objects among candidates optimally. Generating all distributions also has many applications in computer science. In computer networks, suppose there are several communication channels and several processes want to use the channels. We can think of the communication channels as our symbolic objects and the processes as bins. To find out which distribution is better, taking into account congestion, QoS, channel capacity and other factors, we may need to calculate these values for each solution. Then we may choose the optimal one, and the next distributions may depend on this distribution, i.e., we may want to associate priorities with the processes. Generating all distributions also has applications in client-server broker distributed architectures, CPU scheduling, memory management, multiprocessor systems, etc. [T02, T04].

In this thesis we first consider the problem of generating all possible distributions. The

main challenges in finding algorithms for enumerating all distributions are as follows. Firstly,

the number of such distributions is exponential in general and hence listing all of them requires

huge time and computational power. Secondly, generating algorithms produce huge outputs


and the outputs dominate the running time. For this reason, reducing the amount of output is

essential. Thirdly, checking for repetitions must be very efficient. Storing the entire list of solutions generated so far is not practical, since checking each new solution against the entire list to prevent repetition would require a huge amount of memory and the overall time complexity would be very high. So, if we can compress the outputs, then it considerably improves the

efficiency of the algorithm. Therefore, many generating algorithms output objects in an order

such that each object differs from the preceding one by a very small amount, and output each

object as the “difference” from the preceding one. Such orderings of objects are known as Gray

codes [S97, KN05, R00].

The problem of generating all distributions of n objects to m bins can be viewed as generating integer partitions of the integer n into m parts, when the partitions are “fixed”, “numbered” and “ordered”. That means the number of partitions is fixed, the partitions are

numbered and the assigned numbers are not altered. Zoghbi and Stojmenovic [ZS98] gave an

algorithm to generate integer partitions with a specified order of generation (lexicographic and

anti-lexicographic) but their partitions are not fixed, numbered and ordered. Moreover, their

algorithm does not allow empty partitions. Kawano and Nakano [KN05] generated all set partitions where the number of partitions is fixed but the subsets are not numbered or ordered. They used an efficient generation method based on the family tree structure of the solutions. If we apply their method to this problem then we have to number the subsets and then permute the numbers assigned to the subsets. Since the objects in this problem are identical, permuting the assigned numbers leads to repetition whenever any two of the subsets contain the same number of objects. Thus we cannot solve our problem of generating distributions by modifying their algorithm.

Klingsberg [K82] gave an algorithm for sequential listing of the composition of an integer

n into k parts. The algorithm keeps pointers to the first and second nonzero elements in the

sequence. Then by incrementing and decrementing the proper elements in the sequence their

algorithm generates solutions. Their method is straightforward but requires searching for the second nonzero element in the sequence for the solutions having a nonzero first element. Hence their algorithm cannot generate each solution in O(1) time in the ordinary sense; rather, the cost per generation is constant when averaged over all solutions in D(n,m).

In this thesis we first give a new algorithm to generate all distributions of n objects to m bins without repetition. Here, the number of bins is fixed and the bins are numbered and ordered. The algorithm is simple and generates each distribution in constant time on average without repetition. Our algorithm generates a new distribution from an existing one by making a constant number of changes and outputs each distribution as the difference from the preceding one. The main feature of our algorithm is that we define a tree structure, that is, parent-child relationships, among those distributions (see Figure 3.1). In such a “tree of distributions”, each node corresponds to a distribution of objects to bins and each node is generated from its parent in constant time. In our algorithm, we construct the tree structure among the distributions in such a way that the parent-child relation is unique, and hence there is no chance of producing duplicate distributions. Our algorithm also generates the distributions in place, that is, the space complexity is only O(m).

Figure 3.1: The Family Tree T4,3.

Later, we give a new algorithm to traverse the tree efficiently. This algorithm outputs each distribution in constant time in the ordinary sense (not in the average sense). Thus we can regard the derived sequence of the outputs as a combinatorial Gray code [S97, KN05, R00] for distributions. To the best of our knowledge, our algorithm is the first algorithm to generate all distributions in constant time per distribution in the ordinary sense. Our algorithm also generates distributions with a specified order of generation. By using this algorithm we can generate integer partitions in anti-lexicographic order when the partitions are fixed and ordered. Then, we extend our algorithm for the case when the bins have priorities associated with them. In


this case, the bins are numbered in the order of priority, and the sequence of generations maintains an order so that the generations respect the priorities.

The rest of the chapter is organized as follows. Section 3.2 gives some definitions. Sec-

tion 3.3 deals with generating all distributions of objects to bins. In Section 3.4, we present

the improved tree traversal algorithm that generates each solution in O(1) time. Section 3.5 describes how to generate distributions in anti-lexicographic order. In Section 3.6, we consider the case when priorities are associated with bins. Finally, Section 3.7 concludes the chapter. Our results presented in this chapter are to appear in [AR06].

3.2 Preliminaries

In this section we define some terms used in this chapter.

Let G be a connected graph with n vertices. A tree is a connected graph without cycles.

A rooted tree is a tree with one vertex r chosen as root. A leaf in a tree is a vertex of degree

1. Each vertex in a tree is either an internal vertex or a leaf. A family tree is a rooted tree

with parent-child relationship. The vertices of a rooted tree have levels associated with them.

The root has the lowest level, i.e., 0. The level of any other vertex is one more than the level of its parent. Vertices with the same parent v are called siblings. The siblings may be ordered

as c1, c2, . . . , cl where l is the number of children of v. If the siblings are ordered then ci−1 is the

left sibling of ci for 1 < i ≤ l and ci+1 is the right sibling of ci for 1 ≤ i < l. The ancestors of a

vertex other than the root are the vertices in the path from the root to this vertex, excluding

the vertex and including the root itself. The descendants of a vertex v are those vertices that

have v as an ancestor. A leaf in a family tree has no children.

Given an integer n, it is possible to represent it as the sum of one or more positive integers

xi, i.e., n = x1 + x2 + . . . + xm for 1 ≤ m ≤ n. This representation is called an integer partition

if x1 ≥ x2 ≥ . . . ≥ xm. For example, there are seven distinct partitions of the integer 5:

5, 4+1, 3+2, 3+1+1, 2+2+1, 2+1+1+1, 1+1+1+1+1.


For a positive integer n and k < n, a set partition is a partition of {1, 2, . . . , n} into k non-empty subsets. For instance, for n = 4 and k = 2 there are seven such partitions:

{1, 2, 3} ∪ {4}, {1, 2, 4} ∪ {3}, {1, 3, 4} ∪ {2}, {2, 3, 4} ∪ {1}, {1, 2} ∪ {3, 4}, {1, 3} ∪ {2, 4}, {1, 4} ∪ {2, 3}.

For positive integers n and m, let, A ∈ D(n,m) be a distribution of n objects to m bins.

The bins are ordered and numbered as B1, B2, . . . , Bm. For each A ∈ D(n,m), we define a

unique sequence of positive integers (a1, a2, . . . , am), where ai represents the number of objects in the ith bin Bi, for 1 ≤ i ≤ m. The sequence for A is unique for each distribution because the bins are ordered and numbered. For example, (0, 0, 4) represents that there are 3 bins and 4 objects, the third bin contains 4 objects and the rest of the bins are empty (see Figure 3.2). We can observe that for each sequence a1 + a2 + . . . + am = n. This equality holds because the number of objects is fixed and every object must be placed in some bin.

Figure 3.2: Representation of a distribution of 4 objects to 3 bins.

Lexicographic order for distribution of objects is defined as follows. If P = (p1, p2, . . . , pm)

and Q = (q1, q2, . . . , qm) are sequences for two distributions, then P precedes Q lexicographically

if and only if, for some k, pk < qk and pi = qi for all 1 ≤ i ≤ (k − 1). For example,

the distributions of 4 objects to 3 bins in lexicographic order are: (0, 0, 4), (0, 1, 3), (0, 2, 2),

(0, 3, 1), (0, 4, 0) and so on. The anti-lexicographic order is the reverse of the lexicographic one. The distributions in anti-lexicographic order are: (4, 0, 0), (3, 1, 0), (3, 0, 1), (2, 2, 0), (2, 1, 1) and so on.

A listing of distributions is said to be in Gray code order if each successive sequence for distributions in the listing differs from the preceding one by a constant amount, for example, by the swapping of two elements or the flipping of a bit. In this chapter, we establish such an ordering of all distributions of objects to bins so that each distribution can be generated by making a constant amount of changes to the preceding distribution in the order.

3.3 Generating Distribution of Objects to Bins

In this section we give an algorithm to generate all distributions of identical objects to bins. For

that purpose we define a unique parent-child relationship among the distributions in D(n,m)

so that the relationship among the distributions can be represented by a tree with a suitable

distribution as the root. Figure 3.1 shows such a tree of distributions of 4 objects and 3 bins.

Once such a parent-child relationship is established, we can generate all the distributions in

D(n,m) using the relationship. We do not need to build or store the entire tree of distributions

at once, rather we generate each distribution in the order it appears in the tree structure.

In Section 3.3.1 we define a tree structure among distributions in D(n,m) and in Section

3.3.2 we present our algorithm which generates each solution in O(1) time on average.

3.3.1 The Family Tree

In this section we define a tree structure Tn,m among distributions in D(n,m).

For positive integers n and m, let, A ∈ D(n,m) be a distribution of n objects to B1, B2, . . . , Bm

bins. For each A ∈ D(n,m) we get a unique sequence (a1, a2, . . . , am) where ai represents num-

ber of objects in ith bin, for 1 ≤ i ≤ m. Note that, for each sequence a1 + a2 + · · ·+ am = n.

Now we define the family tree Tn,m as follows. Each node of Tn,m represents a distribution. If

there are m bins then there are m levels in Tn,m. A node is at level i in Tn,m if a1, a2, . . . , am−i−1 = 0 and am−i ≠ 0, for 0 ≤ i < m. As the level increases the number of leftmost 0's decreases and vice versa. Thus a node at level m − 1 has no leftmost 0 before its leftmost nonzero integer, i.e., a1 ≠ 0. Since Tn,m is a rooted tree we need a root, and the root is a node at level 0. One can observe that a node is at level 0 in Tn,m if a1, a2, . . . , am−1 = 0 and am ≠ 0. In this case, am = n and there can be exactly one such node. We thus take the sequence (0, 0, . . . , 0, n) as the root of Tn,m. Clearly, the number of leftmost 0's before any nonzero integer in the root is greater than that of any other sequence for any distribution in D(n,m).

To construct Tn,m, we define two types of relations among the distributions in D(n,m):

(a) Parent-child relationship and

(b) Child-parent relationship.

We define the parent-child relationships among the distributions in D(n,m) with two goals

in mind. First, the difference between a distribution A and its child C(A) should be minimum,

so that C(A) can be generated from A with minimum effort. Second, every distribution in D(n,m) except the root must have exactly one parent in Tn,m. We achieve the first goal by ensuring that the child C(A) of a distribution A can be found by a simple subtraction. That means A can also be generated from its child C(A) by a simple addition. The second goal, that is, the uniqueness of the parent-child relationship, is illustrated in the following subsections.

Parent-Child Relationship

Let A ∈ D(n,m) be a sequence (a1, a2, . . . , am) and it corresponds to a node of level i, 0 ≤ i < m

of Tn,m. So, we have a1, a2, . . . , am−i−1 = 0 and am−i ≠ 0 for 0 ≤ i < m. The number of children

it has is equal to am−i. The sequences of the children are defined in such a way that to generate

a child from its parent we have to deal with only two integers in the sequence and the rest of the

integers remain unchanged. The two integers are determined by the level of parent sequence in

Tn,m. The operations we apply to these two integers are only subtraction and assignment. The

number of leftmost 0 decreases in the child sequence by applying parent-child relationship.

Let Cj(A) ∈ D(n,m) be the sequence of jth child, 1 ≤ j ≤ am−i of A. Note that A is in

level i of Tn,m and Cj(A) will be in level i + 1 of Tn,m. We define the sequence for Cj(A) as

(c1, c2, . . . , cm−i−1, cm−i, . . . , cm), where 0 ≤ i < m, c1 = c2 = . . . = cm−i−2 = 0, cm−i−1 = j,

cm−i = am−i − j and ck = ak for m − i + 1 ≤ k ≤ m. Thus, we observe that Cj is a node of level i + 1, 0 ≤ i < m − 1 of Tn,m and so c1, c2, . . . , cm−i−2 = 0 and cm−i−1 ≠ 0 for 0 ≤ i < m − 1.

So, for each consecutive level we only deal with two numbers am−i−1 and am−i and the rest of

the integers remain unchanged. For example, the solution (0, 0, 4), for n = 4 and m = 3, is a node of level 0 because a1 = 0, a2 = 0 and a3 ≠ 0. Here, am−i = 4 so it has 4 children, and the four children are shown in Figure 3.1.
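In 0-indexed terms, the rule can be sketched as follows (a Python illustration of ours; pos denotes the index of the leftmost nonzero entry, i.e., pos = m − i − 1 for a node at level i):

def children(a):
    # Parent-child rule: move j objects from the leftmost non-empty bin
    # into the bin immediately to its left, for j = 1, ..., a[pos].
    pos = next(i for i, x in enumerate(a) if x != 0)
    if pos == 0:
        return                      # a node at level m - 1 has no children
    for j in range(1, a[pos] + 1):
        c = list(a)
        c[pos - 1] = j
        c[pos] = a[pos] - j
        yield c

print(list(children([0, 0, 4])))    # [[0, 1, 3], [0, 2, 2], [0, 3, 1], [0, 4, 0]]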

Child-Parent Relationship

The child-parent relation is just the reverse of parent-child relation. Let, A ∈ D(n,m) be a

sequence (a1, a2, . . ., am) and it corresponds to a node of level i, 1 ≤ i < m of Tn,m. So, we

have a1, a2, . . . , am−i−1 = 0 and am−i ≠ 0 for 0 < i < m. We define a unique parent sequence

of A at level i − 1 of Tn,m. Like the parent-child relationship here we also deal with only two

integers in the sequence. The operations we apply to these two integers are only addition and

assignment. The number of leftmost 0 increases in the parent sequence by applying child-parent

relationship.

Let P (A) ∈ D(n,m) be the parent sequence of A. We define the sequence for P (A) as (p1, p2,

. . ., pm−i, pm−i+1, . . ., pm) where 1 ≤ i < m, p1 = p2 = . . . = pm−i = 0, pm−i+1 = am−i +am−i+1,

and pj = aj for m − i + 1 < j ≤ m. Thus, we observe that P (A) is a node of level i − 1,

1 ≤ i < m of Tn,m and so p1, p2, . . . , pm−i = 0 and pm−i+1 ≠ 0 for 1 ≤ i < m. For example, the solution (0, 3, 1), for n = 4 and m = 3, is a node of level 1 because a1 = 0 and a2 ≠ 0. It has a

unique parent (0, 0, 4) as shown in Figure 3.1.
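The corresponding 0-indexed sketch (again ours, with pos the index of the leftmost nonzero entry; it assumes A is not the root):

def parent(a):
    # Child-parent rule: merge the leftmost non-empty bin into the bin
    # immediately to its right.  Not defined for the root (0, ..., 0, n).
    pos = next(i for i, x in enumerate(a) if x != 0)
    p = list(a)
    p[pos + 1] += p[pos]
    p[pos] = 0
    return p

print(parent([0, 3, 1]))    # [0, 0, 4]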

The Family Tree

From the above definitions we can construct Tn,m. We take the sequence Ar = a1, a2, . . . , am

as root where a1, a2, . . . , am−1 = 0 and am = n as we mentioned before. The family tree

Tn,m for the distributions in D(n,m) is shown in Figure 3.1. Based on the above parent-child

relationship, the following lemma proves that every distribution in D(n,m) is present in Tn,m.

Lemma 3.3.1 For any distribution A ∈ D(n,m), there is a unique sequence of distributions

that transforms A into the root Ar of Tn,m.

Proof. Let A ∈ D(n,m) be a sequence, where A is not the root sequence. By applying

child-parent relationship, we find the parent sequence P(A) of the sequence A. Now if P(A) is the root sequence, then we stop. Otherwise, we apply the same procedure to P(A) and find its parent P(P(A)). By continuously applying this process of finding the parent sequence of the derived sequence, we obtain the unique sequence A, P(A), P(P(A)), . . . of sequences in D(n,m) which eventually ends with the root sequence Ar of Tn,m. We observe that P(A) has at least one more zero than A in its sequence. Thus A, P(A), P(P(A)), . . . never leads to a cycle, and the level of the derived sequence decreases until it reaches the level of the root sequence Ar. Q.E.D.

Lemma 3.3.1 ensures that there can be no omission of distributions in the family tree Tn,m.

Since there is a unique sequence of operations that transforms a distribution A ∈ D(n,m) into

the root Ar of Tn,m, by reversing the operations we can generate that particular distribution,

starting from the root. Now we have to make sure that Tn,m represents distributions without rep-

etition. Based on the parent-child and child-parent relationships, the following lemma proves

this property of Tn,m.

Lemma 3.3.2 The family tree Tn,m represents distributions in D(n,m) without repetition.

Proof. Given a sequence A ∈ D(n,m), the children of A are defined in such a way that

no other sequence in D(n,m) can generate the same child. For contradiction, let two sequences A, B ∈ D(n,m) be at level i of Tn,m and generate the same child C. So, C is a sequence of level i + 1 of Tn,m. The sequences for A, B and C are aj, bj and cj for 1 ≤ j ≤ m. Clearly, ak = bk = 0 for 1 ≤ k ≤ m − i − 1. According to the parent-child relationship, we have ak = bk = ck for m − i + 1 ≤ k ≤ m, because only two integers in the sequence are changed and the rest remain unchanged. From the above two equations we have ak = bk for k ≠ m − i and 1 ≤ k ≤ m. Note that a1 + a2 + . . . + am = n = b1 + b2 + . . . + bm. Substituting the values for ak and bk for k ≠ m − i and 1 ≤ k ≤ m and simplifying yields am−i = bm−i. So, ak = bk for 1 ≤ k ≤ m. This implies that A and B are the same sequence. By contradiction, every sequence has a single and unique parent. Q.E.D.


3.3.2 The Algorithm

In this section we give an algorithm to construct Tn,m and generate all distributions.

If we can generate all child sequences of a given sequence in D(n, m), then in a recursive

manner we can construct Tn,m and generate all sequences in D(n,m). We have the root sequence

Ar = (0, . . . , 0, n). We get the child sequence Ac by using the parent to child relation discussed

above.

Procedure Find-All-Child-Distributions(A = (a1, a2, . . . , am), i)
{ A is the current sequence, i indicates the current level and Ac is the child sequence }
begin
    Output A;  { output the difference from the previous distribution }
    for j = 1 to am−i
        Find-All-Child-Distributions(Ac = (a1, a2, . . . , am−i−2, j, am−i − j, am−i+1, . . . , am), i + 1);
end;

Algorithm Find-All-Distributions(n, m)
begin
    Find-All-Child-Distributions(Ar = (0, . . . , 0, n), 0);
end.
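For concreteness, the procedure can be written out in Python roughly as follows (our own sketch, using 0-indexed lists and printing whole distributions instead of differences):

def find_all_distributions(n, m):
    # Generate every distribution in D(n, m) by traversing the family tree
    # rooted at (0, ..., 0, n).  Lists are 0-indexed, so pos = m - i - 1 is the
    # index of the leftmost nonzero entry for a node at level i.
    out = []

    def visit(a, pos):
        out.append(tuple(a))             # the thesis outputs only the difference
        if pos == 0:
            return                       # a node at level m - 1 has no children
        v = a[pos]
        for j in range(1, v + 1):        # the node has a[pos] children
            a[pos - 1], a[pos] = j, v - j
            visit(a, pos - 1)
        a[pos - 1], a[pos] = 0, v        # child-parent step: restore before returning

    visit([0] * (m - 1) + [n], m - 1)
    return out

print(len(find_all_distributions(4, 3)))     # 15 distributions, no repetitions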

The following theorem describes the performance of the algorithm Find-All-Distributions.

Theorem 3.3.3 The algorithm Find-All-Distributions runs in O(|D(n,m)|) time and uses

O(m) space.

Proof. In our algorithm we only use the simple addition or subtraction operation to

generate a new distribution from an old one. Thus each distribution is generated in constant

time without computational overhead. Since we traverse the family tree Tn,m and output each

sequence at each corresponding vertex of Tn,m we can generate all the sequences in D(n,m)

without repetition. By applying parent to child relation we can generate every child in O(1)


time. Then by using child to parent relation we go back to the parent sequence. Hence, the

algorithm takes O(|D(n,m)|) time i.e. constant time on average for each output.

Our algorithm outputs each distribution as the difference from the previous one. The data

structure that we use to represent the distribution is a sequence of integers where each integer

represents the number of objects in a particular bin. Therefore, the memory requirement is

O(m), where m is the number of bins. Q.E.D.

3.4 Efficient Tree Traversal

The algorithm in Section 3.3 generates all sequences in D(n,m) in O(|D(n,m)|) time. Thus

the algorithm generates each sequence in O(1) time “on average”. However, after generating

a sequence corresponding to the last vertex in the largest level in a large subtree of Tn,m, we

have to merely return from the deep recursive call without outputting any sequence and hence

we cannot generate each sequence in O(1) time (in ordinary sense). In this section we present

the improved tree traversal algorithm that generates each solution in O(1) time (in ordinary

sense).

To make the algorithm efficient we introduce two additional types of relations:

(i) Relationship between left sibling and right sibling and

(ii) Leaf-ancestor relationship.

In Section 3.4.1 we define the relationship between left sibling and right sibling, in Section

3.4.2, we illustrate the leaf-ancestor relationship and in Section 3.4.3 we present our efficient

tree traversal algorithm.

3.4.1 Relationship Between Left Sibling and Right Sibling

The relationship between left sibling and right sibling is defined with two goals in mind. First,

the difference between a distribution A and its right sibling As should be minimum if right

sibling exists, so that As can be generated from A with minimum effort. Second, the steps needed to go back to the parent and then generate its next child must be reduced. We achieve the first goal by ensuring that the right sibling As of a distribution A can be found by a simple increment and decrement operation. The second goal, that is, the reduction of steps, is achieved by moving directly from the current sequence to the next child of its parent. That means we do not have to go back to the parent to generate the next child, which saves two extra steps per generation.

Let A ∈ D(n,m) be a sequence (a1, a2, . . . , am) and it corresponds to a node of level i,

1 ≤ i < m of Tn,m. So, we have a1, a2, . . . , am−i−1 = 0 and am−i ≠ 0 for 1 ≤ i < m. We say the right sibling As ∈ D(n,m) of node A exists if am−i+1 ≠ 0 at level i of Tn,m. Then we call the

sequence A the left sibling of As.

We define the sequence for As as (s1, s2, . . . , sm−i, sm−i+1, . . . , sm), 1 ≤ i < m, where s1 = s2 = . . . = sm−i−1 = 0, sm−i = am−i + 1, sm−i+1 = am−i+1 − 1, and sj = aj for m − i + 2 ≤ j ≤ m. That means, to obtain As from A, we increment am−i by one and decrement am−i+1 by one, and the rest of the integers remain unchanged. For example, in Figure 3.3 the solution (0, 0, 3, 1) is a node of level 1 and it has a right sibling (0, 0, 4, 0).

Figure 3.3: Efficient Traversal of the family tree T4,4.

3.4.2 Leaf-Ancestor Relationship

To avoid returning from deep recursive call without outputting any sequence, we define leaf-

ancestor relationship. After generating the sequence Al of the last vertex in the largest level, i.e., the rightmost leaf, we do not return to the parent. Instead, we return to the nearest ancestor Aa which has a right sibling. By the rightmost leaf we mean the leaf which has no right sibling. Thus this leaf-ancestor relation saves many non-generation steps. Another reason for defining

leaf-ancestor relationship is that the nearest ancestor can be generated from the leaf sequence

by a simple swap operation between two integers in the sequence. The other integers in the

sequence remain unchanged.

Let Al ∈ D(n,m) be a leaf sequence (a1, a2, . . . , am), corresponding to a node of level m − 1 of Tn,m. So, we have a1 ≠ 0. We say that the ancestor sequence Aa ∈ D(n,m) of node Al exists if a2 = 0, i.e., Al has no right sibling. We define the ancestor sequence Aa of Al at level m − 1 − k if a2 = a3 = . . . = ak+1 = 0 and ak+2 ≠ 0. The sequence for the nearest ancestor Aa is determined by the number of consecutive 0's after a1 in the sequence for Al. We denote the number of such 0's in the sequence Al by k. This k determines the level and the sequence of the nearest ancestor Aa which has a right sibling.

We define the sequence for Aa as (s1, s2, . . . , sk, sk+1, . . . , sm), where s1 = s2 = . . . = sk = 0

and sk+1 = a1 and sj = aj for k + 1 < j ≤ m. In other words, to obtain Aa from Al, we swap

a1 and ak+1 and the rest of the integers remain unchanged. One can observe that the sequence

Aa is at level m− 1− k of Tn,m. For example, in Figure 3.3 the solution (3,0,0,1) is a node of

level 3 and it has a nearest ancestor (0,0,3,1) which is obtained by swapping first integer 3 and

third integer 0. We have the following lemma on the nearest ancestor Aa of Al.

Lemma 3.4.1 Let Al be a leaf sequence of Tn,m having no right sibling. Then Al has a unique

ancestor sequence Aa in Tn,m. Furthermore, either Aa has a right sibling in Tn,m or Aa is the

root Ar of Tn,m.

Proof. Let the sequence for Al ∈ D(n,m) be (a1, a2, . . . , am) and it corresponds to a node

of level m − 1 of Tn,m. Note that a1 ≠ 0 and a2 = 0. We get the sequence for Aa by swapping a1 and ak+1, where k is the number of consecutive 0's after a1. Clearly Aa is an ancestor of Al. Note that Aa is at level m − 1 − k. By Lemma 3.3.2, the parent-child relation is unique. Hence, by repeatedly applying the child-parent relation on Al, we will reach a unique ancestor at level m − 1 − k. For k = m − 1, one can observe that we get the root sequence Ar by swapping a1 and am. For 1 ≤ k < m − 1, we get the unique ancestor sequence Aa which has a right sibling. Q.E.D.

Lemma 3.4.1 ensures that Al has a unique ancestor Aa. As we see later Aa plays an

important role in our algorithm. Note that, we may need to return to ancestor Aa if current

node is a leaf Al, and for a leaf sequence Al we have a1 ≠ 0. Aa is obtained from Al by swapping a1 and ak+1 where k is the number of consecutive 0's after a1. Now, to find out k we have to search the sequence Al from a1 to ak+1 such that a2 = a3 = . . . = ak+1 = 0 and a1 ≠ 0, ak+2 ≠ 0. We reduce the complexity of searching by keeping extra information as shown in

Figure 3.4 (for simplicity we omit the separators). The information consists of the number of

subsequences of consecutive 0’s and the number of 0’s in each subsequence after am−i, where i

is the current level. For this we keep a stack of size m/2. The top of the stack determines the

current k. Initially the stack is empty. As soon as we find a zero, when moving from parent

to child or left sibling to right sibling, we push a 1 on the stack. We increment the top of the

stack for consecutive 0’s. We make a pop operation when we apply the leaf-ancestor relation.

The stack operations are shown in Figure 3.5. One can observe that there can be at most m/2

subsequences of consecutive 0's in a sequence of size m. Therefore, in the worst case we need a stack of size m/2.

Figure 3.4: Efficient Traversal of T4,3 keeping extra information.

3.4.3 The Efficient Algorithm

In this section we present an efficient algorithm to generate all distributions in D(n,m). We

use three relations in this algorithm: the parent-child relation, the relation between left sibling and right sibling, and the leaf-ancestor relation.

Figure 3.5: Use of stack for tree traversal (T4,4).

By applying the parent-child relation, we go from the root

down the family tree Tn,m until we reach a leaf at level m − 1. Then we apply the relationship between left sibling and right sibling to traverse horizontally until we reach a node which has no right sibling. Then, by applying the leaf-ancestor relation, we return to the nearest ancestor which has a right sibling, and we again apply the relation between left sibling and right sibling. This sequence of applying relationships and generating distributions continues until we return to the root.

This algorithm thus reduces non-generation steps and generates each sequence in O(1) time (in

ordinary sense).

Procedure Find-All-Child-Distributions2(A = (a1, a2, . . . , am), i)
{ A is the current sequence }
begin
    Output A;  { output the difference from the previous distribution }
    if A has a child then
        Generate the first child Ac;
        Find-All-Child-Distributions2(Ac, i + 1);
    else if A has a right sibling then
        Generate the right sibling As;
        Find-All-Child-Distributions2(As, i);
    else
        Generate the ancestor Aa at level i − k which has a right sibling or which is the root;
        if Aa is the root at level 0 then
            done
        else
            Generate the right sibling Aas of Aa;
            Find-All-Child-Distributions2(Aas, i − k);
end;

Algorithm Find-All-Distributions2(n, m)
begin
    Find-All-Child-Distributions2(Ar = (0, . . . , 0, n), 0);
end.
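An iterative Python rendering of this traversal (our own sketch; for brevity it finds k by scanning, whereas the thesis keeps k on a stack so that every step is O(1)):

def find_all_distributions2(n, m):
    # Iterative traversal of the family tree using the parent-child, sibling
    # and leaf-ancestor relations.  Assumes n >= 1.
    if m == 1:
        return [(n,)]
    a = [0] * (m - 1) + [n]              # the root sequence
    pos = m - 1                          # index of the leftmost nonzero entry
    out = [tuple(a)]
    while True:
        if pos > 0:                                  # go to the first child
            a[pos - 1], a[pos] = 1, a[pos] - 1
            pos -= 1
        elif a[1] != 0:                              # leaf with a right sibling
            a[0] += 1
            a[1] -= 1
        else:                                        # leaf without a right sibling
            k = 1
            while k < m - 1 and a[k + 1] == 0:
                k += 1
            a[0], a[k] = a[k], a[0]                  # jump to the nearest ancestor
            if k == m - 1:
                return out                           # the ancestor is the root: done
            a[k] += 1                                # ... otherwise take its right sibling
            a[k + 1] -= 1
            pos = k
        out.append(tuple(a))

print(find_all_distributions2(4, 3)[:4])   # [(0, 0, 4), (0, 1, 3), (1, 0, 3), (0, 2, 2)]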

The tree traversal according to the efficient algorithm is depicted in Figure 3.3. For a

sequence we need O(m) space and additional m/2 space is required for stack manipulation.

Hence the algorithm takes O(m) space. One can observe that the algorithm generates all

sequences such that each sequence in Tn,m can be obtained from the preceding one by at most

two operations. Note that if A corresponds to a vertex v of Tn,m, which has no child and no

sibling, we need two steps, one for tracing its ancestor and the other for tracing ancestor’s right

sibling. Otherwise, we need only one step to generate the next sequence. Thus the algorithm generates each sequence in O(1) time. Note that each sequence is similar to the

preceding one, since it can be obtained by at most two operations (see Figure 3.6). Thus, we can

regard the derived sequence of the sequences as a combinatorial Gray code [S97, KN05, R00]

for distributions. Thus we have the following theorem.

Theorem 3.4.2 The algorithm Find-All-Distributions2 uses O(m) space and generates

each distribution in D(n,m) in constant time (in ordinary sense).


Figure 3.6: A Gray code for D(4, 3): (0,0,4), (0,1,3), (1,0,3), (0,2,2), (1,1,2), (2,0,2), (0,3,1), (1,2,1), (2,1,1), (3,0,1), (0,4,0), (1,3,0), (2,2,0), (3,1,0), (4,0,0).

3.5 Distributions in Anti-lexicographic Order

In this section we describe how our algorithm generates distributions in anti-lexicographic order.

By using this technique, we can also generate integer partitions in anti-lexicographic order when

the partitions are fixed and ordered.

Our algorithm generates distributions with a specified order of generation. For positive

integers n and m, let A ∈ D(n,m) be a distribution of n objects to m bins. The bins are

ordered and numbered as B1, B2, . . . , Bm. We generate the distributions in an order such that

the rightmost bin gets the highest number of objects at first and the leftmost bin gets the

lowest. The number of objects in the rightmost bin decreases with the sequence of generations.

In the last distribution, the leftmost bin gets the highest number of objects and the rightmost bin gets the lowest. We find that if we reverse the order of the bins in our generation, then we obtain the generation in anti-lexicographic order. The algorithm is the same as in the previous case; only the order of the bins is reversed. The anti-lexicographic order of generation is shown in Figure 3.7.

Figure 3.7: Illustration of generation of D(4, 3) in anti-lexicographic order.
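This reversal trick is easy to check numerically (a small Python check of ours, reusing the find_all_distributions2 sketch from Section 3.4):

# Reversing each generated sequence yields the distributions in anti-lexicographic order.
seq = [tuple(reversed(d)) for d in find_all_distributions2(4, 3)]
print(seq == sorted(seq, reverse=True))          # True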


3.6 Generating Distributions with Priorities to Bins

In this section we consider the case when priorities are associated with bins. The sequence of

generations will maintain an order such that the bin with the highest priority gets the most objects first, and then the priorities of the bins are considered one by one in decreasing order. Thus the sequence of generations maintains an order that respects the priorities.

Our algorithm generates distributions with a specified order of generation. For positive

integers n and m, let A ∈ D(n,m) be a distribution of n objects to m bins. The bins are

ordered and numbered as B1, B2, . . . , Bm, which have priorities p1, p2, . . . , pm. Our order of

generation is such that the rightmost bin gets the highest number of objects at first and the

leftmost bin gets the lowest. The number of objects in the rightmost bin decreases with the

sequence of generations. In the last distribution, the leftmost bin gets the highest number of objects and the rightmost bin gets the lowest. We find that if the bins are ordered according to their priorities in ascending order, then our generation maintains the priorities: the highest-priority bin gets the most objects first, and the number of objects it receives decreases over the sequence of generations. The algorithm is the same as in the previous case; only the order of the bins must be such that the following inequality holds:

p1 ≤ p2 ≤ . . . ≤ pm. The generation is similar to Figure 3.3.

3.7 Conclusion

In this chapter we give a simple algorithm to generate all distributions in D(n,m). The al-

gorithm generates each distribution in constant time with linear space complexity. We also

present an efficient tree traversal algorithm that generates each solution in O(1) time. Then,

we describe a method to generate distributions in anti-lexicographic order. Finally, we extend

our algorithms for the case when the bins have priorities associated with them. The main

feature of our algorithms is that they are constant-time solutions, which is a very important requirement for generation problems.

Chapter 4

Distribution of Distinguishable Objects

to Bins

4.1 Introduction

In the previous chapter, we gave efficient algorithms to generate all distributions of objects to

bins where the objects were identical. In this chapter we generalize the problem by considering non-identical objects. Non-identical objects fall into different classes, and we call such objects “distinguishable objects”.

Let there be m bins and n distinguishable objects, where the objects fall into k different classes. Objects within a class are identical to each other, but are distinguishable from those of other classes. Let nj represent the number of objects in the jth class, where 1 ≤ j ≤ k. The paradigm problem is distributing different types of fruits to children. Suppose we have three apples, two pears and a banana to distribute to Kathy, Peter and Susan. Then m = 3, which is the number of children. There are k = 3 groups, with n1 = 3, n2 = 2, and n3 = 1. Since there are 6 objects in total, n = 6.

Now the question is: can we count the number of solutions? For identical objects, the number of distributions for m bins and n identical objects is (n + m − 1)!/(n!(m − 1)!) [AU95, R00, AR06]. To solve the counting problem for distinguishable objects, we use this formula. For distinguishable

objects, we first distribute the fruits of class 1 to all the bins. The number of such distributions

with n1 objects and m bins is (n1 + m − 1)!/(n1!(m − 1)!). Then we distribute the objects of the second class, and so on up to the kth class. Thus the total number of distributions will be the product of all these counts, as in the following expression:

\frac{(n_1 + m - 1)!}{n_1!\,(m - 1)!} \cdot \frac{(n_2 + m - 1)!}{n_2!\,(m - 1)!} \cdots \frac{(n_k + m - 1)!}{n_k!\,(m - 1)!}

Let D(n, m, k) represent the set of all distributions of n objects to m bins where the objects fall into k different classes and each bin gets zero or more objects. For the previous example, we have D(6, 3, 3) representing all distributions, and the number of distributions is 5!/(3! · 2!) · 4!/(2! · 2!) · 3!/(1! · 2!) = 10 · 6 · 3 = 180. Thus we count the number of distributions. However, in this

thesis we are not interested in counting the number of distributions, rather we are interested

in generating all distributions.
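The product formula is easy to evaluate (a Python helper of ours, not part of the thesis):

from math import comb

def count_distributions(class_sizes, m):
    # Each class is distributed independently, so the stars-and-bars counts multiply.
    total = 1
    for nj in class_sizes:
        total *= comb(nj + m - 1, m - 1)
    return total

print(count_distributions([3, 2, 1], 3))   # 10 * 6 * 3 = 180
print(count_distributions([2, 1], 3))      # 18, the number of nodes in Figure 4.1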

It is useful to have the complete list of all solutions. One can use such a list to search

for a counter-example to some conjecture, to find the best solution among all solutions, or to test and analyze an algorithm for its correctness or computational complexity. Many algorithms to generate a particular class of objects without repetition are already known [AR06, KN05,

ZS98, NU03, YN04, FL79, NU04, NU05, BS94].

There are many applications of distributing objects to bins. In these days of automation, machines may need to distribute objects among candidates optimally. Generating all distributions also has many applications in computer science. In computer networks, suppose there are several communication channels and several processes want to use the channels. We can think of the communication channels with different bandwidths as our symbolic objects and the processes as bins. To find out which distribution is better, taking into account congestion, QoS, channel capacity and other factors, we may need to calculate these values for each solution. Then we may choose the optimal one, and the next distributions may depend on this distribution, i.e., we may want to associate priorities with the processes. Generating all distributions also has applications in client-server broker distributed architectures, CPU scheduling, memory management, multiprocessor systems, etc. [T02, T04].


Generally, generating algorithms produce huge outputs, and the outputs dominate the run-

ning time of the generating algorithms. So, if we can compress the outputs, then it considerably

improves the efficiency of the algorithm. Therefore, many generating algorithms output solu-

tions in an order such that each solution differs from the preceding one by a very small amount,

and output each solution as the “difference” from the preceding one. Such orderings of solutions

are known as Gray codes [S97, KN05, AR06].

Klingsberg [K82] gave an average constant time algorithm for sequential listing of the com-

position of an integer n into k parts. Using an efficient tree traversal technique, we improved the time complexity to constant time (in the ordinary sense) and gave an efficient algorithm to generate all distributions of identical objects to bins in the previous chapter. We used an efficient generation method based on the family tree structure of the distributions. However, in this chapter we are interested in generating all distributions of distinguishable objects. This problem is more difficult than the identical case since the solution space is larger. If we apply the algorithm for identical objects to distinguishable objects, some distributions will be omitted. Hence the algorithm for generating identical objects is not applicable to distinguishable objects.

The problem of generating all distributions of distinguishable objects can be viewed as generating multiset partitions when the partitions are “fixed”, “numbered” and “ordered”. That means the number of partitions is fixed, the partitions are numbered and the numbers assigned to bins are not altered. There is no known algorithm that generates all multiset partitions in constant time. So, our algorithm is the first algorithm that generates multiset partitions in constant time when the partitions are fixed, numbered and ordered. Kawano and Nakano [KN06] gave an algorithm to generate multiset partitions, but the algorithm is complex and does not give solutions in constant time in the ordinary sense. Their algorithm is based on family tree recombination and generates solutions in O(k) time when there are k types of elements in the set. Their method is not applicable here since our partitions are fixed, ordered and numbered. On the other hand, our algorithm is simple and generates each solution in constant time for fixed, numbered and ordered partitions.

In this chapter we give an algorithm to generate all distributions of n distinguishable objects to m bins where the objects fall into k different classes. Here, the number of bins is fixed and the bins are numbered and ordered. The algorithm is simple and generates each distribution in constant time on average without repetition. Our algorithm generates a new distribution from an existing one by making a constant number of changes and outputs each distribution as the difference from the preceding one. The main feature of our algorithm is that we define

a tree structure, that is parent-child relationships, among those distributions (see Figure 4.1).

In such a “tree of distributions”, each node corresponds to a distribution of objects to bins and

each node is generated from its parent in constant time. In our algorithm, we construct the tree

structure among the distributions in such a way that the parent-child relation is unique, and

hence there is no chance of producing duplicate distributions. Our algorithm also generates the

distributions in place, that is, the space complexity is linear.

Figure 4.1: The Family Tree T3,3,2.

Later, we give a new algorithm to traverse the tree efficiently. This algorithm outputs each distribution in constant time in the ordinary sense (not in the average sense). Thus we can regard the derived sequence of the outputs as a combinatorial Gray code [S97, KN05, R00] for distributions. To the best of our knowledge, our algorithm is the first algorithm to generate all distributions in constant time per distribution in the ordinary sense. Then, we extend our algorithm for the case when the bins have priorities associated with them. In this case, the bins are numbered in the order of priority, and the sequence of generations maintains an order so that the successive generations respect the priorities.

The rest of the chapter is organized as follows. Section 4.2 gives some definitions. Section

4.3 deals with generating all distributions of distinguishable objects to bins. In Section 4.4, we

present the improved tree traversal algorithm that generates each solution in O(1) time. In

Section 4.5, we consider the case when priorities are associated with bins. Finally, Section 4.6 concludes the chapter.

4.2 Preliminaries

In this section we define some terms used in this chapter.

Let G be a connected graph with n vertices. A tree is a connected graph without cycles.

A rooted tree is a tree with one vertex r chosen as root. A leaf in a tree is a vertex of degree

1. Each vertex in a tree is either an internal vertex or a leaf. A family tree is a rooted tree

with parent-child relationship. The vertices of a rooted tree have levels associated with them.

The root has the lowest level, i.e., 0. The level of any other vertex is one more than the level of its parent. Vertices with the same parent v are called siblings. The siblings may be ordered

as c1, c2, . . . , cl where l is the number of children of v. If the siblings are ordered then ci−1 is the

left sibling of ci for 1 < i ≤ l and ci+1 is the right sibling of ci for 1 ≤ i < l. The ancestors of a

vertex other than the root are the vertices in the path from the root to this vertex, excluding

the vertex and including the root itself. The descendants of a vertex v are those vertices that

have v as an ancestor. A leaf in a family tree has no children.

For a positive integer n and k < n, a set partition is a partition of {1, 2, . . . , n} into k non-empty subsets. For instance, for n = 4 and k = 2 there are seven such partitions:

{1, 2, 3} ∪ {4}, {1, 2, 4} ∪ {3}, {1, 3, 4} ∪ {2}, {2, 3, 4} ∪ {1}, {1, 2} ∪ {3, 4}, {1, 3} ∪ {2, 4}, {1, 4} ∪ {2, 3}.

A simple set is a set of elements where all the elements are identical. A multiset is a set of elements where not all the elements are identical. The elements of a multiset fall into different classes, where the elements in the same class are identical but are distinguishable from those of other classes. For example, {1, 1, 2, 3, 1, 3, 2, 2} is a multiset.

For positive integers n, m and k, let A ∈ D(n,m, k) be a distribution of n objects to m

bins where the objects fall into k classes. Let, nj represent the number of objects in the jth

class where 1 ≤ j ≤ k. Clearly, n1 + n2 + . . . + nk = n since every object must be in a class.

The bins are ordered and numbered as B1, B2, . . . , Bm. Each bin contains objects of different

classes. We order the different types of objects in a bin so that we can keep track of objects

of different classes. For each A ∈ D(n, m, k), we define a unique sequence (a1, a2, . . . , am),

where ai represents an inner sequence of positive integers (ti1, ti2, . . . , tik) where tij represents

the number of objects of jth type in ith bin Bi, for 1 ≤ i ≤ m, 1 ≤ j ≤ k. The sequence for A

is unique for each distribution because the bins are ordered and numbered and also the objects

of different types are ordered. For example, the sequence ((0, 0), (2, 1)) represents 2 bins, because there are 2 inner sequences, and 3 objects, which is the sum of all the integers in the sequence; there are 2 classes of objects, with 2 objects from class 1 and 1 object from class 2. The second bin contains all 3 objects, i.e., 2 objects from class 1 and 1 object from class 2, and the first bin is empty (see Figure 4.2).

Figure 4.2: Representation of a distribution of 3 objects to 2 bins where the objects fall into two classes: 2 objects from class 1 and 1 object from class 2.

For each such sequence of sequences in D(n,m, k), we have the following equations:

\sum_{i=1}^{m} \sum_{j=1}^{k} t_{ij} = n, \qquad (4.1)

\sum_{i=1}^{m} t_{ij} = n_j, \quad \text{for } 1 \le j \le k, \text{ and} \qquad (4.2)

\sum_{j=1}^{k} n_j = n. \qquad (4.3)

Equation 4.1 states that the sum of all the integers in the sequence of sequences is equal to the total number of objects. This holds because the number of objects is fixed and every object is distributed to some bin. Equation 4.2 states that all objects of the same class appear in the sequence. Since every object must belong to a class, Equation 4.3 holds.

For positive integer k, let a be a sequence of positive integers t1, t2, . . . , tk where tj ≥ 0 for

1 ≤ j ≤ k. We call a a zero sequence if t1 = t2 = · · · = tk = 0. That means all the integers in a zero sequence are 0. We call a a nonzero sequence if there exists an index j, 1 ≤ j ≤ k, such that tj ≠ 0. That means a sequence is nonzero if at least one of the integers in the sequence is nonzero. Let b be another sequence of positive integers u1, u2, . . . , uk.

By addition of two sequences a + b we mean the addition of corresponding elements tj + uj

where 1 ≤ j ≤ k. Similarly, by subtraction of two sequences a − b we mean the subtraction

of corresponding elements tj − uj where 1 ≤ j ≤ k and by equality of two sequences a = b we

mean the equality of corresponding elements tj = uj where 1 ≤ j ≤ k.

A listing of combinatorial objects is said to be in Gray code order if each successive object in the listing differs from the preceding one by a constant amount, for example, by the swapping of two elements or the flipping of a bit. In this chapter, we establish such an ordering of all distributions of objects to bins so that each distribution can be generated by making a constant amount of changes to the preceding distribution in the order.

4.3 Generating Distribution of Distinguishable Objects

In this section we give an algorithm to generate all distributions of distinguishable objects to

bins. For that purpose we define a unique parent-child relationship among the distributions in

D(n,m, k) so that the relationship among the distributions can be represented by a tree with

a suitable distribution as the root. Figure 4.1 shows such a tree of distributions where each


distribution in the tree is in D(3, 3, 2). Once such a parent-child relationship is established,

we can generate all the distributions in D(n,m, k) using the relationship. We do not need to

build or store the entire tree of distributions at once, rather we generate each distribution in

the order it appears in the tree structure.

In Section 4.3.1 we define a tree structure among distributions in D(n,m, k) and in Section

4.3.2 we present our algorithm which generates each solution in O(1) time on average.

4.3.1 The Family Tree

In this section we define a tree structure Tn,m,k among distributions in D(n,m, k).

For positive integers n, m and k, let, A ∈ D(n,m, k) be a distribution of n objects to

B1, B2, . . . , Bm bins where the objects fall into k classes. Let, nj represent the number of

objects in the jth class where 1 ≤ j ≤ k. From Equation 4.3, n1 + n2 + . . . + nk = n. For each

A ∈ D(n,m, k), we define a unique sequence of sequences of positive integers (a1, a2, . . . , am),

where ai represents a sequence of integers (ti1, ti2, . . . , tik) where tij represents the number of

objects of jth type in ith bin Bi, for 1 ≤ i ≤ m, 1 ≤ j ≤ k.

Now we define the family tree Tn,m,k as follows. Each node in Tn,m,k represents a distribution

in D(n,m, k). If there are m bins then there are m levels in Tn,m,k. A node is at level i, 0 ≤ i < m, in Tn,m,k if tlj = 0 for 1 ≤ j ≤ k, 1 ≤ l < (m − i), and am−i is a nonzero sequence. So, a node at level m − 1 has no leftmost inner zero sequence before its leftmost inner nonzero sequence. As the level increases the number of leftmost inner zero sequences decreases and vice versa. Since the family tree is a rooted tree we need a root, and the root is a node at level 0. One can observe that a node is at level 0 in Tn,m,k if tlj = 0 for 1 ≤ j ≤ k, 1 ≤ l < m, and am is a nonzero sequence. We also have from Equation 4.1 that ∑_{i=1}^{m} ∑_{j=1}^{k} tij = n. Substituting the values of tlj for 1 ≤ j ≤ k, 1 ≤ l < m, we find that ∑_{j=1}^{k} tmj = n. By using Equation 4.2 and Equation 4.3, we get tmj = nj where 1 ≤ j ≤ k. Thus we can say that there can be exactly one such node, which is our root. So, the sequence for the root is ((0, . . . , 0), (0, . . . , 0), . . . , (0, . . . , 0), (n1, n2, . . . , nk)). In other words, the number of leftmost inner zero sequences before any inner nonzero sequence is greater in the root than in any other sequence for any distribution in D(n,m, k).


To construct Tn,m,k, we define two types of relationships: (a) Parent-child relationship and

(b) Child-parent relationship among the distributions in D(n,m, k) which are discussed in the

following sections.

Child-Parent Relationship

It is convenient to consider the child-parent relationship before the parent-child relationship.

Let, A ∈ D(n,m, k) be a sequence of sequences (a1, a2, . . . , am) which is not a root sequence,

where al represents a sequence of integers tlj for 1 ≤ j ≤ k, 1 ≤ l ≤ m. The sequence A

corresponds to a node of level i, 1 ≤ i < m. So, we have tlj = 0 for 1 ≤ j ≤ k, 1 ≤ l < (m − i), and am−i is a nonzero sequence. We now define a unique parent sequence of A at level i − 1.

Let P(A) ∈ D(n,m, k) be the parent sequence of A. We define the sequence for P(A) as (p1, p2, . . ., pm−i, pm−i+1, . . ., pm), 1 ≤ i < m, where p1 = p2 = . . . = pm−i are zero sequences, pm−i+1 = am−i + am−i+1, and pl = al for m − i + 1 < l ≤ m. Thus, we observe that P(A) is a node of level i − 1, 1 ≤ i < m, and so p1, p2, . . . , pm−i are zero sequences and pm−i+1 is a nonzero sequence for 1 ≤ i < m. Thus for each consecutive level we only deal with the two inner sequences am−i and am−i+1, and the rest of the sequences remain unchanged. The number of leftmost inner zero sequences increases in the parent sequence by applying the child-parent relationship. For example,

the solution ((1, 1), (1, 0)), for n = 3, m = 2, k = 2 and n1 = 2, n2 = 1, is a node of level 1

because a1 is a nonzero sequence. It has a unique parent ((0, 0), (2, 1)) as shown in Figure 4.3.
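The child-parent step can be sketched in a few lines of Python (the function name and the list-of-lists encoding of a distribution are illustrative only, not part of the thesis): the leftmost nonzero inner sequence is merged into the inner sequence that follows it.

def parent(A):
    """Return the parent of a non-root distribution A (a list of m lists of counts):
    merge the leftmost nonzero inner sequence into the inner sequence after it."""
    pos = next(i for i, seq in enumerate(A) if any(seq))   # leftmost nonzero block
    P = [list(seq) for seq in A]
    P[pos + 1] = [x + y for x, y in zip(A[pos], A[pos + 1])]
    P[pos] = [0] * len(A[pos])
    return P

# Example from the text: the parent of ((1, 1), (1, 0)) is ((0, 0), (2, 1)).
print(parent([[1, 1], [1, 0]]))   # [[0, 0], [2, 1]]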

Parent-Child Relationship

The parent-child relationship is just the reverse of child-parent relationship. Let, A ∈ D(n,m, k)

be a sequence (a1, a2, . . . , am), where al represents a sequence of integers tlj for 1 ≤ j ≤ k,

1 ≤ l ≤ m. The sequence A corresponds to a node of level i, 0 ≤ i < m. So, we have

tlj = 0 for 1 ≤ j ≤ k, 1 ≤ l < (m − i) and a(m−i) is a nonzero sequence. Like the child-

parent relationship here we also deal with only two inner sequences in the sequence. From

the child-parent relationship, one can observe that the number of children of A is equal to

(∏k

j=1(t(m−i)j + 1))− 1.


Let Cp(A) ∈ D(n,m, k) be the sequence of the pth child of A, 1 ≤ p ≤ (∏_{j=1}^{k} (t(m−i)j + 1)) − 1. We define the sequence for Cp(A) as (c1, c2, . . . , cm−i−1, cm−i, . . . , cm), 0 ≤ i < m, where c1, c2, . . . , cm−i−2 are zero sequences, cm−i−1 = f(p, am−i), cm−i = am−i − f(p, am−i) and cl = al for m − i + 1 ≤ l ≤ m. Here f(p, am−i) is a sequence of integers (i1, i2, . . . , ik) that depends on p and am−i, and as p varies, f(p, am−i) ranges over the (∏_{j=1}^{k} (t(m−i)j + 1)) − 1 possible nonzero sequences. Thus Cp is a node of level i + 1, 0 ≤ i < m − 1, and so c1, c2, . . . , cm−i−2 are zero sequences and cm−i−1 is a nonzero sequence.

So, in going from a node to a child we only deal with the two inner sequences am−i−1 and am−i; the rest of the sequences remain unchanged. The number of leftmost zero sequences decreases in the child sequence by applying the parent-child relationship. For example, the solution ((0, 0), (2, 1)), for n = 3, m = 2, k = 2 and n1 = 2, n2 = 1, is a node of level 0 because a1 is a zero sequence and a2 is not a zero sequence. Here, (∏_{j=1}^{k} (t(m−i)j + 1)) − 1 = (2 + 1) · (1 + 1) − 1 = 5, so it has 5 children, namely ((1,0),(1,1)), ((2,0),(0,1)), ((0,1),(2,0)), ((1,1),(1,0)) and ((2,1),(0,0)), as shown in Figure 4.3.

Figure 4.3: The sequence ((0, 0), (2, 1)) has five children.
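The parent-child step can be sketched in the same illustrative style (again, the names and the encoding are ours): every nonzero choice of f(p, am−i) gives one child, which reproduces the five children of ((0, 0), (2, 1)) listed above.

from itertools import product

def children_of(A, level):
    """List the children of distribution A (a list of m lists) at the given level:
    split the leftmost nonzero inner sequence a_(m-i) into a nonzero part f and the
    remainder a_(m-i) - f."""
    m = len(A)
    pos = m - 1 - level                          # 0-based index of a_(m-i)
    kids = []
    for f in product(*(range(t + 1) for t in A[pos])):
        if not any(f):                           # the all-zero split is excluded,
            continue                             # hence prod(t+1) - 1 children
        child = [list(seq) for seq in A]
        child[pos - 1] = list(f)
        child[pos] = [t - x for t, x in zip(A[pos], f)]
        kids.append(child)
    return kids

# ((0, 0), (2, 1)) at level 0 has (2 + 1)(1 + 1) - 1 = 5 children.
print(len(children_of([[0, 0], [2, 1]], 0)))     # 5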

The Family Tree

From the above definitions we can construct the family tree Tn,m,k. We take the sequence Ar = (a1, a2, . . . , am) as the root, where a1, a2, . . . , am−1 are zero sequences and am = (n1, n2, . . . , nk),

as we mentioned before. The family tree Tn,m,k for the distributions in D(n,m, k) is shown

in Figure 4.1. Based on the above parent-child relationship, the following lemma proves that

every distribution in D(n,m, k) is present in Tn,m,k.

Lemma 4.3.1 For any distribution A ∈ D(n,m, k), there is a unique sequence of distributions

that transforms A into the root Ar of Tn,m,k.


Proof. Let, A ∈ D(n,m, k) be a sequence, where A is not the root sequence. We determine

the level of A in the family tree Tn,m,k. Then by applying child-parent relationship, we find the

parent sequence P (A) of A. Now if P(A) is the root sequence, then we stop. Otherwise, we

apply the same procedure to P (A) and find its parent P (P (A)). By continuously applying this

process of finding the parent sequence of the derived sequence, we have the unique sequence

A,P (A), P (P (A)), . . . of sequences in D(n,m, k) which eventually ends with the root sequence

Ar of Tn,m,k. We observe that P(A) has at least one more zero sequence than A. Thus A, P(A), P(P(A)), . . . never leads to a cycle, and the level of the derived sequence keeps decreasing until it reaches the level of the root sequence Ar. Q.E.D.

Lemma 4.3.1 ensures that there can be no omission of distributions in the family tree Tn,m,k.

Since there is a unique sequence of operations that transforms a distribution A ∈ D(n,m, k) into

the root Ar of Tn,m,k, by reversing the operations we can generate that particular distribution,

starting from the root. We now have to make sure that the family tree Tn,m,k represents distributions

without repetition. Based on the parent-child and child-parent relationships, the following

lemma proves this property of Tn,m,k.

Lemma 4.3.2 The family tree Tn,m,k represents distributions in D(n,m, k) without repetition.

Proof. Given a sequence A ∈ D(n,m, k), the children of A are defined in such a way that no

other sequence in D(n,m, k) can generate same child. Let A,B ∈ D(n, m, k) be two different

sequences at level i of Tn,m,k. For a contradiction, assume that A and B generate the same

child C. Then C is a sequence of level i + 1 of Tn,m,k. The sequences for A, B and C are aj, bj

and cj for 1 ≤ j ≤ m. Clearly, al = bl for 1 ≤ l ≤ m − i − 1, and the parent-child relationship yields al = bl = cl for m − i + 1 ≤ l ≤ m. Therefore al = bl for every l ≠ m − i, 1 ≤ l ≤ m. But we have a1 + a2 + . . . + am = b1 + b2 + . . . + bm by Equation 4.1. Then al must be equal to

bl, for 1 ≤ l ≤ m. This implies that A and B are the same sequence, a contradiction. Hence

every sequence has a single and unique parent. Q.E .D.


4.3.2 The Algorithm

In this section, we give an algorithm to construct Tn,m,k and generate all distributions.

If we can generate all child sequences of a given sequence in D(n,m, k), then in a recursive

manner we can construct Tn,m,k and generate all sequences in D(n,m, k). We have the root sequence Ar = ((0, . . . , 0), (0, . . . , 0), . . . , (0, . . . , 0), (n1, n2, . . . , nk)). We get the child sequence Ac by using the parent-child relation discussed above.

Procedure Find-All-Child-Distributions( A = ((t11, t12, . . . , t1k), (t21, t22, . . . , t2k), . . . , (tm1, tm2, . . . , tmk)), i )
{ A is the current sequence, i indicates the current level, Ac is the child sequence }
begin
    Output A; { output the difference from the previous distribution }
    for ik = 0 to t(m−i)k
        for i(k−1) = 0 to t(m−i)(k−1)
            . . .
                for i1 = 0 to t(m−i)1
                    Find-All-Child-Distributions( Ac = ((t11, . . . , t1k), (t21, . . . , t2k), . . . , (t(m−i−2)1, . . . , t(m−i−2)k), (i1, i2, . . . , ik), (t(m−i)1 − i1, t(m−i)2 − i2, . . . , t(m−i)k − ik), . . . , (tm1, . . . , tmk)), i + 1 );
end;

Algorithm Find-All-Distributions(n, m)
begin
    Find-All-Child-Distributions( Ar = ((0, . . . , 0), (0, . . . , 0), . . . , (0, . . . , 0), (n1, n2, . . . , nk)), 0 );
end.
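As an illustration only, the procedure above can be written as the following self-contained Python sketch (the function names and the list-of-lists encoding are ours, and the traversal is depth first but not necessarily in the exact order of the procedure); for D(3, 3, 2) with n1 = 2 and n2 = 1 it yields the 18 distributions.

from itertools import product

def children(A, level, m):
    """Yield the children of distribution A (a list of m lists of counts) at the
    given level of the family tree T_{n,m,k}; leaves (level m-1) have no children."""
    if level == m - 1:
        return
    pos = m - 1 - level                         # 0-based index of a_(m-i)
    for split in product(*(range(t + 1) for t in A[pos])):
        if not any(split):                      # skip the all-zero split
            continue
        child = [list(seq) for seq in A]
        child[pos - 1] = list(split)
        child[pos] = [t - s for t, s in zip(A[pos], split)]
        yield child

def generate(n_per_class, m):
    """Generate every distribution in D(n, m, k) by traversing the family tree."""
    k = len(n_per_class)
    root = [[0] * k for _ in range(m - 1)] + [list(n_per_class)]
    stack = [(root, 0)]
    while stack:
        A, level = stack.pop()
        yield A
        for child in children(A, level, m):
            stack.append((child, level + 1))

# Sanity check: D(3, 3, 2) with n1 = 2, n2 = 1 contains 18 distributions.
print(sum(1 for _ in generate((2, 1), 3)))      # 18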

Lemmas 4.3.1 and 4.3.2 ensure that Algorithm Find-All-Distributions generates all

distributions without repetition. We now have the following theorem.


Theorem 4.3.3 The algorithm Find-All-Distributions runs in O(|D(n,m, k)|) time and

uses O(mk) space.

Proof. We traverse the family tree Tn,m,k and output each sequence at each correspond-

ing vertex of Tn,m,k when we visit the vertex for the first time. Hence, the algorithm takes

O(|D(n,m, k)|) time i.e. constant time on average for each output. Our algorithm outputs

each distribution as the difference from the previous one. The data structure that we use to

represent the distribution is a sequence of sequences of integers where each integer represents

the number of objects of a particular class in a particular bin. Therefore, the memory requirement

is O(mk), where k is the number of types of objects and m is the number of bins. Q.E .D.

4.4 Efficient Tree Traversal

The algorithm in Section 4.3 generates all sequences in D(n,m, k) in O(|D(n,m, k)|) time. Thus

the algorithm generates each sequence in O(1) time “on average”. However, after generating

a sequence corresponding to the last vertex in the largest level in a large subtree of Tn,m,k, we

have to merely return from the deep recursive call without outputting any sequence and hence

we cannot generate each sequence in O(1) time (in ordinary sense). In this section we present

the improved tree traversal algorithm that generates each solution in O(1) time (in ordinary

sense).

To make the algorithm efficient we introduce two additional types of relations:

(i) Relationship between left sibling and right sibling and

(ii) Leaf-ancestor relationship.

In Section 4.4.1 we define the relationship between left sibling and right sibling. In Section

4.4.2, we illustrate the leaf-ancestor relationship. Section 4.4.3 shows the data structure that

we use to represent a distribution A ∈ D(n, m, k). Finally, in Section 4.4.4 we present our

efficient tree traversal algorithm.


4.4.1 Relationship Between Left Sibling and Right Sibling

The relationship between left sibling and right sibling is defined so that the difference between a distribution A and its right sibling As, if it exists, is minimum. Thus As can be generated from A with minimum effort. To generate the right sibling sequence As from the left sibling sequence A in a constant number of steps, we first generate the parent from the left sibling using the child-parent relationship and then generate the next child using the parent-child relationship. Thus we always require two steps to generate a right sibling from a left sibling, which is a constant-time operation.

Let, A ∈ D(n,m, k) be a sequence of sequences (a1, a2, . . . , am) which is not a root sequence,

where al represents a sequence of integers tlj for 1 ≤ j ≤ k, 1 ≤ l ≤ m. The sequence A

corresponds to a node of level i, 0 ≤ i < m. So, we have tlj = 0 for 1 ≤ j ≤ k, 1 ≤ l < (m− i)

and a(m−i) is not a zero sequence. We say that the right sibling As ∈ D(n,m, k) of this node A exists if a(m−i+1) is a nonzero sequence, that is, t(m−i+1)j ≠ 0 for some j, 1 ≤ j ≤ k. Then we call the sequence A the left sibling of As. We define the sequence for As as (s1, s2, . . . , sm−i, sm−i+1, . . . , sm), 1 ≤ i < m, where s1, s2, . . . , sm−i−1 are zero sequences, sj = aj for m − i + 2 ≤ j ≤ m, and to find sm−i, sm−i+1 we apply the child-parent relationship and then the parent-child relationship. Thus As is a node of level i, 1 ≤ i < m, and so s1, s2, . . . , sm−i−1 are zero sequences and sm−i is a nonzero sequence. For example, the solution ((0, 0), (1, 0), (1, 1)), for n = 3, m = 3, k = 2 and n1 = 2, n2 = 1, is a node of level 1 because a1 is a zero sequence and a2 is a nonzero sequence. It has a unique right

sibling ((0, 0), (2, 0), (0, 1)) as shown in Figure 4.4.

Figure 4.4: Efficient traversal of the family tree T3,3,2.
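The two-step sibling computation can be sketched as follows (illustrative Python; the argument next_split stands for the next value of f(p, am−i) that the parent's enumeration would produce, which the real algorithm obtains from its loop counters).

def right_sibling(A, level, next_split):
    """One child-parent step followed by one parent-child step: merge a_(m-i) back
    into a_(m-i+1) and then split off next_split, a constant number of operations."""
    m = len(A)
    pos = m - 1 - level                          # 0-based index of a_(m-i)
    merged = [x + y for x, y in zip(A[pos], A[pos + 1])]
    S = [list(seq) for seq in A]
    S[pos] = list(next_split)
    S[pos + 1] = [t - x for t, x in zip(merged, next_split)]
    return S

# Example from the text (n = 3, m = 3, k = 2): with next split (2, 0), the right
# sibling of ((0,0),(1,0),(1,1)) is ((0,0),(2,0),(0,1)).
print(right_sibling([[0, 0], [1, 0], [1, 1]], 1, (2, 0)))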


4.4.2 Leaf-Ancestor Relationship

To avoid returning from deep recursive call without outputting any sequence, we define leaf-

ancestor relationship. After generating the sequence Al of the last vertex in the largest level

i.e. rightmost leaf, we do not return to parent. Instead, we return to the nearest ancestor

Aa which has right sibling. By rightmost leaf we mean that leaf which has no right sibling.

Thus this leaf-ancestor relation saves many non generation steps. Another reason of defining

leaf-ancestor relationship is that the nearest ancestor can be generated from the leaf sequence

by just a simple swap operation between two inner sequences in the sequence. This is possible

due to the data structure that we use for this case (as described in the following subsection).

For the swap operation we just swap the pointers to sequences. The other inner sequences

remain unchanged.

Let, Al ∈ D(n,m, k) be the sequence of sequences (a1, a2, . . . , am) of a leaf, where ap represents a sequence of integers tpj for 1 ≤ j ≤ k, 1 ≤ p ≤ m. The sequence Al corresponds to a node of level m − 1, so a1 is a nonzero sequence. We say that the ancestor sequence Aa ∈ D(n,m, k) of this node Al exists if a2 is a zero sequence, that is, if Al has no right sibling. We define a unique ancestor sequence of Al at level m − 1 − q, where a2, a3, . . . , aq+1 are zero sequences and aq+2 is a nonzero sequence. This means we want to skip the long run of inner zero sequences in the sequence for Al. The nearest ancestor sequence is determined by the number of these zero sequences, which we denote by q. This q determines the level and the sequence of the nearest ancestor Aa which has a right sibling.

We define the sequence for Aa as (s1, s2, . . . , sq, sq+1, . . . , sm), where s1, s2, . . . , sq are zero sequences, sq+1 = a1 and sj = aj for q + 1 < j ≤ m. In other words, we just swap the inner sequences a1 and aq+1 and the rest of the inner sequences remain unchanged. For example, in Figure 4.4 the solution ((2,0),(0,0),(0,1)), for n = 3, m = 3, k = 2 and n1 = 2, n2 = 1, is a leaf at level 2 because a1 is a nonzero sequence. It has a unique ancestor ((0, 0), (2, 0), (0, 1)), which is obtained by swapping the first and second inner sequences. We have the following lemma on the nearest ancestor Aa of Al.

Lemma 4.4.1 Let Al be a leaf sequence of Tn,m,k having no right sibling. Then Al has a unique


ancestor sequence Aa in Tn,m,k. Furthermore, either Aa has a right sibling in Tn,m,k or Aa is

the root Ar of Tn,m,k.

Proof. Let the sequence for Al ∈ D(n,m, k) be (a1, a2, . . . , am) and it corresponds to a node

of level m− 1 of Tn,m,k. Note that a1 is a nonzero sequence and a2 is a zero sequence. We get

the sequence for Aa by swapping a1 and aq+1 where q is the number of consecutive inner zero

sequences after a1. Clearly Aa is an ancestor of Al. Note that Aa is at level m − 1 − q. By Lemma 4.3.2, the parent-child relation is unique. Hence, by repeatedly applying the child-parent relation

on Al, we will reach a unique ancestor at level m− 1− q. For q = m− 1, one can observe that

we get the root sequence Ar by swapping a1 and am. For 1 ≤ q < m − 1, we get the unique

ancestor sequence Aa which has a right sibling. Q.E .D.

Lemma 4.4.1 ensures that Al has a unique ancestor Aa. As we see later Aa plays an

important role in our algorithm. Note that we need to return to the ancestor Aa only when the current node is a leaf Al, and for a leaf sequence Al we have that a1 is a nonzero sequence. Aa is obtained from Al by swapping a1 and aq+1, where q is the number of consecutive zero sequences after a1. Now, to find q we would have to scan the sequence Al from a1 until aq+2, checking that a2, a3, . . . , aq+1 are inner zero sequences and a1, aq+2 are inner nonzero sequences. We reduce the cost of this search by keeping extra information, as shown in Figure 4.5. The information consists of the number of runs of consecutive inner zero sequences after am−i, where i is the current level, together with the number of inner zero sequences in each run. For this we keep a stack of size m/2. The top of the stack gives the current q. Initially the stack is empty. As soon as we encounter a zero sequence, when moving from parent to child or from left sibling to right sibling, we push a 1 on the stack, and we increment the top of the stack for each consecutive zero sequence. We pop the stack when we apply the leaf-ancestor relation. The stack operations are shown in Figure 4.5. One can observe that there can be at most m/2 runs of consecutive inner zero sequences in a sequence of size m. Therefore, in the worst case we need a stack of size m/2.
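A minimal Python sketch of the leaf-ancestor step is given below (illustrative names; the explicit scan for q shown here is exactly the search that the stack of zero-run lengths avoids in the algorithm, and with the pointer representation of Section 4.4.3 the swap itself is O(1)).

def nearest_ancestor(A):
    """Jump from a leaf with no right sibling to its nearest useful ancestor by
    swapping a1 with a_(q+1), where q counts the zero sequences following a1."""
    q = 0
    while q + 1 < len(A) and not any(A[q + 1]):
        q += 1                     # in the algorithm, q is read off the top of the stack
    A = [list(seq) for seq in A]
    A[0], A[q] = A[q], A[0]
    return A

# Example from the text: the leaf ((2,0),(0,0),(0,1)) has ancestor ((0,0),(2,0),(0,1)).
print(nearest_ancestor([[2, 0], [0, 0], [0, 1]]))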


Figure 4.5: Efficient traversal of T3,3,2 keeping extra information.

4.4.3 Representation of a Distribution in D(n,m, k)

In this section we describe the data structure that we use to represent a distribution in

D(n,m, k) that will help us to generate each distribution in constant time.

The operations that we use to generate distributions are addition, subtraction, increment, decrement and swap, and the indices of the operands of all of these operations are known. So, we might think of keeping an array of integers. Since for distinguishable objects we deal with a sequence of sequences, we may want to use an array of arrays of integers, that is, a two-dimensional array of integers. But note that for applying the leaf-ancestor relationship we need to swap entire sequences of integers. With a plain two-dimensional array it takes O(k) time to swap two such rows, which is not efficient. To perform the swap operation in constant time, we use a special data structure as shown in Figure 4.6. We keep an array of pointers, one for each bin, each pointing to an array of integers. The array of integers represents the inner sequence, that is, the numbers of objects of the different types in a particular bin. The structure may be viewed as an array of objects where each object is an array of integers, in the object-oriented sense. Thus, by swapping two pointers we are able to swap the corresponding arrays in O(1) time.

Figure 4.6: Illustration of the data structure that we use to represent a distribution of distinguishable objects.


Now we examine the memory requirement of the data structure that we use. There is a sequence of pointers of size O(m), where m is the number of bins. Each pointer points to an array of integers of size O(k), where k is the number of types of objects. Thus, the memory requirement is O(mk). The other operations, namely addition, subtraction, increment and decrement, remain constant time and are not hampered by this modification of the data structure, since the underlying array structure is kept intact.
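In Python, for instance, a list of per-bin lists already behaves like the array of pointers of Figure 4.6, so the swap is a constant-time exchange of references; the snippet below is purely illustrative.

# A distribution kept as a list of m references, each pointing to a length-k list
# of counts, the Python analogue of the array of pointers in Figure 4.6.
A = [[0, 0], [2, 0], [0, 1]]       # m = 3 bins, k = 2 classes

# Swapping two inner sequences only exchanges the two references, so the
# leaf-ancestor step costs O(1) regardless of k, whereas copying element by
# element in a plain two-dimensional array would cost O(k) per swap.
A[0], A[1] = A[1], A[0]
print(A)                           # [[2, 0], [0, 0], [0, 1]]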

4.4.4 The Efficient Algorithm

In this section we present an efficient algorithm to generate all distributions in D(n,m, k). We

use three relations in this algorithm: the parent-child relation, the relation between left sibling and right sibling, and the leaf-ancestor relation. By applying the parent-child relation, we go from the root down the family tree Tn,m,k until we reach a leaf at level m − 1. Then we apply the relationship between left sibling and right sibling to traverse horizontally until we reach a node which has no right sibling. Then, by applying the leaf-ancestor relation, we return to the nearest ancestor which has a right sibling, and we again apply the relation between left sibling and right sibling. This sequence of applying relationships and generating distributions continues until we return to the root. The algorithm thus avoids non-generation steps and generates each sequence in O(1) time (in the ordinary sense).

Procedure Find-All-Child-Distributions2( A = ((t11, t12, . . . , t1k), (t21, t22, . . . , t2k), . . . , (tm1, tm2, . . . , tmk)), i )
{ A is the current sequence }
begin
    Output A; { output the difference from the previous distribution }
    if A has a child then
        Generate the first child Ac;
        Find-All-Child-Distributions2(Ac, i + 1);
    else if A has a right sibling then
        Generate the right sibling As;
        Find-All-Child-Distributions2(As, i);
    else
        Generate the ancestor Aa at level i − q which has a right sibling or which is the root;
        if Aa is the root at level 0 then
            done
        else
            Generate the right sibling Aas of Aa;
            Find-All-Child-Distributions2(Aas, i − q);
end;

Algorithm Find-All-Distributions2(n, m)
begin
    Find-All-Child-Distributions2( Ar = ((0, . . . , 0), (0, . . . , 0), . . . , (0, . . . , 0), (n1, n2, . . . , nk)), 0 );
end.

The tree traversal according to the efficient algorithm is depicted in Figure 4.4. For a

sequence we need O(mk) space, and an additional m/2 space is required for the stack. Hence the algorithm uses O(mk) space. One can observe that the algorithm generates the sequences such that each sequence in Tn,m,k is obtained from the preceding one by at most two operations. Note that if A corresponds to a vertex v of Tn,m,k which has no child and no right sibling, we need two steps, one for tracing its ancestor and the other for tracing the ancestor's right sibling. Otherwise, we need only one step to generate the next sequence. Thus the algorithm generates each sequence in O(1) time. Note that each sequence is similar to the

preceding one, since it can be obtained by at most two operations (see Figure 4.7). Thus, we can

regard the derived ordering of the sequences as a combinatorial Gray code [S97, KN05, R00] for distributions. We now have the following theorem.

Theorem 4.4.2 The algorithm Find-All-Distributions2 uses O(mk) space and generates

each distribution in D(n,m, k) in constant time (in ordinary sense).


Figure 4.7: A Gray code for D(3, 3, 2).

4.5 Generating Distributions with Priorities to Bins

In this section, we consider the case when priorities are associated with the bins. The sequence of generations maintains an order such that the bin with the highest priority receives the largest number of objects first, and the priorities of the bins are then considered one by one in decreasing order.

Our algorithm generates the distributions in a specified order of generation. For positive integers n, m and k, let A ∈ D(n,m, k) be a distribution of n distinguishable objects to m bins. The bins are ordered and numbered B1, B2, . . . , Bm and have priorities p1, p2, . . . , pm. Our order of generation is such that the rightmost bin receives the largest number of objects first and the leftmost bin the smallest, and the number of objects in the rightmost bin decreases along the sequence of generations; the last distribution gives the leftmost bin the largest number of objects and the rightmost bin the smallest. Consequently, if we arrange the bins in ascending order of priority, the generation respects the priorities: the bin with the highest priority receives the largest number of objects first, and the number of objects it holds decreases over the generations. The algorithm is the same as in the previous case; only the order of the bins changes, so that the following inequality holds: p1 ≥ p2 ≥ . . . ≥ pm. The generation is similar to Figure 4.4.
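A small illustrative sketch of the bin-ordering step is given below; it assumes, only for the purpose of the sketch, that priorities are given as ranks with 1 denoting the highest priority, so that arranging the bins in decreasing rank value yields p1 ≥ p2 ≥ . . . ≥ pm and places the highest-priority bin rightmost, where the root of the family tree puts all the objects.

def order_bins_by_priority(priorities):
    """Return the bin indices arranged so that the highest-priority bin (rank 1)
    comes last; the generation algorithm itself is unchanged."""
    return sorted(range(len(priorities)), key=lambda b: priorities[b], reverse=True)

# Hypothetical ranks for three bins B1, B2, B3: B2 has rank 1 (highest priority),
# so it is placed last and therefore receives all the objects first.
print(order_bins_by_priority([3, 1, 2]))   # [0, 2, 1]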

4.6 Conclusion

In this chapter we give a simple algorithm to generate all distributions in D(n,m, k). The algorithm generates each distribution in constant time on average using linear space. We also present an efficient tree traversal algorithm that generates each solution in O(1) time in the ordinary sense. We then extend our algorithms to the case when priorities are associated with the bins. The main feature of our algorithms is that they are constant-time solutions, which is a very important requirement for generation problems.

Chapter 5

Evolutionary Trees

5.1 Introduction

In bioinformatics, we frequently need to establish the evolutionary relationship among different types of species [JP04, KR03]. Biologists often represent this relationship in the form of binary trees. Such complete binary trees having different types of species at their leaves are known as evolutionary trees (see Figure 5.1). In a rooted evolutionary tree, the root corresponds to the most ancient ancestor in the tree. Leaves of evolutionary trees correspond to the existing species, while internal vertices correspond to hypothetical ancestral species.

Evolutionary trees are used to predict the predecessors of existing species, to speculate about future generations, to match DNA sequences, and so on. Prediction of ancestors becomes easy if all possible trees are generated. Moreover, it is useful to have the complete list of evolutionary trees having different types of species. One can use such a list to search for a counter-example to some conjecture, to find the best solution among all solutions, or to experimentally measure an

average performance of an algorithm over all possible input evolutionary trees. Many algorithms

to generate a given class of graphs without repetition, are already known [KN05, ZS98, NU03,

YN04, FL79, NU04, NU05, BS94, LBR93]. Many nice textbooks have been published on the

subject [AU95, R00, GKP94].

Let E(n) represent the set of all evolutionary trees with n distinct species. The number



Figure 5.1: The evolutionary tree having four species.

of such trees is exponential in general. For example, assume we want to find all possible

evolutionary trees of three species, say Bear, Panda and Monkey. There are three distinct evolutionary trees having these species as leaves, as shown in Figure 5.2.

Figure 5.2: All possible evolutionary trees having three species.

In this thesis we first consider the problem of generating all possible evolutionary trees. The

main challenges in finding algorithms for enumerating all evolutionary trees are as follows. Firstly,

the number of such trees is exponential in general and hence listing all of them requires huge

time and computational power. Secondly, generating algorithms produce huge outputs and the

outputs dominate the running time. For this reason, reducing the amount of output is essential.

Thirdly, checking for any repetitions must be very efficient. Storing the entire list of solutions

generated so far will not be efficient, since checking each new solution against the entire list to prevent repetition would require a huge amount of memory and the overall time complexity would

be very high. So, if we can compress the outputs, then it considerably improves the efficiency

of the algorithm. Therefore, many generating algorithms output objects in an order such that

each object differs from the preceding one by a very small amount, and output each object as

the “difference” from the preceding one.

Generating evolutionary trees is more like generating complete binary rooted trees with fixed

and labeled leaves. That means there is a fixed number of leaves and the leaves are labeled.


There are some existing algorithms for generating rooted trees with n vertices [KN05, NU03,

NU04, NU05, LBR93, BS94]. But these algorithms do not guarantee that there will be fixed

and labeled leaves. If we generate all binary trees with n leaves with the existing algorithms, then we have to label each tree and permute the labels to generate all trees. Since the siblings are not ordered, permuting the labels leads to repetition. Thus we cannot generate all evolutionary trees by simply modifying the existing algorithms.

In this thesis we first give an algorithm to generate all evolutionary trees with n species.

The problem is difficult to solve because the solution space is large and we have to generate all solutions without repetition. For instance, if there are 4 species the number of possible evolutionary trees is 15, but for 5 species we already have 105 solutions. Moreover, in this case the siblings do not maintain any order. Hence the main challenges are to avoid mirror repetition and sibling repetition. We generate each tree in such a way that the siblings are ordered, and we maintain a special structure of the tree so that mirror repetition does not occur. Our algorithm is simple and generates each tree in linear time without repetition (O(1) in the amortized sense).

The rest of the chapter is organized as follows. Section 5.2 gives some definitions. Section

5.3 deals with generating all evolutionary trees with n labeled leaves. In Section 5.4 we define

the recursion tree structure among evolutionary trees in E(n) and in Section 5.5 we present our

algorithm which generates each solution in O(n) time in the worst case (O(1) in the amortized sense). Finally, Section 5.6 concludes the chapter.

5.2 Preliminaries

In this section we define some terms used in this chapter.

Let G be a connected graph with n vertices. The degree of a vertex is the number of edges incident to that vertex. A tree is a connected graph without cycles. A rooted tree is a tree with one

vertex r chosen as root. A leaf in a tree is a vertex of degree 1. Each vertex in a tree is either

an internal vertex or a leaf. A complete binary tree is a rooted tree with each internal node

having exactly two children.


An evolutionary tree is a graphical representation of the evolutionary relationship among

three or more species. In a rooted evolutionary tree, the root corresponds to the most ancient

ancestor in the tree and the path from the root to a leaf in the rooted tree is called an evo-

lutionary path. Leaves of evolutionary trees correspond to the existing species while internal

vertices correspond to hypothetical ancestral species.

In this thesis, we represent an evolutionary tree in terms of a complete binary tree. Each existing species of the evolutionary tree is a leaf in the complete binary tree (see Figure 5.3). We give a label to each leaf; the label identifies the existing species. For example, labels A, B, C and D represent Bear, Panda, Raccoon and Monkey. The labels are fixed, and we call such trees labeled trees.

Figure 5.3: Representation of an evolutionary tree in terms of a complete binary tree.

5.3 Generating Labeled Evolutionary Trees

In this section, we present an algorithm to generate all evolutionary trees with n species.

Generating all evolutionary trees in E(n) is difficult to solve because the solution space is

large. For instance, if there are 4 species, the number of possible evolutionary trees is 15 but

for 5 species we have 105 solutions. The main challenges here are to avoid mirror repetition

and sibling repetition. Two evolutionary trees with n species are mirror images of each other if one can be obtained by taking the mirror image of the other. Similarly, two evolutionary trees with n species are sibling equivalent to each other if one can be obtained from the other by changing the order of the children of some internal node. For example, the evolutionary trees of Figure 5.4(a) and (b) are mirror images of one another. The evolutionary trees of Figure 5.5(a) and (b) are sibling equivalent, since one is obtained from the other by changing the order of siblings. In this


section, we present our algorithm of generating all evolutionary trees in E(n) which avoids such

repetitions.

Figure 5.4: The two evolutionary trees of (a) and (b) are mirror images of one another.

Figure 5.5: The two evolutionary trees of (a) and (b) are sibling equivalent to one another.

The main idea of our algorithm is to generate all possible subtrees of the evolutionary trees

in E(n). Then we assign numbers to the subtrees in a way that mirror repetition does not occur.

We call the set of subtrees that make up an evolutionary tree e ∈ E(n), the partial solution of

e. Then we recombine the subtrees in the set in an efficient manner such that sibling repetition

does not occur. For that purpose we define a recursion tree structure among the set of all

subtrees. A recursion tree is a family tree where each leaf is a solution and each internal node

is a partial solution e.g. set of subtrees. Along the path from root to a leaf we move towards a

solution. Hence to generate solutions we define a unique parent-child relationship among the

set of all subtrees of all evolutionary trees in E(n) so that the relationship among the set of

subtrees can be represented by a recursion tree with a suitable set as the root. Figure 5.6 shows

such a recursion tree of 4 species. Once such a parent-child relationship is established, we can

generate all the evolutionary trees in E(n) using the relationship. We do not need to build or


store the entire recursion tree at once, rather we generate each evolutionary tree in the order

it appears in the recursion tree structure.

Figure 5.6: The Recursion Tree R4.

5.4 The Recursion Tree

In this section we define a recursion tree structure Rn among the evolutionary trees in E(n).

For that purpose we assign the numbers 1, 2, . . . , n to the species so that each species has an identity. This in turn helps us avoid repetitions.

For a positive integer n, let S(n) be the set of all sets of subtrees that make up the evolutionary trees in E(n). Let A ∈ S(n) be a set of subtrees t1, t2, . . . , tk, where k denotes the number of subtrees in the set A. The set is ordered, so we call A a sequence of subtrees. We assign a number ai to each subtree ti for 1 ≤ i ≤ k. Thus we get a sequence of integers a1, a2, . . . , ak associated with A. The ais are calculated from the numbers assigned to the species: if there are m species numbered s1, s2, . . . , sm in a subtree ti, where m ≤ n − k + 1, then ai = min{s1, s2, . . . , sm}. That means ai is the lowest numbered species in the subtree ti. For example, Figure 5.7 shows a sequence of subtrees where the ai of each subtree is shown at its internal nodes. Note that if a subtree is a single leaf, then ai = si.
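The computation of ai can be sketched as a small Python function (the nested-tuple encoding of subtrees is ours and purely illustrative).

def subtree_number(subtree):
    """a_i of a subtree, i.e. the smallest species number occurring in it; a subtree
    is modelled as either a species number or a pair (left, right)."""
    if isinstance(subtree, int):
        return subtree
    left, right = subtree
    return min(subtree_number(left), subtree_number(right))

# For instance, subtrees containing the species {1,4}, {2,3}, {5} and {6}
# receive the numbers 1, 2, 5 and 6, as in Figure 5.7.
print([subtree_number(t) for t in [(1, 4), (2, 3), 5, 6]])   # [1, 2, 5, 6]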

Now we define the recursion tree Rn as follows. Each node of Rn represents a sequence of subtrees. If there are n species then there are n levels in Rn. A node is in level i in Rn if


Figure 5.7: Illustration of a sequence of subtrees A ∈ S(6).

there are n − i subtrees in the sequence of subtrees, for 0 ≤ i < n. As the level increases, the number of subtrees in the sequence decreases, and vice versa. Thus a node at level n − 1 has only one subtree, which is an evolutionary tree. Since Rn is a rooted tree we need a root, and the root is a node at level 0. One can observe that a node is at level 0 in Rn if there are n subtrees in the sequence of subtrees, and there can be exactly one such node. In this case, all the subtrees are the species themselves, i.e. aj = sj for 1 ≤ j ≤ n. For the root sequence we order the subtrees according to the numbers assigned to the species. Thus for the root sequence we have a1 < a2 < · · · < an. Clearly, the number of subtrees in the root is greater than that in any other sequence of subtrees in S(n).

To construct Rn, we define two types of relations among the sequence of subtrees in S(n):

(a) Parent-child relationship and

(b) Child-parent relationship.

We define the parent-child relationship among the sequences of subtrees in S(n) with two goals in mind. First, the difference between a sequence of subtrees A and its child C(A) should be minimum, so that C(A) can be obtained from A with minimum effort. Second, every sequence in S(n) except the root must have exactly one parent in Rn. We achieve the first goal by ensuring that a child C(A) of a sequence A can be found by simply combining two subtrees; that means A can also be recovered from its child C(A) by a simple decomposition. The second goal, that is, the uniqueness of the parent-child relationship, is illustrated in the following subsections.


5.4.1 Parent-Child Relationship

Let A ∈ S(n) be a set of subtrees t1, t2, . . . , tk, where k denotes the number of subtrees in the set A. Thus it corresponds to a node of level n − k, 1 ≤ k ≤ n, of Rn. We have the numbers ai associated with each ti for 1 ≤ i ≤ k. The sequences of subtrees of the children are defined in such a way that, to generate a child from its parent, we have to deal with only two subtrees in the sequence and the rest of the subtrees remain unchanged. The number of subtrees decreases by one in the child sequence by applying the parent-child relationship.

Let C(A) ∈ S(n) be the sequence (c1, c2, . . . , ck−1) of a child of A resulting from the recombination of ti and tj in the sequence for A, 1 ≤ i < j ≤ k. After recombination, ti becomes the left child and tj the right child of a new root, and the new subtree is denoted ti + tj. The subtree ti + tj becomes the first subtree in the sequence for C(A), i.e. c1 = ti + tj. The rest of the subtrees in the sequence remain unchanged in their order. We only recombine those ti and tj that do not lead to repetition in the recursion tree. We have the following two cases, depending on whether i = 1 or i > 1.

Case 1: i = 1

In this case t1 recombines with tj, for 1 < j ≤ k. Thus the number of such children is k − 1, and we call them Type I children. The new a1 for c1 in this case is computed as a1(new) = min{a1, aj}. For example, in Figure 5.8 the first three children are obtained by the recombinations t1 + t2, t1 + t3 and t1 + t4.

Case 2: i > 1

In this case ti recombines with tj only if a1 < ai and ai < aj, for i < j ≤ k. We call such a child a Type II child. One can observe that this case helps us avoid sibling repetition. The new a1 for c1 in this case is computed as a1(new) = ai. For example, the fourth child in Figure 5.8 is generated by the recombination t3 + t4.


Figure 5.8: Illustration of Type I and Type II children of a sequence of subtrees A ∈ S(6).
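A single recombination step, together with the two rules for the new a1, can be sketched as follows (illustrative Python; subtrees are encoded as nested tuples).

def combine(ti, tj, ai, aj, child_type):
    """Recombine two subtrees: t_i becomes the left child and t_j the right child of
    a new root. For a Type I child (i = 1) the new number is min(a_1, a_j); for a
    Type II child it is a_i."""
    new_subtree = (ti, tj)
    new_number = min(ai, aj) if child_type == 1 else ai
    return new_subtree, new_number

# Type I example of Figure 5.8: t1 + t2 gives a subtree numbered min(1, 2) = 1.
print(combine((1, 4), (2, 3), 1, 2, child_type=1))
# Type II example of Figure 5.8: t3 + t4 gives a subtree numbered a_3 = 5.
print(combine(5, 6, 5, 6, child_type=2))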

5.4.2 Child-Parent Relationship

The child-parent relation is just the reverse of the parent-child relation. Let A ∈ S(n) be a set of subtrees t1, t2, . . . , tk, where k denotes the number of subtrees in the set A. Thus it corresponds to a node of level n − k, 1 ≤ k ≤ n, of Rn. We have the numbers ai associated with each ti for 1 ≤ i ≤ k. We define a unique parent sequence of A at level n − k − 1 of Rn. As in the parent-child relationship, here we also deal with only two subtrees in the sequence, and the only operations we apply to the associated numbers are addition and assignment. The number of subtrees increases by one in the parent sequence by applying the child-parent relationship.

Let P(A) ∈ S(n) be the sequence (p1, p2, . . . , pk+1) of the parent of A. We get the sequence for the parent by decomposing the subtree t1 in the sequence for A. Let pi and pj be the resulting subtrees after the decomposition, so that pi is the left child of t1 and pj is the right child of t1. We compute ai and aj for the subtrees pi and pj. Now the following two cases occur, depending on the values of ai and aj associated with pi and pj.

Case 1: ai = a1(old) or aj = a1(old)

In this case p1 recombined with some pj, so i = 1 and A is a Type I child of P(A). We place pj in the sequence for P(A) so that the subsequence from p2 to pk+1 is sorted in ascending order. The rest of the subtrees in the sequence remain unchanged in their order (see


Figure 5.8).

Case 2: ai ≠ a1(old) and aj ≠ a1(old)

In this case A is a Type II child of P(A). We place pi and pj in the sequence for P(A) so that the subsequence from p2 to pk+1 is sorted in ascending order. The rest of the subtrees in the sequence remain unchanged in their order (see Figure 5.8).

5.4.3 The Recursion Tree

From the above definitions we can construct Rn. We take the sequence t1, t2, . . . , tn of numbered

species as the root Ar, as we mentioned before. The recursion tree Rn for the sequences of subtrees in S(n) is shown in Figure 5.6. Based on the above parent-child relationship, the

following lemma proves that every sequence of subtrees in S(n) is present in Rn.

Lemma 5.4.1 For any sequence of subtree A ∈ S(n), there is a unique sequence of sequences

of subtrees that transforms A into the root Ar of Rn.

Proof. Let A ∈ S(n) be a sequence, where A is not the root sequence. By applying

child-parent relationship, we find the parent sequence P (A) of the sequence A. Now if P(A)

is the root sequence, then we stop. Otherwise, we apply the same procedure to P (A) and find

its parent P (P (A)). By continuously applying this process of finding the parent sequence of

the derived sequence, we have the unique sequence A,P (A), P (P (A)), . . . of sequences in S(n)

which eventually ends with the root sequence Ar of Rn. We observe that P(A) has at least one subtree more than A in its sequence. Thus A, P(A), P(P(A)), . . . never leads to a cycle, and the level of the derived sequence keeps decreasing until it reaches the level of the root sequence Ar. Q.E.D.

Lemma 5.4.1 ensures that there can be no omission of sequence of subtrees in the recursion

tree Rn. Since there is a unique sequence of operations that transforms a sequence A ∈ S(n) into

the root Ar of Rn, by reversing the operations we can generate that particular sequence, starting from the root. Now we have to make sure that Rn represents sequences without repetition. Based


on the parent-child and child-parent relationships, the following lemma proves this property of

Rn.

Lemma 5.4.2 The recursion tree Rn represents sequences of subtrees in S(n) without repeti-

tion.

Proof. Given a sequence A ∈ S(n), the children of A are defined in such a way that no other

sequence in S(n) can generate the same child. For a contradiction, assume that two different sequences A, B ∈ S(n) at level i of Rn generate the same child C. Then C is a sequence of level i + 1 of Rn. Since A and B are at level i, they have the same number of subtrees in their sequences, and at least two of the subtrees are different. According to the child-parent relationship, we get P(C) by decomposing t1 in the sequence for C. Decomposing t1 always results in the same two subtrees, and the rest of the subtrees in the sequence remain unchanged. Hence, by applying the child-parent relationship we always get exactly one parent at level i, contradicting the assumption that A ≠ B. Therefore every sequence has a single, unique parent. Q.E.D.

5.5 The Algorithm

In this section we give an algorithm to construct Rn and generate all evolutionary trees in E(n). If we can generate all child sequences of a given sequence in S(n), then in a recursive manner we can construct Rn and generate all evolutionary trees in E(n). We have the root sequence Ar = (s1, s2, . . . , sn), where the si are the numbered species and s1 < s2 < · · · < sn. We get the child sequences Ac by using the parent-child relation discussed above.

Procedure Find-All-Child-Subtrees( T = (t1, t2, . . . , tn−i), A = (a1, a2, . . . , an−i), i )
{ T is the current sequence of subtrees, i indicates the current level, A is the sequence of assigned numbers and Tc is the child sequence of subtrees }
begin
    if i = n − 1 then
        Output T; { output the evolutionary tree represented by t1 }
    for j = 2 to n − i do
        Construct Tc = (t′1, t′2, . . . , t′n−i−1) with t′1 = t1 + tj;
        Update Ac = (a′1, a′2, . . . , a′n−i−1) with a′1 = min{a1, aj};
        Find-All-Child-Subtrees( Tc, Ac, i + 1 );   { Type I }
    for j = 2 to n − i − 1 do
        for k = j to n − i do
            if aj > ak then
                Construct Tc = (t′1, t′2, . . . , t′n−i−1) with t′1 = tj + tk;
                Update Ac = (a′1, a′2, . . . , a′n−i−1) with a′1 = ak;
                Find-All-Child-Subtrees( Tc, Ac, i + 1 );   { Type II }
end;

Algorithm Find-All-Evolutionary-Trees( s1, s2, . . . , sn )
begin
    Construct Tr = (t1, t2, . . . , tn) with A = (s1, s2, . . . , sn) as subtrees;
    Find-All-Child-Subtrees( Tr, A, 0 );
end.
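The counts quoted earlier, 15 trees for 4 species and 105 trees for 5 species, can be checked against the standard closed form for the number of rooted binary trees with n labeled leaves, namely (2n − 3)!!; the following small Python check is not part of the algorithm and is included only as a sanity check.

def num_evolutionary_trees(n):
    """Number of rooted binary trees with n labeled leaves: (2n-3)!! = 1*3*...*(2n-3)."""
    count = 1
    for odd in range(3, 2 * n - 2, 2):
        count *= odd
    return count

print([num_evolutionary_trees(n) for n in (3, 4, 5)])   # [3, 15, 105]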

The following theorem describes the performance of the algorithm Find-All-Evolutionary-Trees.

Theorem 5.5.1 The algorithm Find-All-Evolutionary-Trees uses O(n) space and runs in O(|S(n)|) time.


Proof. In our algorithm we only use the recombination or decomposition of subtrees to

generate a new sequence of subtrees from an old one. Thus each sequence is generated in

constant time without computational overhead. Since we traverse the recursion tree Rn and

generate each sequence at each corresponding vertex of Rn we can generate all the sequences

in S(n) without repetition. By applying the parent-child relation we can generate every child in O(1) time. Then, by using the child-parent relation, we go back to the parent sequence. Hence,

the algorithm takes O(|S(n)|) time, i.e. constant time on average for each sequence. Moreover, each evolutionary tree is at a leaf of the recursion tree Rn. Hence, to output an evolutionary tree in E(n), we may have to traverse a path from the root to a leaf. Thus the algorithm Find-All-Evolutionary-Trees outputs each evolutionary tree in E(n) in linear time in the worst case.

Our algorithm generates each sequence of subtrees in place, i.e. we apply recombination and decomposition to the current sequence. Therefore, the memory requirement is O(n), where n

is the number of species. Q.E .D.

5.6 Conclusion

In this chapter, we give a simple algorithm to generate all evolutionary trees having n species.

The algorithm is simple, generates each tree in constant time on average, and clarifies a simple relation among the trees, namely a recursion tree structure over them.

Chapter 6

Labeled and Ordered Evolutionary

Trees

6.1 Introduction

In the previous chapter, we gave an algorithm to generate all labeled evolutionary trees. In

this chapter, we deal with the problem of generating all labeled and ordered evolutionary trees. Generating all labeled and ordered evolutionary trees among different species has many applications in Bioinformatics [JP04], Genetic Engineering [KR03], Archaeology, Biochemistry and Molecular Biology. In these applications, to find a better prediction, it is sometimes necessary to generate all possible evolutionary trees among different species. To a mathematician, such a

tree is simply a cycle-free connected graph, but to a biologist it represents a series of hypotheses

about evolutionary events. In this chapter, we are concerned with generating all such probable

evolutionary trees that will guide biologists to research in all biological subdisciplines. We give

an algorithm to generate all evolutionary trees having n ordered species without repetition.

We also find out an efficient representation of such evolutionary trees such that each tree is

generated in constant time on average.

In bioinformatics, we frequently need to establish the evolutionary relationship among different



types of species [JP04, KR03]. Biologists often represent this relationship in the form of binary

trees. Such complete binary trees having different types of species at their leaves are known as

evolutionary trees (see Figure 6.1). In a rooted evolutionary tree, the root corresponds to

the most ancient ancestor in the tree. Leaves of evolutionary trees correspond to the existing

species while internal vertices correspond to hypothetical ancestral species.

Evolutionary trees are used to predict the predecessors of existing species, to speculate about future generations, to match DNA sequences, and so on. Prediction of ancestors becomes easy if all possible trees are generated. Moreover, it is useful to have the complete list of evolutionary trees having different types of species. One can use such a list to search for a counter-example to some conjecture, to find the best solution among all solutions, or to experimentally measure an

average performance of an algorithm over all possible input evolutionary trees. Many algorithms

to generate a given class of graphs without repetition, are already known [AR06, BS94, FL79,

KN05, NU03, NU04, NU05, S97, YN04, ZS98].

Figure 6.1: The evolutionary tree having four species.

In this thesis we first consider the problem of generating all possible evolutionary trees.

The main challenges in finding algorithms for enumerating all evolutionary trees are as follows.

Firstly, the number of such trees is exponential in general and hence listing all of them requires

huge time and computational power. Secondly, generating algorithms produce huge outputs

and the outputs dominate the running time. For this reason, reducing the amount of output is

essential. Thirdly, checking for any repetitions must be very efficient. Storing the entire list of

solutions generated so far will not be efficient, since checking each new solution against the entire list to prevent repetition would require a huge amount of memory and the overall time complexity

would be very high. So, if we can compress the outputs, then it considerably improves the

efficiency of the algorithm. Therefore, many generating algorithms output objects in an order


such that each object differs from the preceding one by a very small amount, and output each

object as the “difference” from the preceding one.

Generating evolutionary trees is more like generating complete binary rooted trees with

’fixed’ and ’labeled’ leaves. That means there are fixed number of leaves and the leaves are

labeled. There are some existing algorithms for generating rooted trees with n vertices [BS94,

KN05, NU03, NU04, NU05]. But these algorithms do not guarantee that there will be fixed

and labeled leaves. If we generate all binary trees with n leaves with the existing algorithms, then we have to label each tree and permute the labels to generate all trees. Since the siblings are not ordered, permuting the labels leads to repetition. Thus we cannot generate all evolutionary trees by simply modifying the existing algorithms.

In this chapter we first give an efficient algorithm to generate all evolutionary trees with a fixed number of ordered leaves. The order of the species is based on evolutionary relationship

and phylogenetic structure. For instance, Bear is more related to Panda than Monkey and

Raccoon is more related to Panda than Bear. Thus a species is more related to its preceding

and following species in the sequence of species than other species in the sequence. The order

of labels maintains this property. This property implies that each species in the sequence shares a common ancestor either with the preceding species or with the following species. We apply

the above restriction on the order of leaves with two goals in mind. First, the solution space

is reduced so that more probable solutions are available for the biologists to predict quickly

and easily. Second, each such probable evolutionary tree must be generated in constant time.

We also find out a suitable representation of such trees. We represent a labeled and ordered

complete binary tree with n leaves by a sequence of (n− 2) numbers. Our algorithm generates

all such trees without repetition.

Furthermore the algorithm for generating labeled and ordered trees is simple and generates

each tree in constant time on average without repetition. Our algorithm generates a new tree

from an existing one by making a constant number of changes and outputs each tree as the

difference from the preceding one. The main feature of our algorithm is that we define a tree

structure, that is parent-child relationships, among those trees (see Figure 6.2). In such a


“tree of evolutionary trees”, each node corresponds to an evolutionary tree and each node is

generated from its parent in constant time. In our algorithm, we construct the tree structure

among the evolutionary trees in such a way that the parent-child relation is unique, and hence

there is no chance of producing duplicate evolutionary trees. Our algorithm also generates the

trees in place, that means, the space complexity is only O(n).

Figure 6.2: The Family Tree F4.

The rest of the chapter is organized as follows. Section 6.2 gives some definitions. Section

6.3 depicts the representation of evolutionary trees. Section 6.4 shows a tree structure among

evolutionary trees. In Section 6.5 we present our algorithm which generates each solution in

O(1) time on average. Finally, section 6.6 is a conclusion.

6.2 Preliminaries

In this section we define some terms used in this chapter.

In mathematics and computer science, a tree is a connected graph without cycles. A rooted tree is a tree with one vertex r chosen as the root. A leaf in a tree is a vertex of degree 1. Each vertex in a tree is either an internal vertex or a leaf. A complete binary tree is a rooted tree with each internal node having exactly two children.

A family tree is a rooted tree with a parent-child relationship. The vertices of a rooted tree have levels associated with them. The root has the lowest level, namely 0, and the level of any other vertex is one more than the level of its parent. Vertices with the same parent v are called siblings.


If the siblings are ordered then ci−1 is the left sibling of ci for 1 < i ≤ l and ci+1 is the right

sibling of ci for 1 ≤ i < l. The ancestors of a vertex other than the root are the vertices in

the path from the root to this vertex, excluding the vertex and including the root itself. The

descendants of a vertex v are those vertices that have v as an ancestor. A leaf in a family tree

has no children.

An evolutionary tree is a graphical representation of the evolutionary relationship among

three or more species. In a rooted evolutionary tree, the root corresponds to the most ancient

ancestor in the tree and the path from the root to a leaf in the rooted tree is called an evo-

lutionary path. Leaves of evolutionary trees correspond to the existing species while internal

vertices correspond to hypothetical ancestral species.

In this chapter, we represent an evolutionary tree in terms of a complete binary tree. Each existing species of the evolutionary tree is a leaf in the complete binary tree (see Figure 6.3). We give a label to each leaf; the label identifies the existing species. For example, labels A, B, C and D represent Bear, Panda, Raccoon and Monkey. The labels are fixed and ordered.

The order of the species is based on evolutionary relationship and phylogenetic structure. For

instance, Bear is more related to Panda than Monkey and Raccoon is more related to Panda

than Bear. So, a species is more related to its preceding and following species in the sequence

of species than other species in the sequence. The order of labels maintains this property. This

property implies that each species in the sequence shares a common ancestor either with the

preceding species or with the following species. Our complete binary tree will maintain this

property and we will generate all such trees with exactly n leaves.

Figure 6.3: Representation of an evolutionary tree in terms of a complete binary tree.


6.3 Representation of Evolutionary Trees

In this section we give an efficient representation of a labeled and ordered evolutionary tree in

T (n). We represent such trees with n species with a sequence of (n− 2) numbers.

Let T (n) be the set of all evolutionary trees with n labeled and ordered leaves. We now derive a representation of each evolutionary tree t ∈ T (n). Our idea is to represent a tree with a sequence of numbers. For this, we first use an intermediate representation of each tree t ∈ T (n): a complete binary tree with n labeled leaves can be represented by a string of valid parenthesization of the n labels l1, l2, . . . , ln. Figure 6.4 shows the representation of a complete binary tree having 5 leaves. The number of such trees is a Catalan number; the total number of complete binary trees with n fixed and labeled leaves is the (n − 1)-st Catalan number, (2(n − 1))! / (n! (n − 1)!).

[Figure content: the complete binary tree over the leaves A, B, C, D, E with the string of valid parenthesization ( ( A B ) ( ( C D ) E ) ).]

Figure 6.4: Representation of an evolutionary tree having five species.

We now count the number of opening parentheses '(' before each label li, 1 ≤ i ≤ (n − 2), in the string of valid parenthesization of each intermediate representation. This gives us a sequence of (n − 2) numbers a1, a2, . . . , an−2, where ai is the number of '(' before label li, for 1 ≤ i ≤ (n − 2). Since the labels are fixed and ordered, we do not need to count for ln−1 and ln, and so we omit these two numbers from the sequence. For example, the sequence 244 represents an evolutionary tree with 5 leaves which corresponds to the string of valid parenthesization ((l1((l2l3)l4))l5). One can observe that for each sequence a1 ≤ a2 ≤ · · · ≤ an−2 and 1 ≤ ai ≤ (n − 1) for 1 ≤ i ≤ (n − 2). Thus a sequence of (n − 2) numbers uniquely represents an evolutionary tree with labeled and ordered leaves, as shown in Figure 6.4.
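To make the encoding concrete, the following Python sketch (ours, not part of the thesis) computes the sequence from a string of valid parenthesization, assuming single-character leaf labels. The two calls reproduce the sequence 224 for the tree of Figure 6.4 and the sequence 244 for the string ((l1((l2l3)l4))l5) mentioned above.

    def tree_to_sequence(paren_string, labels):
        # a_i = number of '(' appearing before the i-th label; the last two
        # labels are omitted because the labels are fixed and ordered.
        counts = []
        opens = 0
        for ch in paren_string.replace(' ', ''):
            if ch == '(':
                opens += 1
            elif ch in labels:
                counts.append(opens)
        return counts[:len(labels) - 2]

    print(tree_to_sequence('( ( A B ) ( ( C D ) E ) )', 'ABCDE'))  # [2, 2, 4]
    print(tree_to_sequence('( ( A ( ( B C ) D ) ) E )', 'ABCDE'))  # [2, 4, 4]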

Let S(n) denote the set of all such sequences. Each sequence s ∈ S(n) uniquely identifies a

tree t ∈ T (n). We have the following lemma.


Lemma 6.3.1 A sequence s ∈ S(n) of (n − 2) numbers uniquely represents an evolutionary

tree t ∈ T (n).

Proof. In an evolutionary tree t ∈ T (n) the labeled leaves l1, l2, . . . , ln are ordered. A leaf li, 1 < i < n, can only be paired with either li−1 or li+1 in the sequence of labels. Take any two labels li and lj with 1 < i ≤ n − 2 and j ∈ {i − 1, i + 1}. If li and lj are paired, the count of '(' is the same for both of them, which implies si = sj. If li and lj are not paired, their counts of '(' are different, which implies si ≠ sj.

For any two trees t1 ∈ T (n) and t2 ∈ T (n) with t1 ≠ t2, we can find at least two labels li and lj which are paired in one tree and not paired in the other. Thus their counts differ, so the corresponding sequences differ in at least one position. Hence a sequence s ∈ S(n) of (n − 2) numbers represents exactly one evolutionary tree t ∈ T (n). Q.E.D.

6.4 The Family Tree

In this section we define a tree structure Fn among evolutionary trees in T (n).

For a positive integer n, let t ∈ T (n) be an evolutionary tree with n leaves labeled l1, l2, . . . , ln. For each t ∈ T (n), we get a unique sequence s ∈ S(n) of (n − 2) numbers a1, a2, . . . , an−2, where ai is the number of '(' before label li, for 1 ≤ i ≤ (n − 2). Also, for each sequence, a1 ≤ a2 ≤ · · · ≤ an−2 and 1 ≤ ai ≤ (n − 1) for 1 ≤ i ≤ (n − 2).

Now we define the family tree Fn as follows. Each node of Fn represents an evolutionary tree. If there are n species then there are (n − 1) levels in Fn. A node is at level i in Fn if a1 ≤ a2 ≤ · · · ≤ ai < (n − 1) and ai+1 = · · · = an−2 = (n − 1), for 1 ≤ i ≤ (n − 2). For example, the sequence 224 is at level 2. As the level increases, the number of rightmost (n − 1) entries decreases, and vice versa. Thus a node at level (n − 2) has no rightmost (n − 1) entry, i.e., an−2 < (n − 1). Since Fn is a rooted tree we need a root, and the root is the node at level 0. One can observe that a node is at level 0 in Fn if a1 = a2 = · · · = an−2 = (n − 1), and there is exactly one such node. We thus take the sequence (n − 1, n − 1, . . . , n − 1) as the root of Fn. Clearly, the number of rightmost (n − 1) entries in the root is greater than in the sequence of any other evolutionary tree in T (n).
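Since each sequence is non-decreasing, its level can be read off directly as the number of entries smaller than (n − 1). A minimal Python sketch of this observation (ours, for illustration only):

    def level(seq, n):
        # level of a sequence in F_n: the number of entries smaller than n-1;
        # the remaining suffix consists entirely of (n-1)'s.
        return sum(1 for a in seq if a < n - 1)

    print(level([2, 2, 4], 5))  # 2, the sequence 224 is at level 2
    print(level([4, 4, 4], 5))  # 0, the root of F_5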

To construct Fn, we define two types of relations among the evolutionary trees in T (n):

(a) Parent-child relationship and

(b) Child-parent relationship.

We define the parent-child relationship among the evolutionary trees in T (n) with two goals in mind. First, the difference between an evolutionary tree s and its child C(s) should be minimal, so that C(s) can be generated from s with minimum effort. Second, every evolutionary tree in T (n) except the root must have exactly one parent in Fn. We achieve the first goal by ensuring that a child C(s) of an evolutionary tree s can be found by a simple subtraction; consequently, s can also be recovered from its child C(s) by a simple addition. The second goal, the uniqueness of the parent, is established in the following subsections.

6.4.1 Parent-Child Relationship

Let t ∈ T (n) be an evolutionary tree with n ordered leaves labeled l1, l2, . . . , ln and let s ∈ S(n) be the sequence of numbers a1, a2, . . . , an−2 corresponding to t. Then s corresponds to a node of level i, 0 ≤ i ≤ (n − 2), of Fn, so a1 ≤ a2 ≤ · · · ≤ ai < (n − 1) and ai+1 = · · · = an−2 = (n − 1). The number of children of s is (ai+1 − ai). The sequences of the children are defined in such a way that, to generate a child from its parent, we have to deal with only one integer in the sequence; the rest of the integers remain unchanged. Which integer it is depends on the level of the parent sequence in Fn. The only operations we apply are subtraction and assignment. The number of rightmost (n − 1) entries decreases in the child sequence by applying the parent-child relationship.

Let Cj(s) ∈ S(n) be the sequence of the jth child of s, 1 ≤ j ≤ (ai+1 − ai). Note that s is at level i of Fn and Cj(s) will be at level i + 1 of Fn. We define the sequence for Cj(s) as c1, c2, . . . , cn−2, where ck = ak for k ≠ i + 1 and ci+1 = (ai+1 − j). Thus Cj(s) is a node of level i + 1, 0 ≤ i < (n − 2), of Fn, and so c1 ≤ c2 ≤ · · · ≤ ci+1 < (n − 1) and ci+2 = · · · = cn−2 = (n − 1). Hence, in passing from one level to the next, we only deal with the integer ai+1; the rest of the integers remain unchanged. For example, for n = 5 the sequence 244 is a node of level 1 because a1 < 4 and a2 = a3 = 4. Here a2 − a1 = 2, so it has two children, and the two children (234 and 224) are shown in Figure 6.6.
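The following Python sketch is a direct transcription of this parent-child rule (ours, not from the thesis); for the root we take a0 = 1, an assumption that gives the root of F5 its three children 344, 244 and 144 of Figure 6.6.

    def children(seq, n):
        # seq is [a_1, ..., a_{n-2}]; its level i is the number of entries < n-1.
        # The j-th child replaces a_{i+1} by (a_{i+1} - j), j = 1 .. (a_{i+1} - a_i).
        i = sum(1 for a in seq if a < n - 1)
        if i >= n - 2:                       # a_{i+1} does not exist: no children
            return []
        prev = seq[i - 1] if i > 0 else 1    # assumption: a_0 = 1 at the root
        kids = []
        for j in range(1, seq[i] - prev + 1):
            child = list(seq)
            child[i] = seq[i] - j            # only position i+1 changes
            kids.append(child)
        return kids

    print(children([2, 4, 4], 5))  # [[2, 3, 4], [2, 2, 4]], the children of 244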

6.4.2 Child-Parent Relationship

The child-parent relation is just the reverse of the parent-child relation. Let t ∈ T (n) be an evolutionary tree with n ordered leaves labeled l1, l2, . . . , ln and let s ∈ S(n) be the sequence of numbers a1, a2, . . . , an−2 corresponding to t. Then s corresponds to a node of level i, 1 ≤ i ≤ (n − 2), of Fn (the root, at level 0, has no parent), so a1 ≤ a2 ≤ · · · ≤ ai < (n − 1) and ai+1 = · · · = an−2 = (n − 1). We define a unique parent sequence of s at level i − 1. As in the parent-child relationship, we again deal with only one integer in the sequence. The only operations we apply here are addition and assignment. The number of rightmost (n − 1) entries increases in the parent sequence by applying the child-parent relationship.

Let P (s) ∈ S(n) be the parent sequence of s. We define the sequence for P (s) as p1, p2, . . . , pn−2, where pj = aj for j ≠ i and pi = (n − 1). Thus P (s) is a node of level i − 1, 1 ≤ i ≤ (n − 2), of Fn, and so p1 ≤ p2 ≤ · · · ≤ pi−1 < (n − 1) and pi = · · · = pn−2 = (n − 1). For example, for n = 5 the sequence 224 is a node of level 2 because a1 ≤ a2 < 4 and a3 = 4. It has the unique parent 244, as shown in Figure 6.6.
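A matching Python sketch of the child-parent rule (again ours): position i, the last entry smaller than (n − 1), is reset to (n − 1), which sends 224 back to its parent 244.

    def parent(seq, n):
        # reset entry a_i (the last entry smaller than n-1) back to n-1
        i = sum(1 for a in seq if a < n - 1)   # level of seq
        if i == 0:
            return None                        # the root (n-1, ..., n-1) has no parent
        par = list(seq)
        par[i - 1] = n - 1                     # 1-based position i is index i-1
        return par

    print(parent([2, 2, 4], 5))  # [2, 4, 4], the parent of 224 is 244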

6.4.3 The Family Tree

From the above definitions we can construct Fn. We take the sequence sr = a1a2 . . . an−2 with a1 = a2 = · · · = an−2 = (n − 1) as the root, as mentioned before. The family tree Fn for the evolutionary trees in T (n) is shown in Figure 6.5, and Figure 6.6 shows its representation in terms of sequences.

Based on the above parent-child relationship, the following lemma proves that every evolutionary tree in T (n) is present in Fn.



Figure 6.5: Illustration of Family Tree F5.

[Figure content: level 0: 444; level 1: 344, 244, 144; level 2: 334, 234, 224, 134, 124; level 3: 333, 233, 223, 133, 123.]

Figure 6.6: Representation of Family Tree F5.

Lemma 6.4.1 For any evolutionary tree t ∈ T (n), there is a unique sequence of evolutionary

trees that transforms t into the root tr of Fn.

Proof. Let s ∈ S(n) be a sequence other than the root sequence, representing an evolutionary tree t ∈ T (n). By applying the child-parent relationship, we find the parent sequence P (s) of s. If P (s) is the root sequence, we stop; otherwise, we apply the same procedure to P (s) and find its parent P (P (s)). By repeatedly taking the parent of the derived sequence, we obtain the unique sequence s, P (s), P (P (s)), . . . of sequences in S(n), which eventually ends with the root sequence sr of Fn. We observe that P (s) has at least one more (n − 1) entry than s. Thus s, P (s), P (P (s)), . . . never leads to a cycle, and the level of the derived sequence keeps decreasing until it reaches the level of the root sequence sr. Q.E.D.

Lemma 6.4.1 ensures that there can be no omission of evolutionary trees in the family

tree Fn. Since there is a unique sequence of operations that transforms an evolutionary tree

t ∈ T (n) into the root tr of Fn, by reversing the operations we can generate that particular

evolutionary tree, starting from the root. Now we have to make sure that Fn represents evolutionary

trees without repetition. Based on the parent-child and child-parent relationships, the following

lemma proves this property of Fn.

Lemma 6.4.2 The family tree Fn represents evolutionary trees in T (n) without repetition.

Proof. Given a sequence s ∈ S(n) representing a tree t ∈ T (n), the children of s are defined in such a way that no other sequence in S(n) can generate the same child. For contradiction, suppose two sequences A, B ∈ S(n) at level i of Fn generate the same child C; then C is a sequence at level i + 1 of Fn. Let the sequences for A, B and C be aj, bj and cj for 1 ≤ j ≤ n − 2. Clearly ak = bk = n − 1 for i + 1 ≤ k ≤ n − 2, and since A ≠ B there is at least one j with aj ≠ bj for 1 ≤ j ≤ i. According to the parent-child relationship, to generate C from its parent A or B, only the integer ai+1 or bi+1 is changed in the sequence. Thus the child C(A) of A and the child C(B) of B are different, since ai+1 = bi+1 and aj ≠ bj for some 1 ≤ j ≤ i. Hence A and B do not generate the same child C, a contradiction. Therefore every sequence has a single, unique parent. Q.E.D.

6.5 Algorithm

In this section, we give an algorithm to construct the family tree Fn and generate all trees.

If we can generate all child sequences of a given sequence in S(n), then in a recursive manner we can construct Fn and generate all sequences in S(n). We start from the root sequence sr = (n − 1)(n − 1) . . . (n − 1) and obtain each child sequence sc by using the parent-child relation discussed above.

Procedure Find-All-Child-Trees(s = a1a2 . . . an−2, i)
{ s is the current sequence, i indicates the current level and sc is the child sequence }
begin
  Output s; { output the difference from the previous evolutionary tree }
  for j = 1 to (ai+1 − ai)
    Find-All-Child-Trees(sc = a1a2 . . . (ai+1 − j) . . . an−2, i + 1);
end;

Algorithm Find-All-Evolutionary-Trees(n)
begin
  Find-All-Child-Trees(sr = (n − 1)(n − 1) . . . (n − 1), 0);
end.
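The sketch below (ours) is a runnable Python transcription of the procedure above, again taking a0 = 1 at the root. As a safeguard that the printed pseudocode does not state explicitly, we only recurse on children whose changed entry satisfies ak ≥ k; every sequence in Figure 6.6 satisfies this, and with the guard the call for n = 5 outputs exactly the 14 sequences shown there.

    def find_all_child_trees(seq, i, n, out):
        # seq is [a_1, ..., a_{n-2}] as a 0-based list, i is the current level
        out.append(list(seq))                  # the thesis outputs only the difference
        if i >= n - 2:
            return
        prev = seq[i - 1] if i > 0 else 1      # assumption: a_0 = 1 at the root
        for j in range(1, seq[i] - prev + 1):  # j = 1 .. (a_{i+1} - a_i)
            child = list(seq)
            child[i] = seq[i] - j              # the j-th child changes only a_{i+1}
            if child[i] >= i + 1:              # guard: keep a_k >= k (see Figure 6.6)
                find_all_child_trees(child, i + 1, n, out)

    def find_all_evolutionary_trees(n):
        out = []
        find_all_child_trees([n - 1] * (n - 2), 0, n, out)
        return out

    print(len(find_all_evolutionary_trees(5)))  # 14, as in Figures 6.5 and 6.6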

The following theorem describes the performance of the algorithm Find-All-Evolutionary-Trees.

Theorem 6.5.1 The algorithm Find-All-Evolutionary-Trees uses O(n) space and runs in

O(|T (n)|) time.

Proof. In our algorithm we only use the simple addition or subtraction operation to generate

a new evolutionary tree from an old one. Thus each evolutionary tree is generated in constant

time without computational overhead. Since we traverse the family tree Fn and output each

sequence at each corresponding vertex of Fn we can generate all the evolutionary trees in T (n)

without repetition. By applying the parent-child relation we can generate every child in O(1) time, and by using the child-parent relation we go back to the parent sequence. Hence the algorithm takes O(|T (n)|) time, i.e., constant time on average per output.

Our algorithm outputs each evolutionary tree as the difference from the previous one. The

data structure that we use to represent the evolutionary trees is a sequence of n − 2 integers.

Therefore, the memory requirement is O(n), where n is the number of species. Q.E .D.


6.6 Conclusion

In this chapter, we have given an efficient representation of an evolutionary tree having ordered species and an algorithm to generate all evolutionary trees having n ordered species. The algorithm is simple, generates each tree in constant time on average, and clarifies a simple relation among the trees, namely a family tree of the trees.

Chapter 7

Conclusion

This thesis deals with algorithms for generating all solutions of a combinatorial problem. We

have presented efficient algorithms to generate all distributions of n objects (both identical and distinguishable) to m bins. We have introduced a new, elegant, and efficient family-tree traversal algorithm that generates each solution in O(1) time (in the ordinary sense). In this thesis we have also dealt with the problem of generating all evolutionary trees: we have given an algorithm to generate all evolutionary trees having n species without repetition, together with an efficient representation of such trees with which each tree is generated in constant time on average. For the purposes of biologists, we have also given a new algorithm to generate evolutionary trees having ordered species.

We first summarize each chapter and its contributions. In Chapter 1 we have discussed enumeration problems and their applications in different areas. We have also described the main algorithmic challenges that any enumeration algorithm has to face and reviewed some of the existing literature.

In Chapter 2 we have introduced the graph-theoretic terminology used throughout this thesis.

In Chapter 3 we have given an elegant algorithm to generate all distributions of identical

objects to bins without repetition. Our algorithm generates each distribution in constant time

with linear space complexity. We also present an efficient tree traversal algorithm that generates


each solution in O(1) time (in the ordinary sense). To the best of our knowledge, ours is the first algorithm that generates each solution in O(1) time in the ordinary sense. By modifying

our algorithm, we can generate the distributions in anti-lexicographic order. Finally, we extend

our algorithm for the case when the bins have priorities associated with them. Overall space

complexity of our algorithm is O(m), where m is the number of bins.

In Chapter 4 we have given a simple algorithm to generate all distributions of distinguishable

objects to bins. The algorithm generates each distribution in constant time with linear space

complexity. We also present an efficient tree traversal algorithm that generates each solution in

O(1) time. Then, we extend our algorithms for the case when the bins have priorities associated

with them.

In Chapter 5 we have given a simple algorithm to generate all evolutionary trees having n

species. The algorithm is simple, generates each tree in linear time (O(1) time in the amortized sense), and clarifies a simple relation among the trees, namely a recursion tree of the trees.

In Chapter 6 we have given an efficient representation of an evolutionary tree having ordered species and an algorithm to generate all evolutionary trees having n ordered species. The algorithm is simple, generates each tree in constant time on average, and clarifies a simple relation among the trees, namely a family tree of the trees.

In this thesis we have given many efficient algorithms for generating all solutions of different

enumeration problems. However, the following problems are still open.

1. Develop an algorithm that generates all distributions of distinguishable objects to bins when the objects are weighted and the bins have a maximum capacity.

2. Is there an algorithm that generates each evolutionary tree in constant time in the ordinary sense?

3. We have obtained an average constant-time algorithm for generating all labeled and ordered evolutionary trees. Is it possible to obtain an algorithm that generates each such evolutionary tree in O(1) time in the ordinary sense?

References

[AR06] M. A. Adnan and M. S. Rahman, Distribution of objects to bins: generating all distributions, Proc. of International Conference on Computer and Information Technology (ICCIT'06), 2006 (to appear).

[AU95] A. V. Aho and J. D. Ullman, Foundations of Computer Science, Computer Science Press, New York, 1995.

[BS94] M. Belbaraka and I. Stojmenovic, On generating B-trees with constant average delay

and in lexicographic order, Information Processing Letters, 49, pp. 27-32, 1994.

[CLR90] T. M. Cormen, C. E. Leiserson and R. L. Rivest, Introduction to Algorithms, MIT

Press, 1990.

[FL79] T. I. Fenner and G. Loizou, A binary tree representation and related algorithms for

generating integer partitions, The Computer Journal, 23, pp. 332-337, 1979.

[GKP94] R. Graham, D. E. Knuth and O. Patashnik, Concrete Mathematics, Addison-Wesley, 1994.

[HS93] J. Hershberger and S. Suri, Morphing Binary Trees, Journal of Algorithms, 1993.

[J63] S. M. Johnson, Generation of permutations by adjacent transpositions, Mathematics of

Computation, 17, pp. 282-285, 1963.

[JP04] N. C. Jones and P. A. Pevzner, An Introduction to Bioinformatics Algorithms, The MIT

Press, Cambridge, Massachusetts, London, England, 2004.



[JWW80] J. T. Joichi, D. E. White and S. G. Williamson, Combinatorial Gray codes, SIAM

Journal on Computing, 9(1), pp. 130-141, 1980.

[K06] D. E. Knuth, The Art of Computer Programming, Vol.4, url:

http://www.cs.utsa.edu/ wagner/knuth/, 2006.

[K82] P. Klingsberg, A gray code for compositions, Journal of Algorithms, 3, pp. 41-44, 1982.

[KN05] S. Kawano and S. Nakano, Constant time generation of set partition, IEICE Trans.

Fundamentals, E88-A, 4, pp. 930-934, 2005.

[KN06] S. Kawano and S. Nakano, Generating Multiset Partitions, (on private communication),

2006.

[KR03] D. E. Krane and Michael L. Raymer, Fundamental Concepts of BioInformatics, Pearson

Education, San Francisco, 2003.

[LBR93] J. M. Lucas , D. R. Baronaigien and F. Ruskey, On Rotations and the Generation of

binary Trees, Journal of Algorithms, 9, pp. 503-535, 1993.

[M98] B. D. McKay, Isomorph-free exhaustive generation, Journal of Algorithms, 26, pp. 306-324, 1998.

[NU03] S. Nakano and T. Uno, Efficient generation of rooted trees, NII Technical Report, NII-2003-005E, July 2003.

[NU04] S. Nakano and T. Uno, Constant time generation of trees with specified diameter, Proc.

of WG 2004, LNCS 3353, pp. 33-45, 2004.

[NU05] S. Nakano and T. Uno, Generating colored trees, Proc. of WG 2005, LNCS 3787, pp.

249-260, 2005.

[NW78] A. Nijenhuis and H. Wilf, Combinatorial Algorithms, Academic press, New York, 1978.

[R00] K. H. Rosen, Discrete Mathematics and Its Applications, WCB/McGraw-Hill, Singapore,

2000.


[S97] C. Savage, A survey of combinatorial gray codes, SIAM Review, 39, pp. 605-629, 1997.

[T02] A. S. Tanenbaum, Computer Networks, Prentice Hall, Upper Saddle River, New Jersey,

2002.

[T04] A. S. Tanenbaum, Modern Operating Systems, Prentice Hall, Upper Saddle River, New

Jersey, 2004.

[T62] H. F. Trotter, PERM (Algorithm 115), Communications of the ACM, 5, pp. 434-435,

1962.

[W96] D. B. West, Introduction to Graph Theory, Prentice-Hall, Upper Saddle River, New

Jersey, 1996.

[YN04] K. Yamanaka and S. Nakano, Generating all realizers, IEICE Trans. Inf. and Syst.,

J87-DI, 12, pp. 1043-1050, 2004.

[ZS98] A. Zoghbi and I. Stojmenovic, Fast algorithm for generating integer partitions, International Journal of Computer Mathematics, 70, pp. 319-332, 1998.

List of Publications

1. Muhammad Abdullah Adnan and Md. Saidur Rahman, Distribution of objects to bins: generating all distributions, Proc. of International Conference on Computer and Information Technology (ICCIT'06), 2006 (to appear).

2. Muhammad Abdullah Adnan and Md. Saidur Rahman, Distribution of distinguishable

objects to bins: generating all distributions, submitted to a Journal, 2006.

3. Muhammad Abdullah Adnan and Md. Saidur Rahman, Efficient Generation of Evolutionary Trees, submitted to a Journal, 2006.


Index

algorithm, 22

amortized time, 23

average constant time, 23

constant time, 23

exponential, 22

linear, 23

linear time, 23

non-polynomial, 23

polynomial, 22

polynomially bounded, 22

run time, 22

anti-lexicographic order, 4

binary tree

complete binary tree, 19

bioinformatics, 2

Catalan Numbers, 25

child-parent relation, 34

combinatorial gray code, 9

combinatorics, 1, 2, 26

combinatorial algorithms, 1

depth first search (DFS), 24

distinguishable objects, 45

edge

loop, 17

enumeration, 3

enumeration algorithm, 3

evolutionary tree, 21, 83

evolutionary path, 21, 83

labeled trees, 69

family tree, 11, 30

ancestor, 20

descendant, 20

level, 20

rightmost leaf, 59

sibling, 20, 30

genealogical tree, 11

generation

in place, 29

graph, 17

cycle, 18

degree, 18

edge, 17

path, 18

rank, 24

simple graph, 18



vertex, 17

walk, 18

Gray code, 28, 47

gray code approach, 10

gray code order, 4, 51

integer partition, 6, 21, 30

Johnson-Trotter algorithm, 11

lexicographic order, 4

maximal clique, 5

multiset, 21

node

level, 19

objects

identical objects, 45

non-identical objects, 45

parent-child relationship, 33

partition, 6

recursion tree, 70

rightmost leaf, 38, 59

sequence

inner sequence, 50

nonzero sequence, 51

zero sequence, 51

set partition, 6, 21, 31

sibling

left sibling, 20, 30

right sibling, 20, 30

simple set, 22

subset, 6

traversal, 24

in-order, 24

post-order, 24

pre-order, 24

tree, 18, 30

ancestor, 19

binary tree, 19

child, 18

depth, 19

descendant, 19

family tree, 20

height, 19

internal node, 19

leaf, 19, 30

level, 30

nodes, 18

parent, 18

root, 18

rooted tree, 18, 30