
Bijective tree encoding

Saverio Caminiti

2

Talk Outline

Domains
Prüfer-like codes
  Prüfer code (1918)
  Neville codes (1953)
  Deo and Micikevičius code (2002)
Picciotto codes (1999)
Applications, Operations and Properties
  Random trees generation (with constraints)
  Locality and Heritability
  Other operations
Future work

3

Domains

Labeled trees T_n
  n nodes labeled with distinct symbols from an alphabet Σ s.t. |Σ| = n,
  i.e. indexed with integers in [n] = {1, 2, ..., n}
  Both rooted and unrooted
  Undirected
  No order among a node’s children

Strings, in accordance with Cayley’s theorem
  In Σ^(n-2) for unrooted trees (i.e. [n]^(n-2))
  In Σ^(n-1) for rooted trees (i.e. [n]^(n-1))

4

Examples

[Figure: two example trees on nodes 1..6, an unrooted one with string 4 1 3 3 and a rooted one with string 1 4 3 3 4]

5

Prüfer code

Introduced in 1918 to prove Cayley’s theorem, it is the first bijection between T_n and [n]^(n-2)

π(T) = adj(u) :: π(T - u)

where: u is the smallest leaf in T, adj(u) is the only node adjacent to u in T, T - u is the tree obtained from T by removing u, and the operator :: is string concatenation.
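The recursive definition above can be sketched directly in Python. This is a minimal heap-based version (so O(n log n), not a linear-time algorithm); the function name `prufer_encode` and the edge-list input format are assumptions made for illustration, and the sample edges are read off the S and C rows of the talk's unrooted example.

```python
import heapq

def prufer_encode(n, edges):
    """Prüfer code of an unrooted tree on nodes 1..n (n >= 2), given as an edge list."""
    adj = {v: set() for v in range(1, n + 1)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    leaves = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaves)
    code = []
    for _ in range(n - 2):
        u = heapq.heappop(leaves)       # u: the smallest leaf
        (w,) = adj[u]                   # adj(u): its only neighbour
        code.append(w)
        adj[w].remove(u)                # T - u
        if len(adj[w]) == 1:            # w has just become a leaf
            heapq.heappush(leaves, w)
    return code

# Edges 2-4, 4-1, 1-3, 5-3, 3-6 (assumed from the S/C rows of the unrooted example)
print(prufer_encode(6, [(2, 4), (4, 1), (1, 3), (5, 3), (3, 6)]))  # [4, 1, 3, 3]
```

Stopping after n - 2 removals is what yields a string in [n]^(n-2); one more removal would always append the symbol n.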

6

Example: Prüfer encode unrooted

π(T) = adj(u) :: π(T - u)

S  2 4 1 5 3
C  4 1 3 3 6

[Figure: the unrooted example tree on nodes 1..6]

The last symbol is 6 = n; the code keeps the first n - 2 symbols.

7

Example: Prüfer encode rooted

π(T) = adj(u) :: π(T - u)

S  2 1 5 6 3
C  1 4 3 3 4

[Figure: the rooted example tree on nodes 1..6]

The code has length n - 1.

8

Notes: Prüfer encode

π(T) = adj(u) :: π(T - u)

S  2 1 5 6 3
C  1 4 3 3 4

[Figure: the rooted example tree on nodes 1..6; the code has length n - 1]

Focus on rooted trees
1. Each node (but the root) is removed exactly once
2. Each node appears in C once for each of its children
3. A node can be removed only after all its children
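The three notes can be checked on the rooted example with a short sketch. The function name `prufer_encode_rooted` and the parent-dictionary input are assumptions for illustration; the parent relation is read off the S and C rows above (each removed node in S has the node below it in C as its parent).

```python
import heapq
from collections import Counter

def prufer_encode_rooted(parent):
    """parent maps every non-root node to its parent; returns the rows (S, C)."""
    nodes = set(parent) | set(parent.values())
    pending = {v: 0 for v in nodes}            # children not yet removed
    for p in parent.values():
        pending[p] += 1
    leaves = [v for v in nodes if pending[v] == 0]
    heapq.heapify(leaves)
    removed, code = [], []
    while len(code) < len(nodes) - 1:
        u = heapq.heappop(leaves)              # smallest current leaf
        p = parent[u]
        removed.append(u)
        code.append(p)
        pending[p] -= 1
        if pending[p] == 0 and p in parent:    # note 3; the root is never pushed
            heapq.heappush(leaves, p)
    return removed, code

parent = {2: 1, 1: 4, 5: 3, 6: 3, 3: 4}        # rooted example, root 4
S, C = prufer_encode_rooted(parent)
print(S, C)                                    # [2, 1, 5, 6, 3] [1, 4, 3, 3, 4]

# Note 1: every non-root node is removed exactly once.
assert sorted(S) == sorted(parent)
# Note 2: each node appears in C once per child.
assert Counter(C) == Counter(parent.values())
```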

9

Example: Prüfer decode

C  1 4 3 3 4
S  ? ? ? ? ?

Let l be the length of the string C: n = l + 1 = 6
First step: the leaves of the initial tree are those nodes that do not appear in C: {2, 5, 6}; choose the smallest one

10

Example: Prüfer decode

C  1 4 3 3 4
S  2

The remaining code 4 3 3 4 is π(T - {2}), so we choose the smallest leaf among {1, 5, 6}

11

Example: Prüfer decode

C  1 4 3 3 4
S  2 1

The remaining code 3 3 4 is π(T - {2, 1}), so we choose the smallest leaf among {5, 6}

12

Example: Prüfer decode

C  1 4 3 3 4
S  2 1 5 6 3

[Figure: the decoded rooted tree on nodes 1..6]
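The decoding steps walked through above can be sketched as follows. This is a heap-based illustration (O(n log n)), with the name `prufer_decode_rooted` and the returned parent dictionary chosen here for convenience rather than taken from the talk.

```python
import heapq
from collections import Counter

def prufer_decode_rooted(code):
    """Rebuild the row S and the parent relation from a rooted Prüfer code C."""
    n = len(code) + 1
    remaining = Counter(code)                  # occurrences of each node still ahead in C
    leaves = [v for v in range(1, n + 1) if remaining[v] == 0]
    heapq.heapify(leaves)                      # nodes absent from C are the initial leaves
    removed, parent = [], {}
    for p in code:
        u = heapq.heappop(leaves)              # smallest current leaf
        removed.append(u)
        parent[u] = p
        remaining[p] -= 1
        if remaining[p] == 0:                  # all children of p are gone: p is a leaf now
            heapq.heappush(leaves, p)
    return removed, parent

S, parent = prufer_decode_rooted([1, 4, 3, 3, 4])
print(S)        # [2, 1, 5, 6, 3]
print(parent)   # {2: 1, 1: 4, 5: 3, 6: 3, 3: 4}
```

The last symbol of C is the root; it is pushed onto the heap at the very end but never popped.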

13

Other Prüfer-like codes

Neville (1953): three codes for rooted trees
  The first one is in fact the Prüfer code
Moon (1970): adapts Neville’s codes to trees
Deo and Micikevičius (2002)

14

Second Neville code

15

Third Neville Code

16

Deo and Micikevičius code

17

Generalization

It has been proven that any deterministic procedure P that chooses, at each step, a non-empty sequence of leaves can be used to generate a bijective code

π(T) = adj(P(T)) :: π(T - P(T))
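A hedged sketch of this generalization for rooted trees: the encoder below takes the selection procedure as a callback. The names `encode_with`, `prufer` and `neville2` are illustrative, and the callback sees only the current leaf set, which is a simplification (a procedure that also needs the previous step, such as the third Neville code, does not fit this signature). The second sample tree is read off the Second Neville encoding example later in the talk.

```python
def encode_with(parent, choose):
    """Generic leaf-removal encoder for a rooted tree.

    parent maps every non-root node to its parent.
    choose(leaves) must return a non-empty list of current leaves.
    """
    nodes = set(parent) | set(parent.values())
    pending = {v: 0 for v in nodes}                    # children not yet removed
    for p in parent.values():
        pending[p] += 1
    leaves = {v for v in parent if pending[v] == 0}    # the root is excluded
    code = []
    while leaves:
        for u in choose(leaves):                       # P(T)
            leaves.remove(u)
            p = parent[u]
            code.append(p)                             # adj(P(T))
            pending[p] -= 1
            if pending[p] == 0 and p in parent:
                leaves.add(p)                          # becomes available in T - P(T)
    return code

def prufer(leaves):
    return [min(leaves)]          # one leaf at a time: the smallest

def neville2(leaves):
    return sorted(leaves)         # all current leaves, in increasing order

print(encode_with({2: 1, 1: 4, 5: 3, 6: 3, 3: 4}, prufer))      # [1, 4, 3, 3, 4]
tree = {3: 6, 4: 10, 5: 6, 8: 1, 9: 7, 1: 2, 6: 7, 10: 7, 2: 7}
print(encode_with(tree, neville2))                              # [6, 10, 6, 1, 7, 2, 7, 7, 7]
```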

18

Why several codes

Different codes may have different properties and allow different operations

Encoding and decoding algorithms for different codes may have different time (and/or space) complexity

19

Implementation of Prüfer code

Straightforward implementation: O(n log n)
First linear time algorithm in 1978 (left as an exercise in Combinatorial Algorithms)
Optimal parallel algorithm: 2000
Linear time sequential algorithm rediscovered in 2000 and 2001
Still unknown in 2003 !!!

20

Implementation of other codes

Second Neville code: 2002
Third Neville code: 1953 (trivial)
Deo and Micikevičius: 2002 (in the original paper)

21

A unified approach

The encoding of all four codes can be reduced to sorting pairs of integers in [n]

The decoding can be reduced to the computation of the rightmost occurrence of each symbol in the code string

22

Encoding: Second Neville code

Sort the nodes by the pair (l(v), v), where l(v) is the level of v from the bottom

pair  0,3  0,4  0,5  0,8  0,9  1,1  1,6  1,10  2,2
S     3    4    5    8    9    1    6    10    2
C     6    10   6    1    7    2    7    7     7
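A minimal sketch of this reduction, assuming the tree is given as a parent dictionary read off the S and C rows above (root 7). The function name `neville2_by_sorting` is illustrative, and Python's built-in sort stands in for the integer sort the talk has in mind.

```python
def neville2_by_sorting(parent):
    """Second Neville code obtained by sorting the pairs (l(v), v)."""
    nodes = set(parent) | set(parent.values())
    children = {v: [] for v in nodes}
    for v, p in parent.items():
        children[p].append(v)
    root = next(v for v in nodes if v not in parent)

    level = {}
    def height(v):
        # l(v): level from the bottom, 0 for leaves (recursive, fine for small trees)
        level[v] = 1 + max((height(c) for c in children[v]), default=-1)
        return level[v]
    height(root)

    order = sorted(parent, key=lambda v: (level[v], v))   # non-root nodes only
    pairs = [(level[v], v) for v in order]
    return pairs, order, [parent[v] for v in order]

tree = {3: 6, 4: 10, 5: 6, 8: 1, 9: 7, 1: 2, 6: 7, 10: 7, 2: 7}
pairs, S, C = neville2_by_sorting(tree)
print(pairs)  # [(0, 3), (0, 4), (0, 5), (0, 8), (0, 9), (1, 1), (1, 6), (1, 10), (2, 2)]
print(S)      # [3, 4, 5, 8, 9, 1, 6, 10, 2]
print(C)      # [6, 10, 6, 1, 7, 2, 7, 7, 7]
```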

23

Encoding: Third Neville code

Sort the nodes by the pair (μ(v), d(v, μ(v))), where μ(v) is the greatest leaf in the subtree rooted at v

pair  3,0  4,0  4,1  5,0  5,1  8,0  8,1  8,2  8,3
S     3    4    10   5    6    8    1    2    7
C     6    10   7    6    7    1    2    7    9
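The same idea can be sketched for the third Neville code. The tree is read off the S and C rows above (root 9); the names `neville3_by_sorting`, `mu` and `dist` are assumptions for illustration, since the slide's own symbol for the greatest leaf did not survive extraction.

```python
def neville3_by_sorting(parent):
    """Third Neville code by sorting pairs (greatest leaf below v, distance to it)."""
    nodes = set(parent) | set(parent.values())
    children = {v: [] for v in nodes}
    for v, p in parent.items():
        children[p].append(v)
    root = next(v for v in nodes if v not in parent)

    mu, dist = {}, {}
    def visit(v):
        # One traversal computes both components of the pair for every node.
        if not children[v]:
            mu[v], dist[v] = v, 0
            return
        for c in children[v]:
            visit(c)
        best = max(children[v], key=lambda c: mu[c])   # child holding the greatest leaf
        mu[v], dist[v] = mu[best], dist[best] + 1
    visit(root)

    order = sorted(parent, key=lambda v: (mu[v], dist[v]))
    pairs = [(mu[v], dist[v]) for v in order]
    return pairs, order, [parent[v] for v in order]

tree = {3: 6, 4: 10, 10: 7, 5: 6, 6: 7, 8: 1, 1: 2, 2: 7, 7: 9}
pairs, S, C = neville3_by_sorting(tree)
print(pairs)  # [(3, 0), (4, 0), (4, 1), (5, 0), (5, 1), (8, 0), (8, 1), (8, 2), (8, 3)]
print(S)      # [3, 4, 10, 5, 6, 8, 1, 2, 7]
print(C)      # [6, 10, 7, 6, 7, 1, 2, 7, 9]
```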

24

Linear time implementation

All the information appearing in the pairs can be computed with a simple tree traversal

O(n)

To sort the set of pairs it is enough to run a stable integer sort twice, first on the second component and then on the first

O(n)
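The two-pass sort can be sketched with a bucket-based counting sort, which is stable because each bucket keeps insertion order. The helper names `counting_sort_by` and `sort_pairs` are illustrative, and both components are assumed to lie in 0..n.

```python
def counting_sort_by(items, key, n):
    """Stable counting sort of items by an integer key in 0..n, in O(len(items) + n)."""
    buckets = [[] for _ in range(n + 1)]
    for item in items:
        buckets[key(item)].append(item)       # appending preserves the input order
    return [item for bucket in buckets for item in bucket]

def sort_pairs(pairs, n):
    """Lexicographic sort of integer pairs using two stable passes."""
    by_second = counting_sort_by(pairs, lambda p: p[1], n)   # least significant first
    return counting_sort_by(by_second, lambda p: p[0], n)    # then most significant

# The pairs (l(v), v) of the second Neville example, in scrambled order
pairs = [(1, 10), (0, 9), (2, 2), (0, 3), (1, 1), (0, 8), (1, 6), (0, 5), (0, 4)]
print(sort_pairs(pairs, 10))
# [(0, 3), (0, 4), (0, 5), (0, 8), (0, 9), (1, 1), (1, 6), (1, 10), (2, 2)]
```

Stability of the second pass is what keeps pairs with equal first components in the order established by the first pass.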

25

Decoding: Third Neville code

C  6 10 7  6 7 1 2 7 9
S  ?  ? ?  ? ? ? ? ? ?

Compute the rightmost occurrence of each v ∈ [n] in C:

v                     1 2 3 4 5 6 7 8 9 10
rightmost occurrence  6 7 0 0 0 4 8 0 9 2

Intermediate step:

S  ? ? 10 ? 6 ? 1 2 7

Final result:

S  3 4 10 5 6 8 1 2 7
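The decoding shown on these slides can be sketched as below. The filling rule is inferred from the example rather than quoted from the talk: when position i holds the rightmost occurrence of a node, that node becomes a leaf and is removed at position i + 1; the remaining positions take the nodes absent from C in increasing order. The function name `neville3_decode` is an assumption.

```python
def neville3_decode(code):
    """Recover the row S from a third Neville code C via rightmost occurrences."""
    n = len(code) + 1
    rightmost = [0] * (n + 1)                  # 1-based positions, 0 = never appears
    for i, v in enumerate(code, start=1):
        rightmost[v] = i

    S = [None] * (n - 1)
    for i in range(1, n - 1):                  # the last position belongs to the root
        v = code[i - 1]
        if rightmost[v] == i:                  # v loses its last child at step i
            S[i] = v                           # so v is removed at step i + 1
    partial = list(S)

    original_leaves = (v for v in range(1, n + 1) if rightmost[v] == 0)
    for i in range(n - 1):
        if S[i] is None:
            S[i] = next(original_leaves)       # smallest leaf not yet removed
    return rightmost[1:], partial, S

rightmost, partial, S = neville3_decode([6, 10, 7, 6, 7, 1, 2, 7, 9])
print(rightmost)  # [6, 7, 0, 0, 0, 4, 8, 0, 9, 2]
print(partial)    # [None, None, 10, None, 6, None, 1, 2, 7]
print(S)          # [3, 4, 10, 5, 6, 8, 1, 2, 7]
```

Both passes are linear in n, which matches the claim that decoding reduces to the rightmost-occurrence computation.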

29

Parallel results

These techniques allow us to efficiently encode and decode on EREW PRAM:
  Integer Sorting requires O(log n) time and O(n √(log n)) operations
  The rightmost occurrence computation can be reduced to Integer Sorting

30

Talk Outline

Domains
Prüfer-like codes
  Prüfer code (1918)
  Neville codes (1953)
  Deo and Micikevičius code (2002)
Picciotto codes (1999)
Applications, Operations and Properties
  Random trees generation (with constraints)
  Locality and Heritability
  Other operations
Future work

31

Picciotto’s codes

In her PhD thesis Picciotto proposed three codes for unrooted trees:
  Blob code
  Happy code
  Dandelion code

Easily adapted to rooted trees: (T, r) is encoded as c1 c2 ... c(n-2) r, a string of length n - 1

32

Happy code

[Figure sequence, slides 32-37: step-by-step construction of the Happy code on a tree with nodes 0..7]

Node  2 3 4 5 6 7
C     0 4 3 6 6 5

x     0 1 2 3 4 5 6 7
f(x)  0 0 0 4 3 6 6 7

38

Happy code

Create a bijection between T_n and a subset of the endofunctions on [n]: {ƒ: [n] → [n] s.t. ƒ(0) = ƒ(1) = 0}

The code string is ƒ(2) :: ƒ(3) :: ... :: ƒ(n)

Linear time encoding and decoding (identify and break cycles, reconstruct the original path from 1 to 0)

39

Blob code

[Figure sequence, slides 39-44: tree on nodes 0..5; the code is filled in from node 5 down to node 1]

Node         1 2 3 4 5
C (step 1)           -
C (step 2)         0 -
C (step 3)       5 0 -    path(3, 0) intersects the Blob ⇒ 3 is stable
C (step 4)     2 5 0 -
C (step 5)   2 2 5 0 -    path(1, 0) intersects the Blob ⇒ 1 is stable

45

Blob code

Straightforward implementation leads to O(n^2) (used in 2003)

Can be reduced to the transformation of the tree into a functional digraph

Linear time encoding and decoding algorithm

46

Blob code

[Figure sequence, slides 46-48: the same tree on nodes 0..5]

path(v, 0) contains u > v ⇒ v is stable

Node                1 2 3 4 5
C (stable nodes)    2   5   -
C (complete)        2 2 5 0 -

The complete code is ƒ(1) ƒ(2) ƒ(3) ƒ(4):

x     0 1 2 3 4 5
f(x)  0 2 2 5 0 0

49

Dandelion code

[Figure sequence, slides 49-50: example tree on nodes 0..11]

Node  2 3 4  5 6 7 8 9 10 11
C     5 6 10 2 4 2 1 0 3  9

Linear implementation:
  identify path(1, 0)
  traverse from 0 to 1 and mark “greater” nodes
  traverse from 1 to 0 and swap parents

52

Applications

Random trees generation
Genetic Algorithms
Data compression
Computation of forest volumes of graphs
Representing trees in several contexts (e.g. phylogenetic relationships in biology)

53

Random trees generation

Easily generate a tree by decoding a random string in linear time
Effective in parallel
Easy to add constraints:
  Root
  Leaves set and number
  Degree of selected nodes
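As an illustration of generating a tree by decoding a random string, here is a hedged sketch using the unrooted Prüfer code. It uses a heap for brevity, so it runs in O(n log n) rather than the linear time mentioned above; the function name `random_tree` and the `rng` parameter are assumptions.

```python
import heapq
import random

def random_tree(n, rng=random):
    """Draw a uniform string in [n]^(n-2) and decode it into the edges of a tree (n >= 2)."""
    code = [rng.randint(1, n) for _ in range(n - 2)]
    degree = [1] * (n + 1)                     # degree(v) = 1 + occurrences of v in the code
    for v in code:
        degree[v] += 1
    leaves = [v for v in range(1, n + 1) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in code:
        u = heapq.heappop(leaves)              # smallest leaf is attached to v
        edges.append((u, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    # Exactly two nodes are left; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return code, edges

code, edges = random_tree(8, random.Random(0))
print(code)
print(edges)
```

Since a node is a leaf exactly when it does not appear in the code, constraints on the leaf set or on node degrees translate into constraints on which symbols the random string may use and how often.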

54

Genetic Algorithms

Given an optimization problem P (e.g. constrained MST), a GA for P is a heuristic for P based on the following scheme:

[Figure: a population of N individuals (Individual 1 ... Individual N) evolving through Generation 1, Generation 2, ..., Generation K]

55

Genetic Algorithms

Each individual is a candidate solution for P (e.g. a tree) and is represented by a chromosome string

[Figure: three populations of chromosome strings, shown as digit strings]


56

Genetic Algorithms

Mutation and cross-over
Selection and elitism

[Figure: three populations of chromosome strings, shown as digit strings]


57

Genetic Algorithms

The value of the best individual in each generation should converge to the value of an optimal solution for P.

[Plot: value against generations, approaching opt]

58

Genetic Algorithms

The code must be bijective

If you use the parent vector representation, the probability that an offspring string represents a tree is

n^(n-1) / n^n = 1/n → 0 when n grows

Desirable properties:
  Locality
  Heritability
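The 1/n ratio can be checked by brute force for small n. The sketch assumes one common convention for parent vectors, not stated in the talk: nodes are 0..n-1, p[v] is the parent of v, and the root is the unique node that is its own parent. The function names are illustrative.

```python
from itertools import product

def is_rooted_tree(p):
    """True if the parent vector p encodes a rooted tree (root r has p[r] == r)."""
    n = len(p)
    if sum(1 for v in range(n) if p[v] == v) != 1:
        return False                           # need exactly one root
    for v in range(n):
        steps = 0
        while p[v] != v:                       # walk up towards the root
            v = p[v]
            steps += 1
            if steps > n:                      # trapped in a cycle that avoids the root
                return False
    return True

def tree_fraction(n):
    """Count parent vectors of length n that are trees, out of all n^n vectors."""
    vectors = list(product(range(n), repeat=n))
    return sum(1 for p in vectors if is_rooted_tree(p)), len(vectors)

for n in range(2, 6):
    trees, total = tree_fraction(n)
    print(n, trees, total, trees / total)      # the ratio equals 1/n
```

For n = 4 this counts 64 trees among 256 vectors, in line with n^(n-1) rooted labeled trees.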

59

Locality

Small changes in the tree correspond to small changes in the associated string (and vice versa)

Parent vector has optimal locality
Prüfer-like codes exhibit poor locality
Blob code is experimentally better than Prüfer code (2001)
Happy and Dandelion codes should be better than Blob code

60

Heritability

A new string is generated by mixing two existing strings with crossover operations
Edges of the tree corresponding to the mixed string should belong to either of the original trees

Parent vector: best
Prüfer-like: poor
Blob: better than Prüfer (2001)
Happy and Dandelion: better than Blob

61

Other operations

                          Prüfer-like   Picciotto
Identify the root         yes           yes
Identify leaves           yes           yes
Computations on degrees   yes           yes
Computation of diameter   DM, N2        no

Most desirable operations:

                          Prüfer-like   Picciotto
parent(v)                 no            almost
children(v)               no            almost

62

Future work

Develop parallel algorithms for Picciotto’s codes

Investigate properties and operations: P(C_i = p(i)) for Blob, Happy, and Dandelion codes

Define new efficient codes

Extend these codes to k-trees
