Bijective tree encoding
Saverio Caminiti
2
Talk Outline
Domains
Prüfer-like codes
  Prüfer code (1918)
  Neville codes (1953)
  Deo and Micikevičius code (2002)
Picciotto codes (1999)
Applications, Operations and Properties
  Random trees generation (with constraints)
  Locality and Heritability
  Other operations
Future work
3
Domains
Labeled trees $T_n$
  n nodes labeled with distinct symbols from an alphabet of size n, i.e. indexed with integers in [n] = {1, 2, ..., n}
  Both rooted and unrooted
  Undirected
  No order among the children of a node
Strings, in accordance with Cayley’s theorem
  Length n - 2 for unrooted trees (i.e. $[n]^{n-2}$)
  Length n - 1 for rooted trees (i.e. $[n]^{n-1}$)
4
Examples
Unrooted tree with code 4 1 3 3; rooted tree with code 1 4 3 3 4
[Figure: the two example trees on nodes 1 to 6]
5
Prüfer code
Introduced in 1918 to prove Cayley’s theorem; it is the first bijection between $T_n$ and $[n]^{n-2}$
code(T) = adj(u) :: code(T - u)
where u is the smallest leaf in T, adj(u) is the only node adjacent to u in T, T - u is the tree obtained from T by removing u, and the operator :: is string concatenation.
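The recursive definition above can be unrolled into a loop. The following is a minimal sketch of the straightforward O(n log n) encoder for unrooted trees, using a heap to find the smallest leaf. The function name `prufer_encode` and the edge list `example_edges` are illustrative assumptions; the edges are read off the S and C rows of the unrooted example in this talk.

```python
import heapq

def prufer_encode(n, edges):
    """Return the Prüfer code (length n - 2) of an unrooted tree on nodes 1..n."""
    adj = {v: set() for v in range(1, n + 1)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    leaves = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaves)
    code = []
    for _ in range(n - 2):
        u = heapq.heappop(leaves)        # u: the smallest leaf of the current tree
        (w,) = adj[u]                    # adj(u): the only node adjacent to u
        code.append(w)
        adj[w].remove(u)                 # T - u
        adj[u].clear()
        if len(adj[w]) == 1:             # w has just become a leaf
            heapq.heappush(leaves, w)
    return code

# Assumed edge list of the unrooted example tree (nodes 1..6).
example_edges = [(2, 4), (4, 1), (1, 3), (5, 3), (3, 6)]
print(prufer_encode(6, example_edges))
```

The loop stops after n - 2 removals, when a single edge is left.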
6
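Example: Prüfer encode unrooted
code(T) = adj(u) :: code(T - u)
S  2 4 1 5 3
C  4 1 3 3 6
The last symbol of C is always n (here 6 = n), so the code 4 1 3 3 has length n - 2
[Figure: the unrooted example tree on nodes 1 to 6]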
7
Example: Prüfer encode rooted
code(T) = adj(u) :: code(T - u)
S  2 1 5 6 3
C  1 4 3 3 4
The code has length n - 1
[Figure: the rooted example tree on nodes 1 to 6]
8
Notes: Prüfer encode
code(T) = adj(u) :: code(T - u)
S  2 1 5 6 3
C  1 4 3 3 4   (length n - 1)
Focus on rooted trees
1. Each node (but the root) is removed exactly once
2. Each node appears in C once for each of its children
3. A node can be removed only after all its children
[Figure: the rooted example tree on nodes 1 to 6]
9
Example: Prüfer decode
C  1 4 3 3 4
S  ? ? ? ? ?
Let l be the length of the string C; then n = l + 1 = 6
First step: the leaves of the initial tree are those nodes that do not appear in C: {2, 5, 6}; choose the smallest one
10
Example: Prüfer decode
C  1 4 3 3 4
S  2
The remaining code 4 3 3 4 is code(T - {2}); then we should choose the smallest leaf among {1, 5, 6}
11
Example: Prüfer decode
C  1 4 3 3 4
S  2 1
The remaining code 3 3 4 is code(T - {2, 1}); then we should choose the smallest leaf among {5, 6}
12
Example: Prüfer decode
C  1 4 3 3 4
S  2 1 5 6 3
[Figure: the decoded rooted tree on nodes 1 to 6]
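The decoding steps of this example can be sketched as follows. This is a heap-based O(n log n) version for rooted trees, not the linear time algorithm discussed later; the name `prufer_decode_rooted` and the returned parent map are illustrative assumptions.

```python
import heapq

def prufer_decode_rooted(code):
    """Rebuild the removal sequence S and the parent of each node from a rooted code C."""
    n = len(code) + 1
    remaining = [0] * (n + 1)            # occurrences of each node still to be read in C
    for c in code:
        remaining[c] += 1
    # Leaves of the initial tree: the nodes that do not appear in C.
    leaves = [v for v in range(1, n + 1) if remaining[v] == 0]
    heapq.heapify(leaves)
    removed, parent = [], {}
    for c in code:
        u = heapq.heappop(leaves)        # smallest leaf of the current tree
        removed.append(u)
        parent[u] = c
        remaining[c] -= 1
        if remaining[c] == 0:            # all children of c are gone: c is now a leaf
            heapq.heappush(leaves, c)
    return removed, parent

S, parent = prufer_decode_rooted([1, 4, 3, 3, 4])
print(S)        # removal order
print(parent)   # the root is the only node without a parent entry
```

The root is the last symbol of C; it is the only node that never receives a parent.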
13
Other Prüfer-like codes
Neville (1953): codes for rooted trees
  The first one was indeed the Prüfer code
Moon (1970)
  Adapts Neville’s codes to trees
Deo and Micikevičius (2002)
14
Second Neville code
15
Third Neville Code
16
Deo and Micikevičius code
17
Generalization
It has been proven that any deterministic procedure P able to choose at each step a non-empty sequence of leaves can be used to generate a bijective code
code(T) = adj(P(T)) :: code(T - P(T))
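To illustrate the generalization, here is a small sketch of an encoder for rooted trees in which the procedure P is a parameter. It is deliberately naive (quadratic time). The names `encode`, `smallest_leaf` and `all_leaves` are assumptions made for this example: `smallest_leaf` gives the Prüfer code, and `all_leaves` removes every current leaf in increasing order, which reproduces the second Neville code rows shown later in the talk.

```python
def encode(parent, choose):
    """Generic leaf-removal code for a rooted tree given as {node: parent}.

    `choose` plays the role of P: it receives the sorted list of current
    leaves and returns a non-empty sequence of them to remove.
    """
    parent = dict(parent)                # work on a copy; the root has no entry
    removed, code = [], []
    while parent:
        internal = set(parent.values())
        leaves = sorted(v for v in parent if v not in internal)
        for u in choose(leaves):
            code.append(parent.pop(u))   # adj(u) in a rooted tree is the parent of u
            removed.append(u)
    return removed, code

def smallest_leaf(leaves):               # P for the Prüfer code
    return leaves[:1]

def all_leaves(leaves):                  # P for the second Neville code
    return leaves

# Parent maps read off the S and C rows of the examples in this talk.
small_tree = {2: 1, 1: 4, 5: 3, 6: 3, 3: 4}
large_tree = {3: 6, 4: 10, 5: 6, 8: 1, 9: 7, 1: 2, 6: 7, 10: 7, 2: 7}
print(encode(small_tree, smallest_leaf))
print(encode(large_tree, all_leaves))
```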
18
Why several codes
Different codes may have different properties and allow different operations
Encoding and decoding algorithms for different codes may have different time (and/or space) complexity
19
Implementation of Prüfer code
Straightforward implementation: O(n log n)
First linear time algorithm in 1978 (left as an exercise in Combinatorial Algorithms)
Optimal parallel algorithm: 2000
Linear time sequential algorithm rediscovered in 2000 and 2001
Still unknown in 2003!
20
Implementation of other codes
Second Neville code: 2002
Third Neville code: 1953 (trivial)
Deo and Micikevičius: 2002 (in the original paper)
21
A unified approach
The encoding of all four codes can be reduced to sorting pairs of integers in [n]
The decoding can be reduced to the computation of the rightmost occurrence of each symbol in the code string
22
Encoding: Second Neville code
pair  0,3  0,4  0,5  0,8  0,9  1,1  1,6  1,10  2,2
S     3    4    5    8    9    1    6    10    2
C     6    10   6    1    7    2    7    7     7
Pairs are (l(v), v), where l(v) is the level of v from the bottom
23
Encoding: Third Neville code
pair  3,0  4,0  4,1  5,0  5,1  8,0  8,1  8,2  8,3
S     3    4    10   5    6    8    1    2    7
C     6    10   7    6    7    1    2    7    9
Pairs are (λ(v), d(v, λ(v))), where λ(v) is the greatest leaf in the subtree rooted at v
24
Linear time implementation
All the information appearing in the pairs can be computed with a simple tree traversal
O(n)
To sort the set of pairs it is enough to execute twice a stable integer sort
O(n)
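The two steps above (one traversal to compute the pairs, then two passes of a stable integer sort) can be sketched as follows for the second and third Neville codes. All function names are assumptions for this example, each pair is stored together with its node as a triple, and the traversal is written recursively for brevity, so it is only meant for small trees.

```python
def stable_sort_by(items, key, max_key):
    """Stable bucket sort of items by an integer key in 0..max_key (linear time)."""
    buckets = [[] for _ in range(max_key + 1)]
    for item in items:
        buckets[key(item)].append(item)
    return [item for bucket in buckets for item in bucket]

def children_of(parent):
    kids = {}
    for v, p in parent.items():
        kids.setdefault(p, []).append(v)
    return kids

def second_neville_triples(parent, root):
    """(l(v), v, v): l(v) is the level of v from the bottom."""
    kids, level = children_of(parent), {}
    def visit(v):
        level[v] = 1 + max((visit(c) for c in kids.get(v, [])), default=-1)
        return level[v]
    visit(root)
    return [(level[v], v, v) for v in parent]

def third_neville_triples(parent, root):
    """(greatest leaf below v, distance from v to that leaf, v)."""
    kids, best = children_of(parent), {}
    def visit(v):
        if v not in kids:
            best[v] = (v, 0)
        else:
            leaf, dist = max(visit(c) for c in kids[v])
            best[v] = (leaf, dist + 1)
        return best[v]
    visit(root)
    return [(best[v][0], best[v][1], v) for v in parent]

def encode(parent, root, make_triples):
    n = len(parent) + 1
    triples = make_triples(parent, root)
    triples = stable_sort_by(triples, lambda t: t[1], n)   # second component first
    triples = stable_sort_by(triples, lambda t: t[0], n)   # then first component
    removed = [v for _, _, v in triples]
    return removed, [parent[v] for v in removed]

# Parent maps read off the S and C rows of the two encoding examples.
tree_rooted_at_7 = {3: 6, 4: 10, 5: 6, 8: 1, 9: 7, 1: 2, 6: 7, 10: 7, 2: 7}
tree_rooted_at_9 = {3: 6, 4: 10, 10: 7, 5: 6, 6: 7, 8: 1, 1: 2, 2: 7, 7: 9}
print(encode(tree_rooted_at_7, 7, second_neville_triples))
print(encode(tree_rooted_at_9, 9, third_neville_triples))
```

Sorting on the second component and then on the first is the usual least-significant-digit radix sort, which is why the sort must be stable.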
25
Decoding: Third Neville code
C  6 10 7 6 7 1 2 7 9
S  ? ? ? ? ? ? ? ? ?
Compute the rightmost occurrence of each v ∈ [n] in C:
v          1 2 3 4 5 6 7 8 9 10
rightmost  6 7 0 0 0 4 8 0 9 2
27
Decoding: Third Neville code
C  6 10 7 6 7 1 2 7 9
S  ? ? 10 ? 6 ? 1 2 7
Compute the rightmost occurrence of each v ∈ [n] in C:
v          1 2 3 4 5 6 7 8 9 10
rightmost  6 7 0 0 0 4 8 0 9 2
28
Decoding: Third Neville code
C  6 10 7 6 7 1 2 7 9
S  3 4 10 5 6 8 1 2 7
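The decoding shown on these slides can be sketched as follows. The sketch relies on an observation that matches the example: in the third Neville code an internal node is removed immediately after its last child, so it goes in the position of S right after its rightmost occurrence in C, and the original leaves fill the remaining positions in increasing order. The function name and return values are assumptions for illustration.

```python
def third_neville_decode(code):
    """Return (rightmost occurrences, removal sequence S, parent map) for a rooted code C."""
    n = len(code) + 1
    rightmost = [0] * (n + 1)            # 0 means "does not appear in C"
    for position, c in enumerate(code, start=1):
        rightmost[c] = position
    removed = [0] * (n - 1)              # S, with 0 marking an empty slot
    for v in range(1, n + 1):
        # The root has its rightmost occurrence in the last position and is never removed.
        if 0 < rightmost[v] < n - 1:
            removed[rightmost[v]] = v    # 0-based slot of position rightmost[v] + 1
    leaves = (v for v in range(1, n + 1) if rightmost[v] == 0)
    for i in range(n - 1):
        if removed[i] == 0:
            removed[i] = next(leaves)    # original leaves, in increasing order
    return rightmost[1:], removed, dict(zip(removed, code))

occurrences, S, parent = third_neville_decode([6, 10, 7, 6, 7, 1, 2, 7, 9])
print(occurrences)
print(S)
```

Every step is a single linear scan, with no priority queue involved.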
29
Parallel results
These techniques allow us to efficiently encode and decode on EREW PRAM:
  Integer sorting requires O(log n) time and O(n √(log n)) operations
  The rightmost occurrence computation can be reduced to integer sorting
31
Picciotto’s codes
In her PhD thesis Picciotto proposed three codes for unrooted trees:
  Blob code
  Happy code
  Dandelion code
Easily adapted to rooted trees: (T, r) is encoded as $c_1 c_2 \ldots c_{n-2}\, r$, a string of length n - 1
32
Happy code
[Figure sequence: step-by-step construction of the Happy code on an example tree with nodes 0 to 7]
Node  2 3 4 5 6 7
C     0 4 3 6 6 5

x     0 1 2 3 4 5 6 7
f(x)  0 0 0 4 3 6 6 7
38
Happy code
Creates a bijection between $T_n$ and a subset of the endofunctions on [n]: { ƒ: [n] → [n] s.t. ƒ(0) = ƒ(1) = 0 }
The code string is ƒ(2) :: ƒ(3) :: ... :: ƒ(n)
Linear time encoding and decoding (identify and break cycles, reconstruct the original path from 1 to 0)
39
Blob code
[Figure sequence: the code of an example tree on nodes 0 to 5 is filled in from node 5 down to node 1]
Node  1 2 3 4 5
C             -
C           0 -
C         5 0 -    path(3, 0) intersects the Blob, so 3 is stable
C       2 5 0 -
C     2 2 5 0 -    path(1, 0) intersects the Blob, so 1 is stable
45
Blob code
Straightforward implementation leads to $O(n^2)$ (used in 2003)
Can be reduced to the transformation of the tree into a functional digraph
Linear time encoding and decoding algorithm
46
Blob code
[Figure sequence: the same example tree on nodes 0 to 5, encoded through its functional digraph]
v is stable when path(v, 0) contains a node u > v
Node  1 2 3 4 5
C     2   5   -    (stable nodes only)
C     2 2 5 0 -

x     0 1 2 3 4 5
f(x)  0 2 2 5 0 0
The code is ƒ(1) ƒ(2) ƒ(3) ƒ(4) = 2 2 5 0
49
Dandelion code
Node  2  3  4   5  6  7  8  9  10  11
C     5  6  10  2  4  2  1  0  3   9
51
Dandelion code
Linear implementation:
  identify path(1, 0)
  traverse from 0 to 1 and mark “greater” nodes
  traverse from 1 to 0 and swap parents
52
Applications
Random trees generation
Genetic Algorithms
Data compression
Computation of forest volumes of graphs
Representing trees in several contexts (e.g. phylogenetic relationships in biology)
53
Random trees generation
Easily generate a tree by decoding a random string in linear time
Effective in parallel
Easy to add constraints:
  Root
  Leaves set and number
  Degree of selected nodes
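As a sketch of this idea, the following generates a uniformly random unrooted labeled tree by decoding a random string in $[n]^{n-2}$ with the Prüfer code. It uses a heap, so it runs in O(n log n) rather than linear time. The function names are assumptions for this example. The degree check uses a standard fact about the Prüfer code: a node of degree d appears exactly d - 1 times in the string, which is what makes constraints on leaves and degrees easy to impose.

```python
import heapq
import random
from collections import Counter

def prufer_decode(code):
    """Decode a string in [n]^(n-2) into the edge list of an unrooted tree on 1..n."""
    n = len(code) + 2
    remaining = [0] * (n + 1)
    for c in code:
        remaining[c] += 1
    leaves = [v for v in range(1, n + 1) if remaining[v] == 0]
    heapq.heapify(leaves)
    edges = []
    for c in code:
        u = heapq.heappop(leaves)
        edges.append((u, c))
        remaining[c] -= 1
        if remaining[c] == 0:
            heapq.heappush(leaves, c)
    # Exactly two nodes are left; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

def random_tree(n, rng):
    code = [rng.randint(1, n) for _ in range(n - 2)]
    return code, prufer_decode(code)

def is_tree(n, edges):
    """Union-find check: n - 1 edges and no cycle."""
    root = list(range(n + 1))
    def find(v):
        while root[v] != v:
            root[v] = root[root[v]]
            v = root[v]
        return v
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        root[ra] = rb
    return len(edges) == n - 1

code, edges = random_tree(10, random.Random(0))
degree = Counter(v for edge in edges for v in edge)
print(code, edges)
```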
54
Genetic Algorithms
Given an optimization problem P (e.g. constrained MST), a GA for P is a heuristic for P based on the following scheme:
[Figure: populations of N individuals evolving over Generation 1, Generation 2, ..., Generation K]
55
Genetic Algorithms
Each individual is a candidate solution for P (e.g. a tree) and is represented by a chromosome string
[Figure: the chromosome strings of the individuals in Generation 1, Generation 2, ..., Generation K]
56
Genetic Algorithms
Mutation and cross-over
Selection and elitism
[Figure: chromosome strings changing from Generation 1 to Generation 2 and on to Generation K]
57
Genetic Algorithms
The value of the best individual in each generation should converge to the value of an optimal solution for P.
[Plot: value of the best individual against generations, approaching opt]
58
Genetic Algorithms
The code must be bijective
If you use the parent vector representation, the probability that an offspring string represents a tree is $n^{n-1} / n^n = 1/n$, which tends to 0 when n grows
Desirable properties:
  Locality
  Heritability
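The 1/n figure for the parent vector can be checked by brute force for small n. The sketch below assumes one particular convention that is not stated on the slide: a parent vector is a string in $[n]^n$, and the root is marked by being its own parent. Under that assumption a vector is a rooted tree exactly when it has one self-loop and every node reaches it.

```python
from fractions import Fraction
from itertools import product

def is_rooted_tree(p):
    """p[v - 1] is the parent of node v; the root is the only node with p(v) = v."""
    n = len(p)
    roots = [v for v in range(1, n + 1) if p[v - 1] == v]
    if len(roots) != 1:
        return False
    for v in range(1, n + 1):
        w = v
        for _ in range(n):               # n steps are enough to reach the root, if reachable
            w = p[w - 1]
        if w != roots[0]:                # v is stuck in a cycle that avoids the root
            return False
    return True

def tree_fraction(n):
    """Fraction of the n^n parent vectors that represent a rooted tree."""
    vectors = product(range(1, n + 1), repeat=n)
    return Fraction(sum(is_rooted_tree(p) for p in vectors), n ** n)

for n in range(1, 6):
    print(n, tree_fraction(n))
```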
59
Locality
Small changes in the tree correspond to small changes in the associated string (and vice versa)
Parent vector has optimal locality
Prüfer-like codes exhibit poor locality
Blob code is experimentally better than Prüfer code (2001)
Happy and Dandelion codes should be better than Blob code
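One way to look at locality experimentally is to change a single symbol of a code string and count how many edges of the decoded tree change. The sketch below does this for the Prüfer code on the unrooted example string 4 1 3 3. The distance measure (edges of one tree missing from the other) and all function names are assumptions made for this illustration, not the measure used in the cited experiments.

```python
import heapq

def prufer_decode(code):
    """Decode a string in [n]^(n-2) into the edge set of an unrooted tree on 1..n."""
    n = len(code) + 2
    remaining = [0] * (n + 1)
    for c in code:
        remaining[c] += 1
    leaves = [v for v in range(1, n + 1) if remaining[v] == 0]
    heapq.heapify(leaves)
    edges = set()
    for c in code:
        u = heapq.heappop(leaves)
        edges.add(frozenset((u, c)))
        remaining[c] -= 1
        if remaining[c] == 0:
            heapq.heappush(leaves, c)
    edges.add(frozenset((heapq.heappop(leaves), heapq.heappop(leaves))))
    return edges

def tree_distance(code_a, code_b):
    """Number of edges of the first tree that are missing from the second."""
    return len(prufer_decode(code_a) - prufer_decode(code_b))

def single_symbol_mutations(code):
    n = len(code) + 2
    for i in range(len(code)):
        for symbol in range(1, n + 1):
            if symbol != code[i]:
                yield code[:i] + [symbol] + code[i + 1:]

base = [4, 1, 3, 3]
distances = [tree_distance(base, m) for m in single_symbol_mutations(base)]
print(min(distances), max(distances), sum(distances) / len(distances))
```

A code with good locality keeps these distances close to 1; because the code is bijective, the distance is never 0.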
60
Heritability
A new string is generated by mixing two existing strings with crossover operations
Edges of the tree corresponding to the mixed string belong to either of the original trees
Parent vector: best
Prüfer-like: poor
Blob: better than Prüfer (2001)
Happy and Dandelion: better than Blob
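Heritability can be measured in a similar spirit: cross two code strings and compute the fraction of offspring edges that come from either parent. The sketch below uses one-point crossover on Prüfer strings. The crossover operator, the example parents and the function names are assumptions chosen for illustration.

```python
import heapq

def prufer_decode(code):
    """Decode a string in [n]^(n-2) into the edge set of an unrooted tree on 1..n."""
    n = len(code) + 2
    remaining = [0] * (n + 1)
    for c in code:
        remaining[c] += 1
    leaves = [v for v in range(1, n + 1) if remaining[v] == 0]
    heapq.heapify(leaves)
    edges = set()
    for c in code:
        u = heapq.heappop(leaves)
        edges.add(frozenset((u, c)))
        remaining[c] -= 1
        if remaining[c] == 0:
            heapq.heappush(leaves, c)
    edges.add(frozenset((heapq.heappop(leaves), heapq.heappop(leaves))))
    return edges

def one_point_crossover(a, b, cut):
    """Offspring string: the first `cut` symbols of a followed by the rest of b."""
    return a[:cut] + b[cut:]

def heritability(a, b, cut):
    """Fraction of offspring edges that belong to at least one parent tree."""
    child = prufer_decode(one_point_crossover(a, b, cut))
    inherited = child & (prufer_decode(a) | prufer_decode(b))
    return len(inherited) / len(child)

parent_a = [4, 1, 3, 3]      # the unrooted example string from this talk
parent_b = [1, 1, 1, 1]      # hypothetical second parent: a star centered at node 1
for cut in range(1, 4):
    print(cut, heritability(parent_a, parent_b, cut))
```

A value of 1 means every edge of the offspring is inherited, which is the ideal case described above.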
61
Other operations
Most desirable operations:
                          Prüfer-like  Picciotto
parent(v)                 no           almost
children(v)               no           almost

                          Prüfer-like  Picciotto
Identify the root         yes          yes
Identify leaves           yes          yes
Computations on degrees   yes          yes
Computation of diameter   DM, N2       no
62
Future work
Develop parallel algorithms for Picciotto’s codes
Investigate properties and operations
  $P(C_i = p(i))$ for Blob, Happy, and Dandelion code
Define new efficient codes
Extend these codes to k-trees