large-scale knowledge processing (part-a, lecture-5)minato/lskp2018/lskp2018... · 2018. 12....

Large-Scale Knowledge Processing (Part-A, lecture-5)

Shin-ichi Minato

Prof. of Grad. School of Informatics, Kyoto Univ.(Grad. School of Info. Sci. & Tech., Hokkaido Univ.)

2019.01.04 Large-Scale Knowledge Processing 2

Review of the last lecture

• BDD applications– VLSI design automation

• Verification, logic optimization, test generation– Combinatorial problems, optimization

• Knapsack problem, 8-queens, Traveling salesman, etc.– Comparison with problem-specific method

• Representation of sets of combinations and ZDDs (Zero-suppressed BDDs)

• Sets of combinations• ZDD reduction rule• Basic properties of ZDDs

BDD applications and ZDDs


Topics of this lecture

Basic operations of ZDDs (review) Sequence BDDs for manipulating sets of sequences

Sets of sequences Encoded ZDDs for representing sets of sequences SeqBDDs and their basic operations Tries and SeqBDDs

Permutation decision diagrams (πDDs) Sets of permutations Permutation represented by a set of transpositions πDDs and their basic operations Applications

ZDD extensions for sequences and permutations

Exercises• Confirm that BDDs’ logical OR operation algorithm

can be applied for ZDDs as well, and it corresponds the Union operation for sets of combinations.

• Similarly, BDDs’ logical AND operation algorithm can be applied for ZDDs as well, and it corresponds the Intersection operation for sets of combinations.

2019.01.04 4Large-Scale Knowledge Processing


0 1

a

b

c

{ab, acd, cd}

d

c

N1

N3

N2

N4

N5 N6

N7 N8

b

a

{ac, ad, bc, bd}

N0

a=0 a=1N7∪N8

N3∪N6 N5∪N4

0∪N4

b=0b=1b=0 b=1

N3∪0 1∪0N3∪N4

N2∪10∪N2

0∪1

c=0 c=1

d=0 d=1

= N3

= N2

= 1

= 1

{ab, acd, cd} ∪ {ac, ad, bc, bd}

= N4

1∪0 = 1

Extension of ZDDs (Comb. to higher model)

6

CurrentZDDs

AdvancedZDD-likestructure



- Multisets- Sequences- Permutations- Partitions- Trees, DAGs- Networks- etc.

Applicationsin asymmetric world

Applicationswith higher data model

(Combinatorial)

(Higher model)

Data mining, Machine learning

Advanced searching etc.

Still many applications remains where ZDDs

would be effective.

Sequence data analysisNumerical data processing

Processing of trees or semi-structured data

Develop special new algebraic operations.

Directoutputs

Further outputs


Algebraic operations in (ordinary) ZDDs

φ, { λ }, P.top can be executed in a constant time. Other operations are almost linear time for ZDD size.

1

{λ}

change(c)

0 1

c10

{c}

1

c10

{λ, c}0 1

b

c

10

0 1

{b, bc}

0 1

a10

{a} 1

b

c10

0 1

{λ, b, bc, c}

1

b

c

10

0 1

a1

0

{λ, a, b, bc, c}

change(a)

union

change(b)

union

union

ZDD construction (applying set operations)



Sets of sequences (sets of strings)

Sets of combinations: Don’t consider order and duplication of items “abcc” and “bca” are the same as “abc”.

Sets of sequences: Distinguishes all finite sequences. {λ}, { ab, aba, bbc }, { a, aa, aaa, aaaa }, etc. Here we exclude infinite sets such as { a* }.

So many real-life applications. Text search and indexing Web (html/xml) data mining Bio informatics (e.g. DNA sequence analysis)


Encoded ZDDs for Sets of sequences

Pair of (Item - position) is considered different symbol.“aaa” “a1 a2 a3”“aba” “a1 b2 a3”

Alphabet size: |Σ|Maximum length of sequences: nTotal encoded symbols: |Σ|×n

Not very efficient. Many symbols needed. We need to put a fixed maximum

length of sequences.


Sequence BDD (SeqBDD)

Loekito, Bailey, and Pei [2009] Same as ZDD reduction rule. Only 0-edges keep variable ordering. 1-edges has no restriction. Still unique representation for a given

set of sequences. Each path from root to 1-terminal

corresponds to a sequence.


Basic operations of sequence family algebra

ZDD-like algebraic operations. onset, offset, and push operations are different. Other operations are almost same.


SeqBDD construction (applying basic operations)

a

10

{a}

1

{λ}

Push(a)a

1

{a, λ}

Union

b

a

1

{ba, b}

0Push(b)

b

a

1

{ba, a, b}

0

a

Union

Push(a)

{aba, aa, ab}

b

a

10

a

a


Correspondence of Trie and SeqBDD

Trie SeqBDD equivalent to trie Reduced SeqBDD


Trie and SeqBDD

Trie: Tree-structured lexicographical index of sequences A label of a letter for each branch A path from root to leaf corresponds to a sequence

In principle, a SeqBDD is equivalent to a trie with sharing their sub-trees. “DAWG”, which is similar idea, was already proposed in

the area of string processing. DAWG is only for indexing. SeqBDD supports not only

indexing but also set operations between SeqBDDs.


Manipulating permutations

Rubik’s cube: Let P = { π | any primitive move of cube.} P includes 18 (= 3 ways ×6 faces) permutations. Cartesian product P×P represents all possible patterns

obtained by twice of primitive moves. P20 will have all possible patterns. (but maybe too large.)

15 puzzle / Card games Optimization of packing / arranging strategy

“Amida”-drawing (primitive sorting networks) One-to-one matching problems between two parties. Any bijective relation corresponds to a permutation.

Design of loss-less codes. Analysis of reversible logic. (related to quantum logic circuit.)

(© Wikipedia)

(© Wikipedia)


Sets of permutations

Sets of combinations: ( ordinary BDDs/ZDDs) Don’t consider order and duplication of items “abcc” and “bca” are the same as “abc”.

Sets of sequences: ( SeqBDDs) Distinguishes all finite sequences. {λ}, { ab, aba, bbc }, { a, aa, aaa, aaaa }, etc.

Sets of permutations: ( πDD) Set of orders in a fixed number of items. φ, { 123 }, {12, 21}, { 123456, 132456, 246135 }

2019.01.04

Decomposition of permutation

54321

54321

(3,5,2,1,4)

π = (3,5,2,1,4)

Repeating this process provides a decomposed form.

18Large-Scale Knowledge Processing

Transpositions τ(x,y) : exchange of two items x and y. Any n-item permutation π can be decomposed

by at most (n-1) transpositions.Let x = dim(π), then π τ(x, xπ) must not move x. Thus, dim(π τ(x, xπ) ) ≦ dim(π) – 1

2019.01.04


54321

54321

54321

τ(5,4)(3,4,2,1)

π = (3,5,2,1,4) = (3,4,2,1) τ(5,4)





2019.01.04


54321

54321

54321

τ(4,1)54321

τ(5,4)(3,1,2)

π = (3,5,2,1,4) = (3,1,2) τ(4,1) τ(5,4)





2019.01.04


54321

54321

54321

τ(4,1)54321

τ(5,4)54321

τ(3,2)τ(2,1)

π = (3,5,2,1,4) = τ(2,1) τ(3,2) τ(4,1) τ(5,4)





Deterministicprocess. canonical formfor any given π.

22

Main idea of πDDs

P = P0 ∪ P1 τ(x,y)

Using a pair of IDs (x, y) for each decision node.

Let x = dim(P), and x > y > 0

x, y

P0 P1

P

P0 = { π | π∈P , xπ ≠ y }P1 = { πτ(x,y) | π∈P , xπ = y }

dim(P0) ≦ dim(P)dim(P1) < dim(P)

2019.01.04 Large-Scale Knowledge Processing


Node reduction rules for πDDs

x,y

P0 P1

x,y x,y

P0 P1

(share)

Node sharing

x,y

P P

(jump)

0

Zero-suppressednode elimination

Same reduction rules as ZDDs. Ordinary BDD rules don’t work.

πDDs of single permutation

0

φ

1

{ πe }

0 1

2,1

{(2,1)}

0 1

3,2

{(1,3,2)}

0 1

3,1

{(3,2,1)}

0 1

3,2

{(3,1,2)}

2,1

0 1

3,1

{(2,3,1)}

2,1

321

321

321

321

321

321

2019.01.04 24Large-Scale Knowledge Processing


πDDs for sets of permutations

0 1

2,1

{(2,1)}

1

2,1

{ πe,(2,1)}

{πe,(2,1)}

{πe,(2,1),(1,3,2),(3,1,2),(3,2,1),(2,3,1)}

1

3,1

2,1

3,2

{πe,(2,1,),(1,3,2),(3,1,2)}

{πe,(2,1)}

{(2,1),(3,1,2),(3,2,1),(2,3,1)}

1

3,1

2,1

3,2

{(2,1),(3,1,2)}

0

2,1

{(2,1)}


Algebraic operations for πDDs

“Permutation family algebra”


Synthesis of πDDs by algebraic operations

0 1

2,1

{(2,1)}

1

2,1

{πe ,(2,1)}

{πe ,(2,1),(1,3,2)}

3,2

2,1

1

1

{ πe }τ(3,2)

union

0 1

3,2

{(1,3,2)}

union

1

3,2

2,1

product

1

3,2

2,1

0

difference

τ(2,1)

{πe ,(2,1),(1,3,2),(3,1,2)}

{(3,1,2)}


Product operation for disjoint permutations

2019.01.04

Application to primitive sorting networks


For n item permutations,we may layout k primitive (adjacent) swaps. How many k is the minimumfor generating a givenpermutation?

Any permutations of up to k swaps can be computed by the following simple equations. using πDDs.

2019.01.04

Experiment (1) for primitive sorting networks


Experimental results for n = 10. All 10! permutations are generated by 45 swaps. (10! - 1) with 44 swaps. πDDs have a peak size at P27. Final size is only 45 nodes.

2019.01.04

Experiment (2) for primitive sorting networks


Experimental result for n lines of primitive sorting networks. Determined the minimum swaps m to generate all n! permutations. High compression rate (>10000 times) in πDD size (peak and final).

2019.01.04

Analysis of Rubik’s cube


Here, we focus only the moves of eight corner cubes. After solving the corner cubes, we can solve the other cubes independently.

Primitive turns along X-, Y-, and Z-axis can be written as follows.

2019.01.04





2019.01.04





All patterns by one move.

2019.01.04





All patterns by one move.

All patterns by k moves.

2019.01.04

Experimental result for Rubik’s cube


If focusing only corner cubes,11-moves generates all patterns. Only 511 πDD nodes for all patterns.

After generating Pk, we can analyze various properties of Rubik’s cube. The patterns such that only 2 cubes

moved but other 6 cubes remain can be written as:

4 patterns in k=10, and 5 patterns in k=11.(The sequence of moves can be obtained by πDD operations.)

2019.01.04

Discussion on solving Rubik’s cube


Rokicki et al. confirmed that all 3×3 Rubik’s cube can be solved as few as 20 moves. [Rokicki et al. 2010] They don’t enumerate all minimum number of moves.

They just checked no patterns require more than 20 moves. At first they applied mathematical pruning, and then explored all

patterns by using large-scale PC cloud. (total 35 CPU years) Straight-forward application of πDD to 3×3 Rubik’s cube,

might cause memory overflow. πDD generates the minimum moves for all patterns, so the

computation is stronger than Rokicki’s result. Not using mathematical pruning, only with πDD’s compression. Using one CPU only. Our method is very flexible to modify the problem regulation.

(Definition of primitive moves, restriction of sequences, etc.)


πDD sizes for typical cases

φ, { e } : O(1) nodes Sets of a single permutation with n items: O(n) nodes Sets of any k permutations with n items: O(k n) nodes Sets of all n rotations with n items: O(n2) nodes Sets of all n! permutations with n items: O(n2) nodes

Nodes for each permutation is bounded by “swap distance” from identical permutation.

πDD can be compact for representing the family consists of many similar sub-permutations.


Upper bound of πDD sizes

Number of Families of permutations up to n items: 2n!

n 0, 1, 2, 3, 4, 5, … n! 1, 1, 2, 6, 24, 120, … 2n! 2, 2, 4, 64, 16777216,

1329227995784915872903807060280344576, … At least log n bit needed to distinguish n objects.

Thus, πDD size must be O(n!) bit in general.

2019.01.04

Future Work on πDDs


Research on πDDs is still ongoing. A lot of future work remains.

Analyzing more precise complexity for πDD operations Developing a tool for easily manipulating permutation groups Many practical applications using sets of permutations Extension of permutation family algebra. (division, etc.) k-out-of n permutations, multiset of permutations, permutation of multiset items, etc.


Summary

Basic operations of ZDDs (review) Sequence BDDs for manipulating sets of sequences

Sets of sequences Encoded ZDDs for representing sets of sequences SeqBDDs and their basic operations Tries and SeqBDDs

Permutation decision diagrams (πDDs) Sets of permutations Permutation represented by a set of transpositions πDDs and their basic operations Applications

ZDD extensions for sequences and permutations


Exercises

Draw the SeqBDDs for the following sets of sequences. { λ, a, aa, aaa, aaaa } { aaa, aab, aba, abb, baa, bab, bba, bbb } { a, ab, abb, abbb, b, ba, baa, baaa }

large-scale knowledge processing (part-a, lecture-5)minato/lskp2018/lskp2018... · 2018. 12....

Documents