Large-Scale Knowledge Processing (Part-A, lecture-5)
Shin-ichi Minato
Prof. of Grad. School of Informatics, Kyoto Univ. (Grad. School of Info. Sci. & Tech., Hokkaido Univ.)
2019.01.04 Large-Scale Knowledge Processing 2
Review of the last lecture
• BDD applications
  – VLSI design automation
    • Verification, logic optimization, test generation
  – Combinatorial problems, optimization
    • Knapsack problem, 8-queens, traveling salesman, etc.
  – Comparison with problem-specific methods
• Representation of sets of combinations and ZDDs (Zero-suppressed BDDs)
  – Sets of combinations
  – ZDD reduction rule
  – Basic properties of ZDDs
BDD applications and ZDDs
Topics of this lecture
• Basic operations of ZDDs (review)
• Sequence BDDs for manipulating sets of sequences
  – Sets of sequences
  – Encoded ZDDs for representing sets of sequences
  – SeqBDDs and their basic operations
  – Tries and SeqBDDs
• Permutation decision diagrams (πDDs)
  – Sets of permutations
  – Permutation represented by a set of transpositions
  – πDDs and their basic operations
  – Applications
ZDD extensions for sequences and permutations
Exercises
• Confirm that the BDDs’ logical OR operation algorithm can be applied to ZDDs as well, and that it corresponds to the Union operation for sets of combinations.
• Similarly, the BDDs’ logical AND operation algorithm can be applied to ZDDs as well, and it corresponds to the Intersection operation for sets of combinations.
[Figure: recursive computation of the union {ab, acd, cd} ∪ {ac, ad, bc, bd} on two ZDDs with nodes N0 to N8. The operation N7∪N8 is expanded on a=0 / a=1 into N3∪N6 and N5∪N4, and then on b, c, and d, until trivial cases such as N3∪0 = N3, 0∪N2 = N2, 0∪N4 = N4, and 1∪0 = 1 are reached.]
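The recursive expansion in this figure can be sketched in a few lines of Python. This is only an illustrative sketch, not the lecture's implementation: nodes are plain tuples `(var, lo, hi)`, the variable order a < b < c < d from the root downward is an assumption, and the names `getnode`, `single`, `union`, and `family` are hypothetical helpers introduced here.

```python
from functools import lru_cache, reduce

ZERO, ONE = 0, 1   # terminals: the empty family and the family {λ}

def getnode(var, lo, hi):
    """Make node (var, lo, hi); the zero-suppressed rule drops it when hi is 0."""
    return lo if hi == ZERO else (var, lo, hi)

def single(comb):
    """ZDD of the family that contains exactly one combination."""
    node = ONE
    for var in sorted(comb, reverse=True):   # build bottom-up, last variable first
        node = getnode(var, ZERO, node)
    return node

@lru_cache(maxsize=None)                     # memoize results of sub-operations
def union(p, q):
    if p == ZERO or p == q:                  # trivial cases: 0 ∪ Q = Q, P ∪ P = P
        return q
    if q == ZERO:                            # P ∪ 0 = P
        return p
    if p == ONE:                             # add λ at the end of q's 0-edge chain
        return getnode(q[0], union(ONE, q[1]), q[2])
    if q == ONE:
        return getnode(p[0], union(p[1], ONE), p[2])
    if p[0] < q[0]:                          # p's top variable is absent from q
        return getnode(p[0], union(p[1], q), p[2])
    if p[0] > q[0]:
        return getnode(q[0], union(p, q[1]), q[2])
    # same top variable: expand both on var=0 and var=1
    return getnode(p[0], union(p[1], q[1]), union(p[2], q[2]))

def family(p):
    """Enumerate the combinations represented by p as strings."""
    if p == ZERO:
        return set()
    if p == ONE:
        return {""}
    var, lo, hi = p
    return {var + s for s in family(hi)} | family(lo)

F = reduce(union, map(single, ["ab", "acd", "cd"]))
G = reduce(union, map(single, ["ac", "ad", "bc", "bd"]))
H = union(F, G)
print(sorted(family(H)))
```

Because tuples compare structurally, equal sub-graphs are treated as the same operand by the memo table; a real ZDD package would use a unique table of node identifiers instead.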
Extension of ZDDs (Comb. to higher model)
[Diagram: current ZDDs (combinatorial model) are extended to advanced ZDD-like structures (higher model).]
• Higher data models: multisets, sequences, permutations, partitions, trees, DAGs, networks, etc.
• Direct outputs: applications in asymmetric world
  – Data mining, machine learning, advanced searching, etc.
  – Many applications still remain where ZDDs would be effective.
• Further outputs: applications with higher data model
  – Sequence data analysis, numerical data processing, processing of trees or semi-structured data
  – Develop special new algebraic operations.
Algebraic operations in (ordinary) ZDDs
• φ, { λ }, and P.top can be executed in constant time.
• Other operations take time almost linear in the ZDD size.
ZDD construction (applying set operations)
[Figure: construction of a ZDD by applying change and union operations step by step: {λ} → change(c) → {c}; {λ} ∪ {c} = {λ, c}; change(b) → {b, bc}; {λ, c} ∪ {b, bc} = {λ, b, bc, c}; {λ} → change(a) → {a}; {λ, b, bc, c} ∪ {a} = {λ, a, b, bc, c}.]
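The construction steps can be replayed on explicit families of sets. The sketch below models only the meaning of the operations with Python frozensets, not the ZDD data structure itself; `change` and `show` are hypothetical helper names, and the order of steps is an assumption read off the slide.

```python
def change(family, item):
    """Toggle item in every combination: add it if absent, remove it if present."""
    return frozenset(comb ^ {item} for comb in family)

def show(family):
    """Render a family as sorted strings; the empty string stands for λ."""
    return sorted("".join(sorted(comb)) for comb in family)

unit = frozenset({frozenset()})      # {λ}

f_c = change(unit, "c")              # {c}
f_lc = unit | f_c                    # {λ, c}
f_b = change(f_lc, "b")              # {b, bc}
f_lbc = f_lc | f_b                   # {λ, b, bc, c}
f_a = change(unit, "a")              # {a}
result = f_lbc | f_a                 # {λ, a, b, bc, c}

print(show(result))
```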
Sets of sequences (sets of strings)
• Sets of combinations:
  – Don’t consider order and duplication of items.
  – “abcc” and “bca” are the same as “abc”.
• Sets of sequences:
  – Distinguishes all finite sequences.
  – {λ}, { ab, aba, bbc }, { a, aa, aaa, aaaa }, etc.
  – Here we exclude infinite sets such as { a* }.
• So many real-life applications.
  – Text search and indexing
  – Web (html/xml) data mining
  – Bioinformatics (e.g. DNA sequence analysis)
Encoded ZDDs for Sets of sequences
• Each (item, position) pair is considered a different symbol.
  – “aaa” → “a1 a2 a3”
  – “aba” → “a1 b2 a3”
• Alphabet size: |Σ|
• Maximum length of sequences: n
• Total encoded symbols: |Σ|×n
• Not very efficient.
  – Many symbols needed.
  – We need to fix a maximum length of sequences.
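A minimal sketch of this encoding, assuming 1-based positions as in “a1 b2 a3”. The function names `encode` and `decode` are hypothetical; each sequence becomes an ordinary combination of (item, position) symbols that a ZDD could store.

```python
def encode(seq):
    """Map a sequence to a combination of (item, position) symbols."""
    return frozenset((ch, pos) for pos, ch in enumerate(seq, start=1))

def decode(comb):
    """Recover the sequence by sorting the symbols by position."""
    return "".join(ch for ch, pos in sorted(comb, key=lambda sym: sym[1]))

alphabet = "ab"          # Σ
max_len = 3              # n, the fixed maximum length
symbols = [(ch, pos) for pos in range(1, max_len + 1) for ch in alphabet]

print(sorted(encode("aba"), key=lambda sym: sym[1]))
print(len(symbols))      # |Σ| × n encoded symbols
```

The fixed `max_len` makes the drawback visible: the symbol table must be sized for the longest sequence before any set is built.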
Sequence BDD (SeqBDD)
• Loekito, Bailey, and Pei [2009]
• Same as ZDD reduction rule.
• Only 0-edges keep the variable ordering. 1-edges have no restriction.
• Still a unique representation for a given set of sequences.
• Each path from the root to the 1-terminal corresponds to a sequence.
Basic operations of sequence family algebra
• ZDD-like algebraic operations.
• onset, offset, and push operations are different.
• Other operations are almost the same.
SeqBDD construction (applying basic operations)
[Figure: construction of a SeqBDD by applying Push and Union operations step by step: {λ} → Push(a) → {a}; {a} ∪ {λ} = {a, λ}; Push(b) → {ba, b}; {ba, b} ∪ {a} = {ba, a, b}; Push(a) → {aba, aa, ab}.]
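The Push and Union steps can be sketched as follows. This is an illustrative sketch rather than the implementation of Loekito, Bailey, and Pei: nodes are tuples `(sym, lo, hi)`, symbols are assumed to increase alphabetically along 0-edges (the slide's drawing may use the opposite order), and `getnode`, `push`, `union`, and `sequences` are hypothetical names.

```python
from functools import lru_cache

ZERO, ONE = 0, 1          # terminals: the empty set and {λ}
_unique = {}              # unique table, so that equal sub-graphs are shared

def getnode(sym, lo, hi):
    """Return the shared node (sym, lo, hi), applying the zero-suppressed rule."""
    if hi == ZERO:        # 1-edge to the 0-terminal: the node is eliminated
        return lo
    key = (sym, lo, hi)
    return _unique.setdefault(key, key)

def push(p, sym):
    """Prepend sym to every sequence in p (no ordering constraint on the 1-edge)."""
    return getnode(sym, ZERO, p)

@lru_cache(maxsize=None)
def union(p, q):
    if p == ZERO or p == q:
        return q
    if q == ZERO:
        return p
    if p == ONE:          # add λ at the end of q's 0-edge chain
        return getnode(q[0], union(ONE, q[1]), q[2])
    if q == ONE:
        return getnode(p[0], union(p[1], ONE), p[2])
    if p[0] < q[0]:       # ordering is maintained along 0-edges only
        return getnode(p[0], union(p[1], q), p[2])
    if p[0] > q[0]:
        return getnode(q[0], union(p, q[1]), q[2])
    return getnode(p[0], union(p[1], q[1]), union(p[2], q[2]))

def sequences(p):
    """Each path from the root to the 1-terminal is one sequence."""
    if p == ZERO:
        return set()
    if p == ONE:
        return {""}
    sym, lo, hi = p
    return {sym + s for s in sequences(hi)} | sequences(lo)

s_a = push(ONE, "a")          # {a}
s_al = union(s_a, ONE)        # {a, λ}
s_b = push(s_al, "b")         # {ba, b}
s_bab = union(s_b, s_a)       # {ba, a, b}
s_final = push(s_bab, "a")    # {aba, aa, ab}
print(sorted(sequences(s_final)))
```

The same symbol may appear again below a 1-edge, which is what lets sequences such as "aa" be represented without position-encoded symbols.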
Correspondence of Trie and SeqBDD
[Figure: three panels: Trie; SeqBDD equivalent to the trie; Reduced SeqBDD]
Trie and SeqBDD
• Trie: Tree-structured lexicographical index of sequences
  – A label of a letter for each branch
  – A path from root to leaf corresponds to a sequence
• In principle, a SeqBDD is equivalent to a trie whose sub-trees are shared.
  – “DAWG”, a similar idea, was already proposed in the area of string processing.
  – DAWG is only for indexing. SeqBDD supports not only indexing but also set operations between SeqBDDs.
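The effect of sharing sub-trees can be illustrated with a small sketch that builds a plain trie and then counts how many structurally distinct sub-trees it contains. The word list, the `END` marker, and the function names are all hypothetical choices for illustration; this is neither a DAWG construction algorithm nor a SeqBDD package.

```python
END = ""   # hypothetical end-of-sequence marker key

def build_trie(words):
    """Build a trie as nested dicts, one dict per trie node."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def count_trie_nodes(node):
    """Number of nodes in the (unshared) trie."""
    return 1 + sum(count_trie_nodes(child)
                   for ch, child in node.items() if ch != END)

def shared_subtrees(node, seen):
    """Collect one canonical key per structurally distinct sub-tree."""
    key = (END in node,
           tuple(sorted((ch, shared_subtrees(child, seen))
                        for ch, child in node.items() if ch != END)))
    seen.add(key)
    return key

trie = build_trie(["car", "bar", "cat", "bat"])
distinct = set()
shared_subtrees(trie, distinct)
print(count_trie_nodes(trie), len(distinct))
```

For this word list the trie has 9 nodes, while only 4 distinct sub-trees remain once identical ones are merged.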
Manipulating permutations
• Rubik’s cube:
  – Let P = { π | any primitive move of the cube }.
  – P includes 18 (= 3 ways × 6 faces) permutations.
  – Cartesian product P×P represents all possible patterns obtained by two primitive moves.
  – P^20 will have all possible patterns (but may be too large).
• 15 puzzle / Card games
• Optimization of packing / arranging strategy
• “Amida”-drawing (primitive sorting networks)
• One-to-one matching problems between two parties.
  – Any bijective relation corresponds to a permutation.
• Design of loss-less codes.
• Analysis of reversible logic (related to quantum logic circuits).
Sets of permutations
• Sets of combinations: (ordinary BDDs/ZDDs)
  – Don’t consider order and duplication of items.
  – “abcc” and “bca” are the same as “abc”.
• Sets of sequences: (SeqBDDs)
  – Distinguishes all finite sequences.
  – {λ}, { ab, aba, bbc }, { a, aa, aaa, aaaa }, etc.
• Sets of permutations: (πDD)
  – Set of orders in a fixed number of items.
  – φ, { 123 }, { 12, 21 }, { 123456, 132456, 246135 }
Decomposition of permutation
• Transpositions τ(x,y): exchange of two items x and y.
• Any n-item permutation π can be decomposed into at most (n-1) transpositions.
  – Let x = dim(π); then π τ(x, xπ) must not move x.
  – Thus, dim(π τ(x, xπ)) ≦ dim(π) – 1.
• Repeating this process provides a decomposed form.
Example: π = (3,5,2,1,4)
π = (3,5,2,1,4) = (3,4,2,1) τ(5,4)
π = (3,5,2,1,4) = (3,1,2) τ(4,1) τ(5,4)
π = (3,5,2,1,4) = τ(2,1) τ(3,2) τ(4,1) τ(5,4)
This is a deterministic process, so it gives a canonical form for any given π.
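The decomposition procedure can be sketched in Python as below. The sketch assumes that a permutation is written as a tuple whose i-th entry is the image iπ, and that multiplying by τ(x,y) on the right exchanges the values x and y; with these assumptions it reproduces the example π = (3,5,2,1,4). The names `decompose` and `compose` are hypothetical.

```python
def decompose(perm):
    """Return transpositions [τ1, τ2, ...] such that perm = τ1 τ2 ... (applied left to right)."""
    p = list(perm)
    taus = []
    for x in range(len(p), 1, -1):     # x runs over dim(π) candidates, largest first
        y = p[x - 1]                   # y = xπ, the image of x
        if y != x:
            taus.append((x, y))
            i = p.index(x)             # multiply by τ(x, y): exchange the values x and y
            p[i], p[x - 1] = y, x      # afterwards x is no longer moved
    return taus[::-1]                  # the last transposition found is applied first

def compose(taus, n):
    """Rebuild the permutation from its transposition sequence."""
    p = list(range(1, n + 1))
    for x, y in taus:
        i, j = p.index(x), p.index(y)
        p[i], p[j] = y, x
    return tuple(p)

taus = decompose((3, 5, 2, 1, 4))
print(taus)
print(compose(taus, 5))
```

Each loop iteration fixes one item, so at most n-1 transpositions are produced, and the result depends only on π.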
Main idea of πDDs
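• Using a pair of IDs (x, y) for each decision node.
• Let x = dim(P), and x > y > 0.
• P = P0 ∪ P1 τ(x,y)
  – P0 = { π | π∈P, xπ ≠ y }
  – P1 = { π τ(x,y) | π∈P, xπ = y }
  – dim(P0) ≦ dim(P)
  – dim(P1) < dim(P)
[Figure: a decision node labeled (x, y) represents P; its 0-edge points to P0 and its 1-edge points to P1.]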
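The decomposition P = P0 ∪ P1 τ(x,y) can be tried out on explicit sets of permutations. The sketch below uses Python frozensets instead of a real πDD package; permutations are tuples of images, and choosing the smallest available y at each node is an assumption made here, since the node ordering is not spelled out in the text. `dim`, `times_tau`, `split`, and `build` are hypothetical names.

```python
from itertools import permutations

def dim(perm):
    """Largest item moved by perm (0 for the identity)."""
    return max((i for i, v in enumerate(perm, start=1) if v != i), default=0)

def times_tau(perm, x, y):
    """Return perm τ(x, y): exchange the values x and y."""
    return tuple(y if v == x else x if v == y else v for v in perm)

def split(P, x, y):
    """P0 = { π | xπ ≠ y },  P1 = { π τ(x,y) | xπ = y }."""
    P0 = frozenset(p for p in P if p[x - 1] != y)
    P1 = frozenset(times_tau(p, x, y) for p in P if p[x - 1] == y)
    return P0, P1

def build(P, table):
    """Recursively build shared nodes ((x, y), node0, node1) for the family P."""
    if not P:
        return 0                                  # 0-terminal: empty family
    x = max(dim(p) for p in P)                    # x = dim(P)
    if x == 0:
        return 1                                  # 1-terminal: only the identity
    y = min(p[x - 1] for p in P if p[x - 1] != x) # assumed choice of y
    P0, P1 = split(P, x, y)
    node = ((x, y), build(P0, table), build(P1, table))
    return table.setdefault(node, node)           # node sharing

S3 = frozenset(permutations(range(1, 4)))
P0, P1 = split(S3, 3, 1)
table3, table4 = {}, {}
build(S3, table3)
build(frozenset(permutations(range(1, 5))), table4)
print(len(table3), len(table4))
```

Since P1 is never empty by the choice of y, the zero-suppressed rule does not need to be applied explicitly in this sketch.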
Node reduction rules for πDDs
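[Figure: two reduction rules. Node sharing: two nodes with the same label (x,y) and the same sub-graphs P0 and P1 are merged into one (share). Zero-suppressed node elimination: a node (x,y) whose 1-edge points to the 0-terminal is removed, and its incoming edge jumps to the 0-edge child P (jump).]
• Same reduction rules as ZDDs.
• Ordinary BDD rules don’t work.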
πDDs of single permutation
[Figure: πDDs for single permutations of up to 3 items]
• φ: 0-terminal only
• { πe }: 1-terminal only
• {(2,1)}: one node (2,1)
• {(1,3,2)}: one node (3,2)
• {(3,2,1)}: one node (3,1)
• {(3,1,2)}: node (3,2) whose 1-edge points to node (2,1)
• {(2,3,1)}: node (3,1) whose 1-edge points to node (2,1)
πDDs for sets of permutations
[Figure: πDDs built from the nodes (3,1), (3,2), and (2,1) with shared sub-graphs, representing the sets {(2,1)}, {πe,(2,1)}, {πe,(2,1),(1,3,2),(3,1,2)}, {πe,(2,1),(1,3,2),(3,1,2),(3,2,1),(2,3,1)}, {(2,1),(3,1,2)}, and {(2,1),(3,1,2),(3,2,1),(2,3,1)}.]
Algebraic operations for πDDs
“Permutation family algebra”
Synthesis of πDDs by algebraic operations
[Figure: synthesis of πDDs step by step. Operations shown: τ(2,1), τ(3,2), union, product, and difference. Families appearing in the figure: { πe }, {(2,1)}, {πe,(2,1)}, {(1,3,2)}, {πe,(2,1),(1,3,2)}, {πe,(2,1),(1,3,2),(3,1,2)}, and {(3,1,2)}.]
Product operation for disjoint permutations
Application to primitive sorting networks
• For n-item permutations, we may lay out k primitive (adjacent) swaps.
• What is the minimum k for generating a given permutation?
• All permutations obtained by up to k swaps can be computed by the following simple equations, using πDDs.
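The equations themselves are not reproduced in this text. A plausible reading, stated here as an assumption, is the recurrence P_0 = { πe }, P_k = P_(k-1) ∪ (P_(k-1) × T), where T is the set of adjacent transpositions. The sketch below evaluates that recurrence with explicit Python sets instead of πDDs, so it only works for small n; `min_swaps_for_all` is a hypothetical name.

```python
from math import factorial

def min_swaps_for_all(n):
    """Return (k, sizes): the first k with |P_k| = n!, and |P_0|, ..., |P_k|."""
    identity = tuple(range(1, n + 1))
    reached = {identity}                   # P_0 = { πe }
    sizes = [1]
    k = 0
    while len(reached) < factorial(n):
        step = set()
        for p in reached:                  # P_(k-1) × T
            for i in range(n - 1):
                q = list(p)
                q[i], q[i + 1] = q[i + 1], q[i]   # one primitive (adjacent) swap
                step.add(tuple(q))
        reached |= step                    # P_k = P_(k-1) ∪ (P_(k-1) × T)
        k += 1
        sizes.append(len(reached))
    return k, sizes

k4, sizes4 = min_swaps_for_all(4)
print(k4, sizes4)
```

For n = 4 the loop stops at k = 6, with 4! - 1 permutations already reached at k = 5. Explicit sets grow as n!, which is the cost the πDD representation is meant to avoid.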
Experiment (1) for primitive sorting networks
• Experimental results for n = 10.
  – All 10! permutations are generated by 45 swaps ((10! - 1) permutations with 44 swaps).
  – πDDs have a peak size at P27.
  – The final size is only 45 nodes.
Experiment (2) for primitive sorting networks
• Experimental result for n lines of primitive sorting networks.
  – Determined the minimum number of swaps m to generate all n! permutations.
  – High compression rate (>10000 times) in πDD size (peak and final).
Analysis of Rubik’s cube
• Here, we focus only on the moves of the eight corner cubes.
  – After solving the corner cubes, we can solve the other cubes independently.
• Primitive turns along the X-, Y-, and Z-axes can be written as follows.
All patterns by one move.
All patterns by k moves.
Experimental result for Rubik’s cube
• If we focus only on the corner cubes, 11 moves generate all patterns.
  – Only 511 πDD nodes for all patterns.
• After generating Pk, we can analyze various properties of Rubik’s cube.
  – The patterns in which only 2 cubes are moved while the other 6 cubes remain in place can be written as:
  – 4 patterns in k=10, and 5 patterns in k=11. (The sequence of moves can be obtained by πDD operations.)
Discussion on solving Rubik’s cube
• Rokicki et al. confirmed that every pattern of the 3×3 Rubik’s cube can be solved in 20 moves or fewer. [Rokicki et al. 2010]
  – They did not enumerate the minimum number of moves for every pattern. They only checked that no pattern requires more than 20 moves.
  – At first they applied mathematical pruning, and then explored all patterns by using a large-scale PC cloud (35 CPU years in total).
• A straightforward application of πDDs to the 3×3 Rubik’s cube might cause memory overflow.
• πDD generates the minimum moves for all patterns, so the computation is stronger than Rokicki’s result.
  – No mathematical pruning, only πDD’s compression.
  – Using one CPU only.
• Our method is very flexible when the problem rules are modified (definition of primitive moves, restriction of sequences, etc.).
πDD sizes for typical cases
• φ, { e }: O(1) nodes
• Sets of a single permutation with n items: O(n) nodes
• Sets of any k permutations with n items: O(k n) nodes
• Sets of all n rotations with n items: O(n²) nodes
• Sets of all n! permutations with n items: O(n²) nodes
• The number of nodes for each permutation is bounded by its “swap distance” from the identity permutation.
• A πDD can be compact when representing a family that consists of many similar sub-permutations.
Upper bound of πDD sizes
• Number of families of permutations up to n items: 2^(n!)
  n:      0, 1, 2, 3, 4, 5, …
  n!:     1, 1, 2, 6, 24, 120, …
  2^(n!): 2, 2, 4, 64, 16777216, 1329227995784915872903807060280344576, …
• At least log n bits are needed to distinguish n objects.
• Thus, a πDD needs on the order of n! bits in general, since log 2^(n!) = n!.
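The table and the bit count can be checked with Python's arbitrary-precision integers. This is only an arithmetic sketch; the variable names are arbitrary.

```python
from math import factorial

rows = []
for n in range(6):
    families = 2 ** factorial(n)          # every subset of the n! permutations
    bits = families.bit_length() - 1      # log2 of an exact power of two
    rows.append((n, factorial(n), families, bits))
    print(n, factorial(n), families, bits)
```

The last column equals n! in every row, which is the number of bits needed to tell all families apart.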
Future Work on πDDs
• Research on πDDs is still ongoing. A lot of future work remains.
  – Analyzing the complexity of πDD operations more precisely
  – Developing a tool for easily manipulating permutation groups
  – Many practical applications using sets of permutations
  – Extension of permutation family algebra (division, etc.)
  – k-out-of-n permutations, multisets of permutations, permutations of multiset items, etc.
Summary
• Basic operations of ZDDs (review)
• Sequence BDDs for manipulating sets of sequences
  – Sets of sequences
  – Encoded ZDDs for representing sets of sequences
  – SeqBDDs and their basic operations
  – Tries and SeqBDDs
• Permutation decision diagrams (πDDs)
  – Sets of permutations
  – Permutation represented by a set of transpositions
  – πDDs and their basic operations
  – Applications
ZDD extensions for sequences and permutations
Exercises
• Draw the SeqBDDs for the following sets of sequences.
  – { λ, a, aa, aaa, aaaa }
  – { aaa, aab, aba, abb, baa, bab, bba, bbb }
  – { a, ab, abb, abbb, b, ba, baa, baaa }