![Page 1: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/1.jpg)
Sparse Suffix Sorting
Tomohiro I
![Page 2: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/2.jpg)
! Sorting suffixes of a string T is a fundamental task having a lot of applications such as text indexing.
Suffix sorting
![Page 3: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/3.jpg)
! Sorting suffixes of a string T is a fundamental task having a lot of applications such as text indexing.
Suffix sorting
T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
![Page 4: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/4.jpg)
Suffix sorting
lionlyingonlyonlyon$ ionlyingonlyonlyon$ onlyingonlyonlyon$ nlyingonlyonlyon$ lyingonlyonlyon$ yingonlyonlyon$ ingonlyonlyon$ ngonlyonlyon$ gonlyonlyon$ onlyonlyon$ nlyonlyon$ lyonlyon$ yonlyon$ onlyon$ nlyon$ lyon$ yon$ on$ n$ $
all suffixes of T
T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$ gonlyonlyon$ ingonlyonlyon$ ionlyingonlyonlyon$ lionlyingonlyonlyon$ lyingonlyonlyon$ lyon$ lyonlyon$ n$ ngonlyonlyon$ nlyingonlyonlyon$ nlyon$ nlyonlyon$ on$ onlyingonlyonlyon$ onlyon$ onlyonlyon$ yingonlyonlyon$ yon$ yonlyon$
20 9 7 2 1 5
16 12 19 8 4
15 11 18 3
14 10 6
17 13
sort
![Page 5: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/5.jpg)
Suffix sorting
lionlyingonlyonlyon$ ionlyingonlyonlyon$ onlyingonlyonlyon$ nlyingonlyonlyon$ lyingonlyonlyon$ yingonlyonlyon$ ingonlyonlyon$ ngonlyonlyon$ gonlyonlyon$ onlyonlyon$ nlyonlyon$ lyonlyon$ yonlyon$ onlyon$ nlyon$ lyon$ yon$ on$ n$ $
$ gonlyonlyon$ ingonlyonlyon$ ionlyingonlyonlyon$ lionlyingonlyonlyon$ lyingonlyonlyon$ lyon$ lyonlyon$ n$ ngonlyonlyon$ nlyingonlyonlyon$ nlyon$ nlyonlyon$ on$ onlyingonlyonlyon$ onlyon$ onlyonlyon$ yingonlyonlyon$ yon$ yonlyon$
20 9 7 2 1 5
16 12 19 8 4
15 11 18 3
14 10 6
17 13
all suffixes of T
sort
T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Suffix Array of T
![Page 6: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/6.jpg)
Suffix sorting
lionlyingonlyonlyon$ ionlyingonlyonlyon$ onlyingonlyonlyon$ nlyingonlyonlyon$ lyingonlyonlyon$ yingonlyonlyon$ ingonlyonlyon$ ngonlyonlyon$ gonlyonlyon$ onlyonlyon$ nlyonlyon$ lyonlyon$ yonlyon$ onlyon$ nlyon$ lyon$ yon$ on$ n$ $
$ gonlyonlyon$ ingonlyonlyon$ ionlyingonlyonlyon$ lionlyingonlyonlyon$ lyingonlyonlyon$ lyon$ lyonlyon$ n$ ngonlyonlyon$ nlyingonlyonlyon$ nlyon$ nlyonlyon$ on$ onlyingonlyonlyon$ onlyon$ onlyonlyon$ yingonlyonlyon$ yon$ yonlyon$
20 9 7 2 1 5
16 12 19 8 4
15 11 18 3
14 10 6
17 13
T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Suffix Array of T
lyon
![Page 7: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/7.jpg)
Suffix sorting
lionlyingonlyonlyon$ ionlyingonlyonlyon$ onlyingonlyonlyon$ nlyingonlyonlyon$ lyingonlyonlyon$ yingonlyonlyon$ ingonlyonlyon$ ngonlyonlyon$ gonlyonlyon$ onlyonlyon$ nlyonlyon$ lyonlyon$ yonlyon$ onlyon$ nlyon$ lyon$ yon$ on$ n$ $
$ gonlyonlyon$ ingonlyonlyon$ ionlyingonlyonlyon$ lionlyingonlyonlyon$ lyingonlyonlyon$ lyon$ lyonlyon$ n$ ngonlyonlyon$ nlyingonlyonlyon$ nlyon$ nlyonlyon$ on$ onlyingonlyonlyon$ onlyon$ onlyonlyon$ yingonlyonlyon$ yon$ yonlyon$
20 9 7 2 1 5
16 12 19 8 4
15 11 18 3
14 10 6
17 13
T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Suffix Array of T
lyon
![Page 8: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/8.jpg)
Suffix sorting
lionlyingonlyonlyon$ ionlyingonlyonlyon$ onlyingonlyonlyon$ nlyingonlyonlyon$ lyingonlyonlyon$ yingonlyonlyon$ ingonlyonlyon$ ngonlyonlyon$ gonlyonlyon$ onlyonlyon$ nlyonlyon$ lyonlyon$ yonlyon$ onlyon$ nlyon$ lyon$ yon$ on$ n$ $
$ gonlyonlyon$ ingonlyonlyon$ ionlyingonlyonlyon$ lionlyingonlyonlyon$ lyingonlyonlyon$ lyon$ lyonlyon$ n$ ngonlyonlyon$ nlyingonlyonlyon$ nlyon$ nlyonlyon$ on$ onlyingonlyonlyon$ onlyon$ onlyonlyon$ yingonlyonlyon$ yon$ yonlyon$
20 9 7 2 1 5
16 12 19 8 4
15 11 18 3
14 10 6
17 13
T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Suffix Array of T
I found lyon in 3 steps!
![Page 9: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/9.jpg)
Sparse suffix sorting
If we are interested in some (small) set of suffixes of T.
T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
lionlyingonlyonlyon$ ionlyingonlyonlyon$ onlyingonlyonlyon$ nlyingonlyonlyon$ lyingonlyonlyon$ yingonlyonlyon$ ingonlyonlyon$ ngonlyonlyon$ gonlyonlyon$ onlyonlyon$ nlyonlyon$ lyonlyon$ yonlyon$ onlyon$ nlyon$ lyon$ yon$ on$ n$ $
![Page 10: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/10.jpg)
Sparse suffix sorting
T = l i o n l y i n g o n l y o n l y o n $ lionlyingonlyonlyon$ ionlyingonlyonlyon$ onlyingonlyonlyon$ nlyingonlyonlyon$ lyingonlyonlyon$ yingonlyonlyon$ ingonlyonlyon$ ngonlyonlyon$ gonlyonlyon$ onlyonlyon$ nlyonlyon$ lyonlyon$ yonlyon$ onlyon$ nlyon$ lyon$ yon$ on$ n$ $
lionlyingonlyonlyon$ lyonlyon$ on$ onlyon$ onlyonlyon$
1 12 18 14 10
Sparse Suffix Array (SSA)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
For sparse indexing, the sparse suffix array is enough, which is much smaller than full suffix array.
sort
If we are interested in some (small) set of suffixes of T.
![Page 11: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/11.jpg)
! It can be solved in O(n) time by computing full suffix array and make it sparse, but it needs O(n) space in addition to the text.
! An alternative way is to use any sorting algorithm for a set of strings (not for suffixes) that uses O(b) additional space, but it takes Ω(bn) time.
Problem
Given T of length n and set of b positions, sort the designated suffixes.
Using O(s) additional space for some s ∈ [b, n], how fast can we sort sparse suffixes?
Question
![Page 12: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/12.jpg)
Monte-Carlo Algorithm
Randomized algorithms that may return a wrong output with low probability.
We can win the gamble with high probability!!
![Page 13: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/13.jpg)
! Monte-Carlo algorithm using Karp-Rabin fingerprints. ! Time-space tradeoffs: for any s ∈ [b, n]
O(n+(bn/s)log s)) time using O(s) additional space. ! When s = b, the time complexity is O(n log b). ! When s = b log b, the time complexity is O(n).
Monte-Carlo Algorithm
Randomized algorithms that may return a wrong output with low probability.
We can win the gamble with high probability!!
![Page 14: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/14.jpg)
Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87]
• q is a prime number. • r is chosen from [0, q-1] uniformly at random.
For sufficiently large q, the hash function becomes perfect with high probability, i.e., T[i..i+l] = T[i’..i’+l] iff FP[i..i+l] = FP[i’..i’+l].
T i j
FP[i.. j]= T[k]⋅ r j−kk=i
j
∑ modq
![Page 15: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/15.jpg)
Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87]
i j
FP[i.. j +1]= FP[i.. j]⋅ r +T[ j +1]modq
FP[k +1.. j]= FP[i.. j]−FP[i..k]⋅ r j−kmodq
i j k+1
i j
i j+1
T
We can compute FPs incrementally.
We can “subtract” prefix FP’s influence.
![Page 16: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/16.jpg)
T 1 n
We can preprocess T in O(n) time and O(s) space so that the fingerprint for any substring of length l can be computed in O(min{l, n/s}) time.
Lemma
FP-2
FP-4
n/s 2n/s 3n/s 4n/s
O(s)
FP-1
FP-3
![Page 17: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/17.jpg)
FP-3
FP-1
FP-2
FP-4
T 1 n n/s 2n/s 3n/s 4n/s
We can preprocess T in O(n) time and O(s) space so that the fingerprint for any substring of length l can be computed in O(min{l, n/s}) time.
Lemma
O(s)
FP-short
FP-long
FP-1
FP-3
![Page 18: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/18.jpg)
! The algorithm constructs sparse suffix tree.
Monte-Carlo algorithm
1
12
18
10
14
ionlying…
yonlyon$
lyon
l
on
ly…
$ $
lionlyingonlyonlyon$ lyonlyon$ on$ onlyon$ onlyonlyon$
1 12 18 14 10
Sparse Suffix Array
Sparse Suffix Tree: • Compacted trie representing all designated suffixes. • For each node, child edges are sorted. • Since edge labels can be encoded by pointers to T,
sparse suffix tree can be represented in O(b) space.
![Page 19: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/19.jpg)
Overview: gradual refinement
1
10
12
18
14
lionlying…
lyonlyon$ onlyon$
onlyonlyon$
on$
sort edges
1
12
18
10
14
ionlying…
yonlyon$
lyon
l
on
ly…
$ $
goal
1
10
14
18
12
only
lionlying…
lyonlyon$
on$
on$
only…
1
10
14
12
18
lyon on
$
lionlying…
$
ly…
lyonlyon$
1
12
10
18
14
ionlying…
yonlyon$
lyon
l
on ly…
$ $
T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
![Page 20: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/20.jpg)
l-strict (unsorted) sparse suffix trees
sort edges
1
12
18
10
14
ionlying…
yonlyon$
lyon
l
on
ly…
$ $
goal
2-strict 4-strict {32, 16, 8}-strict
1-strict
T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1
10
12
18
14
lionlying…
lyonlyon$ onlyon$
onlyonlyon$
on$
1
10
14
18
12
only
lionlying…
lyonlyon$
on$
on$
only…
1
10
14
12
18
lyon on
$
lionlying…
$
ly…
lyonlyon$
1
12
10
18
14
ionlying…
yonlyon$
lyon
l
on ly…
$ $
![Page 21: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/21.jpg)
l-strict (unsorted) sparse suffix trees
For any internal node of l-strict tree, any pair of child edge labels has common prefix of length less than l.
T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1
10
12
18
14
lionlying…
lyonlyon$ onlyon$
onlyonlyon$
on$
1
10
14
18
12
only
lionlying…
lyonlyon$
on$
on$
only…
1
10
14
12
18
lyon on
$
lionlying…
$
ly…
lyonlyon$
1
12
10
18
14
ionlying…
yonlyon$
lyon
l
on ly…
$ $
2-strict 4-strict {32, 16, 8}-strict
1-strict Condition of l-strict tree
![Page 22: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/22.jpg)
1
10
12
18
14
lionlying…
lyonlyon$ onlyon$
onlyonlyon$
on$
1
12
10
18
14
ionlying…
yonlyon$
lyon
l
on ly…
$ $
From 4-strict tree to 2-strict tree
{32, 16, 8}-strict
1-strict
2-strict 4-strict
For every edge label of 4-strict tree, we compute FP for the prefix of length 2, and merge edge labels having the same value.
T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1
10
14
18
12
only
lionlying…
lyonlyon$
on$
on$
only…
1
10
14
12
18
lyon on
$
lionlying…
$
ly…
lyonlyon$
![Page 23: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/23.jpg)
Gradual refinement
T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1
10
12
18
14
lionlying…
lyonlyon$ onlyon$
onlyonlyon$
on$
1
10
14
18
12
only
lionlying…
lyonlyon$
on$
on$
only…
1
10
14
12
18
lyon on
$
lionlying…
$
ly…
lyonlyon$
1
12
10
18
14
ionlying…
yonlyon$
lyon
l
on ly…
$ $
2-strict 4-strict {32, 16, 8}-strict
1-strict
![Page 24: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/24.jpg)
! For each refinement step, the FPs can be computed in O(b min{l, n/s}) time.
! For the first log s steps, the cost is O((bn/s)log s). ! After log s steps, since l < n/2log s = n/s, the cost is
bn/s + bn/2s + bn/4s + … + b = O(bn/s)
The total cost for gradual refinement
The total cost for FP computation is O((bn/s)log s). Proof.
![Page 25: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/25.jpg)
! For each refinement step, we conduct radix grouping of O(b) integer keys with radix s, which can be done in O(b logsn) time.
The total cost for gradual refinement
Proof.
The total cost for merging edge labels is O(b log n logsn) = O(b log2n / log s) = O((bn/s)log s).
![Page 26: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/26.jpg)
Grouping by distribute-and-collect
432 52
52
978
978
52
k elements associated with keys (integers in [0, n])
Empty-initialized buckets of size n
0 1 … 51 52 53 … 431 432 434 … 977 978 979 … n
![Page 27: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/27.jpg)
Grouping by distribute-and-collect
432 52
52
978
978
52
k elements associated with keys (integers in [0, n])
0 1 … 51 52 53 … 431 432 434 … 977 978 979 … n
Empty-initialized buckets of size n
![Page 28: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/28.jpg)
Grouping by distribute-and-collect
432 52
52
978
978
52
k elements associated with keys (integers in [0, n])
0 1 … 51 52 53 … 431 432 434 … 977 978 979 … n
O(k) time Empty-initialized buckets of size n
![Page 29: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/29.jpg)
Grouping by distribute-and-collect
432 52
52
978
978
52
k elements associated with keys (integers in [0, n])
0 1 2 3 4 5 6 7 8 9
Empty-initialized buckets of size 10
![Page 30: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/30.jpg)
Grouping by distribute-and-collect
432 52
52
978
978
52
k elements associated with keys (integers in [0, n])
0 1 2 3 4 5 6 7 8 9
Empty-initialized buckets of size 10
![Page 31: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/31.jpg)
Grouping by distribute-and-collect
432 52
52
978
978
52
k elements associated with keys (integers in [0, n])
0 1 2 3 4 5 6 7 8 9
432 52
52 978 978
52
Empty-initialized buckets of size 10
![Page 32: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/32.jpg)
Grouping by distribute-and-collect
k elements associated with keys (integers in [0, n])
0 1 2 3 4 5 6 7 8 9
432 52
52 978 978
52
432 52
52
978
978
52
Empty-initialized buckets of size 10
![Page 33: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/33.jpg)
Grouping by distribute-and-collect
k elements associated with keys (integers in [0, n])
0 1 2 3 4 5 6 7 8 9
432 52
52 978 978
52
432 52
52
978
978
52 432
52 52
52
Empty-initialized buckets of size 10
![Page 34: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/34.jpg)
Grouping by distribute-and-collect
k elements associated with keys (integers in [0, n])
0 1 2 3 4 5 6 7 8 9
432 52
52 978 978
52
432 52
52
978
978
52 432
52 52
52
978 978
432
52 52
52
978 978
O(k log10n) time Empty-initialized buckets of size 10
![Page 35: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/35.jpg)
Grouping by distribute-and-collect Radix grouping with radix s
k elements associated with keys (integers in [0, n])
0 1 2 3 … … s-4 s-3 s-2 s-1
432 52
52
978
978
52 432
52 52
52
978 978
O(k logsn) time
logsn steps
Empty-initialized buckets of size s
![Page 36: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/36.jpg)
! For each refinement step, we conduct radix grouping of O(b) integer keys with radix s, which can be done in O(b logsn) time.
The total cost for gradual refinement
Proof.
The total cost for merging edge labels is O(b log n logsn) = O(b log2n / log s) = O((bn/s)log s).
![Page 37: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/37.jpg)
For any s ∈ [b, n], we can construct sparse suffix tree correctly with high probability in O(n+(bn/s)log s)) time using O(s) additional space.
! When s = b, the time complexity is O(n log b). ! When s = b log b, the time complexity is O(n).
Monte-Carlo algorithm
Theorem
The total cost for FP computation is O((bn/s)log s).
The total cost for merging edge labels is O(b log n logsn) = O(b log2n / log s) = O((bn/s)log s).
![Page 38: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently](https://reader033.vdocuments.site/reader033/viewer/2022042419/5f361567297821762e53c15b/html5/thumbnails/38.jpg)
! Follow the gradual refinement steps for this instance.
Exercise
1
3
6
9
7
doortodoortodortmund$
odoortodortmund$
doortodortmund$
ortodoortodortmund$
ortodortmund$
T = d o o r t o d o o r t o d o r t m u n d $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
?
13
12
odortmund$
dortmund$