sparse suffix sorting€¦ · karp-rabin fingerprints (rolling hash) [karp and rabin, `87] • q is...

38
Sparse Suffix Sorting Tomohiro I

Upload: others

Post on 10-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Sparse Suffix Sorting

Tomohiro I

Page 2: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

!  Sorting suffixes of a string T is a fundamental task having a lot of applications such as text indexing.

Suffix sorting

Page 3: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

!  Sorting suffixes of a string T is a fundamental task having a lot of applications such as text indexing.

Suffix sorting

T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Page 4: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Suffix sorting

lionlyingonlyonlyon$ ionlyingonlyonlyon$ onlyingonlyonlyon$ nlyingonlyonlyon$ lyingonlyonlyon$ yingonlyonlyon$ ingonlyonlyon$ ngonlyonlyon$ gonlyonlyon$ onlyonlyon$ nlyonlyon$ lyonlyon$ yonlyon$ onlyon$ nlyon$ lyon$ yon$ on$ n$ $

all suffixes of T

T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

$ gonlyonlyon$ ingonlyonlyon$ ionlyingonlyonlyon$ lionlyingonlyonlyon$ lyingonlyonlyon$ lyon$ lyonlyon$ n$ ngonlyonlyon$ nlyingonlyonlyon$ nlyon$ nlyonlyon$ on$ onlyingonlyonlyon$ onlyon$ onlyonlyon$ yingonlyonlyon$ yon$ yonlyon$

20 9 7 2 1 5

16 12 19 8 4

15 11 18 3

14 10 6

17 13

sort

Page 5: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Suffix sorting

lionlyingonlyonlyon$ ionlyingonlyonlyon$ onlyingonlyonlyon$ nlyingonlyonlyon$ lyingonlyonlyon$ yingonlyonlyon$ ingonlyonlyon$ ngonlyonlyon$ gonlyonlyon$ onlyonlyon$ nlyonlyon$ lyonlyon$ yonlyon$ onlyon$ nlyon$ lyon$ yon$ on$ n$ $

$ gonlyonlyon$ ingonlyonlyon$ ionlyingonlyonlyon$ lionlyingonlyonlyon$ lyingonlyonlyon$ lyon$ lyonlyon$ n$ ngonlyonlyon$ nlyingonlyonlyon$ nlyon$ nlyonlyon$ on$ onlyingonlyonlyon$ onlyon$ onlyonlyon$ yingonlyonlyon$ yon$ yonlyon$

20 9 7 2 1 5

16 12 19 8 4

15 11 18 3

14 10 6

17 13

all suffixes of T

sort

T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Suffix Array of T

Page 6: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Suffix sorting

lionlyingonlyonlyon$ ionlyingonlyonlyon$ onlyingonlyonlyon$ nlyingonlyonlyon$ lyingonlyonlyon$ yingonlyonlyon$ ingonlyonlyon$ ngonlyonlyon$ gonlyonlyon$ onlyonlyon$ nlyonlyon$ lyonlyon$ yonlyon$ onlyon$ nlyon$ lyon$ yon$ on$ n$ $

$ gonlyonlyon$ ingonlyonlyon$ ionlyingonlyonlyon$ lionlyingonlyonlyon$ lyingonlyonlyon$ lyon$ lyonlyon$ n$ ngonlyonlyon$ nlyingonlyonlyon$ nlyon$ nlyonlyon$ on$ onlyingonlyonlyon$ onlyon$ onlyonlyon$ yingonlyonlyon$ yon$ yonlyon$

20 9 7 2 1 5

16 12 19 8 4

15 11 18 3

14 10 6

17 13

T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Suffix Array of T

lyon

Page 7: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Suffix sorting

lionlyingonlyonlyon$ ionlyingonlyonlyon$ onlyingonlyonlyon$ nlyingonlyonlyon$ lyingonlyonlyon$ yingonlyonlyon$ ingonlyonlyon$ ngonlyonlyon$ gonlyonlyon$ onlyonlyon$ nlyonlyon$ lyonlyon$ yonlyon$ onlyon$ nlyon$ lyon$ yon$ on$ n$ $

$ gonlyonlyon$ ingonlyonlyon$ ionlyingonlyonlyon$ lionlyingonlyonlyon$ lyingonlyonlyon$ lyon$ lyonlyon$ n$ ngonlyonlyon$ nlyingonlyonlyon$ nlyon$ nlyonlyon$ on$ onlyingonlyonlyon$ onlyon$ onlyonlyon$ yingonlyonlyon$ yon$ yonlyon$

20 9 7 2 1 5

16 12 19 8 4

15 11 18 3

14 10 6

17 13

T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Suffix Array of T

lyon

Page 8: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Suffix sorting

lionlyingonlyonlyon$ ionlyingonlyonlyon$ onlyingonlyonlyon$ nlyingonlyonlyon$ lyingonlyonlyon$ yingonlyonlyon$ ingonlyonlyon$ ngonlyonlyon$ gonlyonlyon$ onlyonlyon$ nlyonlyon$ lyonlyon$ yonlyon$ onlyon$ nlyon$ lyon$ yon$ on$ n$ $

$ gonlyonlyon$ ingonlyonlyon$ ionlyingonlyonlyon$ lionlyingonlyonlyon$ lyingonlyonlyon$ lyon$ lyonlyon$ n$ ngonlyonlyon$ nlyingonlyonlyon$ nlyon$ nlyonlyon$ on$ onlyingonlyonlyon$ onlyon$ onlyonlyon$ yingonlyonlyon$ yon$ yonlyon$

20 9 7 2 1 5

16 12 19 8 4

15 11 18 3

14 10 6

17 13

T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Suffix Array of T

I found lyon in 3 steps!

Page 9: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Sparse suffix sorting

If we are interested in some (small) set of suffixes of T.

T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

lionlyingonlyonlyon$ ionlyingonlyonlyon$ onlyingonlyonlyon$ nlyingonlyonlyon$ lyingonlyonlyon$ yingonlyonlyon$ ingonlyonlyon$ ngonlyonlyon$ gonlyonlyon$ onlyonlyon$ nlyonlyon$ lyonlyon$ yonlyon$ onlyon$ nlyon$ lyon$ yon$ on$ n$ $

Page 10: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Sparse suffix sorting

T = l i o n l y i n g o n l y o n l y o n $ lionlyingonlyonlyon$ ionlyingonlyonlyon$ onlyingonlyonlyon$ nlyingonlyonlyon$ lyingonlyonlyon$ yingonlyonlyon$ ingonlyonlyon$ ngonlyonlyon$ gonlyonlyon$ onlyonlyon$ nlyonlyon$ lyonlyon$ yonlyon$ onlyon$ nlyon$ lyon$ yon$ on$ n$ $

lionlyingonlyonlyon$ lyonlyon$ on$ onlyon$ onlyonlyon$

1 12 18 14 10

Sparse Suffix Array (SSA)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

For sparse indexing, the sparse suffix array is enough, which is much smaller than full suffix array.

sort

If we are interested in some (small) set of suffixes of T.

Page 11: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

!   It can be solved in O(n) time by computing full suffix array and make it sparse, but it needs O(n) space in addition to the text.

!  An alternative way is to use any sorting algorithm for a set of strings (not for suffixes) that uses O(b) additional space, but it takes Ω(bn) time.

Problem

Given T of length n and set of b positions, sort the designated suffixes.

Using O(s) additional space for some s ∈ [b, n], how fast can we sort sparse suffixes?

Question

Page 12: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Monte-Carlo Algorithm

Randomized algorithms that may return a wrong output with low probability.

We can win the gamble with high probability!!

Page 13: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

!  Monte-Carlo algorithm using Karp-Rabin fingerprints. ! Time-space tradeoffs: for any s ∈ [b, n]

O(n+(bn/s)log s)) time using O(s) additional space. ! When s = b, the time complexity is O(n log b). ! When s = b log b, the time complexity is O(n).

Monte-Carlo Algorithm

Randomized algorithms that may return a wrong output with low probability.

We can win the gamble with high probability!!

Page 14: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87]

•  q is a prime number. •  r is chosen from [0, q-1] uniformly at random.

For sufficiently large q, the hash function becomes perfect with high probability, i.e., T[i..i+l] = T[i’..i’+l] iff FP[i..i+l] = FP[i’..i’+l].

T i j

FP[i.. j]= T[k]⋅ r j−kk=i

j

∑ modq

Page 15: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87]

i j

FP[i.. j +1]= FP[i.. j]⋅ r +T[ j +1]modq

FP[k +1.. j]= FP[i.. j]−FP[i..k]⋅ r j−kmodq

i j k+1

i j

i j+1

T

We can compute FPs incrementally.

We can “subtract” prefix FP’s influence.

Page 16: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

T 1 n

We can preprocess T in O(n) time and O(s) space so that the fingerprint for any substring of length l can be computed in O(min{l, n/s}) time.

Lemma

FP-2

FP-4

n/s 2n/s 3n/s 4n/s

O(s)

FP-1

FP-3

Page 17: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

FP-3

FP-1

FP-2

FP-4

T 1 n n/s 2n/s 3n/s 4n/s

We can preprocess T in O(n) time and O(s) space so that the fingerprint for any substring of length l can be computed in O(min{l, n/s}) time.

Lemma

O(s)

FP-short

FP-long

FP-1

FP-3

Page 18: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

!  The algorithm constructs sparse suffix tree.

Monte-Carlo algorithm

1

12

18

10

14

ionlying…

yonlyon$

lyon

l

on

ly…

$ $

lionlyingonlyonlyon$ lyonlyon$ on$ onlyon$ onlyonlyon$

1 12 18 14 10

Sparse Suffix Array

Sparse Suffix Tree: •  Compacted trie representing all designated suffixes. •  For each node, child edges are sorted. •  Since edge labels can be encoded by pointers to T,

sparse suffix tree can be represented in O(b) space.

Page 19: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Overview: gradual refinement

1

10

12

18

14

lionlying…

lyonlyon$ onlyon$

onlyonlyon$

on$

sort edges

1

12

18

10

14

ionlying…

yonlyon$

lyon

l

on

ly…

$ $

goal

1

10

14

18

12

only

lionlying…

lyonlyon$

on$

on$

only…

1

10

14

12

18

lyon on

$

lionlying…

$

ly…

lyonlyon$

1

12

10

18

14

ionlying…

yonlyon$

lyon

l

on ly…

$ $

T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Page 20: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

l-strict (unsorted) sparse suffix trees

sort edges

1

12

18

10

14

ionlying…

yonlyon$

lyon

l

on

ly…

$ $

goal

2-strict 4-strict {32, 16, 8}-strict

1-strict

T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

1

10

12

18

14

lionlying…

lyonlyon$ onlyon$

onlyonlyon$

on$

1

10

14

18

12

only

lionlying…

lyonlyon$

on$

on$

only…

1

10

14

12

18

lyon on

$

lionlying…

$

ly…

lyonlyon$

1

12

10

18

14

ionlying…

yonlyon$

lyon

l

on ly…

$ $

Page 21: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

l-strict (unsorted) sparse suffix trees

For any internal node of l-strict tree, any pair of child edge labels has common prefix of length less than l.

T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

1

10

12

18

14

lionlying…

lyonlyon$ onlyon$

onlyonlyon$

on$

1

10

14

18

12

only

lionlying…

lyonlyon$

on$

on$

only…

1

10

14

12

18

lyon on

$

lionlying…

$

ly…

lyonlyon$

1

12

10

18

14

ionlying…

yonlyon$

lyon

l

on ly…

$ $

2-strict 4-strict {32, 16, 8}-strict

1-strict Condition of l-strict tree

Page 22: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

1

10

12

18

14

lionlying…

lyonlyon$ onlyon$

onlyonlyon$

on$

1

12

10

18

14

ionlying…

yonlyon$

lyon

l

on ly…

$ $

From 4-strict tree to 2-strict tree

{32, 16, 8}-strict

1-strict

2-strict 4-strict

For every edge label of 4-strict tree, we compute FP for the prefix of length 2, and merge edge labels having the same value.

T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

1

10

14

18

12

only

lionlying…

lyonlyon$

on$

on$

only…

1

10

14

12

18

lyon on

$

lionlying…

$

ly…

lyonlyon$

Page 23: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Gradual refinement

T = l i o n l y i n g o n l y o n l y o n $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

1

10

12

18

14

lionlying…

lyonlyon$ onlyon$

onlyonlyon$

on$

1

10

14

18

12

only

lionlying…

lyonlyon$

on$

on$

only…

1

10

14

12

18

lyon on

$

lionlying…

$

ly…

lyonlyon$

1

12

10

18

14

ionlying…

yonlyon$

lyon

l

on ly…

$ $

2-strict 4-strict {32, 16, 8}-strict

1-strict

Page 24: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

!  For each refinement step, the FPs can be computed in O(b min{l, n/s}) time.

!  For the first log s steps, the cost is O((bn/s)log s). !  After log s steps, since l < n/2log s = n/s, the cost is

bn/s + bn/2s + bn/4s + … + b = O(bn/s)

The total cost for gradual refinement

The total cost for FP computation is O((bn/s)log s). Proof.

Page 25: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

!  For each refinement step, we conduct radix grouping of O(b) integer keys with radix s, which can be done in O(b logsn) time.

The total cost for gradual refinement

Proof.

The total cost for merging edge labels is O(b log n logsn) = O(b log2n / log s) = O((bn/s)log s).

Page 26: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Grouping by distribute-and-collect

432 52

52

978

978

52

k elements associated with keys (integers in [0, n])

Empty-initialized buckets of size n

0 1 … 51 52 53 … 431 432 434 … 977 978 979 … n

Page 27: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Grouping by distribute-and-collect

432 52

52

978

978

52

k elements associated with keys (integers in [0, n])

0 1 … 51 52 53 … 431 432 434 … 977 978 979 … n

Empty-initialized buckets of size n

Page 28: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Grouping by distribute-and-collect

432 52

52

978

978

52

k elements associated with keys (integers in [0, n])

0 1 … 51 52 53 … 431 432 434 … 977 978 979 … n

O(k) time Empty-initialized buckets of size n

Page 29: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Grouping by distribute-and-collect

432 52

52

978

978

52

k elements associated with keys (integers in [0, n])

0 1 2 3 4 5 6 7 8 9

Empty-initialized buckets of size 10

Page 30: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Grouping by distribute-and-collect

432 52

52

978

978

52

k elements associated with keys (integers in [0, n])

0 1 2 3 4 5 6 7 8 9

Empty-initialized buckets of size 10

Page 31: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Grouping by distribute-and-collect

432 52

52

978

978

52

k elements associated with keys (integers in [0, n])

0 1 2 3 4 5 6 7 8 9

432 52

52 978 978

52

Empty-initialized buckets of size 10

Page 32: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Grouping by distribute-and-collect

k elements associated with keys (integers in [0, n])

0 1 2 3 4 5 6 7 8 9

432 52

52 978 978

52

432 52

52

978

978

52

Empty-initialized buckets of size 10

Page 33: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Grouping by distribute-and-collect

k elements associated with keys (integers in [0, n])

0 1 2 3 4 5 6 7 8 9

432 52

52 978 978

52

432 52

52

978

978

52 432

52 52

52

Empty-initialized buckets of size 10

Page 34: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Grouping by distribute-and-collect

k elements associated with keys (integers in [0, n])

0 1 2 3 4 5 6 7 8 9

432 52

52 978 978

52

432 52

52

978

978

52 432

52 52

52

978 978

432

52 52

52

978 978

O(k log10n) time Empty-initialized buckets of size 10

Page 35: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

Grouping by distribute-and-collect Radix grouping with radix s

k elements associated with keys (integers in [0, n])

0 1 2 3 … … s-4 s-3 s-2 s-1

432 52

52

978

978

52 432

52 52

52

978 978

O(k logsn) time

logsn steps

Empty-initialized buckets of size s

Page 36: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

!  For each refinement step, we conduct radix grouping of O(b) integer keys with radix s, which can be done in O(b logsn) time.

The total cost for gradual refinement

Proof.

The total cost for merging edge labels is O(b log n logsn) = O(b log2n / log s) = O((bn/s)log s).

Page 37: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

For any s ∈ [b, n], we can construct sparse suffix tree correctly with high probability in O(n+(bn/s)log s)) time using O(s) additional space.

!  When s = b, the time complexity is O(n log b). !  When s = b log b, the time complexity is O(n).

Monte-Carlo algorithm

Theorem

The total cost for FP computation is O((bn/s)log s).

The total cost for merging edge labels is O(b log n logsn) = O(b log2n / log s) = O((bn/s)log s).

Page 38: Sparse Suffix Sorting€¦ · Karp-Rabin fingerprints (rolling hash) [Karp and Rabin, `87] • q is a prime number. • r is chosen from [0, q-1] uniformly at random. For sufficiently

!  Follow the gradual refinement steps for this instance.

Exercise

1

3

6

9

7

doortodoortodortmund$

odoortodortmund$

doortodortmund$

ortodoortodortmund$

ortodortmund$

T = d o o r t o d o o r t o d o r t m u n d $ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

?

13

12

odortmund$

dortmund$