Suffix and Factor Automata and Combinatorics on Words
Gabriele Fici
Workshop PRIN 2007–2009
Varese – 5 September 2011
Gabriele Fici Suffix and Factor Automata and Combinatorics on Words
The Suffix Automaton
Definition (A. Blumer et al. 85; M. Crochemore 86)
The Suffix Automaton of the word w is the minimal deterministic automaton recognizing the suffixes of w.
Example
The SA of w = aabbabb
[Figure: the suffix automaton of w = aabbabb, with states 0 to 7 along the path labelled a a b b a b b and the additional states 3′, 3′′ and 4′′.]
Algorithmic Construction
The SA allows searching for a pattern v in a text w in time and space O(|v|). Moreover:
Theorem (A. Blumer et al. 85; M. Crochemore 86)
The SA of a word w over a fixed alphabet Σ can be built in time and space O(|w|).
The SA has several applications, for example in
pattern matching
music retrieval
spam detection
search of characteristic expressions in literary works
speech recordings alignment
...
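The linear-time construction can be made concrete with the classical online algorithm, which adds one letter at a time and maintains suffix links. The Python sketch below is one common way to write it, not code from the talk; the class name `SuffixAutomaton` and the methods `is_factor` and `is_suffix` are illustrative names. Transitions are stored in dictionaries, so the O(|w|) bound holds only up to hashing costs.

```python
class SuffixAutomaton:
    """Suffix automaton of a word, built online, one letter at a time."""

    def __init__(self, word):
        self.trans = [{}]    # trans[q][letter] -> target state
        self.link = [-1]     # suffix links, needed during construction
        self.maxlen = [0]    # length of the longest word reaching each state
        self.last = 0        # state reached by the whole word read so far
        for letter in word:
            self._extend(letter)
        # Terminal states are those on the suffix-link path from `last`.
        self.terminal = set()
        q = self.last
        while q != -1:
            self.terminal.add(q)
            q = self.link[q]

    def _new_state(self, maxlen, trans, link):
        self.trans.append(dict(trans))
        self.maxlen.append(maxlen)
        self.link.append(link)
        return len(self.trans) - 1

    def _extend(self, letter):
        cur = self._new_state(self.maxlen[self.last] + 1, {}, -1)
        p = self.last
        while p != -1 and letter not in self.trans[p]:
            self.trans[p][letter] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.trans[p][letter]
            if self.maxlen[q] == self.maxlen[p] + 1:
                self.link[cur] = q
            else:
                # Split q: the clone takes over the shorter words of q's class.
                clone = self._new_state(self.maxlen[p] + 1,
                                        self.trans[q], self.link[q])
                while p != -1 and self.trans[p].get(letter) == q:
                    self.trans[p][letter] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def _run(self, v):
        """Read v from the initial state; return the state reached or None."""
        q = 0
        for letter in v:
            q = self.trans[q].get(letter)
            if q is None:
                return None
        return q

    def is_factor(self, v):
        return self._run(v) is not None

    def is_suffix(self, v):
        return self._run(v) in self.terminal


sa = SuffixAutomaton("aabbabb")
num_states = len(sa.trans)
num_edges = sum(len(t) for t in sa.trans)
```

Searching for a pattern v reads v once from the initial state, which is where the O(|v|) search time comes from. For w = aabbabb the sketch yields 11 states and 13 transitions.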
One Way to Build the SA
Build a naive non-deterministic automaton:
w = aabbabb
[Figure: a path of states 0, 1, ..., 7 whose consecutive edges are labelled a a b b a b b.]
Determinize by subset construction:
[Figure: the resulting deterministic automaton, with states {0, 1, 2, ..., 7}, {1, 2, 5}, {2}, {3}, {4}, {5}, {6}, {7}, {3, 6}, {4, 7} and {3, 4, 6, 7}.]
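The two steps can be sketched in Python as follows. The transcript does not spell out the initial and final states of the naive automaton; the sketch assumes that every state 0, ..., |w| is initial and that only state |w| is final, so each deterministic state is a set of positions. The function name `determinize_suffix_nfa` is illustrative.

```python
from collections import deque


def determinize_suffix_nfa(w):
    """Subset construction on the path automaton 0 -w[0]-> 1 -w[1]-> ... -> n.

    Assumption: all states of the path are initial, so the deterministic
    initial state is the full set {0, ..., n}.
    """
    n = len(w)
    start = frozenset(range(n + 1))
    transitions = {}
    seen = {start}
    queue = deque([start])
    while queue:
        subset = queue.popleft()
        by_letter = {}
        for i in subset:
            if i < n:  # state i has a single outgoing edge, labelled w[i]
                by_letter.setdefault(w[i], set()).add(i + 1)
        for letter, targets in by_letter.items():
            target = frozenset(targets)
            transitions[(subset, letter)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen, transitions


states, transitions = determinize_suffix_nfa("aabbabb")
```

On w = aabbabb the reachable subsets are exactly the eleven sets listed above. The subset construction can be exponential in general; here the subsets are sets of ending positions, which keeps their number linear in |w|.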
Ending Positions
We associate to each factor v of w the set of ending positions of v in w. We denote this set by Endset_w(v).
Example
w = a a b b a b b
    1 2 3 4 5 6 7
Endset_w(ba) = {5}, Endset_w(abb) = Endset_w(bb) = {4, 7}.
Define on Fact(w) the equivalence:
u ∼_SA v ⇐⇒ Endset_w(u) = Endset_w(v)
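A brute-force Python sketch of these definitions, useful for checking small examples by hand. The helper names `endset`, `factors` and `sa_classes` are illustrative, and positions are 1-based as in the example (so the empty word ends at every position 0, ..., |w|).

```python
def endset(w, v):
    """Set of ending positions (1-based) of the occurrences of v in w."""
    k = len(v)
    return frozenset(i + k for i in range(len(w) - k + 1) if w[i:i + k] == v)


def factors(w):
    """All factors of w, including the empty word."""
    n = len(w)
    return {w[i:j] for i in range(n + 1) for j in range(i, n + 1)}


def sa_classes(w):
    """Group the factors of w by their set of ending positions."""
    classes = {}
    for v in factors(w):
        classes.setdefault(endset(w, v), set()).add(v)
    return classes


classes = sa_classes("aabbabb")
```

Each key of `classes` is one equivalence class of ∼_SA; for w = aabbabb there are 11 of them, and abb and bb share the class {4, 7}.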
Ending Positions
u ∼_SA v ⇐⇒ Endset_w(u) = Endset_w(v)
Remark
u ∼_SA v if and only if for any z ∈ Σ∗ one has
uz ∈ Suff(w) ⇐⇒ vz ∈ Suff(w)
Remark
Fact(w)/∼_SA is in bijection with the set of states of the SA of w.
The Number of States of the SA
The number of states (classes) of the SA of w is therefore
|SA(w)| = |Fact(w)/∼_SA|
The bounds are well known:
|w| + 1 ≤ |SA(w)| ≤ 2|w| − 1
The upper bound is reached for w = ab^{|w|−1}, with a ≠ b.
And for the lower bound?
Problem (J. Berstel and M. Crochemore)
Characterize the language L_SA of words such that |SA(w)| = |w| + 1.
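The bounds can be checked exhaustively on short words. The sketch below counts states as distinct sets of ending positions; `sa_size` and `check_bounds` are illustrative names. The check starts at length 2, since a one-letter word already has 2 states.

```python
from itertools import product


def sa_size(w):
    """Number of states of SA(w), counted as distinct ending-position sets."""
    n = len(w)
    endsets = {}
    for i in range(n + 1):
        for j in range(i, n + 1):
            endsets.setdefault(w[i:j], set()).add(j)
    return len({frozenset(s) for s in endsets.values()})


def check_bounds(max_len=7, alphabet="ab"):
    """Verify |w| + 1 <= |SA(w)| <= 2|w| - 1 for all words of length 2..max_len."""
    for n in range(2, max_len + 1):
        for letters in product(alphabet, repeat=n):
            size = sa_size("".join(letters))
            if not (n + 1 <= size <= 2 * n - 1):
                return False
    return True
```

For instance, `sa_size("abbbb")` gives 9 = 2·5 − 1, the upper bound for the word ab^{|w|−1} of length 5.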
Special Factors
Definition
v is a left special factor of w if there exist a ≠ b such that av and bv are factors of w
v is a right special factor of w if there exist a ≠ b such that va and vb are factors of w
v is a bispecial factor of w if it is both left and right special
Example
w = aabbabb
ab is left special
b is right special
a and b are bispecial
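The definitions translate directly into a brute-force Python sketch; the function names are illustrative. Note that the empty word ε counts as special whenever w contains two distinct letters.

```python
def factors(w):
    """All factors of w, including the empty word."""
    n = len(w)
    return {w[i:j] for i in range(n + 1) for j in range(i, n + 1)}


def left_special(w):
    """Factors v such that av and bv are factors of w for two letters a != b."""
    facts, alphabet = factors(w), set(w)
    return {v for v in facts if sum((a + v) in facts for a in alphabet) >= 2}


def right_special(w):
    """Factors v such that va and vb are factors of w for two letters a != b."""
    facts, alphabet = factors(w), set(w)
    return {v for v in facts if sum((v + a) in facts for a in alphabet) >= 2}


def bispecial(w):
    return left_special(w) & right_special(w)
```

For w = aabbabb the left special factors are ε, a, b, ab, abb, and the right special (hence also the bispecial) factors are ε, a, b.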
The Number of States of the SA
Theorem (G. Fici 09)
|SA(w)| = |w| + 1 + SL_w − P_w
SL_w = number of left special factors of w
P_w = length of the shortest prefix of w which is not left special
Example (w = aabbabb)
[Figure: the suffix automaton of w = aabbabb, with states 0 to 7, 3′, 3′′ and 4′′.]
SL_w = 5, since the left special factors of w are ε, a, b, ab, abb
P_w = 2, since a is left special in w and aa is not
|SA(w)| = |w| + 1 + SL_w − P_w = 7 + 1 + 5 − 2 = 11
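The formula can be compared against a direct count of the states on all short words. This is only a sanity-check sketch with illustrative function names; `sa_size` counts distinct ending-position sets.

```python
from itertools import product


def factors(w):
    n = len(w)
    return {w[i:j] for i in range(n + 1) for j in range(i, n + 1)}


def left_special(w):
    facts, alphabet = factors(w), set(w)
    return {v for v in facts if sum((a + v) in facts for a in alphabet) >= 2}


def sa_size(w):
    """Number of states of SA(w), counted as distinct ending-position sets."""
    n = len(w)
    endsets = {}
    for i in range(n + 1):
        for j in range(i, n + 1):
            endsets.setdefault(w[i:j], set()).add(j)
    return len({frozenset(s) for s in endsets.values()})


def sa_size_by_formula(w):
    """|w| + 1 + SL_w - P_w, with SL_w and P_w computed by brute force."""
    ls = left_special(w)
    # w itself is never left special, so a non-left-special prefix exists.
    p = next(k for k in range(len(w) + 1) if w[:k] not in ls)
    return len(w) + 1 + len(ls) - p


def formula_agrees(max_len=5, alphabet="abc"):
    return all(
        sa_size_by_formula("".join(t)) == sa_size("".join(t))
        for n in range(1, max_len + 1)
        for t in product(alphabet, repeat=n)
    )
```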
Example
Theorem
|SA(w)| = |w| + 1 + SL_w − P_w
Corollary (M. Sciortino and L.Q. Zamboni 07; G. Fici 09)
w ∈ L_SA if and only if every left special factor of w is a prefix of w.
If |Σ| = 2, L_SA is the set of finite prefixes of standard Sturmian words, i.e., the set of left special factors of Sturmian words.
Standard Sturmian words
A standard Sturmian word is the cutting sequence of a straight line of irrational slope starting from the origin on the discrete plane.
Lemma
A right infinite binary word w is a standard Sturmian word if and only if the left special factors of w are prefixes of w.
Binary Words
Let f_w denote the factor complexity of w, i.e., the function counting the number of distinct factors of w of each length.
A binary word is a word w such that f_w(1) = 2, i.e., having 2 distinct factors of length 1.
Lemma
Let w be a binary word. Then SL_w = |w| − H_w.
SL_w = number of left special factors of w
H_w = length of the shortest unrepeated prefix of w
Binary Words
For binary words we thus have the formula:
|SA(w)| = 2|w| + 1 − (H_w + P_w)
H_w = length of the shortest unrepeated prefix of w
P_w = length of the shortest prefix of w which is not left special
As a corollary, we obtain a new characterization of the set of prefixes of standard Sturmian words:
Corollary
A binary word w is a prefix of a standard Sturmian word if and only if |w| = H_w + P_w.
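A small Python sketch of the two parameters and of the resulting test. The function names are illustrative; the Fibonacci word (generated here by iterating a ↦ ab, b ↦ a) is used as a familiar standard Sturmian word to exercise the criterion.

```python
from itertools import product


def count_occurrences(w, v):
    k = len(v)
    return sum(1 for i in range(len(w) - k + 1) if w[i:i + k] == v)


def H(w):
    """Length of the shortest prefix of w that occurs only once in w."""
    return next(k for k in range(len(w) + 1)
                if count_occurrences(w, w[:k]) == 1)


def P(w):
    """Length of the shortest prefix of w that is not left special."""
    n = len(w)
    facts = {w[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    def is_ls(v):
        return sum((a + v) in facts for a in set(w)) >= 2
    return next(k for k in range(n + 1) if not is_ls(w[:k]))


def sa_size(w):
    """Direct count of the states of SA(w) (distinct ending-position sets)."""
    n = len(w)
    endsets = {}
    for i in range(n + 1):
        for j in range(i, n + 1):
            endsets.setdefault(w[i:j], set()).add(j)
    return len({frozenset(s) for s in endsets.values()})


def satisfies_criterion(w):
    """The condition |w| = H_w + P_w of the corollary."""
    return len(w) == H(w) + P(w)


def fibonacci_prefix(n):
    word = "a"
    while len(word) < n:
        word = "".join("ab" if c == "a" else "a" for c in word)
    return word[:n]


def binary_formula_agrees(max_len=7):
    return all(
        2 * n + 1 - (H(w) + P(w)) == sa_size(w)
        for n in range(1, max_len + 1)
        for w in map("".join, product("ab", repeat=n))
    )
```

For example, the prefix abaab of the Fibonacci word has H_w = 3 and P_w = 2, so the criterion holds, whereas aabbabb has H_w + P_w = 4 < 7.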
Example
Example (w = aabbabb)
[Figure: the suffix automaton of w = aabbabb, with states 0 to 7, 3′, 3′′ and 4′′.]
H_w = 2, since aa occurs only once in w
P_w = 2, since a is left special in w
|SA(w)| = 2 · 7 + 1 − (2 + 2) = 11
The Number of Edges
What about the number of edges E_w?
The bounds on E_w are well known:
|w| ≤ E_w ≤ 3|w| − 4
For binary words we have the formula:
Lemma (G. Fici 09)
E_w = |SA(w)| + |G(w)| − 1
G(w) is the union of the sets of bispecial factors and right special prefixes of w.
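The lemma can be checked on short binary words with a brute-force sketch: states are distinct ending-position sets, and the outgoing edges of a state are the letters that can follow its factors. All function names are illustrative.

```python
from itertools import product


def sa_states_and_edges(w):
    """Return (number of states, number of edges) of SA(w) by brute force."""
    n = len(w)
    endsets, followers = {}, {}
    for i in range(n + 1):
        for j in range(i, n + 1):
            v = w[i:j]
            endsets.setdefault(v, set()).add(j)
            if j < n:
                followers.setdefault(v, set()).add(w[j])
    out = {}  # ending-position set -> letters labelling outgoing edges
    for v, ends in endsets.items():
        out.setdefault(frozenset(ends), set()).update(followers.get(v, ()))
    return len(out), sum(len(letters) for letters in out.values())


def G(w):
    """Bispecial factors together with right special prefixes of w."""
    n = len(w)
    facts = {w[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    alphabet = set(w)
    ls = {v for v in facts if sum((a + v) in facts for a in alphabet) >= 2}
    rs = {v for v in facts if sum((v + a) in facts for a in alphabet) >= 2}
    prefixes = {w[:k] for k in range(n + 1)}
    return (ls & rs) | (prefixes & rs)


def edge_formula_agrees(max_len=7):
    for n in range(1, max_len + 1):
        for w in map("".join, product("ab", repeat=n)):
            states, edges = sa_states_and_edges(w)
            if edges != states + len(G(w)) - 1:
                return False
    return True
```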
Example
Example (w = aabbabb)
[Figure: the suffix automaton of w = aabbabb, with states 0 to 7, 3′, 3′′ and 4′′.]
G(w) = BIS(w) ∪ (Pref(w) ∩ RS(w)) = {ε, a, b} ∪ {ε, a} = {ε, a, b}
E_w = |SA(w)| + |G(w)| − 1 = 11 + 3 − 1 = 13
The Class of LSP Words
Theorem
w ∈ L_SA if and only if the left special factors of w are prefixes of w.
Corollary
If |Σ| = 2, then w ∈ L_SA if and only if w is a prefix of a standard Sturmian word.
Corollary
If |Σ| > 2, then:
w is a prefix of a standard episturmian word ⇒ w ∈ L_SA.
w is a prefix of a standard ϑ-episturmian word ⇒ w ∈ L_SA (ϑ being any involutory antimorphism of Σ∗, i.e., such that ϑ(uv) = ϑ(v)ϑ(u) and ϑ ∘ ϑ = id).
The Class of LSP Words
Definition
A right infinite word w is LSP if the left special factors of w are prefixes of w.
So, if |Σ| = 2, LSP is the class of standard Sturmian words.
Problem
Characterize the class of LSP words, over an arbitrary fixed alphabet Σ.
The Class of LSP Words
Example
Let φ be the morphism defined by a ↦ abc, b ↦ aab, and φ(F) the image of the Fibonacci word F under φ. For each n > 0, φ(F) has 1 left special factor but more than 1 right special factor of length n, and φ(F) is LSP.
φ(F) = abcaababcabcaababcaababcabc ···
So:
The set of factors of an LSP word is not closed under reversal, in general.
Thus, the class of standard (ϑ-)episturmian words is strictly included in the class of LSP words.
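The example can be explored on a finite prefix of φ(F). The sketch below generates the Fibonacci word by iterating a ↦ ab, b ↦ a, applies φ, and checks on a prefix of 120 letters (an arbitrary length chosen for illustration) that every left special factor is a prefix. A finite check of this kind illustrates the LSP property but does not prove it. All names are illustrative.

```python
def fibonacci_prefix(n):
    """First n letters of the Fibonacci word abaababaabaab..."""
    word = "a"
    while len(word) < n:
        word = "".join("ab" if c == "a" else "a" for c in word)
    return word[:n]


def apply_morphism(images, word):
    return "".join(images[c] for c in word)


def factors(w):
    n = len(w)
    return {w[i:j] for i in range(n + 1) for j in range(i, n + 1)}


def left_special(w):
    facts, alphabet = factors(w), set(w)
    return {v for v in facts if sum((a + v) in facts for a in alphabet) >= 2}


PHI = {"a": "abc", "b": "aab"}
prefix = apply_morphism(PHI, fibonacci_prefix(40))  # 120 letters of phi(F)
ls_are_prefixes = all(prefix.startswith(v) for v in left_special(prefix))
```

The lack of closure under reversal is already visible on short factors: abc occurs in the prefix, while its reversal cba does not, since every c is followed by a.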
The Factor Automaton
Definition (A. Blumer et al. 85; M. Crochemore 86)
The Factor Automaton of the word w is the minimal deterministic automaton recognizing the factors of w.
Example
The FA of w = aabbabb
[Figure: the factor automaton of w = aabbabb, with states 0 to 7 along the path labelled a a b b a b b and the additional state 3′.]
Comparison Between the SA and the FA
Example (w = aabbabb)
[Figure: the SA of w, with states 0 to 7, 3′, 3′′ and 4′′, shown above the FA of w, with states 0 to 7 and 3′.]
States 3 and 3′′ and states 4 and 4′′ have been identified
Future
Definition
The future of v in w is what follows, in w, the occurrences of v:
Fut_w(v) = {z ∈ Σ∗ : vz ∈ Fact(w)}
Example
w = abbaabab
Fut_w(ba) = {ε, a, ab, aba, abab, b}
Define on Fact(w) the equivalence:
u ∼_FA v ⇐⇒ Fut_w(u) = Fut_w(v)
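A brute-force Python sketch of the future and of the induced count of classes. The names `future` and `fa_size` are illustrative; `fa_size` simply counts the distinct futures over all factors of w.

```python
def future(w, v):
    """All words z such that vz is a factor of w."""
    n, k = len(w), len(v)
    return frozenset(
        w[i + k:j]
        for i in range(n - k + 1) if w[i:i + k] == v
        for j in range(i + k, n + 1)
    )


def factors(w):
    n = len(w)
    return {w[i:j] for i in range(n + 1) for j in range(i, n + 1)}


def fa_size(w):
    """Number of classes of the equivalence induced by the future."""
    return len({future(w, v) for v in factors(w)})
```

For w = aabbabb this gives 9 classes, two fewer than the 11 classes obtained from the ending positions.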
Future
u ∼_FA v ⇐⇒ Fut_w(u) = Fut_w(v)
Remark
u ∼_FA v if and only if for any z ∈ Σ∗ one has
uz ∈ Fact(w) ⇐⇒ vz ∈ Fact(w)
Remark
Fact(w)/∼_FA is in bijection with the set of states of the FA of w.
The Number of States of the FA
The number of states (classes) of the FA of w is therefore
|FA(w)| = |Fact(w)/∼_FA|
The bounds are well known:
|w| + 1 ≤ |FA(w)| ≤ 2|w| − 2
The upper bound is reached for w = ab^{|w|−2}c, with a ≠ b ≠ c.
And for the lower bound?
Problem (J. Berstel and M. Crochemore)
Characterize the language L_FA of words such that |FA(w)| = |w| + 1.
Inclusion
Remark
If u ∼_SA v, then u ∼_FA v.
The converse is not true:
Example
w = abcaca
Fut_w(bc) = Fut_w(c) whilst Endset_w(bc) ≠ Endset_w(c).
Clearly
|FA(w)| ≤ |SA(w)| and so L_SA ⊂ L_FA
Inclusion
L_SA ⊂ L_FA
Example
w = abcc
We have |SA(w)| = 6 > |w| + 1, so w ∉ L_SA
Nevertheless |FA(w)| = 5 = |w| + 1, so w ∈ L_FA
The Number of States of the FA
Does a formula for |FA(w)| exist?
Definition (A. Blumer et al. 84)
Let k be the longest repeated suffix of w. The stem of w is the shortest non-empty prefix v of k such that v appears as a prefix of k preceded by a letter b and all other occurrences of v in w are preceded by a letter a ≠ b, whenever such a prefix exists; otherwise it is undefined.
Example
stem(aabbab) = ab
stem(abacbb) is undefined
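The definition is easier to follow with a brute-force sketch. The reading implemented here is an assumption: "v as a prefix of k" is taken to be the occurrence of v at the start of the suffix occurrence of k, all other occurrences of v must share one preceding letter a ≠ b, and an occurrence of v at the very beginning of w (with no preceding letter) makes the condition fail. This reading reproduces both examples above. The function names are illustrative, and `None` stands for "undefined".

```python
def occurrences(w, v):
    """Starting indices (0-based) of the occurrences of v in w."""
    k = len(v)
    return [i for i in range(len(w) - k + 1) if w[i:i + k] == v]


def longest_repeated_suffix(w):
    """Longest suffix of w occurring at least twice in w (possibly empty)."""
    for length in range(len(w) - 1, -1, -1):
        suffix = w[len(w) - length:]
        if len(occurrences(w, suffix)) >= 2:
            return suffix
    return ""


def stem(w):
    """Stem of w under the reading described above, or None if undefined."""
    k = longest_repeated_suffix(w)
    start = len(w) - len(k)          # where the suffix occurrence of k begins
    for m in range(1, len(k) + 1):
        v = k[:m]
        b = w[start - 1]             # letter preceding v inside the suffix k
        others = [i for i in occurrences(w, v) if i != start]
        preceding = {w[i - 1] if i > 0 else None for i in others}
        if len(preceding) == 1:
            (a,) = preceding
            if a is not None and a != b:
                return v
    return None
```

For aabbab the longest repeated suffix is ab; the candidate a fails because it also occurs at the beginning of the word, and ab succeeds, being preceded by a at its first occurrence and by b at its second.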
The Number of States of the FA
Lemma (A. Blumer et al. 84)
The SA-classes that are identified by the FA-equivalence correspond to the prefixes x of the longest repeated suffix of w such that |x| ≥ |stem(w)|.
The Number of States of the FA
So we can define a new parameter:
Definition
SK_w = |stem(w)| if stem(w) is defined, and SK_w = K_w otherwise
K_w = length of the shortest unrepeated suffix of w
This allows us to derive a formula for |FA(w)|:
Theorem
|FA(w)| = |w| + 1 + SL_w − P_w + SK_w − K_w
SL_w = number of left special factors of w
P_w = length of the shortest prefix of w which is not left special
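As a worked instance, take the running example w = aabbabb. The values of SL_w and P_w are those of the earlier example; K_w and the stem are computed by hand here, reading the definition of the stem so that the longest repeated suffix is abb and stem(w) = ab.

```latex
\begin{align*}
|w| &= 7, \qquad SL_w = 5, \qquad P_w = 2, \\
K_w &= 4 \quad \text{(the shortest unrepeated suffix is } babb\text{)}, \\
SK_w &= |\mathrm{stem}(w)| = |ab| = 2, \\
|FA(w)| &= |w| + 1 + SL_w - P_w + SK_w - K_w \\
        &= 7 + 1 + 5 - 2 + 2 - 4 = 9 .
\end{align*}
```

This agrees with the factor automaton of aabbabb, which has the nine states 0 to 7 and 3′, two fewer than the eleven states of the suffix automaton.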
The Language LFA
Let k = uv′ be the longest repeated suffix of w, where u is the longest prefix of k that is also a prefix of w. Then v′ is the characteristic suffix of w.
Any word w can be uniquely factorized as w = w′v′.
[Figure: the word w with its prefix u, its longest repeated suffix k = uv′, and the factorization w = w′v′.]
Theorem
The word w ∈ LFA if and only if its prefix w′ ∈ LSA.
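The factorization and the theorem can be checked by brute force on short words. The sketch below makes two assumptions that this part of the talk does not spell out: LSA and LFA are taken to be the sets of words whose suffix automaton and factor automaton have exactly |w| + 1 states, and the number of states of the minimal automaton is obtained by counting distinct sets of continuations. All function names are hypothetical.

```python
def factors(w):
    """All factors (substrings) of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def suffixes(w):
    return {w[i:] for i in range(len(w) + 1)}


def num_states(w, language):
    """States of the minimal automaton (no sink) accepting `language`:
    one state per distinct set of continuations of a factor of w."""
    facts = factors(w)
    return len({frozenset(z for z in facts if v + z in language) for v in facts})


def in_LSA(w):  # assumption: suffix automaton has |w| + 1 states
    return num_states(w, suffixes(w)) == len(w) + 1


def in_LFA(w):  # assumption: factor automaton has |w| + 1 states
    return num_states(w, factors(w)) == len(w) + 1


def longest_repeated_suffix(w):
    for n in range(len(w) - 1, 0, -1):
        s = w[-n:]
        if w.find(s) != len(w) - n:  # s also occurs before its suffix position
            return s
    return ""


def characteristic_factorization(w):
    """Return (w', v') with w = w'v', where v' is the characteristic suffix."""
    k = longest_repeated_suffix(w)
    n = 0
    while n < len(k) and k[n] == w[n]:  # u = k[:n], longest common prefix of k and w
        n += 1
    v_prime = k[n:]
    return w[:len(w) - len(v_prime)], v_prime


for w in ["abaacaaa", "abaababbaa"]:
    w_prime, v_prime = characteristic_factorization(w)
    print(w, w_prime, v_prime, in_LFA(w), in_LSA(w_prime))
```

The counting is quadratic in the number of factors, so it is meant only for checking small examples, not as a replacement for the linear-time constructions.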
Examples
Example
Let w = abaacaaa. The longest repeated suffix of w is k = aa, and the longest prefix of aa which is also a prefix of w is u = a. Then v′ = a and w′ = abaacaa. We have w ∉ LFA and w′ ∉ LSA.
Example
Let w = abaababbaa. The longest repeated suffix of w is k = baa, and no non-empty prefix of baa is a prefix of w, so v′ = baa. Then w′ = abaabab ∈ LSA, so w ∈ LFA.
Remarks
Take |Σ| = 2.
Definition
A word w is trapezoidal if it has at most n + 1 factors of length n, for every n ≥ 0.
Definition
A word w is rich if it contains |w| + 1 distinct palindromic factors (counting the empty word).
We have:
Proposition (A. de Luca 99)
w Sturmian ⇒ w trapezoidal
Proposition (A. de Luca, A. Glen and L.Q. Zamboni 08)
w trapezoidal ⇒ w rich
Remarks
Example
Let w = abaababbaa. The longest repeated suffix of w is k = baa. Then w′ = abaabab ∈ LSA, so w ∈ LFA.
Remarks:
w is not balanced, since aa, bb ∈ Fact(w)
w is not trapezoidal, since it has four factors of length 2
w is not rich, since it contains only 10 = |w| palindromes: ε, a, b, aa, bb, aba, bab, abba, baab, abaaba
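The three remarks can be confirmed mechanically. The sketch below uses hypothetical function names and assumes the usual definition of a balanced binary word, which is not restated on this slide: any two factors of the same length differ by at most one in their number of occurrences of a.

```python
def factors(w):
    """All factors (substrings) of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def is_trapezoidal(w):
    """At most n + 1 distinct factors of each length n."""
    facts = factors(w)
    return all(
        sum(1 for f in facts if len(f) == n) <= n + 1
        for n in range(len(w) + 1)
    )


def palindromic_factors(w):
    """Distinct palindromic factors of w, the empty word included."""
    return {f for f in factors(w) if f == f[::-1]}


def is_rich(w):
    return len(palindromic_factors(w)) == len(w) + 1


def is_balanced(w, letter="a"):
    """Assumed definition for binary words: factors of equal length
    differ by at most 1 in their number of occurrences of `letter`."""
    facts = factors(w)
    for n in range(1, len(w) + 1):
        counts = [f.count(letter) for f in facts if len(f) == n]
        if max(counts) - min(counts) > 1:
            return False
    return True


w = "abaababbaa"
print(is_balanced(w), is_trapezoidal(w), is_rich(w))  # False False False
print(len(palindromic_factors(w)))                    # 10
```

For contrast, the shorter word abaab passes all three tests, consistent with the chain balanced (finite Sturmian) ⇒ trapezoidal ⇒ rich.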
Conclusions and Future Work
We gave a characterization of the words in LSA and LFA.
On the agenda:
Investigate LSP words.
Apply an analogous approach to other data structures, e.g. the suffix tree, the suffix array, etc.
G. Fici (2009). Combinatorics of Finite Words and Suffix Automata. Proc. of the 3rd International Conference on Algebraic Informatics. LNCS 5725: 250–259.
G. Fici (2010). Factor Automata and Special Factors. Proc. of the 13th Mons Theoretical Computer Science Days.
G. Fici (2011). Special Factors and the Combinatorics of Suffix and Factor Automata. Theoret. Comput. Sci. 412(29): 3604–3615.
Thank you!