Regular Languages, Regular Expressions, Finite-State Automata
Torbjörn Lager, Stockholm University
FST - Torbjörn Lager, UU 2
Languages
Sets of strings
Example:
{“ac” “abc” “abbc” “abbbc” ...}
Finite-State Automata
Directed graphs consisting of states, and arcs labelled with symbols. For example:
[Figure: automaton with states 0, 1 and 2; arc a from 0 to 1, arc b looping on 1, arc c from 1 to 2]
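As a minimal sketch (not part of the original slides), such an automaton can be written down as a transition table and run over an input string. The sketch assumes that state 0 is the start state and state 2 the only final state, which the extracted figure does not show explicitly; the names `TRANSITIONS` and `accepts` are illustrative.

```python
# Transition table for the example automaton: 0 -a-> 1, 1 -b-> 1, 1 -c-> 2.
# Assumption: 0 is the start state and 2 is the only final state.
TRANSITIONS = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
START, FINALS = 0, {2}

def accepts(string):
    """Return True if the automaton ends in a final state after reading string."""
    state = START
    for symbol in string:
        if (state, symbol) not in TRANSITIONS:
            return False  # no arc for this symbol: reject
        state = TRANSITIONS[(state, symbol)]
    return state in FINALS

print([w for w in ["ac", "abc", "abbc", "ab", "c", ""] if accepts(w)])
# → ['ac', 'abc', 'abbc']
```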
Regular expressions
Example: [a b* c]
Note: Atomic symbols (e.g. ‘a’) denote languages (e.g. {“a”}).
Operations (here: concatenation and Kleene star) are operations over languages.
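To make "operations over languages" concrete, here is a small sketch that treats languages as Python sets of strings. The helper names `concat` and `star` are illustrative, and because the real Kleene star yields an infinite language, `star` is truncated to a bounded number of repetitions.

```python
def concat(A, B):
    """Concatenation: every string of A followed by every string of B."""
    return {x + y for x in A for y in B}

def star(A, max_repeats):
    """Kleene star, truncated to at most max_repeats repetitions of A."""
    result = {""}      # zero repetitions give the empty string
    layer = {""}
    for _ in range(max_repeats):
        layer = concat(layer, A)
        result |= layer
    return result

a, b, c = {"a"}, {"b"}, {"c"}
language = concat(concat(a, star(b, 3)), c)   # finite approximation of [a b* c]
print(sorted(language, key=len))
# → ['ac', 'abc', 'abbc', 'abbbc']
```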
Regular expressions, regular languages and automata
Regular expressions compile into finite-state automata.
Regular expressions denote regular languages.
Finite-state automata generate/accept regular languages.
Regular expressions, regular languages and automata
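[a b* c] compiles into the automaton with states 0, 1 and 2 and arcs labelled a, b and c.
[a b* c] denotes {“ac” “abc” “abbc” ...}.
The automaton generates/accepts {“ac” “abc” “abbc” ...}.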
Regular expression operators
Concatenation             A B
Union                     A | B
Iteration (Kleene star)   A*
Difference                A - B
Intersection              A & B
Grouping of expressions   [A]
Examples
a b              {“ab”}
a [b|c]          {“ab” “ac”}
a* [b|c]*        {“” “a” “ab” ...}
[a|b] & [b|c]    {“b”}
[a|b] - b        {“a”}
[a|b] - [b|a]    {}
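The finite examples on this slide can be checked directly with Python sets, where `|`, `&` and `-` happen to mean union, intersection and difference as well. This is only a sketch over finite languages; `concat` is an illustrative helper, and the infinite example with Kleene star is left out.

```python
def concat(A, B):
    """Concatenation of two finite languages."""
    return {x + y for x in A for y in B}

a, b, c = {"a"}, {"b"}, {"c"}

assert concat(a, b) == {"ab"}              # a b
assert concat(a, b | c) == {"ab", "ac"}    # a [b|c]
assert (a | b) & (b | c) == {"b"}          # [a|b] & [b|c]
assert (a | b) - b == {"a"}                # [a|b] - b
assert (a | b) - (b | a) == set()          # [a|b] - [b|a]
print("all finite examples check out")
```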
Special symbols
?            The any symbol
?*           The universal language
[]           The empty-string language
0 (or “”)    The empty string (epsilon)
Regular expression operators
Optionality    (A)
Kleene-plus    A+
Complement     ~A
Containment    $A
Restriction    A => B _ C
Examples
a (b) c       {“ac” “abc”}
a b+ c        {“abc” “abbc” “abbbc” ...}
~[a b c]      {“” ... “a” ... “ab” ... “abca” ...}
$[a|b]        {“a” ... “abba” ... “abcd” ...}
b => a _ c    {“” ... “a” ... “ccc” ... “abc” ...}
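Containment and restriction are easier to grasp as membership tests. The sketch below covers only the single-symbol cases from the examples ($[a|b] and b => a _ c); the function names `contains_any` and `restricted` are illustrative and do not correspond to any tool's API.

```python
def contains_any(string, symbols):
    """$[a|b]-style containment: the string has at least one of the symbols."""
    return any(s in string for s in symbols)

def restricted(string, target="b", left="a", right="c"):
    """b => a _ c: every target must have left just before it and right just after it."""
    for i, symbol in enumerate(string):
        if symbol != target:
            continue
        before = string[i - 1] if i > 0 else None
        after = string[i + 1] if i + 1 < len(string) else None
        if before != left or after != right:
            return False
    return True

print([w for w in ["", "a", "ccc", "abc", "ab", "bc"] if restricted(w)])
# → ['', 'a', 'ccc', 'abc']
```

Strings without any b satisfy the restriction vacuously, which is why “”, “a” and “ccc” belong to the language.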
Component technologies in FST
Word lists and lexica
Tokenisers
Morphological analysers
Part-of-speech taggers
Parsers
Applications of FST
Named-entity recognition
Information extraction
Corpus linguistics
Spelling- and grammar checking
Speech processing applications
The Xerox Finite-State Tool
Compiles extended regular expressions into finite-state machines (automata and transducers)
Allows the user to display, examine and modify the machines
Non-deterministic FSAs
At least one state has more than one transition leading from it labelled with the same symbol
[Figure: automaton with states 0, 1 and 2 and arcs labelled a, b and b]
Determinization of FSAs
Any non-deterministic FSA can be transformed into an equivalent deterministic FSA.
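Example:
[Figure: a non-deterministic automaton and an equivalent deterministic automaton, each with states 0, 1 and 2 and arcs labelled a, b and b]
Determinize for efficiency!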
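One standard way to determinize is the subset construction, sketched below. The slides do not say which algorithm is meant, and the example automaton is hypothetical because the exact arcs of the figure are not recoverable; `determinize` and the variable names are illustrative.

```python
from collections import deque

# Hypothetical NFA: 0 -a-> 1, 1 -b-> 1, 1 -b-> 2; start state 0, final state 2.
# State 1 has two arcs labelled b, which makes it non-deterministic.
NFA = {(0, "a"): {1}, (1, "b"): {1, 2}}
NFA_START, NFA_FINALS = 0, {2}

def determinize(nfa, start, finals):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    alphabet = {symbol for (_, symbol) in nfa}
    dfa_start = frozenset({start})
    dfa, dfa_finals = {}, set()
    queue, seen = deque([dfa_start]), {dfa_start}
    while queue:
        current = queue.popleft()
        if current & finals:
            dfa_finals.add(current)   # final if it contains any NFA final state
        for symbol in alphabet:
            target = frozenset(t for s in current for t in nfa.get((s, symbol), ()))
            if not target:
                continue  # leave the transition undefined instead of adding a sink state
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa, dfa_start, dfa_finals

dfa, dfa_start, dfa_finals = determinize(NFA, NFA_START, NFA_FINALS)
for (source, symbol), target in dfa.items():
    print(sorted(source), symbol, sorted(target))
```

In the result every state has at most one outgoing arc per symbol, so recognition never has to try alternatives.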
Minimization of FSAs
Any (deterministic) FSA can be transformed into an equivalent FSA that has a minimal number of states.
Minimize for space!
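A simple way to see minimization at work is Moore-style partition refinement, sketched here for a complete deterministic automaton. The slides do not name an algorithm, so this is one possible choice, and the example automaton (which accepts all strings of length two or more over a and b) is hypothetical.

```python
def minimize(states, alphabet, delta, finals):
    """Moore-style partition refinement for a complete DFA.
    Returns a dict mapping each state to the index of its equivalence class."""
    # Start by separating final from non-final states.
    block = {s: int(s in finals) for s in states}
    while True:
        # Two states stay together only if they share a block and every
        # symbol sends them to the same block.
        signature = {s: (block[s],) + tuple(block[delta[(s, x)]] for x in alphabet)
                     for s in states}
        ids = {}
        new_block = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block   # no block was split: the partition is stable
        block = new_block

# Hypothetical DFA: states 1 and 2 behave identically and can be merged.
STATES, ALPHABET = [0, 1, 2, 3], ["a", "b"]
DELTA = {(0, "a"): 1, (0, "b"): 2,
         (1, "a"): 3, (1, "b"): 3,
         (2, "a"): 3, (2, "b"): 3,
         (3, "a"): 3, (3, "b"): 3}
classes = minimize(STATES, ALPHABET, DELTA, finals={3})
print(classes)
# → {0: 0, 1: 1, 2: 1, 3: 2}
```

States mapped to the same class index can be collapsed into a single state, here reducing four states to three.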
Representing word lists
Think of a word list as a regular language
Use the calculus of regular expressions to query and update the wordlist
Determinize for speed!
Minimize for space!
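A word list viewed as a regular language can be stored as a deterministic automaton. The sketch below builds a trie, the simplest such automaton; the word list and the names `build_trie` and `lookup` are made up for illustration.

```python
def build_trie(words):
    """Build a deterministic automaton (a trie) accepting exactly the given words."""
    transitions, finals, next_state = {}, set(), 1   # state 0 is the start state
    for word in words:
        state = 0
        for symbol in word:
            if (state, symbol) not in transitions:
                transitions[(state, symbol)] = next_state
                next_state += 1
            state = transitions[(state, symbol)]
        finals.add(state)
    return transitions, finals

def lookup(word, transitions, finals):
    """Follow one arc per symbol; lookup time depends on the word, not the list size."""
    state = 0
    for symbol in word:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state in finals

transitions, finals = build_trie(["walk", "walks", "talk"])
print(lookup("walks", transitions, finals), lookup("wal", transitions, finals))
# → True False
```

The trie shares prefixes only ("walk" and "walks"); minimization would additionally share suffixes such as the "alk" of "walk" and "talk", which is where the space saving comes from.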
Various equivalences
(A) = A | []
A+ = A A*
A+ = A* - [] (when A does not contain the empty string)
A - B = A & ~B
~A = ?* - A
$A = ?* A ?*
~[A | B] = ~A & ~B
~[A & B] = ~A | ~B
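The Boolean equivalences can be spot-checked on a finite stand-in for the universal language. In this sketch `UNIVERSE` contains all strings over a two-letter alphabet up to length 3, and complement is taken relative to it; the sets `A` and `B` are arbitrary examples. This is a sanity check on one instance, not a proof.

```python
from itertools import product

ALPHABET = "ab"
MAX_LEN = 3
# Finite stand-in for ?*: all strings over ALPHABET with at most MAX_LEN symbols.
UNIVERSE = {"".join(p) for n in range(MAX_LEN + 1) for p in product(ALPHABET, repeat=n)}

def complement(A):
    """~A, relative to the bounded universe."""
    return UNIVERSE - A

A = {"a", "ab", "bba"}
B = {"ab", "b", ""}

assert A - B == A & complement(B)                            # A - B = A & ~B
assert complement(A | B) == complement(A) & complement(B)    # ~[A | B] = ~A & ~B
assert complement(A & B) == complement(A) | complement(B)    # ~[A & B] = ~A | ~B
print(len(UNIVERSE), "strings checked")
# → 15 strings checked
```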
Various equivalences
A - A = ~[?*]
A | ~[?*] = A
A [] = A
A ~[?*] = ~[?*]
A & ?* = A
A | ?* = ?*
Important theoretical results
Kleene’s theorem (concerning FSAs)
Closure properties of regular languages and regular relations
Decidability
FSAs and regular expressions
Kleene’s theorem: Any language recognised by an FSA is denoted by a regular expression, and any language denoted by a regular expression can be recognised by an FSA.
Regex to FSA to regex
[Figure: conversions between regular expressions, non-deterministic FSAs with epsilon transitions, non-deterministic FSAs and deterministic FSAs]
Picture adapted from Hopcroft & Ullman 1979
From regular expressions to finite-state automata
The only really necessary operators:
  Disjunction
  Concatenation
  Iteration
Sidenote: Compare regular grammars:
  A --> x B
  A --> x
(where A and B are nonterminals, and where x is a sequence of terminals)
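With only disjunction, concatenation and iteration to handle, a compiler from regular expressions to automata stays small. The sketch below follows the Thompson-style construction into an automaton with epsilon transitions; this is a common textbook method and not necessarily what any particular tool does. The tuple-based syntax tree (`"sym"`, `"cat"`, `"alt"`, `"star"`) and all function names are assumptions made for the example.

```python
from itertools import count

def compile_regex(node, edges, fresh):
    """Thompson-style construction. Returns the (start, end) states of the fragment
    and appends (source, label, target) arcs to edges; label None means epsilon."""
    kind = node[0]
    if kind == "sym":
        start, end = next(fresh), next(fresh)
        edges.append((start, node[1], end))
    elif kind == "cat":
        start, mid1 = compile_regex(node[1], edges, fresh)
        mid2, end = compile_regex(node[2], edges, fresh)
        edges.append((mid1, None, mid2))
    elif kind == "alt":
        start, end = next(fresh), next(fresh)
        for branch in node[1:]:
            s, e = compile_regex(branch, edges, fresh)
            edges.append((start, None, s))
            edges.append((e, None, end))
    elif kind == "star":
        start, end = next(fresh), next(fresh)
        s, e = compile_regex(node[1], edges, fresh)
        edges.extend([(start, None, s), (e, None, s), (e, None, end), (start, None, end)])
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return start, end

def closure(states, edges):
    """All states reachable from states through epsilon arcs only."""
    stack, seen = list(states), set(states)
    while stack:
        state = stack.pop()
        for src, label, dst in edges:
            if src == state and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def matches(regex, string):
    """Compile the regex and simulate the resulting automaton on string."""
    edges = []
    start, end = compile_regex(regex, edges, count())
    current = closure({start}, edges)
    for symbol in string:
        step = {dst for src, label, dst in edges if src in current and label == symbol}
        current = closure(step, edges)
    return end in current

# [a b* c] as a syntax tree
ABC = ("cat", ("sym", "a"), ("cat", ("star", ("sym", "b")), ("sym", "c")))
print([w for w in ["ac", "abbc", "ab", ""] if matches(ABC, w)])
# → ['ac', 'abbc']
```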
Closure properties of regular languages
A set is said to be closed under an operation iff applying the operation to members of the set will never take us outside the set
Example: if A and B are regular languages, then [A|B] is always regular. Therefore regular languages are closed under union.
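One constructive way to see closure under union is the product construction: given automata for A and B, build an automaton for [A|B] whose states are pairs of states. The sketch assumes two complete deterministic automata over the same alphabet; the two example automata (even number of a's, strings ending in b) and the function names are hypothetical.

```python
def product_union(dfa1, dfa2, alphabet):
    """Product construction for two complete DFAs, each given as
    (transitions, start, finals). The result accepts the union."""
    (t1, s1, f1), (t2, s2, f2) = dfa1, dfa2
    start = (s1, s2)
    transitions, finals = {}, set()
    todo, seen = [start], {start}
    while todo:
        pair = todo.pop()
        p, q = pair
        if p in f1 or q in f2:      # using "and" here would give the intersection
            finals.add(pair)
        for x in alphabet:
            target = (t1[(p, x)], t2[(q, x)])
            transitions[(pair, x)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return transitions, start, finals

def run(dfa, string):
    """Run a complete DFA on string and report acceptance."""
    transitions, state, finals = dfa
    for x in string:
        state = transitions[(state, x)]
    return state in finals

# Hypothetical examples: EVEN_A accepts strings with an even number of a's,
# ENDS_B accepts strings ending in b.
EVEN_A = ({(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}, 0, {0})
ENDS_B = ({(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1})
UNION = product_union(EVEN_A, ENDS_B, "ab")
print([w for w in ["", "a", "ab", "aa", "ba"] if run(UNION, w)])
# → ['', 'ab', 'aa']
```

Because the result is again a finite automaton, the union is again a regular language.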
Decidability
Given one automaton A:
  Is the string S a string in L(A) ?
  Does L(A) contain any strings at all ?
  Is L(A) equivalent to ?* ?
Given two automata A1 and A2:
  Is L(A1) a subset of L(A2) ?
  Are L(A1) and L(A2) equivalent ?
  Do L(A1) and L(A2) overlap ?
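Several of these questions reduce to reachability in the automaton's graph. The sketch below decides emptiness and, for a complete deterministic automaton, universality; the function names and the example automata are illustrative assumptions.

```python
def reachable(transitions, start):
    """All states reachable from start by following arcs."""
    seen, todo = {start}, [start]
    while todo:
        state = todo.pop()
        for (src, _), dst in transitions.items():
            if src == state and dst not in seen:
                seen.add(dst)
                todo.append(dst)
    return seen

def is_empty(transitions, start, finals):
    """L(A) is empty iff no final state is reachable from the start state."""
    return not (reachable(transitions, start) & set(finals))

def is_universal(transitions, start, finals):
    """For a complete DFA, L(A) = ?* iff every reachable state is final."""
    return reachable(transitions, start) <= set(finals)

ABC = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}     # automaton for [a b* c]
ANYTHING = {(0, "a"): 0, (0, "b"): 0}             # accepts every string over a, b
print(is_empty(ABC, 0, {2}), is_universal(ANYTHING, 0, {0}))
# → False True
```

The two-automaton questions can be reduced to the same test: L(A1) is a subset of L(A2) iff the difference A1 - A2 is empty, the languages are equivalent iff each is a subset of the other, and they overlap iff their intersection is not empty.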