![Page 1: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/1.jpg)
Computational Language
Finite State Machines and Regular Expressions
![Page 2: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/2.jpg)
Plan
Regular expressions: introduction; operators; disjunction, precedence, substitution
Finite State Machines: link with regular expressions; deterministic FSA; non-deterministic FSA
Lab session: regular expression implementation in UNIX (egrep)
![Page 3: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/3.jpg)
Regular Expressions
Basis of all web-based and word-processor-based searches
Definition 1. An algebraic notation for describing a set of strings
Definition 2. A set of rules that you can use to specify one or more items, such as words in a file, by using a single character string (Sarwar et al.)
![Page 4: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/4.jpg)
Regular Expressions
A search needs two things: a regular expression and a text corpus
Regular expression algebra has variants: Perl, Unix tools
Unix tools: egrep, sed, awk
![Page 5: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/5.jpg)
Regular Expressions
Find occurrences of /Nokia/ in the text
egrep -n 'Nokia' nokia_corpus.txt
![Page 6: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/6.jpg)
Regular Expressions
egrep -n 'Nokia' nokia_corpus.txt
1:.Nokia shares slide after warning
4:HELSINKI (Reuters) - Nokia has cut its sales growth forecast for
7:markets sharply down.Nokia warned group sales would grow only
13:better than expected first-quarter profits from Nokia,
15:Finland's Nokia and rivals have been hit by debt-laden telecoms
19:Nokia said in a statement. "The speed of this transition has been
20:slower than was anticipated earlier this year." Nokia saw its market
26:"The problem with Nokia is that it looks like its going ex-growth,"
29:with a raft of new functions, was hurting. "Nokia had been perceived
36:Nokia cast another shadow over the sector by slashing its forecast for
41:be sold this year. "Nokia now believes that general weakness in all key
43:Nokia said. The market was caught by surprise, especially as Nokia had
46:said Nokia had been "a bit optimistic overall" in its forecasts. "We
49:adjust to weaker demand, Nokia followed the path of rivals in announcing
51:thousands of jobs in the group last year. Despite the bleak outlook, Nokia
57:Nokia also warned second quarter sales would grow only between two and
61:operating efficiencies, strong brand and leading product portfolio," Nokia
62:said. Nokia said it expected pro forma earnings per share (EPS) of 0.18-0.20
67:protecting the margins -- but Nokia has to be a top-line growth story as well,
69:analyst Susan Anthony.But Nokia, known for its strength in forecasting the
79:Nokia's own forecast. Nokia's January-March net sales came in worse than the
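
The behaviour of `egrep -n` can be approximated in a few lines of Python. This is only a sketch: it assumes Python's `re` module as a stand-in for egrep, the helper name `grep_n` is hypothetical, and the middle sample line is invented filler (the other two are abridged from the output above).

```python
import re

def grep_n(pattern, lines):
    """Return 'N:line' strings for every line containing a match, like egrep -n."""
    regex = re.compile(pattern)
    return [f"{n}:{line}" for n, line in enumerate(lines, start=1)
            if regex.search(line)]

sample = [
    ".Nokia shares slide after warning",
    "a filler line that does not mention the company",  # hypothetical
    "Nokia said in a statement.",
]
print(grep_n("Nokia", sample))
# ['1:.Nokia shares slide after warning', '3:Nokia said in a statement.']
```

The line numbers in the result count lines of the input, not matches, which is why the egrep output above skips from 1 to 4 to 7.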
![Page 7: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/7.jpg)
Regular Expressions Suppress case distinctions
Nokia or nokia
![Page 8: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/8.jpg)
Regular Expressions
Set operator
egrep -n '[Nn]okia' nokia_corpus.txt
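
A small check of what the set operator does and does not cover, again using Python's `re` as an assumed stand-in for egrep. The word list is illustrative only.

```python
import re

pattern = re.compile(r"[Nn]okia")  # N or n, followed by the literal "okia"
words = ["Nokia", "nokia", "NOKIA"]
matches = [w for w in words if pattern.search(w)]
print(matches)  # ['Nokia', 'nokia']
```

The set only relaxes the first letter, so the all-capitals form is not matched; ignoring case for the whole pattern would need a case-insensitive option instead.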
![Page 9: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/9.jpg)
Regular Expressions Suppress other features, for
example singular share or plural shares
![Page 10: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/10.jpg)
Regular Expressions
Optional operator
egrep -n 'shares?' nokia_corpus.txt
![Page 11: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/11.jpg)
Regular Expressions
egrep -n 'shares?' nokia_corpus.txt
1:.Nokia shares slide after warning
6:weak demand, sending its shares 12 percent lower and European
62:said. Nokia said it expected pro forma earnings per share (EPS) of 0.18-0.20
85:lion share of the company's sales and earnings, saw sales fall seven percent
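
To see which text the optional operator actually picks out, the sketch below prints the matched substring for three short phrases abridged from the corpus lines above. Python's `re` is assumed here in place of egrep.

```python
import re

pattern = re.compile(r"shares?")  # "share", optionally followed by "s"
phrases = ["shares slide", "earnings per share (EPS)", "sales fall"]
found = []
for text in phrases:
    m = pattern.search(text)
    found.append(m.group() if m else None)
print(found)  # ['shares', 'share', None]
```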
![Page 12: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/12.jpg)
Regular Expressions Kleene operators:
/string*/ “zero or more occurrences of previous character”
/string+/ “1 or more occurrences of previous character”
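
The difference between the two Kleene operators shows up only for zero occurrences. A minimal sketch, assuming Python's `re` and using whole-string matching (`re.fullmatch`) so that nothing else in the string can satisfy the pattern:

```python
import re

candidates = ["b", "ba", "baaa"]
star = [s for s in candidates if re.fullmatch(r"ba*", s)]  # zero or more a
plus = [s for s in candidates if re.fullmatch(r"ba+", s)]  # one or more a
print(star)  # ['b', 'ba', 'baaa']
print(plus)  # ['ba', 'baaa']
```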
![Page 14: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/14.jpg)
Regular Expressions
Wildcard operator:
/string./ “any single character after the previous character”
Combine wildcard and Kleene:
/string.*/ “zero or more instances of any character after the previous character”
/string.+/ “one or more instances of any character after the previous character”
![Page 15: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/15.jpg)
Regular Expressions
egrep -n 'profit.*' nokia_corpus.txt
13:better than expected first-quarter profits from Nokia,
52:remains the only profitable handset maker among the "big three" suppliers
60:company's profitability outlook remains strong, driven by increasing
81:Pre-tax profit was 1.31 billion euros.The company's struggling networks unit
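
Because egrep prints whole lines, the effect of the trailing `.*` is not visible in the output above. The sketch below, which assumes Python's `re` and uses an abridged version of corpus line 52, prints the matched substring instead.

```python
import re

line = "remains the only profitable handset maker"
short = re.search(r"profit", line).group()
long = re.search(r"profit.*", line).group()
print(short)  # profit
print(long)   # profitable handset maker
```

The `.*` extends the match to the end of the line, but the set of lines selected is the same as for the plain pattern, since `.*` can also match nothing.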
![Page 16: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/16.jpg)
Regular Expressions
Anchors
Beginning of line operator: ^
egrep '^said' nokia_corpus.txt
End of line operator: $
egrep 'said$' nokia_corpus.txt
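
A short illustration of the two anchors, assuming Python's `re` in place of egrep. The `^` goes before the pattern and the `$` after it. The first and third lines are abridged from the corpus output shown earlier; the second is a hypothetical line added so that the end-of-line anchor has something to match.

```python
import re

lines = [
    "said. Nokia said it expected",
    "Nokia said",                 # hypothetical line
    "Nokia cast another shadow",
]
starts = [l for l in lines if re.search(r"^said", l)]  # "said" at line start
ends = [l for l in lines if re.search(r"said$", l)]    # "said" at line end
print(starts)  # ['said. Nokia said it expected']
print(ends)    # ['Nokia said']
```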
![Page 17: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/17.jpg)
Regular Expressions
Disjunction:
Set operator: /[Ss]tring/ “a string which begins with either S or s”
Range: /[A-Z]tring/ “a string beginning with a capital letter”
Pipe |: /string1|string2/ “either string1 or string2”
![Page 18: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/18.jpg)
Regular Expressions
Disjunction
egrep -n 'weak|warning|drop' nokia_corpus.txt
egrep -n 'weak.*|warn.*|drop.*' nokia_corpus.txt
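
The pipe can be tried out on a few corpus lines taken from the earlier output. As before, this is a sketch that assumes Python's `re` as a stand-in for egrep.

```python
import re

pattern = re.compile(r"weak|warning|drop")
lines = [
    ".Nokia shares slide after warning",
    "weak demand, sending its shares 12 percent lower and European",
    "Nokia said in a statement.",
]
hits = [bool(pattern.search(l)) for l in lines]
print(hits)  # [True, True, False]
```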
![Page 19: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/19.jpg)
Regular Expressions
Negation: /[^a-z]tring/ “any string that does not begin with a lowercase letter”
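
A negated set still has to match one character; it just may not be one of those listed. The sketch below assumes Python's `re`, and the three test words are illustrative.

```python
import re

pattern = re.compile(r"[^a-z]tring")  # any character except a-z, then "tring"
words = ["String", "string", "9tring"]
accepted = [w for w in words if pattern.search(w)]
print(accepted)  # ['String', '9tring']
```

Digits, punctuation and spaces all count as "not a lowercase letter", so the pattern is broader than "begins with a capital".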
![Page 20: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/20.jpg)
Regular Expressions
Precedence
1. Parentheses
2. Kleene and optional operators * + ?
3. Anchors and sequences
4. Disjunction operator |
(a) /supply|iers/ matches /supply/ or /iers/
(b) /suppl(y|iers)/ matches /supply/ or /suppliers/
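
The two patterns in (a) and (b) can be compared directly. This sketch assumes Python's `re` and uses whole-string matching so that each word is either fully accepted or not.

```python
import re

words = ["supply", "iers", "suppliers"]
# (a) sequence binds tighter than |, so the alternatives are "supply" and "iers"
without_parens = [w for w in words if re.fullmatch(r"supply|iers", w)]
# (b) parentheses limit the disjunction to the ending
with_parens = [w for w in words if re.fullmatch(r"suppl(y|iers)", w)]
print(without_parens)  # ['supply', 'iers']
print(with_parens)     # ['supply', 'suppliers']
```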
![Page 22: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/22.jpg)
Regular Expressions
Substitution
sed 's/word1/word2/' corpus.txt
Me: I am feeling a bit depressed today
sed 's/I am/sorry to hear that you are/' corpus.txt
Eliza: sorry to hear that you are feeling a bit depressed today
![Page 24: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/24.jpg)
Regular Expressions
Substitution
sed 's/word1/word2/' corpus.txt
Me: I wish I could shake this depression
sed 's/wish I/am sure you/' corpus.txt
Eliza: I am sure you could shake this depression
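
The two Eliza-style substitutions can be reproduced with `re.sub`, assumed here as a Python counterpart to sed's `s/…/…/`. The helper name `eliza` is hypothetical, and `count=1` is used because sed without the `g` flag replaces only the first match on a line.

```python
import re

def eliza(utterance):
    """Apply the two substitution rules from the slides, first match only."""
    utterance = re.sub(r"I am", "sorry to hear that you are", utterance, count=1)
    utterance = re.sub(r"wish I", "am sure you", utterance, count=1)
    return utterance

print(eliza("I am feeling a bit depressed today"))
# sorry to hear that you are feeling a bit depressed today
print(eliza("I wish I could shake this depression"))
# I am sure you could shake this depression
```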
![Page 25: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/25.jpg)
Finite State Transition Networks
Finite State Automata (FSA)
Like a regular expression, an FSA is used to recognise a set of strings
e.g. egrep -n 'baa+!' corpus.txt
![Page 29: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/29.jpg)
Finite State Transition Networks
Finite State Automata (FSA)
Like a regular expression, an FSA is used to recognise a set of strings
Represented as a directed graph
Set of nodes representing states
Set of arcs, links between nodes, representing transitions between states
Arcs are labelled
![Page 34: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/34.jpg)
Finite State Automata
How does it work?
Used to recognise a set of strings
Candidate input string represented as a segmented tape with a symbol for each cell
String fed into the machine one symbol at a time
If the symbol on the input matches the symbol on an arc, then
A) move to the next state
B) advance one symbol on the input string
C) keep going until the final state is reached or the input ends
Otherwise: stop and reject the string
![Page 39: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/39.jpg)
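Finite State Automata
State Transition Table

| State | b | a | ! |
|-------|---|---|---|
| 0     | 1 | Ø | Ø |
| 1     | Ø | 2 | Ø |
| 2     | Ø | 3 | Ø |
| 3     | Ø | 3 | 4 |
| 4:    | Ø | Ø | Ø |

Rows give the current state and columns the input symbol; Ø marks a missing transition, and the colon marks the accepting state.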
![Page 40: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/40.jpg)
Finite State Automata
Algorithm for FSA (Jurafsky and Martin, p. 37)

    function D-RECOGNIZE(tape, machine) returns accept or reject
      index <- Beginning of tape
      current-state <- Initial state of machine
      loop
        if End of input has been reached then
          if current-state is an accept state then
            return accept
          else
            return reject
        elsif transition-table[current-state, tape[index]] is empty then
          return reject
        else
          current-state <- transition-table[current-state, tape[index]]
          index <- index + 1
      end
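
One possible Python rendering of D-RECOGNIZE is sketched below. It is not the textbook's code: the names `TRANSITIONS`, `ACCEPT_STATES` and `d_recognize` are assumptions, the table is the deterministic /baa+!/ machine from the state transition table, and a missing dictionary key plays the role of an empty (Ø) cell.

```python
# (state, symbol) -> next state; absent keys correspond to Ø in the table.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}
ACCEPT_STATES = {4}

def d_recognize(tape, transitions=TRANSITIONS, start=0, accept=ACCEPT_STATES):
    """Return True (accept) or False (reject) for the input string `tape`."""
    state = start
    for symbol in tape:                  # advance one cell at a time
        if (state, symbol) not in transitions:
            return False                 # empty cell: reject
        state = transitions[(state, symbol)]
    return state in accept               # end of input: accept only in state 4

print(d_recognize("baaa!"))  # True
print(d_recognize("ba!"))    # False
```

The `for` loop replaces the explicit index of the pseudocode; reaching the end of the loop corresponds to the "End of input has been reached" branch.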
![Page 42: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/42.jpg)
Finite State Automata
FSAs and recognition
FSAs and generation
At each transition, print out the label of the arc
At the final state, stop printing
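
Generation can be sketched by walking the arcs and collecting their labels. The code below is one way to do this under stated assumptions: the arc list encodes the /baa+!/ machine from the state transition table, the names `ARCS` and `generate` are hypothetical, and a length bound is added because the self-loop would otherwise allow unboundedly long output.

```python
from collections import deque

# state -> list of (arc label, target state)
ARCS = {0: [("b", 1)], 1: [("a", 2)], 2: [("a", 3)],
        3: [("a", 3), ("!", 4)], 4: []}
ACCEPT_STATES = {4}

def generate(max_length):
    """Return every string of at most max_length symbols the machine can print."""
    results = []
    queue = deque([(0, "")])             # (current state, output so far)
    while queue:
        state, output = queue.popleft()
        if state in ACCEPT_STATES:
            results.append(output)       # final state: stop printing
            continue
        if len(output) < max_length:
            for label, target in ARCS[state]:
                queue.append((target, output + label))  # print the arc label
    return results

print(generate(6))  # ['baa!', 'baaa!', 'baaaa!']
```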
![Page 44: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/44.jpg)
Finite State Automata
Deterministic FSAs
An FSA whose recognition behaviour is fully determined by the state it is in and the input symbol it is looking at
Non-deterministic FSAs
An FSA with decision points
![Page 45: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/45.jpg)
Finite State Automata
Deterministic FSAs
Non-deterministic FSAs
An FSA with decision points
A state may have a self-loop as well as an outgoing arc on the same symbol
Arcs may be ε transitions, which are taken without consuming any input
![Page 46: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/46.jpg)
Finite State Automata
Deterministic FSAs
Non-deterministic FSA: ways of handling decision points
Backup: set a marker that can be returned to
Look-ahead: look ahead at the input
Parallelism: look at alternative paths in parallel
![Page 47: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/47.jpg)
Finite State Automata
Non-deterministic FSA: state transition table

| State | b | a    | ! | ε |
|-------|---|------|---|---|
| 0     | 1 | Ø    | Ø | Ø |
| 1     | Ø | 2    | Ø | Ø |
| 2     | Ø | 2, 3 | Ø | Ø |
| 3     | Ø | Ø    | 4 | Ø |
| 4:    | Ø | Ø    | Ø | Ø |
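
The parallelism strategy can be sketched by tracking a set of possible states instead of a single one. This is an illustration rather than a reference algorithm: the names `NFSA` and `nd_recognize` are assumptions, and ε transitions are left out because the ε column of this particular table is empty.

```python
# (state, symbol) -> set of possible next states; absent keys correspond to Ø.
NFSA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},   # the decision point: stay in 2 or move on to 3
    (3, "!"): {4},
}
ACCEPT_STATES = {4}

def nd_recognize(tape, transitions=NFSA, start=0, accept=ACCEPT_STATES):
    """Follow all alternative paths in parallel; accept if any ends in an accept state."""
    current = {start}
    for symbol in tape:
        current = set().union(
            *(transitions.get((state, symbol), set()) for state in current)
        )
        if not current:          # every path has died
            return False
    return bool(current & accept)

print(nd_recognize("baaa!"))  # True
print(nd_recognize("ba!"))    # False
```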
![Page 49: Computational Language Finite State Machines and Regular Expressions](https://reader034.vdocuments.site/reader034/viewer/2022052603/56649d4c5503460f94a2a5e7/html5/thumbnails/49.jpg)
Finite State Automata
Formal language: a set of strings over a finite symbol set (alphabet)
Σ = {a, b, !}
L(m) = {baa!, baaa!, baaaa!, …} “formal language characterised by m”
m = model, L = formal language
A formal language models a fragment of a natural language
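
To make the relation between Σ and L(m) concrete, the sketch below enumerates every string over Σ up to a length bound and keeps those that belong to the language. It assumes that m is the /baa+!/ machine used throughout, tests membership with Python's `re.fullmatch`, and uses the hypothetical names `SIGMA` and `language_up_to`.

```python
import re
from itertools import product

SIGMA = ["a", "b", "!"]   # the alphabet

def language_up_to(max_length):
    """List the members of L(m) with at most max_length symbols."""
    members = []
    for n in range(1, max_length + 1):
        for symbols in product(SIGMA, repeat=n):
            candidate = "".join(symbols)
            if re.fullmatch(r"baa+!", candidate):
                members.append(candidate)
    return members

print(language_up_to(6))  # ['baa!', 'baaa!', 'baaaa!']
```

Under that assumption the shortest member is baa!, since the machine needs two a arcs before it can reach the state with the ! arc.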