Regular Expressions - The University of Edinburgh · 2011. 12. 4.
Regular Expressions
• using REs to find patterns
• implementing REs using finite state automata
Sunday, 4 December 11
REs and FSAs
• Regular expressions can be viewed as a textual way of specifying the structure of finite-state automata
• Finite-state automata are a way of implementing regular expressions
Regular expressions
• A formal language for specifying text strings
• How can we search for any of these?
– woodchuck
– woodchucks
– Woodchuck
– Woodchucks
Regular Expressions for Textual Searches
Who does it?
Everybody:
• Web search engines, CGI scripts
• Information retrieval
• Word processing (Emacs, vi, MSWord)
• Linux tools (sed, awk, grep)
• Computation of frequencies from corpora
• Perl
Regular Expression
• Regular expression: formula in algebraic notation for specifying a set of strings
• String: any sequence of characters
– letters, numbers, spaces, tabs, punctuation marks
• Regular expression search
– pattern: specifying the set of strings we want to search for
– corpus: the texts we want to search through
Basic Regular Expression Patterns
• Case sensitive: d is not the same as D
• Disjunctions: [dD] [0123456789]
• Ranges: [0-9] [A-Z]
• Negations: [^Ss] (only when ^ occurs immediately after [ )
• Optional characters: ? and *
• Wildcard: .
• Anchors: ^ and $, also \b and \B
• Disjunction, grouping, and precedence: | (pipe)
Caret for negation, literal ^, or anchor

RE        Match (single characters)       Example Patterns Matched
[^A-Z]    not an uppercase letter         “Oyfn pripetchik”
[^Ss]     neither ‘S’ nor ‘s’             “I have no exquisite reason for’t”
[^\.]     not a period                    “our resident Djinn”
[e^]      either ‘e’ or ‘^’               “look up ^ now”
a^b       the pattern ‘a^b’               “look up a^b now”
^T        T at the beginning of a line    “The Dow Jones closed up one”
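The three roles of the caret can be checked directly. The sketch below uses Python's `re` module, which is an assumption (the slides use Perl-style notation); the function name `caret_roles` is made up for illustration. One difference to be aware of: in Python a caret outside a character class is always an anchor, so the literal pattern has to be written `a\^b`.

```python
import re

def caret_roles():
    """Return the first match for each use of the caret in the table."""
    return {
        # ^ immediately after [ negates the class
        "negation": re.search(r"[^A-Z]", "Oyfn pripetchik").group(0),
        # ^ later inside a class is an ordinary character
        "literal_in_class": re.search(r"[e^]", "look up ^ now").group(0),
        # outside a class, Python needs the caret escaped to be literal
        "escaped_literal": re.search(r"a\^b", "look up a^b now").group(0),
        # ^ at the start of a pattern anchors to the start of the line
        "anchor": re.search(r"^T", "The Dow Jones closed up one").group(0),
    }

print(caret_roles())
```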
Optionality and Counters

RE             Match                      Example Patterns Matched
woodchucks?    woodchuck or woodchucks    “The woodchuck hid”
colou?r        color or colour            “comes in three colours”
(he){3}        exactly 3 “he”s            “and he said hehehe.”

?        zero or one occurrences of previous char or expression
*        zero or more occurrences of previous char or expression
+        one or more occurrences of previous char or expression
{n}      exactly n occurrences of previous char or expression
{n,m}    between n and m occurrences
{n,}     at least n occurrences
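A quick way to try the counters is to list every match in a sample string. This is a minimal sketch in Python's `re` (an assumption; the slides are tool-neutral), and `find_all` is a hypothetical helper, not something from the lecture.

```python
import re

def find_all(pattern, text):
    """Return every non-overlapping match of pattern in text, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text)]

# ? makes the preceding character optional
print(find_all(r"woodchucks?", "The woodchuck hid from two woodchucks"))
print(find_all(r"colou?r", "color or colour"))
# {3} repeats the parenthesised group exactly three times
print(find_all(r"(he){3}", "and he said hehehe."))
# {2,3} allows two or three repetitions of the preceding character
print(find_all(r"a{2,3}", "ba baa baaa baaaa"))
```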
Wild card ‘.’

RE          Match                                  Example Patterns Matched
beg.n       any char between beg and n             begin, beg’n, begun
big.*dog    find lines where big and dog occur     the big dog bit the little
                                                   the big black dog bit the
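The two wildcard patterns can be tried on small word and line lists. This is a sketch in Python's `re` (an assumption); the extra test words and the third line are made up to show what does not match.

```python
import re

# . stands for exactly one character, so the word must have five letters
words = ["begin", "beg'n", "begun", "begn", "beggin"]
one_char = [w for w in words if re.fullmatch(r"beg.n", w)]

# .* stands for any run of characters, so big must come before dog
lines = [
    "the big dog bit the little",
    "the big black dog bit the",
    "the dog was big",
]
big_then_dog = [line for line in lines if re.search(r"big.*dog", line)]

print(one_char)
print(big_then_dog)
```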
Operator Precedence Hierarchy
1. Parenthesis: ( )
2. Counters: * + ? { }
3. Sequences and anchors: the ^my end$
4. Disjunction: |
Examples:
/moo+/
/try|ies/
/and|or/
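The effect of the hierarchy shows up when the example patterns are tested against whole words. A small sketch in Python's `re` (an assumption); `whole` is a hypothetical helper that asks whether the pattern matches the entire word.

```python
import re

def whole(pattern, word):
    """True if pattern matches the entire word."""
    return re.fullmatch(pattern, word) is not None

# Counters bind tighter than sequence: + applies to the last o only
print(whole(r"moo+", "mooooo"), whole(r"moo+", "moomoo"))
# Parentheses override that: now + applies to the whole group
print(whole(r"(moo)+", "moomoo"))
# Sequence binds tighter than |: this is "try" or "ies", not "tries"
print(whole(r"try|ies", "tries"), whole(r"try|ies", "ies"))
# Grouping gives the intended reading
print(whole(r"tr(y|ies)", "tries"))
```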
Example
• Find all instances of the word “the” in a text.
/the/
– Misses capitalized examples
/[tT]he/
– Finds other or theology
/\b[tT]he\b/
/[^a-zA-Z][tT]he[^a-zA-Z]/
– Misses sentence-initial “the”
/(^|[^a-zA-Z])[tT]he[^a-zA-Z]/
Errors
• The process we just went through was based on fixing two kinds of errors
– Matching strings that we should not have matched (there, then, other): false positives (Type I)
– Not matching things that we should have matched (The): false negatives (Type II)
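The successive patterns for “the” can be run against one sample sentence to watch false positives and false negatives come and go. This is a sketch in Python's `re` (an assumption); the sentence and the helper `hits` are made up for illustration.

```python
import re

TEXT = "The other day the theology student bathed the cat"

def hits(pattern, text=TEXT):
    """Return every match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]

patterns = [
    r"the",                            # misses "The", finds "other", "bathed"
    r"[tT]he",                         # still finds "other", "theology"
    r"\b[tT]he\b",                     # whole words only
    r"[^a-zA-Z][tT]he[^a-zA-Z]",       # misses the sentence-initial "The"
    r"(^|[^a-zA-Z])[tT]he[^a-zA-Z]",   # allows start of line on the left
]
counts = {p: len(hits(p)) for p in patterns}
for p, n in counts.items():
    print(f"{n}  /{p}/")
```

The sentence contains the word “the” three times, so a count above three signals false positives and a count below three signals false negatives.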
A more complex example
Write a RE that will match “any PC with more than 500MHz and 32 Gb of disk space for less than $1000”.
• First a RE for prices
/\$[0-9]+/ # whole dollars
/\$[0-9]+\.[0-9][0-9]/ # dollars and cents
/\$[0-9]+(\.[0-9][0-9])?/ # cents optional
/\b\$[0-9]+(\.[0-9][0-9])?\b/ # word boundaries
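The price pattern can be tried in Python's `re` (an assumption; the slides use Perl-style slashes). Two adjustments in this sketch are mine rather than the lecture's: the dollar sign is escaped so that it is not read as an end-of-line anchor, and the leading `\b` is dropped, because a `\b` in front of `\$` only matches when the dollar sign directly follows a word character. The name `find_prices` and the sample ad are hypothetical.

```python
import re

# Assumption: escaped dollar sign, no leading \b (see the note above).
PRICE = re.compile(r"\$[0-9]+(\.[0-9][0-9])?\b")

def find_prices(text):
    """Return every price-like string in text."""
    return [m.group(0) for m in PRICE.finditer(text)]

ad = "This PC costs $999.99 and the mouse costs $25 today"
print(find_prices(ad))

# With a leading \b, a price that follows a space is not found.
print(re.search(r"\b\$[0-9]+", "costs $25"))
```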
Continued
• Specifications for processor speed
/\b[0-9]+ *(MHz|[Mm]egahertz|GHz|[Gg]igahertz)\b/
• Memory size
/\b[0-9]+ *(Mb|[Mm]egabytes?)\b/
/\b[0-9]+(\.[0-9]+)? *(Gb|[Gg]igabytes?)\b/
• Vendors
/\b(Win(95|98|NT|dows *(NT|95|98|2000)?))\b/
/\b(Mac|Macintosh|Apple)\b/
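The speed and size patterns can be checked against the wording of the exercise. This sketch uses Python's `re` (an assumption). The gigabyte pattern is written with `[0-9]+(\.[0-9]+)?` so that a whole number such as “32 Gb” matches; that form, and the names `SPEED`, `DISK` and `OS`, are assumptions of this sketch.

```python
import re

SPEED = re.compile(r"\b[0-9]+ *(MHz|[Mm]egahertz|GHz|[Gg]igahertz)\b")
# Assumption: one or more digits with an optional fractional part
DISK = re.compile(r"\b[0-9]+(\.[0-9]+)? *(Gb|[Gg]igabytes?)\b")
OS = re.compile(r"\b(Win(95|98|NT|dows *(NT|95|98|2000)?))\b")

ad = "any PC with more than 500MHz and 32 Gb of disk space, running Windows 2000"

speed = SPEED.search(ad).group(0)
disk = DISK.search(ad).group(0)
system = OS.search(ad).group(0)
print(speed, "|", disk, "|", system)
```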
Substitutions and Memory
• Substitutions: s/regexp/pattern/
s/color/colour/
• Memory (\1, \2, etc. refer back to found matches)
e.g., Put angle brackets around all integers in text
the 39 students ==> the <39> students
s/([0-9]+)/<\1>/
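The same two substitutions can be written with `re.sub` in Python (an assumption; the slides use the Perl/sed `s///` form). Unlike `s///` without the `g` flag, `re.sub` replaces every match unless a `count` is given.

```python
import re

# s/color/colour/
british = re.sub(r"color", "colour", "my favorite color")

# s/([0-9]+)/<\1>/ : \1 refers back to what the first group matched
bracketed = re.sub(r"([0-9]+)", r"<\1>", "the 39 students")

print(british)
print(bracketed)
```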
Using Backslash
RE    Match                Example Patterns Matched
\*    an asterisk “*”      “K*A*P*L*A*N”
\.    a period “.”         “Dr. Livingston, I presume”
\?    a question mark      “Would you light my candle?”
\n    a newline
\t    a tab
Some Useful Aliases
RE    Expansion       Match                             Example Patterns
\d    [0-9]           any digit                         Party of 5
\D    [^0-9]          any non-digit                     99p
\w    [a-zA-Z0-9_]    any alphanumeric or underscore    99p
\W    [^\w]           a non-alphanumeric                !!!!
\s    [ \r\t\n\f]     whitespace (sp, tab)
\S    [^\s]           non-whitespace                    in Concord
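The aliases can be tried on the example strings from the table. This is a sketch in Python's `re` (an assumption). The `re.ASCII` flag is passed because, for Python strings, `\d`, `\w` and `\s` otherwise also match non-ASCII digits, letters and spaces, which goes beyond the expansions listed here.

```python
import re

digits = re.findall(r"\d", "Party of 5", re.ASCII)
non_digits = re.findall(r"\D", "99p", re.ASCII)
word_chars = re.findall(r"\w+", "99p !!!!", re.ASCII)
non_word = re.findall(r"\W", "99p!!!!", re.ASCII)
non_space = re.findall(r"\S+", "in Concord", re.ASCII)
pieces = re.split(r"\s+", "in \t Concord", flags=re.ASCII)

print(digits, non_digits, word_chars, non_word, non_space, pieces)
```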
Example
Swap first two words of line
s/(\w+) +(\w+)/\2 \1/
% perl -de 42
DB<1> $s = "DOES HE LIKE BEER";
DB<2> print $s;
DOES HE LIKE BEER
DB<3> $s =~ s/(\w+) +(\w+)/\2 \1/;
DB<4> print $s;
HE DOES LIKE BEER
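For comparison, here is the same swap as a Python sketch (Python is an assumption; the session above is Perl). The function name `swap_first_two_words` is made up. `count=1` is needed because `re.sub` would otherwise swap every following pair of words as well, whereas Perl's `s///` without `g` stops after the first match.

```python
import re

def swap_first_two_words(line):
    """Swap the first two words of line, leaving the rest untouched."""
    return re.sub(r"(\w+) +(\w+)", r"\2 \1", line, count=1)

print(swap_first_two_words("DOES HE LIKE BEER"))
```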
Finite State Automata & Regular Expressions
• Regular expressions can be viewed as a textual way of specifying the structure of finite-state automata.
• FSAs and their probabilistic relatives are at the core of much of what we’ll do this quarter
FSAs as Graphs
• Let’s start with the sheep language
/baa+!/
baa! baaa! baaaa! baaaaa! ...
Sheep FSA
• We can say the following things about this machine
– It has 5 states
– b, a, and ! are in its alphabet
– q0 is the start state
– q4 is an accept state
– It has 5 transitions
More Formally
• You can specify an FSA by enumerating the following things.
– The set of states: Q
– A finite alphabet: Σ
– A start state
– A set of accept/final states
– A transition function that maps Q × Σ to Q
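The five components can be written down as a small data structure. This is a minimal sketch in Python (an assumption); the class `FSA`, its field names and the method `is_well_formed` are made up for illustration. The transition function is stored as a partial dictionary, with a missing entry meaning "no move", and the sheep machine is encoded with a self-loop on state 3, matching the state sequence q0 q1 q2 q3 q3 q4 in the accept trace.

```python
from dataclasses import dataclass

@dataclass
class FSA:
    states: set      # Q
    alphabet: set    # Sigma
    start: int       # start state
    accept: set      # accept/final states
    delta: dict      # (state, symbol) -> state

    def is_well_formed(self):
        """Check that every component stays inside Q and Sigma."""
        return (
            self.start in self.states
            and self.accept <= self.states
            and all(
                s in self.states and a in self.alphabet and t in self.states
                for (s, a), t in self.delta.items()
            )
        )

sheep = FSA(
    states={0, 1, 2, 3, 4},
    alphabet={"b", "a", "!"},
    start=0,
    accept={4},
    delta={(0, "b"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3, (3, "!"): 4},
)
print(len(sheep.states), len(sheep.delta), sheep.is_well_formed())
```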
Yet Another View
• The guts of FSAs can be represented as tables
State    b    a      !    ε
0        1
1             2
2             2,3
3                    4
4
If you’re in state 1 and you’re looking at an a, go to state 2
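The table translates directly into a nested dictionary. Because state 2 lists two targets (2,3) for the input a, this sketch keeps a set of current states instead of a single one. It is written in Python (an assumption); the names `TABLE` and `nd_recognize` are hypothetical, and the empty fourth column is ignored.

```python
# One row per state; each row maps an input symbol to the set of next states.
TABLE = {
    0: {"b": {1}},
    1: {"a": {2}},
    2: {"a": {2, 3}},
    3: {"!": {4}},
    4: {},
}

def nd_recognize(tape, table=TABLE, start=0, accept=frozenset({4})):
    """Follow every possible path through the table at once."""
    current = {start}
    for symbol in tape:
        current = {
            nxt
            for state in current
            for nxt in table[state].get(symbol, set())
        }
        if not current:          # no state can read this symbol
            return False
    return bool(current & accept)

print(nd_recognize("baa!"), nd_recognize("baaaa!"), nd_recognize("ba!"))
```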
Recognition
• Recognition is the process of determining if a string should be accepted by a machine
• Or… it’s the process of determining if a string is in the language we’re defining with the machine
• Or… it’s the process of determining if a regular expression matches a string
• Those all amount to the same thing in the end
Recognition
• Traditionally, (Turing’s notion) this process is depicted with a tape.
Recognition
• Start in the start state
• Examine the current input
• Consult the table
• Go to a new state and update the tape pointer.
• Until you run out of tape.
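The steps above can be sketched as a short loop. This is written in Python (an assumption) for a deterministic version of the sheep machine in which state 3 loops on a; the names `SHEEP_TABLE` and `d_recognize` are made up. The function also returns the states it visited, so the tape traces can be reproduced.

```python
# (state, symbol) -> next state; a missing entry means "no move".
SHEEP_TABLE = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}

def d_recognize(tape, table=SHEEP_TABLE, start=0, accept=frozenset({4})):
    """Return (accepted, path), where path lists the states visited."""
    state = start                  # start in the start state
    path = [state]
    index = 0                      # the tape pointer
    while index < len(tape):       # until you run out of tape
        nxt = table.get((state, tape[index]))  # examine input, consult table
        if nxt is None:
            return False, path     # nowhere to go: reject
        state = nxt                # go to the new state
        path.append(state)
        index += 1                 # update the tape pointer
    return state in accept, path

print(d_recognize("baaa!"))
print(d_recognize("aba!b"))
```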
Tracing a Rejection
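[Figure: input tape “a b a ! b” with the machine in q0, drawn above the sheep FSA (states 0 to 4; arcs b, a, a, ! and a self-loop labelled a). Outcome: REJECT]
Slide from Dorr/Monz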
Tracing an Accept
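[Figure: input tape “b a a a !” with the state sequence q0 q1 q2 q3 q3 q4, drawn above the sheep FSA (states 0 to 4; arcs b, a, a, ! and a self-loop labelled a). Outcome: ACCEPT]
Slide from Dorr/Monz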
Regular expression search
http://www.inf.ed.ac.uk/teaching/courses/il1/2010/labs/2010-10-07/regex.xml
Search for the following expressions
– Alice
– brillig
– m.m
– c..c
– [A-Z][A-Z]+
– J|j
– (J|j)
– \(.*\)
– l.*l
– l.*?l
– l.+l
http://www.learn-javascript-tutorial.com/RegularExpressions.cfm#h1.2
What does . stand for? (any character)
* is for repetition - zero or more times
[aeiou] is for any vowel
(i|o) is for either i or o
\(.*\)
These are called regular expressions
We can construct finite state machines to recognise regular expressions.
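Several of the search exercises differ only in how much the repetition is allowed to consume. The sketch below tries them in Python's `re` (an assumption; the lab page may use a different engine), on short sample strings that are made up for illustration.

```python
import re

text = "a little ball"

greedy = re.search(r"l.*l", text).group(0)   # * takes as much as it can
lazy = re.search(r"l.*?l", text).group(0)    # *? takes as little as it can

# + needs at least one character between the two l's, * does not
star_on_ll = re.search(r"l.*l", "ll")
plus_on_ll = re.search(r"l.+l", "ll")

# Escaped parentheses are literal; .* is still greedy between them
parens = re.search(r"\(.*\)", "f(x) + g(y)").group(0)

# Two or more capital letters in a row
caps = re.search(r"[A-Z][A-Z]+", "The JABBERWOCK").group(0)

print(greedy, "|", lazy, "|", parens, "|", caps)
```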
More Examples
Finite State Automata and Regular Expressions
rara
[Figure: FSA graph for rara; start state S, arcs labelled r and a]
Rara is similar
But rarr should send us back to look for the first a
ra(ra)*
[Figure: FSA graph for ra(ra)*; start state S, arcs labelled r and a]
suss this?
[Figure: FSA graph and transition table for suss; states [], s, su, sus and a success state (•); inputs s, u and any other character (.)]
FSAs can be represented as:
- graphs
- transition tables
If you’re in state s and you’re looking at a u, go to state su
Now we try to write finite state machines that will search for regular expressions.
Start with something simple
SUSS
The path from start to success is obvious.
What to do when you find a wrong letter part-way through is harder.
After s, if we find another s we don’t go back to the beginning to look for an s - we just found one
So we go back to the state where we’re looking for a u
Similarly for susu – we go back to the state looking for the second s
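The fall-back moves described in these notes can be written as a transition table. This is a sketch in Python (an assumption); states are named by the part of suss matched so far, any input not listed sends the machine back to the empty state, and the names `SUSS` and `contains_suss` are hypothetical.

```python
# State = the prefix of "suss" matched so far.
SUSS = {
    "":    {"s": "s"},
    "s":   {"s": "s", "u": "su"},       # another s: we still have one s
    "su":  {"s": "sus"},
    "sus": {"s": "suss", "u": "su"},    # susu: back to looking for the second s
}

def contains_suss(text):
    """Scan text once, left to right, and report whether suss occurs."""
    state = ""
    for ch in text:
        state = SUSS[state].get(ch, "")   # unlisted input: start again
        if state == "suss":
            return True
    return False

print(contains_suss("ssuss"), contains_suss("sususs"), contains_suss("susu"))
```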
(flip)|(flop)
[Figure: FSA graph for (flip)|(flop); start state S, states f, fl[], fl(i|o); arcs labelled f, l, i, o, p]
fl(i|o)p
[Figure: FSA graph for fl(i|o)p; start state S, states f, fl[], fl(i|o); arcs labelled f, l, i, o, p]
regular expressions
• any character is a regexp
– matches itself
• if R and S are regexps, so is RS
– matches a match for R followed by a match for S
• if R and S are regexps, so is R|S
– matches any match for R or S (or both)
• if R is a regexp, so is R* (R+)
– matches any sequence of 0 (1) or more matches for R
Kleene *, +
Stephen Cole Kleene, 1909-1994
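The four rules are enough to build a tiny matcher. The sketch below, in Python (an assumption), represents a regexp as nested tuples and computes, for a given start position, the set of positions where a match could end. The constructors `char`, `seq`, `alt`, `star`, `plus` and the functions `ends` and `matches` are made-up names, and this illustrates the definition rather than how real regex engines work.

```python
def char(c): return ("char", c)
def seq(r, s): return ("seq", r, s)
def alt(r, s): return ("alt", r, s)
def star(r): return ("star", r)
def plus(r): return seq(r, star(r))      # R+ is R followed by R*

def ends(regex, text, start):
    """Positions where a match of regex beginning at start can end."""
    kind = regex[0]
    if kind == "char":                   # a character matches itself
        return {start + 1} if text[start:start + 1] == regex[1] else set()
    if kind == "seq":                    # R then S
        return {e2 for e1 in ends(regex[1], text, start)
                for e2 in ends(regex[2], text, e1)}
    if kind == "alt":                    # R or S
        return ends(regex[1], text, start) | ends(regex[2], text, start)
    if kind == "star":                   # zero or more matches for R
        reached, frontier = {start}, {start}
        while frontier:
            frontier = {e for p in frontier
                        for e in ends(regex[1], text, p)} - reached
            reached |= frontier
        return reached
    raise ValueError(f"unknown node: {kind!r}")

def matches(regex, text):
    """True if regex matches the whole of text."""
    return len(text) in ends(regex, text, 0)

RA = seq(char("r"), char("a"))
RA_RA_STAR = seq(RA, star(RA))                                     # ra(ra)*
SHEEP = seq(char("b"), seq(char("a"), seq(plus(char("a")), char("!"))))  # baa+!

print(matches(RA_RA_STAR, "rara"), matches(RA_RA_STAR, "rar"))
print(matches(SHEEP, "baaa!"), matches(SHEEP, "ba!"))
```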