lexical analysis - philadelphia
![Page 1: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/1.jpg)
Lexical Analysis
![Page 2: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/2.jpg)
Where We Are
Source Code → Lexical Analysis → Syntax Analysis → Semantic Analysis → IR Generation → IR Optimization → Code Generation → Optimization → Machine Code
![Page 3: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/3.jpg)
while (ip < z)
    ++ip;
![Page 4: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/4.jpg)
do[for] = new 0;
![Page 5: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/5.jpg)
do[for] = new 0;
d o [ f o r ] = n e w 0 ;
T_Do [ T_For ] = T_New T_IntConst(0)
![Page 6: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/6.jpg)
Scanning a Source File
w h i l e ( 1 3 7 < i ) \n \t + + i ;
T_While
This is called a token. You can think of it as an enumerated type representing what logical entity we read out of the source code.
The piece of the original program from which we made the token is called a lexeme.
Sometimes we will discard a lexeme rather than storing it for later use. Here, we ignore whitespace, since it has no bearing on the meaning of the program.
![Page 15: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/15.jpg)
Scanning a Source File
w h i l e ( 1 3 7 < i ) \n \t + + i ;
T_While (
![Page 22: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/22.jpg)
Scanning a Source File
w h i l e ( 1 3 7 < i ) \n \t + + i ;
T_While ( T_IntConst
137
Some tokens can have attributes that store extra information about the token. Here we store which integer is represented.
![Page 23: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/23.jpg)
Goals of Lexical Analysis
● Convert from a physical description of a program into a sequence of tokens.
  ● Each token represents one logical piece of the source file – a keyword, the name of a variable, etc.
● Each token is associated with a lexeme.
  ● The actual text of the token: “137,” “int,” etc.
● Each token may have optional attributes.
  ● Extra information derived from the text – perhaps a numeric value.
● The token sequence will be used in the parser to recover the program structure.
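The token, lexeme, and attribute vocabulary can be made concrete with a small hand-written scanner. This is only a sketch under assumptions: the `Token` class, the `scan` function, the `T_Identifier` name, and the tiny set of recognized punctuation are all made up for illustration and are not taken from the slides or from flex.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    kind: str                    # token type, e.g. "T_While"
    lexeme: str                  # exact text taken from the source
    value: Optional[int] = None  # optional attribute (here: integer value)


KEYWORDS = {"while": "T_While"}  # hypothetical keyword table


def scan(source: str) -> List[Token]:
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1  # whitespace lexemes are discarded, not stored
        elif ch.isdigit():
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            text = source[i:j]
            tokens.append(Token("T_IntConst", text, int(text)))
            i = j
        elif ch.isalpha() or ch == "_":
            j = i
            while j < len(source) and (source[j].isalnum() or source[j] == "_"):
                j += 1
            text = source[i:j]
            tokens.append(Token(KEYWORDS.get(text, "T_Identifier"), text))
            i = j
        elif source.startswith("++", i):
            tokens.append(Token("++", "++"))
            i += 2
        elif ch in "()<;":
            tokens.append(Token(ch, ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens


for token in scan("while (137 < i)\n\t++i;"):
    print(token)
```

Only the integer constant carries an attribute here; keywords and punctuation are fully described by their token type.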
![Page 24: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/24.jpg)
Choosing Tokens
![Page 25: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/25.jpg)
What Tokens are Useful Here?
for (int k = 0; k < myArray[5]; ++k) {
cout << k << endl;
}
for
int
<<
=
(
)
{
}
;
<
[
]
++
Identifier
IntegerConstant
![Page 28: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/28.jpg)
Choosing Good Tokens
● Very much dependent on the language.
● Typically:
  ● Give keywords their own tokens.
  ● Give different punctuation symbols their own tokens.
  ● Group lexemes representing identifiers, numeric constants, strings, etc. into their own groups.
  ● Discard irrelevant information (whitespace, comments).
![Page 29: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/29.jpg)
Thanks to Prof. Alex Aiken
Scanning is Hard
● FORTRAN: Whitespace is irrelevant
DO 5 I = 1,25
DO 5 I = 1.25 (scanned as DO5I = 1.25)
● Can be difficult to tell when to partition input.
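A rough sketch of why the two FORTRAN statements are hard to tell apart: once blanks are dropped, both begin with `DO5I=`, and only the later comma or period decides whether `DO` is a keyword. The `classify` helper and the `DO_LOOP` pattern are hypothetical and heavily simplified. For example, they ignore commas nested inside parentheses, which a real FORTRAN scanner would have to handle.

```python
import re

# Simplified shape of a DO-loop header after blanks are removed:
#   DO <label> <variable> = <start> , <limit>
DO_LOOP = re.compile(r"DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)")


def classify(statement: str) -> str:
    squeezed = statement.replace(" ", "")  # whitespace is irrelevant
    if DO_LOOP.fullmatch(squeezed):
        return "do-loop"
    return "assignment"


print(classify("DO 5 I = 1,25"))  # loop header: label 5, variable I
print(classify("DO 5 I = 1.25"))  # assignment to the variable DO5I
```

The scanner cannot decide how to partition `DO5I` until it has looked ahead past the `=` sign.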
![Page 32: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/32.jpg)
Scanning is Hard
● C++: Nested template declarations
vector<vector<int>> myVector
vector < vector < int >> myVector
(vector < (vector < (int >> myVector)))
● Again, can be difficult to determine where to split.
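The split shown on the slide is what a scanner produces if it always prefers the longest operator. A minimal sketch, assuming a made-up four-pattern token set: Python's `re` tries alternatives left to right, so listing `>>` before `>` imitates the longest-match choice.

```python
import re

# Hypothetical token set: ">>" is listed before ">" so it is tried first.
TOKEN = re.compile(r"\s*(>>|<|>|[A-Za-z_]\w*)")


def tokenize(text):
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot scan at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


print(tokenize("vector<vector<int>> myVector"))
print(tokenize("vector<vector<int> > myVector"))
```

With the space inserted, the scanner emits two separate `>` tokens. Without it, the two closing brackets are fused into one shift operator.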
![Page 36: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/36.jpg)
Scanning is Hard
● PL/1: Keywords can be used as identifiers.
IF THEN THEN THEN = ELSE; ELSE ELSE = IF
● Can be difficult to determine how to label lexemes.
![Page 40: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/40.jpg)
Challenges in Scanning
● How do we determine which lexemes are associated with each token?
● When there are multiple ways we could scan the input, how do we know which one to pick?
● How do we address these concerns efficiently?
![Page 41: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/41.jpg)
Associating Lexemes with Tokens
![Page 42: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/42.jpg)
Lexemes and Tokens
● Tokens give a way to categorize lexemes by what information they provide.
● Some tokens might be associated with only a single lexeme:
  ● Tokens for keywords like if and while probably only match those lexemes exactly.
● Some tokens might be associated with lots of different lexemes:
  ● All variable names, all possible numbers, all possible strings, etc.
![Page 43: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/43.jpg)
Sets of Lexemes
● Idea: Associate a set of lexemes with each token.
● We might associate the “number” token with the set { 0, 1, 2, …, 10, 11, 12, … }
● We might associate the “string” token with the set { "", "a", "b", "c", … }
● We might associate the token for the keyword while with the set { while }.
![Page 44: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/44.jpg)
How do we describe which (potentially infinite) set of lexemes is associated with each token type?
![Page 45: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/45.jpg)
Formal Languages
● A formal language is a set of strings.
● Many infinite languages have finite descriptions:
  ● Define the language using an automaton.
  ● Define the language using a grammar.
  ● Define the language using a regular expression.
● We can use these compact descriptions of the language to define sets of strings.
● Over the course of this class, we will use all of these approaches.
![Page 46: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/46.jpg)
Regular Expressions
● Regular expressions are a family of descriptions that can be used to capture certain languages (the regular languages).
● Often provide a compact and human-readable description of the language.
● Used as the basis for numerous software systems, including the flex tool we will use in this course.
![Page 47: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/47.jpg)
Atomic Regular Expressions
● The regular expressions we will use in this course begin with two simple building blocks.
● The symbol ε is a regular expression that matches the empty string.
● For any symbol a, the symbol a is a regular expression that just matches a.
![Page 48: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/48.jpg)
Compound Regular Expressions
● If R1 and R2 are regular expressions, R1R2 is a regular expression representing the concatenation of the languages of R1 and R2.
● If R1 and R2 are regular expressions, R1 | R2 is a regular expression representing the union of R1 and R2.
● If R is a regular expression, R* is a regular expression for the Kleene closure of R.
● If R is a regular expression, (R) is a regular expression with the same meaning as R.
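One way to see these definitions at work is to compute, for a given expression, every string in its language up to some length bound. The sketch below is an illustration only. The nested-tuple encoding and the `language` function are assumptions made for this example, and the length bound exists solely to keep the Kleene closure finite.

```python
# Regular expressions as nested tuples (a hypothetical encoding):
#   ("eps",)   ("sym", "a")   ("cat", r1, r2)   ("alt", r1, r2)   ("star", r)


def language(r, max_len):
    """Return every string of length <= max_len in the language of r."""
    tag = r[0]
    if tag == "eps":
        return {""}
    if tag == "sym":
        return {r[1]} if max_len >= 1 else set()
    if tag == "alt":  # union of the two languages
        return language(r[1], max_len) | language(r[2], max_len)
    if tag == "cat":  # every left string followed by every right string
        left, right = language(r[1], max_len), language(r[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if tag == "star":  # zero or more repetitions
        base = language(r[1], max_len) - {""}
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {tag!r}")


zero_or_one = ("alt", ("sym", "0"), ("sym", "1"))
print(sorted(language(("star", zero_or_one), 2)))
```

Parentheses need no node of their own in this encoding, since the tuple nesting already records the grouping.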
![Page 49: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/49.jpg)
Operator Precedence
● Regular expression operator precedence is
(R)
R*
R1R2
R1 | R2
● So ab*c|d is parsed as ((a(b*))c)|d
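Python's `re` module uses the same relative precedence for these three operators, so the claimed parse can be spot-checked there. The sample strings are arbitrary, and agreement on a handful of samples is evidence rather than proof.

```python
import re

implicit = re.compile(r"ab*c|d")
explicit = re.compile(r"((a(b*))c)|d")  # fully parenthesized form from the slide
wrong = re.compile(r"ab*(c|d)")         # a different grouping, for contrast

samples = ["ac", "abbc", "d", "abd", "ad", ""]


def accepted(pattern, s):
    return pattern.fullmatch(s) is not None


same = all(accepted(implicit, s) == accepted(explicit, s) for s in samples)
differs = [s for s in samples if accepted(implicit, s) != accepted(wrong, s)]
print(same, differs)
```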
![Page 50: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/50.jpg)
Simple Regular Expressions
● Suppose the only characters are 0 and 1.
● Here is a regular expression for strings containing 00 as a substring:
(0 | 1)*00(0 | 1)*
11011100101
0000
11111011110011111
![Page 54: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/54.jpg)
Simple Regular Expressions
● Suppose the only characters are 0 and 1.
● Here is a regular expression for strings of length exactly four:
(0|1)(0|1)(0|1)(0|1)
(0|1){4}
0000
1010
1111
1000
![Page 61: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/61.jpg)
Simple Regular Expressions
● Suppose the only characters are 0 and 1.
● Here is a regular expression for strings that contain at most one zero:
1*(0 | ε)1*
1*0?1*
11110111
111111
0111
0
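The three expressions over the alphabet {0, 1} can be tried out directly with Python's `re` module, using `fullmatch` so that the whole string must belong to the language. The constant names and the `matches` helper are made up for this sketch, and ε is written as the optional operator `?`.

```python
import re

CONTAINS_00 = re.compile(r"(0|1)*00(0|1)*")
LENGTH_FOUR = re.compile(r"(0|1){4}")
AT_MOST_ONE_ZERO = re.compile(r"1*0?1*")


def matches(pattern, s):
    # fullmatch: the entire string must be in the language
    return pattern.fullmatch(s) is not None


for s in ["0000", "1010", "0111", "111111"]:
    print(s,
          matches(CONTAINS_00, s),
          matches(LENGTH_FOUR, s),
          matches(AT_MOST_ONE_ZERO, s))
```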
![Page 67: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/67.jpg)
Applied Regular Expressions
● Suppose our alphabet is a, @, and ., where a represents “some letter.”
● A regular expression for email addresses is
aa* (.aa*)* @ aa*.aa* (.aa*)*
[email protected]@whitehouse.gov
a+ (.aa*)* @ aa*.aa* (.aa*)*
a+ (.a+)* @ a+.a+ (.a+)*
a+ (.a+)* @ a+ (.a+)+
a+(.a+)*@a+(.a+)+
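As a hedged illustration, the final expression translates to Python's `re` syntax once "some letter" is spelled out as a character class and the literal dot is escaped. The `LETTER` class and the sample addresses are assumptions for this sketch, and real email addresses allow far more than this toy alphabet.

```python
import re

LETTER = "[A-Za-z]"  # stands in for the slide's symbol a
EMAIL = re.compile(rf"{LETTER}+(\.{LETTER}+)*@{LETTER}+(\.{LETTER}+)+")


def is_email(s):
    return EMAIL.fullmatch(s) is not None


# Hypothetical sample inputs
for s in ["first.last@example.org", "user@localhost", "a..b@example.org"]:
    print(s, is_email(s))
```

The trailing `(.a+)+` is what forces at least one dot after the `@`, so `user@localhost` is rejected.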
![Page 77: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/77.jpg)
Applied Regular Expressions
● Suppose that our alphabet is all ASCII characters.
● A regular expression for even numbers is
(+|-)?(0|1|2|3|4|5|6|7|8|9)*(0|2|4|6|8)
(+|-)?[0123456789]*[02468]
(+|-)?[0-9]*[02468]
42
+1370
-3248
-9999912
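The even-number expression can be checked the same way. One detail differs in Python's `re` syntax: a literal `+` has to be escaped, because an unescaped `+` is the repetition operator. The `is_even_literal` helper name is made up for this sketch.

```python
import re

EVEN = re.compile(r"(\+|-)?[0-9]*[02468]")


def is_even_literal(s):
    return EVEN.fullmatch(s) is not None


for s in ["42", "+1370", "-3248", "-9999912", "137", "+"]:
    print(s, is_even_literal(s))
```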
![Page 82: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/82.jpg)
Matching Regular Expressions
![Page 83: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/83.jpg)
Implementing Regular Expressions
● Regular expressions can be implemented using finite automata.
● There are two main kinds of finite automata:
  ● NFAs (nondeterministic finite automata), which we'll see in a second, and
  ● DFAs (deterministic finite automata), which we'll see later.
● Automata are best explained by example...
![Page 84: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/84.jpg)
A Simple Automaton
[Figure: an automaton with a "start" arrow and transitions labeled ", A,B,C,...,Z, and ".]
Each circle is a state of the automaton. The automaton's configuration is determined by what state(s) it is in.
These arrows are called transitions. The automaton changes which state(s) it is in by following transitions.
![Page 87: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/87.jpg)
A Simple Automaton
Input: "HEYA"
The automaton takes a string as input and decides whether to accept or reject the string.
The double circle indicates that this state is an accepting state. The automaton accepts the string if it ends in an accepting state.
![Page 100: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/100.jpg)
A Simple Automaton
Input: """"""
There is no transition on " here, so the automaton dies and rejects.
![Page 108: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/108.jpg)
" "start
A,B,C,...,Z
A Simple Automaton
" A B C
![Page 115: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/115.jpg)
" "start
A,B,C,...,Z
A Simple Automaton
" A B C
This is not an accepting state, so the automaton rejects.
![Page 116: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/116.jpg)
A More Complex Automaton
01 0
10 1
start0, 1
![Page 117: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/117.jpg)
A More Complex Automaton
01 0
10 1
start0, 1
Notice that there are multiple transitions defined here on 0 and 1.
If we read a 0 or 1 here, we follow both transitions and enter multiple states.
![Page 119: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/119.jpg)
0 1 1 1 0 1
A More Complex Automaton
01 0
10 1
start0, 1
![Page 133: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/133.jpg)
A More Complex Automaton
01 0
10 1
start0, 1
0 1 1 1 0 1
Since we are in at least one accepting state, the automaton accepts.
![Page 134: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/134.jpg)
An Even More Complex Automaton
a, c
b, c
start
ε
ε
ε
a, b
c
b
a
![Page 135: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/135.jpg)
An Even More Complex Automaton
a, c
b, c
start
ε
ε
ε
a, b
c
b
a
These are called ε-transitions. These transitions are followed automatically and without consuming any input.
![Page 136: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/136.jpg)
b c b a
An Even More Complex Automaton
a, c
b, c
start
ε
ε
ε
a, b
c
b
a
![Page 138: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/138.jpg)
An Even More Complex Automaton
a, c
b c b a
b, c
start
ε
ε
ε
a, b
c
b
a
![Page 143: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/143.jpg)
b c b a
An Even More Complex Automaton
a, c
b, c
start
ε
ε
ε
a, b
c
b
a
![Page 147: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/147.jpg)
Simulating an NFA
● Keep track of a set of states, initially the start state and everything reachable by ε-moves.
● For each character in the input:
    ● Maintain a set of next states, initially empty.
    ● For each current state:
        – Follow all transitions labeled with the current letter.
        – Add these states to the set of next states.
    ● Add every state reachable by an ε-move to the set of next states.
● Complexity: O(mn²) for strings of length m and automata with n states.
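The simulation steps above can be sketched in Python. This is a minimal sketch rather than the course's own implementation: the dictionary encoding of transitions (`delta`, `eps`), the function names, and the example automaton `ENDS_IN_01` are all assumptions made for illustration.

```python
def epsilon_closure(states, eps):
    """Return `states` plus every state reachable from them by ε-moves alone."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in eps.get(state, ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def nfa_accepts(text, start, accepting, delta, eps):
    """Simulate an NFA by tracking the set of states it could be in.

    delta maps (state, character) to a set of target states.
    eps maps a state to the set of states reachable by one ε-move.
    """
    current = epsilon_closure({start}, eps)
    for ch in text:
        next_states = set()              # set of next states, initially empty
        for state in current:
            next_states |= delta.get((state, ch), set())
        current = epsilon_closure(next_states, eps)
    # Accept if at least one of the current states is accepting.
    return bool(current & accepting)


# Hypothetical example (not the automaton drawn on the slides):
# accepts binary strings that end in "01".
ENDS_IN_01 = {
    "start": 0,
    "accepting": {2},
    "delta": {(0, "0"): {0, 1}, (0, "1"): {0}, (1, "1"): {2}},
    "eps": {},
}

print(nfa_accepts("011101", **ENDS_IN_01))
print(nfa_accepts("0110", **ENDS_IN_01))
```

Each input character touches every current state once, and each state may have up to n outgoing targets, which is where the n² factor per character comes from.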
![Page 148: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/148.jpg)
From Regular Expressions to NFAs
● There is a (beautiful!) procedure for converting a regular expression to an NFA.
● Associate each regular expression with an NFA with the following properties:
    ● There is exactly one accepting state.
    ● There are no transitions out of the accepting state.
    ● There are no transitions into the starting state.
● These restrictions are stronger than necessary, but make the construction easier.
start
![Page 149: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/149.jpg)
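Base Cases
[Figure: a start state with an ε-transition to the accepting state.]
Automaton for ε
[Figure: a start state with a transition on a to the accepting state.]
Automaton for single character a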
![Page 150: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/150.jpg)
Construction for R1R2
[Figure: the automaton for R1 followed by the automaton for R2. An ε-transition runs from the accepting state of R1 to the start state of R2; the combined automaton starts at the start state of R1 and accepts at the accepting state of R2.]
![Page 155: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/155.jpg)
Construction for R1 | R2
[Figure: a new start state with ε-transitions to the start states of R1 and R2, and ε-transitions from the accepting states of R1 and R2 to a new accepting state.]
![Page 162: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/162.jpg)
Construction for R*
[Figure: the automaton for R wrapped with a new start state and a new accepting state, joined by four ε-transitions: one into R, one out of R, one that skips R entirely, and one that loops back so R can repeat.]
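The base cases and the three constructions can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `Fragment` class, the function names, and the edge-list encoding (a label of `None` stands for ε) are illustrative choices, and the exact placement of the ε-edges in `star` is one arrangement that keeps the three properties listed earlier (one accepting state, nothing leaving it, nothing entering the start state).

```python
import itertools

_fresh = itertools.count()  # source of unused state numbers


class Fragment:
    """An NFA piece with exactly one start state and one accepting state."""

    def __init__(self, start, accept, edges):
        self.start = start
        self.accept = accept
        self.edges = edges  # list of (source, label, target); label None means ε


def symbol(ch=None):
    """Base case: one edge labelled ch (ch=None gives the automaton for ε)."""
    start, accept = next(_fresh), next(_fresh)
    return Fragment(start, accept, [(start, ch, accept)])


def concat(r1, r2):
    """R1R2: link R1's accepting state to R2's start state with an ε-edge."""
    link = [(r1.accept, None, r2.start)]
    return Fragment(r1.start, r2.accept, r1.edges + r2.edges + link)


def union(r1, r2):
    """R1 | R2: new start and accepting states, wired in with four ε-edges."""
    start, accept = next(_fresh), next(_fresh)
    extra = [(start, None, r1.start), (start, None, r2.start),
             (r1.accept, None, accept), (r2.accept, None, accept)]
    return Fragment(start, accept, r1.edges + r2.edges + extra)


def star(r):
    """R*: enter R, leave R, skip R entirely, or loop back to repeat R."""
    start, accept = next(_fresh), next(_fresh)
    extra = [(start, None, r.start), (r.accept, None, accept),
             (start, None, accept), (r.accept, None, r.start)]
    return Fragment(start, accept, r.edges + extra)


def accepts(frag, text):
    """Set-of-states simulation, used here only to check the construction."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for src, label, dst in frag.edges:
                if src == q and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({frag.start})
    for ch in text:
        moved = {dst for src, label, dst in frag.edges
                 if src in current and label == ch}
        current = closure(moved)
    return frag.accept in current


# Build the NFA for (a|b)*c; each fragment is used exactly once.
nfa = concat(star(union(symbol("a"), symbol("b"))), symbol("c"))
print(accepts(nfa, "abac"), accepts(nfa, "ab"))
```

Every operator adds at most two states, so the number of states grows linearly with the size of the expression.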
![Page 169: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/169.jpg)
Overall Result
● Any regular expression of length n can be converted into an NFA with O(n) states.
● We can determine whether a string of length m matches a regular expression of length n in time O(mn²).
● We'll see how to make this O(m) later (this is independent of the complexity of the regular expression!).
![Page 170: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/170.jpg)
Challenges in Scanning
● How do we determine which lexemes are associated with each token?
● When there are multiple ways we could scan the input, how do we know which one to pick?
● How do we address these concerns efficiently?
![Page 172: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/172.jpg)
Lexing Ambiguities
T_For for
T_Identifier [A-Za-z_][A-Za-z0-9_]*
f o r t
[Figure: the input f o r t divided into tokens in several different ways.]
![Page 175: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/175.jpg)
Conflict Resolution
● Assume all tokens are specified as regular expressions.
● Algorithm: Left-to-right scan.
● Tiebreaking rule one: Maximal munch.
    ● Always match the longest possible prefix of the remaining text.
![Page 176: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/176.jpg)
Lexing Ambiguities
T_For for
T_Identifier [A-Za-z_][A-Za-z0-9_]*
f o r t
[Figure: under maximal munch, f o r t is scanned as one T_Identifier token, the longest possible match.]
![Page 178: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/178.jpg)
Implementing Maximal Munch
● Given a set of regular expressions, how can we use them to implement maximal munch?
● Idea:
    ● Convert expressions to NFAs.
    ● Run all NFAs in parallel, keeping track of the last match.
    ● When all automata get stuck, report the last match and restart the search at that point.
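The idea above can be sketched in Python. This is a minimal sketch under assumptions: each rule is hand-coded as a tiny deterministic automaton (a transition dictionary plus a set of accepting states) instead of being built from a regular expression, the rule set `do`, `double`, `[A-Za-z]` and the helper names are illustrative, and when two rules accept at the same length the first one listed is kept, which is an arbitrary choice here.

```python
import string


def literal(word):
    """Automaton accepting exactly `word`; states are 0..len(word)."""
    transitions = {(i, ch): i + 1 for i, ch in enumerate(word)}
    return transitions, {len(word)}


def single_letter():
    """Automaton for [A-Za-z]: one step from state 0 to accepting state 1."""
    transitions = {(0, ch): 1 for ch in string.ascii_letters}
    return transitions, {1}


RULES = [
    ("T_Do", literal("do")),
    ("T_Double", literal("double")),
    ("T_Mystery", single_letter()),
]


def scan(text):
    """Split `text` into (token, lexeme) pairs using maximal munch."""
    tokens, pos = [], 0
    while pos < len(text):
        states = [0] * len(RULES)  # current state per automaton; None = stuck
        last = None                # (end position, token) of the last match
        i = pos
        # Feed characters to every automaton until all of them are stuck.
        while i < len(text) and any(s is not None for s in states):
            ch = text[i]
            i += 1
            for k, (_, (transitions, _acc)) in enumerate(RULES):
                if states[k] is not None:
                    states[k] = transitions.get((states[k], ch))
            for k, (name, (_, accepting)) in enumerate(RULES):
                if states[k] in accepting:
                    last = (i, name)  # a longer match replaces a shorter one
                    break
        if last is None:
            raise ValueError(f"no rule matches at position {pos}")
        end, name = last
        tokens.append((name, text[pos:end]))
        pos = end                  # restart the search after the last match
    return tokens


print(scan("doubdouble"))
```

On the input `doubdouble` the `double` automaton runs past the `do` match but gets stuck at the second `d`, so the scanner falls back to the last recorded match and restarts from there.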
![Page 179: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/179.jpg)
Implementing Maximal Munch
T_Do do
T_Double double
T_Mystery [A-Za-z]
[Figure: one automaton per rule. T_Do: start, d, o. T_Double: start, d, o, u, b, l, e. T_Mystery: start, then any single letter (Σ).]
![Page 181: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/181.jpg)
Implementing Maximal Munch
D O U B D O U B L E
[Figure: animation of the three automata reading from the start of the input. T_Mystery matches D, then T_Do matches D O; T_Double keeps running through U and B but gets stuck on the next D. The last match is T_Do.]
![Page 195: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/195.jpg)
Implementing Maximal Munch
D O | U B D O U B L E
[Figure: D O has been emitted as T_Do and the scan restarts at U. Only T_Mystery matches U.]
![Page 202: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/202.jpg)
Implementing Maximal Munch
D O U | B D O U B L E
[Figure: U has been emitted as T_Mystery and the scan restarts at B. Only T_Mystery matches B.]
![Page 208: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/208.jpg)
Implementing Maximal Munch
D O U B | D O U B L E
[Figure: B has been emitted as T_Mystery and the scan restarts at D. T_Mystery matches D, T_Do matches D O, and T_Double matches D O U B L E, so the last match is T_Double.]
![Page 226: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/226.jpg)
A Minor Simplification
[Figure: the three automata for do, double, and Σ, each with its own start state.]
![Page 229: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/229.jpg)
A Minor Simplification
[Figure: a new start state with an ε-transition into each of the automata for do, double, and Σ.]
Build a single automaton that runs all the matching automata in parallel.
![Page 231: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/231.jpg)
A Minor Simplification
d o
d o u b l e
Σ
ε
ε
εstart
![Page 232: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/232.jpg)
A Minor Simplification
d o
d o u b l e
Σ
ε
ε
εstart
Annotate each accepting
st at e wi t h which aut omat on
i t came f r o m .
![Page 233: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/233.jpg)
Other Conflicts
T_Do do
T_Double double
T_Identifier [A-Za-z_][A-Za-z0-9_]*
![Page 234: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/234.jpg)
Other Conflicts
T_Do do
T_Double double
T_Identifier [A-Za-z_][A-Za-z0-9_]*
d o u b l e
![Page 235: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/235.jpg)
Other Conflicts
T_Do do
T_Double double
T_Identifier [A-Za-z_][A-Za-z0-9_]*
d o u b l e
d o u b l e
d o u b l e
![Page 236: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/236.jpg)
More Tiebreaking
● When two regular expressions apply, choose the one with the greater “priority.”
● Simple priority system: pick the rule that was defined first.
![Page 238: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/238.jpg)
Other Conflicts
T_Do do
T_Double double
T_Identifier [A-Za-z_][A-Za-z0-9_]*
d o u b l e
d o u b l e
![Page 239: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/239.jpg)
Other Conflicts
T_Do do
T_Double double
T_Identifier [A-Za-z_][A-Za-z0-9_]*
d o u b l e
d o u b l e
Why isn't this a problem?
![Page 240: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/240.jpg)
One Last Detail...
● We know what to do if multiple rules match.
● What if nothing matches?
● Trick: Add a “catch-all” rule that matches any character and reports an error.
![Page 241: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/241.jpg)
Summary of Conflict Resolution
● Construct an automaton for each regular expression.
● Merge them into one automaton by adding a new start state.
● Scan the input, keeping track of the last known match.
● Break ties by choosing higher-precedence matches.
● Have a catch-all rule to handle errors.
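The conflict-resolution rules above can be sketched in a few lines of C++. This is only an illustration, not the lecture's implementation: instead of real automata, each rule is a small hand-written matcher that reports how long a prefix it accepts. The names `Rule`, `Token`, `keyword`, `matchIdentifier`, `exampleRules`, `scan`, and the token name `T_Error` are all assumptions made for this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// A rule pairs a token name with a matcher. The matcher returns the length of
// the longest prefix of text[pos..] that the rule accepts (0 means no match).
// Hypothetical stand-in for one automaton per regular expression.
struct Rule {
    std::string token;
    std::function<std::size_t(const std::string&, std::size_t)> match;
};

struct Token {
    std::string type;
    std::string lexeme;
};

// Rule for a fixed keyword such as "do" or "double".
Rule keyword(const std::string& token, const std::string& word) {
    return {token, [word](const std::string& text, std::size_t pos) -> std::size_t {
        return text.compare(pos, word.size(), word) == 0 ? word.size() : 0;
    }};
}

// Matcher for [A-Za-z_][A-Za-z0-9_]*.
std::size_t matchIdentifier(const std::string& text, std::size_t pos) {
    auto isStart = [](unsigned char c) { return std::isalpha(c) || c == '_'; };
    auto isRest = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
    if (pos >= text.size() || !isStart(text[pos])) return 0;
    std::size_t end = pos + 1;
    while (end < text.size() && isRest(text[end])) ++end;
    return end - pos;
}

// Rules in priority order: earlier rules win ties. The last rule is the
// catch-all, which consumes one character and reports it as an error.
std::vector<Rule> exampleRules() {
    return {
        keyword("T_Do", "do"),
        keyword("T_Double", "double"),
        {"T_Identifier", matchIdentifier},
        {"T_Error", [](const std::string& text, std::size_t pos) -> std::size_t {
            return pos < text.size() ? 1 : 0;
        }},
    };
}

// Maximal munch: at each position take the longest match; on equal lengths
// the strict '>' below keeps the rule that was defined first.
std::vector<Token> scan(const std::string& text, const std::vector<Rule>& rules) {
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < text.size()) {
        const Rule* best = nullptr;
        std::size_t bestLen = 0;
        for (const Rule& rule : rules) {
            std::size_t len = rule.match(text, pos);
            if (len > bestLen) {
                bestLen = len;
                best = &rule;
            }
        }
        assert(best != nullptr);  // the catch-all rule guarantees progress
        tokens.push_back({best->token, text.substr(pos, bestLen)});
        pos += bestLen;
    }
    return tokens;
}
```

Under these assumptions, `scan("double", exampleRules())` yields a single T_Double token because the keyword rule is listed before the identifier rule, while `scan("doublet", exampleRules())` yields one T_Identifier because the identifier match is longer.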
![Page 242: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/242.jpg)
Challenges in Scanning
● How do we determine which lexemes are associated with each token?
● When there are multiple ways we could scan the input, how do we know which one to pick?
● How do we address these concerns efficiently?
![Page 244: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/244.jpg)
DFAs
● The automata we've seen so far have all been NFAs.
● A DFA is like an NFA, but with tighter restrictions:
  ● Every state must have exactly one transition defined for every letter.
  ● ε-moves are not allowed.
![Page 245: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/245.jpg)
A Sample DFA
![Page 248: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/248.jpg)
A Sample DFA
(Figure: a four-state DFA with states A, B, C, and D over the alphabet {0, 1}; its transitions are tabulated below.)
State   0   1
A       C   B
B       D   A
C       A   D
D       B   C
![Page 249: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/249.jpg)
Code for DFAs
// kTransitionTable[state][symbol] gives the next state.
int kTransitionTable[kNumStates][kNumSymbols] = {
    {0, 0, 1, 3, 7, 1, …},
    …
};
// kAcceptTable[state] is true if the state is accepting.
bool kAcceptTable[kNumStates] = {
    false, true, true, …
};
bool simulateDFA(string input) {
    int state = 0;
    for (char ch: input)
        state = kTransitionTable[state][ch];
    return kAcceptTable[state];
}
![Page 250: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/250.jpg)
Code for DFAs
simulateDFA runs in time O(m) on a string of length m.
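As a concrete instance of the table-driven approach, the sketch below fills the tables in for the four-state sample DFA shown earlier. The transition table comes from that slide; the choice of A as the start state and as the only accepting state is an assumption made here for illustration, as is the function name `simulateSampleDFA`.

```cpp
#include <cassert>
#include <string>

// States of the sample DFA; kNumStates counts them.
enum State { A, B, C, D, kNumStates };
constexpr int kNumSymbols = 2;  // the alphabet is {0, 1}

// Rows are states, columns are the symbols '0' and '1'.
const State kTransitionTable[kNumStates][kNumSymbols] = {
    {C, B},  // A
    {D, A},  // B
    {A, D},  // C
    {B, C},  // D
};

// Assumption for this sketch: only A accepts.
const bool kAcceptTable[kNumStates] = {true, false, false, false};

bool simulateSampleDFA(const std::string& input) {
    State state = A;  // assumed start state
    for (char ch : input) {
        assert(ch == '0' || ch == '1');  // input must stay within the alphabet
        state = kTransitionTable[state][ch - '0'];
    }
    return kAcceptTable[state];
}
```

With A as both start and accepting state, this DFA accepts exactly the strings containing an even number of 0s and an even number of 1s: each 0 swaps A with C and B with D, and each 1 swaps A with B and C with D. The loop does one table lookup per character, which is where the O(m) bound comes from.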
![Page 251: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/251.jpg)
Speeding up Matching
● In the worst case, an NFA with n states takes time O(mn²) to match a string of length m.
● DFAs, on the other hand, take only O(m).
● There is another (beautiful!) algorithm to convert NFAs to DFAs.
Lexical Specification → Regular Expressions → NFA → DFA → Table-Driven DFA
![Page 252: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/252.jpg)
Subset Construction
● NFAs can be in many states at once, while DFAs can only be in a single state at a time.
● Key idea: Make the DFA simulate the NFA.
● Have the states of the DFA correspond to the sets of states of the NFA.
● Transitions between states of the DFA correspond to transitions between sets of states in the NFA.
![Page 253: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/253.jpg)
From NFA to DFA
![Page 254: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/254.jpg)
From NFA to DFA
(Figure: the combined NFA, states 0 through 12.)
start → 0, with ε-moves from 0 to 1, 4, and 11
1 –d→ 2 –o→ 3
4 –d→ 5 –o→ 6 –u→ 7 –b→ 8 –l→ 9 –e→ 10
11 –Σ→ 12
![Page 270: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/270.jpg)
From NFA to DFA
(Figure: the DFA built step by step from the NFA above; each DFA state is a set of NFA states.)
start → {0, 1, 4, 11}
{0, 1, 4, 11} –d→ {2, 5, 12}
{0, 1, 4, 11} –(Σ – d)→ {12}
{2, 5, 12} –o→ {3, 6}
{3, 6} –u→ {7} –b→ {8} –l→ {9} –e→ {10}
Every other transition (Σ – o, Σ – u, Σ – b, Σ – l, Σ – e, and Σ out of {10} and {12}) leads to the state for the empty set, which loops to itself on Σ.
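The construction traced in these slides can be sketched as a worklist algorithm. This is a minimal illustration under stated assumptions, not the lecture's code: the `NFA` struct, the helper names `epsilonClosure`, `step`, `subsetConstruction`, and `slideNfa`, and the use of an explicit alphabet string (where a character such as 'x' stands in for "any other symbol") are all choices made here.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using StateSet = std::set<int>;

// Hypothetical NFA representation for this sketch.
struct NFA {
    std::multimap<int, int> epsilon;                  // state -> state on ε
    std::multimap<std::pair<int, char>, int> onChar;  // (state, ch) -> state
    std::multimap<int, int> onAny;                    // state -> state on any symbol (Σ)
};

// All states reachable from `states` using only ε-moves.
StateSet epsilonClosure(const NFA& nfa, StateSet states) {
    std::vector<int> work(states.begin(), states.end());
    while (!work.empty()) {
        int s = work.back();
        work.pop_back();
        auto range = nfa.epsilon.equal_range(s);
        for (auto it = range.first; it != range.second; ++it)
            if (states.insert(it->second).second) work.push_back(it->second);
    }
    return states;
}

// The set of NFA states reached from `from` by reading `ch`.
StateSet step(const NFA& nfa, const StateSet& from, char ch) {
    StateSet next;
    for (int s : from) {
        auto labelled = nfa.onChar.equal_range(std::make_pair(s, ch));
        for (auto it = labelled.first; it != labelled.second; ++it) next.insert(it->second);
        auto any = nfa.onAny.equal_range(s);
        for (auto it = any.first; it != any.second; ++it) next.insert(it->second);
    }
    return epsilonClosure(nfa, next);
}

// DFA as a map: set of NFA states -> (symbol -> set of NFA states).
using DFA = std::map<StateSet, std::map<char, StateSet>>;

DFA subsetConstruction(const NFA& nfa, int start, const std::string& alphabet) {
    assert(!alphabet.empty());
    DFA dfa;
    std::vector<StateSet> work{epsilonClosure(nfa, {start})};
    while (!work.empty()) {
        StateSet current = work.back();
        work.pop_back();
        if (dfa.count(current)) continue;  // this DFA state is already expanded
        auto& row = dfa[current];
        for (char ch : alphabet) {
            row[ch] = step(nfa, current, ch);
            work.push_back(row[ch]);
        }
    }
    return dfa;
}

// The NFA from the slides: do, double, and a single Σ transition.
NFA slideNfa() {
    NFA nfa;
    nfa.epsilon = {{0, 1}, {0, 4}, {0, 11}};
    nfa.onChar = {{{1, 'd'}, 2}, {{2, 'o'}, 3}, {{4, 'd'}, 5}, {{5, 'o'}, 6},
                  {{6, 'u'}, 7}, {{7, 'b'}, 8}, {{8, 'l'}, 9}, {{9, 'e'}, 10}};
    nfa.onAny = {{11, 12}};
    return nfa;
}
```

Calling `subsetConstruction(slideNfa(), 0, "doublex")` produces a map whose keys include {0, 1, 4, 11}, {2, 5, 12}, {3, 6}, and the empty set, matching the DFA states in the slides. Only reachable subsets are generated, which is why the result is usually far smaller than the exponential worst case.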
![Page 271: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/271.jpg)
Modified Subset Construction
● Instead of marking whether a state is accepting, remember which token type it matches.
● Break ties with priorities.
● When using the DFA as a scanner, consider the DFA “stuck” if it enters the state corresponding to the empty set.
![Page 272: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/272.jpg)
Performance Concerns
● The NFA-to-DFA construction can introduce exponentially many states.
● Time/memory tradeoff:
  ● Low-memory NFA has higher scan time.
  ● High-memory DFA has lower scan time.
● Could use a hybrid approach by simplifying the NFA before generating code.
![Page 273: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/273.jpg)
Real-World Scanning: Python
![Page 274: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/274.jpg)
while (ip < z)
    ++ip;
w h i l e ( i p < z ) \n \t + + i p ;
T_While ( T_Ident(ip) < T_Ident(z) ) ++ T_Ident(ip)
(Figure: parse tree with root While, whose children are < (over Ident ip and Ident z) and ++ (over Ident ip).)
![Page 275: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/275.jpg)
Python Blocks
● Scoping handled by whitespace:
if w == z:
    a = b
    c = d
else:
    e = f
g = h
● What does that mean for the scanner?
![Page 276: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/276.jpg)
Whitespace Tokens
● Special tokens inserted to indicate changes in levels of indentation.
● NEWLINE marks the end of a line.
● INDENT indicates an increase in indentation.
● DEDENT indicates a decrease in indentation.
● Note that INDENT and DEDENT encode change in indentation, not the total amount of indentation.
![Page 277: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/277.jpg)
Scanning Python
if w == z:
    a = b
    c = d
else:
    e = f
g = h
![Page 278: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/278.jpg)
Scanning Python
if w == z:
    a = b
    c = d
else:
    e = f
g = h
Token stream:
if ident(w) == ident(z) : NEWLINE
INDENT ident(a) = ident(b) NEWLINE
ident(c) = ident(d) NEWLINE
DEDENT else : NEWLINE
INDENT ident(e) = ident(f) NEWLINE
DEDENT ident(g) = ident(h) NEWLINE
![Page 279: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/279.jpg)
Scanning Python
if w == z: {
    a = b;
    c = d;
} else {
    e = f;
}
g = h;
Token stream:
if ident(w) == ident(z) : NEWLINE
INDENT ident(a) = ident(b) NEWLINE
ident(c) = ident(d) NEWLINE
DEDENT else : NEWLINE
INDENT ident(e) = ident(f) NEWLINE
DEDENT ident(g) = ident(h) NEWLINE
![Page 280: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/280.jpg)
Scanning Python
if w == z: {
    a = b;
    c = d;
} else {
    e = f;
}
g = h;
Token stream, with INDENT shown as {, DEDENT as }, and NEWLINE as ;
if ident(w) == ident(z) :
{ ident(a) = ident(b) ;
ident(c) = ident(d) ;
} else :
{ ident(e) = ident(f) ;
} ident(g) = ident(h) ;
![Page 281: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/281.jpg)
Where to INDENT/DEDENT?
● Scanner maintains a stack of line indentations keeping track of all indented contexts so far.
● Initially, this stack contains 0, since initially the contents of the file aren't indented.
● On a newline:
  ● See how much whitespace is at the start of the line.
  ● If this value exceeds the top of the stack:
    – Push the value onto the stack.
    – Emit an INDENT token.
  ● Otherwise, while the value is less than the top of the stack:
    – Pop the stack.
    – Emit a DEDENT token.
Source: http://docs.python.org/reference/lexical_analysis.html
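The stack procedure above can be sketched as follows. This is a simplified illustration, not Python's actual tokenizer: it counts only leading spaces, skips blank lines, wraps each line's remaining text in a placeholder `LINE(...)` token instead of scanning it, and does not report inconsistent dedents as errors. The names `indentWidth` and `layoutTokens` are assumptions. The final loop, which emits a DEDENT for each level still open at end of file, follows the Python reference rather than the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Number of leading spaces on a line (tabs are ignored in this sketch).
std::size_t indentWidth(const std::string& line) {
    std::size_t n = 0;
    while (n < line.size() && line[n] == ' ') ++n;
    return n;
}

// Emits INDENT / DEDENT / NEWLINE tokens around a LINE(...) placeholder
// for each non-blank line of input.
std::vector<std::string> layoutTokens(const std::vector<std::string>& lines) {
    std::vector<std::size_t> stack{0};  // initially the file is not indented
    std::vector<std::string> out;
    for (const std::string& line : lines) {
        std::size_t width = indentWidth(line);
        if (width == line.size()) continue;  // skip blank lines
        assert(!stack.empty());              // the initial 0 is never popped
        if (width > stack.back()) {
            stack.push_back(width);
            out.push_back("INDENT");
        } else {
            while (width < stack.back()) {
                stack.pop_back();
                out.push_back("DEDENT");
            }
        }
        out.push_back("LINE(" + line.substr(width) + ")");
        out.push_back("NEWLINE");
    }
    // At end of file, close every indentation level that is still open.
    while (stack.back() > 0) {
        stack.pop_back();
        out.push_back("DEDENT");
    }
    return out;
}
```

Running `layoutTokens` on the six-line `if w == z:` example gives INDENT before `a = b`, DEDENT before `else:`, INDENT before `e = f`, and DEDENT before `g = h`, in line with the token stream shown on the earlier slides.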
![Page 282: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/282.jpg)
Interesting Observation
● Normally, more text on a line translates into more tokens.
● With DEDENT, less text on a line often means more tokens:
if cond1:
    if cond2:
        if cond3:
            if cond4:
                if cond5:
                    statement1
statement2
![Page 283: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/283.jpg)
Summary
● Lexical analysis splits input text into tokens holding a lexeme and an attribute.
● Lexemes are sets of strings often defined with regular expressions.
● Regular expressions can be converted to NFAs and from there to DFAs.
● Maximal-munch using an automaton allows for fast scanning.
● Not all tokens come directly from the source code.
![Page 284: Lexical Analysis - Philadelphia](https://reader033.vdocuments.site/reader033/viewer/2022050313/626fa37a9b5d4d741537ec7c/html5/thumbnails/284.jpg)
Next Time
Lexical Analysis
Syntax Analysis
Semantic Analysis
IR Generation
IR Optimization
Code Generation
Optimization
Source Code
Machine
Code
(Plus a little bit here)