Compiler Design Unit 2: Lexical Analysis
![Page 1: Compiler Design Unit 2: Lexical Analysis](https://reader036.vdocuments.site/reader036/viewer/2022071001/5fbe5647bfd1035189784c07/html5/thumbnails/1.jpg)
UNIT 2: LEXICAL ANALYSIS
Sadique Nayeem, Asst. Professor, Dept. of CSE
Sitamarhi Institute of Technology, Sitamarhi
Lexical Analysis
As the first phase of a compiler, the main task of the lexical analyzer is to:
1. read the input characters of the source program,
2. group them into lexemes, and
3. produce as output a sequence of tokens, one for each lexeme in the source program.
The stream of tokens is sent to the parser for syntax analysis.
It is common for the lexical analyzer to interact with the symbol table as well.
Another task of the lexical analyzer (LA) is stripping out comments and whitespace (blanks, newlines, tabs).
Another task is correlating error messages generated by the compiler with the source program, for example by counting newlines so that each error message can be given a line number.
getNextToken
Commonly, the interaction is implemented by having the parser call the lexical analyzer. The call, suggested by the getNextToken command, causes the lexical analyzer to read characters from its input until it can identify the next lexeme and produce for it the next token, which it returns to the parser.
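A minimal C sketch of this parser-driven interaction, assuming a toy language with only identifiers, integer constants and one-character operators. The names TokenName, Token, init_lexer and getNextToken are illustrative, not taken from any library. The function also strips whitespace and counts newlines, so every token carries a line number for error messages.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token names for a tiny C-like language. */
typedef enum { TOK_ID, TOK_NUM, TOK_OP, TOK_EOF } TokenName;

typedef struct {
    TokenName name;      /* abstract token name seen by the parser */
    char lexeme[32];     /* the matched characters (truncated if longer) */
    int line;            /* source line, used to correlate error messages */
} Token;

static const char *src;  /* remaining input; set by init_lexer before use */
static int line = 1;     /* current line number */

void init_lexer(const char *input) { src = input; line = 1; }

/* Called by the parser each time it needs the next token. */
Token getNextToken(void) {
    Token t = { TOK_EOF, "", 0 };
    /* Strip whitespace, counting newlines so errors can cite a line. */
    while (isspace((unsigned char)*src)) {
        if (*src == '\n') line++;
        src++;
    }
    t.line = line;
    if (*src == '\0') return t;            /* end of input */

    const char *start = src;
    if (isalpha((unsigned char)*src) || *src == '_') {
        while (isalnum((unsigned char)*src) || *src == '_') src++;
        t.name = TOK_ID;
    } else if (isdigit((unsigned char)*src)) {
        while (isdigit((unsigned char)*src)) src++;
        t.name = TOK_NUM;
    } else {
        src++;                             /* any other char: one-char operator */
        t.name = TOK_OP;
    }
    size_t n = (size_t)(src - start);
    if (n >= sizeof t.lexeme) n = sizeof t.lexeme - 1;
    memcpy(t.lexeme, start, n);
    t.lexeme[n] = '\0';
    return t;
}
```

A parser would call init_lexer once and then getNextToken() repeatedly until it receives TOK_EOF.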
Sometimes, lexical analyzers are divided into a cascade of two processes:
a) Scanning consists of the simple processes that do not require tokenization of the input, such as deletion of comments and compaction of consecutive whitespace characters into one.
b) Lexical analysis proper is the more complex portion, which produces the sequence of tokens from the output of the scanner.
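The scanning half of this cascade can be sketched in C as a single pass that copies the input while deleting /* ... */ comments and compacting whitespace. This is a simplified sketch: the name scan_clean is hypothetical, and it does not recognize string literals, so a /* inside a string would wrongly be treated as a comment.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Scanning pass (sketch): copy `in` to `out`, deleting C block comments
   and compacting each run of whitespace (or comment) into one blank.
   Leading and trailing whitespace is dropped. `out` needs room for
   strlen(in) + 1 characters; the output is never longer than the input. */
void scan_clean(const char *in, char *out) {
    size_t j = 0;
    int pending_space = 0;                  /* a separator was seen */
    while (*in) {
        if (strncmp(in, "/*", 2) == 0) {
            const char *end = strstr(in + 2, "*/");
            in = end ? end + 2 : in + strlen(in);  /* unterminated: drop rest */
            pending_space = 1;              /* a comment separates tokens */
        } else if (isspace((unsigned char)*in)) {
            pending_space = 1;
            in++;
        } else {
            if (pending_space && j > 0) out[j++] = ' ';
            pending_space = 0;
            out[j++] = *in++;
        }
    }
    out[j] = '\0';
}
```

The output of this pass is what "lexical analysis proper" would then split into tokens.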
All programs contain:
Keywords
Operators
Identifiers
Constants (numbers and strings)
Punctuation marks
Token
A token is a pair consisting of a token name and an optional attribute value:
<token name, attribute value>
The token name is an abstract symbol representing a kind of lexical unit, e.g., a particular keyword, or a sequence of input characters denoting an identifier.
The token names are the input symbols that the parser processes.
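A minimal C sketch of the <token name, attribute value> pair. The token names, the toy symbol table and the helpers install_id and format_token are all hypothetical. Here the attribute of an id is its index in the symbol table, while a token such as the assignment operator carries no attribute.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical token names; the parser only looks at these. */
typedef enum { ID, NUMBER, ASSIGN_OP, ADD_OP, SEMI } TokenName;

typedef struct {
    TokenName name;
    int attribute;     /* symbol-table index for ID, value for NUMBER, else 0 */
} Token;

#define MAXSYMS 16
static char symtab[MAXSYMS][32];   /* toy symbol table of identifier names */
static int nsyms = 0;

/* Return the symbol-table index of `name`, inserting it if new
   (-1 if the toy table is full). */
int install_id(const char *name) {
    for (int i = 0; i < nsyms; i++)
        if (strcmp(symtab[i], name) == 0) return i;   /* already present */
    if (nsyms == MAXSYMS) return -1;
    strncpy(symtab[nsyms], name, sizeof symtab[0] - 1);
    return nsyms++;
}

/* Render a token as <name, attribute> or, without an attribute, <name>. */
void format_token(Token t, char *buf, size_t size) {
    static const char *names[] = { "id", "number", "assign_op", "add_op", ";" };
    if (t.name == ID || t.name == NUMBER)
        snprintf(buf, size, "<%s, %d>", names[t.name], t.attribute);
    else
        snprintf(buf, size, "<%s>", names[t.name]);
}
```

For c = a + b; this scheme would yield <id, 0> <assign_op> <id, 1> <add_op> <id, 2> <;>, with repeated identifiers sharing one symbol-table entry.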
Pattern
A pattern is a description of the form that the lexemes of a token may take.
In the case of a keyword as a token, the pattern is just the sequence of characters that form the keyword. (Example: if)
For identifiers and some other tokens, the pattern is a more complex structure that is matched by many strings. (Example: age is one of the many strings matched by the identifier pattern.)
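The difference between the two kinds of pattern can be sketched in C. This assumes a C-like identifier pattern, a letter or underscore followed by letters, digits or underscores; the function names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Keyword pattern: matched by exactly one string, e.g. "if". */
int matches_keyword(const char *s, const char *keyword) {
    return strcmp(s, keyword) == 0;
}

/* Identifier pattern (assumed C-like): (letter|_) (letter|digit|_)*.
   Many different strings match it: age, count, x1, ... */
int matches_identifier(const char *s) {
    if (!(isalpha((unsigned char)*s) || *s == '_')) return 0;
    for (s++; *s; s++)
        if (!(isalnum((unsigned char)*s) || *s == '_')) return 0;
    return 1;
}
```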
Lexeme
A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token.

    #include <stdio.h>

    int main(void)
    {
        printf("SIT, Sitamarhi");
        return 0;
    }

    #include <stdio.h>

    int main(void)
    {
        int a = 10, b = 20, c;
        c = a + b;
        printf("%d", c);
        return 0;
    }
Examples of Tokens
GATE 2000: How many tokens are there in the following C statement?
printf("i = %d, &i = %x", i, &i);
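Such counts can be checked mechanically with a simplified C token counter. It is a sketch under stated assumptions: there are no comments or character constants, only a handful of two-character operators are recognized, a whole string literal is one token, and whitespace is not a token. The function name count_tokens is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Count the tokens in one line of C (simplified sketch). */
int count_tokens(const char *s) {
    static const char *two_char_ops[] = { "==", "!=", "<=", ">=", "&&", "||",
                                          "++", "--", "->", "+=", "-=" };
    int count = 0;
    while (*s) {
        if (isspace((unsigned char)*s)) { s++; continue; }  /* not a token */
        count++;
        if (isalpha((unsigned char)*s) || *s == '_') {       /* identifier/keyword */
            while (isalnum((unsigned char)*s) || *s == '_') s++;
        } else if (isdigit((unsigned char)*s)) {            /* constant, e.g. 10, 0x1F */
            while (isalnum((unsigned char)*s)) s++;
        } else if (*s == '"') {                             /* whole string literal */
            s++;
            while (*s && *s != '"') s += (*s == '\\' && s[1]) ? 2 : 1;
            if (*s) s++;                                    /* closing quote */
        } else {                                            /* operator/punctuation */
            size_t len = 1;
            for (size_t i = 0; i < sizeof two_char_ops / sizeof *two_char_ops; i++)
                if (strncmp(s, two_char_ops[i], 2) == 0) { len = 2; break; }
            s += len;
        }
    }
    return count;
}
```

For the GATE 2000 statement, count_tokens returns 10: printf, (, the string literal, the comma, i, the comma, &, i, ) and ;.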
Lexical Errors
These errors are mainly spelling mistakes and the accidental insertion of foreign characters that the language does not allow.
It is hard for a lexical analyzer to tell, without the aid of other components, that there is a source-code error.
For instance, if the string fi is encountered for the first time in a C program in the context:
fi ( a == 10 )
a lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser and let some other phase of the compiler (probably the parser in this case) handle an error due to transposition of the letters.
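One common way to get this behavior (an assumption here, not necessarily how any particular compiler does it) is to match keywords and identifiers with the same identifier pattern and then look the lexeme up in a table of reserved words. The keyword list below is only an illustrative subset, and token_name_for_word is a hypothetical helper.

```c
#include <string.h>

/* Illustrative subset of C keywords. */
static const char *keywords[] = { "if", "else", "while", "for", "int", "return" };

/* Given a lexeme that already matched the identifier pattern, return its
   token name: the keyword itself if it is reserved, otherwise "id".
   Nothing here can know that "fi" was meant to be "if". */
const char *token_name_for_word(const char *lexeme) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i]) == 0) return keywords[i];
    return "id";
}
```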
Suppose a situation arises in which the lexical analyzer is unable to proceed because none of the patterns for tokens matches any prefix of the remaining input.
The simplest recovery strategy is "panic mode" recovery: we delete successive characters from the remaining input until the lexical analyzer can find a well-formed token at the beginning of what input is left.
Other possible error-recovery actions are:
1. Delete one character from the remaining input.
2. Insert a missing character into the remaining input.
3. Replace a character by another character.
4. Transpose two adjacent characters.
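Panic-mode recovery can be sketched in a few lines of C. Which characters may begin a token is an assumption here (the hypothetical helper can_start_token); in a real lexer it would follow from the token patterns of the language.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical: characters that can begin some token in a toy language.
   Whitespace is accepted too, because the lexer skips it normally. */
static int can_start_token(char c) {
    return isalnum((unsigned char)c) || c == '_' || isspace((unsigned char)c)
        || (c != '\0' && strchr("+-*/=<>;,(){}\"", c) != NULL);
}

/* Panic mode: delete successive characters from the remaining input until
   a well-formed token can start. Returns the new position and stores the
   number of discarded characters in *deleted. */
const char *panic_mode_skip(const char *input, int *deleted) {
    *deleted = 0;
    while (*input && !can_start_token(*input)) {
        input++;
        (*deleted)++;
    }
    return input;
}
```

A lexer would call this only after no pattern matches, report an error mentioning the discarded characters, and then resume normal tokenization.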
Specification of Tokens
Alphabet
String
Language
Operations on Languages (∪, ·, *, +)
Kleene Closure and Positive Closure
Transition Table
ε-Closure
RE to ε-NFA
ε-NFA to NFA
NFA to DFA
Regular Expression
Transition Diagram
Finite Automata
NFA
DFA
ε-NFA
NFA to DFA
DFA Minimization
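A transition table is simply a two-dimensional array indexed by state and input symbol (or symbol class), and a DFA is simulated by repeated table lookups. A minimal C sketch, assuming the identifier pattern letter (letter | digit)* and a hypothetical three-state DFA (start, inside an identifier, dead):

```c
#include <ctype.h>

/* Character classes used as column indices of the transition table. */
enum { LETTER, DIGIT, OTHER, NCLASSES };
/* States: START, IN_ID (accepting), DEAD (no way to accept any more). */
enum { START, IN_ID, DEAD, NSTATES };

static const int transition[NSTATES][NCLASSES] = {
    /*            LETTER  DIGIT  OTHER */
    /* START */ { IN_ID,  DEAD,  DEAD },
    /* IN_ID */ { IN_ID,  IN_ID, DEAD },
    /* DEAD  */ { DEAD,   DEAD,  DEAD },
};

static int char_class(char c) {
    if (isalpha((unsigned char)c)) return LETTER;
    if (isdigit((unsigned char)c)) return DIGIT;
    return OTHER;
}

/* Run the DFA over the whole string; accept if it ends in IN_ID. */
int dfa_accepts_identifier(const char *s) {
    int state = START;
    for (; *s; s++)
        state = transition[state][char_class(*s)];
    return state == IN_ID;
}
```

Changing the language only changes the table, not the simulation loop, which is why lexer generators emit tables of this kind.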
Regular Definitions
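A regular definition gives names to regular expressions so that later definitions can use them: it is a sequence d1 → r1, d2 → r2, ..., where each ri is built from alphabet symbols and previously defined names. As an illustration, here are the standard textbook definitions for C-like identifiers and for unsigned numbers:

```latex
\begin{aligned}
\mathit{letter\_} &\to A \mid B \mid \cdots \mid Z \mid a \mid b \mid \cdots \mid z \mid \_ \\
\mathit{digit} &\to 0 \mid 1 \mid \cdots \mid 9 \\
\mathit{id} &\to \mathit{letter\_}\,(\mathit{letter\_} \mid \mathit{digit})^{*} \\
\mathit{digits} &\to \mathit{digit}\;\mathit{digit}^{*} \\
\mathit{optionalFraction} &\to .\,\mathit{digits} \mid \epsilon \\
\mathit{optionalExponent} &\to (\,E\,(+ \mid - \mid \epsilon)\,\mathit{digits}\,) \mid \epsilon \\
\mathit{number} &\to \mathit{digits}\;\mathit{optionalFraction}\;\mathit{optionalExponent}
\end{aligned}
```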
Figure: relationship between RE, ε-NFA, NFA and DFA
RE to DFA
A regular expression can be represented by its syntax tree, where the leaves correspond to operands and the interior nodes correspond to operators.
An interior node is called a cat-node, or-node, or star-node if it is labeled by the concatenation operator (dot), the union operator |, or the star operator *, respectively.
Leaves in a syntax tree are labeled by ε or by an alphabet symbol. To each leaf not labeled ε, we attach a unique integer.
We refer to this integer as the position of the leaf and also as a position of its symbol.
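A minimal C sketch of such a syntax tree with positions, assuming the augmented expression (a|b)*abb# (the usual textbook example). The names Kind, Node, leaf, node and build_example are hypothetical; error checking and freeing are omitted. Leaves are numbered 1, 2, 3, ... in the order they are created, and ε-leaves get no position.

```c
#include <stdlib.h>

/* Kinds of syntax-tree nodes: leaves and the three operator nodes. */
typedef enum { LEAF, CAT, OR, STAR } Kind;

typedef struct Node {
    Kind kind;
    char symbol;                /* LEAF only: 'a', 'b', '#', or '\0' for epsilon */
    int pos;                    /* LEAF only: position, 0 for an epsilon leaf */
    struct Node *left, *right;  /* children; a STAR node uses only left */
} Node;

static int next_pos = 0;        /* last position handed out */

/* Create a leaf; every leaf not labeled epsilon gets the next position. */
Node *leaf(char symbol) {
    Node *n = calloc(1, sizeof *n);
    n->kind = LEAF;
    n->symbol = symbol;
    n->pos = symbol ? ++next_pos : 0;
    return n;
}

/* Create an interior node: cat-node, or-node or star-node. */
Node *node(Kind kind, Node *left, Node *right) {
    Node *n = calloc(1, sizeof *n);
    n->kind = kind;
    n->left = left;
    n->right = right;
    return n;
}

/* Build (a|b)*abb#. The first two leaves are created in separate statements
   because C leaves the evaluation order of function arguments unspecified,
   and positions must follow the left-to-right order of the expression.
   Concatenation is left-associative, so the tree leans to the left. */
Node *build_example(void) {
    Node *a = leaf('a');
    Node *b = leaf('b');
    Node *t = node(STAR, node(OR, a, b), NULL);
    t = node(CAT, t, leaf('a'));
    t = node(CAT, t, leaf('b'));
    t = node(CAT, t, leaf('b'));
    return node(CAT, t, leaf('#'));
}
```

In this tree positions 1 to 6 hold a, b, a, b, b and #, and the endmarker # always receives the highest position.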
Construct the syntax tree for each of the following augmented regular expressions:
a(a|b)*#
(a|b)c*#
(a|b)(a|b)#
(a|b)*(a|b)#
Functions Computed From the Syntax Tree
To construct a DFA directly from a regular expression, we construct its syntax tree and then compute four functions: nullable, firstpos, lastpos, and followpos, defined as follows. Each definition refers to the syntax tree for a particular augmented regular expression (r)#.
1. nullable(n) is true for a syntax-tree node n if and only if the subexpression represented by n has ε in its language. That is, the subexpression can be "made null" or the empty string, even though there may be other strings it can represent as well.
2. firstpos(n) is the set of positions in the subtree rooted at n that correspond to the first symbol of at least one string in the language of the subexpression rooted at n. (These are the positions from which the first symbol of a string can come.)
3. lastpos(n) is the set of positions in the subtree rooted at n that correspond to the last symbol of at least one string in the language of the subexpression rooted at n. (These are the positions from which the last symbol of a string can come.)
4. followpos(p), for a position p, is the set of positions q in the entire syntax tree such that there is some string x = a1a2...an in L((r)#) and a way of matching x in which some symbol ai matches position p and the next symbol ai+1 matches position q. (These are the positions that can immediately follow p.)
Computing nullable, firstpos, and lastpos
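
| Node n | nullable(n) | firstpos(n) | lastpos(n) |
|---|---|---|---|
| A leaf labeled ε | true | ∅ | ∅ |
| A leaf with position i | false | {i} | {i} |
| An or-node n = c1 \| c2 | nullable(c1) or nullable(c2) | firstpos(c1) ∪ firstpos(c2) | lastpos(c1) ∪ lastpos(c2) |
| A cat-node n = c1c2 | nullable(c1) and nullable(c2) | if (nullable(c1)) firstpos(c1) ∪ firstpos(c2) else firstpos(c1) | if (nullable(c2)) lastpos(c1) ∪ lastpos(c2) else lastpos(c2) |
| A star-node n = c1* | true | firstpos(c1) | lastpos(c1) |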
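These rules can be applied in a single post-order pass over the syntax tree, since every node's values depend only on its children. A C sketch, assuming positions are below 32 so that each set fits in a 32-bit mask where bit i stands for position i; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { LEAF, CAT, OR, STAR } Kind;
typedef uint32_t Set;              /* bit i set <=> position i is in the set */

typedef struct Node {
    Kind kind;
    int pos;                       /* LEAF: position (1..31), or 0 for an epsilon leaf */
    struct Node *c1, *c2;          /* children; a STAR node uses only c1 */
    bool nullable;
    Set firstpos, lastpos;
} Node;

Node *mk(Kind kind, int pos, Node *c1, Node *c2) {
    Node *n = calloc(1, sizeof *n);  /* error checking omitted in this sketch */
    n->kind = kind; n->pos = pos; n->c1 = c1; n->c2 = c2;
    return n;
}

/* Post-order pass: children first, then the rule for this node's kind. */
void compute(Node *n) {
    if (n->c1) compute(n->c1);
    if (n->c2) compute(n->c2);
    switch (n->kind) {
    case LEAF:                     /* only an epsilon leaf is nullable */
        n->nullable = (n->pos == 0);
        n->firstpos = n->lastpos = n->pos ? (Set)1 << n->pos : 0;
        break;
    case OR:
        n->nullable = n->c1->nullable || n->c2->nullable;
        n->firstpos = n->c1->firstpos | n->c2->firstpos;
        n->lastpos = n->c1->lastpos | n->c2->lastpos;
        break;
    case CAT:
        n->nullable = n->c1->nullable && n->c2->nullable;
        n->firstpos = n->c1->firstpos | (n->c1->nullable ? n->c2->firstpos : 0);
        n->lastpos = n->c2->lastpos | (n->c2->nullable ? n->c1->lastpos : 0);
        break;
    case STAR:
        n->nullable = true;
        n->firstpos = n->c1->firstpos;
        n->lastpos = n->c1->lastpos;
        break;
    }
}
```

For the tree of (a|b)*abb# (positions 1 to 6 holding a, b, a, b, b, #), the root comes out non-nullable with firstpos = {1,2,3} and lastpos = {6}.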
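Computing followpos
There are only two ways that a position of a regular expression can be made to follow another:
1. If n is a cat-node with left child c1 and right child c2, then for every position i in lastpos(c1), all positions in firstpos(c2) are in followpos(i).
2. If n is a star-node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).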
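Both followpos rules (cat-node: lastpos(c1) is followed by firstpos(c2); star-node: lastpos(n) is followed by firstpos(n)) can be applied during the same post-order pass that computes nullable, firstpos and lastpos. A self-contained C sketch under the same assumptions as before: positions below 32, sets stored as bitmasks, and hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAXPOS 32                  /* positions 1..31 fit in a 32-bit mask */
typedef enum { LEAF, CAT, OR, STAR } Kind;
typedef uint32_t Set;              /* bit i set <=> position i is in the set */

typedef struct Node {
    Kind kind;
    int pos;                       /* LEAF: position, or 0 for an epsilon leaf */
    struct Node *c1, *c2;          /* children; a STAR node uses only c1 */
    bool nullable;
    Set firstpos, lastpos;
} Node;

Set followpos[MAXPOS];             /* followpos[i] for each position i */

Node *mk(Kind kind, int pos, Node *c1, Node *c2) {
    Node *n = calloc(1, sizeof *n);  /* error checking omitted */
    n->kind = kind; n->pos = pos; n->c1 = c1; n->c2 = c2;
    return n;
}

/* For every position i in `from`, add all positions in `to` to followpos(i). */
static void add_follow(Set from, Set to) {
    for (int i = 1; i < MAXPOS; i++)
        if (from & ((Set)1 << i)) followpos[i] |= to;
}

/* Post-order: nullable, firstpos, lastpos, then the two followpos rules. */
void compute(Node *n) {
    if (n->c1) compute(n->c1);
    if (n->c2) compute(n->c2);
    switch (n->kind) {
    case LEAF:
        n->nullable = (n->pos == 0);
        n->firstpos = n->lastpos = n->pos ? (Set)1 << n->pos : 0;
        break;
    case OR:
        n->nullable = n->c1->nullable || n->c2->nullable;
        n->firstpos = n->c1->firstpos | n->c2->firstpos;
        n->lastpos = n->c1->lastpos | n->c2->lastpos;
        break;
    case CAT:
        n->nullable = n->c1->nullable && n->c2->nullable;
        n->firstpos = n->c1->firstpos | (n->c1->nullable ? n->c2->firstpos : 0);
        n->lastpos = n->c2->lastpos | (n->c2->nullable ? n->c1->lastpos : 0);
        add_follow(n->c1->lastpos, n->c2->firstpos);     /* rule 1 */
        break;
    case STAR:
        n->nullable = true;
        n->firstpos = n->c1->firstpos;
        n->lastpos = n->c1->lastpos;
        add_follow(n->lastpos, n->firstpos);            /* rule 2 */
        break;
    }
}
```

For (a|b)*abb# this gives followpos(1) = followpos(2) = {1,2,3}, followpos(3) = {4}, followpos(4) = {5}, followpos(5) = {6} and followpos(6) = ∅.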
Converting a Regular Expression Directly to a DFA
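Step 1. Construct a syntax tree T from the augmented regular expression (r)#.
Step 2. Compute nullable, firstpos, lastpos, and followpos for T.
Step 3. Construct Dstates (the set of states of DFA D) and Dtran (the transition function for D): start with firstpos of the root as the only, unmarked, state; then, while some state S in Dstates is unmarked, mark S and, for each input symbol a, let U be the union of followpos(p) over all positions p in S that correspond to a; if U is not yet in Dstates, add it as an unmarked state, and set Dtran[S, a] = U.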
The states of D are sets of positions in T.
Initially, each state is "unmarked," and a state becomes "marked" just before we consider its out-transitions.
The start state of D is firstpos(n0), where node n0 is the root of T.
The accepting states are those containing the position for the endmarker symbol #.
The value of firstpos for the root of the tree is {1,2,3}, so this set is the start state of D.
Let us call this set of positions A.
We must compute Dtran[A, a] and Dtran[A, b].
Among the positions of A, leaves 1 and 3 correspond to a, while leaf 2 corresponds to b. Thus,
Dtran[A, a] = followpos(1) ∪ followpos(3) = {1,2,3,4} = B
Dtran[A, b] = followpos(2) = {1,2,3} = A
Dtran[B, a] = followpos(1) ∪ followpos(3) = {1,2,3,4} = B
Dtran[B, b] = followpos(2) ∪ followpos(4) = {1,2,3,5} = C
Dtran[C, a] = followpos(1) ∪ followpos(3) = {1,2,3,4} = B
Dtran[C, b] = followpos(2) ∪ followpos(5) = {1,2,3,6} = D
Dtran[D, a] = followpos(1) ∪ followpos(3) = {1,2,3,4} = B
Dtran[D, b] = followpos(2) = {1,2,3} = A
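The Dstates/Dtran procedure can be checked against this worked example with a short C sketch. It assumes the expression is (a|b)*abb#, the standard textbook example, whose positions 1 to 6 hold a, b, a, b, b, # and whose followpos sets ({1,2,3}, {1,2,3}, {4}, {5}, {6}, ∅) agree with the unions used in the Dtran computations. These inputs are hard-coded, and all names are hypothetical.

```c
#include <stdint.h>

typedef uint32_t Set;          /* bit i set <=> position i is in the set */
#define NPOS 6                 /* positions 1..6 of (a|b)*abb#; # is position 6 */
#define MAXSTATES 16

/* Assumed inputs (index 0 unused). 0x0E is the set {1,2,3}. */
static const char symbol_at[NPOS + 1] = { 0, 'a', 'b', 'a', 'b', 'b', '#' };
static const Set followpos[NPOS + 1] = { 0, 0x0E, 0x0E, 1u << 4, 1u << 5, 1u << 6, 0 };
static const char alphabet[] = { 'a', 'b' };

Set dstates[MAXSTATES];        /* Dstates: each DFA state is a set of positions */
int dtran[MAXSTATES][2];       /* Dtran[S][k]: next state on alphabet[k], -1 if none */
int nstates;

/* Return the index of state s in Dstates, adding it (unmarked) if new. */
static int find_or_add(Set s) {
    for (int i = 0; i < nstates; i++)
        if (dstates[i] == s) return i;
    dstates[nstates] = s;      /* no overflow check: enough for this example */
    return nstates++;
}

/* States are marked in the order they were added, so every index below
   `next` is marked and every index from `next` on is still unmarked. */
void build_dfa(Set start) {
    nstates = 0;
    find_or_add(start);
    for (int next = 0; next < nstates; next++) {     /* mark state `next` */
        Set S = dstates[next];
        for (int k = 0; k < 2; k++) {
            Set U = 0;         /* union of followpos(p), p in S with symbol a */
            for (int p = 1; p <= NPOS; p++)
                if ((S & ((Set)1 << p)) && symbol_at[p] == alphabet[k])
                    U |= followpos[p];
            dtran[next][k] = U ? find_or_add(U) : -1;
        }
    }
}

/* A state is accepting if it contains the position of '#'. */
int is_accepting(int state) { return (int)((dstates[state] >> NPOS) & 1); }
```

Calling build_dfa(0x0E) discovers the states in the order A, B, C, D (indices 0 to 3) and reproduces exactly the Dtran entries listed above, with D as the only accepting state.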
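Figure: the resulting DFA with states A, B, C and D; A is the start state and D, the only state containing the position of #, is the accepting state.
Note: We can also minimize the resulting DFA.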
Question Time
Q. Construct a DFA directly from each of the following regular expressions:
a(a|b)*#
(a|b)c*#
THANK YOU!