Nondeterministic Finite Acceptor (NFA)
CS453 Lecture Lexical Analysis with JLex 1
Plan for Lexical Analysis with JLex
• Overview of the MeggyJava assignments
• Lexer generators, with a 15-minute example
• Expressing tokens with regular expressions
  – regular expression syntax for JLex
  – using JLex with JavaCup
• How do lexer generators work?
  – Converting regular expressions to an NFA
  – Converting an NFA to a DFA
  – Implementing the DFA
Structure of the MeggyJava Compiler
[Figure: compiler pipeline. Analysis: character stream → lexical analysis → tokens (“words”) → syntactic analysis (“sentences”) → AST → semantic analysis → AST and symbol table. Synthesis: code gen → Atmel assembly code.]
PA2: MeggyJava and Atmel warmup
PA3: setPixel compiler
PA4: add control flow
PA5: add functions
PA6: add variables and objects
PA7: add arrays
Specifying Tokens with JLex
JLex example input file:
package mjparser;
import java_cup.runtime.Symbol;
%%
%line
%char
%cup
%public
%eofval{
  return new Symbol(sym.EOF, new TokenValue("EOF", yyline, yychar));
%eofval}
LETTER=[A-Za-z]
DIGIT=[0-9]
UNDERSCORE="_"
LETT_DIG_UND={LETTER}|{DIGIT}|{UNDERSCORE}
ID={LETTER}({LETT_DIG_UND})*
%%
"&&" {return new Symbol(sym.AND, new TokenValue(yytext(), yyline, yychar)); }
"+" {return new Symbol(sym.PLUS, ...); }
"if" {return new Symbol(sym.IF,...); }
{ID} {return new Symbol(sym.ID, new ...
{EOL} { /* reset yychar */ … }
{WS} { /* ignore */ }
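To get a feel for which strings the ID macro matches, the same pattern can be tried out in plain Java. This is only a sketch: it uses java.util.regex rather than JLex (JLex builds its own automaton), and the class name IdPatternDemo and the sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

class IdPatternDemo {
    // Same shape as the JLex macros: ID = {LETTER}({LETTER}|{DIGIT}|{UNDERSCORE})*
    static final Pattern ID = Pattern.compile("[A-Za-z]([A-Za-z]|[0-9]|_)*");

    // True when the whole string is a single identifier.
    static boolean isId(String s) {
        return ID.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample lexemes, not taken from the slides.
        for (String s : new String[] {"x", "set_Pixel2", "2fast", "_tmp", "if"}) {
            System.out.println(s + " -> " + isId(s));
        }
    }
}
```

Note that "if" also matches the ID pattern. In the JLex file the "if" rule is listed before {ID}, and when two rules match a lexeme of the same length the earlier rule is chosen, so the lexer returns sym.IF.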
Nondeterministic Finite Acceptor (NFA)
Alphabet = {a}
[Figure: an NFA with states q0, q1, q2, q3 and three transitions labeled a.]
Two choices: q0 has more than one transition on a.
Nondeterministic Finite Acceptor (NFA)
Input: a a
First Choice
[Figure sequence: starting in q0, the automaton takes the first choice and consumes one a per step.]
All input is consumed: “accept”
Second Choice
[Figure sequence: on the same input a a, the automaton takes the other transition out of q0 and consumes the first a.]
No transition: the automaton hangs
Should we reject aa?
Input cannot be consumed
When To Accept a String
An NFA accepts a string when there is a computation of the NFA that accepts the string:
all the input is consumed AND the automaton is in a final state
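The acceptance rule can be sketched in Java by following every choice at once and keeping the set of states that some computation could currently be in. The transition table is an assumption reconstructed from the walk-through (q0 goes to q1 or q3 on a, q1 goes to q2 on a, and q2 is the only final state); the class name NfaSim is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class NfaSim {
    // Assumed transitions on 'a': q0 -> {q1, q3}, q1 -> {q2}; q2 and q3 have none.
    static final Map<Integer, Set<Integer>> ON_A = new HashMap<>();
    static final Set<Integer> FINAL = Set.of(2);
    static {
        ON_A.put(0, Set.of(1, 3));
        ON_A.put(1, Set.of(2));
    }

    // Track the set of states that at least one computation could be in.
    static boolean accepts(String input) {
        Set<Integer> current = new TreeSet<>(Set.of(0));
        for (char c : input.toCharArray()) {
            Set<Integer> next = new TreeSet<>();
            if (c == 'a') {
                for (int q : current) {
                    next.addAll(ON_A.getOrDefault(q, Set.of()));
                }
            }
            current = next; // computations with no transition drop out here
        }
        // Accept if some computation consumed all input and ended in a final state.
        for (int q : current) {
            if (FINAL.contains(q)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("aa accepted? " + accepts("aa"));
        System.out.println("a accepted? " + accepts("a"));
    }
}
```

A computation that hangs does not cause rejection by itself; it just contributes no states to the next set.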
Example
aa is accepted by the NFA because this computation accepts aa:
[Figure: the first choice consumes both a's and ends in “accept”.]
[Figure: the second choice hangs after the first a, “reject??”.]
But this only tells us that the second choice didn’t work.
Rejection example
Input: a
First Choice
[Figure sequence: the automaton takes the first choice out of q0 and consumes the a.] “reject??”
Second Choice
[Figure sequence: the automaton takes the second choice out of q0 and consumes the a.] “reject??”
An NFA rejects a string when there is no computation of the NFA that accepts the string. That is, for every computation, either:
• All the input is consumed and the automaton is in a non-final state
OR
• The input cannot be consumed
Example
a is rejected by the NFA:
[Figure: the first choice ends in “reject??”.]
[Figure: the second choice ends in “reject??”.]
All possible computations lead to rejection
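To make "all possible computations" concrete, the sketch below lists every computation one by one and says how each ends. As before, the transition table (q0 to q1 or q3, q1 to q2, with q2 final) is an assumption reconstructed from the walk-through, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class NfaChoices {
    // Assumed transitions on 'a', indexed by state: q0 -> q1 or q3, q1 -> q2.
    static final int[][] ON_A = { {1, 3}, {2}, {}, {} };
    static final int FINAL_STATE = 2;

    // Collect one line per computation, describing the path and how it ends.
    static void explore(int state, String input, int pos, String path, List<String> out) {
        if (pos == input.length()) {
            out.add(path + (state == FINAL_STATE ? " : accept" : " : non-final state"));
            return;
        }
        int[] choices = input.charAt(pos) == 'a' ? ON_A[state] : new int[0];
        if (choices.length == 0) {
            out.add(path + " : hangs, input cannot be consumed");
            return;
        }
        for (int next : choices) {
            explore(next, input, pos + 1, path + " -> q" + next, out);
        }
    }

    static List<String> computations(String input) {
        List<String> out = new ArrayList<>();
        explore(0, input, 0, "q0", out);
        return out;
    }

    // Accepted if any computation accepts; rejected only if none does.
    static boolean accepts(String input) {
        for (String c : computations(input)) {
            if (c.endsWith(": accept")) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "aa"}) {
            System.out.println("input " + s + ":");
            computations(s).forEach(c -> System.out.println("  " + c));
        }
    }
}
```

For input a both computations end in a non-final state, so the string is rejected. For input aa one computation hangs, but the other accepts, so the string is accepted.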
[Figure: the same NFA, with states q0, q1, q2, q3 and three transitions labeled a.]
Language accepted: L = {aa}
Specifying Tokens with JLex
JLex example input file:
package mjparser;
import java_cup.runtime.Symbol;
%%
%line
%char
%cup
%public
%eofval{
  return new Symbol(sym.EOF, new TokenValue("EOF", yyline, yychar));
%eofval}
LETTER=[A-Za-z]
DIGIT=[0-9]
UNDERSCORE="_"
EOL=(\n|\r|\r\n)
LETT_DIG_UND={LETTER}|{DIGIT}|{UNDERSCORE}
ID={LETTER}({LETT_DIG_UND})*
%%
"&&" {return new Symbol(sym.AND, new TokenValue(yytext(), yyline, yychar)); }
"+" {return new Symbol(sym.PLUS, ...); }
"if" {return new Symbol(sym.IF,...); }
{ID} {return new Symbol(sym.ID, new ...
{EOL} { /* reset yychar */ … }
{WS} { /* ignore */ }
Example NFA for Multiple Tokens
DFA from IF and ID NFAs
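The figure for this slide did not survive, so the following is a hand-written sketch of what a DFA combining the "if" keyword and the ID pattern could look like, not the DFA from the slide. The state names, the class IfIdDfa, and the token names returned as strings are all hypothetical. The state reached after reading exactly "if" reports IF rather than ID, on the assumption that the "if" rule takes priority because it is listed first in the JLex file.

```java
class IfIdDfa {
    // Hypothetical state numbering; the DFA on the slide may label states differently.
    static final int START = 0, SAW_I = 1, SAW_IF = 2, IN_ID = 3, DEAD = -1;

    static boolean isLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    static boolean isIdPart(char c) {
        return isLetter(c) || (c >= '0' && c <= '9') || c == '_';
    }

    // One DFA step: exactly one next state for each (state, character) pair.
    static int step(int state, char c) {
        switch (state) {
            case START:  return c == 'i' ? SAW_I : (isLetter(c) ? IN_ID : DEAD);
            case SAW_I:  return c == 'f' ? SAW_IF : (isIdPart(c) ? IN_ID : DEAD);
            case SAW_IF: // any further identifier character turns "if..." into an ID
            case IN_ID:  return isIdPart(c) ? IN_ID : DEAD;
            default:     return DEAD;
        }
    }

    // Run the DFA over a whole lexeme and report the token of the final state.
    static String classify(String lexeme) {
        int state = START;
        for (char c : lexeme.toCharArray()) {
            state = step(state, c);
        }
        switch (state) {
            case SAW_IF: return "IF";
            case SAW_I:
            case IN_ID:  return "ID";
            default:     return "ERROR";
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"if", "i", "iffy", "x_1", "1x"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Unlike the NFA examples, there are no choices to explore here: each character moves the automaton to exactly one state, which is why the DFA form is the one that gets implemented.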