Nondeterministic Finite Acceptor (NFA) · cs453/yr2011/Slides/...


CS453 Lecture Lexical Analysis with JLex 1

Plan for Lexical Analysis with JLex

•  Overview of the MeggyJava Assignments
•  Lexer generators, show 15-minute example
•  Expressing tokens with regular expressions
   –  regular expression syntax for JLex
   –  using JLex with JavaCup
•  How do lexer generators work?
   –  Convert regular expressions to NFA
   –  Converting an NFA to DFA
   –  Implementing the DFA


Structure of the MeggyJava Compiler

[Figure: compiler pipeline]

Analysis: character stream → lexical analysis → tokens ("words") → syntactic analysis ("sentences") → AST → semantic analysis → AST and symbol table

Synthesis: code gen → Atmel assembly code

PA2: MeggyJava and Atmel warmup
PA3: setPixel compiler
PA4: add control flow
PA5: add functions
PA6: add variables and objects
PA7: add arrays


Specifying Tokens with JLex

JLex example input file:

package mjparser;
import java_cup.runtime.Symbol;

%%
%line
%char
%cup
%public

%eofval{
  return new Symbol(sym.EOF, new TokenValue("EOF", yyline, yychar));
%eofval}

LETTER=[A-Za-z]
DIGIT=[0-9]
UNDERSCORE="_"
LETT_DIG_UND={LETTER}|{DIGIT}|{UNDERSCORE}
ID={LETTER}({LETT_DIG_UND})*

%%
"&&" {return new Symbol(sym.AND, new TokenValue(yytext(), yyline, yychar)); }
"+" {return new Symbol(sym.PLUS, ...); }
"if" {return new Symbol(sym.IF,...); }
{ID} {return new Symbol(sym.ID, new ...
{EOL} { /* reset yychar */ … }
{WS} { /* ignore */ }

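To make the ID macro in the JLex file concrete, here is a minimal sketch that checks the same character pattern with java.util.regex. The class name IdPatternSketch and the method looksLikeId are hypothetical and exist only for illustration; JLex generates its own matcher and does not use this code.

```java
import java.util.regex.Pattern;

class IdPatternSketch {
    // Same shape as ID={LETTER}({LETT_DIG_UND})* in the JLex file:
    // one letter, then any number of letters, digits, or underscores.
    private static final Pattern ID = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    /** Returns true when the whole string has the form of an ID lexeme. */
    static boolean looksLikeId(String lexeme) {
        return ID.matcher(lexeme).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample lexemes, chosen to hit both outcomes.
        for (String s : new String[] {"x", "set_Pixel2", "if", "2fast", "_tmp"}) {
            System.out.println(s + " -> " + looksLikeId(s));
        }
    }
}
```

Note that "if" also has the form of an ID. JLex prefers the longest match and breaks ties in favor of the rule listed first, which is why the "if" rule appears before the {ID} rule in the specification.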
Nondeterministic Finite Acceptor (NFA)

Alphabet = {a}

[Figure: NFA with states q0, q1, q2, q3; every transition is labeled a]


Nondeterministic Finite Acceptor (NFA)

Alphabet = {a}

[Figure: the same NFA, with two choices marked at q0]

Input: aa

First Choice: the NFA consumes both symbols of aa. All input is consumed: "accept".

Second Choice: the NFA consumes the first a and then has no transition for the second a. No transition: the automaton hangs. Input cannot be consumed. Should we reject aa?

When To Accept a String

An NFA accepts a string when there is a computation of the NFA that accepts the string:

all the input is consumed

AND

the automaton is in a final state
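The acceptance rule can be sketched in code by tracking every state the NFA could be in after each input symbol; the string is accepted when at least one of those states is final once the input is used up. The transition table below is an assumption reconstructed from the walkthrough (q0 goes to q1 or q3 on a, q1 goes to q2 on a, and q2 is the only final state), and the class name NfaSketch is hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class NfaSketch {
    // Assumed transitions on 'a': q0 -> {q1, q3}, q1 -> {q2}.
    // q2 and q3 have no outgoing transitions.
    private static final Map<Integer, Set<Integer>> ON_A =
            Map.of(0, Set.of(1, 3), 1, Set.of(2));
    // Assumed final state: q2.
    private static final Set<Integer> FINAL = Set.of(2);

    /** True when some computation consumes all input and ends in a final state. */
    static boolean accepts(String input) {
        Set<Integer> current = new HashSet<>(Set.of(0)); // start in q0
        for (char c : input.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            if (c == 'a') { // the alphabet is {a}; any other symbol has no transition
                for (int q : current) {
                    next.addAll(ON_A.getOrDefault(q, Set.of()));
                }
            }
            current = next; // computations that hang simply drop out of the set
        }
        for (int q : current) {
            if (FINAL.contains(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("aa accepted? " + accepts("aa"));
        System.out.println("a accepted? " + accepts("a"));
    }
}
```

Following all choices at once means a single failing choice never causes rejection on its own, which is the point of the "should we reject aa?" question.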

Example

aa is accepted by the NFA, because this computation accepts aa:

[Figure: the first choice on input aa ends in "accept"]

[Figure: the second choice on input aa ends in "reject??"]

But this only tells us that choice didn't work.

Rejection example

Input: a

First Choice: the NFA consumes a and ends in a non-final state. "reject??"

Second Choice: the NFA consumes a and again ends in a non-final state. "reject??"

An NFA rejects a string when there is no computation of the NFA that accepts the string. For every computation:

•  All the input is consumed and the automaton is in a non-final state

OR

•  The input cannot be consumed
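The rejection rule quantifies over all computations, so one way to illustrate it is to enumerate every computation and label how each one ends. This is a minimal sketch, assuming the same NFA as in the walkthrough (q0 goes to q1 or q3 on a, q1 goes to q2 on a, q2 is final); the class NfaComputations and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class NfaComputations {
    // NEXT[q] lists the states reachable from q on 'a' (assumed from the slides).
    private static final int[][] NEXT = { {1, 3}, {2}, {}, {} };
    // Only q2 is assumed to be final.
    private static final boolean[] FINAL = { false, false, true, false };

    /** Lists every computation on the input together with its outcome. */
    static List<String> outcomes(String input) {
        List<String> results = new ArrayList<>();
        explore(0, input, 0, "q0", results);
        return results;
    }

    private static void explore(int state, String input, int pos,
                                String path, List<String> results) {
        if (pos == input.length()) {
            // All the input is consumed: the outcome depends on the state.
            results.add(path + (FINAL[state] ? ": accept"
                                             : ": input consumed, non-final state"));
            return;
        }
        int[] choices = input.charAt(pos) == 'a' ? NEXT[state] : new int[0];
        if (choices.length == 0) {
            // No transition: the automaton hangs.
            results.add(path + ": input cannot be consumed");
            return;
        }
        for (int next : choices) {
            explore(next, input, pos + 1, path + " -> q" + next, results);
        }
    }

    /** A string is rejected when no computation ends in "accept". */
    static boolean rejects(String input) {
        for (String outcome : outcomes(input)) {
            if (outcome.endsWith(": accept")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        outcomes("a").forEach(System.out::println);
        outcomes("aa").forEach(System.out::println);
    }
}
```

For input a both computations end in a non-final state, so a is rejected. For input aa one computation hangs but the other accepts, so aa is not rejected.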

Example

a is rejected by the NFA:

[Figure: both computations on input a end in "reject??"]

All possible computations lead to rejection.

[Figure: the NFA with states q0, q1, q2, q3 and transitions labeled a]

Language accepted: L = {aa}


Specifying Tokens with JLex

JLex example input file:

package mjparser;
import java_cup.runtime.Symbol;

%%
%line
%char
%cup
%public

%eofval{
  return new Symbol(sym.EOF, new TokenValue("EOF", yyline, yychar));
%eofval}

LETTER=[A-Za-z]
DIGIT=[0-9]
UNDERSCORE="_"
EOL=(\n|\r|\r\n)
LETT_DIG_UND={LETTER}|{DIGIT}|{UNDERSCORE}
ID={LETTER}({LETT_DIG_UND})*

%%
"&&" {return new Symbol(sym.AND, new TokenValue(yytext(), yyline, yychar)); }
"+" {return new Symbol(sym.PLUS, ...); }
"if" {return new Symbol(sym.IF,...); }
{ID} {return new Symbol(sym.ID, new ...
{EOL} { /* reset yychar */ … }
{WS} { /* ignore */ }


Example NFA for Multiple Tokens


DFA from IF and ID NFAs
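The slide's diagram is not preserved in this transcript, so the following is only a hand-written sketch of what a combined DFA for the "if" keyword and the ID pattern could look like. The state names, the class IfIdDfaSketch, and the classify method are all hypothetical; a DFA generated by JLex would use numbered states and transition tables instead.

```java
class IfIdDfaSketch {
    // One state per distinguishable prefix: nothing read, "i", "if",
    // any other identifier prefix, and a dead state for invalid input.
    enum State { START, SEEN_I, SEEN_IF, IN_ID, DEAD }

    private static boolean isLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    private static boolean isIdChar(char c) {
        return isLetter(c) || (c >= '0' && c <= '9') || c == '_';
    }

    /** The deterministic transition function: exactly one next state. */
    static State step(State s, char c) {
        switch (s) {
            case START:
                if (c == 'i') return State.SEEN_I;
                return isLetter(c) ? State.IN_ID : State.DEAD;
            case SEEN_I:
                if (c == 'f') return State.SEEN_IF;
                return isIdChar(c) ? State.IN_ID : State.DEAD;
            case SEEN_IF:
            case IN_ID:
                return isIdChar(c) ? State.IN_ID : State.DEAD;
            default:
                return State.DEAD;
        }
    }

    /** Runs the DFA over the whole lexeme and names the token it ends on. */
    static String classify(String lexeme) {
        State s = State.START;
        for (char c : lexeme.toCharArray()) {
            s = step(s, c);
        }
        switch (s) {
            case SEEN_IF: return "IF";
            case SEEN_I:
            case IN_ID:   return "ID";
            default:      return "none";
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"if", "i", "iff", "x1", "1x"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Unlike the NFA, this machine never has a choice to make: each state and input character determine a single next state, and the accepting state reached at the end names the token.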