Lexical Analysis Dragon Book: chapter 3

Upload: cole-kirby

Post on 28-Mar-2015


Page 1: Lexical Analysis Dragon Book: chapter 3. Compiler structure Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer

Lexical Analysis

Dragon Book: chapter 3

Page 2

Compiler structure

Source program → Lexical analyzer → Syntax analyzer → Semantic analyzer → Intermediate code generator → Code optimizer → Code generator → Target program

Symbol table and error handling are shared by all phases.

Page 3

Compiler structure

Source program → Lexical analyzer → Syntax analyzer

The syntax analyzer asks the lexical analyzer to "get next token"; the lexical analyzer returns a token.

Symbol table, error handling

Page 4

Tokens in programming languages

Token   Sample instances                         Description
if      if                                       keyword
rel     <, <=, <>, >=, >                         relation
id      count, length, point2                    variable
num     3.1415927, 7, 145e-3                     numerical constant
str     "abc", "some space", "\7\" is a char"    constant string

Page 5

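Tokens may be difficult to recognize

Fortran: DO 5 I=1.25 versus DO 5 I=1,25 (spaces do not count). The first assigns 1.25 to the variable DO5I; the second starts a DO loop. The two differ only in the character after the 1.

PL/I: IF THEN THEN THEN=ELSE; ELSE ELSE=THEN; (no reserved keywords).

PL/I: PR1(2, 7, 18, D*3, 175.14)=3 (procedure call or array reference).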

Page 6

Strings, languages

String: a sequence of characters over some alphabet, e.g., 0100110 over {0, 1}. In computers, the alphabet is usually ASCII or EBCDIC.

Length of a string: its number of characters.

Empty string: ε (length 0).

Concatenation: putting one string after another. X=dog, Y=house, XY=doghouse (also written X.Y).

Prefix: ban is a prefix of banana.

Suffix: ana is a suffix of banana.

Page 7

Language: a set of strings

The alphabet is a language: L={A, B, …, Z, a, b, …, z}.

Constant languages: X={ab, ba}, Y={a}.

Concatenation: X.Y = {aba, baa}, Y.X = {aab, aba}.

Union: X∪Y = X+Y = X|Y = {ab, ba, a}.

Exponentiation: X^3 = X.X.X.

Star: X* = zero or more occurrences of strings from X.

L* = all words with letters from L (including the empty word).

L+ = all words with one or more letters from L.
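The operations on languages can be tried out on the slide's own X and Y. The following is a minimal Python sketch, assuming languages are represented as sets of strings; the function names concat, power and star_up_to are made up for this illustration, and since X* is infinite, star_up_to only builds a finite part of it.

```python
from itertools import product

def concat(X, Y):
    """X.Y: every string of X followed by every string of Y."""
    return {x + y for x, y in product(X, Y)}

def power(X, n):
    """X^n: X concatenated with itself n times; X^0 is {""}, the empty string."""
    result = {""}
    for _ in range(n):
        result = concat(result, X)
    return result

def star_up_to(X, max_power):
    """Finite approximation of X*: the union of X^0 .. X^max_power."""
    words = set()
    for i in range(max_power + 1):
        words |= power(X, i)
    return words

X = {"ab", "ba"}
Y = {"a"}

print(sorted(concat(X, Y)))      # ['aba', 'baa']
print(sorted(concat(Y, X)))      # ['aab', 'aba']
print(sorted(X | Y))             # ['a', 'ab', 'ba']
print(sorted(star_up_to(Y, 3)))  # ['', 'a', 'aa', 'aaa']
```

Union needs no helper because Python's set union (X | Y) already is the language union.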

Page 8

Regular expressions

X|Y = X∪Y = { s | s∈X or s∈Y }.

X.Y = { x.y | x∈X and y∈Y }.

X* = ⋃_{i=0}^{∞} X^i.

X+ = ⋃_{i=1}^{∞} X^i.

Page 9

Examples

a|b = {a, b}.

(a|b).(a|b) = {aa, ab, ba, bb}.

a* = { ε, a, aa, aaa, … }.

(a|b)* = { ε, a, b, ab, ba, aa, aba, … }.

Page 10

Defining tokens

digit → [0-9]

digits → digit+

fraction → . digits | ε

exponent → E ( + | - | ε ) digits | ε

const → digits fraction exponent
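These definitions map almost directly onto Python's re module. The sketch below is one possible transcription, not the slide author's code: each "| ε" alternative becomes an optional group, and the helper name is_const is an assumption made for the example. It follows this slide in accepting only an uppercase E.

```python
import re

# Each name mirrors one definition above; "| ε" becomes an optional group (...)?.
DIGIT = r"[0-9]"
DIGITS = rf"{DIGIT}+"
FRACTION = rf"(?:\.{DIGITS})?"
EXPONENT = rf"(?:E[+-]?{DIGITS})?"
CONST = re.compile(rf"{DIGITS}{FRACTION}{EXPONENT}")

def is_const(text):
    """True if the whole text is a numeric constant under the definitions."""
    return CONST.fullmatch(text) is not None

for sample in ["7", "3.1415927", "145E-3", "1.", ".5", "2E"]:
    print(sample, is_const(sample))
```

The first three samples match; "1.", ".5" and "2E" do not, because a fraction or exponent, once started, must contain digits, and const must begin with digits.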

Page 11

Not everything is regular!

All the words of the form w c w, where w is a word and c a letter.

The syntax of a program, e.g., the recursive definition of if-then-else: stmt → if expr then stmt else stmt.

Page 12

Reading the input

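The lexical analyzer sometimes needs to "look ahead". For example, to identify the variable done it must read past the letters do, which on their own would be a keyword.

It may need to "unread" a character.

[Figure: the input "If a>8 then goto nextloop else begin while z>8 do" with two pointers, one marked "Token starts here" and one marked "Last character read".]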
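The two pointers and the "unread" step can be sketched as follows. This is an illustrative Python sketch only: the CharReader class, the next_word function and the keyword set are assumptions made for the example, not part of the slides.

```python
class CharReader:
    """Reads characters one at a time and allows un-reading the last one."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def read(self):
        """Return the next character, or '' at end of input."""
        ch = self.text[self.pos] if self.pos < len(self.text) else ""
        self.pos += 1  # advance even at end of input so unread() stays symmetric
        return ch

    def unread(self):
        """Push the last character read back onto the input."""
        self.pos -= 1


# Hypothetical keyword set, taken from the words in the slide's sample input.
KEYWORDS = {"if", "then", "else", "begin", "while", "do", "goto"}

def next_word(reader):
    """Scan one identifier or keyword starting at the reader's position."""
    start = reader.pos          # "token starts here"
    ch = reader.read()
    while ch.isalnum():
        ch = reader.read()      # "last character read"
    reader.unread()             # that character is not part of the token
    lexeme = reader.text[start:reader.pos]
    kind = "keyword" if lexeme in KEYWORDS else "id"
    return kind, lexeme

print(next_word(CharReader("done z>8")))  # ('id', 'done')
print(next_word(CharReader("do z>8")))    # ('keyword', 'do')
```

The scanner only decides between keyword and identifier after it has read one character too many, which is why the unread step is needed.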

Page 13

Returning: token + attributes.

Example: if xyz > 11 then

if, keyword.

id, value=xyz.

op, value=">".

const, value=11.

then, keyword.
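A small tokenizer that returns each token together with its attribute could look like the sketch below. It is an assumption-laden illustration: the token names follow the slide, but the regular expressions, the keyword set and the choice to use None as the attribute of a keyword are made up for the example.

```python
import re

KEYWORDS = {"if", "then", "else"}

# One named alternative per kind of lexeme; whitespace is matched and skipped.
TOKEN_RE = re.compile(r"""
    (?P<const>[0-9]+)
  | (?P<word>[A-Za-z][A-Za-z0-9]*)
  | (?P<op><=|<>|>=|<|>|=)
  | (?P<skip>\s+)
""", re.VERBOSE)

def tokenize(text):
    """Return a list of (token, attribute) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue
        if kind == "word":
            # Keywords are their own token and carry no attribute.
            tokens.append((lexeme, None) if lexeme in KEYWORDS else ("id", lexeme))
        elif kind == "const":
            tokens.append(("const", int(lexeme)))
        else:
            tokens.append(("op", lexeme))
    return tokens

print(tokenize("if xyz > 11 then"))
# [('if', None), ('id', 'xyz'), ('op', '>'), ('const', 11), ('then', None)]
```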

Page 14

Finite Automata

[Figure: automaton with states s1 to s5 and transitions labeled a, b and c.]

Includes:

States {s1,s2,…,s5}.

Initial states {s1}.

Accepting states {s3,s5}.

Alphabet {a, b, c}.

Transitions:

{(s1,a,s2), (s2, a, s3), …}.

Deterministic?

Page 15

Automaton. What is the language?

[Figure: automaton with states s0 and s1 and transitions labeled a and b.]

Formally:

An input is a word over the alphabet Σ.

A run over a word is an alternating sequence of states and letters, starting from the initial state.

Accepting run: ends with an accepting state.

Page 16

Example

[Figure: automaton with states s0 and s1 and transitions labeled a and b.]

Input: aabbb

Run: s0 a s0 a s0 b s1 b s1 b s1. Accepts.

Input: aba

Run: s0 a s0 b s1 a s0. Does not accept.
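The two runs above can be reproduced with a table-driven automaton. In this Python sketch the transition table and the accepting state are read off the runs themselves (s1 ends the accepting run, s0 ends the rejecting one), which is an assumption about the lost drawing; the names DELTA, run and accepts are chosen for the example.

```python
# Transition table read off the two runs on the slide.
DELTA = {
    ("s0", "a"): "s0",
    ("s0", "b"): "s1",
    ("s1", "a"): "s0",
    ("s1", "b"): "s1",
}
INITIAL = "s0"
ACCEPTING = {"s1"}

def run(word):
    """Return the run as an alternating list of states and letters."""
    state = INITIAL
    trace = [state]
    for letter in word:
        state = DELTA[(state, letter)]
        trace += [letter, state]
    return trace

def accepts(word):
    """A run is accepting when it ends in an accepting state."""
    return run(word)[-1] in ACCEPTING

print(" ".join(run("aabbb")), accepts("aabbb"))  # s0 a s0 a s0 b s1 b s1 b s1 True
print(" ".join(run("aba")), accepts("aba"))      # s0 a s0 b s1 a s0 False
```

With this table the automaton ends in s1 exactly when the last letter read is b.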

Page 17

Automaton. What is the language?

[Figure: automaton with states s0 and s1 and transitions labeled a and b.]

Page 18

Automaton. What is the language?

[Figure: automaton with states s0 and s1 and transitions labeled a and b.]

Page 19

Identifying tokens

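[Figure: automaton with paths spelling the keywords IF, THEN and ELSE, and a path for identifiers: a transition labeled letter followed by a loop labeled letter|digit.]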

Page 20

Nondeterministic automata

Allows more than a single transition from a state with the same label.

There does not have to be a transition from every state with every label.

Allows multiple initial states.

Allows ε-transitions.

[Figure: nondeterministic automaton with states s0, s1, s2, s3: a loop on s0 labeled 0,1, then s0 -1-> s1 -0,1-> s2 -0,1-> s3.]

Page 21

Nondeterministic runs

Input: 0100

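Run 1: s0 0 s0 1 s0 0 s0 0 s0. Does not accept.

Run 2: s0 0 s0 1 s1 0 s2 0 s3. Accepts.

The automaton accepts when there exists an accepting run.

[Figure: nondeterministic automaton with states s0, s1, s2, s3: a loop on s0 labeled 0,1, then s0 -1-> s1 -0,1-> s2 -0,1-> s3.]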
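"There exists an accepting run" can be checked without enumerating runs by tracking the set of states that some run could be in. The sketch below assumes the transitions suggested by the figure labels and by Run 2 (with s3 as the only accepting state); the names NFA and nfa_accepts are made up for the example.

```python
# Transitions assumed from the figure labels and the runs on the slide.
NFA = {
    ("s0", "0"): {"s0"},
    ("s0", "1"): {"s0", "s1"},
    ("s1", "0"): {"s2"},
    ("s1", "1"): {"s2"},
    ("s2", "0"): {"s3"},
    ("s2", "1"): {"s3"},
}
INITIAL = {"s0"}
ACCEPTING = {"s3"}

def nfa_accepts(word):
    """Track every state some run could be in; accept if any is accepting."""
    current = set(INITIAL)
    for symbol in word:
        current = {t for s in current for t in NFA.get((s, symbol), set())}
    return bool(current & ACCEPTING)

print(nfa_accepts("0100"))  # True
print(nfa_accepts("0010"))  # False
```

Under these assumed transitions a word is accepted exactly when its third symbol from the end is 1.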

Page 22

Determinizing Automata

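[Figure: nondeterministic automaton with states s0, s1, s2, s3: a loop on s0 labeled 0,1, then s0 -1-> s1 -0,1-> s2 -0,1-> s3.]

D is the deterministic automaton built from the nondeterministic automaton N.

Each state of D is a set of states of N.

S -a-> T when T = { t | s∈S and s -a-> t }.

The initial state of D is the set of all the initial states of N.

The accepting states of D are the sets that include at least one accepting state of N.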
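The construction can be written down almost verbatim. The following Python sketch applies it to the four-state automaton, with the transitions assumed from the figure labels and s3 assumed accepting; determinize and the variable names are chosen for this example and only the sets reachable from the initial state are built.

```python
from collections import deque

# N: transitions assumed from the figure labels; s3 is assumed accepting.
NFA = {
    ("s0", "0"): {"s0"},
    ("s0", "1"): {"s0", "s1"},
    ("s1", "0"): {"s2"},
    ("s1", "1"): {"s2"},
    ("s2", "0"): {"s3"},
    ("s2", "1"): {"s3"},
}

def determinize(nfa, initial, accepting, alphabet):
    """Build D: states are frozensets of N's states, reached from the start set."""
    start = frozenset(initial)           # all initial states of N
    states = {start}
    delta = {}
    queue = deque([start])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            # T = { t | s in S and s -a-> t }
            T = frozenset(t for s in S for t in nfa.get((s, a), ()))
            delta[(S, a)] = T
            if T not in states:
                states.add(T)
                queue.append(T)
    # Accepting: sets containing at least one accepting state of N.
    final = {S for S in states if S & accepting}
    return states, delta, start, final

states, delta, start, final = determinize(NFA, {"s0"}, {"s3"}, "01")
print(len(states))  # 8
print(len(final))   # 4
```

The eight reachable sets all contain s0, because s0 loops on both input symbols.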

Page 23

Determinization

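[Figure: the nondeterministic automaton with states s0, s1, s2, s3 and the resulting deterministic automaton, whose states are {s0}, {s0,s1}, {s0,s2}, {s0,s3}, {s0,s1,s2}, {s0,s1,s3}, {s0,s2,s3} and {s0,s1,s2,s3}, with transitions labeled 0 and 1.]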

Page 24

Determinization

[Figure: deterministic automaton with eight states named 000, 001, 010, 011, 100, 101, 110 and 111, and transitions labeled 0 and 1.]

Page 25

Translating regular expressions into automata

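[Figure: automata for L1 and L2 combined into automata for L1∪L2 and L1.L2, and an automaton for L turned into an automaton for L*.]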

Page 26

Automatic translation

(a|b).(a|b) = (a∪b)(a∪b) = (a+b).(a+b) = …

[Figure: automaton built from the expression, with transitions labeled a and b.]

Page 27

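Determinization with ε-transitions

[Figure: automaton with states s0 to s11; labeled transitions s1 -a-> s3, s2 -b-> s4, s7 -a-> s9, s8 -b-> s10; ε-transitions connect the remaining states.]

Add to each set the states reachable using ε-transitions.

[Figure: resulting automaton with states {s0,s1,s2}, {s3,s5,s6,s7,s8}, {s4,s5,s6,s7,s8}, {s9,s11} and {s10,s11}, and transitions labeled a and b.]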
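Adding the ε-reachable states amounts to taking an ε-closure of every set during determinization. In the Python sketch below the a and b edges are the ones visible on the slide, while the ε edges are an assumption chosen so that the closures come out as the state sets shown there; EPS, eps_closure and determinize are names invented for the example.

```python
from collections import deque

EPS = ""  # label used in this sketch for ε-transitions

# The a/b edges are from the slide; the ε edges are an assumption chosen to
# reproduce the state sets shown there.
NFA = {
    ("s0", EPS): {"s1", "s2"},
    ("s1", "a"): {"s3"},
    ("s2", "b"): {"s4"},
    ("s3", EPS): {"s5"},
    ("s4", EPS): {"s5"},
    ("s5", EPS): {"s6"},
    ("s6", EPS): {"s7", "s8"},
    ("s7", "a"): {"s9"},
    ("s8", "b"): {"s10"},
    ("s9", EPS): {"s11"},
    ("s10", EPS): {"s11"},
}

def eps_closure(states):
    """All states reachable from the given ones using only ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def determinize(initial, alphabet):
    """Subset construction that closes every set under ε-transitions."""
    start = eps_closure(initial)
    states, delta, queue = {start}, {}, deque([start])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            moved = {t for s in S for t in NFA.get((s, a), ())}
            if not moved:
                continue  # no transition on a: the result stays a partial DFA
            T = eps_closure(moved)
            delta[(S, a)] = T
            if T not in states:
                states.add(T)
                queue.append(T)
    return start, states, delta

start, states, delta = determinize({"s0"}, "ab")
print(sorted(start))  # ['s0', 's1', 's2']
print(len(states))    # 5
```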

Page 28

Minimization

Group all the states together.

Separate states according to available exit transitions.

Split a set in two if from some of its states one can reach another set and from the others one cannot.

Repeat until no set can be split.

[Figure: automaton with states p0, p1, p2, p3, p4 and transitions labeled a and b.]
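The steps above can be sketched as a partition refinement. In this Python sketch the five-state automaton is assumed to be the result of the previous determinization with its states renamed p0 to p4, and its exact edges and accepting states are assumptions, since the figure did not survive. The first split also separates accepting from non-accepting states, which is the standard starting point and coincides with the exit-transition split for this automaton.

```python
# Assumed automaton: the determinized automaton for (a|b).(a|b), renamed p0..p4.
DELTA = {
    ("p0", "a"): "p1", ("p0", "b"): "p2",
    ("p1", "a"): "p3", ("p1", "b"): "p4",
    ("p2", "a"): "p3", ("p2", "b"): "p4",
}
STATES = ["p0", "p1", "p2", "p3", "p4"]
ACCEPTING = {"p3", "p4"}
ALPHABET = "ab"

def split(blocks, key):
    """Split every block into sub-blocks of states that share the same key."""
    result = []
    for block in blocks:
        groups = {}
        for s in block:
            groups.setdefault(key(s), []).append(s)
        result.extend(groups.values())
    return result

def minimize():
    blocks = [list(STATES)]  # group all the states together
    # Separate by acceptance and by which exit transitions are available.
    blocks = split(blocks, lambda s: (s in ACCEPTING,
                                      tuple(a for a in ALPHABET if (s, a) in DELTA)))
    while True:  # repeat until no set can be split
        index = {s: i for i, block in enumerate(blocks) for s in block}
        # Two states stay together only if each letter leads them to the same set.
        refined = split(blocks, lambda s: tuple(index.get(DELTA.get((s, a)))
                                                for a in ALPHABET))
        if len(refined) == len(blocks):
            return blocks
        blocks = refined

print(minimize())  # [['p0'], ['p1', 'p2'], ['p3', 'p4']]
```

Each resulting set becomes one state of the minimized automaton, so five states shrink to three here.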

Page 29

Minimization

Group all the states together.

{p0, p1, p2, p3, p4}.

[Figure: automaton with states p0, p1, p2, p3, p4 and transitions labeled a and b.]

Page 30

Minimization

Separate states according to available exit transitions.

[Figure: automaton with states p0, p1, p2, p3, p4 and transitions labeled a and b.]

Page 31

Minimization

[Figure: automaton with states p0, p1, p2, p3, p4 and transitions labeled a and b.]

Split a set in two if from some of its states one can reach another set and from the others one cannot.

Repeat until no set can be split.

Page 32

Can minimize now

[Figure: automaton with transitions labeled a and b.]

Page 33

Lex

Declarations

%%

Translation rules

%%

Auxiliary procedures

Page 34

Lex behavior

Lex source program (lex.l) → Lex → lex.yy.c

lex.yy.c → C compiler → a.out

Input stream → a.out → Output tokens

Page 35

Lex behavior

Translates the definitions into an automaton.

The automaton looks for the longest matching string.

Either returns some value to the reading program (parser), or looks for the next token.

Lookahead operator: x/y allows the token x only if y follows it (but y is not part of the token).
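The longest-match behavior can be imitated in a few lines of Python. This is only a sketch of the idea, not how Lex is implemented (Lex compiles all rules into one automaton); the rule list and the function names are hypothetical. Ties between rules of equal match length go to the earlier rule, which is also how Lex resolves them.

```python
import re

# Hypothetical rules in priority order, as (token name, regular expression).
RULES = [
    ("do", re.compile(r"do")),
    ("id", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("num", re.compile(r"[0-9]+")),
    ("ws", re.compile(r"\s+")),
]

def next_token(text, pos):
    """Return (name, lexeme) for the longest match at pos; earlier rules win ties."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

def tokens(text):
    """Tokenize the whole text, dropping whitespace tokens."""
    pos, out = 0, []
    while pos < len(text):
        tok = next_token(text, pos)
        if tok is None:
            raise ValueError(f"no rule matches at position {pos}")
        pos += len(tok[1])
        if tok[0] != "ws":
            out.append(tok)
    return out

print(tokens("do done 42"))  # [('do', 'do'), ('id', 'done'), ('num', '42')]
```

For done, the do rule matches two characters but the id rule matches four, so the longer match wins. The x/y lookahead operator corresponds roughly to a lookahead assertion such as x(?=y) in Python's re syntax.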

Page 36

Lex Project

Project collection date: Feb 11th.

Work in pairs (singles).

Use lex to take a text and check whether the number of open parentheses of any kind is equal to the number of closed parentheses.

Exception: inside quotes. \" is not a closing quote.
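Before writing the lex rules, the counting logic can be prototyped outside lex. The Python sketch below is one reading of the task, under stated assumptions: "any kind" is taken to mean round, square and curly brackets counted together, quotes are double quotes, and the function name is invented for the example. It is not a substitute for the lex program the project asks for.

```python
OPEN, CLOSE = "([{", ")]}"  # assumed meaning of "parentheses of any kind"

def paren_counts_match(text):
    """True if opening and closing parentheses outside quotes are equal in number."""
    opened = closed = 0
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == "\\" and text[i + 1:i + 2] == '"':
                i += 1              # \" is not a closing quote: skip it
            elif ch == '"':
                in_quotes = False   # an unescaped quote closes the string
        elif ch == '"':
            in_quotes = True
        elif ch in OPEN:
            opened += 1
        elif ch in CLOSE:
            closed += 1
        i += 1
    return opened == closed

print(paren_counts_match('f(a[1]) "(" '))  # True
print(paren_counts_match('f((a)'))         # False
```

In lex the same effect would typically come from separate rules for quoted strings and for each bracket character, with counters updated in the rule actions.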