CPSC 325 - Compiler Tutorial 2 Scanner & Lex

Upload: webb

Post on 05-Jan-2016



Page 1: CPSC 325 - Compiler

CPSC 325 - Compiler

Tutorial 2

Scanner & Lex

Page 2: CPSC 325 - Compiler

Tokens

Token Stream: Each significant lexical chunk of the program is represented by a token

- Operators & Punctuation: { } ! + - = * ; : …
- Keywords: if while return goto
- Identifier: id & actual name
- Constants: kind & value; int, floating-point, character, string, …

Page 3: CPSC 325 - Compiler

Token – example 1

Input text

if( x >= y ) y = 10;

Token Stream

IF LP ID(x) GEQ ID(y) RP ID(y) Assign INT(10) SEMI
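The mapping from input text to token stream above can be sketched in a few lines of Python. This is only an illustration, not the tutorial's scanner: the token names come from the slide, while the regular expressions, the `KEYWORDS` table and the `tokenize` function are assumptions made for the example.

```python
import re

# Token names follow the slide; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("INT",    r"[0-9]+"),
    ("ID",     r"[A-Za-z]+"),
    ("GEQ",    r">="),
    ("Assign", r"="),
    ("LP",     r"\("),
    ("RP",     r"\)"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),      # white space is discarded, never returned
]
# Keywords listed on the Tokens slide; matched as ID first, then reclassified.
KEYWORDS = {"if": "IF", "while": "WHILE", "return": "RETURN", "goto": "GOTO"}

MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return the token stream for text as a list of strings."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue
        if kind == "ID" and lexeme in KEYWORDS:
            tokens.append(KEYWORDS[lexeme])          # keyword token, no attribute
        elif kind in ("ID", "INT"):
            tokens.append(f"{kind}({lexeme})")       # token kind plus its value
        else:
            tokens.append(kind)
    return tokens


print(" ".join(tokenize("if( x >= y ) y = 10;")))
```

Identifiers and constants carry their lexeme as an attribute, while operators, punctuation and keywords are fully described by the token kind alone.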

Page 4: CPSC 325 - Compiler

Parser

Tokens

IF LP ID(x) GEQ ID(y) RP ID(y) Assign INT(10) SEMI

Parse tree (figure):

IfStmt
  >=
    ID(x)
    ID(y)
  assign
    ID(y)
    INT(10)

Page 5: CPSC 325 - Compiler

Sample Grammar

program ::= statement | program statement
statement ::= assignStmt | ifStmt
assignStmt ::= id = expr ;
ifStmt ::= if ( expr ) statement
expr ::= id | int | expr + expr
id ::= a | b | … | y | z
int ::= 1 | 2 | … | 9 | 0
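To see how a parser might consume tokens for this grammar, here is a small recursive-descent sketch in Python. It is not taken from the tutorial. It assumes the input is already split into token strings, and because the rule for expr is left-recursive and ambiguous it rewrites it as "operand, then zero or more + operand", treating + as left-associative. The function name `parse_program` and the tuple-shaped trees are made up for the example.

```python
def parse_program(tokens):
    """Parse a list of token strings; return a list of statement trees."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def is_id(tok):      # id ::= a | b | ... | z  (a single lowercase letter)
        return tok is not None and len(tok) == 1 and "a" <= tok <= "z"

    def is_int(tok):     # int ::= 0 | 1 | ... | 9  (a single digit)
        return tok is not None and len(tok) == 1 and "0" <= tok <= "9"

    def operand():
        nonlocal pos
        tok = peek()
        if is_id(tok) or is_int(tok):
            pos += 1
            return ("id" if is_id(tok) else "int", tok)
        raise SyntaxError(f"expected id or int, found {tok!r}")

    def expr():
        # Assumption: expr ::= expr + expr is handled as operand ('+' operand)*
        left = operand()
        while peek() == "+":
            expect("+")
            left = ("+", left, operand())
        return left

    def statement():
        nonlocal pos
        if peek() == "if":               # ifStmt ::= if ( expr ) statement
            expect("if")
            expect("(")
            cond = expr()
            expect(")")
            return ("ifStmt", cond, statement())
        if is_id(peek()):                # assignStmt ::= id = expr ;
            target = ("id", peek())
            pos += 1
            expect("=")
            value = expr()
            expect(";")
            return ("assignStmt", target, value)
        raise SyntaxError(f"expected a statement, found {peek()!r}")

    statements = [statement()]           # a program has at least one statement
    while peek() is not None:
        statements.append(statement())
    return statements


print(parse_program("if ( x + 1 ) y = 2 ;".split()))
```

Each nonterminal becomes one function, which is why the parser is easy to write once the scanner has already removed white space and grouped characters into tokens.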

Page 6: CPSC 325 - Compiler

Why Separate the Scanner and Parser?

Simplicity & Separation of Concerns
- Scanner hides details from parser (comments, whitespace, input files, etc.)
- Parser is easier to build; has simpler input stream

Efficiency
- Scanner can use simpler, faster design
  (But still often consumes a surprising amount of the compiler’s total execution time)

Page 7: CPSC 325 - Compiler

Principle of Longest Match

In most languages, the scanner should pick the longest possible string to make up the next token if there is a choice.

Example: return apple != banana;

Should be recognized as 5 tokens:

RETURN ID(apple) NEQ ID(banana) SEMI

Not more (not parts of words or identifiers, and not ! and = as separate tokens)
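The longest-match rule can be tried out with a short Python sketch. It is an illustration under assumptions, not the tutorial's code: the pattern list and the `scan` function are invented for the example. At each position every pattern is tried, the longest match wins, and a tie goes to the pattern listed first, which is also how Lex breaks ties.

```python
import re

# Order matters only for ties: RETURN is listed before ID so that the
# exact word "return" is a keyword, while "returns" is still an identifier.
PATTERNS = [
    ("RETURN", re.compile(r"return")),
    ("ID",     re.compile(r"[a-zA-Z]+")),
    ("NEQ",    re.compile(r"!=")),
    ("NOT",    re.compile(r"!")),
    ("Assign", re.compile(r"=")),
    ("SEMI",   re.compile(r";")),
]


def scan(text):
    """Tokenize text, always taking the longest match at each position."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_kind, best_end = None, pos
        for kind, pattern in PATTERNS:
            m = pattern.match(text, pos)
            # Strictly longer only, so an earlier pattern keeps a tie.
            if m and m.end() > best_end:
                best_kind, best_end = kind, m.end()
        if best_kind is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        lexeme = text[pos:best_end]
        tokens.append(f"ID({lexeme})" if best_kind == "ID" else best_kind)
        pos = best_end
    return tokens


print(scan("return apple != banana;"))
```

With this rule `!=` becomes one NEQ token rather than NOT followed by Assign, and `returns` is scanned as a single identifier rather than the keyword followed by `s`.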

Page 8: CPSC 325 - Compiler

Scanner DFA Example (1)

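[Figure: DFA fragment] Start state 0 loops on white space or comments.
- end of input: state 0 to state 1, Accept EOF
- ( : state 0 to state 2, Accept LP
- ) : state 0 to state 3, Accept RP
- ; : state 0 to state 4, Accept SEMI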

Page 9: CPSC 325 - Compiler

Scanner DFA Example (2)

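[Figure: DFA fragment] Start state 0 loops on white space or comments.
- ! : state 0 to state 5
- = : state 5 to state 6, Accept NEQ
- other: state 5 to state 7, Accept NOT
- < : state 0 to state 8
- = : state 8 to state 9, Accept LEQ
- other: state 8 to state 10, Accept LESS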
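One way to turn this DFA fragment into code is a transition table. The Python sketch below is an assumption-laden illustration: the state numbers 5 to 10 are read off the figure labels, comments are not handled, and the names `TRANSITIONS`, `ACCEPT` and `scan_operator` are invented here. The "other" edges accept without consuming the lookahead character, so it is left in place for the next token.

```python
# TRANSITIONS[state][input] -> next state; "other" means any other input.
TRANSITIONS = {
    0: {"!": 5, "<": 8},
    5: {"=": 6, "other": 7},
    8: {"=": 9, "other": 10},
}
ACCEPT = {6: "NEQ", 7: "NOT", 9: "LEQ", 10: "LESS"}


def scan_operator(text, pos):
    """Run the DFA from state 0 at text[pos]; return (token, next position)."""
    state = 0
    while state not in ACCEPT:
        ch = text[pos] if pos < len(text) else ""   # "" stands for end of input
        if state == 0 and ch.isspace():
            pos += 1                                # state 0 loops on white space
            continue
        row = TRANSITIONS[state]
        if ch and ch in row:
            state = row[ch]
            pos += 1                                # consume the character
        elif "other" in row:
            state = row["other"]                    # lookahead is NOT consumed
        else:
            raise ValueError(f"no transition from state {state} on {ch!r}")
    return ACCEPT[state], pos


print(scan_operator("!= b", 0))
print(scan_operator("!b", 0))
```

For `!b` the scanner stops in state 7 with the position still pointing at `b`, which is what lets the identifier be scanned next.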

Page 10: CPSC 325 - Compiler

Scanner DFA Example (3)

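[Figure: DFA fragment] Start state 0 loops on white space or comments.
- [0-9] : state 0 to state 11
- [0-9] : state 11 loops to itself
- other: state 11 to state 12, Accept INT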

Page 11: CPSC 325 - Compiler

Scanner DFA Example (4)

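[Figure: DFA fragment] Start state 0 loops on white space or comments.
- [a-zA-Z] : state 0 to state 13
- [a-zA-Z] : state 13 loops to itself
- other: state 13 to state 14, Accept ID or keyword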
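The INT and ID-or-keyword fragments in Examples (3) and (4) both loop on one character class and accept on the first character outside it. A hand-coded Python sketch of that idea follows. The function names and the `KEYWORDS` table are assumptions for illustration; the keyword list is the one given on the Tokens slide. A common approach, used here, is to scan every word as an identifier and then look it up in a keyword table.

```python
KEYWORDS = {"if": "IF", "while": "WHILE", "return": "RETURN", "goto": "GOTO"}


def is_letter(ch):
    return "a" <= ch <= "z" or "A" <= ch <= "Z"


def is_digit(ch):
    return "0" <= ch <= "9"


def scan_int(text, pos):
    """Consume [0-9]+ starting at pos; return (token, next position)."""
    start = pos
    while pos < len(text) and is_digit(text[pos]):   # loop on [0-9]
        pos += 1
    if pos == start:
        raise ValueError(f"expected a digit at position {start}")
    return f"INT({text[start:pos]})", pos            # "other" seen: accept INT


def scan_word(text, pos):
    """Consume [a-zA-Z]+ starting at pos; return (token, next position)."""
    start = pos
    while pos < len(text) and is_letter(text[pos]):  # loop on [a-zA-Z]
        pos += 1
    if pos == start:
        raise ValueError(f"expected a letter at position {start}")
    lexeme = text[start:pos]
    # Accept: a keyword if the lexeme is in the table, otherwise an identifier.
    return KEYWORDS.get(lexeme, f"ID({lexeme})"), pos


print(scan_word("while(x)", 0))
print(scan_word("whilex = 1", 0))
print(scan_int("10;", 0))
```

The table lookup keeps the DFA small: there is no need for separate states that spell out each keyword letter by letter.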

Page 12: CPSC 325 - Compiler

Lex/Flex

Use Flex instead of Lex
Use Bison instead of yacc
When compiling, link to the library (-ll; Flex's own library is usually linked with -lfl)

flex file.lex
gcc -o object lex.yy.c -ll
./object

Page 13: CPSC 325 - Compiler

Lex - Structure

Declarations/Definitions

%%

Rules/Productions

- Lex expression

- white space

- C statement (optional)

%%

Additional Code/Subroutines

Page 14: CPSC 325 - Compiler

Lex – Basic operators

* - zero or more occurrences
. - "ANY" character (except newline)
.* - matches any sequence
| - separator between alternatives
+ - one or more occurrences (a+ is equivalent to aa*)
? - zero or one of something (b? is equivalent to b | null)
[ ] - choice, so [12345] is (1|2|3|4|5). (Note: [*+] represents a choice between star and plus; inside brackets they lose their special meaning.)
- - range inside brackets, so [a-zA-Z] is a to z and A to Z, all the letters
\ - escape, so \* matches *, and \. matches a period or decimal point
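These operators can be tried out quickly with Python's `re` module, used here only as a stand-in: for the basic operators listed above its syntax coincides with Lex, although the two differ elsewhere (for example Lex's quoted strings and named definitions). The helper `matches` is a made-up name for this sketch.

```python
import re


def matches(pattern, text):
    """True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# (pattern, input, expected result)
EXAMPLES = [
    (r"a*", "", True),                  # * : zero or more
    (r"a+", "", False),                 # + : one or more
    (r"a+", "aaa", True),
    (r"ab?", "a", True),                # ? : zero or one
    (r"ab?", "abb", False),
    (r".", "x", True),                  # . : any character ...
    (r".", "\n", False),                # ... except newline
    (r"if|while", "while", True),       # | : alternatives
    (r"[12345]", "3", True),            # [ ] : choice
    (r"[*+]", "*", True),               # * and + are literal inside [ ]
    (r"[a-zA-Z]+", "Lex", True),        # - : range inside [ ]
    (r"[a-zA-Z]+", "Lex2", False),
    (r"\*", "*", True),                 # \ : escape
    (r"[0-9]+\.[0-9]+", "3.14", True),  # \. is a literal decimal point
    (r"[0-9]+\.[0-9]+", "3x14", False),
]

for pattern, text, expected in EXAMPLES:
    assert matches(pattern, text) == expected, (pattern, text)
print("all operator examples behave as described")
```

The last two rows show why the escape matters: without the backslash, `.` would match any character and `3x14` would be accepted as a number.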