cpsc 325 - compiler

Post on 05-Jan-2016

CPSC 325 - Compiler

Tutorial 2

Scanner & Lex

Tokens

Token Stream: Each significant lexical chunk of the program is represented by a token
- Operators & Punctuation: { } ! + - = * ; : …
- Keywords: if while return goto
- Identifier: id & actual name
- Constants: kind & value; int, floating-point, character, string, …


Token – example 1

Input text

if( x >= y ) y = 10;

Token Stream

IF LP ID(x) GEQ ID(y) RP ID(y) Assign INT(10) SEMI
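A minimal Python sketch of how a scanner could produce this token stream from the input text. The token names come from the slide; the regular expressions, the `tokenize` function, and the use of Python are assumptions made for illustration only. Python's `re` alternation takes the first alternative that matches rather than the longest one, so the order of the table matters here (IF before ID, GEQ before Assign).

```python
import re

# Token names follow the slide; the patterns are assumptions for this sketch.
# Order matters: earlier entries win when more than one could match.
TOKEN_SPEC = [
    ("IF",     r"if\b"),
    ("ID",     r"[a-zA-Z]+"),
    ("INT",    r"[0-9]+"),
    ("GEQ",    r">="),
    ("Assign", r"="),
    ("LP",     r"\("),
    ("RP",     r"\)"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return the token stream for text as a list of strings."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind = m.lastgroup
        if kind in ("ID", "INT"):
            # Identifiers and constants carry their actual text as a value.
            tokens.append(f"{kind}({m.group()})")
        elif kind != "SKIP":
            tokens.append(kind)
        pos = m.end()
    return tokens


print(" ".join(tokenize("if( x >= y ) y = 10;")))
# prints: IF LP ID(x) GEQ ID(y) RP ID(y) Assign INT(10) SEMI
```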

Parser

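Tokens

IF LP ID(x) GEQ ID(y) RP ID(y) Assign INT(10) SEMI

[Diagram: the parser turns the tokens into a syntax tree]

IfStmt
- >= with children ID(x) and ID(y)
- assign with children ID(y) and INT(10)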

Sample Grammar

program ::= statement | program statement
statement ::= assignStmt | ifStmt
assignStmt ::= id = expr ;
ifStmt ::= if ( expr ) statement
expr ::= id | int | expr + expr
id ::= a | b | … | y | z
int ::= 1 | 2 | … | 9 | 0
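One way to read this grammar is as a small recursive-descent parser. The Python sketch below is only an illustration under stated assumptions: the input is already split into token strings, the left-recursive program rule is handled with a loop, and the ambiguous rule expr + expr is read as left-associative. The function names and the tuple-based tree shape are hypothetical.

```python
LETTERS = set("abcdefghijklmnopqrstuvwxyz")  # id  ::= a | b | ... | y | z
DIGITS = set("0123456789")                   # int ::= 1 | 2 | ... | 9 | 0


def parse_program(tokens):
    """Parse a list of token strings into a list of statement trees."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expect(tok):
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        advance()

    def parse_operand():
        if peek() in LETTERS:
            return ("id", advance())
        if peek() in DIGITS:
            return ("int", int(advance()))
        raise SyntaxError(f"expected id or int, found {peek()!r}")

    def parse_expr():
        # expr ::= id | int | expr + expr, read here as left-associative
        left = parse_operand()
        while peek() == "+":
            advance()
            left = ("+", left, parse_operand())
        return left

    def parse_statement():
        if peek() == "if":
            # ifStmt ::= if ( expr ) statement
            advance()
            expect("(")
            cond = parse_expr()
            expect(")")
            return ("ifStmt", cond, parse_statement())
        if peek() in LETTERS:
            # assignStmt ::= id = expr ;
            name = advance()
            expect("=")
            value = parse_expr()
            expect(";")
            return ("assignStmt", name, value)
        raise SyntaxError(f"expected a statement, found {peek()!r}")

    # program ::= statement | program statement  (one or more statements)
    statements = [parse_statement()]
    while peek() is not None:
        statements.append(parse_statement())
    return statements


tree = parse_program("if ( x + 1 ) y = 2 ; z = y ;".split())
print(tree)
```

The sketch accepts only single-letter identifiers and single-digit integers, because that is all the id and int rules of the grammar allow.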

Why Separate the Scanner and Parser?

Simplicity & Separation of Concerns
- Scanner hides details from parser (comments, whitespace, input files, etc.)
- Parser is easier to build; has simpler input stream

Efficiency
- Scanner can use simpler, faster design

(But still often consumes a surprising amount of the compiler’s total execution time)

Principle of Longest Match

In most languages, the scanner should pick the longest possible string to make up the next token if there is a choice.

Example: return apple != banana;

Should be recognized as 5 tokens:

RETURN ID(apple) NEQ ID(banana) SEMI

Not more (not parts of words or identifiers, and not ! and = as separate tokens).
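The longest-match rule can be sketched in a few lines of Python. This is an illustrative sketch, not the course's scanner: the operator table, the keyword set, and the `scan` function are assumptions. At each position the code collects every operator that matches and keeps the longest, and it reads a whole run of letters before deciding between keyword and identifier.

```python
import string

# Hypothetical operator table for this sketch; "!" and "!=" share a prefix.
OPERATORS = {"!": "NOT", "!=": "NEQ", "=": "ASSIGN", "<": "LESS", "<=": "LEQ", ";": "SEMI"}
KEYWORDS = {"if", "while", "return", "goto"}


def scan(text):
    """Split text into tokens, always taking the longest possible lexeme."""
    tokens, pos = [], 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
        elif ch in string.ascii_letters:
            # Consume the whole word first, so "returns" is never split
            # into the keyword "return" plus an identifier "s".
            end = pos
            while end < len(text) and text[end] in string.ascii_letters:
                end += 1
            word = text[pos:end]
            tokens.append(word.upper() if word in KEYWORDS else f"ID({word})")
            pos = end
        else:
            # Among all operators that match here, keep the longest one.
            matches = [op for op in OPERATORS if text.startswith(op, pos)]
            if not matches:
                raise ValueError(f"unexpected character {ch!r} at position {pos}")
            longest = max(matches, key=len)
            tokens.append(OPERATORS[longest])
            pos += len(longest)
    return tokens


print(scan("return apple != banana;"))
# prints: ['RETURN', 'ID(apple)', 'NEQ', 'ID(banana)', 'SEMI']
```

With a space between the two characters, `scan("! =")` yields NOT followed by ASSIGN, since there is no longer a two-character lexeme to prefer.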

Scanner DFA Example (1)

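[Diagram: DFA with start state 0 and accepting states 1 to 4]

From state 0:
- white space or comments : stay in state 0
- end of input : Accept EOF
- ( : Accept LP
- ) : Accept RP
- ; : Accept SEMI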

Scanner DFA Example (2)

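[Diagram: DFA with states 5 to 10; the start state loops on white space or comments]

- ! followed by = : Accept NEQ
- ! followed by other : Accept NOT
- < followed by = : Accept LEQ
- < followed by other : Accept LESS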

Scanner DFA Example (3)

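[Diagram: DFA with states 11 and 12; the start state loops on white space or comments]

- [0-9] : start a number
- [0-9] : stay in the number state
- other : Accept INT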

Scanner DFA Example (4)

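[Diagram: DFA with states 13 and 14; the start state loops on white space or comments]

- [a-zA-Z] : start a word
- [a-zA-Z] : stay in the word state
- other : Accept ID or keyword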
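The four DFA examples can be combined into one table-driven scanner. The Python sketch below is one possible encoding, not the tutorial's own code: the state names stand in for the numbered states of the diagrams, comment skipping is left out, and `char_class`, `DELTA`, and `scan` are hypothetical names. An "other" edge is modelled as "no transition exists", in which case the token is accepted without consuming the lookahead character.

```python
import string


def char_class(ch):
    """Map a character to the input class used in the transition table."""
    if ch in string.ascii_letters:
        return "letter"
    if ch in string.digits:
        return "digit"
    if ch.isspace():
        return "space"
    return ch


# (state, input class) -> next state
DELTA = {
    ("START", "space"): "START",
    ("START", "("): "LP",
    ("START", ")"): "RP",
    ("START", ";"): "SEMI",
    ("START", "!"): "SAW_BANG",
    ("START", "<"): "SAW_LESS",
    ("SAW_BANG", "="): "NEQ",
    ("SAW_LESS", "="): "LEQ",
    ("START", "digit"): "IN_INT",
    ("IN_INT", "digit"): "IN_INT",
    ("START", "letter"): "IN_ID",
    ("IN_ID", "letter"): "IN_ID",
}
# States with an "other" edge: accept without consuming the next character.
ACCEPT_ON_OTHER = {"SAW_BANG": "NOT", "SAW_LESS": "LESS", "IN_INT": "INT", "IN_ID": "ID"}
# States that accept as soon as they are entered.
ACCEPT_NOW = {"LP", "RP", "SEMI", "NEQ", "LEQ"}
KEYWORDS = {"if", "while", "return", "goto"}


def scan(text):
    tokens, state, lexeme, pos = [], "START", "", 0
    while True:
        ch = text[pos] if pos < len(text) else None  # None marks end of input
        nxt = DELTA.get((state, char_class(ch))) if ch is not None else None
        if nxt is not None:
            pos += 1
            if nxt == "START":
                continue  # white space: nothing to record
            lexeme += ch
            if nxt in ACCEPT_NOW:
                tokens.append(nxt)
                state, lexeme = "START", ""
            else:
                state = nxt
        elif state in ACCEPT_ON_OTHER:
            kind = ACCEPT_ON_OTHER[state]
            if kind == "ID":
                tokens.append(lexeme.upper() if lexeme in KEYWORDS else f"ID({lexeme})")
            elif kind == "INT":
                tokens.append(f"INT({lexeme})")
            else:
                tokens.append(kind)
            state, lexeme = "START", ""
        elif ch is None:
            tokens.append("EOF")
            return tokens
        else:
            raise ValueError(f"unexpected character {ch!r} at position {pos}")


print(scan("if (x <= 10) ! y < z;"))
```

Keeping the transitions in a dictionary keeps the loop independent of the token set, so adding a token means adding table entries rather than new control flow.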

Lex/Flex

- Use Flex instead of Lex
- Use Bison instead of yacc
- When compiling, link to the library

flex file.lex
gcc -o object lex.yy.c -ll
./object

Lex - Structure

Declarations/Definitions

%%

Rules/Production

- Lex expression

- white space

- C statement (optional)

%%

Additional Code/Subroutines

Lex – Basic operators

* - zero or more occurrences
. - “ANY” character (except newline)
.* - matches any sequence
| - separator between alternatives
+ - one or more occurrences (a+ :== aa*)
? - zero or one of something (b? :== (b | null))
[ ] - choice, so [12345] is (1|2|3|4|5). Note: [*+] represents a choice between star and plus; inside brackets they lose their special meaning.
- - range, so [a-zA-Z] is a to z and A to Z, all the letters
\ - escape, so \* matches * and \. matches a period or decimal point
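These basic operators behave the same way in Python's `re` module, which makes them easy to try out interactively. The sketch below uses `re.fullmatch` as a stand-in for a Lex rule; that substitution is an assumption for illustration, and other Lex syntax (quoted strings, named definitions in braces) has no counterpart here. The `matches` helper is a hypothetical name.

```python
import re


def matches(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# * and +: zero or more versus one or more occurrences
print(matches(r"a*", ""), matches(r"a+", ""), matches(r"a+", "aaa"))

# ?: zero or one of something
print(matches(r"ab?", "a"), matches(r"ab?", "ab"), matches(r"ab?", "abb"))

# [ ] is a choice, equivalent to spelling out the alternatives with |
print(matches(r"[12345]", "3"), matches(r"1|2|3|4|5", "3"))

# Inside brackets, * and + are ordinary characters
print(matches(r"[*+]", "*"), matches(r"[*+]", "+"), matches(r"[*+]", "**"))

# - inside brackets gives a range
print(matches(r"[a-zA-Z]+", "Lex"), matches(r"[a-zA-Z]+", "Lex2"))

# \ removes the special meaning of the next character
print(matches(r"\*", "*"), matches(r"\.", "."), matches(r"\.", "x"))

# . matches any single character except newline; .* matches any sequence of them
print(matches(r".", "x"), matches(r".", "\n"), matches(r".*", "any text"))
```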
