ScanGen. ScanGen accepts descriptions of tokens written as regular expressions and produces tables for a finite...

ScanGen

Upload: damian-marsh

Posted on 16-Dec-2015




ScanGen

ScanGen accepts descriptions of tokens written as regular expressions and produces tables for a finite automaton driver program.

It was written by Gary Sevitsky in spring 1979 and modified and enhanced by Robert Gray in spring 1980.

Later changes were made by Charles Fischer in 1981 and 1982.


ScanGen

The user supplies the input to ScanGen as a file with three sections: Options, Character Classes, and Token Definitions.

Each token definition has the form TOKEN name {major,minor} = regular expression; the regular expression can include EXCEPT clauses and {TOSS} attributes.

An example of ScanGen input, for extended Micro, is given on page 61 of the textbook.


Options Section

The OPTIONS section is optional. The keyword OPTIONS is followed by one or more option names (which are not reserved words). The option names may appear in any order, separated by blanks or commas.


Class Definitions

The class definitions specify the character classes that make up the alphabet used by the regular expressions.

Character classes are sets of ASCII characters. As in the examples, they are defined by single characters within quotes or by ranges of characters.


Regular Expression definitions

Regular expression definitions specify the tokens. They are built from the character classes using the following operations: positive closure ("+"), Kleene closure ("*"), concatenation ("."), and union (","). Precedence can be overridden with parentheses.


Token Number

The major token number defines a token class.

The minor token number specifies the member of that class. If it is not specified, it defaults to 0.

Token numbers must be non-negative integers, and the same token number may be used for different tokens.

Tokens that are to be deleted (comments, spaces, etc.) are assigned a major token number of 0.


"NOT" operation

NOT is used in the definitions of "StrConst" and "RunOnString". It may only be used to complement a union of character classes. The complement is taken relative to the classes specified in the class definitions; the character class "EPSILON" is never included in a complement.


"TOSS" feature

The TOSS feature tells the scanner whether or not to append a character to the token string it is building. If a character is not to be appended, put "{TOSS}" after the name of its character class in the token definition.

A "{TOSS}" may only appear after the name of a character class or after "NOT(...)".

Careless use of the TOSS feature can lead to a toss/save conflict, in which the scanner would have to decide whether to keep or discard a character before it can tell which part of the definition that character matches.


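For example, a toss/save conflict would occur if "StrConst" were defined by:

Quote{TOSS} . (NOT(Quote, Linefeed), Quote.Quote{TOSS})*. Quote{TOSS}

This conflict can be seen by comparing the scanner's actions on the strings 'a' and 'a''b'. In both, the scanner reads a quote right after the a. In 'a' that quote closes the string and must be tossed; in 'a''b' it is the first quote of the pair Quote.Quote{TOSS} and must be saved. The scanner cannot tell the two cases apart until it reads the following character. The definition used in Example 2 writes the pair as Quote{TOSS}.Quote instead, so the first quote is tossed in both cases and the conflict disappears.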


Example 1

OPTIONS tables,list
CLASS
    letter = 'A'..'Z', 'a'..'z';
    digit = '0'..'9';
    blank = ' ';
DEFINITION
    TOKEN emptyspace {0} = blank+;
    TOKEN identifier {1} = letter.(letter, digit)*;
    TOKEN number {2} = digit+;


Example 2

OPTIONS List, tables
CLASS
    E = 'E', 'e';
    OtherLetter = 'A'..'D','F'..'Z','a'..'d','f'..'z';
    Digit = '0'..'9';
    Blank = ' ';
    Dot = '.';
    Plus = '+';
    Minus = '-';
    Quote = '''';
    Linefeed = 10;


DEFINITION
    TOKEN EmptySpace {0} = (Blank, Linefeed)+;
    Letter = E, OtherLetter;
    TOKEN Identifier {1} = Letter.(Letter,Digit)* EXCEPT 'BEGIN' {4}, 'END' {5};
    TOKEN IntConst {2,1} = Digit+;
    TOKEN RealConst {2,2} = IntConst.Dot.IntConst.
        (EPSILON, E.(EPSILON, Plus, Minus).IntConst);
    TOKEN StrConst {2,3} = Quote{TOSS}.
        (NOT(Quote, Linefeed),Quote{TOSS}.Quote)*. Quote{TOSS};
    TOKEN RunOnString {3} = Quote{TOSS}.
        (NOT(Quote, Linefeed), Quote{TOSS}.Quote)*. Linefeed{TOSS};


ScanGen Driver

The driver routine provides the actual scanner routine, which is called by the parser:

void scanner(codes *major, codes *minor, char *token_text)

It reads the input character stream, drives the finite automaton using the tables generated by ScanGen, and returns the token it finds.


ScanGen Tables

The finite automaton's transition table has the form:

next_state[NUMSTATES][NUMCHARS]

In addition, an action table tells the driver when a complete token has been recognized and what to do with the “lookahead” character:

action[NUMSTATES][NUMCHARS]


Action Table

The action table has six possible values:

ERROR: scan error.

MOVEAPPEND: current_token += ch and go on.

MOVENOAPPEND: discard ch and go on.

HALTAPPEND: current_token += ch; token found, return it.

HALTNOAPPEND: discard ch; token found, return it.

HALTREUSE: save ch for later reuse; token found, return it.

The driver program is listed on pages 65 and 66 of the textbook.


Output tables

The output file generated by ScanGen consists of the following five sections:

Section 1: Parameters for the Scanner
Section 2: Character Class Mapping
Section 3: Reserved Word to Token Mapping
Section 4: Reserved Word List
Section 5: Transition Table of the Minimal Deterministic Finite Automaton


Execute ScanGen

1. Download SCANGEN.zip and expand it into the Cygwin directory /usr/src/scangen.

2. Run ./scangen.exe < adacs.scan

3. When ScanGen prints the output below, type Tables at the `Output file` prompt:

s c a n g e n -- automatic lexical analyzer generator version 2.0 (12/82)

options used for this run: tables, optimize

construction of finite automaton completed

Output file `Tables': Tables


Compile and Test Scanner

1. Download scanner.example.rar and expand it into the Cygwin directory /usr/src/scanner.example.

2. Copy the Tables file from /usr/src/scangen into /usr/src/scanner.example.

3. Compile with the makefile: make

4. Run a.exe: ./a

5. Type the source file name: test