Problem 2 A Scanner / Parser for Simple C

Post on 21-Dec-2015


Outline
» Language syntax for SC
» Requirements for the scanner
» Requirements for the parser
» Companion files
» Main classes


Language Syntax for SC

0. Comments
1. Data types
2. Literals and identifiers
3. Operators
4. Control statements
5. Functions
6. Program syntax
7. Forward function declarations
8. Built-in library functions
9. Nested lexical scoping


0. Comments:

» // starts a comment; everything up to the end of the line is ignored.

1. Data types:

» int
» void
» There can be arrays of int. Ex: int x, y[10];
» There are no boolean variables, but expressions may have type boolean if they are the result of a comparison (e.g., x > y).


2. Literals and identifiers

IDENTIFIER:
» used for variable or function names
» the type names 'int' and 'void' are keywords
» format: (letter or _) (letter or _ or digit)*

(integer) CONSTANT:
» non-negative decimal number (< 2^31)
» legal: 23, 54, 0
» illegal: -10, 01, 001
» note: -10 is regarded as two tokens, - and 10

STRING_LITERAL:
» a " followed by a sequence of characters in which " and \ must be escaped by a preceding \, and finally followed by a matching ".
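The three token formats above can be tried out with ordinary Java regular expressions before they are written into the JLex specification. This is only a sketch: the class ScTokens and its method names are hypothetical and not part of the provided files, and it assumes that any character may follow a backslash inside a string and that only 'int' and 'void' are excluded from identifiers (a full scanner would presumably also reserve if, else, while and return).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the assignment files: checks one lexeme
// at a time against the SC token formats.
final class ScTokens {
    // (letter or _) (letter or _ or digit)*
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z_0-9]*");
    // 0, or a non-zero digit followed by digits: no sign, no leading zeros
    private static final Pattern CONSTANT = Pattern.compile("0|[1-9][0-9]*");
    // " ... " where every " or \ inside is preceded by a backslash.
    // Assumption: a backslash may escape any character.
    private static final Pattern STRING_LITERAL = Pattern.compile("\"([^\"\\\\]|\\\\.)*\"");

    static boolean isIdentifier(String s) {
        // The type names are keywords, so they are not identifiers.
        return IDENTIFIER.matcher(s).matches() && !s.equals("int") && !s.equals("void");
    }

    static boolean isConstant(String s) {
        if (!CONSTANT.matcher(s).matches()) {
            return false;
        }
        // The value must be below 2^31; more than 10 digits is always too large.
        return s.length() <= 10 && Long.parseLong(s) < (1L << 31);
    }

    static boolean isStringLiteral(String s) {
        return STRING_LITERAL.matcher(s).matches();
    }
}
```

Note that "-10" is rejected as a CONSTANT here, which matches the rule that the scanner returns it as two tokens.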


3. Operators

1. arithmetic: +, -, *, /
2. relational: ==, !=, <, >
3. logical: &&, ||, !

All binary arithmetic and logical operators are left-associative.

NOT (!) and UNARY_MINUS (-) are not associative; i.e., (- - 2) is illegal and you must use -(-2).

Arithmetic operators require integer operands, and logical operators require boolean operands. The relational operators == and != work on both boolean and int operands; < and > work only on int. All relational operators return a boolean.
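The operand rules above amount to a small type table. A minimal Java sketch of such a check follows; the class ScTypeRules and its Type enum are hypothetical (the provided code uses constants such as sym.INT instead), and it assumes that both operands of == and != must have the same type.

```java
// Hypothetical type-rule table for SC operators; not part of the provided files.
final class ScTypeRules {
    enum Type { INT, BOOL, ERROR }

    // Result type of "left op right", or ERROR if the operands are not allowed.
    static Type binaryResult(String op, Type left, Type right) {
        switch (op) {
            case "+": case "-": case "*": case "/":
                // arithmetic: int operands, int result
                return (left == Type.INT && right == Type.INT) ? Type.INT : Type.ERROR;
            case "<": case ">":
                // ordering: int operands, boolean result
                return (left == Type.INT && right == Type.INT) ? Type.BOOL : Type.ERROR;
            case "==": case "!=":
                // equality: int or boolean operands (assumed to be of the same type)
                return (left == right && left != Type.ERROR) ? Type.BOOL : Type.ERROR;
            case "&&": case "||":
                // logical: boolean operands, boolean result
                return (left == Type.BOOL && right == Type.BOOL) ? Type.BOOL : Type.ERROR;
            default:
                return Type.ERROR;
        }
    }

    // Result type of a unary operator applied to one operand.
    static Type unaryResult(String op, Type operand) {
        if (op.equals("-")) {
            return operand == Type.INT ? Type.INT : Type.ERROR;
        }
        if (op.equals("!")) {
            return operand == Type.BOOL ? Type.BOOL : Type.ERROR;
        }
        return Type.ERROR;
    }
}
```

With this table, a condition such as "if (x)" with an int x is rejected simply because the condition's type is INT rather than BOOL.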


Operator Precedence

Highest   - (unary minus)
          * /
          + -
          == != > <
          !
Lowest    && ||

You have to assign the appropriate precedences in CUP.

All math operators (+, -, *, /) and comparisons (==, !=, <, >) are defined only for integers. Logical operators (&&, ||, !) apply only to the results of comparisons (booleans).
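To see what grouping the precedence table implies, here is a small hand-written precedence-climbing sketch in Java that fully parenthesizes a token list. It is not how CUP works internally (CUP resolves precedence from declarations in the specification file); the class ScPrecedence and the numeric levels are hypothetical and only mirror the table above, with a larger number binding tighter.

```java
import java.util.Map;

// Hypothetical illustration of the SC precedence table; not generated by CUP.
final class ScPrecedence {
    // Binary operator levels; a larger number binds tighter.
    private static final Map<String, Integer> BINARY = Map.of(
            "*", 5, "/", 5,
            "+", 4, "-", 4,
            "==", 3, "!=", 3, "<", 3, ">", 3,
            "&&", 1, "||", 1);
    // "!" sits between the relational and the logical operators.
    private static final int NOT_LEVEL = 2;

    private final String[] toks;
    private int pos = 0;

    private ScPrecedence(String[] toks) { this.toks = toks; }

    // Returns the expression with explicit parentheses around every operator.
    static String group(String... tokens) {
        ScPrecedence p = new ScPrecedence(tokens);
        String result = p.expr(1);
        if (p.pos != tokens.length) {
            throw new IllegalArgumentException("unexpected token: " + tokens[p.pos]);
        }
        return result;
    }

    private String expr(int minLevel) {
        String left = operand();
        while (pos < toks.length) {
            Integer level = BINARY.get(toks[pos]);
            if (level == null || level < minLevel) break;
            String op = toks[pos++];
            String right = expr(level + 1);   // level + 1 makes the operator left-associative
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    private String operand() {
        String t = next();
        if (t.equals("!")) {
            rejectRepeat("!");                // "! ! x" is not allowed
            return "(!" + expr(NOT_LEVEL + 1) + ")";
        }
        if (t.equals("-")) {
            rejectRepeat("-");                // "- - 2" is not allowed
            return "(-" + primary(next()) + ")";
        }
        return primary(t);
    }

    private String primary(String t) {
        if (t.equals("(")) {
            String inner = expr(1);
            if (!next().equals(")")) throw new IllegalArgumentException("expected )");
            return inner;
        }
        if (BINARY.containsKey(t) || t.equals(")") || t.equals("!")) {
            throw new IllegalArgumentException("unexpected token: " + t);
        }
        return t;                             // identifier or constant
    }

    private String next() {
        if (pos >= toks.length) throw new IllegalArgumentException("unexpected end of input");
        return toks[pos++];
    }

    private void rejectRepeat(String op) {
        if (pos < toks.length && toks[pos].equals(op)) {
            throw new IllegalArgumentException(op + " " + op + " is not allowed");
        }
    }
}
```

For example, ScPrecedence.group("!", "x", "==", "y", "&&", "z") groups the comparison first, then the negation, then the conjunction, while group("-", "-", "2") is rejected.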


4. Control Statements

Assignment: lhs = Expression ;
Function call: foo(…);
Return: return [ expr ];
blockStatement: { statement_list }
statement_list: statement_list statement | /* empty */

SC has three control statements: if, if-else, and while.
1. if ( condition ) { statement_list }
2. if ( condition ) { statement_list } else { statement_list }
3. while ( condition ) { statement_list }


Control Statements

There can be any level of nesting of these control statements.

Curly braces are required for all SC control statements. Ex: the code
    if (x != 0) a = b + c;
should produce a syntax error. The correct syntax would be
    if (x != 0) { a = b + c; }

There is no implicit type casting in conditional statements. For instance, "if (x)" is not valid if x is an int. You need to write "if (x != 0)".


5. Functions

All functions must be declared with type int or void.

Function calls in SC can appear either on the RHS of an assignment or as a statement:
» x = a + foo() + 10; // foo must return an integer
» foo(); // foo may return either integer or void

A function call will have either 0 or 1 arguments.

Functions may have 'return' statements. There can be one or more return statements in the statement list. The syntax for the return statement is:
» return; // when the function returns void
» return exp; // when the function returns integer


6. Program Syntax:

A valid SC program consists of zero or more global variable declarations, followed by one or more function definitions.

The body of a function may contain local variable declarations, followed by the code.

The function structure:

[ int | void ] function_name( 0 or 1 parameters )
{
    local variable declarations; // optional
    statement list;
}


7. Forward Function Declarations:

A function has to be defined (or forward-declared) before you can call it. Ex: the following code should cause bar() to be marked as an undefined function.

    void foo() { bar(1); }
    void bar(int x) { y = x; }

Here, we need a forward declaration for bar() before we can call it.

    void bar( int x );
    void foo() { bar(1); }
    void bar( int y ) { y = 1; }

Forward declarations specify the function return type and parameter type. They must match the actual function definition. The name of the parameter in the forward declaration is not relevant.
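The matching rule for forward declarations can be sketched in Java as a table keyed by function name. The class FunctionTable and its methods are hypothetical; in the provided code this information presumably lives in the symbol table entry for the function. The sketch compares only return type and parameter type, since parameter names are not relevant, and it does not try to detect a function that is defined twice.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical record of function signatures seen so far; not part of the provided files.
final class FunctionTable {
    private final Map<String, String> known = new HashMap<>();

    // Builds "void(int)", "int()", ...; parameter names are deliberately left out.
    static String signature(String returnType, String paramType) {
        return returnType + "(" + (paramType == null ? "" : paramType) + ")";
    }

    // Records a forward declaration or a definition.
    // Returns false if it does not match an earlier declaration of the same name.
    boolean declare(String name, String returnType, String paramType) {
        String sig = signature(returnType, paramType);
        String earlier = known.putIfAbsent(name, sig);
        return earlier == null || earlier.equals(sig);
    }

    // A call is legal only if the function has already been declared or defined.
    boolean isCallable(String name) {
        return known.containsKey(name);
    }
}
```

In the first example above, isCallable("bar") is still false when the body of foo is parsed, so bar is reported as undefined; with the forward declaration in place the call is accepted and the later definition is checked against it.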


8. Built-in functions

There are three built-in void functions:
» printInt(int), printString(StringLiteral), printLine().

The compiler will automatically generate code for these functions if they are used.

printString() takes a literal constant string as its argument. Ex:

    int x;
    printInt(1); printInt(x); printInt(x+1);
    printString("foo");
    printLine();


9. Scoping rules

SC applies nested lexical scoping, where every pair of curly brackets creates a new nested scope (and can include new variable declarations).

Variables visible include those declared locally and in enclosing scopes. For instance:

A: int x;              // x is global
   void foo() {
B:   int x, y;
     if (x == 1)       // refers to x declared at B:
     {
C:     int x, z;
       x = y + z;      // x, z declared at C:; y declared at B:
     } else {
       int i, j;
       x = 3;          // x declared at B:; x, z at C: are not visible
     }
   }
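The lookup rule in this example (search the local scope first, then each enclosing scope) can be sketched in Java as a chain of maps. The class Scope below is hypothetical and much simpler than the provided SymTab; it stores only a label saying where each name was declared.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical nested scope; the provided SymTab class is the real implementation.
final class Scope {
    private final Scope parent;                       // null for the global scope
    private final Map<String, String> symbols = new HashMap<>();

    Scope(Scope parent) {
        this.parent = parent;
    }

    // Declares a name in this scope; returns false if it is already declared here.
    boolean declare(String name, String label) {
        return symbols.putIfAbsent(name, label) == null;
    }

    // Searches this scope, then each enclosing scope; returns null if not found.
    String lookup(String name) {
        for (Scope s = this; s != null; s = s.parent) {
            String label = s.symbols.get(name);
            if (label != null) {
                return label;
            }
        }
        return null;
    }
}
```

Building the scopes of the example (global A, function body B, the if block C, and the else block) and calling lookup("x") from the else block yields the declaration at B, while lookup("z") from there finds nothing.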


Requirements for the lexical analyzer

Identify the keywords, operators, identifiers, strings, constants and other necessary characters correctly.

Use JLex to generate the lexical analyzer.
» You may need to explicitly look for carriage returns on PC-based systems if the %notunix directive to the lexer is not working. You can do this by specifying "\r\n" instead of "\n" as the newline.

Goals:
1. find all legal tokens
2. handle comments
3. report unterminated strings

Tasks:
1. add comments
2. extend strings
3. extend numbers
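Goals 2 and 3 interact: a // inside a string literal is not a comment, and a string that is still open at the end of the line is an error. The Java sketch below shows that logic for a single line. It is only an illustration with a hypothetical class name; in the assignment this behaviour belongs in the JLex rules of mysc.lex, and the sketch assumes that a string literal may not span lines.

```java
// Hypothetical line-level illustration; the real scanner is generated by JLex.
final class LineScanner {
    // Returns the line without its trailing // comment.
    // Throws IllegalStateException if a string literal is not closed on this line.
    static String stripComment(String line) {
        boolean inString = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;                      // skip the escaped character
                } else if (c == '"') {
                    inString = false;         // matching closing quote
                }
            } else if (c == '"') {
                inString = true;              // opening quote
            } else if (c == '/' && i + 1 < line.length() && line.charAt(i + 1) == '/') {
                return line.substring(0, i);  // comment runs to the end of the line
            }
        }
        if (inString) {
            throw new IllegalStateException("unterminated string");
        }
        return line;
    }
}
```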


Requirements for the SC parser

Use CUP to generate the parser. Report simple syntax errors, and attempt to recover.

Goals:
» 1. accept all legal SC syntax
» 2. report simple syntax errors using the "error" token
   Exs: "illegal statement", "illegal expression", "illegal declaration", "missing semicolon"

Tasks:
» void functions
» array declarations
» multiple variable declarations
» nested lexical scoping
» arithmetic, logical, and relational operators
» forward function declarations


Implementation

You have been given a toy parser which parses a subset of the SC language. You need to enhance it to parse the full SC language. The files are listed below, with a brief description:

mysc.lex:
» The JLex specification file for implementing the scanner. Most of the basic tokens have been added for you. All you need to do is extend it for comments, strings, and better handling of identifiers and numbers.

mysc.cup:
» The CUP specification file. You need to extend this file in order to implement the parser. Most of your changes should be to this file. A lot of the grammar and actions have been provided.


SymTabEntry.java:
» Code for implementing symbol table entries. You should not need to modify this file.

SymTab.java:
» Code for implementing the symbol table. You should not need to modify this file.

ExpNode.java:
» Code for storing information with expression nodes. You should not need to modify this file.

Yylex.java, Parser.java, sym.java:
» Files created by JLex & CUP.

go.bat:
» Script file for compiling mycc.

goAll.bat:
» Script for compiling mycc & testing it on test*.in


goTest.bat:
» Script for testing mycc on test*.in

toy*.sc:
» Some sample input SC programs handled by the toy front end.

test*.sc:
» Some sample input SC programs you should try to parse.

test*.log:
» Log files created by mysc when using the goAll.bat script.

test*.out:
» Some sample output files from a fully implemented front end.


Main classes

1. Parser
» Generated by CUP, this is the class which performs the actual parse. All action code in your CUP grammar is executed as methods of class "parser".
» Important public fields include globalSymTab, currSymTab, and curType.
» These fields are used to store information to be transferred between different actions in CUP.
» The "parser" class also contains main(), the starting point of the user code.

2. sym
» Generated by CUP.
» Contains the constants for all the tokens in the grammar. Also used by JLex for the scanner.
» Useful constants: sym.INT, sym.BOOL, and sym.VOID.


Main classes (continued)

3. Yylex
» Generated by JLex; implements the scanner.

4. SymTab
» Stores all the symbols in a scope as a HashMap.
» The symbol table also keeps track of its parent.
» All local nested scopes are kept in a children list, except for function scopes, which are stored in the entry for the function symbol.


Main classes (continued)

5. SymTabEntry
» Stores information for each symbol.
» Generic information includes name (as a String) & type (sym.INT, etc). Also includes information specific to arrays & functions.
» Note: The SymTabEntry for a function stores the SymTab for that function.

6. ExpNode
» Stores information as to the type and value of an expression.
» Helpful for keeping track of intermediate expressions (e.g., 1+2*4).
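As an illustration of carrying a type and a value through intermediate expressions, the Java sketch below folds 1+2*4 bottom-up. The class ExpInfo is hypothetical and is not the provided ExpNode, whose actual fields may differ; here the value is simply the integer result when it is known while parsing, and null otherwise.

```java
// Hypothetical stand-in for an expression node; the provided ExpNode may differ.
final class ExpInfo {
    enum Type { INT, BOOL }

    final Type type;
    final Integer value;   // null when the value is not known while parsing

    private ExpInfo(Type type, Integer value) {
        this.type = type;
        this.value = value;
    }

    static ExpInfo constant(int v) {
        return new ExpInfo(Type.INT, v);
    }

    // An int variable: the type is known, the value is not.
    static ExpInfo variable() {
        return new ExpInfo(Type.INT, null);
    }

    // Combines two sub-expressions with an arithmetic operator.
    static ExpInfo arithmetic(char op, ExpInfo left, ExpInfo right) {
        if (left.type != Type.INT || right.type != Type.INT) {
            throw new IllegalArgumentException("arithmetic operators require integer operands");
        }
        if (left.value == null || right.value == null) {
            return new ExpInfo(Type.INT, null);
        }
        int a = left.value;
        int b = right.value;
        switch (op) {
            case '+': return constant(a + b);
            case '-': return constant(a - b);
            case '*': return constant(a * b);
            case '/': return b == 0 ? new ExpInfo(Type.INT, null) : constant(a / b);
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }
}
```

For 1+2*4 the parser would first reduce 2*4 to a node with value 8 and then combine it with 1, which mirrors the order in which the grammar actions run.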