advanced compilers sdd, scope and ast - cnuplas.cnu.ac.kr/courses/2017f/a_compilers/ac 4 sdd scope...

Advanced Compilers: SDD, Scope and AST. Fall 2017, Chungnam National Univ., Eun-Sun Cho


2

Syntax-Directed Definition

• Syntax-Directed Definition (SDD)

– Each production rule has a "to-do" list

• To-do = an associated semantic action (code)

– The to-do list is executed whenever the production rule is reduced (in LR parsing) or derived (in LL parsing)

– Commonly used in parser generators (e.g. yacc, ANTLR)

• The generated parser contains the semantic actions defined by the compiler developer

3

Semantic Actions

• For the actions of E → E + E

– The three E's need to be distinguished in the action code

– In yacc/bison: using $$, $1, $2, ... (ANTLR refers to rule elements by name or label instead)

e.g. expr ::= expr PLUS expr {$$ = $1 + $3;}

Example 1) Calculator written in yacc

4

expr : NUM               { $$ = $1; }
     | expr PLUS expr    { $$ = $1 + $3; }
     | expr MULT expr    { $$ = $1 * $3; }
     | LPAR expr RPAR    { $$ = $2; }
     ;

or

expr : NUM               { $$.val = $1.val; }
     | expr PLUS expr    { $$.val = $1.val + $3.val; }
     | expr MULT expr    { $$.val = $1.val * $3.val; }
     | LPAR expr RPAR    { $$.val = $2.val; }
     ;

We call the values stored in val "attributes".
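How a bottom-up parser could apply these actions can be sketched with an explicit value stack. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, and, unlike a real yacc value stack, operator tokens get no stack slot here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: mimics what a parser's value stack does on each reduction.
class ValueStackSketch {
    private final Deque<Integer> values = new ArrayDeque<>();

    // expr : NUM { $$ = $1; }  -- the number's value simply stays on the stack
    void shiftNum(int v) { values.push(v); }

    // expr : expr PLUS expr { $$ = $1 + $3; }
    void reducePlus() {
        int right = values.pop();   // $3
        int left = values.pop();    // $1
        values.push(left + right);  // $$
    }

    // expr : expr MULT expr { $$ = $1 * $3; }
    void reduceMult() {
        int right = values.pop();   // $3
        int left = values.pop();    // $1
        values.push(left * right);  // $$
    }

    int result() { return values.peek(); }

    // Evaluates 2 + 3 * 4, reducing the multiplication first.
    static int demo() {
        ValueStackSketch s = new ValueStackSketch();
        s.shiftNum(2);
        s.shiftNum(3);
        s.shiftNum(4);
        s.reduceMult();
        s.reducePlus();
        return s.result();
    }
}
```

Each reduction pops the attribute values of the right-hand side and pushes the value computed for the left-hand side, which is the bottom-up flow the actions describe.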

5

Yacc, bison

.y file format (same format as lex/flex):

declarations
%%
rules
%%
support code

Build flow: foo.y → (yacc, bison) → y.tab.c → (gcc) → a.out

yyparse() is the main routine of the generated parser

6

Declarations

• User-defined declarations: written between "%{" and "%}"

• Tokens – the terminals to be used in the grammar

– %token terminal1 terminal2 ...

– or %token terminal1 val1 terminal2 val2 ...

• Running yacc (or bison) with the option '-d' generates a header file (y.tab.h) with the token definitions, so that lex can use the same tokens

– Refer to the Internet materials!

• lex/yacc pair, flex/bison pair

If you want to use C++ in the actions, you had better use flex++/bison++ rather than lex/yacc

7

Declarations (cont'd)

• Start symbol

%start non-terminal

• Associativity (left, right, none)

%left TK_PLUS

%right TK_ASSIGNMENT

%nonassoc TK_LESSTHAN

• Precedence

%prec

– Used to define the priority of a rule: the rule takes the precedence of the token named after %prec

8

Declarations (cont'd)

• Attribute values – information attached to terminal/non-terminal symbols

• Delivered from the lexer (scanner), e.g.

%union {
    int ival;
    char *name;
    double dval;
}

– The union is used as the type YYSTYPE

• Type of non-terminals: used when a specific value needs to be delivered

%type <union_entry> non_terminal

e.g. %type <ival> IntNumber

9

Functions

• Main function

– yyparse()

• Error function

– yyerror(char *s);

• The value of the last token

– yylval, of type YYSTYPE (the %union declaration)

Note: yylval is assigned in the lex actions, e.g.

[a-z] { yylval.ival = yytext[0] - 'a';
        return TK_NAME; }   /* in lex */

Then the value of yylval is available in yacc as the attribute of that token

10

Rules

• Production rules

non-terminal : first_ | second_ | ... ;

non-terminal : ; /* ε-rule */

e.g. foo : production1 | ; /* nothing */

• Actions are executed when the rhs is matched

non-terminal : rhs {action routine} ;

• Conflicts: shift/reduce or reduce/reduce

e.g. 1) e : 'X' | e '+' e ;

• Is "X+X+X" "(X+X)+X" or "X+(X+X)"? (shift/reduce conflict)

e.g. 2) Z : X | Y ; X : 'a' ; Y : 'a' ;

• In yacc, a reduce/reduce conflict occurs: 'a' can be reduced to either X or Y

11

Attribute Values ($vars)

• Each terminal/non-terminal has attribute values

• In the action of the matched rule

– $$ = LHS

– $1 = first symbol of the RHS

– $2 = second symbol, etc.

– If actions are embedded as follows

A : B {...} C {...} ;

C's value is $3, because the embedded action {...} after B counts as $2

12

Example .y File: Calculator

%union {
    int value;
    char *symbol;
}

%type <value> exp term factor
%type <symbol> ident

...

exp : exp '+' term { $$ = $1 + $3; } ;
/* Note: $1 and $3 are ints here */

factor : ident { $$ = lookup(symbolTable, $1); } ;
/* Note: $1 is a char* here */

13

Elimination of the If-Else Conflict in C

• Binding each else to the closest if (the dangling-else problem)

– Change the grammar (possible but complex!),

– or use yacc directives

%nonassoc LOWER_THAN_ELSE
%nonassoc ELSE

statement : if expr statement %prec LOWER_THAN_ELSE
          | if expr statement ELSE statement

14

SDD Example 2) Type Declaration

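D → T id       { AddType(id, T.type);
                 D.type = T.type; }

   in yacc:    { AddType($2, $1.type);
                 $$.type = $1.type; }

D → D1 , id    { AddType(id, D1.type);
                 D.type = D1.type; }

T → int        { T.type = intType; }

T → float      { T.type = floatType; }

[Figure: parse tree for "int a, b". The root D expands to D , id; the inner D expands to T id; T expands to int. T.type = intType is passed up to the inner D.type and then to the root D.type.]

Values are propagated in the bottom-up direction.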

15

SDD Example 3) Type Declaration

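D → T L        { AddType(id, T.type);
                 D.type = T.type;
                 L.type = D.type; }

T → int        { T.type = intType; }

T → float      { T.type = floatType; }

L → L1 , id    { AddType(id, L1.type);
                 ??? }

L → id         { AddType(id, ???); }

[Figure: parse tree for "int a, b". The root D has the children T and L; T expands to int with T.type = intType; L expands to L , id, and the inner L expands to id. The type flows from T.type to D.type and is handed down through L.type.]

Values are propagated in both the top-down and the bottom-up direction.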
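One possible way to realise the top-down part is to pass the type as a parameter. The sketch below is an assumption-laden illustration, not the slide's answer to the "???" placeholders: it uses a hand-written parser, replaces the left-recursive L rule with a loop, and the names DeclSketch, typeOf, idList and declaration are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DeclSketch {
    // T -> int | float : T.type is synthesized from the keyword itself.
    static String typeOf(String keyword) {
        switch (keyword) {
            case "int": return "intType";
            case "float": return "floatType";
            default: throw new IllegalArgumentException("unknown type: " + keyword);
        }
    }

    // L -> id (',' id)* : the type arrives from the caller, i.e. as an inherited attribute.
    static void idList(List<String> tokens, int pos, String inheritedType,
                       Map<String, String> table) {
        while (pos < tokens.size()) {
            table.put(tokens.get(pos), inheritedType); // plays the role of AddType(id, L.type)
            pos += 2;                                  // skip the id and the following ","
        }
    }

    // D -> T L
    static Map<String, String> declaration(List<String> tokens) {
        Map<String, String> table = new LinkedHashMap<>();
        String type = typeOf(tokens.get(0)); // T.type, computed bottom-up
        idList(tokens, 1, type, table);      // handed down to L, top-down
        return table;
    }
}
```

For the token list int, a, ",", b this records both a and b with intType: the type is computed once at T and then travels down to every identifier.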

16

Attributes

• AST vs. SDD

– Fact 1: An AST can be defined using SDDs (e.g. the previous Example 1)

– Fact 2: An SDD can be viewed as a series of evaluations of the attributes attached to the nodes of an AST

• Categories of attributes (in A → X Y Z)

– Synthesized attributes

• Evaluated from the attribute values of the children (bottom-up)

• A.attr = f(X.attr, Y.attr, Z.attr);

• All attributes of terminals are assumed to be synthesized

– Inherited attributes

• Evaluated from the attribute values of the parent or siblings as well as the children

• Y.attr = f(A.attr, X.attr, Z.attr);

Implementation of Attribute Evaluation

• On-the-fly implementation (cf. rule-based method, parse-tree method)

– The evaluation order is the same as the AST traversal order

– Most efficient, but restricted to:

• S-attributed SDD: has only synthesized attributes

• L-attributed SDD: synthesized attributes, plus inherited attributes evaluated only from the parent and the left siblings

• Semantic analysis

– Ensures, by analysis, that program constructs (variables, objects, expressions, statements, ...) are used correctly

• Related to scopes and types

– Semantic analysis: "attribute evaluation + attribute value check"

• cf. single pass (with a single AST) vs. multiple passes (with separate ASTs)

17

Listener Style in ANTLR

18

Each method states what is to be done when a node of the parse tree is visited.

If the grammar is named MiniC, the interface MiniCListener and the skeleton class MiniCBaseListener are generated automatically.

For each nonterminal, enter... and exit... methods are generated automatically.

While the parse tree is traversed, the enter... method is invoked when the walk reaches a nonterminal node and the exit... method when the walk leaves it.
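The enter/exit calling order can be illustrated without ANTLR itself. The classes below are hand-rolled stand-ins with hypothetical names (PNode, SketchListener, SketchWalker); they are not the classes ANTLR generates and only mimic the idea of a listener being driven by a tree walk.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-rolled stand-in for a parse-tree node (NOT an ANTLR class).
class PNode {
    final String rule;
    final List<PNode> children = new ArrayList<>();

    PNode(String rule, PNode... kids) {
        this.rule = rule;
        for (PNode k : kids) children.add(k);
    }
}

// Hand-rolled stand-in for a generated listener interface.
interface SketchListener {
    void enter(PNode n);
    void exit(PNode n);
}

class SketchWalker {
    // Depth-first walk: enter before the children, exit after them.
    static void walk(PNode n, SketchListener l) {
        l.enter(n);
        for (PNode c : n.children) walk(c, l);
        l.exit(n);
    }

    // Records the order in which the listener methods are called.
    static List<String> trace(PNode root) {
        final List<String> log = new ArrayList<>();
        walk(root, new SketchListener() {
            public void enter(PNode n) { log.add("enter " + n.rule); }
            public void exit(PNode n) { log.add("exit " + n.rule); }
        });
        return log;
    }
}
```

Work that needs information from the children (synthesized attributes) fits naturally in the exit method, while work that prepares context for the children, such as opening a scope, fits in the enter method.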

19

Semantic Analysis

• Scope-related checks: check that a variable is not used before its definition, and that the same name is not defined twice in one scope

• Type-related checks: check that variables and the values assigned to them are type-compatible

[Figure: Source code → Lexical Analysis → Syntax Analysis → Abstract Syntax Tree + ... → Semantic Analysis → Errors]

20

Scope

• Identifiers

– Variables, constants, function names, labels, ...

• 'Lexical' scope

– A textual range of a program

• Statement block, function definition, source file, the whole program, ...

• Scope of an identifier

– The lexical scope within which the identifier can be referred to

e.g. scope of a variable

In a block (local variables), in a function definition (formal parameters), in a source file (global variables), in the whole program (extern variables)

cf. How about fields? methods?

21

Variable Scope: PL Review

{ int a;              /* scope of variable a: this block */
  ...
  { int b;            /* scope of variable b: the inner block */
    ...
  }
  ....
}

int foo(int n) {      /* scope of argument n: the function definition */
  ...
}

void foo() {          /* scope of label lab: the function body (in ANSI C) */
  ...
  goto lab;
  ...
  lab: i++;
  ...
  goto lab;
  ...
}

22

Symbol Tables

• Symbol tables

– Data structure for management of symbols and related information

– Scope and types of identifiers in the data structure are referenced in Semantic analysis and code generation

– Insertion : in variable declaration phase

– Lookup: when used in expressions and in other language structures

• Table entries: identifier name + info

– e.g.

NAME   KIND   TYPE             ATTRIBUTES
foo    func   int,int → int    extern
m      arg    int
n      arg    int              const
tmp    var    char             const

23

Scope Information in Symbol Tables

• Characteristics of block-structured languages

– Each block (lexical scope) has local variable declarations

⇒ Each lexical scope has a single symbol table

– Hierarchy of scopes:

• Each block (lexical scope) can have other subblocks

• Any variable declared in an enclosing block can be used

⇒ Hierarchy of symbol tables

int x;
void f(int m) {
    float x, y;
    ...
    { int i, j; ....; }
    { int x; l: ...; }
}
int g(int n) {
    char t;
    ... ;
}

24

Examples

Symbol tables for the program on the previous slide:

Global symtab
    x    var     int
    f    func    int → void
    g    func    int → int

func f symtab
    m    arg     int
    x    var     float
    y    var     float

    first block in f
        i    var    int
        j    var    int

    second block in f
        x    var    int
        l    label

func g symtab
    n    arg     int
    t    var     char

25

Error Checking

int x;
void f(int m) {
    float x, y;
    ...
    { int i, j; x = 1; }
    { int x; l: i = 2; }    /* Error! "undefined variable": i is not visible here */
}
int g(int n) {
    char t;
    x = 3;
}

(The symbol tables are the same as in the previous example.)

• Starting from the current scope, search upward along the hierarchy

• If no matching declaration is found by the time the root is reached ⇒ Error!
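The upward search can be sketched with one table per scope and a link to the enclosing scope. This is a minimal illustration; the class name Scope, its methods, and the use of plain strings for types are assumptions, not part of the slides.

```java
import java.util.HashMap;
import java.util.Map;

// One symbol table per lexical scope, linked to the enclosing scope.
class Scope {
    final Scope parent;                                   // null for the global scope
    final Map<String, String> symbols = new HashMap<>();  // name -> type

    Scope(Scope parent) { this.parent = parent; }

    // Returns false if the name is already declared in this same scope.
    boolean declare(String name, String type) {
        return symbols.putIfAbsent(name, type) == null;
    }

    // Searches this scope first, then each enclosing scope up to the root.
    String lookup(String name) {
        for (Scope s = this; s != null; s = s.parent) {
            String type = s.symbols.get(name);
            if (type != null) return type;
        }
        return null; // no declaration found: "undefined variable"
    }
}
```

With the example program, a lookup of x from the first block of f stops at f's float x, while a lookup of i from the second block reaches the root without a match and is reported as undefined.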

26

Symbol Table Implementation

• Essential operations

– Table construction: after building the AST

– Insertion: in the variable declaration phase

– Lookup: when used in expressions and in other language structures (checking)

cf. forward reference?

• For efficiency

– Identifier names in table entries

• Use a string pool, and keep only pointers into the pool in the table

– Local tables: hash tables

– Globally, an N-ary tree structure

• But a tree is too expensive!

• More efficient management is possible by exploiting locality of usage: note that once a scope has been left, its local table is no longer needed

Global Table Hierarchy: Using a Stack

27

[Figure: contents of the symbol-table stack over time, each line listed from bottom to top]
file
file, f()
file, f(), {int i,j;}
file, f(), {int x..}
file, g()
file
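Because a local table is no longer needed once its scope is left, the hierarchy can be kept as a stack of tables. The following is a small sketch under that assumption; ScopeStack and its method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Symbol tables kept on a stack: the innermost scope is on top.
class ScopeStack {
    private final Deque<Map<String, String>> tables = new ArrayDeque<>();

    void enterScope() { tables.push(new HashMap<>()); }

    // The local table is discarded as soon as the scope is left.
    void exitScope() { tables.pop(); }

    void declare(String name, String type) { tables.peek().put(name, type); }

    // ArrayDeque iterates from the most recently pushed table downwards.
    String lookup(String name) {
        for (Map<String, String> table : tables) {
            String type = table.get(name);
            if (type != null) return type;
        }
        return null; // undefined
    }

    int depth() { return tables.size(); }
}
```

Entering the file, f(), and a block pushes three tables; leaving the block pops its table, so names declared there are no longer found by later lookups.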


28

Semantic Analysis

• Semantic Analysis = Syntax Analysis + α

• In this chapter

– Abstract Syntax Tree (AST)

– Syntax-Directed Definition/Translation (SDD/SDT)

[Figure: Source code → Lexical Analysis → Syntax Analysis → Abstract Syntax Tree + ... → Semantic Analysis → Errors]

Abstract Syntax Tree

29

30

Parse Tree

• Parse tree

– Describes the derivation process

– Terminals are leaf nodes

– Non-terminals are interior nodes

– Cannot express the derivation order (e.g. the leftmost and the rightmost derivation give the same tree)

[Figure: parse tree, built from the non-terminals S and E, for the expression (1 + 2 + (3 + 4)) + 5]

31

Abstract Syntax Tree (AST)

• AST

– A parse tree without the superfluous information

        +
       / \
      +   5
     / \
    1   +
       / \
      2   +
         / \
        3   4

[Figure: for comparison, the parse tree of the same expression with the non-terminals S and E]

32

AST Data Structures – Java Example

abstract class Expr {}

class Add extends Expr {
    Expr left, right;
    Add(Expr L, Expr R) {
        left = L; right = R;
    }
}

class Num extends Expr {
    int value;
    Num(int v) { value = v; }
}

[Figure: object diagram of the AST: Add nodes (+) linked to Num nodes (N 5, N 1, ...)]

cf. Visitor Pattern
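As a pointer to what the visitor pattern could look like on such an AST, here is a self-contained sketch. The class names (ExprNode, NumNode, AddNode, ExprVisitor, EvalVisitor) are hypothetical variants of the classes above with an accept method added; the slides do not prescribe this design.

```java
// Visitor interface: one method per concrete node class.
interface ExprVisitor<R> {
    R visitNum(NumNode n);
    R visitAdd(AddNode n);
}

abstract class ExprNode {
    abstract <R> R accept(ExprVisitor<R> v);
}

class NumNode extends ExprNode {
    final int value;
    NumNode(int value) { this.value = value; }
    <R> R accept(ExprVisitor<R> v) { return v.visitNum(this); }
}

class AddNode extends ExprNode {
    final ExprNode left, right;
    AddNode(ExprNode left, ExprNode right) { this.left = left; this.right = right; }
    <R> R accept(ExprVisitor<R> v) { return v.visitAdd(this); }
}

// One operation over the tree, kept outside the node classes.
class EvalVisitor implements ExprVisitor<Integer> {
    public Integer visitNum(NumNode n) { return n.value; }
    public Integer visitAdd(AddNode n) {
        return n.left.accept(this) + n.right.accept(this);
    }
}
```

A new operation, such as type checking or pretty-printing, becomes another ExprVisitor implementation, so the node classes stay unchanged.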

33

AST Data Structures – C Example 1

#define MAX 2

struct tokenType {
    int tokenNumber;
    char *tokenValue;
};

typedef struct nodeType {
    struct tokenType token;           // only meaningful tokens
    struct nodeType *children[MAX];   // here, MAX == 2
} nodeType;

// Space waste problem
// Might be serious for a "Statement List" (rather than for ADD nodes)
// What about using a linked list for a node? Performance overhead!

[Figure: tree of nodes (+,0), (+,0), (+,0), ... with leaves (N,1) and (N,5)]

34

AST Data Structures – C Example 2

struct tokenType {
    int tokenNumber;
    char *tokenValue;
};

typedef struct nodeType {
    struct tokenType token;      // only meaningful tokens
    struct nodeType *son;        // first child
    struct nodeType *brother;    // next sibling
} nodeType;

// An n-ary tree is represented as a binary tree

[Figure: tree of nodes (+,0), (+,0), (+,0), ... with leaves (N,1) and (N,5), linked through son and brother pointers]
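The son/brother representation can be sketched in Java as well (the language used for the added examples in this transcript). The class SibNode and its helper methods are hypothetical; only the two links correspond to the C struct above.

```java
// First-child / next-sibling node: any number of children with two links per node.
class SibNode {
    final String label;
    SibNode son;      // first child
    SibNode brother;  // next sibling

    SibNode(String label) { this.label = label; }

    // Appends a child at the end of the child list; returns this for chaining.
    SibNode addChild(SibNode child) {
        if (son == null) {
            son = child;
        } else {
            SibNode last = son;
            while (last.brother != null) last = last.brother;
            last.brother = child;
        }
        return this;
    }

    // Counts the children by following the brother links.
    int childCount() {
        int n = 0;
        for (SibNode c = son; c != null; c = c.brother) n++;
        return n;
    }
}
```

A statement list with many statements then costs two links per node, however many children it has, instead of a fixed-size array of child pointers.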

35

Building an AST from SDD

expr ::= NUM {$$ = new Num($1.val); }

expr ::= expr PLUS expr {$$ = new Add($1, $3); }

expr ::= expr MULT expr {$$ = new Mul($1, $3); }

expr ::= LPAR expr RPAR {$$ = $2; }
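The same tree-building idea can be tried without a parser generator. The sketch below swaps in a hand-written recursive-descent (LL) parser for the LR actions above and resolves precedence with separate expr/term/factor methods; all class names (Ast, NumAst, BinAst, AstBuilderSketch) are hypothetical, and error handling is omitted.

```java
import java.util.List;

abstract class Ast {}

class NumAst extends Ast {
    final int value;
    NumAst(int value) { this.value = value; }
    public String toString() { return Integer.toString(value); }
}

class BinAst extends Ast {
    final String op;
    final Ast left, right;
    BinAst(String op, Ast left, Ast right) { this.op = op; this.left = left; this.right = right; }
    public String toString() { return "(" + op + " " + left + " " + right + ")"; }
}

class AstBuilderSketch {
    private final List<String> tokens;
    private int pos = 0;

    AstBuilderSketch(List<String> tokens) { this.tokens = tokens; }

    private boolean at(String t) { return pos < tokens.size() && tokens.get(pos).equals(t); }

    // expr -> term ('+' term)*
    Ast expr() {
        Ast node = term();
        while (at("+")) { pos++; node = new BinAst("+", node, term()); }
        return node;
    }

    // term -> factor ('*' factor)*
    Ast term() {
        Ast node = factor();
        while (at("*")) { pos++; node = new BinAst("*", node, factor()); }
        return node;
    }

    // factor -> NUM | '(' expr ')'
    Ast factor() {
        if (at("(")) {
            pos++;               // skip "("
            Ast inner = expr();
            pos++;               // skip ")" (no error handling in this sketch)
            return inner;        // like $$ = $2: parentheses leave no node
        }
        return new NumAst(Integer.parseInt(tokens.get(pos++)));
    }
}
```

As in the action for LPAR expr RPAR, the parentheses only steer the parse and do not appear in the resulting tree.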