Compilation Encapsulation Or: Why Every Component Should Just Do Its Damn Job

Upload: galena

Post on 23-Feb-2016



Page 1: Compilation Encapsulation

Compilation Encapsulation

Or: Why Every Component Should Just Do Its Damn Job

Page 2: Compilation Encapsulation

“when a negative int literal (e.g. -5) appears in the code, should it be a single integer token whose value is -5 or as two tokens, minus and an integer whose value is 5?”

Page 3: Compilation Encapsulation

Well, in theory…

We can write a lexer (maybe not with flex) with lookbehind that makes sure the previous token was neither a number nor a variable. (Or the end of a function call, or a field reference. Pretty complicated lexer.)
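To see why this gets "pretty complicated", here is a minimal Java sketch of the lookbehind idea (Java 16+ for records). It is hand-written rather than generated by flex/JFlex, handles only identifiers, integers, -, ( and ), and all names (LookbehindLexer, endsOperand, TokenKind, Token) are hypothetical. A - directly before a digit is glued into a negative literal unless the previous token could end an operand.

```java
import java.util.ArrayList;
import java.util.List;

final class LookbehindLexer {
    // The lookbehind rule: after a number, a variable, or ')' (the end of a
    // call or a parenthesized expression), '-' must be a binary minus.
    // Every new piece of syntax that can end an operand must be added here.
    private static boolean endsOperand(Token prev) {
        return prev != null && (prev.kind() == TokenKind.INT
                || prev.kind() == TokenKind.ID
                || prev.kind() == TokenKind.RPAREN);
    }

    static List<Token> tokenize(String src) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            Token prev = out.isEmpty() ? null : out.get(out.size() - 1);
            // '-' is glued to the following digits only when the previous
            // token cannot end an operand.
            boolean negativeLiteral = c == '-' && i + 1 < src.length()
                    && Character.isDigit(src.charAt(i + 1)) && !endsOperand(prev);
            if (Character.isDigit(c) || negativeLiteral) {
                int start = i++;  // consume the first digit or the leading '-'
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                out.add(new Token(TokenKind.INT, src.substring(start, i)));
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                out.add(new Token(TokenKind.ID, src.substring(start, i)));
            } else if (c == '-') {
                out.add(new Token(TokenKind.MINUS, "-")); i++;
            } else if (c == '(') {
                out.add(new Token(TokenKind.LPAREN, "(")); i++;
            } else if (c == ')') {
                out.add(new Token(TokenKind.RPAREN, ")")); i++;
            } else {
                throw new IllegalArgumentException("invalid character: " + c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x -5"));  // [ID(x), MINUS(-), INT(5)]
        System.out.println(tokenize("f(-5)")); // [ID(f), LPAREN((), INT(-5), RPAREN())]
    }
}

// Hypothetical token kinds for a tiny expression subset.
enum TokenKind { ID, INT, MINUS, LPAREN, RPAREN }

// A token: its kind plus the matched text.
record Token(TokenKind kind, String text) {
    @Override
    public String toString() { return kind + "(" + text + ")"; }
}
```

Every construct that can end an operand has to be listed in endsOperand, and that list has to be kept complete as the language grows.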

Page 4: Compilation Encapsulation

But just because we can do it, does that make it a good idea?

Page 5: Compilation Encapsulation

But what if we change the syntax?

Professor Moriarty wants IC to be more like Matlab. He asks you to support scalar operations on arrays of scalars.

array - scalar = [a1 - scalar, …, an - scalar]

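And suddenly new int[n] - 6 is valid… and a lookbehind lexer that sees -6 after a ] turns it into a single negative literal, so the lexer has to change as well, even though no token changed.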

Page 6: Compilation Encapsulation

Generic compiler structure

[Figure: Source text (txt) → Compiler: Frontend (analysis) → Semantic Representation → Backend (synthesis) → Executable code (exe)]

Page 7: Compilation Encapsulation

[Figure: IC compiler. IC Program (ic) → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table etc. → Inter. Rep. (IR) → Code Generation → x86 executable (exe)]

Page 8: Compilation Encapsulation

Encapsulation, what does it mean?

• It means each component needs to do its job, without regard for what the other components are doing.

• The tokenizer only cares about dividing the stream into tokens
  – Invalid characters
  – Keywords
  – Strings and comments

Page 9: Compilation Encapsulation

• The parser only cares about building a structure out of tokens
  – Assumes a valid stream of tokens
  – Structural rules with no meaning

• The semantic checker is free to worry only about semantics
  – Assumes a valid AST
  – Actually worries about meaning
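By contrast with the lookbehind lexer on slide 3, here is a minimal Java sketch of this division of labor. All names (EncapsulatedFrontend, tokenize, parse) are hypothetical, and the AST is rendered as strings to keep it short. The tokenizer always emits - as its own token and digits as a non-negative literal; the parser tells unary from binary minus purely by grammar position.

```java
import java.util.ArrayList;
import java.util.List;

final class EncapsulatedFrontend {
    // Tokenizer: '-' is always its own token, digit runs are always
    // non-negative literals. No lookbehind, no knowledge of the grammar.
    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
            } else if (Character.isLetter(c)) {
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
            } else if (c == '-') {
                i++;
            } else {
                throw new IllegalArgumentException("invalid character: " + c);
            }
            out.add(src.substring(start, i));
        }
        return out;
    }

    private final List<String> toks;
    private int pos = 0;

    private EncapsulatedFrontend(List<String> toks) { this.toks = toks; }

    // expr  ::= unary ('-' unary)*     binary minus, left-associative
    private String expr() {
        String left = unary();
        while (pos < toks.size() && toks.get(pos).equals("-")) {
            pos++;
            left = "Sub(" + left + ", " + unary() + ")";
        }
        return left;
    }

    // unary ::= '-' unary | atom       unary minus
    private String unary() {
        if (pos >= toks.size()) throw new IllegalStateException("unexpected end of input");
        String t = toks.get(pos++);
        if (t.equals("-")) return "Neg(" + unary() + ")";
        return Character.isDigit(t.charAt(0)) ? "Int(" + t + ")" : "Var(" + t + ")";
    }

    static String parse(String src) {
        EncapsulatedFrontend p = new EncapsulatedFrontend(tokenize(src));
        String ast = p.expr();
        if (p.pos != p.toks.size()) throw new IllegalStateException("trailing tokens");
        return ast;
    }

    public static void main(String[] args) {
        System.out.println(parse("x - 5"));  // Sub(Var(x), Int(5))
        System.out.println(parse("x - -5")); // Sub(Var(x), Neg(Int(5)))
    }
}
```

The tokenizer needs no knowledge of what came before; unary versus binary minus falls out of the two grammar rules in the comments.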

Page 10: Compilation Encapsulation

Fake Exam Question #1

Professor Xavier wants IC to be more like Python. He asks you to support array and string multiplication.

"abc" * 3 → "abcabcabc"
(new MyClass[5] * 2).length → 10

Page 11: Compilation Encapsulation

But suppose…

• Suppose you decided to define your strings like so:

    <YYINITIAL> \" {
        // move to the <STRING> state to handle the content
        yybegin(STRING);
        in_string_literal = true;
    }

    <STRING> {
        \"/{VALID_STR_POSTFIX} {
            // found the end of a string, followed by an allowed character: finish
            yybegin(YYINITIAL);
            in_string_literal = false;
            return new Token(sym.QUOTE, yyline + 1, string.toString());
        }
        // the longer match above fails only when an invalid character follows the quote
        \" { throw new LexicalError(yyline + 1); }
        // (other <STRING> rules not shown on the slide)
    }

Page 12: Compilation Encapsulation

But suppose…

• Now we have to go back and fix the lexer, too, so that a * may follow a closing quote.
• When, in reality, there was no real reason to perform that test:
  – Whatever comes after a string, the syntax can already cope with it.

Page 13: Compilation Encapsulation

Back to the tokenizer not caring

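• What needs changing?
• Lexer:
  – Nothing

• Syntax:
  – Nothing ("abc" * 3 already parses as an ordinary binary * expression)

• Semantic checks:
  – Type check (string * int and array * int must now type-check)

• Code generation:
  – Functionality of the operation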
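A minimal Java sketch of the two places that do change (Java 16+ for records). The Type records, checkMultiply, and the repeat helpers are hypothetical. The sketch assumes the repeat count is the right-hand int operand and is non-negative, and it models only the typing rule and the runtime behavior, not actual code emission.

```java
import java.util.Arrays;

final class MultiplyTypeCheck {
    static final Type INT = new IntType();
    static final Type STRING = new StringType();

    // Semantic check for e1 * e2 after the change. The lexer and parser are
    // untouched: "abc" * 3 was already an ordinary binary '*' node.
    static Type checkMultiply(Type left, Type right) {
        if (!right.equals(INT)) {
            throw new IllegalStateException("semantic error: right operand of * must be int");
        }
        if (left.equals(INT) || left.equals(STRING) || left instanceof ArrayType) {
            return left;  // int*int : int, string*int : string, T[]*int : T[]
        }
        throw new IllegalStateException("semantic error: cannot apply * to " + left);
    }

    // Runtime behavior the generated code must implement
    // (hypothetical runtime helpers).
    static String repeat(String s, int n) {
        return s.repeat(n);  // throws IllegalArgumentException if n < 0
    }

    static <T> T[] repeat(T[] arr, int n) {
        if (n < 0) throw new IllegalArgumentException("negative repeat count");
        T[] out = Arrays.copyOf(arr, arr.length * n);  // same element type
        for (int k = 1; k < n; k++) {
            System.arraycopy(arr, 0, out, k * arr.length, arr.length);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(checkMultiply(STRING, INT).equals(STRING)); // true
        System.out.println(repeat("abc", 3));                         // abcabcabc
        System.out.println(repeat(new Object[5], 2).length);          // 10
    }
}

// Hypothetical, simplified IC types.
interface Type {}
record IntType() implements Type {}
record StringType() implements Type {}
record ArrayType(Type element) implements Type {}
record ClassType(String name) implements Type {}
```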

Page 14: Compilation Encapsulation

Fake Exam Question #2

• We want IC to support binary numeric literals
  – With the following syntax: 0b010010101 (leading zeros after the 0b prefix allowed)
  – With the same range restrictions as decimal literals

Page 15: Compilation Encapsulation

Solution #1

• We’ll add a new lexer token type, BINNUMBER
  – 0b[01]+

• And a new syntax rule for a BinNumber literal
  – Which, really, is only BINNUMBER

• And then check its range
  – Which is actually a lot easier than with decimals…
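One reason the binary check is easier: it needs no arithmetic, only a digit count. A minimal Java sketch, assuming (hypothetically) that "the same range restrictions" means a binary literal is a non-negative value that must fit in a 32-bit int, i.e. at most 31 significant bits once leading zeros are ignored. BinaryLiteralRange and inRange are invented names.

```java
final class BinaryLiteralRange {
    // Assumption: non-negative 32-bit int, so at most 2^31 - 1 (31 bits).
    static final int MAX_SIGNIFICANT_BITS = 31;

    // lexeme matches 0b[01]+; leading zeros after "0b" are allowed.
    static boolean inRange(String lexeme) {
        String digits = lexeme.substring(2);  // drop the "0b" prefix
        int firstOne = digits.indexOf('1');
        if (firstOne < 0) return true;        // all zeros: the value 0
        return digits.length() - firstOne <= MAX_SIGNIFICANT_BITS;
    }

    public static void main(String[] args) {
        System.out.println(inRange("0b010010101"));               // true
        System.out.println(inRange("0b" + "1".repeat(31)));       // true  (2^31 - 1)
        System.out.println(inRange("0b1" + "0".repeat(31)));      // false (2^31)
        System.out.println(inRange("0b" + "0".repeat(40) + "1")); // true  (leading zeros)
    }
}
```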

Page 16: Compilation Encapsulation

A short interlude: where does X go?

• Is property X lexical, syntactic or semantic?
• Two main deciding factors:

1. Correctness: Is there enough data to make the call right now?

2. Laziness: What will be gained by doing this right now? Is this the place where it’s easiest to do?

Page 17: Compilation Encapsulation

Example A: Range of decimal literals

• Correctness:
  – In any two's complement implementation of integers, the bounds are not symmetric (for 32-bit ints, -2147483648 to 2147483647).
  – So we can’t make the call until we know whether we have a positive or a negative number on our hands…
• Laziness:
  – Writing a lot of code that looks at the child expressions during syntax analysis is usually a bad sign.
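Since the bound depends on the literal's parent node, a natural home for the decimal check is a semantic pass over the AST, where the parent is known. A minimal Java sketch (Java 16+), assuming 32-bit ints and a hypothetical AST (IntLiteral, UnaryMinus, Binary) in which literals are always unsigned: 2147483648 is accepted only as the direct operand of unary minus.

```java
import java.math.BigInteger;

final class DecimalLiteralRange {
    private static final BigInteger MAX = BigInteger.valueOf(Integer.MAX_VALUE); // 2147483647
    private static final BigInteger NEG_LIMIT = MAX.add(BigInteger.ONE);         // |-2147483648|

    // Literals in the AST are unsigned, since the lexer emitted '-' as a
    // separate token. The bound depends on the literal's parent node.
    static void check(Expr e) {
        check(e, false);
    }

    private static void check(Expr e, boolean directlyUnderMinus) {
        if (e instanceof IntLiteral lit) {
            BigInteger value = new BigInteger(lit.digits());
            BigInteger limit = directlyUnderMinus ? NEG_LIMIT : MAX;
            if (value.compareTo(limit) > 0) {
                throw new IllegalStateException("integer literal out of range: " + lit.digits());
            }
        } else if (e instanceof UnaryMinus neg) {
            check(neg.operand(), true);  // the one place the wider bound applies
        } else if (e instanceof Binary bin) {
            check(bin.left(), false);
            check(bin.right(), false);
        }
    }

    public static void main(String[] args) {
        check(new UnaryMinus(new IntLiteral("2147483648"))); // accepted: -2147483648
        try {
            check(new IntLiteral("2147483648"));             // no minus: rejected
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage()); // integer literal out of range: 2147483648
        }
    }
}

// Hypothetical, minimal AST for the sketch.
interface Expr {}
record IntLiteral(String digits) implements Expr {}
record UnaryMinus(Expr operand) implements Expr {}
record Binary(char op, Expr left, Expr right) implements Expr {}
```

Note that x - 2147483648 (a Binary node) is still rejected: only unary minus makes a literal negative.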

Page 18: Compilation Encapsulation

Example B: range of binary literals

• Correctness:
  – All the data is there the moment we get the token.
• Laziness:
  – Postponing the check means keeping binary and decimal literals separate in the later phases
  – If we check right now, we can convert the value to a number and forget all about it

Page 19: Compilation Encapsulation

Back to Fake Question #2

• So we can actually do it this way:
• We’ll add a new lexer rule
  – 0b[01]+
  – We’ll also check the range here
  – And then! return new Token(sym.INTEGER, yyline + 1, bin2decimal(yytext()));
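A minimal Java sketch of what the lexer action could call. The slide names bin2decimal but does not show it, so this body, the inRange helper, and the BinaryLiteralLexing class are hypothetical, and the range rule (non-negative, at most 2^31 - 1) is an assumption. It returns the decimal spelling as a String, on the assumption that Token's value argument is a String, as in the string rule above.

```java
final class BinaryLiteralLexing {
    // Range check performed inside the lexer action. Assumed rule: a binary
    // literal is a non-negative int, so its value must not exceed 2^31 - 1.
    static boolean inRange(String lexeme) {
        try {
            // Integer.parseInt accepts leading zeros and throws
            // NumberFormatException for values above Integer.MAX_VALUE.
            Integer.parseInt(lexeme.substring(2), 2);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Hypothetical body for the slide's bin2decimal(yytext()): turn an
    // in-range lexeme such as "0b0101" into its decimal spelling ("5"), so
    // the token is an ordinary INTEGER from here on.
    static String bin2decimal(String lexeme) {
        return Integer.toString(Integer.parseInt(lexeme.substring(2), 2));
    }

    public static void main(String[] args) {
        String text = "0b010010101";  // stands in for yytext()
        if (inRange(text)) {
            System.out.println(bin2decimal(text)); // 149
        } else {
            System.out.println("lexical error: binary literal out of range");
        }
    }
}
```

Inside the JFlex action, a false inRange would become throw new LexicalError(yyline + 1); otherwise the action returns the INTEGER token as on the slide.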

Page 20: Compilation Encapsulation

Where does Y go?

• Place the following property: a call to method foo() is a static call.

• Our guiding principle here is correctness:
  – Lexer
  – Syntax?

Page 21: Compilation Encapsulation

Where does Y go?

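• Syntax breaks method calls up into three types:
  1. ClassName.foo() – definitely static
  2. varname.foo() – definitely not
  So… correct?
  3. foo() – ??? It may name a static or a virtual method of the enclosing class; only the symbol table can tell.

So… not syntax.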
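Classifying foo() therefore has to wait for the symbol table. A minimal Java sketch with a hypothetical per-class scope (ClassScope, MethodKind and resolveUnqualifiedCall are invented names); superclass lookup and the remaining call-site checks are left out.

```java
import java.util.HashMap;
import java.util.Map;

final class CallResolution {
    enum MethodKind { STATIC, VIRTUAL }

    // Hypothetical symbol table for one class: method name -> kind.
    // (Lookup in superclasses is omitted for brevity.)
    static final class ClassScope {
        final String name;
        final Map<String, MethodKind> methods = new HashMap<>();

        ClassScope(String name) { this.name = name; }
    }

    // An unqualified call foo() inside 'enclosing': the parser cannot classify
    // it, but the symbol table built during semantic analysis can.
    static MethodKind resolveUnqualifiedCall(ClassScope enclosing, String method) {
        MethodKind kind = enclosing.methods.get(method);
        if (kind == null) {
            throw new IllegalStateException(
                    "method " + method + " not found in class " + enclosing.name);
        }
        return kind;
    }

    public static void main(String[] args) {
        ClassScope a = new ClassScope("A");
        a.methods.put("foo", MethodKind.STATIC);
        a.methods.put("bar", MethodKind.VIRTUAL);
        System.out.println(resolveUnqualifiedCall(a, "foo")); // STATIC
        System.out.println(resolveUnqualifiedCall(a, "bar")); // VIRTUAL
    }
}
```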

Page 22: Compilation Encapsulation

Fake Question #3

• We want to allow type inference in IC

var a = new A();
A b = a;
C c = a; // type error

Page 23: Compilation Encapsulation

Q3: Lexer

• New token type VAR

Page 24: Compilation Encapsulation

Q3: Syntax

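• We want a declaration with an initializer whose type is VAR
  – Do we add VAR to types?
  – No, we treat it like void: just as void is allowed only as a return type, VAR is allowed only in that one declaration rule, not wherever a type can appear.

• How about AST representation?
  – We modify our LocalVariable class to keep “TBD” in its type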

Page 25: Compilation Encapsulation

Q3: Semantics

• To determine the new variable’s type:
  – instead of resolving its declared type (which is TBD)
  – compute the type of the initializer expression

• Put that value into the symbol table, and all else is business as usual!
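A minimal Java sketch of the whole change, from the TBD marker to the symbol table (Java 16+). Everything here (VarInference, LocalVariable, typeOf, isSubtype, the string-named types) is hypothetical and heavily simplified: expressions are only new C() and variable references, and C is assumed unrelated to A so that the last declaration from the example fails.

```java
import java.util.HashMap;
import java.util.Map;

final class VarInference {
    // The marker the parser stores in LocalVariable.type for 'var' declarations.
    static final String TBD = "TBD";

    // Hypothetical symbol table (variable name -> type name) and class
    // hierarchy (class name -> superclass name; absent means no superclass).
    static final Map<String, String> symbols = new HashMap<>();
    static final Map<String, String> superclass = new HashMap<>();

    // Hypothetical, minimal AST.
    interface Expr {}
    record NewObject(String className) implements Expr {}
    record VarRef(String name) implements Expr {}
    record LocalVariable(String name, String type, Expr init) {}

    static String typeOf(Expr e) {
        if (e instanceof NewObject n) return n.className();
        if (e instanceof VarRef v && symbols.containsKey(v.name())) return symbols.get(v.name());
        throw new IllegalStateException("cannot type expression " + e);
    }

    static boolean isSubtype(String sub, String sup) {
        for (String t = sub; t != null; t = superclass.get(t)) {
            if (t.equals(sup)) return true;
        }
        return false;
    }

    static void declare(LocalVariable v) {
        String initType = typeOf(v.init());
        // The only new step: a TBD type is replaced by the initializer's type.
        String type = v.type().equals(TBD) ? initType : v.type();
        if (!isSubtype(initType, type)) {
            throw new IllegalStateException("type error: cannot assign " + initType + " to " + type);
        }
        symbols.put(v.name(), type);  // business as usual from here on
    }

    public static void main(String[] args) {
        declare(new LocalVariable("a", TBD, new NewObject("A")));  // var a = new A();
        declare(new LocalVariable("b", "A", new VarRef("a")));     // A b = a;
        try {
            declare(new LocalVariable("c", "C", new VarRef("a"))); // C c = a;
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // type error: cannot assign A to C
        }
    }
}
```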

Page 26: Compilation Encapsulation

Good luck on the exam!