topic 8: intermediate representation -...

37
Topic 8: Intermediate Representation 1 Compiler Design Prof. Hanjun Kim CoreLab (Compiler Research Lab) POSTECH

Upload: hamien

Post on 16-Jun-2018

244 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Topic 8: Intermediate Representation

1

Compiler Design

Prof. Hanjun Kim

CoreLab (Compiler Research Lab)

POSTECH

Page 2: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

IR: Intermediate Representations

• Intermediate Representation (IR)• An abstract machine language• Expresses operations of target machine• Not specific to any particular machine• Independent of source language

• IR code generation not necessary• Semantic analysis phase can generate real assembly

directly• Hinders portability and modularity

Lexer ParserSemantic

Analysis

SourceStream of

Tokens

Abstract

Syntax Tree IR

2

Page 3: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Intermediate Representations

3

C++

C

Java

ML

Fortran

x86

MIPS

ARM

IA-64

C++

C

Java

ML

Fortran

x86

MIPS

ARM

IA-64

IR

n source languages & m target machines

n * m compiler necessary n + m compiler necessary

Page 4: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Properties of a Good IR

• Must be convenient for semantic analysis phase to produce

• Must be convenient for translate into real assembly code for all desired target machines

• RISC processors• Execute simple operations

• Ex: load, store, add, shift, branch

• IR should represent abstract load, abstract store, abstract add, …

• CISC processors• Execute more complex

• Ex: multiply-add, add to/from memory

• Instruction selection phase may clump together simple operations in IR to form complex real machine instructions

4

Page 5: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

IR Representations

• IR can be represented in many forms: Trees, List …

• Here, we will use a simple IR tree as an example• exp: constructs that compute some value, possibly with

side effects

• stm: constructs that perform side effects and control flow

5

Page 6: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

IR Tree Example: Expression

signature TREE = sig

datatype exp = CONST of int

| NAME of Temp.label

| TEMP of Temp.temp

| BINOP of binop * exp * exp

| MEM of exp

| CALL of exp of exp list

| ESEQ of stm * exp

and binop = PLUS | MINUS | MUL | DIV | AND

| OR | LSHIFT | RSHIFT | ARSHIFT

| XOR

6

Page 7: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

IR Tree Example: Expression

Expressions compute some value, possibly with side effects

• CONST(i)

• Integer constant i

• NAME(n)

• Symbolic constant n corresponding to assembly language label (abstract

name for memory address)

• TEMP(t)

• temporary t, or abstract/virtual register t

• BINOP(op, e1, e2):

• e1 evaluated before e2

• Integer arithmetic operators: PLUS, MINUS, MUL, DIV

• Integer bit-wise operator: AND, OR, XOR

• Integer logical shift operator: LSHIFT, RSHIFT

• Integer arithmetic shift operator: ARSHIFT

7

Page 8: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

IR Tree Example: Expression

Expressions compute some value, possibly with side effects

• MEM(e)

• contents of wordSize bytes of memory starting at address e

• wordSize: defined in Frame module

• If MEM is used as left operand of MOVE statement: store

• If MEM is used as right operand of MOVE statement: load

• CALL(f, l)

• Application of function f to argument list l

• Sub-expression f is evaluated first

• Arguments in list l are evaluated left to right

• ESEQ(s, e)

• Statement s evaluated for side-effect, e evaluated next for result

8

Page 9: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

IR Tree Example: Statement

TREE continued:

and stm = MOVE of exp * exp

| EXP of exp

| JUMP of exp * Temp.label list

| CJUMP of relop * exp * exp

* Temp.label * Temp.label

| SEQ of stm * stm

| LABEL of Temp.label

and relop = EQ | NE | LT | GT | LE | GE

| ULT | ULE | UGT | UGE

9

Page 10: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

IR Tree Example: Statement

Statements have side effects and perform control flow

• MOVE(TEMP(t), e)

• Evaluate e, and move result into temporary t

• MOVE(MEM(e1), e2)

• Evaluate e1, yielding address a

• Then, evaluate e2, store the result at address a in wordsize bytes of memory

• EXP(e)

• Evaluate e, discard the result

• JUMP(e, labs)

• Jump to address e

• E may be literal label (NAME(l)), or address calculated by expression

• labs specifies all locations that e can evaluate to (used for dataflow analysis)

• Ex: JUMP(NAME(l), [l])

10

Page 11: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

IR Tree Example: Statement

Statements have side effects and perform control flow

• CJUMP(op, e1, e2, t, f)

• Evaluate e1, then e2

• Compare the results using op

• If true, jump to t, else jump to f

• op:

• EQ, NE: signed/unsigned integer equality and non-equality

• LT, GT, LE, GE: signed integer inequality

• ULT, UGT, ULE, UGE: unsigned integer inequality

• SEQ(s1, s2)

• Statement s1 followed by s2

• LABEL(l)

• Label definition – constant value of l defined to be current machine code

address

• Use NAME(l) to specify jump target, calls, etc.

11

Page 12: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Translation of an Abstract Syntax Tree

into an IR Tree

12

Page 13: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Variable: Declarations

• Update environments

• venv: variable environment

• tenv: type environment

• Consider “let” expression:

fun transExp(venv, tenv) exp =

| A.LetExp(decs, body, pos) =

let

val (venv’, tenv’) = transDecs(venv, tenv, decs)

in

transExp(venv’, tenv’) body

end

• Need level to allow accesses from nested functions

13

Page 14: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Variable: Simple variable use

• Case 1: variable v declared in temporary register

• InReg(t_103):TEMP(t_103)

• Case 2: variable v declared in a stack frame

• InFrame(k):MEM(BINOP(PLUS, TEMP(FP), CONST(k)))

• k: offset in own frame

• FP: frame pointer, declared in Frame module

14

Page 15: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Variable: Variable use in a nested function

• Modern languages support nest functions

• Generate an IR code with static links

• Level of function f in which v used: 1

• Level of function g in which v declared: n

• MEM nodes added to tree with static link offsets (k_1, …, k_n-1)

• When function g is reached, offset k_n used

• Example: InFrame(k):

MEM(BINOP(PLUS, CONST(k_n),

MEM(BINOP(PLUS, CONST(k_n-1),

...

MEM(BINOP(PLUS, CONST(k_2),

MEM(BINOP(PLUS, CONST(k_1), TEMP(FP))))))

• k_1, k_2, …, k_n-1: static link offset

• k_n: offset of v in own frame

• FP: frame pointer, declared in Frame module

15

Page 16: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Array

• Array Creation

• Call run-time system function alloca to allocate and initialize memory

int a[10];

CALL(NAME(alloca), [CONST(10), CONST(4)]

• Array Access

• Given array variable a,

&(a[0]) = a

&(a[1]) = a + w, where w is the word-size

&(a[2]) = a + (2*w)

...

• Let e be the IR tree for &a:

a[i] => MEM(BINOP(PLUS, e, BINOP(MUL, i, CONST(w))))

• Compiler must emit code to check whether i is out of bounds

16

Page 17: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Record: Creation

17

• Call run-time system function alloca to allocate and initialize memory

• Return address of record (TEMP(result))

type rectype = {f1:int, f2:int}

var a:rectype := rectype{f1=4, f2=5}

=>

ESEQ(SEQ(

MOVE(TEMP(result), CALL(NAME(alloca), [CONST(8)])),

SEQ(

MOVE(BINOP(PLUS,TMPE(result),CONST(0*w)),CONST(4))

SEQ(

MOVE(BINOP(PLUS, TMPE(result),CONST(1*w)),CONST(5))

))),

TEMP(result))

Page 18: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Record: Accesstype rectype = {f1:int, f2:int, f3:int}

| | |

offset: 0 1 2

var a:rectype := rectype{f1=4, f2=5, f3=6}

• Let e be IR tree for aa.f3:

MEM(BINOP(PLUS, e,

BINOP(MUL, CONST(2), CONST(w))))

• Compiler must emit code to check whether a is nil

18

Page 19: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Strings

• In C, string literal is constant address of memory segment initialized

to characters in string

• In assembly, label used to refer to this constant address

• Label definition includes directives that reserve and initialize memory

• Ex: “foo”

• Create a new label l and its string fragment “foo”

• NAME(l): used to refer to the string

• Code emitter will handle the fragment: initialize memory with “foo” at

address l

• All string operations performed by run-time system functions such as string library

• Ex: CALL(NAME(“stringEqual”), [s1, s2])

19

Page 20: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Conditional Statements

if e1 then e2 else e3

• Treat e1 as conditional expression with label t and f

• Ex: e1 = x<5 • e1(t,f): CJUMP(LT, x, CONST(5), t,f)

ESEQ(SEQ(e1(t,f),

SEQ(LABEL(t),

SEQ(MOVE(TEMP(r), e2)),

SEQ(JUMP(NAME(join)),

SEQ(LABEL(f),

SEQ(MOVE(TEMP(r), e3)),

LABEL(join))))))),

TEMP(r))

20

Page 21: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

While Loops

• Layout of a while loop• While CONDITION do BODY

• test:if not(CONDITION) goto done

BODY

goto test

done:

• A break statement within BODY is a JUMP to label done

• done should be the label of the nearest enclosing loop in nest loops

21

Page 22: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

For Loops

• Basic idea: rewrite AST into let/while ASTfor i:= low to high do

body

• Becomes:let

var i:= low

var limit := high

in

while (i <= limit) do

(body; i := i+1)

end

• Compilation:• If limit == maxint, it will overflow in translated version

22

Page 23: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Function Calls

23

f(a1, a2, …, an) ->

CALL(NAME(l_f), [sl, e1, e2, ..., en])

• sl static link of f (computable at compile-time)

• To compute static link, need:• Level of function f

• Level of function g, the calling function

• Computation: similar to simple variable access

Page 24: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Function Declaration

• Function is translated into assembly language segment with three components

• Prologue

• Body

• Epilogue

• Some of prologue and epilogue are generated after register allocation because frame size can be determined after register allocation

24

Page 25: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Function Prolog

• Prologue precedes body of the function • Pseudo-instructions to announce the beginning of a

function

• Label definition for the function name

• Instruction to adjust stack pointer (SP) – allocate a new frame

• Instructions to save “escaping” arguments including static link into the frame

• Instructions to move non-escaping arguments to fresh temporary registers

• Instructions to store any callee-save registers used in the function

25

Page 26: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Function Epilog

• Epilog follows body of the function• Instructions to move function result (return value) into

return register

• Instructions to restore any callee-save registers used in the function

• Instruction to adjust stack pointer – dellocate frame

• Return instructions – jump to return address

• Pseudo-instructions to announce the end of the function

26

Page 27: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Problems with IR Trees

• Mismatches between Trees and machine language programs

• CJUMP jumps to either of two labels, but conditional branch instructions in real machine only jump to onelabel. On false condition, fail-through to next instruction

• ESEQ, CALL nodes within expressions can make different orders of evaluating subtrees yield different results

• CALL nodes within argument list of other CALL nodes will cause problems if arguments are passed in formal-parameter registers

• BINOP(PLUS, CALL(…), CALL(…))

27

Page 28: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Solution: Canonicalization

• Make a sequence of IR statements • Eliminate ESEQ

• Lift ESEQ nodes higher and higher in the tree until they become SEQnodes

• ESEQ(𝑠1, ESEQ(𝑠2, 𝑒)) => ESEQ(SEQ(𝑠1, 𝑠2), 𝑒)

• Move CALLs to Top Level

• Assign return values to fresh temporary registers

• Problem: BINOP(PLUS, CALL(…), CALL(…))

• CALL(fun,args)

=> ESEQ(MOVE(TEMP t, CALL(fun, args)), TEMP t)

• Generate a linear list of statements• SEQ(SEQ(a,b),c) => SEQ(a, SEQ(b,c)) => a; SEQ(b,c)

28

Page 29: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Examples of Eliminating ESEQ

29

ESEQ

ESEQs1

s2 e

ESEQ

eSEQ

s1 s2

ESEQ(s1, ESEQ(s2, e)) ESEQ(SEQ(s1, s2), e)

BINOP

ESEQop

s e1

ESEQ

BINOPs

op e2

BINOP(op, ESEQ(s, e1), e2)

MEM(ESEQ(s, e1))

JUMP(ESEQ(s, e1))

CJUMP(op, ESEQ(s, e1), e2, l1, 12)

ESEQ(s, BINOP(op, e1, e2))

ESEQ(s, MEM(e1))

SEQ(s, CJUMP(e1))

SEQ(s, CJUMP(op, e1, e2, l1, l2)

e2

e1

Page 30: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Examples of Eliminating ESEQ

30

BINOP

ESEQop

s e2

ESEQ

ESEQMOVE

TEMP e1

BINOP(op, e1, ESEQ(s, e2)) ESEQ(MOVE(TEMP t, e1),

ESEQ(s,BINOP(op,TEMP t,e2)))

e1

s BINOP

t op TEMP e2

t

CJUMP(op,e1,ESEQ(s,e2),l1,l2) SEQ(MOVE(TEMP t, e1),

SEQ(s,CJUMP(op,TEMP t,e2,l1,l2)))

Page 31: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Examples of Eliminating ESEQ

31

BINOP

ESEQop

s e2

ESEQ

s

BINOP(op, e1, ESEQ(s, e2)) ESEQ(s,BINOP(op, e1, e2))

e1 BINOP

op e1 e2

CJUMP(op,e1,ESEQ(s,e2),l1,l2) SEQ(s,CJUMP(op, e1, e2, l1, l2))

If s, e1 commute, …

Page 32: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

How to Handle CJUMP?

Reordering Basic blocks

32

Page 33: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Basic Block

• Basic Block • A sequence of assembly instructions with one entry and one exit point

• First statement: LABEL

• Last statement: Branch (JUMP, CJUMP)

• No LABEL, JUMP, CJUMP statements in between

• Control flow graph of basic blocks are more convenient

• Determine by the following:• Find leaders:

• First statement

• Targets of conditional and unconditional branches

• Instructions that follow branches

• Basic blocks are leader up to, but not include next leader

33

Page 34: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Basic Block Creation

• Scan from beginning to end• LABEL: a new block starts

• JUMP, CJUMP: a block is ended (next block starts)

• Ending without JUMP or CJUMP• Insert JUMP to the next block

• Starting without LABEL

• Insert a new label

• Reorder basic blocks• Make CJUMP immediately followed by false label

34

Page 35: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Trace

• A sequence of statements that are consecutivelyexecuted during program execution

• A program has many different, overlapping traces

• A trace can include unconditional branches and conditional branches

• In a trace, unconditional branches can be removed

L1 : prologue statements

JUMP(test);

test: CJUMP(>, i, N, done, body)

body: loop body

JUMP(test);

done: epilogue statements

35

Trace

Page 36: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

A Covering Set of Traces

• A covering set of traces• A set of traces that exactly covers the whole program

• Each basic block must be in exactly one trace

• We want to minimize the number of JUMPs• Minimize the number of JUMPs from one trace to

another

• Minimize the number of traces in the covering set

36

Page 37: Topic 8: Intermediate Representation - POSTECHcorelab.postech.ac.kr/...compiler/8_Intermediate_Representation.pdfTopic 8: Intermediate Representation 1 Compiler Design ... fun transExp

Optimization with Trace

37

L1 : prologue statements

JUMP(test);

test: CJUMP(>, i, N, done, body)

body: loop body

JUMP(test);

done: epilogue statements

L1 : prologue statements

JUMP(test);

body: loop body

JUMP(test);

test: CJUMP(>, i, N, done, body)

done: epilogue statements

L1 : prologue statements

JUMP(test);

test: CJUMP(<=, i, N, body, done)

done: epilogue statements

body: loop body

JUMP(test);

(a)

(b)

(c)