Automatically Checking the Correctness of Program Analyses and Transformations

Post on 22-Dec-2015


Compilers have many bugs

Searched for “incorrect” and “wrong” in the gcc-bugs mailing list. Some of the results:

• [Bug middle-end/19650] New: miscompilation of correct code
• [Bug c++/19731] arguments incorrectly named in static member specialization
• [Bug rtl-optimization/13300] Variable incorrectly identified as a biv
• [Bug rtl-optimization/16052] strength reduction produces wrong code
• [Bug tree-optimization/19633] local address incorrectly thought to escape
• [Bug target/19683] New: MIPS wrong-code for 64-bit multiply
• [Bug c++/19605] Wrong member offset in inherited classes
• [Bug java/19295] [4.0 regression] Incorrect bytecode produced for bitwise AND
• …

Total of 545 matches… And this is only for January 2005! On a mature compiler!

Compiler bugs cause problems

[Diagram: source program "if (…) { x := …; } else { y := …; } …;" → Compiler → Exec]

• They lead to buggy executables
• They rule out having strong guarantees about executables

The focus: compiler optimizations

• A key part of any optimizing compiler

[Diagram: Original program → Optimization → Optimized program]

• Hard to get optimizations right
– Lots of infrastructure-dependent details
– There are many corner cases in each optimization
– There are many optimizations and they interact in unexpected ways
– It is hard to test all these corner cases and all these interactions

Goals

• Make it easier to write compiler optimizations
– student in an undergrad compiler course should be able to write optimizations

• Provide strong guarantees about the correctness of optimizations
– automatically (no user intervention at all)
– statically (before the opts are even run once)

• Expressive enough for realistic optimizations

The Rhodium work

• A domain-specific language for writing optimizations: Rhodium

• A correctness checker for Rhodium optimizations

• An execution engine for Rhodium optimizations

• Implemented and checked the correctness of a variety of realistic optimizations

Broader implications

• Many other kinds of program manipulators: code refactoring tools, static checkers
– My work is about program analyses and transformations, the core of any program manipulator

• Enables safe extensible program manipulators
– Allow end programmers to easily and safely extend program manipulators
– Improve programmer productivity

Outline

• Introduction

• Overview of the Rhodium system

• Writing Rhodium optimizations

• Checking Rhodium optimizations

• Evaluation

Rhodium system overview

[Diagram: Rhodium optimizations (Rdm Opt) are written by the programmer. The Checker and the Rhodium execution engine are written by me. Each Rdm Opt is run through the Checker. The checked optimizations run in the Rhodium execution engine inside the Compiler, which turns a source program "if (…) { x := …; } else { y := …; } …;" into Exec.]

The technical problem

• Tension between:
– Expressiveness
– Automated correctness checking

• Challenge: develop techniques
– that will go a long way in terms of expressiveness
– that allow correctness to be checked

Contribution: three techniques

[Diagram: the Checker turns a Rdm Opt into a verification task for an automatic theorem prover.]

Verification task: show that for any original program,
behavior of original program = behavior of optimized program

1. Rhodium is declarative
– declare intent using rules
– execution engine takes care of the rest

2. Factor out heuristics
– legal transformations
– vs. profitable transformations

[Diagram: the Rdm Opt is split into heuristics not affecting correctness and the part that must be reasoned about.]

3. Split verification task
– opt-dependent
– vs. opt-independent

Result:
• Expressive language
• Automated correctness checking

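MustPointTo analysis

a = &b
c = a
d = *c

[Diagram: after a = &b, a points to b. After c = a, both a and c point to b. The last statement d = *c can therefore become d = b.]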

MustPointTo info in Rhodium

a = &b
c = a
d = *c

[Diagram: the edge after a = &b carries mustPointTo (a, b). The edge after c = a carries mustPointTo (a, b) and mustPointTo (c, b).]

define fact mustPointTo(X:Var,Y:Var)
with meaning [ X == &Y ]

Fact correct on edge if: whenever program execution reaches edge, meaning of fact evaluates to true in the program state

Propagating facts

define fact mustPointTo(X:Var,Y:Var)
with meaning [ X == &Y ]

if currStmt == [X = &Y]
then mustPointTo(X,Y)@out

if mustPointTo(X,Y)@in and currStmt == [Z = X]
then mustPointTo(Z,Y)@out

[Diagram: on the program a = &b; c = a; d = *c, the first rule fires at a = &b and puts mustPointTo (a, b) on the outgoing edge. The second rule fires at c = a and puts mustPointTo (c, b) on the edge into d = *c.]

Transformations

define fact mustPointTo(X:Var,Y:Var)
with meaning [ X == &Y ]

if currStmt == [X = &Y]
then mustPointTo(X,Y)@out

if mustPointTo(X,Y)@in and currStmt == [Z = X]
then mustPointTo(Z,Y)@out

if mustPointTo(X,Y)@in and currStmt == [Z = *X]
then transform to [Z = Y]

[Diagram: with mustPointTo (c, b) on its incoming edge, d = *c is transformed to d = b.]

Profitability heuristics

[Diagram: Legal transformations (identified by the Rhodium rules) → Profitability Heuristics → Subset of legal transformations (actually performed)]

Profitability heuristic example 1

• Inlining

• Many heuristics to determine when to inline a function
– compute function sizes, estimate code-size increase, estimate performance benefit
– maybe even use AI techniques to make the decision

• However, these heuristics do not affect the correctness of inlining

• They are just used to choose which of the correct set of transformations to perform
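The split between legal and profitable transformations can be illustrated with a small sketch. Everything here is hypothetical and is not Rhodium: the toy expression format, the `LEGAL` table (standing in for rewrites that the rules have already identified as legal), and the `prefer_removing_adds` heuristic are made up for illustration. The point it tries to show is that whichever subset the heuristic picks, the program computes the same values.

```python
from itertools import combinations


def evaluate(expr, env):
    """Evaluate a tiny expression tree against a variable environment."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    if kind == "add":
        return evaluate(expr[1], env) + evaluate(expr[2], env)
    if kind == "mul":
        return evaluate(expr[1], env) * evaluate(expr[2], env)
    raise ValueError(f"unknown expression kind: {kind}")


def run(program):
    """Run a straight-line list of (target, expr) assignments."""
    env = {}
    for target, expr in program:
        env[target] = evaluate(expr, env)
    return env


def apply_chosen(program, legal, chosen):
    """Replace only the statements whose index the heuristic selected."""
    return [legal[i] if i in chosen else stmt for i, stmt in enumerate(program)]


def prefer_removing_adds(program, legal):
    """Hypothetical heuristic: only take rewrites that remove an addition."""
    return {i for i in legal if program[i][1][0] == "add"}


def every_choice(legal):
    """Enumerate every subset a heuristic could possibly pick."""
    keys = sorted(legal)
    for size in range(len(keys) + 1):
        for combo in combinations(keys, size):
            yield set(combo)


PROGRAM = [
    ("a", ("const", 2)),
    ("b", ("const", 3)),
    ("x", ("add", ("var", "a"), ("var", "b"))),
    ("y", ("add", ("var", "a"), ("var", "b"))),  # recomputes a + b
    ("z", ("mul", ("var", "y"), ("const", 1))),  # multiplies by one
]

# Assumed to be already identified as legal: statement index -> replacement.
LEGAL = {
    3: ("y", ("var", "x")),
    4: ("z", ("var", "y")),
}

if __name__ == "__main__":
    reference = run(PROGRAM)
    for chosen in every_choice(LEGAL):
        assert run(apply_chosen(PROGRAM, LEGAL, chosen)) == reference
    print("heuristic picked:", sorted(prefer_removing_adds(PROGRAM, LEGAL)))
```

Because every subset of the legal set preserves behavior, the heuristic can be arbitrarily complicated without needing to be reasoned about for correctness.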

Profitability heuristic example 2

• Partial redundancy elimination (PRE)
• PRE as code duplication followed by CSE

a := ...;
b := ...;
if (...) {
  a := ...;
  x := a + b;
} else {
  ...
}
x := a + b;

Steps shown on the slides:
• Code duplication: the final x := a + b is copied into both branches
• CSE: in the branch that already computes x := a + b, the copy becomes x := x
• Self-assignment removal: x := x is removed

[Diagram: the legal placements of x := a + b are marked on the program, and the profitable placement is chosen from among them.]

Semantics of a Rhodium opt

• Run propagation rules in a loop until there are no more changes (optimistic iterative analysis)

• Then run transformation rules

• Then run profitability heuristics

• For better precision, combine propagation rules and transformation rules using our previous composition framework [POPL 02]
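The first two steps above can be sketched for the mustPointTo example. This is a toy model and not the Rhodium execution engine: it handles straight-line code only, so it starts from empty fact sets and adds facts until nothing changes, and it does not model the optimistic treatment of loops or the profitability step. The statement encoding (`"addr"`, `"copy"`, `"load"`) is made up for illustration, and the preservation rule that keeps a fact alive across statements that do not assign its variable is an assumption, since the slides do not show it.

```python
def propagate(stmt, facts_in):
    """Apply the propagation rules to one statement; return facts for its out edge."""
    kind, dst, src = stmt
    out = set()
    if kind == "addr":
        # if currStmt == [X = &Y] then mustPointTo(X,Y)@out
        out.add((dst, src))
    if kind == "copy":
        # if mustPointTo(X,Y)@in and currStmt == [Z = X] then mustPointTo(Z,Y)@out
        out |= {(dst, y) for (x, y) in facts_in if x == src}
    # Assumed preservation rule (not on the slides): a fact about X survives
    # any statement that does not assign X. The toy language has no stores.
    out |= {(x, y) for (x, y) in facts_in if x != dst}
    return out


def analyze(program):
    """Run the propagation rules in a loop until there are no more changes."""
    edges = [set() for _ in range(len(program) + 1)]  # edges[i] is the edge into stmt i
    changed = True
    while changed:
        changed = False
        for i, stmt in enumerate(program):
            new_facts = propagate(stmt, edges[i])
            if not new_facts <= edges[i + 1]:
                edges[i + 1] |= new_facts
                changed = True
    return edges


def transform(program, edges):
    """if mustPointTo(X,Y)@in and currStmt == [Z = *X] then transform to [Z = Y]"""
    result = []
    for i, (kind, dst, src) in enumerate(program):
        targets = sorted(y for (x, y) in edges[i] if x == src)
        if kind == "load" and targets:
            result.append(("copy", dst, targets[0]))
        else:
            result.append((kind, dst, src))
    return result


PROGRAM = [
    ("addr", "a", "b"),  # a = &b
    ("copy", "c", "a"),  # c = a
    ("load", "d", "c"),  # d = *c
]

if __name__ == "__main__":
    edges = analyze(PROGRAM)
    for stmt in transform(PROGRAM, edges):
        print(stmt)
```

On this program the transformation step rewrites the load d = *c into the copy d = b, matching the slides.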

More facts

define fact mustNotPointTo(X:Var,Y:Var)
with meaning [ X != &Y ]

define fact hasConstantValue(X:Var,C:Const)
with meaning [ X == C ]

define fact doesNotPointIntoHeap(X:Var)
with meaning [ exists Y:Var . X == &Y ]

More rules

if currStmt == [X = *A] and
   mustNotPointToHeap(A)@in and
   forall B:Var . mayPointTo(A,B)@in implies mustNotPointTo(B,Y)
then mustNotPointTo(X,Y)@out

if currStmt == [Y = I + BE] and
   varEqualArray(X,A,J)@in and
   equalsPlus(J,I,BE)@in and
   ¬mayDef(X) ∧ ¬mayDefArray(A) ∧ unchanged(BE)
then varEqualArray(X,A,Y)@out

More in Rhodium

• More powerful pointer analyses
– Heap summaries

• Analyses across procedures
– Interprocedural analyses

• Analyses that don’t care about the order of statements
– Flow-insensitive analyses

Rhodium correctness checker

[Diagram: a Rhodium optimization (define fact …; if … then …; if … then transform …), shown alongside its profitability heuristics, is given to the Checker. Inside the Checker, VCGen generates Local VCs from the fact definitions and rules, and an automatic theorem prover discharges them.]

Opt-dependent: the Local VCs, generated by VCGen for each optimization.

Opt-independent: the Lemma and its proof.

Lemma. For any Rhodium opt:
If Local VCs are true
Then opt is correct

Local verification conditions

define fact mustPointTo(X,Y)
with meaning [ X == &Y ]

if mustPointTo(X,Y)@in and currStmt == [Z = X]
then mustPointTo(Z,Y)@out

  Assume: All incoming facts are correct
  Show: Propagated fact is correct

if mustPointTo(X,Y)@in and currStmt == [Z = *X]
then transform to [Z = Y]

  Assume: All incoming facts are correct
  Show: Original stmt and transformed stmt have same behavior

Local VCs (generated and proven automatically)

Local correctness of prop. rules

if mustPointTo(X,Y)@in and currStmt == [Z = X]
then mustPointTo(Z,Y)@out

Local VC (generated and proven automatically):

  Assume: [ X == &Y ] (in) and out = step (in , [Z = X])
  Show: [ Z == &Y ] (out)

[Diagram: X points to Y in state in. After Z := X, does Z point to Y in state out?]
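To make the two local VCs concrete, here is a sketch that checks them by exhaustive enumeration over a tiny state space. This is not how the Rhodium checker works (it hands the VCs to an automatic theorem prover); the variable names, the value domain, and the `step` function below are assumptions chosen to keep the model small. A state maps each variable to either an address `("addr", v)` or an integer.

```python
from itertools import product

VARS = ("p", "q", "r")
VALUES = tuple(("addr", v) for v in VARS) + (0, 1)


def all_states():
    """Every assignment of a value to every variable in the toy model."""
    for combo in product(VALUES, repeat=len(VARS)):
        yield dict(zip(VARS, combo))


def step(state, stmt):
    """Execute one statement; return None if it dereferences a non-pointer."""
    kind, dst, src = stmt
    out = dict(state)
    if kind == "copy":            # dst = src
        out[dst] = state[src]
    elif kind == "load":          # dst = *src
        pointee = state[src]
        if not (isinstance(pointee, tuple) and pointee[0] == "addr"):
            return None
        out[dst] = state[pointee[1]]
    return out


def must_point_to(state, x, y):
    """Meaning of mustPointTo(X,Y): [ X == &Y ]."""
    return state[x] == ("addr", y)


def check_propagation_vc():
    """Assume [X == &Y](in) and out = step(in, [Z = X]); show [Z == &Y](out)."""
    for x, y, z in product(VARS, repeat=3):
        for s_in in all_states():
            if must_point_to(s_in, x, y):
                s_out = step(s_in, ("copy", z, x))
                if not must_point_to(s_out, z, y):
                    return (x, y, z, s_in)  # counterexample
    return None


def check_transformation_vc():
    """Assume [X == &Y](in); show [Z = *X] and [Z = Y] yield the same state."""
    for x, y, z in product(VARS, repeat=3):
        for s_in in all_states():
            if must_point_to(s_in, x, y):
                if step(s_in, ("load", z, x)) != step(s_in, ("copy", z, y)):
                    return (x, y, z, s_in)  # counterexample
    return None


if __name__ == "__main__":
    print("propagation VC counterexample:", check_propagation_vc())
    print("transformation VC counterexample:", check_transformation_vc())
```

Both checks look at a single statement and a single pair of states, which is what makes the conditions local: no reasoning about whole programs or about the iteration to a fixed point is needed at this stage.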

Dimensions of evaluation

• Ease of use

• Correctness guarantees

• Usefulness of the checker

• Expressiveness

Ease of use

• Joao Dias
– third year graduate student in compilers at Harvard
– less than 45 mins to write CSE and copy prop

• Erika Rice, summer 2004
– only knowledge of compilers: one undergrad class
– started writing Rhodium optimizations in a few days

• Simple interface to the compiler’s structures
– pattern matching
– “flow functions” familiar to compiler 101 students

Correctness guarantees

• Once checked, optimizations are guaranteed to be correct

• Caveat: trusted computing base
– execution engine
– checker implementation
– proofs done by hand once by me

• Adding a new optimization does not increase the size of the trusted computing base

Usefulness of the checker

• Found subtle bugs in my initial implementation of various optimizations

define fact equals(X:Var, E:Expr)
with meaning [ X == E ]

First version:

if currStmt == [X = E]
then equals(X,E)@out

Rejected by the checker: x := x + 1 would produce equals (x , x + 1), which is false after the statement.

Second version:

if currStmt == [X = E] and “X does not appear in E”
then equals(X,E)@out

Still rejected: x := *y + 1 would produce equals (x , *y + 1), which is false when y points to x.

Final version:

if currStmt == [X = E] and “E does not use X”
then equals(X,E)@out
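The two counterexamples can be replayed on concrete states. The sketch below is a toy model, not Rhodium: the expression encoding and the helper names are made up for illustration. `appears_in` models the purely syntactic side condition “X does not appear in E”, and `equals_holds` evaluates the meaning [ X == E ] in the state after the assignment.

```python
def evaluate(expr, state):
    """Evaluate a tiny expression; pointers are stored as ("addr", name)."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return state[expr[1]]
    if kind == "deref":
        target = state[expr[1]][1]  # expr[1] must hold ("addr", target)
        return state[target]
    if kind == "add":
        return evaluate(expr[1], state) + evaluate(expr[2], state)
    raise ValueError(f"unknown expression kind: {kind}")


def assign(state, x, expr):
    """Execute x := expr and return the new state."""
    out = dict(state)
    out[x] = evaluate(expr, state)
    return out


def equals_holds(state, x, expr):
    """Meaning of equals(X,E): [ X == E ], evaluated in the given state."""
    return state[x] == evaluate(expr, state)


def appears_in(x, expr):
    """Syntactic check: is the name x written anywhere in expr?"""
    kind = expr[0]
    if kind in ("var", "deref"):
        return expr[1] == x
    if kind == "add":
        return appears_in(x, expr[1]) or appears_in(x, expr[2])
    return False


X_PLUS_1 = ("add", ("var", "x"), ("const", 1))       # x + 1
DEREF_Y_PLUS_1 = ("add", ("deref", "y"), ("const", 1))  # *y + 1
Z_PLUS_1 = ("add", ("var", "z"), ("const", 1))       # z + 1

if __name__ == "__main__":
    # First version: x := x + 1 makes equals(x, x + 1) false afterwards.
    after = assign({"x": 5}, "x", X_PLUS_1)
    print(equals_holds(after, "x", X_PLUS_1))

    # Second version: x is not written in *y + 1, yet y points to x.
    before = {"x": 5, "y": ("addr", "x")}
    print(appears_in("x", DEREF_Y_PLUS_1))
    print(equals_holds(assign(before, "x", DEREF_Y_PLUS_1), "x", DEREF_Y_PLUS_1))

    # An expression that really does not use x keeps the fact true.
    print(equals_holds(assign({"x": 5, "z": 10}, "x", Z_PLUS_1), "x", Z_PLUS_1))
```

The second case is the subtle one: the syntactic condition passes, but the expression still reads x through the pointer, which is why the final rule asks that E does not use X rather than that X does not appear in E.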

Rhodium expressiveness

• Traditional optimizations:
– const prop and folding, branch folding, dead assignment elim, common sub-expression elim, partial redundancy elim, partial dead assignment elim, arithmetic invariant detection, and integer range analysis

• Pointer analyses
– must-point-to analysis, Andersen's may-point-to analysis with heap summaries

• Loop opts
– loop-induction-variable strength reduction, code hoisting, code sinking

• Array opts
– constant propagation through array elements, redundant array load elimination

Expressiveness limitations

• May not be able to express your optimization in Rhodium
– opts that build complicated data structures
– opts that perform complicated many-to-many transformations (e.g.: loop fusion, loop unrolling)

• A correct Rhodium optimization may be rejected by the correctness checker
– limitations of the theorem prover
– limitations of first-order logic

Summary

• Rhodium system
– makes it easier to write optimizations
– provides correctness guarantees
– is expressive enough for realistic optimizations

• Rhodium system provides a foundation for safe extensible program manipulators