super-optimizing llvm irllvm.org/devmtg/2011-11/sands_super-optimizingllvmir.pdfsuper optimization...

121
Super-optimizing LLVM IR Duncan Sands DeepBlueCapital / CNRS

Upload: others

Post on 10-Aug-2020

14 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Super-optimizing LLVM IR

Duncan Sands

DeepBlueCapital / CNRS

Page 2: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Thanks to

Googlefor sponsorship

Page 3: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Super optimization

● Optimization → Improve code

Page 4: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Super optimization

● Optimization → Improve code

● Super-optimization → Obtain perfect code

Page 5: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Super optimization

● Optimization → Improve code

● Super-optimization → Obtain perfect code

Super-optimization → automatically find code improvements

Page 6: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Super optimization

● Optimization → Improve code

● Super-optimization → Obtain perfect code

Super-optimization → automatically find code improvements

Idea from LLVM OpenProjects web-page(suggested by John Regehr)

Page 7: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Goal

Automatically find simplifications missed by the LLVM optimizers

- And have a human implement them in LLVM

Page 8: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Goal

- And have a human implement them in LLVM

Non goalDirectly optimize programs

Automatically find simplifications missed by the LLVM optimizers

Page 9: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Goal

- And have a human implement them in LLVM

Non goalDirectly optimize programs

It doesn't matter if the simplifications foundare sometimes wrong

Automatically find simplifications missed by the LLVM optimizers

Page 10: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

ExamplesMissed simplifications found in “fully optimized” code:

• X - (X - Y) → Y

Page 11: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

ExamplesMissed simplifications found in “fully optimized” code:

• X - (X - Y) → Y Not done because of operand uses

Page 12: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

ExamplesMissed simplifications found in “fully optimized” code:

• X - (X - Y) → Y

• (X<<1) - X → X

Not done because of operand uses

Page 13: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

ExamplesMissed simplifications found in “fully optimized” code:

• X - (X - Y) → Y

• (X<<1) - X → X

Not done because of operand uses

Not done because of operand uses

Page 14: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

ExamplesMissed simplifications found in “fully optimized” code:

• X - (X - Y) → Y

• (X<<1) - X → X

• non-negative number + power-of-two != 0 → true

Not done because of operand uses

Not done because of operand uses

Page 15: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

ExamplesMissed simplifications found in “fully optimized” code:

• X - (X - Y) → Y

• (X<<1) - X → X

• non-negative number + power-of-two != 0 → true

Not done because of operand uses

Not done because of operand uses

New!

Page 16: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Process● Compile program to bitcode

Page 17: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Process● Compile program to bitcode

● Run optimizers on bitcode

Page 18: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Process● Compile program to bitcode

● Run optimizers on bitcode

● Harvest interesting expressions

Page 19: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Process● Compile program to bitcode

● Run optimizers on bitcode

● Harvest interesting expressions

● Analyse them for missing simplifications

Page 20: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Process● Compile program to bitcode

● Run optimizers on bitcode

● Harvest interesting expressions

● Analyse them for missing simplifications

● Implement the simplifications in LLVM

Page 21: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Process● Compile program to bitcode

● Run optimizers on bitcode

● Harvest interesting expressions

● Analyse them for missing simplifications

● Implement the simplifications in LLVM

Repeat

Page 22: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Process● Compile program to bitcode

● Run optimizers on bitcode

● Harvest interesting expressions

● Analyse them for missing simplifications

● Implement the simplifications in LLVM

● Profit!

Repeat

Page 23: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Process● Compile program to bitcode

● Run optimizers on bitcode

● Harvest interesting expressions

● Analyse them for missing simplifications

● Implement the simplifications in LLVM

● Profit!

Repeat

Inspired by “Automatic Generation of Peephole Superoptimizers”by Bansal & Aiken (Computer Systems Lab, Stanford)

Page 24: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Harvesting$ opt ­load=./harvest.so ­std­compile­opts ­harvest ­details \    ­disable­output bzip2.bc@07:@09{  ; In function: "mainGtU()", BB: "entry"  %0 = zext i32 %i1 to i64}07:@07:@3c:12:@3c:@06:@07:24:28:20:@29{  ; In function: "bsPutUInt32()", BB: "bsW.exit"  %28 = lshr i32 %u, 16  %29 = and i32 %28, 255  %49 = sub i32 24, %48 ; From BB: "bsW.exit24"  %50 = shl i32 %29, %49        ; From BB: "bsW.exit24"  %51 = or i32 %50, %47 ; From BB: "bsW.exit24"}...

Page 25: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Harvesting$ opt ­load=./harvest.so ­std­compile­opts ­harvest ­details \    ­disable­output bzip2.bc@07:@09{  ; In function: "mainGtU()", BB: "entry"  %0 = zext i32 %i1 to i64}07:@07:@3c:12:@3c:@06:@07:24:28:20:@29{  ; In function: "bsPutUInt32()", BB: "bsW.exit"  %28 = lshr i32 %u, 16  %29 = and i32 %28, 255  %49 = sub i32 24, %48 ; From BB: "bsW.exit24"  %50 = shl i32 %29, %49        ; From BB: "bsW.exit24"  %51 = or i32 %50, %47 ; From BB: "bsW.exit24"}...

Plugin pass that harvests code sequences

Page 26: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Harvesting$ opt ­load=./harvest.so ­std­compile­opts ­harvest ­details \    ­disable­output bzip2.bc@07:@09{  ; In function: "mainGtU()", BB: "entry"  %0 = zext i32 %i1 to i64}07:@07:@3c:12:@3c:@06:@07:24:28:20:@29{  ; In function: "bsPutUInt32()", BB: "bsW.exit"  %28 = lshr i32 %u, 16  %29 = and i32 %28, 255  %49 = sub i32 24, %48 ; From BB: "bsW.exit24"  %50 = shl i32 %29, %49        ; From BB: "bsW.exit24"  %51 = or i32 %50, %47 ; From BB: "bsW.exit24"}...

Harvest code sequences after running standard optimizers

Page 27: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Harvesting$ opt ­load=./harvest.so ­std­compile­opts ­harvest ­details \    ­disable­output bzip2.bc@07:@09{  ; In function: "mainGtU()", BB: "entry"  %0 = zext i32 %i1 to i64}07:@07:@3c:12:@3c:@06:@07:24:28:20:@29{  ; In function: "bsPutUInt32()", BB: "bsW.exit"  %28 = lshr i32 %u, 16  %29 = and i32 %28, 255  %49 = sub i32 24, %48 ; From BB: "bsW.exit24"  %50 = shl i32 %29, %49        ; From BB: "bsW.exit24"  %51 = or i32 %50, %47 ; From BB: "bsW.exit24"}...

Code sequences}

}

Page 28: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Harvesting$ opt ­load=./harvest.so ­std­compile­opts ­harvest ­details \    ­disable­output bzip2.bc@07:@09{  ; In function: "mainGtU()", BB: "entry"  %0 = zext i32 %i1 to i64}07:@07:@3c:12:@3c:@06:@07:24:28:20:@29{  ; In function: "bsPutUInt32()", BB: "bsW.exit"  %28 = lshr i32 %u, 16  %29 = and i32 %28, 255  %49 = sub i32 24, %48 ; From BB: "bsW.exit24"  %50 = shl i32 %29, %49        ; From BB: "bsW.exit24"  %51 = or i32 %50, %47 ; From BB: "bsW.exit24"}...

Code sequences}

}Code sequence = maximal connected subgraph of theLLVM IR containing only supported operations

Page 29: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Harvesting$ opt ­load=./harvest.so ­std­compile­opts ­harvest ­details \    ­disable­output bzip2.bc@07:@09{  ; In function: "mainGtU()", BB: "entry"  %0 = zext i32 %i1 to i64}07:@07:@3c:12:@3c:@06:@07:24:28:20:@29{  ; In function: "bsPutUInt32()", BB: "bsW.exit"  %28 = lshr i32 %u, 16  %29 = and i32 %28, 255  %49 = sub i32 24, %48 ; From BB: "bsW.exit24"  %50 = shl i32 %29, %49        ; From BB: "bsW.exit24"  %51 = or i32 %50, %47 ; From BB: "bsW.exit24"}...

Normalized expressions

Page 30: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Harvesting$ opt ­load=./harvest.so ­std­compile­opts ­harvest ­details \    ­disable­output bzip2.bc@07:@09{  ; In function: "mainGtU()", BB: "entry"  %0 = zext i32 %i1 to i64}07:@07:@3c:12:@3c:@06:@07:24:28:20:@29{  ; In function: "bsPutUInt32()", BB: "bsW.exit"  %28 = lshr i32 %u, 16  %29 = and i32 %28, 255  %49 = sub i32 24, %48 ; From BB: "bsW.exit24"  %50 = shl i32 %29, %49        ; From BB: "bsW.exit24"  %51 = or i32 %50, %47 ; From BB: "bsW.exit24"}...

Explanatory annotations(ignored)

Page 31: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Harvesting$ opt ­load=./harvest.so ­std­compile­opts ­harvest \    ­disable­output bzip2.bc@07:@0907:@07:@3c:12:@3c:@06:@07:24:28:20:@29...

Normalized & encoded form allows textual comparisons:

$ opt ­load=./harvest.so ­std­compile­opts ­harvest \  ­disable­output bzip2.bc | sort | uniq ­c | sort ­r ­n    265 @00:07:@2b    178 @01:07:@0f    120 @00:@07:@2b    ...

$ opt ­load=./harvest.so ­std­compile­opts ­harvest \    ­disable­output bzip2.bc@07:@0907:@07:@3c:12:@3c:@06:@07:24:28:20:@29...

} Ordered by frequency of occurrence

Page 32: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

HarvestingMost common expressions in unoptimized bitcode from the LLVM testsuite:

07:0a → sext X00:07:2c → X != 007:09 → zext X05:07:0f → X +nsw -100:07:2b → X == 007:07:13 → X -nsw Y07:07:32 → X >=s Y01:07:0f → X +nsw 106:07:0a:16 → (sext X) * power-of-2

sext = sign-extend

zext = zero-extend

+nsw = add with no-signed wrap

-nsw = sub with no-signed wrap>=s = signed greater than or equal

power-of-2 = constant thatis a power of two

Page 33: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

ExpressionsICMP_SLT

ZeroZExt

Add

Register Register

● Directed acyclic graph - no loops!

● Integer operations only - no floating point!

● No memory operations (load/store)!

● No types!

● Limited set of constants (eg: Zero, One, SignBit)

Most integer operations supported (eg: ctlz, overflow intrinsics).Doesn't support byteswap (because of lack of types).

Page 34: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Analysing expressions

Four modes:

● Constant folding

● Reduce to sub-expression

● Unused variables

● Rule reduction

Page 35: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Analysing expressions

Four modes:

● Constant folding

● Reduce to sub-expression

● Unused variables

● Rule reduction

zext x <s 0 → 0 (i.e. false)

Page 36: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Analysing expressions

Four modes:

● Constant folding

● Reduce to sub-expression

● Unused variables

● Rule reduction

zext x <s 0 → 0 (i.e. false)

((x + z) *nsw y) /s y → x + z

Page 37: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Analysing expressions

Four modes:

● Constant folding

● Reduce to sub-expression

● Unused variables

● Rule reduction

zext x <s 0 → 0 (i.e. false)

((x + z) *nsw y) /s y → x + z

Page 38: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Analysing expressions

Four modes:

● Constant folding

● Reduce to sub-expression

● Unused variables

● Rule reduction

zext x <s 0 → 0 (i.e. false)

((x + z) *nsw y) /s y → x + z

x - (x + y) → 0 - y

Page 39: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Analysing expressions

Four modes:

● Constant folding

● Reduce to sub-expression

● Unused variables

● Rule reduction

zext x <s 0 → 0 (i.e. false)

((x + z) *nsw y) /s y → x + z

x - (x + y) → 0 - y

Result does not depend on xCan replace x with (eg) 0

Page 40: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Analysing expressions

Four modes:

● Constant folding

● Reduce to sub-expression

● Unused variables

● Rule reduction

zext x <s 0 → 0 (i.e. false)

((x + z) *nsw y) /s y → x + z

x - (x + y) → 0 - y

Repeatedly apply rules from a list.Search minimum of cost function.

Page 41: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Analysing expressions

Four modes:

● Constant folding

● Reduce to sub-expression

● Unused variables

● Rule reduction

zext x <s 0 → 0 (i.e. false)

((x + z) *nsw y) /s y → x + z

x - (x + y) → 0 - y

Repeatedly apply rules from a list.Search minimum of cost function.

Rafael Auler'sGSOC project

Page 42: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Analysing expressions

Four modes:

● Constant folding

● Reduce to sub-expression

● Unused variables

● Rule reduction

zext x <s 0 → 0 (i.e. false)

((x + z) *nsw y) /s y → x + z

x - (x + y) → 0 - y

Repeatedly apply rules from a list.Search minimum of cost function.

Fast!

Alway

s a w

in!

Page 43: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Analysing expressions

Four modes:

● Constant folding

● Reduce to sub-expression

● Unused variables

● Rule reduction

zext x <s 0 → 0 (i.e. false)

((x + z) *nsw y) /s y → x + z

x - (x + y) → 0 - y

Repeatedly apply rules from a list.Search minimum of cost function.

Fast!

Alway

s a w

in!

Fast!

Often

a win!

Page 44: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Analysing expressions

Four modes:

● Constant folding

● Reduce to sub-expression

● Unused variables

● Rule reduction

zext x <s 0 → 0 (i.e. false)

((x + z) *nsw y) /s y → x + z

x - (x + y) → 0 - y

Repeatedly apply rules from a list.Search minimum of cost function.

Fast!

Alway

s a w

in!

Fast!

Often

a win!

Fast!

Somet

imes

a w

in!

Page 45: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Analysing expressions

Four modes:

● Constant folding

● Reduce to sub-expression

● Unused variables

● Rule reduction

zext x <s 0 → 0 (i.e. false)

((x + z) *nsw y) /s y → x + z

x - (x + y) → 0 - y

Repeatedly apply rules from a list.Search minimum of cost function.

Fast!

Alway

s a w

in!

Fast!

Often

a win!

Fast!

Somet

imes

a w

in!

Slow!

Wor

k in

prog

ress

!

Page 46: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Analysing expressions

Four modes:

● Constant folding

● Reduce to sub-expression

● Unused variables

● Rule reduction

zext x <s 0 → 0 (i.e. false)

((x + z) *nsw y) /s y → x + z

x - (x + y) → 0 - y

Repeatedly apply rules from a list.Search minimum of cost function.

Implement in LLVM'sInstructionSimplify analysis

Implement in LLVM'sInstCombine transform

Page 47: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Constant foldingICMP_SLT

ZeroZExt

Add

Register Register

● Assign types to nodes

Page 48: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Constant foldingICMP_SLT

ZeroZExt

Add

Register Register

● Assign types to nodes

i1 No choice

Page 49: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Constant foldingICMP_SLT

ZeroZExt

Add

Register Register

● Assign types to nodesi1

i1

Choice (chose smallest)

Page 50: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Constant foldingICMP_SLT

ZeroZExt

Add

Register Register

● Assign types to nodesi1 i1

i1

i1

No choice

Page 51: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Constant foldingICMP_SLT

ZeroZExt

Add

Register Register

● Assign types to nodesi1 i1

i1

i2

i1

Choice (chose smallest)

Page 52: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Constant foldingICMP_SLT

ZeroZExt

Add

Register Register

● Assign types to nodesi1 i1

i1

i2

i1

i2 No choice

Page 53: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Constant foldingICMP_SLT

ZeroZExt

Add

Register Register

● Assign types to nodesi1 i1

i1

i2

i1

i2

Strategies: (1) Random choice; (2) All small types.

Page 54: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Constant foldingICMP_SLT

ZeroZExt

Add

Register Register

● Assign types to nodes

● Assign values to terminal nodes & propagate up

i1 0 i1 1

i1

i2

i1

i2 0

Strategies: (1) Random choice; (2) All small types.

Choice

No choice

Page 55: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Constant foldingICMP_SLT

ZeroZExt

Add

Register Register

● Assign types to nodes

● Assign values to terminal nodes & propagate up

i1 0 i1 1

i1 1

i2 1

i1 0

i2 0

Strategies: (1) Random choice; (2) All small types.

Page 56: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Constant foldingICMP_SLT

ZeroZExt

Add

Register Register

● Assign types to nodes

● Assign values to terminal nodes & propagate up

i1 0 i1 1

i1 1

i2 1

i1 0

i2 0

Strategies: (1) Random choice; (2) All small types.

Strategies: (1) Random inputs; (2) Every possible input.

Page 57: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Constant foldingICMP_SLT

ZeroZExt

Add

Register Register

● Assign types to nodes

● Assign values to terminal nodes & propagate up

i2 1 i2 1

i2 2

i3 2

i1 0

i3 0

Strategies: (1) Random choice; (2) All small types.

Strategies: (1) Random inputs; (2) Every possible input.

Repeat many times

Page 58: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Constant foldingICMP_SLT

ZeroZExt

Add

Register Register

● Assign types to nodes

● Assign values to terminal nodes & propagate up

● Result at the root always the same → found a constant fold

i2 1 i2 1

i2 2

i3 2

i1 0

i3 0

Strategies: (1) Random choice; (2) All small types.

Strategies: (1) Random inputs; (2) Every possible input.

Repeat many times

Always zero

Page 59: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

False positives

Eg: A | (B + 1) | (C - 1) == 0

Page 60: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

False positives

Eg: A | (B + 1) | (C - 1) == 0

Mostly evaluates to “false”

Page 61: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

False positives

Eg: A | (B + 1) | (C - 1) == 0

A, B and C have i8 type → 1 / 2^24 chance of seeing “true”

Mostly evaluates to “false”

Page 62: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

False positives

Eg: A | (B + 1) | (C - 1) == 0

A, B and C have i8 type → 1 / 2^24 chance of seeing “true”

A, B and C have i1 type → 1 / 8 chance of seeing “true”

Mostly evaluates to “false”

Page 63: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

False positives

Eg: A | (B + 1) | (C - 1) == 0

A, B and C have i8 type → 1 / 2^24 chance of seeing “true”

A, B and C have i1 type → 1 / 8 chance of seeing “true”

Mostly evaluates to “false”

Use of small types hugely reduces the number of false positives

Page 64: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

ExamplesConstant folds found in “fully optimized” code:

● ( ( (X + Y) >>L power-of-two ) & Z ) + power-of-two == 0 → false

Page 65: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

ExamplesConstant folds found in “fully optimized” code:

● ( ( (X + Y) >>L power-of-two ) & Z ) + power-of-two == 0 → false

Implemented as: “non-negative-number + power-of-two != 0”

Page 66: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

ExamplesConstant folds found in “fully optimized” code:

● ( ( (X + Y) >>L power-of-two ) & Z ) + power-of-two == 0 → false

● ( (X >s Y) ? X : Y ) >=s X → true

Page 67: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

ExamplesConstant folds found in “fully optimized” code:

● ( ( (X + Y) >>L power-of-two ) & Z ) + power-of-two == 0 → false

● ( (X >s Y) ? X : Y ) >=s X → true

“max(X, Y) >= X”. Implemented several max/min folds.

Page 68: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

ExamplesConstant folds found in “fully optimized” code:

● ( ( (X + Y) >>L power-of-two ) & Z ) + power-of-two == 0 → false

● ( (X >s Y) ? X : Y ) >=s X → true

● X rem ( Y ? X : 1 ) → 0

● (Y /u X) >u Y → false

Page 69: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

ExamplesConstant folds found in “fully optimized” code:

● ( ( (X + Y) >>L power-of-two ) & Z ) + power-of-two == 0 → false

● ( (X >s Y) ? X : Y ) >=s X → true

● X rem ( Y ? X : 1 ) → 0

● (Y /u X) >u Y → false

Require reasoning aboutundefined behaviour

Page 70: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Undefined behaviour

ICMP_UGT

UDiv

Register (X) Register (Y)

(X /u Y) >u X → false

Page 71: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Undefined behaviour

ICMP_UGT

UDiv

Register (X) Register (Y)

(X /u Y) >u X → false

i8 42 i8 0

undefined

Page 72: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Undefined behaviour

ICMP_UGT

UDiv

Register (X) Register (Y)

(X /u Y) >u X → false

i8 42 i8 0

undefined

undefined

Any operation with an undef operand gets an undef result

Page 73: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Undefined behaviour

ICMP_UGT

UDiv

Register (X) Register (Y)

(X /u Y) >u X → false

i8 42 i8 0

undefined

undefined

Any operation with an undef operand gets an undef result

● Avoids false negatives

● May result in subtle false positives

Page 74: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Reduce to subexpressionSDiv

MulNSW

Register (X) Register (Y)

(X *nsw Y) /s Y → X

Page 75: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Reduce to subexpressionSDiv

MulNSW

Register (X) Register (Y)

(X *nsw Y) /s Y → X

● Assign types to nodesStrategies: (1) Random choice; (2) All small types.

i3 i3

i3

i3

Page 76: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Reduce to subexpressionSDiv

MulNSW

Register (X) Register (Y)

(X *nsw Y) /s Y → X

● Assign types to nodes

● Assign values to terminal nodes & propagate upStrategies: (1) Random choice; (2) All small types.

Strategies: (1) Random inputs; (2) Every possible input.

i3 2 i3 1

i3 2

i3 2

Page 77: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Reduce to subexpressionSDiv

MulNSW

Register (X) Register (Y)

(X *nsw Y) /s Y → X

● Assign types to nodes

● Assign values to terminal nodes & propagate up

● See if some node always has same value as root (or undef)

Strategies: (1) Random choice; (2) All small types.

Strategies: (1) Random inputs; (2) Every possible input.

i3 2 i3 1

i3 2

i3 2

Same

Page 78: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Reduce to subexpressionSDiv

MulNSW

Register (X) Register (Y)

(X *nsw Y) /s Y → X

● Assign types to nodes

● Assign values to terminal nodes & propagate up

● See if some node always has same value as root (or undef)

Strategies: (1) Random choice; (2) All small types.

Strategies: (1) Random inputs; (2) Every possible input.

Repeat many timesi3 1 i3 2

i3 2

i3 1

Same

Page 79: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Reduce to subexpressionSDiv

MulNSW

Register (X) Register (Y)

(X *nsw Y) /s Y → X

● Assign types to nodes

● Assign values to terminal nodes & propagate up

● See if some node always has same value as root (or undef)→ found a subexpression reduction

Strategies: (1) Random choice; (2) All small types.

Strategies: (1) Random inputs; (2) Every possible input.

Repeat many times

Always same

Page 80: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Register pressure(X *nsw Y) /s Y → X Is this always a win?

Page 81: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Register pressure(X *nsw Y) /s Y → X Is this always a win?

Z = X *nsw Y

...

W = Z /s Ycall @foo(W, Y, Z)

Page 82: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Register pressure(X *nsw Y) /s Y → X Is this always a win?

Z = X *nsw Y

...

W = Z /s Ycall @foo(W, Y, Z)

X not used again

Page 83: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Register pressure(X *nsw Y) /s Y → X Is this always a win?

Z = X *nsw Y

...

W = Z /s Ycall @foo(W, Y, Z)

X not used againTwo registers needed (for Y, Z)

Page 84: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Register pressure(X *nsw Y) /s Y → X Is this always a win?

Z = X *nsw Y

...

W = Z /s Ycall @foo(W, Y, Z)

Z = X *nsw Y

...

... W not computed ...call @foo(X, Y, Z)

Transform: W → X

Page 85: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Register pressure(X *nsw Y) /s Y → X Is this always a win?

Z = X *nsw Y

...

... W not computed ...call @foo(X, Y, Z)

Three registers needed (for X, Y, Z)

Page 86: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Register pressure(X *nsw Y) /s Y → X Is this always a win?

Transform increases the number of long lived registers by one.May require spilling to the stack.

Page 87: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Unused variablesX +nsw Z >=s Z +nsw Y

Z is an “unused variable”

Page 88: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Unused variablesX +nsw Z >=s Z +nsw Y

Z is an “unused variable”

For every choice of the other variables (X, Y)the result of the expression does not dependon the value of Z (or is undefined)

Page 89: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Unused variablesX +nsw Z >=s Z +nsw Y

Z is an “unused variable”

For every choice of the other variables (X, Y)the result of the expression does not dependon the value of Z (or is undefined)

Replaced Z with 0

Transform: X +nsw Z >=s Z +nsw Y → X >=s Y

Page 90: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Unused variablesX +nsw Z >=s Z +nsw Y

Z is an “unused variable”

For every choice of the other variables (X, Y)the result of the expression does not dependon the value of Z (or is undefined)

Replaced Z with 0

Transform: X +nsw Z >=s Z +nsw Y → X >=s Y

Detect similarly to constant folding etc.

Page 91: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

ExamplesUnused variables found in “fully optimized” code:

● X >=s X +nsw Y

● ((X + Y) + -1) == X

● Y >>exact X == 0

● Y <<nsw X == 0

X is unused

Page 92: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Problems with unused variables

● More false positives than other modes

Page 93: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Problems with unused variables

● More false positives than other modes

● May increase register pressure

Page 94: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Problems with unused variables

● More false positives than other modes

● May increase register pressure

● May increase the amount of computation

Page 95: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Problems with unused variables

● More false positives than other modes

● May increase register pressure

● May increase the amount of computation

Eg: (A + B) * (C + D) == B * C + B * D

B is an unused variable

Page 96: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Problems with unused variables

● More false positives than other modes

● May increase register pressure

● May increase the amount of computation

Eg: (A + B) * (C + D) == B * C + B * D

B is an unused variable

Transforms to: A * C + A * D == 0

Page 97: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Problems with unused variables

● More false positives than other modes

● May increase register pressure

● May increase the amount of computation

Eg: (A + B) * (C + D) == B * C + B * D

B is an unused variable

Transforms to: A * C + A * D == 0

Requires computing A*C, A*D etc.

Page 98: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Rule reductionRequires a list of rules, eg:

rule (0 And 1) => (1 And 0); // Commutativity rule (0 And AllBitsSet) <=> 0; // AllBitsSet is And-identity rule ((0 Or 1) And 2) <=> ((0 And 2) Or (1 And 2)); // Distributivity rule (0 Or AllBitsSet) => AllBitsSet; // AllBitsSet is Or-annihilator.

Page 99: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Rule reductionRequires a list of rules, eg:

rule (0 And 1) => (1 And 0); // Commutativity rule (0 And AllBitsSet) <=> 0; // AllBitsSet is And-identity rule ((0 Or 1) And 2) <=> ((0 And 2) Or (1 And 2)); // Distributivity rule (0 Or AllBitsSet) => AllBitsSet; // AllBitsSet is Or-annihilator.

(X & Y) | Y Cost: 22

Page 100: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Rule reductionRequires a list of rules, eg:

rule (0 And 1) => (1 And 0); // Commutativity rule (0 And AllBitsSet) <=> 0; // AllBitsSet is And-identity rule ((0 Or 1) And 2) <=> ((0 And 2) Or (1 And 2)); // Distributivity rule (0 Or AllBitsSet) => AllBitsSet; // AllBitsSet is Or-annihilator.

(X & Y) | Y Cost: 22

(X & Y) | (Y & AllOnesValue) Cost: 30

Page 101: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Rule reductionRequires a list of rules, eg:

rule (0 And 1) => (1 And 0); // Commutativity rule (0 And AllBitsSet) <=> 0; // AllBitsSet is And-identity rule ((0 Or 1) And 2) <=> ((0 And 2) Or (1 And 2)); // Distributivity rule (0 Or AllBitsSet) => AllBitsSet; // AllBitsSet is Or-annihilator.

(X & Y) | Y Cost: 22

(X & Y) | (Y & AllOnesValue) Cost: 30

(X & Y) | (AllOnesValue & Y) Cost: 30

Page 102: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Rule reductionRequires a list of rules, eg:

rule (0 And 1) => (1 And 0); // Commutativity rule (0 And AllBitsSet) <=> 0; // AllBitsSet is And-identity rule ((0 Or 1) And 2) <=> ((0 And 2) Or (1 And 2)); // Distributivity rule (0 Or AllBitsSet) => AllBitsSet; // AllBitsSet is Or-annihilator.

(X & Y) | Y Cost: 22

(X & Y) | (Y & AllOnesValue) Cost: 30

(X & Y) | (AllOnesValue & Y) Cost: 30

(X | AllOnesValue) & Y Cost: 22

Page 103: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Rule reductionRequires a list of rules, eg:

rule (0 And 1) => (1 And 0); // Commutativity rule (0 And AllBitsSet) <=> 0; // AllBitsSet is And-identity rule ((0 Or 1) And 2) <=> ((0 And 2) Or (1 And 2)); // Distributivity rule (0 Or AllBitsSet) => AllBitsSet; // AllBitsSet is Or-annihilator.

(X & Y) | Y Cost: 22

(X & Y) | (Y & AllOnesValue) Cost: 30

(X & Y) | (AllOnesValue & Y) Cost: 30

(X | AllOnesValue) & Y Cost: 22

AllOnesValue & Y Cost: 11

Page 104: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Rule reductionRequires a list of rules, eg:

rule (0 And 1) => (1 And 0); // Commutativity rule (0 And AllBitsSet) <=> 0; // AllBitsSet is And-identity rule ((0 Or 1) And 2) <=> ((0 And 2) Or (1 And 2)); // Distributivity rule (0 Or AllBitsSet) => AllBitsSet; // AllBitsSet is Or-annihilator.

(X & Y) | Y Cost: 22

(X & Y) | (Y & AllOnesValue) Cost: 30

(X & Y) | (AllOnesValue & Y) Cost: 30

(X | AllOnesValue) & Y Cost: 22

AllOnesValue & Y Cost: 11

Y Cost: 3

Page 105: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Rule reductionRequires a list of rules, eg:

rule (0 And 1) => (1 And 0); // Commutativity rule (0 And AllBitsSet) <=> 0; // AllBitsSet is And-identity rule ((0 Or 1) And 2) <=> ((0 And 2) Or (1 And 2)); // Distributivity rule (0 Or AllBitsSet) => AllBitsSet; // AllBitsSet is Or-annihilator.

(X & Y) | Y Cost: 22

(X & Y) | (Y & AllOnesValue) Cost: 30

(X & Y) | (AllOnesValue & Y) Cost: 30

(X | AllOnesValue) & Y Cost: 22

AllOnesValue & Y Cost: 11

Y Cost: 3

Page 106: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Rule reductionRequires a list of rules, eg:

rule (0 And 1) => (1 And 0); // Commutativity rule (0 And AllBitsSet) <=> 0; // AllBitsSet is And-identity rule ((0 Or 1) And 2) <=> ((0 And 2) Or (1 And 2)); // Distributivity rule (0 Or AllBitsSet) => AllBitsSet; // AllBitsSet is Or-annihilator.

(X & Y) | Y Cost: 22

(X & Y) | (Y & AllOnesValue) Cost: 30

(X & Y) | (AllOnesValue & Y) Cost: 30

(X | AllOnesValue) & Y Cost: 22

AllOnesValue & Y Cost: 11

Y Cost: 3

Time: 1 minute

Page 107: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Rule reductionRequires a list of rules, eg:

rule (0 And 1) => (1 And 0); // Commutativity rule (0 And AllBitsSet) <=> 0; // AllBitsSet is And-identity rule ((0 Or 1) And 2) <=> ((0 And 2) Or (1 And 2)); // Distributivity rule (0 Or AllBitsSet) => AllBitsSet; // AllBitsSet is Or-annihilator.

(X & Y) | Y Cost: 22

(X & Y) | (Y & AllOnesValue) Cost: 30

(X & Y) | (AllOnesValue & Y) Cost: 30

(X | AllOnesValue) & Y Cost: 22

AllOnesValue & Y Cost: 11

Y Cost: 3SubExpr: 0.05 secs UnusedVar: 0.08 secs

Time: 1 minute

Page 108: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Rule reduction problems● Slow

Page 109: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Rule reduction problems● Slow

● Needs more rules

Page 110: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Rule reduction problems● Slow

● Needs more rules

● Can this approach find unexpected simplifications?

(zext X) + power-of-two == 0 → false

Page 111: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Rule reduction problems● Slow

● Needs more rules

● Can this approach find unexpected simplifications?

(zext X) + power-of-two == 0 → false

Needs more work!

Page 112: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Profit!

Page 113: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Profit?Approximate % speed-up: constant folds

400.perlbench401.bzip2

403.gcc429.mcf

445.gobmk456.hmmer

458.sjeng462.libquantum

464.h264ref471.omnetpp

473.astar483.xalancbmk

-2

-1.5

-1

-0.5

0

0.5

1

1.5

2

Page 114: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Profit?!Approximate % speed-up: constant folds & reduce to sub-expr:

400.perlbench401.bzip2

403.gcc429.mcf

445.gobmk456.hmmer

458.sjeng462.libquantum

464.h264ref471.omnetpp

473.astar483.xalancbmk

-2

-1

0

1

2

3

4

Page 115: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Improvements● Work directly with LLVM IR

Page 116: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Improvements● Work directly with LLVM IR

define i64 @combine(i64 %x) { %xl = trunc i64 %x to i32 %h = lshr i64 %x, 32 %xh = trunc i64 %h to i32 %eh = zext i32 %xh to i64 %el = zext i32 %xl to i64 %h2 = shl i64 %eh, 32 %r = or i64 %h2, %el ret i64 %r}

Simplifies to: ret %x

Page 117: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Improvements● Work directly with LLVM IR

define i64 @combine(i64 %x) { %xl = trunc i64 %x to i32 %h = lshr i64 %x, 32 %xh = trunc i64 %h to i32 %eh = zext i32 %xh to i64 %el = zext i32 %xl to i64 %h2 = shl i64 %eh, 32 %r = or i64 %h2, %el ret i64 %r}

Simplifies to: ret %x

((zext (trunc (X >>l pow-2))) << pow-2) | (zext (trunc X))

Impossible to find, due to● Type-free expressions● Limited number of constants

Page 118: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Improvements● Work directly with LLVM IR

(Constant folding, subexpression reduction, unused variables)

How to avoid many false positives?

Page 119: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Improvements● Work directly with LLVM IR

● Sort expressions by execution frequency rather than textual frequency

(Constant folding, subexpression reduction, unused variables)

How to avoid many false positives?

Page 120: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

Improvements● Work directly with LLVM IR

● Sort expressions by execution frequency rather than textual frequency

(Constant folding, subexpression reduction, unused variables)

How to avoid many false positives?

Eg: generate fake debug info using the encoded expression forthe “function”.

Hottest “functions” reported by profiling tools are the hottestexpressions!

Page 121: Super-optimizing LLVM IRllvm.org/devmtg/2011-11/Sands_Super-optimizingLLVMIR.pdfSuper optimization Optimization → Improve code Super-optimization → Obtain perfect code Super-optimization

svn://topo.math.u-psud.fr/harvest

Getting it