Approximating to the Last Bit Thierry Moreau, Adrian Sampson, Luis Ceze {moreau, luisceze}@cs.washington.edu, [email protected] WAX 2016 co-located with ASPLOS 2016 April 3rd 2016

Page 1: Approximating to the Last Bit - approximate.computerapproximate.computer/wax2016/slides/moreau.pdfApplication Domain Kernels Metric ... Interpolation 1 Interpolation 2 Back Projection

Approximating to the Last Bit

Thierry Moreau, Adrian Sampson, Luis Ceze {moreau, luisceze}@cs.washington.edu, [email protected]

WAX 2016 co-located with ASPLOS 2016

April 3rd 2016

What this Talk is About

2

How many bits in a program are really that important?

1 - AXE: Quality Tuning Framework

2 - PERFECT Benchmark Study

Precision Tuning

3

More precision means a larger memory footprint, more data movement, and more energy spent on computation

Precision Tuning

4

[Figure: double → float]

Precision Tuning

5

[Figure: precision scale from double and float down to n bits, and ultimately 1 bit]

AXE Precision Tuning Framework

6

Goal: Maximize Bit-Savings given a Quality Target

AXE Precision Tuning Framework

[Diagram: kernel.c and a quality target go into the AXE framework, which outputs instruction-level precision requirements along with the resulting quality and bit-savings]

7

Built on top of ACCEPT, the approximate C/C++ compiler http://accept.rocks

AXE Precision Tuning Framework

8

[Diagram: instruction 0, instruction 1, instruction 2, …, instruction n-1, instruction n, with gauges for quality (bad to OK) and bit-savings]

Default (no bit-savings)

AXE Precision Tuning Framework

9

[Diagram: instruction 0, instruction 1, instruction 2, …, instruction n-1, instruction n, with gauges for quality (bad to OK) and bit-savings]

Coarse-Grained Precision Reduction

AXE Precision Tuning Framework

10

[Diagram: instruction 0, instruction 1, instruction 2, …, instruction n-1, instruction n, with gauges for quality (bad to OK) and bit-savings]

Fine-Grained Precision Reduction

PERFECT Benchmark Suite

Application domains and their kernels:

• PERFECT Application 1: Discrete Wavelet Transform, 2D Convolution, Histogram Equalization
• Space Time Adaptive Processing: Outer Product, System Solve, Inner Product
• Synthetic Aperture Radar: Interpolation 1, Interpolation 2, Back Projection
• Wide Area Motion Imaging: Debayer, Image Registration, Change Detection
• Required Kernels: FFT 1D, FFT 2D

Metric (all kernels): Signal to Noise Ratio (SNR), from 120 dB to 10 dB (0.0001% to 31.6% MSE)
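The two ends of the quality range are consistent with reading each percentage as a relative RMS error, that is, the square root of the noise-to-signal energy ratio in the SNR definition given in the backup slides, with r_k the reference and a_k the approximate value of output element k. This reading is ours; the slide itself labels the percentages as MSE.

$$
\sqrt{\frac{\sum_{k=1}^{N} |r_k - a_k|^2}{\sum_{k=1}^{N} |r_k|^2}} = 10^{-\mathrm{SNR}/20},
\qquad 10^{-120/20} = 10^{-6} = 0.0001\%,
\qquad 10^{-10/20} = 10^{-0.5} \approx 31.6\%
$$

Under the same reading, the 0.001% MSE and 10% MSE reference points marked on the bit-savings plots correspond to 100 dB and 20 dB respectively.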

11

1 - PERFECT Dynamic Instruction Mix

12

Safe to approximate: load/store 27%, int arith 4%, fp arith 31%, math 1%

Precise: int arith 25%, control 11%

1 - PERFECT Dynamic Instruction Mix

Long-latency ops are all safe to approximate

13

Safe to approximate: fp arith 31%, math 1%

1 - PERFECT Dynamic Instruction Mix

14

Safe to approximate: load/store 27%

Memory ops are mostly safe to approximate (they mostly move data rather than pointers)

1 - PERFECT Dynamic Instruction Mix

15

Precise: int arith 25%, control 11%

Control and address computation must remain precise

2 - Bit-Savings over Approximate Instructions

16

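[Figure: bit-savings (y-axis, 0% to 100%) vs. average SNR in dB (x-axis), from High Quality (120 dB) to Approximate (10 dB). Bit-savings: 26% at 120 dB, 32% at 100 dB, 40% at 80 dB, 48% at 60 dB, 57% at 40 dB, 74% at 20 dB, 83% at 10 dB]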

2 - Bit-Savings over Approximate Instructions

17

[Figure: the same bit-savings vs. average SNR plot, annotated with the quality level of the PERFECT Manual: 0.001% MSE]

2 - Bit-Savings over Approximate Instructions

18

[Figure: the same bit-savings vs. average SNR plot, annotated with two quality levels: PERFECT Manual, 0.001% MSE; Approximate Computing, 10% MSE]

Future Architectural Challenges

Mechanisms to translate bit-savings into energy savings?

New data types/representations?

ISA extensions?

19

Thank You!

20

Thierry Moreau, Luis Ceze, Adrian Sampson {moreau, luisceze}@cs.washington.edu, [email protected]

WAX 2016 co-located with ASPLOS 2016

April 3rd 2016

Approximating to the Last Bit

Backup Slides

21

Bit Savings

Explore the opportunity for precision reduction in a hardware-agnostic way

22

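$$\mathrm{BitSavings} = \sum_{\mathrm{insn}_{\mathrm{static}}} \frac{\mathrm{precision}_{\mathrm{ref}} - \mathrm{precision}_{\mathrm{approx}}}{\mathrm{precision}_{\mathrm{ref}}} \times \frac{\mathrm{execs}}{\mathrm{execs}_{\mathrm{total}}}$$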
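The formula can be evaluated directly once each static instruction's reference precision, tuned precision and dynamic execution count are known. The sketch below is hypothetical: the struct is not an AXE or ACCEPT data structure, and it assumes execs_total is the sum of the execution counts of the listed safe-to-approximate instructions (in line with the "Bit-Savings over Approximate Instructions" plots); it also assumes every precision_ref is positive.

```c
#include <stddef.h>

/* Hypothetical per-instruction record (not an AXE/ACCEPT structure). */
struct insn_profile {
    int precision_ref;    /* reference precision in bits, e.g. 32; must be > 0 */
    int precision_approx; /* precision chosen by the tuner, in bits */
    unsigned long execs;  /* dynamic execution count of this static instruction */
};

/* BitSavings = sum over static insns of
     (precision_ref - precision_approx) / precision_ref * execs / execs_total.
   Assumption: execs_total is the total execution count of the listed
   instructions. Returns a fraction in [0, 1]. */
double bit_savings(const struct insn_profile *insns, size_t m) {
    unsigned long execs_total = 0;
    for (size_t i = 0; i < m; i++)
        execs_total += insns[i].execs;
    if (execs_total == 0)
        return 0.0; /* nothing executed, nothing saved */

    double savings = 0.0;
    for (size_t i = 0; i < m; i++) {
        /* fraction of this instruction's bits that were dropped */
        double saved = (double)(insns[i].precision_ref - insns[i].precision_approx)
                       / (double)insns[i].precision_ref;
        /* weight by how often the instruction runs */
        double weight = (double)insns[i].execs / (double)execs_total;
        savings += saved * weight;
    }
    return savings;
}
```

For example, an fmul kept at 8 of 32 bits for 60 of 100 executions and a load kept at 16 of 32 bits for the other 40 give 0.75 × 0.6 + 0.5 × 0.4 = 0.65, i.e. 65% bit-savings. Weighting by dynamic counts means a heavily executed inner-loop instruction dominates the result, which is why the metric is hardware-agnostic but still tracks where the work happens.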

Framework Overview

Built on top of ACCEPT, the approximate C/C++ compiler http://accept.rocks

[Diagram: AXE framework stages, in order]
Annotated Program, Program Inputs & Quality Metrics
→ ACCEPT static analysis (produces the ILPC*)
→ ACCEPT error injection & instrumentation (produces the Approximate Binary)
→ Execution & Quality Assessment
→ Quality Autotuner (produces the Output Configuration, Quality Results & Bit Savings)

* Instruction-level Precision Configuration

23

Program Annotation

void conv2d (pix *in, pix *out, flt *filter) {
    for (row) {
        for (col) {
            flt sum = 0;
            int dstPos = …;
            for (row_offset) {
                for (col_offset) {
                    int srcPos = …;
                    int fltPos = …;
                    sum += in[srcPos] * filter[fltPos];
                }
            }
            out[dstPos] = sum / normFactor;
        }
    }
}

24

Program Annotation

void conv2d (APPROX pix *in, APPROX pix *out, APPROX flt *filter) {
    for (row) {
        for (col) {
            APPROX flt sum = 0;
            int dstPos = …;
            for (row_offset) {
                for (col_offset) {
                    int srcPos = …;
                    int fltPos = …;
                    sum += in[srcPos] * filter[fltPos];
                }
            }
            out[dstPos] = sum / normFactor;
        }
    }
}

Key: use the APPROX type qualifier

25

Program Annotation

Takeaways:
• Annotating data is intuitive (~10 minutes to annotate a kernel)
• Variables used to index arrays cannot be safely approximated

Tip for annotating programs faster: add APPROX to the typedefs

typedef float flt;  →  typedef APPROX float flt;
typedef int pix;    →  typedef APPROX int pix;
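A minimal compilable sketch of the annotated kernel, assuming APPROX can be stubbed out as an empty macro when building without ACCEPT (with ACCEPT, the qualifier is supplied by its own toolchain, which is not reproduced here). The image dimensions, filter size k, border handling and index arithmetic are hypothetical choices of ours; the slides elide them.

```c
#include <stddef.h>

/* Stand-in for ACCEPT's APPROX qualifier (assumption): expands to nothing
   so the file builds with any C compiler. */
#ifndef APPROX
#define APPROX
#endif

typedef APPROX float flt; /* approximate filter weights and accumulator */
typedef APPROX int pix;   /* approximate pixel values */

/* 2D convolution of a width x height image with a k x k filter (k odd).
   Border pixels, where the filter would fall outside the image, are left
   untouched. Loop counters and srcPos/dstPos/fltPos stay plain int:
   variables used to index arrays cannot be safely approximated. */
void conv2d(const pix *in, pix *out, const flt *filter,
            int width, int height, int k, flt normFactor) {
    int r = k / 2; /* filter radius */
    for (int row = r; row < height - r; row++) {
        for (int col = r; col < width - r; col++) {
            flt sum = 0;
            int dstPos = row * width + col;
            for (int row_offset = -r; row_offset <= r; row_offset++) {
                for (int col_offset = -r; col_offset <= r; col_offset++) {
                    int srcPos = (row + row_offset) * width + (col + col_offset);
                    int fltPos = (row_offset + r) * k + (col_offset + r);
                    sum += in[srcPos] * filter[fltPos];
                }
            }
            out[dstPos] = (pix)(sum / normFactor); /* truncates toward zero */
        }
    }
}
```

Only data (pixels, weights, the accumulator) picks up APPROX through the typedefs, so the kernel body needs no per-declaration annotations. For a 3 x 3 image holding 1 to 9, an all-ones 3 x 3 filter and normFactor 9, the single interior output pixel is 45 / 9 = 5.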

26

Static Analysis

Instruction-Level Precision Configuration (ILPC)

conv2d:13:7:load:Int32
conv2d:13:10:load:Float
conv2d:13:11:fmul:Float
conv2d:13:12:fadd:Float
conv2d:15:1:fdiv:Float
conv2d:15:7:store:Int32

ACCEPT identified safe-to-approximate instructions from data annotations using flow analysis

27

Approximate Binary

Error Injection

Each instruction in the ILPC acts as a quality knob that the autotuner can use to maximize bit-savings
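The slides do not say how ACCEPT injects errors for a given precision level. As an illustration only, one common way to emulate a reduced-precision floating-point result in software is to zero the low-order mantissa bits of an IEEE-754 single-precision value; the helper name truncate_mantissa is hypothetical, and integer knobs (such as the Int32 loads above) would need a separate scheme.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical error-injection helper: keep only the top `mant_bits` of the
   23-bit binary32 mantissa and zero the rest. Sign and exponent are kept, so
   finite values are truncated toward zero in magnitude. (A NaN whose payload
   lies entirely in the dropped bits would become an infinity; acceptable for
   a sketch.) */
float truncate_mantissa(float x, int mant_bits) {
    if (mant_bits >= 23)
        return x; /* full precision: nothing to drop */
    if (mant_bits < 0)
        mant_bits = 0;

    uint32_t u;
    memcpy(&u, &x, sizeof u); /* well-defined type punning */

    uint32_t drop = 23u - (uint32_t)mant_bits; /* 1..23 bits to clear */
    uint32_t mask = ~((1u << drop) - 1u);      /* zeros in the low `drop` bits */
    u &= mask;

    memcpy(&x, &u, sizeof x);
    return x;
}
```

With one mantissa bit kept, 1.75 (binary 1.11) becomes 1.5 (binary 1.1). In a tuning loop, each ILPC knob would map to a value of mant_bits applied to that instruction's result.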

28

Quality Assessment

The programmer provides a quality assessment script that evaluates the quality of the program output

[Diagram: eval.py compares the outputs of the Reference Binary and the Approximate Binary and reports a quality score, e.g. 10dB SNR]

29

Autotuner

config k: error = 0.10%

config [k+1, i-1]: error = 5.91%

config [k+1, i]: error = 0.30%

config [k+1, i+1]: error = 0.12%

config [k+2, i-1]: error = 5.91%

config [k+2, i]: error = 0.33%

config [k+2, i+1]: error = 1.6%

Greedy iterative algorithm: at each step, reduces the precision requirement of the instruction whose reduction impacts quality the least

Finds a solution in O(m^2 n) in the worst case, where m is the number of static safe-to-approximate instructions and n is the number of precision levels per instruction
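The greedy loop above can be sketched as follows. This is a minimal illustration under stated assumptions, not AXE's implementation: precision is modeled as abstract levels 0..n per instruction, and the quality oracle is a caller-supplied function (in AXE, evaluating a configuration means building and running the approximate binary and scoring its output). The toy_error model is hypothetical and exists only to make the sketch testable.

```c
#include <stddef.h>

/* Quality oracle (assumption): returns the output error for a configuration
   where instruction i keeps level[i] precision levels (0 = lowest, n = full). */
typedef double (*error_fn)(const int *level, size_t m, void *ctx);

/* Greedy tuning: start at full precision, then repeatedly lower by one level
   the instruction whose reduction raises the error the least, while the error
   stays within `target`. At most m*n accepted steps, each probing up to m
   candidates, hence O(m^2 n) oracle calls. Returns the number of steps. */
size_t greedy_tune(int *level, size_t m, int n, double target,
                   error_fn err, void *ctx) {
    size_t steps = 0;
    for (size_t i = 0; i < m; i++)
        level[i] = n; /* precise baseline */
    for (;;) {
        size_t best = m; /* m means "no acceptable candidate" */
        double best_err = 0.0;
        for (size_t i = 0; i < m; i++) {
            if (level[i] == 0)
                continue;   /* already at the lowest level */
            level[i]--;     /* probe a one-level reduction */
            double e = err(level, m, ctx);
            level[i]++;     /* undo the probe */
            if (e <= target && (best == m || e < best_err)) {
                best = i;
                best_err = e;
            }
        }
        if (best == m)
            break;          /* no reduction meets the target */
        level[best]--;      /* commit the least harmful reduction */
        steps++;
    }
    return steps;
}

/* Hypothetical toy error model: instruction i contributes
   weight[i] * 2^-level[i], so larger weights mean more sensitive insns. */
double toy_error(const int *level, size_t m, void *ctx) {
    const double *weight = ctx;
    double e = 0.0;
    for (size_t i = 0; i < m; i++)
        e += weight[i] / (double)(1u << level[i]);
    return e;
}
```

With weights {1, 4}, n = 4 and a target error of 0.5, the sketch lowers only the first, less sensitive instruction (to level 2) and leaves the second at full precision, mirroring how the per-step candidate errors on the slide decide which knob moves next.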

30

Autotuner

[Figure: autotuner results at successive quality targets: precise, 60dB, 40dB, 20dB, 10dB]

The autotuner greedily maximizes bit-savings as the quality target is lowered

31

Precision “Guarantees”

Currently derived empirically, and input-dependent

Future work would extend the current infrastructure to incorporate data-dependence information in order to derive formal error guarantees

1 - PERFECT Dynamic Instruction Mix

[Figure: dynamic instruction mix (0% to 100%) per kernel: 2d-conv, dwt, hist-eq, outer, system, inner, interp1, interp2, bp, debayer, lucas-kanade, change-det, fft-1d, fft-2d, and AVERAGE. Categories: prec_control, prec_int_arith, prec_mem, appr_math, appr_fp_arith, appr_int_arith, appr_mem]

33

PERFECT Benchmark Suite

$$\mathrm{SNR} = 10 \log_{10} \left( \frac{\sum_{k=1}^{N} |r_k|^2}{\sum_{k=1}^{N} |r_k - a_k|^2} \right)$$

N: number of output elements
r_k: reference value of element k
a_k: approximate value of element k

34

2 - Bit-Savings over Approximate Instructions

35

[Figure: aggregate bit-savings (0% to 100%) per kernel: 2dconv, dwt, histeq, outer, system solve, inner, interp1, interp2, bp, debayer, lucas-kanade, change-det, fft1d, fft2d, and AVERAGE, at SNR targets of 10, 20, 40, 60, 80, 100 and 120 dB]

2 - Bit-Savings over Approximate Instructions

You don’t need a lot of bits to obtain an acceptable output!

36

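[Figure: bit-savings (0% to 100%) vs. average SNR (120 dB to 10 dB), broken down by instruction class (int arith, fp arith, mem ops, math) with an AGGREGATE curve. Aggregate bit-savings: 26% at 120 dB, 32% at 100 dB, 40% at 80 dB, 48% at 60 dB, 57% at 40 dB, 74% at 20 dB, 83% at 10 dB]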

Architectural Target

37

[Figure: core energy breakdown into overheads and compute for a General Purpose CPU, a Vector Processor*, and Accelerators; overheads shrink with specialization]

The smaller the overheads, the larger the potential gains

* [Quora, Venkataramani et al., MICRO 2013]

Precision Scaling

Mechanisms for precision scalability:

• Fine-grained ALU power gating*
• Bit-sliced ALU units
• Lossy Compression

38

[Figure: Bit-Savings → Energy Savings?]

* [Quora, Venkataramani et al., MICRO 2013]