load value approximation - stacs lab · 2019-10-25 · relaxed confidence windows 34 ℎ i...

52
Load Value Approximation Joshua San Miguel Mario Badr Natalie Enright Jerger

Upload: others

Post on 09-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Load Value Approximation

Joshua San Miguel

Mario Badr

Natalie Enright Jerger

Page 2: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Accessing Memory

2

shared caches, directory, network-on-chip

main memory

processor core

L1 cache

Page 3: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Accessing Memory

3

shared caches, directory, network-on-chip

main memory

miss

processor core

L1 cache

Page 4: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Accessing Memory

4

shared caches, directory, network-on-chip

main memory

Accessing memory is 10x – 100x greater latency and energy than accessing L1 cache!

processor core

L1 cache miss

Page 5: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Accessing Memory

5

shared caches, directory, network-on-chip

main memory

Accessing memory is 10x – 100x greater latency and energy than accessing L1 cache!

Higher efficiency via Approximate Computing…

processor core

L1 cache miss

Page 6: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximate Computing

Not all computations need to be precise.

6

http://www.zentut.com/

http://www.businessweek.com/

http://www.cc.gatech.edu/~cnieto6/

http://www.analyticbridge.com/

http://themusicparlour.blogspot.ca/

http://www.scientific-computing.com/

Data mining

Computer vision Audio and video processing

Gaming Machine learning Dynamical simulation

Page 7: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximate Computing

7

execution time

energy

Page 8: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximate Computing

8

execution time

energy error

Page 9: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximate Computing

9

execution time

energy error

Page 10: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximate Computing

Many applications can tolerate approximate data. 40% to nearly 100% of data footprint is approximate

[Sampson, MICRO 2013].

10

Page 11: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximate Computing

Many applications can tolerate approximate data. 40% to nearly 100% of data footprint is approximate

[Sampson, MICRO 2013].

11

Approximate value locality: Many data values are similar to or can be approximated from

previously seen values.

Page 12: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Outline

• Load Value Approximation

• Non-Speculative Operation

• Approximator Design

• Relaxed Confidence Windows

• Approximation Degree

• Methodology

• Evaluation

12

Page 13: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Load Value Approximation

13

processor core

L1 cache

shared caches, directory, network-on-chip

main memory

Page 14: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Load Value Approximation

14

processor core

L1 cache

shared caches, directory, network-on-chip

main memory

approximator

Page 15: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Load Value Approximation

15

processor core

L1 cache

shared caches, directory, network-on-chip

main memory

approximator load miss A

A?

Page 16: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Load Value Approximation

16

processor core

L1 cache

shared caches, directory, network-on-chip

main memory

approximator generate A_approx

A?

Page 17: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Load Value Approximation

17

processor core

L1 cache

shared caches, directory, network-on-chip

main memory

approximator

A_approx

Page 18: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Load Value Approximation

18

processor core

L1 cache

shared caches, directory, network-on-chip

main memory

approximator

A_approx

No speculation, no rollbacks.

Page 19: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Load Value Approximation

19

processor core

L1 cache

shared caches, directory, network-on-chip

main memory

approximator

A_approx

Page 20: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Load Value Approximation

20

processor core

L1 cache

shared caches, directory, network-on-chip

A_approx

main memory

fetch A_actual

approximator

Page 21: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Load Value Approximation

21

processor core

L1 cache

shared caches, directory, network-on-chip

main memory

A_approx

approximator train with A_actual

Page 22: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Load Value Approximation

22

processor core

L1 cache

shared caches, directory, network-on-chip

main memory

approximator

Learns past values. Estimates future values. Improves performance and saves energy.

A_approx

Page 23: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximator Design

23

ℎ , instruction

address

global history buffer

tag conf degree LHB

approximator table

𝑓

local history buffer

Page 24: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximator Design

24

ℎ , 1.0 2.2 3.1 instruction

address

global history buffer

tag conf degree LHB

approximator table

𝑓

local history buffer

4.1 3.9 4.0

time

Page 25: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximator Design

25

ℎ , 1.0 2.2 3.1 instruction

address

global history buffer

tag conf degree LHB

approximator table

𝑓

local history buffer

4.1 3.9 4.0

load miss A time

Page 26: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximator Design

26

ℎ , 1.0 2.2 3.1 instruction

address

global history buffer

tag conf degree LHB

approximator table

𝑓

local history buffer

4.1 3.9 4.0

load miss A time

PC ⊕ 1.0 ⊕ 2.2 ⊕ 3.1

Page 27: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximator Design

27

ℎ , 1.0 2.2 3.1 instruction

address

global history buffer

tag conf degree LHB

approximator table

𝑓

local history buffer

4.1 3.9 4.0

load miss A time

PC ⊕ 1.0 ⊕ 2.2 ⊕ 3.1

(4.1 + 3.9 + 4.0) / 3

A_approx = 4.0

Page 28: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximator Design

28

ℎ , 1.0 2.2 3.1 instruction

address

global history buffer

tag conf degree LHB

approximator table

𝑓

local history buffer

4.1 3.9 4.0

load miss A do_work(A_approx) time

Page 29: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximator Design

29

ℎ , 1.0 2.2 3.1 instruction

address

global history buffer

tag conf degree LHB

approximator table

𝑓

local history buffer

4.1 3.9 4.0

load miss A do_work(A_approx)

request(A_actual)

time

Page 30: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximator Design

30

ℎ , 1.0 2.2 3.1 instruction

address

global history buffer

tag conf degree LHB

approximator table

𝑓

local history buffer

4.1 3.9 4.0

load miss A do_work(A_approx)

request(A_actual) A_actual = 4.2

time

Page 31: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximator Design

31

ℎ , instruction

address

global history buffer

tag conf degree LHB

approximator table

𝑓

local history buffer

load miss A do_work(A_approx)

request(A_actual) A_actual = 4.2

time

2.2 3.1 4.2

3.9 4.0 4.2

Page 32: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximator Design – Other Considerations

32

• Floating-point precision

• History buffer sizes

• Stale values

More details in paper.

Page 33: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximator Design

33

Relaxed Confidence Windows How do we avoid making bad approximations?

Trade-off performance and error.

Approximation Degree Do we need to fetch the actual value from memory every time?

Trade-off energy and error.

Page 34: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Relaxed Confidence Windows

34

ℎ , 1.0 2.2 3.1 instruction

address

global history buffer

tag conf degree LHB

approximator table

𝑓

local history buffer

4.1 3.9 4.0

load miss A do_work(A_approx)

request(A_actual)

time

A_approx = 4.0

Page 35: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Relaxed Confidence Windows

35

ℎ , 1.0 2.2 3.1 instruction

address

global history buffer

tag conf degree LHB

approximator table

𝑓

local history buffer

4.1 3.9 4.0

load miss A do_work(A_approx)

request(A_actual)

time

A_approx = 4.0

A_actual = 9.0!

Page 36: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Relaxed Confidence Windows

36

tag conf degree LHB

When approximating:

if conf >= 0: use A_approx

else: don’t use A_approx

When updating:

if A_approx, A_actual differ by <= CONF_WINDOW%: conf++

else: conf--

Page 37: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Relaxed Confidence Windows – Output Error

37

0%

20%

40%

60%

80%

100%

ou

tpu

t er

ror

0% 5% 10% 20% infinite

Varying CONF_WINDOW%:

Page 38: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Relaxed Confidence Windows – L1-D MPKI

38

Varying CONF_WINDOW%:

0.0

0.2

0.4

0.6

0.8

1.0

0% 5% 10% 20% infinite

no

rmal

ized

L1

-D M

PK

I

CONF_WINDOW%

Page 39: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximator Design

39

Relaxed Confidence Windows How do we avoid making bad approximations?

Trade-off performance and error.

Approximation Degree Do we need to fetch the actual value from memory every time?

Trade-off energy and error.

Page 40: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximation Degree

40

ℎ , 1.0 2.2 3.1 instruction

address

global history buffer

tag conf degree LHB

approximator table

𝑓

local history buffer

4.1 3.9 4.0

load miss A do_work(A_approx)

request(A_actual)

time

A_approx = 4.0

Page 41: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximation Degree

41

ℎ , 1.0 2.2 3.1 instruction

address

global history buffer

tag conf degree LHB

approximator table

𝑓

local history buffer

4.1 3.9 4.0

load miss A do_work(A_approx)

request(A_actual)

time

A_approx = 4.0

A_actual = 4.0

Page 42: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximation Degree

42

ℎ , 1.0 2.2 3.1 instruction

address

global history buffer

tag conf degree LHB

approximator table

𝑓

local history buffer

4.1 3.9 4.0

load miss A do_work(A_approx)

request(A_actual)

time

A_approx = 4.0

A_actual = 4.0

Page 43: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximation Degree

43

tag conf degree LHB

When approximating:

if degree == APPROX_DEGREE: fetch A_actual

else: don’t fetch A_actual

When updating:

if degree == APPROX_DEGREE: degree = 0

else: degree++

Page 44: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximation Degree – Output Error

44

Varying APPROX_DEGREE:

0%

20%

40%

60%

80%

100%

ou

tpu

t er

ror

0 1 2 4 8 16

Page 45: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Approximation Degree – L1-D Fetches

45

Varying APPROX_DEGREE:

0

0.2

0.4

0.6

0.8

1

0 1 2 4 8 16

no

rmal

ized

L1

-D f

etch

es

APPROX_DEGREE

Page 46: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Methodology

46

Multi-threaded approximate applications PARSEC benchmark suite [Bienia, Princeton 2011]

Programmer annotations and ISA extensions [Esmaeilzadeh, ASPLOS 2012]

Approximator design space exploration Pin dynamic binary instrumentation tool [Luk, PLDI 2005]

Full-system simulation FeS2 cycle-level x86 simulator [Neelakantam, ASPLOS 2008]

Approximator, cache and memory energy consumption CACTI modeling tool [Thoziyoor, HP 2008]

Page 47: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Evaluation

47

0%

2%

4%

6%

8%

10%

12%

14%

16%

0 4 16

APPROX_DEGREE

application speedup energy savings

Page 48: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Evaluation

48

0%

2%

4%

6%

8%

10%

12%

14%

16%

0 4 16

APPROX_DEGREE

application speedup energy savings

Up to 28% speedup

Page 49: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Evaluation

49

0%

2%

4%

6%

8%

10%

12%

14%

16%

0 4 16

APPROX_DEGREE

application speedup energy savings

Up to 28% speedup

Up to 44% energy savings

Page 50: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Evaluation

50

0%

2%

4%

6%

8%

10%

12%

14%

16%

0 4 16

APPROX_DEGREE

application speedup energy savings

Reduces L1-D MPKI by 30% over traditional value predictor and prefetcher.

Up to 28% speedup

Up to 44% energy savings

Page 51: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Conclusion

51

Load Value Approximation

• Approximate Value Locality

• Non-Speculative

• Relaxed Confidence Windows

• Approximation Degree

↑ performance

↓ energy consumption

low output error

Page 52: Load Value Approximation - STACS Lab · 2019-10-25 · Relaxed Confidence Windows 34 ℎ i nstruction , 1.0 2.2 3.1 address global history buffer tag conf degree LHB approximator

Conclusion

52

baseline (precise) load value approximation