fairness via source throttling: a configurable and high-performance fairness substrate for...

30
Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee* Onur Mutlu Yale N. Patt * * HPS Research Group The University of Texas at Austin ‡ Computer Architecture Laboratory Carnegie Mellon University

Upload: lewis-britcher

Post on 01-Apr-2015

221 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

Fairness via Source Throttling:

A configurable and high-performance fairness substrate for multi-core memory systems

Eiman Ebrahimi*

Chang Joo Lee*

Onur Mutlu‡

Yale N. Patt*

* HPS Research Group The University of Texas at

Austin

‡ Computer Architecture LaboratoryCarnegie Mellon University

Page 2: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

2

Background and Problem

Core 0 Core 1 Core 2 Core N

Shared Cache

Memory Controller

DRAMBank

0

DRAMBank

1

DRAM Bank

2

... DRAMBank K

...

Shared MemoryResources

Chip BoundaryOn-chipOff-chip

2

Page 3: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

3

•Applications slow down due to interference from memory requests of other applications

3

Background and Problem

SharedTi

•Slowdown of application i = Ti

Alone

Max{Slowdown i} over all applications i

Min{Slowdown i} over all applications i(MICRO ’07)

•Unfairness =

•A memory system is fair if slowdowns ofsame-priority applications are equal(MICRO ‘06, MICRO ‘07, ISCA ‘08)

Page 4: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

4

• Magnitude of each application’s slowdown depends on concurrently running applications’ memory behavior

• Large disparities in slowdowns are unacceptable• Low system performance • Vulnerability to denial of service attacks• Difficult for system software to enforce priorities

4

Background and Problem

Page 5: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

5

• Background and Problem

• Motivation for Source Throttling

• Fairness via Source Throttling (FST)

• Evaluation

• Conclusion

5

Outline

Page 6: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

6

• Primarily manage inter-application interference in only one particular resource

• Shared Cache, Memory Controller, Interconnect, etc.

• Combining techniques for the different resources can result in negative interaction

• Approaches that coordinate interaction among techniques for different resources require complex implementations

Our Goal: Enable fair sharing ofthe entire memory system by dynamically detecting

and controlling interference in a coordinated manner

6

Prior Approaches

Page 7: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

7

• Manage inter-application interference at the cores, not at the shared resources

• Dynamically estimate unfairness in the memory system

• If unfairness > system-software-specified target thenthrottle down core causing unfairness & throttle up core that was unfairly treated

7

Our Approach

Page 8: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

A4A4B1B1

A1A1A2A2A3A3

Oldest ⎧⎪⎪⎩

Shared MemoryResources

A: ComputeCompute Stall on Stall on A1A1

Stall on Stall on A2A2

Stall on Stall on A3A3

Stall on Stall on A4A4

ComputeCompute Stall waiting for shared resourcesStall waiting for shared resources Stall on Stall on B1B1

B:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged

Interference Core A’s stall

timeCore B’s stall time

A4A4

B1B1

A1A1

A2A2A3A3

⎧⎪⎪⎩

Shared MemoryResources

A: ComputeCompute Stall on Stall on A1A1

Stall on Stall on A2A2

ComputeCompute Stall Stall wait.wait.

Stall on Stall on B1B1

B:

Dynamically detect application A’s interference for application B and throttle down application A

Core A’s stall time

Core B’s stall time

Fair Source Throttling

Stall Stall wait.wait.

Request Generation Order

A1, A2, A3,

A4, B1B1, A2, A3, A4

queue of requests to shared resources

queue of requests to shared resources

Saved Cycles Core BOldest

Intensive application A generates many requests and causes long stall times for less intensive application B

Throttled Requests

Stall on Stall on A4A4

Stall on Stall on A3A3

Extra Cycles Core A

Page 9: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

9

• Background and Problem

• Motivation for Source Throttling

• Fairness via Source Throttling (FST)

• Evaluation

• Conclusion

9

Outline

Page 10: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

10

• Runtime Unfairness Evaluation• Dynamically estimates the unfairness in the

memory system

• Dynamic Request Throttling• Adjusts how aggressively each core makes

requests to the shared resources

10

Fairness via Source Throttling (FST)

Page 11: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

11

Runtime UnfairnessEvaluation

DynamicRequest

Throttling

1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)

if (Unfairness Estimate >Target) { 1-Throttle down App-interfering 2-Throttle up App-slowest}

FSTUnfairness Estimate

App-slowestApp-interfering

⎪ ⎨ ⎪ ⎧⎩

Slowdown Estimation

TimeInterval 1 Interval 2 Interval 3

Runtime UnfairnessEvaluation

DynamicRequest

Throttling

Fairness via Source Throttling (FST)

Page 12: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

12

Runtime UnfairnessEvaluation

DynamicRequest

Throttling

1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)

if (Unfairness Estimate >Target) { 1-Throttle down App-interfering 2-Throttle up App-slowest}

FSTUnfairness Estimate

App-slowestApp-interfering

12

Fairness via Source Throttling (FST)

Page 13: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

13

•How can be estimated in shared mode?TiAlone

13

SharedTi

•Slowdown of application i = Ti

Alone

Max{Slowdown i} over all applications i

Min{Slowdown i} over all applications i•Unfairness =

TiExcess

• is the number of extra cycles it takes application i to execute due to interference

TiShared

=TiAlone

- TiExcess

Estimating System Unfairness

Page 14: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

14

0 0 0 0

Interference per corebit vector

Core # 0 1 2 3

Core 0 Core 1 Core 2 Core 3

Bank 0

Bank 1

Bank 2

Bank 7

...

Memory Controller

Shared Cache

Three interference sources:1. Shared Cache2. DRAM bus and bank3. DRAM row-buffers

FST hardware

14

Bank 2

Tracking Inter-Core Interference

Row

Page 15: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

15

Bank 2

Row B

Bank 0 Bank 1 Bank 7...

Core 0 Core 1

Row HitRow ConflictRow HitRow Conflict

Row BShadow Row Address Register(SRAR) Core 0 :

Row A

Interference inducedrow conflict

Interference per corebit vector

0 0

Core # 0 1

1

Shadow Row Address Register(SRAR) Core 1 :

queue of requests to bank 2

FST additions

Row Buffer:

15

Tracking DRAM Row-Buffer Interference

Row BRow ARow A

Row B

Page 16: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

16

0 0 0 0

Interference per corebit vector

Core # 0 1 2 3

00

00

00

00

Excess Cycles Counters per core

1

TTCycle Count T+1T+1

11

T+2T+2

22FST hardware

1

T+3T+3

33

11

Core 0 Core 1 Core 2 Core 3

Bank 0

Bank 1

Bank 2

Bank 7

...

Memory Controller

Shared Cache

16

TiExcess

⎪⎨⎪⎧

Tracking Inter-Core Interference

Page 17: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

17

Runtime UnfairnessEvaluation

DynamicRequest

Throttling

1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)

if (Unfairness Estimate >Target) { 1-Throttle down App-interfering 2-Throttle up App-slowest}

FSTUnfairness Estimate

App-slowestApp-interfering

17

Fairness via Source Throttling (FST)

Page 18: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

18

Cnt 3Cnt 3Cnt 2Cnt 2Cnt 1Cnt 1Cnt 0Cnt 00

0 0 0 -

Interference per corebit vector

Core #0 1 2 3--

Cnt 1,0Cnt 1,0

Cnt 2,0Cnt 2,0

Cnt 3,0Cnt 3,0

Excess Cycles Counters per core

• To identify App-interfering, for each core i• FST separately tracks interference caused

by each core j ( j ≠ i )

0 0 - 0

0 - 0 0

- 0 0 0

⎪ ⎨ ⎪ ⎧⎩

⎪⎨⎪⎧

Interfered with core

Interfering core

Cnt 0,1Cnt 0,1

--

Cnt 2,1Cnt 2,1

Cnt 3,1Cnt 3,1

Cnt 0,2Cnt 0,2

Cnt 1,2Cnt 1,2

--

Cnt 3,2Cnt 3,2

Cnt 0,3Cnt 0,3

Cnt 1,3Cnt 1,3

Cnt 2,3Cnt 2,3

--

1

core 2interfered

withcore 1

Cnt 2,1++

Cnt 2,1++

0123

Row with largest count determines App-interfering

App-slowest = 2

18

Tracking Inter-Core Interference

Page 19: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

19

Runtime UnfairnessEvaluation

DynamicRequest

Throttling

1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)

if (Unfairness Estimate >Target) { 1-Throttle down App-interfering 2-Throttle up App-slowest}

FSTUnfairness Estimate

App-slowestApp-interfering

19

Fairness via Source Throttling (FST)

Page 20: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

20

• Goal: Adjust how aggressively each core makes requests to the shared resources

• Mechanisms:

• Miss Status Holding Register (MSHR) quota• Controls the number of concurrent requests

accessing shared resources from each application

• Request injection frequency• Controls how often memory requests are issued

to the last level cache from the MSHRs

20

Dynamic Request Throttling

Page 21: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

21

• Throttling level assigned to each core determines both MSHR quota and request injection rate

Throttling level MSHR quota Request Injection Rate

100% 128 Every cycle

50% 64 Every other cycle

25% 32 Once every 4 cycles

10% 12 Once every 10 cycles

5% 6 Once every 20 cycles

4% 5 Once every 25 cycles

3% 3 Once every 30 cycles

2% 2 Once every 50 cycles

Total # ofMSHRs: 128

21

Dynamic Request Throttling

Page 22: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

22

TimeInterval i Interval i+1 Interval i+2

Runtime UnfairnessEvaluation

DynamicRequest Throttling

FSTUnfairness Estimate

App-slowest

App-interfering

Throttling Levels

Core 0 Core 1 Core 350% 100% 10% 100%25% 100% 25% 100%25% 50% 50% 100%

Interval iInterval i + 1Interval i + 2

3

Core 2

Core 0

Core 0 Core 2Throttle down Throttle up

2.5

Core 2

Core 1

Throttle down Throttle up

System software fairness goal: 1.4

22

FST at Work

Slowdown Estimation

⎪ ⎨ ⎪ ⎧⎩

Slowdown Estimation

⎪ ⎨ ⎪ ⎧⎩

Page 23: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

23

• Different fairness objectives can be configured by system software• Estimated Unfairness > Target Unfairness

• Estimated Max Slowdown > Target Max Slowdown

• Estimated Slowdown(i) > Target Slowdown(i)

• Support for thread priorities• Weighted Slowdown(i) =

Estimated Slowdown(i) x Weight(i)

23

System Software Support

Page 24: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

24

• Total storage cost required for 4 cores is ∼ 12KB

• FST does not require any structures or logic that are on the processor’s critical path

24

Hardware Cost

Page 25: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

25

• Background and Problem

• Motivation for Source Throttling

• Fairness via Source Throttling (FST)

• Evaluation

• Conclusion

25

Outline

Page 26: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

26

• x86 cycle accurate simulator

• Baseline processor configuration

• Per-core• 4-wide issue, out-of-order, 256 entry ROB

• Shared (4-core system)• 128 MSHRs

• 2 MB, 16-way L2 cache

• Main Memory• DDR3 1333 MHz

• Latency of 15ns per command (tRP, tRCD, CL)

• 8B wide core to memory bus

26

Evaluation Methodology

Page 27: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

27

grom

+ar

t+as

tar+

h26

4

art+

gam

es+Gem

s+h2

64

lbm

+om

net+

apsi+

vorte

x

art+

lesli

e+ga

mes

+gr

om

art+

asta

r+lesli

e+cr

afty

art+

milc

+vo

rtex+

calcul

ix

luca

s+am

mp+

xalanc

+gr

om

lbm

+Gem

s+as

tar+

mes

a

mgr

id+pa

rser

+so

plex

+pe

rlb

gcc0

6+xa

lanc

+lb

m+ca

ctus

gmea

n

44.4%

36%

27

System Unfairness Results

Page 28: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

28

gmea

n

25.6%

14%

grom

+ar

t+as

tar+

h26

4

art+

gam

es+Gem

s+h2

64

lbm

+om

net+

apsi+

vorte

x

art+

lesli

e+ga

mes

+gr

om

art+

asta

r+lesli

e+cr

afty

art+

milc

+vo

rtex+

calcul

ix

luca

s+am

mp+

xalanc

+gr

om

lbm

+Gem

s+as

tar+

mes

a

mgr

id+pa

rser

+so

plex

+pe

rlb

gcc0

6+xa

lanc

+lb

m+ca

ctus

28

System Performance Results

Page 29: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

29

• Fairness via Source Throttling (FST) is a new fair and high-performance shared resource management approach for CMPs

• Dynamically monitors unfairness and throttles down sources of interfering memory requests

• Eliminates the need for and complexity of multiple per-resource fairness techniques

• Improves both system fairness and performance

• Incorporates thread weights and enables different fairness objectives

29

Conclusion

Page 30: Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee * Onur

Fairness via Source Throttling:

A configurable and high-performance fairness substrate for multi-core memory systems

Eiman Ebrahimi*

Chang Joo Lee*

Onur Mutlu‡

Yale N. Patt*

* HPS Research Group The University of Texas at

Austin

‡ Computer Architecture LaboratoryCarnegie Mellon University