sam: optimizing multithreaded cores for speculative ... · sam : optimizing multithreaded cores for...

53
SAM: Optimizing Multithreaded Cores for Speculative Parallelism MALEEN ABEYDEERA , SUVINAY SUBRAMANIAN, MARK JEFFREY, JOEL EMER, DANIEL SANCHEZ PACT 2017

Upload: others

Post on 06-Aug-2020

20 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAM:OptimizingMultithreadedCoresforSpeculativeParallelismMALEENABEYDEERA, SUVINAY SUBRAMANIAN, MARKJEFFREY,JOEL EMER, DANIEL SANCHEZ

PACT 2017

Page 2: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

ExecutiveSummaryAnalyzestheinterplaybetweenhardwaremultithreadingandspeculativeparallelism

(eg:ThreadLevelSpeculationandTransactionalMemory )

Conventionalmultithreadingcausesperformancepathologiesonspeculativeworkloads• Increaseinabortedwork• Inefficientuseofspeculationresources

Why?Allthreadsaretreatedequally

SpeculationAwareMultithreading(SAM)• Prioritizethreadsrunning tasksmorelikelytocommit

SAMmakesmultithreadingmoreuseful

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 2

Page 3: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

ExecutiveSummaryAnalyzestheinterplaybetweenhardwaremultithreadingandspeculativeparallelism

(eg:ThreadLevelSpeculationandTransactionalMemory )

Conventionalmultithreadingcausesperformancepathologiesonspeculativeworkloads• Increaseinabortedwork• Inefficientuseofspeculationresources

Why?Allthreadsaretreatedequally

SpeculationAwareMultithreading(SAM)• Prioritizethreadsrunning tasksmorelikelytocommit

SAMmakesmultithreadingmoreuseful

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 2

Page 4: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

Outline

BackgroundonspeculativeparallelismPitfallsofspeculativeparallelismwithconventionalmultithreadingSAMonin-ordercoresSAMonout-of-ordercores

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 3

Page 5: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

BackgroundonSpeculativeParallelism

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 4

ParallelizetaskswhenthedependencesarenotknowninadvanceHardwareexecutesalltasksinparallel,abortinguponconflictsWhichtasktoabort?Conflictresolutionpolicy

Page 6: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

BackgroundonSpeculativeParallelism

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 4

ParallelizetaskswhenthedependencesarenotknowninadvanceHardwareexecutesalltasksinparallel,abortinguponconflictsWhichtasktoabort?Conflictresolutionpolicy

SpeculativeParallelism

Orderede.g.Thread-LevelSpeculation(TLS)

(Programorderdictatestheconflictresolutionorder)

Unorderede.g.HardwareTransactionalMemory

(Anyexecutionorderisvalid,buthigh-performanceconflictresolutionpoliciesdefineanorder)

Page 7: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

BackgroundonSpeculativeParallelism

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 4

ParallelizetaskswhenthedependencesarenotknowninadvanceHardwareexecutesalltasksinparallel,abortinguponconflictsWhichtasktoabort?Conflictresolutionpolicy

Implicitorderamongalltasksinanyspeculativesystem

SpeculativeParallelism

Orderede.g.Thread-LevelSpeculation(TLS)

(Programorderdictatestheconflictresolutionorder)

Unorderede.g.HardwareTransactionalMemory

(Anyexecutionorderisvalid,buthigh-performanceconflictresolutionpoliciesdefineanorder)

Page 8: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

BaselineSystem- Swarm[Jeffrey,MICRO’15]

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 5

void desTask(Timestamp ts , GateInput* input) {Gate* g = input ->gate ();bool toggledOutput = g.simulateToggle(input); if ( toggledOutput ) {

for (GateInput* i : g-> connectedInputs ()) {swarm::enqueue(desTask , ts+delay(g,i), i);

}}

}

Page 9: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

BaselineSystem- Swarm[Jeffrey,MICRO’15]

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 5

void desTask(Timestamp ts , GateInput* input) {Gate* g = input ->gate ();bool toggledOutput = g.simulateToggle(input); if ( toggledOutput ) {

for (GateInput* i : g-> connectedInputs ()) {swarm::enqueue(desTask , ts+delay(g,i), i);

}}

} Taskscreatechildrentasks(functionptr,timestamp,args)

Timestampedtasks

Page 10: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

BaselineSystem- Swarm[Jeffrey,MICRO’15]

Tasksappeartoexecuteintimestamporder

Unorderedexecutionviaequaltimestamps

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 5

void desTask(Timestamp ts , GateInput* input) {Gate* g = input ->gate ();bool toggledOutput = g.simulateToggle(input); if ( toggledOutput ) {

for (GateInput* i : g-> connectedInputs ()) {swarm::enqueue(desTask , ts+delay(g,i), i);

}}

} Taskscreatechildrentasks(functionptr,timestamp,args)

Timestampedtasks

Page 11: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SwarmMicroarchitecture

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 6

Equaltimestamps:globalorderviaVirtualTime(VT)

Timestamp Tiebreaker

Virtual Time

Page 12: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SwarmMicroarchitecture

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 6

Mem / IO

Mem

/ IO

Mem / IO

Mem

/ IO

16-tile, 64-core CMP Tile Organization

Core Core Core Core

L1I/D L1I/D L1I/D L1I/D

L2

L3 SliceRouter

Task Unit

Tile

Equaltimestamps:globalorderviaVirtualTime(VT)

Timestamp Tiebreaker

Virtual Time

Page 13: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SwarmMicroarchitecture

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 6

Mem / IO

Mem

/ IO

Mem / IO

Mem

/ IO

16-tile, 64-core CMP Tile Organization

Core Core Core Core

L1I/D L1I/D L1I/D L1I/D

L2

L3 SliceRouter

Task Unit

Tile

Equaltimestamps:globalorderviaVirtualTime(VT)

Tasksexecuteout-of-order,butcommitinVTorder

Timestamp Tiebreaker

Virtual Time

Commitqueue:stateoftaskswaitingtocommit

Page 14: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

Outline

BackgroundonspeculativeparallelismPitfallsofspeculativeparallelismwithconventionalmultithreadingSAMonin-ordercoresSAMonout-of-ordercores

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 7

Page 15: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

PitfallsofSpeculation-ObliviousMultithreading

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 8

Systemconfiguration:64-coreSMTsystemIn-ordercorewith2-wideissueSpeculation-oblivious round-robin order

Page 16: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

PitfallsofSpeculation-ObliviousMultithreading

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 8

Systemconfiguration:64-coreSMTsystemIn-ordercorewith2-wideissueSpeculation-oblivious round-robin order

Page 17: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

PitfallsofSpeculation-ObliviousMultithreading

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 8

Insights:1.Multithreadingcanbehighlybeneficial

Systemconfiguration:64-coreSMTsystemIn-ordercorewith2-wideissueSpeculation-oblivious round-robin order

Micro-opsissuedfromcommitted tasks

Noreadymicro-opstoissue

Page 18: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

PitfallsofSpeculation-ObliviousMultithreading

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 8

Insights:1.Multithreadingcanbehighlybeneficial

However,multithreadingcanalsoleadto:2.Increasedaborts

Systemconfiguration:64-coreSMTsystemIn-ordercorewith2-wideissueSpeculation-oblivious round-robin order

Micro-opsissuedfromcommitted tasks

Noreadymicro-opstoissue

Micro-opsissuedfromabortedtasks

Page 19: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

PitfallsofSpeculation-ObliviousMultithreading

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 8

Insights:1.Multithreadingcanbehighlybeneficial

However,multithreadingcanalsoleadto:2.Increasedaborts3.Inefficientuseofspeculationresources

Systemconfiguration:64-coreSMTsystemIn-ordercorewith2-wideissueSpeculation-oblivious round-robin order

Micro-opsissuedfromcommitted tasks

Noreadymicro-opstoissue

Micro-opsissuedfromabortedtasks

Resourcestalls

Page 20: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

PitfallsofSpeculation-ObliviousMultithreading

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 8

Insights:1.Multithreadingcanbehighlybeneficial

However,multithreadingcanalsoleadto:2.Increasedaborts3.Inefficientuseofspeculationresources

Unlikely-to-committaskshurtthethroughputoflikely-to-commitones

Systemconfiguration:64-coreSMTsystemIn-ordercorewith2-wideissueSpeculation-oblivious round-robin order

Micro-opsissuedfromcommitted tasks

Noreadymicro-opstoissue

Micro-opsissuedfromabortedtasks

Resourcestalls

Page 21: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

Speculation-AwareMultithreading

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 9

Prioritizethreadsaccordingtotheirconflictresolutionpriorities

ReduceSpeculationResourceStalls(taskscommitearly)

ReduceAborts(focusresourcesontaskslikelytocommit)

Page 22: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

Outline

BackgroundonspeculativeparallelismPitfallsofspeculativeparallelismwithconventionalmultithreadingSAMonin-ordercoresSAMonout-of-ordercores

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 10

Page 23: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMonin-ordercores

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 11

SMTIssue

Fetch Decode leslesRegisterFiles

Pipe 0Pipe 1

Int ALUFP ALU

Int ALUMem/DCache

Thread micro-op queues

Page 24: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMonin-ordercores

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 11

SMTIssue

Fetch Decode leslesRegisterFiles

Pipe 0Pipe 1

Int ALUFP ALU

Int ALUMem/DCache

Thread micro-op queues

Page 25: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMonin-ordercores

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 11

SMTIssue

Fetch Decode leslesRegisterFiles

Pipe 0Pipe 1

Int ALUFP ALU

Int ALUMem/DCache

Thread micro-op queues

Conflict resolutionpriority updates(Virtual Times)

Task Unit

Page 26: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMonin-ordercores

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 11

SMTIssue

Fetch Decode leslesRegisterFiles

Pipe 0Pipe 1

Int ALUFP ALU

Int ALUMem/DCache

Thread micro-op queues

SAM issue priorities(higher is better)

SortMax

Ready

52:9

52:717:195:4

Virtual Times

3

24

1

IssueThreadID

Conflict resolutionpriority updates(Virtual Times)

Task Unit

Page 27: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

ExperimentalMethodology

BaselineSystem• Swarm+Wait-N-GoTM [Jafrietal.ASPLOS’13]conflictresolutiontechniques• Cycle-accurate,event-driven,Pin-basedsimulator• Modelsystemsupto64cores• Cores:2wideissue,upto8threadspercore

Benchmarks• Ordered:Swarm[Jeffreyetal.MICRO’15,MICRO’16]– 8benchmarks• Unordered:STAMP[Minhetal. IISWC’08]– 8benchmarks

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 12

Page 28: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMmakesmultithreadingmoreeffective

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 13

1 Thread

Ordered Benchmarks Unordered Benchmarks

Page 29: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMmakesmultithreadingmoreeffective

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 13

8 Thread Round Robin1 Thread

Ordered Benchmarks Unordered Benchmarks

Page 30: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMmakesmultithreadingmoreeffective

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 13

8 Thread SAM8 Thread Round Robin1 Thread

Ordered Benchmarks Unordered Benchmarks

Page 31: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMmakesmultithreadingmoreeffective

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 13

8 Thread SAM8 Thread Round Robin1 Thread

Ordered Benchmarks Unordered Benchmarks

8threadedcoresoutperformsinglethreadedcoresby1.85X

WithSAM,thebenefitincreasesto2.33X

Page 32: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMmakesmultithreadingmoreeffective

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 13

8 Thread SAM8 Thread Round Robin1 Thread

Ordered Benchmarks Unordered Benchmarks

8threadedcoresoutperformsinglethreadedcoresby1.85X

WithSAM,thebenefitincreasesto2.33X

Page 33: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMmakesmultithreadingmoreeffective

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 13

8 Thread SAM8 Thread Round Robin1 Thread

Ordered Benchmarks Unordered Benchmarks

8threadedcoresoutperformsinglethreadedcoresby1.85X

WithSAM,thebenefitincreasesto2.33X

Page 34: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

WhydoesSAMhelp?

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 14

SAMmatchesRRwhentherearenopathologies

Micro-opsissued Unusedissueslots(reason)Committed Aborted Resource Notready Other

Page 35: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

WhydoesSAMhelp?

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 14

SAMmatchesRRwhentherearenopathologies

SAMreduceswastedwork

Micro-opsissued Unusedissueslots(reason)Committed Aborted Resource Notready Other

Page 36: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

WhydoesSAMhelp?

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 14

SAMmatchesRRwhentherearenopathologies

SAMreduceswastedwork

SAMreducesresourcestalls

Micro-opsissued Unusedissueslots(reason)Committed Aborted Resource Notready Other

Page 37: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

Outline

BackgroundonspeculativeparallelismPitfallsofspeculativeparallelismwithconventionalmultithreadingSAMonin-ordercoresSAMonout-of-ordercores

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 15

Page 38: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMonout-of-ordercoresUnlikein-ordercores,prioritiesaffectpipelineefficiency• Asinglethreadcanclogcoreresources• Increasedwrongpathexecution

Despitethese,prioritizingtasksisbetter

Needforaggressiveprioritizationaffectscoredesign• Shared,notpartitionedROBs

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 16

SMTIssue

Fetch Decode

Thread micro-op queues

Issue Buffer

PhysicalRegFile

Pipe 0 ReorderBuffer

Pipe 1

In-flight uops (for ICount)

3 9 4 2

SAM priorities

3 4 2 1

Conflict resolutionpriority updates(from task unit)

Conflict res. priorities

2 3 2 1

Page 39: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMtradeoffs without-of-ordercores

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 17

Micro-opsissued Unusedissueslots(reason)Committed Aborted Resource Notready OtherWrongpath

Baselinepolicy- ICount(IC)

sssp – 8threads

Page 40: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMtradeoffs without-of-ordercores

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 17

Micro-opsissued Unusedissueslots(reason)Committed Aborted Resource Notready OtherWrongpath

Baselinepolicy- ICount(IC)

SAMismorebeneficialwithdynamicallysharedROBsReducesaborts+resourcestalls

sssp – 8threads

Page 41: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMtradeoffs without-of-ordercores

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 17

Micro-opsissued Unusedissueslots(reason)Committed Aborted Resource Notready OtherWrongpath

Baselinepolicy- ICount(IC)

SAMismorebeneficialwithdynamicallysharedROBsReducesaborts+resourcestalls

Butreducedpipelineefficiency

sssp – 8threads

Page 42: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMtradeoffs without-of-ordercores

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 17

Micro-opsissued Unusedissueslots(reason)Committed Aborted Resource Notready OtherWrongpath

Baselinepolicy- ICount(IC)

SAMismorebeneficialwithdynamicallysharedROBsReducesaborts+resourcestalls

ButreducedpipelineefficiencyIncreaseinwrong-pathissues+not-readystalls

sssp – 8threads

Page 43: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

AdaptiveSAMpolicy

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 18

Micro-opsissued Unusedissueslots(reason)Committed Aborted Resource Notready OtherWrongpath

Page 44: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

AdaptiveSAMpolicy

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 18

HardwarecounterstotrackcyclesMicro-opsissued Unusedissueslots(reason)Committed Aborted Resource Notready OtherWrongpath

Page 45: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

AdaptiveSAMpolicy

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 18

Aborted Resource NotreadyWrongpath

HardwarecounterstotrackcyclesMicro-opsissued Unusedissueslots(reason)Committed Aborted Resource Notready OtherWrongpath

Page 46: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

AdaptiveSAMpolicy

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 18

Aborted Resource NotreadyWrongpath

Hardwarecounterstotrackcycles

+

Micro-opsissued Unusedissueslots(reason)Committed Aborted Resource Notready OtherWrongpath

Page 47: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

AdaptiveSAMpolicy

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 18

Aborted Resource NotreadyWrongpath

Hardwarecounterstotrackcycles

Cycleslosttotasklevelspeculation

+

Micro-opsissued Unusedissueslots(reason)Committed Aborted Resource Notready OtherWrongpath

Page 48: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

AdaptiveSAMpolicy

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 18

Aborted Resource NotreadyWrongpath

Hardwarecounterstotrackcycles

Cycleslosttotasklevelspeculation

Cycleslosttopipelineinefficiencies

+ +

>

UseSAM UseICount

True False

Micro-opsissued Unusedissueslots(reason)Committed Aborted Resource Notready OtherWrongpath

Page 49: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMonOoO cores(allbenchmarks)

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 19

At8threads/core:• Multithreadingimprovesperformance

oversinglethreadedcoresby1.1x

Averageoverallbenchmarks

Micro-opsissued Unusedissueslots(reason)Committed Aborted Resource Notready OtherWrongpath

Page 50: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMonOoO cores(allbenchmarks)

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 19

At8threads/core:• Multithreadingimprovesperformance

oversinglethreadedcoresby1.1x• WithSAM,improvementrisesto1.5x

Averageoverallbenchmarks

Micro-opsissued Unusedissueslots(reason)Committed Aborted Resource Notready OtherWrongpath

Page 51: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

SAMonOoO cores(allbenchmarks)

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 19

At8threads/core:• Multithreadingimprovesperformance

oversinglethreadedcoresby1.1x• WithSAM,improvementrisesto1.5x

Adaptivepolicyslightlyincreasesperformanceat2and4threads

Averageoverallbenchmarks

Micro-opsissued Unusedissueslots(reason)Committed Aborted Resource Notready OtherWrongpath

Page 52: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

Conclusion

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 20

Conventionalmultithreadingcausesperformancepathologiesonspeculativeworkloads• Increaseinabortedwork• Inefficientuseofspeculationresources

SpeculationAwareMultithreading(SAM)Prioritizethreadsrunningtasksmorelikelytocommit

SAMmakesmultithreadingmoreuseful

Page 53: SAM: Optimizing Multithreaded Cores for Speculative ... · SAM : OPTIMIZING MULTITHREADED CORES FOR SPECULATIVE PARALLELISM 2. Executive Summary Analyzes the interplay between hardware

Questions?

SAM:OPTIMIZINGMULTITHREADEDCORESFORSPECULATIVEPARALLELISM 21

Conventionalmultithreadingcausesperformancepathologiesonspeculativeworkloads• Increaseinabortedwork• Inefficientuseofspeculationresources

SpeculationAwareMultithreading(SAM)Prioritizethreadsrunningtasksmorelikelytocommit

SAMmakesmultithreadingmoreuseful