Thread Clustering: Sharing-Aware Scheduling on SMP-CMP-SMT Multiprocessors
David Tam, Reza Azimi, Michael Stumm (Department of Electrical and Computer Engineering, University of Toronto, Canada), EuroSys'07
Presenter: Hwan-jin Yong


Page 1: Thread Clustering: Sharing-Aware Scheduling on SMP-CMP-SMT Multiprocessors (source: csl.skku.edu/uploads/ECE5658S17/week4b.pdf, 2017-03-26)

Thread Clustering: Sharing-Aware Scheduling on SMP-CMP-SMT Multiprocessors

Department of Electrical and Computer Engineering, University of Toronto, Canada

Presenter: Hwan-jin Yong
EuroSys'07

David Tam, Reza Azimi, Michael Stumm

Page 2

Outline

• Introduction: OpenPower 720 Architecture
• Motivation
• Performance Monitoring Unit
• Design of Thread Clustering Scheme
• Evaluation
• Contribution
• Summary

Page 3

OpenPower 720 Server
• Design goals: performance, scalability, reliability, etc.
• POWER5 processors (SMP-CMP-SMT multiprocessor)
  • Multi-core architecture (chip multiprocessing, CMP) designed for high throughput
  • Shared-memory multiprocessors (SMP)
  • Simultaneous multithreading (SMT)

[Figure: IBM OpenPower 720]

Page 4

Simultaneous Multithreading (SMT)
• Multiple independent threads execute simultaneously on the same core
• Increases core efficiency
• Example
  • Single thread: the processor pipeline stalls while waiting for data to arrive from memory
  • With SMT: if one thread is waiting for a floating-point operation to complete, another thread can use the integer units
• POWER5
  • With SMT, 2 virtual processors per real processor

[Figure: POWER5 layout]

Page 5

Existing operating systems do not handle the complexity of multicore processors

Motivation
• What is the cause of the poor performance?
• Solution? POWER5 (8 logical processors)
  • Increase cache size... costs money!
  • Add more processors... costs power!
  • Hire more chip architecture engineers...?

Page 6

Overview of Thread Clustering Scheme
• Design of Thread Clustering Scheme
  1. Monitoring Stall Breakdown
  2. Detecting Sharing Patterns
  3. Thread Clustering
  4. Thread Migration

Page 7

Step 1: Monitoring Stall Breakdown
• PMU
  • Counts various hardware events that occur in the processor
• Introduction to the PMU on Cortex-R
  • Only three event registers can be selected at a time
  • Counter overflow must be handled
  • Difficult to extract high-level insight from the raw counts

<Cortex-R (up to 40 events), Reference Manual>

[Figure: FuncA, FuncB and FuncC run between "PMU On" and "PMU Off"]

At PMU On: Data Cache Miss: 0, Branch Mis-Prediction: 0, Instruction Count: 0, Clock Count: 0

At PMU Off: Data Cache Miss: A, Branch Mis-Prediction: B, Instruction Count: C, Clock Count: D
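The PMU On / PMU Off measurement and the overflow-handling bullet can be illustrated with a minimal sketch. It assumes 32-bit counters and a hypothetical `read_counters` callable that returns raw counter values by event name. Neither comes from the slides, and real PMU access goes through hardware registers or an OS interface rather than Python.

```python
COUNTER_BITS = 32  # assumed counter width; real PMUs differ


def counter_delta(start, end, bits=COUNTER_BITS):
    """Events counted between two raw reads, tolerating a single wraparound."""
    return (end - start) % (1 << bits)


def measure(read_counters, func):
    """Read all counters before and after func() and return per-event deltas.

    read_counters is a hypothetical callable returning {event_name: raw_value}.
    """
    before = read_counters()   # "PMU On" snapshot
    func()                     # code under measurement (FuncA, FuncB, ...)
    after = read_counters()    # "PMU Off" snapshot
    return {name: counter_delta(before[name], after[name]) for name in before}
```

Taking the difference modulo the counter range gives the right count even if the counter wrapped once between the two reads. If it can wrap more than once, an overflow interrupt handler is needed instead.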

Page 8

Step 2: Detect Sharing Patterns
• Construction of shMaps
  • Build a shMap (summary data structure) that counts remote cache accesses (8-bit counters)
  • Update the region's entry in the shMap vector when a cache miss is satisfied by a remote cache access
  • Region size: 128 bytes (equal to the cache line size)
  • But how can the entire virtual address space be encoded with only 128 region entries?
  • Use a simple hash function (region = address % 128?)

[Figure: shMap vector (Thread A) with entries 0, 1, ..., 127, covering the address space from 0 to 2^64 (POWER5: 64-bit)]
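A shMap update can be sketched as follows. The region size, entry count and 8-bit counter width are taken from the slide. The hash is an assumption: the sketch hashes the region number (address divided by the region size) rather than the raw address, since the slide leaves the exact function open. The function names are illustrative.

```python
REGION_BYTES = 128   # region size from the slide
NUM_ENTRIES = 128    # shMap entries from the slide
COUNTER_MAX = 255    # largest value of an 8-bit saturating counter


def region_index(address):
    """Map a virtual address to a shMap entry (assumed hash: region number mod table size)."""
    return (address // REGION_BYTES) % NUM_ENTRIES


def record_remote_access(shmap, address):
    """Increment the saturating counter for the region containing address."""
    idx = region_index(address)
    if shmap[idx] < COUNTER_MAX:
        shmap[idx] += 1
    return idx


# One shMap vector per thread, all counters starting at zero.
shmap_a = [0] * NUM_ENTRIES
record_remote_access(shmap_a, 0x1080)  # address falls in region 33
```

Because many regions map to the same entry, two unrelated regions can inflate the same counter.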

Page 9

Step 2: Detect Sharing Patterns
• For low overhead and reduced noise (false reports) in Step 2
  • Temporal sampling: do not record and process every remote cache access; record only one in every N occurrences
  • Spatial sampling: reduces hash collisions (false reports) while keeping the shMap vector small

[Figure: two panels showing Thread A, the hash function, and Thread A's shMap vector (entries 0, 1, ..., 127); the second panel adds the shMap filter. Caption: "First come, gets the ticket!"]
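The two sampling ideas can be combined in a small sketch. It assumes that the shMap filter remembers, for each entry, the first region that claimed it ("first come, gets the ticket") and ignores accesses to any other region that hashes to the same entry. The class name, its parameters and the default sampling rate are illustrative and are not taken from the slides.

```python
class SharingSampler:
    """Illustrative temporal + spatial sampling of remote cache accesses."""

    def __init__(self, sample_every=10, num_entries=128, region_bytes=128):
        self.sample_every = sample_every      # N: record one access in every N
        self.num_entries = num_entries
        self.region_bytes = region_bytes
        self.seen = 0                         # remote accesses observed so far
        self.filter = [None] * num_entries    # region that owns each entry
        self.shmaps = {}                      # thread id -> shMap vector

    def on_remote_access(self, thread_id, address):
        """Return True if this access was recorded in the thread's shMap."""
        self.seen += 1
        if self.seen % self.sample_every != 0:
            return False                      # temporal sampling: skip
        region = address // self.region_bytes
        idx = region % self.num_entries
        if self.filter[idx] is None:
            self.filter[idx] = region         # first come, gets the ticket
        elif self.filter[idx] != region:
            return False                      # spatial sampling: drop collision
        shmap = self.shmaps.setdefault(thread_id, [0] * self.num_entries)
        shmap[idx] = min(shmap[idx] + 1, 255)  # 8-bit saturating counter
        return True
```

With the filter in place, a non-zero entry at the same index in two threads' shMaps refers to the same region, so hash collisions no longer show up as false sharing.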

Page 10

Step 3: Thread Clustering

• Define the similarity of two shMap vectors

[Figure: shMap vector (Thread A) with entries 256, 200, 130; shMap vector (Thread B) with entries 256, 150]

The similarity value is high when two threads (Thread A and Thread B) share data.
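One way to define such a similarity, shown here as an assumption because the slide does not give the formula, is a dot product of the two shMap vectors. It is large only when both threads have high counts in the same entries. The `floor` parameter and the example vectors are illustrative.

```python
def similarity(shmap_a, shmap_b, floor=0):
    """Dot product over entries where both counters exceed floor."""
    return sum(a * b for a, b in zip(shmap_a, shmap_b) if a > floor and b > floor)


# Threads A and B both touch region 0 heavily; thread C touches a different region.
thread_a = [200, 0, 130]
thread_b = [150, 0, 0]
thread_c = [0, 90, 0]

score_ab = similarity(thread_a, thread_b)  # 200 * 150 = 30000
score_ac = similarity(thread_a, thread_c)  # no common non-zero entry, so 0
```

Raising `floor` discards entries with only a few recorded accesses, which helps when occasional accesses would otherwise count as sharing.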

Page 11

Evaluation (1/2)

• A continuous vertical dark line indicates clustered threads

Page 12

Evaluation (2/2)

• Performance improvement of up to 7%
• Reduced stalls due to remote cache accesses

Page 13

Related Work: Is it being used nowadays?

• Additional material in preparation

Page 14

Summary

• Proposes a new thread scheduling scheme
  • Uses run-time information from hardware performance counters
• Detects sharing patterns between different threads
  • Finds the best placement for each thread so that it no longer causes remote cache accesses
• The OS job scheduler re-assigns threads that share data to the same chip domain (memory domain) with low overhead
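The clustering and re-assignment steps can be sketched as below. Everything in it is an assumption for illustration: the greedy one-pass clustering against each cluster's first member, the similarity threshold, and the "largest cluster to the least-loaded chip" placement rule are not given in the slides.

```python
def dot(a, b):
    """Dot-product similarity of two shMap vectors (assumed metric)."""
    return sum(x * y for x, y in zip(a, b))


def cluster_threads(shmaps, threshold):
    """Greedy one-pass clustering: join the first cluster whose
    representative (its first member) is similar enough, else start a new one."""
    clusters = []
    for tid, vec in shmaps.items():
        for members in clusters:
            if dot(vec, shmaps[members[0]]) >= threshold:
                members.append(tid)
                break
        else:
            clusters.append([tid])
    return clusters


def assign_to_chips(clusters, num_chips):
    """Place whole clusters, largest first, on the currently least-loaded chip."""
    chips = [[] for _ in range(num_chips)]
    for members in sorted(clusters, key=len, reverse=True):
        min(chips, key=len).extend(members)
    return chips


shmaps = {"A": [9, 0], "B": [8, 0], "C": [0, 7], "D": [0, 6]}
clusters = cluster_threads(shmaps, threshold=10)
placement = assign_to_chips(clusters, num_chips=2)
```

Here threads A and B land on one chip and C and D on the other, so each pair shares a chip's caches instead of fetching data from a remote cache.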


Page 15

Thank you!

Questions?