
MITIGATING VARIABILITY IN HPC SYSTEMS AND APPLICATIONS FOR PERFORMANCE AND POWER EFFICIENCY

BILGE ACUN

DEPARTMENT OF COMPUTER SCIENCE

UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

DOCTORAL SHOWCASE, NOV 2017

Dissertation Goal

To increase the performance and power efficiency of High Performance Computing (HPC) systems by mitigating various sources of variability, without sacrificing performance.

• Analyze variability in large-scale HPC systems
  ◦ Frequency, power, temperature

• Address each of the sources of the variability
  ◦ Via software and hardware techniques

B. Acun, P. Miller, L. V. Kale. "Variation Among Processors Under Turbo Boost in HPC Systems." International Conference on Supercomputing (ICS), 2016.

Outline

1. Introduction
2. A Dynamic Runtime Interacting with Data Center's Resource Manager
3. Variation Analysis: Frequency, Temperature, Power
4. Mitigating Frequency Variation
5. Mitigating Temperature Variation
6. Mitigating Power Variation
7. Mitigating Within Application Variations
8. Conclusion


2016 U.S. Data Center Energy Report

[Report excerpt shown as an image on the slide. Equation 1 is not recoverable from the transcript; its legend reads: f = fraction of 5th year shipments in installed base, IB_y = installed base in year y, S_y = shipments in year y. Figure 5. Total Volume Server Installed Base Estimates from Three Studies. Figure 6. Volume Server Installed Base 2000-2020.]

The combination of these efficiency trends has resulted in a relatively steady U.S. data center electricity demand over the past 5 years, with little growth expected for the remainder of this decade. It is important to note that this near constant electricity demand across the decade is occurring while simultaneously meeting a drastic increase in demand for data center services; data center electricity use would be significantly higher without these energy efficiency improvements. A counterfactual scenario was created for this study that estimates what data center energy consumption would have been if industry energy-savings efforts were halted in 2010. For this scenario, the following metrics remain static at 2010 industry-wide levels from 2010-2020:

• Average server utilization
• Server power scaling at low utilization
• Average power draw of hard disk drives
• Average power draw of network ports
• Average infrastructure efficiency (i.e., PUE)

The resulting electricity demand, shown in Figure ES-1, indicates that more than 600 billion additional kWh would have been required across the decade.

Figure ES-1. Projected Data Center Total Electricity Use

Estimates include energy used for servers, storage, network equipment, and infrastructure in all U.S. data centers. The solid line represents historical estimates from 2000-2014 and the dashed lines represent five projection scenarios through 2020: Current Trends, Improved Management (IM), Best Practices (BP), Hyperscale Shift (HS), and the static 2010 Energy Efficiency counterfactual.

* Figures and data are taken from A. Shehabi et al. "United States Data Center Energy Usage Report," Lawrence Berkeley National Laboratory, LBNL-1005775, vol. 4, 2016.

• Energy consumption has flat-lined due to (*):
  • Improved operations
  • Hardware advancements

• Data center electricity is spent by (*):
  • Servers: ~40%
  • Infrastructure: ~40%
  • Network and storage: ~20%


Charm++ as an Energy Efficient Runtime

B. Acun, A. Langer, H. Menon, O. Sarood, E. Totoni, and L. V. Kale. "Power, Reliability, Performance: One System to Rule Them All." IEEE Computer, 2016.

Interaction Between the Runtime System and the Resource Manager

✓ Allows dynamic interaction between the system resource manager and the runtime system
✓ Meets system-level constraints such as power caps and hardware configurations
✓ Achieves the objectives of both data center users and system administrators

Components of Charm++ with Its Interactions

Charm++ has four main components:
• Local manager: tracks local information such as object loads and CPU temperatures
• Load-balancing module: makes load-balancing decisions and redistributes load
• Power-resiliency module: ensures that the CPU temperatures remain below the temperature threshold and changes the power cap
• Client-server interface: enables interactions with other programs


Analysis of Performance Variability

• Intel Xeon Phi Knights Landing (KNL) processors on the Cori supercomputer at NERSC
• High variability is reported (frequently 15%, up to 100%), for four reasons:
  • OS noise
  • Cache contention on tile (L2)
  • Memory variability due to cache-mode page conflicts
  • Network variability
• ~50% performance variation on 256 KNL processors

* Image source: https://insidehpc.com/2016/01/mcdram/

Power and Frequency Correlation

• The processors whose frequencies are throttled all hit the Thermal Design Power (TDP) of the chip (135 Watts)
• Data from 256 Intel Haswell processors on Cori are shown

Frequency Throttling Is Not Only Temperature Related

• The processors that hit TDP have a wide range of temperatures, from 56°C to 78°C (a 22°C difference)
• Processors that do not hit TDP have similar temperature ranges

Summary: Variation Analysis

• Large-scale systems exhibit power and temperature variations that are related to design and manufacture.
• These inherent differences manifest themselves as frequency and performance variations.
• Mitigating these variations makes it possible to increase performance and power efficiency without sacrificing performance, which is an important concern for HPC users.

Mitigating Frequency Variation

B. Acun, L. V. Kale. "Mitigating Processor Variation with Dynamic Load Balancing." IEEE International Workshop on Variability in Parallel and Distributed Systems (VarSys, IPDPS), 2016.


Motivation

• Understand the temperature variation and cooling inefficiencies in large-scale systems
• Find solutions to mitigate the variation in order to reduce cooling power

B. Acun, E. K. Lee, Y. Park, L. V. Kale. "Support for Power Efficient Proactive Cooling Mechanisms." In submission to HiPC, 2017.

POWER8 Server Node Architecture

• Four fans operate synchronously to cool the two chips, GPUs, and memory
• If any of the components hits its temperature limit (73°C, 79°C, and 74°C respectively), the fans are triggered reactively

Motivation

• Temperature variations among cores:
  • 7°C in idle temperatures
  • 20°C with idle and active cores mixed
  • 9°C in all-active temperatures
• Synchronous fan control:
  • 4 independent fans in the node
  • Fans all act together and cause even further temperature variation
• Reactive cooling behavior:
  • 54 W jump in fan power
  • 10 minutes of stabilization time with a regular workload

Temperature Variation in Large Scale

[Figure panels: Cori at NERSC (Intel Haswell); Minsky at IBM (POWER8)]

• Steady-state temperature distribution of 1,800 cores on two different platforms:
  • 25°C difference among cores when running the same workload

Oscillatory Cooling Behavior

[Figure annotations: "Workload starts"; CPU utilization levels of 30%, 10%, 60%, and 99%]

Fan Behavior of Different Applications

• Some applications produce only a single power peak at application start
• Some applications keep oscillating even after tens of minutes of execution

Why Is Temperature Modeling Difficult?

• There are many parameters affecting the core temperatures:
  ◦ Complex workloads
  ◦ Ambient temperature
  ◦ Core frequencies
  ◦ Fan speed level
  ◦ Physical layout
  ◦ Hardware variations

• The combination of these parameters creates an exponential modeling space:
  ◦ 10 different cores
  ◦ 0-100 CPU utilization levels
  ◦ 44 different frequency levels
  ◦ 3000 RPM-10000 RPM fan speed levels
  ◦ 4 fans
  ◦ Resulting space: (10^10) * 44 * (10^4) = ~2^52
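The size estimate above can be checked with a few lines of arithmetic. This is only a sketch that takes the slide's expression at face value; the reading of each factor given in the comments is an assumption, since the slide does not spell it out.

```python
import math

# Factors exactly as written on the slide. The per-factor interpretation in
# the comments is an assumption, not something the slide states.
utilization_states = 10 ** 10   # assumed: 10 cores, 10 coarse utilization bins each
frequency_levels = 44           # 44 frequency levels
fan_states = 10 ** 4            # assumed: 4 fans, 10 coarse speed bins each

space_size = utilization_states * frequency_levels * fan_states
bits = math.log2(space_size)

print(f"{space_size:.2e} configurations, about 2^{bits:.1f}")
```

The product is 4.4 * 10^15, which is why exhaustive measurement of the space is impractical and a learned model is used instead.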

[Figure: temperature modeling workflow. Stages: raw data, pre-processing, training, deployment. Labeled quantities: core utilizations, core frequencies, fan speed, ambient temperature, and core temperatures (estimation). A neural network model spans the training phase and the deployment phase.]

Neural Networks for Temperature Modeling

• Neural networks are a good fit because:
  ◦ They can capture linear and non-linear behavior between input and output parameters
  ◦ They work well with noisy data
  ◦ They do not need the formulation of an objective function

• Neural networks have been used in HPC for:
  ◦ Energy and power modeling [1]
  ◦ Performance modeling [2]
  ◦ Temperature modeling:
    ◦ For GPU temperature modeling [3]
    ◦ For coarse-grained data-center-level modeling [4]

1. A. Tiwari, M. A. Laurenzano, L. Carrington, and A. Snavely. Modeling power and energy usage of HPC kernels. In Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), IEEE, 2012.
2. B. C. Lee, D. M. Brooks, B. R. de Supinski, M. Schulz, K. Singh, and S. A. McKee. Methods of inference and learning for performance modeling of parallel applications. In Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '07, 2007.
3. A. Sridhar, A. Vincenzi, M. Ruggiero, and D. Atienza. Neural network-based thermal simulation of integrated circuits on GPUs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 31.
4. L. Wang, G. von Laszewski, F. Huang, J. Dayal, T. Frulani, and G. Fox. Task scheduling with ANN-based temperature prediction in a data center: a simulation-based study. Engineering with Computers, 2011.

Neural Networks for Temperature Prediction

Experimental Setup:
• Firestone cluster at IBM with POWER8 processors
• 1 node = 2 sockets, 20 physical cores, 160 SMT cores
• OCC and BMC for temperature and power readings

[Figure: temperature prediction workflow. Stages: raw data, pre-processing, training, deployment. Labeled quantities: core utilizations, core frequencies, chip power, fan speeds, ambient temperature, and core temperatures (prediction). A neural network model spans the training phase and the deployment phase.]

• This is a proof-of-concept prototype model; other models can also be used in my solutions.
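To make the train-then-deploy idea concrete, here is a minimal sketch of a one-hidden-layer network that maps the input quantities named on the slide to a temperature. Everything specific in it is an assumption for illustration: the data are synthetic, the sensitivities are made up, a single temperature is predicted rather than one per core, and training uses plain full-batch gradient descent instead of the back-propagation variants evaluated in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for measured data (NOT from the talk). Columns: core
# utilization, core frequency, chip power, fan speed, ambient temperature,
# each already scaled to [0, 1].
X = rng.random((400, 5))
made_up_sensitivity = np.array([12.0, 6.0, 8.0, -10.0, 4.0])
temps = 45.0 + X @ made_up_sensitivity + rng.normal(0.0, 0.3, 400)  # degrees C

# Pre-processing: standardize inputs and target.
Xn = (X - X.mean(axis=0)) / X.std(axis=0)
t_mean, t_std = temps.mean(), temps.std()
y = ((temps - t_mean) / t_std).reshape(-1, 1)

# One hidden layer of 8 tanh units.
W1 = rng.normal(0.0, 0.3, (5, 8))
b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.3, (8, 1))
b2 = np.zeros(1)

def forward(inputs):
    hidden = np.tanh(inputs @ W1 + b1)
    return hidden, hidden @ W2 + b2

def mae_celsius():
    """Mean absolute error on the training set, converted back to degrees C."""
    _, out = forward(Xn)
    return float(np.mean(np.abs(out - y)) * t_std)

mae_before = mae_celsius()
lr = 0.05
for _ in range(2000):
    hidden, out = forward(Xn)
    err = (out - y) / len(Xn)                       # gradient of 0.5 * mean squared error
    grad_hidden = (err @ W2.T) * (1.0 - hidden ** 2)
    W2 -= lr * (hidden.T @ err)
    b2 -= lr * err.sum(axis=0)
    W1 -= lr * (Xn.T @ grad_hidden)
    b1 -= lr * grad_hidden.sum(axis=0)
mae_after = mae_celsius()

print(f"MAE before training: {mae_before:.2f} C, after: {mae_after:.2f} C")
```

In deployment, `forward` would be called on freshly pre-processed readings and the output rescaled with `t_mean` and `t_std`.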

Neural Network Configuration and Validation

• Other configuration choices include the number of layers and the number of neurons.
• We test different back-propagation algorithms with different time and memory requirements.

[Figure: mean absolute error (°C) versus number of samples used for training (0 to 2000), for three algorithms: Levenberg-Marquardt, Scaled conjugate gradient, Resilient]

[Figure: mean absolute error (°C) per core number (0 to 20), showing the median, the 25%-75% range, and the 9%-91% range]

Model Guided Proactive Cooling Decisions

• Fan control
  ◦ This can reduce oscillations and chip-to-chip temperature variations.
  ◦ What should the fan speed level be to keep the chips at a certain temperature limit?

• Load balancing
  ◦ This can remove core-to-core as well as chip-to-chip temperature variations.
  ◦ What would the core temperatures become if a certain amount of data is moved from one core to another?

• DVFS
  ◦ Chip-level DVFS can reduce chip-to-chip temperature variations, and core-level DVFS can reduce core-to-core temperature variations.
  ◦ What frequency level do we need to set for the cores to stay under a temperature limit for a workload?
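The fan-control question above can be answered by querying a temperature model for each candidate speed. The sketch below assumes such a model is available as a plain function; `toy_model` is a hypothetical linear stand-in, not the trained network, and the 1000 RPM step is an arbitrary choice.

```python
from typing import Callable, Optional, Sequence

def lowest_safe_fan_speed(
    predict_max_temp: Callable[[int], float],
    fan_speeds_rpm: Sequence[int],
    temp_limit_c: float,
) -> Optional[int]:
    """Return the slowest fan speed whose predicted peak temperature stays under the limit."""
    for rpm in sorted(fan_speeds_rpm):
        if predict_max_temp(rpm) <= temp_limit_c:
            return rpm
    return None  # no candidate speed is predicted to be sufficient

# Hypothetical stand-in for the trained model: temperature falls linearly with fan speed.
def toy_model(rpm: int) -> float:
    return 90.0 - 0.003 * rpm

speeds = range(3000, 10001, 1000)  # the 3000-10000 RPM range mentioned earlier
chosen = lowest_safe_fan_speed(toy_model, speeds, temp_limit_c=73.0)
print(chosen)
```

The load-balancing and DVFS questions have the same shape: enumerate candidate actions, ask the model for the predicted temperatures, and keep the cheapest action that satisfies the limit.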

Proactive Fan Control Mechanism

• Preemptive fan control removes temperature peaks and is able to keep the temperature at the same level as reactive fan control.
• The key idea behind precooling is to cool the processor proactively, for example, before the application starts.
• It can be done via the job scheduler and/or the runtime, without taking over total control.

• 45.6% reduction in fan power
• 9.4% reduction in fan energy

Decoupling the Fans

[Figure panels: BEFORE, AFTER]

• 13% reduction in fan energy
• 7.7% reduction in fan power

Total Reduction in Fan Power

• 53% reduction in fan power on average
• 22% reduction in energy on average

Remaining Temperature Variation

• There is up to 10°C of intra-chip variation that cannot be mitigated by decoupled fans
• How to mitigate intra-chip temperature variation?
  • DVFS: core-level DVFS is not supported in many architectures ☹
  • Load balancing ☺

Temperature-Aware Load Balancing With Charm++

• Load balancing has the potential to remove both chip-level and core-level variations.
• It can help reduce the temperature variations, but how do we decide how much load to move?
• Charm++ has a runtime database which stores:
  • Number of tasks per process
  • Load of each object
• Load balancing is triggered periodically, with customizable periods
• We implement our temperature-aware, model-guided load balancing algorithm.
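One way to decide how much load to move is to let the temperature model score each candidate migration. The sketch below is a greedy illustration of that idea, not the algorithm from the talk: `rebalance`, `toy_temp`, and the per-core object lists are hypothetical, and a real Charm++ strategy would read object loads from the runtime database instead.

```python
from typing import Callable, Dict, List

def rebalance(
    core_objects: Dict[int, List[float]],
    predict_temp: Callable[[float], float],
    max_spread_c: float,
    max_moves: int = 100,
) -> Dict[int, List[float]]:
    """Greedily move objects from the hottest to the coolest core while the model says it helps."""
    cores = {core: list(objs) for core, objs in core_objects.items()}

    def temp(core: int) -> float:
        return predict_temp(sum(cores[core]))

    for _ in range(max_moves):
        hottest = max(cores, key=temp)
        coolest = min(cores, key=temp)
        spread = temp(hottest) - temp(coolest)
        if spread <= max_spread_c or not cores[hottest]:
            break
        obj = min(cores[hottest])  # smallest object gives the finest adjustment
        # Ask the model what the two temperatures would be after the move.
        new_hot = predict_temp(sum(cores[hottest]) - obj)
        new_cool = predict_temp(sum(cores[coolest]) + obj)
        if abs(new_hot - new_cool) >= spread:
            break  # the move would not reduce the gap
        cores[hottest].remove(obj)
        cores[coolest].append(obj)
    return cores

# Hypothetical model: 40 C at idle plus 10 C per unit of load.
def toy_temp(load: float) -> float:
    return 40.0 + 10.0 * load

start = {0: [1.0, 1.0, 1.0, 1.0], 1: [1.0, 1.0], 2: [1.0, 1.0, 1.0]}
balanced = rebalance(start, toy_temp, max_spread_c=5.0)
print({core: sum(objs) for core, objs in balanced.items()})
```

Moving the smallest object first keeps each step cheap to evaluate; the `max_moves` bound guarantees termination even if the model's predictions are noisy.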

Summary: Mitigating Temperature Variation

• Analyzed inefficiencies in cooling systems
• Proposed solutions based on a neural-network-based temperature prediction model:
  ◦ Precooling
  ◦ Decoupled fan control
  ◦ Load balancing
• Our results show:
  ◦ We can accurately predict core temperatures
  ◦ Peak fan power can be reduced by 53%, energy by 22%
  ◦ As a result, air cooling systems can be made more efficient


Mitigating Across Component Power Variation

[Figure: Heterogeneous Compute Node Architecture, with 2 CPUs, 4 GPUs, 2 memory blocks, and 2 network cards]

• Sierra and SummitDev node architecture.
• SummitDev has IBM POWER8 CPUs, NVIDIA Tesla P100 GPUs, DDR4 memory, and Mellanox EDR InfiniBand network adapters.
• An exascale architecture has fat and heterogeneous compute nodes
• Each of the node components has different power variations

B. Acun, E. K. Lee, Y. Park. "Multi-Component Power-Aware Job Scheduling Based on Node and Application Characteristics." Pending patent, No. 15/658,494, July 2017.

Idle Power Distribution of Node Components

[Figure: plots not recoverable from the transcript]

CPU Power Distribution of Different Benchmarks

[Figure: plots not recoverable from the transcript]

Random Node Assembly

Illustration of Data Center Components' Efficiency in Random Assembly

[Figure legend: compute node; CPU, memory, GPU; power efficiency scale from Efficient to Not Efficient]

• Efficient and non-efficient components may randomly show up in a node.

Categorized Node Assembly

Illustration of Type-1 Node Assembly

• Components having the same efficiency level are gathered in the same node
• The data center consumes less power when it is not at full load
• Customers can choose to buy only efficient nodes
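A small simulation illustrates why grouping like with like helps at partial load. This is a sketch under stated assumptions: the component power draws are made-up Gaussian samples, each node holds one CPU, one memory module, and one GPU, "efficient" is taken to mean lower power draw, and a half-loaded data center is assumed to run its lowest-power nodes.

```python
import random

random.seed(1)
NUM_NODES = 8

# Made-up per-component power draws in watts.
cpus = [random.gauss(100, 8) for _ in range(NUM_NODES)]
mems = [random.gauss(40, 4) for _ in range(NUM_NODES)]
gpus = [random.gauss(200, 15) for _ in range(NUM_NODES)]

def node_powers(cpu_list, mem_list, gpu_list):
    """Total power of each node when node i gets the i-th part from every list."""
    return [c + m + g for c, m, g in zip(cpu_list, mem_list, gpu_list)]

random_nodes = node_powers(cpus, mems, gpus)                         # parts as they arrive
categorized = node_powers(sorted(cpus), sorted(mems), sorted(gpus))  # like with like

# At half load, run the job on the lowest-power half of the nodes.
half = NUM_NODES // 2
random_half = sum(sorted(random_nodes)[:half])
categorized_half = sum(sorted(categorized)[:half])
print("random assembly, best half:     ", round(random_half, 1))
print("categorized assembly, best half:", round(categorized_half, 1))
```

The total power of all nodes is the same in both assemblies; the saving only appears when a subset of nodes is used, because the categorized subset contains the lowest-power part of every type.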

Active Power Distribution of Components

[Figure panels: CPU, Memory, GPU]

• The active power distributions are fit to Gaussian distributions
• Extrapolated from ~90 to 5000 nodes in order to represent large scale
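The fit-then-extrapolate step can be sketched with the standard library alone. The 90 "measurements" below are made-up samples standing in for real readings, and drawing 5000 values from the fitted Gaussian is one plausible way to extrapolate, assumed here for illustration.

```python
import random
import statistics

random.seed(7)

# Stand-in for ~90 measured per-node CPU power samples in watts (made-up numbers).
measured = [random.gauss(150.0, 6.0) for _ in range(90)]

# Fit a Gaussian by estimating its mean and standard deviation from the sample.
mu = statistics.mean(measured)
sigma = statistics.stdev(measured)

# Extrapolate: draw a synthetic 5000-node population from the fitted distribution.
population = [random.gauss(mu, sigma) for _ in range(5000)]

print(f"fitted mean = {mu:.1f} W, std = {sigma:.1f} W")
print(f"synthetic population: {len(population)} nodes, "
      f"min = {min(population):.1f} W, max = {max(population):.1f} W")
```

A larger synthetic population reaches further into the tails than the original sample, which is what makes assembly strategies worth evaluating at scale.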

Categorized Assembly Power Reduction

[Figure: plot not recoverable from the transcript]

• Total power consumption of the components: 6.5 MW
• Summit is expected to consume 13 MW

Application Specific Node Assembly

Illustration of Type-2 Node Assembly

• A node is assembled based on application characteristics:
  • A memory-intensive application does not need efficient CPUs
• Performance variations can be mitigated (up to 16%)

Balanced Power Node Assembly

Illustration of Type-3 Node Assembly

• Makes total node power and average performance more predictable
• More suitable for cloud platforms
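One reading of balanced assembly is to offset a high-power part of one type with a low-power part of another, so that node totals cluster together. The sketch below assumes that reading, uses only two component types, and works on made-up power draws; it is an illustration, not the patented method.

```python
import random
import statistics

random.seed(3)
cpus = [random.gauss(100, 8) for _ in range(8)]   # made-up CPU power draws (W)
gpus = [random.gauss(200, 15) for _ in range(8)]  # made-up GPU power draws (W)

# Random assembly: parts are paired in the order they arrive.
random_nodes = [c + g for c, g in zip(cpus, gpus)]

# Balanced assembly: the lowest-power CPU is paired with the highest-power GPU, and so on.
balanced_nodes = [c + g for c, g in zip(sorted(cpus), sorted(gpus, reverse=True))]

random_std = statistics.pstdev(random_nodes)
balanced_std = statistics.pstdev(balanced_nodes)
print("node power std, random:  ", round(random_std, 2))
print("node power std, balanced:", round(balanced_std, 2))
```

For two part types, pairing in opposite sorted order gives the smallest possible spread of node totals among all pairings, which matches the predictability goal stated above.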

Summary: Mitigating Power Variation

• Analyzed power variation of different components in the node
• Proposed three new node assembly techniques:
  ◦ Categorized
  ◦ Application Specific
  ◦ Balanced Node Power


Mitigating Application-Level Variations

• Applications might have different phases or kernels that execute simultaneously and have different "optimal frequency" levels
  • Unstructured or non-uniform applications
  • Applications with special threads: I/O, communication threads

• The runtime can decide the "optimal frequency" of each application task
  • Transparent to the application
  • No need for application modifications or insertion of directives
  • Higher efficiency through automated fine-grained optimizations

• What is the "optimal frequency"?
  • Energy-minimal frequency
  • Lower frequency without sacrificing performance
  • Highest frequency under a power constraint
  • Temperature-restraining frequency

Per-core DVFS Support in Three Architectures

[Figure: plots not recoverable from the transcript]

• Intel Haswell is the only platform that supports per-core DVFS in production.

Runtime-based Function-Level Optimization Approach

Statistics Collection
• Collect power and performance:
  • For each entry method in every chare instance
  • For each frequency level

Optimal Frequency Calculation
• Mode 1: Minimal energy mode
• Mode 2: Maximum performance

Optimal Frequency Application
• Execution time and overhead threshold

• Charm++ "entry methods" naturally enable separation and control of different kernels of the applications.
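The calculation step can be sketched as a lookup over the collected statistics. The table layout, the function names, and the numbers below are hypothetical; Mode 2 is interpreted here simply as "shortest execution time", which is an assumption.

```python
from typing import Dict, Tuple

# Hypothetical statistics for one entry method:
# frequency (GHz) -> (average execution time in s, average power in W).
Stats = Dict[float, Tuple[float, float]]

def minimal_energy_frequency(stats: Stats) -> float:
    """Mode 1: the frequency that minimizes energy = time * power."""
    return min(stats, key=lambda f: stats[f][0] * stats[f][1])

def maximum_performance_frequency(stats: Stats) -> float:
    """Mode 2: the frequency with the shortest execution time."""
    return min(stats, key=lambda f: stats[f][0])

kernel_stats: Stats = {
    1.2: (2.00, 60.0),   # 120.0 J
    2.3: (1.10, 95.0),   # 104.5 J
    3.5: (0.90, 140.0),  # 126.0 J
}
print(minimal_energy_frequency(kernel_stats))
print(maximum_performance_frequency(kernel_stats))
```

With these made-up numbers the middle frequency wins on energy while the highest wins on time, which is the kind of per-kernel difference the runtime is meant to exploit.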

No Core-Level Power Data Available

[Figure: plot not recoverable from the transcript]

• Use a locking mechanism: one core active at a time
• Subtract the static power of idle cores
• Do measurements on one core reflect overall measurements?
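The subtraction step amounts to one line of arithmetic. The sketch below assumes the chip-level reading is taken while exactly one core runs and that every idle core draws the same static power; shared (uncore) power is ignored, and all names and numbers are hypothetical.

```python
def active_core_power(chip_power_w: float, idle_core_power_w: float, num_cores: int) -> float:
    """Estimate one core's power from a chip-level reading taken while only that core runs."""
    return chip_power_w - (num_cores - 1) * idle_core_power_w

# Example: a 10-core chip reads 40 W with one core busy; an idle core draws 3 W.
estimate = active_core_power(chip_power_w=40.0, idle_core_power_w=3.0, num_cores=10)
print(estimate)
```

The open question on the slide still applies: a core measured in isolation may behave differently from the same core running alongside busy neighbors.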

Entry-based DVFS

[Figure: timeline of Kernel-1 and Kernel-2. Labels: running at F1 (energy-optimal frequency); send command to change frequency to F2; frequency is changed to F2; running at F2 (energy-optimal frequency); intervals T_overhead, T_latency, and T_x; Ω regions marked as energy loss and sub-optimal energy]

• T_overhead = 2-5 microseconds
• T_latency = up to 500 microseconds
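A simple three-phase model shows how these two delays eat into the savings for short kernels. The model itself is an assumption made for illustration: the core stalls for T_overhead at the old power level, keeps running at the old frequency for T_latency, and only then runs at the new frequency. The power values and the 1.1x slowdown are made up.

```python
def energy_reduction_pct(
    kernel_time_s: float,         # kernel duration at the current frequency
    power_old_w: float,           # power at the current frequency
    power_new_w: float,           # power at the kernel's optimal frequency
    slowdown: float,              # kernel time at the new frequency / time at the old one
    t_overhead_s: float = 5e-6,   # cost of issuing the frequency change
    t_latency_s: float = 500e-6,  # delay until the new frequency takes effect
) -> float:
    """Percent energy saved by switching, under a simple three-phase model."""
    stay = power_old_w * kernel_time_s
    at_old = min(kernel_time_s, t_latency_s)           # still running at the old frequency
    remaining_fraction = 1.0 - at_old / kernel_time_s  # work left when the switch lands
    switch = (
        power_old_w * t_overhead_s
        + power_old_w * at_old
        + power_new_w * slowdown * kernel_time_s * remaining_fraction
    )
    return 100.0 * (stay - switch) / stay

for duration in (10, 1, 0.1, 0.01, 0.001, 0.0005, 0.0001):
    print(duration, round(energy_reduction_pct(duration, 100.0, 80.0, 1.1), 2))
```

In this model, kernels shorter than T_latency finish before the new frequency takes effect, so switching only adds the overhead and the reduction turns negative.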

Energy Reduction for Different Kernel Durations

• The optimal frequency of the target kernel is 2.3 GHz
• It is not worth transitioning from frequencies in the 1.7-2.8 GHz range
• Shorter kernel durations see less benefit due to overhead and latency

[Figure: energy reduction (%, axis from -25 to 25) versus transition frequency (GHz: 3.5, 3.3, 3.2, 3, 2.8, 2.7, 2.5, 2.3, 2.2, 2, 1.9, 1.7, 1.5, 1.4, 1.2), for kernel durations of 10 s, 1 s, 0.1 s, 0.01 s, 0.001 s, 0.0005 s, and 0.0001 s]

Summary: Mitigating Application Variations

• A task-based runtime can do fine-grained optimizations by optimizing each kernel of the application
• The performance of a kernel or task depends on what other cores are running
  ◦ Need core-level power counters
• Per-core DVFS needs lower latency and overhead to be practical


Concluding Remarks

• Large-scale systems exhibit variability for various reasons and are likely to continue to exhibit it in the future.
• Variability is bad for performance reproducibility and debugging.
• Removing the variability is inherently difficult.
• Instead, runtimes or software systems should be aware of variability and accommodate it.
• Support from HPC systems for power- and temperature-related measurements and controls is key to achieving this.

Thank you!
