MITIGATING VARIABILITY IN HPC SYSTEMS AND APPLICATIONS FOR PERFORMANCE AND POWER EFFICIENCY
BILGE ACUN
DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
DOCTORAL SHOWCASE – NOV, 2017
Dissertation Goal
To increase the performance and power efficiency of High Performance Computing (HPC) systems by mitigating various sources of variability, without sacrificing performance.
• Analyze variability in large-scale HPC systems
◦ Frequency, power, temperature
• Address each of the sources of the variability
◦ Via software and hardware techniques
B. ACUN, P. MILLER, L. V. KALE. “VARIATION AMONG PROCESSORS UNDER TURBO BOOST IN HPC SYSTEMS.” INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS). 2016.
Outline
1. Introduction
2. A Dynamic Runtime Interacting with Data Center’s Resource Manager
3. Variation Analysis: Frequency, Temperature, Power
4. Mitigating Frequency Variation
5. Mitigating Temperature Variation
6. Mitigating Power Variation
7. Mitigating Within Application Variations
8. Conclusion
2016 U.S. Data Center Energy Report
[Equation 1, with f = fraction of 5th year shipments in installed base, IB_y = installed base in year y, S_y = shipments in year y]
[Figure 5. Total Volume Server Installed Base Estimates from Three Studies]
[Figure 6. Volume Server Installed Base 2000-2020]
The combination of these efficiency trends has resulted in a relatively steady U.S. data center electricity demand over the past 5 years, with little growth expected for the remainder of this decade. It is important to note that this near constant electricity demand across the decade is occurring while simultaneously meeting a drastic increase in demand for data center services; data center electricity use would be significantly higher without these energy efficiency improvements. A counterfactual scenario was created for this study that estimates what data center energy consumption would have been if industry energy-savings efforts were halted in 2010. For this scenario, the following metrics remain static at 2010 industry-wide levels from 2010-2020:
• Average server utilization
• Server power scaling at low utilization
• Average power draw of hard disk drives
• Average power draw of network ports
• Average infrastructure efficiency (i.e., PUE)
The resulting electricity demand, shown in Figure ES-1, indicates that more than 600 additional billion kWh would have been required across the decade.
Figure ES-1 Projected Data Center Total Electricity Use
Estimates include energy used for servers, storage, network equipment, and infrastructure in all U.S. data centers. The solid line represents historical estimates from 2000-2014 and the dashed lines represent five projection scenarios through 2020; Current Trends, Improved Management (IM), Best Practices (BP), Hyperscale Shift (HS), and the static 2010 Energy Efficiency counterfactual.
* Figures and data are taken from A. Shehabi et al. “United States data center energy usage report,” Lawrence Berkeley National Laboratory. LBNL-1005775, vol. 4, 2016.
• Energy consumption has flat-lined due to (*):
◦ Improved operations
◦ Hardware advancements
• Data center electricity is spent by (*):
◦ Servers: ~40%
◦ Infrastructure: ~40%
◦ Network and storage: ~20%
Charm++ as an Energy Efficient Runtime
B. ACUN, A. LANGER, H. MENON, O. SAROOD, E. TOTONI, AND L. V. KALE. “POWER, RELIABILITY, PERFORMANCE: ONE SYSTEM TO RULE THEM ALL”. IEEE COMPUTER, 2016.
Interaction Between the Runtime System and the Resource Manager
• Allows dynamic interaction between the system resource manager and the runtime system
• Meets system-level constraints such as power caps and hardware configurations
• Achieves the objectives of both data center users and system administrators
Components of Charm++ with Its Interactions
Charm++ has four main components:
• Local manager: tracks local information such as object loads and CPU temperatures
• Load-balancing module: makes load-balancing decisions and redistributes load
• Power-resiliency module: ensures that the CPU temperatures remain below the temperature threshold and changes the power cap
• Client-server interface: enables interactions with other programs
Analysis of Performance Variability
• Intel’s Xeon Phi Knights Landing (KNL) processors on the Cori supercomputer at NERSC
• High variability due to four reasons is reported (frequently 15%, up to 100%):
◦ OS noise
◦ Cache contention on tile (L2)
◦ Memory variability due to cache mode page conflicts
◦ Network variability
• ~50% performance variation on 256 KNL processors
* Image source: https://insidehpc.com/2016/01/mcdram/
Power and Frequency Correlation
• The processors whose frequencies are throttled all hit the Thermal Design Power (TDP) of the chip (135 Watts)
• 256 Intel Haswell processors on Cori are shown
Frequency Throttling is not Only Temperature Related
• The processors that hit TDP have a wide range of temperatures, from 56C to 78C (22C difference)
• Processors that do not hit TDP have similar temperature ranges
Summary: Variation Analysis
• Large-scale systems exhibit power and temperature variations that are related to design and manufacture.
• The inherent differences manifest themselves as frequency and performance variations.
• Mitigating these variations makes it possible to increase performance and power efficiency without sacrificing performance, which is an important concern for HPC users.
B. ACUN, L. V. KALE. “MITIGATING PROCESSOR VARIATION WITH DYNAMIC LOAD BALANCING.” IEEE INTERNATIONAL WORKSHOP ON VARIABILITY IN PARALLEL AND DISTRIBUTED SYSTEMS (VARSYS, IPDPS). 2016.
Motivation
• Understand the temperature variation and cooling inefficiencies in large-scale systems
• Find solutions to mitigate the variation in order to reduce cooling power
B. ACUN, E. K. LEE, Y. PARK, L. V. KALE. “SUPPORT FOR POWER EFFICIENT PROACTIVE COOLING MECHANISMS”. IN SUBMISSION TO HIPC, 2017.
POWER8 Server Node Architecture
• Four fans that operate synchronously to cool the two chips, GPUs, and memory
• If any of the components hits its temperature limit (73C, 79C, and 74C respectively), the fans trigger in a reactive way
Motivation
• Temperature variations among cores:
◦ 7C in idle temperatures
◦ 20C idle/active mixed
◦ 9C in all-active temperatures
• Synchronous fan control:
◦ 4 independent fans in the node
◦ Fans all act together and cause even further temperature variation
• Reactive cooling behavior:
◦ 54W jump in fan power
◦ 10 minutes stabilization time with a regular workload
Temperature Variation in Large Scale
[Figure panels: Cori at NERSC (Intel Haswell); Minsky at IBM (POWER8)]
• Steady-state temperature distribution of 1,800 cores in two different platforms:
◦ 25C difference among cores when running the same workload
Oscillatory Cooling Behavior
[Figure: annotated with CPU utilization levels of 30%, 10%, 60%, and 99%, and with the point where the workload starts]
Fan Behavior of Different Applications
• Some applications make only a single power peak from the application start
• Some applications keep oscillating even after tens of minutes of execution
Why is Temperature Modeling Difficult?
• There are lots of parameters affecting the core temperatures:
◦ Complex workloads
◦ Ambient temperature
◦ Core frequencies
◦ Fan speed level
◦ Physical layout
◦ Hardware variations
• The combination of these parameters creates an exponential modeling space:
◦ 10 different cores
◦ 0-100 CPU utilization levels
◦ 44 different frequency levels
◦ 3000 RPM-10000 RPM fan speed levels
◦ 4 fans
• (10^10) * 44 * (10^4) = ~2^52
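The size of the modeling space can be checked with a few lines of arithmetic. The slide gives only the product; the reading used here (10 discretized utilization levels for each of the 10 cores, and 10 speed levels for each of the 4 fans) is an assumption made to reproduce the 10^10 and 10^4 factors.

```python
import math

# Assumed discretization (not spelled out on the slide): 10 utilization
# levels per core and 10 speed levels per fan.
CORES = 10
UTIL_LEVELS_PER_CORE = 10
FREQ_LEVELS = 44
FANS = 4
SPEED_LEVELS_PER_FAN = 10


def modeling_space_size() -> int:
    """Number of distinct (utilization, frequency, fan speed) configurations."""
    utilization_combos = UTIL_LEVELS_PER_CORE ** CORES  # 10^10
    fan_combos = SPEED_LEVELS_PER_FAN ** FANS           # 10^4
    return utilization_combos * FREQ_LEVELS * fan_combos


size = modeling_space_size()
print(f"{size:.2e} configurations, about 2^{math.log2(size):.1f}")
```

Under this reading the count comes out just under 2^52, which is why exhaustive measurement of the space is not practical.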
[Figure: neural network temperature model. Stages: raw data, pre-processing, training, deployment (training phase, deployment phase). Inputs: core utilizations, fan speed, ambient temperature, core frequencies. Output: core temperatures (estimation).]
Neural Networks for Temperature Modeling
• Neural networks are a good fit because:
◦ They can capture linear and non-linear behavior between input and output parameters
◦ They work well with noisy data
◦ They do not need the formulation of an objective function
• Neural networks have been used in HPC for:
◦ Energy and power modeling [1]
◦ Performance modeling [2]
◦ Temperature modeling: GPU temperature modeling [3], coarse-grained data center level modeling [4]
1. A. Tiwari, M. A. Laurenzano, L. Carrington, and A. Snavely. Modeling power and energy usage of HPC kernels. In Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), IEEE, 2012.
2. B. C. Lee, D. M. Brooks, B. R. de Supinski, M. Schulz, K. Singh, and S. A. McKee. Methods of inference and learning for performance modeling of parallel applications. In Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '07, 2007.
3. A. Sridhar, A. Vincenzi, M. Ruggiero, and D. Atienza. Neural network-based thermal simulation of integrated circuits on GPUs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 31.
4. L. Wang, G. von Laszewski, F. Huang, J. Dayal, T. Frulani, and G. Fox. Task scheduling with ANN-based temperature prediction in a data center: a simulation-based study. Engineering with Computers, 2011.
Neural Networks for Temperature Prediction
Experimental Setup:
• Firestone cluster at IBM with POWER8 processors
• 1 node = 2 sockets, 20 physical cores, 160 SMT cores
• OCC and BMC for temperature and power readings
[Figure: neural network model. Stages: raw data, pre-processing, training, deployment (training phase, deployment phase). Inputs: core utilizations, fan speeds, ambient temperature, core frequencies, chip power. Output: core temperatures (prediction).]
• Proof-of-concept prototype model; other models can also be used in my solutions.
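To make the train-then-deploy flow concrete, here is a minimal sketch of a one-hidden-layer network that maps node state to a core temperature. It is not the model from the talk: the data is synthetic (generated by a made-up formula), the feature set is reduced to four inputs, and plain gradient descent stands in for the back-propagation variants compared on the slides. All function names are hypothetical.

```python
import numpy as np


def make_synthetic_data(n=400, seed=0):
    """Synthetic stand-in for sensor logs (NOT measured data)."""
    rng = np.random.default_rng(seed)
    # Columns: core utilization, fan speed, ambient temperature,
    # core frequency, each scaled to [0, 1].
    X = rng.uniform(0.0, 1.0, size=(n, 4))
    util, fan, ambient, freq = X.T
    temp = (45 + 25 * util * (0.5 + freq) - 12 * fan + 8 * ambient
            + rng.normal(0, 0.5, n))
    return X, temp.reshape(-1, 1)


def train(X, y, hidden=16, epochs=2000, lr=0.05, seed=0):
    """Full-batch gradient descent on mean squared error."""
    rng = np.random.default_rng(seed)
    mu, sigma = y.mean(), y.std()
    t = (y - mu) / sigma  # normalize the target for stable training
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)             # hidden activations
        err = (h @ W2 + b2) - t              # prediction error
        dh = (err @ W2.T) * (1.0 - h ** 2)   # back-propagate through tanh
        W2 -= lr * (h.T @ err) / len(X)
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * (X.T @ dh) / len(X)
        b1 -= lr * dh.mean(axis=0)
    return W1, b1, W2, b2, mu, sigma


def predict(model, X):
    W1, b1, W2, b2, mu, sigma = model
    return (np.tanh(X @ W1 + b1) @ W2 + b2) * sigma + mu


def mean_absolute_error(model, X, y):
    return float(np.mean(np.abs(predict(model, X) - y)))


X, y = make_synthetic_data()
model = train(X[:300], y[:300])                       # training phase
mae = mean_absolute_error(model, X[300:], y[300:])    # held-out check
baseline = float(np.mean(np.abs(y[300:] - y[:300].mean())))
print(f"held-out MAE: {mae:.2f} C, mean-only baseline: {baseline:.2f} C")
```

The held-out mean absolute error is compared against a predictor that always returns the training mean, which is the simplest sanity check that the network has learned something from the inputs.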
Neural Network Configuration and Validation
• Other configurations include the number of layers and the number of neurons.
• We test different back-propagation algorithms with different time and memory requirements.
[Figure: mean absolute error [°C] vs. number of samples used for training (0 to 2000), for the Levenberg-Marquardt, scaled conjugate gradient, and resilient algorithms]
[Figure: mean absolute error [°C] per core number (0 to 20), showing the median and the 25%-75% and 9%-91% ranges]
Model Guided Proactive Cooling Decisions
• Fan control
◦ This can reduce oscillations and chip-to-chip temperature variations.
◦ What should the fan speed level be to keep the chips at a certain temperature limit?
• Load balancing
◦ This can remove core-to-core as well as chip-to-chip temperature variations.
◦ What would the core temperatures become if a certain amount of data is moved from one core to another?
• DVFS
◦ Chip-level DVFS can reduce chip-to-chip temperature variations; core-level DVFS can reduce core-to-core temperature variations.
◦ What frequency level do we need to set for the cores to stay under a temperature limit for a workload?
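The fan-control question above can be phrased as a search over fan speed levels using a temperature model. The sketch below is one possible way to do that, not the mechanism from the talk: `predict_max_temp` stands in for any trained model, and the linear `toy_model` is purely illustrative.

```python
from typing import Callable, Iterable, Optional


def lowest_safe_fan_speed(
    predict_max_temp: Callable[[int], float],
    fan_levels_rpm: Iterable[int],
    temp_limit_c: float,
) -> Optional[int]:
    """Return the slowest fan speed whose predicted peak temperature
    stays at or below the limit, or None if no level is sufficient."""
    for rpm in sorted(fan_levels_rpm):
        if predict_max_temp(rpm) <= temp_limit_c:
            return rpm
    return None


def toy_model(rpm: int) -> float:
    """Hypothetical stand-in for a trained model: hotter at lower RPM."""
    return 95.0 - 0.004 * rpm


levels = range(3000, 10001, 1000)  # 3000 to 10000 RPM, as on the slides
print(lowest_safe_fan_speed(toy_model, levels, temp_limit_c=73.0))
```

Choosing the slowest sufficient speed keeps fan power low while still respecting the temperature limit; a real controller would also need to account for model error.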
Proactive Fan Control Mechanism
• Preemptive fan control removes temperature peaks, and is able to keep the temperature at the same level as reactive fan control.
• The key idea behind precooling is to cool the processor proactively, for example, before the application starts.
• It can be done via the job scheduler and/or the runtime, without taking over total control.
• 45.6% reduction in fan power
• 9.4% reduction in fan energy
Decoupling the Fans
[Figure: before and after decoupling the fans]
• 13% reduction in fan energy
• 7.7% reduction in fan power
Total Reduction in Fan Power
• 53% reduction in fan power on average
• 22% reduction in energy on average
Remaining Temperature Variation
• There is up to 10C intra-chip variation that cannot be mitigated by decoupled fans
• How to mitigate intra-chip temperature variation?
◦ DVFS: core-level is not supported in many architectures ☹
◦ Load balancing ☺
Temperature-Aware Load Balancing With Charm++
• Load balancing has the potential to remove both chip-level and core-level variations.
• It can help reduce the temperature variations, but how do we decide how much load to move?
• Charm++ has a runtime database which stores:
◦ Number of tasks per process
◦ Load of each object
• Load balancing is triggered periodically with customizable periods
• We implement our temperature-aware, model-guided load balancing algorithm.
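One way to answer "how much load to move" is to let a temperature model score candidate moves. The sketch below is a simple greedy heuristic under stated assumptions, not the algorithm from the talk: loads are abstract integer units, `predict_temps` stands in for a trained model, and the toy model adds a fixed per-core offset to mimic hardware variation.

```python
from typing import Callable, List, Sequence


def balance_by_temperature(
    loads: Sequence[int],
    predict_temps: Callable[[Sequence[int]], List[int]],
    step: int = 1,
    max_moves: int = 100,
) -> List[int]:
    """Greedily move `step` units of load from the hottest to the coolest
    core while the predicted temperature spread keeps shrinking."""
    loads = list(loads)
    for _ in range(max_moves):
        temps = predict_temps(loads)
        hot = max(range(len(loads)), key=temps.__getitem__)
        cold = min(range(len(loads)), key=temps.__getitem__)
        spread = temps[hot] - temps[cold]
        moved = min(step, loads[hot])
        if moved <= 0:
            break
        trial = loads[:]
        trial[hot] -= moved
        trial[cold] += moved
        new_temps = predict_temps(trial)
        if max(new_temps) - min(new_temps) >= spread:
            break  # the move does not reduce the spread; stop
        loads = trial
    return loads


OFFSETS = [0, 3, 6, 9]  # hypothetical per-core thermal offsets in C


def toy_temps(loads: Sequence[int]) -> List[int]:
    """Hypothetical model: 1 C per load unit plus a per-core offset."""
    return [40 + load + off for load, off in zip(loads, OFFSETS)]


before = [10, 10, 10, 10]
after = balance_by_temperature(before, toy_temps)
print(after, toy_temps(after))
```

In this toy setup the equal load split has a 9 C spread, and the heuristic trades load imbalance for a smaller temperature spread. Whether that trade is worthwhile for performance is a separate question the sketch does not address.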
Summary: Mitigating Temperature Variation
• Analyzed inefficiencies in cooling systems
• Proposed solutions based on a neural-network-based temperature prediction model:
◦ Precooling
◦ Decoupled fan control
◦ Load balancing
• Our results show:
◦ We can accurately predict core temperatures
◦ Peak fan power can be reduced by 53%, energy by 22%
◦ As a result, air cooling systems can be made more efficient
Mitigating Across Component Power Variation
[Figure: Heterogeneous Compute Node Architecture, with two CPUs, four GPUs, two memory blocks, and two network cards]
• Sierra and SummitDev node architecture.
• SummitDev has IBM POWER8 CPUs, NVIDIA Tesla P100 GPUs, DDR4 memory and Mellanox EDR InfiniBand network adapters.
• An exascale architecture having fat and heterogeneous compute nodes
• Each of the node components has different power variations
B. ACUN, E. K. LEE, Y. PARK. “MULTI-COMPONENT POWER-AWARE JOB SCHEDULING BASED ON NODE AND APPLICATION CHARACTERISTICS". PENDING PATENT, NO: 15/658,494. JULY, 2017.
Idle Power Distribution of Node Components
CPU Power Distribution of Different Benchmarks
Random Node Assembly
[Figure: Illustration of Data Center Components’ Efficiency in Random Assembly. Dictionary: compute node; CPU; memory; GPU; power efficiency scale from efficient to not efficient]
• Efficient and non-efficient components may randomly show up in a node.
Categorized Node Assembly
[Figure: Illustration of Type-1 Node Assembly]
• Components having the same efficiency level are gathered in the same node
• Data center consumes less power if not at full load
• Customers can select to buy only efficient nodes
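Categorized assembly can be sketched as sorting each component pool by efficiency and grouping components of the same rank. The code below is a minimal illustration with made-up wattages; it assumes that lower measured power under the same workload means a more efficient part, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def categorized_assembly(
    cpu_watts: Sequence[float],
    mem_watts: Sequence[float],
    gpu_watts: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Group the i-th most efficient CPU, memory, and GPU into node i.

    Assumption: lower power at the same workload = more efficient.
    """
    pools = [sorted(pool) for pool in (cpu_watts, mem_watts, gpu_watts)]
    return list(zip(*pools))


# Made-up measurements for three parts of each type.
nodes = categorized_assembly([95, 80, 88], [12, 10, 11], [150, 140, 160])
node_power = [sum(node) for node in nodes]
print(nodes)       # most efficient node first
print(node_power)  # total power per node
```

With nodes ordered this way, a data center running below full load could schedule work onto the most efficient nodes first.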
Active Power Distribution of Components
[Figure panels: CPU, Memory, GPU]
• The active distributions are fit into Gaussian distributions
• Extrapolated from ~90 to 5000 nodes in order to represent large scale
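The fit-and-extrapolate step can be illustrated as follows. This is a sketch under stated assumptions: the 90 "measured" values are synthetic placeholders, the fit uses the sample mean and standard deviation, and the 5000-node population is drawn from the fitted Gaussian. The talk's exact fitting procedure is not given, so this should not be read as its method.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def fit_and_extrapolate(
    samples: Sequence[float], n_nodes: int, seed: int = 0
) -> Tuple[float, float, List[float]]:
    """Fit a Gaussian to the samples and draw a larger synthetic population."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    rng = random.Random(seed)
    population = [rng.gauss(mu, sigma) for _ in range(n_nodes)]
    return mu, sigma, population


# Placeholder for ~90 measured component power values (synthetic, in watts).
_rng = random.Random(1)
measured = [_rng.gauss(100.0, 5.0) for _ in range(90)]

mu, sigma, extrapolated = fit_and_extrapolate(measured, n_nodes=5000)
print(f"fitted mean {mu:.1f} W, std {sigma:.1f} W, "
      f"{len(extrapolated)} extrapolated nodes")
```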
Categorized Assembly Power Reduction
• Total power consumption of the components: 6.5 MW
• Summit is expected to consume 13 MW
Application Specific Node Assembly
[Figure: Illustration of Type-2 Node Assembly]
• Node is assembled based on application characteristics:
◦ A memory-intensive application doesn’t need efficient CPUs
• Performance variations can be mitigated (up to 16%)
Balanced Power Node Assembly
[Figure: Illustration of Type-3 Node Assembly]
• Makes total node power and average performance more predictable
• More suitable for cloud platforms
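One simple way to approximate balanced assembly is to pair the lowest-power part of one type with the highest-power part of another, so that node totals land close together. The sketch below uses that pairing heuristic for two component types with made-up wattages; the heuristic is an assumption chosen for illustration and is not taken from the talk or the patent.

```python
from typing import List, Sequence, Tuple


def balanced_assembly(
    cpu_watts: Sequence[float], gpu_watts: Sequence[float]
) -> List[Tuple[float, float]]:
    """Pair CPUs in ascending power order with GPUs in descending order."""
    return list(zip(sorted(cpu_watts), sorted(gpu_watts, reverse=True)))


# Made-up measurements for three CPUs and three GPUs.
pairs = balanced_assembly([95, 80, 88], [150, 140, 160])
totals = [cpu + gpu for cpu, gpu in pairs]
print(pairs)
print(totals, "spread:", max(totals) - min(totals))
```

For these numbers the node totals differ by only 5 W, whereas grouping parts of the same rank (80 with 140, 95 with 160) would give a 35 W spread.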
Summary: Mitigating Power Variation
• Analyzed power variation of different components in the node
• Proposed three new node assembly techniques:
◦ Categorized
◦ Application Specific
◦ Balanced Node Power
Mitigating Application-Level Variations
• Applications might have different phases or kernels that execute simultaneously and have different “optimal frequency” levels
◦ Unstructured or non-uniform applications
◦ Applications with special threads: I/O, communication threads
• Runtime can decide the “optimal frequency” of each application task
◦ Transparent to the application
◦ No need for application modifications or insertion of directives
◦ Higher efficiency by making automated fine-grain optimizations
• What is “optimal frequency”?
◦ Energy-minimal frequency
◦ Lower frequency without sacrificing performance
◦ Highest frequency under a power constraint
◦ Temperature-restraining frequency
Per-core DVFS Support in Three Architectures
• Intel Haswell is the only platform that supports per-core DVFS in production.
Runtime-based Function-Level Optimization Approach
Statistics Collection
• Collect power and performance:
◦ For each entry method in every chare instance
◦ For each frequency level
Optimal Frequency Calculation
• Mode 1: Minimal energy mode
• Mode 2: Maximum performance
Optimal Frequency Application
• Execution time and overhead threshold
• Charm++ “entry methods” naturally enable separation and control of different kernels of the applications.
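Given per-frequency statistics for one entry method, the two modes reduce to simple selections. The sketch below assumes each sample records a frequency, an execution time, and an average power; the `Sample` type, the function names, the optional power cap in the performance mode, and the numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Sample:
    freq_ghz: float
    time_s: float   # execution time of the entry method at this frequency
    power_w: float  # average power while it ran


def minimal_energy_frequency(samples: Sequence[Sample]) -> float:
    """Mode 1: frequency that minimizes energy = time * power."""
    return min(samples, key=lambda s: s.time_s * s.power_w).freq_ghz


def max_performance_frequency(
    samples: Sequence[Sample], power_cap_w: Optional[float] = None
) -> float:
    """Mode 2: fastest frequency, optionally restricted to a power cap
    (the cap is an assumption added for illustration)."""
    eligible = [s for s in samples
                if power_cap_w is None or s.power_w <= power_cap_w]
    return min(eligible, key=lambda s: s.time_s).freq_ghz


# Hypothetical statistics for one entry method.
stats = [
    Sample(1.2, 2.0, 40.0),   # 80 J
    Sample(2.3, 1.1, 60.0),   # about 66 J
    Sample(3.5, 0.9, 100.0),  # 90 J
]
print(minimal_energy_frequency(stats))
print(max_performance_frequency(stats))
print(max_performance_frequency(stats, power_cap_w=70.0))
```

In this example the energy-minimal frequency sits between the lowest and highest levels, because the slowest setting runs too long and the fastest draws too much power.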
No Core-Level Power Data Available
• Use a locking mechanism: one core active at a time
• Subtract static power of idle cores
• Do measurements on one core reflect overall measurements?
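The subtraction step can be written as a one-line estimate. The sketch below assumes that only one core is active during the measurement (as enforced by the lock), that a package-level power reading is available, and that every idle core draws the same static power. The numbers are made up.

```python
def active_core_power(
    package_power_w: float, idle_core_static_w: float, total_cores: int
) -> float:
    """Estimate the power of the single active core by subtracting the
    static power of the remaining idle cores from the package reading."""
    idle_cores = total_cores - 1
    return package_power_w - idle_cores * idle_core_static_w


# Hypothetical reading: 50 W package, 12 cores, 2 W static per idle core.
estimate = active_core_power(50.0, 2.0, 12)
print(estimate)
```

The estimate attributes all non-static package power to the active core, which is why the slide asks whether a one-core measurement reflects behavior when all cores are busy.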
Entry-based DVFS
[Figure: timeline of two kernels. Labels: Kernel-1 running at F1 (energy-optimal frequency); send command to change frequency to F2; T_overhead; T_latency; frequency is changed to F2; Kernel-2 running at F2 (energy-optimal frequency); T_x. Legend: energy loss; sub-optimal energy.]
• T_overhead = 2-5 microseconds
• T_latency = up to 500 microseconds
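The effect of overhead and latency on short kernels can be estimated with a simple energy model. The model below is an assumption made for illustration: the overhead time is pure energy loss at the old power level, the kernel keeps running at the old frequency until the latency elapses, and the rest runs at the new frequency. The power and slowdown values are hypothetical; only the overhead and latency figures come from the slide.

```python
def energy_reduction_percent(
    duration_s: float,           # kernel duration at the old frequency
    p_old_w: float,              # power at the old frequency
    p_new_w: float,              # power at the new frequency
    slowdown: float,             # time at new frequency / time at old
    t_overhead_s: float = 5e-6,  # upper end of the 2-5 microsecond range
    t_latency_s: float = 500e-6, # up to 500 microseconds
) -> float:
    """Energy saved by switching frequency, relative to not switching."""
    baseline = p_old_w * duration_s
    at_old = min(t_latency_s, duration_s)      # still at the old frequency
    remaining_fraction = 1.0 - at_old / duration_s
    switched = (
        p_old_w * (t_overhead_s + at_old)      # energy loss + sub-optimal part
        + p_new_w * remaining_fraction * duration_s * slowdown
    )
    return 100.0 * (baseline - switched) / baseline


long_kernel = energy_reduction_percent(10.0, 100.0, 70.0, 1.2)
short_kernel = energy_reduction_percent(0.0001, 100.0, 70.0, 1.2)
print(f"10 s kernel: {long_kernel:.2f}%, 0.1 ms kernel: {short_kernel:.2f}%")
```

Under these assumptions a long kernel approaches the ideal saving, while a kernel shorter than the latency never reaches the new frequency and only pays the overhead.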
Energy Reduction for Different Kernel Durations
• The optimal frequency of the target kernel is 2.3 GHz
• It is not worth transitioning from 1.7-2.8 GHz
• Shorter kernel durations have less benefit due to overhead and latency
[Figure: energy reduction (%) from -25 to 25 vs. transition frequency (GHz) from 3.5 down to 1.2, for kernel durations of 10 s, 1 s, 0.1 s, 0.01 s, 0.001 s, 0.0005 s, and 0.0001 s]
Summary: Mitigating Application Variations
• A task-based runtime can do fine-grained optimizations by optimizing each kernel of the application
• The performance of a kernel or task depends on what other cores are running
◦ Need core-level power counters
• Per-core DVFS needs to have less latency and overhead to be practical
Concluding Remarks
• Large-scale systems exhibit variability for various reasons and are likely to continue to exhibit it in the future.
• Variability is bad for performance reproducibility and debugging.
• Removing the variability is inherently difficult.
• Instead, runtimes or software systems should know about and accommodate variability.
• Support from HPC systems to enable power- and temperature-related measurements and controls is key to achieving this.
Thank you!