fast, distributed computa2ons in the cloud · contribu2ons • demonstra2ng how the control plane...
TRANSCRIPT
Fast,DistributedComputa2onsintheCloud
OmidMashayekhi
Advisor:PhilipLevis
April7,2017
22
CloudFrameworks
3
SQL Streaming MachineLearning
Graph
CloudFramework
......
Cloudframeworksabstractawaythecomplexi2esofthecloudinfrastructurefromtheapplica2ondevelopers:
1. Automa2cdistribu2on2. Elas2cscalability3. Mul2tenantapplica2ons4. Loadbalancing5. Faulttolerance
CloudFrameworks
4
SQL Job
ControlPlane
......
• Jobisaninstanceoftheapplica2onrunningintheframework.• Taskistheunitofcomputa2on.• Controlplanemakesthemagichappen:
• Par22oningjobintotasks• Schedulingtasks• Loadbalancing• Faultrecovery
Task
10s 1s 100ms 10ms 1ms
I/O-bounddataanaly2cs
MapReduceHadoop
TaskLength
2004
Evolu2onofCloudFrameworks
5
10s 1s 100ms 10ms 1ms
I/O-bounddataanaly2cs
In-memorydataanaly2cs
MapReduceHadoop
TaskLength
SparkNaiad
2004 2012
Evolu2onofCloudFrameworks
6
10s 1s 100ms 10ms 1ms
I/O-bounddataanaly2cs
In-memorydataanaly2cs
Op2mizeddataanaly2cs
MapReduceHadoop
TaskLength
SparkNaiad
Spark2.0CommonIL
C++
2004 2012 2016
Evolu2onofCloudFrameworks
7
8
10s 1s 100ms 10ms 1ms
I/O-bounddataanaly2cs
In-memorydataanaly2cs
Op2mizeddataanaly2cs
MapReduceHadoop
TaskLength
SparkNaiad
Spark2.0CommonIL
C++
2004 2012 2016
9
• Oneitera2onoflogis2cregressionoveradatasetofsize64MB.• Tasksimplementedefficiently,couldrun50xfaster.
TaskExecu2onTime(ms)
178(6x)
67(16x)
21(51x)
1071
C++
Java
SparkRDD
SparkDataFrame
10s 1s 100ms 10ms 1ms
I/O-bounddataanaly2cs
In-memorydataanaly2cs
Op2mizeddataanaly2cs
MapReduceHadoop
TaskLength
SparkNaiad
Spark2.0CommonIL
C++
2004 2012 2016
10
Individualtasksaregebngfaster.
Butdoesitnecessarilymeanthatjobcomple2on2meisgebngshorter?
CloudFrameworks
11
SQL Job
ControlPlane
......
Taskaregebngordersofmagnitudefaster.
Howaboutthejob?
12
ControlPlaneTheNewBodleneck
• Logis2cregressionoveradatasetofsize100GB.• ClassicSparkusedtobeCPU-bound.
13
ControlPlaneTheNewBodleneck
• Logis2cregressionoveradatasetofsize100GB.• Spark2.0isalreadycontrol-bound.
14
ControlPlaneTheNewBodleneck
• Logis2cregressionoveradatasetofsize100GB.• Spark-opt:hypothe2calcasewhereSparkrunstasksasfastasC++.
15
Controlplaneistheemergingbodleneckforthecloudcompu2ngframeworks.
16
ControlPlaneTheNewBodleneck
• Logis2cregressionoveradatasetofsize100GB.• Nimbuswithexecu<ontemplatesscalesalmostlinearly.
Contribu2ons
• Demonstra2nghowthecontrolplaneistheemergingbo=leneckfordataanaly2csframeworks.
• Execu<onTemplatesasanabstrac2onforthecontrolplaneofcloudcompu2ngframeworks,thatenablesordersofmagnitudehighertaskthroughput,whilekeepingthefine-grained,flexiblescheduling.
• Thedesign,implementa2on,andevalua2onofNimbus,adistributedcloudcompu2ngframeworkthatembedsexecu2ontemplates.
• Ademonstra2onofasingle-coregraphicalsimula<onthatNimbusautoma<callydistributesinthecloudshowingexecu2ontemplatesinprac2ceforcomplexapplica2ons.
17
Thistalk
• ControlPlane:theEmergingBodleneck
• DesignScopeoftheControlPlane
• Execu2onTemplates
• Nimbus:aFrameworkwithTemplates
• Evalua2on
18
Thistalk
• ControlPlane:theEmergingBodleneck
• DesignScopeoftheControlPlane
• Execu2onTemplates
• Nimbus:aFrameworkwithTemplates
• Evalua2on
19
CloudFrameworksDesign
• Currently,therearetwoapproaches:1. Centralizedcontrolmodel.
• Controllergeneratesandassignstaskstotheworker.• Limitedtaskthroughput,butreac2vescheduling.
2. Distributeddataflowmodel.
• Nodesgenerateandspawntaskslocally.• Greatscalability,butsta2cscheduling.
20
21
DesignSpectrum
CentralizedControllers DistributedControllers
22
DesignSpectrum
CentralizedControllers DistributedControllersControllerTaskGraph
Loop
Worker Worker Worker Worker
• Controllercentrallyschedulesandspawnstasks.
• MapReduce• Hadoop• Spark
23
DesignSpectrum
CentralizedControllers DistributedControllersControllerTaskGraph
Loop
Worker Worker Worker Worker
• Controllercentrallyschedulesandspawnstasks.
24
DesignSpectrum
CentralizedControllers DistributedControllersControllerTaskGraph
Loop
Worker Worker Worker Worker
• Controllercentrallyschedulesandspawnstasks.
25
DesignSpectrum
CentralizedControllers DistributedControllersControllerTaskGraph
Loop
Worker Worker Worker Worker
• Controllercouldreac2velyanddynamicallychangetheschedule.
26
DesignSpectrum
CentralizedControllers DistributedControllersControllerTaskGraph
Loop
Worker Worker Worker Worker
• Controllercouldreac2velyanddynamicallychangetheschedule.
27
DesignSpectrum
CentralizedControllers DistributedControllersControllerTaskGraph
Loop
Worker Worker Worker Worker
• Controllercouldreac2velyanddynamicallychangetheschedule.
28
DesignSpectrum
CentralizedControllers DistributedControllersControllerTaskGraph
Loop
Worker Worker Worker Worker
• Butcontrollerbodlenecksatscale.
29
DesignSpectrum
CentralizedControllers DistributedControllersControllerTaskGraph
Loop
Worker Worker Worker Worker
• Butcontrollerbodlenecksatscale.
30
DesignSpectrum
CentralizedControllers DistributedControllersControllerTaskGraph
Loop
Worker Worker Worker Worker
• Butcontrollerbodlenecksatscale.
31
DesignSpectrum
CentralizedControllers DistributedControllersControllerTaskGraph
Loop
Worker Worker Worker Worker
• Butcontrollerbodlenecksatscale.
32
DesignSpectrum
CentralizedControllers DistributedControllersControllerTaskGraph
Loop
Worker Worker Worker Worker
• Butcontrollerbodlenecksatscale.
33
DesignSpectrum
CentralizedControllers DistributedControllers
Workersfallidle
• Logis2cregressionoveradatasetofsize100GBinSpark2.0MLlib.• ControlPlanebo=lenecksatscale,genera2ngandspawningtasks.
34
DesignSpectrum
CentralizedControllers DistributedControllers
Controller
Worker
Loop
Controller
Worker
Loop
Controller
WorkerLoop
Controller
Worker
Loop
Synchroniza2on
• Eachnodegeneratesandexecutestaskslocally.
• Naiad• TensorFlow
35
DesignSpectrum
CentralizedControllers DistributedControllers
Controller
Worker
Controller
Worker
Loop
Controller
Worker
Loop
Controller
Worker
Loop
Controller
Worker
Loop
Controller
Worker
Loop
Controller
Worker
Loop
Controller
Worker
Loop
Synchroniza2on
Loop
• Thedesignscaleswellasthereisnosinglebodleneck.
36
DesignSpectrum
CentralizedControllers DistributedControllers
Controller
Worker
Loop
Controller
Worker
Loop
Controller
WorkerLoop
Controller
Worker
Loop
Synchroniza2on
• But,theschedulingissta2c.• Theprogressspeedisboundtothespeedoftheslowestnode.• Anychangerequiresstoppingallnodesandinstallingnewdataflow.
Straggling
37
DesignSpectrum
CentralizedControllers DistributedControllers
Synchroniza2on
• Inprac2cethestragglermi2ga2onisonlyproac<ve:• Avoidingstragglersbyme2culousengineeringwork.• Launchingbackupworkers(atleastdoublingtheresources).
Controller
BackupWorker
Loop
Controller
BackupWorker
Loop
Controller
BackupWorker
Loop
Controller
BackupWorker
Loop
Loop
Controller
Loop
Controller
Loop
Controller
Loop
Controller
38
DesignSpectrum
CentralizedControllers DistributedControllers
Synchroniza2on
• Inprac2cethestragglermi2ga2onisonlyproac<ve:• Avoidingstragglersbyme2culousengineeringwork.• Launchingbackupworkers(atleastdoublingtheresources).
Controller
BackupWorker
Loop
Controller
BackupWorker
Loop
Controller
BackupWorker
Loop
Controller
BackupWorker
Loop
Loop
Controller
Loop
Controller
Loop
Controller
Loop
Controller
Straggling
39
DesignSpectrum
CentralizedControllers DistributedControllers
Synchroniza2on
• Inprac2cethestragglermi2ga2onisonlyproac<ve:• Avoidingstragglersbyme2culousengineeringwork.• Launchingbackupworkers(atleastdoublingtheresources).
Controller
BackupWorker
Loop
Controller
BackupWorker
Loop
Controller
BackupWorker
Loop
Loop
Controller
Loop
Controller
Loop
Controller
Loop
Controller
BackupWorker
Loop
Controller
40
DesignSpaceSummary
ControlPlaneDesign
ExampleFramework
TaskThroughput
TaskScheduling
Centralized
MapReduce
Low DynamicHadoop
Spark
DistributedNaiad
High Sta2cTensorFlow
41
DesignSpaceSummary
ControlPlaneDesign
ExampleFramework
TaskThroughput
TaskScheduling
Centralized
MapReduce
Low DynamicHadoop
Spark
DistributedNaiad
High Sta2cTensorFlow
Wewouldliketohavethebestofbothworlds:• Hightaskthroughputforfastcomputa2ons.• Dynamic,fine-grainedschedulingdecisions.
Repe22vePaderns
• Advanceddataanaly2csareitera2veinnature.– Machinelearning,graphprocessing,imagerecogni2on,etc.
• Thisresultsinrepe22vepadernsinthecontrolplane.– Similartasksexecutewithminordifferences.
42
Repe22vePaderns
• Advanceddataanaly2csareitera2veinnature.– Machinelearning,graphprocessing,imagerecogni2on,etc.
• Thisresultsinrepe22vepadernsinthecontrolplane.– Similartasksexecutewithminordifferences.
43
while(error>threshold_e){while(gradient>threshold_g){ //Optimizationcodeblock gradient=Gradient(tdata,coeff,param) coeff+=gradient}//Estimationcodeblockerror=Estimate(edata,coeff,param)param=update_model(param,error)
}
TrainingData
Es,ma,onData
Parameters
ErrorEs,ma,onItera,veOp,mizer
Coeffi
cien
ts
Thistalk
• ControlPlane:theEmergingBodleneck
• DesignScopeoftheControlPlane
• Execu2onTemplates
• Nimbus:aFrameworkwithTemplates
• Evalua2on
44
Execu2onTemplates
• Tasksarecachedasparameterizableblocksonnodes.
• Insteadofassigningthetasksfromscratch,templatesareinstan<atedbyfillinginonlychangingparameters.
45
Task idData listDep. listFunctionParameter
Task idData listDep. listFunctionParameter
Task idData listDep. listFunctionParameter
Execu2onTemplates
• Tasksarecachedasparameterizableblocksonnodes.
• Insteadofassigningthetasksfromscratch,templatesareinstan<atedbyfillinginonlychangingparameters.
46
Task idData listDep. listFunctionParameter
Task idData listDep. listFunctionParameter
Task idData listDep. listFunctionParameter
Load NewTask ids
Parameters
T1P1
T2P2
T3P3
47
Execu2onTemplatesMechanismsSummary
• Instan<a<on:spawnablockoftaskswithoutprocessingeachtaskindividuallyfromscratch.Ithelpsincreasethetaskthroughput.
• Edits:modifiesthecontentofeachtemplateatthegranularityoftasks.Itenablesfine-grained,dynamicscheduling.
• Patches:Incasethestateoftheworkerdoesnotmatchtheprecondi2onsofthetemplate.Itenablesdynamiccontrolflow.
Execu2onModel
48
ControllerDriver Program
Data
Map
Reduce
Dataflo
w
Worker Worker
Execu2onModel
49
ControllerDriver Program
Data
Map
Reduce
Dataflo
w
Worker Worker
TaskGraph
Execu2onModel
50
ControllerDriver Program
Data
Map
Reduce
Dataflo
w
Worker Worker
DataObjects DataObjects
TaskGraph
Execu2onModel
51
ControllerDriver Program
Data
Map
Reduce
Dataflo
w
Worker Worker
DataObjects DataObjects
CTaskGraph
Execu2onModel
52
ControllerDriver Program
Data
Map
Reduce
Dataflo
w
Worker Worker
DataObjects DataObjects
C
Task idData listDep. listFunctionParameter
TaskGraph
Execu2onModel
53
ControllerDriver Program
Data
Map
Reduce
Dataflo
w
Worker Worker
DataObjects DataObjects
DataExchange C
Task idData listDep. listFunctionParameter
TaskGraph
Repe22vePaderns
54
Controller
Worker Worker
DataObjects DataObjects
TaskGraph
while(error>threshold_e){while(gradient>threshold_g){ //Optimizationcodeblock gradient=Gradient(tdata,coeff,param) coeff+=gradient}//Estimationcodeblockerror=Estimate(edata,coeff,param)param=update_model(param,error)
}
Driver Program
Repe22vePaderns
55
Controller
Worker Worker
DataObjects DataObjects
C
Task idData listDep. listFunctionParameter
TaskGraph
while(error>threshold_e){while(gradient>threshold_g){ //Optimizationcodeblock gradient=Gradient(tdata,coeff,param) coeff+=gradient}//Estimationcodeblockerror=Estimate(edata,coeff,param)param=update_model(param,error)
}
Driver Program
Repe22vePaderns
56
Controller
Worker Worker
DataObjects DataObjects
TaskGraph
while(error>threshold_e){while(gradient>threshold_g){ //Optimizationcodeblock gradient=Gradient(tdata,coeff,param) coeff+=gradient}//Estimationcodeblockerror=Estimate(edata,coeff,param)param=update_model(param,error)
}
Driver Program
DataExchange C
Task idData listDep. listFunctionParameter
Repe22vePaderns
57
Controller
Worker Worker
DataObjects DataObjects
C
Task idData listDep. listFunctionParameter
TaskGraph
while(error>threshold_e){while(gradient>threshold_g){ //Optimizationcodeblock gradient=Gradient(tdata,coeff,param) coeff+=gradient}//Estimationcodeblockerror=Estimate(edata,coeff,param)param=update_model(param,error)
}
Driver Program
Repe22vePaderns
58
Controller
Worker Worker
DataObjects DataObjects
TaskGraph
while(error>threshold_e){while(gradient>threshold_g){ //Optimizationcodeblock gradient=Gradient(tdata,coeff,param) coeff+=gradient}//Estimationcodeblockerror=Estimate(edata,coeff,param)param=update_model(param,error)
}
Driver Program
DataExchange C
Task idData listDep. listFunctionParameter
Execu2onTemplatesAbstrac2on
59
Controller
Worker Worker
CTaskGraph
DataObjects DataObjects
Execu2onTemplatesAbstrac2on
60
Controller
Worker Worker
TaskGraph
DataObjects DataObjects
CCTemplate
Template
Execu2onTemplatesAbstrac2on
61
Controller
Worker Worker
TaskGraph
DataObjects DataObjects
CTemplate
Template
Execu2onTemplatesAbstrac2on
62
Controller
Worker Worker
TaskGraph
DataObjects DataObjects
CTemplate
Template
Instantiate<params> Instantiate<params>
Execu2onTemplatesAbstrac2on
63
Controller
Worker Worker
TaskGraph
DataObjects DataObjects
CCTemplate
Template
Execu2onTemplatesAbstrac2on
64
Controller
Worker Worker
TaskGraph
DataObjects DataObjects
CTemplate
Template
Execu2onTemplates
TheDevilisinthedetails.
• Cachingtasksimpliessta2cbehavior:
– Templatesanddynamicscheduling?
• Reac2veschedulingchangesforloadbalancing.
• Schedulingchangesatthetaskgranularity.
– Templatesanddynamiccontrolflow?
• Needtosupportnestedloops.
• Needtosupportdatadependentbranches.
65
Execu2onTemplates
TheDevilisinthedetails.
• Cachingtasksimpliessta2cbehavior:
– Templatesanddynamicscheduling?
• Reac2veschedulingchangesforloadbalancing.
• Schedulingchangesatthetaskgranularity.
– Templatesanddynamiccontrolflow?
• Needtosupportnestedloops.
• Needtosupportdatadependentbranches.
66
Execu2onTemplatesEdits
• Ifschedulingchanges,evenslightly,thetemplatesareobsolete.
– Forexamplemigra2ngtasksamongworkers.
• Insteadofpayingthesubstan2alcostofinstallingtemplatesforeverychanges,templatesallowedit,tochangetheirstructure.
• Editsenableaddingorremovingtasksfromthetemplateandmodifyingthetemplatecontent,in-place.
• Controllerhasthegeneralviewofthetaskgraphsoitcanupdatethedependenciesproperly,neededbytheedits.
67
Execu2onTemplatesEdits
68
Controller
Worker
TaskGraph
DataObjects
CTemplate
Worker
DataObjects
Template
Migrateonetask
Execu2onTemplatesEdits
69
Controller
Worker
TaskGraph
DataObjects
CTemplate
Worker
DataObjects
Template
Edit<remove>Edit<add>
Execu2onTemplatesEdits
70
Controller
Worker
TaskGraph
DataObjects
CTemplate
Worker
DataObjects
Template
Execu2onTemplatesEdits
71
Controller
Worker
TaskGraph
DataObjects
CTemplate
Worker
DataObjects
Template
Instantiate<params> Instantiate<params>
Execu2onTemplates
TheDevilisinthedetails.
• Cachingtasksimpliessta2cbehavior:
– Templatesanddynamicscheduling?
• Reac2veschedulingchangesforloadbalancing.
• Schedulingchangesatthetaskgranularity.
– Templatesanddynamiccontrolflow?
• Needtosupportnestedloops.
• Needtosupportdatadependentbranches.
72
Execu2onTemplatesGranularity
73
TrainingData
Es,ma,onData
Parameters
ErrorEs,ma,onItera,veOp,mizer
Coeffi
cien
ts
Execu2onTemplatesGranularity
74
TrainingData
Es,ma,onData
Parameters
ErrorEs,ma,onItera,veOp,mizer
Coeffi
cien
ts
• Themoretaskscachedinthetemplatethebeder.
– Thecostoftemplateinstan2a2onisamor2zedovergreaternumberoftasks.
– Butloopunrollingonlyworksforsta2ccontrolflow.
Execu2onTemplatesGranularity
75
TrainingData
Es,ma,onData
Parameters
ErrorEs,ma,onItera,veOp,mizer
Coeffi
cien
ts
Template
Execu2onTemplatesGranularity
76
TrainingData
Es,ma,onData
Parameters
ErrorEs,ma,onItera,veOp,mizer
Coeffi
cien
ts
Template
Execu2onTemplatesGranularity
77
TrainingData
Es,ma,onData
Parameters
ErrorEs,ma,onItera,veOp,mizer
Coeffi
cien
ts
Template
• Cannotreusethetemplate(onlytwoitera2onsoftheinnerloop).
Execu2onTemplatesGranularity
78
TrainingData
Es,ma,onData
Parameters
ErrorEs,ma,onItera,veOp,mizer
Coeffi
cien
ts
• Templatescannotgobeyondabranchinthedriverprogram.
• Execu2ontemplatesoperatesatthegranularityofbasicblocks:
– Acodeblockwithsingleentryandnobranchesexceptattheend.
– Itisthebiggestblockwithoutsacrificingdynamiccontrolflow.
Execu2onTemplatesGranularity
79
TrainingData
Es,ma,onData
Parameters
ErrorEs,ma,onItera,veOp,mizer
Coeffi
cien
ts
Template1
Execu2onTemplatesGranularity
80
TrainingData
Es,ma,onData
Parameters
ErrorEs,ma,onItera,veOp,mizer
Coeffi
cien
ts
Template1 Instan2ateTemplate1
Instan2ateTemplate1
Instan2ateTemplate1
Instan2ateTemplate1
Execu2onTemplatesGranularity
81
TrainingData
Es,ma,onData
Parameters
ErrorEs,ma,onItera,veOp,mizer
Coeffi
cien
ts
Template2
Execu2onTemplatesGranularity
82
TrainingData
Es,ma,onData
Parameters
ErrorEs,ma,onItera,veOp,mizer
Coeffi
cien
ts
Template2 Instan2ateTemplate2
Execu2onTemplatesGranularity
83
Controller
Worker Worker
TaskGraph
DataObjects DataObjects
C
StartTemplate
EndTemplate EndTemplate
StartTemplate
Execu2onTemplatesGranularity
84
Controller
Worker Worker
TaskGraph
DataObjects DataObjects
CTemplate
Template
Execu2onTemplatesPatching
85
TrainingData
Es,ma,onData
Parameters
ErrorEs,ma,onItera,veOp,mizer
Coeffi
cien
ts
• Withdynamiccontrolflowabasicblockcanhavedifferententries.
• Theexecu2onstateisnotsimilarinallcircumstances.
Execu2onTemplatesPatching
86
TrainingData
Es,ma,onData
Parameters
ErrorEs,ma,onItera,veOp,mizer
Coeffi
cien
ts
Instan2ateTemplate1
Instan2ateTemplate1
Execu2onTemplatesPatching
87
TrainingData
Es,ma,onData
Parameters
ErrorEs,ma,onItera,veOp,mizer
Coeffi
cien
ts
Instan2ateTemplate1
Instan2ateTemplate1
Execu2onTemplatesPatching
88
TrainingData
Es,ma,onData
Parameters
ErrorEs,ma,onItera,veOp,mizer
Coeffi
cien
ts
Instan2ateTemplate1
Instan2ateTemplate1
Execu2onTemplatesPatching
89
TrainingData
Es,ma,onData
Parameters
ErrorEs,ma,onItera,veOp,mizer
Coeffi
cien
ts
Instan2ateTemplate1
Instan2ateTemplate1
Updatedmodelparametersonlyonthereducer
Execu2onTemplatesPatching
• Eachtemplatehasasetofprecondi<onsthatneedtobesa2sfiedbeforeitcanbeinstan2ated.
– Forexamplethesetofdataobjectsinmemory,accessedbythetaskscachedinthetemplate.
90
Execu2onTemplatesPatching
91
Controller
Worker Worker
TaskGraph
DataObjects DataObjects
CTemplate
Template
Execu2onTemplatesPatching
92
Controller
Worker Worker
TaskGraph
DataObjects DataObjects
CTemplate
Template
Precondi2ons Precondi2ons
Execu2onTemplatesPatching
• Eachtemplatehasasetofprecondi<onsthatneedtobesa2sfiedbeforeitcanbeinstan2ated.
– Forexamplethesetofdataobjectsinmemory,accessedbythetaskscachedinthetemplate.
• Workerstatemightnotmatchtheprecondi2onsofthetemplateinallcircumstances.
• Controllerpatchestheworkerstatebeforetemplateinstan2a2on,tosa2sfytheprecondi2ons.
93
Execu2onTemplatesPatching
94
Controller
Worker Worker
TaskGraph
DataObjects DataObjects
CTemplate
Template
Precondi2ons Precondi2ons
Execu2onTemplatesPatching
95
Controller
Worker Worker
TaskGraph
DataObjects DataObjects
CTemplate
Template
Precondi2ons Precondi2ons
Patch<load>
Execu2onTemplatesPatching
96
Controller
Worker Worker
TaskGraph
DataObjects DataObjects
CTemplate
Template
Precondi2ons Precondi2ons
Execu2onTemplatesPatching
97
Controller
Worker Worker
TaskGraph
DataObjects DataObjects
CTemplate
Template
Instantiate<params> Instantiate<params>
Precondi2ons Precondi2ons
Execu2onTemplatesPatching
98
Controller
Worker Worker
TaskGraph
DataObjects DataObjects
CCTemplate
Template
Precondi2ons Precondi2ons
99
Execu2onTemplatesMechanismsSummary
• Instan<a<on:spawnablockoftaskswithoutprocessingeachtaskindividuallyfromscratch.Ithelpsincreasethetaskthroughput.
• Edits:modifiesthecontentofeachtemplateatthegranularityoftasks.Itenablesfine-grained,dynamicscheduling.
• Patches:Incasethestateoftheworkerdoesnotmatchtheprecondi2onsofthetemplate.Itenablesdynamiccontrolflow.
Thistalk
• ControlPlane:theEmergingBodleneck
• DesignScopeoftheControlPlane
• Execu2onTemplates
• Nimbus:aFrameworkwithTemplates
• Evalua2on
100
101
Nimbus• Nimbusisdesignedforlowlatency,fastcomputa2onsinthecloud.
– ImplementedinC++(thecorelibraryis~35,000semicolons).
– Mutabledatamodeltoallowin-placeopera2ons.
• Nimbusembedsexecu2ontemplatesforitscontrolplane.
– Thecentralizedcontrollerallowsdynamicschedulingandresourcealloca2on.
– Execu2ontemplateshelpdeliverhightaskthroughputatscale.
• Nimbussupportstradi2onaldataanaly2csaswellasEulerianandhybridgraphicalsimula2ons;forthefirst2meinacloudframework.
– Supervised/unsupervisedlearningalgorithms,graphlibrary.
– PhysBAMlibrary(water,smoke,etc.)
NimbusControlFlow
102
Controller
Worker Worker
Controller
Worker Worker
Controller
Worker Worker
Controller
Worker Worker
• Tasksspawnothertasksforexecu2on(similartoLegion).
• Driverprogramisalineageoftasksexecu2ngontheworkers.
• MoreflexibleDAGforthetaskgraph.
• Notjustnarrowandwidedependencies.• Neededforgraphicalsimula2ons.
NimbusControllerandWorkerTemplates
103
Controller
Worker Worker Worker Worker
Instantiate
ControllerTemplates
NimbusControllerandWorkerTemplates
104
Controller
Worker Worker Worker Worker
ControllerTemplates
NimbusControllerandWorkerTemplates
105
Controller
Worker Worker Worker Worker
WorkerTemplates
Inst.Inst.Inst.Inst.
NimbusControllerandWorkerTemplates
106
Controller
Worker Worker Worker Worker
CCC
WorkerTemplates
NimbusGraphicalSimula2ons
107
Driver Program:
Partition prt = {2, 1, 1};Create(velocity, prt);
Op(exec: advect, data: velocity, read: core/ghost, write: ghost);
...
Laun
cher
PhysicalData
Mappings
Controller
BA C D
21 34 5 6
LogicalDataCopyTasks
TranslatorManager
app.so
21 3
TranslatorManager
4 5 6
app.so
PhysicalTask(advect, {1,2,3})
PhysicalTask(advect, {4,5,6})LogicalTask(advect, {A,B,C})LogicalTask(advect, {B,C,D})
CompuBngNodes
GeometricTask(advect, left_reg)GeometricTask(advect, right_reg)
ApplicaBonData
• Thegoalistoautoma2callydistributesequen2allibrarykernels.• Fourlayerdataabstrac2on(geometric,logical,physical,applica2on).• Automa2ctransla2onandcachingbetweenthedatalayers.
nimbus.stanford.edu
108
• Formoreinforma2onyoucanvisitNimbuswebsite.
Thistalk
• ControlPlane:theEmergingBodleneck
• DesignScopeoftheControlPlane
• Execu2onTemplates
• Nimbus:aFrameworkwithTemplates
• Evalua2on
109
• Controlplanetaskthroughput:– Execu2ontemplatesmatchthestrongscalingperformanceofframeworkswithdistributedcontrolplanedesign.
• Dynamicscheduling:
– Execu2ontemplatesallowslowcost,reac2veschedulinganddynamicresourcealloca2onsimilartoacentralizedframeworks.
• Dynamiccontrolflow:– Execu2ontemplatescanhandleapplica2onswithnestedloopsanddatadependentbrancheswithlowoverhead.
110
Evalua2onResultsSummary
Evalua2onStrongScalabilitywithTemplates
111
• Logis2cregressionoverdatasetofsize100GB.• Spark-optandNaiad-opt,runstasksasfastasC++implementa2on.• Nimbuscentralizedcontrollerwithexecu2ontemplatesmatchestheperformanceofNaiadwithadistributedcontrolplane.
Evalua2onReac2ve,Fine-GrainedSchedulingwithTemplates
112
Migra2ng5%ofthetasks
• Logis2cregressionoverdatasetofsize100GB,on100workers.• Naiad-optcurveissimulated(migra2onsevery5itera2ons).• Execu2ontemplatesallowlowcost,reac2veschedulingchangesthrougheditsattaskgranularity.
• Singleeditoverheadisonly41μs(inaverage).
Evalua2onDynamicResourceAlloca2onwithTemplates
113
• Logis2cregressionover100GBofdata,on50/100workers.• One-2metemplateinstalla2oncostis~40%ofdirecttaskscheduling.• Nimbusallowsdynamicresourcealloca2on.• Nimbusinstallsmul2pleversionsofatemplatedependingonresources.
x100x100 x50
Valida2ngprecondi2onsfortemplatereuse
Installingnewtemplates
Evalua2onHighTaskThroughputwithTemplates
114
• SparkandNimbusbothhavecentralizedcontroller.• Nimbustaskthroughputscalessuperlinearlywithmoreworkers.
• O(N2):moretasksandshortertasks,simultaneously.• Forataskgraphswithsinglestage:
• Instan2a2oncostis<2μspertask(500,000taskspersecond).
Evalua2onGraphicalSimula2onsDistributedinNimbus
115
Evalua2onComplexi2esofGraphicalSimula2ons
116
Levelset Posi2vePar2cles Posi2veRemovedPar2cles
Velocity Nega2vePar2cles Nega2veRemovedPar2cles
• 40differentvariables:scalar,vector,par2cle.• Triplynestedloopwithdatadependentbranches.
• 9differenttemplates(basicblocks).• 3branchesthatneedpatching.
Evalua2onSpeedupwithTemplates
117
• Canonicalwatersimula2onsunderNimbusandMPI.• Withouttemplates,Nimbusisalmost6xslowerthanMPI.• Slowdownmeanseitherlowerresolu2onormore2me/money.
Evalua2onSpeedupwithTemplates
118
• Canonicalwatersimula2onsunderNimbusandMPI.• Withouttemplates,Nimbusisalmost6xslowerthanMPI.• Slowdownmeanseitherlowerresolu2onormore2me/money.
$180
$30
Evalua2onComparisonwithHand-TunedMPI
119
• Canonicalwatersimula2onsunderNimbusandMPI.• Nimbusperformanceiswithin3-15%ofthehand-tunedMIP.• At512cores,therearemorethan1milliondis2nctdataobjectsandtaskthroughputpicksat460,000taskspersecond.
Evalua2onLoadBalancingandFaultRecoverywithTemplates
120
0 20 40 60Time (minute)
0
200
400
Itera
tion N
um
ber Enabled
Disabled
rewind from checkpointcheckpoint
checkpointone node fails
one node straggles
• Nimbuscontrolleradaptstothestragglersandworkerfailures.• Templatesareseamlesslyinstalledasschedulechanges.
Contribu2ons
• Demonstra2nghowthecontrolplaneistheemergingbo=leneckfordataanaly2csframeworks.
• Execu<onTemplatesasanabstrac2onforthecontrolplaneofcloudcompu2ngframeworks,thatenablesordersofmagnitudehighertaskthroughput,whilekeepingthefine-grained,flexiblescheduling.
• Thedesign,implementa2on,andevalua2onofNimbus,adistributedcloudcompu2ngframeworkthatembedsexecu2ontemplates.
• Ademonstra2onofasingle-coregraphicalsimula<onthatNimbusautoma<callydistributesinthecloudshowingexecu2ontemplatesinprac2ceforcomplexapplica2ons.
121
Conclusion
122
ControlPlaneDesign ExampleFramework
TaskThroughput
TaskScheduling
Centralized
MapReduce
Low DynamicHadoop
Spark
DistributedNaiad
High Sta2cTensorFlow
Centralizedw/Execu2onTemplates
Nimbus High Dynamic
123
ThankYou!