fast, distributed computa2ons in the cloud · contribu2ons • demonstra2ng how the control plane...

123
Fast, Distributed Computa2ons in the Cloud Omid Mashayekhi Advisor: Philip Levis April 7, 2017

Upload: others

Post on 22-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Fast,DistributedComputa2onsintheCloud

OmidMashayekhi

Advisor:PhilipLevis

April7,2017

Page 2: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

22

Page 3: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

CloudFrameworks

3

SQL Streaming MachineLearning

Graph

CloudFramework

......

Cloudframeworksabstractawaythecomplexi2esofthecloudinfrastructurefromtheapplica2ondevelopers:

1.  Automa2cdistribu2on2.  Elas2cscalability3.  Mul2tenantapplica2ons4.  Loadbalancing5.  Faulttolerance

Page 4: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

CloudFrameworks

4

SQL Job

ControlPlane

......

• Jobisaninstanceoftheapplica2onrunningintheframework.• Taskistheunitofcomputa2on.• Controlplanemakesthemagichappen:

• Par22oningjobintotasks• Schedulingtasks• Loadbalancing• Faultrecovery

Task

Page 5: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

10s 1s 100ms 10ms 1ms

I/O-bounddataanaly2cs

MapReduceHadoop

TaskLength

2004

Evolu2onofCloudFrameworks

5

Page 6: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

10s 1s 100ms 10ms 1ms

I/O-bounddataanaly2cs

In-memorydataanaly2cs

MapReduceHadoop

TaskLength

SparkNaiad

2004 2012

Evolu2onofCloudFrameworks

6

Page 7: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

10s 1s 100ms 10ms 1ms

I/O-bounddataanaly2cs

In-memorydataanaly2cs

Op2mizeddataanaly2cs

MapReduceHadoop

TaskLength

SparkNaiad

Spark2.0CommonIL

C++

2004 2012 2016

Evolu2onofCloudFrameworks

7

Page 8: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

8

10s 1s 100ms 10ms 1ms

I/O-bounddataanaly2cs

In-memorydataanaly2cs

Op2mizeddataanaly2cs

MapReduceHadoop

TaskLength

SparkNaiad

Spark2.0CommonIL

C++

2004 2012 2016

Page 9: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

9

• Oneitera2onoflogis2cregressionoveradatasetofsize64MB.• Tasksimplementedefficiently,couldrun50xfaster.

TaskExecu2onTime(ms)

178(6x)

67(16x)

21(51x)

1071

C++

Java

SparkRDD

SparkDataFrame

10s 1s 100ms 10ms 1ms

I/O-bounddataanaly2cs

In-memorydataanaly2cs

Op2mizeddataanaly2cs

MapReduceHadoop

TaskLength

SparkNaiad

Spark2.0CommonIL

C++

2004 2012 2016

Page 10: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

10

Individualtasksaregebngfaster.

Butdoesitnecessarilymeanthatjobcomple2on2meisgebngshorter?

Page 11: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

CloudFrameworks

11

SQL Job

ControlPlane

......

Taskaregebngordersofmagnitudefaster.

Howaboutthejob?

Page 12: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

12

ControlPlaneTheNewBodleneck

• Logis2cregressionoveradatasetofsize100GB.• ClassicSparkusedtobeCPU-bound.

Page 13: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

13

ControlPlaneTheNewBodleneck

• Logis2cregressionoveradatasetofsize100GB.• Spark2.0isalreadycontrol-bound.

Page 14: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

14

ControlPlaneTheNewBodleneck

• Logis2cregressionoveradatasetofsize100GB.• Spark-opt:hypothe2calcasewhereSparkrunstasksasfastasC++.

Page 15: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

15

Controlplaneistheemergingbodleneckforthecloudcompu2ngframeworks.

Page 16: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

16

ControlPlaneTheNewBodleneck

• Logis2cregressionoveradatasetofsize100GB.• Nimbuswithexecu<ontemplatesscalesalmostlinearly.

Page 17: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Contribu2ons

•  Demonstra2nghowthecontrolplaneistheemergingbo=leneckfordataanaly2csframeworks.

•  Execu<onTemplatesasanabstrac2onforthecontrolplaneofcloudcompu2ngframeworks,thatenablesordersofmagnitudehighertaskthroughput,whilekeepingthefine-grained,flexiblescheduling.

•  Thedesign,implementa2on,andevalua2onofNimbus,adistributedcloudcompu2ngframeworkthatembedsexecu2ontemplates.

•  Ademonstra2onofasingle-coregraphicalsimula<onthatNimbusautoma<callydistributesinthecloudshowingexecu2ontemplatesinprac2ceforcomplexapplica2ons.

17

Page 18: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Thistalk

•  ControlPlane:theEmergingBodleneck

•  DesignScopeoftheControlPlane

•  Execu2onTemplates

•  Nimbus:aFrameworkwithTemplates

•  Evalua2on

18

Page 19: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Thistalk

•  ControlPlane:theEmergingBodleneck

•  DesignScopeoftheControlPlane

•  Execu2onTemplates

•  Nimbus:aFrameworkwithTemplates

•  Evalua2on

19

Page 20: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

CloudFrameworksDesign

•  Currently,therearetwoapproaches:1.  Centralizedcontrolmodel.

•  Controllergeneratesandassignstaskstotheworker.•  Limitedtaskthroughput,butreac2vescheduling.

2.  Distributeddataflowmodel.

•  Nodesgenerateandspawntaskslocally.•  Greatscalability,butsta2cscheduling.

20

Page 21: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

21

DesignSpectrum

CentralizedControllers DistributedControllers

Page 22: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

22

DesignSpectrum

CentralizedControllers DistributedControllersControllerTaskGraph

Loop

Worker Worker Worker Worker

• Controllercentrallyschedulesandspawnstasks.

• MapReduce• Hadoop• Spark

Page 23: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

23

DesignSpectrum

CentralizedControllers DistributedControllersControllerTaskGraph

Loop

Worker Worker Worker Worker

• Controllercentrallyschedulesandspawnstasks.

Page 24: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

24

DesignSpectrum

CentralizedControllers DistributedControllersControllerTaskGraph

Loop

Worker Worker Worker Worker

• Controllercentrallyschedulesandspawnstasks.

Page 25: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

25

DesignSpectrum

CentralizedControllers DistributedControllersControllerTaskGraph

Loop

Worker Worker Worker Worker

• Controllercouldreac2velyanddynamicallychangetheschedule.

Page 26: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

26

DesignSpectrum

CentralizedControllers DistributedControllersControllerTaskGraph

Loop

Worker Worker Worker Worker

• Controllercouldreac2velyanddynamicallychangetheschedule.

Page 27: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

27

DesignSpectrum

CentralizedControllers DistributedControllersControllerTaskGraph

Loop

Worker Worker Worker Worker

• Controllercouldreac2velyanddynamicallychangetheschedule.

Page 28: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

28

DesignSpectrum

CentralizedControllers DistributedControllersControllerTaskGraph

Loop

Worker Worker Worker Worker

• Butcontrollerbodlenecksatscale.

Page 29: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

29

DesignSpectrum

CentralizedControllers DistributedControllersControllerTaskGraph

Loop

Worker Worker Worker Worker

• Butcontrollerbodlenecksatscale.

Page 30: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

30

DesignSpectrum

CentralizedControllers DistributedControllersControllerTaskGraph

Loop

Worker Worker Worker Worker

• Butcontrollerbodlenecksatscale.

Page 31: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

31

DesignSpectrum

CentralizedControllers DistributedControllersControllerTaskGraph

Loop

Worker Worker Worker Worker

• Butcontrollerbodlenecksatscale.

Page 32: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

32

DesignSpectrum

CentralizedControllers DistributedControllersControllerTaskGraph

Loop

Worker Worker Worker Worker

• Butcontrollerbodlenecksatscale.

Page 33: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

33

DesignSpectrum

CentralizedControllers DistributedControllers

Workersfallidle

• Logis2cregressionoveradatasetofsize100GBinSpark2.0MLlib.• ControlPlanebo=lenecksatscale,genera2ngandspawningtasks.

Page 34: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

34

DesignSpectrum

CentralizedControllers DistributedControllers

Controller

Worker

Loop

Controller

Worker

Loop

Controller

WorkerLoop

Controller

Worker

Loop

Synchroniza2on

• Eachnodegeneratesandexecutestaskslocally.

• Naiad• TensorFlow

Page 35: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

35

DesignSpectrum

CentralizedControllers DistributedControllers

Controller

Worker

Controller

Worker

Loop

Controller

Worker

Loop

Controller

Worker

Loop

Controller

Worker

Loop

Controller

Worker

Loop

Controller

Worker

Loop

Controller

Worker

Loop

Synchroniza2on

Loop

• Thedesignscaleswellasthereisnosinglebodleneck.

Page 36: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

36

DesignSpectrum

CentralizedControllers DistributedControllers

Controller

Worker

Loop

Controller

Worker

Loop

Controller

WorkerLoop

Controller

Worker

Loop

Synchroniza2on

• But,theschedulingissta2c.• Theprogressspeedisboundtothespeedoftheslowestnode.• Anychangerequiresstoppingallnodesandinstallingnewdataflow.

Straggling

Page 37: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

37

DesignSpectrum

CentralizedControllers DistributedControllers

Synchroniza2on

• Inprac2cethestragglermi2ga2onisonlyproac<ve:• Avoidingstragglersbyme2culousengineeringwork.• Launchingbackupworkers(atleastdoublingtheresources).

Controller

BackupWorker

Loop

Controller

BackupWorker

Loop

Controller

BackupWorker

Loop

Controller

BackupWorker

Loop

Loop

Controller

Loop

Controller

Loop

Controller

Loop

Controller

Page 38: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

38

DesignSpectrum

CentralizedControllers DistributedControllers

Synchroniza2on

• Inprac2cethestragglermi2ga2onisonlyproac<ve:• Avoidingstragglersbyme2culousengineeringwork.• Launchingbackupworkers(atleastdoublingtheresources).

Controller

BackupWorker

Loop

Controller

BackupWorker

Loop

Controller

BackupWorker

Loop

Controller

BackupWorker

Loop

Loop

Controller

Loop

Controller

Loop

Controller

Loop

Controller

Straggling

Page 39: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

39

DesignSpectrum

CentralizedControllers DistributedControllers

Synchroniza2on

• Inprac2cethestragglermi2ga2onisonlyproac<ve:• Avoidingstragglersbyme2culousengineeringwork.• Launchingbackupworkers(atleastdoublingtheresources).

Controller

BackupWorker

Loop

Controller

BackupWorker

Loop

Controller

BackupWorker

Loop

Loop

Controller

Loop

Controller

Loop

Controller

Loop

Controller

BackupWorker

Loop

Controller

Page 40: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

40

DesignSpaceSummary

ControlPlaneDesign

ExampleFramework

TaskThroughput

TaskScheduling

Centralized

MapReduce

Low DynamicHadoop

Spark

DistributedNaiad

High Sta2cTensorFlow

Page 41: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

41

DesignSpaceSummary

ControlPlaneDesign

ExampleFramework

TaskThroughput

TaskScheduling

Centralized

MapReduce

Low DynamicHadoop

Spark

DistributedNaiad

High Sta2cTensorFlow

Wewouldliketohavethebestofbothworlds:• Hightaskthroughputforfastcomputa2ons.• Dynamic,fine-grainedschedulingdecisions.

Page 42: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Repe22vePaderns

•  Advanceddataanaly2csareitera2veinnature.–  Machinelearning,graphprocessing,imagerecogni2on,etc.

•  Thisresultsinrepe22vepadernsinthecontrolplane.–  Similartasksexecutewithminordifferences.

42

Page 43: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Repe22vePaderns

•  Advanceddataanaly2csareitera2veinnature.–  Machinelearning,graphprocessing,imagerecogni2on,etc.

•  Thisresultsinrepe22vepadernsinthecontrolplane.–  Similartasksexecutewithminordifferences.

43

while(error>threshold_e){while(gradient>threshold_g){ //Optimizationcodeblock gradient=Gradient(tdata,coeff,param) coeff+=gradient}//Estimationcodeblockerror=Estimate(edata,coeff,param)param=update_model(param,error)

}

TrainingData

Es,ma,onData

Parameters

ErrorEs,ma,onItera,veOp,mizer

Coeffi

cien

ts

Page 44: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Thistalk

•  ControlPlane:theEmergingBodleneck

•  DesignScopeoftheControlPlane

•  Execu2onTemplates

•  Nimbus:aFrameworkwithTemplates

•  Evalua2on

44

Page 45: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplates

•  Tasksarecachedasparameterizableblocksonnodes.

•  Insteadofassigningthetasksfromscratch,templatesareinstan<atedbyfillinginonlychangingparameters.

45

Task idData listDep. listFunctionParameter

Task idData listDep. listFunctionParameter

Task idData listDep. listFunctionParameter

Page 46: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplates

•  Tasksarecachedasparameterizableblocksonnodes.

•  Insteadofassigningthetasksfromscratch,templatesareinstan<atedbyfillinginonlychangingparameters.

46

Task idData listDep. listFunctionParameter

Task idData listDep. listFunctionParameter

Task idData listDep. listFunctionParameter

Load NewTask ids

Parameters

T1P1

T2P2

T3P3

Page 47: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

47

Execu2onTemplatesMechanismsSummary

•  Instan<a<on:spawnablockoftaskswithoutprocessingeachtaskindividuallyfromscratch.Ithelpsincreasethetaskthroughput.

•  Edits:modifiesthecontentofeachtemplateatthegranularityoftasks.Itenablesfine-grained,dynamicscheduling.

•  Patches:Incasethestateoftheworkerdoesnotmatchtheprecondi2onsofthetemplate.Itenablesdynamiccontrolflow.

Page 48: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onModel

48

ControllerDriver Program

Data

Map

Reduce

Dataflo

w

Worker Worker

Page 49: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onModel

49

ControllerDriver Program

Data

Map

Reduce

Dataflo

w

Worker Worker

TaskGraph

Page 50: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onModel

50

ControllerDriver Program

Data

Map

Reduce

Dataflo

w

Worker Worker

DataObjects DataObjects

TaskGraph

Page 51: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onModel

51

ControllerDriver Program

Data

Map

Reduce

Dataflo

w

Worker Worker

DataObjects DataObjects

CTaskGraph

Page 52: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onModel

52

ControllerDriver Program

Data

Map

Reduce

Dataflo

w

Worker Worker

DataObjects DataObjects

C

Task idData listDep. listFunctionParameter

TaskGraph

Page 53: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onModel

53

ControllerDriver Program

Data

Map

Reduce

Dataflo

w

Worker Worker

DataObjects DataObjects

DataExchange C

Task idData listDep. listFunctionParameter

TaskGraph

Page 54: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Repe22vePaderns

54

Controller

Worker Worker

DataObjects DataObjects

TaskGraph

while(error>threshold_e){while(gradient>threshold_g){ //Optimizationcodeblock gradient=Gradient(tdata,coeff,param) coeff+=gradient}//Estimationcodeblockerror=Estimate(edata,coeff,param)param=update_model(param,error)

}

Driver Program

Page 55: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Repe22vePaderns

55

Controller

Worker Worker

DataObjects DataObjects

C

Task idData listDep. listFunctionParameter

TaskGraph

while(error>threshold_e){while(gradient>threshold_g){ //Optimizationcodeblock gradient=Gradient(tdata,coeff,param) coeff+=gradient}//Estimationcodeblockerror=Estimate(edata,coeff,param)param=update_model(param,error)

}

Driver Program

Page 56: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Repe22vePaderns

56

Controller

Worker Worker

DataObjects DataObjects

TaskGraph

while(error>threshold_e){while(gradient>threshold_g){ //Optimizationcodeblock gradient=Gradient(tdata,coeff,param) coeff+=gradient}//Estimationcodeblockerror=Estimate(edata,coeff,param)param=update_model(param,error)

}

Driver Program

DataExchange C

Task idData listDep. listFunctionParameter

Page 57: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Repe22vePaderns

57

Controller

Worker Worker

DataObjects DataObjects

C

Task idData listDep. listFunctionParameter

TaskGraph

while(error>threshold_e){while(gradient>threshold_g){ //Optimizationcodeblock gradient=Gradient(tdata,coeff,param) coeff+=gradient}//Estimationcodeblockerror=Estimate(edata,coeff,param)param=update_model(param,error)

}

Driver Program

Page 58: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Repe22vePaderns

58

Controller

Worker Worker

DataObjects DataObjects

TaskGraph

while(error>threshold_e){while(gradient>threshold_g){ //Optimizationcodeblock gradient=Gradient(tdata,coeff,param) coeff+=gradient}//Estimationcodeblockerror=Estimate(edata,coeff,param)param=update_model(param,error)

}

Driver Program

DataExchange C

Task idData listDep. listFunctionParameter

Page 59: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesAbstrac2on

59

Controller

Worker Worker

CTaskGraph

DataObjects DataObjects

Page 60: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesAbstrac2on

60

Controller

Worker Worker

TaskGraph

DataObjects DataObjects

CCTemplate

Template

Page 61: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesAbstrac2on

61

Controller

Worker Worker

TaskGraph

DataObjects DataObjects

CTemplate

Template

Page 62: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesAbstrac2on

62

Controller

Worker Worker

TaskGraph

DataObjects DataObjects

CTemplate

Template

Instantiate<params> Instantiate<params>

Page 63: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesAbstrac2on

63

Controller

Worker Worker

TaskGraph

DataObjects DataObjects

CCTemplate

Template

Page 64: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesAbstrac2on

64

Controller

Worker Worker

TaskGraph

DataObjects DataObjects

CTemplate

Template

Page 65: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplates

TheDevilisinthedetails.

• Cachingtasksimpliessta2cbehavior:

–  Templatesanddynamicscheduling?

•  Reac2veschedulingchangesforloadbalancing.

•  Schedulingchangesatthetaskgranularity.

–  Templatesanddynamiccontrolflow?

•  Needtosupportnestedloops.

•  Needtosupportdatadependentbranches.

65

Page 66: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplates

TheDevilisinthedetails.

• Cachingtasksimpliessta2cbehavior:

–  Templatesanddynamicscheduling?

•  Reac2veschedulingchangesforloadbalancing.

•  Schedulingchangesatthetaskgranularity.

–  Templatesanddynamiccontrolflow?

•  Needtosupportnestedloops.

•  Needtosupportdatadependentbranches.

66

Page 67: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesEdits

• Ifschedulingchanges,evenslightly,thetemplatesareobsolete.

– Forexamplemigra2ngtasksamongworkers.

• Insteadofpayingthesubstan2alcostofinstallingtemplatesforeverychanges,templatesallowedit,tochangetheirstructure.

• Editsenableaddingorremovingtasksfromthetemplateandmodifyingthetemplatecontent,in-place.

• Controllerhasthegeneralviewofthetaskgraphsoitcanupdatethedependenciesproperly,neededbytheedits.

67

Page 68: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesEdits

68

Controller

Worker

TaskGraph

DataObjects

CTemplate

Worker

DataObjects

Template

Migrateonetask

Page 69: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesEdits

69

Controller

Worker

TaskGraph

DataObjects

CTemplate

Worker

DataObjects

Template

Edit<remove>Edit<add>

Page 70: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesEdits

70

Controller

Worker

TaskGraph

DataObjects

CTemplate

Worker

DataObjects

Template

Page 71: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesEdits

71

Controller

Worker

TaskGraph

DataObjects

CTemplate

Worker

DataObjects

Template

Instantiate<params> Instantiate<params>

Page 72: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplates

TheDevilisinthedetails.

• Cachingtasksimpliessta2cbehavior:

–  Templatesanddynamicscheduling?

•  Reac2veschedulingchangesforloadbalancing.

•  Schedulingchangesatthetaskgranularity.

–  Templatesanddynamiccontrolflow?

•  Needtosupportnestedloops.

•  Needtosupportdatadependentbranches.

72

Page 73: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesGranularity

73

TrainingData

Es,ma,onData

Parameters

ErrorEs,ma,onItera,veOp,mizer

Coeffi

cien

ts

Page 74: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesGranularity

74

TrainingData

Es,ma,onData

Parameters

ErrorEs,ma,onItera,veOp,mizer

Coeffi

cien

ts

• Themoretaskscachedinthetemplatethebeder.

– Thecostoftemplateinstan2a2onisamor2zedovergreaternumberoftasks.

– Butloopunrollingonlyworksforsta2ccontrolflow.

Page 75: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesGranularity

75

TrainingData

Es,ma,onData

Parameters

ErrorEs,ma,onItera,veOp,mizer

Coeffi

cien

ts

Template

Page 76: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesGranularity

76

TrainingData

Es,ma,onData

Parameters

ErrorEs,ma,onItera,veOp,mizer

Coeffi

cien

ts

Template

Page 77: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesGranularity

77

TrainingData

Es,ma,onData

Parameters

ErrorEs,ma,onItera,veOp,mizer

Coeffi

cien

ts

Template

• Cannotreusethetemplate(onlytwoitera2onsoftheinnerloop).

Page 78: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesGranularity

78

TrainingData

Es,ma,onData

Parameters

ErrorEs,ma,onItera,veOp,mizer

Coeffi

cien

ts

• Templatescannotgobeyondabranchinthedriverprogram.

• Execu2ontemplatesoperatesatthegranularityofbasicblocks:

–  Acodeblockwithsingleentryandnobranchesexceptattheend.

–  Itisthebiggestblockwithoutsacrificingdynamiccontrolflow.

Page 79: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesGranularity

79

TrainingData

Es,ma,onData

Parameters

ErrorEs,ma,onItera,veOp,mizer

Coeffi

cien

ts

Template1

Page 80: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesGranularity

80

TrainingData

Es,ma,onData

Parameters

ErrorEs,ma,onItera,veOp,mizer

Coeffi

cien

ts

Template1 Instan2ateTemplate1

Instan2ateTemplate1

Instan2ateTemplate1

Instan2ateTemplate1

Page 81: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesGranularity

81

TrainingData

Es,ma,onData

Parameters

ErrorEs,ma,onItera,veOp,mizer

Coeffi

cien

ts

Template2

Page 82: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesGranularity

82

TrainingData

Es,ma,onData

Parameters

ErrorEs,ma,onItera,veOp,mizer

Coeffi

cien

ts

Template2 Instan2ateTemplate2

Page 83: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesGranularity

83

Controller

Worker Worker

TaskGraph

DataObjects DataObjects

C

StartTemplate

EndTemplate EndTemplate

StartTemplate

Page 84: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesGranularity

84

Controller

Worker Worker

TaskGraph

DataObjects DataObjects

CTemplate

Template

Page 85: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesPatching

85

TrainingData

Es,ma,onData

Parameters

ErrorEs,ma,onItera,veOp,mizer

Coeffi

cien

ts

• Withdynamiccontrolflowabasicblockcanhavedifferententries.

• Theexecu2onstateisnotsimilarinallcircumstances.

Page 86: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesPatching

86

TrainingData

Es,ma,onData

Parameters

ErrorEs,ma,onItera,veOp,mizer

Coeffi

cien

ts

Instan2ateTemplate1

Instan2ateTemplate1

Page 87: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesPatching

87

TrainingData

Es,ma,onData

Parameters

ErrorEs,ma,onItera,veOp,mizer

Coeffi

cien

ts

Instan2ateTemplate1

Instan2ateTemplate1

Page 88: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesPatching

88

TrainingData

Es,ma,onData

Parameters

ErrorEs,ma,onItera,veOp,mizer

Coeffi

cien

ts

Instan2ateTemplate1

Instan2ateTemplate1

Page 89: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesPatching

89

TrainingData

Es,ma,onData

Parameters

ErrorEs,ma,onItera,veOp,mizer

Coeffi

cien

ts

Instan2ateTemplate1

Instan2ateTemplate1

Updatedmodelparametersonlyonthereducer

Page 90: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesPatching

• Eachtemplatehasasetofprecondi<onsthatneedtobesa2sfiedbeforeitcanbeinstan2ated.

– Forexamplethesetofdataobjectsinmemory,accessedbythetaskscachedinthetemplate.

90

Page 91: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesPatching

91

Controller

Worker Worker

TaskGraph

DataObjects DataObjects

CTemplate

Template

Page 92: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesPatching

92

Controller

Worker Worker

TaskGraph

DataObjects DataObjects

CTemplate

Template

Precondi2ons Precondi2ons

Page 93: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesPatching

• Eachtemplatehasasetofprecondi<onsthatneedtobesa2sfiedbeforeitcanbeinstan2ated.

– Forexamplethesetofdataobjectsinmemory,accessedbythetaskscachedinthetemplate.

• Workerstatemightnotmatchtheprecondi2onsofthetemplateinallcircumstances.

• Controllerpatchestheworkerstatebeforetemplateinstan2a2on,tosa2sfytheprecondi2ons.

93

Page 94: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesPatching

94

Controller

Worker Worker

TaskGraph

DataObjects DataObjects

CTemplate

Template

Precondi2ons Precondi2ons

Page 95: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesPatching

95

Controller

Worker Worker

TaskGraph

DataObjects DataObjects

CTemplate

Template

Precondi2ons Precondi2ons

Patch<load>

Page 96: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesPatching

96

Controller

Worker Worker

TaskGraph

DataObjects DataObjects

CTemplate

Template

Precondi2ons Precondi2ons

Page 97: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesPatching

97

Controller

Worker Worker

TaskGraph

DataObjects DataObjects

CTemplate

Template

Instantiate<params> Instantiate<params>

Precondi2ons Precondi2ons

Page 98: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Execu2onTemplatesPatching

98

Controller

Worker Worker

TaskGraph

DataObjects DataObjects

CCTemplate

Template

Precondi2ons Precondi2ons

Page 99: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

99

Execu2onTemplatesMechanismsSummary

•  Instan<a<on:spawnablockoftaskswithoutprocessingeachtaskindividuallyfromscratch.Ithelpsincreasethetaskthroughput.

•  Edits:modifiesthecontentofeachtemplateatthegranularityoftasks.Itenablesfine-grained,dynamicscheduling.

•  Patches:Incasethestateoftheworkerdoesnotmatchtheprecondi2onsofthetemplate.Itenablesdynamiccontrolflow.

Page 100: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Thistalk

•  ControlPlane:theEmergingBodleneck

•  DesignScopeoftheControlPlane

•  Execu2onTemplates

•  Nimbus:aFrameworkwithTemplates

•  Evalua2on

100

Page 101: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

101

Nimbus•  Nimbusisdesignedforlowlatency,fastcomputa2onsinthecloud.

–  ImplementedinC++(thecorelibraryis~35,000semicolons).

–  Mutabledatamodeltoallowin-placeopera2ons.

•  Nimbusembedsexecu2ontemplatesforitscontrolplane.

–  Thecentralizedcontrollerallowsdynamicschedulingandresourcealloca2on.

–  Execu2ontemplateshelpdeliverhightaskthroughputatscale.

•  Nimbussupportstradi2onaldataanaly2csaswellasEulerianandhybridgraphicalsimula2ons;forthefirst2meinacloudframework.

–  Supervised/unsupervisedlearningalgorithms,graphlibrary.

–  PhysBAMlibrary(water,smoke,etc.)

Page 102: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

NimbusControlFlow

102

Controller

Worker Worker

Controller

Worker Worker

Controller

Worker Worker

Controller

Worker Worker

• Tasksspawnothertasksforexecu2on(similartoLegion).

• Driverprogramisalineageoftasksexecu2ngontheworkers.

• MoreflexibleDAGforthetaskgraph.

• Notjustnarrowandwidedependencies.• Neededforgraphicalsimula2ons.

Page 103: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

NimbusControllerandWorkerTemplates

103

Controller

Worker Worker Worker Worker

Instantiate

ControllerTemplates

Page 104: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

NimbusControllerandWorkerTemplates

104

Controller

Worker Worker Worker Worker

ControllerTemplates

Page 105: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

NimbusControllerandWorkerTemplates

105

Controller

Worker Worker Worker Worker

WorkerTemplates

Inst.Inst.Inst.Inst.

Page 106: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

NimbusControllerandWorkerTemplates

106

Controller

Worker Worker Worker Worker

CCC

WorkerTemplates

Page 107: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

NimbusGraphicalSimula2ons

107

Driver Program:

Partition prt = {2, 1, 1};Create(velocity, prt);

Op(exec: advect, data: velocity, read: core/ghost, write: ghost);

...

Laun

cher

PhysicalData

Mappings

Controller

BA C D

21 34 5 6

LogicalDataCopyTasks

TranslatorManager

app.so

21 3

TranslatorManager

4 5 6

app.so

PhysicalTask(advect, {1,2,3})

PhysicalTask(advect, {4,5,6})LogicalTask(advect, {A,B,C})LogicalTask(advect, {B,C,D})

CompuBngNodes

GeometricTask(advect, left_reg)GeometricTask(advect, right_reg)

ApplicaBonData

• Thegoalistoautoma2callydistributesequen2allibrarykernels.• Fourlayerdataabstrac2on(geometric,logical,physical,applica2on).• Automa2ctransla2onandcachingbetweenthedatalayers.

Page 108: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

nimbus.stanford.edu

108

• Formoreinforma2onyoucanvisitNimbuswebsite.

Page 109: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Thistalk

•  ControlPlane:theEmergingBodleneck

•  DesignScopeoftheControlPlane

•  Execu2onTemplates

•  Nimbus:aFrameworkwithTemplates

•  Evalua2on

109

Page 110: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

• Controlplanetaskthroughput:– Execu2ontemplatesmatchthestrongscalingperformanceofframeworkswithdistributedcontrolplanedesign.

• Dynamicscheduling:

– Execu2ontemplatesallowslowcost,reac2veschedulinganddynamicresourcealloca2onsimilartoacentralizedframeworks.

• Dynamiccontrolflow:– Execu2ontemplatescanhandleapplica2onswithnestedloopsanddatadependentbrancheswithlowoverhead.

110

Evalua2onResultsSummary

Page 111: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Evalua2onStrongScalabilitywithTemplates

111

• Logis2cregressionoverdatasetofsize100GB.• Spark-optandNaiad-opt,runstasksasfastasC++implementa2on.• Nimbuscentralizedcontrollerwithexecu2ontemplatesmatchestheperformanceofNaiadwithadistributedcontrolplane.

Page 112: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Evalua2onReac2ve,Fine-GrainedSchedulingwithTemplates

112

Migra2ng5%ofthetasks

• Logis2cregressionoverdatasetofsize100GB,on100workers.• Naiad-optcurveissimulated(migra2onsevery5itera2ons).• Execu2ontemplatesallowlowcost,reac2veschedulingchangesthrougheditsattaskgranularity.

• Singleeditoverheadisonly41μs(inaverage).

Page 113: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Evalua2onDynamicResourceAlloca2onwithTemplates

113

• Logis2cregressionover100GBofdata,on50/100workers.• One-2metemplateinstalla2oncostis~40%ofdirecttaskscheduling.• Nimbusallowsdynamicresourcealloca2on.• Nimbusinstallsmul2pleversionsofatemplatedependingonresources.

x100x100 x50

Valida2ngprecondi2onsfortemplatereuse

Installingnewtemplates

Page 114: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Evalua2onHighTaskThroughputwithTemplates

114

• SparkandNimbusbothhavecentralizedcontroller.• Nimbustaskthroughputscalessuperlinearlywithmoreworkers.

• O(N2):moretasksandshortertasks,simultaneously.• Forataskgraphswithsinglestage:

• Instan2a2oncostis<2μspertask(500,000taskspersecond).

Page 115: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Evalua2onGraphicalSimula2onsDistributedinNimbus

115

Page 116: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Evalua2onComplexi2esofGraphicalSimula2ons

116

Levelset Posi2vePar2cles Posi2veRemovedPar2cles

Velocity Nega2vePar2cles Nega2veRemovedPar2cles

• 40differentvariables:scalar,vector,par2cle.• Triplynestedloopwithdatadependentbranches.

• 9differenttemplates(basicblocks).• 3branchesthatneedpatching.

Page 117: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Evalua2onSpeedupwithTemplates

117

• Canonicalwatersimula2onsunderNimbusandMPI.• Withouttemplates,Nimbusisalmost6xslowerthanMPI.• Slowdownmeanseitherlowerresolu2onormore2me/money.

Page 118: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Evalua2onSpeedupwithTemplates

118

• Canonicalwatersimula2onsunderNimbusandMPI.• Withouttemplates,Nimbusisalmost6xslowerthanMPI.• Slowdownmeanseitherlowerresolu2onormore2me/money.

$180

$30

Page 119: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Evalua2onComparisonwithHand-TunedMPI

119

• Canonicalwatersimula2onsunderNimbusandMPI.• Nimbusperformanceiswithin3-15%ofthehand-tunedMIP.• At512cores,therearemorethan1milliondis2nctdataobjectsandtaskthroughputpicksat460,000taskspersecond.

Page 120: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Evalua2onLoadBalancingandFaultRecoverywithTemplates

120

0 20 40 60Time (minute)

0

200

400

Itera

tion N

um

ber Enabled

Disabled

rewind from checkpointcheckpoint

checkpointone node fails

one node straggles

• Nimbuscontrolleradaptstothestragglersandworkerfailures.• Templatesareseamlesslyinstalledasschedulechanges.

Page 121: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Contribu2ons

•  Demonstra2nghowthecontrolplaneistheemergingbo=leneckfordataanaly2csframeworks.

•  Execu<onTemplatesasanabstrac2onforthecontrolplaneofcloudcompu2ngframeworks,thatenablesordersofmagnitudehighertaskthroughput,whilekeepingthefine-grained,flexiblescheduling.

•  Thedesign,implementa2on,andevalua2onofNimbus,adistributedcloudcompu2ngframeworkthatembedsexecu2ontemplates.

•  Ademonstra2onofasingle-coregraphicalsimula<onthatNimbusautoma<callydistributesinthecloudshowingexecu2ontemplatesinprac2ceforcomplexapplica2ons.

121

Page 122: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

Conclusion

122

ControlPlaneDesign ExampleFramework

TaskThroughput

TaskScheduling

Centralized

MapReduce

Low DynamicHadoop

Spark

DistributedNaiad

High Sta2cTensorFlow

Centralizedw/Execu2onTemplates

Nimbus High Dynamic

Page 123: Fast, Distributed Computa2ons in the Cloud · Contribu2ons • Demonstra2ng how the control plane is the emerging bo=leneck for data analy2cs frameworks. • Execu

123

ThankYou!