John Schulman and Arjun Singh · kubitron/courses/cs262... · Task Pipeline Specification and Scheduling

Task Pipeline Specification and Scheduling
John Schulman and Arjun Singh
CS 262A

Overview
Research pipelines, such as those often found in computer vision or computational biology, often consist of a large number of heterogeneous programs. This leads to brittle code that is difficult to maintain, while requiring significant effort to parallelize across multiple machines. Our lightweight framework executes pipelines on clusters of machines with minimal effort (and automatic dependency packaging) while scheduling tasks to minimize time until completion (including file transfer time).

Related Work
Task Pipelines
- Ruffus: Python-based, lightweight, no parallelism
- Luigi: parallelism via Hadoop, requires code changes for parallel execution
- compmake: Python only, parallelism via SGE and Multyvac
- Oozie: DAG workflows for Hadoop, heavy syntax
Scheduling
- Delay scheduling (used in Spark)
- Job shop scheduling

Possible Extensions
- Dynamically replan upon completion of every job
- Visualization of pipeline state (see Luigi)
- Integration with distributed filesystems (e.g. HDFS)
- Formulate the problem as an ILP
- Only rerun parts of the pipeline that haven't changed

Example
Pipeline stages (task counts): Filter Depth Maps (600) [Python] → Create Point Clouds (600) [Python] → Segment Point Clouds (120) [C++]

class DetectChessboard(Task):
    input = {'image': 'filename'}
    output = {'board': 'filename'}

    def run(self):
        from scipy.misc import imread
        import pycb
        board_size = self.params['board_size']
        # Read the input image, find chessboard corners, and save them
        # to the declared output file.
        img = imread(self.input['image'])
        corners, chessboards = pycb.extract_chessboards(
            img, use_corner_thresholding=False)
        pycb.save_chessboard(self.output['board'], corners,
                             chessboards, [board_size])

Software Used
- CDE (http://www.pgbovine.net/cde.html): packages up executables + all dependencies
- cloudpickle: pickle (almost) all of Python

Scheduling
When/where should each task be computed (given the DAG)?
- Minimize total time to completion (makespan).
- Consider transfer time of files, bandwidth limits, and limited computation.
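The DetectChessboard example above subclasses a Task base class that the poster does not show. The sketch below is a hypothetical reconstruction of what such a base class might look like, assuming only what the example uses: class-level input/output slot declarations, self.params, and a run() method; the constructor, the ready() check, and the CopyFile subclass are illustrative assumptions, not the authors' code.

```python
import os


class Task:
    """Minimal sketch of a pipeline task (hypothetical reconstruction).

    Subclasses declare named input/output file slots at the class level;
    instances bind those slots to concrete paths. The framework would
    collect tasks into a DAG by matching outputs to inputs.
    """
    input = {}   # slot name -> kind, e.g. {'image': 'filename'}
    output = {}  # slot name -> kind

    def __init__(self, input_paths, output_paths, params=None):
        # Bind concrete file paths to the declared slots.
        assert set(input_paths) == set(type(self).input)
        assert set(output_paths) == set(type(self).output)
        self.input = input_paths
        self.output = output_paths
        self.params = params or {}

    def ready(self):
        # A task may run once all of its input files exist on disk.
        return all(os.path.exists(p) for p in self.input.values())

    def run(self):
        raise NotImplementedError


class CopyFile(Task):
    """Trivial example subclass: copy one file to another."""
    input = {'src': 'filename'}
    output = {'dst': 'filename'}

    def run(self):
        with open(self.input['src']) as f_in, \
             open(self.output['dst'], 'w') as f_out:
            f_out.write(f_in.read())
```

With this shape, DetectChessboard's run() needs nothing beyond self.input, self.output, and self.params, matching the example's usage.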
Input:
- Dependency graph with a set of jobs. Each job has a set of input and output files.
- Estimate of how long each task + transfer takes.
Output:
- Tuples specifying computation + transportation events
- Computation event: (job, location, start time, end time)
- Transport event: (from loc., to loc., start time, end time)

Hill-Climbing Algorithm
- Let a denote an assignment of jobs to computers
- Let makespan(a) be the makespan when simulating a greedy execution of a
- a_best = [1, 1, ..., 1]; t_best = makespan(a_best)
- Repeat num_trials times:
  - a_trial = copy(a_best)
  - Pick a random index of a_trial and set it to a random value
  - t_trial = makespan(a_trial)
  - Update a_best if a_trial is better

Simulated Scheduling Experiments

Pipeline Execution Results
Runtime   Experiment
7m 46s    New pipeline, local networking, 1 machine
7m 45s    New pipeline, no networking, 1 machine
10m 45s   Old pipeline (wait for each stage to finish before next)

[Architecture diagram: user's local machine holds all code + data and distributes work to five EC2 workers.]
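The hill-climbing loop above can be written out as a short, runnable sketch. Note that the makespan function here is a deliberately simplified stand-in (independent jobs, no DAG, no transfer times): jobs on the same machine run serially and machines run in parallel. The poster's version instead simulates a greedy execution of the assignment that also accounts for file transfers.

```python
import random


def simulate_makespan(assignment, job_times, n_machines):
    """Toy makespan: sum of job times per machine, max over machines.

    (Stand-in for the authors' greedy-execution simulation, which also
    models the dependency DAG and file-transfer times.)
    """
    load = [0.0] * n_machines
    for job, machine in enumerate(assignment):
        load[machine] += job_times[job]
    return max(load)


def hill_climb(job_times, n_machines, num_trials=1000, seed=0):
    """Randomized hill climbing over job-to-machine assignments."""
    rng = random.Random(seed)
    a_best = [0] * len(job_times)   # start with everything on machine 0
    t_best = simulate_makespan(a_best, job_times, n_machines)
    for _ in range(num_trials):
        a_trial = list(a_best)
        i = rng.randrange(len(a_trial))          # mutate one job's placement
        a_trial[i] = rng.randrange(n_machines)
        t_trial = simulate_makespan(a_trial, job_times, n_machines)
        if t_trial < t_best:                     # keep strictly better moves
            a_best, t_best = a_trial, t_trial
    return a_best, t_best
```

For example, with four unit-time jobs on two machines, single-job mutations quickly drive the makespan from 4.0 down to the balanced optimum of 2.0.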

