Thursday, February 20, 2014

Beating Gridlock: Parallel Programming with SAS® Grid Computing and SAS/CONNECT®

Jack Fuller, Experis Business Analytics

Beating Gridlock – MISUG

Experis | Thursday, February 20, 2014 2

Introduction – The Problem

• A SAS program is taking a very long time to execute

• The program is constructed of multiple, independent subtasks

• Simulation programs frequently exhibit this behavior

Introduction – The Solution

• Split the program into a driver program and a set of subprograms which run in parallel

[Figure labels: Elapsed Real Time, Total CPU Time]

Overview

• When to use parallel processing

• How to use parallel processing

• Points to consider

When to Use Parallel Processing

Independent Subtasks

[Diagram: Initialization, followed by Independent Subtask 1, Independent Subtask 2 and Independent Subtask 3, followed by Finalization]

Independent Subtasks

[Diagram: the Initialization / Independent Subtask 1-3 / Finalization layout, annotated with terms from the SAS/CONNECT 9.3 User's Guide: Server, Server, Client, Client, Client]

Independent Subtasks

[Diagram: the Initialization / Independent Subtask 1-3 / Finalization layout, annotated with terms from Grid Computing in SAS 9.3: Task, Task, Subtask, Subtask, Subtask]

Types of Concurrency

• Data Parallelism - subtasks work on different pieces of data
  – Subtask 1 loads observations 1-10,000
  – Subtask 2 loads observations 10,001-20,000
  – …

• Task Parallelism - subtasks perform different functions
  – Subtask 1 processes demographics data
  – Subtask 2 processes vital signs data
  – …
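
As a rough sketch of the data-parallel split described above, each subtask could read its own slice of a shared data set with the FIRSTOBS= and OBS= data set options. The libref `shared`, the data set `pt`, and the output names `part1` and `part2` are assumptions made for illustration; in a real program each slice would be processed inside its own RSUBMIT block.

```sas
/* Illustrative setup: SHARED points at the WORK directory (an assumption) */
libname shared "%sysfunc(pathname(work))";

/* Illustrative input: 20,000 observations */
data shared.pt;
  do pt = 1 to 20000;
    output;
  end;
run;

/* Slice for subtask 1: observations 1 through 10,000 */
data shared.part1;
  set shared.pt(firstobs=1 obs=10000);
run;

/* Slice for subtask 2: observations 10,001 through 20,000.     */
/* OBS= names the last observation to read, not a row count.    */
data shared.part2;
  set shared.pt(firstobs=10001 obs=20000);
run;
```

Task parallelism, by contrast, would give every subtask the full input and have each one compute a different set of variables.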

Amdahl’s Law

Describes the optimal increase in speed gained from converting a serial process to a parallel process

Speedup(P) = 1 / (α + (1 − α) / P)

• P is the number of processors
• α is the fraction of work that is not parallelized

Amdahl’s Law as P → ∞

Speedup(P) → 1 / α

• P is the number of processors
• α is the fraction of work that is not parallelized

As the number of processors (P) increases, the increase in speed is constrained by the fraction of work that is not parallelized (α)
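
As a quick worked example, using the standard form of Amdahl's law and assuming α = 0.1 (a value chosen for illustration, not a figure from the talk):

```latex
\[
S(P) = \frac{1}{\alpha + \dfrac{1-\alpha}{P}}, \qquad \alpha = 0.1
\]
\[
S(5) = \frac{1}{0.1 + 0.18} \approx 3.57, \quad
S(10) = \frac{1}{0.1 + 0.09} \approx 5.26, \quad
S(15) = \frac{1}{0.1 + 0.06} = 6.25, \quad
\lim_{P \to \infty} S(P) = \frac{1}{\alpha} = 10
\]
```

Under that assumption, tripling the processors from 5 to 15 yields less than double the speedup, and no number of processors can do better than a factor of 10.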

Graphical Representation of Amdahl's Law
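
A minimal sketch for drawing curves of this kind in SAS, assuming the standard form of Amdahl's law. The data set name `amdahl`, the α values and the range of P are illustrative assumptions.

```sas
/* Tabulate Amdahl's law: speedup = 1 / (alpha + (1 - alpha) / P) */
/* The alpha values and the range of P are illustrative choices.  */
data amdahl;
  do alpha = 0.05, 0.10, 0.25;
    do p = 1 to 32;
      speedup = 1 / (alpha + (1 - alpha) / p);
      output;
    end;
  end;
run;

/* One curve per alpha; each flattens toward 1 / alpha as P grows */
proc sgplot data=amdahl;
  series x=p y=speedup / group=alpha;
  xaxis label="Number of processors (P)";
  yaxis label="Speedup";
run;
```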

How to Use Parallel Processing

Our Problem – Creating Test Data

    data test;   /* data set name added for illustration */
      do pt = 1 to 10000;
        if uniform(0) > .40 then do;
          gender = 'M';
          /* Generate test data */
        end;
        else do;
          gender = 'F';
          /* Generate test data */
        end;
        output;
      end;
    run;

How do we want to define our concurrency?

• Data Parallelism
• Task Parallelism

Let’s choose task parallelism by SDTM domain

Example – Initialization

    /* Passed to each subtask */
    %let seed = 0;
    %let numPt = 1000;

    /* Enable the grid service */
    %let rc=%sysfunc(grdsvc_enable(_all_, resource=SASApp));

    /* Allow the program to sign on without user interaction */
    options autosignon;

    /* Point a LIBREF to the task's work library */
    libname shared "%sysfunc(pathname(work))";

Example – Create the Initial Data

    data pt;
      do pt=1 to &numPt;
        gender = ifc(uniform(&seed) > .45, 'M', 'F');
        output;
      end;
    run;

Our input data lives in the task’s work directory

    PT  GENDER
    1   …
    2   …
    3   …
    …   …

Example – Demographics Subtask

    %syslput seed=&seed / remote=task1;
    rsubmit task1 wait=no inheritlib=(shared);
      data shared.task1(keep=pt ht wt);
        set shared.pt;
        if gender = 'M' then do;
          /* Compute statistics */
        end;
        else do;
          /* Compute statistics */
        end;
      run;
    endrsubmit;

• %syslput: create a macro variable in the subtask and initialize it to &seed
• rsubmit … endrsubmit: delineate the code which will run in the subtask
• task1: name of the subtask (NOTE: subtask names are limited to 8 chars)
• wait=no: specify asynchronous execution
• inheritlib=(shared): give the subtask access to the specified library

    PT  HT  WT
    1   …   …
    2   …   …
    3   …   …
    …   …   …

Example – Labs Subtask

    %syslput seed=&seed / remote=task2;
    rsubmit task2 wait=no inheritlib=(shared);
      data shared.task2(keep=pt wbc rbc);
        set shared.pt;
        /* Compute statistics */
      run;
    endrsubmit;

    PT  WBC  RBC
    1   …    …
    2   …    …
    3   …    …
    …   …    …

Example – Vital Signs Subtask

    %syslput seed=&seed / remote=task3;
    rsubmit task3 wait=no inheritlib=(shared);
      data shared.task3(keep=pt syst_bp diast_bp);
        set shared.pt;
        if gender = 'M' then do;
          /* Compute statistics */
        end;
        else do;
          /* Compute statistics */
        end;
      run;
    endrsubmit;

    PT  SYST_BP  DIAST_BP
    1   …        …
    2   …        …
    3   …        …
    …   …        …

Example – Finalization

    /* Wait for subtasks */
    waitfor _all_ task1 task2 task3;

    /* Notice the use of the task work library */
    data final;
      merge pt task:;
      by pt;
    run;

    /* Sign off all subtasks */
    signoff _all_;

    PT  GENDER  HT  WT  WBC  RBC  SYST_BP  DIAST_BP
    1   …       …   …   …    …    …        …
    2   …       …   …   …    …    …        …
    3   …       …   …   …    …    …        …
    …   …       …   …   …    …    …        …
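
WAITFOR returns once the listed subtasks finish, but on its own it does not report whether each RSUBMIT actually completed. One way to check, sketched here under assumptions, is the CMACVAR= option of RSUBMIT together with the RGET statement. The macro variable name `status1` and the trivial DATA step are hypothetical; the setup lines repeat the initialization example.

```sas
/* Setup repeated from the initialization example */
%let rc=%sysfunc(grdsvc_enable(_all_, resource=SASApp));
options autosignon;
libname shared "%sysfunc(pathname(work))";

/* CMACVAR= names a macro variable in the driver session */
/* (status1 is a hypothetical name)                      */
rsubmit task1 wait=no inheritlib=(shared) cmacvar=status1;
  data shared.task1;
    pt = 1;
  run;
endrsubmit;

waitfor _all_ task1;

/* 0 = RSUBMIT complete, 1 = failed to execute, 2 = still in progress */
%put NOTE: task1 finished with CMACVAR status &status1;

/* Merge the subtask's log and output into the driver's log and output */
rget task1;

signoff task1;
```

Note that the CMACVAR value describes the RSUBMIT itself; errors raised by the code inside the subtask still have to be found in the retrieved log.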

Points to Consider

How many subtasks should there be?

• This will depend on a number of factors
  – The fraction of work that is not parallelized
  – The number of available grid nodes

• My rule of thumb
  – I like to benchmark my applications at 5, 10 and 15 subtasks and then adjust accordingly

How much work should occur in each subtask?

• There has to be enough work in each subtask
  – SAS/CONNECT has a login process which I have benchmarked as taking between 5 and 15 seconds
  – The goal is to have the friction involved with creating/managing the subtasks become negligible

• My rule of thumb
  – I like to have each subtask take around 10 minutes
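
To put those numbers side by side, here is a back-of-the-envelope estimate that counts only the benchmarked login time as overhead and takes a 10-minute (600-second) subtask:

```latex
\[
\frac{t_{\text{login}}}{t_{\text{subtask}}}
  = \frac{5\ \text{s}}{600\ \text{s}} \approx 0.8\%
  \quad\text{to}\quad
  \frac{15\ \text{s}}{600\ \text{s}} = 2.5\%
\]
```

By contrast, a hypothetical subtask with only 30 seconds of work would carry login overhead of roughly 17% to 50% of its run time.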

Should libraries be shared between subtasks?

• Arguments for using INHERITLIB
  – One thing is in one place

• Arguments against using INHERITLIB
  – Subtasks can bottleneck as they queue up awaiting access

• My rule of thumb
  – I like to use INHERITLIB unless I have a reason not to

Conclusion

• Put as much work as possible in the subtasks

• Get something up and running, benchmark it and then adjust things

• Good luck!

Questions

Acknowledgement

• SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

• Other brand and product names are registered trademarks or trademarks of their respective companies.

Thank you…

Jack Fuller, Senior Application Developer