Boost your efficiency when dealing with multiple jobs on the Cray XC40 supercomputer Shaheen II
Samuel KORTAS, June 5th, 2016
KAUST Supercomputing Laboratory, KSL Workshop Series


Page 1:

Boost your efficiency when dealing with multiple jobs on the Cray XC40 supercomputer Shaheen II

Samuel KORTAS, June 5th, 2016
KAUST Supercomputing Laboratory, KSL Workshop Series

Page 2:

Agenda

• A few tips when dealing with numerous jobs
• The Slurm way (up to a limit)
• Four KSL tools to take you further:
  – Breakit (1 to 10,000s of jobs, all identical)
  – KTF (1 to 100 jobs, individually tuned)
  – Avati (1 to 1,000s of jobs, programmed)
  – Decimate (dependent jobs)
• Hands-on session: /scratch/tmp/ksl_workshop
• Documentation at hpc.kaust.edu.sa/1001_jobs (to be completed today)
• Conclusion

Page 3:

Launching thousands of jobs…

• Some of our users use Shaheen for parameter sweeps involving thousands of jobs and saving thousands of temporary files

• They need results within a guaranteed time

• They are not HPC experts, but their workloads are challenging in terms of scheduling and filesystem stress

• They implement complex workflows that feed the output of one code into the input of others, producing a lot of small files

Page 4:

Scheduling thousands of jobs

• KSL does its best… but it's not that easy, folks! → The Tetris game gets rough with long rectangles ;-(

(Diagram: filling the 6,144 available nodes over time, with 1,000s of such jobs to place.)

Page 5:

Let's help the scheduler! (1/5)
Put the right elapsed time
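Requesting a realistic elapsed time, rather than the partition maximum, is what lets the backfill scheduler slot your job into the gaps. A minimal sketch of a job header; the job name, node count, time value and executable are placeholders to adapt:

  #!/bin/bash
  #SBATCH --job-name=sweep_case_42   # placeholder name
  #SBATCH --nodes=2
  #SBATCH --time=00:45:00            # realistic elapsed time: shorter jobs backfill more easily
  srun ./my_solver input.dat         # placeholder executable and input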

Page 6:

Let's help the scheduler! (2/5)
Let's share resources better among us

● The current scheduler policy is first in, first served
● Your priority increases as long as you are waiting 'actively' in the queue; held or dependent jobs are not counted
● Slurm also takes your backfilling potential into account
● But we have to share: the number of jobs in the queue is limited per user
● The fair-share Slurm implementation is reported to work well only with a small number of projects

Page 7:

Let's help the scheduler! (3/5)
Let's lower the stress on the filesystem

● Each one of the 1,000s of jobs may need to read, probe or write a file
● We have a single filesystem shared by all the jobs; let's preserve it
● Lustre is not tuned for small files
● → Use the ramdisk when possible and save only the data that matters to Lustre (see next slide)
● → Communicate in memory instead of via files
● → Choose the right stripe count (see the example below)

Page 8:

Let's help the scheduler! (4/5)
How to use the ramdisk?

● On each Shaheen II compute node, /tmp is a ramdisk: a POSIX filesystem hosted directly in memory
  – → it starts at 64 GB and shrinks as your program uses more and more memory
● → an additional memory request or a write to /tmp fails when: size(OS) + size(program instructions) + size(program variables) + size(/tmp) > 128 GB
  – Still, /tmp is the fastest filesystem of all (compared to Lustre and DataWarp)
  – But it is distributed (each node has its own /tmp, not shared) and its content is lost at the end of the job
● → think of storing temporary files in /tmp and saving them at the end of the job (sketched below)
● → think of storing frequently accessed files in /tmp
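A minimal sketch of that pattern, with placeholder paths and executable: work in the node-local /tmp, then copy back to Lustre only what matters before the job ends.

  #!/bin/bash
  #SBATCH --nodes=1
  #SBATCH --time=01:00:00
  cp /scratch/$USER/cases/input.dat /tmp/    # stage a frequently accessed file into the ramdisk
  cd /tmp
  srun ./my_solver input.dat                 # temporary files land in memory, not on Lustre
  cp /tmp/result.dat /scratch/$USER/run_42/  # save only the data that matters back to Lustre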

Page 9:

Let's help the scheduler! (5/5)
Off-loading the CDLs to the compute nodes

● You may need to:
  – pre/post-process
  – monitor a job
  – relaunch it
  – get notified when it is starting or ending…
● Automate all this and move the load from the CDL (Cray Development and Login) nodes to the compute nodes:
  – use #SBATCH --mail-user (example below)
  – use breakit, ktf, maestro, decimate
  – ask the KSL team for help: it's only a script away
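For the notification part, two standard Slurm directives are enough; the address is a placeholder:

  #SBATCH --mail-user=<your_email_address>
  #SBATCH --mail-type=BEGIN,END,FAIL    # get an email when the job starts, ends or fails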

Page 10:

Managing 1001 jobs
1 - the Slurm way: submitting arrays…

Page 11:

Slurm Way (1/3)

● Slurm can submit and manage collections of similar jobs easily → job arrays

● To submit a 500-element job array:

  sbatch --array=1-500 -N1 -i my_in_%a -o my_out_%a job.sh

  where "%a" in the file names is mapped to the array task ID (1 – 500)

● squeue -r --user=<my_user_name> 'unfolds' the jobs queued as a job array

● More info at http://slurm.schedmd.com/job_array.html

Page 12:

Slurm Way (2/3)
Job environment variables

• The squeue and scancel commands, plus some scontrol options, can operate on an entire job array or on selected task IDs

• The squeue -r option prints each task ID separately
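The variables themselves are not listed in this transcript; as a reminder, the standard Slurm array variables can be used inside the job script roughly as follows (executable and file naming are placeholders):

  #!/bin/bash
  #SBATCH --array=1-500
  echo "array master job id: $SLURM_ARRAY_JOB_ID"
  echo "this task id:        $SLURM_ARRAY_TASK_ID"
  ./my_code < my_in_$SLURM_ARRAY_TASK_ID > my_out_$SLURM_ARRAY_TASK_ID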

Page 13:

Slurm Way (3/3)
Job example

Possible commands:

sbatch --array=1-16 my_job

sbatch --array=1-500%20 my_job    (only allows 20 tasks of the array to run at any given time)

Taken from https://rcc.uchicago.edu/docs/running-jobs/array/index.html

Page 14:

Slurm Way
But…

● Slurm counts each element of the array as a job per se: for now, the total number of jobs in the queue is limited to 800 jobs per user
● Pending jobs do not gain priority
● Only one parameter can vary
  – → if you need to work on several parameters, the script itself has to deduce them from the array index (see the sketch below)…
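A minimal sketch of that deduction, assuming two hypothetical parameters, one with 4 values and one with 25, so that array tasks 1-100 cover every combination:

  #!/bin/bash
  #SBATCH --array=1-100
  i=$(( SLURM_ARRAY_TASK_ID - 1 ))
  NX=$(( 64 * (i / 25) + 64 ))          # first parameter: 64, 128, 192 or 256
  NPROC=$(( i % 25 + 1 ))               # second parameter: 1 .. 25
  srun -n $NPROC ./my_solver --nx=$NX   # placeholder executable and option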

Page 15:

Slurm Way hands-on…

● Submit the job /scratch/tmp/ksl_workshop/slurm/demo.job as an array of 20 occurrences, then:
  – check the script,
  – check its output,
  – check the queue,
  – cancel it.

Page 16:

Slurm Way hands-on… solution

● Submit the job /scratch/tmp/ksl_workshop/slurm/demo.job as an array of 20 occurrences:

  sbatch --array=1-20 /scratch/tmp/ksl_workshop/slurm/demo.job

  – check the script,
  – check its output,
  – check the queue → squeue -r --user=<my_user>
  – cancel it → scancel -n <my_job_name>

Page 17:

Managing 1001 jobs?
4 KSL open-source tools

Page 18:

Why? Ease your life and centralize some common developments

breakit ktf maestro decimate

● Soon available at https://bitbucket.org/kaust_KSL/ (GNU GPL license)

● Written in Python 2.7

● Installed on Shaheen II, portable to workstations, Noor…

● All share a common API and an internal library engine, also available on bitbucket.org/kaust_KSL

● Maintained by KSL (samuel.kortas (at) kaust.edu.sa)

Available on Shaheen as modules / Under development for 2 PIs, to be released soon on bitbucket.org

Our goal: hiding complexity

Page 19:

Managing 1001 jobs
Using the breakit wrapper

Page 20:

Breakit (1/3)
Idea and status

● Lets you cope seamlessly with the limit of 800 jobs in the queue
● No need to change your job array
● Breakit automatically monitors the process for you
● → version 0.1 → I need your feedback!

Page 21:

Slurm way (1/2)
How to handle it with Slurm?

(Diagram: you, or a program running on the CDL node, feed jobs into the queue up to the maximum number of jobs allowed.)

Page 22:

Slurm way (2/2)
How to handle it with Slurm

(Diagram, continued: the queue limit again, fed manually by you or by a program running on the CDL node.)

Page 23:

Breakit (2/3)
How does it work?

(Diagram: breakit submits jobs up to the maximum number of jobs allowed in the queue.)

Page 24:

Breakit (2/3)
How does it work?

(Same diagram, next animation step: breakit fills the queue up to the maximum number of jobs.)

Page 25:

Breakit (2/3)
How does it work?

(Next step: gone! Breakit is not active anymore.)

Page 26:

Breakit (2/3)
How does it work?

(Next step: breakit is gone, and the jobs are starting.)

Page 27:

Breakit (2/3)
How does it work?

(Next step: the running jobs submit the next jobs with a dependency.)

Page 28:

Breakit (2/3)
How does it work?

(Next step: the first chunk is done, the dependency is resolved, and the next jobs are pending.)

Page 29:

Breakit (2/3)
How does it work?

(Next step: in turn, these jobs submit the next jobs with a dependency.)

Page 30:

Breakit (2/3)
How does it work?

● Instead of submitting all the jobs at once, they are submitted in chunks:
  – chunk #n is running or pending
  – chunk #n+1 depends on chunk #n:
    ● it starts only when every job of chunk #n has completed
    ● it submits chunk #n+2 with a dependency on chunk #n+1
● … we did offload some work from the CDL node onto the compute nodes ;-) (a manual equivalent of this chaining is sketched below)
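For reference, chaining chunks by hand would look roughly like the lines below; this is a hedged sketch of the mechanism, not breakit's actual implementation, and your.job and the chunk size are placeholders:

  jid=$(sbatch --parsable --array=1-400 your.job)                               # first chunk
  jid=$(sbatch --parsable --array=401-800 --dependency=afterany:$jid your.job)  # waits for the whole previous chunk
  # breakit automates this chaining and issues the follow-up submissions from the compute nodes themselves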

Page 31:

Breakit (3/3)
How to use it?

1) Load the breakit module:

  module load breakit
  man breakit   (to be completed)
  breakit -h

2) Launch your job:

  breakit --job=your.job --array=<nb_of_jobs> --chunk=<max_nb_of_jobs_in_queue>

3) Manage it:

  squeue -r -u <user> -n <job_name>
  scancel -n <job_name>

Page 32:

Breakit hands-on

• Via breakit, submit an array of 100 occurrences of the job /scratch/tmp/breakit/demo.job, with only 16 jobs simultaneously in the queue

Page 33:

Breakit hands-on (solution)

• Via breakit, submit an array of 100 occurrences of the job /scratch/tmp/breakit/demo.job, with only 16 jobs simultaneously in the queue:

module load breakit

breakit --job=/scratch/tmp/breakit/demo.job --range=100 --chunk=16

Page 34:

Breakit next steps

• Find a better name!

• Support all array ranges (not only 1-n)

• Provide an easy restart

• Provide an easier way to kill jobs

Page 35:

Managing 101 jobs
Using KTF

Page 36:

KTF Idea

● At a certain point, you may need:

– to evaluate the performance of a code under different conditions,

– to run a parametric study.

● the same executable is run several times with a different set of parameters:
  – physical values characterizing the problem,

– number of processors, threads and/or nodes

– compiler used

– compilation options

– parameters passed on the srun command line to experiment with different placement strategies

– …

● KTF (KAUST Test Framework) can help you with this!

Page 37:

What is KTF?

● KTF (KAUST Test Framework) was designed and used during the Shaheen II procurement to ease the
  – generation,
  – submission,
  – monitoring,
  – and result collection
  of a set of jobs depending on a set of parameters to explore.

● Written in Python 2.7
● Self-contained and portable
● Available on bitbucket.org/kaust_KSL/ktf

Page 38:

How does KTF work?
A few definitions

● An 'experiment'
● A case is one single run of this experiment with a given set of parameters
● A test gathers a number of cases

Page 39:

How does KTF work?

● KTF relies on:
  – a centralized file listing all the combinations of parameters to address, e.g. shaheen_cases.ktf
  – a set of template files (all files ending in .template) in which the parameters are replaced before submission

Page 40:

KTF hands-on! (1/)
Initialize the environment

1) Load the environment and check that ktf is available:

  module load ktf
  man ktf
  ktf -h

2) Create and initialize your working directory:

  mkdir <my_test_dir>
  cd <my_test_dir>
  ktf --init

→ you should get a ktf-like tree structure with some examples of centralized case files and associated templates

3) Examine the case file shaheen_cases.ktf, understand the ktf syntax, modify parameters and check your change by listing all the combinations:

  ktf --exp

Page 41:

KTF centralized case file (see file shaheen_zephyr0.ktf)

According to this case file, for the third test case, in each file ending in .template:
✔ __Case__ will be replaced by 128
✔ __Experiment__ will be replaced by zephyr/strong
✔ __NX__ will be replaced by 255
✔ __NY__ will be replaced by 255
✔ __NB_CORES__ will be replaced by 128
✔ __ELLAPSED_TIME__ will be replaced by 0:05:00

● # is a comment → not parsed by KTF
● The first line gives the names of the parameters
● Case and Experiment are absolutely mandatory
● Each following line is a test case, setting a value for EACH parameter

(Screenshot of the case file not reproduced; its callouts pointed at a KTF comment, the list of parameters and the third test case. A reconstructed example follows.)
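The case file itself appears only as a screenshot in the slides; the block below is a hedged reconstruction consistent with the substitutions listed above (only the third case is documented, the other lines are assumptions):

  # KTF case file (reconstructed example, not the original shaheen_zephyr0.ktf)
  Case    Experiment       NX     NY     NB_CORES    ELLAPSED_TIME
  32      zephyr/strong    255    255    32          0:15:00
  64      zephyr/strong    255    255    64          0:10:00
  128     zephyr/strong    255    255    128         0:05:00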

Page 42:

KTF directory initial structure

(Screenshot of the ktf tree not reproduced; its callouts pointed at the default case file, two experiment directories, and a subdirectory containing files common to all the experiments.)

Page 43:

KTF job.shaheen.template (see files in tests/zephyr/strong/)

(Annotated screenshot not reproduced; its callouts pointed at the KTF comment, the list of parameters, the third test case, and the file job.shaheen.template, which launches the run with "./zephyr input". A sketch of a possible template follows.)
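A hedged sketch of what job.shaheen.template might contain, using the __...__ placeholders documented on the case-file slide; the exact content of the original template is not reproduced in this transcript:

  #!/bin/bash
  #SBATCH --ntasks=__NB_CORES__
  #SBATCH --time=__ELLAPSED_TIME__
  # __NX__ and __NY__ are typically substituted into input.template rather than here
  srun -n __NB_CORES__ ./zephyr input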

Page 44:

KTF job.shaheen.template (see files in tests/zephyr/strong/)

(Annotated screenshot not reproduced; this time the callouts pointed at the KTF comment, the list of parameters, the third test case, and the file input.template.)

Page 45:

KTF commands

ktf ...

… --help : get help on the command line

… --init : initialize the environment, copying example .template and .ktf files

… --build : generate all the combinations listed in the case file

… --launch : generate all the combinations listed in the case file and submit them

… --exp : list all the combinations present in the case .ktf file

… --monitor : monitor all the experiments and display all results in a dashboard

… --kill : kill all jobs related to this ktf session

… --status : list all date stamps and cases of the experiments made or currently running

Page 46:

KTF hands-on! (2/)
Prepare a first experiment

4) Examine the case file shaheen_cases.ktf, understand the ktf syntax, modify parameters and check your change by listing all the combinations:

  ktf --exp

5) Build an experiment and check that the templated files have been well processed:

  ktf --build
  → should create one tests_ directory: tests_shaheen_<date>_<time>

Page 47:

KTF directory after --build

(Screenshot not reproduced; its callouts pointed at the initial template and at the directory generated for the third case.)

Page 48:

KTF directory after --launch

(Screenshot not reproduced; its callouts: job.shaheen is processed from job.shaheen.template, input is processed from input.template, and zephyr is copied from the common directory.)

Page 49:

KTF centralized case file: handling constant parameters

(Screenshot not reproduced; it compares shaheen_zephyr0.ktf and shaheen_zephyr1.ktf, which are strictly identical except that a #KTF pragma declares new parameters that keep the same value for every following case.)

Page 50:

Another example of a KTF case file

(Screenshot not reproduced; its callouts pointed at the Case column and at two Experiment entries.)

Page 51:

KTF filters and flags

ktf --xxx ...

… --case-file=<case file> : use another case file than shaheen_cases.ktf

… --what=zzzz : filter on some cases

… --reservation=<reservation name> : submit within a reservation

● ktf --exp --what=128

● ktf --launch --what=64 --reservation=workshop

● ktf --exp --case-file=shaheen_zephyr1.ktf

Page 52:

KTF filters and flags

ktf --xxx ...

… --ktf-file=<case file> : use another case file than shaheen_cases.ktf

… --what=zzzz : filter on some cases

… --when=yyyy | --today | --now : filter on some date stamps

… --times=<nb> : repeat the submission <nb> times

… --info : switch on informative traces

… --info-level=[0|1|2|3] : change the informative trace level

… --debug : switch on debugging traces

… --debug-level=[0|1|2|3] : change the debugging trace level

Page 53:

KTF hands-on! (3/)
Playing with the --what filter

4) Examine the case file shaheen_cases.ktf, understand the ktf syntax, modify parameters and check your change by listing all the combinations, with or without filtering, and using other case files:

  ktf --exp
  ktf --exp --what=<your filter>
  ktf --exp --case-file=shaheen_zephyr1.ktf

5) Build an experiment and check that the templated files have been well processed:

  ktf --build
  ktf --build --what=<your filter>

→ should create two tests directories, named tests_shaheen_<date>_<time>, in the directory from where you call ktf

Page 54:

KTF hands-on! (4/)
Launch and monitor our first experiment

6) Build an experiment and submit it:

  ktf --launch [ --reservation=workshop ]
  → should create a new tests directory and spawn the jobs: ./tests_shaheen_<date>_<time>

  ktf --monitor
  → will monitor your current ktf session
  → check what shows up in the R/ directory

7) Play with repeating experiments and filtering results:

  ktf --launch --what=<your filter> [ --reservation=workshop ]
  ktf --launch --times=5 [ --reservation=workshop ]
  ktf --monitor
  ktf --monitor --what=<your case filter> --when=<your date filter>

→ check what shows up in the R/ directory

Page 55:

KTF results dashboard
Reading the result dashboard

  % ktf --monitor

(Screenshot of the dashboard output not reproduced.)

Page 56:

KTF results dashboard
Reading the result dashboard

  % ktf --monitor

(Annotated screenshot not reproduced; its callouts identify, for each line of the dashboard: when the test ran, what case it was, its status, its elapsed time, its subdirectory in R/, a marker for tests not finished yet, and a '!' marker when job.err is not empty.)

Page 57:

KTF R/ directory
Quick access to results

• The R/ directory is updated each time you call ktf --monitor

• It builds symbolic links to the results directories in order to give you quick access to the results you want to check

Page 58:

KTF R/ directory
Quick access to the results directories

(Screenshot of the R/ directory tree not reproduced.)

Page 59:

KTF results configuration
Implementation and default printing

● In fact…
  alias ktf = python run_test.py
  alias ki  = python run_test.py --init
  alias km  = python run_test.py --monitor

● run_test.py encodes the value to be displayed in the dashboard (printed when calling --monitor)

● By default, it is <elapsed time taken by the whole test>/<status of the test>
  – with a '!' after the status if job.err is not empty
  – with a '!' before the status if the job did not terminate properly

● Remember you can use cat, more or tail on R/*/job.err to scan all these files!

Page 60:

KTF results configuration
Changing the default printing

● But you can change the displayed values at will and adapt them to your own needs:
  ● other values: Flops, intermediate results, total number of iterations, convergence rate…
  ● several values: <flops>/<time>/<status>
  ● other events triggering the '!' sign
  ● other typographic signs
● → how to do it…

Page 61:

KTF run_test.py file

(Screenshot of run_test.py not reproduced.)

Page 62:

KTF hands-on! (5/)
Modifying the printed result

8) Check what ktf prints:

  ktf --monitor

  and understand how run_test.py works

9) Modify run_test.py in order to print the time per iteration

Page 63:

KTF next steps

• Gather tests into campaigns

• Have a better display for the --monitor option: web interface, automated generation of plots

• Enrich the filtering feature: regular expressions, several simultaneous filters

• Enable coding capability inside the case file

• Complete the documentation

• Save results into a database and be able to compute statistics

• Cover the compilation step

Page 64:

KTF next steps

• Support --clean and campaigns

• Chain several jobs into one

• Support job arrays, dependencies, mail to the user

• Port to Noor and workstations

• Offload from a workstation to Shaheen

• Better versioning of the template files

• Provide one ktf initial environment per science field

Page 65:

Managing 1001 jobs using Maestro

Page 66:

Maestro principles (1/2)

• Handling these studies should be the same on:
  – a Linux box
  – Shaheen, Noor, Stampede…
  – a laptop under Windows or macOS
  – a given set of Linux boxes

• The only prerequisites:
  – Python > 2.4 and MPI on a supercomputer
  – Python > 2.4 on a workstation

Page 67:

Maestro principles (2/2)

• Minimal or no knowledge of the HPC environment required

• Easy management of the jobs, handled as a whole

Page 68:

A set of tools adapted to a distributed execution (1/3)

• No pre-installation needed on the machines: maestro is self-contained

• Easy and quick prototyping on a workstation with immediate porting to a supercomputer

• Global error signals, easy to throw and trace

• Global handling of the jobs as a whole study (launching, monitoring, killing and restarting through one command)

Page 69:

A set of tools adapted to a distributed execution (2/3)

• All the flexibility of Python available to the user in a distributed environment (class inheritance, modules…): production of robust, easy-to-read code, with an explicit error stack to debug in case of problem

• Transparent replication of the environment on each of the compute nodes

• Work in the /tmp of each compute node to minimize the stress on the filesystem

Page 70:

A set of tools adapted to a distributed execution (3/3)

• Extended grep (multi-line, multi-column, regular expressions) to post-process the output files

• Centralized management of the templates to replace

• Global selection of the files to be kept and parametrization of the receiving directory

• A console to easily explore the subdirectories where results are saved

• Each running process can write to the same global file

Page 71:

Maestro principles

(Diagram: maestro and the elementary jobs to run.)

Page 72:

Maestro principles

(Diagram: maestro allocates a pool of nodes and runs the elementary jobs in it.)

Page 73:

Maestro principles

(Same diagram, next animation step.)

Page 74:

Maestro principles

(Same diagram, final animation step.)

Page 75:

An example

(Annotated code screenshot not reproduced; its callouts pointed at: the files to save, the directory name where results are saved, the elementary computation sending local and global messages, the parametrized Z range, and the definition of the domain to sweep.)

Page 76:

Command line options

<no option>        : classical sequential run on 1 core, stopping at the first error encountered
--cores=<n>        : parallel run on n cores
--depth=<p>        : partial parallelisation up to level p
--stat             : live status of the ongoing computation
--reservation=<id> : run inside a reservation
--time=hh:mm:ss    : set the elapsed duration of the overall job
--kill             : kill the ongoing computation and clean the environment
--resume           : resume a computation
--restart          : restart a computation from scratch
--help             : help screen
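A hedged usage sketch, assuming the study is driven by a self-contained Python script; my_study.py is a placeholder name and only the options listed above are used:

  python my_study.py                               # sequential run on 1 core, stops at the first error
  python my_study.py --cores=128 --time=02:00:00   # parallel run on 128 cores within a 2-hour allocation
  python my_study.py --stat                        # live status of the ongoing computation
  python my_study.py --kill                        # kill it and clean the environment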

Page 77:

• Demo!

Page 78:

Next Steps

• Allowing maestro to launch multicore jobs

• More clever sweeping algorithms → decime project

• Support of a given set of workstations

• Coupling maestro with a website

• Remote launching and dynamic off-loading from a workstation to a supercomputer

Page 79:

Managing dependent jobs in complex workflows
Using Decimate

Page 80:

Idea

• Some workflows involve several steps depending on one another
  – → several jobs with a dependency between them (see the sketch below)

• Some intermediate steps may break
  – → the dependency will break
  – → the workflow will remain idle, requiring an action

• We want to automate this
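With plain Slurm such a chain is built by hand and stalls as soon as one step fails; a hedged sketch of what decimate automates, where the step*.job scripts are placeholders:

  jid1=$(sbatch --parsable step1.job)
  jid2=$(sbatch --parsable --dependency=afterok:$jid1 step2.job)   # runs only if step1 succeeds
  jid3=$(sbatch --parsable --dependency=afterok:$jid2 step3.job)   # if step2 fails, step3 stays pending, waiting for an action
  # decimate monitors the chain and can relaunch a broken step, fix the dependency, or cancel the workflow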

Page 81:

What is decimate?
Add-ons and goodies

• A Python tool written for two different PIs with the same need
• Launch, monitor and heal dependent jobs
• Make things automated and smooth

Page 82:

What is decimate?

• Add-ons

– Centralized log files,

– Global --resume, --status and --kill commands

– Sends a mail to the user at any time to keep them updated

– Can make a decision when a dependency is broken:
  ● relaunch the same job again and fix the dependency
  ● change the input data, relaunch and fix the dependency
  ● cancel only this job and move on
  ● cancel the whole workflow

Page 83:

Some examples of workflows

Page 84:

Conclusion

                slurm           breakit         ktf         maestro     decimate
Typical #jobs   < 800           > 800           100         1-1000      ?
Jobs are        same            same            different   different   different
#parameters     1               1               several     many        any
#nodes/job      same            same            any         same        any
dependent       one at a time   one at a time   no          no          yes

We have presented some useful tools to handle many jobs at a time

Your feedback is welcome: [email protected]