htcondor user tutorial - university of...

62
1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina Koch, Research Computing Facilitator Center for High Throughput Computing STAT679, November 2, 2017

Upload: doanhuong

Post on 02-Mar-2018

220 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

1

HIGH-THROUGHPUT COMPUTINGAND YOUR RESEARCH

Christina Koch, Research Computing FacilitatorCenter for High Throughput Computing

STAT679, November 2, 2017

Page 2: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

2

About Me

• I work for the Center for High Throughput Computing (CHTC)

• We provide resources for UW Madison faculty, students and staff whose research problems are too big for their laptops/desktops

Page 3: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

3

Overview• Large Scale Computing• High Throughput Computing• Large Scale Computing Resources and the Center for High Throughput Computing (CHTC)

• Running HTC Jobs at CHTC• User Expectations• Next Steps

Page 4: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

4

LARGE SCALE COMPUTING

Page 5: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

5

What is large-scale computing?

larger than ‘desktop’(in memory, data, processors)

Page 6: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

6

Problem:running many computationstakes a long time

(running on one processor)

time

Page 7: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

7

time

So how do you speed things up?

Page 8: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

8

time

Break up the work!Use more processors! (parallelize)

n processors

Page 9: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

9

time

n processors

High throughput computing

Page 10: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

10

time

n processors

High performance computing

Page 11: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

11

HIGH THROUGHPUT COMPUTING

Page 12: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

12

High Throughput Examples• Test many parameter combinations• Analyze multiple images or datasets• Do a replicate/randomized analysis• Align genome/RNA sequence data

Page 13: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

13

Explaining Post-Katrina Home Rebuilding

Economics professor, Jesse Gregory, performs HTC optimization of a model to predict the most important factors determining household rebuilding after Katrina.

Most important rebuilding factors:- relative funding available to

household if rebuilt- rebuild status of neighboring

households

http://www.opensciencegrid.org/using-high-throughput-computing-to-evaluate-post-katrina-rebuilding-grants/

Jesse’s projects in the last year:4.5 million hours, 1.5 million OSG hours

Fraction of Neighbors Rebuilt

(Repair Cost / Replacement Cost)

more fundsqualified for

less fundsqualified for

Page 14: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

14

The Philosophy of HTC

• Break work into many ‘smaller’ jobs• single or few CPUs, short run times, smaller input and output per job

• Run on as many processors as possible• smaller and shorter jobs are best• take dependencies with you (like R)

• Automate as much as you can• black box programs that use various input files• numbered files

• Scale up gradually

14

Page 15: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

15

CENTER FOR HIGH THROUGHPUT COMPUTING

Page 16: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

16

time

n processors

High throughput computing

Page 17: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

17

WHAT WE NEEDLots of computers, to run multiple independent computations

Page 18: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

18

CHTC Services

Center for High Throughput Computing, est. 2006

• Large-scale, campus-shared computing systems• high-throughput computing (HTC) and

high-performance computing (HPC) systems

• all standard services provided free-of-charge

• hardware buy-in options • support and training for

using our systems• proposal assistance• chtc.cs.wisc.edu

Page 19: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

19

CHTC

time limit: <72 hrs/job

10,000+ CPU hrs/day

S

CHTC Accessible Computing

Presenter
Presentation Notes
We also have an HPC cluster, managed with SLURM, but ‘backfilled’ with HTCondor jobs as part of the CHTC Pool.
Page 20: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

20

UW Gridup to ~8 hrs/job

~20,000 CPU hrs/day

CHTC<72 hrs/job

10,000 hrs/day

S

CHTC Accessible Computing

Presenter
Presentation Notes
We also have an HPC cluster, managed with SLURM, but ‘backfilled’ with HTCondor jobs as part of the CHTC Pool.
Page 21: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

21

Open Science Gridup to ~4 hrs/job

~200,000 CPU hrs/day

UW Gridup to ~8 hrs/job

~20,000 CPU hrs/day

CHTC<72 hrs/job

10,000 hrs/day

S

CHTC Accessible Computing

Presenter
Presentation Notes
We also have an HPC cluster, managed with SLURM, but ‘backfilled’ with HTCondor jobs as part of the CHTC Pool.
Page 22: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

22Researchers who use the CHTC are located all over campus (red buildings)

http://chtc.cs.wisc.edu

Jul’12-Jun’13

Jul’13-Jun’14

Jul’14-Jun’15 Quick Facts

97 132 265 Million Hours Served120 148 188 Research Projects52 56 61 Departments

Page 23: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

23

Jul’12-Jun’13

Jul’13-Jun’14

Jul’14-Jun’15 Quick Facts

97 132 265 Million Hours Served120 148 188 Research Projects52 56 61 Departments

Researchers who use the CHTC are located all over campus (red buildings)

http://chtc.cs.wisc.edu

Individual researchers:

30 years of computing per day

Page 24: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

24

What else?• Software

• We can support most open source software (R, Python, other projects)

• Can also support Matlab• Data

• CHTC cannot *store* data• However, you can work with up to several TB of data on our system

Page 25: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

25

How it works• Job submission

• Instead of running programs on your own computer, log in and submit jobs

• Job = single independent calculation• Can run hundreds of jobs at once (or more)

• HTCondor• Computer program that controls and runs jobs on CHTC’s computers

Page 26: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

26

Getting Started• Facilitators

• Help researchers get started• Advise on best approach and resources for your research problem

• How do I get an account?• Temporary accounts created for this class. • If you want to use CHTC beyond this class for research, fill out our account request form: http://chtc.cs.wisc.edu/form

Page 27: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

27

RUNNING A JOB ON CHTC’S HIGH THROUGHPUT SYSTEM WITH HTCONDOR

Page 28: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

28

How It Works (in CHTC)• Submit jobs to a queue (on a submit server)• HTCondor schedules them to run on computers that belong to CHTC (execute servers)

submitexecute

execute

execute

Page 29: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

29

HTCONDOR

What is HTCondor?• Software that schedules and runs computing tasks on computers

Page 30: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

30

Job Example• Consider an imaginary program called “compare_states”, which compares two data files and produces a single output file.

wi.dat

compare_states

us.dat

wi.dat.out

$ compare_states wi.dat us.dat wi.dat.out

Page 31: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

31

Submit File• List your executable and any arguments it takes.

• Arguments are any options passed to the executable from the command line.

compare_states

$ compare_states wi.dat us.dat wi.dat.out

executable = compare_statesarguments = wi.dat us.dat wi.dat.out

should_transfer_files = YEStransfer_input_files = us.dat, wi.datwhen_to_transfer_output = ON_EXIT

log = job.logoutput = job.outerror = job.err

request_cpus = 1request_disk = 20MBrequest_memory = 20MB

queue 1

job.submit

Page 32: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

32

Submit File• Indicate your input files.

wi.dat

us.dat

executable = compare_statesarguments = wi.dat us.dat wi.dat.out

should_transfer_files = YEStransfer_input_files = us.dat, wi.datwhen_to_transfer_output = ON_EXIT

log = job.logoutput = job.outerror = job.err

request_cpus = 1request_disk = 20MBrequest_memory = 20MB

queue 1

job.submit

Page 33: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

33

Submit File• HTCondor will transfer back all new and changed files (usually output) from the job.

wi.dat.out

executable = compare_statesarguments = wi.dat us.dat wi.dat.out

should_transfer_files = YEStransfer_input_files = us.dat, wi.datwhen_to_transfer_output = ON_EXIT

log = job.logoutput = job.outerror = job.err

request_cpus = 1request_disk = 20MBrequest_memory = 20MB

queue 1

job.submit

Page 34: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

34

Submit File• log: file created by HTCondor to track job progress

• output/error: captures stdout and stderr

executable = compare_statesarguments = wi.dat us.dat wi.dat.out

should_transfer_files = YEStransfer_input_files = us.dat, wi.datwhen_to_transfer_output = ON_EXIT

log = job.logoutput = job.outerror = job.err

request_cpus = 1request_disk = 20MBrequest_memory = 20MB

queue 1

job.submit

Page 35: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

35

Submit File• Request the appropriate resources for your job to run.

•queue: keyword indicating “create a job.”

executable = compare_statesarguments = wi.dat us.dat wi.dat.out

should_transfer_files = YEStransfer_input_files = us.dat, wi.datwhen_to_transfer_output = ON_EXIT

log = job.logoutput = job.outerror = job.err

request_cpus = 1request_disk = 20MBrequest_memory = 20MB

queue 1

job.submit

Page 36: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

36

Submitting and Monitoring• To submit a job/jobs:

condor_submit submit_file_name

• To monitor submitted jobs, use: condor_q

$ condor_submit job.submitSubmitting job(s).1 job(s) submitted to cluster 128.

$ condor_q-- Schedd: submit-5.chtc.wisc.edu : <128.104.101.92:9618?...ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD128.0 alice 5/9 11:09 0+00:00:00 I 0 0.0 compare_states wi.dat us.dat

1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended

HTCondor Manual: condor_submitHTCondor Manual: condor_q

Page 37: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

37

condor_q

• By default condor_q shows user’s job only*

• Constrain with username, ClusterId or full JobId

$ condor_q-- Schedd: submit-5.chtc.wisc.edu : <128.104.101.92:9618?...ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD

128.0 alice 5/9 11:09 0+00:00:00 I 0 0.0 compare_states wi.dat us.dat

1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended

* as of version 8.5

JobId = ClusterId .ProcId

Page 38: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

38

Job Idle

(submit_dir)/job.submitcompare_stateswi.datus.datjob.logjob.outjob.err

$ condor_q-- Schedd: submit-5.chtc.wisc.edu : <128.104.101.92:9618?...ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD

128.0 alice 5/9 11:09 0+00:00:00 I 0 0.0 compare_states wi.dat us.dat

1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended

Submit Node

Page 39: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

39

Job Starts

compare_stateswi.datus.dat

$ condor_q-- Schedd: submit-5.chtc.wisc.edu : <128.104.101.92:9618?...ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD

128.0 alice 5/9 11:09 0+00:00:00 < 0 0.0 compare_states wi.dat us.dat w

1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended

(submit_dir)/job.submitcompare_stateswi.datus.datjob.logjob.outjob.err

Submit Node

(execute_dir)/

Execute Node

Page 40: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

40

Job Running$ condor_q

-- Schedd: submit-5.chtc.wisc.edu : <128.104.101.92:9618?...ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD

128.0 alice 5/9 11:09 0+00:01:08 R 0 0.0 compare_states wi.dat us.dat

1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended

(submit_dir)/job.submitcompare_stateswi.datus.datjob.logjob.outjob.err

Submit Node

(execute_dir)/compare_stateswi.datus.datstderrstdoutwi.dat.out

Execute Node

Page 41: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

41

Job Completes

(execute_dir)/compare_stateswi.datus.datstderrstdoutwi.dat.out

stderrstdout

wi.dat.out

$ condor_q-- Schedd: submit-5.chtc.wisc.edu : <128.104.101.92:9618?...ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD

128 alice 5/9 11:09 0+00:02:02 > 0 0.0 compare_states wi.dat us.dat

1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended

Execute Node

(submit_dir)/job.submitcompare_stateswi.datus.datjob.logjob.outjob.err

Submit Node

Page 42: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

42

Job Completes (cont.)$ condor_q

-- Schedd: submit-5.chtc.wisc.edu : <128.104.101.92:9618?...ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD

0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended

(submit_dir)/job.submitcompare_stateswi.datus.datjob.logjob.outjob.errwi.dat.out

Submit Node

Page 43: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

43

TRY IT OUT

Page 44: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

44

Setting up• Log In

• hostname: learn.chtc.wisc.edu• username: your campus NetId

• Copy a sample submit file and script to your home directory: • cp /home/groups/stat679/example/* ./

• Choose a name to pass as the argument, by adding it to the submit file.

Page 45: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

45

Submit the job• Submit the job and monitor it in the queue

• condor_submit job.sub• condor_q

• Check the .out and .log files once the job completes:• What does the message say?• How much memory/disk did your job use?

Page 46: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

46

Submitting Multiple JobsReplacing single job inputs

with a variable of choice

executable = compare_statesarguments = wi.dat us.dat wi.dat.out

transfer_input_files = us.dat, wi.dat

queue 1

executable = compare_statesarguments = $(infile) us.dat $(infile).out

transfer_input_files = us.dat, $(infile)

queue ...

Page 47: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

47

multiple “queue” statements

matching ... pattern

in ... list

from ... file

Possible Queue Statementsinfile = wi.datqueue 1infile = ca.datqueue 1infile = ia.datqueue 1

queue infile matching *.dat

queue infile in (wi.dat ca.dat ia.dat)

queue infile from state_list.txtwi.datca.datia.dat

Not Recommended

state_list.txt

Recommended

Page 48: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

48

Multiple Jobs• Create a list of names in a text file called “names.txt”

• Create a copy of the submit file called “multiple-jobs.sub”

• Change two lines in “multiple-jobs.sub”• arguments = $(name)• queue name from names.txt

• Submit the jobs to see what happens.

Page 49: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

49

USER EXPECTATIONS

Page 50: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

50

Be responsible! • These resources are shared and you get to use them for free -- be a good citizen.

• Ask questions if you aren’t sure about something.

• Don’t run programs directly on the submit server.

• Data files should be small. Talk to us if you want to submit jobs with big (> 1GB) of data.

Page 51: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

51

Resource Request• Jobs are nearly always using a part of a computer, not the whole thing

• Very important to request appropriate resources (memory, cpus, disk) for a job

whole computer

your request

Page 52: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

52

Resource Assumptions• Even if your system has default CPU, memory and disk requests, these may be too small!

• Important to run test jobs and use the log file to request the right amount of resources: • requesting too little: causes problems for your and other jobs; jobs might by held by HTCondor

• requesting too much: jobs will match to fewer “slots”

Page 53: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

53

Log File000 (128.000.000) 05/09 11:09:08 Job submitted from host: <128.104.101.92&sock=6423_b881_3>...001 (128.000.000) 05/09 11:10:46 Job executing on host: <128.104.101.128:9618&sock=5053_3126_3>...006 (128.000.000) 05/09 11:10:54 Image size of job updated: 220

1 - MemoryUsage of job (MB)220 - ResidentSetSize of job (KB)

...005 (128.000.000) 05/09 11:12:48 Job terminated.

(1) Normal termination (return value 0)Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote UsageUsr 0 00:00:00, Sys 0 00:00:00 - Run Local UsageUsr 0 00:00:00, Sys 0 00:00:00 - Total Remote UsageUsr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage

0 - Run Bytes Sent By Job33 - Run Bytes Received By Job0 - Total Bytes Sent By Job33 - Total Bytes Received By JobPartitionable Resources : Usage Request Allocated

Cpus : 1 1Disk (KB) : 14 20480 17203728Memory (MB) : 1 20 20

Page 54: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

54

TESTING IS KEY!ALWAYS run test jobs before submitting many jobs at once.

Page 55: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

55

Facilitators• Christina (me) and Lauren Michael work for CHTC as Research Computing Facilitators.

• It’s our job to answer questions and help people get started with computing at CHTC.

• Email us! [email protected]• Or come to office hours in the WID:

• Tues/Thurs, 3:00 - 4:30• Wed, 9:30 - 11:30

http://chtc.cs.wisc.edu/get-help

Page 56: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

56

NEXT STEPS

Page 57: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

57

Building up a workflow• Try to get ONE job running

• Follow steps that follow• Troubleshoot

• Check memory/disk requirements• Do a small scale test of 5-10 jobs

• Check memory + disk requirements *again*• Run full-scale set of jobs

Page 58: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

58

Special Considerations for Code• Your code should be able to take in different inputs (usually via an argument)

• Example (in R): # analysis.R

options <- commandArgs(trailingOnly=TRUE)input_file <- options[[1]]

# continue w/ input file

$ R CMD BATCH “--args input.tgz” analysis.R

Page 59: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

59

Special Considerations for R

Guide for R Jobs:chtc.cs.wisc.edu/r-jobs.shtml

Need to:1.prepare a portable R installation2. create a primary “executable” script that will run your R code3. submit jobs with different input for each

Page 60: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

60

1. Prepare portable R

1. download your required R version (.tar.gz)

2. follow CHTC’s guide to start an interactive “build” job:

chtc.cs.wisc.edu/inter-submit.shtml

3. install R and any libraries

4. pack up the resulting R installation directory

5. exit the interactive session

described in: chtc.cs.wisc.edu/r-jobs.shtml

S iB

Page 61: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

61

2. Create the Job “executable”

#!/bin/bash

# command line script (runR) to run R

# untar the portable R installation:tar –xzvf R.tar.gz

# make sure the above R installation will be used:export PATH=$(pwd)/R/bin:$PATH

# run R, with my R script as an argument to this executable:R CMD BATCH “--args $1” Rscript.R

Page 62: HTCondor User Tutorial - University of Wisconsin–Madisonpages.stat.wisc.edu/~jgillett/679/CHTC/ChristinaKochCHTC2nov17.pdf · 1 HIGH-THROUGHPUT COMPUTING AND YOUR RESEARCH Christina

62

3. Create the submit file

executable = runR.sharguments = input.tgzlog = $(Process).logerror = $(Process).erroutput = $(Process).out

transfer_input_files = R.tar.gz, input_file, myscript.R

request_cpus = 1request_disk = 1GBrequest_memory = 1GB

queue