High Performance Computing on Flux EEB 401 Charles J Antonelli Mark Champe LSAIT ARS September, 2014


High Performance Computing on Flux

EEB 401

Charles J Antonelli

Mark Champe

LSAIT ARS

September, 2014


cja 2014 2

Flux

Flux is a university-wide shared computational discovery / high-performance computing service.

Provided by Advanced Research Computing at U-M

Operated by CAEN HPC

Procurement, licensing, billing by U-M ITS

Interdisciplinary since 2010

9/14

http://arc.research.umich.edu/resources-services/flux/


The Flux cluster

[Figure: Flux cluster diagram with login nodes, compute nodes, storage, and a data transfer node]


A Flux node

12 or 16 Intel cores

48 or 64 GB RAM

Local disk

Network


Programming Models

Two basic parallel programming models

Message-passing

The application consists of several processes running on different nodes and communicating with each other over the network

Used when the data are too large to fit on a single node, and simple synchronization is adequate

“Coarse parallelism”

Implemented using MPI (Message Passing Interface) libraries

Multi-threaded

The application consists of a single process containing several parallel threads that communicate with each other using synchronization primitives

Used when the data can fit into a single process, and the communications overhead of the message-passing model is intolerable

“Fine-grained parallelism” or “shared-memory parallelism”

Implemented using OpenMP (Open Multi-Processing) compilers and libraries

Both

An application can combine the two models


Command Line Reference

William E Shotts, Jr., “The Linux Command Line: A Complete Introduction,” No Starch Press, January 2012. http://linuxcommand.org/tlcl.php

Download the Creative Commons licensed version at http://downloads.sourceforge.net/project/linuxcommand/TLCL/13.07/TLCL-13.07.pdf


Using Flux

Three basic requirements:

A Flux login account
A Flux allocation
An MToken (or a Software Token)

Logging in to Flux

ssh [email protected]

wired or MWireless

VPN

ssh login.itd.umich.edu first


Copying data

Three ways to copy data to/from Flux

From Linux or Mac OS X, use scp:

scp localfile [email protected]:remotefile
scp [email protected]:remotefile localfile
scp -r localdir [email protected]:remotedir

From Windows, use WinSCP

U-M Blue Disc http://www.itcs.umich.edu/bluedisc/

Use Globus Connect


Globus Online

Features

High-speed data transfer, much faster than scp or WinSCP

Reliable & persistent

Minimal client software: Mac OS X, Linux, Windows

GridFTP Endpoints

Gateways through which data flow

Exist for XSEDE, OSG, …

UMich: umich#flux, umich#nyx

Add your own client endpoint!

Add your own server endpoint: contact [email protected]

More information

http://cac.engin.umich.edu/resources/login-nodes/globus-gridftp


Batch workflow

You create a batch script and submit it to PBS (the cluster resource manager & scheduler)

PBS schedules your job, and it enters the flux queue

When its turn arrives, your job will execute the batch script

Your script has access to any applications or data stored on the Flux cluster

When your job completes, anything it sent to standard output and standard error is saved and returned to you

You can check on the status of your job at any time, or delete it if it’s not doing what you want

A short time after your job completes, it disappears


Basic batch commands

Once you have a script, submit it:

qsub scriptfile

$ qsub singlenode.pbs
6023521.nyx.engin.umich.edu

You can check on the job status:

qstat jobid
qstat -u user

$ qstat -u cja

nyx.engin.umich.edu:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
6023521.nyx.engi     cja      flux     hpc101i              --     1   1     -- 00:05 Q    --

To delete your job:

qdel jobid

$ qdel 6023521
$
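The job ID printed by qsub carries the server name as a suffix, while the qdel example uses only the numeric part. A minimal sketch of trimming that suffix in a POSIX shell follows; it uses the ID from the example above as a stand-in for real qsub output, and the variable names are illustrative only.

```shell
# Stand-in for the real call: full_id=$(qsub singlenode.pbs)
full_id="6023521.nyx.engin.umich.edu"

# Remove everything from the first dot onward, keeping the numeric part.
job_id=${full_id%%.*}

echo "$job_id"
```

The resulting value can then be passed to qstat or qdel in place of jobid.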


Loosely-coupled batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l procs=12,pmem=1gb,walltime=01:00:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

#Your Code Goes Below:
cd $PBS_O_WORKDIR
mpirun ./c_ex01
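In the request above, pmem is a per-process memory limit, so the job as a whole asks for procs times pmem. A small sketch of that arithmetic follows, assuming pmem is a whole number of gigabytes; the variable names are illustrative and are not PBS settings.

```shell
procs=12      # from -l procs=12
pmem_gb=1     # from pmem=1gb, taken as whole gigabytes

# Total memory the job requests across all of its processes.
total_gb=$(( procs * pmem_gb ))

echo "Total memory requested: ${total_gb} GB"
```

Doing this sum before submitting helps check that a request fits the cores and memory defined for your allocation.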


Tightly-coupled batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l nodes=1:ppn=12,mem=47gb,walltime=02:00:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

#Your Code Goes Below:
cd $PBS_O_WORKDIR
matlab -nodisplay -r script


Flux software

Licensed and open software:

Abacus, BLAST, BWA, bowtie, ANSYS, Java, Mason, Mathematica, Matlab, R, RSEM, STATA SE, …

See http://cac.engin.umich.edu/resources

C, C++, Fortran compilers:

Intel (default), PGI, GNU toolchains

You can choose software using the module command


Modules

The module command allows you to specify what versions of software you want to use

module list -- Show loaded modules
module load name -- Load module name for use
module show name -- Show info for name
module avail -- Show all available modules
module avail name -- Show versions of module name*
module unload name -- Unload module name
module -- List all options

Enter these commands at any time during your session

A configuration file allows default module commands to be executed at login

Put module commands in file ~/privatemodules/default

Don’t put module commands in your .bashrc / .bash_profile


Flux storage

Lustre filesystem mounted on /scratch on all login, compute, and transfer nodes

640 TB of short-term storage for batch jobs

Large, fast, short-term

NFS filesystems mounted on /home and /home2 on all nodes

80 GB of storage per user for development & testing

Small, slow, long-term


Flux environment

The Flux login nodes have the standard GNU/Linux toolkit:

make, perl, python, java, emacs, vi, nano, …

Watch out for source code or data files written on non-Linux systems

Use these tools to analyze and convert source files to Linux format

file

dos2unix
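As an illustration of what these tools address, the sketch below creates a small file with Windows (CRLF) line endings, inspects it, and converts it. The file name crlf_demo.txt is made up for the example, and the tr fallback is an assumption for systems where dos2unix is not installed; it is not something the slides prescribe.

```shell
# Create a demonstration file with Windows (CRLF) line endings.
printf 'line one\r\nline two\r\n' > crlf_demo.txt

# A literal carriage return, used to count lines that end in one.
cr=$(printf '\r')
before=$(grep -c "${cr}\$" crlf_demo.txt || true)

# file reports the line-terminator style when it is available.
if command -v file >/dev/null 2>&1; then
    file crlf_demo.txt
fi

# Convert in place: prefer dos2unix, otherwise strip carriage returns with tr.
if command -v dos2unix >/dev/null 2>&1; then
    dos2unix crlf_demo.txt
else
    tr -d '\r' < crlf_demo.txt > crlf_demo.tmp && mv crlf_demo.tmp crlf_demo.txt
fi

after=$(grep -c "${cr}\$" crlf_demo.txt || true)
echo "CR-terminated lines: before=$before after=$after"
```

Scripts that still contain carriage returns often fail with confusing errors under Linux, so checking files that came from other systems is worthwhile.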


BLAST

Load modules

module unload intel-comp openmpi gcc
module load med python/3.2.3 gcc boost/1.54.0-gcc ncbi-blast/2.2.29

Create file ~/.ncbirc, with contents

[BLAST]
BLASTDB=/nfs/med-ref-genomes/blast

Copy sample code to your home directory

cd
cp ~cja/hpc/eeb401-sample-code.tar.gz .
tar -zxvf eeb401-sample-code.tar.gz
cd ./eeb401-sample-code


BLAST

Examine blast-example.pbs

Edit with your favorite Linux editor

emacs, vi, pico, …

Change email address [email protected] to your own


BLAST

Submit your job to Flux

qsub blast-example.pbs

Watch the progress of your job

qstat jobid

When complete, look at the job’s output

less blast-example.ojobid
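The output file name above follows the pattern of the job name, then ".o", then the numeric job ID. A small sketch of building that name in the shell follows; it assumes the job is named blast-example, and the ID is a hypothetical value standing in for whatever qsub printed.

```shell
job_name="blast-example"   # assumed job name, matching the file name shown above
job_id="6023521"           # hypothetical ID; use the number qsub printed for your job

# Build the output file name from the job name and the numeric job ID.
output_file="${job_name}.o${job_id}"

echo "$output_file"
```

With the name in a variable, the final step becomes less "$output_file".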


BWA

module load med samtools
module load med ncbi-blast
module load med bowtie # optional
module load med bwa


Bowtie

module load med bowtie


RSEM

module load R/3.0.1
module load lsa rsem
module load med bowtie

Note: loading R/3.0.1 unloads gcc/4.7.0 and loads gcc/4.4.6


Perl scripts

module load lsa baucom-bioinformatics
module show baucom-bioinformatics


Interactive jobs

You can submit jobs interactively:

qsub -I -X -V -l procs=2 -l walltime=15:00 -A youralloc_flux -l qos=flux -q flux

This queues a job as usual

Your terminal session will be blocked until the job runs

When your job runs, you'll get an interactive shell on one of your nodes

Invoked commands will have access to all of your nodes

When you exit the shell your job is deleted

Interactive jobs allow you to

Develop and test on cluster node(s)

Execute GUI tools on a cluster node

Utilize a parallel debugger interactively
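The walltime values in these examples mix two-field and three-field forms (15:00 and 01:00:00). Assuming the usual PBS reading of [[HH:]MM:]SS, 15:00 means fifteen minutes rather than fifteen hours. The helper below, whose name is made up for this sketch, converts such a string to seconds so the request can be double-checked.

```shell
# Convert a walltime string of the form [[HH:]MM:]SS to seconds.
# awk treats each field as a decimal number, so "08" is read as 8.
walltime_to_seconds() {
    printf '%s\n' "$1" |
        awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

walltime_to_seconds 15:00      # interactive example: 15 minutes
walltime_to_seconds 01:00:00   # batch example: 1 hour
```

Since the terminal is blocked until the job starts, it helps to confirm the requested walltime before submitting.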


Interactive BLAST

Load modules:

module unload gcc openmpi
module load med gcc ncbi-blast

Start an interactive PBS session

qsub -I -V -l nodes=1:ppn=2 -l walltime=1:00:00 -A eeb401f14_flux -l qos=flux -q flux

Run BLAST in the interactive shell

cd $PBS_O_WORKDIR
blastdbcmd -db refseq_rna -entry nm_000249 -out test_query.fa
blastn -query test_query.fa -db refseq_rna -task blastn -dust no -outfmt 7 -num_alignments 2 -num_descriptions 2 -num_threads 2


Gaining insight

There are several commands you can run to get some insight into when your job will start:

freenodes : shows the total number of free nodes and cores currently available on Flux

mdiag -a youralloc_name : shows cores and memory defined for your allocation and who can run against it

showq -w acct=yourallocname: shows cores being used by jobs running against your allocation (running/idle/blocked)

checkjob -v jobid : Can show why your job might not be starting

showstart -e all jobid : Gives you a coarse estimate of job start time; use the smallest value returned


Some Flux Resources

http://arc.research.umich.edu/resources-services/flux/

U-M Advanced Research Computing Flux pages

http://cac.engin.umich.edu/

CAEN HPC Flux pages

http://www.youtube.com/user/UMCoECAC

CAEN HPC YouTube channel

For assistance: [email protected]

Messages are handled by a team of people including unit support staff

They cannot help with programming questions, but can help with operational Flux and basic usage questions


Any Questions?

Charles J. Antonelli
LSAIT Advocacy and Research [email protected]://www.umich.edu/~cja
734 763 0607
