

High Performance Computing on Flux

EEB 401

Charles J Antonelli

Mark Champe

LSAIT ARS

September, 2014

cja 2014 2

Flux

Flux is a university-wide shared computational discovery / high-performance computing service.

Provided by Advanced Research Computing at U-M

Operated by CAEN HPC

Procurement, licensing, billing by U-M ITS

Interdisciplinary since 2010

9/14

http://arc.research.umich.edu/resources-services/flux/


The Flux cluster

[Diagram: login nodes, compute nodes, storage, …, data transfer node]


A Flux node

12 or 16 Intel cores

48 or 64 GB RAM

Local disk

Network


Programming Models

Two basic parallel programming models

Message-passing

The application consists of several processes running on different nodes and communicating with each other over the network

Used when the data are too large to fit on a single node, and simple synchronization is adequate

“Coarse parallelism”

Implemented using MPI (Message Passing Interface) libraries

Multi-threaded

The application consists of a single process containing several parallel threads that communicate with each other using synchronization primitives

Used when the data can fit into a single process, and the communications overhead of the message-passing model is intolerable

“Fine-grained parallelism” or “shared-memory parallelism”

Implemented using OpenMP (Open Multi-Processing) compilers and libraries

Both


Command Line Reference

William E Shotts, Jr., “The Linux Command Line: A Complete Introduction,” No Starch Press, January 2012. http://linuxcommand.org/tlcl.php

Download the Creative Commons licensed version at http://downloads.sourceforge.net/project/linuxcommand/TLCL/13.07/TLCL-13.07.pdf


Using Flux

Three basic requirements:
A Flux login account
A Flux allocation
An MToken (or a Software Token)

Logging in to Flux
ssh login@flux-login.engin.umich.edu
Campus wired or MWireless
VPN
ssh login.itd.umich.edu first


Copying data

Three ways to copy data to/from Flux

From Linux or Mac OS X, use scp:
scp localfile login@flux-xfer.engin.umich.edu:remotefile
scp login@flux-login.engin.umich.edu:remotefile localfile
scp -r localdir login@flux-xfer.engin.umich.edu:remotedir

From Windows, use WinSCP

U-M Blue Disc
http://www.itcs.umich.edu/bluedisc/

Use Globus Connect


Globus Online

Features

High-speed data transfer, much faster than scp or WinSCP

Reliable & persistent

Minimal client software: Mac OS X, Linux, Windows

GridFTP Endpoints

Gateways through which data flow

Exist for XSEDE, OSG, …

UMich: umich#flux, umich#nyx

Add your own client endpoint!

Add your own server endpoint: contact flux-support@umich.edu

More information

http://cac.engin.umich.edu/resources/login-nodes/globus-gridftp


Batch workflow

You create a batch script and submit it to PBS (the cluster resource manager & scheduler)

PBS schedules your job, and it enters the flux queue

When its turn arrives, your job will execute the batch script

Your script has access to any applications or data stored on the Flux cluster

When your job completes, anything it sent to standard output and standard error is saved and returned to you

You can check on the status of your job at any time, or delete it if it’s not doing what you want

A short time after your job completes, it disappears


Basic batch commands

Once you have a script, submit it:
qsub scriptfile

$ qsub singlenode.pbs
6023521.nyx.engin.umich.edu

You can check on the job status:
qstat jobid
qstat -u user

$ qstat -u cja

nyx.engin.umich.edu:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
6023521.nyx.engi     cja      flux     hpc101i              --     1   1     -- 00:05 Q    --

To delete your job
qdel jobid

$ qdel 6023521
$
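The job ID that qsub prints can be captured in a shell variable and reused with qstat and qdel. The following is a minimal sketch, assuming the ID has the form number.server as in the example above. The helper name short_jobid is hypothetical and not part of PBS.

```shell
# short_jobid: hypothetical helper, not a PBS command.
# Strips the server suffix from a full job ID such as
# 6023521.nyx.engin.umich.edu, leaving only the numeric part.
short_jobid() {
    printf '%s\n' "${1%%.*}"
}

# On Flux the ID would be captured at submission time:
#   full_id=$(qsub singlenode.pbs)
# The ID from the example above is used here so the sketch runs anywhere.
full_id="6023521.nyx.engin.umich.edu"
jobid=$(short_jobid "$full_id")

# Print the commands you would run next on the cluster.
echo "status: qstat $jobid"
echo "delete: qdel $jobid"
```

Keeping the ID in a variable avoids retyping it when checking on or deleting a job.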


Loosely-coupled batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l procs=12,pmem=1gb,walltime=01:00:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

#Your Code Goes Below:
cd $PBS_O_WORKDIR
mpirun ./c_ex01


Tightly-coupled batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l nodes=1:ppn=12,mem=47gb,walltime=02:00:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

#Your Code Goes Below:
cd $PBS_O_WORKDIR
matlab -nodisplay -r script
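The loosely-coupled and tightly-coupled scripts request memory differently: pmem is a per-process limit, while mem applies to the job as a whole. The following is a rough sketch of the arithmetic, assuming one process per requested core and 1 GB = 1024 MB. The variable names are illustrative only.

```shell
# Loosely-coupled request: procs=12,pmem=1gb (memory per process).
procs=12
pmem_mb=1024
loose_total_mb=$((procs * pmem_mb))

# Tightly-coupled request: nodes=1:ppn=12,mem=47gb (memory for the whole job).
ppn=12
mem_mb=$((47 * 1024))
tight_per_core_mb=$((mem_mb / ppn))   # integer division, rounds down

echo "loosely-coupled total: ${loose_total_mb} MB"
echo "tightly-coupled per core: ${tight_per_core_mb} MB"
```

The 47 GB request presumably leaves a little headroom on a node with 48 GB of RAM.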


Flux software

Licensed and open software:

Abacus, BLAST, BWA, bowtie, ANSYS, Java, Mason, Mathematica, Matlab, R, RSEM, STATA SE, …

See http://cac.engin.umich.edu/resources

C, C++, Fortran compilers:

Intel (default), PGI, GNU toolchains

You can choose software using the module command


Modules

The module command allows you to specify what versions of software you want to use

module list -- Show loaded modules
module load name -- Load module name for use
module show name -- Show info for name
module avail -- Show all available modules
module avail name -- Show versions of module name*
module unload name -- Unload module name
module -- List all options

Enter these commands at any time during your session

A configuration file allows default module commands to be executed at login

Put module commands in file ~/privatemodules/default

Don’t put module commands in your .bashrc / .bash_profile


Flux storage

Lustre filesystem mounted on /scratch on all login, compute, and transfer nodes

640 TB of short-term storage for batch jobs

Large, fast, short-term

NFS filesystems mounted on /home and /home2 on all nodes

80 GB of storage per user for development & testing

Small, slow, long-term


Flux environment

The Flux login nodes have the standard GNU/Linux toolkit:

make, perl, python, java, emacs, vi, nano, …

Watch out for source code or data files written on non-Linux systems

Use these tools to analyze and convert source files to Linux format

file

dos2unix
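Files written on Windows typically end each line with a carriage return plus a line feed, which can confuse Linux tools. The following is a small sketch of detecting and stripping the carriage returns with standard utilities. The file names and the has_cr helper are made up for illustration, and tr is used as a blunt stand-in for dos2unix so the sketch runs where dos2unix is not installed.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)

# Create a demo file with DOS (CRLF) line endings.
printf 'line one\r\nline two\r\n' > "$workdir/dos_sample.txt"

# has_cr: hypothetical helper; succeeds if the file contains a carriage return.
has_cr() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Remove every carriage return and write the result to a new file.
tr -d '\r' < "$workdir/dos_sample.txt" > "$workdir/unix_sample.txt"

if has_cr "$workdir/dos_sample.txt"; then echo "dos_sample.txt: CR found"; fi
if ! has_cr "$workdir/unix_sample.txt"; then echo "unix_sample.txt: clean"; fi
```

On a Flux login node, running file on a text file reports its line-terminator style, and dos2unix converts the named file in place.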


BLAST

Load modules

module unload intel-comp openmpi gcc
module load med python/3.2.3 gcc boost/1.54.0-gcc ncbi-blast/2.2.29

Create file ~/.ncbirc , with contents

[BLAST]
BLASTDB=/nfs/med-ref-genomes/blast

Copy sample code to your home directory

cd
cp ~cja/hpc/eeb401-sample-code.tar.gz .
tar -zxvf eeb401-sample-code.tar.gz
cd ./eeb401-sample-code


BLAST

Examine blast-example.pbs

Edit with your favorite Linux editor

emacs, vi, pico, …

Change email address bjensen@umich.edu to your own


BLAST

Submit your job to Flux
qsub blast-example.pbs

Watch the progress of your job
qstat jobid

When complete, look at the job’s output
less blast-example.ojobid
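The output file name combines the job name with the numeric part of the job ID, as in blast-example.ojobid. The following is a sketch of building that name in the shell. The helper output_file is hypothetical, and the sketch assumes the job is named blast-example and that standard output and error are joined into one file, as the batch scripts shown earlier do with #PBS -j oe.

```shell
# output_file: hypothetical helper that builds the name PBS is assumed to
# use for a job's output file: <jobname>.o<numeric job ID>.
output_file() {
    jobname=$1
    full_id=$2
    printf '%s.o%s\n' "$jobname" "${full_id%%.*}"
}

# Example using the job ID format shown earlier in these slides.
outfile=$(output_file blast-example 6023521.nyx.engin.umich.edu)

# Print the command you would run to view the output.
echo "less $outfile"
```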


BWA

module load med samtools
module load med ncbi-blast
module load med bowtie # optional
module load med bwa


Bowtie

module load med bowtie


RSEM

module load R/3.0.1
module load lsa rsem
module load med bowtie

Note: loading R/3.0.1 unloads gcc/4.7.0 and loads gcc/4.4.6


Perl scripts

module load lsa baucom-bioinformatics
module show baucom-bioinformatics


Interactive jobs

You can submit jobs interactively:

qsub -I -X -V -l procs=2 -l walltime=15:00 -A youralloc_flux -l qos=flux -q flux

This queues a job as usual

Your terminal session will be blocked until the job runs

When your job runs, you'll get an interactive shell on one of your nodes

Invoked commands will have access to all of your nodes

When you exit the shell your job is deleted

Interactive jobs allow you to

Develop and test on cluster node(s)

Execute GUI tools on a cluster node

Utilize a parallel debugger interactively


Interactive BLAST

Load modules:

module unload gcc openmpi
module load med gcc ncbi-blast

Start an interactive PBS session

qsub -I -V -l nodes=1:ppn=2 -l walltime=1:00:00 -A eeb401f14_flux -l qos=flux -q flux

Run BLAST in the interactive shell

cd $PBS_O_WORKDIR
blastdbcmd -db refseq_rna -entry nm_000249 -out test_query.fa
blastn -query test_query.fa -db refseq_rna -task blastn -dust no -outfmt 7 -num_alignments 2 -num_descriptions 2 -num_threads 2


Gaining insight

There are several commands you can run to get some insight into when your job will start:

freenodes : shows the total number of free nodes and cores currently available on Flux

mdiag -a youralloc_name : shows cores and memory defined for your allocation and who can run against it

showq -w acct=yourallocname: shows cores being used by jobs running against your allocation (running/idle/blocked)

checkjob -v jobid : Can show why your job might not be starting

showstart -e all jobid : Gives you a coarse estimate of job start time; use the smallest value returned


Some Flux Resources

http://arc.research.umich.edu/resources-services/flux/

U-M Advanced Research Computing Flux pages

http://cac.engin.umich.edu/

CAEN HPC Flux pages

http://www.youtube.com/user/UMCoECAC

CAEN HPC YouTube channel

For assistance: hpc-support@umich.edu

Read by a team of people including unit support staff

Cannot help with programming questions, but can help with operational Flux and basic usage questions


Any Questions?

Charles J. Antonelli
LSAIT Advocacy and Research Support
cja@umich.edu
http://www.umich.edu/~cja
734 763 0607
