High Performance Computing on Flux
EEB 401
Charles J Antonelli
Mark Champe
LSAIT ARS
September, 2014
cja 2014 2
Flux
Flux is a university-wide shared computational discovery / high-performance computing service.
Provided by Advanced Research Computing at U-M
Operated by CAEN HPC
Procurement, licensing, billing by U-M ITS
Interdisciplinary since 2010
9/14
http://arc.research.umich.edu/resources-services/flux/
The Flux cluster
[Diagram: login nodes, compute nodes, data transfer node, storage …]
A Flux node
12 or 16 Intel cores
48 or 64 GB RAM
Local disk
Network
Programming Models
Two basic parallel programming models
Message-passing
The application consists of several processes running on different nodes and communicating with each other over the network
Used when the data are too large to fit on a single node, and simple synchronization is adequate
“Coarse parallelism”
Implemented using MPI (Message Passing Interface) libraries
Multi-threaded
The application consists of a single process containing several parallel threads that communicate with each other using synchronization primitives
Used when the data can fit into a single process, and the communications overhead of the message-passing model is intolerable
“Fine-grained parallelism” or “shared-memory parallelism”
Implemented using OpenMP (Open Multi-Processing) compilers and libraries
Both
Command Line Reference
William E Shotts, Jr., “The Linux Command Line: A Complete Introduction,” No Starch Press, January 2012. http://linuxcommand.org/tlcl.php
Download Creative Commons Licensed version at http://downloads.sourceforge.net/project/linuxcommand/TLCL/13.07/TLCL-13.07.pdf
Using Flux
Three basic requirements:
A Flux login account
A Flux allocation
An MToken (or a Software Token)
Logging in to Flux
ssh login@flux-login.engin.umich.edu
Campus wired or MWireless
VPN
ssh login.itd.umich.edu first
Copying data
Three ways to copy data to/from Flux
From Linux or Mac OS X, use scp:
scp localfile login@flux-xfer.engin.umich.edu:remotefile
scp login@flux-login.engin.umich.edu:remotefile localfile
scp -r localdir login@flux-xfer.engin.umich.edu:remotedir
From Windows, use WinSCP
U-M Blue Disc http://www.itcs.umich.edu/bluedisc/
Use Globus Connect
Globus Online
Features
High-speed data transfer, much faster than scp or WinSCP
Reliable & persistent
Minimal client software: Mac OS X, Linux, Windows
GridFTP Endpoints
Gateways through which data flow
Exist for XSEDE, OSG, …
UMich: umich#flux, umich#nyx
Add your own client endpoint!
Add your own server endpoint: contact flux-support@umich.edu
More information
http://cac.engin.umich.edu/resources/login-nodes/globus-gridftp
Batch workflow
You create a batch script and submit it to PBS (the cluster resource manager & scheduler)
PBS schedules your job, and it enters the flux queue
When its turn arrives, your job will execute the batch script
Your script has access to any applications or data stored on the Flux cluster
When your job completes, anything it sent to standard output and error is saved and returned to you
You can check on the status of your job at any time, or delete it if it’s not doing what you want
A short time after your job completes, its entry disappears from the queue
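The "check on the status at any time" step can be automated with a small polling loop. This is a minimal sketch, assuming that `qstat jobid` exits with a non-zero status once the job is no longer known to the scheduler (usual for PBS/Torque, but worth confirming on Flux). The function name `wait_for_job` and the default 30-second interval are illustrative and not part of Flux.

```shell
#!/bin/sh
# wait_for_job JOBID [INTERVAL_SECONDS]
# Polls qstat until the scheduler no longer lists the job, then returns.
# Assumption: qstat exits non-zero for a job that has left the queue.
wait_for_job() {
    jobid=$1
    interval=${2:-30}
    while qstat "$jobid" >/dev/null 2>&1; do
        sleep "$interval"
    done
    echo "job $jobid is no longer in the queue"
}

# Typical use on a login node (not run here):
#   jobid=$(qsub scriptfile)
#   wait_for_job "$jobid" 60
```

Because a finished job stays listed for a short time, the loop may return slightly after the job has actually completed. Keep the interval generous so the scheduler is not queried more often than necessary.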
Basic batch commands
Once you have a script, submit it:
qsub scriptfile
$ qsub singlenode.pbs
6023521.nyx.engin.umich.edu
You can check on the job status:
qstat jobid
qstat -u user
$ qstat -u cja
nyx.engin.umich.edu:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
6023521.nyx.engi     cja      flux     hpc101i              --     1   1     -- 00:05 Q    --
To delete your job
qdel jobid
$ qdel 6023521
$
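In the transcript above, qsub prints the full identifier (6023521.nyx.engin.umich.edu) while qdel is given only the leading number. A minimal sketch of capturing one from the other in a script follows; the helper name `job_number` is made up for illustration.

```shell
#!/bin/sh
# job_number FULL_ID
# Strips everything from the first dot onward, so that
# 6023521.nyx.engin.umich.edu becomes 6023521.
job_number() {
    echo "${1%%.*}"
}

job_number 6023521.nyx.engin.umich.edu

# On a login node the pieces would fit together like this (not run here):
#   full_id=$(qsub singlenode.pbs)
#   qstat "$(job_number "$full_id")"
#   qdel  "$(job_number "$full_id")"
```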
Loosely-coupled batch script
#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l procs=12,pmem=1gb,walltime=01:00:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe
#Your Code Goes Below:
cd $PBS_O_WORKDIR
mpirun ./c_ex01
Tightly-coupled batch script
#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l nodes=1:ppn=12,mem=47gb,walltime=02:00:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe
#Your Code Goes Below:
cd $PBS_O_WORKDIR
matlab -nodisplay -r script
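The two scripts request memory in different ways. The loosely-coupled one uses pmem, a per-process limit in PBS/Torque, and the tightly-coupled one uses mem, a limit for the whole job. The sketch below works through the arithmetic. The helper names are invented for illustration, and the 48 GB figure is the smaller node size from the "A Flux node" slide.

```shell
#!/bin/sh
# total_pmem_gb PROCS PMEM_GB
# pmem applies to each process, so the job as a whole may use PROCS * PMEM_GB.
total_pmem_gb() {
    echo $(( $1 * $2 ))
}

# fits_on_node MEM_GB NODE_RAM_GB
# Succeeds when a whole-job mem request does not exceed one node's RAM.
fits_on_node() {
    [ "$1" -le "$2" ]
}

echo "procs=12,pmem=1gb requests $(total_pmem_gb 12 1) GB in total"

if fits_on_node 47 48; then
    echo "mem=47gb fits on a 48 GB node"
fi
```

Requesting 47 GB rather than the full 48 GB leaves a little room for the operating system on the node. That is presumably why the script stops just short of the node's physical RAM.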
Flux software
Licensed and open software:
Abacus, BLAST, BWA, bowtie, ANSYS, Java, Mason, Mathematica, Matlab, R, RSEM, STATA SE, …
See http://cac.engin.umich.edu/resources
C, C++, Fortran compilers:
Intel (default), PGI, GNU toolchains
You can choose software using the module command
Modules
The module command allows you to specify what versions of software you want to use
module list -- Show loaded modules
module load name -- Load module name for use
module show name -- Show info for name
module avail -- Show all available modules
module avail name -- Show versions of module name*
module unload name -- Unload module name
module -- List all options
Enter these commands at any time during your session
A configuration file allows default module commands to be executed at login
Put module commands in file ~/privatemodules/default
Don’t put module commands in your .bashrc / .bash_profile
Flux storage
Lustre filesystem mounted on /scratch on all login, compute, and transfer nodes
640 TB of short-term storage for batch jobs
Large, fast, short-term
NFS filesystems mounted on /home and /home2 on all nodes
80 GB of storage per user for development & testing
Small, slow, long-term
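One common way to use the two filesystems together is to copy inputs from /home to /scratch before a job and copy results back afterwards. The sketch below shows that pattern with a single helper, `copy_tree`, whose name is made up for illustration. The /scratch path in the comments is a hypothetical layout and not a documented Flux convention. The runnable part uses temporary directories, so it can be tried anywhere.

```shell
#!/bin/sh
# copy_tree SRC_DIR DEST_DIR
# Copies the contents of SRC_DIR into DEST_DIR, creating DEST_DIR if needed.
copy_tree() {
    mkdir -p "$2" && cp -R "$1"/. "$2"/
}

# On Flux the calls might look like this (paths are hypothetical):
#   work=/scratch/youralloc_flux/$USER/run1
#   copy_tree "$HOME/project/input" "$work"              # stage in
#   ... run the batch job from "$work" ...
#   copy_tree "$work/results" "$HOME/project/results"    # stage out

# Self-contained demonstration with temporary directories.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "sample input" > "$src/input.txt"
copy_tree "$src" "$dst/run1"
ls "$dst/run1"
rm -rf "$src" "$dst"
```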
Flux environment
The Flux login nodes have the standard GNU/Linux toolkit:
make, perl, python, java, emacs, vi, nano, …
Watch out for source code or data files written on non-Linux systems
Use these tools to analyze and convert source files to Linux format
file
dos2unix
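The usual problem with files written on non-Linux systems is DOS-style line endings (carriage return plus newline). The sketch below detects them and converts the file. The helper names `has_crlf` and `to_unix` are illustrative. The fallback to tr is an assumption added so the example also runs where dos2unix is not installed.

```shell
#!/bin/sh
# has_crlf FILE
# Succeeds if any line in FILE ends with a carriage return.
has_crlf() {
    cr=$(printf '\r')
    grep -q "${cr}\$" "$1"
}

# to_unix FILE
# Converts DOS line endings in place. Uses dos2unix when it is installed;
# otherwise falls back to tr, which removes every carriage return in the file.
to_unix() {
    if command -v dos2unix >/dev/null 2>&1; then
        dos2unix "$1" >/dev/null 2>&1
    else
        tr -d '\r' < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    fi
}

# Demonstration on a temporary file with DOS line endings.
sample=$(mktemp)
printf 'line one\r\nline two\r\n' > "$sample"
if has_crlf "$sample"; then
    to_unix "$sample"
fi
cat "$sample"
rm -f "$sample"
```

Running `file` on such a file typically reports "with CRLF line terminators", which is a quick manual check before converting.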
BLAST
Load modules
module unload intel-comp openmpi gcc
module load med python/3.2.3 gcc boost/1.54.0-gcc ncbi-blast/2.2.29
Create file ~/.ncbirc , with contents
[BLAST]
BLASTDB=/nfs/med-ref-genomes/blast
Copy sample code to your home directory
cd
cp ~cja/hpc/eeb401-sample-code.tar.gz .
tar -zxvf eeb401-sample-code.tar.gz
cd ./eeb401-sample-code
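The ~/.ncbirc file can also be created from the command line instead of an editor. This is a minimal sketch using a here-document. The function name `write_ncbirc` and its optional target argument are illustrative, added so the function can be tried on a scratch file before it touches the real one.

```shell
#!/bin/sh
# write_ncbirc [TARGET]
# Writes the two-line BLAST configuration to TARGET.
# TARGET defaults to ~/.ncbirc; any existing file there is overwritten.
write_ncbirc() {
    target=${1:-"$HOME/.ncbirc"}
    cat > "$target" <<'EOF'
[BLAST]
BLASTDB=/nfs/med-ref-genomes/blast
EOF
}

# Try it on a scratch file first (not run here):
#   write_ncbirc /tmp/ncbirc.test && cat /tmp/ncbirc.test
# Then create the real file:
#   write_ncbirc
```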
BLAST
Examine blast-example.pbs
Edit with your favorite Linux editor
emacs, vi, pico, …
Change email address bjensen@umich.edu to your own
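If you prefer not to open an editor, the address can be swapped with sed. This is a sketch under the assumption that the placeholder appears literally as bjensen@umich.edu in the script, most likely on a `#PBS -M` line as in the batch scripts shown earlier. The function name `set_job_email` is illustrative.

```shell
#!/bin/sh
# set_job_email FILE NEW_ADDRESS
# Replaces the placeholder address bjensen@umich.edu in FILE.
# Writes to a temporary file and moves it into place, which avoids relying
# on sed -i (its syntax differs between GNU and BSD sed).
# NEW_ADDRESS must not contain "/" or "&", which are special to sed here.
set_job_email() {
    file=$1
    new=$2
    sed "s/bjensen@umich\.edu/$new/g" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Typical use in the sample-code directory (not run here):
#   set_job_email blast-example.pbs youremailaddress
#   grep -- '-M' blast-example.pbs
```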
BLAST
Submit your job to Flux
qsub blast-example.pbs
Watch the progress of your job
qstat jobid
When complete, look at the job’s output
less blast-example.ojobid
BWA
module load med samtools
module load med ncbi-blast
module load med bowtie # optional
module load med bwa
Bowtie
module load med bowtie
RSEM
module load R/3.0.1
module load lsa rsem
module load med bowtie
Note: loading R/3.0.1 unloads gcc/4.7.0 and loads gcc/4.4.6
Perl scripts
module load lsa baucom-bioinformatics
module show baucom-bioinformatics
Interactive jobs
You can submit jobs interactively:
qsub -I -X -V -l procs=2 -l walltime=15:00 -A youralloc_flux -l qos=flux -q flux
This queues a job as usual
Your terminal session will be blocked until the job runs
When your job runs, you'll get an interactive shell on one of your nodes
Invoked commands will have access to all of your nodes
When you exit the shell your job is deleted
Interactive jobs allow you to
Develop and test on cluster node(s)
Execute GUI tools on a cluster node
Utilize a parallel debugger interactively
Interactive BLAST
Load modules:
module unload gcc openmpi
module load med gcc ncbi-blast
Start an interactive PBS session
qsub -I -V -l nodes=1:ppn=2 -l walltime=1:00:00 -A eeb401f14_flux -l qos=flux -q flux
Run BLAST in the interactive shell
cd $PBS_O_WORKDIR
blastdbcmd -db refseq_rna -entry nm_000249 -out test_query.fa
blastn -query test_query.fa -db refseq_rna -task blastn -dust no -outfmt 7 -num_alignments 2 -num_descriptions 2 -num_threads 2
Gaining insight
There are several commands you can run to get some insight into when your job will start:
freenodes : shows the total number of free nodes and cores currently available on Flux
mdiag -a youralloc_name : shows cores and memory defined for your allocation and who can run against it
showq -w acct=yourallocname : shows cores being used by jobs running against your allocation (running/idle/blocked)
checkjob -v jobid : can show why your job might not be starting
showstart -e all jobid : gives you a coarse estimate of job start time; use the smallest value returned
Some Flux Resources
http://arc.research.umich.edu/resources-services/flux/
U-M Advanced Research Computing Flux pages
http://cac.engin.umich.edu/
CAEN HPC Flux pages
http://www.youtube.com/user/UMCoECAC
CAEN HPC YouTube channel
For assistance: hpc-support@umich.edu
Read by a team of people including unit support staff
Cannot help with programming questions, but can help with operational Flux and basic usage questions
Any Questions?
Charles J. Antonelli
LSAIT Advocacy and Research Support
cja@umich.edu
http://www.umich.edu/~cja
734 763 0607