
Page 1: TTU High Performance Computing User Training: Part 2
Srirangam Addepalli and David Chaffin, Ph.D.

Advanced Session: Outline

Cluster Architecture

File System and Storage

Lectures with Labs:

Advanced Batch Jobs

Compilers/Libraries/Optimization

Compiling/Running Parallel Jobs

Grid Computing

Page 2: HPCC Clusters

hrothgar: 128 dual-processor 64-bit Xeon nodes, 3.2 GHz, 4 GB memory, Infiniband and Gigabit Ethernet, CentOS 4.3 (Red Hat)

Community cluster: 64 nodes, part of hrothgar, same hardware except no Infiniband. Owned by faculty members; access controlled by batch queues.

minigar: 20 nodes, 3.6 GHz, IB, for development; opening soon

Physics grid machine on order: some nodes available

poseidon: Opteron, 3 nodes, PathScale compilers

Several retired, test, and grid systems

Page 3: Cluster Performance

Main factors:

1. Individual node performance, of course. SPECfp_rate2000 (www.spec.org) matches our applications well. The newest dual-core processors have 2x the cores and ~1.5x the performance per core, for ~3x the per-node performance vs. hrothgar.

2. Fabric latency (delay time of one message, in microseconds: IB = 6, GE = 40)

3. Fabric bandwidth (in MB/s: IB = 600, GE = 60)

Intel has the better CPU right now; AMD has better shared-memory performance. Overall they are about equal.
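As a rough back-of-the-envelope illustration (a sketch, not from the slides), the time to move one message can be modeled as latency plus message size divided by bandwidth. The script below plugs the approximate IB and GE figures above into that model for an 8 KB message:

#!/bin/bash
# Rough model: message time = latency + size/bandwidth.
# Latency (us) and bandwidth (MB/s) are the slide's approximate figures;
# real numbers depend on the MPI stack and the message size.
for net in "IB 6 600" "GE 40 60"; do
  set -- $net
  awk -v name="$1" -v lat="$2" -v bw="$3" 'BEGIN {
    size = 8192                          # message size in bytes
    t = lat + size / (bw * 1e6) * 1e6    # total time in microseconds
    printf "%s: 8 KB message takes roughly %.0f us\n", name, t
  }'
done

With these numbers an 8 KB message costs roughly 20 us on IB versus roughly 180 us on Gig-E, which is why tightly coupled parallel jobs favor IB.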

Page 4: Cluster Architecture

An application example where the system is limited by interconnect performance:

GROMACS, measured as simulation time completed per unit of real time:

Hrothgar, 8 nodes, Gig-E: ~1200 ns/day

Hrothgar, 8 nodes, IB: ~2800 ns/day

Current dual-core systems have 3x the serial throughput of hrothgar, and quad-core systems are coming next year. They need more bandwidth: Gig-E will in the future be suitable only for serial jobs.

Page 5: Cluster Usage

ssh to hrothgar

scp files to hrothgar

compile on hrothgar

run on the compute nodes (only), using the LSF batch system (only)

example files: /home/shared/examples/
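A hedged sketch of that workflow (the hostname and username below are placeholders, not the cluster's real login address):

ssh username@hrothgar.example.ttu.edu             # log in to the head node
scp mycode.c username@hrothgar.example.ttu.edu:   # copy files up
# compile on hrothgar, then submit to the compute nodes through LSF:
bsub < myjob.sh                                   # do not run jobs on the head node itself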

Page 6: Some Useful LSF Commands

bjobs -w (-w, for wide, shows the full node name)

bjobs -l [job#] (-l, for long, shows everything)

bqueues [-l]: shows queues [everything]

bhist [job#]: job history

bpeek [job#]: stdout/stderr stored by LSF

bkill job#: kill it
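A typical monitoring sequence for a single job might look like this (1234 is a placeholder job number):

bjobs -w          # list your jobs, wide format with full node names
bjobs -l 1234     # everything LSF records about job 1234
bhist 1234        # submit/start/finish history
bpeek 1234        # stdout/stderr captured so far
bkill 1234        # kill the job if something is wrong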

-bash-3.00$ /home/shared/bin/check-hosts-batch.sh

hrothgar, 2 free=0 nodes, 0 cpus

hrothgar, 1 free=3 nodes, 3 cpus

hrothgar, 0 free=125 nodes

hrothgar, offline=0 nodes

Page 7: Batch Queues on hrothgar

bqueues

QUEUE_NAME     PRIO STATUS MAX  JL/U JL/P JL/H NJOBS PEND RUN
short            35 Open    56    56    -    -     0    0   0
parallel         35 Open   224    40    -    -   108    0 108
serial           30 Open   156    60    -    -   204  140  64
parallel_long    25 Open   256    64    -    -    16    0  16
idle             20 Open   256   256    -    -   100    0  55

Every 30 seconds the scheduler cycles through the queued jobs. A job starts if:

(1) Nodes are available (free, or running idle-queue jobs)

(2) The user's CPUs are under the per-user queue limit ("bqueues" JL/U)

(3) The queue's CPUs are under the total queue limit ("bqueues" MAX)

(4) Higher-priority queues go first (short, parallel, serial, parallel_long, idle)

(5) Fair share: the user with the smallest current usage goes first
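If a job sits in PEND, LSF can report which of these conditions is blocking it (the job number is a placeholder):

bjobs -p 1234          # pending jobs, with the scheduler's pending reasons
bqueues -l parallel    # per-queue limits (MAX, JL/U) and scheduling policy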

Page 8: Unix/Linux Compiling: Common Features

[compiler] [options] [source files] [linker options]    (PathScale compilers are only on poseidon)

C compilers: gcc, icc, pathcc

C++: g++, icpc, pathCC

Fortran: g77, ifort, pathf90

Options: -O [optimize] -o outputfilename

Source files: new.f or *.f or *.c

Linker options: To link with libx.a or libx.so in /home/elvis/lib:

-L/home/elvis/lib -lx

Many programs need: -lm, -pthread
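Putting the pieces together, a hedged example build line (the program name "prog" is made up for illustration; new.f and libx are the names used above):

# Compile new.f with optimization, name the executable prog, and link
# against libx in /home/elvis/lib plus the math library.
ifort -O -o prog new.f -L/home/elvis/lib -lx -lm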

Page 9: MPI Compile: Path

. /home/shared/examples/new-bashrc      [using bash]

source /home/shared/examples/new-cshrc [using tcsh]

hrothgar:dchaffin:dchaffin $ echo $PATH

/sbin:/bin:/usr/bin:/usr/sbin:/usr/X11R6/bin:\

/usr/share/bin:/opt/rocks/bin:/opt/rocks/sbin:\

/opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/bin:\

/opt/intel/fce/9.0/bin:/opt/intel/cce/9.0/bin:\

/share/apps/mpich/IB-icc-ifort-64/bin:\

/opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/bin

MPICH builds: IB or GE; icc, gcc, or pathcc; ifort, g77, or pathf90

mpicc/mpif77/mpif90/mpiCC must match mpirun!
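A quick sanity check after sourcing the example rc file: confirm that the wrapper and the launcher resolve to the same MPICH build (IB-icc-ifort-64 in the PATH above), since mixing builds is a common cause of jobs that hang at startup.

# Both should point into the same mpich directory,
# e.g. /share/apps/mpich/IB-icc-ifort-64/bin
which mpicc
which mpirun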

Page 10: MPI Compile/Run

cp /home/shared/examples/mpi-basic.sh .

cp /home/shared/examples/cpi.c .

/opt/mpich/gnu/bin/mpicc cpi.c [or]

/share/apps/mpich/IB-icc-ifort-64/bin/mpicc cpi.c

vi mpi-basic.sh

Check the ptile setting (processes per node) and comment out the mpirun line that you are not using (either IB or the default)

Could change executable name

bsub < mpi-basic.sh

produces:

job#.out      LSF output

job#.pgm.out  mpirun output

job#.err      LSF stderr

job#.pgm.err  mpirun stderr
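For reference, an LSF MPI job script of this kind is usually shaped like the sketch below. This is an assumption-laden outline, not the contents of the real mpi-basic.sh: the queue name, CPU count, and exact mpirun invocation depend on the local MPICH/LSF setup.

#!/bin/bash
# Sketch only; the real /home/shared/examples/mpi-basic.sh may differ.
#BSUB -J cpi                  # job name
#BSUB -q parallel             # queue (see the bqueues slide)
#BSUB -n 8                    # number of CPUs
#BSUB -o %J.out               # LSF stdout  -> job#.out
#BSUB -e %J.err               # LSF stderr  -> job#.err

# Use the mpirun that matches the mpicc that built a.out (here the
# IB-icc-ifort build); how ranks map onto the allocated nodes depends
# on the site's MPICH/LSF integration.
/share/apps/mpich/IB-icc-ifort-64/bin/mpirun -np 8 ./a.out \
    > "$LSB_JOBID.pgm.out" 2> "$LSB_JOBID.pgm.err"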

Page 11: Exercise/Homework

Run the MPI benchmark on Infiniband, Ethernet, and shared memory. Compare latency and bandwidth. Research and briefly discuss reasons for the performance:

Hardware bandwidth (look it up)

Software layers (OS, interrupts, MPI, one-sided copy, two-sided copy)

Hardware:

Topspin Infiniband SDR, PCI-X

Xeon Nocona shared memory

Intel Gigabit, on board

Program: /home/shared/examples/mpilc.c or equivalent
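One possible way to set up the three builds, using the compiler wrappers from the earlier slides (the gnu build is assumed to be the default Gig-E one; run each case through a batch script like the sketch on the previous page):

cp /home/shared/examples/mpilc.c .

# Infiniband build
/share/apps/mpich/IB-icc-ifort-64/bin/mpicc -O -o bench_ib mpilc.c

# Default (assumed Gigabit Ethernet) build
/opt/mpich/gnu/bin/mpicc -O -o bench_ge mpilc.c

# Shared memory: request both processes on one dual-processor node so the
# messages never cross the network; for IB and GE, run across two nodes.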