FAS Research Computing
Choosing Resources Wisely
Plamen Krastev
Office: 38 Oxford Street, Room 117
Email: [email protected]
Objectives
Inform you of available computational resources
Help you choose appropriate computational resources for your research
Provide guidance for scaling up your applications and performing computations more efficiently
More efficient use = more resources available to do research
Enable you to “Work smarter, better, faster”
Outline
Choosing computational resources
Overview of available RC resources
Partition / Queue
Time
Number of nodes and cores
Memory
Storage
Examples
What resources do I need?
Is my code serial or parallel?
How many cores and/or nodes does it need?
How much memory does it require?
How long does my code take to run?
How big is the Input / Output Data for each run?
How is the input data read by the code (e.g., hardcoded, keyboard, parameter/data file(s), external database/website, etc.)?
What resources do I need?
How is the output data written by the code (e.g., standard output/screen, data file(s), etc.)?
How many tasks/jobs/runs do I need to complete?
What is my timeframe / deadline for the project (e.g., paper, conference, thesis, etc.)?
What computational resources are available at Research Computing?
RC resources: Odyssey
Odyssey is a large-scale heterogeneous HPC cluster
Compute:
60,000+ compute cores (and increasing)
Cores per node: 8 to 64
Memory per node: 12GB to 512GB (4GB/core)
1,000,000+ NVIDIA GPU cores
Storage:
Over 35PB of storage
Home directories: 100GB
Lab space: initial 4TB at $0, with expansion available for purchase at $45/TB/year
Local scratch: 270GB/node
Global scratch: high-performance shared scratch, 1 PB total, Lustre file system
https://rc.fas.harvard.edu/resources/odyssey-storage
RC resources: Odyssey
Odyssey is a large-scale heterogeneous HPC cluster
Software:
CentOS
SLURM job manager
1,000+ scientific tools and programs, accessed via environment modules (see the example at the end of this slide)
https://portal.rc.fas.harvard.edu/apps/modules
Interconnect:
2 underlying networks connecting 3 data centers
TCP/IP network
Low-latency 56 Gb/s InfiniBand network: inter-node parallel computing, fast access to Lustre-mounted storage
Hosted Machines:
• 300+ virtual machines
• Lab instrument workstations
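The software modules used in the job-script examples later in this presentation are loaded like this (a minimal sketch; the MATLAB module shown is just one of the versions that appears in those examples):
source new-modules.sh
module avail matlab                 # list available versions of a package
module load matlab/R2016a-fasrc01   # load a specific version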
Available Storage
Home Directories
  Size limit: 100GB
  Availability: all cluster nodes + desktop/laptop
  Backup: hourly snapshot + daily offsite
  Retention policy: indefinite
  Performance: moderate; not suitable for high I/O
  Cost: free
Lab Storage
  Size limit: 4TB+
  Availability: all cluster nodes + desktop/laptop
  Backup: daily offsite
  Retention policy: indefinite
  Performance: moderate; not suitable for high I/O
  Cost: 4TB free + expansion at $45/TB/yr
Local Scratch
  Size limit: 270GB/node
  Availability: local compute node only
  Backup: none
  Retention policy: job duration
  Performance: suited for small-file I/O intensive jobs
  Cost: free
Global Scratch
  Size limit: 1.2PB total
  Availability: all cluster nodes
  Backup: none
  Retention policy: 90 days
  Performance: appropriate for large-file I/O intensive jobs
  Cost: free
Persistent Research Data
  Size limit: 3PB
  Availability: only IB-connected cluster nodes
  Backup: none (external repositories)
  Retention policy: 3-9 months
  Performance: appropriate for large I/O intensive jobs
  Cost: free
Partition / Queue
general: time limit 7 days; 177 nodes; 64 cores/node; 256 GB/node
serial_requeue: time limit 7 days; 1071 nodes; 8-64 cores/node; 12-512 GB/node
interact: time limit 3 days; 8 nodes; 64 cores/node; 256 GB/node
bigmem: no time limit; 7 nodes; 64 cores/node; 512 GB/node
unrestricted: no time limit; 8 nodes; 64 cores/node; 256 GB/node
Lab queues: no time limit; 1154 nodes; 8-64 cores/node; 12-512 GB/node
Batch jobs:
#SBATCH -p general # Partition name
Interactive or test jobs:
srun -p interact OTHER_OPTIONS
https://rc.fas.harvard.edu/resources/running-jobs/#SLURM_partitions
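The current partition limits can also be queried directly from SLURM (a minimal sketch; the format string is just one possible selection of fields):
sinfo -p general -o "%P %l %D %c %m"   # partition, time limit, node count, cores/node, memory/node (MB)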
Time
How long does my code take to run?
Batch jobs:
#SBATCH -p serial_requeue
#SBATCH -t 0-02:00 #Time in D-HH:MM
Interactive or test jobs:
srun -t 0-02:00 -p interact OTHER_JOB_OPTIONS
Number of nodes and cores
Is my code serial or parallel?
Serial (single-core) jobs
Batch jobs:
#SBATCH -p serial_requeue
#SBATCH -c 1 # Number of cores
Interactive or test jobs:
srun -c 1 -p interact OTHER_JOB_OPTIONS
Terminology: Core / Thread / Process / CPU
Number of nodes and cores
Parallel shared memory (single node) jobs
Examples:
• OpenMP (Fortran, C/C++)
• MATLAB Parallel Computing Toolbox (PCT)
• Python (e.g., threading, multiprocessing)
• R (e.g., multicore)
Batch jobs:
#SBATCH -p general # Partition
#SBATCH -N 1 # Number of nodes
#SBATCH -c 4 # Number of cores (per task)
srun -c 4 PROGRAM PROGRAM_OPTIONS
Interactive or test jobs:
srun -p interact -N 1 -c 4 OTHER_OPTIONS
Number of nodes and cores
Parallel distributed memory (multi-node) jobs
Examples:
• MPI (openmpi, impi, mvapich) with Fortran or C/C++ code
• MATLAB Distributed Computing Server (DCS)
• Python (e.g., mpi4py)
• R (e.g., Rmpi, snow)
Batch jobs:
#SBATCH -p general # Partition
#SBATCH -n 4 # Number of tasks
srun -n 4 PROGRAM PROGRAM_OPTIONS
Interactive or test jobs:
srun -p interact -n 4 OTHER_OPTIONS
Memory
Serial and parallel shared memory (single node) jobs
Batch jobs:
#SBATCH -p serial_requeue # Partition
#SBATCH --mem=4000 # Memory / node in MB
Interactive or test jobs:
srun --mem=4000 -p interact OTHER_OPTIONS
Parallel distributed memory (multi-node) jobs
Batch jobs:
#SBATCH -p general # Partition
#SBATCH -n 4 # Number of tasks
#SBATCH --mem-per-cpu=4000 # Memory / core in MB
Interactive or test jobs:
srun --mem-per-cpu=4000 -n 4 -p interact OTHER_OPTIONS
Memory
How much memory does my code require?
• Understand your code and how the algorithms scale analytically
• Run an interactive job and monitor memory usage (with the “top” Unix command)
• Run a test batch job and check memory usage after the job has completed (with the “sacct” SLURM command)
Memory
Know your code
Example:
A real*8 (Fortran) or double (C/C++) matrix of dimension 100,000 x 100,000 requires ~80GB of RAM

Data type (Fortran / C)        Bytes
integer*4 / int                    4
integer*8 / long                   8
real*4 / float                     4
real*8 / double                    8
complex*8 / float complex          8
complex*16 / double complex       16
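A quick way to sanity-check this arithmetic from the command line (a minimal sketch using the standard Unix bc calculator):
echo "100000 * 100000 * 8 / 1000^3" | bc   # prints 80, i.e., ~80 GB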
Memory
Run an interactive job and monitor memory usage (with the “top” Unix command)
Example: Check the memory usage of a matrix diagonalization code
Request an interactive bash shell session:
srun -p interact -n 1 -t 0-02:00 --pty --mem=4000 bash
Run the code, e.g.,
./matrix_diag.x
Open a new shell terminal and ssh to the compute node where the interactive job was dispatched, e.g.,
ssh holy2a18307
In the new shell terminal run top, e.g.,
top -u pkrastev
Memory
Run 1:
Matrix dimension = 3000 x 3000 (real*8)
Needs 3000 x 3000 x 8 bytes = 72,000,000 bytes, i.e., ~72 MB of RAM
Memory
Run 2: Input size changed
Doubling the matrix dimension quadruples the required memory
Matrix dimension = 6000 x 6000 (real*8)
Needs 6000 x 6000 x 8 bytes = 288,000,000 bytes, i.e., ~288 MB of RAM
sacct overview
• sacct queries the SLURM accounting database
  – every 30 seconds the node collects the CPU and memory usage of all process IDs belonging to a given job; after the job ends these data are sent to the SLURM accounting database (slurmdbd)
• Common flags
  – -j jobid or --name=jobname
  – -S YYYY-MM-DD and -E YYYY-MM-DD
  – -o output_options (see the example below)
Useful output fields (-o):
JobID, JobName, NCPUS, NNodes, Submit, Start, End, CPUTime, TotalCPU, ReqMem, MaxRSS, MaxVMSize, State, ExitCode, NodeList
http://slurm.schedmd.com/sacct.html
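For example, to report the cores, elapsed time, memory request, and peak memory use of the job examined on the next slide (the field selection is illustrative):
sacct -j 70446364 -o JobID,JobName,NCPUS,Elapsed,CPUTime,ReqMem,MaxRSS,State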
Memory
Run a test batch job and check memory usage after the job has completed (with the “sacct” SLURM command)
Example:
[pkrastev@sa01 Resources]$ sacct -o ReqMem,MaxRSS -j 70446364
    ReqMem     MaxRSS
---------- ----------
     320Mn    286648K
That is,
MaxRSS = 286648KB ≈ 286.6MB
ReqMem = 320MB per node (“Mn”), roughly 10% above MaxRSS
https://rc.fas.harvard.edu/resources/faq/how-to-know-what-memory-limit-to-put-on-my-job
Storage
Home directories (/n/home*) and Lab storage are not appropriate for I/O-intensive jobs or large numbers of jobs. Typical uses are job scripts, in-house analysis codes, and self-installed software.
For jobs that create a high volume of small files (< 10 MB), use local scratch. You need to copy your input data to /scratch and move the output data to a different location after the job completes (see the sketch below).
For I/O-intensive jobs with large data files (> 100 MB) and/or a large number of data files (100s of 10-100MB files), use the global scratch file system /n/regal.
https://rc.fas.harvard.edu/policy-scratch
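A minimal job-script sketch of staging data through local scratch (the directory layout, file names, and program name are hypothetical):
# Stage input to node-local scratch
mkdir -p /scratch/$USER/$SLURM_JOB_ID
cp $HOME/project/input.dat /scratch/$USER/$SLURM_JOB_ID/
cd /scratch/$USER/$SLURM_JOB_ID
# Run the analysis against local scratch
./my_analysis.x input.dat > output.dat
# Move results back and clean up before the job ends
cp output.dat $HOME/project/results/
rm -rf /scratch/$USER/$SLURM_JOB_ID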
Storage
60 Oxford St
  Initial Lab shares (4TB)
  Legacy equipment
1 Summer Street
  Personal home directories
  Purchased lab shares
  Older Lab-owned compute nodes
Holyoke, MA
  Global scratch high-performance file system
  Compute nodes newer than 2012 (33K+ cores)
Topology may affect the efficiency of your work! For best performance, storage needs to be close to the compute nodes.
Storage Utilization
Use the “du” Unix command to check disk usage, e.g.,
du -h $HOME
...
37G /n/home06/pkrastev
https://en.wikipedia.org/wiki/Du_(Unix)
Examples
Serial application

#!/bin/bash
#SBATCH -J lapack_test
#SBATCH -o lapack_test.out
#SBATCH -e lapack_test.err
#SBATCH -p serial_requeue
#SBATCH -t 0-00:30
#SBATCH -N 1
#SBATCH -c 1
#SBATCH --mem=4000
# Load required modules
source new-modules.sh
# Run program
./lapack_test.x
Examples
Parallel OpenMP (single-node) application

#!/bin/bash
#SBATCH -J omp_dot
#SBATCH -o omp_dot.out
#SBATCH -e omp_dot.err
#SBATCH -p general
#SBATCH -t 0-02:00
#SBATCH -N 1
#SBATCH -c 4
#SBATCH --mem=16000
# Set up environment
source new-modules.sh
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Run program
srun -c $SLURM_CPUS_PER_TASK ./omp_dot.x
Examples
MATLAB Parallel Computing Toolbox (single-node) application

#!/bin/bash
#SBATCH -J parallel_monte_carlo
#SBATCH -o parallel_monte_carlo.out
#SBATCH -e parallel_monte_carlo.err
#SBATCH -N 1
#SBATCH -c 8
#SBATCH -t 0-03:30
#SBATCH -p general
#SBATCH --mem=32000
# Load required software modules
source new-modules.sh
module load matlab/R2016a-fasrc01
# Run program
srun -n 1 -c 8 matlab-default -nosplash -nodesktop -r "parallel_monte_carlo;exit"
Examples
Parallel MPI (multi-node) application

#!/bin/bash
#SBATCH -J planczos
#SBATCH -o planczos.out
#SBATCH -e planczos.err
#SBATCH -p general
#SBATCH -t 30
#SBATCH -n 8
#SBATCH --mem-per-cpu=4000
# Load required modules
source new-modules.sh
module load intel/15.0.0-fasrc01
module load openmpi/1.8.3-fasrc02
# Run program
srun -n 8 --mpi=pmi2 ./planczos.x
https://github.com/fasrc/User_Codes
Test first
• Before diving into submitting 100s or 1000s of research jobs, ALWAYS test a few first:
  – ensure the job runs to completion without errors
  – ensure you understand the resource needs and how they scale with different data sizes and input options (see the sketch below)
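A minimal workflow sketch (the script name is hypothetical; the sacct fields follow the earlier examples):
sbatch test_job.sbatch                                  # submit a single test job
sacct -j JOBID -o JobID,Elapsed,ReqMem,MaxRSS,State     # after it finishes, check run time and memory
# adjust -t and --mem in the script accordingly, then submit the full set of jobs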
Contact Information
Harvard Research Computing Website:
http://rc.fas.harvard.edu
Email:
Office Hours:
Wednesdays noon – 3pm
38 Oxford Street, 2nd Floor Conference Room