introduction to hpcc for cem956

100
Introduction to HPCC for CEM956 Chun-Min Chang Research Consultant Institute for Cyber-Enabled Research Download this presentation: https://wiki.hpcc.msu.edu/display/TEAC/2016-09-13+CSM+956

Upload: others

Post on 09-Feb-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to HPCC for CEM956

Introduction to HPCC for CEM956

Chun-Min ChangResearch Consultant

Institute for Cyber-Enabled Research

Download this presentation: https://wiki.hpcc.msu.edu/display/TEAC/2016-09-13+CSM+956

Page 2: Introduction to HPCC for CEM956

How this workshop works

• We are going to cover some basics with hands on exampes.

• Exercises are denoted by the following icon in this presents:

• Bold italic means commands which I expect you type on your terminal in most cases.

Page 3: Introduction to HPCC for CEM956

Green and Red Sticky

• Use the sticky notes provided to help me help you.- No Sticky = I am working

- Green = I am done and ready to move on (yea!)

- Red = I am stuck and need more time and/or some help

If you have not followed the set up instructions: https://wiki.hpcc.msu.edu/x/uQViAQ

Please do now

Page 4: Introduction to HPCC for CEM956

Agenda

• Introduction – iCER & MSU HPC system

• How to Use the HPC– Get on – Load your supply– Work on board

• Summary– Terms– Commands– Websites

Page 5: Introduction to HPCC for CEM956

Agenda

• Introduction – iCER & MSU HPC system

• How to Use the HPC– Get on – Load your supply– Work on board

• Summary– Terms– Commands– Websites

Page 6: Introduction to HPCC for CEM956

iCER Overview

• iCER is a research unit at MSU. We:– Maintain the university’s supercomputer– Organize Training/workshops– Provide 1-on-1 consulting– Help with grant proposal

Page 7: Introduction to HPCC for CEM956

iCER Overview

• What does iCER stand for?– Institute for Cyber-Enabled Research– Established in 2009 to encourage and support the

application of advanced computing resources and techniques by MSU researchers.

• Mission– iCER’s mission to help researchers with computational

components of their research

Page 8: Introduction to HPCC for CEM956

Funding From…

• The Vice President office for Research and Graduate Studies (VPRGS)

• Engineering College, College of Natural Science and College of Social Science

• This allows us to provides services and resources for FREE!!!

Page 9: Introduction to HPCC for CEM956

Why use HPCC? (Latency and Throughput)

• Your own computer is too slow• You have large amounts of data to process (hundreds of

gigabytes)• Your program requires massive amounts of memory ( 64gb - 6 TB )• Your work could be done in parallel • You need specialized software (in a Linux environment)• You need licensed software (Stata, Matlab, Gaussian, etc)• Your work takes a long time (days, weeks)• You want to collaborate using shared programs and data• Software that can take advantage of accelerator cards

(Graphics Processors)

Page 10: Introduction to HPCC for CEM956

HPC vs PCEach computing node in a cluster is a single computer

Your Computer

2014 Cluster 2016 Cluster

Processors 1 2 per node 2 per node

Cores per processor

4-8 20 per node 28 per node

Speed 2.5 - 3 ghz 2.5 ghz 2.4 ghz

Connection to other computers

Campus ethernet1,000 MBit/sec

"Infiniband" 50,000 MBit/sec

"Infiniband" 50,000 MBit/sec

Computers(nodes)

1 => 8 cores total

223 =>4,676 cores total

367 =>10,276 cores total

Users 1 ~1,200 ~1,200

Schedule On Demand 24/7, Queue 24/7, Queue

High Performance => Running work in parallel, for long periods of time

Page 11: Introduction to HPCC for CEM956

MSU HPC System

• Large Memory Nodes (up to 6TB!)• GPU Accelerated cluster (K20, k80)• Phi Accelerated clusters (5110p)• Over 590 nodes, 15000 computing cores• New 2PB high speed parallel scratch file space• 50GB replicated file spaces (upgragtable to 1TB)• Access to large open-source software stack

Shared !!! (& More nodes coming)

Page 12: Introduction to HPCC for CEM956

Cluster Resources (for Computing Nodes)

Year Name Description ppn Memory Nodes Total Cores

2016 intel16 Intel Xeon E5-2680 v4 (2.4 GHz) 28 128GB+ 367 10,276

2011 intel11 Intel Xeon E7-8837 (2.67 GHz) 32 512GB 2 64

32 1TB 1 32

64 2TB 2 128

2014 intel14 Intel Xeon E5-2670 v2 (2.5 GHz) 20 64GB 128 2560

20 256GB 23 460

2 NVIDIA K20 GPUs 20 128GB 39 780

2 Xeon Phi 5110P 20 128GB 27 540

Large Memory 48 or 96 1 - 6 TB 6 336

595 15,176

Page 13: Introduction to HPCC for CEM956

Available Software

• Compilers, debuggers, and profilers– Intel compilers, Impi, openmpi, mvapich, totalview, DDD,

DDT…

• Libraries– BLAS, FFTW, LaPACK, MKL, PETSc, Trilinos…

• Commercial Software– Gaussian, GaussView, Mathematica, FLUENT, Abaqus,

MATLAB…

NOTE: Not include software installed privately under users’ own space

Page 14: Introduction to HPCC for CEM956

Online Resources

• icer.msu.edu: iCER Home– hpcc.msu.edu: HPCC Home

• wiki.hpcc.msu.edu: HPCC User Wiki– Documentation and user manual

• For workshops and registration details, visit:http://icer.msu.edu/events

Page 15: Introduction to HPCC for CEM956

Agenda

• Introduction – iCER & MSU HPC system

• How to Use the HPC– Get on – Load your supply– Work on board– Available Services

• Summary– Terms– Commands– Websites

Page 16: Introduction to HPCC for CEM956

Student Accounts

• Accounts created for classroom use are added to a Class group (cem956_fs16_s1). These accounts are usually kept active for 6 months and then automatically disabled unless a membership change is requested by a faculty member. When students have dropped the course. should have their HPCC accounts disabled.

• TA/faculty will answer students’ questions. • Students can attend iCER monthly Introduction to HPCC

workshop www.icer.msu.edu/upcoming-workshops

Types of space: Home space /mnt/home/$USER

Scratch space /mnt/scratch/$USER

Page 17: Introduction to HPCC for CEM956

Get general Accounts

• PIs can request accounts (for each group member) at http://contact.icer.msu.edu/account

• Each account has access up to:– 50 GB of replicated file space (/mnt/home/userid)– 520 processing cores– 2 PB of high-speed scratch space (/mnt/scratch/userid)

• Also available: shared group folder upon request by PI

Three types of space: Home space /mnt/home/$USERGroup space /mnt/research/<group>Scratch space /mnt/scratch/$USER

/mnt/ls15/scratch/groups/<group>

Page 18: Introduction to HPCC for CEM956

Connecting to the HPCC using ssh

• Windows: we recommand “mobaXterm” as connection client.– Insert the thumb drive– Open appropriate folder– Looking for MobaXterm– Install MobaXterm– Server: hpcc.msu.edu or rdp.hpcc.msu.edu– Username=your netid; password = your netid password

• OSX: – Access spotlight from the menu bar icon (or ⌘ SPACE)– Type terminal– In terminal, type: ssh [email protected]

Page 19: Introduction to HPCC for CEM956

Our supercomputer

is a “remote service”

Log in to gateway

dev-intel16

dev-intel16-k80

dev-intel14-k20

dev-intel14-phi

dev-intel14

Page 20: Introduction to HPCC for CEM956

HPCC message

Hostname is “Gateway”

Command prompt at bottom

(Note: file list on left side does not present in Mac terminal)

Successful connection HPCC Gateway

Page 21: Introduction to HPCC for CEM956

Gateway Nodes

• Shared: accessible by anyone with an MSU-HPCC account.

• Shared resource -- hundreds of users on the gateway nodes

• Only means of accessing HPCC compute resources.• Useful information (status, file space, messages)• ** DO NOT RUN ANYTHING ON THESE NODES! **

Page 22: Introduction to HPCC for CEM956

Our supercomputer

is a “remote service”

ssh to a dev node

dev-intel16

dev-intel16-k80

dev-intel14-k20

dev-intel14-phi

dev-intel14

Page 23: Introduction to HPCC for CEM956

All the board!

• If you’re not connected to the HPCC, please do so now.• From gateway, please connect to a development node:

ssh dev-intel16-k80 or ssh dev-intel14-phi• ssh == secure shell (allows you to securely connect to

other computers)

• Take a look at what you have in your account (ls)

• Take a look at what available on system (ml)

• Jump to other dev-node and exit

Page 24: Introduction to HPCC for CEM956

MobaXterm (ssh to a dev node)

Page 25: Introduction to HPCC for CEM956

Developement Nodes

• Shared: accessible by anyone with an MSU-HPCC account

• Meant for testing / short jobs.• Currently, up to 2 hours of CPU time• Development nodes are “identical” to compute

nodes in the cluster • Node name descriptive (name= feature, #=year)• http://wiki.hpcc.msu.edu/x/pwNe (HPCC Quick Reference Sheet)

Page 26: Introduction to HPCC for CEM956

Compute Nodes

• Use them by job submission (reservation)

• Dedicated: you request # cores, memory, walltime(also for advanced users: accelerators, temporary file space, licenses)

• Queuing system: when those resources are available for your job, those resources are assigned to you.

• Two modes– batch mode: generate a script that runs a series of

commands (use pbs job scription)– Interactive mode (qsub –I)

Page 27: Introduction to HPCC for CEM956

Nodes & Clusters

A “node” is a single computer, usually with a few processors and many cores. Each node has it’s own host name/address like 'dev-intel14'

A “cluster” is composed of dozens (or hundreds) of nodes, tied together with high speed network

Types of Nodes: gateway : computer available to outside world for you to login ; doorware

only; not enough to run programs. From here, one can connect to other nodes inside the HPCC network

dev-node : computer you choose and log-in to do setup, development, compile, and testing your programs using similar configuration as cluster nodes for the most accurate testing. dev-nodes includes dev-intel10, dev-intel14, dev-intel14-k20 (which includes k20 GPU add-in card).

compute-nodes: run actual computation; accessible via the scheduler/queue; can’t log in directly

gpu, phi: nodes within a cluster that have special-purpose hardware. ‘fat-nodes’ have large RAM

Page 28: Introduction to HPCC for CEM956

Basic navigation commands: ls

Use the Command with the Options to look into your home directory:

ls list files and directories

Some options for ls command

-a list all files and directories

-h print sizes in human readable format (e.g., 1k, 2.5M, 3.1G)

-l list with a long listing format

-t sort by modification time

Page 29: Introduction to HPCC for CEM956

Basic navigation commands: cd

Use the Command with the Options to try:

cd directory_name change to named directory

cd change to home-directory

cd ~ change to home-directory

cd .. change to parent directory

cd - change to the previous directory

pwd display the path of the current directory

Page 30: Introduction to HPCC for CEM956

Agenda

• Introduction – iCER & MSU HPC system

• How to Use the HPC– Get on – Load your supply– Work on board– Available Services

• Summary– Terms– Commands– Websites

Page 31: Introduction to HPCC for CEM956

Old fashion

• scp source destination

scp == secure copy

SFTP

Use hostname: rsync.hpcc.msu.edu

Page 32: Introduction to HPCC for CEM956

File Transfer & Loading Software

• Transfer to/from your computer– MobaXterm– Cyberduck (thumb drive)

– Mapping to HPCC https://wiki.hpcc.msu.edu/display/hpccdocs/Mapping+HPC+drives+to+a+campus+computer

• Load to/from HPCC– Module – Installation– From shared space

Page 33: Introduction to HPCC for CEM956

MobaXterm (Sessions -> New session -> SFTP)

Page 34: Introduction to HPCC for CEM956

Transferring Files

• SFTP program– download and install

cyberduck– https://cyberduck.io/

• Hostname: rsync.hpcc.msu.edu

• Username: your netid• Password: your netid password• Port : 22

Page 35: Introduction to HPCC for CEM956

Cyberduck

Page 36: Introduction to HPCC for CEM956

Mapping HPC drives to a campus computer

You can connect your laptop to the HPCC home & research directories Can be very convenient and work on Windows 10!

For more information see : https://wiki.hpcc.msu.edu/x/VYCe

1. determine your file system using the command df -h ~Filesystem Size Used Avail Use% Mounted onufs-10-b.i:/b10/u/home/billspat

2. server path Mac: smb: //ufs-10-b.hpcc.msu.edu/billspat Windows: \\ufs-10-b.hpcc.msu.edu\billspat

3. connectMac : Finder | connect to server | put in server path | netid and PasswordWindows : My Computer | Map Network Drive | select letterWindows fix: user name = hpcc\netid (same password)

Page 37: Introduction to HPCC for CEM956

Summary of MSU HPCC File SystemsGLOBAL function path (mount point) Env

Home directories your personal /mnt/home/$USER $HOME

Research Space Shared with working group /mnt/research/<group>

Scratch temporary computational workspace

/mnt/scratch/$USER/mnt/ls15/scratch/groups/<group>

$SCRATCH

Software (Gaussian) software installed /opt/software $PATH

LOCAL

Operating system

local disk, don't use these /etc, /usr/lib /usr/bin

temp or local fast in each single node only, some programs use /tmp

/tmp or /mnt/local

Page 38: Introduction to HPCC for CEM956

Directories

/mnt/home/$USERbin data mnt tmp

/

root

home scratch research

colbrydi gmason doortiCMICH

Page 39: Introduction to HPCC for CEM956

Example: File Manipulation

• Try Commands

• See the contents of your “.bashrc” file> cat .bashrc

• Make a directory called “hpccworkshop”, change to that directory and list the contents.> mkdir hpccworkshop

> cd ./hpccworkshop

mkdir make directorycp copy filecat display contents of text filerm remove file

Page 40: Introduction to HPCC for CEM956

Exercise!

Please type (in your terminal):module load powertoolsgetexample intro_workshop

•This will copy some example files (for this workshop) to your directory. •Exercise: try to find the file ‘youfoundit.txt’ in the “hidden” directory.

Page 41: Introduction to HPCC for CEM956

(Advanced Tip)

• Tab Completion short cut– Typing out directory/file/program names is

time consuming.– When you start typing out the name, hit tab

key. The shell will try to fill in the rest of the directory name

– E.g., return to home directorycd type “cd i”, then press tab

Page 42: Introduction to HPCC for CEM956

(Advanced tip 2)

• Access previous command by pressing up arrows• See full history by typing

history• Repeat previous command in history by using the

exclamation sign, e.g.[ongbw@dev-intel10] % history…302 cd scripts

!302 will call the command “cd scripts”

Page 43: Introduction to HPCC for CEM956

Examining Files

• Print partial contents of a file using head, tailhead filename tail filename

• When the file is really big, use “less”less filename

– Use arrow keys to scroll by line– Space to go to the next page– “b” to go backwards– “g” to go to the beginning– “G” to go to the end– “q” to quit– “/” to search

Page 44: Introduction to HPCC for CEM956

Creating / Editing text files

• You “could” create/edit files on your local system and transfer them using SFTP– Downside: slow– Need to run command “dos2unix” if you are

running a windows system• Much “faster” in the long term to learn how to

edit files in the command line

Page 45: Introduction to HPCC for CEM956

Editors

• Choice of editor is really a religion– emacs (my favorite, powerful)– vi / vim (popular, powerful) – Nano (simple, easy to learn)

• Today, we will learn nano. To start, typenano newfile.txt

• Commands are in the bottom. (^ == control key)• Create a file that says: this is a text file!

Page 46: Introduction to HPCC for CEM956

Nano

• Nano does not leave any output on the screen after it exists.

• But ls now shows that we have a new file called newfile

• Lets tidy up by deleting this new file:rm newfile

** NOTE: no undelete in Linux (unlike windows)

Page 47: Introduction to HPCC for CEM956

File Systems and static link

Files can have aliases or shortcuts, so one file or folder can be accessed multiple way, e.g.

/mnt/scratch is actually a static link to

/mnt/ls15/scratch/users

try this: ls –l /mnt/scratchcd /mnt/scratch/$USER # replace netid with YOUR netidpwd

What do you see? Note that "ls15" stands for "Lustre Scratch 2015",

meaning the scratch lustre file system installed in 2015.

Page 48: Introduction to HPCC for CEM956

Home directories are private; Research folders are shared by group cd ~ls -al # -rw------- 1 billspat staff-np 3360642 Apr 9 10:35 myfiles.sam

Group Example: (you may not have a research group)ls -al /mnt/research/holekamplab #this won't work for you-rw-rw-r-- 1 billspat holekamplab 3360642 Apr 9 10:35 eg1.sam

The Difference is read and write (rw) by group

# /mnt/scratch/netid may be public to system unless you make it privateTo share a file using your scratch directory:

touch ~/testfilecp ~/testfile /mnt/scratch/<netid>/testfilechmod g+r /mnt/scratch/<netid>/testfile

Private and Shared files

Page 49: Introduction to HPCC for CEM956

Can copy files from system to laptop with secure ftp = sftp or secure copy = scp

Example1. create a file to copy, please type

cd ~/classecho 'Hello from $USER' > greeting.txtecho 'Created on `date`' >>greeting.txtcat greeting.txt# make sure you are in the directory with pwd command, and it’s lower case2. exit from the HPCCexit; exit3. in terminal or Moba issue following command scp => secure copyscp [email protected]:class/testfile.txt testfile.txtopen testfile.txt

Windows: MobaXTerm auto connected, files are listed on the side

Copying files to/from HPCC

Page 50: Introduction to HPCC for CEM956

Loading Software

• Load to/from your computer– MobaXterm– Cyberduck– Mapping to HPCC

https://wiki.hpcc.msu.edu/display/hpccdocs/Mapping+HPC+drives+to+a+campus+computer

• Load to/from HPCC– Module – Installation– From shared space

Page 51: Introduction to HPCC for CEM956

Module System

• For maximizing the different types of software and system configurations that are available to the users, HPCC uses a Module system to set Paths and Libraries.

• Key Commands– module list : List currently loaded modules– module load modulename : Load a module– module unload modulename : Unload a module– module spider keyword : Search modules for a keyword– module show modulename : Show what is changed by a

module – module purge : Unload all modules

Page 52: Introduction to HPCC for CEM956

Using Modules

module list # show modules currently loaded in your profile

module purge # clear all modules

module list # none loaded

# which version of Matlab do you need?module spider Gaussian

module spider Python/2.7.2

# after purging (no modules), find modules using module list

module load Gaussian/g09

module unload Gaussian

Page 53: Introduction to HPCC for CEM956

Some Software requires other base programs to be loaded

module purge # remove all modulesmodule load R/3.1.1# … error: why? base modules not loaded

# to find out what is needed, please use spider command for detailed outpu

module spider R/3.1.1

module load gnumodule load openmpimodule load R/3.1.1

Page 54: Introduction to HPCC for CEM956

Default Modules re-load every time you connect

module purge # remove all modulesmodule listexit # return to gatewayssh dev-intel10module list

# … what do you see? default modules

module load Mathematica; module listssh dev-intel14module list

Need to load all modules you need for every session or jobor look into auto loading of modules using profile (covered later)

Page 55: Introduction to HPCC for CEM956

Powertools

• Powertools is a collection of software tools and examples that allows researchers to better utilize High Performance Computing (HPC) systems.

• module load powertools

• quota # list your home directory disk usage • poweruser # Set up your account to load powertools by default• priority_status #IF part of a buy-in group; Shows the status of an individuals

(buy-in nodes)• userinfo <$USER> # get details of a user (from public MSU directory)

Page 56: Introduction to HPCC for CEM956

Load supply from HPCC

• PleaseLoad Software modules type (in your terminal):module load powertoolsgetexample intro2hpcc

See the changes before and after module load•This will copy some example files (for this workshop) to your directory. •Exercise: try to find the file ‘youfoundit.txt’ in the “hidden” directory.

Page 57: Introduction to HPCC for CEM956

getexample

• You can obtain a lot of examples through getexample. Take advantage of it!

Page 58: Introduction to HPCC for CEM956

Useful Powertools commands

• getexample : Download user examples• powertools : Display a list of current powertools or a

long description• licensecheck : Check software licenses• avail : Show currently available node resources

• For more information, refer to this webpage (HPCC Powertools)

• https://wiki.hpcc.msu.edu/x/p4Bn

Page 59: Introduction to HPCC for CEM956

getexample

• If you do not load powertools, please do it now:• Download the helloworld example using

getexample• Check what you downloaded. What is the biggest

file?

Page 60: Introduction to HPCC for CEM956

ssh dev-intel10mkdir classcd classmodule load powertoolsgetexample getexample helloworldcd helloworldlscat READMEcat hello.ccat hello.qsub

Using MSU HPCC 'getexample' tool

connect to dev nodecreate directorychange directoryHPCC scriptslist all examplescopy one examplechange directorylist filesshow README fileshow c programshow qsub program

We will test using a simple program from HPCC examples.So, after connecting to hpcc, please type :

Page 61: Introduction to HPCC for CEM956

The README file has instructions for compiling and runninggcc hello.c -o hello # compile and output to new program

'hello'./hello # test run this hello program. Does it work?qsub hello.qsub # submit to queue to run on cluster

This 'job' was assigned the 'job' id 29444624 and put into the 'queue.'Now, we wait for it to be scheduled, run, and output files created...

Compile, Test, and Queue

Page 62: Introduction to HPCC for CEM956

Agenda

• Introduction – iCER & MSU HPC system

• How to Use the HPC– Get on – Load your supply– Work on board– Available Services

• Summary– Terms– Commands– Websites

Page 63: Introduction to HPCC for CEM956

CLI vs. GUI

• CLI: Command Line Interface: So far, we used CLI

• GUI: Graphical User Interface

Page 64: Introduction to HPCC for CEM956

Run Matlab on HPCC

• Run Matlab in command line mode• Can you Run Matlab in GUI ?

If you have problem open GUI, you may need some more tools

Page 65: Introduction to HPCC for CEM956

What is X11 and what is needed for X11?

• Method for running Graphical User Interface (GUI) across a network connection

SSH

X11

Page 66: Introduction to HPCC for CEM956

What is needed for X11?

• X11 server running on your personal computer• SSH connection with X11 enabled• Fast network connection

– Preferably on campus

Page 67: Introduction to HPCC for CEM956

Run X11 and RDP on Windows

• Install MobaXterm• http://mobaxterm.mobatek.net/download.html• at MobaXterm,

– ssh -X [email protected]

Run remote desktop onrdp.hpcc.msu.edu

Page 68: Introduction to HPCC for CEM956

Run X11 & rdp on Mac (OSX)

• Mac users:• Install XQuartz from iCER thumb drive• Run XQuartz• Quit terminal• Run terminal again• ssh -X [email protected]

Download remote desktop from Apple store (free) and run it on

rdp.hpcc.msu.edu

Page 69: Introduction to HPCC for CEM956

Test GUI using X11

• run X11• Try one of the following commands

xeyesor/andfirefox

Page 70: Introduction to HPCC for CEM956

Batch vs. Interactive

• Send job to scheduler and run in batch queue

• Interactive

Page 71: Introduction to HPCC for CEM956

Running Jobs on the HPC

• The developer (dev) nodes are used to compile, test and debug programs

• Two ways to run jobs– submission job scripts– interactive way

• Submission scripts are used to run (heavy/many) jobs on the cluster.

Page 72: Introduction to HPCC for CEM956

Advantages of running on dev-nodes

• Yo do not need to write a submission script• Yo do not need to wait in the queue• You can provide input to and get feedback from

your programs as they are running

Page 73: Introduction to HPCC for CEM956

Disadvantages of running on dev-nodes

• All the resources on developer nodes are shared between all uses

• Any single process is limited to 2 hours of cpu time. If a process runs longer than 2 hours it will be killed.

• Programs that overutilize the resources on a developer node (preventing other to use the system) can be killed without warning.

Page 74: Introduction to HPCC for CEM956

Programs that can use GUI

• GaussView• MATLAB• Mathematica• totalView : C/C++/fortran debugger especially for

multiple processors• DDD (Data Display Debugger) : graphical front-end

for command-line debugger• Etc, etc, and etc

Page 75: Introduction to HPCC for CEM956

Batch commands

• We’ve focused so far on working interactively (command line)

• Need dedicated resources for a computationally intensive task? – List resource requests in a job script– List commands in the job script– Submit job to the scheduler– Check results when job is finished

Page 76: Introduction to HPCC for CEM956

The Queue

The system is busy and used by many peoplemay want to run one program for a long timemay want to run many programs all at once on multiple nodes or

cores

To share we have a 'resource manager' that takes in requests to run programs, and allocates them

you write a special script that tells the resource manager what you need, and how to run your program, and 'submit to the queue' (get in line) with the qsub command

We submitted our job to the queue using the command qsub hello.qsub

And the 'qsub file' had the instructions to the scheduler on how to run

Page 77: Introduction to HPCC for CEM956

A Submission Script is a mini program for the scheduler that has :

1.List of required resources2.All command line instructions needed to run the computation

including loading all modules3.Ability for script to identify arguments4.collect diagnostic output (qstat -f)

Script tells the scheduler how to run your program on the cluster. Unless you specify, your program can run on any one of the 7,500 cores available.

Submission of Job to the Queue

Page 78: Introduction to HPCC for CEM956

Submission Script

• List of required resourceshttps://wiki.hpcc.msu.edu/display/hpccdocs/Scheduling+Jobs

• All command line instructions needed to run the computation

Page 79: Introduction to HPCC for CEM956

Typical Submission Script

#!/bin/bash -login

### define resources needed:#PBS -l walltime=00:01:00#PBS -l nodes=5:ppn=1#PBS -l mem=2gb

### you can give your job a name for easier identification#PBS -N name_of_job

### reserve license for conmercial software#PBS –W x=gres:MATLAB

### load necessary modules, e.g.module load MATLAB

### change to the working directory where your code is locatedcd ${PBS_O_WORKDIR}

### call your executablematlab –nodisplay –r “test(10)”

# Record Job execution informationqstat –f ${PBS_JOBID}

Page 80: Introduction to HPCC for CEM956

Submit a job

• Go to the helloworld directorycd ~/helloworld

• Create a simple submission scriptnano hello.qsub

• See next slide to edit the file…

Page 81: Introduction to HPCC for CEM956

hello.qsub

#!/bin/bash –login#PBS –l walltime=00:05:00#PBS –l nodes=1:ppn=1

cd ${PBS_O_WORKDIR}./hello

qstat –f ${PBS_JOBID}

Page 82: Introduction to HPCC for CEM956

Details about job script

• “#” is normally a comment except– “#!” special system commands– #!/bin/bash– #PBS instructions to the scheduler#PBS -l nodes=1:ppn=1#PBS -l walltime=hh:mm:ss#PBS -l mem=2gb (!!! Not per core but total)

• http://wiki.hpcc.msu.edu/x/Np-T (Scheduling Jobs)

Page 83: Introduction to HPCC for CEM956

Submitting and monitoring

• Once job script created, submit the file to the queue: qsub hello.qsub

• Record job id number (######) and wait around 30 seconds

• Check jobs in the queue with: qstat -u netid & showq -u netid

• What about using the Commands qs & sq ?• Status of a job: qstat –f jobid• Delete a job in a queue: qdel jobid

Page 84: Introduction to HPCC for CEM956

Common Commands

• qsub “submission script”– Submit a job to the queue

• qdel “job id”– Delete a job from the queue

• showq - u “user id”– Show the current job queue of the user

• checkjob -v “job id”– Check the status of the current job

• showstart -e all “job id”– Show the estimated start time of the job

Page 85: Introduction to HPCC for CEM956

Scheduling Priorities

• NOT First Come First Serve!• Jobs that use more resources get higher priority

(because these are hard to schedule)• Smaller jobs are backfilled to fit in the holes created

by the bigger jobs• Eligible jobs acquire more priority as they sit in the

queue• Jobs can be in three basic states: (check states by showq)

– Blocked, eligible or running– Jobs in BatchHold

Page 86: Introduction to HPCC for CEM956

Scheduling Tips

• Requesting more resources does not make a job runfaster unless you are running a parallel program.

• The more resources you request, the “harder” it is for the scheduler to reserve those resources.

• First time: over-estimate how many resources you need, and then modify appropriately.

• (qstat –f ${PBS_JOBID} at the bottom of your scripts will give you resources information when the job is done)

• Job finished time = Job waiting time + Job running time

Page 87: Introduction to HPCC for CEM956

Advanced Scheduling Tips

• Resources– A large proportion of the cluster can only run jobs that are

four hours or less– Most nodes have at least 24 GB of memory– Half have at least 64 GB of memory– Few have more than 64 GB of memory– Maximum running time of jobs: 7 days (168 hours)– Maximum memory that can be requested: 6 TB

• Scheduling Limit– total 520 cores using on HPCC at a time– 15 eligible jobs at a time– 1000 submitted jobs

Page 88: Introduction to HPCC for CEM956

Job Completion

• By default the job will automatically generate two files when it completes:– Standard Output:

• Ex: jobname.o5945571– Standard Error:

• Ex: jobname.e5945571• You can combine these files if you add the join

option in your submission script:#PBS -j oe

• You can change the output file name#PBS -o /mnt/home/netid/myoutputfile.txt

Page 89: Introduction to HPCC for CEM956

Other Job Properties

• resources (-l)– Walltime, memory, nodes, processor, network, etc.https://wiki.hpcc.msu.edu/display/hpccdocs/Advanced+Specification+of++Resources

#PBS –l feature=gpgpu,gbe#PBS –l nodes=2:ppn=8:gpu=2#PBS –l mem=16gb

• Email address (-M)#PBS –M [email protected]

- Email Options (-m)#PBS –m abe

Many others, see the wiki: http://wiki.hpcc.msu.edu/

Page 90: Introduction to HPCC for CEM956

Advanced Environment Variables

• The scheduler adds a number of environment variables that you can use in your script:

https://wiki.hpcc.msu.edu/display/hpccdocs/HPCC+Quick+Reference+Sheet

PBS_JOBID• The job number for the current job.

PBS_O_WORKDIR• The original working directory which the job was

submittedhttps://wiki.hpcc.msu.edu/display/hpccdocs/Advanced+Scripting+Using+PBS+Environment+Variables

• Examplemkdir ${PBS_O_WORKDIR}/${PBS_JOBID}

Page 91: Introduction to HPCC for CEM956

Agenda

• Introduction – iCER & MSU HPC system

• How to Use the HPC– Get on – Load your supply– Work on board

• Summary– Terms– Commands– Websites

Page 92: Introduction to HPCC for CEM956

Terms

• ICER, HPC• Account: login, directory (folder) • Node: Laptop, gateway, dev-nodes, comp-nodes• Cores: nodes=2:ppn=4 • Memory: total, per node, mem=2gb• Storage: home, research, scratch, disk• Module: list, load, spider, unload, purge, show,……• Files: software, data, job script • Path: • Queue: qsub, qstat, qdel.• Jobs: JobID,

Page 93: Introduction to HPCC for CEM956

Commands

• ssh• cd, ls, pwd, • module sub_comand• rm, mv, cp, scp• more, less, cat • nano, vi, • qsub, qstat, qdel, showq• echo, export • Mkdir, • tar, unzip, zip

Page 94: Introduction to HPCC for CEM956

Websites

• HPCC user wiki:http://wiki.hpcc.msu.edu

• Training informationhttp://icer.msu.edu/upcoming-workshops

Linux Commands:• An exhaustive list

http://ss64.com/bash/• A useful cheatsheet

http://fosswire.com/post/2007/08/unixlinux-command-cheat-sheet/• Explain a command given to you

http://explainshell.com/

Page 95: Introduction to HPCC for CEM956

Q & A

Page 96: Introduction to HPCC for CEM956

Thanks!

Page 97: Introduction to HPCC for CEM956

When would I use the HPC?

• Takes too long for computation• Runs out of memory• Needs licensed software• Needs advanced interface (visualization/database)• Read/write Lots of data

(Latency and Throughput)

Page 98: Introduction to HPCC for CEM956

Goals of this short workshop : Understanding of...

• the nature of high-performance computing (HPC) and when to use it;• the overall structure the MSU HPC Center (HPCC);• Understand the different ways (Xterm, ssh, RDP) of connecting to the

HPCC• the steps to run software on the MSU HPCC (workflow, module system);• managing files and data• working environment, using or developing software, and testing;• how to run programs on the HPC in 'batch' via the scheduler;• using GUI programs on the HPC for dev or computation• how to get help, training, or consultation

In short, Use Computation to "Reduce the Mean Time to Science" - Dirk Colbry, former director MSU HPCC

Page 99: Introduction to HPCC for CEM956

Connecting to the HPCC using ssh

Page 100: Introduction to HPCC for CEM956

Transferring Files

• Hostname: rsync.hpcc.msu.edu• Username: your netid• Password: your netid password• Port : 22

Exercise: download this presentation from/mnt/scratch/changc81/intro2hpcc/intro2hpcc.pdfto your local system