ssh glogin.ibex.kaust.edu · 2018-09-26 · module avail gpu software: modules dev core apps pgi...
TRANSCRIPT
Getting Started: GPU Login
● ssh glogin.ibex.kaust.edu.sa
● First login auto-generates keys & ssh config
  – .ssh/config
    ● Host glogin #GPU login nodes
        Hostname glogin.ibex.kaust.edu.sa
        User $USER
        IdentityFile ~/.ssh/ksl-internal
        StrictHostKeyChecking no
        ForwardX11 yes
        ForwardX11Trusted yes
https://www.hpc.kaust.edu.sa/ibex/new_user
https://www.hpc.kaust.edu.sa/ibex/faq
GPU Software: Modules
● Modules
  – Customized to login node (GPU, Intel, AMD)
    ● glogin: /sw/csg/modulefiles/*
  – Improved GPU App Stack is here
    ● Make requests: [email protected]
    ● Stay connected: https://kaust-ibex.slack.com/
      – #announce, #general, #gpu
  – Prefer default modules (/sw/csg/modulefiles/*)
    ● /cbrc/modules/* will be deprecated
GPU Software: Modules
● Modules
    module avail
    module load module/version
https://www.hpc.kaust.edu.sa/ibex/app
Nvidia / show all
GPU Software: Modules
module avail
DEV: pgi (OpenACC), gcc, intel, cmake, git, java, maven
CORE: NVIDIA (OpenGL / EGL), cuda, cudnn, nccl, openmpi
APPS: anaconda3, machine_learning (tensorflow, keras, torch, caffe*, caffe2, theano*, scipy, numpy, scikit-learn, etc.), paraview, bclfastq2, cp2k, gromacs, lammps, mapd, namd, pyspark, relion, seismic_unix, sphire
* NVIDIA EGL supported; X11+GL support is missing...
GPU Jobs + Constraints
● sinfo --partition=batch --format="%n %f" | fgrep -v nogpu
● dgpu501-22-r cpu_intel_e5_2670,gpu,...,tesla_k40m
  dgpu502-01-l cpu_intel_e5_2670,gpu,...,tesla_k20m
  dgpu702-16 cpu_intel_e5_2699_v3,gpu,...,gtx1080ti
  dgpu703-01 cpu_intel_e5_2699_v3,gpu,...,p100
  dgpu703-25 cpu_intel_e5_2699_v3,gpu,...,p6000
https://www.hpc.kaust.edu.sa/ibex/job
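The sinfo listing pairs each node name with its feature list, and in the sample rows the GPU model is the last feature. A minimal sketch, assuming that ordering holds for every node (the slide shows only five rows), that counts nodes per GPU model. `count_gpu_models` is a hypothetical helper name, and the sample rows are copied from the slide:

```shell
#!/usr/bin/env bash
# Count nodes per GPU model from `sinfo --format="%n %f"` style output.
# ASSUMPTION: the GPU model is the last comma-separated feature,
# as in the sample rows on the slide.
count_gpu_models() {
    awk '{
        n = split($2, feats, ",")   # $2 is the feature list
        count[feats[n]]++           # last feature = GPU model (assumed)
    }
    END {
        for (model in count) print model, count[model]
    }' | sort
}

# Sample rows copied from the slide. On Ibex, sinfo would be piped in instead;
# --noheader keeps the column header out of the counts:
#   sinfo --noheader --partition=batch --format="%n %f" | fgrep -v nogpu | count_gpu_models
sample_nodes='dgpu501-22-r cpu_intel_e5_2670,gpu,...,tesla_k40m
dgpu502-01-l cpu_intel_e5_2670,gpu,...,tesla_k20m
dgpu702-16 cpu_intel_e5_2699_v3,gpu,...,gtx1080ti
dgpu703-01 cpu_intel_e5_2699_v3,gpu,...,p100
dgpu703-25 cpu_intel_e5_2699_v3,gpu,...,p6000'

printf '%s\n' "$sample_nodes" | count_gpu_models
```

The model names printed here are the values to try in `--gres` or `--constraint` requests.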
GPU Jobs + Constraints
● srun --time=30:00 --mem=64GB --gres=gpu:p100:1 --pty bash -l
● sbatch --time=60:00 --mem=128GB --gres=gpu:2 --constraint="[p100|p6000]" runjob.sbat
https://www.hpc.kaust.edu.sa/ibex/job
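As I read the Slurm documentation, the square brackets in `--constraint="[p100|p6000]"` ask for one of the listed features to be used for all allocated nodes, rather than a mix. A small sketch that assembles such a constraint and prints the resulting sbatch line for review instead of submitting it. `gpu_constraint` and `sbatch_dry_run` are hypothetical helpers, not Slurm commands:

```shell
#!/usr/bin/env bash
# Join GPU feature names into a bracketed OR constraint, e.g. [p100|p6000].
# gpu_constraint is a hypothetical helper, not part of Slurm.
gpu_constraint() {
    local IFS='|'            # "$*" joins arguments with the first IFS character
    printf '[%s]\n' "$*"
}

# Print (rather than run) an sbatch command line so it can be checked first.
# Arguments: <gpu-count> <script> <feature>...
sbatch_dry_run() {
    local gpus=$1 script=$2
    shift 2
    printf 'sbatch --time=60:00 --mem=128GB --gres=gpu:%s --constraint="%s" %s\n' \
        "$gpus" "$(gpu_constraint "$@")" "$script"
}

# Reproduces the sbatch line from the slide.
sbatch_dry_run 2 runjob.sbat p100 p6000
```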
GPU Jobs + Constraints
● sbatch --time=60:00 runjob.sbat
● runjob.sbat
    #SBATCH --job-name=gpujob
    #SBATCH --gres=gpu:gtx1080ti:4
    #SBATCH --constraint="[local_500G]"
    #SBATCH --mem=128GB
    #SBATCH --nodes=2 --ntasks-per-node=2
https://www.hpc.kaust.edu.sa/ibex/job
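Slurm applies `--gres` per node, so the totals for a multi-node request such as runjob.sbat are easy to misjudge. A worked sketch of the arithmetic, using the values from the script (`job_totals` is a hypothetical helper name; the per-task figure assumes GPUs are shared evenly among the tasks on a node):

```shell
#!/usr/bin/env bash
# Totals implied by per-node Slurm requests. job_totals is a hypothetical helper.
# Arguments: <nodes> <ntasks-per-node> <gpus-per-node>
job_totals() {
    local nodes=$1 tasks_per_node=$2 gpus_per_node=$3
    echo "tasks=$((nodes * tasks_per_node))"
    echo "gpus=$((nodes * gpus_per_node))"
    # Even share only; Slurm does not divide GPUs among tasks by default.
    echo "gpus_per_task=$((gpus_per_node / tasks_per_node))"
}

# Values from runjob.sbat: --nodes=2 --ntasks-per-node=2 and 4 GPUs per node
job_totals 2 2 4
```

With these values the job asks for 4 tasks and 8 GPUs in total.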
GPU Software: Modules & Compilers
● CMake
  – module load cmake
● C++
  – System default: GCC v4.8.5
  – module load gcc/6.4.0
  – module load pgi/17.10
GPU Software: Modules & Compilers
● CUDA
  – module load cuda
  – nvcc -std=c++11 -o example example.cu
● cuDNN
  – module load cudnn
  – nvcc -std=c++11 -o example example.cu
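The slides do not show what example.cu contains. A minimal sketch of one possibility: the script below writes a small vector-add program (my own illustration, not the presenter's file) and compiles it with the nvcc line from the slide when nvcc is on the PATH.

```shell
#!/usr/bin/env bash
# Write a tiny CUDA source file; the program itself is an illustrative
# assumption, not the example.cu used in the original talk.
cat > example.cu <<'EOF'
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    float *a, *b, *c;
    // Managed memory is reachable from both host and device.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = i; b[i] = 2.0f * i; }

    add<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaError_t err = cudaDeviceSynchronize();  // wait for the kernel, then check
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("c[%d] = %.1f\n", n - 1, c[n - 1]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
EOF

# Compile only when the CUDA toolkit is available (after `module load cuda`).
if command -v nvcc >/dev/null 2>&1; then
    nvcc -std=c++11 -o example example.cu
else
    echo "nvcc not found; run 'module load cuda' first" >&2
fi
```

A program that calls cuDNN functions would typically also need `-lcudnn` on the nvcc line; that flag does not appear on the slide.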
GPU Visualization Analytic Apps
● ParaView (HPC visualization / analytics)
  – module load paraview
  – https://wiki.vis.kaust.edu.sa/training/2017-18/advancedparaviewworkshop
● MapD (GPU Database)
  – Available for early-user testing...
https://wiki.vis.kaust.edu.sa/training
GPU Python Environments
● anaconda3
  – module load anaconda3
  – conda list
  – ipython
● Custom Python environments:
  – conda --help
  – https://conda.io/docs/_downloads/conda-cheatsheet.pdf
  – https://conda.io/docs/
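One common route to a custom Python environment is a conda environment file. A hedged sketch: the environment name `gpu-demo` and the package list are illustrative assumptions, not taken from the slides, and the follow-up commands are printed rather than run because creating an environment downloads packages.

```shell
#!/usr/bin/env bash
# Write a small conda environment spec. The environment name and package
# list are illustrative assumptions, not taken from the slides.
cat > environment.yml <<'EOF'
name: gpu-demo
dependencies:
  - python
  - numpy
  - scipy
EOF

# Print the follow-up commands instead of running them.
echo "module load anaconda3"
echo "conda env create --file environment.yml"
echo "source activate gpu-demo   # or 'conda activate gpu-demo' on newer conda"
```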
GPU Machine Learning Apps
● machine_learning
  – module av machine_learning
    ● <year>.<num>-cudnn<ver>-cuda<ver>-py<ver>
  – module load machine_learning
  – conda list
  – Contains:
    ● TensorFlow, Keras, Caffe2, Torch, etc. + numpy, scipy, scikit-learn, pandas, matplotlib, etc.
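The version pattern `<year>.<num>-cudnn<ver>-cuda<ver>-py<ver>` encodes which cuDNN, CUDA, and Python builds a machine_learning module was made with. A sketch that splits such a string into its parts; `parse_ml_version` is a hypothetical helper, and the version string in the demo call is made up to fit the pattern, not a module known to exist on Ibex:

```shell
#!/usr/bin/env bash
# Split a machine_learning module version into its components.
# parse_ml_version is a hypothetical helper name.
parse_ml_version() {
    local v=$1
    local re='^([0-9]+)\.([0-9]+)-cudnn([^-]+)-cuda([^-]+)-py([^-]+)$'
    if [[ $v =~ $re ]]; then
        printf 'release=%s.%s cudnn=%s cuda=%s python=%s\n' \
            "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" \
            "${BASH_REMATCH[4]}" "${BASH_REMATCH[5]}"
    else
        echo "unrecognized version string: $v" >&2
        return 1
    fi
}

# Made-up version string that only illustrates the pattern.
parse_ml_version '2018.01-cudnn7.0-cuda9.0-py3.6'
```

Run `module av machine_learning` to see the real version strings before pinning one in a job script.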
GPU Machine Learning Apps
● tensorflow
  – module load tensorflow
  – ipython
      >>> import tensorflow as tf
  – python <model.py>
GPU Performance Tools
● General Information (not scalable)
– nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.98                 Driver Version: 384.98                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  On   | 0000:0D:00.0     Off |                  N/A |
| 37%   56C    P2   153W / 189W |    135MiB /  6081MiB |     86%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  On   | 0000:0E:00.0     Off |                  N/A |
| 31%   47C    P8    34W / 189W |      2MiB /  6082MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     72633      C   ../../build.cudnntraining.teneen/trainlenet  133MiB |
+-----------------------------------------------------------------------------+
KSL provides profiling training...
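The boxed nvidia-smi table is awkward to post-process. nvidia-smi can also emit selected fields as CSV via `--query-gpu` and `--format`, which is easier to script. A minimal sketch: `summarize_gpus` is a hypothetical helper, and the sample rows restate the two GPUs from the table, with the model name left truncated as on the slide.

```shell
#!/usr/bin/env bash
# Summarize GPU utilization from nvidia-smi's CSV query output.
# Expected input fields: index, name, utilization.gpu, memory.used, memory.total
summarize_gpus() {
    awk -F', *' '{
        printf "GPU %s (%s): %s%% busy, %s of %s MiB used\n", $1, $2, $3, $4, $5
    }'
}

# On a GPU node:
#   nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total \
#              --format=csv,noheader,nounits | summarize_gpus

# Sample rows restating the two GPUs shown in the nvidia-smi table.
printf '%s\n' '0, GeForce GTX TIT..., 86, 135, 6081' \
              '1, GeForce GTX TIT..., 0, 2, 6082' | summarize_gpus
```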
GPU Performance Monitoring
● Modify Batch Script:
    #SBATCH ...
    # After SBATCH section, but before running main program
    # Append nvidia-smi logging to a *.log file.
    # Must run nvidia-smi in background
    nvidia-smi dmon >> gpu-dmon.log &
    # Run primary GPU application here...
    # Don't run primary application in background
    # After primary GPU application
    # kill nvidia-smi monitor to allow batch job to terminate early.
    pkill nvidia-smi
● View / Truncate logs
    tail -f gpu-dmon.log
    truncate --size=0 gpu-dmon.log
For Testing ONLY. NOT for Production.
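`pkill nvidia-smi` matches by process name, so it also stops any other nvidia-smi that the same user has running on that node. A hedged alternative sketch records the monitor's PID and stops only that process. `run_with_monitor` is a hypothetical helper name and `./my_gpu_app` stands in for the primary application:

```shell
#!/usr/bin/env bash
# Run a background monitor alongside a foreground command, then stop only
# that monitor. run_with_monitor is a hypothetical helper name.
# Usage: run_with_monitor <logfile> <monitor command...> -- <main command...>
run_with_monitor() {
    local log=$1
    shift
    local monitor=()
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        monitor+=("$1")          # collect the monitor command up to "--"
        shift
    done
    shift                        # drop the "--" separator

    "${monitor[@]}" >> "$log" &  # monitor appends to the log in the background
    local monitor_pid=$!

    "$@"                         # main application runs in the foreground
    local status=$?

    kill "$monitor_pid" 2>/dev/null   # stop only the monitor started here
    wait "$monitor_pid" 2>/dev/null
    return "$status"             # pass the application's exit status through
}

# In a batch script, after the #SBATCH lines (./my_gpu_app is a placeholder):
#   run_with_monitor gpu-dmon.log nvidia-smi dmon -- ./my_gpu_app
```

Returning the application's exit status lets Slurm record a failed run as failed even though the cleanup step succeeded.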