![Page 1: HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery](https://reader035.vdocuments.site/reader035/viewer/2022062713/56649cdf5503460f949a8bcd/html5/thumbnails/1.jpg)
HPCC Mid-Morning Break
High Performance Computing on a GPU cluster
Dirk Colbry, Ph.D.
Research Specialist
Institute for Cyber Enabled Discovery
What is a GPU?
• Graphics Processing Unit
• Originally designed to make video games
• Uses many processing cores to parallelize the math required for real-time game play.
• Early researchers made general programs that looked like graphics so they could run on the GPU.
• In 2006 Nvidia released the CUDA programming interface to allow users to easily make scalable general-purpose programs that run on the GPU (GPGPU).
![Page 3: HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery](https://reader035.vdocuments.site/reader035/viewer/2022062713/56649cdf5503460f949a8bcd/html5/thumbnails/3.jpg)
GPU vs CPU
![Page 4: HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery](https://reader035.vdocuments.site/reader035/viewer/2022062713/56649cdf5503460f949a8bcd/html5/thumbnails/4.jpg)
CPU and GPU working together
Running on the GPU
• Program starts on the CPU
  - Copy data to GPU (slow-ish)
  - Run kernel threads on GPU (very fast)
  - Copy results back to CPU (slow-ish)
• There are a lot of clever ways to fully utilize both the GPU and CPU.
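
The copy, run, copy-back steps above can be sketched in plain Python. This is only a CPU-side emulation to show the shape of a GPU program, not real CUDA or pycuda code: the "device" arrays are ordinary lists, the "threads" run one after another in a loop, and every name (`copy_to_device`, `launch`, `add_kernel`, `gpu_style_add`) is hypothetical and chosen for illustration.

```python
# CPU-only emulation of the copy / run-kernel / copy-back pattern.
# Nothing here touches a real GPU; "device" memory is just another list.


def copy_to_device(host_data):
    """Stand-in for a host-to-GPU transfer (the slow-ish step)."""
    return list(host_data)


def copy_to_host(device_data):
    """Stand-in for a GPU-to-host transfer (also slow-ish)."""
    return list(device_data)


def add_kernel(thread_id, a, b, out):
    """Kernel body: every emulated thread runs this same code on its own index."""
    out[thread_id] = a[thread_id] + b[thread_id]


def launch(kernel, n_threads, *args):
    """A real GPU would run these threads in parallel; this emulation loops."""
    for thread_id in range(n_threads):
        kernel(thread_id, *args)


def gpu_style_add(host_a, host_b):
    """Add two equal-length lists using the three-step GPU workflow."""
    n = len(host_a)
    dev_a = copy_to_device(host_a)                 # 1. copy data to the "GPU"
    dev_b = copy_to_device(host_b)
    dev_out = [0.0] * n
    launch(add_kernel, n, dev_a, dev_b, dev_out)   # 2. run kernel threads
    return copy_to_host(dev_out)                   # 3. copy results back


if __name__ == "__main__":
    print(gpu_style_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [11.0, 22.0, 33.0]
```

The kernel receives only a thread index and works on one element. That is why the approach suits problems where every core can run the same code on different data.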
Pros and Cons
• Benefits
  - Lots of processing cores.
  - Works with the CPU as a co-processor.
  - Very fast local memory bandwidth.
  - Large online community of developers.
• Drawbacks
  - Can be difficult to program.
  - Memory transfers between GPU and CPU are costly (time).
  - Cores typically run the same code.
gfx-000 Test hardware
• Single quad-core 2.4 GHz Intel processor
• 8 GB of CPU RAM
• Three Nvidia GTX 280 video cards:
  - 1 GB of RAM per card
  - 240 CUDA processing cores per card
  - 1.3 GHz processor clock speed
• Total of 724 cores on a single machine (3 * 240 GPU cores + 4 CPU cores)
Installed Software on gfx-000
• CUDA toolkit 2.2 and 2.3
  - For programming in C/C++ and Fortran
• cublas – CUDA version of the BLAS libraries
• cufft – CUDA version of the FFT libraries
• pycuda – Python CUDA interface
• Zephyr – Molecular dynamics program optimized for GPUs
Other Available Software
• OpenCL – C/C++ interface
• Jacket – Matlab GPU wrapper
• Lattice Boltzmann – PDE solver
• OpenVIDIA – Machine vision
• Many, many others
• CUDA Zone
  - ~90 thousand CUDA developers
  - Lots of software examples
  - Developer forums
  - Tutorials
• http://www.nvidia.com/object/cuda_home.html
New GPU Cluster Buy-In
• Rack Units: 1U
• CPU: 2x Intel Xeon E5530 Quad-Core 2.40 GHz
• Memory: 18 GB of RAM
• Hard drive: 250 GB disk for OS and local scratch
• Network: Ethernet only (no InfiniBand support)
• GPU: Two Nvidia Tesla M1060 GPUs
• Support: Four-year, next-business-day hardware support
• Cost: $5,224
Each Nvidia Tesla M1060
• Number of streaming processor cores: 240
• Frequency of processor cores: 1.3 GHz
• Single-precision peak floating point performance: 933 gigaflops
• Double-precision peak floating point performance: 78 gigaflops
• Dedicated memory: 4 GB GDDR3
• Memory speed: 800 MHz
• Memory interface: 512-bit
• Memory bandwidth: 102 GB/sec
• System interface: PCIe
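
The peak figures above can be roughly reproduced from the core count, clocks, and bus width. The sketch below is a back-of-the-envelope check, not a vendor formula. It assumes 3 single-precision flops per core per clock, one double-precision unit per 8 cores doing 2 flops per clock, and 2 memory transfers per clock for GDDR3. None of these per-clock factors appear on the slide. The single-precision result comes out slightly above 933, which is consistent with the listed 1.3 GHz being a rounded clock.

```python
# Back-of-the-envelope check of the Tesla M1060 figures listed above.
# Values taken from the slide:
CORES = 240
CORE_CLOCK_GHZ = 1.3
MEM_CLOCK_MHZ = 800
BUS_WIDTH_BITS = 512

# Assumptions (not stated on the slide):
SP_FLOPS_PER_CORE_PER_CLOCK = 3   # multiply-add plus a multiply
CORES_PER_DP_UNIT = 8             # one double-precision unit per 8 cores
DP_FLOPS_PER_UNIT_PER_CLOCK = 2   # one fused multiply-add
TRANSFERS_PER_MEM_CLOCK = 2       # double data rate memory


def peak_sp_gflops():
    return round(CORES * CORE_CLOCK_GHZ * SP_FLOPS_PER_CORE_PER_CLOCK, 1)


def peak_dp_gflops():
    dp_units = CORES // CORES_PER_DP_UNIT
    return round(dp_units * CORE_CLOCK_GHZ * DP_FLOPS_PER_UNIT_PER_CLOCK, 1)


def memory_bandwidth_gb_per_s():
    bytes_per_transfer = BUS_WIDTH_BITS / 8
    mb_per_s = MEM_CLOCK_MHZ * TRANSFERS_PER_MEM_CLOCK * bytes_per_transfer
    return round(mb_per_s / 1000, 1)


if __name__ == "__main__":
    print("single precision:", peak_sp_gflops(), "gigaflops")    # slide: 933
    print("double precision:", peak_dp_gflops(), "gigaflops")    # slide: 78
    print("memory bandwidth:", memory_bandwidth_gb_per_s(), "GB/sec")  # slide: 102
```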
What are we buying?
• 240 cores * 2 GPUs + 4 cores * 2 CPUs = 488 Cores / node
• 31 Nodes (minimum) * 488 Cores / node = 15,128 cores in our new cluster
• However, 20 of these nodes are dedicated buy-in nodes, so only 11 nodes * 488 cores / node = 5,368 cores will be available in the general cluster
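
The core counts above are simple to recompute if the node count changes. This is a small sketch using the numbers from the slides; the function and constant names are made up for illustration. Note that it adds GPU streaming cores and CPU cores into one total, as the slide does, even though the two kinds of core are not equivalent.

```python
# Recompute the cluster core counts from the per-node hardware.
GPU_CORES_PER_GPU = 240
GPUS_PER_NODE = 2
CPU_CORES_PER_CPU = 4
CPUS_PER_NODE = 2

TOTAL_NODES = 31      # minimum number of nodes
BUY_IN_NODES = 20     # dedicated buy-in nodes


def cores_per_node():
    gpu_cores = GPU_CORES_PER_GPU * GPUS_PER_NODE
    cpu_cores = CPU_CORES_PER_CPU * CPUS_PER_NODE
    return gpu_cores + cpu_cores


def cluster_cores(nodes):
    return nodes * cores_per_node()


if __name__ == "__main__":
    print(cores_per_node())                            # 488
    print(cluster_cores(TOTAL_NODES))                  # 15128
    print(cluster_cores(TOTAL_NODES - BUY_IN_NODES))   # 5368
```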