cuda developer tools: overview & exciting new features

41
Rafael Campana, Software Engineering Director, NVIDIA CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

Upload: others

Post on 02-Aug-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

Rafael Campana, Software Engineering Director, NVIDIA

CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

Page 2: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

2

Developer Workflow & Developer Tools portfolio

Overall updates

Developer Tool product overviews & new features

AGENDA

Page 3: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

3

DEVELOPER WORKFLOW

Application

Development

IDE integration

Debug Gfx APIDebug CUDA

System & CUDA

Trace

Graphics ProfilingCUDA Kernel

Profiling

Gfx GPU

crash dump

CUDA GPU

crash dump

Page 4: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

4

Nsight Eclipse EditionNsight Visual Studio Edition

DEVELOPER TOOLS PORTFOLIO

Application

Development

IDE integration

Debug Gfx APIDebug CUDA

System & CUDA

Trace

Graphics ProfilingCUDA Kernel

Profiling

Gfx GPU

crash dump

CUDA GPU

crash dump

Page 5: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

5

Nsight Eclipse EditionNsight Visual Studio Edition

DEVELOPER TOOLS PORTFOLIO

Application

Development

IDE integration

Debug Gfx APIDebug CUDA

System & CUDA

Trace

Graphics ProfilingCUDA Kernel

Profiling

Gfx GPU

crash dump

CUDA GPU

crash dump

Nsight Eclipse Editioncuda-gdbNsight Visual Studio EditionNsight Computecuda-memcheck & compute-sanitizerNsight Graphics

Page 6: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

6

Nsight Eclipse EditionNsight Visual Studio Edition

DEVELOPER TOOLS PORTFOLIO

Application

Development

IDE integration

Debug Gfx APIDebug CUDA

System & CUDA

Trace

Graphics ProfilingCUDA Kernel

Profiling

Gfx GPU

crash dump

CUDA GPU

crash dump

Nsight Eclipse Editioncuda-gdbNsight Visual Studio EditionNsight Computecuda-memcheck & compute-sanitizerNsight Graphics

cuda-gdb/Nsight Eclipse EditionNsight Visual Studio EditionNsight Aftermath

Page 7: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

7

Nsight Eclipse EditionNsight Visual Studio Edition

DEVELOPER TOOLS PORTFOLIO

Application

Development

IDE integration

Debug Gfx APIDebug CUDA

System & CUDA

Trace

Graphics ProfilingCUDA Kernel

Profiling

Gfx GPU

crash dump

CUDA GPU

crash dump

Nsight Eclipse Editioncuda-gdbNsight Visual Studio EditionNsight Computecuda-memcheck & compute-sanitizerNsight Graphics

cuda-gdb/Nsight Eclipse EditionNsight Visual Studio EditionNsight Aftermath

Nsight Systems

Page 8: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

8

Nsight Eclipse EditionNsight Visual Studio Edition

DEVELOPER TOOLS PORTFOLIO

Application

Development

IDE integration

Debug Gfx APIDebug CUDA

System & CUDA

Trace

Graphics ProfilingCUDA Kernel

Profiling

Gfx GPU

crash dump

CUDA GPU

crash dump

Nsight Eclipse Editioncuda-gdbNsight Visual Studio EditionNsight Computecuda-memcheck & compute-sanitizerNsight Graphics

cuda-gdb/Nsight Eclipse EditionNsight Visual Studio EditionNsight Aftermath

Nsight Systems

Nsight Compute

Nsight Graphics

Page 9: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

9

OVERALL UPDATESChips, platforms support across developer tools

CUDA 11.0 support OS support updatesMacOSX host platform only

Removal of Windows 7 support

Chips UpdateA100 GPU Support

Arm SBSA support

Page 10: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

10

NVIDIA® NSIGHT™ ECLIPSE EDITION

Plug-in to EclipseEclipse 4.8 to 4.11 support

Edit, build, debug CUDA-C applications

CUDA aware source code editor – syntax highlighting, code completion and inline help

Debugger - Seamless and simultaneous debugging of CPU and GPU code

NVCC build integration to cross compile for various target platforms

Docker support

Plug-ins enabling CUDA development

Documentation: https://docs.nvidia.com/cuda/nsight-eclipse-plugins-guide

Page 11: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

11

NSIGHT VISUAL STUDIO EDITION

Visual Studio 2015, 2017 and 2019 support

CPU + GPU CUDA Debugging

Source-correlated assembly debugging(SASS / PTX / SASS+PTX)

Data breakpoints for CUDA C/C++ code

Expressions in Locals, Watch and Conditionals

CUDA info view

Crash dump

Warp Watch view in NextGen debugger

Improved OptiX debugging (-lineinfo)

More info & documentation: https://developer.nvidia.com/nsight-visual-studio-edition

VS Extension enabling CUDA development

Page 12: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

12

NSIGHT TOOLS INTEGRATION INTO VISUAL STUDIO

Improved workflow for Nsight Systems, Nsight Graphics & Nsight Compute:• Settings passed to the standalone tool• Quick launch with key bindings• Complements Nsight Visual Studio Edition's Debugger

Works with Visual C++, C#, Visual Basic .NET, F#, and Python projectsOn Visual Studio Marketplace: installation and automatic update notificationsSupported in Visual Studio 2015, 2017, & 2019

Improved Workflow. Launches more powerful, standalone tools within Visual Studio

Tools appear under the Nsight menu (highlighted) Visual Studio project settings are transferred to the Nsight tool upon launch

More Info: https://developer.nvidia.com/nsight-tools-visual-studio-integration

Page 13: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

13

CUDA-GDB

Command line source and assembly (SASS) level debugger

Nsight Eclipse Edition debugging backend

Simultaneous CPU and GPU debugging

Inspect and modify memory, register, variable state

Control program execution

Runtime GPU error detection

Support for multiple GPUs, multiple contexts, multiple kernels, Thread focus

Core dump support

Documentation : http://docs.nvidia.com/cuda/cuda-gdb

Overview of command line debugger

(cuda-gdb) info cuda threads breakpoint all

BlockIdx ThreadIdx Virtual PC Dev SM Wp Ln Filename Line

Kernel 0

(1,0,0) (0,0,0) 0x0000000000948e58 0 11 0 0 infoCommands.cu 12

(1,0,0) (1,0,0) 0x0000000000948e58 0 11 0 1 infoCommands.cu 12

(1,0,0) (2,0,0) 0x0000000000948e58 0 11 0 2 infoCommands.cu 12

(1,0,0) (3,0,0) 0x0000000000948e58 0 11 0 3 infoCommands.cu 12

(1,0,0) (4,0,0) 0x0000000000948e58 0 11 0 4 infoCommands.cu 12

(1,0,0) (5,0,0) 0x0000000000948e58 0 11 0 5 infoCommands.cu 12

(cuda-gdb) info cuda threads breakpoint 2 lane 1

BlockIdx ThreadIdx Virtual PC Dev SM Wp Ln Filename Line

Kernel 0

(1,0,0) (1,0,0) 0x0000000000948e58 0 11 0 1 infoCommands.cu 12

Page 14: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

14

CUDA-GDB

Upgrade to GDB 8.2

MacOSX support brought back to life as a host

Performance improvements

● Module load time (~30% faster)

Quality improvements

● Improved handling of -lineinfo debug information (OptiX)● Improved debugging with parallel cuda-gdb sessions● Enabled CPU-side hardware watchpoints

New Features

Page 15: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

15

CUDA-MEMCHECK

Multiple tools

memcheck : reports out of bounds/misaligned memory access errors

racecheck : identifies races on __shared__ memory

initcheck : usage of uninitialized global memory

synccheck : identify invalid usage of __syncthreads() and __syncwarp()

Documentation: http://docs.nvidia.com/cuda/cuda-memcheck/index.html

Functional correctness checking tool suite

Page 16: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

16

COMPUTE-SANITIZER

Next-Gen replacement tool for cuda-memcheck

New command line interface (CLI) tool based on the Sanitizer API

Performance gain for applications using libraries such as CUSOLVER, CUFFT or DL frameworks

cuda-memcheck has performance issues on Windows that were inherent in its design

compute-sanitizer fixes that and brings performance on Windows to be on par with Linux.

Available in CUDA 11.0

OS: Linux (x86_64, Power, Arm SBSA), Windows

GPUs: Maxwell+

cuda-memcheck still supported in CUDA 11.0 (does not support Arm SBSA)

Documentation: https://docs.nvidia.com/cuda/compute-sanitizer

New tool for functional correctness checking tool suite

Page 17: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

17

COMPUTE-SANITIZER

Provides finer control than cuda-memcheck through APIs to analyze memory patterns

APIs are grouped into two categories:

Callback API – CUDA events such as memory allocations/kernel

Patching API – inserts patches for specific memory instructions

New in CUDA 11.0:

• SanitizerSetCallbackData API has been updated to take a function as input rather than a stream

• Added support for Cooperative Groups

• Memory Access callbacks for shared/local memory now report the address offset within the shared/local window

Documentation: https://docs.nvidia.com/cuda/compute-sanitizer

Samples: https://github.com/NVIDIA/compute-sanitizer-samples

API for functional correctness checking tool suite

Page 18: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

18

NSIGHT SUITE OF PROFILERS

Page 19: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

19

NSIGHT TOOLS WORKFLOW

Nsight SystemsComprehensive system-level

performance

Nsight ComputeDetailed CUDA kernel performance

Nsight GraphicsDetailed frame/render performance

Dive into top CUDA kernels by using metrics/counter

collection

Dive into graphicsframes

Start here

Re-check overall performance

Re-check overall performance

Page 20: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

20

NSIGHT SYSTEMSSystem profiler

Key Features:

• System-wide application algorithm tuning

• Multi-process tree support

• Locate optimization opportunities

• Visualize millions of events on a very fast GUI timeline

• Or gaps of unused CPU and GPU time

• Balance your workload across multiple CPUs and GPUs

• CPU algorithms, utilization and thread stateGPU streams, kernels, memory transfers, etc

• Command Line, Standalone, IDE Integration

OS: Linux (x86, Power, Arm SBSA, Tegra), Windows, MacOSX (host)

GPUs: Pascal+

Docs/product: https://developer.nvidia.com/nsight-systems

Page 21: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

21

CPU

utilization

Processes &

threads

OS runtime

APIs

CUDA &

cuBLAS APIs

GPU CUDA

Kernels & memory

transfers

CPU IP &

backtrace

sample data

NVTX

annotations

Additionally

Trace:• TensorRT• Direct3D11,12,DXR• Vulkan• OpenGL• OpenACC• MPI• OpenMP• Ftrace• ETW• WDDM• GPU Context Switch

Export:• SQLite• HDF5• JSON

Architectures:• X86_64• Power• Arm SBSA• Tegra

NVTX projected on

GPU CUDA streams

Page 22: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

22

NSIGHT SYSTEMS

● NVTX projected

onto GPU

● Event table

Key Features

Page 23: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

23

NSIGHT SYSTEMSMPI & OpenACC Trace

Page 24: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

24

NSIGHT SYSTEMSOpenMP 5 API Trace

Page 25: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

25

NSIGHT SYSTEMSCUDA graph node correlation & NVTX projection

Page 26: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

26

● Before

No utilization

No sorting

● After

Clear activity/state

Sort by utilization

NSIGHT SYSTEMSImproved UX without thread scheduler data permission

Page 27: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

27

NSIGHT SYSTEMS

CLI improvement

Simultaneous sessions & management

nsys launch forwards terminal and signals

nsys stats command

nsys export command

CUDA context “All Streams” aggregation tree

Windows WDDM GPU memory usage, paging, evictions

Graphics trace enhancements

Other Features

Page 28: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

28

NSIGHT COMPUTEKernel Profiling Tool

Key Features:

• Interactive CUDA API debugging and kernel profiling

• Fast Data Collection

• Compare performance metrics across different runs

• Fully Customizable (Programmable UI/Guided Analysis)

• Command Line, Standalone, IDE Integration

OS: Linux (x86, Power, Tegra, Arm SBSA), Windows, MacOSX(host only)

GPUs: Volta, Turing, A100 GPUs

Docs/product: https://developer.nvidia.com/nsight-compute

Page 29: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

29

NSIGHT COMPUTEProfile Report – Details Page

Focused Sections

All Data on

Single Page

Ordered from Top-Level to

Low-Level

Page 30: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

30

NSIGHT COMPUTESection Example

Section Headerprovides overview &

context for other sections

Section Bodyprovides additional

details (tables & charts)

Section Configcompletely data driven

add/modify/change sections

Page 31: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

31

NSIGHT COMPUTEUnguided Analysis / Rules System

Analysis Rulesrecommendations from

nvvp and more

Rules Configcompletely data driven

add/modify/change rules

Page 32: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

32

NSIGHT COMPUTEDiff’ing kernel runs

Metric deltacurrent values and changes

from baseline

Baselinefrom any previous profile report

(different kernel, gpu, …)

Chart differencecurrent values and

baseline values

Page 33: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

33

NSIGHT COMPUTE

Support for Arm (SBSA)

target architecture

Support for PowerPC

target architecture

Profile activity command

line is shown

New Features

Page 34: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

34

NSIGHT COMPUTE

Detailed breakdown of

high-level SOL metrics

New Features

Page 35: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

35

NSIGHT COMPUTE

Improved help for Source page

and Details page charts

New Features

Page 36: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

36

NSIGHT COMPUTE

Support for CUDA Asynchronous Copy

Sparse Data compression

A100 GPU support

Page 37: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

37

NSIGHT COMPUTE

Efficient way to evaluate kernel characteristics, quickly understand potential directions for further improvements or existing limiters

Inputs: Arithmetic Intensity (FLOPS/bytes)Performance (FLOPS/s)

Ceilings: Peak Memory BandwidthPeak FP32/FP64 Performance

New Roofline analysis

Page 38: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

38

NSIGHT COMPUTE

New Hot Spot Tables

Result Filtering by NVTX

Source Page can now be collapsed to generate aggregate metrics

Recommendations: links to other sections & lines on Source page, access instanced metrics

Workflow Improvements

Page 39: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

39

NSIGHT COMPUTE

Shortened Executables and File Extensions

Old names continue to work

New rule for reporting uncoalesced memory accesses

Support report name placeholders %p, %q, %i, and %h

Improved documentation

Other Improvements

Versions UI CLI Report file extension

2019.5 (CUDA 10.2) and before nv-nsight-cu nv-nsight-cu-cli .nsight-cuprof-report

2020.1 (CUDA 11.0) and newer ncu-ui ncu .ncu-rep

Page 40: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES

40

USEFUL LINKS

Web: https://developer.nvidia.com/tools-overview

How to contact us? Forums: https://forums.developer.nvidia.com/c/development-toolsemail: [email protected]

Other digital GTC talks of interest:

S21351: Scaling the Transformer Model Implementation in PyTorch Across Multiple Nodes

S21547: Rebalancing the Load:Profile-Guided Optimization of the NAMD Molecular Dynamics Program for Modern GPUs using Nsight Systems

S21771: Optimizing CUDA Kernels in HPC Simulation and Visualization Codes using Nsight Compute

S21565: Roofline Performance Model for HPC and Deep-Learning Applications

Page 41: CUDA DEVELOPER TOOLS: OVERVIEW & EXCITING NEW FEATURES