Introduction to Deep Learning (NVIDIA)

Oct 2016

NVIDIA DEEP LEARNING

ENTERPRISE • AUTO • GAMING • DATA CENTER • PRO VISUALIZATION

THE WORLD LEADER IN VISUAL COMPUTING

THE BIG BANG IN MACHINE LEARNING

DNN • GPU • BIG DATA

100 hours of video uploaded every minute

350 million images uploaded per day

2.5 Petabytes of customer data hourly

[Chart: NVIDIA GPU vs. x86 CPU, 2008-2014]

BIG DATA & ANALYTICS

AUTOMOTIVE: Auto sensors reporting location, problems

COMMUNICATIONS: Location-based advertising

CONSUMER PACKAGED GOODS: Sentiment analysis of what’s hot, problems

FINANCIAL SERVICES: Risk & portfolio analysis, new products

EDUCATION & RESEARCH: Experiment sensor analysis

HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg. quality, warranty analysis

LIFE SCIENCES: Clinical trials

MEDIA/ENTERTAINMENT: Viewers / advertising effectiveness

ON-LINE SERVICES / SOCIAL MEDIA: People & career matching

HEALTH CARE: Patient sensors, monitoring, EHRs

OIL & GAS: Drilling exploration sensor analysis

RETAIL: Consumer sentiment

TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows

UTILITIES: Smart meter analysis for network capacity

LAW ENFORCEMENT & DEFENSE: Threat analysis - social media monitoring, photo analysis

EXPONENTIAL DATA GROWTH

INCREASING DATA VARIETY

Search Marketing

Behavioral Targeting

Dynamic Funnels

User Generated Content

Mobile Web

SMS/MMS

Sentiment

HD Video

Speech To Text

Product/Service Logs

Social Network

Business Data Feeds

User Click Stream

Sensors

Infotainment Systems

Wearable Devices

CyberSecurity Logs

Connected Vehicles

Machine Data

IoT Data

Dynamic Pricing

Payment Record

Purchase Detail

Purchase Record

Support Contacts

Segmentation

Offer Details

Web Logs

Offer History

A/B Testing

BUSINESS PROCESS

Streaming Video

Natural Language Processing

DIGITAL

AI

90% of the world’s data created in the last year - IBM

WHAT IS DEEP LEARNING?

ARTIFICIAL INTELLIGENCE: Perception, Reasoning, Planning

MACHINE LEARNING: Optimization, Computational Statistics, Supervised and Unsupervised Learning

DEEP LEARNING: Neural Networks, Distributed Representations, Hierarchical Explanatory Factors, Unsupervised Feature Engineering

DEEP LEARNING FUELING DISCOVERY

Classify Satellite Images for Carbon Monitoring

Analyze Obituaries on the Web for Cancer-related Discoveries

Determine Drug Treatments to Increase Child’s Chance of Survival

NASA AMES

DEEP LEARNING FOR EVERY APPLICATION

Visual search for e-commerce

Visual Search in Geoinformatics

Improving Agriculture: LettuceBot only sprays weeds

Language Classification

Deep Learning CNN

Super-Human Language Translation

DEEP LEARNING FOR EVERY APPLICATION

CONSUMERS LOVE DEEP LEARNING

MORE THAN 1,500 AI START UPS AROUND THE WORLD

Deep Learning for Art

Deep Learning for Cybersecurity

Deep Learning for Genomics

Deep Learning for Self-Driving Cars

IMAGENET CHALLENGE

Where it all started … again

Example image labels: person, hammer, flower pot, power drill; person, helmet, motorcycle

1.2M training images • 1000 object categories

ACHIEVING SUPERHUMAN PERFORMANCE

2012: Deep Learning researchers worldwide discover GPUs

2015: Deep Learning achieves superhuman image recognition on ImageNet

2016: Microsoft achieves speech recognition milestone

DEEP LEARNING ADOPTION IS EXPONENTIAL

# of Organizations Using Deep Learning

Source: Jeff Dean, Spark Summit 2016

MASSIVE COMPUTING CHALLENGE

SPEECH RECOGNITION

2014, Deep Speech 1: 80 GFLOP, 7,000 hrs of data, ~8% error

2015, Deep Speech 2: 465 GFLOP, 12,000 hrs of data, ~5% error

10X Training Ops

IMAGE RECOGNITION

2012, AlexNet: 8 layers, 1.4 GFLOP, ~16% error

2015, ResNet: 152 layers, 22.6 GFLOP, ~3.5% error

16X Model
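The 10X and 16X labels can be sanity-checked against the figures on this slide. The sketch below is one plausible reading, not NVIDIA's stated method: it assumes "16X Model" is the ratio of per-image compute (ResNet vs. AlexNet) and that "10X Training Ops" is roughly per-sample compute multiplied by hours of training data (Deep Speech 2 vs. Deep Speech 1). The variable names are illustrative.

```python
# Figures copied from the slide; how they combine is an assumption.
DEEP_SPEECH_1 = {"gflop": 80.0, "hours": 7_000}
DEEP_SPEECH_2 = {"gflop": 465.0, "hours": 12_000}
ALEXNET_GFLOP = 1.4
RESNET_GFLOP = 22.6

# Growth in per-sample compute for the speech models.
speech_model_growth = DEEP_SPEECH_2["gflop"] / DEEP_SPEECH_1["gflop"]

# Growth in the amount of training data.
speech_data_growth = DEEP_SPEECH_2["hours"] / DEEP_SPEECH_1["hours"]

# Assumed: total training work scales with compute per sample times data volume.
training_ops_growth = speech_model_growth * speech_data_growth

# Growth in per-image compute for the vision models.
image_model_growth = RESNET_GFLOP / ALEXNET_GFLOP

print(f"Speech model compute growth: {speech_model_growth:.2f}x")
print(f"Speech data growth:          {speech_data_growth:.2f}x")
print(f"Speech training ops growth:  {training_ops_growth:.2f}x")
print(f"Image model compute growth:  {image_model_growth:.2f}x")
```

Under these assumptions the speech figure comes out just under 10x and the image figure just over 16x, which is consistent with the rounded labels on the slide.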

NVIDIA DEEP LEARNING PLATFORM

TRAINING: DIGITS Training System, Deep Learning Frameworks, Tesla P100, DGX-1

DATACENTER INFERENCING: DeepStream SDK, TensorRT, Tesla P40 & P4

DEVICE

NVIDIA DEEP LEARNING PLATFORM

TRAINING: Tesla P100, 65X in 3 years

DATACENTER INFERENCING: Tesla P4, 40X vs CPU

DEVICE

Training: compared with a Kepler GPU in 2013 using Caffe. Inference: img/sec/watt compared with a CPU (Intel E5-2697v4) using AlexNet.

TESLA P4

Maximum Efficiency for Scale-out Servers

40x Efficient vs CPU, 8x Efficient vs FPGA

5.5 TFLOPS

[Chart: AlexNet; bars for CPU, FPGA, 1x M4 (FP32), 1x P4 (INT8)]

TESLA P40

Highest Throughput for Scale-up Servers

4x Boost in Less than One Year

[Chart: GoogLeNet and AlexNet; bars for 8x M40 (FP32), 8x P40 (INT8); axis 20,000 to 100,000]
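The P4 and P40 results above are labeled INT8, while the M4 and M40 baselines are FP32. As a rough illustration of what running inference in 8-bit integers involves, here is a minimal sketch of generic symmetric per-tensor quantization. It is not TensorRT's calibration procedure, and the function names and sample values are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one scale per tensor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot(q_a, scale_a, q_b, scale_b):
    """Dot product with integer multiply-accumulate, rescaled to float at the end."""
    accumulator = sum(x * y for x, y in zip(q_a, q_b))  # stays an integer
    return accumulator * scale_a * scale_b


# Hypothetical weights and activations for a single neuron.
weights = [0.42, -1.30, 0.07, 0.88]
activations = [1.5, 0.25, -2.0, 0.5]

q_weights, weight_scale = quantize_int8(weights)
q_activations, activation_scale = quantize_int8(activations)

fp32_result = sum(w * a for w, a in zip(weights, activations))
int8_result = int8_dot(q_weights, weight_scale, q_activations, activation_scale)

print("FP32 result:", fp32_result)
print("INT8 result:", int8_result)
print("Absolute error:", abs(fp32_result - int8_result))
```

The integer accumulation is where hardware with fast 8-bit arithmetic gains throughput; the small difference between the two results is the price of the reduced precision.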

INTRODUCING TESLA P100

Page Migration Engine: Virtually Unlimited Memory

CoWoS HBM2: 3D Stacked Memory (i.e., fast!)

NVLink: GPU Interconnect for Maximum Scalability

NVIDIA DGX-1

AI Supercomputer-in-a-Box

Instant productivity — plug-and-play, supports every AI framework

Performance optimized across the entire stack

Always up-to-date via the cloud

Mixed framework environments, containerized

Direct access to NVIDIA experts

DGX STACK

Fully integrated Deep Learning platform

NVIDIA POWERS DEEP LEARNING

Every major DL framework leverages NVIDIA SDKs

Mocha.jl

NVIDIA DEEP LEARNING SDK

COMPUTER VISION SPEECH & AUDIO NATURAL LANGUAGE PROCESSING

OBJECT DETECTION

IMAGE CLASSIFICATION

VOICE RECOGNITION

LANGUAGE TRANSLATION

RECOMMENDATION ENGINES

SENTIMENT ANALYSIS

NVIDIA DIGITS

Interactive Deep Learning GPU Training System

Interactive deep neural network development environment for image classification and object detection

Schedule, monitor, and manage neural network training jobs

Analyze accuracy and loss in real time

Track datasets, results, and trained neural networks

Scale training jobs across multiple GPUs automatically

NVIDIA cuDNN

Accelerating Deep Learning

High performance building blocks for deep learning frameworks

Drop-in acceleration for widely used deep learning frameworks such as Caffe, CNTK, TensorFlow, Theano, Torch and others

Accelerates industry-vetted deep learning algorithms, such as convolutions, LSTM, fully connected, and pooling layers

Fast deep learning training performance tuned for NVIDIA GPUs
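To make two of the building blocks named above concrete, the following is a minimal pure-Python sketch of a 2D convolution (computed as cross-correlation, as CNN layers usually do) followed by max pooling. It only illustrates the arithmetic; it does not use or resemble the cuDNN API, and the function names, image, and kernel are made up for the example.

```python
def conv2d(image, kernel):
    """Stride-1, no-padding 2D cross-correlation of a 2D list with a 2D kernel."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kernel_h)
                for dj in range(kernel_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[i * size + di][j * size + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 5x5 single-channel image and 2x2 kernel.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 3, 2, 1],
    [2, 0, 1, 0, 2],
    [1, 3, 0, 2, 1],
    [0, 1, 2, 1, 0],
]
kernel = [
    [1, 0],
    [0, -1],
]

features = conv2d(image, kernel)  # 4x4 feature map
pooled = max_pool2d(features)     # 2x2 after pooling

print(features)
print(pooled)
```

The nested loops here are exactly the work that a GPU library parallelizes: every output element is an independent sum of products, so many of them can be computed at once.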

Deep Learning Training Performance

Caffe AlexNet

[Chart bars: K40, K80 + cuDN…, M40 + cuDNN4, P100 + cuDNN5]

“NVIDIA has improved the speed of cuDNN with each release while extending the interface to more operations and devices at the same time.” - Evan Shelhamer, Lead Caffe Developer, UC Berkeley

AlexNet training throughput on CPU: 1x E5-2680v3 12 Core 2.5GHz, 128GB System Memory, Ubuntu 14.04. M40 bar: 8x M40 GPUs in a node. P100: 8x P100, NVLink-enabled.

INTRODUCING NVIDIA TensorRT

High Performance Inference Engine

User Experience: Instant Response

45x Faster with Pascal + TensorRT

Faster, more responsive AI-powered services such as voice recognition, speech translation

Efficient inference on images, video, & other data in hyperscale production data centers

[Chart: Inference Execution Time (ms), axis 0 to 300; labels: 1x CPU (14 cores), 260 ms]

Training

Device

Datacenter

NVIDIA DEEPSTREAM SDK

Delivering Video Analytics at Scale

Pipeline: Hardware Decode, Preprocess, Inference, “Boy playing soccer”

Simple, high performance API for analyzing video

Decode H.264, HEVC, MPEG-2, MPEG-4, VP9

CUDA-optimized resize and scale

TensorRT

1x Tesla P4 Server + DeepStream SDK

13x E5-2650 v4 Servers

Concurrent Video Streams Analyzed

“Billions of intelligent devices will take advantage of deep learning to provide personalization and localization as GPUs become faster and faster over the next several years.” — Tractica

BILLIONS OF INTELLIGENT DEVICES

SMART CITIES OF THE FUTURE

“Pittsburgh's "predictive policing" program … police car laptops will display maps showing locations where crime is likely to occur, based on data-crunching algorithms developed by scientists at Carnegie Mellon University” - Science

ACCELERATED ANALYTICS TECHNOLOGY

GPU-ACCELERATION HAS NO LIMITS

MapD: MapD is 55x to 1,000x faster than comparable CPU databases on billion+ row datasets

Kinetica: Hardware costs that are 1/10 those of standard in-memory databases

BlazeGraph: 200-300x speed-up

Graphistry: See 100x more data at millisecond speed

SQream: The supercomputing power of the GPU, combined with SQream’s patented technology, results in up to 100 times faster analytics performance on terabyte-to-petabyte scale data sets

MASSIVE SCALE GPU ACCELERATED ANALYTICS

DEA theft of Silk Road bitcoins • SIEM attack escalation • Twitter botnet deconstruction

GETTING STARTED WITH DEEP LEARNING

developer.nvidia.com/deep-learning

Thank you!
