NVIDIA TensorRT Hyperscale Inference Platform Infographic · 2019-10-27


THE EXPLOSION OF AI

Demand for personalized services has led to a dramatic increase in the complexity, number, and variety of AI-powered applications and products. Applications use AI inference to recognize images, understand speech, or make recommendations. To be useful, AI inference has to be fast, accurate, and easy to deploy.

PROGRAMMABILITY · LOW LATENCY · ACCURACY · SIZE OF NETWORK · THROUGHPUT · EFFICIENCY · RATE OF LEARNING

THE POWER OF NVIDIA TensorRT

NVIDIA TensorRT™ is a high-performance inference platform that includes an optimizer, runtime engines, and an inference server for deploying applications in production. TensorRT speeds up applications by as much as 40X over CPU-only systems for video streaming, recommendation, and natural language processing.

PRODUCTION-READY DATA CENTER INFERENCE

The NVIDIA TensorRT inference server is a containerized microservice that enables applications to use AI models in data center production. It maximizes GPU utilization,

supports all popular AI frameworks, and integrates with Kubernetes and Docker.
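As an illustration of how the inference server consumes models, each model lives in a model repository alongside a small configuration file. The sketch below assumes a hypothetical TensorRT-plan model named `resnet50`; the field names follow the server's `config.pbtxt` format, but the tensor names, dims, and batch size here are made up for illustration:

```text
model_repository/
└── resnet50/
    ├── config.pbtxt
    └── 1/
        └── model.plan        # serialized TensorRT engine

# config.pbtxt
name: "resnet50"
platform: "tensorrt_plan"
max_batch_size: 8
input [
  {
    name: "input"             # hypothetical input tensor name
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "prob"              # hypothetical output tensor name
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

The server scans the repository at startup, loads each model version it finds, and exposes them all behind one endpoint, which is what lets a single container serve multiple applications.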


Weight and Activation Precision Calibration

Layer and Tensor Fusion

Multi-Stream Execution

Dynamic Tensor Memory

Kernel Auto-Tuning
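The first optimization in the list, weight and activation precision calibration, maps FP32 values into INT8 without significant accuracy loss. Below is a minimal sketch of the core idea using simple symmetric per-tensor scaling; TensorRT's actual calibrator uses a more sophisticated entropy-based method, and all names here are illustrative:

```python
# Sketch of symmetric INT8 calibration: pick a scale from observed
# activations, then quantize/dequantize through the INT8 range [-127, 127].
# This illustrates the idea only; it is not TensorRT's algorithm.

def int8_scale(activations):
    """Choose a per-tensor scale so the observed range maps onto [-127, 127]."""
    amax = max(abs(x) for x in activations)
    return amax / 127.0

def quantize(x, scale):
    q = round(x / scale)
    return max(-127, min(127, q))  # clamp to the symmetric INT8 range

def dequantize(q, scale):
    return q * scale

# Calibration data would normally be a representative sample of real inputs.
acts = [-2.0, -0.5, 0.1, 1.27, 3.81]
s = int8_scale(acts)                      # 3.81 / 127
q = [quantize(x, s) for x in acts]        # INT8 codes
x_hat = [dequantize(v, s) for v in q]     # reconstructed approximations
```

The reconstruction error is bounded by half a quantization step, which is why a well-chosen scale preserves accuracy while letting the hardware run 8-bit math.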

Application Developers: Avoid spending time writing inference capabilities from scratch and focus on creating innovative solutions with AI.

DevOps Engineers: Easily deploy inference services for multiple applications and take advantage of orchestration, load balancing, and autoscaling.

Data Scientists and Researchers: Focus on designing and training models using any of the top AI frameworks without worrying about inference implementation.

Image and Video Classification

Natural Language Processing

Recommendation Engine

Model Repository

THE BEST AI PLATFORM. www.nvidia.com/data-center-inference

© 2018 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, TensorRT, and Tesla are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. All other trademarks and copyrights are the property of their respective owners.

INTRODUCING THE NVIDIA AI INFERENCE PLATFORM

UNDERSTANDING INFERENCE PERFORMANCE

With inference, speed is just the beginning of performance. To get a complete picture of inference performance, there are seven factors to consider, ranging from programmability to rate of learning.

INSIDE THE NVIDIA TensorRT HYPERSCALE INFERENCE PLATFORM

The NVIDIA TensorRT Hyperscale Inference Platform delivers on all fronts: the best inference performance at scale, with the versatility to handle the growing diversity of today's networks.

STATE-OF-THE-ART AI

GPU Deep Learning with the NVIDIA TensorRT Hyperscale Inference Platform

NVIDIA T4 POWERED BY TURING TENSOR CORES

Efficient, high-throughput inference depends on a world-class platform. The NVIDIA® Tesla® T4 GPU is the world’s most advanced accelerator for all AI inference workloads. Powered by

NVIDIA Turing™ Tensor Cores, T4 provides revolutionary multi-precision inference performance to accelerate the diverse applications of modern AI.

Multi-Precision
FP16 | Up to 65 TFLOPS
INT8 | Up to 130 TOPS
(TFLOPS = trillion floating-point operations per second; TOPS = trillion operations per second)
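These peak figures give a quick back-of-the-envelope lower bound on per-inference latency: divide a model's operation count by the peak throughput. The sketch below uses a hypothetical 8-GFLOP model; real inference never reaches peak throughput, so treat the result as a bound, not a prediction:

```python
# Theoretical latency floor from the T4's peak throughput figures above.
T4_FP16_FLOPS = 65e12   # 65 TFLOPS at FP16
T4_INT8_OPS   = 130e12  # 130 TOPS at INT8

def min_latency_ms(model_ops, peak_ops_per_s):
    """Ideal per-inference latency (ms) if every cycle did useful math."""
    return model_ops / peak_ops_per_s * 1e3

# Hypothetical model costing 8 GFLOPs per inference:
fp16_ms = min_latency_ms(8e9, T4_FP16_FLOPS)  # about 0.12 ms
int8_ms = min_latency_ms(8e9, T4_INT8_OPS)    # exactly half the FP16 bound
```

Since the INT8 peak is twice the FP16 peak, the INT8 bound is exactly half, which is the arithmetic behind running calibrated INT8 models for throughput-critical services.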

[Diagram: NVIDIA TensorRT Inference Server deployed via Kubernetes on NVIDIA GPUs, with preprocessing and postprocessing plugins, serving models on GPU or CPU.]
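The Kubernetes integration can be sketched as a Deployment that schedules the inference server container onto a GPU node. This is a minimal, hypothetical manifest: the names, tag, and model volume are placeholders, the command-line flags follow the early server releases (check the matching release's documentation), and the `nvidia.com/gpu` resource assumes the NVIDIA device plugin is installed on the cluster:

```text
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorrt-inference-server    # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: trt-server
  template:
    metadata:
      labels:
        app: trt-server
    spec:
      containers:
      - name: trt-server
        image: nvcr.io/nvidia/tensorrtserver:<tag>   # pick a release tag from NGC
        args: ["trtserver", "--model-store=/models"]
        resources:
          limits:
            nvidia.com/gpu: 1        # schedule onto a GPU node via the device plugin
        volumeMounts:
        - name: models
          mountPath: /models
      volumes:
      - name: models
        emptyDir: {}                 # placeholder; real deployments mount a shared model store
```

Because the server is a stateless container reading from a model repository, scaling out is a matter of raising `replicas` and putting a load balancer in front, which is the orchestration/autoscaling benefit called out for DevOps engineers above.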
