understanding the use cases driving nvme-over-fabrics adoption · 12/5/2018 · gpu farm: nvidia...
TRANSCRIPT
Understanding the Use Cases Driving NVMe-over-Fabrics Adoption
Julie Herd Director, Technical Marketing
E8 Storage
NVMe Developer Days 2018 San Diego, CA
1
NVMe Market Adoption
§ Adoption shifting over time • PC / Servers continue to lead • Storage adoption ramping
§ NVMe-oF for Storage • Standard critical for ramp • Major players entering market
NVMe Developer Days 2018 San Diego, CA
2
Early NVMe Use Cases
Applications that drive business revenue § Financial Analytics § Genome Research § Artificial Intelligence / Machine Learning / Deep Learning § Fluid Dynamics
NVMe Developer Days 2018 San Diego, CA
3
NVMe for Financial Analytics
§ “Latency is King”
§ FinTech driving automated trading
§ Real-time transaction processing
§ Daily trade analytics
NVMe Developer Days 2018 San Diego, CA
4
Market Data Analytics for Financials
Before • 1152 local SSDs in 72 servers • Market data copied nightly to all servers • Restricted to 10TB-20TB
After • 48 SSDs in 2 E8-D24 appliances • Market data shared to all 72 servers • Easily scalable to 300TB
In production with 2 of the world’s Top-10 largest hedge funds
NVMe Developer Days 2018 San Diego, CA
5
70%Costreduction!
Accelerating the Science of Life
§ Primary genome sequencing • Historically – 10 hours per genome • NVMe – 10 genomes per hour!
§ Secondary genomic analysis • Analyzing genomes per population • Requires TB of data
NVMe Developer Days 2018 San Diego, CA
6
Genomic Acceleration
"We were keen to test E8 by trying to integrate it with our Univa Grid Engine cluster as a consumable resource of ultra-performance scratch space. Following some simple tuning and using a single EDR link we were able to achieve about 5GB/s from one node and 1.5M 4k IOPS from one node. Using the E8 API we were quickly able to write a simple Grid Engine prolog/epilog that allowed for a user-requestable scratch volume to be automatically created and destroyed by a job. The E8 box behaved flawlessly and the integration with InfiniBand was simpler than we could have possibly expected for such a new product." Dr. Robert Esnouf, Director of Research Computing Oxford Big Data Institute + Wellcome Center for Human Genetics
Shared NVMe as a fast tier for parallelizing genomic processing
NVMe Developer Days 2018 San Diego, CA
7
Example Architecture
NVMe Developer Days 2018 San Diego, CA
8
Artificial Intelligence / Machine Learning
§ Training phase is critical to AI • Massive amounts of data required for deep learning • Fast storage required to keep expensive GPUs busy
§ Data profile • Large and small I/O • Millions of files • Low latency
NVMe Developer Days 2018 San Diego, CA
9
AI/ML with IBM GPFS and NVIDIA
10
Shared NVMe Accelerates Deep Learning GPUFarm:NvidiaDGX-1• Upto8GPUspernode• GPFSClient+E8Agentrunon
x86withinGPUServer• Upto126GPUnodesincluster
Mellanox100GIB
SharedNVMeStorage• E8-D242U24-HA• Dual-port2.5”NVMeDrives• Upto184TB(raw)per2U• PatentedDistributedRAID6
NVMe Developer Days 2018 San Diego, CA
Fluid Dynamics and EDA
§ Electronic designs are validated using fluid dynamics • High performance clusters • High throughput and low latency are key
§ Examples • Formula 1 • Architecture • Airplanes
NVMe Developer Days 2018 San Diego, CA
11
High Frequency Trading with Storage Class Memory
Before • 20TB cache RAM dedicated to 13
servers • 20µs latency over 40G Infiniband • 26U @ 9100W storage solution
After • 24 Intel Optane™ 1TB SSDs in E8-X24 • Data shared to all database nodes • 2U @ 1200W storage solution
Extreme performance for real-time market adjustments
12 NVMe Developer Days 2018 San Diego, CA