Accelerate Analytics with a GPU Data Frame
Aaron Williams
October 18, 2017
on-demand.gputechconf.com/gtc-il/2017/presentation/...mapd...


Page 2:

MapD: Extreme Analytics

100x Faster Queries
MapD Core
The world’s fastest columnar database, powered by GPUs

+

Visualization at the Speed of Thought
MapD Immerse
A visualization front end that leverages the speed & rendering superiority of GPUs

Page 3:

MapD System Architecture
Accelerating the existing data infrastructure

Page 4:

MAPD DEMO

Page 5:

MapD Benchmarks
Blogger Mark Litwintschik benchmarked MapD on a billion-row taxi data set and found it to be up to several orders of magnitude faster than the fastest CPU databases.

MapD Core: Comparative Query Acceleration*

System | Q1 | Q2 | Q3 | Q4
BrytlytDB & 2-node p2.16xlarge cluster | 36x | 47x | 25x | 12x
ClickHouse, Intel Core i5 4670K | 49x | 58x | 32x | 25x
Redshift, 6-node ds2.8xlarge cluster | 74x | 24x | 14x | 6x
BigQuery | 95x | 38x | 6x | 6x
Presto, 50-node n1-standard-4 cluster | 190x | 75x | 61x | 41x
Amazon Athena | 305x | 117x | 37x | 13x
Elasticsearch (heavily tuned) | 386x | 343x | n/a | n/a
Spark 2.1, 11 x m3.xlarge cluster w/ HDFS | 485x | 153x | 119x | 169x
Presto, 10-node n1-standard-4 cluster | 524x | 189x | 127x | 61x
Vertica, Intel Core i5 4670K | 685x | 607x | 203x | 132x
Elasticsearch (lightly tuned) | 1,642x | 1,194x | n/a | n/a
Presto, 5-node m3.xlarge cluster w/ HDFS | 1,667x | 735x | 388x | 159x
Presto, 50-node m3.xlarge cluster w/ S3 | 2,048x | 849x | 164x | 86x
PostgreSQL 9.5 & cstore_fdw | 7,238x | 3,302x | 1,424x | 722x
Spark 1.6, 5-node m3.xlarge cluster w/ S3 | 12,571x | 5,906x | 3,758x | 1,884x

*All speed comparisons are to the “MapD & 1 Nvidia Pascal DGX-1” benchmark

Source: http://tech.marksblogg.com/benchmarks.html
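
Because every factor in the table is measured against the same MapD baseline, two non-MapD systems can also be compared with each other by dividing their factors. The sketch below assumes each factor is (system time) / (MapD time) for the same query; the function name is hypothetical and only three Q1 values from the table are used.

```python
# Q1 factors copied from the benchmark table: how many times slower each
# system was than the "MapD & 1 Nvidia Pascal DGX-1" baseline.
q1_factor = {
    "Redshift, 6-node ds2.8xlarge cluster": 74,
    "BigQuery": 95,
    "Amazon Athena": 305,
}


def relative_slowdown(factors, system_a, system_b):
    """Return how many times longer system_b takes than system_a.

    Assumes each factor is (system time) / (MapD time) for the same query,
    so the shared MapD baseline cancels out of the ratio.
    """
    return factors[system_b] / factors[system_a]


ratio = relative_slowdown(
    q1_factor, "Redshift, 6-node ds2.8xlarge cluster", "Amazon Athena"
)
print(f"Athena takes about {ratio:.1f}x as long as Redshift on Q1")
```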

Page 6:

Query Compilation with LLVM

Traditional DBs can be highly inefficient
• Each operator in SQL is treated as a separate function
• This incurs tremendous overhead and prevents vectorization

MapD compiles queries with LLVM to create one custom function
• Queries run at speeds approaching hand-written functions
• LLVM enables generic targeting of different architectures (GPUs, x86, ARM, etc.)
• Code can be generated to run a query on CPU and GPU simultaneously
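
The contrast between operator-at-a-time evaluation and a single compiled function can be sketched in plain Python. This is an analogy only: Python's built-in `compile`/`eval` stands in for LLVM, no GPU code is generated, and the expression format, column names, and helper names are all hypothetical rather than MapD's implementation.

```python
import operator

# Interpreted style: every operator in the expression tree is a separate
# function call, repeated for every row.
OPS = {">": operator.gt, "<": operator.lt, "and": lambda a, b: a and b}


def interpret(node, row):
    kind = node[0]
    if kind == "col":
        return row[node[1]]
    if kind == "const":
        return node[1]
    op, left, right = node
    return OPS[op](interpret(left, row), interpret(right, row))


# Compiled style: walk the tree once, emit source for one fused function,
# and compile it. The per-row work is then a single call.
def to_source(node):
    kind = node[0]
    if kind == "col":
        return f"row[{node[1]!r}]"
    if kind == "const":
        return repr(node[1])
    op, left, right = node
    return f"({to_source(left)} {op} {to_source(right)})"


def compile_predicate(node):
    source = f"lambda row: {to_source(node)}"
    return eval(compile(source, "<query>", "eval"))


# Hypothetical filter: fare > 10 AND distance < 5
expr = ("and",
        (">", ("col", "fare"), ("const", 10)),
        ("<", ("col", "distance"), ("const", 5)))
rows = [
    {"fare": 12.5, "distance": 3.0},
    {"fare": 8.0, "distance": 1.0},
    {"fare": 30.0, "distance": 9.0},
]

fused = compile_predicate(expr)
interpreted_result = [interpret(expr, r) for r in rows]
fused_result = [fused(r) for r in rows]
print(interpreted_result == fused_result)
```

Both paths return the same answers; the difference is that the fused version pays the tree-walking cost once per query instead of once per row.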


Page 7:

Keeping Data Close to Compute
MapD maximizes performance by optimizing memory use

• GPU RAM (L1): 24GB to 256GB, 1000-6000 GB/sec. Hot data: speedup = 1500x to 5000x over cold data
• CPU RAM (L2): 32GB to 3TB, 70-120 GB/sec. Warm data: speedup = 35x to 120x over cold data
• SSD or NVRAM storage (L3): 250GB to 20TB, 1-2 GB/sec. Cold data

Diagram labels: Compute Layer; Storage Layer; Data Lake/Data Warehouse/System Of Record

Speed increases toward L1; space increases toward L3.
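
As a back-of-envelope check, the speedup ranges can be compared with simple bandwidth ratios between tiers. The sketch below uses only the GB/sec figures from the slide; the function name is hypothetical. The warm-data range comes out as 35x to 120x, matching the slide. The hot-data ratio comes out as 500x to 6000x, which is wider than the slide's 1500x to 5000x, and the slide does not say how that figure was derived.

```python
# Bandwidth ranges in GB/sec, copied from the slide.
TIERS = {
    "GPU RAM (L1)": (1000, 6000),
    "CPU RAM (L2)": (70, 120),
    "SSD or NVRAM (L3)": (1, 2),
}


def speedup_range(fast, slow):
    """Bound the bandwidth ratio of two tiers.

    The smallest ratio pairs the fast tier's low end with the slow tier's
    high end; the largest ratio pairs the opposite ends.
    """
    fast_lo, fast_hi = fast
    slow_lo, slow_hi = slow
    return fast_lo / slow_hi, fast_hi / slow_lo


warm = speedup_range(TIERS["CPU RAM (L2)"], TIERS["SSD or NVRAM (L3)"])
hot = speedup_range(TIERS["GPU RAM (L1)"], TIERS["SSD or NVRAM (L3)"])
print(f"warm over cold: {warm[0]:.0f}x to {warm[1]:.0f}x")
print(f"hot over cold:  {hot[0]:.0f}x to {hot[1]:.0f}x")
```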

Page 8:

The Status Quo: Memory Bottlenecks

PCIe: 4-16 GB/s
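
To put the PCIe figure in perspective, here is a rough estimate of what it costs to hand data from one GPU tool to another through host memory. The assumptions are labeled in the code: the 8 GB data frame size is hypothetical, and the hand-off is assumed to cross PCIe twice (device to host, then host to device), since the slide's diagram is not part of this transcript.

```python
PCIE_GB_PER_SEC = (4, 16)  # bandwidth range from the slide


def transfer_seconds(size_gb, bandwidth_gb_per_sec, crossings=1):
    """Seconds to push size_gb across a link `crossings` times."""
    return crossings * size_gb / bandwidth_gb_per_sec


size_gb = 8  # hypothetical data frame size, not from the slides

# Assumed hand-off path: device -> host -> device, so two PCIe crossings.
best = transfer_seconds(size_gb, PCIE_GB_PER_SEC[1], crossings=2)
worst = transfer_seconds(size_gb, PCIE_GB_PER_SEC[0], crossings=2)
print(f"{size_gb} GB round trip over PCIe: {best:.1f}s to {worst:.1f}s")
```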

Page 9:

The GPU Open Analytics Initiative Model
Standard in-memory format; zero-copy interchange

Page 10:

The GPU Open Analytics Initiative Model
Standard in-memory format; zero-copy interchange
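
The idea of zero-copy interchange can be illustrated on the CPU with Python's standard library. This is an analogy, not the GOAI API: the slides describe tools sharing a data frame in GPU memory, whereas the sketch below uses host memory, an `array` as a stand-in column, and hypothetical function names.

```python
from array import array

# One buffer of fares, standing in for a column in a shared in-memory format.
fares = array("d", [12.5, 8.0, 30.0, 4.25])


def copy_handoff(column):
    """Hand-off by serializing: the consumer gets its own private copy."""
    return array("d", column.tobytes())


def zero_copy_handoff(column):
    """Hand-off by reference: the consumer gets a view of the same memory."""
    return memoryview(column)


copied = copy_handoff(fares)
shared = zero_copy_handoff(fares)

fares[0] = 99.0  # the producer updates the column in place

print(copied[0])  # the copy is detached from the producer's buffer
print(shared[0])  # the view reads the producer's memory directly
```

With a copy, every hand-off pays for moving the bytes and the two sides can drift apart; with a view, the hand-off cost is independent of the data size, provided both sides agree on the memory layout.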

Page 11:

Interactive Machine Learning
Empowering the People in the Pipeline

Personas in Analytics Lifecycle (Illustrative): Business Analyst; Data Scientist; Data Engineer; IT Systems Admin; Data Scientist / Business Analyst

Lifecycle stages: Data Preparation; Data Discovery & Feature Engineering; Model & Validate; Predict; Operationalize; Monitoring & Refinement; Evaluate & Decide

GPUs: MapD, H2O.ai, MapD

Page 12:

GOAI DEMO

Page 13:

Try MapD
It’s free and it’s easy (and @ortelius sez “it’s the new h0t sh1t”)

Play with the live demos: https://www.mapd.com/demos/

Download the Community Edition: https://www.mapd.com/platform/download-community/

Join our forums: https://community.mapd.com/

Review these slides: https://www.slideshare.net/aaronrogerwilliams

Page 14:

Aaron Williams
VP of Global Community

@_arw_ [email protected] /in/aaronwilliams/ /williamsaaron
