
Page 1

Sanhita Sarkar, Ph.D., Global Director, Analytics Software Development

Track: Machine Learning

September 25, 2019

2019 Storage Developer Conference, Santa Clara, CA © 2019 Western Digital Corporation or its affiliates. All rights reserved.

A Scalable Artificial Intelligence Data Pipeline for Accelerating Time to Insight

Page 2

Topics Covered

1. An AI Data Pipeline - Stages and Requirements
2. Disaggregated Architectures for AI Implementations
3. Training and Inference Performance on a Disaggregated Architecture
4. Data Ingestion Performance with Apache Kafka© to an NVMe™ All-Flash Array
5. Performance of a Data Pipeline from Object Notifications to Metadata Search
6. DEMO: A Real-World Use Case
7. Summary and Best Practices

Page 3

AI Pipeline is Data Intensive

Data Sources (IoT and Sensors, Business Processes, Legacy) -> Ingest (Transient, Centralized, High Performance) -> Data Preparation (ETL - Data Cleansing, Pre-processing) -> Model Training (Deep Learning Frameworks, Trained Models) -> Model Serving (Deploy Trained Models, Inference)

From data to results, the stages span very different timescales: hours and days for ingest and preparation, days and weeks for training, and seconds to results for serving. The pipeline must deliver both speed and scale.

Page 4

AI Pipeline has Varying Characteristics and Performance Requirements

Across the same stages (Data Sources -> Ingest -> Data Preparation -> Model Training -> Model Serving/Inference), the dominant concerns shift:
• Data Sources: data variety, data volume, data disparity
• Ingest: data volume, data velocity
• Data Preparation: data quality
• Model Training: model quality
• Model Serving/Inference: response time, throughput, data caching, data access latency

Page 5

AI Pipeline has Varying Infrastructure Requirements

The same stages (Data Sources -> Ingest -> Data Preparation -> Model Training -> Model Serving/Inference) call for a mix of storage tiers:
• Throughput-oriented Temporary Tier (NVMe™/Object Storage)
• Throughput-oriented, Globally Accessible Capacity Tier (Object Storage)
• Throughput-oriented, High Performance Tier (NVMe™)
• Throughput-oriented, Random I/O Performance and Capacity Tier (NVMe™/Hybrid)
• Low Latency, Throughput-oriented, Scalable, Random I/O Performance Tier (NVMe™)
• Scalable, Large/Sequential I/O, Capacity Archival Tier (Object Storage)

Page 6

Topics Covered (agenda recap - see Page 2)

Page 7

Aggregated vs. Disaggregated Architecture for AI

Aggregated: GPU server(s) collocated with flash connect over a data network to object storage, the centralized data repository, with ingest, data preparation, model training, model serving, and inference clients built around that single server.
• Model training is limited to the flash storage capacity integrated in the GPU servers, incurring multiple data transfers from the object storage once data grows beyond the servers' capacity.
• This delays model serving and inference.

Disaggregated: a pool of GPUs and a shared pool of NVMe™ flash storage connect over a data network to object storage; training data and trained models flow through the shared flash tier to model serving and inference clients.
• Model training can scale independently on a disaggregated pool of GPUs, shared flash, and object storage, with no subsequent data transfers.
• Inference by the model serving client is faster due to immediate access to trained models on the shared flash storage.

Page 8

Option 1: Disaggregated Architecture of an AI Data Pipeline

• Ingest lands raw S3 objects in a centralized, high capacity landing zone on object storage, which also retains older model versions (Model vn+1, vn+2, ..., v2n).
• Data preparation stages training and validation data on high performance rack-scale flash storage, which also holds the current models (Model v1, v2, ..., vn).
• Model training reads batched training data on a GPU cluster running an AI framework (TensorFlow, PaddlePaddle, PyTorch, Caffe, etc.), trains and validates the model, and produces and publishes CNN/RNN models.
• The models are deployed to AI model serving, which answers inference requests from clients.

Page 9

Implemented Workflow: Option 1

• New image, text, and video S3™ objects are ingested into a storage bucket on object storage.
• A service for object notifications (Kafka producer) gets notifications of newly ingested S3 objects and publishes the system metadata of the new S3 objects to a Kafka topic.
• AI model serving (TensorFlow/PaddlePaddle/PyTorch) subscribes to the Kafka topic to get the S3 locations of the newly ingested objects, gets the S3 objects from object storage, and uses AI/deep learning algorithms to analyze the objects and generate labels and custom metadata. It gets the AI model from the NVMe™ All-Flash Array.
• If the inference score is above the threshold, the labels, paths, and custom metadata of the S3 objects are stored in Elasticsearch© (key-value store); a Kibana™ dashboard provides search and visualization of the custom metadata of image/text/video S3 objects.
• If the inference score is below the threshold, the S3 locations of the failed objects are notified to be used for re-training the model.
• Data preparation and AI model training (TensorFlow/PaddlePaddle/PyTorch models) run on a GPU cluster, with the training data on the NVMe™ All-Flash Array.
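To make this workflow concrete, here is a minimal sketch of the serving loop, assuming the kafka-python and elasticsearch-py 8.x clients and boto3 against an S3-compatible endpoint; the hostnames, topic, bucket, and index names are hypothetical, and predict() is a stub standing in for the TensorFlow/PaddlePaddle/PyTorch model call.

```python
# Sketch of the Option 1 serving loop: consume S3 object notifications from a
# Kafka topic, fetch each object, run inference, and either index the metadata
# in Elasticsearch or flag the object for re-training.
import json

import boto3
from elasticsearch import Elasticsearch
from kafka import KafkaConsumer

SCORE_THRESHOLD = 0.8  # hypothetical threshold; the demo on Page 30 uses 80%

def predict(payload: bytes):
    """Hypothetical stand-in for the deployed deep learning model."""
    return ["label"], 0.9

consumer = KafkaConsumer(
    "s3-object-notifications",                       # hypothetical topic name
    bootstrap_servers=["kafka1:9092"],
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
s3 = boto3.client("s3", endpoint_url="http://objectstore:9000")  # hypothetical
es = Elasticsearch("http://es1:9200")                            # hypothetical

for message in consumer:
    note = message.value                  # system metadata of a new S3 object
    bucket, key = note["bucket"], note["key"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    labels, score = predict(body)
    if score >= SCORE_THRESHOLD:
        # Index labels, path, and custom metadata for search in Kibana.
        es.index(index="object-metadata",
                 document={"bucket": bucket, "key": key,
                           "labels": labels, "score": score})
    else:
        # Below threshold: notify the S3 location for model re-training.
        print(f"re-train candidate: s3://{bucket}/{key} (score={score:.2f})")
```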

Page 10

Option 2: Disaggregated Architecture of an AI Data Pipeline

• Ingest lands raw data directly on high performance rack-scale flash storage; object storage serves as a high capacity data archive for archived raw data, archived training data, and older model versions (Model vn+1, vn+2, ..., v2n).
• Data preparation stages training and validation data on the flash tier, which also holds the current models (Model v1, v2, ..., vn).
• Model training reads batched training data on a GPU cluster running an AI framework (TensorFlow, PaddlePaddle, PyTorch, Caffe, etc.), trains and validates the model, and produces and publishes CNN/RNN models.
• The models are deployed to AI model serving, which answers inference requests from clients.

Page 11

Implemented Workflow: Option 2

• Image, text, and video streams are ingested through Apache Kafka© onto an NVMe™ All-Flash Array as raw data.
• Data preparation and AI model training (TensorFlow/PaddlePaddle/PyTorch) run on a GPU cluster, with training data and models on the flash array.
• AI model serving (TensorFlow/PaddlePaddle/PyTorch) gets the input data and models from the flash array and uses AI/deep learning algorithms to analyze the input data files and generate labels and custom metadata.
• If the inference score is above the threshold, the labels, paths, and custom metadata of the files are stored in Elasticsearch© (key-value store); a Kibana™ dashboard provides search and visualization of the custom metadata of image/text/video files.
• If the inference score is below the threshold, the locations of the failed data files are notified to be used for re-training the model.
• An S3™ Cloud Connector compresses and moves old files and models from the flash array to object buckets on object storage.

Page 12

Option 3: Disaggregated Architecture of an AI Data Pipeline

• Ingest lands raw data on a centralized, globally accessible high capacity tier (object storage), which also serves as the archive for raw data, training data, and older model versions (Model vn+1, vn+2, ..., v2n).
• Data preparation stages training and validation data on the same tier, which holds the current models (Model v1, v2, ..., vn).
• Model training reads batched training data on a GPU cluster running an AI framework (TensorFlow, PaddlePaddle, PyTorch, Caffe, etc.), trains and validates the model, and produces and publishes CNN/RNN models.
• The models are deployed to AI model serving, which answers inference requests from clients.

Page 13

Implemented Workflow: Option 3

• A data generator/streamer streams new image, text, and video files into a directory on object storage with a file interface.
• A watcher sends event notifications and locations of newly ingested files, pushing the latest files to a Kafka producer. The producer publishes the new files to Kafka Topic 1, and the locations and metadata of the new files to Kafka Topic 2.
• AI model serving (TensorFlow/PaddlePaddle/PyTorch) gets the files and the AI model from the object storage file interface and uses AI/deep learning algorithms to analyze the files and generate custom metadata.
• If the inference score is above the threshold, the paths and custom metadata of the files are stored in Elasticsearch© (key-value store); a Kibana™ dashboard provides search and visualization of the custom metadata of image/text/video files.
• If the inference score is below the threshold, the locations of the failed files are notified to be used for re-training the model.
• Data preparation and AI model training run on a GPU cluster against the training data, raw data, and models held on the object storage file interface.

Page 14

Topics Covered (agenda recap - see Page 2)

Page 15

Image Training Performance: Disaggregated Flash and GPUs


• On a disaggregated architecture comprising an NVMe™ All-Flash Array and a single 8-GPU server, training performance with most AI models scales almost linearly up to 4 GPUs and reaches ~7x with 8 GPUs; the exceptions are AlexNet and LeNet, whose training performance saturates at 1 GPU.
• With multiple GPU servers, training performance scales linearly with the number of servers, irrespective of the choice of AI model.
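The deck does not include its training scripts; as an illustrative sketch of single-server multi-GPU data parallelism, a TensorFlow 2.x MirroredStrategy run looks like the following (the dataset path is hypothetical):

```python
# Data-parallel ResNet-50 training across all GPUs visible on one server.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()       # one replica per GPU
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.applications.ResNet50(weights=None, classes=1000)
    model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

# Hypothetical input pipeline reading image data from the shared flash tier.
dataset = tf.keras.utils.image_dataset_from_directory(
    "/mnt/flash/imagenet/train", image_size=(224, 224), batch_size=256)

model.fit(dataset, epochs=1)
```

Small models such as LeNet are bottlenecked on input I/O rather than GPU compute (see the next page), which is consistent with their saturation at 1 GPU.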

[Chart: Image Training Performance Throughput on an 8-GPU server - images/sec with 1, 2, 4, and 8 GPUs for ResNet-50, AlexNet, ResNet-152, Inception-V3, LeNet, Inception-V4, and GoogleNet; higher is better.]

[Chart: Image Training Performance Throughput by Model - images/sec from 8 to 64 GPUs for Inception-V4, ResNet-152, Inception-V3, ResNet-50, GoogleNet, AlexNet, and LeNet; higher is better.]

Page 16

I/O Throughput: Image Training on Disaggregated Flash and GPUs


• On a disaggregated architecture comprising a single 8-GPU server and an NVMe™ All-Flash Array:
  - The average I/O throughput during training with the ResNet-50 model (compute intensive) is ~653 MB/s at ~100% GPU utilization (164 GB of image data, each image ~100 KB).
  - The average I/O throughput during training with the LeNet model (I/O intensive) is ~2.2 GB/s at ~20% GPU utilization. The LeNet model therefore requires ~3.4x the I/O throughput of ResNet-50.
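As a back-of-envelope check (not from the deck itself), the required read throughput is roughly images/sec times average image size:

```python
# Sustained read bandwidth needed to keep a training job fed with images.
def required_read_mbps(images_per_sec: float, image_kb: float = 100.0) -> float:
    return images_per_sec * image_kb / 1024  # MB/s, assuming ~100 KB images

# ~653 MB/s at ~100 KB/image implies ~6,700 images/sec for ResNet-50;
# LeNet's ~2.2 GB/s is ~3.4x that I/O rate, matching the ratio quoted above.
print(2.2 * 1024 / 653)  # ~3.45
```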

[Chart: I/O Throughput (MB/s) and GPU usage over time - image training with ResNet-50 on 8 GPUs.]

[Chart: I/O Throughput (MB/s) and GPU usage over time - image training with LeNet on 8 GPUs.]

Page 17

Image Inference Performance on Disaggregated Flash and GPUs


• Inference throughput is measured as the aggregate images/sec of inference results using ImageNet datasets across multiple GPU containers.
• On a disaggregated architecture comprising an NVMe™ All-Flash Array and:
  - a single 8-GPU server, results show inference image processing rates of ~3.3x to ~4x the training rates of the corresponding TensorFlow models;
  - multiple GPU servers, users have the flexibility to run mixed AI workloads, dedicating one or two GPUs out of every 8 to inference and the rest to training.

[Chart: Image Inference Performance Throughput on an 8-GPU server - images/sec with 1, 2, 4, and 8 GPUs for ResNet-50, Inception-V3, Inception-V4, VGG-16, and ResNet-152; higher is better.]

[Chart: Image Inference Performance Throughput by Model - images/sec at 8, 16, 32, and 64 GPUs for Inception-V4, ResNet-152, VGG-16, Inception-V3, and ResNet-50; higher is better.]

Page 18

Example Configurations: GPU servers and an NVMe™ All-Flash Array


• An example allocation strategy for the NVMe All-Flash Arrays is considered for executing AI workloads: 30% of the I/O bandwidth for model training, and the remaining 70% for other phases such as data preparation, inference, and other activities.
• Using this allocation strategy, example configurations are derived from the I/O throughput required on a disaggregated architecture comprising a single NVMe All-Flash Array and 8-GPU servers, with ResNet-50 and LeNet as the training models: a single NVMe All-Flash Array can scale up to twelve 8-GPU servers running the ResNet-50 model for the training phase at 100% GPU utilization, while with the LeNet model a single array can optimally scale up to three 8-GPU servers (see the sizing sketch after the table below).


NVMe™ All-Flash Array sizing (Number of Flash Arrays: 1; Mixed I/O Throughput: 23.5 GB/s at 80% reads / 20% writes):

AI Model Used for Training        | Number of 8-GPU Servers to Scale
Compute-intensive ResNet-50 Model | 12
I/O-intensive LeNet Model         | 3
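The server counts in the table can be approximated from the figures above; this is an illustrative calculation rather than the deck's exact method, so the results land near, not exactly on, the quoted numbers:

```python
# Approximate how many 8-GPU servers one array can feed during training,
# given the 30% bandwidth allocation and the per-server training throughput
# measured earlier (~0.653 GB/s for ResNet-50, ~2.2 GB/s for LeNet).
ARRAY_BW_GBPS = 23.5      # mixed 80/20 read/write throughput of one array
TRAINING_SHARE = 0.30     # fraction of bandwidth reserved for training

def servers_per_array(per_server_gbps: float) -> float:
    return ARRAY_BW_GBPS * TRAINING_SHARE / per_server_gbps

print(servers_per_array(0.653))  # ~10.8, on the order of the 12 quoted
print(servers_per_array(2.2))    # ~3.2, matching the 3 quoted for LeNet
```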

Page 19

Topics Covered (agenda recap - see Page 2)

Page 20

Data Ingestion Benchmark Schematics

Concurrent clients (Client Nodes 1-4) stream data at ~4 GB/s over 100 GigE connections to an Apache Kafka© cluster (Kafka Nodes 1-4). An Apache Kafka© Connect cluster (Workers 1-4, hosting Connectors 1-8, 9-16, 17-24, and 25-32 respectively) sinks the data over 100 GigE connections to an NVMe™ All-Flash Array.

For each benchmark run on 4 clients:
• Record size: 500 bytes/message.
• Sends 400M messages at 4 GB/s, publishing to a Kafka topic 8M messages at a time.
• Each connector in the Kafka Connect cluster subscribes to a Kafka topic in the Kafka cluster and sinks the data to an NVMe All-Flash Array.
• A total of 32 connectors writes to a single NVMe All-Flash Array.
• The achieved average write throughput to a single NVMe All-Flash Array is ~3.2 GB/s.

For each benchmark run, the Kafka cluster can process 8M messages/sec at a message size of 500 bytes, with an average acknowledgment latency to the clients of ~40 ms/message.
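A minimal sketch of one load-generating client, assuming the kafka-python producer; the broker addresses and topic name are hypothetical:

```python
# Publish 500-byte records in large batches and time the acknowledged send.
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["kafka1:9092", "kafka2:9092"],
    acks=1,
    batch_size=1 << 20,   # large batches to sustain GB/s-scale ingest
    linger_ms=5,
)

payload = b"x" * 500      # 500 bytes/message, as in the benchmark
start = time.time()
for _ in range(8_000_000):             # one 8M-message publish step
    producer.send("ingest-topic", payload)
producer.flush()                       # wait for broker acknowledgments
print("8M messages in %.1f s" % (time.time() - start))
```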


Page 21

Data Ingestion Performance with Kafka to an NVMe™ All-Flash Array


• Each connector (known as a sink connector) is dedicated to a Kafka topic, i.e., the number of connectors equals the number of topics/files.
• Write throughput increases linearly from 8 to 16 connectors, and by 1.5x from 16 to 32 connectors.
• With 4 Kafka Connect worker nodes and 32 sink connectors pointing to a single NVMe All-Flash Array, the average write throughput achieved is 3.2 GB/s.
• This test helps determine the number of connectors to configure in the Kafka Connect cluster, based on the number of flash arrays, the input ingestion rates, and the available I/O throughput from the flash arrays. A sketch of registering such a connector follows below.
• One JVM is used for each worker node in the Kafka Connect cluster, with a JVM heap size of 64 GB.
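For illustration, one such sink connector can be registered per topic through the Kafka Connect REST API; the Connect host, topic, and file path below are hypothetical, while FileStreamSinkConnector is the stock Kafka connector class:

```python
# Register a FileStream sink connector that drains one topic to one file
# on the NVMe array's file system.
import requests

connector = {
    "name": "filestream-sink-1",
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
        "tasks.max": "1",
        "topics": "ingest-topic-1",               # one topic per connector
        "file": "/mnt/flash/ingest/topic-1.log",  # sink file on the array
    },
}
resp = requests.post("http://connect-worker1:8083/connectors", json=connector)
resp.raise_for_status()
```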

[Chart: Kafka FileStream Sink Connector - scalability with number of connectors (one connector per topic/file): write throughput of 1.1, 2.1, and 3.2 GB/s at 8, 16, and 32 connectors; higher is better.]

[Chart: Kafka FileStream Sink Connector - scalability with number of Connect worker nodes (8 connectors per worker node): write throughput of 1.1, 2.1, and 3.2 GB/s at 1, 2, and 4 worker nodes; higher is better.]

Page 22

Topics Covered (agenda recap - see Page 2)

Page 23

Performance: Object Notifications with Respect to Object Puts

• The benchmark comprises 3 clients working as load generators, performing a total of 2.4M 'put' operations using 300, 600, and 1200 connections to an object storage. The average notification rates for these object puts are measured simultaneously with a batch size of 1000, followed by measurements of Kafka-Elasticsearch Connector throughput and latency.
• The average rate of object notifications lags slightly behind the average rate of puts, by approximately 572 and 730 notifications/sec with 600 and 1200 connections respectively; there is no lag with 300 connections.
• The Kafka-Elasticsearch Connector consumes messages at the same average rate of 1.7 MB/s to 2.1 MB/s at which the object notification service (as a Kafka producer) sends them to the Kafka topic, with no visible latency in the Kafka cluster for acknowledging the notifications or in the Kafka-Elasticsearch Connector as a consumer.
• Optimal throughput with no latency is achieved with a minimal Kafka configuration: 2 nodes in the Kafka cluster, 1-2 Connect worker nodes with a total of 4 connectors, and 4-8 partitions of the Kafka topic. CPU usage is 10-15% in both the Kafka cluster and the Connect worker node(s), with a Kafka JVM heap size of 32 GB.

Object Notification Rates: Single Bucket to Single Kafka Cluster/Topic (2.4M Puts)

Connections | Avg. Puts/sec | Avg. Object Notification Msgs/sec | Object Notification Service/Kafka Producer Throughput (KB/s) | Elasticsearch Connector Consumer Throughput (KB/s)
300         | 2,655         | 2,677                             | 2,091                                                        | 2,091
600         | 3,046         | 2,474                             | 1,933                                                        | 1,933
1200        | 2,912         | 2,182                             | 1,705                                                        | 1,705

Page 24

Reference Implementation of Kafka for Object Notification Service

Total Kafka storage space¹ = Retention Period (secs) × Object Notification Message Rate × 800 bytes × (1 + RF)

2 Nodes (8 Shards, Replication Factor (RF) = 1):

Machine Configuration Per Node (Physical or Virtual) | Expected 90th Percentile Producer Acknowledgement Latency | Object Notification Message Rate (K Messages/Sec)
4 vcpus, 64 GB    | <10 ms | 500
8 vcpus, 64 GB    | <10 ms | 1000
16 vcpus, 128 GB  | <10 ms | 2000
32 vcpus, 128 GB  | <10 ms | 4000
48 vcpus, 128 GB  | <10 ms | 8000

¹For example, with an object notification message rate of 2,700 messages/sec, a retention period (in the Kafka topic) of 7 days, and RF = 1, the total Kafka storage space required = 2,700 × 800 bytes × (7×24×3600) × 2 ≈ 2.4 TB.
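The sizing rule translates directly into a small helper (a sketch; the ~800 bytes/message figure is the slide's own):

```python
# Kafka storage needed to retain object notification messages.
def kafka_storage_bytes(msgs_per_sec: float, retention_days: float,
                        rf: int = 1, msg_bytes: int = 800) -> float:
    return retention_days * 24 * 3600 * msgs_per_sec * msg_bytes * (1 + rf)

# The slide's example: 2,700 msgs/sec, 7-day retention, RF = 1.
print(kafka_storage_bytes(2700, 7))  # ~2.6e12 bytes ~= 2.4 TiB, the ~2.4 TB quoted
```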


Page 25

Elasticsearch Implementation for Metadata Indexing and Search - Up to 3 Billion Object Notification Messages

Clients connect through a load balancer to a 4-node Elasticsearch cluster. Each node runs one ES instance holding two primary (leader) shards and two replica shards, along with its indexed data files and transaction log files:
• Node 1 (ES Instance 1): Shard 1 leader, Shard 2 leader; Shard 7 replica, Shard 8 replica
• Node 2 (ES Instance 2): Shard 3 leader, Shard 4 leader; Shard 5 replica, Shard 6 replica
• Node 3 (ES Instance 3): Shard 5 leader, Shard 6 leader; Shard 3 replica, Shard 4 replica
• Node 4 (ES Instance 4): Shard 7 leader, Shard 8 leader; Shard 1 replica, Shard 2 replica

1. Dividing an index into multiple shards speeds up index creation, but slows down query response times due to the overhead of aggregating results from all the shards.
2. Optimal shard configurations for indexing and querying 3 billion object notification messages of 800 bytes were determined (a sketch of creating such an index follows below):
• 2 primary shards (and 2 replicas) per Elasticsearch instance;
• As the number of messages grows, a maximum of 375M messages (~280 GB) per shard provides optimal performance for both querying and indexing.
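As a sketch (assuming the elasticsearch-py 8.x client; the host and index name are hypothetical), the chosen layout corresponds to creating the index with 8 primary shards and one replica of each:

```python
# Create the notification-metadata index with the shard layout above.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://es1:9200")
es.indices.create(
    index="object-notifications",
    settings={
        "number_of_shards": 8,       # capped at ~375M messages (~280 GB) each
        "number_of_replicas": 1,
    },
)
```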


Page 26

Elasticsearch Performance: Index Creation Times - Up to 1 Billion Object Notification Messages

• This benchmark measures Elasticsearch index creation times as the number of object notification messages grows to 1 billion. Tests were performed with 4 nodes and 8 shards in the Elasticsearch cluster, in batches of 25M puts to an object storage with a 2-hour interval between batches. The average put rate and the average notification message rate during this benchmark were 1,100/sec and 1,092/sec, respectively.
• Results show that the Elasticsearch index creation rate keeps pace with the object notification message rate.

[Chart: Index Creation Times up to 1 Billion Object Notification Messages (1,100 avg. puts/sec; 1,092 avg. object notification messages/sec) - cumulative hours: 25.0 at 100M, 50.5 at 200M, 75.2 at 300M, 100.3 at 400M, 125.3 at 500M, 150.5 at 600M, 175.8 at 700M, 201.3 at 800M, 226.7 at 900M, 252.2 at 1000M.]

Page 27

Elasticsearch Performance: Query Response Times and Throughput - Considering Object Notification Messages Growing to 3 Billion over Time

• This benchmark executes a suite of 1,100 mixed queries (simple and complex) against the object notification messages. Message counts up to 3 billion were simulated from the existing 1 billion messages to measure query response times at 3 billion and derive a reference implementation.
• For the same number of nodes, the median Elasticsearch response times for mixed queries increase with the number of shards, with a corresponding reduction in throughput, due to the overhead of aggregating query results from all shards.
• For the same number of shards, median response times decrease by ~20%-38% going from 2 to 4 nodes, with a corresponding throughput increase of 14%-106%; the throughput difference is most significant with 3 billion messages.
• Optimal query response times and throughput are achieved with 4 nodes and 8 shards in an Elasticsearch cluster.

[Chart: Elasticsearch median response times for mixed queries (ms); lower is better. 1 billion messages: 4 (2 nodes/8 shards), 3 (4 nodes/8 shards), 3 (4 nodes/16 shards); 3 billion messages: 5 (2 nodes/8 shards), 4 (4 nodes/8 shards), 10 (4 nodes/16 shards). 4 nodes with 8 shards is an optimal choice.]

Note: Here, a shard refers to a primary shard.

[Chart: Elasticsearch mixed query throughput (queries/sec); higher is better. 1 billion messages: 45,282 (2 nodes/8 shards), 51,705 (4 nodes/8 shards), 47,031 (4 nodes/16 shards); 3 billion messages: 8,886 (2 nodes/8 shards), 18,342 (4 nodes/8 shards), 13,541 (4 nodes/16 shards). 4 nodes with 8 shards is an optimal choice.]

Page 28

Reference Implementation of Elasticsearch for Indexing and Search - Up to 3 Billion Object Notification Messages

3 billion object notification messages; total storage space is ~4.5 TB including one replica. 4 Nodes (8 Shards, Replication Factor = 1):

Configuration Per Node (Physical or Virtual) | Expected 90th Percentile Response Time | Expected Query Throughput (Queries/Sec) | Expected Indexing Rate (K Messages/Sec)
4 vcpus, 256 GB   | <1s | 1,528  | 11
8 vcpus, 256 GB   | <1s | 3,057  | 23
16 vcpus, 256 GB  | <1s | 6,114  | 47
32 vcpus, 256 GB  | <1s | 12,228 | 93
48 vcpus, 256 GB  | <1s | 18,342 | 141


Page 29

Topics Covered (agenda recap - see Page 2)

Page 30

Implemented Real-World Use Case

• Car image S3™ objects are ingested into object storage; a service for object notifications publishes the metadata of the image S3 objects to a Kafka topic.
• AI model serving (TensorFlow) subscribes to the Kafka topic to get the S3 locations of the images, gets the S3 image objects from object storage, and uses AI/deep learning algorithms to analyze the images and generate custom metadata. It gets the AI model from an NVMe™ All-Flash Array.
• The S3 object paths and custom metadata are stored in Elasticsearch© (key-value store); a Kibana™ dashboard provides search and visualization of the custom metadata of the car image S3 objects.
• Flower image S3™ objects follow the same flow: if the inference score is above 80%, the custom metadata of the flower image S3 objects is indexed for search and visualization; if the score is below 80%, the S3 locations of the failed flower image objects are notified to be used for re-training the model.
• Data preparation and AI model training (TensorFlow) run on a GPU cluster, with training data and models on the NVMe™ All-Flash Array.

Page 31

Topics Covered (agenda recap - see Page 2)

Page 32

Summary and Best Practices


• Implementing a disaggregated architecture of GPU compute, a shared pool of NVMe™ All-Flash Arrays, and object storage system(s) has multiple benefits when executing AI workloads:
  - Subsequent data transfers in and out of the local SSDs of GPU servers can be avoided as data grows beyond their capacity.
  - Inference is faster due to immediate access to trained models on the shared flash storage.
  - Businesses can independently scale GPU servers and shared flash arrays to meet the changing needs of their AI workloads, with the flexibility to run mixed AI workloads for training and inference.
  - With a preferred allocation strategy for the I/O bandwidth, various teams can share and scale the flash arrays to serve multiple GPU servers cost-effectively.
  - A high capacity object storage may be used as a landing zone for ingested data as well as an archival solution.
• As a best practice for optimal ingestion performance with Kafka to NVMe All-Flash Arrays:
  - The number of connectors and worker nodes in the Kafka Connect cluster needs to be tuned based on the number of flash arrays, the input ingestion rates, and the available I/O throughput from the arrays.
  - Based on the I/O throughput requirement, high-speed network interfaces and topology need to be configured for the Kafka cluster, the worker nodes of the Kafka Connect cluster, and the flash array(s) to eliminate network bottlenecks.
• As a best practice for optimal performance of object notifications and Elasticsearch:
  - A minimal Kafka configuration suffices: 2 nodes in the Kafka cluster, 1-2 Connect worker nodes with a total of 4 connectors, and 4-8 partitions of the Kafka topic.
  - The object notification message rate is almost equal to the object put rate, and the Elasticsearch index creation rate keeps pace with the object notification message rate.
  - Optimal indexing and search performance is achievable with 4 nodes and 8 shards in an Elasticsearch cluster for up to 3 billion object notification messages.

Page 33

Links to Western Digital Product Pages and White Papers

• Product Pages

Storage Platforms

Object Storage

• White Papers

ActiveScale™ Data Pipeline Service with Elasticsearch© – Best Practices for Metadata Indexing and Search

Use Cases for ActiveScale™ Data Pipeline Service


Page 34

Architecting Data Infrastructure for the Zettabyte Age
