
Deep Learning at Twitter's Scale

Cibele Montez Halasz, Machine Learning Engineer @ Twitter Cortex

October 11, 2018

Agenda

● Background
● Workflow/Platform
● Modeling and Optimizations
● Additional Performance Gains
● GPU

Background

● Challenges: Characteristics of Platform

● Data Shift


Sparsity of Data


Challenges

Source: Luca Belli/Dan Shiebler, June, 2018

VERY SPARSE DATA


Speed


Challenges

Source: Jacob Kastrenakes, The Verge, July, 2018


Data Shift


Data Shift

Source: Luca Belli/Dan Shiebler, June, 2018



Machine Learning at the Company

● Environment
● Modeling: some use cases

Environment

Environment: ML Overview

Source: Luca Belli/Dan Shiebler, June, 2018

[Diagram: Team A’s Data Aggregation and Feature Extraction Job → Cortex Embedding Generation Pipeline → Feature Registry → Team A’s, Team B’s, and Team C’s Machine Learning Models]

Environment

Environment: ML Workflows

Source: Devin Goodsell, NYC Machine Learning Meetup, June 20, 2018

Environment

Environment: ML Training

Environment

Environment: Priorities

● Feature Addition → Scalable data
● Data Addition → Scalable data
● Training → Fast, robust training engine
● Deployment → Seamless and tested ML services
● A/B test → Good A/B test environment

Modeling: Modeling and Optimizations with TensorFlow

Modeling

Discretizer → Full Sparse → Dense MLP → Full Sparse
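As a rough illustration of the first stage: a discretizer maps skewed continuous features to bucket ids that the sparse layers can consume. A minimal sketch, assuming equal-frequency (percentile) bucketing; the data and bucket count are invented for illustration:

```python
import numpy as np

# Illustrative data: a heavily skewed continuous feature (e.g. a count).
rng = np.random.default_rng(0)
train_column = rng.lognormal(size=100_000)

# Equal-frequency discretizer: 99 percentile cut points -> 100 buckets,
# so each bucket receives roughly the same number of training examples.
boundaries = np.percentile(train_column, np.arange(1, 100))

def discretize(value):
    # Bucket id in [0, 99] for a single continuous value.
    return int(np.digitize(value, boundaries))

print(discretize(0.1), discretize(1.0), discretize(50.0))
```

Equal-frequency buckets keep every bucket populated even when the raw distribution is heavily skewed, which matches the sparsity challenges described earlier.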

Modeling

Sparse Linear Layer

[Diagram: sparse input with feature indices 1, 2, …, K and corresponding values v1, v2, …, vn-1, vn]

Modeling

Sparse Linear Layer: First Approach

Source: Tensorflow
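The code on this slide was a screenshot and is not in the transcript. A minimal sketch of what the first approach plausibly looked like, using tf.sparse_tensor_dense_matmul, which the benchmark table at the end of the deck names as the baseline (TensorFlow 1.x; sizes are illustrative):

```python
import tensorflow as tf

NUM_FEATURES = 1 << 20   # illustrative size of the (hashed) feature space
HIDDEN = 200

# The batch arrives as one SparseTensor over the full feature space.
x = tf.sparse_placeholder(tf.float32, shape=[None, NUM_FEATURES])
w = tf.get_variable("w", shape=[NUM_FEATURES, HIDDEN])
b = tf.get_variable("b", shape=[HIDDEN], initializer=tf.zeros_initializer())

# Sparse-dense matmul: correct, but the whole dense weight matrix participates
# in every step, even though only a tiny fraction of features are active.
y = tf.sparse_tensor_dense_matmul(x, w) + b
```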

Modeling

Sparse Linear Layer: Final Approach

Source: Tensorflow
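The final approach's code is likewise missing from the transcript. A plausible reconstruction, not confirmed by the deck, is a gather-based layer that reads and updates only the weight rows whose feature ids actually appear in the batch, e.g. via tf.nn.embedding_lookup_sparse:

```python
import tensorflow as tf

NUM_FEATURES = 1 << 20   # illustrative
HIDDEN = 200

ids = tf.sparse_placeholder(tf.int64)     # active feature ids per example
vals = tf.sparse_placeholder(tf.float32)  # matching values, same sparsity pattern
w = tf.get_variable("w", shape=[NUM_FEATURES, HIDDEN])
b = tf.get_variable("b", shape=[HIDDEN], initializer=tf.zeros_initializer())

# Gathers only the rows for ids present in the batch, then combines them
# per example as a weighted sum; gradients arrive as sparse IndexedSlices.
y = tf.nn.embedding_lookup_sparse(w, ids, vals, combiner="sum") + b
```

Because the gradient only covers the touched rows, this pairs naturally with the sparse-aware optimizers on the next slide.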

Modeling

Sparse Linear Layer

● Optimizers

○ SGD
○ Lazy Adam

Source: Berkeley Artificial Intelligence Research (BAIR) Lab
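Vanilla Adam updates its first- and second-moment accumulators for every row on every step, which defeats the sparsity; Lazy Adam touches only the rows present in the gradient. A minimal usage sketch from TF 1.x's tf.contrib.opt (the toy loss is invented):

```python
import tensorflow as tf

# Toy sparse model: only rows 3 and 17 of w receive gradients this step.
w = tf.get_variable("w", shape=[1000, 8])
loss = tf.reduce_sum(tf.nn.embedding_lookup(w, [3, 17]) ** 2)

# Lazy Adam: moment accumulators are updated only for rows that appear in
# the sparse gradient, instead of for all rows as with vanilla Adam.
opt = tf.contrib.opt.LazyAdamOptimizer(learning_rate=1e-3)
train_op = opt.minimize(loss)
```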

Modeling

Sparse Linear Layer: Variable Partitioning

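The partitioning code and its output on these slides were screenshots and did not survive extraction. A minimal sketch, assuming TensorFlow 1.x's built-in variable partitioning (tf.fixed_size_partitioner); the shard count is illustrative:

```python
import tensorflow as tf

NUM_FEATURES = 1 << 20   # illustrative
HIDDEN = 200

# Shard the big weight matrix into 4 pieces along its first dimension.
partitioner = tf.fixed_size_partitioner(num_shards=4)
with tf.variable_scope("sparse_layer", partitioner=partitioner):
    w = tf.get_variable("w", shape=[NUM_FEATURES, HIDDEN])

# w is a PartitionedVariable; each lookup routes an id to the shard owning it.
ids = tf.sparse_placeholder(tf.int64)
vals = tf.sparse_placeholder(tf.float32)
y = tf.nn.embedding_lookup_sparse(w, ids, vals, combiner="sum")
```

Sharding spreads lookup and update traffic across parameter servers instead of bottlenecking on the single machine that holds the full matrix.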


Modeling

Sparse Linear Layer: Variable Partitioning: Profiling

~33% reduction
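The profiling charts behind this figure were images. For reference, a Chrome-format timeline trace is one standard way to produce such before/after comparisons in TensorFlow 1.x; this sketch is generic tooling, not necessarily Twitter's:

```python
import tensorflow as tf
from tensorflow.python.client import timeline

x = tf.random_normal([1024, 1024])
y = tf.matmul(x, x)  # stand-in for a training step

with tf.Session() as sess:
    opts = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    meta = tf.RunMetadata()
    sess.run(y, options=opts, run_metadata=meta)

    # Write a trace viewable at chrome://tracing to compare op timings.
    trace = timeline.Timeline(meta.step_stats).generate_chrome_trace_format()
    with open("timeline.json", "w") as f:
        f.write(trace)
```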

Modeling

Sparse Linear Layer: Online Normalization

● Example:

Source: Nicolas Koumchatzky

Input: input_feature (value == 1M)
⇒ weight_gradient == 1M
⇒ update = 1M * learning_rate
⇒ a single extreme input value produces an enormous weight update, destabilizing training


Modeling

Sparse Linear Layer: Online Normalization

● Normalization of input values

Source: Nicolas Koumchatzky

○ Normalized values lie in [-1, 1]
○ Trainable per-feature bias: discriminates between the absence and presence of a feature
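A minimal sketch of what online normalization with a per-feature bias might look like; the running-statistics update, the decay constant, and the clipping into [-1, 1] are illustrative assumptions, not the exact method from the talk:

```python
import tensorflow as tf

NUM_FEATURES = 1000  # illustrative

x = tf.placeholder(tf.float32, [None, NUM_FEATURES])

# Running per-feature statistics, updated online as batches stream in.
mean = tf.get_variable("mean", [NUM_FEATURES], trainable=False,
                       initializer=tf.zeros_initializer())
var = tf.get_variable("var", [NUM_FEATURES], trainable=False,
                      initializer=tf.ones_initializer())
decay = 0.999
batch_mean, batch_var = tf.nn.moments(x, axes=[0])
update_mean = tf.assign(mean, decay * mean + (1 - decay) * batch_mean)
update_var = tf.assign(var, decay * var + (1 - decay) * batch_var)

with tf.control_dependencies([update_mean, update_var]):
    normalized = (x - mean) / tf.sqrt(var + 1e-6)
    normalized = tf.clip_by_value(normalized, -1.0, 1.0)  # keep values in [-1, 1]

# Trainable per-feature bias: lets the model tell "present with value 0"
# apart from "absent" (absent features never reach this layer at all).
bias = tf.get_variable("per_feature_bias", [NUM_FEATURES],
                       initializer=tf.zeros_initializer())
out = normalized + bias
```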

Additional Performance Gains

Performance Gains

Hogwild

Source: Hogwild! (Niu, Recht, Ré & Wright, University of Wisconsin-Madison)

Performance Gains

Hogwild

Source: Tensorflow
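Hogwild!-style training runs several workers that apply sparse gradient updates to shared parameters without locking; because each step touches only a few rows, collisions are rare and the occasional lost update is harmless. A minimal in-process sketch with threads (the toy model and counts are invented; real deployments shard across parameter servers):

```python
import random
import threading
import tensorflow as tf

w = tf.get_variable("w", shape=[1000, 8])
ids = tf.placeholder(tf.int64, [None])
loss = tf.reduce_sum(tf.nn.embedding_lookup(w, ids) ** 2)

# use_locking=False -> workers apply updates lock-free, Hogwild!-style.
opt = tf.train.GradientDescentOptimizer(0.01, use_locking=False)
train_op = opt.minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

def worker(seed):
    rng = random.Random(seed)
    for _ in range(100):
        rows = [rng.randrange(1000) for _ in range(16)]  # few rows per step
        sess.run(train_op, feed_dict={ids: rows})

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```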

Performance Gains

Custom Ops

GPU vs. CPU Metrics

GPU Benchmarks

Batch Size | CPU: Baseline*   | CPU: After Optimizations | GPU: Baseline*   | GPU: After Optimizations
256        | 1024 samples/s   | 7372 samples/s           | 5504 samples/s   | 22528 samples/s
512        | 1638 samples/s   | 11264 samples/s          | 8448 samples/s   | 21504 samples/s
1024       | 2355 samples/s   | 13312 samples/s          | 10752 samples/s  | 22528 samples/s

*Baseline uses tf.sparse_tensor_dense_matmul

GPU benchmarks were run on NVIDIA Tesla K80 GPUs.
CPU benchmarks were run on Intel Xeon Platinum 8180 processors.

Acknowledgements

● Andrew Bean
● Ricardo Cervera-Navarro
● Priyank Jain
● Ruhua Jiang
● Nicholas Léonard
● Briac Marcatté
● Mahak Patidar
● Tim Sweeney
● Pavan Yalamanchili
● Yi Zhuang

Thank you! Questions?

Follow me on Twitter: @cibelemh
Email me at: cmontezhalasz@twitter.com
