
Pluto A Distributed Heterogeneous Deep Learning Framework Jun Yang, Yan Chen Large Scale Learning, Alibaba Cloud


Post on 30-Aug-2020


TRANSCRIPT

Page 1: Pluto - NVIDIA...Learning Summit 2017. • TensorFlow in AliMe, Jun Yang et al., Shanghai GDG Mar., 2017. Thanks! Title GTC.pptx Author yangjun Created Date 5/8/2017 6:16:09 AM

Pluto A Distributed Heterogeneous Deep Learning Framework

Jun Yang, Yan Chen Large Scale Learning, Alibaba Cloud

Page 2:

Outline


•  PAI (Platform of Artificial Intelligence)
    •  PAI Overview
    •  Deep Learning with PAI
    •  Pluto
•  PAI DL Application
    •  Chatbot Engine
•  Summary

Page 3:

Machine Learning Platforms


Page 4:

PAI Overview

[Diagram: PAI architecture, with layers labeled Data Storage, Distributed Computing, Algorithms, Serving, and Frontend]

Components shown in the diagram:

•  OSS; Streaming data: DataHub/TT/Kafka; Database: ODPS/RDS
•  CPU/GPU/FPGA/ASIC/…; Fuxi Scheduler; MR/MPI/PS/Graph/Pluto…
•  Feature Engineering; Statistic Methods; Machine Learning; Deep Learning; ……
•  PAI SDK
•  PAI WEB Console; PAI IDE

Tutorial: data.aliyun.com

Page 5:

PAI Project

[Screenshot labels: Search, Experiment, Data Source, Component, Model, Serving]

Page 6:

Machine Learning with PAI

•  Data Preprocessing: Sampling & Filtering, Data Merge, Fill Missing Values, Normalization
•  Feature Engineering: Feature Transformation, Feature Selection, Feature Importance, Feature Generation
•  Statistics: Correlation Coefficients, Histogram, Hypothesis Test, Visualization
•  Modeling: Binary Classification, Multiple Classification, Clustering, Regression, Prediction, Evaluation
•  Deep Learning: DNN, CNN, RNN, A La Carte
•  Application: NLP, Search & Rec., Image Process, Network Analysis, Financial Section

Pluto

Page 7:

Deep Learning with PAI

Page 8:

PAI TensorFlow

•  Rich Data IO
•  Distributed Job Optimization (multiple GPUs/CPUs)
•  Easy model Serving
•  Hyper Parameter Tuning

Page 9:

Pluto


Page 10:

Single-card Optimization

•  Compiler-oriented strategy
    •  Fuse small ops into bigger ones
        •  Reduce CUDA kernel launch overhead
    •  Prepare data layouts friendly to the low-level computation library
•  Memory optimization
    •  Here again compiler-oriented tactics
        •  Dependency analysis
        •  Lifetime analysis
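The lifetime-analysis bullet can be illustrated with a small sketch. It is not Pluto's implementation; the function name `plan_buffers`, the step numbering, and the tensor sizes are all assumptions made for illustration. Given the first and last op step at which each tensor is live, a tensor whose lifetime has ended can hand its buffer to a later tensor.

```python
from typing import Dict, List, Tuple

def plan_buffers(
    lifetimes: Dict[str, Tuple[int, int, int]]
) -> Tuple[Dict[str, int], List[int]]:
    """Assign tensors to reusable buffers (hypothetical helper).

    lifetimes maps tensor name -> (first_step, last_step, size_bytes), where
    steps are positions in a topologically sorted op list.
    Returns (tensor name -> buffer index, size of each buffer).
    """
    assignment: Dict[str, int] = {}
    buffer_sizes: List[int] = []  # size of each physical buffer
    busy_until: List[int] = []    # last step at which each buffer is in use
    # Visit tensors in the order in which they first become live.
    for name, (first, last, size) in sorted(
        lifetimes.items(), key=lambda kv: kv[1][0]
    ):
        for idx in range(len(buffer_sizes)):
            if busy_until[idx] < first:  # previous owner is already dead
                buffer_sizes[idx] = max(buffer_sizes[idx], size)
                busy_until[idx] = last
                assignment[name] = idx
                break
        else:
            # No free buffer: allocate a new one.
            buffer_sizes.append(size)
            busy_until.append(last)
            assignment[name] = len(buffer_sizes) - 1
    return assignment, buffer_sizes

# Toy chain of ops: each tensor is consumed by the next op only.
lifetimes = {
    "a": (0, 1, 400),
    "b": (1, 2, 400),
    "c": (2, 3, 400),
    "d": (3, 4, 100),
}
assignment, buffer_sizes = plan_buffers(lifetimes)
naive_bytes = sum(size for _, _, size in lifetimes.values())
planned_bytes = sum(buffer_sizes)
print(assignment, naive_bytes, planned_bytes)
```

In this toy chain two buffers are enough for four tensors, because no more than two tensors are live at any step.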

Page 11:

Multi-cards Optimization

•  Heuristic-based Model Parallelism
    •  Both model weights and feature map taken into consideration
    •  Memory allocator strategy taken into consideration
    •  A greedy allocation algorithm
        •  With pre-run support
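A rough sketch of what a greedy placement could look like follows. The slide does not give the actual algorithm, so everything here is an assumption: the `Layer` record, the `greedy_place` helper, the byte counts, and the policy of filling one device before moving to the next. In a real system the memory figures would presumably come from a pre-run rather than being written by hand.

```python
from typing import List, NamedTuple

class Layer(NamedTuple):
    name: str
    weight_bytes: int       # memory for the layer's parameters
    feature_map_bytes: int  # memory for the layer's output activations

def greedy_place(
    layers: List[Layer], num_devices: int, capacity_bytes: int
) -> List[int]:
    """Place consecutive layers on a device until it is full (hypothetical).

    Both weights and feature maps count toward the memory budget.
    Returns the device index chosen for each layer, in order.
    """
    placement: List[int] = []
    device = 0
    used = 0
    for layer in layers:
        need = layer.weight_bytes + layer.feature_map_bytes
        if need > capacity_bytes:
            raise ValueError(f"{layer.name} does not fit on a single device")
        if used + need > capacity_bytes:
            # Current device is full: continue on the next one.
            device += 1
            used = 0
            if device >= num_devices:
                raise ValueError("model does not fit on the available devices")
        placement.append(device)
        used += need
    return placement

# Toy numbers (arbitrary units), not measurements.
layers = [
    Layer("conv1", 10, 60),
    Layer("conv2", 20, 40),
    Layer("fc1", 70, 5),
    Layer("fc2", 20, 5),
]
placement = greedy_place(layers, num_devices=2, capacity_bytes=130)
print(placement)
```

Keeping consecutive layers on the same device is a design choice of this sketch: it limits cross-device transfers to the points where the placement switches devices.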

Page 12:

Multi-cards Optimization

•  Hybrid-parallelism
    •  Mixture of data-parallelism and model-parallelism
    •  For communication-intensive parts, consider model-parallelism
    •  For computation-intensive parts, consider data-parallelism
    •  Tricks
        •  Integrate seamlessly with computation graph style
        •  Happier with pyramid network
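The per-layer choice between the two schemes can be sketched with a deliberately crude traffic estimate. This is an illustrative model, not the rule Pluto uses: it assumes data-parallelism exchanges one gradient value per parameter each step, while model-parallelism exchanges the layer's activations for the whole mini-batch. The function name and layer shapes are hypothetical.

```python
from typing import Dict, Tuple

def choose_parallelism(
    num_params: int, activations_per_sample: int, batch_size: int
) -> str:
    """Pick the scheme with the smaller estimated per-step traffic."""
    # Data-parallelism: every replica exchanges a full set of gradients.
    data_parallel_traffic = num_params
    # Model-parallelism: activations for the mini-batch cross devices.
    model_parallel_traffic = batch_size * activations_per_sample
    if model_parallel_traffic < data_parallel_traffic:
        return "model"
    return "data"

# (parameter count, activations per sample) for two example layers.
layers: Dict[str, Tuple[int, int]] = {
    "conv": (3 * 3 * 64 * 64, 56 * 56 * 64),  # few weights, large feature map
    "fc": (4096 * 4096, 4096),                # many weights, small output
}
plan = {
    name: choose_parallelism(params, acts, batch_size=32)
    for name, (params, acts) in layers.items()
}
print(plan)
```

Under this estimate the convolutional layer stays data-parallel and the fully-connected layer becomes model-parallel, which matches the slide's guidance for computation-intensive and communication-intensive parts.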

Page 13:

Multi-cards Optimization

•  Hybrid-parallelism(cont.)

[Figures: M40 Result; K40 Result]

Page 14:

Multi-cards Optimization

•  Late-multiply
    •  Customized for fully-connected layers
    •  Trade-off between computation and communication

W_avg: [N_l, N_{l+1}], X: [M, N_l], E: [M, N_{l+1}], where N_l and N_{l+1} are layer sizes and M is the mini-batch size
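The slide's formulas did not survive extraction, so the following is one plausible reading of late-multiply, stated as an assumption: instead of each worker forming its weight gradient X^T E (shape [N_l, N_{l+1}]) and exchanging that, workers exchange X and E and the multiplication happens after the exchange. The NumPy sketch checks that both orders give the same summed gradient and compares the number of values each worker would send. Worker count and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
workers, M, N_l, N_l1 = 4, 8, 256, 128

# Per-worker layer inputs X: [M, N_l] and back-propagated errors E: [M, N_l1].
X = [rng.standard_normal((M, N_l)) for _ in range(workers)]
E = [rng.standard_normal((M, N_l1)) for _ in range(workers)]

# Early multiply: each worker forms X^T E locally, then the
# [N_l, N_l1] products are summed across workers.
grad_early = sum(x.T @ e for x, e in zip(X, E))

# Late multiply: gather the smaller X and E first, multiply afterwards.
X_all = np.concatenate(X, axis=0)  # [workers * M, N_l]
E_all = np.concatenate(E, axis=0)  # [workers * M, N_l1]
grad_late = X_all.T @ E_all

values_sent_early = N_l * N_l1       # one full gradient matrix per worker
values_sent_late = M * (N_l + N_l1)  # X and E per worker
print(np.allclose(grad_early, grad_late), values_sent_early, values_sent_late)
```

Sending X and E is cheaper whenever M * (N_l + N_{l+1}) < N_l * N_{l+1}, at the price of a larger matrix multiplication after the exchange, which is the computation versus communication trade-off named on the slide.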

Page 15:

Multi-cards Optimization

•  Late-multiply(cont.)

Page 16:

Multi-cards Optimization

•  Heuristic-based MA
    •  Automatic batch-size selection
    •  Learning rate auto-tuning
    •  Happier with sequential model
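Assuming MA stands for model averaging (the slide does not expand the abbreviation), the basic scheme can be sketched as follows: each worker runs a few local SGD steps on its own data shard, then the replicas are averaged into a new shared model. The heuristics listed above (batch-size selection, learning-rate tuning) are not specified in the deck and are not modeled; the learning rate, step counts, and the toy linear-regression task are arbitrary choices for illustration.

```python
import numpy as np

def local_sgd(w, X, y, lr, steps):
    """Run a few full-batch gradient steps on one worker's shard."""
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w = w - lr * grad
    return w

def model_average_round(w, shards, lr, local_steps):
    """One MA round: train replicas independently, then average them."""
    replicas = [local_sgd(w.copy(), X, y, lr, local_steps) for X, y in shards]
    return np.mean(replicas, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])

# Four workers, each with its own noiseless data shard.
shards = []
for _ in range(4):
    X = rng.standard_normal((64, 3))
    shards.append((X, X @ true_w))

w = np.zeros(3)
for _ in range(50):
    w = model_average_round(w, shards, lr=0.05, local_steps=5)
print(w)
```

Workers communicate once per round instead of once per step, so communication cost drops roughly by the number of local steps; how aggressively that can be pushed without hurting convergence is where heuristics would come in.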

Page 17:

Multi-cards Optimization

•  Heuristic-based MA(cont.)

[Figures: Training Time in Wallclock; Model Metrics]

Page 18:

Inference Optimization

•  Quantization
    •  Significantly reduce model size (4X)
    •  Around 2X speed-up on average
•  Binarized Neural Network
    •  Binarize model weights
    •  Convert floating point computation into bit manipulation
    •  Both model size and computation speed significantly improved
    •  Training process needs to be manipulated to compensate for accuracy
    •  Happier with CNN, but for RNN…
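The step from floating point computation to bit manipulation can be shown on a single dot product. This is a generic illustration of the XNOR and popcount identity commonly used for binarized networks, not code from Pluto; it assumes both weights and activations are binarized to +1/-1, and the helper names are hypothetical.

```python
import numpy as np

def pack_signs(v) -> int:
    """Pack a +1/-1 vector into an int: bit i is set where v[i] is +1."""
    bits = 0
    for i, x in enumerate(v):
        if x > 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed +1/-1 vectors of length n."""
    mask = (1 << n) - 1
    # XNOR marks the positions where the two signs agree.
    agree = bin(~(a_bits ^ b_bits) & mask).count("1")
    # Agreements contribute +1, disagreements -1.
    return 2 * agree - n

rng = np.random.default_rng(0)
n = 64
weights = rng.standard_normal(n)
activations = rng.standard_normal(n)

# Binarize by sign.
w_bin = np.where(weights >= 0, 1, -1)
a_bin = np.where(activations >= 0, 1, -1)

reference = int(w_bin @ a_bin)  # ordinary arithmetic
fast = binary_dot(pack_signs(w_bin), pack_signs(a_bin), n)
print(reference, fast)
```

Each weight then needs one bit instead of 32, and 64 multiply-accumulates collapse into one XOR, one NOT, and one popcount on a machine word.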

Page 19:

PAI DL Application


Page 20:

AliMe – Personal Assistant Bot in E-commerce

•  AliMe for Customers
•  AliMe for Sellers
•  AliMe for Enterprises

From 海青 @ 云栖大会 (Yunqi Conference)

Page 21:

Open-Domain Conversations

•  Retrieval Model
    •  Learning to rank
•  Generation Model
    •  Sequence to Sequence (Seq2Seq) Model
    •  Recurrent Neural Networks: LSTM, GRU (our choice)

Cho et al., 2014

[Diagram: a Query is matched against QA pairs in the Knowledge Base, scored Q1-A1: s1, Q2-A2: s2, Q3-A3: s3, ..., Qn-An: sn; the output shown is A1]

Page 22:

A Hybrid Conversation Model based on Seq2Seq

•  Overview

[Diagram: Query, IR, Candidates, Answer Rerank, then a "Score > T" decision: Yes leads to Output, No leads to Answer Generation. A Retrieval Module draws on QA pairs in the KnowledgeBase; the Seq2Seq Model serves the Seq2Seq Based Rerank and Generation Modules. Data sources shown: Chat logs, SNS data.]

[AliMe Chat: Minghui Qiu et al., ACL 2017]
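The control flow in the overview diagram can be sketched in a few lines. The function `hybrid_answer` and its `retrieve`, `score`, and `generate` parameters are hypothetical stand-ins for the retrieval module, the Seq2Seq-based reranker, and the Seq2Seq generator; the toy word-overlap implementations below exist only to make the example runnable and bear no relation to the real models.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, str]  # (stored question, stored answer)

def hybrid_answer(
    query: str,
    retrieve: Callable[[str], List[Candidate]],
    score: Callable[[str, Candidate], float],
    generate: Callable[[str], str],
    threshold: float,
) -> str:
    """Return the best reranked answer if its score exceeds T, else generate."""
    candidates = retrieve(query)
    if candidates:
        best = max(candidates, key=lambda c: score(query, c))
        if score(query, best) > threshold:
            return best[1]
    return generate(query)

# Toy stand-ins (assumptions, not the AliMe components).
KB: List[Candidate] = [
    ("how do i track my order", "Open My Orders and tap Track."),
    ("how do i get a refund", "Request a refund from the order page."),
]

def words(text: str) -> set:
    return set(text.lower().split())

def retrieve(query: str) -> List[Candidate]:
    # Keep every QA pair that shares at least one word with the query.
    return [(q, a) for q, a in KB if words(q) & words(query)]

def score(query: str, candidate: Candidate) -> float:
    # Jaccard overlap between the query and the stored question.
    q, _ = candidate
    return len(words(q) & words(query)) / len(words(q) | words(query))

def generate(query: str) -> str:
    return "Sorry, could you tell me more?"

hit = hybrid_answer("how do i track my order", retrieve, score, generate, 0.5)
miss = hybrid_answer("tell me a joke", retrieve, score, generate, 0.5)
print(hit)
print(miss)
```

The threshold T decides how often the system falls back to generation: a higher value trusts the knowledge base less and generates more.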

Page 23:

PAI DL Support for AliMe

•  Both the offline training and online serving backed by PAI
•  Through heuristic-based MA, the offline training task has 2.8X convergence speed-up with 4 cards setting
•  Through quantization, the online serving task has 1.5X speed-up on commodity CPU servers.
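As an illustration of the quantization mentioned above, the sketch below maps float32 weights to 8-bit integers with a linear scale and offset. The deck does not say which scheme PAI uses; 8-bit linear quantization is assumed here because it is consistent with the 4X model-size reduction quoted on the Inference Optimization slide. The helper names are hypothetical.

```python
import numpy as np

def quantize(weights: np.ndarray):
    """Linearly map float32 weights onto the uint8 range 0..255."""
    lo, hi = float(weights.min()), float(weights.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((weights - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q: np.ndarray, scale: float, lo: float) -> np.ndarray:
    """Recover approximate float32 weights from the 8-bit codes."""
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)

q, scale, lo = quantize(weights)
restored = dequantize(q, scale, lo)

size_ratio = weights.nbytes / q.nbytes  # float32 bytes vs uint8 bytes
max_error = float(np.abs(weights - restored).max())
print(size_ratio, max_error, scale)
```

The rounding error per weight is bounded by about half of `scale`. Any serving speed-up depends on the hardware executing integer arithmetic directly, which this sketch does not model.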


Page 24:

Conclusion

•  PAI DL
    •  End2end machine learning platform
    •  Support big data analytics
    •  Optimized Deep learning algorithms
    •  Scheduling on CPU/GPU cloud
    •  More data intelligence…
•  Pluto
    •  Distributed optimization engine of PAI DL
•  PAI DL Application
    •  PAI DL makes it easy to build DL methods for industrial applications

SCAN BARCODE! START YOUR TRIAL!

Data intelligence, within reach

Page 25:

We are hiring! :)

[email protected] [email protected]

Page 26:

Reference

•  AliMe Chat: A Sequence to Sequence and Rerank based Chatbot Engine, Minghui Qiu et al., ACL 2017.

•  Deep Learning with PAI: a Case Study of AliMe, Minghui Qiu et al., Deep Learning Summit 2017.

•  TensorFlow in AliMe, Jun Yang et al., Shanghai GDG Mar., 2017.

Page 27:

Thanks!