an introduction to deep learning with apache mxnet (november 2017)

Julien Simon, AI/ML Evangelist, EMEA

@julsimon

November 8th, 2017

An introduction to Deep Learning

with Apache MXNet

Agenda

• The Advent of Deep Learning

• Deep Learning Applications

• Apache MXNet Overview

• Apache MXNet Demos (there will be code, ha!)

Artificial Intelligence: design software applications which

exhibit human-like behavior, e.g. speech, natural language

processing, reasoning or intuition

Machine Learning: teach machines to learn without being

explicitly programmed

Deep Learning: using neural networks, teach machines to

learn from complex data where features cannot be explicitly

expressed

The Advent of

Deep Learning

Algorithms

The Advent of

Deep LearningData

Algorithms

The Advent of

Deep LearningData

& Acceleration

Algorithms

The Advent of

Deep LearningData

& Acceleration

Programming

models

Algorithms

Selected customers running AI on AWS

Infrastructure CPU

Engines MXNet TensorFlow Caffe Theano Pytorch CNTK

ServicesAmazon Polly

Platforms

Speech

Amazon Lex

Mobile

Amazon AI: Artificial Intelligence In The Hands Of Every Developer

Amazon

Spark &

EMRKinesis Batch ECS

Amazon Rekognition

Vision

Hardware innovation for Deep Learning

https://aws.amazon.com/blogs/aws/new-amazon-ec2-instances-with-up-to-8-nvidia-tesla-v100-gpus-p3/

https://devblogs.nvidia.com/parallelforall/inside-volta/

https://aws.amazon.com/fr/blogs/aws/now-available-compute-intensive-c5-instances-for-amazon-ec2/

Intel Skylake CPU

November 6th

Nvidia Volta GPU

October 25th

Deep Learning Applications

Image Classification

Same breed?

Humans: 5,1%

https://news.developer.nvidia.com/expedia-ranking-hotel-images-with-deep-learning/

• Expedia have over 10M images from

300,000 hotels

• Using great images boosts conversion

• Using Keras and EC2 GPU instances,

they fine-tuned a pre-trained Convolutional

Neural Network using 100,000 images

• Hotel descriptions now automatically feature the

best available images

• 17,000 images from Instagram

• 7 brands

• Inception v3 model, pre-trained on ImageNet

• Fine-tuning with TensorFlow and EC2 GPU

instances

• Additional work on color extraction

https://technology.condenast.com/story/handbag-brand-and-color-detection

Object Detection

https://github.com/precedenceguo/mx-rcnn https://github.com/zhreshold/mxnet-yolo

Object Segmentation

https://github.com/TuSimple/mx-maskrcnn

https://www.oreilly.com/ideas/self-driving-trucks-enter-the-fast-lane-using-deep-learning

Last June, tuSimple drove an autonomous

for 200 miles from Yuma, AZ to San Diego,

Text Detection and Recognition

https://github.com/Bartzi/stn-ocr

Face Detection

https://github.com/tornadomeet/mxnet-face

Real-Time Pose Estimation

https://github.com/dragonfly90/mxnet_Realtime_Multi-Person_Pose_Estimation

Machine Translation

https://aws.amazon.com/blogs/ai/train-neural-machine-translation-models-with-sockeye/

Amazon Echo

Natural Language Processing & Text-to-Speech

Apache MXNet overview

Apache MXNet

Programmable Portable High Performance

Near linear scaling

across hundreds of GPUs

Highly efficient

models for mobile

and IoT

Simple syntax,

multiple languages

Most Open Best On AWS

Optimized for

Deep Learning on

Accepted into the

Apache Incubator

Input Output

0 0 03

mx. sym. Convol ut i on( dat a, ker nel =( 5, 5) , num_f i l t er =20)

mx. sym. Pool i ng( dat a, pool _t ype=" max" , ker nel =( 2, 2) ,

st r i de=( 2, 2)

l st m. l st m_unr ol l ( num_l st m_l ayer , seq_l en, l en, num_hi dden, num_embed)

2 0 4=Max

mx. sym. Ful l yConnect ed( dat a, num_hi dden=128)

mx. symbol . Embeddi ng( dat a, i nput _di m, out put _di m = k)

2 0 2=Avg

Input Weights

cos(w, queen ) = cos(w, k i ng) - cos(w, m an ) + cos(w, w om an)

mx. sym. Act i vat i on( dat a, act _t ype=" xxxx" )

" r el u"

" t anh"

" si gmoi d"

" sof t r el u"

Neural Art

Face Search

Image Segmentation

Image Caption

“People Riding

Bikes”

Bicycle, People,

Road, Sport

Image Labels

Speech

“People Riding

Bikes”

Machine Translation

“Οι άνθρωποι

ιππασίας ποδήλατα”

Events

mx. model . FeedFor war d model . f i t

mx. sym. Sof t maxOut put

Anat omy of a Deep Lear ning Model

CPU or GPU: your choice

mod = mx.mod.Module(lenet)

mod = mx.mod.Module(lenet, context=mx.gpu(0))

mod = mx.mod.Module(lenet,

context=(mx.gpu(7), mx.gpu(8), mx.gpu(9)))

Inception v3Resnet

Alexnet

88%Efficiency

1 2 4 8 16 32 64 128 256

Distributed Training with MXNet

Apache MXNet demos

1. Image classification: using pre-trained models

Imagenet, multiple CNNs, MXNet

2. Image classification: fine-tuning a pre-trained model

CIFAR-10, ResNet-50, Keras + MXNet

3. Image classification: learning from scratch

MNIST, MLP & LeNet, MXNet

4. Machine Translation: translating German to English

News, LSTM, Sockeye + MXNet

Demo #1 – Image classification: using a pre-trained model

*** VGG16

[(0.46811387, 'n04296562 stage'), (0.24333163,

'n03272010 electric guitar'), (0.045918692, 'n02231487

walking stick, walkingstick, stick insect'),

(0.03316205, 'n04286575 spotlight, spot'),

(0.021694135, 'n03691459 loudspeaker, speaker, speaker

unit, loudspeaker system, speaker system')]

*** ResNet-152

[(0.8726753, 'n04296562 stage'), (0.046159592,

microphone, mike'), (0.018624334, 'n04286575

spotlight, spot'), (0.0058045341, 'n02676566 acoustic

guitar')]

*** Inception v3

[(0.44991142, 'n04296562 stage'), (0.43065304,

torch'), (0.012423956, 'n02676566 acoustic guitar'),

(0.0093934005, 'n03250847 drumstick')]

https://medium.com/@julsimon/an-introduction-to-the-mxnet-api-part-5-9e78534096db

Demo #2 – Image classification: fine-tuning a model

CIFAR-10 data set• 60,000 images in 10 classes

• 32x32 color images

Initial training• Resnet-50 CNN

• 200 epochs

• 82.12% validation

Cars vs. horses• 88.8% validation accuracy

https://medium.com/@julsimon/keras-shoot-out-part-3-fine-tuning-7d1548c51a41

Demo #2 – Image classification: fine-tuning a model

• Freezing all layers but the last one

• Fine-tuning on « cars vs. horses » for 10 epochs

• 2 minutes on 1 GPU (p2)

• 98.8% validation accuracy

Epoch 10/10

10000/10000 [==============================] - 12s

loss: 1.6989 - acc: 0.9994 - val_loss: 1.7490 - val_acc: 0.9880

2000/2000 [==============================] - 2s

[1.7490020694732666, 0.98799999999999999]

Demo #3 – Image classification: learning from scratch

MNIST data set

70,000 hand-written digits

28x28 grayscale images

https://medium.com/@julsimon/training-mxnet-part-1-mnist-6f0dc4210c62

Multi-Layer Perceptron vs. Handmade-Digits-From-Hell™784/128/64/10, Relu, AdaGrad, 100 epochs 97.51% validation accuracy

[[ 0.839 0.034 0.039 0.009 0. 0.008 0.066 0.002 0. 0.004]]

[[ 0. 0.988 0.001 0.003 0.001 0.001 0.002 0.003 0.001 0.002]]

[[ 0.006 0.01 0.95 0.029 0. 0.001 0.004 0. 0. 0.]]

[[ 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]

[[ 0. 0.001 0.005 0.001 0.982 0.001 0. 0.007 0. 0.002]]

[[ 0.001 0.001 0. 0.078 0. 0.911 0.01 0. 0. 0.]]

[[ 0.003 0. 0.019 0. 0.005 0.004 0.863 0. 0.105 0.001]]

[[ 0.001 0.008 0.098 0.033 0. 0. 0. 0.852 0.004 0.004]]

[[ 0.001 0. 0.006 0. 0. 0.001 0.002 0. 0.991 0.]]

[[ 0.002 0.158 0.007 0.117 0.082 0.001 0. 0.239 0.17 0.224]]

LeNet CNN vs. Handmade-Digits-From-Hell™ReLu instead of tanh, 10 epochs, AdaGrad 99.20% validation accuracy

[[ 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]

[[ 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]

[[ 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]]

[[ 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]

[[ 0. 0. 0.001 0. 0.998 0. 0. 0.001 0. 0.]]

[[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]]

[[ 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]

[[ 0. 0. 0. 0.001 0. 0. 0. 0.999 0. 0.]]

[[ 0. 0. 0.006 0. 0. 0. 0. 0. 0.994 0.]]

[[ 0. 0. 0. 0.001 0.001 0. 0. 0.001 0.001 0.996]]

Demo #4 – Machine Translation: German to English

• Sockeye

• 5.8M sentences (news headlines), 5 hours of training on 8 GPUs (p2)

./translate.sh "Chopin zählt zu den bedeutendsten Persönlichkeiten der

Musikgeschichte Polens .”

Chopin is one of the most important personalities of Poland’s history

./translate.sh "Hotelbetreiber müssen künftig nur den Rundfunkbeitrag

bezahlen, wenn ihre Zimmer auch eine Empfangsmöglichkeit bieten .”

in the future , hotel operators must pay only the broadcasting fee if their

rooms also offer a reception facility .

https://aws.amazon.com/blogs/ai/train-neural-machine-translation-models-with-sockeye/

Anything you dream is fiction, and anything

you accomplish is science, the whole history

of mankind is nothing but science fiction.

Ray Bradbury

Resources

https://aws.amazon.com/ai/

https://aws.amazon.com/blogs/ai/

https://mxnet.io

https://github.com/gluon-api/

https://reinvent.awsevents.com/ watch this space ;)

https://medium.com/@julsimon/

https://aws.amazon.com/fr/events/semaine-ia/

Merci!Julien Simon, AI/ML Evangelist, EMEA

@julsimon

https://aws.amazon.com/evangelists/julien-simon/

an introduction to deep learning with apache mxnet (november 2017)

Technology

web stack deep dive apache

apache flink deep dive

smart manufacturing with apache spark streaming and deep

mxnet review: amazon's scalable deep learning · the...

apache hadoop - a deep dive (part 1 - hdfs)

mxnet talk @user!2016

deep learning tools and frameworks - microsoft.com · deep...

deep water - bringing tensorflow, caffe, mxnet to h2o

deep learning and apache spark

tuning and monitoring deep learning on apache spark

a deeper dive into apache mxnet - march 2017 aws online tech...

deep learning with apache spark: an introduction

apache...

deep learning with apache spark - an introduction

apache stratos (incubation) technical deep dive

deep dive into apache apex app development

deep dive on security in apache spark & apache zeppelin

deep learning & applications in non-cognitive domains ·...

tvm & the apache software foundation · 䡧 mentor for...

deep learning with mxnet - dmitry larko