Quoc Le, slides, MLconf 11/15/13


Post on 15-Jan-2015


TRANSCRIPT

Page 1: Quoc le, slides  MLconf 11/15/13

Large Scale Deep Learning

Quoc V. Le Google & CMU

Page 2: Quoc le, slides  MLconf 11/15/13

Deep Learning

•  Google is using Machine Learning

•  Machine Learning is difficult

•  Requires domain knowledge from human experts

Deep Learning:

•  Great performance on many problems

•  Works well with a large amount of data

•  Requires less domain knowledge

Focus:

•  Scale deep learning to bigger models and bigger problems

Page 4: Quoc le, slides  MLconf 11/15/13

What is Deep Learning?

Page 5: Quoc le, slides  MLconf 11/15/13

What is Deep Learning?

x (images, audio, texts, etc.)

u = g(A x)

v = g(B u)
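The two equations on this slide can be read as a two-layer forward computation. The sketch below is a minimal illustration in plain Python, assuming g is the logistic sigmoid (the slide does not name the nonlinearity); the toy values for A, B, and x are made up.

```python
import math

def g(z):
    """Elementwise logistic sigmoid (an assumed choice of nonlinearity)."""
    return [1.0 / (1.0 + math.exp(-t)) for t in z]

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def forward(A, B, x):
    """Compute u = g(A x), then v = g(B u)."""
    u = g(matvec(A, x))
    v = g(matvec(B, u))
    return u, v

# Hypothetical toy weights: 3 inputs -> 2 hidden units -> 1 output.
A = [[0.5, -0.2, 0.1],
     [0.3, 0.8, -0.5]]
B = [[1.0, -1.0]]
x = [1.0, 2.0, 3.0]

u, v = forward(A, B, x)
print("u =", u)
print("v =", v)
```

Stacking more such layers, each feeding the next, gives a deeper model.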

Page 7: Quoc le, slides  MLconf 11/15/13

High-level features by Deep Learning

Pixels → Edge detectors → Face detector, Cat detector

Page 8: Quoc le, slides  MLconf 11/15/13

Google’s DistBelief

Goal: Train deep learning on many machines

Model: A multi-layered architecture

Forward pass to compute the features

Backward pass to compute the gradient

[Diagram: Model, Training Data]
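To make the forward and backward passes concrete, here is a small sketch for a tiny two-layer model. The sigmoid nonlinearity, the squared-error loss, and all names are assumptions chosen for illustration; the slides do not specify them, and this is not DistBelief code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(a, b, x):
    """Forward pass: hidden feature u = g(a . x), output v = g(b * u)."""
    u = sigmoid(sum(ai * xi for ai, xi in zip(a, x)))
    v = sigmoid(b * u)
    return u, v

def loss(a, b, x, y):
    """Squared error between the output v and the target y."""
    _, v = forward(a, b, x)
    return 0.5 * (v - y) ** 2

def backward(a, b, x, y):
    """Backward pass: gradient of the loss w.r.t. a and b via the chain rule."""
    u, v = forward(a, b, x)
    dv = (v - y) * v * (1.0 - v)      # gradient at the output pre-activation
    grad_b = dv * u
    du = dv * b * u * (1.0 - u)       # gradient at the hidden pre-activation
    grad_a = [du * xi for xi in x]
    return grad_a, grad_b

# Hypothetical toy parameters and one training example.
a, b = [0.1, -0.3], 0.7
x, y = [1.0, 2.0], 1.0
grad_a, grad_b = backward(a, b, x, y)
print(grad_a, grad_b)
```

A gradient-descent step then moves each parameter a small amount against its gradient.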

Page 9: Quoc le, slides  MLconf 11/15/13

Model partition with DistBelief

DistBelief distributes a model across multiple machines and multiple cores.

[Diagram: Model, Machine (Model Partition), Training Data]

Page 10: Quoc le, slides  MLconf 11/15/13

Model partition with DistBelief

DistBelief distributes a model across multiple machines and cores.

[Diagram: Model, Machine (Model Partition), Core, Training Data]

Page 11: Quoc le, slides  MLconf 11/15/13

Model partition with DistBelief

Stochastic Gradient Descent (SGD)

Model parameters are partitioned

Can use up to 1000 cores

[Diagram: Model, Training Data]
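One way to picture partitioned model parameters is to split the rows of a layer's weight matrix across machines, so that each machine computes only its slice of the layer's output. The sketch below simulates this in a single process. The row-wise split and the function names are illustrative assumptions, not DistBelief's actual partitioning scheme.

```python
def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def partition_rows(M, num_parts):
    """Split the rows of M into contiguous blocks, one per hypothetical machine."""
    size = (len(M) + num_parts - 1) // num_parts
    return [M[i:i + size] for i in range(0, len(M), size)]

def partitioned_matvec(parts, x):
    """Each partition computes its slice of the output; slices are concatenated."""
    out = []
    for part in parts:  # in a real system each block would run on its own machine
        out.extend(matvec(part, x))
    return out

A = [[1, 0], [0, 1], [1, 1], [2, -1]]
x = [3, 4]
parts = partition_rows(A, 2)

print(partitioned_matvec(parts, x))  # [3, 4, 7, 2]
print(matvec(A, x))                  # [3, 4, 7, 2]
```

The partitioned result matches the unpartitioned one; what changes is where the work and the parameters live.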

Page 12: Quoc le, slides  MLconf 11/15/13

Model partition with DistBelief

But training is still slow on large data sets

Can we add more parallelism?

Idea: Train multiple models on different partitions of the data, and merge them

[Diagram: Model, Training Data]

Page 13: Quoc le, slides  MLconf 11/15/13

Data partition with DistBelief

Parameter Server: p’ = p + ∆p

Model Workers send ∆p to the Parameter Server and receive p’

Data Shards
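The parameter-server update p’ = p + ∆p can be sketched as follows. This is a single-process simulation under stated assumptions: the class and function names are hypothetical, the model is a one-parameter linear fit, and a round-robin loop stands in for workers that would run asynchronously on separate machines.

```python
class ParameterServer:
    """Holds the shared parameters p and applies updates p' = p + dp."""

    def __init__(self, p):
        self.p = list(p)

    def fetch(self):
        return list(self.p)

    def push(self, dp):
        self.p = [pi + di for pi, di in zip(self.p, dp)]

def worker_step(server, shard, lr):
    """One model-worker step: fetch p, compute dp on its data shard, push dp."""
    w, = server.fetch()
    # Gradient of the mean squared error for the 1-D model y = w * x.
    grad = sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)
    server.push([-lr * grad])

# Hypothetical data generated by y = 3x, split into two shards.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[0::2], data[1::2]]

server = ParameterServer([0.0])
for _ in range(200):
    for shard in shards:  # round-robin stands in for asynchronous workers
        worker_step(server, shard, lr=0.01)

print(server.fetch())
```

Each worker only ever sees its own shard, yet the shared parameter approaches the value that fits all of the data.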

Page 14: Quoc le, slides  MLconf 11/15/13

Parallelism in DistBelief

Model parallelism via model partitioning

Data parallelism via data partitioning and asynchronous communications

DistBelief can scale to billions of examples and use 100,000 cores or more

Thanks to its speed, DistBelief dramatically improves many applications

Page 15: Quoc le, slides  MLconf 11/15/13

Applications

•  Voice Search

•  Photo Search

•  Text Understanding

Page 16: Quoc le, slides  MLconf 11/15/13

Voice Search

Speech frame → Hidden layers with 1000s of nodes → Classifier → label
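The pipeline on this slide (speech frame, hidden layers, classifier, label) can be sketched as a small feed-forward network. Everything specific here is an assumption for illustration: the ReLU hidden units, the softmax classifier, the two-number "frame", the weights, and the label names. A real acoustic model would use hidden layers with thousands of nodes, as the slide says.

```python
import math

def relu_layer(W, b, x):
    """One hidden layer: max(0, W x + b), computed row by row."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def softmax(z):
    """Turn scores into probabilities that sum to 1."""
    m = max(z)
    e = [math.exp(t - m) for t in z]
    s = sum(e)
    return [t / s for t in e]

def classify(frame, hidden, W_out, labels):
    """Run the frame through the hidden layers, then pick the most probable label."""
    h = frame
    for W, b in hidden:
        h = relu_layer(W, b, h)
    scores = [sum(w * hi for w, hi in zip(row, h)) for row in W_out]
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Hypothetical toy model: one 2-unit hidden layer, three output labels.
hidden = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]
W_out = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
labels = ["ah", "s", "sil"]

label, probs = classify([0.2, 0.9], hidden, W_out, labels)
print(label, probs)
```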

Page 17: Quoc le, slides  MLconf 11/15/13

Voice Search

Page 19: Quoc le, slides  MLconf 11/15/13

Photo Search

Page 20: Quoc le, slides  MLconf 11/15/13

Cat detector

Front page of New York Times

Page 21: Quoc le, slides  MLconf 11/15/13

Seat-belt, Boston rocker, Archery, Shredder

Page 22: Quoc le, slides  MLconf 11/15/13

Amusement Park

Face

Hammock

Page 23: Quoc le, slides  MLconf 11/15/13

Google+ PhotoSearch

Page 25: Quoc le, slides  MLconf 11/15/13

Text understanding

Very useful but also difficult

We should try to understand the meaning of words

Deep Learning can learn the meaning of words

Page 26: Quoc le, slides  MLconf 11/15/13

Text understanding

[Diagram: words placed in a ~100-D vector space: dolphin, whale, Obama, Clinton, Paris]

Page 27: Quoc le, slides  MLconf 11/15/13

Predicting the next word in a sentence

the cat sat on the

Word Matrix (E applied to each input word) → Hidden Layers → Classifier

E is a matrix of dimension ||Vocab|| x d
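A rough sketch of this architecture follows: each context word selects a row of the word matrix E, the rows are concatenated, passed through a hidden layer, and a softmax classifier scores every vocabulary word as the possible next word. The four-word context, the tanh hidden layer, the layer sizes, the tiny vocabulary, and the random untrained weights are all assumptions for illustration, so the printed probabilities carry no meaning.

```python
import math
import random

vocab = ["the", "cat", "sat", "on", "mat"]
index = {word: i for i, word in enumerate(vocab)}
d = 3              # embedding dimension (toy value)
context_size = 4   # assumed number of input words
hidden_size = 5    # assumed hidden layer width

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

E = rand_matrix(len(vocab), d)                  # word matrix, |Vocab| x d
H = rand_matrix(hidden_size, context_size * d)  # hidden layer weights
C = rand_matrix(len(vocab), hidden_size)        # classifier weights

def matvec(M, x):
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def softmax(z):
    m = max(z)
    e = [math.exp(t - m) for t in z]
    s = sum(e)
    return [t / s for t in e]

def next_word_probs(context):
    """Look up one row of E per context word, concatenate, and classify."""
    x = [value for word in context for value in E[index[word]]]
    h = [math.tanh(t) for t in matvec(H, x)]
    return softmax(matvec(C, h))

probs = next_word_probs(["the", "cat", "sat", "on"])
for word, p in zip(vocab, probs):
    print(word, round(p, 3))
```

Training would adjust E, H, and C so that the true next word gets high probability; the rows of E are then the learned word vectors.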

Page 28: Quoc le, slides  MLconf 11/15/13

Visualizing the word vectors

•  Example nearest neighbors trained on Google News

apple Apple iPhone
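Nearest neighbors in a word-vector space are commonly found with cosine similarity. The sketch below assumes that measure and uses made-up 3-D vectors for the words shown on the earlier vector-space slide; they are not vectors from the Google News model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word, vectors, k=2):
    """Return the k words whose vectors are most similar to the query word's."""
    query = vectors[word]
    scored = [(cosine(query, v), w) for w, v in vectors.items() if w != word]
    return [w for _, w in sorted(scored, reverse=True)[:k]]

# Hypothetical toy vectors; real word vectors have ~100 dimensions.
vectors = {
    "dolphin": [0.9, 0.1, 0.0],
    "whale":   [0.8, 0.2, 0.1],
    "Obama":   [0.0, 0.9, 0.2],
    "Clinton": [0.1, 0.8, 0.3],
    "Paris":   [0.1, 0.2, 0.9],
}

print(nearest("dolphin", vectors, k=1))  # ['whale']
print(nearest("Obama", vectors, k=1))    # ['Clinton']
```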

Page 29: Quoc le, slides  MLconf 11/15/13

Relation Extraction

Mikolov, Sutskever, Le. Learning the Meaning behind Words. Google OpenSource Blog, 2013

Page 30: Quoc le, slides  MLconf 11/15/13

Machine Translation

Page 31: Quoc le, slides  MLconf 11/15/13

Summary

DistBelief: Model partition, Data partition

Applications: Voice Search, Photo Search, Text Understanding

Page 32: Quoc le, slides  MLconf 11/15/13

Joint work with

Greg Corrado, Jeff Dean, Matthieu Devin, Kai Chen, Rajat Monga, Andrew Ng, Paul Tucker, Ke Yang

Additional Thanks:

Samy Bengio, Tom Dean, Josh Levenberg, Geoff Hinton, Tomas Mikolov, Mark Mao, Patrick Nguyen, Marc’Aurelio Ranzato, Mark Segal, Jon Shlens, Ilya Sutskever, Vincent Vanhoucke