Progress in Deep Learning Theory (TTIC)


Page 1:

Page 2:

Progress in Deep Learning Theory

• Exponential advantage of distributed representations

• Exponential advantage of depth

• Myth-busting: non-convexity & local minima

• Probabilistic interpretations of auto-encoders

Page 3:

Machine Learning, AI & No Free Lunch


Page 4:

ML 101. What We Are Fighting Against: The Curse of Dimensionality
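A back-of-the-envelope sketch of the curse (not from the slides; the grid size of 10 bins per axis is an arbitrary assumption): the number of cells needed to cover the input space, and hence the number of examples needed to see each region at least once, grows exponentially with the dimension.

```python
# Covering [0, 1]^d with `bins` cells per axis: cell count grows exponentially in d.
bins = 10
for d in (1, 2, 3, 10, 100):
    print(f"d = {d:>3}: {bins ** d:.3e} cells")
```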

Page 5:

Not Dimensionality so much as Number of Variations

(Bengio, Delalleau & Le Roux 2007)

Page 6:

Putting Probability Mass where Structure is Plausible


Page 7:

Bypassing the curse of dimensionality

Page 8:

Exponential advantage of distributed representations

Page 9:

Hidden Units Discover Semantically Meaningful Concepts


Page 10:

Each feature can be discovered without needing to see the exponentially large number of configurations of the other features.
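A toy counting sketch of that point (the feature names are hypothetical, chosen only for illustration): k binary feature detectors distinguish 2^k configurations, whereas a purely symbolic one-hot code needs one symbol, and examples, per configuration.

```python
from itertools import product

# k binary feature detectors (a toy "distributed representation").
features = ["has_fur", "has_wheels", "is_large", "is_indoor"]   # hypothetical names
k = len(features)

configs = list(product([0, 1], repeat=k))
print(len(configs))   # 2**k = 16 distinct describable configurations
# A one-hot / symbolic code would need 16 separate symbols (and examples of each)
# to cover the same space; here 4 detectors suffice, and each detector can be
# trained from examples in which the other features vary freely.
```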


Page 11:

Exponential advantage of distributed representations


Page 12:

Classical Symbolic AI vs Representation Learning


(figure: cat, dog, person)

Page 13:

Neural Language Models: fighting one exponential by another one!
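A rough parameter-count comparison of the two exponentials (all sizes below are illustrative assumptions, not numbers from the slides): a table-based n-gram model scales like |V|^n, while a neural language model with shared embeddings scales roughly linearly in |V|.

```python
# Back-of-the-envelope parameter counts.
V, n, d, h = 100_000, 4, 300, 1_000        # vocab size, context length, embedding dim, hidden units

ngram_entries = V ** n                               # one entry per possible context + next word
neural_params = V * d + (n - 1) * d * h + h * V      # embeddings + hidden layer + output softmax

print(f"n-gram table: {ngram_entries:.1e}  vs  neural LM: {neural_params:.1e}")
```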

Page 14:

Neural word embeddings: visualization

Directions = Learned Attributes

Page 15:

Analogical Representations for Free (Mikolov et al., ICLR 2013)

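The canonical example reported by Mikolov et al. is king - man + woman ≈ queen. A minimal sketch of that nearest-neighbour analogy test; the random vectors below are placeholders, so the result is only meaningful once `emb` is loaded from real pretrained embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder unit-norm "embeddings"; replace with vectors from an actual embedding file.
emb = {w: rng.normal(size=300) for w in ["king", "queen", "man", "woman"]}
emb = {w: v / np.linalg.norm(v) for w, v in emb.items()}

query = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w not in ("king", "man", "woman")),
           key=lambda w: float(emb[w] @ query))
print(best)   # with a full vocabulary and real embeddings this tends to be "queen"
```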

Page 16:

Exponential advantage of depth

2 layers of logic gates / formal neurons / RBF units = universal approximator (figure: a 2-layer network with up to 2^n hidden units)

Theorems on the advantage of depth: (Hastad et al. 1986 & 1991, Bengio et al. 2007, Bengio & Delalleau 2011, Martens et al. 2013, Pascanu et al. 2014, Montufar et al. NIPS 2014)

Some functions compactly represented with k layers may require exponential size with 2 layers

RBMs & auto-encoders = universal approximator
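A concrete stand-in for the depth theorems (the parity function is a classic illustration, not an example taken from the slides): flattened to 2 layers it needs exponentially many terms, while a deep circuit uses only n - 1 two-input gates.

```python
from itertools import product

def parity(bits):
    return sum(bits) % 2

n = 4

# "Shallow" depth-2 (DNF-style) representation: one AND-term per odd-parity pattern.
dnf_terms = [p for p in product([0, 1], repeat=n) if parity(p) == 1]
print(len(dnf_terms))          # 2 ** (n - 1) terms: exponential in n

# "Deep" representation: a balanced tree of 2-input XOR gates, only n - 1 gates.
def deep_parity(bits):
    bits = list(bits)
    while len(bits) > 1:
        reduced = [bits[i] ^ bits[i + 1] for i in range(0, len(bits) - 1, 2)]
        if len(bits) % 2:
            reduced.append(bits[-1])
        bits = reduced
    return bits[0]

assert all(deep_parity(p) == parity(p) for p in product([0, 1], repeat=n))
```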

Page 17:

Why does it work? No Free Lunch

Page 18:

Exponential advantage of depth

Page 19:

“Shallow” computer program:

main
  subroutine1 includes subsub1 code and subsub2 code and subsubsub1 code
  subroutine2 includes subsub2 code and subsub3 code and subsubsub3 code and …

Page 20:

“Deep” computer program:

main
  sub1  sub2  sub3
  subsub1  subsub2  subsub3
  subsubsub1  subsubsub2  subsubsub3

Page 21:

Sharing Components in a Deep Architecture

Page 22:

Exponential advantage of depth

Page 23:

A Myth is Being Debunked: Local Minima in Neural Nets → Convexity is not needed

Page 24:

Saddle Points

Page 25:

Saddle Points During Training


Page 26:

Low Index Critical Points
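The index of a critical point is the fraction of negative Hessian eigenvalues. A tiny sketch using the textbook saddle f(x, y) = x^2 - y^2 (an illustrative function, not one from the slides):

```python
import numpy as np

# Critical point of f(x, y) = x**2 - y**2 at the origin; its Hessian is diag(2, -2).
H = np.diag([2.0, -2.0])
eigvals = np.linalg.eigvalsh(H)
index = float(np.mean(eigvals < 0))
print(eigvals, index)   # one negative eigenvalue -> index 0.5: a saddle, not a local minimum
# Along the negative-curvature direction the loss keeps decreasing, which is why
# gradient-based training can escape such points even though the gradient vanishes there.
```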

Page 27:

The Next Challenge: Unsupervised Learning


Page 28:

Why Latent Factors & Unsupervised Representation Learning? Because of Causality.

Page 29:

Probabilistic interpretation of auto-encoders


Page 30:

Denoising Auto-Encoder

(figure: corrupted input → reconstruction)
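A minimal denoising auto-encoder sketch under stated assumptions: one tanh hidden layer, Gaussian corruption, and squared reconstruction error against the clean input. Sizes, noise level and learning rate are illustrative, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

d, h, sigma, lr = 20, 10, 0.3, 0.01
W1, b1 = rng.normal(0, 0.1, (h, d)), np.zeros(h)
W2, b2 = rng.normal(0, 0.1, (d, h)), np.zeros(d)

def train_step(x):
    x_tilde = x + sigma * rng.normal(size=x.shape)   # corrupt the input
    a = np.tanh(W1 @ x_tilde + b1)                   # encode the corrupted input
    r = W2 @ a + b2                                  # reconstruct
    err = r - x                                      # gradient of 0.5 * ||r - x||**2 wrt r
    gW2, gb2 = np.outer(err, a), err
    da = (W2.T @ err) * (1 - a ** 2)
    gW1, gb1 = np.outer(da, x_tilde), da
    for p, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
        p -= lr * g                                  # plain SGD update
    return float(np.mean(err ** 2))

x = rng.normal(size=d)
print(train_step(x))
```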

Page 31:

Regularized Auto-Encoders Learn a Vector Field that Estimates a Gradient Field
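One common way to state the estimated gradient field (the result of Alain & Bengio, JMLR 2014, with sigma the corruption noise level and r* the optimal reconstruction function):

```latex
r^{*}(x) - x \;\approx\; \sigma^{2}\, \nabla_{x} \log p(x) \qquad \text{as } \sigma \to 0
```

So the reconstruction residual points toward nearby high-density regions; it estimates the score of the data-generating distribution.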

Page 32:

Denoising Auto-Encoder Markov Chain
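A sampling sketch for the chain: alternate corruption with reconstruction. Here `reconstruct` stands for any trained DAE's decode(encode(.)), and using its mean rather than sampling from the full reconstruction distribution P(x | x_tilde) is a simplification of the actual chain.

```python
import numpy as np

rng = np.random.default_rng(1)

def dae_chain(x0, reconstruct, sigma=0.3, steps=100):
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(steps):
        x_tilde = x + sigma * rng.normal(size=x.shape)   # corruption kernel C(x_tilde | x)
        x = reconstruct(x_tilde)                          # reconstruction step
        samples.append(np.array(x, dtype=float))
    return samples

# Usage with a placeholder "identity" DAE, just to show the interface:
print(len(dae_chain(np.zeros(5), reconstruct=lambda x_tilde: x_tilde, steps=10)))
```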

Page 33:

Variational Auto-Encoders (VAEs)
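A sketch of one ELBO evaluation for a toy Gaussian VAE (no training loop); the tiny tanh/linear encoder and decoder and all dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, h, z_dim = 20, 8, 2
We = rng.normal(0, 0.1, (h, d))
W_mu, W_logvar = rng.normal(0, 0.1, (z_dim, h)), rng.normal(0, 0.1, (z_dim, h))
Wd = rng.normal(0, 0.1, (d, z_dim))

def elbo(x):
    he = np.tanh(We @ x)                                      # encoder
    mu, logvar = W_mu @ he, W_logvar @ he                     # parameters of q(z | x)
    z = mu + np.exp(0.5 * logvar) * rng.normal(size=z_dim)    # reparameterization trick
    x_hat = Wd @ z                                            # decoder mean (unit output variance)
    rec = -0.5 * np.sum((x - x_hat) ** 2)                     # reconstruction term (up to a constant)
    kl = -0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar)) # KL(q(z|x) || N(0, I))
    return rec - kl                                           # lower bound on log p(x), to maximize

print(elbo(rng.normal(size=d)))
```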

Page 34:

Geometric Interpretation of VAEs

Page 35:

Denoising Auto-Encoder vs Diffusion Inverter (Sohl-Dickstein et al., ICML 2015)

Page 36:

Encoder-Decoder Framework

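A bare-bones sketch of the framework: the encoder folds a variable-length input sequence into a single fixed-size vector c, and the decoder generates outputs conditioned on c (here, by starting from c as its initial state). The plain-RNN updates, weights and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_h, d_out = 8, 16, 8
We, Ue = rng.normal(0, 0.1, (d_h, d_in)), rng.normal(0, 0.1, (d_h, d_h))
Wd, Ud = rng.normal(0, 0.1, (d_h, d_out)), rng.normal(0, 0.1, (d_h, d_h))
Wo = rng.normal(0, 0.1, (d_out, d_h))

def encode(xs):
    h = np.zeros(d_h)
    for x in xs:                      # xs: list of input vectors
        h = np.tanh(We @ x + Ue @ h)  # simple recurrent update
    return h                          # c: the whole input squeezed into one vector

def decode(c, steps):
    h, y, ys = c, np.zeros(d_out), []
    for _ in range(steps):
        h = np.tanh(Wd @ y + Ud @ h)  # condition on previous output and state
        y = Wo @ h                    # next output (e.g. pre-softmax scores)
        ys.append(y)
    return ys

c = encode([rng.normal(size=d_in) for _ in range(5)])
print(len(decode(c, steps=3)))
```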

Page 37:

Attention Mechanism for Deep Learning

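A content-based attention sketch in the spirit of Bahdanau et al. (2015): score each encoder state against the current decoder state, softmax the scores, and take a weighted sum. The dot-product scoring function is a simplifying assumption (the original uses a small learned network for the scores).

```python
import numpy as np

def attend(decoder_state, encoder_states):
    scores = encoder_states @ decoder_state      # shape (T,): one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax -> attention weights over positions
    context = weights @ encoder_states           # shape (d,): weighted sum of encoder states
    return context, weights

states = np.random.default_rng(0).normal(size=(5, 16))
context, weights = attend(states[-1], states)
print(weights.round(2))
```

The context vector replaces the single fixed vector c of the previous sketch and is recomputed at every decoder step, so the decoder can look back at different source positions for different output words.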

Page 38:

End-to-End Machine Translation with Recurrent Nets and Attention Mechanism

Page 39:

Image-to-Text: Caption Generation with Attention

Page 40:

Paying Attention to Selected Parts of the Image While Uttering Words

Page 41:

Speaking about what one sees

Page 42:

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Page 43:

The Good

Page 44:

And the Bad

Page 45:

The Next Frontier: Reasoning and Question Answering

Page 46:

Ongoing Project: Knowledge Extraction


Page 47:

Conclusions


Page 48:

MILA: Montreal Institute for Learning Algorithms