Unsupervised Learning (DLAI D9L1 2017, UPC Deep Learning for Artificial Intelligence)

[course site] Xavier Giro-i-Nieto ([email protected]), Associate Professor, Universitat Politecnica de Catalunya (Technical University of Catalonia). Unsupervised Learning #DLUPC


Acknowledgements

Kevin McGuinness, "Unsupervised Learning", Deep Learning for Computer Vision [Slides 2016] [Slides 2017]

Outline

1. Motivation
2. Unsupervised Learning
3. Predictive Learning
4. Self-supervised Learning

Motivation

Greff, Klaus, Antti Rasmus, Mathias Berglund, Tele Hao, Harri Valpola, and Juergen Schmidhuber. "Tagger: Deep unsupervised perceptual grouping." NIPS 2016. [video] [code]

Yann LeCun's Black Forest cake

Predict label y corresponding to observation x: y = ƒ(x) (supervised learning).

Estimate the distribution of observation x: ƒ(x) (unsupervised learning).

Predict action y based on observation x to maximize a future reward z: y = ƒ(x) (reinforcement learning).

Why Unsupervised Learning?

It reflects the way intelligent beings naturally perceive the world.

It could save much of the effort required to build a human-like intelligent agent, compared with a fully supervised approach.

Vast amounts of unlabelled data are available.

Unsupervised Learning

How data distribution P(x) influences decisions (1D): max-margin decision boundary.

Slide credit: Kevin McGuinness (DLCV UPC 2017)

How data distribution P(x) influences decisions (1D): semi-supervised decision boundary (see the sketch below).

Slide credit: Kevin McGuinness (DLCV UPC 2017)
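To make this concrete, here is a minimal 1D sketch (the toy data, bandwidth, and labelled points are assumptions, not from the slides): with only two labelled points, the max-margin boundary sits at their midpoint, while a density estimate of P(x) from unlabelled data suggests placing the boundary in the low-density valley instead.

```python
# A minimal sketch (assumed toy data) of how the unlabelled density P(x)
# can move a decision boundary: instead of the max-margin midpoint between
# two labelled points, place the threshold in the low-density valley of P(x).
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])  # unlabelled
labelled = {-1.2: 0, 2.8: 1}            # only two labelled examples

midpoint = np.mean(list(labelled))      # max-margin boundary: 0.8, off-centre
kde = KernelDensity(bandwidth=0.3).fit(x[:, None])
grid = np.linspace(-1.2, 2.8, 400)
valley = grid[np.argmin(kde.score_samples(grid[:, None]))]  # ~0.0, density valley
print(midpoint, valley)
```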

How data distribution P(x) influences decisions (2D)

Slide credit: Kevin McGuinness (DLCV UPC 2017)

How P(x) is valuable for a naive Bayes classifier

Slide credit: Kevin McGuinness (DLCV UPC 2017)

How clustering is valuable for linear classifiers

Data X with labels Y is not linearly separable in the original 2D space (x1, x2). Assigning each point to one of four clusters and encoding it as a 4D bag-of-words (BoW) representation over the cluster assignments makes the classes separable with a linear classifier in the 4D space (a sketch follows below).

Example from Kevin McGuinness' Jupyter notebook.

Slide credit: Kevin McGuinness (DLCV UPC 2017)
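A minimal sketch of the idea above (the data layout and models are assumed stand-ins, not McGuinness' actual notebook): XOR-style blobs are not linearly separable in 2D, but a 4D one-hot cluster encoding makes them separable.

```python
# Cluster assignments as a 4-D one-hot "bag of words" make XOR-like data
# linearly separable (assumed toy setup).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Four Gaussian blobs arranged in an XOR pattern: opposite corners share a label.
centers = np.array([[0, 0], [0, 4], [4, 0], [4, 4]])
X = np.vstack([c + 0.5 * rng.standard_normal((100, 2)) for c in centers])
y = np.array([0, 1, 1, 0]).repeat(100)  # not linearly separable in 2D

clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
X_bow = np.eye(4)[clusters]             # 4-D one-hot cluster representation

print(LogisticRegression().fit(X, y).score(X, y))          # ~0.5 in raw 2D space
print(LogisticRegression().fit(X_bow, y).score(X_bow, y))  # ~1.0 in 4D BoW space
```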

Assumptions for unsupervised learning

Slide credit: Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

[Figure: a linear manifold, wᵀx + b, vs. a non-linear manifold in the (x1, x2) plane.]

Slide credit: Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis: energy-based models

LeCun et al. "A Tutorial on Energy-Based Learning." Predicting Structured Data, 2006. http://yann.lecun.com/exdb/publis/pdf/lecun-06.pdf

Autoencoder (AE)

F. Van Veen, "The Neural Network Zoo" (2016); "Deep Learning Tutorial", Dept. of Computer Science, Stanford

Autoencoders predict at the output the same data presented at the input, so they do not need labels.

Dimensionality reduction: the hidden layer can be used as a feature extractor of any desired size.

Pretraining, step 1: initialize a neural network by solving an autoencoding problem. The encoder (W1) maps the input data to latent variables h (the representation/features); the decoder (W2) maps h back to a reconstruction, and the loss is the reconstruction error.

Figure: Kevin McGuinness (DLCV UPC 2017)

Pretraining, step 2: train for the final task with "few" labels. The pretrained encoder (W1) produces the latent variables h, and a classifier (WC) on top outputs the prediction y, trained with a cross-entropy loss (see the sketch below).

Figure: Kevin McGuinness (DLCV UPC 2017)
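A minimal PyTorch sketch of this two-step recipe (network sizes, optimizer, and the random stand-in data are assumptions, not the lecture's code): pretrain encoder and decoder on unlabelled data with a reconstruction loss, then train a classifier on the encoder's latent features with a cross-entropy loss.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())   # W1: data -> h
decoder = nn.Sequential(nn.Linear(64, 784))              # W2: h -> reconstruction

x_unlabelled = torch.rand(256, 784)                      # stand-in unlabelled data
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))
for _ in range(100):                                     # 1) autoencoding pretraining
    x_hat = decoder(encoder(x_unlabelled))
    loss = nn.functional.mse_loss(x_hat, x_unlabelled)   # reconstruction error
    opt.zero_grad(); loss.backward(); opt.step()

classifier = nn.Linear(64, 10)                           # WC: h -> prediction y
x_few, y_few = torch.rand(32, 784), torch.randint(0, 10, (32,))
opt = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()))
for _ in range(100):                                     # 2) fine-tune with few labels
    logits = classifier(encoder(x_few))
    loss = nn.functional.cross_entropy(logits, y_few)    # cross-entropy loss
    opt.zero_grad(); loss.backward(); opt.step()
```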

Deep Learning TV, "Autoencoders - Ep. 10"

F. Van Veen, "The Neural Network Zoo" (2016)

Restricted Boltzmann Machine (RBM)

A shallow, two-layer net. "Restricted" = no two nodes in the same layer share a connection, so the model is a bipartite, bidirectional graph with shared weights and different biases for the visible and hidden layers.

Figure: Geoffrey Hinton (2013)

Salakhutdinov, Ruslan, Andriy Mnih, and Geoffrey Hinton. "Restricted Boltzmann machines for collaborative filtering." Proceedings of the 24th International Conference on Machine Learning. ACM, 2007.

Forward pass: the visible units drive the hidden units.

DeepLearning4j, "A Beginner's Tutorial for Restricted Boltzmann Machines"

Backward pass: the hidden activations are projected back to the visible layer, and the reconstructed values are compared with the actual ones using the KL divergence (a CD-1 sketch follows below).

DeepLearning4j, "A Beginner's Tutorial for Restricted Boltzmann Machines"
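A minimal numpy sketch of these forward/backward passes combined into one contrastive-divergence (CD-1) update, the standard RBM training rule (layer sizes, learning rate, and the random data are assumptions; the slides describe the passes only conceptually):

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 3, 0.1
W = 0.01 * rng.standard_normal((n_visible, n_hidden))  # shared weights
b_v = np.zeros(n_visible)                              # visible bias
b_h = np.zeros(n_hidden)                               # hidden bias (different biases)

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
v0 = rng.integers(0, 2, size=(8, n_visible)).astype(float)  # batch of binary data

# Forward pass: visible -> hidden
p_h0 = sigmoid(v0 @ W + b_h)
h0 = (rng.random(p_h0.shape) < p_h0).astype(float)

# Backward pass: hidden -> reconstructed visible -> hidden again
p_v1 = sigmoid(h0 @ W.T + b_v)   # reconstruction, compared against v0
p_h1 = sigmoid(p_v1 @ W + b_h)

# CD-1 update: difference of data-driven and reconstruction-driven statistics
W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
b_v += lr * (v0 - p_v1).mean(axis=0)
b_h += lr * (p_h0 - p_h1).mean(axis=0)
```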


RBMs are a specific type of autoencoder, trained by unsupervised learning.

DeepLearning4j, "A Beginner's Tutorial for Restricted Boltzmann Machines"

Deep Learning TV, "Restricted Boltzmann Machines"

Deep Belief Networks (DBN)

F. Van Veen, "The Neural Network Zoo" (2016)

Architecture like an MLP, but trained as a stack of RBMs... so they do not need labels (unsupervised learning). A sketch of this greedy layer-wise pretraining follows below.

Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural Computation 18, no. 7 (2006): 1527-1554.
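A minimal sketch of the greedy layer-wise pretraining using scikit-learn's BernoulliRBM (an assumed stand-in for the lecture's formulation, with made-up sizes and data): each RBM is trained, without labels, on the hidden representation produced by the previous one.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
X = (rng.random((500, 64)) > 0.5).astype(float)  # stand-in binary data

layer_sizes = [32, 16]          # two stacked RBMs
representation, stack = X, []
for n_hidden in layer_sizes:
    rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05, n_iter=10)
    representation = rbm.fit_transform(representation)  # hidden activations feed the next RBM
    stack.append(rbm)

print(representation.shape)     # (500, 16): features from the top of the stack
```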

After the DBN is trained, it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance, e.g. by adding a softmax layer on top (supervised learning).

Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural Computation 18, no. 7 (2006): 1527-1554.

Deep Learning TV, "Deep Belief Nets"

Geoffrey Hinton, "Introduction to Deep Learning & Deep Belief Nets" (2012); Geoffrey Hinton, "Tutorial on Deep Belief Networks," NIPS 2007.


Predictive Learning

Acknowledgments

Junting Pan, Xunyu Lin

Unsupervised Feature Learning

Slide credits: Yann LeCun, Junting Pan

Frame Reconstruction & Prediction

Unsupervised feature learning (no labels) for frame reconstruction and frame prediction. The unsupervised learned features (lots of data) are then fine-tuned for activity recognition (little data).

Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." ICML 2015. [Github]

Frame Prediction

Ranzato, Marc'Aurelio, Arthur Szlam, Joan Bruna, Michael Mathieu, Ronan Collobert, and Sumit Chopra. "Video (language) modeling: a baseline for generative models of natural videos." arXiv preprint arXiv:1412.6604 (2014).

Video frame prediction with a ConvNet. The blurry predictions from MSE (ℓ1) are improved with a multi-scale architecture, adversarial training, and an image gradient difference loss (GDL) function; a GDL sketch follows below.

Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016. [project] [code]
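As a sketch of the GDL term (assuming the common alpha = 1 setting; tensor shapes are illustrative, not the paper's exact code): penalize the difference between the absolute image gradients of the predicted and ground-truth frames, which discourages blur.

```python
import torch

def gdl_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Image gradient difference loss; pred, target: (batch, channels, H, W)."""
    # Finite-difference gradients along height and width
    dy_p, dx_p = pred[..., 1:, :] - pred[..., :-1, :], pred[..., :, 1:] - pred[..., :, :-1]
    dy_t, dx_t = target[..., 1:, :] - target[..., :-1, :], target[..., :, 1:] - target[..., :, :-1]
    # Penalize mismatched gradient magnitudes (alpha = 1)
    return ((dy_p.abs() - dy_t.abs()).abs().mean()
            + (dx_p.abs() - dx_t.abs()).abs().mean())

loss = gdl_loss(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```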

Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. "Generating videos with scene dynamics." NIPS 2016.

Given an input image, probabilistic generation of future frames with a Variational AutoEncoder (VAE): the image is encoded as feature maps, and the motion as cross-convolutional kernels.

Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks." NIPS 2016. [video]

Self-supervised Learning

First steps in video feature learning

Le, Quoc V., Will Y. Zou, Serena Y. Yeung, and Andrew Y. Ng. "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis." CVPR 2011.

Temporal Weak Labels

Assumption: adjacent video frames contain semantically similar information. An autoencoder is trained with regularization by slowness and sparsity.

Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun. "Unsupervised learning of spatiotemporally coherent metrics." ICCV 2015.

Jayaraman, Dinesh, and Kristen Grauman. "Slow and steady feature analysis: higher order temporal coherence in video." CVPR 2016. [video]

The temporal order of frames is exploited as the supervisory signal for learning: shuffled frame sequences are fed to a network that solves a binary classification task, "in order" vs. "not in order" (a sketch follows below).

(Slides by Xunyu Lin) Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. [code]
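A minimal PyTorch sketch of this pretext task (the tiny encoder, random stand-in frames, and the simple triplet shuffling are assumptions; the paper uses an AlexNet-style triplet network): sample frame triplets, shuffle half of them, and train a binary in-order vs. not-in-order classifier with no human labels.

```python
import torch
import torch.nn as nn

frame_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
order_classifier = nn.Linear(3 * 128, 2)   # concatenated triplet -> in order / not

video = torch.rand(100, 3, 32, 32)         # stand-in video of 100 frames

def make_example(shuffle: bool):
    t = torch.randint(0, len(video) - 3, (1,)).item()
    idx = [t, t + 1, t + 2]
    if shuffle:                            # destroy the temporal order
        idx = [t + 1, t, t + 2]
    return torch.stack([video[i] for i in idx]), int(shuffle)

opt = torch.optim.Adam(list(frame_encoder.parameters()) + list(order_classifier.parameters()))
for step in range(200):
    triplet, label = make_example(shuffle=bool(step % 2))
    feats = frame_encoder(triplet).reshape(1, -1)          # encode 3 frames, concat
    loss = nn.functional.cross_entropy(order_classifier(feats), torch.tensor([label]))
    opt.zero_grad(); loss.backward(); opt.step()
```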

Train a network to detect which of several video sequences contains frames in the wrong order (odd-one-out).

Fernando, Basura, Hakan Bilen, Efstratios Gavves, and Stephen Gould. "Self-supervised video representation learning with odd-one-out networks." CVPR 2017.

Spatio-Temporal Weak Labels

Lin, X., Campos, V., Giró-i-Nieto, X., Torres, J., and Canton-Ferrer, C. "Disentangling Motion, Foreground and Background Features in Videos." CVPR 2017 Workshop Brave New Motion Representations.

[Figure: a C3D encoder disentangles foreground motion, first-frame foreground, and background; weight-sharing foreground decoders (Fg Dec) reconstruct the foreground in the first and last frames, a background decoder (Bg Dec) reconstructs the background in the first frame, and a uNLC mask blocks gradients.]

Pathak, Deepak, Ross Girshick, Piotr Dollár, Trevor Darrell, and Bharath Hariharan. "Learning features by watching objects move." CVPR 2017.

Greff, Klaus, Antti Rasmus, Mathias Berglund, Tele Hao, Harri Valpola, and Juergen Schmidhuber. "Tagger: Deep unsupervised perceptual grouping." NIPS 2016. [video] [code]

Audio Features from Visual Weak Labels

Object & scene recognition in videos by analysing the audio track (only). Visualizations show the 1D filters learned over raw audio in conv1, and the video frames associated with the sounds that activate some of the last hidden units of SoundNet (conv7). A sketch of the training idea follows below.

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "Soundnet: Learning sound representations from unlabeled video." NIPS 2016.
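A minimal PyTorch sketch of SoundNet's student-teacher idea (the tiny networks, random stand-in data, and the 401-way output are assumptions loosely mirroring the paper's scene-classification teacher): an audio network is trained to match the softmax posteriors of a frozen visual network on unlabeled video, via KL divergence, so no human labels are needed.

```python
import torch
import torch.nn as nn

visual_teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 401))  # frozen, assumed pretrained
audio_student = nn.Sequential(nn.Conv1d(1, 16, 64, stride=8), nn.ReLU(),
                              nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, 401))

frames = torch.rand(8, 3, 32, 32)      # video frames (stand-in)
waveforms = torch.rand(8, 1, 22050)    # corresponding raw audio (stand-in)

opt = torch.optim.Adam(audio_student.parameters())
with torch.no_grad():
    target = visual_teacher(frames).softmax(dim=1)       # soft visual "labels"
for _ in range(100):
    log_pred = audio_student(waveforms).log_softmax(dim=1)
    loss = nn.functional.kl_div(log_pred, target, reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
```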

Audio & Visual Features from Alignment

Audio and visual features are learned by assessing the alignment between the two modalities.

Arandjelović, Relja, and Andrew Zisserman. "Look, Listen and Learn." ICCV 2017.

Chen, L., S. Srivastava, Z. Duan, and C. Xu. "Deep Cross-Modal Audio-Visual Generation." ACM International Conference on Multimedia Thematic Workshops, 2017.

Video to Speech Representations

A CNN (VGG) maps each frame of a silent video to an audio feature; speech is then generated by post-hoc synthesis.

Ephrat, Ariel, and Shmuel Peleg. "Vid2speech: Speech Reconstruction from Silent Video." ICASSP 2017.

Speech to Video Synthesis (mouth)

Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You said that?" BMVC 2017.

Speech to Video Synthesis (pose & emotion)

Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

Speech to Video Synthesis (mouth)

Suwajanakorn, Supasorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. "Synthesizing Obama: learning lip sync from audio." SIGGRAPH 2017.

Outline

1. Motivation
2. Unsupervised Learning
3. Predictive Learning
4. Self-supervised Learning

2

Acknowledegments

Kevin McGuinness ldquoUnsupervised Learningrdquo Deep Learning for Computer Vision [Slides 2016] [Slides 2017]

3

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

4

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

5

Motivation

6

Motivation

7Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Yann Lecunrsquos Black Forest cake

8

Motivation

1

= ƒ( )

ƒ( )

= ƒ( )

9

Predict label y corresponding to observation x

Estimate the distribution of observation x

Predict action y based on observation x to maximize a future reward z

Motivation

1

= ƒ( )

ƒ( )

= ƒ( )

10

Motivation

Unsupervised Learning

11

Why Unsupervised Learning

It is the nature of how intelligent beings percept the world

It can save us tons of efforts to build a human-alike intelligent agent compared to a totally supervised fashion

Vast amounts of unlabelled data

12

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Max margin decision boundary

13

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

Semi supervised decision boundary

14

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

15

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

16

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

How P(x) is valuable for naive Bayesian classifier

17

X DataY Labels

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Not linearly separable in this 2D space (

18

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2Cluster 1 Cluster 2

Cluster 3Cluster 4

1 2 3 4

1 2 3 4

4D BoW representation

Separable with a linear classifiers in a 4D space

19

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

Example from Kevin McGuinness Juyter notebook 20

How clustering is valueble for linear classifiers

21

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

3

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

4

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

5

Motivation

6

Motivation

7Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Yann Lecunrsquos Black Forest cake

8

Motivation

1

= ƒ( )

ƒ( )

= ƒ( )

9

Predict label y corresponding to observation x

Estimate the distribution of observation x

Predict action y based on observation x to maximize a future reward z

Motivation

1

= ƒ( )

ƒ( )

= ƒ( )

10

Motivation

Unsupervised Learning

11

Why Unsupervised Learning

It is the nature of how intelligent beings percept the world

It can save us tons of efforts to build a human-alike intelligent agent compared to a totally supervised fashion

Vast amounts of unlabelled data

12

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Max margin decision boundary

13

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

Semi supervised decision boundary

14

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

15

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

16

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

How P(x) is valuable for naive Bayesian classifier

17

X DataY Labels

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Not linearly separable in this 2D space (

18

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2Cluster 1 Cluster 2

Cluster 3Cluster 4

1 2 3 4

1 2 3 4

4D BoW representation

Separable with a linear classifiers in a 4D space

19

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

Example from Kevin McGuinness Juyter notebook 20

How clustering is valueble for linear classifiers

21

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

4

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

5

Motivation

6

Motivation

7Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Yann Lecunrsquos Black Forest cake

8

Motivation

1

= ƒ( )

ƒ( )

= ƒ( )

9

Predict label y corresponding to observation x

Estimate the distribution of observation x

Predict action y based on observation x to maximize a future reward z

Motivation

1

= ƒ( )

ƒ( )

= ƒ( )

10

Motivation

Unsupervised Learning

11

Why Unsupervised Learning

It is the nature of how intelligent beings percept the world

It can save us tons of efforts to build a human-alike intelligent agent compared to a totally supervised fashion

Vast amounts of unlabelled data

12

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Max margin decision boundary

13

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

Semi supervised decision boundary

14

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

15

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

16

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

How P(x) is valuable for naive Bayesian classifier

17

X DataY Labels

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Not linearly separable in this 2D space (

18

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2Cluster 1 Cluster 2

Cluster 3Cluster 4

1 2 3 4

1 2 3 4

4D BoW representation

Separable with a linear classifiers in a 4D space

19

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

Example from Kevin McGuinness Juyter notebook 20

How clustering is valueble for linear classifiers

21

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F. Van Veen, "The Neural Network Zoo" (2016)

45

Deep Belief Networks (DBN)

Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural Computation 18, no. 7 (2006): 1527-1554

Architecture like an MLP. Training as a stack of RBMs


49

Deep Belief Networks (DBN)

Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural Computation 18, no. 7 (2006): 1527-1554

Architecture like an MLP. Training as a stack of RBMs… so they do not need labels

Unsupervised learning

50

Deep Belief Networks (DBN)

Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural Computation 18, no. 7 (2006): 1527-1554

After the DBN is trained, it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervised learning

Softmax
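A minimal sketch of the greedy layer-wise recipe, assuming NumPy and reusing the CD-1 update idea from the RBM sketch above; layer sizes, epochs and the learning rate are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=5):
    """One-step contrastive divergence, as in the RBM sketch above."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        p_h0 = sigmoid(data @ W + b_h)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        p_v1 = sigmoid(h0 @ W.T + b_v)
        p_h1 = sigmoid(p_v1 @ W + b_h)
        W += lr * (data.T @ p_h0 - p_v1.T @ p_h1) / len(data)
        b_v += lr * (data - p_v1).mean(axis=0)
        b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_h, sigmoid(data @ W + b_h)

# Greedy layer-wise pretraining: each RBM is trained, without labels,
# on the hidden activations of the RBM below it.
data = (rng.random((128, 784)) < 0.5).astype(float)   # dummy unlabelled batch
stack = []
for n_hidden in (512, 256, 128):
    W, b_h, data = train_rbm(data, n_hidden)
    stack.append((W, b_h))
# The stack can then be unrolled into an MLP, a softmax layer added on top,
# and the whole network fine-tuned with the few available labels.
```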

51

Deep Belief Networks (DBN)

Deep Learning TV, "Deep Belief Nets"

52

Deep Belief Networks (DBN)

Geoffrey Hinton, "Introduction to Deep Learning & Deep Belief Nets" (2012); Geoffrey Hinton, "Tutorial on Deep Belief Networks", NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction & Prediction

59

Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." ICML 2015. [Github]

Unsupervised feature learning (no labels) for frame prediction

62

Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." ICML 2015. [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction & Prediction
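A minimal sketch of the future-prediction branch of this encoder-decoder idea, assuming PyTorch, frames flattened to vectors, and teacher forcing; the paper's composite model also has a reconstruction branch, omitted here.

```python
import torch
import torch.nn as nn

frame_dim, hidden = 1024, 256
encoder = nn.LSTM(frame_dim, hidden, batch_first=True)
decoder = nn.LSTM(frame_dim, hidden, batch_first=True)
readout = nn.Linear(hidden, frame_dim)

past = torch.rand(8, 10, frame_dim)     # 10 observed frames per clip
future = torch.rand(8, 5, frame_dim)    # 5 frames to predict

_, state = encoder(past)                # summarize the past into the LSTM state
# Teacher forcing: condition the decoder on the ground-truth future, shifted by one
inputs = torch.cat([past[:, -1:], future[:, :-1]], dim=1)
out, _ = decoder(inputs, state)
pred = readout(out)

loss = nn.functional.mse_loss(pred, future)  # self-supervised target: the video itself
loss.backward()
```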

Frame Prediction

63

Ranzato, Marc'Aurelio, Arthur Szlam, Joan Bruna, Michael Mathieu, Ronan Collobert, and Sumit Chopra. "Video (language) modeling: a baseline for generative models of natural videos." arXiv preprint arXiv:1412.6604 (2014)

64

Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016. [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65

Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016. [project] [code]

The blurry predictions from MSE are improved with a multi-scale architecture, adversarial learning, and an image gradient difference loss function

Frame Prediction

66

Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016. [project] [code]

The blurry predictions from MSE (ℓ1) are improved with a multi-scale architecture, adversarial training, and an image gradient difference loss (GDL) function
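Concretely, the GDL term penalizes differences between the spatial gradients of the predicted and ground-truth frames, which sharpens blurry outputs. A minimal sketch, assuming PyTorch and alpha = 1 (the paper treats alpha as a hyperparameter):

```python
import torch

def gdl_loss(pred, target, alpha=1):
    # Horizontal and vertical finite differences, |I[i,j] - I[i-1,j]| etc.
    dy_pred = (pred[..., 1:, :] - pred[..., :-1, :]).abs()
    dy_true = (target[..., 1:, :] - target[..., :-1, :]).abs()
    dx_pred = (pred[..., :, 1:] - pred[..., :, :-1]).abs()
    dx_true = (target[..., :, 1:] - target[..., :, :-1]).abs()
    # Penalize mismatch between predicted and true gradient magnitudes
    return ((dy_pred - dy_true).abs() ** alpha).mean() + \
           ((dx_pred - dx_true).abs() ** alpha).mean()

pred, target = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
total = torch.nn.functional.mse_loss(pred, target) + gdl_loss(pred, target)
```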

Frame Prediction

67

Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016. [project] [code]

Frame Prediction

68

Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. "Generating videos with scene dynamics." NIPS 2016


Frame Prediction

70

Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks." NIPS 2016. [video]

71

Frame Prediction

Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks." NIPS 2016. [video]

Given an input image, probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks." NIPS 2016. [video]

Encodes the image as feature maps and the motion as cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74

Le, Quoc V., Will Y. Zou, Serena Y. Yeung, and Andrew Y. Ng. "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis." CVPR 2011

Temporal Weak Labels

75

Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun. "Unsupervised learning of spatiotemporally coherent metrics." ICCV 2015

Assumption: adjacent video frames contain semantically similar information. Autoencoder trained with regularization by slowness and sparsity
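A minimal sketch of such a slowness-plus-sparsity regularizer on the learned codes, assuming PyTorch; the weights lam_slow and lam_sparse are illustrative, not from the paper.

```python
import torch

def temporal_coherence_loss(h_t, h_t1, lam_slow=1.0, lam_sparse=0.1):
    # Slowness: features of adjacent frames should be similar
    slowness = (h_t - h_t1).pow(2).sum(dim=1).mean()
    # Sparsity: push the codes toward sparse activations (L1 penalty)
    sparsity = h_t.abs().mean() + h_t1.abs().mean()
    return lam_slow * slowness + lam_sparse * sparsity

h_t, h_t1 = torch.rand(8, 64), torch.rand(8, 64)   # dummy features of frames t, t+1
reg = temporal_coherence_loss(h_t, h_t1)           # added to the reconstruction loss
```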

76

Jayaraman, Dinesh, and Kristen Grauman. "Slow and steady feature analysis: higher order temporal coherence in video." CVPR 2016. [video]

Temporal Weak Labels


78

(Slides by Xunyu Lin) Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79

(Slides by Xunyu Lin) Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. [code]

Take temporal order as the supervisory signal for learning

Figure: shuffled sequences are fed to a binary classification network that decides whether the frames are in order or not in order.
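A minimal sketch, in plain Python, of how such order-verification training tuples can be built from an unlabelled video; the function name and the 3-frame tuple size are illustrative, not from the paper.

```python
import random

def make_tuples(video, n=3):
    """Sample n frames; in-order tuple is a positive, a shuffled one a negative."""
    idx = sorted(random.sample(range(len(video)), n))
    in_order = [video[i] for i in idx]          # positive: correct temporal order
    shuffled = in_order[:]
    while shuffled == in_order:                 # negative: any wrong order
        random.shuffle(shuffled)
    return (in_order, 1), (shuffled, 0)

video = list(range(30))                         # stand-in for 30 frames
(pos, y_pos), (neg, y_neg) = make_tuples(video) # feed both to a binary classifier
```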

Temporal Weak Labels

80

Fernando, Basura, Hakan Bilen, Efstratios Gavves, and Stephen Gould. "Self-supervised video representation learning with odd-one-out networks." CVPR 2017

Temporal Weak Labels

Train a network to detect which of the video sequences contains frames in the wrong order

81

Spatio-Temporal Weak Labels

Lin, X., Campos, V., Giró-i-Nieto, X., Torres, J., and Canton-Ferrer, C. "Disentangling Motion, Foreground and Background Features in Videos." CVPR 2017 Workshop Brave New Motion Representations

Figure: a C3D encoder disentangles foreground/motion, first-foreground and background features; foreground decoders (Fg Dec, with shared-weight kernels from a kernel decoder) reconstruct the foreground in the first and last frames, a background decoder (Bg Dec) reconstructs the background in the first frame, and uNLC masks block the gradients outside each region.

82

Spatio-Temporal Weak Labels

Pathak, Deepak, Ross Girshick, Piotr Dollár, Trevor Darrell, and Bharath Hariharan. "Learning features by watching objects move." CVPR 2017

83

Greff, Klaus, Antti Rasmus, Mathias Berglund, Tele Hao, Harri Valpola, and Juergen Schmidhuber. "Tagger: Deep unsupervised perceptual grouping." NIPS 2016. [video] [code]

Spatio-Temporal Weak Labels

84

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016

85

Audio Features from Visual weak labels

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016

Object & scene recognition in videos by analysing the audio track (only)

86

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels


89

Visualization of the video frames associated with the sounds that activate some of the last hidden units of SoundNet (conv7)

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016

Audio Features from Visual weak labels

90

Audio & Visual features from alignment

90

Arandjelović, Relja, and Andrew Zisserman. "Look, Listen and Learn." ICCV 2017

Audio and visual features learned by assessing alignment
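A minimal sketch of this correspondence pretext task, assuming PyTorch; the two tiny linear encoders and the input shapes stand in for the deep vision and audio subnetworks of the actual model.

```python
import torch
import torch.nn as nn

vision = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
audio = nn.Sequential(nn.Flatten(), nn.Linear(1 * 257 * 100, 128), nn.ReLU())
head = nn.Linear(256, 2)                # aligned vs not aligned

frames = torch.rand(4, 3, 32, 32)       # video frames
spects = torch.rand(4, 1, 257, 100)     # audio spectrogram clips
labels = torch.tensor([1, 0, 1, 0])     # 1 = same video, 0 = mismatched pair

# Both subnetworks are trained jointly from the binary alignment signal alone
logits = head(torch.cat([vision(frames), audio(spects)], dim=1))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
```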

91

Chen, L., S. Srivastava, Z. Duan, and C. Xu. "Deep Cross-Modal Audio-Visual Generation." ACM International Conference on Multimedia Thematic Workshops, 2017

Audio & Visual features from alignment

92

Ephrat, Ariel, and Shmuel Peleg. "Vid2speech: Speech Reconstruction from Silent Video." ICASSP 2017

93

Ephrat, Ariel, and Shmuel Peleg. "Vid2speech: Speech Reconstruction from Silent Video." ICASSP 2017

Video to Speech Representations

Figure: a frame from a silent video is fed to a CNN (VGG) that predicts an audio feature, which is converted to sound by post-hoc synthesis.

94

Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You said that?" BMVC 2017

95

Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You said that?" BMVC 2017

Speech to Video Synthesis (mouth)

96

Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017

97

Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98

Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017

99

Suwajanakorn, Supasorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. "Synthesizing Obama: learning lip sync from audio." SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

5

Motivation

6

Motivation

7Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Yann Lecunrsquos Black Forest cake

8

Motivation

1

= ƒ( )

ƒ( )

= ƒ( )

9

Predict label y corresponding to observation x

Estimate the distribution of observation x

Predict action y based on observation x to maximize a future reward z

Motivation

1

= ƒ( )

ƒ( )

= ƒ( )

10

Motivation

Unsupervised Learning

11

Why Unsupervised Learning

It is the nature of how intelligent beings percept the world

It can save us tons of efforts to build a human-alike intelligent agent compared to a totally supervised fashion

Vast amounts of unlabelled data

12

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Max margin decision boundary

13

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

Semi supervised decision boundary

14

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

15

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

16

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

How P(x) is valuable for naive Bayesian classifier

17

X DataY Labels

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Not linearly separable in this 2D space (

18

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2Cluster 1 Cluster 2

Cluster 3Cluster 4

1 2 3 4

1 2 3 4

4D BoW representation

Separable with a linear classifiers in a 4D space

19

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

Example from Kevin McGuinness Juyter notebook 20

How clustering is valueble for linear classifiers

21

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

6

Motivation

7Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Yann Lecunrsquos Black Forest cake

8

Motivation

1

= ƒ( )

ƒ( )

= ƒ( )

9

Predict label y corresponding to observation x

Estimate the distribution of observation x

Predict action y based on observation x to maximize a future reward z

Motivation

1

= ƒ( )

ƒ( )

= ƒ( )

10

Motivation

Unsupervised Learning

11

Why Unsupervised Learning

It is the nature of how intelligent beings percept the world

It can save us tons of efforts to build a human-alike intelligent agent compared to a totally supervised fashion

Vast amounts of unlabelled data

12

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Max margin decision boundary

13

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

Semi supervised decision boundary

14

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

15

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

16

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

How P(x) is valuable for naive Bayesian classifier

17

X DataY Labels

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Not linearly separable in this 2D space (

18

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2Cluster 1 Cluster 2

Cluster 3Cluster 4

1 2 3 4

1 2 3 4

4D BoW representation

Separable with a linear classifiers in a 4D space

19

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

Example from Kevin McGuinness Juyter notebook 20

How clustering is valueble for linear classifiers

21

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

7Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Yann Lecunrsquos Black Forest cake

8

Motivation

1

= ƒ( )

ƒ( )

= ƒ( )

9

Predict label y corresponding to observation x

Estimate the distribution of observation x

Predict action y based on observation x to maximize a future reward z

Motivation

1

= ƒ( )

ƒ( )

= ƒ( )

10

Motivation

Unsupervised Learning

11

Why Unsupervised Learning

It is the nature of how intelligent beings percept the world

It can save us tons of efforts to build a human-alike intelligent agent compared to a totally supervised fashion

Vast amounts of unlabelled data

12

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Max margin decision boundary

13

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

Semi supervised decision boundary

14

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

15

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

16

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

How P(x) is valuable for naive Bayesian classifier

17

X DataY Labels

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Not linearly separable in this 2D space (

18

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2Cluster 1 Cluster 2

Cluster 3Cluster 4

1 2 3 4

1 2 3 4

4D BoW representation

Separable with a linear classifiers in a 4D space

19

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

Example from Kevin McGuinness Juyter notebook 20

How clustering is valueble for linear classifiers

21

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

(Architecture figure) A C3D encoder disentangles the clip into foreground/motion and background features; foreground decoders with shared kernel weights reconstruct the foreground in the first and last frames, a background decoder reconstructs the background in the first frame, and uNLC segmentation masks (with blocked gradients) provide the reconstruction targets.

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollár Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object & scene recognition in videos by analysing the audio track (only)
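The training itself is a teacher-student transfer: pretrained vision networks provide class posteriors for the video frames, and the raw-audio network is fit to match them under a KL divergence. A minimal sketch, assuming PyTorch and assumed vision_net/audio_net modules:

```python
import torch
import torch.nn.functional as F

def soundnet_loss(audio_net, vision_net, raw_audio, frames):
    # visual "weak labels": class posteriors from a frozen, pretrained vision net
    with torch.no_grad():
        teacher = F.softmax(vision_net(frames), dim=-1)
    # the audio student predicts log-probabilities over the same classes
    student_log = F.log_softmax(audio_net(raw_audio), dim=-1)
    # KL(teacher || student), averaged over the batch
    return F.kl_div(student_log, teacher, reduction='batchmean')
```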

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated with the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio & Visual features from alignment

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignment
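A minimal sketch, assuming PyTorch, of this audio-visual correspondence objective: a two-stream network classifies whether a frame and an audio clip come from the same moment of the same video. All module names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def avc_loss(vision_net, audio_net, fusion, frame, audio, mismatched_audio):
    v = vision_net(frame)                                   # visual embedding
    # aligned pair (positive) vs. audio taken from another video (negative)
    pos = fusion(torch.cat([v, audio_net(audio)], dim=-1))
    neg = fusion(torch.cat([v, audio_net(mismatched_audio)], dim=-1))
    logits = torch.cat([pos, neg], dim=0).squeeze(-1)
    labels = torch.cat([torch.ones_like(pos),
                        torch.zeros_like(neg)], dim=0).squeeze(-1)
    return F.binary_cross_entropy_with_logits(logits, labels)
```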

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio & Visual features from alignment

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations: a CNN (VGG) maps each frame from a silent video to an audio feature; the speech is then generated by post-hoc synthesis.

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that? BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that? BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose & emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Yann Lecunrsquos Black Forest cake

8

Motivation

1

= ƒ( )

ƒ( )

= ƒ( )

9

Predict label y corresponding to observation x

Estimate the distribution of observation x

Predict action y based on observation x to maximize a future reward z

Motivation

1

= ƒ( )

ƒ( )

= ƒ( )

10

Motivation

Unsupervised Learning

11

Why Unsupervised Learning

It is the nature of how intelligent beings percept the world

It can save us tons of efforts to build a human-alike intelligent agent compared to a totally supervised fashion

Vast amounts of unlabelled data

12

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Max margin decision boundary

13

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

Semi supervised decision boundary

14

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

15

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

16

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

How P(x) is valuable for naive Bayesian classifier

17

X DataY Labels

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Not linearly separable in this 2D space (

18

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2Cluster 1 Cluster 2

Cluster 3Cluster 4

1 2 3 4

1 2 3 4

4D BoW representation

Separable with a linear classifiers in a 4D space

19

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

Example from Kevin McGuinness Juyter notebook 20

How clustering is valueble for linear classifiers

21

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

1

= ƒ( )

ƒ( )

= ƒ( )

9

Predict label y corresponding to observation x

Estimate the distribution of observation x

Predict action y based on observation x to maximize a future reward z

Motivation

1

= ƒ( )

ƒ( )

= ƒ( )

10

Motivation

Unsupervised Learning

11

Why Unsupervised Learning

It is the nature of how intelligent beings percept the world

It can save us tons of efforts to build a human-alike intelligent agent compared to a totally supervised fashion

Vast amounts of unlabelled data

12

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Max margin decision boundary

13

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

Semi supervised decision boundary

14

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

15

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

16

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

How P(x) is valuable for naive Bayesian classifier

17

X DataY Labels

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Not linearly separable in this 2D space (

18

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2Cluster 1 Cluster 2

Cluster 3Cluster 4

1 2 3 4

1 2 3 4

4D BoW representation

Separable with a linear classifiers in a 4D space

19

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

Example from Kevin McGuinness Juyter notebook 20

How clustering is valueble for linear classifiers

21

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

1

= ƒ( )

ƒ( )

= ƒ( )

10

Motivation

Unsupervised Learning

11

Why Unsupervised Learning

It is the nature of how intelligent beings percept the world

It can save us tons of efforts to build a human-alike intelligent agent compared to a totally supervised fashion

Vast amounts of unlabelled data

12

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Max margin decision boundary

13

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

Semi supervised decision boundary

14

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

15

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

16

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

How P(x) is valuable for naive Bayesian classifier

17

X DataY Labels

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Not linearly separable in this 2D space (

18

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2Cluster 1 Cluster 2

Cluster 3Cluster 4

1 2 3 4

1 2 3 4

4D BoW representation

Separable with a linear classifiers in a 4D space

19

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

Example from Kevin McGuinness Juyter notebook 20

How clustering is valueble for linear classifiers

21

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Unsupervised Learning

11

Why Unsupervised Learning

It is the nature of how intelligent beings percept the world

It can save us tons of efforts to build a human-alike intelligent agent compared to a totally supervised fashion

Vast amounts of unlabelled data

12

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Max margin decision boundary

13

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

Semi supervised decision boundary

14

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

15

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

16

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

How P(x) is valuable for naive Bayesian classifier

17

X DataY Labels

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Not linearly separable in this 2D space (

18

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2Cluster 1 Cluster 2

Cluster 3Cluster 4

1 2 3 4

1 2 3 4

4D BoW representation

Separable with a linear classifiers in a 4D space

19

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

Example from Kevin McGuinness Juyter notebook 20

How clustering is valueble for linear classifiers

21

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backward pass

The reconstructed values at the visible layer are compared with the actual ones using the KL divergence
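A compact sketch of how this forward/backward comparison is typically turned into a learning rule, one contrastive-divergence (CD-1) update for a binary RBM in NumPy (sizes and data are dummies; real implementations iterate over a dataset and add momentum, weight decay, etc.):

# Sketch of CD-1: forward pass on data, backward reconstruction, and a
# weight update from the difference of the two sets of statistics.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)
n_vis, n_hid, lr = 784, 64, 0.01
W = 0.01 * rng.randn(n_vis, n_hid)   # shared weights
b = np.zeros(n_vis)                  # visible biases
c = np.zeros(n_hid)                  # hidden biases

v0 = (rng.rand(32, n_vis) < 0.5).astype(float)  # dummy batch of binary data

ph0 = sigmoid(v0 @ W + c)                        # forward pass (data-driven)
h0 = (rng.rand(*ph0.shape) < ph0).astype(float)  # sample hidden states
pv1 = sigmoid(h0 @ W.T + b)                      # backward pass (reconstruction)
ph1 = sigmoid(pv1 @ W + c)                       # re-infer hiddens

W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)   # data vs. reconstruction stats
b += lr * (v0 - pv1).mean(axis=0)
c += lr * (ph0 - ph1).mean(axis=0)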

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov, Ruslan, Andriy Mnih, and Geoffrey Hinton, "Restricted Boltzmann machines for collaborative filtering", Proceedings of the 24th International Conference on Machine Learning, ACM, 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j, "A Beginner's Tutorial for Restricted Boltzmann Machines"

RBMs are a specific type of autoencoder

Unsupervised learning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV, "Restricted Boltzmann Machines"

44

Deep Belief Networks (DBN)

F. Van Veen, "The Neural Network Zoo" (2016)

45

Deep Belief Networks (DBN)

Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh, "A fast learning algorithm for deep belief nets", Neural Computation 18, no. 7 (2006): 1527-1554

Architecture like an MLP. Training as a stack of RBMs.

46

Deep Belief Networks (DBN)

Architecture like an MLP. Training as a stack of RBMs.

Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh, "A fast learning algorithm for deep belief nets", Neural Computation 18, no. 7 (2006): 1527-1554

47

Deep Belief Networks (DBN)

Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh, "A fast learning algorithm for deep belief nets", Neural Computation 18, no. 7 (2006): 1527-1554

Architecture like an MLP. Training as a stack of RBMs.

48

Deep Belief Networks (DBN)

Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh, "A fast learning algorithm for deep belief nets", Neural Computation 18, no. 7 (2006): 1527-1554

Architecture like an MLP. Training as a stack of RBMs.

49

Deep Belief Networks (DBN)

Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh, "A fast learning algorithm for deep belief nets", Neural Computation 18, no. 7 (2006): 1527-1554

Architecture like an MLP. Training as a stack of RBMs… so they do not need labels.

Unsupervised learning

50

Deep Belief Networks (DBN)

Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh, "A fast learning algorithm for deep belief nets", Neural Computation 18, no. 7 (2006): 1527-1554

After the DBN is trained, it can be fine-tuned with a reduced number of labels to solve a supervised task with superior performance

Supervised learning

Softmax
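One way to sketch this pipeline with off-the-shelf tools: scikit-learn's BernoulliRBM layers are fit greedily without labels, then a logistic regression plays the role of the softmax output. Unlike a full DBN fine-tune, gradients here do not flow back into the RBM layers; data and sizes are made up.

# Sketch: greedy unsupervised RBM stacking + supervised softmax-style top layer.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)
X = (rng.rand(200, 64) < 0.5).astype(float)   # dummy binary inputs
y = rng.randint(0, 10, size=200)              # a few labels

dbn_like = Pipeline([
    ("rbm1", BernoulliRBM(n_components=32, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=16, random_state=0)),
    ("softmax", LogisticRegression(max_iter=1000)),
])
dbn_like.fit(X, y)   # RBMs fit layer by layer (no labels); softmax uses the labels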

51

Deep Belief Networks (DBN)

Deep Learning TV, "Deep Belief Nets"

52

Deep Belief Networks (DBN)

Geoffrey Hinton, "Introduction to Deep Learning & Deep Belief Nets" (2012); Geoffrey Hinton, "Tutorial on Deep Belief Networks", NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan, Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction & Prediction

59 Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov, "Unsupervised Learning of Video Representations using LSTMs", ICML 2015 [GitHub]

Unsupervised feature learning (no labels) for

Frame Reconstruction & Prediction

60 Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov, "Unsupervised Learning of Video Representations using LSTMs", ICML 2015 [GitHub]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction & Prediction

61 Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov, "Unsupervised Learning of Video Representations using LSTMs", ICML 2015 [GitHub]

Unsupervised feature learning (no labels) for

frame prediction

62 Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov, "Unsupervised Learning of Video Representations using LSTMs", ICML 2015 [GitHub]

Features learned without supervision (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction & Prediction

Frame Prediction

63 Ranzato, Marc'Aurelio, Arthur Szlam, Joan Bruna, Michael Mathieu, Ronan Collobert, and Sumit Chopra, "Video (language) modeling: a baseline for generative models of natural videos", arXiv preprint arXiv:1412.6604 (2014)

64 Mathieu, Michael, Camille Couprie, and Yann LeCun, "Deep multi-scale video prediction beyond mean square error", ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65 Mathieu, Michael, Camille Couprie, and Yann LeCun, "Deep multi-scale video prediction beyond mean square error", ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with a multi-scale architecture, adversarial learning, and an image gradient difference loss function

Frame Prediction

66 Mathieu, Michael, Camille Couprie, and Yann LeCun, "Deep multi-scale video prediction beyond mean square error", ICLR 2016 [project] [code]

The blurry predictions from MSE (L1) are improved with a multi-scale architecture, adversarial training, and an image gradient difference loss (GDL) function
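A sketch of the GDL term in PyTorch, following the formulation in the paper: penalize differences between the absolute image gradients of the prediction and of the ground truth (tensor shapes and alpha are assumptions):

# Sketch of an image gradient difference loss between predicted and true frames.
import torch

def gdl(pred, target, alpha=1):
    # pred, target: (batch, channels, H, W)
    dy_pred = (pred[..., 1:, :] - pred[..., :-1, :]).abs()
    dx_pred = (pred[..., :, 1:] - pred[..., :, :-1]).abs()
    dy_true = (target[..., 1:, :] - target[..., :-1, :]).abs()
    dx_true = (target[..., :, 1:] - target[..., :, :-1]).abs()
    return (((dy_pred - dy_true).abs() ** alpha).mean()
            + ((dx_pred - dx_true).abs() ** alpha).mean())

pred, target = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
loss = gdl(pred, target)   # combined with MSE and adversarial terms in the paper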

Frame Prediction

67 Mathieu, Michael, Camille Couprie, and Yann LeCun, "Deep multi-scale video prediction beyond mean square error", ICLR 2016 [project] [code]

Frame Prediction

68 Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba, "Generating videos with scene dynamics", NIPS 2016

Frame Prediction

69 Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba, "Generating videos with scene dynamics", NIPS 2016

Frame Prediction

70 Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman, "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks", NIPS 2016 [video]

71

Frame Prediction

Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman, "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks", NIPS 2016 [video]

Given an input image, probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman, "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks", NIPS 2016 [video]

Encodes the image as feature maps and the motion as cross-convolutional kernels
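A sketch of the cross-convolution step itself: each sample's feature maps are convolved with that sample's own predicted kernels, which grouped convolution expresses compactly (all dimensions are made up):

# Sketch: per-sample kernels applied to per-sample feature maps via groups.
import torch
import torch.nn.functional as F

B, C, H, W, K = 4, 8, 32, 32, 5
feat = torch.randn(B, C, H, W)      # image encoded as feature maps
kernels = torch.randn(B, C, K, K)   # motion encoded as per-sample kernels

# Fold the batch into the channel dimension and use grouped convolution,
# so every feature map is filtered by its own kernel.
out = F.conv2d(feat.view(1, B * C, H, W),
               kernels.view(B * C, 1, K, K),
               groups=B * C, padding=K // 2)
out = out.view(B, C, H, W)          # one filtered map per input map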

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74 Le, Quoc V., Will Y. Zou, Serena Y. Yeung, and Andrew Y. Ng, "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis", CVPR 2011

Temporal Weak Labels

75 Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun, "Unsupervised learning of spatiotemporally coherent metrics", ICCV 2015

Assumption: adjacent video frames contain semantically similar information. Autoencoder trained with regularization by slowness and sparsity.
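A minimal sketch of such a regularizer in PyTorch (my own simplification of the idea: pull the features of adjacent frames together and keep them sparse; the weighting lam is arbitrary):

# Sketch of a slowness + sparsity penalty added to a reconstruction loss.
import torch

def slow_sparse_loss(z_t, z_t1, lam=0.1):
    slowness = (z_t - z_t1).pow(2).sum(dim=1).mean()  # adjacent frames stay close
    sparsity = z_t.abs().mean() + z_t1.abs().mean()   # L1 penalty on the codes
    return slowness + lam * sparsity

z_t, z_t1 = torch.randn(16, 128), torch.randn(16, 128)  # dummy frame features
loss = slow_sparse_loss(z_t, z_t1)  # added to the autoencoder reconstruction loss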

76 Jayaraman, Dinesh, and Kristen Grauman, "Slow and steady feature analysis: higher order temporal coherence in video", CVPR 2016 [video]

Temporal Weak Labels

77 Jayaraman, Dinesh, and Kristen Grauman, "Slow and steady feature analysis: higher order temporal coherence in video", CVPR 2016 [video]

Temporal Weak Labels

78 (Slides by Xunyu Lin) Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert, "Shuffle and learn: unsupervised learning using temporal order verification", ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79 (Slides by Xunyu Lin) Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert, "Shuffle and learn: unsupervised learning using temporal order verification", ECCV 2016 [code]

Take temporal order as the supervisory signal for learning: shuffled sequences → binary classification (in order / not in order)
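A sketch of how such training examples can be generated (sampling scheme simplified from the paper; frames here are just indices):

# Sketch: frame triplets in temporal order are positives, shuffled negatives.
import random

def make_example(frames):
    # frames: list of frame indices (or images) from one video clip
    i = random.randrange(len(frames) - 2)
    triplet = [frames[i], frames[i + 1], frames[i + 2]]
    if random.random() < 0.5:
        return triplet, 1                 # in order
    shuffled = triplet[:]
    while shuffled == triplet:            # make sure the order actually changes
        random.shuffle(shuffled)
    return shuffled, 0                    # not in order

example, label = make_example(list(range(30)))  # binary classification target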

Temporal Weak Labels

80 Fernando, Basura, Hakan Bilen, Efstratios Gavves, and Stephen Gould, "Self-supervised video representation learning with odd-one-out networks", CVPR 2017

Temporal Weak Labels. Train a network to detect which of the video sequences contains frames in the wrong order.

81

Spatio-Temporal Weak Labels

X. Lin, V. Campos, X. Giró-i-Nieto, J. Torres, and C. Canton-Ferrer, "Disentangling Motion, Foreground and Background Features in Videos", in CVPR 2017 Workshop Brave New Motion Representations

[Architecture figure: a C3D encoder feeds a kernel decoder and three latent branches (foreground/motion, first foreground, background); Fg and Bg decoders with shared kernel weights reconstruct the foreground in the first and last frames and the background in the first frame; a uNLC mask blocks gradients]

82

Spatio-Temporal Weak Labels

Pathak, Deepak, Ross Girshick, Piotr Dollár, Trevor Darrell, and Bharath Hariharan, "Learning features by watching objects move", CVPR 2017

83 Greff, Klaus, Antti Rasmus, Mathias Berglund, Tele Hao, Harri Valpola, and Juergen Schmidhuber, "Tagger: Deep unsupervised perceptual grouping", NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84 Aytar, Yusuf, Carl Vondrick, and Antonio Torralba, "SoundNet: Learning sound representations from unlabeled video", NIPS 2016

85

Audio Features from Visual weak labels

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba, "SoundNet: Learning sound representations from unlabeled video", NIPS 2016

Object & scene recognition in videos by analysing the audio track (only)

86 Aytar, Yusuf, Carl Vondrick, and Antonio Torralba, "SoundNet: Learning sound representations from unlabeled video", NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87 Aytar, Yusuf, Carl Vondrick, and Antonio Torralba, "SoundNet: Learning sound representations from unlabeled video", NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88 Aytar, Yusuf, Carl Vondrick, and Antonio Torralba, "SoundNet: Learning sound representations from unlabeled video", NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated with the sounds that activate some of the last hidden units of SoundNet (conv7)

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba, "SoundNet: Learning sound representations from unlabeled video", NIPS 2016

Audio Features from Visual weak labels

90

Audio & Visual features from alignment

90 Arandjelović, Relja, and Andrew Zisserman, "Look, Listen and Learn", ICCV 2017

Audio and visual features learned by assessing alignment
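A sketch of the correspondence task at the heart of this approach: embeddings from a vision subnetwork and an audio subnetwork are fused, and a small head predicts whether the pair is aligned (dimensions and the two-layer head are assumptions):

# Sketch: binary audio-visual correspondence from fused embeddings.
import torch
import torch.nn as nn

img_emb = torch.randn(8, 128)   # from a vision subnetwork (stand-in)
aud_emb = torch.randn(8, 128)   # from an audio subnetwork (stand-in)
head = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2))
logits = head(torch.cat([img_emb, aud_emb], dim=1))   # aligned vs. misaligned
labels = torch.randint(0, 2, (8,))                    # 1 = same moment of the same video
loss = nn.CrossEntropyLoss()(logits, labels)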

91 L. Chen, S. Srivastava, Z. Duan, and C. Xu, "Deep Cross-Modal Audio-Visual Generation", ACM International Conference on Multimedia Thematic Workshops, 2017

Audio & Visual features from alignment

92 Ephrat, Ariel, and Shmuel Peleg, "Vid2speech: Speech Reconstruction from Silent Video", ICASSP 2017

93 Ephrat, Ariel, and Shmuel Peleg, "Vid2speech: Speech Reconstruction from Silent Video", ICASSP 2017

Video to Speech Representations

[Pipeline figure: frame from a silent video → CNN (VGG) → audio feature → post-hoc speech synthesis]

94 Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman, "You said that?", BMVC 2017

95 Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman, "You said that?", BMVC 2017

Speech to Video Synthesis (mouth)

96 Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen, "Audio-driven facial animation by joint end-to-end learning of pose and emotion", SIGGRAPH 2017

97 Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen, "Audio-driven facial animation by joint end-to-end learning of pose and emotion", SIGGRAPH 2017

Speech to Video Synthesis (pose & emotion)

98 Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen, "Audio-driven facial animation by joint end-to-end learning of pose and emotion", SIGGRAPH 2017

99 Suwajanakorn, Supasorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman, "Synthesizing Obama: learning lip sync from audio", SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning



Semi supervised decision boundary

14

How data distribution P(x) influences decisions (1D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

15

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

16

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

How P(x) is valuable for naive Bayesian classifier

17

X DataY Labels

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Not linearly separable in this 2D space (

18

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2Cluster 1 Cluster 2

Cluster 3Cluster 4

1 2 3 4

1 2 3 4

4D BoW representation

Separable with a linear classifiers in a 4D space

19

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

Example from Kevin McGuinness Juyter notebook 20

How clustering is valueble for linear classifiers

21

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated with the sounds that activate some of the last hidden units of SoundNet (conv7).

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

Audio Features from Visual Weak Labels

90

Audio & Visual Features from Alignment

Arandjelović, Relja, and Andrew Zisserman. "Look, Listen and Learn." ICCV 2017.

Audio and visual features are learned jointly by assessing whether a video frame and an audio clip are aligned, i.e. whether they come from the same moment of the same video (see the sketch below).
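
A minimal sketch of this audio-visual correspondence task, assuming vision_net and audio_net towers with known output dimensions. Positive pairs take a frame together with the audio around it; negative pairs mix a frame and audio from different videos.

```python
import torch
import torch.nn as nn

class AVCorrespondenceNet(nn.Module):
    """Binary audio-visual correspondence classifier (L3-style sketch)."""
    def __init__(self, vision_net, audio_net, v_dim, a_dim):
        super().__init__()
        self.vision_net = vision_net    # image ConvNet -> (B, v_dim)
        self.audio_net = audio_net      # spectrogram ConvNet -> (B, a_dim)
        self.fusion = nn.Sequential(
            nn.Linear(v_dim + a_dim, 128), nn.ReLU(),
            nn.Linear(128, 2),          # aligned vs. not aligned
        )

    def forward(self, frame, spectrogram):
        v = self.vision_net(frame)
        a = self.audio_net(spectrogram)
        return self.fusion(torch.cat([v, a], dim=1))
```

Since neither tower ever sees a label, both the visual and the audio features come for free from the correspondence signal.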

91. L. Chen, S. Srivastava, Z. Duan, and C. Xu. "Deep Cross-Modal Audio-Visual Generation." ACM International Conference on Multimedia Thematic Workshops, 2017.

Audio & Visual Features from Alignment

92. Ephrat, Ariel, and Shmuel Peleg. "Vid2speech: Speech Reconstruction from Silent Video." ICASSP 2017.

93. Ephrat, Ariel, and Shmuel Peleg. "Vid2speech: Speech Reconstruction from Silent Video." ICASSP 2017.

Video to Speech Representations

Pipeline: a frame from a silent video is fed to a CNN (VGG) that regresses an audio feature vector; the speech waveform is then generated by post-hoc synthesis.
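
A sketch of that regression stage under stated assumptions: cnn_backbone is any VGG-style feature extractor, and the acoustic feature dimension is illustrative. The vocoder that turns predicted features into a waveform is a separate, post-hoc component.

```python
import torch.nn as nn

class Vid2SpeechRegressor(nn.Module):
    """Silent-video frame -> acoustic feature vector (sketch)."""
    def __init__(self, cnn_backbone, feat_dim=512, audio_dim=9):
        super().__init__()
        self.backbone = cnn_backbone             # VGG-style CNN -> (B, feat_dim)
        self.regressor = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, audio_dim),           # acoustic features per frame
        )

    def forward(self, frame):
        # Trained with a regression loss (e.g. MSE) against acoustic
        # features extracted from the original audio track.
        return self.regressor(self.backbone(frame))
```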

94. Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You said that?" BMVC 2017.

95. Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You said that?" BMVC 2017.

Speech to Video Synthesis (mouth)

96. Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

97. Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

Speech to Video Synthesis (pose amp emotion)

98. Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

99. Suwajanakorn, Supasorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. "Synthesizing Obama: learning lip sync from audio." SIGGRAPH 2017.

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

15

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

16

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

How P(x) is valuable for naive Bayesian classifier

17

X DataY Labels

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Not linearly separable in this 2D space (

18

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2Cluster 1 Cluster 2

Cluster 3Cluster 4

1 2 3 4

1 2 3 4

4D BoW representation

Separable with a linear classifiers in a 4D space

19

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

Example from Kevin McGuinness Juyter notebook 20

How clustering is valueble for linear classifiers

21

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

16

How data distribution P(x) influences decisions (2D)

Slide credit Kevin McGuinness (DLCV UPC 2017)

How P(x) is valuable for naive Bayesian classifier

17

X DataY Labels

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Not linearly separable in this 2D space (

18

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2Cluster 1 Cluster 2

Cluster 3Cluster 4

1 2 3 4

1 2 3 4

4D BoW representation

Separable with a linear classifiers in a 4D space

19

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

Example from Kevin McGuinness Juyter notebook 20

How clustering is valueble for linear classifiers

21

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

How P(x) is valuable for naive Bayesian classifier

17

X DataY Labels

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Not linearly separable in this 2D space (

18

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2Cluster 1 Cluster 2

Cluster 3Cluster 4

1 2 3 4

1 2 3 4

4D BoW representation

Separable with a linear classifiers in a 4D space

19

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

Example from Kevin McGuinness Juyter notebook 20

How clustering is valueble for linear classifiers

21

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

x1

x2

Not linearly separable in this 2D space (

18

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2Cluster 1 Cluster 2

Cluster 3Cluster 4

1 2 3 4

1 2 3 4

4D BoW representation

Separable with a linear classifiers in a 4D space

19

How clustering is valueble for linear classifiers

Slide credit Kevin McGuinness (DLCV UPC 2017)

Example from Kevin McGuinness Juyter notebook 20

How clustering is valueble for linear classifiers

21

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks." NIPS 2016. [video]

Given an input image, probabilistic generation of future frames with a Variational AutoEncoder (VAE).

72

Frame Prediction

Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks." NIPS 2016. [video]

Encodes the image as feature maps and the motion as cross-convolutional kernels.
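The cross-convolution mechanism itself fits in a few lines: feature maps from the image encoder are filtered by kernels predicted from the motion code. A sketch using a depthwise convolution, where the shapes and 5x5 kernel size are invented for the example and are not the paper's architecture:

```python
import torch
import torch.nn.functional as F

feats = torch.randn(1, 32, 64, 64)   # image encoder output: (B, C, H, W)
kernels = torch.randn(32, 1, 5, 5)   # motion encoder output: one kernel per channel
out = F.conv2d(feats, kernels, padding=2, groups=32)  # per-channel filtering
```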

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74
Le, Quoc V., Will Y. Zou, Serena Y. Yeung, and Andrew Y. Ng. "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis." CVPR 2011.

Temporal Weak Labels

75
Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun. "Unsupervised learning of spatiotemporally coherent metrics." ICCV 2015.

Assumption: adjacent video frames contain semantically similar information. An autoencoder is trained with slowness and sparsity regularization.
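A compact way to read these regularizers, sketched as a loss on encoder codes of adjacent frames; this is a simplified stand-in for the terms in Goroshin et al. (2015), and the weighting is an assumption:

```python
import torch

def slowness_sparsity_loss(z_t, z_tp1, l1_weight=0.1):
    """Temporal-coherence sketch: codes of adjacent frames should stay close
    (slowness) while an L1 penalty keeps them sparse."""
    slow = (z_t - z_tp1).pow(2).sum(dim=1).mean()   # slowness term
    sparse = z_t.abs().sum(dim=1).mean()            # sparsity term
    return slow + l1_weight * sparse

# z_t, z_tp1 would be encoder outputs for frames t and t+1 of the same video.
z_t, z_tp1 = torch.randn(16, 128), torch.randn(16, 128)
print(slowness_sparsity_loss(z_t, z_tp1))
```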

76
Jayaraman, Dinesh, and Kristen Grauman. "Slow and steady feature analysis: higher order temporal coherence in video." CVPR 2016. [video]

Temporal Weak Labels

77
Jayaraman, Dinesh, and Kristen Grauman. "Slow and steady feature analysis: higher order temporal coherence in video." CVPR 2016. [video]

Temporal Weak Labels

78
(Slides by Xunyu Lin.) Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79
(Slides by Xunyu Lin.) Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. [code]

Temporal order is taken as the supervisory signal: shuffled sequences are fed to a binary classifier that decides whether the frames are in order or not in order.

Temporal Weak Labels
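A sketch of how such training pairs can be built (the paper's frame-sampling heuristics, such as preferring high-motion windows, are omitted; only the core labeling rule is shown):

```python
import random

def order_verification_pair(frames):
    """Sample a frame triplet from one video and emit a positive (in-order)
    and a negative (shuffled) example for the binary verification task."""
    i, j, k = sorted(random.sample(range(len(frames)), 3))
    positive = ((frames[i], frames[j], frames[k]), 1)   # temporally ordered
    negative = ((frames[j], frames[i], frames[k]), 0)   # order broken
    return positive, negative

# frames can be any indexable sequence of decoded video frames.
pos, neg = order_verification_pair(list(range(30)))
```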

80
Fernando, Basura, Hakan Bilen, Efstratios Gavves, and Stephen Gould. "Self-supervised video representation learning with odd-one-out networks." CVPR 2017.

Temporal Weak Labels. Train a network to detect which of the video sequences contains frames in the wrong order.

81

Spatio-Temporal Weak Labels

Lin, X., Campos, V., Giró-i-Nieto, X., Torres, J., and Canton-Ferrer, C. "Disentangling Motion, Foreground and Background Features in Videos." CVPR 2017 Workshop on Brave New Motion Representations.

[Architecture diagram: a C3D encoder disentangles foreground motion, first-frame foreground, and background features; Fg/Bg decoders with shared kernel weights reconstruct the foreground in the first and last frames and the background in the first frame, while uNLC masks block the gradients.]

82

Spatio-Temporal Weak Labels

Pathak, Deepak, Ross Girshick, Piotr Dollár, Trevor Darrell, and Bharath Hariharan. "Learning features by watching objects move." CVPR 2017.

83
Greff, Klaus, Antti Rasmus, Mathias Berglund, Tele Hao, Harri Valpola, and Juergen Schmidhuber. "Tagger: Deep unsupervised perceptual grouping." NIPS 2016. [video] [code]

Spatio-Temporal Weak Labels

84
Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

85

Audio Features from Visual weak labels

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

Object & scene recognition in videos by analysing the audio track (only).

86
Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87
Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88
Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated with the sounds that activate some of the last hidden units of SoundNet (conv7).

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

Audio Features from Visual weak labels
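SoundNet's training signal can be summarized in one function: a pretrained vision network provides soft class distributions for the video frames, and the audio network is trained to match them. A hedged sketch of that teacher-student objective, where the shapes and the reduction are assumptions:

```python
import torch
import torch.nn.functional as F

def soundnet_distillation_loss(audio_logits, vision_probs):
    """Teacher-student objective in the spirit of SoundNet: make the audio
    network's distribution over object/scene classes match the predictions
    of a vision network on the accompanying frames (KL divergence)."""
    log_p_audio = F.log_softmax(audio_logits, dim=1)
    return F.kl_div(log_p_audio, vision_probs, reduction="batchmean")

audio_logits = torch.randn(4, 1000)                    # raw-waveform CNN output
vision_probs = torch.softmax(torch.randn(4, 1000), 1)  # teacher predictions
print(soundnet_distillation_loss(audio_logits, vision_probs))
```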

90

Audio & Visual features from alignment

Arandjelović, Relja, and Andrew Zisserman. "Look, Listen and Learn." ICCV 2017.

Audio and visual features learned by assessing alignment.

91
Chen, L., S. Srivastava, Z. Duan, and C. Xu. "Deep Cross-Modal Audio-Visual Generation." ACM International Conference on Multimedia Thematic Workshops, 2017.

Audio & Visual features from alignment
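A sketch of the correspondence task: embed a frame and an audio clip, then classify whether they were drawn from the same moment of the same video. The encoders below are placeholders, not the paper's convolutional towers:

```python
import torch
import torch.nn as nn

class AVAlignmentNet(nn.Module):
    """Audio-visual correspondence sketch in the spirit of 'Look, Listen
    and Learn': aligned pairs are positives, mismatched pairs negatives."""
    def __init__(self, dim=128):
        super().__init__()
        self.visual = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim), nn.ReLU())
        self.audio = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim), nn.ReLU())
        self.classify = nn.Linear(2 * dim, 2)  # aligned vs. misaligned

    def forward(self, frame, spectrogram):
        v, a = self.visual(frame), self.audio(spectrogram)
        return self.classify(torch.cat([v, a], dim=1))

# Negatives come for free: pair a frame with audio from a different video.
net = AVAlignmentNet()
logits = net(torch.randn(2, 3, 32, 32), torch.randn(2, 1, 64, 64))
```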

92
Ephrat, Ariel, and Shmuel Peleg. "Vid2speech: Speech Reconstruction from Silent Video." ICASSP 2017.

93
Ephrat, Ariel, and Shmuel Peleg. "Vid2speech: Speech Reconstruction from Silent Video." ICASSP 2017.

Video to Speech Representations

[Pipeline: a frame from a silent video is encoded by a CNN (VGG) into an audio feature, from which speech is generated by post-hoc synthesis.]

94
Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You said that?" BMVC 2017.

95
Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You said that?" BMVC 2017.

Speech to Video Synthesis (mouth)

96
Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

97
Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

Speech to Video Synthesis (pose & emotion)

98
Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

99
Suwajanakorn, Supasorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. "Synthesizing Obama: learning lip sync from audio." SIGGRAPH 2017.

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning


Example from Kevin McGuinness Juyter notebook 20

How clustering is valueble for linear classifiers

21

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

21

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

22

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62

Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. “Unsupervised Learning of Video Representations using LSTMs.” ICML 2015. [GitHub]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data).

Frame Reconstruction & Prediction

Frame Prediction

63

Ranzato, Marc'Aurelio, Arthur Szlam, Joan Bruna, Michael Mathieu, Ronan Collobert, and Sumit Chopra. “Video (language) modeling: a baseline for generative models of natural videos.” arXiv preprint arXiv:1412.6604 (2014).

64

Mathieu, Michael, Camille Couprie, and Yann LeCun. “Deep multi-scale video prediction beyond mean square error.” ICLR 2016. [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65

Mathieu, Michael, Camille Couprie, and Yann LeCun. “Deep multi-scale video prediction beyond mean square error.” ICLR 2016. [project] [code]

The blurry predictions from MSE (ℓ1) are improved with a multi-scale architecture, adversarial training, and an image gradient difference loss (GDL).

Frame Prediction

67

Mathieu, Michael, Camille Couprie, and Yann LeCun. “Deep multi-scale video prediction beyond mean square error.” ICLR 2016. [project] [code]

Frame Prediction

68

Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. “Generating videos with scene dynamics.” NIPS 2016.

Frame Prediction

70

Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. “Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks.” NIPS 2016. [video]

71

Frame Prediction

Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. “Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks.” NIPS 2016. [video]

Given an input image, probabilistic generation of future frames with a Variational AutoEncoder (VAE).

72

Frame Prediction

Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. “Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks.” NIPS 2016. [video]

Encodes the image as feature maps and the motion as cross-convolutional kernels.

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74

Le, Quoc V., Will Y. Zou, Serena Y. Yeung, and Andrew Y. Ng. “Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis.” CVPR 2011.

Temporal Weak Labels

75

Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun. “Unsupervised learning of spatiotemporally coherent metrics.” ICCV 2015.

Assumption: adjacent video frames contain semantically similar information. Autoencoder trained with regularization by slowness and sparsity.

76

Jayaraman, Dinesh, and Kristen Grauman. “Slow and steady feature analysis: higher order temporal coherence in video.” CVPR 2016. [video]

Temporal Weak Labels

78

(Slides by Xunyu Lin) Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. “Shuffle and learn: unsupervised learning using temporal order verification.” ECCV 2016. [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79

(Slides by Xunyu Lin) Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. “Shuffle and learn: unsupervised learning using temporal order verification.” ECCV 2016. [code]

Take temporal order as the supervisory signal for learning:

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80

Fernando, Basura, Hakan Bilen, Efstratios Gavves, and Stephen Gould. “Self-supervised video representation learning with odd-one-out networks.” CVPR 2017.

Temporal Weak Labels

Train a network to detect which of the video sequences contains frames in the wrong order.

81

Spatio-Temporal Weak Labels

Lin, X., Campos, V., Giró-i-Nieto, X., Torres, J., and Canton-Ferrer, C. “Disentangling Motion, Foreground and Background Features in Videos.” CVPR 2017 Workshop Brave New Motion Representations.

[Architecture figure: a C3D encoder with a kernel decoder disentangles foreground/motion and background features; shared-weight Fg and Bg decoders reconstruct the foreground in the first and last frames and the background in the first frame, with a uNLC mask blocking gradients.]

82

Spatio-Temporal Weak Labels

Pathak, Deepak, Ross Girshick, Piotr Dollár, Trevor Darrell, and Bharath Hariharan. “Learning features by watching objects move.” CVPR 2017.

83

Greff, Klaus, Antti Rasmus, Mathias Berglund, Tele Hao, Harri Valpola, and Juergen Schmidhuber. “Tagger: Deep unsupervised perceptual grouping.” NIPS 2016. [video] [code]

Spatio-Temporal Weak Labels

84

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. “SoundNet: Learning sound representations from unlabeled video.” NIPS 2016.

85

Audio Features from Visual weak labels

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. “SoundNet: Learning sound representations from unlabeled video.” NIPS 2016.

Object & scene recognition in videos by analysing the audio track (only).

86

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. “SoundNet: Learning sound representations from unlabeled video.” NIPS 2016.

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated with the sounds that activate some of the last hidden units of SoundNet (conv7).

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. “SoundNet: Learning sound representations from unlabeled video.” NIPS 2016.

Audio Features from Visual weak labels

90

Audio & Visual Features from Alignment

Arandjelović, Relja, and Andrew Zisserman. “Look, Listen and Learn.” ICCV 2017.

Audio and visual features learned by assessing alignment.

91

Chen, L., S. Srivastava, Z. Duan, and C. Xu. “Deep Cross-Modal Audio-Visual Generation.” ACM International Conference on Multimedia Thematic Workshops 2017.

Audio & Visual Features from Alignment

92

Ephrat, Ariel, and Shmuel Peleg. “Vid2speech: Speech Reconstruction from Silent Video.” ICASSP 2017.

93

Ephrat, Ariel, and Shmuel Peleg. “Vid2speech: Speech Reconstruction from Silent Video.” ICASSP 2017.

Video to Speech Representations: a CNN (VGG) maps each frame from a silent video to an audio feature; the waveform is then generated by post-hoc synthesis.

94

Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. “You said that?” BMVC 2017.

95

Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. “You said that?” BMVC 2017.

Speech to Video Synthesis (mouth)

96

Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. “Audio-driven facial animation by joint end-to-end learning of pose and emotion.” SIGGRAPH 2017.

97

Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. “Audio-driven facial animation by joint end-to-end learning of pose and emotion.” SIGGRAPH 2017.

Speech to Video Synthesis (pose & emotion)

98

Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. “Audio-driven facial animation by joint end-to-end learning of pose and emotion.” SIGGRAPH 2017.

99

Suwajanakorn, Supasorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. “Synthesizing Obama: learning lip sync from audio.” SIGGRAPH 2017.

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

23

Assumptions for unsupervised learning

Slide credit Kevin McGuinness (DLCV UPC 2017)

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

24Slide credit Kevin McGuinness (DLCV UPC 2017)

The manifold hypothesis

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

25

The manifold hypothesis

Slide credit Kevin McGuinness (DLCV UPC 2017)

x1

x2

Linear manifold

wTx + b

x1

x2

Non-linear manifold

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

26

The manifold hypothesis Energy-based models

LeCun et al A Tutorial on Energy-Based Learning Predicting Structured Data 2006 httpyannlecuncomexdbpublispdflecun-06pdf

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61
Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." ICML 2015. [GitHub]

Unsupervised feature learning (no labels) for frame prediction

62
Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." ICML 2015. [GitHub]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction & Prediction
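A minimal seq2seq sketch of this idea in PyTorch (flattened frames, toy sizes; the paper's composite model also reconstructs the input sequence): an encoder LSTM summarizes the observed clip and a decoder LSTM unrolls future frames, with the future frames themselves as the only supervision.

```python
import torch, torch.nn as nn

class FuturePredictor(nn.Module):
    """Minimal LSTM encoder-decoder for future frame prediction (flattened frames)."""
    def __init__(self, frame_dim=64*64, hidden=256):
        super().__init__()
        self.encoder = nn.LSTM(frame_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(frame_dim, hidden, batch_first=True)
        self.readout = nn.Linear(hidden, frame_dim)

    def forward(self, past, n_future):
        _, state = self.encoder(past)            # compress the observed clip
        frame = past[:, -1:]                     # seed with the last seen frame
        outputs = []
        for _ in range(n_future):                # unroll predictions
            out, state = self.decoder(frame, state)
            frame = torch.sigmoid(self.readout(out))
            outputs.append(frame)
        return torch.cat(outputs, dim=1)

clip = torch.rand(8, 10, 64*64)                  # 8 clips, 10 past frames (toy)
target = torch.rand(8, 5, 64*64)                 # 5 future frames (toy data)
model = FuturePredictor()
pred = model(clip, n_future=5)
loss = nn.functional.mse_loss(pred, target)      # no labels: future frames supervise
loss.backward()
```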

Frame Prediction

63
Ranzato, Marc'Aurelio, Arthur Szlam, Joan Bruna, Michael Mathieu, Ronan Collobert, and Sumit Chopra. "Video (language) modeling: a baseline for generative models of natural videos." arXiv preprint arXiv:1412.6604 (2014).

64
Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016. [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65
Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016. [project] [code]

The blurry predictions from MSE are improved with a multi-scale architecture, adversarial learning, and an image gradient difference loss function.

Frame Prediction

66
Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016. [project] [code]

The blurry predictions from MSE (l1) are improved with a multi-scale architecture, adversarial training, and an image gradient difference loss (GDL) function.
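A hedged PyTorch sketch of the gradient difference loss term combined with a pixel loss, in the spirit of the paper (the adversarial term is omitted; shapes and weighting are toy assumptions):

```python
import torch

def gradient_difference_loss(pred, target, alpha=1):
    """Image gradient difference loss (GDL): penalize mismatched local
    gradients so predictions keep sharp edges instead of blurring."""
    # horizontal and vertical finite differences, tensors of shape (B, C, H, W)
    dx_p = (pred[..., :, 1:] - pred[..., :, :-1]).abs()
    dx_t = (target[..., :, 1:] - target[..., :, :-1]).abs()
    dy_p = (pred[..., 1:, :] - pred[..., :-1, :]).abs()
    dy_t = (target[..., 1:, :] - target[..., :-1, :]).abs()
    return ((dx_p - dx_t).abs() ** alpha).mean() + ((dy_p - dy_t).abs() ** alpha).mean()

pred = torch.rand(4, 3, 32, 32, requires_grad=True)
target = torch.rand(4, 3, 32, 32)
# combined objective: pixel loss + GDL (an adversarial term would be added
# on top in the full method)
loss = torch.nn.functional.l1_loss(pred, target) + gradient_difference_loss(pred, target)
loss.backward()
```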

Frame Prediction

67
Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016. [project] [code]

Frame Prediction

68
Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. "Generating videos with scene dynamics." NIPS 2016.

Frame Prediction

69
Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. "Generating videos with scene dynamics." NIPS 2016.

Frame Prediction

70
Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks." NIPS 2016. [video]

71

Frame Prediction

Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks." NIPS 2016. [video]

Given an input image, probabilistic generation of future frames with a Variational AutoEncoder (VAE).

72

Frame Prediction

Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks." NIPS 2016. [video]

Encodes the image as feature maps and the motion as cross-convolutional kernels.
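A sketch of the cross-convolution operation itself, with random stand-ins for the two encodings (in the real model both the feature maps and the per-sample kernels are predicted by networks; shapes are assumptions):

```python
import torch
import torch.nn.functional as F

# Cross convolution: the image branch produces feature maps, the motion branch
# produces a *different kernel per sample*, and each sample's kernels are
# convolved over its own feature maps
B, C, H, W, K = 2, 8, 16, 16, 5
feature_maps = torch.randn(B, C, H, W)          # image encoding (stand-in)
kernels = torch.randn(B, C, K, K)               # motion encoding, one kernel per map

# grouped-conv trick: fold the batch into channels so each (sample, channel)
# pair is convolved with its own kernel
x = feature_maps.reshape(1, B * C, H, W)
w = kernels.reshape(B * C, 1, K, K)
out = F.conv2d(x, w, padding=K // 2, groups=B * C).reshape(B, C, H, W)
print(out.shape)  # torch.Size([2, 8, 16, 16])
```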

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74
Le, Quoc V., Will Y. Zou, Serena Y. Yeung, and Andrew Y. Ng. "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis." CVPR 2011.

Temporal Weak Labels

75
Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun. "Unsupervised learning of spatiotemporally coherent metrics." ICCV 2015.

Assumption: adjacent video frames contain semantically similar information. Autoencoder trained with slowness and sparsity regularizations.
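A toy PyTorch sketch of such a regularized autoencoder objective on a pair of adjacent frames; the architecture and the weighting coefficients are assumptions, not the paper's exact model:

```python
import torch, torch.nn as nn

enc = nn.Sequential(nn.Flatten(), nn.Linear(32*32, 128))   # toy encoder
dec = nn.Linear(128, 32*32)                                # toy decoder

frame_t  = torch.rand(16, 1, 32, 32)                       # adjacent frames (toy)
frame_t1 = torch.rand(16, 1, 32, 32)

z_t, z_t1 = enc(frame_t), enc(frame_t1)
recon = dec(z_t)

loss = (
    nn.functional.mse_loss(recon, frame_t.flatten(1))      # reconstruction
    + 0.1 * (z_t - z_t1).abs().mean()    # slowness: adjacent frames -> similar codes
    + 0.01 * z_t.abs().mean()            # sparsity on the code
)
loss.backward()
```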

76
Jayaraman, Dinesh, and Kristen Grauman. "Slow and steady feature analysis: higher order temporal coherence in video." CVPR 2016. [video]

Temporal Weak Labels

77
Jayaraman, Dinesh, and Kristen Grauman. "Slow and steady feature analysis: higher order temporal coherence in video." CVPR 2016. [video]

Temporal Weak Labels

78
(Slides by Xunyu Lin) Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. [code]

Temporal order of frames is exploited as the supervisory signal for learning.

Temporal Weak Labels

79
(Slides by Xunyu Lin) Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. [code]

Take temporal order as the supervisory signal for learning (see the sketch after the labels below).

Shuffled sequences

Binary classification

In order

Not in order
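A minimal sketch of how these free labels are generated; `make_order_verification_pairs` is a hypothetical helper, and the paper additionally selects triplets using motion statistics:

```python
import random

def make_order_verification_pairs(video, n=4):
    """Sample frame triplets from a video (a list of frames) and label them:
    1 if temporally ordered, 0 if shuffled (the free supervisory signal)."""
    examples = []
    for _ in range(n):
        i = random.randrange(len(video) - 2)
        a, b, c = video[i], video[i + 1], video[i + 2]
        if random.random() < 0.5:
            examples.append(((a, b, c), 1))          # in order
        else:
            examples.append(((b, a, c), 0))          # not in order
    return examples

video = list(range(30))                              # stand-in for 30 frames
for triplet, label in make_order_verification_pairs(video):
    print(triplet, "in order" if label else "not in order")
# A shared-weight CNN embeds each frame; the concatenated embeddings feed a
# binary classifier trained on these labels.
```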

Temporal Weak Labels

80
Fernando, Basura, Hakan Bilen, Efstratios Gavves, and Stephen Gould. "Self-supervised video representation learning with odd-one-out networks." CVPR 2017.

Temporal Weak Labels
Train a network to detect which of the video sequences contains frames in the wrong order.

81

Spatio-Temporal Weak Labels

X. Lin, V. Campos, X. Giró-i-Nieto, J. Torres, and C. Canton-Ferrer, "Disentangling Motion, Foreground and Background Features in Videos," in CVPR 2017 Workshop Brave New Motion Representations.

[Figure: a C3D encoder disentangles the clip into foreground/motion and background features; weight-sharing decoders (kernel dec, Fg Dec, Bg Dec) reconstruct the foreground in the first and last frames and the background in the first frame, with uNLC masks used to block gradients outside each region.]

82

Spatio-Temporal Weak Labels

Pathak, Deepak, Ross Girshick, Piotr Dollár, Trevor Darrell, and Bharath Hariharan. "Learning features by watching objects move." CVPR 2017.

83
Greff, Klaus, Antti Rasmus, Mathias Berglund, Tele Hao, Harri Valpola, and Juergen Schmidhuber. "Tagger: Deep unsupervised perceptual grouping." NIPS 2016. [video] [code]

Spatio-Temporal Weak Labels

84
Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

85

Audio Features from Visual weak labels

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

Object & scene recognition in videos by analysing the audio track (only).
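A teacher-student sketch in the spirit of SoundNet, with stand-in tensors for the vision network's predictions; the real system uses a much deeper 1D audio CNN and pretrained ImageNet/Places teachers, so everything here is a toy assumption:

```python
import torch, torch.nn as nn

# A pretrained *vision* network labels video frames with object/scene
# distributions, and a 1D audio ConvNet is trained to match them with a
# KL-divergence loss over the corresponding soundtrack.
audio_net = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, 1000),
)

waveform = torch.randn(4, 1, 22050)                   # ~1 s of raw audio, 4 clips
with torch.no_grad():
    # stand-in for the vision CNN's class probabilities on the paired frames
    teacher_probs = torch.softmax(torch.randn(4, 1000), dim=1)

student_logp = torch.log_softmax(audio_net(waveform), dim=1)
loss = nn.functional.kl_div(student_logp, teacher_probs, reduction="batchmean")
loss.backward()   # the audio net learns semantics without any manual labels
```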

86
Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87
Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88
Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated with the sounds that activate some of the last hidden units of SoundNet (conv7).

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

Audio Features from Visual weak labels

90

Audio & Visual features from alignment

90
Arandjelović, Relja, and Andrew Zisserman. "Look, Listen and Learn." ICCV 2017.

Audio and visual features learned by assessing their alignment.
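A two-stream sketch of alignment-based training, with toy encoders and random stand-in data; in practice the negatives are audio clips taken from a different video or a different time:

```python
import torch, torch.nn as nn

# Embed an image and a ~1 s audio clip, then classify whether they come from
# the same moment of the same video (toy shapes throughout)
vision = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
                       nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 64))
audio = nn.Sequential(nn.Conv1d(1, 8, 64, stride=8), nn.ReLU(),
                      nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(8, 64))
head = nn.Linear(128, 2)                                  # aligned / not aligned

frames = torch.randn(8, 3, 64, 64)
clips = torch.randn(8, 1, 22050)
labels = torch.randint(0, 2, (8,))    # 0 = mismatched pair, 1 = aligned pair

logits = head(torch.cat([vision(frames), audio(clips)], dim=1))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()   # both encoders learn semantic features from alignment alone
```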

91
L. Chen, S. Srivastava, Z. Duan, and C. Xu. "Deep Cross-Modal Audio-Visual Generation." ACM International Conference on Multimedia Thematic Workshops, 2017.

Audio & Visual features from alignment

92
Ephrat, Ariel, and Shmuel Peleg. "Vid2speech: Speech Reconstruction from Silent Video." ICASSP 2017.

93
Ephrat, Ariel, and Shmuel Peleg. "Vid2speech: Speech Reconstruction from Silent Video." ICASSP 2017.

Video to Speech Representations

[Figure: a frame from a silent video is fed to a CNN (VGG), which regresses an audio feature; speech is then generated by post-hoc synthesis.]
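A regression sketch of this pipeline under assumed shapes; the feature dimensionality, architecture, and data here are placeholders, with the predicted acoustic features (e.g., spectral coefficients) handed to a separate synthesizer at test time:

```python
import torch, torch.nn as nn

# CNN maps a frame from a silent talking-head video to an acoustic feature
# vector; a vocoder synthesizes the waveform post hoc (not shown)
cnn = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 8),                     # 8 acoustic coefficients per frame (toy)
)

frames = torch.randn(16, 3, 128, 128)     # frames from the silent video (toy)
audio_feats = torch.randn(16, 8)          # ground-truth features from the soundtrack
loss = nn.functional.mse_loss(cnn(frames), audio_feats)
loss.backward()
# at test time, the predicted features go to a post-hoc speech synthesizer
```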

94
Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You said that?" BMVC 2017.

95
Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You said that?" BMVC 2017.

Speech to Video Synthesis (mouth)

96
Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

97
Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

Speech to Video Synthesis (pose amp emotion)

98
Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

99
Suwajanakorn, Supasorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. "Synthesizing Obama: learning lip sync from audio." SIGGRAPH 2017.

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

27

Autoencoder (AE)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

28

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Autoencoders

Predict at the output the same input data

Do not need labels

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

29

Autoencoder (AE)

ldquoDeep Learning Tutorialrdquo Dept Computer Science Stanford

Dimensionality reduction

Use hidden layer as a feature extractor of any desired size

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

30

Autoencoder (AE)

EncoderW1

DecoderW2

hdata reconstruction

Loss(reconstruction error)

Latent variables (representationfeatures)

Figure Kevin McGuinness (DLCV UPC 2017)

Pretraining1 Initialize a NN solving an

autoencoding problem

31

Autoencoder (AE)

Figure Kevin McGuinness (DLCV UPC 2017)

EncoderW1

hdata ClassifierWC

Latent variables (representationfeatures)

prediction

y Loss(cross entropy)

Pretraining1 Initialize a NN solving an

autoencoding problem2 Train for final task with ldquofewrdquo labels

32

Autoencoder (AE)

Deep Learning TV ldquoAutoencoders - Ep 10rdquo

33F Van Veen ldquoThe Neural Network Zoordquo (2016)

Restricted Boltzmann Machine (RBM)

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67
Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016. [project] [code]

Frame Prediction

68
Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. "Generating videos with scene dynamics." NIPS 2016.

Frame Prediction

69
Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. "Generating videos with scene dynamics." NIPS 2016.

Frame Prediction

70
Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks." NIPS 2016. [video]

71

Frame Prediction

Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks." NIPS 2016. [video]

Given an input image, probabilistic generation of future frames with a Variational AutoEncoder (VAE).

72

Frame Prediction

Xue, Tianfan, Jiajun Wu, Katherine Bouman, and Bill Freeman. "Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks." NIPS 2016. [video]

Encodes the image as feature maps and the motion as cross-convolutional kernels; see the sketch below.
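A minimal sketch of the cross-convolution step, assuming the image encoder and the kernel decoder have already produced their outputs; names and shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def cross_convolve(feature_maps, kernels):
    # feature_maps: (batch, channels, H, W) from the image encoder
    # kernels:      (batch, channels, k, k) decoded from the motion code z
    batch, channels, H, W = feature_maps.shape
    k = kernels.size(-1)
    # Fold the batch into the channel axis so each sample is convolved
    # with its own kernels via a grouped convolution.
    x = feature_maps.reshape(1, batch * channels, H, W)
    w = kernels.reshape(batch * channels, 1, k, k)
    out = F.conv2d(x, w, padding=k // 2, groups=batch * channels)
    return out.reshape(batch, channels, H, W)

feats = torch.randn(4, 32, 64, 64)           # image encoder output
motion_kernels = torch.randn(4, 32, 5, 5)    # sampled z -> kernel decoder
warped = cross_convolve(feats, motion_kernels)  # input to the frame decoder
```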

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74
Le, Quoc V., Will Y. Zou, Serena Y. Yeung, and Andrew Y. Ng. "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis." CVPR 2011.

Temporal Weak Labels

75
Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun. "Unsupervised learning of spatiotemporally coherent metrics." ICCV 2015.

Assumption: adjacent video frames contain semantically similar information. An autoencoder is trained with slowness and sparsity regularizations; a sketch of such a loss follows below.
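A sketch of such a regularized autoencoder objective; `enc`, `dec`, and the penalty weights are illustrative assumptions, not the paper's exact loss.

```python
import torch.nn.functional as F

def coherence_loss(enc, dec, frame_t, frame_t1, alpha=0.1, beta=0.01):
    # Reconstruction plus slowness (codes of adjacent frames stay close)
    # and sparsity (L1) penalties on the latent code.
    z_t, z_t1 = enc(frame_t), enc(frame_t1)
    recon = F.mse_loss(dec(z_t), frame_t) + F.mse_loss(dec(z_t1), frame_t1)
    slowness = (z_t - z_t1).abs().mean()
    sparsity = z_t.abs().mean() + z_t1.abs().mean()
    return recon + alpha * slowness + beta * sparsity
```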

76
Jayaraman, Dinesh, and Kristen Grauman. "Slow and steady feature analysis: higher order temporal coherence in video." CVPR 2016. [video]

Temporal Weak Labels

77
Jayaraman, Dinesh, and Kristen Grauman. "Slow and steady feature analysis: higher order temporal coherence in video." CVPR 2016. [video]

Temporal Weak Labels

78
(Slides by Xunyu Lin) Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. [code]

The temporal order of frames is exploited as the supervisory signal for learning.

Temporal Weak Labels

79
(Slides by Xunyu Lin) Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. [code]

Temporal order is taken as the supervisory signal for learning: a binary classifier verifies whether a shuffled sequence is in order or not in order; see the sketch below.
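A sketch of the pretext task, assuming a generic per-frame CNN backbone; the tuple sampling here is simplified (the paper mines harder negatives around high-motion moments).

```python
import random
import torch
import torch.nn as nn

def sample_triplet(video, shuffled):
    # video: (time, C, H, W). Pick three frames; keep or break their order.
    idx = sorted(random.sample(range(video.size(0)), 3))
    if shuffled:
        random.shuffle(idx)   # simplified negative sampling for this sketch
    return video[idx], torch.tensor(float(not shuffled))

class OrderVerifier(nn.Module):
    # Shared per-frame CNN plus a fusion head producing an
    # in-order / not-in-order logit.
    def __init__(self, backbone, feat_dim=512):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(3 * feat_dim, 1)

    def forward(self, frames):    # frames: (batch, 3, C, H, W)
        feats = [self.backbone(frames[:, i]) for i in range(3)]
        return self.head(torch.cat(feats, dim=1)).squeeze(1)

# loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
```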

Temporal Weak Labels

80
Fernando, Basura, Hakan Bilen, Efstratios Gavves, and Stephen Gould. "Self-supervised video representation learning with odd-one-out networks." CVPR 2017.

Temporal Weak Labels
Train a network to detect which of the video sequences contains frames in the wrong order.

81

Spatio-Temporal Weak Labels

X. Lin, V. Campos, X. Giró-i-Nieto, J. Torres, and C. Canton-Ferrer. "Disentangling Motion, Foreground and Background Features in Videos." CVPR 2017 Workshop Brave New Motion Representations.

[Architecture figure: a C3D encoder splits the clip representation into foreground/motion, first-frame foreground, and background features; weight-sharing foreground decoders (Fg Dec) reconstruct the foreground in the last and first frames, a background decoder (Bg Dec) reconstructs the background in the first frame, motion is decoded through kernels (kernel dec), and a uNLC mask blocks gradients outside each region.]

82

Spatio-Temporal Weak Labels

Pathak, Deepak, Ross Girshick, Piotr Dollár, Trevor Darrell, and Bharath Hariharan. "Learning features by watching objects move." CVPR 2017.

83
Greff, Klaus, Antti Rasmus, Mathias Berglund, Tele Hao, Harri Valpola, and Juergen Schmidhuber. "Tagger: Deep unsupervised perceptual grouping." NIPS 2016. [video] [code]

Spatio-Temporal Weak Labels

84
Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

85

Audio Features from Visual weak labels

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

Object & scene recognition in videos by analyzing only the audio track; a SoundNet-style training sketch follows below.
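A sketch of the SoundNet-style teacher-student objective: a pretrained visual network's class posteriors supervise a 1D ConvNet on raw waveforms through a KL-divergence loss. `vision_net` and `audio_net` are placeholder names.

```python
import torch.nn.functional as F

def soundnet_loss(audio_logits, teacher_probs):
    # audio_logits:  (batch, n_classes) from the 1D ConvNet on raw waveforms
    # teacher_probs: (batch, n_classes) softmax of a pretrained vision net
    return F.kl_div(F.log_softmax(audio_logits, dim=1),
                    teacher_probs, reduction='batchmean')

# Per batch of unlabeled videos (names illustrative):
# teacher_probs = vision_net(frames).softmax(dim=1).detach()
# loss = soundnet_loss(audio_net(waveforms), teacher_probs)
```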

86
Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

Visualization of the 1D filters over raw audio in conv1.

Audio Features from Visual weak labels

87
Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

Visualization of the 1D filters over raw audio in conv1.

Audio Features from Visual weak labels

88
Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

Visualization of the 1D filters over raw audio in conv1.

Audio Features from Visual weak labels

89

Visualization of the video frames associated with the sounds that activate some of the last hidden units of SoundNet (conv7).

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

Audio Features from Visual weak labels

90

Audio & Visual features from alignment

Arandjelović, Relja, and Andrew Zisserman. "Look, Listen and Learn." ICCV 2017.

Audio and visual features learned by assessing their alignment; see the sketch below.
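A sketch of the audio-visual correspondence task in the spirit of Look, Listen and Learn; the two backbones and the fusion-head sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AVCorrespondence(nn.Module):
    # Two-stream network that scores whether a video frame and an audio
    # snippet come from the same moment of the same video.
    def __init__(self, vision_net, audio_net, dim=512):
        super().__init__()
        self.vision_net, self.audio_net = vision_net, audio_net
        self.head = nn.Sequential(nn.Linear(2 * dim, 128), nn.ReLU(),
                                  nn.Linear(128, 1))

    def forward(self, frame, audio):
        z = torch.cat([self.vision_net(frame), self.audio_net(audio)], dim=1)
        return self.head(z).squeeze(1)   # alignment logit

# Positives: frame and audio drawn from the same clip at the same time.
# Negatives: the audio is swapped in from a different video.
```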

91
L. Chen, S. Srivastava, Z. Duan, and C. Xu. "Deep Cross-Modal Audio-Visual Generation." ACM International Conference on Multimedia Thematic Workshops, 2017.

Audio & Visual features from alignment

92
Ephrat, Ariel, and Shmuel Peleg. "Vid2speech: Speech Reconstruction from Silent Video." ICASSP 2017.

93
Ephrat, Ariel, and Shmuel Peleg. "Vid2speech: Speech Reconstruction from Silent Video." ICASSP 2017.

Video to Speech Representations

Pipeline: frame from a silent video → CNN (VGG) → audio feature → post-hoc speech synthesis.

94
Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You said that?" BMVC 2017.

95
Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You said that?" BMVC 2017.

Speech to Video Synthesis (mouth)

96
Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

97
Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

Speech to Video Synthesis (pose & emotion)

98
Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

99
Suwajanakorn, Supasorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. "Synthesizing Obama: learning lip sync from audio." SIGGRAPH 2017.

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

34

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

Shallow two-layer net

Restricted = No two nodes in a layer share a connection

Bipartite graph

Bidirectional graph Shared weights Different biases

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignment
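A sketch of this correspondence task, assuming vision_net and audio_net are embedding networks that output (B, dim) vectors; positives pair a frame with its own audio, negatives pair it with audio from a different video, and the head is trained with cross-entropy:

import torch
import torch.nn as nn

class AlignmentNet(nn.Module):
    def __init__(self, vision_net, audio_net, dim=128):
        super().__init__()
        self.vision_net, self.audio_net = vision_net, audio_net
        # Fusion head: decide whether the frame and the audio co-occur.
        self.head = nn.Sequential(nn.Linear(2 * dim, 128), nn.ReLU(),
                                  nn.Linear(128, 2))

    def forward(self, frame, audio):
        v = self.vision_net(frame)                  # (B, dim) visual embedding
        a = self.audio_net(audio)                   # (B, dim) audio embedding
        return self.head(torch.cat([v, a], dim=1))  # aligned vs misaligned logits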

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio & Visual features from alignment

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

[Pipeline figure: a frame from a silent video is encoded by a CNN (VGG) into an audio feature; speech is then generated by post-hoc synthesis.]
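A hedged sketch of the regression step only (the vocoder that turns predicted features back into a waveform is a separate, post-hoc stage); backbone stands in for a VGG-like trunk producing (B, 512) features, and feat_dim for the paper's acoustic feature size:

import torch.nn as nn

class FrameToAudioFeature(nn.Module):
    def __init__(self, backbone, feat_dim=26):
        super().__init__()
        self.backbone = backbone                   # assumed VGG trunk -> (B, 512)
        self.regressor = nn.Linear(512, feat_dim)  # one acoustic feature vector
                                                   # per silent-video frame

    def forward(self, frame):
        return self.regressor(self.backbone(frame))

# Trained with nn.MSELoss() against features extracted from the true audio track.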

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that? BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that? BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

35

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

36

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

37

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Forward pass

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

38

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

39

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption: adjacent video frames contain semantically similar information. An autoencoder is trained with slowness and sparsity regularizations.
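A generic form of such an objective, written in LaTeX (an illustration of the idea, not necessarily the paper's exact formulation): reconstruction plus an ℓ1 slowness penalty tying the codes z_t of adjacent frames together, plus an ℓ1 sparsity penalty:

\mathcal{L} = \sum_t \|x_t - \hat{x}_t\|_2^2 + \lambda_{slow} \sum_t \|z_t - z_{t+1}\|_1 + \lambda_{sparse} \sum_t \|z_t\|_1

Temporal adjacency thus acts as a free, weak supervisory signal: no human labels are needed.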

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. Shuffle and learn: unsupervised learning using temporal order verification. ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. Shuffle and learn: unsupervised learning using temporal order verification. ECCV 2016 [code]

Take temporal order as the supervisory signal for learning

Shuffled sequences

Binary classification

In order

Not in order
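A toy sketch of how such training tuples could be sampled (hypothetical helper; the paper additionally selects frames using motion statistics and draws negatives more carefully):

import random

def order_verification_samples(frames, num_samples=4):
    # frames: temporally ordered frames of one video (indices or arrays).
    # Returns ((f1, f2, f3), label) tuples: 1 = in order, 0 = not in order.
    samples = []
    for _ in range(num_samples):
        i = random.randrange(0, len(frames) - 2)
        a, b, c = frames[i], frames[i + 1], frames[i + 2]
        samples.append(((a, b, c), 1))   # correct temporal order
        samples.append(((b, a, c), 0))   # shuffled: first two frames swapped
    return samples

samples = order_verification_samples(list(range(30)))

A triple-stream ConvNet then classifies each tuple as ordered or shuffled, and the features it learns transfer to recognition tasks.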

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak Labels

Train a network to detect which of the video sequences contains frames in the wrong order

81

Spatio-Temporal Weak Labels

Lin, X., Campos, V., Giró-i-Nieto, X., Torres, J., and Canton-Ferrer, C., "Disentangling Motion, Foreground and Background Features in Videos", in CVPR 2017 Workshop Brave New Motion Representations

[Architecture figure: a C3D encoder produces disentangled foreground motion, first-frame foreground, and background features; two foreground decoders (Fg Dec, with shared kernel weights) and a background decoder (Bg Dec) reconstruct the foreground in the last frame, the foreground in the first frame, and the background in the first frame, while a uNLC segmentation mask blocks gradients.]

82

Spatio-Temporal Weak Labels

Pathak, Deepak, Ross Girshick, Piotr Dollár, Trevor Darrell, and Bharath Hariharan. Learning features by watching objects move. CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object & scene recognition in videos by analysing only the audio track
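SoundNet is trained by teacher-student transfer: pretrained vision networks (ImageNet objects, Places scenes) label the video frames, and the audio network learns to match those class posteriors with a KL-divergence loss. A minimal sketch of that loss (assuming PyTorch; names are illustrative, not the authors' code):

import torch.nn.functional as F

def soundnet_transfer_loss(audio_logits, vision_probs):
    # audio_logits: student (audio CNN) outputs for a batch of clips.
    # vision_probs: teacher class posteriors from a pretrained vision CNN
    # applied to the corresponding video frames.
    return F.kl_div(F.log_softmax(audio_logits, dim=1), vision_probs,
                    reduction="batchmean")

The video frames act as weak labels, so millions of unlabeled videos can supervise the audio network for free.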

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated with the sounds that activate some of the last hidden units of SoundNet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio & Visual features from alignment

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignment
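The pretext task can be sketched as binary audio-visual correspondence: does this audio clip go with this frame? A hedged PyTorch sketch (the two embedding networks are omitted; names and sizes are assumptions, not the authors' code):

import torch
import torch.nn as nn

class AVCorrespondence(nn.Module):
    # Fuses an image embedding v and an audio embedding a and predicts
    # whether they come from the same moment of the same video.
    def __init__(self, dim=512):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * dim, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, v, a):
        return self.head(torch.cat([v, a], dim=1))

# Positives: (frame, audio) from the same video at the same time;
# negatives: audio drawn from a different video. Training with
# cross-entropy makes both embedding networks learn semantic features
# as a side effect, with no manual labels.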

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio & Visual features from alignment

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

[Pipeline figure: frame from a silent video → CNN (VGG) → audio feature → post-hoc speech synthesis]

94Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. You said that? BMVC 2017

95Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. You said that? BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

40

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

Backwardpass

The reconstructed values at the visible layer are compared with the actual ones with the KL Divergence

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

41

Restricted Boltzmann Machine (RBM)

Figure Geoffrey Hinton (2013)

Salakhutdinov Ruslan Andriy Mnih and Geoffrey Hinton Restricted Boltzmann machines for collaborative filtering Proceedings of the 24th international conference on Machine learning ACM 2007

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

42

Restricted Boltzmann Machine (RBM)

DeepLearning4j ldquoA Beginnerrsquos Tutorial for Restricted Boltzmann Machinesrdquo

RBMs are a specific type of autoencoder

Unsupervisedlearning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

43

Restricted Boltzmann Machine (RBM)

Deep Learning TV ldquoRestricted Boltzmann Machinesrdquo

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

44

Deep Belief Networks (DBN)

F Van Veen ldquoThe Neural Network Zoordquo (2016)

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75. Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun. "Unsupervised Learning of Spatiotemporally Coherent Metrics." ICCV 2015.

Assumption: adjacent video frames contain semantically similar information. An autoencoder is trained with slowness and sparsity regularizers, as in the sketch below.
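That objective can be written down directly: reconstruct each frame, pull the codes of adjacent frames together (slowness), and keep the codes sparse. A minimal sketch with illustrative weights; the pooling and contrastive terms the paper also uses are omitted here.

```python
import torch

def slow_sparse_loss(recon_t, recon_t1, frames_t, frames_t1, z_t, z_t1,
                     lambda_slow=0.1, lambda_sparse=0.01):
    """Autoencoder loss with temporal-slowness and sparsity regularizers."""
    recon = (recon_t - frames_t).pow(2).mean() + \
            (recon_t1 - frames_t1).pow(2).mean()      # reconstruct both frames
    slowness = (z_t1 - z_t).abs().mean()              # adjacent codes stay close
    sparsity = z_t.abs().mean() + z_t1.abs().mean()   # L1 sparsity on the codes
    return recon + lambda_slow * slowness + lambda_sparse * sparsity
```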

76–77. Jayaraman, Dinesh, and Kristen Grauman. "Slow and Steady Feature Analysis: Higher Order Temporal Coherence in Video." CVPR 2016. [video]

Temporal Weak Labels

78–79. (Slides by Xunyu Lin.) Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification." ECCV 2016. [code]

The temporal order of frames is exploited as the supervisory signal: shuffled frame sequences are fed to a binary classifier that decides "in order" vs. "not in order", as sketched below.

Temporal Weak Labels
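The pretext task needs no annotation at all: positives are frame triplets kept in temporal order, negatives are shuffled ones, and a siamese CNN with a binary head tells them apart. A minimal sketch (`backbone` is any per-frame CNN returning a feature vector, and the naive negative sampling here is a simplification; the paper selects negatives more carefully using motion):

```python
import random
import torch
import torch.nn as nn

def make_triplet(video, in_order: bool):
    """Sample three frames from a clip; optionally break their temporal order."""
    i, j, k = sorted(random.sample(range(video.shape[0]), 3))
    frames = [video[i], video[j], video[k]]
    if not in_order:
        frames[0], frames[1] = frames[1], frames[0]   # shuffled negative
    return torch.stack(frames), torch.tensor(float(in_order))

class OrderVerifier(nn.Module):
    """Shared per-frame CNN + binary head: is the triplet in temporal order?"""

    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(3 * feat_dim, 1)

    def forward(self, triplet):                        # triplet: (B, 3, C, H, W)
        feats = [self.backbone(triplet[:, t]) for t in range(3)]
        return self.head(torch.cat(feats, dim=1))      # logit for "in order"
```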

80. Fernando, Basura, Hakan Bilen, Efstratios Gavves, and Stephen Gould. "Self-Supervised Video Representation Learning with Odd-One-Out Networks." CVPR 2017.

Temporal Weak Labels. Train a network to detect which of several video sequences contains frames in the wrong order.

81

Spatio-Temporal Weak Labels

Lin, X., V. Campos, X. Giró-i-Nieto, J. Torres, and C. Canton-Ferrer. "Disentangling Motion, Foreground and Background Features in Videos." CVPR 2017 Workshop, Brave New Motion Representations.

[Architecture diagram: a C3D encoder and a kernel decoder split the clip into foreground/motion, first-frame foreground, and background streams; Fg and Bg decoders with shared weights reconstruct the foreground in the first and last frames and the background in the first frame, with uNLC masks used to block the reconstruction gradients outside each region.]

82

Spatio-Temporal Weak Labels

Pathak, Deepak, Ross Girshick, Piotr Dollár, Trevor Darrell, and Bharath Hariharan. "Learning Features by Watching Objects Move." CVPR 2017.

83. Greff, Klaus, Antti Rasmus, Mathias Berglund, Tele Hao, Harri Valpola, and Juergen Schmidhuber. "Tagger: Deep Unsupervised Perceptual Grouping." NIPS 2016. [video] [code]

Spatio-Temporal Weak Labels

84. Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning Sound Representations from Unlabeled Video." NIPS 2016.

85

Audio Features from Visual Weak Labels

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning Sound Representations from Unlabeled Video." NIPS 2016.

Object & scene recognition in videos by analysing only the audio track: pretrained visual networks supply the (weak) labels, as sketched below.
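SoundNet's trick is teacher-student transfer: frozen, pretrained object and scene classifiers label the video frames, and a deep 1D ConvNet on the raw waveform is trained to match those posteriors. A minimal sketch of the loss, under the assumption that `audio_net` ends in two heads (one per teacher), as in the paper:

```python
import torch
import torch.nn.functional as F

def soundnet_loss(audio_net, object_teacher, scene_teacher, waveform, frames):
    """Train the audio net to match frozen vision teachers via KL divergence."""
    obj_logits, scn_logits = audio_net(waveform)        # two heads on the audio ConvNet
    with torch.no_grad():                               # teachers supply soft labels
        obj_target = F.softmax(object_teacher(frames), dim=1)
        scn_target = F.softmax(scene_teacher(frames), dim=1)

    def kl(logits, target):
        return F.kl_div(F.log_softmax(logits, dim=1), target,
                        reduction="batchmean")

    return kl(obj_logits, obj_target) + kl(scn_logits, scn_target)
```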

86–88. Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning Sound Representations from Unlabeled Video." NIPS 2016.

Visualization of the learned 1D filters over raw audio in conv1.

Audio Features from Visual Weak Labels

89

Visualization of the video frames associated with the sounds that activate some of the last hidden units of SoundNet (conv7).

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning Sound Representations from Unlabeled Video." NIPS 2016.

Audio Features from Visual Weak Labels

90

Audio & Visual Features from Alignment

90. Arandjelović, Relja, and Andrew Zisserman. "Look, Listen and Learn." ICCV 2017.

Audio and visual features are learned jointly by assessing the alignment between the two modalities, as sketched below.
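A minimal sketch of the audio-visual correspondence (AVC) task: a vision subnetwork embeds a frame, an audio subnetwork embeds a short spectrogram, and a small head classifies whether the pair is aligned. Positives take the frame and the audio from the same moment of the same video; negatives pair a frame with audio from a different video, so the labels are free. The subnetworks and `dim` are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AVCNet(nn.Module):
    """Two-stream network for the audio-visual correspondence (AVC) task."""

    def __init__(self, vision_net: nn.Module, audio_net: nn.Module, dim: int):
        super().__init__()
        self.vision_net = vision_net        # embeds a single video frame
        self.audio_net = audio_net          # embeds a short audio spectrogram
        self.head = nn.Sequential(nn.Linear(2 * dim, 128), nn.ReLU(),
                                  nn.Linear(128, 1))

    def forward(self, frame, spectrogram):
        v = self.vision_net(frame)                   # (B, dim)
        a = self.audio_net(spectrogram)              # (B, dim)
        return self.head(torch.cat([v, a], dim=1))   # logit: aligned or not
```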

91. Chen, L., S. Srivastava, Z. Duan, and C. Xu. "Deep Cross-Modal Audio-Visual Generation." ACM International Conference on Multimedia, Thematic Workshops, 2017.

Audio & Visual Features from Alignment

92–93. Ephrat, Ariel, and Shmuel Peleg. "Vid2Speech: Speech Reconstruction from Silent Video." ICASSP 2017.

Video to Speech Representations. Pipeline: frame from a silent video → CNN (VGG) → audio feature → post-hoc speech synthesis.
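A minimal sketch of that pipeline, assuming a generic CNN `backbone` and MSE regression onto acoustic features extracted from the original soundtrack; a separate vocoder then synthesizes the waveform post hoc.

```python
import torch.nn as nn

class FrameToAudioFeatures(nn.Module):
    """Regress acoustic features from silent-video frames."""

    def __init__(self, backbone: nn.Module, feat_dim: int, audio_dim: int):
        super().__init__()
        self.backbone = backbone                 # CNN over a frame (or frame stack)
        self.regressor = nn.Linear(feat_dim, audio_dim)

    def forward(self, frames):
        return self.regressor(self.backbone(frames))   # predicted audio features

# Training: minimize MSE between predicted and ground-truth audio features
# computed from the original (non-silent) recording.
```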

94–95. Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You Said That?" BMVC 2017.

Speech to Video Synthesis (mouth)

96–98. Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion." SIGGRAPH 2017.

Speech to Video Synthesis (pose & emotion)

99. Suwajanakorn, Supasorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. "Synthesizing Obama: Learning Lip Sync from Audio." SIGGRAPH 2017.

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

45

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

46

Deep Belief Networks (DBN)

Architecture like an MLP Training as a stack of

RBMs

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

47

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

48

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMs

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

49

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

Architecture like an MLP Training as a stack of

RBMshellip so they do not need

labels

Unsupervisedlearning

50

Deep Belief Networks (DBN)

Hinton Geoffrey E Simon Osindero and Yee-Whye Teh A fast learning algorithm for deep belief nets Neural computation 18 no 7 (2006) 1527-1554

After the DBN is trained it can be fine-tuned with a reduced amount of labels to solve a supervised task with superior performance

Supervisedlearning

Softm

ax

51

Deep Belief Networks (DBN)

Deep Learning TV ldquoDeep Belief Netsrdquo

52

Deep Belief Networks (DBN)

Geoffrey Hinton Introduction to Deep Learning amp Deep Belief Netsrdquo (2012)Geoorey Hinton ldquoTutorial on Deep Belief Networksrdquo NIPS 2007

53

Deep Belief Networks (DBN)

Geoffrey Hinton

54

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Acknowledgments

55

Junting Pan Xunyu Lin

Unsupervised Feature Learning

56

Slide credit Yann LeCun

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99

Suwajanakorn, Supasorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. "Synthesizing Obama: learning lip sync from audio." SIGGRAPH 2017.

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning


Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Unsupervised Feature Learning

57

Slide credit Yann LeCun

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Slide credit Junting Pan

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Frame Reconstruction amp Prediction

59Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Frame Reconstruction amp Prediction

60Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Frame Reconstruction amp Prediction

61Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised feature learning (no labels) for

frame prediction

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

62Srivastava Nitish Elman Mansimov and Ruslan Salakhutdinov Unsupervised Learning of Video Representations using LSTMs In ICML 2015 [Github]

Unsupervised learned features (lots of data) are fine-tuned for activity recognition (little data)

Frame Reconstruction amp Prediction

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated with the sounds that activate some of the last hidden units of SoundNet (conv7)

Aytar, Yusuf, Carl Vondrick, and Antonio Torralba. "SoundNet: Learning sound representations from unlabeled video." NIPS 2016.

Audio Features from Visual Weak Labels

90

Audio & Visual Features from Alignment

90 Arandjelović, Relja, and Andrew Zisserman. "Look, Listen and Learn." ICCV 2017.

Audio and visual features learned by assessing alignment
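
A minimal sketch of the correspondence task, assuming hypothetical vision and audio subnetworks: pair a frame with audio from the same moment (positive) or from a different video (negative), and classify the pair.

    # Sketch of audio-visual correspondence classification (assumed dimensions).
    import torch
    import torch.nn as nn

    class AVCorrespondence(nn.Module):
        def __init__(self, vision_net, audio_net, v_dim, a_dim):
            super().__init__()
            self.vision_net, self.audio_net = vision_net, audio_net
            self.fuse = nn.Sequential(nn.Linear(v_dim + a_dim, 128), nn.ReLU(),
                                      nn.Linear(128, 2))  # correspond / not

        def forward(self, frame, spectrogram):
            v = self.vision_net(frame)
            a = self.audio_net(spectrogram)
            return self.fuse(torch.cat([v, a], dim=1))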

91 L. Chen, S. Srivastava, Z. Duan, and C. Xu. "Deep Cross-Modal Audio-Visual Generation." ACM International Conference on Multimedia Thematic Workshops 2017.

Audio & Visual Features from Alignment

92 Ephrat, Ariel, and Shmuel Peleg. "Vid2speech: Speech Reconstruction from Silent Video." ICASSP 2017.

93 Ephrat, Ariel, and Shmuel Peleg. "Vid2speech: Speech Reconstruction from Silent Video." ICASSP 2017.

Video to Speech Representations

[Pipeline figure: a frame from a silent video is encoded by a CNN (VGG) into an audio feature, which is then synthesized into speech post hoc.]
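
A hedged sketch of this mapping: a VGG-style CNN regresses audio features from a frame of a silent talking face, and the features are later rendered to a waveform by a separate synthesizer. The feature dimension is an assumption, not the paper's.

    # Sketch of a Vid2speech-style visual-to-audio-feature regressor (assumed details).
    import torch.nn as nn
    from torchvision.models import vgg16

    class Vid2SpeechNet(nn.Module):
        def __init__(self, audio_feat_dim=64):
            super().__init__()
            net = vgg16()                                     # VGG-style encoder
            net.classifier[-1] = nn.Linear(4096, audio_feat_dim)
            self.net = net

        def forward(self, frame):             # frame: (B, 3, 224, 224)
            return self.net(frame)            # predicted audio feature vector

    # Trained with e.g. MSE against features of the ground-truth audio track;
    # speech is synthesized from the predicted features post hoc.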

94 Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You said that?" BMVC 2017.

95 Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You said that?" BMVC 2017.

Speech to Video Synthesis (mouth)

96 Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

97 Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

Speech to Video Synthesis (pose & emotion)

98 Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

99 Suwajanakorn, Supasorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. "Synthesizing Obama: learning lip sync from audio." SIGGRAPH 2017.

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Frame Prediction

63Ranzato MarcAurelio Arthur Szlam Joan Bruna Michael Mathieu Ronan Collobert and Sumit Chopra Video (language) modeling a baseline for generative models of natural videos arXiv preprint arXiv14126604 (2014)

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

64Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Video frame prediction with a ConvNet

Frame Prediction

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

65Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE are improved with multi-scale architecture adversarial learning and an image gradient difference loss function

Frame Prediction

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

66Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

The blurry predictions from MSE (l1) are improved with multi-scale architecture adversarial training and an image gradient difference loss (GDL) function

Frame Prediction

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

67Mathieu Michael Camille Couprie and Yann LeCun Deep multi-scale video prediction beyond mean square error ICLR 2016 [project] [code]

Frame Prediction

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

68Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

69Vondrick Carl Hamed Pirsiavash and Antonio Torralba Generating videos with scene dynamics NIPS 2016

Frame Prediction

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

70Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94 Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You said that?" BMVC 2017.

95 Chung, Joon Son, Amir Jamaludin, and Andrew Zisserman. "You said that?" BMVC 2017.

Speech to Video Synthesis (mouth)

96 Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

97 Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

Speech to Video Synthesis (pose & emotion)

98 Karras, Tero, Timo Aila, Samuli Laine, Antti Herva, and Jaakko Lehtinen. "Audio-driven facial animation by joint end-to-end learning of pose and emotion." SIGGRAPH 2017.

99 Suwajanakorn, Supasorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. "Synthesizing Obama: learning lip sync from audio." SIGGRAPH 2017.

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

71

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Given an input image probabilistic generation of future frames with a Variational AutoEncoder (VAE)

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

72

Frame Prediction

Xue Tianfan Jiajun Wu Katherine Bouman and Bill Freeman Visual dynamics Probabilistic future frame synthesis via cross convolutional networks NIPS 2016 [video]

Encodes image as feature maps and motion as and cross-convolutional kernels

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

73

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

First steps in video feature learning

74Le Quoc V Will Y Zou Serena Y Yeung and Andrew Y Ng Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis CVPR 2011

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

Temporal Weak Labels

75Goroshin Ross Joan Bruna Jonathan Tompson David Eigen and Yann LeCun Unsupervised learning of spatiotemporally coherent metrics ICCV 2015

Assumption adjacent video frames contain semantically similar informationAutoencoder trained with regularizations by slowliness and sparisty

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

76Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

77Jayaraman Dinesh and Kristen Grauman Slow and steady feature analysis higher order temporal coherence in video CVPR 2016 [video]

Temporal Weak Labels

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

78(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Temporal order of frames is exploited as the supervisory signal for learning

Temporal Weak Labels

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

79(Slides by Xunyu Lin) Misra Ishan C Lawrence Zitnick and Martial Hebert Shuffle and learn unsupervised learning using temporal order verification ECCV 2016 [code]

Take temporal order as the supervisory signals for learning

Shuffled sequences

Binary classification

In order

Not in order

Temporal Weak Labels

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

80Fernando Basura Hakan Bilen Efstratios Gavves and Stephen Gould Self-supervised video representation learning with odd-one-out networks CVPR 2017

Temporal Weak LabelsTrain a network to detect which of the video sequences contains frames wrong order

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning

81

Spatio-Temporal Weak Labels

X Lin Campos V Giroacute-i-Nieto X Torres J and Canton-Ferrer C ldquoDisentangling Motion Foreground and Background Features in Videosrdquo in CVPR 2017 Workshop Brave New Motion Representations

kernel dec

C3D

ForegroundMotion

FirstForeground

Background

Fg

Dec

Bg

Dec

Fg

Dec Reconstruction

of foreground in last frame

Reconstruction of foreground in

first frame

Reconstruction of background in first frame

uNLC

MaskBlock

gradients

Last foreground

Kernels

shareweights

82

Spatio-Temporal Weak Labels

Pathak Deepak Ross Girshick Piotr Dollaacuter Trevor Darrell and Bharath Hariharan Learning features by watching objects move CVPR 2017

83Greff Klaus Antti Rasmus Mathias Berglund Tele Hao Harri Valpola and Juergen Schmidhuber Tagger Deep unsupervised perceptual grouping NIPS 2016 [video] [code]

Spatio-Temporal Weak Labels

84Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

85

Audio Features from Visual weak labels

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Object amp Scenes recognition in videos by analysing the audio track (only)

86Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

87Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

88Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Visualization of the 1D filters over raw audio in conv1

Audio Features from Visual weak labels

89

Visualization of the video frames associated to the sounds that activate some of the last hidden units of Soundnet (conv7)

Aytar Yusuf Carl Vondrick and Antonio Torralba Soundnet Learning sound representations from unlabeled video NIPS 2016

Audio Features from Visual weak labels

90

Audio amp Visual features from alignement

90Arandjelović Relja and Andrew Zisserman Look Listen and Learn ICCV 2017

Audio and visual features learned by assessing alignement

91L Chen S Srivastava Z Duan and C Xu Deep Cross-Modal Audio-Visual Generation ACM International Conference on Multimedia Thematic Workshops 2017

Audio amp Visual features from alignement

92Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

93Ephrat Ariel and Shmuel Peleg Vid2speech Speech Reconstruction from Silent Video ICASSP 2017

Video to Speech Representations

CNN(VGG)

Frame from a silent video

Audio feature

Post-hoc synthesis

94Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

95Chung Joon Son Amir Jamaludin and Andrew Zisserman You said that BMVC 2017

Speech to Video Synthesis (mouth)

96Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

97Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

Speech to Video Synthesis (pose amp emotion)

98Karras Tero Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen Audio-driven facial animation by joint end-to-end learning of pose and emotion SIGGRAPH 2017

99Suwajanakorn Supasorn Steven M Seitz and Ira Kemelmacher-Shlizerman Synthesizing Obama learning lip sync from audio SIGGRAPH 2017

Speech to Video Synthesis (mouth)

100

Outline

1 Motivation

2 Unsupervised Learning

3 Predictive Learning

4 Self-supervised Learning
