deep learning - conceptual understanding and applications

77
Deep Learning Conceptual Understanding & Application Variants [email protected] The aim of this slide is at conceptual understanding of deep learning, rather than implementing it for practical applications. That is, this is more for paper readers, less for system developers. Note that, some variables are not uniquely defined nor consistently used for convenience. Also, variable/constant and vector/matrix are not clearly distinct. More importantly, some descriptions may be incorrect due to lack of my ability.

Upload: buhwan-jeong

Post on 13-Jul-2015

4.159 views

Category:

Data & Analytics


2 download

TRANSCRIPT

Page 1: Deep learning - Conceptual understanding and applications

Deep Learning Conceptual Understanding & Application Variants

[email protected]

The aim of this slide is at conceptual understanding of deep learning, rather than implementing it for practical applications. That is, this is more for paper readers, less for system developers.Note that, some variables are not uniquely defined nor consistently used for convenience. Also, variable/constant and vector/matrix are not clearly distinct.More importantly, some descriptions may be incorrect due to lack of my ability.

Page 2: Deep learning - Conceptual understanding and applications

Deep Learning (DL) is, in concise, a deep and wide neural network. That is half-true.

Page 3: Deep learning - Conceptual understanding and applications

Deep Learning is difficult because of

lack of understanding about

— artificial neural network (ANN) — unfamiliar applications.

Page 4: Deep learning - Conceptual understanding and applications

DL — ANN — Signal transmission between neurons — Graphical model (Belief network) — Linear regression / logistic regression — Weight, bias & activation — (Feed-forward) Multi-layer perceptron (MLP)

— Optimization — Back-propagation — (Stochastic) Gradient-decent

Page 5: Deep learning - Conceptual understanding and applications

DL — Applications — Speech recognition (Audio) — Natural language processing (Text) — Information retrieval (Text) — Image recognition (Vision)

— & yours (+)

Page 6: Deep learning - Conceptual understanding and applications

Artificial Neural Network (ANN)

Page 7: Deep learning - Conceptual understanding and applications

ANN is a nested ensemble of [logistic] regressions.

Page 8: Deep learning - Conceptual understanding and applications

Perceptron & Message Passing

Page 9: Deep learning - Conceptual understanding and applications

http://en.wikipedia.org/wiki/Neuron

Page 10: Deep learning - Conceptual understanding and applications

5 5

y = x

Message passing

Page 11: Deep learning - Conceptual understanding and applications

y = Σ xy = Σ wxy = Σ wx + b

Linear regression

Page 12: Deep learning - Conceptual understanding and applications

y = sigm(Σwx + b)

Logistic regression

Σwx?Σwx

Page 13: Deep learning - Conceptual understanding and applications

Activation function — Linear g(a) = a — Sigmoid g(a) = sigm(a) = 1 / (1 + exp(-a)) — Tanh g(a) = tanh(a) = (exp(a) - exp(-a)) / (exp(a) + exp(-a)) — Rectified linear g(a) = reclin(a) = max(0, a)

— Step — Gaussian — Softmax, usually for the output layer

Page 14: Deep learning - Conceptual understanding and applications

Bias controls activation.

y = ax

b

y = ax + b

Page 15: Deep learning - Conceptual understanding and applications

Multi-Layer Perceptron (MLP)

Page 16: Deep learning - Conceptual understanding and applications

Linearly non-separable by a single perceptron

Page 17: Deep learning - Conceptual understanding and applications

Input layer

Hidden layer

Output layer

http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html

Interconnecting weight

Bias

Activation function

Page 18: Deep learning - Conceptual understanding and applications

http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html

Page 19: Deep learning - Conceptual understanding and applications

Back-Propagation & Optimization

Page 20: Deep learning - Conceptual understanding and applications

{x, y} h=a(wx+b) y'=a2(w2h+b2)

l = y - y'w2 & b2w & b

w2R & b2RwR & bR y'

Back-propagation

{x, y}

—> feed-forward<— back-propagation

Page 21: Deep learning - Conceptual understanding and applications

Reinforced through epoch

ANN is a reinforcement learning.

Page 22: Deep learning - Conceptual understanding and applications

Local minimum Plateaus

http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html

Page 23: Deep learning - Conceptual understanding and applications

Stochastic Gradient Descent

The slide excludes details. See reference tutorial for that.

Page 24: Deep learning - Conceptual understanding and applications

ANN is a good approximator of any functions, if well-designed and trained.

Page 25: Deep learning - Conceptual understanding and applications

Deep Learning

Page 26: Deep learning - Conceptual understanding and applications

DL (DNN) is a deep and wide neural network.

many (2+) hidden layers

many input/hidden nodes

Page 27: Deep learning - Conceptual understanding and applications

DEEP

WIDELarge matrix

Many matrixImage from googling

Page 28: Deep learning - Conceptual understanding and applications

Brain image from Simon ThorpeOther from googling

Page 29: Deep learning - Conceptual understanding and applications

Many large connection matrices (W) — Computational burden (GPU) — Insufficient labeled data (Big Data) — Under-fitting (Optimization) — Over-fitting (Regularization)

Page 30: Deep learning - Conceptual understanding and applications

Unsupervised makes DL success.

Page 31: Deep learning - Conceptual understanding and applications

Unsupervised pre-training — Initialize W matrices in an unsupervised manner — Restricted Boltzmann Machine (RBM) — Auto-encoder

— Pre-train in a layer-wise (stacking) fashion — Fine-tune in a supervised manner

Page 32: Deep learning - Conceptual understanding and applications

Unsupervised representation learning — Construct a feature space — DBM / DBN — Auto-encoder — Sparse coding

— Feed new features to a shallow ANN as input

cf, PCA + regression

Page 33: Deep learning - Conceptual understanding and applications
Page 34: Deep learning - Conceptual understanding and applications

visible

hidden

Restricted Boltzmann Machine (RBM)

Page 35: Deep learning - Conceptual understanding and applications

Boltzmann Machine Example

Page 36: Deep learning - Conceptual understanding and applications

visible

hidden

Page 37: Deep learning - Conceptual understanding and applications

Auto-encoderx

x'

W

WT

encoder

decoder

Page 38: Deep learning - Conceptual understanding and applications

if k <= n (single & linear), then AE becomes PCA. otherwise, AE can be arbitrary (over-complete).

Hidden & Output layers plays the role of semantic feature space in general- compresses & latent feature (under-complete)- expanded & primitive feature (over-complete)

Page 39: Deep learning - Conceptual understanding and applications

Denoising auto-encoderx

x'

nois(x)

Page 40: Deep learning - Conceptual understanding and applications

Contractive auto-encoder

Regularized auto-encoderSame as Ridge (or similar to Lasso) Regression

Page 41: Deep learning - Conceptual understanding and applications

Hinton and Salakhutdinov, Science, 2006

Deep auto-encoder

Page 42: Deep learning - Conceptual understanding and applications

(k > n, over-complete)

Sparse coding

Page 43: Deep learning - Conceptual understanding and applications

X

Y

Page 44: Deep learning - Conceptual understanding and applications

RBM3

RBM2

RBM1

StackingTraining in the order of RBM1, RBM2, RBM3, …

Page 45: Deep learning - Conceptual understanding and applications

Representation Learning

Artificial Neural Network+

- Sparse coding - Auto-encoder - DBN / DBM

Page 46: Deep learning - Conceptual understanding and applications

Stochastic Dropout Training for preventing over-fitting

Page 47: Deep learning - Conceptual understanding and applications

http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html

Page 48: Deep learning - Conceptual understanding and applications

Deep Architecture

Page 49: Deep learning - Conceptual understanding and applications

Deep architecture is an ensemble of shallow networks.

Page 50: Deep learning - Conceptual understanding and applications

Key Deep Architectures — Deep Neural Network (DNN) — Deep Belief Network (DBN) — Deep Boltzmann Machine (DBM) — Recurrent Neural Network (RNN) — Convolution Neural Network (CNN) — Multi-modal/multi-tasking — Deep Stacking Network (DSN)

Page 51: Deep learning - Conceptual understanding and applications

Applications

Page 52: Deep learning - Conceptual understanding and applications

Applications — Speech recognition — Natural language processing — Image recognition — Information retrieval

Application Characteristic- Non-numeric data- Large dimension (curse of dimensionality)

Page 53: Deep learning - Conceptual understanding and applications

Natural language processing — Word embedding — Language modeling — Machine translation — Part-of-speech (POS) tagging — Named entity recognition — Sentiment analysis — Paraphrase detection — …

Page 54: Deep learning - Conceptual understanding and applications

Deep-structured semantic modeling (DSSM)

BM25(Q, D) vs Corr(sim(Q, D), CTR)

Page 55: Deep learning - Conceptual understanding and applications

DBN & DBM

Page 56: Deep learning - Conceptual understanding and applications

Recurrent Neural Network (RNN)

Page 57: Deep learning - Conceptual understanding and applications

Recurrent Neural Network (RNN)

Page 58: Deep learning - Conceptual understanding and applications

http://nlp.stanford.edu/courses/NAACL2013/NAACL2013-Socher-Manning-DeepLearning.pdf

Recursive Neural Network

Page 59: Deep learning - Conceptual understanding and applications

Convolution Neural Network (CNN)

Page 60: Deep learning - Conceptual understanding and applications

C-DSSM

Page 61: Deep learning - Conceptual understanding and applications

CBOW & Skip-gram (Word2Vec)

Page 62: Deep learning - Conceptual understanding and applications

Deep Stacking Network (DSN)

Page 63: Deep learning - Conceptual understanding and applications

Tensor Deep Stacking Network (TDSN)

Page 64: Deep learning - Conceptual understanding and applications

Multi-modal / Mulit-tasking

Page 65: Deep learning - Conceptual understanding and applications

Google Picture to Text

Page 66: Deep learning - Conceptual understanding and applications

Content-based Spotify Recommendation

Page 67: Deep learning - Conceptual understanding and applications

In various applications,

Deep Learning provides either — better performance than state-of-the-art — similar performance with less labor

Page 68: Deep learning - Conceptual understanding and applications

Remarks

Page 69: Deep learning - Conceptual understanding and applications

Deep Personalization with Rich Profile in a Vector Representation

- Versatile / flexible input- Dimension reduction- Same feature space==> Similarity computation

Page 70: Deep learning - Conceptual understanding and applications

RNN for CFhttp://erikbern.com/?p=589

Page 71: Deep learning - Conceptual understanding and applications

Hard work — Application-specific configuration — Network structure — Objective and loss function

— Parameter optimization

Page 72: Deep learning - Conceptual understanding and applications
Page 73: Deep learning - Conceptual understanding and applications

Yann LeCun —> Facebook Geoffrey Hinton —> Google Andrew Ng —> Baidu Joshua Benjio —> U of Montreal

Page 74: Deep learning - Conceptual understanding and applications

References Tutorial

— http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html — http://deeplearning.stanford.edu/tutorial

Paper — Deep learning: Method and Applications (Deng & Yu, 2013) — Representation learning: A review and new perspective (Benjio et al., 2014) — Learning deep architecture for AI (Benjio, 2009)

Slide & Video

— http://www.cs.toronto.edu/~fleet/courses/cifarSchool09/slidesBengio.pdf — http://nlp.stanford.edu/courses/NAACL2013

Book

— Deep learning (Benjio et al. 2015) http://www.iro.umontreal.ca/~bengioy/dlbook/

Page 75: Deep learning - Conceptual understanding and applications

Open Sources (from wikipedia) — Torch (Lua) https://github.com/torch/torch7 — Theano (Python) https://github.com/Theano/Theano/ — Deeplearning4j (word2vec for Java) https://github.com/SkymindIO/deeplearning4j — ND4J (Java) http://nd4j.org https://github.com/SkymindIO/nd4j — DeepLearn Toolbox (matlab) https://github.com/rasmusbergpalm/DeepLearnToolbox/graphs/contributors — convnetjs (javascript) https://github.com/karpathy/convnetjs — Gensim (word2vec for Python) https://github.com/piskvorky/gensim — Caffe (image) http://caffe.berkeleyvision.org — More+++

Page 76: Deep learning - Conceptual understanding and applications

FBI WARNING

Some reference citations are missed. Many of them come from the first tutorial and the first paper, and others from googling. All rights of those images captured, albeit not explicitly cited, are reserved to original authors. Free to share, but cautious of that.

Page 77: Deep learning - Conceptual understanding and applications

Left to Blank