multimedia data mining using deep learning

Multimedia Data Mining using deep learning Peter Wlodarczak [email protected]

Post on 15-Apr-2017

Page 1: Multimedia data mining using deep learning

Multimedia Data Mining

using deep learning

Peter Wlodarczak

[email protected]

Page 2: Multimedia data mining using deep learning

Agenda

Aims

Multimedia Data Mining

Artificial Neural Networks

Deep learning

Challenges

Discussion

Page 3: Multimedia data mining using deep learning

Aims

Analyze multimedia data for:

Object/face recognition

Voice commands

Natural Language Processing

Classification

Automatic caption generation

Record linkage (entity resolution)

Page 4: Multimedia data mining using deep learning

Multimedia Data Mining I

Multimedia data mining:

Unprecedented amount of Multimedia data

since Web 2.0 and Social Media

Prosumer data

Uses algorithms to extract useful patterns

and relations from image, audio and video

data

Traditional methods often not satisfactory

Unsuitable for high dimensionality

Page 5: Multimedia data mining using deep learning

Multimedia Data Mining II

Multimedia data mining has been

improved using deep learning in:

Visual data mining

Natural Language Processing

Deep learners are:

Machine Learning schemes

Usually multi-layered artificial neural

networks

Page 6: Multimedia data mining using deep learning

Artificial Neural Networks I

Artificial Neural Networks:

Suitable to give good approximations for

complex problems

Consist of perceptrons (the neurons)

and weighted connections (the axons)

Page 7: Multimedia data mining using deep learning

Artificial Neural Networks II

Perceptron (Neuron)

Linear classifier

Data linearly separable using a hyperplane

Where w = weight vector, a = real-valued

feature vector, a0 = bias input (fixed at 1)

Binary classifier f(a) that maps its input

vector a to a single, binary output value

$w_0 a_0 + w_1 a_1 + w_2 a_2 + \dots + w_k a_k = 0$

Page 8: Multimedia data mining using deep learning

Artificial Neural Networks III

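[Figure: perceptron with a constant bias input 1 (weight w0) and attribute inputs a1, a2, a3 (weights w1, w2, w3)]

$f(a) = \sum_k w_k a_k + b$, where the bias term is $b = w_0 \cdot 1$

The class is decided by f(a) > 0 or f(a) < 0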

Page 9: Multimedia data mining using deep learning

Artificial Neural Networks III

Training data

Name      Sex     Mask  Cape  Tie  Ears  Smokes  Class
Batman    male    yes   yes   no   yes   no      Good
Robin     male    yes   yes   no   no    no      Good
Alfred    male    no    no    yes  no    no      Good
Penguin   male    no    no    yes  no    yes     Bad
Catwoman  female  yes   no    no   yes   no      Bad
Joker     male    no    no    no   no    no      Bad

Test data

Name      Sex     Mask  Cape  Tie  Ears  Smokes  Class
Batgirl   female  yes   yes   no   yes   no      ?
Riddler   male    yes   no    no   no    no      ?

Supervised learning
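As a rough illustration, the training table above can be fed to a perceptron. This is only a sketch: the 0/1 encoding (male = 1, yes = 1), the labels Good = +1 and Bad = -1, the learning rate and the epoch cap are all assumptions, not taken from the slides.

```python
import numpy as np

# Columns: sex (male=1), mask, cape, tie, ears, smokes.
# The 0/1 encoding is an assumption made for this sketch.
X_train = np.array([
    [1, 1, 1, 0, 1, 0],  # Batman
    [1, 1, 1, 0, 0, 0],  # Robin
    [1, 0, 0, 1, 0, 0],  # Alfred
    [1, 0, 0, 1, 0, 1],  # Penguin
    [0, 1, 0, 0, 1, 0],  # Catwoman
    [1, 0, 0, 0, 0, 0],  # Joker
], dtype=float)
y_train = np.array([1, 1, 1, -1, -1, -1])  # Good = +1, Bad = -1

X_test = np.array([
    [0, 1, 1, 0, 1, 0],  # Batgirl
    [1, 1, 0, 0, 0, 0],  # Riddler
], dtype=float)


def add_bias(X):
    """Prepend the constant bias input a0 = 1 to every row."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def train_perceptron(X, y, lr=1.0, max_epochs=1000):
    """Perceptron rule: update the weights only on misclassified rows."""
    A = add_bias(X)
    w = np.zeros(A.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for a, target in zip(A, y):
            if target * np.dot(w, a) <= 0:  # on or beyond the wrong side
                w += lr * target * a
                errors += 1
        if errors == 0:  # every training row is classified correctly
            break
    return w


def predict(w, X):
    """Return +1 (Good) or -1 (Bad) depending on the sign of f(a)."""
    return np.where(add_bias(X) @ w > 0, 1, -1)


w = train_perceptron(X_train, y_train)
train_pred = predict(w, X_train)
test_pred = predict(w, X_test)
print("weights (w0 is the bias weight):", w)
print("Batgirl, Riddler ->", test_pred)
```

The predictions for Batgirl and Riddler depend on the encoding and on the order of the training rows, so they should be read as an illustration rather than as a result from the talk.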

Page 10: Multimedia data mining using deep learning

Artificial Neural Networks IV

Not all data is linearly separable
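XOR is the usual example of data that no single hyperplane can separate. The sketch below runs the perceptron rule on the four XOR points. The data set, the +1/-1 labels and the epoch cap are assumptions chosen for illustration.

```python
import numpy as np

# XOR: the label is +1 exactly when the two inputs differ.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

A = np.hstack([np.ones((4, 1)), X])  # constant bias input a0 = 1
w = np.zeros(3)

for epoch in range(1000):
    errors = 0
    for a, target in zip(A, y):
        if target * np.dot(w, a) <= 0:  # misclassified: apply perceptron update
            w += target * a
            errors += 1
    if errors == 0:  # never happens for XOR
        break

pred = np.where(A @ w > 0, 1, -1)
misclassified = int((pred != y).sum())
print("misclassified XOR points:", misclassified)
```

Whatever weights the loop ends with, at least one of the four points stays on the wrong side, which is the motivation for adding hidden layers.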

Page 11: Multimedia data mining using deep learning

Artificial Neural Networks V

Multilayer Perceptron

Perceptrons organized in several layers

A layer is fully interconnected with the next

layer

All nodes except input node are perceptrons

Feedforward neural network

Uses backpropagation for training

Error propagated back to minimize loss function
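A minimal sketch of the "error propagated back" step for a two-layer perceptron, checked against finite differences. The layer sizes, the tanh and sigmoid activations, the squared-error loss and the XOR data are assumptions for this example, not details from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# 2 inputs -> 3 hidden units (tanh) -> 1 output unit (sigmoid)
params = {
    "W1": rng.normal(scale=0.5, size=(2, 3)),
    "b1": np.zeros(3),
    "W2": rng.normal(scale=0.5, size=(3, 1)),
    "b2": np.zeros(1),
}


def forward(p, X):
    h = np.tanh(X @ p["W1"] + p["b1"])
    out = 1.0 / (1.0 + np.exp(-(h @ p["W2"] + p["b2"])))
    return h, out


def loss(p, X, y):
    _, out = forward(p, X)
    return 0.5 * np.mean((out - y) ** 2)


def backprop(p, X, y):
    """Propagate the output error back through both layers (chain rule)."""
    h, out = forward(p, X)
    d_out = (out - y) / X.shape[0]      # dLoss/d(out)
    d_z2 = d_out * out * (1.0 - out)    # through the sigmoid
    d_h = d_z2 @ p["W2"].T              # back to the hidden layer
    d_z1 = d_h * (1.0 - h ** 2)         # through tanh
    return {
        "W1": X.T @ d_z1, "b1": d_z1.sum(axis=0),
        "W2": h.T @ d_z2, "b2": d_z2.sum(axis=0),
    }


def numerical_grad(p, X, y, eps=1e-6):
    """Central finite differences, used only to check backprop."""
    grads = {}
    for name, value in p.items():
        g = np.zeros_like(value)
        for idx in np.ndindex(value.shape):
            old = value[idx]
            value[idx] = old + eps
            plus = loss(p, X, y)
            value[idx] = old - eps
            minus = loss(p, X, y)
            value[idx] = old
            g[idx] = (plus - minus) / (2 * eps)
        grads[name] = g
    return grads


analytic = backprop(params, X, y)
numeric = numerical_grad(params, X, y)
max_diff = max(float(np.abs(analytic[k] - numeric[k]).max()) for k in params)
print("max difference between backprop and numerical gradient:", max_diff)
```

In training, the gradients returned by `backprop` would be subtracted from the parameters, scaled by a learning rate, to reduce the loss.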

Page 12: Multimedia data mining using deep learning

Artificial Neural Networks VI

Multilayer perceptron can be used for

non-linear, multiclass classification

Page 13: Multimedia data mining using deep learning

Artificial Neural Networks VII

Gradient descent optimization method

for learning weights
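Gradient descent can be sketched on a one-dimensional toy loss. The loss (w - 3)^2, the learning rate and the step count are illustrative assumptions. A real network applies the same update to every weight.

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient of the loss."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w


# Toy loss(w) = (w - 3)^2 with gradient 2 * (w - 3) and minimum at w = 3.
w_star = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print("learned weight:", w_star)
```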

Page 14: Multimedia data mining using deep learning

Artificial Neural Networks VIII

Model complexity has to be appropriate

(Occam’s razor)

Schapire 2004

Page 15: Multimedia data mining using deep learning

Artificial Neural Networks IX

Schapire 2004

Page 16: Multimedia data mining using deep learning

Artificial Neural Networks X

For building an accurate classifier:

Enough training examples

Good performance on training set

Classifier that is not too complex

(avoids overfitting)

ANNs allow approximate solutions for

very complex problems

Support Vector Machines (SVM) are a

much simpler alternative to ANN
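The trade-off between training-set performance and classifier complexity can be illustrated with polynomial fits of increasing degree. This is a hedged sketch on synthetic data: the sine target, the noise level and the degrees are assumptions, and polynomial regression stands in for a classifier.

```python
import numpy as np

rng = np.random.default_rng(0)


def target(x):
    return np.sin(2 * np.pi * x)


# Ten noisy training points and ten separate noisy test points.
x_train = np.linspace(0.0, 1.0, 10)
x_test = np.linspace(0.05, 0.95, 10)
y_train = target(x_train) + rng.normal(scale=0.2, size=x_train.size)
y_test = target(x_test) + rng.normal(scale=0.2, size=x_test.size)

train_err, test_err = {}, {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit
    train_err[degree] = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    test_err[degree] = float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    print(f"degree {degree}: train MSE {train_err[degree]:.4f}, "
          f"test MSE {test_err[degree]:.4f}")
```

The training error can only shrink as the degree grows, so a low training error alone says little. The test error is the number to watch for overfitting.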

Page 17: Multimedia data mining using deep learning

Deep learning I

Deep learning

No clear distinction from shallow learners

Multiple layers of non-linear processing

units

Each layer represents features at a higher

level

Forms a hierarchical representation

The majority of deep learners are ANNs

Page 18: Multimedia data mining using deep learning

Deep learning II

Deep learning neural networks

Use Rectified Linear Units (ReLU)

Learn faster

Half-wave rectifier

f(z) = max(z, 0)

Use backpropagation for adjusting the

weights
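The rectifier f(z) = max(z, 0) and the gradient that backpropagation uses can be written in a few lines of NumPy. Setting the gradient at z = 0 to zero is a common convention and an assumption here.

```python
import numpy as np


def relu(z):
    """Half-wave rectifier: f(z) = max(z, 0), applied element-wise."""
    return np.maximum(z, 0.0)


def relu_grad(z):
    """Derivative used in backpropagation: 1 where z > 0, else 0."""
    return (z > 0).astype(float)


z = np.array([-2.0, 0.0, 3.0])
print(relu(z))
print(relu_grad(z))
```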

Page 19: Multimedia data mining using deep learning

Deep learning III - ConvNet

LeNet 2015

Page 20: Multimedia data mining using deep learning

Deep learning IV - ConvNet

Convolutional neural networks

Inspired by the animal visual cortex

Visual cortex is the most powerful visual

processing system in existence

Typically two stages:

Convolutional stage

Pooling stage

Characterized by

sparse connectivity

shared weights

Page 21: Multimedia data mining using deep learning

Deep learning V - ConvNet

Shared weights

Subsets share weights and bias to form

feature map

Replicated across entire visual field
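Weight sharing can be sketched as one small kernel slid over the whole image, with every output position reusing the same weights and bias. The image, the kernel and the function name `conv2d_valid` are hypothetical. As in most deep learning libraries, the kernel is not flipped, so strictly this is a cross-correlation.

```python
import numpy as np


def conv2d_valid(image, kernel, bias=0.0):
    """Slide one shared kernel over the image to build a feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]  # local receptive field
            feature_map[i, j] = np.sum(patch * kernel) + bias  # same weights everywhere
    return feature_map


# Dark left half, bright right half.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)
kernel = np.array([[-1.0, 1.0]])  # responds to a dark-to-bright step

feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The feature map is non-zero only in the column where the brightness changes, and only two weights had to be learned for the whole image.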

Page 22: Multimedia data mining using deep learning

Deep learning VI - ConvNet

Each layer accepts a 3D input volume and

transforms it into a 3D output volume

Filters activate when specific feature is

mapped

CS231n 2015

Page 23: Multimedia data mining using deep learning

Deep learning VII - ConvNet

Receptive field spans all feature maps

LeNet 2015

Page 24: Multimedia data mining using deep learning

Deep learning VIII - ConvNet

MaxPooling

Non-linear down-sampling

Partitions input into non-overlapping

rectangles

Outputs maximum value for each sub-region

Minimizes computation for next layer

Reduces dimensionality of intermediate

representations
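Max pooling over non-overlapping rectangles can be sketched with a reshape. The 2x2 window and the example input are assumptions. Rows and columns that do not fill a complete window are dropped in this version.

```python
import numpy as np


def max_pool2d(x, size=2):
    """Split x into non-overlapping size x size blocks and keep each block's maximum."""
    h, w = x.shape
    h_trim, w_trim = h - h % size, w - w % size  # drop incomplete blocks
    blocks = x[:h_trim, :w_trim].reshape(h_trim // size, size, w_trim // size, size)
    return blocks.max(axis=(1, 3))


x = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
])
pooled = max_pool2d(x)
print(pooled)
```

The 4x4 input shrinks to 2x2, so the next layer processes a quarter of the values.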

Page 25: Multimedia data mining using deep learning

Deep learning IX - ConvNet

Convolutional and sampling sublayers

UFLDL 2015

Page 26: Multimedia data mining using deep learning

Deep learning X - ConvNet

Image passed through max-pooling cascaded with a

convolutional layer

Similar to edge detector

Page 27: Multimedia data mining using deep learning

Deep learning XI - RNN

Recurrent neural networks

Contain directed cycles

Take sequences as input, no fixed size

input and output vectors, e. g. natural

speech

Page 28: Multimedia data mining using deep learning

Deep learning XII - RNN

No fixed size of computations

Much simpler than ConvNets

Maintain inner state exhibiting dynamic

temporal behavior

Optimized through backpropagation

Can be extended with long-term memory

extensions, e. g. long short-term memory (LSTM)

Don’t necessarily need sequences of inputs
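The inner state of a recurrent network can be sketched as a hidden vector that is updated once per sequence element, so the same weights handle sequences of any length. This is a minimal Elman-style forward pass. The layer sizes, the tanh activation and the random weights are assumptions, and no training is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4  # illustrative sizes

W_xh = rng.normal(scale=0.1, size=(input_size, hidden_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the cycle)
b_h = np.zeros(hidden_size)


def rnn_forward(sequence):
    """Run the recurrence over a sequence and return the hidden state at every step."""
    h = np.zeros(hidden_size)  # inner state, carried from step to step
    states = []
    for x_t in sequence:
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.array(states)


short_seq = rng.normal(size=(2, input_size))
long_seq = rng.normal(size=(7, input_size))
short_states = rnn_forward(short_seq)
long_states = rnn_forward(long_seq)
print(short_states.shape, long_states.shape)
```

The number of computations grows with the sequence length while the parameters stay fixed.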

Page 29: Multimedia data mining using deep learning

Deep learning XIII - RNN

Training RNN is a non-linear global

optimization problem

Trained using stochastic gradient descent

Non-linear, differentiable activation

function, e. g. rectifier

Trained through backpropagation through

time (BPTT)

Genetic algorithms can be used for training

Page 30: Multimedia data mining using deep learning

Deep learning XIV - RNN

Many different architectures for RNN

[Figures: Elman simple recurrent network (SRN); spiking neural network]

Page 31: Multimedia data mining using deep learning

Deep learning XV - RNN

RNN learns to read house numbers

RNN learns to paint house numbers

Karpathy 2015

Page 32: Multimedia data mining using deep learning

Deep learning XVI - RNN

RNN used for

Transcribing speech to text

Speech synthesis

Machine translation

Page 33: Multimedia data mining using deep learning

Deep learning XVII

Combining ConvNets and RNN for

image descriptions

Regions described

using language as

label space using

ConvNet

Language synthesizing

using RNN

Karpathy & Fei-Fei 2014

Page 34: Multimedia data mining using deep learning

Deep learning XVIII

ConvNet and RNN can be combined

Automated caption generation

Page 35: Multimedia data mining using deep learning

Deep learning XIX

Automatic feature extraction

No closed vocabulary set

Alignment of segments of sentences to

region on the image

Karpathy & Fei-Fei 2014

Page 36: Multimedia data mining using deep learning

Deep learning XX

Other applications

Object recognition

Movie classification

Handwriting recognition

Record linkage

Page 37: Multimedia data mining using deep learning

Challenges I

Main disadvantage: large volumes of

training data needed

Overfitting if not enough training data

Optimization difficult

Finding relevant information

Privacy-preserving data mining

Page 38: Multimedia data mining using deep learning

Challenges II

Describing actions

Page 39: Multimedia data mining using deep learning

Discussion

Future research in

Attention based models

Finding relevant information

Data democratization and Internet of

Things

Unsupervised learning

Semantic data modeling

Reasoning

Page 40: Multimedia data mining using deep learning

Thank you for your attention

Questions?

Page 41: Multimedia data mining using deep learning

References

Zhao, X, Li, X & Zhang, Z 2015, 'Multimedia Retrieval via Deep Learning to Rank', IEEE Signal Processing Letters, vol. 22, no. 9, pp. 1487-91,

<http://ieeexplore.ieee.org.ezproxy.usq.edu.au/xpls/abs_all.jsp?arnumber=7054452>.

Yu, W, Zhuang, F, He, Q & Shi, Z 2015, 'Learning deep representations via extreme learning machines', Neurocomputing, vol. 149, Part A,

pp. 308-15, <http://www.sciencedirect.com/science/article/pii/S0925231214011461>.

Xu, K, Ba, J, Kiros, R, Cho, K, Courville, A, Salakhutdinov, R, Zemel, R & Bengio, Y 2015, 'Show, Attend and Tell: Neural Image Caption

Generation with Visual Attention', Proceedings of the 32nd International Conference on Machine Learning from Data: Artificial Intelligence

and Statistics, vol. 37.

Xin, J, Wang, Z, Qu, L & Wang, G 2015, 'Elastic extreme learning machine for big data classification', Neurocomputing, vol. 149, Part A, pp.

464-71, <http://www.sciencedirect.com/science/article/pii/S0925231214011503>.

Weston, J, Chopra, S & Bordes, A 2015, 'Memory Networks', in 3rd International Conference on Learning Representations: proceedings of

the 3rd International Conference on Learning Representations, San Diego, viewed <http://arxiv.org/pdf/1410.3916v10.pdf>.

Weilong, H, Xinbo, G, Dacheng, T & Xuelong, L 2015, 'Blind Image Quality Assessment via Deep Learning', Neural Networks and Learning

Systems, IEEE Transactions on, vol. 26, no. 6, pp. 1275-86.

Wang, Y, Li, D, Du, Y & Pan, Z 2015, 'Anomaly detection in traffic using L1-norm minimization extreme learning machine', Neurocomputing,

vol. 149, Part A, pp. 415-25, <http://www.sciencedirect.com/science/article/pii/S0925231214011382>.

Vinyals, O, Toshev, A, Bengio, S & Erhan, D 2015, 'Show and Tell: A Neural Image Caption Generator', Google,

<http://arxiv.org/pdf/1411.4555v1.pdf>.

Noda, K, Yamaguchi, Y, Nakadai, K, Okuno, H & Ogata, T 2015, 'Audio-visual speech recognition using deep learning', Applied Intelligence,

vol. 42, no. 4, pp. 722-37, <http://dx.doi.org/10.1007/s10489-014-0629-7>.

Mao, W, Zhao, S, Mu, X & Wang, H 2015, 'Multi-dimensional extreme learning machine', Neurocomputing, vol. 149, Part A, pp. 160-70,

<http://www.sciencedirect.com/science/article/pii/S0925231214011540>.

Liu, X, Wang, L, Huang, G-B, Zhang, J & Yin, J 2015, 'Multiple kernel extreme learning machine', Neurocomputing, vol. 149, Part A, pp. 253-64,

<http://www.sciencedirect.com/science/article/pii/S0925231214011199>.

LeCun, Y, Bengio, Y & Hinton, G 2015, 'Deep learning', Nature, vol. 521, no. 7553, pp. 436-44, <http://dx.doi.org/10.1038/nature14539>.

Srivastava, N, Hinton, G, Krizhevsky, A, Sutskever, I & Salakhutdinov, R 2014, 'Dropout: a simple way to prevent neural networks from

overfitting', J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929-58.

Karpathy, A & Fei-Fei, L 2014, 'Deep visual-semantic alignments for generating image descriptions', arXiv preprint arXiv:1412.2306.