Data Representation by Deep Learning
TRANSCRIPT
[email protected], Deep Learning, Big Data
Hujun Yin, The University of Manchester
Data Representation by Deep Learning
SOCO/CISIS/ICEUTE 2018
[email protected], Deep Learning, Big Data
Outline
• Linear Representation (PCA)
• Nonlinear Representation (NLPCA, MDS, PC/S, etc.)
• Neural Networks
• Deep Neural Networks
• Deep Representation (ConvNet Features)
• Manifold & Meaning
• Autoassociative NNs & Deep Autoencoder
• Unsupervised ConvNets Features
• Summary
Linear Data Representation
Gene expressions,
Images,
Patient records,
Surveys,
Documents,
……
• Data matrix: X = [x1, x2, …, xN], xi ∈ R^n
• Each sample is a column vector, xi = [x1, x2, …, xn]^T
• N: number of samples
Rank  University      F1   F2   F3  F4  F5   F6   F7  Total
 1    Cambridge      241  182  247  97  88  100   50  1005
 2    Oxford         214  175  244  97  81  100   30   941
 3    LSE            200  175  233  97  68  100   50   923
 4    Imperial       203  154  232  98  67  100   10   864
 5    York           206  143  208  94  63   76   60   850
 6    UCL            172  152  210  95  71  100   30   830
 7    St Andrews     139  131  194  96  73   91  100   824
 8    Warwick        153  155  215  97  69   86   20   795
 9    Bath           132  142  211  97  66   83   60   791
 9    Nottingham     176  125  218  96  74   72   30   791
11    Bristol        145  131  218  96  75   94   20   779
11    Durham         163  132  207  91  64   72   50   779
11    Edinburgh      106  145  218  96  74  100   40   779
14    Lancaster      156  144  186  95  62   63   50   756
15    UMIST          135  144  188  97  58  100   30   752
16    Birmingham     146  127  204  96  67   87   20   747
17    Loughborough   162  115  177  95  57   66   60   732
18    Southampton    143  124  180  93  55   71   50   716
19    King’s College 135  126  204  96  63  100  -10   714
20    Newcastle      134  117  193  97  60   87   20   708
21    Manchester     125  134  198  96  66   98  -10   707
22    Leeds          122  127  199  97  61   74   20   700
23    Sheffield      143  125  213  97  61   72  -20   691
24    East Anglia    125  127  176  96  63   60   40   687
24    Leicester      125  120  183  94  52   93   20   687
Microarray Data:
-0.4894 0.2899 0.3936 0.5354
-0.5995 -0.0051 0.3586 0.1327
-1.4729 -0.0744 0.3100 -0.0096
-1.0711 -0.0086 0.2926 0.0453
-1.5355 0.0212 0.3589 0.1080
-1.5630 -0.3017 -0.0401 -0.3226
-1.0337 0.0288 0.3059 0.2210
-0.5349 0.2162 0.3105 0.1780
-1.2897 -0.1296 0.2405 0.0714
-1.3245 -0.1271 0.1298 -0.0360
-0.7478 -0.2417 -0.0711 0.0607
-0.9940 0.2885 0.5186 0.4094
-0.3578 0.1788 0.3761 0.3277
-1.3140 0.1721 0.4649 0.5712
-1.3853 0.2254 0.5027 0.6137
-0.2544 0.1048 0.3443 0.2969
-0.3245 0.0863 0.2952 0.1004
-1.4130 0.1173 0.4131 0.4286
-0.7171 -0.2210 0.2796 -0.0098
-1.3806 0.2253 0.6327 0.2951
-1.7246 0.0460 0.2780 0.1937
-1.2140 0.0901 0.3640 0.4338
-1.2926 -0.0320 -0.0356 0.2214
-1.2551 -0.1435 0.2805 0.0478
-1.0175 -0.1559 0.0313 0.0245
-1.4163 0.0690 0.2400 0.2065
[email protected], Deep Learning, Big Data
Linear Data Representation
• Data matrix: X = [x1, x2, …, xN], xi ∈ R^n
• Each sample is a column vector, xi = [x1, x2, …, xn]^T
• N: number of samples
[Figure: data scatter in the (x1, x2) plane with principal directions v1, v2]
Assume: n is too large, and x1, x2, …, xn are correlated.
Can we de-correlate and reduce these variables?
e.g. n→ 1?
→ find v1 so that the variance of v1^T X, i.e. v1^T XX^T v1, is largest.
If n → 2: in addition to v1, find v2 so that v1 ⊥ v2 and v2^T XX^T v2 is largest.
Linear Data Representation: PCA
• PCA: A linear coordinate transformation
X = [x1, x2, …, xN] , xi zero mean, Covariance: XXT
xi = [x1, x2, …, xn]T
Eigenvalue problem: (XX^T − λi I) vi = 0
• V = [v1, v2, …, vn]
• Λ = diag[λ1, λ2, …, λn]
• λ1 ≥ λ2 ≥ … ≥ λn: eigenvalues or variances
• V^T XX^T V = Λ
[Figure: principal directions v1, v2 in the (x1, x2) plane]
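The eigenvalue problem above can be sketched in a few lines of numpy (an illustrative example; the toy data, sizes and random seed are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: N = 200 samples of n = 5 correlated variables; columns of X are
# the samples, matching the slide's convention X = [x1, ..., xN]
n, N = 5, 200
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, N))   # two underlying factors
X += 0.05 * rng.normal(size=(n, N))                     # small noise
X -= X.mean(axis=1, keepdims=True)                      # zero mean, as assumed

# Solve the eigenvalue problem (XX^T - lambda_i I) v_i = 0
lam, V = np.linalg.eigh(X @ X.T / N)
lam, V = lam[::-1], V[:, ::-1]        # order so that lambda1 >= lambda2 >= ...

Z = V[:, :2].T @ X                    # de-correlated 2-D representation
```

With rank-2 underlying structure, the first two eigenvalues dominate and the remaining ones are at the noise level, so n → 2 loses little variance.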
[email protected], Deep Learning, Big Data
Linear Data Representation: PCA
• PCA: eigenface example – face images
Linear Data Representation: PCA
• PCA: eigenface example – first 50 eigenvectors (eigenfaces)
[email protected], Deep Learning, Big Data
Linear Data Representation: PCA
• PCA: eigenface example – representation or reconstruction
Reconstruction of an image from the mean image and a number of weighted eigenfaces, calculated from the ORL database.
Why Study Images or Vision?
All our knowledge has its origins in our perceptions
– Leonardo da Vinci
[email protected], Deep Learning, Big Data
Nonlinear Representation
MDS (Multidimensional Scaling)
• dij: inter-point distance in original space
• δij: dissimilarity
• Dij: inter-point distance in projected plot
• Classical MDS: Dij ≈ δij
• Metric MDS: Dij ≈ f(δij) = dij
• Nonmetric MDS: δij ≤ δkl → Dij ≤ Dkl, ∀ i, j, k, l
Stress functions to minimise:

Sammon:  S = [1 / Σ_{i<j} d_ij] Σ_{i<j} (d_ij − D_ij)² / d_ij

Metric stress:  S = [1 / Σ_{i<j} d_ij²] Σ_{i<j} (d_ij − D_ij)²
4.9 3.0 1.4 0.2
4.7 3.2 1.3 0.2
4.6 3.1 1.5 0.2
5.0 3.6 1.4 0.2
5.4 3.9 1.7 0.4
4.6 3.4 1.4 0.3
……
7.0 3.2 4.7 1.4
6.4 3.2 4.5 1.5
6.9 3.1 4.9 1.5
5.5 2.3 4.0 1.3
6.5 2.8 4.6 1.5
5.7 2.8 4.5 1.3
……
6.3 3.3 6.0 2.5
5.8 2.7 5.1 1.9
7.1 3.0 5.9 2.1
6.3 2.9 5.6 1.8
6.5 3.0 5.8 2.2
7.6 3.0 6.6 2.1
…...
……
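Classical MDS can be sketched via double-centring of the squared-distance matrix (an illustrative numpy example; the toy data are an assumption, standing in for 4-dimensional measurements like those above):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))          # 30 samples, 4 measurements each

# Classical MDS: find a 2-D configuration whose inter-point distances D_ij
# approximate the original dissimilarities delta_ij (here Euclidean d_ij)
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
J = np.eye(30) - np.ones((30, 30)) / 30               # centring matrix
B = -0.5 * J @ D2 @ J                                 # double-centred Gram matrix
lam, V = np.linalg.eigh(B)
order = np.argsort(lam)[::-1]                         # largest eigenvalues first
Y = V[:, order[:2]] * np.sqrt(lam[order[:2]])         # 2-D embedding coordinates
```

The top two eigenvectors of the double-centred matrix, scaled by the square roots of their eigenvalues, give the classical (Torgerson) MDS plot; Sammon's mapping instead minimises its stress iteratively.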
Nonlinear Representation
Principal Curve/Surface
A smooth and self-consistent curve passing through the “middle” of the data.
– Hastie and Stuetzle (1989)

Projection: λf(x) = sup{ λ : ‖x − f(λ)‖ = inf_μ ‖x − f(μ)‖ }
Self-consistency (expectation): f(λ) = E[X | λf(X) = λ]
Kernel smoothing: f(λ) = Σi κ(λ, λi) xi / Σi κ(λ, λi)
[email protected], Deep Learning, Big Data
Nonlinear Representation
• Kernel PCA: Schölkopf, Smola & Müller (1998), for nonlinear PCA.
° The kernel method has since become popular.
Φ : X → F, x ↦ Φ(x)
° PCA in the input space:
  C = (1/N) Σi xi xi^T,  Cq = λq,  q = Σi αi xi
  min Σj ‖xj − Σ_{k=1}^m qk qk^T xj‖²
Nonlinear Representation
• Kernel PCA: Schölkopf, Smola & Müller (1998), for nonlinear PCA.
Cov = (1/N) Σi Φ(xi) Φ(xi)^T
K_ij := ⟨Φ(xi), Φ(xj)⟩ = κ(xi, xj)
Kα = Nλα
q = Σi αi Φ(xi),  α = [α1, α2, …, αN]^T
Projection: ⟨q, Φ(xk)⟩ = Σi αi κ(xi, xk)
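The steps above can be sketched in numpy (an illustrative example; the concentric-circle data and the Gaussian-kernel bandwidth are assumptions chosen for the demo):

```python
import numpy as np

rng = np.random.default_rng(2)
# Two concentric circles: linearly inseparable, a classic kernel PCA demo
t = rng.uniform(0, 2 * np.pi, size=100)
r = np.repeat([1.0, 3.0], 50)
X = np.c_[r * np.cos(t), r * np.sin(t)] + 0.05 * rng.normal(size=(100, 2))

# K_ij = kappa(x_i, x_j), here a Gaussian kernel
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)

# Centre K in feature space, then solve the eigenproblem K alpha = N lambda alpha
N = len(X)
J = np.eye(N) - np.ones((N, N)) / N
Kc = J @ K @ J
lam, A = np.linalg.eigh(Kc)
alpha = A[:, -1] / np.sqrt(lam[-1])   # normalise so that <q, q> = 1

# Projection of each sample: sum_i alpha_i kappa(x_i, x_k)
Z = Kc @ alpha
```

All computation uses only kernel evaluations κ(xi, xj); the feature map Φ is never formed explicitly.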
[email protected], Deep Learning, Big Data
Nonlinear Representation
• LLE (Locally Linear Embedding): Roweis & Saul (2000), for nonlinear dimensionality reduction
° Select a neighbourhood graph: k nearest neighbours or ε-ball.
° Reconstruct each point with linear weights: ε(W) = min Σi ‖Xi − Σj Wij Xj‖²
° Compute embedding coordinates Y: Φ(Y) = min Σi ‖Yi − Σj Wij Yj‖²
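The three LLE steps can be sketched in numpy (a minimal sketch; the toy curve data, k = 8 and the regularisation constant are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy data: noisy points along a 1-D curve embedded in 3-D
t = np.sort(rng.uniform(0.0, 3.0, size=80))
X = np.c_[np.cos(t), np.sin(t), t] + 0.01 * rng.normal(size=(80, 3))

k, N = 8, len(X)
W = np.zeros((N, N))
for i in range(N):
    # Step 1: neighbourhood graph via k nearest neighbours
    d = ((X - X[i]) ** 2).sum(axis=1)
    nbrs = np.argsort(d)[1:k + 1]
    # Step 2: weights minimising ||X_i - sum_j W_ij X_j||^2 with sum_j W_ij = 1
    G = (X[nbrs] - X[i]) @ (X[nbrs] - X[i]).T
    G += 1e-3 * np.trace(G) * np.eye(k)      # regularise the singular Gram matrix
    w = np.linalg.solve(G, np.ones(k))
    W[i, nbrs] = w / w.sum()

# Step 3: embedding Y minimising ||Y_i - sum_j W_ij Y_j||^2 -> bottom
# eigenvectors of (I - W)^T (I - W), discarding the constant one
M = (np.eye(N) - W).T @ (np.eye(N) - W)
lam, V = np.linalg.eigh(M)
Y = V[:, 1]                                   # 1-D embedding coordinates
```

The same weights W are reused in both objectives, which is what makes the embedding preserve the local linear geometry.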
Nonlinear Representation
• Grouping of linear/nonlinear mapping (Yin, FEEE, 2011)
Eigen-decomposition based   MDS based   Principal manifold based
PCA, KPCA                   MDS         Principal Curve/Surface
LLE                         Isomap      SOM/ViSOM/GTM
HLLE                        CCA         …
Laplacian eigenmap          …
Spectral clustering
…
[email protected], Deep Learning, Big Data
Neural Networks
[Figure: feed-forward network — input layer (x1, …, xn), hidden layer (v1, …, vnh), output layer (y1, y2, y3), with weights between layers]
• Feed-forward networks
  – Perceptron and multilayer perceptron
  – Radial basis function
  – Support vector machine
• Recurrent networks
  – Hopfield networks
  – Boltzmann machine
[Figure: recurrent network — nodes x1, …, xn with weights and outputs y1, …, yn]
Neural Networks
• Multilayer perceptron
How does an MLP form nonlinear separations?
[Figure: MLP with input layer (x1, …, xn), hidden layer (v1, …, vnh), and output layer (y1, y2, y3)]

φ(v) = 1 / (1 + e^(−v))

Key points:
• Each hidden node forms a linear separating boundary;
• An output node is a combination of all hidden nodes, in effect forming a piecewise linear (or nonlinear) separation boundary.
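The key points can be illustrated with a tiny MLP whose weights are set by hand (hypothetical, untrained values chosen for the demo): each hidden node gives one linear boundary, and the output combines them into an XOR-like nonlinear region.

```python
import numpy as np

def sigmoid(v):
    # The activation from the slide: phi(v) = 1 / (1 + e^(-v))
    return 1.0 / (1.0 + np.exp(-v))

# Hand-set weights: each hidden node realises one linear separating boundary
W1 = np.array([[ 20.0,  20.0],    # hidden node 1 fires when x1 + x2 > 0.5
               [-20.0, -20.0]])   # hidden node 2 fires when x1 + x2 < 1.5
b1 = np.array([-10.0, 30.0])
W2 = np.array([20.0, 20.0])       # output fires only when BOTH hidden nodes fire
b2 = -30.0

def mlp(x):
    h = sigmoid(W1 @ x + b1)      # two linear boundaries
    return sigmoid(W2 @ h + b2)   # piecewise combination -> nonlinear boundary

for x in ([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0]):
    print(x, round(float(mlp(np.array(x)))))
# (0,0) and (1,1) map to 0; (1,0) and (0,1) map to 1 - an XOR separation
```

A single perceptron cannot separate XOR; combining two linear boundaries through the hidden layer can.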
[email protected], Deep Learning, Big Data
Deep Neural Networks
• CNN (Convolutional Neural Network) or ConvNet:
  – feature layers (convolutional filters to extract features)
  – pooling/subsampling layers (summarise or abstract the responses of the filters, e.g. mean or max pooling)
  – typical CNNs: LeNet5, AlexNet, VGG16, GoogLeNet
CNN LeNet5 Architecture (LeCun & Bottou 1998)
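A minimal numpy sketch of a convolutional feature layer followed by max pooling (the edge filter and toy image are illustrative assumptions, not LeNet5's actual learned filters):

```python
import numpy as np

def conv2d_valid(img, kern):
    # "Valid" 2-D correlation of one image with one convolutional filter
    kh, kw = kern.shape
    return np.array([[np.sum(img[i:i + kh, j:j + kw] * kern)
                      for j in range(img.shape[1] - kw + 1)]
                     for i in range(img.shape[0] - kh + 1)])

def max_pool2x2(fm):
    # 2x2 max pooling: summarises/abstracts the filter responses
    h, w = fm.shape
    fm = fm[:h - h % 2, :w - w % 2]              # trim odd edges
    return fm.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.zeros((8, 8))
img[:, 4] = 1.0                                  # a vertical edge
vert = np.array([[-1.0, 1.0],
                 [-1.0, 1.0]])                   # vertical-edge filter
fm = conv2d_valid(img, vert)                     # 7x7 feature map
pooled = max_pool2x2(fm)                         # 3x3 pooled summary
```

The feature map responds strongly only where the filter's pattern (a vertical edge) appears, and pooling keeps that strong response while discarding its exact position.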
Deep Neural Networks
• Deep Recurrent or Belief Networks
RBM (Restricted Boltzmann Machine): a stochastic NN with layers of both visible and hidden (latent) nodes, modelling the probabilistic relations between inputs and latent variables.
Image courtesy of deeplearning4j.org
Hinton & Salakhutdinov, 2006
Δwij = η(⟨vi hj⟩_data − ⟨vi hj⟩_reconstruction)
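The update rule above is the contrastive-divergence (CD-1) rule; a minimal numpy sketch, with bias terms omitted and the layer sizes, batch and learning rate chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
nv, nh, eta = 6, 3, 0.1                 # visible units, hidden units, learning rate
W = 0.01 * rng.normal(size=(nv, nh))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

v0 = rng.integers(0, 2, size=(10, nv)).astype(float)   # batch of visible vectors

# Positive phase: <v_i h_j>_data
ph0 = sigmoid(v0 @ W)
h0 = (rng.random(ph0.shape) < ph0).astype(float)       # sample hidden states
# One Gibbs step back: <v_i h_j>_reconstruction
pv1 = sigmoid(h0 @ W.T)
ph1 = sigmoid(pv1 @ W)
# CD-1 update: dW_ij = eta * (<v_i h_j>_data - <v_i h_j>_reconstruction)
W += eta * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
```

Stacking RBMs trained this way, layer by layer, is the greedy pre-training used by Hinton & Salakhutdinov (2006).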
[email protected], Deep Learning, Big Data
Deep Neural Networks
• Deep NNs, or deep learning, have demonstrated significant improvements over conventional shallow NNs (with one or no hidden layer) in an increasing number of real-world applications such as image/object recognition.
• Training deep NNs takes much longer due to the many layers (and the vanishing-gradient problem) and often requires GPUs.
• Deep learning is making a great impact in general AI, gaming, robotics/autonomous systems, and many other fields, and will thrive in the next few years.
Deep Representation
Visualizing Features of ConvNet (AlexNet):
From Zeiler & Fergus, ECCV2014
[email protected], Deep Learning, Big Data
Deep Representation
Visualizing Features of ConvNet (AlexNet):
From Zeiler & Fergus, ECCV2014
Manifold
[Word cloud: topological space, differential geometry, Riemannian metric, Riemannian manifold, curvature, metric, topology, neighbourhood, geodesic, intrinsic invariance, metric space, tensor, surface, tangent space, topological manifold, differentiable manifold, dimensionality reduction, set, Mannigfaltigkeit (German: manifold), chart, 流形 (Chinese: manifold), data visualisation, retinotopic mapping]
[email protected], Deep Learning, Big Data
Manifold
Intuitively, a manifold is a generalization of curves and surfaces to higher dimensions. It is locally Euclidean in that every point has a neighborhood, called a chart, homeomorphic to an open subset of Rn. The coordinates on a chart allow one to carry out computations as though in a Euclidean space, so that many concepts from Rn, such as differentiability, point-derivations, tangent spaces, and differential forms, carry over to a manifold.
L. Tu “An Introduction to Manifolds”
Manifold
A manifold is a topological space that is locally Euclidean
• Manifold learning is an approach to machine learning that capitalises on the manifold hypothesis: the data-generating distribution is assumed to concentrate near regions of low dimensionality.
• The use of the term manifold in machine learning is much looser than its use in mathematics: (i) the data may not be strictly on the manifold, but only near it; (ii) the dimensionality may not be the same everywhere; (iii) the notion actually referred to in machine learning naturally extends to discrete spaces.
Y. Bengio, I. Goodfellow, A. Courville
The “Deep Learning Book”, 2015 version
[email protected], Deep Learning, Big Data
Manifold
A manifold is a topological space that is locally Euclidean
Y. Bengio, I Goodfellow, A. Courville
The “Deep Learning Book”, 2015 version
Manifold
A manifold is a topological space that is locally Euclidean
Y. Bengio, I Goodfellow, A. Courville
The “Deep Learning Book”, 2015 version
[email protected], Deep Learning, Big Data
Learning Data Manifold
• Examples – toy data
From H. Yin, Neural Networks, 2008
LLE
gViSOM on Swiss-roll data
Isomap
Two-dimensional Isomap embedding (with neighborhood graph).
gViSOM with LLP
Manifold & Data Variations
• Examples – images (Huang & Yin, Img. & Vis. Comp. 2012)
Lighting (YaleB Database)
Expression, Lighting, Occlusion & Lighting (AR Database)
[email protected], Deep Learning, Big Data
Autoassociative NNs/Autoencoder
• Autoassociative Neural Networks (Kramer, AIChE, 1991)
Diagram from Z. Sadough et al., J. Eng. Gas Turbines Power, 2014
PCA:  T = V^T Y,  Ŷ = VT
Y: data matrix; V: eigenvector matrix; T: principal components; Ŷ: reconstruction
Autoassociative NN:  T = G(Y),  Ŷ = H(T)
min E(‖Y − Ŷ‖) via back-propagation (self-supervised learning)
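Kramer's point can be sketched with a linear autoassociative network trained by gradient descent on the reconstruction error (a minimal sketch; the sizes, learning rate and iteration count are assumptions): with linear G and H and squared error, the bottleneck recovers the PCA subspace.

```python
import numpy as np

rng = np.random.default_rng(5)
# Rank-2 data with a little noise; columns of Y are samples, as before
n, N, m = 5, 300, 2
Y = rng.normal(size=(n, m)) @ rng.normal(size=(m, N))
Y += 0.01 * rng.normal(size=(n, N))
Y -= Y.mean(axis=1, keepdims=True)

G = 0.1 * rng.normal(size=(m, n))     # encoder (mapping layer),  T = G(Y)
H = 0.1 * rng.normal(size=(n, m))     # decoder (demapping layer), Yhat = H(T)
lr = 0.005
for _ in range(3000):
    T = G @ Y                         # bottleneck scores
    E = H @ T - Y                     # reconstruction error to minimise
    H -= lr * E @ T.T / N             # gradient steps = back-propagation
    G -= lr * H.T @ E @ Y.T / N
err = np.mean((H @ (G @ Y) - Y) ** 2)
```

After training, the reconstruction error drops to near the noise floor: the two-unit bottleneck captures the same subspace that the top two principal components would.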
Deep Autoencoder (DAE)
• Deep Autoencoder (Hinton & Salakhutdinov, Science, 2006)
[email protected], Deep Learning, Big Data
Deep Autoencoder (DAE)
• Deep Autoencoder (Hinton & Salakhutdinov, Science, 2006)
Deep Autoencoder (DAE)
• Deep Autoencoder (Hinton & Salakhutdinov, Science, 2006)
[email protected], Deep Learning, Big Data
Variational Autoencoder (VAE)
• VAEs (Kingma & Welling, 2014, 2015); from the tutorial by Jaan Altosaar: http://jaan.io
low-dimensional/latent space is stochastic
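The stochastic latent space can be illustrated with the reparameterisation trick used to train VAEs (the mean and log-variance values here are hypothetical encoder outputs, not from a trained model):

```python
import numpy as np

rng = np.random.default_rng(6)
# Hypothetical encoder outputs for one input: a mean and a log-variance
# per latent dimension
mu = np.array([0.5, -1.0])
log_var = np.array([-2.0, 0.0])

# Reparameterisation trick: z = mu + sigma * eps with eps ~ N(0, I), which
# keeps the sampling step differentiable w.r.t. mu and log_var during training
eps = rng.normal(size=(10000, 2))
z = mu + np.exp(0.5 * log_var) * eps

# KL term of the VAE loss against the N(0, I) prior (closed form)
kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
```

Each encoding is thus a distribution over the latent space rather than a single point, which is what makes sampling new outputs from the decoder meaningful.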
Deep Neural Networks
• ConvNets + VAEs (Brock, et al, arXiv, 2016)
ConvNets as VAEs
for voxel/3D object modelling with RGB-D data
[email protected], Deep Learning, Big Data
Deep Neural Networks
• ConvNets + VAEs (Brock, et al, arXiv, 2016)
… the samples consistently bear a semblance of structure, with few to no free floating voxels,
suggesting that the decoder network has learned to maintain output voxel connectivity regardless of
the latent configuration. The major limitation of the VAE is that its generated samples do not, however,
resemble real objects ….
Deep Representation
from Hankins, Peng & Yin, WCCI2018
• Unsupervised Learning DNN Features
[email protected], Deep Learning, Big Data
Deep Representation
from Hankins, Peng & Yin, WCCI2018
• Unsupervised Learning DNN Features
Deep Representation
from Peng & Yin, IDEAL2017
• Pre-generated DNN Early Features
[email protected], Deep Learning, Big Data
Some Recent Studies
• ConvNet with BGP on LFW (Huang & Yin, Pattern Recognition, 2017)
• Video synthesis (demo)
Team
Previous PhDs and Post-Docs/Research Associates:
• Dr Shireen Zaki (Astro Malaysia)
• Dr Yicun Ouyang (Shenzhen)
• Dr Aftab Khan (NUST)
• Dr James Burnstone (eLucid mHealth)
• Dr Weilin Huang (Oxford)
• Dr Zareen Mehboob (Surrey)
• Dr He Ni (Zhejiang Univ)
• Dr Israr Hussain (Gov.)
• Dr Lei A. Clifton (Oxford)
• Dr Carla Möller-Levet (Surrey)
• MPhil Swapna Sarvesvaran
• Dr Richard Freeman (Michael Page)
• Dr James Burstone (CEO, eLucid)
• Dr King-wai Lau (UCL)
• Dr Qingfu Zhang (Essex)
• Dr Michel Haritopoulos (France)
• Ann Gledson
• Ben Russell
• Dr Bicheng Li
• Dr Bruno Baruque (Burgos)
• Dr Jose A. Costa (Natal)
• Dr Lianxiang Zhu
• Dr Xiaohong Chen (NUAA, CSC)
Current PhDs & post-docs:
• Ananya Gupta, Deep Learning for Object & Volumetric Estimate
• Yao Peng, Deep Learning in Face Expression Recognition
• Ali Alsuwaidi, Hyperspectral Image Classification
• Richard Hankins, Action Recognition in Robotics
• Jingwen Sun, DNN Structure Optimization
• Dr Qing Tian, Multi-view Ordinal CCA (NUIST)
[email protected], Deep Learning, Big Data
Summary
• Data representations (features) are important to any follow-on classification, recognition and modelling tasks
• The manifold hypothesis says high-dimensional data lie on low-dimensional submanifolds
• Deep learning features extend linear manifold/sub-manifold representations in a multiple and hierarchical fashion
• Understanding data representation can help build/design better classifiers or analytic tools