
Practical Deep Neural Networks
GPU computing perspective

Introduction

Yuhuang Hu Chu Kiong Loo

Advanced Robotic Lab
Department of Artificial Intelligence
Faculty of Computer Science & IT

University of Malaya


Outline

1 Introduction

2 Linear Algebra

3 Dataset Preparation
  – Data Standardization
  – Principal Components Analysis
  – ZCA Whitening


Introduction


Objectives

• A light introduction to numerical computation.

• Fundamentals of Machine Learning.

• Softmax Regression.

• Feed-forward Neural Networks.

• Convolutional Networks.

• Recurrent Neural Networks.


Prerequisites

• Basic training in Calculus

• Basic training in Linear Algebra
  – Matrix operations
  – Matrix properties: transform, rank, norm, determinant, etc.
  – Eigendecomposition, Singular Value Decomposition

• Basic programming skills
  – If-else conditions
  – Loops
  – Functions, classes, libraries
  – Source code control: Git (optional)


References

• Deep Learning: an MIT Press book in preparation. The main reference for this workshop; still in development, with an awesome structure and awesome contents.

• Machine Learning: A Probabilistic Perspective. One of the best machine learning books on the market.

• CS231n Convolutional Neural Networks for Visual Recognition Notes. Nicely structured, well written, loads of pictures.

• CS229 Machine Learning Course Materials. For basic knowledge; well written and easy to read.


Software and Tools

• Ubuntu 14.04

• CUDA Toolkit 7

• Python 2.7.9 (Why not 3.*?)

• Theano

• numpy, scipy, etc.

• Eclipse+PyDev
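As a quick sanity check of this stack, a minimal sketch (assuming Theano and numpy are already installed) that prints the library versions and confirms Theano can compile a trivial symbolic function:

    import numpy
    import theano
    import theano.tensor as T

    print("numpy version: %s" % numpy.__version__)
    print("theano version: %s" % theano.__version__)

    # Compile a trivial symbolic function to confirm Theano works.
    x = T.dscalar('x')
    f = theano.function([x], x ** 2)
    print("f(3.0) = %f" % f(3.0))  # expected: 9.000000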


Reading List

A reading list has been prepared for this workshop; all reading materials can be found at: http://rt.dgyblog.com/ref/ref-learning-deep-learning.html

The list keeps expanding!!


Linear Algebra


Scalar, Vector, Matrix and Tensor

Scalars: A scalar is a single number.

Vectors: A vector is an array of numbers.

Matrices: A matrix is a 2-D array of numbers.

Tensors: A tensor is an array of numbers arranged on a regular grid with a variable number of axes.
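As a minimal illustration in numpy (the array library used throughout this workshop), all four are ndarrays distinguished only by their number of axes:

    import numpy as np

    s = np.float64(3.0)                 # scalar: a single number
    v = np.array([1.0, 2.0, 3.0])       # vector: 1 axis, shape (3,)
    M = np.array([[1.0, 2.0],
                  [3.0, 4.0]])          # matrix: 2 axes, shape (2, 2)
    T3 = np.zeros((2, 3, 4))            # tensor: 3 axes, shape (2, 3, 4)

    print(v.ndim, M.ndim, T3.ndim)      # 1 2 3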


Matrix Operations and Properties

Identity: $I_n \in \mathbb{R}^{n \times n}$, $IA = AI = A$.

Transpose: $(A^\top)_{ij} = A_{ji}$.

Addition: $C = A + B$ where $C_{ij} = A_{ij} + B_{ij}$.

Scalar: $D = a \cdot B + c$ where $D_{ij} = a \cdot B_{ij} + c$.

Multiplication: $C = AB$ where $C_{ij} = \sum_k A_{ik} B_{kj}$.

Element-wise product: $C = A \odot B$ where $C_{ij} = A_{ij} B_{ij}$.

Distributivity: $A(B + C) = AB + AC$.

Associativity: $A(BC) = (AB)C$.

Transpose of a matrix product: $(AB)^\top = B^\top A^\top$.
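A short numpy sketch checking a few of these identities numerically (illustrative only; the names are arbitrary):

    import numpy as np

    rng = np.random.RandomState(0)
    A = rng.rand(3, 3)
    B = rng.rand(3, 3)
    C = rng.rand(3, 3)

    # Distributivity: A(B + C) = AB + AC
    print(np.allclose(np.dot(A, B + C), np.dot(A, B) + np.dot(A, C)))
    # Associativity: A(BC) = (AB)C
    print(np.allclose(np.dot(A, np.dot(B, C)), np.dot(np.dot(A, B), C)))
    # Transpose of a product: (AB)^T = B^T A^T
    print(np.allclose(np.dot(A, B).T, np.dot(B.T, A.T)))
    # Element-wise (Hadamard) product is just `*` on arrays
    print(np.allclose(A * B, B * A))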


Inverse Matrices

The matrix inverse of $A$ is denoted $A^{-1}$, and it is defined as the matrix such that

$$A^{-1}A = AA^{-1} = I_n$$

provided $A$ is not singular. If $A$ is not square, or is square but singular, it is still possible to find its generalized inverse, or pseudo-inverse.
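In numpy this corresponds to np.linalg.inv for square non-singular matrices and np.linalg.pinv for the Moore-Penrose pseudo-inverse; a minimal sketch:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])           # square and non-singular
    A_inv = np.linalg.inv(A)
    print(np.allclose(np.dot(A_inv, A), np.eye(2)))       # True

    B = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])      # non-square: use the pseudo-inverse
    B_pinv = np.linalg.pinv(B)
    print(np.allclose(np.dot(B, np.dot(B_pinv, B)), B))   # B B+ B = B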


Norms

We usually measure the size of vectors using an $L^p$ norm:

$$\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$$

There are a few commonly used norms:

$L^1$ norm: $\|x\|_1 = \sum_i |x_i|$.

$L^\infty$ (max norm): $\|x\|_\infty = \max_i |x_i|$.

Frobenius norm: measures the size of a matrix, analogous to the $L^2$ norm:

$$\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2}$$
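All of these norms are available through np.linalg.norm; a quick sketch:

    import numpy as np

    x = np.array([3.0, -4.0])
    print(np.linalg.norm(x, 1))        # L1 norm: 7.0
    print(np.linalg.norm(x, 2))        # L2 norm: 5.0
    print(np.linalg.norm(x, np.inf))   # max norm: 4.0

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    print(np.linalg.norm(A, 'fro'))    # Frobenius norm: sqrt(30)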


Unit vector, Orthogonal, Orthonormal, Orthogonal Matrix

A unit vector is a vector with unit norm:

$$\|x\|_2 = 1$$

Vectors $x$ and $y$ are orthogonal to each other if $x^\top y = 0$. In $\mathbb{R}^n$, at most $n$ vectors with nonzero norm may be mutually orthogonal. If the vectors are not only orthogonal but also have unit norm, we call them orthonormal.

An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal:

$$A^\top A = AA^\top = I$$

This implies that $A^{-1} = A^\top$.
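One easy way to obtain an orthogonal matrix numerically is the Q factor of a QR decomposition; a minimal sketch verifying the properties above:

    import numpy as np

    rng = np.random.RandomState(0)
    Q, _ = np.linalg.qr(rng.rand(4, 4))   # Q is orthogonal

    print(np.allclose(np.dot(Q.T, Q), np.eye(4)))   # Q^T Q = I
    print(np.allclose(np.linalg.inv(Q), Q.T))       # Q^{-1} = Q^T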


Eigendecomposition

An eigenvector of a square matrix $A$ is a non-zero vector $v$ such that

$$Av = \lambda v$$

where the scalar $\lambda$ is known as the eigenvalue corresponding to that eigenvector. If $v$ is an eigenvector of $A$, so is any rescaled vector $sv$ for $s \in \mathbb{R}$, $s \neq 0$. Therefore, we usually look only for unit eigenvectors.

We can represent the matrix $A$ using an eigendecomposition:

$$A = V \operatorname{diag}(\lambda) V^{-1}$$

where $V = [v^{(1)}, v^{(2)}, \ldots, v^{(n)}]$ (one column per eigenvector) and $\lambda = [\lambda_1, \ldots, \lambda_n]$.

Every real symmetric matrix can be decomposed into

$$A = Q \Lambda Q^\top$$

where $Q$ is an orthogonal matrix composed of eigenvectors of $A$, and $\Lambda$ is a diagonal matrix with $\Lambda_{ii}$ being the eigenvalues.
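In numpy, np.linalg.eig handles general square matrices and np.linalg.eigh the symmetric case; a small sketch of the symmetric decomposition:

    import numpy as np

    rng = np.random.RandomState(0)
    M = rng.rand(3, 3)
    A = (M + M.T) / 2.0                  # make a real symmetric matrix

    lam, Q = np.linalg.eigh(A)           # eigenvalues (ascending) and eigenvectors
    # A = Q diag(lam) Q^T
    print(np.allclose(np.dot(Q, np.dot(np.diag(lam), Q.T)), A))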


Singular Value Decomposition (SVD)

SVD is a general-purpose matrix factorization method that decomposes an $n \times m$ matrix $X$ into

$$X = U \Sigma W^\top$$

where $U$ is an $n \times n$ matrix whose columns are mutually orthonormal, $\Sigma$ is a rectangular diagonal matrix, and $W$ is an $m \times m$ matrix whose columns are mutually orthonormal. The elements along the main diagonal of $\Sigma$ are referred to as the singular values. The columns of $U$ and $W$ are referred to as the left-singular vectors and right-singular vectors of $X$, respectively.

• The left-singular vectors of $X$ are the eigenvectors of $XX^\top$.

• The right-singular vectors of $X$ are the eigenvectors of $X^\top X$.

• The non-zero singular values of $X$ are the square roots of the non-zero eigenvalues of both $XX^\top$ and $X^\top X$.
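A minimal numpy sketch relating np.linalg.svd to the eigendecompositions above:

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.rand(4, 3)

    U, s, Wt = np.linalg.svd(X)               # X = U S W^T
    S = np.zeros((4, 3))
    S[:3, :3] = np.diag(s)
    print(np.allclose(np.dot(U, np.dot(S, Wt)), X))   # reconstruction holds

    lam = np.linalg.eigvalsh(np.dot(X.T, X))  # eigenvalues of X^T X (ascending)
    print(np.allclose(np.sort(s ** 2), lam))  # squared singular values = eigenvalues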


Trace, Determinant

The trace operator is defined as

$$\operatorname{Tr}(A) = \sum_i A_{ii}$$

The Frobenius norm can be written with the trace operator:

$$\|A\|_F = \sqrt{\operatorname{Tr}(A^\top A)}$$

The determinant of a square matrix, denoted $\det(A)$, is a function mapping matrices to real scalars. The determinant is equal to the product of the matrix's eigenvalues.
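Checking both identities numerically with np.trace and np.linalg.det (a small sketch):

    import numpy as np

    rng = np.random.RandomState(0)
    A = rng.rand(3, 3)

    fro = np.sqrt(np.trace(np.dot(A.T, A)))
    print(np.allclose(fro, np.linalg.norm(A, 'fro')))       # Frobenius via trace

    eigvals = np.linalg.eigvals(A)
    print(np.allclose(np.prod(eigvals), np.linalg.det(A)))  # det = product of eigenvalues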


Dataset Preparation


MNIST

The MNIST dataset contains 60,000 training samples and 10,000 testing samples (LeCun, Bottou, Bengio, & Haffner, 1998). Each sample is a handwritten digit from 0-9.


CIFAR-10

The CIFAR-10 dataset consists of 60,000 images in 10 classes (Krizhevsky, 2009). There are 50,000 training images and 10,000 testing images.


Dataset: How to split the data?

• If the dataset has already been split, keep it that way.

• If the dataset comes as a whole, a common choice is 60% as training data, 20% as cross-validation data, and 20% as testing data (see the sketch after this list).

• Another alternative is 70% as training data and 30% as testing data.
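A minimal numpy sketch of the 60/20/20 split; it assumes the data arrive as arrays X and y, and all names are arbitrary:

    import numpy as np

    def split_dataset(X, y, seed=42):
        """Shuffle, then split into 60% train, 20% validation, 20% test."""
        rng = np.random.RandomState(seed)
        idx = rng.permutation(len(X))
        n_train = int(0.6 * len(X))
        n_valid = int(0.8 * len(X))
        train, valid, test = idx[:n_train], idx[n_train:n_valid], idx[n_valid:]
        return (X[train], y[train]), (X[valid], y[valid]), (X[test], y[test])

    # Example with toy data:
    X = np.arange(100).reshape(100, 1).astype('float64')
    y = np.arange(100)
    (Xtr, ytr), (Xva, yva), (Xte, yte) = split_dataset(X, y)
    print(Xtr.shape, Xva.shape, Xte.shape)   # (60, 1) (20, 1) (20, 1)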


Dataset: Cross validation

• Cross-validation is used to track the performance of the learning algorithm (overfitting? underfitting?).

• Usually a small portion of the training dataset, randomly selected.

• Serves as testing data when the real testing data are not visible.

• Helps to choose and adjust the parameters of the learning model.


Data Standardization

Mean subtraction, Unit Variance

Let $\mu$ be the mean image:

$$\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$$

Then subtract the mean image from every image:

$$x^{(i)} := x^{(i)} - \mu$$

Let

$$\sigma_j^2 = \frac{1}{m} \sum_i \left( x_j^{(i)} \right)^2$$

and then

$$x_j^{(i)} := \frac{x_j^{(i)}}{\sigma_j}$$

This is also called PCA whitening if it is applied to the rotated data; the variances here can then be viewed as eigenvalues.
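A minimal numpy sketch of mean subtraction and unit variance, assuming a data matrix X of shape (m, n) with one example per row:

    import numpy as np

    def standardize(X, eps=1e-8):
        """Zero-mean, unit-variance standardization, feature-wise."""
        X = X - X.mean(axis=0)                  # subtract the mean image
        sigma = np.sqrt((X ** 2).mean(axis=0))  # per-feature standard deviation
        return X / (sigma + eps)                # eps guards against division by zero

    rng = np.random.RandomState(0)
    X = rng.rand(1000, 64) * 255.0              # e.g. 1000 tiny 8x8 images
    Xs = standardize(X)
    print(Xs.mean(axis=0).max(), Xs.std(axis=0).mean())  # ~0 and ~1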


Principal Components Analysis

PCA

Principal Components Analysis (PCA) is a dimension reduction algorithm that can be used to significantly speed up unsupervised feature learning algorithms. PCA finds a lower-dimensional subspace onto which to project our data.

Given a set of data $X = \{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, where $x^{(i)} \in \mathbb{R}^n$, the covariance is computed by

$$\Sigma = \frac{1}{m} X X^\top$$

Using eigendecomposition, we can obtain the eigenvectors $U$:

$$U = \begin{bmatrix} | & | & \cdots & | \\ u_1 & u_2 & \cdots & u_n \\ | & | & \cdots & | \end{bmatrix}$$

and the corresponding eigenvalues $\Lambda = \{\lambda_1, \ldots, \lambda_n\}$. Note that $\Lambda$ is usually sorted in descending order.


PCA

The columns of $U$ are the principal bases, so we can rotate our data onto the new bases:

$$X_{\text{rot}} = U^\top X$$

Note that $X = U X_{\text{rot}}$.

For a single data point $x$, we can reduce the dimension by keeping the first $k$ rotated coordinates and zeroing the rest:

$$\tilde{x} = \begin{bmatrix} x_{\text{rot},1} \\ \vdots \\ x_{\text{rot},k} \\ 0 \\ \vdots \\ 0 \end{bmatrix} \approx \begin{bmatrix} x_{\text{rot},1} \\ \vdots \\ x_{\text{rot},k} \\ x_{\text{rot},k+1} \\ \vdots \\ x_{\text{rot},n} \end{bmatrix} = x_{\text{rot}}$$

The resulting $\tilde{X}$ approximates the distribution of $X_{\text{rot}}$, since only the small components are dropped, and we have reduced the original data by $n - k$ dimensions.


PCA

We can recover an approximation $\hat{x}$ of the original signal from $\tilde{x}$:

$$\hat{x} = U \begin{bmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_k \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
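Putting the last three slides together, a minimal numpy sketch of PCA rotation, dimension reduction, and recovery, with data stored one example per column to match the formulas above:

    import numpy as np

    rng = np.random.RandomState(0)
    n, m, k = 10, 500, 3
    X = np.dot(rng.rand(n, k), rng.rand(k, m))   # toy data with low-rank structure
    X = X - X.mean(axis=1, keepdims=True)        # mean-subtract first

    Sigma = np.dot(X, X.T) / m                   # covariance, n x n
    lam, U = np.linalg.eigh(Sigma)               # eigh returns ascending order...
    lam, U = lam[::-1], U[:, ::-1]               # ...so flip to descending

    X_rot = np.dot(U.T, X)                       # rotate onto the principal bases
    X_tilde = X_rot.copy()
    X_tilde[k:, :] = 0.0                         # keep the first k components only
    X_hat = np.dot(U, X_tilde)                   # recover the approximation

    print(np.abs(X - X_hat).max())               # tiny: X is (nearly) rank k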


PCA: Number of components to retain

To determine $k$, we usually look at the percentage of variance retained:

$$p = \frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{n} \lambda_j} \qquad (1)$$

One common heuristic is to choose $k$ so as to retain 99% of the variance ($p \geq 0.99$).
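With the eigenvalues sorted in descending order, the smallest such k can be found with a cumulative sum; a sketch with a toy eigenvalue array:

    import numpy as np

    lam = np.array([6.0, 3.0, 0.5, 0.45, 0.05])  # toy eigenvalues, descending
    p = np.cumsum(lam) / np.sum(lam)             # variance retained for each k
    k = int(np.searchsorted(p, 0.99) + 1)        # smallest k with p >= 0.99
    print(k)                                     # 4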


ZCA Whitening

ZCA whitening is just a recovered (rotated-back) version of PCA whitening:

$$x_{\text{ZCA}} = U\, x_{\text{PCA}} = U \frac{x_{\text{rot}}}{\sqrt{\Lambda + \epsilon}}$$

where $\epsilon$ is a regularization term and the division is applied component-wise. When $x$ takes values around $[-1, 1]$, a value of $\epsilon \approx 10^{-5}$ might be typical.
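A minimal numpy sketch of ZCA whitening on mean-subtracted data, following the same one-example-per-column layout as the PCA sketch above:

    import numpy as np

    def zca_whiten(X, eps=1e-5):
        """ZCA-whiten mean-subtracted data X (one example per column)."""
        Sigma = np.dot(X, X.T) / X.shape[1]
        lam, U = np.linalg.eigh(Sigma)
        X_rot = np.dot(U.T, X)                              # rotate
        X_pca = X_rot / np.sqrt(lam + eps)[:, np.newaxis]   # PCA-whiten
        return np.dot(U, X_pca)                             # rotate back: ZCA

    rng = np.random.RandomState(0)
    X = rng.rand(10, 500)
    X = X - X.mean(axis=1, keepdims=True)
    Xz = zca_whiten(X)
    # Covariance of the whitened data is close to the identity:
    print(np.allclose(np.dot(Xz, Xz.T) / X.shape[1], np.eye(10), atol=1e-2))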


Q&A



References

Krizhevsky, A. (2009). Learning multiple layers of features from tiny images (Master's thesis). University of Toronto.

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
