

Fundamental Supervised Machine Learning Models
Kent Gauen and Dr. Xiao Wang

Department of Statistics, Purdue University, West Lafayette, IN

Introduction

- Commonplace tool for solving complex tasks such as natural language processing and image classification (de Freitas (2015), Ng, Hinton (2012))
- Significant tool in bioinformatics, biology, quality control, AI applications, data compression, and many more
- Deep learning outperforms people on "human" skills, such as the board game Go and some Atari games like Space Invaders
- Fill the gap of understanding between undergraduates in their research and the machine learning tools they use
- Investigate the performance of various models to distinguish their strengths and weaknesses

Basics

- Model: the system of weights and bias terms used to convert input into output
- Cost function (criterion): quantitatively measures the performance of the model, "how well it does"
- Training: the process of learning model weights and bias terms so that they minimize the cost function (see the gradient-descent sketch below)
- Generalization: how well a model performs on new data
- Over-fitting: low cost on the data used for training, but large cost on new data
- Regularization: discourages over-fitting
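As a concrete illustration of training, the sketch below fits a one-weight linear model by plain gradient descent on a mean-squared-error cost; the toy data, learning rate, and step count are invented for the example and are not from the poster.

```python
import numpy as np

# Toy 1-D data: y is roughly 2*x + 1 plus noise (invented for the example).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=100)

w, b = 0.0, 0.0        # model: y_hat = w * x + b (one weight, one bias term)
lr = 0.1               # learning rate (arbitrary choice)

for step in range(200):
    y_hat = w * x + b
    cost = np.mean((y_hat - y) ** 2)          # cost function (criterion)
    grad_w = np.mean(2 * (y_hat - y) * x)     # gradient of the cost w.r.t. w
    grad_b = np.mean(2 * (y_hat - y))         # gradient of the cost w.r.t. b
    w -= lr * grad_w                          # training: step downhill on the cost
    b -= lr * grad_b

print(f"learned w = {w:.2f}, b = {b:.2f}, final cost = {cost:.4f}")
```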

Data: Training, Cross Validation, Testing

The data set is split into 3 sets: training set, cross validation set, and testing set.
General data-set split: 80% / 10% / 10% (a split sketch follows below)
- Training set: tunes model parameters
- Cross validation set: regularizes the model to prevent over-fitting
- Testing set: tests the generalization of the model
Data pairs: $(\vec{x}_i, \vec{y}_i)$ for the $i$-th example out of $m$ total in the data set
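A minimal sketch of the 80% / 10% / 10% split, assuming the examples are already loaded as NumPy arrays X and Y; the array names, shapes, and random seed are placeholders, not the poster's code.

```python
import numpy as np

def split_dataset(X, Y, seed=0):
    """Shuffle and split (X, Y) into 80% training, 10% cross validation, 10% testing."""
    m = X.shape[0]
    idx = np.random.default_rng(seed).permutation(m)
    n_train, n_cv = int(0.8 * m), int(0.9 * m)
    train, cv, test = idx[:n_train], idx[n_train:n_cv], idx[n_cv:]
    return (X[train], Y[train]), (X[cv], Y[cv]), (X[test], Y[test])

# Example with placeholder data shaped like MNIST vectors.
X = np.random.rand(1000, 784)
Y = np.random.randint(0, 10, size=1000)
train_set, cv_set, test_set = split_dataset(X, Y)
print(train_set[0].shape, cv_set[0].shape, test_set[0].shape)  # (800, 784) (100, 784) (100, 784)
```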

MNIST Data set

- Little to no pre-processing required (thanks to Lecun et al. (1998))
- Straightforward task: what class is this digit?
- Non-trivial task: issues of scale and location invariance
- Common performance benchmark for new image classification methods

$\vec{x}_i$: a $1 \times 28^2$ (i.e. $1 \times 784$) input vector
$\vec{y}_i$: a $1 \times k$ label vector with $k = 10$, where $y_{i,j} \in \{0, 1\}$ and $\sum_{\forall j} y_{i,j} = 1$
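The sketch below shows how a 28 × 28 digit becomes a 1 × 784 input vector and how an integer class label becomes a one-hot 1 × 10 label vector; the image here is random placeholder data rather than an actual MNIST digit.

```python
import numpy as np

image = np.random.rand(28, 28)      # placeholder for one 28 x 28 MNIST digit
label = 7                           # placeholder class label

x_i = image.reshape(1, 28 * 28)     # 1 x 784 input vector

k = 10
y_i = np.zeros((1, k))              # 1 x 10 one-hot label vector
y_i[0, label] = 1                   # entries lie in {0, 1} and sum to 1

assert x_i.shape == (1, 784) and y_i.sum() == 1
```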

Acknowledgments

The research of the authors is supported by NSF grant DMS #1246818.

Neural Networks

[Figure: a feed-forward neural network with a 4-node input layer, two 5-node hidden layers, and a 2-node prediction layer.]

- Characterized by model structure and cost function
- Non-linear activation functions create the complex decision boundaries characteristic of neural network models
- $\sum = w_{0j} + w_{1j} x_1 + w_{2j} x_2 + \dots + w_{nj} x_n$ and $o_j = \phi\!\left(\sum\right)$, where $o_j$ is the output of the $j$-th node in a layer (see the sketch below)
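A minimal sketch of that node computation for one fully connected layer: each node j forms the weighted sum w_{0j} + Σ_p w_{pj} x_p and passes it through a non-linear activation φ. The tanh activation and the random weights are assumptions made for the example.

```python
import numpy as np

def layer_forward(x, W, b, phi=np.tanh):
    """One layer: o_j = phi(b_j + sum_p W[p, j] * x[p]) for every node j."""
    s = b + x @ W          # weighted sum (plus bias) for each node in the layer
    return phi(s)          # non-linear activation shapes the decision boundary

# Example sized like the diagram above: 4 inputs feeding a 5-node hidden layer.
x = np.random.rand(4)
W = 0.1 * np.random.randn(4, 5)
b = np.zeros(5)
o = layer_forward(x, W, b)
print(o.shape)             # (5,)
```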

Logistic Regression

[Figure: logistic regression drawn as a single-layer network: a 6-node input layer connected directly to a 4-node prediction layer, which feeds the cost.]

Model and Cost Function

Cross entropy:
$$\Theta(\vec{x}_i, \vec{y}_i) = -\sum_{i=1}^{m} \sum_{j=1}^{k} y_{i,j} \ln(\pi_{i,j}) + \lambda \lVert \theta_j \rVert^2$$

Softmax:
$$\Theta(\vec{x}_i) = \pi_{i,j} = \frac{e^{-\theta_j^T \vec{x}_i}}{\sum_j e^{-\theta_j^T \vec{x}_i}}$$

Sigmoid function:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

- "Compresses" output to the interval [0, 1]
- Enables a probabilistic interpretation of discrete functions, such as "yes" or "no" in a binomial setting
- Foundation for classification and feature detection
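A sketch of the softmax and cross-entropy formulas above for a single example, written to follow the poster's sign convention with exp(-θ_j^T x_i); the weight initialization, the small stabilizing constants, and the value of λ are assumptions made for the example.

```python
import numpy as np

def softmax_probs(theta, x):
    """pi_j = exp(-theta_j . x) / sum_j exp(-theta_j . x), as in the formula above."""
    z = -(theta @ x)
    z = z - z.max()                    # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(pi, y, theta, lam=1e-3):
    """-sum_j y_j * ln(pi_j) plus an L2 penalty on the weights."""
    return -np.sum(y * np.log(pi + 1e-12)) + lam * np.sum(theta ** 2)

# Example: k = 10 classes, 784-dimensional input, as in the MNIST setup.
theta = 0.01 * np.random.randn(10, 784)
x = np.random.rand(784)
y = np.zeros(10)
y[3] = 1                               # one-hot label

pi = softmax_probs(theta, x)
print(cross_entropy(pi, y, theta))
```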

Convolutional Neural Network

[Figure: an input image is convolved with a filter to give a convolved image, which pooling then down-samples to a smaller pooled image (blocks A, B, C, D of the convolved image map to single entries of the pooled image).]

- A neural network which includes convolutional layers
- Filters (a 3 × 3 filter in the figure above) convolve across the input image
- A type of feature extraction, reducing the need for hand-engineered features
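A minimal sketch of one convolution followed by one 2 × 2 max-pooling step, written with plain NumPy loops for clarity; the image and filter values are placeholders, and, like most deep-learning libraries, the "convolution" is implemented as a cross-correlation.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small filter across the image (valid mode, stride 1)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(image, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    H, W = image.shape
    H, W = H - H % size, W - W % size
    blocks = image[:H, :W].reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3))

image = np.random.rand(28, 28)     # placeholder input image
kernel = np.random.randn(3, 3)     # 3 x 3 filter, as in the figure
pooled = max_pool(convolve2d(image, kernel))
print(pooled.shape)                # (13, 13)
```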

Support Vector Machine

[Figure: two classes of points in the (x1, x2) plane separated by a max-margin linear decision boundary.]
- Max-margin classifier: the "best" linear decision boundary
- Kernel trick: transforming data into a linearly separable space
- Hinge loss: allows for some misclassification if the data classes overlap

Hinge loss:
$$\Theta = \frac{1}{n} \sum_{i=1}^{n} \max\!\left(0,\; 1 - y_i \left(\vec{\theta} \cdot \vec{x}_i - b\right)\right) + \lambda \lVert \vec{\theta}_j \rVert^2$$

Kernel trick: $\vec{x}_i \rightarrow l_i$ and
$$f_j = \exp\!\left(-\frac{\lVert \vec{x}_i - l_j \rVert^2}{2\sigma^2}\right) \quad \text{[Gaussian]}$$
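A sketch of the hinge loss and Gaussian kernel features above, assuming labels y_i in {-1, +1}; the data, landmark choice, σ, and λ are placeholders for illustration.

```python
import numpy as np

def hinge_loss(theta, b, X, y, lam=1e-3):
    """(1/n) * sum_i max(0, 1 - y_i * (theta . x_i - b)) + lambda * ||theta||^2."""
    margins = y * (X @ theta - b)
    return np.mean(np.maximum(0.0, 1.0 - margins)) + lam * np.sum(theta ** 2)

def gaussian_features(X, landmarks, sigma=1.0):
    """f_j = exp(-||x_i - l_j||^2 / (2 sigma^2)) for every example / landmark pair."""
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Example: 100 two-dimensional points with labels in {-1, +1} and 5 landmarks.
X = np.random.randn(100, 2)
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
F = gaussian_features(X, landmarks=X[:5])   # kernel-trick feature map
theta = np.zeros(F.shape[1])
print(hinge_loss(theta, 0.0, F, y))
```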

Model Fitness Overview

[Figure: four panels (Logistic Regression, Support Vector Machine*, Multi-Layer Perceptron, Convolutional Neural Network), each plotting Log[Cost(x)] against the number of epochs for the training and cross validation sets.]

Results on MNIST

Model | Training Accuracy | Testing Accuracy | CV Accuracy | # Misclassified
Logit | 91.25% | 88.79% | 88.13% | 6683
SVM   | 99.00% | 96.69% | 96.29% | 1202
MLP   | 99.12% | 96.74% | 96.47% | 1119
CNN   | 99.99% | 98.74% | 98.69% | 307

Accuracy = (# correctly classified) / (# in data subset)
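For reference, a minimal sketch of how the accuracy and misclassification count in the table follow from the predicted and true labels; the label arrays here are random placeholders.

```python
import numpy as np

# Placeholder predicted and true labels for one data subset.
y_true = np.random.randint(0, 10, size=10000)
y_pred = np.random.randint(0, 10, size=10000)

accuracy = np.mean(y_pred == y_true)            # (# correctly classified) / (# in subset)
n_misclassified = int(np.sum(y_pred != y_true))
print(f"accuracy = {accuracy:.2%}, misclassified = {n_misclassified}")
```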

References

N. de Freitas. Machine learning, 2015. URL https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/.

G. Hinton. Neural networks for machine learning, June 2012. URL https://class.coursera.org/neuralnets-2012-001.

Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, pages 2278–2324, 1998.

A. Ng. Machine learning. URL https://www.coursera.org/learn/machine-learning.