Fundamental Supervised Machine Learning Models
Kent Gauen and Dr. Xiao Wang
Department of Statistics, Purdue University, West Lafayette, IN
Introduction
- Commonplace tool for solving complex tasks like natural language processing and image classification (de Freitas, 2015; Ng; Hinton, 2012)
- Significant tool in bioinformatics, biology, quality control, AI applications, data compression, and many more
- Deep learning outperforms people on "human" skills, such as the board game Go and some Atari games like Space Invaders
- Fill the gap between undergraduates' understanding and the machine learning tools they use in their research
- Investigate the performance of various models to distinguish their strengths and weaknesses
Basics
- Model: the system of weights and bias terms used to convert input into output
- Cost function (criterion): quantitatively measures the performance of the model, "how well it does"
- Training: the process of learning model weights and bias terms such that they minimize the cost function
- Generalization: how well a model performs on new data
- Over-fitting: low cost on the data used for training, but large cost on new data
- Regularization: discourages over-fitting (a minimal sketch of these terms follows below)
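As a concrete illustration of these terms, here is a minimal sketch (in Python, with made-up data, learning rate, and regularization strength) that trains a one-weight linear model by gradient descent on a squared-error cost with an L2 regularization penalty:

import numpy as np

# Toy data: y is roughly 2*x plus noise (made up for illustration).
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 0.1 * np.random.randn(50)

w, b = 0.0, 0.0       # the model: a weight and a bias term
lam, lr = 0.01, 0.5   # regularization strength and learning rate

def cost(w, b):
    # Cost function (criterion): mean squared error plus L2 regularization.
    residual = (w * x + b) - y
    return np.mean(residual ** 2) + lam * w ** 2

# Training: adjust w and b to minimize the cost function.
for epoch in range(200):
    residual = (w * x + b) - y
    w -= lr * (2 * np.mean(residual * x) + 2 * lam * w)
    b -= lr * 2 * np.mean(residual)

# A low cost here reflects fit to the training data; generalization
# must be judged on data the model has not seen.
print(cost(w, b))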
Data: Training, Cross Validation, Testing
Data sets are split into three sets: a training set, a cross-validation set, and a testing set. A typical split is 80% / 10% / 10% (a sketch follows below).
- Training set: tunes the model parameters
- Cross-validation set: regularizes the model to prevent over-fitting
- Testing set: tests the generalization of the model
Data pairs: $(\vec{x}_i, \vec{y}_i)$ for the $i$-th example out of $m$ total in the data set
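A minimal sketch of the 80% / 10% / 10% split described above, in Python with numpy; the data-set size m is a made-up illustration value:

import numpy as np

m = 1000                               # total number of (x_i, y_i) pairs
indices = np.random.permutation(m)     # shuffle before splitting

n_train = int(0.8 * m)                 # 80% training set
n_cv = int(0.1 * m)                    # 10% cross-validation set

train_idx = indices[:n_train]
cv_idx = indices[n_train:n_train + n_cv]
test_idx = indices[n_train + n_cv:]    # remaining 10% testing set

print(len(train_idx), len(cv_idx), len(test_idx))  # 800 100 100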
MNIST Data set
- Little to no pre-processing required (compliments to Lecun et al. (1998))
- Straightforward task: what class is this digit?
- Non-trivial task: issues of scale and location invariance
- Common performance benchmark for new image classification methods
$\vec{x}_i$: a $1 \times 28^2$, i.e. $1 \times 784$, input vector
$\vec{y}_i$: a $1 \times k$, $k = 10$, label vector, where $y_{i,j} \in \{0, 1\}$ and $\sum_{j} y_{i,j} = 1$ (a sketch of this encoding follows below)
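A minimal sketch of this input and label encoding, assuming a 28 x 28 image stored in a numpy array; the digit value is made up:

import numpy as np

image = np.zeros((28, 28))    # placeholder for one MNIST digit image
x_i = image.reshape(1, 784)   # the 1 x 28^2 input vector

digit = 7                     # true class of this example (made up)
y_i = np.zeros((1, 10))       # the 1 x k label vector, k = 10
y_i[0, digit] = 1.0           # one-hot: entries in {0, 1}, summing to 1
assert y_i.sum() == 1.0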
Acknowledgments
The research of the authors is supported by NSF grant DMS #1246818.
Neural Networks
[Figure: a fully connected feed-forward network: a 4-node input layer, two 5-node hidden layers, and 2 prediction nodes]
- Characterized by model structure and cost function
- Non-linear activation functions create the complex decision boundaries characteristic of neural network models
- $\Sigma_j = w_{0j} + w_{1j} x_1 + w_{2j} x_2 + \dots + w_{nj} x_n$ and $o_j = \phi(\Sigma_j)$, where $o_j$ is the output from the $j$-th node in a layer (see the sketch below)
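A minimal sketch of this node computation for one layer, in numpy; the inputs, weights, and the choice of tanh as the activation function are assumptions for illustration:

import numpy as np

x = np.array([0.5, -1.2, 3.0, 0.7])   # inputs x_1 ... x_n (made up)
W = np.random.randn(4, 5)             # weights w_ij: 4 inputs to 5 nodes
w0 = np.random.randn(5)               # bias terms w_0j

sigma = w0 + x @ W                    # Sigma_j = w_0j + sum_i w_ij * x_i
o = np.tanh(sigma)                    # o_j = phi(Sigma_j); phi = tanh here

print(o.shape)  # (5,): one output per node in the layer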
Logistic Regression
[Figure: logistic regression drawn as a single-layer network: a 6-node input feeds 4 prediction nodes, which feed a single cost node]
Model and Cost Function
Cross entropy:
$\Theta(\vec{x}_i, \vec{y}_i) = -\sum_{i=1}^{m} \sum_{j=1}^{k} y_{i,j} \ln(\pi_{i,j}) + \lambda \sum_{j} \|\theta_j\|^2$

Softmax:
$\Theta(\vec{x}_i) = \pi_{i,j} = \frac{e^{-\theta_j^T \vec{x}_i}}{\sum_{j'} e^{-\theta_{j'}^T \vec{x}_i}}$
Sigmoid function:
$\sigma(z) = \frac{1}{1 + e^{-z}}$

- "Compresses" output into $[0, 1]$
- Enables a probabilistic interpretation of discrete outcomes, such as "yes" or "no" in a binomial setting
- Foundation for classification and feature detection (a sketch of these functions follows below)
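A minimal numpy sketch of the three functions above; the class scores and label are made up, and the regularization term is omitted for brevity:

import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z}); compresses output into [0, 1].
    return 1.0 / (1.0 + np.exp(-z))

def softmax(scores):
    # pi_j = e^{s_j} / sum_j' e^{s_j'}; subtracting max avoids overflow.
    e = np.exp(scores - scores.max())
    return e / e.sum()

def cross_entropy(y_onehot, pi):
    # -sum_j y_j ln(pi_j) for a single example.
    return -np.sum(y_onehot * np.log(pi))

scores = np.array([2.0, 0.5, -1.0])   # made-up class scores, one example
pi = softmax(scores)
y = np.array([1.0, 0.0, 0.0])         # one-hot label
print(sigmoid(0.0), cross_entropy(y, pi))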
Convolutional Neural Network
[Figure: an input image is convolved with a filter to give a convolved image, which is then pooled: a 2 x 2 block A, B / C, D collapses to a single pooled value]
- A neural network which includes convolutional layers
- Filters (the 3 x 3 filter above) convolve across the input image
- A type of feature extraction, reducing the need for hand-engineered features (see the sketch below)
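A minimal sketch of one convolution and one 2 x 2 pooling step on a small image, written with plain numpy loops for clarity; the image and filter values are made up:

import numpy as np

image = np.random.rand(6, 6)   # made-up grayscale input image
filt = np.random.rand(3, 3)    # a 3 x 3 convolutional filter

# Convolve: slide the filter over every valid 3 x 3 patch.
conv = np.zeros((4, 4))
for r in range(4):
    for c in range(4):
        conv[r, c] = np.sum(image[r:r+3, c:c+3] * filt)

# Pool: each non-overlapping 2 x 2 block (A, B / C, D) becomes one value.
pooled = np.zeros((2, 2))
for r in range(2):
    for c in range(2):
        pooled[r, c] = conv[2*r:2*r+2, 2*c:2*c+2].max()

print(conv.shape, pooled.shape)  # (4, 4) (2, 2)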
Support Vector Machine
[Figure: two classes plotted in the (x1, x2) plane, separated by the max-margin linear boundary]
- Max-margin classifier: the "best" linear decision boundary
- Kernel trick: transforming data into a linearly separable space
- Hinge loss: allows for some misclassification if the data classes overlap
Hinge loss:
$\Theta = \frac{1}{n} \sum_{i=1}^{n} \max\left(0,\, 1 - y_i (\vec{\theta} \cdot \vec{x}_i - b)\right) + \lambda \|\vec{\theta}\|^2$

Kernel trick: $\vec{x}_i \to l_i$ and $f_j = \exp\left(-\frac{\|\vec{x}_i - l_j\|^2}{2\sigma^2}\right)$ [Gaussian] (a sketch of both follows below)
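A minimal numpy sketch of the hinge loss and the Gaussian kernel feature above; the data, labels in {-1, +1}, landmarks l_j, and sigma are made-up illustration values:

import numpy as np

X = np.random.randn(20, 2)     # made-up 2-D data points
y = np.sign(X[:, 0])           # labels in {-1, +1}
theta = np.array([1.0, 0.0])   # decision boundary parameters
b, lam = 0.0, 0.01

# Hinge loss with L2 regularization.
margins = y * (X @ theta - b)
hinge = np.mean(np.maximum(0.0, 1.0 - margins)) + lam * theta @ theta

# Gaussian kernel feature: f_j = exp(-||x_i - l_j||^2 / (2 sigma^2)).
sigma = 1.0
landmarks = X[:3]              # a few points chosen as landmarks l_j
x_i = X[0]
f = np.exp(-np.sum((x_i - landmarks) ** 2, axis=1) / (2 * sigma ** 2))

print(hinge, f)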
Model Fitness Overview
[Figure: four panels plotting log(cost) against the number of epochs, each with a training curve and a cross-validation curve: Logistic Regression (100 epochs), Support Vector Machine* (100 epochs), Multi-Layer Perceptron (50 epochs), and Convolutional Neural Network (160 epochs)]
Results on MNIST
Model   Training Accuracy   Testing Accuracy   CV Accuracy   # Misclassified
Logit   91.25%              88.79%             88.13%        6683
SVM     99.00%              96.69%             96.29%        1202
MLP     99.12%              96.74%             96.47%        1119
CNN     99.99%              98.74%             98.69%        307

Accuracy = (# correctly classified) / (# in data subset), so higher is better (a sketch of this computation follows below)
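A minimal sketch of this metric, assuming arrays of true and predicted labels; the values are made up:

import numpy as np

y_true = np.array([3, 1, 4, 1, 5, 9, 2, 6])   # made-up digit labels
y_pred = np.array([3, 1, 4, 7, 5, 9, 2, 0])   # made-up model predictions

correct = np.sum(y_pred == y_true)   # number correctly classified
accuracy = correct / y_true.size     # correct / # in data subset
misclassified = y_true.size - correct

print(accuracy, misclassified)  # 0.75 2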
References
N. de Freitas. Machine learning, 2015. URL https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/.
G. Hinton. Neural networks for machine learning, June 2012. URL https://class.coursera.org/neuralnets-2012-001.
Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, pages 2278–2324, 1998.
A. Ng. Machine learning. URL https://www.coursera.org/learn/machine-learning.