TRANSCRIPT
Deep Learning Explained
Module 2: Logistic Regression
Sayan D. Pathak, Ph.D., Principal ML Scientist, Microsoft
Roland Fernandez, Senior Researcher, Microsoft
Module Outline
Application: OCR with MNIST data
Model: Logistic Regression
Concepts: loss, minibatch, train-test-predict workflow
MNIST Handwritten Digits (OCR)
• Data set of handwritten digits (0-9) with:
✓ 60,000 training images
✓ 10,000 test images
• Each image is 28 x 28 pixels
[Figure: sample handwritten digit images with their corresponding labels]
Logistic Regression
Each 28 x 28 image is flattened into a vector of 784 pixels (x).
Model (W, b): z = W xᵀ + b
Model parameters: weights W (a 10 x 784 matrix, one row per digit class 0-9) and bias b (a vector of 10).
The model outputs one score per class, e.g.: 0.1 0.1 0.3 0.9 0.4 0.2 0.1 0.1 0.6 0.3
Logistic regression is a model that maps input features to discrete output classes, as opposed to linear regression, which predicts continuous values.
Logistic Regression
The score for class 0 is a weighted sum over all 784 pixels:
S₀ = sum(weights × pixels) = w₀ · xᵀ
where w₀ is the weight vector for class 0.
Each class has its own weight vector. For class 1:
S₁ = sum(weights × pixels) = w₁ · xᵀ
Stacking the ten weight vectors w₀ … w₉ as the rows of a 10 x 784 weight matrix W computes all ten class scores at once:
z = W xᵀ
Adding a bias b (one value per class, dimension 10) and an activation function that maps each score to the (0, 1) range, e.g. sigmoid, completes the model:
Model (W, b): z = W xᵀ + b
Example activation outputs: 0.1 0.1 0.3 0.9 0.4 0.2 0.1 0.1 0.6 0.3
Logistic Regression with Softmax
With no per-score activation (pass-through), the raw scores z₀ … z₉ from z = W xᵀ + b are converted into predicted probabilities (p) by the softmax function:
pᵢ = e^(zᵢ) / Σⱼ₌₀⁹ e^(zⱼ)
Example predicted probabilities: 0.08 0.08 0.10 0.17 0.11 0.09 0.08 0.08 0.13 0.01
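The softmax formula above can be sketched in a few lines of NumPy (the score values below are made up for illustration, not taken from the slide):

```python
import numpy as np

def softmax(z):
    # Shift by max(z) for numerical stability; the result is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Ten raw class scores standing in for z0..z9.
z = np.array([0.2, 0.2, 0.4, 1.0, 0.5, 0.3, 0.2, 0.2, 0.7, -1.0])
p = softmax(z)
# p sums to 1, every entry is positive, and the ordering of z is
# preserved, so argmax(p) == argmax(z).
```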
Loss Function
The label is one-hot encoded (Y); for the digit 3:
0 0 0 1 0 0 0 0 0 0
The model (W, b) produces predicted probabilities (p):
0.08 0.08 0.10 0.17 0.11 0.09 0.08 0.08 0.13 0.01
Loss functions compare p to Y:
Squared error: se = Σⱼ₌₀⁹ (yⱼ − pⱼ)²
Cross-entropy error: ce = −Σⱼ₌₀⁹ yⱼ log(pⱼ)
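Using the one-hot label and the predicted probabilities shown above, both losses can be computed directly; a minimal NumPy sketch:

```python
import numpy as np

# One-hot label for the digit 3 and the example predicted probabilities.
y = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0], dtype=float)
p = np.array([0.08, 0.08, 0.10, 0.17, 0.11, 0.09, 0.08, 0.08, 0.13, 0.01])

se = np.sum((y - p) ** 2)    # squared error
ce = -np.sum(y * np.log(p))  # cross entropy: only the true class
                             # contributes, so ce = -log(0.17)
```

Note that the cross entropy ignores every probability except the one assigned to the true class, which is why it pairs naturally with one-hot labels.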
Train Workflow
[Diagram: a data sampler draws features (x) and labels (Y) from the training data; the model z(params) computes predictions; the loss is computed and reported over iterations; while more training is needed, the learner updates the params and the loop repeats.]
Train Workflow (MNIST)
• Input features X: 128 x 784 (a minibatch of 128 samples)
• One-hot encoded labels Y: 128 x 10, e.g. for digits 3, 7, 8, 0:
0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 1 0
1 0 0 0 0 0 0 0 0 0
• Model: z = times(X, W) + b, i.e. z = W Xᵀ + b, with weights W (784 x 10) and bias b (dimension 10)
• Loss: cross_entropy_with_softmax(z, Y)
• Error (optional): classification_error(z, Y)
• Trainer(model, (loss, error), learner)
• Trainer.train_minibatch({X, Y})
• Learners (sgd, adagrad, etc.) are solvers that estimate W and b
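The course drives training with CNTK's Trainer; as an illustrative sketch only (random stand-in data, not a real MNIST minibatch), one train_minibatch step amounts to a mini-batch SGD update like this in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in minibatch: 128 random "images" (not real MNIST data).
batch, n_pixels, n_classes = 128, 784, 10
X = rng.random((batch, n_pixels))
labels = rng.integers(0, n_classes, size=batch)
Y = np.eye(n_classes)[labels]            # one-hot, 128 x 10

W = np.zeros((n_pixels, n_classes))      # weights, 784 x 10
b = np.zeros(n_classes)                  # bias, dim 10

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# One mini-batch SGD step (conceptually what train_minibatch does).
lr = 0.01
z = X @ W + b                            # z = times(X, W) + b
p = softmax(z)
loss = -np.mean(np.sum(Y * np.log(p), axis=1))   # cross_entropy_with_softmax
error = np.mean(np.argmax(z, axis=1) != labels)  # classification_error

grad_z = (p - Y) / batch                 # d(mean loss)/dz
W -= lr * (X.T @ grad_z)                 # gradient step on W
b -= lr * grad_z.sum(axis=0)             # gradient step on b
```

After the update, recomputing the loss on the same minibatch gives a smaller value, which is exactly what each training iteration aims for.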
Learn the weights: Learners / Optimizers / Solvers
For 1 sample:
Loss Lᵢ = −Σⱼ₌₀⁹ yⱼ⁽ⁱ⁾ log(pⱼ), where pⱼ = f(x⁽ⁱ⁾; θ)ⱼ and θ ∈ (w, b)
For all samples (m = 60,000 images):
Total loss = Σᵢ₌₁ᵐ Lᵢ(θ; (x⁽ⁱ⁾, y⁽ⁱ⁾))
Convex function: there is one and only one minimum.
Fig. courtesy http://codingwiththomas.blogspot.com/2012/09/particle-swarm-optimization.html
Gradient Descent
θ′ = θ − μ grad(L; θ)
where θ = model parameters and μ = learning rate.
Computing the total loss (Σᵢ Lᵢ) over a large data set is expensive and often redundant
- refer to http://sebastianruder.com/optimizing-gradient-descent/ for details
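As a concrete illustration (the 1-D quadratic loss and starting point here are made up), applying the update θ′ = θ − μ grad(L; θ) repeatedly walks θ to the single minimum of a convex loss:

```python
# Minimize L(theta) = (theta - 3)^2; the unique minimum is theta = 3.
def grad(theta):
    return 2.0 * (theta - 3.0)   # dL/dtheta

theta, mu = 0.0, 0.1             # initial parameter and learning rate
for _ in range(100):
    theta = theta - mu * grad(theta)
# theta has converged to (very nearly) 3.0
```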
Stochastic Gradient Descent (SGD)
SGD: update the parameters for each (data, label) pair.
Mini-batch SGD: update the parameters for each mini-batch, a set of (data, label) pairs.
Refer to http://sebastianruder.com/optimizing-gradient-descent/ for details on different learners.
Other learners
Momentum SGD, Nesterov, Adagrad, Adadelta, Adam
Refer to http://sebastianruder.com/optimizing-gradient-descent/ for details on different learners.
Image by: Alec Radford
Validation Workflow
[Diagram: the train loop from the Train Workflow, plus a validation loop: a second data sampler draws features (x) and labels (Y) from the validation data, the model is evaluated with the trained params, and results are reported until validation is done; the output is the final model.]
Test Workflow
[Diagram: a data sampler draws features (x) and labels (Y) from the test data; the final model is evaluated with the trained params, reporting until the test data is exhausted.]
Test Workflow (MNIST)
• Input features X*: 32 x 784 (a minibatch of 32 test samples)
• One-hot encoded labels Y*: 32 x 10, e.g. for digits 3, 7, 8, 0:
0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 1 0
1 0 0 0 0 0 0 0 0 0
• Model: z = times(X*, W*) + b*, i.e. z = W* X*ᵀ + b*, using the weights W* (784 x 10) and bias b* (dimension 10) learned on the MNIST training set
• Trainer.test_minibatch({X*, Y*}) returns the classification error as the percentage of incorrectly labeled MNIST images.
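What test_minibatch reports can be sketched in NumPy (the score rows and labels below are made up for illustration):

```python
import numpy as np

# Made-up scores for a minibatch of 4 test samples (10 classes each),
# standing in for z = W* X*^T + b*, plus their true labels.
z = np.array([
    [0.1, 0.0, 0.2, 0.9, 0.1, 0.0, 0.1, 0.0, 0.2, 0.1],   # predicts 3
    [0.0, 0.1, 0.1, 0.0, 0.0, 0.0, 0.0, 0.8, 0.1, 0.0],   # predicts 7
    [0.2, 0.1, 0.0, 0.1, 0.0, 0.1, 0.0, 0.1, 0.7, 0.0],   # predicts 8
    [0.3, 0.1, 0.6, 0.0, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0],   # predicts 2
])
true_labels = np.array([3, 7, 8, 0])

predicted = np.argmax(z, axis=1)                 # highest-scoring class
error = np.mean(predicted != true_labels) * 100  # percent incorrect
# One of four samples is mislabeled, so error is 25.0
```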
Prediction Workflow
A new MNIST image (here a 9) gives the input feature new X: 1 x 784. Evaluate the trained model (W, b):
Model.eval(new X)
Predicted softmax probabilities (predicted_label):
0.02 0.09 0.03 0.03 0.01 0.02 0.02 0.06 0.02 0.70
[ numpy.argmax(predicted_label) for predicted_label in predicted_labels ]
[9]
A batch of new MNIST images (e.g. 9, 5, 8, …, 2) gives the input feature new X: 25 x 784. Evaluating the trained model (W, b) with
Model.eval(new X)
returns the predicted softmax probabilities (predicted_label) for each image; taking the argmax per image yields the predicted digits:
[ numpy.argmax(predicted_label) for predicted_label in predicted_labels ]
[9, 5, 8, …, 2]
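The argmax step above, sketched with made-up softmax outputs standing in for the result of Model.eval:

```python
import numpy as np

# Made-up softmax probabilities for 3 images (one row per image).
predicted_labels = np.array([
    [0.02, 0.09, 0.03, 0.03, 0.01, 0.02, 0.02, 0.06, 0.02, 0.70],  # a 9
    [0.05, 0.05, 0.05, 0.05, 0.05, 0.55, 0.05, 0.05, 0.05, 0.05],  # a 5
    [0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.82, 0.02],  # an 8
])

# The index of the largest probability in each row is the predicted digit.
digits = [int(np.argmax(p)) for p in predicted_labels]
```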