An Introduction to Statistical and Probabilistic Linear Models
Maximilian Mozes
Proseminar Data Mining
Fakultät für Informatik
Technische Universität München
June 07, 2017
Introduction

In statistical learning theory, linear models are used for regression and classification tasks.

- What is regression?
- What is classification?
- How can we model such concepts in a mathematical context?
Linear regression
Linear regression - basics

What is regression?

- Approximation of data using a (closed) mathematical expression
- Achieved by estimating the model parameters that yield the best approximation of the data
Linear regression - example

- A company changes the price of their products for the nth time.
- They know how the previous n − 1 price changes affected consumer behavior.
- Using linear regression, they can predict the consumer behavior for the nth price change.
Linear regression - basics

- Let D denote a set of n-dimensional data vectors
- Let x be an n-dimensional observation
- How can we approximate x?
Linear regression - basics

Create a linear function

y(x, w) = w0 + w1x1 + · · · + wnxn

that approximates x with the weights w.
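Slides aside, such a linear function can be fit with ordinary least squares; the sketch below uses NumPy on invented toy data (the feature values, targets, and the `y` helper are illustrative, not from the talk):

```python
import numpy as np

# Toy data: two features per observation, one target value each.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
z = np.array([3.5, 3.0, 7.5, 7.0])

# Prepend a column of ones so w0 acts as the bias term.
Xb = np.hstack([np.ones((X.shape[0], 1)), X])

# Least-squares estimate of w = (w0, w1, w2).
w, *_ = np.linalg.lstsq(Xb, z, rcond=None)

def y(x, w):
    """Evaluate y(x, w) = w0 + w1*x1 + ... + wn*xn."""
    return w[0] + x @ w[1:]

print(y(np.array([2.0, 2.0]), w))  # ≈ 4.25
```

For this particular toy data the fit is exact, so the model reproduces every training target.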
Linear regression - basics

Example for n = 2:

y(x, w) = w0 + w1x1 + w2x2
Linear regression - basics

Problem
The weight parameters wi are simply scalar values.

=⇒ significant limitation!

Idea
Use weighted non-linear basis functions φj instead:

y(x, w) = w0 + ∑_{j=1}^{n} wj φj(x) = w⊤φ(x),

where φ = (φ0, ..., φn)⊤ with φ0(x) = 1, so that w0 is absorbed into the inner product.
Polynomial regression

Example
Let φj(x) = x^j. Then

y(x, w) = w0 + ∑_{j=1}^{n} wj x^j = w0 + w1x + w2x² + · · · + wn x^n
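The choice φj(x) = x^j reduces polynomial fitting to linear least squares in the weights. A minimal sketch (the data and degree are invented for illustration):

```python
import numpy as np

# Noise-free samples of a quadratic, invented for illustration.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
z = 1.0 + 2.0 * x - 0.5 * x**2

def design_matrix(x, degree):
    """Columns are the basis functions phi_j(x) = x**j for j = 0..degree."""
    return np.vander(x, degree + 1, increasing=True)

# A degree-2 fit recovers the generating coefficients (w0, w1, w2).
w, *_ = np.linalg.lstsq(design_matrix(x, 2), z, rcond=None)
print(w)  # ≈ [1.0, 2.0, -0.5]
```

Raising the degree only adds columns to the design matrix; the fitting procedure itself stays linear in w.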
Polynomial regression

- Approximation with a 2nd-order polynomial
- Approximation with a 6th-order polynomial
- Approximation with an 8th-order polynomial
Polynomial regression

Problem
=⇒ Overfitting with higher polynomial degrees
Linear classification
Linear classification - basics

What is classification?

- Aims to partition the data into predefined classes
- A class contains observations with similar characteristics
Linear classification - example

- We have n different cucumbers and courgettes.
- Each record contains the weight and the texture (smooth or rough).
- We want to predict the correct label without knowing the real label.
Linear classification - basics

- Assume a two-class classification problem
- How can we categorise the data into predefined classes?
Linear classification - basics

Cucumbers vs. courgettes
Linear classification - basics

Discriminant functions

- Given a dataset D,
- we aim to categorise x into either class C1 or C2.

Use y(x) such that

x ∈ C1 ⇐⇒ y(x) ≥ 0, and x ∈ C2 otherwise,

with
y(x) = w⊤x + w0.
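As a sketch of this decision rule (the weights below are hand-picked for illustration; in practice w and w0 would be learned from D):

```python
import numpy as np

# Hand-picked weights for illustration only.
w = np.array([1.0, -1.0])
w0 = -0.5

def classify(x, w, w0):
    """Assign x to C1 iff y(x) = w.x + w0 >= 0, else to C2."""
    return "C1" if w @ x + w0 >= 0 else "C2"

print(classify(np.array([2.0, 1.0]), w, w0))  # y = 2 - 1 - 0.5 = 0.5  → C1
print(classify(np.array([1.0, 2.0]), w, w0))  # y = 1 - 2 - 0.5 = -1.5 → C2
```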
Linear classification - basics

Decision boundary H:

H := {x ∈ D : y(x) = 0}
Non-linearity

However, it often occurs that the data are not linearly separable.
Non-linearity

Solution
Use non-linear basis functions instead:

=⇒ y(x) = w⊤φ(x), with φ(x) = (φ0(x), φ1(x), ..., φM−1(x))⊤
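A hand-crafted example of such a φ: points inside versus outside the unit circle are not linearly separable in input space, but become separable after mapping each point to its squared radius (the feature map and weights here are invented for illustration):

```python
import numpy as np

def phi(x):
    """Feature map phi(x) = (1, x1^2 + x2^2): a bias plus the squared radius."""
    return np.array([1.0, x[0]**2 + x[1]**2])

# In phi-space, a linear discriminant y(x) = w.T phi(x) = r^2 - 1
# separates points inside the unit circle (C1) from points outside (C2).
w = np.array([-1.0, 1.0])

def classify(x):
    return "C2" if w @ phi(x) >= 0 else "C1"

print(classify(np.array([0.2, 0.3])))   # inside the circle  → C1
print(classify(np.array([1.5, -1.0])))  # outside the circle → C2
```

The classifier is still linear in w; only the representation of x has changed.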
Common classification algorithms

Other commonly used classification algorithms:

- Naive Bayes classifier
- Logistic regression (Cox, 1958)
- Support vector machines (Vapnik and Lerner, 1963)
Summary

- Linear models: linear combinations of weighted (non-)linear functions
- Regression: approximation of a given set of data using a closed mathematical representation
- Classification: categorisation of data according to individual characteristics and common patterns
Further readings

- J. Aldrich, 1997. R. A. Fisher and the Making of Maximum Likelihood 1912–1922. Statistical Science, Vol. 12, No. 3, 162–176.
- D. Barber, Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
- C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
- T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second edition. Springer, 2009.
- K. Murphy, Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
- A. Y. Ng and M. I. Jordan, 2002. On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes. Advances in Neural Information Processing Systems.
References

- J. Aldrich, 1997. R. A. Fisher and the Making of Maximum Likelihood 1912–1922. Statistical Science, Vol. 12, No. 3, 162–176.
- D. Barber, Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
- C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
- D. R. Cox, 1958. The Regression Analysis of Binary Sequences. Journal of the Royal Statistical Society, Vol. XX, No. 2.
- T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second edition. Springer, 2009.
- K. Murphy, Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
- F. Rosenblatt, 1958. The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, Vol. 65, No. 6.
Thank you for your attention.

Questions?
Backup slides
Sum of least squares
Parameter estimation

Estimating the weight parameters
How do we choose the wi?

=⇒ Find the set of parameters that maximizes the likelihood

p(D | w)
Sum of least squares

- Method to optimize the weights w
- Aims to minimize the residual sum of squares (RSS)
- Minimize

RSS(w) = ∑_{i=1}^{N} (zi − y(xi, w))² = ∑_{i=1}^{N} (zi − w0 − ∑_{j=1}^{M−1} xij wj)²
Sum of least squares
I RSS can be simplified by using an N ×Mmatrix X with the xi as rows
I Then
RSS(w) = (z−Xw)>(z−Xw),
where z is a vector of target values
32
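Both forms of the RSS agree, as a small NumPy sketch with random placeholder data confirms (all values are hypothetical):

```python
import numpy as np

# Hypothetical data: N = 4 samples, M = 3 columns (first column is 1 for w0).
rng = np.random.default_rng(0)
X = np.hstack([np.ones((4, 1)), rng.normal(size=(4, 2))])  # N x M matrix with x_i as rows
z = rng.normal(size=4)                                      # target vector
w = rng.normal(size=3)                                      # some weight vector

# RSS as an explicit sum over samples ...
rss_sum = sum((z[i] - X[i] @ w) ** 2 for i in range(4))

# ... and in the compact matrix form (z - Xw)^T (z - Xw)
r = z - X @ w
rss_mat = r @ r

assert np.isclose(rss_sum, rss_mat)
```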
- Taking the derivatives leads to

$$\frac{\partial\, \mathrm{RSS}}{\partial \mathbf{w}} = -2 X^\top (\mathbf{z} - X\mathbf{w}), \qquad \frac{\partial^2\, \mathrm{RSS}}{\partial \mathbf{w}\, \partial \mathbf{w}^\top} = 2 X^\top X$$

- Setting the first derivative to zero and solving for w results in

$$\hat{\mathbf{w}} = (X^\top X)^{-1} X^\top \mathbf{z}$$
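A minimal NumPy sketch of this normal-equations solution, cross-checked against NumPy's own least-squares solver (the data below are synthetic placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 50, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, M - 1))])  # full column rank
w_true = np.array([1.0, -2.0, 0.5])
z = X @ w_true + 0.01 * rng.normal(size=N)

# Normal-equations solution w = (X^T X)^{-1} X^T z
# (solve the linear system instead of forming the inverse explicitly)
w_hat = np.linalg.solve(X.T @ X, X.T @ z)

# Cross-check against NumPy's least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, z, rcond=None)
assert np.allclose(w_hat, w_lstsq)
```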
- Note: the approach assumes $X^\top X$ to be positive definite
- Therefore, X is assumed to have full column rank
- The approximation function can be described as

$$z_i = y(\mathbf{x}_i, \mathbf{w}) + \varepsilon,$$

  where ε represents the data noise
- RSS provides a measure of the prediction error $E_D$, defined as

$$E_D(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{N} \big(z_i - \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_i)\big)^2,$$

  where $\boldsymbol{\phi} = (\phi_0, \ldots, \phi_{M-1})^\top$
Maximum Likelihood Estimation
Maximum Likelihood Estimation (MLE)

- Introduced by Fisher (1922)
- Commonly used method to optimize the model parameters
Goal: Find the optimal parameters w such that $p(\mathcal{D} \,|\, \mathbf{w})$ is maximized, i.e.

$$\hat{\mathbf{w}} = \arg\max_{\mathbf{w}} \; p(\mathcal{D} \,|\, \mathbf{w}).$$

For the target values $z_i$, the $p(z_i \,|\, \mathbf{x}_i, \mathbf{w})$ are assumed to be independent, such that

$$p(\mathcal{D} \,|\, \mathbf{w}) = \prod_{i=1}^{N} p(z_i \,|\, \mathbf{x}_i, \mathbf{w})$$

holds for all $z_i \in \mathcal{D}$.
The product makes the equation unwieldy. Let's simplify it using the log!

$$\log p(\mathcal{D} \,|\, \mathbf{w}) = \sum_{i=1}^{N} \log p(z_i \,|\, \mathbf{x}_i, \mathbf{w})$$

We can now set $\nabla \log p(\mathcal{D} \,|\, \mathbf{w}) = 0$ and solve for w.

⟹ the resulting $\hat{\mathbf{w}}$ are the optimal weight parameters!
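A quick numeric illustration of the log trick, using toy per-sample likelihood values (purely hypothetical numbers):

```python
import math

# Toy per-sample likelihoods p(z_i | x_i, w)
p = [0.9, 0.8, 0.95, 0.7]

prod = math.prod(p)                          # the unwieldy product
log_sum = sum(math.log(pi) for pi in p)      # the convenient sum of logs

# log of the product equals the sum of the logs,
# so both are maximized by the same parameters
assert math.isclose(math.log(prod), log_sum)
```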
- Assumption: the data noise ε follows a Gaussian distribution
- Then $p(\mathcal{D} \,|\, \mathbf{w})$ can be described as

$$p(z \,|\, \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}(z \,|\, y(\mathbf{x}, \mathbf{w}), \beta^{-1}),$$

  where β is the model's inverse variance (precision)
- For z we get

$$\log p(\mathbf{z} \,|\, X, \mathbf{w}, \beta) = \sum_{i=1}^{N} \log \mathcal{N}(z_i \,|\, \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_i), \beta^{-1})$$
- It follows that

$$\nabla \log p(\mathbf{z} \,|\, X, \mathbf{w}, \beta) = \beta \sum_{i=1}^{N} \big(z_i - \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_i)\big)\, \boldsymbol{\phi}(\mathbf{x}_i)^\top$$

- Setting the gradient to zero and solving for w leads to

$$\mathbf{w}_{\mathrm{ML}} = (\Phi^\top \Phi)^{-1} \Phi^\top \mathbf{z}$$
- Φ is called the design matrix and is given by

$$\Phi = \begin{pmatrix} \phi_0(\mathbf{x}_1) & \phi_1(\mathbf{x}_1) & \cdots & \phi_{M-1}(\mathbf{x}_1) \\ \phi_0(\mathbf{x}_2) & \phi_1(\mathbf{x}_2) & \cdots & \phi_{M-1}(\mathbf{x}_2) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_0(\mathbf{x}_N) & \phi_1(\mathbf{x}_N) & \cdots & \phi_{M-1}(\mathbf{x}_N) \end{pmatrix}$$
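A sketch of $\mathbf{w}_{\mathrm{ML}}$ with a hypothetical polynomial basis $\phi_j(x) = x^j$ (an illustrative choice, not prescribed by the slides); under Gaussian noise it coincides with ordinary least squares on Φ:

```python
import numpy as np

# Hypothetical 1-D inputs and noisy targets
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=30)
z = np.sin(np.pi * x) + 0.1 * rng.normal(size=30)

M = 4
# Design matrix with columns phi_0..phi_{M-1}, i.e. 1, x, x^2, x^3
Phi = np.vander(x, M, increasing=True)

# Maximum-likelihood weights: w_ML = (Phi^T Phi)^{-1} Phi^T z
w_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ z)

# Identical to ordinary least squares on the design matrix
w_lstsq, *_ = np.linalg.lstsq(Phi, z, rcond=None)
assert np.allclose(w_ml, w_lstsq)
```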
Regularization and ridge regression
Ridge regression

- Method to prevent overfitting
- Add a regularization term compensating for the biased prediction
- Regularization term: $\lambda \|\mathbf{w}\|_2^2$
- The error function E(w) becomes

$$E(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{N} \big(z_i - \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_i)\big)^2 + \lambda \|\mathbf{w}\|_2^2$$

- Minimizing E(w) and solving for w results in

$$\mathbf{w}_{\mathrm{ridge}} = (\lambda I + \Phi^\top \Phi)^{-1} \Phi^\top \mathbf{z}$$
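The closed-form ridge solution can be computed directly; the data below are random placeholders, and the check illustrates the shrinkage effect of the penalty:

```python
import numpy as np

rng = np.random.default_rng(3)
Phi = rng.normal(size=(20, 5))   # hypothetical design matrix
z = rng.normal(size=20)          # hypothetical targets
lam = 0.5                        # regularization strength lambda

# w_ridge = (lambda * I + Phi^T Phi)^{-1} Phi^T z
I = np.eye(Phi.shape[1])
w_ridge = np.linalg.solve(lam * I + Phi.T @ Phi, Phi.T @ z)

# Unregularized solution for comparison
w_ols = np.linalg.solve(Phi.T @ Phi, Phi.T @ z)

# The penalty shrinks the weight vector towards zero
assert np.linalg.norm(w_ridge) < np.linalg.norm(w_ols)
```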
Non-linear classification
Non-linearity

In our example, the appropriate decision boundary is obtained via

$$\phi : (x_1, x_2)^\top \mapsto (r \cdot \cos(x_1),\; r \cdot \sin(x_2))^\top, \quad r \in \mathbb{R}.$$
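To illustrate why such a transformation helps, the sketch below uses a closely related squared-radius feature $\phi(\mathbf{x}) = x_1^2 + x_2^2$ (an assumption for illustration, not the exact map above) to separate synthetic circular data with a simple threshold:

```python
import numpy as np

# Hypothetical circular data: class 0 inside radius 0.8, class 1 outside radius 1.2
rng = np.random.default_rng(4)
angles = rng.uniform(0, 2 * np.pi, size=100)
radii = np.concatenate([rng.uniform(0.0, 0.8, 50), rng.uniform(1.2, 2.0, 50)])
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
labels = np.array([0] * 50 + [1] * 50)

# Not linearly separable in (x1, x2), but the squared radius
# phi(x) = x1^2 + x2^2 is a 1-D feature in which a threshold separates the classes.
phi = (X ** 2).sum(axis=1)
boundary = 1.0  # any threshold between 0.8^2 and 1.2^2 works
pred = (phi > boundary).astype(int)
assert (pred == labels).all()
```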
Multi-class classification

- The discriminant function for two-class classification can be extended to a k-class problem
- Use the k-class discriminator

$$y_k(\mathbf{x}) = \mathbf{w}_k^\top \mathbf{x} + w_{k0}$$

- x is assigned to class $\mathcal{C}_j$ if $y_j(\mathbf{x}) > y_i(\mathbf{x})$ for all $i \neq j$
- The decision boundary is then

$$y_j(\mathbf{x}) = y_i(\mathbf{x}),$$

  which can be transformed to

$$(\mathbf{w}_j - \mathbf{w}_i)^\top \mathbf{x} + w_{j0} - w_{i0} = 0$$
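A minimal sketch of the k-class decision rule; the weight vectors and biases below are purely illustrative:

```python
import numpy as np

# Hypothetical 3-class linear discriminators y_k(x) = w_k^T x + w_k0
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])    # rows are the w_k
w0 = np.array([0.0, 0.0, 0.5])  # biases w_k0

def classify(x):
    # assign x to the class C_j with the largest discriminant value
    scores = W @ x + w0
    return int(np.argmax(scores))

assert classify(np.array([2.0, 0.1])) == 0
assert classify(np.array([0.1, 2.0])) == 1
assert classify(np.array([-2.0, -2.0])) == 2
```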
Perceptron algorithm
Perceptron

Given an input vector x and a fixed non-linear function φ(x), the class of x is estimated by

$$y(\mathbf{x}) = f(\mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x})),$$

where f(t) is called the non-linear activation function:

$$f(t) = \begin{cases} +1 & \text{if } t \geq 0, \\ -1 & \text{otherwise.} \end{cases}$$
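The activation rule can be written down directly; the basis function and weight vector below are purely illustrative assumptions:

```python
import numpy as np

def f(t):
    # step activation: +1 if t >= 0, -1 otherwise
    return 1 if t >= 0 else -1

def phi(x):
    # hypothetical fixed basis: prepend a constant bias feature
    return np.array([1.0, *x])

w = np.array([-1.0, 2.0, 0.0])  # example weight vector

# y(x) = f(w^T phi(x))
assert f(w @ phi([1.0, 0.0])) == 1    # w^T phi = -1 + 2 = 1 >= 0
assert f(w @ phi([0.0, 0.0])) == -1   # w^T phi = -1 < 0
```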
Probabilistic generative models

- Compute the probability p(x, z) directly instead of optimizing the weight parameters
- Apply Bayes' theorem to p(z | x)
Bayes' theorem

For a set of disjoint events $A_1, \ldots, A_n$ covering the sample space and a given event B, the probability $p(A_i \,|\, B)$, $i \in \{1, \ldots, n\}$, can be computed as follows:

$$p(A_i \,|\, B) = \frac{p(B \,|\, A_i)\, p(A_i)}{\sum_{j=1}^{n} p(B \,|\, A_j)\, p(A_j)}$$
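A small numeric instance of the theorem, with hypothetical priors and likelihoods:

```python
# Two-hypothesis example of Bayes' theorem (all numbers are toy values)
p_A = [0.3, 0.7]              # priors p(A_i)
p_B_given_A = [0.9, 0.2]      # likelihoods p(B | A_i)

# denominator: total probability of B
evidence = sum(l * p for l, p in zip(p_B_given_A, p_A))
posterior = [l * p / evidence for l, p in zip(p_B_given_A, p_A)]

# posteriors are properly normalized
assert abs(sum(posterior) - 1.0) < 1e-12
assert abs(posterior[0] - 0.27 / 0.41) < 1e-12
```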
Probabilistic generative models

- Consider a two-class classification problem for $\mathcal{C}_1$ and $\mathcal{C}_2$
- Then

$$p(\mathcal{C}_1 \,|\, \mathbf{x}) = \frac{p(\mathbf{x} \,|\, \mathcal{C}_1)\, p(\mathcal{C}_1)}{p(\mathbf{x} \,|\, \mathcal{C}_1)\, p(\mathcal{C}_1) + p(\mathbf{x} \,|\, \mathcal{C}_2)\, p(\mathcal{C}_2)} = \frac{1}{1 + e^{-a}} = \sigma(a),$$

  where

$$a = \log\left(\frac{p(\mathbf{x} \,|\, \mathcal{C}_1)\, p(\mathcal{C}_1)}{p(\mathbf{x} \,|\, \mathcal{C}_2)\, p(\mathcal{C}_2)}\right) \quad \text{and} \quad \sigma(a) = \frac{1}{1 + e^{-a}}$$
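To see that σ(a) with the log-odds a reproduces the normalized posterior, here is a small numeric check with hypothetical joint probabilities:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Hypothetical joint terms p(x | C_k) p(C_k)
joint1, joint2 = 0.06, 0.02

# log-odds a = log(joint1 / joint2)
a = math.log(joint1 / joint2)

# sigma(a) equals the normalized posterior p(C1 | x)
assert math.isclose(sigmoid(a), joint1 / (joint1 + joint2))
```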
Probabilistic discriminative models

- Predict the correct class by directly computing the posterior probability $p(z \,|\, \mathbf{x})$
- This makes the computation of $p(\mathbf{x} \,|\, z)$ (Bayes' theorem) redundant
- Computing the posterior probability

$$p(\mathcal{C}_k \,|\, \mathbf{x}, \theta^{\mathrm{opt}}_{\mathcal{C}_k|\mathbf{x}})$$

  is then achieved by using MLE
- Disadvantage: only little knowledge about the given data is incorporated ("black box")
Logistic regression

- Commonly used classification algorithm for binary classification problems
- Assumption: the data noise ε follows a Bernoulli distribution

$$\mathrm{Ber}(n) = p^n (1 - p)^{1-n}, \quad n \in \{0, 1\},$$

  as this distribution is more appropriate for a binary classification problem
- Predict the correct class label based on the probability

$$p(\mathcal{C}_k \,|\, \mathbf{x}, \mathbf{w}) = \mathrm{Ber}(\mathcal{C}_k \,|\, \sigma(\mathbf{w}^\top \mathbf{x})),$$

  where σ is a squashing function (e.g. the sigmoid)
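As a sketch (with hypothetical weights rather than a trained model), prediction in logistic regression reduces to a sigmoid of a linear function of the input:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical weights for a 2-D input, with the first entry acting as a bias
w = [-0.5, 1.2, -0.8]

def p_c1(x):
    # p(C1 | x, w) = sigma(w^T x), with x augmented by a constant bias term
    t = w[0] + w[1] * x[0] + w[2] * x[1]
    return sigmoid(t)

x = (1.0, 0.5)
p = p_c1(x)
assert 0.0 < p < 1.0          # a valid probability
label = 1 if p >= 0.5 else 0  # threshold at 0.5 to obtain the class label
```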