Machine Learning Math Essentials Part 2

DESCRIPTION

Machine Learning Math Essentials Part 2. The Gaussian distribution is the most commonly used continuous probability distribution, also known as the normal distribution. Two parameters define a Gaussian: the mean μ (location of the center) and the variance σ² (width of the curve). The Gaussian distribution in one dimension.

TRANSCRIPT
Machine Learning Math Essentials, Part 2
Jeff Howbert, Introduction to Machine Learning, Winter 2012
Gaussian distribution

Most commonly used continuous probability distribution
Also known as the normal distribution
Two parameters define a Gaussian:
– Mean μ: location of the center
– Variance σ²: width of the curve
Gaussian distribution

In one dimension:

$$N(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$$

The normalizing constant $(2\pi\sigma^2)^{-1/2}$ ensures that the distribution integrates to 1.
The variance σ² controls the width of the curve.
The exponential term causes the pdf to decrease as distance from the center μ increases.
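As a quick sanity check (not on the slides), the density can be evaluated directly from this formula; normpdf in MATLAB's Statistics Toolbox computes the same values. A minimal sketch:

```matlab
% Evaluate the 1-D Gaussian pdf directly from the formula above
mu = 0; sigma2 = 1;                          % mean and variance
x  = linspace( -4, 4, 201 );                 % evaluation grid
p  = 1 ./ sqrt( 2*pi*sigma2 ) .* exp( -(x - mu).^2 ./ (2*sigma2) );
plot( x, p );                                % bell curve centered at mu
trapz( x, p )                                % close to 1, as the normalizing constant guarantees
```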
Gaussian distribution

[Figure: 1-D Gaussian pdfs for four parameter settings: μ = 0, σ² = 1; μ = 2, σ² = 1; μ = 0, σ² = 5; μ = −2, σ² = 0.3]

The standard normal, shown on the plot:

$$N(x \mid 0, 1) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$
Multivariate Gaussian distribution

In d dimensions:

$$N(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2}\, |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left( -\frac{1}{2}\, (\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}}\, \boldsymbol{\Sigma}^{-1}\, (\mathbf{x}-\boldsymbol{\mu}) \right)$$

x and μ are now d-dimensional vectors
– μ gives the center of the distribution in d-dimensional space
σ² is replaced by Σ, the d × d covariance matrix
– Σ contains the pairwise covariances of every pair of features
– the diagonal elements of Σ are the variances σᵢ² of the individual features
– Σ describes the distribution's shape and spread
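A minimal MATLAB sketch of this density, evaluated straight from the formula (mvnpdf in the Statistics Toolbox gives the same result; the numbers here are illustrative, not from the slides):

```matlab
% Evaluate the multivariate Gaussian pdf at a single point x
mu    = [ 0; 0 ];                            % center, d = 2 (illustrative values)
Sigma = [ 1.0 0.3; 0.3 1.0 ];                % covariance matrix
x     = [ 0.5; -0.2 ];                       % evaluation point
d     = length( mu );
dx    = x - mu;
p     = exp( -0.5 * dx' * ( Sigma \ dx ) ) / ( (2*pi)^(d/2) * sqrt( det( Sigma ) ) )
```

Using `Sigma \ dx` rather than `inv( Sigma ) * dx` avoids forming the explicit inverse, which is both faster and numerically safer.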
Multivariate Gaussian distribution

Covariance
– Measures tendency for two variables to deviate from their means in same (or opposite) directions at same time

[Figure: two scatter plots, one showing no covariance and one showing high (positive) covariance]
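The slide gives only the verbal definition; for reference, the standard formula (added here, not on the slide) is

$$\operatorname{cov}(x_i, x_j) = E\big[(x_i - \mu_i)(x_j - \mu_j)\big], \qquad \Sigma_{ij} = \operatorname{cov}(x_i, x_j).$$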
Multivariate Gaussian distribution

In two dimensions:

$$\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad \boldsymbol{\Sigma} = \begin{bmatrix} 0.25 & 0.3 \\ 0.3 & 1 \end{bmatrix}$$

[Figure: the corresponding 2-D Gaussian density]
Multivariate Gaussian distribution

In two dimensions:

$$\boldsymbol{\Sigma} = \begin{bmatrix} 2 & 0.6 \\ 0.6 & 2 \end{bmatrix}, \qquad \boldsymbol{\Sigma} = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}, \qquad \boldsymbol{\Sigma} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

[Figure: the 2-D Gaussian density for each of the three covariance matrices]
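Figures like these can be reproduced with a few lines of MATLAB (a sketch, assuming the Statistics Toolbox's mvnpdf and the matrix values as reconstructed above):

```matlab
% Contours of the 2-D Gaussian density for the first covariance matrix above
[ X1, X2 ] = meshgrid( linspace( -4, 4, 101 ) );
Sigma = [ 2 0.6; 0.6 2 ];
P = mvnpdf( [ X1(:) X2(:) ], [ 0 0 ], Sigma );
contour( X1, X2, reshape( P, size( X1 ) ) );   % elliptical contours tilted by the positive covariance
axis equal;
```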
Multivariate Gaussian distribution

In three dimensions:

```matlab
rng( 1 );
mu    = [ 2; 1; 1 ];
sigma = [ 0.25 0.30 0.10; 0.30 1.00 0.70; 0.10 0.70 2.00 ];
x = randn( 1000, 3 );
x = x * chol( sigma );             % gives cov( x ) = sigma; the slide's x * sigma would yield sigma^2
x = x + repmat( mu', 1000, 1 );    % shift the samples so they are centered at mu
scatter3( x( :, 1 ), x( :, 2 ), x( :, 3 ), '.' );
```
Vector projection

Orthogonal projection of y onto x
– Can take place in any space of dimensionality ≥ 2
– The unit vector in the direction of x is x / || x ||
– The length of the projection of y in the direction of x is || y || cos( θ ), where θ is the angle between x and y
– The orthogonal projection of y onto x is the vector

$$\mathrm{proj}_{\mathbf{x}}(\mathbf{y}) = \frac{\|\mathbf{y}\|\cos\theta}{\|\mathbf{x}\|}\, \mathbf{x} = \left[ \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|^2} \right] \mathbf{x} \quad \text{(using the dot-product alternate form)}$$

[Figure: vectors x and y, with projₓ( y ) lying along x]
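The dot-product form is one line of MATLAB (the vectors here are illustrative, not from the slides):

```matlab
% Orthogonal projection of y onto x, using the dot-product form
x = [ 3; 1 ];
y = [ 2; 2 ];
proj = ( dot( x, y ) / norm( x )^2 ) * x     % the component of y lying along x
```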
Linear models

There are many types of linear models in machine learning.
– Common in both classification and regression.
– A linear model consists of a vector β in d-dimensional feature space.
– The vector β attempts to capture the strongest gradient (rate of change) in the output variable, as seen across all training samples.
– Different linear models optimize β in different ways.
– A point x in feature space is mapped from d dimensions to a scalar (1-dimensional) output z by projection onto β:

$$z = \boldsymbol{\beta} \cdot \mathbf{x} = \beta_1 x_1 + \cdots + \beta_d x_d$$
Linear models

There are many types of linear models in machine learning.
– The projection output z is typically transformed to a final predicted output y by some function f:

$$y = f(z) = f(\boldsymbol{\beta} \cdot \mathbf{x}) = f(\beta_1 x_1 + \cdots + \beta_d x_d)$$

  example: for logistic regression, f is the logistic function
  example: for linear regression, f( z ) = z
– Models are called linear because they are a linear function of the model vector components β₁, …, β_d.
– Key feature of all linear models: no matter what f is, a constant value of z is transformed to a constant value of y, so decision boundaries remain linear even after the transform.
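A minimal MATLAB sketch of both examples (the function handles and data values are mine, for illustration only):

```matlab
% Linear model: project x onto beta, then transform with f
beta = [ 0.8; -0.5 ];                        % model vector (illustrative values)
x    = [ 1.2; 0.4 ];                         % a point in feature space
z    = beta' * x;                            % scalar projection output

f_logistic = @( z ) 1 ./ ( 1 + exp( -z ) );  % logistic regression: f is the logistic function
f_linear   = @( z ) z;                       % linear regression: f( z ) = z

y_logistic = f_logistic( z )                 % value in ( 0, 1 )
y_linear   = f_linear( z )                   % equals z
```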
Geometry of projections

[Figure sequence, five slides: data points projected onto a weight vector w with offset w0, building up to the margin of a linear decision boundary]

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)
From projection to prediction

positive margin → class 1
negative margin → class 0
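In code, this rule is just a sign test on the projection (a sketch; beta0 and beta are assumed already fitted, and x holds one sample per row):

```matlab
% Class prediction from the sign of the margin
z = beta0 + x * beta;                        % margins for the rows of x
class = double( z > 0 );                     % positive margin -> class 1, negative -> class 0
```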
Logistic regression in two dimensions

Interpreting the model vector of coefficients

From MATLAB: B = [ 13.0460 -1.9024 -0.4047 ]
β₀ = B( 1 ), β = [ β₁ β₂ ] = B( 2 : 3 )
β₀ and β define the location and orientation of the decision boundary
– −β₀ / || β || is the distance of the decision boundary from the origin
– the decision boundary is perpendicular to β
The magnitude of β defines the gradient of probabilities between 0 and 1

[Figure: two-class data in two dimensions with the fitted logistic regression decision boundary]
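The slides do not show the fitting call that produced B; a plausible sketch using the Statistics Toolbox, where X and y are assumed to be the training features and 0/1 labels:

```matlab
% Fit logistic regression; glmfit returns coefficients with the intercept first,
% matching B = [ beta0; beta1; beta2 ] on the slide
% X: n-by-2 feature matrix, y: n-by-1 vector of 0/1 class labels (assumed)
B    = glmfit( X, y, 'binomial', 'link', 'logit' );
p    = glmval( B, X, 'logit' );              % predicted P( class 1 | x )
yhat = double( p > 0.5 );                    % threshold at probability 0.5
```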
Logistic function in d dimensions

[Figure: the logistic function over a d-dimensional input]

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)
Decision boundary for logistic regression

[Figure: decision boundary for logistic regression]

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)