Machine Learning
Andrew Rosenberg
Lecture 3: Math Primer II
Today
• Wrap-up of probability
• Vectors, matrices
• Calculus
• Differentiation with respect to a vector
Properties of probability density functions
Sum Rule
Product Rule
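The rule equations on this slide were images in the original deck; the standard forms are:

Sum rule: p(X) = \sum_Y p(X, Y)
Product rule: p(X, Y) = p(Y \mid X)\, p(X)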
Expected Values
• Given a random variable, with a distribution p(X), what is the expected value of X?
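The defining equation was an image; the standard definition is:

E[X] = \sum_x x\, p(x) (discrete), or E[X] = \int x\, p(x)\, dx (continuous)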
Multinomial Distribution
• If a variable, x, can take 1-of-K states, we represent the distribution of this variable as a multinomial distribution.
• The probability of x being in state k is μk
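The distribution itself was an image; in the usual 1-of-K notation (x_k = 1 for the active state, 0 otherwise):

p(x \mid \mu) = \prod_{k=1}^{K} \mu_k^{x_k}, with \mu_k \ge 0 and \sum_k \mu_k = 1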
Expected Value of a Multinomial
• The expected value is the vector of means.
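In symbols:

E[x \mid \mu] = (\mu_1, \ldots, \mu_K)^T = \mu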
Gaussian Distribution
• One Dimension
• D-Dimensions
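The density formulas were images; the standard forms are:

One dimension: N(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
D dimensions: N(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)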
Gaussians
How machine learning uses statistical modeling
• Expectation
– The expected value of a function is the hypothesis.
• Variance
– The variance is the confidence in that hypothesis.
Variance
• The variance of a random variable describes how much variability there is around the expected value.
• It is calculated as the expected squared error.
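In symbols:

Var[X] = E[(X - E[X])^2] = E[X^2] - E[X]^2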
Covariance
• The covariance of two random variables expresses how they vary together.
• If two variables are independent, their covariance equals zero.
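In symbols:

Cov[X, Y] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]\,E[Y]

Note that the converse does not hold: zero covariance does not imply independence.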
Linear Algebra
• Vectors
– A one-dimensional array.
– If not specified, assume x is a column vector.
• Matrices
– A higher-dimensional array: n rows by m columns.
– Typically denoted with capital letters.
Transposition
• Transposing a matrix swaps columns and rows.
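In index notation: (A^T)_{ij} = A_{ji}; an n-by-m matrix becomes m-by-n.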
Addition
• Matrices can be added together iff they have the same dimensions.
– A and B are both n-by-m matrices.
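Elementwise: (A + B)_{ij} = A_{ij} + B_{ij}.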
Multiplication
• To multiply two matrices, the inner dimensions must be the same.
– An n-by-m matrix can be multiplied by an m-by-k matrix.
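Each entry of the product is a dot product of a row of A with a column of B, and the result is n-by-k:

(AB)_{ij} = \sum_{l=1}^{m} A_{il} B_{lj}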
Inversion
• The inverse of a square (n-by-n) matrix A is denoted A^{-1}, and has the following property:

A A^{-1} = A^{-1} A = I

• Here I, the identity matrix, is the n-by-n matrix with ones along the diagonal.
– I_{ij} = 1 if i = j, and 0 otherwise
Identity Matrix
• Matrices are invariant under multiplication by the identity matrix.
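That is, AI = IA = A.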
Helpful matrix inversion properties
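The identities listed on this slide were images in the original; the standard ones are:

(A^{-1})^{-1} = A
(AB)^{-1} = B^{-1} A^{-1}
(A^T)^{-1} = (A^{-1})^T

A minimal NumPy sketch (not part of the original slides) to check the last two numerically; random Gaussian matrices are invertible with probability 1:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))  # random square matrix
B = rng.standard_normal((3, 3))  # random square matrix

# (AB)^{-1} equals B^{-1} A^{-1}: note the reversed order
assert np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A))

# (A^T)^{-1} equals (A^{-1})^T: inverse and transpose commute
assert np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T)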
Norm
• The norm of a vector, x, represents the Euclidean length of the vector.
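In symbols: \|x\| = \sqrt{x^T x} = \sqrt{\sum_i x_i^2}.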
Positive Definiteness
• Quadratic form
– Scalar
– Vector
• Positive Definite matrix M
• Positive Semi-definite
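The defining equations were images in the original; the standard definitions: a scalar quadratic form is f(x) = a x^2, and its vector analogue is f(x) = x^T M x. A matrix M is positive definite if x^T M x > 0 for all x \ne 0, and positive semi-definite if x^T M x \ge 0 for all x.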
Calculus
• Derivatives and integrals
• Optimization
Derivatives
• The derivative of a function gives the slope of the function at a point x.
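The usual limit definition:

f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}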
Derivative Example
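The worked example was an image; a representative one: for f(x) = x^2, f'(x) = 2x, so the slope at x = 3 is 6.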
Integrals
• Integration is the inverse operation of differentiation (up to a constant)
• Graphically, an integral can be considered the area under the curve defined by f(x)
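In symbols: if F'(x) = f(x), then \int f(x)\, dx = F(x) + C, and \int_a^b f(x)\, dx is the area under f between a and b.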
Integration Example
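The worked example was an image; a representative one: \int x^2\, dx = x^3/3 + C, so \int_0^2 x^2\, dx = 8/3.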
Vector Calculus
• Differentiation with respect to a matrix or vector
• Gradient
• Change of variables with a vector
Derivative w.r.t. a vector
• Given a vector x, and a function f(x), how can we find f’(x)?
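Differentiating with respect to each component gives the vector of partial derivatives:

\frac{\partial f}{\partial x} = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)^T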
Example Derivation
Also referred to as the gradient of a function.
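The derivation itself was an image; a representative example: for f(x) = x^T x, the gradient is \nabla f(x) = 2x.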
Useful Vector Calculus identities
• Scalar Multiplication
• Product Rule
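The identities were images in the original; standard forms matching these names (using the denominator-layout convention) are:

Scalar multiplication: \frac{\partial}{\partial x}\left(c\, f(x)\right) = c\, \frac{\partial f}{\partial x}
Product rule: \frac{\partial}{\partial x}\left(u(x)^T v(x)\right) = \frac{\partial u}{\partial x} v + \frac{\partial v}{\partial x} u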
Useful Vector Calculus identities
• Derivative of an inverse
• Change of Variable
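Standard forms for these (the slide's equations were images):

Derivative of an inverse: \frac{\partial A^{-1}}{\partial x} = -A^{-1} \frac{\partial A}{\partial x} A^{-1}
Change of variable: for an invertible mapping y = g(x), p_y(y) = p_x(g^{-1}(y)) \left| \det \frac{\partial g^{-1}(y)}{\partial y} \right|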
Optimization
• We have an objective function, f(x), that we'd like to maximize or minimize.
• Set the first derivative to zero.
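A quick illustrative example (not from the original slide): to minimize f(x) = (x - 3)^2, set f'(x) = 2(x - 3) = 0, which gives x = 3.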
Optimization with constraints
• What if we want to constrain the parameters of the model?
– For example: the mean is less than 10.
• Find the best likelihood, subject to a constraint.
• Two functions:
– An objective function to maximize
– An inequality that must be satisfied
Lagrange Multipliers
• Find maxima of f(x,y) subject to a constraint.
General form
• Maximizing:
• Subject to:
• Introduce a new variable, and find a maximum.
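The equations were images in the original; the standard setup: maximize f(x) subject to g(x) = 0, introduce the multiplier \lambda, and define the Lagrangian

\Lambda(x, \lambda) = f(x) + \lambda\, g(x)

Setting \nabla \Lambda = 0 recovers both the constraint g(x) = 0 and the optimality condition \nabla f = -\lambda \nabla g.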
Example
• Maximizing:
• Subject to:
• Introduce a new variable, and find a maximum.
Example
We now have three equations in three unknowns.
Example
• Eliminate λ, then substitute and solve.
Why does Machine Learning need these tools?
• Calculus
– We need to identify the maximum likelihood or minimum risk: optimization.
– Integration allows the marginalization of continuous probability density functions.
• Linear Algebra
– Many features lead to high-dimensional spaces.
– Vectors and matrices allow us to compactly describe and manipulate high-dimensional feature spaces.
Why does Machine Learning need these tools?
• Vector Calculus
– All of the optimization needs to be performed in high-dimensional spaces.
– Optimization of multiple variables simultaneously: gradient descent.
– We want to take marginals over high-dimensional distributions like Gaussians.
Next Time
• Linear Regression
– Then regularization