8/3/2019 Andrew Rosenberg- Lecture 2.1: Vector Calculus CSC 84020 - Machine Learning
http://slidepdf.com/reader/full/andrew-rosenberg-lecture-21-vector-calculus-csc-84020-machine-learning 1/46
Lecture 2.1: Vector Calculus
CSC 84020 - Machine Learning
Andrew Rosenberg
February 5, 2009
Today
Last Time
Probability Review

Today
Vector Calculus
Background
Let's talk.

Linear Algebra
  Vectors
  Matrices
  Basis Spaces
  Eigenvectors/values
  Inversion and transposition

Calculus
  Derivation
  Integration

Vector Calculus
  Gradients
  Derivation w.r.t. a vector
Linear Algebra Basics
What is a vector?
What is a matrix?
Transposition
Adding matrices and vectors
Multiplying matrices.
Definitions
A vector is a one-dimensional array. We denote vectors as x. If we don't specify otherwise, assume x is a column vector.

x = (x0, x1, . . . , xn−1)^T
Definitions
A matrix is a higher-dimensional array. We typically denote matrices with capital letters, e.g., A. If A is an n-by-m matrix, it has the following structure:

A =
[ a0,0     a0,1     . . .  a0,m−1   ]
[ a1,0     a1,1     . . .  a1,m−1   ]
[ . . .                    . . .    ]
[ an−1,0   an−1,1   . . .  an−1,m−1 ]
Matrix transposition
Transposing a matrix or vector swaps rows and columns. A column vector becomes a row vector:

x = (x0, x1, . . . , xn−1)^T

x^T = ( x0  x1  . . .  xn−1 )
Matrix transposition
Transposing a matrix or vector swaps rows and columns. A column vector becomes a row vector.

A =
[ a0,0     a0,1     . . .  a0,m−1   ]
[ a1,0     a1,1     . . .  a1,m−1   ]
[ . . .                    . . .    ]
[ an−1,0   an−1,1   . . .  an−1,m−1 ]

A^T =
[ a0,0     a1,0     . . .  an−1,0   ]
[ a0,1     a1,1     . . .  an−1,1   ]
[ . . .                    . . .    ]
[ a0,m−1   a1,m−1   . . .  an−1,m−1 ]

If A is n-by-m, then A^T is m-by-n.
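As a quick numeric sketch of the shape rule (NumPy code, not part of the original slides; the matrix is hypothetical):

```python
import numpy as np

# A hypothetical 2-by-3 matrix: transposition swaps rows and columns,
# so the transpose is 3-by-2 and (A^T)[j, i] == A[i, j].
A = np.array([[1, 2, 3],
              [4, 5, 6]])
At = A.T
```

Transposing twice returns the original matrix, matching (A^T)^T = A.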
Adding Matrices
Matrices can only be added if they have the same dimensions.

A + B =
[ a0,0+b0,0          a0,1+b0,1          . . .  a0,m−1+b0,m−1     ]
[ a1,0+b1,0          a1,1+b1,1          . . .  a1,m−1+b1,m−1     ]
[ . . .                                        . . .             ]
[ an−1,0+bn−1,0      an−1,1+bn−1,1      . . .  an−1,m−1+bn−1,m−1 ]
Multiplying matrices
To multiply two matrices, the inner dimensions must match: an n-by-m matrix can be multiplied by an n′-by-m′ matrix iff m = n′.

AB = C

cij = Σ_{k=0}^{m−1} aik bkj

That is, multiply the i-th row of A by the j-th column of B.
Image from wikipedia.
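The entry-wise definition above can be checked against NumPy's built-in matrix product (an illustrative sketch with hypothetical matrices, not from the original lecture):

```python
import numpy as np

# Hypothetical matrices with matching inner dimensions:
# a (3-by-2) times a (2-by-3) gives a 3-by-3 result.
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
B = np.array([[1, 0, 2],
              [0, 1, 3]])
C = A @ B

# Entry-wise definition from the slide: c_ij = sum_k a_ik * b_kj,
# i.e., the i-th row of A dotted with the j-th column of B.
c_02 = sum(A[0, k] * B[k, 2] for k in range(A.shape[1]))
```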
Useful matrix operations
Inversion
Norm

Eigenvector decomposition
Matrix Inversion

The inverse of a square n-by-n matrix A is denoted A−1, and has the following property:

AA−1 = A−1A = I

where I is the identity matrix: the n-by-n matrix with Iij = 1 if i = j and 0 otherwise. Only square matrices can have two-sided inverses, and not every square matrix is invertible.

What is the inverse of a vector? x−1 = ?
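A small numeric check of the defining property (a NumPy sketch, not part of the lecture; the matrix is hypothetical but invertible):

```python
import numpy as np

# A hypothetical invertible 2-by-2 matrix.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
A_inv = np.linalg.inv(A)
I = np.eye(2)
# For a square invertible A, both A @ A_inv and A_inv @ A
# recover the identity (up to floating-point error).
left = A @ A_inv
right = A_inv @ A
```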
Some useful Matrix Inversion Properties

(A−1)−1 = A

(kA)−1 = k−1A−1

(A^T)−1 = (A−1)^T

(AB)−1 = B−1A−1
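All four properties can be verified numerically (an illustrative NumPy sketch with hypothetical matrices, not from the original slides):

```python
import numpy as np

# Hypothetical invertible matrices and a nonzero scalar.
A = np.array([[2.0, 1.0], [0.0, 3.0]])
B = np.array([[1.0, 4.0], [2.0, 1.0]])
k = 2.5
inv = np.linalg.inv

p1 = np.allclose(inv(inv(A)), A)                  # (A^-1)^-1 = A
p2 = np.allclose(inv(k * A), (1.0 / k) * inv(A))  # (kA)^-1 = k^-1 A^-1
p3 = np.allclose(inv(A.T), inv(A).T)              # (A^T)^-1 = (A^-1)^T
p4 = np.allclose(inv(A @ B), inv(B) @ inv(A))     # (AB)^-1 = B^-1 A^-1
```

Note the order reversal in the last property: the product rule for inverses reverses the factors, just as transposition does.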
The norm of a vector

The norm of a vector x is written ||x||, and represents the Euclidean length of the vector:

||x|| = √( Σ_{i=0}^{n−1} xi² ) = √( x0² + x1² + . . . + xn−1² )
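The formula agrees with NumPy's built-in norm (a small sketch, not part of the lecture; the vector is hypothetical):

```python
import numpy as np

# The classic 3-4-5 right triangle: the Euclidean length of (3, 4) is 5.
x = np.array([3.0, 4.0])
norm_manual = np.sqrt(np.sum(x ** 2))  # sqrt(x0^2 + x1^2 + ...)
norm_np = np.linalg.norm(x)
```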
Positive Definite/Semi-Definite

A positive definite matrix M has the property that

x^T M x > 0 for all x ≠ 0.

A positive semi-definite matrix M has the property that

x^T M x ≥ 0 for all x.
Why might we care about these matrices?
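One equivalent test for positive definiteness of a symmetric matrix is that all its eigenvalues are positive. A NumPy sketch with a hypothetical matrix (illustrative only, not from the original slides):

```python
import numpy as np

# A hypothetical symmetric matrix; its eigenvalues turn out to be 1 and 3,
# both positive, so it is positive definite.
M = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
eigvals = np.linalg.eigvalsh(M)

# Spot-check the defining property x^T M x > 0 on random nonzero vectors.
rng = np.random.default_rng(0)
xs = rng.normal(size=(100, 2))
quad = ((xs @ M) * xs).sum(axis=1)  # row-wise x^T M x
```

Covariance matrices are always positive semi-definite, which is one reason these definitions matter later.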
Eigenvectors
For a square matrix A, eigenvectors are defined by

A ui = λi ui

where ui is an eigenvector and λi is its corresponding eigenvalue.

In general, eigenvalues are complex numbers, but if A is symmetric, they are real.

Eigenvectors describe how a matrix transforms a vector, and can be used to define a basis space, namely the eigenspace.

Who cares? The eigenvectors of a covariance matrix have some very interesting properties.
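The defining equation A ui = λi ui can be checked directly (a NumPy sketch with a hypothetical symmetric matrix, not part of the lecture):

```python
import numpy as np

# A hypothetical symmetric matrix, so its eigenvalues are real.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
# eigh is for symmetric (Hermitian) matrices; eigenvalues come back ascending.
lams, U = np.linalg.eigh(A)
# Each column u_i of U satisfies A u_i = lambda_i u_i, so
# A @ U equals U with each column scaled by its eigenvalue.
check = A @ U
```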
Basis Spaces
Basis spaces allow vectors to be represented in different spaces. Our normal 2-dimensional basis space is generated by the vectors [1, 0] and [0, 1].

Any 2-d vector can be expressed as a linear combination of these two basis vectors.

However, any two non-colinear vectors can generate a 2-d basis space; the generating vectors need not be perpendicular.
Basis Spaces

[Figures omitted: basis-space illustrations]
Basis Spaces
Why do we care?
Dimensionality reduction.
Calculus Basics
What is a derivative?
What is an integral?
Derivatives
A derivative, (d/dx) f(x), can be thought of as defining the slope of a function f(x). This is sometimes also written as f′(x).
Derivative Example

[Figure omitted]
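In place of the original figure, a numeric sketch of the derivative-as-slope idea (illustrative code with a hypothetical function, not from the slides):

```python
# The derivative as a slope: compare a central finite difference of a
# hypothetical f(x) = x**3 with its known derivative f'(x) = 3x**2.
def f(x):
    return x ** 3

def f_prime(x):
    return 3 * x ** 2

def slope(g, x, h=1e-6):
    # Slope of the secant line through (x-h, g(x-h)) and (x+h, g(x+h));
    # as h shrinks, this approaches the tangent slope g'(x).
    return (g(x + h) - g(x - h)) / (2 * h)

approx = slope(f, 2.0)  # should be close to f'(2) = 12
```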
Integrals
Integrals are an inverse operation of the derivative (plus a constant):

∫ f(x) dx = F(x) + c, where F′(x) = f(x)

An integral can be thought of as a calculation of the area under the curve defined by f(x).

A definite integral evaluates the area over a specified interval; an indefinite integral gives an antiderivative, defined only up to the constant c.
Integration Example

[Figure omitted]
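In place of the original figure, a numeric sketch of the area-under-the-curve idea (illustrative code with a hypothetical integrand, not from the slides):

```python
# Approximate the definite integral of f(x) = x**2 on [0, 1] with a
# midpoint Riemann sum; the antiderivative F(x) = x**3 / 3 gives the
# exact value F(1) - F(0) = 1/3.
def f(x):
    return x ** 2

n = 100_000
h = 1.0 / n
riemann = sum(f((i + 0.5) * h) * h for i in range(n))  # midpoint rule
exact = 1.0 / 3.0
```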
Useful calculus operations
Product, quotient, summation rules for derivatives.
Useful integration and derivative identities.
Chain rule

Integration by parts
Variable substitution (don’t forget the Jacobian!)
Calculus Identities
Summation rule

g(x) = f0(x) + f1(x)
g′(x) = f′0(x) + f′1(x)

Product Rule

g(x) = f0(x) f1(x)
g′(x) = f0(x) f′1(x) + f′0(x) f1(x)

Quotient Rule

g(x) = f0(x) / f1(x)
g′(x) = ( f′0(x) f1(x) − f0(x) f′1(x) ) / f1(x)²
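Both rules can be sanity-checked with a finite difference (an illustrative sketch using hypothetical f0, f1; not part of the original slides):

```python
import numpy as np

# Hypothetical functions with known derivatives.
f0 = lambda x: np.sin(x)
f1 = lambda x: x ** 2 + 1.0
df0 = lambda x: np.cos(x)
df1 = lambda x: 2.0 * x

def fd(g, x, h=1e-6):
    # Central finite difference approximation of g'(x).
    return (g(x + h) - g(x - h)) / (2 * h)

x0 = 0.7
# Product rule: (f0 f1)' = f0 f1' + f0' f1
prod_rule = f0(x0) * df1(x0) + df0(x0) * f1(x0)
# Quotient rule: (f0 / f1)' = (f0' f1 - f0 f1') / f1^2
quot_rule = (df0(x0) * f1(x0) - f0(x0) * df1(x0)) / f1(x0) ** 2
```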
Constant multipliers

g(x) = c f(x)
g′(x) = c f′(x)

Exponent Rule

g(x) = f(x)^k
g′(x) = k f(x)^(k−1) f′(x)

Chain Rule

g(x) = f0(f1(x))
g′(x) = f′0(f1(x)) f′1(x)
Exponent Rule

g(x) = e^x
g′(x) = e^x

g(x) = k^x
g′(x) = ln(k) k^x

Logarithm Rule

g(x) = ln(x)
g′(x) = 1/x

g(x) = log_b(x)
g′(x) = 1/(x ln b)
Calculus Operations
Integration by Parts

∫ f(x) (dg(x)/dx) dx = f(x) g(x) − ∫ g(x) (df(x)/dx) dx

Variable Substitution

∫_a^b f(g(x)) g′(x) dx = ∫_{g(a)}^{g(b)} f(u) du
Vector Calculus
Derivation with respect to a vector or matrix.

Gradient of a vector.

Change of variables with a vector.
Derivation with respect to a vector

Given a vector x = (x0, x1, . . . , xn−1)^T and a function f(x) : R^n → R, how can we find ∂f(x)/∂x?

∂f(x)/∂x =
[ ∂f(x)/∂x0    ]
[ ∂f(x)/∂x1    ]
[ . . .        ]
[ ∂f(x)/∂xn−1  ]

This is also called the gradient of the function, and is often written ∇f(x) or ∇f.

Why might this be useful?
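The component-wise definition above can be computed numerically (an illustrative NumPy sketch with a hypothetical f, not from the original slides):

```python
import numpy as np

# Gradient of a hypothetical f(x) = x^T x; analytically, grad f = 2x.
def f(x):
    return x @ x

def numerical_grad(f, x, h=1e-6):
    # Build the gradient one component at a time, exactly as in the
    # column-vector definition of df/dx.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([1.0, -2.0, 3.0])
grad = numerical_grad(f, x)
```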
Useful Vector Calculus identities
Given a vector x with |x| = n and a scalar variable y:

∂x/∂y =
[ ∂x0/∂y    ]
[ ∂x1/∂y    ]
[ . . .     ]
[ ∂xn−1/∂y  ]
Useful Vector Calculus identities
Given a vector x with |x| = n and a vector y with |y| = m:

∂x/∂y =
[ ∂x0/∂y0      ∂x0/∂y1      . . .  ∂x0/∂ym−1   ]
[ ∂x1/∂y0      ∂x1/∂y1      . . .  ∂x1/∂ym−1   ]
[ . . .                             . . .       ]
[ ∂xn−1/∂y0    ∂xn−1/∂y1    . . .  ∂xn−1/∂ym−1 ]
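This n-by-m matrix of partials (the Jacobian) can be built numerically, one column per component of y (a NumPy sketch with a hypothetical map x(y); not part of the original lecture):

```python
import numpy as np

# A hypothetical map x(y) = (y0*y1, y0+y1, y0^2) from R^2 to R^3.
def x_of_y(y):
    return np.array([y[0] * y[1], y[0] + y[1], y[0] ** 2])

def jacobian(fn, y, h=1e-6):
    n, m = len(fn(y)), len(y)
    J = np.zeros((n, m))
    for j in range(m):
        e = np.zeros(m)
        e[j] = h
        # Column j holds the partials of every output w.r.t. y_j.
        J[:, j] = (fn(y + e) - fn(y - e)) / (2 * h)
    return J

y = np.array([2.0, 3.0])
J = jacobian(x_of_y, y)
# Analytic Jacobian at (2, 3): [[y1, y0], [1, 1], [2*y0, 0]] = [[3, 2], [1, 1], [4, 0]]
```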
Vector Calculus Identities
Similar to the Scalar Multiplication Rule:

∂/∂x (x^T a) = ∂/∂x (a^T x) = a

Similar to the Product Rule:

∂/∂x (AB) = (∂A/∂x) B + A (∂B/∂x)

Derivative of a matrix inverse:

∂/∂x (A−1) = −A−1 (∂A/∂x) A−1

Change of Variable in an Integral:

∫ f(x) dx = ∫ f(u) |∂x/∂u| du

where |∂x/∂u| is the absolute value of the Jacobian determinant.
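The matrix-inverse rule is easy to check numerically for a matrix depending on a scalar t (an illustrative NumPy sketch with a hypothetical A(t); not from the original slides):

```python
import numpy as np

# Check d/dt A(t)^-1 = -A^-1 (dA/dt) A^-1 for a hypothetical A(t).
def A(t):
    return np.array([[2.0 + t, 1.0],
                     [0.0, 1.0 + t ** 2]])

def dA(t):
    return np.array([[1.0, 0.0],
                     [0.0, 2.0 * t]])

t, h = 0.5, 1e-6
Ainv = np.linalg.inv(A(t))
# Left side: central finite difference of the inverse.
lhs = (np.linalg.inv(A(t + h)) - np.linalg.inv(A(t - h))) / (2 * h)
# Right side: the identity from the slide.
rhs = -Ainv @ dA(t) @ Ainv
```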
Calculating the Expectation of a Gaussian
Now we have enough tools to calculate the expectation of a variable given a Gaussian distribution. Recall:

E[x | µ, σ²] = ∫ p(x | µ, σ²) x dx
             = ∫ N(x | µ, σ²) x dx
             = ∫ (1/√(2πσ²)) exp( −(x − µ)²/(2σ²) ) x dx
Calculating the Expectation of a Gaussian
E[x | µ, σ²] = ∫ (1/√(2πσ²)) exp( −(x − µ)²/(2σ²) ) x dx

Substituting u = x − µ, du = dx:

E[x | µ, σ²] = ∫ (1/√(2πσ²)) exp( −u²/(2σ²) ) (u + µ) du
             = ∫ (1/√(2πσ²)) exp( −u²/(2σ²) ) u du  +  µ ∫ (1/√(2πσ²)) exp( −u²/(2σ²) ) du
Calculating the Expectation of a Gaussian
E[x | µ, σ²] = ∫ (1/√(2πσ²)) exp( −u²/(2σ²) ) u du  +  µ ∫ (1/√(2πσ²)) exp( −u²/(2σ²) ) du

The second integrand is a normalized density, so

∫ (1/√(2πσ²)) exp( −u²/(2σ²) ) du = 1

E[x | µ, σ²] = ∫ (1/√(2πσ²)) exp( −u²/(2σ²) ) u du + µ

Aside: A function is odd iff f(x) = −f(−x). Odd functions have the property ∫_{−∞}^{∞} f(x) dx = 0.

A function is even iff f(x) = f(−x). The product of an odd function and an even function is an odd function.
Calculating the Expectation of a Gaussian
E[x | µ, σ²] = ∫ (1/√(2πσ²)) exp( −u²/(2σ²) ) u du + µ

exp( −u²/(2σ²) ) is even and u is odd, so their product exp( −u²/(2σ²) ) u is odd. Therefore

∫ (1/√(2πσ²)) exp( −u²/(2σ²) ) u du = 0

E[x | µ, σ²] = µ
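The result E[x | µ, σ²] = µ can be sanity-checked by Monte Carlo sampling (an illustrative NumPy sketch with hypothetical parameter values; not part of the original derivation):

```python
import numpy as np

# Draw many samples from a Gaussian with hypothetical mu and sigma;
# the sample mean should approach mu as the sample size grows.
rng = np.random.default_rng(0)
mu, sigma = 1.5, 2.0
samples = rng.normal(mu, sigma, size=1_000_000)
mean_est = samples.mean()
```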
Why does Machine Learning need these tools?
Calculus

We need to find maximum likelihoods or minimum risks. This optimization is accomplished with derivatives.

Integration allows us to marginalize continuous probability density functions.

Linear Algebra

We will be working in high-dimensional spaces.

Vectors and matrices allow us to refer to high-dimensional points (groups of features) as vectors.

Matrices allow us to describe the feature space.
Why does machine learning need these tools?
Vector Calculus

We need to do all of the calculus operations in high-dimensional feature spaces.

We will want to optimize multiple values simultaneously: Gradient Descent.

We will need to take marginals over high-dimensional distributions: Gaussians.
Broader Context
What we have so far:

Entities in the world are represented as feature vectors, and maybe a label.

We want to construct statistical models of the feature vectors. Finding the most likely model is an optimization problem.

Since the feature vectors may have more than one dimension, linear algebra can help us work with them.
Bye
Next
Linear Regression