math recapmath recap jingwei tang @ cgl. 19.02.2020. ... probability space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด)...

41
Math Recap Jingwei Tang @ CGL 19.02.2020 [email protected] 0

Upload: others

Post on 27-Sep-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Math Recap

Jingwei Tang @ CGL

19.02.2020

[email protected] 0

Page 2: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

1

โ€ข Bottom-up: Building up concepts from foundational to more advanced.This strategy has the advantage that the reader at all times is able to relyon their previously learned concepts.

โ€ข Top-down: Drilling down from practical needs to more basicrequirements. This goal-driven approach has the advantage that thereaders know at all times why they need to work on a particular concept.

-- <Mathematics for Machine Learning>

Page 3: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

2

โ€ข Bottom-up: Building up concepts from foundational to more advanced.This strategy has the advantage that the reader at all times is able to relyon their previously learned concepts.

โ€ข Top-down: Drilling down from practical needs to more basicrequirements. This goal-driven approach has the advantage that thereaders know at all times why they need to work on a particular concept.

-- <Mathematics for Machine Learning>

Page 4: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

1. Linear Algebra

2. (Brief) Vector Calculus

3. Probability Theory

3

Page 5: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Matrix Multiplications

โ€ข Let matrix ๐ด๐ด โˆˆ โ„๐‘š๐‘šร—๐‘๐‘ and ๐ต๐ต โˆˆ โ„๐‘๐‘ร—๐‘›๐‘›, matrix-matrix multiplication can be defined as

๐ถ๐ถ๐‘–๐‘–๐‘–๐‘– = ๏ฟฝ๐‘˜๐‘˜=1

๐‘๐‘

๐ด๐ด๐‘–๐‘–๐‘˜๐‘˜๐ต๐ต๐‘˜๐‘˜๐‘–๐‘–

the result matrix ๐ถ๐ถ โˆˆ โ„๐‘š๐‘šร—๐‘›๐‘›

โ€ข Vectors can be viewed as matrices: ๐‘ฅ๐‘ฅ โˆˆ โ„๐‘๐‘ร—1

๐‘ฆ๐‘ฆ๐‘–๐‘– = ๏ฟฝ๐‘˜๐‘˜=1

๐‘๐‘

๐ด๐ด๐‘–๐‘–๐‘˜๐‘˜๐‘ฅ๐‘ฅ๐‘˜๐‘˜

4

Page 6: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Matrix Multiplications

โ€ข Example: matrix-matrix multiplication:

1 2 33 2 1

0 21 โˆ’10 1

=

โ€ข Example: matrix-vector multiplication:

0 21 โˆ’10 1

23 =

5

Page 7: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Matrix Multiplications

โˆ€๐ด๐ด โˆˆ โ„๐‘š๐‘šร—๐‘›๐‘›, ๐ต๐ต โˆˆ โ„๐‘›๐‘›ร—๐‘๐‘, ๐ถ๐ถ,๐ท๐ท โˆˆ โ„๐‘๐‘ร—๐‘ž๐‘ž

โ€ข Associativity: ๐ด๐ด๐ต๐ต ๐ถ๐ถ = ๐ด๐ด(๐ต๐ต๐ถ๐ถ)

โ€ข Distributivity: ๐ด๐ด + ๐ต๐ต ๐ถ๐ถ = ๐ด๐ด๐ถ๐ถ + ๐ต๐ต๐ถ๐ถ๐ด๐ด ๐ถ๐ถ + ๐ท๐ท = ๐ด๐ด๐ถ๐ถ + ๐ด๐ด๐ท๐ท

โ€ข NO Commutativity: ๐ด๐ด๐ต๐ต โ‰  ๐ต๐ต๐ด๐ด

6

Page 8: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Linear Systemsโ€ข System of linear equations

๐‘Ž๐‘Ž11๐‘ฅ๐‘ฅ1 + โ‹ฏ+ ๐‘Ž๐‘Ž1๐‘›๐‘›๐‘ฅ๐‘ฅ๐‘›๐‘› = ๐‘๐‘1โ€ฆ โ€ฆ

๐‘Ž๐‘Ž๐‘š๐‘š1๐‘ฅ๐‘ฅ1 + โ‹ฏ+ ๐‘Ž๐‘Ž๐‘š๐‘š๐‘›๐‘›๐‘ฅ๐‘ฅ๐‘›๐‘› = ๐‘๐‘๐‘š๐‘š

can be represented through matrix-vector multiplication:

๐‘Ž๐‘Ž11 โ€ฆ ๐‘Ž๐‘Ž1๐‘›๐‘›โ‹ฎ โ‹ฎ

๐‘Ž๐‘Ž๐‘š๐‘š1 โ€ฆ ๐‘Ž๐‘Ž๐‘š๐‘š๐‘›๐‘›

๐‘ฅ๐‘ฅ1โ‹ฎ๐‘ฅ๐‘ฅ๐‘›๐‘›

=๐‘๐‘1โ‹ฎ๐‘๐‘๐‘š๐‘š

7

Page 9: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Linear Systems ๐ด๐ด๐‘ฅ๐‘ฅ = ๐‘๐‘๐ด๐ด โˆˆ โ„๐‘š๐‘šร—๐‘›๐‘›, ๐‘ฅ๐‘ฅ โˆˆ โ„๐‘›๐‘›, ๐‘๐‘ โˆˆ โ„๐‘š๐‘š: ๐‘š๐‘š equations, ๐‘›๐‘› variables

โ€ข ๐‘š๐‘š = ๐‘›๐‘›: Square Systems Can have 0, 1,โˆž solution(s).

โ€ข ๐‘š๐‘š < ๐‘›๐‘›: Underdetermined Systems Typically have โˆž solutions.

โ€ข ๐‘š๐‘š > ๐‘›๐‘›: Overdetermined Systems Linear Regression: least-squares solution (min ๐ด๐ด๐‘ฅ๐‘ฅ โˆ’ ๐‘๐‘ 2

)

8

Page 10: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Linear Systems ๐ด๐ด๐‘ฅ๐‘ฅ = ๐‘๐‘Square system examples:

โ€ข 2๐‘ฅ๐‘ฅ + 3๐‘ฆ๐‘ฆ = 5๐‘ฅ๐‘ฅ + ๐‘ฆ๐‘ฆ = 3

โ€ข 2๐‘ฅ๐‘ฅ + 3๐‘ฆ๐‘ฆ = 54๐‘ฅ๐‘ฅ + 6๐‘ฆ๐‘ฆ = 5

โ€ข 2๐‘ฅ๐‘ฅ + 3๐‘ฆ๐‘ฆ = 54๐‘ฅ๐‘ฅ + 6๐‘ฆ๐‘ฆ = 10

9

Square system ๐ด๐ด๐‘ฅ๐‘ฅ = ๐‘๐‘ has an unique solution.

โ‡”

๐ด๐ด is invertible.

โ‡”

๐ด๐ด๐‘ฅ๐‘ฅ = 0 only has trivial solution ๐‘ฅ๐‘ฅ = 0.

โ‡’ ๐‘ฅ๐‘ฅ = 4๐‘ฆ๐‘ฆ = โˆ’1

โ‡’ NO solutions

โ‡’ ๐‘ฅ๐‘ฅ = 5โˆ’3๐‘ฆ๐‘ฆ2

,โˆ€๐‘ฆ๐‘ฆ โˆˆ โ„

Page 11: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Invertibility and Determinantโ€ข Matrix ๐ด๐ด โˆˆ โ„๐‘›๐‘›ร—๐‘›๐‘› is called invertible if there exists ๐ต๐ต โˆˆ โ„๐‘›๐‘›ร—๐‘›๐‘› s.t. ๐ด๐ด๐ต๐ต =๐ผ๐ผ = ๐ต๐ต๐ด๐ด, ๐ต๐ต is then called the inverse of ๐ด๐ด, ๐ต๐ต = ๐ด๐ดโˆ’1.

โ€ข ๐ด๐ด โˆˆ โ„๐‘›๐‘›ร—๐‘›๐‘› is invertible or nonsingular if and only if it is square and full rank. Equivalently, having det ๐ด๐ด โ‰  0.

โ€ข det ๐ด๐ด :โ„๐‘›๐‘›ร—๐‘›๐‘› โ†’ โ„

e.g. det ๐‘Ž๐‘Ž ๐‘๐‘๐‘๐‘ ๐‘‘๐‘‘ = ๐‘Ž๐‘Ž ๐‘๐‘

๐‘๐‘ ๐‘‘๐‘‘ = ๐‘Ž๐‘Ž๐‘‘๐‘‘ โˆ’ ๐‘๐‘๐‘๐‘

๐‘Ž๐‘Ž11 ๐‘Ž๐‘Ž12 ๐‘Ž๐‘Ž13๐‘Ž๐‘Ž21 ๐‘Ž๐‘Ž22 ๐‘Ž๐‘Ž23๐‘Ž๐‘Ž31 ๐‘Ž๐‘Ž32 ๐‘Ž๐‘Ž33

= ๐‘Ž๐‘Ž11๐‘Ž๐‘Ž22 ๐‘Ž๐‘Ž23๐‘Ž๐‘Ž32 ๐‘Ž๐‘Ž33 โˆ’ ๐‘Ž๐‘Ž12

๐‘Ž๐‘Ž21 ๐‘Ž๐‘Ž23๐‘Ž๐‘Ž31 ๐‘Ž๐‘Ž33 + ๐‘Ž๐‘Ž13

๐‘Ž๐‘Ž21 ๐‘Ž๐‘Ž22๐‘Ž๐‘Ž31 ๐‘Ž๐‘Ž32

10

Page 12: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Eigenvalues and Eigenvectorsโ€ข Let ๐ด๐ด โˆˆ โ„๐‘›๐‘›ร—๐‘›๐‘› be a square matrix. ๐œ†๐œ† โˆˆ โ„ is an

eigenvalue of ๐ด๐ด and ๐‘ฅ๐‘ฅ โˆˆ โ„๐‘›๐‘›\ {0} is the corresponding eigenvector if

๐ด๐ด๐‘ฅ๐‘ฅ = ๐œ†๐œ†๐‘ฅ๐‘ฅ

โ‡” ๐ด๐ด โˆ’ ๐œ†๐œ†๐ผ๐ผ๐‘›๐‘› ๐‘ฅ๐‘ฅ = 0 has solutions other than ๐‘ฅ๐‘ฅ = 0.

โ‡” ๐‘‘๐‘‘๐‘‘๐‘‘๐‘‘๐‘‘ ๐ด๐ด โˆ’ ๐œ†๐œ†๐ผ๐ผ๐‘›๐‘› = 0. (Polynomial of degree ๐‘›๐‘›)

11

๐ด๐ด is invertible. Equivalently, det ๐ด๐ด โ‰  0

โ‡”

๐ด๐ด๐‘ฅ๐‘ฅ = 0 only has trivial solution ๐‘ฅ๐‘ฅ = 0.

Page 13: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Eigenvalues and EigenvectorsExample: find the eigenvalues and eigenvectors of

๐ด๐ด =1

12

12

1

12

Page 14: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

13

Solution:

๐œ†๐œ†1 =12

, ๐ธ๐ธ๐œ†๐œ†1 = span 1โˆ’1 ,

๐œ†๐œ†2 =32

, ๐ธ๐ธ๐œ†๐œ†2 = span 11 .

Eigenvalues and Eigenvectors

Page 15: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Eigendecomposition

๐ด๐ด| |๐‘ฅ๐‘ฅ1 โ‹ฏ ๐‘ฅ๐‘ฅ๐‘›๐‘›| |

=| |๐‘ฅ๐‘ฅ1 โ‹ฏ ๐‘ฅ๐‘ฅ๐‘›๐‘›| |

๐œ†๐œ†1 0โ‹ฑ

0 ๐œ†๐œ†๐‘›๐‘›

๐ด๐ด๐ด๐ด = ๐ด๐ด๐ท๐ทโ‡”

๐ด๐ด = ๐ด๐ด๐ท๐ท๐ด๐ดโˆ’1 (if and only if eigenvectors of ๐ด๐ด form a basis of โ„๐‘›๐‘›)

14

Page 16: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

EigendecompositionExample: find the Eigendecomposition of

๐ด๐ด =1

12

12

1

๐œ†๐œ†1 =12

, ๐ธ๐ธ๐œ†๐œ†1 = span 1โˆ’1 ,

๐œ†๐œ†2 =32

, ๐ธ๐ธ๐œ†๐œ†2 = span 11 .

15

Page 17: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Eigendecomposition

๐ด๐ด = ๐ด๐ด๐ท๐ท๐ด๐ดโˆ’1

๐ท๐ท =12

0

0 32

, ๐ด๐ด = 12

1 1โˆ’1 1 , ๐ด๐ดโˆ’1 = 1

21 โˆ’11 1

16

Optional, to form a set of (nice) normalized basis

Page 18: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Matrix Decompositionsโ€ข Eigendecomposition: ๐ด๐ด โˆˆ โ„๐‘›๐‘›ร—๐‘›๐‘›, eigenvectors of ๐ด๐ด form a basis of โ„๐‘›๐‘›

๐ด๐ด = ๐ด๐ด๐ท๐ท๐ด๐ดโˆ’1

โ€ข QR/QU Decomposition (from Gram-Schmidt process): ๐ด๐ด โˆˆ โ„๐‘›๐‘›ร—๐‘›๐‘›

๐ด๐ด = ๐‘„๐‘„๐‘„๐‘„โ€ข LU Decomposition (from Gaussian Elimination): ๐ด๐ด โˆˆ โ„๐‘š๐‘šร—๐‘›๐‘›

๐ด๐ด = ๐ฟ๐ฟ๐‘„๐‘„โ€ข Singular Value Decomposition (SVD): ๐ด๐ด โˆˆ โ„๐‘š๐‘šร—๐‘›๐‘›,๐‘„๐‘„ โˆˆ โ„๐‘š๐‘šร—๐‘š๐‘š,๐‘‰๐‘‰ โˆˆ โ„๐‘›๐‘›ร—๐‘›๐‘›. ฮฃ โˆˆ โ„๐‘š๐‘šร—๐‘›๐‘› is a diagonal matrix.

๐ด๐ด = ๐‘„๐‘„ฮฃ๐‘‰๐‘‰๐‘‡๐‘‡

โ€ข Cholesky Decomposition: ๐ด๐ด โˆˆ โ„๐‘›๐‘›ร—๐‘›๐‘›, symmetric and positive definite๐ด๐ด = ๐ฟ๐ฟ๐ฟ๐ฟ๐‘‡๐‘‡

17

Page 19: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Positive Definiteness of Matrices

โ€ข Symmetric: ๐ด๐ด โˆˆ โ„๐‘›๐‘›ร—๐‘›๐‘› is symmetric if ๐ด๐ด๐‘–๐‘–๐‘–๐‘– = ๐ด๐ด๐‘–๐‘–๐‘–๐‘– ,โˆ€๐‘–๐‘–, ๐‘—๐‘— โˆˆ [1,๐‘›๐‘›].

e.g. 1 22 0 ,

1 2 32 โˆ’1 53 5 0

โ€ข Symmetric Positive Definite: A Symmetric matrix ๐ด๐ด โˆˆ โ„๐‘›๐‘›ร—๐‘›๐‘› is called symmetric, positive definite if

โˆ€๐‘ฅ๐‘ฅ โˆˆ โ„๐‘›๐‘›\{0}: ๐‘ฅ๐‘ฅ๐‘‡๐‘‡๐ด๐ด๐‘ฅ๐‘ฅ > 0If only โ‰ฅ holds, ๐ด๐ด is called symmetric, positive semidefinite.

18

Page 20: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Positive Definiteness of Matrices

Example: find out whether the following matrix is symmetric positive definite

๐ด๐ด = 9 66 5

19

Page 21: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

1. Linear Algebra

2. (Brief) Vector Calculus

3. Probability Theory

20

Page 22: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Notion of Derivatives

Derivative: Let ๐‘“๐‘“:โ„ โ†’ โ„, ๐‘ฅ๐‘ฅ โ†’ ๐‘“๐‘“(๐‘ฅ๐‘ฅ), the derivative is defined as:

๐‘‘๐‘‘๐‘“๐‘“๐‘‘๐‘‘๐‘ฅ๐‘ฅ

โ‰” lim๐›ฟ๐›ฟ๐›ฟ๐›ฟโ†’0

๐‘“๐‘“ ๐‘ฅ๐‘ฅ + ๐›ฟ๐›ฟ๐‘ฅ๐‘ฅ โˆ’ ๐‘“๐‘“ ๐‘ฅ๐‘ฅ๐›ฟ๐›ฟ๐‘ฅ๐‘ฅ

21

Page 23: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Notion of DerivativesPartial Derivative: Let ๐‘“๐‘“:โ„๐‘›๐‘› โ†’ โ„,๐’™๐’™ โ†’ ๐‘“๐‘“ ๐’™๐’™ ,๐’™๐’™ โˆˆ โ„๐‘›๐‘› of ๐‘›๐‘› variables, the partial derivative is defined as:

๐œ•๐œ•๐‘“๐‘“๐œ•๐œ•๐‘ฅ๐‘ฅ๐‘–๐‘–

โ‰” lim๐›ฟ๐›ฟ๐›ฟ๐›ฟโ†’0

๐‘“๐‘“ ๐‘ฅ๐‘ฅ1, โ€ฆ , ๐‘ฅ๐‘ฅ๐‘–๐‘– + ๐›ฟ๐›ฟ๐‘ฅ๐‘ฅ, โ€ฆ ๐‘ฅ๐‘ฅ๐‘›๐‘› โˆ’ ๐‘“๐‘“ ๐‘ฅ๐‘ฅ1, โ€ฆ , ๐‘ฅ๐‘ฅ๐‘–๐‘– , โ€ฆ ๐‘ฅ๐‘ฅ๐‘›๐‘›๐›ฟ๐›ฟ๐‘ฅ๐‘ฅ

Gradient: Collect partial derivatives of all variables and form a row vector

๐›ป๐›ป๐‘“๐‘“ = grad๐‘“๐‘“ =๐‘‘๐‘‘๐‘“๐‘“๐‘‘๐‘‘๐’™๐’™

=๐œ•๐œ•๐‘“๐‘“๐œ•๐œ•๐‘ฅ๐‘ฅ1

๐œ•๐œ•๐‘“๐‘“๐œ•๐œ•๐‘ฅ๐‘ฅ2

โ€ฆ๐œ•๐œ•๐‘“๐‘“๐œ•๐œ•๐‘ฅ๐‘ฅ๐‘›๐‘›

โˆˆ โ„1ร—๐‘›๐‘›

22

Page 24: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Notion of Derivatives

23

Jacobian: Let ๐‘“๐‘“:โ„๐‘›๐‘› โ†’ โ„๐‘š๐‘š,๐’™๐’™ โ†’ ๐’‡๐’‡ ๐’™๐’™ ,๐’™๐’™ โˆˆ โ„๐‘›๐‘›,๐’‡๐’‡ ๐’™๐’™ โˆˆ โ„๐‘š๐‘š, stacking all the gradient of components of ๐’‡๐’‡ ๐’™๐’™ into a matrix:

๐ฝ๐ฝ =๐‘‘๐‘‘๐’‡๐’‡ ๐’™๐’™๐‘‘๐‘‘๐’™๐’™

=

๐‘‘๐‘‘๐‘“๐‘“1 ๐’™๐’™๐‘‘๐‘‘๐’™๐’™โ‹ฎ

๐‘‘๐‘‘๐‘“๐‘“๐‘š๐‘š ๐’™๐’™๐‘‘๐‘‘๐’™๐’™

=

๐‘‘๐‘‘๐‘“๐‘“1 ๐’™๐’™๐‘‘๐‘‘๐‘ฅ๐‘ฅ1

โ‹ฏ๐‘‘๐‘‘๐‘“๐‘“1 ๐’™๐’™๐‘‘๐‘‘๐‘ฅ๐‘ฅ๐‘›๐‘›

โ‹ฎ โ‹ฎ๐‘‘๐‘‘๐‘“๐‘“๐‘š๐‘š ๐’™๐’™๐‘‘๐‘‘๐‘ฅ๐‘ฅ1

โ‹ฏ๐‘‘๐‘‘๐‘“๐‘“๐‘š๐‘š ๐’™๐’™๐‘‘๐‘‘๐‘ฅ๐‘ฅ๐‘›๐‘›

โˆˆ โ„๐‘š๐‘šร—๐‘›๐‘›

Page 25: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Notion of Derivatives

โ€ข Derivative: ๐‘“๐‘“:โ„ โ†’ โ„, ๐‘‘๐‘‘๐‘‘๐‘‘๐‘‘๐‘‘๐›ฟ๐›ฟโˆˆ โ„

โ€ข Gradient: ๐‘“๐‘“:โ„๐‘›๐‘› โ†’ โ„, ๐›ป๐›ป๐‘“๐‘“ โˆˆ โ„1ร—๐‘›๐‘›

โ€ข Jacobian: ๐‘“๐‘“:โ„๐‘›๐‘› โ†’ โ„๐‘š๐‘š, ๐ฝ๐ฝ โˆˆ โ„๐‘š๐‘šร—๐‘›๐‘›

24

Page 26: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Chain Ruleโ€ข Real-valued functions: ๐‘“๐‘“,๐‘”๐‘”:โ„ โ†’ โ„, ๐‘ฅ๐‘ฅ โ†’ ๐‘“๐‘“ ๐‘ฅ๐‘ฅ , ๐‘ฅ๐‘ฅ โ†’ ๐‘”๐‘” ๐‘ฅ๐‘ฅ

๐‘‘๐‘‘๐‘”๐‘” ๐‘“๐‘“ ๐‘ฅ๐‘ฅ๐‘‘๐‘‘๐‘ฅ๐‘ฅ

=๐‘‘๐‘‘๐‘”๐‘”(๐‘“๐‘“ ๐‘ฅ๐‘ฅ )๐‘‘๐‘‘๐‘“๐‘“(๐‘ฅ๐‘ฅ)

๐‘‘๐‘‘๐‘“๐‘“(๐‘ฅ๐‘ฅ)๐‘‘๐‘‘๐‘ฅ๐‘ฅ

โ€ข Multi-variable functions: ๐‘“๐‘“:โ„๐‘›๐‘› โ†’ โ„๐‘š๐‘š,๐’™๐’™ โ†’ ๐’‡๐’‡ ๐’™๐’™ ,๐‘”๐‘”:โ„๐‘š๐‘š โ†’ โ„,๐’™๐’™ โ†’ ๐‘”๐‘” ๐’™๐’™

๐‘‘๐‘‘๐‘”๐‘” ๐‘“๐‘“ ๐’™๐’™๐‘‘๐‘‘๐’™๐’™

=๐‘‘๐‘‘๐‘”๐‘”(๐’‡๐’‡ ๐’™๐’™ )๐‘‘๐‘‘๐’‡๐’‡(๐’™๐’™)

๐‘‘๐‘‘๐’‡๐’‡ ๐’™๐’™๐‘‘๐‘‘๐’™๐’™

25

1 ร— ๐‘›๐‘› 1 ร— ๐‘š๐‘š ๐‘š๐‘š ร— ๐‘›๐‘›

Page 27: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

More Complicated Derivatives

โ€ข Derivative of Matricesโ€ข Higher-order Derivatives

[See references]

26

Page 29: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

1. Linear Algebra

2. (Brief) Vector Calculus

3. Probability Theory

28

Page 30: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Probability Space (ฮฉ,๐’œ๐’œ,๐ด๐ด)

โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment

โ€ข Event Space ๐’œ๐’œ: space of potential results of the experiment (a collection of all subsets of ฮฉ in discrete setting)

โ€ข Probability ๐ด๐ด: with each event ๐ด๐ด โˆˆ ๐’œ๐’œ, we associate a number ๐ด๐ด(๐ด๐ด)that measures the โ€˜degree of beliefโ€™ that the event will occur.

29

Page 31: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Probability Space (ฮฉ,๐’œ๐’œ,๐ด๐ด)

โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment

โ€ข Event Space ๐’œ๐’œ: space of potential results of the experiment (a collection of all subsets of ฮฉ in discrete setting)

โ€ข Probability ๐ด๐ด: with each event ๐ด๐ด โˆˆ ๐’œ๐’œ, we associate a number ๐ด๐ด(๐ด๐ด)that measures the โ€˜degree of beliefโ€™ that the event will occur.

โ€ข Random Variable ๐‘‹๐‘‹: A function/mapping ๐‘‹๐‘‹:ฮฉ โ†’ ๐’ฏ๐’ฏ. We are interested in the probabilities on elements of ๐’ฏ๐’ฏ.

30

Page 32: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Probability Space (ฮฉ,๐’œ๐’œ,๐ด๐ด)Example: tossing coins

โ€ข Experiment: tossing coins for two consecutive times.โ€ข Sample Space: ฮฉ = {โ„Žโ„Ž, ๐‘‘๐‘‘๐‘‘๐‘‘,โ„Ž๐‘‘๐‘‘, ๐‘‘๐‘‘โ„Ž} (โ„Ž for head and ๐‘‘๐‘‘ for tails)โ€ข Random Variable: ๐‘‹๐‘‹ maps the event to number of heads. ๐’ฏ๐’ฏ = 0,1,2 .

๐‘‹๐‘‹ โ„Žโ„Ž = 2,๐‘‹๐‘‹ ๐‘‘๐‘‘๐‘‘๐‘‘ = 0,๐‘‹๐‘‹ โ„Ž๐‘‘๐‘‘ = ๐‘‹๐‘‹ ๐‘‘๐‘‘โ„Ž = 1โ€ข Probabilities (on ๐’ฏ๐’ฏ):

๐ด๐ด ๐‘‹๐‘‹ = 0 = 0.25,๐ด๐ด ๐‘‹๐‘‹ = 1 = 0.5,๐ด๐ด ๐‘ฅ๐‘ฅ = 2 = 0.25

31

Page 33: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

PDF and CDFโ€ข Probability Density Function (PDF):

๐‘“๐‘“:โ„ โ†’ โ„ s.t. โˆ€๐’™๐’™ โˆˆ โ„, ๐‘“๐‘“ ๐‘ฅ๐‘ฅ โ‰ฅ 0 and โˆซโ„ ๐‘“๐‘“ ๐‘ฅ๐‘ฅ ๐‘‘๐‘‘๐‘ฅ๐‘ฅ = 1

We can associate a random variable ๐‘‹๐‘‹ with PDF:

๐ด๐ด ๐‘Ž๐‘Ž โ‰ค ๐‘‹๐‘‹ โ‰ค ๐‘๐‘ = ๏ฟฝ๐‘Ž๐‘Ž

๐‘๐‘๐‘“๐‘“ ๐‘ฅ๐‘ฅ ๐‘‘๐‘‘๐‘ฅ๐‘ฅ

โ€ข Cumulative Distribution Function(CDF):๐น๐น๐‘‹๐‘‹ ๐‘ฅ๐‘ฅ = ๐ด๐ด(๐‘‹๐‘‹ โ‰ค ๐‘ฅ๐‘ฅ)

๐น๐น๐‘‹๐‘‹ ๐‘ฅ๐‘ฅ = ๏ฟฝโˆ’โˆž

๐›ฟ๐›ฟ๐‘“๐‘“ ๐‘ง๐‘ง ๐‘‘๐‘‘๐‘ง๐‘ง , ๐‘“๐‘“ ๐‘ฅ๐‘ฅ =

๐‘‘๐‘‘๐น๐น๐‘‹๐‘‹ ๐‘ฅ๐‘ฅ๐‘‘๐‘‘๐‘ฅ๐‘ฅ

32

๐‘ฅ๐‘ฅ

Page 34: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Joint Distribution

โ€ข Let ๐‘‹๐‘‹,๐‘Œ๐‘Œ be two random variables over the same probability space. Joint distribution is defined as

๐น๐น๐‘‹๐‘‹,๐‘Œ๐‘Œ ๐‘ฅ๐‘ฅ,๐‘ฆ๐‘ฆ = ๐ด๐ด ๐‘‹๐‘‹ โ‰ค ๐‘ฅ๐‘ฅ,๐‘Œ๐‘Œ โ‰ค ๐‘ฆ๐‘ฆjoint density:

๐‘“๐‘“ ๐‘ฅ๐‘ฅ,๐‘ฆ๐‘ฆ =๐œ•๐œ•2๐น๐น๐‘‹๐‘‹,๐‘Œ๐‘Œ(๐‘ฅ๐‘ฅ,๐‘ฆ๐‘ฆ)

๐œ•๐œ•๐‘ฅ๐‘ฅ๐œ•๐œ•๐‘ฆ๐‘ฆโ€ข Marginalization:

๐‘“๐‘“๐‘‹๐‘‹ ๐‘ฅ๐‘ฅ = ๏ฟฝโˆ’โˆž

โˆž๐‘“๐‘“ ๐‘ฅ๐‘ฅ,๐‘ฆ๐‘ฆ ๐‘‘๐‘‘๐‘ฆ๐‘ฆ , ๐น๐น๐‘‹๐‘‹ ๐‘ฅ๐‘ฅ = ๐น๐น๐‘‹๐‘‹,๐‘Œ๐‘Œ(๐‘ฅ๐‘ฅ,โˆž)

33

Page 35: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Independence

โ€ข Two events ๐ด๐ด and ๐ต๐ต are independent if

๐ด๐ด ๐ด๐ด โˆฉ ๐ต๐ต = ๐ด๐ด ๐ด๐ด ๐ด๐ด(๐ต๐ต)

โ€ข Two Random Variables ๐‘‹๐‘‹ and ๐‘Œ๐‘Œ are independent if their joint distribution function factorizes, i.e.

๐น๐น๐‘‹๐‘‹,๐‘Œ๐‘Œ ๐‘ฅ๐‘ฅ,๐‘ฆ๐‘ฆ = ๐น๐น๐‘‹๐‘‹ ๐‘ฅ๐‘ฅ ๐น๐น๐‘Œ๐‘Œ(๐‘ฆ๐‘ฆ)

34

Page 36: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Conditional Probability

Probability of ๐ด๐ด given ๐ต๐ต has occurred:

๐ด๐ด ๐ด๐ด ๐ต๐ต =๐ด๐ด ๐ด๐ด โˆฉ ๐ต๐ต๐ด๐ด ๐ต๐ต

โ€ข Laws regarding conditional probability:โ€ข Law of total probability: ๐ด๐ด ๐ต๐ต = โˆ‘๐‘–๐‘–=1๐‘›๐‘› ๐ด๐ด ๐ต๐ต ๐ด๐ด๐‘–๐‘– ๐ด๐ด(๐ด๐ด๐‘–๐‘–)โ€ข Bayes Rule: ๐ด๐ด ๐ด๐ด ๐ต๐ต = ๐ด๐ด ๐ต๐ต ๐ด๐ด ๐‘ƒ๐‘ƒ ๐ด๐ด

๐‘ƒ๐‘ƒ ๐ต๐ตโ€ข Chain Rule: ๐ด๐ด ๐ด๐ด1, โ€ฆ๐ด๐ด๐‘›๐‘› = ๐ด๐ด ๐ด๐ด1 ๐ด๐ด ๐ด๐ด2 ๐ด๐ด1 ๐ด๐ด(๐ด๐ด3|๐ด๐ด1,๐ด๐ด2)โ‹ฏ๐ด๐ด(๐ด๐ด๐‘›๐‘›|๐ด๐ด1, โ€ฆ ,๐ด๐ด๐‘›๐‘›โˆ’1)

[See references]

35

Page 37: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Expectation

โ€ข The expected value of a function ๐‘”๐‘”:โ„ โ†’ โ„ of a univariate continuous random variable ๐‘‹๐‘‹~๐‘๐‘(๐‘ฅ๐‘ฅ) is given by

๐”ผ๐”ผ๐‘‹๐‘‹ ๐‘”๐‘” ๐‘ฅ๐‘ฅ = ๏ฟฝ๐’ณ๐’ณ๐‘”๐‘” ๐‘ฅ๐‘ฅ ๐‘๐‘ ๐‘ฅ๐‘ฅ ๐‘‘๐‘‘๐‘ฅ๐‘ฅ

โ€ข The mean of a random variable ๐‘‹๐‘‹ is defined as

๐”ผ๐”ผ๐‘‹๐‘‹ ๐‘ฅ๐‘ฅ = ๏ฟฝ๐’ณ๐’ณ๐‘ฅ๐‘ฅ๐‘๐‘ ๐‘ฅ๐‘ฅ ๐‘‘๐‘‘๐‘ฅ๐‘ฅ

โ€ข Linearity: ๐”ผ๐”ผ๐‘‹๐‘‹ ๐‘Ž๐‘Ž๐‘“๐‘“ ๐‘ฅ๐‘ฅ + ๐‘๐‘๐‘”๐‘” ๐‘ฅ๐‘ฅ = ๐‘Ž๐‘Ž๐”ผ๐”ผ๐‘‹๐‘‹ ๐‘“๐‘“ ๐‘ฅ๐‘ฅ + ๐‘๐‘๐”ผ๐”ผ๐‘‹๐‘‹[๐‘”๐‘” ๐‘ฅ๐‘ฅ ]

36

Page 38: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

(Co)variance

โ€ข Covariance between two univariate random variables ๐‘‹๐‘‹,๐‘Œ๐‘Œ โˆˆ โ„:๐ถ๐ถ๐ถ๐ถ๐‘ฃ๐‘ฃ๐‘‹๐‘‹,๐‘Œ๐‘Œ[๐‘ฅ๐‘ฅ,๐‘ฆ๐‘ฆ] = ๐”ผ๐”ผ๐‘‹๐‘‹,๐‘Œ๐‘Œ (๐‘ฅ๐‘ฅ โˆ’ ๐”ผ๐”ผ๐‘‹๐‘‹ ๐‘ฅ๐‘ฅ )(๐‘ฆ๐‘ฆ โˆ’ ๐”ผ๐”ผ๐‘Œ๐‘Œ ๐‘ฆ๐‘ฆ ) = ๐”ผ๐”ผ ๐‘ฅ๐‘ฅ๐‘ฆ๐‘ฆ โˆ’ ๐”ผ๐”ผ ๐‘ฅ๐‘ฅ ๐”ผ๐”ผ[๐‘ฆ๐‘ฆ]

โ€ข Variance is the covariance with itself:๐•๐• ๐‘ฅ๐‘ฅ = ๐ถ๐ถ๐ถ๐ถ๐‘ฃ๐‘ฃ๐‘‹๐‘‹,๐‘‹๐‘‹[๐‘ฅ๐‘ฅ, ๐‘ฅ๐‘ฅ] = ๐”ผ๐”ผ๐‘‹๐‘‹ ๐‘ฅ๐‘ฅ โˆ’ ๐”ผ๐”ผ๐‘‹๐‘‹ ๐‘ฅ๐‘ฅ 2 = ๐”ผ๐”ผ๐‘‹๐‘‹ ๐‘ฅ๐‘ฅ2 โˆ’ ๐”ผ๐”ผ๐‘‹๐‘‹ ๐‘ฅ๐‘ฅ 2

โ€ข NOT Linear:๐•๐• ๐‘ฅ๐‘ฅ + ๐‘ฆ๐‘ฆ = ๐•๐• ๐‘ฅ๐‘ฅ + ๐•๐• ๐‘ฆ๐‘ฆ + ๐ถ๐ถ๐ถ๐ถ๐‘ฃ๐‘ฃ ๐‘ฅ๐‘ฅ,๐‘ฆ๐‘ฆ + ๐ถ๐ถ๐ถ๐ถ๐‘ฃ๐‘ฃ[๐‘ฆ๐‘ฆ, ๐‘ฅ๐‘ฅ]

37

Page 39: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

Guassian (Normal) Distribution

โ€ข The Gaussian distribution has a density

๐‘๐‘ ๐‘ฅ๐‘ฅ ๐œ‡๐œ‡,๐œŽ๐œŽ2 =12๐œ‹๐œ‹๐œŽ๐œŽ2

exp โˆ’๐‘ฅ๐‘ฅ โˆ’ ๐œ‡๐œ‡ 2

2๐œŽ๐œŽ2

Denoted as ๐‘‹๐‘‹~๐‘๐‘(๐‘ฅ๐‘ฅ|๐œ‡๐œ‡,๐œŽ๐œŽ2)

38

PDF, CDF, Joint distribution, Expectation, Covariance, Gaussian Distribution can all be extended with some efforts to higher dimensions.[see references]

Page 40: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

References

โ€ข Mathematics for Machine Learningโ€ข Matrix Cookbook

39

Page 41: Math RecapMath Recap Jingwei Tang @ CGL. 19.02.2020. ... Probability Space (ฮฉ, ๐’œ๐’œ, ๐ด๐ด) โ€ข Sample Space ฮฉ: set of all possible outcomes of an experiment โ€ข Event Space

End of Presentation

Start of Q&A Session

40