TRANSCRIPT
Source: ce.sharif.edu/courses/95-96/1/ce717-2/resources/root/Review/Review.pdf
CE-717: Machine Learning Sharif University of Technology
M. Soleymani
Review (Probability & Linear Algebra)
Outline
Axioms of probability theory
Joint probability, conditional probability, Bayes theorem
Discrete and continuous random variables
Probability mass and density functions
Expected value, variance, standard deviation
Expectation for two variables
covariance, correlation
Some probability distributions
Gaussian distribution
Linear Algebra
Basic Probability Elements
Sample space (Ω): set of all possible outcomes (or worlds)
Outcomes are assumed to be mutually exclusive.
An event 𝐴 is a certain set of outcomes (i.e., subset of Ω).
A random variable is a function defined over the sample
space
Gender: 𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 → {𝑚𝑎𝑙𝑒, 𝑓𝑒𝑚𝑎𝑙𝑒}
Height: 𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 → ℝ+
Probability Space
A probability space is defined as a triple (Ω, 𝐹, 𝑃):
A sample space Ω ≠ ∅ that contains the set of all possible
outcomes (outcomes also called states of nature)
A set 𝐹 whose elements are called events; events are
subsets of Ω. 𝐹 should be a Borel field (σ-field).
𝑃 represents the probability measure that assigns
probabilities to events.
Probability Axioms (Kolmogorov)
Axioms define a reasonable theory of uncertainty
Kolmogorov’s probability axioms
𝑃(𝐴) ≥ 0 (∀𝐴 ⊆ Ω)
𝑃(Ω) = 1
𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵) (∀𝐴, 𝐵 ⊆ Ω)
(Venn diagram: events 𝐴 and 𝐵 inside Ω, overlapping in 𝐴 ∩ 𝐵)
Random Variables
Random variables: Variables in probability theory
Domain of random variables: Boolean, discrete, or continuous
Probability distribution: the function describing probabilities
of possible values of a random variable
𝑃(𝐷𝑖𝑐𝑒 = 1) = 1/6, 𝑃(𝐷𝑖𝑐𝑒 = 2) = 1/6, …
Random Variables
A random variable is a function that maps every outcome
in Ω to a real (or complex) number.
To define probabilities easily as functions defined on (real)
numbers.
To compute expectation, variance,…
(Figure: coin toss mapped to the real line, Tail → 0, Head → 1)
Base Definitions
Joint probability distribution
The rules of probability (sum and product rule)
Bayes’ theorem
Independence
new evidence may be irrelevant
Joint Probability Distribution
Probability of all combinations of the values for a set of
random variables.
If two or more random variables are considered together, they can
be described in terms of their joint probability
Example: Joint probability of features
𝑃(𝑋1, 𝑋2, … , 𝑋𝑑)
Two Fundamental Rules
Sum rule: 𝑃(𝑌) = Σ𝑋 𝑃(𝑋, 𝑌)
Product rule: 𝑃(𝑋, 𝑌) = 𝑃(𝑋|𝑌) 𝑃(𝑌)
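As a quick sanity check, both rules can be exercised on a small joint table (Python sketch; the weather-style joint distribution is made up for illustration):

```python
# Sum and product rules on a small discrete joint distribution.
# The joint table P(X, Y) is a hypothetical example for illustration.
joint = {
    ("rain", "cold"): 0.3,
    ("rain", "warm"): 0.1,
    ("sun", "cold"): 0.2,
    ("sun", "warm"): 0.4,
}

# Sum rule: P(Y) = sum over X of P(X, Y)
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0.0) + p

# Product rule: P(X, Y) = P(X | Y) P(Y)
p_x_given_y = {(x, y): p / p_y[y] for (x, y), p in joint.items()}
for (x, y), p in joint.items():
    assert abs(p - p_x_given_y[(x, y)] * p_y[y]) < 1e-12

print(p_y["cold"])  # 0.5
```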
Chain Rule
Chain rule is derived by successive application of product rule:
𝑃(𝑋1, … , 𝑋𝑛) = 𝑃(𝑋1, … , 𝑋𝑛−1) 𝑃(𝑋𝑛|𝑋1, … , 𝑋𝑛−1)
= 𝑃(𝑋1, … , 𝑋𝑛−2) 𝑃(𝑋𝑛−1|𝑋1, … , 𝑋𝑛−2) 𝑃(𝑋𝑛|𝑋1, … , 𝑋𝑛−1)
= …
= 𝑃(𝑋1) Π𝑖=2..𝑛 𝑃(𝑋𝑖|𝑋1, … , 𝑋𝑖−1)
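A small Python sketch can confirm the chain-rule factorization on a joint over three binary variables (the joint here is randomly generated, purely illustrative):

```python
# Chain rule check: P(x1, x2, x3) = P(x1) P(x2|x1) P(x3|x1, x2),
# built from an arbitrary (randomly generated) joint distribution.
import itertools
import random

random.seed(0)
probs = [random.random() for _ in range(8)]
total = sum(probs)
joint = {bits: p / total for bits, p in
         zip(itertools.product([0, 1], repeat=3), probs)}

def marginal(fixed):
    """Sum the joint over all outcomes consistent with `fixed` (index -> value)."""
    return sum(p for bits, p in joint.items()
               if all(bits[i] == v for i, v in fixed.items()))

for bits in joint:
    x1, x2, x3 = bits
    p1 = marginal({0: x1})
    p2_given_1 = marginal({0: x1, 1: x2}) / p1
    p3_given_12 = joint[bits] / marginal({0: x1, 1: x2})
    assert abs(joint[bits] - p1 * p2_given_1 * p3_given_12) < 1e-12
print("chain rule holds")
```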
Sum Rule: Example
[Bishop, Section 1.2]
Conditional Probability
𝑃(𝑋|𝑌) = 𝑃(𝑋, 𝑌)/𝑃(𝑌) if 𝑃(𝑌) > 0
Obtained from the product rule
𝑃(𝑋|𝑌) obeys the same rules as probabilities
Σ𝑋 𝑃(𝑋|𝑌) = 1
Conditional Probability
For statistically dependent variables, knowing the value of
one variable may allow us to better estimate the other.
All probabilities in effect are conditional probabilities
E.g., 𝑃(𝐴) = 𝑃(𝐴 | 𝑜𝑢𝑟 𝑏𝑎𝑐𝑘𝑔𝑟𝑜𝑢𝑛𝑑 𝑘𝑛𝑜𝑤𝑙𝑒𝑑𝑔𝑒)
(Figure: conditioning on 𝐵 restricts Ω to 𝐵)
𝑃(⋅|𝐵) renormalizes the probabilities of events that occur jointly with 𝐵
Conditional Probability: Example
Rolling a fair die
𝐴 : the outcome is an even number
𝐵 : the outcome is a prime number
𝑃(𝐴|𝐵) = 𝑃(𝐴, 𝐵)/𝑃(𝐵) = (1/6)/(1/2) = 1/3
Meningitis(𝑀) & Stiff neck (𝑆)
𝑃(𝑀 = 1, 𝑆 = 1) = 1/5000
𝑃(𝑀 = 1) = 1/2000
𝑃(𝑆 = 1|𝑀 = 1) = 𝑃(𝑀 = 1, 𝑆 = 1)/𝑃(𝑀 = 1) = 0.4
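The die example can be reproduced by direct enumeration (Python sketch using exact fractions):

```python
from fractions import Fraction

# Conditional probability on a fair die: A = even outcome, B = prime outcome.
A = {2, 4, 6}
B = {2, 3, 5}

p = Fraction(1, 6)          # each of the 6 outcomes is equally likely
p_B = len(B) * p            # P(B) = 1/2
p_AB = len(A & B) * p       # P(A, B) = 1/6 (only outcome 2 is even and prime)
p_A_given_B = p_AB / p_B
print(p_A_given_B)  # 1/3
```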
Conditional Probability: Another Example
[Bishop, Section 1.2]
Bayes Theorem
𝑃(𝑌|𝑋) = 𝑃(𝑋|𝑌) 𝑃(𝑌) / 𝑃(𝑋)
Obtained from the product rule and the symmetry property
𝑃 𝑋, 𝑌 = 𝑃 𝑌, 𝑋
𝑃 𝑋, 𝑌 = 𝑃 𝑋|𝑌 𝑃 𝑌 = 𝑃(𝑌|𝑋)𝑃 𝑋
In some problems, it may be difficult to compute 𝑃(𝑌|𝑋) directly, yet we might have information about 𝑃(𝑋|𝑌).
𝑃(𝐶𝑎𝑢𝑠𝑒|𝐸𝑓𝑓𝑒𝑐𝑡) = 𝑃(𝐸𝑓𝑓𝑒𝑐𝑡|𝐶𝑎𝑢𝑠𝑒) 𝑃(𝐶𝑎𝑢𝑠𝑒) / 𝑃(𝐸𝑓𝑓𝑒𝑐𝑡)
Bayes Theorem
Often it would be useful to derive the rule a bit further:
𝑃(𝑌|𝑋) = 𝑃(𝑋|𝑌) 𝑃(𝑌) / 𝑃(𝑋) = 𝑃(𝑋|𝑌) 𝑃(𝑌) / Σ𝑌 𝑃(𝑋|𝑌) 𝑃(𝑌)
Bayes Theorem: Example
Meningitis(𝑀) & Stiff neck (𝑆)
𝑃(𝑀 = 1) = 1/5000
𝑃(𝑆 = 1) = 0.01
𝑃(𝑆 = 1|𝑀 = 1) = 0.7
𝑃(𝑀 = 1|𝑆 = 1) = ?
𝑃(𝑀 = 1|𝑆 = 1) = 𝑃(𝑆 = 1|𝑀 = 1) 𝑃(𝑀 = 1) / 𝑃(𝑆 = 1) = 0.7 × 0.0002/0.01 = 0.014
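The arithmetic can be checked in a couple of lines (Python; note that 0.7 × (1/5000) / 0.01 works out to 0.014):

```python
# Bayes-rule arithmetic for the meningitis example:
# P(M=1 | S=1) = P(S=1 | M=1) P(M=1) / P(S=1).
p_m = 1 / 5000        # prior P(M = 1)
p_s = 0.01            # evidence P(S = 1)
p_s_given_m = 0.7     # likelihood P(S = 1 | M = 1)

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)  # ~0.014
```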
Prior and Posterior Probabilities
Prior or unconditional probabilities: belief in the absence of
any other evidence
e.g., 𝑃(𝑆 = 1) = 0.01
Posterior or conditional probabilities: belief in the presence of
evidence
e.g., 𝑃(𝑆 = 1|𝑀 = 1) = 0.7
Independence of Random Variables
𝑋 and 𝑌 are independent iff
𝑃(𝑋|𝑌) = 𝑃(𝑋)
𝑃(𝑌|𝑋) = 𝑃(𝑌)
𝑃(𝑋, 𝑌) = 𝑃(𝑋) 𝑃(𝑌)
Knowing 𝑌 tells us nothing about 𝑋 (and vice versa)
Probability Mass Function (PMF)
Probability Mass Function (PMF) shows the probability
for each value of a discrete random variable
Each impulse magnitude is equal to the probability of the
corresponding outcome
Example: PMF of a fair dice
𝑃(𝑋 = 𝑥) ≥ 0
Σ𝑥∈𝑋 𝑃(𝑋 = 𝑥) = 1
Probability Density Function (PDF)
Probability Density Function (PDF) is defined for
continuous random variables
The probability of 𝑥 ∈ (𝑥0, 𝑥0 + 𝛿𝑥) is 𝑝(𝑥0) × 𝛿𝑥 (for 𝛿𝑥 → 0)
𝑝(𝑥): probability density over 𝑥
𝑝(𝑥) ≥ 0
∫ 𝑝(𝑥) 𝑑𝑥 = 1
Cumulative Distribution Function (CDF)
Cumulative Distribution Function (CDF)
Defined as the integration of PDF
Similarly defined on discrete variables (summation instead of integration)
Non-decreasing
Right Continuous
𝐹(−∞) = 0, 𝐹(∞) = 1
𝑑𝐹(𝑥)/𝑑𝑥 = 𝑝(𝑥)
𝑃(𝑢 ≤ 𝑥 ≤ 𝑣) = 𝐹(𝑣) − 𝐹(𝑢)
𝐶𝐷𝐹: 𝐹(𝑥) = ∫−∞..𝑥 𝑝(𝑡) 𝑑𝑡
Distribution Statistics
Basic descriptors of a distribution:
Mean value
Variance & standard deviation
Moments
Covariance & correlation
Expected Value
Expected (or mean) value: weighted average of all possible
values of the random variable
Expectation of a discrete random variable 𝑋:
𝐸[𝑥] = Σ𝑥 𝑥 𝑝(𝑥)
Expectation of a function of a discrete random variable 𝑋:
𝐸[𝑓(𝑥)] = Σ𝑥 𝑓(𝑥) 𝑝(𝑥)
Expected value of a continuous random variable 𝑋:
𝐸[𝑥] = ∫ 𝑥 𝑝(𝑥) 𝑑𝑥
Expectation of a function of a continuous random variable 𝑋:
𝐸[𝑓(𝑥)] = ∫ 𝑓(𝑥) 𝑝(𝑥) 𝑑𝑥
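A minimal sketch of the discrete case, computing 𝐸[𝑥] and 𝐸[𝑥²] for a fair die with exact fractions:

```python
from fractions import Fraction

# Expected value of a fair die, and of a function of it, from the PMF.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

e_x = sum(x * p for x, p in pmf.items())        # E[x] = 7/2
e_x2 = sum(x**2 * p for x, p in pmf.items())    # E[f(x)] with f(x) = x^2 = 91/6
print(e_x, e_x2)  # 7/2 91/6
```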
Expected Value
For the expectation of a function of several variables, a
subscript is used to specify the variable being averaged
over
Examples:
𝐸𝑥[𝑓(𝑥, 𝑦)] = Σ𝑥 𝑝(𝑥) 𝑓(𝑥, 𝑦)
𝐸𝑥|𝑦[𝑓(𝑥, 𝑦)] = Σ𝑥 𝑝(𝑥|𝑦) 𝑓(𝑥, 𝑦)
Other notation: 𝐸𝑥[𝑓(𝑥, 𝑦)|𝑦]
𝐸𝑥,𝑦[𝑓(𝑥, 𝑦)] = Σ𝑥 Σ𝑦 𝑝(𝑥, 𝑦) 𝑓(𝑥, 𝑦)
Variance
Variance: a measure of how far values of a random
variable are spread out around its expected value
𝑉𝑎𝑟[𝑥] = 𝐸[(𝑥 − 𝐸[𝑥])²] = 𝐸[𝑥²] − 𝐸[𝑥]²
Standard deviation: square root of variance:
σ𝑥 = √𝑉𝑎𝑟[𝑥]
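The shortcut identity 𝑉𝑎𝑟[𝑥] = 𝐸[𝑥²] − 𝐸[𝑥]² can be verified directly for a fair die (Python, exact fractions):

```python
from fractions import Fraction

# Check Var[x] = E[(x - E[x])^2] = E[x^2] - E[x]^2 for a fair die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
e_x = sum(x * p for x, p in pmf.items())

var_def = sum((x - e_x) ** 2 * p for x, p in pmf.items())      # definition
var_short = sum(x**2 * p for x, p in pmf.items()) - e_x**2     # shortcut
assert var_def == var_short == Fraction(35, 12)
print(var_def)  # 35/12
```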
Moments
nth-order moment of a random variable 𝑋: 𝐸[𝑥ⁿ]
Central (normalized) nth-order moment: 𝐸[(𝑥 − 𝐸[𝑥])ⁿ]
The first-order moment is the mean value.
The second-order moment is the variance plus the square of the mean.
Correlation & Covariance
Correlation:
𝐶𝑜𝑟𝑟(𝑥, 𝑦) = 𝐸𝑥,𝑦[𝑥𝑦]
Covariance is the correlation of mean-removed variables:
𝐶𝑜𝑣(𝑥, 𝑦) = 𝐸𝑥,𝑦[(𝑥 − 𝐸[𝑥])(𝑦 − 𝐸[𝑦])]
For discrete RVs: 𝐸𝑥,𝑦[𝑥𝑦] = Σ𝑥 Σ𝑦 𝑥𝑦 𝑝(𝑥, 𝑦)
Covariance: Example
(Scatter plots: left, 𝐶𝑜𝑣(𝑥, 𝑦) = 0; right, 𝐶𝑜𝑣(𝑥, 𝑦) = 0.9)
Covariance Properties
The covariance value shows the tendency of the pair of
RVs to increase together
𝐶𝑜𝑣𝑥𝑦 > 0 ∶ 𝑥 and 𝑦 tend to increase together
𝐶𝑜𝑣𝑥𝑦 < 0 : 𝑥 tends to decrease when 𝑦 increases
𝐶𝑜𝑣𝑥𝑦 = 0 : no linear correlation between 𝑥 and 𝑦
Pearson’s Product Moment Correlation
𝜌𝑋𝑌 = 𝐶𝑜𝑣(𝑋, 𝑌)/(𝜎𝑋𝜎𝑌) = 𝐸[(𝑋 − 𝜇𝑋)(𝑌 − 𝜇𝑌)]/(𝜎𝑋𝜎𝑌)
Defined only if both 𝜎𝑋 and 𝜎𝑌 are finite and nonzero.
𝜌𝑋𝑌 shows the degree of linear dependence between 𝑋 and 𝑌.
−1 ≤ 𝜌𝑋𝑌 ≤ 1
𝐸[(𝑋 − 𝜇𝑋)(𝑌 − 𝜇𝑌)]² ≤ 𝐸[(𝑋 − 𝜇𝑋)²] 𝐸[(𝑌 − 𝜇𝑌)²] by the Cauchy–Schwarz
inequality (|𝐶𝑋𝑌| ≤ 𝜎𝑋𝜎𝑌)
𝜌𝑋𝑌 = 1 shows a perfect positive linear relationship and 𝜌𝑋𝑌 = −1
shows a perfect negative linear relationship
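A small pure-Python implementation of 𝜌 (population formulas, dividing by 𝑁; the sample data are made up) shows the ±1 extremes for perfectly linear data:

```python
import math

# Pearson correlation from samples (population formulas, dividing by N).
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

xs = [1.0, 2.0, 3.0, 4.0]
print(pearson(xs, [2 * x + 1 for x in xs]))   # ~1.0  (perfect positive linear)
print(pearson(xs, [-x for x in xs]))          # ~-1.0 (perfect negative linear)
```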
Pearson’s Correlation: Examples
[Wikipedia]
𝜌𝑋𝑌 = 𝐶𝑜𝑣(𝑋, 𝑌)/(𝜎𝑋𝜎𝑌)
Orthogonal, Uncorrelated & Independent RVs
Orthogonal random variables (𝐸[𝑥𝑦] = 0):
𝐶𝑜𝑟𝑟(𝑥, 𝑦) = 0
Uncorrelated random variables (𝐸[(𝑥 − 𝐸[𝑥])(𝑦 − 𝐸[𝑦])] = 0):
𝐶𝑜𝑣(𝑥, 𝑦) = 0
Independent random variables ⇒ 𝐶𝑜𝑣(𝑥, 𝑦) = 0
𝐶𝑜𝑣(𝑥, 𝑦) = 0 ⇏ independent random variables
Covariance Matrix
If 𝒙 is a vector of random variables (𝑑 -dim random
vector):
Covariance matrix indicates the tendency of each pair of RVs
to vary together
𝜮 = 𝐸[(𝒙 − 𝝁𝒙)(𝒙 − 𝝁𝒙)ᵀ], i.e.
𝜮𝑖𝑗 = 𝐸[(𝑥𝑖 − 𝜇𝑖)(𝑥𝑗 − 𝜇𝑗)]
𝝁𝒙 = [𝜇1, … , 𝜇𝑑]ᵀ = [𝐸(𝑥1), … , 𝐸(𝑥𝑑)]ᵀ
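A sketch of the sample covariance matrix for 2-D data (population-normalized, dividing by 𝑁; the data points are invented for illustration):

```python
# Sample covariance matrix Sigma = E[(x - mu)(x - mu)^T] for 2-D data.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]
n = len(data)
mu = [sum(p[i] for p in data) / n for i in range(2)]   # mean vector

sigma = [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in data) / n
          for j in range(2)] for i in range(2)]

assert sigma[0][1] == sigma[1][0]   # a covariance matrix is symmetric
print(mu)
print(sigma)
```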
Covariance Matrix: Two Variables
𝜮 = [𝜎1² 𝜎12; 𝜎21 𝜎2²], with 𝜎21 = 𝜎12 = 𝐶𝑜𝑣(𝑋, 𝑌)
(Scatter plots of (𝑋, 𝑌): 𝜮 = [1 0; 0 1] and 𝜮 = [1 0.9; 0.9 1])
Covariance Matrix
𝜎𝑖𝑗 shows the covariance of 𝑋𝑖 and 𝑋𝑗:
𝜎𝑖𝑗 = 𝜎𝑗𝑖 = 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗)
𝜮 = [𝜎1² 𝜎12 ⋯ 𝜎1𝑑; 𝜎21 𝜎2² ⋯ 𝜎2𝑑; ⋮ ⋮ ⋱ ⋮; 𝜎𝑑1 𝜎𝑑2 ⋯ 𝜎𝑑²]
Sums of Random Variables
𝑍 = 𝑋 + 𝑌
Mean: 𝐸[𝑍] = 𝐸[𝑋] + 𝐸[𝑌]
Variance: 𝑉𝑎𝑟(𝑍) = 𝑉𝑎𝑟(𝑋) + 𝑉𝑎𝑟(𝑌) + 2𝐶𝑜𝑣(𝑋, 𝑌)
If 𝑋, 𝑌 independent: 𝑉𝑎𝑟(𝑍) = 𝑉𝑎𝑟(𝑋) + 𝑉𝑎𝑟(𝑌)
Distribution:
𝑝𝑍(𝑧) = ∫ 𝑝𝑋,𝑌(𝑥, 𝑧 − 𝑥) 𝑑𝑥
If 𝑋, 𝑌 independent: 𝑝𝑍(𝑧) = ∫ 𝑝𝑋(𝑥) 𝑝𝑌(𝑧 − 𝑥) 𝑑𝑥 (convolution)
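The variance rule for independent variables can be checked exactly on the sum of two fair dice (Python, exact fractions):

```python
from fractions import Fraction
from itertools import product

# Var(Z) = Var(X) + Var(Y) for Z = X + Y with X, Y independent fair dice.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def var(p):
    m = sum(v * q for v, q in p.items())
    return sum((v - m) ** 2 * q for v, q in p.items())

# PMF of the sum Z, induced by the joint of two independent dice.
pz = {}
for x, y in product(pmf, repeat=2):
    pz[x + y] = pz.get(x + y, Fraction(0)) + pmf[x] * pmf[y]

assert var(pz) == var(pmf) + var(pmf)   # no covariance term: independence
print(var(pz))  # 35/6
```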
Some Famous Probability Density Functions
Uniform: 𝑥 ~ 𝑈(𝑎, 𝑏)
𝑝(𝑥) = 1/(𝑏 − 𝑎) for 𝑎 ≤ 𝑥 ≤ 𝑏, and 0 otherwise
Gaussian (Normal): 𝑥 ~ 𝑁(𝜇, 𝜎²)
𝑝(𝑥) = (1/(√(2𝜋) 𝜎)) 𝑒^(−(𝑥 − 𝜇)²/(2𝜎²))
Gaussian (Normal) Distribution
68% of the probability mass lies within (𝜇 − 𝜎, 𝜇 + 𝜎)
95% lies within (𝜇 − 2𝜎, 𝜇 + 2𝜎)
It is widely used to model the distribution of continuous variables
Standard normal distribution: 𝜇 = 0, 𝜎 = 1
Some Famous Probability Density Functions
Exponential
𝑝(𝑥) = 𝜆𝑒^(−𝜆𝑥) 𝑈(𝑥)  (i.e., 𝜆𝑒^(−𝜆𝑥) for 𝑥 ≥ 0, and 0 for 𝑥 < 0)
Some Famous Probability Mass Functions
Bernoulli: 𝑥 ∈ {0, 1}
𝑝(𝑥) = 𝜇ˣ (1 − 𝜇)^(1−𝑥)
Binomial: 𝑥 ~ 𝐵𝑖𝑛𝑜𝑚𝑖𝑎𝑙(𝑝, 𝑛)
𝑃(𝑋 = 𝑘) = (𝑛 choose 𝑘) 𝑝ᵏ (1 − 𝑝)^(𝑛−𝑘)
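A quick check of the binomial PMF (Python; n = 10 and p = 0.3 are arbitrary example values):

```python
from math import comb

# Binomial PMF P(X = k) = C(n, k) p^k (1 - p)^(n - k): sanity checks.
n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

assert abs(sum(pmf) - 1.0) < 1e-12                 # PMF normalizes to 1
mean = sum(k * pk for k, pk in enumerate(pmf))
assert abs(mean - n * p) < 1e-9                    # E[X] = np
print(round(mean, 6))  # 3.0
```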
Multivariate Gaussian Distribution
𝒙 is a vector of 𝑑 Gaussian variables
𝑝(𝒙) ~ 𝑁(𝝁, 𝜮) = (1/((2𝜋)^(𝑑/2) |𝜮|^(1/2))) 𝑒^(−(1/2)(𝒙 − 𝝁)ᵀ𝜮⁻¹(𝒙 − 𝝁))
𝝁 = [𝜇1, … , 𝜇𝑑]ᵀ = [𝐸(𝑥1), … , 𝐸(𝑥𝑑)]ᵀ
𝜮 = 𝐸[(𝒙 − 𝝁)(𝒙 − 𝝁)ᵀ]
Multivariate Gaussian Distribution
The covariance matrix is always symmetric and positive semi-
definite
Multivariate Gaussian is completely specified by 𝑑 + 𝑑(𝑑 + 1)/2 parameters
Special cases of 𝜮:
𝜮 = 𝜎²𝑰: independent random variables with the same variance
(circularly symmetric Gaussian)
Diagonal 𝜮 = diag(𝜎1², … , 𝜎𝑑²): independent random variables with
different variances
Multivariate Gaussian Distribution
Level Surfaces
The Gaussian distribution is constant on surfaces in
𝒙-space for which: (𝒙 − 𝝁)ᵀ𝜮⁻¹(𝒙 − 𝝁) = 𝐶 (a hyper-ellipsoid)
Principal axes of the hyper-ellipsoids are the eigenvectors of 𝜮.
Bivariate Gaussian: curves of constant density are ellipses.
Bivariate Gaussian distribution
𝜆1 and 𝜆2 are the eigenvalues of 𝜮 (𝜆1 ≥ 𝜆2), and 𝒗1 and
𝒗2 are the corresponding eigenvectors
(Figure: level-curve ellipse with principal axes along 𝒗1 and 𝒗2; the axis lengths satisfy 𝑙1/𝑙2 = √(𝜆1/𝜆2))
Gaussian Distribution Properties
Some attractive properties of the Gaussian distribution:
Marginal and conditional distributions of a Gaussian are also Gaussian
After a linear transformation, a Gaussian distribution is again Gaussian
There exists a linear transformation that diagonalizes the covariance matrix
(whitening transform).
It converts the multivariate normal distribution into a spherical one.
Gaussian maximizes the entropy among distributions with a given mean and variance
Gaussian is stable and infinitely divisible
Central Limit Theorem
Some distributions can be approximated by a Gaussian when their
parameter value is sufficiently large (e.g., Binomial)
Central Limit Theorem
(under mild conditions)
Suppose 𝑋𝑖 (𝑖 = 1, … , 𝑁) are i.i.d. (independent, identically distributed) RVs with finite variances
Let 𝑆𝑁 = Σ𝑖=1..𝑁 𝑋𝑖 be the sum of these RVs
The distribution of 𝑆𝑁 converges to a normal distribution as 𝑁 increases, regardless of the distribution of the RVs
Example: 𝑋𝑖 ~ uniform, 𝑖 = 1, … , 𝑁; 𝑆𝑁 = (1/𝑁) Σ𝑖=1..𝑁 𝑋𝑖
(Figure: histograms of 𝑆1, 𝑆2, 𝑆10)
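The uniform example can be simulated in a few lines (Python sketch; the trial counts and seed are arbitrary): the mean of 𝑁 uniform draws concentrates around 0.5, with spread shrinking roughly like 1/√𝑁.

```python
import random
import statistics

# CLT illustration: averages of N uniform(0, 1) draws concentrate
# around 0.5, and their spread shrinks as N grows.
random.seed(0)

def sample_means(n_vars, n_trials=5000):
    return [statistics.fmean(random.random() for _ in range(n_vars))
            for _ in range(n_trials)]

s1 = sample_means(1)
s10 = sample_means(10)

assert abs(statistics.fmean(s10) - 0.5) < 0.01
assert statistics.stdev(s10) < statistics.stdev(s1)  # spread shrinks with N
print(round(statistics.stdev(s1) / statistics.stdev(s10), 1))
```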
Linear Algebra: Basic Definitions
Matrix 𝑨 = [𝑎𝑖𝑗]𝑚×𝑛:
𝑨 = [𝑎11 𝑎12 … 𝑎1𝑛; 𝑎21 𝑎22 … 𝑎2𝑛; … ; 𝑎𝑚1 𝑎𝑚2 … 𝑎𝑚𝑛]
Matrix transpose 𝑩 = 𝑨ᵀ: 𝑏𝑖𝑗 = 𝑎𝑗𝑖 (1 ≤ 𝑖 ≤ 𝑛, 1 ≤ 𝑗 ≤ 𝑚)
Symmetric matrix: 𝑨 = 𝑨ᵀ
Vector 𝒂 = [𝑎1, … , 𝑎𝑛]ᵀ (a column vector); 𝒂ᵀ = [𝑎1, … , 𝑎𝑛]
Linear Mapping
Linear function
𝑓(𝒙 + 𝒚) = 𝑓(𝒙) + 𝑓(𝒚) ∀𝒙, 𝒚 ∈ 𝑉
𝑓(𝑎𝒙) = 𝑎𝑓(𝒙) ∀𝒙 ∈ 𝑉, 𝑎 ∈ 𝐹
A linear function: 𝑓 𝒙 = 𝑤1𝑥1 +⋯+𝑤𝑑𝑥𝑑 = 𝒘𝑇 𝒙
In general, a matrix 𝐖𝑚×𝑑 = [𝒘1 ⋯ 𝒘𝑚]ᵀ can be used to
denote a map 𝑓: ℝ𝑑 → ℝ𝑚 where 𝑓𝑖(𝒙) = 𝑤𝑖1𝑥1 + ⋯ + 𝑤𝑖𝑑𝑥𝑑 = 𝒘𝑖ᵀ𝒙
Linear Algebra: Basic Definitions
Inner (dot) product:
𝒂 ⋅ 𝒃 = 𝒂ᵀ𝒃 = Σ𝑖=1..𝑛 𝑎𝑖𝑏𝑖
Matrix multiplication:
𝑨𝑩 = 𝑪, with 𝑨 = [𝑎𝑖𝑗]𝑚×𝑝, 𝑩 = [𝑏𝑖𝑗]𝑝×𝑛, 𝑪 = [𝑐𝑖𝑗]𝑚×𝑛
𝑐𝑖𝑗 = 𝒂𝑖ᵀ𝒃𝑗 (𝑖-th row of 𝑨 times 𝑗-th column of 𝑩)
Inner Product
Inner (dot) product: 𝒂ᵀ𝒃 = Σ𝑖=1..𝑑 𝑎𝑖𝑏𝑖
Length (Euclidean norm) of a vector: ‖𝒂‖2 = √(𝒂ᵀ𝒂) = √(Σ𝑖=1..𝑑 𝑎𝑖²)
𝒂 is normalized iff ‖𝒂‖2 = 1
Angle between vectors: cos 𝜃 = 𝒂ᵀ𝒃 / (‖𝒂‖2 ‖𝒃‖2)
Orthogonal vectors 𝒂 and 𝒃: 𝒂ᵀ𝒃 = 0
Orthonormal set of vectors 𝒂1, 𝒂2, … , 𝒂𝑛: ∀𝑖, 𝑗: 𝒂𝑖ᵀ𝒂𝑗 = 1 if 𝑖 = 𝑗, and 0 otherwise
Linear Independence
A set of vectors is linearly independent if no vector is
a linear combination of other vectors.
𝑐1𝒗1 + 𝑐2𝒗2+ . . . + 𝑐𝑘𝒗𝑘 = 0 ⇒
𝑐1 = 𝑐2 = . . . = 𝑐𝑘 = 0
Matrix Determinant and Trace
Determinant:
det(𝑨) = Σ𝑗=1..𝑛 𝑎𝑖𝑗 𝐴𝑖𝑗 for any 𝑖 = 1, … , 𝑛,
where 𝐴𝑖𝑗 = (−1)^(𝑖+𝑗) det(𝑴𝑖𝑗) and 𝑴𝑖𝑗 is the minor of 𝑎𝑖𝑗
det(𝑨𝑩) = det(𝑨) × det(𝑩)
Trace:
tr[𝑨] = Σ𝑗=1..𝑛 𝑎𝑗𝑗
Matrix Inversion
Inverse of 𝑨𝑛×𝑛: 𝑨𝑩 = 𝑩𝑨 = 𝑰𝑛 ⇒ 𝑩 = 𝑨⁻¹
𝑨⁻¹ exists iff det(𝑨) ≠ 0 (𝑨 is nonsingular)
Singular: det(𝑨) = 0
Ill-conditioned: 𝑨 is nonsingular but close to being singular
Pseudo-inverse for a non-square matrix: 𝑨# = (𝑨ᵀ𝑨)⁻¹𝑨ᵀ
(provided 𝑨ᵀ𝑨 is not singular)
𝑨#𝑨 = 𝑰
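A pure-Python sketch of the pseudo-inverse on a small full-column-rank matrix (the 3×2 matrix is an arbitrary example), verifying 𝑨#𝑨 = 𝑰:

```python
# Pseudo-inverse A# = (A^T A)^{-1} A^T for a small tall matrix,
# using a hand-rolled 2x2 inverse (pure Python, no libraries).
A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]          # 3x2, full column rank

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

At = transpose(A)
AtA = matmul(At, A)                       # 2x2, invertible here
(a, b), (c, d) = AtA
det = a * d - b * c
inv = [[d / det, -b / det], [-c / det, a / det]]

pinv = matmul(inv, At)                    # A# = (A^T A)^{-1} A^T
I2 = matmul(pinv, A)                      # should be the 2x2 identity
print([[round(v, 10) for v in row] for row in I2])  # [[1.0, 0.0], [0.0, 1.0]]
```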
Matrix Rank
𝑟𝑎𝑛𝑘(𝑨) : maximum number of linearly independent
columns or rows of A.
𝑨𝑚×𝑛: 𝑟𝑎𝑛𝑘(𝑨) ≤ min(𝑚, 𝑛)
Full rank 𝑨𝑛×𝑛 : 𝑟𝑎𝑛𝑘(𝑨) = 𝑛 iff 𝑨 is nonsingular
(det(𝑨) ≠ 0)
Eigenvectors and Eigenvalues
𝑨𝒗 = 𝜆𝒗
Characteristic equation: det(𝑨 − 𝜆𝑰𝑛) = 0
an n-th order polynomial, with n roots
tr(𝑨) = Σ𝑗=1..𝑛 𝜆𝑗
det(𝑨) = Π𝑗=1..𝑛 𝜆𝑗
Eigenvector: Example
𝑨 = [2 1; 1 2]
[wikipedia]
Eigenvectors and Eigenvalues:
Symmetric Matrix
For a symmetric matrix, the eigenvectors corresponding
to distinct eigenvalues are orthogonal
These eigenvectors can be used to form an orthonormal
set (∀𝑖 ≠ 𝑗 𝒗𝑖𝑇𝒗𝑗 = 0 and 𝒗𝑖 = 1)
Eigen Decomposition: Symmetric Matrix
𝑽 = [𝒗1 … 𝒗𝑁], 𝜦 = diag(𝜆1, … , 𝜆𝑁)
𝑨𝑽 = 𝑽𝜦 ⇒ 𝑨𝑽𝑽ᵀ = 𝑽𝜦𝑽ᵀ, and since 𝑽𝑽ᵀ = 𝑰:
𝑨 = 𝑽𝜦𝑽ᵀ
Eigen decomposition of a symmetric matrix: 𝑨 = 𝑽𝜦𝑽ᵀ
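A sketch of the decomposition for one 2×2 symmetric matrix, using the closed-form eigenvalues of [[a, b], [b, c]] (the matrix values are an example):

```python
import math

# Eigen decomposition A = V Lambda V^T for a 2x2 symmetric matrix,
# worked out analytically (closed form for [[a, b], [b, c]]).
a, b, c = 2.0, 1.0, 2.0            # A = [[2, 1], [1, 2]]
disc = math.sqrt((a - c) ** 2 + 4 * b * b)
l1, l2 = (a + c + disc) / 2, (a + c - disc) / 2   # eigenvalues 3 and 1

s = 1 / math.sqrt(2)
V = [[s, s], [s, -s]]              # orthonormal eigenvectors as columns

# Reconstruct A = V Lambda V^T entry by entry.
recon = [[sum(V[i][k] * (l1 if k == 0 else l2) * V[j][k] for k in range(2))
          for j in range(2)] for i in range(2)]
print([[round(v, 10) for v in row] for row in recon])  # [[2.0, 1.0], [1.0, 2.0]]
```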
Positive Definite Matrix
Symmetric 𝑨𝑛×𝑛 is positive definite iff:
∀𝒙 ∈ ℝⁿ, 𝒙 ≠ 𝟎 ⇒ 𝒙ᵀ𝑨𝒙 > 0
Eigenvalues of a positive definite matrix are positive:
∀𝑖, 𝜆𝑖 > 0
Vector Derivatives
∂(𝒙ᵀ𝑨𝒙)/∂𝒙 = (𝑨 + 𝑨ᵀ)𝒙
∂(𝒃ᵀ𝒙)/∂𝒙 = 𝒃
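The first identity can be sanity-checked numerically with central differences (Python sketch; 𝑨 and 𝒙 are arbitrary example values):

```python
# Numerical check of the identity d(x^T A x)/dx = (A + A^T) x
# via central finite differences on a small example.
A = [[1.0, 2.0],
     [0.0, 3.0]]
x = [0.5, -1.0]

def f(v):
    # x^T A x
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

# Analytic gradient: (A + A^T) x
grad = [sum((A[i][j] + A[j][i]) * x[j] for j in range(2)) for i in range(2)]

h = 1e-6
for i in range(2):
    xp = list(x); xp[i] += h
    xm = list(x); xm[i] -= h
    num = (f(xp) - f(xm)) / (2 * h)
    assert abs(num - grad[i]) < 1e-6   # numeric matches analytic gradient
print(grad)  # [-1.0, -5.0]
```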
You can see more on the vector derivatives in the
uploaded review materials