probability in machine learning
TRANSCRIPT
![Page 1: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/1.jpg)
Probability in Machine Learning
![Page 2: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/2.jpg)
Three Axioms of Probability
• Given an event $E$ in a sample space $S$, $S = \bigcup_{i=1}^{n} E_i$
• First axiom: $P(E) \in \mathbb{R}$, $0 \le P(E) \le 1$
• Second axiom: $P(S) = 1$
• Third axiom (additivity): for any countable sequence of mutually exclusive events $E_i$, $P\left(\bigcup_{i=1}^{n} E_i\right) = P(E_1) + P(E_2) + \cdots + P(E_n) = \sum_{i=1}^{n} P(E_i)$
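A minimal sketch (not from the slide) that checks the three axioms numerically, assuming a fair six-sided die as the sample space; the events are chosen only for illustration.

```python
# Toy sample space: a fair six-sided die, each outcome with probability 1/6.
P = {outcome: 1 / 6 for outcome in range(1, 7)}

def prob(event):
    """Probability of an event given as a set of outcomes."""
    return sum(P[o] for o in event)

# First axiom: every probability lies in [0, 1].
assert all(0 <= p <= 1 for p in P.values())

# Second axiom: the whole sample space has probability 1.
assert abs(prob(set(P)) - 1.0) < 1e-12

# Third axiom (additivity): for disjoint events, P(E1 ∪ E2) = P(E1) + P(E2).
E1, E2 = {1, 2}, {5}
assert abs(prob(E1 | E2) - (prob(E1) + prob(E2))) < 1e-12
```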
![Page 3: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/3.jpg)
Random Variables
• A random variable is a variable whose values are numerical outcomes of a random phenomenon.
• Discrete variables and Probability Mass Function (PMF): $\sum_x p_X(x) = 1$
• Continuous variables and Probability Density Function (PDF): $\Pr(a \le X \le b) = \int_a^b f_X(x)\,dx$
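A small numeric sketch of both facts, assuming a fair die for the PMF and a standard exponential density for the PDF (both chosen only for illustration):

```python
import numpy as np

# PMF of a fair die: the probabilities over all outcomes sum to 1.
pmf = {x: 1 / 6 for x in range(1, 7)}
print(sum(pmf.values()))                 # 1.0

# PDF example: f(x) = exp(-x) for x >= 0 (standard exponential).
# Pr(a <= X <= b) is the integral of the density over [a, b].
a, b = 0.5, 2.0
xs = np.linspace(a, b, 200_001)
approx = float(np.sum(np.exp(-xs)) * (xs[1] - xs[0]))   # simple Riemann sum
exact = np.exp(-a) - np.exp(-b)                          # closed form
print(approx, exact)                     # both ≈ 0.4712
```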
![Page 4: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/4.jpg)
Expected Value (Expectation)
• Expectation of a random variable X: $E[X] = \sum_x x\,P(X = x)$ (for a discrete X)
• Expected value of rolling one die?
https://en.wikipedia.org/wiki/Expected_value
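A one-line check of the die question, assuming a fair six-sided die:

```python
# E[X] = sum over outcomes of x * P(X = x) for a fair six-sided die.
expected_value = sum(x * (1 / 6) for x in range(1, 7))
print(expected_value)   # 3.5
```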
![Page 5: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/5.jpg)
Expected Value of Playing Roulette
• Bet $1 on a single number (0 ~ 36) and get a $35 payoff if you win. What's the expected value?
$E[\text{gain from a } \$1 \text{ bet}] = -1 \times \frac{36}{37} + 35 \times \frac{1}{37} = -\frac{1}{37}$
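The same arithmetic in Python (a sketch for a 37-pocket wheel, as on the slide):

```python
# Single-number bet: lose $1 with probability 36/37, win $35 with probability 1/37.
expected_gain = (-1) * (36 / 37) + 35 * (1 / 37)
print(expected_gain)    # ≈ -0.027, i.e. -1/37
```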
![Page 6: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/6.jpg)
Variance
• The variance of a random variable X is the expected value of the squared deviation from the mean of X:
$\mathrm{Var}(X) = E[(X - \mu)^2]$
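Continuing the fair-die example from above (a sketch, not from the slide), the variance follows directly from the definition:

```python
# Var(X) = E[(X - mu)^2] for a fair six-sided die.
outcomes = range(1, 7)
mu = sum(x * (1 / 6) for x in outcomes)                  # 3.5
var = sum((x - mu) ** 2 * (1 / 6) for x in outcomes)     # 35/12 ≈ 2.917
print(mu, var)
```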
![Page 7: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/7.jpg)
Bias and Variance
![Page 8: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/8.jpg)
Covariance
• Covariance is a measure of the joint variability of two random variables.
$\mathrm{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]\,E[Y]$
https://programmathically.com/covariance-and-correlation/
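A quick NumPy sketch with synthetic data (the data and coefficients are made up) comparing the two equivalent forms of the covariance formula:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(size=10_000)    # y co-varies with x by construction

# Cov(X, Y) = E[XY] - E[X]E[Y]
cov_manual = np.mean(x * y) - np.mean(x) * np.mean(y)
cov_numpy = np.cov(x, y, bias=True)[0, 1]
print(cov_manual, cov_numpy)             # both ≈ 2.0
```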
![Page 9: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/9.jpg)
Standard Deviation
$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2} = \sqrt{\mathrm{Var}(X)}$
68–95–99.7 rule
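A numeric sketch of the 68–95–99.7 rule with simulated standard-normal samples (sample size chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=1_000_000)

# Fraction of samples within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(k, np.mean(np.abs(samples) <= k))   # ≈ 0.683, 0.954, 0.997
```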
![Page 10: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/10.jpg)
6 Sigma
• A product has a 99.99966% chance to be free of defects
https://www.leansixsigmadefinition.com/glossary/six-sigma/
![Page 11: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/11.jpg)
Union, Intersection, and Conditional Probability
• $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
• $P(A \cap B)$ is simplified as $P(AB)$
• Conditional probability $P(A|B)$: the probability of event A given that B has occurred
– $P(A|B) = \frac{P(AB)}{P(B)}$
– $P(AB) = P(A|B)\,P(B) = P(B|A)\,P(A)$
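A tiny worked example (not from the slide): for one roll of a fair die, let A = "the roll is a 6" and B = "the roll is even".

```python
# P(A|B) = P(AB) / P(B)
p_ab = 1 / 6            # only the outcome {6} is both "a six" and "even"
p_b = 3 / 6             # outcomes {2, 4, 6}
print(p_ab / p_b)       # 1/3
# Sanity check of the product rule: P(AB) = P(A|B) * P(B)
print((p_ab / p_b) * p_b)   # 1/6
```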
![Page 12: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/12.jpg)
Chain Rule of Probability
• The joint probability can be expressed using the chain rule, as shown below.
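The slide itself does not reproduce the formula; a standard statement of the chain rule is:

$P(A, B, C) = P(A)\,P(B \mid A)\,P(C \mid A, B)$, and in general $P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})$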
![Page 13: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/13.jpg)
Mutually Exclusive
• $P(AB) = 0$
• $P(A \cup B) = P(A) + P(B)$
![Page 14: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/14.jpg)
Independence of Events
• Two events A and B are said to be independent if the probability of their intersection is equal to the product of their individual probabilities: $P(AB) = P(A)\,P(B)$
• Equivalently, $P(A|B) = P(A)$
![Page 15: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/15.jpg)
Bayes Rule
$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$
Proof:
Remember $P(A|B) = \frac{P(AB)}{P(B)}$
So $P(AB) = P(A|B)\,P(B) = P(B|A)\,P(A)$. Then Bayes: $P(A|B) = P(B|A)\,P(A) / P(B)$
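A minimal numeric sketch of Bayes' rule; the diagnostic-test numbers below are invented for illustration:

```python
# Hypothetical test: P(disease) = 0.01,
# P(positive | disease) = 0.95, P(positive | no disease) = 0.05.
p_d = 0.01
p_pos_given_d = 0.95
p_pos_given_not_d = 0.05

# Total probability of a positive test.
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' rule: P(disease | positive) = P(positive | disease) * P(disease) / P(positive).
print(p_pos_given_d * p_d / p_pos)   # ≈ 0.161
```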
![Page 16: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/16.jpg)
Naïve Bayes Classifier
![Page 17: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/17.jpg)
Naïve = Assume All Features Independent
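The slides show the classifier only pictorially; as one possible sketch, here is a Gaussian naive Bayes model from scikit-learn on a synthetic two-class dataset (the data and parameters are made up). Each feature is modeled independently given the class, which is exactly the "naive" assumption.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Two classes, two features each; class 1 is shifted away from class 0.
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X1 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

model = GaussianNB().fit(X, y)
print(model.predict([[0.5, 0.2], [2.8, 3.1]]))   # expected: [0 1]
print(model.predict_proba([[1.5, 1.5]]))         # per-class probabilities
```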
![Page 18: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/18.jpg)
Graphical Model
$P(a, b, c, d, e) = P(a)\,P(b \mid a)\,P(c \mid a, b)\,P(d \mid b)\,P(e \mid c)$
![Page 19: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/19.jpg)
Normal (Gaussian) Distribution
• A type of continuous probability distribution for a real-valued random variable.
• One of the most important distributions
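The slide shows the bell curve without the formula; for reference, the standard density with mean $\mu$ and variance $\sigma^2$ is:

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$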
![Page 20: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/20.jpg)
Central Limit Theorem
• Averages of samples of observations of random variables independently drawn from independent distributions converge to the normal distribution.
https://corporatefinanceinstitute.com/resources/knowledge/other/central-limit-theorem/
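A quick numerical sketch of the idea, using uniform draws (which are far from normal) purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Average 50 independent Uniform(0, 1) draws; repeat 100,000 times.
sample_means = rng.uniform(0, 1, size=(100_000, 50)).mean(axis=1)

# The means are approximately normal with mean 0.5 and
# standard deviation sqrt(1/12) / sqrt(50) ≈ 0.0408.
print(sample_means.mean(), sample_means.std())
```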
![Page 21: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/21.jpg)
Bernoulli Distribution
• Definition
• PMF
• $E[X] = p$
• $\mathrm{Var}(X) = pq$, where $q = 1 - p$
https://en.wikipedia.org/wiki/Bernoulli_distribution
https://acegif.com/flipping-coin-gifs/
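The "Definition" and "PMF" bullets refer to figures on the slide; the standard statement for a Bernoulli variable with parameter $p$ is: $X$ takes the value 1 with probability $p$ and 0 with probability $1-p$, so $P(X = 1) = p$, $P(X = 0) = 1 - p$, $E[X] = p$, and $\mathrm{Var}(X) = p(1 - p)$.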
![Page 22: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/22.jpg)
Information Theory
• Self-information: $I(x) = -\log P(x)$
• Shannon entropy: $H = -\sum_i p_i \log_2 p_i$
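A small sketch computing self-information and entropy for a toy distribution (the probabilities are made up):

```python
import math

# Toy distribution over four symbols.
p = [0.5, 0.25, 0.125, 0.125]

# Self-information of each outcome, in bits: I(x) = -log2 P(x).
print([-math.log2(pi) for pi in p])              # [1.0, 2.0, 3.0, 3.0]

# Shannon entropy: H = -sum_i p_i log2 p_i (the expected self-information).
print(-sum(pi * math.log2(pi) for pi in p))      # 1.75 bits
```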
![Page 23: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/23.jpg)
Entropy of Bernoulli Variable
• $H(x) = E[I(x)] = -E[\log P(x)]$
https://en.wikipedia.org/wiki/Information_theory
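Written out for a Bernoulli variable with parameter $p$ (the standard binary entropy function), this expectation becomes:

$H(p) = -p \log_2 p - (1 - p)\log_2(1 - p)$, which is maximized at $p = 0.5$, where $H = 1$ bit.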
![Page 24: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/24.jpg)
Kullback-Leibler (KL) Divergence
• $D_{\mathrm{KL}}(P \,\|\, Q) = E\left[\log P(x) - \log Q(x)\right] = E\left[\log \frac{P(x)}{Q(x)}\right]$, where the expectation is taken over $x \sim P$
KL Divergence is Asymmetric!
$D_{\mathrm{KL}}(P \,\|\, Q) \neq D_{\mathrm{KL}}(Q \,\|\, P)$
https://www.deeplearningbook.org/slides/03_prob.pdf
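A small NumPy sketch of the asymmetry, using two made-up discrete distributions over the same three outcomes:

```python
import numpy as np

P = np.array([0.6, 0.3, 0.1])
Q = np.array([0.4, 0.4, 0.2])

def kl(p, q):
    """D_KL(p || q) = sum_x p(x) * log(p(x) / q(x)), in nats."""
    return float(np.sum(p * np.log(p / q)))

print(kl(P, Q), kl(Q, P))   # different values: KL divergence is asymmetric
```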
![Page 25: Probability in Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022053120/62922215e120670758703923/html5/thumbnails/25.jpg)
Key Takeaways
• Expected value (expectation) is the mean (weighted average) of a random variable
• Events A and B are independent if $P(AB) = P(A)\,P(B)$
• Events A and B are mutually exclusive if $P(AB) = 0$
• The central limit theorem tells us the normal distribution is a reasonable default when the data's probability distribution is unknown, because averages of many independent samples tend toward it
• Entropy is the expected value of self-information, $-E[\log P(x)]$
• KL divergence measures the difference between two probability distributions and is asymmetric