lecture 3: probability and statistics
DESCRIPTION
Lecture 3: Probability and Statistics. Machine Learning Queens College. Today. Probability and Statistics Naïve Bayes Classification Linear Algebra Matrix Multiplication Matrix Inversion Calculus Vector Calculus Optimization Lagrange Multipliers. Classical Artificial Intelligence. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/1.jpg)
Machine Learning
Queens College
Lecture 3: Probability and Statistics
![Page 2: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/2.jpg)
2
Today
• Probability and Statistics– Naïve Bayes Classification
• Linear Algebra– Matrix Multiplication– Matrix Inversion
• Calculus– Vector Calculus– Optimization– Lagrange Multipliers
![Page 3: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/3.jpg)
3
Classical Artificial Intelligence
• Expert Systems• Theorem Provers• Shakey• Chess
• Largely characterized by determinism.
![Page 4: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/4.jpg)
4
Modern Artificial Intelligence
• Fingerprint ID• Internet Search• Vision – facial ID, object recognition• Speech Recognition• Asimo• Jeopardy!
• Statistical modeling to generalize from data.
![Page 5: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/5.jpg)
5
Two Caveats about Statistical Modeling
• Black Swans• The Long Tail
![Page 6: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/6.jpg)
6
Black Swans
• In the 17th Century, all known swans were white.• Based on evidence, it is impossible for a swan to
be anything other than white.
• In the 18th Century, black swans were discovered in Western Australia
• Black Swans are rare, sometimes unpredictable events, that have extreme impact
• Almost all statistical models underestimate the likelihood of unseen events.
![Page 7: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/7.jpg)
7
The Long Tail
• Many events follow an exponential distribution• These distributions have a very long “tail”.
– I.e. A large region with significant probability mass, but low likelihood at any particular point.
• Often, interesting events occur in the Long Tail, but it is difficult to accurately model behavior in this region.
![Page 8: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/8.jpg)
8
Boxes and Balls
• 2 Boxes, one red and one blue.• Each contain colored balls.
![Page 9: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/9.jpg)
9
Boxes and Balls
• Suppose we randomly select a box, then randomly draw a ball from that box.
• The identity of the Box is a random variable, B.
• The identity of the ball is a random variable, L.
• B can take 2 values, r, or b• L can take 2 values, g or o.
![Page 10: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/10.jpg)
10
Boxes and Balls
• Given some information about B and L, we want to ask questions about the likelihood of different events.
• What is the probability of selecting an apple?
• If I chose an orange ball, what is the probability that I chose from the blue box?
![Page 11: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/11.jpg)
11
Some basics
• The probability (or likelihood) of an event is the fraction of times that the event occurs out of n trials, as n approaches infinity.
• Probabilities lie in the range [0,1]• Mutually exclusive events are events that cannot
simultaneously occur.– The sum of the likelihoods of all mutually exclusive events
must equal 1.
• If two events are independent then,
p(X, Y) = p(X)p(Y)
p(X|Y) = p(X)
![Page 12: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/12.jpg)
12
Joint Probability – P(X,Y)
• A Joint Probability function defines the likelihood of two (or more) events occurring.
• Let nij be the number of times event i and event j simultaneously occur.
Orange Green
Blue box 1 3 4
Red box 6 2 8
7 5 12
![Page 13: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/13.jpg)
13
Generalizing the Joint Probability
![Page 14: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/14.jpg)
14
Marginalization
• Consider the probability of X irrespective of Y.
• The number of instances in column j is the sum of instances in each cell
• Therefore, we can marginalize or “sum over” Y:
![Page 15: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/15.jpg)
15
Conditional Probability
• Consider only instances where X = xj.
• The fraction of these instances where Y = yi is the conditional probability– “The probability of y given x”
![Page 16: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/16.jpg)
16
Relating the Joint, Conditional and Marginal
![Page 17: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/17.jpg)
17
Sum and Product Rules
• In general, we’ll refer to a distribution over a random variable as p(X) and a distribution evaluated at a particular value as p(x).
Sum Rule
Product Rule
![Page 18: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/18.jpg)
18
Bayes Rule
![Page 19: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/19.jpg)
19
Interpretation of Bayes Rule
• Prior: Information we have before observation.
• Posterior: The distribution of Y after observing X
• Likelihood: The likelihood of observing X given Y
PriorPosterior
Likelihood
![Page 20: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/20.jpg)
20
Boxes and Balls with Bayes Rule
• Assuming I’m inherently more likely to select the red box (66.6%) than the blue box (33.3%).
• If I selected an orange ball, what is the likelihood that I selected the red box? – The blue box?
![Page 21: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/21.jpg)
21
Boxes and Balls
![Page 22: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/22.jpg)
22
Naïve Bayes Classification
• This is a simple case of a simple classification approach.
• Here the Box is the class, and the colored ball is a feature, or the observation.
• We can extend this Bayesian classification approach to incorporate more independent features.
![Page 23: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/23.jpg)
23
Naïve Bayes Classification
• Some theory first.
![Page 24: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/24.jpg)
24
Naïve Bayes Classification
• Assuming independent features simplifies the math.
![Page 25: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/25.jpg)
25
Naïve Bayes Example Data
HOT LIGHT SOFT RED
COLD HEAVY SOFT RED
HOT HEAVY FIRM RED
HOT LIGHT FIRM RED
COLD LIGHT SOFT BLUE
COLD HEAVY FIRM BLUE
HOT HEAVY FIRM BLUE
HOT LIGHT FIRM BLUE
HOT HEAVY FIRM ?????
![Page 26: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/26.jpg)
26
Naïve Bayes Example Data
HOT LIGHT SOFT RED
COLD HEAVY SOFT RED
HOT HEAVY FIRM RED
HOT LIGHT FIRM RED
COLD LIGHT SOFT BLUE
COLD HEAVY FIRM BLUE
HOT HEAVY FIRM BLUE
HOT LIGHT FIRM BLUE
HOT HEAVY FIRM ?????
Prior:
![Page 27: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/27.jpg)
27
Naïve Bayes Example Data
HOT LIGHT SOFT RED
COLD HEAVY SOFT RED
HOT HEAVY FIRM RED
HOT LIGHT FIRM RED
COLD LIGHT SOFT BLUE
COLD HEAVY SOFT BLUE
HOT HEAVY FIRM BLUE
HOT LIGHT FIRM BLUE
HOT HEAVY FIRM ?????
![Page 28: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/28.jpg)
28
Naïve Bayes Example Data
HOT LIGHT SOFT RED
COLD HEAVY SOFT RED
HOT HEAVY FIRM RED
HOT LIGHT FIRM RED
COLD LIGHT SOFT BLUE
COLD HEAVY SOFT BLUE
HOT HEAVY FIRM BLUE
HOT LIGHT FIRM BLUE
HOT HEAVY FIRM ?????
![Page 29: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/29.jpg)
29
Continuous Probabilities
• So far, X has been discrete where it can take one of M values.
• What if X is continuous?• Now p(x) is a continuous probability density
function.• The probability that x will lie in an interval (a,b)
is:
![Page 30: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/30.jpg)
30
Continuous probability example
![Page 31: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/31.jpg)
31
Properties of probability density functions
Sum Rule
Product Rule
![Page 32: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/32.jpg)
32
Expected Values
• Given a random variable, with a distribution p(X), what is the expected value of X?
![Page 33: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/33.jpg)
33
Multinomial Distribution
• If a variable, x, can take 1-of-K states, we represent the distribution of this variable as a multinomial distribution.
• The probability of x being in state k is μk
![Page 34: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/34.jpg)
34
Expected Value of a Multinomial
• The expected value is the mean values.
![Page 35: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/35.jpg)
35
Gaussian Distribution
• One Dimension
• D-Dimensions
![Page 36: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/36.jpg)
36
Gaussians
![Page 37: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/37.jpg)
37
How machine learning uses statistical modeling
• Expectation– The expected value of a function is the
hypothesis
• Variance– The variance is the confidence in that
hypothesis
![Page 38: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/38.jpg)
38
Variance
• The variance of a random variable describes how much variability around the expected value there is.
• Calculated as the expected squared error.
![Page 39: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/39.jpg)
39
Covariance
• The covariance of two random variables expresses how they vary together.
• If two variables are independent, their covariance equals zero.
![Page 40: Lecture 3: Probability and Statistics](https://reader036.vdocuments.site/reader036/viewer/2022062304/56812aa7550346895d8e6b49/html5/thumbnails/40.jpg)
40
Next Time: Math Primer
• Linear Algebra– Definitions and Properties
• Calculus– Review of derivation and integration
• Vector Calculus– Derivation w.r.t. a vector or matrix
• Lagrange Multipliers– Constrained optimization