
Page 1:

Random Variables & Entropy:

Extension and Examples

Brooks Zurn
EE 270 / STAT 270

FALL 2007

Page 2:

Overview

• Density Functions and Random Variables
• Distribution Types
• Entropy

Page 3:

Density Functions

• PDF vs. CDF
– The PDF shows the probability of each size bin.
– The CDF shows the cumulative probability for all sizes up to and including the current bin.
– This data shows the normalized, relative size of a rodent as seen from an overhead camera for 8 behaviors.
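A minimal sketch of the PDF/CDF relationship in Python. The rodent-size data itself is not included in this transcript, so the samples below are synthetic stand-ins:

```python
import numpy as np

# Hypothetical stand-in data: the normalized rodent-size measurements are not
# included in the transcript, so synthetic samples are drawn instead.
rng = np.random.default_rng(seed=0)
sizes = rng.normal(loc=1.0, scale=0.2, size=1000)

# Empirical PDF: the fraction of samples that fall in each size bin.
counts, bin_edges = np.histogram(sizes, bins=20)
pdf = counts / counts.sum()

# Empirical CDF: the running sum of the PDF, i.e. the cumulative probability
# for all sizes up to and including the current bin.
cdf = np.cumsum(pdf)

for edge, p, c in zip(bin_edges[1:], pdf, cdf):
    print(f"size <= {edge:5.2f}: pdf bin = {p:.3f}, cdf = {c:.3f}")
```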

Page 4:

Markov & Chebyshev Inequalities

• What's the point?
• Setting a maximum limit on probability.
• This limits the search space for a solution.
– When looking for a needle in a haystack, it helps to have a smaller haystack.

• Can use limit to determine the necessary sample size

Page 5:

Markov & Chebyshev Inequalities

• Example: The mean height of a child in a kindergarten class is 3'6" (42 inches). (Leon-Garcia text, p. 137 – see end of presentation)
– Using Markov's inequality, the probability of a child being taller than 9 feet (108 inches) is ≤ 42/108 ≈ 0.389. So in a class of 100 students there will be fewer than 39 students over 9 feet tall. Also, there will be NO FEWER THAN 61 students who are under 9' tall.
– Using Chebyshev's inequality (and assuming a standard deviation of 1 foot, i.e. 12 inches), the relevant deviation from the mean is 108 − 42 = 66 inches, so the probability of a child being taller than 9 feet is ≤ 12²/66² ≈ 0.033. In a class of 100 students there will be no more than 4 students taller than 9' (this is also consistent with Markov's inequality). Also, there will be NO FEWER THAN 96 students under 9' tall.
– This gives us a basic idea of how many student heights we need to measure to rule out the possibility that we have a 9'-tall student…SAMPLE SIZE!!
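A quick check of the arithmetic in this example (the values come from the slide; the class size of 100 is the slide's own illustration):

```python
# Kindergarten-height example from the slide, in inches.
mean_in = 42          # mean height, 3'6"
threshold_in = 108    # 9 feet
std_in = 12           # assumed standard deviation of 1 foot

# Markov: P[X >= c] <= E[X]/c for a nonnegative X.
markov_bound = mean_in / threshold_in

# Chebyshev: P[|X - m| >= c] <= sigma^2 / c^2, with c measured from the mean.
c = threshold_in - mean_in
chebyshev_bound = std_in**2 / c**2

print(f"Markov bound:    {markov_bound:.3f}")     # ~0.389
print(f"Chebyshev bound: {chebyshev_bound:.3f}")  # ~0.033
print(f"Of 100 students, at most ~{100 * chebyshev_bound:.1f} expected over 9 ft")
```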

Page 6:

Markov’s Inequality

Derivation: For a random variable $X \ge 0$ and any $c > 0$,

$$P\{X \ge c\} \le \frac{E[X]}{c}$$

Starting from the definition of the mean,

$$E[X] = \int_0^\infty x\, f_X(x)\, dx, \quad \text{where } f_X(x) = P[x - \epsilon/2 \le X \le x + \epsilon/2]/\epsilon,$$

keep only the part of the integral where $x \ge c$, then replace $x$ by the smaller value $c$:

$$E[X] \ge \int_c^\infty x\, f_X(x)\, dx \ge c \int_c^\infty f_X(x)\, dx = c\, P\{X \ge c\}$$

Assuming this also holds at the boundary point $X = c$, because this is a continuous integral.

Page 7:

Markov's Inequality

Therefore, for $c > 0$: over the tail region $x \ge c$, the value of $c$ stays constant while $x$ continues to increase, so $x$ is always at least as large as $c$ there. Replacing $x$ by $c$ inside the tail integral can therefore only decrease it, which justifies the final step $\int_c^\infty x\, f_X(x)\, dx \ge c\, P\{X \ge c\}$ above.

Page 8:

Markov’s Inequality

References: Lefebvre text.

Page 9:

Chebyshev's Inequality

$$P\{|Y - E[Y]| \ge c\} \le \frac{\mathrm{Var}(Y)}{c^2}, \quad c > 0$$

Derivation (continued on the next slide):

Page 10:

Chebyshev's Inequality

As before, $c^2$ is a constant while $(Y - E[Y])^2$ can grow without bound. But how do the events $|Y - E[Y]| \ge c$ and $(Y - E[Y])^2 \ge c^2$ relate?

$$(|Y - E[Y]|)^2 = (Y - E[Y])^2$$

Because both $|Y - E[Y]|$ and $c$ are nonnegative, $|Y - E[Y]| \ge c$ holds exactly when $(Y - E[Y])^2 \ge c^2$. Applying Markov's inequality to the nonnegative random variable $(Y - E[Y])^2$ with threshold $c^2$ then gives

$$P\{|Y - E[Y]| \ge c\} = P\{(Y - E[Y])^2 \ge c^2\} \le \frac{E[(Y - E[Y])^2]}{c^2} = \frac{\mathrm{Var}(Y)}{c^2}$$

Reference: Lefebvre text.
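A quick Monte Carlo sanity check of both inequalities. This is a sketch only; the exponential test distribution and all parameter values are arbitrary choices, not from the slides:

```python
import numpy as np

# Monte Carlo check of Markov and Chebyshev on a test distribution
# (an exponential with mean 2 -- an arbitrary illustrative choice).
rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=100_000)
c = 5.0

markov_lhs = np.mean(y >= c)          # actual tail probability
markov_rhs = y.mean() / c             # Markov bound E[Y]/c

cheb_lhs = np.mean(np.abs(y - y.mean()) >= c)  # actual deviation probability
cheb_rhs = y.var() / c**2                      # Chebyshev bound Var(Y)/c^2

print(f"P[Y >= {c}]        = {markov_lhs:.4f} <= E[Y]/c  = {markov_rhs:.4f}")
print(f"P[|Y-E[Y]| >= {c}] = {cheb_lhs:.4f} <= Var/c^2 = {cheb_rhs:.4f}")
```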

Page 11:

Note

• These both involve the Central Limit Theorem, which is derived in the Leon-Garcia text on p. 287.

• The Central Limit Theorem states that the CDF of the normalized sum of a sequence of n random variables approaches the CDF of a Gaussian random variable as n grows. (p. 280)
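A small simulation of this statement. This is a sketch; the choice of Uniform(0,1) summands and the sample counts are illustrative assumptions:

```python
import numpy as np

# The normalized sum of n i.i.d. Uniform(0,1) variables should look Gaussian.
rng = np.random.default_rng(3)
n = 30                                   # summands per trial (arbitrary)
samples = rng.uniform(size=(100_000, n))

mean, var = 0.5, 1.0 / 12.0              # mean and variance of Uniform(0,1)
z = (samples.sum(axis=1) - n * mean) / np.sqrt(n * var)

print(f"sample mean of Z: {z.mean():+.4f} (standard Gaussian: 0)")
print(f"sample std  of Z: {z.std():.4f}  (standard Gaussian: 1)")
print(f"P[Z <= 1.0]     : {(z <= 1.0).mean():.4f} (Gaussian: 0.8413)")
```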

Page 12:

Overview

• Entropy
– What is it?
– Used in…

Page 13:

Entropy

• What is it?
– According to Jorge Cham (PhD Comics): [comic strip not reproduced in this transcript]

Page 14:

Entropy

• "Measure of uncertainty in a random experiment"
– Reference: Leon-Garcia text
• Used in information theory
– Message transmission (for example, Lathi text p. 682)
– Decision tree 'gain criterion'
• Leon-Garcia text p. 167
• ID3, C4.5, ITI, etc. by J. Ross Quinlan and Paul Utgoff
• Note: NOT the same as the Gini index used as a splitting criterion by the CART tree method (Breiman et al., 1984).

Page 15:

Entropy

• ID3 Decision Tree: Expected Information for a Binary Tree

The entropy $I$ of a set of samples divided among classes $S_1, S_2, \dots, S_n$ in proportions $p_i$ is

$$I(S_1, S_2, \dots, S_n) = -\sum_{i=1}^{n} p_i \log_2 p_i$$

and the expected information after partitioning on an attribute $A$ with $q$ values is

$$E(A) = \sum_{j=1}^{q} \frac{s_{1j} + s_{2j} + \dots + s_{nj}}{s}\, I(s_{1j}, s_{2j}, \dots, s_{nj})$$

$E(A)$ is the average information needed to classify a sample after splitting on $A$.

• ITI (Incremental Tree Inducer):
– Based on ID3 and its successor, C4.5.
– Uses a gain ratio metric to improve performance in certain cases.
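A compact illustration of this gain criterion in Python. This is a sketch only; the class counts below are hypothetical, not taken from the slides:

```python
import math

def entropy(counts):
    """I(s1, ..., sn) = -sum(p_i * log2(p_i)) over nonzero class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def expected_information(partitions):
    """E(A): weighted average entropy of the subsets produced by attribute A."""
    s = sum(sum(p) for p in partitions)
    return sum(sum(p) / s * entropy(p) for p in partitions)

# Hypothetical example: 9 positive / 5 negative samples, and an attribute
# that splits them into three subsets (these counts are made up).
before = [9, 5]
after = [[2, 3], [4, 0], [3, 2]]

gain = entropy(before) - expected_information(after)
print(f"I(S1,S2) = {entropy(before):.3f} bits")
print(f"E(A)     = {expected_information(after):.3f} bits")
print(f"Gain(A)  = {gain:.3f} bits")
```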

Page 16:

Entropy

• ITI Decision Tree for Rodent Behaviors
– ITI is an extension of ID3

Reference: ‘Rodent Data’ paper.

Page 17:

Distribution Types

• Continuous Random Variables
– Normal (or Gaussian) Distribution
– Uniform Distribution
– Exponential Distribution
– Rayleigh Random Variable

• Discrete ('counting') Random Variables
– Binomial Distribution
– Bernoulli and Geometric Distributions
– Poisson Distribution
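For reference, each listed distribution can be sampled directly with NumPy. This is a sketch; all parameter values below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # sample size (arbitrary)

# Continuous random variables
normal      = rng.normal(loc=0.0, scale=1.0, size=n)
uniform     = rng.uniform(low=0.0, high=1.0, size=n)
exponential = rng.exponential(scale=1.0, size=n)  # scale = mean interarrival 1/a
rayleigh    = rng.rayleigh(scale=1.0, size=n)

# Discrete ('counting') random variables
binomial  = rng.binomial(n=10, p=0.5, size=n)
bernoulli = rng.binomial(n=1, p=0.5, size=n)      # Bernoulli = binomial with n=1
geometric = rng.geometric(p=0.5, size=n)
poisson   = rng.poisson(lam=3.0, size=n)
```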

Page 18:

Poisson Distribution

• Models the number of events occurring in one time unit, where the time between events is exponentially distributed with mean $1/\alpha$.

• Gives a method for modeling completely random, independent events that occur after a random interval of time. (Leon-Garcia p. 106)

• The Poisson distribution can model the number of successes in a long sequence of Bernoulli trials. (Leon-Garcia p. 109)
– A Bernoulli trial gives the probability of a single coin toss.

The PMF is

$$P\{X = n\} = \frac{\alpha^n}{n!} e^{-\alpha}, \quad n = 0, 1, 2, \dots$$

and the probability generating function is

$$P_X(z) = \sum_{n=0}^{\infty} z^n \frac{\alpha^n}{n!} e^{-\alpha} = e^{\alpha(z - 1)}$$

References: Kao text, Leon-Garcia text.
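A short numerical check of the PMF and its connection to exponential interarrival times. This is a sketch; $\alpha = 3$ and the trial count are arbitrary choices:

```python
import math
import numpy as np

alpha = 3.0  # average number of events per time unit (illustrative value)

def poisson_pmf(n, alpha):
    """P{X = n} = alpha^n * exp(-alpha) / n!"""
    return alpha**n * math.exp(-alpha) / math.factorial(n)

# Simulation: count how many exponential(mean 1/alpha) interarrival times
# fit inside one time unit; the counts should follow the Poisson PMF.
rng = np.random.default_rng(2)
trials = 50_000
counts = np.zeros(trials, dtype=int)
for i in range(trials):
    t, k = 0.0, 0
    while True:
        t += rng.exponential(scale=1.0 / alpha)  # mean interarrival time 1/alpha
        if t > 1.0:
            break
        k += 1
    counts[i] = k

for n in range(6):
    print(f"n={n}: formula = {poisson_pmf(n, alpha):.4f}, "
          f"simulated = {(counts == n).mean():.4f}")
```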

Page 20:

References

• Leon-Garcia Text:
– Probability and Random Processes for Electrical Engineering, 2nd ed., Alberto Leon-Garcia. Reading, MA: Addison-Wesley, 1994.
• Lefebvre Text:
– Applied Stochastic Processes, Mario Lefebvre. New York, NY: Springer, 2003.
• Kao Text:
– An Introduction to Stochastic Processes, Edward P. C. Kao. Belmont, CA, USA: Duxbury Press at Wadsworth Publishing Company, 1997.
• Lathi Text:
– Modern Digital and Analog Communication Systems, 3rd ed., B. P. Lathi. New York, Oxford: Oxford University Press, 1998.
• Entropy-Based Decision Trees:
– ID3: P. E. Utgoff, "Incremental induction of decision trees," Machine Learning, vol. 4, pp. 161-186, 1989.
– C4.5: J. R. Quinlan, C4.5: Programs for Machine Learning, 1st ed. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1993.
– ITI: P. E. Utgoff, N. C. Berkman, and J. A. Clouse, "Decision tree induction based on efficient tree restructuring," Machine Learning, vol. 29, pp. 5-44, 1997.
• Other Decision Tree Methods:
– CART: L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Belmont, CA: Wadsworth, 1984.
• Rodent Data:
– J. Brooks Zurn, Xianhua Jiang, and Yuichi Motai, "Video-Based Tracking and Incremental Learning Applied to Rodent Behavioral Activity under Near-Infrared Illumination." To appear: IEEE Transactions on Instrumentation and Measurement, December 2007 or February 2008.
• Poisson Distribution Example:
– http://en.wikipedia.org/wiki/Image:Poisson_distribution_PMF.png

Page 21:

Questions?