
Statistics for Social and Behavioral Sciences

Session #11: Random Variable, Expectations

(Agresti and Finlay, Chapter 4)

Prof. Amine Ouazad

Statistics Course Outline

PART I. INTRODUCTION AND RESEARCH DESIGN (Week 1)

PART II. DESCRIBING DATA (Weeks 2-4)

PART III. DRAWING CONCLUSIONS FROM DATA: INFERENTIAL STATISTICS (Weeks 5-9)

PART IV. CORRELATION AND CAUSATION: REGRESSION ANALYSIS (Weeks 10-14)

This is where we talk about Zmapp and Ebola!

Firenze or Lebanese Express?

Last Session

• Four rules of probability distributions
1. P(not A) = 1 – P(A)
2. P(A or B) = P(A) + P(B) when P(A and B) = 0
3. P(A and B) = P(A) P(B given A)
3’. P(A and B) = P(A) P(B) when A and B are independent

• Inverse Probability Fallacy:
– P(A|B) is not P(B|A).
– We have a formula: P(A|B) = (P(B|A) P(A)) / P(B)

Beware of the inverse probability fallacy: P(B given A) is not P(A given B).
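To see how far apart P(B|A) and P(A|B) can be, here is a small numerical sketch of the formula above. The events and all three input probabilities are made-up numbers chosen only for illustration; they do not come from the course.

```python
from fractions import Fraction

# Hypothetical numbers chosen only for illustration (not from the lecture):
# A = "has the condition", B = "tests positive".
p_a = Fraction(1, 100)              # P(A)
p_b_given_a = Fraction(9, 10)       # P(B given A)
p_b_given_not_a = Fraction(5, 100)  # P(B given not A)

# Rule 1: P(not A) = 1 - P(A)
p_not_a = 1 - p_a

# Rules 2 and 3: B happens either together with A or together with not A
# (two events that cannot both occur), so
# P(B) = P(A) P(B given A) + P(not A) P(B given not A)
p_b = p_a * p_b_given_a + p_not_a * p_b_given_not_a

# Formula from the slide: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print("P(B|A) =", float(p_b_given_a))
print("P(A|B) =", float(p_a_given_b))
```

With these assumed inputs, P(B|A) is large while P(A|B) is much smaller, which is exactly the gap the fallacy ignores.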

Outline

1. Random Variable
   Probability distribution of a random variable
   Expectation of a random variable

2. The normal distribution

3. Polls and normal distributions

Next time: Probability Distributions (continued) Chapter 4 of A&F

Random variable

A random variable is a variable whose value is not known ex-ante… but is one of several possible values, revealed ex-post.
• Example:
– X is a random variable that, before the coin is tossed (ex-ante), can take values « Heads » or « Tails ». Once the coin is tossed (ex-post), the value of X is known: it is either « Heads » or « Tails ».
– Y is a random variable that can take values 1, 2, 3, 4, 5, or 6 depending on the roll of a die. Before the die is thrown, the value is not known. After the die is thrown, we know the value of Y.

Probability distribution of a random variable

• Take all possible values of a random variable Y:
– Example: 1, 2, 3, 4, 5, 6
– In general: y1, y2, y3, …, yK.

• The probability of the event that the random variable Y equals yk is denoted P(Y=yk) or simply P(yk).

• The probability distribution of random variable Y is the list of all values of P(Y=yk).

• Example: for a balanced die, the probability distribution of Y is the list of values P(Y=1), P(Y=2), P(Y=3), …, P(Y=6), which is {1/6, 1/6, 1/6, 1/6, 1/6, 1/6}.

All throughout the course we consider either discrete quantitative random variables or categorical random variables.
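As a minimal sketch, the probability distribution of the balanced die can be written down as a table from each value yk to P(Y=yk). The variable names are illustrative choices, and exact fractions are used so that the sum is exactly one.

```python
from fractions import Fraction

# Possible values y1, ..., yK of the random variable Y (a balanced die).
values = [1, 2, 3, 4, 5, 6]

# Probability distribution: P(Y = yk) for each possible value.
distribution = {y: Fraction(1, len(values)) for y in values}

for y, p in distribution.items():
    print(f"P(Y={y}) = {p}")

# The probabilities of all possible values sum to one.
total = sum(distribution.values())
print("Sum of probabilities:", total)
```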

Expected value of a random variable

What are your expected gains when playing the coin game?
• Gain is a random variable, equal to +10 AED when getting heads, and -10 AED when getting tails.
E(gain) = Gain when getting heads × Probability of heads + Gain when getting tails × Probability of tails.

In general, for a random variable Y, the expected value of Y is:
• E(Y) = Σ yk P(Y=yk)

Also note that probabilities sum to one: Σ P(Y=yk) = 1

Should I play this game at all? What is my expected gain??
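The expected gain of the coin game can be worked out with a few lines of code. This is a sketch that assumes the coin is fair (probability 1/2 for each side); the slide does not state the probabilities, and the function name is an illustrative choice.

```python
from fractions import Fraction

def expectation(distribution):
    """E(Y) = sum over k of yk * P(Y = yk)."""
    return sum(y * p for y, p in distribution.items())

# Coin game from the slide: +10 AED on heads, -10 AED on tails.
# Assumption: the coin is fair, so each outcome has probability 1/2.
gain = {10: Fraction(1, 2), -10: Fraction(1, 2)}

expected_gain = expectation(gain)
print("E(gain) =", expected_gain, "AED")
```

Under the fair-coin assumption the expected gain is zero: on average the game neither makes nor loses money.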

Expected Earnings?

• « Your annual earnings right after NYU Abu Dhabi » is a random variable…
– The variable has not been realized yet.

Let’s give it a name: Y = « Your annual earnings right after NYU Abu Dhabi ».
• E(earnings) = E(Y) = Σ yk P(Y=yk)
Y takes potentially K values.
• Problem: we don’t observe earnings in the future!

Hum, how much will I earn??

Expected Earnings?

An approximation is to use the distribution of current graduates… to substitute for our lack of knowledge of P(Y=yk) for each k.
• Earnings take K distinct values: no two graduates earn exactly the same annual wage…
• Hence an approximation of expected earnings is E(Y) ≈ Σ yk × (1/K)
• That is, the average earnings of current graduates…
• But that’s only an approximation! What could be wrong?

Hum, how much will I earn??
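The approximation above amounts to giving each observed graduate a probability of 1/K. The sketch below uses a made-up list of earnings (hypothetical numbers, not real NYU Abu Dhabi data) to show that the weighted sum is just the sample average.

```python
from statistics import mean

# Hypothetical annual earnings of K current graduates (made-up numbers,
# all distinct, in AED); real data would come from a graduate survey.
earnings = [150_000, 162_500, 171_000, 188_000, 240_500]
K = len(earnings)

# Each observed value gets probability 1/K, so the approximation
# of E(Y) is the sum of yk * (1/K) ...
approx_expectation = sum(y * (1 / K) for y in earnings)

# ... which is simply the average earnings of current graduates.
average = mean(earnings)

print(approx_expectation, average)
```

Nothing in this calculation checks whether current graduates resemble future ones, which is one reason it remains only an approximation.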

Properties of the Expectation

The expectation of the sum is the sum of the expectations:
• E(earnings – debt) = E(earnings) – E(debt)

The expectation of a constant × the random variable is the constant × the expectation:
• E(Constant × Earnings) = Constant × E(Earnings)
E.g. E(Earnings in AED) = 3.6 × E(Earnings in USD)

Beware!
• E(XY) is not E(X) E(Y) in general.
• When X and Y are independent, E(XY) = E(X) E(Y).
• Law of conditional expectation: E(X) = E(E(X|Z))
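These properties can be checked by brute force on a small example. The sketch below assumes two independent balanced dice X and Y (an illustrative setup, not from the slide) and enumerates all 36 equally likely outcomes; the helper name `expect` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Two independent balanced dice X and Y: each pair (x, y) has probability 1/36.
faces = range(1, 7)
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

def expect(f):
    """Expectation of f(X, Y) under the joint distribution."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)

# Sum rule: E(X + Y) = E(X) + E(Y)
print(expect(lambda x, y: x + y) == e_x + e_y)
# Constant rule: E(c X) = c E(X), here with c = 3
print(expect(lambda x, y: 3 * x) == 3 * e_x)
# Independent X and Y: E(X Y) = E(X) E(Y)
print(expect(lambda x, y: x * y) == e_x * e_y)
# Not independent (X with itself): E(X X) differs from E(X) E(X)
print(expect(lambda x, y: x * x) == e_x * e_x)
```

The first three checks hold and the last one fails, since a die is not independent of itself: E(X X) is 91/6 while E(X) E(X) is 49/4.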

A particular distribution

• Some random variables have a particular “bell-shaped” distribution:
– Individuals’ height.
  • What is the distribution of height at age 20? P(height)
  • What height can I expect for my child? E(height)
– Individuals’ weight.
  • What is the distribution of weight at age 35? P(weight)
  • What weight can I expect at age 35? E(weight)
– The logarithm of income.
  • What is the distribution of the log of income after graduation? P(log(income))
  • What log income can I expect after graduation? E(log(income))

• The “bell-shaped” distribution will now be called a “normal” distribution.

The normal distribution

• “The normal distribution is symmetric, bell shaped, and characterized by its mean μ and standard deviation σ.

• The probability within any particular number of standard deviations of μ is the same for all normal distributions.”

P(μ – σ < height < μ + σ) ≈ 0.68 or 68%
P(μ – 2σ < height < μ + 2σ) ≈ 0.95 or 95%
P(μ – 3σ < height < μ + 3σ) ≈ 0.997 or 99.7%

All of these are “events”.

• Draw a histogram with a very small bin size… so that the little stairs disappear… and a curve appears.
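The three probabilities above can be reproduced with Python's standard library. This is a sketch in which the mean and standard deviation of height are assumed values picked for illustration; any other choice gives the same three probabilities, which is the point of the quoted statement.

```python
from statistics import NormalDist

# Assumed illustrative parameters for height in cm (not from the lecture).
mu, sigma = 170.0, 10.0
height = NormalDist(mu, sigma)

def prob_within(k):
    """P(mu - k*sigma < height < mu + k*sigma)."""
    return height.cdf(mu + k * sigma) - height.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.3f}")
```

The printed values are close to 0.68, 0.95 and 0.997, and they do not change if `mu` and `sigma` are replaced by other numbers.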

The normal distribution

In a normal distribution, the mode is equal to the mean!

Comparing test scores across colleges

“Early paleontology in Indianapolis”: test scores have a normal distribution with mean 3 and standard deviation 4.

“Hip hop in the Middle East”: test scores have a normal distribution with mean 4 and standard deviation 1.

• Problem: how do I compare Marina’s test score of 3.6 in the paleontology course with a test score of 4.1 in the Hip Hop in the Middle East course?

Z-score !

• Take a student’s paleontology test score at the end of the semester. This is a random variable.
– Its probability distribution has a mean of μ = 3 with a standard deviation of σ = 4.
– Now consider the “z-scored” paleontology test score: z = (test score – μ) / σ = (test score – 3) / 4
– The z-scored paleontology test score has a mean of 0, and a standard deviation of 1.

Standard Normal Distribution

• Is simply the normal distribution with mean 0 and standard deviation 1.

• A z-score of 3 means that the student is three standard deviations (of the original test scores) above the mean.

So who has a better grade, Marina or Slavoj?
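One way to answer the question is to convert both scores to z-scores and compare them. The sketch below uses the means and standard deviations given for the two courses, and assumes that the 4.1 score in the hip hop course is Slavoj's, which the slides suggest but do not state outright.

```python
def z_score(score, mean, sd):
    """Number of standard deviations a score lies above (or below) the mean."""
    return (score - mean) / sd

# Paleontology: mean 3, standard deviation 4; Marina scored 3.6.
z_marina = z_score(3.6, mean=3, sd=4)

# Hip hop: mean 4, standard deviation 1.
# Assumption: the 4.1 score belongs to Slavoj.
z_slavoj = z_score(4.1, mean=4, sd=1)

print(f"Marina: z = {z_marina:.2f}")
print(f"Slavoj: z = {z_slavoj:.2f}")
print("Higher z-score:", "Marina" if z_marina > z_slavoj else "Slavoj")
```

Under that assumption, Marina's raw score is lower but her z-score (about 0.15) is slightly higher than Slavoj's (about 0.10), so relative to her own course she did marginally better.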

Who will win the midterm elections in the US?

• Midterm elections are held two years after the presidential elections in the United States.

• They take place in early November 2014.

• A question: what fraction of the voters will vote for a Democrat in Colorado?

Wrap up

• A random variable is a variable whose value has not been realized.

• The expectation of a random variable Y is: E(Y) = Σ yk P(Y=yk)
Also, E(X+Y) = E(X) + E(Y), E(cX) = c E(X), and E(E(X|Z)) = E(X).

• Typically the probability distribution P is not known, but we approximate it…
– Using the distribution for past values of Y (example: earnings of previous graduates).
– Using polls, for example to ask individuals how they will vote.

• The normal distribution is a ubiquitous distribution that is symmetric and bell shaped. It is characterized by its mean μ and its standard deviation σ.

• The standard normal distribution has mean 0 and standard deviation 1.

Coming up:

Readings:
• Chapter 4 entirely – full of interesting examples and super relevant.
• Online quiz tonight.
• Go to the website http://www.realclearpolitics.com/epolls/2014/senate/co/colorado_senate_gardner_vs_udall-3845.html and prepare one or two slides to present the race in Colorado.
– Who do you think will win?
– What is MoE?
– What is the likely distribution of the “fraction of voters who will vote for Gardner”?

For help:

• Amine Ouazad
Office 1135, Social Science [email protected] hour: Tuesday from 5 to 6.30pm.

• GAF: Irene [email protected] recitations. At the Academic Resource Center, Monday from 2 to 4pm.
