Econ 140 Lecture 3: Univariate Populations
Post on 21-Dec-2015
Econ 140
Univariate Populations
Lecture 3
Today’s Plan
• Univariate statistics - distribution of a single variable
• Making inferences about population parameters from sample statistics - (For future reference: how can we relate the ‘a’ and ‘b’ parameters from last lecture to sample data)
• Dealing with two types of probability
– ‘A priori’ classical probability
– Empirical classical probability
A Priori Classical Probability
• Characterized by a finite number of known outcomes
• The expected value of Y can be defined as
$$E(Y) = \mu_Y = \sum_k Y_k \, p_k$$
• The expected value will always be the mean value
$\mu_Y$ is the population mean
$\bar{Y}$ is the sample mean
• The outcome of an experiment is a randomized trial
Flipping Coins
• Example: flipping 2 fair coins
– The possible outcomes are HH, TT, HT, TH
– We know there are only 4 possible outcomes
– We get discrete outcomes because there is a finite number of possible outcomes
– We can represent the known outcomes in a matrix:
First coin   Second coin   Number of heads
T            T             0
T            H             1
H            T             1
H            H             2
Flipping Coins (2)
• The probability of some event A is
$$\Pr(A) = \frac{m}{n}$$
– where m is the number of outcomes consistent with event A and n is the total number of possible outcomes
– If the event of interest is the number of heads when flipping 2 coins, we can represent the probability distribution function like this:
Number of heads   Probability distribution function (PDF)
0                 0.25 = 1/4
1                 0.50 = 1/2
2                 0.25 = 1/4
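The m/n rule can be checked by brute force. The following is a minimal sketch, assuming two fair coins so that all four outcomes are equally likely; the variable names are illustrative and not from the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate every equally likely outcome of flipping two fair coins.
outcomes = list(product("HT", repeat=2))
n = len(outcomes)  # total number of possible outcomes

# m = number of outcomes consistent with each possible head count.
counts = Counter(outcome.count("H") for outcome in outcomes)

# Pr(A) = m / n for each number of heads.
pdf = {heads: Fraction(m, n) for heads, m in sorted(counts.items())}

for heads, p in pdf.items():
    print(heads, p, float(p))
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 does not depend on floating-point rounding.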
Flipping Coins (3)
• If we graph the PDF we get
[Figure: Probability Distribution Function. Probability (vertical axis, 0.00 to 1.00) plotted against Number of Heads (horizontal axis, -1 to 3)]
• The expected value is
$$E(Y) = \mu_Y = \sum_k Y_k \, p_k = 0(0.25) + 1(0.5) + 2(0.25) = 1$$
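As a quick check of the expected-value sum, here is a small sketch that applies it to the two-coin PDF. The helper name `expected_value` is an illustrative choice, not something defined in the lecture.

```python
# PDF for the number of heads in two fair coin flips, as given in the lecture.
pdf = {0: 0.25, 1: 0.50, 2: 0.25}


def expected_value(pdf):
    """Return E(Y) = sum over k of Y_k * p_k for a discrete PDF {Y_k: p_k}."""
    return sum(y * p for y, p in pdf.items())


mu_y = expected_value(pdf)
print(mu_y)
```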
Empirical Classical Probability
• Characterized by an infinite number of possible outcomes
• With empirical classical probability, we use sample data to make inferences about underlying population parameters
– Most of the time, we don’t know what the population values are, so we need to use a sample
• Example: GPAs in the Econ 140 population
– We can take a sample of every 5th person in the room
– Assuming that our sample is random (that Econ 140 does not sit in some systematic fashion), we’ll have a representative sample of the population
Empirical Classical Probability
• Statisticians/economists collect sample data for many other purposes
• The Current Population Survey (CPS) is another example: sampling occurs at the household level
• CPS uses weights to correct data for oversampling
– Oversampling would occur if we picked 1 in 3 people in the front of the room and only 1 in 5 in the back of the room. In that case we would oversample the front
– There’s a spreadsheet example on the course website (the weighted mean is our best guess of the population mean, whereas the unweighted mean is the sample mean)
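One common way to build such weights, assumed here for illustration, is to give each sampled person a weight equal to the inverse of their sampling rate. The sketch below uses the 1-in-3 and 1-in-5 rates from the room example; the head counts (30 in front, 50 in back) are made up.

```python
from fractions import Fraction

# Hypothetical room: these head counts are invented for illustration.
population = {"front": 30, "back": 50}

# Sampling rates from the room example: 1 in 3 in front, 1 in 5 in back.
sampling_rate = {"front": Fraction(1, 3), "back": Fraction(1, 5)}

# Number of people sampled in each section.
sampled = {s: int(population[s] * sampling_rate[s]) for s in population}

# Assumption: each sampled person stands in for 1 / (sampling rate) people.
weight = {s: 1 / sampling_rate[s] for s in population}

# Share of the front section in the raw sample versus the weighted sample.
unweighted_share = sampled["front"] / sum(sampled.values())
weighted_total = sum(sampled[s] * weight[s] for s in sampled)
weighted_share = sampled["front"] * weight["front"] / weighted_total

print(unweighted_share, weighted_share)
```

In this made-up room the front makes up half of the raw sample but only 3/8 of the population, and the weights restore the 3/8 share.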
Empirical Classical Probability
• On the course website you’ll find an Excel spreadsheet that we will use to calculate the following:
– Expected value
– PDF and CDF
– Weights to translate sample data into population estimates
– Examine the difference between the sample (unweighted) mean and the estimated population (weighted) mean:
Weighted mean = sum(EARNWKE*EARNWT)/sum(EARNWT)
• This gives an estimate of the population mean
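The weighted-mean formula translates directly into code. This is a minimal sketch: the column names EARNWKE and EARNWT come from the slide, but the four rows of data are invented for illustration and are not taken from the course spreadsheet.

```python
def weighted_mean(values, weights):
    """Return sum(value * weight) / sum(weight)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical rows: weekly earnings and their sampling weights.
earnwke = [250.0, 400.0, 310.0, 520.0]
earnwt = [1500.0, 2500.0, 2000.0, 1000.0]

unweighted = sum(earnwke) / len(earnwke)  # sample mean
weighted = weighted_mean(earnwke, earnwt)  # estimate of the population mean

print(unweighted, weighted)
```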
Empirical Classical Probability (3)
• So how do we construct a PDF for our spreadsheet example?
– Pick sensible earnings bands (e.g. 10 bands of $100)
– We can pick as many bands as we want: the greater the number of bands, the more closely the shape of the PDF follows the ‘true population’ distribution. More bands = more calculation!
Empirical Classical Probability (2)
• Constructing PDFs:
– Count the number of observations in each band to get an absolute frequency
– Use weights to translate sample frequencies into estimates of the population frequencies
– Calculate relative frequencies for each band by dividing the absolute frequency for the band by the total frequency
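The three steps above can be sketched as one small function. The band layout (10 bands of $100 starting at 0, with the top band absorbing anything larger) and the sample data are assumptions for illustration, and earnings are assumed to be non-negative.

```python
def banded_pdf(values, weights=None, band_width=100, n_bands=10):
    """Return relative frequencies for bands [0, 100), [100, 200), and so on.

    With weights=None every observation counts once (sample frequencies);
    otherwise each observation counts `weight` times (population estimates).
    """
    if weights is None:
        weights = [1.0] * len(values)
    frequency = [0.0] * n_bands
    for value, weight in zip(values, weights):
        # Assumption: values above the last band are counted in the last band.
        band = min(int(value // band_width), n_bands - 1)
        frequency[band] += weight
    total = sum(frequency)
    return [f / total for f in frequency]


# Hypothetical weekly earnings and weights, invented for illustration.
earnings = [80, 150, 180, 320, 350, 390, 410, 950]
weights = [1, 2, 1, 1, 1, 2, 1, 1]

unweighted_pdf = banded_pdf(earnings)
weighted_pdf = banded_pdf(earnings, weights)

# The relative frequencies must sum to 1, or something has gone wrong.
print(sum(unweighted_pdf), sum(weighted_pdf))
```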
Empirical Classical Probability (4)
– A weighted way to approximate the PDF:
$$\text{Avg weights} = \frac{\text{avg of weights within band}}{\text{avg of all weights}}$$
– When we have k bands, always check that
$$\sum_k p_k = 1$$
If the probabilities don’t sum to 1, we’ve made a mistake!
Empirical Classical Probability (5)
• Going back to our expected value…
The expected value of Y will be:
$$E(Y) = \sum_k p_k \, Y_k$$
– The $p_k$ are frequencies and they can be weighted or not
– The $Y_k$ are the earnings band midpoints (50, 150, 250, and so on in the spreadsheet)
• From our spreadsheet example our weighted mean was $316.63 and the unweighted mean was $317.04
– Since the sample is so large, there is little difference between the sample (unweighted) mean and the population (weighted) mean
Empirical Classical Probability (6)
• We can also calculate the weighted and unweighted expected values:
E(Weighted value): $326.85
E(Unweighted value): $327.31
• Why are the expected values different from the means?
– We lose some information (by grouping the wage data into bands) in calculating the expected values!
• So why would we want to weight the observations?
– With a small sample of what we think is a large population, we might not have sampled randomly. We use weights to make the sample more closely resemble the population.
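The information loss from banding can be seen on a toy data set. In this sketch the six earnings values are invented for illustration; each one is replaced by its $100-band midpoint before the expected value is taken, which is why the result drifts away from the exact mean.

```python
from collections import Counter

band_width = 100

# Hypothetical weekly earnings, invented for illustration.
earnings = [120.0, 180.0, 260.0, 310.0, 330.0, 470.0]
n = len(earnings)

# Exact sample mean, using every observation as recorded.
sample_mean = sum(earnings) / n

# Banded version: count observations per band, then use band midpoints.
counts = Counter(int(y // band_width) for y in earnings)
expected_value = sum(
    (band * band_width + band_width / 2) * (count / n)
    for band, count in counts.items()
)

print(sample_mean, expected_value)
```

Here 120 and 180 are both treated as 150, 310 and 330 as 350, and so on, so the two figures differ even though they come from the same observations.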
Empirical Classical Probability (7)
• The mean is the first moment of the distribution of earnings
• We may also want to consider how variable earnings are
– We can do this by finding the variance, or the standard deviation
• Calculate the variance
– In our example, the unweighted variance is:
$$\sigma^2 = \sum_k (Y_k - \mu_Y)^2 \, p_k = 30{,}353.78$$
– The weighted variance is 29,730.34
– The difference between the two is 623.44
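The variance sum works the same way as the expected-value sum. As a minimal sketch, the function below is applied to the two-coin PDF from earlier in the lecture rather than to the earnings spreadsheet, whose data are not reproduced here; the function name is illustrative.

```python
def mean_and_variance(pdf):
    """Return (mu, sigma squared) for a discrete PDF given as {Y_k: p_k}."""
    mu = sum(y * p for y, p in pdf.items())
    variance = sum((y - mu) ** 2 * p for y, p in pdf.items())
    return mu, variance


# Number of heads when flipping two fair coins.
coin_pdf = {0: 0.25, 1: 0.50, 2: 0.25}

mu, variance = mean_and_variance(coin_pdf)
print(mu, variance, variance ** 0.5)
```

Passing weighted or unweighted band frequencies as the `p_k` would give the weighted or unweighted variance respectively.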
Empirical Classical Probability (8)
The weighted PDF is pink
It’s tough to see, but the weighting scheme makes the population distribution tighter
Empirical Classical Probability (9)
• We can use our PDF to answer:
– What is the probability that someone earns between $300 and $400?
• But we can’t use this PDF to answer:
– What is the probability that someone earns between $253 and $316?
• Why?
– The second question can’t be answered using our PDF because $253 and $316 fall somewhere within the earnings bands, not at the endpoints
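The limitation can be made explicit in code. In this sketch the band probabilities are hypothetical (the spreadsheet values are not reproduced in the slides), and the function simply refuses any range whose endpoints do not line up with band edges.

```python
def band_probability(pdf, low, high, band_width=100):
    """Return P(low <= Y < high) from a banded PDF.

    pdf is a list where entry k covers [k * band_width, (k + 1) * band_width).
    low and high must be whole numbers that coincide with band endpoints.
    """
    if low % band_width or high % band_width:
        raise ValueError("bounds must coincide with band endpoints")
    if not 0 <= low < high <= band_width * len(pdf):
        raise ValueError("bounds fall outside the banded range")
    return sum(pdf[low // band_width : high // band_width])


# Hypothetical probabilities for 10 bands of $100, invented for illustration.
example_pdf = [0.05, 0.15, 0.20, 0.25, 0.15, 0.10, 0.05, 0.03, 0.01, 0.01]

print(band_probability(example_pdf, 300, 400))

try:
    band_probability(example_pdf, 253, 316)
except ValueError as error:
    print("cannot answer:", error)
```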
Standard Normal Curve
• We need to calculate something other than our PDF, using the sample mean, the sample variance, and an assumption about the shape of the distribution function
• We will examine this assumption later
• The standard normal curve (tabulated in the Z table) will approximate the probability distribution of almost any continuous variable as the number of observations approaches infinity
Standard Normal Curve (2)
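• The standard deviation (which measures distance from the mean) is the square root of the variance:
$$\sigma = \sqrt{\sigma^2}$$
[Figure: normal curve centered at $\mu_y$. About 68% of the area under the curve lies within $\pm\sigma$ of the mean, 95% within $\pm 2\sigma$, and 99.7% within $\pm 3\sigma$]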
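The 68%, 95% and 99.7% figures can be reproduced numerically. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `area_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def area_within(k):
    """Area under the standard normal curve within k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(k, round(area, 4))
```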
Standard Normal Curve (3)
• Properties of the standard normal curve
– The curve is centered around $\mu_y$
– The curve reaches its highest value at $\mu_y$ and tails off symmetrically at both ends
– The distribution is fully described by the expected value and the variance
• You can convert any distribution for which you have estimates of $\mu_y$ and $\sigma^2$ to a standard normal distribution
Standard Normal Curve (4)
• A distribution only needs to be approximately normal for us to convert it to the standardized normal.
• The mass of the distribution must fall in the center, but the shape of the tails can be different
[Figure: two curves, labeled 1 and 2, centered at $\mu_y$]
Standard Normal Curve (5)
• If we want to know the probability that someone earns at most $C, we are asking:
$$P(Y \le C) = ?$$
We can rearrange terms to get:
$$P\left(\frac{Y - \mu}{\sigma} \le \frac{C - \mu}{\sigma}\right) = P(Z \le Z^*) = ?$$
where
$$Z = \frac{Y - \mu}{\sigma}, \qquad Z^* = \frac{C - \mu}{\sigma}$$
• Properties of the standard normal variate Z:
– It is normally distributed with a mean of zero and a variance of 1, written in shorthand as Z~N(0,1)
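The standardization step can be sketched in a few lines. The mean 316.6 and standard deviation 160 are the rounded values used in the lecture's worked example; the cutoff C = 400 and the function name are illustrative choices.

```python
from statistics import NormalDist


def prob_at_most(c, mu, sigma):
    """Return P(Y <= c) for Y ~ N(mu, sigma^2) by converting c to Z*."""
    z_star = (c - mu) / sigma
    return NormalDist().cdf(z_star)  # P(Z <= Z*) with Z ~ N(0, 1)


mu, sigma = 316.6, 160.0  # rounded values from the lecture's worked example
c = 400.0                 # illustrative cutoff

p = prob_at_most(c, mu, sigma)

# Cross-check: the same probability without standardizing first.
direct = NormalDist(mu, sigma).cdf(c)
print(p, direct)
```

Both routes give the same answer, which is the point of the conversion: one table for Z serves every normal variable.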
Standard Normal Curve (5)
• If we have some variable Y, we can assume that Y will be normally distributed, written in shorthand as Y~N(µ, σ²)
• We can use Z to convert Y to a standard normal distribution
• Look at the Z standardized normal distribution handout
– You can calculate the area under the Z curve from the mean of zero to the value of interest
– For example: read down the left-hand column to 1.6 and along the top row to .04, and you’ll find that the area under the curve between Z=0 and Z=1.64 is 0.4495
Standard Normal Curve (6)
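• Going back to our earlier question: What is the probability that someone earns between $300 and $400 [P(300 ≤ Y ≤ 400)]?
$$\mu = 316.6, \qquad \sigma^2 = 25608, \qquad \sigma = \sqrt{25608} \approx 160$$
$$Z_1 = \frac{300 - 316.6}{160} = -0.104$$
$$Z_2 = \frac{400 - 316.6}{160} = 0.52$$
$$P(-0.104 \le Z \le 0) = 0.0418$$
$$P(0 \le Z \le 0.52) = 0.1985$$
$$P(-0.104 \le Z \le 0.52) = 0.0418 + 0.1985 = 0.2403$$
[Figure: normal curve centered at 316.6 with the area between 300 and 400, P(300 ≤ Y ≤ 400), marked; $Z_1$ and $Z_2$ are the corresponding standardized values]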
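The worked example can be checked numerically. This sketch takes the mean 316.6 and variance 25608 from the slide and uses `statistics.NormalDist` in place of the printed Z table, so the result differs slightly from the table-based 0.2403 because nothing is rounded along the way.

```python
from math import sqrt
from statistics import NormalDist

mu = 316.6
variance = 25608.0
sigma = sqrt(variance)  # about 160, as on the slide

# Standardize both endpoints of the earnings range.
z1 = (300 - mu) / sigma
z2 = (400 - mu) / sigma

# P(300 <= Y <= 400) = P(z1 <= Z <= z2) for Z ~ N(0, 1).
standard_normal = NormalDist()
p = standard_normal.cdf(z2) - standard_normal.cdf(z1)

print(round(z1, 3), round(z2, 2), round(p, 4))
```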
What we’ve done
• ‘A priori’ classical probability
– There are a finite number of possible outcomes
– Flipping coins example
• Empirical classical probability
– There are an infinite number of possible outcomes
– Difference between sample and population means
– Difference between sample and population expected values
– Differences in calculating PDFs of a univariate population
• Use of the standard normal distribution