Applied Statistics 3
8/13/2019
Review of Probability and Statistics
Random Variables
X is a random variable if it represents a random draw from some population
A discrete random variable can take on only selected values (usually integers)
A continuous random variable can take on any value in a real interval
Associated with each random variable is a probability distribution
Random Variables: Examples
The outcome of a fair coin toss: a discrete random variable with P(Head) = 0.5 and P(Tail) = 0.5
The height of a randomly selected student: a continuous random variable drawn from an approximately normal distribution
Expected Value of X: E(X)
The expected value is just a probability-weighted average of X
E(X) is the mean of the distribution of X, denoted by μ_X
Let f(x_i) be the probability that X = x_i; then

    E(X) = μ_X = Σ_{i=1..n} x_i f(x_i)
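As a quick sketch in plain Python (not part of the original slides), the probability-weighted average can be computed directly from a pmf; the die example is purely illustrative:

```python
# E(X) = mu_X = sum_i x_i * f(x_i), a probability-weighted average.

def expected_value(values, probs):
    """Expected value of a discrete random variable from its pmf."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(x * p for x, p in zip(values, probs))

# Illustration: a fair six-sided die has E(X) = 3.5.
print(expected_value([1, 2, 3, 4, 5, 6], [1 / 6] * 6))
```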
Variance of X: Var(X)
The variance of X is a measure of the dispersion or spread of the distribution
Var(X) is the expected value of the squared deviations from the mean, so

    Var(X) = σ²_X = E[(X − μ_X)²]
More on Variance
The square root of Var(X) is the standard deviation of X
Var(X) can alternatively be written as a weighted sum of squared deviations, because

    E[(X − μ_X)²] = Σ_i (x_i − μ_X)² f(x_i)
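A minimal sketch of the weighted-sum form (the coin example is illustrative, not from the slides):

```python
# Var(X) = sum_i (x_i - mu_X)^2 * f(x_i)

def variance(values, probs):
    """Variance of a discrete random variable from its pmf."""
    mu = sum(x * p for x, p in zip(values, probs))
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

# A fair coin coded 0/1 has Var(X) = 0.5 * 0.5 = 0.25,
# so its standard deviation is sqrt(0.25) = 0.5.
print(variance([0, 1], [0.5, 0.5]))
```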
Covariance: Cov(X, Y)
Covariance between X and Y is a measure of the association between the two random variables X and Y
If positive, the two tend to move up or down together
If negative, then when X is high, Y tends to be low, and vice versa

    Cov(X, Y) = σ_XY = E[(X − μ_X)(Y − μ_Y)]
Correlation Between X and Y
Covariance depends on the units of X and Y [Cov(aX, bY) = ab Cov(X, Y)]
Correlation, Corr(X, Y), scales covariance by the standard deviations of X and Y so that it lies between −1 and 1

    Corr(X, Y) = ρ_XY = σ_XY / (σ_X σ_Y) = Cov(X, Y) / [Var(X) Var(Y)]^(1/2)
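The scaling can be checked numerically. A sketch using population-style (divide-by-n) moments on illustrative data:

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population-style covariance: average cross-product of deviations."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corr(xs, ys):
    """Covariance scaled by both standard deviations; lies in [-1, 1]."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]                 # y = 2x: perfectly positively correlated
print(corr(x, y))                # 1.0
print(corr(x, [-v for v in y]))  # -1.0 (perfectly negatively correlated)

# Rescaling by a, b with ab > 0 leaves the correlation unchanged.
print(corr([3 * v for v in x], [5 * v for v in y]))  # still 1.0
```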
More on Correlation & Covariance
If ρ_XY = 0 (or equivalently σ_XY = 0), then X and Y are linearly unrelated
If ρ_XY = 1, then X and Y are said to be perfectly positively correlated
If ρ_XY = −1, then X and Y are said to be perfectly negatively correlated
Corr(aX, bY) = Corr(X, Y) if ab > 0
Corr(aX, bY) = −Corr(X, Y) if ab < 0
Properties of Expectations
E(a)=a, Var(a)=0
E(μ_X) = μ_X, i.e. E(E(X)) = E(X)
E(aX + b) = aE(X) + b
E(X + Y) = E(X) + E(Y)
E(X − Y) = E(X) − E(Y)
E(X − μ_X) = 0, i.e. E(X − E(X)) = 0
E((aX)²) = a² E(X²)
More Properties
Var(X) = E(X²) − μ²_X
Var(aX + b) = a² Var(X)
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
Var(X − Y) = Var(X) + Var(Y) − 2 Cov(X, Y)
Cov(X, Y) = E(XY) − μ_X μ_Y
If X and Y are independent, then Var(X + Y) = Var(X) + Var(Y) and E(XY) = E(X)E(Y)
The Normal Distribution
A general normal distribution, with mean μ and variance σ², is written as N(μ, σ²)
It has the following probability density function (pdf):

    f(x) = (1 / (σ √(2π))) e^(−(x − μ)² / (2σ²))
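The pdf can be evaluated directly with the standard library; a minimal sketch:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = (1 / (sigma * sqrt(2 pi))) * exp(-(x - mu)^2 / (2 sigma^2))"""
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# The density peaks at x = mu; for N(0, 1) the peak is 1/sqrt(2 pi) ~ 0.3989.
print(normal_pdf(0.0))
print(normal_pdf(1.0) == normal_pdf(-1.0))  # symmetric about the mean
```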
The Standard Normal
Any random variable can be standardized by subtracting the mean, μ, and dividing by the standard deviation, σ:

    Z = (X − μ_X) / σ_X,  so that E(Z) = 0 and Var(Z) = 1

Thus, the standard normal, N(0, 1), has pdf

    φ(z) = (1 / √(2π)) e^(−z² / 2)
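Standardizing a set of data is the sample analogue of this transformation; a small sketch (illustrative values):

```python
def standardize(xs):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = (sum((x - mu) ** 2 for x in xs) / n) ** 0.5
    return [(x - mu) / sigma for x in xs]

z = standardize([2.0, 4.0, 6.0, 8.0])
print(sum(z) / len(z))                 # mean of z is 0 (up to rounding)
print(sum(v * v for v in z) / len(z))  # variance of z is 1
```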
Properties of the Normal Distribution
If X ~ N(μ, σ²), then aX + b ~ N(aμ + b, a²σ²)
A linear combination of independent, identically distributed (iid) normal random variables will also be normally distributed
If Y_1, Y_2, …, Y_n are iid and ~ N(μ, σ²), then

    Ȳ ~ N(μ, σ²/n)
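The Ȳ ~ N(μ, σ²/n) result can be checked by simulation; a sketch with illustrative parameter values:

```python
import random
import statistics

random.seed(0)
mu, sigma, n = 10.0, 2.0, 25

# Draw many samples of size n from N(mu, sigma^2); record each sample mean.
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(20000)]

print(statistics.fmean(means))      # close to mu = 10
print(statistics.pvariance(means))  # close to sigma^2 / n = 4/25 = 0.16
```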
Cumulative Distribution Function
For a pdf, f(x), where f(x) is P(X = x), the cumulative distribution function (cdf), F(x), is P(X ≤ x)
P(X > x) = 1 − F(x)
For a distribution symmetric about zero, such as the standard normal, P(|X| > a) = 2[1 − F(a)]
P(a ≤ Z ≤ b) = F(b) − F(a)
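A sketch of these cdf identities for the standard normal, using `math.erf` from the standard library:

```python
import math

def Phi(z):
    """Standard normal cdf, F(z) = P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(Phi(0.0))                 # 0.5 by symmetry
print(1.0 - Phi(1.96))          # P(Z > 1.96), about 0.025
print(2.0 * (1.0 - Phi(1.96)))  # P(|Z| > 1.96), about 0.05
print(Phi(1.0) - Phi(-1.0))     # P(-1 <= Z <= 1), about 0.683
```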
Random Samples and Sampling
For a random variable Y, repeated draws from the same population can be labeled Y_1, Y_2, …, Y_n
If every combination of n sample points has an equal chance of being selected, this is a random sample
A random sample is a set of independent, identically distributed (i.i.d.) random variables
Estimators as Random Variables
Each of our sample statistics (e.g. the sample mean, sample variance, etc.) is a random variable. Why?
Each time we pull a random sample, we'll get different sample statistics
If we pull lots and lots of samples, we'll get a distribution of sample statistics
Sampling Distributions
The estimators computed from samples (like the sample mean, sample variance, etc.) are themselves random variables, and their distributions are termed sampling distributions
These include:
the chi-square distribution
the t-distribution
the F-distribution
The Chi-Square Distribution
Suppose that Z_i, i = 1, …, n, are iid ~ N(0, 1), and X = Σ_{i=1..n} Z_i²; then
X has a chi-square distribution with n degrees of freedom (dof), that is,

    X ~ χ²_n

If X ~ χ²_n, then E(X) = n and Var(X) = 2n
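The construction and its moments can be checked by simulation; a sketch with n = 5 (an illustrative choice):

```python
import random
import statistics

random.seed(1)
n = 5  # degrees of freedom

# X = sum of n squared iid N(0,1) draws ~ chi-square with n dof.
draws = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))
         for _ in range(20000)]

print(statistics.fmean(draws))      # close to E(X) = n = 5
print(statistics.pvariance(draws))  # close to Var(X) = 2n = 10
```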
The t distribution
If a random variable, T, has a t distribution with n degrees of freedom, then it is denoted T ~ t_n
E(T) = 0 (for n > 1) and Var(T) = n/(n − 2) (for n > 2)
T is a function of Z ~ N(0, 1) and X ~ χ²_n as follows:

    T = Z / √(X/n)
The F Distribution
If a random variable, F, has an F distribution with (k_1, k_2) dof, then it is denoted F ~ F_{k1,k2}
F is a function of X_1 ~ χ²_{k1} and X_2 ~ χ²_{k2} as follows:

    F = (X_1 / k_1) / (X_2 / k_2)
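Both the t and F constructions can be sketched from their N(0, 1) and chi-square building blocks. The dof values below are illustrative, and the F-mean check relies on the known result E(F) = k2/(k2 − 2) for k2 > 2, which is not stated on the slide:

```python
import math
import random
import statistics

random.seed(2)

def chi2_draw(k):
    """One draw from a chi-square with k dof: sum of k squared N(0,1) draws."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))

n = 10
# T = Z / sqrt(X/n) with Z ~ N(0,1) and X ~ chi-square_n.
t_draws = [random.gauss(0.0, 1.0) / math.sqrt(chi2_draw(n) / n)
           for _ in range(20000)]
print(statistics.fmean(t_draws))      # close to E(T) = 0
print(statistics.pvariance(t_draws))  # close to n/(n-2) = 1.25

k1, k2 = 4, 20
# F = (X1/k1) / (X2/k2) with X1 ~ chi-square_k1, X2 ~ chi-square_k2.
f_draws = [(chi2_draw(k1) / k1) / (chi2_draw(k2) / k2) for _ in range(20000)]
print(statistics.fmean(f_draws))      # close to k2/(k2-2), about 1.11
```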
Estimators and Estimates
Typically, we can't observe the full population, so we must make inferences based on estimates from a random sample
An estimator is a mathematical formula for estimating a population parameter from sample data
An estimate is the actual number (numerical value) the formula produces from the sample data
Examples of Estimators
Suppose we want to estimate the population mean
Suppose we use the formula for E(Y), but substitute 1/n for f(y_i) as the probability weight, since each point has an equal chance of being included in the sample (the sample being random)
We can then calculate the sample average for our sample:

    Ȳ = (1/n) Σ_{i=1..n} Y_i
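As a sketch, with the equal 1/n weights this is just the ordinary average (the ages here are illustrative):

```python
def sample_mean(ys):
    """Ybar = (1/n) * sum(Y_i): each observation gets weight 1/n."""
    return sum(ys) / len(ys)

# E.g. the ages 40, 42, 44, 50 average to 44.
print(sample_mean([40, 42, 44, 50]))
```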
What Makes a Good Estimator?
Unbiasedness
Efficiency
Mean Square Error (MSE)
Asymptotic properties (for large samples):
Consistency
Unbiasedness of Estimators
We want our estimator to be right, on average
We say an estimator, W, of a population parameter, θ, is unbiased if E(W) = θ
For our example, that means we want

    E(Ȳ) = μ_Y
Proof: Sample Mean is Unbiased
    E(Ȳ) = E[(1/n) Σ_{i=1..n} Y_i] = (1/n) Σ_{i=1..n} E(Y_i)
         = (1/n) Σ_{i=1..n} μ_Y = (1/n) (n μ_Y) = μ_Y
Example
Population:

    Person   Age (Years)
    A        40
    B        42
    C        44
    D        50
    E        65

Take a sample of size four
Population and Sample Statistics

    Population         Possible Samples of Size 4
    Person    Age      ABCD   ABCE   ABDE   ACDE   BCDE
    A         40       40     40     40     40     --
    B         42       42     42     42     --     42
    C         44       44     44     --     44     44
    D         50       50     --     50     50     50
    E         65       --     65     65     65     65
    Mean      48.2     44     47.8   49.3   49.8   50.3
    Variance  102      18.7   135    129    120    108
    Std Dev   10.1     4.32   11.6   11.4   11     10.4
Unbiasedness
Mean of the sample averages = (44 + 47.8 + 49.3 + 49.8 + 50.3)/5 = 48.2
Mean of the sample variances = (18.7 + 135 + 129 + 120 + 108)/5 = 102
Mean of the sample std devs = (4.32 + 11.6 + 11.4 + 11 + 10.4)/5 = 9.73
Note that 9.73 < 10.1: the sample mean and sample variance are unbiased here, but the sample standard deviation is biased downward
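The table's averages can be reproduced exactly; a sketch that enumerates all five samples of size four from the population above:

```python
from itertools import combinations

ages = [40, 42, 44, 50, 65]  # persons A-E

def var_unbiased(ys):
    """Sample variance with the n-1 divisor."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / (len(ys) - 1)

samples = list(combinations(ages, 4))  # ABCD, ABCE, ABDE, ACDE, BCDE
means = [sum(s) / 4 for s in samples]
variances = [var_unbiased(s) for s in samples]
std_devs = [v ** 0.5 for v in variances]

print(sum(means) / 5)      # 48.2, the population mean: unbiased
print(sum(variances) / 5)  # ~102, matching the population variance: unbiased
print(sum(std_devs) / 5)   # ~9.73, below the population 10.1: biased
```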
Efficiency of an Estimator
We want our estimator to be closer to the truth, on average, than any other estimator
We say an unbiased estimator, W, is efficient if Var(W) ≤ Var(any other unbiased estimator)
Note, for our example,

    Var(Ȳ) = Var((1/n) Σ_{i=1..n} Y_i) = (1/n²) Σ_{i=1..n} σ²_Y = σ²_Y / n
MSE of an Estimator
What if we can't find an unbiased estimator?
Define the mean square error as MSE = E[(W − θ)²]
We get a trade-off between unbiasedness and efficiency, since MSE = variance + bias²
For our example, it means minimizing

    E[(Ȳ − μ_Y)²] = Var(Ȳ) + [E(Ȳ) − μ_Y]²
Consistency of an Estimator
An asymptotic property: what happens as the sample size goes to infinity?
We want the distribution of W to collapse to θ, i.e. plim(W) = θ
For our example, it means we want, for any ε > 0,

    P(|Ȳ − μ_Y| > ε) → 0 as n → ∞
Central Limit Theorem
Asymptotic normality implies that P(Z < z) → Φ(z) as n → ∞, i.e. P(Z < z) ≈ Φ(z)
The central limit theorem states that the standardized average of any population with mean μ and variance σ² is asymptotically ~ N(0, 1):

    Z = (Ȳ − μ_Y) / (σ_Y / √n)  →  N(0, 1) as n → ∞
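A simulation sketch of the theorem with a deliberately non-normal population (uniform on [0, 1], so μ = 0.5 and σ² = 1/12; all values illustrative):

```python
import math
import random
import statistics

random.seed(3)
n = 30
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)  # mean and sd of Uniform(0, 1)

# Standardize each sample average: Z = (Ybar - mu) / (sigma / sqrt(n)).
zs = [(statistics.fmean(random.random() for _ in range(n)) - mu)
      / (sigma / math.sqrt(n))
      for _ in range(20000)]

print(statistics.fmean(zs))                      # close to 0
print(statistics.pvariance(zs))                  # close to 1
print(sum(abs(z) > 1.96 for z in zs) / len(zs))  # close to 0.05, as for N(0,1)
```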
Estimate of the Population Variance
We have a good estimate of μ_Y; we would also like a good estimate of σ²_Y
We can use the sample variance given below; note the division by n − 1, not n, since the mean is estimated too (if μ were known, we could divide by n)

    S² = (1 / (n − 1)) Σ_{i=1..n} (Y_i − Ȳ)²
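A sketch of the estimator; the ages 40, 42, 44, 50 from the earlier example are reused for illustration:

```python
def sample_variance(ys):
    """S^2 = (1/(n-1)) * sum (Y_i - Ybar)^2; n-1 because Ybar is estimated."""
    n = len(ys)
    ybar = sum(ys) / n
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)

print(sample_variance([40, 42, 44, 50]))  # ~18.7, as in the table above
```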