Applied Statistics 3
8/13/2019
Review of Probability and Statistics
Random Variables
X is a random variable if it represents a random draw from some population
A discrete random variable can take on only selected values (usually integers)
A continuous random variable can take on any value in a real interval
Associated with each random variable is a probability distribution
Random Variables: Examples
The outcome of a fair coin toss: a discrete random variable with P(Head) = 0.5 and P(Tail) = 0.5
The height of a randomly selected student: a continuous random variable drawn from an approximately normal distribution
Expected Value of X: E(X)
The expected value is just a probability-weighted average of X
E(X) is the mean of the distribution of X, denoted by μ_X
Let f(x_i) be the probability that X = x_i; then

    E(X) = μ_X = Σ_{i=1..n} x_i f(x_i)
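As a quick sketch in plain Python (not part of the original slides), the probability-weighted average can be computed directly from a pmf; the die example is purely illustrative:

```python
# E(X) = mu_X = sum_i x_i * f(x_i), a probability-weighted average.

def expected_value(values, probs):
    """Expected value of a discrete random variable from its pmf."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(x * p for x, p in zip(values, probs))

# Illustration: a fair six-sided die has E(X) = 3.5.
print(expected_value([1, 2, 3, 4, 5, 6], [1 / 6] * 6))
```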
Variance of X: Var(X)
The variance of X is a measure of the dispersion or spread of the distribution
Var(X) is the expected value of the squared deviations from the mean, so

    Var(X) = σ²_X = E[(X − μ_X)²]
More on Variance
The square root of Var(X) is the standard deviation of X
Var(X) can alternatively be written as a weighted sum of squared deviations, because

    E[(X − μ_X)²] = Σ_i (x_i − μ_X)² f(x_i)
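A minimal sketch of the weighted-sum form (the coin example is illustrative, not from the slides):

```python
# Var(X) = sum_i (x_i - mu_X)^2 * f(x_i)

def variance(values, probs):
    """Variance of a discrete random variable from its pmf."""
    mu = sum(x * p for x, p in zip(values, probs))
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

# A fair coin coded 0/1 has Var(X) = 0.5 * 0.5 = 0.25,
# so its standard deviation is sqrt(0.25) = 0.5.
print(variance([0, 1], [0.5, 0.5]))
```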
Covariance: Cov(X, Y)
Covariance between X and Y is a measure of the association between the two random variables X and Y
If positive, the two tend to move up or down together
If negative, then when X is high, Y tends to be low, and vice versa

    Cov(X, Y) = σ_XY = E[(X − μ_X)(Y − μ_Y)]
Correlation Between X and Y
Covariance depends on the units of X and Y [Cov(aX, bY) = ab Cov(X, Y)]
Correlation, Corr(X, Y), scales covariance by the standard deviations of X and Y so that it lies between −1 and 1

    Corr(X, Y) = ρ_XY = σ_XY / (σ_X σ_Y) = Cov(X, Y) / [Var(X) Var(Y)]^(1/2)
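The scaling can be checked numerically. A sketch using population-style (divide-by-n) moments on illustrative data:

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population-style covariance: average cross-product of deviations."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corr(xs, ys):
    """Covariance scaled by both standard deviations; lies in [-1, 1]."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]                 # y = 2x: perfectly positively correlated
print(corr(x, y))                # 1.0
print(corr(x, [-v for v in y]))  # -1.0 (perfectly negatively correlated)

# Rescaling by a, b with ab > 0 leaves the correlation unchanged.
print(corr([3 * v for v in x], [5 * v for v in y]))  # still 1.0
```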
More on Correlation & Covariance
If ρ_XY = 0 (or equivalently σ_XY = 0), then X and Y are linearly unrelated
If ρ_XY = 1, then X and Y are said to be perfectly positively correlated
If ρ_XY = −1, then X and Y are said to be perfectly negatively correlated
Corr(aX, bY) = Corr(X, Y) if ab > 0
Corr(aX, bY) = −Corr(X, Y) if ab < 0
Properties of Expectations
E(a)=a, Var(a)=0
E(μ_X) = μ_X, i.e. E(E(X)) = E(X)
E(aX + b) = aE(X) + b
E(X + Y) = E(X) + E(Y)
E(X − Y) = E(X) − E(Y)
E(X − μ_X) = 0, i.e. E(X − E(X)) = 0
E((aX)²) = a² E(X²)
More Properties
Var(X) = E(X²) − μ²_X
Var(aX + b) = a² Var(X)
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
Var(X − Y) = Var(X) + Var(Y) − 2 Cov(X, Y)
Cov(X, Y) = E(XY) − μ_X μ_Y
If X and Y are independent, then Var(X + Y) = Var(X) + Var(Y) and E(XY) = E(X)E(Y)
The Normal Distribution
A general normal distribution, with mean μ and variance σ², is written as N(μ, σ²)
It has the following probability density function (pdf):

    f(x) = (1 / (σ √(2π))) e^(−(x − μ)² / (2σ²))
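The pdf can be evaluated directly with the standard library; a minimal sketch:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = (1 / (sigma * sqrt(2 pi))) * exp(-(x - mu)^2 / (2 sigma^2))"""
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# The density peaks at x = mu; for N(0, 1) the peak is 1/sqrt(2 pi) ~ 0.3989.
print(normal_pdf(0.0))
print(normal_pdf(1.0) == normal_pdf(-1.0))  # symmetric about the mean
```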
The Standard Normal
Any random variable can be standardized by subtracting the mean, μ, and dividing by the standard deviation, σ:

    Z = (X − μ_X) / σ_X,  so that E(Z) = 0 and Var(Z) = 1

Thus, the standard normal, N(0, 1), has pdf

    φ(z) = (1 / √(2π)) e^(−z² / 2)
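Standardizing a set of data is the sample analogue of this transformation; a small sketch (illustrative values):

```python
def standardize(xs):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = (sum((x - mu) ** 2 for x in xs) / n) ** 0.5
    return [(x - mu) / sigma for x in xs]

z = standardize([2.0, 4.0, 6.0, 8.0])
print(sum(z) / len(z))                 # mean of z is 0 (up to rounding)
print(sum(v * v for v in z) / len(z))  # variance of z is 1
```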
Properties of the Normal Distribution
If X ~ N(μ, σ²), then aX + b ~ N(aμ + b, a²σ²)
A linear combination of independent, identically distributed (iid) normal random variables will also be normally distributed
If Y_1, Y_2, …, Y_n are iid and ~ N(μ, σ²), then

    Ȳ ~ N(μ, σ²/n)
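The Ȳ ~ N(μ, σ²/n) result can be checked by simulation; a sketch with illustrative parameter values:

```python
import random
import statistics

random.seed(0)
mu, sigma, n = 10.0, 2.0, 25

# Draw many samples of size n from N(mu, sigma^2); record each sample mean.
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(20000)]

print(statistics.fmean(means))      # close to mu = 10
print(statistics.pvariance(means))  # close to sigma^2 / n = 4/25 = 0.16
```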
Cumulative Distribution Function
For a pdf, f(x), where f(x) is P(X = x), the cumulative distribution function (cdf), F(x), is P(X ≤ x)
P(X > x) = 1 − F(x)
For a distribution symmetric about zero, such as the standard normal, P(|X| > a) = 2[1 − F(a)]
P(a ≤ Z ≤ b) = F(b) − F(a)
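A sketch of these cdf identities for the standard normal, using `math.erf` from the standard library:

```python
import math

def Phi(z):
    """Standard normal cdf, F(z) = P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(Phi(0.0))                 # 0.5 by symmetry
print(1.0 - Phi(1.96))          # P(Z > 1.96), about 0.025
print(2.0 * (1.0 - Phi(1.96)))  # P(|Z| > 1.96), about 0.05
print(Phi(1.0) - Phi(-1.0))     # P(-1 <= Z <= 1), about 0.683
```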
Random Samples and Sampling
For a random variable Y, repeated draws from the same population can be labeled Y_1, Y_2, …, Y_n
If every combination of n sample points has an equal chance of being selected, this is a random sample
A random sample is a set of independent, identically distributed (i.i.d.) random variables
Estimators as Random Variables
Each of our sample statistics (e.g. the sample mean, sample variance, etc.) is a random variable. Why?
Each time we pull a random sample, we'll get different sample statistics
If we pull lots and lots of samples, we'll get a distribution of sample statistics
Sampling Distributions
The estimators computed from samples (like the sample mean, sample variance, etc.) are themselves random variables, and their distributions are termed sampling distributions
These include:
the chi-square distribution
the t-distribution
the F-distribution
The Chi-Square Distribution
Suppose that Z_i, i = 1, …, n, are iid ~ N(0, 1), and X = Σ_{i=1..n} Z_i²; then
X has a chi-square distribution with n degrees of freedom (dof), that is,

    X ~ χ²_n

If X ~ χ²_n, then E(X) = n and Var(X) = 2n
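The construction and its moments can be checked by simulation; a sketch with n = 5 (an illustrative choice):

```python
import random
import statistics

random.seed(1)
n = 5  # degrees of freedom

# X = sum of n squared iid N(0,1) draws ~ chi-square with n dof.
draws = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))
         for _ in range(20000)]

print(statistics.fmean(draws))      # close to E(X) = n = 5
print(statistics.pvariance(draws))  # close to Var(X) = 2n = 10
```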
The t distribution
If a random variable, T, has a t distribution with n degrees of freedom, then it is denoted T ~ t_n
E(T) = 0 (for n > 1) and Var(T) = n/(n − 2) (for n > 2)
T is a function of Z ~ N(0, 1) and X ~ χ²_n as follows:

    T = Z / √(X/n)
The F Distribution
If a random variable, F, has an F distribution with (k_1, k_2) dof, then it is denoted F ~ F_{k1,k2}
F is a function of X_1 ~ χ²_{k1} and X_2 ~ χ²_{k2} as follows:

    F = (X_1 / k_1) / (X_2 / k_2)
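Both the t and F constructions can be sketched from their N(0, 1) and chi-square building blocks. The dof values below are illustrative, and the F-mean check relies on the known result E(F) = k2/(k2 − 2) for k2 > 2, which is not stated on the slide:

```python
import math
import random
import statistics

random.seed(2)

def chi2_draw(k):
    """One draw from a chi-square with k dof: sum of k squared N(0,1) draws."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))

n = 10
# T = Z / sqrt(X/n) with Z ~ N(0,1) and X ~ chi-square_n.
t_draws = [random.gauss(0.0, 1.0) / math.sqrt(chi2_draw(n) / n)
           for _ in range(20000)]
print(statistics.fmean(t_draws))      # close to E(T) = 0
print(statistics.pvariance(t_draws))  # close to n/(n-2) = 1.25

k1, k2 = 4, 20
# F = (X1/k1) / (X2/k2) with X1 ~ chi-square_k1, X2 ~ chi-square_k2.
f_draws = [(chi2_draw(k1) / k1) / (chi2_draw(k2) / k2) for _ in range(20000)]
print(statistics.fmean(f_draws))      # close to k2/(k2-2), about 1.11
```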
Estimators and Estimates
Typically, we can't observe the full population, so we must make inferences based on estimates from a random sample
An estimator is a mathematical formula for estimating a population parameter from sample data
An estimate is the actual number (numerical value) the formula produces from the sample data
Examples of Estimators
Suppose we want to estimate the population mean
Suppose we use the formula for E(Y), but substitute 1/n for f(y_i) as the probability weight, since each point has an equal chance of being included in the sample (the sample being random)
We can then calculate the sample average for our sample:

    Ȳ = (1/n) Σ_{i=1..n} Y_i
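As a sketch, with the equal 1/n weights this is just the ordinary average (the ages here are illustrative):

```python
def sample_mean(ys):
    """Ybar = (1/n) * sum(Y_i): each observation gets weight 1/n."""
    return sum(ys) / len(ys)

# E.g. the ages 40, 42, 44, 50 average to 44.
print(sample_mean([40, 42, 44, 50]))
```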
What Makes a Good Estimator?
Unbiasedness
Efficiency
Mean Square Error (MSE)
Asymptotic properties (for large samples):
Consistency
Unbiasedness of Estimators
We want our estimator to be right, on average
We say an estimator, W, of a population parameter, θ, is unbiased if E(W) = θ
For our example, that means we want

    E(Ȳ) = μ_Y
Proof: Sample Mean is Unbiased
    E(Ȳ) = E[(1/n) Σ_{i=1..n} Y_i] = (1/n) Σ_{i=1..n} E(Y_i)
         = (1/n) Σ_{i=1..n} μ_Y = (1/n) (n μ_Y) = μ_Y
Example
Population:

    Person   Age (Years)
    A        40
    B        42
    C        44
    D        50
    E        65

Take a sample of size four
Population and Sample Statistics

    Population         Possible Samples of Size 4
    Person    Age      ABCD   ABCE   ABDE   ACDE   BCDE
    A         40       40     40     40     40     --
    B         42       42     42     42     --     42
    C         44       44     44     --     44     44
    D         50       50     --     50     50     50
    E         65       --     65     65     65     65
    Mean      48.2     44     47.8   49.3   49.8   50.3
    Variance  102      18.7   135    129    120    108
    Std Dev   10.1     4.32   11.6   11.4   11     10.4
Unbiasedness
Mean of the sample averages = (44 + 47.8 + 49.3 + 49.8 + 50.3)/5 = 48.2
Mean of the sample variances = (18.7 + 135 + 129 + 120 + 108)/5 = 102
Mean of the sample std devs = (4.32 + 11.6 + 11.4 + 11 + 10.4)/5 = 9.73
Note that 9.73 < 10.1: the sample mean and sample variance are unbiased here, but the sample standard deviation is biased downward
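The table's averages can be reproduced exactly; a sketch that enumerates all five samples of size four from the population above:

```python
from itertools import combinations

ages = [40, 42, 44, 50, 65]  # persons A-E

def var_unbiased(ys):
    """Sample variance with the n-1 divisor."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / (len(ys) - 1)

samples = list(combinations(ages, 4))  # ABCD, ABCE, ABDE, ACDE, BCDE
means = [sum(s) / 4 for s in samples]
variances = [var_unbiased(s) for s in samples]
std_devs = [v ** 0.5 for v in variances]

print(sum(means) / 5)      # 48.2, the population mean: unbiased
print(sum(variances) / 5)  # ~102, matching the population variance: unbiased
print(sum(std_devs) / 5)   # ~9.73, below the population 10.1: biased
```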
Efficiency of an Estimator
We want our estimator to be closer to the truth, on average, than any other estimator
We say an unbiased estimator, W, is efficient if Var(W) ≤ Var(any other unbiased estimator)
Note, for our example,

    Var(Ȳ) = Var((1/n) Σ_{i=1..n} Y_i) = (1/n²) Σ_{i=1..n} σ²_Y = σ²_Y / n
MSE of an Estimator
What if we can't find an unbiased estimator?
Define the mean square error as MSE = E[(W − θ)²]
We get a trade-off between unbiasedness and efficiency, since MSE = variance + bias²
For our example, it means minimizing

    E[(Ȳ − μ_Y)²] = Var(Ȳ) + [E(Ȳ) − μ_Y]²
Consistency of an Estimator
An asymptotic property: what happens as the sample size goes to infinity?
We want the distribution of W to collapse to θ, i.e. plim(W) = θ
For our example, it means we want, for any ε > 0,

    P(|Ȳ − μ_Y| > ε) → 0 as n → ∞
Central Limit Theorem
Asymptotic normality implies that P(Z < z) → Φ(z) as n → ∞, i.e. P(Z < z) ≈ Φ(z)
The central limit theorem states that the standardized average of any population with mean μ and variance σ² is asymptotically ~ N(0, 1):

    Z = (Ȳ − μ_Y) / (σ_Y / √n)  →  N(0, 1) as n → ∞
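A simulation sketch of the theorem with a deliberately non-normal population (uniform on [0, 1], so μ = 0.5 and σ² = 1/12; all values illustrative):

```python
import math
import random
import statistics

random.seed(3)
n = 30
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)  # mean and sd of Uniform(0, 1)

# Standardize each sample average: Z = (Ybar - mu) / (sigma / sqrt(n)).
zs = [(statistics.fmean(random.random() for _ in range(n)) - mu)
      / (sigma / math.sqrt(n))
      for _ in range(20000)]

print(statistics.fmean(zs))                      # close to 0
print(statistics.pvariance(zs))                  # close to 1
print(sum(abs(z) > 1.96 for z in zs) / len(zs))  # close to 0.05, as for N(0,1)
```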
Estimate of the Population Variance
We have a good estimate of μ_Y; we would also like a good estimate of σ²_Y
We can use the sample variance given below; note the division by n − 1, not n, since the mean is estimated too (if μ were known, we could divide by n)

    S² = (1 / (n − 1)) Σ_{i=1..n} (Y_i − Ȳ)²
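A sketch of the estimator; the ages 40, 42, 44, 50 from the earlier example are reused for illustration:

```python
def sample_variance(ys):
    """S^2 = (1/(n-1)) * sum (Y_i - Ybar)^2; n-1 because Ybar is estimated."""
    n = len(ys)
    ybar = sum(ys) / n
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)

print(sample_variance([40, 42, 44, 50]))  # ~18.7, as in the table above
```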