STAT 4060 Design and Analysis of Surveys

22/6/23 www.uic.edu.hk/~xlpeng 1 STAT 4060 Design and Analysis of Surveys Exam: 60% Mid Test: 20% Mini Project: 10% Continuous assessment: 10%

Upload: juancarlos-rodriguez

Post on 30-Dec-2015



What we have learned:

1. Simple random sampling, confidence interval and choice of sample size.

2. Ratio and regression estimators, systematic sampling.

3. Stratified random sampling, allocation of stratum weights.

4. Cluster sampling.


Population Parameter


Sample Statistics


Simple random sampling

We shall consider the use of simple random samples for estimating three population characteristics:

the population mean, denoted $\bar{Y}$, where $\bar{Y} = \frac{1}{N}\sum_{j=1}^{N} Y_j$;

the population total, denoted $Y_T$, where $Y_T = \sum_{j=1}^{N} Y_j$;

and the proportion P.

We shall discuss how the estimators behave in terms of their sampling distributions. The variance is often a crucial measure.


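Proof of (1.9)

With $f = n/N$ and $S^2 = \frac{1}{N-1}\sum_{j=1}^{N}(Y_j - \bar{Y})^2$:

$$
\begin{aligned}
\operatorname{Var}(\bar{y}) &= E(\bar{y}^2) - \{E(\bar{y})\}^2 = E\Big[\Big(\sum_{i=1}^{n} y_i / n\Big)^2\Big] - \bar{Y}^2 \\
&= \frac{1}{n^2}\, E\Big(\sum_{i} y_i^2 + \sum_{i \ne j} y_i y_j\Big) - \bar{Y}^2 \\
&= \frac{1}{n^2}\,\{\, n E(y_i^2) + n(n-1) E(y_i y_j) - n^2 \bar{Y}^2 \,\} \\
&= \frac{1}{n^2}\,\{\, n(\operatorname{Var}(y_i) + \bar{Y}^2) + n(n-1)(\operatorname{cov}(y_i, y_j) + \bar{Y}^2) - n^2 \bar{Y}^2 \,\} \\
&= \frac{1}{n^2}\,\big(\, n \operatorname{Var}(y_i) + n(n-1)\operatorname{cov}(y_i, y_j) \,\big) \\
&= \frac{N-1}{Nn} S^2 - \frac{n-1}{Nn} S^2 = (1-f)\frac{S^2}{n}.
\end{aligned}
$$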
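As a quick numerical check of (1.9), the sketch below enumerates every possible sample of size n drawn without replacement from a small population and compares the exact variance of the sample mean with (1 - f)S^2/n. The population values and the function names are illustrative assumptions, not part of the course notes.

```python
from fractions import Fraction
from itertools import combinations


def srs_mean_variance_by_enumeration(population, n):
    """Exact Var(ybar) over all equally likely samples of size n (no replacement)."""
    samples = list(combinations(population, n))
    means = [Fraction(sum(s), n) for s in samples]
    grand_mean = sum(means) / len(means)
    return sum((m - grand_mean) ** 2 for m in means) / len(means)


def srs_mean_variance_formula(population, n):
    """(1 - f) S^2 / n, with f = n/N and S^2 computed with divisor N - 1."""
    N = len(population)
    mean = Fraction(sum(population), N)
    S2 = sum((Fraction(y) - mean) ** 2 for y in population) / (N - 1)
    return (1 - Fraction(n, N)) * S2 / n


population = [3, 7, 8, 12, 15, 21]  # hypothetical population values
exact = srs_mean_variance_by_enumeration(population, 3)
formula = srs_mean_variance_formula(population, 3)
print(exact, formula)  # the two values agree exactly
```

Exact rational arithmetic is used so that the comparison is an equality rather than a floating-point approximation.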


Confidence interval for the population mean


Ratio Estimation and Regression Estimation (Chapter 4, Textbook, Barnett, V., 1991)

2.1 Estimation of a population ratio: The ratio estimator

In some situations it is useful to estimate a (positive) ratio of two population characteristics: the totals, or means, of two (positive) variables X and Y.

Two obvious estimators of R are:

The sample average of ratios

$$\bar{r} = \frac{1}{n}\sum_{i=1}^{n} r_i = \frac{1}{n}\sum_{i=1}^{n} (y_i / x_i),$$

which is unbiased for estimating the population mean ratio

$$\bar{R} = \frac{1}{N}\sum_{j=1}^{N} R_j = \frac{1}{N}\sum_{j=1}^{N} (Y_j / X_j),$$

but biased for estimating R.

The ratio of the sample averages

$$r = \bar{y} / \bar{x} = y_T / x_T,$$

which is widely used.

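The bias in estimating R by r

The bias in estimating R by r is the expectation of the following difference:

$$r - R = \frac{\bar{y} - R\bar{x}}{\bar{x}} = \frac{\bar{y} - R\bar{x}}{\bar{X}}\Big(1 + \frac{\bar{x} - \bar{X}}{\bar{X}}\Big)^{-1} = \frac{\bar{y} - R\bar{x}}{\bar{X}}\Big[1 - \frac{\bar{x} - \bar{X}}{\bar{X}} + \Big(\frac{\bar{x} - \bar{X}}{\bar{X}}\Big)^2 - \cdots\Big] \qquad (2.3)$$

Keeping the first two terms of the expansion,

$$E(r - R) \approx E\Big(\frac{\bar{y} - R\bar{x}}{\bar{X}}\Big) - \frac{E[(\bar{y} - R\bar{x})(\bar{x} - \bar{X})]}{\bar{X}^2} = -\frac{E[(\bar{y} - R\bar{x})(\bar{x} - \bar{X})]}{\bar{X}^2},$$

where the first term vanishes because $E(\bar{y} - R\bar{x}) = \bar{Y} - R\bar{X} = 0$ when $R = \bar{Y}/\bar{X}$.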

Discussion about the bias

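$$\operatorname{Var}(r) \approx \frac{1-f}{n\bar{X}^2}\cdot\frac{\sum_{j=1}^{N}(Y_j - RX_j)^2}{N-1} = \frac{1-f}{n\bar{X}^2}\,\big(S_Y^2 - 2RS_{YX} + R^2 S_X^2\big) \qquad (2.5)$$

where $Z_j = Y_j - RX_j = (Y_j - \bar{Y}) - (RX_j - R\bar{X})$.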

2.2 Ratio estimation of a population mean or total

$$\bar{y}_R = r\bar{X} = (\bar{X}/\bar{x})\,\bar{y}$$

$$y_{TR} = rX_T = (N\bar{X}/\bar{x})\,\bar{y} = N\bar{y}_R$$
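A minimal Python sketch of the ratio estimators of the mean and total, assuming the population mean of the auxiliary variable X and the population size N are known. The function name and the sample data are hypothetical.

```python
def ratio_estimates(x, y, X_bar, N):
    """Return (r, y_R, y_TR) for paired sample values x and y.

    r    = ybar / xbar   (ratio of the sample averages)
    y_R  = r * X_bar     (ratio estimator of the population mean of Y)
    y_TR = N * y_R       (ratio estimator of the population total of Y)
    """
    r = sum(y) / sum(x)  # equals ybar / xbar because both sums share the same n
    y_R = r * X_bar
    y_TR = N * y_R
    return r, y_R, y_TR


# Hypothetical sample of n = 3 pairs, with assumed X_bar = 5 and N = 10.
r, y_R, y_TR = ratio_estimates(x=[2, 4, 6], y=[1, 2, 3], X_bar=5, N=10)
print(r, y_R, y_TR)
```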

Variance of ratio estimator

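The estimate of the ratio R of the present weight to prestudy weight for the herd is:

Solution:

$$\operatorname{Var}(r) = \frac{1-f}{n\bar{X}^2}\,S_r^2 = \Big(1 - \frac{12}{500}\Big)\frac{1}{12 \times 880^2} \times 8{,}848.646 = 0.000929$$

$$\operatorname{se}(r) = \sqrt{0.000929} = 0.030485$$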
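The arithmetic of this solution can be reproduced in a few lines. The inputs n = 12, N = 500, population mean of X equal to 880 and S_r^2 = 8,848.646 are read off the slide; the variable names are assumptions.

```python
import math

n, N = 12, 500       # sample size and herd size, as in the example
X_bar = 880.0        # population mean of the prestudy weight
S_r2 = 8848.646      # S_r^2 as given in the example

f = n / N                                  # sampling fraction
var_r = (1 - f) / (n * X_bar ** 2) * S_r2  # Var(r) = (1 - f) S_r^2 / (n X_bar^2)
se_r = math.sqrt(var_r)                    # standard error of r

print(round(var_r, 6), round(se_r, 6))
```

The standard error is computed from the unrounded variance, which is why it matches 0.030485 rather than the square root of the rounded 0.000929.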


This examines when the variance of (2.10) could be less or greater than that of (1.9)


2.3 Regression estimation

Condition (2.15.1) demands that X and Y be linearly related. If the linear relationship does not pass through the origin, however, this suggests considering an alternative estimator, known as the regression estimator.

An ideal (perfect) linear relationship is

$$Y_j = \bar{Y} + b(X_j - \bar{X}). \qquad (2.16)$$

A practicable simple linear regression model is

$$Y_j = \bar{Y} + b(X_j - \bar{X}) + E_j. \qquad (2.17)$$

Consider the average (mean) of either (2.16) or (2.17),

$$\bar{y}_L = \bar{y} + b(\bar{X} - \bar{x}) \qquad (2.19)$$

$$
\begin{aligned}
\operatorname{Var}(\bar{y}_L) &= E[(\bar{y}_L - \bar{Y})^2] \\
&= E\{[(\bar{y} - \bar{Y}) - b(\bar{x} - \bar{X})]^2\} \\
&= \frac{1-f}{n}\,\big(S_Y^2 - 2bS_{YX} + b^2 S_X^2\big) \\
&\ge \frac{1-f}{n}\,S_Y^2\,(1 - \rho_{YX}^2)
\end{aligned}
\qquad (2.20)
$$

For comparison, $\operatorname{Var}(\bar{y}) = \frac{1-f}{n}\,S_Y^2$.

From (2.20),

$$\min_b\{\operatorname{Var}(\bar{y}_L)\} = \min_b \frac{1-f}{n}\,\big(S_Y^2 - 2bS_{YX} + b^2 S_X^2\big) = \frac{1-f}{n}\,S_Y^2\,(1 - \rho_{YX}^2).$$

The minimum is obtained with $b = b_{\min} = S_{YX}/S_X^2 = \rho_{YX} S_Y / S_X$.

Thus the most efficient regression estimator of $\bar{Y}$ is

$$\bar{y}_L = \bar{y} + \rho_{YX}(S_Y/S_X)(\bar{X} - \bar{x}). \qquad (2.22)$$

The optimal value of b in (2.22) suggests the obvious estimate

$$\hat{b} = \hat{b}_{\min} = \frac{s_{yx}}{s_x^2} = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \qquad (2.24)$$

and the estimator

$$\bar{y}_L = \bar{y} + \hat{b}(\bar{X} - \bar{x}), \qquad (2.25)$$

which enjoys the following asymptotic properties:

$$E(\bar{y}_L) = \bar{Y} + O(n^{-1})$$

Asymptotic properties:

$$\operatorname{Var}(\bar{y}_L) = \frac{1-f}{n}\,\big(S_Y^2 - S_{YX}^2/S_X^2\big) + O(n^{-3/2}) = \frac{1-f}{n}\,S_Y^2\,(1 - \rho_{YX}^2) + O(n^{-3/2}) \qquad (2.26)$$

with the sample estimate

$$V(\bar{y}_L) = \frac{1-f}{n}\,\big(s_y^2 - \hat{b}\,s_{yx}\big) \qquad (2.27)$$
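The regression estimator and its variance estimate can be sketched as follows, assuming the usual sample definitions of s_yx, s_x^2 and s_y^2 (divisor n - 1) and a known population mean of X. The function name and the data are hypothetical.

```python
from statistics import mean


def regression_estimate(x, y, X_bar, N):
    """Return (b_hat, y_L, v_L) for paired sample values x and y.

    b_hat = s_yx / s_x^2                          as in (2.24)
    y_L   = ybar + b_hat * (X_bar - xbar)         as in (2.25)
    v_L   = (1 - f)/n * (s_y^2 - b_hat * s_yx)    as in (2.27)
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_yx = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s_y2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    b_hat = s_yx / s_x2
    y_L = y_bar + b_hat * (X_bar - x_bar)
    v_L = (1 - n / N) / n * (s_y2 - b_hat * s_yx)
    return b_hat, y_L, v_L


# Hypothetical data lying exactly on y = 2x + 1, with assumed X_bar = 5, N = 100.
b_hat, y_L, v_L = regression_estimate([1, 2, 3, 4], [3, 5, 7, 9], X_bar=5, N=100)
print(b_hat, y_L, v_L)
```

With data on an exact straight line the slope is recovered, the estimate equals 2 * 5 + 1 = 11, and the variance estimate is zero up to rounding, which is the perfect-relationship case of (2.16).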

2.4 Comparison of ratio and regression estimators

$$\operatorname{Var}(\bar{y}_R) - \operatorname{Var}(\bar{y}_L) = \frac{1-f}{n}\,\big(R^2 S_X^2 - 2R\rho_{YX} S_Y S_X + \rho_{YX}^2 S_Y^2\big) = \frac{1-f}{n}\,\big(RS_X - \rho_{YX} S_Y\big)^2$$

This difference is never negative, and it is zero only when $R = \rho_{YX} S_Y / S_X$.

Stratified Simple Random Sampling (Chapter 5, Textbook, Barnett, V., 1991)

Consider another sampling method:

Some Notations

To estimate the population mean $\bar{Y}$ of a finite population, we assume that the population is stratified; that is, it has been divided into k non-overlapping groups, or strata, of sizes $N_1, N_2, \ldots, N_k$, where $N_1 + N_2 + \cdots + N_k = N$.

The stratum means and variances are denoted by $\bar{Y}_i$ and $S_i^2$ $(i = 1, \ldots, k)$.

Estimation of Population Characteristics in Stratified Populations

Estimating $\bar{Y}$

The stratified sample mean is defined as

$$\bar{y}_{st} = \sum_{i=1}^{k} W_i \bar{y}_i,$$

where $\bar{y}_i$ is the sample mean in the ith stratum. Here we assume the weights $W_i = N_i/N$ are given (known).

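The mean and variance of $\bar{y}_{st}$

Note that $E(\bar{y}_{st}) = \sum_i W_i E(\bar{y}_i) = \sum_i W_i \bar{Y}_i = \bar{Y}$,

since $E(\bar{y}_i) = \bar{Y}_i$, the mean of stratum i, for a simple random sample taken within that stratum.

Because it is assumed that sampling in different strata is independent, that is, $\operatorname{Cov}(\bar{y}_i, \bar{y}_j) = 0$ for $i \ne j$, we have $\operatorname{Var}(\bar{y}_{st}) = \sum_i W_i^2 \operatorname{Var}(\bar{y}_i) = \sum_i W_i^2 (1 - f_i) S_i^2 / n_i$, where $n_i$ is the sample size in stratum i and $f_i = n_i/N_i$.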


Simple random sampling

Stratified sampling with proportional allocation


(a) When stratum size is large enough:

N

N i

(b) When stratum size is not large enough:

The stratified sample mean will be more efficient than the s.r. sample mean if and only if the variation between the stratum means is sufficiently large compared with the within-strata variation.
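A small numerical illustration of this comparison, under assumed formulas: Var of the s.r. sample mean is (1 - f)S^2/n, and Var of the stratified sample mean is the sum over strata of W_i^2 (1 - f_i) S_i^2 / n_i with f_i = n_i/N_i. The strata values and function names are hypothetical; the strata are chosen so that the stratum means differ widely while the within-strata variation is small.

```python
from statistics import variance


def srs_variance(population, n):
    """Var of the s.r. sample mean: (1 - n/N) * S^2 / n."""
    N = len(population)
    return (1 - n / N) * variance(population) / n


def stratified_variance(strata, sizes):
    """Var of the stratified sample mean: sum of W_i^2 (1 - f_i) S_i^2 / n_i."""
    N = sum(len(s) for s in strata)
    total = 0.0
    for stratum, n_i in zip(strata, sizes):
        N_i = len(stratum)
        W_i = N_i / N
        total += W_i ** 2 * (1 - n_i / N_i) * variance(stratum) / n_i
    return total


# Hypothetical population of three strata with well-separated means.
strata = [
    [10, 12, 11, 13, 14, 12, 11, 13, 12, 12],
    [50, 52, 49, 51, 53, 50, 48, 52, 51, 54],
    [90, 88, 91, 92, 89, 90, 93, 87, 91, 89],
]
pooled = [y for s in strata for y in s]

n = 6
sizes = [2, 2, 2]  # proportional allocation: n_i = n * N_i / N

var_srs = srs_variance(pooled, n)
var_st = stratified_variance(strata, sizes)
print(var_srs, var_st)
```

Here `statistics.variance` uses the divisor N - 1, matching the definition of S^2 when it is applied to a whole population or stratum.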

Optimum Choice of Sample Size

To achieve required precision of estimation

Some cost limitation

The simplest form assumes that there is some overhead cost, $c_0$, of administering the survey, and that individual observations from the ith stratum each cost an amount $c_i$. Thus the total cost is:

$$C = c_0 + \sum_{i=1}^{k} c_i n_i$$

I. Minimum variance for fixed cost (Cont.)

Then

II. Minimum cost for fixed variance

Consider choosing the stratum sample sizes to satisfy a fixed variance requirement, $\operatorname{Var}(\bar{y}_{st}) = V$, for the minimum possible total cost.

Given $w_i$, $n_i = n w_i$

Comparison of proportional allocation and optimum allocation

Thus the extent of the potential gain from optimum (Neyman) allocation compared with proportional allocation depends on the variability of the stratum variances: the larger this is, the greater the relative advantage of optimum allocation.
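The comparison can be illustrated numerically. The sketch assumes the standard forms n_i = n W_i for proportional allocation and n_i = n W_i S_i / sum_j(W_j S_j) for Neyman allocation (equal unit costs), and Var = sum_i W_i^2 S_i^2 / n_i - (1/N) sum_i W_i S_i^2 for the stratified sample mean. The weights, standard deviations and function names are hypothetical, and allocations are left unrounded for clarity.

```python
def proportional_allocation(n, W):
    """n_i = n * W_i."""
    return [n * w for w in W]


def neyman_allocation(n, W, S):
    """n_i = n * W_i S_i / sum_j (W_j S_j), assuming equal cost per observation."""
    products = [w * s for w, s in zip(W, S)]
    total = sum(products)
    return [n * p / total for p in products]


def stratified_variance(W, S, alloc, N):
    """sum W_i^2 S_i^2 / n_i - (1/N) sum W_i S_i^2 (finite population correction)."""
    main = sum(w ** 2 * s ** 2 / n_i for w, s, n_i in zip(W, S, alloc))
    fpc = sum(w * s ** 2 for w, s in zip(W, S)) / N
    return main - fpc


# Hypothetical strata with very different standard deviations.
W = [0.5, 0.3, 0.2]
S = [2.0, 5.0, 12.0]
n, N = 100, 10_000

prop = proportional_allocation(n, W)
ney = neyman_allocation(n, W, S)
var_prop = stratified_variance(W, S, prop, N)
var_ney = stratified_variance(W, S, ney, N)
print(prop, ney, var_prop, var_ney)
```

When the S_i are all equal the two allocations coincide, so any gain comes entirely from differences between the stratum standard deviations.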

Cluster Sampling (Chapter 6, Textbook, Barnett, V., 1991)

Comparison of s.r. sampling with cluster sampling

Systematic Sampling

A systematic sample can be viewed as a cluster sample of size m = 1.

Systematic sample mean

Comparison of s.r. sampling with systematic sampling
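One way to make this comparison concrete is to treat each of the k possible systematic samples as a single cluster and compute the variance of the systematic sample mean exactly, as the average squared deviation of the k sample means from the population mean. The sketch below assumes N = nk and a hypothetical population with a linear trend; the function names are illustrative.

```python
from statistics import mean, variance


def systematic_samples(population, k):
    """All k possible 1-in-k systematic samples (one per random start)."""
    return [population[start::k] for start in range(k)]


def systematic_mean_variance(population, k):
    """Exact Var of the systematic sample mean over the k equally likely starts."""
    Y_bar = mean(population)
    means = [mean(s) for s in systematic_samples(population, k)]
    return sum((m - Y_bar) ** 2 for m in means) / k


def srs_mean_variance(population, n):
    """Var of the s.r. sample mean: (1 - n/N) * S^2 / n."""
    N = len(population)
    return (1 - n / N) * variance(population) / n


population = list(range(1, 21))  # hypothetical population with a linear trend
k, n = 4, 5                      # N = n * k = 20

var_sys = systematic_mean_variance(population, k)
var_srs = srs_mean_variance(population, n)
print(var_sys, var_srs)
```

For a population with a steady trend like this one, the systematic sample spreads evenly over the range and gives the smaller variance; for a population in random order the two designs behave similarly.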

Two ways of estimating $\bar{Y}$