STT 200 – LECTURE 5, SECTION 23,24 RECITATION 11 (3/26/2013)

TA: Zhen Zhang [email protected]
Office hour: (C500 WH) 3-4 PM Tuesday (office tel.: 432-3342)
Help-room: (A102 WH) 9:00AM-1:00PM, Monday
Class meets on Tuesday:
12:40 – 1:30PM A224 WH, Section 23
1:50 – 2:40PM A234 WH, Section 24




MAIN GOALS

Understand the sampling distribution of the sample proportion $\hat{p}$.

The normal model $N\left(p, \sqrt{p(1-p)/n}\right)$, where $p$ is the population proportion and $n$ is the sample size.
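As a small illustration of the normal model above, the sketch below computes its two parameters for a given population proportion and sample size. The function name `normal_model_phat` and the example values are assumptions made for illustration; they are not part of the recitation.

```r
# Hypothetical helper: mean and standard deviation of the normal model
# for the sample proportion, given population proportion p and sample size n.
normal_model_phat <- function(p, n) {
  stopifnot(p >= 0, p <= 1, n > 0)
  c(mean = p, sd = sqrt(p * (1 - p) / n))
}

# Illustrative values only (not taken from the data in this recitation).
model <- normal_model_phat(p = 0.5, n = 100)
print(model)
```

The standard deviation shrinks as `n` grows, which is the point the later problems explore.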


DATA

Here are data from a population of 400 people, indicating whether they do ("Yes") or don't ("No") have wireless internet service at home. Please copy the following chunk and paste it into R.

haswi <- c("Yes","Yes","Yes","No","Yes","No","Yes","No","Yes","Yes","No","Yes","Yes","Yes", "No","Yes","No","No","Yes","No","No","No","No","No","Yes","Yes","Yes","No","Yes","Yes","Yes","No","No","No","Yes","Yes","No","Yes","Yes","Yes","Yes","No","No","No","No","No","No","Yes","Yes","No","Yes","No","No","Yes","No","No","No","No","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","No","No","Yes","No","No","Yes","No","No","No","No","No","Yes","Yes","No","Yes","No","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","No","Yes","No","Yes","Yes","No","No","No","No","Yes","No","Yes","No","Yes","Yes","Yes","Yes","Yes","Yes","Yes","Yes","Yes","No","No","Yes","No","Yes","Yes","Yes","No","No","Yes","No","Yes","Yes","Yes","No","Yes","No","No","No","Yes","No","Yes","Yes","No","Yes","Yes","Yes","No","Yes","No","Yes","No","No","No","Yes","No","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","No","Yes","No","No","Yes","Yes","No","Yes","Yes","Yes","No","No","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","No","Yes","Yes","Yes","Yes","No","Yes","No","No","Yes","No","Yes","Yes","No","Yes","Yes","No","No","Yes","No","Yes","Yes","Yes","No","Yes","Yes","No","No","Yes","No","Yes","Yes","No","Yes","Yes","Yes","No","Yes","Yes","No","No","No","Yes","Yes","Yes","No","Yes","Yes","Yes","Yes","Yes","No","Yes","Yes","No","Yes","No","No","Yes","Yes","No","Yes","Yes","No","Yes","Yes","No","Yes","No","No","Yes","Yes","No","Yes","No","Yes","No","Yes","No","Yes","No","No","Yes","No","Yes","Yes","No","Yes","Yes","Yes","No","Yes","No","Yes","No","Yes","Yes","No","No","Yes","Yes","Yes","No","Yes","No","No","Yes","No","Yes","Yes","Yes","No","No","Yes","No","Yes","No","Yes","Yes","Yes","No","No","No","Yes","No","No","No","Yes","Yes","Yes","No","Yes","No","Yes","No","No","Yes","Yes","Yes","No","Yes","No","Yes","No","No","Yes","Yes","Yes","No","No","No","No","Yes","No","No","No","No","No","Yes","No","Yes","Yes","No","Yes","Yes","Yes","Yes","No","No","No","Yes","No","No","Yes","Yes","No","Yes","Yes","Yes","No","Yes","No","No","No","No","Yes","No","No","No","Yes","Yes","Yes","Yes","Yes","No","Yes","Yes","No","Yes","Yes","No","Yes","No","Yes","No","No","No","Yes","No","No","Yes","No")


DATA

Here is a table of integers between 1 and 400 chosen at random. R chunk:

rd <- c(92,149,41,310,307,130,296,130,77,399,212,301,25,177,313,147,298,160,354,20,199, 191,104,164,216,399,25,99,28,91,211,357,350,301,39,372,61,67,304,333,174,321,191,157,316,172,5,277,78,396,208,126,162,311,17,287,138,160,124,266,177,209,361,41,398,9,79,299,257,315,40,278,2,225,206,383,254,74,335,159,37,360,9,393,143,246,305,152,90,312,208,172,117,277,93,399,226,8,231,386,136,75,38,56,37,267,381,63,52,231,287,94,50,77,179,337,387,318,112,219,17,356,77,183,259,258,141,198,30,36,61,306,65,330,161,348,19,20,61,275,365,241,115,4,338,205,108,241,190,374,323,243,146,318,217,375,267,44,373,185,341,283,200,178,266,390,232,263,386,36,270,50,315,83,90,281,260,41,305,136,116,185,25,338,4,367,296,183,103,290,208,170,143,158,198,132,155,144,26,104,281,150,240,68,67,339,389,345,141,268,349,99,147,65,170,375,317,251,185,278,80,250,4,378,175,130,359,319,400,59,166,147,130,107,123,304,234,41,20,165,96,115,272,149,142,75,262,235,106,107,354,362,2,81,89,309,371,10,282,202,203,156,386,130,252,26,387,143,237,183,328,306,27,187,310,321,183,109,198,200,281,70,394,378,203,42,34,318,156,255,354,53,196,20,382,97,292,188,179,69,151,14,348,311,389,298,399,104,300,243,163,316,328,65,167,200,301,305,27,176,69,301,188,192,242,350,92,86,42,373,195,118,64,289,329,131,156,252,169,299,191,302,19,83,220,326,229,285,267,351,333,101,128,146,307,304,245,264,149,163,353,276,296,243,8,127,31,210,263,33,384,176,125,275,76,45,60,59,143,324,281,376,298,54,62,170,295,293,27,183,126,375,21,294,242,364,145,138,52,267,26,308,391,352,78,98,211,174,277,176,74,295,64,315,171,135,159,111,79,348,88,23,348,111,188,16,152,212,104,349,14,272,209,73,238,146,50,113,103,204,389,158,260,344,207,329,184,250,38,231,292,300,34,170,343,233,275,14,15,244,104,96,234,297,113,270,369,202,37,310,294,64,183,253,299,287,225,166,260,125,198,2,180,219,117,358,191,301,310,254,230,296,2,134,67,186,265,161,130,257,166,339,33,332,137,61,340,16,212,209,42,315,8,269,68,389,316,355,62,51,64,388,260,319,244,116,265,169,153,147,170,59,329,261,384,272,367,177,217,278,266,307,182,225,80,264,342,280,350,366,280,156,323,208,110,37,266,260,59,33,314,80,185,185,87,228,246,61,369,60,119,179,326,223,128,62,98,130,283,328,225,398,3,138,140,84,381,234,131,364,294,59,343,126,93,14,204,50,35,161,15,142,275,72,254,194,309,115,344,378,267,23,111,168,334,92,213,1,181,246,336,52,82,4,115,286,3,87,121,84,281,181,58,372,232,30,279,258,154,37,6,113,125,317,123,198,25,388,268,106)


PROBLEMS

Use the following procedure to choose 25 people at random from the population on the first page. For each person, record whether he or she does or does not have wireless internet service.

(a) Choose a starting point in the table of integers between 1 and 400 by closing your eyes and pointing at the table.

(b) Starting there, use the next 25 numbers to choose your sample of size 25. (There's a small chance that you'll pick the same person twice, but we'll not worry about that.)

Record the 25 yeses and nos here:

What proportion of the 25 people in your sample said yes? This is $\hat{p}$, the sample proportion.

The population proportion who have wireless internet service is $p = 0.5575$. How far is your estimate $\hat{p}$ from the true value $p$?
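The procedure above can be tried on a much smaller scale first. The sketch below uses a made-up population of 10 answers and a made-up table of 12 random integers (`toy_pop`, `toy_table`, `n_toy` and `start` are all hypothetical names and values, not the recitation's data), and takes the table entries after the starting point, as in step (b).

```r
# Hypothetical toy population of 10 answers and a short table of random integers.
toy_pop   <- c("Yes","No","Yes","Yes","No","Yes","No","No","Yes","Yes")
toy_table <- c(3, 7, 1, 10, 4, 4, 9, 2, 6, 8, 5, 1)

n_toy <- 5   # sample size for the toy example
start <- 4   # the table entry we "pointed at"

# Use the n_toy table entries after the starting point as person indices.
idx        <- toy_table[start + seq_len(n_toy)]
toy_sample <- toy_pop[idx]

phat_toy <- mean(toy_sample == "Yes")   # sample proportion
p_toy    <- mean(toy_pop == "Yes")      # population proportion
print(c(phat = phat_toy, p = p_toy, difference = p_toy - phat_toy))
```

In this toy table the index 4 appears twice in a row, so the same person is picked twice, which is the situation the note in step (b) says not to worry about.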


SIMULATION

Suppose many students each draw a sample of size 25 and get a sample proportion $\hat{p}$. We plot the histogram of the $\hat{p}$'s obtained by all students, and superimpose the density of $N(0.5575, 0.0993)$ on it.

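[Figure: histogram of the students' $\hat{p}$ values with the normal density curve overlaid. Horizontal axis: 0.3 to 0.8; vertical axis: Density, 0 to 6.]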
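The simulation described above can be approximated without the table of random integers. The sketch below is a stand-in under stated assumptions: it draws each student's number of yeses with `rbinom()`, which corresponds to sampling with replacement from a population with $p = 0.5575$, and the seed and the number of students are arbitrary choices.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
p <- 0.5575; n <- 25
nstudents <- 10000               # arbitrary number of simulated students

# Each student's sample proportion: (number of yeses) / n.
phats <- rbinom(nstudents, size = n, prob = p) / n

sdphat <- sqrt(p * (1 - p) / n)  # theoretical standard deviation of p-hat

# The simulated mean and sd should be close to the theoretical values.
print(c(sim_mean = mean(phats), theory_mean = p))
print(c(sim_sd = sd(phats), theory_sd = sdphat))
```

With this many simulated students, the simulated mean and standard deviation should land close to the theoretical ones; the exact figures depend on the seed.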


PROBLEMS

Comment: $\hat{p}$ is (before the data are collected) a random variable, and we'll use what we know about its distribution to try to quantify how confident we are in its estimate of $p$.

Now we'll investigate more generally. First, using the facts that the mean of $\hat{p}$ is $p$ and that the standard deviation of $\hat{p}$ is $\sqrt{p(1-p)/n}$, compute the mean and standard deviation of $\hat{p}$ in our case, where the population proportion is $p = 0.5575$ and the sample size is $n = 25$.

Mean of $\hat{p}$: $p = 0.5575$; standard deviation of $\hat{p}$: $\sqrt{0.5575 \times 0.4425 / 25} \approx 0.0993$.
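The two facts used above can be derived in a few steps. This is a sketch under an assumption the recitation does not state: the $n$ draws are treated as independent, so the number of yeses in the sample, written $X$ here for illustration, is binomial with parameters $n$ and $p$.

$$
\hat{p} = \frac{X}{n}, \qquad
E(\hat{p}) = \frac{E(X)}{n} = \frac{np}{n} = p, \qquad
SD(\hat{p}) = \frac{SD(X)}{n} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}.
$$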


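PROBLEMS

Next, use the fact that a normal model is a good model for the distribution of $\hat{p}$ to compute the probability that $\hat{p}$ is within $0.1$ of the actual value of $p$.

$\hat{p} \sim N(0.5575, 0.0993)$, thus the probability:

$$P(0.4575 < \hat{p} < 0.6575)$$

The z-score for $0.6575$ under the normal model above is

$$z = \frac{0.6575 - 0.5575}{0.0993} \approx 1.007,$$

with the area below it approximately $0.843$. Similarly, the z-score for $0.4575$ under the normal model above is

$$z = \frac{0.4575 - 0.5575}{0.0993} \approx -1.007,$$

with the area below it approximately $0.157$. So the area in between is approximately $0.843 - 0.157 = 0.686$.

Or: normcdf(0.4575, 0.6575, 0.5575, 0.0993) or normcdf(-1.007, 1.007) in a calculator, or pnorm(1.007) - pnorm(-1.007) in R.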
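As a check, the z-score route and the direct route can be compared side by side in R. This is a minimal sketch using only `pnorm()`; the variable names `prob_z` and `prob_direct` are chosen here for illustration.

```r
p <- 0.5575; n <- 25
sdphat <- sqrt(p * (1 - p) / n)

# Route 1: standardize to z-scores and use the standard normal.
z <- 0.1 / sdphat
prob_z <- pnorm(z) - pnorm(-z)

# Route 2: work directly on the p-hat scale with mean p and sd sdphat.
prob_direct <- pnorm(p + 0.1, p, sdphat) - pnorm(p - 0.1, p, sdphat)

print(round(c(z = z, prob_z = prob_z, prob_direct = prob_direct), 4))
```

Both routes describe the same area under the same curve, so the two probabilities agree up to floating-point rounding.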


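PROBLEMS

Repeat the two questions above, but this time with $n = 100$.

The mean will again be $p = 0.5575$, but the standard deviation of $\hat{p}$ is $\sqrt{0.5575 \times 0.4425 / 100} \approx 0.0497$, smaller! And the probability that $\hat{p}$ is within $0.1$ of $p$ is now approximately $0.956$.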

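[Figure: normal density curves for $\hat{p}$, N(0.5575, 0.0993) for n=25 (solid) and N(0.5575, 0.0497) for n=100 (dashed), with a vertical line at p = 0.5575. Horizontal axis: 0.0 to 1.0.]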
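To see the effect of the sample size beyond the two cases above, the sketch below tabulates the standard deviation of $\hat{p}$ and the probability of landing within 0.1 of $p$ for several sample sizes. The sizes 50 and 200 are extra values added here for illustration, and the normal model is assumed to hold for each of them.

```r
p  <- 0.5575
ns <- c(25, 50, 100, 200)        # 50 and 200 are illustrative additions

sds   <- sqrt(p * (1 - p) / ns)                   # sd of p-hat for each n
probs <- pnorm(0.1 / sds) - pnorm(-0.1 / sds)     # P(|p-hat - p| < 0.1)

print(data.frame(n = ns, sd = round(sds, 4), prob_within_0.1 = round(probs, 4)))
```

Quadrupling the sample size (25 to 100, or 50 to 200) halves the standard deviation, because $n$ sits under a square root.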


APPENDIX

R code for the problems.

# prob 4:
n <- 25; p <- 0.5575
( sdphat <- sqrt(p*(1-p)/n) )

# prob 5:
( pnorm(p+0.1, p, sdphat) - pnorm(p-0.1, p, sdphat) )

# prob 6:
n2 <- 100
( sdphat2 <- sqrt(p*(1-p)/n2) )
( pnorm(p+0.1, p, sdphat2) - pnorm(p-0.1, p, sdphat2) )

# comparison of n=25 and n=100
vec <- seq(0.01, 0.99, length.out=1000)
par(yaxt='n', mar=c(4,.3,.3,.3))
plot(dnorm(vec, p, sdphat2)~vec, type='n', ylab=' ', xlab=expression(hat(p)))
grid(col='gray80')
lines(dnorm(vec, p, sdphat)~vec, lty=1, lwd=2)
lines(dnorm(vec, p, sdphat2)~vec, lty=2, lwd=2)
abline(v=p, col='red', lty=2)
text(x=p, y=0, labels=paste("p =", round(p,4)), col='red')
legend('topleft',
       legend=c(paste('N(', round(p,4), ', ', round(sdphat,4), '), n=25', sep=''),
                paste('N(', round(p,4), ', ', round(sdphat2,4), '), n=100', sep='')),
       bg='gray90', inset=.02, lty=c(1,2), lwd=c(2,2))


APPENDIX (CONT’D)

R code for the simulations.

(N <- length(haswi))
(L <- length(rd))

# prob 1:
set.seed(20); n <- 25
( mystart <- sample(1:L, size=1) )
( myindex <- rd[mystart+c(1:n)] )
( mysample <- haswi[myindex] )

# prob 2:
( myphat <- sum(mysample=="Yes")/n )

# prob 3:
p <- 0.5575
( p - myphat )

# The above is for one student. For many students, we have phats:
set.seed(241); phats <- numeric(nstudents <- 10000)
for (t in 1:nstudents){
  mystarts <- sample(1:L, size=1)
  myindexs <- rd[mystarts+c(1:n)]
  mysamples <- haswi[myindexs]
  phats[t] <- sum(mysamples=="Yes")/n
}
# A start too close to the end of the table runs past the end of rd and gives NA;
# drop those students.
phats <- na.omit(phats)

# prob 4:
( sdphat <- sqrt(p*(1-p)/n) )
hist(phats, xlab=expression(hat(p)), freq=FALSE, main='')
vec <- seq(min(phats), max(phats), length.out=1000)
lines(dnorm(vec, p, sdphat)~vec)


Thank you.