STA 291 Fall 2009 Lecture 2 Dustin Lueker



Page 1: STA 291 Fall 2009

STA 291 Fall 2009

Lecture 2
Dustin Lueker

Page 2: STA 291 Fall 2009

Basic Terminology

Parameter
  ◦ Numerical characteristic of the population
      Calculated using the whole population
Statistic
  ◦ Numerical characteristic of the sample
      Calculated using the sample
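The distinction between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population of student ages, the sample size of 50, and all variable names are made up for the example and do not come from the lecture.

```python
import random
from statistics import mean

random.seed(291)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 1,000 students (made-up values).
population = [random.randint(18, 30) for _ in range(1000)]

# Parameter: calculated using the whole population.
population_mean = mean(population)

# Statistic: calculated using only a sample drawn from the population.
n = 50
sample = random.sample(population, n)
sample_mean = mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean, n={n}): {sample_mean:.2f}")
```

The two numbers will usually be close but not identical, which is why the statistic is treated as an estimate of the parameter.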

Page 3: STA 291 Fall 2009

Simple Random Sampling (SRS)

Each possible sample of the same size has the same probability of being selected
The sample size is usually denoted by n

Page 4: STA 291 Fall 2009

Example of SRS

Population of 4 students: Alf, Buford, Charlie, Dixie
Select an SRS of size n = 2 to ask them about their smoking habits
  ◦ 6 possible samples of size 2
      A,B  A,C  A,D  B,C  B,D  C,D
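The count of 6 possible samples can be checked by listing every pair of students. The sketch below uses Python's standard `itertools.combinations`; the variable names are chosen for illustration.

```python
from itertools import combinations
from math import comb

population = ["Alf", "Buford", "Charlie", "Dixie"]
n = 2  # sample size

# Every unordered sample of size n from the population.
samples = list(combinations(population, n))

for s in samples:
    print(s)

# The number of samples matches the binomial coefficient "4 choose 2".
print(len(samples), comb(len(population), n))
```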

Page 5: STA 291 Fall 2009

How to choose an SRS?

Each of the six possible samples has to have the same probability of being selected
  ◦ How could we do this?
      Roll a die
      Random number generator
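A random number generator can stand in for the die. The following is a minimal sketch, assuming Python's standard `random.sample` as the generator; it draws many samples of size 2 and tallies how often each of the six possible samples appears. The number of repetitions (60,000) is an arbitrary choice for the illustration.

```python
import random
from collections import Counter

random.seed(5)  # fixed seed so the sketch is reproducible

population = ["Alf", "Buford", "Charlie", "Dixie"]
n = 2
draws = 60_000  # arbitrary number of repetitions

counts = Counter()
for _ in range(draws):
    # random.sample draws n distinct students; sort so (A, B) and (B, A) count as one sample.
    chosen = tuple(sorted(random.sample(population, n)))
    counts[chosen] += 1

for sample, count in sorted(counts.items()):
    print(sample, f"{count / draws:.3f}")  # each share should be near 1/6
```

Rolling a fair die works the same way: number the six samples 1 through 6 and take the sample whose number comes up.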

Page 6: STA 291 Fall 2009

Common Problems when Sampling

Convenience sample
  ◦ Selecting subjects that are easily accessible to you
Volunteer sample
  ◦ Selecting the first two subjects who volunteer to take the survey
What are the problems with these samples?
  ◦ Proper representation of the population
  ◦ Bias
Examples
  ◦ Mall interview
  ◦ Street corner interview

Page 7: STA 291 Fall 2009

Example

A survey of 300 random individuals conducted in Louisville revealed that President Obama had an approval rating of 67%.
  ◦ Is 67% a statistic or parameter?
  ◦ The surveyors stated that only 67% of Kentuckians approved of President Obama.
      What is the problem with this statement?
      Why might the surveyors have chosen Louisville as their sampling location?

Page 8: STA 291 Fall 2009

Famous Example

1936 presidential election of Alfred Landon vs. Franklin Roosevelt
  ◦ Literary Digest sent out over 10 million questionnaires in the mail to predict the election outcome
      What type of sample is this?
  ◦ 2 million responses predicted a landslide victory for Alfred Landon
  ◦ George Gallup used a much smaller random sample and predicted a clear victory for FDR
      FDR won with 62% of the vote
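A small simulation can illustrate how a very large but unrepresentative sample can lose to a small random one. This is a sketch under invented assumptions, not a reconstruction of the 1936 poll: only the 62% figure comes from the slide, while the population size, the response rates, and the sample size of 1,000 are hypothetical values chosen to make the effect visible.

```python
import random

random.seed(1936)  # fixed seed so the sketch is reproducible

TRUE_FDR_SHARE = 0.62      # from the slide: FDR won with 62% of the vote
POPULATION_SIZE = 100_000  # hypothetical electorate size

# Hypothetical response rates (invented for illustration, not historical):
# Landon supporters are assumed to be three times as likely to mail a reply.
RESPONSE_RATE = {"FDR": 0.10, "Landon": 0.30}

population = [
    "FDR" if random.random() < TRUE_FDR_SHARE else "Landon"
    for _ in range(POPULATION_SIZE)
]

def fdr_share(votes):
    """Proportion of votes for FDR in a list of votes."""
    return sum(v == "FDR" for v in votes) / len(votes)

# Large sample with unequal chances of ending up in it.
mail_responses = [v for v in population if random.random() < RESPONSE_RATE[v]]

# Much smaller simple random sample.
random_sample = random.sample(population, 1000)

print(f"population:     {fdr_share(population):.3f}")
print(f"mail responses: {fdr_share(mail_responses):.3f} (n={len(mail_responses)})")
print(f"random sample:  {fdr_share(random_sample):.3f} (n={len(random_sample)})")
```

Under these assumptions the mail responses put FDR well below 50% even though there are many more of them, while the random sample lands near the true share.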

Page 9: STA 291 Fall 2009

Other Examples

TV, radio call-in polls
  ◦ “Should the UN headquarters continue to be located in the United States?”
      ABC poll with 186,000 callers: 67% no
      Scientific random sample of 500: 28% no
Which sample is more trustworthy?
Would any of you call in to give your opinion? Why or why not?

Page 10: STA 291 Fall 2009

Other Examples

Another advantage of random samples
  ◦ Inferential statistical methods can be applied to state that “the true percentage of all Americans who want the UN headquarters out of the United States is between 24% and 32%”
  ◦ These methods cannot be applied to volunteer samples
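The slide does not say how the 24% to 32% range was obtained. One standard method that gives roughly these numbers is a 95% normal-approximation confidence interval for a proportion, applied to the random sample of 500 with 28% answering "no". The sketch below assumes that method and the usual 95% multiplier of 1.96; both are assumptions, not statements from the lecture.

```python
from math import sqrt

n = 500       # size of the scientific random sample (from the slides)
p_hat = 0.28  # sample proportion answering "no" (from the slides)
z = 1.96      # multiplier for an assumed 95% confidence level

# Estimated standard error of the sample proportion.
standard_error = sqrt(p_hat * (1 - p_hat) / n)

lower = p_hat - z * standard_error
upper = p_hat + z * standard_error

print(f"{lower:.3f} to {upper:.3f}")  # rounds to about 0.24 and 0.32
```

The calculation relies on every person having a known chance of being sampled, which is why it has no counterpart for a call-in poll.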

Page 11: STA 291 Fall 2009

Don’t Trust Bad Samples

Whenever you see results from a poll, check whether they come from a random sample
Preferably, it should be stated
  ◦ Who sponsored and conducted the poll?
  ◦ How were the questions worded?
  ◦ How was the sample selected?
  ◦ How large was it?
If not, the results may not be trustworthy

Page 12: STA 291 Fall 2009

Question Wording

Kalton et al. (1978), England
Two groups get questions with slightly different wording
  ◦ Group 1
      “Are you in favor of giving special priority to buses in the rush hour or not?”
  ◦ Group 2
      “Are you in favor of giving special priority to buses in the rush hour or should cars have just as much priority as buses?”

Page 13: STA 291 Fall 2009

Question Wording

Result: Proportion of people saying that priority should be given to buses.

                  Without reference to cars   With reference to cars   Difference
All respondents   0.69 (n=1076)               0.55 (n=1081)            0.14
Women             0.65 (n=585)                0.49 (n=590)             0.16
Men               0.74 (n=491)                0.66 (n=488)             0.08
Non car-owners    0.73 (n=565)                0.55 (n=554)             0.18
Car owners        0.66 (n=509)                0.54 (n=522)             0.12
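The Difference column is the proportion without the reference to cars minus the proportion with it. A short sketch that recomputes it from the two proportion columns; the dictionary layout and names are chosen for illustration.

```python
# (without reference to cars, with reference to cars), as given in the table
proportions = {
    "All respondents": (0.69, 0.55),
    "Women": (0.65, 0.49),
    "Men": (0.74, 0.66),
    "Non car-owners": (0.73, 0.55),
    "Car owners": (0.66, 0.54),
}

# Round to two decimals to match the precision of the table.
differences = {
    group: round(without_cars - with_cars, 2)
    for group, (without_cars, with_cars) in proportions.items()
}

for group, diff in differences.items():
    print(f"{group:<16} {diff:.2f}")
```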

Page 14: STA 291 Fall 2009

Question Order

Two questions asked in different order during the Cold War
  ◦ (1) “Do you think the U.S. should let Russian newspaper reporters come here and send back whatever they want?”
  ◦ (2) “Do you think Russia should let American newspaper reporters come in and send back whatever they want?”
      When question (1) was asked first, 36% answered “Yes”
      When question (2) was asked first, 73% answered “Yes” to question (1)

Page 15: STA 291 Fall 2009

‘Flavors’ of Statistics

Descriptive Statistics
  ◦ Summarizing the information in a collection of data
Inferential Statistics
  ◦ Using information from a sample to make conclusions/predictions about the population

Page 16: STA 291 Fall 2009

Example

71% of individuals surveyed believed that the Kentucky Football team will return to a bowl game in 2009
  ◦ Is 71% an example of descriptive or inferential statistics?
From the same sample it is concluded that at least 85% of Kentucky Football fans approve of Coach Brooks’ job here at UK
  ◦ Is 85% an example of descriptive or inferential statistics?

Page 17: STA 291 Fall 2009

Qualitative Variables

Nominal
  ◦ Gender, nationality, hair color, state of residence
      Nominal variables have a scale of unordered categories
      It does not make sense to say, for example, that green hair is greater/higher/better than orange hair
Ordinal
  ◦ Disease status, company rating, grade in STA 291
      Ordinal variables have a scale of ordered categories; they are often treated in a quantitative manner (A = 4.0, B = 3.0, etc.)
      One unit can have more of a certain property than does another unit

Page 18: STA 291 Fall 2009

Quantitative Variables

Quantitative
  ◦ Age, income, height
      Quantitative variables are measured numerically, that is, for each subject a number is observed
      The scale for quantitative variables is called interval scale

Page 19: STA 291 Fall 2009

Example

A survey of Kentucky Football fans obtained the following information
  ◦ Age
  ◦ Whether they preferred the new blue helmet or the old white helmet
  ◦ The number of games they think the team will win in 2009
  ◦ How they felt the UK vs. U of L game would turn out
      U of L in a blowout
      U of L in a close game
      UK in a close game
      UK in a blowout
Are these qualitative or quantitative variables and what is the scale for each?

Page 20: STA 291 Fall 2009

Discrete and Continuous

A variable is discrete if it can take on a finite number of values
  ◦ Gender
  ◦ Favorite MLB team
Qualitative variables are discrete
Continuous variables can take an infinite continuum of possible real number values
  ◦ Time spent studying for STA 291 per day
      27 minutes
      27.487 minutes
      27.48682 minutes
      Can be subdivided into more accurate values
      Therefore continuous

Page 21: STA 291 Fall 2009

Observational Study

An observational study observes individuals and measures variables of interest but does not attempt to influence the responses
  ◦ Purpose of an observational study is to describe/compare groups or situations
      Example: Select a sample of men and women and ask whether he/she has taken aspirin regularly over the past 2 years, and whether he/she had suffered a heart attack over the same period

Page 22: STA 291 Fall 2009

Experiment

An experiment deliberately imposes some treatment on individuals in order to observe their responses
  ◦ Purpose of an experiment is to study whether the treatment causes a change in the response
      Example: Randomly select men and women and divide the sample into two groups. One group would take aspirin daily, the other would not. After 2 years, determine for each group the proportion of people who had suffered a heart attack.
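The slide says only that the sample is divided into two groups. A common way to do this is random assignment, sketched below on the assumption that the split is made by shuffling; the 20 subject labels and the group names are hypothetical.

```python
import random

random.seed(2009)  # fixed seed so the sketch is reproducible

# Hypothetical sample of 20 subjects.
subjects = [f"subject_{i:02d}" for i in range(1, 21)]

# Shuffle a copy, then split it in half: chance alone decides who gets the treatment.
shuffled = subjects[:]
random.shuffle(shuffled)
half = len(shuffled) // 2

aspirin_group = shuffled[:half]   # would take aspirin daily
control_group = shuffled[half:]   # would not take aspirin

print("aspirin:", sorted(aspirin_group))
print("control:", sorted(control_group))
```

Letting chance decide the groups keeps the researcher's own choices from influencing who receives the treatment.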

Page 23: STA 291 Fall 2009

Which is Preferred?

Observational Studies
  ◦ Passive data collection
  ◦ We observe, record, or measure, but don’t interfere
Experiments
  ◦ Active data production
  ◦ Actively intervene by imposing some treatment in order to see what happens
  ◦ Experiments are preferable if they are possible
      We are able to control more things and be sure our data isn’t tainted