STA 291 Fall 2009
Lecture 2
Dustin Lueker

Basic Terminology
Parameter
  ◦ Numerical characteristic of the population
  ◦ Calculated using the whole population
Statistic
  ◦ Numerical characteristic of the sample
  ◦ Calculated using the sample
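
The parameter/statistic distinction can be made concrete with a small sketch. The population values below are hypothetical and exist only for illustration; the point is that the parameter uses every member of the population, while the statistic uses only the sampled members.

```python
import random
from statistics import mean

# Hypothetical population: ages of 8 people (made-up values for illustration)
population = [19, 20, 20, 21, 22, 23, 25, 30]

# Parameter: calculated using the whole population
parameter = mean(population)

# Statistic: calculated using only a sample (here, n = 3)
rng = random.Random(291)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 3)
statistic = mean(sample)

print("parameter (population mean):", parameter)
print("statistic (sample mean):", statistic)
```

A different sample would usually give a different statistic, while the parameter stays fixed.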

Simple Random Sampling (SRS)
Each possible sample of a given size has the same probability of being selected
The sample size is usually denoted by n

Example of SRS
Population of 4 students: Alf, Buford, Charlie, Dixie
Select an SRS of size n = 2 to ask them about their smoking habits
  ◦ 6 possible samples of size 2
      A,B   A,C   A,D   B,C   B,D   C,D
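
As a quick check of the count, the six samples can be listed programmatically. This is a minimal sketch using Python's standard `itertools.combinations`; the variable names are illustrative only.

```python
from itertools import combinations

students = ["Alf", "Buford", "Charlie", "Dixie"]

# Every unordered selection of 2 students out of 4
samples = list(combinations(students, 2))

for pair in samples:
    print(pair)
print("number of possible samples:", len(samples))
```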

How to Choose an SRS?
Each of the six possible samples has to have the same probability of being selected
  ◦ How could we do this?
      Roll a die
      Random number generator
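
The random number generator option might look like the sketch below. It draws one sample with `random.sample`, then repeats the draw many times to check that each of the six pairs turns up about one time in six. The seed and the number of trials are arbitrary choices made for this illustration.

```python
import random
from collections import Counter

students = ["Alf", "Buford", "Charlie", "Dixie"]
rng = random.Random(2009)  # fixed seed so the sketch is repeatable

# One simple random sample of size n = 2
print("one SRS:", sorted(rng.sample(students, 2)))

# Repeat the draw many times and record how often each pair is selected
trials = 60_000
counts = Counter(tuple(sorted(rng.sample(students, 2))) for _ in range(trials))

for pair, count in sorted(counts.items()):
    print(pair, round(count / trials, 3))  # each should be close to 1/6
```

Rolling a fair die works the same way: number the six samples 1 through 6 and take the one that matches the roll.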

Common Problems when Sampling
Convenience sample
  ◦ Selecting subjects that are easily accessible to you
Volunteer sample
  ◦ Selecting the first two subjects who volunteer to take the survey
What are the problems with these samples?
  ◦ Proper representation of the population
  ◦ Bias
Examples
  ◦ Mall interview
  ◦ Street corner interview
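
A small simulation can illustrate how a convenience sample introduces bias. Everything about the population below is invented for the illustration (the group sizes, the "yes" rates, and the idea that mall shoppers answer differently from everyone else); it is a sketch of the mechanism, not data from any survey.

```python
import random

rng = random.Random(6)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 (all numbers invented for illustration):
# 3,000 mall shoppers, 70% of whom answer "yes"; 7,000 others, 40% "yes".
population = (
    [("mall", True)] * 2100 + [("mall", False)] * 900
    + [("other", True)] * 2800 + [("other", False)] * 4200
)

def proportion_yes(people):
    """Share of people in the list whose answer is 'yes'."""
    return sum(answer for _, answer in people) / len(people)

true_p = proportion_yes(population)  # the parameter

# Simple random sample: every person is equally likely to be chosen
srs = rng.sample(population, 500)

# Convenience sample: only mall shoppers can be chosen
mall_only = [person for person in population if person[0] == "mall"]
convenience = rng.sample(mall_only, 500)

print("population:        ", round(true_p, 3))
print("random sample:     ", round(proportion_yes(srs), 3))
print("convenience sample:", round(proportion_yes(convenience), 3))
```

Under these assumptions the random sample lands near the population value, while the mall-only sample is pulled toward the mall shoppers' rate no matter how many people are interviewed.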

Example
A survey of 300 random individuals conducted in Louisville revealed that President Obama had an approval rating of 67%.
  ◦ Is 67% a statistic or parameter?
  ◦ The surveyors stated that only 67% of Kentuckians approved of President Obama.
      What is the problem with this statement?
      Why might the surveyors have chosen Louisville as their sampling location?

Famous Example
1936 presidential election of Alfred Landon vs. Franklin Roosevelt
  ◦ Literary Digest sent out over 10 million questionnaires in the mail to predict the election outcome
      What type of sample is this?
  ◦ 2 million responses predicted a landslide victory for Alfred Landon
  ◦ George Gallup used a much smaller random sample and predicted a clear victory for FDR
      FDR won with 62% of the vote

Other Examples
TV, radio call-in polls
  ◦ “Should the UN headquarters continue to be located in the United States?”
      ABC poll with 186,000 callers: 67% no
      Scientific random sample of 500: 28% no
Which sample is more trustworthy?
Would any of you call in to give your opinion? Why or why not?

Other Examples
Another advantage of random samples
  ◦ Inferential statistical methods can be applied to state that “the true percentage of all Americans who want the UN headquarters out of the United States is between 24% and 32%”
  ◦ These methods cannot be applied to a volunteer sample
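
The slide does not say how the 24% to 32% range was obtained. One common approach that gives roughly this interval is a normal-approximation confidence interval for a proportion, sketched below with the random sample's figures (28% "no", n = 500). The 95% confidence level and its multiplier of 1.96 are assumptions made for this sketch.

```python
from math import sqrt

p_hat = 0.28  # sample proportion answering "no" (from the slide)
n = 500       # size of the random sample (from the slide)
z = 1.96      # assumed: standard normal multiplier for 95% confidence

# Estimated standard error of a sample proportion
standard_error = sqrt(p_hat * (1 - p_hat) / n)

margin = z * standard_error
lower, upper = p_hat - margin, p_hat + margin

print(f"margin of error: {margin:.3f}")
print(f"interval: {lower:.0%} to {upper:.0%}")
```

This calculation relies on the sample being random, which is why it cannot be carried over to the 186,000 call-in responses.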

Don’t Trust Bad Samples
Whenever you see results from a poll, check whether they come from a random sample
Preferably, it should be stated
  ◦ Who sponsored and conducted the poll?
  ◦ How were the questions worded?
  ◦ How was the sample selected?
  ◦ How large was it?
If not, the results may not be trustworthy

Question Wording
Kalton et al. (1978), England
Two groups get questions with slightly different wording
  ◦ Group 1
      “Are you in favor of giving special priority to buses in the rush hour or not?”
  ◦ Group 2
      “Are you in favor of giving special priority to buses in the rush hour or should cars have just as much priority as buses?”

Question Wording
Result: Proportion of people saying that priority should be given to buses.

                   Without reference to cars   With reference to cars   Difference
All respondents    0.69 (n=1076)               0.55 (n=1081)            0.14
Women              0.65 (n=585)                0.49 (n=590)             0.16
Men                0.74 (n=491)                0.66 (n=488)             0.08
Non car-owners     0.73 (n=565)                0.55 (n=554)             0.18
Car owners         0.66 (n=509)                0.54 (n=522)             0.12
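
The Difference column can be recomputed from the two proportions in each row. The sketch below also attaches an approximate standard error to each difference, using the usual formula for two independent sample proportions; that formula is an addition made here for illustration and does not appear on the slide.

```python
from math import sqrt

# (group, proportion without cars, n, proportion with cars, n), from the table
rows = [
    ("All respondents", 0.69, 1076, 0.55, 1081),
    ("Women",           0.65,  585, 0.49,  590),
    ("Men",             0.74,  491, 0.66,  488),
    ("Non car-owners",  0.73,  565, 0.55,  554),
    ("Car owners",      0.66,  509, 0.54,  522),
]

def diff_and_se(p1, n1, p2, n2):
    """Difference of two proportions and its approximate standard error."""
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, se

results = {name: diff_and_se(p1, n1, p2, n2) for name, p1, n1, p2, n2 in rows}

for name, (diff, se) in results.items():
    print(f"{name:16s} difference = {diff:.2f}, approx. SE = {se:.3f}")
```

Comparing each difference with its standard error gives a rough sense of whether the wording effect is larger than sampling variation alone would explain.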

Question Order
Two questions asked in different order during the Cold War
  ◦ (1) “Do you think the U.S. should let Russian newspaper reporters come here and send back whatever they want?”
  ◦ (2) “Do you think Russia should let American newspaper reporters come in and send back whatever they want?”
      When question (1) was asked first, 36% answered “Yes”
      When question (2) was asked first, 73% answered “Yes” to question (1)

‘Flavors’ of Statistics
Descriptive Statistics
  ◦ Summarizing the information in a collection of data
Inferential Statistics
  ◦ Using information from a sample to make conclusions/predictions about the population

Example
71% of individuals surveyed believed that the Kentucky Football team will return to a bowl game in 2009
  ◦ Is 71% an example of descriptive or inferential statistics?
From the same sample it is concluded that at least 85% of Kentucky Football fans approve of Coach Brooks’ job here at UK
  ◦ Is 85% an example of descriptive or inferential statistics?

Qualitative Variables
Nominal
  ◦ Gender, nationality, hair color, state of residence
      Nominal variables have a scale of unordered categories
      It does not make sense to say, for example, that green hair is greater/higher/better than orange hair
Ordinal
  ◦ Disease status, company rating, grade in STA 291
      Ordinal variables have a scale of ordered categories; they are often treated in a quantitative manner (A = 4.0, B = 3.0, etc.)
      One unit can have more of a certain property than does another unit

Quantitative Variables
Quantitative
  ◦ Age, income, height
      Quantitative variables are measured numerically, that is, for each subject a number is observed
      The scale for quantitative variables is called interval scale

Example
A survey of Kentucky Football fans obtained the following information
  ◦ Age
  ◦ Whether they preferred the new blue helmet or the old white helmet
  ◦ The number of games they think the team will win in 2009
  ◦ How they felt the UK vs. U of L game would turn out
      U of L in a blowout
      U of L in a close game
      UK in a close game
      UK in a blowout
Are these qualitative or quantitative variables and what is the scale for each?

Discrete and Continuous
A variable is discrete if it can take on a finite number of values
  ◦ Gender
  ◦ Favorite MLB team
      Qualitative variables are discrete
Continuous variables can take an infinite continuum of possible real number values
  ◦ Time spent studying for STA 291 per day
      27 minutes
      27.487 minutes
      27.48682 minutes
  ◦ Can be subdivided into more accurate values
      Therefore continuous

Observational Study
An observational study observes individuals and measures variables of interest but does not attempt to influence the responses
  ◦ Purpose of an observational study is to describe/compare groups or situations
      Example: Select a sample of men and women and ask each person whether he/she has taken aspirin regularly over the past 2 years, and whether he/she has suffered a heart attack over the same period

Experiment
An experiment deliberately imposes some treatment on individuals in order to observe their responses
  ◦ Purpose of an experiment is to study whether the treatment causes a change in the response
      Example: Randomly select men and women, divide the sample into two groups. One group would take aspirin daily, the other would not. After 2 years, determine for each group the proportion of people who had suffered a heart attack.
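
The slide says only to divide the sample into two groups. A common way to do this is random assignment, sketched below. Splitting at random, the subject IDs, and the group size of 10 are all assumptions made for this illustration.

```python
import random

rng = random.Random(22)  # fixed seed so the sketch is repeatable

# Hypothetical subject IDs; the slide does not give a sample size
subjects = [f"subject_{i:02d}" for i in range(1, 21)]

# Shuffle a copy, then split it in half: chance alone decides who gets aspirin
shuffled = subjects[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
aspirin_group = shuffled[:half]
control_group = shuffled[half:]

print("aspirin:", sorted(aspirin_group))
print("control:", sorted(control_group))
```

After the two years, the heart attack proportion would be computed separately within each group and the two proportions compared.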

Which is Preferred?
Observational Studies
  ◦ Passive data collection
  ◦ We observe, record, or measure, but don’t interfere
Experiments
  ◦ Active data production
  ◦ Actively intervene by imposing some treatment in order to see what happens
  ◦ Experiments are preferable if they are possible
      We are able to control more things and be sure our data isn’t tainted