Feb. 6 Statistic for the day: Number of Florida high school students who take physical education courses online: 1204
Assignment: Continue to review for test on Monday!
These slides were created by Tom Hettmansperger and in some cases modified by David Hunter
Friday, Feb. 6
Review
Exam #1 (100 points)
Monday, Feb 9 in class
60 multiple choice questions
Responsible for:
Anything in lecture (except SFD)
Anything in book Chapters 1, 4, 5, 7, 8, 9
Bring ID! Bring pencils! Bring 1 sheet of notes!
2 Types of studies to obtain data relevant to your research:
Randomized Experiment
Observational Study
Literary Digest Survey Results:
2.4 million responded!
43% were for Roosevelt
Literary Digest predicted a landslide victory for Alf Landon
Turning Data into Information: The distribution of the data
The shape of the distribution
Is it skewed or is it symmetric?
What is a typical value?
Should we use the mean or the median?
What is the spread of the distribution?
Should we use the standard deviation or the interquartile range?
What are the quartiles?
Mean vs. Median: Which is more “typical” in this (right-skewed) case?
[Figure: Histogram of CD ownership, Stat 100.2 S04. Horizontal axis: CDs (0 to 1000); vertical axis: Frequency (0 to 100). Mean = 89, Median = 50.]
Age at Death of English Rulers
60, 50, 47, 53, 48, 33, 71, 43, 65, 34, 56, 59, 49, 81, 67, 68, 49, 16, 86, 67
Turn these data into information.
Shape: Stem and Leaf Display
1 6
2
3 34
4 37899
5 0369
6 05778
7 1
8 16
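The display above can be rebuilt from the raw ages. This is a minimal sketch, assuming the tens digit is the stem and the ones digit is the leaf; the function name `stem_and_leaf` is made up for illustration.

```python
from collections import defaultdict

# Ages at death of the 20 English rulers listed in the slides.
ages = [60, 50, 47, 53, 48, 33, 71, 43, 65, 34,
        56, 59, 49, 81, 67, 68, 49, 16, 86, 67]

def stem_and_leaf(values):
    """Return display rows: tens digit as stem, sorted ones digits as leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    rows = []
    # Include empty stems (such as 2) so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem} {digits}".rstrip())
    return rows

for row in stem_and_leaf(ages):
    print(row)
```

Keeping the empty stem for the 20s shows the gap between 16 and the rest of the data, which is what makes the low value stand out.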
The Median and the Quartiles
16 33 34 43 47 * 48 49 49 50 53 ** 56 59 60 65 67 *** 67 68 71 81 86
(5) * (5) ** (5) *** (5)
* = Q1, ** = M, *** = Q3
The first quartile is the number that divides the data into the first quarter and the last three quarters.
The median divides the data into halves.
5 Number Summary
Median M = 54.5
First Quartile Q1 = 47.5
Third Quartile Q3 = 67
Lowest = 16
Highest = 86
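The summary can be checked with a short script. This sketch assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data, which matches the groups of five shown in the slides; other software may use a different quartile rule and give slightly different values. The function name is hypothetical.

```python
from statistics import median

# Ages at death of the 20 English rulers listed in the slides.
ages = [60, 50, 47, 53, 48, 33, 71, 43, 65, 34,
        56, 59, 49, 81, 67, 68, 49, 16, 86, 67]

def five_number_summary(values):
    """Return (lowest, Q1, median, Q3, highest) using medians of the halves."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]         # smallest half
    upper = data[(n + 1) // 2:]    # largest half (middle value skipped if n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

print(five_number_summary(ages))
```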
Anatomy of a Boxplot
[Figure: Boxplot of age at death of a sample of 20 rulers of England. Vertical axis: age (10 to 90). Labeled parts: Q1, M, Q3; IQR = Q3 - Q1; whiskers (reasonable range of data); outlier.]
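The boxplot's pieces can be computed from the quartiles. The slides do not say how the "reasonable range" is set, so this sketch assumes the common convention of fences at 1.5 times the IQR beyond each quartile; that rule is an assumption, not something taken from the lecture.

```python
# Ages at death of the 20 English rulers listed in the slides.
ages = [60, 50, 47, 53, 48, 33, 71, 43, 65, 34,
        56, 59, 49, 81, 67, 68, 49, 16, 86, 67]

q1, q3 = 47.5, 67.0      # quartiles from the 5 Number Summary slide
iqr = q3 - q1            # IQR = Q3 - Q1

# Assumption: the common 1.5 * IQR convention for flagging outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [a for a in ages if a < lower_fence or a > upper_fence]

print(iqr, lower_fence, upper_fence, outliers)
```

Under that assumed rule, the ruler who died at 16 is the only value flagged.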
Shape: Histogram
[Figure: Histogram of age at death of a sample of 20 rulers of England. Horizontal axis: age (10 to 90); vertical axis: Frequency (0 to 5).]
Rough way to approximate the standard deviation:
Look at the histogram and estimate the range of the middle 95% of the data.
The standard deviation is about ¼ of this range.
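A quick check of this rule of thumb, as a sketch: take a normal distribution whose standard deviation is known, find the range of its middle 95%, and divide by 4. The mean of 68 and standard deviation of 4 are used only as a test case, and `rough_sd` is a made-up helper name.

```python
from statistics import NormalDist

def rough_sd(low, high):
    """Approximate the SD as one quarter of the range of the middle 95%."""
    return (high - low) / 4

# Test case: a normal distribution with mean 68 and SD 4.
heights = NormalDist(mu=68, sigma=4)
low = heights.inv_cdf(0.025)    # 2.5% of values fall below this
high = heights.inv_cdf(0.975)   # 2.5% of values fall above this

print(round(rough_sd(low, high), 2))
```

The result comes out slightly under 4 because the middle 95% of a normal distribution spans about 1.96 standard deviations on each side of the mean, not exactly 2.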
Research Question 1: How high should I build my doorways so that 99% of the people will not
have to duck?
Secondary Question 2: If I built my doors 75 inches (6 feet 3 inches) high, what percent of the people would have to duck?
(Assume normal distribution with mean 68, st. dev. 4)
Z-Scores: Measurement in Standard Deviations
Given the mean (68), the standard deviation (4), and a value (height, say 75), compute
Z = (75 - mean) / SD = (75 - 68) / 4 = 1.75
This says that 75 is 1.75 standard deviations above the mean.
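Both doorway questions can be worked out from the z-score. This sketch uses Python's `statistics.NormalDist` under the slide's assumption of a normal distribution with mean 68 and standard deviation 4; the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd = 68, 4
heights = NormalDist(mu=mean, sigma=sd)

# Z-score of a 75-inch doorway.
z = (75 - mean) / sd

# Question 2: percent of people taller than 75 inches (they would have to duck).
percent_ducking = 100 * (1 - heights.cdf(75))

# Question 1: doorway height that 99% of people fit under.
door_for_99 = heights.inv_cdf(0.99)

print(z)
print(round(percent_ducking, 1))
print(round(door_for_99, 1))
```

Under these assumptions about 4% of people would duck under a 75-inch door, and a door of roughly 77.3 inches would clear 99% of people.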
Morals of the story:
Whenever you meet a graph that is very far from square, it is likely to produce an impression different from what you would have obtained from the data themselves.
Almost any graph in which the vertical scale does not start at zero is deceptive.
BAD
Bogus vertical scale. Hard to say what the graph should look like.
Portion of income taken by the government. Top: spending equal to the income in western states. Bottom: more densely populated east.
A perplexing polling paradox
People generally believe the results of polls.
People do not believe in the scientific principles on which polls are based.
According to Gallup, most Americans said that a survey of 1500 to 2000 respondents (a larger-than-average sample size for national polls) CANNOT represent the views of all Americans.
How are Gallup Opinion Polls Taken?
Telephone interviews: Random digit dialing
At random pick
Exchange (area code + first three digits; e.g., 814 865)
Next two digits, e.g., 22
Last two digits, e.g., 11
Up to three callbacks (why callbacks?)
Evenings and weekends
This catches unlisted numbers
Designed to be a random sample from the POPULATION of people with telephones.
All members of the population are equally likely to be in the sample.
Called a SIMPLE RANDOM SAMPLE.
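A simple random sample can be sketched with Python's `random.sample`, which draws without replacement. The population here is a hypothetical list of 10,000 ID numbers standing in for people with telephones, and the function name is made up.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members from the population without replacement."""
    rng = random.Random(seed)   # a seed makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical population: 10,000 people identified by ID number.
population = list(range(10_000))
sample = simple_random_sample(population, 1500, seed=1)

print(len(sample), len(set(sample)))
```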
Polls typically take roughly 1500 or 1600 people.
We generally will NOT have the benefit of a histogram to get the standard deviation or the margin of error of the sample percentage.
SECRET FORMULA FOR THE MARGIN OF ERROR OF A SAMPLE PERCENTAGE:
1 / (square root of sample size)
Margin of error: 2 standard deviations
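The formula is easy to try for typical poll sizes. This sketch also compares it with two standard deviations of a sample proportion, 2 * sqrt(p(1 - p) / n). That second formula is the standard textbook expression and is not stated in the slides; the two agree exactly when p = 0.5, and the quick formula is conservative otherwise.

```python
import math

def margin_of_error(n):
    """Quick margin of error for a sample percentage: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

def two_sd(p, n):
    """Two standard deviations of a sample proportion (standard formula)."""
    return 2 * math.sqrt(p * (1 - p) / n)

for n in (1500, 1600):
    print(n, round(100 * margin_of_error(n), 1))
```

For a poll of 1600 people the margin of error is 2.5 percentage points.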
The Morning After Pill
Do you think that the ‘morning-after’ contraceptive pill should be available over the counter?
Yes: 59.1%
No: 37.1%
Not sure: 3.8%
USA Today call-in poll
(http://www.usatoday.com/quick/health/qh1206a.htm)
Volunteer response vs. volunteer sample
Contraceptive call-in poll? Volunteer sample!
1936 Literary Digest poll? Volunteer response!
Which is worse? Volunteer sample!
Do you have a tattoo?
        Yes    No
Men     15%    85%
Women   23%    77%
Based on: 100 men, 136 women, Stat 100.2 S04
Sampling methods
(Simple) random sampling
Stratified random sampling
Cluster sampling
Systematic sampling
Bad: Haphazard or convenience sampling (as in tattoo survey)
Stratified random sampling
Divide population into subgroups, or strata
From each stratum, select a random sample
Example: Select a random sample from each of four groups of students (in-state non-minority, in-state minority, out-of-state non-minority, out-of-state minority) to ensure adequate representation of each group.
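The student example might look like this in code. It is a sketch only: the four strata follow the slide, but the ID ranges and stratum sizes are hypothetical, as is the function name.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate simple random sample from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical student ID numbers for the four groups named in the example.
strata = {
    "in-state non-minority": list(range(0, 600)),
    "in-state minority": list(range(600, 700)),
    "out-of-state non-minority": list(range(700, 950)),
    "out-of-state minority": list(range(950, 1000)),
}
chosen = stratified_sample(strata, 5, seed=1)

for name, ids in chosen.items():
    print(name, ids)
```

Sampling the same number from each stratum guarantees that small groups are represented, which a single simple random sample of 20 would not.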
Cluster sampling
Divide population into subgroups, or clusters
Select a random sample of clusters
Measure individuals within selected clusters according to some plan
Example: To study high schoolers, first take a random sample of schools and then look in depth at all students in selected schools.
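The high school example can be sketched as follows: the random draw is over schools, and every student in a chosen school is kept. The school names and rosters are hypothetical, and `cluster_sample` is a made-up name.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in picked}

# Hypothetical schools, each with a roster of student IDs.
schools = {
    "School A": [1, 2, 3, 4],
    "School B": [5, 6, 7],
    "School C": [8, 9, 10, 11, 12],
    "School D": [13, 14],
}
selected = cluster_sample(schools, 2, seed=1)

print(selected)
```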
Systematic sampling
From a list of individuals in the population, select every kth individual
Grisly example: “Decimation”, a term originally used for a punishment for mutinous Roman legions in which the legion was lined up and every tenth person killed.
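Selecting every kth individual is a one-line slice in Python. The random starting position in this sketch is an assumption added so that everyone on the list has a chance of selection; the slides only say to take every kth individual. The roster is hypothetical.

```python
import random

def systematic_sample(items, k, seed=None):
    """Pick a random start among the first k items, then take every kth item."""
    rng = random.Random(seed)
    start = rng.randrange(k)    # assumption: random start, not given in the slides
    return items[start::k]

# Hypothetical list of 100 individuals, identified by position.
roster = list(range(100))
every_tenth = systematic_sample(roster, 10, seed=1)

print(every_tenth)
```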
Comparisons
Randomized Experiments
Observational Studies
EXPLANATORY VARIABLE says which population we sampled from.
RESPONSE VARIABLE says what we measured or counted.
The key to a good observational study or a good randomized experiment is
RANDOMIZATION
in both cases.
• In observational studies we need a random sample from each population.
• In randomized experiments we must randomize the subjects to the different treatments (or treatment and control groups).
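Randomizing subjects to a treatment group and a control group can be done by shuffling the list of subjects and splitting it. This is a minimal sketch with two equal groups; the subject IDs and the function name are hypothetical.

```python
import random

def randomize(subjects, seed=None):
    """Shuffle the subjects and split them into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical experiment with 20 subjects numbered 0 to 19.
treatment, control = randomize(range(20), seed=1)

print(sorted(treatment))
print(sorted(control))
```

Because chance alone decides who lands in each group, lurking variables tend to balance out between the groups.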
Randomized Experiment
Associated concepts and ideas:
• Control group (provides a benchmark)
• Blinding: single or double (reduce bias)
• Placebo (benchmark, blinding)
• Confounding (a lurking third variable)
• Pairing or blocking (reduces noise in data)
The Hawthorne effect
Imagine the following study, intended to determine the prevalence of cheating:
Individual students taking an exam in a particular course are filmed and observed closely by a team of extra observers, who then record the number of instances of cheating they observe.
Named for Elton Mayo’s famous study (1924-1932) of workers at the Hawthorne, Illinois plant of the Western Electric Company
Research question: Do cell phones cause cancer?
What sort of a study could be used to answer this?
• Observational Study?
• Randomized Experiment?
If we cannot establish cause and effect, perhaps we can establish an association between cell phones and cancer using an observational study.
Possible Observational Study:
Response Variable: whether or not a subject gets cancer.
Explanatory Variable: whether or not the subject uses a cell phone.
This may require a very long time.
A special kind of observational study:
SWITCH RESPONSE AND EXPLANATORY VARIABLES
Response Variable: whether a subject uses a cell phone or not
Explanatory Variable: whether a subject has cancer or not.
1. Select a sample of cancer patients (Cancer Case)
2. Develop a group of people who match the cancer patients but do not have cancer. (Control)
3. Compute the % who use cell phones in each group.
Called a retrospective Case-Control Study
Research question: How does putting a smiley face on the bill influence a waitperson’s tip?
Response variable: Size of tip
Explanatory variable: Smiley face or not
Interacting variable: Sex of waitperson
Female waitress: Drawing a smiley face increased tip
Male waiter: Drawing a smiley face decreased tip
Source: Journ. Appl. Soc. Psych, 1996