stat 113 inferential statistics ii
TRANSCRIPT
Reminder: Inference Goals Standard Error Confidence Intervals: Justification
STAT 113 Inferential Statistics II
Standard Error
Oberlin College
November 8-10, 2021
Outline
Reminder: Inference Goals
Standard Error
Confidence Intervals: Justification
Two Main Goals of Inference
1. Estimating unknown quantities in a population using a dataset (by reporting confidence intervals)
2. Assessing strength of evidence about “yes/no” questions (by carrying out hypothesis tests)
Variability due to Sampling
• Each potential dataset (sample) is an imperfect/incomplete snapshot of the underlying population/process/phenomenon
• Therefore, statistics are imperfect reflections of the underlying parameters
• However, if samples are representative, statistics are usually close to the corresponding parameter
• So, we can estimate (with some, but not full, certainty) that the unknown underlying parameter is probably close to the corresponding statistic
Self Check: Statistics and Parameters
Is each of the following a statistic or a parameter?
1. The mean temperature in a set of 1000 measurements taken throughout 2020 at the Cleveland airport
2. The mean temperature in Cleveland in 2020
3. The structural association between household income and standardized test scores in the U.S.
4. The correlation between household incomes and standardized test scores in a dataset about college admissions
5. The proportion of the time the home team won in NBA basketball games in 2019
6. The size of the structural advantage associated with playing at home in the NBA
Definition: Sampling Distribution
• Consider all possible datasets of a certain sample size, n, produced by taking a representative snapshot (sample) from a process/phenomenon/population.
• Each one has its own value for a particular statistic (like the mean of a certain variable).
• A sampling distribution is the collection of values of all of these statistics (such as sample means).
• Note that this is a hypothetical/theoretical construction; we almost never actually have more than one dataset/sample/statistic.
Sample Distribution ≠ Sampling Distribution
• The cases in a sample are individual observations
• The cases in a sampling distribution are statistics (such as means), each from a different potential dataset
If the process produces a flavor-life distribution like this:
[Figure: distribution of flavor life (minutes) for the process; process mean = 66.8]
which could yield any of the following datasets:
[Figure: three example samples of flavor life (minutes), with sample means 65.7, 65.9, and 66.9]
then each potential set of 10 gumballs has a mean flavor life. The sampling distribution of all such potential means might look like this:
[Figure: sampling distribution of mean flavor life (minutes)]
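The gumball thought experiment can be imitated by simulation. The sketch below is only an illustration under stated assumptions: it treats the process as bell-shaped (normal), which the slides do not state, and it borrows the process SD of 2.8 from the later standard error slide. The function name `simulate_sample_means`, the seed, and the number of simulated datasets are hypothetical choices, not part of the course materials.

```python
import random
import statistics

PROCESS_MEAN = 66.8  # process mean from the slide
PROCESS_SD = 2.8     # process SD from the later standard error slide
SAMPLE_SIZE = 10     # gumballs per potential dataset, from the slide


def simulate_sample_means(num_datasets, seed=113):
    """Draw many hypothetical datasets and return the mean of each one.

    Assumption: individual flavor lives follow a normal distribution.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(num_datasets):
        # One potential dataset: the flavor lives of 10 gumballs.
        sample = [rng.gauss(PROCESS_MEAN, PROCESS_SD) for _ in range(SAMPLE_SIZE)]
        # The statistic for this dataset becomes one case
        # in the (approximate) sampling distribution.
        means.append(statistics.mean(sample))
    return means


sample_means = simulate_sample_means(5000)
print(f"mean of sample means: {statistics.mean(sample_means):.2f}")
print(f"SD of sample means:   {statistics.stdev(sample_means):.2f}")
```

Each entry of `sample_means` is a statistic from a different potential dataset, so the list approximates a sampling distribution rather than a sample distribution. In practice only one such dataset is ever observed.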
Demo: StatKey
http://lock5stat.com/statkey
Self Check: Sampling Distributions
7. Which of the following best characterizes what a single case is in a sampling distribution where the parameter of interest is the long-run winning percentage of the home team in NBA games?
(a) An NBA game
(b) Whether the home team won or lost in a given game
(c) A dataset consisting of several NBA games
(d) The long-run winning percentage of the home team in NBA games
Self Check: Sampling Distributions
8. Which of the following best characterizes what the variable is in the above sampling distribution?
(a) Whether the home team won or lost in a given game
(b) The percentage of the time the home team won in a particular dataset of NBA games
(c) The long-run winning percentage of the home team in NBA games
(d) Whether or not the home team has a structural advantage in the NBA
Definition: Standard Error
• The distribution of a quantitative variable has a standard deviation
• In the context of a sampling distribution, the cases are possible datasets
• The sample statistic is then a variable, since it characterizes each possible dataset
• In other words, a statistic that summarizes an individual dataset becomes a variable when we consider all possible datasets
• The variability in the statistic across all possible datasets can be summarized by its standard deviation.
• In this particular context, the standard deviation has a special name: the standard error (i.e., the standard deviation of the statistic).
The flavor lives of a set of individual gumballs have a standard deviation:
[Figure: distribution of flavor life (minutes); process mean = 66.8, process SD = 2.8]
The mean flavor lives of the potential datasets of 10 gumballs also have a standard deviation:
[Figure: sampling distribution of mean flavor life (minutes); mean of means = 66.8, SD of means = 0.9]
The latter standard deviation is called the standard error of the mean flavor life.
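The two standard deviations on this slide are linked by a standard result that the slides do not state: for independent observations, the standard error of a sample mean is the process SD divided by the square root of the sample size. A minimal sketch of that arithmetic follows, assuming independence; the helper name `standard_error_of_mean` is hypothetical.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of a sample mean, assuming n independent observations."""
    return sd / math.sqrt(n)


# Process SD = 2.8 and n = 10 gumballs, both taken from the slide.
se = standard_error_of_mean(2.8, 10)
print(f"standard error of the mean: {se:.3f}")
```

The result is about 0.885, which agrees with the slide's "SD of Means = 0.9" after rounding to one decimal place.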
Self Check: Standard Error
7. A series of 100 measurements records temperatures at the Cleveland airport. Which of the following best describes what the standard error of the mean temperature at the Cleveland airport can tell us?
(a) How variable the temperatures are in Cleveland
(b) Whether our dataset consists of days and times that are warmer than average or cooler than average
(c) How much sampling bias exists in our data-collection procedure
(d) How much measurement bias exists in our data-collection procedure
(e) How precise we can expect an estimate of the mean temperature to be using our data-collection procedure
Estimation Using the 95% Rule
In a bell-shaped distribution, most (about 95%) individual values are within 2 standard deviations of the mean.
In a bell-shaped sampling distribution of sample means, most (about 95%) individual sample means are within 2 standard errors of the mean of sample means.
If the samples are representative, then the mean of sample means is the population/process mean; i.e., the parameter of interest.
So, 95% of the time that I obtain one sample mean (from my study/snapshot/dataset), the parameter of interest (the population/process mean) is within 2 standard errors of it.
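The "95% of the time" claim can be checked by simulation. The sketch below is an illustration, not part of the course materials. It assumes a normal process with the gumball values (mean 66.8, SD 2.8, datasets of 10) and treats the standard error as known, computed as SD divided by the square root of n. The function name `coverage_rate`, the seed, and the number of simulated datasets are hypothetical.

```python
import math
import random
import statistics


def coverage_rate(num_datasets=10000, n=10, mu=66.8, sigma=2.8, seed=2021):
    """Fraction of simulated sample means that land within 2 SE of mu.

    Assumptions: normal process, independent observations, SE known.
    """
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    hits = 0
    for _ in range(num_datasets):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        # "xbar is within 2 SE of mu" is the same event as
        # "mu is within 2 SE of xbar".
        if abs(xbar - mu) <= 2 * se:
            hits += 1
    return hits / num_datasets


rate = coverage_rate()
print(f"share of datasets with the parameter within 2 SE: {rate:.3f}")
```

The comment inside the loop carries the key step of the argument: distance is symmetric, so a statement about where sample means fall relative to the parameter becomes a statement about where the parameter lies relative to one observed sample mean.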
Confidence Intervals: Construction
• So, if I have a sample statistic, and if I can find a standard error, I can estimate that the population mean, µ, is between
x̄ − 2SE and x̄ + 2SE
• This statement should be correct about 95% of the time.
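As a small worked illustration of the construction, the sketch below applies it to one of the gumball datasets shown earlier (sample mean 66.9) together with the standard error of 0.9 from the gumball slide. Pairing those two numbers is an assumption made here for illustration, and the helper name `two_se_interval` is hypothetical.

```python
def two_se_interval(statistic, standard_error):
    """Return (lower, upper) bounds: statistic minus/plus 2 standard errors."""
    margin = 2 * standard_error
    return (statistic - margin, statistic + margin)


# Sample mean 66.9 and SE 0.9 are both taken from the gumball slides.
low, high = two_se_interval(66.9, 0.9)
print(f"approximate 95% interval: {low:.1f} to {high:.1f}")
```

For this particular dataset the interval runs from about 65.1 to 68.7 and contains the process mean of 66.8. In a real study the parameter is unknown, so there is no way to confirm whether a given interval is one of the roughly 95% that succeed.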
Self Check: Confidence Intervals
A poll asked 500 registered voters to describe their disposition toward President Trump using a scale that goes from -5 to +5, where negative values indicate a net unfavorable disposition, and positive values indicate a net favorable disposition. Suppose the mean rating is -1.1, with a reported standard error of 0.4.
8. What is the statistic here?
9. What parameter is this statistic potentially well suited to estimate?
10. Give a range of values that the parameter is likely to fall within.