8/6/2019 I.6 Statistical Tests
I.6 The Nature of Statistical Testing
Definitions:
Statistic: statement of numerical information about a sample
Parameter: statement of numerical information about a population
Remember: When a statistic is put forth to represent a population, there is some ERROR associated with that figure.
Error can be described by the probability that one chose a good sample.
We will use:
The normal distribution, and the standard deviation of the sampling distribution of the mean (the standard error of the mean, $\sigma_{\bar{X}}$)
Definition:
Point Estimate: a single number, based on sample data, used to estimate a population parameter.
It's great if the standard deviation is small, but HOW SMALL?
We use an interval estimate:
An interval within which we might state the parameter probably lies,
based on the sample data.
A confidence interval is a specific interval estimate of a parameter, determined by using data obtained from a sample and by using the confidence level of the estimate.
The probability associated with a specific interval is called the confidence level or confidence coefficient.
"Confidence" because the probability is viewed as an indicator that the parameter lies within the interval.
The higher the probability, the more certain we are that the method will produce an interval containing the parameter.
BUT! Confidence levels are specified BEFORE the interval estimate is made.
So for a fixed confidence level (say 0.95), how does one find a confidence interval to estimate the population mean?
Take 95/2 = 47.5. The z-value with 47.5% of the area between 0 and z is 1.96.
If $\bar{X}$ is our (single) sample mean, we would expect the interval
$(\bar{X} - 1.96\sigma_{\bar{X}},\ \bar{X} + 1.96\sigma_{\bar{X}})$
to contain the parameter with confidence level .95.
OR
With a confidence level of .95, the parameter is within
$\bar{X} - 1.96\sigma_{\bar{X}} < \mu < \bar{X} + 1.96\sigma_{\bar{X}}$
The more confidence desired, the greater the allowance for sampling error.
This means less precision for the estimate.
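The z-value lookup above can be reproduced with Python's standard-library `statistics.NormalDist`. This is a minimal sketch, assuming $\sigma_{\bar{X}}$ is already known; the helper names `z_for_confidence` and `confidence_interval` are hypothetical, not from these notes.

```python
from statistics import NormalDist

def z_for_confidence(level: float) -> float:
    """Critical z with level/2 of the area between 0 and z (e.g. .475 for .95)."""
    # The area to the left of z is 0.5 + level/2, e.g. 0.975 for a .95 level.
    return NormalDist().inv_cdf(0.5 + level / 2)

def confidence_interval(xbar: float, se: float, level: float) -> tuple:
    """Return (lower, upper) for xbar - z*se < mu < xbar + z*se."""
    z = z_for_confidence(level)
    return xbar - z * se, xbar + z * se

print(round(z_for_confidence(0.95), 2))  # 1.96
print(round(z_for_confidence(0.90), 3))  # 1.645
```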
Example:
Suppose a company wants an interval estimate, at the .95 confidence level, of the mean time it takes a machine to produce its product.
A sample of 100 shows an average time of 6 minutes per item.
Assuming $\sigma = 1.5$ minutes, we need to find $\sigma_{\bar{X}}$.
Now come the choices:
We must assume the population is infinite (we weren't told this).
This gives us:
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{1.5}{\sqrt{100}} = 0.15$
And so we have:
$\bar{X} - 1.96\sigma_{\bar{X}} < \mu < \bar{X} + 1.96\sigma_{\bar{X}}$
$6 - 1.96(0.15) < \mu < 6 + 1.96(0.15)$
$5.706 < \mu < 6.294$
With a confidence level of .95, the mean is in this region.
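The arithmetic of this example, as a short Python sketch. It assumes, as the example does, an infinite population (no finite population correction), and uses the exact normal quantile in place of the table value 1.96.

```python
import math
from statistics import NormalDist

sigma, n, xbar = 1.5, 100, 6.0       # values from the example
se = sigma / math.sqrt(n)            # standard error, infinite population assumed
z = NormalDist().inv_cdf(0.975)      # about 1.96 for .95 confidence
lower, upper = xbar - z * se, xbar + z * se
print(f"{lower:.3f} < mu < {upper:.3f}")  # 5.706 < mu < 6.294
```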
Suppose instead a sample of 100 has an average of 5.8 minutes per item.
For a confidence level of .95 we'd get the following:
$\bar{X} - 1.96\sigma_{\bar{X}} < \mu < \bar{X} + 1.96\sigma_{\bar{X}}$
$5.8 - 1.96(0.15) < \mu < 5.8 + 1.96(0.15)$
$5.506 < \mu < 6.094$
Suppose we wanted a confidence level of .90. Then we'd need the corresponding z-value.
90/2 = 45
The corresponding z-value is about 1.645 (rounded to 1.65 in these notes), giving the interval estimate
$\bar{X} - 1.65\sigma_{\bar{X}} < \mu < \bar{X} + 1.65\sigma_{\bar{X}}$
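A small sketch comparing the .90 and .95 intervals for the 5.8-minute sample, to make "more confidence, less precision" concrete. It uses the exact quantile (about 1.645 at .90), so the .90 bounds differ slightly from a hand calculation with 1.65. The helper name `interval` is hypothetical.

```python
from statistics import NormalDist

def interval(xbar, se, level):
    """Two-sided normal interval xbar +/- z*se at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return xbar - z * se, xbar + z * se

# Sample mean 5.8 and standard error 0.15 from the example above.
for level in (0.90, 0.95):
    lower, upper = interval(5.8, 0.15, level)
    print(f"{level:.2f}: {lower:.3f} < mu < {upper:.3f} (width {upper - lower:.3f})")
```

The .90 interval comes out narrower than the .95 interval: less confidence buys more precision.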
KEEP IN MIND: The use of the normal curve is based on the idea that the sampling distribution of means for LARGE samples is normally distributed.
If the sample size is too small (say 30 or fewer), then we'd use a t distribution (we don't cover this).
You might have asked, "How do we know the standard deviation of the population? If we knew that, we'd probably know the mean."
In practice, the standard deviation is estimated. One way is to do the following:
1) Take a sample of size n.
2) Let S stand for the sample's standard deviation, computed with divisor n (you find this on your own).
3) The estimated population standard deviation is:
$\hat{\sigma} = S\sqrt{\dfrac{n}{n-1}}$
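The estimator in step 3 as a Python sketch, assuming S is the standard deviation with divisor n (what `statistics.pstdev` computes). The function name `estimate_population_sd` is hypothetical. Multiplying by $\sqrt{n/(n-1)}$ turns the divisor n into n − 1, so the result equals `statistics.stdev(sample)`.

```python
import math
from statistics import pstdev

def estimate_population_sd(sample):
    """Estimate sigma as S * sqrt(n / (n - 1)), where S divides by n."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s = pstdev(sample)                  # the sample's own SD (divisor n)
    return s * math.sqrt(n / (n - 1))   # rescale to the divisor n - 1
```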
Example: Suppose we seek an interval estimate for the population given in Table 4.1 on pg. 62, at a confidence level of .90.
We gather a sample of size 40:
Yearly Salary    Number of Employees
17,500           3
19,000           6
20,500           4
22,250           3
22,750           8
28,000           5
33,000           6
40,000           1
55,000           4
For a confidence of .90 we want to compute
$\bar{X} - 1.65\sigma_{\bar{X}}$ and $\bar{X} + 1.65\sigma_{\bar{X}}$
1. Find the mean of the above table.
$\bar{X} = \dfrac{3(17{,}500) + 6(19{,}000) + 4(20{,}500) + 3(22{,}250) + 8(22{,}750) + 5(28{,}000) + 6(33{,}000) + 1(40{,}000) + 4(55{,}000)}{40} \approx 27{,}381$
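Step 1 as a weighted mean in Python. This is a sketch; the salary entry for 3 employees is taken as 22,250, the value used in the hand calculation of the mean.

```python
# Salary table as (yearly salary, number of employees).
table = [(17_500, 3), (19_000, 6), (20_500, 4), (22_250, 3), (22_750, 8),
         (28_000, 5), (33_000, 6), (40_000, 1), (55_000, 4)]
n = sum(count for _, count in table)                       # 40 employees
mean = sum(salary * count for salary, count in table) / n  # weighted mean
print(n, mean)  # 40 27381.25
```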
It will take a long time to find S, so I'll just tell you:
S = 10,672.5
2. We estimate the population standard deviation using the estimator
$\hat{\sigma} = S\sqrt{\dfrac{n}{n-1}} = 10{,}672.5\sqrt{\dfrac{40}{39}} \approx 10{,}672.5(1.012)$
And so the population standard deviation is estimated to be:
$\hat{\sigma} \approx 10{,}672.5(1.012) \approx 10{,}800.6$
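If you want to find S yourself, a short script can expand the table and compute it. This sketch assumes S uses divisor n (`statistics.pstdev`) and, as before, takes the 3-employee salary as 22,250. S comes out near 10,672.5. The estimate $\hat{\sigma}$ comes out near 10,808, a little above 10,800.6, because $\sqrt{40/39} \approx 1.0127$ is rounded down to 1.012 in the hand calculation.

```python
import math
from statistics import pstdev

# Salary table (salary, employees), expanded into 40 individual salaries.
table = [(17_500, 3), (19_000, 6), (20_500, 4), (22_250, 3), (22_750, 8),
         (28_000, 5), (33_000, 6), (40_000, 1), (55_000, 4)]
salaries = [salary for salary, count in table for _ in range(count)]
n = len(salaries)                          # 40
S = pstdev(salaries)                       # sample SD with divisor n
sigma_hat = S * math.sqrt(n / (n - 1))     # estimated population SD
print(f"S = {S:.1f}, factor = {math.sqrt(n / (n - 1)):.4f}, sigma_hat = {sigma_hat:.1f}")
```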
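3. Find $\sigma_{\bar{X}}$. Note: the population size is known ($N = 700$), so we apply the finite population correction:
$\sigma_{\bar{X}} = \dfrac{\hat{\sigma}}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}} = \dfrac{10{,}800.6}{\sqrt{40}}\sqrt{\dfrac{700-40}{700-1}} \approx 1659.4$
** $\sigma_{\bar{X}}$ measures, on average, how far the sample mean is from the population mean.
4. PLUG everything into $\bar{X} - 1.65\sigma_{\bar{X}}$ and $\bar{X} + 1.65\sigma_{\bar{X}}$:
$\bar{X} - 1.65\sigma_{\bar{X}} = 27{,}381 - 1.65(1659.4) = 24{,}642.99$
$\bar{X} + 1.65\sigma_{\bar{X}} = 27{,}381 + 1.65(1659.4) = 30{,}119.01$
So the interval is
$\$24{,}642.99 < \mu < \$30{,}119.01$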
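Steps 3 and 4 in Python, using the same rounded inputs as the hand calculation (27,381, 10,800.6, and z = 1.65). This is a sketch to check the arithmetic, not a different method.

```python
import math

N, n = 700, 40                               # population size and sample size
xbar, sigma_hat, z = 27_381, 10_800.6, 1.65  # values used in the notes
fpc = math.sqrt((N - n) / (N - 1))           # finite population correction
se = sigma_hat / math.sqrt(n) * fpc          # standard error of the mean
lower, upper = xbar - z * se, xbar + z * se
print(f"se = {se:.1f}")                      # se = 1659.4
print(f"{lower:,.2f} < mu < {upper:,.2f}")   # about 24,642.99 and 30,119.01
```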
Example:
Suppose a state agency has an aid program available to cities with an average income less than $16,000. The city of Wellon does not qualify because its average income is $17,000.
You believe they miscalculated, thinking the actual average is about $15,500.
BUT before you bring this up to the state, you want to test it out.
A couple of things to think about:
1. You must decide how large, or significant, the difference should be between the sample average and the $15,500 you think is correct.
2. Is $1,500 significant enough to bring your case to the state?
3. What are your chances of getting a sample mean that is more than $1,500 away?
4. If the chances are too great, you probably won't bring this up to the state.
The city will reject its own point of view if the difference between its sample mean and $15,500 has a 5 percent or less chance of occurring.
Consider the following graph:
The city will reject its point of view if $\bar{X} < y_1$ or $\bar{X} > y_2$.
y1 and y2 are chosen symmetrically around $15,500 so that 95% of the area lies between y1 and y2 and 5% lies outside them.
$y_1 = 15{,}500 - 1.96\sigma_{\bar{X}}$
$y_2 = 15{,}500 + 1.96\sigma_{\bar{X}}$
Consider the following graph:
Suppose you took a sample of 30 wage earners and computed their mean salary $\bar{X}$, the estimator $\hat{\sigma}$, and $\sigma_{\bar{X}}$.
Here's what you do:
If the sample mean falls within the acceptance region, then you have a good case to bring to the state.
If your sample mean falls outside the acceptance region, then a sample mean that far from $15,500 would have only a 5% or smaller chance of occurring if the true mean were $15,500, so you should reject your claim.
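The acceptance region $y_1, y_2$ as a Python sketch. The notes give no value for $\sigma_{\bar{X}}$ in this example, so any numbers you pass in are your own. The function names `acceptance_region` and `reject_claim` are hypothetical.

```python
def acceptance_region(mu0, se, z=1.96):
    """(y1, y2) placed symmetrically around mu0 so 95% of the area lies between them."""
    return mu0 - z * se, mu0 + z * se

def reject_claim(xbar, mu0, se, z=1.96):
    """True when the sample mean falls outside the acceptance region."""
    y1, y2 = acceptance_region(mu0, se, z)
    return xbar < y1 or xbar > y2
```

For this example you would call `reject_claim(xbar, 15_500, se)` with your own sample mean and standard error.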
HYPOTHESIS TESTING PROCEDURE
1. State the assumed value of the population parameter to be tested.
a. This statement is known as the null hypothesis.
b. It is denoted $H_0$. It refers to no difference.
c. State the conclusion to be drawn if the initial assumption is rejected.
d. This is called the alternate hypothesis; in other words, there is a difference.
2. Determine a criterion for rejection or acceptance. Establish a minimum acceptance level of probability for a difference between the population parameter and the corresponding sample statistic.
This probability level is the risk of rejecting the null hypothesis when it is actually true.
The risk level is called the level of significance, or alpha level ($\alpha$).
For example: $\alpha = .05$ means there is a 5% chance of rejecting $H_0$ when it is actually true.
3. Determine the appropriate probability distribution: normal distribution, t distribution, etc.
4. Based on the significance level and the distribution chosen, define the rejection region or regions.
If the sample statistic does not fall in a rejection region, there is no statistical evidence to doubt the null hypothesis.
But!! This doesn't prove it.
5. Formally state the decision rule based on the sample results.
Did you reject or fail to reject your null hypothesis?
6. Take the necessary sample and compute the appropriate sample statistic.
7. Make the statistical conclusion concerning the null hypothesis according to the results from Step 5.
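The seven steps above, mapped onto a two-sided test for a mean. This is a minimal sketch, assuming the normal distribution is the appropriate choice in step 3 (large samples). The function name `two_sided_z_test` is hypothetical.

```python
from statistics import NormalDist

def two_sided_z_test(xbar, mu0, se, alpha=0.05):
    """Sketch of steps 1-7 for a mean, assuming a normal sampling distribution."""
    # Step 1: H0 says mu = mu0; the alternate hypothesis says mu != mu0.
    # Step 2: alpha is the risk of rejecting H0 when it is actually true.
    # Step 3: normal distribution (large-sample assumption).
    # Step 4: rejection regions are the two tails beyond -z_crit and +z_crit.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 6: compute the standardized sample statistic.
    z = (xbar - mu0) / se
    # Steps 5 and 7: apply the decision rule and state the conclusion.
    return "reject H0" if abs(z) > z_crit else "fail to reject H0"
```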
Chi-square Testing
It is used to determine the probability that the differences between actually observed sample data and expected data have occurred by chance.
It compares the expected distribution of a data set to the observed distribution.
Example:
Suppose John took a coin, flipped it 5 times, and recorded the number of tails that appeared. If John repeated this process 32 times, he could EXPECT the following frequencies of tails to appear.
John flips a coin 5 times (this is ONE trial) and does 32 trials.
No. of Tails    Expected
0               1
1               5
2               10
3               10
4               5
5               1
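The expected column comes from the binomial distribution. For a fair coin, P(k tails in 5 flips) = C(5, k) / 2^5, multiplied by the 32 trials. A short Python check, assuming a fair coin:

```python
from math import comb

trials, flips = 32, 5
# P(k tails in 5 fair flips) = C(5, k) / 2**5; multiply by the number of trials.
expected = [trials * comb(flips, k) / 2**flips for k in range(flips + 1)]
print(expected)  # [1.0, 5.0, 10.0, 10.0, 5.0, 1.0]
```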
What this means:
Out of the 32 trials:
In one trial John gets NO tails.
In five trials he gets 1 tail out of 5.
In ten trials he gets 2 tails out of 5, and in ten trials he gets 3 tails out of 5.
In five trials he gets 4 tails, and in one trial he gets 5 tails.
In actually carrying out the process, this is what he really gets:
No. of Tails    Observed
0               2
1               6
2               14
3               9
4               0
5               1
Example: There are 14 trials where 2 of the 5 tosses come up tails.
The Question: Is his observed distribution different ENOUGH from the expected distribution to suspect a biased coin?
Example: If tails showed up 4 out of 5 times in all 32 trials, the coin would be suspected of bias.
Null Hypothesis: The coin is fair (i.e., no bias).
Choose a significance level, say $\alpha = .05$.
Formula for the chi-square statistic:
$\chi^2 = \dfrac{(O_1 - E_1)^2}{E_1} + \dfrac{(O_2 - E_2)^2}{E_2} + \dfrac{(O_3 - E_3)^2}{E_3} + \cdots + \dfrac{(O_n - E_n)^2}{E_n}$
$O_i$ is the observed frequency for the ith data point and
$E_i$ is the expected frequency for the ith data point.
Example: For no tails:
Expected $E_0 = 1$ time, Observed $O_0 = 2$ times
For 2 tails:
Expected $E_2 = 10$ times, Observed $O_2 = 14$ times
So
$\chi^2 = \dfrac{(2-1)^2}{1} + \dfrac{(6-5)^2}{5} + \dfrac{(14-10)^2}{10} + \dfrac{(9-10)^2}{10} + \dfrac{(0-5)^2}{5} + \dfrac{(1-1)^2}{1}$
$\chi^2 = 7.9$
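The chi-square formula as a small Python function, applied to John's coin data. This is a sketch; the name `chi_square` is hypothetical.

```python
def chi_square(observed, expected):
    """Sum of (O_i - E_i)**2 / E_i over all outcomes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [2, 6, 14, 9, 0, 1]    # number of tails = 0, 1, ..., 5
expected = [1, 5, 10, 10, 5, 1]
print(round(chi_square(observed, expected), 1))  # 7.9
```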
Chi-Square Probability Distribution (critical values; column headings are the significance level $\alpha$)
d     .05      .025     .01      .005
1     3.841    5.024    6.635    7.879
2     5.991    7.378    9.210    10.597
3     7.815    9.348    11.345   12.838
4     9.488    11.143   13.277   14.860
5     11.070   12.832   15.086   16.750
6     12.592   14.449   16.812   18.548
7     14.067   16.013   18.475   20.278
8     15.507   17.535   20.090   21.955
9     16.919   19.023   21.666   23.589
10    18.307   20.483   23.209   25.188
The left-hand column represents the degrees of freedom.
Example:
There are 6 different outcomes for tossing a coin 5 times. You can get 0 tails, 1 tail, 2 tails, 3 tails, 4 tails, or 5 tails.
Once we count up how many times we got 0, 1, 2, 3, and 4 tails, we know how many trials got 5 tails (the counts must add up to 32).
So it took us 5 entries (0, 1, 2, 3, 4) to figure out that the 6th entry will be 1.
* If there is one column of data, d = # of rows - 1 = r - 1.
With $\chi^2 = 7.9$, 5 degrees of freedom, and $\alpha = .05$, we look at the table and write down the number.
It's 11.070.
In the coin-tossing example, where there are 5 degrees of freedom and $\alpha = .05$ was chosen, John should reject his null hypothesis if the $\chi^2$ statistic is larger than 11.070.
Since our statistic was 7.9 (below 11.070), the null hypothesis should NOT be rejected.
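The degrees-of-freedom rule and the table lookup, as a Python sketch. Only the $\alpha = .05$ critical values for 1 to 5 degrees of freedom are copied in from the table. The names `CRITICAL_05` and `decide` are hypothetical.

```python
# Critical values at alpha = .05 from the table above (degrees of freedom -> value).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def decide(statistic, categories):
    """One column of data: d = r - 1; reject H0 if the statistic exceeds the critical value."""
    df = categories - 1
    critical = CRITICAL_05[df]
    return "reject H0" if statistic > critical else "do not reject H0"

print(decide(7.9, 6))  # do not reject H0
```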
Conclusion:
Example: Suppose two candidates (White and Smith) are running for election to Congress in a district with 460,000 voters. It is suggested that women in the electorate voted for White in significantly larger numbers than men. How can this suggestion be tested?
We need to make some assumptions.
Suppose the electorate was evenly divided between men and women (230,000 men and 230,000 women).
White received 62% of the votes. The expected distribution for 100 men and 100 women would be:
Expected votes for White
Sex of Voters    Yes    No
M                62     38
F                62     38
If there were no bias (meaning there wasn't a larger number of women voting for White than men), this should be the expected outcome:
exactly 62% of each group (men and women) voted for White.
However, when they took a sample of 100 men and 100 women, the following distribution was observed.
Observed votes for White
Sex of Voters    Yes    No
M                52     48
F                72     28
It is clear White got a higher proportion of votes from women in this sample.
BUT... is the difference significant?
1. Null Hypothesis: There are no differences between men and women in this election.
2. Choose $\alpha = .01$.
3. Now find the chi-square distribution.
4. When there is more than one column and more than one row, the degrees of freedom are
d = (# of columns - 1) x (# of rows - 1) = (2 - 1) x (2 - 1) = 1
5. We decide to reject the hypothesis if $\chi^2 > 6.635$.
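$\chi^2 = \dfrac{(52-62)^2}{62} + \dfrac{(72-62)^2}{62} + \dfrac{(48-38)^2}{38} + \dfrac{(28-38)^2}{38}$
$\chi^2 = \dfrac{100}{62} + \dfrac{100}{62} + \dfrac{100}{38} + \dfrac{100}{38}$
$\chi^2 \approx 8.489$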
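The same 2 x 2 computation in Python, summing $(O - E)^2 / E$ over every cell and using the degrees-of-freedom rule for tables. This is a sketch with the observed and expected tables copied from the example.

```python
observed = [[52, 48], [72, 28]]   # rows: men, women; columns: yes, no
expected = [[62, 38], [62, 38]]   # 62% of each group voting for White
stat = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed[0]) - 1) * (len(observed) - 1)  # (columns - 1) x (rows - 1)
print(round(stat, 3), df, stat > 6.635)  # 8.489 1 True
```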
Therefore, the null hypothesis is rejected; the difference is significant.
What does this mean?
If we were to run the election over again, then we're confident women would still vote for White in larger numbers than men.
Why did this concern us?
We wanted to make sure the difference between men and women didn't happen by chance.
If the difference were due to chance, it would mean that if we did run the election again, women might not vote for White more than men.
Example: There are 20 questions on Brian's multiple-choice exam. Bill observed the following distribution of answers on the exam:
Answer    Frequency
a         3
b         8
c         3
d         3
e         3
It looks like Brian might favor b as his answer on the test. Bill wants to know if this is by chance. If it's not by chance, then when Bill doesn't know an answer, should he choose b?
1. Null Hypothesis: Brian doesn't favor b, or: there is no difference in the distribution of answers.
2. Bill sets $\alpha = .05$.
3. There are 4 degrees of freedom (5 answers - 1).
4. If $\chi^2 > 9.488$, then we reject the hypothesis.
5. The expected distribution, meaning each answer is equally likely, is:
Answer    Frequency
a         4
b         4
c         4
d         4
e         4
Then
$\chi^2 = \dfrac{(3-4)^2}{4} + \dfrac{(8-4)^2}{4} + \dfrac{(3-4)^2}{4} + \dfrac{(3-4)^2}{4} + \dfrac{(3-4)^2}{4} = 0.25 + 4 + 0.25 + 0.25 + 0.25 = 5$
Since 5 < 9.488, Bill cannot reject the hypothesis. This means Brian might not favor b, and the distribution could have occurred by chance.
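A Python sketch of Bill's test. It assumes, as the example does, that each of the 5 answers is equally likely, so each expected count is 20 / 5 = 4.

```python
observed = {"a": 3, "b": 8, "c": 3, "d": 3, "e": 3}
n = sum(observed.values())             # 20 questions
expected = n / len(observed)           # 4 per answer if all equally likely
stat = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1                 # one column of data: r - 1 = 4
print(stat, df, stat > 9.488)          # 5.0 4 False
```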
Example: Consider the following tables of expected and observed distributions concerning the buying preferences of consumers.
Brand Favored    Observed No. of Respondents    Expected No. of Respondents
A                123                            105
B                76                             86
C                48                             56
Based on these distributions, what should the chi-square statistic be?
$\chi^2 = \dfrac{(123-105)^2}{105} + \dfrac{(76-86)^2}{86} + \dfrac{(48-56)^2}{56}$
$\chi^2 \approx 3.09 + 1.16 + 1.14$
$\chi^2 \approx 5.39$
There are 2 degrees of freedom.
From our chart, since 5.39 < 5.991 (the critical value for 2 degrees of freedom at $\alpha = .05$), we will not be rejecting the null hypothesis (whatever it is).
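The brand-preference statistic term by term, as a Python sketch. The comparison uses the $\alpha = .05$ critical value for 2 degrees of freedom from the table.

```python
observed = [123, 76, 48]   # brands A, B, C
expected = [105, 86, 56]
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
stat = sum(terms)
print([round(t, 2) for t in terms], round(stat, 2))  # [3.09, 1.16, 1.14] 5.39
df = len(observed) - 1                                  # 2
print(stat > 5.991)   # False: do not reject at alpha = .05
```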
Two brands of varnish, High-Glo and No-Glo, are available at a local store. The manager of the store is keeping track of sales so that he can accurately predict the needs of the customers.
The manager had previously predicted these needs and has decided to test his figures by using a chi-square statistic. The expected and actual sales for 115 customers are listed below:
Brand      Expected    Actual
Hi-Glo     75          62
No-Glo     40          53
There is 1 degree of freedom.
$\chi^2 = \dfrac{(62-75)^2}{75} + \dfrac{(53-40)^2}{40}$
$\chi^2 \approx 2.253 + 4.225$
$\chi^2 \approx 6.478$
If $\alpha = .05$ or $.025$, we would reject the null hypothesis (6.478 is above 3.841 and 5.024).
If $\alpha = .01$ or below, we would not reject his hypothesis (6.478 is below 6.635).
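A Python check of the varnish example. Each term divides by the expected count, as in the chi-square formula, and the statistic is compared with the 1-degree-of-freedom critical values from the table.

```python
observed = {"Hi-Glo": 62, "No-Glo": 53}
expected = {"Hi-Glo": 75, "No-Glo": 40}
# Each term divides by the EXPECTED count, as in the chi-square formula.
stat = sum((observed[b] - expected[b]) ** 2 / expected[b] for b in expected)
print(round(stat, 3))  # 6.478

# Critical values for 1 degree of freedom, from the table above.
critical_df1 = {0.05: 3.841, 0.025: 5.024, 0.01: 6.635, 0.005: 7.879}
for alpha, critical in critical_df1.items():
    print(alpha, "reject H0" if stat > critical else "do not reject H0")
```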
In your project, you will have a setup similar to this:
Expected
           Yes     No      No response/opinion
Group 1    A       B       C
Group 2    D       E       F
Actual / Observed
           Yes     No      No response/opinion
Group 1    R1C1    R1C2    R1C3
Group 2    R2C1    R2C2    R2C3
Ex. R1C1 = row 1, column 1. Ex. R2C3 = row 2, column 3.
How to find the numbers in the expected table?
$A\ (\text{row 1, col 1}) = \dfrac{(R1C1 + R1C2 + R1C3) \times (R1C1 + R2C1)}{\text{total number surveyed}}$
= (add up the numbers in row 1) x (add up the numbers in column 1) / total number surveyed
$F\ (\text{row 2, col 3}) = \dfrac{(R2C1 + R2C2 + R2C3) \times (R1C3 + R2C3)}{\text{total number surveyed}}$
= (add up the numbers in row 2) x (add up the numbers in column 3) / total number surveyed
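The rule for expected cells (row total times column total, divided by the total number surveyed) as a Python sketch. The name `expected_table` is hypothetical. Applied to the observed election table, it reproduces the 62/38 split used as the expected distribution there.

```python
def expected_table(observed):
    """E[r][c] = (row r total) x (column c total) / total number surveyed."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

print(expected_table([[52, 48], [72, 28]]))  # [[62.0, 38.0], [62.0, 38.0]]
```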
Example: Suppose I surveyed men and women on a certain question and found that
        Men    Women
Yes     57     45
No      23     12
N/R     8      2
I want to test if there was a difference in how men and women would answer this question. In other words, are the responses to this question dependent on gender?
Null:
Alternative:
Create the expected table:
Find the chi-square statistic:
If $\alpha = 0.05$, what is my rejection level?
Conclusion:
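One way to check your hand computations for this exercise is a short script. It builds the expected table from row and column totals, sums $(O - E)^2 / E$ over all six cells, and uses d = (rows - 1) x (columns - 1). This is a sketch; the helper name `expected_table` is hypothetical. Compare the printed statistic with the table value for its degrees of freedom at your chosen $\alpha$.

```python
def expected_table(observed):
    """E[r][c] = (row r total) x (column c total) / total number surveyed."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

observed = [[57, 45], [23, 12], [8, 2]]   # rows: Yes, No, N/R; columns: Men, Women
expected = expected_table(observed)
stat = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (rows - 1) x (columns - 1)
print([[round(e, 2) for e in row] for row in expected])
print(round(stat, 3), df)
```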