Basic Concepts of Statistics

DESCRIPTION
Basic concepts of statistics: measures of central tendency and measures of dispersion and variability. Covers the arithmetic mean (simple average), the sample mean X̄ as the best estimate of the population mean, summation over measurements, and sample size.
Basic Concepts of Statistics
• Measures of central tendency
• Measures of dispersion & variability

Measures of Central Tendency
Arithmetic mean (= simple average)

    X̄ = ( Σᵢ₌₁ⁿ Xᵢ ) / n

where Xᵢ is a measurement in the population, i is the index of the measurement, n is the sample size, and Σ denotes summation.

• The best estimate of the population mean is the sample mean, X̄.
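The sample-mean formula above can be sketched in a few lines of Python; the data values here are hypothetical, purely for illustration:

```python
# Sample mean as an estimate of the population mean.
# Hypothetical measurements (illustrative values only).
data = [15.2, 16.1, 14.8, 15.9, 15.5]

n = len(data)              # sample size
x_bar = sum(data) / n      # X-bar = (sum of X_i) / n
print(x_bar)               # 15.5
```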
Measures of Variability

All describe how "spread out" the data are.

1. Sum of squares (SS): the sum of squared deviations from the mean.
• For a sample,

    SS = Σ (Xᵢ − X̄)²
2. Average or mean sum of squares = variance, s²:
• For a sample,

    s² = Σ (Xᵢ − X̄)² / (n − 1)
Why n − 1?

n − 1 represents the degrees of freedom (Greek letter "nu", ν), or the number of independent quantities in the estimate s²:

    s² = Σ (Xᵢ − X̄)² / (n − 1)

Because the deviations from the sample mean must sum to zero,

    Σᵢ₌₁ⁿ (Xᵢ − X̄) = 0

once n − 1 of all the deviations are specified, the last deviation is already determined.
3. Standard deviation, s
• For a sample,

    s = √[ Σ (Xᵢ − X̄)² / (n − 1) ]

• Variance has squared measurement units; to regain the original units, take the square root.
4. Standard error of the mean
• For a sample,

    s_X̄ = s / √n

The standard error of the mean is a measure of variability among the means of repeated samples from a population.
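The four measures of variability above can be computed together; a minimal sketch using the same hypothetical data as before:

```python
import math

data = [15.2, 16.1, 14.8, 15.9, 15.5]   # illustrative sample
n = len(data)
x_bar = sum(data) / n

ss = sum((x - x_bar) ** 2 for x in data)   # 1. sum of squared deviations
var = ss / (n - 1)                         # 2. sample variance s^2 (n - 1 df)
sd = math.sqrt(var)                        # 3. standard deviation s
se = sd / math.sqrt(n)                     # 4. standard error of the mean
print(ss, var)                             # 1.1  0.275
```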
[Figure: A population of values, from which repeated random samples are drawn, each with sample size n = 5. The sample means vary from sample to sample, e.g. X̄ = 14, X̄ = 15, X̄ = 13.]
For a large enough number of large samples, the frequency distribution of the sample means (= the sampling distribution) approaches a normal distribution.

[Figure: frequency vs. sample mean; the normal distribution is a bell-shaped curve.]
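The sampling-distribution idea can be demonstrated by simulation; the population parameters (mean 14, SD 2) are made up for illustration:

```python
import random
import statistics

random.seed(0)
# Hypothetical population of values (mean 14, SD 2)
population = [random.gauss(14, 2) for _ in range(10_000)]

# Means of many repeated random samples, each with sample size n = 5
sample_means = [statistics.mean(random.sample(population, 5))
                for _ in range(2_000)]

# The sample means cluster around the population mean, and their spread
# approximates the standard error, sigma / sqrt(n) = 2 / sqrt(5) ~ 0.89
print(statistics.mean(sample_means))
print(statistics.stdev(sample_means))
```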
Testing Statistical Hypotheses Between 2 Means

1. State the research question in terms of statistical hypotheses.

We always start with a statement that hypothesizes "no difference", called the null hypothesis, H0.

E.g., H0: Mean bill length of female hummingbirds is equal to mean bill length of male hummingbirds.

Then we formulate a statement that must be true if the null hypothesis is false, called the alternate hypothesis, HA.

E.g., HA: Mean bill length of female hummingbirds is not equal to mean bill length of male hummingbirds.

If we reject H0 as a result of sample evidence, then we conclude that HA is true.
2. Choose an appropriate statistical test that would allow you to reject H0 if H0 were false. E.g., Student's t test for hypotheses about means.
William Sealey Gosset
(a.k.a. “Student”)
t statistic:

    t = (X̄₁ − X̄₂) / s_(X̄₁−X̄₂)

where X̄₁ is the mean of sample 1, X̄₂ is the mean of sample 2, and s_(X̄₁−X̄₂) is the standard error of the difference between the sample means.

To estimate s_(X̄₁−X̄₂), we must first know the relation between the two populations.
Relation between populations:
• Independent populations
  1. Identical (homogeneous) variance
  2. Not identical (heterogeneous) variance
• Dependent populations
Pooled variance:

    s_p² = (SS₁ + SS₂) / (ν₁ + ν₂)

Then,

    s_(X̄₁−X̄₂) = √( s_p²/n₁ + s_p²/n₂ )
Independent populations with homogeneous variances:

    t = (Ȳ₁ − Ȳ₂) / SE_(Ȳ₁−Ȳ₂)

    SE_(Ȳ₁−Ȳ₂) = √[ s_p² (1/n₁ + 1/n₂) ]

    s_p² = (df₁·s₁² + df₂·s₂²) / (df₁ + df₂)
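The pooled-variance t statistic above can be sketched as a small helper function; the function name and the test data are made up for illustration:

```python
import math

def pooled_t(x1, x2):
    """Two-sample t with pooled variance (assumes homogeneous variances)."""
    n1, n2 = len(x1), len(x2)
    m1 = sum(x1) / n1
    m2 = sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)           # SS within each sample
    ss2 = sum((x - m2) ** 2 for x in x2)
    sp2 = (ss1 + ss2) / ((n1 - 1) + (n2 - 1))      # pooled variance
    se = math.sqrt(sp2 / n1 + sp2 / n2)            # SE of the difference
    return (m1 - m2) / se, n1 + n2 - 2             # t and its df

t, df = pooled_t([1, 2, 3], [2, 3, 4])             # illustrative data
print(t, df)
```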
3. Select the level of significance for the statistical test.

The level of significance (alpha value, α) is the probability of incorrectly rejecting the null hypothesis when it is, in fact, true. Traditionally, researchers choose α = 0.05: 5 percent of the time, or 1 time out of 20, the statistical test will reject H0 when it is true.

Note: the choice of 0.05 is arbitrary!
4. Determine the critical value the test statistic must attain to be declared significant.

Most test statistics have a frequency distribution. When sample sizes are small, the sampling distribution is described better by the t distribution than by the standard normal (Z) distribution. The shape of the t distribution depends on the degrees of freedom, ν = n − 1.
[Figure: t distributions for ν = 1, 5, and 25; Z = t(ν = ∞). For α = 0.05, the distribution of the test statistic is divided into an area of acceptance (0.95, between the lower and upper critical values, centered on 0) and two areas of rejection (0.025 in each tail).]
5. Perform the statistical test.

Mean bill length from a sample of 5 female hummingbirds, X̄₁ = 15.75;
mean bill length from a sample of 5 male hummingbirds, X̄₂ = 14.25;
standard error of the difference, s_(X̄₁−X̄₂) = 0.50.

    t = (X̄₁ − X̄₂) / s_(X̄₁−X̄₂) = (15.75 − 14.25) / 0.50 = 3.000
6. Draw and state the conclusions.
• Compare the calculated test statistic with the critical test statistic at the chosen α.
• Obtain the P-value (probability) for the test statistic.
• Reject or fail to reject H0.

The critical t for a two-tailed test about equality is t_α(2),ν. To test H0 at α = 0.05 using n₁ = 5 and n₂ = 5 (ν = n₁ + n₂ − 2 = 8):

    t₀.₀₅(2),₈ = 2.306; if |t| ≥ 2.306, reject H0.

Since the calculated t > t₀.₀₅(2),₈ (because 3.000 > 2.306), reject H0. Conclude that hummingbird bill length is sexually size-dimorphic.

What is the probability, P, of observing by chance a difference as large as we saw between female and male hummingbird bill lengths? 0.01 < P < 0.02.
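The critical value and P-value for this worked example can be reproduced numerically; a sketch assuming SciPy is available:

```python
from scipy import stats  # assumes SciPy is installed

# Hummingbird example from the text:
# X1 = 15.75 (females), X2 = 14.25 (males), SE of difference = 0.50, df = 8
t_calc = (15.75 - 14.25) / 0.50            # = 3.000

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=8)  # two-tailed critical value, ~2.306

p = 2 * stats.t.sf(t_calc, df=8)           # two-tailed P-value
print(t_crit, p)                           # p falls between 0.01 and 0.02
```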
Independent populations with heterogeneous variances:

    t = (Ȳ₁ − Ȳ₂) / √( s₁²/n₁ + s₂²/n₂ )

    df = ( s₁²/n₁ + s₂²/n₂ )² / [ (s₁²/n₁)² / (n₁ − 1) + (s₂²/n₂)² / (n₂ − 1) ]
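The unequal-variance (Welch) statistic and its approximate df can be sketched as a helper; the function name and test data are hypothetical:

```python
import math

def welch_t(x1, x2):
    """t and approximate df for heterogeneous (unequal) variances."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    v1 = sum((x - m1) ** 2 for x in x1) / (n1 - 1)   # sample variances
    v2 = sum((x - m2) ** 2 for x in x2) / (n2 - 1)
    se2 = v1 / n1 + v2 / n2                          # squared SE of difference
    t = (m1 - m2) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])   # illustrative data
print(t, df)
```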
Dependent populations

Null hypothesis: the mean difference is equal to d₀.

Test statistic:

    t = (d̄ − d₀) / SE_d̄

Null distribution: t with n − 1 df, where n is the number of pairs.

Compare the test statistic with the null distribution: how unusual is it? If P < 0.05, reject H0; if P > 0.05, fail to reject H0.
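The paired (dependent-samples) statistic can be sketched the same way; function name and data are made up for illustration:

```python
import math

def paired_t(x, y, d0=0.0):
    """Paired t: mean of within-pair differences tested against d0; df = n - 1."""
    d = [a - b for a, b in zip(x, y)]          # within-pair differences
    n = len(d)                                 # n = number of pairs
    d_bar = sum(d) / n
    sd = math.sqrt(sum((v - d_bar) ** 2 for v in d) / (n - 1))
    se = sd / math.sqrt(n)                     # SE of the mean difference
    return (d_bar - d0) / se, n - 1

t, df = paired_t([10, 12, 11, 13], [9, 11, 11, 12])   # illustrative pairs
print(t, df)
```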
Analysis of Variance (ANOVA)

Independent t-test: compares the means of one variable for TWO groups of cases.

Statistical formula:

    t = (X̄₁ − X̄₂) / s_(X̄₁−X̄₂)

Meaning: compare the 'standardized' mean difference. But this is limited to two groups. What if there are more than 2 groups?
• Pairwise t tests (previous example)
• ANOVA (ANalysis Of Variance)
From t Test to ANOVA

1. Pairwise t tests
If you compare three or more groups using t-tests at the usual 0.05 level of significance, you have to compare each pair (A to B, A to C, B to C), so the chance of getting at least one wrong result would be:

    1 − (0.95 × 0.95 × 0.95) = 14.3%

Multiple t-tests will increase the false-alarm rate.
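The 14.3% figure follows directly from the multiplication rule; a one-line check:

```python
# Probability of at least one false alarm across 3 pairwise tests at alpha = 0.05
alpha = 0.05
k = 3  # pairs: A-B, A-C, B-C
familywise = 1 - (1 - alpha) ** k
print(round(familywise, 3))   # 0.143, i.e. about 14.3%
```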
2. Analysis of variance
In a t-test, the mean difference is used. Similarly, ANOVA compares the observed variance among the group means.

The logic behind ANOVA:
• If the groups are from the same population, the variance among the means will be small (note that the means from the groups are not exactly the same).
• If the groups are from different populations, the variance among the means will be large.
What is ANOVA?

ANOVA (Analysis of Variance) is a procedure designed to determine whether the manipulation of one or more independent variables in an experiment has a statistically significant influence on the value of the dependent variable.

Assumptions:
• Each independent variable is categorical (nominal scale). Independent variables are called factors and their values are called levels.
• The dependent variable is numerical (ratio scale).

The basic idea is that the "variance" of the dependent variable given the influence of one or more independent variables {the Expected Sum of Squares for a Factor} is checked to see if it is significantly greater than the "variance" of the dependent variable assuming no influence of the independent variables {also known as the Mean Square Error (MSE)}.
Rationale for ANOVA

• We can break the total variance in a study into meaningful pieces that correspond to treatment effects and error. That is why we call this Analysis of Variance.
Notation:
• X̄_G : the grand mean, taken over all observations.
• X̄_A : the mean of any group; X̄_A₁ is the mean of a specific group (group 1 in this case).
• Xᵢ : the observation or raw data for the i-th subject.

The ANOVA Model

    (Xᵢ − X̄_G) = (X̄_A − X̄_G) + (Xᵢ − X̄_A)

For trial i: deviation from the grand mean = a treatment effect + error. Equivalently,

    SS Total = SS Treatment + SS Error
Analysis of Variance (ANOVA) can be used to test for the equality of three or more population means using data obtained from observational or experimental studies.

Use the sample results to test the following hypotheses:

    H0: μ₁ = μ₂ = μ₃ = . . . = μk
    Ha: Not all population means are equal

If H0 is rejected, we cannot conclude that all population means are different. Rejecting H0 means that at least two population means have different values.
Assumptions for Analysis of Variance
• For each population, the response variable is normally distributed.
• The variance of the response variable, denoted σ², is the same for all of the populations.
• The effect of the independent variable is additive.
• The observations must be independent.

Analysis of Variance: Testing for the Equality of k Population Means
• Between-treatments estimate of population variance
• Within-treatments estimate of population variance
• Comparing the variance estimates: the F test
• ANOVA table
Between-Treatments Estimate of Population Variance

A between-treatments estimate of σ² is called the mean square due to treatments (MSTR):

    MSTR = SSTR / (k − 1) = [ Σⱼ₌₁ᵏ nⱼ (x̄ⱼ − x̄)² ] / (k − 1)

The numerator of MSTR is called the sum of squares due to treatments (SSTR). The denominator of MSTR represents the degrees of freedom associated with SSTR.
Within-Treatments Estimate of Population Variance

The estimate of σ² based on the variation of the sample observations within each treatment is called the mean square due to error (MSE):

    MSE = SSE / (n_T − k) = [ Σⱼ₌₁ᵏ (nⱼ − 1) sⱼ² ] / (n_T − k)

The numerator of MSE is called the sum of squares due to error (SSE). The denominator of MSE represents the degrees of freedom associated with SSE.
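Both variance estimates can be computed from raw group data; a sketch (the helper name is made up, and the test data below are the Reed Manufacturing values used later in this section):

```python
def mstr_mse(groups):
    """Between- (MSTR) and within-treatments (MSE) variance estimates."""
    k = len(groups)                                # number of treatments
    n_t = sum(len(g) for g in groups)              # total observations
    grand = sum(sum(g) for g in groups) / n_t      # grand mean
    means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return sstr / (k - 1), sse / (n_t - k)

groups = [[48, 54, 57, 54, 62],    # Buffalo
          [73, 63, 66, 64, 74],    # Pittsburgh
          [51, 63, 61, 54, 56]]    # Detroit
mstr, mse = mstr_mse(groups)
print(mstr, mse)                   # 245  25.666...
```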
Comparing the Variance Estimates: The F Test

If the null hypothesis is true and the ANOVA assumptions are valid, the sampling distribution of MSTR/MSE is an F distribution with MSTR d.f. equal to k − 1 and MSE d.f. equal to n_T − k.

If the means of the k populations are not equal, the value of MSTR/MSE will be inflated because MSTR overestimates σ².

Hence, we will reject H0 if the resulting value of MSTR/MSE appears to be too large to have been selected at random from the appropriate F distribution.
Test for the Equality of k Population Means

Hypotheses:

    H0: μ₁ = μ₂ = μ₃ = . . . = μk
    Ha: Not all population means are equal

Test statistic: F = MSTR/MSE

Rejection rule:
• Using the test statistic: reject H0 if F > F_α
• Using the p-value: reject H0 if p-value < α

where the value of F_α is based on an F distribution with k − 1 numerator degrees of freedom and n_T − k denominator degrees of freedom.
The figure below shows the rejection region associated with a level of significance equal to α, where F_α denotes the critical value.

[Figure: sampling distribution of MSTR/MSE; do not reject H0 for values below the critical value F_α, reject H0 for values above it.]
ANOVA Table

    Source of     Sum of     Degrees of    Mean
    Variation     Squares    Freedom       Square    F
    ------------------------------------------------------
    Treatment     SSTR       k − 1         MSTR      MSTR/MSE
    Error         SSE        n_T − k       MSE
    Total         SST        n_T − 1

SST divided by its degrees of freedom, n_T − 1, is simply the overall sample variance that would be obtained if we treated the entire n_T observations as one data set.
    SST = Σⱼ₌₁ᵏ Σᵢ₌₁^(nⱼ) (xᵢⱼ − x̄)² = SSTR + SSE
What does ANOVA tell us?

ANOVA will tell us whether we have sufficient evidence to say that measurements from at least one treatment differ significantly from at least one other. It will not tell us which ones differ, or how many differ.
ANOVA vs t-test
• ANOVA is like a t-test among multiple data sets simultaneously; t-tests can only be done between two data sets, or between one set and a "true" value.
• ANOVA uses the F distribution instead of the t distribution.
• ANOVA assumes that all of the data sets have equal variances; use caution on close decisions if they don't.
ANOVA – a Hypothesis Test

H0: There is no significant difference among the results provided by the treatments.
Ha: At least one of the treatments provides results significantly different from at least one other.
Linear Model

    Yᵢⱼ = μ + τⱼ + εᵢⱼ,   with Σⱼ₌₁ᵗ τⱼ = 0 by definition.

The experiment produces (r × t) Yᵢⱼ data values. The analysis produces estimates of μ and the τⱼ. (We can then get estimates of the εᵢⱼ by subtraction.)
Data layout (r rows of replicates × t treatment columns):

    Y₁₁  Y₁₂  Y₁₃  Y₁₄  …  Y₁t
    Y₂₁  Y₂₂  Y₂₃  Y₂₄  …  Y₂t
    Y₃₁  Y₃₂  Y₃₃  Y₃₄  …  Y₃t
     .    .    .    .   …   .
    Yr₁  Yr₂  Yr₃  Yr₄  …  Yrt
    ----------------------------
    Ȳ•₁  Ȳ•₂  Ȳ•₃  Ȳ•₄  …  Ȳ•t

Ȳ•₁, Ȳ•₂, …, Ȳ•t are the column means, and

    Ȳ•• = Σⱼ₌₁ᵗ Ȳ•ⱼ / t = the GRAND MEAN

(assuming the same number of data points in each column; otherwise, Ȳ•• = the mean of all the data).
MODEL: Yᵢⱼ = μ + τⱼ + εᵢⱼ

    Ȳ••          estimates μ
    Ȳ•ⱼ − Ȳ••    estimates τⱼ (= μⱼ − μ), for all j

These estimates are based on Gauss' (1796) PRINCIPLE OF LEAST SQUARES and on COMMON SENSE.
If you insert the estimates into the MODEL,

    (1)  Yᵢⱼ = Ȳ•• + (Ȳ•ⱼ − Ȳ••) + ε̂ᵢⱼ,

it follows that our estimate of εᵢⱼ is

    (2)  ε̂ᵢⱼ = Yᵢⱼ − Ȳ•ⱼ.

Then, Yᵢⱼ = Ȳ•• + (Ȳ•ⱼ − Ȳ••) + (Yᵢⱼ − Ȳ•ⱼ), or,

    (3)  (Yᵢⱼ − Ȳ••) = (Ȳ•ⱼ − Ȳ••) + (Yᵢⱼ − Ȳ•ⱼ)
    TOTAL variability in Y = variability in Y associated with X + variability in Y associated with all other factors
If you square both sides of (3) and double-sum both sides (over i and j), you get, after some unpleasant algebra in which many terms cancel:

    Σⱼ₌₁ᵗ Σᵢ₌₁ʳ (Yᵢⱼ − Ȳ••)² = r · Σⱼ₌₁ᵗ (Ȳ•ⱼ − Ȳ••)² + Σⱼ₌₁ᵗ Σᵢ₌₁ʳ (Yᵢⱼ − Ȳ•ⱼ)²

    TSS (TOTAL SUM OF SQUARES) = SSBC (SUM OF SQUARES BETWEEN COLUMNS) + SSW, also written SSE (SUM OF SQUARES WITHIN COLUMNS)
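The partition identity TSS = SSBC + SSW can be checked numerically; the small r × t layout below is made up for illustration:

```python
# Verify TSS = SSBC + SSW on a small r x t layout (illustrative values)
data = [[3, 7, 5],
        [4, 8, 6],
        [5, 9, 7]]          # rows = replicates (r = 3), columns = treatments (t = 3)
r, t = len(data), len(data[0])

col_means = [sum(row[j] for row in data) / r for j in range(t)]
grand = sum(sum(row) for row in data) / (r * t)

tss  = sum((data[i][j] - grand) ** 2 for i in range(r) for j in range(t))
ssbc = r * sum((m - grand) ** 2 for m in col_means)
ssw  = sum((data[i][j] - col_means[j]) ** 2 for i in range(r) for j in range(t))

assert abs(tss - (ssbc + ssw)) < 1e-9   # the partition identity
print(tss, ssbc, ssw)                   # 30  24  6
```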
ANOVA TABLE

    Source of Variation (S.V.)        SS      df          Mean Square (M.S.)
    ------------------------------------------------------------------------
    Between columns (due to brand)    SSBC    t − 1       MSBC = SSBC / (t − 1)
    Within columns (due to error)     SSW     (r − 1)·t   MSW = SSW / ((r − 1)·t)
    ------------------------------------------------------------------------
    TOTAL                             TSS     tr − 1
Hypotheses:

    H0: τ₁ = τ₂ = • • • = τc = 0
    H1: not all τⱼ = 0

or, equivalently (all column means are equal):

    H0: μ₁ = μ₂ = • • • = μc
    H1: not all μⱼ are EQUAL
The probability law of MSBC/MSW = "F_calc" is the F distribution with (t − 1, (r − 1)·t) degrees of freedom, assuming H0 is true. Compare F_calc with the table value.
Example: Reed Manufacturing

Reed would like to know if the mean number of hours worked per week is the same for the department managers at her three manufacturing plants (Buffalo, Pittsburgh, and Detroit).

A simple random sample of 5 managers from each of the three plants was taken, and the number of hours worked by each manager for the previous week was recorded.
Sample Data

                       Plant 1    Plant 2       Plant 3
    Observation        Buffalo    Pittsburgh    Detroit
    ----------------------------------------------------
    1                  48         73            51
    2                  54         63            63
    3                  57         66            61
    4                  54         64            54
    5                  62         74            56
    ----------------------------------------------------
    Sample mean        55         68            57
    Sample variance    26.0       26.5          24.5
Hypotheses:

    H0: μ₁ = μ₂ = μ₃
    Ha: Not all the means are equal

where μ₁, μ₂, and μ₃ are the mean numbers of hours worked per week by the managers at Plants 1, 2, and 3, respectively.
Mean square due to treatments (since the sample sizes are all equal):

    x̄ = (55 + 68 + 57) / 3 = 60
    SSTR = 5(55 − 60)² + 5(68 − 60)² + 5(57 − 60)² = 490
    MSTR = 490 / (3 − 1) = 245

Mean square due to error:

    SSE = 4(26.0) + 4(26.5) + 4(24.5) = 308
    MSE = 308 / (15 − 3) = 25.667
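These hand calculations can be reproduced directly from the summary statistics in the sample-data table:

```python
# Reed example, computed from the summary statistics (n = 5 per plant)
means = [55, 68, 57]
variances = [26.0, 26.5, 24.5]
n, k = 5, 3

grand = sum(means) / k                              # 60
sstr = sum(n * (m - grand) ** 2 for m in means)     # 490
mstr = sstr / (k - 1)                               # 245
sse = sum((n - 1) * v for v in variances)           # 308
mse = sse / (n * k - k)                             # 25.667
f = mstr / mse
print(round(f, 2))                                  # 9.55
```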
F test:
• If H0 is true, the ratio MSTR/MSE should be near 1 because both MSTR and MSE are estimating σ².
• If Ha is true, the ratio should be significantly larger than 1 because MSTR tends to overestimate σ².
Rejection rule:
• Using the test statistic: reject H0 if F > 3.89
• Using the p-value: reject H0 if p-value < .05

where F.05 = 3.89 is based on an F distribution with 2 numerator degrees of freedom and 12 denominator degrees of freedom.
Test statistic:

    F = MSTR/MSE = 245/25.667 = 9.55

Conclusion: F = 9.55 > F.05 = 3.89, so we reject H0. The mean number of hours worked per week by department managers is not the same at each plant.
ANOVA Table

    Source of     Sum of     Degrees of    Mean
    Variation     Squares    Freedom       Square    F
    ---------------------------------------------------
    Treatments    490        2             245       9.55
    Error         308        12            25.667
    Total         798        14
Using Excel's Anova: Single Factor Tool

Step 1  Select the Tools pull-down menu.
Step 2  Choose the Data Analysis option.
Step 3  Choose Anova: Single Factor from the list of Analysis Tools.
Step 4  When the Anova: Single Factor dialog box appears:
        • Enter B1:D6 in the Input Range box
        • Select Grouped By Columns
        • Select Labels in First Row
        • Enter .05 in the Alpha box
        • Select Output Range and enter A8 (your choice) in the Output Range box
        • Click OK

Value Worksheet (top portion):
        A            B          C            D
    1   Observation  Buffalo    Pittsburgh   Detroit
    2   1            48         73           51
    3   2            54         63           63
    4   3            57         66           61
    5   4            54         64           54
    6   5            62         74           56
Value Worksheet (bottom portion):
    Anova: Single Factor

    SUMMARY
    Groups        Count    Sum    Average    Variance
    Buffalo       5        275    55         26
    Pittsburgh    5        340    68         26.5
    Detroit       5        285    57         24.5

    ANOVA
    Source of Variation    SS     df    MS         F          P-value    F crit
    Between Groups         490    2     245        9.54545    0.00331    3.88529
    Within Groups          308    12    25.6667
    Total                  798    14
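The same one-way ANOVA can be run outside Excel; a sketch assuming SciPy is available, using the Reed sample data:

```python
from scipy import stats  # SciPy's one-way ANOVA, analogous to Excel's tool

buffalo    = [48, 54, 57, 54, 62]
pittsburgh = [73, 63, 66, 64, 74]
detroit    = [51, 63, 61, 54, 56]

f, p = stats.f_oneway(buffalo, pittsburgh, detroit)
print(f, p)   # F ~ 9.54545, p ~ 0.00331, matching the worksheet
```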
Using the p-Value

The value worksheet shows that the p-value is .00331. The rejection rule is "Reject H0 if p-value < .05". Thus, we reject H0 because the p-value = .00331 < α = .05. We conclude that the mean number of hours worked per week by the managers differs among the three plants.