Basic concept of statistics: Measures of central tendency, Measures of dispersion & variability

Upload: margaret-henry

Post on 14-Dec-2015


Page 1

Basic concept of statistics

Measures of central tendency

Measures of dispersion & variability

Page 2

Measures of central tendency

Arithmetic mean (= simple average)

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where $\sum$ denotes summation, $X_i$ is a measurement, $i$ is the index of measurement, and $n$ is the sample size.

• The best estimate of the population mean is the sample mean, $\bar{X}$.

Page 3

Measures of variability

All describe how “spread out” the data are.

1. Sum of squares, the sum of squared deviations from the mean

• For a sample,

$$SS = \sum (X_i - \bar{X})^2$$

Page 4

2. Average or mean sum of squares = variance, $s^2$:

• For a sample,

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$

Why $n - 1$?

Page 5

$n - 1$ represents the degrees of freedom, $\nu$ (Greek letter “nu”), or number of independent quantities in the estimate $s^2$.

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$

$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$$

• Therefore, once $n - 1$ of all deviations are specified, the last deviation is already determined.
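Because deviations from the sample mean always sum to zero, the last deviation can be recovered from the other n - 1. The following is a minimal sketch of that check; the sample values are hypothetical and chosen only for illustration.

```python
# Hypothetical sample of body weights (kg); any numbers would do.
sample = [43.0, 44.0, 45.0, 44.0, 46.0]
n = len(sample)
mean = sum(sample) / n

# Deviations from the sample mean sum to zero (up to rounding error).
deviations = [x - mean for x in sample]
total = sum(deviations)

# So the last deviation is fixed once the first n - 1 are known.
last_from_others = -sum(deviations[:-1])

print(f"sum of deviations      = {total:.10f}")
print(f"last deviation         = {deviations[-1]:.4f}")
print(f"implied by the others  = {last_from_others:.4f}")
```

Only n - 1 of the deviations are free to vary, which is why the sum of squares is divided by n - 1 rather than n.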

Page 6

3. Standard deviation, $s$

• For a sample,

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$

• Variance has squared measurement units; to regain the original units, take the square root.

Page 7

4. Standard error of the mean

• For a sample,

$$s_{\bar{X}} = \sqrt{\frac{s^2}{n}}$$

The standard error of the mean is a measure of variability among the means of repeated samples from a population.
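The four measures of variability (sum of squares, variance, standard deviation, standard error of the mean) can be computed step by step. This is a small sketch using Python's standard library; the five-value sample is illustrative, and the variable names are my own.

```python
import math
import statistics

sample = [43, 44, 45, 44, 44]  # illustrative sample of body weights (kg)

n = len(sample)
mean = statistics.mean(sample)

ss = sum((x - mean) ** 2 for x in sample)   # 1. sum of squares
variance = ss / (n - 1)                     # 2. sample variance, s^2
sd = math.sqrt(variance)                    # 3. standard deviation, s
se = math.sqrt(variance / n)                # 4. standard error of the mean

print(f"mean={mean}, SS={ss}, s^2={variance}, s={sd:.4f}, SE={se:.4f}")
```

The hand-computed variance agrees with `statistics.variance(sample)`, which also divides by n - 1.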

Page 8

A Population of Values: Body Weight Data (kg)

N = 28, μ = 44, σ² = 1.214

Population values (28 in total): 42 (×2), 43 (×7), 44 (×12), 45 (×3), 46 (×4)

Pages 9 to 26

Repeated random samples from the population above, each with sample size n = 5 values:

Sample 1: 43, 44, 45, 44, 44; $\bar{X}$ = 44

Sample 2: 46, 44, 46, 45, 44; $\bar{X}$ = 45

Sample 3: 42, 42, 43, 45, 43; $\bar{X}$ = 43

…

Page 27

For a large enough number of large samples, the frequency distribution of the sample means (= sampling distribution) approaches a normal distribution.
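The repeated-sampling idea can be simulated. The sketch below rebuilds the 28-value body weight population from the slides, draws many samples of n = 5, and compares the spread of the sample means with the theoretical standard error. Sampling with replacement, the seed, and the number of samples are my assumptions, not something the slides specify.

```python
import random
import statistics

# Population of 28 body weights (kg) from the slides.
population = [42] * 2 + [43] * 7 + [44] * 12 + [45] * 3 + [46] * 4

mu = statistics.mean(population)
sigma2 = statistics.pvariance(population)   # population variance (divide by N)

rng = random.Random(0)   # fixed, arbitrary seed so the run is repeatable
n = 5                    # sample size used on the slides
n_samples = 10_000       # arbitrary number of repeated samples

# Assumption: draws are made with replacement, so they are independent.
sample_means = [sum(rng.choices(population, k=n)) / n for _ in range(n_samples)]

mean_of_means = sum(sample_means) / n_samples
sd_of_means = statistics.pstdev(sample_means)
theoretical_se = (sigma2 / n) ** 0.5

print(f"mu = {mu}, sigma^2 = {sigma2:.3f}")
print(f"mean of sample means = {mean_of_means:.3f}")
print(f"SD of sample means   = {sd_of_means:.3f}")
print(f"sigma / sqrt(n)      = {theoretical_se:.3f}")
```

The standard deviation of the simulated means should land close to σ/√n, which is the quantity the standard error of the mean estimates from a single sample.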

Page 28

[Figure: frequency distribution of sample means (horizontal axis: sample mean; vertical axis: frequency). Normal distribution: bell-shaped curve.]

Page 29

Testing statistical hypotheses between 2 means

1. State the research question in terms of statistical hypotheses.

We always start with a statement that hypothesizes “no difference”, called the null hypothesis, H0.

E.g., H0: Mean bill length of female hummingbirds is equal to mean bill length of male hummingbirds

Page 30

Then we formulate a statement that must be true if the null hypothesis is false, called the alternate hypothesis, HA.

E.g., HA: Mean bill length of female hummingbirds is not equal to mean bill length of male hummingbirds

If we reject H0 as a result of sample evidence, then we conclude that HA is true.

Page 31

2. Choose an appropriate statistical test that would allow you to reject H0 if H0 were false.

E.g., Student’s t test for hypotheses about means

William Sealy Gosset (a.k.a. “Student”)

Page 32

t statistic:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$$

where $\bar{X}_1$ is the mean of sample 1, $\bar{X}_2$ is the mean of sample 2, and $s_{\bar{X}_1 - \bar{X}_2}$ is the standard error of the difference between the sample means.

To estimate $s_{\bar{X}_1 - \bar{X}_2}$, we must first know the relation between both populations.

Page 33

How to evaluate the success of this experimental design class

Compare the scores in statistics and in experimental design of several students

Compare the scores in experimental design of several students from two serial classes

Compare the scores in experimental design of several students from two different classes

Page 34

Comparing the scores in statistics and in experimental design of several students

Same students: dependent populations (identical variance)

Different students: independent populations (identical variance or not identical variance)

Page 35

Comparing the scores in experimental design of several students from two serial classes

Different students: independent populations (identical variance or not identical variance)

Page 36

Comparing the scores in experimental design of several students from two classes

Different students: independent populations (identical variance or not identical variance)

Page 37

Relation between populations

Dependent populations

Independent populations

1. Identical (homogeneous) variance

2. Not identical (heterogeneous) variance

Page 38

Dependent Populations

Null hypothesis: The mean difference is equal to $\mu_{d0}$

Test statistic (computed from the sample):

$$t = \frac{\bar{d} - \mu_{d0}}{SE_{\bar{d}}}$$

Null distribution: t with n - 1 df (n is the number of pairs)

Compare the test statistic with the null distribution: how unusual is this test statistic?

P < 0.05: Reject H0

P > 0.05: Fail to reject H0
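For dependent (paired) populations the test works on the per-pair differences. The sketch below computes the paired t statistic and its degrees of freedom; the before/after scores are hypothetical, and the null value of the mean difference is assumed to be 0.

```python
import math
import statistics

# Hypothetical paired scores for the same five students (illustration only).
before = [70, 65, 80, 75, 68]
after = [74, 66, 85, 77, 73]

d = [a - b for a, b in zip(after, before)]   # per-pair differences
n = len(d)                                   # number of pairs
d_bar = statistics.mean(d)                   # mean difference
se_d = statistics.stdev(d) / math.sqrt(n)    # standard error of the mean difference
mu_d0 = 0                                    # assumed value under H0

t = (d_bar - mu_d0) / se_d
df = n - 1

print(f"t = {t:.3f} with {df} df")
```

The resulting t is then compared with the t distribution with n - 1 degrees of freedom to obtain P.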

Page 39

Independent populations with homogeneous variances

Pooled variance:

$$s_p^2 = \frac{\nu_1 s_1^2 + \nu_2 s_2^2}{\nu_1 + \nu_2}$$

Then,

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$

Page 40

Independent populations with homogeneous variances

$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{SE_{\bar{Y}_1 - \bar{Y}_2}}$$

$$SE_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$

$$s_p^2 = \frac{df_1 s_1^2 + df_2 s_2^2}{df_1 + df_2}$$
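The pooled-variance t statistic can be computed from summary statistics alone. This is a minimal sketch under the equal-variance assumption; the function name `pooled_t` and the input numbers are hypothetical.

```python
import math

def pooled_t(n1, mean1, var1, n2, mean2, var2):
    """Two-sample t statistic assuming equal population variances."""
    df1, df2 = n1 - 1, n2 - 1
    sp2 = (df1 * var1 + df2 * var2) / (df1 + df2)   # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))         # SE of the difference in means
    return (mean1 - mean2) / se, df1 + df2

# Hypothetical summary statistics, for illustration only.
t, df = pooled_t(n1=10, mean1=52.0, var1=9.0, n2=12, mean2=48.0, var2=11.0)
print(f"t = {t:.3f} with {df} df")
```

Each sample variance is weighted by its own degrees of freedom, so the larger sample contributes more to the pooled estimate.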

Page 41

When sample sizes are small, the sampling distribution is described better by the t distribution than by the standard normal (Z) distribution.

The shape of the t distribution depends on the degrees of freedom, ν = n – 1.

Page 42

[Figure: t distributions for ν = 1, ν = 5, and ν = 25, compared with Z = t(ν = ∞); horizontal axis: t.]

Page 43

The distribution of a test statistic is divided into an area of acceptance and an area of rejection.

[Figure: t distribution centered at 0, for α = 0.05. The area of acceptance (0.95) lies between the lower critical value and the upper critical value; each tail beyond a critical value is an area of rejection (0.025).]

Page 44

Critical t for a test about equality = $t_{\alpha(2),\nu}$

Page 45

Independent populations with heterogeneous variances

$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

$$df = \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}{\dfrac{(s_1^2 / n_1)^2}{n_1 - 1} + \dfrac{(s_2^2 / n_2)^2}{n_2 - 1}}$$
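When the variances are not assumed equal, each sample keeps its own variance and the degrees of freedom are approximated (this form is commonly called Welch's t test). The sketch below is illustrative; the function name `welch_t` and the summary statistics are hypothetical.

```python
import math

def welch_t(n1, mean1, var1, n2, mean2, var2):
    """t statistic and approximate df without assuming equal variances."""
    a, b = var1 / n1, var2 / n2           # squared standard errors of each mean
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics with clearly unequal variances.
t, df = welch_t(n1=10, mean1=52.0, var1=4.0, n2=12, mean2=48.0, var2=25.0)
print(f"t = {t:.3f}, approximate df = {df:.2f}")
```

The approximate df is generally not an integer; it always lies between the smaller of n1 - 1 and n2 - 1 and the pooled value n1 + n2 - 2.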

Page 46

Analysis of Variance (ANOVA)

Page 47

Independent t-test

Compares the means of one variable for TWO groups of cases.

Statistical formula:

$$t_{\bar{X}_1 - \bar{X}_2} = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$$

Meaning: compare the ‘standardized’ mean difference.

But this is limited to two groups. What if groups > 2?

• Pairwise t-tests (previous example)

• ANOVA (Analysis of Variance)

Page 48

From t-test to ANOVA

1. Pairwise t-test

If you compare three or more groups using t-tests with the usual 0.05 level of significance, you would have to compare each pair (A to B, A to C, B to C), so the chance of getting at least one wrong result (a false positive), treating the three tests as independent, would be:

1 - (0.95 x 0.95 x 0.95) = 14.3%

Multiple t-tests increase the false-alarm rate.
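The same arithmetic extends to more groups. This small sketch counts the pairwise comparisons for k groups and computes the chance of at least one false positive, under the simplifying assumption that the tests are independent; the function name is my own.

```python
from math import comb

def familywise_error(k_groups, alpha=0.05):
    """Chance of at least one false positive across all pairwise t-tests."""
    m = comb(k_groups, 2)              # number of pairwise comparisons
    return m, 1 - (1 - alpha) ** m     # assumes the m tests are independent

for k in (3, 4, 5):
    m, rate = familywise_error(k)
    print(f"{k} groups: {m} comparisons, error rate = {rate:.3f}")
```

With three groups this reproduces the 14.3% figure, and the rate grows quickly as groups are added.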

Page 49

From t-test to ANOVA

2. Analysis of Variance

In a t-test, the mean difference is used. Similarly, an ANOVA test compares the observed variance among means.

The logic behind ANOVA:

• If groups are from the same population, variance among means will be small. (Note that the means from the groups are not exactly the same.)

• If groups are from different populations, variance among means will be large.

Page 50

What is ANOVA?

Analysis of Variance: a procedure designed to determine if the manipulation of one or more independent variables in an experiment has a statistically significant influence on the value of the dependent variable.

Assumption:

Each independent variable is categorical (nominal scale). Independent variables are called factors and their values are called levels.

The dependent variable is numerical (ratio scale).

Page 51

What is ANOVA?

The basic idea of ANOVA:

The “variance” of the dependent variable given the influence of one or more independent variables {Expected Sum of Squares for a Factor} is checked to see if it is significantly greater than the “variance” of the dependent variable (assuming no influence of the independent variables) {also known as the Mean-Square-Error (MSE)}.

Page 52

Pair-t-Test

Amir      69      Budi      82
Abas      64      Berta     78
Abi       70      Bambang   82
Aura      67      Banu      81
Ana       69      Betty     82
Anis      69      Bagus     77
                  Berth     78

Average           68        80
n                 6         7
Sample variance   4.8       5.0

Page 53

ANOVA TABLE OF 2 POPULATIONS

S.V.                   SS           DF                     Mean square (M.S.)
Between populations    SSBetween    1                      MSB = SSB / DFB
Within populations     SSWithin     (n1 - 1) + (n2 - 1)    MSW = SSW / DFW
TOTAL                  SSTotal      n1 + n2 - 1
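With only two populations, the ANOVA table and the pooled-variance t-test give the same answer: F equals t squared. The sketch below fills in the table for the two groups of scores listed with the names above (Amir to Anis, and Budi to Berth); the variable names are my own.

```python
import statistics

group_a = [69, 64, 70, 67, 69, 69]        # Amir ... Anis
group_b = [82, 78, 82, 81, 82, 77, 78]    # Budi ... Berth

n1, n2 = len(group_a), len(group_b)
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
grand_mean = statistics.mean(group_a + group_b)

# Sums of squares for the two rows of the table.
ss_between = n1 * (mean_a - grand_mean) ** 2 + n2 * (mean_b - grand_mean) ** 2
ss_within = (sum((x - mean_a) ** 2 for x in group_a)
             + sum((x - mean_b) ** 2 for x in group_b))

df_between = 1
df_within = (n1 - 1) + (n2 - 1)
ms_between = ss_between / df_between
ms_within = ss_within / df_within         # same as the pooled variance
f_ratio = ms_between / ms_within

# Pooled-variance t statistic for the same data.
t = (mean_a - mean_b) / (ms_within * (1 / n1 + 1 / n2)) ** 0.5

print(f"F = {f_ratio:.3f}, t = {t:.3f}, t^2 = {t ** 2:.3f}")
```

MSW is the pooled variance of the two groups, which is why the two approaches agree.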

Page 54

Rationale for ANOVA

• We can break the total variance in a study into meaningful pieces that correspond to treatment effects and error. That’s why we call this Analysis of Variance.

$\bar{X}_G$: The grand mean, taken over all observations.

$\bar{X}_A$: The mean of any group.

$\bar{X}_{A1}$: The mean of a specific group (1 in this case).

$X_i$: The observation or raw data for the ith subject.

Page 55

The ANOVA Model

$$X_i = \bar{X}_G + (\bar{X}_A - \bar{X}_G) + (X_i - \bar{X}_A)$$

Trial i = the grand mean + a treatment effect + error

SS Total = SS Treatment + SS Error

Page 56

Analysis of Variance

Analysis of Variance (ANOVA) can be used to test for the equality of three or more population means using data obtained from observational or experimental studies.

Use the sample results to test the following hypotheses.

$H_0$: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$

$H_a$: Not all population means are equal

If H0 is rejected, we cannot conclude that all population means are different.

Rejecting H0 means that at least two population means have different values.

Page 57

Assumptions for Analysis of Variance

For each population, the response variable is normally distributed.

The variance of the response variable, denoted σ², is the same for all of the populations.

The effect of the independent variable is additive.

The observations must be independent.

Page 58

Analysis of Variance: Testing for the Equality of k Population Means

Between-Treatments Estimate of Population Variance

Within-Treatments Estimate of Population Variance

Comparing the Variance Estimates: The F Test

ANOVA Table

Page 59

Between-Treatments Estimate of Population Variance

A between-treatments estimate of σ² is called the mean square due to treatments (MSTR).

$$\text{MSTR} = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1}$$

The numerator of MSTR is called the sum of squares due to treatments (SSTR).

The denominator of MSTR represents the degrees of freedom associated with SSTR.

Page 60

Within-Treatments Estimate of Population Variance

The estimate of σ² based on the variation of the sample observations within each treatment is called the mean square due to error (MSE).

$$\text{MSE} = \frac{\sum_{j=1}^{k} (n_j - 1) s_j^2}{n_T - k}$$

The numerator of MSE is called the sum of squares due to error (SSE).

The denominator of MSE represents the degrees of freedom associated with SSE.

Page 61

Comparing the Variance Estimates: The F Test

If the null hypothesis is true and the ANOVA assumptions are valid, the sampling distribution of MSTR/MSE is an F distribution with MSTR d.f. equal to $k - 1$ and MSE d.f. equal to $n_T - k$.

If the means of the k populations are not equal, the value of MSTR/MSE will be inflated because MSTR overestimates σ².

Hence, we will reject H0 if the resulting value of MSTR/MSE appears to be too large to have been selected at random from the appropriate F distribution.

Page 62

Test for the Equality of k Population Means

Hypotheses

$H_0$: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$

$H_a$: Not all population means are equal

Test Statistic

F = MSTR/MSE

Page 63

Test for the Equality of k Population Means

Rejection Rule

Using test statistic: Reject H0 if $F > F_\alpha$

Using p-value: Reject H0 if p-value < α

where the value of $F_\alpha$ is based on an F distribution with $k - 1$ numerator degrees of freedom and $n_T - k$ denominator degrees of freedom

Page 64

Sampling Distribution of MSTR/MSE

The figure below shows the rejection region associated with a level of significance equal to α, where $F_\alpha$ denotes the critical value.

[Figure: sampling distribution of MSTR/MSE. Values below the critical value $F_\alpha$ fall in the “Do Not Reject H0” region; values above it fall in the “Reject H0” region.]

Page 65

ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares   F
Treatment             SSTR             k - 1                MSTR           MSTR/MSE
Error                 SSE              nT - k               MSE
Total                 SST              nT - 1

SST divided by its degrees of freedom nT - 1 is simply the overall sample variance that would be obtained if we treated the entire nT observations as one data set.

$$\text{SST} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2 = \text{SSTR} + \text{SSE}$$
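The entries of the ANOVA table can be computed directly from grouped data, which also lets the identity SST = SSTR + SSE be checked numerically. The sketch below is illustrative: `anova_table` is a hypothetical helper, and the three small groups (of unequal size) are made-up values.

```python
def anova_table(groups):
    """Return the one-way ANOVA sums of squares, mean squares, and F."""
    all_values = [x for g in groups for x in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-treatments and within-treatments sums of squares.
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Total sum of squares, computed independently of the two pieces.
    sst = sum((x - grand_mean) ** 2 for x in all_values)

    mstr, mse = sstr / (k - 1), sse / (n_total - k)
    return {"SSTR": sstr, "SSE": sse, "SST": sst,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse}

# Hypothetical data with unequal group sizes.
groups = [[5, 7, 6], [9, 11, 10, 10], [4, 3, 5]]
table = anova_table(groups)
for name, value in table.items():
    print(f"{name:5s} = {value:.3f}")
```

Because SST is computed on its own here, agreement with SSTR + SSE is a genuine check rather than a restatement.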

Page 66

What does ANOVA tell us?

ANOVA will tell us whether we have sufficient evidence to say that measurements from at least one treatment differ significantly from at least one other.

It will not tell us which ones differ, or how many differ.

Page 67

ANOVA vs t-test

ANOVA is like a t-test among multiple data sets simultaneously

• t-tests can only be done between two data sets, or between one set and a “true” value

ANOVA uses the F distribution instead of the t distribution

ANOVA assumes that all of the data sets have equal variances

• Use caution on close decisions if they don’t

Page 68

ANOVA: a Hypothesis Test

H0: There is no significant difference among the results provided by treatments.

Ha: At least one of the treatments provides results significantly different from at least one other.

Page 69

Linear Model

$$Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$$

By definition, $\sum_{j=1}^{t} \tau_j = 0$

The experiment produces (r x t) $Y_{ij}$ data values.

The analysis produces estimates of $\mu, \tau_1, \tau_2, \dots, \tau_t$. (We can then get estimates of the $\varepsilon_{ij}$ by subtraction.)

Page 70

Column:   1     2     3     4     5     6     …   t

          Y11   Y12   Y13   Y14   Y15   Y16   …   Y1t
          Y21   Y22   Y23   Y24   Y25   Y26   …   Y2t
          Y31   Y32   Y33   Y34   Y35   Y36   …   Y3t
          Y41   Y42   Y43   Y44   Y45   Y46   …   Y4t
          .     .     .     .     .     .     …   .
          .     .     .     .     .     .     …   .
          .     .     .     .     .     .     …   .
          Yr1   Yr2   Yr3   Yr4   Yr5   Yr6   …   Yrt
          -------------------------------------------
Mean:     Ȳ•1   Ȳ•2   Ȳ•3   Ȳ•4   Ȳ•5   Ȳ•6   …   Ȳ•t

Ȳ•1, Ȳ•2, …, are Column Means

Page 71

$$\bar{Y}_{\cdot\cdot} = \frac{1}{t} \sum_{j=1}^{t} \bar{Y}_{\cdot j} = \text{“GRAND MEAN”}$$

(assuming the same number of data points in each column)

(otherwise, $\bar{Y}_{\cdot\cdot}$ = mean of all the data)

Page 72

MODEL: $Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$

$\bar{Y}_{\cdot\cdot}$ estimates $\mu$

$\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}$ estimates $\tau_j$ (= $\mu_j - \mu$) (for all j)

These estimates are based on Gauss’ (1796) PRINCIPLE OF LEAST SQUARES and on COMMON SENSE

Page 73

MODEL: $Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$

If you insert the estimates into the MODEL,

(1) $Y_{ij} = \bar{Y}_{\cdot\cdot} + (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + \hat{\varepsilon}_{ij}$

it follows that our estimate of $\varepsilon_{ij}$ is

(2) $\hat{\varepsilon}_{ij} = Y_{ij} - \bar{Y}_{\cdot j}$

Page 74

Then, $Y_{ij} = \bar{Y}_{\cdot\cdot} + (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{\cdot j})$

or,

(3) $(Y_{ij} - \bar{Y}_{\cdot\cdot}) = (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{\cdot j})$

TOTAL VARIABILITY in Y = Variability in Y associated with X + Variability in Y associated with all other factors

Page 75

If you square both sides of (3), and double sum both sides (over i and j), you get [after some unpleasant algebra, but lots of terms which “cancel”]

$$\sum_{j=1}^{t} \sum_{i=1}^{r} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = r \sum_{j=1}^{t} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + \sum_{j=1}^{t} \sum_{i=1}^{r} (Y_{ij} - \bar{Y}_{\cdot j})^2$$

TSS = SSBC + SSW (SSE)

TOTAL SUM OF SQUARES = SUM OF SQUARES BETWEEN COLUMNS + SUM OF SQUARES WITHIN COLUMNS
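The “cancelling” step can be written out. The following is a sketch of the algebra for the balanced case (r observations in each of the t columns), using $\bar{Y}_{\cdot j}$ for the column means and $\bar{Y}_{\cdot\cdot}$ for the grand mean. Squaring the right-hand side of (3) and summing gives three terms:

$$\sum_{j=1}^{t} \sum_{i=1}^{r} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{j=1}^{t} \sum_{i=1}^{r} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + 2 \sum_{j=1}^{t} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) \sum_{i=1}^{r} (Y_{ij} - \bar{Y}_{\cdot j}) + \sum_{j=1}^{t} \sum_{i=1}^{r} (Y_{ij} - \bar{Y}_{\cdot j})^2$$

The cross term vanishes because, within each column, deviations from the column mean sum to zero:

$$\sum_{i=1}^{r} (Y_{ij} - \bar{Y}_{\cdot j}) = 0 \quad \text{for every } j$$

In the first term the summand does not depend on i, so the inner sum contributes a factor of r:

$$\sum_{j=1}^{t} \sum_{i=1}^{r} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 = r \sum_{j=1}^{t} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2$$

What remains is the total sum of squares split into its between-columns and within-columns parts.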

Page 76

ANOVA TABLE

S.V.                              SS      DF           Mean square (M.S.)
Between columns (due to brand)    SSBc    t - 1        MSBc = SSBc / (t - 1)
Within columns (due to error)     SSWc    (r - 1)•t    MSWc = SSWc / ((r - 1)•t)
TOTAL                             TSS     tr - 1

Page 77

Hypothesis,

$H_0$: $\tau_1 = \tau_2 = \dots = \tau_t = 0$

$H_1$: not all $\tau_j = 0$

Or

$H_0$: $\mu_1 = \mu_2 = \dots = \mu_t$ (All column means are equal)

$H_1$: not all $\mu_j$ are EQUAL

Page 78

The probability law of MSBc / MSWc = “Fcalc”, assuming H0 true, is the F distribution with (t - 1, (r - 1)t) degrees of freedom.

[Figure: F distribution with the table value marked.]

Page 79

Example: Reed Manufacturing

Reed would like to know if the mean number of hours worked per week is the same for the department managers at her three manufacturing plants (Buffalo, Pittsburgh, and Detroit).

A simple random sample of 5 managers from each of the three plants was taken and the number of hours worked by each manager for the previous week was recorded.

Page 80

Example: Source of protein of fish feed

Sample Data

Observation        Catfish   Tilapia   Tuna
1                  8         33        11
2                  14        23        23
3                  17        26        21
4                  14        24        14
5                  22        34        16
Sample Mean        15        28        17
Sample Variance    26.0      26.5      24.5

Page 81: Basic concept of statistics Measures of central Measures of central tendency Measures of dispersion & variability

Example: Protein source

Hypotheses

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: Not all the means are equal

where:
$\mu_1$ = mean protein content of catfish (%)
$\mu_2$ = mean protein content of tilapia (%)
$\mu_3$ = mean protein content of tuna (%)

Page 82: Basic concept of statistics Measures of central Measures of central tendency Measures of dispersion & variability

Example: Protein source

Mean Square Due to Treatments

Since the sample sizes are all equal, the overall mean is the average of the three sample means:

$\bar{\bar{x}} = (15 + 28 + 17)/3 = 20$
$SSTR = 5(15 - 20)^2 + 5(28 - 20)^2 + 5(17 - 20)^2 = 490$
$MSTR = 490/(3 - 1) = 245$

Mean Square Due to Error

$SSE = 4(26.0) + 4(26.5) + 4(24.5) = 308$
$MSE = 308/(15 - 3) = 25.667$
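The treatment and error mean squares can also be computed directly from the raw observations rather than from the rounded summary statistics. The following is a minimal sketch under that assumption; variable names such as `sstr` and `mse` simply mirror the slide notation.

```python
from statistics import mean, variance

# Protein content (%) per source, from the sample data table.
samples = {
    "Catfish": [8, 14, 17, 14, 22],
    "Tilapia": [33, 23, 26, 24, 34],
    "Tuna": [11, 23, 21, 14, 16],
}

k = len(samples)                                   # number of treatments
n_total = sum(len(x) for x in samples.values())    # total observations
grand_mean = mean([v for x in samples.values() for v in x])

# Between-treatments sum of squares: n_j * (group mean - overall mean)^2.
sstr = sum(len(x) * (mean(x) - grand_mean) ** 2 for x in samples.values())
mstr = sstr / (k - 1)

# Within-treatments (error) sum of squares: (n_j - 1) * s_j^2.
sse = sum((len(x) - 1) * variance(x) for x in samples.values())
mse = sse / (n_total - k)

print(f"SSTR = {sstr}, MSTR = {mstr}")
print(f"SSE = {sse}, MSE = {mse:.3f}")
```

Because the groups have equal sizes, the overall mean of all 15 observations equals the average of the three group means, which is why the slide can use the shortcut (15 + 28 + 17)/3.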

Page 83: Basic concept of statistics Measures of central Measures of central tendency Measures of dispersion & variability

Example: Protein source

F-Test

If $H_0$ is true, the ratio MSTR/MSE should be near 1 because both MSTR and MSE are estimating $\sigma^2$.

If $H_a$ is true, the ratio should be significantly larger than 1 because MSTR tends to overestimate $\sigma^2$.

Page 84: Basic concept of statistics Measures of central Measures of central tendency Measures of dispersion & variability

Example: Protein source

Rejection Rule

Using test statistic: Reject $H_0$ if $F > 3.89$

Using p-value: Reject $H_0$ if p-value < .05

where $F_{.05} = 3.89$ is based on an F distribution with 2 numerator degrees of freedom and 12 denominator degrees of freedom
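The critical value 3.89 normally comes from an F table or statistical software. As an illustrative check, the special case of 2 numerator degrees of freedom has a closed form: the upper-tail probability is $P(F > f) = (1 + 2f/d_2)^{-d_2/2}$, which can be inverted for the critical value. The sketch below relies on that special case only and does not generalize to other numerator degrees of freedom; the function name `f_crit_df1_2` is hypothetical.

```python
def f_crit_df1_2(alpha: float, df2: int) -> float:
    """Upper-tail critical value of an F distribution with 2 numerator df.

    Obtained by solving (1 + 2f/df2) ** (-df2/2) = alpha for f.
    Valid ONLY when the numerator degrees of freedom equal 2.
    """
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


# Three treatments -> 2 numerator df; 15 observations - 3 groups -> 12 denominator df.
f_crit = f_crit_df1_2(alpha=0.05, df2=12)
print(f"F crit = {f_crit:.2f}")
```

For other degrees of freedom, a table lookup or a statistics library is the usual route.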

Page 85: Basic concept of statistics Measures of central Measures of central tendency Measures of dispersion & variability

Example: Protein source

Test Statistic

$F = MSTR/MSE = 245/25.667 = 9.55$

Conclusion

$F = 9.55 > F_{.05} = 3.89$, so we reject $H_0$.

The mean protein content is not the same for the three protein sources.
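The test statistic and its p-value can be sketched in a few lines. The p-value formula used here, $(1 + 2F/d_2)^{-d_2/2}$, is exact only because the numerator has 2 degrees of freedom; it is an assumption of this sketch rather than a general method, and the variable names are illustrative.

```python
mstr = 490 / (3 - 1)      # mean square due to treatments
mse = 308 / (15 - 3)      # mean square due to error
df2 = 15 - 3              # denominator degrees of freedom

f_stat = mstr / mse

# Upper-tail probability for an F distribution with 2 numerator df (special case).
p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)

reject_h0 = p_value < 0.05
print(f"F = {f_stat:.2f}, p-value = {p_value:.5f}, reject H0: {reject_h0}")
```

Both decision rules agree: the statistic exceeds the critical value and the p-value falls below .05.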

Page 86: Basic concept of statistics Measures of central Measures of central tendency Measures of dispersion & variability

Example: Protein source

ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
Treatments            490              2                    245           9.55
Error                 308              12                   25.667
Total                 798              14

Page 87: Basic concept of statistics Measures of central Measures of central tendency Measures of dispersion & variability

Using Excel’s Anova: Single Factor Tool

Step 1  Select the Tools pull-down menu
Step 2  Choose the Data Analysis option
Step 3  Choose Anova: Single Factor from the list of Analysis Tools

Page 88: Basic concept of statistics Measures of central Measures of central tendency Measures of dispersion & variability

Using Excel’s Anova: Single Factor Tool

Step 4  When the Anova: Single Factor dialog box appears:
    Enter B1:D6 in the Input Range box
    Select Grouped By Columns
    Select Labels in First Row
    Enter .05 in the Alpha box
    Select Output Range
    Enter A8 (your choice) in the Output Range box
    Click OK

Page 89: Basic concept of statistics Measures of central Measures of central tendency Measures of dispersion & variability

Using Excel’s Anova: Single Factor Tool

Value Worksheet (top portion)

     A             B         C            D         E
1    Observation   Buffalo   Pittsburgh   Detroit
2    1             48        73           51
3    2             54        63           63
4    3             57        66           61
5    4             54        64           54
6    5             62        74           56

Page 90: Basic concept of statistics Measures of central Measures of central tendency Measures of dispersion & variability

Using Excel’s Anova: Single Factor Tool

Value Worksheet (bottom portion)

     A                     B       C     D         E          F         G
8    Anova: Single Factor
9
10   SUMMARY
11   Groups                Count   Sum   Average   Variance
12   Buffalo               5       275   55        26
13   Pittsburgh            5       340   68        26.5
14   Detroit               5       285   57        24.5
15
16
17   ANOVA
18   Source of Variation   SS      df    MS        F          P-value   F crit
19   Between Groups        490     2     245       9.54545    0.00331   3.88529
20   Within Groups         308     12    25.6667
21
22   Total                 798     14
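The Excel output for the hours-worked data shows the same sums of squares and F value as the hand computation for the protein data. One way to see why: each protein value in the fish feed table equals the corresponding hours value minus 40, and subtracting a constant from every observation changes the means but not the variances or the ANOVA sums of squares. The sketch below illustrates this; the helper `one_way_anova` is a hypothetical function written for this example, not a library call.

```python
from statistics import mean, variance


def one_way_anova(groups):
    """Return (SS between, SS within, F) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return ss_between, ss_within, f_stat


# Hours worked per week: Buffalo, Pittsburgh, Detroit (worksheet cells B2:D6).
hours = [
    [48, 54, 57, 54, 62],
    [73, 63, 66, 64, 74],
    [51, 63, 61, 54, 56],
]

# Shifting every observation by the same constant reproduces the protein data.
protein = [[x - 40 for x in g] for g in hours]

hours_result = one_way_anova(hours)
protein_result = one_way_anova(protein)
print(hours_result)
print(protein_result)
```

Both calls return the same between-groups and within-groups sums of squares, so the two examples share one ANOVA table even though their group averages differ.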

Page 91: Basic concept of statistics Measures of central Measures of central tendency Measures of dispersion & variability

Using Excel’s Anova: Single Factor Tool

Using the p-Value

The value worksheet shows that the p-value is .00331.
The rejection rule is “Reject $H_0$ if p-value < .05”.
Thus, we reject $H_0$ because the p-value = .00331 < $\alpha$ = .05.
We conclude that the mean number of hours worked per week by the managers differs among the three plants.