assignment3 pizza(03)


Upload: arka-chakraborty

Post on 21-Jul-2016


PROBLEM

Bill Afantenou, a second-year Statistics graduate student at Queensland University of Technology, conducted an experiment to find out how three choices affect the time it takes for a pizza to be delivered to the front door of his house: whether he ordered thick or thin crust, whether Coke was ordered with the pizza, and whether garlic bread was ordered with the pizza. Being a poor graduate student with limited time, he decided to have only two replicates, just enough to get a reasonable estimate of the variance. He also tried to repeat the experiment under conditions as nearly identical as possible to reduce “noise”. He ordered the pizza from the same shop, namely Domino’s Pizza. To be consistent, he ordered a Supreme pizza each time at approximately the same time of day. The response was measured from the time he hung up the telephone to the time the pizza was delivered to the front door of his house. Bill wrote each of the eight treatments on a piece of paper twice, put them all into a hat, mixed them up, and drew them out one at a time to determine the order in which the treatments were run. The three qualitative independent variables are Crust (Thin=0, Thick=1), Coke (No=0, Yes=1), and Bread (garlic bread: No=0, Yes=1); the response variable is Delivery time, in minutes. Analyze the data and summarize your findings.

SOLUTION

Identification of Variables

To start the analysis, the first step is to list the independent and dependent variables. The following table shows the variables involved in the given problem:

Variable Name   Type of Variable   Nature of Variable   Levels of the Variable
Crust           Independent        Qualitative          Thick Crust; Thin Crust
Coke            Independent        Qualitative          Ordered with Pizza; Not Ordered with Pizza
Garlic Bread    Independent        Qualitative          Ordered with Pizza; Not Ordered with Pizza
Delivery time   Dependent          Quantitative         N.A.

There are a total of 8 possible treatment combinations of the above three factors, with two replicates of each treatment, giving 16 runs.
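The treatment structure and the "names in a hat" randomization can be sketched as follows. This is only an illustrative sketch: the seed value is a hypothetical choice for reproducibility, and the run order it produces is not the order Bill actually used.

```python
from itertools import product
import random

# Factor levels as coded in the problem statement.
factors = {
    "Crust": ["Thin", "Thick"],
    "Coke": ["No", "Yes"],
    "Bread": ["No", "Yes"],
}

# All 2 x 2 x 2 = 8 treatment combinations.
treatments = list(product(*factors.values()))

# Two replicates of each treatment, i.e. each slip of paper written twice.
REPLICATES = 2
runs = treatments * REPLICATES

# Shuffle the slips; the seed is an assumption, used only so the sketch is repeatable.
rng = random.Random(2016)
rng.shuffle(runs)

for order, (crust, coke, bread) in enumerate(runs, start=1):
    print(f"Run {order:2d}: Crust={crust}, Coke={coke}, Bread={bread}")
```

Shuffling the full list of 16 slips, rather than shuffling within replicates, matches the completely randomized design described in the problem.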

1Applied Statistics Assignment - III


Visualization of Data

Scatter Plot

Box Plot

Observations from Box plot and Scatter Plot:

The order with thick crust and Coke but without garlic bread takes longer than all other treatments.

The treatment with thin crust, Coke and garlic bread has the shortest delivery time.

All remaining treatments differ only slightly from one another.

Model for Analysis

Next, we analyze whether these and the remaining treatments differ significantly, and how each factor affects delivery time. Since the experiment follows a completely randomized design, the analysis can start from the following model:

$Y_{ijkl} = \mu + \alpha_i + \beta_j + \psi_k + (\alpha\beta)_{ij} + (\psi\alpha)_{ki} + (\beta\psi)_{jk} + (\alpha\beta\psi)_{ijk} + \varepsilon_{ijkl}$

where
$Y_{ijkl}$ = observed value,
$\mu$ = overall mean,
$\alpha_i$ = main effect of crust,
$\beta_j$ = main effect of coke,
$\psi_k$ = main effect of bread,
$(\alpha\beta)_{ij}$ = effect of interaction between crust and coke,
$(\psi\alpha)_{ki}$ = effect of interaction between bread and crust,
$(\beta\psi)_{jk}$ = effect of interaction between coke and bread,
$(\alpha\beta\psi)_{ijk}$ = effect of interaction between crust, coke and bread,
$\varepsilon_{ijkl}$ = residuals, which are i.i.d. $N(0, \sigma^2)$.


Analysis of Variance (ANOVA):

First, we check whether the variation in the data set is attributable to the treatments or to chance alone:

Analysis of Variance Table

Source      Degrees of Freedom   Sum of Squares   Mean Square   F-Value   P-Value
Treatment   7                    48.938           6.9911        4.4743    0.02586
Residuals   8                    12.5             1.5625

Inference: Since the p-value (0.02586) is below 0.05, the variation observed in the data set is unlikely to be due to chance alone. Thus the treatment has a significant effect on the delivery time of the pizza.
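The mean squares and F-value in the overall ANOVA table follow directly from the reported sums of squares and degrees of freedom. A minimal arithmetic check, using only the figures from the table (the variable names are illustrative):

```python
# Figures taken from the overall ANOVA table.
ss_treatment, df_treatment = 48.938, 7
ss_residual, df_residual = 12.5, 8

# Mean square = sum of squares / degrees of freedom.
ms_treatment = ss_treatment / df_treatment
ms_residual = ss_residual / df_residual

# F-value = treatment mean square / residual mean square.
f_value = ms_treatment / ms_residual

print(round(ms_treatment, 4), round(ms_residual, 4), round(f_value, 4))
# prints: 6.9911 1.5625 4.4743
```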

Next, the sources of variation for the individual factors are analyzed using the model $Y_{ijkl} = \mu + \alpha_i + \beta_j + \psi_k + (\alpha\beta)_{ij} + (\psi\alpha)_{ki} + (\beta\psi)_{jk} + (\alpha\beta\psi)_{ijk} + \varepsilon_{ijkl}$, and the following results were obtained:

Analysis of Variance Table

Source             Degrees of Freedom   Sum of Squares   Mean Square   F-Value   P-Value
Crust              1                    18.0625          18.0625       11.56     0.00936
Coke               1                    0.5625           0.5625        0.36      0.56511
Bread              1                    18.0625          18.0625       11.56     0.00936
Crust:Coke         1                    10.5625          10.5625       6.76      0.03162
Crust:Bread        1                    0.0625           0.0625        0.04      0.84647
Coke:Bread         1                    1.5625           1.5625        1.00      0.34659
Crust:Coke:Bread   1                    0.0625           0.0625        0.04      0.84647
Residual           8                    12.5000          1.5625
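As a consistency check, the seven single-degree-of-freedom sums of squares should add up to the treatment sum of squares of the overall table, and each F-value should equal its mean square divided by the residual mean square. A short sketch using only the tabulated values (the dictionary layout is an illustrative choice):

```python
# Sums of squares from the factorial ANOVA table (each term has 1 degree of freedom).
ss = {
    "Crust": 18.0625,
    "Coke": 0.5625,
    "Bread": 18.0625,
    "Crust:Coke": 10.5625,
    "Crust:Bread": 0.0625,
    "Coke:Bread": 1.5625,
    "Crust:Coke:Bread": 0.0625,
}
ms_residual = 12.5 / 8  # residual mean square = 1.5625

# The factorial terms partition the 7-df treatment sum of squares.
ss_treatment_total = sum(ss.values())
print("Treatment SS:", ss_treatment_total)

# With 1 df per term, the mean square equals the sum of squares.
f_values = {term: value / ms_residual for term, value in ss.items()}
for term, f in f_values.items():
    print(f"{term:18s} F = {f:.2f}")
```

The total, 48.9375, agrees with the 48.938 reported for Treatment in the first table after rounding.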


Inference:

Main effect of Crust is significant.
Main effect of Bread is significant.
Main effect of Coke is insignificant; however, the two-way interaction of Crust with Coke is significant.
Two-way interactions involving Bread are insignificant.
Three-way interaction is insignificant.

The assumed model therefore has to be refined by dropping the terms with insignificant effects. The refined model becomes:

$Y_{ijkl} = \mu + \alpha_i + \beta_j + \psi_k + (\alpha\beta)_{ij} + \varepsilon_{ijkl}$

Note: Although the main effect of Coke is insignificant, the Hierarchy Principle prevents us from eliminating it from the model, because Coke appears in the significant Crust:Coke interaction.

Analysis of Variance (ANOVA) for Refined Model :

The source of variation in the dataset analyzed using the refined model is as follows:

Analysis of Variance Table

Source       Degrees of Freedom   Sum of Squares   Mean Square   F-Value   P-Value
Crust        1                    18.0625          18.0625       14.0044   0.003253
Coke         1                    0.5625           0.5625        0.4361    0.522588
Bread        1                    18.0625          18.0625       14.0044   0.003253
Crust:Coke   1                    10.5625          10.5625       8.1894    0.015469
Residual     11                   14.1875          1.2898
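The refined table can be reproduced from the full-model table by pooling the three dropped interaction terms into the residual. The sketch below assumes this pooling is how the refined residual arises, which the reported figures (14.1875 on 11 degrees of freedom) are consistent with:

```python
# Terms dropped from the full model, with their sums of squares (1 df each).
dropped = {"Crust:Bread": 0.0625, "Coke:Bread": 1.5625, "Crust:Coke:Bread": 0.0625}

# Pool the dropped terms into the full-model residual (SS = 12.5, df = 8).
ss_residual = 12.5 + sum(dropped.values())
df_residual = 8 + len(dropped)
ms_residual = ss_residual / df_residual

# Retained terms keep their sums of squares; only the denominator changes.
retained = {"Crust": 18.0625, "Coke": 0.5625, "Bread": 18.0625, "Crust:Coke": 10.5625}
f_values = {term: value / ms_residual for term, value in retained.items()}

print(ss_residual, df_residual, round(ms_residual, 4))
for term, f in f_values.items():
    print(f"{term:12s} F = {f:.4f}")
```

Because the pooled residual mean square (about 1.2898) is smaller than the full-model value (1.5625), the F-values of the retained terms increase.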

Inference:

Main effect of Crust is significant (as before).
Main effect of Bread is significant (as before).
There is a significant two-way interaction between Crust and Coke, so the effects of these two factors on delivery time can only be interpreted from their interaction plot.


Validation of Underlying Assumptions

The following assumptions are checked:

Normality of the residuals
Homoscedasticity of the residuals with respect to each factor

Residuals Vs Fitted value and Normalized Q – Q Plot for Residuals:

Formal Test for Normality:

Method                                           P-Value
Shapiro-Wilk normality test                      0.3607613
Anderson-Darling normality test                  0.4712468
Cramer-von Mises normality test                  0.5079630
Lilliefors (Kolmogorov-Smirnov) normality test   0.6053916
Shapiro-Francia normality test                   0.5500471


Inference:

The residuals plotted against the fitted values are scattered randomly without any pattern, which supports the adequacy of the model.

None of the tests rejects normality (all p-values are well above 0.05), and the Q-Q plot of the standardized residuals is consistent with normally distributed residuals.

Test For Homoscedasticity:

Homoscedasticity of the residuals is checked using Bartlett’s test. The results for each factor are:

For Crust: Bartlett's K-squared = 0.8209, df = 1, p-value = 0.3649
For Coke: Bartlett's K-squared = 0.2276, df = 1, p-value = 0.6333
For Bread: Bartlett's K-squared = 0.2061, df = 1, p-value = 0.6499

The high p-values indicate that the differences in residual variance between the levels of each factor can be attributed to chance, so the null hypothesis of homoscedasticity of the residuals cannot be rejected.

Multiple Comparison :

Now, the effects of the different factor levels on delivery time are studied by multiple comparison using Tukey’s HSD method for the refined model.

Tukey multiple comparisons of means
95% family-wise confidence level

Fit: aov(Delivery Time ~ Crust * Coke + Bread)

Sl no   Comparison                                            Difference   Lower        Upper         P-adj
1       Thick Crust – Thin Crust                              2.125        0.8751908    3.374809      0.0032535
2       With Coke – Without Coke                              -0.375       -1.624809    0.8748092     0.5225883
3       With Garlic Bread – Without Garlic Bread              -2.125       -3.374809    -0.8751908    0.0032535
4       Thick Crust Without Coke – Thin Crust Without Coke    0.50         -1.9168119   2.91681188    0.9226704
5       Thin Crust With Coke – Thin Crust Without Coke        -2.00        -4.4168119   0.41681188    0.1166394
6       Thick Crust With Coke – Thin Crust Without Coke       1.75         -0.6668119   4.16681188    0.1887960
7       Thin Crust With Coke – Thick Crust Without Coke       -2.50        -4.9168119   -0.08318812   0.0420753
8       Thick Crust With Coke – Thick Crust Without Coke      1.25         -1.1668119   3.66681188    0.4396923
9       Thick Crust With Coke – Thin Crust With Coke          3.75         1.3331881    6.16681188    0.0032453
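The first three differences in the Tukey table (the main-effect comparisons) can be cross-checked against the ANOVA sums of squares. For a balanced two-level factor in N runs, the sum of squares equals N·d²/4, where d is the difference between the two level means, so |d| = sqrt(4·SS/N). The sketch below applies this identity with N = 16; it recovers only the magnitude of each difference, not its sign.

```python
import math

N_RUNS = 16  # 8 treatments x 2 replicates

# Main-effect sums of squares from the ANOVA table.
ss_main = {"Crust": 18.0625, "Coke": 0.5625, "Bread": 18.0625}

# Balanced two-level factor: SS = N * d**2 / 4, hence |d| = sqrt(4 * SS / N).
abs_diff = {factor: math.sqrt(4 * ss / N_RUNS) for factor, ss in ss_main.items()}

for factor, d in abs_diff.items():
    print(f"{factor}: |difference in level means| = {d:.3f} min")
```

The magnitudes 2.125, 0.375 and 2.125 minutes agree with rows 1 to 3 of the Tukey table.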

Inference: The following comparisons show a significant effect on the delivery time of the pizza:

Thin crust takes less delivery time than thick crust (even with Coke).
Thin crust with Coke takes less time than thick crust without Coke.
With garlic bread, delivery time is less than without garlic bread (no interaction with Coke or Crust).

Interaction Plots:

Interaction plot between Crust and Coke: From the ANOVA table we can clearly see that the two factors, Coke and Crust, interact; hence we plot their interaction.

Conclusions:

o Statistical evidence suggests that thin crust pizzas ordered with Coke have a significantly lower delivery time than thick crust pizzas ordered with or without Coke (refer to Sl no 7 and 9 of the Tukey HSD comparison table above).

Effect of Garlic Bread: As mentioned before, garlic bread has a statistically significant effect on delivery time. To understand this effect, we plot the three-way interaction diagram:

Here, we can interpret the Tukey HSD result for garlic bread by comparing the mean delivery time when it is ordered with the mean when it is not ordered; the two are statistically different. The same can also be observed from a simple box plot:

Conclusions:

o If garlic bread is ordered then the delivery time is reduced.

Final Summary:

From Bill Afantenou’s experimental data, we can draw the following statistical inferences:

The delivery time gets reduced on ordering garlic bread as compared to the case when it is not ordered.

The delivery time gets reduced on ordering thin crust pizza as compared to thick crust one.

The effect of Coke cannot be stated independently because of its anti-synergistic interaction with the type of crust.

Among the eight treatment groups, the order of thin crust with Coke and garlic bread takes significantly less time than thick crust with Coke and no garlic bread, which is consistent with what was observed initially from the box plot.