
Tap versus Bottled Water : Can You Tell the Difference ? ESP

Jessica Gagnon-Sénat

Vanier College

December 2013


Introduction:

The debate over tap water versus bottled water is long-running. Which one is

cleaner? Which one tastes better? The subjective idea that one tastes “better” than the other

raises a question: does bottled water truly taste different from tap water, or is this taste

preference merely a bias that comes from knowing the origin of the water beforehand? This issue will be tested in

order to determine whether one can truly taste the difference between the two types of water.

R.A. Fisher is the originator of the following experiment design [NP00]. This design

arose from the claim of an English doctor, Dr. Muriel Bristol, that she could taste the difference

between tea with the milk poured in first and tea with the milk added afterwards [NP00]. Fisher was skeptical of this

claim and tested her [NP00]. Her results were not recorded, but this anecdote is the

origin of his experiment design which is presently used “to test the skills of touch therapists”

(100) “and to test subjects purported to have extrasensory perception” [NP00].

Sir Ronald Aylmer Fisher (1890-1962) was a statistician and geneticist [Enc]. With a

scholarship, Fisher attended the University of Cambridge to study mathematics [Enc]. After these

studies, in 1914, he became a high school teacher in mathematics and physics, but this ended

when he “became the statistician for the Rothamsted Experimental Station” in Hertfordshire in

1919 [Enc]. As a statistician, “Fisher introduced the principle of randomization” and was the

originator of “the concept of analysis of variance”, which is also known as ANOVA [Enc]. He

wrote several books, including The Design of Experiments (1935), where the Fisher exact test

is explained [NP00].

In this experiment, we will employ the Fisher exact test with 2x2 contingency tables. The

Fisher exact test requires the marginals (row and column sums) to remain fixed and is used

when samples are small [McD09].


Objective:

The twenty-first century opposition of tap water versus bottled water has created its fair

share of controversies. The objective of this experiment is to test if a subject can correctly

discriminate between tap water and bottled water using Fisher’s tea experiment design.

Questions:

How many cups should be used in the test? The number of samples should be large

enough for the distribution of the results to be close to normal, yet small enough not to exhaust

the subject. Due to these factors, thirty samples of water per subject were chosen.

What conclusions can be drawn from a perfect score or from one with few errors? How

many errors mark the boundary between chance and actual discriminatory skill?

Hypothesis testing with a standard 0.05 level of significance will be used, but the p-values

will also be reported. Looking ahead, at this significance level fewer than 5 mistakes

(see Table 1) are required to reject the null hypothesis that the subject has no expertise in

distinguishing tap water from bottled water.
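The threshold of fewer than 5 mistakes can be checked with a short sketch. It assumes 30 samples with 15 of each type and a subject who labels exactly 15 glasses as tap; the helper names `prob_mistakes` and `max_mistakes_to_reject` are hypothetical and not part of the original study.

```python
from math import comb

N_TOTAL = 30  # samples per subject (from the text)
N_TAP = 15    # assumed: half of the samples are tap water


def prob_mistakes(m: int) -> float:
    """Probability of exactly m mistakes among the tap samples under random guessing."""
    return comb(N_TAP, N_TAP - m) * comb(N_TOTAL - N_TAP, m) / comb(N_TOTAL, N_TAP)


def max_mistakes_to_reject(alpha: float = 0.05) -> int:
    """Largest number of mistakes whose one-sided p-value is still <= alpha."""
    cumulative = 0.0
    largest = -1  # -1 means that even a perfect score would not be significant
    for m in range(N_TAP + 1):
        cumulative += prob_mistakes(m)  # P(at most m mistakes)
        if cumulative > alpha:
            break
        largest = m
    return largest


print(max_mistakes_to_reject(0.05))
```

Calling `max_mistakes_to_reject(0.10)` shows how the boundary moves when a more lenient significance level is used.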

Instruction sheet for experimental process: At first, it was difficult to know how the test

would be implemented. One early idea was for each glass to have

a folded paper at the bottom with its actual type of water written on it. After further

consideration, plastic shot glasses labelled A1, A2, A3,..., A10 for Subject A and

B1, B2, B3,..., B10 for Subject B were chosen instead. The samples were to be separated in groups of

ten in order to give the subject a little rest and to avoid using too many glasses and causing

confusion. Before conducting the experiment, each labelled shot glass had to be allotted tap or

bottled water randomly using a flip of a coin or simply a “random” selection made by the

instructor. Once that has been done and the proper type of water is poured into the glasses, the


instructor, who does not face the subject, asks the subject to identify each labelled shot glass and

records their responses. In addition, it is to be noted that the tap water used was Laval tap water

and the bottled water used was of the “Naya” brand. After the data are collected,

hypothesis testing based on the hypergeometric distribution and, additionally, the

approximate test will be used to draw a conclusion about the subjects’ expertise in the

differentiation of tap and bottled water.
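The allocation step can be sketched in code. The study describes a coin flip or the instructor's own selection; the sketch below swaps in a random shuffle instead, on the assumption that exactly 15 of the 30 glasses should hold each type of water so that the marginals stay fixed. The function name `allocate_water` and its parameters are hypothetical.

```python
import random


def allocate_water(subject="A", rounds=3, glasses_per_round=10, seed=None):
    """Return a list of (round, label, water) with equal numbers of tap and bottled."""
    n_glasses = rounds * glasses_per_round
    n_tap = n_glasses // 2  # assumption: half tap, half bottled
    contents = ["tap"] * n_tap + ["bottled"] * (n_glasses - n_tap)
    rng = random.Random(seed)
    rng.shuffle(contents)  # random order, fixed totals
    plan = []
    for i, water in enumerate(contents):
        round_no = i // glasses_per_round + 1
        label = f"{subject}{i % glasses_per_round + 1}"  # A1..A10 reused each round
        plan.append((round_no, label, water))
    return plan


for round_no, label, water in allocate_water("A", seed=2013):
    print(round_no, label, water)
```

A shuffle is used rather than independent coin flips because coin flips would not guarantee 15 glasses of each type.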

Mathematical Explanation:

Hypothesis testing

In order to decide if tap and bottled water are distinguishable by the subjects, hypothesis testing

will be used. A statistical hypothesis is “a statement about the parameter of one or more

populations” [MR11]. In other words, it is an assumption that a certain parameter will be of a

certain value. The hypothesis being tested is called the null hypothesis and is denoted by H0. The

alternative hypothesis, denoted by H1, is the competing claim, which can be one-sided (an

inequality, < or >) or two-sided (≠).

In the case of this experiment, our null hypothesis is that the subject is not able to discriminate

between tap and bottled water (subject will guess randomly) which implies that our alternative

hypothesis is that the subject can identify by taste which water is bottled and which is tap. It has

been decided that the test will have a 0.05 level of significance, which means that the

probability of a type I error is at most 0.05. A type I error occurs when the null hypothesis is rejected although

it is in fact true.


Hypergeometric Distribution

Under the null hypothesis that the subjects have no expertise at

differentiating tap water from bottled water, the hypergeometric distribution is a good fit to analyze

the results. This distribution applies when sampling is done without replacement, so the trials

are not independent [MR11]. Furthermore, its probabilities are computed from combinations.

In a set of N objects, K are successes, which implies that N-K are failures. If the

number of successes in a sample of size n is denoted by the random variable X, then the

probability mass function of the hypergeometric distribution is:

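$$p(X=x)=p(x)=\frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}$$

The mean (μ) and variance (σ²) of the hypergeometric distribution are:

$$\mu=np$$

$$\sigma^2=np(1-p)\left(\frac{N-n}{N-1}\right)$$

where $p=\frac{K}{N}$.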
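As a minimal sketch, the probability mass function and the mean and variance formulas of the hypergeometric distribution, μ = np and σ² = np(1-p)(N-n)/(N-1) with p = K/N, can be written in a few lines and cross-checked against each other. The function names are hypothetical, and the values N = 30, K = 15, n = 15 are the ones used later in this experiment.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, K: int, n: int) -> float:
    """P(X = x): x successes in a sample of n drawn without replacement
    from N objects of which K are successes."""
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


def hypergeom_mean_var(N: int, K: int, n: int):
    """Mean and variance from the closed-form formulas."""
    p = K / N
    mean = n * p
    var = n * p * (1 - p) * (N - n) / (N - 1)
    return mean, var


N, K, n = 30, 15, 15
mean, var = hypergeom_mean_var(N, K, n)

# Cross-check: the same moments computed directly from the pmf.
mean_direct = sum(x * hypergeom_pmf(x, N, K, n) for x in range(n + 1))
var_direct = sum((x - mean_direct) ** 2 * hypergeom_pmf(x, N, K, n) for x in range(n + 1))

print(mean, var)
print(mean_direct, var_direct)
```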

An Approximate Test

If data for large samples is recorded, the normal approximation of the hypergeometric

distribution becomes suitable. Also, depending on how large the samples are, it can be very time-consuming

to compute the necessary “p-values for Fisher’s exact test”, which is why the approximate test is

useful [NP00]. It must be stressed that, to use the approximation effectively, the marginals must be

kept constant throughout the samples. Since the Z value is obtained by computing

$$Z=\frac{x-\mu}{\sigma}$$

the expected value and the standard deviation of the hypergeometric distribution must be calculated.


In addition, if the samples are not very large, a continuity correction of ±0.5, depending on

whether the probabilities are exclusive or inclusive, should be applied to the equation for Z

[NP00]. The following table displays the experimental design that will be used. In this table

the values a, b, c and d are counts and the sums are the marginals. The total number of samples

used is denoted by N.

                       WATER
                   Tap    Bottled
Subject  Tap        a        c       a+c
says     Bottled    b        d       b+d
                   a+b      c+d       N

Using the formulas stated previously and considering the arbitrary table of values, we can

determine that

The mean denotes the number of mistakes (or correct guesses) expected from a subject who guesses at random. Knowing

these equations, we can solve for the Z-value.


Once the equation for Z is known, the approximation can easily be computed by replacing the

marginal values in the equation and applying the continuity correction to the “x” for intermediate

sized samples.
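The approximate test described above can be sketched as follows. This is an illustration under stated assumptions, not the author's computation: it takes the counts a, b, c, d of the 2x2 table, uses the hypergeometric variance σ² = np(1-p)(N-n)/(N-1) with K = a+b and n = a+c, and subtracts 0.5 from the observed count as the continuity correction for an upper-tail probability. The function names are hypothetical. The approximate p-values quoted later in this report may have been obtained with differently rounded intermediate values, so small differences are to be expected.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def approximate_p_value(a: int, b: int, c: int, d: int):
    """Normal approximation to P(X >= a) with fixed marginals.

    a+b samples are truly tap (column total) and the subject
    says "tap" a+c times (row total), as in the 2x2 table.
    """
    N = a + b + c + d
    K = a + b
    n = a + c
    p = K / N
    mu = n * p
    sigma = sqrt(n * p * (1 - p) * (N - n) / (N - 1))
    z = (a - 0.5 - mu) / sigma  # continuity correction for an upper tail
    return z, 1.0 - normal_cdf(z)


z, p_approx = approximate_p_value(10, 5, 5, 10)
print(f"Z = {z:.4f}, approximate p-value = {p_approx:.4f}")
```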


Probabilities:

The following table gives the probabilities of a subject making a certain number of mistakes for one

type of water. Since there are 30 samples of water, 15 of each type (tap or bottled),

at most 15 mistakes can be made. The probabilities were

computed using the hypergeometric distribution in order to test our hypothesis that the subjects

are unable to identify with skill the tap water and bottled water.

The Probability of each Possible Situation [Table 1]

Number of mistakes   Number of correct   Probability
 0                   15                  6.44673E-09
 1                   14                  1.45051E-06
 2                   13                  7.10751E-05
 3                   12                  0.001334633
 4                   11                  0.012011699
 5                   10                  0.058136624
 6                    9                  0.161490623
 7                    8                  0.266953888
 8                    7                  0.266953888
 9                    6                  0.161490623
10                    5                  0.058136624
11                    4                  0.012011699
12                    3                  0.001334633
13                    2                  7.10751E-05
14                    1                  1.45051E-06
15                    0                  6.44673E-09


For instance, if a subject makes 5 mistakes (10 correct), the probability is calculated as follows:

$$p(10)=\frac{\binom{15}{10}\binom{15}{5}}{\binom{30}{15}}=0.058136624$$

There are 30 samples of water, of which 15 are actually of a given type. To put it differently, there are 15 instances of tap water

in the sample of 30 waters which explains the denominator. Furthermore, in this example, out of

15 samples of tap water, 10 were correctly identified which indicates that 5 samples that were

classified as tap water were, in reality, part of the 15 samples of bottled water. The

probability of 10 being correctly identified (or of making 5 mistakes; both have identical

probabilities) is obtained by substituting 10 for x.
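The worked example and the rest of Table 1 can be reproduced with a short sketch. It assumes 30 samples, 15 of them tap water, and a subject who calls exactly 15 samples tap; the function name `prob_correct` is hypothetical.

```python
from math import comb

N_SAMPLES = 30  # total samples per subject
N_TAP = 15      # samples that are actually tap water


def prob_correct(x: int) -> float:
    """Probability that exactly x of the 15 tap samples are correctly identified
    when the subject guesses at random."""
    return comb(N_TAP, x) * comb(N_SAMPLES - N_TAP, N_TAP - x) / comb(N_SAMPLES, N_TAP)


print("mistakes  correct  probability")
for mistakes in range(N_TAP + 1):
    correct = N_TAP - mistakes
    print(f"{mistakes:8d}  {correct:7d}  {prob_correct(correct):.9g}")
```

Because the table is symmetric, `prob_correct(10)` and `prob_correct(5)` return the same value.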

Data:

The experimental results were recorded on the understanding that the marginals of the 2x2

contingency tables would remain constant. In the first experimental design, the tap water and

the bottled water are at approximately the same temperature. This

experimental design has 30 samples for each subject. In the second experimental design, the temperature

is not held constant and, once again, there are 30 samples for each subject. Both

experimental designs were recorded since the tap water had acquired a particular taste when it

was refrigerated in the same place as the bottled water for 24 hours. Therefore, to avoid skewed

results, a second, similar experiment was done.


Counts of Water Correctly Identified as Tap Water with Approximate Temperature Constant

Subject A

                       WATER
                   Tap    Bottled
Subject  Tap        10       5        15
says     Bottled     5      10        15
                    15      15        30

It is observed from the table that Subject A has correctly identified 10 of the 15 tap water

samples. Referring to Table 1, the p-value for five errors is 0.071555

(p-value=0.058136624+0.012011699+0.001334633+7.10751E-05+1.45051E-06+6.44673E-09

=0.071555), which is the probability of making 5 mistakes or fewer, computed as the sum of the

probabilities of zero through five errors. Since the significance level is 0.05 and the p-value

obtained is greater than this, we fail to reject the null hypothesis that the subject is unable to tell tap water

from bottled water. Nonetheless, the p-value is close to the significance level,

so it is best to say that Subject A is marginally competent at differentiating tap water from

bottled water.
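The sum above is a one-sided tail probability, and a minimal sketch can compute it directly from the hypergeometric counts. It assumes 30 samples with 15 of each type; the function name `p_value_at_most` is hypothetical. The values for 4 and 3 mistakes correspond to the other p-values reported in this section.

```python
from math import comb


def p_value_at_most(mistakes: int, n_total: int = 30, n_tap: int = 15) -> float:
    """One-sided p-value: probability of making at most `mistakes` mistakes
    among the tap samples when guessing at random."""
    favourable = sum(
        comb(n_tap, n_tap - m) * comb(n_total - n_tap, m)
        for m in range(mistakes + 1)
    )
    return favourable / comb(n_total, n_tap)


for m in (5, 4, 3):
    print(m, round(p_value_at_most(m), 6))
# prints:
# 5 0.071555
# 4 0.013419
# 3 0.001407
```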

Subject B

                       WATER
                   Tap    Bottled
Subject  Tap        12       3        15
says     Bottled     3      12        15
                    15      15        30

Subject B has correctly identified 12 of the 15 tap water samples, which results in a p-value of 0.001407. This very

small p-value leads to the rejection of the null hypothesis at the 0.05 level of significance. This

implies that, with constant temperature, Subject B has shown expertise in differentiating tap

water from bottled water.


Counts of Water Correctly Identified as Tap Water Without Temperature Constant

Subject A

                       WATER
                   Tap    Bottled
Subject  Tap        11       4        15
says     Bottled     4      11        15
                    15      15        30

Without temperature as a controlled variable, Subject A made only 4 mistakes, leading to a

rejection of the null hypothesis at the 0.05 level of significance, since the p-value is

0.013419. It can, therefore, be said that the subject shows expertise in correctly identifying tap

water and bottled water.

Subject B

                       WATER
                   Tap    Bottled
Subject  Tap        10       5        15
says     Bottled     5      10        15
                    15      15        30

With the second experimental design, the p-value for the five errors that Subject B

made is 0.071555. This p-value exceeds α=0.05 and, therefore, we fail to reject the null hypothesis that the

subject is unable to distinguish tap water from bottled water.

The results suggest a possible connection between the temperature of the waters and the ability to identify

the water type. In this experiment, Subject A had better results when the temperature was not held

constant, whereas Subject B was better at identifying the type of water by taste when the

temperature was constant. Nonetheless, in both cases, it can be concluded with leniency that the

subjects were able to distinguish the bottled water from the tap water. The probabilities suggest


that it is uncommon to guess on every sample and get as many correct as both subjects, A and B,

have. Subjects A and B made at most 5 mistakes in either experimental design and, even though

the p-value associated with 5 mistakes is over 0.05, the value of 0.071555 is very close. On this lenient reading,

we reject our null hypothesis of the subjects not being able to discriminate between tap and

bottled water and, therefore, accept our alternative hypothesis of the subjects having a skill

enabling them to differentiate tap from bottled water.

Approximate Test

It has been established earlier that for this experiment, a normal approximation can be computed,

especially for large samples. Although the sample size of this experiment does not guarantee a

reliable approximation, the approximation will be tested. Previously, using an arbitrary 2x2 contingency table

with marginals a+b, a+c, b+d, c+d, it was determined that

Since we are taking into account that a+b=a+c=b+d=c+d=15 and N=30 in this experiment, we

can conclude, using the approximate test, that

Therefore, knowing these values, we can deduce that


To compute the Z-value with the data recorded from the experiment, the continuity correction

must be applied. With this in mind, 10 correct “guesses” will be corrected to 9.5, 11 correct

“guesses” will be corrected to 10.5 and 12 correct “guesses” would be corrected to 11.5.

The probability associated with this Z-score is 0.92784122 which implies that the probability of

making at most 5 mistakes is approximated by the normal distribution as 0.07215878. The exact

p-value found by using the hypergeometric distribution through Fisher’s exact test is 0.071555.

The two probabilities are very close; rounded to

the nearest thousandth, the values are identical.

The probability associated with this Z-score is 0.98573063 which implies that the p-value for

making at most 4 mistakes is approximated by the normal distribution as 0.01426937. The exact p-value is

0.013419. The values resemble each other and the same conclusion of rejecting the null

hypothesis will be deduced.

The probability associated with this Z-score is 0.99824816 which implies that the probability of

making 3 mistakes or less is normally approximated to 0.00175184. The p-value computed using

the hypergeometric distribution was 0.001407. These two p-values are further apart from one

another compared to the previous ones, but, once again, the same conclusion is reached

regarding the hypothesis testing.


Conclusion

The Fisher exact test was applied in this experiment to assess the subjects’ ability to

perceive water type by taste. In this experiment, two subjects, A and B, were tested by

tasting 30 samples of water in experimental design 1, in which the temperature was held constant,

and 30 samples of water in experimental design 2, in which the temperature was not

controlled. After tasting each sample, the subjects stated which type of water they believed it was:

tap or bottled. Conclusions were drawn by analyzing the results

with Fisher’s exact test, based on the hypergeometric distribution and

hypothesis testing, as well as with the normal approximation. The approximate

p-values were similar to the ones obtained with the hypergeometric distribution.

Furthermore, at the standard significance level of 0.05, the null hypothesis could not be rejected

for one subject in each experimental design. Nonetheless, it could be leniently concluded that the

subjects were able to differentiate the tap water from the bottled water since, in the first

experimental design, Subject B showed expertise whereas Subject A did not, and the opposite

occurred in the second experimental design. Also, had a higher level of

significance such as 0.10 been chosen, the null hypothesis would have been rejected for both subjects in both designs. Therefore, subjectively,

the experimental results show the subjects’ ability to differentiate tap water from bottled water, yet

mathematically, using a significance level of 0.05, the expertise varies between subjects and with

whether the temperature was controlled.


References

[Enc] Editors of Encyclopaedia. “Sir Ronald Aylmer Fisher”. Encyclopaedia Britannica.

Encyclopaedia Britannica, n.d. Web. 20 Dec. 2013.

[McD09] McDonald, J.H. Handbook of Biological Statistics (2nd ed.). Baltimore: Sparky House

Publishing, 2009. 70-75. Print.

[MR11] Montgomery, Douglas C., Runger, George C. Applied Statistics and Probability for

Engineers (5th ed.). United States: John Wiley & Sons, 2011. Print.

[NP00] Nolan, Deborah, Speed, Terry. “Can She Taste the Difference?” Stat Labs:

Mathematical Statistics Through Applications. New York: Springer-Verlag, 2000. 99-117. Print.
