Tap versus Bottled Water: Can You Tell the Difference? ESP
Jessica Gagnon-Sénat
Vanier College
December 2013
Introduction:
There is a long-running debate over tap water versus bottled water. Which one is
cleaner? Which one tastes better? The subjective idea that one tastes “better” than the other
raises a question: does bottled water truly taste different from tap water, or does the preference
come only from knowing the origin of the water beforehand? This experiment tests
whether one can truly taste the difference between the two types of water.
R.A. Fisher is the originator of the following experimental design [NP00]. The design
arose from a claim by an English doctor, Dr. Muriel Bristol, that she could taste the difference
between tea which had the milk poured in first or after [NP00]. Fisher was skeptical of this
statement and tested her [NP00]. Her results were not recorded, but this anecdote is the
origin of his experimental design, which is presently used “to test the skills of touch therapists”
as well as “to test subjects purported to have extrasensory perception” [NP00, p. 100].
Sir Ronald Aylmer Fisher (1890-1962) was a statistician and geneticist [Enc]. With a
scholarship, Fisher attended the University of Cambridge to study mathematics [Enc]. After these
studies, in 1914, he became a high school teacher of mathematics and physics, a position that ended
when he “became the statistician for the Rothamsted Experimental Station” in Hertfordshire in
1919 [Enc]. As a statistician, “Fisher introduced the principle of randomization” and was the
originator of “the concept of analysis of variance”, also known as ANOVA [Enc]. He
wrote several books, including The Design of Experiments (1935), in which the Fisher exact test
is explained [NP00].
In this experiment, we will employ the Fisher exact test with 2x2 contingency tables. The
Fisher exact test requires the marginals (the row and column sums) to remain fixed and is used
when working with small samples [McD09].
Objective:
The twenty-first century opposition of tap water versus bottled water has created its fair
share of controversies. The objective of this experiment is to test if a subject can correctly
discriminate between tap water and bottled water using Fisher’s tea experiment design.
Questions:
How many cups should be used in the test? The number of samples should be large
enough that the distribution of outcomes is close to normal, without torturing
the subject. With these factors in mind, thirty samples of water per subject were chosen.
What conclusions can be drawn from a perfect score or from one with few errors? How
many errors mark the boundary between chance and actual discriminatory skill?
Hypothesis testing with a standard 0.05 level of significance will be used, but the p-values
will also be reported. Looking ahead, at this significance level fewer than 5 mistakes
[see Table 1] are required to reject the null hypothesis that the subject has no expertise in
distinguishing tap water from bottled water.
Instruction sheet for experimental process: At first, it was difficult to know how the test
was going to be implemented. At one point, each glass was going to have
a folded paper at the bottom with its actual type of water written on it. After further
consideration, plastic shot glasses labelled A1, A2, A3,..., A10 for Subject A and
B1, B2, B3,..., B10 for Subject B were used instead. The samples were separated into groups of
ten in order to give the subject a little rest and to avoid using too many glasses and causing
confusion. Before conducting the experiment, each labelled shot glass was allotted tap or
bottled water randomly, using a flip of a coin or simply a “random” selection made by the
instructor. Once this was done and the proper type of water was poured into the glasses, the
instructor, who did not face the subject, asked the subject to identify each labelled shot glass and
recorded the responses. The tap water used was Laval tap water
and the bottled water used was of the “Naya” brand. Following the acquisition of the data,
hypothesis testing based on the hypergeometric distribution and, additionally, the
approximate test were used to draw a conclusion about the subjects’ expertise in
differentiating tap and bottled water.
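The allocation step can be sketched in Python. This is only a sketch under an assumption that the instruction sheet does not state: because Fisher's exact test needs fixed marginals, exactly 15 glasses receive tap water and 15 receive bottled water, and the order is shuffled rather than decided glass by glass with a coin. The function name `assign_waters` and the (round, label, water) layout are hypothetical.

```python
import random

def assign_waters(subject="A", rounds=3, glasses_per_round=10, seed=None):
    """Return a list of (round, label, water) with equal numbers of each water.

    Hypothetical helper: shuffling a fixed pool keeps the split at 15/15,
    which independent coin flips would not guarantee.
    """
    rng = random.Random(seed)
    total = rounds * glasses_per_round
    pool = ["tap"] * (total // 2) + ["bottled"] * (total - total // 2)
    rng.shuffle(pool)
    plan = []
    for i, water in enumerate(pool):
        round_no = i // glasses_per_round + 1
        # Labels A1..A10 are reused in every round of ten glasses.
        label = f"{subject}{i % glasses_per_round + 1}"
        plan.append((round_no, label, water))
    return plan

plan = assign_waters("A", seed=2013)
for round_no, label, water in plan[:3]:
    print(round_no, label, water)
```

Passing a seed makes the allocation reproducible, so the instructor's answer key can be regenerated later.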
Mathematical Explanation:
Hypothesis testing
In order to decide if tap and bottled water are distinguishable by the subjects, hypothesis testing
will be used. A statistical hypothesis is “a statement about the parameter of one or more
populations” [MR11]. In other words, it is an assumption that a certain parameter takes a
certain value. The hypothesis under test is called the null hypothesis and is denoted by H0. The
alternative hypothesis, denoted by H1, is the competing claim, which can be one-sided (an
inequality, < or >) or two-sided (≠).
In this experiment, our null hypothesis is that the subject is not able to discriminate
between tap and bottled water (the subject guesses randomly), and our alternative
hypothesis is that the subject can identify by taste which water is bottled and which is tap. The
test will have a 0.05 level of significance, which means that the probability of a
type one error is at most 0.05. A type one error occurs when the null hypothesis is rejected even
though it is in fact true.
Hypergeometric Distribution
Under the null hypothesis that the subjects have no expertise at
differentiating tap water from bottled water, the hypergeometric distribution is a good fit to analyze
the results. This distribution describes sampling without replacement, so the trials are not
independent [MR11]. Furthermore, its probabilities are computed using combinations.
In a set of N objects, there are K successes, which implies that there are N-K failures. If the
number of successes in a sample of size n is denoted by the random variable X, then the
probability mass function of the hypergeometric distribution is:
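\[ P(X=x) = p(x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}} \]
The mean (μ) and variance (σ²) of the hypergeometric distribution are:
\[ \mu = np \]
\[ \sigma^2 = np(1-p)\left(\frac{N-n}{N-1}\right) \]
where \( p = \frac{K}{N} \).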
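The hypergeometric quantities above can be evaluated directly with the standard library. The following is a minimal sketch; the function names `hypergeom_pmf` and `hypergeom_mean_var` are illustrative, and the symbols N, K, n and x follow the definitions in the text.

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x): x successes in a sample of n drawn without replacement
    from N objects of which K are successes."""
    # Outside the feasible range the probability is zero.
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

def hypergeom_mean_var(N, K, n):
    """Mean n*p and variance n*p*(1-p)*(N-n)/(N-1), with p = K/N."""
    p = K / N
    mean = n * p
    var = n * p * (1 - p) * (N - n) / (N - 1)
    return mean, var

# Values for this experiment: 30 samples, 15 tap, 15 labelled as tap.
print(hypergeom_pmf(10, 30, 15, 15))
print(hypergeom_mean_var(30, 15, 15))
```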
An Approximate Test
If data for large samples is recorded, the normal approximation of the hypergeometric
distribution becomes suitable. Also, depending on how large the samples are, it can take very long
to compute the necessary “p-values for Fisher’s exact test”, which is why the approximate test is
useful [NP00]. It must be stressed that, to use the approximation properly, the marginals must be
kept constant throughout the samples. Since the Z value is obtained by computing
\[ Z = \frac{x-\mu}{\sigma} \]
the expected value and the standard deviation of the hypergeometric distribution must be calculated.
In addition, if the samples are not very large, a continuity correction of ±0.5 should be applied
to x in the equation for Z, with the sign depending on whether the inequality is strict or inclusive
[NP00]. The following table displays the experimental design that will be used. In this table,
the values a, b, c and d are counts and the sums are the marginals. The total number of samples
used is denoted by N.
                      WATER
Subject says      Tap     Bottled   Total
Tap               a       c         a+c
Bottled           b       d         b+d
Total             a+b     c+d       N
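Using the formulas stated previously with K = a+b, n = a+c and x = a in the arbitrary table of values, we can
determine that
\[ \mu = \frac{(a+b)(a+c)}{N} \]
\[ \sigma^2 = \frac{(a+b)(a+c)(b+d)(c+d)}{N^2(N-1)} \approx \frac{(a+b)(a+c)(b+d)(c+d)}{N^3} \]
where the last step replaces N-1 by N, which changes little when N is large.
The mean denotes the number of correct guesses (or, equally here, mistakes) expected from a subject
who guesses at random. Knowing these equations, we can solve for the Z-value:
\[ Z = \frac{a - \frac{(a+b)(a+c)}{N}}{\sqrt{\frac{(a+b)(a+c)(b+d)(c+d)}{N^3}}} \]
Once the equation for Z is known, the approximation can easily be computed by substituting the
marginal values into the equation and applying the continuity correction to x for intermediate-sized
samples.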
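The Z computation for a 2x2 table with fixed marginals can be sketched as follows. The function name `z_statistic` and its parameters are illustrative. The `large_sample` flag is an assumption of this sketch: it switches between the hypergeometric variance with N²(N-1) in the denominator and the large-sample form with N³. The correction is subtracted from a, which suits an upper-tail test of "a or more correct".

```python
from math import sqrt

def z_statistic(a, b, c, d, continuity=0.5, large_sample=False):
    """Z for the count a in a 2x2 table whose marginals are held fixed.

    continuity: amount subtracted from a (upper-tail test); 0 disables it.
    large_sample: if True, use N**3 instead of N**2 * (N - 1) in the variance.
    """
    N = a + b + c + d
    mu = (a + b) * (a + c) / N
    denom = N**3 if large_sample else N**2 * (N - 1)
    sigma = sqrt((a + b) * (a + c) * (b + d) * (c + d) / denom)
    return (a - continuity - mu) / sigma

# Table with 10 of 15 tap samples identified correctly.
print(z_statistic(10, 5, 5, 10))
print(z_statistic(10, 5, 5, 10, large_sample=True))
```

For N = 30 the two variance forms give slightly different Z values, so it is worth stating which one a reported probability is based on.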
Probabilities:
The following table gives the probabilities that a subject makes a certain number of mistakes for one
type of water. Since there are 30 samples of water, 15 of each type (tap or bottled),
at most 15 mistakes can be made. The probabilities were
computed using the hypergeometric distribution under our null hypothesis that the subjects
are unable to tell tap water from bottled water.
Table 1: The Probability of each Possible Situation
Number of mistakes    Number correct    Probability
0                     15                6.44673E-09
1                     14                1.45051E-06
2                     13                7.10751E-05
3                     12                0.001334633
4                     11                0.012011699
5                     10                0.058136624
6                     9                 0.161490623
7                     8                 0.266953888
8                     7                 0.266953888
9                     6                 0.161490623
10                    5                 0.058136624
11                    4                 0.012011699
12                    3                 0.001334633
13                    2                 7.10751E-05
14                    1                 1.45051E-06
15                    0                 6.44673E-09
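Table 1 can be regenerated, and the largest number of mistakes that still rejects the null hypothesis can be found, with a short script. This is a sketch; the names `table` and `critical_mistakes` are illustrative, and the values N = 30, K = 15 and n = 15 are taken from the experiment.

```python
from math import comb

N, K, n = 30, 15, 15
total = comb(N, n)

# Rows of (mistakes, correct, probability), ordered from 0 to 15 mistakes.
table = [
    (n - x, x, comb(K, x) * comb(N - K, n - x) / total)
    for x in range(n, -1, -1)
]

def critical_mistakes(alpha=0.05):
    """Largest number of mistakes whose cumulative probability is <= alpha,
    or None if even a perfect score is not significant at this level."""
    cumulative = 0.0
    critical = None
    for mistakes, _, prob in table:
        cumulative += prob
        if cumulative > alpha:
            break
        critical = mistakes
    return critical

for mistakes, correct, prob in table:
    print(f"{mistakes:>2} {correct:>2} {prob:.9g}")
print("critical number of mistakes at 0.05:", critical_mistakes(0.05))
```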
For instance, if a subject makes 5 mistakes (10 correct), the probability is calculated as follows:
\[ P(X=10) = \frac{\binom{15}{10}\binom{15}{5}}{\binom{30}{15}} = \frac{3003 \cdot 3003}{155117520} \approx 0.058137 \]
Here N = 30 is the total number of samples, K = 15 is the number of samples that are actually
tap water (the successes) and n = 15 is the number of samples the subject labels as tap water,
which explains the denominator. In this example, out of
15 samples of tap water, 10 were correctly identified, which means that the other 5 samples
classified as tap water were, in reality, among the 15 samples of bottled water. The
probability of identifying 10 correctly (or making 5 mistakes; both have identical
probabilities) is obtained by substituting x = 10.
Data:
The experimental results were recorded with the requirement that the marginals of the 2x2
contingency tables remain constant. In the first experimental design, the tap water and the
bottled water were at approximately the same temperature. This
experimental design has 30 samples for each subject. In the second experimental design, temperature
was not held constant and, once again, there are 30 samples for each subject. Both
experimental designs were recorded because the tap water had acquired a particular taste when it
was refrigerated in the same place as the bottled water for 24 hours. Therefore, to avoid skewed
results, a second, similar experiment was done.
Counts of Water Correctly Identified as Tap Water with Approximately Constant Temperature
Subject A
                      WATER
Subject says      Tap     Bottled   Total
Tap               10      5         15
Bottled           5       10        15
Total             15      15        30
The table shows that Subject A correctly identified 10 out of 15 of the tap water
samples. Referring to Table 1, the p-value for five errors is 0.071555 (p-value
= 0.058136624 + 0.012011699 + 0.001334633 + 7.10751E-05 + 1.45051E-06 + 6.44673E-09
= 0.071555), which is the probability of making 5 mistakes or fewer, computed as the sum of the
probabilities of zero through five errors. Since the p-value
is greater than the significance level of 0.05, we fail to reject the null hypothesis that the
subject is unable to tell tap water from bottled water. Nonetheless, the p-value is close to the
significance level, so Subject A may be described as marginally competent at differentiating tap
water from bottled water.
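The tail sum used for the p-value can be checked with a few lines of Python. This is a sketch; `p_value` is an illustrative name, and the default arguments assume the 30-sample design with 15 glasses of each water.

```python
from math import comb

def p_value(correct, N=30, K=15, n=15):
    """One-sided p-value: probability of at least `correct` right answers
    (that is, at most n - correct mistakes) under random guessing."""
    favourable = sum(
        comb(K, x) * comb(N - K, n - x) for x in range(correct, min(K, n) + 1)
    )
    return favourable / comb(N, n)

for correct in (10, 11, 12):
    print(correct, round(p_value(correct), 6))
```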
Subject B
                      WATER
Subject says      Tap     Bottled   Total
Tap               12      3         15
Bottled           3       12        15
Total             15      15        30
Subject B correctly identified 12 samples, which results in a p-value of 0.001407. This very
small p-value leads to the rejection of the null hypothesis at the 0.05 level of significance. This
implies that, with constant temperature, Subject B has shown expertise in differentiating tap
water from bottled water.
Counts of Water Correctly Identified as Tap Water Without Constant Temperature
Subject A
                      WATER
Subject says      Tap     Bottled   Total
Tap               11      4         15
Bottled           4       11        15
Total             15      15        30
Without temperature as a controlled variable, Subject A made only 4 mistakes, leading to a
rejection of the null hypothesis at the 0.05 level of significance since the p-value is
0.013419. It can, therefore, be said that the subject shows expertise in correctly identifying tap
water and bottled water.
Subject B
                      WATER
Subject says      Tap     Bottled   Total
Tap               10      5         15
Bottled           5       10        15
Total             15      15        30
With the second experimental design, the p-value for five errors, which is what Subject B
made, is 0.071555. This p-value exceeds α=0.05 and, therefore, we fail to reject the null
hypothesis that the subject is unable to distinguish tap water from bottled water.
This experiment suggests a connection between the temperature of the waters and the ability to
differentiate the water type. Subject A had better results when the temperature was not held
constant, whereas Subject B was better at identifying the type of water by taste when the
temperature was constant. Nonetheless, in both cases, it can be concluded with leniency that the
subjects were able to distinguish the bottled water from the tap water. The probabilities suggest
that it is uncommon to guess on all of the samples and get as many correct as both subjects, A and B,
did. Subjects A and B made at most 5 mistakes in either experimental design and, even though
the p-value associated with 5 mistakes is over 0.05, the value of 0.071555 is very close. On this
lenient reading, we reject our null hypothesis that the subjects are not able to discriminate
between tap and bottled water and accept our alternative hypothesis that the subjects have a skill
enabling them to differentiate tap from bottled water, while noting that at the 0.05 level the null
hypothesis is rejected for only one subject in each design.
Approximate Test
It was established earlier that a normal approximation can be computed for this experiment,
and that it is most reliable for large samples. Although the sample size of this experiment does not
guarantee a reliable approximation, it will be tested. Previously, using an arbitrary 2x2 contingency table
with marginals a+b, a+c, b+d, c+d, it was determined that
\[ \mu = \frac{(a+b)(a+c)}{N}, \qquad \sigma^2 = \frac{(a+b)(a+c)(b+d)(c+d)}{N^2(N-1)} \approx \frac{(a+b)(a+c)(b+d)(c+d)}{N^3} \]
Since a+b=a+c=b+d=c+d=15 and N=30 in this experiment, we
can conclude, using the approximate test, that
\[ \mu = \frac{15 \cdot 15}{30} = 7.5, \qquad \sigma^2 \approx \frac{15^4}{30^3} = 1.875, \qquad \sigma \approx 1.37 \]
Therefore, knowing these values, we can deduce that
\[ Z = \frac{x - 7.5}{1.37} \]
To compute the Z-value with the data recorded from the experiment, the continuity correction
must be applied. With this in mind, 10 correct “guesses” are corrected to 9.5, 11 correct
“guesses” to 10.5 and 12 correct “guesses” to 11.5.
\[ Z = \frac{9.5 - 7.5}{1.37} \approx 1.46 \]
The probability associated with this Z-score is 0.92784122, which implies that the probability of
making at most 5 mistakes is approximated by 0.07215878. The exact p-value
found by using the hypergeometric distribution through Fisher’s exact test is 0.071555.
The two probabilities are very close: rounded to
the nearest thousandth, the values are identical.
\[ Z = \frac{10.5 - 7.5}{1.37} \approx 2.19 \]
The probability associated with this Z-score is 0.98573063, which implies that the p-value for
making 4 mistakes or fewer is approximated by 0.01426937. The exact p-value is
0.013419. The values resemble each other and lead to the same conclusion of rejecting the null
hypothesis.
\[ Z = \frac{11.5 - 7.5}{1.37} \approx 2.92 \]
The probability associated with this Z-score is 0.99824816, which implies that the probability of
making 3 mistakes or fewer is approximated by 0.00175184. The p-value computed using
the hypergeometric distribution was 0.001407. These two p-values are further apart, in relative
terms, than the previous pairs but, once again, the same conclusion is reached
regarding the hypothesis testing.
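The comparison between the exact and the approximate p-values can be reproduced with the sketch below. Two assumptions are made here that the text does not state outright: the large-sample standard deviation sqrt(15^4 / 30^3) is used without rounding it to 1.37, so the normal probabilities agree with the reported ones only to about three or four decimals, and the exact hypergeometric standard deviation is included for contrast. All names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

N, M = 30, 15                                  # total samples; every marginal is 15
MU = M * M / N                                 # expected count under guessing, 7.5
SIGMA_LARGE = sqrt(M**4 / N**3)                # large-sample form, about 1.369
SIGMA_EXACT = sqrt(M**4 / (N**2 * (N - 1)))    # hypergeometric form, about 1.393

def exact_tail(correct):
    """Hypergeometric probability of at least `correct` right answers."""
    ways = sum(comb(M, x) * comb(N - M, M - x) for x in range(correct, M + 1))
    return ways / comb(N, M)

def normal_tail(correct, sigma=SIGMA_LARGE):
    """Normal approximation with a 0.5 continuity correction."""
    z = (correct - 0.5 - MU) / sigma
    return 1.0 - NormalDist().cdf(z)

for correct in (10, 11, 12):
    print(correct, exact_tail(correct), normal_tail(correct),
          normal_tail(correct, SIGMA_EXACT))
```

With the exact standard deviation the approximate p-values come out slightly larger, but the decisions at the 0.05 level stay the same for all three scores.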
Conclusion
The Fisher exact test was used in this experiment to examine different subjects’ claims of
sensory perception of water type. Two subjects, A and B, were tested by
tasting 30 samples of water in experimental design 1, in which temperature was held constant,
and 30 samples of water in experimental design 2, in which temperature was not
controlled. After tasting each sample, the subjects stated which type of water they believed it was:
tap or bottled. Conclusions were drawn from the results
through Fisher’s exact test with the use of the hypergeometric distribution,
hypothesis testing, and the normal approximation. The approximate p-values
were similar to the ones obtained with the hypergeometric distribution.
Furthermore, at the standard significance level of 0.05, the null hypothesis could not be rejected
for one subject in each experimental design. Nonetheless, it could be leniently concluded that the
subjects were able to differentiate the tap water from the bottled water since, in the first
experimental design, Subject B showed expertise whereas Subject A did not, and the opposite
occurred in the second experimental design. Also, had a higher level of
significance such as 0.10 been chosen, all subjects would have shown expertise. Therefore, subjectively,
the experimental results show the subjects’ ability to differentiate tap water from bottled water, yet
mathematically, using a significance level of 0.05, the demonstrated expertise varies between
subjects and with whether temperature is controlled.
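The effect of the significance level on the four decisions can be checked with a short sketch. The dictionary `results` holds the number of correctly identified tap samples from the four tables; the names `results`, `p_value` and `decisions` are illustrative.

```python
from math import comb

def p_value(correct, N=30, K=15, n=15):
    """Probability of at least `correct` right answers under random guessing."""
    ways = sum(comb(K, x) * comb(N - K, n - x) for x in range(correct, n + 1))
    return ways / comb(N, n)

# Correct identifications out of 15, keyed by (design, subject).
results = {
    ("design 1", "A"): 10,
    ("design 1", "B"): 12,
    ("design 2", "A"): 11,
    ("design 2", "B"): 10,
}

def decisions(alpha):
    """True where the null hypothesis of random guessing is rejected."""
    return {key: p_value(correct) < alpha for key, correct in results.items()}

print(decisions(0.05))
print(decisions(0.10))
```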
References
[Enc] Editors of Encyclopaedia. “Sir Ronald Aylmer Fisher”. Encyclopaedia Britannica.
Encyclopaedia Britannica, n.d. Web. 20 Dec. 2013.
[McD09] McDonald, J.H. Handbook of Biological Statistics (2nd ed.). Baltimore: Sparky House
Publishing, 2009. 70-75. Print.
[MR11] Montgomery, Douglas C., and George C. Runger. Applied Statistics and Probability for
Engineers (5th ed.). United States: John Wiley & Sons, 2011. Print.
[NP00] Nolan, Deborah, and Terry Speed. “Can She Taste the Difference.” Stat Labs:
Mathematical Statistics Through Applications. New York: Springer-Verlag, 2000. 99-117. Print.