7/30/2019 (11) Notched and Variable Width Box-Plots
Applied Statistics and Computing Lab
NOTCHED AND VARIABLE WIDTH BOX-PLOT
Applied Statistics and Computing Lab
Indian School of Business
-
Learning goals
What is a notched box-plot? How does one construct such a plot?
What is a variable width box-plot? How does one construct such a plot?
How are they useful?
Can one combine these two features of a box-plot?
How does one construct a box-plot for data with factors? What is its use?
-
Dataset
For this study of notched and variable width box-plots, we consider a slightly modified version of the scores dataset.
Suppose the score record is blank for some students, for some or all of the exams:
  The student could have been absent for the exam
  There could be a data entry error
In either case, we do not have 50 scores for each of the exams.
Variable name                First minor  Second minor  Third minor  First semester GPA  Second semester scores
# of observations available  48           45            48           47                  47
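
A minimal sketch of how such counts could be obtained in R. The real scores dataset is not reproduced in these slides, so the data frame below is simulated; its values and the last two column names are assumptions made only for illustration. The numbers of blanked records are chosen to match the table above.

```r
# Simulated stand-in for the modified scores dataset (NOT the real data).
set.seed(1)
n_students <- 50

# Number of blank records per exam, chosen to match the table above.
# The last two column names are hypothetical.
n_missing <- c(First.minor = 2, Second.minor = 5, Third.minor = 2,
               First.sem.GPA = 3, Second.sem.scores = 3)

scores <- as.data.frame(lapply(n_missing, function(k) {
  x <- round(runif(n_students, min = 0, max = 100))  # made-up scores
  x[sample(n_students, k)] <- NA                     # blank out k records
  x
}))

# Number of observations available for each exam
n_available <- colSums(!is.na(scores))
print(n_available)
```

boxplot() drops missing values within each variable, so the counts above are the sample sizes that each box would be based on.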
-
Notched Box-plots
According to the Oxford Advanced Learner's Dictionary, one of the meanings of "notch" is a V-shaped cut in an edge or a surface.
In a notched box-plot, a notch appears on either side of the median. The interval corresponding to the notch is the confidence interval for the population median.
Notches are used to test whether two or more population medians are equal at the 5% level.
If the notches of the box-plots of variables in the same frame do not overlap, then we conclude that the population medians are different (using a test at the 5% level of significance).
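
The notch limits can be checked by hand. The sketch below assumes the rule documented for R's boxplot.stats(), namely median +/- 1.58 * IQR / sqrt(n), with the IQR taken between the hinges; the vector x is a hypothetical batch used only for illustration.

```r
# Hypothetical batch of scores, for illustration only
x <- c(12, 15, 17, 18, 20, 21, 23, 24, 26, 30, 35)

s <- boxplot.stats(x)        # the statistics that boxplot() draws
hinges <- s$stats[c(2, 4)]   # lower and upper hinges (approximate quartiles)
med <- s$stats[3]            # sample median

# Notch limits: median +/- 1.58 * IQR / sqrt(n)
notch <- med + c(-1.58, 1.58) * diff(hinges) / sqrt(s$n)

print(notch)
print(s$conf)                # notch limits as returned by R
```

The interval narrows as n grows and widens as the spread of the middle half of the data grows.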
-
Notched box-plots (contd.)
-
Width of the box
If there is only one batch (variable), the width can be arbitrary.
If there are several batches, each having the same number of observations, then again the width can be the same for all the variables.
If there are several batches with varying numbers of observations, it is desirable that the box-plots produced in the same frame exhibit this information.
This can be done using the varwidth option in R.
When this option is used, the width of each box is proportional to the square root of the number of observations.
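
As a small worked example of the square-root rule, the sketch below computes relative box widths from the observation counts in the Dataset table. Scaling the widest box to 1 is a choice made here for readability, not necessarily how R scales the boxes internally.

```r
# Observations available per exam, taken from the Dataset table
n_obs <- c(First.minor = 48, Second.minor = 45, Third.minor = 48,
           First.semester.GPA = 47, Second.semester.scores = 47)

# Relative widths under varwidth = TRUE: proportional to sqrt(n),
# rescaled here so that the widest box has width 1
rel_width <- sqrt(n_obs) / max(sqrt(n_obs))
print(round(rel_width, 3))
```

With counts this close to each other, the widths differ by only a few percent, so the effect is subtle for the scores data and more visible when batch sizes differ substantially.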
-
Variable width box-plot
-
What if we combine the features of notches
and variable width, to make a variable width
notched box-plot?
-
Variable width Notched Box-plot
-
Comments on the Box-plot
Earlier we remarked that the medians of the three minors appear to be close. From the preceding plot, it is clear that the notch of First.minor does not overlap with those of the other two.
Thus the earlier belief is refuted.
The upper end of the notch of the box-plot of Second.minor barely touches the lower end of the notch of Third.minor.
Thus it cannot be said that the medians of the minors at population level are the same.
-
Box-plot for data with factors
Sometimes we have data on a batch with factors.
Research has shown that in the fast-paced world of electronics, the key factor that separates the winners from the losers is actually how slow a firm is in making decisions: the most successful firms take longer to arrive at strategic decisions on product development, adopting new technologies, or developing new products.
The following values are the number of months taken to arrive at a decision, for firms ranked high, medium and low in terms of performance:
High 3.5 4.8 3 6.5 7.5 8 2 6 5.5 6.5 7 9 5 10 6
Medium 3 5.5 6 4 4 4.5 6 2 9 4.5 5 2.5 7
Low 1 2.5 2 1.5 1.5 6 3.8 4.5 0.5 2 3.5 1 2
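
One way to enter these values in R is as a data frame with one numeric column and one factor column. The object and column names (decisions, months, performance) are assumptions introduced for this sketch; the values are those listed above.

```r
# Months taken to arrive at a decision, by performance group
high   <- c(3.5, 4.8, 3, 6.5, 7.5, 8, 2, 6, 5.5, 6.5, 7, 9, 5, 10, 6)
medium <- c(3, 5.5, 6, 4, 4, 4.5, 6, 2, 9, 4.5, 5, 2.5, 7)
low    <- c(1, 2.5, 2, 1.5, 1.5, 6, 3.8, 4.5, 0.5, 2, 3.5, 1, 2)

# Long format: one row per firm (names are hypothetical)
decisions <- data.frame(
  months = c(high, medium, low),
  performance = factor(
    rep(c("High", "Medium", "Low"),
        times = c(length(high), length(medium), length(low))),
    levels = c("High", "Medium", "Low")  # keep the High/Medium/Low order
  )
)

print(table(decisions$performance))  # group sizes

# Variable width notched box-plot of the batch, split by the factor
boxplot(months ~ performance, data = decisions,
        varwidth = TRUE, notch = TRUE,
        ylab = "Months to reach a decision")
```

R may warn that some notches went outside the hinges; this is a warning, not an error, and reflects the shape of the data rather than a problem with the call.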
-
Box-plot for data with factors (contd.)
Notice that in such cases, typically one does analysis of variance to test the equality of means. Here, the batch is the data on the number of months taken to arrive at a decision and the factor is the performance: high, medium and low.
In such cases one can use a variable width notched box-plot to examine the equality of medians. This can be used independently or in conjunction with the analysis of variance in arriving at meaningful conclusions on the location behavior of different factors.
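
A sketch of the two views side by side, using the decision-time values from the earlier slide: a one-way analysis of variance for the means, and the group medians that the notched box-plot compares. The object names are assumptions introduced for illustration.

```r
# Decision-time data from the earlier slide (object names are hypothetical)
decisions <- data.frame(
  months = c(3.5, 4.8, 3, 6.5, 7.5, 8, 2, 6, 5.5, 6.5, 7, 9, 5, 10, 6,  # High
             3, 5.5, 6, 4, 4, 4.5, 6, 2, 9, 4.5, 5, 2.5, 7,             # Medium
             1, 2.5, 2, 1.5, 1.5, 6, 3.8, 4.5, 0.5, 2, 3.5, 1, 2),      # Low
  performance = factor(rep(c("High", "Medium", "Low"), times = c(15, 13, 13)),
                       levels = c("High", "Medium", "Low"))
)

# One-way analysis of variance: tests equality of the group means
fit <- aov(months ~ performance, data = decisions)
anova_table <- summary(fit)[[1]]
print(anova_table)

# Group medians: the quantities compared by the notches
group_medians <- tapply(decisions$months, decisions$performance, median)
print(group_medians)
```

The analysis of variance addresses means under its usual assumptions, while the notches give a quick graphical comparison of medians, so the two can reasonably be read together.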
-
Box-plot for data with factors (contd.)
-
Comments on the Box-plot
From the plot it is clear that the medians in the population are most unlikely to be equal.
For the box-plot for high performance, the notch is within the first and third quartiles. However, for the plots corresponding to low and medium performances, the lower end of the notch is below the first quartile. Thus the population median could fall below the observed first quartile in these two cases.
It is also worth noting that the sampling variability of the median (as observed by the length of the notch) is about the same for the three factors (performance groups).
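
These remarks can be checked numerically. The sketch below asks boxplot() for its statistics without drawing (plot = FALSE) and compares the notch limits with the hinges for each group. Object names are assumptions; the values are the decision times from the earlier slide.

```r
# Decision-time data from the earlier slide (object names are hypothetical)
decisions <- data.frame(
  months = c(3.5, 4.8, 3, 6.5, 7.5, 8, 2, 6, 5.5, 6.5, 7, 9, 5, 10, 6,  # High
             3, 5.5, 6, 4, 4, 4.5, 6, 2, 9, 4.5, 5, 2.5, 7,             # Medium
             1, 2.5, 2, 1.5, 1.5, 6, 3.8, 4.5, 0.5, 2, 3.5, 1, 2),      # Low
  performance = factor(rep(c("High", "Medium", "Low"), times = c(15, 13, 13)),
                       levels = c("High", "Medium", "Low"))
)

# Box-plot statistics only, one column per group
b <- boxplot(months ~ performance, data = decisions, plot = FALSE)

notch_summary <- data.frame(
  group        = b$names,
  n            = b$n,
  q1           = b$stats[2, ],  # lower hinge
  notch_lower  = b$conf[1, ],
  median       = b$stats[3, ],
  notch_upper  = b$conf[2, ],
  q3           = b$stats[4, ],  # upper hinge
  notch_length = b$conf[2, ] - b$conf[1, ]
)
print(notch_summary)

# TRUE where the lower end of the notch falls below the first quartile
below_q1 <- notch_summary$notch_lower < notch_summary$q1
print(setNames(below_q1, notch_summary$group))
```

Note that R uses hinges, which can differ slightly from other quartile definitions, so the figures are approximate in that sense.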
-
R-codes
Plot                              R-code
Notched box-plot                  boxplot(data_name, notch = TRUE)
Variable width box-plot           # optional: varwidth is an argument of base R's boxplot()
                                  install.packages("aplpack")
                                  library(aplpack)
                                  boxplot(data_name, varwidth = TRUE)
Variable width notched box-plot   boxplot(data_name, varwidth = TRUE, notch = TRUE)
Box-plot for data with factors    boxplot(numeric_variable ~ factor_variable, varwidth = TRUE, notch = TRUE)
-
Thank you