(11) notched and variable width box-plots

Upload: asclabisb

Post on 14-Apr-2018


  • 7/30/2019 (11) Notched and Variable Width Box-Plots

    1/16

    Applied Statistics and Computing Lab

    NOTCHED AND VARIABLE WIDTH BOX-PLOT

    Applied Statistics and Computing Lab

    Indian School of Business

    Learning goals

    What is a notched box-plot? How does one construct such a plot?
    What is a variable width box-plot? How does one construct such a plot?
    How are they useful?
    Can one combine these two features of a box-plot?
    How does one construct a box-plot for data with factors? What is its use?

    Dataset

    For this study of notched and variable width box-plots, we consider a slightly modified version of the scores dataset.

    Suppose the score record is blank for some students, in some or all of the exams:
      The student could have been absent for the exam.
      There could be a data entry error.

    In either case, we do not have 50 scores for each of the exams.

    Variable name                First minor  Second minor  Third minor  First semester GPA  Second semester scores
    # of observations available  48           45           48           47                  47
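
A minimal R sketch of a dataset with this shape. The real scores dataset is not reproduced here, so everything in the block is an assumption made for illustration: the data frame name `scores`, the column names, the simulated score distribution, and which entries are missing. Blank records are coded as `NA`, and the available counts are recovered by counting non-missing values.

```r
set.seed(1)  # simulated data only; not the lab's actual scores
n_students <- 50
available <- c(First.minor = 48, Second.minor = 45, Third.minor = 48,
               First.semester = 47, Second.semester = 47)

# Hypothetical scores: 50 values per exam, then blank out random entries
scores <- as.data.frame(lapply(available, function(n_obs) {
  x <- round(rnorm(n_students, mean = 60, sd = 10))
  x[sample(n_students, n_students - n_obs)] <- NA
  x
}))

# Number of non-missing observations per exam
n_available <- colSums(!is.na(scores))
print(n_available)
```

`boxplot()` drops `NA` values within each column, so such a data frame can be plotted directly.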

    Notched box-plots

    According to the Oxford Advanced Learner's Dictionary, one of the meanings of "notch" is a V-shaped cut in an edge or a surface.

    A notched box-plot is used to test whether two or more population medians are equal at the 5% level.

    In a notched box-plot, a notch appears on either side of the median. The interval corresponding to the notch is a confidence interval for the population median.

    If the notches of the box-plots of variables in the same frame do not overlap, then we conclude that the population medians are different (using a test at the 5% level of significance).
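
In R, the notch limits are documented (under `boxplot.stats`) as median ± 1.58 · IQR / √n, with the IQR taken between the hinges. The following is a minimal sketch that reproduces them by hand; the vector `x` is made up for illustration and is not part of the scores dataset.

```r
# Made-up scores, for illustration only
x <- c(52, 61, 47, 70, 66, 58, 49, 73, 64, 55, 68, 60)

five <- fivenum(x)   # min, lower hinge, median, upper hinge, max
n <- sum(!is.na(x))

# Half-length of the notch: 1.58 * (hinge spread) / sqrt(n)
half_width <- 1.58 * (five[4] - five[2]) / sqrt(n)
notch_manual <- five[3] + c(-1, 1) * half_width

# The same interval as computed by R itself
notch_r <- boxplot.stats(x)$conf

print(rbind(manual = notch_manual, boxplot.stats = notch_r))
```

The two rows should agree, which suggests that the notch drawn by `boxplot(..., notch = TRUE)` can be read as this interval around the sample median.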

    Notched box-plots (contd.)

    [Figure: notched box-plot; image not reproduced in this transcript]

    Width of the box

    If there is only one batch (variable), the width can be arbitrary.

    If there are several batches, each having the same number of observations, then again the width can be the same for all the variables.

    If there are several batches with varying numbers of observations, it is desirable that the box-plots produced in the same frame exhibit this information.

    This can be done using the varwidth option in R. When this option is used, the width of each box is proportional to the square root of the number of observations.
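
A short R sketch of the square-root rule, using the observation counts listed for the dataset. The column names are assumptions, and the second part uses simulated batches (not the scores) purely to make the width differences easy to see.

```r
# Observation counts from the dataset description (names are assumed)
n_obs <- c(First.minor = 48, Second.minor = 45, Third.minor = 48,
           First.semester = 47, Second.semester = 47)

# Box widths relative to the widest box: proportional to sqrt(n)
rel_width <- sqrt(n_obs) / max(sqrt(n_obs))
print(round(rel_width, 3))

# A more visible contrast: simulated batches of 10, 40 and 160 values
set.seed(2)
batches <- list(small = rnorm(10), medium = rnorm(40), large = rnorm(160))
boxplot(batches, varwidth = TRUE)
```

With counts between 45 and 48 the relative widths differ by only a few percent, so the boxes for the scores look almost equally wide; the effect is much clearer when the batch sizes differ by large factors.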

    Variable width box-plot

    [Figure: variable width box-plot; image not reproduced in this transcript]

    What if we combine the features of notches and variable width, to make a variable width notched box-plot?

    Variable width notched box-plot

    [Figure: variable width notched box-plot; image not reproduced in this transcript]

    Comments on the box-plot

    Earlier we remarked that the medians of the three minors appear to be close. From the preceding plot, it is clear that the notch of First.minor does not overlap with those of the other two. Thus the earlier belief is refuted.

    The upper end of the notch of the box-plot of Second.minor barely coincides with the lower end of the notch of Third.minor. Thus it cannot be said that the medians of the minors at the population level are the same.
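
The visual comparison of notches can also be done numerically. The sketch below is one possible way, not the lab's own procedure: the helper `notches_overlap()` and the simulated batches `A`, `B` and `C` are hypothetical, since the scores data are not available here. It relies on `boxplot(..., plot = FALSE)`, which returns the notch limits in its `conf` component.

```r
# Do two notch intervals c(lower, upper) overlap? Touching ends count as overlap.
notches_overlap <- function(a, b) a[1] <= b[2] && b[1] <= a[2]

set.seed(3)  # simulated batches, for illustration only
sim <- list(A = rnorm(48, mean = 50, sd = 8),
            B = rnorm(45, mean = 62, sd = 8),
            C = rnorm(48, mean = 63, sd = 8))

# plot = FALSE returns the statistics without drawing; $conf holds the notches
notch <- boxplot(sim, notch = TRUE, plot = FALSE)$conf
colnames(notch) <- names(sim)

# Compare every pair of batches
pairs <- combn(names(sim), 2)
overlap <- apply(pairs, 2, function(p) {
  notches_overlap(notch[, p[1]], notch[, p[2]])
})
names(overlap) <- apply(pairs, 2, paste, collapse = " vs ")
print(overlap)
```

A `FALSE` entry corresponds to a pair of boxes whose notches do not overlap in the plot.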

    Box-plot for data with factors

    Sometimes we have data on a batch with factors.

    Research has shown that in the fast-paced world of electronics, the key factor that separates the winners from the losers is actually how slow a firm is in making decisions: the most successful firms take longer to arrive at strategic decisions on product development, adopting new technologies, or developing new products.

    The following values are the number of months taken to arrive at a decision, for firms ranked high, medium and low in terms of performance:

    High    3.5 4.8 3 6.5 7.5 8 2 6 5.5 6.5 7 9 5 10 6
    Medium  3 5.5 6 4 4 4.5 6 2 9 4.5 5 2.5 7
    Low     1 2.5 2 1.5 1.5 6 3.8 4.5 0.5 2 3.5 1 2

    Box-plot for data with factors (contd.)

    Notice that in such cases, one typically does an analysis of variance to test the equality of means. Here, the batch is the data on the number of months taken to arrive at a decision, and the factor is the performance: high, medium and low.

    In such cases one can use a variable width notched box-plot to examine the equality of medians. This can be used independently or in conjunction with the analysis of variance in arriving at meaningful conclusions on the location behavior of the different factor levels.
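
A minimal R sketch of both approaches on the decision-time data. The values are the ones listed for the High, Medium and Low groups; the object names (`months`, `performance`, `decisions`, `fit`) and the axis labels are assumptions introduced here.

```r
months <- c(3.5, 4.8, 3, 6.5, 7.5, 8, 2, 6, 5.5, 6.5, 7, 9, 5, 10, 6,  # High
            3, 5.5, 6, 4, 4, 4.5, 6, 2, 9, 4.5, 5, 2.5, 7,             # Medium
            1, 2.5, 2, 1.5, 1.5, 6, 3.8, 4.5, 0.5, 2, 3.5, 1, 2)       # Low
performance <- factor(rep(c("High", "Medium", "Low"), times = c(15, 13, 13)),
                      levels = c("High", "Medium", "Low"))
decisions <- data.frame(months, performance)

# Variable width notched box-plot, one box per performance level
boxplot(months ~ performance, data = decisions,
        notch = TRUE, varwidth = TRUE,
        xlab = "Performance", ylab = "Months to decision")

# One-way analysis of variance on the same data, for comparison
fit <- aov(months ~ performance, data = decisions)
print(summary(fit))

# Sample medians per performance level
group_medians <- tapply(decisions$months, decisions$performance, median)
print(group_medians)
```

R may warn that some notches went outside the hinges; this happens when a notch is longer than the part of the box on one side of the median.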

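    Box-plot for data with factors (contd.)

    [Figure: variable width notched box-plot of months to decision by performance; image not reproduced in this transcript]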

    Comments on the box-plot

    From the plot it is clear that the medians in the population are most unlikely to be equal.

    For the box-plot for high performance, the notch lies within the first and third quartiles. However, for the plots corresponding to low and medium performance, the lower end of the notch is below the first quartile. Thus the population median could fall below the observed first quartile in these two cases.

    It is also worth noting that the sampling variability of the median (as shown by the length of the notch) is about the same for the three factor levels (performance groups).
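
These observations can be checked numerically. The sketch below tabulates, for each group, the hinges (R's version of the quartiles in a box-plot) and the notch limits from `boxplot.stats()`. The data are the values listed for the three groups; the names `groups`, `summary_table` and `below_hinge` are introduced here for illustration.

```r
groups <- list(
  High   = c(3.5, 4.8, 3, 6.5, 7.5, 8, 2, 6, 5.5, 6.5, 7, 9, 5, 10, 6),
  Medium = c(3, 5.5, 6, 4, 4, 4.5, 6, 2, 9, 4.5, 5, 2.5, 7),
  Low    = c(1, 2.5, 2, 1.5, 1.5, 6, 3.8, 4.5, 0.5, 2, 3.5, 1, 2)
)

# One row per group: hinges, median, notch limits and notch length
summary_table <- t(sapply(groups, function(x) {
  s <- boxplot.stats(x)
  c(lower_hinge = s$stats[2], notch_low = s$conf[1],
    median = s$stats[3], notch_high = s$conf[2],
    upper_hinge = s$stats[4],
    notch_length = s$conf[2] - s$conf[1])
}))
print(round(summary_table, 2))

# TRUE where the lower notch end falls below the lower hinge
below_hinge <- summary_table[, "notch_low"] < summary_table[, "lower_hinge"]
print(below_hinge)
```

With R's hinge-based formula this gives notches of roughly [5.04, 6.96] for High, [3.62, 5.38] for Medium and [1.12, 2.88] for Low, with lengths of about 1.92, 1.75 and 1.75. On these numbers the Low notch is disjoint from the other two, while the High and Medium notches overlap slightly.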

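    R codes

    Plot                               R code
    Notched box-plot                   boxplot(data_name, notch = TRUE)
    Variable width box-plot            install.packages("aplpack")
                                       library(aplpack)
                                       boxplot(data_name, varwidth = TRUE)
    Variable width notched box-plot    boxplot(data_name, varwidth = TRUE, notch = TRUE)
    Box-plot for data with factors     boxplot(numeric_variable ~ factor_variable,
                                               varwidth = TRUE, notch = TRUE)

    Here data_name, numeric_variable and factor_variable are placeholders for the user's own objects. The varwidth argument belongs to base R's boxplot(), so these calls also run without loading aplpack.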

    Thank you