(8b) grouped data_central tendency and dispersion

Upload: asclabisb

Post on 14-Apr-2018

225 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/30/2019 (8b) Grouped Data_central Tendency and Dispersion

    1/17

    Applied Statistics and Computing Lab

    CENTRAL TENDENCY AND DISPERSION:For grouped data

    Applied Statistics and Computing Lab

    Indian School of Business

  • 7/30/2019 (8b) Grouped Data_central Tendency and Dispersion

    2/17

    Applied Statistics and Computing Lab

    Learning goals

    Understanding data with class intervals

    Learning to evaluate various measures ofcentral tendency and dispersion, for grouped

    data

    2

  • 7/30/2019 (8b) Grouped Data_central Tendency and Dispersion

    3/17

    Applied Statistics and Computing Lab

    Introduction We studied measures of central tendency and

    dispersion for discrete data The data was represented in form of a list

    How do we deal with data with class intervals?

    Can we find a value that represents a given classinterval?

    Class intervals could emerge from both discrete as well

    as continuous data

    We would look at a dataset consisting of Nobservations, distributed across n classes

    3

  • 7/30/2019 (8b) Grouped Data_central Tendency and Dispersion

    4/17

    Applied Statistics and Computing Lab

    Class mark Class mark is the midpoint of a class interval

    Calculated as the arithmetic mean of the class limits E.g. if we are looking at the number of students whose scores lie

    between 60 and 70, (60 is the lower limit and 70 is the upper limit)6 0 + 7 0

    2= 65

    is the class mark or the midpoint of the class interval 60-70

    Class mark cannot be determined for a data with open classes

    (intervals indicated by open bracket on either sides)

    In case of overlapping classes (where the upper limit of a class and

    the lower limit of the next one are equal), we assign that

    overlapping value to that class where the value is the lower limit

    4

  • 7/30/2019 (8b) Grouped Data_central Tendency and Dispersion

    5/17

    Applied Statistics and Computing Lab

    Cumulative frequency Cumulative frequency is the frequency of values up to the upper limit of the

    corresponding class interval

    For the class, denote its frequency as , cumulative frequency as and classmark as

    Class # Class interval Frequency Cumulative frequency Class mark

    1 =

    =

    +

    22 = +

    = +

    2

    3 = + + =

    +

    2

    i = + ++ =

    +

    2

    n = + + ++ = = +

    2

  • 7/30/2019 (8b) Grouped Data_central Tendency and Dispersion

    6/17

    Applied Statistics and Computing Lab

    Example Weights from the body measurement data used earlier

    Weight values are given up to one decimal point

    6

    Class interval Frequency Cumulative frequency Class mark

    40-49.9 27 27 44.95

    50-59.9 124 151 54.95

    60-69.9 120 271 64.95

    70-79.9 115 386 74.95

    80-89.9 87 473 84.95

    90-99.9 25 498 94.95

    100-109.9 8 506 104.95

    110-119.9 1 507 114.95

    Total 507 - -

  • 7/30/2019 (8b) Grouped Data_central Tendency and Dispersion

    7/17

    Applied Statistics and Computing Lab

    Summation of all values In ungrouped data set, the value of each observation is

    considered

    In grouped data, that is not possible How can we account for all the values of a dataset? As the class mark or the midpoint is considered to

    represent every value belonging to that particular classinterval, that value holds to be a proxy for all the values

    Can repeat the class mark as many times as thenumber of values belonging to that class interval;nothing but the frequency

    Hence, =

    7

  • 7/30/2019 (8b) Grouped Data_central Tendency and Dispersion

    8/17

    Applied Statistics and Computing Lab

    Means For data consisting of N observations distributed across n distinct

    class intervals, =

    = =

    =

    =

    For the weights data, Arithmetic mean = 69.15

    Geometric mean = 67.88

    Harmonic mean = 66.64

    8

  • 7/30/2019 (8b) Grouped Data_central Tendency and Dispersion

    9/17

    Applied Statistics and Computing Lab

    Median How do we determine the value that has 50% of the data on each of its

    two sides?

    Initially we can at least determine the class interval in which the valuewould lie, the median class

    Let be the upper limit and be the lower limit of the median class

    Let indicate the frequency of the median class and indicate thecumulative frequency of the class preceding the median class, then

    = +( )(

    2 )

    This is obtained under the assumption that cumulative frequencyincreases from every class to another

    9

  • 7/30/2019 (8b) Grouped Data_central Tendency and Dispersion

    10/17

    Applied Statistics and Computing Lab

    Median (contd.) Total 507 observations

    = 253.5 observation splits the data into 2 equal halves

    10

    Class interval Frequency Cumulativefrequency

    40-49.9 27 27

    50-59.9 124 151

    60-69.9 120 271

    70-79.9 115 386

    80-89.9 87 473

    90-99.9 25 498

    100-109.9 8 506

    110-119.9 1 507

    Total 507 -

    Median class, as the

    253.5th observation

    would lie in thisinterval

    = +

    ( )(2

    )

    = 60 +

    (69.9 60)(507

    2 151)

    120 = 68.46

  • 7/30/2019 (8b) Grouped Data_central Tendency and Dispersion

    11/17

    Applied Statistics and Computing Lab

    Quantiles Suppose k is the number of quantiles

    k=4 for quartiles, k=10 for deciles and k=100 for percentiles

    The quantile is the (

    ) value of the data

    Must note that (

    ) is not the numerical value of the quantile, it is only the position corresponding

    to the quantile when the data is organised in an ascending order

    For median i.e. the 2nd quartile, it was the (

    )= (

    ) value

    Using cumulative frequencies, we can then determine the class to which the given quantile belongs

    As per the notations used earlier,

    = + [

    ]

    = 4 and = 2 gives the median

    = = { +

    34

    } { +

    14

    }

    Where, the s and s refer to the lower and upper limits of the corresponding quantile classes

    11

  • 7/30/2019 (8b) Grouped Data_central Tendency and Dispersion

    12/17

    Applied Statistics and Computing Lab

    Mode Can easily identify the class interval with the highest frequency; the

    modal class

    How do we determine the value which has the highest density? Formula given by:

    = + [

    ] where;

    u

    12

  • 7/30/2019 (8b) Grouped Data_central Tendency and Dispersion

    13/17

    Applied Statistics and Computing Lab13

    Mode (contd.)

    13

    Class interval Frequency

    40-49.9 27

    50-59.9 124

    60-69.9 120

    70-79.9 115

    80-89.9 87

    90-99.9 25

    100-109.9 8

    110-119.9 1

    Total 507

    Modal class, class

    interval with the

    highest frequency

    = +

    + = 50 + 59.9 50

    124 27

    124 27 + 124 120= 59.51

  • 7/30/2019 (8b) Grouped Data_central Tendency and Dispersion

    14/17

    Applied Statistics and Computing Lab

    Absolute deviations

    For data consisting of N observations

    distributed across n distinct class intervals,

    =

    =

    where, is the class mark

    14

  • 7/30/2019 (8b) Grouped Data_central Tendency and Dispersion

    15/17

    Applied Statistics and Computing Lab

    Central moments For grouped data consisting of N observations

    distributed across n distinct class intervals, = =

    ( )

    where, is the class mark

    = = ()

    = Coefficient of skewness and kurtosis can be

    calculated accordingly

    15

  • 7/30/2019 (8b) Grouped Data_central Tendency and Dispersion

    16/17

    Applied Statistics and Computing Lab

    Conclusion We can verify that the values obtained with the

    formulae for grouped data, are very close to the valuesobtained by considering the data as ungrouped

    In many situation, describing data using class intervalsis more insightful

    Therefore these formulae can be useful for quick handcalculation In this age of extensive computational power, these

    measures can be calculated without dividing the data

    into class intervals Yet, these formulae are important from theoretical

    point of view

    16

  • 7/30/2019 (8b) Grouped Data_central Tendency and Dispersion

    17/17

    Applied Statistics and Computing Lab

    Thank you