Revision Analysing data

Upload: scott-glenn

Post on 13-Dec-2015

TRANSCRIPT

Revision

Analysing data

Measures of central tendency such as the mean and the median

can be used to determine the location of the distribution of data

values.

Measures of spread such as the range (maximum - minimum), the

standard deviation, and the variance tell you how spread out

the data is.

In a statistical investigation, you should discuss each variable in

terms of its central tendency and spread.
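As a minimal sketch of the measures named above, the following uses Python's standard `statistics` module. The function name `summarise` and the data values are made up for illustration, and the variance and standard deviation shown are the sample versions (dividing by n - 1), which is an assumption about which version you need.

```python
import statistics

def summarise(values):
    """Return measures of centre and spread for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),        # maximum minus minimum
        "variance": statistics.variance(values),   # sample variance (n - 1)
        "std_dev": statistics.stdev(values),       # sample standard deviation
    }

# Hypothetical data set, not taken from the slides.
data = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarise(data)
for name, value in summary.items():
    print(f"{name}: {value}")
```

If your data are a whole population rather than a sample, `statistics.pvariance` and `statistics.pstdev` would be the ones to use instead.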

Another important step in evaluating a set of data is to look

at the overall shape of the distribution of each set of data.

Think about

• SHAPE: Unimodal, bimodal, multimodal, uniform; symmetry, skewness; outliers, clusters, gaps

• CENTER: Mean, median

• SPREAD: Standard deviation, variance, range, interquartile range (IQR)

A good way to portray the shape of the distribution is with a

histogram. You would look for evidence of distinct groupings (e.g. male/female or different

species) and outliers.
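To illustrate how a histogram groups values into equal-width bins, here is a rough text-only sketch. The helper names `histogram_counts` and `print_histogram`, the bin width, and the data are all hypothetical. The sketch assumes every value is at least `start`.

```python
def histogram_counts(values, bin_width, start):
    """Count values in equal-width bins [start, start + w), [start + w, start + 2w), ..."""
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    return counts

def print_histogram(counts, bin_width, start):
    """Print one row of stars per bin, including empty bins so gaps stay visible."""
    for k, n in enumerate(counts):
        left = start + k * bin_width
        print(f"{left:>3}-{left + bin_width:<3} {'*' * n}")

# Hypothetical measurements with two groupings and one isolated high value.
lengths = [12, 14, 15, 15, 16, 17, 31, 32, 33, 33, 34, 58]
counts = histogram_counts(lengths, bin_width=10, start=10)
print_histogram(counts, bin_width=10, start=10)
```

The empty bins between the groups are what show up as gaps, and the single value in the last bin is the kind of point you would check as a possible outlier.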

No! No! No! This is not a histogram!

The following are examples of histograms and box and whisker plots of various distributions.

(Note: the box and whisker plot does not relate to the histogram above it. It is just an example of

what it could look like.)

Negatively skewed (unimodal)

Positively skewed

If the shape is skewed: report the median and IQR. You may want to include the mean and standard

deviation, but you should point out why the mean and median differ. The fact that

the mean and median do not agree is a sign that the distribution may be skewed.
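One possible way to get the median and IQR, and to compare the mean with the median, is sketched below with the standard `statistics` module. The function name `median_and_iqr` and the data are made up. Note that quartiles can be calculated by several slightly different methods, so the values from `statistics.quantiles` may not match a hand calculation exactly.

```python
import statistics

def median_and_iqr(values):
    """Return (median, IQR) using the quartiles from statistics.quantiles."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return q2, q3 - q1

# Hypothetical data with one large value pulling the mean upwards.
incomes = [20, 22, 23, 25, 26, 28, 30, 95]
mean = statistics.mean(incomes)
median, iqr = median_and_iqr(incomes)

print(f"mean = {mean}, median = {median}, IQR = {iqr}")
if mean > median:
    print("Mean is above the median: the distribution may be positively skewed.")
elif mean < median:
    print("Mean is below the median: the distribution may be negatively skewed.")
```

A gap between the mean and median is only a hint of skewness. You should still look at the histogram before describing the shape.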

Symmetric

If the shape is symmetric: report the mean and standard

deviation and possibly the median and IQR as well.

Uniform

Groupings (bimodal)

Outlier

If there are any clear outliers and you are reporting the mean and standard

deviation, report them with the outliers present and with outliers removed. The

differences may be revealing. (Of course, the median and IQR are not likely to be

affected by the outliers.)
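The comparison described above can be sketched as follows. The slides do not say how to decide what counts as an outlier, so this sketch assumes the common rule of thumb of flagging values more than 1.5 × IQR beyond the quartiles. The function names and data are hypothetical.

```python
import statistics

def split_outliers(values, k=1.5):
    """Split values into (kept, flagged) using the k * IQR rule of thumb (an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    flagged = [v for v in values if v < low or v > high]
    return kept, flagged

def report(values):
    """Return the mean, standard deviation and median of the values."""
    return {
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),
        "median": statistics.median(values),
    }

# Hypothetical data with one clear high outlier.
times = [41, 43, 44, 46, 47, 49, 50, 120]
kept, flagged = split_outliers(times)
with_outliers = report(times)
without_outliers = report(kept)

print("flagged:", flagged)
print("with outliers:   ", with_outliers)
print("without outliers:", without_outliers)
```

In this made-up data the mean and standard deviation change a lot when the outlier is removed, while the median barely moves.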

An outlier can be the most informative part of your data. (Or it might be just an error.)

An example of what to say…

The main body of the distribution is unimodal and nearly symmetric around $500,000, with

slightly more than half of CEOs earning salaries higher than that. But there are some high outliers. The outliers are CEOs whose salaries are higher than what is typical for

most CEOs of large corporations. Even though the vast majority of CEOs have

salaries below $1,000,000 a year, there are a few with salaries between $2,500,000 and

$3,000,000 a year.

Histogram annotations: unimodal; high outliers; most have salaries below $1,000,000.