
MARE 250 Dr. Jason Turner Descriptive Measures

Uploaded by alejandra-bradway on 14-Dec-2015


Page 1: MARE 250 Dr. Jason Turner Descriptive Measures. Descriptive Measures – numbers that are used to describe datasets Parts of Descriptive Statistics Used

MARE 250 Dr. Jason Turner

Descriptive Measures


Descriptive Measures

Descriptive Measures – numbers that are used to describe datasets

Parts of Descriptive Statistics

Used to summarize raw data


Descriptive Measures

Measures of Center

Measures of Variation – how data are distributed around center

5-number summary – used to construct a visual representation, the boxplot


Measures of Center

Measures of Central Tendency – indicate where the center, or most typical value, of a dataset lies

Mean, Median, Mode


Measures of Center

Mean – the sum of the observations divided by the number of observations; the arithmetic average

10 + 20 + 30 + 40 + 50 + 60 + 70 + 80 + 90 + 100 = 550

550 / 10 = 55
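A minimal Python sketch of the arithmetic above, using the same ten observations. The function name `mean` is illustrative; the standard library's `statistics.mean` gives the same value.

```python
import statistics

def mean(values):
    """Arithmetic average: sum of the observations divided by their count."""
    return sum(values) / len(values)

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(sum(data))   # 550
print(mean(data))  # 55.0

# The standard-library function agrees with the hand-written one.
assert mean(data) == statistics.mean(data)
```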


Measures of Center

Median – the number that divides the bottom 50% of the data from the top 50%

1) Arrange the data in increasing order
2) If the number of observations is ODD, the median is the observation exactly in the middle
3) If the number of observations is EVEN, the median is the mean of the middle two observations
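The three steps translate almost line for line into code. This is a sketch; the name `median` is hypothetical, and the small test lists are made up for illustration (the standard library also provides `statistics.median`).

```python
def median(values):
    ordered = sorted(values)            # 1) arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                      # 2) odd n: the middle observation
        return ordered[mid]
    # 3) even n: the mean of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([30, 10, 20]))      # 20
print(median([40, 10, 30, 20]))  # 25.0
```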


Measures of Center

Position of the median in the ordered data = (n + 1)/2

10,20,30,40,50,60,70,80,90,100,110 (ODD, n = 11); position = 6; Median = 60

10,20,30,40,50,60,70,80,90,100 (EVEN, n = 10); position = 5.5; Median = (50 + 60)/2 = 55
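The (n + 1)/2 rule gives a position in the ordered list, not the median value itself. The hypothetical helper below turns that position into the median: a whole-number position picks one observation, and a position ending in .5 averages the two neighboring observations, which reproduces both examples above.

```python
def median_by_position(values):
    ordered = sorted(values)
    pos = (len(ordered) + 1) / 2        # 1-based position of the median
    lower = ordered[int(pos) - 1]       # observation at the whole part of pos
    if pos.is_integer():
        return lower
    upper = ordered[int(pos)]           # the next observation up
    return (lower + upper) / 2

odd = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
even = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print((len(odd) + 1) / 2, median_by_position(odd))    # 6.0 60
print((len(even) + 1) / 2, median_by_position(even))  # 5.5 55.0
```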


Measures of Center

Mode – the value that occurs most frequently in the dataset; to find it, count the frequency of each value in the dataset

If no value occurs more than once – No Mode; 10,20,30,40,50,60,70,80,90,100

Otherwise, any value with the greatest frequency is a Mode; 10,20,30,40,50,50,60,70,80,90,100 … Mode is 50
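A sketch of the two mode rules, assuming we return a list (a dataset can have more than one mode, e.g. when it is bimodal) and an empty list for "No Mode". The name `modes` is hypothetical. Note that the standard library's `statistics.multimode` behaves differently on the first example: when no value repeats, it returns every value rather than an empty list.

```python
from collections import Counter

def modes(values):
    """Every value with the greatest frequency, or [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:                        # no value occurs more than once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]))      # []
print(modes([10, 20, 30, 40, 50, 50, 60, 70, 80, 90, 100]))  # [50]
```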


Measures of Center

The mode is useful if the distribution is skewed or bimodal (having two very pronounced values around which data are concentrated)

[Figure: example frequency distribution; axis label: Number of Individuals]


You are so totally skewed!

The mean is sensitive to extreme (very large or small) observations and the median is not

Therefore, you can tell whether (and in which direction) your data are skewed by comparing the mean with the median

Mean is Greater than the Median (right-skewed)

Mean and Median are Equal (symmetric)

Mean is Less Than the Median (left-skewed)
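A sketch of the mean-versus-median comparison as a rule of thumb. The name `skew_direction`, the small tolerance `tol` (exact equality rarely holds with real data), and the example lists are all assumptions made for illustration.

```python
import statistics

def skew_direction(values, tol=1e-9):
    """Rule of thumb: compare the mean with the median."""
    diff = statistics.mean(values) - statistics.median(values)
    if diff > tol:
        return "right-skewed (mean > median)"
    if diff < -tol:
        return "left-skewed (mean < median)"
    return "roughly symmetric (mean = median)"

print(skew_direction([1, 2, 2, 3, 3, 3, 4, 20]))          # right-skewed (mean > median)
print(skew_direction([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]))  # roughly symmetric (mean = median)
print(skew_direction([1, 17, 18, 18, 19, 19, 19, 20]))    # left-skewed (mean < median)
```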


Resistant Measures

A resistant measure is not sensitive to the influence of a few extreme observations

Median – resistant measure of center

Mean – not resistant

Outliers DO NOT substantially affect the Median

Outliers DO affect the Mean
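A quick demonstration of resistance, assuming we take the ten-value example and replace its largest observation (100) with a made-up extreme value (1000): the mean jumps, while the median does not move.

```python
import statistics

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
with_outlier = data[:-1] + [1000]   # replace the largest value with an extreme one

for label, values in (("original", data), ("with outlier", with_outlier)):
    print(f"{label}: mean={statistics.mean(values):g}, median={statistics.median(values):g}")
# original: mean=55, median=55
# with outlier: mean=145, median=55
```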


Resistant Measures

The resistance of the mean can be improved by using a trimmed mean: a specified percentage of the smallest and largest observations is removed before computing the mean

We will do something like this later when exploring the data and evaluating outliers (and their effects on the mean)
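A minimal trimmed-mean sketch. It assumes the "specified percentage" is removed from *each* end and that the number removed is rounded down (`int(proportion * n)`), which is the convention SciPy's `scipy.stats.trim_mean` also uses; other software may define the percentage differently. The 10% trim and the data are illustrative choices.

```python
def trimmed_mean(values, proportion):
    """Drop int(proportion * n) observations from EACH end, then average the rest."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(proportion * len(ordered))
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

with_outlier = [10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]
print(sum(with_outlier) / len(with_outlier))  # 145.0 (ordinary mean)
print(trimmed_mean(with_outlier, 0.10))       # 55.0 (one value cut from each end)
```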


Measures of Variation

Measures of Variation (Spread) – amount of variability in the data set

Range, Standard Deviation, Variance

Range = Maximum Observation – Minimum Observation

10,20,30,40,50,60,70,80,90,100; Range = 100 - 10 = 90
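The range in code, as a one-line sketch (the name `data_range` is hypothetical and chosen to avoid shadowing Python's built-in `range`). Because it uses only the two most extreme observations, a single outlier changes it directly.

```python
def data_range(values):
    """Range = maximum observation - minimum observation."""
    return max(values) - min(values)

print(data_range([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]))  # 90
```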


Measures of Variation

Standard Deviation (SD, often reported as ±SD) – measures variation by indicating roughly how far, on average, the observations are from the mean

Large SD – observations are spread far from the mean

Small SD – observations are clustered close to the mean
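A sketch of the standard deviation computed by hand, assuming the *sample* version (divide by n - 1), which is what `statistics.stdev` computes; `statistics.pstdev` gives the population version (divide by n, about 28.72 for this data). The second, tightly clustered dataset is made up to contrast a large and a small SD.

```python
import math
import statistics

def sample_sd(values):
    """Square root of (sum of squared deviations from the mean) / (n - 1)."""
    n = len(values)
    xbar = sum(values) / n
    ss = sum((x - xbar) ** 2 for x in values)
    return math.sqrt(ss / (n - 1))

spread_out = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
clustered = [50, 52, 55, 58, 60]          # same mean (55), much less spread

print(round(sample_sd(spread_out), 2))    # 30.28  (large SD: far from the mean)
print(round(sample_sd(clustered), 2))     # 4.12   (small SD: close to the mean)
print(math.isclose(sample_sd(spread_out), statistics.stdev(spread_out)))  # True
```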


Measures of Variation

Variance – the square of the standard deviation; it is the form of spread used inside most statistical formulas
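For reference, the usual sample formulas, assuming the n - 1 divisor (population versions divide by n instead), worked for the ten observations 10, 20, …, 100, whose mean is 55:

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad s = \sqrt{s^2}

\sum_{i=1}^{10}(x_i - 55)^2 = 2\,(45^2 + 35^2 + 25^2 + 15^2 + 5^2) = 2\,(2025 + 1225 + 625 + 225 + 25) = 8250

s^2 = \frac{8250}{9} \approx 916.67, \qquad s = \sqrt{916.67} \approx 30.28
```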

“Equal variance” is one of the assumptions of parametric tests that compare means (we will learn this later)


Measures of Variation

Three Standard Deviations Rule – almost all observations in any dataset lie within three standard deviations to either side of the mean; “almost all” is defined in two ways by stats nerds…


Measures of Variation

Three Standard Deviations Rule –

Chebychev’s Rule – for any dataset, at least 89% (1 - 1/3² ≈ 0.89) of observations lie within 3 standard deviations of the mean

Empirical Rule – if the data are approximately bell-shaped, about 99.7% of observations lie within 3 standard deviations of the mean
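A sketch that checks Chebychev's guarantee on the running example. The helper names are hypothetical; the bound 1 - 1/k² holds for any dataset and any k > 1, while the 99.7% figure of the Empirical Rule is only an approximation for bell-shaped data, so it is not coded here.

```python
import statistics

def fraction_within(values, k):
    """Share of observations within k standard deviations of the mean."""
    m = statistics.mean(values)
    s = statistics.stdev(values)          # sample SD
    return sum(abs(x - m) <= k * s for x in values) / len(values)

def chebychev_bound(k):
    """Chebychev: at least 1 - 1/k**2 of any dataset lies within k SDs (k > 1)."""
    return 1 - 1 / k ** 2

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(round(chebychev_bound(3), 3))  # 0.889
print(fraction_within(data, 3))      # 1.0
```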


5 Number Summary

Percentiles – data set is divided into hundredths (100 equal parts)

Why? Percentiles are not sensitive to the influence of a few extreme observations (outliers)


5 Number Summary

Quartiles – the dataset is divided into quarters (4 equal parts); the most commonly used percentiles

Data set has 3 Quartiles: Q1, Q2, Q3

Q1 – the number that divides the bottom 25% from the top 75%

Q2 – the median; divides the bottom 50% from the top 50%

Q3 – the number that divides the bottom 75% from the top 25%
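Textbooks and software compute quartiles in slightly different ways, so values can differ a little between programs. This sketch assumes one common convention: the quartile sits at position (n + 1)p of the ordered data (p = 0.25, 0.50, 0.75), interpolating linearly between neighbors. The name `quartile` is hypothetical; for this dataset it agrees with Python's `statistics.quantiles`, whose default `method="exclusive"` uses the same positions.

```python
import statistics

def quartile(values, p):
    """Value at 1-based position (n + 1) * p of the sorted data,
    interpolating linearly between neighbors (convention assumed here)."""
    ordered = sorted(values)
    pos = (len(ordered) + 1) * p
    pos = min(max(pos, 1), len(ordered))   # clamp to the data for very small n
    lower = int(pos)                       # whole part of the position
    frac = pos - lower
    if frac == 0:
        return float(ordered[lower - 1])
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print([quartile(data, p) for p in (0.25, 0.50, 0.75)])  # [27.5, 55.0, 82.5]
print(statistics.quantiles(data, n=4))                  # [27.5, 55.0, 82.5]
```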



5 Number Summary

Interquartile Range (IQR) – the difference between the first and third quartiles

IQR = Q3 – Q1

The IQR gives you the range of the middle 50% of the data
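A short sketch of IQR = Q3 - Q1 for the ten-value example, assuming quartiles from `statistics.quantiles` (Python 3.8+, default `"exclusive"` method); other quartile conventions can shift the result slightly.

```python
import statistics

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
q1, q2, q3 = statistics.quantiles(data, n=4)   # default "exclusive" method
iqr = q3 - q1                                  # spread of the middle 50%
print(q1, q3, iqr)  # 27.5 82.5 55.0
```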


Outlier, Outlier

Outliers – observations that fall well outside the overall pattern of the data

Outliers require special attention

May be the result of:
Measurement or recording error
An observation from a different population
An unusual extreme observation


Pants on Fire!

You must deal with outliers (yes, really!)

If an outlier is the result of an error, it can be deleted; otherwise, it is a judgment call

Can use quartiles and IQR to identify potential outliers


The Outer Limits

Lower and Upper Limits:

Lower limit – the number that lies 1.5 IQRs below the first quartile

Lower Limit = Q1 - 1.5 * IQR

Upper limit – the number that lies 1.5 IQRs above the third quartile

Upper Limit = Q3 + 1.5 * IQR


The Outer Limits

If a value is outside the “Outer Limits” of a dataset it is an…

OUTLIER! OUTLIER!
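A sketch of the outer-limits rule, assuming quartiles from `statistics.quantiles` (default `"exclusive"` method). The function names and the value 300 are made up for illustration. As the slides say, the rule only flags *potential* outliers; whether to remove one is still a judgment call.

```python
import statistics

def outer_limits(values):
    """Lower limit = Q1 - 1.5*IQR, upper limit = Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def potential_outliers(values):
    low, high = outer_limits(values)
    return [x for x in values if x < low or x > high]

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 300]
print(outer_limits(data))        # (-60.0, 180.0)
print(potential_outliers(data))  # [300]
```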


5 Number Summary

5-Number Summary: Min, Q1, Q2, Q3, Max

Written in increasing order

Provides information on Center and Variation

Used to construct boxplots
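A sketch that assembles the 5-number summary in increasing order. The name `five_number_summary` is hypothetical, the quartiles come from `statistics.quantiles` (default `"exclusive"` method), and Min/Max here are taken over all observations.

```python
import statistics

def five_number_summary(values):
    """(Min, Q1, Q2, Q3, Max), written in increasing order."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    return min(values), q1, q2, q3, max(values)

print(five_number_summary([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]))
# (10, 27.5, 55.0, 82.5, 100)
```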


Boxplot

Boxplot (box-and-whisker diagram): based on the 5-number summary; provides a graphic display of the center and variation

[Figure: boxplot on a 0 to 70 axis, labeled Min, Q1, Q2, Q3, Max]


Boxplot

[Figure: modified boxplot on a 0 to 70 axis; a potential outlier is marked with *]

Modified Boxplot – includes outliers

Note that Min & Max (the whisker ends) are determined after outliers are removed!


Boxplot


Boxplot

Boxplots summarize information about the shape, dispersion, and center of your data

They can also help you spot outliers


Boxplot

Left edge of the box represents the first quartile (Q1), while the right edge represents the third quartile (Q3)

Box portion of the plot represents the interquartile range (IQR) - middle 50% of data

[Figure: boxplot on a 0 to 70 axis, labeled Lower Limit, Q1, Q2, Q3, Upper Limit]


Boxplot

The line drawn through the box represents the median of the data

The lines extending from the box are called whiskers

The whiskers extend outward to the most extreme observations that lie within the Lower and Upper limits (i.e., excluding outliers)
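A sketch of where the whiskers of a modified boxplot stop under the usual 1.5 × IQR convention: at the smallest and largest observations that are still inside the outer limits. The function name and the example value 300 are hypothetical; quartiles come from `statistics.quantiles` (default `"exclusive"` method).

```python
import statistics

def whisker_ends(values):
    """Whiskers stop at the most extreme observations inside the outer limits."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in values if low <= x <= high]
    return min(inside), max(inside)

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 300]
print(whisker_ends(data))  # (10, 100); 300 is plotted separately as an outlier
```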


Boxplot

Extreme values, or outliers, are represented by dots or asterisks

A value is considered an outlier if it lies outside the box (greater than Q3 or less than Q1) by more than 1.5 times the IQR

[Figure: boxplot on a 0 to 70 axis; a potential outlier is marked with *]


Boxplot

Use the boxplot to assess the symmetry of the data:

If the data are fairly symmetric, the median line will be roughly in the middle of the IQR box and the whiskers will be similar in length

[Figure: boxplot of fairly symmetric data on a 0 to 70 axis]


Boxplot

Use the boxplot to assess the symmetry of the data:

If the data are skewed, the median may not fall in the middle of the IQR box, and one whisker will likely be noticeably longer than the other

[Figure: boxplot of skewed data on a 0 to 70 axis]
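The two symmetry checks can be made numeric by comparing the two halves of the box and the two whiskers. This is a sketch: the name `boxplot_shape` is hypothetical, the skewed example data are made up, and the whiskers are taken to Min/Max, which matches a modified boxplot here only because neither example contains outliers.

```python
import statistics

def boxplot_shape(values):
    """(lower box, upper box, lower whisker, upper whisker) lengths."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    return (q2 - q1,               # median to left edge of box
            q3 - q2,               # median to right edge of box
            q1 - min(values),      # left whisker (to Min)
            max(values) - q3)      # right whisker (to Max)

print(boxplot_shape([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]))  # (27.5, 27.5, 17.5, 17.5)
print(boxplot_shape([1, 2, 2, 3, 3, 4, 5, 7, 9, 12, 14]))        # (2.0, 5.0, 1.0, 5.0)
```

In the first result both halves of the box and both whiskers match (symmetric); in the second the median sits toward the left of the box and the right whisker is longer (right-skewed).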