descriptive statistics - lesson 2 - ryan safnerryansafner.com/teaching/ecmg212s2017/2. descriptive...

Post on 29-Jun-2018


Descriptive Statistics
Lesson 2

Ryan Safner
Department of Economics, Hood College

ECMG 212 - Statistics for Business and Economics
Spring 2017

Ryan Safner (Hood College) ECMG 212 - Lesson 2 Fall 2016 1 / 95

Lesson Plan

1 Describing Categorical Data

2 Describing Quantitative Data

Measures of Central Tendency
Measures of Locating Data
Measures of Spread

Variables and Distributions

All variables have a distribution of different individual values (and how often they take on these values)

We often want to display this distribution in a useful way to search for interesting patterns

Frequency Tables

A frequency table organizes data by recording counts or relative frequencies for categories

Count: the total number of occurrences for a category

Relative frequency: the proportion or percentage of a category occurring relative to all categories

$$\text{RF (\%)} = \frac{\text{Count of Category}}{\text{Total Count}} \; (\times 100\%)$$

Frequency Tables

Example

The ads that air during the Super Bowl are very expensive: a 30-second ad during the 2013 Super Bowl cost about $4M. Polls often ask whether respondents are more interested in the game or the commercials. Here are 40 responses from one such poll:

Frequency Tables

Response                 Count   Percentage
Commercials                  8          20%
Game                        18          45%
Won’t Watch                 12          30%
No Answer/Don’t Know         2           5%
Total                       40         100%

Responses to Survey on Super Bowl
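The relative frequency formula can be checked against this table with a short sketch. The 40 individual responses are not listed in the transcript, so the list below is rebuilt from the table's counts purely for illustration; the variable names are hypothetical.

```python
from collections import Counter

# Assumption: the raw responses are reconstructed from the counts in the table,
# since the individual poll answers are not reproduced in the transcript.
responses = (
    ["Commercials"] * 8
    + ["Game"] * 18
    + ["Won't Watch"] * 12
    + ["No Answer/Don't Know"] * 2
)

counts = Counter(responses)       # count per category
total = sum(counts.values())      # total count

# Relative frequency (%) = count of category / total count * 100
rel_freq = {category: 100 * n / total for category, n in counts.items()}

for category, n in counts.items():
    print(f"{category:<22} {n:>3} {rel_freq[category]:>5.0f}%")
```

The percentages should add up to 100%, which is a quick sanity check for any frequency table.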

Displaying Data

Three rules of data analysis:

1 Make a graph
2 Make a graph
3 Make a graph

The Area principle: the area occupied by a part of the graph should correspond to the magnitude of the value it represents

Pie Graph

A pie graph represents categories as wedges in a circle proportional to the relative frequency of that category

Wedges can show counts or relative frequencies

Bar Graph

A bar graph represents categories as bars with lengths proportional to the relative frequency of that category

Bars can show counts or relative frequencies

Categorical Data and Graphs

Pie graphs and bar graphs are only valid for categorical data!

Can only represent counts or frequencies of different categories

Make sure that categories do not overlap: overlapping categories make the graph misleading

Comparing Two Variables: Contingency Tables

We can see how two categorical variables are related with a contingency table

Shows how individuals are distributed along each variable depending on the value of the other variable

Comparing Two Variables: Contingency Tables

Example

                      Sex
Response          Female   Male   Total
Game                 198    277     475
Commercials          154     79     233
Won’t Watch          160    132     292
NA/Don’t Know          4      4       8
Total                516    492    1008

Each cell in a table gives the count for the combination of values of both variables

Comparing Two Variables: Contingency Tables

The marginal distribution of a variable is the distribution of the total counts of that variable’s values alone

Focuses on the margins (the Total row and Total column, in bold on the slide) of the table

Marginal distribution of Response: the Total column (475, 233, 292, 8)
Marginal distribution of Sex: the Total row (516, 492)
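As a rough sketch, the marginal distributions are just the row and column sums of the contingency table. The nested-dictionary layout below is one possible representation, chosen for illustration only.

```python
# Counts from the contingency table, stored as table[response][sex]
table = {
    "Game":          {"Female": 198, "Male": 277},
    "Commercials":   {"Female": 154, "Male": 79},
    "Won't Watch":   {"Female": 160, "Male": 132},
    "NA/Don't Know": {"Female": 4,   "Male": 4},
}

# Marginal distribution of Response: sum across sexes (the Total column)
response_marginal = {resp: sum(row.values()) for resp, row in table.items()}

# Marginal distribution of Sex: sum across responses (the Total row)
sex_marginal = {
    sex: sum(row[sex] for row in table.values()) for sex in ("Female", "Male")
}

grand_total = sum(response_marginal.values())
print(response_marginal)
print(sex_marginal, grand_total)
```

Both marginals sum to the same grand total (1008), since each respondent is counted once in either direction.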

Comparing Two Variables: Contingency Tables

The conditional distribution of a variable is the distribution of values that variable takes, conditional on another variable taking on a specific value

Conditional distribution of responses for females
Conditional distribution of sex for non-watchers
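The two conditional distributions named above can be sketched by restricting the table to one column (females) or one row (non-watchers) and dividing by that column or row total. The function names are hypothetical helpers, not part of the lesson.

```python
# Counts from the contingency table, stored as table[response][sex]
table = {
    "Game":          {"Female": 198, "Male": 277},
    "Commercials":   {"Female": 154, "Male": 79},
    "Won't Watch":   {"Female": 160, "Male": 132},
    "NA/Don't Know": {"Female": 4,   "Male": 4},
}

def conditional_on_sex(table, sex):
    """Distribution of responses (in %) among respondents of one sex."""
    column_total = sum(row[sex] for row in table.values())
    return {resp: 100 * row[sex] / column_total for resp, row in table.items()}

def conditional_on_response(table, response):
    """Distribution of sex (in %) among respondents who gave one response."""
    row = table[response]
    row_total = sum(row.values())
    return {sex: 100 * n / row_total for sex, n in row.items()}

females = conditional_on_sex(table, "Female")
non_watchers = conditional_on_response(table, "Won't Watch")

print({resp: round(pct, 1) for resp, pct in females.items()})
print({sex: round(pct, 1) for sex, pct in non_watchers.items()})
```

Each conditional distribution sums to 100% on its own, because the denominator is the total for the group being conditioned on rather than the grand total.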

Comparing Two Variables: Contingency Tables

277 men plan to watch the game. What percentage is this?

It depends on the denominator: column percent vs. row percent vs. total percent
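A small sketch of the three possible answers, using the totals from the table (Sex in the columns, Response in the rows). Only the denominator changes between the three.

```python
cell = 277           # men who plan to watch the game
male_total = 492     # column total: all men
game_total = 475     # row total: everyone who plans to watch the game
grand_total = 1008   # all respondents

column_percent = 100 * cell / male_total    # of men, share who will watch the game
row_percent = 100 * cell / game_total       # of game-watchers, share who are men
total_percent = 100 * cell / grand_total    # of all respondents, share who are men watching the game

print(round(column_percent, 1), round(row_percent, 1), round(total_percent, 1))
# prints 56.3 58.3 27.5
```

The same count of 277 therefore answers three different questions, so a percentage from a contingency table should always say what it is a percentage of.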

Comparing Two Variables: Contingency Tables

What percent of respondents are men who will watch the game?

What percent of women plan to watch for the commercials?

What percent of those who won’t watch are men?

Comparing Two Variables

Is there an association between the response to the survey and the respondent’s sex, or are the two independent?

Find the conditional distribution of responses by sex, and make a graph

Comparing Two Variables

A clustered bar chart allows us to compare the two distributions side by side

Comparing Two Variables

A segmented bar chart shows the responses by sex

Simpson’s Paradox

Caution

Comparing percentages across different values or groups can lead to misleading results: Simpson’s Paradox

Example

Suppose it’s the last inning of a baseball game, your team is down by 1 with the bases loaded and 2 outs. The pitcher is due up, so you’ll be sending in a pinch-hitter. There are 2 batters available on the bench. Whom should you send in to bat?

Player   Overall      vs LHP      vs RHP
A        33 for 103   28 for 81   5 for 22
B        45 for 151   12 for 32   33 for 119
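The reversal in this table can be verified with a short sketch. It assumes "33 for 103" means 33 hits in 103 at-bats and reads LHP/RHP as left- and right-handed pitchers, which is the usual baseball reading but is not spelled out in the transcript.

```python
# (hits, at-bats) per player and split, taken from the table
records = {
    "A": {"Overall": (33, 103), "vs LHP": (28, 81), "vs RHP": (5, 22)},
    "B": {"Overall": (45, 151), "vs LHP": (12, 32), "vs RHP": (33, 119)},
}

def batting_average(hits, at_bats):
    """Proportion of at-bats that resulted in a hit."""
    return hits / at_bats

averages = {
    player: {split: batting_average(*record) for split, record in splits.items()}
    for player, splits in records.items()
}

for player, splits in averages.items():
    print(player, {split: round(avg, 3) for split, avg in splits.items()})
```

Player A has the higher overall average, yet Player B has the higher average against both kinds of pitcher. The overall figures mix the two splits in very different proportions (A faced mostly LHP, B mostly RHP), which is what produces the paradox.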

Simpson’s Paradox

Example

Two companies have labor and management classifications of employees. Company A’s laborers have a higher average salary than Company B’s, as do Company A’s managers. But overall, Company B pays a higher average salary. How can that be? And which is the better way to compare earning potential at the two companies?

Cautions

[Figures not reproduced in this transcript; one slide is labeled “Open Letter to Kansas School Board”]


Describing Quantitative Data

Suppose instead we have quantitative data

Example

A class of 13 students takes a quiz out of 100 points with the following results: {0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}

Stem-and-Leaf Plots

A stem-and-leaf plot is a quick way of organizing and displaying data (best for small datasets)

Divide each observation into a stem and a leaf, with the leaf containing the final significant digit

e.g. For 53: stem 5, leaf 3
e.g. For 413: stem 41, leaf 3

Stem-and-Leaf Plots

Example

A class of 13 students takes a quiz out of 100 points with the following results: {0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}

 0 | 0
 1 |
 2 |
 3 |
 4 |
 5 |
 6 | 2 6
 7 | 1 1 4 6 9
 8 | 3 6 8
 9 | 3 5
10 |
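The splitting rule (stem = everything but the last digit, leaf = the last digit) can be sketched as follows for non-negative whole numbers. The function name is hypothetical, and this version only prints stems between the smallest and largest stem present, so it stops at 9 rather than 10 for the quiz data.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} for non-negative integers; the leaf is the last digit."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # e.g. 413 -> stem 41, leaf 3
        plot[stem].append(leaf)
    # Keep empty stems between the smallest and largest so gaps stay visible
    return {stem: plot.get(stem, []) for stem in range(min(plot), max(plot) + 1)}

quiz = [0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95]
for stem, leaves in stem_and_leaf(quiz).items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Keeping the empty stems 1 through 5 matters here: they are what make the score of 0 stand out as far from the rest of the class.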

Stem-and-Leaf Plots

Example

A sample of residents of Frederick report the distances from their home to their local supermarket (in miles):
{0.5, 1.2, 1.4, 1.4, 1.5, 2.2, 3.7, 4.2, 4.4, 4.4, 8.2}

Create a stem-and-leaf plot.

Stem-and-Leaf Plots

We can quickly compare two distributions with a side-by-side stem-and-leaf plot

Example

The stock prices of Apple over 10 days are: {320, 340, 333, 321, 332, 333, 351, 329, 301, 339}

The stock prices of Microsoft over 10 days are: {290, 292, 302, 310, 303, 299, 301, 319, 319, 307}

  Apple | Stem | Microsoft
        |  29  | 0 2 9
      1 |  30  | 1 2 3 7
        |  31  | 0 9 9
  0 1 9 |  32  |
2 3 3 9 |  33  |
      0 |  34  |
      1 |  35  |

Histograms

A more visually appealing way to present this data is a histogram, the quantitative analogue to a bar graph

We divide up the data into bins of a certain size, and count up the number of values falling within those bins, representing these as bars

Histograms

Example

A class of 13 students takes a quiz out of 100 points with the following results: {0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}

[Figure: histogram of Quiz Grades; x-axis ticks from 0 to 100, y-axis: No. of Students]

Note: Excel essentially plots a bar graph by first turning quantitative data into categorical data

We can also make a relative frequency (percentage) histogram
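The binning step behind both the count histogram and the relative frequency histogram can be sketched in a few lines. The bin width of 10 is an assumption; the transcript does not state the width used in the slide's figure.

```python
quiz = [0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95]

bin_width = 10  # assumed bin size, not taken from the slides
bins = [(low, low + bin_width) for low in range(0, 100, bin_width)]

counts = []
for i, (low, high) in enumerate(bins):
    last = i == len(bins) - 1
    # Each bin covers [low, high); the last bin also includes its upper edge (100)
    counts.append(sum(1 for x in quiz if low <= x < high or (last and x == high)))

n = len(quiz)
rel_freq = [100 * c / n for c in counts]   # percentage of observations per bin

# Text rendering: one '#' per observation in the bin
for (low, high), c, rf in zip(bins, counts, rel_freq):
    print(f"{low:>3}-{high:<3} {'#' * c:<8} {c:>2} {rf:5.1f}%")
```

The two histograms have the same shape; only the vertical scale differs, since each relative frequency is the count divided by n = 13.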

Histograms

 0 | 0
 1 |
 2 |
 3 |
 4 |
 5 |
 6 | 2 6
 7 | 1 1 4 6 9
 8 | 3 6 8
 9 | 3 5
10 |

A stem-and-leaf plot is shaped like a sideways histogram

Quantitative Distributions: Shape

For distributions of quantitative data, we are often interested in their shape, particularly:

Modes
Symmetry
Skewness
Center
Spread
Outliers

There are formal definitions for these using probability theory; for now, focus on how a histogram “looks”

Mode

The mode of a variable is its most frequent value

A variable can have more than one mode

Example

A class of 13 students takes a quiz out of 100 points with the following results: {0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}
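A minimal sketch of finding the mode by counting how often each value occurs. It returns a list because, as noted above, a variable can have more than one mode; the function name is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

quiz = [0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95]
print(modes(quiz))             # prints [71]
print(modes([1, 1, 2, 2, 3]))  # prints [1, 2]
```

For the quiz data, 71 is the only score that appears twice, so it is the single mode.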

Mode

Looking at the distribution (histogram), the modes are the “peaks” of the distribution

May be unimodal, bimodal, trimodal, etc.

Mode

A distribution that does not have any clear mode is uniform

Symmetry

A distribution is symmetric if it looks roughly the same on either side of the “center”

Skewness

The thinner ends of a distribution (far left & far right) are called the tails of the distribution

If one tail stretches farther than the other, the distribution is said to be skewed in the direction of the longer tail

Outliers

An extreme value that does not appear to be part of the general pattern of a distribution is an outlier

Outliers

Outliers can strongly affect descriptive statistics about a dataset

Outliers can be the most informative part of the data

Outliers could be the result of errors

Outliers should always be discussed in presentations about data


Arithmetic Mean

The natural measure of the center of a population’s distribution is its “average” or arithmetic mean (µ)

$$\mu = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{1}{N} \sum_{i=1}^{N} x_i$$

For N values of variable x, “mu” is the sum of all individual x values (x_i) from 1 to N, divided by N, the number of values

Arithmetic Mean

When we are dealing with a sample, we compute the sample mean (X̄)

$$\bar{X} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

For n values of variable x, “x-bar” is the sum of all individual x values (x_i) from 1 to n, divided by n, the number of values

Arithmetic Mean

{0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}

Mean: $\frac{0+62+66+71+71+74+76+79+83+86+88+93+95}{13} = \frac{944}{13} \approx 72.62$

Note the mean need not be an actual value of the data!

Arithmetic Mean

{62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}

If we drop the outlier (0):

Mean: $\frac{62+66+71+71+74+76+79+83+86+88+93+95}{12} = \frac{944}{12} \approx 78.67$

The mean is not robust to outliers!

Median

{0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}

The median is the midpoint of the distribution

50% to the left of the median, 50% to the right of the median

Arrange values of data in numerical order

For odd n: median is middle observation

For even n: median is average of two middle observations

Median

{0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}

The median is robust to outliers!

{62, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}

Replacing the outlier 0 with 62 leaves the median unchanged at 76
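The contrast between the mean and the median can be reproduced with Python's standard `statistics` module. This is a quick check of the quiz example, not part of the original slides; the variable names are hypothetical.

```python
from statistics import mean, median

quiz = [0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95]

# Two ways of neutralizing the outlier: drop it, or replace it with 62
without_outlier = [x for x in quiz if x != 0]
outlier_replaced = [62 if x == 0 else x for x in quiz]

print(round(mean(quiz), 2), median(quiz))                        # prints 72.62 76
print(round(mean(without_outlier), 2), median(without_outlier))  # prints 78.67 77.5
print(median(outlier_replaced))                                  # prints 76
```

Dropping the single score of 0 moves the mean by about six points, while the median moves by only 1.5 (and only because n changes from odd to even). Replacing the 0 with 62 does not move the median at all.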

Mean, Median, & Skewness

{1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7}

For a symmetric distribution, mean = median

Mean: $\frac{64}{16} = 4$

Median: 4

Mean, Median, & Skewness

{1, 2, 3, 4, 4, 4, 5, 5, 6, 6, 6, 7, 7}

For a distribution skewed to the left, mean < median

Mean: $\frac{60}{13} \approx 4.6$

Median: 5

Mean, Median, & Skewness

{1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 6, 7}

For a distribution skewed to the right, mean > median

Mean: $\frac{44}{13} \approx 3.4$

Median: 3
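The three cases above suggest a simple rule of thumb: compare the mean with the median. The sketch below applies it to the three datasets from these slides. It is only a heuristic (equal mean and median does not guarantee a symmetric shape), and the function name is hypothetical.

```python
from statistics import mean, median

def skew_direction(values):
    """Rule of thumb: mean < median suggests left skew, mean > median right skew."""
    m, med = mean(values), median(values)
    if m < med:
        return "left"
    if m > med:
        return "right"
    return "symmetric"

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7]
left_skewed = [1, 2, 3, 4, 4, 4, 5, 5, 6, 6, 6, 7, 7]
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 6, 7]

for data in (symmetric, left_skewed, right_skewed):
    print(round(mean(data), 1), median(data), skew_direction(data))
```

The mean gets pulled toward the longer tail, which is why it ends up on the tail side of the median.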

Mean, Median, & Skewness

Example

A sample of the per capita consumption of gasoline (in gallons) for 10 U.S. states in the year 2017 is given below:
{556, 560, 537, 409, 530, 485, 521, 486, 504, 434}

1 Find the mean
2 Find the median
3 Is this distribution symmetric, skewed to the left, or skewed to the right?

Mean, Median, & Skewness

Example

A sample of the GDP growth rate for 11 developed countries in the year 2017 is given below:
{0.05, 0.03, 0.02, 0.01, 0.00, 0.09, 0.11, 0.02, 0.03, 0.04, 0.01}

1 Find the mean
2 Find the median
3 Is this distribution symmetric, skewed to the left, or skewed to the right?

Percentiles

We often care about specific values in the distribution and how they relate to the rest of the distribution

A helpful measure of a data value’s location is its percentile, measuring the percentage of all data that is less than (or equal to) that value

Percentiles

To calculate the kth percentile, after ordering the data in numerical order, calculate:

$$i = \frac{k}{100}(n + 1)$$

Where i is the index (rank or position) of the value & n is the total number of observations

If i comes out to a whole number, the answer is the value at that position

If i is not an integer, round up and round down, and take the average of the values at those two positions in the data

Percentiles

Example

The following are a sample of 20 SAT Math scores: {570, 575, 580, 590, 620, 635, 640, 645, 650, 650, 650, 670, 675, 675, 680, 710, 720, 745, 770, 780}

1 Find the 20th percentile
2 Find the 84th percentile
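The index rule i = (k/100)(n + 1) can be sketched as a small function. The function name is hypothetical, k is assumed to be a whole number, and raising an error when the index falls outside the data is a choice made here, not something the slides specify.

```python
def kth_percentile(data, k):
    """Value at the k-th percentile using the index rule i = k/100 * (n + 1)."""
    ordered = sorted(data)
    n = len(ordered)
    # Integer arithmetic avoids floating-point error when checking for a whole number
    whole, remainder = divmod(k * (n + 1), 100)
    upper = whole + 1 if remainder else whole
    if whole < 1 or upper > n:
        raise ValueError("index falls outside the data for this k and n")
    if remainder == 0:
        return ordered[whole - 1]       # positions are 1-based, lists are 0-based
    # Not a whole number: average the values at the rounded-down and rounded-up positions
    return (ordered[whole - 1] + ordered[whole]) / 2

sat = [570, 575, 580, 590, 620, 635, 640, 645, 650, 650,
       650, 670, 675, 675, 680, 710, 720, 745, 770, 780]

print(kth_percentile(sat, 20))  # i = 4.2, average of positions 4 and 5: prints 605.0
print(kth_percentile(sat, 84))  # i = 17.64, average of positions 17 and 18: prints 732.5
```

Statistical software often uses other interpolation conventions, so results from a spreadsheet or library may differ slightly from this rule.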

Percentiles

To find the percentile of a particular data value, after ordering the data in numerical order, calculate:

$$\frac{x + 0.5y}{n} \times 100, \text{ then round to the nearest integer}$$

x is the number of data values counting from the first up to the value right before the chosen value

y is the number of data values equal to the chosen value

n is the total number of data values

Percentiles

Example

The following are a sample of 20 SAT Math scores: {570, 575, 580, 590, 620, 635, 640, 645, 650, 650, 650, 670, 675, 675, 680, 710, 720, 745, 770, 780}

1 What percentile is a score of 645?
2 What percentile is a score of 675?
3 What percentile is a score of 720?
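The formula (x + 0.5y)/n × 100 can be sketched as below. The function name is hypothetical, and rounding exact halves upward is an assumption, since the slides only say to round to the nearest integer (Python's built-in `round` would send 82.5 to 82).

```python
def percentile_rank(data, value):
    """Percentile of a data value: (x + 0.5*y) / n * 100, rounded to an integer."""
    n = len(data)
    x = sum(1 for v in data if v < value)    # values strictly below the chosen value
    y = sum(1 for v in data if v == value)   # values equal to the chosen value
    # Same formula written with integers only: 100 * (2x + y) / (2n)
    numerator, denominator = 100 * (2 * x + y), 2 * n
    # Assumption: exact halves round up
    return (2 * numerator + denominator) // (2 * denominator)

sat = [570, 575, 580, 590, 620, 635, 640, 645, 650, 650,
       650, 670, 675, 675, 680, 710, 720, 745, 770, 780]

print(percentile_rank(sat, 645))  # x = 7,  y = 1: 37.5 rounds to 38
print(percentile_rank(sat, 675))  # x = 12, y = 2: 65
print(percentile_rank(sat, 720))  # x = 16, y = 1: 82.5 rounds to 83
```

Counting half of the tied values (the 0.5y term) places a score in the middle of the block of observations equal to it.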

Quartiles

We can divide up a distribution into four equal quartiles, each comprising a quarter (25%) of the data:

Quartile   Cumulative % of data
1          25%
2          50%
3          75%
4          100%

The 2nd quartile (Q2) is the median

The 1st quartile (Q1) is the median of all the data beneath the median

The 3rd quartile (Q3) is the median of all the data above the median

Measures of Spread

The more variation in the data, the less a measure of central tendency tells us

So in addition to measuring the center, we also want to measure the spread

The simplest way is to look at the range, the difference between the extremes:

$$\text{Range} = \max - \min$$

Range

Example

{0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}

Range = 95 − 0 = 95

Example

{62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}

Range = 95 − 62 = 33

Note that the range is not robust to outliers: dropping the single score of 0 cuts the range from 95 to 33

Interquartile Range

One helpful measure of spread is the interquartile range (IQR), the width of the middle 50% of the data:

IQR = Q3 − Q1

Example

{0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}

Median = 76

Q1 = 71

Q3 = 86

IQR = 86 − 71 = 15
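
To see how the IQR behaves when an outlier is present, here is a small sketch comparing the range and the IQR for the scores with and without the 0. It uses `statistics.quantiles` from the Python standard library with `method="inclusive"`; that choice is an assumption, made because it reproduces the Q1 = 71 and Q3 = 86 shown above. The helper name `iqr` is hypothetical.

```python
from statistics import quantiles

def iqr(data):
    """Interquartile range Q3 - Q1, using the 'inclusive' quantile method."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

with_zero = [0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95]
without_zero = with_zero[1:]  # same scores with the 0 removed

print(max(with_zero) - min(with_zero), iqr(with_zero))
# 95 15.0
print(max(without_zero) - min(without_zero), iqr(without_zero))
# 33 15.5
```

Removing one extreme score changes the range from 95 to 33 but moves the IQR only from 15 to 15.5, which is why the IQR is often preferred when outliers are a concern.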

Five-Number Summary

Once we know the values of the quartiles, we can construct a five-number summary of a distribution, including:

1 Minimum
2 Q1
3 Median
4 Q3
5 Maximum

Five-Number Summary

Example

{0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}

Min Q1 Median Q3 Max

0 71 76 86 95
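
The summary above can be reproduced with a short sketch. The function name `five_number_summary` is hypothetical, and the `method="inclusive"` setting of `statistics.quantiles` is an assumption chosen because it matches the quartiles in the table.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

quiz1 = [0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95]
print(five_number_summary(quiz1))
# (0, 71.0, 76.0, 86.0, 95)
```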

Boxplots

The graphical way to present the five-number summary is a boxplot (or a “box-and-whisker plot”)

The length of the box is the IQR (from Q1 to Q3)

The line within the box is the median

The “whiskers” identify data within 1.5 × IQR of the box, that is, between Q1 − 1.5 × IQR and Q3 + 1.5 × IQR

Points beyond the whiskers are outliers

[Figure: boxplot annotated with Q3, Q1, Median, Q3 + 1.5 × IQR, Q1 − 1.5 × IQR, and an Outlier]
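
The 1.5 × IQR rule can be sketched as follows, applied to the two sets of quiz scores used in this lesson. The function name `boxplot_outliers` is hypothetical, and quartiles are computed with `statistics.quantiles(..., method="inclusive")`, an assumption that matches the quartile values quoted on the slides.

```python
from statistics import quantiles

def boxplot_outliers(data):
    """Return (lower fence, upper fence, outliers) under the 1.5 * IQR rule."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

quiz1 = [0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95]
quiz2 = [50, 62, 72, 73, 79, 81, 82, 82, 86, 90, 94, 98, 99]

print(boxplot_outliers(quiz1))  # (48.5, 108.5, [0])
print(boxplot_outliers(quiz2))  # (47.5, 115.5, [])
```

Under these assumptions only the score of 0 on Quiz 1 falls outside the fences; the lowest Quiz 2 score, 50, sits just inside its lower fence of 47.5.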

Boxplots

Quiz 1: {0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}
Quiz 2: {50, 62, 72, 73, 79, 81, 82, 82, 86, 90, 94, 98, 99}

Quiz 1

Min Q1 Median Q3 Max

0 71 76 86 95

Quiz 2

Min Q1 Median Q3 Max

50 73 82 90 99

Boxplots

Quiz 1: {0, 62, 66, 71, 71, 74, 76, 79, 83, 86, 88, 93, 95}
Quiz 2: {50, 62, 72, 73, 79, 81, 82, 82, 86, 90, 94, 98, 99}

[Figure: side-by-side boxplots of Quiz 1 and Quiz 2; x-axis: Quiz, y-axis: Scores (0 to 100); one point is plotted at 0]

Boxplots are great for quickly comparing multiple datasets

Boxplots

Boxplots for daily AIG closing stock price

Boxplots

Alternate way of constructing a boxplot: extend “whiskers” from Q1 to Minimum and Q3 to Maximum

But this is a less rigorous way of discovering outliers

Your textbook uses this method, as does MS Excel

Deviations

Each observation deviates from the mean of the data:

$$\text{deviation} = x_i - \mu$$

There are as many deviations as there are data points (n)

We can measure the average or standard deviation from the mean

Variance

The population variance ($\sigma^2$) of a population distribution measures the average of the squared deviations from the population mean

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Why do we square deviations?
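
One common answer, which the slide leaves open, is that raw deviations from the mean always cancel out, so their plain average is zero no matter how spread out the data are. A minimal sketch, treating the small data set {-10, 0, 10, 20, 30} from the later example as a whole population (an assumption made only for illustration):

```python
data = [-10, 0, 10, 20, 30]
N = len(data)
mu = sum(data) / N                      # population mean: 10.0

deviations = [x - mu for x in data]     # [-20.0, -10.0, 0.0, 10.0, 20.0]
sum_dev = sum(deviations)               # positives and negatives cancel
sum_sq_dev = sum(d ** 2 for d in deviations)
pop_variance = sum_sq_dev / N

print(sum_dev)       # 0.0
print(sum_sq_dev)    # 1000.0
print(pop_variance)  # 200.0
```

Squaring makes every deviation non-negative, so the squared deviations add up to a quantity that grows with the spread of the data.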

Standard Deviation

Take the square root of the variance to get the population standard deviation ($\sigma$), roughly the average deviation from the mean (in the same units as x)

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

Variance

The sample variance ($s^2$) of a sample distribution measures the average of the squared deviations from the sample mean

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Why divide by n − 1?
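
One standard justification, not spelled out on the slide, is that dividing by n tends to underestimate the population variance, while dividing by n − 1 is correct on average. The sketch below checks this by brute force under stated assumptions: the population is the hypothetical five-value set {-10, 0, 10, 20, 30}, and every possible sample of size 2 drawn with replacement is treated as equally likely.

```python
from itertools import product

population = [-10, 0, 10, 20, 30]
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample)

n = 2
samples = list(product(population, repeat=n))  # all 25 ordered samples

# Average each estimator over every possible sample.
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma2)             # 200.0
print(avg_div_n)          # 100.0
print(avg_div_n_minus_1)  # 200.0
```

In this setup the divide-by-n version averages to half the true variance, while the n − 1 version averages to the population variance exactly. The intuition is that deviations are measured from the sample mean, which sits closer to the sample's own values than the true mean does.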

Standard Deviation

Take the square root of the variance to get the sample standard deviation ($s$), roughly the average deviation from the mean (in the same units as x)

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Descriptive Statistics: Population vs. Sample

Population Parameters

Population Size: $N$

Mean: $\mu$

Variance: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$

Standard Deviation: $\sigma = \sqrt{\sigma^2}$

Sample Statistics

Sample Size: $n$

Mean: $\bar{x}$

Variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

Standard Deviation: $s = \sqrt{s^2}$

Variance & Standard Deviation

Example

{-10, 0, 10, 20, 30}

1 Find the mean: $\bar{x} = \frac{-10 + 0 + 10 + 20 + 30}{5} = 10$

2 Find deviations from mean and square them:

x_i    x_i − x̄    (x_i − x̄)²
-10    -20        400
0      -10        100
10     0          0
20     10         100
30     20         400
∑      0          1000

3 Add them all up

400 + 100 + 0 + 100 + 400 = 1000

Variance & Standard Deviation

Example

{-10, 0, 10, 20, 30}

4 Divide by n − 1: $s^2 = \frac{1000}{4} = 250$

5 Square root (for standard deviation): $s = \sqrt{250} \approx 15.8$
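
The worked steps can be checked with a short sketch that follows them one at a time and then compares the result against `variance` and `stdev` from Python's standard `statistics` module, both of which use the n − 1 divisor. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev, variance

data = [-10, 0, 10, 20, 30]

xbar = mean(data)                               # step 1: sample mean (10)
squared_devs = [(x - xbar) ** 2 for x in data]  # step 2: squared deviations
ss = sum(squared_devs)                          # step 3: sum (1000)
s2 = ss / (len(data) - 1)                       # divide by n - 1 (250)
s = sqrt(s2)                                    # square root (about 15.81)

print(ss, s2, round(s, 2))                      # 1000 250.0 15.81
print(variance(data), round(stdev(data), 2))    # same variance and standard deviation
```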

Variance & Standard Deviation

Example

{8, 9, 10, 11, 12}

1 Find the mean

2 Find the variance

3 Find the standard deviation

Standardizing Variables

Sometimes we want to know how far a value is from its mean

We standardize a variable, or calculate its z-score:

$$Z = \frac{x - \bar{x}}{s}$$

Z is the number of standard deviations a value is away from its mean (above +, below −)

Standardizing Variables

Example

A real estate analyst finds, from data on 350 recent sales, that the average price was $175,000 with a standard deviation of $55,000. The size of the houses (in square feet) averaged 2100 sq. ft. with a standard deviation of 650 sq. ft.

Which is more unusual, a house in this town that costs $340,000, or a 5000 sq. ft. house?
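
One way to approach this question is to convert both values to z-scores and compare them. The sketch below uses the numbers given in the example; the function name `z_score` is hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z_price = z_score(340_000, 175_000, 55_000)  # price of the $340,000 house
z_size = z_score(5_000, 2_100, 650)          # size of the 5000 sq. ft. house

print(z_price)           # 3.0
print(round(z_size, 2))  # 4.46
```

The 5000 sq. ft. house is about 4.46 standard deviations above the mean size, while the $340,000 house is 3 standard deviations above the mean price, so by this measure the large house is the more unusual of the two.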

Descriptive Statistics

Most software programs can easily compute descriptive statistics (e.g. mean, median, quartiles, standard deviation) for us

MS Excel: Descriptive statistics in Data Analysis pack

TI-83+ calculators:

1 Enter data in L1: STAT → 1.Edit → input data values in column

2 CLEAR → STAT → CALC → 1.1-Var Stats, ENTER → 2nd L1 ENTER
