quantitative variables


Upload: malha

Posted on 23-Feb-2016



TRANSCRIPT

Page 1: Quantitative Variables

Quantitative Variables

• Recall that quantitative variables have units and are measured on a continuous scale…

• Examples: income (in $), height (in inches), website popularity (by number of hits)

Page 2: Quantitative Variables

Quantitative Variables

• Mathematical operations on quantitative variables make sense…

• Adding, subtracting, taking the arithmetic average, etc.
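As a small illustration of these operations, the Python sketch below adds, subtracts, and averages a handful of heights. The values are hypothetical, made up for illustration rather than taken from the slides. Each result keeps the variable's unit (inches), which is part of why the arithmetic is meaningful.

```python
from statistics import mean

# Hypothetical heights in inches (made-up values for illustration only)
heights_in = [62, 65, 67, 70, 71]

total = sum(heights_in)                          # adding: total height, in inches
difference = max(heights_in) - min(heights_in)   # subtracting: tallest minus shortest
average = mean(heights_in)                       # arithmetic average, still in inches

print(f"total = {total} in, difference = {difference} in, average = {average} in")
```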

Page 3: Quantitative Variables

Visualizing quantitative variables

• Histogram: note that the bars touch each other because the values along the bottom are continuous!

Page 4: Quantitative Variables

Visualizing quantitative variables

• Dot plot

Page 5: Quantitative Variables

So why visualize?

• To see the features of the data:
– Shape
– Center
– Spread

Page 6: Quantitative Variables

Constructing a Histogram

Page 7: Quantitative Variables

Step 1 – Choose the Classes

Page 8: Quantitative Variables

Step 2 – Count

Page 9: Quantitative Variables

Step 3 – Draw the Histogram
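The three steps can be sketched in Python as below. The slides show only the step titles, so the details here are assumptions of this sketch: equal-width classes written as half-open intervals [lo, hi), a starting point (3.0) and class width (0.5) picked by hand, and the 2010 N.Z. earthquake magnitudes that appear on the median slides as sample data. The text bars are only a stand-in for a drawn chart.

```python
import math

def make_classes(values, start, width):
    """Step 1: choose equal-width classes [lo, hi) that together cover every value.

    Assumes start <= min(values).
    """
    n_classes = math.floor((max(values) - start) / width) + 1
    return [(start + i * width, start + (i + 1) * width) for i in range(n_classes)]

def count_classes(values, classes):
    """Step 2: count the values in each class; a value on a boundary goes to the higher class."""
    return [sum(lo <= x < hi for x in values) for lo, hi in classes]

def draw_histogram(classes, counts):
    """Step 3: draw one bar per class as a row of '#' marks (a text stand-in for a chart)."""
    for (lo, hi), count in zip(classes, counts):
        print(f"{lo:.1f}-{hi:.1f} | {'#' * count}")

# 2010 N.Z. earthquake magnitudes, as listed on the median slides
magnitudes = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
classes = make_classes(magnitudes, start=3.0, width=0.5)   # start and width chosen by hand
counts = count_classes(magnitudes, classes)
draw_histogram(classes, counts)
```

In practice a plotting library (for example, matplotlib's `hist` function) would handle all three steps. The hand-rolled version just makes each step visible.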

Page 10: Quantitative Variables


Identifying Identifiers

• Identifier variables are categorical variables with exactly one individual in each category.
– Examples: Social Security Number, ISBN, FedEx Tracking Number

• Don’t be tempted to analyze identifier variables.

• Be careful not to consider all variables with one case per category, like year, as identifier variables.
– The Why will help you decide how to treat identifier variables.

Page 11: Quantitative Variables

Shape - Modality and Symmetry

Page 12: Quantitative Variables


Humps and Bumps

1. Does the histogram have a single, central hump or several separated bumps?
– Humps in a histogram are called modes.
– A histogram with one main peak is dubbed unimodal; histograms with two peaks are bimodal; histograms with three or more peaks are called multimodal.
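Counting humps can be mimicked roughly in code. The Python sketch below is a naive stand-in for judging modes by eye, not a standard method: it counts local peaks in a list of bar heights (hypothetical counts, made up for illustration) and treats a flat-topped hump as one peak. Unlike a person, it counts every small wiggle as a peak, has no notion of a "main" peak, and would even call a perfectly flat histogram unimodal.

```python
def count_peaks(heights):
    """Count local peaks in a list of bar heights, treating a flat top as a single peak."""
    # Collapse runs of equal heights so a plateau is looked at only once.
    collapsed = [h for i, h in enumerate(heights) if i == 0 or h != heights[i - 1]]
    padded = [0] + collapsed + [0]  # the region outside the histogram counts as empty
    return sum(
        padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
        for i in range(1, len(padded) - 1)
    )

def describe_modality(heights):
    """Map the peak count onto the slide's vocabulary."""
    peaks = count_peaks(heights)
    if peaks == 0:
        return "no peaks"
    if peaks == 1:
        return "unimodal"
    if peaks == 2:
        return "bimodal"
    return "multimodal"

# Hypothetical bar heights, made up for illustration
print(describe_modality([1, 3, 6, 3, 1]))         # one central hump
print(describe_modality([2, 6, 2, 1, 5, 7, 3]))   # two separated humps
print(describe_modality([4, 1, 4, 1, 4]))         # three humps
```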

Page 13: Quantitative Variables


Humps and Bumps (cont.)

• A bimodal histogram has two apparent peaks:

Page 14: Quantitative Variables


Humps and Bumps (cont.)

• A histogram that doesn’t appear to have any mode and in which all the bars are approximately the same height is called uniform:

Page 15: Quantitative Variables


Symmetry

2. Is the histogram symmetric?
– If you can fold the histogram along a vertical line through the middle and have the edges match pretty closely, the histogram is symmetric.

Page 16: Quantitative Variables


Symmetry (cont.)

– The (usually) thinner ends of a distribution are called the tails. If one tail stretches out farther than the other, the histogram is said to be skewed to the side of the longer tail.

– In the figure below, the histogram on the left is said to be skewed left, while the histogram on the right is said to be skewed right.
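A crude numerical version of the "which tail stretches farther?" question is sketched below in Python. The heuristic is an assumption of this sketch, not a rule from the slides: it measures each tail as the distance from the median to the corresponding extreme value, so it looks only at two data points and a single outlier can decide the answer. Judging the shape of the whole histogram, as the slides do, remains the method. The sample data are the 2010 N.Z. magnitudes from the median slides.

```python
import math
from statistics import median

def tail_direction(values):
    """Crude check of which tail stretches farther, measured from the median to each extreme."""
    center = median(values)
    left_tail = center - min(values)    # how far the data reach below the center
    right_tail = max(values) - center   # how far the data reach above the center
    if math.isclose(left_tail, right_tail):
        return "tails about equally long"
    return "skewed right" if right_tail > left_tail else "skewed left"

# 2010 N.Z. earthquake magnitudes from the median slides
nz = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
print(tail_direction(nz))
```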

Page 17: Quantitative Variables


Anything Unusual?

3. Do any unusual features stick out?
– Sometimes it’s the unusual features that tell us something interesting or exciting about the data.
– You should always mention any stragglers, or outliers, that stand apart from the body of the distribution.
– Are there any gaps in the distribution? If so, we might have data from more than one group.

Page 18: Quantitative Variables


Anything Unusual? (cont.)

• The following histogram has outliers: the three cities in the leftmost bar.

Page 19: Quantitative Variables

Shape - Outliers

Do any unusual features stick out?

We will discuss these in more detail when we introduce box plots.

Page 20: Quantitative Variables

Why do we care about shape?

• When a quantitative variable is skewed, we describe its center and spread using different measures than when it is symmetric.

Page 21: Quantitative Variables

The center of the distribution - median

• The “most typical value” in the data usually refers to some measure of the “center” of the distribution

• The median is the point that divides the histogram into two pieces of equal area: half of the data values lie below it and half lie above it

Page 22: Quantitative Variables

Calculating the median

• First, order all values from smallest to largest

• Let n = sample size

• If n is odd, the median is located at the (n+1)/2 position

• If n is even, the median is the average of the two middle values (positions n/2 and n/2 + 1)
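The median rule above translates directly into a short Python function. This is a minimal sketch; the only addition beyond the slide's rule is an error for an empty sample. It is checked on the two earthquake samples used in the worked examples that follow.

```python
def median_of(values):
    """Median by the slide's rule: sort the data, then take the middle value(s)."""
    ordered = sorted(values)        # order all values from smallest to largest
    n = len(ordered)                # n = sample size
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    if n % 2 == 1:
        # odd n: the median is at 1-based position (n + 1) / 2
        return ordered[(n + 1) // 2 - 1]
    # even n: average the values at 1-based positions n/2 and n/2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

nz = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]   # n = 13 (odd)
samoa = [1.1, 3.5, 4.4, 4.6, 5.1, 6.0]                                  # n = 6 (even)
print(median_of(nz), median_of(samoa))
```

Python's standard `statistics.median` follows the same odd/even rule, so it should agree with this function.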

Page 23: Quantitative Variables

Calculating the median

• Example 1: Earthquakes in N.Z.

• 2010 EQ magnitudes in N.Z.: 3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4

• Since n = 13 is odd:
– Median is located at the (n+1)/2 = (13+1)/2 = 7th position
– Median is 3.6

Page 24: Quantitative Variables

Calculating the median

• Example 2: Earthquakes in Samoa

• 2010 earthquake magnitudes in Samoa: 1.1, 3.5, 4.4, 4.6, 5.1, 6.0

• Since n = 6 is even:
– Median is the average of
• the (n/2) = (6/2) = 3rd value (4.4)
• the (n/2)+1 = (6/2)+1 = 4th value (4.6)
– Median is (4.4+4.6)/2 = 4.5

Page 25: Quantitative Variables

Median - Interpretation

• Example 1: The typical earthquake magnitude in N.Z. in 2010 was 3.6 on the Richter scale

• How useful is this?

Page 26: Quantitative Variables

Spread

• If every earthquake in N.Z. had a magnitude of 3.6, the median alone would be sufficient information

• But they did not, so we also need to see how spread out the magnitudes are around 3.6

Page 27: Quantitative Variables

Spread - Range

• Range = max value - min value

• For the N.Z. example:
– Range = 6.4 - 3.2 = 3.2

• This is not useful…why?
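One way to see why the range can mislead: it depends only on the two most extreme values. The Python sketch below recomputes the range of the N.Z. magnitudes after dropping the single largest quake (6.4); the other twelve magnitudes span only 3.2 to 3.9.

```python
def data_range(values):
    """Range = max value - min value."""
    return max(values) - min(values)

nz = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]

full = data_range(nz)                          # 6.4 - 3.2
without_largest = data_range(sorted(nz)[:-1])  # drop only the single 6.4 quake
print(round(full, 1), round(without_largest, 1))
```

A single observation accounts for most of the range, while it says nothing about how the other twelve values are spread.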

Page 28: Quantitative Variables

Spread-IQR

• Inter-quartile range (IQR)

• IQR = Q3 - Q1

• Q1 = median of the 1st half

• Q3 = median of the 2nd half

• One single number that captures “how spread out the data is”

Page 29: Quantitative Variables

Spread-IQR

• NZ Earthquake example cont.:

• 2010 EQ magnitudes in N.Z. (divided):
– 1st half: 3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6
– 2nd half: 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4

• Each half has n = 7 values, so each quartile is at position (n+1)/2 = (7+1)/2 = 4 of its half
– Q1 = 4th value of the 1st half = 3.4
– Q3 = 4th value of the 2nd half = 3.8

• IQR = 3.8 - 3.4 = 0.4

• When n is odd, include the median in both halves…don’t when n is even
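The halves rule above can be sketched in Python as follows. This is a minimal sketch of the slide's convention only: for odd n both halves include the median, and for even n they do not.

```python
def median_sorted(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data.

    For odd n both halves include the median; for even n they do not.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quartiles of an empty sample are undefined")
    half = (n + 1) // 2            # odd n: (n + 1) / 2 values per half; even n: n / 2
    lower = ordered[:half]
    upper = ordered[n - half:]
    return median_sorted(lower), median_sorted(upper)

nz = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
q1, q3 = quartiles(nz)
iqr = q3 - q1
print(q1, q3, round(iqr, 1))
```

Software packages often use other quartile conventions (for example, Python's `statistics.quantiles` or NumPy's `percentile`), so their quartiles can differ slightly from the hand calculation.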

Page 30: Quantitative Variables

IQR

• Almost always a reasonable summary of the spread of a distribution

• Shows how spread out the middle 50% of the data is

• One problem is that it ignores a lot of individual variation

Page 31: Quantitative Variables

5-Number Summary

• Minimum
• Q1
• Median
• Q3
• Maximum
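Putting the previous slides together, a five-number summary function might look like the Python sketch below. It is an illustration, not a standard API. The quartiles follow the halves rule from the IQR slides, and the function is checked on the N.Z. magnitudes.

```python
def _median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the halves rule for the quartiles."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    half = (n + 1) // 2   # odd n: both halves include the median; even n: they don't
    return (
        ordered[0],
        _median(ordered[:half]),
        _median(ordered),
        _median(ordered[n - half:]),
        ordered[-1],
    )

nz = [3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8, 3.9, 3.9, 6.4]
low, q1, med, q3, high = five_number_summary(nz)
print(f"Min {low}  Q1 {q1}  Median {med}  Q3 {q3}  Max {high}")
```

Both spread measures from the earlier slides come straight out of the summary: range = max - min and IQR = Q3 - Q1.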

Page 32: Quantitative Variables


The Five-Number Summary

• The five-number summary of a distribution reports its median, quartiles, and extremes (maximum and minimum).
– Example: The five-number summary for the ages at death for rock concert goers who died from being crushed is:

Max     47 years
Q3      22 years
Median  19 years
Q1      17 years
Min     13 years

Page 33: Quantitative Variables

Categorical or Quantitative?