Data Visualization by David Kretch

Upload: summit-consulting-llc

Post on 21-Aug-2015

Page 1: Data Visualization by David Kretch

Data Visualization

April 3, 2015

• When you should graph

• What you should graph

• Given some data, how would you graph it

Page 2: Data Visualization by David Kretch

When should you graph your data?

Always.

Don’t just make graphs for client reports; graph your data for yourself, so you understand it.

If you use a table in a report, see if you can make it into a graph.

Page 3: Data Visualization by David Kretch

Why graphs?

Because of the environment that humans evolved in, we are much better at getting information from color, size, shape, and position than from reading text.

Find the dangerous creatures!

Page 4: Data Visualization by David Kretch

Why graphs work

• Color

• Size

• Shape

• Position

Page 5: Data Visualization by David Kretch

Why else do people like graphs?

People like cool-looking stuff.

[Figure panels: "Not cool" and "Cool"]

Page 6: Data Visualization by David Kretch

What are we currently doing?

• Making lots of tables

Group      Mean     25%     50%     75%
Bananas    11.3     2.7     4.6    23.1
Kittens     4.0     0.9     3.6     7.5
Phones     -3.1   -11.0    -2.9     2.2

Variable          Parameter Estimate
Cuteness            0.6***
Ability to Fly      1.4***
Deadliness         11.2***
Telepathy          -9.8***
Big Ears          -17.3***

Page 7: Data Visualization by David Kretch

What is wrong with tables?

Tables give only a partial picture – means only tell us so much.

Figuring out what’s bigger, and by how much, requires more work.

The information is not necessarily in any order, so we need to read all the numbers.

Page 8: Data Visualization by David Kretch

What kinds of graphs should you make?

• The distribution, instead of just the mean, median, etc.

• The relationship between two variables – the conditional distribution

• Estimation results: point estimates and confidence intervals

Page 9: Data Visualization by David Kretch

What to expect out of this presentation

1. Discussion of the type of graph (e.g. distributions)

2. How the type of graph applies to continuous vs. categorical data

3. Extensions (e.g. graphing more than one at a time)

What not to expect: how to do these in any particular software.

Page 10: Data Visualization by David Kretch

Distributions

Page 11: Data Visualization by David Kretch

Distributions – Continuous variables

Make density plots/histograms for continuous variables. These give much more information than means, medians, etc.

Two distributions with the same mean, but which are dramatically different.
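
A minimal sketch of this point in R and ggplot2. The data are simulated and the names (`sim`, `shape`, `group_means`) are made up for illustration: both samples are centered on zero, but one has a single peak and the other has two.

```r
library(ggplot2)

set.seed(1)  # make the simulated data reproducible
n <- 1000

# Two hypothetical samples, both centered on 0:
# one with a single peak, one with peaks at -3 and +3.
sim <- data.frame(
  value = c(rnorm(n, mean = 0, sd = 1),
            rnorm(n, mean = rep(c(-3, 3), each = n / 2), sd = 0.5)),
  shape = rep(c("unimodal", "bimodal"), each = n)
)

# The group means are nearly identical ...
group_means <- tapply(sim$value, sim$shape, mean)

# ... but the density plots look nothing alike.
p_same_mean <- ggplot(sim, aes(value, fill = shape)) +
  geom_density(alpha = 0.5)
```

Printing `group_means` next to the plot makes the contrast clear: a table of means would report two almost equal numbers.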

Page 12: Data Visualization by David Kretch

Density vs. histogram

A density plot is basically a smoothed histogram.
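
One way to see the connection is to draw both on the same axes. This is only a sketch on simulated data (`obs` is a made-up data frame); the histogram is rescaled to density units so the two layers are comparable.

```r
library(ggplot2)

set.seed(2)
obs <- data.frame(x = rnorm(500))  # simulated values, for illustration only

p_hist_density <- ggplot(obs, aes(x)) +
  # Rescale the histogram from counts to density so both layers share a y-axis
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "grey70") +
  # The density curve traces a smoothed version of the bars
  geom_density(color = "steelblue")
```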

Page 13: Data Visualization by David Kretch

Distributions – Categorical variables

Make bar charts for categorical variables.

Tip: if your categories don’t have any inherent order, order them from largest to smallest.
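
A small sketch of the ordering tip, using a made-up `pets` data frame. Setting the factor levels by frequency is one common way to do it; it is not the only one.

```r
library(ggplot2)

# Hypothetical survey responses: a categorical variable with no inherent order
pets <- data.frame(
  pet = c("cat", "dog", "dog", "fish", "dog", "cat", "dog", "bird", "cat")
)

# Count each category and sort the counts, largest first
counts <- sort(table(pets$pet), decreasing = TRUE)

# Bars are drawn in factor-level order, so set the levels to the sorted names
pets$pet <- factor(pets$pet, levels = names(counts))

p_bar <- ggplot(pets, aes(pet)) + geom_bar()
```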

Page 14: Data Visualization by David Kretch

Compare distributions using color

Suppose we want to compare the distribution of income among different occupations. Plot all the distributions, distinguished by color, and use transparency to make them all visible simultaneously.
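
A sketch of the idea with simulated incomes. The occupations, income levels, and the `incomes` data frame are invented for illustration; `alpha` controls the transparency.

```r
library(ggplot2)

set.seed(3)

# Simulated incomes for three made-up occupations
incomes <- data.frame(
  occupation = rep(c("baker", "pilot", "teacher"), each = 300),
  income = c(rlnorm(300, meanlog = log(35000), sdlog = 0.3),
             rlnorm(300, meanlog = log(90000), sdlog = 0.3),
             rlnorm(300, meanlog = log(50000), sdlog = 0.3))
)

# One density per occupation, distinguished by fill color;
# alpha < 1 keeps overlapping densities visible
p_income <- ggplot(incomes, aes(income, fill = occupation)) +
  geom_density(alpha = 0.4)
```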

Page 15: Data Visualization by David Kretch

Highlighting important facts

Add vertical lines to highlight the means.
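
As a sketch, the means can be computed into their own small data frame and passed to `geom_vline()`. The `scores` data and group names here are simulated, not from the original graphs.

```r
library(ggplot2)

set.seed(4)

# Simulated scores for two hypothetical groups
scores <- data.frame(
  group = rep(c("a", "b"), each = 200),
  score = c(rnorm(200, mean = 50, sd = 10), rnorm(200, mean = 65, sd = 10))
)

# One row per group, holding that group's mean score
means <- aggregate(score ~ group, data = scores, FUN = mean)

p_means <- ggplot(scores, aes(score, fill = group)) +
  geom_density(alpha = 0.4) +
  # A dashed vertical line at each group's mean, colored to match the group
  geom_vline(data = means, aes(xintercept = score, color = group),
             linetype = "dashed")
```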

Page 16: Data Visualization by David Kretch

Relationships

Page 17: Data Visualization by David Kretch

Relationships between variables

If we’re asking, for example, what GDP growth looks like at different levels of government spending, we can show this using a scatterplot.

Page 18: Data Visualization by David Kretch

How to show trends

We can highlight the trend using scatterplot smoothing, which adapts the shape of the trend line to the data.
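
A sketch using LOESS, one common scatterplot smoother; the original graph may have used a different one. The `econ` data are simulated with a deliberately nonlinear relationship.

```r
library(ggplot2)

set.seed(5)

# Simulated data: growth depends on spending in a nonlinear way (invented numbers)
econ <- data.frame(spending = runif(200, min = 10, max = 60))
econ$growth <- 4 * sin(econ$spending / 12) + rnorm(200, sd = 0.8)

p_trend <- ggplot(econ, aes(spending, growth)) +
  geom_point(alpha = 0.5) +
  # LOESS fits the curve locally, so the trend line bends with the data
  geom_smooth(method = "loess", formula = y ~ x)
```

A straight-line fit (`method = "lm"`) on the same data would miss the curvature, which is the reason to prefer a smoother here.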

Page 19: Data Visualization by David Kretch

How to show multiple groups

We can see if the relationship differs among groups by giving each group a color.

Page 20: Data Visualization by David Kretch

Another use for colors

Suppose we want to come up with rules to identify people’s favorite food based on population density and elevation (bear with me).

Can we see this on a graph?

Page 21: Data Visualization by David Kretch

Graphing relationships with categorical data

With categorical data, you typically can’t use scatterplots because points fall right on top of each other (‘overplotting’). However! We can use jittering to move the plotted points slightly.
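
A sketch of the difference, on a made-up `ratings` data frame with 300 observations that can only land on 15 distinct positions. The jitter amounts (0.2) are arbitrary choices.

```r
library(ggplot2)

set.seed(6)

# Simulated data: 3 products x 5 star levels, so only 15 possible positions
ratings <- data.frame(
  product = sample(c("A", "B", "C"), 300, replace = TRUE),
  stars = sample(1:5, 300, replace = TRUE)
)

# Without jittering: 300 points collapse onto at most 15 visible dots
p_overplotted <- ggplot(ratings, aes(product, stars)) +
  geom_point()

# With jittering: each point is nudged by a small random amount
p_jittered <- ggplot(ratings, aes(product, stars)) +
  geom_jitter(width = 0.2, height = 0.2, alpha = 0.5)
```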

[Figure panels: "Without jittering" and "With jittering"]

Page 22: Data Visualization by David Kretch

Graphing relationships with categorical data

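The next step beyond jittering is to use a boxplot, which shows

– the median (the line inside the box),

– the 25th and 75th percentiles (the edges of the box),

– whiskers reaching to the most extreme points within 1.5 times the inter-quartile range (IQR) of the box, and

– outliers beyond the whiskers (plotted as points).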
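
A sketch with simulated data (`heights` and the species names are made up). Note that `geom_boxplot()` draws the median as the center line. The hand computation below uses R's default quantile method to show where the box edges and the upper outlier cutoff come from.

```r
library(ggplot2)

set.seed(7)

# Simulated tree heights for two hypothetical species
heights <- data.frame(
  species = rep(c("oak", "pine"), each = 100),
  height = c(rnorm(100, mean = 20, sd = 3), rnorm(100, mean = 28, sd = 5))
)

p_box <- ggplot(heights, aes(species, height)) +
  geom_boxplot()

# The same summary computed by hand for one group
oak <- heights$height[heights$species == "oak"]
q <- quantile(oak, c(0.25, 0.5, 0.75))  # box bottom, center line, box top
iqr <- q[[3]] - q[[1]]                  # inter-quartile range
upper_fence <- q[[3]] + 1.5 * iqr       # points above this are drawn as outliers
```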

[Figure annotations: median; 75th percentile; 75th percentile + 1.5 × IQR (upper whisker limit); outlier]

Page 23: Data Visualization by David Kretch

Looping back

A boxplot isn’t, after all, all that different from the multi-colored density plot we showed earlier. Which is better depends on what you’re trying to show.

Page 24: Data Visualization by David Kretch

Use log scale if your data spans a wide range

Let’s say you have a large range of values, but most of your data is concentrated in one part of the range.

It’s easier to see what’s going on when we use a log scale.
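
A sketch of the before and after, on a simulated, heavily right-skewed variable (`firms$revenue` is invented for illustration).

```r
library(ggplot2)

set.seed(8)

# Simulated revenues: most values are small, a few are enormous
firms <- data.frame(revenue = rlnorm(1000, meanlog = 10, sdlog = 2))

# Linear scale: nearly everything piles up in the first few bins
p_linear <- ggplot(firms, aes(revenue)) +
  geom_histogram(bins = 40)

# Log scale: the same data spread out enough to show the shape
p_log <- p_linear + scale_x_log10()
```

A log scale only works for positive values, so zeros or negatives would need separate handling.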

Page 25: Data Visualization by David Kretch

Estimation results

Page 26: Data Visualization by David Kretch

Graphing estimation results

We make a lot of regression tables, but we can make them easier to understand by putting them into graphs.
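
One possible way to do this, sketched on simulated data: fit a model, pull out the point estimates and 95% confidence intervals, and plot them as points with ranges. The variable names echo the joke table earlier, but the data, coefficients, and the `est` data frame are all made up.

```r
library(ggplot2)

set.seed(9)
n <- 200

# Simulated predictors and an outcome with known (invented) effects
d <- data.frame(cuteness = rnorm(n), deadliness = rnorm(n), big_ears = rnorm(n))
d$outcome <- 0.6 * d$cuteness + 1.5 * d$deadliness - 1.0 * d$big_ears + rnorm(n)

fit <- lm(outcome ~ cuteness + deadliness + big_ears, data = d)

# Collect point estimates and 95% confidence intervals (confint's default level)
ci <- confint(fit)
est <- data.frame(
  term = names(coef(fit)),
  estimate = unname(coef(fit)),
  lower = ci[, 1],
  upper = ci[, 2]
)
est <- est[est$term != "(Intercept)", ]  # the intercept is rarely of interest

p_coef <- ggplot(est, aes(estimate, term)) +
  # Reference line at zero: intervals crossing it are not significant at 5%
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_pointrange(aes(xmin = lower, xmax = upper))
```

Compared with a column of starred numbers, the plot shows sign, size, and uncertainty at a glance.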

Page 27: Data Visualization by David Kretch

ggplot(df, aes(population_density, elevation, color = favorite_food)) + geom_point()

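[Code annotations: df is the dataset, population_density is the x variable, elevation is the y variable, favorite_food is the color variable, and geom_point() makes the scatterplot.]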

All graphs made in R and ggplot2

Page 28: Data Visualization by David Kretch

Data Visualization Checklist

• Always graph

• Use color, size, shape, and position

• Three important types of graph:

– Distribution

– Relationship

– Estimation results

• Highlight important facts

• Make it cool-looking
