Design principles for data presentation: Translating analysis into visualization

Andrew Eppig, UC Berkeley
CAIR Annual Conference
8 November 2012

CAIR 2012 Andrew Eppig, 8 November 2012

Overview

• Design Principles

• Data Sample and Summary

• Why Charts Matter

• Visualization Process

• Chart Selection

• Layout

• Aesthetics

• Self-Sufficiency Check

• Institutional Examples and Applications

Guiding Design Principles

• Good visualizations start with good data and detailed analysis

• Know your data

• Good visualizations directly answer specific, focused questions

• Know what question(s) you are asking

• Good visualizations get out of the way of the data

• Let the data tell its story without excess clutter or distraction

“Too often we pay more attention to ‘pretty’ than to the most important element: information.”

-- Dona Wong, The Secrets of Graphics Presentation

Data Sample, Summary, and Metrics

Name              Weight (lb.)   Height (in.)   BMI
Batman            210            74             27.0
Michael Phelps    165            75             20.6
Wonder Woman      130            72             17.6
Hope Solo         140            69             20.7

Gender   Group      N       BMI Mean   BMI Std. Dev.
Male     Comics     1,239   26.0       4.2
Male     Athletes   403     23.8       3.5
Male     Models     493     21.6       2.3
Female   Comics     505     20.3       3.4
Female   Athletes   254     22.0       3.4
Female   Models     489     18.2       2.7

BMI = 703 x weight [lb] / (height [in])²
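As a quick check of the metric, the BMI formula on this slide can be sketched in a few lines of Python. The function name `bmi` is an illustrative choice, not something from the talk; the four sample rows are the ones listed on this slide.

```python
def bmi(weight_lb: float, height_in: float) -> float:
    """Body Mass Index from pounds and inches: 703 * weight / height^2."""
    if height_in <= 0:
        raise ValueError("height must be positive")
    return 703 * weight_lb / height_in ** 2


# Sample rows from this slide: (name, weight in lb., height in in.)
sample = [
    ("Batman", 210, 74),
    ("Michael Phelps", 165, 75),
    ("Wonder Woman", 130, 72),
    ("Hope Solo", 140, 69),
]

for name, weight, height in sample:
    print(f"{name}: {bmi(weight, height):.1f}")
```

Rounded to one decimal, the results should agree with the BMI column of the sample table.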

Default Excel Charting

[Figure: default Excel bar chart, "Average BMI by Group and Gender". Groups: Models, Athletes, Comics; series: Female, Male; y-axis labeled from 0.000 to 35.000.]

Excessive Chart Junk

[Figure: cluttered version of "Average BMI by Group and Gender". Groups: Models, Athletes, Comics; series: Female, Male; y-axis (BMI) labeled from 17.000 to 27.000; data labels 18.200, 21.600, 22.000, 23.800, 20.300, 26.000.]

Improved Excel Charting

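[Figure: improved chart, "Average BMI by Group and Gender", with separate Female and Male panels. Each group (Models, Athletes, Comics) is labeled directly, with rounded values 18, 22, 22, 24, 20, 26 and no axis clutter.]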

Visualization Checklist

• What stories does the data tell? Which story do you want to tell?

• What visualization will best aid the story?

• Who is the audience?

• What metric should you use?

• Which type of chart should you use?

• What is the layout of the visualization?

• How can details enhance the chart?

• Font, color, lines/shading, and text

Source: Kaiser Fung, Junk Charts

“What are the content-reasoning tasks that this display is supposed to help with?” -- Edward Tufte, Beautiful Evidence

Chart Selection

“Meaningful quantitative information always involves relationships. When displayed in graphs, these relationships always boil down to one or more of eight specific relationships: time series, ranking, part-to-whole, deviation, distribution, correlation, geospatial, nominal comparison.” -- Stephen Few, Designing Effective Tables and Graphs

[Infographic: "Body Mass Index Distributions: Super Humans - or - Super Models?" Six panels, each plotting Probability against Body Mass Index (15 to 35). The blue shaded area shows the middle 50% of the superhero BMI distributions.]

Analysis Question: Are comic book superheroes’ bodies more like top athletes’ or top models’ bodies? Are comparisons the same for both men and women? To account for differences in height and weight, Body Mass Index (BMI) is used: BMI = 703 x weight / height². Comparing the BMI distributions will reveal similarities or differences.

Panel                N       Median BMI   Share of distribution inside shaded area
Male Superheroes     1,239   25.2         50%
Male Athletes        403     23.2         39%
Male Models          493     21.6         17%
Female Superheroes   505     19.7         50%
Female Athletes      254     21.4         31%
Female Models        489     17.7         28%

Summary: Male superheroes tend to have a higher BMI than top male athletes and a much higher BMI than top male models. So male superheroes are beyond super human, but more like top male athletes than like top male models. Female superheroes tend to have a lower BMI than top female athletes and a higher BMI than top female models. So female superheroes are neither super humans nor super models but between top female athletes and top female models. Female superheroes also have less variation in their BMI than male superheroes or top athletes of either gender.

Analysis, Writing, Art, and Lettering by: Andrew Eppig

Sources: DC Comics (dc.wikia.com); Marvel Comics (Marvel.com/universe); 2008 US Olympic Team (www.2008.nbcolympics.com); Models.com (models.com)
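The "inside shaded area" percentages on this infographic can be reproduced in principle with a short calculation: take the middle 50% of the superhero values (first to third quartile) and count what share of another group falls inside that band. The sketch below assumes inclusive quartiles and uses made-up BMI values purely for illustration; the talk does not say which quartile method was used.

```python
from statistics import quantiles


def middle_half(values):
    """Return (Q1, Q3), the bounds of the middle 50% of the values."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3


def share_inside(values, low, high):
    """Fraction of values that fall inside the closed interval [low, high]."""
    inside = sum(1 for v in values if low <= v <= high)
    return inside / len(values)


# Hypothetical BMI values, for illustration only (not the talk's data).
superheroes = [22.0, 24.0, 25.0, 26.0, 27.0, 28.0, 30.0, 33.0]
athletes = [20.0, 21.5, 22.5, 23.0, 24.5, 25.5, 27.5, 29.0]

low, high = middle_half(superheroes)
print(f"shaded band: {low:.2f} to {high:.2f}")
print(f"athletes inside band: {share_inside(athletes, low, high):.0%}")
```

By construction, roughly half of the superhero values fall inside their own band, which is why the superhero panels read 50%.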

Visualization Layout

“Tufte’s (1990) recommendation of ‘small multiples’ [...] uses the replication in the display to facilitate comparison to the implicit model of no change between the displays.” -- Andrew Gelman, Exploratory Data Analysis for Complex Models

[Figure: the Body Mass Index Distributions infographic from the Chart Selection slide, laid out as small multiples. Six panels (Male and Female Superheroes, Athletes, and Models), each plotting Probability against Body Mass Index on a common 15 to 35 scale, with the same medians, sample sizes, analysis question, summary, and sources.]

Aesthetic Considerations

• Font: How do you increase legibility and decrease distraction?

• Color: Which color palette is appropriate?

• Line/Shading: Which weight, color, and style will enhance the final product?

• Text: Can adding labels and narrative provide useful context?

“Hue contrast is easy to overuse to the point of visual clutter. A better approach is to use a few high chroma colors as color contrast in a presentation consisting primarily of grays and muted colors.”

-- Maureen Stone, Choosing Colors for Data Visualization

Source: Chemical Color Plate Corp.

Final Infographic

[Figure: final version of the Body Mass Index Distributions infographic ("Super Humans - or - Super Models?"), with the same six panels, medians, sample sizes, analysis question, summary, and sources as shown on the Chart Selection slide.]

Self-Sufficiency Test

“Can the graphical elements stand on their own feet? If one removes the numbers from the graphic, can one still understand the key messages?” -- Kaiser Fung, Junk Charts

[Figure: the six infographic panels with all numbers removed, leaving only the panel titles (Male and Female Superheroes, Athletes, and Models) and the axis names (Body Mass Index, Probability).]

Chart Function and Selection

Analytical Relationship   Highlighted Feature
Time Series               Changes over time
Ranking                   Relative position
Part-to-Whole             Fraction of whole
Deviation                 Differences between sets
Distribution              Range and frequency
Correlation               Relationship between sets
Geospatial                Location
Nominal Comparison        Group values

Source: Stephen Few, Show Me the Numbers (2nd edition)

Line Charts - Time Series

[Figure: three flawed line charts and one improved line chart of "UC Berkeley Undergraduate Fall Enrollment, 1983-2012" (y-axis: Headcount). The improved chart runs from 20,000 to 26,000 with x-axis labels every five years from 1980 to 2010.]

✘ x-axis has too many labels and labels are slanted
✘ y-axis scale is too large
✘ y-axis gridlines are too heavy

✓ x-axis labels are horizontal and legibly spaced
✓ data range is roughly 2/3 of the y-axis scale
✓ y-axis gridlines are light in color and weight

Source: UC Berkeley, Cal Answers
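The "data range is roughly 2/3 of the y-axis scale" guideline can be turned into a small helper. This is a minimal sketch, assuming the data should be centered vertically; the function name and the sample headcounts are hypothetical, and in practice the limits would then be rounded to natural tick values.

```python
def y_axis_limits(values, data_fraction=2 / 3):
    """Axis limits that center the data and let it span `data_fraction` of the axis."""
    lo, hi = min(values), max(values)
    data_range = hi - lo
    if data_range == 0:
        raise ValueError("values must not all be equal")
    axis_range = data_range / data_fraction
    padding = (axis_range - data_range) / 2
    return lo - padding, hi + padding


# Hypothetical headcounts spanning 21,000 to 25,000.
axis_lo, axis_hi = y_axis_limits([21000, 22500, 24000, 25000])
print(round(axis_lo), round(axis_hi))
```

A data range of 4,000 then sits on an axis of about 6,000, with equal padding above and below.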

Line Charts - Distribution

[Figure: three flawed line charts and one improved line chart of "Undergraduate Years to Degree, UC Berkeley Fall 2004 Cohort" (x-axis: Years to Graduation, 1 to 7; y-axis: Headcount, 0 to 2,500).]

✘ y-axis labels have extraneous decimal points
✘ x-axis labels use an unintuitive interval
✘ axis labels are too heavy

✓ y-axis labels are rounded to the nearest major increment
✓ axis labels use natural intervals (1, 2, 3...; 2, 4, 6...; 10, 20, 30...)
✓ x-axis labels are light in size and weight

Source: UC Berkeley, Cal Answers
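One common way to get natural axis intervals is to pick the step from a 1-2-5 family scaled by a power of ten. The sketch below assumes that family (the slide itself only lists 1, 2, and 10 as examples); `natural_step` and `ticks` are illustrative names.

```python
import math


def natural_step(lo, hi, max_ticks=8):
    """Smallest 1-2-5 x 10^k step that is at least (hi - lo) / (max_ticks - 1)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    raw = (hi - lo) / (max_ticks - 1)
    magnitude = 10 ** math.floor(math.log10(raw))
    for multiple in (1, 2, 5, 10):
        step = multiple * magnitude
        if step >= raw:
            return step
    return 10 * magnitude  # fallback; the loop normally returns first


def ticks(lo, hi, max_ticks=8):
    """Tick positions from the first multiple of the step at or below lo, up to hi."""
    step = natural_step(lo, hi, max_ticks)
    first = math.floor(lo / step) * step
    count = math.ceil((hi - first) / step) + 1
    return [first + i * step for i in range(count)]


print(ticks(0, 2500))  # headcount axis
print(ticks(1, 7))     # years-to-graduation axis
```

For the two axes on this slide, the sketch yields steps of 500 and 1, in line with the improved chart.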

Bar Charts - Ranking

[Figure: two bar charts of "Undergraduate Major Headcount by Division, UC Berkeley Fall 2012", one with divisions in alphabetical order and slanted labels, one ranked by value with horizontal labels.]

Division                       Headcount
L&S-Social Sciences            3,757
College of Engineering         3,158
College of Natural Resources   1,774
L&S-Arts & Humanities          1,731
L&S-Undergraduate              1,464
L&S-Bio Sciences               1,151
L&S-Administered Programs      1,101
L&S-Math & Phys. Sciences      977
College of Chemistry           846
Haas School of Business        680
College of Env. Design         579
Other EVCP Programs            6

✘ x-axis labels are slanted and small
✘ units are ranked alphabetically

✓ labels are horizontal and easy to read
✓ units are usefully ranked by descending value

Source: UC Berkeley, Cal Answers
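Ranking by value rather than by name is a one-line sort before plotting. A minimal sketch, using the division headcounts as read from the two charts on this slide (the variable names are illustrative):

```python
# Headcounts in the alphabetical order used by the flawed chart.
headcount = {
    "College of Chemistry": 846,
    "College of Engineering": 3158,
    "College of Env. Design": 579,
    "College of Natural Resources": 1774,
    "Haas School of Business": 680,
    "L&S-Administered Programs": 1101,
    "L&S-Arts & Humanities": 1731,
    "L&S-Bio Sciences": 1151,
    "L&S-Math & Phys. Sciences": 977,
    "L&S-Social Sciences": 3757,
    "L&S-Undergraduate": 1464,
    "Other EVCP Programs": 6,
}

# Rank by descending value so the chart reads from largest to smallest.
ranked = sorted(headcount.items(), key=lambda item: item[1], reverse=True)

for division, count in ranked:
    print(f"{division:<30}{count:>6,}")
```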

Bar Charts - Part-to-Whole

[Figure: flawed and improved horizontal bar charts of "Undergraduate Demographic Shares by Race/Ethnicity, UC Berkeley Fall 2012" (x-axis: 0% to 40%).]

Race/Ethnicity                   Share
Asian                            39%
White                            29%
Chicano/Latino                   13%
International                    10%
Decline to State                 5%
African American                 3%
Native American/Alaskan Native   1%
Pacific Islander                 0%

✘ values are hard to determine
✘ gaps between bars are too large

✓ values are easy to read
✓ gaps between bars are 25-50% of the bar width

Source: UC Berkeley, Cal Answers

Bar Charts - Histogram

[Figure: two flawed histograms and one improved histogram of "UC Berkeley Undergraduate Cumulative GPA, Spring 2012" (x-axis: Cumulative GPA, 0.0 to 4.0; y-axis: Count, 0 to 2,500).]

Data labels on the cluttered version: 23, 2, 0, 2, 6, 3, 4, 2, 5, 5, 10, 8, 13, 17, 22, 23, 29, 49, 66, 82, 151, 201, 291, 352, 459, 621, 748, 874, 1,049, 1,235, 1,446, 1,677, 1,969, 2,118, 2,104, 2,260, 2,339, 2,024, 1,817, 1,295, 394

✘ y-axis gridlines are too dense
✘ data labels distract, add clutter

✓ y-axis gridlines are only on the major increments
✓ shape of the distribution is easily seen without distraction

Source: UC Berkeley, Cal Answers

Bar Charts - Time Series

[Figure: three flawed bar charts and one improved bar chart of "UC Berkeley International Undergraduate Fall Enrollment, 1990-2012" (x-axis: year, 1990 to 2010; y-axis: Headcount, 0 to 3,000).]

✘ time variable is shown from right to left
✘ y-axis is used for time variable
✘ multiple hues are used for the same kind of data

✓ years increase from left to right
✓ years are plotted on x-axis
✓ bars are colored using shades of a single hue

Source: UC Berkeley, Cal Answers

Dot Plots - Deviation

[Figure: two flawed dot plots and one improved dot plot of "UC Berkeley New Freshmen 6-Year Graduation Rates by Race/Ethnicity, Fall 2004 Cohort" (x-axis: 6-Year Graduation Rate, 60% to 100%). Rows: Asian, Other/Decline to State, White, Pacific Islander, Native American/Alaskan Native, Chicano/Latino, International, African American; one dot each for Men and Women.]

✘ points are too small
✘ group colors are too similar

✓ dots’ size makes them easy to see
✓ group colors are complementary

Source: UC Berkeley, Cal Answers

Scatter Plots - Correlation

[Figure: two flawed scatter plots and one improved scatter plot of "Impact of Critical Mass on Respect Rates by Race/Ethnicity and by UC Campus, 2007-2008 AY" (x-axis: Share of Race/Ethnicity Among New Students on Campus, 0% to 50%; y-axis: Respect Rate*, 40% to 100%). Groups: African American, Chicano/Latino, Asian/Pacific Islander, White.]

✘ too many bright hues are used to mark groups
✘ chart length distorts the correlation of the data

✓ groups are marked using muted hues
✓ chart dimensions are close to the golden ratio (1:1.618)

* Respect Rate = percentage of students of a given race/ethnicity who responded strongly agree, agree, or somewhat agree to the prompt “students of my race/ethnicity are respected at this campus” on UCUES.

Source: UC Accountability Report, 2011
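The golden-ratio guideline for chart dimensions is simple arithmetic. A minimal sketch, where `chart_size` is a hypothetical helper and the width of 8 is an arbitrary example in whatever unit the charting tool uses:

```python
GOLDEN_RATIO = (1 + 5 ** 0.5) / 2  # about 1.618


def chart_size(width, landscape=True):
    """Return (width, height) with sides in the ratio 1:1.618."""
    height = width / GOLDEN_RATIO if landscape else width * GOLDEN_RATIO
    return width, height


width, height = chart_size(8)
print(f"{width} x {height:.2f}")
```

A landscape chart 8 units wide comes out a little under 5 units tall.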

Pie Charts - Part-to-Whole

[Figure: two flawed pie charts and one improved pie chart of "UC Berkeley College of Letters & Sciences Enrollment Shares, Fall 2012". The flawed versions show six slices (37%, 17%, 14%, 11%, 11%, 10%), some pulled out of the pie. The improved version shows five slices (37%, 25%, 17%, 11%, 10%) with legend entries Social Sciences, Arts & Humanities, Biological Sciences, Math & Physical Sciences, and Other L&S Divisions; the percentages are listed here in descending order, not in legend order.]

✘ more than five groups are shown
✘ slices are blown out of the pie
✘ smallest slices are given too prominent location

✓ only five groups are shown with the rest aggregated together
✓ center of the pie is shown making angles visible
✓ groups are ordered usefully with the largest slices at the top

Source: UC Berkeley, Cal Answers
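Limiting a pie to five slices means folding the remaining groups into one aggregate. The sketch below keeps the largest groups and sums the rest; it uses the L&S headcounts from the Bar Charts - Ranking slide as sample input. The helper name and the strict "largest first" rule are assumptions, and the slide's own pie may group divisions differently.

```python
def top_groups(counts, max_groups=5, other_label="Other"):
    """Keep the largest groups and fold the rest into one `other_label` slice."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) <= max_groups:
        return ranked
    kept = ranked[: max_groups - 1]
    other_total = sum(count for _, count in ranked[max_groups - 1:])
    return kept + [(other_label, other_total)]


# L&S division headcounts from the Bar Charts - Ranking slide.
ls_headcount = {
    "L&S-Social Sciences": 3757,
    "L&S-Arts & Humanities": 1731,
    "L&S-Undergraduate": 1464,
    "L&S-Bio Sciences": 1151,
    "L&S-Administered Programs": 1101,
    "L&S-Math & Phys. Sciences": 977,
}

slices = top_groups(ls_headcount)
total = sum(count for _, count in slices)
for label, count in slices:
    print(f"{label:<28}{count / total:>5.0%}")
```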

Visualization Layout: Attention Areas

High Visual Focus

Good for primary content

Medium Visual Focus

Good for secondary content

Medium Visual Focus

Good for secondary content

Low Visual Focus

Good for tertiary content

Visualization Aesthetics: Color

✘ avoid alternating high contrast hues
✘ avoid using more than one high chroma hue

✓ use a palette mostly of grays and muted hues
✓ choose a few high chroma colors for contrast
✓ use shades and tints to ensure that a black-and-white copy will still be coherent

[Figure: example color palettes labeled "Bright (high chroma)" and "Muted".]

Source: Dona Wong, The Wall Street Journal Guide to Information Graphics: The Dos and Don’ts of Presenting Data, Facts, and Figures
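Whether a palette survives a black-and-white copy can be checked roughly by converting each color to a gray level and confirming the levels stay apart. The sketch below uses the Rec. 601 luma weights as an approximation; the minimum gap of 40 and the example hex colors are arbitrary assumptions, not values from the talk.

```python
def gray_level(hex_color):
    """Approximate gray level (0-255) of an sRGB color, using Rec. 601 luma weights."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def distinct_in_grayscale(colors, min_gap=40):
    """True if every pair of neighboring gray levels differs by at least `min_gap`."""
    levels = sorted(gray_level(c) for c in colors)
    return all(b - a >= min_gap for a, b in zip(levels, levels[1:]))


# Two different hues with nearly the same gray level.
print(distinct_in_grayscale(["#ff0000", "#008000"]))
# Three shades and tints of one blue hue.
print(distinct_in_grayscale(["#08306b", "#4292c6", "#c6dbef"]))
```

The red and green pair collapses to almost the same gray, while the single-hue shades stay clearly separated.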

Visualization Aesthetics: Font

[Figure: flawed and improved dot plots of "UC Berkeley New Freshmen Yield Rates by Race/Ethnicity, Fall 2010 Cohort" (x-axis: Yield Rate, 20% to 50%). Rows: Asian, Other/Decline to State, African American, Pacific Islander, Chicano/Latino, White, Native American/Alaskan Native, International; one dot each for Women and Men. The flawed version sets the title and labels in bold, condensed capitals.]

✘ bold and condensed fonts confuse the viewer
✘ multiplicity of fonts deters legibility

✓ font choice, weight, and spacing aid clarity
✓ single font used for labels -- second font only used for the title

Source: UC Berkeley, Cal Answers

Visualization Aesthetics: Lines/Shading

[Figure: three flawed line charts and one improved line chart of "UC New Fall Undergraduate Enrollment by Campus, 1995-2011" (x-axis: year, 1995 to 2010; y-axis: Headcount, 0 to 7,000). The flawed version uses a legend with nine campuses (Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, Santa Barbara, Santa Cruz, Berkeley); the improved version labels only "UC Berkeley" and "Other UC Campuses" directly on the lines.]

✘ more than four groups are identified in one chart
✘ weight of lines blurs trend details
✘ label position makes identification hard

✓ only two groups are identified
✓ line weights are used for emphasis
✓ lines are directly labeled

Source: UC Accountability Report, 2011

Visualization Aesthetics: Labels/Text

[Figure: "UC Berkeley Undergraduate New Enrollment Shares by Gender and Race/Ethnicity, 1983-2012". Four small-multiple line charts (White, Asian/Pacific Islander, Underrepresented Minority, International), each plotting Share (0% to 30%) by year (1980 to 2010), with directly labeled lines for Women and Men and an annotation reading "Prop 209 enacted".]

Source: UC Berkeley, Cal Answers

Prop 209 banned affirmative action in 1997, precipitating a sharp decline in underrepresented minority (URM) student shares, which have yet to recover.

The overall gender gap, with women outnumbering men, is driven by Asian and URM students, where the gender gaps are largest.

Summary

• Know what question you are asking a visualization to answer

• Choose the best metric for your analysis and your audience

• Choose your chart to fit your question rather than your question to fit your chart

• Let the data tell its story without excess clutter or distraction

• Keep the focus of the visualization on the data

• Make sure every use of font, color, shading, and text enhances rather than distracts

• Provide narrative to contextualize the highlights of the data

Contact Information

Please feel free to contact me with questions or comments

Andrew Eppig

Research Analyst

Equity & Inclusion

UC Berkeley

104 California Hall #1500

Berkeley, CA 94720-1500

[email protected]

Web Resources

• Junk Charts -- Kaiser Fung

• http://junkcharts.typepad.com

• Flowing Data -- Nathan Yau

• http://flowingdata.com/

• Charts ‘n’ Things -- NY Times Graphics Department

• http://chartsnthings.tumblr.com/

• Perceptual Edge -- Stephen Few

• http://www.perceptualedge.com

Print Resources

Edward Tufte

• The Visual Display of Quantitative Information, 1983, Cheshire, CT: Graphics Press

• Visual Explanations: Images and Quantities, Evidence and Narrative, 1997, Cheshire, CT: Graphics Press

William Cleveland

• The Elements of Graphing Data, 1994, revised ed., Murray Hill, NJ: AT&T Bell Laboratories

Dona Wong

• The Wall Street Journal Guide to Information Graphics: The Dos and Don’ts of Presenting Data, Facts, and Figures, 2010, New York: W.W. Norton and Co.

Stephen Few

• Information Dashboard Design: The Effective Visual Communication of Data, 2006, Oakland, CA: Analytics Press

• Now You See It: Simple Visualization Techniques for Quantitative Analysis, 2009, Oakland, CA: Analytics Press

• Show Me the Numbers: Designing Tables and Graphs to Enlighten, 2012, second ed., Oakland, CA: Analytics Press

Appendices

Classic Charts

Charles Minard's 1869 chart showing the number of men in Napoleon’s 1812 Russian campaign army, their movements, as well as the temperature they encountered on the return path. Lithograph, 62 x 30 cm

Classic Charts

Detail from John Snow's spot map of the Golden Square outbreak [1854 London cholera outbreak] showing area enclosed within the Voronoi network diagram. Snow's original dotted line to denote equidistance between the Broad Street pump and the nearest alternative pump for procuring water has been replaced by a solid line for legibility. Fold lines and tear in original (adapted from CIC, between 106 and 07).

Bad Chart Examples

The problem:

• The 1978 dollar should be roughly half as big as the 1958 dollar ($0.44 vs $1.00) instead of roughly one quarter as big

How the problem occurred:

• The chart uses 2-D graphics (i.e., representations of dollar bills with length and width), and both the length and the width were scaled by 1/2 -- resulting in the area being scaled by 1/4 (1/2 x 1/2)

The fix:

• When dealing with 2-D area representations (never use 3-D), remember to scale the area rather than scaling each dimension separately

Source: Tufte, 1983
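The fix amounts to scaling each side by the square root of the value ratio, so that the area carries the ratio. A minimal sketch, where the helper name and the 6 x 2.5 reference bill size are hypothetical:

```python
import math


def scaled_dimensions(length, width, value_ratio):
    """Scale a 2-D symbol so that its AREA, not each side, changes by value_ratio."""
    if value_ratio < 0:
        raise ValueError("value_ratio must be non-negative")
    side_factor = math.sqrt(value_ratio)
    return length * side_factor, width * side_factor


# A 1978 dollar worth $0.44 of a 1958 dollar; reference bill drawn 6 x 2.5.
length, width = scaled_dimensions(6.0, 2.5, 0.44)
area_ratio = (length * width) / (6.0 * 2.5)
print(f"{length:.2f} x {width:.2f}, area ratio {area_ratio:.2f}")
```

Scaling both sides by 0.44 directly would instead shrink the area to about 0.19 of the original, which is the same kind of distortion the slide describes.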

Bad Chart Examples

The problem:

• The message (growth of medical spending in emerging markets) is obfuscated and exaggerated

How the problem occurred:

• The chart uses too many bold colors, which creates visual confusion

• The chart uses pie charts for each year, which makes it hard to see trends

• The chart scales the pie charts incorrectly by scaling the radius rather than the area, which distorts the changes

The fix:

• When dealing with trend data, time series using line charts are the best choice

Source: “Expanding Circles of Error”, Junk Charts

Data Exploration via Visualization

Howard Wainer’s visualization of John Arbuthnot’s 1710 analysis of London Bills of Mortality not only depicts historical incidents, it also provides a check for data quality. The 1704 spike is not associated with any historical incident. A check of the data reveals a transcription error by Arbuthnot where the 1674 data point was mistakenly labeled as 1704.

Source: Wainer, 2009

Infographic Creation Details

Data Preparation Steps

• Source identification

• Data collection

• Data scrubbing

• Data analysis

Infographic Source Identification

• Super heroes and villains: DC and Marvel

• http://dc.wikia.com/

• http://marvel.com/universe/Main_Page

• Top athletes: 2008 US Olympic Team

• http://www.2008.nbcolympics.com/athletes/index.html

• Top models: models.com listings

• http://models.com/

Infographic Data Collection

• Create Python web scraper

• Crawl web sites

• Download web pages

• Extract height, weight, and gender data

• Save data to file
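The collection steps above can be sketched with the Python standard library alone. The talk does not show the actual scraper, so everything below is an assumption: the page layout ("Height: 6' 2"", "Weight: 210 lbs", "Gender: Male"), the regular expressions, the field names, and the helper names. The `download` function is defined but not called, so the example runs without network access.

```python
import csv
import io
import re
from urllib.request import urlopen

# Hypothetical page layout: "Height: 6' 2"", "Weight: 210 lbs", "Gender: Male".
HEIGHT_RE = re.compile(r"Height:\s*(\d+)'\s*(\d+)\"")
WEIGHT_RE = re.compile(r"Weight:\s*(\d+)\s*lbs")
GENDER_RE = re.compile(r"Gender:\s*(Male|Female)")


def download(url):
    """Download one page and return its text (not called in this sketch)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract(page_text):
    """Return height (in.), weight (lb.), and gender, or None if any is missing."""
    height = HEIGHT_RE.search(page_text)
    weight = WEIGHT_RE.search(page_text)
    gender = GENDER_RE.search(page_text)
    if not (height and weight and gender):
        return None
    feet, inches = int(height.group(1)), int(height.group(2))
    return {
        "height_in": feet * 12 + inches,
        "weight_lb": int(weight.group(1)),
        "gender": gender.group(1).lower(),
    }


def save(rows, out):
    """Write the extracted rows as CSV to an open text file or buffer."""
    writer = csv.DictWriter(out, fieldnames=["height_in", "weight_lb", "gender"])
    writer.writeheader()
    writer.writerows(rows)


# Made-up page fragment standing in for a downloaded profile page.
page = "<li>Height: 6' 2\"</li><li>Weight: 210 lbs</li><li>Gender: Male</li>"
row = extract(page)
buffer = io.StringIO()
save([row], buffer)
print(buffer.getvalue())
```

A real crawl would also need per-site parsing rules, polite request pacing, and handling for pages that omit a field, which is why `extract` returns None rather than guessing.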

Infographic Data Scrubbing

• Check data quality

• Did extraction get correct height and weight?

• Are there duplicate entries?

• Remove super hero and super villain outliers

• Define height window based on athlete and model data

• Define weight window based on athlete and model data
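One way to sketch the scrubbing steps above in Python. The slides do not say how the height and weight windows were defined, so this sketch assumes the simplest rule: the minimum and maximum seen among athletes and models. The record fields and the sample values are made up for illustration.

```python
def drop_duplicates(records):
    """Keep the first record seen for each (name, source) pair."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["name"].strip().lower(), rec["source"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def window(records, field):
    """Return the (min, max) of a field across the reference records."""
    values = [rec[field] for rec in records]
    return min(values), max(values)


def remove_outliers(records):
    """Drop comic characters outside the athlete/model height and weight windows."""
    reference = [r for r in records if r["source"] in ("athlete", "model")]
    h_lo, h_hi = window(reference, "height_in")
    w_lo, w_hi = window(reference, "weight_lb")
    kept = []
    for rec in records:
        in_window = (h_lo <= rec["height_in"] <= h_hi
                     and w_lo <= rec["weight_lb"] <= w_hi)
        if rec["source"] != "comic" or in_window:
            kept.append(rec)
    return kept


# Made-up records, for illustration only.
records = [
    {"name": "Hero A", "source": "comic", "height_in": 74, "weight_lb": 210},
    {"name": "hero a ", "source": "comic", "height_in": 74, "weight_lb": 210},
    {"name": "Giant B", "source": "comic", "height_in": 120, "weight_lb": 900},
    {"name": "Athlete C", "source": "athlete", "height_in": 75, "weight_lb": 230},
    {"name": "Model D", "source": "model", "height_in": 69, "weight_lb": 115},
]
clean = remove_outliers(drop_duplicates(records))
```

Here the second "Hero A" entry is dropped as a duplicate and "Giant B" is dropped as an outlier, while athletes and models are never filtered because they define the window.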

Page 44: Design principles for data presentation: Translating analysis into

Infographic Data Analysis

• Combine all data in R

• Super heroes and villains, athletes, and models

• Create dummy variables

• Gender: male, female

• Source: super hero/villain, athlete, model

• Calculate BMI for each record

• Check summary statistics

• Data ranges, mean, standard deviation

• Run t-tests between groups
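The analysis steps above were done in R; the sketch below uses Python only for consistency with the scraper. It applies the BMI formula from the data summary slide (BMI = 703 x weight[lb] / height[in]^2), computes per-group summary statistics, and computes a Welch (unequal-variance) t statistic. The slides do not say which form of t-test was run, so Welch is an assumption, and the group values are made up.

```python
import math
import statistics


def bmi(weight_lb, height_in):
    """Body mass index from pounds and inches: 703 * weight / height^2."""
    return 703 * weight_lb / height_in ** 2


def summarize(values):
    """Range, mean, and sample standard deviation for one group."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two groups."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, df


# Made-up (weight lb., height in.) pairs, for illustration only.
group_a = [bmi(w, h) for w, h in [(210, 74), (190, 72), (225, 75), (200, 73)]]
group_b = [bmi(w, h) for w, h in [(165, 75), (170, 72), (180, 74), (160, 70)]]
summary_a = summarize(group_a)
t_stat, df = welch_t(group_a, group_b)
```

As a check against the sample table, `bmi(210, 74)` rounds to 27.0 and `bmi(165, 75)` rounds to 20.6. Turning the t statistic into a p-value needs a t-distribution CDF, which the standard library does not provide.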

Page 45: Design principles for data presentation: Translating analysis into

Height Distributions

[Figure: six probability histograms of height, one per group: Male Superheroes (N = 1,289), Female Superheroes (N = 1,289), Male Athletes (N = 403), Female Athletes (N = 254), Male Models (N = 493), Female Models (N = 479). x-axis: height (ft.), 5’0” to 7’0”; y-axis: probability. Each panel is annotated with its median height and the percentage of its distribution inside the shaded area. Annotations in extracted order: 59%, median 5’8”; 66%, median 6’0”; 35%, median 6’1”; 50%, median 6’0”; 50%, median 5’7”; 42%, median 5’8”.]
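Each panel's annotation reports two numbers: a median and the share of the distribution inside the shaded area. A minimal sketch of how those could be computed, assuming the shaded area is a fixed interval from `low` to `high` (the slides do not say how it is defined) and using made-up heights in inches:

```python
import statistics


def share_inside(values, low, high):
    """Fraction of values that fall within [low, high], inclusive."""
    inside = sum(1 for v in values if low <= v <= high)
    return inside / len(values)


# Made-up heights (in.) and a made-up shaded interval, for illustration only.
heights_in = [66, 68, 70, 72, 72, 73, 74, 76]
low, high = 70, 73

median_height = statistics.median(heights_in)
share = share_inside(heights_in, low, high)
```

The same two calls apply unchanged to weight or BMI values.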

Page 46: Design principles for data presentation: Translating analysis into

Weight Distributions

[Figure: six probability histograms of weight, one per group: Male Superheroes (N = 1,289), Female Superheroes (N = 1,289), Male Athletes (N = 403), Female Athletes (N = 254), Male Models (N = 493), Female Models (N = 479). x-axis: weight (lbs.), 100 to 250; y-axis: probability. Each panel is annotated with its median weight and the percentage of its distribution inside the shaded area. Annotations in extracted order: 50%, median 185; 50%, median 128; 40%, median 175; 38%, median 140; 35%, median 160; 39%, median 115.]

Page 47: Design principles for data presentation: Translating analysis into

With Revised Data

[Figure: six probability histograms of Body Mass Index, one per group: Male Superheroes (N = 1,289), Female Superheroes (N = 1,289), Male Athletes (N = 3,631), Female Athletes (N = 2,610), Male Models (N = 493), Female Models (N = 479). x-axis: Body Mass Index, 15 to 35; y-axis: probability. Each panel is annotated with its median BMI and the percentage of its distribution inside the shaded area. Annotations in extracted order: 50%, median 25.4; 47%, median 23.9; 17%, median 21.6; 28%, median 18.3; 50%, median 19.7; 32%, median 21.5.]