How to make a picture worth a thousand words: Effectively communicating your research results using statistical graphics. Yates Coley, PhD, Kaiser Permanente Washington Health Research Institute, Seattle, WA. Joint work with Mike Jackson, PhD, KPWHRI. April 4, 2018.


  • How to make a picture worth a thousand words:

    Effectively communicating your research results using statistical graphics

    Yates Coley, PhD Kaiser Permanente Washington Health Research Institute

    Seattle, WA

    Joint work with Mike Jackson, PhD, KPWHRI

    April 4, 2018

  • Seminar Outline

    • Introduction

    • Fundamentals of Statistical Graphics

    • Data Visualization Best Practices

    • Resources

  • Data Visualization is…

    • a scientific discipline.

    • both a principled and subjective art.

    • work!

    • important!

    • an organizing framework.

  • Objectives

    • Present organizing framework for data visualization

    • Describe conceptual best practices for creating statistical graphics and give concrete examples

    • Provide sources and references for future consultation

  • Seminar Outline

    • Introduction

    • Fundamentals of Statistical Graphics

    • Data Visualization Best Practices

    • Resources

  • Components of a data visualization

    • Visual Cues

    • Coordinate System

    • Scale

    • Context

  • Visual Cues

    [Figure 3-3, "Visual cues", from Yau (2013) Data Points]

    Yau, Nathan. Data Points, edited by Nathan Yau, Wiley, 2013. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/jhu/detail.action?docID=1158630. Copyright © 2013 Wiley. All rights reserved.

  • Visual Cues: Quantitative Variables

    [Figure 3-3, "Visual cues", from Yau (2013) Data Points]

    Yau, Nathan. Data Points, edited by Nathan Yau, Wiley, 2013. Copyright © 2013 Wiley. All rights reserved.

  • Visual Cues: Categorical Variables

    [Figure 3-3, "Visual cues", from Yau (2013) Data Points]

    Yau, Nathan. Data Points, edited by Nathan Yau, Wiley, 2013. Copyright © 2013 Wiley. All rights reserved.

  • [Figure: scatterplot "Diagnostic characteristics of patients in active surveillance". x-axis: Age (years), 45 to 75; y-axis: PSA (ng/mL), 0 to 15; legend "Diagnostic biopsy": low volume, high volume.]

  • Components of a data visualization

    • Visual Cues

    • Coordinate System

    • Scale

    • Context

  • Data Visualization Process

    • What data do you have?

    • What do you want to know about your data?

    • What visualization method should you use?

    • What do you see and does it make sense?

    Yau (2013) Data Points

  • Data Visualization Process

    • What data do you have?

    • Continuous, ordinal, or categorical?

    • Time series?

    • What do you want to know about your data?

    • What visualization method should you use?

    • What do you see and does it make sense?

    Yau (2013) Data Points

  • Data Visualization Process

    • What data do you have?

    • What do you want to know about your data?

    • Distributions of single variables?

    • Relationships between variables?

    • Summaries or unit-level detail?

    • What visualization method should you use?

    • What do you see and does it make sense?

    Yau (2013) Data Points

  • Data Visualization Process

    • What data do you have?

    • What do you want to know about your data?

    • What visualization method should you use?

    • What do you see and does it make sense?

    Yau (2013) Data Points

  • [Figure: distribution of PSA (ng/mL), 0 to 15, shown two ways. Panel "Histogram: Unit-level" (y-axis: Number of patients, 0 to 80) and panel "Boxplot: Summary".]
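The contrast between the two panels can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the slide: the PSA values are invented, and the helper names `histogram_counts` and `five_number_summary` are hypothetical. A histogram keeps a count for every bin, while a boxplot reduces the same data to five numbers.

```python
import statistics
from collections import Counter

# Hypothetical PSA values (ng/mL), invented for illustration only.
psa = [0.8, 1.9, 2.4, 3.1, 3.3, 4.0, 4.2, 4.6, 5.1, 5.5, 6.2, 7.4, 9.8, 12.5]

def histogram_counts(values, bin_width):
    """Count how many values fall in each bin [k*w, (k+1)*w)."""
    counts = Counter(int(v // bin_width) for v in values)
    return {k * bin_width: counts[k] for k in sorted(counts)}

def five_number_summary(values):
    """Minimum, quartiles, and maximum: the numbers a boxplot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

hist = histogram_counts(psa, bin_width=5)
summary = five_number_summary(psa)

# Text rendering: one row of '#' per bin, then the boxplot numbers.
for left_edge, count in hist.items():
    print(f"{left_edge:>3}-{left_edge + 5:<3} {'#' * count}")
print("min, Q1, median, Q3, max:", summary)
```

The histogram shows where the patients pile up; the summary is easier to compare across groups but hides the shape of the distribution.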

  • [Figure: two bar charts, "PSA density at diagnosis" and "Cancer volume at diagnosis". x-axis: Low, High; y-axis: Number of patients, 0 to 250.]

  • [Figure: scatterplot "Diagnostic PSA of patients in active surveillance". x-axis: Age (years), 45 to 75; y-axis: PSA (ng/mL), 0 to 15.]

  • [Figure: scatterplot "PSA observations for patients throughout active surveillance". x-axis: Age (years), 45 to 75; y-axis: PSA (ng/mL), 0 to 20.]

  • [Figure: "PSA observations for patients throughout active surveillance". Axes: Age (years), 50 to 70, and PSA (ng/mL), 0 to 20; a further scale runs from 0 to 40 (its label is not recoverable).]

  • [Figure: scatterplot "Diagnostic characteristics of patients in active surveillance". x-axis: Age (years), 45 to 75; y-axis: PSA (ng/mL), 0 to 15; legend "Diagnostic biopsy": low volume, high volume.]

  • [Figure: "Age at prostate cancer diagnosis", two histograms. Panels: "Low volume on diagnostic biopsy" and "High volume on diagnostic biopsy". x-axis: Age (years), 40 to 75; y-axis: Number of patients (the two panels use different ranges, 0 to 50 and 0 to 7).]

  • [Figure: "Age at prostate cancer diagnosis" by group. y-axis: Age (years), 45 to 75; groups: low volume on diagnostic biopsy, high volume on diagnostic biopsy.]

  • Mosaic plot

    [Figure: mosaic plot of PSA density (Low, High) by volume on diagnostic biopsy (Low, High).]
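For readers who have not met a mosaic plot before, the geometry can be sketched as follows. This is an illustration under stated assumptions: the counts are invented, and `mosaic_tiles` is a hypothetical helper, not part of any plotting library. Each column's width is the marginal share of one variable, and each tile's height is the conditional share of the other variable within that column, so tile area is proportional to the cell count.

```python
# Hypothetical patient counts, invented for illustration:
# counts[volume on diagnostic biopsy][PSA density]
counts = {
    "Low":  {"Low": 120, "High": 40},
    "High": {"Low": 30,  "High": 60},
}

def mosaic_tiles(table):
    """Return {(column, row): (width, height)} as fractions of the unit square."""
    total = sum(sum(cells.values()) for cells in table.values())
    tiles = {}
    for column, cells in table.items():
        column_total = sum(cells.values())
        width = column_total / total        # marginal share of the column variable
        for row, n in cells.items():
            height = n / column_total       # conditional share within that column
            tiles[(column, row)] = (width, height)
    return tiles

tiles = mosaic_tiles(counts)
for (volume, psad), (w, h) in tiles.items():
    print(f"volume={volume:<4} PSAD={psad:<4} width={w:.2f} height={h:.2f} area={w * h:.3f}")
```

If the two variables were unrelated, the tile heights would be the same in every column; differences in height are what the reader should look for.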

  • Data Visualization Process

    • What data do you have?

    • What do you want to know about your data?

    • What visualization method should you use?

    • What do you see and does it make sense?

    Yau (2013) Data Points
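The third question, which method to use, can be sketched as a lookup from variable types to the chart types shown in the preceding examples. This is a deliberate simplification for illustration: `suggest_plot` is a hypothetical helper, the two type labels are assumptions, and real choices also depend on the question being asked and on sample size.

```python
VALID_TYPES = {"continuous", "categorical"}

def suggest_plot(x_type, y_type=None):
    """Map variable types to one of the chart types used in this deck.

    y_type=None means a single variable is being displayed.
    """
    for t in (x_type, y_type):
        if t is not None and t not in VALID_TYPES:
            raise ValueError(f"unknown variable type: {t!r}")
    if y_type is None:
        return "histogram or boxplot" if x_type == "continuous" else "bar chart"
    kinds = {x_type, y_type}
    if kinds == {"continuous"}:
        return "scatterplot"
    if kinds == {"categorical"}:
        return "mosaic plot"
    # One continuous and one categorical variable.
    return "side-by-side boxplots or histograms"

print(suggest_plot("continuous"))                  # e.g. PSA alone
print(suggest_plot("continuous", "continuous"))    # e.g. PSA versus age
print(suggest_plot("continuous", "categorical"))   # e.g. age by biopsy volume
print(suggest_plot("categorical", "categorical"))  # e.g. PSA density by volume
```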

  • Seminar Outline

    • Introduction

    • Fundamentals of Statistical Graphics

    • Data Visualization Best Practices

    • Resources

  • How do we define an “effective” statistical graphic?

    • An effective statistical graphic enables the reader to

    • extract information accurately

    • with reasonable effort and

    • high confidence.

    Enrico Bertini Lecture #3

    https://drive.google.com/drive/folders/0B-9uY9BLNUVFajg1bGg5YWp3V0k

  • Expressiveness Principle

    Statistical graphic “should express all and only the information in the data” (and statistical results).

    Enrico Bertini Lecture #4

    https://drive.google.com/drive/folders/0B-9uY9BLNUVFajg1bGg5YWp3V0k

  • [Figure: bar chart. Category: A to Q, in alphabetical order; Number of observations: 0 to 1200.]

    Enrico Bertini Lecture #3

    https://drive.google.com/drive/folders/0B-9uY9BLNUVFajg1bGg5YWp3V0k

  • Sorted Bar Chart

    [Figure: bar chart with sorted categories (K, B, Q, I, E, L, O, A, D, P, J, H, G, C, M, N, F). Number of observations: 0 to 1200.]

    Enrico Bertini Lecture #3

    https://drive.google.com/drive/folders/0B-9uY9BLNUVFajg1bGg5YWp3V0k
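Sorting the bars is a one-line change in most tools. The sketch below uses only the Python standard library and prints text bars; the category counts are invented, `sorted_bars` is a hypothetical helper, and ascending order is an assumption (the slide does not state the direction).

```python
# Hypothetical category counts, invented for illustration.
observations = {"A": 410, "B": 120, "C": 890, "D": 530, "E": 260, "F": 1150}

def sorted_bars(counts, width=40):
    """Return text bars ordered from smallest to largest count."""
    largest = max(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1])
    return [
        f"{name} {'#' * round(width * n / largest)} {n}"
        for name, n in ordered
    ]

bars = sorted_bars(observations)
print("\n".join(bars))
```

With unordered categories, sorting by value lets the reader rank categories at a glance instead of comparing bar lengths across the whole chart.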

  • Line chart with categorical data (wrong!)

    [Figure: line chart. x-axis: Category, A to Q; y-axis: Number of observations, 0 to 1200.]

    Enrico Bertini Lecture #3

    https://drive.google.com/drive/folders/0B-9uY9BLNUVFajg1bGg5YWp3V0k

  • Effectiveness Principle

    • “The importance of the information should match the salience of the mode of visual encoding.”

    • “Salience” is characterized by:

    • Accuracy

    • Discriminability

    • Separability

    • “Pop-out”

    • Grouping

    Enrico Bertini Lecture #4

    https://drive.google.com/drive/folders/0B-9uY9BLNUVFajg1bGg5YWp3V0k

  • Enrico Bertini Lecture #8

    Quantitative Variables Categorical Variables

    https://drive.google.com/drive/folders/0B-9uY9BLNUVFajg1bGg5YWp3V0k

  • Enrico Bertini Lecture #8

    Quantitative Variables Categorical Variables

    https://drive.google.com/drive/folders/0B-9uY9BLNUVFajg1bGg5YWp3V0k

  • [Figure, shown on six consecutive slides: scatterplot "Diagnostic characteristics of patients in active surveillance". x-axis: Age (years), 45 to 75; y-axis: PSA (ng/mL), 0 to 15; legend "Diagnostic biopsy": low volume, high volume. Slide annotations: Accuracy, Discriminability, Separability, Pop-out, Grouping.]

  • Enrico Bertini Lecture #8

    Quantitative Variables Categorical Variables

    https://drive.google.com/drive/folders/0B-9uY9BLNUVFajg1bGg5YWp3V0k

  • Source: New York Times

    https://www.nytimes.com/interactive/2018/03/06/business/china-tariffs.html

    https://drive.google.com/drive/folders/0B-9uY9BLNUVFajg1bGg5YWp3V0k

  • Source: The Economist

    https://www.economist.com/blogs/graphicdetail/2018/03/daily-chart-13

  • Enrico Bertini Lecture #8

    Quantitative Variables Categorical Variables

    https://drive.google.com/drive/folders/0B-9uY9BLNUVFajg1bGg5YWp3V0k

  • Enrico Bertini Lecture #8

    Quantitative Variables Categorical Variables

    https://drive.google.com/drive/folders/0B-9uY9BLNUVFajg1bGg5YWp3V0k

  • [Figure: scatterplot "Diagnostic characteristics of patients in active surveillance". x-axis: Age (years), 45 to 75; y-axis: PSA (ng/mL), 0 to 15; legend "Diagnostic biopsy": low volume, high volume.]

  • [Figure: variant of the scatterplot "Diagnostic characteristics of patients in active surveillance". x-axis: Age (years), 45 to 75; y-axis: PSA (ng/mL), 0 to 15; legend "Diagnostic biopsy": low volume, high volume.]

  • Enrico Bertini Lecture #8

    Quantitative Variables Categorical Variables

    Perception vs. Cognition

    https://drive.google.com/drive/folders/0B-9uY9BLNUVFajg1bGg5YWp3V0k

  • [Figure: bar chart. Category: A to Q, in alphabetical order; Number of observations: 0 to 1200.]

    Enrico Bertini Lecture #3

    https://drive.google.com/drive/folders/0B-9uY9BLNUVFajg1bGg5YWp3V0k

  • Sorted Bar Chart

    [Figure: bar chart with sorted categories (K, B, Q, I, E, L, O, A, D, P, J, H, G, C, M, N, F). Number of observations: 0 to 1200.]

    Enrico Bertini Lecture #3

    https://drive.google.com/drive/folders/0B-9uY9BLNUVFajg1bGg5YWp3V0k

  • [Figure: "PSA observations for patients throughout active surveillance". Axes: Age (years), 50 to 70, and PSA (ng/mL), 0 to 20; a further scale runs from 0 to 40 (its label is not recoverable).]

  • Guiding Principles

    • Make the data stand out. Maximize the data-to-ink ratio.

    • Avoid superfluity. Remove “chartjunk”. Reduce non-data ink and redundant data-ink.

    • Strive for clarity.

    • Clear vision.

    • Clear understanding.

    Cleveland (1985) The Elements of Graphing Data; Tufte (1983) The Visual Display of Quantitative Information
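For reference, the ratio named in the first principle can be written out. The formula below follows Tufte's definition of the data-ink ratio as recalled here; it is a conceptual measure rather than something routinely computed.

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \bigl(\text{proportion of the graphic that can be erased without loss of data-information}\bigr)
\]
```

Removing chartjunk and redundant data-ink shrinks the denominator, which raises the ratio toward 1.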

  • Visual Cues

    “Make graphical elements encoding data visually prominent.”

    [Figure: x-axis: YEAR, 1950 to 1980; y-axis: PERCENT CHANGE IN DEATH RATE FROM 1950, −40 to 0; labels: CARDIOVASCULAR, OTHER, FIRST CARDIOVASCULAR CARE UNIT.]

    Cleveland (1985) The Elements of Graphing Data, Ch. 2

  • Visual Prominence

    Plotting symbols are large and dark enough to be easily seen

    [Figure: x-axis: YEAR, 1950 to 1980; y-axis: PERCENT CHANGE IN DEATH RATE FROM 1950, −40 to 0; labels: CARDIOVASCULAR, OTHER, FIRST CARDIOVASCULAR CARE UNIT.]

    Cleveland (1985) The Elements of Graphing Data, Ch. 2

  • Visual Prominence

    Plotting symbols aren’t obscured by connecting lines

    [Figure: x-axis: YEAR, 1950 to 1980; y-axis: PERCENT CHANGE IN DEATH RATE FROM 1950, −40 to 0; labels: CARDIOVASCULAR, OTHER, FIRST CARDIOVASCULAR CARE UNIT.]

    Cleveland (1985) The Elements of Graphing Data, Ch. 2

  • Visual Prominence

    Overlapping plotting symbols are easily distinguishable

    [Figure: x-axis: YEAR, 1950 to 1980; y-axis: PERCENT CHANGE IN DEATH RATE FROM 1950, −40 to 0; labels: CARDIOVASCULAR, OTHER, FIRST CARDIOVASCULAR CARE UNIT.]

    Cleveland (1985) The Elements of Graphing Data, Ch. 2

  • Visual Prominence

    Superposed data are readily visually discriminated

    [Figure: x-axis: YEAR, 1950 to 1980; y-axis: PERCENT CHANGE IN DEATH RATE FROM 1950, −40 to 0; labels: CARDIOVASCULAR, OTHER, FIRST CARDIOVASCULAR CARE UNIT.]

    Cleveland (1985) The Elements of Graphing Data, Ch. 2

  • Visual Prominence

    Graphical elements do not interfere with data

    [Figure: x-axis: YEAR, 1950 to 1980; y-axis: PERCENT CHANGE IN DEATH RATE FROM 1950, −40 to 0; labels: CARDIOVASCULAR, OTHER, FIRST CARDIOVASCULAR CARE UNIT.]

    Cleveland (1985) The Elements of Graphing Data, Ch. 2

  • Visual hierarchy

    Yau (2013) Data Points, p. 203

    point of interest. This creates a visual hierarchy that helps readers immediately focus on the vital parts of a data graphic and use the surroundings as context, as opposed to a flat graphic that a reader must visually rummage through.

    For example, Figure 5-1 is the scatterplot from the previous chapter that shows NBA players’ usage percentage versus points per game. The dots, fitted line, grid, border, and labels are of the same color and thickness, so there is no clear visual focus. It’s a flat image, where all the elements are on the same level.

    FIGURE 5-1 All visual elements on the same level

    This is easily remedied with a few small changes. In Figure 5-2, the line width of the grid lines is reduced so that they are no longer as thick as the fitted line. In this example, you want the data to stand out. The grid lines also alternate in width so that it is easier to see where each data point lies in the coordinate system, and there’s no imaginary blur that you get in the original chart.

    FIGURE 5-2 Width of grid lines reduced to fit in background

    Place visual elements on different “levels” to shift focus, draw attention to most important aspect of data or results.

  • Visual hierarchy

    Place visual elements on different “levels” to shift focus, draw attention to most important aspect of data or results.

    Yau (2013) Data Points, Ch. 5 (Visualizing with Clarity), p. 204

    Still though, the fitted line is obscured by all the dots, because (1) it’s thin compared to the radius of each dot and (2) it still blends in with the grid behind it. Figure 5-3 changes the color to blue to make the data stand out more, and the width of the fitted line is increased so that it clearly rests on top of the dots.

    FIGURE 5-3 Focus of chart shifted to fitted line with color and width

    The chart is a lot more readable now, but if you imagine people viewing the graphic like they would a body of text—from top to bottom and left to right—more descriptive axis labels and less prominent value labels can help, as shown in Figure 5-4. The text within the chart works similar to how it does in an essay or a book. Headers are often printed bigger and in a bold font to provide both structure and a sense of flow. In this case, the bolder labels provide immediate context for what the chart is about. Also, notice fewer and less prominent gridlines, which directs focus further to the upward trend.

    FIGURE 5-4 Grid and value labels adjusted and fewer, less prominent gridlines
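
    The progression from Figure 5-1 to Figure 5-4 can be sketched in code. The following is a minimal sketch, assuming Python with matplotlib (neither appears in the original slides). The function name `hierarchy_scatter`, the simulated data, and the specific colors and line widths are illustrative choices, not the settings used in the book.

```python
import random

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt


def hierarchy_scatter(x, y):
    """Scatterplot with the fitted line in front and the grid in the background."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Ordinary least-squares slope and intercept, computed by hand.
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    intercept = mean_y - slope * mean_x

    fig, ax = plt.subplots(figsize=(6, 4))
    # Background level: thin, light grid drawn beneath everything else.
    ax.grid(True, color="0.85", linewidth=0.5)
    ax.set_axisbelow(True)
    # Middle level: the data points, muted so they read as context.
    ax.scatter(x, y, s=20, color="0.55", zorder=2)
    # Top level: thick, colored fitted line drawn over the points.
    xs = [min(x), max(x)]
    (fit_line,) = ax.plot(
        xs,
        [intercept + slope * v for v in xs],
        color="tab:blue",
        linewidth=3,
        zorder=3,
    )
    # Descriptive axis labels in bold; tick (value) labels smaller and gray.
    ax.set_xlabel("Usage percentage", fontweight="bold")
    ax.set_ylabel("Points per game", fontweight="bold")
    ax.tick_params(labelsize=8, labelcolor="0.4")
    return fig, ax, fit_line


# Simulated data for illustration only (not the NBA data from the book).
rng = random.Random(0)
usage = [rng.uniform(10, 35) for _ in range(80)]
points = [0.8 * u - 4 + rng.gauss(0, 3) for u in usage]
fig, ax, fit_line = hierarchy_scatter(usage, points)
```

    Calling `fig.savefig("visual_hierarchy.png", dpi=150)` writes the result to disk. The three `zorder` and line-width settings are what create the levels: grid at the back, points in the middle, fitted line in front.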

  • Guiding Principles

    • Make the data stand out. Maximize the data-to-ink ratio.

    • Avoid superfluity. Remove “chartjunk”. Reduce non-data ink and redundant data-ink.

    • Strive for clarity.

      • Clear vision.

      • Clear understanding.

    Cleveland (1985) Elements of Graphing Data; Edward Tufte (1983) Visual Display of Quantitative Information
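
    For reference, Tufte defines the data-ink ratio in The Visual Display of Quantitative Information as the share of a graphic's ink that presents data. Written out (the layout of the formula is ours; the definition is Tufte's):

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \bigl(\text{proportion of the graphic that can be erased
        without loss of data-information}\bigr)
\]
```

    Erasing non-data ink or redundant data-ink shrinks the denominator without touching the information shown, so both steps raise the ratio.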

  • Reduce non-data ink?

    [Figure: percent change in death rate from 1950 (y-axis, −40 to 0) by year (x-axis, 1950 to 1980); labels: CARDIOVASCULAR, OTHER, FIRST CARDIOVASCULAR CARE UNIT.]

  • Reduce redundant data ink?

    [Figure: percent change in death rate from 1950 (y-axis, −40 to 0) by year (x-axis, 1950 to 1980); labels: CARDIOVASCULAR, OTHER, FIRST CARDIOVASCULAR CARE UNIT.]

  • Reduce non-data ink?

    [Figure: two versions of a plot of Number of patients (0 to 80) against PSA (ng/mL) (0 to 15), and two versions of a plot with a single PSA (ng/mL) axis (0 to 15).]

  • Reduce redundant data ink!

    [Figure: percent change in death rate from 1950 (y-axis, −40 to 0) by year (x-axis, 1950 to 1980), drawn with fewer plotting symbols; labels: CARDIOVASCULAR, OTHER, FIRST CARDIOVASCULAR CARE UNIT.]

  • Guiding Principles

    • Make the data stand out. Maximize the data-to-ink ratio.

    • Avoid superfluity. Remove “chartjunk”. Reduce non-data ink and redundant data-ink.

    • Strive for clarity.

      • Clear vision.

      • Clear understanding.

    Cleveland (1985) Elements of Graphing Data; Edward Tufte (1983) Visual Display of Quantitative Information

  • Data labels?

    [Figure: percent change in death rate from 1950 (y-axis, −40 to 0) by year (x-axis, 1950 to 1980); original labels CARDIOVASCULAR, OTHER, FIRST CARDIOVASCULAR CARE UNIT shown with the data labels CARDIOVASCULAR DEATHS, OTHER DEATHS, FIRST CARDIOVASCULAR CARE UNIT.]

  • Grid lines?

    [Figure: percent change in death rate from 1950 (y-axis, −40 to 0) by year (x-axis, 1950 to 1980); labels: CARDIOVASCULAR, OTHER, FIRST CARDIOVASCULAR CARE UNIT.]

  • Guidelines for Text

    [Figure: the death-rate chart (−40 to 0 by year, 1950 to 1980) with its all-caps labels (CARDIOVASCULAR, OTHER, FIRST CARDIOVASCULAR CARE UNIT, YEAR, PERCENT CHANGE IN DEATH RATE FROM 1950) shown with sentence-case labels: Cardiovascular deaths, Other deaths, First cardiovascular care unit, Year, Change in death rate (%).]

  • Scales

    • “Choose the scales so that data fill up as much of the data region as possible.”

    • “Choose the range of the tick marks to include or nearly include the range of the data.”

    Cleveland (1985) Elements of Graphing Data

    FIGURE 3-15 Scales

    Numeric

    The visual spacing on a linear scale is the same regardless of where you are on the axis. So if you were to measure the distance between two points on the lower end of the scale, it’d be the same if they were at the high end of the scale.

    On the other hand, a logarithmic scale condenses as you increase values. This scale is used less than the linear scale and is not as well understood or straightforward for those who don’t regularly work with data, but it’s useful if you’re interested in percent differences more than you are raw counts or your data has a wide range.

    For example, when you compare state populations in the United States, you deal with numbers from the hundreds of thousands up to the tens of millions. As of this writing, California has a population of approximately 38 million people, whereas Wyoming has a population of approximately 600,000. As shown

    Yau (2013) Data Points, p. 109
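
    The difference between linear and logarithmic scales can be checked with a little arithmetic. The following is a minimal sketch in Python (standard library only) that computes where the two populations quoted above would sit along an axis, as a fraction of the axis length. The helper name `axis_position` and the axis limits of 100,000 and 100 million are assumptions chosen for illustration.

```python
import math


def axis_position(value, lo, hi, log=False):
    """Return where `value` falls along an axis from `lo` to `hi`, as a fraction 0..1."""
    if log:
        value, lo, hi = math.log10(value), math.log10(lo), math.log10(hi)
    return (value - lo) / (hi - lo)


# Approximate populations quoted in the excerpt above.
populations = {"Wyoming": 600_000, "California": 38_000_000}
# Axis limits are an assumption chosen to bracket both values.
lo, hi = 100_000, 100_000_000

for state, pop in populations.items():
    linear = axis_position(pop, lo, hi)
    logarithmic = axis_position(pop, lo, hi, log=True)
    print(f"{state:<10} linear={linear:.3f}  log={logarithmic:.3f}")
```

    On the linear axis Wyoming lands within about half a percent of the origin, so it is nearly indistinguishable from zero. On the log axis it sits roughly a quarter of the way along, and the data fill more of the data region.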

  • Dual y-axes: NOT clear

    [Figure: chart with two y-axes; tick labels 250,000, 300,000, 350,000 on one and 0.5 million, 1 million, 1.5 million, 2 million on the other; series labeled # Abortions and # Cancer screening, Prevention.]

  • Source: Evergreen Data

    http://stephanieevergreen.com/two-alternatives-to-using-a-second-y-axis/
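
    One way to avoid a second y-axis is to put both series on a common scale, for example percent change from the first observation, as the death-rate chart earlier in this talk does. The following is a minimal sketch in Python (standard library only). It is not necessarily either of the two alternatives described in the Evergreen Data post; the function name `percent_change_from_baseline` and the series values are made up for illustration.

```python
def percent_change_from_baseline(values):
    """Express each value as percent change from the first value in the series."""
    baseline = values[0]
    if baseline == 0:
        raise ValueError("baseline must be nonzero")
    return [100.0 * (v - baseline) / baseline for v in values]


# Hypothetical series on very different scales (made-up numbers).
small_series = [250, 275, 300, 325]
large_series = [2_000_000, 1_500_000, 1_200_000, 1_000_000]

# Both results are in percent, so they can share a single y-axis.
small_change = percent_change_from_baseline(small_series)
large_change = percent_change_from_baseline(large_series)
```

    Because both transformed series start at zero and are measured in the same unit, a reader can compare their slopes directly instead of comparing two arbitrary axis scalings.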

  • Clear Understanding

    • Provide clear explanations for error bars, confidence bands, etc.

    • Make legends comprehensive and informative.

      1. Describe everything that is graphed.

      2. Draw attention to the important features of the data.

      3. Describe the conclusions that are drawn from the data on the graph.

    Cleveland (1985) Elements of Graphing Data
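
    To make the first point concrete, the caption can be generated alongside the numbers so that it always says what the error bars mean. The following is a minimal sketch in Python (standard library only). The names `mean_with_ci` and `error_bar_note` and the measurements are hypothetical, and the interval uses a large-sample normal approximation, which is an assumption rather than a recommendation from the slides.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return (mean, half_width) for a normal-approximation confidence interval.

    z = 1.96 gives an approximate 95% interval for large samples; small
    samples call for a t quantile instead.
    """
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, z * standard_error


def error_bar_note(level=95):
    """Caption sentence stating exactly what the error bars represent."""
    return f"Error bars show {level}% confidence intervals for the group mean."


# Hypothetical measurements for two groups (made-up numbers).
groups = {"A": [4.1, 5.0, 4.6, 5.3, 4.8], "B": [6.2, 5.9, 6.8, 6.4, 6.1]}
summary = {name: mean_with_ci(vals) for name, vals in groups.items()}
caption = "Mean outcome by group. " + error_bar_note()
```

    Each `(mean, half_width)` pair can be handed to whatever plotting tool draws the error bars, and `caption` travels with the figure so readers are not left guessing whether the bars are standard deviations, standard errors, or confidence intervals.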

  • Keep it simple. Or not.

    • “A large amount of quantitative information can be packed into a small region.” (p. 90)

    • “Many useful graphs require careful, detailed study.” (p. 94)

    Cleveland (1985) Elements of Graphing Data

  • Proofread. Edit. Revise. Repeat.

    • Creating statistical graphics is an iterative process.

    • Consider alternative graphical approaches.

    • Share graphics with collaborators, colleagues to gauge understanding.

    • For presentation: evaluate figures (size, color) when projected on big screen

  • Seminar Outline

    • Introduction

    • Fundamentals of Statistical Graphics

    • Data Visualization Best Practices

    • Resources

  • Books on Data Visualization

    • William Cleveland: The Elements of Graphing Data (1985)

    • Edward Tufte:

      • The Visual Display of Quantitative Information (1983, 2001)

      • Envisioning Information (1990, 2001)

      • Visual Explanations (1997)

      • Beautiful Evidence (2006)

    • Leland Wilkinson: Grammar of Graphics (1999)

    • Nathan Yau:

      • Visualize This (2011)

      • Data Points (2013)

  • Online Resources

    • Flowing Data (Nathan Yau): http://flowingdata.com/

    • Information Visualization course from Enrico Bertini: http://enrico.bertini.io/teaching/

    • Data Remixed (Ben Jones): http://dataremixed.com

    • Dear Data (Giorgia Lupi and Stefanie Posavec): http://www.dear-data.com

    • WTF Visualizations: http://viz.wtf/

  • Short course by Mike Jackson, October 22