An Introduction to Data Visualization
Anamaria Crisan
@amcrisan http://cs.ubc.ca/[email protected]
Master of Science (Bioinformatics)
PhD (Computer Science)
GenomeDX Biosciences
British Columbia Centre for Disease Control
2008 2010 2013 2015
PhD Candidate, Computer Science, University of British Columbia
Webinar Learning Goals
Today: Have a high-level understanding of data visualization design and evaluation
Tomorrow: Have a basic understanding of different data visualization tools as well as their strengths and limitations
What we’ll talk about
Why should we visualize data?
How do we use data visualizations?
How should we visualize data?
A Comment on “How Should we Visualize Data?”
There are two aspects of visualizations to think about:
How do you make a visualization?
Is it the right visualization?
Why should we visualize data?
Translating Numbers to Words
http://bit.ly/1FxtT2z
It is not always easy to reason consistently with numbers
Visualizing Data is Effective
Probability (60%) < Frequency (6 in 10) < Visualization
Least Understandable to Most Understandable
Whiting (2015) “How well do health professionals interpret diagnostic information? A systematic review”
• Numeracy: the ability to reason with numbers
§ Individuals with low numeracy have difficulty interpreting numbers and probabilities
§ Also true amongst educated professionals
• Visualization can make data more accessible to individuals with lower numeracy skills
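A minimal sketch of the three formats compared on this slide: the same quantity shown as a probability, as a natural frequency, and as a simple visualization. The icon array used for the visual form is an assumption (the slide does not say which chart was shown), and the function names are illustrative only.

```python
def natural_frequency(probability, out_of=10):
    """Express a probability as 'k in n', e.g. 0.6 -> '6 in 10'."""
    count = round(probability * out_of)
    return f"{count} in {out_of}"


def icon_array(probability, out_of=10, filled="●", empty="○"):
    """Draw one row of icons; filled icons mark the affected share."""
    count = round(probability * out_of)
    return filled * count + empty * (out_of - count)


p = 0.60  # the 60% example from the slide
print(f"Probability:   {p:.0%}")
print(f"Frequency:     {natural_frequency(p)}")
print(f"Visualization: {icon_array(p)}")
```

The three printed lines carry identical information; only the encoding changes, which is the point the slide makes about numeracy.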
But… Visualization Design ALSO Matters
Example: Communicating Survival Benefit of Cancer Therapy
Panels: Baseline Visualization, Alternative 1, Alternative 2
Zikmund-Fisher (2013). A demonstration of "less can be more" in risk graphics.
Example: Infection Transmission in a Hospital
Option A vs. Option B
Example: Visualizing Arteries of the Heart for Surgery Planning
Existing standard: accuracy 39%
Revised visualization: accuracy 91%
Borkin (2011). “Evaluation of Artery Visualizations for Heart Disease Diagnosis”. Made with: Processing
How do we use data visualizations?
Role of data visualization in the current paradigm of scientific research = Communication
Comic flowchart, built up over several slides (https://www.ratbotcomics.com/comics/pgrc_2014/1/1.html):
Do you have a research problem? Yes. / No. (But eventually you’ll have a problem, right? Duh.)
Do all the Science!
Inform the masses!
Maybe data visualization? Infographics are pretty.
Did it work? No :( → Different infographics? / Yes! → Declare victory (maybe?)
Limitation #1: Missed Opportunity in Exploration
Flowchart steps: Do all the Science!, Data Visualization!, Inform the masses!
Missed opportunity for exploration
§ Exploration is looking at your data, trying different analysis methods, assessing whether there are outliers or missing data, etc.
Limitation #1 : Missed Opportunity in Exploration
Same stats, different graphs (Anscombe’s quartet)
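To make the "same stats" half of this slide concrete, here is a small sketch that computes summary statistics for the four datasets of Anscombe's quartet. The values are the published quartet; the helper names are illustrative. Every dataset has a mean of x of 9, a mean of y of about 7.5, and a correlation of about 0.82, yet the four scatter plots look nothing alike.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973). Datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Summary statistics that hide the differences between the datasets."""
    return {
        "mean_x": round(mean(xs), 2),
        "var_x": round(variance(xs), 2),
        "mean_y": round(mean(ys), 2),
        "var_y": round(variance(ys), 1),
        "r": round(pearson(xs, ys), 2),
    }


for name, (xs, ys) in QUARTET.items():
    print(name, summarize(xs, ys))
```

Plotting each (x, y) pair as a scatter plot is the step that reveals the differences, which is the exploration this slide argues is often skipped.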
Same stats, different graphs (Datasaurus)
Autodesk Research (2017). Same Stats, Different Graphs: https://www.autodeskresearch.com/publications/samestats
Limitation #2: Identifying the Appropriate Vis
Selecting the appropriate data visualization is challenging
§ True for exploration & communication applications
We’ll spend the rest of the talk on this subject
How should we visualize data?
Cross-Cutting Disciplines in Information Visualization
Human Perception & Cognition
Computer Graphics
Data Analysis
Visualization Design & Analysis
R. Kosara (EagerEyes) – https://eagereyes.org/basics/encoding-vs-decoding
Encoding and Decoding Information
A Small Digression
Concrete Examples of Perception in Action
Example 1: A Heat map
Example 2: The Dress
Views shown: non-colour-blind individual, colour-blind individual
Colour Blind Simulator: http://www.color-blindness.com/coblis-color-blindness-simulator/
And… we’re back!
Putting it all Together for Visualization Design & Analysis
§ Non-trivial to condense knowledge across all these areas
§ Still an ongoing area of research
§ I will try to convey a simpler intuition about design & analysis
Breaking Down a Visualization in Three Questions
Why? (Motivation)
Why do you need to visualize data?
How will you, or others, use the visualization?
What? (Data & Tasks)
What kind of data is being visualized?
What tasks are performed with the data?
How? (Visual & Interactive Design)
How do you make the visualization?
Is it the right visualization?
People tend to jump to the How level and ignore Why and What
Design & Evaluation with Three Questions
Why?
What?
How?
Design
Evaluation
Does the visualization address the intended need?
Are you using the right data, or deriving the right data?
Are the visual & interactive choices appropriate for the data and tasks?
Does the visualization support the tasks using that data?
If interactive / computer based, is the visualization easy to use and reliable (i.e., doesn’t crash all the time)?
A Nested Model for Visualization Design & Analysis
Why? / What? / How? (Design, Evaluation)
Nested levels:
Domain Problem*
Data + Task
Visual + Interaction Design Choices
Algorithm
*Domain Problem = Motivation
T. Munzner (2014) – Visualization Design and Analysis
Thinking Systematically about Data Visualization
Infovis (Information Visualization) research advocates an iterative process
An Iterative Process
An iterative approach to development allows us to get feedback before committing to ineffective design choices
1. Identify a relevant problem that affects you or a group of stakeholders
Public Health Stakeholders
Nurses, Clinicians, Medical Health Officers, Researchers, Community Leaders, Politicians, Patients
§ Multidisciplinary decision making teams
§ More data & diverse data types = more informed decision making
§ BUT – different stakeholder abilities to interpret data & different needs
2. Ask what data stakeholders use (is it available?)
3. Ask what stakeholders do with the data [tasks]
Many Different Types of Data!
T. Munzner (2014) – Visualization Design and Analysis
Don’t Just Visualize the Raw Data!
Example (T. Munzner (2014) – Visualization Design and Analysis): Original (Raw) Data vs. Derived Data
Example when this advice is ignored: XKCD
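One way to picture the difference between raw and derived data is a per-capita rate. The sketch below is a hypothetical illustration, not the example from the slide: the region names, counts, and populations are made up, and the helper names are assumptions.

```python
# Hypothetical raw data: (region, case count, population). Values are made up.
RAW = [
    ("Region A", 500, 1_000_000),
    ("Region B", 120, 100_000),
    ("Region C", 30, 20_000),
]


def cases_per_100k(cases, population):
    """Derived attribute: cases per 100,000 residents."""
    return cases / population * 100_000


def derive_rates(rows):
    """Turn raw (region, cases, population) rows into (region, rate) rows."""
    return [(region, cases_per_100k(cases, pop)) for region, cases, pop in rows]


# Ranking by the raw count mostly reflects population size.
by_count = sorted(RAW, key=lambda row: row[1], reverse=True)
# Ranking by the derived rate shows where the burden is highest.
by_rate = sorted(derive_rates(RAW), key=lambda row: row[1], reverse=True)

print("Ranked by raw count:", [row[0] for row in by_count])
print("Ranked by rate/100k:", [row[0] for row in by_rate])
```

With these made-up numbers the two rankings are reversed, so a chart of the raw counts and a chart of the derived rates would tell opposite stories.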
People also Perform Different Tasks with Data
A Crisan (2017) – Evidence Based Design and Analysis of a whole genome sequence clinical report….
Degree of consensus: 3 (High), 2 (Some), 1 (Low), 0 (V. Low)
Task groups: DIAGNOSIS TASKS, TREATMENT TASKS, SURVEILLANCE TASKS
Task columns, in order:
1 Diagnose Latent TB
2 Diagnose Active TB
3 Reactive vs New Acquisition
4 Characterize Transmission Risk
5 Choose Meds
6 Choose Tx Duration
7 Assess Response to Tx
8 Guide Contact Tracing
9 Report to Public Health
10 Define a Cluster
11 Connect case to Existing Cluster
12 Guide Public Health Response
Data item | WGS equivalent | Scores for tasks 1 to 12 | TOTAL SCORE
Patient Identifier | Same | 3 3 3 3 3 3 3 2 1 1 1 1 | 26
Sample Collection Date | Same | 3 3 2 3 3 3 3 1 1 1 1 1 | 24
Patient Prior TB Results | Same | 3 2 3 3 3 3 3 1 1 1 0 1 | 23
Speciation | Speciation | 1 3 2 3 3 3 3 2 1 1 1 1 | 23
Sample Type (sputum, fine needle aspirate) | Same | 2 3 2 3 3 3 3 1 1 1 0 1 | 22
Culture results | WGS data | 1 3 2 3 3 3 3 2 1 1 0 1 | 22
Sample Collection Site (lymph node, blood draw etc.) | Same | 2 3 2 3 3 3 3 1 1 0 0 1 | 21
Acid Fast Bacilli Smear | Speciation | 2 3 2 3 2 3 3 1 1 1 0 1 | 21
Resistotype | Predicted DST | 0 2 3 1 3 3 2 2 1 1 1 1 | 19
Phenotype DST | Predicted DST* | 0 2 3 2 3 3 2 1 1 1 0 1 | 18
Chest x-ray | NA | 3 3 2 3 0 2 3 1 0 0 0 0 | 17
Report Release Date | Same | 2 2 1 2 2 2 2 1 0 1 0 1 | 15
Requester IDs | Same | 2 2 2 2 2 2 2 1 0 0 0 0 | 15
Interpretation or comments from reviewer | Same | 2 2 1 2 2 2 3 1 0 0 0 0 | 15
Predicted DST | Predicted DST | 0 2 2 1 3 3 2 1 0 1 0 0 | 15
MIRU-VNTR | SNPs | 0 2 3 1 1 1 1 1 1 1 1 1 | 13
Cluster Assignment | Cluster Assignment | 0 2 2 1 1 1 0 1 1 1 1 1 | 11
SNP/variant distance | SNPs | 0 1 2 1 1 1 0 1 1 1 1 1 | 10
Phylogenetic Tree | Phylogenetic Tree | 0 2 1 1 1 1 0 1 0 1 1 1 | 9
Reviewer ID | Same | 1 1 1 1 1 1 1 1 0 0 0 0 | 8
TST results | Speciation** | 3 1 1 1 0 0 0 1 0 0 0 0 | 7
IGRA results | Speciation** | 3 1 1 1 0 0 0 1 0 0 0 0 | 7
Lab QC | WGS Specific | 0 1 2 1 1 1 0 1 0 0 0 0 | 7
Spoligotype | SNPs | 0 1 1 1 0 0 0 0 0 0 0 0 | 3
RFLP | SNPs | 0 1 1 1 0 0 0 0 0 0 0 0 | 3
4. Explore if other visualizations have addressed this problem and set of tasks & data
5. Implement your own solution (remember this includes interaction!)
Example of a more complex visualization
https://www.youtube.com/watch?v=j4Ut4krp8GQ
A Small Digression
Marks & Channels: Basic Building Blocks
Mark: basic graphical element (basic building block)
Channel: controls the appearance of marks
T. Munzner (2014) – Visualization Design and Analysis
Channels Vary in their Effectiveness
Example
Bar Chart: position on a common scale
Pie Chart: angle & area
J. Heer (2010) – Crowdsourcing Graphical Perception: Using Mechanical Turk ……
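The bar chart versus pie chart comparison can be sketched as a lookup against a ranked list of channels. The ordering below is a simplified subset of the magnitude-channel ranking described in Munzner's Visualization Design and Analysis and should be read as a rule of thumb; the dictionary and function names are assumptions made for illustration.

```python
# Simplified channel ranking for quantitative data, most to least effective.
# Assumption: a subset of the ordering in Munzner (2014); a rule of thumb only.
CHANNEL_RANK = [
    "position on a common scale",
    "position on unaligned scales",
    "length",
    "angle",
    "area",
]

# Channels each chart type relies on, as labelled on the slide.
CHART_CHANNELS = {
    "bar chart": ["position on a common scale"],
    "pie chart": ["angle", "area"],
}


def best_channel(channels):
    """Return the highest-ranked channel among those a chart uses."""
    return min(channels, key=CHANNEL_RANK.index)


def more_effective(chart_a, chart_b):
    """Pick the chart whose best channel ranks higher in CHANNEL_RANK."""
    rank_a = CHANNEL_RANK.index(best_channel(CHART_CHANNELS[chart_a]))
    rank_b = CHANNEL_RANK.index(best_channel(CHART_CHANNELS[chart_b]))
    return chart_a if rank_a <= rank_b else chart_b


print(more_effective("bar chart", "pie chart"))
```

A ranking like this only compares how accurately people read magnitudes from each channel; it says nothing about whether the chart suits the task, which is why the talk returns to the Why and What questions.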
Marks & Channels: ggplot2 example
library(ggplot2)
ggplot(data = mpg, aes(x = displ, y = cty, colour = class)) + geom_point()
Channel: Position (x, y)
Channel: Colour (colour)
Mark: Point (geom_point)
Note: Generally in ggplot2, aesthetics refer to channels and geoms refer to marks, but there are complex geoms that aren’t simple marks but chart types (e.g. geom_density), and there are aesthetics that have little to do with the visual channels directly (e.g. group).
https://rpubs.com/hadley/ggplot-intro
And… we’re back!
4. Explore if other visualizations have addressed this problem and set of tasks
5. Implement your own solution (part or all of that solution could be a new algorithm)
6. Test multiple alternatives (including new ones you develop) with stakeholders
7. Gather qualitative & quantitative evaluation data
Thinking Systematically about Data Visualization
Design
1. Identify a relevant problem that affects you or a group of stakeholders
2. Ask what data stakeholders use (is it available?)
3. Ask what stakeholders do with the data [tasks]
4. Explore if other visualizations have addressed this problem and set of tasks & data
5. Implement your own solution (vis and/or algorithm)
Evaluation
6. Test multiple alternatives (including new ones you develop) with stakeholders
7. Gather qualitative & quantitative evaluation data
My Work: Evidence Based Design
https://peerj.com/articles/4218/
Study phases: Discovery (Information Gathering), Design (Design & Evaluation), Implement (Finalize Design)
Activities: Expert Consults, Task & Data Questionnaire, Design Sprint, Design Choice Questionnaire, TB Workflow Map
Data gathered: Qualitative, Quantitative
Study design: Exploratory Sequential Model, Embedded Model
Example report:
MYCOBACTERIUM TUBERCULOSIS GENOME SEQUENCING REPORT
NOT FOR DIAGNOSTIC USE
Patient Name: JOHN DOE | Barcode | Birth Date: 2000-01-01 | Patient ID: 12345678910 | Location: SOMEPLACE | Sample Type: SPUTUM
Sample Source: PULMONARY | Sample Date: 2016-12-25
Sample ID: A12345678 | Sequenced From: MGIT CULTURED ISOLATE
Reporting Lab: LAB NAME | Report Date/Time: 2017-01-01, 15:36
Requested By: REQUESTER NAME | Requester Contact: [email protected]
Summary: The specimen was positive for Mycobacterium tuberculosis. It is resistant to isoniazid and rifampin. It belongs to a cluster, suggesting recent transmission.
Organism: The specimen was positive for Mycobacterium tuberculosis, lineage 2.2.1 (East-Asian Beijing).
Drug Susceptibility
Resistance is reported when a high-confidence resistance-conferring mutation is detected. “No mutation detected” does not exclude the possibility of resistance.
Checkbox options: No drug resistance predicted; Mono-resistance predicted; Multi-drug resistance predicted; Extensive drug resistance predicted
Drug class | Interpretation | Drug | Resistance Gene (Amino Acid Mutation)
First Line | Susceptible | Ethambutol | No mutation detected
First Line | Susceptible | Pyrazinamide | No mutation detected
First Line | Resistant | Isoniazid | katG (S315T)
First Line | Resistant | Rifampin | rpoB (S531L)
Second Line | Susceptible | Streptomycin | No mutation detected
Second Line | Susceptible | Ciprofloxacin | No mutation detected
Second Line | Susceptible | Ofloxacin | No mutation detected
Second Line | Susceptible | Moxifloxacin | No mutation detected
Second Line | Susceptible | Amikacin | No mutation detected
Second Line | Susceptible | Kanamycin | No mutation detected
Second Line | Susceptible | Capreomycin | No mutation detected
Page 1 of 2 | Patient ID: 12345678910 | Date: 2017-01-01 | Location: Someplace
My Work: Exploring Vis for Genomic Epidemiology
How do researchers visualize data?
How can we systematically compare visualizations?
Option A vs. Option B
Wrapping up
DATA VISUALIZATION IS NOT JUST AN ART PROJECT
Revisiting Today’s Learning Goal
Have a high-level understanding of data visualization design and evaluation
§ Visualizations of data are useful
  § Helpful in instances of low numeracy
  § Can be used in communication and exploration
§ But… visualization design also matters
  § Many different alternatives, important to test
§ It’s possible to think systematically about visualizations
  § Many disciplines cross-cut information visualization research
  § At the bare minimum think “Why”, “What”, “How”
§ Some small examples to get you started
  § https://peerj.com/articles/4218/ + more to come