DSC 201: Data Analysis & Visualization
Visualization
Dr. David Koop
D. Koop, DSC 201, Fall 2017
Quiz
Assignment 1
• http://www.cis.umassd.edu/~dkoop/dsc201-2017fa/assignment1.html
• Due next Thursday (Sept. 28)
• Goals:
  - Using Tableau
  - Exploratory Data Analysis
  - Visualization
• Data: UN Persons of Concern
• Find outliers, trends, etc.
[Figure: Tableau map (Sheet 3) based on Longitude (generated) and Latitude (generated). Color shows the sum of Refugees (incl. refugee-like situations) on a scale from 0 to 1,000,000. Details are shown for Origin. The data is filtered to Year 2015.]
Exploratory Data Analysis
• John W. Tukey
  - Born in New Bedford
  - 1977: Highly influential book
• Emphasis on the value of visualization in discovering trends and relationships
• From a review of the book: “Tukey favors analysis of data with little more than pencil and paper. Specifically, there is no need for a calculator, a computer, or a lettering guide to do the analyses he proposes” [R.M. Church, 1979]
Types of EDA
• Univariate (one attribute) vs. multivariate (2+ attributes)
• Non-graphical vs. graphical
  - Non-graphical ~ statistics
  - Graphical ~ visualizations
• All are important!
Univariate Non-graphical EDA
• Categorical Data:
  - Frequency counts, proportions
  - Groupings
• Quantitative Data:
  - Distribution
  - Summary statistics: mean, median, mode, variance, standard deviation, quantiles
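The non-graphical summaries listed above can be sketched with Python's standard library. This is only an illustration: the `colors` and `values` lists are made-up data, not taken from the course or the assignment.

```python
from collections import Counter
import statistics

# Hypothetical categorical attribute (illustrative data only).
colors = ["red", "blue", "red", "green", "red", "blue"]
counts = Counter(colors)                                   # frequency counts
proportions = {k: v / len(colors) for k, v in counts.items()}

# Hypothetical quantitative attribute (illustrative data only).
values = [2, 4, 4, 5, 7, 9, 11]
summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "variance": statistics.variance(values),    # sample variance (divides by n - 1)
    "stdev": statistics.stdev(values),          # sample standard deviation
    "quartiles": statistics.quantiles(values, n=4),
}

print(counts)
print(proportions)
print(summary)
```

Note that `statistics.variance` and `statistics.stdev` are the sample versions; `pvariance` and `pstdev` would give the population versions instead.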
Univariate Graphical EDA
• Categorical Data: grouping, bar charts
• Quantitative Data: strip charts, stem-and-leaf plots, histograms, boxplots
Histograms and Distributions
[Figure: bar chart over the letters A to Z with a percentage axis from 0% to 12%]
[Cloudera]
Boxplots
• Show distribution
• Multiple summary statistics can be read from the chart
• Also provides a general shape of the data
• Best for unimodal data
[N. Yao]
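As a rough sketch of which summary statistics a boxplot encodes, the function below computes the quartiles, whisker ends, and outliers. It assumes the common Tukey convention of fences at 1.5 times the interquartile range, which the slide does not specify; `boxplot_stats` and the sample data are hypothetical.

```python
import statistics

def boxplot_stats(data):
    """Return the values a boxplot typically draws (assumes 1.5 * IQR fences)."""
    ordered = sorted(data)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],      # smallest value inside the fences
        "whisker_high": inside[-1],    # largest value inside the fences
        "outliers": [v for v in ordered if v < low_fence or v > high_fence],
    }

# Illustrative data with one obvious outlier.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

Plotting libraries differ in how they compute quartiles, so the exact box edges may not match a given tool.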
Multivariate Non-graphical EDA
• Crosstabs and Pivot Tables
  - What is in the data? Count
• Correlation and covariance
  - Correlation: how related are different attributes?
    • Positive correlation (related)
    • Negative correlation (related)
    • Zero (no linear relationship)
  - Covariance: how do two attributes change together?
Crosstabs & Pivot Tables
• Count groups and subgroups
• At least two different attributes
• Can subdivide vertically and horizontally for more subgroups
• Sometimes totals are useful
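A crosstab of two attributes, with totals, can be sketched in a few lines of standard-library Python. The `records` below are hypothetical (region, answer) pairs chosen only for illustration; a spreadsheet pivot table or Tableau would produce the same counts.

```python
from collections import Counter

# Hypothetical records: one (region, answer) pair per observation.
records = [
    ("north", "yes"), ("north", "no"), ("south", "yes"),
    ("north", "yes"), ("east", "no"), ("south", "yes"),
]

counts = Counter(records)                     # count each (row, column) combination
rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})

# Missing combinations count as 0 because Counter returns 0 for unseen keys.
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
row_totals = {r: sum(table[r].values()) for r in rows}
col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
grand_total = len(records)

print("".ljust(8) + "".join(c.rjust(6) for c in cols) + "Total".rjust(7))
for r in rows:
    cells = "".join(str(table[r][c]).rjust(6) for c in cols)
    print(r.ljust(8) + cells + str(row_totals[r]).rjust(7))
print("Total".ljust(8)
      + "".join(str(col_totals[c]).rjust(6) for c in cols)
      + str(grand_total).rjust(7))
```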
Correlation
• Pearson Correlation Coefficient: $r = \frac{\mathrm{covariance}(x, y)}{\mathrm{stddev}(x)\,\mathrm{stddev}(y)}$
• r > 0 (+ correlation), r < 0 (- correlation), |r| ~ 1 (strong correlation)
• Examples:
  - Correlation(Number of people who drowned by falling into a pool, Films Nicolas Cage appeared in) = 0.666004
  - Correlation(Divorce rate in Maine, Per capita consumption of margarine) = 0.992558
  - Correlation(People who drowned after falling out of a fishing boat, Marriage rate in Kentucky) = 0.952407
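The Pearson coefficient, covariance divided by the product of the two standard deviations, can be sketched directly. The helper names and the small data lists are illustrative assumptions; sample (n - 1) statistics are used throughout, and the choice cancels out in r.

```python
import math

def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def stddev(x):
    """Sample standard deviation: the square root of the covariance of x with itself."""
    return math.sqrt(covariance(x, x))

def pearson_r(x, y):
    return covariance(x, y) / (stddev(x) * stddev(y))

x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])    # y rises with x
r_neg = pearson_r(x, [10, 8, 6, 4, 2])    # y falls as x rises
r_zero = pearson_r(x, [2, 1, 0, 1, 2])    # V shape: clearly related, but not linearly

print(r_pos, r_neg, r_zero)
```

The V-shaped case is a reminder that r near zero rules out only a linear relationship, and the examples on the slide show the converse: a large r does not by itself indicate a meaningful connection.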
Multivariate Graphical EDA
• Categorical Data:
  - Grouped bar charts
  - Parallel sets
• Quantitative Data:
  - Scatterplots: look for correlation
    • Usually put outcome on y-axis
    • Can encode other variables
  - Side-by-side boxplots
  - Parallel coordinates
Bar Chart Matrix
[Pilhöfer and Unwin]

Excerpt (p. 6, extracat: New Approaches in Visualization of Categorical Data in R):

Figure 4: Multiple barchart of Cont (x), Type (y), Infl (x) and Sat (x) generated with the Mondrian software.

[...] frequencies is called doubledeckerplots (Hofmann 2001), which has all explanatory variables on the x-axis. Its interpretability decreases with the number of displayed combinations. For relatively small examples such as the Copenhagen housing data, the graphic is among the best visual representations as Figure 3

R> doubledecker(xtabs(Freq ~ Cont + Type + Infl + Sat, data = housing))

illustrates: It is easy to compare combinations with different levels of influence or different types of residence but less easy to compare the two different levels of contact.

Figure 4 shows the multiple barchart visualisation of the example discussed in Figures 1 and 2 created with the interactive software Mondrian (Theus and Urbanek 2008). The main difference between the multiple barchart and the rmb plot is that the multiple barchart displays absolute frequencies whereas the rmb plot shows their factorization into conditional relative frequencies and weights. The advantage of this becomes apparent in the last two rows ("Atrium" and "Terrace") of the left side of the plot (low contact) where the bars are very small and hardly comparable. Within each combination of Type, Infl and Cont the ratio of any two absolute frequencies is obviously the same as that of the corresponding conditional relative frequencies. Unfortunately this does not hold for two different combinations of these three variables and thus only the ratios of the bars can be compared.

Figure 2 and 4 show that it is at least possible to judge strong differences in the shape of the distributions of a target variable in both classical mosaicplots and multiple barcharts. E.g., the strong positive relationship of Infl and Sat is apparent in both graphics. Nevertheless in many examples the rmb plot provides a better overview and allows for more precise com- [...]

Excerpt (p. 3, Journal of Statistical Software):

Variable   Description                        Levels
Cont       Contact to other residents         "Low", "High"
Infl       Influence on housing conditions    "Low", "Medium", "High"
Type       Type of residence                  "Tower", "Atrium", "Apartment", "Terrace"
Sat        Satisfaction                       "Low", "Medium", "High"

Table 1: The Copenhagen housing dataset.

In principle, classical mosaicplots (see Friendly 1994; Hartigan and Kleiner 1981) also show both $p_{i|jk}$ and $n_{+jk}$ but while the space is efficiently used, it becomes harder to establish the relation between the rectangles and the corresponding variable combinations with every additional variable. Comparing the proportions of a target category in different combinations of explanatory variables is only possible in a qualitative manner, because the corresponding rectangles neither share a common axis nor have a common scale.

By contrast multiple barcharts and fluctuation diagrams display only the total number of observations $n_{ijk}$ but allocate the information in equal-sized rectangles in a hierarchical grid layout (see Hofmann 2000). The allocation along the grid makes it easier to read the plot and also allows better comparisons especially within the rows or columns because all combinations now share the same x- and y-axis scales. In multiple barcharts the y-axis is set to $[0, \max(n_{ijk})]$ and the x-axis is cut into equal segments for the target categories (or vice versa). Unfortunately comparisons of the conditional distributions of a target variable are quite hard: Comparing absolute frequencies $n_{i|s}$ and $n_{i|t}$ of target category i in two explanatory combinations s and t is obviously not equivalent to the comparison of the relative frequencies $p_{i|s}$ and $p_{i|t}$ and hence it is necessary to use ratios of the form

$$\frac{n_{i|s}}{n_{j|s}} = \frac{p_{i|s}}{p_{j|s}} \quad \text{and} \quad \frac{n_{i|t}}{n_{j|t}} = \frac{p_{i|t}}{p_{j|t}}$$

instead.

The basic version of rmb plots is constructed as follows: Consider a set of m categorical variables including one target variable. The basis of the plot is a multiple barchart of the m - 1 explanatory variables displaying the observed frequencies $n_{+jk}$ of their combinations. The plot uses horizontal bars which means that all bars have an equal height and their widths are proportional to the ratios

$$\frac{n_{+jk}}{\max(n_{+jk})}.$$

The conditional distributions of the target categories defined by the probabilities $p_{i|jk}$ are displayed inside these bars. The basic type of visualization is again a barchart with vertical bars. An alternative which is discussed in Section 3 is the generalized spineplot version which splits each bar from the basis plot vertically into segments according to their relative frequencies, just as in classical mosaicplots or spineplots. In both versions the x- and y-axis scales are the same, namely $[0, \max(n_{+jk})]$ and $[0, 1]$ respectively.

A first introductory example using the well-known Copenhagen housing dataset (c.f. Venables and Ripley 2002) is shown in Figure 1. In R the dataset is available from the MASS package and the variables are listed in Table 1.

Figure 1 shows the variables Cont and Infl on the x-axis, Type on the y-axis and Sat as the target variable which is by convention on the x-axis. The graphic reveals the weak influence of the Cont variable and the strong positive correlation between Infl and Sat: The differences between the distributions on the left side (low contact) and the corresponding counterparts on the right side (high contact) are quite small and hence the influence of the Cont variable on the satisfaction of the respondents is weak. In contrast the variable Infl shows a strong positive correlation with the target variable: The people who judged their influence to be low [...]
Parallel Sets
[Figure: interactive parallel sets view with the dimensions Survived (Survived, Perished), Sex (Female, Male), Age (Child, Adult), and Class (Second Class, First Class, Third Class, Crew). Data: Robert J. MacG. Dawson.]
[Titanic Data, J. Davies]
Scatterplot
Scatterplots and Correlation
Parallel Coordinates
[Figure: parallel coordinates plot with the axes economy (mpg), cylinders, displacement (cc), power (hp), weight (lb), 0-60 mph (s), and year]
[M. Bostock]
Multiple Boxplots
Visualization
MTA Fare Data Visualization
“Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.”
— T. Munzner
NYC Subway Fare Data
Find Interesting NYC Subway Ridership Patterns
Why People?
• Certain tasks can be totally automated
  - Statistical computations
  - Machine learning algorithms
  - We don’t need visualization for these tasks (although perhaps for debugging them…)
• Analysis problems are often ill-specified
  - What is the correct question?
  - Exploit human visual system, pattern detection capabilities
  - Goal may be an automated solution or a visual analysis system
• Presentation
  - It is often easier to show someone something than to tell them a bunch of facts about the data (and let them explore it)
Why Computers?
[Cerebral, Barsky et al., 2007]
Resource Limitations
• Memory and space constraints
• How many pixels do I have?
• Information Density
[McGuffin & Robert, 2010]
Fig. 2. A tree of regions and major islands of the Philippines, drawn using squarified treemaps (top) and using icicle diagrams (bottom). The two diagrams on the left weight leaf nodes by geographic area, whereas the two diagrams on the right give equal area to leaf nodes. Labels are rotated when necessary to maximize their size.

[...] arbitrarily deep. Our analysis allows us to rank tree representations by their efficiency, which is useful for helping designers choose the most efficient representation allowable within other given constraints.

Our work also quantifies an interesting difference between representations in how they distribute area across nodes. For example, the icicle diagrams in Figures 1C, 2C, and 2D allocate equal area to each level of the tree: the root node has the same area as all the leaf nodes together. Treemaps, in contrast, typically allocate more area to deeper nodes. There is a tradeoff here, since we would like users to be able to see as many deep nodes as possible (which tend to also be the most numerous nodes), while at the same time providing some information about shallow nodes (for example, to give an overview of subtrees, and/or to guide the user in zooming operations). This article develops a new metric, the mean area exponent, that describes the distribution of area across levels of a tree representation, to quantify this tradeoff.

Finally, we also present a set of design guidelines for using tree representations, as well as a few novel tree representations, including a variation on squarified treemaps that allows for larger labels within the nodes.

2 RELATED WORK

Different tree representations, including classical node-link, icicle, nested enclosure, and indented outline, were identified decades ago in [4, 16], and an interactive version of the indented outline representation (now popular in file browsers such as Microsoft Explorer) was presented in [11]. Subsequent years have seen variations on these representations proposed. Treemaps are a relatively recent innovation, and are a kind of nested enclosure representation. Treemaps are often described as space-filling, a highly desirable property for space-efficiency.

The term “space-filling” can sometimes be problematic, however. For example, a view sometimes expressed [21] holds that tree representations can be divided into two classes: (1) node-link diagrams, that illustrate parent-child relationships with line segments or curves, and (2) space-filling representations, which include treemaps and concentric circles such as Sunburst (Sunburst was described as space-filling in [27], and [21] similarly describe [2] as space-filling.) However, these two classes seem to not be disjoint, because some node-link diagrams also “fill space” [19, 20]. The 2nd class also ignores an interesting difference between treemaps and concentric circles, namely that parent nodes in treemaps enclose their children, whereas parents in concentric circle diagrams are adjacent to their children. Finally, the term “space-filling” suggests increased space-efficiency, however it is easy to design a treemap layout algorithm that occupies all available space without making good use of it, for example, by using excessively thick margins, or by concentrating child nodes in only one corner of their parent, leaving the rest of the parent empty and unused. Would such a treemap cease to be considered space-filling, even though its root node covers all the available space? Without a precise definition of “space-filling”, we recommend being cautious about using this term to refer to a category of tree representations, since the name seems to imply that members of the category are more space-efficient than non-members. As an alternative, categories of representations could instead be based on how the nodes are drawn (e.g. representations where the nodes are mapped to points, and those where the nodes are mapped to areas) or on how parent-child relationships are shown (e.g. through line segments, enclosure, adjacency, or relative positioning). The space-efficiency of a given representation can be treated as a separate matter, and evaluated by several metrics, as demonstrated in this article.

Within the graph drawing community, a common approach for evaluating space-efficiency is to compare the total area required by different drawings (i.e., representations) of the same graph or tree. Since any drawing can be scaled arbitrarily in x and y, to ensure a meaningful comparison, the “resolution” of the representations is fixed, often by requiring that nodes be positioned on a grid (i.e. with integer coordinates) [9]. There are problems with this general approach, however, especially when comparing representations of trees rather than graphs. For example, allowing only grid positions may be misleading, because non-integer coordinates can significantly reduce total area without compromising the clarity of the representation or the space available for labels (Figure 3). As a potential remedy, instead of positioning nodes on a grid, we might instead impose a minimum distance between nodes, or a minimum size for non-overlapping labels centered over the nodes. Unfortunately, matters are complicated by the fact that some tree representations (such as Figures 1C, 1E, 1F, 1G) involve nodes that have an area and shape, and there may be nodes and labels of different sizes within a single representation (e.g. deeper nodes may be smaller and have smaller labels). This makes it less clear how to impose a fixed resolution in a way that is fair across tree representations. Note that this issue does not arise in traditional graph drawing, where nodes are typically mapped to points.

Fig. 3. A and B are adapted from a comparison in Figure 5 of [1], and show two different graphical representations of the same tree where nodes are constrained to positions with integer coordinates. B is clearly more compact than A. In C, however, we have redrawn the representation from A with the integer coordinate constraint relaxed, and the resulting graphical representation has a convex hull whose area is only about 5% greater than that in B. Notice also that the minimum horizontal spacing between nodes in B and C is the same, allowing nodes to be overlaid with horizontally oriented labels of the same size in both cases.

In our work, rather than comparing total area with a fixed resolution, we fix the total area available, and fix its aspect ratio. Representations [...]
Why Visual?
[F. J. Anscombe]
       I              II             III             IV
   x      y       x      y       x      y       x      y
 10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
  8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
 13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
  9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
 11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
 14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
  6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
  4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50
 12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
  7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
  5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89
Summary statistics (nearly identical for all four datasets):
• Mean of x: 9
• Variance of x: 11
• Mean of y: 7.50
• Variance of y: 4.122
• Correlation: 0.816
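The summary statistics can be checked against the table with a short script. The data values are copied from the four datasets above; the variable names and the compact `pearson_r` helper are illustrative choices. Sample variance is used, and the results should come out close to the figures listed, with small differences between datasets in the later decimals.

```python
import statistics

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared x values for datasets I to III
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson_r(x, y):
    """Pearson correlation from sums of squared deviations."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

summary = {}
for name, (x, y) in quartet.items():
    summary[name] = (
        statistics.mean(x), statistics.variance(x),   # sample variance (n - 1)
        statistics.mean(y), statistics.variance(y),
        pearson_r(x, y),
    )
    mx, vx, my, vy, r = summary[name]
    print(f"{name:>3}: mean x={mx:.2f} var x={vx:.2f} mean y={my:.2f} var y={vy:.3f} r={r:.3f}")
```

Four datasets that agree this closely on every summary number can still look completely different when plotted, which is why the statistics alone are not enough.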
[Figure: four scatterplots, one per dataset (x1 vs. y1, x2 vs. y2, x3 vs. y3, x4 vs. y4), each with an x-axis from 4 to 18 and a y-axis from 4 to 12]
Visual Pop-out
[C. G. Healey, http://www.csc.ncsu.edu/faculty/healey/PP/]
Visual Perception Limitations
[C. G. Healey, http://www.csc.ncsu.edu/faculty/healey/PP/]
Other Human Limitations
• Visual working memory is small
• Change blindness: Large changes go unnoticed when we are working on something else in our view
Design Iteration
[19 Sketches of Quarterback Timelines, K. Quealy]
Another Design Example
[M. Stefaner, 2013]
Why Effectiveness?
• “It’s not just about pretty pictures”
• Any depiction of data requires the designer to make choices about how that data is visually represented
  - Analogy to photography
  - Lots of possibilities (see quarterback study)
• Effectiveness measures how well the visualization helps a person with their tasks
  - How? insight, engagement, efficiency?
  - Benchmarks and user studies
Effectiveness
[S. Hayward, 2015]
[@bizweekgraphics]
Tableau Example