

Declutter and Focus: Empirically Evaluating Design Guidelines for Effective Data Communication

Kiran Ajani, Elsie Lee, Cindy Xiong, Cole Nussbaumer Knaflic, William Kemper, and Steven Franconeri, Member, IEEE

Fig. 1. We empirically evaluated the effects of two visualization design themes frequently prescribed by practitioner guides: declutter and focus. Decluttered designs showed small advantages for subjective ratings, and adding focus to the designs showed additional subjective rating advantages, along with a strong influence on what data pattern was remembered by viewers.

Abstract—Data visualization design has a powerful effect on which patterns we see as salient and how quickly we see them. The visualization practitioner community prescribes two popular guidelines for creating clear and efficient visualizations: declutter and focus. The declutter guidelines suggest removing non-critical gridlines, excessive labeling of data values, and color variability to improve aesthetics and to maximize the emphasis on the data relative to the design itself. The focus guidelines for explanatory communication recommend including a clear headline that describes the relevant data pattern, highlighting a subset of relevant data values with a unique color, and connecting those values to written annotations that contextualize them in a broader argument. We evaluated how these recommendations impact recall of the depicted information across cluttered, decluttered, and decluttered+focused designs of six graph topics. Participants were asked to redraw previously seen visualizations, to recall their topics and main conclusions, and to rate the varied designs on aesthetics, clarity, professionalism, and trustworthiness. Decluttering designs led to higher ratings on professionalism, and adding focus to the design led to higher ratings on aesthetics and clarity, and better memory for the highlighted pattern in the data, as reflected both by redrawings of the original visualization and typed free-response conclusions. The results largely empirically validate the intuitions of visualization designers and practitioners.

Index Terms—data visualization, data communication, data storytelling, empirical evaluation, visualization aesthetics

1 INTRODUCTION

Each day, across organizations, research labs, journalism outlets, and classrooms, tens of millions of people attempt to communicate specific patterns in data using visualizations. One estimate from Microsoft, albeit 20 years old, puts the number of PowerPoint presentations alone

• Kiran Ajani is with Case Western Reserve University School of Medicine. E-mail: [email protected].

• Elsie Lee is with University of Michigan School of Information. E-mail: [email protected]. *corresponding author

• Cole Nussbaumer Knaflic is with storytelling with data.

• Cindy Xiong, William Kemper, and Steven Franconeri are with Northwestern University.

Manuscript received xx xxx. 201x; accepted xx xxx. 201x. Date of Publication xx xxx. 201x; date of current version xx xxx. 201x. For information on obtaining reprints of this article, please send e-mail to: [email protected]. Digital Object Identifier: xx.xxxx/TVCG.201x.xxxxxxx

at 30 million per day [1] – and the majority of those presentations likely contain data visualizations intended to highlight particular data patterns. Given the ubiquity of visual data communication, it is vital that visualizations transmit intended patterns to audiences quickly and clearly. In contrast, dozens of best-selling practitioner guides (Table 1) argue that business-as-usual visualizations are ineffective, confusing, or even misleading [2–36]. These books prescribe multiple tactics for improving graphical communication, but two themes stand out as common across many of them. The first guideline is to ‘declutter’ a visualization (contrast the first and second examples in Figure 1), by removing unnecessary elements like gridlines, marks, legends, and colors. The second is to ‘focus’ visualizations (contrast the second and third examples in Figure 1), by providing annotation and highlighting that lead a viewer to focus on a given pattern in the data.

Dozens of books intended for practitioners prescribe these designs because they note that real-world visualizations and presentations tend to violate these guidelines across organizations. While we know of no empirically driven estimates of their prevalence, paper author C.N.K. estimates, based on training more than 25,000 people across various


Table 1. Data Visualization Practitioner Guides

The following guides prescribe both themes (declutter and focus): Good Charts (Berinato); The Functional Art (Cairo); The Truthful Art (Cairo); Data At Work (Camoes); Trees, Maps, and Theorems (Doumont); slide:ology (Duarte); Effective Data Storytelling (Dykes); Effective Data Visualization (Evergreen); Presenting Data Effectively (Evergreen); Now You See It (Few); Information Dashboard Design (Few); Avoiding Data Pitfalls (Jones); Communicating Data With Tableau (Jones); Data Visualisation (Kirk); Storytelling With Data (Knaflic); Storytelling With Data: Let’s Practice (Knaflic); #MakeoverMonday (Kriebel et al.); Tableau Your Data! (Murray); Better Presentations (Schwabish); Elevate the Debate (Schwabish); Visual Explanations (Tufte); Visual Display of Quantitative Information (Tufte); Envisioning Information (Tufte); The Power of Data Storytelling (Vora); Big Book of Dashboards (Wexler et al.); WSJ Guide to Information Graphics (Wong); Data Points (Yau).

The following guides prescribe one of the two themes: Info We Trust (Andrews); DataStory (Duarte); The Book of Trees (Lima); Design for Information (Meirelles); Visualization Analysis & Design (Munzner); Information Visualization: Perception for Design (Ware); Visual Thinking for Design (Ware).

organizations and industries, that the overwhelming majority of visualizations intended for explanatory purposes include graphical clutter and do not focus attention. Note that this sample is likely to be an overestimate, because these organizations had self-selected for visualization design training. But it is consistent with the fact that dozens of best-selling books have taken the time to make these prescriptions – that would be unlikely if those practices were already common in the real world.

While these two guidelines are argued to improve the comprehension and clarity of data communication, and despite their prevalence as prescriptions and their potential for impact in daily life, these recommendations have not been sufficiently empirically evaluated. The declutter guideline has some less controversial elements, such as replacing legends with direct labels, which are highly likely to improve performance [3–7, 9–20, 24–30, 32–36]. However, other aspects of this process present potential tradeoffs. In the example in Figure 1, the overall aesthetic of the visualization might improve by removing dotted lines connecting the segments (contrast the first and second graphs), but that could also decrease the precision of comparing values between the two stacks. Removing the variety of colors for the nominal categories (as shown between the first and second graphs) might lead to higher aesthetic appeal, but it may also render the different categories harder to match. Are these ‘clutter’ elements really such an impediment for a powerful visual system that takes up almost half of the human brain [37], or might they perhaps lead to a level of minimalism that the viewer finds boring? Likewise, the focus guideline points viewers toward a particular pattern in the data, with one intention of having a viewer better remember the shape of the data. But again, given such a powerful visual system, is this step really needed? Might the viewer instead feel that a single story is being ‘pushed’ on them?

We empirically tested the impacts of these design guidelines, across participant ratings (aesthetics, clarity, professionalism, and trustworthiness) and memory recall (via drawings and typed responses) for cluttered, decluttered, and decluttered+focused versions of visualized datasets. In collaboration with a practitioner guide author (paper author C.N.K.), we generated six example visualization topics in each of the three design styles shown in Figure 1. We find that decluttered visualizations are generally rated more positively on professionalism, but larger benefits appeared for decluttered+focused visualizations. These were rated more positively on aesthetics and clarity, and led to improved memory for focus-relevant data patterns, as measured by drawings and typed conclusions from memory. However, the focus manipulation did not have an additional effect beyond the declutter manipulation on ratings of professionalism, and neither decluttering alone nor the addition of focusing showed strong evidence for improving trustworthiness of a graph.

1.1 Related Work

1.1.1 Decluttering a Visualization

One form of argument for minimalist design in a visualization is Tufte’s ‘data-ink ratio’ [28]. While the definitions of ‘data’ vs. ‘ink’ can be vague and subject to context [38], the general prescription is to remove any unnecessary elements in a visualization. In many cases, this rule is invoked as a reason to omit ‘chartjunk’: pictorial ornamentation and metaphors, such as arranging increasingly large bars in a graph as a monster’s teeth, or putting a memorable dinosaur in the background of a graph. Some past work shows that these pictorial embellishments can lead to either better or worse performance on immediate reports depending on the details of task and context [39, 40], and other work found no significant effect of the presence of chartjunk on decision-making [41]. Chartjunk can lead to higher engagement and aesthetics ratings [40], as well as better short- and long-term memory for whether the visualization was previously seen [42] and what the data content or message was [43–45].

Our present work is concerned not with such pictorial ‘chartjunk,’ but with ‘clutter’: “...conventional graphical paraphernalia routinely added to every display that passes by: over-busy grid lines and excess ticks, ... the debris of computer plotting...” [28]. A critique of the default settings of Microsoft Excel 2007 showed that the software created visualizations that nudge users toward redundant gridlines, excessive labels, excessive color, and 3D effects [31, 32, 46–48]. Authors of data visualization books also suggest reducing the number of colors present in a display to as few as possible, eliminating separate colors used to indicate nominal categories in the data (e.g., using different colors for marks in a line or bar graph) or used in other areas of a visualization such as the background [3, 19, 46, 49].

Some existing studies have sought to determine whether decluttering would lead to objectively better performance on graphical perception and memory tasks. One study found that within a simulated monitoring dashboard for nine metrics, removing unnecessary elements (tick marks, verbose scale number labeling, redundant readouts, colored backgrounds highlighting relevant thresholds, etc.) did not improve response times or situation awareness accuracy [50]. A more extreme manipulation stripped away the graphical display entirely so that values were only displayed as text digits, and this did improve response times [50]. But this more drastic change might improve performance not because it omits clutter, but because it contained larger versions of text digits and a more precise data representation that was likely more suitable for the participant’s high-precision monitoring task. Another study asked participants to compute simple means, differences, and comparisons on bar graphs with only two bars, while manipulating the presence of individual bits of ‘clutter,’ finding that including axis tick marks slightly increased response times, but completely removing the x- and y-axes can slightly slow responses, with both effects occurring for bar but not line graphs [39]. Finally, another study showed that graphs with ‘data redundancy’ – symbolic numbers placed on or near visual marks that already show those numbers visually – actually showed higher quality memory reports for their content [44].

Other studies evaluate the impact of decluttering on preference ratings. When they are important for a precise estimation task, viewers prefer lower-contrast gridlines compared to heavier lines that can obscure the underlying data; the authors even provide quantitative alpha values for the preferred range [51], and show how this range varies by the color of the gridlines [52]. Another study evaluated ratings of beauty, clarity, effectiveness, and simplicity of visualizations with high vs. low ‘data-ink’ ratios [53]. Surprisingly, visualizations with lower data-ink ratios (more ‘clutter’) were rated more positively, potentially because the more minimalistic style was unfamiliar, given that the cluttered designs are encountered more frequently in tools like Microsoft Excel. Another study found similar results, especially for extremely minimalistic designs [54], and again argued that the lower level of familiarity might drive a distaste for overly decluttered designs.

1.1.2 Focusing a Visualization

While broad statistics about the data values, such as distributions and outliers, are available quickly [55], picking out one of the many – even dozens or hundreds – of potential patterns, trends, and relations of interest within the data is an inefficient perceptual process [56], requiring seconds or minutes of processing to unpack the ‘paragraphs-worth’ of information implicit in a single visualization [57]. One study showed that once a pattern is seen within a visualization, a ‘curse of expertise’ biases people to think that others will focus on that pattern as well, even when they don’t. Viewers saw background information that focused on a particular data pattern (i.e., relationships between two lines from a four-line graph) in a visualization. The experimenters then asked participants to forget that story, and to predict which of multiple possible patterns would be most salient to a viewer who had not heard that story. Despite reminders that other viewers had not heard the story, participants still incorrectly predicted that others would see what they saw in the visualization [58].

This problem motivates a frequent practitioner guideline to focus a visualization: if there is a single pattern that a viewer should extract and remember, then the designer should state the pattern clearly with direct annotation, and highlight the key data values that create that pattern [2–36]. One study found that visualizations with a title showed higher quality memory reports for their content, especially when the title included ‘message redundancy,’ an additional section of text that focused the viewer on a key pattern in the visualization [44]. That study also found that titles were robustly fixated (especially when placed at the top as opposed to the bottom of the visualization), and that later descriptions from memory tended to reflect a rewording of the content expressed in the title. Other work addresses not just focusing a single view of a single visualization, but the broader practice of ‘storytelling’: creating a sequence of views that follows an argument or other narrative as a rhetorical tool to guide a viewer through a more complex set of data patterns over time, with some considering the added complexities of user interactivity, path choice, and drilldown [59–62]. Some of these studies pick out real-world examples of highlighting relevant data values in a visualization [60].

1.2 Contributions of the present work

Relatively more existing work has tested the declutter guideline than the focus guideline, but this work has not converged on a clear answer. Some of this work shows an advantage to decluttering [51], some shows no difference [50], some shows a disadvantage [44], and some shows a mix [39]. For preference ratings, at least two recent studies actually show a preference for more cluttered designs, perhaps because they are more familiar [53, 54], though these studies use graphs that are already highly minimalistic and abstract (e.g., an Excel bar graph of ‘Sales’ across four regions ‘North,’ ‘South,’ etc.). Other work that finds little impediment of clutter on objective performance uses very simple displays, such as 2-value graphs [39], or relies on a specific dashboard monitoring task [50]. While well-controlled, these highly abstract graphs and specific tasks may not reflect the preference and performance effects generated by the types of clutter seen frequently by participants. To test this idea, we use realistic examples of both graph topics and clutter types drawn from a guidebook derived from many years of experience in organizational presentation settings [19].

Little work has tested the effects of the focus guideline. In the closest work, titles (particularly those that focused on a single message) led to better objective memory of the content in the visualization [44]. While highly suggestive, these data are correlational because the presence of a title was not randomly assigned, so it is possible that a third variable contributed to that advantage. For example, a particularly salient data pattern could drive both the original visualization author and the experimental participant to notice that more memorable pattern. We therefore test the same visualizations across three designs, in a between-subjects counterbalanced manipulation. Furthermore, that study was concerned with measuring ‘objective’ memory for correct vs. incorrect information in the visualization, while our focus is on measuring how strongly a focus manipulation could subjectively emphasize one possible pattern in the data over others. This previous work also measured relatively long-term memory for visualizations after seeing 100 total visualizations for 10 seconds each, simulating a longer-term exposure to many visualizations. In contrast, our goal was to test immediate understanding after a 10-second viewing period, in order to simulate the experience of being shown data on a handout or presentation slide in a meeting, conference, or discussion. In addition to memory reports via typed text, we add a novel method of collecting drawings from memory, to see if the focus manipulation would affect not only what participants say about the data, but what they remember seeing. Finally, we know of no existing work that measures preference ratings (aesthetics, etc.) across the increasingly prevalent practice of implementing the focus guideline in visualization designs.

2 EXPERIMENT OVERVIEW

Figure 1 depicts an example of three design variations – cluttered, decluttered, and decluttered+focused – for one of the visualization topics used in our experiments. We take representative examples from a popular practitioner guide [19] (100,000+ copies sold, according to storytellingwithdata.com [63]), and the alternate designs were created in collaboration with that book’s author.

We measured several metrics of communication effectiveness across these three design variations, including whether viewers would be more likely to recall the intended message of each visualization, as measured by qualitative coding of visualizations drawn from memory and of typed free responses. We additionally measured whether they would rate the design variations differently across quantitative scales of aesthetics, clarity, professionalism, and trustworthiness, as well as qualitative explanations of those ratings.

2.1 Participants

An omnibus power analysis from the quantitative data collected from a pilot experiment (see Supplementary Materials) suggested that a target sample of 24 participants would give us more than 95% power to detect an overall effect of graph version (cluttered, decluttered, decluttered+focused) on quantitative ratings of aesthetics, clarity, professionalism, and trustworthiness. All 24 participants whose data were used in the final analysis were students or community members at Northwestern University (18 female, age range 18 to 26, average age 19.5, all with normal or corrected-to-normal vision). We replaced a subset of our initial 24 participants to resolve a condition counterbalancing error; this replacement was performed blind to participant results, governed only by which conditions they were shown. Participants were compensated $10/hour or received course credit.
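The general shape of an omnibus power analysis like this can be sketched via the noncentral F distribution. The sketch below treats the three designs as a between-groups one-way ANOVA for simplicity, and the effect size (Cohen's f = 0.9) is an illustrative placeholder, not the value actually estimated from the pilot data:

```python
# Hypothetical power-analysis sketch for a three-level design factor.
# Cohen's f = 0.9 is a placeholder, not the paper's pilot estimate.
from scipy import stats

def anova_power(cohens_f, n_total, k_groups, alpha=0.05):
    """Power of a one-way ANOVA F test, via the noncentral F distribution."""
    df1, df2 = k_groups - 1, n_total - k_groups
    crit = stats.f.ppf(1 - alpha, df1, df2)   # rejection threshold under H0
    ncp = cohens_f ** 2 * n_total             # noncentrality parameter
    return 1 - stats.ncf.cdf(crit, df1, df2, ncp)

# Smallest total sample that reaches the 95% power target:
n = 6
while anova_power(0.9, n, k_groups=3) < 0.95:
    n += 1
print("total participants needed:", n)
```

With a large placeholder effect size, the required sample lands well below the 24 participants actually run, which is consistent with the "more than 95% power" claim above.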



Fig. 2. We created three versions (cluttered, decluttered, and focused) of six different graph topics following popular data visualization guidelines.

3 MATERIALS AND PROCEDURE

3.1 Stimuli

The stimuli for this experiment (Figure 2) consisted of cluttered, decluttered, and decluttered+focused (which will be shortened to ‘focused’ from here forward) versions of each of six different example graphs: concerns about an automotive design split by concern category (‘Car’), holiday shopping frequencies over time split by gender (‘Holiday’), news sources over time split by medium (‘News’), the distribution of customer-preferred flower seeds split by customer category (‘Plants’), US prisoner offenses split by category (‘Prison’), and retail prices of tires over time split by manufacturer (‘Tires’). The graphs consisted of vertical bar charts, horizontal bar charts, stacked vertical bar charts, and line graphs. We created the three different visualization designs of the same graph using examples adapted from Nussbaumer Knaflic [19].

The cluttered graphs contained a larger palette of color, used low-legibility fonts, and included gridlines, heavy borders, background shading, axis tick marks, data markers, redundant numeric labeling, diagonally rotated text, overuse of bolded text, and 3D shading cues.

The decluttered graphs removed color, chart borders, gridlines, tick marks, and even axis lines in some cases, but kept the graph’s title, data values, and axes and their labels. Text was oriented horizontally instead of diagonally, and text was spatially aligned to other elements in the graph; for example, instead of being centered, titles were left-aligned with the y-axis. White space was added between major elements (e.g., between the title and graph, or graph and footnote). Legends were converted into direct labels. For example, for the News graph, the decluttered version removed the excessive data points and gridlines present in the cluttered version, and added white space.

The focused graphs added a single highlight color (e.g., red) to the grayscale graph, intended to focus the viewer on a given pattern. In the Holiday graph topic, intensity of color was manipulated to focus the viewer particularly on a pair of data values. A one-sentence annotation described a conclusion that could be drawn from that data pattern, with key words in the same font color as the highlighted data pattern. For example, in the News graph, the focused version adds contrast between the blue ‘Internet’ line and the other news source lines to draw attention to the pattern of increasing usage.
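As a concrete illustration of these declutter and focus steps (not the paper's actual stimuli), the following matplotlib sketch builds a News-style line graph from invented data: gridlines and chart borders are suppressed, a boxed legend is replaced with direct labels, all series but one recede to gray, and a left-aligned headline states the highlighted pattern in words:

```python
# Illustrative sketch only: invented data values, applying the declutter
# and focus guidelines to a hypothetical News-style line graph.
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

years = [2010, 2012, 2014, 2016, 2018]
sources = {  # invented percentages, for illustration only
    "TV": [55, 52, 50, 46, 41],
    "Print": [30, 26, 22, 18, 14],
    "Radio": [25, 24, 23, 22, 21],
    "Internet": [20, 30, 42, 55, 68],
}

fig, ax = plt.subplots(figsize=(6, 4))
for name, values in sources.items():
    # Focus: one saturated highlight color; everything else recedes to gray.
    color = "tab:blue" if name == "Internet" else "lightgray"
    ax.plot(years, values, color=color, linewidth=2)
    # Declutter: direct labels at the line ends replace a boxed legend.
    ax.text(years[-1] + 0.2, values[-1], name, color=color, va="center")

# Declutter: no top/right chart borders (and no gridlines are drawn).
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# Focus: a left-aligned headline that states the key pattern in words.
ax.set_title("The internet is overtaking traditional news sources",
             loc="left", color="tab:blue")
fig.savefig("news_focused_sketch.png")
```

The same skeleton yields the decluttered (but unfocused) condition by giving every series the same neutral color and using a descriptive rather than conclusion-stating title.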

3.2 Procedure

3.2.1 Part 1

We presented each of the six graph topics to 24 participants in a Latin square counterbalanced design that balanced presentation order across examples. Participants were placed into one of six counterbalancing groups, each of which had a predetermined balance of cluttered, decluttered, and focused versions such that each participant viewed two examples of each of the three visualization designs, for a total of six graphs. The combination of which visualization design was seen with which graph topic was balanced across the counterbalancing groups, resulting in an equal presentation of each visualization design and graph topic across all participants.
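A counterbalancing scheme with these balance properties can be sketched as follows; the rotation used here is hypothetical, chosen only to satisfy the constraints described above (two graphs per design per participant, and each topic-design pairing appearing equally often across groups), not the paper's actual group assignment:

```python
# Hypothetical counterbalancing sketch: the rotation below is illustrative,
# not the paper's actual group-to-condition assignment.
from collections import Counter

TOPICS = ["Car", "Holiday", "News", "Plants", "Prison", "Tires"]
DESIGNS = ["cluttered", "decluttered", "focused"]

def schedule(group):
    """(topic, design) pairs shown to one of the six counterbalancing groups."""
    # Rotating the design index by group number balances topic-design pairings.
    return [(topic, DESIGNS[(i + group) % 3]) for i, topic in enumerate(TOPICS)]

# Each participant sees exactly two graphs in each of the three designs...
for group in range(6):
    designs = [design for _, design in schedule(group)]
    assert all(designs.count(d) == 2 for d in DESIGNS)

# ...and across all six groups, every topic appears in every design equally.
pairings = Counter(pair for g in range(6) for pair in schedule(g))
assert all(count == 2 for count in pairings.values())
print("counterbalancing checks passed")
```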

Redraw Task. Participants saw the following prompt: “You are about to be shown a graph for 10 seconds. After the graph is displayed, you will be asked to redraw as much as you can remember about the graph. Please do not draw anything on the paper in front of you until after the graph has disappeared from the screen.” Participants were additionally presented with a scenario that provided context for the graph they were about to see. For example, before seeing any version of the Plants graph, they read the following scenario: “A local botanical garden sells seven different brands of flower seeds to the community. These different seed companies also sell all over the United States. Our local botanical garden wants to know how these brands are selling in our own store, compared to how they sell around the United States in general.” All of the scenarios are available in the Supplementary Materials. After the 10-second exposure, participants saw the prompt: “Please take 1-2 minutes to redraw whatever you can remember from the graph that you just saw on the piece of paper in front of you. Advance to the next page once you have redrawn the graph.” Participants then redrew as much of the graph as possible from memory. All drawings are available in the Supplementary Materials.

Free Response Conclusions. Participants wrote about the subject matter and conclusions that could be drawn from the graph. On the next page after redrawing the graph, they typed out the subject matter (“What was the subject matter of the graph?”) and their perceived conclusion of the data (“What conclusions can be drawn from the graph?”) in a free response text box. The subject matter responses were collected but are not reported because they were largely redundant with the more detailed responses provided for the conclusion. For example, in response to the Plants graph, the subject matter written by Subject 1 was: “how seed brands are selling in a local botanical store compared to the US,” while the conclusion response was: “which seeds are most popular locally, which seeds are most popular nationally, what seeds have poor sales locally and/or nationally.” The subject matter and conclusion data are available in the Supplementary Materials.

3.2.2 Part 2

Quantitative Evaluation. We next presented all three visualization designs of each graph topic simultaneously to each participant, and asked them to rate the three designs according to four Likert scales (1-5 range). This process was repeated for each of the six topics, for a total of 18 graphs (3 designs x 6 topics), in a counterbalanced order such that each topic appeared an equal number of times in each spot in the ordering. The four scales were:

• Aesthetics: “Overall, is this a visually appealing image?”, rated from one (“very hideous”) to five (“very beautiful”).


• Clarity: “Is it clear what information is being presented andwhy?”, rated from one (“I am utterly confused”) to five (“makesperfect sense”).

• Professionalism: “Does this graph look like something you would see in a professional environment?”, rated from one (“very unprofessional”) to five (“very professional”).

• Trustworthiness: “Based on the presentation, how trustworthy is the person who made this graph?”, rated from one (“very untrustworthy”) to five (“very trustworthy”).
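The counterbalancing constraint described above — each topic appearing an equal number of times in each serial position across participants — can be satisfied with a cyclic Latin square. The sketch below is a standard construction for illustration only (the topic names are from this study; this is not the authors' actual procedure):

```python
# Hypothetical sketch: build 6 presentation orders in which every topic
# occupies every serial position exactly once (a cyclic Latin square).
TOPICS = ["Car", "Holiday", "News", "Tires", "Prison", "Plants"]

def latin_square_orders(items):
    """Return len(items) orderings forming a Latin square:
    each item appears exactly once in each position across orderings."""
    n = len(items)
    return [[items[(start + i) % n] for i in range(n)] for start in range(n)]

orders = latin_square_orders(TOPICS)
# With 6 topics this yields 6 orderings; participants can be cycled
# through them so each topic is seen equally often in each spot.
```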

Qualitative Evaluation. Finally, we asked an open-ended question: “Please explain your reasoning for choosing these ratings. (1-2 sentences)” for each of the four scales for all 18 graphs.

4 PILOT EXPERIMENT

We first piloted this experiment (17 participants, all received course credit at Northwestern University) in order to measure statistical power for our quantitative measures and to refine our stimuli. The stimuli and procedure were similar to those described above, with the following exceptions. For stimuli, we realized that some graph topics in the pilot appeared to have insufficient contrast in the focus of their conclusions between the decluttered and focused designs (e.g., the Prison topic), and made edits to the stimuli to address this for the experiment. Focused designs initially omitted source information, which trustworthiness rating explanations revealed to be important to participants, so these were added for all graph topics and designs after the pilot. The Holiday example had confusingly inconsistent temporal bin sizes on the x-axis (e.g., some bars represented 2-week ranges, some months), so for the experiment we changed these labels to consistently present one-month intervals. Finally, the pilot originally used abstract data labels on the Plant and Tires topics, which were subsequently changed from “Segment 1, Segment 2, Segment 3, etc.” to “Fresh Blooms, Terra Nova, Plant Select, etc.” and from “Product 1, Product 2, Product 3, etc.” to “BF Goodrich, Bridgestone, Continental, etc.,” respectively.

For the procedure of the pilot, when participants rated graphs along the four scales, they saw each of the 18 total graphs (3 designs × 6 topics) individually, which we later decided would make it more difficult for participants to weigh the different designs for each topic simultaneously. Therefore, the experiment showed all three designs for each topic at the same time. We did not perform a formal analysis on the qualitative pilot data, as they were only intended to provide inspiration for refining the stimuli.

4.1 Pilot Results

An initial MANOVA examined the effect of graph version (cluttered, decluttered, and focused) on aesthetics, clarity, professionalism, and trustworthiness ratings of the visualizations. This test revealed a significant multivariate effect (Pillai’s value = 0.34) of design, after accounting for within-subject error. We conducted 999 simulations using Bonferroni adjustment methods with the R pairwise.perm.manova function [64] to obtain post-hoc pairwise comparisons between the three levels of design, with corrections for multiple testing. The test revealed that, overall, across all four dimensions, focused designs were rated significantly higher than decluttered designs (p = 0.003), which were rated significantly higher than cluttered designs (p = 0.003).

We entered the Pillai’s value into G*Power [65] and conducted a power analysis. In this experimental design with 3 groups of visualization design (cluttered, decluttered, and focused) and 4 response variables (aesthetics, clarity, professionalism, and trustworthiness), the pilot experiment achieved 90.4% power at an alpha level of 0.05. With an increased sample size of 20 participants (with 3 measures for each response variable), we would be able to obtain 95.29% power at an alpha level of 0.05. We only analyzed the rating data from the pilot in order to conduct a power analysis for Experiment 1.

5 EXPERIMENT RESULTS

Based on pilot rating effect sizes, we conducted the experiment with 24 new participants in the same Latin-Square design, such that all participants saw an equal number of all three groups of design (cluttered,

Fig. 3. Whether the redrawn visualization is coded as containing elements that are relevant (blue) or irrelevant (grey) to the pattern of data highlighted by the focused design.

Table 2. Does the redrawn graph show the relevant conclusion?

Topic     Relevant criterion
Car       Does the graph include the word “noise” (or a variation)?
Holiday   Are the first two months higher for women than men AND are the last two months lower for women than men?
News      Does the graph show “Internet” as increasing?
Tires     Do the lines converge on the right side of the graph?
Prison    Does the graph have both minor and major drug offenses labeled OR include a sentence about minor and major drug offenses?
Plants    Is the physical size OR percentage for the third, fourth, and fifth segments of US Population smaller than those of Our Customers?

decluttered, focused). Because paper author C.N.K. is an author of practitioner books that prescribe the methods tested here, she could be perceived as having a conflict of interest in the outcome of this study. Therefore, while she offered advice in the design of the stimuli and offered abstract contextualization advice on the design of qualitative coding schemes, she was purposely excluded from both the data analysis stage and initial drafts of the manuscript. Interactive visualizations of the data are available at http://visualthinking.psych.northwestern.edu/projects/DeclutterFocus/ (note to anonymous reviewers: the site does not track visitors).

5.1 Redraw Task Results

Our results for the redraw task are shown in Figure 3, which compares the percentage of relevant to irrelevant redraws for all three visualization designs (cluttered, decluttered, and focused) of all six graph topics. Qualitative coding of the contents of the redrawn graphs was performed by the first author and a second coder, who were blind to both the study design and the condition manipulations. Any discrepancies were resolved by a tiebreaker vote from the second author. Ratings were based on a rubric of questions, as displayed in Table 2. To calculate inter-rater reliability (IRR), we used Cohen’s Kappa for 2 raters. The IRR kappa value for the redrawn graphs was strong [66]: Kappa = 0.814 (z = 16, p < 0.001).
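For readers unfamiliar with the statistic, Cohen’s kappa for two raters compares observed agreement against the agreement expected by chance given each rater’s base rates. A minimal illustration on made-up binary codes (not the study’s data, which are in the Supplementary Materials):

```python
# Illustrative computation of Cohen's kappa for two raters on binary codes.
from collections import Counter

def cohens_kappa(rater1, rater2):
    assert len(rater1) == len(rater2)
    n = len(rater1)
    # Observed agreement: proportion of items both raters coded identically
    po = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement: expected overlap if raters coded independently
    pe = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n**2
    return (po - pe) / (1 - pe)

# Hypothetical codes from two raters for eight redrawn graphs
r1 = [1, 1, 0, 1, 0, 0, 1, 1]
r2 = [1, 1, 0, 1, 0, 1, 1, 1]
kappa = cohens_kappa(r1, r2)  # ≈ 0.714 for this toy example
```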

We explored whether the redrawn visualization was more likely to reflect the graph author’s intended message, as determined by the message that was featured most saliently in the focused graph condition, and operationalized as shown in Table 2. We created a binary ‘relevancy score’ based on whether or not the participants’ redrawn graphs contained at least one key element that displayed the main trend. For example, the relevancy score for the Car topic was only based on whether or not the graph included the word “noise.” The Holiday topic was the only topic that required two features to be present: “Are the first two months higher for women than men AND are the last two months lower for women than men?” This was because the annotation on the focused visualization concentrated on a higher proportion of


Fig. 4. Participants were more likely to recall the relevant conclusions from focused visualizations compared to cluttered and decluttered designs. Blue rectangles represent typed topics that were relevant to the focused design’s highlighted pattern; grey rectangles represent irrelevant ones. The rectangle width depicts the percentage of participants naming that topic. Across topics, focused designs led to more focus-relevant conclusions.

women shopping earlier in the year compared to men. To determine whether participants actually understood this conclusion, their graph had to demonstrate women shopping more in earlier months and men shopping more in later months. If participants did show the intended trend in their redrawn graph of a topic, the trial was coded as a 1; otherwise it was coded as a 0. We hypothesized that the redrawn visualizations would more successfully replicate the main messages illustrated in the focused visualizations, compared to the decluttered and cluttered visualizations.
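The binary relevancy scoring can be pictured as a set of per-topic predicates mirroring Table 2. The field names below are hypothetical stand-ins for the coders’ hand judgments about a redrawn graph; the actual coding was done manually against the rubric:

```python
# Sketch of Table 2's rubric as binary predicates over coder judgments.
# Dictionary keys are hypothetical; the study's coding was done by hand.
def relevancy_score(topic, drawing):
    rubric = {
        "Car":     lambda d: d["mentions_noise"],
        "Holiday": lambda d: d["women_higher_early"] and d["women_lower_late"],
        "News":    lambda d: d["internet_increasing"],
        "Tires":   lambda d: d["lines_converge_right"],
        "Prison":  lambda d: d["labels_minor_and_major"]
                             or d["sentence_minor_and_major"],
        "Plants":  lambda d: d["us_segments_3_5_smaller"],
    }
    return 1 if rubric[topic](drawing) else 0

# A Holiday redrawing showing only the early-month pattern scores 0,
# because Holiday is the one topic requiring both features.
score = relevancy_score(
    "Holiday", {"women_higher_early": True, "women_lower_late": False}
)
```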

We used a mixed-effect linear model to fit the relevance scores [67] under the three visualization designs. For fixed effects, in addition to visualization design, we also included graph topic and its interaction with graph design as predictors. We used a random intercept term accounting for individual differences as random effects to fit the Latin-Square design of the experiment.

The regression model indicated a moderate effect of visualization design, χ² = 21.20, partial η² = 0.15, p < 0.001. Post-hoc analysis with Tukey’s adjustments suggests that, overall, participants redrew visualizations with the most relevant information when they viewed visualizations with the focused design, compared to cluttered (Est = 0.75, p < 0.001) and decluttered (Est = 0.62, p = 0.007). There is no significant difference between relevance scores for the cluttered and decluttered designs (Est = 0.13, p = 0.81).

There is also an effect of topic, χ² = 38.31, partial η² = 0.17, p < 0.001, suggesting that participants more readily included critical information for some visualization topics than they did for others. However, the multiple comparisons of means using Tukey contrasts did not reveal significant differences between visualizations of different topics. There is no interaction between design and topic, χ² = 16.05, p = 0.10.

5.2 Free Response Conclusion Results

Figure 4 shows a comparison of the relevant and irrelevant conclusions written by participants for each of the three visualization designs (cluttered, decluttered, and focused). The free response conclusions were categorized similarly to the redrawn graphs. The coders devised a set of categories from the entire set of responses using open coding based on grounded theory [68], such that a new category was created for each response that did not fit into previous categories. These categories were created in order to ascertain the number and variety of conclusions participants could potentially draw from the graphs. All conclusions were placed into one or more categories based on certain keywords. For example, any conclusion that mentioned steering as a concern in the Car graph was categorized as “Steering”; if the conclusion additionally mentioned engine noise as a top concern, the conclusion would be categorized as “Noise, Steering.” For analysis of participants who mentioned more than one conclusion, if they stated a relevant conclusion, their response was analyzed as “relevant” even if they wrote additional irrelevant conclusions as well. After all of the categories were created, the coders determined which ones were “relevant” to the conclusions of the focused graphs based on the annotation on the graphs. The full set of categories had both relevant and irrelevant conclusions, though

Table 3. Relevant Conclusion Categories

Topic     Conclusion        Criteria
Car       Noise             Mentions engine noise as the top / one of the top concern(s)
Holiday   Men               Mentions that men shop more in later months (November and December)
Holiday   Women             Mentions that women shop more / earlier than men
News      Internet          Mentions that internet usage is increasing
Plant     Local / National  Mentions a difference between local and national sales
Prison    Minor / Major     Mentions both major and minor drug offenses
Tires     Competition       Mentions that tire prices are becoming more competitive
Tires     Convergence       Mentions that tire prices converge

we only used the relevant conclusions for our final analysis. Table 3 shows a subset of only the relevant categories. The full table of conclusion categories, as well as the categorization key, are available in the Supplementary Materials. The qualitative coding for the recalled conclusions was also done by the first author, who was blind to each trial’s conditions, and a second coder, who additionally was blind to the study design. Discrepancies were resolved for this task without a tiebreaker. To calculate inter-rater reliability (IRR), we used Cohen’s Kappa for 2 raters. The IRR kappa value for the recalled conclusions was strong [66]: Kappa = 0.88 (z = 32.5, p < 0.001).

We conducted a logistic general linear regression predicting the likelihood that participants would write a certain type of conclusion using: the conclusion type (relevant or irrelevant; whether the conclusion was relevant to the message of the original visualization shown to them), visualization design (cluttered, decluttered, and focused), and their interaction. Follow-up analysis of deviance using a Chi-square based ANOVA suggests that there is no significant main effect of design (p = 0.82), such that participants were equally likely to draw conclusions for each design, but a significant main effect of conclusion type (p = 5.66e−08), such that participants overall were more likely to mention relevant information than irrelevant information in their conclusions, and a significant interaction (p < 0.001). Note that Figure 4 shows more grey (irrelevant) than blue (relevant) – this is because each conclusion could have been placed into both relevant and irrelevant categories depending on its content. Although more irrelevant categories were mentioned overall, for each conclusion, participants were more likely to mention at least something that was coded as relevant. Close examination of the model reveals that these effects were driven by participants being more likely to write visualization-story-relevant conclusions in their free responses when the original visualization was a focused design. Compared to showing participants a cluttered visualization, showing them a focused visualization increased the likelihood of their writing a story-relevant conclusion afterwards during free recall by 2.96 fold (Est = 1.08, p = 0.006).
Showing participants a focused visualization also significantly increased the likelihood of writing a story-relevant conclusion, by 2.49 fold, compared to showing them a decluttered visualization (Est = 0.91, p = 0.017). There is no significant difference in the likelihood of writing a story-relevant conclusion during free recall between participants who viewed the cluttered and decluttered visualizations (Est = 0.17, OR = 1.19, p = 0.67).
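The reported fold-changes are the exponentiated logistic regression estimates (log-odds converted to odds ratios). A quick check of that conversion; the small discrepancies from the reported 2.96 and 2.49 reflect rounding of the published estimates:

```python
# Convert logistic-regression log-odds estimates into odds ratios.
import math

def odds_ratio(log_odds_estimate):
    """An estimate on the log-odds scale exponentiates to an odds ratio."""
    return math.exp(log_odds_estimate)

or_focused_vs_cluttered = odds_ratio(1.08)    # ≈ 2.94 (reported as 2.96)
or_focused_vs_decluttered = odds_ratio(0.91)  # ≈ 2.48 (reported as 2.49)
```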

5.3 Quantitative Ratings on Aesthetics, Clarity, Professionalism, and Trustworthiness

The average quantitative ratings for the three visualization designs across these four variables are depicted in Figure 5. We conducted a MANOVA examining the effect of visualization design on aesthetics, clarity, professionalism, and trustworthiness ratings of the visualizations. We additionally considered possible effects of visualization topic and an interaction between design and topic. One participant’s data was not analyzed because they did not complete this component due to a computer power failure, for a total of N = 23. After adjusting for within-subject error, the analysis reveals a significant multivariate effect of design (Pillai’s value = 0.57), a trending effect of topic (Pillai’s value = 0.08), and a significant interaction between design and topic (Pillai’s value = 0.28).

Fig. 5. Quantitative ratings for the three visualization designs on a scale from 1 to 5 on aesthetics, clarity, professionalism, and trustworthiness.

Table 4. Correlation and Variance Inflation Factors.

                  Aesthetics  Clarity  Prof.  Trust.  VIF
Aesthetics             1       0.60*   0.73*  0.59*   2.37
Clarity                         1      0.61*  0.52*   1.76
Professionalism                         1     0.69*   2.93
Trustworthiness                                1      2.00

Post-hoc analysis with Tukey’s adjustment using the lme4 package in R [69] suggests that, comparing focused and cluttered designs, the focused designs are rated significantly higher on visual aesthetic appeal (Est = 0.86, p = 0.005), trendingly higher on clarity (Est = 0.60, p = 0.08), significantly higher on professionalism (Est = 1.10, p < 0.001), and not significantly different on trustworthiness (Est = 0.22, p = 0.64). Comparing focused and decluttered designs, the focused design was rated trendingly higher on aesthetics (Est = 0.53, p = 0.06), trendingly higher on clarity (Est = 0.55, p = 0.06), and not significantly different on professionalism (Est = −0.01, p = 0.99) or trustworthiness (Est = −0.31, p = 0.31). Comparing decluttered and cluttered designs, they are not significantly different on aesthetics (Est = 0.33, p = 0.43) or clarity (Est = 0.05, p = 0.98), but decluttered is rated as significantly more professional (Est = 1.11, p < 0.001) and trendingly more trustworthy (Est = 0.53, p = 0.06). Overall, visualization design seems to have a relatively large effect size on all four rating dimensions (aesthetics, clarity, professionalism, and trustworthiness) above and beyond other factors such as topic, with a partial η² ranging from 0.326 to 0.475. Details are available in the Supplementary Materials.

The trending main effect of topic and significant interaction of topic and design were driven by the Prison visualization being rated higher than the Plants visualization on clarity (Est = −0.72, p = 0.043) and professionalism (Est = 0.63, p = 0.046), as well as the Tires story being rated higher on professionalism than the Car (Est = 0.90, p < 0.001), Holiday (Est = 0.82, p = 0.002), and Plants (Est = 1.05, p < 0.001) visualizations. Overall, the main effect of design persisted in the same pattern despite these interactions.

Our four ratings measures are not fully independent. For example, increasing the professionalism of a visualization may go hand-in-hand with making it more aesthetically pleasing. We therefore conducted collinearity diagnostics on these dependent variables in terms of variance inflation factors (VIF). A VIF greater than 1 would suggest some correlation between them. For example, aesthetics has a VIF of 2.37, which is 1.37 times bigger than what would be expected if it had no correlation with the other three dimensions. Table 4 shows the VIF and correlation between all four rating dimensions. All VIF values are below four, suggesting a small to moderate correlation between them. Professionalism seems to be the most highly correlated with the other three dimensions.
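As a sketch of the diagnostic itself: each VIF equals 1/(1 − R²) for one variable regressed on the others, which is the corresponding diagonal element of the inverse of the variables’ correlation matrix. Using the (rounded) correlations from Table 4, the diagonals will only approximate the published VIFs:

```python
# Recover VIFs as the diagonal of the inverse correlation matrix.
# Uses a small pure-Python Gauss-Jordan inverse to stay self-contained.
def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fine for a 4x4)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Rounded correlations from Table 4 (aesthetics, clarity, prof., trust.)
R = [
    [1.00, 0.60, 0.73, 0.59],
    [0.60, 1.00, 0.61, 0.52],
    [0.73, 0.61, 1.00, 0.69],
    [0.59, 0.52, 0.69, 1.00],
]
inv = invert(R)
vifs = [inv[i][i] for i in range(4)]  # VIF = 1 would mean no collinearity
```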

Fig. 6. Qualitative ratings for the three visualization designs, where the largest bubble represents 88% of participants coded as matching that theme. Each bubble represents a different graph topic, allowing the viewer to gauge the consistency of comment frequency across topics (mappings from topics to bubbles are available in the full dataset in the Supplementary Materials).

5.4 Qualitative Rating Results

In addition to the Likert scale ratings on the four dimensions above (aesthetics, clarity, professionalism, and trustworthiness), we also qualitatively coded the participants’ open-ended comments about the reasons for their ratings (their entry in a single text box could contain explanations of their ratings for one or all of the graph designs). The coding used the same four metrics as the quantitative ratings (Figure 6). Each of these metrics had several specific categories to evaluate the metric in more detail, including positive or negative sentiment (e.g., “too much color”). As above, one participant’s data was not analyzed because they did not complete this component due to a computer power failure, for a total of N = 23.

The open-ended comments were qualitatively coded by the second author. Similar to the free response conclusions, the comments were open coded, letting the codes arise from the open-ended comments. Codes were created as they were seen in the data, with new codes being created for concepts that were commonly mentioned. Based on these codes, four prominent categories of comments were decided: color, aesthetics, information, and emphasis. For each category, there were codes that fell into either positive or negative comments about the


category. For example, a common comment was that the participant thought that there were too many colors. This would have been coded as “too many colors” as well as the more general category of “dislikes color.” Another participant commented, “Left graph uses too many colors which is distracting and messy,” which was coded as “disliked color, too many colors, disliked aesthetics, cluttered.” After all of the codes had been defined, the second author made sure that all of the comments were coded according to all of the codes.

For reliability, a second coder who was blind to both the study design and the condition manipulations coded the open-ended comments based on the codes decided by the second author. Any discrepancies between the second author and the blind coder were discussed, and both coders agreed on a final decision. Inter-rater reliability (IRR) was calculated the same way as in the previous coding. For the open-ended comments for the ratings qualitative coding, our kappa was 0.811 (z = 83.5, p < 0.001), indicating strong agreement [66].

In order to make the qualitative rating responses alignable to the quantitative rating responses, we categorized each of the codes into the categories from the quantitative rating: aesthetics, clarity, professionalism, and trustworthiness. Several new codes were added (e.g., emphasis is untrustworthy) to address these new categories. The first author and the blind coder coded these new codes. Because of the combination of the quantitative rating categories and the qualitative open-ended comments, some of the categories have fewer instances of the code showing up in the data, such as the trustworthy category. This may suggest that participants did not consider trustworthiness as important as the other aspects of the graphs, since they mentioned it fewer times.

We conducted a regression analysis using a logistic general linear model with visualization design (cluttered, decluttered, focused) and sentiment (positive and negative), and their interactions, to predict the likelihood that participants would produce such a response, for each of the four dimensions (aesthetics, clarity, professionalism, and trustworthiness). Analysis of variance using Chi-square comparisons revealed an overall main effect of visualization design across all four dimensions: aesthetics (χ² = 48.71, p < 0.001), clarity (χ² = 57.61, p < 0.001), professionalism (χ² = 41.31, p < 0.001), and trustworthiness (χ² = 28.89, p < 0.001). For aesthetics, clarity, and professionalism ratings, there is also a significant main effect of sentiment and an interaction of sentiment and design (details can be found in the Supplementary Materials).

Post-hoc analysis with the logistic model reveals that for aesthetics, participants were significantly less likely to associate the cluttered design with a positive sentiment (Est = −1.15, p < 0.001, OR = 0.39). They were also less likely to associate decluttered (Est = −1.16, p < 0.001, OR = 0.31) and focused designs (Est = −1.16, p < 0.001, OR = 0.18) with negative sentiments. Many participants claimed that the cluttered graphs were “disorganized” and “visually unappealing.” They were significantly more likely to associate decluttered and focused designs with positive sentiment (Decluttered: Est = 1.39, p < 0.001; Focused: Est = 3.00, p < 0.001), with the cluttered design as the reference. For clarity, compared to the cluttered design, participants were significantly more likely to associate the focused design (Est = 1.61, p < 0.001, OR = 5.01) with positive sentiment. One participant noted this difference for clarity and mentioned that the “[cluttered] one is unclear and looks messy with all the labels and colors. The [focused] one is the best as it emphasizes the data for internet, which conveys the message clearly.” For professionalism, participants were significantly less likely to associate decluttered (Est = −2.93, p < 0.001, OR = 0.05) and focused designs (Est = −1.65, p < 0.001, OR = 0.19) with negative sentiments, and more likely to associate them with positive sentiments (Est = 1.83, p = 0.024, OR = 6.26; Est = 1.11, p = 0.034, OR = 3.03), compared to cluttered designs. Interestingly, some participants claimed that the cluttered graphs seemed to be made by children: “The middle [cluttered] graph looks very childish like something a middle schooler would make for a school project.” Very few participants mentioned trustworthiness-related attributes in their free responses, and thus whether participants associate different designs with positive or negative sentiments regarding trustworthiness remains inconclusive. Of note, some participants expressed that the focused graphs seemed

suspicious in the way that they pushed a single interpretation of the data: “The [focused] graph made the point of emphasis most clear, though took a hit in trustworthiness because the extra text made it seem like it had an agenda.” We found these unanticipated responses to be particularly interesting. Detailed statistical analysis can be found in the Supplementary Materials.

6 DISCUSSION

Participants were 2.5-3x more likely to recall a focus-relevant conclusion for focused designs relative to the non-focused designs, and were more likely to redraw the focused elements of the original visualization. They also found those designs more aesthetically appealing and clear, associating more positive sentiment with these visualizations and more negative sentiment with the cluttered visualizations. Across trustworthiness and professionalism, participants preferred decluttered or focused graphs to cluttered ones.

6.1 Redrawing and Free Response Discussion

Coding of the free response conclusions reveals how many different patterns a viewer might focus on when that focus is not guided by the visualization’s designer. Across all of the relevant and irrelevant categorized conclusions, there were 12 total conclusion categories that participants entered for the Car graph, 9 for the Holiday graph, 9 for the News graph, 10 for the Tires graph, 10 for the Prison graph, and 7 for the Plants graph (these categories are partially represented by the rectangles within the bars of Figure 4, though that visualization allows the viewer to see which categories are the same across designs; full data are available in the Supplementary Materials). This variety further validates the practitioner focus guideline, showing that different viewers will otherwise focus on many possible patterns of data.

Across redrawn graphs and free responses, there were no statistically significant interactions between design and topic, which is to be expected given that these combinations were manipulated between subjects (a single participant only saw one design for each topic). Yet, we pause for a moment to speculate on why some topics showed bigger focus effects than others.

For the Car topic, while ‘noise’ issues were the topic of focus, they were also present across three of the largest value bars. This pattern is therefore consistent not only with the strong focus on that topic across redrawings and conclusions, but also in the redrawings (less so the conclusions) of the cluttered and decluttered designs.

For the Holiday topic, there was only a small effect of focusing in the redrawings, possibly because of the complexity of the topic. It required the participant to notice (and draw or describe) women and men showing opposite patterns in some months compared to others.

The News topic showed only a small effect of the focus manipulation across both the redrawings and written conclusions. But this was curiously matched by the decluttered design, even though the decluttered design did not focus on the key pattern.

The Plants topic also showed only a small increase in focus on the focus-relevant topic, perhaps because even in the cluttered design, a salient set of diverging lines showed the same trend that was highlighted in the focused design.

The Prison topic showed a strong focus effect, with half of participants who viewed the focused design drawing the focused trend, and none doing so for drawings or free response conclusions in the non-focused designs. This may be due to the focused pattern (picking out two bars that both referred to drug offenses) being less intrinsically salient, leaving more room for the highlighting to guide viewers. In the absence of that guidance, most participants instead focused on ‘Immigration’ as the largest bar, and around half of the conclusions focused on Immigration being the most prominent cause of incarceration.

We also informally examined whether redrawn graphs included other features unrelated to the focus manipulation. We used a binary (Y/N) code for whether various features were present in the redrawn graphs to find the features that were most likely to be missing or present across each combination of design (cluttered, decluttered, and focused) and graph topic. For example, participants who saw the cluttered Car graph drew axis labels and arranged bars from largest to smallest more


than participants who saw either the decluttered or the focused version. Figure S1 in the Supplementary Materials displays each of the 18 graphs paired with a handpicked drawing that most reflects the features that tended to be present or missing for that combination. Each stimulus and associated redrawn graph is supplemented with annotations identifying the main changes and features. We intend this analysis to be illustrative and to be used as inspiration, and do not have sufficient data to make firm claims about these choices or trends.

6.2 Quantitative and Qualitative Ratings Discussion

Aesthetics and clarity ratings trended toward being higher for focused graphs compared to decluttered ones. Open-ended responses were consistent with these ratings, with many describing the decluttered graphs as “boring” or “vague.” Decluttered designs were rated higher on aesthetics and clarity compared to cluttered designs. The cluttered graphs often generated negative sentiments towards their excessive use of color. Professionalism ratings were similar for decluttered and focused graphs, and both were higher compared to cluttered. For trustworthiness, there was no significant difference across designs, though some qualitative comments indicated that the cluttered graphs seemed to be made by children or that the focused graphs had an agenda.

6.3 Limitations and Future Directions

One limitation of the present study is that the quantitative rating scales for aesthetics, clarity, professionalism, and trustworthiness were somewhat correlated with each other. Our scales were inspired by past work (aesthetics and clarity) [53, 54], with the addition of two new scales that we thought would be important to measure in organizational and business settings (professionalism and trustworthiness). All of the metrics correlated with each other at R values of 0.5-0.7, suggesting that these concepts are not fully dissociable. Particularly strong relationships that merit future consideration include professionalism’s correlation with both aesthetics and trustworthiness.

Another limitation is the scope of the tested population. The guidelines tested here are intended for visualizations placed in front of a wide variety of audiences, including people making presentations in organizations, educational settings, and the press. Our undergraduate participant population should be a reasonable model, but differences might arise for other audiences. A student might be more accustomed to being asked to browse through data for patterns they find interesting, but a busy executive might have less patience with a non-focused graph, demanding the ‘so what’ immediately.

Similarly, decluttered graphs led to some polarization, with some finding them boring, and with a few people finding the cluttered graphs to be ‘professional’ in a positive way, perhaps because they see those cluttered designs as more familiar (see [53, 54] for similar findings). Such counterintuitive preferences could be even more prevalent when testing a population of people in business organizations.

Another limitation of the present work is that the ‘declutter’ design manipulation included a large set of design changes that are typically prescribed in contemporary practitioner guides. While the results show no downside of decluttering, and a mild improvement in ratings, there are likely cases where our ‘clutter’ could be beneficial. For example, one recent blog post from a practitioner book author decried blanket rules against gridlines, which can be critical for helping people read values far from an axis [70]. A solution to this might be to empirically evaluate the effect of every declutter guideline element separately (gridlines, color variety, white space, …) for every conceivable task or measure (speed to pick a value, accuracy in picking a value, memory for a trend, …), presenting an imposingly large space of empirical tests. Given the existing work, we are now largely satisfied with the state of the empirical literature on the declutter guideline, and trust the designer community’s intuitions for guidance on such contextually dependent decisions – but stand ready to offer evaluation should they fail to reach a clear consensus.

7 CONCLUSIONS

Compared to the focus guideline, we found weaker but generally positive evidence for the benefits of the declutter guideline, with only a handful of people finding decluttered designs too simple, or diverging from the typical cluttered style that is familiar to them. We found strong evidence for the benefits of the combination of decluttered and focused designs, both for preference ratings and for its ability to point a viewer towards a single critical pattern. Without this manipulation, there was an impressive diversity of other patterns that participants found more salient.

We were surprised to find little existing empirical evidence in support of the power of the focus guideline. One might ask: why bother testing its power – doesn’t everyone know that highlighting and naming a pattern is important, because it will lead people to remember it? In contrast, despite the massive number of practitioner guides making this prescription, few people in the real world tend to do it. For the most common reader of this article – an academic – think of the rate at which people guide you to the patterns in their data across talks, posters, and papers; we suspect that number will be somewhere around 5%. The rest of the world – businesses, nonprofits, classrooms, and labs – would likely report a similarly low rate. One likely cause is the curse of expertise, where visualization designers assume that others see what they see in their data [58]. The designer may assume that the audience knows a similar amount, or even more, about a topic than they do, and will intuitively know what patterns to focus on. However, the designer is typically the person who has studied the topic and its context most deeply, putting them in a unique position to guide their audience.

There are also contexts where one might avoid the focus manipulation. The designer may not know what patterns are most important, and may want to show data to gain the audience’s perspective, or to initiate a group brainstorm. The technique may also feel heavy-handed at times – even some of our participants reported lower trust in the focused designs because they found them ‘pushy.’ There may be too many patterns to highlight, especially in a static document. We do not use the technique for Figures 3 and 4 in this paper simply because there are too many trends to highlight without creating clutter for the paper’s reader.

A final reason to test the power of the focus guideline is to provide empirical support for the guideline prescriptions of the community of book authors and workshop instructors who directly or indirectly train hundreds of thousands of people in data visualization design. This community is often asked for ‘proof’ that their guidelines work, especially by people with a ‘curse of knowledge’ that their visualizations are already perfect [58]. This proof is needed not just for the readers or attendees of sessions – an already receptive audience – but also for the colleagues and supervisors that they will need to convince the following week. We hope that the present results provide that evidence for the efficacy of the focus guideline in improving the audience’s reception to and memory of the data they want to communicate, the point they want to make, and the action they hope to inspire.

ACKNOWLEDGMENTS

The authors wish to thank Evan Anderson for assisting with blind qualitative coding, Evan Lee for digitizing redrawn graphs, and Caitlyn McColeman and Miriam Novack for assisting with the inter-rater reliability analysis. Thanks also to Nicole Jardine and Cristina Ceja for their helpful comments on this paper.

REFERENCES

[1] Ian Parker. Absolute powerpoint. The New Yorker, 28:76–87, 2001.

[2] RJ Andrews. Info We Trust: How to Inspire the World with Data. John Wiley & Sons, 2019.

[3] Nicolas P Rougier, Michael Droettboom, and Philip E Bourne. Ten simple rules for better figures. PLoS Comput Biol, 10(9):e1003833, 2014.

[4] Scott Berinato. Good charts: The HBR guide to making smarter, more persuasive data visualizations. Harvard Business Review Press, 2016.

[5] Alberto Cairo. The Functional Art: An introduction to information graphics and visualization. New Riders, 2012.

[6] Alberto Cairo. The truthful art: data, charts, and maps for communication. New Riders, 2016.

[7] Jorge Camoes. Data at work: Best practices for creating effective charts and information graphics in Microsoft Excel. New Riders, 2016.

[8] Nancy Duarte. Data story: explain data and inspire action through story. Ideapress Publishing, 2019.

[9] Nancy Duarte. Slide:ology: The art and science of creating great presentations, volume 1. O’Reilly Media, Sebastopol, CA, 2008.

[10] Brent Dykes. Effective data storytelling: how to drive change with data, narrative and visuals. John Wiley and Sons, Inc., 2020.

[11] Stephanie DH Evergreen. Effective data visualization: The right chart for the right data. Sage Publications, 2019.

[12] Stephanie DH Evergreen. Presenting data effectively: Communicating your findings for maximum impact. Sage Publications, 2017.

[13] Stephen Few. Now you see it: simple visualization techniques for quantitative analysis. Analytics Press, 2009.

[14] Stephen Few. Information dashboard design: The effective visual communication of data. O’Reilly Media, Inc., 2006.

[15] Ben Jones. Avoiding data pitfalls: how to steer clear of common blunders when working with data and presenting analysis and visualizations. Wiley, 2020.

[16] Ben Jones. Communicating Data with Tableau: Designing, Developing, and Delivering Data Visualizations. O’Reilly Media, Inc., 2014.

[17] Andy Kirk. Data visualisation: a handbook for data driven design. Sage, 2016.

[18] Cole Nussbaumer Knaflic and Catherine Madden. Storytelling with data: let’s practice! Wiley, 2020.

[19] Cole Nussbaumer Knaflic. Storytelling with data: A data visualization guide for business professionals. John Wiley & Sons, 2015.

[20] Andy Kriebel and Eva Murray. #MakeoverMonday: Improving how We Visualize and Analyze Data, One Chart at a Time. John Wiley & Sons, 2018.

[21] Manuel Lima. The book of trees: visualizing branches of knowledge. Princeton Architectural Press, 2014.

[22] Isabel Meirelles. Design for information: an introduction to the histories, theories, and best practices behind effective information visualizations. Rockport Publishers, 2013.

[23] Tamara Munzner. Visualization analysis and design. CRC Press, 2014.

[24] Daniel G Murray. Tableau your data!: fast and easy visual analysis with tableau software. John Wiley & Sons, 2013.

[25] Jonathan Schwabish. Better presentations: a guide for scholars, researchers, and wonks. Columbia University Press, 2016.

[26] Jonathan A Schwabish. Elevate the debate: A multilayered approach to communicating your research. John Wiley & Sons, 2020.

[27] Edward R Tufte and David Robins. Visual explanations. Graphics Press, Cheshire, CT, 1997.

[28] Edward R Tufte. The visual display of quantitative information, volume 2. Graphics Press, Cheshire, CT, 2001.

[29] Edward R Tufte, Nora Hillman Goeler, and Richard Benson. Envisioning information, volume 126. Graphics Press, Cheshire, CT, 1990.

[30] Sejal Vora. The Power of Data Storytelling. SAGE Publications India, 2019.

[31] Colin Ware. Information visualization: perception for design. Elsevier, 2012.

[32] Colin Ware. Visual thinking: For design. Elsevier, 2010.

[33] Steve Wexler, Jeffrey Shaffer, and Andy Cotgreave. The big book of dashboards: visualizing your data using real-world business scenarios. John Wiley & Sons, 2017.

[34] Dona M Wong. The Wall Street Journal guide to information graphics: The dos and don’ts of presenting data, facts, and figures. WW Norton & Company, New York, 2010.

[35] Nathan Yau. Data points: Visualization that means something. John Wiley & Sons, 2013.

[36] Jean-Luc Doumont. Trees, maps, and theorems. Principiae, Brussels, 2009.

[37] David C Van Essen, Charles H Anderson, and Daniel J Felleman. Information processing in the primate visual system: an integrated systems perspective. Science, 255(5043):419–423, 1992.

[38] Michael Correll and Michael Gleicher. Bad for data, good for the brain: Knowledge-first axioms for visualization design. In IEEE VIS 2014, 2014.

[39] Douglas J Gillan and Edward H Richman. Minimalism and the syntax of graphs. Human Factors, 36(4):619–644, 1994.

[40] Huiyang Li and Nadine Moacdieh. Is “chart junk” useful? An extended examination of visual embellishment. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, volume 58, pages 1516–1520. SAGE Publications, Los Angeles, CA, 2014.

[41] Robert D Helgeson and Robert A Moriarty. The effect of fill patterns on graphical interpretation and decision making. Technical report, Air Force Institute of Technology, Wright-Patterson AFB, OH, 1993.

[42] Michelle A Borkin, Azalea A Vo, Zoya Bylinskii, Phillip Isola, Shashank Sunkavalli, Aude Oliva, and Hanspeter Pfister. What makes a visualization memorable? IEEE Transactions on Visualization and Computer Graphics, 19(12):2306–2315, 2013.

[43] Steve Haroz, Robert Kosara, and Steven L Franconeri. Isotype visualization: Working memory, performance, and engagement with pictographs. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 1191–1200, 2015.

[44] Michelle A Borkin, Zoya Bylinskii, Nam Wook Kim, Constance May Bainbridge, Chelsea S Yeh, Daniel Borkin, Hanspeter Pfister, and Aude Oliva. Beyond memorability: Visualization recognition and recall. IEEE Transactions on Visualization and Computer Graphics, 22(1):519–528, 2015.

[45] Scott Bateman, Regan L Mandryk, Carl Gutwin, Aaron Genest, David McDine, and Christopher Brooks. Useful junk? The effects of visual embellishment on comprehension and memorability of charts. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 2573–2582, 2010.

[46] Yu-Sung Su. It’s easy to produce chartjunk using Microsoft Excel 2007 but hard to make good graphs. Computational Statistics & Data Analysis, 52(10):4594–4601, 2008.

[47] Stephen Kosslyn. Better PowerPoint: Quick Fixes Based On How Your Audience Thinks. Oxford University Press, 2010.

[48] Andy Kirk. Data Visualization: a successful design process. Packt Publishing Ltd, 2012.

[49] Stephen Few. Show me the numbers. Analytics Press, 2004.

[50] Anthony J Blasio and Ann M Bisantz. A comparison of the effects of data–ink ratio on performance with dynamic displays in a monitoring task. International Journal of Industrial Ergonomics, 30(2):89–101, 2002.

[51] Lyn Bartram and Maureen C Stone. Whisper, don’t scream: Grids and transparency. IEEE Transactions on Visualization and Computer Graphics, 17(10):1444–1458, 2010.

[52] Lyn Bartram, Billy Cheung, and Maureen Stone. The effect of colour and transparency on the perception of overlaid grids. IEEE Transactions on Visualization and Computer Graphics, 17(12):1942–1948, 2011.

[53] Stephen Hill, Barry Wray, and Christopher Sibona. Minimalism in data visualization: Perceptions of beauty, clarity, effectiveness, and simplicity. Journal of Information Systems Applied Research, 11(1):34, 2018.

[54] Ohad Inbar, Noam Tractinsky, and Joachim Meyer. Minimalism in information visualization: attitudes towards maximizing the data-ink ratio. In Proceedings of the 14th European Conference on Cognitive Ergonomics, pages 185–188, 2007.

[55] Danielle Albers Szafir, Steve Haroz, Michael Gleicher, and Steven Franconeri. Four types of ensemble coding in data visualizations. Journal of Vision, 16(5):11, 2016.

[56] Christine Nothelfer and Steven Franconeri. Measures of the benefit of direct encoding of data deltas for data pair relation perception. IEEE Transactions on Visualization and Computer Graphics, 26(1):311–320, 2019.

[57] Priti Shah and Eric G Freedman. Bar and line graph comprehension: An interaction of top-down and bottom-up processes. Topics in Cognitive Science, 3(3):560–578, 2011.

[58] Cindy Xiong, Lisanne van Weelden, and Steven Franconeri. The curse of knowledge in visual data communication. IEEE Transactions on Visualization and Computer Graphics, 2019.

[59] Robert Kosara and Jock Mackinlay. Storytelling: The next step for visualization. Computer, 46(5):44–50, 2013.

[60] Edward Segel and Jeffrey Heer. Narrative visualization: Telling stories with data. IEEE Transactions on Visualization and Computer Graphics, 16(6):1139–1148, 2010.

[61] Jeremy Boy, Francoise Detienne, and Jean-Daniel Fekete. Storytelling in information visualizations: Does it engage users to explore data? In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 1449–1458, 2015.

[62] Jessica Hullman and Nick Diakopoulos. Visualization rhetoric: Framing effects in narrative visualization. IEEE Transactions on Visualization and Computer Graphics, 17(12):2231–2240, 2011.

[63] storytelling with data. http://www.storytellingwithdata.com/. Accessed: 2020-04-21.

[64] RVAideMemoire. https://www.rdocumentation.org/packages/RVAideMemoire/versions/0.9-73/topics/pairwise.perm.manova. Accessed: 2020-04-15.

[65] Franz Faul, Edgar Erdfelder, Albert-Georg Lang, and Axel Buchner. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2):175–191, 2007.

[66] Mary L McHugh. Interrater reliability: the kappa statistic. Biochemia Medica, 22(3):276–282, 2012.

[67] Douglas Bates. Fitting linear mixed models in R. R News, 5(1):27–30, 2005.

[68] Anselm Strauss and Juliet Corbin. Grounded theory methodology. Handbook of Qualitative Research, 17:273–85, 1994.

[69] Douglas M Bates. lme4: Mixed-effects modeling with R, 2010.

[70] You probably need gridlines. https://stephanieevergreen.com/you-probably-need-gridlines/. Accessed: 2020-04-21.