seven dirty secrets of data visualisation | feature | .net magazine


2/26/13 5:51 PM | Seven dirty secrets of data visualisation | Feature | .net magazine
http://www.netmagazine.com/features/seven-dirty-secrets-data-visualisation

Seven dirty secrets of data visualisation

Data visualisation - and in particular, web-based data visualisation - is having its moment. JavaScript libraries like D3.js, Raphaël, and Paper.js, building on modern browser support for Canvas and SVG, have made it easier than ever to produce complex visualisations that, until recently, were the province of computer scientists and a handful of specialist designers.

Visualisation is the new 'must-have' element in project proposals and personal portfolios, and startups like Platfora, Datameer, and our own employers ClearStory Data and Chartio are raising millions for analytics platforms with browser-based visualisation interfaces.

To some extent, the buzz is justified. Data visualisation is a wonderful way of exploring data, finding new insights, and telling a compelling story. But what are the real challenges visualisation developers face - and what don't they want you to know about their work?

We'll lead you through some of the dirty secrets of the information visualisation (infovis) profession, offering an inside look at the process of visualisation development, along with practical tools and approaches for dealing with its inevitable challenges and frustrations.

Secret #1: Real data is ugly

Most data visualisation tutorials start with a pleasant fantasy: a pristine data set. Whether you’re learning to build a basic bar chart or a force-directed network graph, you’re presented with clean, normalised, well-formatted base data. This perfect JSON or CSV file is the digital analog of the neatly prepped mise en place in a televised cooking show: the refined result of tedious, painstaking work presented as raw ingredients. In practice, when dealing with most real-world data sets, expect to spend up to 80 per cent of your time finding, acquiring, loading, cleaning and transforming your data.

Some of this process can be done with automated tools, but almost any data cleaning involving two or more data sets will require some level of manual work. A wide variety of tools can convert XLS to XML or timestamps to other date formats, but nothing can automagically map one company’s internal sales categories to those of its competitors, or deal reliably with data entry typos, incompatible character encodings, or (shudder) poor OCR.
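Part of this work can be scripted, and part of it can't. The Python sketch below assumes rows arrive as dictionaries with hypothetical `date`, `category` and `amount` fields. It normalises several date formats automatically. The category mapping stays a hand-maintained table, because that is the manual step no tool can do for you. The script sets aside anything it cannot handle for a person to review rather than guessing. The format list and category labels are invented for illustration.

```python
from datetime import datetime

# Date formats (hypothetically) seen in the source files; extend as new ones turn up.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"]

# The manual part: a hand-maintained mapping from each source's category
# labels to one shared scheme. These labels are invented for illustration.
CATEGORY_MAP = {
    "hw": "Hardware",
    "hardware": "Hardware",
    "sw licences": "Software",
    "software": "Software",
}

def normalise_date(raw):
    """Return an ISO 8601 date string, or None if no known format matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None

def normalise_category(raw):
    """Map a source label to the shared scheme, or None if nobody has mapped it yet."""
    return CATEGORY_MAP.get(raw.strip().lower())

def clean_rows(rows):
    """Split rows into cleaned records and rows that need a human to look at them."""
    cleaned, needs_review = [], []
    for row in rows:
        date = normalise_date(row["date"])
        category = normalise_category(row["category"])
        if date is None or category is None:
            needs_review.append(row)  # don't guess: flag for manual review
        else:
            cleaned.append({"date": date, "category": category, "amount": row["amount"]})
    return cleaned, needs_review
```

Ambiguous formats are the catch. Under `%m/%d/%Y`, `03/04/2013` parses as 4 March, which may not be what a European source meant. The order of `DATE_FORMATS` is therefore itself a manual decision.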

Tools and strategies

- Budget significant time in any visualisation project for data cleanup. Increase your estimate (in some cases exponentially) for multiple data sources, manually entered or OCR data, divergent categorisation schemes, and non-standard formats.
- Google Refine is a great data cleanup workhorse, though it has limitations, particularly for non-tabular data. Other cleanup-specific tools include Data Wrangler and Mr. Data Converter. However, many tasks still require basic proficiency in a scripting language like Python or manual work in Excel. Save your scripts - you’ll use them again.
- Eat your own dog food if you can: visualisation is a great tool for identifying data problems. Use scatter plots and histograms to find and fix suspicious outliers.
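As a rough illustration of the last point, here is a minimal Python sketch with two parts. It prints a crude text histogram, and it flags values outside Tukey's fences, which lie 1.5 times the interquartile range beyond the quartiles. The threshold `k=1.5` is a common convention rather than a rule. The quartiles come from the standard library's `statistics.quantiles`, and other quartile methods give slightly different fences. Flagged values still need a human decision: some outliers are typos, others are the story.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def text_histogram(values, bins=5, width=30):
    """Print a crude histogram so gaps and stray values stand out; return bin counts."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1  # avoid zero-width bins when all values are equal
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / step), bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        start = lo + i * step
        print(f"{start:>10.1f} | {'#' * round(width * c / peak)} {c}")
    return counts
```

Try a column such as `[12, 14, 13, 15, 11, 14, 13, 1400]`. The histogram shows seven values bunched in the first bin and one stray value in the last. `tukey_outliers` returns `[1400]`, which looks like a data entry slip worth checking against the source.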

Secret #2: A bar chart is usually better

Compared to bar charts, bubble charts support more data points in less space, doughnut charts clearly indicate part-whole relationships, and treemaps support hierarchical categories - but none match simple bars for fine-grained comparison.

One of the first questions to ask when considering a potential visualisation design is “Why is this better than a bar chart?” If you’re visualising a single quantitative measure over a single categorical dimension, there is rarely a better option. Likewise, time-based data is usually best displayed on a line chart, and scatterplots are often best for exploring correlations between two linear measures. At the risk of sounding regressive, there are good reasons these charts have been in continuous use since the 18th century. Bar charts are one of the best tools available for facilitating visual comparisons, leveraging our innate ability to precisely compare side-by-side lengths.

The corollary to bar chart superiority, and perhaps the dirtiest secret in this article, is that the coolest-looking visualisations are often the least useful. The novelty and aesthetic appeal of custom visualisations come at a cost: the clarity of the data. Most bar chart alternatives ask the viewer to compare differences we have a harder time discerning: areas, angles, hues, or opacities. At best, such visualisations make comparison difficult; at worst, they distort the data entirely, leading viewers to false conclusions.

Tools and strategies

- Don’t dismiss traditional visualisation choices if they represent the best option for your data. Start with bar and line charts, and look further only when the data requires it.
- Have a good rationale for choosing other options. Compared to bar charts, bubble charts support more data points with a wider range of values; pies and doughnuts clearly indicate part-whole relationships; treemaps support hierarchical categories.
- Bar charts have the added bonus of being one of the easiest visualisations to make - you can hand-code an effective bar chart in HTML using nothing but CSS and minimal JavaScript, or make one in Excel with a single function.
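To illustrate how little a basic bar chart needs, the Python function below emits plain HTML with inline CSS and no JavaScript at all. It is a sketch, and `bar_chart_html` is a hypothetical name. Each bar's width is a percentage of the largest value, so every bar starts from zero and lengths can be compared directly. It assumes non-negative values.

```python
from html import escape

def bar_chart_html(data):
    """Render (label, value) pairs as horizontal bars using only HTML and inline CSS.

    Widths are a percentage of the largest value, so all bars share a zero
    baseline. Assumes non-negative values.
    """
    peak = max(value for _, value in data) or 1  # avoid dividing by zero if all are 0
    rows = []
    for label, value in data:
        width = 100 * value / peak
        rows.append(
            '<div style="display:flex;align-items:center;margin:2px 0">'
            f'<div style="flex:0 0 8em">{escape(label)}</div>'  # label column
            '<div style="flex:1">'  # track the bar's percentage width is measured against
            f'<div style="background:steelblue;height:1em;width:{width:.1f}%"></div>'
            "</div>"
            f'<div style="flex:0 0 4em;text-align:right">{value}</div>'  # value column
            "</div>"
        )
    return "\n".join(rows)
```

To view the chart, save the returned string inside an HTML page and open it in a browser. JavaScript becomes necessary only for interaction such as sorting or tooltips.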

Secret #3: There’s no substitute for real data

Cleaning and formatting a single data set is hard enough, but what if you’re building a live visualisation that will run with many different datasets? Maybe you have to build a visualisation for use in multiple departments within one company, where every department has its own database, and you don’t have time to manually clean each dataset. Your first instinct may be to grab some demo data and use that to build your visualisation; your visualisation library may even come with standard sample data.

Unfortunately, there is no substitute for real data. Demo data tends to have a normal distribution and a manageable number of records; it’s designed to show visualisations in their best light. A demo bar chart doesn’t just have the requisite bars; it looks like an ideal bar chart. Demo data doesn’t help you plan for data discrepancies, null values, outliers, or other real-world problems. If you rely too much on it, you may find when you plug in real data that your chosen visualisation isn’t well suited to that data in the first place.

Tools and strategies

- Ideally, use several random samples of real data if you cannot access an entire dataset.
- Invalid and missing data are guaranteed. If your data won’t be cleaned before being graphed, do not clean your sample data.
- Real data may be so large as to overwhelm your visualisation or the system generating it. If you develop against a sample, be sure to scale the sample size up (or down) appropriately before creating a final visualisation.
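When a dataset is too big to load in full, reservoir sampling is one standard way to draw a uniform random sample in a single pass. The Python sketch below implements Algorithm R. In line with the advice above, it leaves every record untouched, nulls and junk included. The `missing_share` helper is a hypothetical addition for checking how much missing data a sample carries.

```python
import random

def reservoir_sample(records, k, seed=None):
    """Draw up to k records uniformly at random from an iterable of unknown length.

    Algorithm R: the full dataset never has to fit in memory, and records are
    kept exactly as they arrive (no cleaning).
    """
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = record  # keep this record with probability k / (i + 1)
    return sample

def missing_share(sample, field):
    """Fraction of sampled dict records where `field` is absent, None or an empty string."""
    if not sample:
        return 0.0
    missing = sum(1 for r in sample if r.get(field) in (None, ""))
    return missing / len(sample)
```

Drawing a few samples with different seeds, for instance `[reservoir_sample(rows, 500, seed=s) for s in range(3)]`, is a cheap way to test a layout. A design that works on one sample may break on another.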

Secret #4: The devil is in the details

Laying out labels horizontally can quickly lead to crowding and illegible text (top). Rotating labels 90 degrees improves legibility, but takes away significant space from the visualisation. Finding a truncated or abbreviated label format is one approach, but won't work for every data set.

Designing the labels, legends and axes for your visualisation is often an afterthought. But these elements are crucial to the visualisation, and can be difficult and time-consuming to get right, especially when you can’t predict the data ahead of time.

When laying out your visualisation, leave significant rendering space for any additional marks you may need, often including relatively wide margins around the graphical part of your visualisation. Space axis labels so that they do not occlude each other and are easily readable, rotating or repositioning them if necessary. If a particular area is overcrowded with labels that you need for clarity, consider moving the labels farther from the elements they reference and connecting each one to its element with a line. Another technique is to group crowded labels together in a single tooltip-like group. Consider the space you’ve allowed and the length of the longest labels. If the labels won’t fit, you might need to shorten them with ellipses, or simply truncate the text at a fixed length.
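Truncation is easy to get subtly wrong, so here is a small Python sketch. `fit_axis_labels` divides the axis into equal slots and estimates how many characters fit in each. The estimate rests on an assumed average glyph width (`char_width_px`, 7 pixels by default). Real text measurement in the browser will be more accurate, so treat the numbers as placeholders. Both function names are hypothetical.

```python
def truncate_label(text, max_chars, ellipsis="…"):
    """Shorten text to at most max_chars characters, ending in an ellipsis if cut."""
    if len(text) <= max_chars:
        return text
    if max_chars <= len(ellipsis):
        return text[:max_chars]  # no room for the ellipsis: hard-cut instead
    # Trim trailing spaces so we never produce "New …"
    return text[: max_chars - len(ellipsis)].rstrip() + ellipsis

def fit_axis_labels(labels, axis_width_px, char_width_px=7.0, gap_px=4.0):
    """Choose between leaving categorical axis labels as-is or truncating them.

    Returns ("horizontal", labels) when every label fits its slot, otherwise
    ("truncated", shortened_labels). char_width_px is an assumed average glyph width.
    """
    if not labels:
        return "horizontal", []
    slot_px = axis_width_px / len(labels)  # equal slot per category
    max_chars = max(1, int((slot_px - gap_px) // char_width_px))
    if all(len(label) <= max_chars for label in labels):
        return "horizontal", list(labels)
    return "truncated", [truncate_label(label, max_chars) for label in labels]
```

For example, `fit_axis_labels(["Northern Territory", "Queensland"], 120)` leaves 60 pixels per label, which is room for about eight characters. It returns `("truncated", ["Norther…", "Queensl…"])`. Keeping the full text in a tooltip lets readers recover what was cut.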

Similarly, legends require advance planning to render well. One easy option is to reserve some space for the legend to one side of the graphic. Unfortunately, this means reducing the size of the graphical portion of your visualisation. To save space, you may be able to place the legend in an empty part of the graphic, or make the legend draggable so the viewer can access any graphics underneath.
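Finding "an empty part of the graphic" can be automated crudely. The Python sketch below assumes points are already in pixel coordinates with the origin at the top left, as in SVG. It counts how many points a legend-sized box would cover in each corner of the plot and picks the corner with the fewest. It ignores lines, areas and labels, so it is a starting heuristic, not a layout engine. `emptiest_corner` is a hypothetical name.

```python
def emptiest_corner(points, width, height, legend_w, legend_h):
    """Choose the plot corner whose legend-sized box covers the fewest data points.

    points: (x, y) pixel positions, origin at the top left (as in SVG).
    Returns "top-left", "top-right", "bottom-left" or "bottom-right".
    """
    # Top-left origin of a legend box placed in each corner
    corners = {
        "top-left": (0, 0),
        "top-right": (width - legend_w, 0),
        "bottom-left": (0, height - legend_h),
        "bottom-right": (width - legend_w, height - legend_h),
    }

    def covered(origin):
        x0, y0 = origin
        return sum(1 for x, y in points
                   if x0 <= x <= x0 + legend_w and y0 <= y <= y0 + legend_h)

    # min() returns the first corner among ties, so dictionary order is the preference
    return min(corners, key=lambda name: covered(corners[name]))
```

Because `min` returns the first of several equally good corners, the order of the `corners` dictionary acts as a tie-breaking preference. Put your preferred default first.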

Tools and strategies

- Plan space around your graphic for labels, axes and legends.
- Designate a maximum character length for labels, truncating if needed to prevent crowding. Group nearby labels together, revealing them in response to user actions.
- Consider scrolling or accordion-style expansion for long legends.
- Whatever you do, don’t leave these elements out. Labels may seem like a secondary concern when you’re focused on the graphic elements, but they are incredibly important to your viewers.
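The grouping idea can be sketched in a few lines of Python. Labels are sorted by position along one axis. Any label that sits less than `min_gap` pixels after the previous one joins that label's group. A multi-label group could then be drawn as a single marker whose labels appear on hover or click. The function name and this chaining rule are assumptions for illustration.

```python
def group_close_labels(items, min_gap):
    """Group labels whose anchor positions are closer than min_gap pixels.

    items: (position_px, label) pairs along one axis.
    Returns a list of label groups in position order.
    """
    groups = []
    for pos, label in sorted(items):
        # Compare with the last label placed, i.e. the final item of the last group
        if groups and pos - groups[-1][-1][0] < min_gap:
            groups[-1].append((pos, label))
        else:
            groups.append([(pos, label)])
    return [[label for _, label in group] for group in groups]
```

With `min_gap=10`, labels at 10, 14, 40, 43 and 80 pixels fall into three groups: the first two, the next two, and the last on its own. Chaining can merge a long run of closely spaced labels into one very large group, so capping group size is a reasonable refinement.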

Secret #5: Animate only when appropriate

As a visualisation author, it’s often tempting to add animations to your final product. Animations are a powerful way of connecting data to changes in state and trends. However, animations can also lead to confusing or misleading interpretations of your data. Plan carefully for how animation will affect your entire output, rather than simply adding it at the end of your work. Animations work best when they reveal data relationships: showing how data groups together between different states, how the data changes over time, or how data points are directly related.

In general, make your animations simple, predictable and replayable. Allow users to view the animation multiple times so they can track where objects start and end. Avoid letting other objects occlude objects in transition, which makes them harder to track, and do not move objects along unpredictable paths. For complex animations, research suggests that viewers’ comprehension improves when the animation is broken into simple 'staged' transitions. A stage pauses the animation with the objects in a transitional state, giving the viewer a moment to take in the state of each object.
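One way to express a staged transition is to animate one group of attributes at a time, moving each object in a straight line. The animation then holds still before the next stage begins. The Python sketch below produces the intermediate frames for that schedule. It does not depend on any particular renderer and uses linear interpolation. Several details are illustrative assumptions rather than a prescribed method: the function name, the frame counts, and the idea of staging by attribute (for example, re-sorting along x before changing values along y).

```python
def staged_frames(start, end, stages, frames_per_stage=10, pause_frames=5):
    """Build animation frames that change one group of attributes per stage.

    start, end: lists of dicts such as {"x": 0.0, "y": 0.0}, matched by index.
    stages: attribute groups to animate in order, e.g. [("x",), ("y",)].
    Within a stage each object moves in a straight line (linear interpolation)
    while attributes outside the stage stay fixed; after each stage the current
    state is repeated for pause_frames frames as a hold.
    """
    current = [dict(p) for p in start]  # working copy; inputs are not mutated
    frames = []
    for attrs in stages:
        begin = [dict(p) for p in current]  # state at the start of this stage
        for f in range(1, frames_per_stage + 1):
            t = f / frames_per_stage
            for cur, b, e in zip(current, begin, end):
                for a in attrs:
                    cur[a] = b[a] + (e[a] - b[a]) * t
            frames.append([dict(p) for p in current])
        # Hold the finished stage so viewers can take stock before the next one
        frames.extend([dict(p) for p in current] for _ in range(pause_frames))
    return frames
```

Returning frames instead of drawing them keeps the animation easy to replay, because a renderer can step through the list as often as the viewer asks.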

Tools and strategies

- Strive to make your animations as simple as possible.
- Consider staged animations when an animation is either complex or has many transitioning objects.
- Flashy animations are often entertaining at first, but quickly become frustrating to the viewer. Do not add animation just because you can.

Secret #6: Visualisation is not analysis

It's a central tenet of the field that data visualisation can yield meaningful insight. While there’s a great deal of truth to this, it’s important to remember that visualisation is a tool to aid analysis, not a substitute for analytical skill. It’s also not a substitute for statistics: your chart may highlight differences or correlations between data points, but drawing reliable conclusions from these insights often requires a more rigorous statistical approach. (The reverse can also be true - as Anscombe’s Quartet demonstrates, visualisations can reveal differences that summary statistics hide.) Really understanding your data generally requires a combination of analytical skills, domain expertise, and effort. Don’t expect your visualisations to do this work for you, and make sure you manage the expectations of your clients and your CEO when creating or commissioning visualisations.
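Anscombe's Quartet (Francis Anscombe, 1973) consists of four small datasets with almost identical summary statistics that look completely different when plotted. The first is a noisy linear trend and the second a clean curve. The third is a straight line with one outlier. The fourth is a vertical stack of points plus a single influential point. The Python sketch below recomputes the shared statistics from the published values using only the standard library.

```python
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973). Sets I-III share the same x values.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summary(xs, ys):
    """Means, sample variances, Pearson r and the least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    # Sample covariance (n - 1 denominator, matching statistics.variance)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    vx, vy = variance(xs, mx), variance(ys, my)
    r = sxy / (vx * vy) ** 0.5
    slope = sxy / vx
    return {"mean_x": mx, "mean_y": my, "var_x": vx, "var_y": vy,
            "r": r, "slope": slope, "intercept": my - slope * mx}

for name, (xs, ys) in QUARTET.items():
    s = summary(xs, ys)
    print(name, {k: round(v, 2) for k, v in s.items()})
```

All four sets print the same numbers to two decimal places: mean x 9, variance of x 11, mean y about 7.50, and a variance of y of about 4.1. They also share a correlation of about 0.82 and a fitted line of roughly y = 3.00 + 0.50x. Only a chart reveals that the second set is curved and that the fourth set's fit rests on a single point.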

Tools and strategies

- Unless you’re a data analyst, be very careful about promising real insight. Consider working with a statistician or a domain expert if you need to offer reliable conclusions.
- Small design decisions - the colour palette you use, or how you represent a particular variable - can skew the conclusions a visualisation suggests. If you’re using visualisations for analysis, try a variety of options, rather than relying on a single view.
- Stephen Few’s Now You See It offers a good practical introduction to using visualisation for business analysis, including suggestions for developers on how to design analytically valid visualisation tools.

Secret #7: Data visualisation takes more than code

The range of libraries and tutorials now available makes it easier than ever to produce production-quality web-based visualisations without specialised expertise. But creating visualisations that offer real insight or tell a compelling story still requires a wide range of skills beyond coding, including graphic design, data analysis, and an understanding of interaction design and human perception. No library or technology can substitute for knowing what you’re doing.

But the flip side of this secret is that you don’t need to know that much - especially if you use well-established visualisations and interaction principles. Learn enough about the field to avoid newbie mistakes (always zero-base your bar charts and never set a circle radius with a linear scale), keep things simple (no 3D, limited animation, no drop shadows), base your work on solid examples, and you can create great visualisations.
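The two rules in parentheses are easy to encode. The Python sketch below has two parts. It uses a square-root scale so that a circle's area, not its radius, is proportional to the value. It also computes a bar-axis domain that always includes zero. D3 provides a square-root scale for the same purpose (`d3.scaleSqrt` in recent versions). The function names here are hypothetical.

```python
import math

def radius_scale(max_value, max_radius):
    """Return a function mapping a non-negative value to a circle radius.

    Square-root scale: area = pi * r**2 = pi * max_radius**2 * value / max_value,
    so circle area (what viewers actually compare) is proportional to the value.
    """
    def radius(value):
        return max_radius * math.sqrt(value / max_value)
    return radius

def bar_domain(values):
    """Value-axis domain for a bar chart that always includes zero."""
    return (min(0, min(values)), max(0, max(values)))
```

With `radius_scale(100, 50)`, a value of 25 gets a radius of 25 and a value of 100 a radius of 50. The larger circle therefore has four times the area, matching the fourfold difference in value. A linear radius scale would draw the 25 at radius 12.5, making the larger circle look sixteen times bigger.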