Data Visualization in Sociology

Kieran Healy and James Moody
Sociology Department, Duke University, Durham, North Carolina 27708; email: [email protected], [email protected]

Annu. Rev. Sociol. 2014. 40:105–28
First published online as a Review in Advance on June 6, 2014
The Annual Review of Sociology is online at soc.annualreviews.org
This article’s doi: 10.1146/annurev-soc-071312-145551
Copyright © 2014 by Annual Reviews. All rights reserved

Keywords
visualization, statistics, methods, exploratory data analysis

Abstract
Visualizing data is central to social scientific work. Despite a promising early beginning, sociology has lagged in the use of visual tools. We review the history and current state of visualization in sociology. Using examples throughout, we discuss recent developments in ways of seeing raw data and presenting the results of statistical modeling. We make a general distinction between those methods and tools designed to help explore data sets and those designed to help present results to others. We argue that recent advances should be seen as part of a broader shift toward easier sharing of code and data both between researchers and with wider publics, and we encourage practitioners and publishers to work toward a higher and more consistent standard for the graphical display of sociological insights.

Annu. Rev. Sociol. 2014.40:105-128. Downloaded from www.annualreviews.org. Access provided by Duke University on 08/09/17. For personal use only.


INTRODUCTION

From the mind’s eye to the Hubble telescope, visualization is a central feature of discovery, understanding, and communication in science. There are many different ways to see. Visual tools range from false-color photographs of telescopic images in astronomy to reconstructions of prehistoric creatures in paleontology. In the statistical sciences, images are often more abstract than models of fighting dinosaurs, depending as they must on conventions that link size, value, texture, color, orientation, or shape to quantities (Bertin 1967 [2010]). But statistical visualizations are nonetheless critical to promoting science. One need only think of the now iconic hockey-stick diagram of earth temperature for a clear case (Mann et al. 1999). Despite its ubiquity in most of the natural sciences, visualization often remains an afterthought in sociology.

In this article, we review the history and current state of data visualization in sociology. Our aim is to encourage sociologists to use these methods effectively across the research and publication process. We begin with a brief history, then present an overview of the theory of graphical presentation. The bulk of our review is organized around the uses of visualization in first the exploration and then the presentation of data, with exemplars of good practice. We also discuss workflow and software issues and the question of whether better visualization can make sociological research more accessible.

SOCIOLOGY LAGS

First, why are statistical visualizations so common in other fields and rare in sociology? Although model summaries offer exacting precision in expressing particular quantities (such as the slope of a line through data points), getting a sense of multiple patterns simultaneously is typically easier visually. The point is made forcefully by Anscombe’s (1973) famous quartet, reproduced in Figure 1a. Each data set contains 11 observations on two variables. The basic statistical properties of each data set are almost identical, up to and including their bivariate regression lines. But when visualized as a scatterplot, the differences are readily apparent (see also Chatterjee & Firat 2007). Lest we think such features are confined to carefully constructed examples, consider Jackman’s (1980) intervention in a debate between Hewitt (1977) and Stack (1979) over a critical test of Lenski’s (1966) theory of inequality and politics, reproduced in Figure 1b. The argument is won at a glance, as the figure shows that the seemingly strong negative association between voter turnout and income inequality depends entirely on the inclusion of South Africa in the sample.
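The claim about the quartet can be checked numerically. What follows is a minimal Python sketch and not part of the original article: the values are transcribed from Anscombe's (1973) published data sets, and the names `QUARTET`, `summarize`, and `SUMMARIES` are hypothetical choices made for illustration. It uses only the standard library and computes, for each data set, the mean of y, the least-squares line, and the correlation.

```python
from math import sqrt

# Anscombe's (1973) four data sets; sets 1-3 share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return n, mean of y, OLS intercept and slope, and Pearson's r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "n": n,
        "mean_y": mean_y,
        "intercept": mean_y - slope * mean_x,
        "slope": slope,
        "r": sxy / sqrt(sxx * syy),
    }


SUMMARIES = {key: summarize(x, y) for key, (x, y) in QUARTET.items()}

if __name__ == "__main__":
    for key, s in SUMMARIES.items():
        print(f"set {key}: n={s['n']} mean_y={s['mean_y']:.2f} "
              f"y = {s['intercept']:.2f} + {s['slope']:.2f}x  r={s['r']:.2f}")
```

The four printed rows agree to two decimal places, which is exactly why the summaries alone cannot distinguish the data sets and a scatterplot can.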

Given the power of statistical visualization, then, it is puzzling that quantitative sociology is so often practiced without visual referents. One need only compare a recent issue of the American Sociological Review or the American Journal of Sociology to Science, Nature, or the Proceedings of the National Academy of Sciences to see the radical difference in visual acuity. It is common for the premier journals in sociology to publish articles with many tables, but no figures. The opposite is true in the premier natural science journals. There, a key figure is often the heart of the article. In Nature, for example, the online table of contents includes a thumbnail of the central figure to serve as the link to the rest of the paper.

Figure 1
Visualizations reveal model summary failures: (a) Anscombe’s quartet (1973) shows how statistically identical data sets can look very different; (b) visualization from Jackman (1980) decisively demonstrates the influence of outlying data points in an analysis. Panel a plots y values against x values for data sets 1 to 4, with the note: for all panels, N = 11; mean = 7.5; regression: Y = 3 + 0.5(X); r = 0.82; SE of slope estimate: 0.118, t = 4.24; sum of squares (X − X̄): 100. Panel b labels South Africa and shows the bivariate slope including South Africa (N = 18) and excluding South Africa (N = 17).

It has not always been so. Early in the history of the discipline, data visualizations were common and not appreciably out of step with the wider scientific community. Exemplars of bar charts (Hart 1896), line graphs (Marro 1899), parametric density plots and dot plots with standard errors (Chapin 1924), scatterplots (Sletto 1936), and social network diagrams (Lundberg & Steele 1938) are easy to find in early sociological journal articles. Du Bois’s (1898 [1967]) The Philadelphia Negro is filled with innovative visualizations, including choropleth maps, table-and-histogram combinations, time series, and others. But somewhere along the line sociology became a field where sophisticated statistical models were almost invariably represented by dense tables of variables along rows and model numbers along columns. Though they may signal scientific rigor, such tables can easily be substantively indecipherable to most readers and perhaps at times even to authors. The reasons for this are beyond the scope of this review, although several possibly complementary hypotheses suggest themselves. First, to the extent that graphical imagery was thought of as descriptive, statistical images may have been collateral damage in the war between causal-inferential modeling and descriptive reportage. Second, figures may have seemed unsophisticated. The very clarity of a (good) figure made the work seem too simple. Third, and more charitably, visualization in sociology might have been a victim of the field’s relatively rapid embrace of quantitative methods. American sociology adopted sophisticated modeling techniques quite early compared with other social sciences. The range and variety of its research questions and data sources meant that the statistical tool kit in sociology in the late 1960s and into the 1970s was more varied than in economics or psychology at the time and much more developed than what was then current in political science. But this was also a period when the visualization tools of statistical software lagged well behind their strictly computational abilities. Conventions of data presentation may have standardized at a time when the possibilities for visualization were narrower. Finally, some of the resistance to figures may have come from the fact that the tables in early journal articles and monographs often contained actual data rather than summaries or model results. In a review of a history of graphical methods in statistics written in 1938, John Maynard Keynes remarked that he wished the author

could have added a warning, supported by horrid examples, of the evils of the graphical method unsupported by tables of figures. Both for accurate understanding, and particularly to facilitate the use of the same material by other people, it is essential that graphs should not be published by themselves, but only when supported by the tables which lead up to them. It would be an exceedingly good rule to forbid in any scientific periodical the publication of graphs unsupported by tables. (Keynes 1938, p. 282, emphasis added)

To speak anachronistically, here Keynes is arguing that economists need the underlying data along with the visual summary for the sake of reproducibility. We are now at a point when the volume of data used in a typical quantitative article far exceeds what can be presented in a series of tables. But Keynes’s point is worth bearing in mind. The utility of visualization methods (in particular their ability to effectively summarize large quantities of data or sophisticated modeling techniques) is partly dependent on related advances in our ability to easily share data and reproduce analyses. If data are accessible as needed, using figures instead of tables becomes much easier. Not coincidentally, this is another area where sociology has lagged behind other social sciences (Freese 2007).

Whatever their relative importance, the net result of these processes for sociology has been a training and publication standard that rarely includes graphical treatments of statistics. New students are typically not taught to think about graphics and statistics in a consistent, coherent way.

Our argument is not that sociologists should be producing more visualizations just because everyone else is doing it. Indeed, as we discuss below, there is considerable debate about what sort of visual work is most effective, when it can be superfluous, and how it can at times be misleading to researchers and audiences alike. Just like sober and authoritative tables, data visualizations have their own rhetoric of plausibility. Anscombe’s quartet notwithstanding, summary statistics and modeling can be thought of as tools that deliberately simplify data to let us see past the cloud of data points. We do not think visualization will give us the right answer simply by looking. Rather, we should think about how visualization might be more effectively integrated into all stages of our work. Software now makes routinely generating figures easier than ever. Even if many disciplinary journals still lag in their editorial desire or ability to present good data visualizations, we argue that it is time for these methods to be fully integrated into sociology’s research process.

VISUALIZATION IN PRINCIPLE

Book-length treatments of good statistical visualization practice abound. Their content ranges from the more theoretical (emphasizing, for instance, the nature and origins of visual conventions) to more pragmatic collections of current best practices meant to serve as an inspiration to practitioners. In between are efforts to codify practice and develop taste, and guides to working implementations. The most influential general treatments are probably Bertin’s (1967 [2010]) Semiology of Graphics, Cleveland’s The Elements of Graphing Data (1994) and Visualizing Data (1993), and Wilkinson’s (1995 [2005]) The Grammar of Graphics. Overviews of contemporary practice can be had in Few (2009, 2012) and Yau (2012). There are also several books based specifically on visualization techniques within a particular software program, such as Friendly (2000) for SAS, Mitchell (2012) for Stata, Murrell (2011) for R, and Kleinman & Horton (2013) for comparisons of multiple programs. Sometimes the graphical capabilities of particular software applications are loosely related to the more theoretical work, taking from them a concern with aesthetic principles and possibly specific sorts of plots. In other cases, the linkage is closer. Sarkar (2008) describes a data visualization package for R that closely follows Cleveland’s ideas (and some earlier associated software), and Wickham (2009, 2010) describes a software package for R that implements and extends principles worked out in Wilkinson’s (1995 [2005]) The Grammar of Graphics.

The conceptual literature is deep and comprehensive, although its representatives do not always speak in one voice. This is to be expected in an area where theoretical development involves judgments of taste. The best-known critic and tastemaker by far in the field is Edward R. Tufte. It is fair to say that The Visual Display of Quantitative Information (Tufte 1983) is a classic in the field, and its three follow-up texts are also widely read (Tufte 1990, 1997, 2006). Described as “self-exemplifying” (Tufte 2006, p. 10), the bulk of the work is a series of negative and positive examples with more general principles (or rules of thumb) extracted from them rather than a direct guide to practice, akin more to a reference book on ingredients than to a cookbook for daily use in the kitchen. At the same time, Tufte’s early work in political science shows that he applied his ideas well before codifying them in this way. His Political Control of the Economy (Tufte 1978) combines data tables, figures, and text in a manner that remains remarkably fresh almost 40 years later.

Across his work, Tufte preaches a consistent set of principles, though they vary in their degree of specificity. Thus,

Graphical excellence is the well-designed presentation of interesting data, a matter of substance, of statistics, and of design. . . . [It] consists of complex ideas communicated with clarity, precision, and efficiency. . . . [It] is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space. . . . [It] is nearly always multivariate. . . . And graphical excellence requires telling the truth about the data. (Tufte 1983, p. 51)

Tufte illustrates the point with Charles Joseph Minard’s famous visualization of Napoleon’s march on Moscow, reproduced in Figure 2. He remarks that this image “may well be the best statistical graphic ever drawn,” and argues that it “tells a rich, coherent story with its multivariate data, far more enlightening than just a single number bouncing along over time. Six variables are plotted: the size of the army, its location on a two-dimensional surface, direction of the army’s movement, and temperature on various dates during the retreat from Moscow” (Tufte 1983, p. 40). It is worth noting how different Minard’s image is from most contemporary statistical graphics. Until recently, these have tended to be generalizations of the scatterplot or barplot, either in the direction of seeing more data or seeing the output of models. The former looks for ways to increase the volume of data visible, the number of variables displayed within a panel, or the number of panels displayed within a plot. The latter looks for ways to see results of models: point estimates, confidence ranges, predicted probabilities, and so on. Tufte (1983, p. 177) acknowledges that a tour de force such as Minard’s “can be described and admired, but there are no compositional principles on how to create that one wonderful graphic in a million.” The best one can do for “more routine, workaday designs” is to suggest some guidelines such as “have a properly chosen format and design,” “use words, numbers, and drawing together,” “display an accessible complexity of detail,” and “avoid content-free decoration, including chartjunk” (p. 177).

Figure 2
Minard’s visualization of Napoleon’s advance on and retreat from Moscow is a classic of visualization, but its design is in many ways atypical.

Among this set of general goals are some specific details that can be employed to good use across applications. This includes extensive use of layering and separation, for example, building on the insights of good cartography. Judicious use of stroke weight and color allows one to layer multiple meanings on a single visual plane. The ability to successfully pull off such effects depends on use of the smallest effective difference: lighter lines, smaller color variations, and simpler textures. It has long been a complaint of chart designers that accomplishing this often means working very much against the (highly detailed, drop-shadowed, rich, Corinthian leather) grain of the default settings in spreadsheet or other chart-making applications. Comparison and evaluation are often enhanced by the use of many small multiples, that is, plots that repeatedly display some reference variable or relationship (e.g., gross domestic product versus health care costs over time) and iterate across some other variable of interest (e.g., country) in an ordered fashion (see also Bertin 1967 [2010], pp. 217–45). The use of such multiples highlights the notion of parallelism that allows a reader to carefully compare across instances of similar-but-crucially-different items. Combined, these features facilitate a simultaneous micro and macro reading where key points are clearly communicated at the surface, but deeper meaning is obtained through careful review and exploration.
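The logic of small multiples can be separated from any particular plotting library. The following Python sketch is an illustration under stated assumptions, not a method from the article: the records are invented, and the names `RECORDS`, `small_multiples`, and `PANELS` are hypothetical. It shows the two design choices the paragraph above describes: one panel per value of the conditioning variable, presented in a deliberate order, with every panel sharing the same axis limits so that panels can be compared directly.

```python
from collections import defaultdict

# Hypothetical long-format records: (country, year, value).
# All numbers are invented purely for illustration.
RECORDS = [
    ("A", 2000, 4.0), ("A", 2005, 5.5), ("A", 2010, 7.0),
    ("B", 2000, 2.0), ("B", 2005, 2.5), ("B", 2010, 3.5),
    ("C", 2000, 6.0), ("C", 2005, 6.5), ("C", 2010, 9.0),
]


def small_multiples(records):
    """Build one panel spec per country, all sharing the same axis limits."""
    panels = defaultdict(list)
    for country, year, value in records:
        panels[country].append((year, value))

    # Shared limits are computed over ALL records, not per panel, so that
    # the same visual distance means the same thing in every panel.
    years = [year for _, year, _ in records]
    values = [value for _, _, value in records]
    shared = {"xlim": (min(years), max(years)), "ylim": (min(values), max(values))}

    # Order panels by the value in each country's latest year (descending),
    # one possible "ordered fashion" among many.
    ordered = sorted(panels, key=lambda c: max(panels[c])[1], reverse=True)
    return [{"panel": c, "points": sorted(panels[c]), **shared} for c in ordered]


PANELS = small_multiples(RECORDS)

if __name__ == "__main__":
    for spec in PANELS:
        print(spec["panel"], spec["xlim"], spec["ylim"], spec["points"])
```

Each resulting spec could be handed to whatever plotting tool is in use; the point of the sketch is that ordering and shared scales are decided once, before any drawing happens.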

A common complaint about Tufte’s work is that there are so few direct instructions. Busy cooks want a cookbook, not a picture of a fantastic meal. The tendency for the codification of data visualization to vacillate between overly abstract maxims and overly specific examples is characteristic of any craft where a practical sense of how to proceed (a taste or feeling for the right choice) matters for successful execution. A long-standing and plausible response to the problem is to have the designer make many of the judicious choices in advance and then embed them for users in the default settings of graphics applications. Given that graphical software aimed at regular users has been around for several decades now, however, these efforts have proven less successful than initially hoped. In the foreword to the new edition of Semiology of Graphics, Howard Wainer (2010, p. xi) reflects on the hope he and others once felt that easy-to-use graphical tools and software would lead to better general practice by way of smarter defaults. But, he argues, this has not happened. In the end, high-quality graphical presentation requires crafting a deliberately designed message rather than accepting the pre-established setting. Recent theoretical work explicitly recognizes the limits of relying on defaults. Following Wilkinson in implementing ggplot’s “grammar of graphics” for R, Wickham (2010, p. 3) notes that the analogy to grammar is useful because although “[a] good grammar will allow us to gain insight into the composition of complicated graphics, and reveal unexpected connections between seemingly different graphics[,] . . . there will still be many grammatically correct but nonsensical graphics. . . . [G]ood grammar is just the first step in creating a good sentence.”

If software defaults cannot enforce the elements of good taste, the next best (or maybe better) thing is a means to easily expose the mechanics of good practice. One of the most positive developments in statistical software over the past 15 years has been its integration with a much broader set of tools built to facilitate the sharing of both data and code. The first wave of modern statistical graphics and information design could convey, in print, the general principles and the quality products. But the crucial piece in between (the design process and practical assembly) remained opaque. Subsequently, communities of users began to share not just output but code much more widely, whether under the auspices of a for-profit developer (as in the case of Stata) or actively backed by free or open-source licensed platforms (as with R) or expert user blogs (http://sas-and-r.blogspot.com, http://flowingdata.com, http://www.r-statistics.com/tag/visualization). Some of these have developed into comprehensive references aimed at the practicing researcher (Chang 2013). Most recently, pastebins and software development platforms backed by distributed version control systems, most notably Github, have made sharing code both technically much easier and normatively expected.

As with the move toward replication data sets, everyday sharing of code allows novices to look behind the curtain much more easily than before. And perhaps unlike the earlier emphasis on accepting sensible defaults, it encourages new users to tinker with various methods and learn by doing. In many cases, software now allows users to control very detailed layout elements in their program scripts, which (with a little extra language work) allows one to override defaults with principled graphical choices. This ongoing integration of guidebooks, how-to websites, code repositories, and fully reproducible examples is a major step forward for improving visualization practice. As one particularly well-developed example among many, UCLA’s Institute for Digital Research and Education has a large library of worked graphical examples implemented across several statistics packages (http://www.ats.ucla.edu/stat/dae). Finally, because most statistical packages can now produce graphics as editable vector graphics files, one can use any graphical editor to fine-tune elements (such as line thickness, greater subtlety in color selection, etc.) for production.

These developments do not make questions of judgment and good practice go away. Statistical visualization needs to be thought of as part and parcel of analysis and presentation. We should be crafting visualizations thoughtfully in the same way we craft arguments or build models. Resources of this sort cannot by themselves guarantee that code snippets will not simply be mechanically copied or inappropriately applied by users looking for a shortcut to a good outcome. But, to paraphrase Keynes from a different context, they do seem to promise if not civilized visualization, at least the possibility of civilized visualization.

VISUALIZATION IN PRACTICE

We have argued that there are several promising ways that general principles of visualization can become more tangible in everyday use. We now turn to the question of current practice in a little more detail. Here we follow the common distinction between visualization for exploration versus presentation of a final finding. The former is meant for internal consumption, as the researcher examines the data to figure out what is going on; the latter is designed to convince a wider audience. Naturally, these processes overlap to some degree. The general principles covered in the previous section (regarding clarity, honesty, showing the data, and so on) apply equally to both the backstage and frontstage of visualization work. But what is needed in each case does differ. Some recent developments on each side are worth highlighting.

Exploring the Data

Graphical methods are now well integrated into the process of checking assumptions and robustness in most statistical packages and are often generated by default. Figure 3 shows a typical example of some diagnostic plots of an ordinary least squares regression. They were produced on demand and by default, with no further tweaking or polishing. Note that although we voiced some skepticism above about the ability of defaults to shape practice, these plots are models of clarity. They could be called into service for presentation purposes in a pinch. Their real utility, however, is the ease with which they can be produced and viewed as part of one’s everyday workflow as a social scientist: With tools like these, comments on outliers such as Jackman’s (1980) should never again be necessary.

Figure 3
Default diagnostic plots for a linear model: (a) R, (b) SAS. Though automatically produced, both panels present information clearly and with judicious use of labeling and color. Panel a contains four plots: Residuals vs Fitted, Normal Q−Q (standardized residuals against theoretical quantiles), Scale−Location, and Residuals vs Leverage.
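The quantities behind diagnostic panels like those in Figure 3 are simple to compute. The sketch below is a hedged illustration in plain Python, not the code used by R or SAS: it fits a one-predictor least-squares line to Anscombe's (1973) third data set (chosen here as an assumption, because it contains one obvious outlier) and derives the fitted values, residuals, leverage, standardized residuals, and Cook's distance that such panels display. The names `diagnostics` and `DIAG` are hypothetical.

```python
from math import sqrt

# Anscombe's (1973) third data set: a near-perfect line plus one outlier.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def diagnostics(x, y):
    """Per-observation diagnostics for a simple (one-predictor) OLS fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x

    rows = []
    fitted = [intercept + slope * a for a in x]
    resid = [b - f for b, f in zip(y, fitted)]
    # Residual standard error with n - 2 degrees of freedom (two coefficients).
    s = sqrt(sum(e ** 2 for e in resid) / (n - 2))
    for a, f, e in zip(x, fitted, resid):
        h = 1 / n + (a - mean_x) ** 2 / sxx      # leverage (hat value)
        r = e / (s * sqrt(1 - h))                # standardized residual
        d = r ** 2 * h / (2 * (1 - h))           # Cook's distance, p = 2
        rows.append({"x": a, "fitted": f, "resid": e,
                     "leverage": h, "std_resid": r, "cooks_d": d})
    return rows


DIAG = diagnostics(X, Y)

if __name__ == "__main__":
    worst = max(DIAG, key=lambda row: row["cooks_d"])
    print("most influential observation:", worst)
```

Sorting or plotting these rows reproduces the information in a residuals-versus-leverage panel: the observation at x = 13 stands out on Cook's distance even though it does not have the highest leverage.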

Diagnostic plots of this kind are, in principle, what you look at after a model has been chosen. They are confirmatory rather than strictly exploratory. Advocacy of exploratory data analysis (EDA), of looking carefully and creatively before modeling, is most closely associated with John Tukey (1972, 1977). Historically, EDA has been closely tied to the rise of graphical capabilities in statistical computing, particularly tools that allow rapid interactive visualization. A mild sense of unease with EDA is a feature of the statistical literature. The approach is explicitly inductive and concerned with exploring data in a relatively freewheeling fashion as an aid to discovery, which at times can seem uncomfortably opportunistic or unstructured. To working social scientists these are often virtues, but statistics is also the discipline where the avoidance of spurious associations is a major focus of technical work.

As data sets have continued to increase in both size and dimensionality, and as computing power and graphical methods have tried to keep up, there has been a rapprochement between the strictly exploratory and strictly confirmatory approaches. Working social scientists routinely explore their data as part of the process of cleaning and checking it. It would be naive to think researchers were not on the lookout (literally) for interesting patterns in complex data sets. Recent developments in EDA have focused on extending established methods of easily looking at a lot of data at once, and on developing new ways for visually checking the validity of apparent relationships. The idea is to make the exploratory a little more confirmatory.

A first useful tool for this sort of exploration is a generalized scatterplot matrix. In a standard pairs plot, the goal is to see all the bivariate relationships in the data at once, presented in a grid so that quick comparisons can easily be made. An unfortunate limitation, particularly for the social sciences, is that these plots do a poor job with categorical variables. Ideally we would like to see the panels of the matrix display the data in a form appropriate to the underlying variable. A generalized pairs plot (Emerson et al. 2013) accomplishes this, using barcode plots, boxplots, mosaic plots, and other methods. Figure 4 shows an example. The specific software implementation adds additional functionality, including the ability to display different plots (such as barcode and mosaic plots) in the upper and lower triangles of the plot matrix, histograms along the main diagonal, and the option of adding smoothed or linear regression lines to panels.
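The core idea of a generalized pairs plot is a dispatch rule: the type of each panel depends on the types of the two variables it crosses. The Python sketch below illustrates that rule only. It is not the API of any real plotting package, the variable names are invented, and the specific choices (boxplots in the lower triangle, barcode plots in the upper triangle, bar charts for categorical variables on the diagonal) are assumptions made for the example.

```python
# Hypothetical variable typing for a small survey-style data set.
VARIABLES = {
    "income": "continuous",
    "age": "continuous",
    "region": "categorical",
    "union": "categorical",
}


def panel_type(kind_row, kind_col, upper):
    """Choose a plot type from the kinds of the two variables being crossed."""
    if kind_row == "continuous" and kind_col == "continuous":
        return "scatterplot"
    if kind_row == "categorical" and kind_col == "categorical":
        return "mosaic plot"
    # Mixed pair: show two different views in the two triangles.
    return "barcode plot" if upper else "boxplot"


def pairs_layout(variables):
    """Map every (row, column) cell of the plot matrix to a plot type."""
    names = list(variables)
    layout = {}
    for i, row in enumerate(names):
        for j, col in enumerate(names):
            if i == j:
                # Diagonal cells show the marginal distribution of one variable.
                layout[(row, col)] = (
                    "histogram" if variables[row] == "continuous" else "bar chart"
                )
            else:
                layout[(row, col)] = panel_type(variables[row], variables[col], upper=i < j)
    return layout


LAYOUT = pairs_layout(VARIABLES)

if __name__ == "__main__":
    for cell, plot in LAYOUT.items():
        print(cell, "->", plot)
```

A layout table like this makes the limitation of the standard pairs plot concrete: with a scatterplot in every cell, half of these panels would be stacks of overplotted points.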

Generalized pairs plots can be extended even further, depending on the software, by allowing further partitioning within panels. For instance, we can show separate histograms of a continuous variable broken out by the values of a categorical variable. Multipanel plots are intrinsically rich in information. When combined with several within-panel types of representation and a large number of variables, they can become quite complex. But, again, the main utility of this approach is less in the presentation of finished work (although it can certainly be useful for that) and more in the way it enables the working researcher to quickly investigate aspects of her own data. The goal is not to pithily summarize a single point one already knows, but to open things up for further exploration. Harrell (2001) remains an exemplary book-length demonstration of the virtues of integrating graphical methods with the process of data exploration (including exploring patterns of missingness in the data) right across the process of model building, diagnostics, and presentation.

With many variables and large amounts of data, a square matrix of plots can become unwieldy even to the trained eye. Seeing more data more quickly, and in particular exploring high-dimensional data in a controlled way, has been a focus of recent visualization research. Early work, going back to Tukey and others, allowed for the exploration of data in three dimensions, for instance by way of rotating a cloud of points on a screen. This sort of approach "demoed well," as spinning around a cloud of colored points looks quite impressive to the casual observer. But interpreting these displays is another matter. Thus, methods for interactively exploring data sets advanced on two fronts. The first moved toward further development of multiple panels, notably with innovative ways of visually conditioning on additional variables or highlighting interactively selected cases across panels. Co-plots, shingles, and contour or surface plots are all examples of this kind of development (Cleveland 1993, pp. 186–271; Sarkar 2008, pp. 67–115). Increasingly, these methods take advantage of color for presenting data, as with heatmaps or tiled representations of a correlation matrix (see Figure 5). Tools for permuting correlation matrices, either in the order produced by factor-analytic techniques or other direct optimization, allow one to identify higher-order patterns in such figures (Breiger & Melamed 2014).

Figure 4. A generalized pairs plot handles categorical data easily, and in different ways.

Figure 5. A correlation matrix represented as a tiled heat map (upper triangle) with color-keyed correlation coefficients (lower triangle). [Image not reproduced. Variables: cerebvas, pubhealth, pubtopriv, proglib.trim, assault, donors, roads, external, gdp, health, pop, pop.dens, tradcon.trim. Color key: correlation, from −1 to 1.]
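A tiled correlation display of the kind shown in Figure 5 rests on two simple steps: computing the pairwise correlations and mapping each coefficient to a color. The sketch below illustrates both in plain Python. The function names, the blue-white-red palette, and the toy values are assumptions made for illustration; the sketch does not reproduce the authors' figure or the permutation methods of Breiger & Melamed (2014).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(data):
    """Return variable names and the matrix of pairwise correlations."""
    names = list(data)
    matrix = [[pearson(data[r], data[c]) for c in names] for r in names]
    return names, matrix

def tile_color(r):
    """Map r in [-1, 1] to a hex color: blue (-1), white (0), red (+1)."""
    r = max(-1.0, min(1.0, r))          # guard against rounding just outside the range
    fade = round(255 * (1 - abs(r)))    # 255 at r = 0, 0 at |r| = 1
    if r >= 0:
        return f"#ff{fade:02x}{fade:02x}"
    return f"#{fade:02x}{fade:02x}ff"

# Toy data with made-up values; the variable names echo the figure labels.
toy = {
    "gdp": [1.0, 2.0, 3.0, 4.0, 5.0],
    "health": [2.0, 4.0, 5.0, 4.0, 5.0],
    "assault": [5.0, 3.0, 4.0, 2.0, 1.0],
}
names, matrix = correlation_matrix(toy)
for name, row in zip(names, matrix):
    print(name.ljust(8), " ".join(f"{r:+.2f} {tile_color(r)}" for r in row))
```

A diverging palette centered on white is used here so that the sign and strength of an association are both readable at a glance; reordering the rows and columns is a separate step that this sketch leaves out.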

A second direction has been the development of parallel coordinate plots, which show multiple variables side by side in a way that allows for the visualization of both specific outliers and clusters of association across many variables at once (Moustafa & Wegman 2006, Inselberg 2009). Figure 6 gives a simple example, although the approach is best suited to much larger numbers of variables and observations than shown here. This sort of plot also benefits from being used interactively, as the ordering of the variables (and the highlighting of possible grouping variables) can change the interpretability of the graph quickly. The GGobi system, for example, is designed to provide interactive, semiautomated facilities for "touring" large, high-dimensional data in real time using parallel plots and a variety of other methods (Cook & Swayne 2007).

Figure 6. A parallel coordinates plot highlighting a possibly relevant grouping variable. [Image not reproduced. Vertical axis: value, from −2.5 to 2.5. Horizontal axis (variable): roads, consent.law, txp.pop, opt, pubtopiv, external, pubhealth, pop.dens, donors, health, assault, gdp, cerebvas. Legend (World): Corporatist, Liberal, SocDem.]
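The data preparation behind a parallel coordinates plot can be sketched as follows. Each variable is put on a common scale and each observation becomes one polyline across the ordered axes. Standardizing to z-scores is an assumption of this sketch (suggested by the value range of Figure 6, but not stated in the text), and the function names and toy values are hypothetical.

```python
import statistics

def standardize(values):
    """Convert values to z-scores so that all axes share one scale."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def parallel_coordinates(data, order):
    """Return one polyline per observation as a list of (axis_index, z_score) points."""
    z = {name: standardize(data[name]) for name in order}
    n_obs = len(data[order[0]])
    return [
        [(axis, z[name][obs]) for axis, name in enumerate(order)]
        for obs in range(n_obs)
    ]

# Toy data with made-up values.
toy = {
    "gdp": [1.0, 2.0, 3.0, 4.0, 5.0],
    "health": [2.0, 4.0, 5.0, 4.0, 5.0],
    "assault": [5.0, 3.0, 4.0, 2.0, 1.0],
}
lines = parallel_coordinates(toy, ["gdp", "health", "assault"])
for line in lines:
    print(" -> ".join(f"({axis}, {value:+.2f})" for axis, value in line))
```

Because the axis order is an explicit argument, reordering the variables (which the text notes can change interpretability quickly) only requires passing a different `order` list.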

This broad EDA tradition has recently begun to reconnect with the model-checking or diagnostic approach, with convergence happening from both directions. The long-standing concern here is that a striking visualization might not correspond to any robust underlying phenomenon. Early advocates of data visualization typically presented a "parade of horribles" (e.g., Wainer 1984) showing how bad visual presentation can distort or misrepresent the data. But even properly presented visualizations can be vulnerable to spurious pattern attribution on the part of researchers and observers. From the EDA side, Wickham et al. (2010) and Buja et al. (2009) provide some principled ways for assessing, in a broadly graphical manner, whether or not the patterns one is seeing are likely to be spurious. For example, a permutation lineup presents observed data in a small-multiple context surrounded by null plots of generated data. "Which plot shows the real data?" Buja et al. (2009, p. 4372) ask. If observers cannot reliably pick it out, then we should doubt both the utility of the plot and the soundness of any inferences (or arguments) based on it. From the modeling side, Gelman (2004, pp. 773–74) argues that a Bayesian approach provides a principled framework for assessing "the implicit model checking involved in virtually any data display."
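The permutation lineup described above can be sketched in a few lines. The real (x, y) pairs are hidden at a random position among null panels in which y has been shuffled, which breaks any association with x. The function name, the default of 20 panels, and the toy values are assumptions of this sketch rather than details taken from Buja et al. (2009) or Wickham et al. (2010).

```python
import random

def permutation_lineup(x, y, n_panels=20, seed=None):
    """Return (panels, real_position); each panel is a list of (x, y) pairs.

    One panel holds the observed pairs; the others pair x with a random
    permutation of y, so any pattern in them arises by chance alone.
    """
    rng = random.Random(seed)
    real_position = rng.randrange(n_panels)
    panels = []
    for index in range(n_panels):
        if index == real_position:
            panels.append(list(zip(x, y)))
        else:
            shuffled = list(y)
            rng.shuffle(shuffled)
            panels.append(list(zip(x, shuffled)))
    return panels, real_position

# Toy data with made-up values showing a strong positive association.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 2.3, 2.9, 4.2, 5.1, 5.8, 7.4, 8.0]
panels, real_position = permutation_lineup(x, y, n_panels=20, seed=1)
print("number of panels:", len(panels))
print("real data is in panel:", real_position)
```

With 20 panels, a viewer guessing at random picks the real plot with probability 1/20, so a reliable identification is informal evidence that the pattern is not an artifact of chance. Each panel would then be drawn with the same plotting code so that only the data differ.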


Although we have argued that sociologists have been relatively slow to adopt data visualization, several of the issues we have discussed have independently appeared within the sociological literature. Sociologists routinely deal with data where almost all the variables of interest are categorical, for example. And, as noted above, the routine and effective display of categorical data (especially cross-classified categorical data) has not been a trivial problem to solve. Furthermore, sociology has a long tradition of using methods that reduce high-dimensional data in some way, especially via factor analysis, principal components, correspondence analysis, or other related methods. In Distinction, for example, Bourdieu (1984, pp. 128–29, 262, 266, 343) presents his analysis of the space of French social class and taste in a way that is highly visual but also, for some critics, decidedly difficult to interpret. This family of methods lends itself to suggestive visualization in what might be called a configurational mode. This is somewhat inimical to the Anglo-American tradition of seeking causal relations in statistical models. Breiger (2000) provides a useful discussion of some of the issues here, emphasizing points of convergence.

Dimensional reduction of this sort typically characterizes the problem of interest in terms of space or distance, which naturally encourages the mapping of social systems. Sociologists have been among the earliest users of these visualization tools, particularly with network analysis. The earliest interactive network tools were literally peg boards and rubber bands (Freeman 2004) or pins-and-strings.¹ Interactive exploration of social network data has obviously been made much easier with the advent of efficient computer programs. Released in 1996, PAJEK was one of the earliest completely interactive visualization tools that was also optimized for large networks. Earlier software typically separated the visualization and analysis steps. There has since been rapid growth in the development of interactive network exploration tools, including on the web (http://www.theyrule.net, http://dirtyenergymoney.com). The challenge for such work is excess reduction in the inherent complexity of the data, which has led methodologists to propose fit statistics for network layouts (Moody et al. 2005, Brandes et al. 2012).

¹See http://www.soc.duke.edu/~jmoody77/VizARS/sna_peg.jpg.

The rapid availability of fully dynamic network data has created opportunities and challenges for visualization. Network movies, for example, allow one to capture the relational dynamics as they unfold in space and time (Moody et al. 2005, Bender-deMoll et al. 2008, Morris et al. 2009). The clear advantage of a network movie is that one can reserve the two dimensions of the visual plane for mapping the topography of the social system and watch the shape of the system change as the animation runs. This is particularly useful for exploration, as it makes visible dynamic features that are otherwise difficult to capture in summary statistics. But there are also costs. People tend to have poor visual memories, so comparing nonadjacent moments in time is challenging, and the analyst must make strong assumptions about how to aggregate the network events over time. Similar visualization challenges are becoming common in dynamic statistical displays, such as the GapMinder data set, which allows one to explore associations over time (http://www.gapminder.org).
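The aggregation assumption mentioned above can be made concrete with a small sketch. Timestamped relational events are grouped into time windows, and each window becomes one frame of the movie. The function name, the window parameters, and the toy events are hypothetical; this is one simple aggregation rule among many, not the procedure used in the cited studies.

```python
def window_edges(events, start, end, width, step):
    """Group (time, source, target) events into frames.

    Each frame covers the half-open interval [t, t + width). Frames
    overlap when step < width, which smooths the animation at the cost
    of showing the same event in more than one frame.
    """
    frames = []
    t = start
    while t < end:
        edges = {(s, r) for (time, s, r) in events if t <= time < t + width}
        frames.append((t, edges))
        t += step
    return frames

# Toy events with made-up times and actors.
events = [(0, "a", "b"), (1, "b", "c"), (5, "a", "c")]

narrow = window_edges(events, start=0, end=6, width=2, step=2)
wide = window_edges(events, start=0, end=6, width=6, step=6)
for t, edges in narrow:
    print(f"window starting at {t}: {sorted(edges)}")
```

The two calls show why the choice matters: the narrow windows produce three sparse frames (one of them empty), whereas the single wide window shows a closed triangle that never existed at any one moment.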

Presenting the Results

These considerations lead naturally to the question of presenting data. Most of the principles discussed above regarding the construction of figures for exploring data also apply to presenting it, if only because the audiences are often the same, that is, experts in a particular field. But effective statistical graphics have a rhetorical aspect, too (Kostelnick 2008). In general, the goal is to look for ways of presenting the data that are both effective with respect to one's argument and honest with respect to the data.

Though conceptually simple and among the earliest examples of statistical visualizations, variable distributions remain of keen substantive interest. Many of the distributions typically studied in sociology are extremely skewed and difficult to display as simple histograms. Consider, for example, some data on the number of times authors publish in a select set of journals (here the American Sociological Review, American Journal of Sociology, and Social Forces) over the course of their career. Figure 7a presents a standard histogram, whereas Figure 7b follows the convention now common in the physical sciences of presenting the distribution on a log-log scale.

Figure 7. The distribution of authors' lifetime number of publications in three very selective sociology journals is highly skewed. In comparison to a standard histogram (a), a log-log histogram (b) is much better at revealing details in the "long tail" of the distribution. [Image not reproduced. Horizontal axis in both panels: total number of lifetime publications. Vertical axis in both panels: authors (%).]
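The transformation behind a log-log display such as Figure 7b can be sketched as follows: tabulate the percentage of authors at each publication count, then take base-10 logarithms of both quantities. The counts below are made up for illustration and are not the data behind Figure 7; the function names are likewise hypothetical.

```python
import math
from collections import Counter

def percent_by_count(pub_counts):
    """Percentage of authors at each lifetime publication count."""
    tally = Counter(pub_counts)
    total = len(pub_counts)
    return {k: 100 * tally[k] / total for k in sorted(tally)}

def log_log_points(percentages):
    """Coordinates for a log-log plot: (log10 count, log10 percent)."""
    return [(math.log10(k), math.log10(p)) for k, p in percentages.items()]

# Made-up, highly skewed toy data: most authors publish once, very few publish often.
pub_counts = [1] * 80 + [2] * 12 + [3] * 5 + [10] * 2 + [40] * 1

percentages = percent_by_count(pub_counts)
points = log_log_points(percentages)
for (k, p), (lx, ly) in zip(percentages.items(), points):
    print(f"{k:>3} publications: {p:5.1f}% of authors -> ({lx:.2f}, {ly:.2f})")
```

On a linear scale the authors with 10 or 40 publications would be invisible next to the 80 percent who published once; after the log transform each of them occupies a clearly separated point.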

When comparing distributions across categorical variables, comparative boxplots allow one to examine multiple moments of a distribution across multiple categories or over time (with some loss of resolution). The presentation of joint distributions of multiple categorical variables has similarly been improved with area-accurate Venn diagrams (see, for example, http://www.eulerdiagrams.org/eulerAPE). An important contribution to this literature is the work of Handcock & Morris (1999) on relative distribution methods. By comparing the ratio of two distributions at each point along the x-axis, one is quickly able to identify differences in both shape and central tendency. Figure 8 reproduces the relative distribution in permanent wage growth for two cohorts of the National Longitudinal Survey. If the wage distributions were identical, the density would be a simple horizontal line at 1.0; instead we see much greater inequality (heavier tails at both ends) in the recent cohort.
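A decile version of the relative distribution can be sketched in plain Python. Each value in the comparison cohort is located within the reference cohort's distribution, and the share of comparison cases falling in each reference decile is divided by the 10 percent expected if the two distributions were identical. This is a simplified bar-chart sketch under stated assumptions, not the smoothed density estimator of Handcock & Morris (1999); the function name and toy cohorts are hypothetical.

```python
from bisect import bisect_left

def relative_density_by_decile(reference, comparison, bins=10):
    """Relative density of `comparison` within quantile bins of `reference`.

    A value of 1.0 in every bin means the two distributions match; values
    above 1.0 mark regions where the comparison cohort is overrepresented.
    """
    ref = sorted(reference)
    counts = [0] * bins
    for value in comparison:
        rank = bisect_left(ref, value)                  # reference cases strictly below value
        index = min(rank * bins // len(ref), bins - 1)  # which reference decile it falls in
        counts[index] += 1
    return [c * bins / len(comparison) for c in counts]

# Made-up toy cohorts: the "recent" cohort is polarized toward both tails.
original = list(range(1, 101))
recent = [1] * 30 + list(range(31, 71)) + [100] * 30

same = relative_density_by_decile(original, original)
polarized = relative_density_by_decile(original, recent)
print("identical cohorts:", same)
print("polarized cohort: ", polarized)
```

The identical cohorts give a flat line at 1.0, while the polarized cohort shows the heavier tails at both ends that the text describes for the recent National Longitudinal Survey cohort.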

A related problem involves effectively displaying trends over time, particularly when attempting to demonstrate strong variability across units. The convention of reserving the x-axis for time and the y-axis for magnitude becomes tricky if many series are given equal weight. An effective solution involves carefully choosing colors, line weights, and labels to highlight a particular strand among many (see Figure 10 below). Moody et al. (2011) are able to demonstrate the wild variability in adolescent popularity sequences by generating a scatterplot of trajectory summaries with exemplar labels.² Because each position in the field captures a unique trend, the distributional coverage of the space suggests there is no typical sequence.

²See http://www.soc.duke.edu/~jmoody77/VizARS/Figure5.jpg for trendspace; http://www.soc.duke.edu/~jmoody77/VizARS/Figure%206.pdf for application of this space to model prediction outcomes.

Figure 8. The relative probability density function distribution of permanent wage growth in the original and recent National Longitudinal Survey cohorts. A decile bar chart is superimposed on the density estimate. The upper axis is labeled in permanent differences in log wages (adapted from Handcock & Morris 1999). [Image not reproduced. Lower horizontal axis: proportion of the original cohort, 0.0 to 1.0. Vertical axis: relative density, 0.0 to 3.0.]

Moving beyond simple variable comparison displays, the bulk of statistical work in sociology involves complex multivariate models. Even with good statistical training, tables of coefficients are hard to decipher quickly and tend to foreground statistical significance over substantive magnitudes. Straightforwardly interpreting the effects of independent variables is rarely intuitive, especially for models with complex link functions, categorical components, or interaction terms. Although odds ratios are margin free and thus nominally interpretable, knowing whether an effect is substantively large is often difficult without comparative context and may be impossible to discern directly from the table without intimate knowledge of the underlying distribution of control variables. The simplest solution to this problem is to use the model to predict outcome variables at different levels or combinations of the independent variables of interest. Figure 9a shows a powerful example from Mirowsky & Ross (2007). They use a new style of vector graphs for latent growth models by age (see Mirowsky & Kim 2007) to display predicted values from interaction terms. This enables them to take results from a complex structural equation model of people's perceived sense of control and simultaneously illustrate both within-cohort and between-cohort changes at varying levels of education in a way that would be otherwise very difficult to represent.
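The general strategy of predicting outcomes at chosen combinations of the independent variables can be sketched with a logistic model that includes an interaction term. The coefficients below are invented for illustration and are not estimates from Mirowsky & Ross (2007) or any other cited study; the variable names and function names are likewise assumptions of the sketch.

```python
import math

def inv_logit(eta):
    """Convert a linear predictor on the log-odds scale to a probability."""
    return 1 / (1 + math.exp(-eta))

# Hypothetical coefficients, for illustration only.
coef = {
    "intercept": -1.0,
    "education": 0.5,        # education level coded 0, 1, 2
    "age": 0.02,             # age in years
    "education:age": -0.005, # interaction term
}

def predicted_probability(education, age):
    """Predicted probability at one combination of predictor values."""
    eta = (
        coef["intercept"]
        + coef["education"] * education
        + coef["age"] * age
        + coef["education:age"] * education * age
    )
    return inv_logit(eta)

# Predictions over a grid of substantively interesting values.
grid = {
    (education, age): predicted_probability(education, age)
    for education in (0, 1, 2)
    for age in (20, 40, 60)
}
for (education, age), p in grid.items():
    print(f"education={education}, age={age}: predicted probability {p:.3f}")
```

A grid like this is what gets plotted: the reader sees probabilities at meaningful values rather than having to combine a main effect and an interaction coefficient on the log-odds scale by hand.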

The figure allows one to identify changes within cohorts (change within vector) and over time (sequence of arrows by group). Here we see that high school dropouts have a lower sense of control overall but a dramatic drop in sense of control during youth that levels out as they age. College-educated respondents, in contrast, have a generally high sense of control that is continuously optimistic through adulthood, turning negative only after about age 60. Recent advances in the use of statistical graphics for model interpretation include estimates of the uncertainty of the model predictions. Most software now provides easy access to model predictions from the data, and this allows one to provide results under varying scenarios (see, for example, Alkema et al. 2011). In this case, the hard work is done before the plot is made. Figure 9b shows a series of predicted probabilities from a multinomial model at different levels of various predictors and outcomes, with appropriate standard errors shown. Here no conceptual advances are needed on the graphical side, just the ability to get information out of the model in a readily interpretable form (Fox 2003, Fox & Hong 2009).

Figure 9. (a) Vector diagram for latent trajectory model of perceived control by age, cohort, and education (adapted from Mirowsky & Ross 2007, with permission from the University of Chicago Press). (b) Predicted probabilities and standard errors plotted from a multinomial model (adapted from Fox & Hong 2009). [Image not reproduced. Panel a: horizontal axis, age (years), 18 to 90; vertical axis, predicted sense of control; groups, college degree, high school degree, no high school degree. Panel b: horizontal axis, attitude toward Europe; vertical axis, probability; panels, Labour, Liberal Democrat, Conservative; legend, knowledge (0 to 3).]

Figure 10. Network exemplar of moving between software default and presentation results. Subtle adjustments to line widths and color palettes and the addition of a centrality scale greatly aid interpretability in (b). [Image not reproduced. Title: Colorado Springs HIV risk network. Panel a: default PAJEK view. Panel b: edited for presentation. Legend: closeness centrality, high to low.]

The distance between exploratory and presentation graphics is most pronounced as the density of information necessary to display increases. Network images are particularly interesting in this case. A little effort with layering and coloring makes a real difference. Consider also Figure 10, which shows a before and after of the same data. The basic layout is retained (with the addition of a little jittering to alleviate algorithmically induced stacking), but the result is much more interpretable.

Recent work on constructing visually interpretable social networks has focused on careful data reduction, either by suppressing nodes entirely in favor of contour-style diagrams (Moody 2004, Moody & Light 2006) or by deleting or bundling edges to highlight structure (Crnovrsanin et al. 2014). Other work has focused explicitly on quantifying the layout model using stress or multidimensional scaling–related techniques (Frank & Yasumoto 1998, Brandes & Pich 2006, Brandes et al. 2012; see Lima 2011 for exemplars).
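One way to quantify how faithfully a layout represents a network is a stress statistic that compares distances on the page with distances in the graph. The sketch below uses a normalized raw stress based on shortest-path lengths. The specific formula, the function names, and the toy network are assumptions chosen for illustration; the cited works may define their fit measures differently.

```python
import math
from collections import deque
from itertools import combinations

def shortest_paths(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def layout_stress(adjacency, positions):
    """Sum of squared gaps between layout and graph distances, divided by
    the sum of squared graph distances, over all connected pairs of nodes."""
    graph_dist = {node: shortest_paths(adjacency, node) for node in adjacency}
    numerator = denominator = 0.0
    for u, v in combinations(adjacency, 2):
        if v not in graph_dist[u]:
            continue  # skip pairs in different components
        d_graph = graph_dist[u][v]
        (x1, y1), (x2, y2) = positions[u], positions[v]
        d_layout = math.hypot(x1 - x2, y1 - y2)
        numerator += (d_layout - d_graph) ** 2
        denominator += d_graph ** 2
    return numerator / denominator

# Toy network: a three-node path a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
straight = {"a": (0, 0), "b": (1, 0), "c": (2, 0)}
bent = {"a": (0, 0), "b": (1, 0), "c": (1, 1)}

print("stress of straight layout:", layout_stress(path, straight))
print("stress of bent layout:    ", layout_stress(path, bent))
```

The straight layout reproduces every graph distance exactly and so has zero stress, while the bent layout draws the two end nodes closer together than their two-step separation warrants and is penalized accordingly.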

Our focus so far has been on presenting results to professional peers. But in recent years the clear presentation of data to broader publics has become increasingly important. It has never been easier to circulate full-color graphics of original data analysis to large groups of people. Social sharing of data through the Internet generally, but especially through services such as Facebook and Twitter, has accelerated the rise of infographics or info-visualization. To many working statisticians, infographics are the descendants of Tufte's Ducks, those "self-promoting graphics" where "the overall design purveys Graphical Style rather than quantitative information" (Tufte 1983, p. 116). The contemporary infographic in its pure form is a supercharged megaduck incorporating not only the bells and whistles derided by Tufte but far more besides, such as a spurious quasi-narrative structure, pictographic sequencing, or excessive dynamic elements. Gelman & Unwin (2013) discuss Infovis-style work from a statistical point of view. They argue that most infographics do not meet the standards normally demanded of statistical visualizations, but they concede that sometimes the goals of the latter are not those of the former.

It seems clear, though, that information visualization tools will become ever more widespread. In keeping with our general argument that good visualization is a component of broader good practice around data analysis, a key issue is the openness of standards and tools for data analysis on the web. Social scientists have typically worked within dedicated statistical applications to produce static graphics in a format geared primarily for print publication. But there has been tremendous development over the past decade, and even just within the past five years, in tools designed to present data interactively on the web. The development of powerful libraries written in JavaScript has allowed developers to present statistical graphics in a way that is quite open with respect to both code and data. Mike Bostock's D3 library, for instance, is increasingly used by statisticians and media analysts alike and provides a powerful set of dynamic visual methods (Murray 2013). It is always difficult to know ex ante which particular software tool kits have staying power in the long run (functionally similar platforms and libraries have come and gone before), which is why static formats such as PostScript and portable document format, or PDF, are so long-lived. But even so, the leading edge of development in this area seems to be moving to further integrate specific statistical tools such as R with data formats (notably JavaScript Object Notation, or JSON) that can be presented effectively and interactively in the browser. For some kinds of data, notably the generation of dynamic choropleth maps and cartograms, the standard of presentation in some media outlets is now very high. It can be difficult to interpret complex and colorful maps with data chunked into units that vary radically by size (e.g., US counties). Nevertheless, a map such as the one shown in Figure 11, which appeared in the New York Times (Bloch & Gebeloff 2009), makes for a very engaging way to explore patterns both spatially and over time. Presenting data of this sort in an effective, interactive package is difficult for small teams of researchers to accomplish. But it is not impossible. Katz's (2013) dialect survey maps are a compelling recent example of what is now within reach. Developers seem interested in building the production of web-enabled content into the software sociologists are used to using, and thus these tools are likely to continue to become more powerful and easier to use.
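The hand-off from a statistical environment to the browser mentioned above usually amounts to serializing a data table as JSON. The sketch below uses Python rather than R, converts a column-oriented table into a list of records (one object per observation, a layout that browser-side plotting code commonly consumes), and writes it out as a JSON string. The function name, field names, and values are all made up for illustration.

```python
import json

def columns_to_records(columns):
    """Turn {"name": [values...]} into a list of one-dict-per-row records."""
    names = list(columns)
    n_rows = len(columns[names[0]])
    return [{name: columns[name][i] for name in names} for i in range(n_rows)]

# Made-up values, organized by column as in a typical data frame.
table = {
    "country": ["A", "B", "C"],
    "year": [2000, 2000, 2000],
    "rate": [1.5, 0.8, 6.1],
}

records = columns_to_records(table)
payload = json.dumps(records, indent=2)
print(payload)
```

Because the result is plain text in an open format, the same file can be published alongside the graphic, which fits the broader argument about openness of both code and data.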

For sociologists thinking about the public impact of their work, it is worth bearing in mind that, the sins of Infovis notwithstanding, a well-crafted statistical graphic is the fastest way to propagate one's findings. Moreover, it is easy to forget how revelatory the general public can find even a relatively ordinary descriptive image if it is properly constructed. The panels in Figure 12 show two examples. Figure 12a shows the rate of deaths due to assault in 24 OECD countries between 1960 and 2011. The point of the image is to emphasize the exceptionally high death rate in the United States compared with other countries (as well as the large changes in the US number that are visible over the timeframe), and so the US series is colored separately from the rest, with every other country getting its own smoothed line and data points, but not individual colors. The unique trajectory of the United States is immediately apparent. The use of color probably helped the image circulate more widely in social media and traditional outlets than it otherwise might have. Color is not strictly necessary, however, as the superb image in Figure 12b makes clear. Taken from Kenworthy (2014), Figure 12b shows trends in life expectancy plotted against a measure of health expenditures for 20 countries. The United States is singled out with a bolder line than the others. Individual data points are not plotted. There are only seven numbers labeled on the graph (including the one in "19 other rich countries"), yet a strong argument based on rich data is beautifully made about what has happened to the returns to health spending in the OECD generally, and in the United States in particular. In the original presentation, Kenworthy characterizes the data and measures with a compact note in the caption, specifying the methods and measures. There is nothing about this figure that is conceptually or technically new. And yet a clearly conceived and cleanly executed image like this is still relatively uncommon in the sociological literature.

Figure 11. A New York Times interactive choropleth map allows users to explore historical and geographical patterns of migration to the United States (Bloch & Gebeloff 2009, adapted with permission from the New York Times; the interactive map is available at http://www.nytimes.com/interactive/2009/03/10/us/20090310-immigration-explorer.html).

Visualizations of categorical data remain more difficult to convey effectively, partly because the general public is not always familiar with conventional ways to present it. Mosaic plots, for instance, can be effective representations of contingency tables, but people are not taught to read them in the same way they can read bar charts or scatterplots. The effective visualization of network data presents similar issues. The dual problems of dimensionality and scale require creative ways to layer and aggregate information in a manner that highlights the key features of interest. In an attempt to characterize trends in political polarization in the US Senate, Moody & Mucha (2013) relied on a combination of multiple aggregation strategies and visual "identity arcs" linking individuals over time that effectively pushed "party loyalists" to the background while highlighting those (increasingly rare) senators who reach across the aisle (Figure 13).

Figure 12. (a) Assault deaths in the United States and 23 other OECD countries (Healy 2012). (b) Health expenditure (as a percentage of GDP) and life expectancy in the United States and 19 other rich countries (see Kenworthy 2014; image courtesy of L. Kenworthy). [Image not reproduced. Panel a, assault deaths by country: horizontal axis, year, 1960 to 2010; vertical axis, assault deaths per 100,000 population. Panel b, life expectancy by country: horizontal axis, health expenditures (%); vertical axis, life expectancy.]

CONCLUSION

We have argued that quantitative visualization is a core feature of social-scientific practice from start to finish. All aspects of the research process from the initial exploration of data to the effective presentation of a polished argument can benefit from good graphical habits. Good graphics are not, of course, the only thing; see Godfrey (2013) for a discussion of the situation of blind and visually impaired users of current statistical software. But the dominant trend is toward a world where the visualization of data and results is a routine part of what it means to do social science.

Getting general audiences comfortable with different kinds of data visualization is a long-term project, and not one that any particular researcher or journal editor has any meaningful control over. But given that the interpretability of statistical graphics rests on both their internal coherence as objects and the shared representational conventions they embody, a first step is to insist on good standards in the peer review process. A glance at recent issues of, say, the American Sociological Review shows that the standards for publishable graphical material vary wildly between and even within articles, far more than the standards for data analysis, prose, and argument. Variation is to be expected, but the absence of consistency in elements as simple as axis labeling, gridlines, or legends is striking. Just as training in elementary visualization methods should be a standard component of graduate education, our flagship journals should encourage their authors to think about the most effective ways to achieve visual clarity. This should not take the form of overly strict style guides but instead aim for an ideal of consistent, considered good judgment in the presentation of data and results in the service of sociological argument.

Effective data visualization is part of a broader shift in the social sciences where data are more easily available, code and coding tools are more widely accessible, and high-quality graphical work is easy to produce and share. We hope for professional audiences who expect to see effective graphics as a routine aspect of presented work, and we look forward to wider publics who are able to comfortably read and interpret good graphical work. Sociologists should take advantage of the remarkable progress in methods, tools, and means to share (from statistics to computational social science to web development), the better to see the social world, and help others see it, too.

[Figure 13 image. "US Senate voting similarity networks, 1975–2012." Timeline: president, Senate party balance, and date (through June 7, 2012), by Congress from '75–'76 to '11–'12. Democrats and Republicans are arranged along a polarization (modularity) scale; the legend covers group size, within-group vote similarity, and vote similarity (≥ 0.6). Inset: modularity by year, 1910–2010.]

Figure 13. Aggregation and a known dimension (a polarization scale) simplify a complex network layout. (Adapted from Moody & Mucha 2013 with permission from Cambridge University Press.)

DISCLOSURE STATEMENT

The authors are not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.

ACKNOWLEDGMENTS

We thank Jaemin Lee, Achim Edelmann, and Richard Benton for comments on earlier drafts. Permission to use copyrighted material was granted by the American Sociological Association (Figure 1b), the University of Chicago Press (Figure 9a), the New York Times (Figure 11), and Cambridge University Press (Figure 13). All other figures are taken from the public domain and/or significantly redrawn and adapted by the authors. Partial support for this work was provided by NIH grants 1R21HD068317-01 and 1 R01 HD075712-01.

LITERATURE CITED

Alkema L, Raftery AE, Gerland P, Clark SJ, Pelletier F, et al. 2011. Probabilistic projections of the total fertility rate for all countries. Demography 48:815–39
Anscombe FJ. 1973. Graphs in statistical analysis. Am. Stat. 27:17–21
Bender-deMoll S, Morris M, Moody J. 2008. Prototype packages for managing and animating longitudinal network data: dynamicnetwork and rSoNIA. J. Stat. Softw. 24(7). http://www.jstatsoft.org/v24/i07
Bertin J. 1967 (2010). Semiology of Graphics: Diagrams, Networks, Maps. Redlands, CA: ESRI Press
Bloch M, Gebeloff R. 2009. Immigration explorer. New York Times, March 10. http://www.nytimes.com/interactive/2009/03/10/us/20090310-immigration-explorer.html
Bourdieu P. 1984. Distinction: A Social Critique of the Judgment of Taste. Cambridge, MA: Harvard Univ. Press
Brandes U, Indlekofer N, Mader M. 2012. Visualization methods for longitudinal social networks and stochastic actor-oriented modeling. Soc. Netw. 34:291–308
Brandes U, Pich C. 2006. Eigensolver methods for progressive multidimensional scaling of large data. Int. Symp. Graph Drawing (GD), Lect. Notes Comput. Sci. (LNCS) 4372:42–53
Breiger RL. 2000. A toolkit for practice theory. Poetics 27:91–115
Breiger RL, Melamed D. 2014. The duality of organizations and their attributes: turning regression modeling ‘inside out.’ Res. Sociol. Organ. 40:261–74
Buja A, Cook D, Hofmann H, Lawrence M, Lee EK, et al. 2009. Statistical inference for exploratory data analysis and model diagnostics. Phil. Trans. R. Soc. A 367:4361–83
Chang W. 2013. The R Graphics Cookbook. Sebastopol, CA: O’Reilly
Chapin FS. 1924. The statistical definition of a societal variable. Am. J. Sociol. 30:154–71
Chatterjee S, Firat A. 2007. Generating data with identical statistics but dissimilar graphics: a follow up to the Anscombe Dataset. Am. Stat. 61:248–54
Cleveland WS. 1993. Visualizing Data. Summit, NJ: Hobart
Cleveland WS. 1994. The Elements of Graphing Data. Summit, NJ: Hobart
Cook D, Swayne DF. 2007. Interactive and Dynamic Graphics for Data Analysis. New York: Springer
Crnovrsanin T, Muelder CW, Faris R, Felmlee D, Ma K-L. 2014. Visualization techniques for categorical analysis of social networks with multiple edge sets. Soc. Netw. 37:56–64
Du Bois WEB. 1898 (1967). The Philadelphia Negro. New York: Schocken Books
Emerson JW, Green W, Schloerke B, Crowley B, Cook D, et al. 2013. The generalized pairs plot. J. Comput. Graph. Stat. 22:79–91


Few S. 2009. Now You See It: Simple Visualization Techniques for Quantitative Analysis. Oakland, CA: Analytics
Few S. 2012. Show Me the Numbers: Designing Tables and Graphs to Enlighten. Burlingame, CA: Analytics. 2nd ed.
Fox J. 2003. Effect displays in R for generalised linear models. J. Stat. Softw. 8(15). http://www.jstatsoft.org/v08/i15/paper
Fox J, Hong J. 2009. Effect displays in R for multinomial and proportional-odds logit models: extensions to the effects package. J. Stat. Softw. 32(1). http://www.jstatsoft.org/v32/i01/paper
Frank KA, Yasumoto J. 1998. Linking action to social structure within a system: social capital within and between subgroups. Am. J. Sociol. 104:642–86
Freeman LC. 2004. The Development of Social Network Analysis: A Study in the Sociology of Science. Vancouver, Can.: Empirical
Freese J. 2007. Reproducibility standards in quantitative social science: why not sociology? Sociol. Methods Res. 36:153–72
Friendly M. 2000. Visualizing Categorical Data. Cary, NC: SAS Inst.
Gelman A. 2004. Exploratory data analysis for complex models. J. Comput. Graph. Stat. 13:755–79
Gelman A, Unwin A. 2013. Infovis and statistical graphics: different goals, different looks. J. Comput. Graph. Stat. 22:2–28
Godfrey AJR. 2013. Statistical software from a blind person’s perspective. R J. 5:73–79
Handcock MS, Morris M. 1999. Relative Distribution Methods in the Social Sciences. New York: Springer-Verlag
Harrell F. 2001. Regression Modeling Strategies. New York: Springer
Hart HH. 1896. Immigration and crime. Am. J. Sociol. 2:369–77
Healy K. 2012. America is a violent country. Kieran Healy Blog, July 20. http://kieranhealy.org/blog/archives/2012/07/20/america-is-a-violent-country
Hewitt C. 1977. The effect of political democracy and social democracy on equality in industrial societies: a cross-national comparison. Am. Sociol. Rev. 42:450–64
Inselberg A. 2009. Parallel Coordinates: Visual Multidimensional Geometry and its Applications. New York: Springer
Jackman RM. 1980. The impact of outliers on income inequality. Am. Sociol. Rev. 45:344–47
Katz J. 2013. Regional dialect variation in the continental US. Work. Pap., Proj. Beyond “Soda, Pop, or Coke,” Dep. Stat., N.C. State Univ., Raleigh. http://www4.ncsu.edu/~jakatz2/project-dialect.html
Kenworthy L. 2014. Social Democratic America. New York: Oxford Univ. Press
Keynes JM. 1938. Review of HG Funkhouser, Historical Development of the Graphical Representation of Statistical Data. Econ. J. 48:281–82
Kleinman K, Horton NJ. 2013. SAS and R: Data Management, Statistical Analysis, and Graphics. Boca Raton, FL: Chapman & Hall/CRC. 2nd ed.
Kostelnick C. 2008. The visual rhetoric of data displays: the conundrum of clarity. IEEE Trans. Prof. Commun. 51:116–29
Lenski G. 1966. Power and Privilege. New York: McGraw-Hill
Lima M. 2011. Visual Complexity: Mapping Patterns of Information. New York: Princeton Archit. Press
Lundberg GA, Steele M. 1938. Social attraction-patterns in a village. Sociometry 1:375–419
Mann ME, Bradley RS, Hughes MK. 1999. Northern hemisphere temperatures during the past millennium: inferences, uncertainties, and limitations. Geophys. Res. Lett. 26:759–62
Marro A. 1899. Influence of the puberal development upon the moral character of children of both sexes. Am. J. Sociol. 5:193–219
Mirowsky J, Kim J. 2007. Graphing age trajectories: vector graphs, synthetic and virtual cohort projections, and cross-sectional profiles of depression. Sociol. Methods Res. 35:497–541
Mirowsky J, Ross C. 2007. Life course trajectories of perceived control and their relationship to education. Am. J. Sociol. 112:1339–82
Mitchell M. 2012. A Visual Guide to Stata Graphics. College Station, TX: Stata. 3rd ed.
Moody J. 2004. The structure of a social science collaboration network: disciplinary cohesion from 1963 to 1999. Am. Sociol. Rev. 69:213–38
Moody J, Brynildsen WD, Osgood DW, Feinberg ME, Gest S. 2011. Popularity trajectories and substance use in early adolescence. Soc. Netw. 33:101–12


Moody J, Light R. 2006. A view from above: the evolving sociological landscape. Am. Sociol. 38:67–86
Moody J, McFarland DA, Bender-deMoll S. 2005. Dynamic network visualization: methods for meaning with longitudinal network movies. Am. J. Sociol. 110:1206–41
Moody J, Mucha PJ. 2013. Portrait of political party polarization. Netw. Sci. 1:119–21
Morris M, Kurth AE, Hamilton DT, Moody J, Wakefield S. 2009. Concurrent partnerships and HIV prevalence disparities by race: linking science and public health. Am. J. Public Health 99:1023–31
Moustafa R, Wegman E. 2006. Multivariate continuous data – parallel coordinates. In Graphics of Large Datasets, ed. A Unwin, M Theus, H Hofmann, pp. 143–56. New York: Springer
Murray S. 2013. Interactive Data Visualization for the Web. Sebastopol, CA: O’Reilly
Murrell P. 2011. R Graphics. Boca Raton, FL: Chapman & Hall. 2nd ed.
Sarkar D. 2008. Lattice: Multivariate Data Visualization with R. New York: Springer
Sletto RF. 1936. A critical study of the criterion of internal consistency in personality scale construction. Am. Sociol. Rev. 1:61–68
Stack S. 1979. The effects of political participation and social party strength on the degree of income inequality. Am. Sociol. Rev. 44:168–71
Tufte ER. 1978. Political Control of the Economy. Princeton, NJ: Princeton Univ. Press
Tufte ER. 1983. The Visual Display of Quantitative Information. Cheshire, CT: Graphics
Tufte ER. 1990. Envisioning Information. Cheshire, CT: Graphics
Tufte ER. 1997. Visual Explanations: Images and Quantities, Evidence and Narrative. Cheshire, CT: Graphics
Tufte ER. 2006. Beautiful Evidence. Cheshire, CT: Graphics
Tukey JW. 1972. Some graphic and semigraphic displays. In Statistical Papers in Honor of George W. Snedecor, ed. TA Bancroft, pp. 293–316. Ames: Iowa State Univ. Press
Tukey JW. 1977. Exploratory Data Analysis. New York: Addison Wesley
Wainer H. 1984. How to display data badly. Am. Stat. 38:137–47
Wainer H. 2010. Foreword. See Bertin 1967 (2010), pp. ix–x
Wickham H. 2009. ggplot2: Elegant Graphics for Data Analysis. New York: Springer
Wickham H. 2010. A layered grammar of graphics. J. Comput. Graph. Stat. 19:3–28
Wickham H, Cook D, Hofmann H, Buja A. 2010. Graphical inference for Infovis. IEEE Trans. Vis. Comput. Graph. 16:973–79
Wilkinson L. 1995 (2005). The Grammar of Graphics. New York: Springer. 2nd ed.
Yau N. 2012. Visualize This: The FlowingData Guide to Design, Visualization, and Statistics. Indianapolis, IN: Wiley






