
  • SO40CH05-MoodyHealy ARI 4 July 2014 13:29

    Data Visualization in Sociology

    Kieran Healy and James Moody
    Sociology Department, Duke University, Durham, North Carolina 27708; email: [email protected], [email protected]

    Annu. Rev. Sociol. 2014. 40:105–28

    First published online as a Review in Advance on June 6, 2014

    The Annual Review of Sociology is online at soc.annualreviews.org

    This article’s doi: 10.1146/annurev-soc-071312-145551

    Copyright © 2014 by Annual Reviews. All rights reserved

    Keywords: visualization, statistics, methods, exploratory data analysis

    Abstract

    Visualizing data is central to social scientific work. Despite a promising early beginning, sociology has lagged in the use of visual tools. We review the history and current state of visualization in sociology. Using examples throughout, we discuss recent developments in ways of seeing raw data and presenting the results of statistical modeling. We make a general distinction between those methods and tools designed to help explore data sets and those designed to help present results to others. We argue that recent advances should be seen as part of a broader shift toward easier sharing of code and data both between researchers and with wider publics, and we encourage practitioners and publishers to work toward a higher and more consistent standard for the graphical display of sociological insights.

    Annu. Rev. Sociol. 2014.40:105-128. Downloaded from www.annualreviews.org by Duke University on 07/31/14. For personal use only.

    INTRODUCTION

    From the mind’s eye to the Hubble telescope, visualization is a central feature of discovery, understanding, and communication in science. There are many different ways to see. Visual tools range from false-color photographs of telescopic images in astronomy to reconstructions of prehistoric creatures in paleontology. In the statistical sciences, images are often more abstract than models of fighting dinosaurs, depending as they must on conventions that link size, value, texture, color, orientation, or shape to quantities (Bertin 1967 [2010]). But statistical visualizations are nonetheless critical to promoting science. One need only think of the now iconic hockey-stick diagram of earth temperature for a clear case (Mann et al. 1999). Despite its ubiquity in most of the natural sciences, visualization often remains an afterthought in sociology.

    In this article, we review the history and current state of data visualization in sociology. Our aim is to encourage sociologists to use these methods effectively across the research and publication process. We begin with a brief history, then present an overview of the theory of graphical presentation. The bulk of our review is organized around the uses of visualization in first the exploration and then the presentation of data, with exemplars of good practice. We also discuss workflow and software issues and the question of whether better visualization can make sociological research more accessible.

    SOCIOLOGY LAGS

    First, why are statistical visualizations so common in other fields and rare in sociology? Although model summaries offer exacting precision in expressing particular quantities (such as the slope of a line through data points), getting a sense of multiple patterns simultaneously is typically easier visually. The point is made forcefully by Anscombe’s (1973) famous quartet, reproduced in Figure 1a. Each data set contains 11 observations on two variables. The basic statistical properties of each data set are almost identical, up to and including their bivariate regression lines. But when visualized as a scatterplot, the differences are readily apparent (see also Chatterjee & Firat 2007). Lest we think such features are confined to carefully constructed examples, consider Jackman’s (1980) intervention in a debate between Hewitt (1977) and Stack (1979) over a critical test of Lenski’s (1966) theory of inequality and politics, reproduced in Figure 1b. The argument is won at a glance, as the figure shows that the seemingly strong negative association between voter turnout and income inequality depends entirely on the inclusion of South Africa in the sample.
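Anscombe's point can be checked numerically. The following is a minimal sketch in plain Python (the article itself supplies no code) that computes the summary statistics for the four data sets published in Anscombe (1973). The function name `summarize` and the dictionary layout are illustrative choices, not anything taken from the article.

```python
from math import sqrt

# Anscombe's (1973) quartet: four data sets of 11 (x, y) observations each.
# Sets 1-3 share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics of a bivariate least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)           # sum of squares of x
    syy = sum((b - mean_y) ** 2 for b in y)           # sum of squares of y
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)                          # Pearson correlation
    return {"n": n, "mean_y": mean_y, "sxx": sxx,
            "slope": slope, "intercept": intercept, "r": r}


SUMMARIES = {k: summarize(x, y) for k, (x, y) in ANSCOMBE.items()}

for k, s in SUMMARIES.items():
    print(k, {name: round(value, 2) for name, value in s.items()})
```

Each of the four data sets yields a slope of about 0.5, an intercept of about 3, and a correlation of about 0.82, so none of these summaries distinguishes the sets; only plotting them does.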

    Given the power of statistical visualization, then, it is puzzling that quantitative sociology is so often practiced without visual referents. One need only compare a recent issue of the American Sociological Review or the American Journal of Sociology to Science, Nature, or the Proceedings of the National Academy of Sciences to see the radical difference in visual acuity. It is common for the premier journals in sociology to publish articles with many tables, but no figures. The opposite is true in the premier natural science journals. There, a key figure is often the heart of the article. In Nature, for example, the online table of contents includes a thumbnail of the central figure to serve as the link to the rest of the paper.

    Figure 1
    Visualizations reveal model summary failures: (a) Anscombe’s quartet (1973) shows how statistically identical data sets can look very different; (b) visualization from Jackman (1980) decisively demonstrates the influence of outlying data points in an analysis. Panel a plots y values against x values for data sets 1 through 4. For all panels, N = 11; mean of Y = 7.5; regression: Y = 3 + 0.5(X); r = 0.82; SE of slope estimate: 0.118, t = 4.24; sum of squares (X − X̄): 110. Panel b labels South Africa and shows the bivariate slope including South Africa (N = 18) and excluding South Africa (N = 17).

    It has not always been so. Early in the history of the discipline, data visualizations were common and not appreciably out of step with the wider scientific community. Exemplars of bar charts (Hart 1896), line graphs (Marro 1899), parametric density plots and dot plots with standard errors (Chapin 1924), scatterplots (Sletto 1936), and social network diagrams (Lundberg & Steele 1938) are easy to find in early sociological journal articles. Du Bois’s (1898 [1967]) The Philadelphia Negro is filled with innovative visualizations, including choropleth maps, table-and-histogram combinations, time series, and others. But somewhere along the line sociology became a field where sophisticated statistical models were almost invariably represented by dense tables of variables along rows and model numbers along columns. Though they may signal scientific rigor, such tables can easily be substantively indecipherable to most readers and perhaps at times even to authors. The reasons for this are beyond the scope of this review, although several possibly complementary hypotheses suggest themselves. First, to the extent that graphical imagery was thought of as descriptive, statistical images may have been collateral damage in the war between causal-inferential modeling and descriptive reportage. Second, figures may have seemed unsophisticated. The very clarity of a (good) figure made the work seem too simple. Third, and more charitably, visualization in sociology might have been a victim of the field’s relatively rapid embrace of quantitative methods. American sociology adopted sophisticated modeling techniques quite early compared with other social sciences. The range and variety of its research questions and data sources meant that the statistical tool kit in sociology in the late 1960s and into the 1970s was more varied than in economics or psychology at the time and much more developed than what was then current in political science. But this was also a period when the visualization tools of statistical software lagged well behind their strictly computational abilities. Conventions of data presentation may have standardized at a time when the possibilities for visualization were narrower. Finally, some of the resistance to figures may have come from the fact that the tables in early journal articles and monographs often contained actual data rather than summaries or model results. In a review of a history of graphical methods in statistics written in 1938, John Maynard Keynes remarked that he wished the author

    could have added a warning, supported by horrid examples, of the evils of the graphical method unsupported by tables of figures. Both for accurate understanding, and particularly to facilitate the use of the same material by other people, it is essential that graphs should not be published by themselves, but only when supported by the tables which lead up to them. It would be an exceedingly good rule to forbid in any scientific periodical the publication of graphs unsupported by tables. (Keynes 1938, p. 282, emphasis added)

    To speak anachronistically, here Keynes is arguing that economists need the underlying data along with the visual summary for the sake of reproducibility. We are now at a point when the volume of data used in a typical quantitative article far exceeds what can be presented in a series of tables. But Keynes’s point is worth bearing in mind. The utility of visualization methods (in particular their ability to effectively summarize large quantities of data or sophisticated modeling techniques) is partly dependent on related advances in our ability to easily share data and reproduce analyses. If data are accessible as needed, using figures instead of tables becomes much easier. Not coincidentally, this is another area where sociology has lagged behind other social sciences (Freese 2007).

    Whatever their relative importance, the net result of these processes for sociology has been a training and publication standard that rarely includes graphical treatments of statistics. New students are typically not taught to think about graphics and statistics in a consistent, coherent way.

    Our argument is not that sociologists should be producing more visualizations just because everyone else is doing it. Indeed, as we discuss below, there is considerable debate about what sort of visual work is most effective, when it can be superfluous, and how it can at times be misleading to researchers and audiences alike. Just like sober and authoritative tables, data visualizations have their own rhetoric of plausibility. Anscombe’s quartet notwithstanding, summary statistics and modeling can be thought of as tools that deliberately simplify data to let us see past the cloud of data points. We do not think visualization will give us the right answer simply by looking. Rather, we should think about how visualization might be more effectively integrated into all stages of our work. Software now makes routinely generating figures easier than ever. Even if many disciplinary journals still lag in their editorial desire or ability to present good data visualizations, we argue that it is time for these methods to be fully integrated into sociology’s research process.

    VISUALIZATION IN PRINCIPLE

    Book-length treatments of good statistical visualization practice abound. Their content ranges from the more theoretical (emphasizing, for instance, the nature and origins of visual conventions) to more pragmatic collections of current best practices meant to serve as an inspiration to practitioners. In between are efforts to codify practice and develop taste, and guides to working implementations. The most influential general treatments are probably Bertin’s (1967 [2010]) Semiology of Graphics, Cleveland’s The Elements of Graphing Data (1994) and Visualizing Data (1993), and Wilkinson’s (1995 [2005]) The Grammar of Graphics. Overviews of contemporary practice can be had in Few (2009, 2012) and Yau (2012). There are also several books based specifically on visualization techniques within a particular software program, such as Friendly (2000) for SAS, Mitchell (2012) for Stata, Murrell (2011) for R, and Kleinman & Horton (2013) for comparisons of multiple programs. Sometimes the graphical capabilities of particular software applications are loosely related to the more theoretical work, taking from them a concern with aesthetic principles and possibly specific sorts of plots. In other cases, the linkage is closer. Sarkar (2008) describes a data visualization package for R that closely follows Cleveland’s ideas (and some earlier associated software), and Wickham (2009, 2010) describes a software package for R that implements and extends principles worked out in Wilkinson’s (1995 [2005]) The Grammar of Graphics.

    The conceptual literature is deep and comprehensive, although its representatives do not always speak in one voice. This is to be expected in an area where theoretical development involves judgments of taste. The best-known critic and tastemaker by far in the field is Edward R. Tufte. It is fair to say that The Visual Display of Quantitative Information (Tufte 1983) is a classic in the field, and its three follow-up texts are also widely read (Tufte 1990, 1997, 2006). Described as “self-exemplifying” (Tufte 2006, p. 10), the bulk of the work is a series of negative and positive examples with more general principles (or rules of thumb) extracted from them rather than a direct guide to practice, akin more to a reference book on ingredients than to a cookbook for daily use in the kitchen. At the same time, Tufte’s early work in political science shows that he applied his ideas well before codifying them in this way. His Political Control of the Economy (Tufte 1978) combines data tables, figures, and text in a manner that remains remarkably fresh almost 40 years later.

    Across his work, Tufte preaches a consistent set of principles, though they vary in their degree of specificity. Thus,

    Graphical excellence is the well-designed presentation of interesting data—a matter of substance, of statistics, and of design. . . . [It] consists of complex ideas communicated with clarity, precision, and efficiency. . . . [It] is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space. . . . [It] is nearly always multivariate. . . . And graphical excellence requires telling the truth about the data. (Tufte 1983, p. 51)

    Tufte illustrates the point with Charles Joseph Minard’s famous visualization of Napoleon’s march on Moscow, reproduced in Figure 2. He remarks that this image “may well be the best statistical graphic ever drawn,” and argues that it “tells a rich, coherent story with its multivariate data, far more enlightening than just a single number bouncing along over time. Six variables are plotted: the size of the army, its location on a two-dimensional surface, direction of the army’s movement, and temperature on various dates during the retreat from Moscow” (Tufte 1983, p. 40). It is worth noting how different Minard’s image is from most contemporary statistical graphics. Until recently, these have tended to be generalizations of the scatterplot or barplot, either in the direction of seeing more data or seeing the output of models. The former looks for ways to increase the volume of data visible, the number of variables displayed within a panel, or the number of panels displayed within a plot. The latter looks for ways to see results of models: point estimates, confidence ranges, predicted probabilities, and so on. Tufte (1983, p. 177) acknowledges that a tour de force such as Minard’s “can be described and admired, but there are no compositional principles on how to create that one wonderful graphic in a million.” The best one can do for “more routine, workaday designs” is to suggest some guidelines such as “have a properly chosen format and design,” “use words, numbers, and drawing together,” “display an accessible complexity of detail,” and “avoid content-free decoration, including chartjunk” (p. 177).

    Figure 2
    Minard’s visualization of Napoleon’s advance on and retreat from Moscow is a classic of visualization, but its design is in many ways atypical.

    Among this set of general goals are some specific details that can be employed to good use across applications. This includes extensive use of layering and separation, for example, building on the insights of good cartography. Judicious use of stroke weight and color allows one to layer multiple meanings on a single visual plane. The ability to successfully pull off such effects depends on use of the smallest effective difference: lighter lines, smaller color variations, and simpler textures. It has long been a complaint of chart designers that accomplishing this often means working very much against the (highly detailed, drop-shadowed, rich, Corinthian leather) grain of the default settings in spreadsheet or other chart-making applications. Comparison and evaluation are often enhanced by the use of many small multiples: plots that repeatedly display some reference variable or relationship (e.g., gross domestic product versus health care costs over time) and iterate across some other variable of interest (e.g., country) in an ordered fashion (see also Bertin 1967 [2010], pp. 217–45). The use of such multiples highlights the notion of parallelism that allows a reader to carefully compare across instances of similar-but-crucially-different items. Combined, these features facilitate a simultaneous micro and macro reading where key points are clearly communicated at the surface, but deeper meaning is obtained through careful review and exploration.

    A common complaint about Tufte’s work is that there are so few direct instructions. Busy cooks want a cookbook, not a picture of a fantastic meal. The tendency for the codification of data visualization to vacillate between overly abstract maxims and overly specific examples is characteristic of any craft where a practical sense of how to proceed (a taste or feeling for the right choice) matters for successful execution. A long-standing and plausible response to the problem is to have the designer make many of the judicious choices in advance and then embed them for users in the default settings of graphics applications. Given that graphical software aimed at regular users has been around for several decades now, however, these efforts have proven less successful than initially hoped. In the foreword to the new edition of Semiology of Graphics, Howard Wainer (2010, p. xi) reflects on the hope he and others once felt that easy-to-use graphical tools and software would lead to better general practice by way of smarter defaults. But, he argues, this has not happened. In the end, high-quality graphical presentation requires crafting a deliberately designed message rather than accepting the pre-established setting. Recent theoretical work explicitly recognizes the limits of relying on defaults. Following Wilkinson in implementing ggplot’s “grammar of graphics” for R, Wickham (2010, p. 3) notes that the analogy to grammar is useful because although “[a] good grammar will allow us to gain insight into the composition of complicated graphics, and reveal unexpected connections between seemingly different graphics[,] . . . there will still be many grammatically correct but nonsensical graphics. . . . [G]ood grammar is just the first step in creating a good sentence.”

    If software defaults cannot enforce the elements of good taste, the next best (or maybe better) thing is a means to easily expose the mechanics of good practice. One of the most positive developments in statistical software over the past 15 years has been its integration with a much broader set of tools built to facilitate the sharing of both data and code. The first wave of modern statistical graphics and information design could convey, in print, the general principles and the quality products. But the crucial piece in between (the design process and practical assembly) remained opaque. Subsequently, communities of users began to share not just output but code much more widely, whether under the auspices of a for-profit developer (as in the case of Stata) or actively backed by free or open-source licensed platforms (as with R) or expert user blogs (http://sas-and-r.blogspot.com, http://flowingdata.com, http://www.r-statistics.com/tag/visualization). Some of these have developed into comprehensive references aimed at the practicing researcher (Chang 2013). Most recently, pastebins and software development platforms backed by distributed version control systems (most notably GitHub) have made sharing code both technically much easier and normatively expected.

    As with the move toward replication data sets, everyday sharing of code allows novices to look behind the curtain much more easily than before. And perhaps unlike the earlier emphasis on accepting sensible defaults, it encourages new users to tinker with various methods and learn by doing. In many cases, software now allows users to control very detailed layout elements in their program scripts, which (with a little extra language work) allows one to override defaults with principled graphical choices. This ongoing integration of guidebooks, how-to websites, code repositories, and fully reproducible examples is a major step forward for improving visualization practice. As one particularly well-developed example among many, UCLA’s Institute for Digital Research and Education has a large library of worked graphical examples implemented across several statistics packages (http://www.ats.ucla.edu/stat/dae). Finally, because most statistical packages can now produce graphics as editable vector graphics files, one can use any graphical editor to fine-tune elements (such as line thickness, greater subtlety in color selection, etc.) for production.

    These developments do not make questions of judgment and good practice go away. Statistical visualization needs to be thought of as part and parcel of analysis and presentation. We should be crafting visualizations thoughtfully in the same way we craft arguments or build models. Resources of this sort cannot by themselves guarantee that code snippets will not simply be mechanically copied or inappropriately applied by users looking for a shortcut to a good outcome. But, to paraphrase Keynes from a different context, they do seem to promise if not civilized visualization, at least the possibility of civilized visualization.

    VISUALIZATION IN PRACTICE

    We have argued that there are several promising ways that general principles of visualization can become more tangible in everyday use. We now turn to the question of current practice in a little more detail. Here we follow the common distinction between visualization for exploration versus presentation of a final finding. The former is meant for internal consumption, as the researcher examines the data to figure out what is going on; the latter is designed to convince a wider audience. Naturally, these processes overlap to some degree. The general principles covered in the previous section (regarding clarity, honesty, showing the data, and so on) apply equally to both the backstage and frontstage of visualization work. But what is needed in each case does differ. Some recent developments on each side are worth highlighting.

    Exploring the Data

    Graphical methods are now well integrated into the process of checking assumptions and robustness in most statistical packages and are often generated by default. Figure 3 shows a typical example of some diagnostic plots of an ordinary least squares regression. They were produced on demand and by default, with no further tweaking or polishing. Note that although we voiced some skepticism above about the ability of defaults to shape practice, these plots are models of clarity. They could be called into service for presentation purposes in a pinch. Their real utility, however, is the ease with which they can be produced and viewed as part of one’s everyday workflow as a social scientist: With tools like these, comments on outliers such as Jackman’s (1980) should never again be necessary.

    Figure 3
    Default diagnostic plots for a linear model: (a) R, (b) SAS. Though automatically produced, both panels present information clearly and with judicious use of labeling and color. Panel titles in a: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage. Axes: fitted values, residuals, theoretical quantiles, standardized residuals, leverage.
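The leverage shown in diagnostic displays of this kind rests on a simple quantity. As a minimal sketch, assuming a bivariate regression with an intercept, the leverage (hat value) of observation i is h_i = 1/n + (x_i − x̄)² / Σ(x_j − x̄)². The code below applies this to the x values of the fourth data set in Anscombe's (1973) quartet. The cutoff of 2p/n (with p = 2 estimated coefficients) is a common rule of thumb and an assumption here, not something taken from the article; the names `leverages`, `X4`, and `flagged` are illustrative.

```python
def leverages(x):
    """Hat values for a least-squares regression of y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]


# x values of the fourth data set in Anscombe's (1973) quartet.
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

h = leverages(X4)

# Rule-of-thumb cutoff (an assumption, not from the article): 2p/n with
# p = 2 coefficients (intercept and slope).
threshold = 2 * 2 / len(X4)
flagged = [i for i, hi in enumerate(h) if hi > threshold]

for i, hi in enumerate(h):
    marker = "  <- high leverage" if i in flagged else ""
    print(f"obs {i + 1:2d}: x = {X4[i]:2d}, leverage = {hi:.3f}{marker}")
```

Only the eighth observation (x = 19) is flagged. Its leverage is 1, which means the fitted line is forced through that point and the slope is determined entirely by it; this is the numerical counterpart of what a Residuals vs Leverage panel makes visible at a glance.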

    Diagnostic plots of this kind are, in principle, what you look at after a model has been chosen. They are confirmatory rather than strictly exploratory. Advocacy of exploratory data analysis (EDA), of looking carefully and creatively before modeling, is most closely associated with John Tukey (1972, 1977). Historically, EDA has been closely tied to the rise of graphical capabilities in statistical computing, particularly tools that allow rapid interactive visualization. A mild sense of unease with EDA is a feature of the statistical literature. The approach is explicitly inductive and concerned with exploring data in a relatively freewheeling fashion as an aid to discovery, which at times can seem uncomfortably opportunistic or unstructured. To working social scientists these are often virtues, but statistics is also the discipline where the avoidance of spurious associations is a major focus of technical work.

    As data sets have continued to increase in both size and dimensionality, and as computing power and graphical methods have tried to keep up, there has been a rapprochement between the strictly exploratory and strictly confirmatory approaches. Working social scientists routinely explore their data as part of the process of cleaning and checking it. It would be naive to think researchers were not on the lookout, literally, for interesting patterns in complex data sets. Recent developments in EDA have focused on extending established methods of easily looking at a lot of data at once, and on developing new ways for visually checking the validity of apparent relationships. The idea is to make the exploratory a little more confirmatory.

    A first useful tool for this sort of exploration is a generalized scatterplot matrix. In a standard pairs plot, the goal is to see all the bivariate relationships in the data at once, presented in a grid so that quick comparisons can easily be made. An unfortunate limitation, particularly for the social sciences, is that these plots do a poor job with categorical variables. Ideally we would like to see the panels of the matrix display the data in a form appropriate to the underlying variable. A generalized pairs plot (Emerson et al. 2013) accomplishes this, using barcode plots, boxplots, mosaic plots, and other methods. Figure 4 shows an example. The specific software implementation adds additional functionality, including the ability to display different plots (such as barcode and mosaic plots) in the upper and lower triangles of the plot matrix, histograms along the main diagonal, and the option of adding smoothed or linear regression lines to panels.
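The core idea of a generalized pairs plot, choosing each panel's display from the types of the two variables involved, can be sketched as a small dispatch table. The sketch below draws nothing and is not the implementation described by Emerson et al. (2013). The particular pairing of plot types to variable types, the bar chart on the diagonal for categorical variables, the type heuristic, and the toy columns are all assumptions made for illustration.

```python
def variable_kind(values):
    """Crude heuristic (an assumption): numbers are continuous, anything else categorical."""
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                  for v in values)
    return "continuous" if numeric else "categorical"


# Hypothetical mapping from (row kind, column kind) to a panel type. Using
# different displays above and below the diagonal mirrors the idea that the
# two triangles of the matrix can show the same pair in different ways.
PANEL_FOR = {
    ("continuous", "continuous"): "scatterplot",
    ("categorical", "continuous"): "boxplot",
    ("continuous", "categorical"): "barcode plot",
    ("categorical", "categorical"): "mosaic plot",
}


def pairs_layout(columns):
    """Return {(row, column): panel type} for every cell of the plot matrix."""
    names = list(columns)
    kinds = {name: variable_kind(columns[name]) for name in names}
    layout = {}
    for row in names:
        for col in names:
            if row == col:
                # Main diagonal: a univariate display of the variable itself.
                layout[(row, col)] = ("histogram" if kinds[row] == "continuous"
                                      else "bar chart")
            else:
                layout[(row, col)] = PANEL_FOR[(kinds[row], kinds[col])]
    return layout


# Toy columns, invented purely for illustration.
columns = {
    "income": [31.0, 45.5, 52.0, 28.3],
    "age": [34, 51, 45, 29],
    "region": ["north", "south", "south", "west"],
}

layout = pairs_layout(columns)
for cell, panel in layout.items():
    print(cell, "->", panel)
```

A plotting layer would then draw each cell according to its assigned panel type. Keeping the type-to-display mapping in one table makes it easy to swap in other displays without touching the layout logic.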

    Generalized pairs plots can be extended even further, depending on the software, by allowing further partitioning within panels. For instance, we can show separate histograms of a continuous variable broken out by the values of a categorical variable. Multipanel plots are intrinsically rich in information. When combined with several within-panel types of representation and a large number of variables, they can become quite complex. But, again, the main utility of this approach is less in the presentation of finished work (although it can certainly be useful for that) and more in the way it enables the working researcher to quickly investigate aspects of her own data. The goal is not to pithily summarize a single point one already knows, but to open things up for further exploration. Harrell (2001) remains an exemplary book-length demonstration of the virtues of integrating graphical methods with the process of data exploration (including exploring patterns of missingness in the data) right across the process of model building, diagnostics, and presentation.

    With many variables and large amounts of data, a square matrix of plots can become unwieldy even to the trained eye. Seeing more


    Figure 4. A generalized pairs plot handles categorical data easily, and in different ways.

    data more quickly, and in particular exploring high-dimensional data in a controlled way, has been a focus of recent visualization research. Early work, going back to Tukey and others, allowed for the exploration of data in three dimensions, for instance by way of rotating a cloud of points on a screen. This sort of approach “demoed well,” as spinning around a cloud of colored points looks quite impressive to the casual observer. But interpreting these displays is another matter. Thus, methods for interactively exploring


    Figure 5. A correlation matrix represented as a tiled heat map (upper triangle) with color-keyed correlation coefficients (lower triangle). [Variables shown: cerebvas, pubhealth, pubtopriv, proglib.trim, assault, donors, roads, external, gdp, health, pop, pop.dens, tradcon.trim. The color key spans correlations from −1 to 1.]

    data sets advanced on two fronts. The first moved toward further development of multiple panels, notably with innovative ways of visually conditioning on additional variables or highlighting interactively selected cases across panels. Co-plots, shingles, and contour or surface plots are all examples of this kind of development (Cleveland 1993, pp. 186–271; Sarkar 2008, pp. 67–115). Increasingly, these methods take advantage of color for presenting data, as with heatmaps or tiled representations of a correlation matrix (see Figure 5). Tools for permuting correlation matrices, either in the order produced by factor-analytic techniques or other direct optimization, allow one to identify higher-order patterns in such figures (Breiger & Melamed 2014).
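
    To make the idea of permuting a correlation matrix concrete, the sketch below computes Pearson correlations in plain Python and then reorders the variables with a simple greedy chain, so that strongly positively correlated variables end up adjacent in the tiled display. The greedy rule is a stand-in chosen for brevity; it is not the factor-analytic ordering or the optimization described by Breiger & Melamed (2014), and the data and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(data):
    """Nested dict of pairwise correlations, keyed by variable name."""
    names = list(data)
    return {a: {b: pearson(data[a], data[b]) for b in names} for a in names}


def greedy_order(corr):
    """Order variables so each one follows its most correlated predecessor.

    Assumption: a simple greedy chain, used here only for illustration.
    """
    names = list(corr)
    # Start from the variable with the largest total absolute correlation.
    start = max(names, key=lambda a: sum(abs(corr[a][b]) for b in names if b != a))
    order = [start]
    remaining = [name for name in names if name != start]
    while remaining:
        nxt = max(remaining, key=lambda b: corr[order[-1]][b])
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical data: "a" and "c" move together, as do "b" and "d".
data = {
    "a": [1, 2, 3, 4, 5],
    "b": [5, 4, 3, 2, 1],
    "c": [2, 4, 6, 8, 11],
    "d": [5, 3, 4, 1, 2],
}
corr = correlation_matrix(data)
order = greedy_order(corr)
print(order)
```

    Drawing the heat map with rows and columns in this order groups similar variables into visible blocks along the diagonal, which is the kind of higher-order pattern a permuted matrix is meant to reveal.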

    A second direction has been the development of parallel coordinate plots, which show multiple variables side by side in a way that allows for the visualization of both specific outliers and clusters of association across many variables at once (Moustafa & Wegman 2006, Inselberg 2009). Figure 6 gives a simple


    Figure 6. A parallel coordinates plot highlighting a possibly relevant grouping variable. [Axes: variable (x) and value (y, from −2.5 to 2.5). Lines are keyed by the grouping variable World: Corporatist, Liberal, SocDem.]

    example, although the approach is best suited to much larger numbers of variables and observations than shown here. This sort of plot also benefits from being used interactively, as the ordering of the variables (and the highlighting of possible grouping variables) can change the interpretability of the graph quickly. The GGobi system, for example, is designed to provide interactive, semiautomated facilities for “touring” large, high-dimensional data in real time using parallel plots and a variety of other methods (Cook & Swaine 2007).
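
    The data preparation behind a parallel coordinates plot is simple enough to sketch directly. The example below assumes that each variable is standardized to z-scores before plotting (a common choice, though not one the text specifies) and returns one polyline per observation. The variable names and values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores).

    Assumes the values are not all identical, so the spread is nonzero.
    """
    center, spread = mean(values), pstdev(values)
    return [(v - center) / spread for v in values]


def parallel_coordinates(table, axis_order):
    """Return one polyline per observation as a list of (axis, z) points."""
    scaled = {name: standardize(table[name]) for name in axis_order}
    n_rows = len(table[axis_order[0]])
    return [
        [(axis, scaled[name][row]) for axis, name in enumerate(axis_order)]
        for row in range(n_rows)
    ]


# Hypothetical values for four cases on three variables.
table = {
    "gdp": [20.0, 25.0, 30.0, 35.0],
    "health": [6.0, 7.5, 8.0, 10.5],
    "donors": [12.0, 30.0, 18.0, 24.0],
}
lines = parallel_coordinates(table, ["gdp", "health", "donors"])
reordered = parallel_coordinates(table, ["donors", "gdp", "health"])
print(lines[0])
print(reordered[0])
```

    Changing `axis_order` leaves every standardized value untouched but changes which variables sit next to each other, and therefore which line crossings the eye can see. That is why interactive reordering matters so much for this kind of display.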

    This broad EDA tradition has recently begun to reconnect with the model-checking or diagnostic approach, with convergence happening from both directions. The long-standing concern here is that a striking visualization might not correspond to any robust underlying phenomenon. Early advocates of data visualization typically presented a “parade of horribles” (e.g., Wainer 1984) showing how bad visual presentation can distort or misrepresent the data. But even properly presented visualizations can be vulnerable to spurious pattern attribution on the part of researchers and observers. From the EDA side, Wickham et al. (2010) and Buja et al. (2009) provide some principled ways for assessing, in a broadly graphical manner, whether or not the patterns one is seeing are likely to be spurious. For example, a permutation lineup presents observed data in a small-multiple context surrounded by null plots of generated data. “Which plot shows the real data?” Buja et al. (2009, p. 4372) ask. If observers cannot reliably pick it out, then we should doubt both the utility of the plot and the soundness of any inferences (or arguments) based on it. From the modeling side, Gelman (2004, pp. 773–74) argues that a Bayesian approach provides a principled framework for assessing “the implicit model checking involved in virtually any data display.”
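
    The permutation lineup described above can be sketched as follows. The real (x, y) pairs are hidden at a random position among panels in which y has been shuffled, which breaks any association with x while preserving both marginal distributions. The function name, the default of 20 panels, and the data are assumptions for illustration; this is not the code used by Buja et al. (2009) or Wickham et al. (2010).

```python
import random


def lineup(x, y, n_panels=20, seed=None):
    """Hide the observed data among permuted 'null' panels.

    Returns (panels, real_position). Each panel is a list of (x, y) pairs.
    """
    rng = random.Random(seed)
    real_position = rng.randrange(n_panels)
    panels = []
    for position in range(n_panels):
        y_panel = list(y)
        if position != real_position:
            # Null panel: permuting y removes any x-y association.
            rng.shuffle(y_panel)
        panels.append(list(zip(x, y_panel)))
    return panels, real_position


# Hypothetical data with a strong linear trend.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
panels, answer = lineup(x, y, n_panels=8, seed=1)
print("real data is in panel", answer)
```

    A viewer who is guessing at random picks the real panel with probability 1/`n_panels`, so reliably spotting it is evidence that the pattern is not something the eye would find in noise.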


    Although we have argued that sociologists have been relatively slow to adopt data visualization, several of the issues we have discussed have independently appeared within the sociological literature. Sociologists routinely deal with data where almost all the variables of interest are categorical, for example. And, as noted above, the routine and effective display of categorical data (especially cross-classified categorical data) has not been a trivial problem to solve. Furthermore, sociology has a long tradition of using methods that reduce high-dimensional data in some way, especially via factor analysis, principal components, correspondence analysis, or other related methods. In Distinction, for example, Bourdieu (1984, pp. 128–29, 262, 266, 343) presents his analysis of the space of French social class and taste in a way that is both highly visual and, for some critics, decidedly difficult to interpret. This family of methods lends itself to suggestive visualization in what might be called a configurational mode. This is somewhat inimical to the Anglo-American tradition of seeking causal relations in statistical models. Breiger (2000) provides a useful discussion of some of the issues here, emphasizing points of convergence.

    Dimensional reduction of this sort typically characterizes the problem of interest in terms of space or distance, which naturally encourages the mapping of social systems. Sociologists have been among the earliest users of these visualization tools, particularly with network analysis. The earliest interactive network tools were literally peg boards and rubber bands (Freeman 2004) or pins-and-strings.¹ Interactive exploration of social network data has obviously been made much easier with the advent of efficient computer programs. Released in 1996, PAJEK was one of the earliest completely interactive visualization tools that was also optimized for large networks. Earlier software typically separated the visualization and analysis steps. There has since been rapid growth in the development

    ¹ See http://www.soc.duke.edu/~jmoody77/VizARS/sna_peg.jpg.

    of interactive network exploration tools, including on the web (http://www.theyrule.net, http://dirtyenergymoney.com). The challenge for such work is excess reduction in the inherent complexity of the data, which has led methodologists to propose fit statistics for network layouts (Moody et al. 2005, Brandes et al. 2012).

    The rapid availability of fully dynamic network data has created opportunities and challenges for visualization. Network movies, for example, allow one to capture the relational dynamics as they unfold in space and time (Moody et al. 2005, Bender-deMoll et al. 2008, Morris et al. 2009). The clear advantage of a network movie is that one can reserve the two dimensions of the visual plane for mapping the topography of the social system and watch the shape of the system change as the animation runs. This is particularly useful for exploration, as it makes visible dynamic features that are otherwise difficult to capture in summary statistics. But there are also costs. People tend to have poor visual memories, so comparing nonadjacent moments in time is challenging, and the analyst must make strong assumptions about how to aggregate the network events over time. Similar visualization challenges are becoming common in dynamic statistical displays, such as the GapMinder data set, which allows one to explore associations over time (http://www.gapminder.org).

    Presenting the Results

    These considerations lead naturally to the question of presenting data. Most of the principles discussed above regarding the construction of figures for exploring data also apply to presenting it, if only because the audiences are often the same, that is, experts in a particular field. But effective statistical graphics have a rhetorical aspect, too (Kostelnick 2008). In general, the goal is to look for ways of presenting the data that are both effective with respect to one’s argument and honest with respect to the data.

    Though conceptually simple and among the earliest examples of statistical visualizations,


    Figure 7. The distribution of authors’ lifetime number of publications in three very selective sociology journals is highly skewed. In comparison to a standard histogram (a), a log-log histogram (b) is much better at revealing details in the “long tail” of the distribution. [Both panels plot authors (%) against total number of lifetime publications; panel b uses logarithmic axes.]

    variable distributions remain of keen substantive interest. Many of the distributions typically studied in sociology are extremely skewed and difficult to display as simple histograms. Consider, for example, some data on the number of times authors publish in a select set of journals (here the American Sociological Review, American Journal of Sociology, and Social Forces) over the course of their career. Figure 7a presents a standard histogram, whereas Figure 7b follows the convention now common in the physical sciences of presenting the distribution on a log-log scale.
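
    The transformation behind a log-log display of a skewed count distribution is small enough to show directly. The sketch below tallies the percentage of authors at each publication count and then takes base-10 logarithms of both coordinates. The counts are invented for illustration and are not the data behind Figure 7.

```python
from collections import Counter
from math import log10


def percent_by_count(publication_counts):
    """Percentage of authors at each lifetime publication count."""
    n_authors = len(publication_counts)
    tally = Counter(publication_counts)
    return {k: 100 * tally[k] / n_authors for k in sorted(tally)}


def loglog_points(percentages):
    """Coordinates for a log-log plot; counts must be at least 1."""
    return [(log10(k), log10(p)) for k, p in percentages.items()]


# Hypothetical sample: most authors publish once, a few publish a lot.
counts = [1] * 80 + [2] * 12 + [4] * 5 + [10] * 2 + [100] * 1
percentages = percent_by_count(counts)
points = loglog_points(percentages)
for k, p in percentages.items():
    print(k, p)
```

    On a linear scale the bar for a single publication dwarfs everything else. On logarithmic axes the rare, highly prolific authors occupy as much visual space as the common case, which is what makes the long tail legible.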

    When comparing distributions across categorical variables, comparative boxplots allow one to examine multiple moments of a distribution across multiple categories or over time (with some loss of resolution). The presentation of joint distributions of multiple categorical variables has similarly been improved with area-accurate Venn diagrams (see, for example, http://www.eulerdiagrams.org/eulerAPE). An important contribution to this literature is the work of Handcock & Morris (1999) on relative distribution methods. By comparing the ratio of two distributions at each point along the x-axis, one is quickly able to identify differences in both shape and central tendency. Figure 8 reproduces the relative distribution in permanent wage growth for two cohorts of the National Longitudinal Survey. If the wage distributions were identical, the density would be a simple horizontal line at 1.0; instead we see much greater inequality (heavier tails at both ends) in the recent cohort.
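
    A decile version of the relative distribution is easy to compute by hand. The sketch below locates each comparison value within the reference distribution, counts how many comparison values fall into each reference decile, and divides by the 10% that would be expected if the two distributions were identical. It covers only the decile bar chart, not the smooth density estimate of Handcock & Morris (1999), and the data are hypothetical.

```python
from bisect import bisect_left


def relative_density_by_decile(reference, comparison):
    """Relative density of `comparison` within each decile of `reference`.

    A value of 1.0 means the comparison group is represented in that
    decile exactly as often as the reference group.
    """
    ref = sorted(reference)
    counts = [0] * 10
    for value in comparison:
        # Number of reference values strictly below this comparison value.
        rank = bisect_left(ref, value)
        # Integer arithmetic keeps decile boundaries exact; clamp the top end.
        decile = min(rank * 10 // len(ref), 9)
        counts[decile] += 1
    return [count * 10 / len(comparison) for count in counts]


reference = list(range(1, 101))
same = relative_density_by_decile(reference, reference)
# Hypothetical comparison group with all its mass in the two tails.
polarized = relative_density_by_decile(reference, [0] * 50 + [101] * 50)
print(same)
print(polarized)
```

    Identical distributions give a flat profile at 1.0 in every decile, whereas the polarized comparison group piles up at both ends. This is the same qualitative signature of rising inequality that the text describes for the recent cohort.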

    A related problem involves effectively displaying trends over time, particularly when attempting to demonstrate strong variability across units. The convention of reserving the x-axis for time and the y-axis for magnitude becomes tricky if many series are given equal weight. An effective solution involves carefully choosing colors, line weights, and labels to highlight a particular strand among many (see Figure 10 below). Moody et al. (2011) are able to demonstrate the wild variability in adolescent popularity sequences by generating a scatterplot of trajectory summaries with exemplar labels.² Because each position in the

    ² See http://www.soc.duke.edu/~jmoody77/VizARS/Figure5.jpg for trendspace; http://www.soc.duke.edu/~jmoody77/VizARS/Figure%206.pdf for application of this space to model prediction outcomes.


    Figure 8. The relative probability density function distribution of permanent wage growth in the original and recent National Longitudinal Survey cohorts. A decile bar chart is superimposed on the density estimate. The upper axis is labeled in permanent differences in log wages (adapted from Handcock & Morris 1999). [Axes: proportion of the original cohort (x, 0 to 1) and relative density (y, 0 to 3).]

    field captures a unique trend, the distributional coverage of the space suggests there is no typical sequence.

    Moving beyond simple variable comparison displays, the bulk of statistical work in sociology involves complex multivariate models. Even with good statistical training, tables of coefficients are hard to decipher quickly and tend to foreground statistical significance over substantive magnitudes. Straightforwardly interpreting the effects of independent variables is rarely intuitive, especially for models with complex link functions, categorical components, or interaction terms. Although odds ratios are margin free and thus nominally interpretable, knowing whether an effect is substantively large is often difficult without comparative context and may be impossible to discern directly from the table without intimate knowledge of the underlying distribution of control variables. The simplest solution to this problem is to use the model to predict outcome variables at different levels or combinations of the independent variables of interest. Figure 9a shows a powerful example from Mirowsky & Ross (2007). They use a new style of vector graphs for latent growth models by age (see Mirowsky & Kim 2007) to display predicted values from interaction terms. This enables them to take results from a complex structural equation model of people’s perceived sense of control and simultaneously illustrate both within-cohort and between-cohort changes at varying levels of education in a way that would be otherwise very difficult to represent.

    The figure allows one to identify changes within cohorts (change within vector) and over time (sequence of arrows by group). Here we see that high school dropouts have a lower sense of control overall but a dramatic drop in sense of control during youth that levels out as they age. College-educated respondents, in contrast, have a generally high sense of control that is continuously optimistic through adulthood, turning negative only after about age 60. Recent advances in the use of statistical graphics


    Figure 9. (a) Vector diagram for latent trajectory model of perceived control by age, cohort, and education (adapted from Mirowsky & Ross 2007, with permission from the University of Chicago Press). (b) Predicted probabilities and standard errors plotted from a multinomial model (adapted from Fox & Hong 2009). [Panel a: predicted sense of control by age (18 to 90 years) for respondents with a college degree, a high school degree, or no high school degree. Panel b: probability by attitude toward Europe for Labour, Liberal Democrat, and Conservative, at knowledge levels 0 to 3.]

    for model interpretation include estimates of the uncertainty of the model predictions. Most software now provides easy access to model predictions from the data, and this allows one to provide results under varying scenarios (see, for example, Alkema et al. 2011). In this case, the hard work is done before the plot is made. Figure 9b shows a series of predicted probabilities from a multinomial model at different levels of various predictors and outcomes, with appropriate standard errors shown. Here no conceptual advances are needed on the


    Figure 10. Network exemplar of moving between software default and presentation results, using the Colorado Springs HIV risk network: (a) default PAJEK view; (b) edited for presentation. Subtle adjustments to line widths and color palettes and the addition of a centrality scale (closeness centrality, low to high) greatly aid interpretability in (b).

    graphical side, just the ability to get information out of the model in a readily interpretable form (Fox 2003, Fox & Hong 2009).
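
    Getting predictions out of a multinomial model amounts to evaluating the fitted equations over a grid of predictor values. The sketch below does this for a multinomial logit with made-up coefficients. The party labels echo Figure 9b, but the numbers, the single predictor, and the function name are hypothetical, and no standard errors are computed.

```python
from math import exp


def predicted_probabilities(coefficients, profile):
    """Multinomial-logit probabilities for one covariate profile.

    `coefficients` maps each outcome to its intercept and slopes; the
    reference outcome has all coefficients set to zero.
    """
    scores = {
        outcome: beta["intercept"]
        + sum(beta[name] * value for name, value in profile.items())
        for outcome, beta in coefficients.items()
    }
    denominator = sum(exp(score) for score in scores.values())
    return {outcome: exp(score) / denominator for outcome, score in scores.items()}


# Hypothetical coefficients; "Conservative" is the reference category.
COEFFICIENTS = {
    "Conservative": {"intercept": 0.0, "europe": 0.0},
    "Labour": {"intercept": 1.0, "europe": -0.2},
    "Liberal Democrat": {"intercept": 0.5, "europe": -0.1},
}

# Evaluate the model over a grid of values of the predictor.
grid = {level: predicted_probabilities(COEFFICIENTS, {"europe": level})
        for level in (1, 6, 11)}
for level, probabilities in grid.items():
    print(level, {party: round(p, 3) for party, p in probabilities.items()})
```

    Each row of the resulting grid is one point on a predicted-probability curve. Plotting the rows against the predictor, one line per outcome, gives a display that readers can interpret without decoding a table of log-odds.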

    The distance between exploratory and presentation graphics is most pronounced as the density of information necessary to display increases. Network images are particularly interesting in this case. A little effort with layering and coloring makes a real difference. Consider also Figure 10, which shows a before and after of the same data. The basic layout is retained (with the addition of a little jittering to alleviate algorithmically induced stacking), but the result is much more interpretable.

    Recent work on constructing visually interpretable social networks has focused on careful data reduction, either by suppressing nodes entirely in favor of contour-style diagrams (Moody 2004, Moody & Light 2006) or by deleting or bundling edges to highlight structure (Crnovrsanin et al. 2014). Other work has focused explicitly on quantifying the layout model using stress or multidimensional scaling–related techniques (Frank & Yasumoto 1998, Brandes & Pich 2006, Brandes et al. 2012; see Lima 2011 for exemplars).
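
    One way to make the notion of layout stress concrete is the raw stress familiar from multidimensional scaling: the sum, over all connected pairs of nodes, of the squared difference between their distance on the page and their shortest-path distance in the network. The sketch below is a generic, unweighted version with a toy graph; it is not claimed to match the specific statistics used in the works cited above.

```python
from collections import deque
from math import dist


def graph_distances(adjacency, source):
    """Shortest-path lengths (in hops) from `source`, by breadth-first search."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


def layout_stress(adjacency, positions):
    """Raw stress: squared mismatch between drawn and graph distances."""
    nodes = sorted(adjacency)
    total = 0.0
    for i, u in enumerate(nodes):
        hops = graph_distances(adjacency, u)
        for v in nodes[i + 1:]:
            if v in hops:  # skip pairs in different components
                total += (dist(positions[u], positions[v]) - hops[v]) ** 2
    return total


# Toy example: a three-node path a - b - c drawn two ways.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
straight = {"a": (0, 0), "b": (1, 0), "c": (2, 0)}
folded = {"a": (0, 0), "b": (1, 0), "c": (0, 0)}  # c is stacked on top of a
print(layout_stress(path, straight))
print(layout_stress(path, folded))
```

    The straight drawing reproduces every graph distance and so has zero stress, while the folded drawing is penalized for placing two nodes that are two steps apart at the same point. A number like this lets two candidate layouts of the same network be compared on something other than taste.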

    Our focus so far has been on presenting results to professional peers. But in recent years the clear presentation of data to broader publics has become increasingly important. It has never been easier to circulate full-color graphics of original data analysis to large groups of people. Social sharing of data through the Internet generally, but especially through services such as Facebook and Twitter, has accelerated the rise of infographics or info-visualization. To many working statisticians, infographics are the descendants of Tufte’s Ducks, those “self-promoting graphics” where “the overall design purveys Graphical Style rather than quantitative information” (Tufte 1983, p. 116). The contemporary infographic in its pure form is a supercharged megaduck incorporating not only the bells and whistles derided by


    Tufte but far more besides, such as a spurious quasi-narrative structure, pictographic sequencing, or excessive dynamic elements. Gelman & Unwin (2013) discuss Infovis-style work from a statistical point of view. They argue that most infographics do not meet the standards normally demanded of statistical visualizations, but they concede that sometimes the goals of the latter are not those of the former.

    It seems clear, though, that information visualization tools will become ever more widespread. In keeping with our general argument that good visualization is a component of broader good practice around data analysis, a key issue is the openness of standards and tools for data analysis on the web. Social scientists have typically worked within dedicated statistical applications to produce static graphics in a format geared primarily for print publication. But there has been tremendous development over the past decade, and even just within the past five years, in tools designed to present data interactively on the web. The development of powerful libraries written in JavaScript has allowed developers to present statistical graphics in a way that is quite open with respect to both code and data. Mike Bostock’s D3 library, for instance, is increasingly used by statisticians and media analysts alike and provides a powerful set of dynamic visual methods (Murray 2013). It is always difficult to know ex ante which particular software tool kits have staying power in the long run (functionally similar platforms and libraries have come and gone before), which is why static formats such as PostScript and portable document format, or PDF, are so long-lived. But even so, the leading edge of development in this area seems to be moving to further integrate specific statistical tools such as R with data formats (notably JavaScript Object Notation, or JSON) that can be presented effectively and interactively in the browser. For some kinds of data, notably the generation of dynamic choropleth maps and cartograms, the standard of presentation in some media outlets is now very high. It can be difficult to interpret complex and colorful maps with data chunked into units that vary radically by size (e.g., US counties). Nevertheless, a map such as the one shown in Figure 11, which appeared in the New York Times (Bloch & Gebeloff 2009), makes for a very engaging way to explore patterns both spatially and over time. Presenting data of this sort in an effective, interactive package is difficult for small teams of researchers to accomplish. But it is not impossible. Katz’s (2013) dialect survey maps are a compelling recent example of what is now within reach. Developers seem interested in building the production of web-enabled content into the software sociologists are used to using, and thus these tools are likely to continue to become more powerful and easier to use.
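
    The hand-off from a statistical environment to the browser usually comes down to serializing a tidy table as JSON. The sketch below uses Python rather than R, purely for illustration, and flattens a few invented time series into a list of row objects, a shape that browser-side charting code can typically consume directly. The field names and values are assumptions.

```python
import json


def to_records(series):
    """Flatten {unit: [(year, value), ...]} into a list of row objects."""
    return [
        {"country": country, "year": year, "rate": rate}
        for country, rows in series.items()
        for year, rate in rows
    ]


# Hypothetical values, for illustration only.
series = {
    "A": [(1960, 1.0), (1970, 1.5)],
    "B": [(1960, 4.5), (1970, 7.0)],
}
payload = json.dumps(to_records(series), indent=2)
print(payload)
```

    Because the payload is plain text in an open format, the same file can be archived alongside the analysis code, shared with other researchers, and loaded by whichever visualization library happens to be current.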

    For sociologists thinking about the public impact of their work, it is worth bearing in mind that, the sins of Infovis notwithstanding, a well-crafted statistical graphic is the fastest way to propagate one’s findings. Moreover, it is easy to forget how revelatory the general public can find even a relatively ordinary descriptive image if it is properly constructed. The panels in Figure 12 show two examples. Figure 12a shows the rate of deaths due to assault in 24 OECD countries between 1960 and 2011. The point of the image is to emphasize the exceptionally high death rate in the United States compared with other countries (as well as the large changes in the US number that are visible over the timeframe), and so the US series is colored separately from the rest, with every other country getting its own smoothed line and data points, but not individual colors. The unique trajectory of the United States is immediately apparent. The use of color probably helped the image circulate more widely in social media and traditional outlets than it otherwise might have. Color is not strictly necessary, however, as the superb image in Figure 12b makes clear. Taken from Kenworthy (2014), Figure 12b shows trends in life expectancy plotted against a measure of health expenditures for 20 countries. The United States is singled out with a bolder line than the others. Individual data points are not plotted. There are only seven numbers labeled on the graph (including the one in “19 other rich countries”), yet a strong argument based


    Figure 11. A New York Times interactive choropleth map allows users to explore historical and geographical patterns of migration to the United States (Bloch & Gebeloff 2009, adapted with permission from the New York Times; the interactive map is available at http://www.nytimes.com/interactive/2009/03/10/us/20090310-immigration-explorer.html).

    on rich data is beautifully made about what has happened to the returns to health spending in the OECD generally, and in the United States in particular. In the original presentation, Kenworthy characterizes the data and measures with a compact note in the caption, specifying the methods and measures. There is nothing about this figure that is conceptually or technically new. And yet a clearly conceived and cleanly executed image like this is still relatively uncommon in the sociological literature.

    Visualizations of categorical data remain more difficult to convey effectively, partly because the general public is not always familiar with conventional ways to present it. Mosaic plots, for instance, can be effective representations of contingency tables, but people are not taught to read them in the same way they can read bar charts or scatterplots. The effective visualization of network data presents similar issues. The dual problems of dimensionality and scale require creative ways to layer and aggregate information in a manner that highlights the key features of interest. In an attempt to characterize trends in political polarization in the US Senate, Moody & Mucha (2013) relied on a combination of multiple aggregation strategies and visual “identity arcs” linking individuals over time that effectively pushed “party loyalists” to the background while


    Figure 12. (a) Assault deaths in the United States and 23 other OECD countries (Healy 2012). (b) Health expenditure (as a percentage of GDP) and life expectancy in the United States and 19 other rich countries (see Kenworthy 2014; image courtesy of L. Kenworthy). [Panel a: assault deaths per 100,000 population by year, 1960 to 2010. Panel b: life expectancy against health expenditures (%).]

    highlighting those (increasingly rare) senators who reach across the aisle (Figure 13).

    CONCLUSION

    We have argued that quantitative visualization is a core feature of social-scientific practice from start to finish. All aspects of the research process from the initial exploration of data to the effective presentation of a polished argument can benefit from good graphical habits. Good graphics are not, of course, the only thing (see Godfrey 2013 for a discussion of the situation of blind and visually impaired users of current statistical software). But the dominant trend is toward a world where the visualization of data and results is a routine part of what it means to do social science.

    Getting general audiences comfortable with different kinds of data visualization is a long-term project, and not one that any particular researcher or journal editor has any meaningful control over. But given that the interpretability of statistical graphics rests on both their internal coherence as objects and the shared representational conventions they embody, a first step is to insist on good standards in the peer review process. A glance at recent issues of, say, the American Sociological Review shows that the standards for publishable graphical material vary wildly between and even within articles, far more than the standards for data analysis, prose, and argument. Variation is to be expected, but the absence of consistency in elements as simple as axis labeling, gridlines, or legends is striking. Just as training in elementary visualization methods should be a standard component of graduate education, our flagship journals should encourage their authors to think about the most effective ways to achieve visual clarity. This should not take the form of overly strict style guides but instead aim for an ideal of consistent, considered good judgment in the presentation of data and results in the service of sociological argument.

    Effective data visualization is part of a broader shift in the social sciences where data are more easily available, code and coding tools are more widely accessible, and high-quality graphical work is easy to produce and share. We hope for professional audiences who expect to see effective graphics as a routine aspect of presented work, and we look forward to wider publics who are able to comfortably read and interpret good graphical work. Sociologists should take advantage of the remarkable


    Figure 13 Aggregation and a known dimension (a polarization scale) simplify a complex network layout. The figure shows US Senate voting similarity networks, 1975–2012, arranged along a timeline of president, Senate party balance, and date (through June 7, 2012), with polarization measured by modularity. (Adapted from Moody & Mucha 2013 with permission from Cambridge University Press.)


    progress in methods, tools, and means to share, from statistics to computational social science to web development, the better to see the social world, and help others see it, too.

    DISCLOSURE STATEMENT

    The authors are not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.

    ACKNOWLEDGMENTS

    We thank Jaemin Lee, Achim Edelmann, and Richard Benton for comments on earlier drafts. Permission to use copyrighted material was granted by the American Sociological Association (Figure 1b), the University of Chicago Press (Figure 9a), the New York Times (Figure 11), and Cambridge University Press (Figure 13). All other figures are taken from the public domain and/or significantly redrawn and adapted by the authors. Partial support for this work was provided by NIH grants 1R21HD068317-01 and 1 R01 HD075712-01.

    LITERATURE CITED

    Alkema L, Raftery AE, Gerland P, Clark SJ, Pelletier F, et al. 2011. Probabilistic projections of the total fertility rate for all countries. Demography 48:815–39

    Anscombe FJ. 1973. Graphs in statistical analysis. Am. Stat. 27:17–21

    Bender-deMoll S, Morris M, Moody J. 2008. Prototype packages for managing and animating longitudinal network data: dynamicnetwork and rSoNIA. J. Stat. Softw. 24(7). http://www.jstatsoft.org/v24/i07

    Bertin J. 1967 (2010). Semiology of Graphics: Diagrams, Networks, Maps. Redlands, CA: ESRI Press

    Bloch M, Gebeloff R. 2009. Immigration explorer. New York Times, March 10. http://www.nytimes.com/interactive/2009/03/10/us/20090310-immigration-explorer.html

    Bourdieu P. 1984. Distinction: A Social Critique of the Judgment of Taste. Cambridge, MA: Harvard Univ. Press

    Brandes U, Indlekofer N, Mader M. 2012. Visualization methods for longitudinal social networks and stochastic actor-oriented modeling. Soc. Netw. 34:291–308

    Brandes U, Pich C. 2006. Eigensolver methods for progressive multidimensional scaling of large data. Int. Symp. Graph Drawing (GD), Lect. Notes Comput. Sci. (LNCS) 4372:42–53

    Breiger RL. 2000. A toolkit for practice theory. Poetics 27:91–115

    Breiger RL, Melamed D. 2014. The duality of organizations and their attributes: turning regression modeling ‘inside out.’ Res. Sociol. Organ. 40:261–74

    Buja A, Cook D, Hofmann H, Lawrence M, Lee EK, et al. 2009. Statistical inference for exploratory data analysis and model diagnostics. Phil. Trans. R. Soc. A 367:4361–83

    Chang W. 2013. The R Graphics Cookbook. Sebastopol, CA: O’Reilly

    Chapin FS. 1924. The statistical definition of a societal variable. Am. J. Sociol. 30:154–71

    Chatterjee S, Firat A. 2007. Generating data with identical statistics but dissimilar graphics: a follow up to the Anscombe Dataset. Am. Stat. 61:248–54

    Cleveland WS. 1993. Visualizing Data. Summit, NJ: Hobart

    Cleveland WS. 1994. The Elements of Graphing Data. Summit, NJ: Hobart

    Cook D, Swayne DF. 2007. Interactive and Dynamic Graphics for Data Analysis. New York: Springer

    Crnovrsanin T, Muelder CW, Faris R, Felmlee D, Ma K-L. 2014. Visualization techniques for categorical analysis of social networks with multiple edge sets. Soc. Netw. 37:56–64

    Du Bois WEB. 1898 (1967). The Philadelphia Negro. New York: Schocken Books

    Emerson JW, Green W, Schloerke B, Crowley B, Cook D, et al. 2013. The generalized pairs plot. J. Comput. Graph. Stat. 22:79–91


    Few S. 2009. Now You See It: Simple Visualization Techniques for Quantitative Analysis. Oakland, CA: Analytics

    Few S. 2012. Show Me the Numbers: Designing Tables and Graphs to Enlighten. Burlingame, CA: Analytics. 2nd ed.

    Fox J. 2003. Effect displays in R for generalised linear models. J. Stat. Softw. 8(15). http://www.jstatsoft.org/v08/i15/paper

    Fox J, Hong J. 2009. Effect displays in R for multinomial and proportional-odds logit models: extensions to the effects package. J. Stat. Softw. 32(1). http://www.jstatsoft.org/v32/i01/paper

    Frank KA, Yasumoto J. 1998. Linking action to social structure within a system: social capital within and between subgroups. Am. J. Sociol. 104:642–86

    Freeman LC. 2004. The Development of Social Network Analysis: A Study in the Sociology of Science. Vancouver, Can.: Empirical

    Freese J. 2007. Reproducibility standards in quantitative social science: why not sociology? Sociol. Methods Res. 36:153–72

    Friendly M. 2000. Visualizing Categorical Data. Cary, NC: SAS Inst.

    Gelman A. 2004. Exploratory data analysis for complex models. J. Comput. Graph. Stat. 13:755–79

    Gelman A, Unwin A. 2013. Infovis and statistical graphics: different goals, different looks. J. Comput. Graph. Stat. 22:2–28

    Godfrey AJR. 2013. Statistical software from a blind person’s perspective. R J. 5:73–79

    Handcock MS, Morris M. 1999. Relative Distribution Methods in the Social Sciences. New York: Springer-Verlag

    Harrell F. 2001. Regression Modeling Strategies. New York: Springer

    Hart HH. 1896. Immigration and crime. Am. J. Sociol. 2:369–77

    Healy K. 2012. America is a violent country. Kieran Healy Blog, July 20. http://kieranhealy.org/blog/archives/2012/07/20/america-is-a-violent-country

    Hewitt C. 1977. The effect of political democracy and social democracy on equality in industrial societies: a cross-national comparison. Am. Sociol. Rev. 42:450–64

    Inselberg A. 2009. Parallel Coordinates: Visual Multidimensional Geometry and its Applications. New York: Springer

    Jackman RM. 1980. The impact of outliers on income inequality. Am. Sociol. Rev. 45:344–47

    Katz J. 2013. Regional dialect variation in the continental US. Work. Pap., Proj. Beyond “Soda, Pop, or Coke,” Dep. Stat., N.C. State Univ., Raleigh. http://www4.ncsu.edu/~jakatz2/project-dialect.html

    Kenworthy L. 2014. Social Democratic America. New York: Oxford Univ. Press

    Keynes JM. 1938. Review of HG Funkhouser, Historical Development of the Graphical Representation of Statistical Data. Econ. J. 48:281–82

    Kleinman K, Horton NJ. 2013. SAS and R: Data Management, Statistical Analysis, and Graphics. Boca Raton, FL: Chapman & Hall/CRC. 2nd ed.

    Kostelnick C. 2008. The visual rhetoric of data displays: the conundrum of clarity. IEEE Trans. Prof. Commun. 51:116–29

    Lenski G. 1966. Power and Privilege. New York: McGraw-Hill

    Lima M. 2011. Visual Complexity: Mapping Patterns of Information. New York: Princeton Archit. Press

    Lundberg GA, Steele M. 1938. Social attraction-patterns in a village. Sociometry 1:375–419

    Mann ME, Bradley RS, Hughes MK. 1999. Northern hemisphere temperatures during the past millennium: inferences, uncertainties, and limitations. Geophys. Res. Lett. 26:759–62

    Marro A. 1899. Influence of the puberal development upon the moral character of children of both sexes. Am. J. Sociol. 5:193–219

    Mirowsky J, Kim J. 2007. Graphing age trajectories: vector graphs, synthetic and virtual cohort projections, and cross-sectional profiles of depression. Sociol. Methods Res. 35:497–541

    Mirowsky J, Ross C. 2007. Life course trajectories of perceived control and their relationship to education. Am. J. Sociol. 112:1339–82

    Mitchell M. 2012. A Visual Guide to Stata Graphics. College Station, TX: Stata. 3rd ed.

    Moody J. 2004. The structure of a social science collaboration network: disciplinary cohesion from 1963 to 1999. Am. Sociol. Rev. 69:213–38

    Moody J, Brynildsen WD, Osgood DW, Feinberg ME, Gest S. 2011. Popularity trajectories and substance use in early adolescence. Soc. Netw. 33:101–12


    Moody J, Light R. 2006. A view from above: the evolving sociological landscape. Am. Sociol. 38:67–86

    Moody J, McFarland DA, Bender-deMoll S. 2005. Dynamic network visualization: methods for meaning with longitudinal network movies. Am. J. Sociol. 110:1206–41

    Moody J, Mucha PJ. 2013. Portrait of political party polarization. Netw. Sci. 1:119–21

    Morris M, Kurth AE, Hamilton DT, Moody J, Wakefield S. 2009. Concurrent partnerships and HIV prevalence disparities by race: linking science and public health. Am. J. Public Health 99:1023–31

    Moustafa R, Wegman E. 2006. Multivariate continuous data – parallel coordinates. In Graphics of Large Datasets, ed. A Unwin, M Theus, H Hofmann, pp. 143–56. New York: Springer

    Murray S. 2013. Interactive Data Visualization for the Web. Sebastopol, CA: O’Reilly

    Murrell P. 2011. R Graphics. Boca Raton, FL: Chapman & Hall. 2nd ed.

    Sarkar D. 2008. Lattice: Multivariate Data Visualization with R. New York: Springer

    Sletto RF. 1936. A critical study of the criterion of internal consistency in personality scale construction. Am. Sociol. Rev. 1:61–68

    Stack S. 1979. The effects of political participation and social party strength on the degree of income inequality. Am. Sociol. Rev. 44:168–71

    Tufte ER. 1978. Political Control of the Economy. Princeton, NJ: Princeton Univ. Press

    Tufte ER. 1983. The Visual Display of Quantitative Information. Cheshire, CT: Graphics

    Tufte ER. 1990. Envisioning Information. Cheshire, CT: Graphics

    Tufte ER. 1997. Visual Explanations: Images and Quantities, Evidence and Narrative. Cheshire, CT: Graphics

    Tufte ER. 2006. Beautiful Evidence. Cheshire, CT: Graphics

    Tukey JW. 1972. Some graphic and semigraphic displays. In Statistical Papers in Honor of George W. Snedecor, ed. TA Bancroft, pp. 293–316. Ames: Iowa State Univ. Press

    Tukey JW. 1977. Exploratory Data Analysis. New York: Addison Wesley

    Wainer H. 1984. How to display data badly. Am. Stat. 38:137–47

    Wainer H. 2010. Foreword. See Bertin 1967 (2010), pp. ix–x

    Wickham H. 2009. ggplot2: Elegant Graphics for Data Analysis. New York: Springer

    Wickham H. 2010. A layered grammar of graphics. J. Comput. Graph. Stat. 19:3–28

    Wickham H, Cook D, Hofmann H, Buja A. 2010. Graphical inference for Infovis. IEEE Trans. Vis. Comput. Graph. 6:973–79

    Wilkinson L. 1995 (2005). The Grammar of Graphics. New York: Springer. 2nd ed.

    Yau N. 2012. Visualize This: The FlowingData Guide to Design, Visualization, and Statistics. Indianapolis, IN: Wiley






  • SO40-FrontMatter ARI 8 July 20