In the format provided by the authors and unedited.


Supplementary Information for

Millions of online book co-purchases reveal partisan differences in the consumption of science

Feng Shi1†, Yongren Shi2†, Fedor A. Dokshin3, James A. Evans1,4*, Michael W. Macy3*

Affiliations:

1Computation Institute, University of Chicago, Chicago, IL 60637, USA.

2Yale Institute for Network Science, Yale University, New Haven, CT 06511, USA.

3Department of Sociology, Cornell University, Ithaca, NY 14853, USA.

4Department of Sociology, University of Chicago, Chicago, IL 60637, USA.

*Correspondence to: [email protected] (J. A. E.), [email protected] (M. W. M.).

†Both authors contributed equally to this work.

© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

SUPPLEMENTARY INFORMATION | VOLUME: 1 | ARTICLE NUMBER: 0079

NATURE HUMAN BEHAVIOUR | DOI: 10.1038/s41562-017-0079 | www.nature.com/nathumbehav


Supplementary Figures

Supplementary Figure 1. Confidence in science among liberals & moderates (blue curve) and conservatives (red curve) since 1970. Data are from the General Social Survey cumulative file (N=33154). The figure reports decade-specific averages of the responses pooled over the 27 years between 1972 and 2010 in which the item was surveyed. Before the 1990s, conservatives reported higher confidence, although the differences are not statistically significant. After 1990, conservatives reported significantly lower confidence in science.

Supplementary Figure 2. Distributions of sales rank and publication year for liberal and conservative books. Left: Distributions of logarithmic sales rank for liberal and conservative books. Rank 1 corresponds to the highest sales, so lower values indicate better-selling books. Books with missing sales rank are excluded from the plot. The mean logarithmic sales rank is 13.5 for liberal books and 13.1 for conservative books, and the median logarithmic sales rank is 13.7 for liberal books and 13.4 for conservative books. Right: Distributions of publication year for liberal and conservative books. The mean publication year is 1999 for both; the median publication year is 2007 for liberal books and 2009 for conservative books.


Supplementary Figure 3. Positive correlation between the number of citations a sub-discipline receives from patents (Y) and from other sub-disciplines (X). A linear regression model log(Y) = aX + b is fitted to the data, with estimated slope a = 0.0279 (p < 0.001).
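The fit in Supplementary Figure 3 is an ordinary least-squares regression of log(Y) on X. A minimal standard-library sketch follows, assuming the natural logarithm (the caption does not state the base, and the slope a depends on it); the counts are hypothetical toy values, not the study's data.

```python
import math

def fit_log_linear(x, y):
    """Ordinary least-squares fit of log(y) = a*x + b; returns (a, b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    ly = [math.log(v) for v in y]  # natural log (assumption); y must be positive
    n = len(x)
    mx = sum(x) / n
    my = sum(ly) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (li - my) for xi, li in zip(x, ly))
    a = sxy / sxx          # slope
    b = my - a * mx        # intercept
    return a, b

# Hypothetical data: citations from other sub-disciplines (x) and from patents (y)
x = [10, 40, 80, 120]
y = [math.exp(0.03 * xi + 1.0) for xi in x]
a, b = fit_log_linear(x, y)
print(round(a, 4), round(b, 4))  # 0.03 1.0
```

Because the toy y values lie exactly on a log-linear curve, the fit recovers the generating slope and intercept.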

Supplementary Figure 4. Standardized difference between centralities of blue- and red-linked books, by polarization for each discipline. For each discipline, the plot shows the centrality difference

$$\frac{E[X] - E[Y]}{\sqrt{S^2\left(1/n_x + 1/n_y\right)}}$$

against polarization, where $X$ corresponds to centralities of blue-linked books, $Y$ to centralities of red-linked books, $n_x$ to the number of blue-linked books, and $n_y$ to the number of red-linked books. $S^2$ is the pooled sample variance of the centralities of blue- and red-linked books. A robust linear regression line is shown in the plot with slope 1.2278 (p < 0.001).
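The standardized difference in Supplementary Figure 4 has the form of the pooled two-sample t statistic. A minimal sketch, using hypothetical closeness-centrality values for the two groups of books:

```python
import math
from statistics import mean, variance

def standardized_difference(x, y):
    """Pooled two-sample standardized difference:
    (mean(x) - mean(y)) / sqrt(S^2 * (1/n_x + 1/n_y)),
    where S^2 is the pooled sample variance of the two groups."""
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each group needs at least two values")
    # Pool the unbiased (n - 1) variances, weighting by degrees of freedom
    s2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(s2 * (1 / nx + 1 / ny))

# Hypothetical closeness centralities of blue-linked (x) and red-linked (y) books
blue_linked = [2, 4, 6]
red_linked = [1, 3, 5]
print(round(standardized_difference(blue_linked, red_linked), 4))  # 0.6124
```

A positive value means blue-linked books are, on average, more central than red-linked books relative to the pooled spread.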


Supplementary Figure 5. Reproduction of major findings in the main text after academic books are removed. (A) Political relevance and polarization of science topics compared to topics outside of science. (B) Correlation between political alignment and applied index of sub-disciplines (r=0.39, p=0.005). (C) Difference between the average number of science books linked to a blue book and the average number linked to a red book (scientific breadth), by polarization for each discipline. The difference in scientific breadth and log(polarization) are highly correlated (r=0.75, p<0.001). (D) Difference between centralities of blue- and red-linked books for each discipline. A robust regression line is shown as a guide to the eye. A simple linear regression gives slope 1 and p-value 0.003 when the two outliers, economics and history, are removed.


Supplementary Figure 6. Results from the Barnes & Noble dataset. (A) Political polarization scores of scientific disciplines calculated from the Barnes & Noble co-purchase network (Y) and from the Amazon network (X). The scores from the two datasets are nearly identical (r=0.97, p<0.001). (B) Political alignment of disciplines calculated from the two datasets. The alignment scores are highly correlated across the two networks (r=0.76, p<0.001). (C) Political alignment of disciplines, organized as a tree of science topics. (D) Polarization by the difference between the average number of science books linked to a blue book and the average number linked to a red book, for each discipline. The difference in scientific breadth is significantly correlated with polarization (r=0.54, p=0.005). (E) Difference between mean closeness centralities of blue- and red-linked books for each discipline. Except for biology, the mean centrality of blue-linked books is no lower than that of red-linked books, and for half of the disciplines it is significantly higher.

Supplementary Figure 7. Results using the Erdős-Rényi random graph as a null model. (Top) Assessments of political relevance, alignment and polarization using the Erdős-Rényi null model. These charts reveal the same pattern of book purchases as reported in the main text: science books are more politically relevant and polarized than non-science books, largely because of the social sciences and humanities, while the physical and life sciences are similar to non-science overall. Books on the performing arts and sports have low political relevance and polarization compared to literature, religion or science. (Bottom) Detailed comparisons of the political assessments using the two null models (E-R and configuration). The two models yield nearly identical relevance scores and highly correlated alignment scores. Although the alignment scores from the two models are not identical, their orderings among topics are nearly the same.


Supplementary Figure 8. Results with indeterminate books included. Reproduction of major results in the main text when 315 formerly “indeterminate” works are included in the set of red and blue political books. These books, written by members of Congress, signal only weak conservative or liberal leanings and are coded based on the political commitments of the author (e.g., Republican vs. Democrat). Incorporating these new “moderate” political books into the analysis produces negligible changes to our original results.


Supplementary Results

Academic Books

To test whether academic books drive our finding that liberals have a wider interest in science books within most disciplines, we identified a large set of academic books in our dataset and repeated our analysis after removing them.

Academic books were identified by their publishers. First, we compiled a list of academic publishers from the website (services.exeter.ac.uk/bfa/az.htm). We then added all publishers with “university” or “academic” in their names and manually filtered out publishers that are clearly not academic (e.g., Trump University Press). Finally, we classified each book as academic or not, using its publisher as a proxy. In total, 136,672 academic books (about 10% of all books) were identified in the dataset.
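The publisher-based rule can be sketched as a seed list, a keyword match, and a manual exclusion list. This is a hypothetical illustration: apart from Trump University Press, the publisher names and book identifiers are invented placeholders, and the exact matching rules used in the study are not specified.

```python
# Hypothetical seed list standing in for the compiled academic-publisher list
KNOWN_ACADEMIC = {"Example Scholarly Books"}
# Name keywords that mark a publisher as academic
KEYWORDS = ("university", "academic")
# Keyword matches judged not academic on manual review
MANUAL_EXCLUSIONS = {"Trump University Press"}

def is_academic_publisher(publisher):
    """Classify a publisher: exclusions first, then seed list, then keywords."""
    name = publisher.strip()
    if name in MANUAL_EXCLUSIONS:
        return False
    if name in KNOWN_ACADEMIC:
        return True
    return any(k in name.lower() for k in KEYWORDS)

def classify_books(books):
    """Map each book ID to True if its publisher is treated as academic."""
    return {book_id: is_academic_publisher(pub) for book_id, pub in books.items()}

books = {
    "isbn-1": "Example University Press",
    "isbn-2": "Trump University Press",
    "isbn-3": "Example Scholarly Books",
    "isbn-4": "General Trade House",
}
print(classify_books(books))
# {'isbn-1': True, 'isbn-2': False, 'isbn-3': True, 'isbn-4': False}
```

Checking exclusions before the keyword match is what keeps a name like Trump University Press from being counted as academic.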

We repeated all analyses after removing academic books and present the results in Supplementary Figure 5. The results remain consistent with those reported in the main text. Supplementary Figure 5A shows that science is neither an apolitical sphere nor a public sphere: science is more politically relevant and polarized than topics outside of science, and it is polarized both within and across scientific disciplines. Across fields, there is a significant positive correlation (r=0.39, p=0.005) between political alignment and the commercial applied index of sub-disciplines (Supplementary Figure 5B), implying that customers for liberal books prefer basic science while conservative customers prefer commercially applied science. Within disciplines, Supplementary Figure 5C plots political polarization by the difference between the average number of disciplinary books linked to a blue book and the average number linked to a red book, for each discipline, and Supplementary Figure 5D reports the difference between the closeness centralities of blue-linked and red-linked books for each discipline (cf. Supplementary Figure 4). Together, Supplementary Figures 5C and 5D reveal that red books are more likely to cluster on the periphery of the disciplinary networks, with blue books linked to a wider variety of science books and blue-linked science books closer to the disciplinary core. Because academic books have been removed, this wider liberal interest in science books does not appear to be a simple consequence of academic liberalism.
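The scientific-breadth measure compares how many books of one discipline are co-purchased with the average blue book versus the average red book. A minimal sketch, assuming an undirected co-purchase edge list and counting distinct linked books; all book IDs are hypothetical.

```python
from statistics import mean

def breadth_difference(edges, blue, red, discipline_books):
    """Average number of a discipline's books linked to a blue book,
    minus the same average for red books."""
    neighbours = {}
    for u, v in edges:  # co-purchase links treated as undirected
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    def count(book):
        # Distinct discipline books co-purchased with this political book
        return len(neighbours.get(book, set()) & discipline_books)

    return mean(count(b) for b in blue) - mean(count(r) for r in red)

# Hypothetical network: B* are blue books, R* red books, S* science books
edges = [("B1", "S1"), ("B1", "S2"), ("B2", "S2"), ("R1", "S3"), ("R2", "S9")]
discipline = {"S1", "S2", "S3"}  # S9 belongs to another discipline
print(breadth_difference(edges, ["B1", "B2"], ["R1", "R2"], discipline))  # 1.0
```

Here the blue books average 1.5 linked discipline books and the red books 0.5, giving a difference of 1.0.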

Results from Barnes & Noble Dataset

Books in the Barnes & Noble dataset constitute a subset of Amazon political and science books, but the two co-purchase networks are quite different. For example, only 9% of Amazon co-purchase links in the science and politics subgraph are found in the Barnes & Noble network, and only 21% of Barnes & Noble links are found in Amazon. See Supplementary Table 1 for a summary. Nevertheless, the number of political links per book is highly correlated between Amazon and Barnes & Noble (r=0.60, p<0.001).
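The 9% and 21% overlap figures follow from the science-politics link counts in Supplementary Table 1: the shared links are divided by each network's own link count.

```python
# Science-politics co-purchase link counts from Supplementary Table 1
amazon_links = 6_288_423
bn_links = 2_582_715
common_links = 542_578

share_of_amazon = common_links / amazon_links  # Amazon links also present in B&N
share_of_bn = common_links / bn_links          # B&N links also present in Amazon
print(f"{share_of_amazon:.1%} of Amazon links, {share_of_bn:.1%} of B&N links")
# 8.6% of Amazon links, 21.0% of B&N links
```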

We replicated the Amazon-based analyses using the Barnes & Noble network and found consistent results across datasets. First, we calculated the political polarization and alignment of science topics in Barnes & Noble and compared them with those of the corresponding topics in Amazon (Supplementary Figure 6, A and B). The polarization scores of disciplines from the two networks are nearly identical (r=0.97, p<0.001), and the alignment scores of disciplines are also highly correlated across the two networks (r=0.76, p<0.001).

Alignment of disciplines within the four “schools” in Barnes & Noble is reported in Supplementary Figure 6C. There are not enough books in the sub-disciplines of the Barnes & Noble dataset to test the correlation between alignment and applied index statistically. However, the general pattern still holds: applied disciplines such as veterinary medicine and economics lie at the red end of their respective schools, while anthropology and mathematics are the most blue.

Finally, across disciplines, blue books link to a larger number of disciplinary books than red books do (Supplementary Figure 6D), consistent with the results obtained with Amazon data. For most disciplines, science books linked to blue books are also more central in their respective disciplines, in terms of closeness centrality, than books linked to red books (Supplementary Figure 6E).
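Closeness centrality within a disciplinary network can be computed with one breadth-first search per book. The sketch below assumes an undirected adjacency map and uses one common normalization (the Wasserman-Faust variant, which scales by the fraction of nodes reachable); the normalization actually used in the study is not stated here, and the node names are hypothetical.

```python
from collections import deque

def closeness(adj):
    """Closeness centrality for each node of an undirected graph {node: set(neighbours)}.

    Score = (r - 1) / sum_of_distances * (r - 1) / (n - 1), where r counts the
    nodes reachable from the source (including itself); isolated nodes score 0.
    """
    n = len(adj)
    scores = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search yields shortest-path lengths
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        r = len(dist)
        if total == 0 or n < 2:
            scores[src] = 0.0
        else:
            scores[src] = (r - 1) / total * (r - 1) / (n - 1)
    return scores

# Hypothetical three-book path a - b - c: the middle book is most central
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
c = closeness(adj)
print({k: round(v, 3) for k, v in sorted(c.items())})
# {'a': 0.667, 'b': 1.0, 'c': 0.667}
```

Averaging these scores over blue-linked and red-linked books gives the group means compared in the figures.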

Effect of Null Model

In the assessment of political relevance and alignment, the configuration model is used as a null model for our co-purchase network to generate prior distributions for measurement in our Bayesian framework. In this way we avoid relying on the “absolute difference” between observations and the null model, which could be affected by the choice of null model, and instead compare the null model and observations within a statistical framework. The more often we see certain patterns in the data, the weaker the effect of the null model. We chose the configuration model not only because it is a standard choice, but also because the book network is highly heterogeneous. It is easy to imagine one book co-purchased with many other books (e.g., the Harry Potter series) and another that is only co-purchased with other books by the same author (e.g., Dickens’ least popular novel, Pickwick Papers). It is therefore important for our null model of the co-purchase network to control for the degree of each book, and the configuration model, which preserves each book’s degree, is ideal for that purpose.
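The configuration model keeps every book's degree fixed while otherwise wiring links at random. A minimal sketch of the standard stub-matching construction follows; it is a generic illustration with a hypothetical degree sequence, not the authors' code, and it does not reproduce how the null model feeds the Bayesian priors.

```python
import random

def configuration_model(degrees, seed=None):
    """Random multigraph with the given degree sequence via stub matching.

    Each node i contributes degrees[i] "stubs"; shuffled stubs are paired
    into edges. Self-loops and multi-edges are kept, as in the basic model.
    """
    if sum(degrees) % 2:
        raise ValueError("sum of degrees must be even")
    rng = random.Random(seed)
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    # Pair consecutive stubs: each pair becomes one edge
    return list(zip(stubs[::2], stubs[1::2]))

# Hypothetical degrees: book 0 is a heavily co-purchased title, books 1-3 are niche
edges = configuration_model([3, 1, 1, 1], seed=42)
print(sorted(n for e in edges for n in e))  # [0, 0, 0, 1, 2, 3]
```

Every randomized graph reproduces the input degrees exactly. Under this model, books i and j with degrees k_i and k_j share roughly k_i k_j / (2m) links in expectation, where m is the total number of links, so a best-selling book has many links even under the null.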

To demonstrate the robustness of our findings, we carried out similar calculations using the Erdős-Rényi random graph as a null model. In this model, each co-purchase link appears independently with equal probability, regardless of how many links a book has. Assessments of relevance and alignment from the Erdős-Rényi null model are shown in Supplementary Figure 7. They are highly correlated with the results from the configuration null model and reveal similar patterns of purchase behavior.
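In the Erdős-Rényi model G(n, p), every possible link between two books is included independently with the same probability p, so book-level degree heterogeneity is ignored. A minimal sketch follows; choosing p so that the expected number of links matches the observed count is an assumption made here for illustration.

```python
import itertools
import random

def erdos_renyi(n, p, seed=None):
    """G(n, p): each of the n*(n-1)/2 possible links appears independently with probability p."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    rng = random.Random(seed)
    return [(i, j) for i, j in itertools.combinations(range(n), 2) if rng.random() < p]

def matched_p(n_nodes, n_edges):
    """Edge probability that reproduces the observed number of links in expectation."""
    return n_edges / (n_nodes * (n_nodes - 1) / 2)

# Hypothetical small network: 4 books with 3 observed co-purchase links
p = matched_p(4, 3)
print(p)                                # 0.5
print(len(erdos_renyi(4, 1.0)))         # 6: every possible link is present
```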

Indeterminate Books

When coding the political books in our sample, we took a conservative approach and included only books that clearly and consistently fit into a “blue” or “red” camp. These sets of books give us the most interpretable signal of partisan consumers. Unfortunately, “indeterminate” books include not only books that could be seen as “moderate” but also books that lie on dimensions orthogonal to the left-right divide, so they fall outside the scope conditions for testing whether political polarization extends beyond politics to other cultural domains such as science.

To test for potential biases due to our sampling of political books, we reran the analysis including a sample of formerly “indeterminate” works (written by U.S. Congresspersons) that exhibit weak signals of conservative or liberal leaning. We coded these based only on the political commitments of the author (e.g., Republican vs. Democratic Congressperson). This adds 315 liberal or conservative books. Incorporating these new “moderate” political books into the analysis produces negligible changes to our original results (Supplementary Figure 8).

Finally, among the science books that are co-purchased with red and blue books, we found that 80% are also co-purchased with the “indeterminate” political books. In other words, even though we did not use the “indeterminate” books to infer people’s political ideology, buyers of moderate or nonpolitical books are still represented in our analysis.
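The 80% figure is a set-overlap share: of the science books linked to red or blue books, the fraction that are also linked to at least one indeterminate political book. A minimal sketch with hypothetical book IDs:

```python
def share_also_linked(science_linked_partisan, science_linked_indeterminate):
    """Fraction of science books co-purchased with red or blue books that are
    also co-purchased with at least one indeterminate political book."""
    if not science_linked_partisan:
        return 0.0
    overlap = science_linked_partisan & science_linked_indeterminate
    return len(overlap) / len(science_linked_partisan)

# Hypothetical sets of science-book IDs
partisan = {"s1", "s2", "s3", "s4", "s5"}
indeterminate = {"s1", "s2", "s3", "s4", "s9"}
print(share_also_linked(partisan, indeterminate))  # 0.8
```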


Supplementary Tables

Supplementary Table 1. Number of books and co-purchase links in Amazon and Barnes & Noble datasets. The last row reports the number of books and links present in both datasets. The last column reports the number of co-purchase links in the subgraph of science and political books. Books in Barnes & Noble are a subset of Amazon books, but the two co-purchase networks are quite different when considering the co-purchase links.

Dataset   Books: Total   Books: Science   Books: Politics   Links: Total   Links: Science-Politics
Amazon    1,303,504      428,433          1,256             26,467,385     6,288,423
B & N     439,603        285,942          1,078             3,375,406      2,582,715
Common    439,603        285,942          1,078             664,149        542,578

Supplementary Table 2. Sales rank and publication year for political books.

Books          Number of books   Log sales rank: Mean   Log sales rank: Median   Publication year: Mean   Publication year: Median
Conservative   677               13.1                   13.4                     1999                     2009
Liberal        587               13.5                   13.7                     1999                     2007

Supplementary Table 3 (Supplementary_Table_3.tsv) ISBN, title and publisher for the human-coded conservative books. The text file is tab delimited.

Supplementary Table 4 (Supplementary_Table_4.tsv) ISBN, title and publisher for the human-coded liberal books. The text file is tab delimited.

Supplementary Table 5 (Supplementary_Table_5.xlsx) Organization of the Library of Congress (LC) and Dewey Decimal (DD) categories into a hierarchy of science topics, including 4 major scientific “schools” (humanities, physical, life, and social sciences) and 27 mutually exclusive high-level topics corresponding to broadly defined scientific disciplines (e.g., Physics, Chemistry, Medicine, Economics) that fall under these 4 major schools. The file is a Microsoft Excel spreadsheet.

Supplementary Table 6 (Supplementary_Table_6.xlsx) Organization of the LC and DD categories into four major topics outside of science – Arts, Sports, Literature (fiction and poetry), and Religion. The file is a Microsoft Excel spreadsheet.
