Data Ethics for Mathematicians

Posted 12-Apr-2017

Page 1: Data Ethics for Mathematicians

More generally: A discussion of ethics for data, research, and publishing

Mason A. Porter (@masonporter), Department of Mathematics, UCLA

Page 2: Data Ethics for Mathematicians

§ It’s important.

§ People need to be able to replicate our work.
  § Making sure their own code is correct
  § Natural self-correction in science (and the ability to understand precisely every choice we make in our work)

§ Not traditionally part of mathematical training, but increasingly we are using social data — including potentially personal data — in our research

Page 3: Data Ethics for Mathematicians

§ We use a lot more real data nowadays, and in particular this includes a lot of human (and animal) data.
  § Much less a part of the research (and thus training) tradition in mathematics than in other disciplines

§ Other disciplines have thought a lot more about ethics than mathematics has.
  § In many cases, unfortunately, this is because they’ve messed up the ethics historically, sometimes substantially. We need to learn from the best practices they’ve developed.

"Look, lady. Just because my grandfather didn't rape the environment and exploit the workers doesn't make me a peasant. And it's not that he didn't want to rape the environment and exploit the workers; I'm sure he did. It's just that as a barber, he didn't have that much opportunity."

– Roger Cobb [Steve Martin], All of Me (1984)

Thanks to Peter Mucha for the quote suggestion (and an excuse to allude to this movie)

Page 4: Data Ethics for Mathematicians

§ Be honest and fair (obviously)

§ Design ethically thoughtful research

§ Explain your decisions to others

§ [Points 2 and 3 taken from slides by Matt Salganik]

Page 5: Data Ethics for Mathematicians

FOUR PRINCIPLES

§ Respect for persons
  § (Note: Animal research also has thorny ethical issues!)

§ Beneficence

§ Justice

§ Respect for Law and Public Interest

How do you balance these four principles?

Page 6: Data Ethics for Mathematicians

§ Always be honest about your work

Page 7: Data Ethics for Mathematicians

§ If you are working with personal data, you need to check with your Institutional Review Board (IRB) to ensure that you are doing the work in an ethical way.
  § They may tell you that you don’t need to submit a formal application, or they may tell you that you do. Let them know briefly what data you have access to (or plan to acquire, and how) and what you plan to do with it.

§ Different IRBs can, of course, rule differently.

§ Rules differ across countries
  § Human data versus animal data

§ In these slides, I have human data in mind, but animal data and their acquisition of course also raise major ethical considerations.

§ Look through the website of UCLA’s Office of the Human Research Protection Program (OHRPP): http://ora.research.ucla.edu/ohrpp/Pages/OHRPPHome.aspx

§ “IRB is a floor, not a ceiling” (from Matt Salganik’s slides)

Page 8: Data Ethics for Mathematicians

§ A well-known, heavily used set of courses: https://www.citiprogram.org/index.cfm?pageID=86
  § I found this via a link on UCLA’s OHRPP website.

§ Several years ago, I did some IRB training. (When preparing these slides, I couldn’t find the specific online course that I took.) Besides helping you think through the issues, such training matters practically: if something does go wrong, you want to be able to say that you have had appropriate ethics training.

§ Note: The training that is required, expected, or available differs substantially across countries.
  § Example: My impression from my experience is that the UK is less stringent than the US about human data but more stringent about non-human animals.

Page 9: Data Ethics for Mathematicians
Page 10: Data Ethics for Mathematicians
Page 11: Data Ethics for Mathematicians
Page 12: Data Ethics for Mathematicians

§ The greater your research’s potential to violate personal privacy, the greater its potential benefit to humanity needs to be.

Page 13: Data Ethics for Mathematicians

§ Informed Consent

§ Understanding and managing informational risk

§ Privacy

§ Making decisions in the face of uncertainty

§ Other notes
  § Put yourself in everyone else’s shoes
  § Think of research ethics as continuous, not discrete (sliding scale)

Bullet points from Matt Salganik’s slides

Page 14: Data Ethics for Mathematicians

§ You must provide sufficient (and precise) detail for people to be able to replicate your work!

§ Try to include these details in your papers. But people are human, so if somebody e-mails you to ask for a clarification, a copy of your code (even if poorly commented), or something else, you should respond and send it to them, provided that it’s something you have the right to send them.

Page 15: Data Ethics for Mathematicians

§ To the extent possible, you should publish your data and usable (and well-commented) code along with your work.
  § There can be tension between these ideals and issues of personal privacy, nondisclosure agreements, and so on.

§ If you use synthetic data, publish the code that generates the data and the generated examples that you used in your paper.
  § Supplementary material for the paper on the journal website, GitHub, Figshare, and other venues

§ Likely relevant for literally all of you
  § E.g., if you are doing any numerical computations at all, this is desirable.
  § E.g., adjacency matrices for the graphs in a definition–theorem–proof paper are also useful for readers (though the level of necessity depends on how large the graphs are).
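As a minimal sketch of the synthetic-data advice above, assuming Python with only the standard library, the following generates a G(n, p) random graph from a fixed, recorded seed and serializes its adjacency matrix as CSV text that could be posted as supplementary material. The function names, parameter values, and seed are hypothetical choices for illustration, not anything from the slides.

```python
import csv
import io
import random


def generate_random_graph(n, p, seed):
    """Return the adjacency matrix (a list of 0/1 rows) of a G(n, p) random graph.

    A dedicated random.Random instance with an explicit seed means that the
    same (n, p, seed) produces the same graph every time on a given Python
    version, so readers can regenerate the example.
    """
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1  # undirected graph: keep the matrix symmetric
    return adj


def adjacency_to_csv(adj):
    """Serialize an adjacency matrix as CSV text, one matrix row per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(adj)
    return buf.getvalue()


# Hypothetical parameters; report all of them, including the seed, in the paper.
N, P, SEED = 20, 0.2, 12345
A = generate_random_graph(N, P, SEED)
csv_text = adjacency_to_csv(A)  # e.g., post this as a supplementary file
```

Writing `csv_text` to a file gives readers the exact matrix used. Posting the generated file in addition to the code guards against differences in random-number generators across software versions or languages, which can change the stream that a given seed produces.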

§ Admission: I have been trying to get better about this over the years. I am very good about responding to e-mail queries, and my goal (subject to practical considerations) is to be precise about all of my steps and to put as much online as is feasible.

Page 16: Data Ethics for Mathematicians

§ For empirical data, if you have permission to post something (e.g., does the data “belong” to somebody else?) and it doesn’t invade privacy, you should post it because that promotes good science.

Page 17: Data Ethics for Mathematicians

§ Alternative name: “replication crisis”

§ https://en.wikipedia.org/wiki/Replication_crisis

Take a look, e.g., at the work of Victoria Stodden: http://web.stanford.edu/~vcs/

Page 18: Data Ethics for Mathematicians

§ Be explicit about everything you did, so that others know what choices you made and can evaluate whether they think it is the best procedure for your analysis.
  § E.g., sampling biases change properties of data.

§ There are many reasons to make such choices, so the point is not that you shouldn’t make them. But they are part of your scientific procedure, so tell people exactly what you did so that they know exactly what these choices were. (They may want to make different choices.)

§ “Manipulating” is a loaded word; here I mean it in a neutral way (i.e., “changes”), rather than in a negative one.
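To make the point that sampling choices change properties of data concrete, here is a small, hypothetical Python illustration. On a ring lattice in which every node is linked to the two nearest nodes on each side (so every node has degree 4), keeping only the induced subgraph on every other node yields a network in which every node has degree 2. The construction, the sampling rule, and the function names are ours for illustration; the general lesson is that a statistic such as mean degree depends on how the network was sampled, which is why the sampling procedure needs to be reported.

```python
def ring_lattice(n, k):
    """Edges of a ring of n nodes, each linked to the k nearest nodes on each side."""
    edges = set()
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            edges.add((min(i, j), max(i, j)))  # store each undirected edge once
    return edges


def induced_subgraph(edges, kept):
    """Keep only the edges whose endpoints are both in the sampled node set."""
    kept = set(kept)
    return {(u, v) for (u, v) in edges if u in kept and v in kept}


def mean_degree(edges, nodes):
    """Mean degree = 2 * (number of edges) / (number of nodes)."""
    return 2 * len(edges) / len(nodes)


n = 20
full = ring_lattice(n, k=2)
sample = range(0, n, 2)  # hypothetical sampling rule: keep every other node
sub = induced_subgraph(full, sample)

print(mean_degree(full, range(n)))  # 4.0
print(mean_degree(sub, sample))     # 2.0
```

A different sampling rule (e.g., keeping a contiguous arc of the ring) would give yet another mean degree, so two analyses of "the same" network can legitimately disagree unless each states how its sample was drawn.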

Page 19: Data Ethics for Mathematicians

§ When are things actually “anonymous”?
  § Is “full” anonymization even possible?
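One standard way to make the anonymity question concrete (not a notion that the slide itself discusses) is k-anonymity: a table is k-anonymous with respect to a set of quasi-identifiers (attributes such as ZIP code, birth year, and sex that are not names but can be linked to other data) if every combination of their values is shared by at least k records. The Python sketch below computes k for a small, entirely made-up table; the column names, records, and coarsening rule are hypothetical. With names removed, k is still 1, so at least one person is unique and potentially re-identifiable. Coarsening the attributes does not fix that here, and k-anonymity is itself known to be insufficient against some attacks, which is part of why “full” anonymization is so hard.

```python
from collections import Counter

# Hypothetical records with names already removed ("de-identified").
records = [
    {"zip": "90095", "birth_year": 1985, "sex": "F", "diagnosis": "flu"},
    {"zip": "90095", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
    {"zip": "90024", "birth_year": 1990, "sex": "M", "diagnosis": "flu"},
    {"zip": "90024", "birth_year": 1990, "sex": "M", "diagnosis": "migraine"},
    {"zip": "90210", "birth_year": 1972, "sex": "F", "diagnosis": "diabetes"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


def generalize(row):
    """Coarsen quasi-identifiers: keep 3 ZIP digits and bin birth year by decade."""
    out = dict(row)
    out["zip"] = row["zip"][:3] + "**"
    out["birth_year"] = row["birth_year"] // 10 * 10
    return out


print(k_anonymity(records, QUASI_IDENTIFIERS))  # 1: the 1972 record is unique
print(k_anonymity([generalize(r) for r in records], QUASI_IDENTIFIERS))  # still 1
```

Making the table genuinely k-anonymous for a larger k would require coarsening further or suppressing records, which trades away some of the information that made the data useful in the first place.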

Page 20: Data Ethics for Mathematicians

Slide from Matt Salganik

Page 21: Data Ethics for Mathematicians

Slide from Matt Salganik

Page 22: Data Ethics for Mathematicians

§ https://en.wikipedia.org/wiki/Netflix_Prize#Cancelled_sequel

Page 23: Data Ethics for Mathematicians

§ Acknowledge all sources of data

§ Describe precisely how you obtained the data and how somebody else can obtain it (e.g., whom should they contact?), especially if there is a reason that you are unable to post the data itself.

§ Be generous when acknowledging people in papers: useful discussions, ideas, etc.

§ Be fair and appropriate when discussing work by authors in past papers.
  § You are standing on the shoulders of giants. :) Give credit where it is due.
  § There is a difference between somebody “showing” something in a past paper and “reporting” it. The former asserts that the result is valid; the latter is a historical fact (assuming that what you write is accurate).

Page 24: Data Ethics for Mathematicians

§ There can be complications in posting data to the public, no matter how well-intentioned.
  § This is a great data set to advance several avenues of research in network science, and my goal is for people to be able to do that.

§ Learning the hard way
  § Urgently arranging a phone meeting with the head of Facebook’s Data Science team
  § An important learning experience for me
  § A small chapter in the long story of data privacy
  § A blog entry that is very critical of me (its account differs from my side of the story): http://www.michaelzimmer.org/2011/02/15/facebook-data-of-1-2-million-users-from-2005-released/

§ This led to my learning much more about these issues (though under very stressful circumstances), to a page about research using human data on the website of Oxford’s Mathematical Institute, and so on.
  § https://www.maths.ox.ac.uk/members/policies/data-protection/research-using-data-involving-humans

Page 25: Data Ethics for Mathematicians

§ Research in collaboration with companies or government: What is it ok to include in a publication or post online?

§ Tension between open data and personal privacy

§ Terms-of-service agreements and nondisclosure agreements

§ In what sense can you replicate work if you can’t post everything?
  § “Softer” replication: do you observe similar phenomena in circumstances that have some similarities but are not the same?
    § E.g., human behavior in different social networks

Page 26: Data Ethics for Mathematicians

§ See, e.g., the discussion around this paper: http://science.sciencemag.org/content/early/2015/05/06/science.aaa1160.full
  § Eytan Bakshy, Solomon Messing, & Lada Adamic, “Exposure to ideologically diverse news and opinion on Facebook,” Science, 2015
  § They can’t tell us Facebook’s sampling algorithm, so how are we as scientists going to go about “replicating” their work?
  § Note: Do their insights apply to other online social networks? One should be able to do a weaker form of replication that checks that the most interesting qualitative results are not merely a property of the specifics of Facebook.

§ Also: What about this work being public versus being entirely within Facebook and us never seeing any of it?

Page 27: Data Ethics for Mathematicians

§ A. D. I. Kramer, J. E. Guillory, and J. T. Hancock. Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences of the United States of America, 111(24):8788–8790, 2014
  § Look up articles about this one

§ Experiments on Facebook with changes in people’s feeds

§ Also: What about this work being public versus being entirely within Facebook and us never seeing any of it?

Note: Academic researchers have IRBs that need to approve a study before it starts, whereas Facebook has a publication review board that approves publication of a study after it has already been done. Thus, we know that this study occurred because FB concluded that it could be published. We don’t know what is done with our data by FB and other companies when the results don’t get published.

Page 28: Data Ethics for Mathematicians

Should academic researchers and companies follow the same rules?

Page 29: Data Ethics for Mathematicians

§ You can apply this comment to “data science” generally if you like, though the connectivity of networks raises substantial additional issues beyond those of data science (and “Big Data”, etc.) more broadly.

Page 30: Data Ethics for Mathematicians

§ Short essay by Johan Ugander (Management Science & Engineering, Stanford): https://medium.com/@jugander/truth-lies-and-an-ethics-of-personalization-e4ccfa7f2b84#.rzap3hm70

§ As an example, he discusses “Cambridge Analytica, identified by the NY Times as the hired guns behind Trump’s online targeting.”

§ Alexander Nix (CEO of CA) gave the following example in a video. Quoting Ugander’s essay: “if you own a private beach, he notes, you’d have more success keeping people off your beach by putting up a “Warning: sharks beyond this point” sign vs. a “private property” sign. The problem is: he recommends this strategy — and personalized versions of it — without any consideration to whether there actually are any sharks, advocating “behavioral communication” that is completely detached from any truth about reality. In fewer words: crafting lies, and then targeting them.”

Page 31: Data Ethics for Mathematicians

§ http://callingbullshit.org

§ Full title: “Calling Bullshit in the Age of Big Data”
  § A course designed by Carl Bergstrom and Jevin West (University of Washington)
  § Excellent syllabus and reading materials

§ Various parts of it relate to ethics, and they also have a unit directly about ethics: http://callingbullshit.org/syllabus.html#Ethics

Page 32: Data Ethics for Mathematicians

§ Targeted advertising (different trailers for people of different races) for the movie “Straight Outta Compton”: http://www.businessinsider.com/why-straight-outta-compton-had-different-trailers-for-people-of-different-races
  § Different levels of prior familiarity with gangsta rap pioneers N.W.A (Ice Cube, Dr. Dre, etc.)

§ Papers by Arvind Narayanan and collaborators, including:
  § http://senglehardt.com/papers/ccs16_online_tracking.pdf
  § https://5harad.com/papers/twivacy.pdf

§ J. Su et al., “De-anonymizing Web Browsing Data with Social Networks”, 2016

§ C. Kanich et al., “Spamalytics: An Empirical Analysis of Spam Marketing Conversion” (2008):
  § http://www.umiacs.umd.edu/~tdumitra/courses/ENEE757/Fall14/papers/Kanich08.pdf

§ B. Markines et al., “Social spam detection” (2009):
  § http://dl.acm.org/citation.cfm?doid=1531914.1531924

Page 33: Data Ethics for Mathematicians

§ “Tastes, Ties, and Time” Facebook data set
  § One discussion about the controversy associated with this data set: http://www.chronicle.com/article/Harvards-Privacy-Meltdown/128166/

§ Research by Sinan Aral and collaborators on manipulation of voting on social media sites
  § One discussion: https://techcrunch.com/2013/08/11/reddit-science-herd/

Page 34: Data Ethics for Mathematicians

§ Mathematicians are relatively new to using human data, and we don’t yet have the ethics training to help us deal with the thorny issues.

§ Learn from the best practices (and past mistakes) of other disciplines.
  § As in those other disciplines, mathematicians should be getting ethics training.

§ Read about, think about, and discuss various controversies and other studies. We may all set our bars in different places, but we need to do so conscientiously.
  § It’s a sliding bar: the more potential for invasion of personal privacy, the more valuable the potential outcome has to be for humanity.
  § IRB approval is only a lower bound.

Page 35: Data Ethics for Mathematicians

§ While I have more training and experience with these issues than most mathematicians, I am very much an amateur in data ethics compared with people from the social and human sciences, for whom this is a standard part of their training from the beginning of their education.

§ With this in mind, please contact me with any suggestions on these slides. Did I miss any salient points? Do you disagree with any of the discussed points? Are there any other studies that are especially crucial to bring up?

§ Eventually, I hope to develop these slides further into an article for a venue in the mathematical sciences. Let me know if you are interested in being involved in writing this article.

Page 36: Data Ethics for Mathematicians

§ Several suggestions for resources from Johan Ugander

§ Several comments on my slides and suggestions for resources from Peter Mucha

§ Website from Matt Salganik’s class on Computational Social Science (Fall 2016): http://www.princeton.edu/~mjs3/soc596_f2016/
  § I drew some material and ideas from his slides on ethics.

§ It would be pretty ironic if I plagiarized these slides, wouldn’t it?