
How do Data Science Workers Collaborate? Roles, Workflows, and Tools

AMY X. ZHANG∗, University of Washington & MIT, USA
MICHAEL MULLER∗, IBM Research, USA
DAKUO WANG, IBM Research & MIT-IBM Watson AI Lab, USA

Today, the prominence of data science within organizations has given rise to teams of data science workers collaborating on extracting insights from data, as opposed to individual data scientists working alone. However, we still lack a deep understanding of how data science workers collaborate in practice. In this work, we conducted an online survey with 183 participants who work in various aspects of data science. We focused on their reported interactions with each other (e.g., managers with engineers) and with different tools (e.g., Jupyter Notebook). We found that data science teams are extremely collaborative and work with a variety of stakeholders and tools during the six common steps of a data science workflow (e.g., clean data and train model). We also found that the collaborative practices workers employ, such as documentation, vary according to the kinds of tools they use. Based on these findings, we discuss design implications for supporting data science team collaborations and future research directions.

CCS Concepts: • Human-centered computing → Computer supported cooperative work.

Additional Key Words and Phrases: data science; teams; data scientists; collaboration; machine learning; collaborative data science; human-centered data science

ACM Reference Format:
Amy X. Zhang, Michael Muller, and Dakuo Wang. 2020. How do Data Science Workers Collaborate? Roles, Workflows, and Tools. Proc. ACM Hum.-Comput. Interact. 4, CSCW1, Article 22 (May 2020), 23 pages. https://doi.org/10.1145/3392826

1 INTRODUCTION

Data science often refers to the process of leveraging modern machine learning techniques to identify insights from data [47, 51, 67]. In recent years, with more organizations adopting a "data-centered" approach to decision-making [20, 88], more and more teams of data science workers have formed to work collaboratively on larger data sets, more structured code pipelines, and more consequential decisions and products. Meanwhile, research around data science topics has also increased rapidly within the HCI and CSCW community in the past several years [34, 50, 51, 54, 67, 81, 92, 93, 95].

From existing literature, we have learned that the data science workflow often consists of multiple phases [54, 67, 95]. For example, Wang et al. describe the data science workflow as containing 3 major phases—Preparation, Modeling, and Deployment—and 10 more fine-grained steps [95].

∗Both authors contributed equally to this research.

Authors’ addresses: Amy X. Zhang, [email protected], University of Washington & MIT, USA; Michael Muller, michael_muller@us.ibm.com, IBM Research, USA; Dakuo Wang, dakuo.wang@ibm.com, IBM Research & MIT-IBM Watson AI Lab, USA.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2020 Association for Computing Machinery.
2573-0142/2020/5-ART22 $15.00
https://doi.org/10.1145/3392826


arXiv:2001.06684v3 [cs.HC] 16 Apr 2020


Various tools have also been built for supporting data science work, including programming languages such as Python or R, statistical analysis tools such as SAS [58] and SPSS [63], integrated development environments (IDEs) such as Jupyter Notebook [31, 52], and automated model building systems such as AutoML [29, 60] and AutoAI [95]. And from empirical studies, we know how individual data scientists are using these tools [49, 50, 81], and what features could be added to improve the tools for users working alone [92].

However, a growing body of recent literature has hinted that data science projects consist of complex tasks that require multiple skills [51, 62]. These requirements often lead participants to juggle multiple roles in a project, or to work in teams with others who have distinct skills. For instance, in addition to the well-studied role of data scientist [56, 64], who engages in technical activities such as cleaning data, extracting or designing features, analyzing/modeling data, and evaluating results, there is also the role of project manager, who engages in less technical activities such as reporting results [43, 55, 70, 77]. The 2017 Kaggle survey reported additional roles involved in data science [38], but without addressing topics of collaboration. In this work, we limit the roles in our survey to activities and relationships that were mentioned in interviews in Muller et al. [67].

Unfortunately, most of today's understanding of data science collaboration focuses only on the perspective of the data scientist, and how to build tools to support distant and asynchronous collaborations among data scientists, such as version control of code. The technical collaborations afforded by such tools [100] only scratch the surface of the many ways that collaborations may happen within a data science team, such as when stakeholders discuss the framing of an initial problem before any code is written or data collected [79]. However, we have little empirical data to characterize the many potential forms of data science collaboration.

Indeed, we should not assume that data science team collaboration is the same as the activities of a conventional software development team, as argued by various previous literature [34, 54]. Data science is engaged as an "exploration" process more than an "engineering" process [40, 50, 67]. "Engineering" work is oftentimes assumed to involve extended blocks of solitude, without the benefit of colleagues' expertise while engaging with data and code [74]. While this perspective on engineering is still evolving [84], there is no doubt that "exploration" work requires deep domain knowledge that oftentimes resides only in domain experts' minds [50, 67]. And due to the distinct skills and knowledge residing within different roles in a data science team, more challenges with collaboration can arise [43, 61].

In this paper, we aim to deepen our current understanding of the collaborative practices of data science teams from not only the perspective of technical team members (e.g., data scientists and engineers), but also the understudied perspective of non-technical team members (e.g., team managers and domain experts). Our study covers both a large scale of users—we designed an online survey and recruited 183 participants with experience working in data science teams—and an in-depth investigation—our survey questions dive into 5 major roles in a data science team (engineer/analyst/programmer, researcher/scientist, domain expert, manager/executive, and communicator) and 6 stages (understand problem and create plan, access and clean data, select and engineer features, train and apply models, evaluate model outcomes, and communicate with clients or stakeholders) in a data science workflow. In particular, we report what other roles a team member works with, in which step(s) of the workflow, and using what tools.

In what follows, we first review literature on the topic of data science work practices and tooling; then we present the research method and the design of the survey; we report survey results following the order of Overview of Collaboration, Collaboration Roles, Collaborative Tools, and Collaborative Practices. Based on these findings, we discuss implications and suggest designs of future collaborative tools for data science.


2 RELATED WORK

Our research contributes to the existing literature on how data science teams work. We start this section by reviewing recent HCI and CSCW research on data science work practices; then we take an overview of the systems and features designed to support data science work practices. Finally, we highlight specific literature that aims to understand and support particularly the collaborative aspect of data science teamwork.

2.1 Data Science Work Practices

Jonathan Grudin describes the current cycle of the popularity of AI-related topics in industry and in academia as an "AI Summer" [33]. In the hype surrounding AI, many fancy technology demos mark key milestones, such as IBM DeepBlue [15], which defeated a human chess champion for the first time, and Google's AlphaGo demo, which defeated the world champion in Go [96]. With these advances in AI and machine learning technologies, more and more organizations are trying to apply machine learning models to business decision-making processes. People refer to this collection of work as "data science" [34, 47, 54, 67] and the various workers who participate in this process as "data scientists" or "data engineers".

HCI researchers are interested in data science practices. Studies have been conducted to understand data science work practices [34, 43, 50, 54, 61, 67, 75, 76, 81], sometimes using the label of Human Centered Data Science [4, 66]. For example, Wang et al. proposed a framework of 3 stages and 10 steps to characterize the data science workflow by synthesizing existing literature (Figure 1) [95]. The stages consist of Preparation, Modeling, and Deployment, and at a finer-grained level, the framework has 10 steps from Data Acquisition to Model Runtime Monitoring and Improvement. This workflow framework is built on top of Muller et al.'s work [67], which mostly focused on the Preparation steps, and decomposed the data science workflow into 4 stages, based on interviews with professional data scientists: Data Acquisition, Data Cleaning, Feature Engineering, and Model Building and Selection.

Fig. 1. A data science workflow, consisting of three high-level phases: data preparation, model building, and model deployment [95]. (Steps shown in the figure: Data Acquisition, Data Cleaning & Labeling, and Feature Engineering under Preparation; Model Selection, Hyperparameter Optimization, Ensembling, Model Validation, and Model Improvement under Modeling; Model Deployment and Runtime Monitoring under Deployment.)

Given these workflow frameworks and terminology [67, 95], we can position existing empirical work within a data science workflow. For example, researchers have suggested that 80% of a data science project is spent in the Preparation stage [34, 48, 67, 80, 103]. As a result, data scientists often do not have enough time to complete a comprehensive data analysis in the Modeling stage [85].


Passi and Jackson dug deeper into the Preparation stage, showing that when data scientists pre-process their data, it is often a rule-based, not rule-bound, process [75]. Pine and Liboiron further investigated how data scientists made those pre-processing rules [79].

However, most of the literature focuses only on a single data scientist's perspective, despite many interviewees reporting that "data science is a team sport" [95]. Even the Muller et al. [67] and Wang et al. [95] workflows focus only on the activities that involve data and code, which were most likely performed by the technical roles in the data science team. The voices of the non-technical collaborators within a data science team are missing, including an understanding of whom they worked with, when, and what tools they used.

In contrast to the current literature in data science, software engineering has built a strong literature on collaborative practices in software development [42, 53, 87], including in both open source communities [9, 17] and industry teams [6]. As teams working on a single code base can often be large, cross-site, and globally dispersed, much research has focused on the challenges and potential solutions for communication and coordination [42]. These challenges can be exacerbated by cultural differences between team members of different backgrounds [36, 44] and by the different roles of team members, such as project manager [101] or operator [86]. Many of the tools used by software engineering teams are also used by data science teams (e.g., GitHub [17], Slack [73]), and the lessons learned from this work can inform the design of collaborative tools for data science. However, there are also important differences when it comes to data science in particular, such as the types of roles and technical expertise of data science collaborators, as well as a greater emphasis on exploration, data management, and communicating insights in data science projects.

Many of the papers about solitary data science work practices adopted the interview research method [34, 50, 54, 67, 92, 95]. An interview research method is well-suited for the exploratory nature of these empirical works in understanding a new practice, but it falls short in generating a representative and generalizable understanding from a larger user population. Thus, we decided to leverage a large-scale online survey to complement the existing qualitative narratives.

2.2 Collaboration in Data Science

Only recently have some CSCW researchers begun to investigate the collaborative aspect of data science work [11, 32, 75, 83, 91]. For example, Hou and Wang [43] conducted an ethnographic study to explore collaboration in a civic data hackathon event where data science workers help non-profit organizations develop insights from their data. Mao et al. [61] interviewed biomedical domain experts and data scientists who worked on the same data science projects. Their findings partially echo previous results [34, 54, 67] that suggest data science workflows have many steps. More importantly, their findings are similar to the software engineering work cited above [42, 53, 87], opening the possibility that data science may also be a highly collaborative effort where domain experts and data scientists need to work closely together to advance along the workflow.

Researchers also observed challenges in collaborations within data science teams that were not as common in conventional software engineering teams. Bopp et al. showed that "big data" could become a burden to non-profit organizations who lack staff to make use of those data resources [10]. Hou et al. provided a possible reason—i.e., that the technical data workers "speak a different language" than the non-technical problem owners, such as a non-profit organization (NPO) client, in the civic data hackathon that they studied [43]. The non-technical NPO clients could only describe their business questions in natural language, e.g., "why is this phenomenon happening?" But data science workers do not necessarily know how to translate this business question into a data science question. A good practice researchers observed in this context was "brokering activity", where a special group of organizers who understand both data science and the context serve as translators to turn business questions into data science questions (see Williams' earlier HCI work on the importance of "translators" who understand and mediate between multiple domains [99]). Also, once the data workers generated the results, the "brokers" helped to interpret the data science findings into business insights.

The aforementioned collaboration challenges [43] are not unique to the civic data hackathon context. Mao et al. [61] interviewed both data scientists and bio-medical scientists who worked together in research projects. They found that these two different roles often do not have common ground about the project's progress. For example, the goal of bio-medical scientists is to discover new knowledge; thus, when they ask a research question, that question is often tentative. Once there is an intermediate result, bio-medical scientists often need to revise their research question or ask a new question, because their scientific journey is to "ask the right question". However, the data scientists were focused on transferring a research question into a well-defined data science question so they could optimize machine learning models and increase performance. The behavior of the bio-medical scientists was perceived by the data scientists as "wasting our time", as they had worked hard to "find the answer to the question" that later was discarded. Mao et al. argued that the constant re-calibration of common ground might help to ease tensions and support cross-discipline data science work.

These related projects focused only on a civic data hackathon [43] and on the collaborative projects between data scientists and bio-medical scientists in scientific discovery projects [61]. Also, both of them used ethnographic research methods aiming for in-depth understanding of the context. In this work, we wanted to target a more commonly available scenario—data science teams' work practices in corporations—as this scenario is where most data science professionals work. We also wanted to gather a broader user perspective through the deployment of an online survey.

2.3 Data Science Tools

Based on the empirical findings and design suggestions from previous literature [11, 32, 43, 61, 67, 75, 83, 91], some designers and system builders have proposed human-in-the-loop design principles for data science tools [2, 3, 28, 49, 50, 92]. For example, Gil et al. surveyed papers about building machine learning systems and developed a set of design guidelines for building human-centered machine learning systems [28]. Amershi et al. in parallel reviewed a broader spectrum of AI applications and proposed a set of design suggestions for AI systems in general [3].

With these design principles and guidelines in mind [3, 28], many systems and features have been proposed to support aspects of data science work practices. One notable system is Jupyter Notebook [45] and its variations such as Google Colab [30] and JupyterLab [46]. Jupyter Notebook is an integrated code development environment tailored for data science work practices. It has a graphical user interface that supports three key functionalities—coding, documenting a narrative, and observing execution results [54]—that are central to data science work [50]. Moreover, the ability to easily switch between code and output cells allows data scientists to quickly iterate on their model-crafting and testing steps [50, 67].

However, only a few recent works have started to look at designing specific collaborative features to support data science teams beyond the individual data scientist's perspective [16, 68, 81, 92, 93]. For example, Jupyter Notebook's narrative cell feature is designed to allow data scientists to leave human-readable annotations so that when another data scientist re-uses the code, they can better understand it. However, Rule et al. found very low usage of these narrative cells (markdown cells) among a million Jupyter notebooks that they sampled from GitHub [81]. Data scientists were not writing their notebooks with a future collaborator or re-user in mind.

More recently, Wang and colleagues at the University of Michigan have examined how data science tools can better support collaboration. Their 2019 study [92] asked, if Jupyter Notebook had a new feature that allowed multiple data scientists to synchronously write code (as many people do in Google Docs today [71]), whether and how data scientists would use it for their collaboration. They found that the proposed feature can encourage more exploration and reduce communication costs, while also promoting unbalanced participation and slacker behaviors. In their 2020 paper [93], Wang et al. took up a related challenge, namely the documentation of informal conversations and decisions that take place during data science projects. Building on prior work [16, 68, 73], they built Callisto, an integration of synchronous chat with a Jupyter notebook. In tests with 33 data science practitioners, Wang et al. showed the importance of automatically computing the reference point needed to anchor chat discussion in the code.

In sum, almost all of the proposed tools and features in data science focus only on the technical users' scenarios (e.g., data scientists and data engineers), such as how to better understand and wrangle data [18, 41], or how to better write and share code [49, 81, 92, 93]. In this work, we want to present an account that covers both the technical roles and the non-technical roles of a professional data science team in corporations, so that we can better propose design suggestions from a multi-disciplinary perspective.

3 METHOD

3.1 Participants

Participants were a self-selected convenience sample of employees in IBM who read or contributed to Slack channels about data science (e.g., channel names such as "deeplearning", "data-science-at-ibm", "ibm-nlp", and similar). Participants worked in diverse roles in research, engineering, health sciences, management, and related line-of-business organizations.

We estimate that the Slack channels were read by approximately 1000 employees. Thus, the 183 people who provided data constituted an approximately 20% participation rate.

Participants had the option to complete the survey anonymously. Therefore, our knowledge of the participants is derived from their responses to survey items about their roles on data science teams (Figure 2).

3.2 Survey Questions

The survey asked participants to describe a recent data science project, focusing on collaborations (if any), the roles and skills among the data science team (if appropriate), and the role of collaborators at different stages of the data science workflow. Next, we asked open-ended questions about the tools participants used to collaborate, including at different workflow stages.1 Finally, we asked participants to describe their collaborative practices around sharing and re-using code and data, including their expectations around their own work and their documentation practices. To encourage more people to contribute, we made all questions optional.

1 Open-text responses were coded by two of the authors. We agreed on a set of coding guidelines in advance, and we resolved any disagreements through discussion.

3.3 Survey Distribution

We posted requests to participate in relevant IBM internal Slack channels during January 2019. Responses began to arrive in January. We wrote 2–4 reminder posts, depending on the size and activity of each Slack channel. We collected the last response on 3 April 2019.

4 RESULTS

The 183 people who responded to the anonymous survey described themselves as being of varied experience in data science, but primarily 0–5 years (Figure 2A). Clearly, some saw connections between contemporary data science and earlier projects involving statistical modeling, and that is why we see some long years of experience. Respondents worked primarily in smaller teams of six people or fewer (Figure 2B). A few appeared to have solo data science practices.

Fig. 2. Self-reported information about survey respondents: A) Work experience in data science and machine learning-related projects. B) Histogram of data science team sizes. C) Heatmap of the prevalence of a pair of roles taken on by one respondent within a data science project, with the diagonal showing the respondents who only self-identified as one role.

Respondents reported that they often acted in multiple roles in their teams, and this may be due to the fact that most of them have a relatively small team. Figure 2C is a heatmap showing the number of times in our survey a respondent stated they acted in both roles out of a possible pair (with the two roles defined by a cell's position along the x-axis and y-axis). For cells along the diagonal, we report the number of respondents who stated they only performed that one role and no other. As can be seen, this was relatively rare, except in the case of the Engineer/Analyst/Programmer role.

Unsurprisingly, there was considerable role-overlap among Engineers/Analysts/Programmers and Researchers/Scientists (i.e., the technical roles). These two roles also served—to a lesser extent—in the roles of Communicators and Domain Experts.

By contrast, people in the role of Manager/Executive reported little overlap with other roles. From the roles-overlap heatmap of Figure 2C, it appears that functional leadership—i.e., working in multiple roles—occurred primarily in technical roles (Engineer/Analyst/Programmer and Researcher/Scientist). These patterns may reflect IBM's culture of defining managers as people-managers, rather than as technical team leaders.

4.1 Do Data Science Workers Collaborate?

Figure 3 shows patterns of self-reported collaborations across different roles in data science projects. First, we begin answering one of the overall research questions: What is the extent of collaboration on data science teams?

Table 1. Percentages of collaborations reported by each role.

Role                           Percent Reporting Collaboration
Engineer/Analyst/Programmer    99%
Communicator                   96%
Researcher/Scientist           95%
Manager/Executive              89%
Domain Expert                  87%


Fig. 3. Who collaborates with whom? Note: raw counts are reported in the stacked bar chart, and normalized percentages (along the columns) are reported in the heatmap.

4.1.1 Rates of Collaboration. The data behind Figure 3 allow us to see the extent of collaboration for each self-reported role among the data science workers (Table 1). Among the five data science roles of Figure 3, three roles reported collaboration at rates of 95% or higher. The lowest collaboration rate was among Domain Experts, who collectively reported a collaboration percentage of 87%. In the following subsections, we explore the patterns and supports for these collaborations.

4.1.2 Who Collaborates with Whom? The stacked bar chart to the left in Figure 3 reflects the raw numbers of people in each role who responded to our survey and stated that they collaborated with another role. The heatmap to the right of Figure 3 shows a similar view of the collaboration relationship—with whom they believe they collaborate—as the chart on the left, except that the cells are now normalized by the total volume in each column. The columns (and the horizontal axis) represent the reporter of a collaborative relationship. The rows (and the vertical axis) represent the collaboration partner who is mentioned by the reporter at the base of each column. Lighter colors in the heatmap indicate more frequently reported collaboration partnerships.

When we examine a square heatmap with reciprocal rows and columns, we may look for asymmetries around the major diagonal. For each pair of roles (A and B), do the informants report a similar proportion of collaboration in each direction—i.e., does A report about the same level of collaboration with B as B reports about A?

Surprisingly, we see a disagreement about collaborations in relation to the role of Communicator. Communicators report strong collaborations with Managers and with Domain Experts, as shown in the Communicator column of Figure 3. However, these reports are not fully reciprocated by those collaboration partners. As shown in the row for Communicators, most roles reported little collaboration with Communicators relative to the other roles. A particularly striking difference is that Communicators report (in their column) relatively strong collaboration with Managers/Executives, but the Managers/Executives (in their own column) report the least collaboration with Communicators. There is a similar, less severe, asymmetry between Communicators and Domain Experts. We will later interpret these findings in the Discussion in Section 5.2.2.
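To make this asymmetry check concrete, the sketch below column-normalizes a matrix of collaboration reports and compares it to its transpose. The role names and counts here are hypothetical stand-ins for our survey tallies, purely for illustration.

```python
import numpy as np
import pandas as pd

roles = ["Engineer", "Researcher", "Domain Expert", "Manager", "Communicator"]
# Hypothetical raw counts: reports[i, j] = people in role j (column) who
# reported collaborating with role i (row). Values are invented.
reports = np.array([
    [30, 22,  9, 12,  8],
    [25, 28, 10, 14,  9],
    [12, 11,  5,  9,  7],
    [15, 13,  8,  6, 11],
    [ 6,  7,  4,  3,  9],
])
# Normalize each column so cells are proportions of that role's reports,
# as in the Figure 3 heatmap.
norm = pd.DataFrame(reports / reports.sum(axis=0, keepdims=True),
                    index=roles, columns=roles)
# Asymmetry around the diagonal: does A report B as often as B reports A?
asymmetry = norm - norm.T
print(asymmetry.round(2))  # large |values| flag one-sided perceptions
```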

4.1.3 Are there "Hub" Collaborator Roles? Are certain roles dominant in the collaboration network of Figure 3? Figure 4 shows the reports of collaboration from Figure 3 as a network graph. Each report of collaboration takes the form of a directed arc from one role to another. The direction of an arc, e.g., (A->B), can be interpreted as "A reports collaboration with B." The thickness of each arc represents the proportion of people who report each directed type of collaboration. To avoid distortions due to different numbers of people reporting in each role, we normalized the width of each arc as the number of reported collaborations divided by the number of people reporting from that role. Self-arcs represent cases in which the two collaborators were in the same role—e.g., an engineer who reports collaborating with another engineer.

Fig. 4. Network graph of reported collaborative relationships. The arrow from Researcher to Communicator may be interpreted as aggregated reports by Researchers about their collaboration with Communicators. Note that some pairwise relationships do not have equal bidirectional symmetry. The thickness of each arc represents the proportion of people who reported each directed type of collaboration, normalized by the number of people in each role.
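A minimal sketch of how such a graph can be assembled, here with the networkx library; the counts and per-role respondent numbers are hypothetical, not our survey data.

```python
import networkx as nx

# Hypothetical directed report counts: (reporter, partner) -> number of reports.
reports = {("Engineer", "Researcher"): 25, ("Researcher", "Engineer"): 22,
           ("Communicator", "Manager"): 8, ("Manager", "Communicator"): 3,
           ("Engineer", "Engineer"): 18}  # self-arc: intra-role collaboration
# Number of respondents per role, used to normalize arc widths.
n_per_role = {"Engineer": 40, "Researcher": 35, "Manager": 20, "Communicator": 10}

G = nx.DiGraph()
for (reporter, partner), count in reports.items():
    # Arc width = reports of this collaboration / respondents in reporter role.
    G.add_edge(reporter, partner, width=count / n_per_role[reporter])

for u, v, d in G.edges(data=True):
    print(f"{u} -> {v}: arc width {d['width']:.2f}")
```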

With one exception, this view shows relatively egalitarian strengths of role-to-role collaboration. While we might expect to find Managers/Executives as the dominant or "hub" role, their collaborative relations are generally similar to those of Engineers and Researchers. Domain Experts are only slightly less engaged in collaborations.

The exception occurs, as noted above, in relation to Communicators. Communicators in this graph clearly believe that they are collaborating strongly with other roles (thick arrows), but the other roles report less collaboration toward Communicators (thin arrows).

The self-loop arrows are also suggestive. These arrows appear to show strong intra-role collaborations among Engineers, Researchers, and Communicators. By contrast, Managers/Executives and Domain Experts appear to collaborate less with other members of their own roles.

4.2 Collaborator Roles in Different Stages of the Data Science Workflow

As reviewed above in relation to Figure 1, data science projects are often thought to follow a series of steps or stages—even if these sequences serve more as mental models than as guides to daily practice [67, 75]. We now consider how the roles of data science workers interact with those stages.

Figure 5 shows the relative participation of each role as a collaborator in the stages of a data science workflow. As motivated in the Related Work section, in this paper we adopted a six-step view of a reference-model data science workflow, beginning with creating a measurement plan [79], and moving through technical stages to an eventual delivering stage of an analysis or model or working system. Some organizations also perform a check for bias and/or discrimination during the technical development [7]. However, because that step is not yet accepted by all data science projects and may happen at different stages, we have listed that step separately at the end of the horizontal axis of the stacked bar chart in Figure 5.


Fig. 5. The roles of collaborators during the stages of a data science project.

4.2.1 Where do Non-Technical Roles Work? The data for Figure 5 show highly significant differences from one column to the next column (χ²(48) = 148.777, p < .001).

Through a close examination of Figure 5, we found that the degree of involvement by Managers/Executives and by Communicators is roughly synchronized—despite their seeming lack of collaboration patterns as seen in Figures 3 and 4. Each of these roles is relatively strongly engaged in the first stage (measurement plan) and the last two stages (evaluate, communicate), but largely absent from the technical work stages (access data, features, model). Perhaps each of these roles is engaged with relatively humanistic aspects of the work, but with different and perhaps unconnected humanistic aspects for each of their distinct roles.
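For readers who want to reproduce this kind of test, the sketch below runs a chi-square test of independence on a role-by-stage contingency table with scipy. The counts are invented for illustration and do not match our data (nor, therefore, the degrees of freedom reported above).

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: counts of respondents in each of 5 roles
# reporting involvement in each of 7 stages (6 workflow steps + bias check).
table = np.array([
    [20, 35, 30, 32, 28, 18, 15],   # Engineer/Analyst/Programmer
    [22, 25, 26, 27, 26, 22, 14],   # Researcher/Scientist
    [15, 12, 10,  6, 14, 16,  8],   # Domain Expert
    [18,  4,  3,  2, 12, 17,  3],   # Manager/Executive
    [10,  3,  1,  2,  9, 14,  2],   # Communicator
])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.3f}, p = {p:.4f}")
```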

4.2.2 Where do Domain Experts Work? The involvement of Domain Experts is similar to that of Managers and Communicators, but to a lesser extent. Domain Experts are active at every stage, in contrast to Communicators, who appear to drop out during the modeling stage. Domain Experts are also more engaged (by self-report) during stages in which Managers have very little engagement. Thus, it appears that Domain Experts are either directly involved in the core technical work, or are strongly engaged in consultation during data-centric activities such as data access and feature extraction. They take on even more prominent roles during the later stages of evaluating and communicating.

4.2.3 Where do Technical Roles Work? There is an opposite pattern of engagement for the core technical work, done by Engineers/Analysts/Programmers, who are most active while the Managers and Communicators are less involved.

The degree of involvement by Researchers/Scientists seems to be relatively stable and strongly engaged across all stages. This finding may clarify the "hub" results of Figure 4, which suggested relatively egalitarian collaboration relations. Figure 5 suggests that perhaps Researchers/Scientists actively guide the project through all of its stages.

4.2.4 Who Checks AI Fairness and Bias? The stage of assessment and mitigation of bias appears to be treated largely as a technical matter. Communicators and Managers have minimal involvement. Unsurprisingly, Domain Experts play a role in this stage, presumably because they know more about how bias may creep into work in their own domains.


Table 2. Categories of data science tools and the number of times each tool was mentioned by respondents.

Tool Category                  Tools Mentioned by Respondents (number of times mentioned)
asynchronous discussion        Slack (86), email (55), Microsoft Teams (1)
synchronous discussion         meeting (13), e-meeting (12), phone (1)
project management             Jira (8), ZenHub (2), Trello (1)
code management                GitHub (56), Git (5)
code                           Python (42), R (9), Java (3), scripts (3)
code editor                    Visual Studio Code (11), PyCharm (11), RStudio (8), Eclipse (1), Atom (1)
interactive code environment   Jupyter Notebook (66), SQL (6), terminal (4), Google Colab (4)
software package               Scikit-learn (3), Shiny App (2), Pandas (2)
analytics/visualization        SPSS (27), Watson Analytics (22), Cognos (7), ElasticSearch (4), Apache Spark (3), Graphana (2), Tableau (2), Logstash (2), Kibana (1)
spreadsheet                    Microsoft Excel (22), spreadsheets (3), Google Sheets (1)
document editing               wiki (2), LaTeX (2), Microsoft Word (2), Dropbox Paper (2), Google Docs (1)
filesharing                    Box (43), cloud (5), NFS (2), Dropbox (1), Filezilla (1)
presentation software          Microsoft Powerpoint (18), Prezi (1)

Note: code allows programmers to write algorithms for data science. code editor and interactive code environment provide a user experience for writing that code. code management is where the code may be stored and shared. By contrast, analytics/visualization provides "macro-level" tools that can invoke entire steps or modular actions in a data science pipeline.

4.2.5 Summary. From the analyses in this section, we begin to see data science work as a convergence of several analytic dimensions: people in roles, roles in collaboration, and roles in a sequence of project activities. The next section adds a fourth dimension, namely the tools used by data science workers.

4.3 Tooling for Collaboration

We asked respondents to describe the tools that they used in the stages of a data science project—i.e., the same stages as in the preceding section. We provided free-text fields for them to list their tools, so that we could capture the range of tools used. We then collaboratively converted the free-text responses into sets of tools for each response, before iteratively classifying the greater set of tools from all responses into 13 higher-level categories, as shown in Table 2.2

2 We discussed the classification scheme repeatedly until we were in agreement about which tool fit into each category. We postponed all statistical analyses until we had completed our social process of classification.
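The classification itself was a consensus-driven manual process (see footnote 2). Purely to illustrate the bookkeeping behind Table 2, a sketch along these lines could tally tool and category mentions once a mapping has been agreed upon; the mapping dictionary below covers only a small, illustrative subset of the categories.

```python
from collections import Counter

# Illustrative subset of the hand-agreed tool-to-category mapping.
CATEGORY = {
    "slack": "asynchronous discussion", "email": "asynchronous discussion",
    "github": "code management", "python": "code", "r": "code",
    "jupyter notebook": "interactive code environment",
    "spss": "analytics/visualization", "box": "filesharing",
}

def tabulate(responses):
    """Count tool and category mentions across free-text survey responses."""
    tools, categories = Counter(), Counter()
    for response in responses:          # one list of tool strings per respondent
        for tool in response:
            tool = tool.strip().lower()
            tools[tool] += 1
            categories[CATEGORY.get(tool, "other")] += 1
    return tools, categories

tools, categories = tabulate([["Slack", "GitHub"], ["Jupyter Notebook", "Slack"]])
print(tools.most_common(), categories.most_common())
```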

When we examined the pattern of tool usage across project stages (Figure 6), we found highly significant differences across the project stages (χ²(72) = 209.519, p < .001). As above, we summarize trends that suggest interesting properties of data science collaboration:

4.3.1 Coding and Discussing. The use of coding resources was as anticipated. Coding resources were used during intense work with data, and during intense work with models. Code may serve as a form of asynchronous discussion (e.g., [13]): respondents tended to decrease their use of asynchronous discussion during project stages in which they made relatively heavier use of coding resources.

Fig. 6. The tools used in each stage of a data science project. Note: on the left is the raw count of each tool category for each stage, while on the right, each column is normalized to sum to 1.0.

4.3.2 Documentation of Work. We were interested to see whether and how respondents documented their work. Respondents reported some document-editing during the activities leading to a measurement plan. There was also a small use of presentation software, which can of course serve as a form of documentation.3 The use of these tools returned during the stage of delivery to clients.

3 We consider the use of spreadsheets to be ambiguous in terms of documentation. Spreadsheets function both as records and as discardable scratch-pads and sandboxes. Baker et al. summarize the arguments for treating spreadsheets not as documentation, but rather as tools that are in need of external documentation (e.g., [19]), which is often lacking [5].

4.3.3 Gaps in Documentation for Feature Engineering. In contrast, we were surprised that there was little use of documents during the phase of feature extraction and feature engineering. This stage is an important site for the design of data [23]. The meaning and nature of the data may be changed [23, 67] during this time-consuming step [34, 48, 67, 80, 103]. During this phase, the use of synchronous discussion tools dropped to nearly zero, and the use of asynchronous discussion tools was relatively low. There was relatively little use of filesharing tools. It appears that these teams were not explicitly recording their decisions. Thus, important human decisions may be inscribed into the data and the data science pipeline, while simultaneously becoming invisible [67, 79]. The implications for subsequent re-analysis and revision may be severe.

4.3.4 Gaps in Documentation for Bias Mitigation. We were similarly surprised that the stage of bias detection and mitigation also seemed to lack documentation, except perhaps through filesharing. We anticipate that organizations will begin to require documentation of bias mitigation as bias issues become more important.

4.3.5 Summary. In Section 4.1, we showed that data science workers engage in extensive collaboration. Then in Section 4.2, we showed that collaboration is pervasive across all stages of data science work, and that members of data science teams are intensely involved in those collaborative activities. By contrast, this section shows gaps in the usage of collaborative tools. We propose that a new generation of data science tools should be created with collaboration "baked in" to the design.

4.4 Collaborative Practices around Code and Data

Finally, we sought to understand how tool usage by a respondent relates to their practices around code reading, re-use, and documentation, as well as data sharing, re-use, and documentation. Particularly, if technical team members must collaborate with non-technical team members, then tools and practices to support documentation will be key.


Table 3. Survey respondents clustered by their self-reported tool usage.

Respondent Cluster     Number of People  Tools Frequently Mentioned (number of times mentioned across questions)
0 (project managed)    19                GitHub (86), Slack (79), email (47), Box (26)
1 (interactive)        13                Jupyter Notebook (82), GitHub (44), Slack (22)
2 (scripted)           44                Python (50), SPSS (44), GitHub (27), Jupyter Notebook (27), Slack (24)

To begin, we clustered the survey respondents into different clusters according to their self-reported tool usage. To create a "tools profile" for each respondent, we used the questions regarding tool usage, described in Section 4.3, and summed up all the mentions of each tool from all the open-ended questions on tool usage. Thus, if a respondent answered only "GitHub" for all 7 stages of their data science project, then they would have a count of 7 under the tool "GitHub" and a count of 0 elsewhere.

Using the k-means clustering algorithm in the Scikit-learn Python library, we found that k=3 clusters resulted in the highest average silhouette coefficient of 0.254. This resulted in the clusters described in Table 3. We only included respondents who had mentioned at least one tool across all the tool usage questions; as the questions were optional and we experienced some dropout partway through the survey, we had 76 respondents to cluster.

We saw that the respondents in the first cluster (Cluster 0) mentioned using both GitHub and Slack at multiple points in their data science workflow, as well as email and Box to a lesser extent. Given these tools' features for project management, including code management, issue tracking, and team coordination, we characterize this cluster of respondents as project managed. In contrast, respondents in Cluster 1 mentioned using Jupyter Notebook repeatedly, and only occasionally mentioned other tools; thus we designate the cluster's respondents as using interactive tools, due to Jupyter Notebook's interactive coding environment. Finally, Cluster 2 had the most respondents and a longer tail of mentioned tools. However, the tools most mentioned were Python and SPSS; thus, we characterize this cluster of respondents as using scripted tools. We also noticed that Cluster 2 was predominantly made up of self-reported Engineers/Analysts/Programmers, at 80%. Meanwhile, Researchers/Scientists had the greatest prevalence in Clusters 0 and 1, at 84.2% and 84.6%, respectively.

4.4.1 Reading and Re-using Others’ Code and Data. In Figure 7, we report the answers in the affirmative to questions asking respondents whether they read other people's code and data and re-used other people's code and data, separated and normalized by cluster. One finding that stood out is the overall lower level of collaborative practices around data as opposed to code. This was observed across all three clusters of tool profiles, despite the ability in some tools, such as GitHub, to store and share data.

Fig. 7. Reading and re-use of others’ code and data across respondent clusters.

When comparing across the stages of planning, coding, and testing of code, there were few noticeable differences between clusters except in the stage of testing code. Here, we saw that Clusters 1 (interactive) and 2 (scripted) had relatively fewer respondents reading others' code in the testing phase (and Cluster 2 had few respondents re-using others' code in the testing phase). It may be that in an interactive notebook or scripting environment, there is relatively less testing, in contrast to the practice of writing unit tests in larger software projects, and as a result, a relatively lower need for alignment with others' code when it comes to testing. We also saw that Cluster 0 (project managed) had no respondents that did not read other people's code or did not re-use other people's code, which suggests that workers in this cluster are coordinating their code with others, using tools like GitHub and Slack.

Table 4. Expectations around one's own code and data being re-used.

                                          Cluster 0 (Project managed)  Cluster 1 (Interactive)  Cluster 2 (Scripted)  All
Expect that their code will be re-used    68.4%                        84.6%                    80.9%                 78.8%
Expect that their data will be re-used    73.7%                        46.1%                    50%                   59.6%

Table 5. Code and data documentation practices according to each cluster.

Documentation Practice                      Cluster 0 (Project managed)  Cluster 1 (Interactive)  Cluster 2 (Scripted)
Code:
  In-line comments                          100%                         84.6%                    90.5%
  Longer blocks of comments in the code     68.4%                        30.8%                    38.1%
  Markdown cells in notebooks               63.2%                        92.3%                    38.1%
  External documents                        63.2%                        61.5%                    28.6%
Data:
  Column labels                             66.7%                        63.6%                    75%
  Data dictionary                           50%                          63.6%                    40%
  Extra text in JSON schema (or similar)    27.8%                        18.2%                    15%
  External documents                        77.8%                        45.5%                    50%

4.4.2 Expectations Around One's Own Code and Data Being Re-used. In Table 4, we report respondents' answers about their expectations for how their own code or data will be used by others. Respondents were more likely to state that they expected others to re-use their code as opposed to their data. In the case of code re-use, expectations were slightly lower for respondents in Cluster 0 and slightly higher for respondents in the other clusters, though this difference was not significant. We also saw that the expectation that data would be re-used was more prevalent in Cluster 0, while relatively low in Clusters 1 and 2. This may be because the native features for displaying structured data or coordinating the sharing of data, such as using version control, are more rudimentary within tools like Jupyter Notebook, although a few recent works have developed prototypes examined in a lab environment [49, 92, 93].


4.4.3 Code and Data Documentation Practices. In Table 5, we show respondents' answers to how they document their code as well as their data, broken down by cluster. Overall, we see higher rates of self-reported code documentation in Clusters 0 (project managed) and 1 (interactive) compared to 2 (scripted). For instance, 100% of members of Cluster 0 said they used in-line comments to document their code. Cluster 2 also had high rates of using in-line comments, though other practices were infrequently used. Unsurprisingly, the use of markdown cells in notebooks was most prevalent in Cluster 1 (interactive), while longer blocks of comments in the code were least used there (30.8%), likely because markdown cells perform that function. A lack of code documentation besides in-line comments in Cluster 2 suggests that it may be more difficult for collaborators to work with code written by members of this cluster. We note that this is not due to a low expectation within Cluster 2 that code would be re-used.

We found that data science workers overall performed less documentation when it comes to data as opposed to code, perhaps due to their perceptions around re-use. Even something basic like adding column labels to data sets was not performed by a quarter to a third of the members of each cluster, as shown in Table 5. Instead, the most prevalent practice within any of the clusters was the use of external documents by Cluster 0 (project managed), at 77.8%. While external documents allow data set curators to add extensive documentation about their data, one major downside is that they are uncoupled from the data—there is little ability to directly reference, link to, or display annotations on top of the data itself. This may lead to issues where the documentation is lost, not noticed by a collaborator, or misunderstood out of context.

4.4.4 Summary. In this section, we examined the collaborative practices of data science workers in relation to the kinds of tools they use. Through clustering, we identified three main "tools profiles". The first makes heavy use of GitHub and Slack and is relatively active in reading other people's code, re-using other people's code, expecting that others would use one's code and data, and documenting code. Of the three clusters, workers with this tool profile seem to have the healthiest collaborative practices. However, even this cluster has relatively low rates of collaboration and documentation around data.

The second cluster primarily uses Jupyter Notebook for data science work. While people in this cluster were generally active in code collaboration and code documentation, we notice a lower rate of reading others' code while testing one's own code, as well as a low expectation that one's data would be re-used.

The third cluster had a greater variety of tool usage but more emphasis on writing scripts in Python or SPSS. This cluster had low rates of code documentation outside of in-line comments, signaling potential difficulties for non-technical collaborators.

5 DISCUSSION

We began our Results by asking "Do data science workers collaborate?" The answer from this survey dataset is clearly "yes." These results are in agreement with prior work [11, 32, 43, 61, 75, 83, 91, 95]. In this paper, we provide a greater depth of collaboration information by exploring the interactions of team roles, tools, project stages, and documentation practices. One of our strong findings is that people in most roles report extensive collaboration during each stage of a data science project. These findings suggest new needs among data science teams and communities, and encourage us to think about a new generation of "collaboration-friendly" data science tools and environments.

5.1 Possible Collaborative Features

5.1.1 Provenance of data. We stated a concern earlier that there seemed to be insufficient use of documentation during multiple stages of data science projects, and fewer practices of documentation for data as opposed to code. Partly this may be due to a lack of expectation that one's data will ever be re-used by another. In addition, there are now mature tools for collaborating on code, due to over a decade of research and practice on this topic in the field of software engineering [17, 87]; however, fewer such tools exist for data, and those that do are not yet widely adopted. The absence of documentation may obscure the source of datasets, as well as the computations performed over datasets in the steps involving data cleaning or transformation. The problem can be compounded if there is a need to combine datasets for richer records. When teams of data science workers share data, the knowledge of one person may be obscured, and the organizational knowledge of one team may not be passed along to a second team. Thus, there is a need for a method to record data provenance. A method for embedding this information within the data themselves would be superior to keeping it in external documents. As one example, the DataHub project [8] replicates GitHub-like features of version control and provenance management, but for datasets. In a similar vein, the ModelDB project provides version control and captures metadata about machine learning models over the course of their development [89]. Beyond provenance captured automatically, there need to be ways for collaborators to record the discussions and decisions made with each transformation.
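As a concrete illustration of embedding provenance within the data themselves, the sketch below stores a simple provenance log inside a dataset file, in the spirit of the systems above. It is a minimal sketch, assuming the Parquet format's support for file-level key-value metadata; the helper functions and the provenance fields are our own illustration, not part of DataHub or ModelDB.

```python
# A minimal sketch of provenance that travels with the data, rather than in
# an external document. Assumes Parquet's file-level key-value metadata.
import json
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

def save_with_provenance(df: pd.DataFrame, path: str, provenance: list) -> None:
    """Write a DataFrame to Parquet, embedding a provenance log in the file."""
    table = pa.Table.from_pandas(df)
    meta = dict(table.schema.metadata or {})
    meta[b"provenance"] = json.dumps(provenance).encode("utf-8")
    pq.write_table(table.replace_schema_metadata(meta), path)

def load_provenance(path: str) -> list:
    """Read back the provenance log that is stored inside the data file."""
    meta = pq.read_table(path).schema.metadata
    return json.loads(meta[b"provenance"])

df = pd.DataFrame({"age": [34, 51], "income": [72000, 48000]})
save_with_provenance(df, "customers.parquet", provenance=[
    {"step": "source", "note": "exported from CRM on 2020-01-15"},   # example
    {"step": "clean", "note": "dropped rows with missing income"},   # example
])
print(load_provenance("customers.parquet"))
```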

5.1.2 Provenance of code. There also remain subtle issues in the provenance of code. At first, it seems as if the reliance of data science on code packages and code libraries should obviate any need for documentation of code in data science. However, in Section 4.3.3, we discussed the invisibility of much of the work on feature extraction and feature engineering. The code for these activities is generally not based on a well-known and well-maintained software package or product. If this code becomes lost, the important knowledge about the nature and meaning of the features [66, 67] may also be lost.

However, a lack of motivation to document lower-level decision-making may be a limiting factor towards stronger documentation practices, particularly in an "exploration" mindset. In the software engineering realm, tools have been proposed to support more lightweight ways for programmers to externalize their thought processes, such as social tagging of code [84] or clipping rationales from the web [59]. Other tools embed and automatically capture context while programmers are foraging for information to guide decisions, such as search [12] and browsing history [24, 37]. Similar ideas could be applied in the case of data science, where users may be weighing the use of different code libraries or statistical methods. Other decisions, such as those around feature engineering, may result from conversations between team members [73] that could then be linked from the code.
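The sketch below gestures at what such lightweight, in-code rationale capture might look like for a feature engineering step. The `rationale` decorator, its fields, and the linked discussion URL are hypothetical illustrations of the idea, not an existing tool from the work cited above.

```python
# A hypothetical sketch of lightweight rationale capture: a decision record
# lives next to the pipeline step it concerns, with a link to the discussion.
import functools
import numpy as np
import pandas as pd

DECISION_LOG = []  # could equally be written to a file versioned with the code

def rationale(why: str, discussion_url: str = ""):
    """Record a human-readable decision alongside the step it concerns."""
    def wrap(fn):
        DECISION_LOG.append({"step": fn.__name__, "why": why,
                             "discussion": discussion_url})
        return fn
    return wrap

@rationale(why="log-transform income to reduce the right skew seen in EDA",
           discussion_url="https://example.com/chat/thread/123")  # placeholder
def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["log_income"] = np.log(df["income"].clip(lower=1))
    return df
```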

In addition, we noticed a drop-off in collaborative code practices when it came to testing already-written code. This has important implications for developing standards around testing for data and model issues of bias, which will only become more important in the years to come. Thus, preserving the provenance of code may also be important to keep the data processing steps transparent and accountable.
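As one hypothetical example of what a shared, testable standard might look like, the sketch below expresses a simple bias check as a unit test over a dataset of decisions, using the common disparate impact ratio (the rate of favorable outcomes for the unprivileged group divided by the rate for the privileged group) and the widely cited four-fifths threshold. The file and column names are assumptions for illustration.

```python
# A minimal sketch of a testable bias check using the disparate impact ratio.
import pandas as pd

def disparate_impact(df: pd.DataFrame, group_col: str,
                     outcome_col: str, privileged) -> float:
    """Ratio of favorable-outcome rates: unprivileged over privileged group."""
    privileged_rate = df.loc[df[group_col] == privileged, outcome_col].mean()
    unprivileged_rate = df.loc[df[group_col] != privileged, outcome_col].mean()
    return unprivileged_rate / privileged_rate

def test_decisions_meet_four_fifths_rule():
    # Assumed file with a binary `approved` column and a `group` column.
    df = pd.read_csv("loan_decisions.csv")
    assert disparate_impact(df, "group", "approved", privileged="A") >= 0.8
```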

5.1.3 Transparency. More broadly, data science projects may inadvertently involve many assumptions, improvisations, and hidden decisions. Some of these undocumented commitments may arise through the assumption that everyone on the team shares certain knowledge—but what about the next team that "inherits" code or data from a prior project? As we just noted, this kind of transparent transmission of knowledge may be important with regard to the design of features [23, 67]. It can also be important for the earlier step of establishing a data management plan, which may define, in part, what qualifies as data in a given project [79].

We advocate making invisible activities more visible—and thus discussable and (when necessary) debatable and accountable. This argument for transparency is related to, but distinct from, the ongoing "Explainable AI" (XAI) initiative. XAI emphasizes using various techniques (e.g., visualization [97]) and designs to make machine learning algorithms understandable by non-technical users [3, 21, 39, 57, 102], whereas we argue for the explanation of decisions among the various data science creators of machine learning algorithms. Recent work in this space similarly argues for more documentation to improve transparency, as well as greater standardization around documentation [27, 65], which is particularly important when it comes to publicly released datasets and models.

5.2 Collaborating with Whom? and When?

These concerns for provenance and transparency may be important to multiple stakeholders. Team members are obvious beneficiaries of good record-keeping. In the language of value sensitive design [25], team members are direct stakeholders—i.e., people who directly interact with data science tools in general, and the project's code in particular. Again using concepts from value sensitive design, there are likely to be multiple indirect stakeholders—i.e., people who are affected by the data science system, or by its code, or by its data.

5.2.1 Indirect Stakeholders. For a data science project, indirect stakeholders might be future project teams. These peers (or future peers) would benefit from understanding what decisions were made, and how data were defined [79] and transformed [23, 67]. For data science projects that affect bank loans [14, 72], prison sentences [78], or community policing [90], the public are also indirect stakeholders, as they worry about the possibility of inequitable treatment or faulty data. Finally, another beneficiary of provenance and transparency is one's own future self, who may return to a data science project after a year or two of other engagements, only to discover that the team has dispersed, personal memories have faded, and the project needs to be learned like any unfamiliar data science resource.

5.2.2 "Imbalanced" Collaboration. In Section 4.1.2, we observed a mismatch in perceived collaborations between different roles. For example, Communicators believed they collaborate a lot with Managers/Executives, but Managers/Executives perceived that they collaborated the least with Communicators. This result reflects the normalized proportions of reported collaborations from each role in Figure 3, so it is possible that Managers/Executives collaborate a great deal with all other roles, and that their collaboration with Communicators simply has the smallest proportion among those collaborations.

We speculate that the collaborations reported by our informants may have been highly directional. Communicators may have received information from other roles—or may simply have read shared documents or observed meetings—to find the information that they needed to communicate. Their role may have been largely to receive information, and they are likely to have been aware of their dependencies on other members of the team. By contrast, the other roles may have perceived Communicators as relatively passive team members. These other roles may have considered that they themselves received little from the Communicators, and may have down-reported their collaborations accordingly.

Different roles also reported different intra-role collaboration patterns in Section 4.1.3. These patterns suggest that the people in these roles may have different relationships with their own communities of practice [22, 98]. There may be stronger peer communities for each of Engineers, Researchers, and Communicators, and weaker peer communities for each of Managers and Domain Experts. It may be that Domain Experts are focused within their own domain, and do not collaborate much with Domain Experts who work in other domains. It may be that Managers/Executives focus on one data science project at a time, and do not consult with their peers about the technical details within each project.


5.3 AI Fairness and Bias

The detection, assessment, and mitigation of bias in data science systems is inherently complex and multidisciplinary, involving expertise in prediction and modeling, statistical assessments in particular domains, domain knowledge of an area of possible harms, and aspects of regulations and law. There may also be roles for advocates for particular affected groups, and possibly advocates for commercial parties who favor maintaining the status quo.

In these settings, a data science pipeline becomes an object of contention. Making sense of the data science pipeline requires multiple interpreters from diverse perspectives, including adversarial interpreters [25]. All of the issues raised above regarding provenance and transparency are relevant.

Our results confirmed that, as reported in Section 4.2.4, activities around AI fairness and bias detection and mitigation do occur along the data science workflow, and this work appears to be treated largely as a technical matter. For example, data scientists and engineers are involved in the process as they follow the latest technical algorithms for detecting and fixing bias. Our results also suggest that Domain Experts play a role in the bias detection and mitigation process, presumably because they know more about how bias may creep into work in their own domains.

However, we did not see much involvement from Communicators and Managers/Executives. This is surprising, as Communicators and Managers are the ones who may know the most about policy requirements, and may worry the most about the negative consequences of a biased AI algorithm. We speculate that Managers may become more involved in this stage in the future, as bias issues become more salient in industry and academia [1, 26, 35].

5.4 Limitations and Future Directions

Our survey respondents were all recruited from IBM—a large, multinational technology company—and their views may not be fully representative of the larger population of professionals working on data science projects.

One example of how our results might be skewed comes from the fact that almost all of our respondents worked in small teams, typically with 5 or 6 collaborators in a team. While this number is consistent with what previous literature reported (e.g., Wang et al. reported 2-3 data scientists in a team [95], and our count also includes managers, communicators, researchers, and engineers), in other contexts the size of data science teams may vary. Also, because all of these respondents are from the same company, their preferences in selecting tools and in how to use those tools may be dominated by the company culture. The findings may differ if we study data science teams' collaborative practices in other scenarios, such as offline data hackathons [43].

Another limitation is that our findings are based on self-reported data from an online survey. While this method can cover a broad user population, it is also known that survey respondents may exhibit bias in answering behavioral questions. We see this as a promising research direction rather than only a limitation, and we look forward to conducting further studies with Contextual Inquiry [82], Participatory Analysis [69], or Value Sensitive Design [25] to observe and track how people actually behave in data science team collaborations.

We should also note that data science teams may not always appreciate proposed features that increase the transparency and accountability of each team member's contributions, as such features may have negative effects. In the co-editing activities enabled by Google Docs-like features, writers sometimes do not want such high transparency [94]. Thus, we need additional user evaluations of collaboration features before deploying them into the real world.


6 CONCLUSION

In this paper, we presented results of a large-scale survey of data science workers at a major corporation, examining how they collaborate. We find that not only do data science workers collaborate extensively, but they also perform a variety of roles and work with a variety of stakeholders during different stages of the data science project workflow. We also investigated the tools that data science workers use when collaborating, and how tool usage relates to collaborative practices such as code and data documentation. From this analysis, we present directions for future research and development of data science collaboration tools.

In summary, we hope we have made the following contributions:

• The first large in-depth survey about data science collaborative practices, and the first large study to provide roles-based analyses of collaborations.

• The first large-scale study of data science activities during specific stages of data science projects.

• The first analysis of collaborative tool usage across the stages of data science projects.

• The first large-scale analysis of documentation practices in data science.

ACKNOWLEDGMENTS

We thank all of the survey respondents for participating in our online survey. This work is generously supported by the MIT-IBM Watson AI Lab under the "Human-in-the-loop Automated Machine Learning" project.

REFERENCES
[1] Serge Abiteboul, Gerome Miklau, Julia Stoyanovich, and Gerhard Weikum. 2016. Data, responsibly (Dagstuhl seminar 16291). In Dagstuhl Reports, Vol. 6. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[2] Saleema Amershi, Bongshin Lee, Ashish Kapoor, Ratul Mahajan, and Blaine Christian. 2011. Human-guided machine learning for fast and accurate network alarm triage. In Twenty-Second International Joint Conference on Artificial Intelligence.
[3] Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N Bennett, Kori Inkpen, et al. 2019. Guidelines for human-AI interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 3.
[4] Cecilia Aragon, Clayton Hutto, Andy Echenique, Brittany Fiore-Gartland, Yun Huang, Jinyoung Kim, Gina Neff, Wanli Xing, and Joseph Bayer. 2016. Developing a research agenda for human-centered data science. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion. ACM, 529–535.
[5] Kenneth R Baker, Lynn Foster-Johnson, Barry Lawson, and Stephen G Powell. 2006. A survey of MBA spreadsheet users. Spreadsheet Engineering Research Project. Tuck School of Business 9 (2006).
[6] Andrew Begel. 2008. Effecting change: Coordination in large-scale software development. In Proceedings of the 2008 international workshop on Cooperative and human aspects of software engineering. ACM, 17–20.
[7] Rachel KE Bellamy, Kuntal Dey, Michael Hind, Samuel C Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, A Mojsilović, et al. 2019. AI Fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias. IBM Journal of Research and Development 63, 4/5 (2019), 4–1.
[8] Anant Bhardwaj, Souvik Bhattacherjee, Amit Chavan, Amol Deshpande, Aaron J Elmore, Samuel Madden, and Aditya G Parameswaran. 2014. DataHub: Collaborative data science & dataset version management at scale. arXiv preprint arXiv:1409.0798 (2014).
[9] Christian Bird, David Pattison, Raissa D'Souza, Vladimir Filkov, and Premkumar Devanbu. 2008. Latent social structure in open source projects. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering. ACM, 24–35.
[10] Chris Bopp, Ellie Harmon, and Amy Voida. 2017. Disempowered by data: Nonprofits, social enterprises, and the consequences of data-driven work. In Proceedings of the 2017 CHI conference on human factors in computing systems. ACM, 3608–3619.
[11] Christine L Borgman, Jillian C Wallis, and Matthew S Mayernik. 2012. Who's got the data? Interdependencies in science and technology collaborations. Computer Supported Cooperative Work (CSCW) 21, 6 (2012), 485–523.
[12] Joel Brandt, Mira Dontcheva, Marcos Weskamp, and Scott R Klemmer. 2010. Example-centric programming: integrating web search into the development environment. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 513–522.
[13] Laurence Brothers, V Sembugamoorthy, and M Muller. 1990. ICICLE: groupware for code inspection. In Proceedings of the 1990 ACM conference on Computer-supported cooperative work. ACM, 169–181.
[14] Matthew Adam Bruckner. 2018. The promise and perils of algorithmic lenders' use of big data. Chi.-Kent L. Rev. 93 (2018), 3.
[15] Murray Campbell, A Joseph Hoane Jr, and Feng-hsiung Hsu. 2002. Deep Blue. Artificial intelligence 134, 1-2 (2002), 57–83.
[16] Rose Chang, Meredith Granger, Alena Bueller, and Taka Shimokobe. 2018. Designing comments. Poster at JupyterCon 2018.
[17] Laura Dabbish, Colleen Stuart, Jason Tsay, and Jim Herbsleb. 2012. Social coding in GitHub: transparency and collaboration in an open software repository. In Proceedings of the ACM 2012 conference on computer supported cooperative work. ACM, 1277–1286.
[18] Tommy Dang, Fang Jin, et al. 2018. Predict saturated thickness using tensorboard visualization. In Proceedings of the Workshop on Visualisation in Environmental Sciences. Eurographics Association, 35–39.
[19] J Steve Davis. 1996. Tools for spreadsheet auditing. International Journal of Human-Computer Studies 45, 4 (1996), 429–442.
[20] Seth Dobrin and IBM Analytics. 2017. How IBM builds an effective data science team. https://venturebeat.com/2017/12/22/how-ibm-builds-an-effective-data-science-team/
[21] Jaimie Drozdal, Justin Weisz, Dakuo Wang, Dass Gaurave, Bingsheng Yao, Changruo Zhao, Michael Muller, Lin Ju, and Hui Su. 2020. Exploring Information Needs for Establishing Trust in Automated Data Science Systems. In IUI'20. ACM, in press.
[22] Paul Duguid. 2005. "The art of knowing": Social and tacit dimensions of knowledge and the limits of the community of practice. The information society 21, 2 (2005), 109–118.
[23] Melanie Feinberg. 2017. A design perspective on data. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. ACM, 2952–2963.
[24] Adam Fourney and Meredith Ringel Morris. 2013. Enhancing technical Q&A forums with CiteHistory. In Seventh International AAAI Conference on Weblogs and Social Media.
[25] Batya Friedman, Peter H Kahn, Alan Borning, and Alina Huldtgren. 2013. Value sensitive design and information systems. In Early engagement and new technologies: Opening up the laboratory. Springer, 55–95.
[26] Megan Garcia. 2016. Racist in the machine: The disturbing implications of algorithmic bias. World Policy Journal 33, 4 (2016), 111–117.
[27] Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2018. Datasheets for Datasets. arXiv e-prints, arXiv:1803.09010 (March 2018). arXiv:cs.DB/1803.09010
[28] Yolanda Gil, James Honaker, Shikhar Gupta, Yibo Ma, Vito D'Orazio, Daniel Garijo, Shruti Gadewar, Qifan Yang, and Neda Jahanshad. 2019. Towards human-guided machine learning. In Proceedings of the 24th International Conference on Intelligent User Interfaces. ACM, 614–624.
[29] Google. [n.d.]. Cloud AutoML. Retrieved 3-April-2019 from https://cloud.google.com/automl/
[30] Google. [n.d.]. Colaboratory. Retrieved 3-April-2019 from https://colab.research.google.com
[31] Brian Granger, Chris Colbert, and Ian Rose. 2017. JupyterLab: The next generation jupyter frontend. JupyterCon 2017 (2017).
[32] Corrado Grappiolo, Emile van Gerwen, Jack Verhoosel, and Lou Somers. 2019. The Semantic Snake Charmer Search Engine: A Tool to Facilitate Data Science in High-tech Industry Domains. In Proceedings of the 2019 Conference on Human Information Interaction and Retrieval. ACM, 355–359.
[33] Jonathan Grudin. 2009. AI and HCI: Two fields divided by a common focus. AI Magazine 30, 4 (2009), 48–48.
[34] Philip J Guo, Sean Kandel, Joseph M Hellerstein, and Jeffrey Heer. 2011. Proactive wrangling: mixed-initiative end-user programming of data transformation scripts. In Proceedings of the 24th annual ACM symposium on User interface software and technology. ACM, 65–74.
[35] Sara Hajian, Francesco Bonchi, and Carlos Castillo. 2016. Algorithmic bias: From discrimination discovery to fairness-aware data mining. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, 2125–2126.
[36] Christine A Halverson, Jason B Ellis, Catalina Danis, and Wendy A Kellogg. 2006. Designing task visualizations to support the coordination of work in software development. In Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work. ACM, 39–48.
[37] Björn Hartmann, Mark Dhillon, and Matthew K Chan. 2011. HyperSource: bridging the gap between source and code-related web sites. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2207–2210.
[38] Bob Hayes. 2018. Top 10 challenges to practicing data science at work. http://businessoverbroadway.com/top-10-challenges-to-practicing-data-science-at-work.
[39] Jeffrey Heer. 2019. Agency plus automation: Designing artificial intelligence into interactive systems. Proceedings of the National Academy of Sciences 116, 6 (2019), 1844–1850.
[40] Jeffrey Heer and Ben Shneiderman. 2012. Interactive dynamics for visual analysis. Queue 10, 2 (2012), 30.
[41] Jeffrey Heer, Fernanda B Viégas, and Martin Wattenberg. 2007. Voyagers and voyeurs: supporting asynchronous collaborative information visualization. In Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, 1029–1038.
[42] James D Herbsleb, Audris Mockus, Thomas A Finholt, and Rebecca E Grinter. 2001. An empirical study of global software development: distance and speed. In Proceedings of the 23rd international conference on software engineering. IEEE Computer Society, 81–90.
[43] Youyang Hou and Dakuo Wang. 2017. Hacking with NPOs: collaborative analytics and broker roles in civic data hackathons. Proceedings of the ACM on Human-Computer Interaction 1, CSCW (2017), 53.
[44] Haiyan Huang and Eileen M Trauth. 2007. Cultural influences and globally distributed information systems development: experiences from Chinese IT professionals. In Proceedings of the 2007 ACM SIGMIS CPR conference on Computer personnel research: The global information technology workforce. ACM, 36–45.
[45] Project Jupyter. [n.d.]. Jupyter Notebook. Retrieved 3-April-2019 from https://jupyter.org
[46] Project Jupyter. [n.d.]. JupyterLab. https://www.github.com/jupyterlab/jupyterlab
[47] Kaggle. 2018. Kaggle Data Science Survey 2018. Retrieved 17-September-2019 from https://www.kaggle.com/sudhirnl7/data-science-survey-2018/
[48] Sean Kandel, Andreas Paepcke, Joseph Hellerstein, and Jeffrey Heer. 2011. Wrangler: Interactive visual specification of data transformation scripts. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 3363–3372.
[49] Mary Beth Kery, Bonnie E John, Patrick O'Flaherty, Amber Horvath, and Brad A Myers. 2019. Towards Effective Foraging by Data Scientists to Find Past Analysis Choices. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 92.
[50] Mary Beth Kery, Marissa Radensky, Mahima Arya, Bonnie E John, and Brad A Myers. 2018. The story in the notebook: Exploratory data science using a literate programming tool. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 174.
[51] Miryung Kim, Thomas Zimmermann, Robert DeLine, and Andrew Begel. 2016. The emerging role of data scientists on software development teams. In Proceedings of the 38th International Conference on Software Engineering. ACM, 96–107.
[52] Thomas Kluyver, Benjamin Ragan-Kelley, Fernando Pérez, Brian E Granger, Matthias Bussonnier, Jonathan Frederic, Kyle Kelley, Jessica B Hamrick, Jason Grout, Sylvain Corlay, et al. 2016. Jupyter Notebooks - a publishing format for reproducible computational workflows. In ELPUB. 87–90.
[53] Robert E Kraut and Lynn A Streeter. 1995. Coordination in software development. Commun. ACM 38, 3 (1995), 69–82.
[54] Sean Kross and Philip J Guo. 2019. Practitioners Teaching Data Science in Industry and Academia: Expectations, Workflows, and Challenges. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 263.
[55] George Lawton. 2018. The nine roles you need on your data science research team. TechTarget. https://searchcio.techtarget.com/news/252445605/The-nine-roles-you-need-on-your-data-science-research-team.
[56] Chang Han Lee. 2014. Data career paths: Data analyst vs. data scientist vs. data engineer: 3 data careers decoded and what it means for you. Udacity. https://blog.udacity.com/2014/12/data-analyst-vs-data-scientist-vs-data-engineer.html.
[57] Q Vera Liao, Daniel Gruen, and Sarah Miller. 2020. Questioning the AI: Informing Design Practices for Explainable AI User Experiences. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. ACM.
[58] RC Littell, WW Stroup, GA Milliken, RD Wolfinger, and O Schabenberger. 2006. SAS for Mixed Models, 2nd edition. SAS Institute, Cary, North Carolina, USA (2006).
[59] Michael Xieyang Liu, Jane Hsieh, Nathan Hahn, Angelina Zhou, Emily Deng, Shaun Burley, Cynthia Taylor, Aniket Kittur, and Brad A Myers. 2019. Unakite: Scaffolding Developers' Decision-Making Using the Web. In Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology. ACM, 67–80.
[60] Sijia Liu, Parikshit Ram, Deepak Vijaykeerthy, Djallel Bouneffouf, Gregory Bramble, Horst Samulowitz, Dakuo Wang, Andrew Conn, and Alexander Gray. 2019. An ADMM Based Framework for AutoML Pipeline Configuration. arXiv:cs.LG/1905.00424
[61] Yaoli Mao, Dakuo Wang, Michael Muller, Kush Varshney, Ioana Baldini, Casey Dugan, and Aleksandra Mojsilovic. 2020. How Data Scientists Work Together With Domain Experts in Scientific Collaborations. In Proceedings of the 2020 ACM conference on GROUP. ACM.
[62] Kate Matsudaira. 2015. The science of managing data science. Queue 13, 4 (2015), 30.
[63] Ralf Mikut and Markus Reischl. 2011. Data mining tools. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1, 5 (2011), 431–443.
[64] Steven Miller. 2014. Collaborative approaches needed to close the big data skills gap. Journal of Organization Design 3, 1 (2014), 26–30.
[65] Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 220–229.
[66] Michael Muller, Melanie Feinberg, Timothy George, Steven J Jackson, Bonnie E John, Mary Beth Kery, and Samir Passi. 2019. Human-Centered Study of Data Science Work Practices. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, W15.
[67] Michael Muller, Ingrid Lange, Dakuo Wang, David Piorkowski, Jason Tsay, Q. Vera Liao, Casey Dugan, and Thomas Erickson. 2019. How Data Science Workers Work with Data: Discovery, Capture, Curation, Design, Creation. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19). ACM, New York, NY, USA. Forthcoming.
[68] Michael Muller and Dakuo Wang. 2018. Explore new features with us. Lab demo at JupyterCon 2018.
[69] Michael J Muller. 2001. Layered participatory analysis: New developments in the CARD technique. In Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, 90–97.
[70] Oded Nov and Chen Ye. 2010. Why do people tag?: motivations for photo tagging. Commun. ACM 53, 7 (2010), 128–131.
[71] Judith S Olson, Dakuo Wang, Gary M Olson, and Jingwen Zhang. 2017. How people write together now: Beginning the investigation with advanced undergraduates in a project course. ACM Transactions on Computer-Human Interaction (TOCHI) 24, 1 (2017), 4.
[72] Cathy O'Neil. 2016. Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books.
[73] Soya Park, Amy X. Zhang, and David R. Karger. 2018. Post-literate Programming: Linking Discussion and Code in Software Development Teams. In The 31st Annual ACM Symposium on User Interface Software and Technology Adjunct Proceedings (UIST '18 Adjunct). ACM, New York, NY, USA, 51–53. https://doi.org/10.1145/3266037.3266098
[74] Chris Parnin. 2013. Programmer, interrupted. In 2013 IEEE Symposium on Visual Languages and Human Centric Computing. IEEE, 171–172.
[75] Samir Passi and Steven Jackson. 2017. Data vision: Learning to see through algorithmic abstraction. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. ACM, 2436–2447.
[76] Samir Passi and Steven J Jackson. 2018. Trust in Data Science: Collaboration, Translation, and Accountability in Corporate Data Science Projects. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 136.
[77] D.J. Patil. 2011. Building data science teams. Stanford University. http://web.stanford.edu/group/mmds/slides2012/s-patil1.pdf.
[78] Sarah Picard, Matt Watkins, Michael Rempel, and Ashmini Kerodal. [n.d.]. Beyond the Algorithm. ([n.d.]).
[79] Kathleen H Pine and Max Liboiron. 2015. The politics of measurement and action. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. ACM, 3147–3156.
[80] Tye Rattenbury, Joseph M Hellerstein, Jeffrey Heer, Sean Kandel, and Connor Carreras. 2017. Principles of data wrangling: Practical techniques for data preparation. O'Reilly Media, Inc.
[81] Adam Rule, Aurélien Tabard, and James D Hollan. 2018. Exploration and explanation in computational notebooks. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 32.
[82] Katie A Siek, Gillian R Hayes, Mark W Newman, and John C Tang. 2014. Field deployments: Knowing from using in context. In Ways of Knowing in HCI. Springer, 119–142.
[83] Manuel Stein, Halldór Janetzko, Daniel Seebacher, Alexander Jäger, Manuel Nagel, Jürgen Hölsch, Sven Kosub, Tobias Schreck, Daniel Keim, and Michael Grossniklaus. 2017. How to make sense of team sport data: From acquisition to data modeling and research aspects. Data 2, 1 (2017), 2.
[84] Margaret-Anne Storey, Li-Te Cheng, Ian Bull, and Peter Rigby. 2006. Shared waypoints and social tagging to support collaboration in software development. In Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work. ACM, 195–198.
[85] Charles Sutton, Timothy Hobson, James Geddes, and Rich Caruana. 2018. Data diff: Interpretable, executable summaries of changes in distributions for data wrangling. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2279–2288.
[86] Bjørnar Tessem and Jon Iden. 2008. Cooperation between developers and operations in software engineering projects. In Proceedings of the 2008 international workshop on Cooperative and human aspects of software engineering. ACM, 105–108.
[87] Christoph Treude, Margaret-Anne Storey, and Jens Weber. 2009. Empirical studies on collaboration in software development: A systematic literature review. (2009).
[88] Michelle Ufford, Matthew Seal, and Kyle Kelley. 2018. Beyond Interactive: Notebook Innovation at Netflix.
[89] Manasi Vartak, Harihar Subramanyam, Wei-En Lee, Srinidhi Viswanathan, Saadiyah Husnoo, Samuel Madden, and Matei Zaharia. 2016. ModelDB: a system for machine learning model management. In Proceedings of the Workshop on Human-In-the-Loop Data Analytics. ACM, 14.
[90] Nitya Verma and Lynn Dombrowski. 2018. Confronting Social Criticisms: Challenges when Adopting Data-Driven Policing Strategies. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 469.
[91] Stijn Viaene. 2013. Data scientists aren't domain experts. IT Professional 15, 6 (2013), 12–17.
[92] April Yi Wang, Anant Mittal, Christopher Brooks, and Steve Oney. 2019. How Data Scientists Use Computational Notebooks for Real-Time Collaboration. In Proceedings of the 2019 CHI Conference Extended Abstracts on Human Factors in Computing Systems. Article 39.
[93] April Yi Wang, Zihan Wu, Christopher Brooks, and Steve Oney. 2020. Callisto: Capturing the "Why" by Connecting Conversations with Computational Narratives. In Proceedings of the 2020 CHI Conference Extended Abstracts on Human Factors in Computing Systems. In press.
[94] Dakuo Wang, Judith S. Olson, Jingwen Zhang, Trung Nguyen, and Gary M. Olson. 2015. DocuViz: Visualizing Collaborative Writing. In Proceedings of CHI'15. ACM, New York, NY, USA, 1865–1874.
[95] Dakuo Wang, Justin D. Weisz, Michael Muller, Parikshit Ram, Werner Geyer, Casey Dugan, Yla Tausczik, Horst Samulowitz, and Alexander Gray. 2019. Human-AI Collaboration in Data Science: Exploring Data Scientists' Perceptions of Automated AI. To appear in Computer Supported Cooperative Work (CSCW) (2019).
[96] Fei-Yue Wang, Jun Jason Zhang, Xinhu Zheng, Xiao Wang, Yong Yuan, Xiaoxiao Dai, Jie Zhang, and Liuqing Yang. 2016. Where does AlphaGo go: From Church-Turing thesis to AlphaGo thesis and beyond. IEEE/CAA Journal of Automatica Sinica 3, 2 (2016), 113–120.
[97] Daniel Weidele, Justin Weisz, Erick Oduor, Michael Muller, Josh Andres, Alexander Gray, and Dakuo Wang. 2020. AutoAIViz: Opening the Blackbox of Automated Artificial Intelligence with Conditional Parallel Coordinates. In IUI'20. ACM, in press.
[98] Etienne Wenger. 2011. Communities of practice: A brief introduction. (2011).
[99] Marian G Williams and Vivienne Begg. 1993. Translation between software designers and users. Commun. ACM 36, 6 (1993), 102–104.
[100] Yu Wu, Jessica Kropczynski, Patrick C Shih, and John M Carroll. 2014. Exploring the ecosystem of software developers on GitHub and other platforms. In Proceedings of the companion publication of the 17th ACM conference on Computer supported cooperative work & social computing. ACM, 265–268.
[101] Shaoke Zhang, Chen Zhao, Qiang Zhang, Hui Su, Haiyan Guo, Jie Cui, Yingxin Pan, and Paul Moody. 2007. Managing collaborative activities in project management. In Proceedings of the 2007 symposium on Computer human interaction for the management of information technology. ACM, 3.
[102] Yunfeng Zhang, Q Vera Liao, and Rachel KE Bellamy. 2020. Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM.
[103] Marc-André Zöller and Marco F Huber. 2019. Survey on Automated Machine Learning. arXiv preprint arXiv:1904.12054 (2019).

Received October 2019; revised January 2020; accepted January 2020
