data science popup austin: conflict in growing data science organizations

DATA SCIENCE POP UP AUSTIN

Design of Conflict Management Systems in Data Science

Eduardo Ariño de la Rubia

earino

VP of Product & Data Scientist in Residence, Domino Data Lab

Upload: domino-data-lab

Post on 05-Jan-2017



#datapopupaustin

April 13, 2016, Galvanize, Austin Campus

Oh The Conflicts You’ll Face

Conflict in Growing Data Science Organizations

A Quick Introduction

● Eduardo Ariño de la Rubia
● VP of Product & Data Scientist in Residence at Domino Data Lab
● Computer programmer for… too long
● HPC (PVM & MPI), ML since the mid 90s
● Husband, father, dog owner
● I share too much on twitter (@earino)

conflict noun: conflict; plural noun: conflicts /känˌflikt/

an incompatibility between two or more opinions, principles, or interests.

4 Theories of Conflict

Theories of Conflict

1. Individual Characteristics
2. Social Process
3. Social Structure
4. Formal Theories

Individual Characteristic Theories

These theories focus on understanding individual aggression, and see such aggression as the source of conflict.

Conflict resolution focuses on containing or redirecting aggressive tendencies.

Examples:

1. That data scientist is hard to work with
2. Steve hates it when Laura has a better answer

Social Process Theories

Social process theories treat conflict and conflict resolution as processes which cannot be explained entirely in terms of either individual behavior or social structures.

Social process theorists may focus on such issues as patterns of conflict escalation, the role of conflict in society, or the relation between conflict and competition.

Examples:

1. PhDs in Physics are trained to be difficult
2. That department is just ornery

Social Structure Theories

These theories view the social organization as the main source of conflict. Class divisions, racial or ethnic divisions, or sex divisions form the basis for social conflict.

Such theories recommend one of five basic approaches to conflict resolution: avoidance, acceptance, gradual social reform, nonviolent confrontation, or violent confrontation.

Examples:

1. The marketing department refuses to share their data
2. Sales is always making promises we can’t keep

Formal Theories

Formal theories attempt to explain conflict by use of logical or mathematical models. Formal models are both powerful and flexible, but can be difficult to understand and apply.

Examples:

1. We could have predicted the data lake would become a swamp; it’s a tragedy of the commons

2. We’re just going to go tit-for-tat until one of us defects
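To make the idea of a formal model concrete, the tit-for-tat example on this slide can be sketched as an iterated prisoner's dilemma. This is only an illustration, not something shown in the talk: the payoff values, the `defect_after` strategy, and the number of rounds are all assumptions chosen for the sketch.

```python
# Illustrative payoffs (an assumption, not from the talk): the usual
# prisoner's dilemma ordering, temptation > reward > punishment > sucker.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def defect_after(round_number):
    """Build a hypothetical strategy that cooperates, then defects for good."""
    def strategy(opponent_history):
        return "D" if len(opponent_history) >= round_number else "C"
    return strategy


def play(strategy_a, strategy_b, rounds):
    """Play an iterated game; return both move histories and total scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each side only sees the other side's past moves.
        move_a = strategy_a(moves_b)
        move_b = strategy_b(moves_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        moves_a.append(move_a)
        moves_b.append(move_b)
        score_a += pay_a
        score_b += pay_b
    return moves_a, moves_b, score_a, score_b


moves_a, moves_b, score_a, score_b = play(tit_for_tat, defect_after(3), rounds=6)
print("".join(moves_a), score_a)  # tit-for-tat player
print("".join(moves_b), score_b)  # player who eventually defects
```

Under these assumed payoffs, the tit-for-tat player is exploited once and then retaliates, after which both sides are stuck in mutual defection and earn less per round than they did while cooperating.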

Company X Gets Them a Data Scientist

Evolution of Data Science in an Organization

Company X has a problem.

Company X realizes that they’re losing out on business, that their customers seem to be one step ahead of them, and that they need to be smarter.

Company X realizes that the way to get ahead of this is by hiring someone who will help them curate their data, gain insights, and come up with ways to enhance their offering using data.

Company X is going to hire a data scientist.

Company X Now Has >=2 Problems

Phase #1 - Single Data Scientist

● Probably the most productive way of running a data science department

Conflicts:

● Why does this data scientist get unfettered access to our data?
● This data scientist doesn’t understand DEPARTMENT_X and is misrepresenting the state of things
● How do we know we can “trust” these models?

Conflict Types:

● Usually individual characteristics

Phase #2 - Get the Data Scientist Some Help

● The data scientist gets help from either a Junior Data Scientist (rare) or more likely a Data Engineer

Conflict:

● Usually doesn’t actually speed anything up, since the data scientist now takes on larger problems
● It turns out that managing someone and coming up with good tasks for them to do isn’t trivial (who knew?)

Conflict Types:

● Usually individual characteristics, sometimes social structure.

Phase #3 - A Data Scientist in Every Pot

● This is probably the most common place organizations stop.
● Every product team / department gets their own data scientist (LinkedIn)

Conflicts

● Why are we doing so much “redundant work”?
● Managing feedback loops and complexity

Read: Machine Learning: The High Interest Credit Card of Technical Debt

Conflict Types

● Mostly social process: teams become tribes, etc.

A Short Pitch

The complexity of a DS department grows quite a bit beyond this point. You need a series of conventions. It’s hard to keep experiments straight, there are wacky feedback loops, the world gets hard. There are 3 principles you should follow:

1. Focus on interests
2. Build in feedback loops
3. Consultation before, feedback after

In short, either build tooling that supports these conflict resolution principles at every turn, or use a platform that supports this.

Phase #4 - Why aren’t these Data Scientists under IT?

● A relatively rare power play, but I have seen it happen in “pure tech” businesses such as SaaS, apps, and games.
● Sometimes also “let’s just put Data Science under BI”

Conflicts

● Data science is not the same thing as product engineering / BI
● Engineering management is poorly calibrated for EDA, feature engineering, amorphous poorly specified goals, etc…
● You will be forced to use Agile

Conflict Types

● Pretty much everything.

A Quick Sidebar About Agile...

10 Assumptions of Agile for Software (there are more)

1. Teams stay together over time
2. People are specializing generalists
3. People are engaged and motivated
4. Teams deliver products
5. Projects come to teams
6. Teams are loosely coupled to the organization
7. Teams have minimal external dependencies
8. Fully engaged customers
9. Established architecture and processes
10. There are clearly understood goals and metrics

How do those assumptions stack up for DS?

1. Teams stay together over time
2. People are specializing generalists
3. People are engaged and motivated
4. Teams deliver products (well, maybe?)
5. Projects come to teams
6. Teams are loosely coupled to the organization
7. Teams have minimal external dependencies
8. Fully engaged customers
9. Established architecture and processes
10. There are clearly understood goals and metrics

Phase #5 - COE / Internal Consulting Model

● Probably the most successful model I have seen
● Data science reports to data science leadership, but individual data scientists are deployed to project teams
● Very flexible

Conflicts

● Constant fight for resources (now you’re just another department)
● Challenging to invest time and effort to learn specifics of silos in the business

Conflict Types

● Social structure

Conclusion

1. Try to understand what is driving conflict in your organization

2. Apply 3 principles of dispute resolution

a. Interests first

b. Build in feedback loops

c. Consultation before, feedback after

3. Build / Use tooling which allows you to formalize conflict resolution processes, so that each time is not an ad-hoc adventure

4. Be data informed, not data driven. Remember that sampling a data generating process creates bias.

5. Figure out where you are in your organization’s development and where you want to be.
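Point 4 of the conclusion warns that sampling a data generating process creates bias. One possible reading of that warning, sketched here under stated assumptions, is selection bias: the data you observe passed through a collection mechanism that favors some values over others. The satisfaction scores, the `responds` rule, and all numbers below are hypothetical and are not taken from the talk.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical data generating process: satisfaction scores centered on 50.
population = [random.gauss(50, 10) for _ in range(100_000)]


def responds(score):
    """Hypothetical collection mechanism: higher scores respond more often."""
    probability = min(1.0, max(0.0, score / 100))
    return random.random() < probability


# What the analyst actually gets to see: only the scores that responded.
observed = [score for score in population if responds(score)]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


print(f"true mean:     {mean(population):.2f}")
print(f"observed mean: {mean(observed):.2f}")
```

Because higher scores are over-represented among the responders, the observed mean comes out higher than the true mean, even though no individual measurement is wrong.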

@datapopup #datapopupaustin