Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL

Posted on 05-Dec-2014

DESCRIPTION

This talk will introduce a paradigm for enabling access to large, unstructured, and novel datasets in enterprises, while retaining value from existing tools and staff. By following a real-world example, the discussion will walk through how small, central data science teams can make data discoveries and data value accessible to others. We will also review the tools, data science approaches, and best practices for uncovering, polishing, and digesting signal in data to support analytics at the front lines of business.

TRANSCRIPT

Organizing for Data Science

Dan Mallinger, Data Science Practice Manager

September 2014

CONFIDENTIAL | 2

Dan Mallinger

•  Data Science Practice Manager
   −  Think Big Analytics
•  Working with clients across
   −  Financial Services
   −  Advertising
   −  Manufacturing
   −  Social
   −  Network Providers

Today

•  Define Data Science in the Organization
•  Look at Current Perspectives on Organization
•  Discuss Shortcomings
•  Review a Real World Solution

What Do We Hope to Do?

•  Use Data to Improve Our Business
•  Better Understand Customers
•  Act Proactively, Not Reactively

Why Organize?

•  Scale
•  Robustness
•  Repeatability

Perception: What Does Data Science Do?

•  Revolutionizing Ad Targeting
•  Automating Deals and Recommendations
•  Alerting Admins to New Network Attacks

What Does It Take?

•  Specific Data Expertise
•  Exploratory Analysis
•  Modeling
•  Creativity
•  Programming
•  Big Data
•  Communication
•  Ability to Target Impact
•  Unstructured Analysis
•  Organizational Politics
•  Visualization
•  …

The New Toy: A Center of Excellence

•  Centralized
   - Brings data, analysis, and processing together
   - Data scientists support one another
•  Distributed
   - Data scientists close to business
   - Multiple models for rotating data scientists into lines of business

[Diagram: CoE; Line of Business A; Line of Business B; Line of Business C]

What Does It Still Take?

•  Specific Data Expertise
•  Exploratory Analysis
•  Modeling
•  Creativity
•  Programming
•  Big Data
•  Communication
•  Ability to Target Impact
•  Unstructured Analysis
•  Organizational Politics
•  Visualization
•  …

If You Build It, They Will Come?

•  Designed a great home for unicorns
•  But they are still unicorns

Working with Horses, Not Unicorns

•  Unravel Capability
•  Map Activities to Functional Roles
•  Align Functions with Process, Not Individuals
•  Don’t Forget to Scale

CLIENT EXAMPLE Clickstream Data in Action

•  Identify Fraudulent Sessions
•  Cross Channel Analysis
•  Next Best Action
•  Optimize Pathways
•  Determine Session Interest
•  Customizing Experience
•  Proactive Outreach
•  Search Analysis
•  Content Optimization

CLIENT EXAMPLE Scaling Data Science

•  Billions of clicks
•  Unstructured data
•  How do we model it?!
•  Model the SIGNAL
•  Not the data
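
One way to read "model the signal, not the data" is to reduce raw clicks to a handful of session-level features that analysts can work with in their existing tools. The sketch below is only an illustration of that idea, not the client's pipeline: the `(user_id, timestamp, page)` click layout, the 30-minute inactivity timeout, the `/search` path prefix, and the chosen features are all assumptions made for the example.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout; not from the talk


def sessionize(clicks, gap=SESSION_GAP_SECONDS):
    """Group (user_id, timestamp, page) click tuples into per-user sessions."""
    by_user = defaultdict(list)
    for user_id, ts, page in clicks:
        by_user[user_id].append((ts, page))

    sessions = []
    for user_id, events in by_user.items():
        events.sort()  # order each user's clicks by timestamp
        current = [events[0]]
        for event in events[1:]:
            # A long pause since the previous click starts a new session.
            if event[0] - current[-1][0] > gap:
                sessions.append((user_id, current))
                current = [event]
            else:
                current.append(event)
        sessions.append((user_id, current))
    return sessions


def session_features(user_id, events):
    """Reduce one session to a small feature record (hypothetical features)."""
    timestamps = [ts for ts, _ in events]
    pages = [page for _, page in events]
    return {
        "user_id": user_id,
        "n_clicks": len(events),
        "n_distinct_pages": len(set(pages)),
        "duration_seconds": timestamps[-1] - timestamps[0],
        "used_search": any(page.startswith("/search") for page in pages),
    }


# Made-up clicks, purely for illustration.
clicks = [
    ("u1", 0, "/home"),
    ("u1", 40, "/search?q=loans"),
    ("u1", 95, "/product/42"),
    ("u1", 4000, "/home"),
    ("u2", 10, "/home"),
]
features = [session_features(u, ev) for u, ev in sessionize(clicks)]
```

Each record in `features` is one row per session, so downstream models and reports operate on a few columns of signal rather than on billions of raw click rows.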

CLIENT EXAMPLE Clickstream Data Science in Action

[Diagram labels: Hadoop 1.0; MPP Web; Feature Selection & Dimensionality Reduction]

CLIENT EXAMPLE Extracting Signal: Hadoop 1.0

•  Feature Selection
   - Forests
   - Clustering
•  Dimensionality Reduction
   - SVM
•  Challenges
   - Job Latency
   - Limited Iterations

CLIENT EXAMPLE Extracting Signal: Hadoop 2.0

•  Spark
   −  Faster response in exploration
   −  Better Support for Iterative Models
•  Genetic Algorithms
•  Neural Networks
•  Challenges
   −  In memory: costly and limiting
   −  MapReduce does not go away

CLIENT EXAMPLE Horses, Not Unicorns

•  Focus on Technical Skills
   - EDA
   - Modeling
   - Programming / Big Data
•  Communication Skills
   - Capturing signal needs
   - Iterating with stakeholders

[Diagram label: Hadoop 1.0]

CLIENT EXAMPLE CoE Next Steps

•  Continue to make signal available to analysts
   −  Next up: Extracting signal from text
•  Act as a capability search party
   −  Sprints of new insights and tools
•  Finalize operating model
   −  Funding structure
   −  Engagement model with lines of business

Discussion Over Drinks
