Dan Mallinger – Data Science Practice Manager, Think Big Analytics at MLconf ATL
DESCRIPTION
This talk will introduce a paradigm for enabling access to large, unstructured, and novel datasets in enterprises while retaining value from existing tools and staff. By following a real-world example, the discussion will walk through how small, central data science teams can make data discoveries and data value accessible to others. We will also review the tools, data science approaches, and best practices for uncovering, polishing, and digesting signal in data to support analytics at the front lines of business.
TRANSCRIPT
Organizing for Data Science
Dan Mallinger Data Science Practice Manager
September 2014
CONFIDENTIAL | 2
Dan Mallinger
• Data Science Practice Manager
− Think Big Analytics
• Working with clients across
− Financial Services
− Advertising
− Manufacturing
− Social
− Network Providers
Today
• Define Data Science in the Organization
• Look at Current Perspectives on Organization
• Discuss Shortcomings
• Review a Real World Solution
What Do We Hope to Do?
• Use Data to Improve Our Business
• Better Understand Customers
• Act Proactively, Not Reactively
Why Organize?
• Scale
• Robustness
• Repeatability
Perception: What Does Data Science Do?
• Revolutionizing Ad Targeting
• Automating Deals and Recommendations
• Alerting Admins to New Network Attacks
What Does It Take?
• Specific Data Expertise
• Exploratory Analysis
• Modeling
• Creativity
• Programming
• Big Data
• Communication
• Ability to Target Impact
• Unstructured Analysis
• Organizational Politics
• Visualization
• …
The New Toy: A Center of Excellence
• Centralized
− Brings data, analysis, and processing together
− Data scientists support one another
• Distributed
− Data scientists close to business
− Multiple models for rotating data scientists into lines of business
[Diagram labels: CoE; Line of Business A; Line of Business B; Line of Business C]
What Does It Still Take?
• Specific Data Expertise
• Exploratory Analysis
• Modeling
• Creativity
• Programming
• Big Data
• Communication
• Ability to Target Impact
• Unstructured Analysis
• Organizational Politics
• Visualization
• …
If You Build It, They Will Come?
• Designed a great home for unicorns
• But they are still unicorns
Working with Horses, Not Unicorns
• Unravel Capability
• Map Activities to Functional Roles
• Align Functions with Process, Not Individuals
• Don’t Forget to Scale
CLIENT EXAMPLE: Clickstream Data in Action
• Identify Fraudulent Sessions
• Cross Channel Analysis
• Next Best Action
• Optimize Pathways
• Determine Session Interest
• Customizing Experience
• Proactive Outreach
• Search Analysis
• Content Optimization
CLIENT EXAMPLE: Scaling Data Science
• Billions of clicks
• Unstructured data
• How do we model it?!
• Model the SIGNAL
• Not the data
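One way to read "model the signal, not the data" is to collapse raw click events into a handful of per-session features before any modeling happens. The sketch below is illustrative only and is not the client's pipeline: the `Click` record, the `session_signals` helper, and the chosen features are all hypothetical names and assumptions introduced here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Click:
    """Hypothetical raw click record; field names are assumptions."""
    session_id: str
    timestamp: float  # seconds since some epoch
    page: str


def session_signals(clicks: Iterable[Click]) -> Dict[str, Dict[str, float]]:
    """Collapse raw clicks into a few per-session features (the "signal")."""
    # Group the raw events by session.
    by_session: Dict[str, List[Click]] = {}
    for click in clicks:
        by_session.setdefault(click.session_id, []).append(click)

    signals: Dict[str, Dict[str, float]] = {}
    for session_id, events in by_session.items():
        events.sort(key=lambda c: c.timestamp)
        pages = Counter(c.page for c in events)
        duration = events[-1].timestamp - events[0].timestamp
        signals[session_id] = {
            "n_clicks": float(len(events)),
            "n_distinct_pages": float(len(pages)),
            "duration_s": duration,
            # Single-click sessions have zero duration; report 0.0 for them.
            "clicks_per_minute": (
                len(events) / (duration / 60.0) if duration > 0 else 0.0
            ),
        }
    return signals


example = session_signals([
    Click("a", 0.0, "/home"),
    Click("a", 30.0, "/search"),
    Click("a", 60.0, "/home"),
    Click("b", 5.0, "/home"),
])
```

Downstream models and analysts would then work with one small row per session instead of billions of raw clicks; which features count as signal depends on the business question being asked.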
CLIENT EXAMPLE: Clickstream Data Science in Action
[Diagram labels: Hadoop 1.0; MPP Web; Feature Selection & Dimensionality Reduction]
CLIENT EXAMPLE: Extracting Signal: Hadoop 1.0
• Feature Selection
− Forests
− Clustering
• Dimensionality Reduction
− SVM
• Challenges
− Job Latency
− Limited Iterations
CLIENT EXAMPLE: Extracting Signal: Hadoop 2.0
• Spark
− Faster response in exploration
− Better Support for Iterative Models
• Genetic Algorithms
• Neural Networks
• Challenges
− In memory: costly and limiting
− MapReduce does not go away
CLIENT EXAMPLE: Horses, Not Unicorns
• Focus on Technical Skills
− EDA
− Modeling
− Programming / Big Data
• Communication Skills
− Capturing signal needs
− Iterating with stakeholders
[Diagram label: Hadoop 1.0]
CLIENT EXAMPLE: CoE Next Steps
• Continue to make signal available to analysts
− Next up: Extracting signal from text
• Act as a capability search party
− Sprints of new insights and tools
• Finalize operating model
− Funding structure
− Engagement model with lines of business
Discussion Over Drinks