Uploaded by datasciencemd on 15-Jan-2017


TRANSCRIPT

Page 1: Building Successful Data Science Applications

Building successful data science applications

6 concepts every data science team needs to understand

Niels Kasch

Page 2

About Niels
• Co-founder of Miner & Kasch
• ML and NLP
• Data enthusiast
• Probably knows your 401(k) balance
• Ph.D. in Computer Science from UMBC

https://www.linkedin.com/in/nielskasch

@nielskasch

Page 3

Purpose of this talk

Lessons learned from observing data science in the wild

• How do you get the most out of your DS team?
• What should you watch out for when you start doing data science?

Page 4

What makes a data science application successful?

Yay
• Uses data
• Uses models and ML algorithms
• Is deployed
• Scales
• Requires no human in the operational loop
• Informs decision-making
• Has a large impact

Nay
• Not actionable
• Not repeatable
• Dies in a PowerPoint deck
• Tied to the data scientist who made it
• Not scalable

Page 5

#1 - Position the data science team appropriately within the organization

Central

Pros
• DSaaS (data science as a service): the Chief Data Scientist (CDS) prioritizes projects
• The entire org can utilize DS resources
• Objective analytics
• Sharing of analytics knowledge

Cons
• Trust and relationships with business units
• Depth of domain knowledge

[Diagram: a central Data Science Team, led by the Chief Data Scientist, serving Business Unit … Business Unit]

Page 6

#1 - Position the data science team appropriately within the organization

Within Business Units

Pros
• Business units are more likely to embrace DS efforts
• Fast turnaround on business unit requests
• Specialization of data scientists in the business unit's data and processes

Cons
• Potential for siloed analytics and data assets
• Duplication of infrastructure

[Diagram: each Business Unit has its own Data Science Team]

Page 7

#1 - Position the data science team appropriately within the organization

Mixed

Pros
• Attempts to use the best of both worlds
• Fosters knowledge sharing across the DS team and business units
• Diversity for data scientists

Cons
• Potential for mixed objectives from two bosses
• Prioritization of DS efforts

[Diagram: Chief Data Scientist, Data Science Team, and Business Unit]

Page 8

#1 - Position the data science team appropriately within the organization

Why is organizational structure so important?

What happens if you don’t have the right structure?

Page 9

#2 - Assemble the right team
• No one person can do it all

Team properties
• Business acumen
• Expertise in statistics
• Data wrangler
• Domain expertise
• People who know the tools
• Expertise in machine learning

Page 10

#2 - Assemble the right team

Team composition
• Data Scientist
  ○ Statistician – has a strong methodological background
  ○ ML expert – develops predictive and explanatory models
  ○ Data analyst – exhibits strong communication skills (presentation, visualization)
  ○ Business analyst – is a domain expert and understands business needs
• Data Engineer – versatile across the full stack; performs ETL and model operationalization
• Data Architect – streamlines, centralizes, and maintains data assets
• Project manager – manages people and projects, understands tools and methods, relates to the business

Page 11

#2 - Assemble the right team

Why is the right team important?

What happens if you don’t have the right team?

Page 12

#3 - Conduct repeatable data science through processes
• CRISP-DM, SEMMA, ASUM-DM, OSEMN, Scrum
• Iterate fast and often
• Stay aligned with the business

Page 13

#3 - Data Science Process

Data science is an interactive process between SMEs and data scientists.

• Stakeholder involvement
  ○ IT (DBA, Data Architect)
  ○ Business stakeholders
• Stakeholders are involved in every aspect of the process
  ○ Define the problem according to business need (value & impact)
  ○ Knowledge transfer from subject matter experts
  ○ Review progress and provide feedback

Process steps
• Define problem
  ○ Define the use case with business stakeholders and SMEs
  ○ Define the dependent variable
• Explore data
  ○ Integrate data assets
  ○ Check data completeness
  ○ Develop data dictionaries and data summary statistics
• Develop features
  ○ Derive independent variables in support of the modeling task
  ○ Impute missing values
• Create model
  ○ Develop predictive and explanatory models
  ○ Answer the why and what of the business problem
• Deploy analytics
  ○ Transition the model from the dev to the prod environment
  ○ Code review and optimization
• Training & Documentation
  ○ Document feature and model details
  ○ Provide training to analytics and IT stakeholders
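The "Explore data" and "Develop features" steps name data completeness checks, summary statistics, and missing-value imputation without prescribing a method. The following is a minimal sketch, assuming tabular records held as Python dicts with None marking a missing value; the column names and values are hypothetical, and median imputation is our illustrative choice, not necessarily the one the talk has in mind.

```python
from statistics import mean, median, stdev

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34, "income": 52000.0},
    {"age": None, "income": 61000.0},
    {"age": 45, "income": None},
    {"age": 29, "income": 48000.0},
]

def completeness(rows, column):
    """Fraction of rows in which `column` is present (not None)."""
    present = sum(1 for r in rows if r.get(column) is not None)
    return present / len(rows)

def summary(rows, column):
    """Basic summary statistics over the non-missing values of `column`."""
    values = [r[column] for r in rows if r.get(column) is not None]
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }

def impute_median(rows, column):
    """Return new rows with missing `column` values replaced by the column median.

    The input rows are not modified. Raises statistics.StatisticsError
    if every value in the column is missing.
    """
    fill = median(r[column] for r in rows if r.get(column) is not None)
    return [{**r, column: fill if r.get(column) is None else r[column]} for r in rows]

for col in ("age", "income"):
    print(col, f"{completeness(records, col):.0%} complete", summary(records, col))

imputed = impute_median(records, "age")
print([r["age"] for r in imputed])
```

Running the sketch reports each column's completeness and statistics, then fills the missing age with the median of the observed ages. In practice a team would likely use a dataframe library and pick an imputation strategy per variable together with the SMEs.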

Page 14

#3 - Conduct repeatable data science through processes

Why is it important to have a process?

What happens if you don’t have the right process?

Page 15

#4 - Foster the right atmosphere

For the organization
• Collaboration
  ○ Partner with stakeholders in every step of the process
  ○ Avoid the us-vs.-them mentality
  ○ Pair analytics/programming
• Analytics-driven enterprise
  ○ Enable business stakeholders to play with DS output
  ○ Make the business as a whole smarter
• Sensible analytics
  ○ High-quality analytics requires the freedom to fail
  ○ Relate analytics to key initiatives, KPIs, and drivers

Page 16

#4 - Foster the right atmosphere

For the team
• Collaboration
  ○ Encourage team members to learn from each other
  ○ Encourage team members to learn from and share with the business
• Enable 'quiet time'
• Establish a no-fear mentality
• Provide diversity in analytics tasks
• Provide time to keep up with tech and academia

Page 17

#4 - Foster the right atmosphere

Why is the right atmosphere important?

What happens if you don’t have the right atmosphere?

Page 18

#5 - Ensure access to data
• Break down data silos
• Reduce time to analysis
• Chief Data Officer – governance and utilization of data assets in an org

[Diagram: organizational/BU data silos (CRM, EDW, HR, ...) and internal and external data, structured and unstructured, brought together in a Data Lake]
Internal: Transactions, Sales, Promotions, Inventory, Products, CRM, Demographics, Web/app usage, Video, Call center, Surveys, ...
External: Weather, Social media, Factual, Traffic, Public/Govt., ...

Page 19

#5 - Ensure access to data

Why is access to data important?

What happens if you don’t have access to data?

Page 20

#6 - Provide the right tools
• A flexible stack that lets people work with what they know
• Enable rapid exploration
• Handle the volume, velocity, variety, and veracity of data

[Diagram: Data Lake]

Page 21

#6 - Provide the right tools

Why are the right tools important?

What happens if you don’t have the right tools?

Page 22

Wrap up
• #1 - Position within the org
• #2 - The right team
• #3 - Repeatable data science through processes
• #4 - The right atmosphere
• #5 - Access to data
• #6 - The right tools

Together: successful data science applications

Page 23

Thanks

Niels Kasch

www.minerkasch.com

https://www.linkedin.com/in/nielskasch

@nielskasch