data science popup austin: data do's and dont's: lessons from the front line

DATA SCIENCE POP UP AUSTIN Data Do's and Dont's: Lessons From the Front Line Ryan Orban VP of Product and Strategy, Data Scientist, Galvanize ryanorban

Upload: domino-data-lab

Post on 16-Apr-2017


TRANSCRIPT

Page 1: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

DATA SCIENCE POP UP

AUSTIN

Data Do's and Don'ts: Lessons From the Front Line

Ryan Orban

VP of Product and Strategy, Data Scientist, Galvanize

@ryanorban

Page 2: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

DATA SCIENCE POP UP

AUSTIN

#datapopupaustin

April 13, 2016

Galvanize, Austin Campus

Page 4: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

Data Do’s and Don’ts: Lessons from the Front Line

Page 5: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

Ryan Orban (@ryanorban)

Co-Founder & CEO, Zipfian Academy

EVP of Product and Strategy, Galvanize

Page 6: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

We believe an opportunity belongs to anyone with aptitude and ambition.

Page 7: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

Galvanize 2015

NODES ON THE NETWORK

COLORADO (BOULDER, DENVER, FORT COLLINS)

SEATTLE, WA

SAN FRANCISCO, CA

AUSTIN, TX (OPENING Q1 2016)

Programs: Full Stack Immersive, Data Science Immersive, Entrepreneurship

Programs: Full Stack Immersive, Data Science Immersive, Entrepreneurship

Programs: Full Stack Immersive, Data Science Immersive, Data Engineering Immersive, Masters of Science in Data Science, Entrepreneurship

Programs: Full Stack Immersive, Data Science Immersive, Entrepreneurship


Page 8: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line


5 PROGRAMS

• Full Stack Immersive

• Data Science Immersive

• Data Engineering Immersive

• Master of Science in Data Science (University of New Haven)

• Startup Membership

Projected: over 500 student member graduates in 2015

Currently over 1,500 members

Page 9: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line


PLACEMENT STATS

FULL STACK IMMERSIVE

Pre-program Salary: $43K

Average Starting Salary: $77K

Placement Rate*: 97%

DATA SCIENCE IMMERSIVE

Pre-program Salary: $72K

Average Starting Salary: $114K

Placement Rate*: 94%

*Galvanize is a founding member of NESTA (New Economy Skills Training Association), a trade organization founded to regulate the new “bootcamp” market. This placement rate is more rigorous than that requested by state licensure agencies. The placement rate is calculated 6 months after graduation.

Page 10: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

How These Things Relate to Each Other

[Diagram of fields and skills. Fields: Software Engineering, Data Science, Data Analysis, Data Engineering. Skills shown: Machine Learning, Java, Linux/UNIX, Mobile Development, Objective C, C, C++, C#, Web Development, Ruby on Rails, JavaScript, Front-end, PHP, Full-Stack, Excel, Python, SQL, NLP, Hadoop, Databases, Network Analysis, Assembly, Statistics, R.]

The orange words are the most important things we teach.

Full-Stack Web Development and Data Science are in gray circles.

Page 11: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line


DATA SCIENCE IMMERSIVE

Week 1 - Exploratory Data Analysis and Software Engineering Best Practices

Week 2 - Statistical Inference, Bayesian Methods, A/B Testing, Multi-Armed Bandit

Week 3 - Regression, Regularization, Gradient Descent

Week 4 - Supervised Machine Learning: Classification, Validation, Ensemble Methods

Week 5 - Clustering, Topic Modeling (NMF, LDA), NLP

Week 6 - Network Analysis, Matrix Factorization, and Time Series

Week 7 - Hadoop, Hive, and MapReduce

Week 8 - Data Visualization with D3.js, Data Products, and Fraud Detection Case Study

Weeks 9-10 - Capstone Projects

Week 12 - Onsite Interviews

Page 12: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line
Page 13: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

Data Manipulation

Model Creation

Prediction

Page 14: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

Data Manipulation

Page 15: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

Don’t

• Assume your data is friendly
  • ETL and feature engineering are largely opaque to others (and to yourself after enough time away)

Do

• Automate cleaning and transformation pipelines
  • Jupyter and RStudio are great for EDA, but have issues with collaboration and version control

• Build functional code to be reused; export into plain code files, track with Git
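One way to read the advice on automated, reusable cleaning code is sketched below. It is a minimal illustration, not the speaker's code: the record fields (`name`, `age`), the step functions, and `run_pipeline` are all hypothetical names chosen for the example. The idea is that each cleaning step is a small function living in a plain `.py` file that Git can diff, rather than a cell buried in a notebook.

```python
"""Reusable cleaning steps kept in a plain .py file so Git can track them."""
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]
Step = Callable[[Record], Optional[Record]]


def strip_whitespace(record: Record) -> Optional[Record]:
    """Trim stray whitespace from every string value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def drop_missing_age(record: Record) -> Optional[Record]:
    """Drop rows without a usable age instead of assuming the data is friendly."""
    age = record.get("age")
    if age is None or not age.isdigit():
        return None
    return record


def run_pipeline(records: List[Record], steps: List[Step]) -> List[Record]:
    """Apply each step in order; a step that returns None drops the record."""
    cleaned = []
    for record in records:
        current: Optional[Record] = record
        for step in steps:
            current = step(current)
            if current is None:
                break
        if current is not None:
            cleaned.append(current)
    return cleaned


# Hypothetical raw input with the usual problems: padding, junk, and gaps.
raw = [
    {"name": "  Ada ", "age": " 36"},
    {"name": "Bob", "age": "unknown"},
    {"name": "Cy", "age": None},
]
clean = run_pipeline(raw, [strip_whitespace, drop_missing_age])
print(clean)
```

Because the steps are ordinary functions, the same list can be rerun on new data or imported from a notebook during EDA without copying code between cells.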

Page 16: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

Model Creation

Page 17: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

Don’t

• Never use accuracy as your main metric
  • You can have 99% accuracy but 0% predictive power
  • Unbalanced classes; sampling

Do

• Use metrics like precision and recall
  • Aggregate metrics like F1-score, AUC/AIC/BIC are also good
  • Remember that models with the highest scores are not always the ones you need; permissive vs. conservative based on use case
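The "99% accuracy but 0% predictive power" point can be checked with a small worked example. The sketch below uses made-up labels (10 positives in 1,000 rows) and two hypothetical classifiers; the helper names are illustrative and the metrics are computed by hand rather than with any particular library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_report(y_true, y_pred):
    """Compute accuracy, precision, recall and F1, guarding against zero division."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up unbalanced data: 10 positives followed by 990 negatives.
y_true = [1] * 10 + [0] * 990

# Classifier A always predicts the majority class.
always_negative = [0] * 1000
report_a = classification_report(y_true, always_negative)

# Classifier B finds 8 of the 10 positives at the cost of 20 false alarms.
imperfect = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970
report_b = classification_report(y_true, imperfect)

print(report_a)
print(report_b)
```

Classifier A scores higher on accuracy yet never finds a positive, while classifier B has lower accuracy and far better recall. Whether B's false alarms are acceptable is the permissive vs. conservative trade-off, which depends on the use case.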

Page 18: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

Don’t

• Don’t start with the most complicated models first (deep learning, gradient boosting, SVMs, etc.)

• Don’t focus on the algorithm
  • “More data always beats better algorithms”
  • But better features usually beat better algorithms*

Do

• Start with a baseline model, then continuously “close the loop”
  • Create a base case to optimize against
  • Does 1% greater F1-score outweigh a 10x training time in production? Not usually unless you’re Google-scale.
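A baseline can be as simple as always predicting the most common training label. The sketch below is one possible reading of "create a base case to optimize against": the data, the `amount` feature, and the threshold of 500 are all invented for illustration, and the candidate is a deliberately simple one-feature rule rather than any specific algorithm from the talk.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a model that always predicts the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda features: most_common


def threshold_model(threshold):
    """Return a one-feature rule on the hypothetical 'amount' field."""
    return lambda features: 1 if features["amount"] > threshold else 0


def f1_score(y_true, y_pred):
    """F1 for binary 0/1 labels; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(model, rows, labels):
    """Score a model on held-out rows."""
    return f1_score(labels, [model(row) for row in rows])


# Invented training labels and a small held-out set.
train_labels = [0, 0, 0, 1, 0, 1, 0, 0]
test_rows = [{"amount": a} for a in (12, 850, 40, 30, 620, 450)]
test_labels = [0, 1, 0, 0, 1, 1]

baseline_f1 = evaluate(majority_baseline(train_labels), test_rows, test_labels)
candidate_f1 = evaluate(threshold_model(500), test_rows, test_labels)
print(f"baseline F1={baseline_f1:.2f}, candidate F1={candidate_f1:.2f}")
```

Every later model is then compared against `baseline_f1` and the current best candidate, so the gain from a more complex model can be weighed against its extra training time.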

Page 19: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

Don’t

• Assume your cross-validation metrics will hold up against real-life data

Do

• Separate your application and prediction code
  • Fast iteration cycles are key. Create a “scoring service” that is uncoupled from application code.
  • APIs & service oriented architectures typically work best
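A decoupled "scoring service" might look roughly like the sketch below, which uses only the Python standard library. The model is a placeholder: the weights, the feature names (`amount`, `num_items`), and the port are assumptions made for the example, not anything stated in the talk. The application would send features as JSON over HTTP and never import the model code directly.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score(features):
    """Placeholder model: a hypothetical linear score over named features."""
    weights = {"amount": 0.002, "num_items": 0.1}
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def handle_payload(body: bytes) -> bytes:
    """Turn a JSON request body into a JSON response; no HTTP details here."""
    features = json.loads(body)
    return json.dumps({"score": score(features)}).encode("utf-8")


class ScoringHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper so the application only talks to an API."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        response = handle_payload(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port: int = 8000) -> None:
    """Run the service until interrupted; call this from a separate process."""
    HTTPServer(("127.0.0.1", port), ScoringHandler).serve_forever()


# The scoring logic can be exercised without starting the server.
example = json.loads(handle_payload(b'{"amount": 100, "num_items": 2}'))
print(example)
```

Calling `serve()` starts the service. Replacing the body of `score` with a retrained model changes nothing on the application side, which is what keeps iteration cycles short.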

Page 20: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

Communication

Page 21: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

Don’t

• Don’t focus on the “how”, i.e. cover every trial and tribulation

Do

• Cut to the chase
  • After a presentation, I always ask the class two questions:
    • What is one sentence that describes what the speaker learned?
    • Why do I care?

Page 22: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

TALENT

• Early Access to Students

• Candidate Matching

• Curriculum Development

• Corporate Student Sponsorship

• Diversity

Page 23: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

ACCESS

• Membership

• Organic Relationships

• Course Content

• Mentorship

• Community

• Events

Page 24: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

EXPERTISE

• Galvanize Experts

• Capstone Projects

• Internship

• Corporate Training

Page 25: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

THANK YOU

RYAN ORBAN | EVP, STRATEGY

[email protected] @ryanorban

www.galvanize.com

Page 26: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

DATA SCIENCE POP UP

AUSTIN

@datapopup #datapopupaustin