data science: a mindset for productivity

Post on 28-Jul-2015

Category:

Data & Analytics

TRANSCRIPT

Data Science: A Mindset for Productivity

Daniel Tunkelang

@dtunkelang

tl;dr

The most important part of data science is picking the right problem and figuring out how to frame it.

We’re all technologists, right?

But nobody knows everything.*

Class HashMap<K,V>

java.lang.Object
  java.util.AbstractMap<K,V>
    java.util.HashMap<K,V>

Type Parameters:
K - the type of keys maintained by this map
V - the type of mapped values

All Implemented Interfaces:
Serializable, Cloneable, Map<K,V>

*Except Jeff Dean.

Math and computer science matter…

But you have to solve the right problem.

Stay friends with your exes.

explain

express

experiment

Data science is a mindset.

Explain: Iterate using explainable models.

Express: Model your utility and inputs.

Experiment: Optimize for speed of learning.

Explain

With apologies to the little prince.

Deep learning is the new black.

But accuracy isn’t everything.

The importance of being explainable.

• Algorithms can protect you from overfitting, but they can’t protect you from the biases you introduce.

• Introspection into your models and features makes it easier for you and others to debug them.

• This matters especially if you don’t completely trust your objective function or the representativeness of your training data.

Linear models? Decision trees?

• Linear regression and decision trees favor explainability over accuracy, compared to more sophisticated models.

• But size matters. If you have too many features or too deep a decision tree, you lose explainability.

• You can always upgrade to a more sophisticated model when you trust your objective function and training data.

• Building a machine learning model is an iterative process. Optimize for the speed of your own learning.
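As a minimal sketch of what introspection into an explainable model can look like, the Python below fits a one-feature linear regression by ordinary least squares and reads the result straight off the coefficients. The data and the names `fit_line`, `results_shown`, and `clicks` are hypothetical illustrations, not material from the talk.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one feature: y ~ slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: results shown per page vs. clicks per session.
results_shown = [5, 10, 15, 20, 25]
clicks = [1.0, 1.5, 2.0, 2.5, 3.0]

slope, intercept = fit_line(results_shown, clicks)

# The whole model is two numbers, so anyone can sanity-check it.
print(f"each extra result shown adds about {slope:.2f} clicks")
print(f"predicted clicks with zero results shown: {intercept:.2f}")
```

With only two parameters, a surprising sign or magnitude is easy to spot and question, which is the debugging benefit the bullets above describe. That benefit shrinks as the number of features grows.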

Express

Machine learning for dummies.

• Define objective function.

• Collect training data.

• Build models.

• Profit!

You only improve what you measure.

Clicks?

Actions?

Outcomes?

Sometimes accuracy is complicated.

What’s your error function?
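One common way accuracy gets complicated, assuming the classes are imbalanced, is that a useless model can still score well. The sketch below uses made-up labels (5 positives, 95 negatives) and a hypothetical "model" that always predicts the negative class. The helper names `accuracy` and `recall` are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the model actually found."""
    found = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return found / total if total else 0.0


# Hypothetical imbalanced labels: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A "model" that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f} recall={rec:.2f}")
```

Here accuracy looks high while recall is zero, so which error function you choose determines whether this model looks good or worthless.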

Consider stratified sampling.
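A minimal sketch of stratified sampling, assuming each row carries a label to stratify on: rows are grouped by that label and each group is sampled at the same rate, so a rare class is not lost by chance. The function name `stratified_sample`, the `fraction` parameter, and the data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, seed=0):
    """Sample the same fraction from every stratum, keeping at least one row each."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)

    sample = []
    for label in sorted(strata):  # sorted for a deterministic stratum order
        members = strata[label]
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical imbalanced data: 5 positive rows and 95 negative rows.
rows = [(f"query{i}", "positive" if i < 5 else "negative") for i in range(100)]

sample = stratified_sample(rows, key=lambda r: r[1], fraction=0.2)
positives = sum(1 for _, label in sample if label == "positive")
print(len(sample), positives)
```

A simple random sample of the same size could easily contain no positive rows at all. Sampling within each stratum keeps the class proportions of the original data.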

Experiment

How to find your prince.

You have to kiss a lot of frogs to find one prince. So how can you find your prince faster? By finding more frogs and kissing them faster and faster.

-- Mike Moran

Think like an economist.

Yesterday: Experiments are expensive, choose hypotheses wisely.

Today: Experiments are cheap, do as many as you can!

But don’t forget you’re a scientist.

Optimize for the speed of learning.

Test one variable at a time.

• Autocomplete

• Entity Tagging

• Vertical Intent

• # of Suggestions

• Suggestion Order

• Language

• Query Construction

• Ranking Model
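When control and variant differ in exactly one of the variables above, any difference in the outcome can be attributed to that variable. One way to check whether the difference is more than noise is a two-proportion z-test, sketched below. The choice of test, the counts, and the name `two_proportion_z` are assumptions for illustration. The talk does not specify a method.

```python
import math


def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: only the number of suggestions differs.
# Control: 200 conversions in 2000 sessions; variant: 260 in 2000.
z, p_value = two_proportion_z(200, 2000, 260, 2000)
print(f"z={z:.2f} p={p_value:.4f}")
```

If two variables changed at once, the same test would still report a difference, but you could not tell which change caused it.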

tl;dr

The most important part of data science is picking the right problem and figuring out how to frame it.

Daniel Tunkelang

dtunkelang@gmail.com

https://linkedin.com/in/dtunkelang

@dtunkelang
