Page 1
Machine Learning the Product
Boxun Zhang, Spotify
Page 2
about me
Data Scientist at Spotify
• Model and understand user retention
• General measurement and behavior analysis
Ph.D. in CS from TU Delft
• Studied user behaviors in BitTorrent
Page 3
A/B testing is a popular method for learning your product, but traditional A/B testing techniques are
insufficient for learning from your A/B test.
Page 4
In this presentation, I will introduce a different method used at Spotify for analyzing A/B tests.
Page 5
learning your product is crucial
• measure performance
• identify problems
• discover opportunities
Page 9
everyone knows A/B testing
more or less
Page 12
a hypothesis
a change
control vs. test
Page 15
measure effect
test vs. control
Page 17
is this good enough?
Page 19
with traditional A/B testing techniques, we can only learn from an A/B test in a rather superficial way
we can measure the size of the effect, but often don’t know the cause of the effect
Page 21
product can be complex
Page 22
Credit: Image by Eric Long, National Air and Space Museum, Smithsonian Institution
Page 23
user behavior can be complex, too
Page 25
sometimes, we introduce big changes
Page 27
product change → behavior changes (change, change, change, …) → effect
Page 29
we need to ask a different question
Page 30
which behavior change has the biggest impact on the effect?
Page 31
impact ≈ predictive power
Page 32
our brain is great, but not at this
Page 35
A/B test → machine learning problem
Page 36
1. determine type of problem
Page 37
classification: retention, conversion, click-through rate
regression: lifetime value, session length
Page 40
2. prepare user-level measurement
Page 41
group level vs. user level
Page 42
product: change
behavior: change_1, change_2, change_3, …, change_n
measurement: m_1, m_2, m_3, …, m_p
target: effect
p > n
Page 46
measurements → machine learning features
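The step above, turning raw A/B test logs into one row of measurements (m_1 … m_p) per user, can be sketched as follows. This is a minimal illustration, not Spotify's pipeline: the log schema, column names, and aggregations are all assumptions.

```python
import pandas as pd

# Hypothetical per-event logs from an A/B test; all column names are illustrative.
events = pd.DataFrame({
    "user_id":        [1, 1, 2, 2, 3],
    "group":          ["test", "test", "control", "control", "test"],
    "session_length": [12.0, 30.0, 5.0, 8.0, 20.0],
    "skipped":        [0, 1, 1, 0, 0],
})

# Aggregate event-level logs into one row of measurements per user:
# each resulting column is a candidate machine-learning feature (m_1 .. m_p).
features = events.groupby(["user_id", "group"]).agg(
    total_session_length=("session_length", "sum"),
    skip_rate=("skipped", "mean"),
).reset_index()

print(features)
```

The target column (retention, conversion, etc.) would be joined onto this table the same way, one value per user.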
Page 47
actionable features vs. non-actionable features
Page 48
actionable features: session length, skip rate, offline streams
non-actionable features: country, age, gender
Page 50
actionable features are used to build models
non-actionable features are used to segment users
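A minimal sketch of that split, using the example features from the slides; the data values and the per-country segmentation loop are assumptions for illustration.

```python
import pandas as pd

# Illustrative user-level table; values are made up.
users = pd.DataFrame({
    "session_length":  [12.0, 5.0, 20.0],
    "skip_rate":       [0.5, 0.5, 0.0],
    "offline_streams": [3, 0, 7],
    "country":         ["SE", "US", "SE"],
    "age":             [25, 40, 31],
})

# Actionable features (behaviors the product can influence) feed the model;
# non-actionable features (fixed user attributes) define user segments.
actionable = ["session_length", "skip_rate", "offline_streams"]
non_actionable = ["country", "age"]

X = users[actionable]  # model inputs
for country, segment in users.groupby("country"):
    print(country, len(segment))  # e.g. fit one model per segment
```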
Page 52
this is essentially feature engineering, and we need rich, high-quality features
Page 53
“…some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.”
— Pedro Domingos
A few useful things to know about machine learning. Pedro Domingos.
Page 54
3. model selection
Page 55
gradient boosting machine [1]
XGBoost [2]: a better variant of GBM
[1] Greedy function approximation: A gradient boosting machine. Jerome H. Friedman
[2] XGBoost: A Scalable Tree Boosting System. Tianqi Chen, Carlos Guestrin
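Training such a model on the user-level table is a few lines. As a sketch, this uses scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost, on synthetic data where the target is driven by two of the features; the data, target, and parameters are all assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic user-level data: 500 users, 10 measurements each.
X = rng.normal(size=(500, 10))
# Hypothetical binary target (e.g. retained or not),
# driven mostly by measurements 0 and 3 plus noise.
y = (X[:, 0] + 2 * X[:, 3] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# GBM as in reference [1]; swap in xgboost.XGBClassifier for the XGBoost of [2].
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))
```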
Page 56
this is not a Kaggle competition
Page 57
prediction is not the goal
Page 58
feature/variable importance is
Page 59
feature importance is measured as the improvement in accuracy brought by a feature to the branches it is on, then summarized over all the trees
https://xgboost.readthedocs.io/en/latest/R-package/discoverYourData.html
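Reading the importance scores off a trained model is straightforward. This sketch again uses scikit-learn's gain-based `feature_importances_` as a stand-in for XGBoost's importance report, on synthetic data where one feature drives the effect by construction; everything here is an illustrative assumption.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
# Feature 2 drives the synthetic "effect"; the other features are noise.
y = (2 * X[:, 2] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# feature_importances_ sums each feature's accuracy gain over all trees,
# analogous to the gain-based importance described for XGBoost above.
ranked = np.argsort(model.feature_importances_)[::-1]
print("most important feature:", ranked[0])
```

Here the top-ranked feature recovers the known driver, which is exactly the property the method relies on when the drivers are unknown.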
Page 60
once a model is trained and validated, the features of highest importance are the ones with the biggest impact on the target/effect
Page 61
examine top features
compare between test and control
Page 63
derive informative, high-level metrics
Page 65
this can be an iterative process, as we can build new models to understand which features have a big impact on the top features
Page 66
why not linear models?
Page 67
linear models have relatively weak predictive power, and are not robust
Page 68
why not Random Forests?
Page 69
feature importance is inaccurate (biased) for correlated features
Bias in random forest variable importance measures: Illustrations, sources and a solution. Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis, and Torsten Hothorn
Page 70
applications at Spotify
activation, retention, churn