how to conclude online experiments in python

volodymyrk How to conclude online experiments in Python Volodymyr (Vlad) Kazantsev Head of Data Science at Product Madness

Upload: volodymyr-kazantsev

Post on 07-Aug-2015

Page 1: How to conclude online experiments in python

volodymyrk

How to conclude online experiments in Python

Volodymyr (Vlad) Kazantsev

Head of Data Science at Product Madness

Goal of the tutorial

Uncover the “magic” behind statistics used for A/B testing and other online experiments

A quick bio (timeline from now back to 2004)

● Head of Data Science (Social Gaming)
● Product Manager at King
● MBA at London Business School
● Visual Effect developer (Avatar, Batman, ...)
● MSc in Probability (Kiev Uni, Ukraine)

Different kinds of tests

● Classic A/B tests

● Long running activities with control groups

● Longitudinal tests

Why bother?

● To test your hypothesis and learn
● To avoid blindly following HiPPOs
● To audit performance of product and marketing teams

Why Stats?

● To separate data from the noise
● To quantify uncertainty

Fruit Crush Epic

The story of an almost real mobile game, in an almost real gaming company... and one Data Scientist

Day-1

3 seconds panic-attack

Day 1 - loading time panic-attack!

Taxonomy of Classical stat testing

Which Test?

● 1 Sample
  ● Mean, σ known: z-test one sample
  ● Mean, σ unknown: t-test one sample
  ● Proportion: z-test for proportion
  ● Variance: Chi-squared test
● 2 Samples
  ● Mean, independent samples, σ1,σ2 known: z-test for (μ1-μ2)
  ● Mean, independent samples, σ1,σ2 unknown: t-test for (μ1-μ2)
  ● Mean, dependent samples: z-test or t-test for dependent samples
  ● Proportion: z-test, 2 proportions
  ● Variance: F-test
● >2 Samples
  ● ANOVA

One sample t-test

Null Hypothesis:- avg. loading time <=3 seconds for last hour's observation

Alternative Hypothesis:- population mean is >3 seconds for last hour's observation

Test:- single sample, one-sided t-test.

One sample t-test

t_value = t-test(samples, expected mean)

p-value = t-distribution lookup(t_value, sample_size)

p-value: 0.086, the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true
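
Spelled out, the two steps on this slide correspond to the standard one-sample t statistic and its one-sided tail probability. The symbols are notation introduced here rather than taken from the slide: $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size, $\mu_0 = 3$ seconds the expected mean, and $T_{n-1}$ a Student's t variable with $n-1$ degrees of freedom.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad
p = P\left(T_{n-1} \ge t\right)
```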

If you want to code it yourself
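
The code shown on this slide did not survive in the transcript. A minimal sketch of a hand-coded one-sided, one-sample t-test might look like the following; the `samples` values are hypothetical loading times, not data from the talk, and only the t-distribution lookup is delegated to `scipy.stats`.

```python
import math
from scipy import stats

# Hypothetical loading times in seconds (not data from the talk).
samples = [3.1, 2.9, 3.4, 3.6, 2.8, 3.3, 3.0, 3.5, 3.2, 2.7]
expected_mean = 3.0  # H0: average loading time <= 3 seconds

n = len(samples)
mean = sum(samples) / n
# Sample variance with Bessel's correction (divide by n - 1).
variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
std_error = math.sqrt(variance / n)

t_value = (mean - expected_mean) / std_error
# One-sided p-value: upper tail of Student's t with n - 1 degrees of freedom.
p_value = stats.t.sf(t_value, df=n - 1)

print(f"t = {t_value:.3f}, one-sided p = {p_value:.3f}")
```

With these made-up numbers the p-value lands between 5% and 10%, so the null hypothesis would not be rejected at the usual 5% level.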

Stats in Python

Classical: numpy, scipy.stats, statsmodels.stats

Bayesian: theano, pymc3

* High-level view. Lots of stuff is missing here. pymc3 uses statsmodels for GLM

One sample t-test and z-test
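
As a sketch of how the two tests differ in practice, the snippet below runs both on the same hypothetical loading-time sample. The t-test estimates σ from the data via `scipy.stats.ttest_1samp`; the z-test assumes σ is known, and the value `known_sigma = 0.3` is an illustrative assumption.

```python
import numpy as np
from scipy import stats

# Hypothetical loading times in seconds (not data from the talk).
samples = np.array([3.1, 2.9, 3.4, 3.6, 2.8, 3.3, 3.0, 3.5, 3.2, 2.7])
expected_mean = 3.0

# t-test: sigma is unknown and estimated from the sample.
t_stat, p_two_sided = stats.ttest_1samp(samples, expected_mean)
# ttest_1samp is two-sided by default; convert to the one-sided H1: mean > 3.
p_t = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2

# z-test: sigma is assumed to be known (hypothetical value).
known_sigma = 0.3
z_stat = (samples.mean() - expected_mean) / (known_sigma / np.sqrt(samples.size))
p_z = stats.norm.sf(z_stat)  # upper tail of the standard normal

print(f"t = {t_stat:.3f}, p = {p_t:.3f}")
print(f"z = {z_stat:.3f}, p = {p_z:.3f}")
```

For a sample this small the t-distribution has heavier tails than the normal, which is why the t-test is the more conservative of the two here.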

Confidence Interval

Confidence Interval for the Mean
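
The formula from this slide is missing from the transcript. One common form, assuming σ is unknown, is mean ± t_crit × standard error, where t_crit is the t-distribution quantile for the chosen confidence level. A sketch on hypothetical loading times:

```python
import numpy as np
from scipy import stats

# Hypothetical loading times in seconds (not data from the talk).
samples = np.array([3.1, 2.9, 3.4, 3.6, 2.8, 3.3, 3.0, 3.5, 3.2, 2.7])
confidence = 0.95

n = samples.size
mean = samples.mean()
std_error = samples.std(ddof=1) / np.sqrt(n)  # ddof=1: sample standard deviation

# Two-sided critical value from Student's t with n - 1 degrees of freedom.
t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
ci_low = mean - t_crit * std_error
ci_high = mean + t_crit * std_error

print(f"{confidence:.0%} CI for the mean: ({ci_low:.3f}, {ci_high:.3f})")
```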

Standard Error of the Mean in Python
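
A short sketch of two ways to get the standard error of the mean, again on hypothetical data. The point worth noticing is the `ddof` default: `scipy.stats.sem` uses `ddof=1`, while NumPy's `std` uses `ddof=0` unless told otherwise.

```python
import numpy as np
from scipy import stats

# Hypothetical loading times in seconds (not data from the talk).
samples = np.array([3.1, 2.9, 3.4, 3.6, 2.8, 3.3, 3.0, 3.5, 3.2, 2.7])

sem_scipy = stats.sem(samples)  # ddof=1 by default
sem_manual = samples.std(ddof=1) / np.sqrt(samples.size)  # same quantity by hand
sem_biased = samples.std() / np.sqrt(samples.size)  # ddof=0: slightly too small

print(sem_scipy, sem_manual, sem_biased)
```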

Next Day

Day-2

OMG, my Retention is low!

Is my day-1 retention low?

Day-1 results:

installs 448

returned next day 123

Day-1 retention 27.46%

Retention target 30%

One sample z-test for proportion

Null Hypothesis:- avg. retention >=30%

Alternative Hypothesis:- avg. retention <30%

Test:- single sample, one-sided z-test for proportion

In Python...
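
The slide's code is not in the transcript. A minimal sketch of the test using only the standard library and the Day-1 numbers above (448 installs, 123 returned, 30% target) could be written as follows. Computing the standard error from the target proportion under the null hypothesis is a choice made here, not necessarily the one used in the talk.

```python
import math
from statistics import NormalDist

installs = 448
returned = 123
target = 0.30  # H0: retention >= 30%

p_hat = returned / installs
# Standard error under H0 uses the target proportion, not the observed one.
std_error = math.sqrt(target * (1 - target) / installs)
z_value = (p_hat - target) / std_error
# One-sided p-value for H1: retention < 30% (lower tail of the normal).
p_value = NormalDist().cdf(z_value)

print(f"retention = {p_hat:.2%}, z = {z_value:.3f}, p = {p_value:.3f}")
```

The p-value comes out at roughly 12%, so one day of data is not enough to conclude that retention is below target. `statsmodels.stats.proportion.proportions_ztest` offers the same test; by default it estimates the variance from the sample proportion, so its z-value differs slightly.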

So what is my confidence interval?
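
One way to answer this is a normal-approximation (Wald) interval around the observed retention. The sketch below uses the Day-1 numbers from the earlier slide; the 95% level is an assumption, since the transcript does not state which level the talk used.

```python
import math
from statistics import NormalDist

installs, returned = 448, 123
confidence = 0.95  # assumed confidence level

p_hat = returned / installs
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
# Wald interval: the standard error is based on the observed proportion.
margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / installs)
ci_low, ci_high = p_hat - margin, p_hat + margin

print(f"{confidence:.0%} CI for retention: ({ci_low:.1%}, {ci_high:.1%})")
```

The interval runs from about 23% to about 32%, so the 30% target is still inside it.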

Day-5

Connect with Facebook or Die!

The First A/B test

A/B test 1 - connect to Facebook

A/B test design

Users are split 50% / 50% between Group A and Group B; the diagram shows the steps Start Level 1 and Finish Level 1.

First result box: Have seen prompt 2501, Connected 1104, Connect rate 44.1%

Second result box: Have seen prompt 2141, Connected 1076, Connect rate 50.2%

Is it statistically significant?

Two samples z-test for proportion

Null Hypothesis:- avg. connection rate is the same. P1 = P2

Alternative Hypothesis:- P1 ≠ P2

Test:- two samples z-test for proportion. Two sided

Two samples z-test for proportion in Python
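
The slide's code is not in the transcript. A sketch of the pooled two-sample z-test using the counts from the A/B test design slide follows. Labelling the first set of counts `a` and the second `b` is an assumption about how the diagram maps to the groups.

```python
import math
from statistics import NormalDist

# Counts from the A/B test design slide; the a/b labels are assumed.
seen_a, connected_a = 2501, 1104
seen_b, connected_b = 2141, 1076

p_a = connected_a / seen_a
p_b = connected_b / seen_b

# Under H0 (P1 = P2) both groups share one rate, so the variance is pooled.
pooled = (connected_a + connected_b) / (seen_a + seen_b)
std_error = math.sqrt(pooled * (1 - pooled) * (1 / seen_a + 1 / seen_b))

z_value = (p_a - p_b) / std_error
p_value = 2 * NormalDist().cdf(-abs(z_value))  # two-sided

print(f"z = {z_value:.2f}, p = {p_value:.2g}")
```

With |z| above 4 the difference in connect rates is significant at any conventional level.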

Confidence interval for difference in proportion

In Python
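
A sketch of one standard interval for the difference in connect rates, using the counts from the A/B test design slide. It uses the unpooled standard error, because a confidence interval does not assume the two rates are equal. The a/b labels and the 95% level are assumptions.

```python
import math
from statistics import NormalDist

# Counts from the A/B test design slide; the a/b labels are assumed.
seen_a, connected_a = 2501, 1104
seen_b, connected_b = 2141, 1076
confidence = 0.95  # assumed confidence level

p_a = connected_a / seen_a
p_b = connected_b / seen_b
diff = p_b - p_a

# Unpooled standard error: each group contributes its own variance.
std_error = math.sqrt(p_a * (1 - p_a) / seen_a + p_b * (1 - p_b) / seen_b)
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
ci_low = diff - z_crit * std_error
ci_high = diff + z_crit * std_error

print(f"difference = {diff:.1%}, CI = ({ci_low:.1%}, {ci_high:.1%})")
```

The interval covers roughly 3 to 9 percentage points and excludes zero, which agrees with the significant z-test result.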

What should we measure, exactly?

[Diagram: two funnels from Start Level 1 to Start Level 2, 1000 users each. One funnel (counts 150, 400, 450, 30, 390, 430): connected 47%, retained 82%. The other funnel (counts 160, 840, 40, 400, 400): connected 50%, retained 80%.]

What about Bayesian Stats?

Bayesian Credible Interval vs. CI
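
The transcript keeps only this slide's title. As an illustration of what a credible interval looks like in a simple case, the sketch below applies a Beta-Binomial model to the Day-1 retention numbers (448 installs, 123 returned). The uniform Beta(1, 1) prior is an assumption, and the closed-form posterior is used here instead of MCMC.

```python
from scipy import stats

installs, returned = 448, 123

# Assumption: uniform Beta(1, 1) prior on the retention rate.
prior_a, prior_b = 1, 1
# Conjugate update: successes are added to a, failures to b.
posterior = stats.beta(prior_a + returned, prior_b + installs - returned)

# 95% equal-tailed credible interval for the retention rate.
cred_low, cred_high = posterior.ppf([0.025, 0.975])
# Posterior probability that true retention meets the 30% target.
prob_meets_target = posterior.sf(0.30)

print(f"95% credible interval: ({cred_low:.1%}, {cred_high:.1%})")
print(f"P(retention >= 30%) = {prob_meets_target:.2f}")
```

With a flat prior and this much data, the credible interval is numerically close to the frequentist confidence interval; the difference is in interpretation, since the credible interval is a direct probability statement about the retention rate.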

Day-30

Do you want to buy a last chance?

A/B testing Revenue

How much is an extra life worth?

Offer 1: LOSER!!! Purchase another chance for only.. $0.99

Offer 2: LOSER!!! Purchase another chance for only.. $1.99

How are we going to test it?

Consider

● There are multiple items to buy in game (lives, boosters, blenders, etc)
● We expect more people to make a $0.99 purchase, so we hope to make more money overall, even at the lower price

A/B test Design

● We will show the A/B test to new users only
● Will run for 2 months
● We will measure overall revenue per user in the first 30 days
● Null-hypothesis: we make more money from the $0.99 group

Measurements

● Difference in Average Revenue Per User (ARPU) in 30 days
● Difference in Conversion Rate (% of users who make at least 1 purchase)

Results

          $0.99 group   $1.99 group
count     450           390
mean      151.9         214.2
25%       20.8          26.5
50%       55.3          69.4
75%       147.3         231.3
max       3960          3647.8

* random generator used in the example is available in ipython notebooks

** distribution is made more extreme than what is normally observed in a casual game, like our imaginary match-3 title

Results

30,000 users in each group: 450 payers and 390 payers

p-value = 0.037: Significant

p-value = ???: Is it significant?
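
The 0.037 figure can be reproduced with a pooled two-sample z-test on the payer counts (450 and 390 out of 30,000 each), which suggests it refers to the difference in conversion rate. Whether this is exactly the test used in the talk is an assumption; the sketch below only shows that the numbers agree.

```python
import math
from statistics import NormalDist

users_per_group = 30_000
payers_a, payers_b = 450, 390

rate_a = payers_a / users_per_group
rate_b = payers_b / users_per_group

# Pooled conversion rate under H0: both groups convert at the same rate.
pooled = (payers_a + payers_b) / (2 * users_per_group)
std_error = math.sqrt(pooled * (1 - pooled) * (2 / users_per_group))

z_value = (rate_a - rate_b) / std_error
p_value = 2 * NormalDist().cdf(-abs(z_value))  # two-sided

print(f"conversion {rate_a:.2%} vs {rate_b:.2%}, z = {z_value:.3f}, p = {p_value:.3f}")
```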

Welch's t-test (σ1≠σ2)

Can we actually use t-test?
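
A sketch of Welch's t-test with `scipy.stats.ttest_ind(..., equal_var=False)`. The per-user revenue arrays are simulated: zeros for non-payers and lognormal spend for payers. The helper `simulate_revenue` and its lognormal parameters are hypothetical and are not the generator from the tutorial notebooks.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulate_revenue(n_users, n_payers, log_mean, log_sigma):
    """Per-user 30-day revenue: zeros for non-payers, lognormal spend for payers."""
    revenue = np.zeros(n_users)
    revenue[:n_payers] = rng.lognormal(log_mean, log_sigma, size=n_payers)
    return revenue

# Hypothetical parameters, chosen only to produce heavily skewed spend.
group_a = simulate_revenue(30_000, 450, log_mean=4.0, log_sigma=1.3)
group_b = simulate_revenue(30_000, 390, log_mean=4.3, log_sigma=1.3)

# equal_var=False selects Welch's t-test (no equal-variance assumption).
result = stats.ttest_ind(group_a, group_b, equal_var=False)

# The same statistic by hand, for comparison.
t_manual = (group_a.mean() - group_b.mean()) / np.sqrt(
    group_a.var(ddof=1) / group_a.size + group_b.var(ddof=1) / group_b.size
)

print(f"Welch t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```

The t-test relies on the sample means being approximately normal. With 30,000 users per group the central limit theorem helps, but a handful of very large spenders can still dominate the variance, which is the concern behind the question on this slide.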

Poor man's non-parametric test: split 5

p < 3%

If you don’t know enough stats - simulate!

This is very close to p-value from t-test
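
The transcript does not say which simulation the talk used. One common option is a permutation test: shuffle the group labels many times and count how often the shuffled difference in means is at least as large as the observed one. The function `permutation_p_value` and the lognormal spend data below are hypothetical illustrations.

```python
import numpy as np

def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled data reassigns group labels at random.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:a.size].mean() - shuffled[a.size:].mean())
        if diff >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical payer spend for the two price groups.
rng = np.random.default_rng(1)
spend_a = rng.lognormal(4.0, 1.3, size=450)
spend_b = rng.lognormal(4.3, 1.3, size=390)

p_sim = permutation_p_value(spend_a, spend_b)
print(f"simulated p-value = {p_sim:.3f}")
```

The approach makes no distributional assumption beyond exchangeability under the null hypothesis, which is why it is a useful cross-check on the t-test for skewed revenue data.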

Can we improve sensitivity?

27 players have spent > $1000 across both groups: 10 in the $0.99 group and 17 in the $1.99 group.

Max spent = $3960

And we re-run our analysis

Again, we can use a t-test

Final Thoughts

Can we analyse distributions?

You can quantify the difference between two curves

Area under the curve is Average Revenue per User

* random generator used in the example is available in ipython notebooks

** distribution is made more extreme than what is normally observed in a casual game, like our imaginary match-3 title

Is 30 day revenue a good metric?

[Figure: LTV projection A, LTV projection B]

Summary:

● There are only a few stats tests that every Data Scientist must know

● t-tests are robust enough to be useful even with skewed data sets

● Bayesian stats and MCMC are cool, but don’t use MCMC for trivial cases

● It is hard to detect a difference in heavily-skewed cases

IPython Notebooks for this tutorial are available at: http://nbviewer.ipython.org/github/VolodymyrK/stats-testing-in-python