Building an experimentation framework for web apps Zhi-Da Zhong [email protected] Tuesday, July 26, 2011

DESCRIPTION

OSCON talk on building a simple but powerful framework for feature ramp ups, A/B and multivariate testing, and other types of experiments in web apps.

TRANSCRIPT

Page 1: Building an experimentation framework

Building an experimentation framework for web apps

Zhi-Da Zhong

Tuesday, July 26, 2011

Page 2: Building an experimentation framework

About the talk

Why

What

Framework

Break / hack

Tech Details

Test design

Analysis

Page 3: Building an experimentation framework

Why?

Page 4: Building an experimentation framework

Questions

“What will happen if I do X”?

“Is X better than Y?”

Page 5: Building an experimentation framework

The future &

alternate universes (We’re bad at those.)

Page 6: Building an experimentation framework

Then what?

Page 9: Building an experimentation framework

Experiments

Try it out.

Data beats speculation.

Page 11: Building an experimentation framework

Experiments

Try different alternatives on different people.

Page 12: Building an experimentation framework

Which is better?

v.s.

Page 13: Building an experimentation framework

Not a great experiment

Page 14: Building an experimentation framework

Web apps

Page 15: Building an experimentation framework

Front end experiments

• Layout, colors, images, copy, ...

• No functional changes

• Impact can be surprisingly high

Page 16: Building an experimentation framework

A little more complex...

• Multipage flows

• Functionality changes

Page 17: Building an experimentation framework

Backend experiments

• Why not?

• Algorithms, architectures, batch processes, ...

Page 18: Building an experimentation framework

The Etsy search backend

• New algorithm

• New RPC protocol

• New result data structure

• New Solr trunk snapshot

[Diagram: the web app calls search(), which dispatches to searchA() on search cluster A or searchB() on search cluster B]

Page 19: Building an experimentation framework

DB re-architecture

• Postgres => Sharded MySQL

• Multiple experiments

Page 20: Building an experimentation framework

Whole new features

New pages + new DB tables + new batch jobs + ...

Page 21: Building an experimentation framework

Not just 2 variants

• A/B/C... tests

• Multi-variate tests

Page 22: Building an experimentation framework

Caveats

• Content not under your control

• Price tests?

• Hard-to-measure/quantify things

• Long term impact?

Page 23: Building an experimentation framework

Other tests

• Internal users testing

• Whitelisted user testing

Page 24: Building an experimentation framework

Opt-in experiments

Page 25: Building an experimentation framework

Complementary techniques

• Observed/recorded testing

- show different people the same thing

• Side-by-side testing

- show each person 2 alternatives

Page 26: Building an experimentation framework

Side by side testing

Page 27: Building an experimentation framework

How

Page 28: Building an experimentation framework

A common approach

• JS-based

• Non-techie UI

• “No IT!”

• “Designed For Marketers, By Marketers”

Page 29: Building an experimentation framework

Our approach

• The developer is the user

• Code as configuration

• An integral part of the dev process

Page 30: Building an experimentation framework

Developer as the user

• The builder of the feature writes the test

• Not just a marketing tool

Page 31: Building an experimentation framework

Code as config

• Simplicity

• Expressivity

• Quality

• Version => complete system state

• Revision history

Page 32: Building an experimentation framework

Part of the dev process

Every change is an experiment!

Page 33: Building an experimentation framework

What does it look like?

Page 35: Building an experimentation framework

Default => Experiment => (new) Default

Page 36: Building an experimentation framework

To add a new feature...

+ $config['new_search'] = array(
+     'enabled' => 'off'
+ );

  function search() {
+     if ($cfg->isEnabled('new_search')) {
+         return do_new_search();
+     }
      // existing stuff
  }
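A minimal sketch of what could sit behind $cfg->isEnabled() for the plain on/off case. The class name FeatureConfig, its constructor, and the string return values are assumptions made for illustration; the slides show only the $config array and the isEnabled() call.

```php
<?php
// Hypothetical flag reader; only the 'on' / 'off' states from the slide are handled.
final class FeatureConfig
{
    /** @var array feature name => settings array */
    private $config;

    public function __construct(array $config)
    {
        $this->config = $config;
    }

    // Unknown features and any state other than 'on' count as disabled,
    // so a typo in a feature name fails closed.
    public function isEnabled(string $feature): bool
    {
        $enabled = $this->config[$feature]['enabled'] ?? 'off';
        return $enabled === 'on';
    }
}

$config = array();
$config['new_search'] = array('enabled' => 'off');
$cfg = new FeatureConfig($config);

// The config object is passed in explicitly here; the slide leaves its scope open.
function search(FeatureConfig $cfg): string
{
    if ($cfg->isEnabled('new_search')) {
        return 'new search';
    }
    return 'existing search';
}
```

With the flag set to 'off', search($cfg) takes the existing path, which is what makes it safe to deploy the unfinished branch.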

Page 37: Building an experimentation framework

Deploy that

Page 38: Building an experimentation framework

Now we go crazy...

function do_new_search() {
    // exciting new stuff
    // that might or might not work
    // but we can deploy it anyway
    // since it's flagged off
}

Page 39: Building an experimentation framework

Internal user testing

  $config['new_search'] = array(
+     'enabled' => 'rampup',
+     'rampup' => array(
+         'admin' => true
      )
  );

Page 40: Building an experimentation framework

Whitelists

  $config['new_search'] = array(
      'enabled' => 'rampup',
      'rampup' => array(
+         'whitelist' => array('zhida'),
          'admin' => true
      )
  );

Page 41: Building an experimentation framework

Opt-in experiments

  $config['new_search'] = array(
      'enabled' => 'rampup',
      'rampup' => array(
+         'group' => 12345,
          'admin' => true
      )
  );

Page 42: Building an experimentation framework

A/B

  $config['new_search'] = array(
      'enabled' => 'rampup',
      'rampup' => array(
+         'percent' => 1.5,
          'admin' => true
      )
  );

Page 43: Building an experimentation framework

If it works...

  $config['new_search'] = array(
+     'enabled' => 'on'
  );

Page 44: Building an experimentation framework

Order matters

Whitelist / Blacklist > Internal > Opt-in > Random

Page 45: Building an experimentation framework

The framework

Page 49: Building an experimentation framework

As easy as...

1. Pick a variant

2. Do what it says

3. Log the event

Page 50: Building an experimentation framework

What's in a test?

Page 51: Building an experimentation framework

Variants

• Key-value pairs

- interpreted by the app

• Name

- mostly for logging

Page 52: Building an experimentation framework

SubjectIdProvider

• Why?

- hashing and other selectors

- logging

• Types of subjects

- Users...but not always

- Different groups of users - sellers vs buyers, etc.

- Different ways to identify them - signed in vs signed out

function getID()

Page 53: Building an experimentation framework

Selectors

function select($subjectID) => Variant Name

Page 54: Building an experimentation framework

Combining multiple selectors

• OR

- breaks blacklists

• AND

- breaks whitelists

• Sequence

- works!

Page 55: Building an experimentation framework

Selector sequence

• Defines an ordering

• Returns A/B/C/... or <don't care>
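A selector sequence can be sketched as a list that is walked in order until some selector has an opinion. In this sketch "don't care" is modeled as null; the interface and class names (Selector, ListSelector, FixedSelector, SelectorSequence) and the user names are hypothetical and only follow the select($subjectID) signature from the Selectors slide.

```php
<?php
// Each selector returns a variant name, or null for "don't care".
interface Selector
{
    public function select(string $subjectId): ?string;
}

// Covers both whitelists and blacklists: listed subjects get a fixed variant.
final class ListSelector implements Selector
{
    private $ids;
    private $variant;

    public function __construct(array $ids, string $variant)
    {
        $this->ids = $ids;
        $this->variant = $variant;
    }

    public function select(string $subjectId): ?string
    {
        return in_array($subjectId, $this->ids, true) ? $this->variant : null;
    }
}

// Always answers; useful as the last entry so the sequence never falls through.
final class FixedSelector implements Selector
{
    private $variant;

    public function __construct(string $variant)
    {
        $this->variant = $variant;
    }

    public function select(string $subjectId): ?string
    {
        return $this->variant;
    }
}

// The first selector that does not return null wins, which defines the ordering.
final class SelectorSequence implements Selector
{
    private $selectors;

    public function __construct(array $selectors)
    {
        $this->selectors = $selectors;
    }

    public function select(string $subjectId): ?string
    {
        foreach ($this->selectors as $selector) {
            $variant = $selector->select($subjectId);
            if ($variant !== null) {
                return $variant;
            }
        }
        return null;
    }
}

$sequence = new SelectorSequence(array(
    new ListSelector(array('mallory'), 'off'), // blacklist
    new ListSelector(array('zhida'), 'on'),    // whitelist
    new FixedSelector('off'),                  // everyone else
));
```

Because earlier entries take precedence, a blacklisted subject stays out even if a later selector would have let them in, which is the property that plain OR and AND combinations lose.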

Page 56: Building an experimentation framework

Loggers

function log($testKey, $variantKey, $subjectKey)

Page 57: Building an experimentation framework

More => better

• More data

• More ways to track

- access logs

- 3P analytics

- custom

Page 58: Building an experimentation framework

Access log augmentation

• Apache note

• Lots of log analysis tools

• grep

• $$
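As a rough sketch of access log augmentation, a logger with the log($testKey, $variantKey, $subjectKey) shape can attach the chosen variant to the request as an Apache note. The function name log_experiment and the note names ab_variant and ab_subject are made up for this example. apache_note() is only available when PHP runs inside Apache, so the sketch checks for it first.

```php
<?php
// Hypothetical logger; the function name and the note names are illustrative.
function log_experiment(string $testKey, string $variantKey, string $subjectKey): string
{
    $entry = $testKey . '.' . $variantKey;

    // apache_note() only exists under Apache (mod_php); elsewhere, such as
    // the CLI, the entry is just returned and nothing is attached.
    if (function_exists('apache_note')) {
        apache_note('ab_variant', $entry);
        apache_note('ab_subject', $subjectKey);
    }

    return $entry;
}

// On the Apache side, a note is written to the access log with %{name}n, e.g.
//   LogFormat "%h %t \"%r\" %>s %{ab_variant}n %{ab_subject}n" ab_log
echo log_experiment('search_test', 'variantC', 'user42'), "\n";
```

Once the variant is in the access log, counting requests per variant can be as simple as a grep for the "test.variant" string. This sketch keeps only one test per request; several concurrent tests would need the note value to be appended to rather than overwritten.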

Page 59: Building an experimentation framework

3P Analytics

• Quick to start

• May be cheap

• Volume?

• Lag time?

• Flexibility / customization?

Page 60: Building an experimentation framework

3P Analytics - how

• Custom variables

- take note of number & size limits

• Custom segments

• Canned metrics

Page 61: Building an experimentation framework

3P Analytics - example

<script type="text/javascript">

var pageTracker = _gat._getTracker("UA-1234567-8");

pageTracker._initData();

pageTracker._setCustomVar(2, "AB", "search_test.variantC", 3);

pageTracker._trackPageview();

</script>

Page 62: Building an experimentation framework

Our own event tracking

• HTML beacons

• Hadoop

• Cloud

[Diagram labels: web app (HTML, JS), event beacon, event log, Hadoop, results]

Page 63: Building an experimentation framework

Break / hack

https://github.com/etsy/ab

Page 64: Building an experimentation framework

Building on top of the core API

Page 65: Building an experimentation framework

Test builders

• Capture common patterns

- feature ramp ups

- opt-in experiments

• Help with test design

- weight equalization

- multivariate testing

Page 66: Building an experimentation framework

Automatic Dispatchers

• Separate dispatching and work

• Work with components that have well-defined invocation APIs

• Define a particular level of granularity

• Feel like magic

Page 67: Building an experimentation framework

Dispatcher example - MVC

• View dispatch

• Controller dispatch

• Spring framework, etc.

Page 68: Building an experimentation framework

Selector Registry

• Reuse

• Clarity

• Documentation

$selectorReg = array(
    'staff' => 'InternalUserSelector',
    'whitelist' => 'WhitelistSelector',
    'percent' => 'WeightedSelector'
);

Page 69: Building an experimentation framework

Randomized Selector

Page 73: Building an experimentation framework

What does it mean?

• Independent of subject attributes

• Independent of other tests

• Independent of (coarse-grained) time

Page 78: Building an experimentation framework

Persistence

• Better experience

• Better data

• Multi-part tests

• ...but not forever

Page 79: Building an experimentation framework

Ramping up/down

• Vary group sizes

• Reduce risk

• Distribute load

Page 80: Building an experimentation framework

Persistence + Ramping

• Minimize inconsistency

• Ramping up

- Should just add people to the treatment group

• Ramping down

- Should just remove part of the treatment group

Page 81: Building an experimentation framework

rand()

• Explicit persistence

• Cookie

• DB

• Scaling

• Maintenance

Page 85: Building an experimentation framework

Hashing

variant = H(id)

Persistence

Attribute independence

Page 87: Building an experimentation framework

Hashing

variant = H(id)

Persistence

Test independence?

Attribute independence

Page 88: Building an experimentation framework

Hashing

variant = H(test id, id)

Persistence

Test independence

Attribute independence

Page 90: Building an experimentation framework

Hashing

variant = H(test id, id)

Persistence

What else?

Test independence

Attribute independence

Page 91: Building an experimentation framework

Hashing

variant = H(test id, id)

Persistence

Weights!

Attribute independence

Test independence

Page 93: Building an experimentation framework

Hashing

h = H(test id, id)

variant = P(h, weights)

Persistence

Attribute independence

Test independence
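The two steps on this slide, hashing to a number and then partitioning by weights, could look roughly like the sketch below. The function names hash_to_unit, partition and pick_variant are illustrative, and so are the choices of MD5 as H, a ':' separator, and the first 28 bits of the digest; the talk does not prescribe any of these.

```php
<?php
// h = H(test id, id): map the pair to a number in [0, 1).
// MD5 is used only as a mixing function here, not for security.
function hash_to_unit(string $testId, string $subjectId): float
{
    $hex = substr(md5($testId . ':' . $subjectId), 0, 7); // 7 hex digits = 28 bits
    return hexdec($hex) / 0x10000000;
}

// variant = P(h, weights): walk the cumulative weights until h falls in a segment.
function partition(float $h, array $weights): string
{
    if (count($weights) === 0) {
        throw new InvalidArgumentException('at least one variant is required');
    }
    $total = array_sum($weights);
    $upper = 0.0;
    $last = '';
    foreach ($weights as $variant => $weight) {
        $upper += $weight / $total;
        $last = (string) $variant;
        if ($h < $upper) {
            return $last;
        }
    }
    return $last; // guards against floating-point rounding at the top end
}

function pick_variant(string $testId, string $subjectId, array $weights): string
{
    return partition(hash_to_unit($testId, $subjectId), $weights);
}

$weights = array('A' => 1, 'B' => 1);
$variant = pick_variant('search_test', 'user42', $weights);
```

The same (test id, id) pair always hashes to the same number, so no cookie or database row is needed to remember the assignment, and mixing the test id into the hash keeps one test's split from lining up with another's.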

Page 96: Building an experimentation framework

Partitioning

[Diagram: the hash value falls on a line from 0 to 1; a partition point at .5 splits the line into segments A and B]

Page 97: Building an experimentation framework

Ramping up

[Diagram: the same line from 0 to 1, with the partition point between A and B moved to .7]
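A small check of why moving the partition point is a safe way to ramp: if the treatment group is everyone whose hash is below the partition point (an assumption in this sketch, since the diagram does not say which side is the treatment), then raising the point can only add subjects and lowering it can only remove them. The function names and the 1000 made-up user ids are illustrative.

```php
<?php
// Same idea as h = H(test id, id): a number in [0, 1) derived from MD5.
function unit_hash(string $testId, string $subjectId): float
{
    return hexdec(substr(md5($testId . ':' . $subjectId), 0, 7)) / 0x10000000;
}

// Assumed convention: the treatment group is the segment below the partition point.
function in_treatment(string $testId, string $subjectId, float $fraction): bool
{
    return unit_hash($testId, $subjectId) < $fraction;
}

// Count subjects that join or leave the treatment group when the point moves.
function ramp_changes(string $testId, float $from, float $to, int $subjects): array
{
    $joined = 0;
    $left = 0;
    for ($i = 0; $i < $subjects; $i++) {
        $id = 'user' . $i;
        $before = in_treatment($testId, $id, $from);
        $after = in_treatment($testId, $id, $to);
        if (!$before && $after) {
            $joined++;
        }
        if ($before && !$after) {
            $left++;
        }
    }
    return array('joined' => $joined, 'left' => $left);
}

$up = ramp_changes('new_search', 0.5, 0.7, 1000);
$down = ramp_changes('new_search', 0.7, 0.5, 1000);
printf("ramp up: %d joined, %d left\n", $up['joined'], $up['left']);
printf("ramp down: %d joined, %d left\n", $down['joined'], $down['left']);
```

Nobody who already had the new experience loses it during a ramp up, which is the "minimize inconsistency" goal from the Persistence + Ramping slide.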

Page 98: Building an experimentation framework

Which hash function?

• MD5/SHA-256/...

• Test it!

• But be careful...
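One way to "test it" is to hash a batch of ids into equal-width buckets and see how even the counts are. The sketch below uses sequential integer ids, 10 buckets, and a Pearson chi-square statistic; all of these are choices made for the example, not something the talk specifies.

```php
<?php
// Hash $subjects sequential ids into $buckets equal-width buckets of [0, 1).
function bucket_counts(string $testId, int $subjects, int $buckets): array
{
    $counts = array_fill(0, $buckets, 0);
    for ($i = 0; $i < $subjects; $i++) {
        $h = hexdec(substr(md5($testId . ':' . $i), 0, 7)) / 0x10000000;
        $counts[(int) floor($h * $buckets)]++;
    }
    return $counts;
}

// Pearson chi-square statistic against a uniform expectation.
function chi_square(array $counts): float
{
    $expected = array_sum($counts) / count($counts);
    $stat = 0.0;
    foreach ($counts as $observed) {
        $stat += ($observed - $expected) ** 2 / $expected;
    }
    return $stat;
}

$counts = bucket_counts('search_test', 10000, 10);
printf("chi-square = %.2f over %d buckets\n", chi_square($counts), count($counts));
```

With 10 buckets there are 9 degrees of freedom, so a statistic well above about 16.92 (the 5% critical value) would be a reason to look closer. The "be careful" part applies here too: a uniform spread for one test id says nothing about whether assignments are independent across tests, so it is worth repeating the check on pairs of test ids.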

Page 99: Building an experimentation framework

A/B + opt-in

• Need to separate the groups for analysis

• Solution: use more than 2 variants!

• Act according to variant properties

• Track by variant name

Page 100: Building an experimentation framework

Analysis

Page 101: Building an experimentation framework

... Confidence interval ... something something ... Binomial ... blah blah ...

Page 102: Building an experimentation framework

Confidence Intervals

• How sure are we?

• What if it were random?

Page 105: Building an experimentation framework

Binomial experiments

H T H T T T H T H H

T H T H T T H H T H
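To put a number on "how sure are we?", here is a sketch of a 95% confidence interval for a binomial proportion using the normal approximation (the Wald interval). The talk does not say which interval method it uses, so this is one common choice rather than the speaker's method, and the function name is made up. Each row of coin flips above has 5 heads in 10 flips.

```php
<?php
// Normal-approximation (Wald) interval for a binomial proportion.
// $z = 1.96 corresponds to a 95% confidence level.
function proportion_interval(int $successes, int $trials, float $z = 1.96): array
{
    $p = $successes / $trials;
    $margin = $z * sqrt($p * (1 - $p) / $trials);
    // Clamp to [0, 1], since a proportion cannot fall outside that range.
    return array(max(0.0, $p - $margin), min(1.0, $p + $margin));
}

list($low, $high) = proportion_interval(5, 10);
printf("5 heads in 10 flips: %.3f to %.3f\n", $low, $high);       // 0.190 to 0.810

list($low2, $high2) = proportion_interval(500, 1000);
printf("500 heads in 1000 flips: %.3f to %.3f\n", $low2, $high2); // 0.469 to 0.531
```

Ten flips are consistent with almost any coin, while a thousand pin the rate down to a few percentage points. The same reasoning is why a small difference between two variants needs a lot of traffic before it means anything. The approximation is itself rough for very small samples or rates near 0 or 1.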

Page 106: Building an experimentation framework

Results

Page 107: Building an experimentation framework

Dashboards

Page 108: Building an experimentation framework

A few test design tips

Page 111: Building an experimentation framework

What's the question?

What metrics?

How much better?

Page 112: Building an experimentation framework

Who?

• Different roles

• Old vs new

• Novelty

• Habit

• Expectation

Page 113: Building an experimentation framework

When?

• User types vary

• Activity patterns vary

• Site content might vary

• Performance might vary

• Full weeks are often a good starting point

Page 114: Building an experimentation framework

Summary

Page 115: Building an experimentation framework

Better living through experimentation

• More risk taking => better product

• MTTR

• Lower stress

Page 116: Building an experimentation framework

You can too.
