Essential Intuitive Statistics for Experimentation
Intuitive Statistics
Matt Gardner
Population ( μ )
Sample ( x̄, se )
Confidence Interval ( x̄ ± z × se )
False Positives
Let’s pretend μT − μC = 0
False Negatives
Let’s pretend μT − μC = d
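To make the sample and confidence interval step concrete, here is a minimal sketch in Python. It assumes a normal approximation and a 95% confidence level. The function name `confidence_interval` and the sample values are hypothetical and do not come from the slides.

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, confidence=0.95):
    """Return (x_bar, se, lower, upper) for the mean of `sample`."""
    n = len(sample)
    x_bar = mean(sample)                              # sample estimate of mu
    se = stdev(sample) / math.sqrt(n)                 # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)    # about 1.96 for 95%
    return x_bar, se, x_bar - z * se, x_bar + z * se

# Hypothetical sample values, for illustration only.
sample = [12.1, 9.8, 11.4, 10.3, 10.9, 9.5, 11.7, 10.2]
x_bar, se, lower, upper = confidence_interval(sample)
print(f"x_bar={x_bar:.2f} se={se:.2f} CI=({lower:.2f}, {upper:.2f})")
```

With a sample this small a t critical value would be the more careful choice. The z value is used here only to mirror the x̄ ± z × se form on the slide.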
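The two "let's pretend" scenarios can be explored with a small simulation. The sketch below is an illustration under stated assumptions: normally distributed measures with standard deviation 1, 200 units per group, a two-sided z-test at alpha = 0.05, and a hypothetical true difference d = 0.3. None of these values come from the slides.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def is_significant(control, test, alpha=0.05):
    """Two-sided z-test on the difference in means (test minus control)."""
    diff = mean(test) - mean(control)
    se = math.sqrt(stdev(test) ** 2 / len(test) + stdev(control) ** 2 / len(control))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(diff) > z_crit * se

def detection_rate(true_diff, n=200, runs=1000, seed=0):
    """Share of simulated experiments that report a significant difference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        test = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        hits += is_significant(control, test)
    return hits / runs

# Pretend muT - muC = 0: every significant result is a false positive.
false_positive_rate = detection_rate(true_diff=0.0)
# Pretend muT - muC = d: every non-significant result is a false negative.
false_negative_rate = 1 - detection_rate(true_diff=0.3)
print(false_positive_rate, false_negative_rate)
```

Under these assumptions the false positive rate should land near alpha, while the false negative rate depends on d, the sample size and the noise.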
Metrics
Unit of Analysis, Measure
Inputs for calculation
• Sum - of measure over all units
• Count - of analysis units
• Standard deviation - of measure over all units *
• Relative lift – in average measure, test vs. control
• Alpha – false positive rate
• Beta – false negative rate
* for proportion metrics sd = √( p × (1 − p) )
Mean, Proportion
Traffic percent x duration
Sample size
Lift, alpha, beta
SQL
Measure
Unit of Analysis
Simple Sample Size Calculator – here.
Common mistakes:
• Inputs can change through time
• Underestimating lift is safer than overestimating
• Think carefully about choice of power – is it high enough?
• Multiple metrics – choose highest traffic requirement
• Large sample size > extend duration > bundle features > alternative metric
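The inputs listed above are enough for a rough sample size estimate. The sketch below uses the standard two-sample normal approximation, n per group = 2 × (z(1 − α/2) + z(1 − β))² × sd² / δ², where δ is the relative lift times the baseline mean. This is an assumption about the method and is not necessarily what the linked calculator does. The function names, the even split between test and control, and the example numbers are all hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(total, count, sd, relative_lift, alpha=0.05, beta=0.2):
    """Units needed in each group for a two-sided test on a difference in means."""
    baseline = total / count              # average measure per analysis unit
    delta = relative_lift * baseline      # absolute difference to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

def proportion_sd(p):
    """Standard deviation of a 0/1 measure with success rate p."""
    return math.sqrt(p * (1 - p))

def duration_days(n_per_group, daily_units, traffic_fraction):
    """Days needed to fill both groups, assuming steady traffic and an even split."""
    return math.ceil(2 * n_per_group / (daily_units * traffic_fraction))

# Hypothetical proportion metric: 1,000 conversions from 10,000 units (p = 0.10),
# aiming to detect a 5% relative lift with alpha = 0.05 and beta = 0.20.
p = 1000 / 10000
n = sample_size_per_group(total=1000, count=10000, sd=proportion_sd(p), relative_lift=0.05)
days = duration_days(n, daily_units=20000, traffic_fraction=0.5)
print(n, days)
```

Because δ is squared in the denominator, halving the expected lift roughly quadruples the required sample, so an overestimated lift leaves the experiment underpowered.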
• Experiment results are subject to randomness and conclusions will sometimes be in error
• We choose the false positive and false negative error rates at experiment design time
• We know in advance if the experiment is likely to be useful and we should think carefully before running experiments … they are expensive!
• Always compute sample size!