n=10^9: automated experimentation at scale
![Page 1: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/1.jpg)
N=10^9: Automated Experimentation at Scale
Wojciech Galuba, Decision Tools Lead, Facebook (@wgaluba)
![Page 2: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/2.jpg)
![Page 3: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/3.jpg)
N=10^9: Automated Experimentation at Scale
Wojtek Galuba (wgaluba@fb) Decision Tools Team Lead Data Science Infrastructure Facebook
![Page 4: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/4.jpg)
History of Data Science Infra at FB
• Founded April 2012
• A group of data scientists and software engineers
• Experienced first hand the need for better infrastructure
• Need continues to grow
• Team doubled over the past year
• Expect continued rapid growth this year
![Page 5: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/5.jpg)
Why do we experiment?
![Page 6: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/6.jpg)
Experimentation
Product changes
Experiment to study this
Metrics
![Page 7: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/7.jpg)
Experiment to:
Catch problems before they arise
![Page 8: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/8.jpg)
Experiment to:
Choose between multiple options
![Page 9: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/9.jpg)
Experiment to:
Challenge intuitions about product
![Page 10: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/10.jpg)
Experiment to:
Not only evaluate ideas but generate new ones
![Page 11: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/11.jpg)
Challenges
![Page 12: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/12.jpg)
Many experiments
• Experiments running in parallel
• Modifying many different aspects of the product
• Overlaps are possible and may conflict
![Page 13: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/13.jpg)
Many metric dimensions
• Different contexts of user actions
• Thousands of device types
• Geography
• Demographics
• Time
• Enormous space of possible questions
![Page 14: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/14.jpg)
Many teams
• Many ways to run an experiment
• Diverse audience for results
• Huge set of results from every experiment
• Many ways to interpret results
![Page 15: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/15.jpg)
Experimentation at Facebook
![Page 16: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/16.jpg)
An experiment
![Page 17: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/17.jpg)
QuickExperiment
Divide people randomly:
• color: blue, size: medium
• color: blue, size: big
• color: green, size: medium
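The random division shown on this slide can be sketched with deterministic hashing. This is a minimal illustration, not QuickExperiment's actual mechanism: the function name `assign_condition`, the experiment name, and the choice of SHA-1 are all assumptions made for the example.

```python
import hashlib

# The three conditions from the slide.
CONDITIONS = [
    {"color": "blue", "size": "medium"},
    {"color": "blue", "size": "big"},
    {"color": "green", "size": "medium"},
]


def assign_condition(experiment: str, user_id: int) -> dict:
    """Map a user to one condition, the same one on every call.

    Hashing the experiment name together with the user id gives an
    approximately uniform split, and a different split per experiment.
    """
    key = f"{experiment}.{user_id}".encode("utf-8")
    digest = hashlib.sha1(key).hexdigest()
    bucket = int(digest[:15], 16) % len(CONDITIONS)
    return CONDITIONS[bucket]
```

Because assignment depends only on the experiment name and the user id, no per-user state has to be stored to keep a person in the same condition across requests.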
![Page 18: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/18.jpg)
QuickExperiment
• Centralized experiment management
• Purely config-level: no code pushes to iterate
• Automatic exposure logging
![Page 19: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/19.jpg)
PlanOut
![Page 20: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/20.jpg)
PlanOut
• Open sourced: http://facebook.github.io/planout/
• Flexible experimental design
• Full, programmatic control over param values
![Page 21: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/21.jpg)
Experiment evaluation
Exposures
Metrics
[Figure: % change from control to test for the "posts" metric, on an axis from -3 to 3, with confidence bands at 95%, 99%, and 99.9%]
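One way to produce a "% change from control to test" with a confidence band is a normal approximation on the difference of means. The sketch below is an assumption about how such a number could be computed, not the method used by the tools in the talk. It treats the control mean as fixed when converting to percent, which is a simplification, and `percent_change_ci` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def percent_change_ci(control, test, level=0.95):
    """Return (percent change, lower bound, upper bound).

    Uses a normal approximation for the difference of means and
    scales it by the control mean (simplification: uncertainty in
    the control mean is ignored when converting to percent).
    """
    m_c, m_t = mean(control), mean(test)
    se = sqrt(variance(control) / len(control) + variance(test) / len(test))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for 95%
    diff = m_t - m_c
    scale = 100.0 / m_c
    return diff * scale, (diff - z * se) * scale, (diff + z * se) * scale
```

Calling it with `level=0.95`, `0.99`, and `0.999` gives the three nested bands listed in the figure legend; an interval that excludes zero at a higher level indicates a lower risk of acting on noise.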
![Page 22: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/22.jpg)
Assess decision risk
[Figure legend: confidence levels 95%, 99%, and 99.9%]
![Page 23: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/23.jpg)
Lessons learned
![Page 24: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/24.jpg)
Computing answers to an exponential number of possible questions
Pre-compute
• low specificity
• low dimensionality
• long-term
Compute on-the-fly
• high specificity
• high dimensionality
• short-term
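The split between pre-computing and computing on the fly can be sketched as a small query router. Everything here is hypothetical (the row layout, `precompute`, `query`): totals for single-dimension questions are aggregated ahead of time, and more specific multi-dimension questions fall back to scanning raw rows.

```python
from collections import defaultdict


def precompute(rows, dimensions, metric):
    """Pre-aggregate metric totals for every single-dimension value."""
    table = defaultdict(int)
    for row in rows:
        for dim in dimensions:
            table[(dim, row[dim])] += row[metric]
    return dict(table)


def query(rows, cache, filters, metric):
    """Answer low-dimensional questions from the cache, others by scan."""
    if len(filters) == 1:
        dim, value = next(iter(filters.items()))
        if (dim, value) in cache:
            return cache[(dim, value)]  # cheap: already computed
    # High specificity: compute on the fly from raw rows.
    return sum(
        row[metric]
        for row in rows
        if all(row[k] == v for k, v in filters.items())
    )
```

The cache grows linearly with the number of dimension values, while the space of multi-dimension combinations grows exponentially, which is why only the first kind is worth storing in this sketch.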
A balancing act
![Page 25: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/25.jpg)
Tackling many dimensions
Two sets of tools:
• For exploration
• For extraction
![Page 26: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/26.jpg)
Automated exploration
![Page 27: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/27.jpg)
Enforce a lifecycle; in particular, clear experiment end dates
![Page 28: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/28.jpg)
Why lifecycle policy?
• Unifies methodology across teams
• Prevents tech debt buildup
• Minimizes bad impact on product
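A lifecycle policy with mandatory end dates could be enforced at the configuration level. The sketch below assumes a hypothetical `Experiment` record and `expired` helper; it only illustrates the idea that an experiment cannot be defined without a valid end date and that finished experiments are easy to find and clean up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Experiment:
    name: str
    start: date
    end: date  # required field: no open-ended experiments

    def __post_init__(self):
        if self.end <= self.start:
            raise ValueError(f"{self.name}: end date must be after start date")

    def is_active(self, today: date) -> bool:
        """Active from the start date up to, but not including, the end date."""
        return self.start <= today < self.end


def expired(experiments, today):
    """Names of experiments whose end date has passed (cleanup candidates)."""
    return [e.name for e in experiments if today >= e.end]
```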
![Page 29: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/29.jpg)
Ease of rapid iteration; Safe and scientifically valid iteration
![Page 30: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/30.jpg)
Fast, but not too fast
• Novelty effect vs. top engaged users bump
• Understand if waiting helps
![Page 31: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/31.jpg)
Ensure mutual exclusion; Across platforms, features and infra
![Page 32: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/32.jpg)
Why mutual exclusion?
• Fewer experiment conflicts
• Lower metrics variance
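Mutual exclusion can be illustrated by hashing users into a fixed number of segments and giving each experiment a disjoint set of segments. This is a sketch under assumptions: the universe name, experiment names, segment count, and allocation table are all made up for the example and do not describe the actual system.

```python
import hashlib

NUM_SEGMENTS = 1000

# Hypothetical allocation: each experiment owns a disjoint range of segments.
ALLOCATION = {
    "feed_ranking_v2": range(0, 100),    # 10% of users
    "new_composer": range(100, 250),     # 15% of users
}


def segment_of(universe: str, user_id: int) -> int:
    """Hash a user into one of NUM_SEGMENTS segments within a universe."""
    digest = hashlib.sha1(f"{universe}.{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:15], 16) % NUM_SEGMENTS


def experiment_for(universe: str, user_id: int):
    """Return the single experiment this user belongs to, or None."""
    segment = segment_of(universe, user_id)
    for name, segments in ALLOCATION.items():
        if segment in segments:
            return name
    return None
```

Since every user falls into exactly one segment and the segment ranges do not overlap, no user can be in two experiments of the same universe at once.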
![Page 33: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/33.jpg)
Exposure log everything
• Measure effects on the exposed only
• Conditioning analyses on the time since last exposure
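Measuring effects on the exposed only amounts to joining metrics against the exposure log before comparing conditions. A minimal sketch, assuming a hypothetical log of `(user_id, condition)` pairs and a per-user metric dictionary:

```python
from collections import defaultdict
from statistics import mean


def exposed_means(exposures, metrics):
    """Average a metric per condition over exposed users only.

    exposures: iterable of (user_id, condition) log entries; a user
               may appear many times and is counted once.
    metrics:   dict mapping user_id to a metric value; exposed users
               with no recorded value count as 0.
    """
    users_by_condition = defaultdict(set)
    for user_id, condition in exposures:
        users_by_condition[condition].add(user_id)
    return {
        condition: mean(metrics.get(u, 0) for u in users)
        for condition, users in users_by_condition.items()
    }
```

Users who were assigned to a condition but never saw the changed feature add noise without adding signal, so leaving them out tightens the comparison.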
![Page 34: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/34.jpg)
The culture
Experimentation gives focus; But watch out for tunnel vision!
![Page 35: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/35.jpg)
The culture
Cultivate sound practices; Safe and low-impact experimentation
![Page 36: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/36.jpg)
The culture
Educate on data interpretation; Uniform decision-making across teams
![Page 37: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/37.jpg)
Understanding uncertainty
“Robust misinterpretation of confidence intervals” Rink Hoekstra et al. Psychonomic Bulletin & Review
• Only 3% of scientists got all 6 answers right...
• How do we educate the users of the tools?
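One way to make the meaning of a confidence level concrete is a small simulation: a 95% interval is a procedure that covers the true value in about 95% of repeated experiments. It says nothing about a 95% probability for any single computed interval. The sketch below assumes normally distributed data with known standard deviation, and the function name `coverage` and its defaults are illustrative.

```python
import random
from math import sqrt


def coverage(true_mean=0.0, sigma=1.0, n=30, trials=2000, z=1.96, seed=0):
    """Fraction of simulated experiments whose interval covers true_mean."""
    rng = random.Random(seed)
    half_width = z * sigma / sqrt(n)  # known-sigma interval half-width
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / trials
```

With `z=1.96` the returned fraction should land close to 0.95, and lowering `z` narrows the intervals and lowers the coverage.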
![Page 38: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/38.jpg)
The three stages of experimentation infrastructure
![Page 39: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/39.jpg)
Stage 1: Artisanal
Photo credit: Abhisek Sarda
![Page 40: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/40.jpg)
Stage 2: Power tools
![Page 41: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/41.jpg)
Stage 2: Power tools
![Page 42: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/42.jpg)
Stage 3: Industrialized
Photo credit: Steve Jurvetson
![Page 43: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/43.jpg)
Conclusions
Empower, but don’t overwhelm
![Page 44: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/44.jpg)
Conclusions
Filter and automate, but maintain broad focus
![Page 45: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/45.jpg)
Conclusions
Clean data and powerful tools are great, but building the right experimentation culture is equally important
![Page 46: N=10^9: Automated Experimentation at Scale](https://reader037.vdocuments.site/reader037/viewer/2022103021/55d4f8c0bb61eb741f8b46d5/html5/thumbnails/46.jpg)
N=10^9: Automated Experimentation at Scale
Wojciech Galuba, Decision Tools Lead, Facebook (@wgaluba)