A framework for Multi-A(rmed)/B(andit) testing with online FDR control
Fanny Yang, Aaditya Ramdas, Kevin Jamieson, Martin J. Wainwright
UC Berkeley
Spotlight, NIPS Conference, December 2017
Traditional A/B Testing
A (control): 50% vs. B (alternative): 75%
Hypothesis test of H0: A is at least as good as B
Accept H0: keep using A
Reject H0: switch to B
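To make the hypothesis test on this slide concrete, here is a minimal sketch assuming a one-sided two-proportion z-test. The slide does not name a specific test, so the choice of test, the sample size of 100 per arm, the 5% level, and the function name `one_sided_z_test` are all illustrative assumptions.

```python
import math

def one_sided_z_test(successes_a, n_a, successes_b, n_b):
    """P-value for H0: rate(A) >= rate(B) against H1: rate(B) > rate(A)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0.0:
        return 1.0  # no variation observed, so no evidence against H0
    z = (p_b - p_a) / se
    # Upper-tail probability of a standard normal via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical counts matching the slide's 50% vs. 75% (sample sizes are assumed).
p_value = one_sided_z_test(successes_a=50, n_a=100, successes_b=75, n_b=100)
decision = "switch to B" if p_value < 0.05 else "keep using A"
```

With equal observed rates the statistic is zero and the p-value is 0.5, so the default A is kept.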
In reality: many alternatives, many tests
Sequence of tests, each comparing a control (default) vs. several alternatives:
January: Phone App Layout
April: Website Layout
August: Teaser picture
…
Goal I (A/B testing)
Tests: January Phone App Layout, April Website Layout, May TV ad, June Email ads, August Teaser picture, Dec. NIPS booth
Null hypothesis true: control is indeed better
Null hypothesis wrong: at least one alternative is better
Each test is either accepted or rejected; rejected tests are discoveries, and rejections of a true null hypothesis are false discoveries
Control the expected ratio #false discoveries / #discoveries (FDR)
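The ratio on this slide can be computed for a single run as the false discovery proportion; the FDR is its expectation over repeated runs. The sketch below assumes the common convention that the proportion is 0 when nothing is rejected. The reject/accept pattern and the truth labels are made up for illustration and are not taken from the talk.

```python
def false_discovery_proportion(rejected, null_is_true):
    """Fraction of rejections that are false; 0 when there are no rejections."""
    discoveries = sum(rejected)
    false_discoveries = sum(r and t for r, t in zip(rejected, null_is_true))
    return false_discoveries / max(discoveries, 1)

# Hypothetical outcome for six tests (illustrative labels only).
rejected     = [True,  True, False, True,  False, True]
null_is_true = [False, True, True,  False, True,  False]
fdp = false_discovery_proportion(rejected, null_is_true)
```

Here four tests are rejected and one of them has a true null hypothesis, so one discovery in four is false.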
Goal II (power and best alternative)
Tests: January Phone App Layout, April Website Layout, May TV ad, June Email ads, August Teaser picture, Dec. NIPS booth
Null hypothesis true: control is indeed better
Null hypothesis wrong: at least one alternative is better
Accepted / Rejected: rejected tests are discoveries, and rejections of a wrong null hypothesis are true discoveries
Best alternative (per discovery): Alternative 3, Alternative 4, Alternative 2
Maximize # true discoveries, find best alternative for each discovery
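Our framework: MAB-FDR
MAB-FDR meta algorithm, given a desired FDR level α
…
Test j: the online FDR procedure supplies α_j; a best-arm MAB returns the best alternative and a p-value p_j; the test p_j < α_j gives reject/accept, which is passed back to the online FDR procedure
Test j+1: the online FDR procedure supplies α_{j+1}; a best-arm MAB returns the best alternative and a p-value p_{j+1}; the test p_{j+1} < α_{j+1} gives reject/accept, which is passed back to the online FDR procedure
…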
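The loop in this diagram, where each test receives a level α_j and reports reject/accept back, can be sketched as follows. The slides do not say which online FDR procedure is used, so the update below is an assumption: a LORD++-style rule from the online FDR literature, with an illustrative γ sequence and initial wealth, which should be checked against that literature before being relied on. The class name `LordStyleOnlineFDR` and the function `run_sequence` are hypothetical, and the p-values are supplied as plain numbers rather than produced by a bandit run.

```python
import math

class LordStyleOnlineFDR:
    """Assigns a level alpha_j to each new test from past reject/accept decisions."""

    def __init__(self, alpha=0.05, initial_wealth=None):
        self.alpha = alpha
        # Initial wealth W0 must not exceed alpha; alpha / 2 is an arbitrary choice.
        self.w0 = alpha / 2 if initial_wealth is None else initial_wealth
        self.rejection_times = []  # 1-based indices of past rejections
        self.t = 0                 # number of tests started so far

    @staticmethod
    def gamma(k):
        # Nonnegative sequence summing to 1 over k = 1, 2, ... (illustrative choice).
        return 6.0 / (math.pi ** 2 * k ** 2)

    def next_alpha(self):
        """Advance to the next test and return its level."""
        self.t += 1
        level = self.gamma(self.t) * self.w0
        for i, tau in enumerate(self.rejection_times):
            # The first rejection earns alpha - W0, later ones earn alpha.
            weight = (self.alpha - self.w0) if i == 0 else self.alpha
            level += weight * self.gamma(self.t - tau)
        return level

    def record(self, rejected):
        """Feed the reject/accept decision of the current test back in."""
        if rejected:
            self.rejection_times.append(self.t)

def run_sequence(p_values, alpha=0.05):
    """Return a list of (alpha_j, rejected) pairs for a stream of p-values."""
    procedure = LordStyleOnlineFDR(alpha)
    decisions = []
    for p_j in p_values:
        alpha_j = procedure.next_alpha()
        rejected = p_j < alpha_j
        procedure.record(rejected)
        decisions.append((alpha_j, rejected))
    return decisions

# Hypothetical p-values for four consecutive tests.
decisions = run_sequence([0.001, 0.5, 0.02, 0.9])
```

In this run the first test is rejected, which raises the level offered to the second test above the level offered to the first; the remaining tests are accepted.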
Our framework…
1. Uses online FDR procedures to control the FDR at any test
2. Uses a best-arm MAB algorithm for testing each hypothesis and finding the best alternative, while sampling only as much as needed
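The second point, testing a hypothesis and finding the best alternative while sampling only as much as needed, can be illustrated with a simple adaptive loop. This is not the algorithm from the talk: it is a plain elimination-style sketch with Hoeffding confidence bounds, union-bounded over arms and sample sizes, run on simulated Bernoulli rewards, and it returns the empirical leader rather than a p-value. The function names, the arm means, the confidence parameter `delta`, and the round budget are all assumptions.

```python
import math
import random

def radius(n, num_arms, delta):
    """Hoeffding confidence radius, union-bounded over arms and sample sizes."""
    return math.sqrt(math.log(4 * num_arms * n * n / delta) / (2 * n))

def compare_to_control(means, delta=0.05, max_rounds=5000, seed=0):
    """Arm 0 is the control. Returns (rejected, best_alternative, total_pulls)."""
    rng = random.Random(seed)
    k = len(means)
    sums = [0.0] * k
    active = set(range(1, k))  # alternatives still worth sampling
    pulls = 0
    for n in range(1, max_rounds + 1):
        # Sample the control and every remaining alternative once per round.
        for arm in [0] + sorted(active):
            sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
            pulls += 1
        r = radius(n, k, delta)
        control_mean = sums[0] / n
        # Drop alternatives that are confidently no better than the control.
        active = {a for a in active if sums[a] / n + r >= control_mean - r}
        if not active:
            return False, None, pulls  # accept: no alternative can beat the control
        leader = max(active, key=lambda a: sums[a])
        if sums[leader] / n - r > control_mean + r:
            return True, leader, pulls  # reject: leader is confidently better
    return False, None, pulls  # budget exhausted without a confident winner

# Simulated test: control at 0.5, two alternatives at 0.55 and 0.75 (assumed values).
rejected, best, pulls = compare_to_control([0.5, 0.55, 0.75])
```

The loop stops as soon as the confidence intervals separate, so a large gap between the control and an alternative ends the test well before the budget is used up, while a test with no better alternative runs to the budget and keeps the control.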
"A framework for Multi-A(rmed)/B(andit) testing with online FDR control"
Fanny Yang, Aaditya Ramdas, Kevin Jamieson, Martin Wainwright
Come and learn more at Poster #2