A linear-time adaptive nonparametric two-sample test
Zoltan Szabo (CMAP, Ecole Polytechnique)
Wittawat Jitkrittum, Kacper Chwialkowski, Arthur Gretton
Signal Processing and Machine Learning Seminar, Marseilles
March 24, 2017
Zoltan Szabo A linear-time adaptive nonparametric two-sample test
Motivating examples
Motivating example-1: NLP
Given: two categories of documents (Bayesian inference, neuroscience).
Task:
test their distinguishability,
most discriminative words → interpretability.
Motivating example-2: computer vision
Given: two sets of faces (happy, angry).
Task:
check if they are different,
determine the most discriminative features/regions.
One-page summary
We propose a nonparametric t-test.
It gives a reason why H0 is rejected.
It is
adaptive → high test power,
fast (linear time).
Paper, code:
NIPS [Jitkrittum et al., 2016].
https://github.com/wittawatj/interpretable-test.
Two-sample test, distribution features
What is a two-sample test?
Given:
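$$X = \{x_i\}_{i=1}^n \overset{\text{i.i.d.}}{\sim} P, \qquad Y = \{y_j\}_{j=1}^n \overset{\text{i.i.d.}}{\sim} Q.$$
Example: $x_i$ = $i$th happy face, $y_j$ = $j$th sad face.
Problem: using $X$, $Y$ test
$$H_0 : P = Q \quad \text{vs} \quad H_1 : P \neq Q.$$
Assume $X, Y \subset \mathbb{R}^d$.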
Ingredients of two-sample test
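Test statistic: $\hat{\lambda}_n = \hat{\lambda}_n(X, Y)$, random.
Significance level: $\alpha = 0.01$.
Under $H_0$: $P_{H_0}(\hat{\lambda}_n \le T_\alpha) = 1 - \alpha$ (probability of correctly accepting $H_0$).
Under $H_1$: $P_{H_1}(T_\alpha < \hat{\lambda}_n) = P(\text{correctly rejecting } H_0)$ =: power.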
[Figure: densities $P_{H_0}(\hat{\lambda}_n)$ and $P_{H_1}(\hat{\lambda}_n)$ of the test statistic, with the threshold $T_\alpha$ marked on the $\hat{\lambda}_n$ axis.]
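A minimal Monte Carlo sketch of these ingredients, under assumptions that are not in the slides: a toy statistic $n(\bar{x} - \bar{y})^2 / 2$ for unit-variance Gaussian data (which is $\chi^2(1)$-distributed under $H_0$), sample size 100, and a mean shift of 0.5 under $H_1$. The names `statistic` and `simulate` are illustrative only.

```python
import random


def statistic(xs, ys):
    """Toy statistic: n * (difference of sample means)^2 / 2."""
    n = len(xs)
    diff = sum(xs) / n - sum(ys) / n
    return n * diff * diff / 2.0


def simulate(shift, n, trials, rng):
    """Draw `trials` statistics with X ~ N(0, 1) and Y ~ N(shift, 1)."""
    stats = []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(shift, 1.0) for _ in range(n)]
        stats.append(statistic(xs, ys))
    return stats


rng = random.Random(0)
alpha = 0.01

# Threshold T_alpha: empirical (1 - alpha)-quantile of the statistic under H0.
null_stats = sorted(simulate(0.0, n=100, trials=2000, rng=rng))
t_alpha = null_stats[int((1 - alpha) * len(null_stats)) - 1]

# Power: fraction of H1 draws whose statistic exceeds the threshold.
alt_stats = simulate(0.5, n=100, trials=2000, rng=rng)
power = sum(s > t_alpha for s in alt_stats) / len(alt_stats)

print(f"T_alpha ~ {t_alpha:.2f}, estimated power ~ {power:.2f}")
```

With these settings the simulated threshold should land near the 0.99-quantile of $\chi^2(1)$, roughly 6.6, up to Monte Carlo error.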
Towards representations of distributions: $\mathbb{E}X$
Given: 2 Gaussians with different means.
Solution: t-test.
[Figure: pdfs of two Gaussian variables $P$ and $Q$ with different means.]
Towards representations of distributions: $\mathbb{E}X^2$
Setup: 2 Gaussians; same means, different variances.
Idea: look at the 2nd-order features of RVs.
$\varphi_x = x^2 \Rightarrow$ difference in $\mathbb{E}X^2$.
[Figure, two panels: pdfs of two Gaussian variables $P$ and $Q$ with different variances; pdfs of $X^2$ under $P$ and $Q$.]
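The effect of the second-order feature can be checked numerically. The sketch below assumes $P = \mathcal{N}(0, 1)$ and $Q = \mathcal{N}(0, 2^2)$, values chosen only for illustration: the sample means nearly coincide, while the means of $\varphi_x = x^2$ differ by about 3.

```python
import random

rng = random.Random(1)
n = 5000

# P = N(0, 1) and Q = N(0, 2^2): same mean, different variances.
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [rng.gauss(0.0, 2.0) for _ in range(n)]


def mean(values):
    return sum(values) / len(values)


# First-order feature phi(x) = x: the two sample means are close.
mean_gap = mean(xs) - mean(ys)

# Second-order feature phi(x) = x^2: E X^2 is 1 under P and 4 under Q.
second_moment_gap = mean([y * y for y in ys]) - mean([x * x for x in xs])

print(f"mean gap = {mean_gap:.3f}, second-moment gap = {second_moment_gap:.3f}")
```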
Towards representations of distributions: further moments
Setup: a Gaussian and a Laplacian distribution.
Challenge: their means and variances are the same.
Idea: look at higher-order features.
[Figure: pdfs of a Gaussian variable and a Laplacian variable ($P$, $Q$).]
Let us consider feature representations!
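As a small illustration of why higher-order features help here, the sketch below compares a standard Gaussian with a Laplace distribution scaled to unit variance (scale $b = 1/\sqrt{2}$, an assumption made for this example). Their first two moments agree, but the fourth moments are 3 and 6 respectively.

```python
import math
import random

rng = random.Random(2)
n = 20000
b = 1.0 / math.sqrt(2.0)  # Laplace scale: variance 2 * b^2 = 1

gauss = [rng.gauss(0.0, 1.0) for _ in range(n)]
# The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
laplace = [rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b) for _ in range(n)]


def moment(values, order):
    """Empirical raw moment of the given order."""
    return sum(v ** order for v in values) / len(values)


for name, sample in (("Gaussian", gauss), ("Laplace", laplace)):
    print(name, [round(moment(sample, k), 2) for k in (1, 2, 4)])
```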
Kernel: similarity between features
Given: $x$ and $x'$ objects (images or texts).
Question: how similar are they?
Define features of the objects:
$\varphi_x$: features of $x$,
$\varphi_{x'}$: features of $x'$.
Kernel: inner product of these features
$$k(x, x') := \langle \varphi_x, \varphi_{x'} \rangle.$$
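To make the definition concrete, here is a sketch with an explicit feature map, assuming scalar inputs and the degree-2 polynomial kernel $(x x' + 1)^2$ as the example; the function names are illustrative. The features $\varphi_x = (x^2, \sqrt{2}x, 1)$ reproduce that kernel as an inner product.

```python
import math


def phi(x):
    """Explicit features of a scalar x for the kernel (x * x' + 1)^2."""
    return (x * x, math.sqrt(2.0) * x, 1.0)


def kernel_from_features(x, x_prime):
    """Inner product of the explicit feature vectors."""
    return sum(a * b for a, b in zip(phi(x), phi(x_prime)))


def kernel_closed_form(x, x_prime):
    """The same kernel evaluated without building features."""
    return (x * x_prime + 1.0) ** 2


for x, x_prime in [(1.5, 2.0), (0.5, -2.0)]:
    print(kernel_from_features(x, x_prime), kernel_closed_form(x, x_prime))
```

The closed form never builds the feature vector, which is what makes kernels with very high-dimensional or infinite-dimensional features (such as the Gaussian kernel) usable in practice.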
Kernel examples on $\mathbb{R}^d$ ($\gamma > 0$, $p \in \mathbb{Z}^+$)
Polynomial kernel:
$$k(x, y) = (\langle x, y \rangle + \gamma)^p.$$
Gaussian kernel:
$$k(x, y) = e^{-\gamma \|x - y\|_2^2}.$$
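Both kernels are short to write down in code. The following is a plain-Python sketch for vectors given as sequences of floats; the default parameter values are arbitrary choices for the example.

```python
import math


def polynomial_kernel(x, y, gamma=1.0, p=2):
    """k(x, y) = (<x, y> + gamma)^p."""
    inner = sum(a * b for a, b in zip(x, y))
    return (inner + gamma) ** p


def gaussian_kernel(x, y, gamma=0.5):
    """k(x, y) = exp(-gamma * ||x - y||_2^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


x, y = [1.0, 2.0], [3.0, 4.0]
print(polynomial_kernel(x, y))  # (3 + 8 + 1)^2
print(gaussian_kernel(x, y))    # exp(-0.5 * 8)
```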
Towards distribution features
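$\widehat{\mathrm{MMD}}^2(P, Q) = \overline{K_{P,P}} + \overline{K_{Q,Q}} - 2\,\overline{K_{P,Q}}$ (without diagonals in $\overline{K_{P,P}}$, $\overline{K_{Q,Q}}$)
$\widehat{\mathrm{MMD}}$ illustration credit: Arthur Gretton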
Kernel → distribution feature
Kernel recall: $k(x, x') = \langle \varphi_x, \varphi_{x'} \rangle$.
Feature of $P$ (mean embedding):
$$\mu_P := \mathbb{E}_{x \sim P}[\varphi_x].$$
Previous quantity: unbiased estimate of
$$\mathrm{MMD}^2(P, Q) = \|\mu_P - \mu_Q\|_{\mathcal{H}(k)}^2.$$
Valid test [Gretton et al., 2012]. Challenges:
1. Threshold choice: 'ugly' asymptotics of $n\,\widehat{\mathrm{MMD}}^2(P, P)$.
2. Test statistic: quadratic time complexity.
3. Witness $\in \mathcal{H}(k)$: can be hard to interpret.
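The quadratic cost is visible in a direct implementation of the unbiased estimate: every pair of points is visited once. This is a sketch for scalar data with a Gaussian kernel of bandwidth 1, both assumptions made for the example; `mmd2_unbiased` is an illustrative name, not the authors' code.

```python
import math
import random


def gauss_kernel(x, y, sigma=1.0):
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def mmd2_unbiased(xs, ys, kernel=gauss_kernel):
    """Unbiased MMD^2 estimate: within-sample means skip the diagonal."""
    n, m = len(xs), len(ys)
    k_xx = sum(kernel(xs[i], xs[j]) for i in range(n) for j in range(n) if i != j)
    k_yy = sum(kernel(ys[i], ys[j]) for i in range(m) for j in range(m) if i != j)
    k_xy = sum(kernel(x, y) for x in xs for y in ys)
    return k_xx / (n * (n - 1)) + k_yy / (m * (m - 1)) - 2.0 * k_xy / (n * m)


rng = random.Random(3)
n = 200
p_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
p_sample_2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
q_sample = [rng.gauss(2.0, 1.0) for _ in range(n)]

mmd2_same = mmd2_unbiased(p_sample, p_sample_2)  # P vs P: close to 0
mmd2_diff = mmd2_unbiased(p_sample, q_sample)    # P vs Q: clearly positive
print(f"same: {mmd2_same:.4f}, different: {mmd2_diff:.4f}")
```

Because the estimator is unbiased, the value for two samples from the same distribution can be slightly negative.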
Linear-time tests
Linear-time 2-sample test
Recall:
$$\mathrm{MMD}(P, Q) = \|\mu_P - \mu_Q\|_{\mathcal{H}(k)}.$$
Changing this [Chwialkowski et al., 2015] to
$$\rho(P, Q) := \sqrt{\frac{1}{J} \sum_{j=1}^{J} [\mu_P(v_j) - \mu_Q(v_j)]^2}$$
with random test locations $\{v_j\}_{j=1}^{J}$.
$\rho$ is a metric (a.s.). How do we estimate it? Distribution under $H_0$?
What is a random metric?
In short
It is a metric almost surely.
In other words,
$\rho(P, Q) \ge 0$, and $\rho(P, Q) = 0 \Leftrightarrow P = Q$, almost surely.
$\rho(P, Q) = \rho(Q, P)$ almost surely.
$\rho(P, Q) \le \rho(P, D) + \rho(D, Q)$ almost surely.
$V = \{v_j\}_{j=1}^{J} \subset \mathbb{R}^d$: source of the randomness.
Result
Theorem
If $k$ is
bounded: $\sup_{x, x'} k(x, x') \le B_k < \infty$,
analytic: $x \mapsto k(x, y)$ is analytic for any $y \in \mathbb{R}^d$,
characteristic: $\mu$ is injective,
then
$$\rho(P, Q) := \sqrt{\frac{1}{J} \sum_{j=1}^{J} [\mu_P(v_j) - \mu_Q(v_j)]^2}$$
is a metric a.s. w.r.t. $\{v_j\}_{j=1}^{J}$.
Why do analytic features work? – proof idea
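$\mu$ is an injective map into analytic functions:
$k$: bounded, analytic $\Rightarrow$ elements of $\mathcal{H}_k$: analytic.
$k$: characteristic, bounded $\Rightarrow$ $\mu = \mu_k$: well-defined, injective.
$\mu$: injective $\Rightarrow$ for $P \neq Q$, $f := \mu_P - \mu_Q \neq 0$.
$f$: analytic, thus
$$\rho(P, Q) = \sqrt{\frac{1}{J} \sum_{j=1}^{J} [\mu_P(v_j) - \mu_Q(v_j)]^2}$$
is a metric a.s. w.r.t. $v_j \overset{\text{i.i.d.}}{\sim} m$, where $m \ll \lambda$ ($m$ is absolutely continuous w.r.t. the Lebesgue measure $\lambda$). Reason: for an analytic $f \neq 0$, $m\{v : f(v) = 0\} = 0$.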
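A worked one-dimensional illustration of the last step, under assumptions not taken from the slides: $P = \mathcal{N}(0, 1)$, $Q = \mathcal{N}(1, 1)$, and the Gaussian kernel $k(x, v) = e^{-(x - v)^2 / (2\sigma^2)}$. Evaluating the mean embeddings in closed form gives

$$\mu_P(v) = \frac{\sigma}{\sqrt{1 + \sigma^2}}\, e^{-\frac{v^2}{2(1 + \sigma^2)}}, \qquad \mu_Q(v) = \frac{\sigma}{\sqrt{1 + \sigma^2}}\, e^{-\frac{(v - 1)^2}{2(1 + \sigma^2)}},$$

$$f(v) = \mu_P(v) - \mu_Q(v) = 0 \iff v^2 = (v - 1)^2 \iff v = \tfrac{1}{2}.$$

The zero set $\{1/2\}$ has Lebesgue measure zero, so a test location drawn from any $m \ll \lambda$ misses it with probability one, and $\rho(P, Q) > 0$ almost surely in this example.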
Estimation
Compute
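$$\hat{\rho}^2(P, Q) = \frac{1}{J} \sum_{j=1}^{J} [\hat{\mu}_P(v_j) - \hat{\mu}_Q(v_j)]^2,$$
where $\hat{\mu}_P(v) = \frac{1}{n} \sum_{i=1}^{n} k(x_i, v)$.
[Figure: example using $k(x, v) = e^{-\frac{\|x - v\|^2}{2\sigma^2}}$.]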
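The estimator translates almost line by line into code. The sketch below assumes points stored as tuples of floats and the Gaussian kernel with bandwidth $\sigma$; the function names are illustrative.

```python
import math


def gauss_kernel(x, v, sigma=1.0):
    """k(x, v) = exp(-||x - v||^2 / (2 sigma^2)) for points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_embedding(sample, v, sigma=1.0):
    """Empirical mean embedding evaluated at the test location v."""
    return sum(gauss_kernel(x, v, sigma) for x in sample) / len(sample)


def rho2_hat(xs, ys, locations, sigma=1.0):
    """Average squared difference of the two embeddings over the locations."""
    diffs = [mean_embedding(xs, v, sigma) - mean_embedding(ys, v, sigma)
             for v in locations]
    return sum(d * d for d in diffs) / len(locations)


xs = [(0.0,), (1.0,)]
ys = [(1.0,), (2.0,)]
locations = [(0.0,)]
value = rho2_hat(xs, ys, locations)
print(value)  # equals ((1 - exp(-2)) / 2)^2 for this toy data
```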
Estimation – continued
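$$\begin{aligned}
\hat{\rho}^2(P, Q) &= \frac{1}{J} \sum_{j=1}^{J} [\hat{\mu}_P(v_j) - \hat{\mu}_Q(v_j)]^2 \\
&= \frac{1}{J} \sum_{j=1}^{J} \left[ \frac{1}{n} \sum_{i=1}^{n} k(x_i, v_j) - \frac{1}{n} \sum_{i=1}^{n} k(y_i, v_j) \right]^2 \\
&= \frac{1}{J} \sum_{j=1}^{J} (\bar{z}_n)_j^2 = \frac{1}{J} \bar{z}_n^\top \bar{z}_n,
\end{aligned}$$
where $\bar{z}_n = \frac{1}{n} \sum_{i=1}^{n} z_i \in \mathbb{R}^J$ and $z_i := [k(x_i, v_j) - k(y_i, v_j)]_{j=1}^{J}$.
Good news: estimation is linear in $n$!
Bad news: intractable null distribution: $n\,\hat{\rho}^2(P, P) \xrightarrow{w}$ sum of $J$ correlated $\chi^2$.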
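The linear cost comes from the fact that each pair $(x_i, y_i)$ is touched once to form $z_i$. A single-pass sketch, assuming equal sample sizes and a Gaussian kernel (function names are illustrative):

```python
import math


def gauss_kernel(x, v, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def z_features(x, y, locations, sigma=1.0):
    """z_i: kernel differences at the J test locations for one pair (x, y)."""
    return [gauss_kernel(x, v, sigma) - gauss_kernel(y, v, sigma)
            for v in locations]


def rho2_hat_linear(xs, ys, locations, sigma=1.0):
    """One pass over the n pairs, O(n * J) kernel evaluations in total."""
    n, J = len(xs), len(locations)
    z_bar = [0.0] * J
    for x, y in zip(xs, ys):
        for j, z in enumerate(z_features(x, y, locations, sigma)):
            z_bar[j] += z / n
    return sum(z * z for z in z_bar) / J, z_bar


xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
ys = [(2.0, 2.0), (1.0, 2.0), (2.0, 1.0)]
locations = [(0.0, 0.0), (2.0, 2.0)]
rho2, z_bar = rho2_hat_linear(xs, ys, locations)
print(rho2, z_bar)
```

In this toy data the two samples are mirror images of each other about the point (1, 1), so the two coordinates of `z_bar` have equal magnitude and opposite sign.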
Normalized version gives tractable null
Modified test statistic:
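$$\hat{\lambda}_n = n\,\bar{z}_n^\top \Sigma_n^{-1} \bar{z}_n,$$
where $\Sigma_n = \mathrm{cov}(\{z_i\}_{i=1}^{n})$.
Under $H_0$:
$\hat{\lambda}_n \xrightarrow{w} \chi^2(J)$ $\Rightarrow$ easy to get the $(1 - \alpha)$-quantile!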
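A sketch of the normalized statistic for the special case $J = 2$ with scalar data, chosen here so that everything stays in closed form: the $2 \times 2$ covariance is inverted by hand, and the $(1 - \alpha)$-quantile of $\chi^2(2)$ is exactly $-2 \ln \alpha$. The test locations $-1$ and $1$, the bandwidth, and the sample sizes are assumptions made for the example.

```python
import math
import random


def gauss_kernel(x, v, sigma=1.0):
    return math.exp(-((x - v) ** 2) / (2.0 * sigma ** 2))


def me_statistic(xs, ys, v1, v2, sigma=1.0):
    """n * z_bar^T Sigma_n^{-1} z_bar for J = 2 scalar test locations."""
    n = len(xs)
    zs = [(gauss_kernel(x, v1, sigma) - gauss_kernel(y, v1, sigma),
           gauss_kernel(x, v2, sigma) - gauss_kernel(y, v2, sigma))
          for x, y in zip(xs, ys)]
    m1 = sum(z[0] for z in zs) / n
    m2 = sum(z[1] for z in zs) / n
    # Sample covariance of the z_i (normalised by n - 1).
    s11 = sum((z[0] - m1) ** 2 for z in zs) / (n - 1)
    s22 = sum((z[1] - m2) ** 2 for z in zs) / (n - 1)
    s12 = sum((z[0] - m1) * (z[1] - m2) for z in zs) / (n - 1)
    det = s11 * s22 - s12 * s12
    # Quadratic form with the closed-form inverse of a 2x2 matrix.
    quad = (s22 * m1 * m1 - 2.0 * s12 * m1 * m2 + s11 * m2 * m2) / det
    return n * quad


def chi2_quantile_two_dof(alpha):
    """(1 - alpha)-quantile of chi^2(2), whose CDF is 1 - exp(-x / 2)."""
    return -2.0 * math.log(alpha)


rng = random.Random(4)
alpha = 0.01
threshold = chi2_quantile_two_dof(alpha)


def draw(mean, n):
    return [rng.gauss(mean, 1.0) for _ in range(n)]


# Under H0 (P = Q) the rejection rate should stay close to alpha.
trials = 200
rejections = sum(
    me_statistic(draw(0.0, 200), draw(0.0, 200), -1.0, 1.0) > threshold
    for _ in range(trials)
)
null_rate = rejections / trials

# Under H1 (unit mean shift) the statistic lands far above the threshold.
stat_alt = me_statistic(draw(0.0, 500), draw(1.0, 500), -1.0, 1.0)
print(f"threshold = {threshold:.2f}, null rejection rate = {null_rate:.3f}, "
      f"H1 statistic = {stat_alt:.1f}")
```

For general $J$ the same computation needs a linear solve and a $\chi^2(J)$ quantile routine, which the Python standard library does not provide.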
Our idea
Idea
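Until this point: test locations ($V$) are fixed.
Instead: choose $\theta = \{V, \sigma\}$ to
maximize a lower bound on the test power.
Theorem (Lower bound on power, for large $n$)
Test power $\ge L(\lambda_n)$; $L$: explicit function, increasing.
Here,
$\lambda_n = n\mu^\top \Sigma^{-1} \mu$: population version of the test statistic $\hat{\lambda}_n = n\,\bar{z}_n^\top \Sigma_n^{-1} \bar{z}_n$.
$\mu = \mathbb{E}_{xy}[z_1]$, $\Sigma = \mathbb{E}_{xy}\left[(z_1 - \mu)(z_1 - \mu)^\top\right]$.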
Non-convexity, informative features
2D problem:
$P := \mathcal{N}(0, I)$, $Q := \mathcal{N}(e_1, I)$.
$V = \{v_1, v_2\}$. Fix $v_1$ (marked in the plot).
$v_2 \mapsto \hat{\lambda}_n(\{v_1, v_2\})$: contour plot.
Nearby locations: do not increase discriminability.
Non-convexity: reveals multiple ways to capture the difference.
[Figure: two contour plots of $v_2 \mapsto \hat{\lambda}^{tr}_{n/2}(v_1, v_2)$, with color scales running from 0 to 160 and from 128 to 192.]
Convergence of the λn estimator
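But $\lambda_n$ is unknown. Split $(X, Y)$ into $(X_{tr}, Y_{tr})$ and $(X_{te}, Y_{te})$.
1. Locations, kernel parameter: $\hat{\theta} = \arg\max_{\theta} \hat{\lambda}^{tr}_{n/2}(\theta)$.
2. Test statistic: $\hat{\lambda}^{te}_{n/2}(\hat{\theta})$.
Theorem (Guarantee on objective approximation, $\gamma_n \to 0$)
$$\sup_{V, \mathcal{K}} \left| z_n^\top (\Sigma_n + \gamma_n I)^{-1} z_n - \mu^\top \Sigma^{-1} \mu \right| = O\left(n^{-1/4}\right).$$
Examples:
$$\mathcal{K} = \left\{ k_\sigma(x, y) = e^{-\frac{\|x - y\|^2}{2\sigma^2}} : \sigma > 0 \right\}, \qquad \mathcal{K} = \left\{ k_A(x, y) = e^{-(x - y)^\top A (x - y)} : A \succ 0 \right\}.$$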
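The two-step procedure above can be sketched for a single test location ($J = 1$). This is an illustration under stated assumptions, not the authors' implementation: a coarse search over bandwidths and over candidate locations drawn from the training data stands in for the optimization of $\theta$, $z_i$ is assumed to be $k(x_i, v) - k(y_i, v)$ with a Gaussian kernel, and the threshold $T_\alpha$ is assumed to be the $(1 - \alpha)$-quantile of $\chi^2(1)$. The names `lambda_hat` and `split_and_test` are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def lambda_hat(X, Y, v, sigma, gamma=1e-5):
    """Statistic for one test location v: n * z_bar^2 / (s^2 + gamma)."""
    kx = np.exp(-((X - v) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
    ky = np.exp(-((Y - v) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
    z = kx - ky
    return float(len(z) * z.mean() ** 2 / (z.var(ddof=1) + gamma))


def split_and_test(X, Y, sigmas, alpha=0.01, n_candidates=20, seed=0):
    """Pick (v, sigma) on the training half, then test on the held-out half."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[0])
    tr, te = perm[: len(perm) // 2], perm[len(perm) // 2:]

    # Candidate locations: random points of the pooled training sample
    # (a stand-in for gradient ascent over V).
    pooled = np.vstack([X[tr], Y[tr]])
    candidates = pooled[rng.choice(len(pooled), size=n_candidates, replace=False)]

    # Step 1: maximize the training statistic over the candidate parameters.
    best_val, best_v, best_sigma = -np.inf, None, None
    for v in candidates:
        for sigma in sigmas:
            val = lambda_hat(X[tr], Y[tr], v, sigma)
            if val > best_val:
                best_val, best_v, best_sigma = val, v, sigma

    # Step 2: evaluate the statistic on the test half with the chosen parameters.
    stat = lambda_hat(X[te], Y[te], best_v, best_sigma)
    # (1 - alpha)-quantile of chi^2(1), i.e. the squared normal quantile.
    threshold = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    return stat, threshold, bool(stat > threshold)


rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 2))                               # P = N(0, I)
Y = rng.normal(size=(2000, 2)) + np.array([1.0, 0.0])        # Q = N(e_1, I)
stat, threshold, reject = split_and_test(X, Y, sigmas=[0.5, 1.0, 2.0])
print(stat, threshold, reject)
```

Choosing the parameters on one half and testing on the other keeps the parameter choice independent of the data used to compute the final statistic.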
Proof idea
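Lower bound on the test power:
$|\hat{\lambda}_n - \lambda_n| \lesssim \|z_n - \mu\|_2 + \|\Sigma_n - \Sigma\|_F$.
Bound the r.h.s. by Hoeffding's inequality $\Rightarrow$ bound on $P(|\hat{\lambda}_n - \lambda_n| \ge t)$.
By reparameterization: bound on $P(\hat{\lambda}_n \ge T_\alpha)$.
Uniformly $\hat{\lambda}_n \approx \lambda_n$:
Reduction to bounding $\sup_{V, \mathcal{K}} \|z_n - \mu\|_2$, $\sup_{V, \mathcal{K}} \|\Sigma_n - \Sigma\|_F$.
Empirical processes, Dudley entropy bound.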
Numerical demos
Parameter settings
Gaussian kernel ($\sigma$). $\alpha = 0.01$. $J = 1$. Repeat 500 trials. Report
$$P(\text{reject } H_0) \approx \frac{\#\text{times } \hat{\lambda}_n > T_\alpha \text{ holds}}{\#\text{trials}}.$$
Compare 4 methods:
ME-full: Optimize $V$ and Gaussian bandwidth $\sigma$.
ME-grid: Optimize $\sigma$. Random $V$ [Chwialkowski et al., 2015].
MMD-quad: Test with quadratic-time MMD [Gretton et al., 2012].
MMD-lin: Test with linear-time MMD [Gretton et al., 2012].
Kernels are optimized for power in MMD-lin, MMD-quad.
NLP: discrimination of document categories
5903 NIPS papers (1988-2015).
Keyword-based category assignment into 4 groups:
Bayesian inference, Deep learning, Learning theory, Neuroscience
$d = 2000$ nouns. TF-IDF representation.

| Problem | $n^{te}$ | ME-full | ME-grid | MMD-quad | MMD-lin |
|---|---|---|---|---|---|
| 1. Bayes-Bayes | 215 | .012 | .018 | .022 | .008 |
| 2. Bayes-Deep | 216 | .954 | .034 | .906 | .262 |
| 3. Bayes-Learn | 138 | .990 | .774 | 1.00 | .238 |
| 4. Bayes-Neuro | 394 | 1.00 | .300 | .952 | .972 |
| 5. Learn-Deep | 149 | .956 | .052 | .876 | .500 |
| 6. Learn-Neuro | 146 | .960 | .572 | 1.00 | .538 |

Performance of ME-full [$O(n)$] is comparable to MMD-quad [$O(n^2)$].
NLP: most/least discriminative words
Aggregating over trials; example: 'Bayes-Neuro'.
Most discriminative words:
spike, markov, cortex, dropout, recurr, iii, gibb.
Learned test locations: highly interpretable,
'markov', 'gibb' ($\Leftarrow$ Gibbs): Bayesian inference,
'spike', 'cortex': key terms in neuroscience.
Least discriminative ones:
circumfer, bra, dominiqu, rhino, mitra, kid, impostor.
Distinguish positive/negative emotions
Karolinska Directed Emotional Faces (KDEF) [Lundqvist et al., 1998].
70 actors = 35 females and 35 males.
$d = 48 \times 34 = 1632$. Grayscale. Pixel features.
$+$: happy, neutral, surprised
$-$: afraid, angry, disgusted

| Problem | $n^{te}$ | ME-full | ME-grid | MMD-quad | MMD-lin |
|---|---|---|---|---|---|
| $\pm$ vs. $\pm$ | 201 | .010 | .012 | .018 | .008 |
| $+$ vs. $-$ | 201 | .998 | .656 | 1.00 | .578 |

[Figure: learned test location (averaged).]
Summary
We proposed a nonparametric two-sample test:
linear time,
adaptive $\to$ high power ($\approx$ 'MMD-quad'),
2 demos: discriminating
documents of different categories,
positive/negative emotions.
Extension (independence testing):
https://arxiv.org/abs/1610.04782
https://github.com/wittawatj/fsic-test
Thank you for the attention!
Acknowledgements: This work was supported by the Gatsby Charitable Foundation.
Contents
Characteristic functions, infinite J.
Number of locations (J).
MMD: IPM representation.
Estimation of $\mathrm{MMD}^2$.
Computational complexity: $(J, n, d)$-dependence.
Characteristic functions, infinite J
Characteristic functions – poor choice:
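$$\rho_2(P, Q) := \sqrt{\frac{1}{J} \sum_{j=1}^{J} \left[\phi_P(v_j) - \phi_Q(v_j)\right]^2}.$$
[Moulines et al., 2007]:
$$\rho_3(P, Q) := \frac{n_x n_y}{n} \left\| C^{-\frac{1}{2}} (\mu_Q - \mu_P) \right\|_{\mathcal{H}_k},$$
$C = \frac{n_x}{n_x + n_y} C_{xx} + \frac{n_y}{n_x + n_y} C_{yy}$: pooled covariance operator.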
Computational cost: high (cubic).
Smoothed characteristic functions
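$$\psi_P(t) = \int_{\mathbb{R}^d} \phi_P(\omega)\, \ell(t - \omega)\, d\omega, \qquad t \in \mathbb{R}^d,$$
$$\rho_4(P, Q) := \sqrt{\frac{1}{J} \sum_{j=1}^{J} \left[\psi_P(v_j) - \psi_Q(v_j)\right]^2}.$$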
It
works,
is more sensitive to differences in the frequency domain.
Number of locations (J)
Small $J$:
often enough to detect the difference of $P$ & $Q$.
few distinguishing regions to reject $H_0$.
faster test.
Very large $J$:
test power need not increase monotonically in $J$ (more locations $\Rightarrow$ statistic can gain in variance).
defeats the purpose of a linear-time test.
MMD: IPM representation
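$$\mathrm{MMD}^2(P, Q) = \|\mu_P - \mu_Q\|^2_{\mathcal{H}(k)} = \left[\sup_{\|f\|_{\mathcal{H}(k)} \le 1} \langle \mu_P - \mu_Q, f \rangle_{\mathcal{H}(k)}\right]^2 \overset{(*)}{=} \left[\sup_{\|f\|_{\mathcal{H}(k)} \le 1} \mathbb{E}_{x \sim P} f(x) - \mathbb{E}_{y \sim Q} f(y)\right]^2.$$
$(*)$ in detail:
$$\langle \mu_P, f \rangle_{\mathcal{H}(k)} = \left\langle \int k(\cdot, x)\, dP(x), f \right\rangle_{\mathcal{H}(k)} = \int \underbrace{\langle k(\cdot, x), f \rangle_{\mathcal{H}(k)}}_{= f(x)}\, dP(x) = \mathbb{E}_{x \sim P} f(x).$$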
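Estimation of $\mathrm{MMD}^2$
Squared difference between feature means:
$$\begin{aligned}
\mathrm{MMD}^2(P, Q) &= \|\mu_P - \mu_Q\|^2_{\mathcal{H}} = \langle \mu_P - \mu_Q, \mu_P - \mu_Q \rangle_{\mathcal{H}} \\
&= \langle \mu_P, \mu_P \rangle_{\mathcal{H}} + \langle \mu_Q, \mu_Q \rangle_{\mathcal{H}} - 2 \langle \mu_P, \mu_Q \rangle_{\mathcal{H}} \\
&= \mathbb{E}_{P,P}\, k(x, x') + \mathbb{E}_{Q,Q}\, k(y, y') - 2\, \mathbb{E}_{P,Q}\, k(x, y).
\end{aligned}$$
Unbiased empirical estimate for $\{x_i\}_{i=1}^{n} \sim P$, $\{y_j\}_{j=1}^{n} \sim Q$:
$$\widehat{\mathrm{MMD}^2}(P, Q) = \overline{K_{P,P}} + \overline{K_{Q,Q}} - 2\, \overline{K_{P,Q}}.$$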
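A minimal NumPy sketch of this estimator follows. The slide does not spell out the averages, so the code assumes that $\overline{K_{P,P}}$ and $\overline{K_{Q,Q}}$ average the kernel over pairs with distinct indices and that $\overline{K_{P,Q}}$ averages over all cross pairs; the Gaussian kernel and the names `gram` and `mmd2_unbiased` are likewise illustrative choices.

```python
import numpy as np


def gram(A, B, sigma):
    """Gaussian Gram matrix with entries exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def mmd2_unbiased(X, Y, sigma=1.0):
    """Quadratic-time estimate K_PP + K_QQ - 2 K_PQ of MMD^2(P, Q)."""
    n, m = X.shape[0], Y.shape[0]
    Kxx, Kyy, Kxy = gram(X, X, sigma), gram(Y, Y, sigma), gram(X, Y, sigma)
    # Within-sample terms exclude the diagonal (i == j) to avoid bias.
    k_pp = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    k_qq = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    # Cross term: plain average over all (i, j) pairs.
    k_pq = Kxy.mean()
    return float(k_pp + k_qq - 2.0 * k_pq)


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                               # sample from P = N(0, I)
Y_same = rng.normal(size=(500, 2))                          # second sample from P
Y_shift = rng.normal(size=(500, 2)) + np.array([1.0, 0.0])  # sample from Q = N(e_1, I)

mmd_same = mmd2_unbiased(X, Y_same)
mmd_shift = mmd2_unbiased(X, Y_shift)
print(mmd_same, mmd_shift)
```

Because the diagonal is removed, the estimate can be slightly negative when $P = Q$. Building the three Gram matrices is what makes this estimator quadratic in $n$.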
Computational complexity
Optimization & testing: linear in n.
Testing: $O(ndJ + nJ^2 + J^3)$.
Optimization: $O(ndJ^2 + J^3)$ per gradient ascent step.
Chwialkowski, K., Ramdas, A., Sejdinovic, D., and Gretton, A. (2015). Fast Two-Sample Testing with Analytic Representations of Probability Measures. In Neural Information Processing Systems (NIPS), pages 1981–1989.
Gretton, A., Borgwardt, K., Rasch, M., Scholkopf, B., and Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13:723–773.
Jitkrittum, W., Szabo, Z., Chwialkowski, K., and Gretton, A. (2016). Interpretable distribution features with maximum testing power. In Neural Information Processing Systems (NIPS).
Lundqvist, D., Flykt, A., and Ohman, A. (1998). The Karolinska directed emotional faces-KDEF. Technical report, ISBN 91-630-7164-9.
Moulines, E., Bach, F. R., and Harchaoui, Z. (2007). Testing for homogeneity with kernel Fisher discriminant analysis. In Neural Information Processing Systems (NIPS), pages 609–616.