jacob steinhardt with banghua zhu and jiantao jiao...the difficulty simple example: mean...
TRANSCRIPT
![Page 1: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/1.jpg)
Generalized Resilience and Robust Statistics
Jacob Steinhardtwith Banghua Zhu and Jiantao Jiao
UC Berkeley
August 8, 2019
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 1 / 19
![Page 2: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/2.jpg)
Motivation
Would like to design robust estimators:
Process error
Measurement error
Outliers
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 2 / 19
![Page 3: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/3.jpg)
Motivation
Would like to design robust estimators:
Process error
Measurement error
Outliers
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 2 / 19
![Page 4: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/4.jpg)
Motivation
Would like to design robust estimators:
Process error
Measurement error
Outliers
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 2 / 19
![Page 5: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/5.jpg)
Motivation
Would like to design robust estimators:
Process error
Measurement error
Outliers
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 2 / 19
![Page 6: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/6.jpg)
The Difficulty
Simple example: mean estimation.
Estimate mean of distribution in Rd with ε fraction of outliers.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 3 / 19
![Page 7: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/7.jpg)
The Difficulty
Simple example: mean estimation.
Estimate mean of distribution in Rd with ε fraction of outliers.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 3 / 19
![Page 8: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/8.jpg)
The Difficulty
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 3 / 19
![Page 9: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/9.jpg)
The Difficulty
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 3 / 19
![Page 10: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/10.jpg)
Context and Overview
Recent work designs outlier-robust estimators in many settings:
mean estimation [DKKLMS16/17, LRV16, CSV17, SCV18, ...]
regression [KK18, PSBR18, DKKLSS18]
classification [KLS09, ABL14, DKS17], etc.
Will generalize and extend the insights:
general treatment of population limit in presence of outliers
new finite-sample analysis based on generalized KS distance
robustness to Wasserstein corruptions based on “friendly perturbations”
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 4 / 19
![Page 11: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/11.jpg)
Setting
p
p̃p∗true distribution
D(p∗,p)≤ ε
∈
Gdistributional assumptions
corrupted distribution
population distribution
X1, . . . ,Xn
samples
θ̂(X1, . . . ,Xn)
estimated parameters
L(p∗, θ̂)
cost
Three goals:
large Gpopulation limit
statistical rate
Example D = Wc : cost c(x ,y) to move x to y , average cost ≤ ε .
c(x ,y) = I[x 6= y ]: TV distance (outliers)
c(x ,y) = ‖x− y‖2: earthmover distance (measurement error)
c(x ,y) = ‖x− y‖0: corrupted entries
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 5 / 19
![Page 12: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/12.jpg)
Setting
p
p̃p∗true distribution
D(p∗,p)≤ ε
∈
Gdistributional assumptions
corrupted distribution
population distribution
X1, . . . ,Xn
samples
θ̂(X1, . . . ,Xn)
estimated parameters
L(p∗, θ̂)
cost
Three goals:
large Gpopulation limit
statistical rate
Example D = Wc : cost c(x ,y) to move x to y , average cost ≤ ε .
c(x ,y) = I[x 6= y ]: TV distance (outliers)
c(x ,y) = ‖x− y‖2: earthmover distance (measurement error)
c(x ,y) = ‖x− y‖0: corrupted entries
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 5 / 19
![Page 13: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/13.jpg)
Setting
p
p̃p∗true distribution
D(p∗,p)≤ ε
∈
Gdistributional assumptions
corrupted distribution
population distribution
X1, . . . ,Xn
samples
θ̂(X1, . . . ,Xn)
estimated parameters
L(p∗, θ̂)
cost
Three goals:
large Gpopulation limit
statistical rate
Example D = Wc : cost c(x ,y) to move x to y , average cost ≤ ε .
c(x ,y) = I[x 6= y ]: TV distance (outliers)
c(x ,y) = ‖x− y‖2: earthmover distance (measurement error)
c(x ,y) = ‖x− y‖0: corrupted entries
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 5 / 19
![Page 14: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/14.jpg)
Setting
p
p̃p∗true distribution
D(p∗,p)≤ ε
∈
Gdistributional assumptions
corrupted distribution
population distribution
X1, . . . ,Xn
samples
θ̂(X1, . . . ,Xn)
estimated parameters
L(p∗, θ̂)
cost
Three goals:
large Gpopulation limit
statistical rate
Example D = Wc : cost c(x ,y) to move x to y , average cost ≤ ε .
c(x ,y) = I[x 6= y ]: TV distance (outliers)
c(x ,y) = ‖x− y‖2: earthmover distance (measurement error)
c(x ,y) = ‖x− y‖0: corrupted entries
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 5 / 19
![Page 15: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/15.jpg)
Setting
p
p̃p∗true distribution
D(p∗,p)≤ ε
∈
Gdistributional assumptions
corrupted distribution
population distribution
X1, . . . ,Xn
samples
θ̂(X1, . . . ,Xn)
estimated parameters
L(p∗, θ̂)
cost
Three goals:
large Gpopulation limit
statistical rate
Example D = Wc : cost c(x ,y) to move x to y , average cost ≤ ε .
c(x ,y) = I[x 6= y ]: TV distance (outliers)
c(x ,y) = ‖x− y‖2: earthmover distance (measurement error)
c(x ,y) = ‖x− y‖0: corrupted entries
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 5 / 19
![Page 16: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/16.jpg)
Warm-up: TV, mean estimation
Warm-up problem: D = TV , L(p,θ) = ‖µ(p)−θ‖, where µ(p) = Ex∼p[x].
Mean estimation with outliers.
Key lemma: projection estimator. First observed by Donoho and Liu (1988).
Lemma
Suppose p∗ ∈ G, and define θ̂(p) = µ(q), where q = argminq∈G TV (p,q).
Then L(p∗, θ̂(p̃)) is upper-bounded by modu(G,2ε), where
modu(G,ε) := supp,p′∈G,TV(p,p′)≤ε
‖µ(p)−µ(p′)‖.
Proof: TV (p∗,q)≤ 2ε , and p∗,q both lie in G.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 6 / 19
![Page 17: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/17.jpg)
Warm-up: TV, mean estimation
Warm-up problem: D = TV , L(p,θ) = ‖µ(p)−θ‖, where µ(p) = Ex∼p[x].
Mean estimation with outliers.
Key lemma: projection estimator. First observed by Donoho and Liu (1988).
Lemma
Suppose p∗ ∈ G, and define θ̂(p) = µ(q), where q = argminq∈G TV (p,q).
Then L(p∗, θ̂(p̃)) is upper-bounded by modu(G,2ε), where
modu(G,ε) := supp,p′∈G,TV(p,p′)≤ε
‖µ(p)−µ(p′)‖.
Proof: TV (p∗,q)≤ 2ε , and p∗,q both lie in G.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 6 / 19
![Page 18: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/18.jpg)
Warm-up: TV, mean estimation
Warm-up problem: D = TV , L(p,θ) = ‖µ(p)−θ‖, where µ(p) = Ex∼p[x].
Mean estimation with outliers.
Key lemma: projection estimator. First observed by Donoho and Liu (1988).
Lemma
Suppose p∗ ∈ G, and define θ̂(p) = µ(q), where q = argminq∈G TV (p,q).
Then L(p∗, θ̂(p̃)) is upper-bounded by modu(G,2ε), where
modu(G,ε) := supp,p′∈G,TV(p,p′)≤ε
‖µ(p)−µ(p′)‖.
Proof: TV (p∗,q)≤ 2ε , and p∗,q both lie in G.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 6 / 19
![Page 19: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/19.jpg)
Modulus: Examples
modu(G,ε) := supp,p′∈G,TV(p,p′)≤ε
‖µ(p)−µ(p′)‖.
Example: Gaussians. G = {N (µ, I) | µ ∈ Rd}.TV(N (µ, I),N (µ ′, I))≈ ‖µ−µ ′‖2.
Hence modu(G,ε)≈ ε .
Generalization: G = sub-Gaussians (parameter σ ).
Can show that modu(G,ε) =O(σε√
log(1/ε)).
Key lemma: thin tails =⇒ ε-perturbation can’t change mean much.
General property: resilience.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 7 / 19
![Page 20: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/20.jpg)
Modulus: Examples
modu(G,ε) := supp,p′∈G,TV(p,p′)≤ε
‖µ(p)−µ(p′)‖.
Example: Gaussians. G = {N (µ, I) | µ ∈ Rd}.TV(N (µ, I),N (µ ′, I))≈ ‖µ−µ ′‖2.
Hence modu(G,ε)≈ ε .
Generalization: G = sub-Gaussians (parameter σ ).
Can show that modu(G,ε) =O(σε√
log(1/ε)).
Key lemma: thin tails =⇒ ε-perturbation can’t change mean much.
General property: resilience.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 7 / 19
![Page 21: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/21.jpg)
Modulus: Examples
modu(G,ε) := supp,p′∈G,TV(p,p′)≤ε
‖µ(p)−µ(p′)‖.
Example: Gaussians. G = {N (µ, I) | µ ∈ Rd}.TV(N (µ, I),N (µ ′, I))≈ ‖µ−µ ′‖2.
Hence modu(G,ε)≈ ε .
Generalization: G = sub-Gaussians (parameter σ ).
Can show that modu(G,ε) =O(σε√
log(1/ε)).
Key lemma: thin tails =⇒ ε-perturbation can’t change mean much.
General property: resilience.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 7 / 19
![Page 22: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/22.jpg)
Modulus: Examples
modu(G,ε) := supp,p′∈G,TV(p,p′)≤ε
‖µ(p)−µ(p′)‖.
Example: Gaussians. G = {N (µ, I) | µ ∈ Rd}.TV(N (µ, I),N (µ ′, I))≈ ‖µ−µ ′‖2.
Hence modu(G,ε)≈ ε .
Generalization: G = sub-Gaussians (parameter σ ).
Can show that modu(G,ε) =O(σε√
log(1/ε)).
Key lemma: thin tails =⇒ ε-perturbation can’t change mean much.
General property: resilience.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 7 / 19
![Page 23: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/23.jpg)
Resilience
Definition (Resilience)
A distribution p is (ρ,ε)-resilient if ‖µ(p)−µ(r)‖ ≤ ρ whenever r ≤ p1−ε
.
(The condition r ≤ p1−ε
means that r is an ε-deletion of p.)
Lemma (Resilience =⇒ bounded modulus)
Let G(ρ,ε) = {p | p is (ρ,ε)-resilient}. Then modu(G(ρ,ε),ε)≤ 2ρ .
µp µp′µr
Modulus lemma yields optimal bound in mostknown cases!
Sub-Gaussian: ρ =O(ε√
log(1/ε))
Bounded k th moments: ρ =O(ε1−1/k )
Proof: Let p,p′ ∈ G(ρ,ε). Define midpoint r = min(p,p′)1−TV(p,p′) . Then r ≤ p
1−ε, p′
1−ε.
Thus ‖µ(p)−µ(p′)‖ ≤ ‖µ(p)−µ(r)‖+‖µ(p′)−µ(r)‖ ≤ 2ρ .
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 8 / 19
![Page 24: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/24.jpg)
Resilience
Definition (Resilience)
A distribution p is (ρ,ε)-resilient if ‖µ(p)−µ(r)‖ ≤ ρ whenever r ≤ p1−ε
.
(The condition r ≤ p1−ε
means that r is an ε-deletion of p.)
Lemma (Resilience =⇒ bounded modulus)
Let G(ρ,ε) = {p | p is (ρ,ε)-resilient}. Then modu(G(ρ,ε),ε)≤ 2ρ .
µp µp′µr
Modulus lemma yields optimal bound in mostknown cases!
Sub-Gaussian: ρ =O(ε√
log(1/ε))
Bounded k th moments: ρ =O(ε1−1/k )
Proof: Let p,p′ ∈ G(ρ,ε). Define midpoint r = min(p,p′)1−TV(p,p′) . Then r ≤ p
1−ε, p′
1−ε.
Thus ‖µ(p)−µ(p′)‖ ≤ ‖µ(p)−µ(r)‖+‖µ(p′)−µ(r)‖ ≤ 2ρ .
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 8 / 19
![Page 25: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/25.jpg)
Resilience
Definition (Resilience)
A distribution p is (ρ,ε)-resilient if ‖µ(p)−µ(r)‖ ≤ ρ whenever r ≤ p1−ε
.
(The condition r ≤ p1−ε
means that r is an ε-deletion of p.)
Lemma (Resilience =⇒ bounded modulus)
Let G(ρ,ε) = {p | p is (ρ,ε)-resilient}. Then modu(G(ρ,ε),ε)≤ 2ρ .
µp µp′µr
Modulus lemma yields optimal bound in mostknown cases!
Sub-Gaussian: ρ =O(ε√
log(1/ε))
Bounded k th moments: ρ =O(ε1−1/k )
Proof: Let p,p′ ∈ G(ρ,ε).
Define midpoint r = min(p,p′)1−TV(p,p′) . Then r ≤ p
1−ε, p′
1−ε.
Thus ‖µ(p)−µ(p′)‖ ≤ ‖µ(p)−µ(r)‖+‖µ(p′)−µ(r)‖ ≤ 2ρ .
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 8 / 19
![Page 26: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/26.jpg)
Resilience
Definition (Resilience)
A distribution p is (ρ,ε)-resilient if ‖µ(p)−µ(r)‖ ≤ ρ whenever r ≤ p1−ε
.
(The condition r ≤ p1−ε
means that r is an ε-deletion of p.)
Lemma (Resilience =⇒ bounded modulus)
Let G(ρ,ε) = {p | p is (ρ,ε)-resilient}. Then modu(G(ρ,ε),ε)≤ 2ρ .
µp µp′µr
Modulus lemma yields optimal bound in mostknown cases!
Sub-Gaussian: ρ =O(ε√
log(1/ε))
Bounded k th moments: ρ =O(ε1−1/k )
Proof: Let p,p′ ∈ G(ρ,ε). Define midpoint r = min(p,p′)1−TV(p,p′) . Then r ≤ p
1−ε, p′
1−ε.
Thus ‖µ(p)−µ(p′)‖ ≤ ‖µ(p)−µ(r)‖+‖µ(p′)−µ(r)‖ ≤ 2ρ .
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 8 / 19
![Page 27: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/27.jpg)
Resilience
Definition (Resilience)
A distribution p is (ρ,ε)-resilient if ‖µ(p)−µ(r)‖ ≤ ρ whenever r ≤ p1−ε
.
(The condition r ≤ p1−ε
means that r is an ε-deletion of p.)
Lemma (Resilience =⇒ bounded modulus)
Let G(ρ,ε) = {p | p is (ρ,ε)-resilient}. Then modu(G(ρ,ε),ε)≤ 2ρ .
µp µp′µr
Modulus lemma yields optimal bound in mostknown cases!
Sub-Gaussian: ρ =O(ε√
log(1/ε))
Bounded k th moments: ρ =O(ε1−1/k )
Proof: Let p,p′ ∈ G(ρ,ε). Define midpoint r = min(p,p′)1−TV(p,p′) . Then r ≤ p
1−ε, p′
1−ε.
Thus ‖µ(p)−µ(p′)‖ ≤ ‖µ(p)−µ(r)‖+‖µ(p′)−µ(r)‖ ≤ 2ρ .
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 8 / 19
![Page 28: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/28.jpg)
Resilience
Definition (Resilience)
A distribution p is (ρ,ε)-resilient if ‖µ(p)−µ(r)‖ ≤ ρ whenever r ≤ p1−ε
.
(The condition r ≤ p1−ε
means that r is an ε-deletion of p.)
Lemma (Resilience =⇒ bounded modulus)
Let G(ρ,ε) = {p | p is (ρ,ε)-resilient}. Then modu(G(ρ,ε),ε)≤ 2ρ .
µp µp′µr
Modulus lemma yields optimal bound in mostknown cases!
Sub-Gaussian: ρ =O(ε√
log(1/ε))
Bounded k th moments: ρ =O(ε1−1/k )
Proof: Let p,p′ ∈ G(ρ,ε). Define midpoint r = min(p,p′)1−TV(p,p′) . Then r ≤ p
1−ε, p′
1−ε.
Thus ‖µ(p)−µ(p′)‖ ≤ ‖µ(p)−µ(r)‖+‖µ(p′)−µ(r)‖ ≤ 2ρ .
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 8 / 19
![Page 29: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/29.jpg)
Finite-sample estimation
Resilience characterizes error when n = ∞, what about finite samples?
Projection algorithm: take argminq∈G TV(p̃,q).
Problem: if p̃ is discrete and q is continuous, TV(p̃,q) = 1!
Solution: relax the distance!
T̃VH(p,q) = supt∈R,h∈H
|p(h(X)≥ t)−q(h(X)≥ t)|.
Lemmas:
Modulus is still bounded if we replace TV with T̃VH, whereH= {x 7→ 〈v ,x〉 | v ∈ Rd}.T̃VH(p, p̂n) =O(
√vc(H)/n) [Devroye and Lugosi]
Upshot: projection still works, but use T̃VH instead of TV.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 9 / 19
![Page 30: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/30.jpg)
Finite-sample estimation
Resilience characterizes error when n = ∞, what about finite samples?
Projection algorithm: take argminq∈G TV(p̃,q).
Problem: if p̃ is discrete and q is continuous, TV(p̃,q) = 1!
Solution: relax the distance!
T̃VH(p,q) = supt∈R,h∈H
|p(h(X)≥ t)−q(h(X)≥ t)|.
Lemmas:
Modulus is still bounded if we replace TV with T̃VH, whereH= {x 7→ 〈v ,x〉 | v ∈ Rd}.T̃VH(p, p̂n) =O(
√vc(H)/n) [Devroye and Lugosi]
Upshot: projection still works, but use T̃VH instead of TV.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 9 / 19
![Page 31: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/31.jpg)
Finite-sample estimation
Resilience characterizes error when n = ∞, what about finite samples?
Projection algorithm: take argminq∈G TV(p̃,q).
Problem: if p̃ is discrete and q is continuous, TV(p̃,q) = 1!
Solution: relax the distance!
T̃VH(p,q) = supt∈R,h∈H
|p(h(X)≥ t)−q(h(X)≥ t)|.
Lemmas:
Modulus is still bounded if we replace TV with T̃VH, whereH= {x 7→ 〈v ,x〉 | v ∈ Rd}.T̃VH(p, p̂n) =O(
√vc(H)/n) [Devroye and Lugosi]
Upshot: projection still works, but use T̃VH instead of TV.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 9 / 19
![Page 32: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/32.jpg)
General TV case
Focused so far on mean estimation. Now generalize to arbitrary loss.
Stick with D = TV, but replace ‖µ(p)−θ‖ with arbitrary L(p,θ).
Modulus still gives bound:
Lemma
Suppose p∗ ∈ G, and define θ̂(p) = θ ∗(q), where q = argminq∈G TV (p,q).
Then L(p∗, θ̂(p̃)) is upper-bounded by modu(G,2ε), where
modu(G,ε) := supp,p′∈G,TV(p,p′)≤ε
L(p,θ ∗(p′)).
Can we generalize resilience to this setting?
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 10 / 19
![Page 33: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/33.jpg)
General TV case
Focused so far on mean estimation. Now generalize to arbitrary loss.
Stick with D = TV, but replace ‖µ(p)−θ‖ with arbitrary L(p,θ).
Modulus still gives bound:
Lemma
Suppose p∗ ∈ G, and define θ̂(p) = θ ∗(q), where q = argminq∈G TV (p,q).
Then L(p∗, θ̂(p̃)) is upper-bounded by modu(G,2ε), where
modu(G,ε) := supp,p′∈G,TV(p,p′)≤ε
L(p,θ ∗(p′)).
Can we generalize resilience to this setting?
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 10 / 19
![Page 34: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/34.jpg)
Resilience: Arbitrary lossRecall before: p is resilient if ‖µ(p)−µ(r)‖ small whenever r ≤ p
1−ε.
Now two conditions: G↓, G↑.
G↓(ρ1,ε) = {p | L(r ,θ ∗(p))≤ ρ1 whenever r ≤ p1−ε},
G↑(ρ1,ρ2,ε) = {p | L(p,θ)≤ ρ2 whenever L(r ,θ)≤ ρ1 and r ≤ p1−ε}.
Lemma (Resilience =⇒ small modulus)
Let G = G↓(ρ1,ε)∩G↑(ρ1,ρ2,ε). Then modu(G,ε)≤ ρ2.
Proof: p p′D(p,p′)≤ ε
r = min(p,p′)1−TV(p,p′)
r ≤ p1−ε r ≤ p′
1−ε
p′ ∈ G↓ B(r ,θ ∗(p′))≤ ρ1 L(p,θ ∗(p′))≤ ρ2p ∈ G↑
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 11 / 19
![Page 35: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/35.jpg)
Resilience: Arbitrary lossRecall before: p is resilient if ‖µ(p)−µ(r)‖ small whenever r ≤ p
1−ε.
Now two conditions: G↓, G↑.
G↓(ρ1,ε) = {p | L(r ,θ ∗(p))≤ ρ1 whenever r ≤ p1−ε},
G↑(ρ1,ρ2,ε) = {p | L(p,θ)≤ ρ2 whenever L(r ,θ)≤ ρ1 and r ≤ p1−ε}.
Lemma (Resilience =⇒ small modulus)
Let G = G↓(ρ1,ε)∩G↑(ρ1,ρ2,ε). Then modu(G,ε)≤ ρ2.
Proof: p p′D(p,p′)≤ ε
r = min(p,p′)1−TV(p,p′)
r ≤ p1−ε r ≤ p′
1−ε
p′ ∈ G↓ B(r ,θ ∗(p′))≤ ρ1 L(p,θ ∗(p′))≤ ρ2p ∈ G↑
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 11 / 19
![Page 36: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/36.jpg)
Resilience: Arbitrary lossRecall before: p is resilient if ‖µ(p)−µ(r)‖ small whenever r ≤ p
1−ε.
Now two conditions: G↓, G↑.
G↓(ρ1,ε) = {p | L(r ,θ ∗(p))≤ ρ1 whenever r ≤ p1−ε},
G↑(ρ1,ρ2,ε) = {p | L(p,θ)≤ ρ2 whenever L(r ,θ)≤ ρ1 and r ≤ p1−ε}.
Lemma (Resilience =⇒ small modulus)
Let G = G↓(ρ1,ε)∩G↑(ρ1,ρ2,ε). Then modu(G,ε)≤ ρ2.
Proof: p p′D(p,p′)≤ ε
r = min(p,p′)1−TV(p,p′)
r ≤ p1−ε r ≤ p′
1−ε
p′ ∈ G↓ B(r ,θ ∗(p′))≤ ρ1 L(p,θ ∗(p′))≤ ρ2p ∈ G↑
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 11 / 19
![Page 37: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/37.jpg)
Example: Linear regression
Linear regression: L(p,θ) = E(x ,y)∼p[(y−θ>x)2]−E(x ,y)∼p[(y− (θ ∗)>x)2].
Proposition (Sufficient conditions for linear regression)
Let Z = Y − (θ ∗)>X be the regression error under the true parameters θ ∗.Suppose that
E[Z 2k ]≤ 1 and E[(v>X)2k ]≤ τ2kE[(v>X)2]k ∀v ∈ Rd .
Then p∗ is resilient with ρ2 =O(τ2ε2−2/k ).
Comparisons:
Delete points to minimize regression error (Klivans-Kothari-Mekha 2018):suboptimal error ε1−1/k
Diakonikolas-Kong-Stewart (2019) delete points to enforce momentcondition: requires isotropy + 4th moments similar to Gaussian
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 12 / 19
![Page 38: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/38.jpg)
Example: Linear regression
Linear regression: L(p,θ) = E(x ,y)∼p[(y−θ>x)2]−E(x ,y)∼p[(y− (θ ∗)>x)2].
Proposition (Sufficient conditions for linear regression)
Let Z = Y − (θ ∗)>X be the regression error under the true parameters θ ∗.Suppose that
E[Z 2k ]≤ 1 and E[(v>X)2k ]≤ τ2kE[(v>X)2]k ∀v ∈ Rd .
Then p∗ is resilient with ρ2 =O(τ2ε2−2/k ).
Comparisons:
Delete points to minimize regression error (Klivans-Kothari-Mekha 2018):suboptimal error ε1−1/k
Diakonikolas-Kong-Stewart (2019) delete points to enforce momentcondition: requires isotropy + 4th moments similar to Gaussian
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 12 / 19
![Page 39: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/39.jpg)
Example: Covariance estimation
Given distribution with mean µp and covariance Σp.Goal: output µ , Σ such that
‖I−Σ−1/2p ΣΣ
−1/2p ‖2 and ‖Σ−1/2
p (µp−µ)‖2
are both small.
Proposition (Sufficient condition for covariance estimation)
Suppose that E[(v>Σ−1/2p (X −µp))2k ]≤ σ2k‖v‖2k
2 ∀v ∈ Rd . Then we canoutput Σ, µ such that
‖I−Σ−1/2p ΣΣ
−1/2p ‖2 ≤O(σε
1−1/k ) and (1)
‖Σ−1/2p (µp−µ)‖2 ≤O(σε
1−1/2k ). (2)
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 13 / 19
![Page 40: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/40.jpg)
Example: Covariance estimation
Given distribution with mean µp and covariance Σp.Goal: output µ , Σ such that
‖I−Σ−1/2p ΣΣ
−1/2p ‖2 and ‖Σ−1/2
p (µp−µ)‖2
are both small.
Proposition (Sufficient condition for covariance estimation)
Suppose that E[(v>Σ−1/2p (X −µp))2k ]≤ σ2k‖v‖2k
2 ∀v ∈ Rd . Then we canoutput Σ, µ such that
‖I−Σ−1/2p ΣΣ
−1/2p ‖2 ≤O(σε
1−1/k ) and (1)
‖Σ−1/2p (µp−µ)‖2 ≤O(σε
1−1/2k ). (2)
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 13 / 19
![Page 41: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/41.jpg)
Extension to other perturbations (Wc)
Recap: modulus determines robustness, resilience is sufficient condition forrobustness in TV case.
Next extend results from TV to other Wc (transportation) distances.
Recall Wc(p,q) is cost to “move” p to q if moving x → y costs c(x ,y).
Formally: Wc(p,q) = minπ{Eπ [c(x ,y)] | π(x) = p(x),π(y) = q(y)}.
Key midpoint property of resilience: if TV(p,q)≤ ε , there exists midpoint rsuch that r ≤ p
1−εand r ≤ q
1−ε.
How to generalize to Wc?
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 14 / 19
![Page 42: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/42.jpg)
Extension to other perturbations (Wc)
Recap: modulus determines robustness, resilience is sufficient condition forrobustness in TV case.
Next extend results from TV to other Wc (transportation) distances.
Recall Wc(p,q) is cost to “move” p to q if moving x → y costs c(x ,y).
Formally: Wc(p,q) = minπ{Eπ [c(x ,y)] | π(x) = p(x),π(y) = q(y)}.
Key midpoint property of resilience: if TV(p,q)≤ ε , there exists midpoint rsuch that r ≤ p
1−εand r ≤ q
1−ε.
How to generalize to Wc?
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 14 / 19
![Page 43: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/43.jpg)
Extension to other perturbations (Wc)
Recap: modulus determines robustness, resilience is sufficient condition forrobustness in TV case.
Next extend results from TV to other Wc (transportation) distances.
Recall Wc(p,q) is cost to “move” p to q if moving x → y costs c(x ,y).
Formally: Wc(p,q) = minπ{Eπ [c(x ,y)] | π(x) = p(x),π(y) = q(y)}.
Key midpoint property of resilience: if TV(p,q)≤ ε , there exists midpoint rsuch that r ≤ p
1−εand r ≤ q
1−ε.
How to generalize to Wc?
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 14 / 19
![Page 44: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/44.jpg)
Friendly perturbations
Consider one-dimensional case:
µpµp µr
Delete ε-mass: µp→ µr .
Alternative: move ε-mass towards µr .
Doesn’t reference deletion, defined for any Wc !
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 15 / 19
![Page 45: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/45.jpg)
Friendly perturbations
Consider one-dimensional case:
µpµp µr
Delete ε-mass: µp→ µr .
Alternative: move ε-mass towards µr .
Doesn’t reference deletion, defined for any Wc !
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 15 / 19
![Page 46: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/46.jpg)
Friendly perturbations
Consider one-dimensional case:
µpµp µr
Delete ε-mass: µp→ µr .
Alternative: move ε-mass towards µr .
Doesn’t reference deletion, defined for any Wc !
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 15 / 19
![Page 47: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/47.jpg)
Friendly perturbations
Consider one-dimensional case:
µpµp µr
Delete ε-mass: µp→ µr .
Alternative: move ε-mass towards µr .
Doesn’t reference deletion, defined for any Wc !
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 15 / 19
![Page 48: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/48.jpg)
Friendly perturbation: formal definition
Definition (Friendly perturbation)For a distribution p over X , fix a function f : X → R. A distribution r is anε-friendly perturbation of p if there is a coupling π between p and r such that:
The cost Eπ [c(x ,y)] is at most ε .
All points move towards the mean of r : f (y) is between f (x) and Er [f (x)]almost surely.
µp µrµr µp′
Midpoint lemma:Proof by picture
Lemma: if X has “nice topology”, any p and p′ with Wc(p,p′)≤ ε have anε-friendly midpoint.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 16 / 19
![Page 49: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/49.jpg)
Friendly perturbation: formal definition
Definition (Friendly perturbation)For a distribution p over X , fix a function f : X → R. A distribution r is anε-friendly perturbation of p if there is a coupling π between p and r such that:
The cost Eπ [c(x ,y)] is at most ε .
All points move towards the mean of r : f (y) is between f (x) and Er [f (x)]almost surely.
µp µrµr µp′
Midpoint lemma:Proof by picture
Lemma: if X has “nice topology”, any p and p′ with Wc(p,p′)≤ ε have anε-friendly midpoint.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 16 / 19
![Page 50: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/50.jpg)
Friendly perturbation: formal definition
Definition (Friendly perturbation)For a distribution p over X , fix a function f : X → R. A distribution r is anε-friendly perturbation of p if there is a coupling π between p and r such that:
The cost Eπ [c(x ,y)] is at most ε .
All points move towards the mean of r : f (y) is between f (x) and Er [f (x)]almost surely.
µp µrµr µp′
Midpoint lemma:Proof by picture
Lemma: if X has “nice topology”, any p and p′ with Wc(p,p′)≤ ε have anε-friendly midpoint.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 16 / 19
![Page 51: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/51.jpg)
Resilience for Wc
Definition (Resilience for fixed f )
For any distribution p, we say that p is (ρ,ε, f )-resilient if every ε-friendlyperturbation r of p has |Er [f ]−Ep[f ]| ≤ ρ .
How to extend from one-dimensional f to arbitrary loss L(p,θ)?
Answer: if L(p,θ) is convex in p, use Fenchel-Moreau theorem:
L(p,θ) = supf∈Fθ
Ep[f ]−L∗(f ,θ)
Then apply to each f in Fenchel-Moreau representation.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 17 / 19
![Page 52: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/52.jpg)
Resilience for Wc
Definition (Resilience for fixed f )
For any distribution p, we say that p is (ρ,ε, f )-resilient if every ε-friendlyperturbation r of p has |Er [f ]−Ep[f ]| ≤ ρ .
How to extend from one-dimensional f to arbitrary loss L(p,θ)?
Answer: if L(p,θ) is convex in p, use Fenchel-Moreau theorem:
L(p,θ) = supf∈Fθ
Ep[f ]−L∗(f ,θ)
Then apply to each f in Fenchel-Moreau representation.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 17 / 19
![Page 53: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/53.jpg)
Resilience for Wc
Definition (Resilience for fixed f )
For any distribution p, we say that p is (ρ,ε, f )-resilient if every ε-friendlyperturbation r of p has |Er [f ]−Ep[f ]| ≤ ρ .
How to extend from one-dimensional f to arbitrary loss L(p,θ)?
Answer: if L(p,θ) is convex in p, use Fenchel-Moreau theorem:
L(p,θ) = supf∈Fθ
Ep[f ]−L∗(f ,θ)
Then apply to each f in Fenchel-Moreau representation.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 17 / 19
![Page 54: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/54.jpg)
Example: Linear regression
Under roughly similar assumptions to TV case, get ε1−1/k error assumingbounded 2(k + 1) moments.
Error ε1−1/k likely suboptimal (should be ε2−2/k ).
k + 1 vs k in moment condition is typical behavior for W1 vs TV
Finite-sample analysis:
Can construct W̃H analogous to T̃VH.
However, construction more complex and doesn’t always work.
Can at least show W̃H(p, p̂n) =O((d/n)1/2 + (1/n)1/3) when p hasbounded 3rd moments.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 18 / 19
![Page 55: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/55.jpg)
Example: Linear regression
Under roughly similar assumptions to TV case, get ε1−1/k error assumingbounded 2(k + 1) moments.
Error ε1−1/k likely suboptimal (should be ε2−2/k ).
k + 1 vs k in moment condition is typical behavior for W1 vs TV
Finite-sample analysis:
Can construct W̃H analogous to T̃VH.
However, construction more complex and doesn’t always work.
Can at least show W̃H(p, p̂n) =O((d/n)1/2 + (1/n)1/3) when p hasbounded 3rd moments.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 18 / 19
![Page 56: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/56.jpg)
Example: Linear regression
Under roughly similar assumptions to TV case, get ε1−1/k error assumingbounded 2(k + 1) moments.
Error ε1−1/k likely suboptimal (should be ε2−2/k ).
k + 1 vs k in moment condition is typical behavior for W1 vs TV
Finite-sample analysis:
Can construct W̃H analogous to T̃VH.
However, construction more complex and doesn’t always work.
Can at least show W̃H(p, p̂n) =O((d/n)1/2 + (1/n)1/3) when p hasbounded 3rd moments.
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 18 / 19
![Page 57: Jacob Steinhardt with Banghua Zhu and Jiantao Jiao...The Difficulty Simple example: mean estimation. Estimate mean of distribution in Rd with e fraction of outliers. Zhu, Jiao, Steinhardt](https://reader033.vdocuments.site/reader033/viewer/2022041705/5e4421df39ce0938201691c8/html5/thumbnails/57.jpg)
Summary
Resilience criterion bounds population limit for TV perturbations.
T̃VH gives finite-sample analysis for projection algorithm.
Friendly perturbations allow us to generalize resilience toWc-perturbations.
Many open questions for Wc case!Better finite-sample analysis.Efficient algorithms.Beyond Wc?
Zhu, Jiao, Steinhardt (UC Berkeley) Generalized Resilience and Robust Statistics August 8, 2019 19 / 19