Robust Estimation in Parameter Learning
TRANSCRIPT
Simons Institute Bootcamp Tutorial, Part 2
Ankur Moitra (MIT)
CLASSIC PARAMETER LEARNING
Given samples from an unknown distribution in some class, e.g. a 1-D Gaussian N(μ, σ²), can we accurately estimate its parameters? Yes!
empirical mean: μ̂ = (1/N) Σᵢ Xᵢ    empirical variance: σ̂² = (1/N) Σᵢ (Xᵢ − μ̂)²
R. A. Fisher: the maximum likelihood estimator is asymptotically efficient (1910-1920)
J. W. Tukey: what about errors in the model itself? (1960)
ROBUST PARAMETER LEARNING
Given corrupted samples from a 1-D Gaussian:
observed model = ideal model + noise
Can we accurately estimate its parameters?
How do we constrain the noise? L1-norm of the noise at most O(ε).
Equivalently: arbitrarily corrupt an O(ε)-fraction of the samples (in expectation).
This generalizes Huber's Contamination Model: an adversary can add an ε-fraction of samples.
Outliers: points the adversary has corrupted. Inliers: points he hasn't.
In what norm do we want the parameters to be close?
Definition: The total variation distance between two distributions with pdfs f(x) and g(x) is
    d_TV(f, g) = (1/2) ∫ |f(x) − g(x)| dx
From the bound on the L1-norm of the noise, we have: d_TV(observed, ideal) ≤ O(ε).
Goal: Find a 1-D Gaussian (the estimate) that satisfies d_TV(estimate, ideal) ≤ O(ε).
Equivalently, find a 1-D Gaussian that satisfies d_TV(estimate, observed) ≤ O(ε).
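As a quick sanity check of the definition (notation and code mine, not from the talk), the integral can be compared numerically against the closed form for two unit-variance Gaussians, whose densities cross at the midpoint, giving d_TV = 2Φ(Δ/2) − 1:

```python
import math
import numpy as np

def tv_gaussians_equal_var(mu1, mu2, sigma):
    # Closed form for equal variances: the densities cross at the midpoint,
    # so d_TV = 2 * Phi(|mu1 - mu2| / (2 * sigma)) - 1.
    z = abs(mu1 - mu2) / (2 * sigma)
    return 2 * (0.5 * (1 + math.erf(z / math.sqrt(2)))) - 1

# Numerically evaluate (1/2) * integral |f - g| on a fine grid.
xs = np.linspace(-20.0, 20.0, 400_001)
dx = xs[1] - xs[0]
f = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)          # pdf of N(0, 1)
g = np.exp(-(xs - 1.0)**2 / 2) / np.sqrt(2 * np.pi)  # pdf of N(1, 1)
tv_numeric = 0.5 * np.sum(np.abs(f - g)) * dx        # ~= 0.3829
```

Both evaluations agree to several decimal places, which is a useful check when implementing TV-distance arguments.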
Do the empirical mean and empirical variance work? No!
Since observed model = ideal model + noise, a single corrupted sample placed arbitrarily far away drags both estimates arbitrarily far from the truth.
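The failure above is easy to demonstrate (a minimal sketch, not from the talk): a single planted outlier wrecks the empirical mean and variance, while the median barely moves.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)  # clean N(0, 1) samples

# Adversary replaces a single sample with a far-away outlier.
corrupted = samples.copy()
corrupted[0] = 1e6

# The empirical mean and variance are dragged arbitrarily far:
mean_hat = np.mean(corrupted)   # ~1000, nowhere near the true mean 0
var_hat = np.var(corrupted)     # ~1e9, nowhere near the true variance 1

# ...while the median barely moves:
median_hat = np.median(corrupted)  # still close to 0
```

Moving the outlier further moves the mean and variance further still, so the breakdown is unbounded.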
But the median and median absolute deviation do work.
Fact [Folklore]: Given samples from a distribution that is ε-close in total variation distance to a 1-D Gaussian N(μ, σ²), the median and MAD recover estimates μ̂ and σ̂ that satisfy
    |μ̂ − μ| ≤ O(ε)·σ  and  |σ̂ − σ| ≤ O(ε)·σ,
where σ̂ is the MAD rescaled so that it is a consistent estimate of σ for Gaussians.
Also called (properly) agnostically learning a 1-D Gaussian.
What about robust estimation in high dimensions? e.g. microarrays with 10k genes
OUTLINE
Part I: Introduction
  • Robust Estimation in One Dimension
  • Robustness vs. Hardness in High Dimensions
  • Recent Results
Part II: Agnostically Learning a Gaussian
  • Parameter Distance
  • Detecting When an Estimator is Compromised
  • A Win-Win Algorithm
  • Unknown Covariance
Part III: Further Results
MAIN PROBLEM: Given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ), give an efficient algorithm to find parameters μ̂, Σ̂ that satisfy d_TV(N(μ̂, Σ̂), N(μ, Σ)) ≤ Õ(ε).
Special Cases:
(1) Unknown mean: N(μ, I)
(2) Unknown covariance: N(0, Σ)
A COMPENDIUM OF APPROACHES (Unknown Mean)

Estimator         | Error Guarantee | Running Time
Tukey Median      | O(ε)            | NP-hard
Geometric Median  | O(ε√d)          | poly(d, N)
Tournament        | O(ε)            | N^O(d)
Pruning           | O(ε√d)          | O(dN)
…
THE PRICE OF ROBUSTNESS?
All known estimators are hard to compute or lose polynomial factors in the dimension.
Equivalently: computationally efficient estimators can only handle an ε ≈ 1/√d fraction of errors and get non-trivial (TV < 1) guarantees.
Is robust estimation algorithmically possible in high dimensions?
RECENT RESULTS
Theorem [Diakonikolas, Kamath, Kane, Li, Moitra, Stewart '16]: There is an algorithm that, when given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ), finds parameters that satisfy d_TV(N(μ̂, Σ̂), N(μ, Σ)) ≤ Õ(ε), where the Õ hides factors polylogarithmic in 1/ε. Moreover the algorithm runs in time poly(N, d).
Robust estimation in high dimensions is algorithmically possible!
Extensions: Can weaken assumptions to sub-Gaussian or bounded second moments (with weaker guarantees) for the mean.

Independently and concurrently:
Theorem [Lai, Rao, Vempala '16]: There is an algorithm that, when given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian, finds parameters satisfying a comparable guarantee that loses additional logarithmic factors in the dimension. Moreover the algorithm runs in time poly(N, d). When the covariance is bounded, this translates to a total variation bound depending on d only through logarithmic factors.
A GENERAL RECIPE
Robust estimation in high dimensions:
  • Step #1: Find an appropriate parameter distance
  • Step #2: Detect when the naive estimator has been compromised
  • Step #3: Find good parameters, or make progress
      Filtering: fast and practical
      Convex programming: better sample complexity
Let's see how this works for unknown mean…
PARAMETER DISTANCE
Step #1: Find an appropriate parameter distance for Gaussians.
A Basic Fact:
(1)  d_TV(N(μ₁, I), N(μ₂, I)) ≤ (1/2) ‖μ₁ − μ₂‖₂
This can be proven using Pinsker's Inequality and the well-known formula for the KL-divergence between Gaussians.
Corollary: If our estimate (in the unknown mean case) satisfies ‖μ̂ − μ‖₂ ≤ O(ε), then d_TV(N(μ̂, I), N(μ, I)) ≤ O(ε).
Our new goal is to be close in Euclidean distance.
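The two-line derivation behind fact (1) is a standard reconstruction (Pinsker plus the Gaussian KL formula), written out here for completeness:

```latex
\mathrm{KL}\big(\mathcal{N}(\mu_1, I)\,\big\|\,\mathcal{N}(\mu_2, I)\big)
  = \tfrac{1}{2}\,\lVert \mu_1 - \mu_2 \rVert_2^2
  \qquad \text{(Gaussian KL formula)}

d_{\mathrm{TV}}\big(\mathcal{N}(\mu_1, I),\, \mathcal{N}(\mu_2, I)\big)
  \le \sqrt{\tfrac{1}{2}\,\mathrm{KL}}
  = \tfrac{1}{2}\,\lVert \mu_1 - \mu_2 \rVert_2
  \qquad \text{(Pinsker's inequality)}
```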
DETECTING CORRUPTIONS
Step #2: Detect when the naive estimator has been compromised.
[figure: uncorrupted vs. corrupted points; the corruptions create a direction of large (> 1) variance]
Key Lemma: If X₁, X₂, …, X_N come from a distribution that is ε-close to N(μ, I) and N is sufficiently large, then with probability at least 1−δ, for the empirical mean μ̂ and empirical covariance Σ̂: (1) if no direction has empirical variance much larger than 1 (i.e. ‖Σ̂ − I‖₂ is small), then (2) ‖μ̂ − μ‖₂ ≤ Õ(ε).
Take-away: An adversary needs to mess up the second moment in order to corrupt the first moment.
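The take-away can be seen in a small experiment (a sketch of mine, with illustrative parameters): an adversary who shifts the empirical mean by a constant, using only an ε-fraction of points, necessarily inflates the top eigenvalue of the empirical covariance far above 1.

```python
import numpy as np

rng = np.random.default_rng(2)
d, N, eps = 50, 20_000, 0.05
X = rng.normal(size=(N, d))  # inliers from N(0, I)

# Adversary: move an eps-fraction of points by 1/eps along e_1,
# shifting the empirical mean by roughly 1.0 in that direction.
k = int(eps * N)
X[:k, 0] += 1.0 / eps

mu_hat = X.mean(axis=0)
centered = X - mu_hat
cov_hat = centered.T @ centered / N

# np.linalg.eigvalsh returns eigenvalues in ascending order.
top_eig = np.linalg.eigvalsh(cov_hat)[-1]
# The mean is off by ~1.0, but top_eig >> 1: the corruption is detectable.
```

Pushing the corrupted points further out shifts the mean more, but inflates the top eigenvalue even faster, which is exactly what the lemma exploits.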
A WIN-WIN ALGORITHM
Step #3: Either find good parameters, or remove many outliers.
Filtering Approach: Suppose the detection step fails, i.e. there is a direction v of largest variance with empirical variance noticeably larger than 1. Then we can throw out more corrupted than uncorrupted points: remove the points x with |v·(x − μ̂)| > T, where T has a (data-dependent) formula.
If we continued too long, we'd have no corrupted points left! So eventually we find (certifiably) good parameters.
Running Time: poly(N, d). Sample Complexity: analyzed via concentration of linear threshold functions (LTFs).
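The filtering loop above can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the stopping threshold and the fixed-quantile removal rule are stand-ins for the data-dependent threshold T.

```python
import numpy as np

def filter_mean(X, eps, tol=4.0, max_rounds=100):
    """Sketch of filtering for robust mean estimation with identity
    covariance (thresholds simplified relative to the actual algorithm)."""
    for _ in range(max_rounds):
        mu = X.mean(axis=0)
        C = (X - mu).T @ (X - mu) / len(X)
        eigvals, eigvecs = np.linalg.eigh(C)   # ascending order
        if eigvals[-1] <= 1.0 + tol * eps:     # covariance looks Gaussian:
            return mu                          # certifiably good estimate
        v = eigvecs[:, -1]                     # direction of largest variance
        scores = np.abs((X - mu) @ v)
        # Remove the most extreme points along v; in expectation this
        # removes more corrupted than uncorrupted points.
        X = X[scores <= np.quantile(scores, 0.98)]
    return X.mean(axis=0)

rng = np.random.default_rng(3)
X = rng.normal(size=(20_000, 20))
X[:1_000, 0] += 20.0            # 5% corruption, shifts the mean by ~1 in e_1
mu_hat = filter_mean(X, 0.05)   # norm of mu_hat is small again
```

Each round either certifies the current estimate or strictly shrinks the data set, so the loop terminates; on this example a few rounds suffice.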
A GENERAL RECIPE
The same three steps apply. How about for unknown covariance?
PARAMETER DISTANCE
Step #1: Find an appropriate parameter distance for Gaussians.
Another Basic Fact:
(2)  d_TV(N(0, Σ₁), N(0, Σ₂)) ≤ O(‖Σ₁^{-1/2} Σ₂ Σ₁^{-1/2} − I‖_F)
Again, proven using Pinsker's Inequality.
Our new goal is to find an estimate that satisfies ‖Σ^{-1/2} Σ̂ Σ^{-1/2} − I‖_F ≤ Õ(ε).
The distance seems strange, but it's the right one to use to bound TV.
UNKNOWN COVARIANCE
What if we are given samples from N(0, Σ)?
How do we detect if the naive estimator is compromised?
Key Fact: Let Z = Σ^{-1/2} X for X ~ N(0, Σ), and let M = E[(Z ⊗ Z)(Z ⊗ Z)ᵀ]. Then, restricted to flattenings of d×d symmetric matrices, M = 2I + vec(I)vec(I)ᵀ; equivalently, E[(Zᵀ A Z)²] = 2‖A‖_F² + tr(A)² for symmetric A (the vec(I)vec(I)ᵀ component needs to be projected out).
The proof uses Isserlis's Theorem.
Key Idea: Transform the data, look for restricted large eigenvalues.
If Σ̂ were the true covariance, we would have Zᵢ = Σ̂^{-1/2} Xᵢ ~ N(0, I) for inliers, in which case the empirical fourth-moment matrix (1/N) Σᵢ (Zᵢ ⊗ Zᵢ)(Zᵢ ⊗ Zᵢ)ᵀ − M would have small restricted eigenvalues.
Take-away: An adversary needs to mess up the (restricted) fourth moment in order to corrupt the second moment.
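A one-direction slice of this take-away is easy to simulate (my sketch, with illustrative parameters): by Isserlis, E[(v·Z)⁴] = 3 for inliers along any unit direction v, and an adversary who inflates the variance along v cannot avoid inflating the fourth moment along v as well.

```python
import numpy as np

rng = np.random.default_rng(4)
d, N = 10, 50_000
Z = rng.normal(size=(N, d))            # inliers after the transform: N(0, I)

v = np.zeros(d)
v[0] = 1.0                             # a fixed unit direction

# Isserlis: for Z ~ N(0, I) and unit v, E[(v . Z)^4] = 3.
m4_clean = np.mean((Z @ v) ** 4)       # ~= 3

# Corrupt the second moment along v by stretching 5% of the points...
Zc = Z.copy()
Zc[: N // 20, 0] *= 4.0
# ...and the fourth moment along v blows up, revealing the corruption.
m4_corrupted = np.mean((Zc @ v) ** 4)  # far above 3
```

The full algorithm checks all directions at once by looking for large restricted eigenvalues of the empirical fourth-moment matrix, rather than scanning single directions.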
ASSEMBLING THE ALGORITHM
Given samples that are ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ):
Step #1: Doubling trick. Pair up the samples and take differences, (X_{2i} − X_{2i−1})/√2 ~ N(0, Σ); now use the algorithm for unknown covariance.
Step #2: (Agnostic) isotropic position. Transform the data by Σ̂^{-1/2}; now use the algorithm for unknown mean (this makes Euclidean distance the right distance in the general case).
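Step #1 can be sketched in a few lines (my illustration; the mean and covariance below are arbitrary choices): differencing independent pairs cancels the unknown mean exactly, and an ε-fraction of corrupted samples corrupts at most a 2ε-fraction of the pairs.

```python
import numpy as np

rng = np.random.default_rng(5)
d, N = 5, 100_000
mu = np.arange(1.0, d + 1.0)            # unknown mean (illustrative)
A = rng.normal(size=(d, d))
Sigma = A @ A.T / d + np.eye(d)         # unknown covariance (illustrative)
X = rng.multivariate_normal(mu, Sigma, size=2 * N)

# Doubling trick: (X_{2i} - X_{2i+1}) / sqrt(2) ~ N(0, Sigma) exactly,
# so the unknown-covariance algorithm applies without knowing mu.
Y = (X[0::2] - X[1::2]) / np.sqrt(2)
Sigma_hat = Y.T @ Y / N                 # no mean estimate needed
```

The differenced samples have mean zero and exactly the original covariance, which is what the unknown-covariance step needs.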
LIMITS TO ROBUST ESTIMATION
Theorem [Diakonikolas, Kane, Stewart '16]: Any statistical query learning* algorithm in the strong corruption model (insertions and deletions) that makes error o(ε√log(1/ε)) must make super-polynomially many (in d) queries.
*Instead of seeing samples directly, the algorithm queries a function and gets its expectation, up to sampling noise.
LIST DECODING
What if an adversary can corrupt the majority of samples?
Theorem [Charikar, Steinhardt, Valiant '17]: Given samples from a distribution with mean μ and covariance Σ ⪯ σ²I, where a (1−α)-fraction have been corrupted, there is an algorithm that outputs a list of O(1/α) candidates, one of which, μ̂, satisfies ‖μ̂ − μ‖₂ ≤ O(σ/√α).
This extends to mixtures straightforwardly.
[Kothari, Steinhardt '18], [Diakonikolas et al. '18] gave improved guarantees, but under Gaussianity.
BEYOND GAUSSIANS
Can we relax the distributional assumptions?
Theorem [Kothari, Steurer '18], [Hopkins, Li '18]: Given ε-corrupted samples from a k-certifiably subgaussian distribution, there is an algorithm that outputs μ̂ satisfying ‖μ̂ − μ‖₂ ≤ O(ε^{1−1/k}).
When you only know bounds on the moments, these guarantees are optimal.
SUBGAUSSIAN CONFIDENCE INTERVALS
Can we estimate the mean accurately for heavy-tailed distributions?
Theorem [Hopkins '18]: Given n i.i.d. samples from a distribution with mean μ and covariance Σ, and a target confidence δ, there is a polynomial-time algorithm that outputs μ̂ satisfying, with probability 1−δ,
    ‖μ̂ − μ‖₂ ≤ O(√(tr(Σ)/n) + √(‖Σ‖₂ log(1/δ)/n)).
The empirical mean doesn't achieve this, and the median-of-means estimator due to [Lugosi, Mendelson '18] that does is hard to compute.
[Cherapanamjeri, Flammarion, Bartlett '19] gave faster algorithms based on gradient descent.
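The median-of-means idea is simple to sketch in one dimension (my illustration; the Lugosi-Mendelson estimator uses a stronger, multivariate notion of median, which is the computationally hard part):

```python
import numpy as np

def median_of_means(samples, k):
    # Split the samples into k groups, average within each group, and
    # take the median of the k group means. A few wild samples can ruin
    # only a few groups, so the median of the group means concentrates.
    groups = np.array_split(samples, k)
    return float(np.median([g.mean() for g in groups]))

rng = np.random.default_rng(6)
heavy_tailed = rng.standard_t(df=2.5, size=100_000)  # mean 0, heavy tails
mom_estimate = median_of_means(heavy_tailed, k=30)   # close to 0 w.h.p.
```

Choosing k ≈ log(1/δ) groups gives failure probability roughly δ, which is where the subgaussian-style confidence interval comes from.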
Summary:
  • Nearly optimal algorithm for agnostically learning a high-dimensional Gaussian
  • General recipe using restricted eigenvalue problems
  • Further applications to other mixture models
  • What's next for algorithmic robust statistics?
Thanks! Any Questions?