![Page 1: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/1.jpg)
Federated Optimization in Heterogeneous Networks
Tian Li (CMU), Anit Kumar Sahu (BCAI), Manzil Zaheer (Google Research), Maziar Sanjabi (Facebook AI), Ameet Talwalkar (CMU & Determined AI), Virginia Smith (CMU)
![Page 2: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/2.jpg)
Federated Learning
Privacy-preserving training in heterogeneous, (potentially) massive networks
• Networks of remote devices, e.g., cell phones: next-word prediction
• Networks of isolated organizations, e.g., hospitals: healthcare
![Page 3: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/3.jpg)
Example Applications
• Voice recognition on mobile phones
• Adapting to pedestrian behavior on autonomous vehicles
• Personalized healthcare on wearable devices
• Predictive maintenance for industrial machines
![Page 4: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/4.jpg)
Workflow & Challenges
[Figure: the server sends the global model $w^t$ to the devices; each device runs local training and returns a local model $w'$; the server forms $w^{t+1}$.]
• Systems heterogeneity: variable hardware, network connectivity, power, etc.
• Statistical heterogeneity: highly non-identically distributed data
• Expensive communication: potentially massive network; wireless communication
• Privacy concerns: privacy leakage through parameters
Objective (a standard setup):
$$\min_w f(w) = \sum_{k=1}^{N} p_k F_k(w)$$
where $F_k$ is the loss on device $k$.
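As a rough illustration of the objective above, the sketch below evaluates $f(w) = \sum_k p_k F_k(w)$ on toy data. The quadratic device losses, the sample counts, and the choice $p_k = n_k / n$ are assumptions made for this example; the slides only state that the $p_k$ weight the device losses.

```python
from typing import Callable, Sequence

Vector = Sequence[float]

def global_objective(w: Vector,
                     local_losses: Sequence[Callable[[Vector], float]],
                     weights: Sequence[float]) -> float:
    """Return f(w) = sum_k p_k * F_k(w) for per-device losses F_k."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights p_k are expected to sum to 1")
    return sum(p * F(w) for p, F in zip(weights, local_losses))

def make_quadratic(center: Vector) -> Callable[[Vector], float]:
    """Hypothetical device loss F_k(w) = 0.5 * ||w - c_k||^2."""
    return lambda w: 0.5 * sum((wi - ci) ** 2 for wi, ci in zip(w, center))

# Toy data (assumed): three devices with different optima and sample counts.
centers = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]
sample_counts = [10, 30, 60]
total = sum(sample_counts)
p = [n / total for n in sample_counts]  # assumption: p_k = n_k / n

value = global_objective([0.0, 0.0], [make_quadratic(c) for c in centers], p)
```

Because each device has a different optimum $c_k$, no single $w$ minimizes every $F_k$ at once, which is the statistical heterogeneity the slide refers to.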
![Page 5: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/5.jpg)
A Popular Method: Federated Averaging (FedAvg) [1]
At each communication round:
• Server randomly selects a subset of devices & sends the current global model $w^t$
• Each selected device $k$ updates $w^t$ for $E$ epochs of SGD to optimize $F_k$ & sends the new local model back
• Server aggregates local models to form a new global model $w^{t+1}$
Works well in many settings! (especially non-convex)
What can go wrong?
[1] McMahan, H. Brendan, et al. "Communication-efficient learning of deep networks from decentralized data." AISTATS, 2017.
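The communication round described above can be sketched in a few lines. This is a minimal illustration, not the reference implementation: it assumes one full-gradient step stands in for one epoch of SGD, that the server takes a plain unweighted average of the returned models, and that each device holds a hypothetical quadratic loss.

```python
import random
from typing import Callable, List

Vector = List[float]
GradFn = Callable[[Vector], Vector]

def local_sgd(w_global: Vector, grad_fn: GradFn, epochs: int, lr: float) -> Vector:
    """Start from the global model and take `epochs` gradient steps on F_k.
    Assumption: one full-gradient step stands in for one epoch of SGD."""
    w = list(w_global)
    for _ in range(epochs):
        g = grad_fn(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

def fedavg_round(w_global: Vector, device_grads: List[GradFn], num_selected: int,
                 epochs: int, lr: float, rng: random.Random) -> Vector:
    """One round: sample devices, train locally, average the local models."""
    selected = rng.sample(range(len(device_grads)), num_selected)
    local_models = [local_sgd(w_global, device_grads[k], epochs, lr) for k in selected]
    # Assumption: plain unweighted average of the returned models.
    return [sum(m[i] for m in local_models) / len(local_models)
            for i in range(len(w_global))]

# Hypothetical devices: F_k(w) = 0.5 * ||w - c_k||^2, so grad F_k(w) = w - c_k.
centers = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0], [2.0, 4.0]]
device_grads = [lambda w, c=c: [wi - ci for wi, ci in zip(w, c)] for c in centers]

rng = random.Random(0)
w = [10.0, -10.0]
for _ in range(50):
    # All four devices are selected so this toy run is deterministic.
    w = fedavg_round(w, device_grads, num_selected=4, epochs=5, lr=0.1, rng=rng)
```

With every device participating and identical curvature, the toy run settles at the mean of the device optima. Selecting fewer devices per round, or giving devices differently shaped losses, makes the averaged model fluctuate, which is one way to see the issues discussed next.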
![Page 6: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/6.jpg)
What are the issues?
FedAvg: heuristic method, not guaranteed to converge
• Statistical heterogeneity (highly non-identically distributed data): simple average updates
• Systems heterogeneity (stragglers): simply drop slow devices [2]
[Figure: FedAvg with 0% stragglers vs. 90% stragglers]
[2] Bonawitz, Keith, et al. "Towards Federated Learning at Scale: System Design." MLSys, 2019.
![Page 7: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/7.jpg)
Outline
• Motivation
• FedProx Method
• Theoretical Analysis
• Experiments
• Future Work
![Page 8: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/8.jpg)
FedProx: High Level
Systems heterogeneity
• FedAvg: simply drop stragglers
• FedProx: account for stragglers; allow for variable amounts of work & safely incorporate them
Statistical heterogeneity
• FedAvg: average simple SGD updates
• FedProx: encourage more well-behaved updates
Theory: rate as a function of statistical heterogeneity
Contributions:
1. convergence guarantees
2. more robust empirical performance
for federated learning in heterogeneous networks
![Page 9: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/9.jpg)
FedProx: A Framework For Federated Optimization
Objective:
$$\min_w f(w) = \sum_{k=1}^{N} p_k F_k(w)$$
At each communication round, local objective:
$$\min_{w_k} F_k(w_k)$$
Idea 1: Allow for variable amounts of work to be performed on local devices to handle stragglers
Idea 2: Modified Local Subproblem (adds a proximal term):
$$\min_{w_k} F_k(w_k) + \frac{\mu}{2} \|w_k - w^t\|^2$$
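A minimal sketch of a local solver for the modified subproblem, assuming plain gradient descent as the local solver and a hypothetical quadratic device loss. The `steps` argument is per device, which is one simple way to model Idea 1 (a straggler just runs fewer steps and still returns its partial solution).

```python
from typing import Callable, List

Vector = List[float]

def fedprox_local_update(w_global: Vector,
                         grad_Fk: Callable[[Vector], Vector],
                         mu: float, steps: int, lr: float) -> Vector:
    """Approximately minimize F_k(w) + (mu / 2) * ||w - w_global||^2.
    `steps` may differ per device, so stragglers can do less work."""
    w = list(w_global)
    for _ in range(steps):
        g = grad_Fk(w)
        # The proximal term contributes mu * (w - w_global) to the gradient.
        w = [wi - lr * (gi + mu * (wi - wg)) for wi, gi, wg in zip(w, g, w_global)]
    return w

# Hypothetical device loss F_k(w) = 0.5 * (w - 4)^2 with global model w^t = 0.
grad_Fk = lambda w: [w[0] - 4.0]
w_t = [0.0]

w_plain = fedprox_local_update(w_t, grad_Fk, mu=0.0, steps=200, lr=0.1)
w_prox = fedprox_local_update(w_t, grad_Fk, mu=1.0, steps=200, lr=0.1)
w_straggler = fedprox_local_update(w_t, grad_Fk, mu=1.0, steps=1, lr=0.1)
```

In this toy case the unregularized solve (`mu=0.0`) moves all the way to the device optimum, while `mu=1.0` stops roughly halfway between the global model and that optimum, so a larger `mu` keeps local models closer to $w^t$.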
![Page 10: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/10.jpg)
FedProx: A Framework For Federated Optimization
Modified Local Subproblem:
$$\min_{w_k} F_k(w_k) + \frac{\mu}{2} \|w_k - w^t\|^2$$
The proximal term (1) safely incorporates noisy updates; (2) explicitly limits the impact of local updates
• Generalization of FedAvg
• Can use any local solver
• More robust and stable empirical performance
• Strong theoretical guarantees (with some assumptions)
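A short worked step shows how the proximal term limits local updates and why the framework generalizes FedAvg. It assumes $F_k$ is differentiable; the label $h_k$ and the step size $\eta$ are introduced here for illustration only.

```latex
h_k(w; w^t) = F_k(w) + \frac{\mu}{2}\,\|w - w^t\|^2,
\qquad
\nabla h_k(w; w^t) = \nabla F_k(w) + \mu\,(w - w^t).

% One gradient step on the modified subproblem:
w \leftarrow w - \eta\,\bigl(\nabla F_k(w) + \mu\,(w - w^t)\bigr).

% Setting \mu = 0 removes the pull toward w^t:
w \leftarrow w - \eta\,\nabla F_k(w).
```

The extra term $\mu (w - w^t)$ grows linearly with the distance from the global model, so local iterates are pulled back toward $w^t$. With $\mu = 0$, SGD as the local solver, and the same number of epochs $E$ on every device, the local update is the FedAvg update.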
![Page 11: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/11.jpg)
![Page 12: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/12.jpg)
Convergence Analysis
High-level: converges despite these challenges
Challenges: device subsampling, non-IID data, local updates
Introduces notion of B-dissimilarity* to characterize statistical heterogeneity:
$$\mathbb{E}\left[\|\nabla F_k(w)\|^2\right] \le \|\nabla f(w)\|^2 B^2$$
IID data: $B = 1$; non-IID data: $B > 1$
*used in other contexts, e.g., gradient diversity [3] to quantify the benefits of scaling distributed SGD
[3] Yin, Dong, et al. "Gradient Diversity: a Key Ingredient for Scalable Distributed Learning." AISTATS, 2018.
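To make the dissimilarity bound concrete, the sketch below computes the smallest $B$ that satisfies it at a single point $w$, given the device gradients there. It assumes the expectation is a $p_k$-weighted average over devices; the handling of a zero global gradient is a convention chosen for this example.

```python
import math
from typing import List, Sequence

def b_dissimilarity(local_grads: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> float:
    """Smallest B with sum_k p_k ||g_k||^2 <= B^2 * ||sum_k p_k g_k||^2."""
    dim = len(local_grads[0])
    global_grad: List[float] = [
        sum(p * g[i] for p, g in zip(weights, local_grads)) for i in range(dim)
    ]
    mean_sq_norm = sum(p * sum(x * x for x in g) for p, g in zip(weights, local_grads))
    global_sq_norm = sum(x * x for x in global_grad)
    if global_sq_norm == 0.0:
        # Convention assumed here: B = 1 if every local gradient vanishes,
        # otherwise no finite B satisfies the bound at this point.
        return 1.0 if mean_sq_norm == 0.0 else math.inf
    return math.sqrt(mean_sq_norm / global_sq_norm)

# Identical gradients on every device (the IID-like case).
b_iid = b_dissimilarity([[1.0, 2.0], [1.0, 2.0]], [0.5, 0.5])

# Orthogonal gradients on two devices (a non-IID-like case).
b_non_iid = b_dissimilarity([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
```

When all devices report the same gradient the ratio is exactly 1, and it grows as the local gradients disagree, matching the slide's "IID: B = 1, non-IID: B > 1".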
![Page 13: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/13.jpg)
Convergence Analysis
Assumption 1: Dissimilarity is bounded
Proximal term makes the method more amenable to theoretical analysis!
Assumption 2: Modified local subproblem is convex & smooth
Assumption 3: Each local subproblem is solved to some accuracy
• Flexible communication/computation tradeoff
• Account for partial work in the rates
![Page 14: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/14.jpg)
Convergence Analysis
[Theorem] Obtain suboptimality $\varepsilon$, after $T$ rounds, with:
$$T = O\left(\frac{f(w^0) - f^*}{\rho\,\varepsilon}\right)$$
where $\rho$ is some constant, a function of $(B, \mu, \ldots)$
Rate is general:
• Covers both convex and non-convex loss functions
• Independent of the local solver; agnostic of the sampling method
The same asymptotic convergence guarantee as SGD
Can converge much faster than distributed SGD in practice
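The rate can be recovered from a per-round decrease of the kind stated in the complete theorem (Backup 4). The sketch below assumes a single constant $\rho > 0$ that lower-bounds the per-round constants, and reads "suboptimality $\varepsilon$" as a bound on the average expected squared gradient norm; both are assumptions made for this worked step.

```latex
% Assumed per-round decrease with a constant \rho > 0:
\mathbb{E}\bigl[f(w^{t+1})\bigr] \le \mathbb{E}\bigl[f(w^{t})\bigr]
  - \rho\,\mathbb{E}\bigl[\|\nabla f(w^{t})\|^2\bigr].

% Sum over t = 0, \dots, T-1 and use f(w^{T}) \ge f^*:
\rho \sum_{t=0}^{T-1} \mathbb{E}\bigl[\|\nabla f(w^{t})\|^2\bigr]
  \le f(w^{0}) - f^*.

% Divide by \rho T:
\frac{1}{T}\sum_{t=0}^{T-1} \mathbb{E}\bigl[\|\nabla f(w^{t})\|^2\bigr]
  \le \frac{f(w^{0}) - f^*}{\rho\,T}
  \le \varepsilon
\quad\text{whenever}\quad
T \ge \frac{f(w^{0}) - f^*}{\rho\,\varepsilon}.
```

A larger $\rho$ therefore means fewer rounds, and since $\rho$ depends on $B$ and $\mu$, this is where statistical heterogeneity enters the rate.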
![Page 15: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/15.jpg)
![Page 16: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/16.jpg)
Experiments
Zero systems heterogeneity + fixed statistical heterogeneity
Benchmark: LEAF (leaf.cmu.edu)
[Figure legend: FedAvg; FedProx, μ > 0]
FedProx with μ > 0 leads to more stable convergence under statistical heterogeneity
![Page 17: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/17.jpg)
[Figure legend: FedAvg; FedProx, μ > 0]
Similar benefits for all datasets
![Page 18: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/18.jpg)
Experiments
High systems heterogeneity + fixed statistical heterogeneity
[Figure legend: FedAvg; FedProx, μ = 0; FedProx, μ > 0]
Allowing for variable amounts of work to be performed helps convergence in the presence of systems heterogeneity
FedProx with μ > 0 leads to more stable convergence under statistical & systems heterogeneity
![Page 19: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/19.jpg)
[Figure legend: FedAvg; FedProx, μ = 0; FedProx, μ > 0]
Similar benefits for all datasets
In terms of test accuracy: on average, 22% absolute accuracy improvement compared with FedAvg in highly heterogeneous settings
![Page 20: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/20.jpg)
Experiments
Impact of Statistical Heterogeneity
Increasing heterogeneity leads to worse convergence
Setting μ > 0 can help to combat this
In addition, B-dissimilarity captures statistical heterogeneity (see paper)
![Page 21: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/21.jpg)
![Page 22: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/22.jpg)
Future Work
• Privacy & security: Better privacy metrics & mechanisms
• Personalization: Automatic fine-tuning
• Productionizing: Cold start problems
• Hyper-parameter tuning: Set μ automatically
• Diagnostics: Determining heterogeneity a priori; Leveraging the heterogeneity for improved performance
White paper: Federated Learning: Challenges, Methods, and Future Directions, IEEE Signal Processing Magazine, 2020. (also on ArXiv)
![Page 23: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/23.jpg)
Thanks!
Paper & code: cs.cmu.edu/~litian/
Benchmark: leaf.cmu.edu
Poster: #3, this room
On-device Intelligence Workshop, Wednesday, this room
![Page 24: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/24.jpg)
Backup 1
• Relations with previous works
  • Proximal term
    • Elastic SGD: employs a more complex moving average to update parameters; limited to SGD as a local solver; has only been analyzed for quadratic problems
    • DANE and inexact DANE: add an additional gradient correction term; assume full device participation (unrealistic); discouraging empirical performance
      • FedDANE: A Federated Newton-Type Method, Arxiv.
    • Other works: different purposes such as speeding up SGD on a single machine; different analysis assumptions (IID, solving subproblems exactly)
  • B-dissimilarity term
    • For other purposes, such as quantifying the benefit in scaling SGD for IID data
![Page 25: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/25.jpg)
Backup 2
• Data statistics
• Systems heterogeneity simulation
  • Fix a global number of epochs $E$, and force some devices to perform fewer updates than $E$ epochs. In particular, for varying heterogeneous settings, assign $x$ (chosen uniformly at random between $[1, E]$) number of epochs to 0%, 50%, and 90% of selected devices.
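The simulation described on this slide might be sketched as follows. It assumes $x$ is an integer drawn from $\{1, \ldots, E\}$ and that the affected devices are picked uniformly at random among those selected in a round; the function name and arguments are hypothetical.

```python
import random
from typing import List

def assign_local_epochs(num_selected: int, E: int, straggler_fraction: float,
                        rng: random.Random) -> List[int]:
    """Give a fraction of the selected devices x ~ Uniform{1, ..., E} epochs;
    every other selected device runs the full E epochs."""
    num_stragglers = round(straggler_fraction * num_selected)
    stragglers = set(rng.sample(range(num_selected), num_stragglers))
    return [rng.randint(1, E) if k in stragglers else E
            for k in range(num_selected)]

rng = random.Random(1)
# The three settings from the slide: 0%, 50%, and 90% of selected devices.
epochs_0 = assign_local_epochs(10, 20, 0.0, rng)
epochs_50 = assign_local_epochs(10, 20, 0.5, rng)
epochs_90 = assign_local_epochs(10, 20, 0.9, rng)
```

Under FedAvg the devices with fewer than $E$ epochs would be dropped, whereas FedProx keeps their partial updates, so the same epoch assignment can drive both variants of an experiment.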
![Page 26: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/26.jpg)
Backup 3
• The original FedAvg algorithm
![Page 27: Federated Optimization in Heterogeneous Networkslitian/assets/slides/fedprox_mlsys20.pdf · • FedDANE: A Federated Newton-Type Method, Arxiv. • Other works: different purposes](https://reader034.vdocuments.site/reader034/viewer/2022042419/5f36013c89f2c01526292844/html5/thumbnails/27.jpg)
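Backup 4
• Complete theorem
Assume the functions $F_k$ are non-convex, $L$-Lipschitz smooth, and there exists $L_- > 0$ such that $\nabla^2 F_k \succeq -L_- \mathbf{I}$, with $\bar{\mu} = \mu - L_- > 0$. Suppose that $w^t$ is not a stationary solution and the local functions $F_k$ are $B$-dissimilar, i.e., $B(w^t) \le B$. If $\mu$, $K$, and $\gamma_k^t$ are chosen such that
$$\rho^t = \left( \frac{1}{\mu} - \frac{\gamma^t B}{\mu} - \frac{B(1+\gamma^t)\sqrt{2}}{\bar{\mu}\sqrt{K}} - \frac{L B (1+\gamma^t)}{\bar{\mu}\mu} - \frac{L(1+\gamma^t)^2 B^2}{2\bar{\mu}^2} - \frac{L B^2 (1+\gamma^t)^2}{\bar{\mu}^2 K}\left(2\sqrt{2K} + 2\right) \right) > 0,$$
then at iteration $t$ of FedProx, we have the following expected decrease in the global objective:
$$\mathbb{E}_{S_t}\left[f(w^{t+1})\right] \le f(w^t) - \rho^t \|\nabla f(w^t)\|^2,$$
where $S_t$ is the set of $K$ devices chosen at iteration $t$ and $\gamma^t = \max_{k \in S_t} \gamma_k^t$.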
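A small helper can check whether a given parameter choice makes the theorem's constant positive. This sketch assumes the radicals lost in the extracted text are $\sqrt{2}$, $\sqrt{K}$, and $\sqrt{2K}$, and the example values are arbitrary; treat it as a way to explore the condition rather than as tuning guidance.

```python
import math

def fedprox_rho(mu: float, L: float, L_minus: float, B: float,
                K: int, gamma: float) -> float:
    """Evaluate the constant rho from the theorem for one parameter choice.
    Assumes the radicals are sqrt(2), sqrt(K), and sqrt(2K)."""
    mu_bar = mu - L_minus
    if mu_bar <= 0:
        raise ValueError("the theorem requires mu_bar = mu - L_minus > 0")
    return (1.0 / mu
            - gamma * B / mu
            - B * (1 + gamma) * math.sqrt(2) / (mu_bar * math.sqrt(K))
            - L * B * (1 + gamma) / (mu_bar * mu)
            - L * (1 + gamma) ** 2 * B ** 2 / (2 * mu_bar ** 2)
            - L * B ** 2 * (1 + gamma) ** 2 / (mu_bar ** 2 * K)
              * (2 * math.sqrt(2 * K) + 2))

# Arbitrary illustrative values: exact local solves (gamma = 0), B = 1, L = 1.
rho_large_mu = fedprox_rho(mu=6.0, L=1.0, L_minus=0.0, B=1.0, K=10000, gamma=0.0)
rho_small_mu = fedprox_rho(mu=0.5, L=1.0, L_minus=0.0, B=1.0, K=10000, gamma=0.0)
```

In this example the larger `mu` gives a positive constant, so the expected-decrease bound applies, while the smaller `mu` gives a negative one and the theorem's condition fails.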