IE 598: Incremental Gradient Methods - SAGA
Meghana Bande (mbande2)
Outline
• Introduction
• SAGA algorithm
• Convergence proof
Finite sum problem
Minimize f(x) of the form
f(x) = (1/n) Σ_{i=1}^{n} f_i(x),
where each f_i is µ-strongly convex and L-smooth.
Applications: empirical risk minimization, where f_i(x) is the loss of model x on the i-th training example.
Motivation
• Gradient descent: linear convergence (rate (1 − µ/L)^k with step size 1/L); iteration cost of n gradient evaluations; total complexity O(n(L/µ) log(1/ε)).
• Stochastic gradient descent: sublinear convergence (O(1/k) with decaying step sizes); iteration cost of one gradient evaluation; total complexity O(1/(µε)).
Goal: algorithms with linear convergence and cheap (single-gradient) iteration cost.
Variance reduction technique
To be estimated: E[X]Given: Y which is correlated with X. E[Y] can be easily computed
α=1: unbiasedα=0: highly biasedIf Cov[X,Y] is large, variance of estimator is lower
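A quick numerical check of this control-variate idea, using the estimator α(X − Y) + E[Y]: a sketch on a toy lognormal example, where all the variables and constants are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy example: X is what we want the mean of; Y is correlated with X
# and has a known expectation E[Y].
n = 100_000
z = rng.normal(size=n)
X = np.exp(z)        # lognormal, E[X] = e^{1/2}
Y = 1.0 + z          # E[Y] = 1, strongly correlated with X
EY = 1.0

def estimator(alpha):
    # theta_alpha = alpha * (X - Y) + E[Y]
    return alpha * (X - Y) + EY

theta = estimator(1.0)              # alpha = 1: unbiased
print(np.var(theta) < np.var(X))    # variance is reduced because Cov[X, Y] is large
print(abs(theta.mean() - np.exp(0.5)) < 0.05)  # mean still matches E[X]
```

Since Cov[X, Y] is large here, Var[X − Y] = Var[X] + Var[Y] − 2 Cov[X, Y] is much smaller than Var[X], which is exactly the mechanism SAGA exploits for gradient estimates.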
SAGA: Algorithm
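The update rule from [2]: pick j uniformly at random, take the step x^{k+1} = x^k − γ(∇f_j(x^k) − g_j + (1/n) Σ_i g_i), then overwrite the stored gradient g_j ← ∇f_j(x^k). A runnable sketch of this on a hypothetical least-squares instance f_i(x) = ½(aᵢᵀx − bᵢ)² (the data and constants are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical least-squares instance: f_i(x) = 0.5 * (a_i @ x - b_i)**2
n, d = 200, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true

def saga(A, b, gamma, iters):
    n, d = A.shape
    x = np.zeros(d)
    # Table of stored gradients g[i] = grad f_i(phi_i), initialized at x0.
    g = A * (A @ x - b)[:, None]
    g_avg = g.mean(axis=0)
    for _ in range(iters):
        j = rng.integers(n)
        grad_j = A[j] * (A[j] @ x - b[j])   # fresh gradient of f_j at x
        # SAGA step: unbiased, variance-reduced gradient estimate.
        x = x - gamma * (grad_j - g[j] + g_avg)
        # Update the table and its running average in O(d) time.
        g_avg += (grad_j - g[j]) / n
        g[j] = grad_j
    return x

L = max(np.linalg.norm(a) ** 2 for a in A)   # smoothness constant of each f_i
x_hat = saga(A, b, gamma=1.0 / (3 * L), iters=20_000)
print(np.linalg.norm(x_hat - x_true))        # close to zero
```

Note the O(d) running-average update: the table costs O(n d) memory but each iteration touches only one row, which is what makes the per-iteration cost comparable to SGD.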
Convergence results
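For the strongly convex case, the rate given in [2] with step size γ = 1/(2(µn + L)) can be stated as follows (restated here from the paper, so the constant C₀ should be checked against the original):

```latex
% SAGA, strongly convex case, step size \gamma = 1/(2(\mu n + L)):
\mathbb{E}\,\|x^k - x^\star\|^2
  \le \left(1 - \frac{\mu}{2(\mu n + L)}\right)^{\!k} C_0,
\qquad
C_0 = \|x^0 - x^\star\|^2
    + \frac{n}{\mu n + L}
      \Bigl(f(x^0) - \langle \nabla f(x^\star),\, x^0 - x^\star\rangle - f(x^\star)\Bigr).
```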
Convergence result
• Define the following Lyapunov function, where φᵢᵏ is the point at which the stored gradient of fᵢ was last evaluated:
T^k = (1/n) Σᵢ fᵢ(φᵢᵏ) − f(x*) − (1/n) Σᵢ ⟨∇fᵢ(x*), φᵢᵏ − x*⟩ + c‖xᵏ − x*‖²
• Show that E[T^{k+1}] ≤ (1 − 1/κ) T^k for some κ > 1 depending on γ, µ, L, and n.
• Note that c‖xᵏ − x*‖² ≤ T^k and conclude that E‖xᵏ − x*‖² ≤ (1 − 1/κ)ᵏ T⁰/c.
Composite case
Consider F(x) = f(x) + h(x), where h(x) is convex but possibly non-smooth (e.g. an ℓ1 regularizer). Replace the gradient step with a proximal step, x^{k+1} = prox_{γh}(x^k − γ v^k), where v^k is the SAGA gradient estimate.
The same convergence rate holds, thanks to the non-expansiveness of the proximal operator.
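For intuition: the proximal operator of h(x) = λ‖x‖₁ is soft-thresholding, and its non-expansiveness (the property the rate argument relies on) is easy to check numerically. A sketch with illustrative values:

```python
import numpy as np

def prox_l1(v, t):
    # Proximal operator of t * ||x||_1 (soft-thresholding);
    # t stands for gamma * lambda folded into one parameter.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Non-expansiveness: ||prox(u) - prox(v)|| <= ||u - v|| for any u, v.
rng = np.random.default_rng(1)
u, v = rng.normal(size=10), rng.normal(size=10)
lhs = np.linalg.norm(prox_l1(u, 0.5) - prox_l1(v, 0.5))
rhs = np.linalg.norm(u - v)
print(lhs <= rhs)   # True
```

Because the prox step is 1-Lipschitz, applying it after the SAGA update cannot enlarge the distance to the optimum, which is why the composite analysis goes through unchanged.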
Other Variance Reduction Techniques
Convergence rates
• SAG: geometric rate (1 − min{µ/(16L), 1/(8n)})ᵏ with step size 1/(16L) [1].
• SAGA: geometric rate (1 − min{1/(4n), µ/(3L)})ᵏ with step size 1/(3L) [2].
• SVRG: geometric per-epoch contraction, at a cost of O(n + L/µ) gradient evaluations per epoch [3].
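As a point of comparison, SVRG [3] achieves variance reduction with a full gradient recomputed at a periodic snapshot point instead of SAGA's table of n stored gradients. A sketch on the same kind of toy least-squares problem (all names, data, and constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares instance: f_i(x) = 0.5 * (a_i @ x - b_i)**2
n, d = 200, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true

def grad_i(x, i):
    return A[i] * (A[i] @ x - b[i])

def svrg(gamma, epochs, m):
    x = np.zeros(d)
    for _ in range(epochs):
        snapshot = x.copy()
        full_grad = (A * (A @ snapshot - b)[:, None]).mean(axis=0)
        for _ in range(m):
            j = rng.integers(n)
            # Unlike SAGA, the correction uses one stored snapshot
            # instead of a table of n gradients (O(d) memory, not O(nd)).
            x = x - gamma * (grad_i(x, j) - grad_i(snapshot, j) + full_grad)
    return x

L = max(np.linalg.norm(a) ** 2 for a in A)
x_hat = svrg(gamma=0.1 / L, epochs=30, m=2 * n)
print(np.linalg.norm(x_hat - x_true))   # close to zero
```

The trade-off versus SAGA: SVRG saves the O(nd) gradient table at the price of a full-gradient pass every epoch and an extra gradient evaluation per inner step.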
References
[1] M. W. Schmidt, N. L. Roux, and F. R. Bach. Minimizing Finite Sums with the Stochastic Average Gradient. arXiv:1309.2388, 2013.
[2] A. Defazio, F. Bach, and S. Lacoste-Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In NIPS 27, pages 1646-1654. 2014.
[3] R. Johnson and T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In NIPS 26, pages 315-323. 2013.
[4] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Springer, 2004.
[5] Incremental Gradient Methods, IE 598 Course Notes, http://niaohe.ise.illinois.edu/IE598/pdf/IE598-lecture23-incremental%20gradient%20algorithms.pdf