BIN DONG
PEKING UNIVERSITY
Dynamic System and Optimal Control Perspective of Deep Learning
Special thanks to Yiping Lu who helped in preparation of the slides.
Outline
Background and motivation
Deep neural network and numerical ODE
Deep neural network and numerical PDE
An application in image processing and medical imaging
Optimal control perspective for deep network training
Background & Motivation
Deep Learning: Burning Hot!
Credit: D. Donoho, H. Monajemi, V. Papyan, “Stats 385” @ Stanford
Deep Learning
Deep learning is “alchemy” (Ali Rahimi, NIPS 2017)
Deep Learning
What is still challenging:
◦ Learning from limited and/or weakly labelled data
◦ Learning from data of different types
◦ Theoretical guidance, transparency
Should we expect rigorous mathematical analysis of deep learning? Maybe, but…
“We also wish to allow the possibility that an engineer or team of engineers may construct a machine which works, but whose manner of operation cannot be satisfactorily described by its constructors because they have applied a method which is largely experimental.” (Alan M. Turing)
We probably should first find “frameworks” and “links” with mathematics.
Deep network ↔ differential equations (DE):
◦ Network architecture ↔ numerical DE
◦ Network training ↔ optimal control
Deep Neural Networks and Numerical ODE: Network Structure Design
Deep Neural Network
$x \to f_1 \to f_2 \to f_3 \to \cdots$: a deep network is a composition of layers $f_1, f_2, f_3, \dots$
A dynamic system?
Motivation
ResNet (Deep Residual Learning, CVPR 2016): $x_{n+1} = x_n + f(x_n)$
This is the forward Euler scheme (with step size 1) for the ODE $x_t = f(x)$.
- Weinan E. A Proposal on Machine Learning via Dynamical Systems. Communications in Mathematics and Statistics, 2017.
- Haber E, Ruthotto L. Stable Architectures for Deep Neural Networks. Inverse Problems, 2017.
- Chang B, Meng L, et al. Reversible Architectures for Arbitrarily Deep Residual Neural Networks. AAAI 2018.
- Lu Y, et al. Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations. ICML 2018.
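A minimal sketch of the ResNet/forward-Euler correspondence. It assumes a scalar state, a toy residual branch $f(x) = -x$, and an explicit step size $\Delta t = T/N$ for $N$ layers (the slide's ResNet uses step size 1); `resnet_forward` is a hypothetical helper, not code from the cited papers.

```python
import math


def resnet_forward(x0, f, n_layers, horizon=1.0):
    """Apply n_layers residual blocks x_{n+1} = x_n + dt * f(x_n).

    With dt = horizon / n_layers this is exactly the forward Euler scheme
    for the ODE x_t = f(x) on [0, horizon].
    """
    dt = horizon / n_layers
    x = x0
    for _ in range(n_layers):
        x = x + dt * f(x)  # one residual block == one Euler step
    return x


def f(x):
    # Toy residual branch; the continuous limit is x(T) = x0 * exp(-T).
    return -x


exact = math.exp(-1.0)
errors = {depth: abs(resnet_forward(1.0, f, depth) - exact) for depth in (10, 100, 1000)}
for depth, err in errors.items():
    print(f"depth={depth:5d}  |x_N - x(T)| = {err:.2e}")
```

Each tenfold increase in depth shrinks the error roughly tenfold, the first-order convergence expected of forward Euler; in this sense a very deep residual stack approaches the ODE.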
Theoretical convergence results have been established: Thorpe, Matthew, and Yves van Gennip. "Deep Limits of Residual Neural Networks." arXiv preprint arXiv:1810.11741 (2018).
A new generalization perspective from control: E, Weinan, Jiequn Han, and Qianxiao Li. "A Mean-Field Optimal Control Formulation of Deep Learning." arXiv preprint arXiv:1807.01083 (2018).
Depth Revolution
Deeper And Deeper
Going into infinitely many layers: a differential equation as an infinite-layer neural network
PolyNet (CVPR 2017)
Zhang X, Li Z, Loy C C, et al. PolyNet: A Pursuit of Structural Diversity in Very Deep Networks. CVPR 2017
Revisiting previous efforts in deep learning, we found that diversity, another aspect in network design that is relatively less explored, also plays a significant role
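PolyNet structure: $x_{n+1} = x_n + F(x_n) + F(F(x_n))$
Backward Euler scheme: $x_{n+1} = x_n + F(x_{n+1}) \Rightarrow x_{n+1} = (I - F)^{-1} x_n$
Approximating the operator $(I - F)^{-1}$ by $I + F + F^2 + \cdots$ and truncating after $F^2$ recovers the PolyNet structure.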
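For a linear operator the truncation can be checked numerically. The sketch assumes $F$ is a fixed 2 × 2 matrix with spectral radius below 1, so the series converges; in PolyNet $F$ is a learned nonlinear block, so this only illustrates the operator identity. All helper names are hypothetical.

```python
def matvec(A, x):
    """2 x 2 matrix-vector product."""
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]


def backward_euler_exact(F, x):
    """Solve (I - F) y = x exactly (Cramer's rule for a 2 x 2 system)."""
    a, b = 1.0 - F[0][0], -F[0][1]
    c, d = -F[1][0], 1.0 - F[1][1]
    det = a * d - b * c
    return [(d * x[0] - b * x[1]) / det, (-c * x[0] + a * x[1]) / det]


def neumann(F, x, order):
    """Truncated series (I + F + F^2 + ... + F^order) x."""
    y, term = list(x), list(x)
    for _ in range(order):
        term = matvec(F, term)  # next power F^k x
        y = [y[0] + term[0], y[1] + term[1]]
    return y


F = [[0.1, 0.05], [0.0, 0.2]]  # hypothetical linear "block"
x = [1.0, 1.0]
target = backward_euler_exact(F, x)
errs = {}
for order in (1, 2, 5, 10):
    approx = neumann(F, x, order)
    errs[order] = max(abs(p - q) for p, q in zip(approx, target))
    print(order, errs[order])
```

Order 2 is exactly $x + Fx + F(Fx)$, the PolyNet form; higher orders approach the implicit (backward Euler) step.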
FractalNet (ICLR 2017)
Larsson G, Maire M, Shakhnarovich G. FractalNet: Ultra-Deep Neural Networks without Residuals. ICLR 2017.
Runge-Kutta scheme (second order):
$x_{n+1} = k_1 x_n + k_2\big(k_3 x_n + f_1(x_n)\big) + f_2\big(k_3 x_n + f_1(x_n)\big)$
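One way to see why this two-stage form is a Runge-Kutta scheme: with the (assumed) choices $k_1 = k_2 = 1/2$, $k_3 = 1$, $f_1 = h\,g$ and $f_2 = \frac{h}{2}\,g$, it reproduces Heun's second-order method for $x' = g(x)$. The right-hand side `g` below is a hypothetical toy function; in FractalNet $f_1, f_2$ are learned branches.

```python
def fractal_rk2_step(x, k1, k2, k3, f1, f2):
    """Two-stage form from the slide:
    x_{n+1} = k1*x + k2*(k3*x + f1(x)) + f2(k3*x + f1(x))."""
    mid = k3 * x + f1(x)  # intermediate stage
    return k1 * x + k2 * mid + f2(mid)


def heun_step(x, g, h):
    """Heun's method (explicit trapezoidal rule), a 2nd-order Runge-Kutta scheme."""
    x_tilde = x + h * g(x)
    return x + 0.5 * h * (g(x) + g(x_tilde))


def g(x):
    return -x * x  # toy right-hand side of x' = g(x)


h, x = 0.1, 0.7
fractal = fractal_rk2_step(x, 0.5, 0.5, 1.0, lambda z: h * g(z), lambda z: 0.5 * h * g(z))
heun = heun_step(x, g, h)
print(fractal, heun)
```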
ODE: Infinite Layer Neural Network
Neural network → dynamic system: continuous limit
Dynamic system → neural network: numerical approximation
WRN, ResNeXt, Inception-ResNet, PolyNet, SENet, etc.: new schemes to approximate the right-hand-side term. Why not change the way we discretize $u_t$?
Lu, Yiping, et al. "Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations." ICML 2018
Experiment
ResNet: $x_{n+1} = x_n + f(x_n)$, a forward Euler discretization of $x_t = f(x)$
@Linear Multi-step Residual Network
Linear multi-step scheme for $x_t = f(x)$: $x_{n+1} = (1 - k_n)x_n + k_n x_{n-1} + f(x_n)$
Linear Multi-step Residual Network
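A minimal sketch of a linear multi-step residual stack with scalar states. The branches $f_n$ and coefficients $k_n$ are plain Python values here (in LM-ResNet they are trained), and starting the recursion with one ordinary residual step, because $x_{-1}$ does not exist, is our assumption.

```python
def lm_resnet_forward(x0, blocks, ks):
    """Linear multi-step residual stack.

    x_{n+1} = (1 - k_n) * x_n + k_n * x_{n-1} + f_n(x_n)
    blocks: residual branches f_0, f_1, ...
    ks:     coefficients k_1, k_2, ... (one fewer than blocks)
    """
    if len(ks) != len(blocks) - 1:
        raise ValueError("need one k_n per block after the first")
    # No x_{-1} exists, so the first block is a plain residual step.
    x_prev, x = x0, x0 + blocks[0](x0)
    for f, k in zip(blocks[1:], ks):
        x_prev, x = x, (1 - k) * x + k * x_prev + f(x)
    return x


# With every k_n = 0 the stack reduces to an ordinary ResNet (here 1.1 ** 4 up to rounding).
grow = lambda x: 0.1 * x
print(lm_resnet_forward(1.0, [grow] * 4, [0.0, 0.0, 0.0]))
print(lm_resnet_forward(0.0, [lambda x: 1.0] * 3, [0.5, 0.5]))
```

The only extra cost over a ResNet is one scalar $k_n$ per block and keeping the previous state in memory.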
[Figure: (a) ResNet block; (b) Linear Multi-step ResNet block, which combines the two previous states with scales 1 − k and k]
Only one more parameter ($k_n$) per block
Experiment
[Figure: (a) ResNet, (b) LM-ResNet]
Explaining the performance boost via modified equations
ResNet: $x_{n+1} = x_n + \Delta t\, f(x_n)$, with modified equation $\dot u + \frac{\Delta t}{2}\ddot u = f(u)$
LM-ResNet: $x_{n+1} = (1 - k_n)x_n + k_n x_{n-1} + \Delta t\, f(x_n)$, with modified equation $(1 + k_n)\dot u + (1 - k_n)\frac{\Delta t}{2}\ddot u_n = f(u)$
[1] Dong B, Jiang Q, Shen Z. Image restoration: wavelet frame shrinkage, nonlinear evolution PDEs, and beyond. Multiscale Modeling and Simulation: A SIAM Interdisciplinary Journal, 2017.
[2] Su W, Boyd S, Candès E J. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights. Advances in Neural Information Processing Systems, 2014.
[3] Wibisono A, Wilson A, Jordan M I. A variational perspective on accelerated methods in optimization. Proceedings of the National Academy of Sciences, 2016.
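The LM-ResNet modified equation can be sanity-checked numerically: plug a smooth $u(t)$ into the left-hand side of the scheme, divide by $\Delta t$, and subtract $(1 + k)\dot u + (1 - k)\frac{\Delta t}{2}\ddot u$. A Taylor expansion predicts a leftover of about $\frac{(1 + k)\Delta t^2}{6}\dddot u$. The test function $u = \sin$, the point $t$ and the value of $k$ are arbitrary choices for illustration.

```python
import math


def lm_mismatch(u, du, d2u, t, dt, k):
    """(u(t+dt) - (1-k) u(t) - k u(t-dt)) / dt minus the modified-equation
    left-hand side (1+k) u'(t) + (1-k) dt/2 u''(t)."""
    scheme = (u(t + dt) - (1 - k) * u(t) - k * u(t - dt)) / dt
    modified = (1 + k) * du(t) + (1 - k) * dt / 2 * d2u(t)
    return scheme - modified


t, k = 0.3, -0.4
neg_sin = lambda s: -math.sin(s)  # second derivative of sin
r_coarse = lm_mismatch(math.sin, math.cos, neg_sin, t, 0.02, k)
r_fine = lm_mismatch(math.sin, math.cos, neg_sin, t, 0.01, k)
print(r_coarse, r_fine, r_coarse / r_fine)
```

Halving $\Delta t$ cuts the mismatch by about 4, so the two sides agree to order $\Delta t^2$ per unit time; setting $k = 0$ gives the ResNet case $\dot u + \frac{\Delta t}{2}\ddot u$.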
Plot The Momentum
$(1 + k_n)\dot u + (1 - k_n)\frac{\Delta t}{2}\ddot u_n + o(\Delta t^3) = f(u)$
Learn a momentum: $x_{n+1} = (1 - k_n)x_n + k_n x_{n-1} + \Delta t\, f(x_n)$
Connection to stochastic dynamics
Can noise help avoid overfitting?
Gastaldi X. Shake-Shake regularization. ICLR Workshop Track 2017.
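Shake-Shake regularization: apply data augmentation techniques to internal representations.
$x_{n+1} = x_n + \eta f_1(x_n) + (1 - \eta) f_2(x_n), \quad \eta \sim U(0, 1)$
$\quad = x_n + f_2(x_n) + \frac{1}{2}\big(f_1(x_n) - f_2(x_n)\big) + \big(\eta - \frac{1}{2}\big)\big(f_1(x_n) - f_2(x_n)\big)$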
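A small sketch checking the Shake-Shake decomposition: the random update equals a deterministic drift (the average of the two branches) plus a zero-mean noise term $(\eta - \frac{1}{2})(f_1 - f_2)$. The branches `f1`, `f2` are hypothetical scalar functions standing in for learned residual branches.

```python
import random


def shake_shake(x, f1, f2, eta):
    """x_{n+1} = x_n + eta f1(x_n) + (1 - eta) f2(x_n)."""
    return x + eta * f1(x) + (1 - eta) * f2(x)


def shake_shake_split(x, f1, f2, eta):
    """Same update written as deterministic average + zero-mean noise."""
    d = f1(x) - f2(x)
    return x + f2(x) + 0.5 * d + (eta - 0.5) * d


f1 = lambda x: 2.0 * x         # hypothetical branch 1
f2 = lambda x: -0.5 * x + 1.0  # hypothetical branch 2
rng = random.Random(0)
x = 1.3

etas = [rng.uniform(0.0, 1.0) for _ in range(20000)]
max_gap = max(abs(shake_shake(x, f1, f2, e) - shake_shake_split(x, f1, f2, e)) for e in etas)
mc_mean = sum(shake_shake(x, f1, f2, e) for e in etas) / len(etas)
drift = x + 0.5 * (f1(x) + f2(x))  # E[eta] = 1/2
print(max_gap, mc_mean, drift)
```

The Monte Carlo mean matches the drift, which is what lets the noisy block be read as a discretized stochastic dynamic system.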
Huang G, Sun Y, Liu Z, et al. Deep Networks with Stochastic Depth. ECCV 2016.
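Deep networks with stochastic depth: “To reduce the effective length of a neural network during training, we randomly skip layers entirely.”
$x_{n+1} = x_n + \eta_n f(x_n) = x_n + \mathbb{E}\eta_n\, f(x_n) + (\eta_n - \mathbb{E}\eta_n) f(x_n)$, where $\eta_n$ is a Bernoulli random variable with $P(\eta_n = 1) = p_n$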
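A minimal sketch of one stochastic-depth block with a Bernoulli gate, showing that its average over the gate is the drift $x_n + p_n f(x_n)$ (the same scaling stochastic depth uses at test time). The residual branch `f` and the survival probability are hypothetical values.

```python
import random


def stochastic_depth_block(x, f, p, rng):
    """Training-time block: keep the residual branch with probability p."""
    eta = 1.0 if rng.random() < p else 0.0  # Bernoulli(p) gate
    return x + eta * f(x)


def expected_block(x, f, p):
    """Drift part x + E[eta] f(x) = x + p f(x)."""
    return x + p * f(x)


f = lambda x: 0.5 * x  # hypothetical residual branch
rng = random.Random(1)
x, p = 2.0, 0.8
samples = [stochastic_depth_block(x, f, p, rng) for _ in range(20000)]
mc_mean = sum(samples) / len(samples)
print(mc_mean, expected_block(x, f, p))
```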
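$dX(t) = f\big(X(t), a(t)\big)\,dt + g\big(X(t), t\big)\,dB_t, \quad X(0) = X_0$
Since the training objective is an expectation, $\mathbb{E}_{\text{data}}\big[\text{loss}(X(T))\big]$, the numerical scheme only needs to converge weakly!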
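A small illustration of why weak convergence is enough. It assumes a toy SDE $dX = -X\,dt + \sigma\,dB$ with $X(0) = 1$ and uses the expected terminal state as the "loss". Replacing Gaussian increments (Euler-Maruyama) by random $\pm\sqrt{\Delta t}$ coin flips gives paths that do not track Brownian paths, yet the expectation is reproduced, much as a random 0/1 gate can play the role of noise in a network.

```python
import math
import random


def euler_mean(x0, sigma, T, n_steps, n_paths, increment, seed=0):
    """Monte Carlo estimate of E[X(T)] for dX = -X dt + sigma dB,
    using Euler steps whose noise increments come from `increment`."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            x = x - x * dt + sigma * increment(rng, dt)
        total += x
    return total / n_paths


def gaussian(rng, dt):
    # Euler-Maruyama: Brownian increment drawn from N(0, dt)
    return rng.gauss(0.0, math.sqrt(dt))


def coin(rng, dt):
    # Weak Euler: +/- sqrt(dt) with equal probability (same mean and variance)
    return math.sqrt(dt) if rng.random() < 0.5 else -math.sqrt(dt)


exact = math.exp(-1.0)  # E[X(1)] when X0 = 1
m_gauss = euler_mean(1.0, 0.5, 1.0, 100, 5000, gaussian)
m_coin = euler_mean(1.0, 0.5, 1.0, 100, 5000, coin)
print(exact, m_gauss, m_coin)
```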
We need $1 - 2p_n = O(\sqrt{\Delta t})$.
Same stochastic strategy as before, applied to LM-ResNet:
$(1 + k_n)\dot u + (1 - k_n)\frac{\Delta t}{2}\ddot u_n + o(\Delta t^3) = f(u) + g(u)\,dW_t$
Conclusion
@Beyond Finite Layer Neural Network
Neural network ↔ dynamic system
Stochastic learning ↔ stochastic dynamic system
New discretization: LM-ResNet
◦ Original version: LM-ResNet56 beats ResNet110
◦ Stochastic-depth version: LM-ResNet110 beats ResNet1202
Explanation: modified equation
Earlier Evidence: LISTA
Gregor, K., and LeCun, Y. Learning fast approximations of sparse coding. In ICML 2010 (pp. 399-406).
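ISTA: $Z(k+1) = h\big(W_e X + S Z(k)\big), \quad Z(0) = 0$, where $h$ is a shrinkage (soft-thresholding) function
Unrolling the iteration for $Z(k + 1)$ into layers gives LISTA: unrolled dynamics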
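A minimal sketch of the unrolled iteration $Z(k+1) = h_\theta(W_e X + S Z(k))$. For illustration the weights are set to the ISTA values for $\min_Z \frac{1}{2}\|X - DZ\|^2 + \lambda\|Z\|_1$, namely $W_e = D^\top/L$, $S = I - D^\top D/L$, $\theta = \lambda/L$; LISTA keeps this layer structure but trains $W_e$, $S$ (and $\theta$) instead. The dictionary `D`, `lam`, `L` and the input `X` are hypothetical.

```python
def soft_threshold(v, theta):
    """Componentwise shrinkage h_theta(v) = sign(v) * max(|v| - theta, 0)."""
    return [max(abs(a) - theta, 0.0) * (1.0 if a > 0 else -1.0) for a in v]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def unrolled(X, W_e, S, theta, n_layers):
    """n_layers of Z(k+1) = h_theta(W_e X + S Z(k)), starting from Z(0) = 0."""
    b = matvec(W_e, X)  # computed once, fed to every layer
    Z = [0.0] * len(S)
    for _ in range(n_layers):
        Z = soft_threshold([bi + si for bi, si in zip(b, matvec(S, Z))], theta)
    return Z


# ISTA weights: W_e = D^T / L, S = I - D^T D / L, theta = lam / L, with L >= ||D^T D||.
D = [[1.0, 0.3], [0.0, 1.0]]  # hypothetical dictionary
lam, L = 0.2, 2.0
n = len(D)
Dt = [[D[j][i] for j in range(n)] for i in range(n)]
DtD = [[sum(Dt[i][m] * D[m][j] for m in range(n)) for j in range(n)] for i in range(n)]
W_e = [[Dt[i][j] / L for j in range(n)] for i in range(n)]
S = [[(1.0 if i == j else 0.0) - DtD[i][j] / L for j in range(n)] for i in range(n)]

X = [1.5, -0.4]
Z_ista = unrolled(X, W_e, S, lam / L, n_layers=200)
print(Z_ista)
```

With ISTA weights many layers are needed to reach the fixed point; the LISTA idea is that learned weights can give a good code with only a few layers.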
Earlier Evidence: TRD
Learning a diffusion process for denoising
Chen Y, Yu W, Pock T. On learning optimized reaction diffusion processes for effective image restoration. CVPR 2015.
[Figure: average PSNR on a dataset of 68 images]
Unrolled Dynamics
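TRD learns, at each stage, filters $K_i$ and influence functions $\phi_i$ in an update roughly of the form $u \leftarrow u - \Delta t\big(\sum_i K_i^\top \phi_i(K_i u) + \lambda(u - f)\big)$ (our paraphrase; see the paper for the exact parametrization). The sketch fixes a single forward-difference filter and a linear $\phi$ (assumptions), which turns each stage into an explicit heat-equation step plus a data-fidelity reaction term; the signal and noise pattern are hypothetical.

```python
def forward_diff(u):
    """(K u)_i = u_{i+1} - u_i with periodic boundary."""
    n = len(u)
    return [u[(i + 1) % n] - u[i] for i in range(n)]


def forward_diff_adjoint(v):
    """(K^T v)_i = v_{i-1} - v_i, the adjoint of forward_diff."""
    n = len(v)
    return [v[(i - 1) % n] - v[i] for i in range(n)]


def rd_stage(u, f, dt, lam, phi=lambda z: z):
    """One reaction-diffusion stage u <- u - dt * (K^T phi(K u) + lam * (u - f))."""
    diffusion = forward_diff_adjoint([phi(z) for z in forward_diff(u)])
    return [ui - dt * (di + lam * (ui - fi)) for ui, di, fi in zip(u, diffusion, f)]


clean = [1.0] * 16
noisy = [1.0 + 0.5 * (-1) ** i for i in range(16)]  # hypothetical noise pattern
u = noisy
for _ in range(5):  # five unrolled stages
    u = rd_stage(u, noisy, dt=0.2, lam=0.1)
residual = max(abs(a - b) for a, b in zip(u, clean))
print(residual)
```

In TRD the stage parameters are trained end to end, so each unrolled stage can use a different, data-adapted diffusion.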
Recent Evidence: Optimization Algorithm Inspired DNN
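Deep neural network as an optimization algorithm: a layer $x_{k+1} = \phi(W x_k)$ is read as one iteration of an optimizer, like gradient descent $x_{k+1} = x_k - \nabla F(x_k)$.
Faster algorithms result in better deep neural networks: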
Heavy Ball Net: $x_{k+1} = T(x_k) + x_k - x_{k-1}$
Accelerated GD Net: $x_{k+1} = \sum_{j=0}^{k} \alpha_{k+1,j}\, T(x_j) + \beta\Big(x_k - \sum_{j=0}^{k} h_{k+1,j}\, x_j\Big)$
Li H, Yang Y, Chen D and Lin Z. Optimization Algorithm Inspired Deep Neural Network Structure Design. ACML 2018.
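To see why a faster algorithm is a tempting template, compare plain gradient descent with Polyak's heavy-ball method on an ill-conditioned quadratic. Here $T$ is an ordinary gradient step (in the networks above it is a learned layer), and we add a momentum weight $\beta$, as in the classical heavy-ball method; $\beta = 1$ gives the slide's form. The quadratic and step sizes are assumptions chosen for illustration.

```python
import math

MU, L_SMOOTH = 1.0, 100.0  # curvatures of the toy quadratic (assumed)


def grad(x):
    # F(x) = 0.5 * (MU * x[0]**2 + L_SMOOTH * x[1]**2)
    return [MU * x[0], L_SMOOTH * x[1]]


def T(x, alpha):
    """One gradient-descent step, the operator T in the slide's notation."""
    return [xi - alpha * gi for xi, gi in zip(x, grad(x))]


def gradient_descent(x0, alpha, n):
    x = list(x0)
    for _ in range(n):
        x = T(x, alpha)
    return x


def heavy_ball(x0, alpha, beta, n):
    """x_{k+1} = T(x_k) + beta * (x_k - x_{k-1})."""
    x_prev, x = list(x0), list(x0)
    for _ in range(n):
        x_prev, x = x, [t + beta * (xi - xp) for t, xi, xp in zip(T(x, alpha), x, x_prev)]
    return x


# Classical heavy-ball tuning for curvatures in [MU, L_SMOOTH].
sq_mu, sq_l = math.sqrt(MU), math.sqrt(L_SMOOTH)
alpha_hb = 4.0 / (sq_l + sq_mu) ** 2
beta_hb = ((sq_l - sq_mu) / (sq_l + sq_mu)) ** 2

x0 = [1.0, 1.0]
norm = lambda v: math.sqrt(sum(c * c for c in v))
err_gd = norm(gradient_descent(x0, 1.0 / L_SMOOTH, 100))
err_hb = norm(heavy_ball(x0, alpha_hb, beta_hb, 100))
print(err_gd, err_hb)
```

After the same number of "layers", the momentum iteration is many orders of magnitude closer to the minimizer.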
X. Wang, et al. "Non-local neural networks." CVPR 2018.
Recent Evidence: Nonlocal DNN
[Figure: residual (ResNet) block vs. nonlocal block]
• “Kinetics” data set: 246k training videos and 20k validation videos.
• Task: classification involving 400 human action categories
Instability when stacking multiple nonlocal blocks!
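A toy, scalar-feature sketch of an embedded-Gaussian nonlocal block, $z_i = x_i + w\,y_i$ with $y_i = \sum_j \mathrm{softmax}_j(x_i x_j)\,x_j$. Taking the embeddings as identity maps and $W_z$ as a scalar $w$ are simplifying assumptions. Because the block adds a weighted average of the features rather than a difference, stacking identical blocks can amplify the signal; this illustrates the kind of growth involved, not the reported Kinetics experiment.

```python
import math


def nonlocal_block(x, w):
    """z_i = x_i + w * sum_j softmax_j(x_i * x_j) * x_j (identity embeddings)."""
    z = []
    for xi in x:
        scores = [xi * xj for xj in x]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        c = sum(weights)  # normalization C(x)
        yi = sum(wt * xj for wt, xj in zip(weights, x)) / c
        z.append(xi + w * yi)
    return z


x = [1.0, 2.0]
sizes = []
for _ in range(10):  # stack ten identical blocks
    x = nonlocal_block(x, w=1.0)
    sizes.append(max(abs(v) for v in x))
print(sizes)
```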
Recent Evidence: Nonlocal DNN as Nonlocal Diffusion
Nonlocal neural network ↔ nonlocal diffusion ↔ nonlocal Markov jump process
Design a new stable block
Tao Y, Sun Q, Du Q, et al. Nonlocal Neural Networks, Nonlocal Diffusion and Nonlocal Modeling. NeurIPS 2018.
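A generic explicit nonlocal-diffusion step, $u_i \leftarrow u_i + \Delta t \sum_j w_{ij}(u_j - u_i)$ with nonnegative, row-normalized softmax weights. This is our sketch of why a diffusion form is stable, not necessarily the exact block of Tao et al.: for $0 \le \Delta t \le 1$ every output is a convex combination of the inputs, so a maximum principle holds and stacking many blocks cannot blow up.

```python
import math


def softmax_weights(u):
    """Row-stochastic affinities w_ij = softmax_j(u_i * u_j)."""
    rows = []
    for ui in u:
        s = [ui * uj for uj in u]
        m = max(s)
        e = [math.exp(v - m) for v in s]
        c = sum(e)
        rows.append([v / c for v in e])
    return rows


def nonlocal_diffusion_block(u, dt):
    """u_i <- u_i + dt * sum_j w_ij (u_j - u_i); convex combination for 0 <= dt <= 1."""
    w = softmax_weights(u)
    return [ui + dt * sum(wij * (uj - ui) for wij, uj in zip(row, u))
            for ui, row in zip(u, w)]


u = [1.0, 2.0, -0.5]
lo, hi = min(u), max(u)
for _ in range(50):  # stack fifty identical blocks
    u = nonlocal_diffusion_block(u, dt=0.5)
print(u)
```

Compared with a block that adds the weighted average $y_i$ itself, the diffusion form adds $y_i - u_i$, and the values stay within their initial range.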
Deep Neural Networks and Numerical PDE: Data-Driven Physical Law Discovery
Can we learn principles (e.g. PDEs) from data?
PDE-Net: Learning PDEs from Data
[Figure: dynamics of actin in the immunocytoskeleton; dynamics of mitochondria]
Long Z et al. PDE-Net: Learning PDEs from Data. ICML 2018.
Credit: Kebin Shi, Physics@PKU
Preliminary attempt:
◦ Combine deep learning and numerical PDEs
Objectives:
◦ Predictive power (deep learning)
◦ Transparency (numerical PDEs)
S. Sato et al., Siggraph 2018
PDE-Net: a flexible and transparent deep network
Assuming: $\frac{\partial u}{\partial t} = F(x, u, \nabla u, \nabla^2 u, \dots)$
$\delta t$-block; PDE-Net: multiple $\delta t$-blocks
Prior knowledge on $F$:
• Type of the PDE
• Maximum order
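A minimal sketch of the $\delta t$-block idea: each block is one forward-Euler step $u \leftarrow u + \delta t\,F(\cdot)$ with spatial derivatives approximated by convolution stencils, and stacking blocks predicts further ahead in time. In PDE-Net the stencils (under moment constraints) and $F$ are learned; here both are fixed by assumption to the heat equation $u_t = \nu u_{xx}$ with a $[1, -2, 1]/\delta x^2$ stencil on a periodic grid.

```python
import math


def laplacian_periodic(u, dx):
    """Frozen 3-point stencil [1, -2, 1] / dx^2 with periodic boundary."""
    n = len(u)
    return [(u[(i - 1) % n] - 2.0 * u[i] + u[(i + 1) % n]) / dx ** 2 for i in range(n)]


def delta_t_block(u, dt, dx, nu):
    """One forward-Euler step u <- u + dt * F, with F = nu * u_xx assumed known."""
    lap = laplacian_periodic(u, dx)
    return [ui + dt * nu * li for ui, li in zip(u, lap)]


def pde_net_forward(u0, n_blocks, dt, dx, nu):
    """Stack n_blocks delta-t blocks to predict u at time n_blocks * dt."""
    u = list(u0)
    for _ in range(n_blocks):
        u = delta_t_block(u, dt, dx, nu)
    return u


n, nu, dt = 32, 0.1, 0.05
dx = 2.0 * math.pi / n
u0 = [1.0 + math.sin(i * dx) for i in range(n)]
u1 = pde_net_forward(u0, n_blocks=20, dt=dt, dx=dx, nu=nu)  # predict t = 1
# Exact solution of u_t = nu * u_xx here: 1 + exp(-nu * t) * sin(x)
print(u1[8] - 1.0, math.exp(-nu * 1.0))
```

The step ratio $\nu\,\delta t/\delta x^2 \approx 0.13$ stays below the explicit-scheme stability limit of $1/2$.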
Constraints on kernels (granting transparency)
◦ Moment matrix (related to vanishing moments in wavelets)
◦ We can approximate any differential operator at any prescribed order by constraining $M(q)$
◦ For example: approximation of $\frac{\partial f}{\partial x}$ with a 3 × 3 kernel
[Figure: moment-matrix patterns for 1st order learnable, 2nd order learnable, and 1st order frozen kernels]
B. Dong, Q. Jiang and Z. Shen, Multiscale Modeling & Simulation, 2017
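A sketch of the moment matrix of an $N \times N$ kernel, using the definition $M(q)_{ij} = \frac{1}{i!\,j!}\sum_{k,l} k^i l^j\, q[k,l]$ with offsets $k, l \in \{-\frac{N-1}{2}, \dots, \frac{N-1}{2}\}$, as we understand it from PDE-Net. Pairing the row offset with $x$ is our convention choice. A central difference along $x$ has $M_{10} = 1$ and all other entries zero, i.e. it approximates $\delta x\,\partial/\partial x$; constraining a few low-order entries of $M(q)$ fixes the operator and its order, while the remaining entries can be left learnable.

```python
from math import factorial


def moment_matrix(q):
    """M(q)_{ij} = 1/(i! j!) * sum_{k,l} k^i l^j q[k, l] for an N x N kernel q.

    Offsets k (rows, paired with x) and l (columns, paired with y) run
    from -(N-1)/2 to (N-1)/2.
    """
    n = len(q)
    r = (n - 1) // 2
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for a in range(n):
                for b in range(n):
                    k, l = a - r, b - r
                    s += (k ** i) * (l ** j) * q[a][b]
            M[i][j] = s / (factorial(i) * factorial(j))
    return M


# Central difference along x: (f(x + dx) - f(x - dx)) / 2, before dividing by dx.
q_dx = [[0.0, -0.5, 0.0],
        [0.0, 0.0, 0.0],
        [0.0, 0.5, 0.0]]
M = moment_matrix(q_dx)
for row in M:
    print(row)
```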
Numerical experiments: data set generation
◦ Convection-diffusion equation (linear)
◦ Diffusion with a nonlinear source (nonlinear)
◦ Initialization: random function with frequency ≤ 9 and 6
◦ Assumptions on 𝐹
◦ Linear:
◦ Nonlinear
Numerical experiments: results
◦ Prediction: linear (5 × 5 and 7 × 7 filters)
[Figure: prediction with 5 × 5 and 7 × 7 filters; learnable filters (orange) vs. frozen filters (blue)]
◦ Model estimation: linear
◦ Prediction and model estimation: nonlinear (7 × 7 filters)
PDE-Net 2.0: Numeric-Symbolic Hybrid Representation
Symbolic network (granting transparency)
Assuming: $\frac{\partial u}{\partial t} = F(u, \nabla u, \nabla^2 u, \dots)$
Prior knowledge on $F$:
• Addition and multiplication of the terms
• Maximum order
Long Z, Lu Y and Dong B. “PDE-Net 2.0: Learning PDEs from Data with A Numeric-Symbolic Hybrid Deep Network”, arXiv:1812.04426, 2018
Motivated by EQL:
• Sahoo, S. S.; Lampert, C. H. & Martius, G. ICML 2018.
• Martius, Georg, and Christoph H. Lampert. arXiv preprint arXiv:1610.02995 (2016).
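[Figure: symbolic network. Inputs $u, v, w, \dots$ are carried forward by identity connections; each hidden layer applies affine maps $W_1 \cdot + b_1$, $W_2 \cdot + b_2$ to produce $(\eta_1, \xi_1)$, $(\eta_2, \xi_2)$, combines each pair with $f(\cdot,\cdot)$ and appends the result; a final affine map $W_3 \cdot + b_3$ outputs $F(u, v, w, \dots)$]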
More constraints:
• Pseudo-upwinding
• Sparsity on moment matrices
• Sparsity on the symbolic network
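A toy symbolic network with hand-set weights, to show how affine maps, a product unit and identity connections can express a PDE right-hand side exactly. Taking $f(\eta, \xi) = \eta\,\xi$ follows the "addition and multiplication of the terms" prior but is an assumption here; in PDE-Net 2.0 the weights are learned and sparsified. The 1D Burgers right-hand side $-u\,u_x + \nu u_{xx}$ is an illustrative target.

```python
def symnet(inputs, layers, w_out, b_out):
    """Toy symbolic network.

    Each hidden layer forms two affine combinations (eta, xi) of the current
    features, multiplies them, and appends the product; identity connections
    keep all earlier features. A final affine map gives F(u, v, w, ...).
    """
    feats = list(inputs)
    for (w1, b1), (w2, b2) in layers:
        eta = sum(a * f for a, f in zip(w1, feats)) + b1
        xi = sum(a * f for a, f in zip(w2, feats)) + b2
        feats.append(eta * xi)
    return sum(a * f for a, f in zip(w_out, feats)) + b_out


# Hand-set weights encoding 1D Burgers: F = -u * u_x + nu * u_xx.
nu = 0.05
layers = [(([1.0, 0.0, 0.0], 0.0),   # eta_1 = u
           ([0.0, 1.0, 0.0], 0.0))]  # xi_1  = u_x
w_out = [0.0, 0.0, nu, -1.0]         # features: [u, u_x, u_xx, u * u_x]
print(symnet([2.0, 3.0, 4.0], layers, w_out, 0.0))
```

Reading off the nonzero weights recovers the symbolic form of $F$, which is what makes the network transparent.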
Weaker assumption on $F$: unknown type
[Figure: Burgers' equation with $\nu = 0.05$: prediction, model recovery, and remaining weights of $u, v$]
Application in Image Processing: Blind Image Restoration
Deep Learning For Restoration
Network 1: $\sigma = 25$
Network 2: $\sigma = 35$
One noise level, one net
What We Want
One model for all noise levels
What happens at a high noise level?
BM3D; DnCNN (Zhang et al., TIP 2017)
Fails!
PDEs in image processing: input → processing → output
Zhang X, Lu Y et al., Dynamically Unfolding Recurrent Restorer: A Moving Endpoint Control Method for Image Restoration ICLR 2019.
Moving Endpoint Control
Early stopping is a regularization: the terminal time needs to be learned.
Can we train it?
Terminal time as a variable to train
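A small experiment on why the terminal time should adapt to the input: run explicit heat diffusion on a noisy 1D signal and find the step that minimizes the error to the clean signal. The signal, the alternating noise pattern and diffusion as the "restorer" are assumptions for illustration. The oracle uses the clean signal, which is unavailable at test time; DURR instead learns a policy that decides when to stop.

```python
import math

N = 64
CLEAN = [math.sin(2.0 * math.pi * i / N) for i in range(N)]


def diffuse(u, r=0.1):
    """One explicit heat-equation step with periodic boundary (the toy restorer)."""
    n = len(u)
    return [u[i] + r * (u[(i - 1) % n] - 2.0 * u[i] + u[(i + 1) % n]) for i in range(n)]


def oracle_stopping_time(sigma, max_steps=60):
    """Terminal step that minimizes the error to the clean signal."""
    noisy = [c + sigma * (-1) ** i for i, c in enumerate(CLEAN)]
    u, best_t, best_err = noisy, 0, float("inf")
    for t in range(max_steps + 1):
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, CLEAN)))
        if err < best_err:
            best_t, best_err = t, err
        u = diffuse(u)
    return best_t


t_low = oracle_stopping_time(0.1)
t_high = oracle_stopping_time(1.0)
print(t_low, t_high)
```

Higher noise calls for a later stop, so a single fixed depth cannot be optimal for every noise level.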
Our Approach: Dynamically Unfolding Recurrent Restorer (DURR)
Restoration Unit
Policy Unit
A Good Policy Leads To A Good Restorer
Given A Policy → Train The Restorer
- A good policy leads to a better restorer
- A good policy leads to better generalization
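The division of labor can be sketched as a loop: the same restoration unit is applied repeatedly and the policy unit decides when to stop, so the number of unfolding steps adapts to the input. In DURR both units are learned networks (the policy via RL); in this minimal sketch both are hand-written, hypothetical stand-ins (a diffusion step and a threshold on a crude noise proxy).

```python
import math
import random

def restoration_unit(u, r=0.25):
    """Hypothetical stand-in for the learned restoration unit:
    one explicit heat-equation step on a periodic 1D signal."""
    n = len(u)
    return [u[i] + r * (u[(i + 1) % n] - 2 * u[i] + u[i - 1]) for i in range(n)]

def noise_proxy(u):
    """Crude noise estimate (assumption): RMS of first differences."""
    n = len(u)
    return math.sqrt(sum((u[(i + 1) % n] - u[i]) ** 2 for i in range(n)) / n)

def policy_unit(u, threshold=0.05):
    """Hypothetical policy: True means keep unfolding, False means stop."""
    return noise_proxy(u) > threshold

def unfold(x, max_loops=500):
    """Apply the SAME restoration unit repeatedly until the policy says stop."""
    u, loops = x, 0
    while loops < max_loops and policy_unit(u):
        u = restoration_unit(u)
        loops += 1
    return u, loops

random.seed(2)
n = 128
clean = [math.sin(2 * math.pi * i / n) for i in range(n)]
base_noise = [random.gauss(0.0, 1.0) for _ in range(n)]
loops_by_sigma = {}
for sigma in (0.1, 0.5):
    noisy = [c + sigma * e for c, e in zip(clean, base_noise)]
    _, loops_by_sigma[sigma] = unfold(noisy)
print(loops_by_sigma)
```

With the same noise pattern scaled by two different σ, the noisier input should receive more loops: one model, with the depth chosen per input.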
DURR Model
Discretize: Turn It Into An RL Problem
Treat the objective as the reward
Other choices for the policy are also possible:
- A good no-reference image quality assessment
- A classifier
- A fixed number of loops according to the noise level
- A person
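One natural way to turn the objective into a reward, assumed here rather than taken from the paper, is the per-step decrease of the training loss: $r_t = \ell(x_t) - \ell(x_{t+1})$ with $\ell$ the MSE to the ground truth. The return then telescopes to $\ell(x_0) - \ell(x_T)$, so maximizing it minimizes the loss at the stopping step, and a negative reward flags a step that should not have been taken. A minimal sketch on a hypothetical trajectory:

```python
def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def step_rewards(trajectory, ground_truth):
    """r_t = loss(x_t) - loss(x_{t+1}); the sum telescopes to loss(x_0) - loss(x_T)."""
    losses = [mse(x, ground_truth) for x in trajectory]
    return [losses[t] - losses[t + 1] for t in range(len(losses) - 1)]

# Hypothetical restorer outputs x_0, x_1, x_2, x_3 for one training image
ground_truth = [1.0, 2.0, 3.0]
trajectory = [[1.6, 1.5, 3.4], [1.3, 1.8, 3.2], [1.1, 1.9, 3.1], [1.2, 2.1, 2.8]]
rewards = step_rewards(trajectory, ground_truth)
print([round(r, 4) for r in rewards])
```

Here the last step makes the image worse, so its reward is negative: a policy trained on these rewards is pushed to stop one step earlier.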
DURR Model: Results
Noise Level Not Seen In Training
DURR
DnCNN
JPEG Deblocking
Ground Truth | DnCNN-B | Our DURR
One Model For All Quality Factors (QF)
Application In Medical Imaging
UNROLLING REVISITED
Sun, Li, and Xu. Deep ADMM-net for compressive sensing MRI. NIPS 2016.
Unrolled Dynamics: ADMM-Net
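ADMM-Net unrolls a fixed number of ADMM iterations into network stages with stage-wise parameters. A minimal sketch of the unrolled skeleton, assuming a toy 1D problem $\min_x \tfrac12\|Mx - y\|^2 + \lambda\|Cx\|_1$ with $M$ a 0/1 sampling mask and $C$ an orthonormal DCT. In ADMM-Net the transforms, shrinkage functions and parameters are learned and sampling happens in k-space; here everything is fixed by hand and all names are hypothetical.

```python
import math
import random

def dct(x):
    """Orthonormal DCT-II, O(N^2), for illustration only."""
    n = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n)
            * sum(x[j] * math.cos(math.pi * (j + 0.5) * k / n) for j in range(n))
            for k in range(n)]

def idct(c):
    """Inverse (= transpose) of the orthonormal DCT-II."""
    n = len(c)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * c[k] * math.cos(math.pi * (j + 0.5) * k / n)
                for k in range(n)) for j in range(n)]

def soft(v, t):
    """Soft-thresholding (shrinkage), the proximal map of t * |.|."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def unrolled_admm(y, mask, stages):
    """One loop iteration = one network stage; stages = [(rho_k, lam_k), ...]."""
    n = len(y)
    z, u = [0.0] * n, [0.0] * n
    x = [m * yi for m, yi in zip(mask, y)]
    for rho, lam in stages:
        v = idct([zi - ui for zi, ui in zip(z, u)])
        # x-update: (M + rho I) x = M y + rho C^T (z - u); diagonal because C is orthonormal
        x = [(m * yi + rho * vi) / (m + rho) for m, yi, vi in zip(mask, y, v)]
        cx = dct(x)
        z = [soft(ci + ui, lam / rho) for ci, ui in zip(cx, u)]  # z-update: shrinkage
        u = [ui + ci - zi for ui, ci, zi in zip(u, cx, z)]       # scaled dual update
    return x

# Check: with a full mask the problem has the closed form x* = C^T soft(C y, lam)
random.seed(3)
y = [random.gauss(0.0, 1.0) for _ in range(16)]
x_admm = unrolled_admm(y, [1] * 16, [(1.0, 0.05)] * 80)
x_closed = idct([soft(c, 0.05) for c in dct(y)])
print(max(abs(a - b) for a, b in zip(x_admm, x_closed)))
```

Passing a partial `mask` turns this into a toy compressive-sensing reconstruction; making each `(rho_k, lam_k)` pair trainable is what turns the fixed iteration count into network depth.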
Further Application of Unrolling: Task-Based Image Reconstruction
D. Wu et al., End-to-End Lung Nodule Detection in Computed Tomography, MICCAI Workshop, 2018. (arXiv:1711.02074)
Two-step approach: imaging and diagnosis
Problems of the two-step approach:
◦ How to evaluate the quality of the reconstructed image.
◦ Redundancy in the data for a specific task.
Can we make it end-to-end, and does it help?
Image Reconstruction → Abnormality Detection
Unrolled SQS
ROC
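"Unrolled SQS" refers to unrolling separable quadratic surrogate (SQS) iterations, a classical CT reconstruction algorithm. A minimal sketch of plain SQS (not unrolled, nothing learned) for an unweighted least-squares data term with a small dense system matrix `A`; the cited work's weighting, regularizer and learned components are not reproduced, and all names are hypothetical.

```python
def sqs_iterations(A, y, x0, num_iters):
    """SQS for min_x 1/2 ||A x - y||^2.
    D_j = sum_i |a_ij| * sum_k |a_ik| gives a diagonal majorizer (D >= A^T A),
    so each update x_j <- x_j - [A^T (A x - y)]_j / D_j never increases the objective.
    Requires every column of A to have a nonzero entry (so D_j > 0)."""
    m, n = len(A), len(A[0])
    row_abs = [sum(abs(a) for a in row) for row in A]
    D = [sum(abs(A[i][j]) * row_abs[i] for i in range(m)) for j in range(n)]
    x, history = list(x0), []
    for _ in range(num_iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        history.append(0.5 * sum(ri * ri for ri in r))        # objective before the update
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = [x[j] - grad[j] / D[j] for j in range(n)]        # all pixels updated in parallel
    return x, history

A = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 3.0], [2.0, 1.0, 1.0]]
y = [3.0, 2.0, 4.0, 4.0]   # consistent with x = (1, 1, 1)
x, history = sqs_iterations(A, y, [0.0, 0.0, 0.0], num_iters=50)
print(history[0], history[-1])
```

Because the update is separable and needs only products with `A` and its transpose, a fixed number of such iterations maps naturally onto network layers, which is what makes SQS convenient to unroll.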
ROC Cross Entropy
Similar Ideas
Deep Network Training
OPTIMAL CONTROL PERSPECTIVE
Optimization: Solving The “KKT” Condition
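@Maximum Principle Based Algorithms
Objective: $\min_{\theta} \; \Phi(x(T)) + \int_0^T L(\theta(t))\,dt$, where $\Phi$ is the loss function and $\int_0^T L\,dt$ the regularization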
Qianxiao Li, Long Chen, Cheng Tai, and Weinan E. Maximum Principle Based Algorithms for Deep Learning.
Original ODE: $\dot{x}(t) = f(x(t), \theta(t)) = \nabla_p H$, $x(0) = x_0$, with Hamiltonian $H = p \cdot f - L$
Costate: $\dot{p}(t) = -\nabla_x H$, $p(T) = -\nabla \Phi(x(T))$, where $\Phi$ is the terminal loss
Maximum Principle: $\theta^*(t) = \arg\max_{\theta} H(x^*(t), p^*(t), \theta)$
Solving it via Gauss-Seidel Iteration
Relation to back propagation: the parameter update becomes an argmax of $H$ instead of a gradient-ascent step on $H$
Works For Binary Neural Networks
Li Q, Hao S. An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks. ICML 2018.
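The Gauss-Seidel-type iteration alternates a forward state pass, a backward costate pass, and a Hamiltonian maximization. Because the last step is an argmax rather than a gradient, it applies directly to discrete weights. A minimal sketch of this basic method of successive approximations (MSA), assuming a hypothetical scalar residual system $x_{t+1} = x_t + \theta_t \tanh(x_t)$ with binary $\theta_t \in \{-1, +1\}$, terminal loss $\Phi(x) = \tfrac12 (x - \text{target})^2$ and no running cost ($L = 0$); the cited papers develop more robust variants (e.g. an extended MSA), which are not shown.

```python
import math

def forward(x0, thetas):
    """States of x_{t+1} = x_t + theta_t * tanh(x_t): a toy scalar 'ResNet'."""
    xs = [x0]
    for th in thetas:
        xs.append(xs[-1] + th * math.tanh(xs[-1]))
    return xs

def costates(xs, thetas, target):
    """p_T = -dPhi/dx_T, p_t = p_{t+1} * d f_t / dx (x_t): back propagation up to sign."""
    ps = [0.0] * len(xs)
    ps[-1] = -(xs[-1] - target)
    for t in range(len(thetas) - 1, -1, -1):
        dfdx = 1.0 + thetas[t] * (1.0 - math.tanh(xs[t]) ** 2)
        ps[t] = ps[t + 1] * dfdx
    return ps

def msa_step(x0, thetas, target):
    """theta_t = argmax over {-1, +1} of H_t = p_{t+1} * (x_t + theta * tanh(x_t))."""
    xs = forward(x0, thetas)
    ps = costates(xs, thetas, target)
    return [1.0 if ps[t + 1] * math.tanh(xs[t]) >= 0 else -1.0 for t in range(len(thetas))]

def loss(x0, thetas, target):
    return 0.5 * (forward(x0, thetas)[-1] - target) ** 2

thetas = [-1.0, -1.0]
new_thetas = msa_step(1.0, thetas, target=3.0)
print(new_thetas, loss(1.0, thetas, 3.0), loss(1.0, new_thetas, 3.0))
```

The argmax over a two-element set replaces the gradient step, so no straight-through trick is needed for binary weights; the costate pass is the usual backward pass with the opposite sign.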
Neural ODE
Chen, Tian Qi, et al. “Neural Ordinary Differential Equations.” NeurIPS 2018 (best paper)
NODE: gradients are computed by the adjoint method, i.e. by solving a costate equation backward in time
Recall the PMP
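In NODE, the adjoint state $a(t) = \partial L / \partial z(t)$ plays the role of the PMP costate (up to sign). A minimal sketch for a scalar ODE and loss $L = z(T)$, assuming forward Euler in both directions and hand-coded derivatives `df_dz` and `df_dtheta`; Chen et al. use adaptive solvers and automatic vector-Jacobian products instead, and all names here are hypothetical.

```python
import math

def adjoint_gradient(z0, theta, f, df_dz, df_dtheta, T=1.0, steps=1000):
    """dL/dtheta for L = z(T), dz/dt = f(z, theta), via the adjoint method.
    Solve forward for z(T), then integrate the augmented state (z, a, g) backward
    from T to 0 with da/dt = -a * df/dz and dg/dt = -a * df/dtheta.
    z is recomputed backward instead of storing the forward trajectory."""
    h = T / steps
    z = z0
    for _ in range(steps):
        z = z + h * f(z, theta)
    a, g = 1.0, 0.0  # a(T) = dL/dz(T) = 1 for L = z(T); g(T) = 0
    for _ in range(steps):
        dz, da, dg = f(z, theta), -a * df_dz(z, theta), -a * df_dtheta(z, theta)
        z, a, g = z - h * dz, a - h * da, g - h * dg  # one Euler step backward in time
    return g  # g(0) = dL/dtheta

# Toy ODE dz/dt = theta * z: z(T) = z0 * exp(theta * T), so dL/dtheta = z0 * T * exp(theta * T)
grad = adjoint_gradient(1.0, 0.5, lambda z, th: th * z, lambda z, th: th, lambda z, th: z)
print(grad, math.exp(0.5))
```

Memory stays constant in the number of steps because nothing from the forward pass is stored, which is the main practical appeal of the adjoint formulation.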
VAE and Normalizing Flow
M. I. Jordan, et al., An introduction to variational methods for graphical models. Machine Learning, 37(2):183–233, 1999.
Variational Principle: estimate the density of data $x$ by maximizing $-F(x)$, a lower bound on $\log p(x)$ ($F$ is the variational free energy)
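For a discrete latent variable the bound can be checked directly: $-F(x) = \sum_z q(z)\,[\log p(x,z) - \log q(z)] \le \log p(x)$, with equality when $q$ is the exact posterior $p(z|x)$. A minimal sketch with a hypothetical three-state model:

```python
import math

def elbo(q, prior, likelihood):
    """-F(x) = sum_z q(z) [log p(x, z) - log q(z)], with p(x, z) = p(z) p(x | z)."""
    return sum(qz * (math.log(pz * lz) - math.log(qz))
               for qz, pz, lz in zip(q, prior, likelihood) if qz > 0)

# Hypothetical model: p(z) and p(x | z) for one observed x
prior = [0.5, 0.3, 0.2]
likelihood = [0.1, 0.6, 0.3]
log_px = math.log(sum(p * l for p, l in zip(prior, likelihood)))
posterior = [p * l / math.exp(log_px) for p, l in zip(prior, likelihood)]

for q in ([1 / 3, 1 / 3, 1 / 3], [0.7, 0.2, 0.1], posterior):
    print(round(elbo(q, prior, likelihood), 4), "<=", round(log_px, 4))
```

The gap $\log p(x) + F(x)$ equals the KL divergence from $q$ to the posterior, so a more flexible family for $q$ can make the bound tighter.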
Credit: Fei-Fei Li & Justin Johnson & Serena Yeung, Lecture 12 - 90, May 15, 2018
Rezende, Danilo, and Shakir Mohamed. "Variational Inference with Normalizing Flows." ICML 2015.
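Normalizing flow for variational inference: provides a more flexible family of approximations $q$ to the unknown posterior $p(z|x)$:
$z_K = f_K \circ \cdots \circ f_1(z_0)$, $\quad \ln q_K(z_K) = \ln q_0(z_0) - \sum_{j=1}^{K} \ln \left|\det \frac{\partial f_j}{\partial z_{j-1}}\right|$, where $f_j$ are smooth invertible maps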
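One concrete choice of $f_j$ from Rezende and Mohamed is the planar flow $f(z) = z + u\,h(w^\top z + b)$, whose log-determinant costs $O(d)$ via the matrix determinant lemma. A minimal 2D sketch with $h = \tanh$ and hypothetical parameter values, checked against a finite-difference Jacobian:

```python
import math

def planar_flow(z, u, w, b):
    """f(z) = z + u * tanh(w.z + b); log|det df/dz| = log|1 + u.psi(z)|,
    psi(z) = (1 - tanh^2(w.z + b)) * w  (Jacobian is I + u psi^T)."""
    a = sum(wi * zi for wi, zi in zip(w, z)) + b
    fz = [zi + ui * math.tanh(a) for zi, ui in zip(z, u)]
    psi = [(1.0 - math.tanh(a) ** 2) * wi for wi in w]
    log_det = math.log(abs(1.0 + sum(ui * pi for ui, pi in zip(u, psi))))
    return fz, log_det

def numeric_log_det_2d(z, u, w, b, eps=1e-6):
    """log|det| of the central-difference Jacobian of the 2D map, for checking."""
    cols = []  # cols[j][i] = d f_i / d z_j
    for j in range(2):
        zp, zm = list(z), list(z)
        zp[j] += eps
        zm[j] -= eps
        fp, _ = planar_flow(zp, u, w, b)
        fm, _ = planar_flow(zm, u, w, b)
        cols.append([(fp[i] - fm[i]) / (2 * eps) for i in range(2)])
    return math.log(abs(cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]))

z, u, w, b = [0.3, -0.7], [0.8, 0.5], [1.2, -0.4], 0.1
fz, log_det = planar_flow(z, u, w, b)
print(log_det, numeric_log_det_2d(z, u, w, b))
```

The density of the transformed sample then follows from the formula above: $\ln q_1(f(z)) = \ln q_0(z) - $ `log_det`.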
NODE for Normalizing Flow
Chen, Tian Qi, et al. “Neural Ordinary Differential Equations.” NeurIPS 2018 (best paper)
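Use the change of variables theorem to compute exact changes in probability when samples are transformed through a bijective function $f$: $z_1 = f(z_0) \;\Rightarrow\; \log p(z_1) = \log p(z_0) - \log\left|\det \frac{\partial f}{\partial z_0}\right|$
Use NODE: $\frac{dz}{dt} = f(z(t), t) \;\Rightarrow\; \frac{\partial \log p(z(t))}{\partial t} = -\operatorname{tr}\left(\frac{\partial f}{\partial z(t)}\right)$
Replacing the log-determinant with a trace reduces the computational cost from $O(d^3)$ to $O(d)$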
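The instantaneous change of variables can be integrated alongside the state. A minimal 2D sketch, assuming $f(z) = \tanh(Wz + b)$ with hypothetical `W`, `B`, forward Euler, and an exact trace (cheap here because $\partial f/\partial z = \mathrm{diag}(1 - \tanh^2(Wz + b))\,W$); the accumulated $\Delta \log p$ is compared with $-\log|\det|$ of the finite-difference Jacobian of the flow map.

```python
import math

W = [[0.5, -1.0], [0.8, 0.3]]  # hypothetical parameters of f(z) = tanh(W z + b)
B = [0.1, -0.2]

def f(z):
    return [math.tanh(W[i][0] * z[0] + W[i][1] * z[1] + B[i]) for i in range(2)]

def trace_jacobian(z):
    """tr(df/dz) = sum_i (1 - tanh^2(a_i)) * W[i][i] with a = W z + b."""
    return sum((1.0 - math.tanh(W[i][0] * z[0] + W[i][1] * z[1] + B[i]) ** 2) * W[i][i]
               for i in range(2))

def cnf(z0, T=1.0, steps=1000):
    """Integrate dz/dt = f(z) and d(log p)/dt = -tr(df/dz) together (forward Euler)."""
    h, z, delta_logp = T / steps, list(z0), 0.0
    for _ in range(steps):
        delta_logp -= h * trace_jacobian(z)
        z = [zi + h * fi for zi, fi in zip(z, f(z))]
    return z, delta_logp

def flow_map_log_det(z0, eps=1e-6):
    """log|det| of the central-difference Jacobian of the map z0 -> z(T)."""
    cols = []  # cols[j][i] = d z_i(T) / d z0_j
    for j in range(2):
        zp, zm = list(z0), list(z0)
        zp[j] += eps
        zm[j] -= eps
        fp, fm = cnf(zp)[0], cnf(zm)[0]
        cols.append([(fp[i] - fm[i]) / (2 * eps) for i in range(2)])
    return math.log(abs(cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]))

z1, delta_logp = cnf([0.4, -0.3])
print(delta_logp, -flow_map_log_det([0.4, -0.3]))
```

The two numbers agree up to the $O(\Delta t)$ Euler error, while the CNF side never forms a determinant.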
VAE and Normalizing Flow
D. P. Kingma and P. Dhariwal. "Glow: Generative flow with invertible 1x1 convolutions." NeurIPS 2018.
Normalizing flow for image synthesis:
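Glow's invertible $1\times1$ convolution mixes channels with one $c \times c$ matrix $W$ at every pixel, so for an $h \times w$ feature map the log-determinant of the layer is $h \cdot w \cdot \log|\det W|$. A minimal sketch restricted to 2 channels with a hypothetical $W$ (Glow also offers an LU-decomposed parameterization, not shown):

```python
import math

def conv1x1(image, W):
    """Apply the same channel-mixing matrix W at every pixel; image[row][col] = channel list."""
    return [[[sum(W[i][j] * px[j] for j in range(len(px))) for i in range(len(W))]
             for px in row] for row in image]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def conv1x1_log_det(height, width, W):
    """The layer's Jacobian is block-diagonal with height*width copies of W."""
    return height * width * math.log(abs(det2(W)))

W = [[1.0, 0.5], [-0.3, 2.0]]  # hypothetical 2-channel weight, det = 2.15
image = [[[0.1, 0.2], [0.3, -0.4]],
         [[1.0, 0.0], [-0.5, 0.7]],
         [[0.2, 0.2], [0.9, -1.1]]]  # 3 x 2 pixels, 2 channels
out = conv1x1(image, W)
restored = conv1x1(out, inv2(W))  # inverse layer: same op with W^{-1}
print(conv1x1_log_det(3, 2, W))
```

Only a $c \times c$ determinant is needed regardless of image size, which keeps exact likelihood training affordable.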
Applied Math Perspective on Deep Learning
Take-home message:
Deep Network
- Network Architecture ↔ Differential Equations (DE), Numerical DE
- Network Training ↔ Optimal Control
From David Wipf's slide @ ICASSP 2018
Thanks and Questions?