deep learning as optimal control problems...learning, campus jussieu, sorbonne université, paris...
TRANSCRIPT
![Page 1: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/1.jpg)
Regularisation for Inverse Problems and Machine Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019
Deep learning as optimal control problems
Martin Benning, Queen Mary University of London (QMUL)
Models and numerical methods
This is joint work with Elena Celledoni, Matthias J. Ehrhardt, Brynjulf Owren and Carola-Bibiane Schönlieb
![Page 2: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/2.jpg)
This is joint work with Elena Celledoni, Matthias J. Ehrhardt, Brynjulf Owren and Carola-Bibiane Schönlieb:
Martin Benning, Elena Celledoni, Matthias J. Ehrhardt, Brynjulf Owren, and Carola-Bibiane Schönlieb. "Deep learning as optimal control problems: models and numerical methods." arXiv preprint arXiv:1904.05657 (2019). To appear in Journal of Computational Dynamics in December 2019.
![Page 3: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/3.jpg)
Deep learning as optimal control problems
• Introduction
• Runge-Kutta (RK) networks
• Numerical results
• Regularisation properties of RK network
![Page 5: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/5.jpg)
Introduction
In this part we consider supervised machine learning problems, e.g. classification.
The goal of classification is to find a mapping $g : \mathbb{R}^n \to \{c_0, c_1, \ldots, c_{K-1}\}$ given input and output samples $\{(x_i, c_i)\}_{i=1}^m$.
Here every class label $c_i \in \{c_0, c_1, \ldots, c_{K-1}\}$ for $i \in \{1, \ldots, m\}$ indicates that the input $x_i$ belongs to the corresponding class, e.g. if $c_i = c_l$ for $l \in \{0, 1, \ldots, K-1\}$, then $x_i$ belongs to class $l$.
![Page 6: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/6.jpg)
Introduction
Example: training data $x_i \in \mathbb{R}^d$
[Figure: example training data; the image source link was not recovered]
![Page 7: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/7.jpg)
Introduction
A potential model for such a classifier function $g$ is
$$g(x) := \mathcal{C}(W h(x, u) + \mu),$$
where
• $\mathcal{C}$ is the hypothesis function mapping from $\mathbb{R}$ to $\{c_0, c_1, \ldots, c_{K-1}\}$
• $W \in \mathbb{R}^{1 \times n}$ is a weight vector
• $\mu \in \mathbb{R}$ is a bias factor
• $h : \mathbb{R}^n \times P \to \mathbb{R}^n$ is a model function parametrised by parameters $u \in P$
![Page 8: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/8.jpg)
Introduction
We can train the model parameters of the classifier $g(x) := \mathcal{C}(W h(x, u) + \mu)$ by minimising a cost function of the form
$$\sum_{i=1}^{m} L\big(\mathcal{C}(W h(x_i, u) + \mu), c_i\big) + \mathcal{R}(u)$$
with respect to $u$, $W$ and $\mu$.
![Page 9: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/9.jpg)
Introduction
One choice is the squared error, i.e. a cost function of the form
$$\frac{1}{2} \sum_{i=1}^{m} \left| \mathcal{C}(W h(x_i, u) + \mu) - c_i \right|^2 + \mathcal{R}(u),$$
minimised with respect to $u$, $W$ and $\mu$.
![Page 10: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/10.jpg)
Introduction
Another choice is a cost function of the form
$$\sum_{i=1}^{m} \Big[ \log\big(1 + \exp(W h(x_i, u) + \mu)\big) - c_i \, (W h(x_i, u) + \mu) \Big] + \mathcal{R}(u),$$
minimised with respect to $u$, $W$ and $\mu$.
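The squared-error cost and the log-type cost from these slides can be sketched in a few lines. This is only an illustration under assumptions: the scores $s_i = W h(x_i, u) + \mu$ are taken as already computed, the logistic sigmoid is used as the hypothesis function $\mathcal{C}$, the value of $\mathcal{R}(u)$ is passed in as a plain number, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, used here as the hypothesis function C (an assumption)."""
    return 1.0 / (1.0 + math.exp(-z))


def squared_error_cost(scores, labels, regulariser=0.0):
    """0.5 * sum_i |C(s_i) - c_i|^2 + R(u), with s_i = W h(x_i, u) + mu."""
    data_term = 0.5 * sum((sigmoid(s) - c) ** 2 for s, c in zip(scores, labels))
    return data_term + regulariser


def log_cost(scores, labels, regulariser=0.0):
    """sum_i [log(1 + exp(s_i)) - c_i * s_i] + R(u)."""
    data_term = sum(math.log(1.0 + math.exp(s)) - c * s
                    for s, c in zip(scores, labels))
    return data_term + regulariser
```

Both functions take the same inputs, so they can be swapped without touching the model function $h$.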
![Page 11: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/11.jpg)
Introduction
How do we choose the model function $h$?
For example as a deep neural network such as the residual network (ResNet), i.e.
$$h(x_i, u) = y_i^{[N]}, \qquad u = (u^{[0]}, \ldots, u^{[N-1]}),$$
$$y_i^{[j+1]} = y_i^{[j]} + f(y_i^{[j]}, u^{[j]}), \quad j = 0, \ldots, N-1, \qquad y_i^{[0]} = x_i .$$
K. He, X. Zhang, S. Ren and J. Sun, Deep Residual Learning for Image Recognition, in IEEE Conference on Computer Vision and Pattern Recognition, 2016, 770–778.
![Page 12: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/12.jpg)
Introduction
Possible choices for $u^{[j]}$ and $f$ in the ResNet:
$$u^{[j]} := (K^{[j]}, \beta^{[j]}), \quad j = 0, \ldots, N-1, \qquad K^{[j]} \in \mathbb{R}^{n \times n}, \quad \beta^{[j]} \in \mathbb{R}^{n \times 1},$$
$$f(y_i^{[j]}, u^{[j]}) := \sigma(K^{[j]} y_i^{[j]} + \beta^{[j]}), \qquad \sigma(x) = \tanh(x) .$$
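A minimal sketch of this ResNet model function and the resulting classifier, written in plain Python with lists standing in for vectors and matrices. The function names are hypothetical, and using the logistic sigmoid for $\mathcal{C}$ is an assumption made for the illustration.

```python
import math


def matvec(K, y):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(k * v for k, v in zip(row, y)) for row in K]


def resnet_forward(x, layers):
    """h(x, u) = y^[N] with y^[j+1] = y^[j] + tanh(K^[j] y^[j] + beta^[j])."""
    y = list(x)
    for K, beta in layers:  # layers = [(K^[0], beta^[0]), ..., (K^[N-1], beta^[N-1])]
        z = matvec(K, y)
        y = [y_n + math.tanh(z_n + b_n) for y_n, z_n, b_n in zip(y, z, beta)]
    return y


def classify(x, layers, W, mu):
    """g(x) = C(W h(x, u) + mu), with C taken to be the logistic sigmoid."""
    h = resnet_forward(x, layers)
    score = sum(w * v for w, v in zip(W, h)) + mu
    return 1.0 / (1.0 + math.exp(-score))
```

With all weights and biases set to zero, every layer is the identity map, which gives a quick sanity check of the residual structure.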
![Page 13: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/13.jpg)
Introduction
Training the ResNet with (mean) squared error cost function therefore reads as
$$\min_{y, u, W, \mu} \; \sum_{i=1}^{m} \left| \mathcal{C}(W y_i^{[N]} + \mu) - c_i \right|^2 + \mathcal{R}(u),$$
subject to
$$y_i^{[j+1]} = y_i^{[j]} + f(y_i^{[j]}, u^{[j]}), \quad j = 0, \ldots, N-1, \qquad y_i^{[0]} = x_i .$$
![Page 14: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/14.jpg)
Introduction
By introducing a step-size parameter $\Delta t$ we can view the ResNet constraint
$$y_i^{[j+1]} = y_i^{[j]} + f(y_i^{[j]}, u^{[j]}), \quad j = 0, \ldots, N-1, \qquad y_i^{[0]} = x_i,$$
as a forward Euler discretisation of the differential equation
$$\dot{y}_i = f(y_i, u), \quad t \in [0, T], \qquad y_i(0) = x_i .$$
E. Haber and L. Ruthotto, Stable architectures for deep neural networks, Inverse Problems, 34 (2017), 014004.
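The correspondence can be written out in one step. The forward Euler scheme below uses only the symbols from this slide; the uniform grid with $\Delta t = T/N$ is an assumption made for the illustration. Setting $\Delta t = 1$ recovers the ResNet update.

```latex
\begin{align*}
y_i^{[j+1]} &= y_i^{[j]} + \Delta t \, f\big(y_i^{[j]}, u^{[j]}\big),
  \quad j = 0, \ldots, N-1, \qquad y_i^{[0]} = x_i, \\
y_i^{[j]} &\approx y_i(j \, \Delta t), \qquad u^{[j]} \approx u(j \, \Delta t),
  \qquad \Delta t = T / N .
\end{align*}
```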
![Page 15: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/15.jpg)
Introduction
In the continuum we therefore obtain the following optimal control problem
$$\min_{y, u, W, \mu} \; \sum_{i=1}^{m} \left| \mathcal{C}(W y_i(T) + \mu) - c_i \right|^2 + \mathcal{R}(u),$$
subject to
$$\dot{y}_i = f(y_i, u), \quad t \in [0, T], \qquad y_i(0) = x_i .$$
B. Chang, L. Meng, E. Haber, L. Ruthotto, D. Begert and E. Holtham, Reversible architectures for arbitrarily deep residual neural networks, in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
![Page 16: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/16.jpg)
www.qmul.ac.uk [email protected]
A lot of research in this direction
• Yann LeCun. A theoretical framework for back-propagation. In Proceedings of the 1988 connectionist models summer school, volume 1, pages 21–28. CMU, Pittsburgh, Pa: Morgan Kaufmann, 1988.
• Weinan E, A Proposal on Machine Learning via Dynamical Systems, Comm. in Math. and Stat. 2017.
• B. Chang, L. Meng, E. Haber, L. Ruthotto, D. Begert and E. Holtham, Reversible Architectures for Arbitrarily Deep Residual Neural Networks, arXiv:1709.03698v1, AAAI (National Conference on Artificial Intelligence).
• E. Haber, L. Ruthotto, Stable Architectures for Deep Neural Networks, arXiv:1705.03341v2.
• Qianxiao Li, Shuji Hao, An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks.
• Qianxiao Li, Long Chen, Cheng Tai, Weinan E, Maximum Principle Based Algorithms for Deep Learning, Journal of Machine Learning Research 18 (2018).
• L. Ruthotto and E. Haber, Deep Neural Networks Motivated by Partial Differential Equations, arXiv:1804.04272.
• Y. Lu, A. Zhong, Q. Li, Bin Dong (2017, October 27). Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations. arXiv.org.
• T. Q. Chen, Y. Rubanova, J. Bettencourt, D. Duvenaud (2018). Neural Ordinary Differential Equations. Presented at NeurIPS.
• A. Gholami, K. Keutzer, G. Biros (2019, February 26). ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs.
• T. Zhang, Z. Yao, A. Gholami, K. Keutzer, J. Gonzalez, G. Biros, M. Mahoney (2019, June 9). ANODEV2: A Coupled Neural ODE Evolution Framework.
![Page 17: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/17.jpg)
![Page 18: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/18.jpg)
DEEP LEARNING AS OPTIMAL CONTROL PROBLEMS
![Page 19: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/19.jpg)
Deep learning as optimal control problems
Deep learning optimal control problem
$$\min_{y, u, W, \mu} \; \sum_{i=1}^{m} \left| \mathcal{C}(W y_i(T) + \mu) - c_i \right|^2 + \mathcal{R}(u),$$
subject to
$$\dot{y}_i = f(y_i, u), \quad t \in [0, T], \qquad y_i(0) = x_i .$$
![Page 20: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/20.jpg)
Deep learning as optimal control problems
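Deep learning optimal control problem
$$\min_{y, u} \; \mathcal{J}(y(T)) \quad \text{subject to} \quad \dot{y} = f(y, u), \quad t \in [0, T], \qquad y(0) = x .$$
Variational calculus: consider variations
$$y_\varepsilon := y + \varepsilon v, \qquad u_\varepsilon := u + \varepsilon w .$$
How does the system $\dot{y} = f(y, u)$ behave for $\varepsilon \to 0$?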
![Page 21: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/21.jpg)
Deep learning as optimal control problems
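Differential equation constraint
$$\dot{y} = f(y, u), \quad t \in [0, T], \qquad y(0) = x .$$
Variational calculus: consider variations $y_\varepsilon := y + \varepsilon v$ and $u_\varepsilon := u + \varepsilon w$. How does the system $\dot{y} = f(y, u)$ behave for $\varepsilon \to 0$?
$$\frac{d}{d\varepsilon} \left( \frac{d}{dt} y_\varepsilon(t) \right) \bigg|_{\varepsilon = 0} = \frac{d}{dt} v(t),$$
$$\frac{d}{d\varepsilon} f(y_\varepsilon(t), u_\varepsilon(t)) \bigg|_{\varepsilon = 0} = \partial_y f(y(t), u(t)) \, v(t) + \partial_u f(y(t), u(t)) \, w(t) .$$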
![Page 22: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/22.jpg)
Deep learning as optimal control problems
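Deep learning optimal control problem
$$\min_{y, u} \; \mathcal{J}(y(T)) \quad \text{subject to} \quad \dot{y} = f(y, u), \quad t \in [0, T], \qquad y(0) = x .$$
Variational equation
$$\frac{d}{dt} v(t) = \partial_y f(y(t), u(t)) \, v(t) + \partial_u f(y(t), u(t)) \, w(t),$$
where $\partial_y f$ is the Jacobian of $f$ w.r.t. $y$ and $\partial_u f$ is the Jacobian of $f$ w.r.t. $u$.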
![Page 23: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/23.jpg)
Adjoint equation
The adjoint of the variational equation is a system of differential equations for a variable $p(t)$, obtained assuming
$$\langle p(t), v(t) \rangle = \langle p(0), v(0) \rangle, \quad \forall t \in [0, T] .$$
This implies
$$\langle p(t), \dot{v}(t) \rangle = - \langle \dot{p}(t), v(t) \rangle,$$
which yields
$$\frac{d}{dt} p(t) = - \big( \partial_y f(y(t), u(t)) \big)^\top p(t),$$
with constraint
$$\big( \partial_u f(y(t), u(t)) \big)^\top p(t) = 0 .$$
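The intermediate step can be spelled out as follows. This is a short worked derivation using only the symbols of the slide: differentiate the conserved pairing in time, insert the variational equation for $\dot{v}$, and require the result to vanish for all admissible $v$ and $w$.

```latex
\begin{align*}
0 = \frac{d}{dt} \langle p, v \rangle
  &= \langle \dot{p}, v \rangle + \langle p, \dot{v} \rangle \\
  &= \langle \dot{p}, v \rangle
     + \langle p, \partial_y f(y, u) \, v + \partial_u f(y, u) \, w \rangle \\
  &= \big\langle \dot{p} + (\partial_y f(y, u))^\top p, \, v \big\rangle
     + \big\langle (\partial_u f(y, u))^\top p, \, w \big\rangle .
\end{align*}
% Since v and w are arbitrary, both brackets must vanish separately:
% \dot{p} = -(\partial_y f)^\top p   and   (\partial_u f)^\top p = 0 .
```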
![Page 24: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/24.jpg)
First-order necessary conditions for optimality
The boundary value problem system
$$\dot{y} = f(y, u), \qquad y(0) = x,$$
$$\dot{p} = - \big( \partial_y f(y, u) \big)^\top p, \qquad p(T) = \partial_y \mathcal{J}(y) \big|_{y = y(T)},$$
$$0 = \big( \partial_u f(y, u) \big)^\top p, \qquad t \in [0, T],$$
expresses the first order necessary conditions for optimality of
$$\min_{y, u} \; \mathcal{J}(y(T)) .$$
Sketch of proof: set up a Lagrange functional and compute the optimality conditions.
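One way the proof sketch can be carried out is shown below. The sign convention of the Lagrange functional and the name $\mathcal{L}$ are choices made for this illustration, not taken from the slides; $v$ and $w$ denote variations of $y$ and $u$, with $v(0) = 0$ because $y(0) = x$ is fixed.

```latex
\begin{align*}
\mathcal{L}(y, u, p)
  &:= \mathcal{J}(y(T)) - \int_0^T \langle p, \dot{y} - f(y, u) \rangle \, dt, \\
\delta_y \mathcal{L}[v]
  &= \langle \partial_y \mathcal{J}(y(T)), v(T) \rangle
     - \int_0^T \langle p, \dot{v} - \partial_y f(y, u) \, v \rangle \, dt \\
  &= \langle \partial_y \mathcal{J}(y(T)) - p(T), v(T) \rangle
     + \int_0^T \big\langle \dot{p} + (\partial_y f(y, u))^\top p, \, v \big\rangle \, dt, \\
\delta_u \mathcal{L}[w]
  &= \int_0^T \big\langle (\partial_u f(y, u))^\top p, \, w \big\rangle \, dt .
\end{align*}
% Integration by parts was used in the second line of \delta_y \mathcal{L}.
% Setting both variations to zero for all v and w gives
% \dot{p} = -(\partial_y f)^\top p,  p(T) = \partial_y \mathcal{J}(y(T)),
% and (\partial_u f)^\top p = 0; the variation in p returns \dot{y} = f(y, u).
```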
![Page 25: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/25.jpg)
Associated Hamiltonian system
For this optimal control problem, there is an associated Hamiltonian system with Hamiltonian
$$\mathcal{H}(y, p, u) := p^\top f(y, u)$$
with
$$\dot{y} = \partial_p \mathcal{H}, \qquad \dot{p} = - \partial_y \mathcal{H}, \qquad \partial_u \mathcal{H} = 0 .$$
The constraint
$$0 = \big( \partial_u f(y, u) \big)^\top p$$
implies that this is a differential algebraic equation of index one.
Provided the Hessian $\partial_{u,u} \mathcal{H}$ is invertible, we can regard this system as an ODE when applying numerical methods.
![Page 26: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/26.jpg)
Numerical discretisation of the optimal control problem
How do we discretise the boundary value problem system
$$\dot{y} = f(y, u), \qquad y(0) = x,$$
$$\dot{p} = - \big( \partial_y f(y, u) \big)^\top p, \qquad p(T) = \partial_y \mathcal{J}(y) \big|_{y = y(T)},$$
$$0 = \big( \partial_u f(y, u) \big)^\top p, \qquad t \in [0, T] \, ?$$
Symplectic Runge-Kutta methods are suited to this problem!
J. M. Sanz-Serna, Symplectic Runge-Kutta schemes for adjoint equations, automatic differentiation, optimal control and more, SIAM Rev., 58 (2016), 3–33.
![Page 27: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/27.jpg)
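Numerical discretisation of the optimal control problem
Symplectic Runge-Kutta methods are suited to this problem!
Forward pass
$$y^{[j+1]} = y^{[j]} + \Delta t \sum_{i=1}^{s} b_i f_i^{[j]}, \qquad y^{[0]} = x,$$
$$f_i^{[j]} = f(y_i^{[j]}, u_i^{[j]}), \quad i = 1, \ldots, s,$$
$$y_i^{[j]} = y^{[j]} + \Delta t \sum_{l=1}^{s} a_{i,l} f_l^{[j]}, \quad i = 1, \ldots, s .$$
Backward pass
$$p^{[j+1]} = p^{[j]} + \Delta t \sum_{i=1}^{s} \tilde{b}_i \ell_i^{[j]}, \qquad p^{[N]} := \partial \mathcal{J}(y^{[N]}),$$
$$\ell_i^{[j]} = - \big( \partial_y f(y_i^{[j]}, u_i^{[j]}) \big)^\top p_i^{[j]}, \quad i = 1, \ldots, s,$$
$$p_i^{[j]} = p^{[j]} + \Delta t \sum_{l=1}^{s} \tilde{a}_{i,l} \ell_l^{[j]}, \quad i = 1, \ldots, s,$$
with constraint
$$\big( \partial_u f(y_i^{[j]}, u_i^{[j]}) \big)^\top p_i^{[j]} = 0 .$$
The coefficients of the two passes are linked by
$$b_i \tilde{a}_{i,j} + \tilde{b}_j a_{j,i} - b_i \tilde{b}_j = 0, \qquad \tilde{b}_i = b_i .$$
This is the symplectic part!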
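A small sketch of how the coefficient condition can be used in practice, under the assumption that it reads $b_i \tilde{a}_{i,j} + \tilde{b}_j a_{j,i} - b_i \tilde{b}_j = 0$ with $\tilde{b}_i = b_i$ (the standard symplecticity condition for partitioned Runge-Kutta pairs). For $b_i \neq 0$ it can be solved for the backward-pass coefficients, $\tilde{a}_{i,j} = b_j - b_j a_{j,i} / b_i$. The function names and the two example tableaux are illustrative choices.

```python
from fractions import Fraction


def adjoint_coefficients(a, b):
    """Solve the symplecticity condition for a_tilde, given (a, b) with b_i != 0."""
    s = len(b)
    if any(b_i == 0 for b_i in b):
        raise ValueError("all weights b_i must be nonzero")
    return [[b[j] - b[j] * a[j][i] / b[i] for j in range(s)] for i in range(s)]


def satisfies_symplectic_condition(a, a_tilde, b, b_tilde):
    """Check b_i a~_ij + b~_j a_ji - b_i b~_j = 0 and b~ = b exactly."""
    s = len(b)
    return list(b) == list(b_tilde) and all(
        b[i] * a_tilde[i][j] + b_tilde[j] * a[j][i] - b[i] * b_tilde[j] == 0
        for i in range(s) for j in range(s)
    )


# Forward Euler: one stage, a = 0, b = 1.
euler_a, euler_b = [[Fraction(0)]], [Fraction(1)]
# A two-stage explicit tableau of Heun type (assumed example).
heun_a = [[Fraction(0), Fraction(0)], [Fraction(1), Fraction(0)]]
heun_b = [Fraction(1, 2), Fraction(1, 2)]

euler_a_tilde = adjoint_coefficients(euler_a, euler_b)
heun_a_tilde = adjoint_coefficients(heun_a, heun_b)
```

For forward Euler this gives $\tilde{a}_{1,1} = 1$, so the single backward stage satisfies $p_1^{[j]} = p^{[j+1]}$: the adjoint is evaluated at the new step, as in the symplectic Euler scheme.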
![Page 28: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/28.jpg)
Numerical discretisation of the optimal control problem
Symplectic Runge-Kutta methods are suited to this problem!
Special case: symplectic Euler
$$y^{[j+1]} = y^{[j]} + \Delta t \, f(y^{[j]}, u^{[j]}), \qquad y^{[0]} = x,$$
$$p^{[j+1]} = p^{[j]} - \Delta t \, \big( \partial_y f(y^{[j]}, u^{[j]}) \big)^\top p^{[j+1]}, \qquad p^{[N]} := \partial_y \mathcal{J}(y^{[N]}),$$
$$0 = \big( \partial_u f(y^{[j]}, u^{[j]}) \big)^\top p^{[j+1]} .$$
Symplectic methods have the advantage that bilinear forms are preserved after discretisation, i.e. $\langle p(t), v(t) \rangle$ is still constant along the discrete trajectory.
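The preservation of the pairing can be checked numerically. The sketch below is a scalar toy example, not the authors' code: it assumes $f(y, u) = \tanh(k y + b)$ with scalar $k$ and $b$, propagates a variation $v$ forward with the Euler-discretised variational equation (with $w = 0$), propagates $p$ backward with the symplectic Euler adjoint step, and returns the products $p^{[j]} v^{[j]}$ for every $j$.

```python
import math


def df_dy(y, k, b):
    """Derivative of f(y, u) = tanh(k * y + b) with respect to y."""
    return k * (1.0 - math.tanh(k * y + b) ** 2)


def discrete_pairings(x, v0, p_final, params, dt):
    """Return [p^[j] * v^[j] for j = 0..N] for the symplectic Euler pair."""
    ys, vs = [x], [v0]
    for k, b in params:  # forward pass for y and for the variation v
        y = ys[-1]
        vs.append(vs[-1] + dt * df_dy(y, k, b) * vs[-1])
        ys.append(y + dt * math.tanh(k * y + b))
    ps = [p_final]  # backward pass: p^[j] = p^[j+1] + dt * df_dy(y^[j]) * p^[j+1]
    for (k, b), y in zip(reversed(params), reversed(ys[:-1])):
        ps.append(ps[-1] + dt * df_dy(y, k, b) * ps[-1])
    ps.reverse()
    return [p * v for p, v in zip(ps, vs)]


pairings = discrete_pairings(
    x=0.3, v0=1.0, p_final=2.0,
    params=[(0.8, 0.1), (-1.2, 0.4), (0.5, -0.3)], dt=0.25,
)
```

All entries of `pairings` agree up to floating-point rounding, which is the discrete counterpart of $\langle p(t), v(t) \rangle = \langle p(0), v(0) \rangle$.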
![Page 29: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/29.jpg)
Numerical discretisation of the optimal control problem
Proposition: if the continuous optimal control system is discretised with a symplectic partitioned Runge-Kutta method with $b_i \neq 0$, then the first order optimality conditions for the discrete optimal control problem
$$\min_{\{u_i^{[j]}\}_{j=0}^{N-1}, \; \{y^{[j]}\}_{j=1}^{N}, \; \{y_i^{[j]}\}_{j=1}^{N}} \; \mathcal{J}(y^{[N]}),$$
subject to
$$y^{[j+1]} = y^{[j]} + \Delta t \sum_{i=1}^{s} b_i f_i^{[j]},$$
$$f_i^{[j]} = f(y_i^{[j]}, u_i^{[j]}), \quad i = 1, \ldots, s,$$
$$y_i^{[j]} = y^{[j]} + \Delta t \sum_{m=1}^{s} a_{i,m} f_m^{[j]}, \quad i = 1, \ldots, s,$$
are satisfied.
![Page 30: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/30.jpg)
Numerical discretisation of the optimal control problem
What does this mean in practice?
If we use symplectic partitioned Runge-Kutta methods, then first-optimise-then-discretise and first-discretise-then-optimise lead to the same optimal control problems.
[Diagram: the continuous problem $\min_{y, u} \mathcal{J}(y(T))$ with constraint $\dot{y} = f(y, u)$ and the discrete problem $\min \mathcal{J}(y^{[N]})$ with constraint $y_i^{[j]} = y^{[j]} + \Delta t \sum_{m=1}^{s} a_{i,m} f_m^{[j]}$ are connected by "optimise" and "discretise" arrows, and the two routes commute.]
![Page 32: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/32.jpg)
Runge-Kutta networks
The discretisation of the optimal control problem with symplectic partitioned Runge-Kutta methods inspires a range of different network architectures.
Activation function: $f(y, u) = \sigma(K y + \beta)$ with parameters $u = (K, \beta)$
Runge-Kutta networks:
$$y^{[j+1]} = y^{[j]} + \Delta t \sum_{i=1}^{s} b_i f_i^{[j]},$$
$$f_i^{[j]} = \sigma(K_i^{[j]} y_i^{[j]} + \beta_i^{[j]}), \quad i = 1, \ldots, s,$$
$$y_i^{[j]} = y^{[j]} + \Delta t \sum_{l=1}^{s} a_{i,l} f_l^{[j]}, \quad i = 1, \ldots, s .$$
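A forward pass through such a Runge-Kutta network can be sketched as follows. The sketch assumes an explicit tableau (strictly lower triangular $a$), so that the stages can be computed one after another; implicit tableaux would need a nonlinear solve and are not covered. It also assumes $\sigma = \tanh$ and one parameter pair $(K_i^{[j]}, \beta_i^{[j]})$ per stage. All names are hypothetical.

```python
import math


def matvec(K, y):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(k * v for k, v in zip(row, y)) for row in K]


def stage_function(K, beta, y):
    """f = tanh(K y + beta), applied componentwise."""
    return [math.tanh(z + b) for z, b in zip(matvec(K, y), beta)]


def rk_network_forward(x, params, a, b, dt):
    """params[j][i] = (K_i^[j], beta_i^[j]); a must be strictly lower triangular."""
    y = list(x)
    n, s = len(y), len(b)
    for layer in params:
        stage_values = []
        for i in range(s):
            K, beta = layer[i]
            # Stage state y_i^[j] built from the previously computed stages only.
            y_i = [y[c] + dt * sum(a[i][l] * stage_values[l][c] for l in range(i))
                   for c in range(n)]
            stage_values.append(stage_function(K, beta, y_i))
        y = [y[c] + dt * sum(b[i] * stage_values[i][c] for i in range(s))
             for c in range(n)]
    return y
```

With `a = [[0.0]]` and `b = [1.0]` the function reduces to the ResNet update $y^{[j+1]} = y^{[j]} + \Delta t \, \sigma(K^{[j]} y^{[j]} + \beta^{[j]})$.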
![Page 33: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/33.jpg)
Runge-Kutta networks
The discretisation of the optimal control problem with symplectic partitioned Runge-Kutta methods inspires a range of different network architectures.
Activation function: $f(y, u) = \sigma(K y + \beta)$ with parameters $u = (K, \beta)$
Runge-Kutta networks: special case $s = 1$, the ResNet
$$y^{[j+1]} = y^{[j]} + \Delta t \, \sigma(K^{[j]} y^{[j]} + \beta^{[j]})$$
Backpropagation:
$$\gamma^{[j]} = \sigma'(K^{[j]} y^{[j]} + \beta^{[j]}) \odot p^{[j+1]},$$
$$p^{[j+1]} = p^{[j]} - \Delta t \, (K^{[j]})^\top \gamma^{[j]} .$$
![Page 34: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/34.jpg)
Runge-Kutta networks
For the ResNet (the special case $s = 1$ of the Runge-Kutta networks) with $f(y, u) = \sigma(K y + \beta)$ and $u = (K, \beta)$, the forward and backward passes read
$$y^{[j+1]} = y^{[j]} + \Delta t \, \sigma(K^{[j]} y^{[j]} + \beta^{[j]}),$$
$$\gamma^{[j]} = \sigma'(K^{[j]} y^{[j]} + \beta^{[j]}) \odot p^{[j+1]}, \qquad p^{[j+1]} = p^{[j]} - \Delta t \, (K^{[j]})^\top \gamma^{[j]} .$$
Parameter gradient:
$$\partial_{K^{[j]}} \mathcal{J}(y^{[N]}) = \Delta t \, \gamma^{[j]} (y^{[j]})^\top, \qquad \partial_{\beta^{[j]}} \mathcal{J}(y^{[N]}) = \Delta t \, \gamma^{[j]} .$$
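The backpropagation formulas can be verified against finite differences. The sketch below is a scalar toy version (state dimension $n = 1$, so $K^{[j]}$ and $\beta^{[j]}$ are numbers and $\odot$ is ordinary multiplication) with $\sigma = \tanh$; the cost $\mathcal{J}(y) = \tfrac{1}{2}(y - c)^2$ and all names are assumptions made for the example.

```python
import math


def forward(x, K, beta, dt):
    """Return [y^[0], ..., y^[N]] for y^[j+1] = y^[j] + dt * tanh(K_j y^[j] + beta_j)."""
    ys = [x]
    for k, b in zip(K, beta):
        ys.append(ys[-1] + dt * math.tanh(k * ys[-1] + b))
    return ys


def backward(ys, K, beta, dt, dJ):
    """Adjoint pass; returns the gradients of J(y^[N]) w.r.t. K_j and beta_j."""
    N = len(K)
    p = dJ(ys[-1])  # p^[N] = dJ/dy evaluated at y^[N]
    grad_K, grad_beta = [0.0] * N, [0.0] * N
    for j in reversed(range(N)):
        z = K[j] * ys[j] + beta[j]
        gamma = (1.0 - math.tanh(z) ** 2) * p  # gamma^[j] = sigma'(z) * p^[j+1]
        grad_K[j] = dt * gamma * ys[j]         # dt * gamma^[j] * y^[j]
        grad_beta[j] = dt * gamma              # dt * gamma^[j]
        p = p + dt * K[j] * gamma              # p^[j] = p^[j+1] + dt * K_j * gamma^[j]
    return grad_K, grad_beta


def cost(x, K, beta, dt, c):
    """Illustrative cost J(y^[N]) = 0.5 * (y^[N] - c)^2."""
    return 0.5 * (forward(x, K, beta, dt)[-1] - c) ** 2
```

The update `p = p + dt * K[j] * gamma` is the slide's relation $p^{[j+1]} = p^{[j]} - \Delta t \, (K^{[j]})^\top \gamma^{[j]}$ solved for $p^{[j]}$, since the adjoint is propagated from the last layer to the first.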
![Page 35: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/35.jpg)
ODENet
The discretisation of the optimal control problem with symplectic partitioned Runge-Kutta methods inspires a range of different network architectures.
Activation function: $f(y, u) = \alpha \, \sigma(K y + \beta)$ with parameters $u = (K, \alpha, \beta)$
Example: ResNet with varying time steps, i.e.
$$y^{[j+1]} = y^{[j]} + \Delta t \, \alpha^{[j]} \sigma(K^{[j]} y^{[j]} + \beta^{[j]});$$
we refer to this model as the ODENet.
Parameter gradient:
$$\partial_{\alpha^{[j]}} \mathcal{J}(y^{[N]}) = \Delta t \, \big\langle p^{[j+1]}, \sigma(K^{[j]} y^{[j]} + \beta^{[j]}) \big\rangle$$
![Page 36: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/36.jpg)
ODENet
For the ODENet, i.e. the ResNet with varying time steps
$$y^{[j+1]} = y^{[j]} + \Delta t \, \alpha^{[j]} \sigma(K^{[j]} y^{[j]} + \beta^{[j]}),$$
it can make sense in some applications to assume that the learned time steps have to lie in the simplex
$$S := \Big\{ \alpha \in \mathbb{R}^N \; \Big| \; \alpha^{[j]} \geq 0, \; \sum_{j=1}^{N} \alpha^{[j]} = 1 \Big\} .$$
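The slides do not say how the constraint $\alpha \in S$ is enforced. One common option, shown here purely as an assumption, is to project the time steps back onto the simplex after each gradient update. The sketch below implements the standard sort-based Euclidean projection onto the probability simplex; the function name is hypothetical.

```python
def project_onto_simplex(alpha):
    """Euclidean projection of a list of floats onto {a : a_j >= 0, sum_j a_j = 1}."""
    sorted_desc = sorted(alpha, reverse=True)
    cumulative = 0.0
    threshold = 0.0
    for count, value in enumerate(sorted_desc, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / count
        # Keep the shift belonging to the largest count whose entry stays positive.
        if value - candidate > 0.0:
            threshold = candidate
    return [max(a - threshold, 0.0) for a in alpha]
```

A vector that already lies in $S$ is returned unchanged (up to rounding), so the projection only acts when a gradient step has left the feasible set.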
![Page 38: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/38.jpg)
Setup
Deep learning optimal control problem
$$\min_{y, u, W, \mu} \; \sum_{i=1}^{m} \left| \mathcal{C}(W y_i(T) + \mu) - c_i \right|^2 + \mathcal{R}(u),$$
subject to
$$\dot{y}_i = f(y_i, u), \quad t \in [0, T], \qquad y_i(0) = x_i .$$
• Binary classification: $c_i \in \{0, 1\}$
• Hypothesis function: $\mathcal{C}(x) = \dfrac{1}{1 + \exp(-x)}$
• Activation function: $\sigma(x) = \tanh(x)$
• Varying number of layers after discretisation
![Page 39: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/39.jpg)
www.qmul.ac.uk /QMUL @QMUL
Datasets
2D data sets: donut1d, donut2d, spiral, squares
![Page 40: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/40.jpg)
Results for donut1d
![Page 41: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/41.jpg)
Results for squares
![Page 42: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/42.jpg)
Transformation of features
![Page 43: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/43.jpg)
Transformations for fixed classifier
![Page 44: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/44.jpg)
Robustness to initialisations
![Page 45: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/45.jpg)
Optimisation
![Page 46: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/46.jpg)
Adaptive time steps for ODENet
![Page 47: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/47.jpg)
Comparison of Runge-Kutta Networks
We now train different Runge-Kutta networks for the same classification tasks.
[Figure: Butcher tableaux for ResNet/Euler, Improved Euler, Kutta(3) and Kutta(4)]
![Page 48: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/48.jpg)
Comparison of Runge-Kutta Networks
![Page 49: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/49.jpg)
Comparison of Runge-Kutta Networks
![Page 50: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/50.jpg)
Comparison of Runge-Kutta Networks
![Page 51: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/51.jpg)
ODENet for MNIST
![Page 52: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/52.jpg)
ODENet for MNIST
![Page 53: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/53.jpg)
ODENet for MNIST
![Page 55: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/55.jpg)
Regularisation properties
Let us consider the continuous formulation of the neural network constraint
$$\frac{d}{dt} y(t) = \sigma\left(K(t) y(t) + \beta(t)\right)$$
![Page 56: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/56.jpg)
Regularisation properties
Let us consider the continuous formulation of the neural network constraint
$$\frac{d}{dt} y(t) = \sigma\left(K(t) y(t) + \beta(t)\right)$$
and modify it as suggested in *:
$$\frac{d}{dt} y(t) = -K(t)^\top \sigma\left(K(t) y(t) + \beta(t)\right)$$
Then we can verify the following stability estimate if $\sigma$ is chosen to be monotone:
$$\|y(t) - y^\delta(t)\| \le C \, \|y(0) - y^\delta(0)\| , \quad \text{for } t \ge 0 .$$
Here $C > 0$ is a constant, and $y(t)$ and $y^\delta(t)$ denote solutions of the flow for initial values $y(0)$ and $y^\delta(0)$.
*Lars Ruthotto and Eldad Haber. "Deep neural networks motivated by partial differential equations." arXiv preprint arXiv:1804.04272 (2018).
![Page 57: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/57.jpg)
Regularisation properties
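Then we can verify the following stability estimate if $\sigma$ is chosen to be monotone:
$$\|y(t) - y^\delta(t)\| \le C \, \|y(0) - y^\delta(0)\| , \quad \text{for } t \ge 0 .$$
Proof:
$$\begin{aligned}
\frac{d}{dt}\left(\frac{1}{2}\|y(t) - y^\delta(t)\|^2\right)
&= \left\langle \frac{d}{dt} y(t) - \frac{d}{dt} y^\delta(t),\; y(t) - y^\delta(t) \right\rangle \\
&= -\left\langle K(t)^\top\left(\sigma\left(K(t)y(t) + \beta(t)\right) - \sigma\left(K(t)y^\delta(t) + \beta(t)\right)\right),\; y(t) - y^\delta(t) \right\rangle \\
&= -\left\langle \sigma\left(K(t)y(t) + \beta(t)\right) - \sigma\left(K(t)y^\delta(t) + \beta(t)\right),\; K(t)y(t) + \beta(t) - \left(K(t)y^\delta(t) + \beta(t)\right) \right\rangle
\end{aligned}$$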
![Page 58: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/58.jpg)
Regularisation properties
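Then we can verify the following stability estimate if $\sigma$ is chosen to be monotone:
$$\|y(t) - y^\delta(t)\| \le C \, \|y(0) - y^\delta(0)\| , \quad \text{for } t \ge 0 .$$
Proof:
$$\begin{aligned}
\frac{d}{dt}\left(\frac{1}{2}\|y(t) - y^\delta(t)\|^2\right)
&= \left\langle \frac{d}{dt} y(t) - \frac{d}{dt} y^\delta(t),\; y(t) - y^\delta(t) \right\rangle \\
&= -\left\langle \sigma\left(K(t)y(t) + \beta(t)\right) - \sigma\left(K(t)y^\delta(t) + \beta(t)\right),\; K(t)y(t) + \beta(t) - \left(K(t)y^\delta(t) + \beta(t)\right) \right\rangle \\
&= -\left\langle \sigma(z(t)) - \sigma(z^\delta(t)),\; z(t) - z^\delta(t) \right\rangle \\
&\le 0 ,
\end{aligned}$$
for $z(t) := K(t)y(t) + \beta(t)$ and $z^\delta(t) := K(t)y^\delta(t) + \beta(t)$.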
![Page 59: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/59.jpg)
Regularisation properties
Then we can verify the following stability estimate if $\sigma$ is chosen to be monotone:
$$\|y(t) - y^\delta(t)\| \le C \, \|y(0) - y^\delta(0)\| , \quad \text{for } t \ge 0 .$$
Proof:
$$\frac{d}{dt}\left(\frac{1}{2}\|y(t) - y^\delta(t)\|^2\right) \le 0 .$$
Integrating from $0$ to $t$ now yields
$$\frac{1}{2}\|y(t) - y^\delta(t)\|^2 - \frac{1}{2}\|y(0) - y^\delta(0)\|^2 \le 0 ,$$
which concludes the proof. ∎
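The estimate can be checked numerically. The sketch below integrates the modified flow with forward Euler for two nearby initial values and records the distance between the trajectories after every step. The choice `sigma = tanh`, the random `K` and `beta`, the step size and all helper names are assumptions made for illustration; the step size is kept small because the proof above concerns the continuous flow, not its discretisation.

```python
import math
import random


def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def flow_rhs(K, beta, y):
    """Right-hand side -K^T sigma(K y + beta) with sigma = tanh."""
    z = [a + b for a, b in zip(matvec(K, y), beta)]
    return [-g for g in matvec(transpose(K), [math.tanh(v) for v in z])]


def simulate_distances(K, beta, y0, y0_delta, h, n_steps):
    """Forward Euler on both trajectories; returns ||y_n - y_n^delta|| for each n."""
    y, yd = list(y0), list(y0_delta)
    dists = [norm([a - b for a, b in zip(y, yd)])]
    for _ in range(n_steps):
        y = [a + h * r for a, r in zip(y, flow_rhs(K, beta, y))]
        yd = [a + h * r for a, r in zip(yd, flow_rhs(K, beta, yd))]
        dists.append(norm([a - b for a, b in zip(y, yd)]))
    return dists


# Hypothetical test problem: 4 features, small random weights.
random.seed(0)
n = 4
K = [[random.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
beta = [random.uniform(-1.0, 1.0) for _ in range(n)]
y0 = [random.uniform(-1.0, 1.0) for _ in range(n)]
y0_delta = [v + 0.01 * random.uniform(-1.0, 1.0) for v in y0]

dists = simulate_distances(K, beta, y0, y0_delta, h=0.05, n_steps=200)
print("initial distance:", dists[0])
print("final distance:  ", dists[-1])
```

In this setting the recorded distances should never increase, which corresponds to the estimate with $C = 1$.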
![Page 60: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/60.jpg)
Regularisation properties
Is stability a desirable property in neural networks?
Adversarial examples in image classification:
Explaining and Harnessing Adversarial Examples, Goodfellow et al, ICLR 2015.
![Page 61: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/61.jpg)
Regularisation properties
Hardening of deep neural networks:
Suppose we have inputs $y(0) = x$ and $y^\delta(0) = x^\delta$ with $\|x - x^\delta\| \le \delta$; then our network produces outputs with
$$\|y(t) - y^\delta(t)\| \le C \, \delta .$$
Hence, we observe
$$\lim_{\delta \to 0} \|y(t) - y^\delta(t)\| = 0 .$$
This continuous dependency is also a feature of regularisation operators.
We should discretise or train such that this stability estimate is preserved.
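To see why the discretisation matters, consider a single forward Euler step of the scalar flow $y' = -k \tanh(k y + \beta)$. For a monotone, 1-Lipschitz activation such as `tanh`, a short calculation suggests that the step cannot increase the distance between two inputs as long as $h k^2 \le 2$; this threshold is an addition here and is not stated on the slide. The sketch below, with hypothetical values and function name, shows the distance shrinking for small steps and growing once the step is too large.

```python
import math


def one_step_gap_ratio(h, y=0.10, y_delta=0.11, k=1.0, beta=0.0):
    """Ratio |y_1 - y_1^delta| / |y_0 - y_0^delta| after one forward Euler
    step of the scalar flow y' = -k * tanh(k * y + beta)."""
    def step(v):
        return v - h * k * math.tanh(k * v + beta)
    return abs(step(y) - step(y_delta)) / abs(y - y_delta)


# Step sizes on both sides of the threshold h * k**2 = 2 (with k = 1).
for h in (0.5, 1.0, 2.0, 3.0):
    print(f"h = {h}: gap ratio = {one_step_gap_ratio(h):.3f}")
```

A ratio above one means that this particular discrete layer amplifies the input perturbation, even though the continuous flow it approximates does not.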
![Page 62: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/62.jpg)
Regularisation properties
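Connection to inverse problems, for example with an additional reaction term:
$$\frac{d}{dt} y(t) = -K(t)^\top \sigma\left(K(t) y(t) + \beta(t)\right) - \lambda A^\top \left(A y(t) - f\right)$$
Let $y(t)$ and $y^\delta(t)$ denote solutions of this flow for data $f$ and $f^\delta$ with $\|f - f^\delta\| \le \delta$ and identical initial values. For this model we can derive the following stability estimate:
$$\begin{aligned}
\frac{d}{dt}\left(\frac{1}{2}\|y(t) - y^\delta(t)\|^2\right)
&\le -\lambda \left\langle A^\top\left(Ay(t) - f\right) - A^\top\left(Ay^\delta(t) - f^\delta\right),\; y(t) - y^\delta(t) \right\rangle \\
&= -\lambda \left\langle A\left(y(t) - y^\delta(t)\right) - \left(f - f^\delta\right),\; A\left(y(t) - y^\delta(t)\right) \right\rangle
\end{aligned}$$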
![Page 63: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/63.jpg)
Regularisation properties
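Connection to inverse problems, for example with an additional reaction term:
$$\frac{d}{dt} y(t) = -K(t)^\top \sigma\left(K(t) y(t) + \beta(t)\right) - \lambda A^\top \left(A y(t) - f\right)$$
Let $y(t)$ and $y^\delta(t)$ denote solutions of this flow for data $f$ and $f^\delta$ with $\|f - f^\delta\| \le \delta$ and identical initial values. For this model we can derive the following stability estimate:
$$\begin{aligned}
\frac{d}{dt}\left(\frac{1}{2}\|y(t) - y^\delta(t)\|^2\right)
&\le -\lambda \left\langle A\left(y(t) - y^\delta(t)\right) - \left(f - f^\delta\right),\; A\left(y(t) - y^\delta(t)\right) \right\rangle \\
&\le \lambda \left\langle f - f^\delta,\; A\left(y(t) - y^\delta(t)\right) \right\rangle \\
&\le \lambda \, \|f - f^\delta\| \, \|A\left(y(t) - y^\delta(t)\right)\|
\end{aligned}$$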
![Page 64: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/64.jpg)
Regularisation properties
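Connection to inverse problems, for example with an additional reaction term:
$$\frac{d}{dt} y(t) = -K(t)^\top \sigma\left(K(t) y(t) + \beta(t)\right) - \lambda A^\top \left(A y(t) - f\right)$$
Let $y(t)$ and $y^\delta(t)$ denote solutions of this flow for data $f$ and $f^\delta$ with $\|f - f^\delta\| \le \delta$ and identical initial values. For this model we can derive the following stability estimate:
$$\begin{aligned}
\frac{d}{dt}\left(\frac{1}{2}\|y(t) - y^\delta(t)\|^2\right)
&\le \lambda \, \|f - f^\delta\| \, \|A\left(y(t) - y^\delta(t)\right)\| \\
&\le \lambda \delta \, \|A\| \, \|y(t) - y^\delta(t)\|
\end{aligned}$$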
![Page 65: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/65.jpg)
Regularisation properties
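Connection to inverse problems, for example with an additional reaction term:
$$\frac{d}{dt} y(t) = -K(t)^\top \sigma\left(K(t) y(t) + \beta(t)\right) - \lambda A^\top \left(A y(t) - f\right)$$
Let $y(t)$ and $y^\delta(t)$ denote solutions of this flow for data $f$ and $f^\delta$ with $\|f - f^\delta\| \le \delta$ and identical initial values. For this model we can derive the following stability estimate:
$$\frac{d}{dt}\left(\frac{1}{2}\|y(t) - y^\delta(t)\|^2\right) \le \lambda \delta \, \|A\| \, \|y(t) - y^\delta(t)\|$$
$$\implies \|y(t) - y^\delta(t)\| \le \|y(0) - y^\delta(0)\| + \lambda \delta \, \|A\| \, t = \lambda \|A\| \, t \, \delta$$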
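The implication on this slide can be spelled out as follows. Writing $u(t) := \|y(t) - y^\delta(t)\|$ (a shorthand introduced here, not on the slide) and assuming $u(t) > 0$ so that one may divide by it:

$$\frac{d}{dt}\left(\frac{1}{2} u(t)^2\right) = u(t)\, u'(t) \le \lambda \delta \|A\| \, u(t)
\;\implies\; u'(t) \le \lambda \delta \|A\|
\;\implies\; u(t) \le u(0) + \lambda \delta \|A\| \, t .$$

Since both solutions start from identical initial values, $u(0) = 0$, which leaves the bound $\lambda \|A\| \, t \, \delta$.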
![Page 66: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/66.jpg)
Regularisation properties
Connection to inverse problems, for example with an additional reaction term:
$$\frac{d}{dt} y(t) = -K(t)^\top \sigma\left(K(t) y(t) + \beta(t)\right) - \lambda A^\top \left(A y(t) - f\right)$$
Let $y(t)$ and $y^\delta(t)$ denote solutions of this flow for data $f$ and $f^\delta$ with $\|f - f^\delta\| \le \delta$ and identical initial values. For this model we can derive the following stability estimate:
$$\|y(t) - y^\delta(t)\| \le \lambda \|A\| \, t \, \delta .$$
Hence, we need to choose a stopping time $t^*$ to guarantee convergence of this stability estimate, i.e.
$$\lim_{\delta \downarrow 0} \|y(t) - y^\delta(t)\| = 0 .$$
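The bound can be compared against a simulation. The sketch below runs forward Euler on the flow with reaction term for clean data `f` and perturbed data `f_delta`, starting from the same initial value, and tabulates the gap between the two trajectories next to the bound $\lambda \|A\| t \delta$. All values are made up; `A` is taken to be diagonal purely so that its operator norm can be read off as the largest absolute diagonal entry, and `tanh` stands in for a monotone activation.

```python
import math


def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def rhs(K, beta, A, lam, f, y):
    """-K^T tanh(K y + beta) - lam * A^T (A y - f)."""
    z = [a + b for a, b in zip(matvec(K, y), beta)]
    network = matvec(transpose(K), [math.tanh(v) for v in z])
    residual = [a - b for a, b in zip(matvec(A, y), f)]
    reaction = matvec(transpose(A), residual)
    return [-(g + lam * r) for g, r in zip(network, reaction)]


def gap_and_bound(K, beta, A, lam, f, f_delta, y0, h, n_steps):
    """Returns (t, ||y - y^delta||, lam * ||A|| * t * delta) after every step."""
    delta = norm([a - b for a, b in zip(f, f_delta)])
    # Operator norm of A; valid only because A is assumed diagonal here.
    A_norm = max(abs(A[i][i]) for i in range(len(A)))
    y, yd = list(y0), list(y0)  # identical initial values
    rows = []
    for n in range(1, n_steps + 1):
        y = [a + h * r for a, r in zip(y, rhs(K, beta, A, lam, f, y))]
        yd = [a + h * r for a, r in zip(yd, rhs(K, beta, A, lam, f_delta, yd))]
        gap = norm([a - b for a, b in zip(y, yd)])
        rows.append((n * h, gap, lam * A_norm * n * h * delta))
    return rows


# Hypothetical 2-dimensional example.
K = [[0.5, -0.3], [0.2, 0.4]]
beta = [0.1, -0.2]
A = [[1.0, 0.0], [0.0, 0.5]]
f = [1.0, 2.0]
f_delta = [1.01, 1.98]
rows = gap_and_bound(K, beta, A, 1.0, f, f_delta, [0.0, 0.0], 0.01, 500)
for t, gap, bound in rows[99::100]:
    print(f"t = {t:.1f}: gap = {gap:.5f}, bound = {bound:.5f}")
```

For any fixed stopping time the bound is proportional to $\delta$, which is what makes the gap vanish as the noise level goes to zero.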
![Page 67: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/67.jpg)
Regularisation properties
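Connection to inverse problems, for example with an additional reaction term:
$$\frac{d}{dt} y(t) = -K(t)^\top \sigma\left(K(t) y(t) + \beta(t)\right) - \lambda A^\top \left(A y(t) - f\right)$$
When do we converge to a solution $y^\dagger$ of the inverse problem, i.e. $Ay^\dagger = f$?
$$\begin{aligned}
\frac{d}{dt}\left(\frac{1}{2}\|y^\delta(t) - y^\dagger\|^2\right)
&= \left\langle \frac{d}{dt} y^\delta(t),\; y^\delta(t) - y^\dagger \right\rangle \\
&= -\left\langle K(t)^\top \sigma\left(K(t)y^\delta(t) + \beta(t)\right) + \lambda A^\top\left(Ay^\delta(t) - f^\delta\right),\; y^\delta(t) - y^\dagger \right\rangle \\
&= \underbrace{-\left\langle K(t)^\top \sigma\left(K(t)y^\delta(t) + \beta(t)\right),\; y^\delta(t) - y^\dagger \right\rangle}_{\text{first inner product}}
\; \underbrace{- \lambda \left\langle A^\top\left(Ay^\delta(t) - f^\delta\right),\; y^\delta(t) - y^\dagger \right\rangle}_{\text{second inner product}}
\end{aligned}$$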
![Page 68: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/68.jpg)
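Regularisation properties
When do we converge to a solution $y^\dagger$ of the inverse problem, i.e. $Ay^\dagger = f$?
First inner product:
$$\begin{aligned}
-\left\langle K(t)^\top \sigma\left(K(t)y^\delta(t) + \beta(t)\right),\; y^\delta(t) - y^\dagger \right\rangle
&= -\left\langle K(t)^\top \sigma\left(K(t)y^\delta(t) + \beta(t)\right) - K(t)^\top \sigma\left(K(t)y^\dagger + \beta(t)\right),\; y^\delta(t) - y^\dagger \right\rangle \\
&\quad - \left\langle K(t)^\top \sigma\left(K(t)y^\dagger + \beta(t)\right),\; y^\delta(t) - y^\dagger \right\rangle \\
&\le -\left\langle K(t)^\top \sigma\left(K(t)y^\dagger + \beta(t)\right),\; y^\delta(t) - y^\dagger \right\rangle ,
\end{aligned}$$
where the inequality follows from the monotonic increase of $\sigma$.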
![Page 69: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/69.jpg)
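Regularisation properties
When do we converge to a solution $y^\dagger$ of the inverse problem, i.e. $Ay^\dagger = f$?
First inner product:
$$\begin{aligned}
-\left\langle K(t)^\top \sigma\left(K(t)y^\delta(t) + \beta(t)\right),\; y^\delta(t) - y^\dagger \right\rangle
&\le -\left\langle K(t)^\top \sigma\left(K(t)y^\dagger + \beta(t)\right),\; y^\delta(t) - y^\dagger \right\rangle \\
&\le \left\| K(t)^\top \sigma\left(K(t)y^\dagger + \beta(t)\right) \right\| \, \left\| y^\delta(t) - y^\dagger \right\|
\end{aligned}$$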
![Page 70: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/70.jpg)
Regularisation properties
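Connection to inverse problems, for example with an additional reaction term:
$$\frac{d}{dt} y(t) = -K(t)^\top \sigma\left(K(t) y(t) + \beta(t)\right) - \lambda A^\top \left(A y(t) - f\right)$$
When do we converge to a solution $y^\dagger$ of the inverse problem, i.e. $Ay^\dagger = f$?
$$\begin{aligned}
\frac{d}{dt}\left(\frac{1}{2}\|y^\delta(t) - y^\dagger\|^2\right)
&= \left\langle \frac{d}{dt} y^\delta(t),\; y^\delta(t) - y^\dagger \right\rangle \\
&= -\left\langle K(t)^\top \sigma\left(K(t)y^\delta(t) + \beta(t)\right) + \lambda A^\top\left(Ay^\delta(t) - f^\delta\right),\; y^\delta(t) - y^\dagger \right\rangle \\
&\le \left\| K(t)^\top \sigma\left(K(t)y^\dagger + \beta(t)\right) \right\| \, \left\| y^\delta(t) - y^\dagger \right\|
\; \underbrace{- \lambda \left\langle A^\top\left(Ay^\delta(t) - f^\delta\right),\; y^\delta(t) - y^\dagger \right\rangle}_{\text{second inner product}}
\end{aligned}$$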
![Page 71: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/71.jpg)
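Regularisation properties
When do we converge to a solution $y^\dagger$ of the inverse problem, i.e. $Ay^\dagger = f$?
Second inner product:
$$\begin{aligned}
-\lambda \left\langle A^\top\left(Ay^\delta(t) - f^\delta\right),\; y^\delta(t) - y^\dagger \right\rangle
&= -\lambda \left\langle Ay^\delta(t) - f^\delta,\; Ay^\delta(t) - Ay^\dagger \right\rangle \\
&= -\lambda \left\langle Ay^\delta(t) - f^\delta,\; Ay^\delta(t) - f \right\rangle \\
&= -\lambda \left\langle Ay^\delta(t) - f + f - f^\delta,\; Ay^\delta(t) - f \right\rangle \\
&= -\lambda \left\| Ay^\delta(t) - f \right\|^2 - \lambda \left\langle f - f^\delta,\; Ay^\delta(t) - f \right\rangle \\
&\le \lambda \, \|f - f^\delta\| \, \left\| Ay^\delta(t) - f \right\| \\
&\le \lambda \delta \, \|A\| \, \left\| y^\delta(t) - y^\dagger \right\|
\end{aligned}$$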
![Page 72: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/72.jpg)
Regularisation properties
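Connection to inverse problems, for example with an additional reaction term:
$$\frac{d}{dt} y(t) = -K(t)^\top \sigma\left(K(t) y(t) + \beta(t)\right) - \lambda A^\top \left(A y(t) - f\right)$$
When do we converge to a solution $y^\dagger$ of the inverse problem, i.e. $Ay^\dagger = f$?
$$\begin{aligned}
\frac{d}{dt}\left(\frac{1}{2}\|y^\delta(t) - y^\dagger\|^2\right)
&= \left\langle \frac{d}{dt} y^\delta(t),\; y^\delta(t) - y^\dagger \right\rangle \\
&= -\left\langle K(t)^\top \sigma\left(K(t)y^\delta(t) + \beta(t)\right) + \lambda A^\top\left(Ay^\delta(t) - f^\delta\right),\; y^\delta(t) - y^\dagger \right\rangle \\
&\le \left\| K(t)^\top \sigma\left(K(t)y^\dagger + \beta(t)\right) \right\| \, \left\| y^\delta(t) - y^\dagger \right\| + \lambda \delta \, \|A\| \, \left\| y^\delta(t) - y^\dagger \right\| \\
&= \left( \left\| K(t)^\top \sigma\left(K(t)y^\dagger + \beta(t)\right) \right\| + \lambda \delta \, \|A\| \right) \left\| y^\delta(t) - y^\dagger \right\|
\end{aligned}$$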
![Page 73: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/73.jpg)
Regularisation properties
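Connection to inverse problems, for example with an additional reaction term:
$$\frac{d}{dt} y(t) = -K(t)^\top \sigma\left(K(t) y(t) + \beta(t)\right) - \lambda A^\top \left(A y(t) - f\right)$$
When do we converge to a solution $y^\dagger$ of the inverse problem, i.e. $Ay^\dagger = f$?
$$\frac{d}{dt}\left(\frac{1}{2}\|y^\delta(t) - y^\dagger\|^2\right) \le \left( \left\| K(t)^\top \sigma\left(K(t)y^\dagger + \beta(t)\right) \right\| + \lambda \delta \, \|A\| \right) \left\| y^\delta(t) - y^\dagger \right\|$$
Assumption (source condition): $\left\| K(t)^\top \sigma\left(K(t)y^\dagger + \beta(t)\right) \right\|$ is bounded for all $t \ge 0$.
$$\implies \frac{d}{dt}\left(\frac{1}{2}\|y^\delta(t) - y^\dagger\|^2\right) \le \left(C + \lambda \delta \, \|A\|\right) \left\| y^\delta(t) - y^\dagger \right\|$$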
![Page 74: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/74.jpg)
Regularisation properties
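Connection to inverse problems, for example with an additional reaction term:
$$\frac{d}{dt} y(t) = -K(t)^\top \sigma\left(K(t) y(t) + \beta(t)\right) - \lambda A^\top \left(A y(t) - f\right)$$
Assumption (source condition): $\left\| K(t)^\top \sigma\left(K(t)y^\dagger + \beta(t)\right) \right\|$ is bounded for all $t \ge 0$.
$$\frac{d}{dt}\left(\frac{1}{2}\|y^\delta(t) - y^\dagger\|^2\right) \le \left(C + \lambda \delta \, \|A\|\right) \left\| y^\delta(t) - y^\dagger \right\|$$
$$\implies \|y^\delta(t) - y^\dagger\| \le \|y^\delta(0) - y^\dagger\| + \left(C + \lambda \delta \, \|A\|\right) t$$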
![Page 75: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/75.jpg)
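Regularisation properties
Assumption (source condition): $\left\| K(t)^\top \sigma\left(K(t)y^\dagger + \beta(t)\right) \right\|$ is bounded for all $t \ge 0$.
$$\frac{d}{dt}\left(\frac{1}{2}\|y^\delta(t) - y^\dagger\|^2\right) \le \left(C + \lambda \delta \, \|A\|\right) \left\| y^\delta(t) - y^\dagger \right\|$$
$$\implies \|y^\delta(t) - y^\dagger\| \le \|y^\delta(0) - y^\dagger\| + \left(C + \lambda \delta \, \|A\|\right) t$$
Suppose $A = \mathrm{Id}$ and $y^\delta(0) = f^\delta$; then we could estimate
$$\|y^\delta(t) - y^\dagger\| \le \|f^\delta - f\| + \left(C + \lambda \delta \, \|A\|\right) t$$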
![Page 76: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/76.jpg)
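Regularisation properties
Assumption (source condition): $\left\| K(t)^\top \sigma\left(K(t)y^\dagger + \beta(t)\right) \right\|$ is bounded for all $t \ge 0$.
$$\frac{d}{dt}\left(\frac{1}{2}\|y^\delta(t) - y^\dagger\|^2\right) \le \left(C + \lambda \delta \, \|A\|\right) \left\| y^\delta(t) - y^\dagger \right\|$$
$$\implies \|y^\delta(t) - y^\dagger\| \le \|y^\delta(0) - y^\dagger\| + \left(C + \lambda \delta \, \|A\|\right) t$$
Suppose $A = \mathrm{Id}$ and $y^\delta(0) = f^\delta$; then we could estimate
$$\|y^\delta(t) - y^\dagger\| \le \delta + \left(C + \lambda \delta \, \|A\|\right) t$$
and conclude
$$\lim_{\delta \downarrow 0} \|y^\delta(t^*) - y^\dagger\| = 0 \quad \text{for } t^* \sim \delta .$$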
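To make the last step concrete, one possible reading of $t^* \sim \delta$ is $t^* = c\,\delta$ for some constant $c > 0$ (the constant $c$ is an assumption introduced here, not part of the slide). Substituting into the estimate gives

$$\|y^\delta(t^*) - y^\dagger\| \le \delta + \left(C + \lambda \delta \|A\|\right) c\,\delta = (1 + cC)\,\delta + c\,\lambda \|A\|\,\delta^2 \;\longrightarrow\; 0 \quad \text{as } \delta \downarrow 0 .$$

Both terms vanish with $\delta$, so under this choice the error bound decays at least linearly in the noise level.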
![Page 77: Deep learning as optimal control problems...Learning, Campus Jussieu, Sorbonne Université, Paris 19.11.2019 Deep learning as optimal control problems Martin Benning, Queen Mary University](https://reader033.vdocuments.site/reader033/viewer/2022050507/5f9848dc856164591f67d33c/html5/thumbnails/77.jpg)
Thank you for your attention!
Acknowledgements: