newton method - carnegie mellon school of computer science · equivalent to newton step on f(x). a...
TRANSCRIPT
![Page 1: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/1.jpg)
Newton Method
Materials courtesy: B. Poczos, R. Tibshirani, C. Carmanis & S. Sanghavi
![Page 2: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/2.jpg)
Outline
![Page 3: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/3.jpg)
![Page 4: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/4.jpg)
![Page 5: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/5.jpg)
5
Newton Method for Finding a Root
Linear Approximation (1st order Taylor approx):
Goal:
Therefore,
![Page 6: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/6.jpg)
6
Illustration of Newton’s method Goal: finding a root
In the next step we will linearize here in x
![Page 7: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/7.jpg)
7
Example: Finding a Root
http://en.wikipedia.org/wiki/Newton%27s_method
![Page 8: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/8.jpg)
8
Newton Method for Finding a Root This can be generalized to multivariate functions
Therefore,
[Pseudo inverse if there is no inverse]
![Page 9: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/9.jpg)
9
![Page 10: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/10.jpg)
10
Newton method for minimization
![Page 11: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/11.jpg)
11
Newton method for minimization
unconstrained
twice differentiable
![Page 12: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/12.jpg)
12
![Page 13: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/13.jpg)
13
Motivation with Quadratic Approximation
unconstrained
twice differentiable
![Page 14: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/14.jpg)
14
Motivation with Quadratic Approximation
![Page 15: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/15.jpg)
15
Comparison with Gradient Descent
![Page 16: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/16.jpg)
16
Comparison with Gradient Descent
Gradient descent uses a different quadratic approximation:
Newton method is obtained by minimizing over quadratic approximation:
![Page 17: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/17.jpg)
17
Comparison with Gradient Descent
![Page 18: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/18.jpg)
18
![Page 19: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/19.jpg)
19
![Page 20: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/20.jpg)
20
Pre-Conditioning for Gradient descent
Recall convergence rate for gradient descent:
Can we convert it into well-conditioned problem by changing coordinates?
Can get if A = [r2
f(x)]�1
![Page 21: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/21.jpg)
21
Pre-Conditioning for Gradient descent
Can we convert it into well-conditioned problem by changing coordinates?
Can get if
Running gradient descent for g(y), gives best descent direction and convergence rate.
Equivalent to Newton step on f(x).
A = [r2f(x)]�1/2
![Page 22: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/22.jpg)
22
Affine Invariance
[Proof: HW3]
![Page 23: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/23.jpg)
23
Affine Invariant stopping criterion
Stopping criterion for gradient descent:
Stopping criterion for Newton method:
where is the
Newton decrement.
Not affine-invariant
![Page 24: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/24.jpg)
24
Affine Invariant stopping criterion
![Page 25: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/25.jpg)
25
![Page 26: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/26.jpg)
26
Damped Newton’s Method
![Page 27: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/27.jpg)
27
Backtracking line search
![Page 28: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/28.jpg)
28
Convergence Rate
![Page 29: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/29.jpg)
29
Local convergence for finding root
Quadratic convergence
![Page 30: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/30.jpg)
30
Convergence analysis
![Page 31: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/31.jpg)
31
Convergence analysis
![Page 32: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/32.jpg)
32
Convergence analysis
Analysis can be improved e.g. for self-concordant functions
![Page 33: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/33.jpg)
33
Comparison to first-order methods
![Page 34: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/34.jpg)
34
Comparison to first-order methods
![Page 35: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/35.jpg)
35
Example: Logistic regression
![Page 36: Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A =[r2f (x)]1/2. 22 Affine Invariance [Proof: HW3] 23 Affine Invariant stopping criterion](https://reader033.vdocuments.site/reader033/viewer/2022050216/5f61e18e692b314fac495efa/html5/thumbnails/36.jpg)
36
Example: Logistic regression