On the Geometry of Optimization Problems and their Structure
PhD defense - Vincent Roulet
Supervised by Alexandre d’Aspremont
December 21, 2017
Part 1
Sharpness in Convex Optimization Problems
Optimization Problems
Goal: Take the best possible action for a given task
Classical Example: Minimize production cost of an item
Formally,
minimize f (x)
subject to x ∈ C
in the variable x, where
• x = (x_1, ..., x_d) ∈ R^d represents the parameters of the task
• f : R^d → R measures the cost of an action
• C ⊂ R^d represents constraints on the possible parameters
Applications
• Design an electronic circuit
• Fit a model to data (machine learning, in Part 2)
• ...
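As a toy numerical illustration of the template above (all numbers hypothetical), projected gradient descent on a one-dimensional cost with a box constraint:

```python
# Toy instance of "minimize f(x) subject to x in C" (hypothetical numbers):
# f(x) = (x - 3)^2 plays the role of a production cost, and the
# constraint set C = [0, 2] models, e.g., a capacity limit on the parameter.
def f(x):
    return (x - 3.0) ** 2

def project(x, lo=0.0, hi=2.0):
    # Euclidean projection onto C = [lo, hi]
    return min(max(x, lo), hi)

# Projected gradient descent, the simplest scheme fitting the template
x = 0.0
for _ in range(200):
    grad = 2.0 * (x - 3.0)
    x = project(x - 0.1 * grad)

# The constrained minimizer is the boundary point x = 2
print(x)
```

The unconstrained minimizer x = 3 is infeasible, so the iterates settle on the boundary of C.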
Optimization Algorithms
Principle: Iteratively search for an approximate solution

Description
1. Start from x_0 ∈ C
2. At each t ≥ 0, get information I_t on the problem at x_t
3. Build x_{t+1} from the previous information {I_0, ..., I_t}

• Assume here first-order information I_t = {f(x_t), ∇f(x_t)}
• The update rule is what remains to design

Performance
Measured by the number of iterations T needed to achieve accuracy ε:

f(x_T) − f* ≤ ε

where f* = min_x f(x)
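The three-step description above can be sketched as a generic first-order loop; the quadratic test function and step size below are hypothetical choices, not from the slides:

```python
# Generic first-order template: the oracle returns I_t = (f(x_t), grad f(x_t))
# and the update rule (here: a plain gradient step) builds x_{t+1}.
def first_order_method(f, grad, x0, step, eps, f_star, max_iter=10_000):
    x, T = x0, 0
    while f(x) - f_star > eps and T < max_iter:
        x = x - step * grad(x)   # build x_{t+1} from current information
        T += 1
    return x, T                  # T = iterations needed to reach accuracy eps

# Hypothetical smooth test problem f(x) = x^2, with f* = 0
x, T = first_order_method(lambda z: z * z, lambda z: 2 * z,
                          x0=1.0, step=0.25, eps=1e-6, f_star=0.0)
# Each step halves x, so f shrinks by 0.25 per iteration: T = 10 suffices
assert x * x <= 1e-6 and T == 10
```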
Algorithm Design
Idea: Use a simple geometric description of the function around its minimizers to design algorithms

In this part
1. Take advantage of the sharpness of a function to accelerate the convergence of classical algorithms
   Sharpness, Restart and Acceleration, V. Roulet and A. d'Aspremont, to appear in Advances in Neural Information Processing Systems 30 (NIPS 2017).
2. Use the sharpness description to link the optimization and statistical performance of decoding procedures
   Computational Complexity versus Statistical Performance on Sparse Recovery Problems, V. Roulet, N. Boumal and A. d'Aspremont, under submission to Information and Inference: A Journal of the IMA.
Outline
• Sharpness in Convex Optimization Problems
  1.1 Smooth Convex Optimization
  1.2 Sharpness
  1.3 Scheduled Restarts
  1.4 Sharpness on Sparse Recovery Problems
Convex Functions
A differentiable function f : R^d → R is convex if, at any x ∈ R^d,

f(y) ≥ f(x) + ⟨∇f(x), y − x⟩, for every y ∈ R^d

[Figure: the graph of f lies above its tangent line y ↦ f(x) + ⟨∇f(x), y − x⟩ at any point x]

∇f(x) = 0 ⟹ f(x) = f*
Strong Convexity
A differentiable function f : R^d → R is strongly convex if there exists µ > 0 such that, at any x ∈ R^d,

f(y) ≥ f(x) + ⟨∇f(x), y − x⟩ + (µ/2)‖x − y‖_2^2, for every y ∈ dom f.

[Figure: the graph of f lies above the quadratic lower bound y ↦ f(x) + ⟨∇f(x), y − x⟩ + (µ/2)‖x − y‖_2^2]

This implies that the minimizer x* is unique and (µ/2)‖x* − x‖_2^2 ≤ f(x) − f*
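As a quick numerical sanity check of this consequence (the quadratic below is an assumed example, not from the slides):

```python
# f(x) = x^2 is mu-strongly convex with mu = 2 and minimizer x* = 0,
# so (mu/2) * (x* - x)^2 <= f(x) - f* must hold at every test point.
mu, x_star, f_star = 2.0, 0.0, 0.0
f = lambda x: x * x
points = [i / 7.0 for i in range(-20, 21)]
assert all(mu / 2.0 * (x_star - x) ** 2 <= f(x) - f_star + 1e-12
           for x in points)
```

Here the inequality is tight (it holds with equality), since the quadratic lower bound coincides with f itself.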
Smooth Functions

A differentiable function f : R^d → R is smooth if there exists L such that

‖∇f(x) − ∇f(y)‖_2 ≤ L‖x − y‖_2, for every x, y ∈ R^d

Using a Taylor expansion of f, at any point x ∈ R^d,

f(y) ≤ f(x) + ⟨∇f(x), y − x⟩ + (L/2)‖x − y‖_2^2, for every y ∈ R^d

[Figure: the graph of f lies below the quadratic upper bound y ↦ f(x) + ⟨∇f(x), y − x⟩ + (L/2)‖x − y‖_2^2]

Minimizing the upper bound gives the gradient step

x_{t+1} = argmin_y f(x_t) + ⟨∇f(x_t), y − x_t⟩ + (L/2)‖x_t − y‖_2^2 = x_t − (1/L)∇f(x_t)

⟹ f(x_{t+1}) ≤ f(x_t) − (1/(2L))‖∇f(x_t)‖_2^2
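The gradient-step decrease can be checked numerically; the test function below is an assumed example with a valid smoothness constant L = 3:

```python
import math

# f(x) = x^2 + cos(x) has f''(x) = 2 - cos(x) in [1, 3], so L = 3 is a
# valid smoothness constant. Each step x+ = x - grad/L must satisfy the
# descent inequality f(x+) <= f(x) - ||grad||^2 / (2L).
f = lambda x: x * x + math.cos(x)
grad = lambda x: 2.0 * x - math.sin(x)
L = 3.0

x = 2.0
for _ in range(50):
    g = grad(x)
    x_next = x - g / L
    assert f(x_next) <= f(x) - g * g / (2.0 * L) + 1e-12
    x = x_next

# The iterates approach the unique stationary point x* = 0
assert abs(x) < 1e-3
```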
Smooth Convex Optimization Problems

We study unconstrained problems

minimize f(x)

where f is convex and L-smooth

Optimal algorithm
Accelerated gradient descent [Nesterov, 1983], which starts at x_0 and outputs after t iterations a point x = A(x_0, t) s.t.

f(x) − f* ≤ (4L/t^2) d(x_0, X*)^2,

where d(x, X*) is the Euclidean distance from x to X* = argmin_x f(x)

Intuition: builds an estimate sequence of f along the iterates

Additional assumptions?
• With strong convexity, the optimal algorithm outputs x such that f(x) − f* = O(exp(−√(µ/L) t))
• Weaker assumptions?
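A minimal sketch of the accelerated scheme, assuming the standard t/(t+3) momentum weight and a hypothetical quadratic test problem; the final assertion checks the 4L/t^2 bound quoted above:

```python
# Accelerated gradient descent: gradient step at an extrapolated point,
# followed by a momentum update (weight t / (t + 3), one standard choice).
def accelerated_gradient(grad, x0, L, iters):
    x, y = x0, x0
    for t in range(iters):
        x_next = y - grad(y) / L                  # gradient step at y_t
        y = x_next + t / (t + 3) * (x_next - x)   # momentum extrapolation
        x = x_next
    return x

# Hypothetical problem f(x) = 0.5 * x^2, run with the overestimate L = 10
L, t = 10.0, 50
x = accelerated_gradient(lambda z: z, x0=1.0, L=L, iters=t)
# Check the guarantee f(x) - f* <= (4L / t^2) * d(x0, X*)^2
assert 0.5 * x * x <= 4.0 * L / t ** 2 * 1.0
```

Using an overestimated L keeps the step 1/L valid, so the convergence bound still applies.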
Outline
• Sharpness in Convex Optimization Problems
  1.1 Smooth Convex Optimization
  1.2 Sharpness
  1.3 Scheduled Restarts
  1.4 Sharpness on Sparse Recovery Problems
Sharpness
A function f satisfies the sharpness property on a set K ⊃ X* if there exist r ≥ 1, µ > 0, s.t.

(µ/r) d(x, X*)^r ≤ f(x) − f*, for every x ∈ K

This is a lower bound on the function around its minimizers

[Figure: sharpness lower bounds x ↦ µ d(x, X*) for r = 1 (left) and x ↦ (µ/2) d(x, X*)^2 for r = 2 (right), below the graph of f]
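A numerical sanity check of the definition on two assumed examples, f(x) = |x| (r = 1) and f(x) = x^2 (r = 2):

```python
# Check (mu/r) * d(x, X*)^r <= f(x) - f* on a grid of points; here
# X* = {0}, f* = 0, and d(x, X*) = |x|.
def is_sharp(f, mu, r, points):
    return all(mu / r * abs(x) ** r <= f(x) + 1e-12 for x in points)

points = [i / 10.0 for i in range(-30, 31)]
assert is_sharp(lambda x: abs(x), mu=1.0, r=1, points=points)  # sharp kink
assert is_sharp(lambda x: x * x, mu=2.0, r=2, points=points)   # quadratic growth
```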
Sharpness
Applications
• Strongly convex functions (r = 2)
• Sparse prediction problems such as f(x) = (1/2)‖Ax − b‖_2^2 + λ‖x‖_1 (r = 2)
• Matrix game problems min_x max_y x^T A y (r = 1)
• Real analytic and subanalytic functions (r unknown a priori)

References
• Studied by Łojasiewicz [1963] for real analytic functions
• Numerous applications, e.g. [Bolte et al., 2007]: non-convex optimization, dynamical systems, concentration inequalities...
Sharpness and Smoothness
Combining the sharpness lower bound and the smoothness upper bound near X*,

(µ/r) d(x, X*)^r ≤ f(x) − f* ≤ (L/2) d(x, X*)^2

⟹ 0 < 2µ/(rL) ≤ d(x, X*)^{2−r}

Taking x → X*, necessarily 2 ≤ r

Condition numbers

τ = 1 − 2/r ∈ [0, 1[ and κ = L/µ^{2/r}
Outline
• Sharpness in Convex Optimization Problems
  1.1 Smooth Convex Optimization
  1.2 Sharpness
  1.3 Scheduled Restarts
  1.4 Sharpness on Sparse Recovery Problems
Scheduled Restarts

Principle: Run an accelerated algorithm A, stop it, restart it from the last iterate

Here, restarts are scheduled in advance at times t_k, building from x_0 ∈ R^d:

x_k = A(x_{k−1}, t_k)

Why? Combine the convergence bound and sharpness:

f(x_k) − f* ≤ (4L/t_k^2) d(x_{k−1}, X*)^2 and (µ/r) d(x_{k−1}, X*)^r ≤ f(x_{k−1}) − f*

So

f(x_k) − f* ≤ (c_{L,µ,r}/t_k^2) (f(x_{k−1}) − f*)^{2/r}

Method of analysis
1. For a given 0 < γ < 1, compute t_k such that f(x_k) − f* ≤ γ (f(x_{k−1}) − f*)
2. Optimize over γ to get the optimal rate in terms of the total number of iterations N = Σ_{k=1}^R t_k after R restarts
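The restart mechanics reduce to a short driver loop; the inner method below is plain gradient descent on an assumed quadratic, standing in for the accelerated algorithm A:

```python
# x_k = A(x_{k-1}, t_k): run the inner method for t_k iterations,
# then restart it from the last iterate.
def inner_A(grad, x0, L, t):
    x = x0
    for _ in range(t):
        x = x - grad(x) / L
    return x

def scheduled_restarts(grad, x0, L, schedule):
    x = x0
    for t_k in schedule:
        x = inner_A(grad, x, L, t_k)
    return x

# Stand-in problem f(x) = x^2 (gradient 2x) with the overestimate L = 4:
# each inner step contracts the iterate by 1/2, so 30 total steps
# bring the error down to about 1e-9.
x = scheduled_restarts(lambda z: 2.0 * z, x0=1.0, L=4.0,
                       schedule=[2, 4, 8, 16])
assert abs(x) < 1e-6
```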
Optimal Schedule
Proposition [R. and d'Aspremont, 2017]
For f convex, L-smooth and (r, µ)-sharp on a set K ⊃ {x : f(x) ≤ f(x_0)}, run scheduled restarts with

t_k = C_{τ,κ} e^{τk}

Then after R restarts and N = Σ_{k=1}^R t_k total iterations, it outputs x s.t.

f(x) − f* = O(exp(−κ^{−1/2} N)) when τ = 0
f(x) − f* = O(1/N^{2/τ}) when τ > 0

Recall: τ = 1 − 2/r, κ = L/µ^{2/r}

Remarks
• Optimal for this class of problems [Nemirovskii and Nesterov, 1985]
• The bound is continuous in τ: as τ → 0, one recovers the bound for τ = 0
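Computing the schedule itself is straightforward; the constants below (C and the τ values) are illustrative, not the C_{τ,κ} from the proposition:

```python
import math

# t_k = C * exp(tau * k), rounded up to whole iterations: constant
# restart times when tau = 0, geometrically growing times when tau > 0.
def restart_schedule(C, tau, R):
    return [math.ceil(C * math.exp(tau * k)) for k in range(1, R + 1)]

# tau = 0 (r = 2, strongly-convex-like case): equally spaced restarts
assert restart_schedule(C=4.0, tau=0.0, R=3) == [4, 4, 4]

# tau > 0: geometrically increasing restart times
s = restart_schedule(C=2.0, tau=0.5, R=4)
assert s == [4, 6, 9, 15]
N = sum(s)          # total iteration count N = sum of the t_k
assert N == 34
```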
Parameter-free Strategy

In practice (r, µ) are unknown, so adaptivity is crucial

Adaptive strategy (log-scale grid search)
Given a fixed budget of N iterations, search over schedules of the form

t_k = C e^{τk}

• The grid on C is limited by N
• The grid on τ is limited by the continuity of the bounds in τ

Analysis
• Nearly optimal bounds (up to a constant factor 4)
• Cost of the grid search: (log_2 N)^2
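A sketch of the budgeted grid search; the driver `best_of`, the grids, and the closed-form inner method are all illustrative assumptions:

```python
import math

# Run one schedule t_k = C * e^{tau k} within a budget of N inner iterations.
def run_schedule(A, x0, C, tau, N):
    x, used, k = x0, 0, 1
    while used < N:
        t_k = min(max(1, round(C * math.exp(tau * k))), N - used)
        x = A(x, t_k)       # inner method run for t_k iterations
        used += t_k
        k += 1
    return x

# Try a log-scale grid of (C, tau) pairs and keep the best final point.
def best_of(A, f, x0, N):
    grid_C = [2.0 ** i for i in range(int(math.log2(N)) + 1)]
    grid_tau = [0.0, 0.5, 1.0]
    return min((run_schedule(A, x0, C, tau, N)
                for C in grid_C for tau in grid_tau), key=f)

# Inner method in closed form: t gradient steps on f(x) = x^2 shrink x by 0.9^t
A = lambda x, t: x * 0.9 ** t
x = best_of(A, lambda z: z * z, x0=1.0, N=64)
# Every schedule here spends exactly N steps, so all grid points agree
assert abs(x - 0.9 ** 64) < 1e-12
```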
Universal Scheduled Restarts

Generalization
Non-smooth or Hölder-smooth convex functions, for which there exist 1 ≤ s ≤ 2 and L > 0 s.t.

‖∇f(x) − ∇f(y)‖_2 ≤ L‖x − y‖_2^{s−1}, for every x, y ∈ dom f,

where ∇f(x) is any subgradient of f at x if s = 1 (the non-smooth case)

Optimal rate [Nesterov, 2015] (without sharpness)

f(x) − f* ≤ c_s L d(x_0, X*)^s / t^ρ, where ρ = 3s/2 − 1

Proposition [R. and d'Aspremont, 2017] (simplified)
Optimal scheduled restarts with sharpness output x s.t.

f(x) − f* = O(exp(−κ^{−s/(2ρ)} N)) when τ = 0
f(x) − f* = O(1/N^{ρ/τ}) when τ > 0

where τ = 1 − s/r ∈ [0, 1[, κ = L^{2/s}/µ^{2/r}
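The Hölder condition can be verified numerically on an assumed example, f(x) = |x|^{3/2} (so s = 3/2), whose gradient is Hölder continuous with exponent s − 1 = 1/2:

```python
# g(x) = 1.5 * sign(x) * |x|^{1/2} is the gradient of f(x) = |x|^{3/2};
# check |g(x) - g(y)| <= L * |x - y|^{s-1} with s = 1.5 and a valid
# constant L = 1.5 * sqrt(2), rounded up to 2.2 here.
g = lambda x: 1.5 * (1.0 if x >= 0 else -1.0) * abs(x) ** 0.5
L, s = 2.2, 1.5
pts = [i / 9.0 for i in range(-18, 19)]
assert all(abs(g(x) - g(y)) <= L * abs(x - y) ** (s - 1.0) + 1e-12
           for x in pts for y in pts)
```

Note that g is not Lipschitz near 0 (its slope blows up), so the usual s = 2 smoothness fails: this is exactly the regime the generalization covers.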
Test on Classification Problems

On a real data set (n = 208 samples, d = 60 features)

Compare to:
- Accelerated gradient (Acc)
- Restart heuristic enforcing monotonicity of objective values (Mono)
- Adaptive restarts (Adap)

For:
- Least squares: min_x ‖Ax − b‖_2^2
- Logistic: min_x ∑_i log(1 + exp(−b_i a_i^T x))

[Plots omitted: Least squares (left), Logistic (right). Large dots represent the restart iterations.]

20
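The "Mono" heuristic compared above can be sketched as follows: run accelerated gradient but reset the momentum whenever the objective value increases. This is an illustrative constant-step sketch, not the exact implementation used in the experiments.

```python
import numpy as np

def acc_grad_mono(f, grad, x0, L, n_iter):
    """Accelerated gradient with the 'Mono' heuristic: restart the momentum
    whenever the objective increases, enforcing monotone objective values."""
    x, y, theta = x0.copy(), x0.copy(), 1.0
    f_prev = f(x)
    for _ in range(n_iter):
        x_next = y - grad(y) / L
        fn = f(x_next)
        if fn > f_prev:                  # objective went up: discard and restart
            y, theta = x.copy(), 1.0     # next step is a plain gradient step
            continue
        theta_next = (1 + np.sqrt(1 + 4 * theta**2)) / 2
        y = x_next + ((theta - 1) / theta_next) * (x_next - x)
        x, theta, f_prev = x_next, theta_next, fn
    return x
```

After a reset the next update is a plain gradient step from the last accepted iterate, which always decreases the objective, so the scheme cannot stall.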
Test on Classification Problems

Results also valid for composite problems (same convergence bounds)

minimize f(x) + g(x)

where f, g convex, g "simple", f is (L, s)-smooth, f + g is (r, µ)-sharp

Test on:
- Lasso: min_x ‖Ax − b‖_2^2 + ‖x‖_1
- Dual SVM: min_x x^T AA^T x − x^T 1  s.t.  0 ≤ x ≤ 1

[Plots omitted: Lasso (left), Dual SVM (right). Large dots represent the restart iterations.]

21
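In the composite case the base method is accelerated proximal gradient: g being "simple" means its proximal operator is cheap, e.g. soft-thresholding for g = ‖·‖_1. A minimal FISTA sketch for the Lasso (the weight `lam` is an illustrative parameter; the slide's objective corresponds to lam = 1):

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 — the cheap step for 'simple' g."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista(A, b, lam, n_iter=500):
    """FISTA on the Lasso: f(x) = ||Ax - b||_2^2 (smooth), g(x) = lam * ||x||_1."""
    L = 2 * np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad f
    x = np.zeros(A.shape[1])
    y, theta = x.copy(), 1.0
    for _ in range(n_iter):
        grad = 2 * A.T @ (A @ y - b)
        x_next = soft_threshold(y - grad / L, lam / L)
        theta_next = (1 + np.sqrt(1 + 4 * theta**2)) / 2
        y = x_next + ((theta - 1) / theta_next) * (x_next - x)
        x, theta = x_next, theta_next
    return x
```

Restarting this scheme on a schedule is exactly as in the smooth case, since the prox step replaces the gradient step transparently.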
Outline

I Sharpness in Convex Optimization Problems
  1.1 Smooth Convex Optimization
  1.2 Sharpness
  1.3 Scheduled Restarts
  1.4 Sharpness on Sparse Recovery Problems

22
Sparse Recovery Problems

Goal: Recover a signal x∗ ∈ R^d from n linear observations

b_i = a_i^T x∗,  i ∈ {1, . . . , n}

Problem: In general this requires more observations n than features d

Additional assumption: x∗ is s-sparse: it has only s ≪ d non-zero values

Applications:
- Coding/decoding audio signals, images, ...
- Finding explanatory variables for an experiment

23
Sparse Recovery Problems

Original decoding procedure: Given b = (b_1, . . . , b_n)^T ∈ R^n and A = (a_1, . . . , a_n)^T ∈ R^(n×d), the original problem is

minimize ‖x‖_0 = Card(Supp(x))
subject to Ax = b

where Supp(x) = {i ∈ {1, . . . , d} : x_i ≠ 0} is the support of x

→ NP-hard combinatorial problem...

Practical decoding procedure: Solve instead the convex relaxation

minimize ‖x‖_1
subject to Ax = b

Recovery is achieved if the solution x = x∗, where x∗ is the original vector (b = Ax∗)

24
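The ℓ_1 relaxation is a linear program, so it can be solved off the shelf; a minimal sketch using scipy (a generic LP solver, not the restarted first-order scheme analyzed in the thesis), splitting x into its positive and negative parts:

```python
import numpy as np
from scipy.optimize import linprog

def l1_decode(A, b):
    """Solve min ||x||_1 s.t. Ax = b as a linear program, via the split
    x = x_plus - x_minus with x_plus, x_minus >= 0."""
    n, d = A.shape
    c = np.ones(2 * d)                # objective: sum(x_plus) + sum(x_minus)
    A_eq = np.hstack([A, -A])         # A x_plus - A x_minus = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))
    return res.x[:d] - res.x[d:]
```

Since x∗ is feasible, the optimal value is never larger than ‖x∗‖_1; exact recovery additionally requires the sparsity conditions of the following slides.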
Sharpness in Sparse Recovery Problems

Exploit sharpness of f(x) = ‖x‖_1 on {x : Ax = b}:

γ_A d(x, X∗) ≤ f(x) − f∗

Proposition [R., Boumal and d’Aspremont, 2017] (Simplified)

Optimal scheduled restart of a classical algorithm for exact recovery outputs, after N total iterations, x such that

f(x) − f∗ = O(exp(−γ_A N))

Remark:
- The optimal schedule needs γ_A, but a log-scale grid search is nearly optimal

25
Sharpness and Sparse Recovery Performance

Recovery threshold: Given A ∈ R^(n×d), denote by s_max(A) its recovery threshold, such that any original s-sparse signal x∗ with s < s_max(A) is the unique solution of

minimize ‖x‖_1
subject to Ax = Ax∗

Proposition [R., Boumal and d’Aspremont, 2017]

Given A ∈ R^(n×d) and an original s-sparse signal x∗ with s < s_max(A),

‖x‖_1 − ‖x∗‖_1 > (1 − √(s/s_max(A))) ‖x − x∗‖_1   for all x ∈ R^d with Ax = Ax∗, x ≠ x∗

The rate of convergence of the optimal restart scheme then reads

‖x‖_1 − ‖x∗‖_1 = O(exp(−(1 − √(s/s_max(A))) N))

26
Numerical Illustration

For a random observation matrix A, s_max(A) ≈ n/log d

So to recover s-sparse signals, one needs n ≈ s log d

Convergence rate of the optimal restart:

‖x‖_1 − ‖x∗‖_1 = O(exp(−(1 − c√(s log d / n)) N))

Best restart scheme found by grid search along the oversampling ratio τ = n/(s log d), for fixed d = 1000

[Plots omitted. Left: sparsity s = 20 fixed. Right: number of samples n = 200 fixed.]

27
Conclusion and Future Work

Contributions:
- Analyzed the acceleration of accelerated schemes by restarts under a sharpness assumption
- Showed the cost of adaptive schemes
- Linked the optimization complexity and the statistical performance of sparse recovery problems through sharpness

Not presented:
- Link between sharpness and robust recovery performance for noisy observations
- Extension to other sparse structures: group sparsity, low-rank matrices
- Optimization algorithms seen as integration methods of the gradient flow [Scieur, R., Bach and d’Aspremont, 2017]

Perspectives:
- Obtain a more practical adaptive scheme → Fercoq and Qu [2017] have one for r = 2; extend these results
- Refine the sharpness analysis for robust sparse recovery problems

28
Part 2

Machine Learning Problems with Partitioning Structure

29
Machine Learning Problems

Goal: Predict attributes y from objects x

Examples: Images → Digits; DNA → Phenotype; Documents → Topic

30
Supervised Machine Learning Problems

Goal: Learn a mapping

f : x → y

from n training samples of objects/attributes (x_1, y_1), . . . , (x_n, y_n)

Method: The quality of the prediction of f on (x, y) is measured by a loss function ℓ(y, f(x)). The learning procedure consists in solving

minimize (1/n) ∑_{i=1}^n ℓ(y_i, f(x_i)) + R(f)

in the mapping f ∈ F, where R is a regularizer that prevents overfitting on the training data

31
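As a concrete instance of this procedure, a sketch with linear mappings f(x) = w^T x, the squared loss, and R(w) = λ‖w‖_2^2 (ridge regression, whose standard closed form illustrates the general template; not a method specific to the thesis):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Regularized empirical risk minimization with squared loss and squared regularizer:
    minimizes (1/n) * ||Xw - y||_2^2 + lam * ||w||_2^2 in closed form.
    Setting the gradient to zero gives (X^T X / n + lam * I) w = X^T y / n."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
```

Larger lam shrinks w toward zero, trading training fit for robustness to overfitting.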
Structure Information

Idea: Impose an underlying structure on the data, to both learn it and simplify the prediction task

In this part:
1. Group features for the prediction task
   Iterative Hard Clustering of Features, V. Roulet, F. Fogel, F. Bach and A. d’Aspremont, under submission to the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018)
2. (Not presented) Extension to grouping samples or tasks
   Learning with Clustering Penalties, V. Roulet, F. Fogel, F. Bach and A. d’Aspremont, presented at the workshop Transfer and Multi-Task Learning: Trends and New Perspectives (NIPS 2015)

32
Outline

II Machine Learning Problems with Partitioning Structure
  2.1 Grouping Features for Prediction
  2.2 Iterative Hard Thresholding
  2.3 Sparse and Linear Grouped Models
  2.4 Synthetic Experiments

33
Practical Motivation

Goal: Predict phenotypes from DNA

Data: Genomes x_1, . . . , x_n composed of d genes

Problem: The number of genes d is very large; predictions are hard to make and to interpret

Assumption: Some genes are redundant → form groups of genes and compute their influence

34
Problem Formulation

Linear regression model: For attributes y ∈ R, find w ∈ R^d such that

x^T w ≈ y

To this end,

minimize L(w) + λR(w)

in w ∈ R^d, where L(w) = (1/n) ∑_{i=1}^n ℓ(y_i, w^T x_i) is the empirical loss and λ is a regularization parameter

Examples:
- Squared loss ℓ(y, x^T w) = (y − x^T w)^2
- Squared regularizer R(w) = ‖w‖_2^2

35
Problem Formulation

Usual sparsity: Select s features by constraining w to have at most s non-zero values,

minimize L(w) + λR(w)
subject to Card(Supp(w)) ≤ s,

where Supp(w) = {i ∈ {1, . . . , d} : w_i ≠ 0} is the support of w

→ Hard combinatorial problem

→ Solved approximately by projected gradient descent or a convex relaxation

Reduces the prediction problem to at most s variables

36
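The projection onto the sparsity constraint set has a closed form, which is what makes projected gradient descent (iterative hard thresholding) practical here: keep the s entries of largest magnitude and zero out the rest. A minimal sketch:

```python
import numpy as np

def hard_threshold(w, s):
    """Euclidean projection onto {w : Card(Supp(w)) <= s}:
    keep the s largest-magnitude entries of w, zero out the others."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-s:]   # indices of the s largest |w_i|
    out[idx] = w[idx]
    return out
```

An iterative hard thresholding step is then `w = hard_threshold(w - step * grad_L(w), s)`.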
Problem Formulation

Grouping constraints:
- Constrain w to have at most Q different values v_1, . . . , v_Q
- Each v_q is assigned to a group g_q ⊂ {1, . . . , d}

Formally, w defines a partition {g_1, . . . , g_Q} of {1, . . . , d}:

Part(w) = {g ⊂ {1, . . . , d} : (i, j) ∈ g × g iff w_i = w_j}

Example: w = (7, −3, −3, 0, 7) → Part(w) = {{1, 5}, {4}, {2, 3}}

Regression with grouping constraints on the features reads

minimize L(w) + λR(w)
subject to Card(Part(w)) ≤ Q

Reduces the prediction problem to at most Q variables

37
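Computing Part(w) from a weight vector is straightforward: group the indices by equal values. A minimal 0-indexed sketch (the slide's example uses 1-based indices):

```python
def part(w):
    """Partition of {0, ..., d-1} induced by the equal values of w,
    i.e. a 0-indexed Part(w): i and j share a group iff w[i] == w[j]."""
    groups = {}
    for i, wi in enumerate(w):
        groups.setdefault(wi, []).append(i)
    return sorted(groups.values())
```

Card(Part(w)) is then simply `len(part(w))`, the number of distinct values of w.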
Outline

II Machine Learning Problems with Partitioning Structure
   2.1 Grouping Features for Prediction
   2.2 Iterative Hard Thresholding
   2.3 Sparse and Linear Grouped Models
   2.4 Synthetic experiments
Projection on feasible set

The projection of w ∈ R^d on {w : Card(Part(w)) ≤ Q} reads

    minimize  Σ_{q=1}^{Q} Σ_{i ∈ gq} (wi − vq)²,

in v1, ..., vQ ∈ R and G = {g1, ..., gQ} a partition of {1, ..., d}

→ We recognize k-means in one dimension
→ It can be solved exactly in polynomial time by dynamic programming

Projected gradient descent is therefore possible
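The dynamic program behind this exact projection can be sketched as follows. This is a minimal O(Q d²) implementation written for clarity (assuming Q ≤ d), not the thesis code:

```python
import numpy as np

def kmeans_1d(w, Q):
    """Exact projection of w onto {w : Card(Part(w)) <= Q}: one-dimensional
    k-means solved by dynamic programming over the sorted coordinates."""
    d = len(w)
    order = np.argsort(w)
    x = w[order]
    # prefix sums give the within-segment squared error in O(1)
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x ** 2)))

    def seg_cost(i, j):  # cost of assigning x[i:j] to its mean
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    # D[q, j]: best cost of covering x[:j] with q contiguous segments
    D = np.full((Q + 1, d + 1), np.inf)
    D[0, 0] = 0.0
    cut = np.zeros((Q + 1, d + 1), dtype=int)
    for q in range(1, Q + 1):
        for j in range(q, d + 1):
            for i in range(q - 1, j):
                c = D[q - 1, i] + seg_cost(i, j)
                if c < D[q, j]:
                    D[q, j], cut[q, j] = c, i

    # backtrack: replace each segment by its mean
    proj_sorted = np.empty(d)
    j = d
    for q in range(Q, 0, -1):
        i = cut[q, j]
        proj_sorted[i:j] = (s1[j] - s1[i]) / (j - i)
        j = i
    proj = np.empty(d)
    proj[order] = proj_sorted
    return proj
```

Sorting first is what makes the problem tractable: an optimal clustering of points on the real line always uses contiguous segments of the sorted values.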
Iterative Hard Clustering

Denote k-means(w, Q) the projection of w that groups it into Q groups

Algorithm: Iterative Hard Clustering (IHC)
  Inputs: L(w), R(w), Q, λ ≥ 0, step sizes γ_t
  Initialize w_0 ∈ R^d (e.g. w_0 = 0)
  for t = 0, ..., T do
      w_{t+1/2} = w_t − γ_t (∇L(w_t) + λ∇R(w_t))
      w_{t+1} = k-means(w_{t+1/2}, Q)
  end for
  Output: ŵ = w_T

Remarks
- In practice, a backtracking line search on γ_t ensures decrease
- Akin to projected gradient descent for sparse problems, known as Iterative Hard Thresholding (IHT)
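The loop above can be sketched as follows for a least-squares loss. For brevity this sketch drops the λR(w) term, uses a constant step 1/L instead of a line search, and replaces the exact dynamic-programming projection with a few Lloyd iterations (an approximation); all of these are simplifications, not the slides' method:

```python
import numpy as np

def project_Q_values(w, Q, n_iter=50):
    """Approximate projection of w onto {w : at most Q distinct values}.
    The exact projection is 1-D k-means, solvable by dynamic programming;
    a few Lloyd iterations serve as a simple stand-in here."""
    centers = np.quantile(w, np.linspace(0, 1, Q))
    labels = np.argmin(np.abs(w[:, None] - centers[None, :]), axis=1)
    for _ in range(n_iter):
        for q in range(Q):
            if np.any(labels == q):
                centers[q] = w[labels == q].mean()
        labels = np.argmin(np.abs(w[:, None] - centers[None, :]), axis=1)
    return centers[labels]

def ihc_least_squares(X, y, Q, iters=300):
    """Iterative Hard Clustering for L(w) = ||Xw - y||^2 / (2n):
    a gradient step followed by projection onto Q groups."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n
        w = project_Q_values(w - step * grad, Q)
    return w
```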
Convergence analysis

Problem
The constraint set is not convex, so convergence is not ensured...
→ Use the fact that it is a union of subspaces defined by partitions

Recovery analysis
Analyze the algorithm as a decoding procedure
- Assume y_i = x_i^T w* + η_i for every i ∈ {1, ..., n}, where η_i ∼ N(0, σ²) and w* is such that Card(Part(w*)) ≤ Q
- Analyze convergence to w* when solving the least-squares problem
- Compute the number of random observations n needed to recover w*

Results [R., Fogel, d'Aspremont and Bach, 2017]
Recovery (up to statistical precision) for n ≥ d random observations

The partitioning structure is much harder than sparsity, where recovery only needs n = O(s log d)
Sparse and Grouped Linear Models

Goal
Both select s features and group them into Q groups by solving

    minimize   L(w) + λR(w)
    subject to Card(Supp(w)) ≤ s, Card(Part(w)) ≤ Q + 1

Procedure
- Develop a new dynamic program to project onto the constraints
- Use the resulting projected gradient descent

Recovery analysis
The constraint set is still a union of subspaces, so the same analysis applies. But here

    n = O(s log d + Q log s + (s − Q) log Q)

observations are sufficient.
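The exact dynamic program for this joint projection is not reproduced here; a hedged two-stage heuristic (hard-threshold to s entries, then cluster the surviving values into Q groups) merely illustrates the constraint set, where the zeros form the (Q+1)-th group:

```python
import numpy as np

def sparse_grouped_project(w, s, Q):
    """Heuristic projection onto {w : Card(Supp(w)) <= s, Card(Part(w)) <= Q+1}:
    keep the s largest entries in magnitude, cluster them into Q values
    (Lloyd iterations), and zero out the rest.  The zeros form one extra
    group, hence Q+1 groups in total.  Illustration only, not the exact DP."""
    d = len(w)
    out = np.zeros(d)
    support = np.argsort(np.abs(w))[-s:]          # s largest coordinates
    vals = w[support]
    centers = np.quantile(vals, np.linspace(0, 1, Q))
    labels = np.argmin(np.abs(vals[:, None] - centers[None, :]), axis=1)
    for _ in range(25):                           # Lloyd iterations in 1-D
        for q in range(Q):
            if np.any(labels == q):
                centers[q] = vals[labels == q].mean()
        labels = np.argmin(np.abs(vals[:, None] - centers[None, :]), axis=1)
    out[support] = centers[labels]
    return out
```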
Synthetic Experiments

Setting
- y_i = x_i^T w* + η_i with η ∼ N(0, σ²I)
- w* composed of Q = 5 groups of identical features among d = 100

Goal
- Test robustness of the method with respect to n and the noise level σ
- Measure ‖w* − ŵ‖₂, with ŵ the estimated vector
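The setting above can be reproduced in a few lines; the scale of the group values and the random group assignment are assumptions, since the slide only fixes Q, d, n and σ:

```python
import numpy as np

rng = np.random.default_rng(1)
d, Q, n, sigma = 100, 5, 150, 0.5          # sizes taken from the slide

values = 5.0 * rng.standard_normal(Q)      # one value per group (scale is an assumption)
groups = rng.integers(0, Q, size=d)        # random group assignment (also an assumption)
wstar = values[groups]                     # w* has at most Q distinct entries

X = rng.standard_normal((n, d))
y = X @ wstar + sigma * rng.standard_normal(n)

# plain least-squares baseline and the error measured in the experiments
w_ls = np.linalg.lstsq(X, y, rcond=None)[0]
err = np.linalg.norm(wstar - w_ls)
```

With n = 150 > d = 100, plain least squares already recovers w* up to noise, which is the regime reported in the last column of the results table.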
Synthetic Experiments Results

Compare Iterative Hard Clustering (IHC) to
- Least squares given the original partition Part(w*) (Oracle)
- Least squares (LS)
- Least squares followed by k-means (LSK)
- OSCAR penalty (enforces clusters through regularization) (OS)

        n = 50        n = 75        n = 100       n = 125      n = 150
Oracle  0.16±0.06     0.14±0.04     0.10±0.04     0.10±0.04    0.09±0.03
LS      61.94±17.63   51.94±16.01   21.41±9.40    1.02±0.18    0.70±0.09
LSK     62.93±18.05   57.78±17.03   10.18±14.96   0.31±0.19    0.19±0.12
OS      61.54±17.59   52.87±15.90   11.32±7.03    1.25±0.28    0.71±0.10
IHC     63.31±18.24   52.72±16.51   5.52±14.33    0.14±0.09    0.09±0.04

‖w* − ŵ‖₂ as the number of samples n varies, for fixed σ = 0.5, d = 100
Synthetic Experiments Results

        σ = 0.05     σ = 0.1      σ = 0.5       σ = 1
Oracle  0.86±0.27    1.72±0.54    8.62±2.70     17.19±5.43
LS      7.04±0.92    14.05±1.82   70.39±9.20    140.41±18.20
LSK     1.44±0.46    2.88±0.91    19.10±12.13   48.09±27.46
OS      14.43±2.45   18.89±3.46   71.00±10.12   140.33±18.83
IHC     0.87±0.27    1.74±0.52    9.11±4.00     26.23±18.00

‖w* − ŵ‖₂ as the noise level σ varies, for fixed n = 150, d = 100
Conclusion and Future Work

Contributions
- Developed a simple method to group (and select) features
- Analyzed recovery performance in comparison to sparsity
- Provides scalable and robust results

Not presented
- Convex approach for the least-squares loss
- Systematic method to group features, samples or tasks

Perspectives
- Test on genomic data (results on text not satisfying yet)
- Refine the partitioning constraint to get better recovery results
- Develop regularization norms from the partitioning constraints
Thank you for your attention!
References

Bolte, J., Daniilidis, A. and Lewis, A. [2007], 'The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems', SIAM Journal on Optimization 17(4), 1205–1223.

Fercoq, O. and Qu, Z. [2017], 'Adaptive restart of accelerated gradient methods under local quadratic growth condition', arXiv preprint arXiv:1709.02300.

Łojasiewicz, S. [1963], 'Une propriété topologique des sous-ensembles analytiques réels', Les équations aux dérivées partielles, pp. 87–89.

Nemirovskii, A. and Nesterov, Y. [1985], 'Optimal methods of smooth convex minimization', USSR Computational Mathematics and Mathematical Physics 25(2), 21–30.

Nesterov, Y. [1983], 'A method of solving a convex programming problem with convergence rate O(1/k²)', Soviet Mathematics Doklady 27(2), 372–376.

Nesterov, Y. [2015], 'Universal gradient methods for convex optimization problems', Mathematical Programming 152(1-2), 381–404.

Roulet, V., Boumal, N. and d'Aspremont, A. [2017], 'Complexity versus statistical performance on sparse recovery problems', submitted to Information and Inference: A Journal of the IMA.

Roulet, V. and d'Aspremont, A. [2017], 'Sharpness, restart and acceleration', Advances in Neural Information Processing Systems 30.

Roulet, V., Fogel, F., d'Aspremont, A. and Bach, F. [2017], 'Iterative hard clustering of features', submitted to the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018).

Scieur, D., Roulet, V., Bach, F. and d'Aspremont, A. [2017], 'Integration methods and accelerated optimization algorithms', arXiv preprint arXiv:1702.06751.