
Introduction to Machine Learning and Stochastic Optimization

Robert M. Gower

Spring School on Optimization and Data Science, Novi Sad, March 2017

Solving the Finite Sum Training Problem

A datum function: f_i(w) measures the loss of the model w on the i-th data point.

The finite sum training problem: the training objective is a sum of such terms, and the optimization problem is to minimize their average over w.
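A minimal LaTeX sketch of the finite-sum training problem, with regularized logistic regression (as in the example that follows) as an assumed form of the datum function; the regularization weight λ is an assumption:

\min_{w \in \mathbb{R}^d} \; f(w) := \frac{1}{n} \sum_{i=1}^{n} f_i(w),
\qquad \text{e.g.} \quad
f_i(w) = \log\!\left(1 + e^{-y_i \langle x_i, w \rangle}\right) + \frac{\lambda}{2}\,\|w\|_2^2 .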

Gradient Descent Example

A logistic regression problem using the fourclass labelled data from LIBSVM, with (n, d) = (862, 2). The plots show the gradient descent iterates approaching the optimal point.

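To accompany the example, a minimal runnable sketch of gradient descent on an L2-regularized logistic regression objective; the synthetic two-dimensional data, the regularization weight, and the stepsize are assumptions standing in for the fourclass data and the constants used on the slides.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the fourclass data: n = 862 points in d = 2 dimensions.
n, d = 862, 2
X = rng.normal(size=(n, d))
y = np.sign(X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=n))

lam = 1.0 / n                                  # assumed regularization weight

def full_gradient(w):
    # Gradient of (1/n) sum_i log(1 + exp(-y_i <x_i, w>)) + (lam/2) ||w||^2.
    margins = y * (X @ w)
    coeffs = -y / (1.0 + np.exp(margins))      # derivative of the logistic loss
    return X.T @ coeffs / n + lam * w

w = np.zeros(d)
alpha = 1.0                                    # assumed constant stepsize
for t in range(200):
    w = w - alpha * full_gradient(w)

print("final gradient norm:", np.linalg.norm(full_gradient(w)))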

Stochastic Gradient Descent

Is it possible to design a method that uses only the gradient of a single datum function at each iteration?

Unbiased estimate: let j be an index sampled uniformly at random from {1, ..., n}. Then the expectation over j of the sampled gradient equals the full gradient, so the gradient of a single datum function is an unbiased estimate of the gradient of the training objective.

Stochastic Gradient Descent: plots of the SGD iterates and the optimal point.
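A minimal runnable sketch of the SGD update that uses the gradient of a single datum function per iteration; the synthetic data, the regularization weight, and the stepsize are the same stand-in assumptions as in the gradient descent sketch above.

import numpy as np

rng = np.random.default_rng(0)

# Same synthetic stand-in data as in the gradient descent sketch.
n, d = 862, 2
X = rng.normal(size=(n, d))
y = np.sign(X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=n))
lam = 1.0 / n                                   # assumed regularization weight

def datum_gradient(w, i):
    # Gradient of f_i(w) = log(1 + exp(-y_i <x_i, w>)) + (lam/2) ||w||^2.
    margin = y[i] * (X[i] @ w)
    return -y[i] / (1.0 + np.exp(margin)) * X[i] + lam * w

w = np.zeros(d)
alpha = 0.1                                     # assumed constant stepsize
for t in range(10 * n):                         # roughly 10 passes over the data
    j = rng.integers(n)                         # sample one index uniformly
    w = w - alpha * datum_gradient(w, j)        # step along an unbiased gradient estimate

print(w)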

Assumptions for Convergence

Strong Convexity: f is strongly convex, and the strong convexity parameter is often the same as the regularization parameter.

Expected Bounded Stochastic Gradients: the expected squared norm of the sampled gradients is bounded.

EXE: Using these two assumptions, show the one-step bound used in the convergence proof below.

Example of Strong Convexity: the hinge loss plus L2 regularization. The L2 term supplies the quadratic lower bound, so the regularized objective is strongly convex with parameter equal to the regularization weight.
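A brief LaTeX sketch of the definition and of the example; the symbols μ for the strong convexity parameter and λ for the regularization weight are the usual choices, assumed here:

% Strong convexity: a quadratic lower bound at every point v
f(w) \ge f(v) + \langle \nabla f(v),\, w - v \rangle + \frac{\mu}{2}\,\|w - v\|_2^2
\quad \text{for all } w, v .

% Hinge loss + L2: the regularizer alone provides the quadratic lower bound,
% so the objective is \lambda-strongly convex (with subgradients in place of gradients).
f(w) = \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i \langle x_i, w \rangle\bigr)
       + \frac{\lambda}{2}\,\|w\|_2^2 .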

Complexity / Convergence

Theorem (constant stepsize).

Proof sketch: expand the squared distance to the optimum after one SGD step; take the expectation with respect to j, using that the stochastic gradient is an unbiased estimator; bound the inner-product term with strong convexity and the squared-norm term with the expected bounded stochastic gradient assumption; then take the total expectation and unroll the recurrence.
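A hedged reconstruction of the standard constant-stepsize bound that these assumptions and proof steps yield; the exact constants on the original slides may differ. With stepsize α, strong convexity parameter μ, and a bound B² on the expected squared norm of the stochastic gradients:

% One SGD step, then expectation over j (unbiasedness), strong convexity, bounded gradients:
\mathbb{E}_j\|w^{t+1} - w^*\|^2
  \le (1 - 2\alpha\mu)\,\|w^{t} - w^*\|^2 + \alpha^2 B^2 .

% Taking total expectation and unrolling the recurrence (for 0 < \alpha \le 1/(2\mu)):
\mathbb{E}\|w^{t} - w^*\|^2
  \le (1 - 2\alpha\mu)^{t}\,\|w^{0} - w^*\|^2 + \frac{\alpha B^2}{2\mu} .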

Stochastic Gradient Descent with constant stepsizes α = 0.01, 0.1, 0.2, 0.5 (plots of the iterates).

Complexity / Convergence

Theorem (shrinking stepsize). With a suitably shrinking stepsize, the expected error of SGD converges to zero, but only at a sublinear rate.
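A hedged sketch of a typical statement of this kind; the particular stepsize schedule and the constants are assumptions:

% With a decreasing schedule such as \alpha_t = \Theta(1/(\mu t)), SGD attains a sublinear rate,
\mathbb{E}\|w^{t} - w^*\|^2 = \mathcal{O}\!\left(\frac{B^2}{\mu^2\, t}\right),
% in contrast to the constant-stepsize bound, which contracts linearly but only down to a
% noise ball of size proportional to \alpha B^2 / (2\mu).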

Comparison SGD vs GD

Plots over time comparing SGD, GD, and the Stochastic Average Gradient (SAG).

M. Schmidt, N. Le Roux, F. Bach (2016). Minimizing Finite Sums with the Stochastic Average Gradient. Mathematical Programming.

Maybe just an unbiased estimate is not enough.

Variance reduced methods through Sketching

Build an Estimate of the Gradient

Instead of using the sampled gradient directly as the step direction, use it to update an estimate of the full gradient.

We would like a gradient estimate such that it is unbiased and it converges in L2 to the true gradient.
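In symbols, and hedged as a paraphrase of the two desiderata (the symbol g^t for the estimate is an assumption):

% Unbiased:
\mathbb{E}\left[g^{t} \mid w^{t}\right] = \nabla f(w^{t}),
% Converges in L2 (the variance of the estimate vanishes along the iterates):
\mathbb{E}\left\|g^{t} - \nabla f(w^{t})\right\|^2 \;\longrightarrow\; 0 .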

Example: The Stochastic Average Gradient

M. Schmidt, N. Le Roux, F. Bach (2016). Minimizing Finite Sums with the Stochastic Average Gradient. Mathematical Programming.

The Stochastic Average Gradient keeps the most recently computed gradient of each datum function and, at every iteration, recomputes one of them and steps along the average of the stored gradients.

How to prove this converges? Is this the only option?
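A minimal runnable sketch of SAG applied to the same L2-regularized logistic regression stand-in used in the earlier sketches; the synthetic data, the regularization weight, and the stepsize are assumptions, not values from the slides.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data, as in the earlier sketches.
n, d = 862, 2
X = rng.normal(size=(n, d))
y = np.sign(X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=n))
lam = 1.0 / n

def datum_gradient(w, i):
    margin = y[i] * (X[i] @ w)
    return -y[i] / (1.0 + np.exp(margin)) * X[i] + lam * w

L_max = np.max(np.sum(X**2, axis=1)) / 4.0 + lam
alpha = 1.0 / (16.0 * L_max)      # conservative stepsize of order 1/(16 L_max); L_max bounds each f_i's smoothness

w = np.zeros(d)
J = np.zeros((d, n))              # column i stores the last computed gradient of f_i
g = np.zeros(d)                   # running average of the stored gradients, (1/n) * J @ 1

for t in range(30 * n):
    j = rng.integers(n)
    new_col = datum_gradient(w, j)
    g += (new_col - J[:, j]) / n  # update the average in O(d)
    J[:, j] = new_col             # overwrite the stored gradient for datum j
    w = w - alpha * g             # SAG step along the (biased) averaged estimate

print(w)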

Introducing the Jacobian

Collect the datum gradients into one matrix with the gradients of the datum functions as its columns; the full gradient of the training objective is the average of these columns.

The Stochastic Average Gradient, seen through the Jacobian: is this the only option? How to prove this converges?
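In symbols, a sketch consistent with this viewpoint (the exact notation on the slides may differ):

% Stack the datum gradients column-wise (the transposed Jacobian of w -> (f_1(w), ..., f_n(w))):
\nabla F(w) := \bigl[\nabla f_1(w), \,\ldots,\, \nabla f_n(w)\bigr] \in \mathbb{R}^{d \times n},
\qquad
\nabla f(w) = \frac{1}{n}\, \nabla F(w)\, \mathbf{1} .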

Stochastic Sparse Sketches

Access the Jacobian through a stochastic linear measurement: multiply it on the right by a sparse stochastic sketching matrix.

Eg: the SGD sketch (a single randomly chosen coordinate vector) and the mini-batch SGD sketch (a random subset of coordinate vectors).

Many examples: sparse Rademacher matrices, sampling with replacement, non-uniform sampling, etc.
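Hedged examples of such sketching matrices, written in assumed notation (S for the sketch, e_j for a coordinate vector, B for a sampled batch):

% SGD sketch: a single coordinate vector chosen uniformly at random,
S = e_j \in \mathbb{R}^{n \times 1}, \qquad j \sim \mathrm{Uniform}\{1,\ldots,n\},
% so the measurement \nabla F(w) S = \nabla f_j(w) reads off one column.

% Mini-batch SGD sketch: the coordinate vectors indexed by a random batch B \subseteq \{1,\ldots,n\},
S = I_{:B} \in \mathbb{R}^{n \times |B|},
% so \nabla F(w) S collects the columns \nabla f_i(w) for i \in B.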

A Jacobian Based Method

Jacobian Sketching Algorithm: at each iteration, sample a stochastic sketch, use the resulting measurement of the true Jacobian to update a maintained Jacobian estimate (an improved guess), and take a step along the gradient estimate that this Jacobian estimate provides.

Updating the Jacobian Estimate: Sketch and Project

Sketch and project the Jacobian: choose the new estimate to be the closest matrix to the current estimate among all matrices that agree with the true Jacobian on the sampled sketch.

R. M. Gower and P. Richtárik (2015). Randomized Iterative Methods for Linear Systems. SIAM Journal on Matrix Analysis and Applications, 36(4).
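A hedged LaTeX sketch of the update this describes, in the assumed notation from above (J^t for the current estimate, ∇F(w^t) for the true Jacobian, S_t for the sampled sketch):

J^{t+1} \;=\; \arg\min_{J \in \mathbb{R}^{d \times n}} \ \tfrac{1}{2}\,\|J - J^{t}\|_F^2
\quad \text{subject to} \quad J S_t \;=\; \nabla F(w^{t})\, S_t .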

Exercise

Show that the solution of the sketch-and-project problem is the projection formula derived below.

Proof: write down the Lagrangian of the constrained least-squares problem; differentiating in J and setting the result to zero gives equation (1), and the sketched constraint gives equation (2); substituting (1) into (2) gives the solution.
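A hedged worked version of this derivation under the formulation above; the multiplier notation Y and the invertibility of S_t^T S_t are assumptions:

% Lagrangian of the constrained least-squares problem, with multiplier Y:
L(J, Y) = \tfrac{1}{2}\|J - J^{t}\|_F^2 + \bigl\langle Y,\; (J - \nabla F(w^{t}))\, S_t \bigr\rangle .

% (1) Differentiating in J and setting to zero:   J = J^{t} - Y S_t^{\top} .
% (2) The sketched constraint:                    J S_t = \nabla F(w^{t})\, S_t .

% Substituting (1) into (2), solving for Y (assuming S_t^\top S_t is invertible), and substituting back:
J^{t+1} = J^{t} + \bigl(\nabla F(w^{t}) - J^{t}\bigr)\, S_t \bigl(S_t^{\top} S_t\bigr)^{-1} S_t^{\top} .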


Unbiased Condition

Lemma.

Proof:
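As an illustrative assumption rather than the lemma itself, one concrete unbiased gradient estimate built from a Jacobian estimate is the SAGA-style correction for the SGD sketch:

% For the SGD sketch S = e_j with j uniform on {1,...,n}:
g^{t} = \nabla f_j(w^{t}) \;-\; J^{t} e_j \;+\; \tfrac{1}{n}\, J^{t} \mathbf{1} ,

% which is unbiased because E_j[\nabla f_j(w^t)] = \nabla f(w^t) and E_j[J^t e_j] = (1/n) J^t \mathbf{1}:
\mathbb{E}_j\bigl[g^{t}\bigr] = \nabla f(w^{t}) .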

Exercise

Proof:

A Jacobian Based Method

Archetype Jacobian Sketching Algorithm: it looks expensive and complicated, so let us investigate a concrete special case.

Homework:

Example: minibatch-SAGA

Gradient estimate

Jacobian update

Homework:
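A minimal runnable sketch of minibatch-SAGA, combining the SAGA-style unbiased gradient estimate with a minibatch Jacobian update; the data, batch size, stepsize, and regularization weight are assumptions carried over from the earlier sketches.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data, as in the earlier sketches.
n, d = 862, 2
X = rng.normal(size=(n, d))
y = np.sign(X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=n))
lam = 1.0 / n

def batch_gradients(w, idx):
    # Columns of the true Jacobian for the sampled minibatch idx (one datum gradient per column).
    margins = y[idx] * (X[idx] @ w)
    coeffs = -y[idx] / (1.0 + np.exp(margins))
    return (X[idx] * coeffs[:, None]).T + lam * w[:, None]

batch_size = 16
L_max = np.max(np.sum(X**2, axis=1)) / 4.0 + lam
alpha = 1.0 / (3.0 * L_max)             # assumed stepsize of order 1/L_max

w = np.zeros(d)
J = np.zeros((d, n))                    # Jacobian estimate: column i stores the last gradient of f_i
Jbar = np.zeros(d)                      # (1/n) * J @ 1, kept up to date for O(d) access

for t in range(20 * (n // batch_size)):
    idx = rng.choice(n, size=batch_size, replace=False)    # minibatch sketch
    G = batch_gradients(w, idx)                             # fresh columns of the true Jacobian
    g = G.mean(axis=1) - J[:, idx].mean(axis=1) + Jbar      # SAGA-style unbiased gradient estimate
    Jbar += (G - J[:, idx]).sum(axis=1) / n                 # maintain the column average
    J[:, idx] = G                                           # sketch-and-project update for this minibatch
    w = w - alpha * g

print(w)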

Proving Convergence of Variance Reduced Methods
