
Page 1: Optimization for Deep Learning

Optimization for Deep Learning

Industrial AI Lab. Prof. Seungchul Lee

Page 2: Optimization for Deep Learning

Optimization

• 3 key components

1) Objective function
2) Decision variable or unknown
3) Constraints

• Procedures

1) The process of identifying the objective, variables, and constraints for a given problem (known as "modeling")
2) Once the model has been formulated, an optimization algorithm can be used to find its solution


Page 3: Optimization for Deep Learning

Optimization: Mathematical Model

• In mathematical expression:

โ€“ ๐‘ฅ๐‘ฅ =๐‘ฅ๐‘ฅ1โ‹ฎ๐‘ฅ๐‘ฅ๐‘›๐‘›

โˆˆ โ„๐‘›๐‘› is the decision variable

โ€“ ๐‘“๐‘“:โ„๐‘›๐‘› โ†’ โ„ is objective functionโ€“ Feasible region: ๐ถ๐ถ = {๐‘ฅ๐‘ฅ:๐‘”๐‘”๐‘–๐‘–(๐‘ฅ๐‘ฅ) โ‰ค 0, ๐‘–๐‘– = 1,โ‹ฏ ,๐‘š๐‘š}

โ€“ ๐‘ฅ๐‘ฅโˆ— โˆˆ โ„๐‘›๐‘› is an optimal solution if ๐‘ฅ๐‘ฅโˆ— โˆˆ ๐ถ๐ถ and ๐‘“๐‘“(๐‘ฅ๐‘ฅโˆ—) โ‰ค ๐‘“๐‘“ ๐‘ฅ๐‘ฅ ,โˆ€๐‘ฅ๐‘ฅ โˆˆ ๐ถ๐ถ


Page 4: Optimization for Deep Learning

Optimization: Mathematical Model

• In mathematical expression

• Remarks: equivalent formulations exist, since maximizing $f(x)$ is the same as minimizing $-f(x)$, and a constraint $g(x) \geq 0$ can be rewritten as $-g(x) \leq 0$


Page 5: Optimization for Deep Learning

Solving Optimization Problems


Page 6: Optimization for Deep Learning

Solving Optimization Problems

• Starting with the unconstrained, one-dimensional case

– To find a minimum point $x^*$, we can look at the derivative of the function, $f'(x)$

– Any location where $f'(x) = 0$ will be a "flat" point in the function

• For convex problems, this is guaranteed to be a minimum
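As a concrete illustration (this particular function is ours, not from the slides): for the convex function $f(x) = (x - 1)^2$,

$$
f'(x) = 2(x - 1) = 0 \quad \Longrightarrow \quad x^* = 1,
$$

and since $f''(x) = 2 > 0$ everywhere, this flat point is the global minimum.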


Page 7: Optimization for Deep Learning

Solving Optimization Problems

• Generalization to a multivariate function $f : \mathbb{R}^n \rightarrow \mathbb{R}$

– The gradient of $f$ must be zero

• For $f$ defined as above, the gradient is an $n$-dimensional vector containing the partial derivatives with respect to each dimension (written out below)

• For continuously differentiable $f$ and unconstrained optimization, the optimal point must satisfy

$$
\nabla_x f(x^*) = 0
$$


Page 8: Optimization for Deep Learning

How do we Find $\nabla_x f(x) = 0$?

• Direct solution

– In some cases, it is possible to analytically compute $x^*$ such that $\nabla_x f(x^*) = 0$
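The deck does not name a concrete instance here; one classic case where the direct approach works is least squares, $f(x) = \lVert Ax - b \rVert_2^2$, where setting the gradient $2A^\top(Ax - b)$ to zero yields the normal equations $A^\top A x = A^\top b$. A minimal NumPy sketch (the data and dimensions are illustrative):

```python
import numpy as np

# Illustrative data: an overdetermined linear system Ax ~ b
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
b = rng.normal(size=100)

# Setting the gradient of f(x) = ||Ax - b||^2 to zero gives the
# normal equations A^T A x = A^T b, solvable in closed form.
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Sanity check against NumPy's built-in least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_star, x_lstsq)
```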


Page 9: Optimization for Deep Learning

How do we Find $\nabla_x f(x) = 0$?

• Iterative methods

– More commonly, the condition that the gradient equal zero will not have an analytical solution, requiring iterative methods

– The gradient points in the direction of "steepest ascent" for the function $f$
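One way to see this (a standard argument, not spelled out on the slide): by the first-order Taylor approximation,

$$
f(x + \delta) \approx f(x) + \nabla_x f(x)^\top \delta,
$$

so among all small steps $\delta$ of fixed length, $f$ increases most when $\delta$ points along $\nabla_x f(x)$ and decreases most when $\delta$ points along $-\nabla_x f(x)$.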


Page 10: Optimization for Deep Learning

Descent Direction (1D)

• This motivates the gradient descent algorithm, which repeatedly takes steps in the direction of the negative gradient


Page 11: Optimization for Deep Learning

Gradient Descent


Page 12: Optimization for Deep Learning

Gradient Descent

• Update rule:
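The rule itself is presumably shown as a figure in the original slide; in symbols, with step size (learning rate) $\alpha > 0$, the standard gradient descent update is

$$
x \leftarrow x - \alpha \, \nabla_x f(x)
$$

repeated until $\nabla_x f(x) \approx 0$ or an iteration budget is exhausted.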


Page 13: Optimization for Deep Learning

Practically Solving Optimization Problems

• The good news: for many classes of optimization problems, people have already done all the "hard work" of developing numerical algorithms

– A wide range of tools can take optimization problems in "natural" forms and compute a solution

• Gradient descent (a minimal sketch follows below)

– Easy to implement

– Very general: can be applied to any differentiable loss function

– Requires less memory and computation (for stochastic methods)

– Neural networks / deep learning: TensorFlow
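Since the deck stops at bullet points, here is a minimal gradient descent sketch in plain NumPy; the quadratic objective, step size, and tolerances are illustrative choices, not from the slides:

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, tol=1e-8, max_iter=10_000):
    """Minimize f by repeatedly stepping along the negative gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:    # gradient ~ 0: (near-)stationary point
            break
        x = x - alpha * g              # update rule: x <- x - alpha * grad f(x)
    return x

# Example: f(x) = 1/2 x^T Q x - b^T x with gradient Q x - b,
# so the exact minimizer solves Q x* = b.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = gradient_descent(lambda x: Q @ x - b, x0=np.zeros(2))
assert np.allclose(x_star, np.linalg.solve(Q, b), atol=1e-6)
```

In practice, deep learning frameworks such as TensorFlow compute `grad_f` automatically via backpropagation and apply stochastic variants of this same update to minibatches.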
