
Page 1: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies

Tutorial: A Unified Framework for

Optimization under Uncertainty

Enterprisewide Optimization GroupCarnegie Mellon University

September 11, 2018

Warren B. Powell

Princeton UniversityDepartment of Operations Research

and Financial Engineering

© 2018 Warren B. Powell, Princeton University

Page 2:

Ad-click bidding

Roomsage.com

Page 3:

Revenue management

Earning vs. learning
» You want to maximize revenues, but you do not know how demand responds to price.
» You earn the most with prices near the middle, but you do not learn anything.
» You learn the most by sampling the endpoints, but then you do not earn anything.

Page 4:

Learning problems

Health sciences
» Sequential design of experiments for drug discovery
» Drug delivery – optimizing the design of protective membranes to control drug release
» Medical decision making – optimal learning for medical treatments

Page 5:

Drug discovery

Designing molecules
» X and Y are sites where we can hang substituents to change the behavior of the molecule. We approximate the performance using a linear belief model:

  Y ≈ θ_0 + Σ_{sites i} Σ_{substituents j} θ_ij X_ij

» How do we sequence experiments to learn the best molecule as quickly as possible?

Page 6:

Ride sharing

Uber/Lyft
» Provides real-time, on-demand transportation.
» Drivers are encouraged to enter or leave the system using pricing signals and informational guidance.

Decisions:
» How to price to get the right balance of drivers relative to customers.
» Real-time management of drivers.
» Policies (rules for managing drivers, customers, …)

Page 7:

Matching buyers with sellers

Now we have a logistic curve for each origin-destination pair (i,j), giving the probability that an offered price p_aij is accepted:

  P^Y(p_aij | θ_ij) = e^{θ_ij^0 + θ_ij^1 p_aij} / (1 + e^{θ_ij^0 + θ_ij^1 p_aij})

The number of offers for each (i,j) pair is relatively small, so we need to generalize the learning across hundreds to thousands of markets.

Page 8:

Emergency storm response

Hurricane Sandy
» Once in 100 years?
» A rare convergence of events
» But meteorologists did an amazing job of forecasting the storm.

The power grid
» Loss of power creates cascading failures (lack of fuel, inability to pump water).
» How to plan?
» How to react?

Page 9:

Meeting variability with portfolios of generation with mixtures of dispatchability

Page 10:

Storage applications

How much energy to store in a battery to handle the volatility of wind and spot prices while meeting demands?

Page 11:

Modeling

Before we can solve complex problems, we have to know how to think about them. The biggest challenge when making decisions under uncertainty is modeling.

The mathematician writes

  min E{cx}  subject to  Ax = b, x ≥ 0,

while the software engineer organizes class libraries and sets up communications and databases.

Page 12:

Modeling

For deterministic problems, we speak the language of mathematical programming.

» Linear programming:

  min_x cx
  subject to  Ax = b, x ≥ 0

» For time-staged problems:

  min_{x_0,…,x_T} Σ_{t=0}^{T} c_t x_t
  subject to  A_t x_t − B_{t−1} x_{t−1} = b_t,  D_t x_t ≤ u_t,  x_t ≥ 0

Arguably Dantzig’s biggest contribution, more so than the simplex algorithm, was his articulation of optimization problems in a standard format, which has given algorithmic researchers a common language.

Page 13:

Stochastic programming, Markov decision processes, reinforcement learning, optimal control, model predictive control, robust optimization, approximate dynamic programming, online computation, simulation optimization, stochastic search, decision analysis, stochastic control, dynamic programming and control, optimal learning, bandit problems


Page 15:

Outline

Elements of a dynamic model

Modeling uncertainty

Designing policies

The four classes of policies

From deterministic to stochastic optimization


Page 17:

Modeling dynamic systems

All sequential decision problems can be modeled using five core components:

» State variables – What do we need to know at time t?
» Decision variables – What are our decisions?
» Exogenous information – What do we learn for the first time between t and t+1?
» Transition function – How do the state variables evolve over time?
» Objective function – What are our performance metrics?
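The five components above can be sketched as a small simulation loop. This is a minimal sketch on a toy inventory problem; all function names and numbers here are illustrative assumptions, not taken from the slides.

```python
import random

def transition(S, x, W):
    """Transition function: the state evolves given decision x and new demand W."""
    return max(0, S + x - W)

def policy(S):
    """A simple decision rule: order enough to bring inventory up to 10 units."""
    return max(0, 10 - S)

def contribution(S, x, W):
    """Objective: revenue for satisfied demand minus ordering cost (toy numbers)."""
    return 5 * min(S + x, W) - 2 * x

rng = random.Random(0)
S, total = 3, 0.0                      # state variable S_0
for t in range(20):
    x = policy(S)                      # decision variable x_t
    W = rng.randint(0, 12)             # exogenous information W_{t+1}
    total += contribution(S, x, W)     # objective function
    S = transition(S, x, W)            # transition to S_{t+1}
```

The loop makes the convention concrete: the decision x_t is made knowing S_t, the exogenous information W_{t+1} arrives afterward, and the transition produces S_{t+1}.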

Page 18:

Modeling dynamic problems

The state variable:

Controls community: "information state"

Operations research/MDP/Computer science: S_t = (R_t, I_t, B_t) is the system state, where:

» R_t = resource state (physical state)
  • Location/status of truck/train/plane
  • Energy in storage
» I_t = information state
  • Prices
  • Weather
» B_t = belief state ("state of knowledge")
  • Belief about performance of a drug or catalyst
  • Belief about the status of equipment

Page 19:

The state variable

My definition of a state variable:

» The first depends on a policy. The second depends only on the problem (and includes the constraints).
» Using either definition, all properly modeled problems are Markovian!

Page 20:

Modeling dynamic problems

Decisions:

» Markov decision processes/Computer science: a_t = discrete action
» Control theory: u_t = low-dimensional continuous vector
» Operations research: x_t = usually a discrete or continuous but high-dimensional vector of decisions.

At this point, we do not specify how to make a decision. Instead, we define the function X^π(s) (or A^π(s) or U^π(s)), where π specifies the type of policy. "π" carries information about the type of function, and any tunable parameters.

Page 21:

The decision variables

Styles of decisions
» Binary: x ∈ X = {0, 1}
» Finite: x ∈ X = {1, 2, …, M}
» Continuous scalar: x ∈ X = [a, b]
» Continuous vector: x = (x_1, …, x_K), x_k ∈ R
» Discrete vector: x = (x_1, …, x_K), x_k ∈ Z
» Categorical: x = (a_1, …, a_I), a_i is a category (e.g. red/green/blue)

Page 22:

Modeling dynamic problems

Exogenous information:

W_t = new information that first became known at time t:

  W_t = (R̂_t, D̂_t, p̂_t, Ê_t)

» R̂_t = equipment failures, delays, new arrivals; new drivers being hired to the network
» D̂_t = new customer demands
» p̂_t = changes in prices
» Ê_t = information about the environment (temperature, …)

Note: Any variable indexed by t is known at time t. This convention, which is not standard in control theory, dramatically simplifies the modeling of information.

Below, we let ω represent a sequence of actual observations W_1, W_2, …; W_t(ω) refers to a sample realization of the random variable W_t.

Page 23:

Modeling dynamic problems

The transition function:

  S_{t+1} = S^M(S_t, x_t, W_{t+1})

For example:

  R_{t+1} = R_t + x_t + R̂_{t+1}   (inventories)
  p_{t+1} = p_t + p̂_{t+1}   (spot prices)
  D_{t+1} = D_t + D̂_{t+1}   (market demands)

Also known as the: "system model", "state transition model", "plant model", "plant equation", "transition law", "state equation", "transfer function", "transformation function", "law of motion", or simply "model".

For many applications, these equations are unknown. This is known as "model-free" dynamic programming.
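The three example equations above translate directly into code. This is an illustrative sketch with the state held in a plain dict; the starting values and noise values are made up.

```python
def transition(S, x, W):
    """S_{t+1} = S^M(S_t, x_t, W_{t+1}) for the inventory/price/demand example."""
    return {
        "R": S["R"] + x + W["R_hat"],   # R_{t+1} = R_t + x_t + R-hat_{t+1}
        "p": S["p"] + W["p_hat"],       # p_{t+1} = p_t + p-hat_{t+1}
        "D": S["D"] + W["D_hat"],       # D_{t+1} = D_t + D-hat_{t+1}
    }

S0 = {"R": 10.0, "p": 25.0, "D": 4.0}              # state at time t
W1 = {"R_hat": -2.0, "p_hat": 1.5, "D_hat": 0.5}   # exogenous information W_{t+1}
S1 = transition(S0, 3.0, W1)                       # order x_t = 3
```

Note that x only appears in the resource equation: prices and demands evolve exogenously, which is exactly what makes them "information" rather than "resources".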

Page 24:

Modeling stochastic, dynamic problems

Objective functions:

» Cumulative reward ("online learning") – policies have to work well over time:

  max_π E { Σ_{t=0}^{T} C(S_t, X^π(S_t), W_{t+1}) | S_0 }

» Final reward ("offline learning") – we only care about how well the final decision x^{π,N} works:

  max_π E { F̂(x^{π,N}, Ŵ) | S_0 }

» Risk:

  max_π ρ( C(S_0, X^π(S_0)), C(S_1, X^π(S_1)), …, C(S_T, X^π(S_T)) | S_0 )

Page 25:

Modeling stochastic, dynamic problems

The complete model:

» Objective function
  • Cumulative reward ("online learning"):  max_π E { Σ_{t=0}^{T} C(S_t, X^π(S_t), W_{t+1}) | S_0 }
  • Final reward ("offline learning"):  max_π E { F̂(x^{π,N}, Ŵ) | S_0 }
  • Risk:  max_π ρ( C(S_0, X^π(S_0)), …, C(S_T, X^π(S_T)) | S_0 )
» Transition function:  S_{t+1} = S^M(S_t, x_t, W_{t+1})
» Exogenous information:  (S_0, W_1, W_2, …, W_T)
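Since the expectation in the cumulative-reward objective can rarely be computed exactly, it is typically estimated by averaging simulated sample paths. A sketch of that idea, using an illustrative scalar-inventory model (the contribution, transition, and demand distribution here are assumptions for the example):

```python
import random

def simulate_policy(policy, T=50, n_paths=200, seed=1):
    """Estimate E sum_t C(S_t, X(S_t), W_{t+1}) by averaging sample paths."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        S = 0.0                                   # S_0
        path_reward = 0.0
        for t in range(T):
            x = policy(S)                         # decision x_t = X(S_t)
            W = rng.uniform(0.0, 10.0)            # exogenous demand W_{t+1}
            path_reward += 4 * min(S + x, W) - x  # contribution C(S_t, x_t, W_{t+1})
            S = max(0.0, S + x - W)               # transition S^M
        total += path_reward
    return total / n_paths

order_up_to_8 = lambda S: max(0.0, 8.0 - S)
value = simulate_policy(order_up_to_8)
```

Searching over policies π then means calling `simulate_policy` for each candidate and keeping the best estimate.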

Page 26:

The modeling process

Modeling real applications
» I conduct a conversation with a domain expert to fill in the elements of a problem:

  State variables – what we need to know (and only what we need)
  Decision variables – what we control
  New information – what we didn’t know when we made our decision
  Transition function – how the state variables evolve
  Objective function – performance metrics

Page 27:

Outline

Elements of a dynamic model

Modeling uncertainty

Designing policies

The four classes of policies

From deterministic to stochastic optimization

Page 28:

Modeling uncertainty

Classes of uncertainty

» Observational uncertainty

» Prognostic uncertainty (forecasting)

» Experimental noise/variability

» Transitional uncertainty

» Inferential uncertainty

» Model uncertainty

» Systematic exogenous uncertainty

» Control/implementation uncertainty

» Algorithmic noise

» Goal uncertainty

Modeling uncertainty in the context of stochastic optimization is a

relatively untapped area of research.

Page 29:

Outline

Elements of a dynamic model

Modeling uncertainty

Designing policies

The four classes of policies

From deterministic to stochastic optimization

Page 30:

Designing policies

We have to start by describing what we mean by a policy.

» Definition: A policy is a mapping from a state to an action. … any mapping.

How do we search over an arbitrary space of policies?

Page 31:

Designing policies

“Policies” and the English language:

Behavior, belief, bias, commandment, conduct, convention, culture, customs, dogma, etiquette, fashion, formula, habit, laws/bylaws, manner, method, mode, mores, patterns, plans, policies, practice, prejudice, principle, procedure, process, protocols, recipe, ritual, rule, style, technique, tenet, tradition, way of life

Page 32:

Designing policies

Two fundamental strategies for finding policies:

1) Policy search – search over a class of functions for making decisions to optimize some metric:

  max_{f∈F, θ^f∈Θ^f} E { Σ_{t=0}^{T} C(S_t, X^f(S_t | θ^f)) | S_0 }

2) Lookahead approximations – approximate the impact of a decision now on the future:

  X*_t(S_t) = argmax_{x_t} ( C(S_t, x_t) + E { max_π E { Σ_{t'=t+1}^{T} C(S_{t'}, X^π(S_{t'})) | S_{t+1} } | S_t, x_t } )

Page 33:

Designing policies

Policy search:

1a) Policy function approximations (PFAs): x_t = X^{PFA}(S_t | θ)
  • Lookup tables – “when in this state, take this action”
  • Parametric functions
    – Order-up-to policies: if inventory is less than s, order up to S.
    – Affine policies:  X^{PFA}(S_t | θ) = Σ_{f∈F} θ_f φ_f(S_t)
    – Neural networks
  • Locally/semi/non parametric – requires optimizing over local regions

1b) Cost function approximations (CFAs)
  • Optimizing a deterministic model modified to handle uncertainty (buffer stocks, schedule slack):

  X^{CFA}(S_t | θ) = argmax_{x_t ∈ X_t^π(θ)} C̄^π(S_t, x_t | θ)
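The order-up-to policy mentioned above is perhaps the simplest PFA: a rule with two tunable parameters. A minimal sketch (the parameter values are illustrative):

```python
def order_up_to(inventory, s=4.0, S=10.0):
    """(s, S) policy: if inventory is below s, order up to S; otherwise order nothing."""
    return S - inventory if inventory < s else 0.0

# Example: with 2 units on hand we order 8; with 6 units we order nothing.
x_low = order_up_to(2.0)
x_high = order_up_to(6.0)
```

Tuning this PFA means searching over (s, S) to maximize simulated performance, which is exactly the policy-search strategy from the previous slide.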

Page 34:

Designing policies

Lookahead approximations – approximate the impact of a decision now on the future:

2a) Approximating the value of being in a state (VFA):

  Optimal policy:
  X*_t(S_t) = argmax_{x_t} ( C(S_t, x_t) + E { V_{t+1}(S_{t+1}) | S_t, x_t } )

  Approximate policy:
  X^{VFA}_t(S_t) = argmax_{x_t} ( C(S_t, x_t) + E { V̄_{t+1}(S_{t+1}) | S_t, x_t } )
               = argmax_{x_t} ( C(S_t, x_t) + V̄^x_t(S^x_t) )   (post-decision form)

2b) Direct lookahead (DLA):

  Optimal policy:
  X*_t(S_t) = argmax_{x_t} ( C(S_t, x_t) + E { max_π E { Σ_{t'=t+1}^{T} C(S_{t'}, X^π(S_{t'})) | S_{t+1} } | S_t, x_t } )

  Approximate policy – solve an approximate lookahead model:
  X^{DLA}_t(S_t) = argmax_{x_t} ( C(S_t, x_t) + Ẽ { max_π̃ Ẽ { Σ_{t'=t+1}^{t+H} C(S̃_{tt'}, X̃^π̃(S̃_{tt'})) | S̃_{t,t+1} } | S_t, x_t } )

Page 35:

Four (meta)classes of policies

Policy search:

1) Policy function approximations (PFAs)
» Lookup tables, rules, parametric/nonparametric functions

2) Cost function approximations (CFAs)
» X^{CFA}(S_t | θ) = argmax_{x_t ∈ X_t^π(θ)} C̄^π(S_t, x_t | θ)

Lookahead approximations:

3) Policies based on value function approximations (VFAs)
» X^{VFA}_t(S_t) = argmax_{x_t} ( C(S_t, x_t) + V̄^x_t(S^x_t) )

4) Direct lookahead policies (DLAs)
» Deterministic lookahead / rolling horizon procedure / model predictive control:
  X^{TLA-D}_t(S_t) = argmax_{x_tt, x_{t,t+1}, …, x_{t,t+H}} ( C(S_tt, x_tt) + Σ_{t'=t+1}^{t+H} C(S̃_{tt'}, x̃_{tt'}) )
» Chance-constrained programming:
  P[ A_t x_t ≥ f(W_t) ] ≥ 1 − θ
» Stochastic lookahead / stochastic programming / Monte Carlo tree search:
  X^{TLA-S}_t(S_t) = argmax ( C(S_tt, x_tt) + Σ_ω p(ω) Σ_{t'=t+1}^{t+H} C(S̃_{tt'}(ω), x̃_{tt'}(ω)) )
» “Robust optimization”:
  X^{TLA-RO}_t(S_t) = argmax_{x_tt, …, x_{t,t+H}} min_{w ∈ W_t(θ)} ( C(S_tt, x_tt) + Σ_{t'=t+1}^{t+H} C(S̃_{tt'}(w), x̃_{tt'}(w)) )



Page 38:

Learning problems

Classes of learning problems in stochastic optimization:

1) Approximating the objective: 𝐹(𝑥|𝜃) ≈ 𝔼𝐹(𝑥,𝑊).
2) Designing a policy 𝑋𝜋(𝑆|𝜃).
3) A value function approximation: 𝑉𝑡(𝑆𝑡|𝜃) ≈ 𝑉𝑡(𝑆𝑡).
4) Designing a cost function approximation:
   • The objective function 𝐶𝜋(𝑆𝑡, 𝑥𝑡|𝜃).
   • The constraints 𝑋𝜋(𝑆𝑡|𝜃).
5) Approximating the transition function: 𝑆𝑀(𝑆𝑡, 𝑥𝑡, 𝑊𝑡+1|𝜃) ≈ 𝑆𝑀(𝑆𝑡, 𝑥𝑡, 𝑊𝑡+1).

Page 39:

Approximation strategies

» Lookup tables
  • Independent beliefs
  • Correlated beliefs
» Linear parametric models
  • Linear models
  • Sparse-linear
  • Tree regression
» Nonlinear parametric models
  • Logistic regression
  • Neural networks
» Nonparametric models
  • Gaussian process regression
  • Kernel regression
  • Support vector machines
  • Deep neural networks

Page 40:

Learning challenges

The learning challenge: from big (batch) data … to recursive learning

Page 41:

Learning challenges

Variable-dimensional learning

Page 42:

Outline

Elements of a dynamic model

Modeling uncertainty

Designing policies

The four classes of policies

From deterministic to stochastic optimization

Page 43:

Outline

The four classes of policies

» Policy function approximations (PFAs)

» Cost function approximations (CFAs)

» Value function approximations (VFAs)

» Direct lookahead policies (DLAs)

» A hybrid lookahead/CFA


Page 45:

Policy function approximations

Battery arbitrage – when to charge, when to discharge, given volatile LMPs

Page 46:

Policy function approximations

Grid operators require that batteries bid charge and discharge prices an hour in advance. We have to search for the best values for the policy parameters θ^Charge and θ^Discharge.

Page 47:

Policy function approximations

Our policy function might be the parametric model (this is nonlinear in the parameters):

  X^π(S_t | θ) = +1 (charge)  if p_t ≤ θ^charge
                  0 (hold)  if θ^charge < p_t < θ^discharge
                 −1 (discharge)  if p_t ≥ θ^discharge

where the state S_t consists of the energy in storage and the price of electricity p_t.

Page 48:

Policy function approximations

Finding the best policy
» We need to maximize

  max_θ F(θ) = E { Σ_{t=0}^{T} C(S_t, X^π(S_t | θ)) }

» We cannot compute the expectation, so we run simulations over the parameters θ^Charge and θ^Discharge.
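The simulation-based search described above can be sketched in a few lines: simulate the charge/discharge rule on sample price paths and grid-search over (θ^charge, θ^discharge). The price model, reward accounting, and grid values are illustrative assumptions, not the slides' actual experiment.

```python
import random

def policy(price, theta_charge, theta_discharge):
    """+1 = charge, -1 = discharge, 0 = hold."""
    if price <= theta_charge:
        return 1
    if price >= theta_discharge:
        return -1
    return 0

def simulate(theta_charge, theta_discharge, T=500, seed=7):
    """Simulated profit of the policy on one sample path of prices."""
    rng = random.Random(seed)
    storage, profit = 0.0, 0.0
    for t in range(T):
        price = rng.uniform(10.0, 90.0)   # volatile LMP (illustrative)
        x = policy(price, theta_charge, theta_discharge)
        if x == 1 and storage < 1.0:      # buy one unit of energy
            storage += 1.0
            profit -= price
        elif x == -1 and storage >= 1.0:  # sell one unit back to the grid
            storage -= 1.0
            profit += price
    return profit

# Grid search over the two policy parameters
best = max(
    ((lo, hi, simulate(lo, hi)) for lo in range(20, 50, 10)
                                for hi in range(50, 90, 10)),
    key=lambda z: z[2],
)
```

In practice one would average many sample paths per parameter pair (as in the objective-function slide) rather than trusting a single simulation.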

Page 49:

Outline

The four classes of policies

» Policy function approximations (PFAs)

» Cost function approximations (CFAs)

» Value function approximations (VFAs)

» Direct lookahead policies (DLAs)

» A hybrid lookahead/CFA

Page 50:

Cost function approximations

Lookup table
» We can organize potential catalysts into groups.
» Scientists using domain knowledge can estimate correlations in experiments between similar catalysts.

Page 51:

Cost function approximations

Correlated beliefs: testing one material teaches us about other materials.

Page 52:

Cost function approximations

Cost function approximations (CFA):

» Upper confidence bounding:

  X^{UCB}(S^n | θ^{UCB}) = argmax_x ( μ̄^n_x + θ^{UCB} √( log n / N^n_x ) )

» Interval estimation:

  X^{IE}(S^n | θ^{IE}) = argmax_x ( μ̄^n_x + θ^{IE} σ̄^n_x )

» Boltzmann exploration ("soft max") – choose x with probability:

  P^n(x) = e^{θ μ̄^n_x} / Σ_{x'} e^{θ μ̄^n_{x'}}
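The interval-estimation and UCB rules above are one-liners once we have estimated means, standard deviations, and counts. A sketch, with illustrative numbers (the lists `mu`, `sigma`, `counts` stand in for the belief state S^n):

```python
import math

def interval_estimation(mu, sigma, theta_ie):
    """Pick the alternative maximizing mu_x + theta_ie * sigma_x."""
    scores = [m + theta_ie * s for m, s in zip(mu, sigma)]
    return scores.index(max(scores))

def ucb(mu, counts, n, theta_ucb):
    """Pick the alternative maximizing mu_x + theta_ucb * sqrt(log(n) / N_x)."""
    scores = [m + theta_ucb * math.sqrt(math.log(n) / c)
              for m, c in zip(mu, counts)]
    return scores.index(max(scores))

mu = [1.0, 1.2, 0.8]
sigma = [0.5, 0.1, 0.9]
choice_ie = interval_estimation(mu, sigma, theta_ie=2.0)    # favors uncertain arms
choice_ucb = ucb(mu, counts=[10, 20, 2], n=32, theta_ucb=1.0)
```

With θ^{IE} = 2 the uncertain third alternative wins even though its mean is lowest, which is exactly the exploration bonus discussed on the following slides.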

Page 53:

Cost function approximations

Picking 𝜃𝐼𝐸 = 0 means we are evaluating each choice at the mean.

Page 54:

Cost function approximations

Picking 𝜃𝐼𝐸 = 2 means we are evaluating each choice at the 95th percentile.

Page 55:

Cost function approximations

Optimizing the policy
» We optimize 𝜃𝐼𝐸 to maximize:

  max_{θ^{IE}} F(θ^{IE}) = E { F̂(x^{π,N}, Ŵ) }

  where  x^n = X^{IE}(S^n | θ^{IE}) = argmax_x ( μ̄^n_x + θ^{IE} σ̄^n_x )

Notes:
» This can handle any belief model, including correlated beliefs and nonlinear belief models.
» All we require is that we be able to simulate a policy.

Page 56:

Cost function approximations

Other applications
» Airlines optimizing schedules with schedule slack to handle weather uncertainty.
» Manufacturers using buffer stocks to hedge against production delays and quality problems.
» Grid operators scheduling extra generation capacity in case of outages.
» Adding time to a trip planned by Google Maps to account for uncertain congestion.

Page 57:

Outline

The four classes of policies

» Policy function approximations (PFAs)

» Cost function approximations (CFAs)

» Value function approximations (VFAs)

» Direct lookahead policies (DLAs)

» A hybrid lookahead/CFA

Page 58:

Value function approximations

Q-learning (for discrete actions):

  q̂^n(s^n, a^n) = r(s^n, a^n) + γ max_{a'} Q̄^{n−1}(s', a')
  Q̄^n(s^n, a^n) = (1 − α_{n−1}) Q̄^{n−1}(s^n, a^n) + α_{n−1} q̂^n(s^n, a^n)

» But what if the action a is a vector?
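The two update equations above can be sketched directly. This is a toy illustration on an assumed two-state, two-action chain with a constant stepsize α; none of the specifics come from the slides.

```python
import random

def q_update(Q, s, a, r, s_next, gamma=0.9, alpha=0.1):
    """One Q-learning step: sample q_hat, then smooth it into the old estimate."""
    q_hat = r + gamma * max(Q[(s_next, a2)] for a2 in (0, 1))
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * q_hat
    return Q

Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
rng = random.Random(3)
s = 0
for n in range(200):
    a = rng.choice((0, 1))           # exploration policy
    s_next = a                       # action a deterministically moves us to state a
    r = 1.0 if s_next == 1 else 0.0  # reward for reaching state 1
    Q = q_update(Q, s, a, r, s_next)
    s = s_next
```

The slide's closing question is visible here: the table `Q` and the `max` over actions enumerate every action, which is fine for two actions but hopeless when a (or x) is a high-dimensional vector.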

Page 59:

Blood management

Managing blood inventories

Page 60:

Blood management

Managing blood inventories over time: decisions and information alternate over the weeks t = 0, 1, 2, 3:

  S_0 → x_0 → (R̂_1, D̂_1) → S_1 → x_1 → S^x_1 → (R̂_2, D̂_2) → S_2 → x_2 → S^x_2 → (R̂_3, D̂_3) → S_3 → x_3 → S^x_3 → …

Page 61:

[Network diagram: blood supplies by type and age, R_t,(AB+,0), R_t,(AB+,1), R_t,(AB+,2), …, R_t,(O−,0), R_t,(O−,1), R_t,(O−,2), are either assigned to satisfy a demand D_t,type for each blood type (AB+, AB−, A+, A−, B+, B−, O+, O−) or held, producing the post-decision inventory R^x_t. The state is S_t = (R_t, D̂_t).]

Page 62:

[Network diagram: the held post-decision inventory R^x_t ages by one period, and new donations R̂_{t+1,AB+}, …, R̂_{t+1,O−} arrive along with new demands D̂_t, forming the next pre-decision inventory R_{t+1} with ages 0 through 3.]

Page 63: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies

[Figure: the assignment network, showing the flow from the inventory R_t through the decision x to the post-decision inventory R_t^x]

Page 64

[Figure: the same network, with the one-period contribution F_t(R_t)]

Solve this as a linear program.

Page 65
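The single-period assignment problem can be set up as a small LP. The sketch below uses a toy two-type instance with made-up supplies, demands, and rewards (assuming `scipy` is available); the duals of the supply rows are the marginal values discussed on the following slides.

```python
from scipy.optimize import linprog

# Toy instance (all numbers hypothetical): two inventories serving two demand
# types. O- is the universal donor; AB+ can only serve AB demand.
supply = {"O-": 10.0, "AB+": 5.0}
demand = {"O": 8.0, "AB": 9.0}
reward = {("O-", "O"): 20.0, ("O-", "AB"): 15.0, ("AB+", "AB"): 20.0}

arcs = list(reward)                        # feasible (inventory, demand) arcs
c = [-reward[a] for a in arcs]             # linprog minimizes, so negate rewards

A_ub, b_ub = [], []
for s in supply:                           # flow out of each inventory <= supply
    A_ub.append([1.0 if a[0] == s else 0.0 for a in arcs])
    b_ub.append(supply[s])
for d in demand:                           # flow into each demand <= demand
    A_ub.append([1.0 if a[1] == d else 0.0 for a in arcs])
    b_ub.append(demand[d])

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None)] * len(arcs), method="highs")
flows = dict(zip(arcs, res.x))
# Marginals of the supply rows: with scipy's sign convention for a
# minimization, -marginal is the value of one more unit of that blood type.
supply_duals = dict(zip(supply, res.ineqlin.marginals[:len(supply)]))
```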

[Figure: the same linear program, now showing the dual variables ν̂_t,(AB,0), …, ν̂_t,(AB,2), ν̂_t,(O,0), …, ν̂_t,(O,2) on the inventory constraints]

Dual variables give the value of an additional unit of blood.

Page 66

Updating the value function approximation

Estimate the gradient ν̂ⁿ_t,(AB,2) at Rⁿ_t,(AB,2).

[Figure: F_t(R_t) with the sampled slope at Rⁿ_t]

Page 67

Updating the value function approximation

Update the value function V̄ⁿ⁻¹_{t−1}(R^x_{t−1}) at R^{x,n}_{t−1} using the sampled slope ν̂ⁿ_t,(AB,2).

Page 68

Updating the value function approximation

[Animation: the slope ν̂ⁿ_t,(AB,2) is smoothed into the approximation at R^{x,n}_{t−1}, producing the updated value function V̄ⁿ_{t−1}(R^x_{t−1})]

Page 69

Page 70

Exploiting concavity

Derivatives are used to estimate a piecewise-linear approximation V̄_t(R_t).

[Figure: piecewise-linear, concave V̄_t(R_t) as a function of R_t]

Page 71
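One way to maintain a concave, piecewise-linear approximation is to smooth each sampled derivative into the corresponding slope, then project back to decreasing slopes with a pool-adjacent-violators pass. A minimal sketch (illustrative numbers, not the exact algorithm from the slides):

```python
def project_decreasing(slopes):
    """Pool-adjacent-violators pass: merge neighboring segments whose means
    violate the decreasing order, so the returned slopes are concave."""
    pools = []                                   # each pool is [sum, count]
    for v in slopes:
        pools.append([v, 1])
        # Merge backwards while the decreasing ordering is violated.
        while (len(pools) > 1 and
               pools[-2][0] / pools[-2][1] < pools[-1][0] / pools[-1][1]):
            s, c = pools.pop()
            pools[-1][0] += s
            pools[-1][1] += c
    out = []
    for s, c in pools:
        out.extend([s / c] * c)
    return out

def update_concave_vfa(slopes, r, v_hat, alpha=0.2):
    """Smooth the sampled derivative v_hat into segment r, then restore
    concavity (monotonically decreasing slopes) by projection."""
    slopes = slopes[:]                           # work on a copy
    slopes[r] = (1 - alpha) * slopes[r] + alpha * v_hat
    return project_decreasing(slopes)

# Example: a large sampled slope at segment 2 forces a concavity repair.
new_slopes = update_concave_vfa([10.0, 8.0, 6.0, 4.0], r=2, v_hat=9.0, alpha=1.0)
```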

Iterative learning

[Animation over four slides: the simulate-and-update loop, repeated across iterations]

Page 72

Page 73

Page 74

Page 75

Approximate dynamic programming

[Figure: objective function (1,200,000 to 1,900,000) versus iterations (0 to 1000)]

… a typical performance graph.

Page 76

Outline

The four classes of policies

» Policy function approximations (PFAs)

» Cost function approximations (CFAs)

» Value function approximations (VFAs)

» Direct lookahead policies (DLAs)

» A hybrid lookahead/CFA

Page 77

Lookahead policies

Planning your next chess move:

» You put your finger on the piece while you think about moves into the future. This is a lookahead policy, illustrated for a problem with discrete actions.

Page 78
Page 79

Lookahead policies

Decision trees:

Page 80

Lookahead policies

Modeling lookahead policies

» Lookahead policies solve a lookahead model, which is an approximation of the future.

» It is important to understand the difference between:

• The base model – the model we are trying to solve by finding the best policy. This is typically a simulator, or it might be the real world.

• The lookahead model – our approximation of the future, used to help us make better decisions now.

Page 81
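A deterministic lookahead policy can be sketched as a receding-horizon loop: build a lookahead model from the forecasts, optimize it, implement only the first decision, and roll forward. The toy inventory model below is purely illustrative (the prices, costs, and the simple rule used for later periods are assumptions, and the first-period decision is found by enumeration rather than a full optimization):

```python
def lookahead_decision(inventory, forecasts, order_options=range(0, 11)):
    """Deterministic lookahead: score each first-period order against a
    lookahead model built purely from the forecasts, and return the best."""
    def lookahead_cost(first_order):
        inv, total = float(inventory), 0.0
        order = first_order
        for f in forecasts:            # the lookahead model sees only forecasts
            inv += order
            sold = min(inv, f)
            inv -= sold
            # buy cost - sales revenue + holding cost (illustrative numbers)
            total += 2.0 * order - 5.0 * sold + 0.1 * inv
            order = max(0, round(f - inv))  # simple rule for later periods
        return total
    return min(order_options, key=lookahead_cost)

# Implement only the first decision, then re-plan next period (receding horizon).
x0 = lookahead_decision(inventory=3, forecasts=[4, 6, 5])
```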

Emergency storm response

[Figure: decision tree of branching "call" decisions with outcome probabilities at each stage]

Page 82

Emergency storm response

[Figure: the decision tree with updated outcome probabilities]

Page 83

Emergency storm response

[Figure: the decision tree, a further animation frame]

Page 84

[Figure: tree stages labeled Decision – Outcome – Decision – Outcome – Decision]

Page 85
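Rolling back a decision tree alternates the two node types above: decision nodes take the best action, outcome nodes average over the random outcomes. A sketch with made-up values:

```python
# Node kinds: "decision" (take the best child), "outcome" (probability-weighted
# average over children), "leaf" (terminal value). All numbers are illustrative.

def rollback(node):
    if node["kind"] == "leaf":
        return node["value"]
    if node["kind"] == "decision":
        return max(rollback(child) for child in node["children"])
    # outcome node: children are (probability, subtree) pairs
    return sum(p * rollback(child) for p, child in node["children"])

tree = {"kind": "decision", "children": [
    {"kind": "outcome", "children": [
        (0.6, {"kind": "leaf", "value": 10.0}),
        (0.4, {"kind": "leaf", "value": 0.0})]},
    {"kind": "leaf", "value": 5.0},
]}
value = rollback(tree)    # max(0.6 * 10 + 0.4 * 0, 5) = 6.0
```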

Lookahead policies

Monte Carlo tree search:

C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis and S. Colton, "A survey of Monte Carlo tree search methods," IEEE Transactions on Computational Intelligence and AI in Games, vol. 4, no. 1, pp. 1–49, March 2012.

Page 86
Page 87

Outline

The four classes of policies

» Policy function approximations (PFAs)

» Cost function approximations (CFAs)

» Value function approximations (VFAs)

» Direct lookahead policies (DLAs)

» A hybrid lookahead/CFA

Page 88

Parametric cost function approximation

An energy storage problem:

Page 89

Parametric cost function approximation

Forecasts evolve over time as new information arrives:

[Figure: actual energy vs. the forecast made at midnight and rolling forecasts, updated each hour]

Page 90

Parametric cost function approximation

Benchmark policy – Deterministic lookahead

Page 91

Parametric cost function approximation

Parametric cost function approximations

» Replace the constraint that limits the wind-energy decisions (x^{wr}_{tt'}, x^{wd}_{tt'}) to the forecast with a version whose forecast is scaled by adjustment coefficients θ:

» Lookup-table modified forecasts (one adjustment term θ_{t'−t} for each time t' in the future)

» Exponential function for the adjustments (just two parameters)

» Constant adjustment (one parameter)

Page 92
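The three parameterizations can be written as functions of the lag t' − t, where θ scales the forecast used in the lookahead constraint. A sketch (all numerical values are illustrative):

```python
import math

# theta scales the forecast f_{t,t'} used in the lookahead constraint;
# lag = t' - t. Values below are made up for illustration.

def lookup_adjustment(theta_table, lag):          # one coefficient per lag
    return theta_table[lag]

def constant_adjustment(theta, lag):              # a single coefficient
    return theta

def exponential_adjustment(theta0, theta1, lag):  # two parameters
    return theta0 * math.exp(-theta1 * lag)

forecast = [50.0, 48.0, 45.0, 40.0]               # f_{t,t+lag}, lag = 0..3
adjusted = [f * exponential_adjustment(1.0, 0.1, lag)
            for lag, f in enumerate(forecast)]
```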

[Figure: tuned adjustment coefficients θ for the lookup table, constant parameter, and exponential function parameterizations, shown for forecast noise levels s^f = 0, 10, 20, 30]

Page 93

Parametric cost function approximation

Improvement over deterministic benchmark:

[Figure: percentage improvement for the lookup table, exponential, and constant parameterizations]

Page 94

An energy storage problem

Consider a basic energy storage problem:

» We are going to show that with minor variations in the characteristics of this problem, we can make each class of policy work best.

Page 95

An energy storage problem

We can create distinct flavors of this problem:

» Problem class 1 – Best for PFAs
• Highly stochastic (heavy-tailed) electricity prices
• Stationary data

» Problem class 2 – Best for CFAs
• Stochastic prices and wind (but not heavy-tailed)
• Stationary data

» Problem class 3 – Best for VFAs
• Stochastic wind and prices (but not too random)
• Time-varying loads, but inaccurate wind forecasts

» Problem class 4 – Best for deterministic lookaheads
• Relatively low-noise problem with accurate forecasts

» Problem class 5 – A hybrid policy worked best here
• Stochastic prices and wind, nonstationary data, noisy forecasts

Page 96

An energy storage problem

The policies

» The PFA:
• Charge the battery when the price is below p1
• Discharge when the price is above p2

» The CFA:
• Optimize over a horizon H; maintain upper and lower bounds (u, l) for every time period except the first (note that this is a hybrid with a lookahead)

» The VFA:
• Piecewise-linear, concave value function in terms of energy, indexed by time

» The lookahead (deterministic):
• Optimize over a horizon H (the only tunable parameter) using forecasts of demand, prices and wind energy

» The lookahead CFA:
• Use a (deterministic) lookahead policy, but with a tunable parameter that improves robustness

Page 97
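The PFA above is a two-threshold rule, and tuning it is a policy search over θ = (p1, p2). A sketch on a made-up price process (battery parameters, prices, and the tuning grid are all assumptions):

```python
import random

def simulate_pfa(p1, p2, n_steps=5000, capacity=10.0, rate=1.0, seed=1):
    """Average per-period profit of the threshold policy: charge when the
    price is below p1, discharge when it is above p2."""
    rng = random.Random(seed)
    energy, profit = 0.0, 0.0
    for _ in range(n_steps):
        price = max(0.0, rng.gauss(30.0, 10.0))   # illustrative price process
        if price < p1 and energy < capacity:      # charge (buy energy)
            q = min(rate, capacity - energy)
            energy += q
            profit -= price * q
        elif price > p2 and energy > 0.0:         # discharge (sell energy)
            q = min(rate, energy)
            energy -= q
            profit += price * q
    return profit / n_steps

# Policy search: a coarse grid over theta = (p1, p2).
grid = [(p1, p2) for p1 in (15, 20, 25) for p2 in (35, 40, 45)]
best_theta = max(grid, key=lambda theta: simulate_pfa(*theta))
```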

An energy storage problem

Each policy is best on certain problems

» Results are percent of the posterior optimal solution

» … any policy might be best, depending on the data.

Joint research with Prof. Stephan Meisel, University of Muenster, Germany.

Page 98

Outline

Elements of a dynamic model

Modeling uncertainty

Designing policies

The four classes of policies

From deterministic to stochastic optimization

Page 99

From deterministic to stochastic

Imagine that you would like to solve the time-dependent linear program:

  \min_{x_0, \ldots, x_T} \sum_{t=0}^T c_t x_t

» subject to

  A_0 x_0 = b_0
  A_t x_t - B_{t-1} x_{t-1} = b_t, \quad t \ge 1.

We can convert this to a proper stochastic model by replacing x_t with X^\pi_t(S_t) and taking an expectation:

  \min_\pi \mathbb{E} \sum_{t=0}^T c_t X^\pi_t(S_t)

The policy has to satisfy A_t X^\pi_t(S_t) = R_t, with transition function:

  S_{t+1} = S^M(S_t, x_t, W_{t+1})

Page 100

Modeling

Deterministic:

» Objective function:  \min_{x_0, \ldots, x_T} \sum_{t=0}^T c_t x_t

» Decision variables:  x_0, \ldots, x_T

» Constraints at time t:  A_t x_t = R_t, \; x_t \ge 0

» Transition function:  R_{t+1} = b_{t+1} - B_t x_t

Stochastic:

» Objective function:  \max_\pi \mathbb{E}\left\{ \sum_{t=0}^T C(S_t, X^\pi_t(S_t), W_{t+1}) \mid S_0 \right\}

» Policy:  X^\pi : S \to X

» Constraints at time t:  x_t = X^\pi_t(S_t) \in X_t

» Transition function:  S_{t+1} = S^M(S_t, x_t, W_{t+1})

» Exogenous information:  (S_0, W_1, W_2, \ldots, W_T)

Page 101

From deterministic to stochastic

Deterministic problems

» Modeling is important, but not central.

» Algorithms are the most important, and hardest, part.

» Evaluating a solution? Just add up the costs!!

Stochastic problems

» Modeling is the most important, and hardest, aspect of stochastic optimization.

» Searching for policies is important, but less critical.

» Modeling uncertainty is often overlooked, but is of central importance.

» Evaluating a policy is important, and difficult. In a simulator? In the field?

Page 102

The universal objective function

  \max_\pi \mathbb{E}\left\{ \sum_{t=0}^T C(S_t, X^\pi_t(S_t), W_{t+1}) \mid S_0 \right\}

with

  S_{t+1} = S^M(S_t, X^\pi_t(S_t), W_{t+1})

You next need to develop a stochastic model:

» Model uncertainty about parameters in S_0

» Model the stochastic process W_1, W_2, \ldots, W_N (for training)

» Model the random variable W (for testing, if necessary)

Then search for policies:

» Policy search: PFAs, CFAs

» Lookahead policies: VFAs, DLAs

Page 103
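The universal objective is evaluated by simulation: average the cumulative contribution of the policy over sample paths of W. The tiny inventory model and base-stock policy below are illustrative placeholders, not from the slides:

```python
import random

def evaluate_policy(policy, n_paths=200, T=20, seed=0):
    """Monte Carlo estimate of E[ sum_t C(S_t, X^pi(S_t), W_{t+1}) | S_0 ]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        S = 0                                   # initial state S_0 (inventory)
        for t in range(T):
            x = policy(S)                       # decision x_t = X^pi(S_t)
            W = rng.randint(0, 4)               # exogenous demand W_{t+1}
            sold = min(S + x, W)
            total += 5.0 * sold - 2.0 * x       # contribution C(S_t, x_t, W_{t+1})
            S = S + x - sold                    # transition function S^M
    return total / n_paths

order_up_to_3 = lambda S: max(0, 3 - S)         # a simple base-stock policy
never_order = lambda S: 0

# Same seed => both policies are evaluated on the same sample paths.
v_base_stock = evaluate_policy(order_up_to_3)
v_do_nothing = evaluate_policy(never_order)
```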

Thank you!

For more information, go to

http://www.castlelab.princeton.edu/jungle/

Scroll to “Educational materials”

Page 104
Page 105
Page 106
Page 107

Theory

Applications

Computation

Modeling