
Planning and Model Selection in Data Driven Markov models

Shie Mannor

Department of Electrical Engineering, Technion

Joint work with many people along the way: Dotan Di-Castro (Yahoo!), Assaf Halak (Technion), Ofir Mebel (Apple), Aviv Tamar (Technion), John Tsitsiklis (MIT), Huan Xu (NUS), and many others.

ICML, June 2014


Classical planning problems

We typically want to maximize the expected average/discounted reward.

In planning:
- the model is “known”
- there is a single scalar reward

We believe the model to be true, even though it is not ...


Flavors of uncertainty

“Natural” uncertainty: random transitions/rewards → classical MDP planning. Not this talk.

Deterministic uncertainty in the parameters → robust MDPs. First half of this talk.

Probabilistic uncertainty in the parameters → Bayesian RL/MDPs. Not this talk (but a book is coming!).

Model uncertainty: the model is not known → second half of this talk.


Motivation I: Power grid

Unit commitment scheduling problem (very dynamic).
Objective: guarantee reliability (99.9%).

The model is kind of known, but highly stochastic.
Failures (outages) do happen!


Motivation II

Large US retailer (Fortune 500 company).
Marketing problem: send or not send a coupon/invitation/mail-order catalogue?

Common wisdom: per customer, look at RFM:
Recency, Frequency, Monetary value.

Dynamics matter.
How to discretize?


Common to the problems

“Real” state space is huge with lots of uncertainty and parameters

Batch data are available

Reasonable simulators, but not ground truth

Computational speed is less of an issue, but we still need to get things done

Low-probability events are important

Uncertainty and (model) risk are THE concern


This talk

How to choose the “right” model based on available data?

How to optimize when the model is not (fully) known?


Part I: The Model Selection Problem

We will focus on a simpler problem:
1. Ignore actions completely (MRP). We have: state → reward → next state.
2. We observe a sequence of T observations and rewards that occur in some space O × R (O is complicated):
   D(T) = (o_1, r_1, o_2, r_2, . . . , o_T, r_T).
3. We are given K mappings from O to state spaces S_1, . . . , S_K, belonging to MRPs M_1, . . . , M_K, respectively. Each mapping H_i : O → S_i describes a model where S_i = {x_1^(i), . . . , x_{|S_i|}^(i)} is the state space of the MRP M_i.
4. We do not describe how the mappings {H_i}_{i=1}^K are constructed.

[HalakM, KDD 2013]
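
To make the setup concrete, here is a minimal Python sketch (the data, the mappings, and all names are illustrative, not from the talk): a trajectory of (observation, reward) pairs and two candidate mappings H_i that aggregate raw observations into the state spaces of competing MRPs.

```python
from typing import Callable, Hashable, Sequence, Tuple

# Hypothetical toy data: D(T) = (o_1, r_1, ..., o_T, r_T) with 2-d
# raw observations. Everything here is illustrative.
Observation = Tuple[float, float]
trajectory: Sequence[Tuple[Observation, float]] = [
    ((0.2, 1.3), 1.0), ((0.8, 0.1), 0.0), ((0.3, 1.1), 1.0),
]

# Two candidate mappings H_i : O -> S_i. The first is coarse (one
# binary feature); the second is finer (bins both coordinates).
mappings: Sequence[Callable[[Observation], Hashable]] = [
    lambda o: (o[0] > 0.5,),
    lambda o: (round(o[0], 1), round(o[1], 1)),
]

for i, H in enumerate(mappings, start=1):
    print(f"M_{i} state trajectory:", [H(o) for o, _ in trajectory])
```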


The Identification Problem

A model selection criterion takes as input D_T and the models M_1, . . . , M_K, and it returns one of the K models as the proposed true model.

Definition: A model selection criterion is weakly consistent if

P_i( M(D_T) ≠ i ) → 0 as T → ∞,

where P_i is the probability induced when model i is the correct model.


Penalized Likelihood Criteria

In abstraction: data samples y_1, y_2, . . . , y_T.

L_i(T) = max_θ { log P(y_1, . . . , y_T | M_i(θ)) }.

We denote the dimension of θ by |M_i|. Then, an MDL model estimator has the following structure:

MDL(i) ≜ |M_i| f(T) − L_i(T),

where f(T) is some sub-linear function.

Many related criteria: AIC, BIC, and many others.
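
As a hedged illustration (not the talk's code), an MDL-style score can be computed per model from its maximized log-likelihood L_i(T) and dimension |M_i|; f(T) = log T is one common sub-linear choice (BIC penalizes (1/2) log T per parameter).

```python
import math

def mdl_score(log_likelihood: float, dim: int, T: int) -> float:
    """MDL(i) = |M_i| * f(T) - L_i(T), here with f(T) = log(T)."""
    return dim * math.log(T) - log_likelihood

def select_model(log_likelihoods, dims, T):
    """Return the index of the model with the smallest penalized score."""
    scores = [mdl_score(ll, d, T) for ll, d in zip(log_likelihoods, dims)]
    return scores.index(min(scores))

# Toy usage: model 2 fits slightly better but pays for 10 extra parameters.
print(select_model([-1200.0, -1195.0], [5, 15], T=1000))  # -> 0
```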


Impossibility result

Theorem: There does not exist a consistent MDL-like criterion.


Identifying Markov Reward Processes

We look at the aggregate prediction error.

Two types of aggregation:
1. Reward aggregation
2. Transition probability aggregation

We will focus on refined models: M_1 ≼ M_2 means that M_2 is a refinement of M_1.


Reward Aggregation

Define the reward mean square error (RMSE) operator to be

L_RMSE^i(D_T) = (1/T) ∑_{j∈S_i} ε(x_j^i),

where ε(x_j^i) is the error of the reward estimate in state j of model i.

Observation: lim_{T→∞} L_RMSE^i(D_T) = ∑_{x∈S_i} π(x) Var(x).

Lemma: Suppose M_i contains M_k. Then, for a single trajectory D_T, we have L_RMSE^i(D_T) ≤ L_RMSE^k(D_T). Moreover, if the states aggregated in M_i have different mean rewards, then the inequality is sharp.

Corollary: Consider a series of refined models M_1 ≼ . . . ≼ M_k. Then

L_RMSE^1(D_T) ≥ L_RMSE^2(D_T) ≥ . . . ≥ L_RMSE^k(D_T).


Reward Aggregation Score

Define the (reward) score for the j-th model to be

M(j) = |M_j| f(T)/T + L_RMSE^j(D_T),    (1)

where f(T) is a sub-linear increasing function with lim_{T→∞} f(T)/√T → ∞.

Based on the RMSE, we consider the following model selector:

M_RMSE = argmin_j { M(j) }.

Theorem: The model selector M_RMSE is weakly consistent.

Finite-time analysis gives rates.
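
A minimal sketch of how such a selector might be computed from a single trajectory, assuming per-state sample means as the reward estimates (names are illustrative, and f(T)/T is passed in directly):

```python
def rmse_score(states, rewards, f_over_T):
    """Score(j) = |M_j| * f(T)/T + empirical reward MSE under model j."""
    states = list(states)
    T = len(rewards)
    groups = {}
    for s, r in zip(states, rewards):
        groups.setdefault(s, []).append(r)
    means = {s: sum(rs) / len(rs) for s, rs in groups.items()}
    mse = sum((r - means[s]) ** 2 for s, r in zip(states, rewards)) / T
    return len(means) * f_over_T + mse

def select_by_rmse(mappings, observations, rewards, f_over_T=0.1):
    """Pick argmin_j of the penalized RMSE score over candidate mappings."""
    scores = [rmse_score((H(o) for o in observations), rewards, f_over_T)
              for H in mappings]
    return scores.index(min(scores))
```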


Experiments with artificial data

The figure reports the test statistic M(k) for different model dimensions k. The error bars are one standard deviation from the mean.


Experiments with artificial data

The test statistic AIC(k)/BIC(k) for different model dimensions k.


Experiments with real data

Large US apparel retailer.
RFM measures: Recency, Frequency and Monetary value.

Problem: how to aggregate? Focus on recency:
1. Randomly
2. Most recent
3. Least recent


Real data

[Three panels: random, lowest, highest. Each graph is for a different value of f(T)/T: blue = 1, green = 10, red = 50.]


Mini-conclusion

A very special model selection problem.

Standard approaches fail, but not all is lost.

How to aggregate?

Mismatched models?

MRP → MDP: for optimization and for model discovery.


Part II: Robust MDPs

When in doubt, assume the worst.

Model = {S, P, R, A}

Assume:
- S and A are known and given
- P ∈ 𝒫 and R ∈ ℛ are not known

Look for a policy with best worst-case performance.


Robust MDPs

The problem becomes:

(∗)   max_policy min_disturbance E_{policy, disturbance} [ ∑_t γ^t r_t ]

- A game against nature.
- In general, the problem is NP-hard.
- A nice (non-robust) generative probabilistic interpretation: if Pr_gen(disturbance) ≥ 1 − δ, then (∗) approximates percentile optimization.
- Can get generalization bounds: robustness = generalization.


Robust MDPs (Cont’)

The problem is solvable when the disturbance is a product uncertainty set:

disturbance = { (P, R) : P ∈ ∏_s U_p(s), R ∈ ∏_s U_r(s) }

- Take the worst outcome for every state (a robust value-iteration sketch follows).
- Eventually solved by Nilim and El Ghaoui (OR, 2005) using DP (assuming regular uncertainty sets).
- If really bad events are to be included, U_p and U_r must be made huge → conservativeness.
- A complicated tradeoff to make: bad events cannot be correlated.
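
To illustrate the per-state worst case under a product uncertainty set, here is a toy robust value iteration in Python; the finite candidate sets, sizes, rewards, and discount are all made up for the sketch.

```python
import numpy as np

# Toy robust value iteration (not the talk's code): per state-action
# pair we have a finite set of candidate transition vectors (a product
# uncertainty set), and the robust Bellman update takes the worst
# candidate at every state independently.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

# P[s][a] is a list of candidate next-state distributions.
P = [[[rng.dirichlet(np.ones(n_states)) for _ in range(2)]
      for _ in range(n_actions)] for _ in range(n_states)]
R = rng.uniform(0, 1, size=(n_states, n_actions))

V = np.zeros(n_states)
for _ in range(500):
    Q = np.array([[R[s, a] + gamma * min(p @ V for p in P[s][a])
                   for a in range(n_actions)] for s in range(n_states)])
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
print("robust value function:", V)
```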


Uncertainty sets

Convex sets → large uncertainty.


Useful uncertainty sets I

At most K disasters:

disturbance_K = { (P, R) : P_s = P_nom, R_s = R_nom, except at at most K states s_1, . . . , s_K, where (P_{s_i}, R_{s_i}) ∈ (∆P, ∆R) }

A generative model: every node gets disturbed independently with probability µ, so

Pr_generative(at most K disturbances) = 1 − (binomial tail) = ∑_{k=0}^{K} (|S| choose k) µ^k (1 − µ)^{|S|−k}.
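
A quick sanity check of this generative model (a sketch; the state count and µ below are made up): the probability of at most K disturbances among |S| independently disturbed states is a binomial CDF.

```python
from math import comb

def prob_at_most_K(n_states: int, K: int, mu: float) -> float:
    """Binomial CDF: P(at most K of n_states nodes are disturbed)."""
    return sum(comb(n_states, k) * mu**k * (1 - mu)**(n_states - k)
               for k in range(K + 1))

print(prob_at_most_K(n_states=100, K=3, mu=0.01))  # ~0.98
```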


Useful uncertainty sets II

Continuous uncertainty “budget”:

Suppose there is a probability Pr_dist((P, R); (P_nom, R_nom)).

A generative model: every node gets disturbed with a continuous probability Pr_dist:

Pr_generative(disturbance) = ∏_{s∈S} Pr_dist( (P_s, R_s); (P_{s,nom}, R_{s,nom}) )

disturbance_λ = { (P, R) : Pr_generative(disturbance) ≤ λ }

Not convex → cannot solve.


State Augmentation

Original state space: S.
New state space: S × {0, 1, . . . , K} for discrete disasters.
New state space: S × [0, λ] for the uncertainty budget.

Recipe: use your favorite algorithm with the augmented state space (see the sketch below).
1. Can use structural properties (monotonicity) in the uncertainty.
2. For continuous uncertainty, can discretize efficiently.

Caveat: must assume the uncertainty is observed.
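
A minimal sketch of the recipe for the "at most K disasters" case, assuming the uncertainty is observed as the caveat requires (all names are illustrative): the planner's state is augmented with the number of disasters nature has already spent.

```python
from itertools import product

n_states, K = 4, 2

# Augmented state space S x {0, 1, ..., K}: (s, k) where k counts the
# disasters nature has used so far.
augmented_states = list(product(range(n_states), range(K + 1)))

def nature_options(k: int):
    """Inside a DP backup at (s, k): nature plays the nominal model,
    or, while budget remains, spends one disaster (moving k -> k + 1)."""
    return ["nominal"] if k == K else ["nominal", "disturb"]

print(len(augmented_states), nature_options(0), nature_options(K))
```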


Scaling-up robust MDPs

Smallish MDPs are easy (robust or not).
Non-robust MDPs can be handled with function approximation or policy-search methods.

What about large robust MDPs?

Difficulty 1: the state space in many problems is very large.
Difficulty 2: the parameters of the dynamics are rarely known, so it is hard to do Monte Carlo.


A crash course on ADP

In ADP (Approximate Dynamic Programming) we want to solve the Bellman equation.

Policy evaluation: write

T^π y = r + γ P y.

The Bellman equation is J^π = T^π J^π.

Assume J = Φ · w (Φ is known and given, and w is a vector in, say, R^n). With approximation:

Φ · w = Π T^π Φ · w

(Π is the projection onto {Φw : w ∈ R^n}).

It is not too hard to show that Π T^π is a contraction → unique fixed point.
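
For policy evaluation, this projected fixed point is what LSTD-style methods compute from a sampled trajectory; here is a standard sketch with toy inputs assumed.

```python
import numpy as np

def lstd(phis, rewards, gamma):
    """Solve Phi^T D (Phi - gamma * P Phi) w = Phi^T D r from samples.

    phis: feature vectors phi(s_0), ..., phi(s_N) (one more than rewards);
    the stationary distribution D is represented implicitly by sampling."""
    phis = [np.asarray(p, dtype=float) for p in phis]
    d = len(phis[0])
    A = np.zeros((d, d))
    b = np.zeros(d)
    for t, r in enumerate(rewards):
        A += np.outer(phis[t], phis[t] - gamma * phis[t + 1])
        b += r * phis[t]
    return np.linalg.solve(A, b)
```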


Back to robust MDPs

In robust MDPs:

J^π(s) = r(s, π(s)) + γ inf_{P∈P} E_P^π [ J^π(s′) | s, π(s) ]

Writing the pessimistic transition as σ_P(v) = inf_{p∈P} pᵀ v, we can write:

T^π y = r + γ σ_P(y)

Not linear anymore!

One can show that T v = sup_π T^π v is a contraction [Iyengar, 2005].


Solving Robust MDPs

Fixed point equation (writing J = Φ · w):

Φ · w = Π T^π Φ · w    (∗)

In general: no solution.

But: if for some β < 1 we have that

P(s′ | s, π(s)) ≤ (β/γ) P̂(s′ | s, π(s)) for all P,

where P̂ is the conditional according to which the data are observed, then:
1. Π T^π is a contraction, and (∗) has a unique solution.
2. This solution is close to the optimal one.


Solving Robust MDPs (cont')

Value iteration:

Φ · w_{k+1} = Π T^π Φ · w_k    (∗)

Solution:

w_{k+1} = (Φᵀ D Φ)^{−1} ( Φᵀ D r + γ Φᵀ D σ_P(Φ w_k) )

Three terms to control:

Φᵀ D Φ ≈ (1/N) ∑_{t=1}^{N} φ(s_t) φ(s_t)ᵀ,   Φᵀ D r ≈ (1/N) ∑_{t=1}^{N} φ(s_t) r(s_t, π(s_t)),

and

Φᵀ D σ_P(Φ w_k) ≈ (1/N) ∑_{t=1}^{N} φ(s_t) σ_P(Φ w_k)(s_t).

But how do we compute the pessimistic σ_P(Φ w_k)?
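
A hedged sketch of this sampled update (the pessimistic term σ is supplied by an inner oracle, discussed next; all inputs are toy, and the 1/N factors cancel in the solve).

```python
import numpy as np

def robust_projected_vi(Phi, rewards, sigma, gamma, n_iter=100):
    """Iterate w_{k+1} = (Phi^T Phi)^{-1} Phi^T (r + gamma * sigma(w_k)).

    Phi: (N, d) features at sampled states; sigma(w) -> (N,) pessimistic
    next-state values sigma_P(Phi w) evaluated at the sampled states."""
    Phi = np.asarray(Phi, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    A = Phi.T @ Phi                      # sample version of Phi^T D Phi
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        w = np.linalg.solve(A, Phi.T @ (rewards + gamma * sigma(w)))
    return w
```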


Solving the inner problem

Pessimizing oracle:

inf_{p∈P} ∑_{s reachable} p(s) φ(s)ᵀ w_k

Can be solved when P(s, a) = { p : dist(p, p̂) ≤ ε, and p is a probability distribution }, and in some other lucky cases (an L1-ball sketch follows).

So we know how to “find” the value function of a given policy.

Finding the optimal policy, SARSA-style:
1. Define φ(s, a) instead of just φ(s).
2. Let π_w = argmax_a φ(s, a)ᵀ w.
3. Do value iteration in the new state-action problem.

[Tamar M Xu, ICML 2014]
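
For the L1-ball case P = { p : ||p − p̂||₁ ≤ ε }, the inner infimum has a simple closed-form solution: shift up to ε/2 probability mass from the states with the largest values onto the state with the smallest value. A sketch (toy numbers):

```python
import numpy as np

def worst_case_l1(p_hat, values, eps):
    """Minimize p . values over {p : ||p - p_hat||_1 <= eps} on the simplex."""
    p = np.array(p_hat, dtype=float)
    i_min = int(np.argmin(values))
    add = min(eps / 2.0, 1.0 - p[i_min])   # mass moved onto the worst state
    p[i_min] += add
    remaining = add
    for i in np.argsort(values)[::-1]:      # take mass from best states first
        if remaining <= 0:
            break
        if i == i_min:
            continue
        take = min(p[i], remaining)
        p[i] -= take
        remaining -= take
    return p

# Toy usage: values play the role of phi(s)^T w_k at reachable states.
print(worst_case_l1([0.5, 0.3, 0.2], values=[1.0, 0.0, 2.0], eps=0.2))
```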


Example from option trading

Trading an option: [figure]


Back to motivating examples I

Unit commitment problem (day-ahead / a few days ahead):

- The classical approach is N/N+1.
- Renewables (wind/solar) and demand response cause much stochasticity.
- Modeling is hard (complex internal state, dependencies, high dimensionality).
- Rare events are essential: need to robustify.


Back to motivating examples II

Marketing problem:

- Our model is quite bad to start with.
- Robust optimization fights your model mismatch.
- Uncertainty set construction is mysterious (but is it important?).

Main issue: validation.
Secondary issue: optimization.


Conclusions

Model misspecification → parameter uncertainty → robust approach.

Why does uncertainty matter?
- Models come from data, and data are a bitch.
- Rewards come from data or from humans, which is even worse.

Outlook:
- Contextual parameters
- Latent models [Bandits: MaillardM, ICML 2014]
- Policy evaluation
- Model validation
