
Planning and Model Selection in Data-Driven Markov Models

Shie Mannor

Department of Electrical Engineering, Technion

Joint work with many people along the way: Dotan Di-Castro (Yahoo!), Assaf Hallak (Technion), Ofir Mebel (Apple), Aviv Tamar (Technion), John Tsitsiklis (MIT), Huan Xu (NUS), and many others.

ICML, June 2014


Classical planning problems

We typically want to maximize the expected average/discounted reward.

In planning: the model is “known”, and there is a single scalar reward.

We believe the model to be true, even though it is not ...


Flavors of uncertainty

“Natural” uncertainty: random transitions/rewards → classical MDP planning. Not this talk.

Deterministic uncertainty in the parameters → Robust MDPs. 1/2 of this talk.

Probabilistic uncertainty in the parameters → Bayesian RL/MDPs → not this talk (but a book is coming!).

Model uncertainty: the model is not known → second 1/2 of this talk.


Motivation I: Power grid

Unit commitment scheduling problem (very dynamic). Objective: guarantee reliability (99.9%).

The model is kind of known, but highly stochastic. Failures (outages) do happen!


Motivation II

Large US retailer (Fortune 500 company). Marketing problem: send or do not send a coupon/invitation/mail-order catalogue.

Common wisdom: per customer, look at RFM (Recency, Frequency, Monetary value).

Dynamics matter. How to discretize?


Common to the problems

“Real” state space is huge with lots of uncertainty and parameters

Batch data are available

Reasonable simulators, but not ground truth

Computational speed is less of an issue, but we still need to get things done.

Important low-probability events.

Uncertainty and (model) risk are THE concern.


This talk

How to choose the “right” model based on the available data?

How to optimize when the model is not (fully) known?


Part I: The Model Selection Problem

We will focus on a simpler problem:

1. Ignore actions completely (MRP). We have: state → reward → next state.

2. We observe a sequence of T observations and rewards in some space $O \times \mathbb{R}$ ($O$ is complicated):
   $D(T) = (o_1, r_1, o_2, r_2, \dots, o_T, r_T)$.

3. We are given K mappings from $O$ to state spaces $S_1, \dots, S_K$, belonging to MRPs $M_1, \dots, M_K$, respectively. Each mapping $H_i : O \to S_i$ describes a model where $S_i = \{x^{(i)}_1, \dots, x^{(i)}_{|S_i|}\}$ is the state space of the MRP $M_i$.

4. We do not describe how the mappings $\{H_i\}_{i=1}^{K}$ are constructed.

[Hallak & Mannor, KDD 2013]


The Identification Problem

A model selection criterion takes as input $D_T$ and the models $M_1, \dots, M_K$, and returns one of the K models as the proposed true model.

Definition: A model selection criterion is weakly consistent if

$P_i\big(M(D_T) \neq i\big) \to 0 \quad \text{as } T \to \infty,$

where $P_i$ is the probability induced when model $i$ is the correct model.


Penalized Likelihood Criteria

In abstraction: data samples $y_1, y_2, \dots, y_T$.

$L_i(T) = \max_\theta \{\log P(y_1, \dots, y_T \mid M_i(\theta))\}.$

We denote the dimension of $\theta$ by $|M_i|$. Then an MDL model estimator has the following structure:

$\mathrm{MDL}(i) \triangleq |M_i|\, f(T) - L_i(T),$

where $f(T)$ is some sub-linear function.

Many related criteria: AIC, BIC, and many others.
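A minimal sketch of how such a penalized score can be evaluated, assuming the maximized log-likelihoods are already available and taking $f(T) = T^{2/3}$; the function names and the penalty choice are illustrative assumptions only:

```python
import numpy as np

def mdl_score(log_likelihood, num_params, T, f=lambda T: T ** (2.0 / 3.0)):
    """Penalized-likelihood score: MDL(i) = |M_i| * f(T) - L_i(T)."""
    return num_params * f(T) - log_likelihood

def select_model(log_likelihoods, num_params_list, T):
    """Return the index of the model with the smallest penalized score."""
    scores = [mdl_score(ll, k, T) for ll, k in zip(log_likelihoods, num_params_list)]
    return int(np.argmin(scores)), scores
```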


Impossibility result

Theorem: There does not exist a consistent MDL-like criterion.


Identifying Markov Reward Processes

We look at the aggregate prediction error.

Two types of aggregation:
1. Reward aggregation
2. Transition probability aggregation

We will focus on refined models: $M_1 \preceq M_2$ if $M_2$ is a refinement of $M_1$.


Reward Aggregation

Define the reward mean square error (RMSE) operator to be

$L^i_{\mathrm{RMSE}}(D_T) = \frac{1}{T} \sum_{j \in S_i} \varepsilon(x^i_j),$

where $\varepsilon(x^i_j)$ is the error in state $j$ in model $i$ of the reward estimate.

Observation: $\lim_{T \to \infty} L^i_{\mathrm{RMSE}}(D_T) = \sum_{x \in S_i} \pi(x)\,\mathrm{Var}(x)$.

Lemma: Suppose $M_i$ contains $M_k$. Then, for a single trajectory $D_T$, we have $L^i_{\mathrm{RMSE}}(D_T) \le L^k_{\mathrm{RMSE}}(D_T)$. Moreover, if the states aggregated in $M_i$ have different mean rewards, then the inequality is sharp.

Corollary: Consider a series of refined models $M_1 \preceq \dots \preceq M_k$. Then

$L^1_{\mathrm{RMSE}}(D_T) \ge L^2_{\mathrm{RMSE}}(D_T) \ge \dots \ge L^k_{\mathrm{RMSE}}(D_T).$


Reward Aggregation Score

The (reward) score for the j-th model is

$M(j) = \frac{|M_j|\, f(T)}{T} + L^j_{\mathrm{RMSE}}(D_T), \qquad (1)$

where $f(T)$ is a sub-linear increasing function with $\lim_{T \to \infty} f(T)/\sqrt{T} = \infty$.

Based on the RMSE, we consider the following model selector:

$M_{\mathrm{RMSE}} = \arg\min_j \{M(j)\}.$

Theorem: The model selector $M_{\mathrm{RMSE}}$ is weakly consistent.

Finite-time analysis gives rates.
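A sketch of the resulting selector, assuming the per-state error $\varepsilon(x)$ is taken as the within-state squared deviation of the observed rewards from the state's empirical mean, and again using $f(T) = T^{2/3}$ (sub-linear, with $f(T)/\sqrt{T} \to \infty$); the data layout and names are assumptions for illustration:

```python
import numpy as np

def rmse_criterion(rewards_per_state, T):
    """One possible empirical reading of L^j_RMSE: within-state squared deviation
    of observed rewards from the state's empirical mean, summed and divided by T."""
    total = 0.0
    for r in rewards_per_state:            # rewards observed in one aggregated state
        r = np.asarray(r, dtype=float)
        if r.size > 0:
            total += np.sum((r - r.mean()) ** 2)
    return total / T

def aggregation_score(rewards_per_state, T, f=lambda T: T ** (2.0 / 3.0)):
    """Score M(j) = |M_j| * f(T) / T + L^j_RMSE(D_T), as in eq. (1)."""
    num_states = len(rewards_per_state)    # |M_j|: number of aggregated states
    return num_states * f(T) / T + rmse_criterion(rewards_per_state, T)

def select_aggregation(candidates, T):
    """candidates[j] is the per-state grouping of rewards induced by mapping H_j."""
    scores = [aggregation_score(c, T) for c in candidates]
    return int(np.argmin(scores)), scores
```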


Experiments with artificial data

The figure reports the test statistic M(k) for different model dimensions k. The error bars are one standard deviation from the mean.


Experiments with artificial data

The test statistic AIC(k)/BIC(k) for different model dimensions k.


Experiments with real data

Large US apparel retailer. RFM measures: Recency, Frequency and Monetary value.

Problem: How to aggregate? Focus on recency:

1. Randomly
2. Most recent
3. Least recent


Real data

Panels: (random), (lowest), (highest).

Each graph is for a different value of f(T)/T: blue = 1, green = 10, red = 50.


Mini-conclusion

A very special model selection problem.

Standard approaches fail, but not all is lost.

How to aggregate?

Mismatched models?

MRP → MDP: for optimization and for model discovery.


Part II: Robust MDPs

When in doubt, assume the worst.

Model = {S, P, R, A}

Assume: S and A are known and given; $P \in \mathcal{P}$ and $R \in \mathcal{R}$ are not known.

Look for a policy with best worst-case performance.


Robust MDPs

The problem becomes:

$(*) \qquad \max_{\text{policy}} \; \min_{\text{disturbance}} \; \mathbb{E}_{\text{policy},\,\text{disturbance}} \left[ \sum_t \gamma^t r_t \right]$

Game against nature.

In general, the problem is NP-hard.

Nice (non-robust) generative probabilistic interpretation: if $\Pr_{\mathrm{gen}}(\text{disturbance}) \ge 1 - \delta$, then $(*)$ approximates percentile optimization.

Can get generalization bounds: robustness = generalization.


Robust MDPs (Cont’)

The problem is solvable when the disturbance is a product uncertainty:

$\text{disturbance} = \left\{ (P, R) : P \in \prod_s U_p(s),\; R \in \prod_s U_r(s) \right\}$

Take the worst outcome for every state.

Eventually solved by Nilim and El Ghaoui (OR, 2005) using DP (assuming regular uncertainty sets).

If really bad events are to be included, we must make $U_p$ and $U_r$ huge → conservativeness.

A complicated tradeoff to make: can't correlate bad events.
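For intuition, a small sketch of robust value iteration under such a rectangular (product) uncertainty set, assuming each per-state-action uncertainty set is given explicitly as a finite list of candidate next-state distributions and the rewards are known; this is an illustrative instance, not the algorithm of Nilim and El Ghaoui:

```python
import numpy as np

def robust_value_iteration(P_sets, R, gamma, n_iter=500, tol=1e-8):
    """Robust value iteration with per-(s, a) rectangular uncertainty.

    P_sets[s][a] is a list of candidate next-state distributions (a finite
    uncertainty set U_p(s, a)); R[s, a] is a known reward. Returns the robust
    value function and a greedy robust policy.
    """
    n_states = len(P_sets)
    n_actions = len(P_sets[0])
    V = np.zeros(n_states)
    for _ in range(n_iter):
        Q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                # Adversary picks the worst candidate model for this state-action pair.
                worst = min(np.dot(p, V) for p in P_sets[s][a])
                Q[s, a] = R[s, a] + gamma * worst
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmax(axis=1)
```

The inner minimum is taken independently for every state-action pair, which is exactly the rectangularity assumption that makes the DP tractable.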


Uncertainty sets

Convex sets → large uncertainty.


Useful uncertainty sets I

At most K disasters:

$\text{disturbance}_K = \big\{ (P, R) : P_s = P_{\mathrm{nom}},\ R_s = R_{\mathrm{nom}} \text{ except at most at } s_1, \dots, s_K, \text{ where } P_{s_i}, R_{s_i} \in (\Delta_P, \Delta_R) \big\}$

A generative model: every node gets disturbed with probability $\mu$:

$\Pr_{\mathrm{generative}}(\text{at most } K \text{ disturbances}) = 1 - \text{binomial tail} = \sum_{k=0}^{K} \binom{K}{k} \mu^k (1 - \mu)^{K - k}$


Useful uncertainty sets II

Continuous uncertainty “budget”

Suppose there is a probability $\Pr_{\mathrm{dist}}\big((P, R); (P_{\mathrm{nom}}, R_{\mathrm{nom}})\big)$.

A generative model: every node gets disturbed with a continuous probability $\Pr_{\mathrm{dist}}$:

$\Pr_{\mathrm{generative}}(\text{disturbance}) = \prod_{s \in S} \Pr_{\mathrm{dist}}\big((P_s, R_s); (P_{s,\mathrm{nom}}, R_{s,\mathrm{nom}})\big)$

$\text{disturbance}_\lambda = \big\{ (P, R) : \Pr_{\mathrm{generative}}(\text{disturbance}) \le \lambda \big\}$

Not convex → cannot solve.


State Augmentation

Original state space: S
New state space: S × {0, 1, ..., K} for discrete disasters
New state space: S × [0, λ] for the uncertainty budget

Recipe: use your favorite algorithm with the augmented state space (see the sketch below).
1. Can use structural properties (monotonicity) in the uncertainty.
2. For continuous uncertainty, can discretize efficiently.

Caveat: must assume the uncertainty is observed.
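A sketch of the recipe for the at-most-K-disasters set, assuming a fixed policy (an MRP), a single candidate "disaster" transition row per state, and that the remaining budget is observed; all names and the specific disturbance model are illustrative assumptions:

```python
import numpy as np

def k_disaster_values(P_nom, P_bad, r, gamma, K, n_iter=1000, tol=1e-8):
    """Pessimistic values for an MRP where an adversary may replace the nominal
    transition row by a 'disaster' row at most K times along the trajectory.

    P_nom: (S, S) nominal transition matrix; P_bad: (S, S) disturbed rows;
    r: (S,) rewards. Returns V of shape (S, K + 1), where V[s, k] is the value
    when k disasters remain (the augmented state is (s, k)).
    """
    S = len(r)
    V = np.zeros((S, K + 1))
    for _ in range(n_iter):
        V_new = np.empty_like(V)
        for k in range(K + 1):
            nominal = P_nom @ V[:, k]
            if k == 0:
                worst = nominal                                   # budget exhausted
            else:
                worst = np.minimum(nominal, P_bad @ V[:, k - 1])  # adversary may strike
            V_new[:, k] = r + gamma * worst
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V
```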


Scaling-up robust MDPs

Smallish MDPs are easy (robust or not).
Non-robust MDPs can be handled with function approximation or policy search methods.

What about robust MDPs?

Difficulty 1: the state space in many problems is very large.
Difficulty 2: the parameters of the dynamics are rarely known, so it is hard to do Monte Carlo.


A crash course on ADP

In ADP (Approximate Dynamic Programming) we want to solve the Bellman equation.

(Policy evaluation): Write $T^\pi y = r + \gamma P y$.

The Bellman equation is $J^\pi = T^\pi J^\pi$.

Assume $J = \Phi w$ ($\Phi$ is known and given, and $w$ is a vector in, say, $\mathbb{R}^n$). With approximation:

$\Phi w = \Pi T^\pi \Phi w$

($\Pi$ is a projection onto $\{\Phi w : w \in \mathbb{R}^n\}$.)

Not too hard to show that $\Pi T^\pi$ is a contraction → unique fixed point.
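When the MRP is small and fully known, the projected fixed point can be computed in closed form, $w = \big(\Phi^\top D (\Phi - \gamma P \Phi)\big)^{-1} \Phi^\top D r$. A minimal sketch (the weighting $d$ and the names are assumptions for illustration):

```python
import numpy as np

def projected_fixed_point(Phi, P, r, gamma, d):
    """Solve Phi w = Pi T^pi Phi w for a small, fully known MRP.

    Phi: (S, n) feature matrix; P: (S, S) transition matrix under the evaluated
    policy; r: (S,) rewards; d: (S,) nonnegative weights defining the projection
    Pi (e.g. the stationary distribution). Returns the weight vector w.
    """
    D = np.diag(d)
    A = Phi.T @ D @ (Phi - gamma * (P @ Phi))   # Phi^T D (I - gamma P) Phi
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)
```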


Back to robust MDPs

In robust MDPs:

$J^\pi(s) = r(s, \pi(s)) + \gamma \inf_{P \in \mathcal{P}} \mathbb{E}_\pi\big[ J^\pi(s') \mid s, \pi(s) \big]$

Writing the pessimistic transition as $\sigma^\pi(v) = \inf_{p \in \mathcal{P}} p^\top v$, we can write

$T^\pi y = r + \gamma\, \sigma^\pi(y)$

Not linear anymore!

Can show that $T v = \sup_\pi T^\pi v$ is a contraction [Iyengar, 2005].


Solving Robust MDPs

Fixed-point equation (writing $J = \Phi w$):

$\Phi w = \Pi T^\pi \Phi w \qquad (*)$

In general: no solution.

But if for some $\beta < 1$ we have

$P(s' \mid s, \pi(s)) \le \frac{\beta}{\gamma}\, \bar{P}(s' \mid s, \pi(s)) \quad \text{for all } P \in \mathcal{P}$

(where $\bar{P}$ is the conditional distribution according to which the data are observed), then:

1. $\Pi T^\pi$ is a contraction and we have a unique solution to $(*)$.
2. This solution is close to the optimal one.


Solving Robust MDPs (cont')

Value iteration:

$\Phi w_{k+1} = \Pi T^\pi \Phi w_k \qquad (*)$

Solution:

$w_{k+1} = (\Phi^\top D \Phi)^{-1}\big( \Phi^\top D r + \gamma\, \Phi^\top D\, \sigma^\pi(\Phi w_k) \big)$

Three terms to control:

$\Phi^\top D \Phi \approx \frac{1}{N} \sum_{t=1}^{N} \phi(s_t)\phi(s_t)^\top, \qquad \Phi^\top D r \approx \frac{1}{N} \sum_{t=1}^{N} \phi(s_t)\, r(s_t, \pi(s_t))$

and

$\Phi^\top D\, \sigma^\pi(\Phi w_k) \approx \frac{1}{N} \sum_{t=1}^{N} \phi(s_t)\, \sigma_{\mathcal{P}(s_t)}(\Phi w_k).$

But how do we compute the pessimistic $\sigma_{\mathcal{P}}(\Phi w_k)$?
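A minimal sample-based sketch of this update, assuming the pessimizing oracle $\sigma$ is available as a callable (how to compute it is the subject of the next slide); the data layout and names are illustrative assumptions:

```python
import numpy as np

def robust_projected_vi(phi, rewards, sigma, gamma, n_iter=100):
    """Sample-based w_{k+1} = (Phi^T D Phi)^{-1} (Phi^T D r + gamma Phi^T D sigma(Phi w_k)).

    phi: (N, n) features of the visited states s_1..s_N;
    rewards: (N,) observed rewards r(s_t, pi(s_t));
    sigma(t, w): pessimizing oracle returning inf_{p in P(s_t)} sum_s p(s) phi(s)^T w
                 for the t-th sample.
    """
    N, n = phi.shape
    A = phi.T @ phi / N                       # estimates Phi^T D Phi
    b = phi.T @ rewards / N                   # estimates Phi^T D r
    w = np.zeros(n)
    for _ in range(n_iter):
        sig = np.array([sigma(t, w) for t in range(N)])
        c = phi.T @ sig / N                   # estimates Phi^T D sigma^pi(Phi w)
        w = np.linalg.solve(A, b + gamma * c)
    return w
```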


Solving the inner problem

Pessimizing oracle:

$\inf_{p \in \mathcal{P}} \sum_{s \text{ reachable}} p(s)\, \phi(s)^\top w_k$

Can be solved when $\mathcal{P}(s, a) = \{ p : \mathrm{dist}(p, \hat{p}) \le \varepsilon,\ p \text{ is a probability} \}$ and in some other lucky cases.

So we know how to “find” the value function of a given policy.

Finding the optimal policy, SARSA-style:
1. Define $\phi(s, a)$ instead of just $\phi(s)$.
2. Let $\pi_w(s) = \arg\max_a \phi(s, a)^\top w$.
3. Do value iteration in the new state-action problem.

[Tamar, Mannor & Xu, ICML 2014]
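One of the "lucky cases" is a total-variation (L1) ball around an empirical distribution $\hat{p}$, for which the inner minimization has a simple greedy solution: move at most $\varepsilon/2$ of probability mass from the highest-value states to the lowest-value state. A sketch under that assumption (the specific uncertainty set is chosen here for illustration only):

```python
import numpy as np

def worst_case_expectation_l1(p_hat, v, eps):
    """min_p p.v  s.t.  ||p - p_hat||_1 <= eps and p is a probability vector.

    Greedy solution: add up to eps/2 of mass to the lowest-value state and
    remove the same amount from the highest-value states.
    """
    p = np.array(p_hat, dtype=float)
    i_min = int(np.argmin(v))
    budget = min(eps / 2.0, 1.0 - p[i_min])    # mass we are allowed/able to add
    p[i_min] += budget
    for i in np.argsort(v)[::-1]:              # remove mass from high-value states first
        if budget <= 0:
            break
        if i == i_min:
            continue
        take = min(p[i], budget)
        p[i] -= take
        budget -= take
    return float(np.dot(p, v)), p
```

Such an oracle could be plugged in as the `sigma` callable in the previous sketch, evaluated on the features of the states reachable from $s_t$.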


Example from option trading

Trading an option:


Back to motivating examples I

Unit commitment problem (day-ahead/few days ahead):

Classical approach is N/N+1.
Renewables (wind/solar) and demand response cause much stochasticity.
Modeling is hard (complex internal state, dependencies, high dimensionality).
Rare events are essential: need to robustify.


Back to motivating examples II

Marketing problem:

Our model is quite bad to start with.
Robust optimization helps you fight model mismatch.
Uncertainty set construction is mysterious (but is it important?).

Main issue: validation.
Secondary issue: optimization.


Conclusions

Model misspecification → parameter uncertainty → robust approach.

Why does uncertainty matter?
Models come from data, and data are a bitch.
Rewards come from data or humans, which is even worse.

Outlook:
Contextual parameters
Latent models [Bandits: Maillard & Mannor, ICML 2014]
Policy evaluation
Model validation

