
Page 1:

Learning to Collaborate in Markov Decision Processes

Goran Radanovic, Rati Devidze, David C. Parkes, Adish Singla

Page 2:

Motivation: Human-AI Collaboration


[Figure: Example setting. Agent A1 (Helper-AI) commits to policy π1; Agent A2 (Human) best-responds to π1; together they perform a task.]

Behavioral differences: the agents have different models of the world [Dimitrakakis et al., NIPS 2017]

Page 3:

Motivation: Human-AI Collaboration


[Figure: Agent A1 (Helper-AI) commits to policy π1; Agent A2's (Human's) policy π2 changes over time, since humans change/adapt their behavior over time; together they perform a task.]

Can we use learning to find a good policy for A1 despite the changing behavior of A2, without explicitly modeling A2's learning dynamics?

Page 4:

Formal Model: Two-agent MDP

• Episodic two-agent MDP with commitments

• Goal: design a learning algorithm for A1 that achieves sublinear regret
  – Implies near-optimality for smooth MDPs


[Figure: From Agent A1's perspective, rewards and transitions are non-stationary.]
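To make the interaction protocol concrete, here is a minimal Python sketch of the episodic loop; the interfaces (env.step, learner.commit, human.respond) are hypothetical names assumed for illustration, not the paper's formalism.

```python
import numpy as np

def run_episode(env, pi1, pi2, horizon):
    """Both agents act in the shared MDP; A1 observes the resulting trajectory."""
    state, trajectory = env.reset(), []
    for _ in range(horizon):
        a1 = np.random.choice(len(pi1[state]), p=pi1[state])  # A1 plays its committed policy
        a2 = np.random.choice(len(pi2[state]), p=pi2[state])  # A2 plays its current policy
        state, reward = env.step(a1, a2)  # the joint action drives reward and transition
        trajectory.append((state, a1, a2, reward))
    return trajectory

def collaborate(env, learner, human, episodes, horizon):
    for t in range(episodes):
        pi1 = learner.commit()        # A1 commits to a policy for episode t
        pi2 = human.respond(pi1, t)   # A2's response changes over time, unknown to A1
        trajectory = run_episode(env, pi1, pi2, horizon)
        learner.update(trajectory)    # from A1's view the environment is non-stationary
```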

Page 5:

Experts with Double Recency Bias

• Based on experts in MDPs:
  – Assign an experts algorithm to each state
  – Use Q-values as experts' losses

• Introduce double recency bias


[Even-Dar et al., NIPS 2005]

" − 1" − %Recency windowing

&',)

Recency modulation

*' =1Γ-)./

0&',)!
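The following Python sketch shows one way this construction could look; the class name, hyperparameters (eta, window, decay), and the geometric stand-in for β_{t,τ} are illustrative assumptions, not the paper's specification.

```python
import numpy as np

class ExpDRBiasSketch:
    """Per-state experts with double recency bias (illustrative sketch)."""

    def __init__(self, n_states, n_actions, eta=0.1, window=10, decay=0.9):
        self.weights = np.ones((n_states, n_actions))  # one experts instance per state
        self.eta = eta        # experts learning rate
        self.window = window  # recency windowing: keep only the last `window` episodes
        self.decay = decay    # recency modulation: geometric stand-in for beta_{t,tau}
        self.q_history = []   # per-episode Q estimates, each of shape (n_states, n_actions)

    def policy(self, state):
        """Action distribution at `state` induced by the experts' weights."""
        w = self.weights[state]
        return w / w.sum()

    def update(self, q_t):
        """Feed the latest episode's Q-value estimates in as the experts' losses."""
        self.q_history.append(q_t)
        self.q_history = self.q_history[-self.window:]  # recency windowing
        k = len(self.q_history)
        betas = self.decay ** np.arange(k - 1, -1, -1)  # recent episodes weigh more
        loss = np.tensordot(betas, np.array(self.q_history), axes=1) / betas.sum()
        self.weights *= np.exp(-self.eta * loss)        # multiplicative-weights step
```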

Page 6:

Main Results (Informally)


Theorem: The regret of ExpDRBias decays as O(T^{max{1 − 3α/7, 1/4}}), provided that the magnitude of the change in A2's policy between consecutive episodes is O(T^{−α}), for some α > 0.

Theorem: Assume that the magnitude of the change in A2's policy is Ω(1). Then achieving sublinear regret is at least as hard as learning parity with noise.
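Spelling out the first bound, with an example instantiation of the drift rate α (the norm notation for A2's policy change is an assumed reading of "magnitude change"):

```latex
% Regret of ExpDRBias as a function of the drift rate \alpha
R(T) = O\left( T^{\max\left\{ 1 - \tfrac{3\alpha}{7},\; \tfrac{1}{4} \right\}} \right),
\qquad \text{assuming } \|\pi^2_{t+1} - \pi^2_t\| = O(T^{-\alpha}).
% Example: \alpha = 1 gives regret O(T^{4/7}); any \alpha > 0 yields a sublinear rate,
% while \alpha = 0 (i.e., \Omega(1) changes) is the hard regime of the second theorem.
```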

Page 7:

Thank you!

• Visit me at the poster session!
