
Learning to Collaborate in Markov Decision Processes

Goran Radanovic, Rati Devidze, David C. Parkes, Adish Singla

Motivation: Human-AI Collaboration


Agent A1 commits to policy π1; Agent A2 (best) responds to π1.

The two agents jointly perform a task.

Behavioral differences: the agents have different models of the world.

Example setting: Helper-AI (Agent A1) and Human (Agent A2) [Dimitrakakis et al., NIPS 2017].

Motivation: Human-AI Collaboration


Agent A1 commits to policy π1; Agent A2's policy π2 changes over time.

The two agents jointly perform a task.

In the Helper-AI/Human setting, humans change/adapt their behavior over time.

Can we use learning to adopt a good policy for A1 despite the changing behavior of A2, without modeling A2's learning dynamics in detail?

Formal Model: Two-agent MDP

• Episodic two-agent MDP with commitments

• Goal: design a learning algorithm for A1 that achieves sublinear regret
  – Implies near optimality for smooth MDPs


From Agent A1's perspective, rewards and transitions are non-stationary (because A2 adapts).
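As a minimal illustration of this interaction protocol, one episode of an episodic two-agent MDP with a commitment could look as follows in Python. This is a sketch under assumed names and signatures (TwoAgentMDP, run_episode, and their arguments are illustrative, not the paper's formalism):

# Illustrative sketch only: class and function names are assumptions, not the
# paper's formalism. It shows the commitment structure of one episode.
class TwoAgentMDP:
    def __init__(self, horizon, transition, reward):
        self.horizon = horizon        # episode length H
        self.transition = transition  # transition(s, a1, a2) -> next state
        self.reward = reward          # reward(s, a1, a2) -> shared reward

def run_episode(mdp, policy1, policy2, start_state=0):
    """A1 has committed to policy1 before the episode; A2 plays policy2,
    e.g. a (best) response to policy1 that may change from episode to episode."""
    s, total_reward = start_state, 0.0
    for _ in range(mdp.horizon):
        a1 = policy1(s)               # A1's committed policy
        a2 = policy2(s)               # A2's current (possibly adapted) policy
        total_reward += mdp.reward(s, a1, a2)
        s = mdp.transition(s, a1, a2)
    return total_reward

The point of the sketch is only the commitment structure: policy1 is fixed for the episode while policy2 may differ across episodes, which is exactly why the environment looks non-stationary to A1.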

Experts with Double Recency Bias

• Based on experts in MDPs [Even-Dar et al., NIPS 2005]:
  – Assign an experts algorithm to each state
  – Use Q-values as experts' losses

• Introduce double recency bias

" − 1" − %Recency windowing

&',)

Recency modulation

*' =1Γ-)./

0&',)!

Main Results (Informally)


Theorem: The regret of ExpDRBias decays sublinearly in the number of episodes, provided that the magnitude of change of A2's policy between consecutive episodes decays as O(1/t^α) for some α > 0.

Theorem: Assume that the magnitude of change of A2's policy is Ω(1). Then achieving sublinear regret is at least as hard as learning parity with noise.
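For reading these statements, one standard notion of regret (the exact benchmark used here is an assumption; the paper defines it formally) compares A1's committed policies against the best fixed commitment in hindsight, evaluated against the same sequence of A2's policies:

$\mathrm{Regret}(T) \;=\; \max_{\pi} \sum_{t=1}^{T} V^{\pi,\,\pi^2_t} \;-\; \sum_{t=1}^{T} V^{\pi^1_t,\,\pi^2_t}$

where $V^{\pi,\pi'}$ denotes the expected episode reward when A1 commits to $\pi$ and A2 plays $\pi'$. Sublinear regret means $\mathrm{Regret}(T)/T \to 0$ as the number of episodes $T$ grows.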

Thank you!

• Visit me at the poster session!

