Emergent Communication for Collaborative Reinforcement Learning
Yarin Gal and Rowan McAllister
MLG RCC
8 May 2014
Game Theory
Multi-Agent Reinforcement Learning
Learning Communication
Nash Equilibrium
“Nash equilibria are game-states s.t. no player would fare better by unilateral¹ change of their own action.”

¹Performed by or affecting only one person involved in a situation, without the agreement of another.
Prisoner’s Dilemma
|                  | Sideshow Bob: Cooperate | Sideshow Bob: Defect |
| ---------------- | ----------------------- | -------------------- |
| Snake: Cooperate | 1, 1                    | 3, 0                 |
| Snake: Defect    | 0, 3                    | 2, 2                 |

(prison sentence in years)
Pareto Efficiency
“Pareto optima are game-states s.t. no alternative state exists in which every player fares at least as well and at least one player fares strictly better.”
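To make the two definitions concrete, here is a small sketch that enumerates the Nash equilibria and Pareto optima of the Prisoner's Dilemma matrix above. The variable names and dictionary encoding are illustrative; payoffs are prison years, so lower is better.

```python
from itertools import product

actions = ["C", "D"]  # Cooperate, Defect
years = {  # (Snake's action, Bob's action) -> (Snake's sentence, Bob's sentence)
    ("C", "C"): (1, 1), ("C", "D"): (3, 0),
    ("D", "C"): (0, 3), ("D", "D"): (2, 2),
}

def is_nash(s):
    # no unilateral deviation strictly reduces the deviator's own sentence
    snake, bob = s
    snake_ok = all(years[(a, bob)][0] >= years[s][0] for a in actions)
    bob_ok = all(years[(snake, a)][1] >= years[s][1] for a in actions)
    return snake_ok and bob_ok

def dominates(t, s):
    # t Pareto-dominates s: everyone at least as well off, someone strictly better
    return (all(yt <= ys for yt, ys in zip(years[t], years[s]))
            and years[t] != years[s])

def is_pareto(s):
    return not any(dominates(t, s) for t in product(actions, actions))

nash = [s for s in product(actions, actions) if is_nash(s)]
pareto = [s for s in product(actions, actions) if is_pareto(s)]
```

Note that the unique Nash equilibrium (mutual defection) is exactly the one outcome that is not Pareto-optimal, which is the dilemma.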
Iterated Prisoner’s Dilemma Strategies

- A_t = {Cooperate (C), Defect (D)}
- S_t = {CC, CD, DC, DD} (previous game outcome)
- π : ×_{i=2}^{t} S_i → A_t

Possible strategies π for Snake:

- Tit-for-Tat:

  π(s_t) = C if t = 1; a_{Bob, t−1} if t > 1

- Reinforce actions conditioned on game outcomes:

  π(s_t) = argmin_a E_T[accumulated prison years | s_t, a], updating the transition model T
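The Tit-for-Tat rule above can be sketched in a few lines. The match-playing harness and the always-defect opponent are illustrative additions, with payoffs taken from the matrix (prison years for player 1).

```python
years = {("C", "C"): (1, 1), ("C", "D"): (3, 0),
         ("D", "C"): (0, 3), ("D", "D"): (2, 2)}

def tit_for_tat(history):
    # cooperate on round 1, then copy the opponent's previous action
    return "C" if not history else history[-1][1]

def always_defect(history):
    return "D"

def play(p1, p2, rounds):
    # returns player 1's total prison years over the match
    h1, h2, total = [], [], 0
    for _ in range(rounds):
        a1, a2 = p1(h1), p2(h2)
        h1.append((a1, a2))   # each player sees (own action, opponent action)
        h2.append((a2, a1))
        total += years[(a1, a2)][0]
    return total
```

Against a defector, Tit-for-Tat is exploited only in the first round; against itself it cooperates forever, which is why it fares well in iterated play.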
Game Theory
Multi-Agent Reinforcement Learning
Learning Communication
Multi-Agent Reinforcement Learning
How can we learn mutually-beneficial collaboration strategies?
- Modelling: multi-agent MDPs, dec-MDPs
- Issues solving joint tasks:
  - decentralised knowledge with no centralised control,
  - credit assignment,
  - communication constraints
- Issues affecting individual agents:
  - state space explodes: O(|S|^#agents),
  - co-adaptation → dynamic non-Markov environment
Markov Decision Process (MDP)
Stochastic environment characterised by the tuple ⟨S, A, R, T, γ⟩, where:

- R : S × A × S′ → ℝ (reward function)
- T : S × A × S′ → [0, 1] (transition probabilities)
- γ ∈ [0, 1] (discount factor)
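The tuple above is all that is needed to run dynamic programming. A minimal value-iteration sketch on an invented two-state chain MDP (states, actions, rewards and γ are all illustrative, not from the slides):

```python
S, A = [0, 1], ["stay", "go"]
gamma = 0.9

def T(s, a, s2):
    # transition model: "go" flips the state, "stay" keeps it (deterministic)
    if a == "go":
        return 1.0 if s2 == 1 - s else 0.0
    return 1.0 if s2 == s else 0.0

def R(s, a, s2):
    # reward 1 for landing in state 1, else 0
    return 1.0 if s2 == 1 else 0.0

V = {s: 0.0 for s in S}
for _ in range(200):  # Bellman optimality backup, iterated to convergence
    V = {s: max(sum(T(s, a, s2) * (R(s, a, s2) + gamma * V[s2]) for s2 in S)
                for a in A)
         for s in S}
```

Here both states converge to value 1/(1 − γ) = 10, since the agent can earn reward 1 every step from either state.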
Multi-agent MDP (MMDP)
N-agent stochastic game characterised by the tuple ⟨S, A, R, T, γ⟩, where:

- S = ×_{i=1}^{N} S_i
- A = ×_{i=1}^{N} A_i
- R = ×_{i=1}^{N} R_i, with R_i : S × A × S′ → ℝ
- T : S × A × S′ → [0, 1]
Multi-agent Q-learning
- Oblivious agents [Sen et al., 1994]:

  Q_i(s, a_i) ← (1 − α) Q_i(s, a_i) + α [R_i(s, a_i) + γ V_i(s′)]

  V*_i(s) = max_{a_i ∈ A_i} Q*_i(s, a_i)

- Common-payoff games [Claus and Boutilier, 1998]:

  Q_i(s, a) ← (1 − α) Q_i(s, a) + α [R_i(s, a, s′) + γ V_i(s′)]

  V_i(s) ← max_{a_i ∈ A_i} Σ_{a_−i ∈ A_−i} P_i(s, a_−i) Q_i(s, a_i, a_−i)
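The oblivious-agent update above is just single-agent Q-learning run independently per agent, ignoring the others. A minimal dictionary-backed sketch (state/action names and the learning-rate values are illustrative):

```python
alpha, gamma = 0.5, 0.9

def q_update(Q, s, a, r, s2, actions):
    # Q_i(s, a_i) <- (1 - alpha) Q_i(s, a_i) + alpha [R_i + gamma * max_a' Q_i(s', a')]
    v_next = max(Q.get((s2, a2), 0.0) for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) + alpha * (r + gamma * v_next)
    return Q

# one update from an unvisited table: the new value is alpha * r
Q = q_update({}, "s0", "up", 1.0, "s1", ["up", "down"])
```

The common-payoff variant would replace the max over own actions with an expectation over the other agents' actions under the model P_i, which is what makes it non-oblivious.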
Independent vs Cooperative Learning
[Tan, 1993]: “Can N communicating agents outperform N non-communicating agents?”

Ways of communication:

- Agents share Q-learning updates (thus syncing Q-values):
  - Pro: each agent learns N-fold faster (per timestep).
  - Note: same asymptotic performance as independent agents.
- Agents share sensory information:
  - Pro: more information → better policies.
  - Con: more information → larger state space → slower learning.
Hunter-Prey Problem
- 10 × 10 grid world with prey and hunters.
- Perceptual state with visual depth 2 (the prey’s relative position (x, y)): |S| = 5² + 1 = 26.
- R = 1.0 if a hunter catches a prey, i.e. (x_i, y_i) = (0, 0); −0.1 otherwise.
Hunter-Prey Experiments
Experiment 1 – any hunter catches a prey:

- Baseline: 2 independent hunters, |S_i| = 5² + 1 = 26
- 2 hunters communicating Q-value updates, |S_i| = 26

Experiment 2 – both hunters catch the same prey simultaneously:

- Baseline: 2 independent hunters, |S_i| = 26
- 2 hunters communicating own locations, |S_i| ≈ 26 · 19² = 9386
- 2 hunters communicating own + prey locations, |S_i| ≈ (19² + 1) · 19² = 130682
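The state-space sizes above can be sanity-checked. Assuming relative offsets on a 10 × 10 grid take 19 values per axis (dx, dy ∈ {−9, ..., 9}), and reading the "+1" terms as a "not visible" state, the counting works out:

```python
# perceptual state only: 5x5 window of prey offsets, plus "prey not visible"
depth2 = 5 * 5 + 1

# + the other hunter's relative location (19 offsets per axis)
own = depth2 * 19 ** 2

# + the prey's full relative location too (19^2 offsets, plus "not visible")
own_prey = (19 ** 2 + 1) * 19 ** 2
```

This makes the trade-off on the slide vivid: sharing locations multiplies the per-agent state space by two to four orders of magnitude.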
Hunter-Prey Results
Figure: Experiment 1 – any hunter catches a prey. Average steps in training (cumulative) over 500 trials, comparing Independent and Same-policy hunters.

Figure: Experiment 2 – both hunters catch the same prey simultaneously. Average steps in training (for every 200 trials) over 10000 trials, comparing Independent, Passively-observing, and Mutual-scouting hunters.
Decentralised Sparse-Interaction MDP
[Melo and Veloso, 2011] Philosophy:

- N-agent coordination is hard since the size of the state space grows exponentially in N.
- ∴ Limit the scope of coordination to where it is most likely to be useful; plan and learn w.r.t. ‘local’ agent-agent interactions only.

The Dec-SIMDP framework determines when and how agents i and j coordinate vs act independently.

‘Decentralised’ = full joint S-observability, but not full individual S-observability (agent i observes only S_i and nearby agents).
Dec-SIMDP: Reducing Joint State Space
Figure: the agent state spaces S1 and S2 and the joint space S1 × S2, under global coupling vs under local coupling only.
Dec-SIMDP: A Navigation Task
Navigation task: coordination is necessary only when crossing the narrow doorway.

- S_i = {1, ..., 20, D}
- A_i = {N, S, E, W}
- Z_i = S_i ∪ {6, 15, D} × {6, 15, D}

R(s, a) =
- 2 if s = (20, 9)
- 1 if s_1 = 20, or s_2 = 9
- −20 if s = (D, D)
- 0 otherwise
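The reward above is easy to express as a function. The encoding below (joint state as a pair of cells, "D" for the doorway) is an illustrative reading of the slide; the case order matters, since the joint-goal state (20, 9) also satisfies the single-goal condition.

```python
def R(s, a=None):
    # reward for joint state s = (s1, s2); "D" marks the doorway cell
    s1, s2 = s
    if s == (20, 9):
        return 2    # both agents have reached their goal cells
    if s == ("D", "D"):
        return -20  # both agents in the doorway at once: heavy penalty
    if s1 == 20 or s2 == 9:
        return 1    # exactly one agent at its goal
    return 0
```

The large negative doorway term is what makes coordination worth learning at all: outside the doorway the agents' rewards decompose and they can act independently.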
Sparse Interaction [video]
Four interconnected modular robots cooperate to change configuration: line → ring
Teammate Modelling
[Mundhe and Sen, 2000]
Credit Assignment
How should individual agents be credited w.r.t. total team performance (or utility)?
Communication
“Shall we both choose to cooperate next round?”
“OK.”
|                  | Sideshow Bob: Cooperate | Sideshow Bob: Defect |
| ---------------- | ----------------------- | -------------------- |
| Snake: Cooperate | 1, 1                    | 3, 0                 |
| Snake: Defect    | 0, 3                    | 2, 2                 |

(prison sentence in years)
Unknown Languages
“ ?”
“What?”
|                  | Alien: Cooperate | Alien: Defect |
| ---------------- | ---------------- | ------------- |
| Snake: Cooperate | 1, 1             | 3, 0          |
| Snake: Defect    | 0, 3             | 2, 2          |

(prison sentence in years)
Game Theory
Multi-Agent Reinforcement Learning
Learning Communication
![Page 27: Emergent Communication for Collaborative Reinforcement ... · Emergent Communication for Collaborative Reinforcement Learning Yarin Gal and Rowan McAllister MLG RCC 8 May 2014. Game](https://reader034.vdocuments.site/reader034/viewer/2022043006/5f8fa95417867e0d8658d005/html5/thumbnails/27.jpg)
Learning communication
- How learning communication can help in RL collaboration
- Approaches to learning communication (ranging from linguistically motivated to a pragmatic view)
- What problems exist with learning communication?
Learning communication for collaboration
How can learning communication help in RL collaboration?
- Forgoes expensive expert time for protocol planning
- Allows for a decentralised system without an external authority to decide on a communication protocol
- Life-long learning (adaptive tasks, e.g. future-proofed robots)
Approaches to learning communication
From linguistic motivation to a pragmatic view – emergent languages

- Emergent languages:
  - Pidgin – a simplified language developed for communication between groups that do not have a common language
  - Creole – a pidgin language nativised by children as their primary language, e.g. Singlish
Approaches to learning communication
From linguistic motivation to a pragmatic view – computational models
- A computational model for emergent languages should account for:
  - polysemy (a word might have different meanings),
  - synonymy (a meaning might have different words),
  - ambiguity (two agents might associate different meanings to the same word),
  - and be open (agents may enter or leave the population; new words might emerge to describe meanings).
Approaches to learning communication
From linguistic motivation to a pragmatic view – computational models
- [Steels, 1996] constructs a model in which words map to features of an object
Approaches to learning communication
From linguistic motivation to a pragmatic view – computational models
- Agents learn each other’s word-feature mappings by selecting an object and describing one of its distinctive features
Approaches to learning communication
From linguistic motivation to a pragmatic view – computational models
- An agent’s word-feature mapping is reinforced when both agents use the same word to identify a distinctive feature of the object
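A minimal sketch of the reinforcement step just described: an agent's word-feature association is strengthened only when both agents use the same word for the object's distinctive feature. The lexicons, the invented words "wa" and "bo", and the additive update rule are illustrative, not Steels' exact model.

```python
def describe(agent, feature):
    # each agent names a feature with its currently highest-scoring word
    words = agent[feature]
    return max(words, key=words.get)

def interact(agent_a, agent_b, feature, step=1.0):
    word_a = describe(agent_a, feature)
    word_b = describe(agent_b, feature)
    if word_a == word_b:  # shared word: reinforce the mapping in both lexicons
        agent_a[feature][word_a] += step
        agent_b[feature][word_b] += step
    return word_a == word_b

# two agents with weak, noisy initial word-feature scores for "red"
a = {"red": {"wa": 0.6, "bo": 0.4}}
b = {"red": {"wa": 0.5, "bo": 0.1}}
success = interact(a, b, "red")
```

Because only successful interactions are reinforced, small initial agreements get amplified, which is the self-organising mechanism behind emergent shared vocabularies.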
Approaches to learning communication
From linguistic motivation to a pragmatic view – formal framework
- Using RL we can formalise the ideas above.
- For example, [Goldman et al., 2007] establish a formal framework where agents using different languages learn to coordinate. In this framework:
  - a state space S describes the world,
  - A_i describes the actions the i’th agent can perform,
  - F_i(s) is the probability that agent i is in state s,
  - Σ_i is the alphabet of messages agent i can communicate,
  - and o_i is an observation of the state for agent i.
Approaches to learning communication
From linguistic motivation to a pragmatic view – formal framework
- We define agent i’s policy to be a mapping from sequences (the history) of state-message pairs to actions:

  δ_i : Ω* × Σ* → A_i,

- and a secondary mapping from sequences of state-message pairs to messages:

  δ^Σ_i : Ω* × Σ* → Σ_i.

- A translation τ between languages Σ and Σ′ is a distribution over message pairs; each agent holds a distribution P_{τ,i} over translations between its own language and other agents’ languages.
- And meaning is interpreted as “what belief state would cause me to send the message I just received”.
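The distribution-over-translations idea can be sketched as a simple Bayesian update. The two-message languages ("blip"/"blop" vs "left"/"right"), the candidate translation tables, and the noise parameter are all illustrative, not from the Goldman et al. framework itself.

```python
# candidate translations tau: other agent's message -> own-language meaning
taus = [{"blip": "left", "blop": "right"},
        {"blip": "right", "blop": "left"}]
P = [0.5, 0.5]  # uniform prior over the candidate translations

def update(P, msg, observed_meaning, noise=0.1):
    # Bayes step: translations consistent with the observed outcome gain mass
    likelihood = [1.0 - noise if t[msg] == observed_meaning else noise
                  for t in taus]
    posterior = [l * p for l, p in zip(likelihood, P)]
    z = sum(posterior)
    return [p / z for p in posterior]

# after hearing "blip" while the sender turned out to be on the left:
P = update(P, "blip", "left")
```

A single consistent observation already shifts most of the probability mass onto the correct translation; repeated interactions sharpen it further.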
Learning communication: a model
Overview of the framework
Approaches to learning communication
From linguistic motivation to a pragmatic view – formal framework
- Several experiments were used to assess the framework.
- For example, two agents work to meet at a point in a gridworld, each acting according to a belief over the location of the other.
- Messages describing an agent’s location are exchanged, and their translations are updated depending on whether the agents meet or not.
- The optimal policies are assumed to be known before the agents try to learn how to communicate.
Approaches to learning communication
From linguistic motivation to a pragmatic view – a pragmatic view
- Use in robotics:
  - A leader robot controlling a follower robot [Yanco and Stein, 1993]
  - Small robots pushing a box towards a source of light [Mataric, 1998]

Figure: Leader-follower robots. Figure: Box pushing.
Approaches to learning communication
From linguistic motivation to a pragmatic view – a pragmatic view
- Use in robotics:
  - A leader robot controlling a follower robot

Figure: Communication diagram
37 of 40
![Page 52: Emergent Communication for Collaborative Reinforcement ... · Emergent Communication for Collaborative Reinforcement Learning Yarin Gal and Rowan McAllister MLG RCC 8 May 2014. Game](https://reader034.vdocuments.site/reader034/viewer/2022043006/5f8fa95417867e0d8658d005/html5/thumbnails/52.jpg)
Approaches to learning communication
From linguistic motivation to a pragmatic view – a pragmatic view
- Use in robotics:
  - A leader robot controlling a follower robot

Figure: Reinforcement regime

38 of 40
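A minimal sketch of such a reinforcement regime, loosely in the spirit of Yanco and Stein (1993) but with all names and values invented for illustration: the leader emits a signal for a desired action, and the follower learns the signal-to-action mapping from a shared reward alone.

```python
import random

random.seed(1)
ACTIONS = ["turn", "go"]            # follower's action set (illustrative)
SIGNALS = [0, 1]                    # leader's two abstract signals

# Follower's action-value table, one row per received signal.
Q = {s: {a: 0.0 for a in ACTIONS} for s in SIGNALS}

def follower_act(signal, eps=0.1):
    """Epsilon-greedy choice of action given the leader's signal."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(Q[signal], key=Q[signal].get)

for step in range(500):
    desired = random.choice(ACTIONS)
    signal = ACTIONS.index(desired)  # leader's fixed private convention
    action = follower_act(signal)
    reward = 1.0 if action == desired else -1.0  # shared reinforcement
    Q[signal][action] += 0.1 * (reward - Q[signal][action])

# Greedy decoding after learning recovers the leader's convention.
print({s: follower_act(s, eps=0.0) for s in SIGNALS})
```

The signals start out meaningless to the follower; they acquire meaning only through the correlation between signal, action, and shared reward.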
![Page 56: Emergent Communication for Collaborative Reinforcement ... · Emergent Communication for Collaborative Reinforcement Learning Yarin Gal and Rowan McAllister MLG RCC 8 May 2014. Game](https://reader034.vdocuments.site/reader034/viewer/2022043006/5f8fa95417867e0d8658d005/html5/thumbnails/56.jpg)
Why is learning communication difficult?

What problems exist with learning communication?

- Difficult to specify a framework:
  - Many partial frameworks have been proposed, with differing approaches.
  - State space explosion.
- Difficult to use for RL collaboration:
  - No framework has been shown to improve on independent RL.

These problems are not fully answered in current research.
39 of 40
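To make the state-space-explosion point concrete, a back-of-the-envelope comparison (numbers are illustrative): a centralised learner over joint states and joint actions needs a table of size |S|^n · |A|^n for n agents, whereas n independent learners (as in Tan, 1993) need only n · |S| · |A| entries.

```python
# Illustrative sizes for a 5x5 gridworld with 5 actions per agent.
cells, actions = 25, 5

def joint_table(n):
    """Table size for one centralised learner over joint states/actions."""
    return (cells ** n) * (actions ** n)

def independent_tables(n):
    """Total table size for n independent Q-learners."""
    return n * cells * actions

for n in (1, 2, 3, 4):
    print(n, joint_table(n), independent_tables(n))
# The joint table grows exponentially in n; the independent ones linearly.
```

Adding communication naively makes this worse, since messages enlarge both the observation and action spaces of every agent.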
![Page 61: Emergent Communication for Collaborative Reinforcement ... · Emergent Communication for Collaborative Reinforcement Learning Yarin Gal and Rowan McAllister MLG RCC 8 May 2014. Game](https://reader034.vdocuments.site/reader034/viewer/2022043006/5f8fa95417867e0d8658d005/html5/thumbnails/61.jpg)
Up, Up and Away

Where this might go

- Learning communication based on sparse interactions:
  - Reduce state space complexity.
- Selecting what to listen to in incoming communication:
  - State space selection.
- Cyber-warfare – better computer worms?
  - Developing unique communication protocols between cliques of agents.
- Online learning of communication:
  - Introducing a new agent into a system with existing agents.
  - Finding the optimal policy with agents ignorant of one another, then allowing the agents to start communicating to improve collaboration.

Lots to do for future research!

40 of 40
![Page 62: Emergent Communication for Collaborative Reinforcement ... · Emergent Communication for Collaborative Reinforcement Learning Yarin Gal and Rowan McAllister MLG RCC 8 May 2014. Game](https://reader034.vdocuments.site/reader034/viewer/2022043006/5f8fa95417867e0d8658d005/html5/thumbnails/62.jpg)
Claus, C. and Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In AAAI/IAAI, pages 746–752.

Goldman, C. V., Allen, M., and Zilberstein, S. (2007). Learning to communicate in a decentralized environment. Autonomous Agents and Multi-Agent Systems, 15(1):47–90.

Mataric, M. J. (1998). Using communication to reduce locality in distributed multiagent learning. Journal of Experimental & Theoretical Artificial Intelligence, 10(3):357–369.

Melo, F. S. and Veloso, M. (2011). Decentralized MDPs with sparse interactions. Artificial Intelligence, 175(11):1757–1789.
Mundhe, M. and Sen, S. (2000). Evolving agent societies that avoid social dilemmas. In GECCO, pages 809–816.
Sen, S., Sekaran, M., Hale, J., et al. (1994). Learning to coordinate without sharing information. In AAAI, pages 426–431.

Steels, L. (1996). Emergent adaptive lexicons. In From Animals to Animats 4, pages 562–567.

Tan, M. (1993). Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, volume 337, Amherst, MA.
Yanco, H. and Stein, L. A. (1993). An adaptive communication protocol for cooperating mobile robots. In From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior, pages 478–485. The MIT Press, Cambridge, MA.