How Machines Learn: From Robot Soccer to Autonomous Traffic
Peter Stone
Department of Computer Sciences
The University of Texas at Austin
Research Question
To what degree can autonomous intelligent agents learn in the presence of teammates and/or adversaries in real-time, dynamic domains?
• Autonomous agents
• Multiagent systems
• Machine learning
• Robotics
Autonomous Intelligent Agents
• They must sense their environment.
• They must decide what action to take (“think”).
• They must act in their environment.
Complete Intelligent Agents
• Interact with other agents (Multiagent systems)
• Improve performance from experience (Learning agents)
Autonomous Bidding, Cognitive Systems, Robot Soccer, Traffic management
BE a learning agent
• You, as a group, act as a learning agent
• Actions: Wave, Stand, Clap
• Observations: colors, reward
• Goal: Find an optimal policy
− Way of selecting actions that gets you the most reward
How did you do it?
• What is your policy?
• What does the world look like?
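Formalizing What Just Happened
Knowns:
• $O = \{\text{Blue}, \text{Red}, \text{Green}, \text{Black}, \ldots\}$
• Rewards in $\mathbb{R}$
• $A = \{\text{Wave}, \text{Clap}, \text{Stand}\}$
$o_0, a_0, r_0, o_1, a_1, r_1, o_2, \ldots$
Unknowns:
• $S$ = 4x3 grid
• $R : S \times A \mapsto \mathbb{R}$
• $P : S \mapsto O$
• $T : S \times A \mapsto S$
$o_i = P(s_i)$, $s_i = T(s_{i-1}, a_{i-1})$, $r_i = R(s_i, a_i)$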
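The relationships between the hidden state and what the agent sees can be sketched in a few lines of Python. The grid layout, the colouring, the movement rules, and the reward cell below are all invented for illustration; the slides do not say what the real T, R, and P were.

```python
from typing import Callable, List, Tuple

State = Tuple[int, int]  # (column, row) on a 4x3 grid

ACTIONS = ["Wave", "Clap", "Stand"]
COLORS = ["Blue", "Red", "Green", "Black"]

# Hypothetical dynamics: each action shifts the hidden state on the grid.
MOVES = {"Wave": (1, 0), "Clap": (0, 1), "Stand": (0, 0)}

def T(s: State, a: str) -> State:
    """Transition function T : S x A -> S (made up: move and wrap around)."""
    dx, dy = MOVES[a]
    return ((s[0] + dx) % 4, (s[1] + dy) % 3)

def P(s: State) -> str:
    """Observation function P : S -> O (made-up colouring of the grid)."""
    return COLORS[(s[0] + s[1]) % len(COLORS)]

def R(s: State, a: str) -> float:
    """Reward function R : S x A -> reals (made up: one rewarding cell)."""
    return 1.0 if s == (3, 2) and a == "Stand" else 0.0

def rollout(s0: State, policy: Callable[[str], str], steps: int) -> List[Tuple[str, str, float]]:
    """Generate the sequence o0, a0, r0, o1, a1, r1, ... seen by the agent."""
    history = []
    s = s0
    for _ in range(steps):
        o = P(s)        # o_i = P(s_i)
        a = policy(o)   # the agent sees only the observation, never the state
        r = R(s, a)     # r_i = R(s_i, a_i)
        history.append((o, a, r))
        s = T(s, a)     # s_{i+1} = T(s_i, a_i)
    return history

trace = rollout((0, 0), lambda o: "Wave", steps=3)
```

The agent's policy receives only a colour, which is why the state space, T, R, and P all count as unknowns from its point of view.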
Reinforcement Learning
• Algorithms to select actions in such problems
• Q-learning: provably converges to the optimal policy
− Proof: contraction mappings and fixed point theorem
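As a rough illustration, here is a minimal tabular Q-learning sketch using the standard update rule. The 4-state chain environment, the learning rate, the discount factor, and the exploration rate are assumptions chosen for the example, and the chain is fully observable, unlike the colour game above.

```python
import random
from collections import defaultdict

def q_learning(step, states, actions, episodes=500, horizon=20,
               alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)], initialised to 0
    for _ in range(episodes):
        s = rng.choice(states)
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.choice(actions)                     # explore
            else:
                a = max(actions, key=lambda b: Q[(s, b)])   # exploit
            s_next, r = step(s, a)
            best_next = max(Q[(s_next, b)] for b in actions)
            # Move Q(s, a) toward the one-step lookahead target.
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q

def chain_step(s, a):
    """Hypothetical 4-state chain: arriving in state 3 pays a reward of 1."""
    s_next = min(s + 1, 3) if a == "right" else max(s - 1, 0)
    return s_next, (1.0 if s_next == 3 else 0.0)

Q = q_learning(chain_step, states=[0, 1, 2, 3], actions=["left", "right"])
greedy = {s: max(["left", "right"], key=lambda a: Q[(s, a)]) for s in range(4)}
```

The greedy policy read off the learned table is the "way of selecting actions" the agent ends up with.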
A harder problem
• You had 3 actions and saw one of 10 colors
• What if you had to control 12 joints . . .
• . . . and saw something like this 30 times per second?
RoboCup
Goal: By the year 2050, a team of humanoid robots that can beat the human World Cup champion team.
• An international research initiative
• Drives research in many areas:
− Control algorithms; machine vision, sensing; localization;
− Distributed computing; real-time systems;
− Ad hoc networking; mechanical design;
− Multiagent systems; machine learning; robotics
Several Different Leagues
RoboCup Soccer
The Early Years
A Decade Later
Sony Aibo (ERS-210A, ERS-7)
Creating a team — Subtasks
• Vision
• Localization
• Walking
• Ball manipulation (kicking)
• Individual decision making
• Communication/coordination
Competitions
• Barely “closed the loop” by American Open (May, ’03)
• Improved significantly by Int’l RoboCup (July, ’03)
• Won 3rd place at US Open (2004, 2005)
• Quarterfinalist at RoboCup (2004, 2005)
• Highlights:
− Many saves: 1; 2; 3; 4
− Lots of goals: CMU; Penn; Penn; Germany
− A nice clear
− A counterattack goal
Post-competition: the CS research
• Model-based joint control [Stronger, S, ’04]
• Learning sensor and action models [Stronger, S, ’06]
• Machine learning for fast walking [Kohl, S, ’04]
• Learning to acquire the ball [Fidelman, S, ’06]
• Color constancy on mobile robots [Sridharan, S, ’04]
• Robust particle filter localization [Sridharan, Kuhlmann, S, ’05]
• Autonomous Color Learning [Sridharan, S, ’05]
Policy Gradient RL to learn fast walk
Goal: Enable an Aibo to walk as fast as possible
• Start with a parameterized walk
• Learn fastest possible parameters
• No simulator available:
− Learn entirely on robots
− Minimal human intervention
Walking Aibos
• Walks that “come with” Aibo are slow
• RoboCup soccer: 25+ Aibo teams internationally
− Motivates faster walks
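| Gait | Type | Speed (mm/s) |
| --- | --- | --- |
| German Team [2003] | Hand-tuned | 230 |
| UT Austin Villa [2003] | Hand-tuned | 245 |
| UNSW [2003] | Hand-tuned | 254 |
| Hornby et al. [1999] | Learned | 170 |
| Kim & Uther [2003] | Learned | 270 (±5) |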
A Parameterized Walk
• Developed from scratch as part of UT Austin Villa 2003
• Trot gait with elliptical locus on each leg
Locus Parameters
[Figure: elliptical leg locus drawn against x, y, and z axes]
• Ellipse length
• Ellipse height
• Position on x axis
• Position on y axis
• Body height
• Timing values
12 continuous parameters
• Hand tuning by April, ’03: 140 mm/s
• Hand tuning by July, ’03: 245 mm/s
Experimental Setup
• Policy $\pi = \{\theta_1, \ldots, \theta_{12}\}$, $V(\pi)$ = walk speed when using $\pi$
• Training Scenario
− Robots time themselves traversing fixed distance
− Multiple traversals (3) per policy to account for noise
− Multiple robots evaluate policies simultaneously
− Off-board computer collects results, assigns policies
No human intervention except battery changes
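The scoring of one policy can be sketched as follows. The callback `time_traversal`, the 2000 mm course length, and the fake timer are placeholders invented for this example; only the idea of averaging 3 timed traversals comes from the slide.

```python
from statistics import mean

def evaluate_policy(params, time_traversal, distance_mm=2000.0, traversals=3):
    """Score a walk policy as its average speed (mm/s) over repeated traversals.

    `time_traversal(params)` is a hypothetical callback that makes one robot
    walk the fixed distance using `params` and returns the elapsed seconds.
    """
    speeds = [distance_mm / time_traversal(params) for _ in range(traversals)]
    return mean(speeds)

# Stand-in for a real robot: pretend every traversal takes exactly 8 seconds.
def fake_timer(params):
    return 8.0

score = evaluate_policy([0.0] * 12, fake_timer)  # 2000 mm / 8 s = 250 mm/s
```

Averaging several noisy traversals is what lets small speed differences between neighbouring policies show up at all.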
Policy Gradient RL• From π want to move in direction of gradient of V (π)
− Can’t compute ∂V (π)∂θi
directly: estimate empirically
• ∂V (π)∂θi
≈ V ({θ1 + ε, . . . , θ12})− V ({θ1 − ε, . . . , θ12})
− Requires evaluation of 24 policies
• Instead, evaluate t (15) policies in the neighborhood of π
s.t. ith parameter is randomly θi ± ε or 0.
• V ({θ1 + ε, . . . , θ12}) ≈ Avg+ε,1 ≡ policies with θ1 + ε
Peter Stone
![Page 69: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/69.jpg)
Policy Gradient RL
• From π want to move in direction of gradient of V(π)
− Can’t compute $\partial V(\pi)/\partial\theta_i$ directly: estimate empirically
• $\partial V(\pi)/\partial\theta_i \approx V(\{\theta_1 + \epsilon, \ldots, \theta_{12}\}) - V(\{\theta_1 - \epsilon, \ldots, \theta_{12}\})$ (shown for i = 1)
− Requires evaluation of 24 policies
• Instead, evaluate t (15) policies in the neighborhood of π s.t. ith parameter is randomly θi + ε, θi − ε, or θi + 0
• $V(\{\theta_1 + \epsilon, \ldots, \theta_{12}\}) \approx \mathrm{Avg}_{+\epsilon,1}$ ≡ average score of the policies with θ1 + ε
− Expect t/3 estimates for each of θi ± ε, 0
− Each evaluation contributes to all 12 estimates
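The random-perturbation estimate on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `estimate_averages`, the `evaluate` callback (which would be a timed walk on the real robot), and the use of per-parameter step sizes are all hypothetical and not taken from the talk.

```python
import random

def estimate_averages(theta, epsilons, evaluate, t=15, rng=random):
    """Evaluate t random perturbations of theta.

    Returns, for each parameter i, a dict mapping the perturbation
    direction (-1, 0, +1) to the average score of the policies that
    used that direction for parameter i (None if no policy did).
    """
    n = len(theta)
    scores = [{-1: [], 0: [], 1: []} for _ in range(n)]
    for _ in range(t):
        # Each parameter is independently set to theta_i - eps, theta_i, or theta_i + eps.
        dirs = [rng.choice((-1, 0, 1)) for _ in range(n)]
        policy = [th + d * eps for th, d, eps in zip(theta, dirs, epsilons)]
        score = evaluate(policy)  # hypothetical: e.g. measured walking speed
        for i, d in enumerate(dirs):
            scores[i][d].append(score)  # one evaluation feeds all n estimates

    def mean(xs):
        return sum(xs) / len(xs) if xs else None

    return [{d: mean(v) for d, v in per_param.items()} for per_param in scores]
```

With 12 parameters and t = 15, each of the three buckets for a parameter receives about 5 scores on average, which is the "t/3 estimates" point made above.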
![Page 70: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/70.jpg)
Gradient Estimation
![Page 73: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/73.jpg)
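Taking a step

$$A_i = \begin{cases} 0 & \text{if } \mathrm{Avg}_{+0,i} > \mathrm{Avg}_{+\epsilon,i} \text{ and } \mathrm{Avg}_{+0,i} > \mathrm{Avg}_{-\epsilon,i} \\ \mathrm{Avg}_{+\epsilon,i} - \mathrm{Avg}_{-\epsilon,i} & \text{otherwise} \end{cases} \tag{1}$$

• Normalize A, multiply by scalar step-size η
• π = π + ηA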
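Equation (1) and the update rule can be written out as a short Python sketch. The function names and the input layout (one `(avg_minus, avg_zero, avg_plus)` triple per parameter) are hypothetical, and the Euclidean norm is an assumption: the slide says only "Normalize A".

```python
import math

def step_direction(avgs):
    """Apply equation (1) to a list of (avg_minus, avg_zero, avg_plus) triples."""
    A = []
    for avg_minus, avg_zero, avg_plus in avgs:
        if avg_zero > avg_plus and avg_zero > avg_minus:
            A.append(0.0)  # leaving this parameter unchanged scored best
        else:
            A.append(avg_plus - avg_minus)
    return A

def take_step(theta, avgs, eta):
    """Move theta by step size eta along the normalized adjustment vector."""
    A = step_direction(avgs)
    norm = math.sqrt(sum(a * a for a in A))  # assumed: Euclidean normalization
    if norm == 0.0:
        return list(theta)  # no parameter wants to move
    return [th + eta * a / norm for th, a in zip(theta, A)]
```

For example, `take_step([0.0, 0.0], [(1.0, 3.0, 2.0), (1.0, 2.0, 4.0)], eta=2.0)` leaves the first parameter alone (its unperturbed average is the highest) and moves the second by the full step size.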
![Page 75: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/75.jpg)
Experiments
• Started from stable, but fairly slow gait
• Used 3 robots simultaneously
• Each iteration takes 45 traversals, 7½ minutes
[Figure panels: Before learning, After learning]
• 24 iterations = 1080 field traversals, ≈ 3 hours
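The totals on this slide follow from the per-iteration figures. The check below assumes the per-iteration time is 7.5 minutes, which is the reading that makes 24 iterations come to about 3 hours; the variable names are illustrative only.

```python
iterations = 24
traversals_per_iteration = 45
minutes_per_iteration = 7.5  # assumed reading of the per-iteration time

total_traversals = iterations * traversals_per_iteration  # field traversals overall
total_hours = iterations * minutes_per_iteration / 60     # wall-clock training time
```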
![Page 77: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/77.jpg)
Results
[Figure: Velocity of Learned Gait during Training. x-axis: Number of Iterations (0 to 25); y-axis: Velocity (mm/s), 180 to 300. Labeled curves and reference levels: Learned Gait (UT Austin Villa), Learned Gait (UNSW), Hand-tuned Gait (UNSW), Hand-tuned Gait (German Team), Hand-tuned Gait (UT Austin Villa)]
• Additional iterations didn’t help
• Spikes: evaluation noise? large step size?
![Page 78: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/78.jpg)
Learned Parameters

| Parameter | Initial Value | ε | Best Value |
|---|---|---|---|
| Front ellipse: (height) | 4.2 | 0.35 | 4.081 |
| Front ellipse: (x offset) | 2.8 | 0.35 | 0.574 |
| Front ellipse: (y offset) | 4.9 | 0.35 | 5.152 |
| Rear ellipse: (height) | 5.6 | 0.35 | 6.02 |
| Rear ellipse: (x offset) | 0.0 | 0.35 | 0.217 |
| Rear ellipse: (y offset) | -2.8 | 0.35 | -2.982 |
| Ellipse length | 4.893 | 0.35 | 5.285 |
| Ellipse skew multiplier | 0.035 | 0.175 | 0.049 |
| Front height | 7.7 | 0.35 | 7.483 |
| Rear height | 11.2 | 0.35 | 10.843 |
| Time to move through locus | 0.704 | 0.016 | 0.679 |
| Time on ground | 0.5 | 0.05 | 0.430 |
![Page 79: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/79.jpg)
Algorithmic Comparison, Robot Port
Before learning After learning
![Page 80: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/80.jpg)
Summary
• Used policy gradient RL to learn fastest Aibo walk
• All learning done on real robots
• No human intervention (except battery changes)
![Page 81: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/81.jpg)
Outline
• Machine learning for fast walking [Kohl, S, ’04]
• Learning to acquire the ball [Fidelman, S, ’06]
• Color constancy on mobile robots [Sridharan, S, ’05]
• Autonomous Color Learning [Sridharan, S, ’06]
![Page 83: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/83.jpg)
Grasping the Ball
• Three stages: walk to ball; slow down; lower chin
• Head proprioception, IR chest sensor ↦ ball distance
• Movement specified by 4 parameters
Brittle!
![Page 84: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/84.jpg)
Parameterization
• slowdown dist: when to slow down
• slowdown factor: how much to slow down
• capture angle: when to stop turning
• capture dist: when to put down head
![Page 85: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/85.jpg)
Learning the Chin Pinch
• Binary, noisy reinforcement signal: multiple trials
• Robot evaluates self: no human intervention
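Because each grasp attempt yields only a noisy success or failure, a parameter setting has to be scored over many attempts. The sketch below illustrates that idea with a simulated stand-in for the robot; `success_rate`, `make_noisy_grasp`, and the `params` argument are hypothetical names, and on the real robot the `attempt` callback would be the robot's own check of whether it holds the ball.

```python
import random

def success_rate(attempt, params, trials):
    """Score a parameter setting as the fraction of successful attempts.

    attempt(params) returns True or False for a single grasp trial.
    """
    return sum(bool(attempt(params)) for _ in range(trials)) / trials

def make_noisy_grasp(true_rate, seed=0):
    """Simulated stand-in for the robot: succeeds with probability true_rate."""
    rng = random.Random(seed)
    return lambda params: rng.random() < true_rate
```

A single trial says little about a policy, while an average over many trials gives a usable estimate of its underlying success rate, at the cost of more robot time per evaluation.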
![Page 86: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/86.jpg)
Results
• Evaluation of policy gradient, hill climbing, amoeba
[Figure: successful captures out of 100 trials (y-axis, 0 to 100) vs. iterations (x-axis, 0 to 12); curves: policy gradient, amoeba, hill climbing]
![Page 87: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/87.jpg)
What it learned

| Policy | slowdown dist | slowdown factor | capture angle | capture dist | Success rate |
|---|---|---|---|---|---|
| Initial | 200mm | 0.7 | 15.0° | 110mm | 36% |
| Policy gradient | 125mm | 1 | 17.4° | 152mm | 64% |
| Amoeba | 208mm | 1 | 33.4° | 162mm | 69% |
| Hill climbing | 240mm | 1 | 35.0° | 170mm | 66% |
![Page 91: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/91.jpg)
Color Constancy
• Visual system’s ability to recognize true color across variations in environment
• Challenge: Nonlinear variations in sensor response with change in illumination
• Mobile robots:
− Computational limitations
− Changing camera positions
![Page 92: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/92.jpg)
Sample Images
![Page 96: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/96.jpg)
Our Goal
• Match current performance in changing lighting
• Experiments on ERS-210A robots
![Page 99: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/99.jpg)
Autonomous Color Learning
• Color Constancy: more tediously created maps
− Hand-labeling many images → hours of manual effort
• Use the structured environment
− Robot learns color distributions
• Comparable accuracy, 5 minutes of robot effort
![Page 102: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/102.jpg)
Outline
• Learning on physical robots
− No simulation, minimal human intervention
• Motion: learning for fast walking
• Behavior: acquiring the ball
• Vision: color constancy, autonomous color learning
• Multiagent Strategy: RL in simulation
![Page 107: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/107.jpg)
RoboCup Simulator
• Distributed: each player a separate client
• Server models dynamics and kinematics
• Clients receive sensations, send actions
[Figure: timing diagram with rows Client 1, Server, Client 2 over cycles t-1, t, t+1, t+2]
• Parametric actions: dash, turn, kick, say
• Abstract, noisy sensors, hidden state
− Hear sounds from limited distance
− See relative distance, angle to objects ahead
• > $10^{9^{23}}$ states
• Limited resources: stamina
• Play occurs in real time (≈ human parameters)
![Page 111: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/111.jpg)
3 vs. 2 Keepaway
• Play in a small area (20m × 20m)
• Keepers try to keep the ball
• Takers try to get the ball
• Episode:
− Players and ball reset randomly
− Ball starts near a keeper
− Ends when taker gets the ball or ball goes out
• Performance measure: average possession duration
• Use CMUnited-99 skills:
− HoldBall, PassBall(k), GoToBall, GetOpen
![Page 115: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/115.jpg)
The Keepers’ Policy Space
[Figure: decision tree]
• Ball not kickable:
− Teammate with ball or can get there faster: GetOpen
− Otherwise: GoToBall
• Ball kickable: {HoldBall, PassBall(k)} (k is another keeper)

Example Policies
Random: HoldBall or PassBall(k) randomly
Hold: Always HoldBall
Hand-coded:
If no taker within 10m: HoldBall
Else If there’s a good pass: PassBall(k)
Else HoldBall
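The policy space and the hand-coded example policy can be sketched as one Python function. This is a sketch under assumptions: the function name and its four inputs are hypothetical, and what counts as a "good pass" is not defined on the slide, so the caller supplies the pass target (or `None`) directly.

```python
def keeper_action(ball_kickable, teammate_closer_to_ball,
                  nearest_taker_dist, good_pass_target):
    """Pick a keeper skill using the hand-coded policy.

    ball_kickable:           True if this keeper can kick the ball now
    teammate_closer_to_ball: True if a teammate has the ball or can reach it faster
    nearest_taker_dist:      distance in meters to the closest taker
    good_pass_target:        index k of a keeper to pass to, or None (assumed input)
    """
    if not ball_kickable:
        # Fixed part of the policy space: nothing to decide here.
        return "GetOpen" if teammate_closer_to_ball else "GoToBall"
    # Decision part: choose among HoldBall and PassBall(k).
    if nearest_taker_dist > 10.0:  # no taker within 10m
        return "HoldBall"
    if good_pass_target is not None:
        return f"PassBall({good_pass_target})"
    return "HoldBall"
```

Only the branch where the ball is kickable involves a choice, so that is the branch a learned policy would replace; the Random and Hold example policies differ from this one only there.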
![Page 116: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/116.jpg)
Keeper’s State Variables
• 11 distances among players, ball, and center
• 2 angles to takers along passing lanes
![Page 118: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/118.jpg)
Function Approximation: Tile Coding
• Form of sparse, coarse coding based on CMACs [Albus, 1981]
[Figure: Full soccer state → Few state variables (continuous) → Sparse, coarse, tile coding → Huge binary feature vector (about 400 1’s and 40,000 0’s) → Linear map → Action values]
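A minimal tile-coding sketch in Python shows how a few continuous state variables become a sparse binary feature vector with a linear map on top. The layout here is an assumption made for illustration (each variable tiled separately, with several offset one-dimensional tilings); the names `active_tiles` and `q_value` and all numeric settings are hypothetical and not taken from the talk.

```python
def active_tiles(state, lows, widths, num_tilings, tiles_per_tiling):
    """Return the indices of the features that are 1 for this state.

    Each variable gets num_tilings one-dimensional tilings, each shifted
    by a fraction of a tile width, so exactly one tile per tiling is active.
    """
    active = []
    for v, (x, lo, w) in enumerate(zip(state, lows, widths)):
        for k in range(num_tilings):
            offset = k * w / num_tilings               # shift of the k-th tiling
            idx = int((x - lo + offset) // w)          # tile containing x
            idx = max(0, min(tiles_per_tiling - 1, idx))  # clamp to the tiling
            active.append((v * num_tilings + k) * tiles_per_tiling + idx)
    return active

def q_value(weights, active):
    """Linear map: the action value is the sum of weights of active features."""
    return sum(weights[i] for i in active)
```

With 13 state variables and 32 tilings per variable (hypothetical settings), every state activates 13 × 32 = 416 features, the same order of magnitude as the "about 400 1’s" in the figure. Nearby states share most of their active tiles, which is where the generalization comes from.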
![Page 119: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/119.jpg)
Main Result
[Figure: Episode Duration (seconds), 4 to 14, vs. Hours of Training Time (bins of 1000 episodes), 0 to 25; reference levels: handcoded, random, always hold]
1 hour = 720 5-second episodes
![Page 120: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/120.jpg)
Difficulty of Multiagent Learning
[Figure: Episode Duration (seconds), 4 to 18, vs. Training Time (hours), 0 to 20; curves: 1 Learning, 2 Learning, 3 Learning]
• Multiagent learning is harder!
![Page 122: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/122.jpg)
Outline
• Robot soccer on real robots
• Robot soccer in simulation
• Autonomous driving
![Page 124: How Machines Learn: From Robot Soccer to Autonomous Traffic › users › smmg › archive › 2007 › ... · • RoboCup soccer: 25+ Aibo teams internationally − Motivates faster](https://reader031.vdocuments.site/reader031/viewer/2022011904/5f1b591c026c5f3a334995c6/html5/thumbnails/124.jpg)
Acknowledgements
Thanks to all the Students Involved!
• Kurt Dresner, Nate Kohl, Peggy Fidelman, Mohan Sridharan, Richard Sutton
• Other members of the UT Austin Villa Legged Robot Team
• http://www.cs.utexas.edu/~AustinVilla
• Fox Sports World for inspiration!