Posted on 19-Dec-2015
DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks
Manish Jain, Matthew E. Taylor, Makoto Yokoo, Milind Tambe
Motivation
Real-world applications of mobile sensor networks:
- Robots in an urban setting
- Autonomous underwater vehicles
Challenges
- Rewards are unknown
- Limited time horizon
- Anytime performance is important
Existing Models
- Distributed Constraint Optimization for sensor networks [Lesser03, Zhang03, …]
- Mobile sensor networks for communication [Cheng2005, Marden07, …]
- Factor graphs [Farinelli08, …]
- Swarm intelligence, potential games
- Other robotic approaches …
Contributions
- Propose new algorithms for DCOPs
- Seamlessly interleave distributed exploration and distributed exploitation
- Tests on physical hardware
Outline
- Background on DCOPs
- Solution Techniques
- Experimental Results
- Conclusions and Future Work
DCOP Framework
Constraint graph (chain): a1 - a2 - a3
Reward table on link (a1, a2): 10, 0, 0, 6 (one reward per joint value assignment)
Reward table on link (a2, a3): 10, 0, 0, 6
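The chain example above can be sketched as a tiny DCOP evaluator. This is a minimal illustration, not code from the talk; the value labels "a"/"b" and the pairing of the listed rewards (10, 0, 0, 6) with joint value assignments are assumptions, since the value columns of the slide's tables did not survive.

```python
# Reward table shared by both links: maps a joint value pair to a reward.
# The pairing of the slide's rewards (10, 0, 0, 6) to value pairs is an
# assumption for illustration.
LINK_REWARD = {
    ("a", "a"): 10,
    ("a", "b"): 0,
    ("b", "a"): 0,
    ("b", "b"): 6,
}

# Constraint graph for the chain a1 - a2 - a3.
LINKS = [("a1", "a2"), ("a2", "a3")]

def total_reward(assignment):
    """Sum the link rewards for a full assignment {agent: value}."""
    return sum(LINK_REWARD[(assignment[i], assignment[j])] for i, j in LINKS)

print(total_reward({"a1": "a", "a2": "a", "a3": "a"}))  # 10 + 10 = 20
print(total_reward({"a1": "b", "a2": "b", "a3": "b"}))  # 6 + 6 = 12
```

The global objective is just the sum over constraint links; each agent only sees the links it participates in.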
Applying DCOP

DCOP Construct                     Domain Equivalent
Agents                             Robots
Agent values                       Set of possible locations
Reward on the link                 Signal strength between neighbors
Objective: maximize net reward     Objective: maximize net signal strength
k-Optimality [Pearce07]
Constraint graph (chain): a1 - a2 - a3, with reward table 10, 0, 0, 6 on each link
1-optimal solutions: all agents take one value or all take the other; R< > = 12, R< > = 6
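A 1-optimal (locally optimal) solution is one where no single agent can raise the total reward by changing only its own value. A minimal checker for the chain example, under the same assumed reward pairing as before (names and values are illustrative, not from the talk):

```python
from itertools import product

# Assumed example setup: chain a1 - a2 - a3, binary domains, and an
# assumed pairing of the slide's rewards (10, 0, 0, 6) to value pairs.
LINK_REWARD = {("a", "a"): 10, ("a", "b"): 0, ("b", "a"): 0, ("b", "b"): 6}
LINKS = [("a1", "a2"), ("a2", "a3")]
DOMAIN = ["a", "b"]

def total_reward(assignment):
    return sum(LINK_REWARD[(assignment[i], assignment[j])] for i, j in LINKS)

def is_1_optimal(assignment):
    """True if no single agent can improve the reward unilaterally."""
    base = total_reward(assignment)
    for agent in assignment:
        for value in DOMAIN:
            trial = dict(assignment, **{agent: value})
            if total_reward(trial) > base:
                return False
    return True

# Enumerate all assignments and print the 1-optimal ones.
for values in product(DOMAIN, repeat=3):
    asg = dict(zip(["a1", "a2", "a3"], values))
    if is_1_optimal(asg):
        print(asg, total_reward(asg))
```

With this reward pairing both uniform assignments are 1-optimal, which is the point of the slide: local optima can have very different global rewards.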
MGM-Omniscient
Agents a1, a2, a3 on a chain; reward table on each link (a_i, a_j): 10, 0, 0, 6
Each agent computes the gain of its best unilateral value change: gains 10, 12, 10
Only one agent per neighborhood allowed to change
After the winning agent moves, all gains drop to 0, 0, 0
Monotonic Algorithm
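One MGM round can be sketched as follows: each agent computes the best gain it could get by changing its own value with neighbors held fixed, gains are exchanged, and only an agent whose gain strictly beats every neighbor's gain moves, so at most one agent per neighborhood changes and the global reward never decreases. This is a sketch, not the talk's code; with the reward pairing assumed earlier, the initial gains below come out as 10, 12, 10, matching the slide's numbers.

```python
# Assumed example setup (illustrative names and reward pairing).
LINK_REWARD = {("a", "a"): 10, ("a", "b"): 0, ("b", "a"): 0, ("b", "b"): 6}
LINKS = [("a1", "a2"), ("a2", "a3")]
DOMAIN = ["a", "b"]

def neighbors(agent):
    return ([j for i, j in LINKS if i == agent]
            + [i for i, j in LINKS if j == agent])

def local_reward(agent, value, assignment):
    """Reward on all links incident to `agent` if it takes `value`."""
    trial = dict(assignment, **{agent: value})
    return sum(LINK_REWARD[(trial[i], trial[j])]
               for i, j in LINKS if agent in (i, j))

def mgm_round(assignment):
    gains, moves = {}, {}
    for agent in assignment:
        current = local_reward(agent, assignment[agent], assignment)
        best = max(DOMAIN, key=lambda v: local_reward(agent, v, assignment))
        gains[agent] = local_reward(agent, best, assignment) - current
        moves[agent] = best
    # Only the strict winner of each neighborhood changes its value
    # (on ties nobody moves here; real implementations break ties, e.g. by ID).
    new_assignment = dict(assignment)
    for agent in assignment:
        if gains[agent] > 0 and all(gains[agent] > gains[n]
                                    for n in neighbors(agent)):
            new_assignment[agent] = moves[agent]
    return new_assignment

state = {"a1": "b", "a2": "a", "a3": "b"}  # gains here: 10, 12, 10
state = mgm_round(state)                   # only a2 moves
print(state)  # {'a1': 'b', 'a2': 'b', 'a3': 'b'}
```

After a2 moves, every agent's gain is 0, so the algorithm has reached a 1-optimum and no further moves occur.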
Solution Techniques
Static Estimation
- SE-Optimistic
- SE-Realistic
Balanced Exploration using Decision Theory
- BE-Backtrack
- BE-Rebid
- BE-Stay
Static Estimation Techniques
SE-Optimistic
- Always assume that exploration is better
- Greedy approach
SE-Realistic
- More conservative: assume exploration gives the mean reward
- Faster convergence
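The difference between the two static estimators is only in the value they assign to an unexplored position (the result plots later label SE-Realistic as SE-Mean). A minimal sketch; the bound and mean values, and the function name, are illustrative assumptions:

```python
# Assumed domain knowledge for illustration: an upper bound and a mean
# for signal strength (neither number is from the talk).
MAX_REWARD = 200   # assumed maximum possible signal strength
MEAN_REWARD = 100  # assumed mean signal strength

def estimate(observed, policy):
    """Value of a position: its observed reward, or a static estimate."""
    if observed is not None:
        return observed
    return MAX_REWARD if policy == "SE-Optimistic" else MEAN_REWARD

# SE-Optimistic rates any unexplored position at the maximum, so agents
# keep exploring greedily; SE-Mean explores only while observed rewards
# stay below the mean, hence its faster convergence.
print(estimate(None, "SE-Optimistic"))  # 200
print(estimate(None, "SE-Mean"))        # 100
print(estimate(140, "SE-Mean"))         # 140
```

These estimates then plug directly into an MGM-style gain computation in place of the unknown link rewards.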
Balanced Exploration Techniques
BE-Backtrack
- Decision-theoretic limit on exploration
- Track previous best location Rb
- State of the agent: (Rb, T)
The decision-theoretic computation compares:
- Utility of exploration
- Utility of backtrack after successful exploration
- Utility of backtrack after unsuccessful exploration
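The tradeoff behind these utilities can be sketched numerically. This is an illustrative Monte Carlo comparison, not the paper's actual equations: with best-known reward Rb and T rounds left, the agent compares staying put against exploring for one round and then keeping the better of the old and new positions. The uniform reward distribution and the one-round backtrack cost are assumptions.

```python
import random

def utility_stay(Rb, T):
    """Cumulative reward from sitting on the best-known position Rb."""
    return Rb * T

def utility_explore(Rb, T, samples=10000, lo=0, hi=200):
    """Estimated cumulative reward from exploring once, then exploiting.

    Assumes rewards are uniform on [lo, hi] and that an unsuccessful
    exploration costs one extra round to backtrack to Rb.
    """
    total = 0.0
    for _ in range(samples):
        r = random.uniform(lo, hi)   # reward found while exploring
        if r > Rb:
            total += r * (T - 1)     # success: keep the better position
        else:
            total += Rb * (T - 2)    # failure: pay one round to backtrack
    return total / samples

# Exploration pays off when the horizon is long and Rb is mediocre,
# and stops paying off once Rb is already near the maximum:
print(utility_explore(80, 20) > utility_stay(80, 20))    # True
print(utility_explore(190, 20) < utility_stay(190, 20))  # True
```

BE-Backtrack commits to the resulting exploration budget up front; BE-Rebid, described next, redoes this comparison at every time-step.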
BE-Rebid
- Allows agents to backtrack
- Re-evaluates at every time-step
- Allows for on-the-fly reasoning
- Same equations as BE-Backtrack
BE-Stay
- Agents unable to backtrack
- Dynamic programming approach
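A dynamic program in the spirit of BE-Stay can be sketched as follows (this is not the paper's exact formulation): since the agent can never backtrack, at each step it either stays on its current reward r for all remaining rounds, or spends one round moving to a fresh position drawn from some reward distribution and solves the same problem there. The discrete reward values and uniform probabilities below are made-up assumptions.

```python
from functools import lru_cache

REWARDS = [0, 50, 100, 150, 200]  # assumed possible signal strengths
PROB = 1 / len(REWARDS)           # assumed uniform distribution over them

@lru_cache(maxsize=None)
def value(r, t):
    """Best expected cumulative reward from reward r with t rounds left."""
    if t == 0:
        return 0.0
    stay = r * t                                            # never move again
    explore = sum(PROB * value(r2, t - 1) for r2 in REWARDS)  # move once, recurse
    return max(stay, explore)

def should_explore(r, t):
    return sum(PROB * value(r2, t - 1) for r2 in REWARDS) > r * t

print(should_explore(50, 10))   # True: low reward, long horizon -> explore
print(should_explore(150, 2))   # False: good reward, short horizon -> stay
```

Because exploration is irreversible here, the policy is more conservative than BE-Backtrack's: the threshold for moving rises as the horizon shrinks.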
Results
[Figure: learning curve; 20 agents, chain topology, 100 rounds]
Results (simulation)
[Figure: Varying number of robots (5, 15, 30, 50); y-axis: scaled cumulative signal strength; chain topology, 100 rounds; algorithms: SE-Optimistic, SE-Mean, BE-Stay, BE-Backtrack, BE-Rebid]
Results (simulation)
[Figure: Varying total number of rounds (5, 25, 50, 75, 100); y-axis: scaled cumulative signal strength; 10 agents, random graphs with 15-20 links; algorithms: SE-Optimistic, SE-Mean, BE-Stay, BE-Backtrack, BE-Rebid]
Results (simulation)
[Figure: Varying topology (chain, density 1/3, density 2/3, fully connected); y-axis: scaled cumulative signal strength; 20 agents, 100 rounds; algorithms: SE-Optimistic, SE-Mean, BE-Stay, BE-Backtrack, BE-Rebid]
Results (physical robots)
[Figure: Physical robot results; y-axis: absolute gain (0-1000); topologies: chain, random, fully connected; algorithms: SE-Mean, BE-Rebid; 4 robots, 20 rounds]
Conclusions
- Provide algorithms for DCOPs addressing real-world challenges
- Demonstrated improvement with physical hardware
Future Work
- Scaling up the evaluation: different approaches, different parameter settings
- Examine alternate metrics: battery drain, throughput, cost of movement
- Verify algorithms in other domains
[email protected]
http://teamcore.usc.edu/manish