Posted on 19-Dec-2015
DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks
Manish Jain, Matthew E. Taylor, Makoto Yokoo, Milind Tambe
Motivation
Real-world applications of mobile sensor networks:
- Robots in an urban setting
- Autonomous underwater vehicles
Challenges
- Rewards are unknown
- Limited time horizon
- Anytime performance is important
Existing Models
- Distributed Constraint Optimization for sensor networks [Lesser03, Zhang03, …]
- Mobile sensor networks for communication [Cheng2005, Marden07, …]
- Factor graphs [Farinelli08, …]
- Swarm intelligence, potential games
- Other robotic approaches …
Contributions
- Propose new algorithms for DCOPs
- Seamlessly interleave distributed exploration and distributed exploitation
- Tests on physical hardware
Outline
- Background on DCOPs
- Solution Techniques
- Experimental Results
- Conclusions and Future Work
DCOP Framework
Constraint graph (chain): a1 - a2 - a3
Reward table on link (a1, a2): 10, 0, 0, 6 (one reward per joint value assignment)
Reward table on link (a2, a3): 10, 0, 0, 6
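The chain example above can be sketched as a tiny DCOP evaluator. This is a minimal illustration, not code from the talk; the value labels "a"/"b" and the pairing of the listed rewards (10, 0, 0, 6) with joint value assignments are assumptions, since the value columns of the slide's tables did not survive.

```python
# Reward table shared by both links: maps a joint value pair to a reward.
# The pairing of the slide's rewards (10, 0, 0, 6) to value pairs is an
# assumption for illustration.
LINK_REWARD = {
    ("a", "a"): 10,
    ("a", "b"): 0,
    ("b", "a"): 0,
    ("b", "b"): 6,
}

# Constraint graph for the chain a1 - a2 - a3.
LINKS = [("a1", "a2"), ("a2", "a3")]

def total_reward(assignment):
    """Sum the link rewards for a full assignment {agent: value}."""
    return sum(LINK_REWARD[(assignment[i], assignment[j])] for i, j in LINKS)

print(total_reward({"a1": "a", "a2": "a", "a3": "a"}))  # 10 + 10 = 20
print(total_reward({"a1": "b", "a2": "b", "a3": "b"}))  # 6 + 6 = 12
```

The global objective is just the sum over constraint links; each agent only sees the links it participates in.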
Applying DCOP

DCOP Construct                     Domain Equivalent
Agents                             Robots
Agent values                       Set of possible locations
Reward on the link                 Signal strength between neighbors
Objective: maximize net reward     Objective: maximize net signal strength
k-Optimality [Pearce07]
Constraint graph (chain): a1 - a2 - a3, with reward table 10, 0, 0, 6 on each link
1-optimal solutions: all agents take one value or all take the other; R< > = 12, R< > = 6
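A 1-optimal (locally optimal) solution is one where no single agent can raise the total reward by changing only its own value. A minimal checker for the chain example, under the same assumed reward pairing as before (names and values are illustrative, not from the talk):

```python
from itertools import product

# Assumed example setup: chain a1 - a2 - a3, binary domains, and an
# assumed pairing of the slide's rewards (10, 0, 0, 6) to value pairs.
LINK_REWARD = {("a", "a"): 10, ("a", "b"): 0, ("b", "a"): 0, ("b", "b"): 6}
LINKS = [("a1", "a2"), ("a2", "a3")]
DOMAIN = ["a", "b"]

def total_reward(assignment):
    return sum(LINK_REWARD[(assignment[i], assignment[j])] for i, j in LINKS)

def is_1_optimal(assignment):
    """True if no single agent can improve the reward unilaterally."""
    base = total_reward(assignment)
    for agent in assignment:
        for value in DOMAIN:
            trial = dict(assignment, **{agent: value})
            if total_reward(trial) > base:
                return False
    return True

# Enumerate all assignments and print the 1-optimal ones.
for values in product(DOMAIN, repeat=3):
    asg = dict(zip(["a1", "a2", "a3"], values))
    if is_1_optimal(asg):
        print(asg, total_reward(asg))
```

With this reward pairing both uniform assignments are 1-optimal, which is the point of the slide: local optima can have very different global rewards.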
MGM-Omniscient
Agents a1, a2, a3 on a chain; reward table on each link (a_i, a_j): 10, 0, 0, 6
Each agent computes the gain of its best unilateral value change: gains 10, 12, 10
Only one agent per neighborhood allowed to change
After the winning agent moves, all gains drop to 0, 0, 0
Monotonic Algorithm
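One MGM round can be sketched as follows: each agent computes the best gain it could get by changing its own value with neighbors held fixed, gains are exchanged, and only an agent whose gain strictly beats every neighbor's gain moves, so at most one agent per neighborhood changes and the global reward never decreases. This is a sketch, not the talk's code; with the reward pairing assumed earlier, the initial gains below come out as 10, 12, 10, matching the slide's numbers.

```python
# Assumed example setup (illustrative names and reward pairing).
LINK_REWARD = {("a", "a"): 10, ("a", "b"): 0, ("b", "a"): 0, ("b", "b"): 6}
LINKS = [("a1", "a2"), ("a2", "a3")]
DOMAIN = ["a", "b"]

def neighbors(agent):
    return ([j for i, j in LINKS if i == agent]
            + [i for i, j in LINKS if j == agent])

def local_reward(agent, value, assignment):
    """Reward on all links incident to `agent` if it takes `value`."""
    trial = dict(assignment, **{agent: value})
    return sum(LINK_REWARD[(trial[i], trial[j])]
               for i, j in LINKS if agent in (i, j))

def mgm_round(assignment):
    gains, moves = {}, {}
    for agent in assignment:
        current = local_reward(agent, assignment[agent], assignment)
        best = max(DOMAIN, key=lambda v: local_reward(agent, v, assignment))
        gains[agent] = local_reward(agent, best, assignment) - current
        moves[agent] = best
    # Only the strict winner of each neighborhood changes its value
    # (on ties nobody moves here; real implementations break ties, e.g. by ID).
    new_assignment = dict(assignment)
    for agent in assignment:
        if gains[agent] > 0 and all(gains[agent] > gains[n]
                                    for n in neighbors(agent)):
            new_assignment[agent] = moves[agent]
    return new_assignment

state = {"a1": "b", "a2": "a", "a3": "b"}  # gains here: 10, 12, 10
state = mgm_round(state)                   # only a2 moves
print(state)  # {'a1': 'b', 'a2': 'b', 'a3': 'b'}
```

After a2 moves, every agent's gain is 0, so the algorithm has reached a 1-optimum and no further moves occur.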
Solution Techniques
Static Estimation
- SE-Optimistic
- SE-Realistic
Balanced Exploration using Decision Theory
- BE-Backtrack
- BE-Rebid
- BE-Stay
Static Estimation Techniques
SE-Optimistic
- Always assume that exploration is better
- Greedy approach
SE-Realistic
- More conservative: assume exploration gives the mean reward
- Faster convergence
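The difference between the two static estimators is only in the value they assign to an unexplored position (the result plots later label SE-Realistic as SE-Mean). A minimal sketch; the bound and mean values, and the function name, are illustrative assumptions:

```python
# Assumed domain knowledge for illustration: an upper bound and a mean
# for signal strength (neither number is from the talk).
MAX_REWARD = 200   # assumed maximum possible signal strength
MEAN_REWARD = 100  # assumed mean signal strength

def estimate(observed, policy):
    """Value of a position: its observed reward, or a static estimate."""
    if observed is not None:
        return observed
    return MAX_REWARD if policy == "SE-Optimistic" else MEAN_REWARD

# SE-Optimistic rates any unexplored position at the maximum, so agents
# keep exploring greedily; SE-Mean explores only while observed rewards
# stay below the mean, hence its faster convergence.
print(estimate(None, "SE-Optimistic"))  # 200
print(estimate(None, "SE-Mean"))        # 100
print(estimate(140, "SE-Mean"))         # 140
```

These estimates then plug directly into an MGM-style gain computation in place of the unknown link rewards.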
Balanced Exploration Techniques
BE-Backtrack
- Decision-theoretic limit on exploration
- Track previous best location Rb
- State of the agent: (Rb, T)
The decision-theoretic computation compares:
- Utility of exploration
- Utility of backtrack after successful exploration
- Utility of backtrack after unsuccessful exploration
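The tradeoff behind these utilities can be sketched numerically. This is an illustrative Monte Carlo comparison, not the paper's actual equations: with best-known reward Rb and T rounds left, the agent compares staying put against exploring for one round and then keeping the better of the old and new positions. The uniform reward distribution and the one-round backtrack cost are assumptions.

```python
import random

def utility_stay(Rb, T):
    """Cumulative reward from sitting on the best-known position Rb."""
    return Rb * T

def utility_explore(Rb, T, samples=10000, lo=0, hi=200):
    """Estimated cumulative reward from exploring once, then exploiting.

    Assumes rewards are uniform on [lo, hi] and that an unsuccessful
    exploration costs one extra round to backtrack to Rb.
    """
    total = 0.0
    for _ in range(samples):
        r = random.uniform(lo, hi)   # reward found while exploring
        if r > Rb:
            total += r * (T - 1)     # success: keep the better position
        else:
            total += Rb * (T - 2)    # failure: pay one round to backtrack
    return total / samples

# Exploration pays off when the horizon is long and Rb is mediocre,
# and stops paying off once Rb is already near the maximum:
print(utility_explore(80, 20) > utility_stay(80, 20))    # True
print(utility_explore(190, 20) < utility_stay(190, 20))  # True
```

BE-Backtrack commits to the resulting exploration budget up front; BE-Rebid, described next, redoes this comparison at every time-step.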
BE-Rebid
- Allows agents to backtrack
- Re-evaluates at every time-step
- Allows for on-the-fly reasoning
- Same equations as BE-Backtrack
BE-Stay
- Agents unable to backtrack
- Dynamic programming approach
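A dynamic program in the spirit of BE-Stay can be sketched as follows (this is not the paper's exact formulation): since the agent can never backtrack, at each step it either stays on its current reward r for all remaining rounds, or spends one round moving to a fresh position drawn from some reward distribution and solves the same problem there. The discrete reward values and uniform probabilities below are made-up assumptions.

```python
from functools import lru_cache

REWARDS = [0, 50, 100, 150, 200]  # assumed possible signal strengths
PROB = 1 / len(REWARDS)           # assumed uniform distribution over them

@lru_cache(maxsize=None)
def value(r, t):
    """Best expected cumulative reward from reward r with t rounds left."""
    if t == 0:
        return 0.0
    stay = r * t                                            # never move again
    explore = sum(PROB * value(r2, t - 1) for r2 in REWARDS)  # move once, recurse
    return max(stay, explore)

def should_explore(r, t):
    return sum(PROB * value(r2, t - 1) for r2 in REWARDS) > r * t

print(should_explore(50, 10))   # True: low reward, long horizon -> explore
print(should_explore(150, 2))   # False: good reward, short horizon -> stay
```

Because exploration is irreversible here, the policy is more conservative than BE-Backtrack's: the threshold for moving rises as the horizon shrinks.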
Results
[Figure: learning curve; 20 agents, chain topology, 100 rounds]
Results (simulation)
[Figure: Varying number of robots (5, 15, 30, 50); y-axis: scaled cumulative signal strength; chain topology, 100 rounds; algorithms: SE-Optimistic, SE-Mean, BE-Stay, BE-Backtrack, BE-Rebid]
Results (simulation)
[Figure: Varying total number of rounds (5, 25, 50, 75, 100); y-axis: scaled cumulative signal strength; 10 agents, random graphs with 15-20 links; algorithms: SE-Optimistic, SE-Mean, BE-Stay, BE-Backtrack, BE-Rebid]
Results (simulation)
[Figure: Varying topology (chain, density 1/3, density 2/3, fully connected); y-axis: scaled cumulative signal strength; 20 agents, 100 rounds; algorithms: SE-Optimistic, SE-Mean, BE-Stay, BE-Backtrack, BE-Rebid]
Results (physical robots)
[Figure: Physical robot results; y-axis: absolute gain (0-1000); topologies: chain, random, fully connected; algorithms: SE-Mean, BE-Rebid; 4 robots, 20 rounds]
Conclusions
- Provide algorithms for DCOPs addressing real-world challenges
- Demonstrated improvement with physical hardware
Future Work
- Scaling up the evaluation: different approaches, different parameter settings
- Examine alternate metrics: battery drain, throughput, cost of movement
- Verify algorithms in other domains
[email protected]
http://teamcore.usc.edu/manish