DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks
Manish Jain, Matthew E. Taylor, Makoto Yokoo, Milind Tambe

Post on 19-Dec-2015


TRANSCRIPT

Page 1

DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks

Manish Jain, Matthew E. Taylor, Makoto Yokoo, Milind Tambe

Page 2

Motivation

Real-world applications of mobile sensor networks
◦ Robots in an urban setting
◦ Autonomous underwater vehicles

Page 3

Challenges

Rewards are unknown
Limited time-horizon
Anytime performance is important

Page 4

Existing Models

Distributed Constraint Optimization for sensor networks
◦ [Lesser03, Zhang03, …]
Mobile Sensor Nets for Communication
◦ [Cheng2005, Marden07, …]
Factor Graphs
◦ [Farinelli08, …]
Swarm Intelligence, Potential Games
Other Robotic Approaches …

Page 5

Contributions

Propose new algorithms for DCOPs
Seamlessly interleave distributed exploration and distributed exploitation
Tests on physical hardware

Page 6

Outline

Background on DCOPs
Solution Techniques
Experimental Results
Conclusions and Future Work

Page 7

DCOP Framework

Agents: a1, a2, a3

[Table: a1, a2, Reward; the Reward column reads 10, 0, 0, 6; the value columns were graphical]
[Table: a2, a3, Reward; the Reward column reads 10, 0, 0, 6; the value columns were graphical]
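A minimal sketch of how the total reward of a joint assignment could be computed for the three-agent example on this slide. The value encoding is an assumption: the agent values were graphical, so the sketch uses binary values 0 and 1 and reads the four reward rows as (0,0) → 10, (0,1) → 0, (1,0) → 0, (1,1) → 6. The names `REWARD`, `LINKS`, `total_reward`, and `best_assignment` are hypothetical.

```python
from itertools import product

# ASSUMPTION: binary values 0/1 and this row order; the slide only preserves
# the reward column (10, 0, 0, 6), not the value pairs.
REWARD = {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6}

# The two reward tables on the slide: one for a1-a2, one for a2-a3.
LINKS = [("a1", "a2"), ("a2", "a3")]
AGENTS = ["a1", "a2", "a3"]


def total_reward(assignment):
    """Sum the link rewards for a joint assignment {agent: value}."""
    return sum(REWARD[(assignment[i], assignment[j])] for i, j in LINKS)


def best_assignment():
    """Brute-force the assignment with the highest total reward."""
    candidates = (dict(zip(AGENTS, vals)) for vals in product((0, 1), repeat=3))
    return max(candidates, key=total_reward)
```

Brute force is only workable here because the rewards are known and the example is tiny; the rest of the talk concerns the case where the rewards must be discovered.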

Page 8

Applying DCOP

DCOP Construct                    Domain Equivalent
Agents                            Robots
Agent Values                      Set of Possible Locations
Reward on the Link                Signal Strength between neighbors
Objective: Maximize Net Reward    Objective: Maximize net signal strength

Page 9

k-Optimality [Pearce07]

1-optimal solutions: all agents take one value, or all agents take the other value (the value symbols were graphical)

R< > = 12
R< > = 6

Agents: a1, a2, a3

[Table: a1, a2, Reward; the Reward column reads 10, 0, 0, 6; the value columns were graphical]
[Table: a2, a3, Reward; the Reward column reads 10, 0, 0, 6; the value columns were graphical]
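A small sketch of a 1-optimality check for this example: an assignment is 1-optimal when no single agent can raise the total reward by changing only its own value. As before, the binary encoding of values and the row order of the reward table are assumptions, and the function names are hypothetical.

```python
from itertools import product

# ASSUMPTION: values are 0/1 and the reward rows map as shown here.
REWARD = {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6}
LINKS = [(0, 1), (1, 2)]  # a1-a2 and a2-a3, as agent indices
DOMAIN = (0, 1)


def total_reward(values):
    """Total reward of a joint assignment given as a tuple of values."""
    return sum(REWARD[(values[i], values[j])] for i, j in LINKS)


def is_1_optimal(values):
    """True if no unilateral change by one agent improves the total reward."""
    base = total_reward(values)
    for agent in range(len(values)):
        for alternative in DOMAIN:
            changed = list(values)
            changed[agent] = alternative
            if total_reward(changed) > base:
                return False
    return True


def one_optimal_solutions():
    """Enumerate every 1-optimal assignment of the three agents."""
    return [v for v in product(DOMAIN, repeat=3) if is_1_optimal(v)]
```

Under these assumptions the only 1-optimal assignments are the two uniform ones, with total rewards 20 and 12, which agrees with the "all or all" statement and the R = 12 figure on the slide.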

Page 10

MGM-Omniscient

Agents: a1, a2, a3

[Table: a_i, a_j, Reward; the Reward column reads 10, 0, 0, 6; the value columns were graphical]

Page 11

MGM-Omniscient

Agents: a1, a2, a3

10

[Table: a_i, a_j, Reward; the Reward column reads 10, 0, 0, 6; the value columns were graphical]

Page 12

MGM-Omniscient

a1 a2 a3: 10 12 10

[Table: a_i, a_j, Reward; the Reward column reads 10, 0, 0, 6; the value columns were graphical]

Page 13

MGM-Omniscient

a1 a2 a3: 10 12 10
a1 a2 a3: 0 0 0

[Table: a_i, a_j, Reward; the Reward column reads 10, 0, 0, 6; the value columns were graphical]

Only one agent per neighborhood allowed to change
Monotonic Algorithm
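A hedged sketch of one MGM round with known rewards, matching the rule that only one agent per neighborhood may change. It reads the figures 10, 12, 10 and 0, 0, 0 on these slides as the agents' gains before and after the move, which is an interpretation. The binary value encoding, the starting assignment (1, 0, 1), the tie-breaking rule, and all names are assumptions.

```python
# ASSUMPTION: values are 0/1 and the reward rows map as shown here.
REWARD = {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6}
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1]}  # chain a1 - a2 - a3
DOMAIN = (0, 1)


def local_reward(agent, value, values):
    """Reward on all links of `agent` if it took `value`."""
    total = 0
    for n in NEIGHBORS[agent]:
        # Table keys are ordered (lower-index agent, higher-index agent).
        pair = (value, values[n]) if agent < n else (values[n], value)
        total += REWARD[pair]
    return total


def best_move(agent, values):
    """Best alternative value and its gain (0 if nothing improves)."""
    current = local_reward(agent, values[agent], values)
    best_value, best_gain = values[agent], 0
    for v in DOMAIN:
        gain = local_reward(agent, v, values) - current
        if gain > best_gain:
            best_value, best_gain = v, gain
    return best_value, best_gain


def mgm_round(values):
    """One round: agents compare gains; only the local maximum moves."""
    moves = [best_move(a, values) for a in range(len(values))]
    gains = [g for _, g in moves]
    new_values = list(values)
    for a, (v, g) in enumerate(moves):
        # Ties are broken in favour of the lower agent index (an assumption).
        if g > 0 and all((g, -a) > (gains[n], -n) for n in NEIGHBORS[a]):
            new_values[a] = v
    return new_values, gains


def run_mgm(values, max_rounds=10):
    """Repeat rounds until no agent reports a positive gain."""
    for _ in range(max_rounds):
        values, gains = mgm_round(values)
        if not any(gains):
            break
    return values
```

Starting from (1, 0, 1), the gains are 10, 12, 10, so only a2 moves; the next round reports gains 0, 0, 0 and the run stops at (1, 1, 1). Because only the highest-gain agent in each neighborhood moves, the total reward never decreases.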

Page 14

Solution Techniques

Static Estimation
◦ SE-Optimistic
◦ SE-Realistic
Balanced Exploration using Decision Theory
◦ BE-Backtrack
◦ BE-Rebid
◦ BE-Stay

Page 15

Static Estimation Techniques

SE-Optimistic
◦ Always assume that exploration is better
◦ Greedy Approach

Page 16

Static Estimation Techniques

SE-Optimistic
◦ Always assume that exploration is better
◦ Greedy Approach
SE-Realistic
◦ More conservative: assume exploration gives mean reward
◦ Faster convergence
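One way the two static estimates could be contrasted, as a sketch only. It assumes that SE-Optimistic scores an unexplored location at the maximum possible per-link reward and SE-Realistic at the mean per-link reward; the constants `MAX_LINK_REWARD` and `MEAN_LINK_REWARD` and all function names are hypothetical and not taken from the slides.

```python
# ASSUMPTION: illustrative constants, not values from the talk.
MAX_LINK_REWARD = 200
MEAN_LINK_REWARD = 100


def wants_to_explore(current_reward, num_links, per_link_estimate):
    """True if an unexplored location is estimated to beat the current one."""
    return num_links * per_link_estimate > current_reward


def se_optimistic(current_reward, num_links):
    """Assume an unexplored location yields the maximum reward on every link."""
    return wants_to_explore(current_reward, num_links, MAX_LINK_REWARD)


def se_realistic(current_reward, num_links):
    """Assume an unexplored location yields the mean reward on every link."""
    return wants_to_explore(current_reward, num_links, MEAN_LINK_REWARD)
```

For an agent with two links and a current reward of 250, the optimistic estimate (400) still favours exploring while the realistic estimate (200) does not, which illustrates why the realistic variant would settle sooner.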

Page 17

Balanced Exploration Techniques

Page 18

Balanced Exploration Techniques

BE-Backtrack
◦ Decision-theoretic limit on exploration
◦ Track previous best location Rb
◦ State of the agent: (Rb, T)

Page 19

Balanced Exploration Techniques

Page 20

Balanced Exploration Techniques

Utility of Exploration

Page 21

Balanced Exploration Techniques

Utility of Backtrack after Successful Exploration

Page 22

Balanced Exploration Techniques

Utility of Backtrack after Unsuccessful Exploration
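The equations for these three utility terms did not survive in the transcript, so the following is a simplified illustration rather than the authors' formulation. It assumes Rb is the best reward seen so far, T is the number of rounds left, each explored location yields an independent reward drawn from a known discrete distribution, and the agent explores for a fixed `te` rounds before settling on the best location found or backtracking to Rb. All names are hypothetical.

```python
def expected_utility(rb, horizon, te, dist):
    """Expected total reward of exploring `te` rounds, then settling.

    dist: list of (reward, probability) pairs for a newly explored location
    (ASSUMPTION: i.i.d. across locations and known to the agent).
    """
    mean = sum(r * p for r, p in dist)
    remaining = horizon - te

    def cdf_max(x):
        # P(the best of te explored rewards is <= x)
        return sum(p for r, p in dist if r <= x) ** te

    # Term 1: reward collected while exploring.
    exploring = te * mean
    # Term 3: nothing better than rb was found, so backtrack to rb.
    fail_prob = cdf_max(rb)
    unsuccessful = fail_prob * remaining * rb
    # Term 2: a reward above rb was found; settle there.
    successful, previous = 0.0, fail_prob
    for r in sorted({r for r, _ in dist if r > rb}):
        current = cdf_max(r)
        successful += (current - previous) * remaining * r
        previous = current
    return exploring + successful + unsuccessful


def best_exploration_length(rb, horizon, dist):
    """Number of exploration rounds that maximises expected utility."""
    return max(range(horizon + 1),
               key=lambda te: expected_utility(rb, horizon, te, dist))
```

With `te = 0` the expression reduces to `T * Rb`, the utility of never exploring, so comparing against it gives a decision-theoretic limit on how long exploration is worthwhile.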

Page 23

Balanced Exploration Techniques

BE-Rebid
◦ Allows agents to backtrack
◦ Re-evaluate every time-step
◦ Allows for on-the-fly reasoning
◦ Same equations as BE-Backtrack

Page 24

Balanced Exploration Techniques

BE-Stay
◦ Agents unable to backtrack
◦ Dynamic Programming Approach
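A sketch of how a dynamic program could decide between staying and exploring when backtracking is impossible. This is an illustrative stopping model, not the formulation from the talk: it assumes each move lands on a location whose reward is drawn independently from a known discrete distribution, and that an agent choosing to stay keeps its current reward for all remaining rounds. All names are hypothetical.

```python
from functools import lru_cache


def make_value_function(dist):
    """Build V(current_reward, rounds_left) for a reward distribution.

    dist: list of (reward, probability) pairs (ASSUMPTION: i.i.d., known).
    """
    dist = tuple(dist)

    @lru_cache(maxsize=None)
    def value(current, remaining):
        if remaining <= 0:
            return 0.0
        # Option 1: stay put for every remaining round.
        stay = current * remaining
        # Option 2: move once, collect the new reward, then act optimally.
        explore = sum(p * (r + value(r, remaining - 1)) for r, p in dist)
        return max(stay, explore)

    return value


def should_explore(current, remaining, dist):
    """True if moving to a new location beats staying, in expectation."""
    if remaining <= 0:
        return False
    value = make_value_function(dist)
    explore = sum(p * (r + value(r, remaining - 1)) for r, p in dist)
    return explore > current * remaining
```

In this model an agent holding reward 6 stays when one round is left but explores when three rounds are left, since a longer horizon leaves more time to benefit from a better location.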

Page 25

Results

Page 26

Results

[Figure: Learning Curve (20 agents, chain, 100 rounds)]

Page 27

Results (simulation)

[Figure: Varying Number of Robots (chain topology, 100 rounds). x-axis: number of robots (5, 15, 30, 50); y-axis: Scaled Cumulative Signal Strength (0 to 0.6). Series: SE-Optimistic, SE-Mean, BE-Stay, BE-Backtrack, BE-Rebid.]

Page 28

Results (simulation)

[Figure: Varying Total Number of Rounds (10 agents, random graphs with 15-20 links). x-axis: total number of rounds (5, 25, 50, 75, 100); y-axis: Scaled Cumulative Signal Strength (0 to 0.8). Series: SE-Optimistic, SE-Mean, BE-Stay, BE-Backtrack, BE-Rebid.]

Page 29

Results (simulation)

[Figure: Varying Topology (20 agents, 100 rounds). x-axis: Chain, Density = 1/3, Density = 2/3, Full; y-axis: Scaled Cumulative Signal Strength (0 to 0.6). Series: SE-Optimistic, SE-Mean, BE-Stay, BE-Backtrack, BE-Rebid.]

Page 30

Results (physical robots)

Page 31

Results (physical robots)

[Figure: Physical Robot Results (4 robots, 20 rounds). x-axis: Chain, Random, Fully Connected; y-axis: Absolute Gain (0 to 1000). Series: SE-Mean, BE-Rebid.]

Page 32

Conclusions

Provide algorithms for DCOPs addressing real-world challenges
Demonstrated improvement with physical hardware

Page 33

Future Work

Scaling up the evaluation
◦ different approaches
◦ different parameter settings
Examine alternate metrics
◦ battery drain
◦ throughput
◦ cost of movement
Verify algorithms in other domains

Page 34

Thank You

[email protected]
teamcore.usc.edu/manish

Page 35

Conclusions

Provide algorithms for DCOPs addressing real-world challenges
Demonstrated improvement with physical hardware

[email protected]
teamcore.usc.edu/manish