lifelong multi-agent path finding for online pickup and ...hangma/pub/aamas17_slides.pdf ·...

Post on 03-May-2018

225 Views

Category:

Documents

5 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Lifelong Multi-Agent Path Findingfor Online Pickup and Delivery Tasks

Hang Ma Jiaoyang Li T. K. Satish Kumar Sven KoenigUniversity of Southern California

Tsinghua University

May 11, 2017AAMAS

Multi-Agent Path Finding

Find collision-free paths for all agents from their currentlocations to their predefined goal locations in a knownenvironment.

a1

a2

g1

g2

Time Step 0

a1

a2

g1

g2

Time Step 1

a1

a2

g1

g2

Time Step 2

a1a2

g2

Time Step 3

a1

a2

Motivated by Real-World Applications:

Automated aircraft-towing vehicles, warehouse robots, officerobots, and game characters in video games.

Amazon Warehouse Robots1

Tasks: Move inventory shelves from storage locations toinventory stations or vice versa.

Figure 3: A small region of a Kiva layout. The green cells represent pod storage locations, the orange ovals the robots (withpods not pictured), and the purple and pink regions the queues around the inventory stations.

Figure 2: A Kiva drive unit and storage pod.

used to move the inventory pods with the correct bins fromtheir storage locations to the inventory stations where a pickworker removes the desired products from the desired bin.Note that the pod has four faces, and the drive unit may needto rotate the pod in order to present the correct face. When apicker is done with a pod, the drive unit stores it in an emptystorage location.

Each station is equipped with a desktop computer thatcontrols pick lights, barcode scanners, and laser pointers thatare used to identify the pick and put locations. Because ev-ery product is scanned in and out of the system, overall pick-ing errors go down, which potentially eliminates the needfor post-picking quality control. In general, every station iscapable of being either a picking station or a replenishmentstation. In practice, pick stations will be located near out-bound conveyors, and replenishment stations will be locatednear pallet drop off points.

The power of the Kiva solution comes from the fact thatit allows every worker to have random access to any inven-tory in the warehouse. Moreover, inventory can be retrievedin parallel. When the picker is filling several boxes at thesame time, the parallel, random access ensures that she isnot waiting on pods to arrive. In fact, by keeping a smallqueue of work at the station, the Kiva system delivers a newpod face every six seconds, which sets a baseline pickingrate of 600 lines per hour.2 Peak rates can exceed 600 linesper hour when the operator can pick more than one item offa pod.3

For a large warehouse, the savings in personnel can besignificant. Consider, for example, what a Kiva implemen-tation of the book warehouse would involve. A busy book-seller may ship 100,000 boxes a day. With existing automa-tion, this level of output would employ perhaps 75 workers

2This statistic is based on single unit picks and has been repro-duced for extended periods in the Kiva test facility.

3This statistic was verified when a small Kiva demonstrationsystem was brought to a drugstore distribution center where opera-tors picked at nearly 700 lines per hour.

1755

Figure 5: The Kiva demonstration facility.

AcknowledgmentsBuilding a working MVS requires a core set of great me-chanical, electrical, and software engineers. It is yet an-other thing to turn it into a commercial product and managethe manufacture, assembly, and deployment of these sys-tems. We thank the world-class Kiva employees who havebreathed life into this vision.

ReferencesBoutilier, C.; Shoham, Y.; and Wellman, M. P. 1997. Eco-nomic principles of multi-agent systems. Artificial Intelli-gence 94(1):1–6.Butenko, S.; Murphey, R.; and Pardos, P. M., eds. 2003.Cooperative Control: Models, Applications and Algo-rithms. Springer.Gilmour, K. 2003. Amazon warehouse, amazon adventure.Internet Magazine.Hazard, C. J.; Wurman, P. R.; and D’Andrea, R. 2006.Alphabet soup: A testbed for studying resource allocationin multi-vehicle systems. In Proceedings of the 2006 AAAIWorkshop on Auction Mechanisms for Robot Coordination,23–30.Jennings, N. R., and Bussmann, S. 2003. Agent-based con-trol systems: Why are they suited to engineering complexsystems? IEEE Control Systems Magazine 61–73.Jennings, N. R. 1996. Coordination techniques for dis-tributed artificial intelligence. In O’Hare, G. M. P., andJennings, N. R., eds., Foundations of Distributed ArtificialIntelligence. Wiley. 187–210.

Konolige, K.; Fox, D.; Ortiz, C.; Agno, A.; Eriksen, M.;Limketkai, B.; Ko, J.; Morisset, B.; Schulz, D.; Stewart,B.; and Vincent, R. 2004. Centibots: Very large scale dis-tributed robotic teams. In Proceedings of the InternationalSymposium on Experimental Robotics.Lesser, V. R. 1999. Cooperative multiagent systems: Apersonal view of the state of the art. IEEE Transactions onKnowledge and Data Engineering 11(1):133–142.Malone, T. W.; Fikes, R. E.; Grant, K. R.; and Howard,M. T. 1988. Enterprise: A market-like task scheduler fordistributed computing environments. In Huberman, B. A.,ed., The Ecology of Computation. North Holland.Rosenschein, J. S., and Zlotkin, G. 1994. Rules of En-counter. Cambridge: The MIT Press.Simmons, R.; Smith, T.; Dias, M. B.; Goldberg, D.; Hersh-berger, D.; Stentz, A.; and Zlot, R. 2002. A layered archi-tecture for coordination of mobile robots. In Schultz, A.,and Parker, L., eds., Multi-Robot Systems: From Swarmsto Intelligent Automata. Kluwer.Wellman, M. P., and Wurman, P. R. 1998. Market-awareagents for a multiagent world. Robotics and AutonomousSystems 24:115–25.

1759

1P. R. Wurman, R. D’Andrea, and M. Mountz. “Coordinating Hundreds of Cooperative, Autonomous Vehicles inWarehouses”. In: AI Magazine 29.1 (2008), pp. 9–20.

Multi-Agent Pickup and Delivery (MAPD) Problem

I Existing research on multi-agent path finding — a“one-shot” version:One pre-determined task for each agent — navigates to itsgoal location.

I MAPD — a “lifelong” version of multi-agent path finding:I A task can enter the system at any time.I Agents have to constantly attend to a stream of new tasks.

MAPD Algorithms

1. Decoupled Task Assignment and Path FindingI Token Passing (TP): Greedy task assignment and no task

reassignment.I Token Passing with Task Swaps (TPTS): Local task

reassignment between two agents.

2. Centralized Task Assignment and Path FindingCENTRAL

Roughly:I Effectiveness: TP < TPTS < CENTRALI Efficiency: CENTRAL < TPTS < TP

Tasks

Figure 3: A small region of a Kiva layout. The green cells represent pod storage locations, the orange ovals the robots (withpods not pictured), and the purple and pink regions the queues around the inventory stations.

Figure 2: A Kiva drive unit and storage pod.

used to move the inventory pods with the correct bins fromtheir storage locations to the inventory stations where a pickworker removes the desired products from the desired bin.Note that the pod has four faces, and the drive unit may needto rotate the pod in order to present the correct face. When apicker is done with a pod, the drive unit stores it in an emptystorage location.

Each station is equipped with a desktop computer thatcontrols pick lights, barcode scanners, and laser pointers thatare used to identify the pick and put locations. Because ev-ery product is scanned in and out of the system, overall pick-ing errors go down, which potentially eliminates the needfor post-picking quality control. In general, every station iscapable of being either a picking station or a replenishmentstation. In practice, pick stations will be located near out-bound conveyors, and replenishment stations will be locatednear pallet drop off points.

The power of the Kiva solution comes from the fact thatit allows every worker to have random access to any inven-tory in the warehouse. Moreover, inventory can be retrievedin parallel. When the picker is filling several boxes at thesame time, the parallel, random access ensures that she isnot waiting on pods to arrive. In fact, by keeping a smallqueue of work at the station, the Kiva system delivers a newpod face every six seconds, which sets a baseline pickingrate of 600 lines per hour.2 Peak rates can exceed 600 linesper hour when the operator can pick more than one item offa pod.3

For a large warehouse, the savings in personnel can besignificant. Consider, for example, what a Kiva implemen-tation of the book warehouse would involve. A busy book-seller may ship 100,000 boxes a day. With existing automa-tion, this level of output would employ perhaps 75 workers

2This statistic is based on single unit picks and has been repro-duced for extended periods in the Kiva test facility.

3This statistic was verified when a small Kiva demonstrationsystem was brought to a drugstore distribution center where opera-tors picked at nearly 700 lines per hour.

1755

Figure 5: The Kiva demonstration facility.

AcknowledgmentsBuilding a working MVS requires a core set of great me-chanical, electrical, and software engineers. It is yet an-other thing to turn it into a commercial product and managethe manufacture, assembly, and deployment of these sys-tems. We thank the world-class Kiva employees who havebreathed life into this vision.

ReferencesBoutilier, C.; Shoham, Y.; and Wellman, M. P. 1997. Eco-nomic principles of multi-agent systems. Artificial Intelli-gence 94(1):1–6.Butenko, S.; Murphey, R.; and Pardos, P. M., eds. 2003.Cooperative Control: Models, Applications and Algo-rithms. Springer.Gilmour, K. 2003. Amazon warehouse, amazon adventure.Internet Magazine.Hazard, C. J.; Wurman, P. R.; and D’Andrea, R. 2006.Alphabet soup: A testbed for studying resource allocationin multi-vehicle systems. In Proceedings of the 2006 AAAIWorkshop on Auction Mechanisms for Robot Coordination,23–30.Jennings, N. R., and Bussmann, S. 2003. Agent-based con-trol systems: Why are they suited to engineering complexsystems? IEEE Control Systems Magazine 61–73.Jennings, N. R. 1996. Coordination techniques for dis-tributed artificial intelligence. In O’Hare, G. M. P., andJennings, N. R., eds., Foundations of Distributed ArtificialIntelligence. Wiley. 187–210.

Konolige, K.; Fox, D.; Ortiz, C.; Agno, A.; Eriksen, M.;Limketkai, B.; Ko, J.; Morisset, B.; Schulz, D.; Stewart,B.; and Vincent, R. 2004. Centibots: Very large scale dis-tributed robotic teams. In Proceedings of the InternationalSymposium on Experimental Robotics.Lesser, V. R. 1999. Cooperative multiagent systems: Apersonal view of the state of the art. IEEE Transactions onKnowledge and Data Engineering 11(1):133–142.Malone, T. W.; Fikes, R. E.; Grant, K. R.; and Howard,M. T. 1988. Enterprise: A market-like task scheduler fordistributed computing environments. In Huberman, B. A.,ed., The Ecology of Computation. North Holland.Rosenschein, J. S., and Zlotkin, G. 1994. Rules of En-counter. Cambridge: The MIT Press.Simmons, R.; Smith, T.; Dias, M. B.; Goldberg, D.; Hersh-berger, D.; Stentz, A.; and Zlot, R. 2002. A layered archi-tecture for coordination of mobile robots. In Schultz, A.,and Parker, L., eds., Multi-Robot Systems: From Swarmsto Intelligent Automata. Kluwer.Wellman, M. P., and Wurman, P. R. 1998. Market-awareagents for a multiagent world. Robotics and AutonomousSystems 24:115–25.

1759

g1

s2

g2

s1

Executing Task

In order to execute a task, the agent has to move from itscurrent location via the pickup location to the delivery location:

1. When the agent reaches the pickup location, it starts toexecute the task.

2. When it reaches the delivery location, it finishes the task.

g1

s1

Free Agents

Free Agents:

g1

s1

a2

a1

Occupied Agents

Occupied Agents:

g1

s1a2

a1

Assignment of Agents to Tasks

I A free agent can be assigned to any unexecuted task.

I An occupied agent has to finish executing its current task.

Objective of MAPD

Finish executing each task as quickly as possible.

Effectiveness of a MAPD algorithm

Service time: the average number of timesteps needed tofinish executing each task after it enters the system.An algorithm solves a MAPD instance ⇐⇒ Service time of alltasks is bounded.

Service time: 7+72 = 7

g1

s2

g2

s1

a2

a1

Solvability

Not every MAPD instance is solvable.

s1 a1 a2 g1

Well-Formed MAPD Instances

Being well-formed (based on [M. Cap et al 2015]2): a sufficientcondition that makes MAPD instances solvable.Intuition: agents should only be allowed to rest (that is, stayforever) in locations, called parking locations, where theycannot block other agents.

2M. Cap, J. Vokrınek, and A. Kleiner. “Complete Decentralized Method for On-Line Multi-Robot TrajectoryPlanning in Well-formed Infrastructures”. In: International Conference on Automated Planning and Scheduling.2015, pp. 324–332.

Parking Locations

I Task Parking Locations: all pickup and delivery locationsof tasks(storage locations, inventory stations, etc.)

I Non-task Parking Locations:I All initial locations of agentsI Additional designated parking locations

g1

s2

g2

s1

a1 a2

a3

Well-Formed MAPD Instances

1. # tasks is finite;2. # non-task parking locations ≥ # agents;3. For any two parking locations, there exists a path between

them that traverses no other parking locations.

a1 a2 a1 a2 a1 a2

MAPD Algorithms

We present

1. Two Decoupled Algorithms:complete for well-formed MAPD instances (solve allwell-formed instances)

I Token Passing (TP)I Token Passing with Task Swaps (TPTS)

2. One Centralized Algorithm:CENTRAL

A Running Example

Unexecuted Tasks: task1, task2.Agent a1 and agent a2 are resting.Agent a3 is assigned to task1 and on the way to the pickuplocation s1.

a1

a2a3

s1

g1

s2

g2

Token Passing (TP)

Based on an idea similar to Cooperative A*3:

I Token: a synchronized shared block of memory thatcontains the current paths of all agents, set of unexecutedtask, and agent assignments.

I Only one agent has access to the token at each time.I Each agent assigns itself a task, plan its path, and passes

the token to the next agent.

3D. Silver. “Cooperative Pathfinding”. In: Artificial Intelligence and Interactive Digital Entertainment. 2005,pp. 117–122.

TP: Key Idea

I A task can only be assigned once.I Once an agent is assigned to a task, it cannot be assigned

to other tasks until it finishes the task.

TP: Running Example

Task Available for Assignment: task2.Agent a1 and agent a2 request for token.

a1

a2a3

s1

g1

s2

g2

TP: Agent a1’s Turn

Agent a1 Has Token

1. it cannot assign itself to any task because agent a2 rests ing2, the only task available to it;

2. it has to rest in a parking location that will not create anydeadlock;

3. it can continue to rest in its current location.

s1

g1

s2

g2

TP: Agent a2’s Turn

Agent a2 Has Token

1. it assigns itself to task2;2. task2 is no longer available to other agents;3. it plans a cost-minimal collision-free path to execute task2.

s1

g1

s2

g2

TP: Animation

s1

g1

s2

g2

TP: Animation

s1

g1

s2

g2

TP: Animation

s1

g1

s2

g2

TP: Animation

s1

g1

s2

g2

TP: Animation

s1

g1

s2

g2

TP: Animation

s1

g1

s2

g2

TP: Animation

s2

g2

TP: Animation

s2

g2

TP: Animation

TP: Completeness

TheoremAll well-formed MAPD instances are solvable, and TP solvesthem.

Improving the Effectiveness of TP

TP is simple but can be made more effective:

I A task with an assigned agent can be assigned a newagent (as long as the task has not been executed).

Token Passing with Task Swaps (TPTS)

An agent is allowed to grab a task from another agent if it canfinish the task earlier.

TPTS: Running Example

Tasks Available for Assignment: task1, task2.Agent a1 and agent a2 request for token.

a1

a2a3

s1

g1

s2

g2

TPTS: Agent a1’s Turn

Agent a1 has token.Agent a1 grabs task1 from agent a3.

s1

g1

s2

g2

TPTS: Agent a3 Making Decisions

Agent a3 has token.Agent a3 moves to a parking location that will not create anydeadlock in the future.

s1

g1

s2

g2

TPTS: Agent a2’s Turn

Agent a2 has token.Agent a2 grabs task1 from agent a1.

s1

g1

s2

g2

TPTS: Agent a1 Making Decisions

Agent a1 has token.Agent a1 assigns itself to task2.

s1

g1

s2

g2

TPTS: Completeness

TheoremTPTS solves all well-formed MAPD instances.

Centralized MAPD Algorithm: CENTRAL

CENTRAL assigns agents to tasks in a centralized way:1. assigns parking locations to all free agents using

Hungarian method;2. plans paths for all of them from their current locations to

their assigned parking locations by solving the resulting“one-shot” multi-agent path-finding problem.

CENTRAL: Running Example

Tasks available for assignment: task1, task2

a1

a2a3

s1

g1

s2

g2

CENTRAL: Candidate Parking Locations

Pickup locations s1 and s2 + three additional “good” parkinglocations, one for each agent:

s1

g1

s2

g2

CENTRAL: Assignment of Parking Locations toAgents

CENTRAL uses Hungarian method to find a cost-minimalassignment from parking locations to agents (pickup locationshave priority over other parking locations):

CENTRAL: Path Finding

CENTRAL plans collision-free paths for all agents from theircurrent locations to their assigned parking locations.CENTRAL plans paths to delivery locations only when agentsreach pickup locations.

s1

g1

s2

g2

Comparisons of Three Algorithms

a1

a2a3

s1

g1

s2

g2

s1

g1

s2

g2

s1

g1

s2

g2

s1

g1

s2

g2

Small Simulated Warehouse Environment

Figure: 21× 35 4-neighbor grid with 50 agents. Gray cells areinventory stations and storage locations. Colored circles are the initiallocations of agents.

Experimental Results: 500 Random Tasks, 10 to 50Agents

I Effectiveness1. Service Time:

CENTRAL < TPTS < TP2. Throughput – # tasks executed per 100 timesteps:

TP < TPTS < CENTRAL3. Makespan – timestep when all tasks are finished:

CENTRAL < TPTS < TPI Runtime per Timestep:

TP < 10 millisecondsTPTS < 200 millisecondsCENTRAL < 4,000 milliseconds

Large Simulated Warehouse Environment

Figure: 81× 81 4-neighbor grid with 500 agents.

Results for TP: 1000 Random Tasks, 100 to 500Agents

100 agents: ∼ 0.09 seconds per timestep500 agents: ∼ 6 seconds per timestep

agents 100 200 300 400 500service time 463.25 330.19 301.97 289.08 284.24

runtime (milliseconds) 90.83 538.22 1,854.44 3,881.11 6,121.06

Takeaways

MAPD: A “lifelong” version of multi-agent path finding.Three Algorithms:

I Decoupled and complete for well-formed MAPD instances:TP, TPTS.

I Centralized: CENTRAL.

Task Assignment Effort: TP < TPTS < CENTRALEffectiveness: TP < TPTS < CENTRALEfficiency: CENTRAL < TPTS < TP

top related