Traveling Salesman Problems Motivated by
Robot Navigation
Maria Minkoff
MIT
With Avrim Blum, Shuchi Chawla, David Karger, Terran Lane,
Adam Meyerson
A Robot Navigation Problem
• Robot delivering packages in a building
• Goal: to deliver as quickly as possible
• Classic model: Traveling Salesman Problem
• Find a tour of minimum length
• Additional constraints:
• some packages have higher priority
• uncertainty in robot’s behavior
• battery failure
• sensor error, motor control error
Markov Decision Process Model
• State space S
• Choice of actions a ∈ A at each state s
• Transition function T(s’|s,a)
• action determines probability distribution on next state
• sequence of actions produces a random path through graph
• Rewards R(s) on states
• If arrive in state s at time t, receive discounted reward γ^t·R(s) for discount factor γ ∈ (0,1)
• MDP Goal: policy for picking an action from any state that maximizes total discounted reward
Exponential Discounting
• Motivates the robot to reach desired states quickly
• Inflation: reward collected in the distant future decreases in value due to uncertainty
• at each time step the robot loses power with some fixed probability
• probability of still being alive at time t decays exponentially
• discounting reflects the expected value of the reward
Solving MDP
• Fixing an action at each state produces a Markov chain with transition probabilities p_vw
• Can compute expected discounted reward ρ_v if starting at state v:
    ρ_v = r_v + Σ_w p_vw · γ^t(v,w) · ρ_w
• Choosing actions to optimize this recurrence is polynomial-time solvable
• Linear programming
• Dynamic programming (like shortest paths)
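The fixed-policy recurrence above can be sketched in a few lines. This is a toy illustration (the data layout and function name are mine, not from the talk): it evaluates ρ_v = r_v + Σ_w p_vw γ^t(v,w) ρ_w by fixed-point iteration, which converges because γ < 1 makes the update a contraction.

```python
# Sketch (not from the talk): evaluate the discounted-reward recurrence
#   rho_v = r_v + sum_w p_vw * gamma^t(v,w) * rho_w
# for a fixed policy by simple fixed-point iteration.

def evaluate_policy(rewards, trans, gamma, iters=1000):
    """rewards: {state: reward}; trans: {state: [(next_state, prob, time), ...]}
    (transition probabilities and travel times under the fixed policy)."""
    rho = {s: 0.0 for s in rewards}
    for _ in range(iters):
        # apply the recurrence once to every state
        rho = {
            s: rewards[s] + sum(p * gamma**t * rho[w] for (w, p, t) in trans[s])
            for s in rewards
        }
    return rho
```

For example, a two-state chain a → b → b (all travel times 1, reward 1 at b, γ = ½) gives ρ_b = 1 + ½ρ_b = 2 and ρ_a = ½ · 2 = 1.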
Solving the wrong problem
• A package can only be delivered once
• So should not get reward each time the target is reached
• One solution: expand the state space
• New state = current location × set of past locations (packages already delivered)
• Reward nonzero only on states where the current location is not in the list of previously visited locations
• Now apply MDP algorithm
• Problem: new state space has exponential size
Tackle an easier problem
• Problem has two novel elements for “theory”
• Discounting of reward based on arrival time
• Probability distribution on outcome of actions
• We will set aside the second issue for now
• In practice, the robot can control errors
• Even first issue by itself is hard and interesting
• First step towards solving whole problem
Discounted-Reward TSP
Given
• undirected graph G=(V,E)
• edge weights (travel times) d_e ≥ 0
• weights on nodes (rewards) r_v ≥ 0
• discount factor γ ∈ (0,1)
• root node s
Goal
find a path P starting at s that maximizes total discounted reward ρ(P) = Σ_{v∈P} r_v · γ^{d_P(v)}
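As a concrete illustration of this objective (the helper name and data layout are mine, not from the talk), the discounted reward of a fixed path follows directly from the definition:

```python
# Sketch: total discounted reward of a path,
#   rho(P) = sum over v in P of r_v * gamma^{d_P(v)},
# where d_P(v) is the travel time along P from the start to v.

def discounted_reward(path, edge_len, reward, gamma):
    """path: vertex list starting at s; edge_len[(u,v)]: travel time;
    reward[v]: reward at v; gamma: discount factor in (0,1)."""
    total, t = 0.0, 0.0
    visited = set()
    for i, v in enumerate(path):
        if i > 0:
            t += edge_len[(path[i - 1], v)]   # arrival time d_P(v)
        if v not in visited:                  # each reward collected once
            total += reward[v] * gamma**t
            visited.add(v)
    return total
```

With γ = ½, rewards r_a = 1, r_b = 4, and unit edges s–a–b, the path s, a, b collects 1·½ + 4·¼ = 1.5.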
Approximation Algorithms
• Discounted-Reward TSP is NP-hard (and so is the more general MDP-type problem)
• reduction from minimum latency TSP
• So intractable to solve exactly
• Goal: approximation algorithm that is guaranteed to collect at least some constant
fraction of the best possible discounted reward
Related Problems
Goal of Discounted-Reward TSP seems to be to find a “short” path that collects “lots” of reward
• Prize-Collecting TSP
• Given a root vertex v, find a tour containing v that minimizes total length + foregone reward (undiscounted)
• Primal-dual 2-approximation algorithm [GW 95]
k-TSP
• Find a tour of minimum length that visits at least k vertices
• 2-approximation algorithm known for undirected graphs based on algorithm for PC-TSP [Garg 99]
• Can be extended to handle node-weighted version
Mismatch
A constant-factor approximation on length doesn’t exponentiate well
• Suppose the optimum solution reaches some vertex v at time t for reward γ^t·r
• A constant-factor approximation would reach v within time 2t for reward γ^{2t}·r
• Result: get only a γ^t fraction of the optimum discounted reward, not a constant fraction.
Orienteering Problem
Find a path of length at most D that maximizes net reward collected
• Complement of k-TSP
• approximates reward collected instead of length
• avoids changing length, so exponentiation doesn’t hurt
• unrooted case can be solved via k-TSP
• Drawback: no constant factor approximation for rooted non-geometric version previously known
• Our techniques also give a constant factor approximation for Orienteering problem
Our Results
Using an α-approximation for k-TSP as a subroutine
• (3/2·α + 2)-approximation for Orienteering
• e·(3/2·α + 2)-approximation for Discounted-Reward Collection
• constant-factor approximations for tree- and multiple-path versions of the problems
Our Results
Using an α-approximation for k-TSP as a subroutine
substitute α = 2, announced by Garg in 1999
• (3/2·2 + 2) = 5-approximation for Orienteering
• e·(3/2·2 + 2) ≈ 13.6-approximation for Discounted-Reward Collection
• constant-factor approximations for tree- and multiple-path versions of the problems
Eliminating Exponentiation
• Let dv = shortest path distance (time) to v
• Define the prize at v as π_v = γ^{d_v}·r_v
• max discounted reward possibly collectable at v
• If a given path reaches v at time t_v, define excess e_v = t_v – d_v
• difference between the shortest path and the chosen one
• Then discounted reward at v is γ^{e_v}·π_v
• Idea: if excess is small, prize ≈ discounted reward
• Fact: excess only increases as traverse path
• excess reflects lost time; can’t make it up
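A quick numeric check of the identity behind this slide (the values are arbitrary, chosen by me): the discounted reward γ^{t_v}·r_v factors exactly into γ^{e_v}·π_v.

```python
# Sketch: prize pi_v = gamma^{d_v} * r_v and excess e_v = t_v - d_v
# factor the discounted reward: gamma^{t_v} * r_v == gamma^{e_v} * pi_v.

gamma, r_v = 0.5, 8.0
d_v, t_v = 2.0, 3.0            # shortest-path distance vs. actual arrival time
pi_v = gamma**d_v * r_v        # best discounted reward collectable at v
e_v = t_v - d_v                # excess of the chosen path over the shortest
assert abs(gamma**t_v * r_v - gamma**e_v * pi_v) < 1e-12
```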
Optimum path
• assume γ = ½ (can scale edge lengths)
Claim: at least ½ of optimum path’s discounted reward R is collected
before path’s excess reaches 1
[Figure: optimum path starting at s; u is the first vertex with excess ≥ 1]
Proof by contradiction:
• Let u be the first vertex with e_u ≥ 1
• Suppose more than R/2 reward follows u
• Can shortcut directly to u, then traverse the rest of the optimum path
• this reduces all excesses after u by at least 1
• so it “undiscounts” those rewards by a factor γ⁻¹ = 2
• so it doubles the discounted reward collected
• but this was more than R/2: contradiction
New problem: Approximate Min-Excess Path
• Suppose there exists an s-t path P* with prize value Π and length l(P*) = d_t + ε
• Optimization: find an s-t path P with prize value ≥ Π that minimizes the excess l(P) – d_t over the shortest path to t
• equivalent to minimizing total length, as in k-TSP
• Approximation: find an s-t path P with prize value ≥ Π that approximates the optimum excess over the shortest path to t, i.e. has length l(P) = d_t + c·ε
• better than approximating entire path length
Using Min-Excess Path
• Recall discounted reward at v is γ^{e_v}·π_v
• Prefix of optimum discounted reward path:
• collects discounted reward Σ γ^{e_v}·π_v ≥ R/2
• spans prize Σ π_v ≥ R/2
• and has no vertex with excess over 1
• Guess t = last node on opt path with excess e_t ≤ 1
• Find a path to t of approximately (4 times) minimum excess that spans R/2 prize (we can guess R/2)
• Excesses are at most 4, so γ^{e_v}·π_v ≥ π_v/16
• → discounted reward on found path ≥ R/32
Solving Min-Excess Path problem
Exactly solvable case: monotonic paths
• Suppose optimum path goes through vertices in strictly increasing distance from root
• Then can find the optimum by dynamic program
• just as one can solve longest path in an acyclic graph
• Build table
• for each vertex v: is there a monotonic path from s to v with length l and prize π?
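A minimal version of this table-building DP, assuming integer prizes and distinct shortest-path distances (the function and variable names are mine, not from the talk). Vertices are processed in order of increasing distance from the root, exactly as for longest/shortest paths in a DAG:

```python
# Sketch: dp[v][p] = minimum length of a monotonic s-v path collecting
# prize exactly p, where edges may only go from smaller to larger
# shortest-path distance from the root.

INF = float('inf')

def monotonic_dp(verts, dist, prize, length, s):
    """verts: vertices; dist[v]: shortest-path distance from s (all distinct);
    prize[v]: integer prize; length[(u,v)]: edge length."""
    order = sorted(verts, key=lambda v: dist[v])
    total = sum(prize.values())
    dp = {v: [INF] * (total + 1) for v in verts}
    dp[s][prize[s]] = 0.0
    for v in order:
        for u in order:
            if dist[u] < dist[v] and (u, v) in length:
                for p in range(total + 1 - prize[v]):
                    if dp[u][p] < INF:
                        cand = dp[u][p] + length[(u, v)]
                        if cand < dp[v][p + prize[v]]:
                            dp[v][p + prize[v]] = cand
    return dp
```

Reading off dp[t] then answers “is there a monotonic s-t path with length l and prize π?” for every (l, π) pair.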
Solving Min-Excess Path problem
Approximable case: wiggly paths
• Length of path to v is lv = dv + ev
• If ev > dv then lv > ev > lv/2
• i.e., the path takes more than twice as long as necessary to reach v
• So if approximate lv to constant factor, also approximate ev to twice that constant factor
Approximating path length
• Can use k-TSP algorithm to find approximately shortest s-t path with specified prize
• merge s and t into a single vertex r
• opt path becomes a tour
• solve k-TSP with root r
• “unmerge”: can get one or more cycles
• connect s and t by shortest path
Decompose optimum path
[Figure: optimum path decomposed into alternating monotone and wiggly segments]
> 2/3 of each wiggly path is excess
Divides into independent problems
Decomposition Analysis
• ≥ 2/3 of each wiggly segment is excess
• That excess accumulates along the whole path
• total excess of wiggly segments ≤ excess of whole path
• → total length of wiggly segments ≤ 3/2 of path excess
• Use dynamic program to find shortest (min-excess) monotonic segments collecting target prize
• Use k-TSP to find approximately shortest wiggles collecting target prize
• Approximates length, so approximates excess
• Over all monotonic and wiggly segments, approximates total excess
Dynamic program for Min-Excess Path
• For each pair of vertices and each (discretized) prize value, find
• shortest monotonic path collecting desired prize
• Approximately shortest wiggly path collecting desired prize
• Note: polynomially many subproblems
• Use dynamic programming to find optimum pasting together of segments
Solving Orienteering Problem: special case
• Given a path from s that
• collects prize Π
• has length ≤ D
• ends at t, the farthest point from s
[Figure: a path from s to t through an intermediate vertex v]
• For any constant integer r ≥ 1, there exists a path from s to some v with
• prize ≥ Π/r
• excess ≤ (D – d_v)/r
Solving Orienteering Problem
General case: path ends at arbitrary t
• Let u be the farthest point from s
• Connect t to s via a shortest path
• One of the path segments ending at u
• has prize ≥ Π/2
• has length ≤ D
Reduced to special case
• Using the 4-approximation for Min-Excess Path, get an 8-approximation for Orienteering
Budget Prize-Collecting Steiner Tree problem
Find a rooted tree of edge cost at most D that spans maximum amount of prize
• Complement of k-MST
• Create an Euler tour of the opt tree T*, of cost ≤ 2D
• Divide this tour into two paths starting at root each of length D
• One of them contains at least ½ of total prize
• Path is a type of tree
• Use c-approximation algorithm for Orienteering to obtain 2c-approximation for Budget PCST
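The tour-splitting step above can be sketched as follows (a toy helper of my own, assuming the Euler tour is given as a vertex list with segment lengths): drop the tour edge that crosses distance D from the root; the prefix then has length ≤ D, and the reversed tail is a second root-anchored walk of length ≤ D because the whole tour has length ≤ 2D.

```python
# Sketch: split an Euler tour of cost <= 2D into two root-anchored walks
# of length <= D each and keep the prize-richer half.

def split_tour(tour, seg_len, prize, D):
    """tour: vertices v0 = root, ..., vk = root; seg_len[i]: length of the
    tour edge (tour[i], tour[i+1]); prize[v]: prize, collected once per vertex."""
    walked = 0.0
    cut = len(seg_len)                  # index of the edge crossing distance D
    for i, l in enumerate(seg_len):
        if walked + l > D:
            cut = i
            break
        walked += l
    first = tour[:cut + 1]              # root ... tour[cut], length <= D
    # the reversed tail also starts at the root and has length <= D,
    # because the skipped edge pushes the walk past distance D
    second = list(reversed(tour[cut + 1:])) or [tour[0]]

    def collected(walk):
        return sum(prize[v] for v in set(walk))

    return max(first, second, key=collected)
```

Since the two halves together visit every vertex of the tour, the returned half spans at least half of the total prize.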
Summary
• Showed maximum discounted reward can be approximated using min-excess path
• Showed how to approximate min-excess path using k-TSP
• Min-excess path can also be used to solve the rooted Orienteering problem (previously an open question)
• Also solves “tree” and “cycle” versions of Orienteering
Open Questions
• Non-uniform discount factors
• each vertex v has its own γ_v
• Non-uniform deadlines• each vertex specifies its own deadline by which it
has to be visited in order to collect reward
• Directed graphs
• We used k-TSP, which is only solved for undirected graphs
• For directed, even standard TSP has no known constant factor approximation
• We only use k-TSP/undirectedness in wiggly parts
Future directions
• Stochastic actions
• stochastic seems to imply directed
• Special case: forget rewards
• given a choice of actions, choose to minimize the cover time of the graph
• Applying discounting framework to other problems:
• Scheduling
• Exponential penalty in place of hard deadlines