Introduction to Linear and Combinatorial Optimization (ADM I)

Martin Skutella

TU Berlin

WS 2011/12

General Remarks

- new flavor of ADM I — introduce linear and combinatorial optimization together
- to be continued in ADM II “Discrete Optimization” (SoSe 2012)
- lectures (Martin Skutella): Wednesday, 10:15 – 11:45, MA 041; Thursday, 16:15 – 17:45, MA 043; Tuesday Oct. 25 and Nov. 1, 16:15 – 17:45, MA 041
- exercise session (Madeleine Theile): Friday, 14:15 – 15:45, MA 041 (tomorrow: lecture)
- tutorial sessions (Steffen Suerbier), t.b.a. tomorrow!
- homework: set of problems every week, sometimes programming exercises (details t.b.a.)
- final oral exam (Modulabschlussprüfung) in February or March; alternatively: oral exam for both ADM I & II next summer

Chapter 0: Introduction

Optimization Problems

Generic optimization problem

Given: a set X and a function f: X → R.
Task: find x* ∈ X maximizing (minimizing) f(x*), i.e.,

f(x*) ≥ f(x)  (resp. f(x*) ≤ f(x))  for all x ∈ X.

- An x* with these properties is called an optimal solution (optimum).
- Here, X is the set of feasible solutions, f is the objective function.

Short form:  maximize f(x) subject to x ∈ X,  or simply:  max{f(x) | x ∈ X}.

Problem: Too general to say anything meaningful!

Convex Optimization Problems

Definition 0.1.
Let X ⊆ R^n and f: X → R.

(a) X is convex if for all x, y ∈ X and 0 ≤ λ ≤ 1 it holds that
    λ·x + (1−λ)·y ∈ X.

(b) f is convex if for all x, y ∈ X and 0 ≤ λ ≤ 1 with λ·x + (1−λ)·y ∈ X it holds that
    λ·f(x) + (1−λ)·f(y) ≥ f(λ·x + (1−λ)·y).

(c) If X and f are both convex, then min{f(x) | x ∈ X} is a convex optimization problem.

Note: f: X → R is called concave if −f is convex.

Local and Global Optimality

Definition 0.2.
Let X ⊆ R^n and f: X → R. A point x′ ∈ X is a local optimum of the optimization problem min{f(x) | x ∈ X} if there is an ε > 0 such that

f(x′) ≤ f(x)  for all x ∈ X with ‖x′ − x‖₂ ≤ ε.

Theorem 0.3.
For a convex optimization problem, every local optimum is a (global) optimum.

Proof: . . .

Optimization Problems Considered in this Course:

maximize f(x) subject to x ∈ X

- X ⊆ R^n polyhedron, f linear function
  −→ linear optimization problem (in particular convex)
- X ⊆ Z^n integer points of a polyhedron, f linear function
  −→ integer linear optimization problem
- X related to some combinatorial structure (e.g., graph)
  −→ combinatorial optimization problem
- X finite (but usually huge)
  −→ discrete optimization problem

Example: Shortest Path Problem

Given: directed graph D = (V, A), weight function w: A → R≥0, start node s ∈ V, destination node t ∈ V.

Task: find an s-t-path of minimum weight in D.

That is, X = {P ⊆ A | P is an s-t-path in D} and f: X → R is given by

f(P) = ∑_{a∈P} w(a).

Remark.
Note that the finite set of feasible solutions X is only implicitly given by D. This holds for all interesting problems in combinatorial optimization!
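Even though X is only implicit, shortest paths can be computed efficiently; algorithms for this are treated later in the course. As an illustrative addition (not from the slides), here is a minimal Dijkstra sketch in Python for non-negative weight functions; the adjacency-dict encoding and the tiny example digraph are assumptions made for the demo.

```python
import heapq

def dijkstra(adj, s, t):
    """Shortest s-t distance for non-negative arc weights w: A -> R>=0.
    adj: dict mapping node -> list of (neighbor, weight) pairs."""
    dist = {s: 0.0}
    heap = [(0.0, s)]                     # (distance label, node)
    while heap:
        d, v = heapq.heappop(heap)
        if v == t:
            return d                      # t is settled: its label is optimal
        if d > dist.get(v, float("inf")):
            continue                      # stale queue entry
        for w_node, w in adj.get(v, []):
            nd = d + w
            if nd < dist.get(w_node, float("inf")):
                dist[w_node] = nd
                heapq.heappush(heap, (nd, w_node))
    return float("inf")                   # t unreachable

# Example digraph: arcs s->a (1), s->b (4), a->b (2), a->t (5), b->t (1)
adj = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 5)], "b": [("t", 1)]}
print(dijkstra(adj, "s", "t"))            # 4 (path s-a-b-t)
```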

Example: Minimum Spanning Tree (MST) Problem

Given: undirected graph G = (V, E), weight function w: E → R≥0.

Task: find a connected subgraph of G containing all nodes in V with minimum total weight.

That is, X = {E′ ⊆ E | E′ connects all nodes in V} and f: X → R is given by

f(E′) = ∑_{e∈E′} w(e).

Remarks.

- Notice that there always exists an optimal solution without cycles.
- A connected graph without cycles is called a tree.
- A subgraph of G containing all nodes in V is called spanning.
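As with shortest paths, an efficient algorithm exists and is treated later in the course. The following is a sketch (an addition, not from the slides) of Kruskal's greedy algorithm, which exploits the remark that some optimal solution is a spanning tree; the edge-list input format is an assumption.

```python
def kruskal(n, edges):
    """Minimum spanning tree via Kruskal's algorithm.
    n: number of nodes (labelled 0..n-1); edges: list of (w, u, v)."""
    parent = list(range(n))

    def find(v):                          # union-find with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree, total = [], 0.0
    for w, u, v in sorted(edges):         # consider edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                      # edge joins two components: keep it
            parent[ru] = rv
            tree.append((u, v))
            total += w
    return total, tree

edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
print(kruskal(4, edges))                  # (7.0, [(0, 1), (1, 2), (2, 3)])
```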

Example: Minimum Cost Flow Problem

Given: directed graph D = (V, A) with arc capacities u: A → R≥0, arc costs c: A → R, and node balances b: V → R.

Interpretation:

- nodes v ∈ V with b(v) > 0 (b(v) < 0) have supply (demand) and are called sources (sinks)
- the capacity u(a) of arc a ∈ A limits the amount of flow that can be sent through arc a.

Task: find a flow x: A → R≥0 obeying capacities and satisfying all supplies and demands, that is,

0 ≤ x(a) ≤ u(a)  for all a ∈ A,
∑_{a∈δ⁺(v)} x(a) − ∑_{a∈δ⁻(v)} x(a) = b(v)  for all v ∈ V,

such that x has minimum cost c(x) := ∑_{a∈A} c(a)·x(a).

Example: Minimum Cost Flow Problem (cont.)

Formulation as a linear program (LP):

minimize    ∑_{a∈A} c(a)·x(a)                                          (0.1)
subject to  ∑_{a∈δ⁺(v)} x(a) − ∑_{a∈δ⁻(v)} x(a) = b(v)  for all v ∈ V,  (0.2)
            x(a) ≤ u(a)  for all a ∈ A,                                 (0.3)
            x(a) ≥ 0  for all a ∈ A.                                    (0.4)

- Objective function given by (0.1). Set of feasible solutions:

  X = {x ∈ R^A | x satisfies (0.2), (0.3), and (0.4)}.

- Notice that (0.1) is a linear function of x and (0.2) – (0.4) are linear equations and linear inequalities, respectively. −→ linear program
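To make the LP concrete, here is a small sketch (an addition to the slides, using SciPy as an assumed tool) that feeds the flow conservation constraints (0.2) and the bounds (0.3)–(0.4) of a made-up three-node instance to an LP solver.

```python
from scipy.optimize import linprog

# Tiny instance: V = {s, v, t}; arcs 0: s->v, 1: s->t, 2: v->t.
c = [1, 3, 1]                          # arc costs c(a)
A_eq = [[ 1,  1,  0],                  # node s: outflow - inflow = b(s) = 2
        [-1,  0,  1],                  # node v: b(v) = 0
        [ 0, -1, -1]]                  # node t: b(t) = -2 (demand)
b_eq = [2, 0, -2]
bounds = [(0, 2), (0, 1), (0, 2)]      # capacities: 0 <= x(a) <= u(a)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
print(res.x, res.fun)                  # [2. 0. 2.] 4.0: route everything via v
```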

Example (cont.): Adding Fixed Cost

Fixed costs w: A → R≥0.

If arc a ∈ A shall be used (i.e., x(a) > 0), it must be bought at cost w(a).

Add variables y(a) ∈ {0, 1} with y(a) = 1 if arc a is used, 0 otherwise.

This leads to the following mixed-integer linear program (MIP):

minimize    ∑_{a∈A} c(a)·x(a) + ∑_{a∈A} w(a)·y(a)
subject to  ∑_{a∈δ⁺(v)} x(a) − ∑_{a∈δ⁻(v)} x(a) = b(v)  for all v ∈ V,
            x(a) ≤ u(a)·y(a)  for all a ∈ A,
            x(a) ≥ 0  for all a ∈ A,
            y(a) ∈ {0, 1}  for all a ∈ A.

MIP: Linear program where some variables may only take integer values.

Example: Maximum Weighted Matching Problem

Given: undirected graph G = (V, E), weight function w: E → R.

Task: find a matching M ⊆ E with maximum total weight.

(M ⊆ E is a matching if every node is incident to at most one edge in M.)

Formulation as an integer linear program (IP):

Variables: x_e ∈ {0, 1} for e ∈ E with x_e = 1 if and only if e ∈ M.

maximize    ∑_{e∈E} w(e)·x_e
subject to  ∑_{e∈δ(v)} x_e ≤ 1  for all v ∈ V,
            x_e ∈ {0, 1}  for all e ∈ E.

IP: Linear program where all variables may only take integer values.

Example: Traveling Salesperson Problem (TSP)

Given: complete graph K_n on n nodes, weight function w: E(K_n) → R.

Task: find a Hamiltonian cycle with minimum total weight.

(A Hamiltonian cycle visits every node exactly once.)

Formulation as an integer linear program? (later!)

Example: Minimum Node Coloring Problem

Given: undirected graph G = (V, E).

Task: color the nodes of G such that adjacent nodes get different colors. Use a minimum number of colors.

Definition 0.4.
A graph G = (V, E) whose nodes can be colored with two colors is called bipartite.

Example: Weighted Vertex Cover Problem

Given: undirected graph G = (V, E), weight function w: V → R≥0.

Task: find U ⊆ V of minimum total weight such that every edge e ∈ E has at least one endpoint in U.

Formulation as an integer linear program (IP):

Variables: x_v ∈ {0, 1} for v ∈ V with x_v = 1 if and only if v ∈ U.

minimize    ∑_{v∈V} w(v)·x_v
subject to  x_v + x_{v′} ≥ 1  for all e = {v, v′} ∈ E,
            x_v ∈ {0, 1}  for all v ∈ V.
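Since X is finite, the IP can in principle be solved by enumeration. The following brute-force sketch (an addition to the slides, only sensible for tiny graphs) directly mirrors the formulation above.

```python
from itertools import product

def min_weight_vertex_cover(nodes, edges, w):
    """Brute-force the vertex cover IP over all 0/1 assignments x_v."""
    idx = {v: i for i, v in enumerate(nodes)}
    best, best_x = float("inf"), None
    for x in product((0, 1), repeat=len(nodes)):
        # covering constraints: x_v + x_v' >= 1 for every edge {v, v'}
        if all(x[idx[u]] + x[idx[v]] >= 1 for u, v in edges):
            cost = sum(w[v] * x[idx[v]] for v in nodes)   # objective
            if cost < best:
                best, best_x = cost, x
    return best, best_x

print(min_weight_vertex_cover(["a", "b", "c"],
                              [("a", "b"), ("b", "c")],
                              {"a": 2, "b": 1, "c": 2}))  # (1, (0, 1, 0))
```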

Typical Questions

For a given optimization problem:

- How to find an optimal solution?
- How to find a feasible solution?
- Does there exist an optimal/feasible solution?
- How to prove that a computed solution is optimal?
- How difficult is the problem?
- Does there exist an efficient algorithm with “small” worst-case running time?
- How to formulate the problem as a (mixed integer) linear program?
- Is there a useful special structure of the problem?

Preliminary Outline of the Course

- linear programming and the simplex algorithm
- geometric interpretation of the simplex algorithm
- LP duality, complementary slackness
- efficient algorithms for minimum spanning trees, shortest paths
- efficient algorithms for maximum flows and minimum cost flows
- complexity theory

Literature on Linear Optimization (not complete)

- D. Bertsimas, J. N. Tsitsiklis, Introduction to Linear Optimization, Athena, 1997.
- V. Chvátal, Linear Programming, Freeman, 1983.
- G. B. Dantzig, Linear Programming and Extensions, Princeton University Press, 1998 (1963).
- M. Grötschel, L. Lovász, A. Schrijver, Geometric Algorithms and Combinatorial Optimization, Springer, 1988.
- J. Matoušek, B. Gärtner, Understanding and Using Linear Programming, Springer, 2006.
- M. Padberg, Linear Optimization and Extensions, Springer, 1995.
- A. Schrijver, Theory of Linear and Integer Programming, Wiley, 1986.
- R. J. Vanderbei, Linear Programming, Springer, 2001.

Literature on Combinatorial Optimization (not complete)

- R. K. Ahuja, T. L. Magnanti, J. B. Orlin, Network Flows: Theory, Algorithms, and Applications, Prentice-Hall, 1993.
- W. J. Cook, W. H. Cunningham, W. R. Pulleyblank, A. Schrijver, Combinatorial Optimization, Wiley, 1998.
- L. R. Ford, D. R. Fulkerson, Flows in Networks, Princeton University Press, 1962.
- M. R. Garey, D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, 1979.
- B. Korte, J. Vygen, Combinatorial Optimization: Theory and Algorithms, Springer, 2002.
- C. H. Papadimitriou, K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity, Dover Publications, reprint 1998.
- A. Schrijver, Combinatorial Optimization: Polyhedra and Efficiency, Springer, 2003.

Chapter 1: Linear Programming Basics

(cp. Bertsimas & Tsitsiklis, Chapter 1)

Example of a Linear Program

minimize    2x1 − x2 + 4x3
subject to  x1 + x2 + x4 ≤ 2
            3x2 − x3 = 5
            x3 + x4 ≥ 3
            x1 ≥ 0
            x3 ≤ 0

Remarks.

- objective function is linear in the vector of variables x = (x1, x2, x3, x4)^T
- constraints are linear inequalities and linear equations
- last two constraints are special (non-negativity and non-positivity constraint, respectively)

General Linear Program

minimize    c^T·x
subject to  a_i^T·x ≥ b_i  for i ∈ M1,   (1.1)
            a_i^T·x = b_i  for i ∈ M2,   (1.2)
            a_i^T·x ≤ b_i  for i ∈ M3,   (1.3)
            x_j ≥ 0  for j ∈ N1,          (1.4)
            x_j ≤ 0  for j ∈ N2,          (1.5)

with c ∈ R^n, a_i ∈ R^n and b_i ∈ R for i ∈ M1 ∪ M2 ∪ M3 (finite index sets), and N1, N2 ⊆ {1, . . . , n} given.

- x ∈ R^n satisfying constraints (1.1) – (1.5) is a feasible solution.
- a feasible solution x* is an optimal solution if
  c^T·x* ≤ c^T·x for all feasible solutions x.
- the linear program is unbounded if, for all k ∈ R, there is a feasible solution x ∈ R^n with c^T·x ≤ k.

Special Forms of Linear Programs

- maximizing c^T·x is equivalent to minimizing (−c)^T·x.
- any linear program can be written in the form

  minimize c^T·x subject to A·x ≥ b

  for some A ∈ R^{m×n} and b ∈ R^m:
  - rewrite a_i^T·x = b_i as: a_i^T·x ≥ b_i ∧ a_i^T·x ≤ b_i,
  - rewrite a_i^T·x ≤ b_i as: (−a_i)^T·x ≥ −b_i.
- Linear program in standard form:

  min  c^T·x
  s.t. A·x = b
       x ≥ 0

  with A ∈ R^{m×n}, b ∈ R^m, and c ∈ R^n.

Example: Diet Problem

Given:

- n different foods, m different nutrients
- a_ij := amount of nutrient i in one unit of food j
- b_i := requirement of nutrient i in some ideal diet
- c_j := cost of one unit of food j

Task: find a cheapest ideal diet consisting of foods 1, . . . , n.

LP formulation: Let x_j := number of units of food j in the diet:

  min  c^T·x            min  c^T·x
  s.t. A·x = b    or    s.t. A·x ≥ b
       x ≥ 0                 x ≥ 0

with A = (a_ij) ∈ R^{m×n}, b = (b_i) ∈ R^m, c = (c_j) ∈ R^n.

Reduction to Standard Form

Any linear program can be brought into standard form:

- elimination of free (unbounded) variables x_j:
  replace x_j with x_j⁺, x_j⁻ ≥ 0:  x_j = x_j⁺ − x_j⁻
- elimination of non-positive variables x_j:
  replace x_j ≤ 0 with (−x_j) ≥ 0.
- elimination of inequality constraint a_i^T·x ≤ b_i:
  introduce slack variable s ≥ 0 and rewrite:  a_i^T·x + s = b_i
- elimination of inequality constraint a_i^T·x ≥ b_i:
  introduce slack variable s ≥ 0 and rewrite:  a_i^T·x − s = b_i
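As a sanity check (an addition to the slides, using SciPy as an assumed tool), one can solve the example LP from the next slide both in its original form and in its standard form and compare the optimal costs.

```python
from scipy.optimize import linprog

# Original LP (example on the next slide):
#   min 2 x1 + 4 x2  s.t.  x1 + x2 >= 3,  3 x1 + 2 x2 = 14,  x1 >= 0, x2 free
orig = linprog([2, 4],
               A_ub=[[-1, -1]], b_ub=[-3],         # x1 + x2 >= 3 in <= form
               A_eq=[[3, 2]], b_eq=[14],
               bounds=[(0, None), (None, None)],    # x2 is a free variable
               method="highs")

# Standard form: variables (x1, x2+, x2-, x3) >= 0 with x2 = x2+ - x2-,
# and surplus variable x3 turning the >=-constraint into an equation.
std = linprog([2, 4, -4, 0],
              A_eq=[[1, 1, -1, -1],                 # x1 + x2+ - x2- - x3 = 3
                    [3, 2, -2, 0]],                 # 3 x1 + 2 x2+ - 2 x2- = 14
              b_eq=[3, 14],
              method="highs")                       # default bounds: x >= 0

print(orig.fun, std.fun)                            # both -4.0
```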

Example

The linear program

  min  2x1 + 4x2
  s.t. x1 + x2 ≥ 3
       3x1 + 2x2 = 14
       x1 ≥ 0

is equivalent to the standard form problem

  min  2x1 + 4x2⁺ − 4x2⁻
  s.t. x1 + x2⁺ − x2⁻ − x3 = 3
       3x1 + 2x2⁺ − 2x2⁻ = 14
       x1, x2⁺, x2⁻, x3 ≥ 0

Affine Linear and Convex Functions

Lemma 1.1.

(a) An affine linear function f: R^n → R given by f(x) = c^T·x + d with c ∈ R^n, d ∈ R, is both convex and concave.

(b) If f_1, . . . , f_k: R^n → R are convex functions, then f: R^n → R defined by f(x) := max_{i=1,...,k} f_i(x) is also convex.

Proof: . . .

Piecewise Linear Convex Objective Functions

Let c_1, . . . , c_k ∈ R^n and d_1, . . . , d_k ∈ R.

Consider the piecewise linear convex function x ↦ max_{i=1,...,k} (c_i^T·x + d_i):

  min  max_{i=1,...,k} (c_i^T·x + d_i)          min  z
  s.t. A·x ≥ b                    ←→            s.t. z ≥ c_i^T·x + d_i  for all i
                                                     A·x ≥ b

Example: let c_1, . . . , c_n ≥ 0

  min  ∑_{i=1}^n c_i·|x_i|       min  ∑_{i=1}^n c_i·z_i       min  ∑_{i=1}^n c_i·(x_i⁺ + x_i⁻)
  s.t. A·x ≥ b            ↔      s.t. z_i ≥ x_i        ↔      s.t. A·(x⁺ − x⁻) ≥ b
                                      z_i ≥ −x_i                   x⁺, x⁻ ≥ 0
                                      A·x ≥ b
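A quick numerical illustration of the x = x⁺ − x⁻ reformulation (an addition to the slides; the two-variable instance is made up):

```python
from scipy.optimize import linprog

# min |x1| + 2 |x2|  s.t.  x1 + x2 >= 4, via the split x = x+ - x-:
# variables (x1+, x2+, x1-, x2-) >= 0, objective sum of c_i (x_i+ + x_i-).
c = [1, 2, 1, 2]
A_ub = [[-1, -1, 1, 1]]                   # -(x1+ - x1-) - (x2+ - x2-) <= -4
b_ub = [-4]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")
x = res.x[:2] - res.x[2:]                 # recover x from the split
print(x, res.fun)                         # [4. 0.] 4.0
```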

Graphical Representation and Solution

2D example:

  min  −x1 − x2
  s.t. x1 + 2x2 ≤ 3
       2x1 + x2 ≤ 3
       x1, x2 ≥ 0
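Instead of solving graphically, the same 2D example can be handed to an LP solver (an addition to the slides, with SciPy as an assumed tool); the optimum is attained at the vertex (1, 1) where both inequality constraints are tight.

```python
from scipy.optimize import linprog

res = linprog(c=[-1, -1],
              A_ub=[[1, 2], [2, 1]], b_ub=[3, 3],
              bounds=[(0, None), (0, None)],
              method="highs")
print(res.x, res.fun)   # [1. 1.] -2.0
```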

Graphical Representation and Solution (cont.)

3D example:

  min  −x1 − x2 − x3
  s.t. x1 ≤ 1
       x2 ≤ 1
       x3 ≤ 1
       x1, x2, x3 ≥ 0

Graphical Representation and Solution (cont.)

another 2D example:

  min  c1·x1 + c2·x2
  s.t. −x1 + x2 ≤ 1
       x1, x2 ≥ 0

- for c = (1, 1)^T, the unique optimal solution is x = (0, 0)^T
- for c = (1, 0)^T, the optimal solutions are exactly the points x = (0, x2)^T with 0 ≤ x2 ≤ 1
- for c = (0, 1)^T, the optimal solutions are exactly the points x = (x1, 0)^T with x1 ≥ 0
- for c = (−1, −1)^T, the problem is unbounded, the optimal cost is −∞
- if we add the constraint x1 + x2 ≤ −1, the problem is infeasible

Properties of the Set of Optimal Solutions

In the last example, the following 5 cases occurred:

(i) there is a unique optimal solution
(ii) there exist infinitely many optimal solutions, but the set of optimal solutions is bounded
(iii) there exist infinitely many optimal solutions and the set of optimal solutions is unbounded
(iv) the problem is unbounded, i.e., the optimal cost is −∞ and no feasible solution is optimal
(v) the problem is infeasible, i.e., the set of feasible solutions is empty

These are indeed all cases that can occur in general (see later).

Visualizing LPs in Standard Form

Example:
Let A = (1, 1, 1) ∈ R^{1×3}, b = (1) ∈ R^1 and consider the set of feasible solutions

P = {x ∈ R^3 | A·x = b, x ≥ 0}.

More generally:

- if A ∈ R^{m×n} with m ≤ n and the rows of A are linearly independent, then
  {x ∈ R^n | A·x = b}
  is an (n − m)-dimensional hyperplane in R^n.
- the set of feasible solutions lies in this hyperplane and is only constrained by the non-negativity constraints x ≥ 0.

Chapter 2: The Geometry of Linear Programming

(cp. Bertsimas & Tsitsiklis, Chapter 2)

Polyhedra and Polytopes

Definition 2.1.
Let A ∈ R^{m×n} and b ∈ R^m.

(a) The set {x ∈ R^n | A·x ≥ b} is called a polyhedron.
(b) {x | A·x = b, x ≥ 0} is a polyhedron in standard form representation.

Definition 2.2.

(a) A set S ⊆ R^n is bounded if there is K ∈ R such that
    ‖x‖∞ ≤ K for all x ∈ S.
(b) A bounded polyhedron is called a polytope.

Hyperplanes and Halfspaces

Definition 2.3.
Let a ∈ R^n \ {0} and b ∈ R:

(a) the set {x ∈ R^n | a^T·x = b} is called a hyperplane
(b) the set {x ∈ R^n | a^T·x ≥ b} is called a halfspace

Remarks

- Hyperplanes and halfspaces are convex sets.
- A polyhedron is an intersection of finitely many halfspaces.

Convex Combination and Convex Hull

Definition 2.4.
Let x^1, . . . , x^k ∈ R^n and λ_1, . . . , λ_k ∈ R≥0 with λ_1 + · · · + λ_k = 1.

(a) The vector ∑_{i=1}^k λ_i·x^i is a convex combination of x^1, . . . , x^k.
(b) The convex hull of x^1, . . . , x^k is the set of all convex combinations.

Convex Sets, Convex Combinations, and Convex Hulls

Theorem 2.5.

(a) The intersection of convex sets is convex.
(b) Every polyhedron is a convex set.
(c) A convex combination of a finite number of elements of a convex set also belongs to that set.
(d) The convex hull of finitely many vectors is a convex set.

Proof: . . .

Corollary 2.6.
The convex hull of x^1, . . . , x^k ∈ R^n is the smallest (w.r.t. inclusion) convex subset of R^n containing x^1, . . . , x^k.

Proof: . . .

Extreme Points and Vertices of Polyhedra

Definition 2.7.
Let P ⊆ R^n be a polyhedron.

(a) x ∈ P is an extreme point of P if
    x ≠ λ·y + (1−λ)·z  for all y, z ∈ P \ {x}, 0 ≤ λ ≤ 1,
    i.e., x is not a convex combination of two other points in P.

(b) x ∈ P is a vertex of P if there is some c ∈ R^n such that
    c^T·x < c^T·y  for all y ∈ P \ {x},
    i.e., x is the unique optimal solution to the LP min{c^T·z | z ∈ P}.

Active and Binding Constraints

In the following, let P ⊆ R^n be a polyhedron defined by

a_i^T·x ≥ b_i  for i ∈ M1,
a_i^T·x = b_i  for i ∈ M2,

with a_i ∈ R^n and b_i ∈ R, for all i.

Definition 2.8.
If x* ∈ R^n satisfies a_i^T·x* = b_i for some i, then the corresponding constraint is active (or binding) at x*.

Basic Facts from Linear Algebra

Theorem 2.9.
Let x* ∈ R^n and I = {i | a_i^T·x* = b_i}. The following are equivalent:

(i) there are n vectors in {a_i | i ∈ I} which are linearly independent;
(ii) the vectors in {a_i | i ∈ I} span R^n;
(iii) x* is the unique solution to the system of equations a_i^T·x = b_i, i ∈ I.

Vertices, Extreme Points, and Basic Feasible Solutions

Definition 2.10.

(a) x* ∈ R^n is a basic solution of P if
    - all equality constraints are active and
    - there are n linearly independent constraints that are active.
(b) A basic solution satisfying all constraints is a basic feasible solution.

Theorem 2.11.
For x* ∈ P, the following are equivalent:

(i) x* is a vertex of P;
(ii) x* is an extreme point of P;
(iii) x* is a basic feasible solution of P.

Proof: . . .

Number of Vertices

Corollary 2.12.

(a) A polyhedron has a finite number of vertices and basic solutions.
(b) For a polyhedron in R^n given by linear equations and m linear inequalities, this number is at most (m choose n).

Example:

P := {x ∈ R^n | 0 ≤ x_i ≤ 1, i = 1, . . . , n}  (n-dimensional unit cube)

- number of constraints: m = 2n
- number of vertices: 2^n

Adjacent Basic Solutions and Edges

Definition 2.13.
Let P ⊆ R^n be a polyhedron.

(a) Two distinct basic solutions are adjacent if there are n − 1 linearly independent constraints that are active at both of them.
(b) If both solutions are feasible, the line segment that joins them is an edge of P.

Polyhedra in Standard Form

Let A ∈ R^{m×n}, b ∈ R^m, and P = {x ∈ R^n | A·x = b, x ≥ 0}.

Observation
One can assume without loss of generality that rank(A) = m.

Theorem 2.14.
x ∈ R^n is a basic solution of P if and only if A·x = b and there are indices B(1), . . . , B(m) ∈ {1, . . . , n} such that

- the columns A_B(1), . . . , A_B(m) of matrix A are linearly independent and
- x_i = 0 for all i ∉ {B(1), . . . , B(m)}.

Proof: . . .

- x_B(1), . . . , x_B(m) are basic variables, the remaining variables are non-basic.
- The vector of basic variables is denoted by x_B := (x_B(1), . . . , x_B(m))^T.
- A_B(1), . . . , A_B(m) are basic columns of A and form a basis of R^m.
- The matrix B := (A_B(1), . . . , A_B(m)) ∈ R^{m×m} is called the basis matrix.

Basic Columns and Basic Solutions

Observation 2.15.
Let x ∈ R^n be a basic solution; then:

- B·x_B = b and thus x_B = B⁻¹·b;
- x is a basic feasible solution if and only if x_B = B⁻¹·b ≥ 0.

Example: m = 2

[Figure: vectors A_1, A_2, A_3, A_4 = −A_1 and the right hand side b in R^2.]

- A_1, A_3 or A_2, A_3 form bases with corresponding basic feasible solutions.
- A_1, A_4 do not form a basis.
- A_1, A_2 and A_2, A_4 and A_3, A_4 form bases with infeasible basic solutions.
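Theorem 2.14 and Observation 2.15 suggest a (highly inefficient, but instructive) way to list all basic solutions: try every choice of m columns. A small numpy sketch (an addition to the slides), run on the example A = (1, 1, 1), b = (1) from the earlier slide on visualizing LPs in standard form:

```python
import itertools
import numpy as np

def basic_solutions(A, b):
    """Enumerate basic (and basic feasible) solutions of {x | Ax = b, x >= 0}
    by trying every choice of m linearly independent columns (Theorem 2.14)."""
    m, n = A.shape
    for cols in itertools.combinations(range(n), m):
        B = A[:, cols]
        if np.linalg.matrix_rank(B) < m:
            continue                           # columns do not form a basis
        x = np.zeros(n)
        x[list(cols)] = np.linalg.solve(B, b)  # x_B = B^{-1} b, rest zero
        yield cols, x, bool(np.all(x >= -1e-9))

A = np.array([[1.0, 1.0, 1.0]])                # example: x1 + x2 + x3 = 1
b = np.array([1.0])
for cols, x, feasible in basic_solutions(A, b):
    print(cols, x, "feasible" if feasible else "infeasible")
```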

Bases and Basic Solutions

Corollary 2.16.

- Every basis A_B(1), . . . , A_B(m) determines a unique basic solution.
- Thus, different basic solutions correspond to different bases.
- But: two different bases might yield the same basic solution.

Example: If b = 0, then x = 0 is the only basic solution.

Adjacent Bases

Definition 2.17.
Two bases A_B(1), . . . , A_B(m) and A_B′(1), . . . , A_B′(m) are adjacent if they share all but one column.

Observation 2.18.

(a) Two adjacent basic solutions can always be obtained from two adjacent bases.
(b) If two adjacent bases lead to distinct basic solutions, then the latter are adjacent.

Degeneracy

Definition 2.19.
A basic solution x of a polyhedron P is degenerate if more than n constraints are active at x.

Observation 2.20.
Let P = {x ∈ R^n | A·x = b, x ≥ 0} be a polyhedron in standard form with A ∈ R^{m×n} and b ∈ R^m.

(a) A basic solution x ∈ P is degenerate if and only if more than n − m components of x are zero.
(b) For a non-degenerate basic solution x ∈ P, there is a unique basis.

Three Different Reasons for Degeneracy

(i) redundant variables
    Example:  x1 + x2 = 1
              x3 = 0
              x1, x2, x3 ≥ 0
    ←→  A = ( 1 1 0 )
            ( 0 0 1 )

(ii) redundant constraints
    Example:  x1 + 2x2 ≤ 3
              2x1 + x2 ≤ 3
              x1 + x2 ≤ 2
              x1, x2 ≥ 0

(iii) geometric reasons
    Example: Octahedron

Observation 2.21.
Perturbing the right hand side vector b may remove degeneracy.

Existence of Extreme Points

Definition 2.22.
A polyhedron P ⊆ R^n contains a line if there is x ∈ P and a direction d ∈ R^n \ {0} such that

x + λ·d ∈ P  for all λ ∈ R.

Theorem 2.23.
Let P = {x ∈ R^n | A·x ≥ b} ≠ ∅ with A ∈ R^{m×n} and b ∈ R^m. The following are equivalent:

(i) There exists an extreme point x ∈ P.
(ii) P does not contain a line.
(iii) A contains n linearly independent rows.

Proof: . . .

Existence of Extreme Points (cont.)

Corollary 2.24.

(a) A non-empty polytope contains an extreme point.
(b) A non-empty polyhedron in standard form contains an extreme point.

Proof of (b):

  A·x = b            (  A )       (  b )
           ←→        ( −A ) · x ≥ ( −b )
  x ≥ 0              (  I )       (  0 )

Example:

P = { (x1, x2, x3) ∈ R^3 | x1 + x2 ≥ 1, x1 + 2x2 ≥ 0 }

contains a line since

(1, 1, 0)^T + λ·(0, 0, 1)^T ∈ P  for all λ ∈ R.

Optimality of Extreme Points

Theorem 2.25.
Let P ⊆ R^n be a polyhedron and c ∈ R^n. If P has an extreme point and min{c^T·x | x ∈ P} is bounded, there is an extreme point that is optimal.

Proof: . . .

Corollary 2.26.
Every linear programming problem is either infeasible or unbounded or there exists an optimal solution.

Proof: Every linear program is equivalent to an LP in standard form. The claim thus follows from Corollary 2.24 and Theorem 2.25.

Chapter 3: The Simplex Method

(cp. Bertsimas & Tsitsiklis, Chapter 3)

Linear Program in Standard Form

Throughout this chapter, we consider the following standard form problem:

  minimize    c^T·x
  subject to  A·x = b
              x ≥ 0

with A ∈ R^{m×n}, rank(A) = m, b ∈ R^m, and c ∈ R^n.

Basic Directions

Observation 3.1.
Let B = (A_B(1), . . . , A_B(m)) be a basis matrix. The values of the basic variables x_B(1), . . . , x_B(m) in the system A·x = b are uniquely determined by the values of the non-basic variables.

Proof:
A·x = b  ⇐⇒  B·x_B + ∑_{j≠B(1),...,B(m)} A_j·x_j = b
         ⇐⇒  x_B = B⁻¹·b − ∑_{j≠B(1),...,B(m)} B⁻¹·A_j·x_j

Definition 3.2.
For fixed j ≠ B(1), . . . , B(m), let d ∈ R^n be given by

d_j := 1,  d_B := −B⁻¹·A_j,  and  d_j′ := 0 for j′ ≠ j, B(1), . . . , B(m).

Then A·(x + θ·d) = b for all θ ∈ R, and d is the jth basic direction.

Feasible Directions

Definition 3.3.
Let P ⊆ R^n be a polyhedron. For x ∈ P, the vector d ∈ R^n \ {0} is a feasible direction at x if there is a θ > 0 with x + θ·d ∈ P.

Example: some feasible directions at several points of a polyhedron. [Figure]

Feasible Directions (cont.)

Consider a basic feasible solution x.

Question: Is the jth basic direction d a feasible direction?

Case 1: If x is a non-degenerate feasible solution, then x_B > 0 and x + θ·d ≥ 0 for θ > 0 small enough. −→ answer is yes!

Case 2: If x is degenerate, the answer might be no! E.g., if x_B(i) = 0 and d_B(i) < 0, then x + θ·d ≱ 0 for all θ > 0.

Example: n = 5, m = 3, n − m = 2

[Figure: the feasible set drawn in the plane of the non-basic coordinates, bounded by the lines x1 = 0, . . . , x5 = 0, with two marked points y and z.]

- 1st basic direction at y (basic variables x2, x4, x5)
- 3rd basic direction at z (basic variables x1, x2, x4)

Reduced Cost Coefficients

Consider a basic solution x.

Question: How does the cost change when moving along the jth basic direction d?

c^T·(x + θ·d) = c^T·x + θ·c^T·d = c^T·x + θ·(c_j − c_B^T·B⁻¹·A_j) =: c^T·x + θ·c̄_j

Definition 3.4.
For a given basic solution x, the reduced cost of variable x_j, j = 1, . . . , n, is

c̄_j := c_j − c_B^T·B⁻¹·A_j.

Observation 3.5.
The reduced cost of a basic variable x_B(i) is zero.

Proof: c̄_B(i) = c_B(i) − c_B^T·B⁻¹·A_B(i) = c_B(i) − c_B^T·e_i = c_B(i) − c_B(i) = 0  (since B⁻¹·A_B(i) = e_i).

Optimality Criterion

Theorem 3.6.
Let x be a basic feasible solution and c̄ the vector of reduced costs.

(a) If c̄ ≥ 0, then x is an optimal solution.
(b) If x is an optimal solution and non-degenerate, then c̄ ≥ 0.

Proof: . . .

Definition 3.7.
A basis matrix B is optimal if

(i) B⁻¹·b ≥ 0 and
(ii) c̄^T = c^T − c_B^T·B⁻¹·A ≥ 0.

Observation 3.8.
If B is an optimal basis, the associated basic solution x is feasible and optimal.

Development of the Simplex Method

Assumption (for now): only non-degenerate basic feasible solutions.

Let x be a basic feasible solution with c̄_j < 0 for some j ≠ B(1), . . . , B(m). Let d be the jth basic direction:

0 > c̄_j = c^T·d

It is desirable to go to y := x + θ*·d with θ* := max{θ | x + θ·d ∈ P}.

Question: How to determine θ*?

By construction of d, it holds that A·(x + θ·d) = b for all θ ∈ R, i.e.,

x + θ·d ∈ P  ⇐⇒  x + θ·d ≥ 0.

Case 1: d ≥ 0  ⇒  x + θ·d ≥ 0 for all θ ≥ 0  ⇒  θ* = ∞.
Thus, the LP is unbounded.

Case 2: d_k < 0 for some k  ⇒  ( x_k + θ·d_k ≥ 0  ⇐⇒  θ ≤ −x_k/d_k )

Thus, θ* = min_{k: d_k<0} (−x_k/d_k) = min_{i=1,...,m: d_B(i)<0} (−x_B(i)/d_B(i)) > 0.

Development of the Simplex Method (cont.)

Assumption (for now): only non-degenerate basic feasible solutions.

Let x be a basic feasible solution with c̄_j < 0 for some j ≠ B(1), . . . , B(m). Let d be the jth basic direction:

0 > c̄_j = c^T·d

It is desirable to go to y := x + θ*·d with θ* := max{θ | x + θ·d ∈ P}.

θ* = min_{k: d_k<0} (−x_k/d_k) = min_{i=1,...,m: d_B(i)<0} (−x_B(i)/d_B(i))

Let ℓ ∈ {1, . . . , m} with θ* = −x_B(ℓ)/d_B(ℓ); then y_j = θ* and y_B(ℓ) = 0.

⇒ x_j replaces x_B(ℓ) as a basic variable and we get a new basis matrix

B̄ = ( A_B(1), . . . , A_B(ℓ−1), A_j, A_B(ℓ+1), . . . , A_B(m) ) = ( A_B̄(1), . . . , A_B̄(m) )

with B̄(i) = B(i) if i ≠ ℓ, and B̄(i) = j if i = ℓ.

Core of the Simplex Method

Theorem 3.9.
Let x be a non-degenerate basic feasible solution, j ≠ B(1), . . . , B(m) with c̄_j < 0, d the jth basic direction, and θ* := max{θ | x + θ·d ∈ P} < ∞.

(a) θ* = min_{i=1,...,m: d_B(i)<0} (−x_B(i)/d_B(i)) = −x_B(ℓ)/d_B(ℓ) for some ℓ ∈ {1, . . . , m}.
    Let B̄(i) := B(i) for i ≠ ℓ and B̄(ℓ) := j.
(b) A_B̄(1), . . . , A_B̄(m) are linearly independent and B̄ is a basis matrix.
(c) y := x + θ*·d is a basic feasible solution associated with B̄ and c^T·y < c^T·x.

Proof: . . .

An Iteration of the Simplex Method

Given: basis B = (A_B(1), . . . , A_B(m)), corresponding basic feasible solution x.

1. Let c̄^T := c^T − c_B^T·B⁻¹·A. If c̄ ≥ 0, then STOP; else choose j with c̄_j < 0.
2. Let u := B⁻¹·A_j. If u ≤ 0, then STOP (optimal cost is −∞).
3. Let θ* := min_{i: u_i>0} x_B(i)/u_i = x_B(ℓ)/u_ℓ for some ℓ ∈ {1, . . . , m}.
4. Form the new basis by replacing A_B(ℓ) with A_j; the corresponding basic feasible solution y is given by
   y_j := θ*  and  y_B(i) := x_B(i) − θ*·u_i for i ≠ ℓ.

Remark: We say that the nonbasic variable x_j enters the basis and the basic variable x_B(ℓ) leaves the basis.
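The four steps translate almost literally into code. Below is a naive sketch (an addition to the slides) that recomputes B⁻¹ in every iteration and uses the smallest subscript rule in step 1; by Theorem 3.10 below, termination is guaranteed only for non-degenerate problems.

```python
import numpy as np

def simplex(A, b, c, basis):
    """Naive simplex for min c^T x, Ax = b, x >= 0, started from a feasible
    basis (list of m column indices). Termination is guaranteed only in the
    non-degenerate case (Theorem 3.10)."""
    m, n = A.shape
    while True:
        B_inv = np.linalg.inv(A[:, basis])
        x_B = B_inv @ b
        c_bar = c - c[basis] @ B_inv @ A          # reduced costs (step 1)
        j = next((k for k in range(n) if c_bar[k] < -1e-9), None)
        if j is None:                             # c_bar >= 0: optimal basis
            x = np.zeros(n); x[basis] = x_B
            return x, c @ x
        u = B_inv @ A[:, j]                       # step 2
        if np.all(u <= 1e-9):
            raise ValueError("LP is unbounded (optimal cost -inf)")
        ratios = [x_B[i] / u[i] if u[i] > 1e-9 else np.inf for i in range(m)]
        ell = int(np.argmin(ratios))              # step 3: theta* = min ratio
        basis[ell] = j                            # step 4: x_j enters, x_B(ell) leaves

# Example LP from the full tableau section below, slack basis {x4, x5, x6}:
A = np.array([[1., 2, 2, 1, 0, 0], [2, 1, 2, 0, 1, 0], [2, 2, 1, 0, 0, 1]])
b = np.array([20., 20, 20])
c = np.array([-10., -12, -12, 0, 0, 0])
print(simplex(A, b, c, [3, 4, 5]))                # x = (4,4,4,0,0,0), cost -136
```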

Correctness of the Simplex Method

Theorem 3.10.
If every basic feasible solution is non-degenerate, the simplex method terminates after finitely many iterations in one of the following two states:

(i) we have an optimal basis B and an associated basic feasible solution x which is optimal;
(ii) we have a vector d satisfying A·d = 0, d ≥ 0, and c^T·d < 0; the optimal cost is −∞.

Proof sketch: The simplex method makes progress in every iteration. Since there are only finitely many different basic feasible solutions, it stops after a finite number of iterations.

Simplex Method for Degenerate Problems

- An iteration of the simplex method can also be applied if x is a degenerate basic feasible solution.
- In this case it might happen that θ* := min_{i: u_i>0} x_B(i)/u_i = x_B(ℓ)/u_ℓ = 0 if some basic variable x_B(ℓ) is zero and d_B(ℓ) < 0.
- Thus, y = x + θ*·d = x and the current basic feasible solution does not change.
- But replacing A_B(ℓ) with A_j still yields a new basis with associated basic feasible solution y = x.

Remark: Even if θ* is positive, more than one of the original basic variables may become zero at the new point x + θ*·d. Since only one of them leaves the basis, the new basic feasible solution y is degenerate.

Example

[Figure: a degenerate example in the plane of the non-basic coordinates, with constraints x1 = 0, . . . , x6 = 0, cost vector c, and basic feasible solutions x and y.]

Pivot Selection

Question: How to choose j with c̄_j < 0 and ℓ with x_B(ℓ)/u_ℓ = min_{i: u_i>0} x_B(i)/u_i if several possible choices exist?

Attention: The choice of j is critical for the overall behavior of the simplex method.

Three popular choices are:

- smallest subscript rule: choose the smallest j with c̄_j < 0.
  (very simple; no need to compute the entire vector c̄; usually leads to many iterations)
- steepest descent rule: choose j such that c̄_j < 0 is minimal.
  (relatively simple; commonly used for mid-size problems; does not necessarily yield the best neighboring solution)
- best improvement rule: choose j such that θ*·c̄_j is minimal.
  (computationally expensive; used for large problems; usually leads to very few iterations)

Revised Simplex Method

Observation 3.11.
To execute one iteration of the simplex method efficiently, it suffices to know B(1), . . . , B(m), the inverse B⁻¹ of the basis matrix, and the input data A, b, and c. It is then easy to compute:

x_B = B⁻¹·b        c̄^T = c^T − c_B^T·B⁻¹·A
u = B⁻¹·A_j        θ* = min_{i: u_i>0} x_B(i)/u_i = x_B(ℓ)/u_ℓ

The new basis matrix is then

B̄ = ( A_B(1), . . . , A_B(ℓ−1), A_j, A_B(ℓ+1), . . . , A_B(m) )

Critical question: How to obtain B̄⁻¹ efficiently?

Computing the Inverse of the Basis Matrix

- Notice that B⁻¹·B̄ = (e_1, . . . , e_{ℓ−1}, u, e_{ℓ+1}, . . . , e_m).
- Thus, B̄⁻¹ can be obtained from B⁻¹ as follows:
  - multiply the ℓth row of B⁻¹ with 1/u_ℓ;
  - for i ≠ ℓ, subtract u_i times the resulting ℓth row from the ith row.
- These are exactly the elementary row operations needed to turn B⁻¹·B̄ into the identity matrix!
- Elementary row operations are the same as multiplying the matrix with corresponding elementary matrices from the left hand side.
- Equivalently:

Obtaining B̄⁻¹ from B⁻¹
Apply elementary row operations to the matrix (B⁻¹ | u) to make the last column equal to the unit vector e_ℓ. The first m columns of the resulting matrix form the inverse B̄⁻¹ of the new basis matrix B̄.
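A direct numpy transcription of this update rule (an addition to the slides), with a random sanity check against an explicitly computed inverse:

```python
import numpy as np

def update_inverse(B_inv, u, ell):
    """Obtain the inverse of the new basis matrix from B_inv by the
    elementary row operations described above: make column u equal e_ell."""
    T = np.hstack([B_inv, u.reshape(-1, 1)])   # the matrix (B^{-1} | u)
    T[ell] /= T[ell, -1]                       # scale pivot row by 1/u_ell
    for i in range(T.shape[0]):
        if i != ell:
            T[i] -= T[i, -1] * T[ell]          # eliminate u_i
    return T[:, :-1]                           # first m columns

# Sanity check on random data: replace column ell of B by a new column a_j.
rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3)); a_j = rng.standard_normal(3); ell = 1
B_inv = np.linalg.inv(B)
u = B_inv @ a_j
B_new = B.copy(); B_new[:, ell] = a_j
print(np.allclose(update_inverse(B_inv, u, ell), np.linalg.inv(B_new)))  # True
```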

An Iteration of the Revised Simplex Method

Given: B = (A_B(1), . . . , A_B(m)), corresponding basic feasible solution x, and B⁻¹.

1. Let p^T := c_B^T·B⁻¹ and c̄_j := c_j − p^T·A_j for j ≠ B(1), . . . , B(m);
   if c̄ ≥ 0, then STOP; else choose j with c̄_j < 0.
2. Let u := B⁻¹·A_j. If u ≤ 0, then STOP (optimal cost is −∞).
3. Let θ* := min_{i: u_i>0} x_B(i)/u_i = x_B(ℓ)/u_ℓ for some ℓ ∈ {1, . . . , m}.
4. Form the new basis by replacing A_B(ℓ) with A_j; the corresponding basic feasible solution y is given by
   y_j := θ*  and  y_B(i) := x_B(i) − θ*·u_i for i ≠ ℓ.
5. Apply elementary row operations to the matrix (B⁻¹ | u) to make the last column equal to the unit vector e_ℓ. The first m columns of the resulting matrix yield B̄⁻¹.

Full Tableau Implementation

Main idea
Instead of maintaining and updating the matrix B⁻¹, we maintain and update the m × (n + 1)-matrix

B⁻¹·(b | A) = (B⁻¹·b | B⁻¹·A)

which is called the simplex tableau.

- The zeroth column B⁻¹·b contains x_B.
- For i = 1, . . . , n, the ith column of the tableau is B⁻¹·A_i.
- The column u = B⁻¹·A_j corresponding to the variable x_j that is about to enter the basis is the pivot column.
- If the ℓth basic variable x_B(ℓ) exits the basis, the ℓth row of the tableau is the pivot row.
- The element u_ℓ > 0 is the pivot element.

Full Tableau Implementation (cont.)

Notice: The simplex tableau B⁻¹·(b | A) represents the linear equation

B⁻¹·b = B⁻¹·A·x

which is equivalent to A·x = b.

Updating the simplex tableau

- At the end of an iteration, the simplex tableau B⁻¹·(b | A) has to be updated to B̄⁻¹·(b | A).
- B̄⁻¹ can be obtained from B⁻¹ by elementary row operations, i.e., B̄⁻¹ = Q·B⁻¹ where Q is a product of elementary matrices.
- Thus, B̄⁻¹·(b | A) = Q·B⁻¹·(b | A) and the new tableau B̄⁻¹·(b | A) can be obtained by applying the same elementary row operations.

Zeroth Row of the Simplex Tableau

In order to keep track of the objective function value and the reduced costs, we consider the following augmented simplex tableau:

  −c_B^T·B⁻¹·b | c^T − c_B^T·B⁻¹·A
  -------------+------------------
     B⁻¹·b     |      B⁻¹·A

or in more detail

  −c_B^T·x_B | c̄_1  · · ·  c̄_n
  -----------+-----------------------
   x_B(1)    |   |             |
    ...      | B⁻¹·A_1 · · · B⁻¹·A_n
   x_B(m)    |   |             |

Update after one iteration
The zeroth row is updated by adding a multiple of the pivot row to the zeroth row to set the reduced cost of the entering variable to zero.

An Iteration of the Full Tableau Implementation

Given: simplex tableau corresponding to a feasible basis B = (A_B(1), . . . , A_B(m)).

1. If c̄ ≥ 0 (zeroth row), then STOP; else choose pivot column j with c̄_j < 0.
2. If u = B⁻¹·A_j ≤ 0 (jth column), STOP (optimal cost is −∞).
3. Let θ* := min_{i: u_i>0} x_B(i)/u_i = x_B(ℓ)/u_ℓ for some ℓ ∈ {1, . . . , m} (cp. columns 0 and j).
4. Form the new basis by replacing A_B(ℓ) with A_j.
5. Apply elementary row operations to the simplex tableau so that u_ℓ (pivot element) becomes one and all other entries of the pivot column become zero.
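Step 5 is a single Gaussian-elimination pass over the augmented tableau. A short sketch (an addition to the slides), applied to the first pivot of the example that follows:

```python
import numpy as np

def pivot(tableau, ell, j):
    """One elementary-row-operation pivot on the augmented simplex tableau
    (row 0 = zeroth row, column 0 = zeroth column)."""
    T = tableau
    T[ell] /= T[ell, j]                       # make the pivot element 1
    for i in range(T.shape[0]):
        if i != ell:
            T[i] -= T[i, j] * T[ell]          # zero out rest of pivot column
    return T

# Augmented tableau of the example below: rows 1..3 are x4, x5, x6.
T = np.array([[  0., -10, -12, -12, 0, 0, 0],
              [ 20.,   1,   2,   2, 1, 0, 0],
              [ 20.,   2,   1,   2, 0, 1, 0],
              [ 20.,   2,   2,   1, 0, 0, 1]])
T = pivot(T, ell=2, j=1)                      # x1 enters, x5 (row 2) leaves
print(T[0])                                   # [100. 0. -7. -2. 0. 5. 0.]
```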

Full Tableau Implementation: An Example

A simple linear programming problem:

  min  −10x1 − 12x2 − 12x3
  s.t. x1 + 2x2 + 2x3 ≤ 20
       2x1 + x2 + 2x3 ≤ 20
       2x1 + 2x2 + x3 ≤ 20
       x1, x2, x3 ≥ 0

Set of Feasible Solutions

[Figure: the feasible region in (x1, x2, x3)-space, a polytope with vertices
A = (0, 0, 0)^T, B = (0, 0, 10)^T, C = (0, 10, 0)^T, D = (10, 0, 0)^T, E = (4, 4, 4)^T.]

Introducing Slack Variables

  min  −10x1 − 12x2 − 12x3
  s.t. x1 + 2x2 + 2x3 ≤ 20
       2x1 + x2 + 2x3 ≤ 20
       2x1 + 2x2 + x3 ≤ 20
       x1, x2, x3 ≥ 0

LP in standard form

  min  −10x1 − 12x2 − 12x3
  s.t. x1 + 2x2 + 2x3 + x4 = 20
       2x1 + x2 + 2x3 + x5 = 20
       2x1 + 2x2 + x3 + x6 = 20
       x1, . . . , x6 ≥ 0

Observation
The right hand side of the system is non-negative. Therefore the point (0, 0, 0, 20, 20, 20)^T is a basic feasible solution and we can start the simplex method with basis B(1) = 4, B(2) = 5, B(3) = 6.

Setting Up the Simplex Tableau

           x1    x2    x3    x4    x5    x6  |  x_B(i)/u_i
      0 |  −10   −12   −12     0     0     0  |
x4 = 20 |    1     2     2     1     0     0  |  20
x5 = 20 |    2     1     2     0     1     0  |  10
x6 = 20 |    2     2     1     0     0     1  |  10

- Determine the pivot column (e.g., take the smallest subscript rule).
- c̄_1 < 0 and x1 enters the basis.
- Find the pivot row with u_i > 0 minimizing x_B(i)/u_i.
- Rows 2 and 3 both attain the minimum.
- Choose i = 2 with B(i) = 5. ⇒ x5 leaves the basis.
- Perform the basis change: divide the pivot row by the pivot element 2 and eliminate the other entries in the pivot column.
- Obtain the new basic feasible solution (10, 0, 0, 10, 0, 0)^T with cost −100:

           x1    x2    x3    x4    x5    x6
    100 |    0    −7    −2     0     5     0
x4 = 10 |    0   1.5     1     1  −0.5     0
x1 = 10 |    1   0.5     1     0   0.5     0
x6 =  0 |    0     1    −1     0    −1     1

Geometric Interpretation in the Original Polyhedron

[Figure: the first pivot moves from vertex A = (0, 0, 0)^T to the adjacent vertex D = (10, 0, 0)^T of the polytope with vertices A, B, C, D, E.]

Next Iterations

           x1    x2    x3    x4    x5    x6  |  x_B(i)/u_i
    100 |    0    −7    −2     0     5     0  |
x4 = 10 |    0   1.5     1     1  −0.5     0  |  10
x1 = 10 |    1   0.5     1     0   0.5     0  |  10
x6 =  0 |    0     1    −1     0    −1     1  |  −

- c̄_2, c̄_3 < 0 ⇒ two possible choices for the pivot column.
- Choose x3 to enter the basis.
- u_3 < 0 ⇒ the third row cannot be chosen as pivot row.
- Choose x4 to leave the basis.
- New basic feasible solution (0, 0, 10, 0, 0, 10)^T with cost −120, corresponding to point B in the original polyhedron:

           x1    x2    x3    x4    x5    x6
    120 |    0    −4     0     2     4     0
x3 = 10 |    0   1.5     1     1  −0.5     0
x1 =  0 |    1    −1     0    −1     1     0
x6 = 10 |    0   2.5     0     1  −1.5     1

Geometric Interpretation in the Original Polyhedron

[Figure: the second pivot moves from vertex D = (10, 0, 0)^T to vertex B = (0, 0, 10)^T.]

Page 52:  · Introduction to Linear and Combinatorial Optimization (ADM I) Martin Skutella TU Berlin WS 2011/12 1 General Remarks I new avor of ADM I | introduce linear and combinatorial optim

Next Iterations

x1 x2 x3 x4 x5 x6xB(i)

ui120 0 −4 0 2 4 0

x3 = 10 0 1.5 1 1 −0.5 0 203

x1 = 0 1 −1 0 −1 1 0 −x6 = 10 0 2.5 0 1 −1.5 1 4

< 203

x2 enters the basis, x6 leaves it. We get

x1 x2 x3 x4 x5 x6

136 0 0 0 3.6 1.6 1.6x3 = 4 0 0 1 0.4 0.4 −0.6x1 = 4 1 0 0 −0.6 0.4 0.4x2 = 4 0 1 0 0.4 −0.6 0.4

and the reduced costs are all non-negative.

Thus (4, 4, 4, 0, 0, 0) is an optimal solution with cost -136, correspondingto point E = (4, 4, 4) in the original polyhedron.

84

Next Iterations

x1 x2 x3 x4 x5 x6xB(i)

ui120 0 −4 0 2 4 0

x3 = 10 0 1.5 1 1 −0.5 0 203

x1 = 0 1 −1 0 −1 1 0 −x6 = 10 0 2.5 0 1 −1.5 1 4 < 20

3

x2 enters the basis, x6 leaves it. We get

x1 x2 x3 x4 x5 x6

136 0 0 0 3.6 1.6 1.6x3 = 4 0 0 1 0.4 0.4 −0.6x1 = 4 1 0 0 −0.6 0.4 0.4x2 = 4 0 1 0 0.4 −0.6 0.4

and the reduced costs are all non-negative.

Thus (4, 4, 4, 0, 0, 0) is an optimal solution with cost -136, correspondingto point E = (4, 4, 4) in the original polyhedron.

84

Page 53:  · Introduction to Linear and Combinatorial Optimization (ADM I) Martin Skutella TU Berlin WS 2011/12 1 General Remarks I new avor of ADM I | introduce linear and combinatorial optim

Next Iterations

             x1    x2    x3    x4    x5    x6       ui
   120        0    −4     0     2     4     0
 x3 = 10      0    1.5    1     1   −0.5    0      20/3
 x1 =  0      1    −1     0    −1     1     0       −
 x6 = 10      0    2.5    0     1   −1.5    1       4 < 20/3

x2 enters the basis, x6 leaves it. We get

             x1    x2    x3    x4    x5    x6
   136        0     0     0    3.6   1.6   1.6
 x3 =  4      0     0     1    0.4   0.4  −0.6
 x1 =  4      1     0     0   −0.6   0.4   0.4
 x2 =  4      0     1     0    0.4  −0.6   0.4

and the reduced costs are all non-negative.

Thus (4, 4, 4, 0, 0, 0)T is an optimal solution with cost −136, corresponding to point E = (4, 4, 4)T in the original polyhedron.

84
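To make the pivot mechanics concrete, the following minimal numpy sketch (function name and tableau layout are my own, not from the lecture) reproduces the final pivot above: x2 enters, x6 leaves.

```python
import numpy as np

def pivot(T, row, col):
    """One full-tableau pivot: scale the pivot row, then eliminate the
    pivot column from all other rows (including the zeroth row)."""
    T = T.astype(float).copy()
    T[row] /= T[row, col]
    for i in range(T.shape[0]):
        if i != row:
            T[i] -= T[i, col] * T[row]
    return T

# Tableau from the slide: zeroth row first, right-hand sides in column 0;
# basic variables are x3, x1, x6 (rows 1, 2, 3).
T = np.array([[120, 0, -4.0, 0,  2,  4.0, 0],
              [ 10, 0,  1.5, 1,  1, -0.5, 0],
              [  0, 1, -1.0, 0, -1,  1.0, 0],
              [ 10, 0,  2.5, 0,  1, -1.5, 1]])

# x2 (column index 2) enters, x6 (row index 3) leaves:
print(pivot(T, row=3, col=2))
# zeroth row becomes [136, 0, 0, 0, 3.6, 1.6, 1.6]: all reduced costs >= 0
```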

All Iterations from Geometric Point of View

[Figure: the polyhedron with vertices A = (0, 0, 0)T, B = (0, 0, 10)T, C = (0, 10, 0)T, D = (10, 0, 0)T, E = (4, 4, 4)T; the vertices visited by the simplex method, A, D, B, and finally E, are highlighted in turn.]

85

Comparison of Full Tableau and Revised Simplex Methods

The following table gives the computational cost of one iteration of the simplex method for the two variants introduced above.

                     full tableau    revised simplex
  memory               O(mn)            O(m²)
  worst-case time      O(mn)            O(mn)
  best-case time       O(mn)            O(m²)

Conclusion

I For implementation purposes, the revised simplex method is clearly preferable due to its smaller memory requirement and smaller average running time.

I The full tableau method is convenient for solving small LP instances by hand since all necessary information is readily available.

86

Page 56:  · Introduction to Linear and Combinatorial Optimization (ADM I) Martin Skutella TU Berlin WS 2011/12 1 General Remarks I new avor of ADM I | introduce linear and combinatorial optim

Practical Performance Enhancements

Numerical stability

The most critical issue when implementing the (revised) simplex method is numerical stability. In order to deal with this, a number of additional ideas from numerical linear algebra are needed.

I Every update of B−1 introduces roundoff or truncation errors which accumulate and might eventually lead to highly inaccurate results. Solution: Compute the matrix B−1 from scratch once in a while.

I Instead of computing B−1 explicitly, it can be stored as a product of matrices Qk · Qk−1 · . . . · Q1, where each matrix Qi can be specified in terms of m coefficients. The next update then simply appends one more factor: the new inverse is Qk+1 · B−1 = Qk+1 · . . . · Q1. This might also save space.

I Instead of computing B−1 explicitly, compute and store an LU-decomposition (also called LR-decomposition).

87
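As an illustration of the product-form idea, here is a small numpy sketch (naming is my own) of a single update factor, the so-called eta matrix, which is described by m coefficients as stated above:

```python
import numpy as np

def eta_matrix(m, r, u):
    """Eta matrix Q for a pivot in row r with pivot column u = B^{-1} A_j:
    identity except for column r, so Q is determined by m coefficients.
    The updated basis inverse is then Q @ B_inv."""
    Q = np.eye(m)
    Q[:, r] = -u / u[r]       # off-diagonal entries -u_i / u_r
    Q[r, r] = 1.0 / u[r]      # diagonal entry 1 / u_r
    return Q

# Usage sketch: after choosing pivot row r and entering column j,
#   u = B_inv @ A[:, j]
#   B_inv = eta_matrix(m, r, u) @ B_inv
# Storing only (r, u) per iteration realizes the product form above.
```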

Cycling

Problem: If an LP is degenerate, the simplex method might end up in an infinite loop (cycling).

Example:

            x1     x2    x3    x4   x5  x6  x7
  3        −3/4    20   −1/2    6    0   0   0
x5 = 0      1/4    −8    −1     9    1   0   0
x6 = 0      1/2   −12   −1/2    3    0   1   0
x7 = 1       0      0     1     0    0   0   1

Pivoting rules

I Column selection: let the nonbasic variable with most negative reduced cost cj enter the basis, i. e., steepest descent rule.

I Row selection: among basic variables that are eligible to exit the basis, select the one with smallest subscript.

88

Page 57:  · Introduction to Linear and Combinatorial Optimization (ADM I) Martin Skutella TU Berlin WS 2011/12 1 General Remarks I new avor of ADM I | introduce linear and combinatorial optim

Iteration 1

            x1     x2    x3    x4   x5  x6  x7    ui
  3        −3/4    20   −1/2    6    0   0   0
x5 = 0      1/4    −8    −1     9    1   0   0     0
x6 = 0      1/2   −12   −1/2    3    0   1   0     0
x7 = 1       0      0     1     0    0   0   1     −

Basis change: x1 enters the basis, x5 leaves.

Bases visited

(5, 6, 7)

89


Page 61:  · Introduction to Linear and Combinatorial Optimization (ADM I) Martin Skutella TU Berlin WS 2011/12 1 General Remarks I new avor of ADM I | introduce linear and combinatorial optim

Iteration 2

            x1    x2    x3    x4   x5  x6  x7    ui
  3          0    −4   −7/2   33    3   0   0
x1 = 0       1   −32    −4    36    4   0   0     −
x6 = 0       0     4    3/2  −15   −2   1   0     0
x7 = 1       0     0     1     0    0   0   1     −

Basis change: x2 enters the basis, x6 leaves.

Bases visited

(5, 6, 7) → (1, 6, 7)

90

Iteration 3

            x1   x2    x3     x4     x5    x6   x7    ui
  3          0    0    −2     18      1     1    0
x1 = 0       1    0     8    −84    −12     8    0     0
x2 = 0       0    1    3/8  −15/4   −1/2   1/4   0     0
x7 = 1       0    0     1      0      0     0    1     1

Basis change: x3 enters the basis, x1 leaves.

Bases visited

(5, 6, 7) → (1, 6, 7) → (1, 2, 7)

91

Page 62:  · Introduction to Linear and Combinatorial Optimization (ADM I) Martin Skutella TU Berlin WS 2011/12 1 General Remarks I new avor of ADM I | introduce linear and combinatorial optim

Iteration 4

            x1    x2   x3    x4     x5    x6   x7    ui
  3         1/4    0    0    −3     −2     3    0
x3 = 0      1/8    0    1  −21/2   −3/2    1    0     −
x2 = 0     −3/64   1    0   3/16   1/16  −1/8   0     0
x7 = 1     −1/8    0    0   21/2    3/2   −1    1    2/21

Basis change: x4 enters the basis, x2 leaves.

Bases visited

(5, 6, 7) → (1, 6, 7) → (1, 2, 7) → (3, 2, 7)

92

Iteration 5

            x1    x2    x3   x4    x5    x6   x7    ui
  3        −1/2   16     0    0    −1     1    0
x3 = 0     −5/2   56     1    0     2    −6    0     0
x4 = 0     −1/4  16/3    0    1    1/3  −2/3   0     0
x7 = 1      5/2  −56     0    0    −2     6    1     −

Basis change: x5 enters the basis, x3 leaves.

Bases visited

(5, 6, 7) → (1, 6, 7) → (1, 2, 7) → (3, 2, 7) → (3, 4, 7)

Observation

After 4 pivoting iterations our basic feasible solution still has not changed.

93

Page 63:  · Introduction to Linear and Combinatorial Optimization (ADM I) Martin Skutella TU Berlin WS 2011/12 1 General Remarks I new avor of ADM I | introduce linear and combinatorial optim

Iteration 6

            x1    x2    x3   x4   x5   x6   x7    ui
  3        −7/4   44   1/2    0    0   −2    0
x5 = 0     −5/4   28   1/2    0    1   −3    0     −
x4 = 0      1/6   −4  −1/6    1    0   1/3   0     0
x7 = 1       0     0    1     0    0    0    1     −

Basis change: x6 enters the basis, x4 leaves.

Bases visited

(5, 6, 7) → (1, 6, 7) → (1, 2, 7) → (3, 2, 7) → (3, 4, 7) → (5, 4, 7)

94

Back at the Beginning

            x1     x2    x3    x4   x5  x6  x7
  3        −3/4    20   −1/2    6    0   0   0
x5 = 0      1/4    −8    −1     9    1   0   0
x6 = 0      1/2   −12   −1/2    3    0   1   0
x7 = 1       0      0     1     0    0   0   1

Bases visited

(5, 6, 7) → (1, 6, 7) → (1, 2, 7) → (3, 2, 7) → (3, 4, 7) → (5, 4, 7) → (5, 6, 7)

This is the same basis that we started with.

Conclusion

Continuing with the pivoting rules we agreed on at the beginning, the simplex method will never terminate in this example.

95


Anticycling

We discuss two pivoting rules that are guaranteed to avoid cycling:

I lexicographic rule

I Bland’s rule

96

Lexicographic Order

Definition 3.12.

I A vector u ∈ Rn is lexicographically positive (negative) if u ≠ 0 and the first nonzero entry of u is positive (negative). Symbolically, we write u >L 0 (resp. u <L 0).

I A vector u ∈ Rn is lexicographically larger (smaller) than a vector v ∈ Rn if u ≠ v and u − v >L 0 (resp. u − v <L 0). We write u >L v (resp. u <L v).

Example:

(0, 2, 3, 0)T >L (0, 2, 1, 4)T

(0, 4, 5, 0)T <L (1, 2, 1, 2)T

97
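A minimal Python sketch of these comparisons (function names are my own, not part of the lecture material):

```python
def lex_positive(u, eps=1e-12):
    """True iff u != 0 and the first nonzero entry of u is positive."""
    for entry in u:
        if abs(entry) > eps:
            return entry > 0
    return False   # the zero vector is not lexicographically positive

def lex_smaller(u, v):
    """u <L v  iff  v - u is lexicographically positive."""
    return lex_positive([b - a for a, b in zip(u, v)])

# The examples from the slide:
assert lex_smaller((0, 2, 1, 4), (0, 2, 3, 0))
assert lex_smaller((0, 4, 5, 0), (1, 2, 1, 2))
```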


Lexicographic Pivoting Rule

Lexicographic pivoting rule in the full tableau implementation:

Lexicographic pivoting rule

1 Choose an arbitrary column Aj with cj < 0 to enter the basis. Let u := B−1Aj be the jth column of the tableau.

2 For each i with ui > 0, divide the ith row of the tableau by ui and choose the lexicographically smallest row ℓ. Then the ℓth basic variable xB(ℓ) exits the basis.

Remark
The lexicographic pivoting rule always leads to a unique choice for the exiting variable. Otherwise two rows of B−1A would have to be linearly dependent, which contradicts our assumption on the matrix A.

98

Lexicographic Pivoting Rule (cont.)

Theorem 3.13.

Suppose that the simplex algorithm starts with lexicographically positive rows 1, . . . , m in the simplex tableau. Suppose that the lexicographic pivoting rule is followed. Then:

a Rows 1, . . . , m of the simplex tableau remain lexicographically positive throughout the algorithm.

b The zeroth row strictly increases lexicographically at each iteration.

c The simplex algorithm terminates after a finite number of iterations.

Proof: . . .

99


Remarks on Lexicographic Pivoting Rule

I The lexicographic pivoting rule was derived by considering a small perturbation of the right hand side vector b leading to a non-degenerate problem (see exercises).

I The lexicographic pivoting rule can also be used in conjunction with the revised simplex method, provided that B−1 is computed explicitly (this is not the case in sophisticated implementations).

I The assumption in the theorem on the lexicographically positive rows in the tableau can be made without loss of generality: Rearrange the columns of A such that the basic columns (forming the identity matrix in the tableau) come first. Since the zeroth column is nonnegative for a basic feasible solution, all rows are lexicographically positive.

100

Bland’s Rule

Smallest subscript pivoting rule (Bland’s rule)

1 Choose the column Aj with cj < 0 and j minimal to enter the basis.

2 Among all basic variables xi that could exit the basis, select the one with smallest i.

Theorem (without proof)

The simplex algorithm with Bland's rule terminates after a finite number of iterations.

Remark
Bland's rule is compatible with an implementation of the revised simplex method in which the reduced costs of the nonbasic variables are computed one at a time, in the natural order, until a negative one is discovered.

101
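A minimal Python sketch of both selection steps of Bland's rule (names and the tolerance handling are my own):

```python
def bland_entering(reduced_costs, eps=1e-9):
    """Column selection: smallest index j with negative reduced cost;
    None signals optimality."""
    for j, cj in enumerate(reduced_costs):
        if cj < -eps:
            return j
    return None

def bland_leaving(basis, rhs, col, eps=1e-9):
    """Row selection: among rows attaining the minimum ratio rhs[i]/col[i]
    (only rows with col[i] > 0 are eligible), pick the row whose basic
    variable has the smallest subscript. Returns None if unbounded."""
    eligible = [(rhs[i] / col[i], basis[i], i)
                for i in range(len(col)) if col[i] > eps]
    if not eligible:
        return None
    best = min(ratio for ratio, _, _ in eligible)
    return min((b, i) for ratio, b, i in eligible if ratio <= best + eps)[1]
```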


Finding an Initial Basic Feasible Solution

So far we always assumed that the simplex algorithm starts with a basic feasible solution. We now discuss how such a solution can be obtained.

I Introducing artificial variables

I The two-phase simplex method

I The big-M method

102

Introducing Artificial VariablesExample:

min  x1 + x2 + x3

s.t.  x1 + 2 x2 + 3 x3 = 3
     −x1 + 2 x2 + 6 x3 = 2
           4 x2 + 9 x3 = 5
           3 x3 + x4 = 1
      x1, . . . , x4 ≥ 0

Auxiliary problem with artificial variables:

min  x5 + x6 + x7 + x8

s.t.  x1 + 2 x2 + 3 x3 + x5 = 3
     −x1 + 2 x2 + 6 x3 + x6 = 2
           4 x2 + 9 x3 + x7 = 5
           3 x3 + x4 + x8 = 1
      x1, . . . , x4, x5, . . . , x8 ≥ 0

103


Auxiliary Problem

Auxiliary problem with artificial variables:

min  x5 + x6 + x7 + x8

s.t.  x1 + 2 x2 + 3 x3 + x5 = 3
     −x1 + 2 x2 + 6 x3 + x6 = 2
           4 x2 + 9 x3 + x7 = 5
           3 x3 + x4 + x8 = 1
      x1, . . . , x4, x5, . . . , x8 ≥ 0

Observation
x = (0, 0, 0, 0, 3, 2, 5, 1) is a basic feasible solution for this problem with basic variables (x5, x6, x7, x8). We can form the initial tableau.

104

Initial Tableau

           x1   x2   x3   x4  x5  x6  x7  x8
  0         0    0    0    0   1   1   1   1
x5 = 3      1    2    3    0   1   0   0   0
x6 = 2     −1    2    6    0   0   1   0   0
x7 = 5      0    4    9    0   0   0   1   0
x8 = 1      0    0    3    1   0   0   0   1

Calculate reduced costs by eliminating the nonzero entries for the basic variables:

           x1   x2    x3   x4  x5  x6  x7  x8
−11         0   −8   −21   −1   0   0   0   0
x5 = 3      1    2     3    0   1   0   0   0
x6 = 2     −1    2     6    0   0   1   0   0
x7 = 5      0    4     9    0   0   0   1   0
x8 = 1      0    0     3    1   0   0   0   1

Now we can proceed as seen before...

105


Minimizing the Auxiliary Problem

           x1   x2    x3   x4  x5  x6  x7  x8
−11         0   −8   −21   −1   0   0   0   0
x5 = 3      1    2     3    0   1   0   0   0
x6 = 2     −1    2     6    0   0   1   0   0
x7 = 5      0    4     9    0   0   0   1   0
x8 = 1      0    0     3    1   0   0   0   1

Basis change: x4 enters the basis, x8 exits.

106


Minimizing the Auxiliary Problem

           x1   x2    x3   x4  x5  x6  x7  x8
−10         0   −8   −18    0   0   0   0   1
x5 = 3      1    2     3    0   1   0   0   0
x6 = 2     −1    2     6    0   0   1   0   0
x7 = 5      0    4     9    0   0   0   1   0
x4 = 1      0    0     3    1   0   0   0   1

Basis change: x3 enters the basis, x4 exits.

107

Minimizing the Auxiliary Problem

           x1   x2   x3   x4    x5  x6  x7   x8
 −4         0   −8    0    6     0   0   0    7
x5 = 2      1    2    0   −1     1   0   0   −1
x6 = 0     −1    2    0   −2     0   1   0   −2
x7 = 2      0    4    0   −3     0   0   1   −3
x3 = 1/3    0    0    1   1/3    0   0   0   1/3

Basis change: x2 enters the basis, x6 exits.

108


Minimizing the Auxiliary Problem

           x1     x2   x3   x4    x5   x6   x7   x8
 −4        −4      0    0   −2     0    4    0   −1
x5 = 2      2      0    0    1     1   −1    0    1
x2 = 0    −1/2     1    0   −1     0   1/2   0   −1
x7 = 2      2      0    0    1     0   −2    1    1
x3 = 1/3    0      0    1   1/3    0    0    0   1/3

Basis change: x1 enters the basis, x5 exits.

109

Minimizing the Auxiliary Problem

           x1   x2   x3    x4    x5    x6   x7    x8
  0         0    0    0     0     2     2    0     1
x1 = 1      1    0    0    1/2   1/2  −1/2   0    1/2
x2 = 1/2    0    1    0   −3/4   1/4   1/4   0   −3/4
x7 = 0      0    0    0     0    −1    −1    1     0
x3 = 1/3    0    0    1    1/3    0     0    0    1/3

Basic feasible solution for auxiliary problem with (auxiliary) cost value 0

⇒ Also feasible for the original problem - but not (yet) basic.

110


Obtaining a Basis for the Original Problem

           x1   x2   x3    x4    x5    x6   x7    x8
  0         0    0    0     0     2     2    0     1
x1 = 1      1    0    0    1/2   1/2  −1/2   0    1/2
x2 = 1/2    0    1    0   −3/4   1/4   1/4   0   −3/4
x7 = 0      0    0    0     0    −1    −1    1     0
x3 = 1/3    0    0    1    1/3    0     0    0    1/3

Observation

Restricting the tableau to the original variables, we get a zero row. Thus the original equations are linearly dependent.
→ We can remove the third row.

111

Obtaining a Basis for the Original Problem

           x1   x2   x3    x4
−11/6       0    0    0   −1/12
x1 = 1      1    0    0    1/2
x2 = 1/2    0    1    0   −3/4
x3 = 1/3    0    0    1    1/3

We finally obtain a basic feasible solution for the original problem.

After computing the reduced costs for this basis (as seen in the beginning), the simplex method can start with its typical iterations.

112


Omitting Artificial Variables

Auxiliary problem

min  x5 + x6 + x7 + x8

s.t.  x1 + 2 x2 + 3 x3 + x5 = 3
     −x1 + 2 x2 + 6 x3 + x6 = 2
           4 x2 + 9 x3 + x7 = 5
           3 x3 + x4 + x8 = 1
      x1, . . . , x8 ≥ 0

Artificial variable x8 could have been omitted by setting x4 to 1 in the initial basis. This is possible as x4 appears in only one constraint.

Generally, this can be done, e. g., with all slack variables that have nonnegative right hand sides.

113

Phase I of the Simplex Method

Given: LP in standard form: min{cT · x | A · x = b, x ≥ 0}

1 Transform problem such that b ≥ 0 (multiply constraints by −1).

2 Introduce artificial variables y1, . . . , ym and solve auxiliary problem

min ∑_{i=1}^m yi   s.t.  A · x + Im · y = b,  x, y ≥ 0.

3 If optimal cost is positive, then STOP (original LP is infeasible).

4 If no artificial variable is in the final basis, eliminate artificial variables and columns and STOP (feasible basis for original LP has been found).

5 If the ℓth basic variable is artificial, find j ∈ {1, . . . , n} with ℓth entry in B−1 · Aj nonzero. Use this entry as pivot element and replace the ℓth basic variable with xj.

6 If no such j ∈ {1, . . . , n} exists, eliminate the ℓth row (constraint).

114
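A minimal numpy sketch of steps 1 and 2, building the auxiliary problem and its initial basis (all names are my own):

```python
import numpy as np

def phase_one_setup(A, b):
    """Set up the Phase-I auxiliary problem min 1^T y s.t. Ax + I y = b,
    x, y >= 0, together with the initial basis of artificial variables."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    m, n = A.shape
    neg = b < 0                     # step 1: make the right-hand side >= 0
    A[neg] *= -1
    b[neg] *= -1
    A_aux = np.hstack([A, np.eye(m)])             # columns of y_1, ..., y_m
    c_aux = np.concatenate([np.zeros(n), np.ones(m)])
    basis = list(range(n, n + m))   # artificial variables are basic, y = b
    return A_aux, c_aux, basis, b
```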


The Two-phase Simplex Method

Two-phase simplex method

1 Given an LP in standard form, first run phase I.

2 If phase I yields a basic feasible solution for the original LP, enter "phase II" (see above).

Possible outcomes of the two-phase simplex method

i Problem is infeasible (detected in phase I).

ii Problem is feasible but rows of A are linearly dependent (detected and corrected at the end of phase I by eliminating redundant constraints).

iii Optimal cost is −∞ (detected in phase II).

iv Problem has optimal basic feasible solution (found in phase II).

115

Big-M Method

Alternative idea: Combine the two phases into one by introducing sufficiently large penalty costs for artificial variables.

This way, the LP

min  ∑_{i=1}^n ci xi
s.t. A · x = b
     x ≥ 0

becomes:

min  ∑_{i=1}^n ci xi + M · ∑_{j=1}^m yj
s.t. A · x + Im · y = b
     x, y ≥ 0

Remark: If M is sufficiently large and the original program has a feasible solution, all artificial variables will be driven to zero by the simplex method.

116


How to Choose M?

Observation

Initially, M only occurs in the zeroth row. As the zeroth row never becomes the pivot row, this property is maintained while the simplex method is running.

All we need to have is an order on all values that can appear as reduced cost coefficients.

Order on cost coefficients

a M + b < c M + d  :⇐⇒  (a < c) ∨ (a = c ∧ b < d)

In particular, −a M + b < 0 < a M + b for any positive a and arbitrary b, and we can decide whether a cost coefficient is negative or not.

→ There is no need to give M a fixed numerical value.

117
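Following this idea, here is a small Python sketch of such symbolic cost coefficients: a pair (a, b) stands for a M + b, and pairs are compared lexicographically (the class name is my own):

```python
from functools import total_ordering

@total_ordering
class MTerm:
    """Cost coefficient a*M + b with M treated as symbolically large."""
    def __init__(self, a, b):
        self.a, self.b = a, b
    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)
    def __lt__(self, other):
        # a*M + b < c*M + d  iff  a < c, or a == c and b < d
        return (self.a, self.b) < (other.a, other.b)

# -a*M + b < 0 < a*M + b for positive a and arbitrary b:
assert MTerm(-1, 100) < MTerm(0, 0) < MTerm(1, -100)
```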

Example

Example:

min  x1 + x2 + x3

s.t.  x1 + 2 x2 + 3 x3 = 3
     −x1 + 2 x2 + 6 x3 = 2
           4 x2 + 9 x3 = 5
           3 x3 + x4 = 1
      x1, . . . , x4 ≥ 0

118


Introducing Artificial Variables and M

Auxiliary problem:

min  x1 + x2 + x3 + M x5 + M x6 + M x7

s.t.  x1 + 2 x2 + 3 x3 + x5 = 3
     −x1 + 2 x2 + 6 x3 + x6 = 2
           4 x2 + 9 x3 + x7 = 5
           3 x3 + x4 = 1
      x1, . . . , x4, x5, x6, x7 ≥ 0

Note that this time the unnecessary artificial variable x8 has been omitted.

We start off with (x5, x6, x7, x4) = (3, 2, 5, 1).

119

Forming the Initial Tableau

        x1   x2   x3  x4  x5  x6  x7
 0       1    1    1   0   M   M   M
 3       1    2    3   0   1   0   0
 2      −1    2    6   0   0   1   0
 5       0    4    9   0   0   0   1
 1       0    0    3   1   0   0   0

Compute reduced costs by eliminating the nonzero entries for the basic variables:

          x1      x2        x3      x4  x5  x6  x7
−10M       1   −8M + 1   −18M + 1    0   0   0   0
  3        1       2         3       0   1   0   0
  2       −1       2         6       0   0   1   0
  5        0       4         9       0   0   0   1
  1        0       0         3       1   0   0   0

120


First Iteration

          x1      x2        x3      x4  x5  x6  x7
−10M       1   −8M + 1   −18M + 1    0   0   0   0
  3        1       2         3       0   1   0   0
  2       −1       2         6       0   0   1   0
  5        0       4         9       0   0   0   1
  1        0       0         3       1   0   0   0

Reduced costs for x2 and x3 are negative.

Basis change: x3 enters the basis, x4 leaves.

121


Second Iteration

             x1      x2     x3      x4     x5  x6  x7
−4M − 1/3     1   −8M + 1    0   6M − 1/3   0   0   0
    2         1       2      0      −1      1   0   0
    0        −1       2      0      −2      0   1   0
    2         0       4      0      −3      0   0   1
   1/3        0       0      1      1/3     0   0   0

Basis change: x2 enters the basis, x6 leaves.

122

Third Iteration

               x1        x2  x3      x4      x5      x6     x7
−4M − 1/3   −4M + 3/2     0   0   −2M + 2/3   0   4M − 1/2   0
    2           2         0   0       1       1      −1      0
    0         −1/2        1   0      −1       0      1/2     0
    2           2         0   0       1       0      −2      1
   1/3          0         0   1      1/3      0       0      0

Basis change: x1 enters the basis, x5 leaves.

123


Fourth Iteration

          x1  x2  x3    x4      x5        x6      x7
−11/6      0   0   0   −1/12  2M − 3/4  2M + 1/4   0
  1        1   0   0    1/2      1/2      −1/2     0
 1/2       0   1   0   −3/4      1/4       1/4     0
  0        0   0   0     0       −1        −1      1
 1/3       0   0   1    1/3       0         0      0

Note that all artificial variables have already been driven to 0.

Basis change: x4 enters the basis, x3 leaves.

124

Fifth Iteration

          x1  x2   x3   x4     x5        x6      x7
−7/4       0   0   1/4   0   2M − 3/4  2M + 1/4   0
 1/2       1   0  −3/2   0      1/2      −1/2     0
 5/4       0   1   9/4   0      1/4       1/4     0
  0        0   0    0    0      −1        −1      1
  1        0   0    3    1       0         0      0

We now have an optimal solution of the auxiliary problem, as all reduced costs are nonnegative (M presumed large enough).

By eliminating the third row as in the previous example, we get a basic feasible and also optimal solution to the original problem.

125


Computational Efficiency of the Simplex Method

Observation

The computational efficiency of the simplex method is determined by

i the computational effort of each iteration;

ii the number of iterations.

Question: How many iterations are needed in the worst case?

Idea for negative answer (lower bound)

Describe

I a polyhedron with an exponential number of vertices;

I a path that visits all vertices and always moves from a vertex to an adjacent one that has lower costs.

126

Computational Efficiency of the Simplex Method

Unit cube

Consider the unit cube in Rn, defined by the constraints

0 ≤ xi ≤ 1, i = 1, . . . , n

The unit cube has

I 2^n vertices;

I a spanning path, i. e., a path traveling the edges of the cube visiting each vertex exactly once.

[Figure: the unit square in (x1, x2)-space and the unit cube in (x1, x2, x3)-space, each with a spanning path.]

127


Computational Efficiency of the Simplex Method (cont.)

Klee-Minty cube

Consider a perturbation of the unit cube in Rn, defined by the constraints

0 ≤ x1 ≤ 1,

ε xi−1 ≤ xi ≤ 1 − ε xi−1,   i = 2, . . . , n

for some ε ∈ (0, 1/2).

[Figure: the perturbed square in (x1, x2)-space and the perturbed cube in (x1, x2, x3)-space.]

128

Computational Efficiency of the Simplex Method (cont.)

Klee-Minty cube

0 ≤ x1 ≤ 1,

ε xi−1 ≤ xi ≤ 1 − ε xi−1,   i = 2, . . . , n,   ε ∈ (0, 1/2)

Theorem 3.14.

Consider the linear programming problem of minimizing −xn subject to the constraints above. Then,

a the feasible set has 2^n vertices;

b the vertices can be ordered so that each one is adjacent to and has lower cost than the previous one;

c there exists a pivoting rule under which the simplex method requires 2^n − 1 changes of basis before it terminates.

129
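A small numpy sketch generating these constraints in the form G x ≤ h (the function name and the choice ε = 0.3 are my own):

```python
import numpy as np

def klee_minty(n, eps=0.3):
    """Constraint system G x <= h of the Klee-Minty cube; minimizing
    -x_n over it yields the 2**n - 1 pivot example of Theorem 3.14."""
    rows, rhs = [], []
    def unit(i):
        e = np.zeros(n); e[i] = 1.0
        return e
    rows.append(-unit(0)); rhs.append(0.0)        # 0 <= x_1
    rows.append(unit(0));  rhs.append(1.0)        # x_1 <= 1
    for i in range(1, n):
        rows.append(eps * unit(i - 1) - unit(i))  # eps*x_{i-1} <= x_i
        rhs.append(0.0)
        rows.append(unit(i) + eps * unit(i - 1))  # x_i <= 1 - eps*x_{i-1}
        rhs.append(1.0)
    return np.array(rows), np.array(rhs)
```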


Diameter of Polyhedra

Definition 3.15.

I The distance d(x, y) between two vertices x, y is the minimum number of edges required to reach y starting from x.

I The diameter D(P) of polyhedron P is the maximum d(x, y) over all pairs of vertices (x, y).

I ∆(n, m) is the maximum D(P) over all polytopes in Rn that are represented in terms of m inequality constraints.

I ∆u(n, m) is the maximum D(P) over all polyhedra in Rn that are represented in terms of m inequality constraints.

Examples in the plane:

∆(2, 8) = ⌊8/2⌋ = 4,   and in general ∆(2, m) = ⌊m/2⌋;

∆u(2, 8) = 8 − 2 = 6,   and in general ∆u(2, m) = m − 2.

130

Hirsch Conjecture

Observation: The diameter of the feasible set in a linear programming problem is a lower bound on the number of steps required by the simplex method, no matter which pivoting rule is being used.

Polynomial Hirsch Conjecture

∆(n,m) ≤ poly(m, n)

Remarks

I Known lower bounds: ∆u(n, m) ≥ m − n + ⌊n/5⌋

I Known upper bounds:

∆(n, m) ≤ ∆u(n, m) < m^(1 + log2 n) = (2n)^(log2 m)

I The Strong Hirsch Conjecture

∆(n,m) ≤ m − n

was disproven in 2010 by Paco Santos for n = 43, m = 86.

131


Average Case Behavior of the Simplex Method

I Despite the exponential lower bounds on the worst case behavior of the simplex method (Klee-Minty cubes etc.), the simplex method usually behaves well in practice.

I The number of iterations is "typically" O(m).

I There have been several attempts to explain this phenomenon from a more theoretical point of view.

I These results say that "on average" the number of iterations is O(·) (usually polynomial).

I One main difficulty is to come up with a meaningful and, at the same time, manageable definition of the term "on average".

132

Chapter 4:
Duality Theory

(cp. Bertsimas & Tsitsiklis, Chapter 4)

133


Motivation

For A ∈ Rm×n, b ∈ Rm, and c ∈ Rn, consider the linear program

min cT · x   s.t.  A · x ≥ b,  x ≥ 0

Question: How to derive lower bounds on the optimal solution value?

Idea: For p ∈ Rm with p ≥ 0:  A · x ≥ b  =⇒  (pT · A) · x ≥ pT · b.
Thus, if cT ≥ pT · A, then

cT · x ≥ (pT · A) · x ≥ pT · b   for all feasible solutions x.

Find the best (largest) lower bound in this way:

max  pT · b                       max  bT · p
s.t. pT · A ≤ cT        ←→        s.t. AT · p ≤ c
     p ≥ 0                             p ≥ 0

This LP is the dual linear program of our initial LP.

134

Primal and Dual Linear Program

Consider the general linear program (left) and obtain a lower bound (right):

min  cT · x                       max  pT · b
s.t. aiT · x ≥ bi  for i ∈ M1     s.t. pi ≥ 0         for i ∈ M1
     aiT · x ≤ bi  for i ∈ M2          pi ≤ 0         for i ∈ M2
     aiT · x = bi  for i ∈ M3          pi free        for i ∈ M3
     xj ≥ 0        for j ∈ N1          pT · Aj ≤ cj   for j ∈ N1
     xj ≤ 0        for j ∈ N2          pT · Aj ≥ cj   for j ∈ N2
     xj free       for j ∈ N3          pT · Aj = cj   for j ∈ N3

The linear program on the right hand side is the dual linear program of the primal linear program on the left hand side.

135


Primal and Dual Variables and Constraints

primal LP (minimize)              dual LP (maximize)

constraints   aiT · x ≥ bi        pi ≥ 0         variables
              aiT · x ≤ bi        pi ≤ 0
              aiT · x = bi        pi free

variables     xj ≥ 0              pT · Aj ≤ cj   constraints
              xj ≤ 0              pT · Aj ≥ cj
              xj free             pT · Aj = cj

136

Examples

primal LP                         dual LP

min  cT · x                       max  pT · b
s.t. A · x ≥ b                    s.t. pT · A = cT
                                       p ≥ 0

min  cT · x                       max  pT · b
s.t. A · x = b                    s.t. pT · A ≤ cT
     x ≥ 0

137


Basic Properties of the Dual Linear Program

Theorem 4.1.

The dual of the dual LP is the primal LP.

Proof:. . .

Theorem 4.2.

Let Π1 and Π2 be two LPs where Π2 has been obtained from Π1 by (several) transformations of the following type:

i replace a free variable by the difference of two non-negative variables;

ii introduce a slack variable in order to replace an inequality constraint by an equation;

iii if some row of a feasible equality system is a linear combination of the other rows, eliminate this row.

Then the dual of Π1 is equivalent to the dual of Π2.

Proof: . . .

138

Weak Duality Theorem

Theorem 4.3.

If x is a feasible solution to the primal LP (minimization problem) and p a feasible solution to the dual LP (maximization problem), then

cT · x ≥ pT · b .

Proof:. . .

Corollary 4.4.

Consider a primal-dual pair of linear programs as above.

a If the primal LP is unbounded (i. e., optimal cost = −∞), then the dual LP is infeasible.

b If the dual LP is unbounded (i. e., optimal cost = ∞), then the primal LP is infeasible.

c If x and p are feasible solutions to the primal and dual LP, resp., and if cT · x = pT · b, then x and p are optimal solutions.

139


Strong Duality Theorem

Theorem 4.5.

If an LP has an optimal solution, so does its dual, and the optimal costs are equal.

Proof:. . .

140
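As a quick numerical illustration of Theorem 4.5, the following sketch solves a primal-dual pair with scipy.optimize.linprog; the data is taken from the earlier Phase-I example, with its redundant third constraint removed (this is a demonstration of mine, not part of the lecture):

```python
import numpy as np
from scipy.optimize import linprog

# Primal: min c^T x  s.t.  A x = b, x >= 0
A = np.array([[ 1.0, 2.0, 3.0, 0.0],
              [-1.0, 2.0, 6.0, 0.0],
              [ 0.0, 0.0, 3.0, 1.0]])
b = np.array([3.0, 2.0, 1.0])
c = np.array([1.0, 1.0, 1.0, 0.0])

primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 4)

# Dual: max p^T b  s.t.  A^T p <= c, p free; linprog minimizes, so use -b
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * 3)

print(primal.fun, -dual.fun)   # both equal 7/4, as Theorem 4.5 predicts
```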

Different Possibilities for Primal and Dual LP

primal \ dual finite optimum unbounded infeasible

finite optimum possible impossible impossible

unbounded impossible impossible possible

infeasible impossible possible possible

Example of infeasible primal and dual LP:

min  x1 + 2 x2                    max  p1 + 3 p2
s.t.   x1 +   x2 = 1              s.t. p1 + 2 p2 = 1
     2 x1 + 2 x2 = 3                   p1 + 2 p2 = 2

141


Complementary Slackness

Consider the following pair of primal and dual LPs:

min  cT · x                       max  pT · b
s.t. A · x ≥ b                    s.t. pT · A = cT
                                       p ≥ 0

If x and p are feasible solutions, then cT · x = pT · A · x ≥ pT · b. Thus,

cT · x = pT · b  ⇐⇒  for all i: pi = 0 if aiT · x > bi.

Theorem 4.6.

Consider an arbitrary pair of primal and dual LPs. Let x and p be feasible solutions to the primal and dual LP, respectively. Then x and p are both optimal if and only if

ui := pi (aiT · x − bi) = 0   for all i,

vj := (cj − pT · Aj) xj = 0   for all j.

Proof: . . .

142
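A small numpy sketch checking the conditions of Theorem 4.6 for the primal-dual pair above (the function name is my own; feasibility of the inputs is assumed):

```python
import numpy as np

def complementary_slackness(A, b, c, x, p, tol=1e-9):
    """Check the optimality conditions of Theorem 4.6 for the pair
    min c^T x s.t. Ax >= b  /  max p^T b s.t. p^T A = c^T, p >= 0."""
    u = p * (A @ x - b)       # u_i = p_i (a_i^T x - b_i)
    v = (c - A.T @ p) * x     # v_j = (c_j - p^T A_j) x_j
    return bool(np.all(np.abs(u) < tol) and np.all(np.abs(v) < tol))
```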

Geometric View

Consider a pair of primal and dual LPs with A ∈ Rm×n and rank(A) = n:

min  cT · x                       max  pT · b
s.t. A · x ≥ b                    s.t. pT · A = cT
                                       p ≥ 0

Let I ⊆ {1, . . . , m} with |I| = n and ai, i ∈ I, linearly independent.

=⇒ aiT · x = bi, i ∈ I, has a unique solution xI (basic solution)

Let p ∈ Rm (dual vector). Then x, p are optimal solutions if

i aiT · x ≥ bi for all i (primal feasibility)

ii pi = 0 for all i ∉ I (complementary slackness)

iii ∑_{i=1}^m pi · ai = c (dual feasibility)

iv p ≥ 0 (dual feasibility)

(ii) and (iii) imply ∑_{i∈I} pi · ai = c, which has a unique solution pI.

The ai, i ∈ I, form a basis for the dual LP and pI is the corresponding basic solution.

143


Geometric View (cont.)

[Figure: a polyhedron with constraint vectors a1, . . . , a5, cost vector c, and vertices A, B, C, D illustrating the bases of the dual LP.]

144

Dual Variables as Marginal Costs

Consider the primal-dual pair:

min  cT · x                       max  pT · b
s.t. A · x = b                    s.t. pT · A ≤ cT
     x ≥ 0

Let x∗ be an optimal basic feasible solution to the primal LP with basis B, i. e., x∗B = B−1 · b, and assume that x∗B > 0 (i. e., x∗ is non-degenerate).

Replace b by b + d. For small d, the basis B remains feasible and optimal:

B−1 · (b + d) = B−1 · b + B−1 · d ≥ 0   (feasibility)

cT − cBT · B−1 · A ≥ 0   (optimality; the reduced costs do not depend on b)

Optimal cost of the perturbed problem is

cBT · B−1 · (b + d) = cBT · x∗B + (cBT · B−1) · d = cBT · x∗B + pT · d.

Thus, pi is the marginal cost per unit increase of bi.

145


Dual Variables as Shadow Prices

Diet problem:

I aij := amount of nutrient i in one unit of food j
I bi := requirement of nutrient i in some ideal diet
I cj := cost of one unit of food j on the food market

LP duality: Let xj := number of units of food j in the diet:

min  cT · x                       max  pT · b
s.t. A · x = b                    s.t. pT · A ≤ cT
     x ≥ 0

Dual interpretation:

I pi is the "fair" price per unit of nutrient i

I pT · Aj is the value of one unit of food j on the nutrient market

I food j used in the ideal diet (x∗j > 0) is consistently priced at the two markets (by complementary slackness)

I the ideal diet has the same value on both markets (by strong duality)

146

Dual Basic Solutions

Consider an LP in standard form with A ∈ Rm×n, rank(A) = m, and its dual LP:

min  cT · x                       max  pT · b
s.t. A · x = b                    s.t. pT · A ≤ cT
     x ≥ 0

Observation 4.7.

A basis B yields

I a primal basic solution given by xB := B−1 · b and

I a dual basic solution pT := cBT · B−1.

Moreover,

a the values of the primal and the dual basic solutions are equal:

cBT · xB = cBT · B−1 · b = pT · b;

b p is feasible if and only if the reduced costs satisfy c̄ ≥ 0;

c a reduced cost c̄i = 0 corresponds to an active dual constraint;

d p is degenerate if and only if c̄i = 0 for some non-basic variable xi.

147


Dual Simplex Method

I Let B be a basis whose corresponding dual basic solution p is feasible.

I If also the primal basic solution x is feasible, then x, p are optimal.

I Assume that xB(ℓ) < 0 and consider the ℓth row of the simplex tableau

(xB(ℓ), v1, . . . , vn)   (pivot row)

I Let j ∈ {1, . . . , n} with vj < 0 and

c̄j / |vj| = min_{i : vi < 0} c̄i / |vi|

Performing an iteration of the simplex method with pivot element vj yields a new basis B′ and corresponding dual basic solution p′ with

cB′T · B′−1 · A ≤ cT   and   p′T · b ≥ pT · b   (with > if c̄j > 0).

I If vi ≥ 0 for all i ∈ {1, . . . , n}, then the dual LP is unbounded and the primal LP is infeasible.

148

Remarks on the Dual Simplex Method

I The dual simplex method terminates if the lexicographic pivoting rule is used:

I Choose any row ℓ with xB(ℓ) < 0 to be the pivot row.
I Among all columns j with vj < 0 choose the one which is lexicographically minimal when divided by |vj|.

I The dual simplex method is useful if, e. g., a dual basic solution is readily available.

I Example: Resolve an LP after the right-hand side b has changed.

149


Chapter 5:
Optimal Trees and Paths

(cp. Cook, Cunningham, Pulleyblank & Schrijver, Chapter 2)

150

Trees and Forests

Definition 5.1.

i An undirected graph having no circuit is called a forest.

ii A connected forest is called a tree.

Theorem 5.2.

Let G = (V, E) be an undirected graph on n = |V| nodes. Then, the following statements are equivalent:

i G is a tree.

ii G has n − 1 edges and no circuit.

iii G has n − 1 edges and is connected.

iv G is connected. If an arbitrary edge is removed, the resulting subgraph is disconnected.

v G has no circuit. Adding an arbitrary edge to G creates a circuit.

vi G contains a unique path between any pair of nodes.

Proof: See exercises.

151

Page 97:  · Introduction to Linear and Combinatorial Optimization (ADM I) Martin Skutella TU Berlin WS 2011/12 1 General Remarks I new avor of ADM I | introduce linear and combinatorial optim

Kruskal’s Algorithm

Minimum Spanning Tree (MST) Problem

Given: connected graph G = (V ,E ), cost function c : E → R.

Task: find a spanning tree T = (V, F) of G with minimum cost ∑_{e∈F} c(e).

Kruskal’s Algorithm for MST

1 Sort the edges in E such that c(e1) ≤ c(e2) ≤ · · · ≤ c(em).

2 Set T := (V , ∅).

3 For i := 1 to m do: If adding ei to T does not create a circuit, then add ei to T.

152
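A Python sketch of Kruskal's Algorithm, using a union-find structure to detect circuits (implementation details are my own):

```python
def kruskal(n, edges):
    """Kruskal's Algorithm; edges are (cost, u, v) with nodes 0..n-1.
    Returns the edge set of a minimum spanning tree."""
    parent = list(range(n))

    def find(v):                      # root of v's component
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    tree = []
    for cost, u, v in sorted(edges):  # step 1: sort edges by cost
        ru, rv = find(u), find(v)
        if ru != rv:                  # adding {u,v} creates no circuit
            parent[ru] = rv
            tree.append((u, v))
    return tree
```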

Example for Kruskal’s Algorithm

[Figure: example graph on nodes a, b, d, f, g, h, k with edge costs 12, 15, 16, 18, 20, 22, 23, 25, 28, 29, 31, 32, 35; Kruskal's Algorithm adds the edges in order of increasing cost, skipping those that would close a circuit.]

153


Prim’s Algorithm

Notation: For a graph G = (V, E) and A ⊆ V let

δ(A) := {e = {v, w} ∈ E | v ∈ A and w ∈ V \ A}.

We call δ(A) the cut induced by A.

Prim's Algorithm for MST

1 Set U := {r} for some node r ∈ V and F := ∅; set T := (U, F).

2 While U ≠ V, determine a minimum cost edge e ∈ δ(U).

3 Set F := F ∪ {e} and U := U ∪ {w} with e = {v, w}, w ∈ V \ U.

154
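A Python sketch of Prim's Algorithm using a binary heap over the edges of the current cut (implementation details are my own):

```python
import heapq

def prim(graph, r):
    """Prim's Algorithm; graph[v] is a list of (cost, w) pairs.
    Returns MST edges, growing the tree from root r."""
    in_tree = {r}
    tree = []
    heap = [(c, r, w) for c, w in graph[r]]
    heapq.heapify(heap)
    while heap:
        c, v, w = heapq.heappop(heap)   # cheapest stored edge
        if w in in_tree:
            continue                    # edge no longer crosses the cut
        in_tree.add(w)
        tree.append((v, w))
        for c2, u in graph[w]:
            if u not in in_tree:
                heapq.heappush(heap, (c2, w, u))
    return tree
```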

Example for Prim’s Algorithm

[Figure: the same example graph as above; Prim's Algorithm grows the tree from a single root node by repeatedly adding a minimum cost edge of the induced cut.]

155


Correctness of the MST Algorithms

Lemma 5.3.

A graph G = (V, E) is connected if and only if there is no set A ⊆ V, ∅ ≠ A ≠ V, with δ(A) = ∅.

Proof: . . .

Notation: We say that B ⊆ E is extendible to an MST if B is contained in the edge-set of some MST of G.

Theorem 5.4.

Let B ⊆ E be extendible to an MST and ∅ ≠ A ⊊ V with B ∩ δ(A) = ∅. If e is a min-cost edge in δ(A), then B ∪ {e} is extendible to an MST.

Proof: . . .

I Correctness of Prim's Algorithm immediately follows.
I Kruskal: Whenever an edge e = {v, w} is added, it is the cheapest edge in the cut induced by the subset of nodes currently reachable from v.

156

Efficiency of Prim’s Algorithm

Prim’s Algorithm for MST

1 Set U := {r} for some node r ∈ V and F := ∅; set T := (U,F ).

2 While U 6= V , determine a minimum cost edge e ∈ δ(U).

3 Set F := F ∪ {e} and U := U ∪ {W } with e = {v ,w}, w ∈ V \ U.

I Straightforward implementation achieves running time O(nm) where,as usual, n := |V | and m := |E |:

I the while-loop has n − 1 iterations;I a min-cost edge e ∈ δ(U) can be found in O(m) time.

I Idea for improved running time O(n2):I For each v ∈ V \ U, always keep a minimum cost edge h(v)

connecting v to some node in U.I In each iteration, information about all h(v), v ∈ V \ U, can be

updated in O(n) time.I Find min-cost edge e ∈ δ(U) in O(n) time by only considering

the edges h(v), v ∈ V \ U.I Best known running time is O(m + n log n) (uses Fibonacci heaps).

157


Efficiency of Kruskal’s Algorithm

Kruskal’s Algorithm for MST

1 Sort the edges in E such that c(e1) ≤ c(e2) ≤ · · · ≤ c(em).

2 Set T := (V , ∅).

3 For i := 1 to m do: If adding ei to T does not create a circuit, then add ei to T.

Theorem 5.5.

Kruskal’s Algorithm can be implemented to run in O(m log m) time.

Proof: . . .

158

Minimum Spanning Trees and Linear Programming

Notation:

I For S ⊆ V let γ(S) := {e = {v, w} ∈ E | v, w ∈ S}.
I For a vector x ∈ RE and a subset B ⊆ E let x(B) := ∑_{e∈B} xe.

Consider the following integer linear program:

min  cT · x
s.t. x(γ(S)) ≤ |S| − 1   for all ∅ ≠ S ⊂ V   (5.1)
     x(E) = |V| − 1                          (5.2)
     xe ∈ {0, 1}   for all e ∈ E

Observations:

I A feasible solution x ∈ {0, 1}E is the characteristic vector of a subset F ⊆ E.
I F does not contain a circuit due to (5.1) and has n − 1 edges due to (5.2).
I Thus, F forms a spanning tree of G.
I Moreover, the edge set of an arbitrary spanning tree of G yields a feasible solution x ∈ {0, 1}E.

159

Page 101:  · Introduction to Linear and Combinatorial Optimization (ADM I) Martin Skutella TU Berlin WS 2011/12 1 General Remarks I new avor of ADM I | introduce linear and combinatorial optim

Minimum Spanning Trees and Linear Programming (cont.)

Consider the LP relaxation of the integer programming formulation:

min  cT · x
s.t. x(γ(S)) ≤ |S| − 1   for all ∅ ≠ S ⊂ V
     x(E) = |V| − 1
     xe ≥ 0   for all e ∈ E

Theorem 5.6.

Let x∗ ∈ {0, 1}E be the characteristic vector of an MST. Then x∗ is an optimal solution to the LP above.

Proof: . . .

Corollary 5.7.

The vertices of the polytope given by the set of feasible LP solutions are exactly the characteristic vectors of spanning trees of G. The polytope is thus the convex hull of the characteristic vectors of all spanning trees.

160

Shortest Path Problem

Given: digraph D = (V, A), node r ∈ V, arc costs ca, a ∈ A.

Task: for each v ∈ V, find a dipath from r to v of least cost (if one exists)

Remarks:

I Existence of an r-v-dipath can be checked, e. g., by breadth-first search.
I Ensure existence of r-v-dipaths: add arcs (r, v) of sufficiently large cost.

Basic idea behind all algorithms for solving the shortest path problem:

If yv, v ∈ V, is the least cost of a dipath from r to v, then

yv + c(v,w) ≥ yw   for all (v, w) ∈ A.   (5.3)

Remarks:

I More generally, subpaths of shortest paths are shortest paths!
I If there is a shortest r-v-dipath for all v ∈ V, then there is a shortest path tree, i. e., a directed spanning tree T rooted at r such that the unique r-v-dipath in T is a least-cost r-v-dipath in D.

161


Feasible Potentials

Definition 5.8.

A vector y ∈ RV is a feasible potential if it satisfies (5.3).

Lemma 5.9.

If y is a feasible potential with yr = 0 and P an r-v-dipath, then yv ≤ c(P).

Proof: Suppose that P is v0, a1, v1, . . . , ak, vk, where v0 = r and vk = v. Then,

c(P) = ∑_{i=1}^k cai ≥ ∑_{i=1}^k (yvi − yvi−1) = yvk − yv0 = yv.

Corollary 5.10.

If y is a feasible potential with yr = 0 and P an r -v -dipath of cost yv ,then P is a least-cost r -v -dipath.

162

Ford’s Algorithm

Ford’s Algorithm

i Set yr := 0, p(r) := r, yv := ∞, and p(v) := null, for all v ∈ V \ {r}.

ii While there is an arc a = (v, w) ∈ A with yw > yv + c(v,w), set yw := yv + c(v,w) and p(w) := v.

Question: Does the algorithm always terminate?

Example:

[Figure: a digraph on nodes r, a, b, d with arc costs 2, 1, 1, and −3, containing a negative-cost dicircuit.]

Observation:
The algorithm does not terminate because of the negative-cost dicircuit.

163


Validity of Ford’s Algorithm

Lemma 5.11.

If there is no negative-cost dicircuit, then at any stage of the algorithm:

a if yv ≠ ∞, then yv is the cost of some simple dipath from r to v;

b if p(v) ≠ null, then p defines a simple r-v-dipath of cost at most yv.

Proof: . . .

Theorem 5.12.

If there is no negative-cost dicircuit, then Ford's Algorithm terminates after a finite number of iterations. At termination, y is a feasible potential with yr = 0 and, for each node v ∈ V, p defines a least-cost r-v-dipath.

Proof: . . .

164

Feasible Potentials and Negative-Cost Dicircuits

Theorem 5.13.

A digraph D = (V, A) with arc costs c ∈ RA has a feasible potential if and only if there is no negative-cost dicircuit.

Proof: . . .

Remarks:

I If there is a dipath but no least-cost dipath from r to v, it is because there are arbitrarily cheap nonsimple r-v-dipaths.

I Finding a least-cost simple dipath from r to v is, however, difficult (see later).

Lemma 5.14.

If c is integer-valued, C := 2 max_{a∈A} |ca| + 1, and there is no negative-cost dicircuit, then Ford's Algorithm terminates after at most C n2 iterations.

Proof: Exercise.

165


Feasible Potentials and Linear Programming

As a consequence of Ford's Algorithm we get:

Theorem 5.15.

Let D = (V, A) be a digraph, r, s ∈ V, and c ∈ RA. If, for every v ∈ V, there exists a least-cost dipath from r to v, then

min{c(P) | P an r-s-dipath} = max{ys − yr | y a feasible potential}.

Formulate the right-hand side as a linear program and consider the dual:

max  ys − yr                      min  cT · x
s.t. yw − yv ≤ c(v,w)             s.t. ∑_{a∈δ−(v)} xa − ∑_{a∈δ+(v)} xa = bv   for all v ∈ V
     for all (v, w) ∈ A                xa ≥ 0   for all a ∈ A

with bs = 1, br = −1, and bv = 0 for all v ∉ {r, s}.

Notice: The dual is the LP relaxation of an ILP formulation of the shortest r-s-dipath problem (xa = number of times a shortest r-s-dipath uses arc a).

166

Bases of Shortest Path LP

Consider again the dual LP:

min  cT · x
s.t. ∑_{a∈δ−(v)} xa − ∑_{a∈δ+(v)} xa = bv   for all v ∈ V
     xa ≥ 0   for all a ∈ A

The underlying matrix Q is the incidence matrix of D.

Lemma 5.16.

Let D = (V, A) be a connected digraph and Q its incidence matrix. A subset of columns of Q indexed by a subset of arcs F ⊆ A forms a basis of the linear subspace of Rn spanned by the columns of Q if and only if F is the arc-set of a spanning tree of D.

Proof: Exercise.

167


Refinement of Ford’s Algorithm

Ford’s Algorithm

i Set yr := 0, p(r) := r , yv :=∞, and p(v) := null, for all v ∈ V \ {r}.ii While there is an arc a = (v ,w) ∈ A with yw > yv + c(v ,w), set

yw := yv + c(v ,w) and p(w) := v .

I # iterations crucially depends on order in which arcs are chosen.

I Suppose that arcs are chosen in order S = f1, f2, f3, . . . , f`.

I Dipath P is embedded in S if P’s arc sequence is a subsequence of S.

Lemma 5.17.

If an r -v -dipath P is embedded in S, then yv ≤ c(P) after Ford’sAlgorithm has gone through the sequence S.

Proof: . . .

Goal: Find short sequence S such that, for all v ∈ V , a least-costr -v -dipath is embedded in S.

168

Ford-Bellman Algorithm

Basic idea:

I Every simple dipath is embedded in S1, S2, . . . , Sn−1 where, for all i, Si is an ordering of A.

I This yields a shortest path algorithm with running time O(nm).

Ford-Bellman Algorithm

i initialize y, p (see Ford's Algorithm);

ii for i = 1 to n − 1 do

iii   for all a = (v, w) ∈ A do

iv     if yw > yv + c(v,w), then set yw := yv + c(v,w) and p(w) := v;

Theorem 5.18.

The algorithm runs in O(nm) time. If, at termination, y is a feasible potential, then p yields a least-cost r-v-dipath for each v ∈ V. Otherwise, the given digraph contains a negative-cost dicircuit.

169
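A Python sketch of the Ford-Bellman Algorithm, including the final check whether y is a feasible potential (names are my own):

```python
import math

def ford_bellman(nodes, arcs, r):
    """Ford-Bellman: arcs are (v, w, cost); n-1 passes over all arcs.
    Returns (y, p), or raises if a negative-cost dicircuit exists."""
    y = {v: math.inf for v in nodes}
    p = {v: None for v in nodes}
    y[r] = 0
    p[r] = r
    for _ in range(len(nodes) - 1):        # i = 1, ..., n-1
        for v, w, c in arcs:
            if y[v] + c < y[w]:
                y[w] = y[v] + c
                p[w] = v
    for v, w, c in arcs:                   # is y a feasible potential?
        if y[v] + c < y[w]:
            raise ValueError("negative-cost dicircuit")
    return y, p
```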


Acyclic Digraphs and Topological Orderings

Definition 5.19.

Consider a digraph D = (V ,A).

a An ordering v1, v2, . . . , vn of V so that i < j for each (vi, vj) ∈ A is called a topological ordering.

b If D has a topological ordering, then D is called acyclic.

Observations:

I Digraph D is acyclic if and only if it does not contain a dicircuit.

I Let D be acyclic and S an ordering of A such that (vi, vj) precedes (vk, vℓ) if i < k. Then every dipath of D is embedded in S.

Theorem 5.20.

The shortest path problem on acyclic digraphs can be solved in time O(m).

Proof: . . .

170

Dijkstra’s AlgorithmConsider the special case of nonnegative costs, i. e., ca ≥ 0, for each a ∈ A.

Dijkstra’s Algorithm

i initialize y , p (see Ford’s Algorithm); set S := V ;

ii while S 6= ∅ do

iii choose v ∈ S with yv minimum and delete v from S ;

iv for each w ∈ V with (v ,w) ∈ A do

v if yw > yv + c(v ,w), then set yw := yv + c(v ,w) and p(w) := v ;

Example:

r

a

b

p

q

2

4

1

3

2

2 4

3

r

a p

b q

2 ∞

0

2

4

3

6171
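A Python sketch of a heap-based implementation (cf. Theorem 5.22 below); names are my own:

```python
import heapq, math

def dijkstra(graph, r):
    """Dijkstra's Algorithm; graph[v] is a list of (w, cost) pairs with
    cost >= 0. Returns the least-cost labels y."""
    y = {v: math.inf for v in graph}
    y[r] = 0
    heap = [(0, r)]
    done = set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue              # stale heap entry, skip
        done.add(v)               # v is removed from S
        for w, c in graph[v]:
            if d + c < y[w]:
                y[w] = d + c
                heapq.heappush(heap, (y[w], w))
    return y
```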


Correctness of Dijkstra’s Algorithm

Lemma 5.21.

For each w ∈ V, let y′w be the value of yw when w is removed from S. If u is deleted from S before v, then y′u ≤ y′v.

Proof: . . .

Theorem 5.22.

If c ≥ 0, then Dijkstra's Algorithm solves the shortest paths problem correctly in time O(n2). A heap-based implementation yields running time O(m log n).

Proof: . . .

Remark: The for-loop in Dijkstra's Algorithm (step iv) can be modified such that only arcs (v, w) with w ∈ S are considered.

172

Feasible Potentials and Nonnegative Costs

Observation 5.23.

For given arc costs c ∈ RA and node potential y ∈ RV, define arc costs c′ ∈ RA by c′(v,w) := c(v,w) + yv − yw. Then, for all v, w ∈ V, a least-cost v-w-dipath w.r.t. c is a least-cost v-w-dipath w.r.t. c′, and vice versa.

Proof: Notice that for any v-w-dipath P it holds that

c′(P) = c(P) + yv − yw.

Corollary 5.24.

For given arc costs c ∈ RA (not necessarily nonnegative) and a given feasible potential y ∈ RV, one can use Dijkstra's Algorithm to solve the shortest paths problem.

Definition 5.25.

For a digraph D = (V, A), arc costs c ∈ RA are called conservative if there is no negative-cost dicircuit in D, i. e., if there is a feasible potential y ∈ RV.

173


Efficient Algorithms

What is an efficient algorithm?

I efficient: consider running time

I algorithm: Turing Machine or other formal model of computation

Simplified Definition

An algorithm consists of

I “elementary steps” like, e. g., variable assignments

I simple arithmetic operations

which only take a constant amount of time. The running time of the algorithm on a given input is the number of such steps and operations.

174

Bit Model and Arithmetic Model

Two ways of measuring the running time and the size of the input I of A:

Bit Model

Count bit operations; e. g., adding two n-bit numbers takes n (+1) steps; multiplying them takes O(n2) steps.

Size of input I is the total number of bits needed to encode "structure" and numbers.

Arithmetic Model

Simple arithmetic operations on arbitrary numbers can be performed in constant time.

Size of input I is the total number of bits needed to encode "structure" plus the number of numbers in the input.

175


Polynomial vs. Strongly Polynomial Running Time

Definition 5.26.

i An algorithm runs in polynomial time if, in the bit model, its (worst-case) running time is polynomially bounded in the input size.

ii An algorithm runs in strongly polynomial time if, in the bit model as well as in the arithmetic model, its (worst-case) running time is polynomially bounded in the input size.

Examples:

I Prim's and Kruskal's Algorithm as well as the Ford-Bellman Algorithm and Dijkstra's Algorithm run in strongly polynomial time.

I The Euclidean Algorithm runs in polynomial time but not in strongly polynomial time.

176

Pseudopolynomial Running Time

I In the bit model, we assume that numbers are binary encoded, i. e., the encoding of the number n ∈ N needs ⌊log n⌋ + 1 bits.

I Thus, the running time bound O(C n2) of Ford's Algorithm where C := 2 max_{a∈A} |ca| + 1 is not polynomial in the input size.

I If we assume, however, that numbers are unary encoded, then C n2 is polynomially bounded in the input size.

Definition 5.27.

An algorithm runs in pseudopolynomial time if, in the bit model with unary encoding of numbers, its (worst-case) running time is polynomially bounded in the input size.

177


Chapter 6:
Maximum Flow Problems

(cp. Cook, Cunningham, Pulleyblank & Schrijver, Chapter 3)

178

Maximum s-t-Flow Problem

Given: Digraph D = (V, A), arc capacities u ∈ RA≥0, nodes s, t ∈ V.

Definition 6.1.

A flow in D is a vector x ∈ RA≥0. Moreover, a flow x in D

i obeys arc capacities and is called feasible if xa ≤ ua for each a ∈ A;

ii has excess exx(v) := x(δ−(v)) − x(δ+(v)) at node v ∈ V;

iii satisfies flow conservation at node v ∈ V if exx(v) = 0;

iv is a circulation if it satisfies flow conservation at each node v ∈ V;

v is an s-t-flow of value exx(t) if it satisfies flow conservation at each node v ∈ V \ {s, t} and if exx(t) ≥ 0.

The maximum s-t-flow problem asks for a feasible s-t-flow in D of maximum value.

179


s-t-Flows and s-t-Cuts

For a subset of nodes U ⊆ V, the excess of U is defined as

ex_x(U) := x(δ^−(U)) − x(δ^+(U)).

Lemma 6.2.

For a flow x and a subset of nodes U it holds that ex_x(U) = ∑_{v∈U} ex_x(v). In particular, the value of an s-t-flow x is equal to

ex_x(t) = −ex_x(s) = ex_x(U) for each U ⊆ V \ {s} with t ∈ U.

Proof: . . .

For U ⊆ V \ {s} with t ∈ U, the subset of arcs δ^−(U) is called an s-t-cut.

Lemma 6.3.

Let U ⊆ V \ {s} with t ∈ U. The value of a feasible s-t-flow x is at most the capacity u(δ^−(U)) of the s-t-cut δ^−(U). Equality holds if and only if x_a = u_a for each a ∈ δ^−(U) and x_a = 0 for each a ∈ δ^+(U).

Proof: . . .

180

Residual Graph and Residual Arcs

For a = (v, w) ∈ A, let a^{−1} := (w, v) be the corresponding backward arc and A^{−1} := {a^{−1} | a ∈ A}.

Definition 6.4.

For a feasible flow x, the set of residual arcs is given by

A_x := {a ∈ A | x_a < u_a} ∪ {a^{−1} ∈ A^{−1} | x_a > 0}.

Moreover, the digraph D_x := (V, A_x) is called the residual graph of x.

Remark: A dipath in D_x is called an x-augmenting path.

Lemma 6.5.

If x is a feasible s-t-flow such that D_x does not contain an s-t-dipath, then x is a maximum s-t-flow.

Proof: . . .

181


Residual Capacities

Definition 6.6.

Let x be a feasible flow. For a ∈ A, define

u_x(a) := u(a) − x(a) if a ∈ A_x, and u_x(a^{−1}) := x(a) if a^{−1} ∈ A_x.

The value u_x(a) is called the residual capacity of arc a ∈ A_x.

Observations:

I If x is a feasible flow in (D, u) and y a feasible flow in (D_x, u_x), then

z(a) := x(a) + y(a) − y(a^{−1}) for a ∈ A

yields a feasible flow z in D (we write z := x + y for short).

I If x, z are feasible flows in (D, u), then

y(a) := max{0, z(a) − x(a)}, y(a^{−1}) := max{0, x(a) − z(a)} for a ∈ A

yields a feasible flow y in D_x (we write y := z − x for short).

182

Max-Flow Min-Cut Theorem and Ford-Fulkerson Algorithm

Theorem 6.7.

The maximum s-t-flow value equals the minimum capacity of an s-t-cut.

Proof: . . .

Corollary.

A feasible s-t-flow x is maximum if and only if D_x does not contain an s-t-dipath.

Ford-Fulkerson Algorithm

i set x := 0;

ii while there is an s-t-dipath P in D_x

iii set x := x + δ · χ_P with δ := min{u_x(a) | a ∈ P};

Here, χ_P : A → {0, 1, −1} is the characteristic vector of the dipath P, defined by

χ_P(a) = 1 if a ∈ P, −1 if a^{−1} ∈ P, 0 otherwise,

for all a ∈ A.

183
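As an illustration, here is a minimal Python sketch (not from the slides) of the Ford-Fulkerson scheme on an adjacency-dict encoding of (D, u); the name `max_flow` and the input format are illustrative choices. Since the augmenting path is chosen by BFS, this is in fact the Edmonds-Karp rule discussed two slides below.

```python
from collections import deque

def max_flow(n, arcs, s, t):
    """Ford-Fulkerson with BFS path choice (= Edmonds-Karp rule).

    n: number of nodes 0..n-1; arcs: list of (v, w, capacity);
    returns the value of a maximum s-t-flow.  Residual capacities
    u_x are kept in nested dicts: ux[v][w].
    """
    ux = [dict() for _ in range(n)]
    for v, w, cap in arcs:
        ux[v][w] = ux[v].get(w, 0) + cap
        ux[w].setdefault(v, 0)                   # backward (residual) arc
    value = 0
    while True:
        # BFS for an s-t-dipath in the residual graph D_x
        pred = {s: None}
        queue = deque([s])
        while queue and t not in pred:
            v = queue.popleft()
            for w, cap in ux[v].items():
                if cap > 0 and w not in pred:
                    pred[w] = v
                    queue.append(w)
        if t not in pred:                        # no augmenting path: x is maximum
            return value
        path, w = [], t
        while pred[w] is not None:               # reconstruct the dipath P
            path.append((pred[w], w))
            w = pred[w]
        delta = min(ux[v][w] for v, w in path)   # bottleneck residual capacity
        for v, w in path:                        # x := x + delta * chi_P
            ux[v][w] -= delta
            ux[w][v] += delta
        value += delta
```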


Termination of the Ford-Fulkerson Algorithm

Theorem 6.8.

a If all capacities are rational, then the algorithm terminates with a maximum s-t-flow.

b If all capacities are integral, it finds an integral maximum s-t-flow.

Proof: . . .

When an arbitrary x-augmenting path is chosen in every iteration, the Ford-Fulkerson Algorithm can behave badly:

[Figure: digraph on nodes s, v, w, t with arcs (s, v), (s, w), (v, t), (w, t) of capacity 10^k each and an arc of capacity 1 between v and w; alternately augmenting across the middle arc increases the flow value by only 1 per iteration]

Remark: There exist instances with finite irrational capacities where the Ford-Fulkerson Algorithm never terminates and the flow value converges to a value that is strictly smaller than the maximum flow value (see ex.).

184

Running Time of the Ford-Fulkerson Algorithm

Theorem 6.9.

If all capacities are integral and the maximum flow value is K < ∞, then the Ford-Fulkerson Algorithm terminates after at most K iterations. Its running time is O(m · K) in this case.

Proof: In each iteration the flow value is increased by at least 1.

A variant of the Ford-Fulkerson Algorithm is the Edmonds-Karp Algorithm:

I In each iteration, choose a shortest s-t-dipath in D_x (using BFS).

Theorem 6.10.

The Edmonds-Karp Algorithm terminates after at most n · m iterations; its running time is O(n · m^2).

Proof: . . .

Remark: The Edmonds-Karp Algorithm can be implemented with running time O(n^2 · m).

185


Application: König’s Theorem

Definition 6.11.

Consider an undirected graph G = (V, E).

i A matching in G is a subset of edges M ⊆ E with e ∩ e′ = ∅ for all e, e′ ∈ M with e ≠ e′.

ii A node cover is a subset of nodes C ⊆ V with e ∩ C ≠ ∅ for all e ∈ E.

Observation: In a bipartite graph G = (P ∪ Q, E), a maximum cardinality matching can be found by a maximum flow computation.

Theorem 6.12.

In bipartite graphs, the maximum cardinality of a matching equals the minimum cardinality of a node cover.

Proof: . . .

186
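The reduction behind the observation above can be sketched in a few lines; the helper name and encoding are illustrative, and the returned network can be fed to any max-flow routine (e.g. the `max_flow` sketch above), whose integral optimum then yields a maximum matching.

```python
def bipartite_matching_network(p_nodes, q_nodes, edges):
    """Reduce maximum bipartite matching to the maximum s-t-flow problem.

    A source s points to every node of P, every node of Q points to a
    sink t, and each edge {p, q} becomes an arc (p, q); all capacities
    are 1.  By Theorem 6.8 an integral maximum flow exists, and its
    unit-flow arcs between P and Q form a maximum cardinality matching.
    Assumes the labels in p_nodes and q_nodes are pairwise distinct.
    Returns (number of nodes, arc list, s, t) with nodes relabeled.
    """
    idx = {v: i + 1 for i, v in enumerate(list(p_nodes) + list(q_nodes))}
    s, t = 0, len(idx) + 1
    arcs = [(s, idx[p], 1) for p in p_nodes]
    arcs += [(idx[p], idx[q], 1) for p, q in edges]
    arcs += [(idx[q], t, 1) for q in q_nodes]
    return t + 1, arcs, s, t

# n, arcs, s, t = bipartite_matching_network(['p1', 'p2'], ['q1'],
#                                            [('p1', 'q1'), ('p2', 'q1')])
# max_flow(n, arcs, s, t)  # == 1 = maximum matching size
```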

Arc-Based LP Formulation

Straightforward LP formulation of the maximum s-t-flow problem:

max ∑_{a∈δ^+(s)} x_a − ∑_{a∈δ^−(s)} x_a

s.t. ∑_{a∈δ^−(v)} x_a − ∑_{a∈δ^+(v)} x_a = 0 for all v ∈ V \ {s, t}

x_a ≤ u(a) for all a ∈ A

x_a ≥ 0 for all a ∈ A

Dual LP:

min ∑_{a∈A} u(a) · z_a

s.t. y_w − y_v + z_{(v,w)} ≥ 0 for all (v, w) ∈ A

y_s = 1, y_t = 0

z_a ≥ 0 for all a ∈ A

187


Dual Solutions and s-t-Cuts

min ∑_{a∈A} u(a) · z_a

s.t. y_w − y_v + z_{(v,w)} ≥ 0 for all (v, w) ∈ A

y_s = 1, y_t = 0

z_a ≥ 0 for all a ∈ A

Observation: An s-t-cut δ^+(U) (with U ⊆ V \ {t}, s ∈ U) yields a feasible dual solution (y, z) of value u(δ^+(U)):

I let y be the characteristic vector χ_U of U (i. e., y_v = 1 for v ∈ U, y_v = 0 for v ∈ V \ U)

I let z be the characteristic vector χ_{δ^+(U)} of δ^+(U) (i. e., z_a = 1 for a ∈ δ^+(U), z_a = 0 for a ∈ A \ δ^+(U))

Theorem 6.13.

There exists an s-t-cut δ^+(U) (with U ⊆ V \ {t}, s ∈ U) such that the corresponding dual solution (y, z) is an optimal dual solution.

Proof: . . .

188

Flow Decomposition

Theorem 6.14.

For an s-t-flow x in D, there exist s-t-dipaths P_1, . . . , P_k and dicircuits C_1, . . . , C_ℓ in D with k + ℓ ≤ m, together with values y_{P_1}, . . . , y_{P_k}, y_{C_1}, . . . , y_{C_ℓ} ≥ 0, such that

x_a = ∑_{i : a∈P_i} y_{P_i} + ∑_{j : a∈C_j} y_{C_j} for all a ∈ A.

Moreover, the value of x is ∑_{i=1}^{k} y_{P_i}.

Proof: . . .

Observation: For an s-t-flow x and a flow decomposition as in Theorem 6.14, let

x′_a := ∑_{i : a∈P_i} y_{P_i} for all a ∈ A.

Then x′ is an s-t-flow of the same value as x, and x′_a ≤ x_a for all a ∈ A.

189
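The constructive proof idea of Theorem 6.14 translates directly into code; the following Python sketch assumes x is given as a dict on arcs and satisfies flow conservation at all nodes except s and t.

```python
def decompose(flow, s, t):
    """Flow decomposition in the spirit of Theorem 6.14 (sketch).

    Repeatedly walk along arcs carrying positive flow: a walk started
    at s either reaches t (an s-t-dipath) or revisits a node (a
    dicircuit).  Subtracting the bottleneck value zeroes at least one
    arc per extraction, so at most m = |A| pieces are produced.
    Returns (paths, cycles) as lists of (node sequence, weight).
    """
    flow = {a: x for a, x in flow.items() if x > 0}

    def succ(v):
        return next((a for a in flow if a[0] == v), None)

    paths, cycles = [], []
    while flow:
        start = s if succ(s) else next(iter(flow))[0]
        walk, pos = [start], {start: 0}
        while True:
            w = succ(walk[-1])[1]       # flow conservation: never stuck
            if w in pos:                # walk closed a dicircuit
                seq, pieces = walk[pos[w]:] + [w], cycles
                break
            walk.append(w)
            pos[w] = len(walk) - 1
            if w == t and start == s:   # walk is an s-t-dipath
                seq, pieces = walk, paths
                break
        arcs = list(zip(seq, seq[1:]))
        y = min(flow[a] for a in arcs)  # weight y_P resp. y_C
        pieces.append((seq, y))
        for a in arcs:
            flow[a] -= y
            if flow[a] == 0:
                del flow[a]
    return paths, cycles
```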


Path-Based LP Formulation

Let P be the set of all s-t-dipaths in D.

max ∑_{P∈P} y_P

s.t. ∑_{P∈P : a∈P} y_P ≤ u(a) for all a ∈ A

y_P ≥ 0 for all P ∈ P

Dual LP:

min ∑_{a∈A} u(a) · z_a

s.t. ∑_{a∈P} z_a ≥ 1 for all P ∈ P

z_a ≥ 0 for all a ∈ A

Remark. Notice that |P|, and thus the number of variables of the primal LP and the number of constraints of the dual LP, can be exponential in n and m.

190

Dual Solutions and s-t-Cuts

min ∑_{a∈A} u(a) · z_a

s.t. ∑_{a∈P} z_a ≥ 1 for all P ∈ P

z_a ≥ 0 for all a ∈ A

Observation: An s-t-cut δ^+(U) (with U ⊆ V \ {t}, s ∈ U) yields a feasible dual solution z of value u(δ^+(U)):

I let z be the characteristic vector χ_{δ^+(U)} of δ^+(U) (i. e., z_a = 1 for a ∈ δ^+(U), z_a = 0 for a ∈ A \ δ^+(U))

Theorem 6.15.

There exists an s-t-cut δ^+(U) (with U ⊆ V \ {t}, s ∈ U) such that the corresponding dual solution z is an optimal dual solution.

191


Maximum s-t-Flows: Another Algorithmic Approach

I A feasible flow x is a maximum s-t-flow if it fulfills the following two conditions:

i ex_x(v) = 0 for all v ∈ V \ {s, t}; (flow conservation)

ii there is no s-t-dipath in D_x.

I Ford-Fulkerson and Edmonds-Karp always fulfill the first condition and terminate as soon as the second condition is fulfilled.

I The Goldberg-Tarjan Algorithm (or Push-Relabel Algorithm, or Preflow-Push Algorithm) always fulfills the second condition and terminates as soon as the first condition is fulfilled.

Definition 6.16.

i A flow x is called a preflow if ex_x(v) ≥ 0 for all v ∈ V \ {s}.

ii A node v ∈ V \ {s, t} is called active if ex_x(v) > 0.

192

Valid Labelings and Admissible Arcs

Definition 6.17.

Let x be a preflow. A function d : V → Z_{≥0} with d(s) = n and d(t) = 0 is called a valid labeling if d(v) ≤ d(w) + 1 for all (v, w) ∈ A_x.
An arc (v, w) ∈ A_x is called admissible if v is active and d(v) = d(w) + 1.

For v, w ∈ V, let d_x(v, w) denote the length of a shortest v-w-dipath in D_x.

Observation 6.18.

Let x be a feasible preflow and d a valid labeling. Then d_x(v, t) ≥ d(v).

Lemma 6.19.

Let x be a feasible preflow and d a valid labeling.

a There is a v -s-dipath in Dx for every active node v .

b There is no s-t-dipath in D_x.

Proof: . . .

193


Goldberg-Tarjan Algorithm

Goldberg-Tarjan Algorithm

1 for a ∈ δ^+(s) set x(a) := u(a); for a ∈ A \ δ^+(s) set x(a) := 0;
set d(s) := n; for v ∈ V \ {s} set d(v) := 0;

2 while there is an active node v do

3 if there is no admissible arc a ∈ δ^+_{D_x}(v) then Relabel(v);
choose an admissible arc a ∈ δ^+_{D_x}(v) and Push(a);

Relabel(v)

set d(v) := min{d(w) + 1 | (v, w) ∈ A_x};

Push(a)

augment x along arc a by γ := min{ex_x(v), u_x(a)}; (a ∈ δ^+_{D_x}(v))

194
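A compact Python sketch (not from the slides) of the Goldberg-Tarjan Algorithm; for readability it scans all nodes when searching for an admissible arc instead of maintaining current-arc pointers, so it illustrates correctness rather than the O(n^2 · m) bound.

```python
def push_relabel(n, cap, s, t):
    """Goldberg-Tarjan (preflow-push) sketch.

    n: nodes 0..n-1; cap: dict (v, w) -> capacity.  Maintains residual
    capacities ux, excesses ex, and a valid labeling d; returns the
    maximum flow value ex[t] on termination (cf. Corollary 6.21).
    """
    ux = {}
    for (v, w), c in cap.items():
        ux[(v, w)] = ux.get((v, w), 0) + c
        ux.setdefault((w, v), 0)
    ex, d = [0] * n, [0] * n
    d[s] = n
    for w in range(n):                       # saturate all arcs leaving s
        c = ux.get((s, w), 0)
        if c > 0:
            ux[(s, w)] = 0
            ux[(w, s)] += c
            ex[w] += c
    active = [v for v in range(n) if v not in (s, t) and ex[v] > 0]
    while active:
        v = active[-1]
        for w in range(n):
            if ux.get((v, w), 0) > 0 and d[v] == d[w] + 1:  # admissible arc
                gamma = min(ex[v], ux[(v, w)])              # Push((v, w))
                ux[(v, w)] -= gamma
                ux[(w, v)] += gamma
                ex[v] -= gamma
                ex[w] += gamma
                if w not in (s, t) and w not in active:
                    active.append(w)
                break
        else:                                               # Relabel(v)
            d[v] = min(d[w] + 1 for w in range(n) if ux.get((v, w), 0) > 0)
        if ex[v] == 0:
            active.remove(v)
    return ex[t]
```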

Analysis of the Goldberg-Tarjan Algorithm

Lemma 6.20.

At any stage of the algorithm, x is a feasible preflow and d a valid labeling.

Proof: . . .

Corollary 6.21.

After termination of the algorithm, x is a maximum s-t-flow.

Proof: . . .

Lemma 6.22.

a A label d(v) is never decreased by the algorithm; calling Relabel(v) strictly increases d(v).

b d(v) ≤ 2n − 1 throughout the algorithm.

c The number of Relabel operations is at most 2n^2.

Proof: . . .

195


Bounding the Number of Push Operations

We distinguish two types of Push operations:

I A Push operation on arc a is called saturating if, after the Push, arc a has disappeared from the residual graph D_x.

I Otherwise, the Push operation is called nonsaturating; in this case, the node v with a ∈ δ^+(v) is no longer active.

Lemma 6.23.

The number of saturating Push operations is in O(m · n).

Proof: . . .

Lemma 6.24.

The number of nonsaturating Push operations is in O(m · n^2).

Lemma 6.25.

If the algorithm always chooses an active node v with d(v) maximum, then the number of nonsaturating Push operations is in O(n^3).

Proof: . . .

196

Running Time of the Goldberg-Tarjan Algorithm

Theorem 6.26.

The Goldberg-Tarjan Algorithm finds a maximum s-t-flow in O(n^2 · m) time.

Theorem 6.27.

If the algorithm always chooses an active node v with d(v) maximum, its running time is O(n^3).

Remark: If the algorithm always chooses an active node v with d(v) maximum, one can show that the number of nonsaturating Push operations, and thus the total running time, is in O(n^2 · √m).

197


Chapter 7: Minimum-Cost Flow Problems

(cp. Cook, Cunningham, Pulleyblank & Schrijver, Chapter 4; Korte & Vygen, Chapter 9)

198

b-Transshipments and Costs

Given: Digraph D = (V, A), capacities u : A → R_{≥0}, arc costs c : A → R

Definition 7.1.

i Let b : V → R. A flow x is called a b-transshipment if

ex_x(v) = b(v) for all v ∈ V.

ii The cost of a flow x is defined as c(x) := ∑_{a∈A} c(a) · x(a).

Observation 7.2.

If there is a feasible b-transshipment, then one can be found by a maximum flow computation.

Proof: . . .

Remark. The existence of a b-transshipment implies that ∑_{v∈V} b(v) = 0.

199


Minimum-Cost b-Transshipment Problem

Minimum-cost b-transshipment problem

Given: D = (V, A), u : A → R_{≥0}, c : A → R, b : V → R

Task: find a feasible b-transshipment of minimum cost

Special cases:

I min-cost s-t-flow problem (for given flow value)

I min-cost circulation problem

Cost of residual arcs: For a given feasible flow x, we extend the cost function c to A_x by defining

c(a^{−1}) := −c(a) for a ∈ A.

200

Optimality Criteria

Theorem 7.3.

A feasible b-transshipment x has minimum cost among all feasible b-transshipments if and only if each dicircuit of D_x has nonnegative cost.

Proof: . . .

Theorem 7.4.

A feasible b-transshipment x has minimum cost among all feasible b-transshipments if and only if there is a feasible potential y ∈ R^V, i. e.,

y_v + c((v, w)) ≥ y_w for all (v, w) ∈ A_x.

Proof: . . .

201


Alternative Proof of Theorem 7.4

Consider the LP formulation of the min-cost b-transshipment problem:

min ∑_{a∈A} c(a) · x_a

s.t. ∑_{a∈δ^−(v)} x_a − ∑_{a∈δ^+(v)} x_a = b(v) for all v ∈ V

x_a ≤ u(a) for all a ∈ A

x_a ≥ 0 for all a ∈ A

Dual LP:

max ∑_{v∈V} b(v) · y_v + ∑_{a∈A} u(a) · z_a

s.t. y_w − y_v + z_{(v,w)} ≤ c((v, w)) for all (v, w) ∈ A

z_a ≤ 0 for all a ∈ A

The result follows from the complementary slackness conditions.

202

Negative-Cycle Canceling Algorithm

Negative-cycle canceling algorithm

i compute a feasible b-transshipment x or determine that none exists;

ii while there is a negative-cost dicircuit C in D_x

iii set x := x + δ · χ_C with δ := min{u_x(a) | a ∈ C};

Remarks:

I The negative-cost dicircuit C in step (ii) can be found in O(n · m) time by the Ford-Bellman Algorithm.

I However, the number of iterations is, in general, only pseudopolynomially bounded in the input size.

I If arc capacities and b-values are integral, the algorithm returns an integral min-cost b-transshipment.

203
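The subroutine at the heart of step (ii) is negative-cycle detection; below is a hedged Python sketch of the Ford-Bellman-based approach (names illustrative). Applied to the residual graph D_x with costs c(a) and c(a^{-1}) = -c(a), it either returns a dicircuit to cancel or, by Theorem 7.3, certifies optimality.

```python
def negative_dicircuit(nodes, arcs):
    """Find a negative-cost dicircuit via Ford-Bellman, or return None.

    arcs: list of (v, w, cost).  Starting all labels at 0 simulates a
    virtual source with 0-cost arcs to every node, so every negative
    dicircuit is detectable.  If some label still improves in round n,
    walking the predecessor pointers n times lands on such a dicircuit.
    """
    dist = {v: 0 for v in nodes}
    pred = {v: None for v in nodes}
    last = None
    for _ in range(len(nodes)):              # n relaxation rounds
        last = None
        for v, w, c in arcs:
            if dist[v] + c < dist[w]:
                dist[w] = dist[v] + c
                pred[w] = v
                last = w
    if last is None:                         # no update in round n: no cycle
        return None
    for _ in range(len(nodes)):              # move onto the dicircuit itself
        last = pred[last]
    cycle, v = [last], pred[last]
    while v != last:
        cycle.append(v)
        v = pred[v]
    cycle.reverse()
    return cycle                             # node sequence of the dicircuit
```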


Minimum-Mean-Cycle Canceling Algorithm

The mean cost of a dicircuit C in D_x is

∑_{a∈C} c(a) / |C|.

Theorem 7.5.

Choosing a minimum mean-cost dicircuit in step (ii), the number of iterations is in O(n · m^2 · log n).

Proof: . . .

Theorem 7.6.

A minimum mean-cost dicircuit can be found in O(n · m) time.

Proof: Exercise!

204

Running Time of Minimum-Mean-Cycle Canceling

Corollary 7.7.

A min-cost b-transshipment can be found in O(n^2 · m^3 · log n) time.

Remark:

I Goldberg and Tarjan showed that the running time of the minimum-mean-cycle canceling algorithm can be improved to O(n · m^2 · log^2 n).

205


Augmenting Flow Along Min-Cost Dipaths

Remark: In the following we assume without loss of generality that in a given min-cost b-transshipment problem

I all arc costs are nonnegative;

I there is a dipath of infinite capacity between every pair of nodes.

Theorem 7.8.

Let x be a feasible min-cost b-transshipment, s, t ∈ V, and P a min-cost s-t-dipath in D_x with bottleneck capacity u_x(P) := min_{a∈P} u_x(a). Then,

x + δ · χ_P with 0 ≤ δ ≤ u_x(P)

is a feasible min-cost b′-transshipment, where

b′(v) := b(v) + δ for v = t, b′(v) := b(v) − δ for v = s, and b′(v) := b(v) otherwise.

Proof: . . .

206

Successive Shortest Path Algorithm

In the following we assume that ∑_{v∈V} b(v) = 0.

Successive Shortest Path Algorithm

i set x := 0;

ii while b ≠ 0

iii find a min-cost s-t-dipath P in D_x for s, t ∈ V with b(s) < 0, b(t) > 0;

iv set δ := min{−b(s), b(t), u_x(P)} and
x := x + δ · χ_P, b(s) := b(s) + δ, b(t) := b(t) − δ;

Theorem 7.9.

If all arc capacities and b-values are integral, the Successive Shortest Path Algorithm terminates with an integral min-cost b-transshipment after at most (1/2) · ∑_{v∈V} |b(v)| iterations.

Proof: . . .

207
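A Python sketch of the Successive Shortest Path Algorithm (illustrative encoding); Ford-Bellman is used for the shortest-path step because residual arcs may carry negative costs, while Theorem 7.8 guarantees that D_x never contains a negative dicircuit.

```python
def successive_shortest_paths(n, arcs, b):
    """Successive Shortest Path Algorithm (sketch).

    n: nodes 0..n-1; arcs: list of (v, w, capacity, cost); b: list of
    node values with sum(b) == 0.  Residual arcs are stored as
    [target, residual capacity, cost, index of reverse entry].
    Returns the cost of a min-cost b-transshipment, or None if some
    b-value cannot be satisfied (excluded by the slides' assumptions).
    """
    b = list(b)
    adj = [[] for _ in range(n)]
    for v, w, cap, cost in arcs:
        adj[v].append([w, cap, cost, len(adj[w])])
        adj[w].append([v, 0, -cost, len(adj[v]) - 1])
    total = 0
    while any(b):
        s = next(v for v in range(n) if b[v] < 0)
        dist = [float("inf")] * n
        dist[s], pred = 0, [None] * n
        for _ in range(n - 1):               # Ford-Bellman from s in D_x
            for v in range(n):
                if dist[v] < float("inf"):
                    for e in adj[v]:
                        if e[1] > 0 and dist[v] + e[2] < dist[e[0]]:
                            dist[e[0]] = dist[v] + e[2]
                            pred[e[0]] = (v, e)
        sinks = [v for v in range(n) if b[v] > 0 and dist[v] < float("inf")]
        if not sinks:
            return None
        t = min(sinks, key=lambda v: dist[v])
        delta, v = min(-b[s], b[t]), t       # delta = min{-b(s), b(t), u_x(P)}
        while v != s:
            u, e = pred[v]
            delta = min(delta, e[1])
            v = u
        v = t
        while v != s:                        # x := x + delta * chi_P
            u, e = pred[v]
            e[1] -= delta
            adj[e[0]][e[3]][1] += delta
            total += delta * e[2]
            v = u
        b[s] += delta
        b[t] -= delta
    return total
```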


Capacity Scaling

For a flow x and ∆ > 0, let A^∆_x := {a ∈ A_x | u_x(a) ≥ ∆} and D^∆_x := (V, A^∆_x); set U := max_{a∈A} u(a).

Successive Shortest Path Algorithm with Capacity Scaling

i set x := 0, ∆ := 2^{⌊log U⌋}, p(v) := 0 for all v ∈ V;

ii while ∆ ≥ 1

iii for all a = (v, w) ∈ A^∆_x with c(a) < p(w) − p(v)

iv set b(v) := b(v) + u_x(a) and b(w) := b(w) − u_x(a);
augment x by sending u_x(a) units of flow along arc a;

v set S(∆) := {v ∈ V | b(v) ≤ −∆} and T(∆) := {v ∈ V | b(v) ≥ ∆};

vi while S(∆) ≠ ∅ and T(∆) ≠ ∅

vii find a min-cost s-t-dipath P in D^∆_x for some s ∈ S(∆), t ∈ T(∆);
set p to the vector of shortest (min-cost) path distances from s;
augment ∆ flow units along P in x; update b, S(∆), T(∆), D^∆_x;

viii ∆ := ∆/2;

208

Analysis of Running Time

Remark:

I Steps (iii)–(iv) ensure that optimality conditions are always fulfilled.

Theorem 7.10.

If all arc capacities and b-values are integral, the Successive Shortest Path Algorithm with Capacity Scaling terminates with an integral min-cost b-transshipment after at most O(m · log U) calls to a shortest path subroutine.

Proof: . . .

Remark:

I A variant of the Successive Shortest Path Algorithm with strongly polynomial running time can be obtained by a refined use of capacity scaling (Orlin 1988/1993).

209


Chapter 8: Complexity Theory

(cp. Cook, Cunningham, Pulleyblank & Schrijver, Chapter 9; Korte & Vygen, Chapter 15)

210

Efficient Algorithms: Historical Remark

[Images: quotations from Edmonds (1965) and Edmonds (1967); portrait of Jack Edmonds (1934–)]

211


Is There a Good Algorithm for the TSP?

212


Decision Problems

Most of complexity theory is based on decision problems such as, e. g.:

(Undirected) Hamiltonian Circuit Problem

Given: undirected graph G = (V ,E ).

Task: decide whether G contains a Hamiltonian circuit.

Definition 8.1.

i A decision problem is a pair P = (X, Y). The elements of X are called instances of P, the elements of Y ⊆ X are the yes-instances, those of X \ Y are no-instances.

ii An algorithm for a decision problem (X, Y) decides for a given x ∈ X whether x ∈ Y.

Example. For Hamiltonian Circuit, X is the set of all (undirected) graphs and Y ⊂ X is the subset of graphs containing a Hamiltonian circuit.

213


Further Examples of Decision Problems

(Integer) Linear Inequalities Problem

Given: matrix A ∈ Z^{m×n}, vector b ∈ Z^m.

Task: decide whether there is x ∈ Q^n (x ∈ Z^n) with A · x ≥ b.

Clique Problem

Given: undirected graph G = (V ,E ), positive integer k .

Task: decide whether G has k mutually adjacent nodes (i. e., a complete subgraph on k nodes, which is called a clique of size k).

Node Covering Problem

Given: undirected graph G = (V ,E ), positive integer k .

Task: decide whether there is a node cover C ⊆ V with |C| ≤ k.

214

Further Examples of Decision Problems (cont.)

Set Packing Problem

Given: finite set U, family of subsets S ⊆ 2U , positive integer k .

Task: decide whether there are k mutually disjoint subsets in S.

Set Covering Problem

Given: finite set U, family of subsets S ⊆ 2U , positive integer k .

Task: decide whether there is R ⊆ S with U = ⋃_{R∈R} R and |R| ≤ k.

Hitting Set Problem

Given: finite set U, family of subsets S ⊆ 2U , positive integer k .

Task: decide whether there is T ⊆ U with T ∩ S ≠ ∅ for all S ∈ S, and |T| ≤ k.

215


Further Examples of Decision Problems (cont.)

Node Coloring Problem

Given: undirected graph G = (V ,E ), positive integer k .

Task: decide whether the nodes of G can be colored with at most k colors such that adjacent nodes get different colors.

Spanning Tree Problem

Given: graph G = (V ,E ), edge weights w : E → Z, positive integer k.

Task: decide whether there is a spanning subtree of weight at most k .

Steiner Tree Problem

Given: graph G = (V, E), terminals T ⊆ V, edge weights w : E → Z, positive integer k.

Task: decide whether there is a subtree of G of weight at most k that contains all terminals in T.

216

Complexity Classes P and NP

Definition 8.2.

The class of all decision problems for which there is a deterministic polynomial time algorithm is denoted by P.

Example: The Spanning Tree Problem is in P.

Definition 8.3.

A decision problem P = (X, Y) belongs to the complexity class NP if there is a polynomial function p : Z → Z and a decision problem P′ = (X′, Y′) in P, where

X′ := {(x, c) | x ∈ X, c ∈ {0, 1}^{p(size(x))}},

such that Y = {x ∈ X | ∃ c : (x, c) ∈ Y′}. We call c with (x, c) ∈ Y′ a certificate for x.

Examples: The Hamiltonian Circuit Problem and all problems listed on the last three slides belong to NP.

217


Certificates and Nondeterministic Turing Machines

Remarks.

I The complexity class P consists of all decision problems that can be solved by a deterministic Turing machine in polynomial time.

I The complexity class NP consists of all decision problems that can be solved by a non-deterministic Turing machine in polynomial time.

I NP stands for Non-deterministic Polynomial.

I A certificate c for a problem instance x describes an accepting computation path of the non-deterministic Turing machine for x.

Lemma 8.4.

P ⊆ NP.

[Venn diagram: P contained in NP]

Proof: Deterministic Turing machines are a special case of non-deterministic Turing machines.

218

Polynomial Reductions

Definition 8.5.

Let P1 and P2 be decision problems. We say that P1 polynomially reduces to P2 if there exists a polynomial time oracle algorithm for P1 using an oracle for P2.

Remark.

I An oracle for P2 is a subroutine that can solve an instance of P2.

I Calling the oracle (subroutine) is counted as one elementary computation step.

Lemma 8.6.

Let P1 and P2 be decision problems. If P2 ∈ P and P1 polynomially reduces to P2, then P1 ∈ P.

Proof: Replace the oracle by a polynomial algorithm for P2.

219


Finding Hamiltonian Circuits via Hamiltonian Paths

Hamiltonian Path Problem

Given: undirected graph G = (V ,E ).

Task: decide whether G contains a Hamiltonian path.

Lemma 8.7.

The Hamiltonian Circuit Problem polynomially reduces to the Hamiltonian Path Problem.

Proof: The following oracle algorithm runs in polynomial time.

Oracle Algorithm.

i for every e = {v, w} ∈ E do

ii construct G′ := (V ∪ {s, t}, E ∪ {{s, v}, {w, t}});

iii ask the oracle whether there is a Hamiltonian path in G′;

iv if the oracle answers ‘yes’ then stop and output ‘yes’;

v output ‘no’;

220
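A direct transcription of this oracle algorithm into Python (sketch); `ham_path_oracle` is an assumed black box for the Hamiltonian Path Problem.

```python
def has_hamiltonian_circuit(nodes, edges, ham_path_oracle):
    """Oracle algorithm of Lemma 8.7 (sketch).

    For each edge {v, w}, attach fresh nodes s and t via edges {s, v}
    and {w, t}.  Any Hamiltonian path of the extended graph must have
    the degree-1 nodes s and t as endpoints, i.e. it runs s, v, ..., w, t;
    together with the edge {v, w} itself this closes a Hamiltonian
    circuit of G, and conversely.
    """
    s, t = object(), object()            # fresh nodes, distinct from V
    for v, w in edges:
        ext_nodes = list(nodes) + [s, t]
        ext_edges = list(edges) + [(s, v), (w, t)]
        if ham_path_oracle(ext_nodes, ext_edges):
            return True                  # output 'yes'
    return False                         # output 'no'
```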

Polynomial Transformations

Definition 8.8.

Let P1 = (X1, Y1) and P2 = (X2, Y2) be decision problems. We say that P1 polynomially transforms to P2 if there exists a function f : X1 → X2, computable in polynomial time, such that for all x ∈ X1:

x ∈ Y1 ⟺ f(x) ∈ Y2.

Lemma 8.9.

The Hamiltonian Circuit Problem polynomially transforms to the Hamiltonian Path Problem.

Proof: . . .

Remarks.

I A polynomial reduction is also called a Turing reduction.

I A polynomial transformation is also called a Karp reduction.

I Every Karp reduction is also a Turing reduction, but not vice versa.

I Both notions are transitive.

221


NP-Completeness

Definition 8.10.

A decision problem P ∈ NP is NP-complete if all other problems in NP polynomially transform to P.

Satisfiability Problem (SAT)

Given: Boolean variables x1, . . . , xn and a family of clauses, where each clause is a disjunction of Boolean variables or their negations.

Task: decide whether there is a truth assignment to x1, . . . , xn such that all clauses are satisfied.

Example: (x1 ∨ ¬x2 ∨ x3) ∧ (x2 ∨ ¬x3) ∧ (¬x1 ∨ x2)

222

Cook’s Theorem (1971)

Theorem 8.11.

The Satisfiability problem is NP-complete.

Stephen Cook (1939–)

Proof idea: SAT is obviously in NP. One can show that the computation of any polynomial-time non-deterministic Turing machine can be encoded as an instance of SAT.

223


Proving NP-Completeness

Lemma 8.12.

Let P1 and P2 be decision problems. If P1 is NP-complete, P2 ∈ NP, and P1 polynomially transforms to P2, then P2 is NP-complete.

Proof: As mentioned above, polynomial transformations are transitive.

3-Satisfiability Problem (3SAT)

Given: Boolean variables x1, . . . , xn and a family of clauses, where each clause is a disjunction of at most 3 Boolean variables or their negations.

Task: decide whether there is a truth assignment to x1, . . . , xn such that all clauses are satisfied.

Theorem 8.13.

3SAT is NP-complete.

Proof: . . .

224

Proving NP-Completeness (cont.)

Stable Set Problem

Given: undirected graph G = (V ,E ), positive integer k .

Task: decide whether G has k mutually non-adjacent nodes (i. e., a stable set of size k).

Theorem 8.14.

The Stable Set problem and the Clique problem are both NP-complete.

Proof: . . .

225


Transformations for Karp’s 21 NP-Complete Problems

[Diagram: tree of polynomial transformations among Karp’s 21 NP-complete problems]

Richard M. Karp (1972). Reducibility Among Combinatorial Problems

226

P vs. NP

Theorem 8.15.

If a decision problem P is NP-complete and P ∈ P, then P = NP.

Proof: See definition of NP-completeness and Lemma 8.6.

There are two possible scenarios for the shape of the complexity world:

[Diagram: scenario A shows P and the NP-complete problems NP-c as disjoint parts of NP; scenario B shows P = NP = NP-c]

I It is widely believed that P ≠ NP, i. e., scenario A holds.

I Deciding whether P = NP or P ≠ NP is one of the seven millennium prize problems established by the Clay Mathematics Institute in 2000.

227



Complexity Class coNP and coNP-Complete Problems

Definition 8.16.

i The complement of a decision problem P = (X, Y) is P̄ := (X, X \ Y).

ii coNP := {P̄ | P ∈ NP}.

iii A problem P ∈ coNP is coNP-complete if all problems in coNP polynomially transform to P.

Theorem 8.17.

i P is NP-complete if and only if P̄ is coNP-complete.

ii Unless NP = coNP, no coNP-complete problem is in NP.

iii Unless P = NP, there are problems in NP that are neither in P nor NP-complete.

229


Complexity Landscape (Widely Believed)

[Venn diagram: P lies inside NP ∩ coNP; the NP-complete problems sit at the far end of NP, the coNP-complete problems at the far end of coNP]

Remark.

I Almost all problems that are known to be in NP ∩ coNP are also known to be in P.

I One of the few exceptions used to be the problem Primes (given a positive integer, decide whether it is prime), before it was shown to be in P by Agrawal, Kayal, and Saxena in 2002.

230

The Asymmetry of Yes and No

Example: For a given TSP instance, does there exist a tour of cost ≤ 1573084?

Pictures taken from www.tsp.gatech.edu

231


Partition Problem

Partition Problem

Given: positive integers a1, . . . , an ∈ Z_{>0}.

Task: decide whether there exists S ⊆ {1, . . . , n} with

∑_{i∈S} a_i = ∑_{i∈{1,...,n}\S} a_i.

Theorem 8.18.

i The Partition problem can be solved in pseudopolynomial time.

ii The Partition Problem is NP-complete.

Proof of (i): . . .

232
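The pseudopolynomial algorithm of part (i) is the classical subset-sum dynamic program; a short Python sketch:

```python
def partition(a):
    """Pseudopolynomial dynamic program for Partition (Theorem 8.18 i).

    Decides whether some S with sum_{i in S} a_i = sum_{i not in S} a_i
    exists, in O(n * sum(a)) time: polynomial if the numbers are unary
    encoded, but exponential in the (binary) input size.
    """
    total = sum(a)
    if total % 2 == 1:
        return False
    reachable = {0}                      # achievable subset sums so far
    for x in a:
        reachable |= {r + x for r in reachable}
    return total // 2 in reachable

print(partition([3, 1, 1, 2, 2, 1]))     # True: 3 + 2 = 1 + 1 + 2 + 1
```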

Strongly NP-Complete Problems

Let P = (X, Y) be a decision problem and p : Z → Z a polynomial function.

I Let X_p ⊂ X denote the subset of instances x in which the absolute value of every (integer) number in the input x is at most p(size(x)).

I Let P_p := (X_p, Y ∩ X_p).

Definition 8.19.

A problem P ∈ NP is strongly NP-complete if P_p is NP-complete for some polynomial function p : Z → Z.

Theorem 8.20.

Unless P = NP, there is no pseudopolynomial algorithm for any strongly NP-complete problem P.

Proof: Such an algorithm would imply that the NP-complete problem P_p is in P and thus P = NP.

233


Strongly NP-Complete Problems (cont.)

Examples:

I The decision problem corresponding to the Traveling Salesperson Problem (TSP) is strongly NP-complete.

I The Partition problem is NP-complete but not strongly NP-complete (unless P = NP, by Theorem 8.20).

I The 3-Partition problem below is strongly NP-complete.

3-Partition Problem

Given: positive integers a1, . . . , a3n, B ∈ Z_{>0} with ∑_{i=1}^{3n} a_i = n · B.

Task: decide whether the index set {1, . . . , 3n} can be partitioned into n disjoint subsets S1, . . . , Sn such that ∑_{i∈S_j} a_i = B for j = 1, . . . , n.

Remark.

I The 3-Partition problem remains strongly NP-complete if one adds the requirement that B/4 < a_i < B/2 for all i = 1, . . . , 3n.

I Notice that each S_j must contain exactly three indices in this case.

234

NP-Hardness

Definition 8.21.

Let P be an optimization or decision problem.

i P is NP-hard if all problems in NP polynomially reduce to it.

ii P is strongly NP-hard if P_p is NP-hard for some polynomial function p.

Remarks.

I A sufficient criterion for an optimization problem to be (strongly) NP-hard is the (strong) NP-completeness or (strong) NP-hardness of the corresponding decision problem.

I It is open whether each NP-hard decision problem P ∈ NP is NP-complete (difference between Turing and Karp reductions!).

I An example of an NP-hard decision problem that does not appear to be in NP (and thus might not be NP-complete) is the following:

Given an instance of SAT, decide whether the majority of all truth assignments satisfy all clauses.

235


NP-Hardness (cont.)

Theorem 8.22.

Unless P = NP, there is no polynomial time algorithm for any NP-hard decision or optimization problem.

Proof: Clear.

Remark: It is thus widely believed that problems like the TSP and all other NP-hard optimization problems cannot be solved in polynomial time.

236

Complexity of Linear Programming

I As discussed in Chapter 4, so far no variant of the simplex method has been shown to have a polynomial running time.

I Therefore, the complexity of Linear Programming remained unresolved for a long time.

I Only in 1979, the Soviet mathematician Leonid Khachiyan proved that the so-called ellipsoid method, earlier developed for nonlinear optimization, can be modified in order to solve LPs in polynomial time.

I In November 1979, the New York Times featured Khachiyan and his algorithm in a front-page story.

I We will give a sketch of the ellipsoid method and its analysis.

I More details can, e. g., be found in the book of Bertsimas & Tsitsiklis (Chapter 8) or in the book Geometric Algorithms and Combinatorial Optimization by Grötschel, Lovász & Schrijver (Springer, 1988).

237


[Image: New York Times front page]

New York Times, Nov. 27, 1979

238

Geometric Basics: Positive Definite Matrices

Definition 8.23.

A symmetric matrix D ∈ R^{n×n} is positive definite if

x^T · D · x > 0 for all x ∈ R^n \ {0}.

Lemma 8.24.

For a symmetric matrix D ∈ R^{n×n}, the following statements are equivalent:

i D is positive definite.

ii D^{−1} exists and is positive definite.

iii D has only real and positive eigenvalues.

iv D = B^T · B for a non-singular matrix B ∈ R^{n×n}.

Proof: Basic linear algebra. . .

239


Geometric Basics: Ellipsoids

Definition 8.25.

A set E ⊆ R^n of the form

E = E(z, D) := {x ∈ R^n | (x − z)^T · D^{−1} · (x − z) ≤ 1},

where z ∈ R^n and D ∈ R^{n×n} is positive definite, is called an ellipsoid with center z.

[Figure: an ellipsoid with center z]

Example: For r > 0, the ellipsoid

E(z, r^2 · I) = {x ∈ R^n | (x − z)^T · (x − z) ≤ r^2} = {x ∈ R^n | ‖x − z‖_2 ≤ r}

is the ball of radius r centered at z.

240

Geometric Basics: Affine Transformations and Volume

Definition 8.26.

If D ∈ R^{n×n} is non-singular and b ∈ R^n, then the mapping S : R^n → R^n defined by S(x) := D · x + b is called an affine transformation.

I For L ⊆ R^n let S(L) := {y ∈ R^n | y = D · x + b for some x ∈ L}.

I Define the volume of L ⊆ R^n as Vol(L) := ∫_{x∈L} dx.

Lemma 8.27.

For the affine transformation S with S(x) = D · x + b, the volume of S(L) is Vol(S(L)) = |det(D)| · Vol(L).

Proof:

Vol(S(L)) = ∫_{y∈S(L)} dy = ∫_{x∈L} |det(J(x))| dx,

where J(x) is the Jacobian matrix associated with S, i. e., (J(x))_{ij} = ∂S_i(x)/∂x_j = D_{ij}. Thus, J(x) = D.

241


Ellipsoid Method — Rough Idea

The ellipsoid method solves the following problem:

Given: A ∈ Z^{m×n} and b ∈ Z^m, polyhedron P := {x ∈ Q^n | A · x ≥ b}.
Task: Find a point x ∈ P or determine that P is empty.

Example:

[Figure: a sequence of shrinking ellipsoids with centers x1, x2, x3, x4 closing in on the polyhedron P]

242


How to Find the Next Ellipsoid?

Theorem 8.28.

Let E = E(z, D) be an ellipsoid in R^n and a ∈ R^n \ {0}. Consider the halfspace H := {x ∈ R^n | a^T · x ≥ a^T · z} and set

z̄ := z + 1/(n + 1) · (D · a)/√(a^T · D · a),

D̄ := n^2/(n^2 − 1) · ( D − 2/(n + 1) · (D · a · a^T · D)/(a^T · D · a) ).

The matrix D̄ is symmetric and positive definite. Thus, Ē := E(z̄, D̄) is an ellipsoid. Moreover:

i E ∩ H ⊆ Ē

ii Vol(Ē) < e^{−1/(2(n+1))} · Vol(E)

Proof: See Bertsimas & Tsitsiklis, Section 8.2.

243


Simplifying Assumptions

Definition 8.29.

A polyhedron P ⊆ R^n is full-dimensional if it has non-zero volume.

To state the ellipsoid method we first make some simplifying assumptions:

I Polyhedron P is bounded (i. e., a polytope) and either empty or full-dimensional, i. e.,

I P ⊆ E(x_0, r^2 · I) =: E_0 with r > 0; V := Vol(E_0).

I Vol(P) > v for some v > 0 (or P is empty).

I Assume that E_0, V, and v are known a priori.

I Calculations (including square roots) can be made in infinite precision.

We discuss these assumptions later in greater detail. . .

244

Ellipsoid Method

i set t* := ⌈2(n + 1) log(V/v)⌉; E_0 := E(x_0, r^2 · I); D_0 := r^2 · I; t := 0;

ii if t = t* then stop and output “P is empty”;

iii if x_t ∈ P then stop and output x_t;

iv find a violated constraint in A · x_t ≥ b, i. e., a_i^T · x_t < b_i for some i;

v set H_t := {x ∈ R^n | a_i^T · x ≥ a_i^T · x_t}; (halfspace containing P)

vi find an ellipsoid E_{t+1} ⊇ E_t ∩ H_t by applying Theorem 8.28;

vii set t := t + 1 and go to step ii;

[Figure: ellipsoid E_t with center x_t, the halfspace H_t, and the next ellipsoid E_{t+1} ⊇ E_t ∩ H_t with center x_{t+1}, both containing P]

245
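Under the stated simplifying assumptions (and floating-point arithmetic instead of the infinite precision assumed above), the method fits into a few lines of Python with numpy; the update of z and D is the one from Theorem 8.28, and all names are illustrative.

```python
import numpy as np

def ellipsoid_method(A, b, x0, r, t_star):
    """Ellipsoid method sketch for P = {x | A x >= b}, n >= 2.

    Starts from the ball E(x0, r^2 I), assumed to contain P, and runs
    at most t_star = ceil(2 (n+1) log(V/v)) iterations; returns a point
    of P, or None, which by the volume argument certifies that P is empty.
    """
    n = len(x0)
    z = np.asarray(x0, dtype=float)
    D = r * r * np.eye(n)                    # current ellipsoid E(z, D)
    for _ in range(t_star):
        slack = A @ z - b
        if np.all(slack >= 0):
            return z                         # x_t = z lies in P
        i = int(np.argmin(slack))            # violated: a_i^T x_t < b_i
        a = A[i]
        Da = D @ a
        s = np.sqrt(a @ Da)                  # sqrt(a^T D a)
        z = z + Da / ((n + 1) * s)           # center of E_{t+1} (Thm. 8.28)
        D = (n * n / (n * n - 1.0)) * (
            D - (2.0 / (n + 1)) * np.outer(Da, Da) / (s * s))
    return None

# toy instance: the box [1, 2]^2 written as A x >= b
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, -2.0, -2.0])
print(ellipsoid_method(A, b, x0=[0.0, 0.0], r=10.0, t_star=200))
```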


Correctness of the Ellipsoid Method

Theorem 8.30.

The ellipsoid method returns a point in P or decides correctly that P = ∅.

Proof: If x_t ∈ P for some t < t*, then the algorithm returns x_t.

Otherwise: By induction, P ⊆ E_k for k = 0, 1, . . . , t*.

By Theorem 8.28 we get Vol(E_{t+1}) / Vol(E_t) < e^{−1/(2(n+1))} for all t.

Thus, Vol(E_{t*}) / Vol(E_0) < e^{−t*/(2(n+1))}.

⟹ Vol(E_{t*}) < V · e^{−⌈2(n+1) log(V/v)⌉/(2(n+1))} ≤ V · e^{−log(V/v)} = v.

Since v is a lower bound on the volume of a non-empty P, the algorithm correctly decides that P = ∅.

246

What if P is Unbounded?

Lemma 8.31.

Let A ∈ Z^{m×n}, b ∈ R^m, and let U be the largest absolute value of the entries in A and b. Every extreme point of the polyhedron P = {x ∈ R^n | A · x ≥ b} satisfies

−(nU)^n ≤ x_j ≤ (nU)^n for j = 1, . . . , n. (8.1)

Proof: . . .

Remarks.

I Define the polytope P_B := {x ∈ P | −(nU)^n ≤ x_j ≤ (nU)^n for all j}.

I Under the assumption that rank(A) = n, we get:
P ≠ ∅ ⟺ P contains an extreme point ⟺ P_B ≠ ∅

I Thus, we can look for a point in P_B ⊆ P instead of P.

I Start with the ellipsoid E_0 := E(0, n(nU)^{2n} · I) ⊇ P_B.

I Note that Vol(E_0) ≤ V := (2n(nU)^n)^n = (2n)^n · (nU)^{n^2}.

247


[Figure: the polytope P_B arising from the (possibly unbounded) polyhedron P by intersecting it with the box −(nU)^n ≤ x_j ≤ (nU)^n]

248

What if P is not Full-Dimensional?

Lemma 8.32.

Let A ∈ Z^{m×n}, b ∈ Z^m, and let U be the largest absolute value of the entries in A and b. Consider the polyhedron P = {x ∈ R^n | A · x ≥ b}, define

ε := 1/(2(n + 1)) · ((n + 1) · U)^{−(n+1)},

and a new polyhedron

Pε := {x ∈ R^n | A · x ≥ b − ε · 1}.

Then it holds that

a P = ∅ ⟹ Pε = ∅

b P ≠ ∅ ⟹ Pε is full-dimensional

c Given a point in Pε, a point in P can be obtained in polynomial time.

Proof: . . .

249


Bounding the Number of Iterations

Lemma 8.33.

If the polyhedron P = {x ∈ R^n | A · x ≥ b} is full-dimensional and bounded, with U as above, then

Vol(P) > n^{−n} · (nU)^{−n^2(n+1)}.

Proof idea: One can show that P has n + 1 extreme points v^0, . . . , v^n such that

Vol(conv(v^0, . . . , v^n)) > n^{−n} · (nU)^{−n^2(n+1)}.

Theorem 8.34.

The number of iterations of the ellipsoid method can be bounded by O(n^6 · log(nU)).

250

Required Numeric Precision

Major problems:

I Bound number of arithmetic operations per iteration.

I How to take square roots?

I Only finite precision possible!

Theorem 8.35.

Using only O(n^3 · log U) binary digits of precision, the ellipsoid method still correctly decides whether P is empty in O(n^6 · log(nU)) iterations. Thus, the Linear Inequalities problem can be solved in polynomial time.

251


Solving LPs in Polynomial Time

Consider a pair of primal and dual LPs:

min c^T · x s.t. A · x ≥ b, x ≥ 0        max p^T · b s.t. p^T · A ≤ c^T, p ≥ 0

Solve the primal and dual LP by finding a point (x, p) in the polyhedron given by

{(x, p) | c^T · x = p^T · b, A · x ≥ b, p^T · A ≤ c^T, x, p ≥ 0}.

Theorem 8.36.

Linear programs can be solved in polynomial time.

252

LPs with Exponentially Many Constraints

I The number of iterations of the ellipsoid method only depends on the dimension n and on U, but not on the number of constraints m.

I Thus there is hope to solve LPs with exponentially many constraints (that are implicitly given) in polynomial time.

Example: Consider the following LP relaxation of the TSP (subtour LP):

min ∑_{e∈E} c_e · x_e

s.t. ∑_{e∈δ(v)} x_e = 2 for all nodes v ∈ V,

∑_{e∈δ(X)} x_e ≥ 2 for all subsets ∅ ≠ X ⊊ V, (8.2)

0 ≤ x_e ≤ 1 for all edges e.

Notice that there are 2^{n−1} − 1 subtour elimination constraints (8.2).

253


LPs with Exponentially Many Constraints (cont.)

I Describe the polyhedron P = {x ∈ R^n | A · x ≥ b} by specifying n and an integer vector h of primary data of dimension O(n^k) with k constant. Let U_0 := max_i |h_i|.

I There is a mapping which, given n and h, defines A ∈ Z^{m×n} and b ∈ Z^m. Let

U := max{|a_{ij}|, |b_i| : i = 1, . . . , m, j = 1, . . . , n}.

I We assume that there are constants C and ℓ such that

log U ≤ C · n^ℓ · log^ℓ U_0,

that is, U can be encoded polynomially in the input size.

I The number of iterations of the ellipsoid method is

O(n^6 log(nU)) = O(n^6 log n + n^{6+ℓ} log^ℓ U_0)

and thus polynomial in the input size of the primary problem data.

254

Separation Problem

In every iteration of the ellipsoid method, we have to solve the following problem:

Definition 8.37.

Given a polyhedron P ⊆ R^n and x ∈ R^n, the separation problem is to

i either decide that x ∈ P, or

ii find d ∈ R^n with d^T · x < d^T · y for all y ∈ P.

Example: The subtour elimination constraints (8.2) can be separated in polynomial time by finding a minimum capacity cut.

255


Optimization is as Difficult as Separation

The following theorem by Grötschel, Lovász, and Schrijver is a consequence of the ellipsoid method.

Theorem 8.38.

i Given a family of polyhedra, if we can solve the separation problem in time polynomial in n and log U, then we can also solve LPs over those polyhedra in time polynomial in n and log U.

ii The converse is also true under some technical conditions.

Example: The subtour LP for the TSP can be solved in polynomial time since the subtour elimination constraints (8.2) can be separated efficiently.

256

Path-Based LP Formulation for Flow Problems

Let P be the set of all s-t-dipaths in digraph D.

max ∑_{P∈P} y_P

s.t. ∑_{P∈P : a∈P} y_P ≤ u(a) for all a ∈ A

y_P ≥ 0 for all P ∈ P

Dual LP:

min ∑_{a∈A} u(a) · z_a

s.t. ∑_{a∈P} z_a ≥ 1 for all P ∈ P (8.3)

z_a ≥ 0 for all a ∈ A

Since constraints (8.3) can be separated efficiently by a shortest path computation, the dual LP can be solved efficiently. Using the complementary slackness conditions, the primal LP can also be solved in polynomial time.

257
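To make the last remark concrete, here is a hedged Python sketch of the separation routine for constraints (8.3): since z ≥ 0, a shortest s-t-dipath with respect to arc lengths z_a can be found by Dijkstra’s Algorithm, and the constraint for that path is violated exactly if its length is below 1.

```python
import heapq

def separate_path_constraints(n, arcs, s, t, z):
    """Separation for the dual constraints (8.3) (sketch).

    arcs: list of (v, w); z: dict mapping each arc to its value z_a >= 0.
    Returns a most violated s-t-dipath (as a list of arcs) if some
    constraint sum_{a in P} z_a >= 1 is violated, and None otherwise.
    """
    out = [[] for _ in range(n)]
    for v, w in arcs:
        out[v].append((w, z[(v, w)]))
    dist, pred = {s: 0.0}, {}
    heap = [(0.0, s)]
    while heap:                                   # Dijkstra (valid: z >= 0)
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue
        for w, length in out[v]:
            if d + length < dist.get(w, float("inf")):
                dist[w] = d + length
                pred[w] = v
                heapq.heappush(heap, (dist[w], w))
    if dist.get(t, float("inf")) >= 1.0:
        return None                               # all constraints satisfied
    path, v = [], t
    while v != s:                                 # reconstruct the dipath
        path.append((pred[v], v))
        v = pred[v]
    return list(reversed(path))
```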