
Integer programming (part 2)

Lecturer: Javier Peña
Convex Optimization 10-725/36-725


Last time: integer programming

Consider the problem

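$$
\begin{array}{ll}
\min_x & f(x) \\
\text{subject to} & x \in C \\
& x_j \in \mathbb{Z}, \; j \in J
\end{array}
$$

where $f : \mathbb{R}^n \to \mathbb{R}$ and $C \subseteq \mathbb{R}^n$ are convex, and $J \subseteq \{1, \dots, n\}$.

Branch and bound

Algorithm to solve integer programs. Divide-and-conquer scheme combined with upper and lower bounds for efficiency.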


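Branch and bound algorithm

Consider the problem

$$
\begin{array}{ll}
\min_x & f(x) \\
\text{subject to} & x \in C \\
& x_j \in \mathbb{Z}, \; j \in J
\end{array}
\qquad \text{(IP)}
$$

where $f : \mathbb{R}^n \to \mathbb{R}$ and $C \subseteq \mathbb{R}^n$ are convex and $J \subseteq \{1, \dots, n\}$.

1. Solve the convex relaxation

$$
\begin{array}{ll}
\min_x & f(x) \\
\text{subject to} & x \in C
\end{array}
\qquad \text{(CR)}
$$

2. (CR) infeasible ⇒ (IP) is infeasible. Stop.

3. Solution $x^\star$ to (CR) is (IP) feasible ⇒ $x^\star$ solves (IP). Stop.

4. Solution $x^\star$ to (CR) is not (IP) feasible ⇒ $f(x^\star)$ is a lower bound for (IP). Branch and recursively solve subproblems.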
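The four steps can be sketched on a small concrete problem. The example below is only an illustration, not the lecture's implementation: it swaps in a 0/1 knapsack problem (written as minimizing the negative total value) because its relaxation can be solved greedily without an LP solver. The function names `solve_relaxation` and `branch_and_bound` and the data are hypothetical, and weights are assumed positive.

```python
def solve_relaxation(values, weights, capacity, fixed):
    """Solve the relaxation of a 0/1 knapsack node, written as a minimization.

    `fixed` maps a variable index to 0 or 1 (the branching decisions so far).
    Returns (lower_bound, x), or None when the node is infeasible.
    """
    x = [0.0] * len(values)
    remaining = capacity
    total = 0.0
    for j, v in fixed.items():
        x[j] = float(v)
        remaining -= weights[j] * v
        total += values[j] * v
    if remaining < 0:
        return None  # fixed items already exceed the capacity
    # Greedy by value/weight ratio is optimal for the fractional knapsack.
    free = sorted((j for j in range(len(values)) if j not in fixed),
                  key=lambda j: values[j] / weights[j], reverse=True)
    for j in free:
        take = min(1.0, remaining / weights[j])
        x[j] = take
        total += values[j] * take
        remaining -= weights[j] * take
        if take < 1.0:
            break  # capacity exhausted
    return -total, x  # we minimize the negative value


def branch_and_bound(values, weights, capacity):
    # x = 0 is always feasible, so 0 is an initial upper bound.
    best = {"obj": 0.0, "x": [0] * len(values)}

    def recurse(fixed):
        relax = solve_relaxation(values, weights, capacity, fixed)
        if relax is None:              # step 2: relaxation infeasible
            return
        lower, x = relax
        if lower >= best["obj"]:       # lower bound cannot beat the incumbent
            return
        fractional = [j for j, xj in enumerate(x) if 0.0 < xj < 1.0]
        if not fractional:             # step 3: relaxed solution is integral
            best["obj"] = lower
            best["x"] = [int(round(xj)) for xj in x]
            return
        j = fractional[0]              # step 4: branch on a fractional variable
        recurse({**fixed, j: 1})
        recurse({**fixed, j: 0})

    recurse({})
    return best["obj"], best["x"]


obj, x = branch_and_bound([60, 100, 120], [10, 20, 30], 50)
print(obj, x)
```

On this data the root relaxation is fractional, so the search branches; two of the resulting subproblems are discarded purely by comparing their lower bound with the incumbent.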


After branching

Key component of B&B

• After branching, compute a lower bound for each subproblem.

• If a lower bound for a subproblem is larger than the current upper bound, there is no need to consider the subproblem.

• The most straightforward way to compute lower bounds is via a convex relaxation, but other methods (e.g., Lagrangian relaxations) can also be used.


Tightness of relaxations

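Suppose we have two equivalent formulations (e.g., facility location)

$$
\begin{array}{ll}
\min_x & f(x) \\
\text{subject to} & x \in C \\
& x_j \in \mathbb{Z}, \; j \in J
\end{array}
\qquad\qquad
\begin{array}{ll}
\min_x & f(x) \\
\text{subject to} & x \in C' \\
& x_j \in \mathbb{Z}, \; j \in J
\end{array}
$$

with $C \subseteq C'$.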

Which one should we prefer?
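One way to reason about this question, sketched under the assumption that all the minima below are attained: because $C \subseteq C'$, minimizing over the smaller set can only give a larger value, so

$$
\min_{x \in C'} f(x) \;\le\; \min_{x \in C} f(x) \;\le\; \min \{ f(x) : x \in C, \; x_j \in \mathbb{Z}, \; j \in J \}.
$$

Both relaxations give valid lower bounds, but the one over $C$ is at least as tight, so branch and bound can prune at least as many subproblems with it. This suggests preferring the formulation with the smaller set $C$, provided its relaxation is not much more expensive to solve.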


Outline

Today:

• More about solution techniques
    - Cutting planes
    - Branch and cut

• Two extended examples
    - Best subset selection
    - Least median of squares


Convexification

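Consider the special case of an integer program with linear objective

$$
\begin{array}{ll}
\min_x & c^T x \\
\text{subject to} & x \in C \\
& x_j \in \mathbb{Z}, \; j \in J
\end{array}
$$

This problem is equivalent to

$$
\begin{array}{ll}
\min_x & c^T x \\
\text{subject to} & x \in S
\end{array}
$$

where $S := \operatorname{conv}\{x \in C : x_j \in \mathbb{Z}, \; j \in J\}$.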
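A short justification of this equivalence, sketched for the case where the integer-feasible set is nonempty and the minimum over it is attained. The symbol $T := \{x \in C : x_j \in \mathbb{Z}, \; j \in J\}$ and the multipliers $\lambda_i$ are introduced here only for illustration. Any $x \in \operatorname{conv}(T)$ can be written as $x = \sum_i \lambda_i x^i$ with $x^i \in T$, $\lambda_i \ge 0$ and $\sum_i \lambda_i = 1$, so

$$
c^T x \;=\; \sum_i \lambda_i \, c^T x^i \;\ge\; \min_{x' \in T} c^T x'.
$$

Since $T \subseteq \operatorname{conv}(T)$, the reverse inequality between the two optimal values also holds, so the two problems have the same optimal value.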


Special case: integer linear programs

Consider the problem

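$$
\begin{array}{ll}
\min_x & c^T x \\
\text{subject to} & Ax \le b \\
& x_j \in \mathbb{Z}, \; j \in J
\end{array}
$$

Theorem
If $A, b$ are rational, then the set

$$S := \operatorname{conv}\{x : Ax \le b, \; x_j \in \mathbb{Z}, \; j \in J\}$$

is a polyhedron.

Thus the above integer linear program is equivalent to a linear program.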

How hard could that be?


Cutting plane algorithm

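We say that the inequality $\pi^T x \le \pi_0$ is valid for a set $S$ if

$$\pi^T x \le \pi_0 \quad \text{for all } x \in S.$$

Consider the problem

$$
\begin{array}{ll}
\min_x & c^T x \\
\text{subject to} & x \in C \\
& x_j \in \mathbb{Z}, \; j \in J
\end{array}
$$

and let $S := \operatorname{conv}\{x \in C : x_j \in \mathbb{Z}, \; j \in J\}$.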


Cutting plane algorithm

Recall: $S = \operatorname{conv}\{x \in C : x_j \in \mathbb{Z}, \; j \in J\}$ and we want to solve

$$
\begin{array}{ll}
\min_x & c^T x \\
\text{subject to} & x \in C \\
& x_j \in \mathbb{Z}, \; j \in J
\end{array}
\qquad \text{(IP)}
$$

Cutting plane algorithm

1. Let $C_0 := C$ and compute $x^{(0)} := \operatorname{argmin}_x \{c^T x : x \in C_0\}$
2. for $k = 0, 1, \dots$
     if $x^{(k)}$ is (IP) feasible then
       $x^{(k)}$ is an optimal solution. Stop
     else
       find a valid inequality $(\pi, \pi_0)$ for $S$ that cuts off $x^{(k)}$
       let $C_{k+1} := C_k \cap \{x : \pi^T x \le \pi_0\}$
       compute $x^{(k+1)} := \operatorname{argmin}_x \{c^T x : x \in C_{k+1}\}$
     end if
   end for

A valid inequality is also called a cutting plane or a cut.
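A toy version of this loop can be written in a few lines. The sketch below is an illustration under strong assumptions, not a general solver: it works only in two dimensions with integer constraint data, solves each relaxation by enumerating vertices of a bounded polygon, and looks for cuts in one very simple family (divide a constraint by the gcd of its coefficients and round the right-hand side down, which is valid for every integer point). The names `solve_lp_2d`, `rounding_cut` and `cutting_planes` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import floor, gcd


def solve_lp_2d(c, cons):
    """Minimize c.x over {x in R^2 : a.x <= b for (a, b) in cons}.

    Enumerates intersections of constraint pairs; assumes the feasible
    set is a nonempty bounded polygon.
    """
    best = None
    for (a1, b1), (a2, b2) in combinations(cons, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if det == 0:
            continue  # parallel constraints: no unique intersection
        x = (Fraction(b1 * a2[1] - a1[1] * b2, det),
             Fraction(a1[0] * b2 - b1 * a2[0], det))
        if all(a[0] * x[0] + a[1] * x[1] <= b for a, b in cons):
            val = c[0] * x[0] + c[1] * x[1]
            if best is None or val < best[0]:
                best = (val, x)
    return best


def rounding_cut(x, cons):
    """Return a rounding cut violated by x, or None if this family has none."""
    for a, b in cons:
        g = gcd(a[0], a[1])
        if g > 1:
            cut = ((a[0] // g, a[1] // g), floor(Fraction(b, g)))
            if cut[0][0] * x[0] + cut[0][1] * x[1] > cut[1]:
                return cut
    return None


def cutting_planes(c, cons, max_iter=20):
    cons = list(cons)
    for _ in range(max_iter):
        val, x = solve_lp_2d(c, cons)
        if all(xi.denominator == 1 for xi in x):
            return val, x              # relaxation optimum is integral
        cut = rounding_cut(x, cons)
        if cut is None:
            raise RuntimeError("no violated cut in this simple family")
        cons.append(cut)               # C_{k+1} := C_k intersected with the cut
    raise RuntimeError("iteration limit reached")


# minimize -x1 - x2  s.t.  2 x1 + 2 x2 <= 3,  0 <= x1, x2 <= 1,  x integer
example_cons = [((2, 2), 3), ((1, 0), 1), ((0, 1), 1), ((-1, 0), 0), ((0, -1), 0)]
val, x = cutting_planes((-1, -1), example_cons)
print(val, x)
```

In this example the first relaxation has a fractional optimum with value $-3/2$; the single cut $x_1 + x_2 \le 1$ removes it and the next relaxation is integral. Practical solvers rely on much stronger cut families.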


A bit of history on cutting planes

In 1954 Dantzig, Fulkerson, and Johnson pioneered the cutting plane approach for the traveling salesman problem. In 1958 Gomory proposed a general-purpose cutting plane method to solve any integer linear program.

For more than three decades Gomory cuts were deemed impractical for solving actual problems. In the early 1990s Sebastian Ceria (at CMU) successfully incorporated Gomory cuts in a branch and cut framework. By the late 1990s cutting planes had become a key component of commercial optimization solvers.

This subject has a long tradition at Carnegie Mellon and is associated with some of our biggest names: Balas, Cornuéjols, Jeroslow, Kılınç-Karzan and many of their collaborators.


Gomory cuts (1958)

This class of cuts is based on the following observation:

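if $a \le b$ and $a$ is an integer then $a \le \lfloor b \rfloor$.

Suppose

$$S \subseteq \Big\{ x \in \mathbb{Z}^n_+ : \sum_{j=1}^n a_j x_j = a_0 \Big\}$$

where $a_0 \notin \mathbb{Z}$. The Gomory fractional cut is

$$\sum_{j=1}^n (a_j - \lfloor a_j \rfloor) \, x_j \;\ge\; a_0 - \lfloor a_0 \rfloor.$$

There is a rich variety of extensions of the above idea: Chvátal cuts, split cuts, lift-and-project cuts, etc.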
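The fractional cut is easy to compute from a single equation. The following sketch uses exact rational arithmetic; the function names and the example row $x_1 + \tfrac32 x_2 + \tfrac12 x_3 = \tfrac72$ are made up for illustration. It then checks, by brute force over a small box, that every nonnegative integer solution of the row satisfies the cut while the fractional point $(\tfrac72, 0, 0)$ does not.

```python
from fractions import Fraction
from math import floor


def gomory_fractional_cut(a, a0):
    """Return (coeffs, rhs) of the cut  sum_j coeffs[j] * x_j >= rhs."""
    a = [Fraction(v) for v in a]
    a0 = Fraction(a0)
    if a0.denominator == 1:
        raise ValueError("a0 must be fractional")
    coeffs = [v - floor(v) for v in a]   # fractional parts of the a_j
    rhs = a0 - floor(a0)                 # fractional part of a_0
    return coeffs, rhs


def satisfies(coeffs, rhs, x):
    return sum(cj * xj for cj, xj in zip(coeffs, x)) >= rhs


a, a0 = [1, Fraction(3, 2), Fraction(1, 2)], Fraction(7, 2)
coeffs, rhs = gomory_fractional_cut(a, a0)

# All nonnegative integer solutions of the row inside a small box.
integer_points = [(x1, x2, x3)
                  for x1 in range(8) for x2 in range(8) for x3 in range(8)
                  if a[0] * x1 + a[1] * x2 + a[2] * x3 == a0]

print(coeffs, rhs)
print(all(satisfies(coeffs, rhs, p) for p in integer_points))
print(satisfies(coeffs, rhs, (Fraction(7, 2), 0, 0)))
```

Here the cut reads $\tfrac12 x_2 + \tfrac12 x_3 \ge \tfrac12$: any integer solution must use at least one unit of $x_2$ or $x_3$, since $2 x_1 = 7$ has no integer solution.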


Lift-and-project cuts

Theorem (Balas 1974)

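Assume $C = \{x : Ax \le b\} \subseteq \{x : 0 \le x_j \le 1\}$. Then $C_j := \operatorname{conv}\{x \in C : x_j \in \{0, 1\}\}$ is the projection of the polyhedron $L_j(C)$ defined by the following inequalities

$$
\begin{array}{l}
Ay \le \lambda b, \quad y_j = \lambda \\
Az \le (1 - \lambda) b, \quad z_j = 0 \\
y + z = x \\
\lambda \le 1 \\
\lambda \ge 0.
\end{array}
$$

Suppose $\bar{x} \in C$ but $\bar{x} \notin C_j$. To find a cut separating $\bar{x}$ from $C_j$ solve

$$
\begin{array}{ll}
\min_{x, y, z, \lambda} & \|x - \bar{x}\| \\
\text{subject to} & (x, y, z, \lambda) \in L_j(C)
\end{array}
$$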


Branch and cut algorithm

Combine strengths of both branch and bound and cutting planes.

Consider the problem

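$$
\begin{array}{ll}
\min_x & f(x) \\
\text{subject to} & x \in C \\
& x_j \in \mathbb{Z}, \; j \in J
\end{array}
\qquad \text{(IP)}
$$

where $f : \mathbb{R}^n \to \mathbb{R}$ and $C \subseteq \mathbb{R}^n$ are convex and $J \subseteq \{1, \dots, n\}$.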


Branch and cut algorithm

Branch and cut algorithm

1. Solve the convex relaxation

$$
\begin{array}{ll}
\min_x & f(x) \\
\text{subject to} & x \in C
\end{array}
\qquad \text{(CR)}
$$

2. (CR) infeasible ⇒ (IP) is infeasible.

3. Solution $x^\star$ to (CR) is (IP) feasible ⇒ $x^\star$ is a solution to (IP).

4. Solution $x^\star$ to (CR) is not (IP) feasible. Choose between two alternatives:

   4.1 Add cuts and go back to step 1
   4.2 Branch and recursively solve subproblems


Integer programming technology

• State-of-the-art solvers (Gurobi, CPLEX, FICO) rely on extremely efficient implementations of numerical linear algebra, simplex, interior-point, and other algorithms for convex optimization.

• For mixed integer optimization, most solvers use a clever type of branch and cut algorithm. They rely extensively on convex relaxations. They also take advantage of warm starts as much as possible.

• Some interesting numbers
    - Speedup in algorithms 1990–2016: over 500,000
    - Speedup in hardware 1990–2016: over 500,000
    - Total speedup: over 250 billion = $2.5 \cdot 10^{11}$


Best subset selection

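Assume $X = [x_1 \; \cdots \; x_p] \in \mathbb{R}^{n \times p}$ and $y \in \mathbb{R}^n$.

Best subset selection problem:

$$
\begin{array}{ll}
\min_\beta & \frac{1}{2} \|y - X\beta\|_2^2 \\
\text{subject to} & \|\beta\|_0 \le k
\end{array}
$$

Here $\|\beta\|_0 :=$ number of nonzero entries of $\beta$.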


Best subset selection

Integer programming formulation

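$$
\begin{array}{ll}
\min_{\beta, z} & \frac{1}{2} \|y - X\beta\|_2^2 \\
\text{subject to} & |\beta_i| \le M_i \cdot z_i, \quad i = 1, \dots, p \\
& \sum_{i=1}^p z_i \le k \\
& z_i \in \{0, 1\}, \quad i = 1, \dots, p.
\end{array}
$$

Here $M_i$ is some a priori known bound on $|\beta_i|$ for $i = 1, \dots, p$. These bounds can be computed via suitable preprocessing of $X, y$.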
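To see how the binary variables encode the cardinality constraint, the small check below tests whether a pair $(\beta, z)$ satisfies the three constraint groups. It is a sketch with a hypothetical function name and made-up numbers; it does not solve the optimization problem.

```python
def is_feasible(beta, z, M, k):
    """Check the big-M constraints of the best subset formulation."""
    return (all(zi in (0, 1) for zi in z)                     # z_i binary
            and sum(z) <= k                                   # at most k active
            and all(abs(b) <= Mi * zi                         # z_i = 0 forces beta_i = 0
                    for b, Mi, zi in zip(beta, M, z)))


beta = [1.5, 0.0, -0.7]
M = [10.0, 10.0, 10.0]

print(is_feasible(beta, [1, 0, 1], M, k=2))   # both nonzero entries are switched on
print(is_feasible(beta, [1, 0, 0], M, k=2))   # beta_3 is nonzero but z_3 = 0
print(is_feasible(beta, [1, 0, 1], M, k=1))   # too many active indicators
```

Whenever $z_i = 0$ the constraint $|\beta_i| \le M_i z_i$ forces $\beta_i = 0$, so $\sum_i z_i \le k$ implies $\|\beta\|_0 \le k$.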


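A clever way to get good feasible solutions
Bertsimas, King, and Mazumder

Consider the problem

$$\min_\beta \; g(\beta) \quad \text{subject to} \quad \|\beta\|_0 \le k$$

where $g : \mathbb{R}^p \to \mathbb{R}$ is smooth convex and $\nabla g$ is $L$-Lipschitz.

Best subset selection corresponds to $g(\beta) = \frac{1}{2} \|X\beta - y\|_2^2$.

Observation
For $u \in \mathbb{R}^p$ the vector

$$H_k(u) = \operatorname*{argmin}_{\beta : \|\beta\|_0 \le k} \|\beta - u\|_2^2$$

is obtained by retaining the $k$ entries of $u$ that are largest in absolute value and setting the remaining entries to zero.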
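The operator $H_k$ takes only a few lines of code. This is a minimal sketch with a hypothetical function name; "largest" is taken to mean largest in absolute value, and ties are broken by position.

```python
def hard_threshold(u, k):
    """Keep the k entries of u with largest absolute value, zero out the rest."""
    order = sorted(range(len(u)), key=lambda j: abs(u[j]), reverse=True)
    keep = set(order[:k])
    return [u[j] if j in keep else 0.0 for j in range(len(u))]


print(hard_threshold([0.5, -3.0, 2.0, 0.1], 2))   # [0.0, -3.0, 2.0, 0.0]
```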


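Discrete first-order algorithm
Bertsimas et al.

Consider the problem

$$\min_\beta \; g(\beta) \quad \text{subject to} \quad \|\beta\|_0 \le k$$

where $g : \mathbb{R}^p \to \mathbb{R}$ is smooth convex and $\nabla g$ is $L$-Lipschitz.

1. Start with some $\beta^{(0)}$
2. for $i = 0, 1, \dots$
     $\beta^{(i+1)} = H_k\big(\beta^{(i)} - \tfrac{1}{L} \nabla g(\beta^{(i)})\big)$
   end for

The above iterates satisfy $\beta^{(i)} \to \bar{\beta}$ where

$$\bar{\beta} = H_k\big(\bar{\beta} - \tfrac{1}{L} \nabla g(\bar{\beta})\big).$$

This is a kind of "local" solution to the above minimization problem.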
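The iteration can be sketched in plain Python for the least squares case $g(\beta) = \tfrac12 \|X\beta - y\|_2^2$. Several choices here are assumptions made for illustration and are not taken from the paper: the starting point $\beta^{(0)} = 0$, a fixed iteration count, and the constant $L = \|X\|_F^2$, which is a valid (possibly loose) Lipschitz constant because it upper-bounds the largest eigenvalue of $X^T X$. Function names and data are hypothetical.

```python
def hard_threshold(u, k):
    """Keep the k entries of u with largest absolute value, zero out the rest."""
    order = sorted(range(len(u)), key=lambda j: abs(u[j]), reverse=True)
    keep = set(order[:k])
    return [u[j] if j in keep else 0.0 for j in range(len(u))]


def gradient(X, y, beta):
    """Gradient of 0.5 * ||X beta - y||^2, namely X^T (X beta - y)."""
    resid = [sum(xij * bj for xij, bj in zip(row, beta)) - yi
             for row, yi in zip(X, y)]
    return [sum(X[i][j] * resid[i] for i in range(len(X)))
            for j in range(len(beta))]


def discrete_first_order(X, y, k, iters=500):
    L = sum(v * v for row in X for v in row)   # ||X||_F^2 >= lambda_max(X^T X)
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        g = gradient(X, y, beta)
        beta = hard_threshold([b - gj / L for b, gj in zip(beta, g)], k)
    return beta


# Orthogonal toy design: the best 2-sparse fit keeps the two largest entries of y.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.5, -2.0]
print(discrete_first_order(X, y, k=2))
```

On this toy problem the iterates approach $(3, 0, -2)$, which is also the global minimizer. In general the limit is only a fixed point of the update, which is why it is used to warm-start the integer programming solver rather than replace it.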


Computational results (Bertsimas et al.)
Diabetes dataset, n = 350, p = 64, k = 6

[Figure: Best Subset Regression, typical behavior of the overall algorithm for k = 6. First panel: upper bounds, lower bounds, and global minimum of the objective vs. time (secs). Second panel: MIO gap vs. time (secs). Source: Bertsimas (MIT), Statistics via Optimization, June 2015, slides 12 and 13 of 48.]


Computational results (Bertsimas et al.)
Cold vs Warm Starts

Excerpt from D. Bertsimas, A. King and R. Mazumder (p. 838):

TABLE 1. Quality of upper bounds for problem (1.1) for the Diabetes dataset, for different values of k. We observe that MIO equipped with warm starts delivers the best upper bounds in the shortest overall times. The run time for the MIO with warm start includes the time taken by the discrete first-order method.

      Discrete first-order     MIO cold start       MIO warm start
 k    Accuracy    Time         Accuracy    Time     Accuracy    Time
 9    0.1306      1            0.0036      500      0           346
20    0.1541      1            0.0042      500      0           77
49    0.1915      1            0.0015      500      0           87
57    0.1933      1            0           500      0           2

... seconds to run. We used the best solution as an advanced warm start to the MIO formulation (2.5). The MIO solver was provided with problem-specific bounds obtained via Section 2.3.3 with τ = 2. For each of these examples, we also ran the MIO formulation without any such additional problem-specific information, that is, formulation (2.4); we refer to this as "Cold Start." Figure 3 presents a representative subset of the results. We also experimented (not reported here, for brevity) with bounds implied by Sections 2.3.1, 2.3.2 and observed that the MIO formulation (2.5) armed with warm-starts and additional bounds closed the optimality gap faster than their "Cold Start" counterpart.

FIG. 3. The evolution of the MIO optimality gap [in log10(·) scale] for problem (1.1), for the Diabetes dataset with n = 350, p = 64, for different values of k. Here, "Warm Start" indicates that the MIO was provided with warm starts and parameter specifications as in Section 2.3; and "Cold Start" indicates that MIO was not provided with any such problem-specific information. MIO ("Warm Start") is found to close the optimality gap much faster. In all of these examples, the global optimum was found within a very small fraction of the total time, but the proof of global optimality came later.


Computational results (Bertsimas et al.)
Sparsity detection (synthetic datasets)

[Figure: Best Subset Regression, sparsity detection for n = 500, p = 100. Number of nonzeros vs. signal-to-noise ratio (1.742, 3.484, 6.967) for the methods MIO, Lasso, Step, and Sparsenet. Source: Bertsimas (MIT), Statistics via Optimization, June 2015, slide 21 of 48.]


Least median of squares regression

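Assume $X = [x_1 \; \cdots \; x_p] \in \mathbb{R}^{n \times p}$ and $y \in \mathbb{R}^n$.

Given $\beta \in \mathbb{R}^p$ let $r := y - X\beta$.

Observe

• Least squares (LS): $\beta_{LS} := \operatorname{argmin}_\beta \sum_i r_i^2$

• Least absolute deviation (LAD): $\beta_{LAD} := \operatorname{argmin}_\beta \sum_i |r_i|$

Least Median of Squares (LMS)

$$\beta_{LMS} := \operatorname*{argmin}_\beta \; \big(\operatorname{median}_i |r_i|\big).$$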
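The three criteria differ only in how they aggregate the residuals. The sketch below evaluates each objective at a fixed $\beta$ on made-up data containing one gross outlier; it does not minimize anything, and the function names are hypothetical.

```python
from statistics import median


def residuals(X, y, beta):
    """r = y - X beta, with X given as a list of rows."""
    return [yi - sum(xij * bj for xij, bj in zip(row, beta))
            for row, yi in zip(X, y)]


def ls_objective(r):
    return sum(v * v for v in r)


def lad_objective(r):
    return sum(abs(v) for v in r)


def lms_objective(r):
    return median(abs(v) for v in r)


X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 0, 4, 103]            # the last observation is an outlier
r = residuals(X, y, [0, 1])   # residuals 1, -1, 2, 100

print(ls_objective(r), lad_objective(r), lms_objective(r))   # 10006 104 1.5
```

In this example the single large residual dominates the LS and LAD objectives, while the median of the absolute residuals is determined by the other three observations.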


Least quantile regression

Least Quantile of Squares (LQS)

$$\beta_{LQS} := \operatorname*{argmin}_\beta \; |r_{(q)}|$$

where $r_{(q)}$ is the $q$th ordered absolute residual:

$$|r_{(1)}| \le |r_{(2)}| \le \cdots \le |r_{(n)}|.$$

Key step in the formulation

Use binary and auxiliary variables to encode the relevant condition

$$|r_i| \le |r_{(q)}| \quad \text{or} \quad |r_i| \ge |r_{(q)}|$$

for each entry $i$ of $r$.


Least quantile regression

Integer programming formulation

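$$
\begin{array}{lll}
\min_{\beta, \mu, \bar{\mu}, z, \gamma} & \gamma & \\
\text{subject to} & \gamma \le |r_i| + \bar{\mu}_i, & i = 1, \dots, n \\
& \gamma \ge |r_i| - \mu_i, & i = 1, \dots, n \\
& \bar{\mu}_i \le M z_i, & i = 1, \dots, n \\
& \mu_i \le M (1 - z_i), & i = 1, \dots, n \\
& \sum_{i=1}^n z_i = q & \\
& \mu_i, \bar{\mu}_i \ge 0, & i = 1, \dots, n \\
& z_i \in \{0, 1\}, & i = 1, \dots, n.
\end{array}
$$

Here $r = y - X\beta$ and $M$ is a sufficiently large constant. When $z_i = 1$ we have $\mu_i = 0$, so $\gamma \ge |r_i|$; when $z_i = 0$ we have $\bar{\mu}_i = 0$, so $\gamma \le |r_i|$. Exactly $q$ residuals are at most $\gamma$ in absolute value and the remaining $n - q$ are at least $\gamma$, so $\gamma = |r_{(q)}|$.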


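First-order algorithm
Bertsimas & Mazumder

Observe

$$|r_{(q)}| = |y_{(q)} - x_{(q)}^T \beta| = H_q(\beta) - H_{q+1}(\beta)$$

where

$$
H_q(\beta) = \sum_{i=q}^n |y_{(i)} - x_{(i)}^T \beta|
= \max_w \Big\{ \sum_{i=1}^n w_i \, |y_i - x_i^T \beta| \; : \; \sum_{i=1}^n w_i = n - q + 1, \;\; 0 \le w_i \le 1, \; i = 1, \dots, n \Big\}.
$$

The function $H_q(\beta)$ is convex, since it is a maximum of convex functions of $\beta$.

A subgradient algorithm yields a local minimum of $H_q(\beta) - H_{q+1}(\beta)$.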
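The identity $|r_{(q)}| = H_q - H_{q+1}$ and the weighted-maximum form of $H_q$ can be checked numerically on a fixed residual vector. In this sketch the function names and the data are made up. The maximum over weights is evaluated only at 0/1 weight vectors, where a linear objective over this weight set attains its maximum, and it uses $n - q + 1$ unit weights, which is the number of terms in the sum from $i = q$ to $n$.

```python
from itertools import combinations


def ordered_abs(r):
    """Absolute residuals sorted in increasing order: |r_(1)| <= ... <= |r_(n)|."""
    return sorted(abs(v) for v in r)


def H(r, q):
    """Sum of |r_(i)| for i = q, ..., n (1-indexed); empty sum when q = n + 1."""
    return sum(ordered_abs(r)[q - 1:])


def H_by_weights(r, q):
    """Max of sum_i w_i |r_i| over 0/1 weight vectors with n - q + 1 ones."""
    n = len(r)
    return max(sum(abs(r[i]) for i in idx)
               for idx in combinations(range(n), n - q + 1))


r = [0.5, -2.0, 1.0, 3.0, -0.25]
q = 3

print(ordered_abs(r)[q - 1])       # |r_(3)| = 1.0
print(H(r, q) - H(r, q + 1))       # 6.0 - 5.0 = 1.0
print(H_by_weights(r, q))          # 6.0
```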


Computational results (Bertsimas & Mazumder)

[Figure: Least Median Regression, typical evolution of MIO. Two panels, Alcohol Data with (n, p, q) = (44, 5, 31) and (n, p, q) = (44, 7, 31): upper bounds, lower bounds, and optimal solution (objective value) vs. time (s). Source: Bertsimas (MIT), Statistics via Optimization, June 2015, slide 40 of 48.]


Computational results (Bertsimas & Mazumder)
Cold vs Warm Starts

[Figure: Least Median Regression, impact of warm starts. Evolution of MIO (cold-start) [top] vs (warm-start) [bottom]: upper bounds, lower bounds, and optimal solution (objective value) vs. time (s), for a synthetic example with n = 501, p = 5. Source: Bertsimas (MIT), Statistics via Optimization, June 2015, slide 45 of 48.]


References and further reading

• Belotti, Kirches, Leyffer, Linderoth, Luedtke, and Mahajan (2012), "Mixed-integer nonlinear optimization"

• Bertsimas, King, and Mazumder (2016), "Best subset selection via a modern optimization lens"

• Bertsimas and Mazumder (2014), "Least quantile regression via modern optimization"

• Conforti, Cornuéjols, and Zambelli (2014), "Integer programming"

• Wolsey (1998), "Integer programming"
