
Integer and Combinatorial Optimization: Clustering Problems

John E. Mitchell

Department of Mathematical Sciences, RPI, Troy, NY 12180 USA

February 2019

Mitchell Clustering Problems 1 / 14

Clustering

We have n objects, each with a number of attributes.

We wish to group similar objects into clusters.

There is no limit on the number of clusters, or on the size of each cluster.

We have a measure $c_{ij}$ of the difference between two objects $i$ and $j$; the larger this measure, the less similar the objects.

This measure can take positive or negative values.


Variables and dimension

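We model this by introducing variables

$$x_{ij} = \begin{cases} 1 & \text{if } i \text{ and } j \text{ are in the same cluster,} \\ 0 & \text{if } i \text{ and } j \text{ are in different clusters,} \end{cases} \qquad 1 \le i < j \le n.$$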

Let $S \subseteq \mathbb{B}^{\frac{1}{2}n(n-1)}$ be the set of feasible solutions. We have the following results regarding $\operatorname{conv}(S)$:

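Proposition

The set $S$ is full-dimensional.

One way to prove this is to note that the origin (every object in its own cluster) and all $\frac{1}{2}n(n-1)$ unit vectors (one pair $i, j$ clustered together, every other object alone) are in $S$. These $\frac{1}{2}n(n-1) + 1$ points are affinely independent, so $\dim(\operatorname{conv}(S)) = \frac{1}{2}n(n-1)$.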
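As a small numerical illustration of the proposition (not a proof), the following Python sketch enumerates every clustering of $n$ objects for tiny $n$, forms the incidence vectors $x$, and computes the dimension of their affine hull with exact rational arithmetic. Objects are numbered from 0 in the code. The helper names `set_partitions`, `incidence_vector`, `rank` and `affine_dimension` are hypothetical, introduced only for this example, and the number of clusterings grows like the Bell numbers, so this is only practical for very small $n$.

```python
from fractions import Fraction
from itertools import combinations


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks (clusters)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing cluster in turn ...
        for b in range(len(partition)):
            yield partition[:b] + [[first] + partition[b]] + partition[b + 1:]
        # ... or into a new cluster of its own.
        yield [[first]] + partition


def incidence_vector(partition, n):
    """x_ij = 1 iff i and j share a cluster, for pairs i < j in lexicographic order."""
    cluster_of = {v: b for b, block in enumerate(partition) for v in block}
    return tuple(int(cluster_of[i] == cluster_of[j]) for i, j in combinations(range(n), 2))


def rank(rows):
    """Exact matrix rank by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def affine_dimension(points):
    """Dimension of the affine hull: rank of the differences from the first point."""
    base = points[0]
    return rank([[a - b for a, b in zip(p, base)] for p in points[1:]])


for n in (3, 4, 5):
    points = [incidence_vector(p, n) for p in set_partitions(list(range(n)))]
    print(n, len(points), affine_dimension(points), n * (n - 1) // 2)
```

Each printed line gives $n$, the number of clusterings, the affine dimension of $S$, and $\frac{1}{2}n(n-1)$; the last two coincide, as the proposition predicts.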


Nonnegativity

Proposition

The lower bound constraints $x_{ij} \ge 0$ define facets of $\operatorname{conv}(S)$.
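A proper face of $\operatorname{conv}(S)$ is a facet exactly when its dimension is one less than that of $\operatorname{conv}(S)$. The Python sketch below illustrates this for small $n$ by brute force: it enumerates all clusterings (objects numbered from 0), keeps those with $x_{ij} = 0$, and computes the affine dimension of each such face. All helper names are hypothetical and chosen for this illustration; exhaustive enumeration limits it to tiny instances.

```python
from fractions import Fraction
from itertools import combinations


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks (clusters)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for b in range(len(partition)):  # join an existing cluster ...
            yield partition[:b] + [[first] + partition[b]] + partition[b + 1:]
        yield [[first]] + partition  # ... or start a new one


def incidence_vector(partition, n):
    """x_ij = 1 iff i and j share a cluster, for pairs i < j in lexicographic order."""
    cluster_of = {v: b for b, block in enumerate(partition) for v in block}
    return tuple(int(cluster_of[i] == cluster_of[j]) for i, j in combinations(range(n), 2))


def rank(rows):
    """Exact matrix rank by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def affine_dimension(points):
    """Dimension of the affine hull: rank of the differences from the first point."""
    base = points[0]
    return rank([[a - b for a, b in zip(p, base)] for p in points[1:]])


def zero_face_dimensions(n):
    """Dimension of the face {x in conv(S) : x_ij = 0} for every pair i < j."""
    points = [incidence_vector(p, n) for p in set_partitions(list(range(n)))]
    return {
        pair: affine_dimension([x for x in points if x[e] == 0])  # i and j apart
        for e, pair in enumerate(combinations(range(n), 2))
    }


n = 4
full = n * (n - 1) // 2
for pair, dim in zero_face_dimensions(n).items():
    print(pair, dim, "facet" if dim == full - 1 else "not a facet")
```

For $n = 4$ every face $\{x_{ij} = 0\}$ has dimension 5, one less than the full dimension 6.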


Triangle inequalities

Proposition

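Let $1 \le i < j < k \le n$. The triangle inequalities

$$\begin{aligned} x_{ij} + x_{ik} - x_{jk} &\le 1 \\ x_{ij} - x_{ik} + x_{jk} &\le 1 \\ -x_{ij} + x_{ik} + x_{jk} &\le 1 \end{aligned}$$

define facets of $\operatorname{conv}(S)$.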

These inequalities enforce consistency.

For example, the first one says that if $i$ and $j$ are in the same cluster, and $i$ and $k$ are also in the same cluster, then $j$ and $k$ must be in the same cluster. The only binary solution violating this constraint is $x_{ij} = x_{ik} = 1$, $x_{jk} = 0$.
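To see the consistency role concretely, this short Python sketch checks all eight binary assignments of $(x_{ij}, x_{ik}, x_{jk})$ against the three triangle inequalities. The function name `violated_triangles` is illustrative only.

```python
from itertools import product


def violated_triangles(xij, xik, xjk):
    """Return the triangle inequalities (as strings) that a 0/1 triple violates."""
    lhs = {
        "x_ij + x_ik - x_jk <= 1": xij + xik - xjk,
        "x_ij - x_ik + x_jk <= 1": xij - xik + xjk,
        "-x_ij + x_ik + x_jk <= 1": -xij + xik + xjk,
    }
    return [name for name, value in lhs.items() if value > 1]


violations = {t: violated_triangles(*t) for t in product((0, 1), repeat=3)}
for triple, names in violations.items():
    if names:
        print(triple, names)
```

It prints exactly three triples, (0, 1, 1), (1, 0, 1) and (1, 1, 0), each violating exactly one inequality: a binary triple is inconsistent precisely when two of the pairs are together and the third is not.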


An integer program

Proposition

Any binary vector satisfying all the triangle inequalities is the incidence vector of a clustering.

Thus, finding the best binary vector satisfying the triangle inequalities, that is, minimizing $\sum_{i<j} c_{ij} x_{ij}$ over such vectors, will solve the clustering problem.
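For very small $n$ the proposition can be checked exhaustively. The Python sketch below (hypothetical helper names; objects numbered from 0; exponential enumeration over all $2^{n(n-1)/2}$ binary vectors) keeps the binary vectors satisfying every triangle inequality, reads a cluster label for each object off $x$, and asserts that $x$ is exactly the same-cluster indicator of those labels.

```python
from itertools import combinations, product


def satisfies_triangles(x, n, index):
    """True iff the 0/1 vector x satisfies all three triangle inequalities for every i < j < k."""
    for i, j, k in combinations(range(n), 3):
        a, b, c = x[index[i, j]], x[index[i, k]], x[index[j, k]]
        if a + b - c > 1 or a - b + c > 1 or -a + b + c > 1:
            return False
    return True


def cluster_labels(x, n, index):
    """Give j the label of the smallest i < j with x_ij = 1 (or its own label if none)."""
    label = list(range(n))
    for i, j in combinations(range(n), 2):  # lexicographic, so label[i] is final here
        if x[index[i, j]] == 1 and label[j] == j:
            label[j] = label[i]
    return label


def triangle_feasible_vectors(n):
    """All binary x satisfying the triangle inequalities, each checked to be a clustering."""
    index = {pair: e for e, pair in enumerate(combinations(range(n), 2))}
    feasible = [x for x in product((0, 1), repeat=len(index)) if satisfies_triangles(x, n, index)]
    for x in feasible:
        label = cluster_labels(x, n, index)
        # x must be exactly the "same cluster" indicator of these labels.
        assert all(x[e] == int(label[i] == label[j]) for (i, j), e in index.items())
    return feasible


print(len(triangle_feasible_vectors(4)))
```

For $n = 4$ it prints 15, the number of ways to partition four objects (the Bell number $B_4$).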


Upper bound constraints

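The upper bound constraints $x_{ij} \le 1$ do not define facets of $\operatorname{conv}(S)$.

In particular, if $x_{ij} = 1$ then every other object $k$ is either in the cluster containing both $i$ and $j$ or in neither, so $x_{ik} = x_{jk}$ and we must also have $x_{ij} + x_{ik} - x_{jk} = 1$ and $x_{ij} - x_{ik} + x_{jk} = 1$ for each other $k$. The face $\{x \in \operatorname{conv}(S) : x_{ij} = 1\}$ therefore lies in further hyperplanes, so for $n \ge 3$ its dimension is less than $\frac{1}{2}n(n-1) - 1$.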
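A brute-force illustration for small $n$ (hypothetical helper names; objects numbered from 0; exhaustive enumeration): collect the clusterings with $x_{ij} = 1$, confirm that $x_{ik} = x_{jk}$ for every other $k$, and compare the dimension of this face with $\frac{1}{2}n(n-1) - 1$, the dimension a facet would need.

```python
from fractions import Fraction
from itertools import combinations


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks (clusters)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for b in range(len(partition)):  # join an existing cluster ...
            yield partition[:b] + [[first] + partition[b]] + partition[b + 1:]
        yield [[first]] + partition  # ... or start a new one


def incidence_vector(partition, n):
    """x_ij = 1 iff i and j share a cluster, for pairs i < j in lexicographic order."""
    cluster_of = {v: b for b, block in enumerate(partition) for v in block}
    return tuple(int(cluster_of[i] == cluster_of[j]) for i, j in combinations(range(n), 2))


def rank(rows):
    """Exact matrix rank by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def affine_dimension(points):
    """Dimension of the affine hull: rank of the differences from the first point."""
    base = points[0]
    return rank([[a - b for a, b in zip(p, base)] for p in points[1:]])


def upper_bound_face(n, i, j):
    """Incidence vectors of the clusterings with x_ij = 1, i.e. i and j in one cluster."""
    index = {pair: e for e, pair in enumerate(combinations(range(n), 2))}

    def x_of(x, a, b):  # x_ab, with the pair stored as (min, max)
        return x[index[min(a, b), max(a, b)]]

    points = [incidence_vector(p, n) for p in set_partitions(list(range(n)))]
    face = [x for x in points if x_of(x, i, j) == 1]
    for x in face:
        for k in set(range(n)) - {i, j}:
            # k is with both i and j or with neither, so x_ik = x_jk and both
            # x_ij + x_ik - x_jk <= 1 and x_ij - x_ik + x_jk <= 1 hold with equality.
            assert x_of(x, i, k) == x_of(x, j, k)
    return face


n = 4
face = upper_bound_face(n, 0, 1)
print(affine_dimension(face), n * (n - 1) // 2 - 1)  # face dimension vs. what a facet needs
```

For $n = 4$ the face has dimension 3 rather than 5. Merging $i$ and $j$ into a single object suggests why: the face behaves like the set of clusterings of $n - 1$ objects, whose hull has dimension $\frac{1}{2}(n-1)(n-2)$.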


2-partition inequalities

The following proposition generalizes the lower bound and triangle inequalities.

Proposition

(2-partition inequalities) Let $U$ and $W$ be disjoint collections of objects with $|U| > |W|$. The following inequality defines a facet of $\operatorname{conv}(S)$:

$$\sum_{i \in U,\, j \in W} x_{ij} \;-\; \sum_{\substack{i, j \in U \\ i < j}} x_{ij} \;-\; \sum_{\substack{i, j \in W \\ i < j}} x_{ij} \;\le\; |W|,$$

where $x_{ij}$ with $i > j$ is read as $x_{ji}$.

This gives the lower bound constraints when $|U| = 2$, $|W| = 0$. It gives the triangle constraints when $|U| = 2$, $|W| = 1$.
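As an illustrative check (not a proof of the facet property), the Python sketch below evaluates the left-hand side of the 2-partition inequality for every clustering of five objects and reports its maximum next to $|W|$. Objects are numbered from 0, so the first two choices of $(U, W)$ reproduce the lower bound $-x_{01} \le 0$ and the triangle inequality $x_{01} + x_{02} - x_{12} \le 1$. The function names are hypothetical.

```python
from itertools import combinations


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks (clusters)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for b in range(len(partition)):  # join an existing cluster ...
            yield partition[:b] + [[first] + partition[b]] + partition[b + 1:]
        yield [[first]] + partition  # ... or start a new one


def two_partition_lhs(partition, U, W):
    """Left-hand side of the 2-partition inequality for one clustering."""
    cluster_of = {v: b for b, block in enumerate(partition) for v in block}

    def x(i, j):  # x_ij, symmetric in i and j
        return int(cluster_of[i] == cluster_of[j])

    cross = sum(x(i, j) for i in U for j in W)
    inside_u = sum(x(i, j) for i, j in combinations(U, 2))
    inside_w = sum(x(i, j) for i, j in combinations(W, 2))
    return cross - inside_u - inside_w


n = 5
clusterings = list(set_partitions(list(range(n))))
for U, W in [((0, 1), ()), ((1, 2), (0,)), ((0, 1, 2), (3, 4))]:
    best = max(two_partition_lhs(p, U, W) for p in clusterings)
    print(U, W, "max LHS:", best, "right-hand side |W|:", len(W))
```

In each case the maximum equals $|W|$, so on these small examples the inequality is valid and attained.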


The objective function coefficients

Note that if all the $c_{ij}$ are nonnegative then the optimal solution is to place each object in its own cluster, so all $x_{ij} = 0$.

Thus, our measure $c_{ij}$ cannot simply be the distance between two objects, but must allow negative values if we are to have an interesting problem.

For more details, see Grötschel and Wakabayashi [3, 4].
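A toy exhaustive solver makes this concrete. The Python sketch below minimizes $\sum_{i<j} c_{ij} x_{ij}$ over binary $x$ satisfying all triangle inequalities by enumerating every binary vector, so it is only usable for tiny $n$. The distance values, and the shift by 3 used to create negative coefficients, are hypothetical choices for illustration and not taken from the source.

```python
from itertools import combinations, product


def solve_by_enumeration(cost, n):
    """Minimise sum_{i<j} c_ij x_ij over binary x satisfying every triangle inequality.

    Exhaustive search over all 2^(n(n-1)/2) binary vectors: a toy for tiny n only.
    `cost` maps each pair (i, j) with i < j to c_ij.
    """
    pairs = list(combinations(range(n), 2))
    index = {pair: e for e, pair in enumerate(pairs)}
    best_value, best_x = None, None
    for x in product((0, 1), repeat=len(pairs)):
        consistent = all(
            x[index[i, j]] + x[index[i, k]] - x[index[j, k]] <= 1
            and x[index[i, j]] - x[index[i, k]] + x[index[j, k]] <= 1
            and -x[index[i, j]] + x[index[i, k]] + x[index[j, k]] <= 1
            for i, j, k in combinations(range(n), 3)
        )
        if not consistent:
            continue
        value = sum(cost[pair] * x[e] for pair, e in index.items())
        if best_value is None or value < best_value:
            best_value, best_x = value, dict(zip(pairs, x))
    return best_value, best_x


distances = {(0, 1): 1, (0, 2): 4, (0, 3): 5, (1, 2): 3, (1, 3): 2, (2, 3): 1}
shifted = {pair: d - 3 for pair, d in distances.items()}  # hypothetical threshold of 3
print(solve_by_enumeration(distances, 4))
print(solve_by_enumeration(shifted, 4))
```

With the nonnegative distances the solver returns the all-zero vector with value 0, every object in its own cluster. With the shifted coefficients it returns value −4, clustering {0, 1} and {2, 3}.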

Equipartition

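Given a graph $G = (V, E)$ with $n = |V| = 2q$ for some integer $q$, we partition $V$ into two sets of size $q$. We define the variables

$$x_{ij} = \begin{cases} 1 & \text{if } i \text{ and } j \text{ are in the same partition,} \\ 0 & \text{if } i \text{ and } j \text{ are in different partitions,} \end{cases} \qquad 1 \le i < j \le n.$$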

Let S be the set of feasible incidence vectors of equipartitions.
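A small Python sketch of this feasible set (hypothetical helper names; objects numbered from 0): each equipartition is generated once by fixing the half that contains object 0. It also checks the $n$ equations $\sum_{j \ne i} x_{ij} = q - 1$, which follow directly from the definition, since every vertex shares its half with exactly $q - 1$ others.

```python
from itertools import combinations
from math import comb


def equipartitions(n):
    """Yield each split of {0, ..., n-1} into two halves once (the half containing 0 first)."""
    q = n // 2
    for rest in combinations(range(1, n), q - 1):
        half = (0,) + rest
        yield half, tuple(v for v in range(n) if v not in half)


def incidence_vector(halves, n):
    """Map each pair (i, j), i < j, to 1 if i and j lie in the same half, else 0."""
    half_of = {v: h for h, half in enumerate(halves) for v in half}
    return {(i, j): int(half_of[i] == half_of[j]) for i, j in combinations(range(n), 2)}


n = 6
q = n // 2
vectors = [incidence_vector(h, n) for h in equipartitions(n)]
print(len(vectors), comb(n, q) // 2)  # each equipartition is counted once

# Each vertex shares its half with exactly q - 1 other vertices.
for x in vectors:
    for i in range(n):
        assert sum(x[min(i, j), max(i, j)] for j in range(n) if j != i) == q - 1
```

For $n = 6$ it prints `10 10`: there are $\binom{6}{3}/2 = 10$ equipartitions.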


Polyhedral results

We have the following results:

Proposition

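The dimension of $\operatorname{conv}(S)$ is $\frac{1}{2}n(n-3)$.

If $C$ is a cycle with $q + 1$ vertices, then the inequality $x(E(C)) \le q - 1$ is facet-defining.

If $U \subseteq V$ with $|U| \ge 3$ and $|U|$ odd, then the clique inequality $x(E(U)) \ge \lfloor \frac{1}{2}|U| \rfloor^2$ is facet-defining.

Here $x(F)$ denotes $\sum_{ij \in F} x_{ij}$, $E(C)$ is the set of edges of $C$, and $E(U)$ is the set of edges with both endpoints in $U$.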
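These statements can be spot-checked by enumeration for $n = 6$ ($q = 3$). The Python sketch below (hypothetical helper names; objects numbered from 0) computes the affine dimension of the equipartition incidence vectors and the extreme values of $x(E(C))$ for a 4-cycle and of $x(E(U))$ for cliques on 3 and 5 vertices. It only confirms dimension, validity and tightness on one small instance; it does not verify the facet property.

```python
from fractions import Fraction
from itertools import combinations


def equipartition_vectors(n):
    """Yield the incidence vector of each split of {0, ..., n-1} into two halves, once each."""
    q = n // 2
    for rest in combinations(range(1, n), q - 1):
        half = {0, *rest}  # fixing the half that contains 0 avoids double counting
        yield tuple(int((i in half) == (j in half)) for i, j in combinations(range(n), 2))


def rank(rows):
    """Exact matrix rank by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def affine_dimension(points):
    """Dimension of the affine hull: rank of the differences from the first point."""
    base = points[0]
    return rank([[a - b for a, b in zip(p, base)] for p in points[1:]])


def x_sum(x, n, edges):
    """x(F): the sum of x_ij over the edges of F, each edge given as a vertex pair."""
    index = {pair: e for e, pair in enumerate(combinations(range(n), 2))}
    return sum(x[index[min(i, j), max(i, j)]] for i, j in edges)


n, q = 6, 3
points = list(equipartition_vectors(n))
print("dimension:", affine_dimension(points), "formula:", n * (n - 3) // 2)

cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a cycle on q + 1 = 4 vertices
print("max x(E(C)):", max(x_sum(x, n, cycle) for x in points), "bound:", q - 1)

for U in [(0, 1, 2), (0, 1, 2, 3, 4)]:  # odd cliques with |U| >= 3
    clique = list(combinations(U, 2))
    print(U, "min x(E(U)):", min(x_sum(x, n, clique) for x in points), "bound:", (len(U) // 2) ** 2)
```

It reports dimension 9, matching $\frac{1}{2}n(n-3)$, a maximum of $2 = q - 1$ over the cycle, and minima 1 and 4 over the two cliques, matching $\lfloor \frac{1}{2}|U| \rfloor^2$.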

Other inequalities are known (Conforti et al. [1, 2]).

Clustering with lower bound

Now consider a clustering problem where we require each cluster to contain at least $q$ elements, for some positive integer $q$.

For example, this problem arises in the following settings:

- allocating teams to divisions in a sports league; in this case, we often require each division to have the same cardinality;
- microaggregation in the release of data: in order to preserve privacy, clusters with tiny sizes must be avoided.


Polyhedral theory

Let $S \subseteq \mathbb{B}^{\frac{1}{2}n(n-1)}$ be the set of incidence vectors of clusterings where each cluster contains at least $q$ elements. We have the following results regarding $\operatorname{conv}(S)$:

Proposition

If $q < n/2$ then $\dim(\operatorname{conv}(S)) = \frac{1}{2}n(n-1)$, so $S$ is full-dimensional.

Proposition

The nonnegativity constraints and the triangle inequalities given earlier define facets of $\operatorname{conv}(S)$, provided $q < n/3$. The 2-partition inequalities given earlier define facets of $\operatorname{conv}(S)$, provided $(|W| + 2)q < n$.
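An enumeration sketch for the dimension result (hypothetical helper names; objects numbered from 0; exhaustive, so tiny instances only): keep the clusterings whose clusters all have at least $q$ objects and compute the affine dimension of their incidence vectors.

```python
from fractions import Fraction
from itertools import combinations


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks (clusters)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for b in range(len(partition)):  # join an existing cluster ...
            yield partition[:b] + [[first] + partition[b]] + partition[b + 1:]
        yield [[first]] + partition  # ... or start a new one


def incidence_vector(partition, n):
    """x_ij = 1 iff i and j share a cluster, for pairs i < j in lexicographic order."""
    cluster_of = {v: b for b, block in enumerate(partition) for v in block}
    return tuple(int(cluster_of[i] == cluster_of[j]) for i, j in combinations(range(n), 2))


def rank(rows):
    """Exact matrix rank by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def affine_dimension(points):
    """Dimension of the affine hull: rank of the differences from the first point."""
    base = points[0]
    return rank([[a - b for a, b in zip(p, base)] for p in points[1:]])


def clusterings_with_min_size(n, q):
    """All clusterings of n objects in which every cluster has at least q objects."""
    return [p for p in set_partitions(list(range(n))) if all(len(block) >= q for block in p)]


for n, q in [(5, 2), (4, 2)]:
    points = [incidence_vector(p, n) for p in clusterings_with_min_size(n, q)]
    print(f"n={n}, q={q}: {len(points)} clusterings, "
          f"affine dimension {affine_dimension(points)} of {n * (n - 1) // 2}")
```

For $n = 5$, $q = 2$ (so $q < n/2$) the 11 feasible clusterings span all 10 dimensions. For $n = 4$, $q = 2$ the hypothesis fails, and the 4 feasible clusterings span only 3 of the 6 dimensions.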

Other families of valid inequalities are also known [5, 6].


References

[1] M. Conforti, M. R. Rao, and A. Sassano. The equipartition polytope I: Formulations, dimension and basic facets. Mathematical Programming, 49:49–70, 1990.

[2] M. Conforti, M. R. Rao, and A. Sassano. The equipartition polytope II: Valid inequalities and facets. Mathematical Programming, 49:71–90, 1990.

[3] M. Grötschel and Y. Wakabayashi. A cutting plane algorithm for a clustering problem. Mathematical Programming, 45:59–96, 1989.

[4] M. Grötschel and Y. Wakabayashi. Facets of the clique partitioning polytope. Mathematical Programming, 47:367–387, 1990.

[5] X. Ji and J. E. Mitchell. Branch-and-price-and-cut on the clique partition problem with minimum clique size requirement. Discrete Optimization, 4(1):87–102, 2007.

[6] J. E. Mitchell. Realignment in the National Football League: Did they get it right? Naval Research Logistics, 50(7):683–701, 2003.
