Integer and Combinatorial Optimization: Clustering Problems
John E. Mitchell
Department of Mathematical Sciences, RPI, Troy, NY 12180 USA
February 2019
Mitchell Clustering Problems 1 / 14
Clustering
We have n objects, each with a number of attributes.
We wish to group similar objects into clusters.
There is no limit on the number of clusters, or on the size of each cluster.
We have a measure $c_{ij}$ of the difference between two objects i and j; the larger this measure, the less similar the objects.
This measure can take positive or negative values.
Variables and dimension
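We model this by introducing variables
$$x_{ij} = \begin{cases} 1 & \text{if } i \text{ and } j \text{ are in the same cluster} \\ 0 & \text{if } i \text{ and } j \text{ are in different clusters} \end{cases} \qquad \text{for } 1 \le i < j \le n.$$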
Let $S \subseteq \mathbb{B}^{\frac{1}{2}n(n-1)}$ be the set of feasible solutions. We have the following results regarding conv(S):
Proposition
The set S is full-dimensional.
One way to prove this is to note that the origin and all the unit vectors are in S, giving $\frac{1}{2}n(n-1) + 1$ affinely independent points.
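As a small illustration of the variables above, the following Python sketch builds the incidence vector of a clustering from a list of cluster labels. The function name `incidence_vector` and the label encoding are assumptions made for this example, and objects are indexed from 0 rather than 1. It also suggests why the origin and the unit vectors belong to S: the origin corresponds to placing every object in its own cluster, and the unit vector for a pair (i, j) to merging only i and j.

```python
from itertools import combinations

def incidence_vector(labels):
    """Return {(i, j): x_ij} for i < j, where x_ij = 1 iff labels[i] == labels[j]."""
    n = len(labels)
    return {(i, j): int(labels[i] == labels[j]) for i, j in combinations(range(n), 2)}

# Every object in its own cluster gives the origin.
origin = incidence_vector([0, 1, 2, 3])

# Merging only objects 0 and 2 gives the unit vector for the pair (0, 2).
unit_02 = incidence_vector([0, 1, 0, 3])

print(sum(origin.values()))                       # number of ones in the origin
print([p for p, v in unit_02.items() if v == 1])  # pairs placed together
```

The dictionary keyed by pairs is only one possible encoding; a flat list ordered by pair would serve equally well.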
Nonnegativity
Proposition
The lower bound constraints $x_{ij} \ge 0$ define facets of conv(S).
Triangle inequalities
Proposition
Let $1 \le i < j < k \le n$. The triangle inequalities
$$\begin{aligned} x_{ij} + x_{ik} - x_{jk} &\le 1 \\ x_{ij} - x_{ik} + x_{jk} &\le 1 \\ -x_{ij} + x_{ik} + x_{jk} &\le 1 \end{aligned}$$
define facets of conv(S).
These inequalities enforce consistency.
For example, the first one says that if i and j are in the same cluster and also i and k are in the same cluster then j and k must be in the same cluster. The only binary solution violating this constraint is $x_{ij} = x_{ik} = 1$, $x_{jk} = 0$.
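The consistency check can be sketched in Python as follows. The helper `violated_triangles` is a hypothetical name; it enumerates all triples and reports each triangle inequality whose left-hand side exceeds 1, so it can also be applied to fractional points such as those arising in a linear programming relaxation. Indices start at 0, and the tolerance `eps` is an assumption of this sketch.

```python
from itertools import combinations, product

def violated_triangles(x, n, eps=1e-9):
    """List (i, j, k, form, lhs) for every violated triangle inequality.

    x maps pairs (i, j) with i < j to values in [0, 1].
    form 0: x_ij + x_ik - x_jk, form 1: x_ij - x_ik + x_jk,
    form 2: -x_ij + x_ik + x_jk.
    """
    out = []
    for i, j, k in combinations(range(n), 3):
        a, b, c = x[i, j], x[i, k], x[j, k]
        for form, lhs in enumerate((a + b - c, a - b + c, -a + b + c)):
            if lhs > 1 + eps:
                out.append((i, j, k, form, lhs))
    return out

# Among the 8 binary points for n = 3, find those violating the first inequality.
bad = []
for a, b, c in product((0, 1), repeat=3):
    x = {(0, 1): a, (0, 2): b, (1, 2): c}
    if any(v[3] == 0 for v in violated_triangles(x, 3)):
        bad.append((a, b, c))
print(bad)  # [(1, 1, 0)]
```

Enumerating all triples costs on the order of n^3 operations, which is usually acceptable for moderate n.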
An integer program
Proposition
Any binary vector satisfying all the triangle inequalities is the incidence vector of a clustering.
Thus, finding the best binary vector satisfying the triangle inequalities will solve the clustering problem.
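To illustrate the proposition, the sketch below recovers the clusters from a binary vector that satisfies the triangle inequalities, and counts such vectors for small n by brute force. The function names are hypothetical and indices start at 0. The counts should agree with the number of partitions of an n-element set (the Bell numbers 2, 5, 15, 52 for n = 2, ..., 5), which is a quick sanity check rather than a proof.

```python
from itertools import combinations, product

def satisfies_triangles(x, n):
    """True if the 0/1 vector x (dict on pairs i < j) obeys every triangle inequality."""
    for i, j, k in combinations(range(n), 3):
        # For binary values, a violation means exactly two of the three are 1.
        if x[i, j] + x[i, k] + x[j, k] == 2:
            return False
    return True

def clusters_from_vector(x, n):
    """Group objects into clusters, assuming x satisfies the triangle inequalities."""
    clusters = []
    for i in range(n):
        for c in clusters:
            rep = c[0]  # smallest member; consistency lets one comparison decide
            if x[rep, i] == 1:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def count_feasible(n):
    """Count binary vectors satisfying all triangle inequalities."""
    pairs = list(combinations(range(n), 2))
    return sum(
        satisfies_triangles(dict(zip(pairs, bits)), n)
        for bits in product((0, 1), repeat=len(pairs))
    )

x = {(0, 1): 1, (0, 2): 0, (0, 3): 0, (1, 2): 0, (1, 3): 0, (2, 3): 1}
print(clusters_from_vector(x, 4))                 # [[0, 1], [2, 3]]
print([count_feasible(n) for n in range(2, 6)])   # [2, 5, 15, 52]
```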
Upper bound constraints
The upper bound constraints $x_{ij} \le 1$ do not define facets of conv(S).
In particular, if $x_{ij} = 1$ then we must also have $x_{ij} + x_{ik} - x_{jk} = 1$ and $x_{ij} - x_{ik} + x_{jk} = 1$ for each other k.
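One way to see this, sketched here as a short derivation rather than taken from the source: adding the first two triangle inequalities for any third object k gives the upper bound, so the bound is implied by them, and it can hold with equality only when both triangle inequalities do.

```latex
\begin{aligned}
(x_{ij} + x_{ik} - x_{jk}) + (x_{ij} - x_{ik} + x_{jk}) &\le 1 + 1 \\
2\,x_{ij} &\le 2 \\
x_{ij} &\le 1 .
\end{aligned}
```

Hence the face of conv(S) on which $x_{ij} = 1$ lies in the intersection of the two faces defined by these triangle inequalities. Those are two distinct facets, so their intersection has dimension at least two less than conv(S), and the upper bound cannot define a facet. This argument assumes n ≥ 3, so that a third object k exists.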
2-partition inequalities
The following proposition generalizes the lower bound and triangle inequalities.
Proposition
(2-partition inequalities) Let U and W be disjoint collections of objects with |U| > |W|. The following inequality defines a facet of conv(S):
$$\sum_{i \in U,\, j \in W} x_{ij} - \sum_{i \in U,\, j \in U} x_{ij} - \sum_{i \in W,\, j \in W} x_{ij} \le |W|.$$
This gives the lower bound constraints when |U| = 2, |W| = 0. It gives the triangle constraints when |U| = 2, |W| = 1.
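The 2-partition inequality can be evaluated with a few lines of Python. In the sketch below the function names are hypothetical, indices start at 0, and the sums within U and within W are read as counting each pair once, which is an assumption about the notation. The brute-force loop checks validity (not the facet property) over every clustering of n objects and every admissible choice of U and W.

```python
from itertools import combinations, product

def two_partition_lhs(x, U, W):
    """Left-hand side of the 2-partition inequality for disjoint U and W.

    x maps pairs (i, j) with i < j to values; pairs inside U or inside W
    are counted once each (an assumption about how the sums are read).
    """
    def val(i, j):
        return x[min(i, j), max(i, j)]
    cross = sum(val(i, j) for i in U for j in W)
    in_u = sum(val(i, j) for i, j in combinations(sorted(U), 2))
    in_w = sum(val(i, j) for i, j in combinations(sorted(W), 2))
    return cross - in_u - in_w

def incidence_vector(labels):
    n = len(labels)
    return {(i, j): int(labels[i] == labels[j]) for i, j in combinations(range(n), 2)}

def max_violation(n):
    """Largest value of LHS - |W| over all clusterings and all valid (U, W)."""
    worst = None
    for labels in product(range(n), repeat=n):        # every clustering (with repeats)
        x = incidence_vector(labels)
        for side in product((0, 1, 2), repeat=n):     # 0: unused, 1: in U, 2: in W
            U = [i for i in range(n) if side[i] == 1]
            W = [i for i in range(n) if side[i] == 2]
            if len(U) > len(W):
                gap = two_partition_lhs(x, U, W) - len(W)
                worst = gap if worst is None else max(worst, gap)
    return worst

# With U = {1, 2} and W = {0} the left-hand side is x_01 + x_02 - x_12.
x = {(0, 1): 1, (0, 2): 1, (1, 2): 0}
print(two_partition_lhs(x, U=[1, 2], W=[0]))  # 2, so this inconsistent point is cut off
print(max_violation(4))                       # 0, so no clustering violates the inequality
```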
The objective function coefficients
Note that if all the $c_{ij}$ are nonnegative then the optimal solution is to place each object in its own cluster, so all $x_{ij} = 0$.
Thus, our measure $c_{ij}$ cannot simply be the distance between two objects, but must allow negative values if we are to have an interesting problem.
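The effect of the signs can be seen on a tiny instance. The sketch below assumes the objective is to minimize the sum of $c_{ij} x_{ij}$, which is consistent with $c_{ij}$ measuring dissimilarity but is not written out explicitly in the source. The function names and the cost data are made up for illustration, and the enumeration over all labelings is only practical for very small n.

```python
from itertools import combinations, product

def best_clustering(c, n):
    """Brute-force minimizer of sum(c[i, j] * x[i, j]) over all clusterings.

    c maps pairs (i, j) with i < j to costs. Returns (cost, labels), where
    labels[i] is the cluster index of object i. Ties keep the first found.
    """
    pairs = list(combinations(range(n), 2))
    best = None
    for labels in product(range(n), repeat=n):
        cost = sum(c[i, j] for i, j in pairs if labels[i] == labels[j])
        if best is None or cost < best[0]:
            best = (cost, labels)
    return best

def as_clusters(labels):
    """Turn a label tuple into a sorted list of clusters."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    return sorted(groups.values())

# Nonnegative costs: no pair is worth merging, so the optimal cost is 0.
c_pos = {(0, 1): 2, (0, 2): 5, (0, 3): 1, (1, 2): 3, (1, 3): 4, (2, 3): 2}
cost_pos, labels_pos = best_clustering(c_pos, 4)
print(cost_pos, as_clusters(labels_pos))

# Mixed signs: pairs (0, 1) and (2, 3) are similar (negative cost).
c_mix = {(0, 1): -3, (0, 2): 4, (0, 3): 5, (1, 2): 4, (1, 3): 6, (2, 3): -2}
cost_mix, labels_mix = best_clustering(c_mix, 4)
print(cost_mix, as_clusters(labels_mix))
```

With the nonnegative costs every object ends up alone, while the mixed-sign costs produce the two clusters {0, 1} and {2, 3}.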
For more details, see Grötschel and Wakabayashi [3, 4].
Equipartition
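Given a graph G = (V, E) with n = |V| = 2q for some integer q, we partition V into two sets of size q. We define the variables
$$x_{ij} = \begin{cases} 1 & \text{if } i \text{ and } j \text{ are in the same set of the partition} \\ 0 & \text{if } i \text{ and } j \text{ are in different sets of the partition} \end{cases} \qquad \text{for } 1 \le i < j \le n.$$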
Let S be the set of feasible incidence vectors of equipartitions.
Polyhedral results
We have the following results:
Proposition
The dimension of conv(S) is $\frac{1}{2}n(n-3)$.
If C is a cycle with q + 1 vertices then the inequality $x(E(C)) \le q - 1$ is facet-defining.
If U ⊆ V with |U| ≥ 3 and odd, the clique inequality $x(E(U)) \ge \lfloor \frac{1}{2}|U| \rfloor^2$ is facet-defining.
Other inequalities are known (Conforti et al. [1, 2]).
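The results above can be checked numerically on small instances. The sketch below enumerates all equipartitions of n objects, assuming a variable exists for every pair as in the definition of $x_{ij}$, and verifies three things with assertions: each object shares its set with exactly q − 1 others (n equations, which is one way to see why the dimension falls from $\frac{1}{2}n(n-1)$ to at most $\frac{1}{2}n(n-3)$), one particular cycle inequality, and all clique inequalities. Only validity is tested, not the facet property, and all names are hypothetical.

```python
from itertools import combinations

def equipartitions(n):
    """Yield incidence vectors (dict on pairs i < j) of all equipartitions of range(n)."""
    q = n // 2
    for rest in combinations(range(1, n), q - 1):
        side = {0, *rest}                      # the set containing object 0
        yield {(i, j): int((i in side) == (j in side))
               for i, j in combinations(range(n), 2)}

def x_of(x, i, j):
    return x[min(i, j), max(i, j)]

def check(n):
    """Assert the equations and inequalities on every equipartition; return their number."""
    q = n // 2
    points = list(equipartitions(n))
    for x in points:
        # Each object shares its set with exactly q - 1 others.
        assert all(sum(x_of(x, i, j) for j in range(n) if j != i) == q - 1
                   for i in range(n))
        # Cycle inequality on the cycle 0, 1, ..., q, 0 with q + 1 vertices.
        cyc = list(range(q + 1))
        cyc_edges = [(cyc[t], cyc[(t + 1) % (q + 1)]) for t in range(q + 1)]
        assert sum(x_of(x, i, j) for i, j in cyc_edges) <= q - 1
        # Clique inequalities for every odd U with |U| >= 3.
        for size in range(3, n + 1, 2):
            for U in combinations(range(n), size):
                inside = sum(x[i, j] for i, j in combinations(U, 2))
                assert inside >= (size // 2) ** 2
    return len(points)

print(check(4), check(6), check(8))  # 3 10 35
```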
Clustering with lower bound
Now consider a clustering problem where we require each cluster to contain at least q elements, for some positive integer q.
For example, this problem arises in the following settings:
Allocating teams to divisions in a sports league. In this case, we often require each division to have the same cardinality.
Microaggregation in the release of data: in order to preserve privacy, clusters with tiny sizes must be avoided.
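In terms of the variables $x_{ij}$ used for clustering, the size of the cluster containing object i equals 1 plus the sum of $x_{ij}$ over all other objects j. One natural way to state the size requirement is therefore that each such sum is at least q − 1; this is an assumption of the sketch below, not a formulation given in the source. The function names and the six-team example are made up for illustration.

```python
from itertools import combinations

def incidence_vector(labels):
    n = len(labels)
    return {(i, j): int(labels[i] == labels[j]) for i, j in combinations(range(n), 2)}

def cluster_sizes(x, n):
    """Size of the cluster containing each object, read off the incidence vector."""
    return [1 + sum(x[min(i, j), max(i, j)] for j in range(n) if j != i)
            for i in range(n)]

def meets_lower_bound(x, n, q):
    """True if every cluster has at least q elements."""
    return all(size >= q for size in cluster_sizes(x, n))

# Six teams split into divisions {0, 1, 2} and {3, 4, 5}.
x = incidence_vector([0, 0, 0, 1, 1, 1])
print(cluster_sizes(x, 6))                                          # [3, 3, 3, 3, 3, 3]
print(meets_lower_bound(x, 6, q=3), meets_lower_bound(x, 6, q=4))   # True False
```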
Polyhedral theory
Let $S \subseteq \mathbb{B}^{\frac{1}{2}n(n-1)}$ be the set of incidence vectors of clusterings where each cluster contains at least q elements. We have the following results regarding conv(S):
Proposition
If q < n/2 then $\dim(\mathrm{conv}(S)) = \frac{1}{2}n(n-1)$, so S is full-dimensional.
Proposition
The nonnegativity constraints and the triangle constraints of Proposition 3 define facets of conv(S), provided q < n/3. The 2-partition inequalities of Proposition 5 define facets of conv(S) provided (|W| + 2)q < n.
Other families of valid inequalities are also known [5, 6].
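The dimension result can be checked on small cases by computing the dimension of the affine hull of S directly. The sketch below is a brute-force illustration with hypothetical function names: it enumerates the clusterings whose clusters all have at least q elements and applies exact Gaussian elimination over the rationals to the differences from one point. For n = 5 and q = 2 (so q < n/2) it should report 10, which is $\frac{1}{2}n(n-1)$, while for n = 4 and q = 2 (so q = n/2) the dimension drops to 3.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

def feasible_points(n, q):
    """Distinct incidence vectors (as tuples) of clusterings with all clusters >= q."""
    pairs = list(combinations(range(n), 2))
    points = set()
    for labels in product(range(n), repeat=n):
        if min(Counter(labels).values()) >= q:
            points.add(tuple(int(labels[i] == labels[j]) for i, j in pairs))
    return sorted(points)

def rank(rows):
    """Rank of a list of equal-length rows, by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            if f != 0:
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def affine_dimension(points):
    """Dimension of the affine hull: rank of differences from the first point."""
    base = points[0]
    diffs = [[a - b for a, b in zip(p, base)] for p in points[1:]]
    return rank(diffs)

print(affine_dimension(feasible_points(5, 2)))  # q < n/2
print(affine_dimension(feasible_points(4, 2)))  # q = n/2
```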
References
[1] M. Conforti, M. R. Rao, and A. Sassano. The equipartition polytope I: Formulations, dimension and basic facets. Mathematical Programming, 49:49–70, 1990.
[2] M. Conforti, M. R. Rao, and A. Sassano. The equipartition polytope II: Valid inequalities and facets. Mathematical Programming, 49:71–90, 1990.
[3] M. Grötschel and Y. Wakabayashi. A cutting plane algorithm for a clustering problem. Mathematical Programming, 45:59–96, 1989.
[4] M. Grötschel and Y. Wakabayashi. Facets of the clique partitioning polytope. Mathematical Programming, 47:367–387, 1990.
[5] X. Ji and J. E. Mitchell. Branch-and-price-and-cut on the clique partition problem with minimum clique size requirement. Discrete Optimization, 4(1):87–102, 2007.
[6] J. E. Mitchell. Realignment in the national football league: Did they get it right? Naval Research Logistics, 50(7):683–701, 2003.