Post on 16-Jan-2016
Better Approximations for the Minimum Common Integer Partition Problem
David Woodruff
Approx 2006
MIT and Tsinghua University
Minimum Common Integer Partition
• X = {x1, …, xr} and Y = {y1, …, ys} are multisets of positive integers, with r ≥ s
• Consider a partition of X into s subsets B1, …, Bs
• If there exist B1, …, Bs such that the elements of Bi sum to yi for all i, then X is an integer partition of Y. Think of X as a refinement of Y
• k-MCIP problem: Given Y1, …, Yk, find a smallest X that is an integer partition of each of Y1, …, Yk
• Let m = |Y1| + … + |Yk|. Efficiency is measured in terms of m.
MCIP Example
Y1 = {2, 2, 3}, Y2 = {1, 1, 5}
Claim: {1, 1, 2, 3} = k-MCIP(Y1, Y2)
Proof:
Partition 1: {1, 1}, {2}, {3}
Partition 2: {1}, {1}, {2, 3}
So {1, 1, 2, 3} is an integer partition of both Y1 and Y2
Any integer partition of both Y1 and Y2 has size ≥ 4
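The claim on this slide can be checked by brute force. The sketch below is illustrative only: the names `is_integer_partition`, `partitions_into`, and `brute_force_mcip` are hypothetical helpers, not from the talk, and the search is exponential, so it is only meant for tiny inputs like the one above.

```python
from typing import Iterator, List, Optional, Sequence


def is_integer_partition(xs: Sequence[int], ys: Sequence[int]) -> bool:
    """True if xs can be split into len(ys) groups whose sums are exactly ys."""
    if sum(xs) != sum(ys):
        return False
    remaining = list(ys)               # amount each y_i still needs
    items = sorted(xs, reverse=True)   # place large integers first to prune early

    def place(idx: int) -> bool:
        if idx == len(items):
            return all(r == 0 for r in remaining)
        tried = set()                  # skip targets with an identical remainder
        for i, r in enumerate(remaining):
            if r >= items[idx] and r not in tried:
                tried.add(r)
                remaining[i] -= items[idx]
                if place(idx + 1):
                    return True
                remaining[i] += items[idx]
        return False

    return place(0)


def partitions_into(n: int, parts: int, largest: Optional[int] = None) -> Iterator[List[int]]:
    """Yield all multisets of `parts` positive integers summing to n (non-increasing)."""
    if largest is None:
        largest = n
    if parts == 0:
        if n == 0:
            yield []
        return
    for first in range(min(n - (parts - 1), largest), 0, -1):
        for rest in partitions_into(n - first, parts - 1, first):
            yield [first] + rest


def brute_force_mcip(multisets: Sequence[Sequence[int]]) -> Optional[List[int]]:
    """Smallest X that is an integer partition of every input (exhaustive search)."""
    total = sum(multisets[0])
    for size in range(1, total + 1):
        for xs in partitions_into(total, size):
            if all(is_integer_partition(xs, ys) for ys in multisets):
                return xs
    return None


best = brute_force_mcip([[2, 2, 3], [1, 1, 5]])
print(sorted(best), len(best))
```

On the slide's instance the search finds no common partition of size 3 and returns one of size 4, which agrees with the claim.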
Applications
AAA-AAAAA-AA-A: {3,5,2,1}
AA-AA-AAAA-AAA: {2,2,4,3}
MCIP = {2, 3, 1, 2, 3}
Since |MCIP| is small, humans and monkeys are similar (this measure has been proposed in practice [Jiang, et al])
Applications
A-A-A-A-AA-A-AA-A-A: {1,1,1,1,2,1,2,1,1}
AA-AA-AAAA-AAA: {2,2,4,3}
MCIP = {1, 1, 1, 1, 1, 1, 1, 2, 2}
Since |MCIP| is large, humans and mice are not similar
Applications
• DNA fingerprint assembly
– Oligonucleotide Fingerprinting Ribosomal Genes Project [Valinsky, et al]
– Goal is to identify microbial organisms
– Use MCIP as a subroutine, k ≈ 2^8, m ≈ 2^12 [Jiang]
• Clustering? Scheduling?
Previous Work
k-MCIP problem: Given Y1, …, Yk, find a smallest integer partition of each of Y1, …, Yk
[CLLJ] NP-hard (Maximum Set Packing)
APX-hard for every k ≥ 2 (Maximum-3-Dimensional Matching with Bounded Degree)
Previous Work
[CLLJ] Upper Bounds
(5/4)-approximation for k = 2. Problem: Ω(m^9) running time (m ≈ 2^12 in practice)
(k − 1/3)-approximation in general. Problems: (1) Large ratio (2) Unknown if there is a tight instance
Our Contributions
• .614k + o(k) approximation
– O(m log k) time
– Extremely easy to implement
– If Y1, …, Yk are disjoint, then (k+1)/2 approximation
• We show that the [CLLJ] (k − 1/3)-approximation algorithm is actually a (k − 1/2)-approximation, and this is tight
Algorithm Overview
• Let A be an algorithm for 2-MCIP. We build an algorithm B for k-MCIP
• Choose a random set partition of {1, …, k} into pairs of integers
• For each pair (i, j) in the partition, let Ai,j = A(Yi, Yj)
• If there is only one pair (1, 2) in the partition, output A1,2; otherwise recurse on the multisets Ai,j for the pairs (i, j) in the partition
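The reduction from k-MCIP to 2-MCIP described above can be sketched as follows. This is a minimal illustration under stated assumptions: `k_mcip` and `greedy` are hypothetical names, the recursion is written as a loop, an unpaired multiset (odd count) is simply carried to the next round, which the slides do not specify, and the stand-in 2-MCIP routine is a plain greedy rather than the algorithm the talk ultimately plugs in.

```python
import random
from typing import Callable, List, Optional, Sequence

TwoMcip = Callable[[List[int], List[int]], List[int]]


def greedy(y1: List[int], y2: List[int]) -> List[int]:
    """Stand-in 2-MCIP routine: repeatedly split off the minimum of two integers."""
    a, b, out = list(y1), list(y2), []
    while a and b:
        x, y = a.pop(), b.pop()
        step = min(x, y)
        out.append(step)
        if x > step:
            a.append(x - step)   # leftover stays in its multiset
        if y > step:
            b.append(y - step)
    return out


def k_mcip(multisets: Sequence[Sequence[int]], two_mcip: TwoMcip,
           rng: Optional[random.Random] = None) -> List[int]:
    """Pair the multisets at random, merge each pair, and repeat until one is left."""
    if not multisets:
        raise ValueError("need at least one multiset")
    rng = rng or random.Random()
    current = [list(y) for y in multisets]
    while len(current) > 1:
        rng.shuffle(current)     # a random partition into consecutive pairs
        merged = [two_mcip(current[i], current[i + 1])
                  for i in range(0, len(current) - 1, 2)]
        if len(current) % 2 == 1:
            merged.append(current[-1])   # assumption: odd one out waits a round
        current = merged
    return current[0]


result = k_mcip([[2, 2, 3], [1, 1, 5], [3, 4], [7]], greedy, random.Random(0))
print(sorted(result))
```

The output is a common integer partition because refinement is transitive: a common partition of A1,2 and A3,4 also refines Y1 through Y4.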
2-MCIP Algorithm
• What is the algorithm for 2-MCIP?
• Greedy algorithm
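[Figure: animated example on two multisets, labeled Y1 and Y2, with entries 3 4 2 2 and 1 2 5 3; output entries shown: 3 2 1 3]
• Choose two integers, one from each multiset
• Take the minimum
• Subtract the minimum from both integers and append it to the output
• Remove all 0s
• Repeat
|Greedy(Y1, Y2)| < |Y1| + |Y2|
Generalization: |Greedy(Y1, …, Yk)| ≤ |Y1| + … + |Yk| = m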
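The greedy steps on this slide can be sketched in a few lines. The slide does not say which two integers to choose, so this version takes the last element of each list, which is an arbitrary choice; `greedy_2mcip` is a hypothetical name, and the input lists are one reading of the figure's digits.

```python
from typing import List, Sequence


def greedy_2mcip(y1: Sequence[int], y2: Sequence[int]) -> List[int]:
    """Greedy common integer partition of two multisets with equal sums."""
    if sum(y1) != sum(y2):
        raise ValueError("the two multisets must have the same sum")
    a, b = list(y1), list(y2)
    out: List[int] = []
    while a and b:
        x, y = a.pop(), b.pop()      # choose two integers (arbitrary choice here)
        step = min(x, y)             # take the minimum
        out.append(step)             # append it to the output
        if x > step:                 # subtract; a resulting 0 is simply dropped
            a.append(x - step)
        if y > step:
            b.append(y - step)
    return out


out = greedy_2mcip([3, 4, 2, 2], [1, 2, 5, 3])
print(out, len(out))
```

Every round removes at least one integer from the two lists and the final round removes two, which is why the output has fewer than |Y1| + |Y2| elements.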
Better 2-MCIP Algorithm
• CommonElements algorithm for 2-MCIP of Y1, Y2:
• T ← ∅. While there is a common integer x of Y1 and Y2: T ← T ∪ {x}, Y1 ← Y1 \ {x}, Y2 ← Y2 \ {x}
• Output T ∪ Greedy(Y1, Y2)
• Let c1,2 be the # of common integers of Y1 and Y2
• |CommonElements(Y1, Y2)| ≤ (|Y1| + |Y2| − 2c1,2) + c1,2 = |Y1| + |Y2| − c1,2
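A possible sketch of CommonElements, using `collections.Counter` for the multiset intersection and difference. The function names are hypothetical, and the inner greedy picks the last element of each list, an arbitrary choice the slides leave open. The example input is the pair of multisets from the first Applications slide.

```python
from collections import Counter
from typing import List, Sequence


def _greedy(y1: Sequence[int], y2: Sequence[int]) -> List[int]:
    """Plain greedy on two multisets with equal sums."""
    a, b, out = list(y1), list(y2), []
    while a and b:
        x, y = a.pop(), b.pop()
        step = min(x, y)
        out.append(step)
        if x > step:
            a.append(x - step)
        if y > step:
            b.append(y - step)
    return out


def common_elements(y1: Sequence[int], y2: Sequence[int]) -> List[int]:
    """Move common integers to the output first, then run greedy on the rest."""
    c1, c2 = Counter(y1), Counter(y2)
    common = c1 & c2                          # multiset intersection: this is T
    rest1 = list((c1 - common).elements())    # Y1 with the common integers removed
    rest2 = list((c2 - common).elements())    # Y2 with the common integers removed
    return list(common.elements()) + _greedy(rest1, rest2)


y1, y2 = [2, 2, 4, 3], [3, 5, 2, 1]
result = common_elements(y1, y2)
c12 = sum((Counter(y1) & Counter(y2)).values())
print(sorted(result), len(result), len(y1) + len(y2) - c12)
```

Here two integers (2 and 3) are common, so the bound |Y1| + |Y2| − c1,2 is 6, and the output has 5 elements.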
Algorithm Recap
• Choose a random set partition of {1, …, k} into pairs of integers
• For each pair (i, j) in the partition, let Ai,j = CommonElements(Yi, Yj)
• If there is only one pair (1, 2) in the partition, output A1,2; otherwise recurse on the multisets Ai,j for the pairs (i, j) in the partition
Analysis
• Lower bound the size of an optimal solution as a function of the frequency of different integers
• Find the expected output size of our algorithm as a function of the frequency of different integers
• Divide these two to get a worst-case (expected) ratio
• Derandomize using conditional expectations
Frequency of Integers
Define the r-redundancy Red(r) to capture integer frequencies
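[Figure: three input multisets Y1, Y2, Y3 with entries 1 3 1 3 2, 1 1 1 2 5, and 1 1 3 4 1]
Consider r disjoint multisets A1, …, Ar such that
1. Each Ai contains at most one element from each input multiset
2. Ai contains only 1 distinct integer
Red(r) is the maximum of |A1| + … + |Ar| over all such A1, …, Ar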
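Red(r) can be computed directly from integer frequencies. The sketch below assumes the reading under which each Ai holds copies of a single integer and takes at most one copy from each input multiset, which is the reading the later bounds (Red(opt) = opt for disjoint inputs, and the rk + (opt − r)(k − 1) bound) rely on. The name `redundancy` is hypothetical, and assigning the figure's three rows to Y1, Y2, Y3 in order is also an assumption.

```python
from collections import Counter
from typing import List, Sequence


def redundancy(multisets: Sequence[Sequence[int]], r: int) -> int:
    """Red(r): largest total size of r disjoint single-integer groups,
    each taking at most one copy from each input multiset."""
    counts = [Counter(y) for y in multisets]
    group_sizes: List[int] = []
    for value in set().union(*multisets):
        mults = [c[value] for c in counts]
        # The t-th group for this value takes one copy from every multiset
        # that still has at least t copies of it.
        for t in range(1, max(mults) + 1):
            group_sizes.append(sum(1 for f in mults if f >= t))
    group_sizes.sort(reverse=True)
    return sum(group_sizes[:r])


ys = [[1, 3, 1, 3, 2], [1, 1, 1, 2, 5], [1, 1, 3, 4, 1]]
print([redundancy(ys, r) for r in (1, 2, 3)])
```

For the earlier MCIP example (Y1 = {2, 2, 3}, Y2 = {1, 1, 5}, k = 2, m = 6, opt = 4) this gives Red(4) = 4, so 2m − Red(opt) = 8 = k·opt.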
Lower Bound
opt is the size of k-MCIP
Bipartite graph: left vertices are the elements of Y1, Y2, …, Yk; right vertices are the elements of k-MCIP
A left vertex is joined to the elements partitioning it (e.g., 5 is joined to 2 and 3)
There are opt right vertices, each of degree k
# degree-1 vertices on the left is ≤ Red(opt). So, # edges is ≥ 1·Red(opt) + 2·(m − Red(opt)).
But, # edges is exactly k·opt. So, k·opt ≥ 2m − Red(opt)
Example
• Our bound is k·opt ≥ 2m − Red(opt)
• If input multisets are disjoint, Red(opt) = opt
• Trivial greedy algorithm has output size ≤ m
• So greedy algorithm is a (k+1)/2 approximation, since m/opt ≤ (k+1)/2
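The arithmetic behind the (k+1)/2 ratio, written out using only the lower bound and the disjointness fact stated on this slide:

```latex
\begin{align*}
k \cdot \mathrm{opt} &\ge 2m - \mathrm{Red}(\mathrm{opt}) = 2m - \mathrm{opt} \\
\Longrightarrow\quad (k+1)\,\mathrm{opt} &\ge 2m \\
\Longrightarrow\quad \frac{m}{\mathrm{opt}} &\le \frac{k+1}{2}
\end{align*}
```

Since the greedy output has size at most m, its ratio to opt is at most (k+1)/2.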
Algorithm Recap
• Choose a random set partition of {1, …, k} into pairs of integers
• For each pair (i, j) in the partition, let Ai,j = CommonElements(Yi, Yj)
• If there is only one pair (1, 2) in the partition, output A1,2; otherwise recurse on the multisets Ai,j for the pairs (i, j) in the partition
Upper Bound
• In some recursive call on multisets Ya and Yb, we are interested in the number of common elements of Ya, Yb
• Since we choose a random partition of input multisets, we can bound the expected number of common elements as a function of Red(opt)
• Linearity of expectation and some calculus allow us to bound the expected number of common elements encountered over all recursive calls, in terms of Red(opt)
• Use lower bound in terms of Red(opt) to get overall ratio
Upper Bound
• Each of O(log k) recursive calls can be implemented in O(m) time, so O(m log k) time
• Actually, proof shows that only 3 recursive calls are necessary to get .614k + o(k) approximation
• This allows derandomization using conditional expectations in O(m poly(k)) time
Conclusions and Future Work
• .614k + o(k) approximation in O(m log k) time
• Improve analysis of previous best algorithm, showing it has ratio exactly k − 1/2.
– Upper bound uses our notion of redundancy
– Lower bound uses an adversarial argument
• Best known lower bound is Ω(1), so there is a huge gap.
Another Example
• Consider the algorithm which repeatedly removes an integer common to all k input multisets, and then runs a greedy algorithm on the remaining multisets [CLLJ06]
• Suppose r common integers are removed. Then output size ≤ (m − rk) + r
• But Red(opt) ≤ rk + (opt − r)(k − 1).
• Our bound is k·opt ≥ 2m − Red(opt)
• This implies opt ≥ (2m − r)/(2k − 1), and (m − rk + r)/opt ≤ k − ½.
• Using an adversarial argument, can show this is tight
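The upper-bound half of this argument can be spelled out step by step, using only the three inequalities on this slide (output size, the bound on Red(opt), and the lower bound on k·opt):

```latex
\begin{align*}
k\cdot\mathrm{opt} &\ge 2m - \mathrm{Red}(\mathrm{opt})
  \ge 2m - rk - (\mathrm{opt}-r)(k-1)
  = 2m - r - (k-1)\,\mathrm{opt} \\
\Longrightarrow\quad (2k-1)\,\mathrm{opt} &\ge 2m - r \\
\frac{m - rk + r}{\mathrm{opt}}
  &\le \frac{(2k-1)\,\bigl(m - r(k-1)\bigr)}{2m - r}
  \le \frac{2k-1}{2} = k - \tfrac{1}{2}
\end{align*}
```

The last inequality is equivalent to 2m − 2r(k − 1) ≤ 2m − r, that is, r ≤ 2r(k − 1), which holds for every k ≥ 2 and r ≥ 0.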