Christian Sohler
HEINZ NIXDORF INSTITUT, Universität Paderborn
Algorithmen und Komplexität
A Fast PTAS for k-Means Clustering
Dan Feldman (Tel Aviv University), Morteza Monemizadeh and Christian Sohler (Universität Paderborn)
Overview
• Introduction
• Weak Coresets
  - Definition
  - Intuition
  - The construction
  - A sketch of analysis
• The k-means PTAS
• Conclusions
Introduction: Clustering
Clustering
• Partition the input into sets (clusters) such that
  - Objects in the same cluster are similar
  - Objects in different clusters are dissimilar
Goal
• Simplification
• Discovery of patterns
Procedure
• Map objects to Euclidean space => point set P
• Points in the same cluster are close
• Points in different clusters are far away from each other
Introduction: k-means clustering
Clustering with Prototypes
• One prototype (center) for each cluster
k-Means Clustering
• k clusters $C_1, \dots, C_k$
• One center $c_i$ for each cluster $C_i$
• Minimize $\sum_{i=1}^{k} \sum_{p \in C_i} d(p, c_i)^2$
Introduction: Simplification / Lossy Compression
[Figure: example with RGB color values (128,59,88) and (218,181,163)]
Introduction: Properties of k-means
Properties of k-means
Optimal solution, if
• Centers are given: assign each point to the nearest center
• Clusters are given: take the centroid (mean) of each cluster as its center
Notation: cost(P,C) denotes the cost of the solution obtained by assigning each point of P to its nearest center in C
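The notation can be made concrete with a minimal sketch. It assumes points are plain Python tuples and that d is the Euclidean distance; the helper names `sq_dist`, `cost` and `centroid` are chosen here for illustration and do not come from the slides.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def cost(P, C):
    """k-means cost: every point pays its squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in C) for p in P)

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(x) / n for x in zip(*points))

# Two obvious clusters on a line; the centers are the cluster centroids.
P = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
C = [centroid(P[:2]), centroid(P[2:])]   # (1, 0) and (11, 0)
total = cost(P, C)                        # each point is at distance 1 from its center
```

Moving a center away from its centroid, e.g. replacing (1, 0) by (0, 0), can only increase the cost of the fixed clustering.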
Weak Coresets: Centroid Sets
Definition (ε-approx. centroid set)
A set S is called an ε-approximate centroid set if it contains a subset C ⊆ S s.t. $\mathrm{cost}(P,C) \le (1+\epsilon)\,\mathrm{cost}(P,\mathrm{Opt})$
Lemma [KSS04]
The centroid of a random set of 2/ε points is, with constant probability, a (1+ε)-approx. of the optimal center of P.
Corollary
The set of all centroids of subsets of 2/ε points is an ε-approx. centroid set.
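The lemma can be checked empirically for a single cluster (k = 1). The sketch below is only an experiment, not a proof; it assumes the random set is drawn uniformly with repetition, and the names `sampled_centroid` and `success_rate` are made up for this illustration.

```python
import math
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(x) / n for x in zip(*points))

def one_mean_cost(P, c):
    """Cost of serving all of P with the single center c."""
    return sum(sq_dist(p, c) for p in P)

def sampled_centroid(P, eps, rng):
    """Centroid of ceil(2/eps) points drawn uniformly with repetition (assumption)."""
    m = math.ceil(2 / eps)
    return centroid([rng.choice(P) for _ in range(m)])

def success_rate(P, eps, trials=200, seed=0):
    """Fraction of trials in which the sampled centroid is a (1+eps)-approximation."""
    rng = random.Random(seed)
    opt = one_mean_cost(P, centroid(P))   # the centroid of P is the optimal center
    hits = sum(
        one_mean_cost(P, sampled_centroid(P, eps, rng)) <= (1 + eps) * opt
        for _ in range(trials)
    )
    return hits / trials

data_rng = random.Random(1)
P = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(500)]
rate = success_rate(P, eps=0.2)   # observed success fraction over 200 trials
```

On such data the observed fraction should sit well above one half, which is what "constant probability" promises.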
Weak Coresets: Definition
Definition (weak ε-coreset for k-means)
A pair (K,S) is called a weak ε-coreset for P if, for every set C of k centers from the ε-approx. centroid set S, we have
$(1-\epsilon)\,\mathrm{cost}(P,C) \le \mathrm{cost}(K,C) \le (1+\epsilon)\,\mathrm{cost}(P,C)$
[Figure] Point set P (light blue)
[Figure] Set of solutions S (yellow)
[Figure] Possible coreset with weights (red)
[Figure] Approximates cost of k centers (violet) from S
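To make the definition tangible, here is a brute-force check of the weak coreset inequality. It is a sketch under assumptions: K carries one weight per point, S is a small explicit list of candidate centers, and the function name `is_weak_coreset` is hypothetical. The example uses a trivially exact coreset (distinct points weighted by multiplicity), not the construction from the talk.

```python
from itertools import combinations

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def cost(P, C, weights=None):
    """(Weighted) k-means cost of point set P with respect to centers C."""
    if weights is None:
        weights = [1.0] * len(P)
    return sum(w * min(sq_dist(p, c) for c in C) for p, w in zip(P, weights))

def is_weak_coreset(P, K, w, S, k, eps):
    """Check (1-eps) cost(P,C) <= cost(K,C) <= (1+eps) cost(P,C) for all k-subsets C of S."""
    for C in combinations(S, k):
        full = cost(P, C)
        approx = cost(K, C, w)
        if not (1 - eps) * full <= approx <= (1 + eps) * full:
            return False
    return True

# P contains repeated points; K keeps each distinct point once, weighted by its multiplicity.
P = [(0.0, 0.0)] * 4 + [(1.0, 0.0)] * 3 + [(5.0, 5.0)] * 5
K = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
S = [(0.0, 0.0), (5.0, 5.0), (2.0, 2.0)]   # hypothetical candidate centers

ok = is_weak_coreset(P, K, [4.0, 3.0, 5.0], S, k=2, eps=0.01)
bad = is_weak_coreset(P, K, [1.0, 1.0, 1.0], S, k=2, eps=0.01)
```

Note that the check only ranges over centers from S; centers outside S are not covered, which is what makes the coreset "weak".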
Weak Coresets: Ideal Sampling
Problem
• Given n numbers $a_1, \dots, a_n > 0$
• Task: approximate $A := \sum_i a_i$ by random sampling
Ideal Sampling
• Assign weights $w_1, \dots, w_n$ to the numbers
• $w_j = A / a_j$
• $\Pr[x = j] = a_j / A$
• Estimator: $w_x a_x$
Properties of the estimator:
(1) $w_x a_x = A$ (0 variance)
(2) The expected weight of number j is 1
Only problem: weights can be very large
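A minimal sketch of ideal sampling with one sample. The function name `ideal_sample` and the example numbers are assumptions for illustration. The scheme is "ideal" because it needs A itself to set the probabilities and weights.

```python
import math
import random

def ideal_sample(a, rng):
    """Draw index x with Pr[x = j] = a[j] / A and return (x, w_x) with w_x = A / a[x]."""
    A = sum(a)
    x = rng.choices(range(len(a)), weights=a, k=1)[0]
    return x, A / a[x]

a = [1.0, 2.0, 4.0, 8.0, 0.001]
A = sum(a)
rng = random.Random(0)

estimates = []
for _ in range(1000):
    x, w = ideal_sample(a, rng)
    estimates.append(w * a[x])   # always A, up to floating-point rounding

# Expected weight of number j: Pr[x = j] * w_j = (a_j / A) * (A / a_j) = 1.
expected_weights = [(aj / A) * (A / aj) for aj in a]

# The drawback from the slide: a tiny a_j gets a huge weight A / a_j.
largest_weight = A / min(a)
```

Here the smallest number 0.001 would receive a weight of about 15001, which illustrates why the weights need to be tamed in the construction.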
Weak Coresets: Construction
Step 1
• Compute a constant-factor approximation
Step 2
• Consider each cluster separately
Main idea: apply ideal sampling to each cluster C (with center c)
• $\Pr[p_i \text{ is taken}] = \mathrm{dist}(p_i, c) / \mathrm{cost}(C, c)$
• $w(p_i) = \mathrm{cost}(C, c) / \mathrm{dist}(p_i, c)$
But what about high weights?
Step 3
• A little twist
• Uniform sampling from a small ball; radius = average distance / ε
• Ideal sampling from 'outliers'
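The twist can be sketched for a single cluster as follows. This is an illustrative reading of the slide, not the authors' exact procedure: the sample sizes `m_ball` and `m_out`, the weight normalization, and the function name `sample_cluster` are all assumptions introduced here.

```python
import math
import random

def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def sample_cluster(cluster, c, eps, m_ball, m_out, rng):
    """Return weighted samples [(point, weight), ...]: ball samples first, then outliers."""
    d = [dist(p, c) for p in cluster]
    radius = (sum(d) / len(cluster)) / eps          # radius = average distance / eps
    ball = [p for p, dp in zip(cluster, d) if dp <= radius]
    outliers = [(p, dp) for p, dp in zip(cluster, d) if dp > radius]
    sample = []
    if ball:
        # Uniform sampling inside the ball; each sample stands for |ball| / m_ball points.
        w = len(ball) / m_ball
        sample += [(rng.choice(ball), w) for _ in range(m_ball)]
    if outliers:
        # Ideal-style sampling outside: probability proportional to the distance to c,
        # weight inversely proportional to it (assumed normalization over m_out samples).
        total = sum(dp for _, dp in outliers)
        picks = rng.choices(outliers, weights=[dp for _, dp in outliers], k=m_out)
        sample += [(p, total / (m_out * dp)) for p, dp in picks]
    return sample

# 99 points at distance 1 from the center and a single far outlier.
cluster = [(1.0,)] * 99 + [(1000.0,)]
sample = sample_cluster(cluster, (0.0,), eps=0.5, m_ball=10, m_out=2,
                        rng=random.Random(0))
```

Every outlier lies farther than the radius from c, so its weight is bounded, which addresses the "high weights" problem. By Markov's inequality at most an ε-fraction of the cluster can lie outside the ball.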
Weak Coresets: Analysis
Fix an arbitrary set of centers K
• Case (a): nearest center is 'far away'
At least a (1-ε)-fraction of the points is here (inside the ball) by choice of the radius
Weight of samples from outliers at most |C|
Forget about outliers!
Doesn't matter where points lie inside the ball
• Case (b): nearest center is 'near'
Almost ideal sampling
- Expectation is cost(C,K)
- Low variance
Weak Coresets: Result
The centroid set
• S is the set of all centroids of 2/ε points (with repetition) from our sample set K
• Can show that K approximates all solutions from S
• Can show that S is an ε-approx. centroid set w.h.p.
Theorem
One can compute in O(nkd) time a weak ε-coreset (K,S). The size of K is poly(k, 1/ε). S is the set of all centroids of subsets of K of size 2/ε.
Weak Coresets: Applications
Fast-k-Means-PTAS(P,k)
1. Compute weak coreset K
2. Project K on a poly(1/ε, k)-dimensional space
3. Exhaustively search for the best solution C of (the projection of) the centroid set
4. Return the centroids of the points that create C
Running time: $O(nkd + (k/\epsilon)^{\tilde{O}(k/\epsilon)})$
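Steps 3 and 4 can be sketched on a toy weighted set. The sketch skips the projection of step 2 and assumes K is already tiny; the parameter m stands in for 2/ε, and the names `centroid_set` and `best_centers` are hypothetical. It is a brute-force illustration, not the algorithm with the stated running time.

```python
from itertools import combinations, combinations_with_replacement

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(x) / n for x in zip(*points))

def weighted_cost(K, w, C):
    """Weighted k-means cost of the coreset K with respect to centers C."""
    return sum(wi * min(sq_dist(p, c) for c in C) for p, wi in zip(K, w))

def centroid_set(K, m):
    """All centroids of m points from K, chosen with repetition."""
    return sorted({centroid(list(sub)) for sub in combinations_with_replacement(K, m)})

def best_centers(K, w, k, m):
    """Exhaustive search over all k-subsets of the centroid set."""
    S = centroid_set(K, m)
    return min(combinations(S, k), key=lambda C: weighted_cost(K, w, C))

K = [(0.0,), (2.0,), (10.0,), (12.0,)]
w = [1.0, 1.0, 1.0, 1.0]
best = best_centers(K, w, k=2, m=2)
best_cost = weighted_cost(K, w, best)
```

The search space has size |S| choose k, and |S| itself grows like |K|^m. This is why the coreset size must be independent of n for the exhaustive search to be affordable.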
Summary
Weak Coresets
• Size independent of n and d
• Fast PTAS for k-means
• First PTAS for kernel k-means (if the kernel maps into a finite-dimensional space)
Christian Sohler
Heinz Nixdorf Institut & Institut für Informatik
Universität Paderborn
Fürstenallee 11
33102 Paderborn, Germany
Tel.: +49 (0) 52 51/60 64 27
Fax: +49 (0) 52 51/62 64 82
www.upb.de/cs/ag-madh
Thank you!