algorithms at scale - nus computinggilbert/cs5234/2019/... · 2019. 9. 20. · algorithms at scale...
![Page 1: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/1.jpg)
Algorithms at Scale (Week 6)
Summary
Today: Clustering and Streaming
k-median clustering
• Find k centers to minimize the average distance to a center.
LP approximation algorithm
• Find 2k centers that give a 4-approximation of the optimal clustering.
Streaming
• Find k centers in a stream of points.
• Use a hierarchical scheme to reduce space.
Other clustering problems

Last Week: Graph Streaming
• Connectivity
• Bipartite test
• MST
• Spanners
• Matching
Going forward…
Problem set due Thursday, October 3:
• Experimental problem set.
• Implement a streaming algorithm/sketch.
• See what performance you can get.
• Goal: test it out and see what you can learn about it.
Coming up:
• End-of-semester Mini Project.
• Teams of two.
• Goal: look more deeply into a topic covered in this class.
• I'll provide options from each of the 4 parts of the class (sublinear time/sampling, streaming, caching, parallel).
• Will send more information.
Task: Find a partner this week.
k-Clustering
Given points: $P = p_1, p_2, \ldots, p_n$
Assumptions:
⇒ Points are in a metric space: distances satisfy the triangle inequality.
⇒ (Think: Euclidean space)
⇒ The number of clusters k is given.
Goal:
⇒ Choose a set of k points (“centers”) that minimize some metric.
Example: 3 clusters
Metric space:
1. $d(x,y) = 0$ iff $x = y$
2. $d(x,y) = d(y,x)$
3. $d(x,y) \le d(x,z) + d(z,y)$
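The three metric-space conditions can be checked mechanically on a finite point set. The sketch below is only an illustration: the helper name `is_metric`, the tolerance, and the sample points are assumptions, not part of the lecture. It also shows that squared Euclidean distance fails the triangle inequality.

```python
from itertools import product
from math import dist  # Euclidean distance (Python 3.8+)

def is_metric(points, d, tol=1e-9):
    """Check the three metric-space conditions on a finite point set."""
    for x, y in product(points, repeat=2):
        if (d(x, y) <= tol) != (x == y):        # 1. d(x,y) = 0 iff x = y
            return False
        if abs(d(x, y) - d(y, x)) > tol:        # 2. symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:   # 3. triangle inequality
            return False
    return True

# Hypothetical sample points.
euclid_ok = is_metric([(0, 0), (3, 4), (6, 0)], dist)

# Squared Euclidean distance: 4 > 1 + 1 on three collinear points.
squared = lambda x, y: dist(x, y) ** 2
squared_ok = is_metric([(0, 0), (1, 0), (2, 0)], squared)
```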
k-Clustering
Given points: $P = p_1, p_2, \ldots, p_n$
Many clustering variants:
⇒ k-Center
⇒ k-Median
⇒ k-Means
⇒ k-Medoids
⇒ Min-Cut Clustering
⇒ Spectral Clustering
⇒ Etc.
Example: 3 clusters
k-Center Clustering
Given points: $P = p_1, p_2, \ldots, p_n$
Assumptions:
⇒ Points are in a metric space: distances satisfy the triangle inequality.
⇒ (Think: Euclidean space)
⇒ The number of clusters k is given.
Goal:
⇒ Choose a set of k points (“centers”) that minimize the maximum distance to a center.
Example: 3 clusters
k-Median Clustering
Given points: $P = p_1, p_2, \ldots, p_n$
Assumptions:
⇒ Points are in a metric space: distances satisfy the triangle inequality.
⇒ (Think: Euclidean space)
⇒ The number of clusters k is given.
Goal:
⇒ Choose a set of k points (“centers”) that minimize the average distance to a center.
⇒ Equivalent: minimize the sum of the distances to the centers.
Example: 3 clusters • Avg. dist.: 2 • Total dist.: 22
[Figure: points in 3 clusters, with distance labels 2, 2, 1, 1, 3, 4, 3, 6 (sum 22).]
k-Median Clustering
Given points: $P = p_1, p_2, \ldots, p_n$
Facts:
• k-Median is NP-hard.
• In the Euclidean metric, there is a nearly linear time $(1+\varepsilon)$-approximation algorithm.
• In general:
  o Li-Svensson 2013: $(1+\sqrt{3})$-approximation
  o Byrka et al. 2015: 2.675-approximation
  o Improvements since then?
Example: 3 clusters • Avg. dist.: 2 • Total dist.: 22
k-Median Clustering
Given points: $P = p_1, p_2, \ldots, p_n$
Find points $C = c_1, c_2, \ldots, c_k$ in P that minimize:
$$D(P,C) = \sum_{i=1}^{n} \min_{c_j \in C} |p_i - c_j|$$
Example: 3 clusters • Avg. dist.: 2 • Total dist.: 22
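To make the objective concrete, here is a minimal sketch that evaluates D(P,C) and finds the best k centers by exhaustive search. The function names and the sample points are assumptions made for illustration. The exhaustive search tries every k-subset, so it is only usable on tiny inputs (the problem is NP-hard in general).

```python
from itertools import combinations
from math import dist

def kmedian_cost(points, centers):
    """D(P, C): sum over points of the distance to the nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)

def brute_force_kmedian(points, k):
    """Try every k-subset of the input points as the center set."""
    return min(combinations(points, k), key=lambda C: kmedian_cost(points, C))

# Hypothetical input: two well-separated groups of three points.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
best = brute_force_kmedian(points, 2)
best_cost = kmedian_cost(points, best)
```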
k-Median Clustering
Given points: $P = p_1, p_2, \ldots, p_n$
Find points $C = c_1, c_2, \ldots, c_k$ in P and an assignment function c() that maps P → C, minimizing:
$$D(P,C) = \sum_{i=1}^{n} |p_i - c(i)|$$
Here $c(i)$ denotes the center assigned to $p_i$.
Example: 3 clusters • Avg. dist.: 2 • Total dist.: 22
Approximate k-Median Clustering
Given points: $P = p_1, p_2, \ldots, p_n$
Let $C^*$ be the optimal clustering.
Clustering C is a γ-approximation if:
$$D(P,C) \le \gamma \cdot D(P,C^*)$$
Example: 3 clusters • Avg. dist.: 2 • Total dist.: 22
Approximate k-Median Clustering
Given points: $P = p_1, p_2, \ldots, p_n$
Let $C^*$ be the optimal clustering with k centers.
Clustering C is an (α,γ)-approximation if it has at most αk centers and:
$$D(P,C) \le \gamma \cdot D(P,C^*)$$
Example: 3 clusters • Avg. dist.: 2 • Total dist.: 22
(2,2)-approximation
Example: 6 clusters • Avg. dist.: 4 • Total dist.: 44
[Figure: distance labels 10, 4, 12, 8, 10 (sum 44).]
Example: 3 clusters • Avg. dist.: 2 • Total dist.: 22
[Figure: distance labels 2, 2, 1, 1, 3, 4, 3, 6 (sum 22).]
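The two examples can be checked against the (α,γ) definition directly. This is a small sketch with a hypothetical helper name; it assumes the 3-cluster solution with total distance 22 is the optimum for k = 3.

```python
def is_bicriteria_approx(num_centers, cost, k, opt_cost, alpha, gamma):
    """(alpha, gamma)-approximation: at most alpha*k centers, cost <= gamma*OPT."""
    return num_centers <= alpha * k and cost <= gamma * opt_cost

# Assumption: the 3-cluster example (total distance 22) is optimal for k = 3.
slide_example = is_bicriteria_approx(6, 44, k=3, opt_cost=22, alpha=2, gamma=2)

# Seven centers would exceed the 2k = 6 budget.
too_many_centers = is_bicriteria_approx(7, 44, k=3, opt_cost=22, alpha=2, gamma=2)
```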
Approximate k-Median Clustering
Today: a (2,4)-approximation, i.e., a clustering C with at most 2k centers and $D(P,C) \le 4 \cdot D(P,C^*)$.
Approximate k-Median Clustering
Integer Linear Program
Variables (intuition):
• $y_j$: Is point $p_j$ a cluster head?
• $x_{i,j}$: Is point $p_i$ assigned to center $p_j$?
Example (points $p_1, p_2, p_3$):
$y_1 = 0$, $x_{1,2} = 1$
$y_2 = 1$, $x_{2,3} = 0$
$y_3 = 1$, $x_{1,3} = 0$
Approximate k-Median Clustering
Integer Linear Program
Variables (intuition):
• $y_j$: Is point $p_j$ a cluster head?
• $x_{i,j}$: Is point $p_i$ assigned to center $p_j$?
ILP:
$$\min \sum_{i,j} x_{i,j}\, d(p_i, p_j)$$
subject to:
$$\forall i: \sum_j x_{i,j} = 1$$
$$\sum_j y_j \le k$$
$$\forall i,j: x_{i,j} \le y_j$$
$$\forall i,j: x_{i,j}, y_j \in \{0,1\}$$
Approximate k-Median Clustering
Integer Linear Program
Claim 1: If x and y satisfy the constraints, then they define a valid solution to the clustering problem.
Claim 2: If you have a valid clustering solution, you can choose x and y to satisfy the constraints.
Bad news: Solving Integer Linear Programs is NP-hard.
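Claim 2 can be illustrated by encoding a clustering as 0/1 variables and checking the four ILP constraints. This is a sketch only: the function names are hypothetical, indices are 0-based, and the example assumes $p_2$ and $p_3$ are assigned to themselves (the slide example only gives $x_{1,2}$, $x_{2,3}$, $x_{1,3}$).

```python
def encode_clustering(n, assignment):
    """assignment[i] is the index j of the center that point i is assigned to."""
    centers = set(assignment)
    y = [1 if j in centers else 0 for j in range(n)]
    x = [[1 if assignment[i] == j else 0 for j in range(n)] for i in range(n)]
    return x, y

def satisfies_ilp(x, y, k):
    """Check the ILP constraints for k-median."""
    n = len(y)
    integral = all(v in (0, 1) for row in x for v in row) and all(v in (0, 1) for v in y)
    each_assigned = all(sum(row) == 1 for row in x)        # forall i: sum_j x_ij = 1
    at_most_k = sum(y) <= k                                # sum_j y_j <= k
    only_to_centers = all(x[i][j] <= y[j]                  # forall i,j: x_ij <= y_j
                          for i in range(n) for j in range(n))
    return integral and each_assigned and at_most_k and only_to_centers

# p1 -> p2, p2 -> p2, p3 -> p3 (0-based: 0 -> 1, 1 -> 1, 2 -> 2).
x, y = encode_clustering(3, [1, 1, 2])
feasible_k2 = satisfies_ilp(x, y, k=2)
feasible_k1 = satisfies_ilp(x, y, k=1)   # two centers do not fit a budget of one
```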
Approximate k-Median Clustering
Relax: Linear Program
Good news: Relax! Replace integral constraints with [0,1] constraints.
LP:
$$\min \sum_{i,j} x_{i,j}\, d(p_i, p_j)$$
subject to:
$$\forall i: \sum_j x_{i,j} = 1$$
$$\sum_j y_j \le k$$
$$\forall i,j: x_{i,j} \le y_j$$
$$\forall i,j: 0 \le x_{i,j}, y_j \le 1$$
Good news: Can solve efficiently (in polynomial time) using an LP solver.
Good news: If you have a valid clustering solution, you can choose x and y to satisfy the constraints.
Approximate k-Median Clustering
Relax: Linear Program
Every valid clustering satisfies the LP constraints. So if C is an optimal (fractional) solution to the LP, and $C^*$ is the optimal (integral) solution, then:
$$D(P,C) \le D(P,C^*)$$
The LP solution is no worse than the optimal solution! Maybe better than optimal!
Approximate k-Median Clustering
Relax: Linear Program
Bad news: the solution is fractional. If x and y satisfy the constraints, they may NOT be a valid solution to the clustering problem.
Variables (intuition):
• $y_j$: Is point $p_j$ a cluster head?
• $x_{i,j}$: Is point $p_i$ assigned to center $p_j$?
Example (points $p_1, p_2, p_3$):
$y_1 = 0.5$, $x_{1,2} = 0.5$
$y_2 = 0.5$, $x_{2,3} = 0$
$y_3 = 1$, $x_{1,3} = 0.5$
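The fractional example can be checked against the relaxed constraints. In this sketch the helper names are hypothetical, and the entries the slide does not list ($x_{1,1} = 0$, $x_{2,1} = x_{2,2} = 0.5$, $x_{3,3} = 1$) are assumptions chosen so that each row sums to 1.

```python
def satisfies_lp(x, y, k, tol=1e-9):
    """Check the relaxed (LP) constraints for k-median."""
    n = len(y)
    in_range = (all(-tol <= v <= 1 + tol for row in x for v in row)
                and all(-tol <= v <= 1 + tol for v in y))
    each_assigned = all(abs(sum(row) - 1) <= tol for row in x)
    at_most_k = sum(y) <= k + tol
    bounded_by_y = all(x[i][j] <= y[j] + tol for i in range(n) for j in range(n))
    return in_range and each_assigned and at_most_k and bounded_by_y

def is_integral(x, y):
    return all(v in (0, 1) for row in x for v in row) and all(v in (0, 1) for v in y)

y = [0.5, 0.5, 1.0]
x = [[0.0, 0.5, 0.5],   # p1 is split between p2 and p3 (from the slide)
     [0.5, 0.5, 0.0],   # assumed values for p2
     [0.0, 0.0, 1.0]]   # assumed: p3 is assigned to itself

lp_feasible = satisfies_lp(x, y, k=2)
integral = is_integral(x, y)
```

The point is LP-feasible for k = 2, yet it does not describe a clustering: $p_1$ and $p_2$ are each "half" a center.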
Approximate k-Median Clustering
Relax: Linear Program
Solution: round to integers. If x and y satisfy the constraints, then maybe we can round the variables in a way that does not increase the cost too much.
Rounding the k-Median LP
Step 1: What is the cost?
Define the cost of $p_i$:
$$C_i = \sum_j x_{i,j}\, d(p_i, p_j)$$
LP minimizes:
$$\min \sum_{i,j} x_{i,j}\, d(p_i, p_j) = \min \sum_i C_i$$
Goal: round in a way that does not increase cost too much!
Goal after rounding: construct $C'$ such that for all j:
$$C'_j \le 4C_j$$
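The per-point cost $C_i$ is a weighted sum over one row of x. A minimal sketch follows; the distance matrix is hypothetical, and the x values extend the fractional example with assumed entries for $p_2$ and $p_3$.

```python
def fractional_costs(x, d):
    """C_i = sum_j x[i][j] * d[i][j] for every point i."""
    n = len(x)
    return [sum(x[i][j] * d[i][j] for j in range(n)) for i in range(n)]

# Hypothetical symmetric distances between p1, p2, p3 (triangle inequality holds).
d = [[0, 2, 4],
     [2, 0, 3],
     [4, 3, 0]]

# Fractional assignment; rows for p2 and p3 are assumptions.
x = [[0.0, 0.5, 0.5],
     [0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0]]

C = fractional_costs(x, d)
lp_objective = sum(C)   # the LP objective equals the sum of the C_i
```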
Rounding the k-Median LP
Step 2: Sort by cost.
Notice: the smallest cost is hardest to round. (Most risk that it will increase too much.)
Step 3: Add the $p_j$ with smallest cost $C_j$ to our set of centers.
$S = \{p_j\}$
Rounding the k-Median LP
Step 4: If $p_i$ is within distance $4C_j$ of $p_j$, then we can delete it.
$S = \{p_j\}$
⇒ $p_i$ is already close enough to a center in our solution:
$$C'_i \le d(p_i, p_j) \le 4C_j \le 4C_i$$
Recall: $C_j$ was the smallest.
Step 4 (refined): If $p_i$ is within distance $4C_i$ of $p_j$, then we can delete it.
⇒ $p_i$ is already close enough to a center in our solution:
$$C'_i \le d(p_i, p_j) \le 4C_i$$
Rounding the k-Median LP
Step 4: If there is some point q where:
$$d(p_i, q) \le 2C_i \quad \text{and} \quad d(p_j, q) \le 2C_j$$
then we can delete $p_i$.
⇒ $p_i$ is already close enough to a center in our solution:
$$C'_i \le d(p_i, p_j) \le d(p_i, q) + d(q, p_j) \le 2C_i + 2C_j \le 2C_i + 2C_i = 4C_i$$
Recall: $C_j$ was the smallest.
Define:
$$V(j) = \{\, p_i \mid \exists q : d(p_i, q) \le 2C_i,\ d(p_j, q) \le 2C_j \,\}$$
⇒ All points in V(j) are close enough to $p_j$ that we can delete them.
Rounding the k-Median LP
Step 5: Repeat until all points are deleted.
Rounding the k-Median LP
Rounding Algorithm:
1. S = {}
2. Repeat until all points are deleted:
   • Let $p_j$ be the remaining point with minimum $C_j$.
   • Add $p_j$ to S.
   • Delete all points in V(j).
3. Return S.
Where did we use the LP solution??
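The rounding algorithm can be sketched in a few lines. This is an illustrative version under stated assumptions: q ranges over the input points only, ties in $C_j$ are broken by index, and the distance matrix and costs are the hypothetical ones used for illustration (not from the lecture). Note that the LP solution enters only through the costs $C_i$.

```python
def round_lp(d, C):
    """Greedy rounding: d is an n x n distance matrix, C[i] the fractional cost of p_i.
    Returns the indices of the chosen centers S."""
    n = len(C)
    remaining = set(range(n))
    S = []
    while remaining:
        # Remaining point with minimum cost (ties broken by index: an assumption).
        j = min(remaining, key=lambda i: (C[i], i))
        S.append(j)
        # V(j): points p_i for which some q has d(p_i,q) <= 2C_i and d(p_j,q) <= 2C_j.
        V_j = {i for i in remaining
               if any(d[i][q] <= 2 * C[i] and d[j][q] <= 2 * C[j] for q in range(n))}
        remaining -= V_j   # p_j itself is in V(j) via q = p_j, so the loop terminates
    return S

# Hypothetical instance: distances and fractional costs for three points.
d = [[0, 2, 4],
     [2, 0, 3],
     [4, 3, 0]]
C = [3.0, 1.0, 0.0]

S = round_lp(d, C)
rounded_costs = [min(d[i][j] for j in S) for i in range(len(C))]   # C'_i
```

On this instance the point with cost 0 is chosen first and deletes the first point; the second point survives and becomes the second center.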
Rounding the k-Median LP
Claim: For all j: $C'_j \le 4C_j$, where $C'$ is computed using the centers in S.
Proof: Every deleted point $p_i$ lies in V(j) for some center $p_j \in S$ with $C_j \le C_i$ (recall: $C_j$ was the smallest), so the Step 4 argument gives $C'_i \le d(p_i, p_j) \le 2C_i + 2C_j \le 4C_i$.
⇒ Summing over all points:
$$D(P,C') \le 4\,D(P,C) \le 4\,D(P,C^*)$$
Rounding the k-Median LP
Remaining problem: How many centers are added to S?
Claim: At most 2k centers are added to S.
Rounding the k-Median LP
Key lemma: If $p_i$ is added to S, then:
$$\sum_{j \in V(i)} y_j \ge 1/2$$
Restated over the ball of radius $2C_i$ around $p_i$:
$$\sum_{j \,:\, d(p_i,p_j) \le 2C_i} y_j \ge 1/2$$
⇒ Since the y's sum to at most k and the V(j) are disjoint, we cannot add more than 2k points to S.
Subtle point: symmetry! If adding $p_i$ deletes $p_j$, then adding $p_j$ deletes $p_i$.
Rounding the k-Median LP
Key lemma: If $p_i$ is added to S, then:
$$\sum_{j \,:\, d(p_i,p_j) \le 2C_i} y_j \ge 1/2$$
Observation 1:
$$\sum_{j \,:\, d(p_i,p_j) \le 2C_i} y_j \ \ge \sum_{j \,:\, d(p_i,p_j) \le 2C_i} x_{i,j}$$
This follows from the LP constraint $\forall i,j: x_{i,j} \le y_j$.
Rounding the k-Median LP
Observation 2:
$C_i$ = “average” distance from $p_i$ to a center:
$$C_i = \sum_j x_{i,j}\, d(p_i, p_j)$$
Let Z be a random variable that equals $d(p_i, p_j)$ with probability $x_{i,j}$. Then:
$$E[Z] = C_i$$
$$\sum_{j \,:\, d(p_i,p_j) \le 2C_i} x_{i,j} = \Pr(Z \le 2C_i) = 1 - \Pr(Z > 2C_i) = 1 - \Pr(Z > 2E[Z]) \ge 1 - 1/2 = 1/2$$
The last inequality is Markov's inequality.
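Observation 2 can be sanity-checked numerically: for any probability vector $x_{i,\cdot}$ and any nonnegative distances, at least half of the mass sits within distance $2C_i$. The sketch below uses a hypothetical helper name and made-up values; the random trials are only an empirical check, not a proof.

```python
import random

def mass_within_twice_cost(x_i, d_i):
    """Return (C_i, total x-mass on points within distance 2*C_i of p_i)."""
    C_i = sum(x * dij for x, dij in zip(x_i, d_i))
    near = sum(x for x, dij in zip(x_i, d_i) if dij <= 2 * C_i)
    return C_i, near

# Hypothetical row of x and distances: most mass is close, a little is far away.
C_i, near = mass_within_twice_cost([0.6, 0.3, 0.1], [1, 2, 10])

# Empirical check on random instances (seeded for reproducibility).
rng = random.Random(0)
worst = 1.0
for _ in range(1000):
    raw = [rng.random() for _ in range(5)]
    total = sum(raw)
    x_row = [r / total for r in raw]                 # normalize to a distribution
    d_row = [rng.uniform(0, 100) for _ in range(5)]
    worst = min(worst, mass_within_twice_cost(x_row, d_row)[1])
```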
Rounding the k-Median LP
Key lemma: If $p_i$ is added to S, then:
$$\sum_{j \,:\, d(p_i,p_j) \le 2C_i} y_j \ge 1/2$$
Conclusion:
$$\sum_{j \,:\, d(p_i,p_j) \le 2C_i} y_j \ \ge \sum_{j \,:\, d(p_i,p_j) \le 2C_i} x_{i,j} \ \ge 1/2$$
⇒ Fact: the $y_j$'s sum to ≤ k.
⇒ Fact: the V(i) are disjoint.
⇒ Fact: For each $p_i$ added to S, we delete points with $y_j$'s that sum to at least ½.
⇒ Cannot add more than 2k points to S.
Approximate k-Median Clustering
Given points: $P = p_1, p_2, \ldots, p_n$
Today: (2,4)-approximation
• Give Integer Linear Program (ILP).
• Relax to Linear Program (LP).
• Solve LP.
• Round (carefully).
Example: 3 clusters • Avg. dist.: 2 • Total dist.: 22
![Page 62: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/62.jpg)
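Weighted k-Median Clustering
Given points: $P = p_1, p_2, \ldots, p_n$
Given weights: $w_1, w_2, \ldots, w_n$
Find points $C = c_1, c_2, \ldots, c_k$ in P and an assignment function c() that maps P → C, minimizing:
$$D(P, C) = \sum_{i=1}^{n} w_i \,|p_i - c(i)|$$
Example: 3 clusters
• Avg. dist.: 2
• Total dist.: 22
[Figure: the 3-cluster example with point-to-center distances 2, 2, 1, 1, 3, 4, 3, 6.]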
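As a rough illustration of the weighted objective, the sketch below evaluates D(P, C) for points in the Euclidean plane. It assumes c() sends each point to its nearest center, which minimizes the sum for a fixed C. The function name `weighted_kmedian_cost` and the sample data are hypothetical.

```python
import math

def weighted_kmedian_cost(points, weights, centers):
    """Sum over all points of weight * distance to the nearest center."""
    return sum(w * min(math.dist(p, c) for c in centers)
               for p, w in zip(points, weights))

# Hypothetical example: three weighted points, two of which are centers.
points = [(0, 0), (3, 4), (10, 0)]
weights = [1, 2, 5]
centers = [(0, 0), (10, 0)]
print(weighted_kmedian_cost(points, weights, centers))
```

Setting every weight to 1 recovers the unweighted total distance used in the earlier example.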
![Page 63: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/63.jpg)
Weighted k-Median Clustering
Exercise:
Show how to adapt the approximate k-median algorithm to give a (2,4)-approximate solution for the weighted k-Median clustering problem.
![Page 64: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/64.jpg)
Streaming Data
Data arrives in a stream: $S = s_1, s_2, \ldots, s_T$
Each $s_j$ is a point.
⇒ Each point shows up exactly once.
⇒ Points show up in an arbitrary (worst-case) order.
Example in Euclidean plane: S = (17,3), (1,7), (15,1), (4,1), (3,19), (1,1), (2,1)
At end of stream: output k cluster centers.
![Page 65: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/65.jpg)
Streaming Data
Data arrives in a stream: $S = s_1, s_2, \ldots, s_T$
Memory: $O(\sqrt{nk})$
Goal: (2, O(1))-approximation
Warning: Today, the approximation ratio is going to be large.
![Page 66: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/66.jpg)
Streaming k-Median
Core-Set Algorithm
S = ∅
Repeat $\sqrt{n/k}$ times:
1. Let P = next $\sqrt{nk}$ points.
2. Find (2,4)-approximate clustering on P.
3. Add 2k new cluster centers to S. Weight each cluster center with # of points attached to it.
4. Empty P.
Return (2,4)-approximate (weighted) clustering on S.
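The control flow of the core-set algorithm can be sketched as follows. This is a minimal sketch under stated assumptions, not the lecture's implementation: `cluster_stub` is a hypothetical stand-in for the (2,4)-approximation (it exhaustively searches for the best 2k input points, which is only feasible on tiny inputs), and `segment_size` is passed in directly rather than computed as $\sqrt{nk}$.

```python
import math
from itertools import combinations

def weighted_cost(points, weights, centers):
    return sum(w * min(math.dist(p, c) for c in centers)
               for p, w in zip(points, weights))

def cluster_stub(points, weights, num_centers):
    """Hypothetical stand-in for the (2,4)-approximation: exhaustive search
    for the best num_centers input points (tiny inputs only)."""
    num_centers = min(num_centers, len(points))
    centers = list(min(combinations(points, num_centers),
                       key=lambda cs: weighted_cost(points, weights, cs)))
    # Each center is weighted by the total weight of the points attached to it.
    center_weights = [0.0] * num_centers
    for p, w in zip(points, weights):
        nearest = min(range(num_centers),
                      key=lambda j: math.dist(p, centers[j]))
        center_weights[nearest] += w
    return centers, center_weights

def coreset_stream(stream, k, segment_size):
    S, W = [], []   # weighted core set built so far
    P = []          # current segment of the stream
    for x in stream:
        P.append(x)
        if len(P) == segment_size:
            centers, weights = cluster_stub(P, [1.0] * len(P), 2 * k)
            S += centers
            W += weights
            P = []  # step 4: empty P
    if P:           # leftover partial segment at the end of the stream
        centers, weights = cluster_stub(P, [1.0] * len(P), 2 * k)
        S += centers
        W += weights
    return cluster_stub(S, W, 2 * k)  # final weighted clustering on S

stream = [(17, 3), (1, 7), (15, 1), (4, 1), (3, 19), (1, 1), (2, 1)]
centers, weights = coreset_stream(stream, k=1, segment_size=3)
print(centers, weights)
```

Only the current segment P and the weighted set S are held in memory at any time, which is where the space bound comes from.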
![Page 67: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/67.jpg)
Streaming k-Median
Core-Set Algorithm
[Figure: the data stream of n elements is split into segments $S_1, \ldots, S_t$ of $\sqrt{nk}$ elements each. A (2,4)-approximate k-median on each segment yields 2k centers, giving $2\sqrt{nk}$ centers at the intermediate level. A (2,4)-approximate weighted k-median on those centers yields the final 2k centers.]
![Page 68: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/68.jpg)
Streaming k-Median
Core-Set Algorithm
[Figure: the same diagram, with the $2\sqrt{nk}$ intermediate-level centers labeled "Core Set".]
![Page 69: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/69.jpg)
Streaming k-Median
Core-Set Algorithm
[Figure: the core-set diagram again.]
Space: $O(\sqrt{nk})$
![Page 70: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/70.jpg)
Streaming k-Median
Core-Set Algorithm
Claims:
Claim 1: Space is at most $O(\sqrt{nk})$.
Claim 2: The output is at most 2k centers. (By construction.)
![Page 71: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/71.jpg)
Streaming k-Median
Core-Set Algorithm
Claims:
Claim 1: Space is at most $O(\sqrt{nk})$.
Claim 2: The output is at most 2k centers. (By construction.)
Claim 3: The output is a (2,80)-approximation for k-Median.
![Page 72: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/72.jpg)
Streaming k-Median
Core-Set Algorithm
Notation:
1: Substream $S_i$ is the i-th segment of the stream.
2: Points $T_i$ are the 2k centers output by the i-th (2,4)-approximation.
3: $S_w$ are the weighted points, and w are the weights, used for the final (2,4)-approximation.
4: Points T are the final output of the algorithm.
![Page 73: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/73.jpg)
Streaming k-Median
Core-Set Algorithm
Lemma:
$$d(S, T) \le \sum_{i=1}^{t} d(S_i, T_i) + d(S_w, T)$$
Interpretation: We can bound the final distances by two parts:
(1) the distance of a point to the intermediate clustering, and
(2) the distance of the intermediate clustering to the final clustering.
![Page 74: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/74.jpg)
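Streaming k-Median
Core-Set Algorithm
Proof:
$$\begin{aligned}
d(S, T) &= \sum_{i=1}^{t} \sum_{x \in S_i} d(x, T) \\
&\le \sum_{i=1}^{t} \sum_{x \in S_i} \left[ d(x, t_i(x)) + d(t_i(x), T) \right] \\
&\le \sum_{i=1}^{t} d(S_i, T_i) + \sum_{i=1}^{t} \sum_{x \in S_i} d(t_i(x), T) \\
&\le \sum_{i=1}^{t} d(S_i, T_i) + \sum_{i=1}^{t} \sum_{j=1}^{2k} |S_{i,j}| \, d(t_{i,j}, T) \\
&\le \sum_{i=1}^{t} d(S_i, T_i) + d(S_w, T)
\end{aligned}$$
Here $S_{i,j}$ denotes the points of $S_i$ assigned to center $t_{i,j}$.
First line: definition of d(S,T). The variable x ranges over all points in the set.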
![Page 75: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/75.jpg)
Streaming k-Median
Core-Set Algorithm
Proof, second line (triangle inequality):
$$\sum_{i=1}^{t} \sum_{x \in S_i} d(x, T) \le \sum_{i=1}^{t} \sum_{x \in S_i} \left[ d(x, t_i(x)) + d(t_i(x), T) \right]$$
Point $t_i(x)$ is the center assigned to x in the intermediate core set, where x is a point in segment $S_i$ of the stream.
![Page 76: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/76.jpg)
Streaming k-Median
Core-Set Algorithm
Proof, third line (definition of $d(S_i, T_i)$):
$$\sum_{i=1}^{t} \sum_{x \in S_i} \left[ d(x, t_i(x)) + d(t_i(x), T) \right] \le \sum_{i=1}^{t} d(S_i, T_i) + \sum_{i=1}^{t} \sum_{x \in S_i} d(t_i(x), T)$$
![Page 77: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/77.jpg)
Streaming k-Median
Core-Set Algorithm
Proof, fourth line:
$$\sum_{i=1}^{t} \sum_{x \in S_i} d(t_i(x), T) \le \sum_{i=1}^{t} \sum_{j=1}^{2k} |S_{i,j}| \, d(t_{i,j}, T)$$
Iterate over all centers in the core set. Count how many times each is included in the sum.
Point $t_{i,j}$ is one of the 2k points in the core set for the i-th segment, and $S_{i,j}$ is the set of points of $S_i$ assigned to it.
![Page 78: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/78.jpg)
Streaming k-Median
Core-Set Algorithm
Proof, last line (definition of $d(S_w, T)$):
$$\sum_{i=1}^{t} \sum_{j=1}^{2k} |S_{i,j}| \, d(t_{i,j}, T) \le d(S_w, T)$$
The weight of center $t_{i,j}$ is $|S_{i,j}|$, the number of points of $S_i$ attached to it.
![Page 80: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/80.jpg)
Streaming k-Median
Core-Set Algorithm
Lemma:
$$d(S, T) \le \sum_{i=1}^{t} d(S_i, T_i) + d(S_w, T)$$
Goal:
$$d(S, T) \le 80 \, d(S, C^*)$$
![Page 81: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/81.jpg)
Streaming k-Median
Core-Set Algorithm
Useful fact:
$$\min_{T' \subseteq S'} d(S', T') \le 2 \min_{T' \subseteq A} d(S', T')$$
Where A is some larger set of all possible points in the metric space, and S' is an arbitrary subset of A.
Interpretation: To cluster S', we can focus on points in S' (and only lose a factor of 2). We don't need centers not in S'.
![Page 82: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/82.jpg)
Streaming k-Median
Core-Set Algorithm
Useful fact:
$$\min_{T' \subseteq S'} d(S', T') \le 2 \min_{T' \subseteq A} d(S', T')$$
Proof: Triangle inequality.
Let T' be the optimal solution in A. Let t be some point in T' that is not in S', let t' be the closest point in S' to t, and let s be some other point in S'. We can replace t with t' because:
$$d(s, t') \le d(s, t) + d(t, t') \le d(s, t) + d(s, t) = 2 \, d(s, t)$$
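The factor of 2 can be observed on a small instance by brute force. The sketch below is an illustration only: the point sets `S_prime` and `A` and the helper names are hypothetical, and the exhaustive search is feasible only for tiny inputs.

```python
import math
from itertools import combinations

def kmedian_cost(points, centers):
    return sum(min(math.dist(p, c) for c in centers) for p in points)

def best_cost(points, candidates, k):
    """Cheapest k-median cost of `points` over all k-subsets of `candidates`."""
    return min(kmedian_cost(points, cs) for cs in combinations(candidates, k))

# Two "crosses" of points; the natural centers (0,0) and (10,0) are not in S'.
S_prime = [(-1, 0), (1, 0), (0, 1), (0, -1),
           (9, 0), (11, 0), (10, 1), (10, -1)]
A = S_prime + [(0, 0), (10, 0)]

restricted = best_cost(S_prime, S_prime, 2)    # centers must come from S'
unrestricted = best_cost(S_prime, A, 2)        # centers may come from all of A
print(unrestricted <= restricted <= 2 * unrestricted)
```

In this instance, restricting the centers to S' costs strictly more than allowing the two extra points, but stays within the factor of 2.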
![Page 84: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/84.jpg)
Streaming k-Median
Core-Set Algorithm
Lemma:
$$\sum_{i=1}^{t} d(S_i, T_i) \le 8 \, d(S, C^*)$$
Interpretation: We can bound the distances to the core set by the optimal clustering.
![Page 85: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/85.jpg)
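Streaming k-Median
Core-Set Algorithm
Proof:
$$\begin{aligned}
\sum_{i=1}^{t} d(S_i, T_i) &\le \sum_{i=1}^{t} 4 \min_{T' \subseteq S_i} d(S_i, T') \\
&\le \sum_{i=1}^{t} 8 \min_{T' \subseteq P} d(S_i, T') \\
&\le \sum_{i=1}^{t} 8 \, d(S_i, C^*) \\
&\le 8 \, d(S, C^*)
\end{aligned}$$
First line: because we use a (2,4)-approximation algorithm to compute the core set.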
![Page 86: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/86.jpg)
Streaming k-Median
Core-Set Algorithm
Proof, second line:
$$\sum_{i=1}^{t} 4 \min_{T' \subseteq S_i} d(S_i, T') \le \sum_{i=1}^{t} 8 \min_{T' \subseteq P} d(S_i, T')$$
Because we only lose a factor of two going to a larger set of points.
![Page 87: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/87.jpg)
Streaming k-Median
Core-Set Algorithm
Proof, third line:
$$\sum_{i=1}^{t} 8 \min_{T' \subseteq P} d(S_i, T') \le \sum_{i=1}^{t} 8 \, d(S_i, C^*)$$
By definition of the optimal clustering.
![Page 88: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/88.jpg)
Streaming k-Median
Core-Set Algorithm
Proof, last line:
$$\sum_{i=1}^{t} 8 \, d(S_i, C^*) \le 8 \, d(S, C^*)$$
By summing over all points.
![Page 91: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/91.jpg)
Streaming k-Median
Core-Set Algorithm
Lemma:
$$d(S_w, T) \le 8 \sum_{i=1}^{t} d(S_i, T_i) + 8 \, d(S, C^*)$$
Interpretation: We can bound the cost of the second part…
![Page 92: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/92.jpg)
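Streaming k-Median
Core-Set Algorithm
Part 1: $d(S_w, C^*) \le \sum_{i=1}^{t} d(S_i, T_i) + d(S, C^*)$
$$\begin{aligned}
d(S_w, C^*) &= \sum_{i,j} |S_{i,j}| \, d(t_{i,j}, C^*) \\
&\le \sum_{i,j} \sum_{x \in S_{i,j}} \left[ d(t_{i,j}, x) + d(x, t^*(x)) \right] \\
&\le \sum_{i} \sum_{x \in S_i} \left[ d(t_i(x), x) + d(x, t^*(x)) \right] \\
&\le \sum_{i=1}^{t} d(S_i, T_i) + d(S, C^*)
\end{aligned}$$
Here $S_{i,j}$ is the set of points of $S_i$ assigned to center $t_{i,j}$, and $t^*(x)$ is the center of $C^*$ assigned to x.
First line: definition of weighted…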
![Page 93: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/93.jpg)
Streaming k-Median
Core-Set Algorithm
Part 1, second line:
$$\sum_{i,j} |S_{i,j}| \, d(t_{i,j}, C^*) \le \sum_{i,j} \sum_{x \in S_{i,j}} \left[ d(t_{i,j}, x) + d(x, t^*(x)) \right]$$
Sum over $S_{i,j}$ and use the triangle inequality.
![Page 94: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/94.jpg)
Streaming k-Median
Core-Set Algorithm
Part 1, third line:
$$\sum_{i,j} \sum_{x \in S_{i,j}} \left[ d(t_{i,j}, x) + d(x, t^*(x)) \right] \le \sum_{i} \sum_{x \in S_i} \left[ d(t_i(x), x) + d(x, t^*(x)) \right]$$
Simplify the enumeration over all points in the core set.
![Page 95: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/95.jpg)
Streaming k-Median
Core-Set Algorithm
Part 1, last line:
$$\sum_{i} \sum_{x \in S_i} \left[ d(t_i(x), x) + d(x, t^*(x)) \right] \le \sum_{i=1}^{t} d(S_i, T_i) + d(S, C^*)$$
Definition…
![Page 96: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/96.jpg)
Streaming k-Median
Core-Set Algorithm
Part 1:
$$d(S_w, C^*) \le \sum_{i=1}^{t} d(S_i, T_i) + d(S, C^*)$$
Part 2:
$$d(S_w, T) \le 8 \, d(S_w, C^*)$$
Conclusion:
$$d(S_w, T) \le 8 \sum_{i=1}^{t} d(S_i, T_i) + 8 \, d(S, C^*)$$
![Page 97: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/97.jpg)
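Streaming k-Median
Core-Set Algorithm
Part 1: $d(S_w, C^*) \le \sum_{i=1}^{t} d(S_i, T_i) + d(S, C^*)$
Part 2: $d(S_w, T) \le 8 \, d(S_w, C^*)$
$$\begin{aligned}
d(S_w, T) &\le 4 \min_{T' \subseteq S_w} d(S_w, T') \\
&\le 8 \min_{T' \subseteq P} d(S_w, T') \\
&\le 8 \, d(S_w, C^*)
\end{aligned}$$
First line: because we used a 4-approximation algorithm.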
![Page 98: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/98.jpg)
Streaming k-Median
Core-Set Algorithm
Part 2, second line:
$$4 \min_{T' \subseteq S_w} d(S_w, T') \le 8 \min_{T' \subseteq P} d(S_w, T')$$
Because using points in $S_w$ only loses a factor of 2.
![Page 99: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/99.jpg)
Streaming k-Median
Core-Set Algorithm
Part 2, last line:
$$8 \min_{T' \subseteq P} d(S_w, T') \le 8 \, d(S_w, C^*)$$
By definition…
![Page 101: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/101.jpg)
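Streaming k-Median
Core-Set Algorithm
Add it all up:
$$\begin{aligned}
d(S, T) &\le \sum_{i=1}^{t} d(S_i, T_i) + d(S_w, T) \\
&\le 8 \, d(S, C^*) + d(S_w, T) \\
&\le 8 \, d(S, C^*) + 8 \sum_{i=1}^{t} d(S_i, T_i) + 8 \, d(S, C^*) \\
&\le 8 \, d(S, C^*) + 8 \left( 8 \, d(S, C^*) \right) + 8 \, d(S, C^*) \\
&\le 80 \, d(S, C^*)
\end{aligned}$$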
![Page 106: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/106.jpg)
Streaming k-Median
Core-Set Algorithm
[Figure: the core-set diagram again: segments of $\sqrt{nk}$ elements, 2k centers per segment, $2\sqrt{nk}$ centers at the intermediate level, and a final (2,4)-approximate weighted k-median producing 2k centers.]
Space: $O(\sqrt{nk})$
Approximation: (2,80)
![Page 107: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/107.jpg)
Core-Set Algorithm
Question:
What if you want less space?
• Increase segment size?
• Decrease number of core sets?
![Page 108: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/108.jpg)
Core-Set Algorithm
Idea: hierarchical construction!
![Page 109: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/109.jpg)
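Hierarchical Construction
[Figure: the data stream of n elements is split into segments $S_1, \ldots, S_t$. Each segment is reduced to 2k centers, and groups of these 2k-center sets are merged level by level into new sets of 2k centers, up to a single set of 2k centers at the root.]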
![Page 110: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/110.jpg)
Core-Set Algorithm
Algorithm idea:
Define $m = n^{\epsilon}$.
Whenever you see m elements in the stream:
• Run the (2,4)-approximation ⇒ 2k centers.
• Store the 2k new centers in level 1.
![Page 111: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/111.jpg)
Core-Set Algorithm
Algorithm idea:
Define $m = n^{\epsilon}$.
Whenever you see m elements in the stream:
• Run the (2,4)-approximation ⇒ 2k centers.
• Store the 2k new centers in level 1.
Whenever you have m sets of centers in level j:
• Run the (2,4)-approximation ⇒ 2k centers.
• Store the 2k new centers in level j+1.
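The two rules above can be sketched as a small recursive merge. This is a sketch under loud assumptions: points are 1-D numbers, `reduce_to_centers` is a hypothetical farthest-first stand-in and not the (2,4)-approximation, the stream is assumed non-empty, and m should exceed 2k for the buffers to shrink.

```python
def reduce_to_centers(points, weights, num_centers):
    """Hypothetical stand-in for the (2,4)-approximation: pick centers by
    farthest-first selection, then move each point's weight to its nearest center."""
    centers = [points[0]]
    while len(centers) < min(num_centers, len(points)):
        centers.append(max(points,
                           key=lambda p: min(abs(p - c) for c in centers)))
    center_weights = [0] * len(centers)
    for p, w in zip(points, weights):
        nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
        center_weights[nearest] += w
    return centers, center_weights

def hierarchical_coreset(stream, k, m):
    buffer = []   # raw stream elements not yet summarized
    levels = []   # levels[j] holds the sets of centers stored at level j+1

    def push(level, center_set):
        if level == len(levels):
            levels.append([])
        levels[level].append(center_set)
        if len(levels[level]) == m:   # m sets at this level: merge them
            pts = [c for cs, _ in levels[level] for c in cs]
            wts = [w for _, ws in levels[level] for w in ws]
            levels[level] = []
            push(level + 1, reduce_to_centers(pts, wts, 2 * k))

    for x in stream:
        buffer.append(x)
        if len(buffer) == m:          # m new elements: summarize into level 1
            push(0, reduce_to_centers(buffer, [1] * m, 2 * k))
            buffer = []

    # End of stream: cluster everything that is still stored.
    pts, wts = list(buffer), [1] * len(buffer)
    for level in levels:
        for cs, ws in level:
            pts += cs
            wts += ws
    return reduce_to_centers(pts, wts, 2 * k)

centers, weights = hierarchical_coreset(range(100), k=1, m=4)
print(centers, weights)
```

At any moment each level holds fewer than m sets of at most 2k centers, which is the quantity the space analysis counts.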
![Page 114: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/114.jpg)
Core-Set Algorithm
Algorithm idea:
Define $m = n^{\epsilon}$.
Whenever you have m sets of centers in level j:
• Run the (2,4)-approximation ⇒ 2k centers.
• Store the 2k new centers in level j+1.
Tree with fan-out m has how many levels?
$$\log_m n = \frac{\log n}{\log m} = \frac{\log n}{\log n^{\epsilon}} = \frac{1}{\epsilon}$$
![Page 115: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/115.jpg)
Core-Set Algorithm
Space usage?
![Page 117: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/117.jpg)
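Core-Set Algorithm
Algorithm idea:
Define $m = n^{\epsilon}$.
Tree with fan-out m has how many levels?
$$\log_m n = \frac{\log n}{\log m} = \frac{\log n}{\log n^{\epsilon}} = \frac{1}{\epsilon}$$
Space usage:
$$\left( \frac{1}{\epsilon} \right) (m) (2k) = \frac{2k \, n^{\epsilon}}{\epsilon}$$
Store at most m sets of centers for each level of the tree.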
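To get a feel for the numbers, the formulas can be evaluated for concrete parameters. The function name `hierarchy_parameters` and the sample values of n, ε, and k below are hypothetical and chosen only for illustration.

```python
import math

def hierarchy_parameters(n, eps, k):
    """Return (fan-out m, number of levels, stored centers) for m = n**eps."""
    m = n ** eps
    levels = math.log(n) / math.log(m)   # log_m n, which equals 1/eps
    space = levels * m * 2 * k           # (1/eps) * m * 2k
    return m, levels, space

m, levels, space = hierarchy_parameters(n=10**6, eps=0.5, k=10)
print(m, levels, space)
```

With ε = 1/2 the tree has two levels, matching the two-level core-set algorithm from earlier; smaller ε trades more levels for a smaller fan-out.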
![Page 118: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/118.jpg)
Core-Set Algorithm
Algorithm idea:
Define $m = n^{\epsilon}$.
Approximation factor?
![Page 119: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/119.jpg)
Streaming k-Median
Core-Set Algorithm
Lemma:
$$d(S_w, T) \le 8 \sum_{i=1}^{t} d(S_i, T_i) + 8 \, d(S, C^*)$$
Interpretation: We can bound the cost of level 1 by 8 times level 0…
Similarly:
We can bound the cost of level 2 by 8 times level 1…
We can bound the cost of level (1/𝜀) by 8 times level (1/𝜀)−1.
![Page 120: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/120.jpg)
Core-Set Algorithm
Algorithm idea:
Define $m = n^{\epsilon}$.
Approximation factor: $O(8^{1/\epsilon})$
![Page 121: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/121.jpg)
Hierarchical Construction
[Figure: the hierarchical construction diagram again.]
Space: $O(k \, n^{\epsilon} / \epsilon)$
Approximation: $(2, O(8^{1/\epsilon}))$
![Page 122: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/122.jpg)
k-Center Clustering
Given points: $P = p_1, p_2, \ldots, p_n$
Assumptions:
⇒ Points are in a metric space: distances satisfy the triangle inequality.
⇒ (Think: Euclidean space)
⇒ The number of clusters k is given.
Goal:
⇒ Choose a set of k points ("centers") that minimize the maximum distance to a center.
Example: 3 clusters
![Page 123: Algorithms at Scale - NUS Computinggilbert/CS5234/2019/... · 2019. 9. 20. · Algorithms at Scale (Week 6) Summary Today: Clustering and Streaming k-median clustering • Find kcenters](https://reader035.vdocuments.site/reader035/viewer/2022071607/6144a9a1b5d1170afb440524/html5/thumbnails/123.jpg)
k-Center Approximation Algorithm
Show that this is a 2-approximation:
1. T = {x} for any x in P.
2. Repeat until |T| = k:
• Let z be the point in P that maximizes d(z, T).
• Add z to T.
3. Return T.
Claim: cost(P, T) ≤ 2 cost(P, C*)
cost(P, T) is the maximum distance of any point in P to the set T.
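The three steps above can be sketched in Python. This is a minimal illustration, not code from the lecture: the function name `greedy_k_center`, the choice of the first point as the starting center, and the use of Euclidean distance via `math.dist` are all assumptions made here.

```python
import math


def greedy_k_center(points, k, dist=math.dist):
    """Farthest-point greedy sketch: returns (centers T, cost(P, T)).

    Assumes `points` is a non-empty list and k >= 1; `dist` is any metric.
    """
    T = [points[0]]  # step 1: start from an arbitrary point (here: the first)
    # d_to_T[i] caches d(points[i], T), the distance to the nearest chosen center
    d_to_T = [dist(p, T[0]) for p in points]
    while len(T) < min(k, len(points)):
        # step 2: pick the point z that maximizes d(z, T) and add it to T
        i = max(range(len(points)), key=d_to_T.__getitem__)
        z = points[i]
        T.append(z)
        d_to_T = [min(d, dist(p, z)) for d, p in zip(d_to_T, points)]
    # step 3: return T together with cost(P, T) = max_p d(p, T)
    return T, max(d_to_T)


pts = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0), (21, 0)]
centers, cost = greedy_k_center(pts, 3)
```

Caching `d_to_T` keeps each round at one pass over the points, so the whole run makes O(nk) distance evaluations.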
k-Center Clustering
Some useful things to prove:
If x is the farthest point from T at the end (at distance r):
⇒ Every two points in T ∪ {x} are at distance at least r from each other.
⇒ Every other point is at distance at most r from T.
If C* is an optimal clustering:
⇒ At least two points in T ∪ {x} are assigned to the same center (k+1 points, only k centers).
⇒ So that center must be at distance at least r/2 from one of them.
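One way to chain these facts into the 2-approximation claim is sketched below. The names $u$, $v$ (the two points of $T \cup \{x\}$ that share an optimal center) and $c$ (that center) are introduced here for illustration; the second line uses only the triangle inequality.

```latex
\begin{align*}
r &= d(x, T) = \mathrm{cost}(P, T) \\
r &\le d(u, v) \le d(u, c) + d(c, v) \le 2\,\mathrm{cost}(P, C^*) \\
\Rightarrow\quad \mathrm{cost}(P, T) &\le 2\,\mathrm{cost}(P, C^*)
\end{align*}
```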
Streaming k-Center Clustering
Show that this is an 8-approximation:
Assume minimum distance between points is 1.
T = first k points in stream.
R = 1
Repeat until end of stream:
1. While |T| ≤ k:
• Get new point x.
• If d(x, T) > 2R, then add x to T.
2. T' = ∅.
3. While some z in T has d(z, T') > 2R: add z to T'.
4. T = T'.
5. R = 2R.
(Steps 2 to 4: rebuild T' here. Step 5: double R.)
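The streaming procedure can be sketched in Python as follows. This is an illustrative reading of the steps, not the lecture's implementation: the function name `streaming_k_center`, the `_END` sentinel, and the default Euclidean metric `math.dist` are assumptions, and the sketch returns as soon as the stream runs out during step 1.

```python
import math
from itertools import islice

_END = object()  # hypothetical sentinel marking the end of the stream


def streaming_k_center(stream, k, dist=math.dist):
    """One-pass doubling sketch: returns (centers T, final radius R)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    it = iter(stream)
    T = list(islice(it, k))  # T = first k points in the stream
    R = 1.0                  # assumes the minimum distance between points is 1
    while True:
        # Step 1: read points until T holds k+1 centers or the stream ends.
        while len(T) <= k:
            x = next(it, _END)
            if x is _END:
                return T, R
            if min(dist(x, c) for c in T) > 2 * R:
                T.append(x)
        # Steps 2-4: rebuild T, keeping only centers more than 2R apart.
        T_new = []
        for z in T:
            if all(dist(z, c) > 2 * R for c in T_new):
                T_new.append(z)
        T = T_new
        # Step 5: double R.
        R *= 2


pts = [(0,), (1,), (10,), (11,), (20,), (21,)]
centers, radius = streaming_k_center(iter(pts), 2)
```

Only the current set T (at most k+1 points) and R are stored, which is the point of the scheme; a rebuild can leave k+1 centers in place, in which case the loop simply doubles R again before reading more input.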
Streaming k-Center Clustering
Some useful things to prove:
Before step (2):
⇒ Every point is within 2R of T.
Before step (5):
⇒ Every point is within 4R of T.
Before step (2):
⇒ There are k+1 centers at distance at least R from each other.
Before step (5):
⇒ All centers are at distance at least 2R from each other.
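A sketch of how these invariants might combine into the factor of 8, assuming the stream ends during step 1 and R was doubled at least once. Here $R$ denotes the final value and $R/2$ the value just before the last doubling; this notation is introduced for illustration only.

```latex
\begin{align*}
\mathrm{cost}(P, T) &\le 2R
  && \text{(every point is within $2R$ of $T$)} \\
\mathrm{cost}(P, C^*) &\ge \tfrac{1}{2} \cdot \tfrac{R}{2} = \tfrac{R}{4}
  && \text{($k+1$ centers pairwise at least $R/2$ apart, two share an optimal center)} \\
\Rightarrow\quad \mathrm{cost}(P, T) &\le 2R = 8 \cdot \tfrac{R}{4} \le 8\,\mathrm{cost}(P, C^*)
\end{align*}
```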
Summary
Today: Clustering and Streaming
k-median clustering
• Find k centers to minimize the average distance to a center.
LP approximation algorithm
• Find 2k centers that give a 4-approximation of the optimal clustering.
Streaming
• Find k centers in a stream of points.
• Use a hierarchical scheme to reduce space.
Other clustering problems
Last Week: Graph Streaming
Connectivity
Bipartite test
MST
Spanners
Matching