sketching, sampling, and other sublinear algorithms 4 (lecture by alex andoni)
DESCRIPTION
Parallel framework: we look at problems where neither the data or the output fits on a machine. For example, given a set of 2D points, how can we compute the minimum spanning tree over a cluster of machines.TRANSCRIPT
![Page 1: Sketching, Sampling, and other Sublinear Algorithms 4 (Lecture by Alex Andoni)](https://reader033.vdocuments.site/reader033/viewer/2022061104/53ff968d8d7f7261088b4655/html5/thumbnails/1.jpg)
Sketching, Sampling and other Sublinear Algorithms:
Algorithms for parallel models
Alex Andoni(MSR SVC)
![Page 2: Sketching, Sampling, and other Sublinear Algorithms 4 (Lecture by Alex Andoni)](https://reader033.vdocuments.site/reader033/viewer/2022061104/53ff968d8d7f7261088b4655/html5/thumbnails/2.jpg)
Parallel Models Data cannot be seen by one machine Distributed across many machines MapReduce, Hadoop, Dryad,…
Algorithmic tools for the models? very incipient!
![Page 3: Sketching, Sampling, and other Sublinear Algorithms 4 (Lecture by Alex Andoni)](https://reader033.vdocuments.site/reader033/viewer/2022061104/53ff968d8d7f7261088b4655/html5/thumbnails/3.jpg)
Types of problems 0. Statistics: 2nd moment of the frequency 1. Sort n numbers 2. s-t connectivity in a graph 3. Minimum Spanning Tree on a graph … many more!
![Page 4: Sketching, Sampling, and other Sublinear Algorithms 4 (Lecture by Alex Andoni)](https://reader033.vdocuments.site/reader033/viewer/2022061104/53ff968d8d7f7261088b4655/html5/thumbnails/4.jpg)
Computational Model machines space per machine O(input size)
cannot replicate data much Input: elements Output: O(input size)=O(n)
doesn’t fit on a machine:
Round: shuffle all (expensive!)
![Page 5: Sketching, Sampling, and other Sublinear Algorithms 4 (Lecture by Alex Andoni)](https://reader033.vdocuments.site/reader033/viewer/2022061104/53ff968d8d7f7261088b4655/html5/thumbnails/5.jpg)
Model Constraints Main goal:
number of rounds for
holds when Resources bounded by
in/out communication/round run-time/round
Model essentially that of: Bulk-Synchronous Parallel [Valiant’90] Map Reduce Framework [Feldman-Muthukrishnan-
Sidiropoulos-Stein-Svitkina’07, Karloff-Suri-Vassilvitskii’10, Goodrich-Sitchinava-Zhang’11]
![Page 6: Sketching, Sampling, and other Sublinear Algorithms 4 (Lecture by Alex Andoni)](https://reader033.vdocuments.site/reader033/viewer/2022061104/53ff968d8d7f7261088b4655/html5/thumbnails/6.jpg)
PRAMs Good news: can implement algorithms
developed for Parallel RAM model can simulate many of PRAM algorithms with
R=O(parallel time) [KSV’10,GSZ’11]
Bad news: often logarithmic…
![Page 7: Sketching, Sampling, and other Sublinear Algorithms 4 (Lecture by Alex Andoni)](https://reader033.vdocuments.site/reader033/viewer/2022061104/53ff968d8d7f7261088b4655/html5/thumbnails/7.jpg)
Problem 0: Statistics Problem:
Log of traffic stored at many machines Want (say) 2nd moment of frequencies of items
Solution: Each machine computes a sketch of local data Send to machine Machine adds up the sketches to get the sketch of
entire data: S(data ) + S(data ) + … S(data ) = S(data + data +…
data )
IP Frequency
2 15 37 2
194
1+9+4=14
![Page 8: Sketching, Sampling, and other Sublinear Algorithms 4 (Lecture by Alex Andoni)](https://reader033.vdocuments.site/reader033/viewer/2022061104/53ff968d8d7f7261088b4655/html5/thumbnails/8.jpg)
Problem 1: sorting Suppose:
Algorithm: Pick each element with Pr=
total elements chosen Send chosen elements to machine Choose ~equidistant pivots and assign a range to each machine
each range will capture about elements Send the pivots to all machines Each machine sends elements in range to machine Sort locally
3 rounds!
machine responsible
machine responsible
machine responsible
![Page 9: Sketching, Sampling, and other Sublinear Algorithms 4 (Lecture by Alex Andoni)](https://reader033.vdocuments.site/reader033/viewer/2022061104/53ff968d8d7f7261088b4655/html5/thumbnails/9.jpg)
Problem 2: graph connectivity Dense: if
Can do in rounds [KSV’10…] Sparse: if
Hard: big open question to do s-t connectivity in rounds.
VS
![Page 10: Sketching, Sampling, and other Sublinear Algorithms 4 (Lecture by Alex Andoni)](https://reader033.vdocuments.site/reader033/viewer/2022061104/53ff968d8d7f7261088b4655/html5/thumbnails/10.jpg)
Problems 3: geometric graphs Implicit graph on points in
distance = Euclidean distance
Questions: Minimum Spanning Tree (MST)
Agglomerative hierarchical clustering Earth-Mover Distance Travelling Salesman Person etc
![Page 11: Sketching, Sampling, and other Sublinear Algorithms 4 (Lecture by Alex Andoni)](https://reader033.vdocuments.site/reader033/viewer/2022061104/53ff968d8d7f7261088b4655/html5/thumbnails/11.jpg)
Problem: Geometric MST Will show algorithm for
approximate Minimum Spanning Tree in number of rounds is
as long as Related to some streaming work [Indyk’04,…]
Which are useful for computing cost, but not actual solution
Geometric information makes the problem tractable for parallel computation!
[A-Nikolov-Onak-Yaroslavtsev’??]
![Page 12: Sketching, Sampling, and other Sublinear Algorithms 4 (Lecture by Alex Andoni)](https://reader033.vdocuments.site/reader033/viewer/2022061104/53ff968d8d7f7261088b4655/html5/thumbnails/12.jpg)
General Approach Partition the space hierarchically in a “nice
way” In each part
Compute a pseudo-solution to the problem Sketch the pseudo-solution with small space Send the sketch to be used in the next level/round
![Page 13: Sketching, Sampling, and other Sublinear Algorithms 4 (Lecture by Alex Andoni)](https://reader033.vdocuments.site/reader033/viewer/2022061104/53ff968d8d7f7261088b4655/html5/thumbnails/13.jpg)
MST algorithm: attempt 1 Partition the space hierarchically in a “nice
way” In each part
Compute a pseudo-solution to the problem Sketch the pseudo-solution with small space Send the sketch to be used in the next level/round
quad trees!
compute MST
send any point as a
representative
![Page 14: Sketching, Sampling, and other Sublinear Algorithms 4 (Lecture by Alex Andoni)](https://reader033.vdocuments.site/reader033/viewer/2022061104/53ff968d8d7f7261088b4655/html5/thumbnails/14.jpg)
Troubles Quad tree can cut MST edges
forcing irrevocable decisions Choose a wrong representative
![Page 15: Sketching, Sampling, and other Sublinear Algorithms 4 (Lecture by Alex Andoni)](https://reader033.vdocuments.site/reader033/viewer/2022061104/53ff968d8d7f7261088b4655/html5/thumbnails/15.jpg)
MST algorithm: final Assume entire pointset in a cube of size Partition:
impose a randomly shifted quad-tree cells of size
Pseudo-solution: MST with edges up to length , where is the
current cell-length Sketch of a pseudo-solution:
Compute an -net of points a maximal subset of inter-distance
Store connectivity of the net points in pseudo-solution
![Page 16: Sketching, Sampling, and other Sublinear Algorithms 4 (Lecture by Alex Andoni)](https://reader033.vdocuments.site/reader033/viewer/2022061104/53ff968d8d7f7261088b4655/html5/thumbnails/16.jpg)
MST algorithm: Glimpse of analysis Quad tree can cut MST edges
consider an edge of MST of length probability it is cut by the quad-tree is morally: instead of the edge, can only use an edge
of length expected cost of misconnecting:
total error from misconnecting: Performance:
Need to consider only levels of the tree Net size is
![Page 17: Sketching, Sampling, and other Sublinear Algorithms 4 (Lecture by Alex Andoni)](https://reader033.vdocuments.site/reader033/viewer/2022061104/53ff968d8d7f7261088b4655/html5/thumbnails/17.jpg)
Finale Gotta love your models:
Streaming: sub-linear space see all data sequentially
Parallel computing: sub-linear space per machine data distributed over many machines communication (rounds) expensive
Algorithmic tools in development!