komplexitätstheorie und effiziente algorithmen christian sohler, tu dortmund algorithms for...
DESCRIPTION
Komplexitätstheorie und effiziente Algorithmen 3 Geometric data streams Massive sets of geometric objects arriving sequentially Objects are typically points Different form of arrival: - sequence of points - sequence of updates Questions Find ways to analyze the geometric structure of the input data using small space IntroductionTRANSCRIPT
![Page 1: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/1.jpg)
Komplexitätstheorie und effiziente Algorithmen
Christian Sohler, TU Dortmund
Algorithms for geometric data streams
![Page 2: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/2.jpg)
2
Komplexitätstheorie und effiziente Algorithmen
Data streams• Massive data set arriving sequentially• Different ways of „arriving“
Examples• Network traffic• Query logs• …
Approach• Find algorithms that make a single (a few) pass(es) and process
data sequentially
Introduction
![Page 3: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/3.jpg)
3
Komplexitätstheorie und effiziente Algorithmen
Geometric data streams• Massive sets of geometric objects arriving sequentially• Objects are typically points• Different form of arrival:
- sequence of points- sequence of updates
Questions• Find ways to analyze the geometric structure of the input data using
small space
Introduction
![Page 4: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/4.jpg)
4
Komplexitätstheorie und effiziente Algorithmen
Motivation• Many computational tasks can be interpreted geometrically• Geometric features may be useful in learning and classification• Geometry plays an important role in the application
Examples• Learning • Clustering• How ‚clusterable‘ is a data set?• Road traffic prediction
Introduction
![Page 5: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/5.jpg)
5
Komplexitätstheorie und effiziente Algorithmen
A basic learning problem• We have two classes of objects
Introduction
![Page 6: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/6.jpg)
6
Komplexitätstheorie und effiziente Algorithmen
A basic learning problem• We have two classes of objects
Introduction
![Page 7: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/7.jpg)
7
Komplexitätstheorie und effiziente Algorithmen
A basic learning problem• We have two classes of objects• We are given examples from both classes
Introduction
![Page 8: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/8.jpg)
8
Komplexitätstheorie und effiziente Algorithmen
A basic learning problem• We have two classes of objects• We are given examples from both classes
Introduction
![Page 9: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/9.jpg)
9
Komplexitätstheorie und effiziente Algorithmen
A basic learning problem• We have two classes of objects• We are given examples from both classes• Learn from examples to which class
future objects belong
Introduction
?
![Page 10: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/10.jpg)
10
Komplexitätstheorie und effiziente Algorithmen
A basic learning problem• We have two classes of objects• We are given examples from both classes• Learn from examples to which class
future objects belong• Map object‘s description to Euclidean space
Introduction
?
![Page 11: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/11.jpg)
11
Komplexitätstheorie und effiziente Algorithmen
A basic learning problem• We have two classes of objects• We are given examples from both classes• Learn from examples to which class
future objects belong• Map object‘s description to Euclidean space
SVM approach • Compute maximum margin hyperplane• Classifiy points according to their side
Introduction
?
![Page 12: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/12.jpg)
12
Komplexitätstheorie und effiziente Algorithmen
SVM and SEB (smallest enclosing balls)• Dual of certain SVM formulation is SEB
[Tax, Duin, Pattern Recognition Letters, ‘99]• Geometric streaming SEB can be used as SVM heuristic
[Rai, Daume III, Venkatasubramanian, IJCAI‘09]• Also: Coresets have been used
to construct CSVMs[Tsang, Kwok, Cheung, Journal of Machine Learning Research, ’05]
Introduction
?
![Page 13: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/13.jpg)
13
Komplexitätstheorie und effiziente Algorithmen
Outline • Merge & Reduce• Embeddings into tree metrics• Estimation of distribution of local neighborhoods• Balanced partitions• Approximating properties of balanced partitions
Introduction
![Page 14: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/14.jpg)
14
Komplexitätstheorie und effiziente Algorithmen
Insertion-only streams• Sequence of points p ,…, p from R
Merge & Reduce
1 nd
![Page 15: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/15.jpg)
15
Komplexitätstheorie und effiziente Algorithmen
Definition [k-median clustering]Given a weighted set P of points in R the k-median problem is to find
a set CR of k points (centers) such that cost(P,C) = wmin ||p-c||
is minimized, where w >0 is the weight of point p.
Merge & Reduce
d
pP cC
d
p
p
![Page 16: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/16.jpg)
16
Komplexitätstheorie und effiziente Algorithmen
Coreset [Har-Peled, Mazumdar, STOC’04]A weighted point set S is a (k,)-coreset of a weighted point set P, if for
every set C of k centers | cost(P,C) – cost(S,C) | cost(P,C).
Merge & Reduce
3
3
3
33
4
4
![Page 17: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/17.jpg)
17
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
![Page 18: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/18.jpg)
18
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
Coreset
![Page 19: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/19.jpg)
19
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
![Page 20: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/20.jpg)
20
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
Coreset
![Page 21: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/21.jpg)
21
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
Coreset of Union of Coreset
![Page 22: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/22.jpg)
22
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
![Page 23: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/23.jpg)
23
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
![Page 24: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/24.jpg)
24
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
![Page 25: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/25.jpg)
25
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
![Page 26: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/26.jpg)
26
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
![Page 27: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/27.jpg)
27
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
![Page 28: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/28.jpg)
28
Komplexitätstheorie und effiziente Algorithmen
Coresets by pre-clustering [Guha, Mishra, Motwani, O‘Callaghan, FOCS’00; Har-Peled, Mazumdar, STOC’04; Frahling, S., STOC‘05]
• Compute a pre-clustering S with >k centers and cost(P,S) Opt• Size exponential in d
Merge & Reduce
3
3
3
33
4
4
k
![Page 29: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/29.jpg)
29
Komplexitätstheorie und effiziente Algorithmen
Coresets by sampling [Chen, SICOMP’09; Feldman, Monemizadeh, S., SoCG‘07]
• Compute a random non-uniform sample• Show that sample approximates all solutions from a net• Size polynomial in d
Merge & Reduce
MM M/4
![Page 30: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/30.jpg)
30
Komplexitätstheorie und effiziente Algorithmen
Coresets by reduction to 1D [Har-Peled, Kushal, DCG’07, Feldman, Fiat, Sharir, FOCS‘06]
• Uses geometric arguments to solve 1D• Combine with preclusting using line centers• For k-median: Size independent of n (but exponential in d)
Merge & Reduce
![Page 31: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/31.jpg)
31
Komplexitätstheorie und effiziente Algorithmen
Open problems • Coresets for k-median of size independent of n and d ?
(Partial result in [Feldman, Monemizadeh, S., SoCG’07])• Coresets for k-median of size O(d/²)• Coresets for k-median of size poly(d, log n)/for
constant c=c(d)>0• Coresets for j-subspace 1-median of size poly(, d, j, log n) ?• Same questions for k-means objective function
Remark: Open questions refer to the definition of coresets from this talk.
Merge & Reduce
2-c
![Page 32: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/32.jpg)
32
Komplexitätstheorie und effiziente Algorithmen
Insertion/deletion model• Stream consists of Insert(p), Delete(p) operations• Points are from {1,…, }• Stream is consistent, i.e. no Delete(p), if p is not present and no
Insert(p), if p is already present in the current set
Geometric update streams
d
![Page 33: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/33.jpg)
33
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms via embeddings into tree metrics
Embeddings in tree metrics
p
q
r
st
![Page 34: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/34.jpg)
34
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms via embeddings into tree metrics
Embeddings in tree metrics
p
q
r
st tsr
p q
![Page 35: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/35.jpg)
35
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms via embeddings into tree metrics
Embeddings in tree metrics
p
q
r
st tsr
p q
p q sr t
2 i
2 i2 i
2 i
![Page 36: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/36.jpg)
36
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms via embeddings into tree metrics
Embeddings in tree metrics
p
q
r
st tsr
p q
p q
2 i-1
2 i2 i
2 i
qp sr
s tr
2 i-1 2 i-1 2 i-1
![Page 37: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/37.jpg)
37
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms via embeddings into tree metrics
Embeddings in tree metrics
p
q
r
st tsr
p q
p q
2 i2 i
2 i
qp sr
s tr
2 i-1 2 i-1 2 i-1
r s
2 i-22 i-2
![Page 38: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/38.jpg)
38
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms via embeddings into tree metrics
Embeddings in tree metrics
p
q
r
st tsr
p q
p q
2 i2 i
2 i
qp sr
s tr
2 i-1 2 i-1 2 i-1
r s
2 i-22 i-2
![Page 39: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/39.jpg)
39
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms via embeddings into tree metrics
Embeddings in tree metrics D(.,.)• ||p-q|| D(p,q)• E[D(p,q)] = O(log )||p-q||
[Bartal, FOCS’96;Charikar, Chekuri, Goel, Guha,Plotkin, FOCS’98]
tsrp q
p q
2 i2 i
2 i
qp sr
s tr
2 i-1 2 i-1 2 i-1
r s
2 i-22 i-2
![Page 40: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/40.jpg)
40
Komplexitätstheorie und effiziente Algorithmen
Estimator for cost of Euclidean minimum spanning tree (EMST) [Indyk, STOC’04]
• Write EMST for cost of EMST• Write MST for cost of minimum spanning tree of tree metric D• E[MST ] = O(log ) EMST (linearity of expectation)• Use cost of MST of D as estimator
Streaming algorithms viaembeddings into tree metrics
D
D
![Page 41: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/41.jpg)
41
Komplexitätstheorie und effiziente Algorithmen
Observation [Indyk, STOC’04]• The MST of D(.,.) is given by the tree defining the tree metric • #edges of length 2 = #non-empty cells in corresponding grid
Streaming algorithms via embeddings into tree metrics
p
q
r
st
tsrp q
p q sr t
2 i2 i
2 i
i
2 i
![Page 42: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/42.jpg)
42
Komplexitätstheorie und effiziente Algorithmen
Euclidean minimum spanning tree1. Use O(log nested grids G(i) with side length 22. for each grid3. approximate |G(i)| := #nonempty cells in G(i) using F sketch
4. return 2 |G(i)|
Theorem [Indyk, STOC’04]The above algorithm computes a O(log )-approximation to the cost of
the minimum spanning tree.
Streaming algorithms viaembeddings into tree metrics
i
i0
![Page 43: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/43.jpg)
43
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms viaembeddings into tree metrics
Results using a similar approach [Indyk, STOC’04]
Earth mover‘s distance O(log )Facility location O(log² )Matching O(log )
k-Median O(1)1+ with huge extraction time
Problem Approx. factor
![Page 44: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/44.jpg)
44
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms viaestimating the distribution of local neighborhoods
Distribution of neighborhoods • Grids G(i) as before• R-neighborhood of C: cells within distance at most R from C• m (i) is number of points in i-th cell of the R-neighborhood of C
1 2 3
4 5 6 7 8
9 10 11 12 13
14 15 16 17 18
19 20 21
C,R
A cell and its 2-neighborhood
![Page 45: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/45.jpg)
45
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms viaestimating the distribution of local neighborhoods
EMST estimator• Define Z (i) = ( m (i) > 0 )• EMST can be approximated from the Z (i)• Approx. ratio goes to 1 as R goes to
C,R C,R
C,R
![Page 46: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/46.jpg)
46
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms viaestimating the distribution of local neighborhoods
EMST estimator• K: Size of R-neighborhood• Z are functions from {1,…,K} to {0,1}• Random (nonempty) C defines distribution over neighborhoods,
i.e. over functions Z:{1,…,K} {0,1}• Can still estimate EMST from this distribution
C,R
![Page 47: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/47.jpg)
47
Komplexitätstheorie und effiziente Algorithmen
Algorithm• Sample a certain number of nonempty grid cells and maintain
number of points for each cell in their neighborhood• Sample gives estimation of the distribution of the Z (.)• Obtain estimation for EMST from estimated distribution
Theorem [Frahling, Indyk, S., IJCGA’07]Let >0, d be constants.The cost of a Euclidean minimum spanning
tree of a point set in R given as an update stream can be estimated with a factor of 1 using polylog() space.
Streaming algorithms viaestimating the distribution of local neighborhoods
C,R
d
![Page 48: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/48.jpg)
48
Komplexitätstheorie und effiziente Algorithmen
Open Problems• (1+)-approximation for matching and/or earth mover‘s distance• Other problems? Approach is not very well understood• General characterization of problems solvable via approximation of
the distribution of local neighborhoods
Streaming algorithms viaestimating the distribution of local neighborhoods
![Page 49: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/49.jpg)
49
Komplexitätstheorie und effiziente Algorithmen
Estimating the distribution [Frahling, S., STOC’05]• Divide space into regions• For each region maintain #points inside• Balance „error“ among regions• Notion of error depends on problem
Example• 1-Median in 1D• Error cell width #points in cell
Streaming algorithms viabalanced partitions
![Page 50: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/50.jpg)
50
Komplexitätstheorie und effiziente Algorithmen
Small space?• Problem dependent • Need to show that decomposition in few regions with sufficiently
small error exists
Streaming algorithms viabalanced partitions
![Page 51: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/51.jpg)
51
Komplexitätstheorie und effiziente Algorithmen
One approach [Frahling, S., STOC’05]• Nested grids G(i)• For each grid maintain cells intersected by random sample
(sample sizes differ for different grids)• #sample points inside cell -> #points inside cell• Combine cells from different grids to space decomposition
Streaming algorithms viabalanced partitions
![Page 52: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/52.jpg)
52
Komplexitätstheorie und effiziente Algorithmen
Works for• k-median• k-means• MaxTSP, MaxMatching, Maximum spanning tree, Average distance,
MaxCut
Why?• Require proof for k-median and k-means• Last 5 problems can be reduced to 1-median
Streaming algorithms viabalanced partitions
![Page 53: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/53.jpg)
53
Komplexitätstheorie und effiziente Algorithmen
Approximating properties of balanced partitions[Lammersen, S., ESA‘08]• Previous approach may lead to many regions• Example: facility location• Can approximate properties of balanced partitions, e.g. #regions• Only gives approximation of cost of solution• More details in Christiane‘s talk
Streaming algorithms viaapproximation of balanced partitions
![Page 54: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/54.jpg)
54
Komplexitätstheorie und effiziente Algorithmen
Open problems• Min-sum-k-clustering• Other problems?
Streaming algorithms viabalanced partitions
![Page 55: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/55.jpg)
55
Komplexitätstheorie und effiziente Algorithmen
(Some) Techniques in geometric streaming:• Merge & Reduce• Embeddings into tree metrics• Estimation of distribution of local neighborhoods• Balanced partitions• Approximating properties of balanced partitions
And lots of open problems to work on…
Summary
![Page 56: Komplexitätstheorie und effiziente Algorithmen Christian Sohler, TU Dortmund Algorithms for geometric data streams](https://reader036.vdocuments.site/reader036/viewer/2022070605/5a4d1add7f8b9ab059975915/html5/thumbnails/56.jpg)
56
Komplexitätstheorie und effiziente Algorithmen
Thank you!