komplexitätstheorie und effiziente algorithmen christian sohler, tu dortmund algorithms for...
Post on 06-Jan-2018
217 Views
Preview:
DESCRIPTION
TRANSCRIPT
Komplexitätstheorie und effiziente Algorithmen
Christian Sohler, TU Dortmund
Algorithms for geometric data streams
2
Komplexitätstheorie und effiziente Algorithmen
Data streams• Massive data set arriving sequentially• Different ways of „arriving“
Examples• Network traffic• Query logs• …
Approach• Find algorithms that make a single (a few) pass(es) and process
data sequentially
Introduction
3
Komplexitätstheorie und effiziente Algorithmen
Geometric data streams• Massive sets of geometric objects arriving sequentially• Objects are typically points• Different form of arrival:
- sequence of points- sequence of updates
Questions• Find ways to analyze the geometric structure of the input data using
small space
Introduction
4
Komplexitätstheorie und effiziente Algorithmen
Motivation• Many computational tasks can be interpreted geometrically• Geometric features may be useful in learning and classification• Geometry plays an important role in the application
Examples• Learning • Clustering• How ‚clusterable‘ is a data set?• Road traffic prediction
Introduction
5
Komplexitätstheorie und effiziente Algorithmen
A basic learning problem• We have two classes of objects
Introduction
6
Komplexitätstheorie und effiziente Algorithmen
A basic learning problem• We have two classes of objects
Introduction
7
Komplexitätstheorie und effiziente Algorithmen
A basic learning problem• We have two classes of objects• We are given examples from both classes
Introduction
8
Komplexitätstheorie und effiziente Algorithmen
A basic learning problem• We have two classes of objects• We are given examples from both classes
Introduction
9
Komplexitätstheorie und effiziente Algorithmen
A basic learning problem• We have two classes of objects• We are given examples from both classes• Learn from examples to which class
future objects belong
Introduction
?
10
Komplexitätstheorie und effiziente Algorithmen
A basic learning problem• We have two classes of objects• We are given examples from both classes• Learn from examples to which class
future objects belong• Map object‘s description to Euclidean space
Introduction
?
11
Komplexitätstheorie und effiziente Algorithmen
A basic learning problem• We have two classes of objects• We are given examples from both classes• Learn from examples to which class
future objects belong• Map object‘s description to Euclidean space
SVM approach • Compute maximum margin hyperplane• Classifiy points according to their side
Introduction
?
12
Komplexitätstheorie und effiziente Algorithmen
SVM and SEB (smallest enclosing balls)• Dual of certain SVM formulation is SEB
[Tax, Duin, Pattern Recognition Letters, ‘99]• Geometric streaming SEB can be used as SVM heuristic
[Rai, Daume III, Venkatasubramanian, IJCAI‘09]• Also: Coresets have been used
to construct CSVMs[Tsang, Kwok, Cheung, Journal of Machine Learning Research, ’05]
Introduction
?
13
Komplexitätstheorie und effiziente Algorithmen
Outline • Merge & Reduce• Embeddings into tree metrics• Estimation of distribution of local neighborhoods• Balanced partitions• Approximating properties of balanced partitions
Introduction
14
Komplexitätstheorie und effiziente Algorithmen
Insertion-only streams• Sequence of points p ,…, p from R
Merge & Reduce
1 nd
15
Komplexitätstheorie und effiziente Algorithmen
Definition [k-median clustering]Given a weighted set P of points in R the k-median problem is to find
a set CR of k points (centers) such that cost(P,C) = wmin ||p-c||
is minimized, where w >0 is the weight of point p.
Merge & Reduce
d
pP cC
d
p
p
16
Komplexitätstheorie und effiziente Algorithmen
Coreset [Har-Peled, Mazumdar, STOC’04]A weighted point set S is a (k,)-coreset of a weighted point set P, if for
every set C of k centers | cost(P,C) – cost(S,C) | cost(P,C).
Merge & Reduce
3
3
3
33
4
4
17
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
18
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
Coreset
19
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
20
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
Coreset
21
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
Coreset of Union of Coreset
22
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
23
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
24
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
25
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
26
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
27
Komplexitätstheorie und effiziente Algorithmen
Observation• Union of two (k,)-coresets is a (k,)-coreset• Can compute coreset of a coreset
Merge & Reduce
… Input Stream
28
Komplexitätstheorie und effiziente Algorithmen
Coresets by pre-clustering [Guha, Mishra, Motwani, O‘Callaghan, FOCS’00; Har-Peled, Mazumdar, STOC’04; Frahling, S., STOC‘05]
• Compute a pre-clustering S with >k centers and cost(P,S) Opt• Size exponential in d
Merge & Reduce
3
3
3
33
4
4
k
29
Komplexitätstheorie und effiziente Algorithmen
Coresets by sampling [Chen, SICOMP’09; Feldman, Monemizadeh, S., SoCG‘07]
• Compute a random non-uniform sample• Show that sample approximates all solutions from a net• Size polynomial in d
Merge & Reduce
MM M/4
30
Komplexitätstheorie und effiziente Algorithmen
Coresets by reduction to 1D [Har-Peled, Kushal, DCG’07, Feldman, Fiat, Sharir, FOCS‘06]
• Uses geometric arguments to solve 1D• Combine with preclusting using line centers• For k-median: Size independent of n (but exponential in d)
Merge & Reduce
31
Komplexitätstheorie und effiziente Algorithmen
Open problems • Coresets for k-median of size independent of n and d ?
(Partial result in [Feldman, Monemizadeh, S., SoCG’07])• Coresets for k-median of size O(d/²)• Coresets for k-median of size poly(d, log n)/for
constant c=c(d)>0• Coresets for j-subspace 1-median of size poly(, d, j, log n) ?• Same questions for k-means objective function
Remark: Open questions refer to the definition of coresets from this talk.
Merge & Reduce
2-c
32
Komplexitätstheorie und effiziente Algorithmen
Insertion/deletion model• Stream consists of Insert(p), Delete(p) operations• Points are from {1,…, }• Stream is consistent, i.e. no Delete(p), if p is not present and no
Insert(p), if p is already present in the current set
Geometric update streams
d
33
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms via embeddings into tree metrics
Embeddings in tree metrics
p
q
r
st
34
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms via embeddings into tree metrics
Embeddings in tree metrics
p
q
r
st tsr
p q
35
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms via embeddings into tree metrics
Embeddings in tree metrics
p
q
r
st tsr
p q
p q sr t
2 i
2 i2 i
2 i
36
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms via embeddings into tree metrics
Embeddings in tree metrics
p
q
r
st tsr
p q
p q
2 i-1
2 i2 i
2 i
qp sr
s tr
2 i-1 2 i-1 2 i-1
37
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms via embeddings into tree metrics
Embeddings in tree metrics
p
q
r
st tsr
p q
p q
2 i2 i
2 i
qp sr
s tr
2 i-1 2 i-1 2 i-1
r s
2 i-22 i-2
38
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms via embeddings into tree metrics
Embeddings in tree metrics
p
q
r
st tsr
p q
p q
2 i2 i
2 i
qp sr
s tr
2 i-1 2 i-1 2 i-1
r s
2 i-22 i-2
39
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms via embeddings into tree metrics
Embeddings in tree metrics D(.,.)• ||p-q|| D(p,q)• E[D(p,q)] = O(log )||p-q||
[Bartal, FOCS’96;Charikar, Chekuri, Goel, Guha,Plotkin, FOCS’98]
tsrp q
p q
2 i2 i
2 i
qp sr
s tr
2 i-1 2 i-1 2 i-1
r s
2 i-22 i-2
40
Komplexitätstheorie und effiziente Algorithmen
Estimator for cost of Euclidean minimum spanning tree (EMST) [Indyk, STOC’04]
• Write EMST for cost of EMST• Write MST for cost of minimum spanning tree of tree metric D• E[MST ] = O(log ) EMST (linearity of expectation)• Use cost of MST of D as estimator
Streaming algorithms viaembeddings into tree metrics
D
D
41
Komplexitätstheorie und effiziente Algorithmen
Observation [Indyk, STOC’04]• The MST of D(.,.) is given by the tree defining the tree metric • #edges of length 2 = #non-empty cells in corresponding grid
Streaming algorithms via embeddings into tree metrics
p
q
r
st
tsrp q
p q sr t
2 i2 i
2 i
i
2 i
42
Komplexitätstheorie und effiziente Algorithmen
Euclidean minimum spanning tree1. Use O(log nested grids G(i) with side length 22. for each grid3. approximate |G(i)| := #nonempty cells in G(i) using F sketch
4. return 2 |G(i)|
Theorem [Indyk, STOC’04]The above algorithm computes a O(log )-approximation to the cost of
the minimum spanning tree.
Streaming algorithms viaembeddings into tree metrics
i
i0
43
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms viaembeddings into tree metrics
Results using a similar approach [Indyk, STOC’04]
Earth mover‘s distance O(log )Facility location O(log² )Matching O(log )
k-Median O(1)1+ with huge extraction time
Problem Approx. factor
44
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms viaestimating the distribution of local neighborhoods
Distribution of neighborhoods • Grids G(i) as before• R-neighborhood of C: cells within distance at most R from C• m (i) is number of points in i-th cell of the R-neighborhood of C
1 2 3
4 5 6 7 8
9 10 11 12 13
14 15 16 17 18
19 20 21
C,R
A cell and its 2-neighborhood
45
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms viaestimating the distribution of local neighborhoods
EMST estimator• Define Z (i) = ( m (i) > 0 )• EMST can be approximated from the Z (i)• Approx. ratio goes to 1 as R goes to
C,R C,R
C,R
46
Komplexitätstheorie und effiziente Algorithmen
Streaming algorithms viaestimating the distribution of local neighborhoods
EMST estimator• K: Size of R-neighborhood• Z are functions from {1,…,K} to {0,1}• Random (nonempty) C defines distribution over neighborhoods,
i.e. over functions Z:{1,…,K} {0,1}• Can still estimate EMST from this distribution
C,R
47
Komplexitätstheorie und effiziente Algorithmen
Algorithm• Sample a certain number of nonempty grid cells and maintain
number of points for each cell in their neighborhood• Sample gives estimation of the distribution of the Z (.)• Obtain estimation for EMST from estimated distribution
Theorem [Frahling, Indyk, S., IJCGA’07]Let >0, d be constants.The cost of a Euclidean minimum spanning
tree of a point set in R given as an update stream can be estimated with a factor of 1 using polylog() space.
Streaming algorithms viaestimating the distribution of local neighborhoods
C,R
d
48
Komplexitätstheorie und effiziente Algorithmen
Open Problems• (1+)-approximation for matching and/or earth mover‘s distance• Other problems? Approach is not very well understood• General characterization of problems solvable via approximation of
the distribution of local neighborhoods
Streaming algorithms viaestimating the distribution of local neighborhoods
49
Komplexitätstheorie und effiziente Algorithmen
Estimating the distribution [Frahling, S., STOC’05]• Divide space into regions• For each region maintain #points inside• Balance „error“ among regions• Notion of error depends on problem
Example• 1-Median in 1D• Error cell width #points in cell
Streaming algorithms viabalanced partitions
50
Komplexitätstheorie und effiziente Algorithmen
Small space?• Problem dependent • Need to show that decomposition in few regions with sufficiently
small error exists
Streaming algorithms viabalanced partitions
51
Komplexitätstheorie und effiziente Algorithmen
One approach [Frahling, S., STOC’05]• Nested grids G(i)• For each grid maintain cells intersected by random sample
(sample sizes differ for different grids)• #sample points inside cell -> #points inside cell• Combine cells from different grids to space decomposition
Streaming algorithms viabalanced partitions
52
Komplexitätstheorie und effiziente Algorithmen
Works for• k-median• k-means• MaxTSP, MaxMatching, Maximum spanning tree, Average distance,
MaxCut
Why?• Require proof for k-median and k-means• Last 5 problems can be reduced to 1-median
Streaming algorithms viabalanced partitions
53
Komplexitätstheorie und effiziente Algorithmen
Approximating properties of balanced partitions[Lammersen, S., ESA‘08]• Previous approach may lead to many regions• Example: facility location• Can approximate properties of balanced partitions, e.g. #regions• Only gives approximation of cost of solution• More details in Christiane‘s talk
Streaming algorithms viaapproximation of balanced partitions
54
Komplexitätstheorie und effiziente Algorithmen
Open problems• Min-sum-k-clustering• Other problems?
Streaming algorithms viabalanced partitions
55
Komplexitätstheorie und effiziente Algorithmen
(Some) Techniques in geometric streaming:• Merge & Reduce• Embeddings into tree metrics• Estimation of distribution of local neighborhoods• Balanced partitions• Approximating properties of balanced partitions
And lots of open problems to work on…
Summary
56
Komplexitätstheorie und effiziente Algorithmen
Thank you!
top related