distributed top-k query processing · 7/18/13 6...
TRANSCRIPT
Distributed Data Management Summer Semester 2013
TU Kaiserslautern
Dr.-Ing. Sebastian Michel
[email protected]‐saarland.de
Distributed Data Management, SoSe 2013, S. Michel 1
(DISTRIBUTED) TOP-K QUERY PROCESSING
Lecture 11
Top-k Rankings
Rankings and Top-k Queries
• With humans in the loop, it is essential to bring large amounts of information into a ranked form.
• A ranking focuses on the essence of the data: the top-ranked pieces according to given criteria.
• E.g., query results in Web search.
• The goal, then, is efficient computation of the top-k results, not full computation followed by a sort.
Overview of Today's Lecture
• Top-k Queries Foundations/Model
• Threshold Algorithms
– With or without random accesses
– Approximate variant with guarantees
• Distributed Algorithms
– Three phase uniform threshold algorithm
– Optimizations
Computational Model: Index Lists
• We assume data is stored in so-called index lists, one per attribute.
• Each list stores the item identifier (id) and the score of each item with respect to the list's attribute.
Id Score
d12 0.83
d81 0.81
d43 0.68
d18 0.62
… …
• Inside each list, entries are sorted by score in descending order
Computational Model: Aggregation
• Given m such lists.
• The task is to efficiently compute the k items with the highest aggregated score (the top-k result).
• That is, an aggregation function is applied to an item's scores across the m lists.
– E.g., summation
Example Application 1
• Lists with visual features (color, shape) of objects: red, blue, round, rectangular, …
• Query: find the top-2 red and round objects
color=red
Id Score
E 0.8
B 0.6
D 0.3
A 0.25
C 0.19
shape=round
Id Score
D 0.8
B 0.75
A 0.6
C 0.25
E 0.05
Result:
Id Σ Score
B 1.35
D 1.10
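The result above can be reproduced with a naive baseline that reads both lists completely, sums the scores per item, and sorts. This is only a sketch of the full computation that top-k algorithms try to avoid; the function name `naive_top_k` and the dictionary layout are assumptions made for illustration.

```python
from collections import defaultdict

# Index lists from the example above: item id -> score, one list per attribute.
color_red = {"E": 0.8, "B": 0.6, "D": 0.3, "A": 0.25, "C": 0.19}
shape_round = {"D": 0.8, "B": 0.75, "A": 0.6, "C": 0.25, "E": 0.05}

def naive_top_k(index_lists, k):
    """Sum every item's scores over all lists, then sort (full scan, no early stop)."""
    totals = defaultdict(float)
    for index_list in index_lists:
        for item, score in index_list.items():
            totals[item] += score
    ranked = sorted(totals.items(), key=lambda entry: entry[1], reverse=True)
    return ranked[:k]

result = naive_top_k([color_red, shape_round], k=2)
for item, total in result:
    print(item, round(total, 2))
```

Every entry of every list is touched here, which is exactly the cost the threshold algorithms later in the lecture reduce.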
More Example Applications
• Web search engine
– One index list for each term
– Query = {term1, term2, …, termM}
– Aggregation query finds the best-matching documents
– Scores in each list are computed according to TF*IDF (term frequency * inverse document frequency)
• Access log mining
– One list per Web server. Score = bytes downloaded
– Find the IP addresses (clients) that caused the largest aggregated amount of bytes downloaded
Example Network Server Logs
IP Bytes in kB
192.168.1.7 31kB
192.168.1.3 23kB
192.168.1.4 12kB
IP Bytes in kB
192.168.1.8 81kB
192.168.1.3 33kB
192.168.1.1 12kB
IP Bytes in kB
192.168.1.4 53kB
192.168.1.3 21kB
192.168.1.1 9kB
IP Bytes in kB
192.168.1.1 29kB
192.168.1.4 28kB
192.168.1.5 12kB
Computational Model: Access Models
• Sequential access: read the content of an index list top-down
• Random access: look up the score of a specific item inside an index list
Id Score
D 0.8
B 0.75
A 0.6
C 0.25
E 0.05
Essence of Top-K Algorithms
• Compute top-k results without exhaustive access to the index lists.
• Family of threshold algorithms: compute an early termination point using a score threshold.
• Require monotonicity of the aggregation function: aggr is monotone if, for any two items a and b,
$$\big(\forall i:\ \mathrm{score}(i,a) \le \mathrm{score}(i,b)\big) \;\Rightarrow\; \mathrm{aggr}(a) \le \mathrm{aggr}(b)$$
where $\mathrm{score}(i,\cdot)$ denotes the score of an item in index list i.
Monotonicity Explained
• Meaning: if item a is below item b in all considered index lists (that is, its score is smaller in each list), a cannot have a final score higher than that of b.
• Assume we have already read part of each list and are in the following situation:
[Figure: three index lists, each marked "read until here"; one item is highlighted with a red dot.]
• All items already seen in all three lists have an aggregated score higher than that of the red-dot item.
History
• Family of threshold algorithms
• First proposed by
– Fagin, 1999
– Nepal/Ramakrishna, 1999
– Güntzer/Balke/Kießling, 2001
• Various versions exist
Fagin's Algorithm (FA)
1. Read sequentially from each list (round robin) until k distinct items have been observed, each seen in all lists.
Example of a top-1 query:
document score
doc3 17
doc4 12
doc2 11
doc5 4
doc6 2
document score
doc1 9
doc3 7
doc2 2
doc6 1
doc7 1
document score
doc1 19
doc4 15
doc3 12
doc5 5
doc2 2
Fagin's Algorithm
• Fully seen: doc3 with score 17+7+12 = 36
• Partially seen: doc4 (12+15=27), doc1 (9+19=28), doc2 (11+2=13)
• Can we stop here already?
Fagin's Algorithm
2. Look up the missing scores in the tails of the lists (an item that is not in a list gets score 0 there).
• Then doc4 (12+0+15=27), doc1 (0+9+19=28), doc2 (11+2+2=15)
• Done! Why?
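The two FA phases can be sketched in a few lines on the example lists. This is an illustrative sketch, not the reference formulation: it assumes summation as the aggregation function, scans the lists in full round-robin rounds, and treats a missing entry as score 0; the names `fagin_algorithm` and `LISTS` are chosen here.

```python
# Index lists from the FA example: (item, score), sorted by descending score.
LISTS = [
    [("doc3", 17), ("doc4", 12), ("doc2", 11), ("doc5", 4), ("doc6", 2)],
    [("doc1", 9), ("doc3", 7), ("doc2", 2), ("doc6", 1), ("doc7", 1)],
    [("doc1", 19), ("doc4", 15), ("doc3", 12), ("doc5", 5), ("doc2", 2)],
]

def fagin_algorithm(lists, k):
    """Return (top-k as [(item, score)], scan depth reached in phase 1)."""
    m = len(lists)
    seen_in = {}  # item -> set of list indexes where sorted access has seen it
    depth = 0
    max_depth = max(len(lst) for lst in lists)
    # Phase 1: round-robin sorted access until k items were seen in all m lists.
    while depth < max_depth:
        for i, lst in enumerate(lists):
            if depth < len(lst):
                seen_in.setdefault(lst[depth][0], set()).add(i)
        depth += 1
        if sum(1 for where in seen_in.values() if len(where) == m) >= k:
            break
    # Phase 2: random access for every seen item; an absent item contributes 0.
    lookup = [dict(lst) for lst in lists]
    totals = {item: sum(lk.get(item, 0) for lk in lookup) for item in seen_in}
    ranked = sorted(totals.items(), key=lambda entry: entry[1], reverse=True)
    return ranked[:k], depth

top, depth = fagin_algorithm(LISTS, k=1)
print(top, depth)
```

On the example, phase 1 stops at depth 3, once doc3 has appeared in all three lists, and phase 2 completes the scores of doc4, doc1, and doc2.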
Correctness
• Due to monotonicity: items not seen so far are below doc3 in all lists; hence, their aggregated score is also lower.
• See Ronald Fagin et al.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4): 614-656 (2003) for an overview of threshold algorithms (FA and the following ones).
• There is more to this: instance optimality.
Threshold Algorithm (TA)
• Read from the index lists in sequential order.
• Immediately look up the missing scores of each newly seen item.
• Stop once at least k objects have been seen whose aggregated score is higher than the aggregated score at the "scan line" (= τ).
document score
doc3 18
doc4 12
doc2 11
doc5 4
doc6 2
document score
doc1 9
doc3 7
doc2 2
doc6 1
doc7 1
document score
doc1 19
doc4 15
doc3 12
doc5 5
doc2 2
Step 1
• Start seq. scanning. See doc3, lookup its score in list 2 and 3. Get:
• doc3: 18+7+12 = 37
Step 2
• Continue with doc1, seen in list 2.
• doc1: 0+9+19 = 28
• doc3: 18+7+12 = 37
Step 3
• doc3: 18+7+12 = 37
• doc1: 0+9+19 = 28
• Scan line scores: 18+9+19 = 46
• We cannot stop now. Why?
Step 4
• doc3: 18+7+12 = 37
• doc1: 0+9+19 = 28; doc4: 12+0+15 = 27
• Scan line scores: 12+9+19 = 40
• We cannot stop now. Why?
Step 5
• doc3: 18+7+12 = 37; doc1: 0+9+19 = 28; doc4: 12+0+15 = 27
• Scan line scores: 12+7+19 = 38
• Still cannot stop!
Step 6
• doc3: 18+7+12 = 37; doc1: 0+9+19 = 28; doc4: 12+0+15 = 27
• Scan line scores: 12+7+15 = 34
• We can stop!
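The six steps can be replayed with a small TA sketch. It assumes summation as aggregation, one sorted access per step in round-robin order, and score 0 for items missing from a list. The stop test uses "at least τ", as in Fagin et al.; on this example it behaves the same as a strict comparison. The names `threshold_algorithm` and `LISTS` are chosen here for illustration.

```python
import heapq

# Index lists from the TA example: (item, score), sorted by descending score.
LISTS = [
    [("doc3", 18), ("doc4", 12), ("doc2", 11), ("doc5", 4), ("doc6", 2)],
    [("doc1", 9), ("doc3", 7), ("doc2", 2), ("doc6", 1), ("doc7", 1)],
    [("doc1", 19), ("doc4", 15), ("doc3", 12), ("doc5", 5), ("doc2", 2)],
]

def threshold_algorithm(lists, k):
    """Return (top-k as [(item, score)], number of sorted accesses)."""
    lookup = [dict(lst) for lst in lists]  # random access: item -> score
    pos = [0] * len(lists)                 # scan position per list
    high = [None] * len(lists)             # score at the scan line per list
    totals = {}                            # complete scores of all seen items
    accesses = 0

    def top_k():
        return heapq.nlargest(k, totals.items(), key=lambda entry: entry[1])

    while True:
        progressed = False
        for i, lst in enumerate(lists):    # round-robin sorted access
            if pos[i] >= len(lst):
                continue
            item, score = lst[pos[i]]
            pos[i] += 1
            high[i] = score
            accesses += 1
            progressed = True
            if item not in totals:         # immediate random accesses
                totals[item] = sum(lk.get(item, 0) for lk in lookup)
            if None in high:
                continue                   # tau is undefined until each list was read once
            tau = sum(high)                # aggregated scan-line score
            best = top_k()
            if len(best) == k and best[-1][1] >= tau:
                return best, accesses
        if not progressed:                 # every list is exhausted
            return top_k(), accesses

result, sorted_accesses = threshold_algorithm(LISTS, k=1)
print(result, sorted_accesses)
```

For k=1 the sketch stops after the sixth sorted access, when τ has dropped to 12+7+15 = 34 and doc3 holds 37.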
Restriction of Random Accesses
• Recall numbers:
– Disk seek: 10,000,000 ns
– Read 1 MB sequentially from disk: 30,000,000 ns
– Also: network round trip vs. transfer rate
• Variations of threshold algorithms consider the tradeoff between random and sequential accesses
• Or prohibit random accesses altogether
• => No Random Access (NRA) Algorithm
NRA
• Keep two scores for each item: the actually seen score (= worstscore) and an upper-bound score (= bestscore).
• bestscore = worstscore + the best possible scores in the lists where the item has not been seen yet (highi for list i).
• Stop if no item outside the current top-k has a bestscore better than the worstscore of the item currently at rank k (called mink).
Top-1 (k=1) Query
Bookkeeping (after reading doc3 from list 1)
Id worstscore bestscore
doc3 18 -
mink = 18
We call the worstscore of the document at rank k the mink threshold.
Bookkeeping (after reading doc1 from list 2)
Id worstscore bestscore
doc3 18 -
doc1 9 -
mink = 18
Bookkeeping (after reading doc1 from list 3)
Id worstscore bestscore
doc3 18 46
doc1 28 46
mink = 28
Bookkeeping (after reading doc4 from list 1)
Id worstscore bestscore
doc3 18 46
doc1 28 40
doc4 12 40
mink = 28
Bookkeeping (after reading doc3 from list 2)
Id worstscore bestscore
doc3 25 44
doc1 28 40
doc4 12 38
mink = 28
Bookkeeping (after reading doc4 from list 3)
Id worstscore bestscore
doc3 25 40
doc1 28 40
doc4 27 34
mink = 28
Bookkeeping (after reading doc2 from list 1)
Id worstscore bestscore
doc3 25 40
doc1 28 40
doc4 27 34
doc2 11 33
mink = 28
Bookkeeping (after reading doc2 from list 2)
Id worstscore bestscore
doc3 25 40
doc1 28 40
doc4 27 34
doc2 13 28
mink = 28
Bookkeeping (after reading doc3 from list 3)
Id worstscore bestscore
doc3 37 37
doc1 28 40
doc4 27 34
doc2 13 25
mink = 37
Bookkeeping (after reading doc5 from list 1)
Id worstscore bestscore
doc3 37 37
doc1 28 32
doc4 27 34
doc2 13 25
doc5 4 18
mink = 37
We can stop!
NRA Bookkeeping and Correctness
• Keep in memory all candidate items that still have a chance to get into the final top-k result.
• I.e., we can throw away (prune) all candidates whose best possible score is worse than the worstscore of the rank-k item.
• Observation: worstscore is increasing while bestscore is decreasing (the scores at the scan lines go down because the lists are sorted in decreasing score order).
NRA Pseudocode
top-k := ∅; candidates := ∅; mink := 0;
scan all lists Li (i = 1..m) in parallel:
    consider item d at position posi in Li;
    E(d) := E(d) ∪ {i};  // remember for each item where we saw it
    highi := si(qi,d);  // maintain scores at scan line
    worstscore(d) := aggr{sν(qν,d) | ν ∈ E(d)};
    bestscore(d) := aggr{aggr{sν(qν,d) | ν ∈ E(d)}, aggr{highν | ν ∉ E(d)}};
    if worstscore(d) > mink then  // put it into top-k set
        remove argmind'{worstscore(d') | d' ∈ top-k} from top-k;
        add d to top-k;
        mink := min{worstscore(d') | d' ∈ top-k};  // update mink
    else if bestscore(d) > mink then  // else keep it as candidate
        candidates := candidates ∪ {d};
    threshold := max{bestscore(d') | d' ∈ candidates};
    if threshold ≤ mink then exit;
Observation: pruning is often overly conservative (deep scans, high memory consumption).
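A runnable sketch of the NRA idea on the example lists follows. It is deliberately simple rather than efficient: all bounds are recomputed after every sorted access instead of being maintained incrementally. Summation is assumed as aggregation, and two details are assumptions added here: the scan-line value of a list that has not been read yet is treated as infinite, and the sum of all scan-line scores bounds items that have not been seen in any list.

```python
import math

# Index lists from the NRA example: (item, score), sorted by descending score.
LISTS = [
    [("doc3", 18), ("doc4", 12), ("doc2", 11), ("doc5", 4), ("doc6", 2)],
    [("doc1", 9), ("doc3", 7), ("doc2", 2), ("doc6", 1), ("doc7", 1)],
    [("doc1", 19), ("doc4", 15), ("doc3", 12), ("doc5", 5), ("doc2", 2)],
]

def nra(lists, k):
    """Return (top-k as [(item, worstscore)], number of sorted accesses)."""
    m = len(lists)
    pos = [0] * m
    high = [math.inf] * m  # scan-line score per list, unknown before the first read
    seen = {}              # item -> {list index: score seen there}
    accesses = 0
    while any(pos[i] < len(lists[i]) for i in range(m)):
        for i, lst in enumerate(lists):
            if pos[i] >= len(lst):
                high[i] = 0  # exhausted list: unseen items score 0 here
                continue
            item, score = lst[pos[i]]
            pos[i] += 1
            high[i] = score
            accesses += 1
            seen.setdefault(item, {})[i] = score
            worst = {d: sum(s.values()) for d, s in seen.items()}
            best = {d: worst[d] + sum(high[j] for j in range(m) if j not in seen[d])
                    for d in seen}
            ranked = sorted(worst, key=worst.get, reverse=True)
            top, rest = ranked[:k], ranked[k:]
            if len(top) < k:
                continue
            mink = worst[top[-1]]
            # Best possible score of any candidate or of a completely unseen item.
            threshold = max([best[d] for d in rest] + [sum(high)])
            if threshold <= mink:
                return [(d, worst[d]) for d in top], accesses
    # All lists were read completely, so the worstscores are the exact scores.
    ranked = sorted(seen, key=lambda d: sum(seen[d].values()), reverse=True)
    return [(d, sum(seen[d].values())) for d in ranked[:k]], accesses

result, sorted_accesses = nra(LISTS, k=1)
print(result, sorted_accesses)
```

For k=1 the sketch stops after ten sorted accesses, right after doc5 is read from list 1. In general NRA only guarantees the correct top-k set; the reported worstscores can be lower than the true scores.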
Evolution of a Candidate's Score
• Approximate top-k: "What is the probability that d qualifies for the top-k?"
[Figure: score over scan depth, showing bestscore(d), worstscore(d), and mink; d is dropped from the candidate queue.]
Probabilistic Pruning
• NRA is based on the invariant
$$\sum_{i \in E(d)} s_i(d) \;\le\; s(d) \;\le\; \sum_{i \in E(d)} s_i(d) + \sum_{i \notin E(d)} high_i$$
• This is relaxed into a probabilistic threshold test
$$p(d) := P\Big[\sum_{i \in E(d)} s_i(d) + \sum_{i \notin E(d)} s_i(d) > mink\Big] \le \varepsilon$$
• Or equivalently, with $\delta(d) := mink - \sum_{i \in E(d)} s_i(d)$,
$$p(d) = P\Big[\sum_{i \notin E(d)} s_i(d) > \delta(d)\Big] \le \varepsilon$$
[Figure: worstscore(d), bestscore(d), and mink on a score axis, with δ(d) marked.]
Theobald et al.: Top-k Query Evaluation with Probabilistic Guarantees. VLDB 2004: 648-659
Expected Result Quality
• Missing relevant items: the probability pmiss of missing a true top-k object equals the probability of erroneously dropping a candidate from consideration.
• For each candidate, pmiss ≤ ε.
• Distribution of precision and recall:
$$P[recall = r/k] = P[precision = r/k] = \binom{k}{r} (1 - p_{miss})^{r}\, p_{miss}^{\,k-r}$$
• Expected precision and recall:
$$E[precision] = E[recall] = \sum_{r=0}^{k} P[precision = r/k] \cdot \frac{r}{k} = (1 - \varepsilon)$$
recall = |returned relevant docs| / |all relevant docs|
precision = |returned relevant docs| / |returned docs|
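The expectation can be checked numerically. The sketch below evaluates the binomial distribution of r and sums r/k weighted by its probability; the function names are chosen here. The mean of the binomial gives 1 − pmiss, which equals 1 − ε when pmiss = ε.

```python
from math import comb

def precision_distribution(k, p_miss):
    """P[precision = r/k] for r = 0..k, with each true top-k item missed independently."""
    return [comb(k, r) * (1 - p_miss) ** r * p_miss ** (k - r) for r in range(k + 1)]

def expected_precision(k, p_miss):
    """Sum over r of P[precision = r/k] * r/k."""
    return sum(prob * r / k for r, prob in enumerate(precision_distribution(k, p_miss)))

# Example: k = 10 and p_miss = 0.1 give an expected precision of about 0.9.
print(expected_precision(10, 0.1))
```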
Score Estimation
• Pre-compute a model of the score distribution in each index list.
• Use it at runtime to obtain an expectation of the score.
• Either fit a distribution function or build histograms (more robust).
Image source: http://en.wikipedia.org/wiki/Cumulative_distribution_function
Recap (?) Histograms
• A histogram partitions a domain into cells (also called buckets). For each bucket, the number of elements that fall into it is kept.
• Two basic histogram kinds:
– Equi-width: buckets have the same width (= size on the "x-axis")
  • E.g., 100 buckets on the [0,1] interval
– Equi-depth: buckets have the same height ("y-axis")
  • To achieve this, the width is adapted
Score Estimations
• Build an equi-width histogram for each index list's score distribution.
[Figure: histogram of score distribution S1 on [0, 1] with the scan position high1 marked.]
• Then we can look up the probability that a score is larger or smaller than a specific value.
• Also take into account that parts of the list have already been read (until score = highi).
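As a rough sketch of these lookups, the code below builds an equi-width histogram over scores in [0, 1] and estimates the probability that a score exceeds x, conditioned on the score being at most highi (the part of the list not yet read). Assumptions made here: scores lie in [0, 1], entries are spread uniformly inside each bucket, and all function names are illustrative.

```python
def build_equi_width_histogram(scores, buckets=10):
    """Count how many scores from [0, 1] fall into each of `buckets` equal-width cells."""
    counts = [0] * buckets
    for s in scores:
        counts[min(int(s * buckets), buckets - 1)] += 1
    return counts

def mass_below(counts, v):
    """Estimated number of entries with score <= v (uniform spread inside a cell)."""
    buckets = len(counts)
    scaled = min(max(v, 0.0), 1.0) * buckets
    full = min(int(scaled), buckets)        # number of cells lying completely below v
    mass = float(sum(counts[:full]))
    if full < buckets:
        mass += counts[full] * (scaled - full)  # partial cell, linear interpolation
    return mass

def prob_greater(counts, x, high=1.0):
    """Estimate P[score > x | score <= high] from the histogram."""
    below_high = mass_below(counts, high)
    if below_high == 0 or x >= high:
        return 0.0
    return (below_high - mass_below(counts, x)) / below_high

uniform = [10] * 10                          # 100 entries, evenly spread over [0, 1]
print(prob_greater(uniform, 0.5))            # unconditioned tail probability
print(prob_greater(uniform, 0.5, high=0.8))  # after scanning down to score 0.8
```

Conditioning on highi lowers the estimate, because the high-score part of the list has already been consumed by the scan.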
Convolution
Given two histograms, compute the histogram that represents the score distribution after summation:
$$(f * h)(l) = \sum_{l = i + j} f(i) \cdot h(j)$$
[Figure: histograms of S1 on [0, high1] and S2 on [0, high2], and Convolution(S1, S2) on [0, 2] with δ(d) marked; labeled "P[d gets in the final top-k]".]
Illustrations on this slide are based on material from Martin Theobald.
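The convolution formula translates directly into a double loop over the cells of both histograms. In this sketch each histogram is a list of probabilities per cell index, so cell l of the result collects all pairs with i + j = l. The helper `tail_probability` and the use of a cell index for δ(d) are assumptions made for illustration.

```python
def convolve(f, h):
    """Discrete convolution: out[l] = sum of f[i] * h[j] over all i + j = l."""
    out = [0.0] * (len(f) + len(h) - 1)
    for i, fi in enumerate(f):
        for j, hj in enumerate(h):
            out[i + j] += fi * hj
    return out

def tail_probability(dist, delta_cell):
    """Probability mass in the cells strictly above the cell that contains delta."""
    return sum(dist[delta_cell + 1:])

# Two lists whose remaining scores fall into cell 0 or cell 1 with equal probability.
s1 = [0.5, 0.5]
s2 = [0.5, 0.5]
summed = convolve(s1, s2)
print(summed)                       # distribution of the summed cell index
print(tail_probability(summed, 1))  # mass above cell 1
```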
Sample Variations
• Approximate version of TA: allows stopping earlier, once at least k items have been seen with a score larger than or equal to τ/θ (τ is the scan line score; θ > 1).*
• Combined algorithm (CA): cost model to trade off random and sequential accesses.*
• Data sources can be random access only or sequential access only (or both), particularly on the Web.
N. Bruno, L. Gravano, and A. Marian. Proc. of the 18th IEEE International Conference on Data Engineering (ICDE'02), 2002.
*) Fagin et al.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4): 614-656 (2003)
Top-k Queries in Distributed Environments
• Each index list is (in general) stored at a different node in a (possibly wide-area) network
• Key observations:
– Network traffic is crucial
– Number of round trips is crucial
• Straightforward application of TA/NRA?
– Expensive: huge number of round trips
– Even with batching: unpredictable performance
[Figure: nodes holding index lists, connected to a query initiator.]
Three Phase Uniform Threshold Algorithm (TPUT)
Exactly 3 phases:
1. Fetch the k best entries (d, sj) from each of N1 ... Nm and aggregate (∑j=1..m sj(d)) at the query initiator.
2. Ask each of N1 ... Nm for all entries with sj > mink / m and aggregate the results at the query initiator. mink is the score of the item currently at rank k.
3. Fetch the missing scores for all candidates by random lookups at N1 ... Nm.
Distributed top-k algorithm with a fixed number of phases!
Pei Cao, Zhe Wang: Efficient top-K query calculation in distributed networks. PODC 2004: 206-215
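The three phases can be simulated in a single process, with each index list standing in for one node. This is a simplified sketch under assumptions stated here: summation as aggregation, score 0 for items a node does not hold, no candidate pruning between phases 2 and 3, and phase 2 fetching entries with score at least mink/m, so that any item never fetched has a total score strictly below mink. The names `tput` and `NODES` are illustrative.

```python
# One index list per node: (item, score), sorted by descending score.
NODES = [
    [("doc3", 18), ("doc4", 12), ("doc2", 11), ("doc5", 4), ("doc6", 2)],
    [("doc1", 9), ("doc3", 7), ("doc2", 2), ("doc6", 1), ("doc7", 1)],
    [("doc1", 19), ("doc4", 15), ("doc3", 12), ("doc5", 5), ("doc2", 2)],
]

def tput(nodes, k):
    """Return the top-k as [(item, score)] using three fixed phases."""
    m = len(nodes)
    partial = {}  # item -> {node index: score}, kept at the query initiator

    # Phase 1: fetch the k best entries from every node, aggregate partial sums.
    for i, lst in enumerate(nodes):
        for item, score in lst[:k]:
            partial.setdefault(item, {})[i] = score
    sums = sorted((sum(s.values()) for s in partial.values()), reverse=True)
    mink = sums[k - 1] if len(sums) >= k else 0
    threshold = mink / m

    # Phase 2: fetch all entries with score >= mink / m from every node.
    for i, lst in enumerate(nodes):
        for item, score in lst:
            if score < threshold:
                break  # lists are sorted, nothing below can qualify
            partial.setdefault(item, {})[i] = score

    # Phase 3: random lookups complete the scores of all candidates.
    lookup = [dict(lst) for lst in nodes]
    totals = {item: sum(lk.get(item, 0) for lk in lookup) for item in partial}
    return sorted(totals.items(), key=lambda entry: entry[1], reverse=True)[:k]

top2 = tput(NODES, k=2)
print(top2)
```

For k=2 on the example, phase 1 yields mink = 27, so the uniform threshold is 27/3 = 9 and phase 2 adds doc2 to the candidates doc3, doc4, and doc1.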
[Figure: the coordinator keeps the current top-k and a candidate set. Nodes Ni ... Nj each hold an index list sorted by score; they send their top k, then the candidates above mink / m, and finally the missing scores on request.]
Correctness of TPUT
• Theorem: TPUT is an exact algorithm, i.e., it identifies the true top-k items.
• Proof (sketch): TPUT cannot miss a true top-k item. Assume it misses one, i.e., the item is below mink/m in all m lists.
→ Its overall score is < m · (mink/m) = mink → it is not a true top-k item!
[Figure: state after phase 2 for list 1, list 2, and list 3, with mink marked and the remaining items having score < mink.]
Optimizations
• Performance depends on the choice of the threshold mink/m
• (But so does the proof)
• Tradeoffs in result quality are (partially) acceptable
Increase of mink/m Threshold
• Get extra information about the items' scores
• But with little overhead, i.e., in compact form
• Create Bloom filter synopses for each index list?
• Very coarse: a Bloom filter can say whether an item is in the list or not (modulo false positives), but not how good its score is
[Figure: a Bloom filter bit vector (1 1 1 0 0 0) next to a score axis from 0 to 1, with a question mark.]
Increase of mink/m Threshold (Cont'd)
• Create one Bloom filter for each histogram cell.
• In phase 1: get the top k from each list plus "some" Bloom filters for the top (high-score) cells.
• Estimate the score of an item as the actually seen score plus the lower-bound score of the histogram cell whose filter contains the item (conservative estimation).
[Figure: histogram over the score range 0 to 1 with one Bloom filter bit vector per cell.]
Increase of mink/m Threshold
[Figure: nodes Ni ... Nj each hold an index list sorted by score and a histogram with c cells, each cell carrying a Bloom filter of b bits. The coordinator keeps the current top-k and the candidate set; the nodes send their top k and the candidates above mink / m.]
Further Optimizations
• Non-uniform thresholds
• Node sampling
• Hierarchical aggregation
Literature
• Ronald Fagin: Combining Fuzzy Information from Multiple Systems. J. Comput. Syst. Sci. 58(1): 83-99 (1999)
• Ronald Fagin, Amnon Lotem, Moni Naor: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4): 614-656 (2003)
• Martin Theobald, Gerhard Weikum, Ralf Schenkel: Top-k Query Evaluation with Probabilistic Guarantees. VLDB 2004: 648-659
• Pei Cao, Zhe Wang: Efficient top-K query calculation in distributed networks. PODC 2004: 206-215
• Sebastian Michel, Peter Triantafillou, Gerhard Weikum: KLEE: A Framework for Distributed Top-k Query Algorithms. VLDB 2005: 637-648