LUDWIG-MAXIMILIANS-UNIVERSITY MUNICH
DATABASE SYSTEMS GROUP
DEPARTMENT INSTITUTE FOR INFORMATICS
Efficient Probabilistic Reverse Nearest Neighbor Query Processing on Uncertain Data
Thomas Bernecker, Tobias Emrich, Hans-Peter Kriegel,
Matthias Renz, Stefan Zankl and Andreas Zuefle
Ludwig-Maximilians-Universität München (LMU), Munich, Germany
http://www.dbs.ifi.lmu.de
{bernecker, emrich, kriegel, renz, zuefle}@dbs.ifi.lmu.de
Outline
• Background
  – Uncertain Data Model
  – Reverse k-nearest neighbour queries
  – Reverse k-nearest neighbour queries on uncertain objects
• Framework for Probabilistic RkNN Processing
  – Approximation
  – Spatial Filter
  – Probabilistic Filter
  – Verification
• Evaluation + Summary
Uncertain Data Model
• Objects are described by a multi-dimensional probability distribution
• Object independence assumption
• Queries are answered according to possible-worlds semantics
• Object PDFs can be spatially bounded
• Continuous or discrete representation
[Figure: user ratings for „Life of Brian“ as an uncertain object X, shown as its PDF over the uncertain attributes a and b (axes: Action, Humor)]
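The discrete representation and the possible-worlds semantics can be illustrated with a small sketch. It assumes, for illustration only, that a discrete uncertain object is a list of (instance, probability) pairs; `possible_worlds` and the sample database `db` are hypothetical names, not taken from the paper. Under the object independence assumption, a world picks one instance per object, and its probability is the product of the chosen instance probabilities.

```python
from itertools import product
from math import prod

# Hypothetical representation: an instance is a point, an uncertain object is a
# list of (instance, probability) pairs whose probabilities sum to 1.
Instance = tuple[float, ...]
UncertainObject = list[tuple[Instance, float]]


def possible_worlds(objects: dict[str, UncertainObject]):
    """Yield (world, probability) pairs; a world picks one instance per object.

    Objects are assumed to be independent, so the probability of a world is
    the product of the probabilities of the chosen instances.
    """
    names = list(objects)
    for choice in product(*(objects[n] for n in names)):
        world = {n: inst for n, (inst, _) in zip(names, choice)}
        yield world, prod(p for _, p in choice)


# Two objects with 2 and 3 instances give 2 * 3 = 6 possible worlds.
db = {
    "A": [((1.0, 2.0), 0.5), ((2.0, 1.0), 0.5)],
    "B": [((4.0, 4.0), 0.2), ((5.0, 3.0), 0.3), ((3.0, 5.0), 0.5)],
}
worlds = list(possible_worlds(db))
total = sum(p for _, p in worlds)  # sums to 1 (up to rounding)
```

The number of worlds grows as the product of the instance counts, which is why exact evaluation over all worlds is expensive.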
Reverse k-nearest neighbour query: $RkNN(q) = \{o \in DB \mid q \in kNN(o)\}$
Example (figure with objects o1 to o7 and query q): R1NN(q) = {o7}, R2NN(q) = {o7, o5, o4}
What is it good for?
• Market segmentation
• Outlier detection
• Incremental algorithms
• …
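For certain (non-uncertain) points, the definition can be evaluated by brute force. The sketch below is a hypothetical illustration of the definition only, not the index-based processing discussed later: the query q is stored in the same point dictionary so that it can appear in kNN(o), and ties are resolved by insertion order because Python's sort is stable.

```python
from math import dist


def knn(points, o, k):
    """Names of the k nearest neighbours of o among all other points."""
    others = sorted((n for n in points if n != o),
                    key=lambda n: dist(points[n], points[o]))
    return others[:k]


def rknn(points, q, k):
    """Brute-force RkNN(q) = {o | q in kNN(o)}; q is stored in `points` too."""
    return {o for o in points if o != q and q in knn(points, o, k)}


# Hypothetical toy data on a line (certain objects, no uncertainty).
points = {"q": (0.0, 0.0), "a": (1.0, 0.0), "b": (3.0, 0.0), "c": (10.0, 0.0)}
r1 = rknn(points, "q", 1)
r2 = rknn(points, "q", 2)
```

On this toy data, a is the only object whose nearest neighbour is q, while b additionally has q among its two nearest neighbours, so `r1` is {"a"} and `r2` is {"a", "b"}.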
[Figure: uncertain objects O1, O2, O‘ and the query Q]
„Is O‘ R1NN of Q?“
Note: the query object may be uncertain as well!
=> In some possible worlds, O‘ is an R1NN of Q.
=> In other possible worlds, O‘ is not an R1NN of Q.
Definition of Probabilistic RkNN
$PRkNN(Q, \tau) = \{O \in DB \mid P(O \in RkNN(Q)) \geq \tau\} = \{O \in DB \mid P(Q \in kNN(O)) \geq \tau\}$
Example (figure with O1, O2, O‘ and Q): $P(Q \in 1NN(O')) = 21/24 = 0.875$, so, e.g., $O' \in PR1NN(Q, 0.5)$.
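The definition can be evaluated exactly by enumerating all possible worlds, which is the exponential option among the verification strategies. A minimal sketch, assuming discrete objects given as lists of (point, probability) pairs, with the possibly uncertain query stored under its own name; `prob_rknn` and `prknn` are hypothetical helpers. In each world, Q is among the k nearest neighbours of O‘ if fewer than k other objects are strictly closer to O‘ than Q.

```python
from itertools import product
from math import dist, prod


def prob_rknn(objects, q_name, o_name, k):
    """P(O' in RkNN(Q)) = P(Q in kNN(O')), by enumerating all possible worlds.

    objects maps a name to a list of (point, probability) pairs and also
    contains the (possibly uncertain) query. Cost is exponential in |objects|.
    """
    names = list(objects)
    total = 0.0
    for choice in product(*(objects[n] for n in names)):
        world = {n: pt for n, (pt, _) in zip(names, choice)}
        d_q = dist(world[q_name], world[o_name])
        # Objects strictly closer to O' than Q in this world.
        closer = sum(1 for n in names
                     if n not in (q_name, o_name)
                     and dist(world[n], world[o_name]) < d_q)
        if closer < k:  # Q is among the k nearest neighbours of O'
            total += prod(p for _, p in choice)
    return total


def prknn(objects, q_name, k, tau):
    """PRkNN(Q, tau): objects whose qualification probability is at least tau."""
    return {o for o in objects
            if o != q_name and prob_rknn(objects, q_name, o, k) >= tau}


# Hypothetical toy data: certain query Q, uncertain X, certain O1.
objs = {
    "Q": [((0.0, 0.0), 1.0)],
    "X": [((1.0, 0.0), 0.5), ((5.0, 0.0), 0.5)],
    "O1": [((6.0, 0.0), 1.0)],
}
p_x = prob_rknn(objs, "Q", "X", 1)
```

X qualifies only in the world where it lies at (1, 0), so `p_x` is 0.5 and X belongs to PR1NN(Q, 0.5).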
Framework for PRkNN query processing
Approximation (Indexing)
• Simplification of spatial-probabilistic keys
Spatial Filter
• Filter objects according to simple spatial keys
Probabilistic Filter
• Derive lower/upper bounds of the qualification probability (by means of simple spatial-probabilistic keys)
• Filter objects according to lower/upper probability bounds
Verification
• Computation of the exact probability (very expensive)
• Monte-Carlo sampling (many samples required)
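The filter steps form a filter-refinement pipeline. The skeleton below is a hypothetical sketch of that control flow only: the callables `spatially_pruned`, `prob_bounds` and `exact_prob` stand in for the spatial filter, the probabilistic filter and the verification step, and are assumptions rather than the paper's interfaces.

```python
from typing import Callable, Hashable, Iterable


def prknn_pipeline(candidates: Iterable[Hashable],
                   spatially_pruned: Callable[[Hashable], bool],
                   prob_bounds: Callable[[Hashable], tuple[float, float]],
                   exact_prob: Callable[[Hashable], float],
                   tau: float) -> set:
    """Hypothetical filter-refinement driver for a PRkNN query."""
    result = set()
    for c in candidates:
        if spatially_pruned(c):          # spatial filter (cheap MBR test)
            continue
        lower, upper = prob_bounds(c)    # probabilistic filter
        if upper < tau:                  # cannot reach tau: drop
            continue
        if lower >= tau:                 # certainly reaches tau: accept
            result.add(c)
        elif exact_prob(c) >= tau:       # verification (expensive)
            result.add(c)
    return result


# Toy callables standing in for the real filters.
verified = []


def exact(c):
    verified.append(c)
    return {"c": 0.55}[c]


bounds = {"a": (0.6, 0.9), "b": (0.1, 0.3), "c": (0.3, 0.7)}
result = prknn_pipeline(["a", "b", "c", "d"],
                        spatially_pruned=lambda c: c == "d",
                        prob_bounds=bounds.__getitem__,
                        exact_prob=exact,
                        tau=0.5)
```

Only the undecided candidate c reaches the expensive verification step; a and b are decided by their bounds alone, and d by the spatial filter.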
R*-Tree for indexing objects (global index)
AR*-Tree for indexing instances (local index)
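[Figure: AR*-tree over the instances of one uncertain object; entries are annotated with probabilities: root 1.0; inner entries 0.25, 0.45, 0.3; leaf entries 0.15, 0.1, 0.2, 0.15, 0.1, 0.15, 0.15]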
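In an aggregated R*-tree, each entry stores an aggregate over its subtree, here assumed to be the total probability of the instances below it, so a filter can use a whole subtree's probability mass without visiting its instances. The sketch below models only this aggregation (MBRs are omitted); `ARNode` and `aggregate` are hypothetical names, and the grouping of the seven leaf values into three inner entries is an assumption read off the sums in the figure (0.15 + 0.1 = 0.25, 0.2 + 0.15 + 0.1 = 0.45, 0.15 + 0.15 = 0.3).

```python
from dataclasses import dataclass, field


@dataclass
class ARNode:
    """Entry of a hypothetical aggregated R*-tree (MBRs omitted).

    Leaf entries hold one instance probability; inner entries hold the
    sum of the probabilities of their children.
    """
    prob: float = 0.0
    children: list["ARNode"] = field(default_factory=list)


def aggregate(node: ARNode) -> float:
    """Recompute the aggregated probabilities bottom-up."""
    if node.children:
        node.prob = sum(aggregate(child) for child in node.children)
    return node.prob


# Leaf values from the figure; their grouping into three inner entries is assumed.
leaves = [ARNode(p) for p in (0.15, 0.1, 0.2, 0.15, 0.1, 0.15, 0.15)]
root = ARNode(children=[ARNode(children=leaves[0:2]),
                        ARNode(children=leaves[2:5]),
                        ARNode(children=leaves[5:7])])
aggregate(root)
```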
Pruning based on rectangular approximations only [1].
[1] Tobias Emrich, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Andreas Züfle: Boosting Spatial Pruning: On Optimal Pruning of MBRs. SIGMOD Conference 2010: 39-50
[Figure: MBRs of O and Q and the regions they induce]
• For any O‘ in one region, O is closer to O‘ than Q is.
• For any O‘ in a second region, O is not closer to O‘ than Q is.
• For any O‘ intersecting the remaining region, Q may possibly be closer to O‘ than O is.
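Task
Find k objects $O \in DB \setminus \{O'\}$ that are closer to O‘ than Q is; if k such objects exist, Q cannot be among the k nearest neighbours of O‘, and O‘ can be pruned.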
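[1] derives an optimal criterion for pruning with MBRs. The sketch below instead uses the simpler, conservative test MaxDist(O, O‘) < MinDist(Q, O‘): if it holds, O is closer to O‘ than Q in every possible world. It is correct but prunes less than the criterion in [1]. MBRs are assumed to be (lower corner, upper corner) tuples, and all function names are hypothetical.

```python
from math import sqrt

# An MBR is (lower corner, upper corner); this representation is an assumption.
Rect = tuple[tuple[float, ...], tuple[float, ...]]


def min_dist(a: Rect, b: Rect) -> float:
    """Smallest distance between any point of a and any point of b."""
    return sqrt(sum(max(0.0, al - bh, bl - ah) ** 2
                    for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1])))


def max_dist(a: Rect, b: Rect) -> float:
    """Largest distance between any point of a and any point of b."""
    return sqrt(sum(max(ah - bl, bh - al) ** 2
                    for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1])))


def dominates(o: Rect, q: Rect, o_prime: Rect) -> bool:
    """Conservative test: O is closer to O' than Q in every possible world."""
    return max_dist(o, o_prime) < min_dist(q, o_prime)


def spatially_pruned(o_prime: Rect, q: Rect, others: list[Rect], k: int) -> bool:
    """O' cannot be an RkNN of Q if at least k objects certainly dominate Q."""
    return sum(dominates(o, q, o_prime) for o in others) >= k


o_prime = ((0.0, 0.0), (1.0, 1.0))
o = ((1.0, 1.0), (2.0, 2.0))
q = ((10.0, 10.0), (11.0, 11.0))
```

Here O lies next to O‘ while Q is far away, so O dominates Q with respect to O‘, and O‘ is pruned for k = 1 but not for k = 2.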
What is the probability that O is closer to O‘ than Q?
„O is closer to O‘ than Q with at least x% probability“
„O is closer to O‘ than Q with at most x% probability“
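Bounds of this kind can be obtained from a partitioning of O, for example the entries of its AR*-tree. A minimal sketch, assuming for simplicity that O‘ and Q are certain points (the uncertain case is what makes the derivation non-trivial in the paper): partitions whose whole MBR is closer to O‘ than Q contribute to the lower bound, and partitions that can never be closer are excluded from the upper bound. `closer_prob_bounds` and the toy partitions are hypothetical.

```python
from math import dist, sqrt

Rect = tuple[tuple[float, ...], tuple[float, ...]]  # (lower corner, upper corner)


def min_dist_pt(r: Rect, p) -> float:
    """Smallest distance from point p to the rectangle r."""
    return sqrt(sum(max(0.0, lo - x, x - hi) ** 2
                    for lo, hi, x in zip(r[0], r[1], p)))


def max_dist_pt(r: Rect, p) -> float:
    """Largest distance from point p to any point of the rectangle r."""
    return sqrt(sum(max(x - lo, hi - x) ** 2
                    for lo, hi, x in zip(r[0], r[1], p)))


def closer_prob_bounds(partitions, o_prime, q):
    """Bounds on P(dist(O, o') < dist(q, o')) for certain points o' and q.

    partitions: list of (MBR, probability mass) covering all instances of O.
    """
    d_q = dist(q, o_prime)
    # Partitions that are entirely closer to o' than q count for sure.
    lower = sum(m for r, m in partitions if max_dist_pt(r, o_prime) < d_q)
    # Partitions that can never be closer are excluded from the upper bound.
    never = sum(m for r, m in partitions if min_dist_pt(r, o_prime) >= d_q)
    return lower, 1.0 - never


parts = [(((0.0, 0.0), (1.0, 1.0)), 0.2),
         (((2.0, 0.0), (4.0, 1.0)), 0.3),
         (((5.0, 5.0), (6.0, 6.0)), 0.5)]
bounds = closer_prob_bounds(parts, o_prime=(0.0, 0.0), q=(3.0, 0.0))
```

The middle partition straddles the decision boundary, so its mass only widens the gap between the bounds; the toy data yields a lower bound of 0.2 and an upper bound of 0.5.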
Example statements:
• O1 is closer to O‘ than Q with a probability of at least 20% and at most 50%
• O2 is closer to O‘ than Q with a probability of at least 60% and at most 80%
• Correctly deriving these bounds is not trivial (see paper)
How many objects $O \in DB$ are closer to O‘ than Q?
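Consider the following uncertain generating function, with one factor per object:
• x-term: probability of the object to be closer to O‘ than Q (the lower bound)
• z-term: probability of the object to be farther from O‘ than Q (1 minus the upper bound)
• y-term: uncertainty (upper bound minus lower bound)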
=> (0.2x + 0.3y + 0.5z) * (0.6x + 0.2y + 0.2z)
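Expansion yields 0.12x² + 0.34xz + 0.1z² + 0.22xy + 0.16yz + 0.06y². A term c·x^i y^j z^l states that, with probability c, i objects are certainly closer to O‘ than Q, l objects are certainly not, and j objects are undecided.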
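The expansion can be checked mechanically. A minimal sketch, assuming each object is given by its (lower, upper) bound pair and the polynomial is stored as a dictionary from exponent triples (i, j, n) of x^i y^j z^n to coefficients; `ugf` is a hypothetical name. Each object contributes the factor lower·x + (upper - lower)·y + (1 - upper)·z.

```python
from collections import defaultdict


def ugf(bounds):
    """Expand the product over objects of (lower*x + (upper-lower)*y + (1-upper)*z).

    bounds: one (lower, upper) pair per object O, bounding the probability
    that O is closer to O' than Q. Returns {(i, j, n): coefficient of x^i y^j z^n}.
    """
    poly = {(0, 0, 0): 1.0}
    for lower, upper in bounds:
        factor = {(1, 0, 0): lower, (0, 1, 0): upper - lower, (0, 0, 1): 1.0 - upper}
        nxt = defaultdict(float)
        for (i, j, n), c in poly.items():
            for (di, dj, dn), f in factor.items():
                nxt[(i + di, j + dj, n + dn)] += c * f
        poly = dict(nxt)
    return poly


# O1: at least 20%, at most 50%; O2: at least 60%, at most 80%.
terms = ugf([(0.2, 0.5), (0.6, 0.8)])
```

Rounding the coefficients in `terms` to two decimals gives the six coefficients of the expansion above, and they sum to 1. The number of distinct terms grows only polynomially with the number of objects, since terms with equal exponents are merged.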
[Figure: probability (0 to 80 %) over the number of objects O ∈ DB that are closer to O‘ than Q (0, 1, 2), derived from 0.12x² + 0.34xz + 0.1z² + 0.22xy + 0.16yz + 0.06y²]
• Example PRkNN queries
  – PR1NN(Q, 50%): O‘ is not part of the result
  – PR2NN(Q, 40%): O‘ is part of the result
  – PR2NN(Q, 80%): O‘ has to be further investigated
[Figure, two panels: probability (0 to 100 %) over the exact number and over the maximum number of objects O ∈ DB that are closer to O‘ than Q (0, 1, 2)]
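The example decisions follow from bounds on the qualification probability. O‘ is an RkNN of Q if fewer than k objects are closer to O‘ than Q. Terms with i + j < k satisfy this even if every undecided object is closer, so their coefficients sum to a lower bound; terms with i < k may satisfy it, so their coefficients sum to an upper bound. A minimal sketch with hypothetical names, applied to the expansion 0.12x² + 0.34xz + 0.1z² + 0.22xy + 0.16yz + 0.06y²:

```python
def rknn_prob_bounds(terms, k):
    """Bounds on P(fewer than k objects are closer to O' than Q).

    terms maps (i, j, n) to the coefficient of x^i y^j z^n: i objects are
    certainly closer, j are undecided and n are certainly not closer.
    """
    # Fewer than k closer objects even if all undecided ones are closer.
    lower = sum(c for (i, j, _), c in terms.items() if i + j < k)
    # Fewer than k closer objects is still possible.
    upper = sum(c for (i, _, _), c in terms.items() if i < k)
    return lower, upper


def decide(terms, k, tau):
    """Classify O' for PRkNN(Q, tau) using only the bounds."""
    lower, upper = rknn_prob_bounds(terms, k)
    if upper < tau:
        return "pruned"
    if lower >= tau:
        return "result"
    return "verify"


# Coefficients of 0.12x² + 0.34xz + 0.1z² + 0.22xy + 0.16yz + 0.06y²
terms = {(2, 0, 0): 0.12, (1, 0, 1): 0.34, (0, 0, 2): 0.1,
         (1, 1, 0): 0.22, (0, 1, 1): 0.16, (0, 2, 0): 0.06}
```

For k = 1 the upper bound is 0.32 < 50%, so O‘ is pruned; for k = 2 the bounds are 0.6 and 0.88, so O‘ qualifies for τ = 40% and needs verification for τ = 80%.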
Options for Verification
• Consideration of all possible worlds (exponential)
• Adapting probabilistic nearest-neighbour ranking [2] at the instance level of objects (polynomial)
• Monte-Carlo sampling (linear in the number of samples)
[2] Jian Li, Barna Saha, Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
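The Monte-Carlo option samples possible worlds instead of enumerating them. A minimal sketch, assuming discrete objects given as lists of (point, probability) pairs and independent objects; `mc_prob_rknn` and the toy data are hypothetical, and the result is only an estimate whose standard error shrinks with the square root of the number of samples.

```python
import random
from math import dist


def mc_prob_rknn(objects, q_name, o_name, k, n_samples, seed=0):
    """Monte-Carlo estimate of P(O' in RkNN(Q)).

    objects maps a name to a list of (point, probability) pairs, including the
    query. Each sample draws one instance per object independently, so the
    cost is linear in n_samples.
    """
    rng = random.Random(seed)
    # Split every object into its instances and their weights once.
    split = {n: tuple(zip(*inst)) for n, inst in objects.items()}
    hits = 0
    for _ in range(n_samples):
        world = {n: rng.choices(pts, weights=ws)[0] for n, (pts, ws) in split.items()}
        d_q = dist(world[q_name], world[o_name])
        closer = sum(1 for n in world
                     if n not in (q_name, o_name)
                     and dist(world[n], world[o_name]) < d_q)
        if closer < k:  # Q is among the k nearest neighbours of O'
            hits += 1
    return hits / n_samples


objs = {
    "Q": [((0.0, 0.0), 1.0)],
    "X": [((1.0, 0.0), 0.5), ((5.0, 0.0), 0.5)],
    "O1": [((6.0, 0.0), 1.0)],
}
est_x = mc_prob_rknn(objs, "Q", "X", k=1, n_samples=20_000)
```

The exact value for X on this toy data is 0.5, so `est_x` should lie close to it; many samples are needed before the estimate is precise enough to decide thresholds near the true probability.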
Spatial Filter
Probabilistic Filter
Comparison to other algorithms
• Framework for PRkNN query processing
• Deriving probabilistic pruning bounds for single objects
• Accumulate these bounds using uncertain generating functions
• Cost model for choosing the optimal tree depth
• Comparison to existing algorithms for PRNN processing
Thanks!
Questions?
Dependency on k
[Figure: runtime (sec) vs. k (1 to 20), split into spatial pruning, probabilistic pruning and verification]