LUDWIG-MAXIMILIANS-UNIVERSITY MUNICH
DATABASE SYSTEMS GROUP
DEPARTMENT INSTITUTE FOR INFORMATICS

Efficient Probabilistic Reverse Nearest Neighbor Query Processing on Uncertain Data

Thomas Bernecker, Tobias Emrich, Hans-Peter Kriegel, Matthias Renz, Stefan Zankl and Andreas Zuefle

Ludwig-Maximilians-Universität München (LMU)
Munich, Germany
http://www.dbs.ifi.lmu.de
{bernecker, emrich, kriegel, renz, zuefle}@dbs.ifi.lmu.de

Outline

• Background
– Uncertain Data Model
– Reverse k-nearest neighbour queries
– Reverse k-nearest neighbour queries on uncertain objects

• Framework for Probabilistic RkNN Processing
– Approximation
– Spatial Filter
– Probabilistic Filter
– Verification

• Evaluation + Summary

Uncertain Data Model

• Objects are described by a multi-dimensional probability distribution
• Object independence assumption
• Queries are answered according to possible-worlds semantics
• Object PDFs can be spatially bounded
• Continuous or discrete representation

[Figure: PDF of an uncertain object X over two uncertain attributes a and b. Example: user ratings for "Life of Brian" on the attributes Action and Humor]

Reverse k-nearest neighbour queries

RkNN(q) = {o ∈ DB | q ∈ kNN(o)}

[Figure: objects o1 to o7 and query point q]

R1NN(q) = {o7}
R2NN(q) = {o7, o5, o4}

What is it good for?
• Market segmentation
• Outlier detection
• Incremental algorithms
• …
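The RkNN definition can be illustrated with a naive sketch for certain (non-uncertain) points. This is not the authors' algorithm: it simply evaluates the set definition directly. The function names, the sentinel key and the coordinates are assumptions made for illustration; the coordinates are not the positions shown in the slide's figure.

```python
import math

QUERY_KEY = "__q__"  # hypothetical sentinel name for the query point


def knn(points, name, k):
    """Names of the k nearest neighbours of points[name], excluding itself."""
    origin = points[name]
    others = [(math.dist(origin, p), other)
              for other, p in points.items() if other != name]
    others.sort(key=lambda pair: pair[0])
    return {other for _, other in others[:k]}


def rknn(db, q, k):
    """Naive RkNN(q): every o in db whose kNN set (over db plus q) contains q."""
    points = dict(db)
    points[QUERY_KEY] = q
    return {o for o in db if QUERY_KEY in knn(points, o, k)}


# Hypothetical 2-D coordinates, all on the x-axis for easy checking by hand.
db = {"a": (1.0, 0.0), "b": (4.0, 0.0), "c": (9.0, 0.0), "d": (-3.0, 0.0)}
q = (0.0, 0.0)
r1 = rknn(db, q, 1)  # a and d have q as their nearest neighbour
r2 = rknn(db, q, 2)  # b additionally has q among its two nearest neighbours
```

The cost is quadratic in the database size, which is why index-based filtering matters for larger data sets.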

Reverse k-nearest neighbour queries on uncertain objects

[Figure: uncertain objects O1, O2 and O', and query object Q]

"Is O' R1NN of Q?"
=> In some worlds it is
=> In other worlds it is not

Note: The query object may be uncertain as well!

Definition of Probabilistic RkNN

PRkNN(Q, τ) = {O ∈ DB | P(O ∈ RkNN(Q)) ≥ τ}
            = {O ∈ DB | P(Q ∈ kNN(O)) ≥ τ}

[Figure: uncertain objects O1, O2 and O', and query object Q]

P(Q ∈ 1NN(O')) = 21/24
e.g. O' ∈ PR1NN(Q, 0.5)
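For discrete uncertain objects, the probability in this definition can be computed by enumerating all possible worlds. The sketch below assumes that every object is a list of (point, probability) instances whose probabilities sum to 1 and that objects are independent. The function name and the example instances are hypothetical; they do not reproduce the 21/24 example from the figure.

```python
import itertools
import math


def prknn_probability(target, query, others, k):
    """P(query in kNN(target)), summed over every possible world."""
    total = 0.0
    for world in itertools.product(target, query, *others):
        (t, p_t), (q, p_q), rest = world[0], world[1], world[2:]
        # Independence: the weight of a world is the product of its instance probabilities.
        weight = p_t * p_q * math.prod(p for _, p in rest)
        d_q = math.dist(t, q)
        # Count the other objects that are closer to the target than the query is.
        closer = sum(1 for o, _ in rest if math.dist(t, o) < d_q)
        if closer < k:
            total += weight
    return total


# Hypothetical 1-D instances.
o_prime = [((0.0,), 0.5), ((2.0,), 0.5)]
q_obj = [((1.0,), 1.0)]
o1 = [((-0.5,), 0.5), ((5.0,), 0.5)]

p = prknn_probability(o_prime, q_obj, [o1], k=1)
in_result = p >= 0.5  # membership test for PR1NN(Q, 0.5)
```

The number of worlds grows exponentially with the number of objects, so this is only practical for tiny examples.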

Framework for PRkNN query processing

Approximation (Indexing)
• Simplification of spatial-probabilistic keys

Spatial Filter
• Filter objects according to simple spatial keys

Probabilistic Filter
• Derive lower/upper bounds of qualification probability (by means of simple spatial-probabilistic keys)
• Filter objects according to lower/upper probability bounds

Verification
• Computation of the exact probability (very expensive)
• Monte-Carlo Sampling (many samples required)

Approximation

R*-Tree for indexing objects (global index)

[Figure: R*-tree over the uncertain objects, with query object Q]

Approximation

AR*-Tree for indexing instances (local index)

[Figure: AR*-tree over the instances of one uncertain object, annotated with the probabilities 0.15, 0.1, 0.2, 0.15, 0.1, 0.15, 0.15, the aggregated values 0.25, 0.45, 0.3, and 1.0 at the root]
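The numbers in the figure suggest that each node of the local index stores the total probability of the instances below it. The sketch below illustrates that idea with a plain tree, not an actual R*-tree. The `Node` class and the grouping of instances into leaves are assumptions, chosen so that the sums match the values on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: list = field(default_factory=list)  # child nodes (empty for leaves)
    probs: list = field(default_factory=list)     # instance probabilities (leaves only)

    def aggregate(self):
        """Total probability of all instances stored in this subtree."""
        return sum(self.probs) + sum(child.aggregate() for child in self.children)


# Assumed grouping of the seven instance probabilities into three leaves.
leaves = [Node(probs=[0.15, 0.1]),
          Node(probs=[0.2, 0.15, 0.1]),
          Node(probs=[0.15, 0.15])]
root = Node(children=leaves)

leaf_totals = [round(leaf.aggregate(), 2) for leaf in leaves]
root_total = round(root.aggregate(), 2)
```

With such aggregates, a whole subtree can contribute its probability mass to a bound without visiting the individual instances.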

Spatial Filter

Pruning based on rectangular approximations only [1].

Task: Find k objects O ∈ DB \ O' which are closer to O' than Q

[Figure: objects O and Q and three regions]
• For any O' in this region, O is closer than Q.
• For any O' in this region, O is not closer than Q.
• For any O' intersecting this region, Q may possibly be closer than O.

[1] Tobias Emrich, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Andreas Züfle: Boosting Spatial Pruning: On Optimal Pruning of MBRs. SIGMOD Conference 2010: 39-50
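The three cases of the spatial filter can be sketched with the classic MinDist/MaxDist test on minimum bounding rectangles. This is a swapped-in, simpler criterion: it is correct but generally less tight than the pruning criterion of [1]. The rectangle representation as (lower corner, upper corner), the function names and the example rectangles are assumptions for illustration.

```python
import math


def min_dist(a, b):
    """Smallest Euclidean distance between two axis-parallel rectangles."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    total = 0.0
    for al, ah, bl, bh in zip(a_lo, a_hi, b_lo, b_hi):
        gap = max(bl - ah, al - bh, 0.0)  # 0 if the extents overlap in this dimension
        total += gap * gap
    return math.sqrt(total)


def max_dist(a, b):
    """Largest Euclidean distance between two axis-parallel rectangles."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    total = 0.0
    for al, ah, bl, bh in zip(a_lo, a_hi, b_lo, b_hi):
        span = max(ah - bl, bh - al)
        total += span * span
    return math.sqrt(total)


def classify(o, o_prime, q):
    """Is O closer to O' than Q in all worlds, in no world, or undecided?"""
    if max_dist(o, o_prime) < min_dist(q, o_prime):
        return "closer"
    if min_dist(o, o_prime) >= max_dist(q, o_prime):
        return "not closer"
    return "unknown"


# Hypothetical rectangles.
q_mbr = ((0.0, 0.0), (1.0, 1.0))
o_prime_mbr = ((10.0, 0.0), (11.0, 1.0))
near = ((12.0, 0.0), (13.0, 1.0))
far = ((-30.0, 0.0), (-29.0, 1.0))
wide = ((2.0, 0.0), (9.0, 1.0))

results = [classify(o, o_prime_mbr, q_mbr) for o in (near, far, wide)]
```

Once k objects are classified as "closer", O' cannot be an RkNN of Q in any world and can be pruned without looking at probabilities.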

Probabilistic Filter

Probability of O to be closer to O' than Q?

[Figure: uncertain objects O, O' and Q]

• "O is closer to O' than Q with at least x% probability"
• "O is closer to O' than Q with at most x% probability"

Exemplary statements
• O1 is closer to O' with at least 20% and at most 50%
• O2 is closer to O' with at least 60% and at most 80%
• Correctly deriving these bounds is not trivial (see paper)

How many objects O ∈ DB are closer to O' than Q?

Consider the following uncertain generating function
• x-term: probability of the object to be closer to O' than Q
• z-term: probability of the object to be further from O' than Q
• y-term: uncertainty

=> (0.2x + 0.3y + 0.5z) * (0.6x + 0.2y + 0.2z)

Expansion yields
0.12x² + 0.34xz + 0.1z² + 0.22xy + 0.16yz + 0.06y²
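The expansion above can be reproduced mechanically. The sketch below multiplies one factor lb·x + (ub − lb)·y + (1 − ub)·z per object, where lb and ub are the lower and upper probability bounds from the exemplary statements. The function name and the dictionary representation, keyed by the exponents of x and y, are assumptions for illustration.

```python
from collections import defaultdict


def expand_ugf(bounds):
    """Expand the product of the factors lb*x + (ub - lb)*y + (1 - ub)*z.

    bounds: one (lower, upper) probability pair per object.
    Returns {(i, j): coefficient}, where i is the exponent of x and j the
    exponent of y; the exponent of z is len(bounds) - i - j.
    """
    poly = {(0, 0): 1.0}
    for lb, ub in bounds:
        nxt = defaultdict(float)
        for (i, j), coeff in poly.items():
            nxt[(i + 1, j)] += coeff * lb         # x: certainly closer
            nxt[(i, j + 1)] += coeff * (ub - lb)  # y: undecided
            nxt[(i, j)] += coeff * (1.0 - ub)     # z: certainly not closer
        poly = dict(nxt)
    return poly


# O1: at least 20%, at most 50%; O2: at least 60%, at most 80%.
poly = expand_ugf([(0.2, 0.5), (0.6, 0.8)])
rounded = {key: round(value, 2) for key, value in poly.items()}
```

Here `rounded[(2, 0)]` is the coefficient of x², `rounded[(1, 1)]` that of xy, and `rounded[(0, 0)]` that of z². The coefficients sum to 1 because every factor's coefficients do.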

[Figure, built up over several slides: histogram of the terms of 0.12x² + 0.34xz + 0.1z² + 0.22xy + 0.16yz + 0.06y². x-axis: # objects O ∈ DB that are closer to O' than Q (0, 1, 2); y-axis: probability (0 to 80 %)]

• Example PRkNN queries
– PR1NN (Q, 50%) => O' is not part of the result
– PR2NN (Q, 40%) => O' is part of the result
– PR2NN (Q, 80%) => O' has to be further investigated

[Figure: two histograms of probability (0 to 100 %) over the exact and the maximum # objects O ∈ DB that are closer to O' than Q (0, 1, 2)]
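The three example decisions can be checked against the expanded generating function 0.12x² + 0.34xz + 0.1z² + 0.22xy + 0.16yz + 0.06y². The sketch below shows one way to read bounds off the coefficients; it is an interpretation that is consistent with the outcomes on the slide, not necessarily the exact procedure of the paper. O' qualifies for RkNN(Q) if fewer than k objects are closer to O' than Q. A term x^i y^j certainly contributes if i + j < k and possibly contributes if i < k. The function names and the return labels are hypothetical.

```python
# Coefficients keyed by (exponent of x, exponent of y); the z exponent is implied.
COEFFS = {(2, 0): 0.12, (1, 0): 0.34, (0, 0): 0.10,
          (1, 1): 0.22, (0, 1): 0.16, (0, 2): 0.06}


def fewer_than_k_bounds(coeffs, k):
    """Lower and upper bound of P(fewer than k objects are closer to O' than Q)."""
    # Lower bound: worlds that qualify even if every undecided object is closer.
    lower = sum(c for (i, j), c in coeffs.items() if i + j < k)
    # Upper bound: worlds that qualify if no undecided object is closer.
    upper = sum(c for (i, j), c in coeffs.items() if i < k)
    return lower, upper


def decide(coeffs, k, tau):
    """Classify O' for the query PRkNN(Q, tau)."""
    lower, upper = fewer_than_k_bounds(coeffs, k)
    if lower >= tau:
        return "result"
    if upper < tau:
        return "pruned"
    return "verify"


decisions = [decide(COEFFS, 1, 0.5), decide(COEFFS, 2, 0.4), decide(COEFFS, 2, 0.8)]
```

For k = 1 the bounds are about 0.10 and 0.32, and for k = 2 about 0.60 and 0.88, which gives "pruned", "result" and "verify" for the three example queries.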

Options for Verification
• Consideration of all possible worlds (exponential)
• Adapting probabilistic nearest neighbour ranking [2] on instance level of objects (polynomial)
• Monte-Carlo based (linear in the number of samples)

[2] Jian Li, Barna Saha, Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
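The Monte-Carlo option can be sketched as follows: draw one instance per object, test the RkNN condition in the sampled world, and report the fraction of qualifying samples. The sketch assumes discrete objects given as lists of (point, probability) instances and independent objects. All names, the fixed seed and the example data are assumptions for illustration.

```python
import math
import random


def sample_instance(obj, rng):
    """Draw one instance of an uncertain object according to its probabilities."""
    points, weights = zip(*obj)
    return rng.choices(points, weights=weights, k=1)[0]


def monte_carlo_prknn(target, query, others, k, n_samples, seed=0):
    """Estimate P(query in kNN(target)) from n_samples sampled worlds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        t = sample_instance(target, rng)
        d_q = math.dist(t, sample_instance(query, rng))
        closer = sum(1 for o in others
                     if math.dist(t, sample_instance(o, rng)) < d_q)
        hits += closer < k  # True counts as 1
    return hits / n_samples


# Hypothetical 1-D instances; the exact probability for k = 1 is 0.75.
o_prime = [((0.0,), 0.5), ((2.0,), 0.5)]
q_obj = [((1.0,), 1.0)]
o1 = [((-0.5,), 0.5), ((5.0,), 0.5)]

estimate = monte_carlo_prknn(o_prime, q_obj, [o1], k=1, n_samples=20000)
```

The runtime is linear in the number of samples, but the estimate only approaches the exact value as the sample count grows, which matches the "many samples required" remark in the framework overview.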

Evaluation: Spatial Filter

Evaluation: Probabilistic Filter

Evaluation: Comparison to other algorithms

Summary

• Framework for PRkNN query processing

• Deriving probabilistic pruning bounds for single objects

• Accumulating these bounds using uncertain generating functions

• Cost model for choosing the optimal value for tree depth

• Comparison to existing algorithms for PRNN processing


Thanks!

Questions?

Dependency on k

[Figure: runtime (sec, 0 to 45) over k (1, 5, 10, 15, 20), broken down into spatial pruning, probabilistic pruning and verification]

Problem of dependency

[Figure: objects O1, O2, O' and Q]