finding probabilistic nearest neighbors for query objects with imprecise locations

35
Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations Yuichi Iijima and Yoshiharu Ishikawa Nagoya University, Japan

Upload: romeo

Post on 23-Feb-2016

36 views

Category:

Documents


0 download

DESCRIPTION

Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations. Yuichi Iijima and Yoshiharu Ishikawa Nagoya University, Japan. Outline. Background and Problem Formulation Related Work Query Processing Strategies Experimental Results Conclusions. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Finding Probabilistic Nearest Neighbors for Query Objects with

Imprecise Locations

Yuichi Iijima and Yoshiharu IshikawaNagoya University, Japan

Page 2: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Outline

• Background and Problem Formulation• Related Work• Query Processing Strategies• Experimental Results• Conclusions

2

Page 3: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

3

Imprecise Location Information

• Sensor Environments– Measurement Noise– Frequent updates may not be possible

• GPS-based positioning consumes batteries• Robotics

– Localization using sensing andmovement histories

– Probabilistic approach hasvagueness

• Privacy– Location Anonymity

Page 4: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

4

Nearest Neighbor Queries

• Nearest Neighbor Queries– Example: Find the closest bus stop– Traditional problem in spatial databases

• Efficient query processing using spatial indices• Extensible to multi-dimensional cases (e.g., image

retrieval)• What happen if the

location of query object is uncertain? q

NN of q

Page 5: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Example Scenario (1)

• Query: Find the nearest gas station

5

Location estimation based on noisy GPS data

35%

10%

55%

Nearest gas stationdepends on the possiblecar locations

0%

NN objects are definedin a probabilistic way

Page 6: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Example Scenario (2)

• Mobile Robotics– Location of the robot is estimated

based on movement histories andsensor data

– Measurements are noisy– Localization based on probabilistic

modeling• Kalman filter, particle filter, etc.

– Estimated location is given as aprobabilistic density function (PDF)• PDF changes on each estimation

6

Page 7: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

7

Probabilistic Nearest Neighbor Query (1)

• PNNQ for short• Assumptions

– Location of query object q is specified as a Gaussian distribution

– Target data: static points• Gaussian Distribution

–Σ: Covariance matrix

)()(

21exp

||)2(1)( 1

2/12/ qxqxx ΣΣ

tdqp

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08

-4-2

0 2

4

0 0.02 0.04 0.06 0.08

pq(x)

Page 8: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

8

Probabilistic Nearest Neighbor Query (2)

• Definition

• Find objects which satisfy the condition– The probability that

the object is thenearest neighbor of q is greater than or equal to θ

q

),','Pr(),(Pr o'xox ooΟooqNN

}),(Pr,|{),( oqOooqPNNQ NN

Page 9: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Outline

• Background and Problem Formulation• Related Work• Query Processing Strategies• Experimental Results• Conclusions

9

Page 10: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Related Work

• Query processing methods for uncertain (location) data– Cheng, Prabhakar, et al. (SIGMOD’03, VLDB’04, …)– Tao et al. (VLDB’05, TODS’07)– Consider arbitrary PDFs or uniform PDFs– Most of the case, target objects are imprecise

• Research related to Gaussian distribution– Gauss-tree [Böhm et al., ICDE’06]

• Target objects are based on Gaussian distributions

• Our former work– Ishikawa, Iijima, Yu (ICDE’09): Probabilistic

range queries10

Page 11: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Outline

• Background and Problem Formulation• Related Work• Query Processing Strategies• Experimental Results• Conclusions

11

Page 12: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Naïve Approach (1)

• Use of Voronoi Diagram– Well-known method for standard (non-imprecise)

nearest neighbor queries

12

a b

e

f g

c

d

h

Voronoi region Ve

Page 13: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Naïve Approach (2)

• PrNN(q, o): Nearest neighbor probability– Probability that object o is the nearest

neighbor of query object q– It can be computed by integrating the

probability density function pq(x) over Voronoi region Vo

• Problem– Need to consider all target

objects– Numerical integration (Monte

Carlo method) is quite costly! 13

a b

e

fg

c

d

h

qpq(x)

Ve

oV qNN dpoq xx)(),(Pr

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08

-4-2

0 2

4

0 0.02 0.04 0.06 0.08

pq(x)

Page 14: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Our Approach

• Outline of processing1. Filtering

• Prune non-candidate objects whose PrNN are obviously smaller than the threshold

• Low-cost filtering conditions2. Numerical integration for the remaining

candidate objects• Two strategies

– -region-based Approach– SES-based Approach

• SES: Smallest Enclosing Sphere

14

Page 15: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Strategy 1: -Region-Based Approach (1)

• -region– Similar concepts are often used in query processing for

uncertain spatial databases– Definition: Ellipsoidal region for which the result of the

integration becomes 1 – 2 :

The ellipsoidal region

is the -region• Example: is specified as 1%,

we consider 98% -region15

21 )()(21)(

r qt

dpqxqx

xxΣ

21)()( r

t

qxqx Σ

q

-region

Page 16: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Strategy 1: -Region-Based Approach (2)

• Query Processing1. -region for the query is computed at first

• -region can be derived using r-table:– It is constructed for the normal Gaussian (S = I, q = 0): Given ,

it returns appropriate r

• Final -region can be obtained by transformation2. Derive the bounding box of

-region 3. Objects whose Voronoi

regions overlaps with the box are the candidates

• {a, c, d, f, g}, in this example16

q

b

e

h

θ-region

ac

f

dg

Page 17: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Strategy 1: -Region-Based Approach (3)

• Derivation details• First, we consider the

normal Gaussian– Using r-table, we can get

the appropriate r for given

• Second, a spherical -region is derived based on transformation – Transformation is performed

by analyzing S

• Third, the bonding box is calculated

where (Σ)ii is the (i, i) entry of Σ

17

qr

q

iii

ii rw

)(Σ

qwj

wj

wi wi

xi

xj

Page 18: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Strategy 2: SES-Based Approach (1)

• SES: Smallest Enclosing Sphere– For each Voronoi region Vo, we compute its SES

SESo beforehand

18

SESe: SES for Voronoi region of e

a b

e

f g

c

d

h

Page 19: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Strategy 2: SES-Based Approach (2)

• Integration over SESo gives the upper-bound for PrNN(q, o)

– Integration over a sphere region is more easier to compute• We use a table called

U-catalog constructed beforehand

19

oo SES qV qNN dpdpoq xxxx )()(),(Pr

a b

e

fg

c

d

h

qpq(x)

Page 20: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Strategy 2: SES-Based Approach (3)

• What is U-catalog?– Given two parameters and , it returns corresponding

integral– U-catalog is made for different (, ) pairs by computing

the integral of normal Gaussian over sphere region R

20

xxx

dpR

)(),( norm

R

q

α δ π(α, δ)0.0 0.1 …0.0 0.2 …… … …1.0 0.1… … …

Page 21: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Strategy 2: SES-Based Approach (4)

• To use U-catalog, another approximation is required since it is only useful for normal Gaussian– Use of upper-bounding function pq(x)– pq(x) tightly bounds pq(x) and has spherical isosurfaces– For pq(x), we can easily derive its integral over SESo using

U-catalog• In summary, we use two

approximations:

21

ab

e

fg

c

d

h

q

pq(x)

o

o

o

SES q

SES q

V qNN

dp

dp

dpoq

xx

xx

xx

)(

)(

)(),(Pr

T

pq(x)

T

T

T

T

Page 22: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Strategy 2: SES-Based Approach (5)

• Bounding Function– Original Gaussian PDF

– Upper-Bounding Function

where

22

)()(

21exp

)2(1)( 1

2/12/qxqxx Σ

Σt

dqp

2

2/12/ 2exp

)2(1)( qxx

TT

Σdqp

}min{1

1

i

d

i

tiii

T

vvΣ )()( xx Tqq pp

is satisfied

Page 23: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Outline

• Background and Problem Formulation• Related Work• Query Processing Strategies• Experimental Results• and Conclusions

23

Page 24: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

24

Setup of Experiments (1)

• Target Data– 2D point data (50K entries)– Based on road line

segments in Long Beach• Computation of Voronoi

regions and SESs– LEDA package was used

• Comparison– Strategies 1, 2, and their hybrid approach– Evaluation metric: Response time

Page 25: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

25

Setup of Experiments (2)

• Default Parameters– Covariance matrix

• For this matrix the shape of the isosurface of pq(x) is an ellipse titled at 30 degrees and the major-to-minor axis ratio is 3:1

• γ : Parameter for controlling vagueness (default: γ = 10)– Probability threshold value: θ = 0.01– No. of samples for Monte Carlo method: 1,000,000

732327Σ

qpq(x)

Page 26: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Candidate Objects in Strategy 1

26

Voronoi regions for candidate objects

Voronoi regions for answer objects

Bounding box of the θ-region

Query center

Page 27: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Candidate Objects in Strategy 2

27

Voronoi regions for candidate objects

Voronoi regions for answer objects

Query center

Page 28: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Candidate Objects in Hybrid Strategy

28

Voronoi regions for candidate objects

Voronoi regions for answer objects

Bounding box of the θ-region

Query Center

Page 29: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Experimental Results - Default Parameters

• Most of the processing time is spent in numerical integration

• A strategy that can prune more objects has better performance

29

Number of candidates 179 150 129Number of answers 26

Strategy 1 Strategy 2 Hybrid0.00 5.00

10.00 15.00 20.00 25.00 30.00 35.00 40.00 45.00 50.00

0.09 0.19 0.13

43.70 38.70

34.13

2.45 2.48

2.47

Filtering Compute Prob. Rest

Res

pons

e Ti

me[

s]

Page 30: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Experimental Results - Different γ Values

30

Number of candidates 24 31 23Number of answers 8

γ = 1(almost exact)

γ = 10 γ = 50(too vague)

Strategy 1

Strategy 2

Hybrid0.00 1.00 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00

10.00

0.09 0.13 0.13

6.28 6.05 5.53

2.45 2.48 2.48

Res

pons

e Ti

me[

s]

Strategy 1

Strategy 2

Hybrid0.00 5.00

10.00 15.00 20.00 25.00 30.00 35.00 40.00 45.00 50.00

0.09 0.19 0.13

43.70 38.70 34.13

2.45 2.48

2.47

Filtering Compute Prob. Rest

Strategy 1

Strategy 2

Hybrid0.00

50.00

100.00

150.00

200.00

250.00

0.09 0.22 0.13

209.22

70.88 66.41

2.45

2.47 2.46

179 150 129

26

847 276 260

15

Query is costly when impreciseness is high

Page 31: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Experimental Results - Different Shapes

32

Number of candidates 115 50 49Number of answers 26

Circle Default Ellipse Narrow Ellipse

179 150 129

26

276 366 250

24

Strategy 1

Strategy 2

Hybrid0.00

5.00

10.00

15.00

20.00

25.00

30.00

35.00

0.09 0.15 0.13

26.97

13.60 13.42

2.45

2.46 2.45

Res

pons

e Ti

me[

s]

Strategy 1

Strategy 2

Hybrid0.00 5.00

10.00 15.00 20.00 25.00 30.00 35.00 40.00 45.00 50.00

0.09 0.19 0.13

43.70 38.70 34.13

2.45 2.48

2.47

Filtering Compute Prob. Rest

Strategy 1

Strategy 2

Hybrid0.00

10.00 20.00 30.00 40.00 50.00 60.00 70.00 80.00 90.00

100.00

0.09 0.20 0.13

69.69 84.56

62.56

2.45

2.46

2.46

Narrow ellipse (correlated distribution) needs to consider additional candidates

Page 32: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Conclusions of Experimental Results

• Most of the processing time is spent in numerical integration⇒A strategy that can prune more objects has

better performance Superiority or inferiority of two strategies

depends on the given query and the specified parameters No apparent winner

• The hybrid strategy inherits the benefits of two strategies and shows the best performance

33

Page 33: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Outline

• Background and Problem Formulation• Related Work• Query Processing Strategies• Experimental Results• Conclusions

34

Page 34: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Conclusions

• Nearest neighbor query processing methods for imprecise query objects– Location of query object is represented by

Gaussian distribution– Two strategies and their combination– Reduction of numerical integration is important– Proposal of two pruning strategies and their

evaluation• Future work

– Evaluation for multi-dimensional cases (d 3)

35

Page 35: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations

Finding Probabilistic Nearest Neighbors for Query Objects with

Imprecise Locations

Yuichi Iijima and Yoshiharu IshikawaNagoya University, Japan