k-nearest neighbor temporal aggregate queries · of people now find a good restaurant not far away...

27
Motivation and Related Work Query kNNTA and Index TAR-tree Entry Grouping Strategies Experiments and Conclusion K-Nearest Neighbor Temporal Aggregate Queries Yu Sun Jianzhong Qi Yu Zheng Rui Zhang Department of Computing and Information Systems University of Melbourne Microsoft Research, Beijing March 26 th 2015 Yu Sun, JianzhongQi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Upload: others

Post on 23-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion

K-Nearest Neighbor Temporal Aggregate

Queries

Yu Sun † Jianzhong Qi † Yu Zheng ‡ Rui Zhang †

†Department of Computing and Information SystemsUniversity of Melbourne

‡Microsoft Research, Beijing

March 26th 2015

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 2: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion

Outline

1 Motivation and Related Work

Motivating Examples

Previous Work

2 Query kNNTA and Index TAR-tree

Definition of kNNTA

Structure and Usage of TAR-tree

3 Entry Grouping Strategies

The Proposed Strategy

Analysis of Different Strategies

4 Experiments and Conclusion

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 3: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion Motivating Examples Previous Work

Examples in Our Daily Life

Find some walking-distanceattractions

Find a nearby club gathering lotsof people now

Find a good restaurant not faraway and has few customers now

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 4: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion Motivating Examples Previous Work

Answering Such Questions Has Wide Applications

Foursquare or Facebook: placesnearby

Flickr or Instagram: photos takennearby having many Likes

Urban computing

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 5: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion Motivating Examples Previous Work

No Existing Queries Can Effectively Answer Such

Questions

Ranking locations on

Spatial distanceTemporal aggregate on visits or likes

Range aggregate does not work

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 6: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion Motivating Examples Previous Work

No Existing Algorithms or Indexes Can Efficiently

Support Such Rankings

Characteristics of such applications

Visits or likes arrive continuously

Interested periods range from hours to years

Explore results of different preferences

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 7: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion Definition of kNNTA Structure and Usage of TAR-tree

A Weighted Sum of the Spatial Distance and Temporal

Aggregate

The ranking function

f (p) = αd(p, q) + (1 − α)(1 − g(p, Iq))

K-Nearest Neighbor Temporal Aggregate Queries(kNNTA): returns the k locations with the minimumranking scores

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 8: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion Definition of kNNTA Structure and Usage of TAR-tree

A Query Example

a

bc

de

f

gh

i

jk

l

q t0→ t1→ t2→

a 1 1 0

. . .

e 1 1 0

f 3 5 4

. . .

l 1 0 1

Query: q, [t0, now], α = 0.3, k = 1

f (e) = 0.3 ·2.2415.6

+ (1 − 0.3) · (1 −2

12) = 0.626

f (f) = 0.3 ·3

15.6+ (1 − 0.3) · (1 −

1212) = 0.058

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 9: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion Definition of kNNTA Structure and Usage of TAR-tree

A Straightforward Approach May Encounter Very High

Cost

Number of locations in Foursquare is 60 million

Number of records for each location is 525, 600

Much more for applications like Instagram or Twitter

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 10: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion Definition of kNNTA Structure and Usage of TAR-tree

A Brief Reminder of R-tree

a

b

c

d

ef

gh

ij

kl R1

R2

R3

R4

R5

R6

R7

c g l a b d h j f e i k

R1 R2 R3 R4 R5

R6 R7

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 11: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Basic Structure of TAR-tree

a

bc

de

fg

h

i

j kl

R1R2

R3

R4R5

R6

R7

Temporal Indexes

f c g b a e d h k i l j

R1 R2 R3 R4 R5

R6 R7

(3,5,4)

(2,2,2)

(2,3,1)

(1,*,1)

(3,5,4)

(2,3,2)

Page 12: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion Definition of kNNTA Structure and Usage of TAR-tree

kNNTA Query Processing using TAR-tree

Best-First Search:

R6 (0.000) R7 (0.427)

R1 (0.058) R2 (0.352) R3 (0.408) R7 (0.427)

f (0.058) R2 (0.352) R3 (0.408) R7 (0.427)

f c g b a e d h k i l j

R1 R2 R3 R4 R5

R6 R7(3,5,4) (2,2,2)

(3,5,4)

(2,3,2)

(2,3,*)

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 13: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion Definition of kNNTA Structure and Usage of TAR-tree

Maintenance of TAR-tree

Insert Visits or Likes

Insert location

Re-Insert

Note split

f c g b a e d h k i l j

R1 R2 R3 R4 R5

R6 R7

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 14: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion Definition of kNNTA Structure and Usage of TAR-tree

Maintenance of TAR-tree

Insert Visits or Likes

Insert location

Re-Insert

Note split

f c g b a e d h k i l j

R1 R2 R3 R4 R5

R6 R7

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 15: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion The Proposed Strategy Analysis of Different Strategies

The Importance of Entry Grouping Strategy

R2

R6

R7

Root, R6, R7, R2

Four node accesses

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 16: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion The Proposed Strategy Analysis of Different Strategies

The Importance of Entry Grouping Strategy

R2

R6

R7

Root, R6, R7, R2

Four node accesses

R3

R6

Root, R6, R3

Three node accesses

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 17: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion The Proposed Strategy Analysis of Different Strategies

Properly Integrating the Spatial Distance and

Temporal Aggregate

Straightforward one: Use the spatial information

Straightforward two: Use the aggregate distribution

(1, 0, 1) with (1, 1, 0)(1, 0, 1) not with (10, 8, 9)

Propose: 3D MBR in R*-tree

two spatial dimensionsthird is the aggregate dimension

Coordinate of the aggregate dimension

λ̂p =1

m

m∑

i=1

vi

Example: (1, 0, 1) → 0.67 and (10, 2, 9, 8) → 7.25

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 18: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion The Proposed Strategy Analysis of Different Strategies

Power-law Distribution of the Aggregate Data

Roughly 80% of the visits are at 20% of the locations

100

101

102

10310

-5

10-4

10-3

10-2

10-1

100

Pr(

X ≥

x)

x

(a) NYC

100

101

102

10310

-5

10-4

10-3

10-2

10-1

100

Pr(

X ≥

x)

x

(b) LA

100

101

102

103

10410

-7

10-5

10-3

10-1

Pr(

X ≥

x)

x

(c) GW

100

101

102

103

104

10510

-6

10-4

10-2

100

Pr(

X ≥

x)

x

(d) GS

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 19: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion The Proposed Strategy Analysis of Different Strategies

Estimation of the Query Search Region and Number

of Node Accesses

l jh

kb

da e

c

gi

f 0.00

0.50

0.83

g′

aggre

gate

dim

ensio

n

Conic shape search region

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 20: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion The Proposed Strategy Analysis of Different Strategies

Estimation of the Query Search Region and Number

of Node Accesses

l jh

kb

da e

c

gi

f 0.00

0.50

0.83

g′

aggre

gate

dim

ensio

n

Conic shape search region

search region

nodes

aggre

gate

dim

ensio

n

Power-law like node extents

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 21: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion The Proposed Strategy Analysis of Different Strategies

Bad Behavior of the Two Straightforward Strategies

lj

h k

bd

a

e

c

gi

f

Spatial grouping

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 22: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion The Proposed Strategy Analysis of Different Strategies

Bad Behavior of the Two Straightforward Strategies

lj

h k

bd

a

e

c

gi

f

Spatial grouping

l j hk

bd

ae

c

gi

f

Aggregate grouping

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 23: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion Experiments Conclusion

Experiment Set Up

Table: Data Set

Name Time Locations Check-ins

NYC 05/2008-06/2011 72,626 237,784

LA 02/2009-07/2011 45,591 127,924

GW 02/2009-10/2010 1,280,969 6,442,803

GS 01/2011-07/2011 182,968 1,385,223

Temporal index: Multi-version B-tree

Desktop with 3.40GHz CPU and 16GB RAM

Results are averaged over 1, 000 queries

By default k = 10 and α = 0.3.

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 24: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion Experiments Conclusion

Validation of The Analysis

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 5 10 50 100

f(pk)

k

measuredestimated

0

200

400

600

800

1000

1 5 10 50 100

node accesses

k

measuredestimated

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.1 0.3 0.5 0.7 0.9

f(pk)

α0

measuredestimated

0

100

200

300

0.1 0.3 0.5 0.7 0.9

node accesses

α0

measuredestimated

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 25: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion Experiments Conclusion

Performance of the TAR-tree

1

10

100

1000

10000

20% 40% 60% 80% 100%

CPU time (ms)

time

baselineIND-aggIND-spaTAR-tree

1

10

100

1000

10000

1 5 10 50 100

CPU time (ms)

k

baselineIND-aggIND-spaTAR-tree

1

10

100

1000

10000

0.1 0.3 0.5 0.7 0.9

CPU time (ms)

α0

baselineIND-aggIND-spaTAR-tree

1

10

100

1000

10000

1 3 7 14 28

CPU time (ms)

epoch length (day)

baselineIND-aggIND-spaTAR-tree

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 26: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion Experiments Conclusion

Conclusion

The kNNTA query can provide highly customizedlocation retrieval and has wide applications.

The TAR-tree index efficiently processes the kNNTAquery.

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries

Page 27: K-Nearest Neighbor Temporal Aggregate Queries · of people now Find a good restaurant not far away and has few customers now Yu Sun, Jianzhong Qi, ... Flickr or Instagram: photos

Motivation and Related WorkQuery kNNTA and Index TAR-tree

Entry Grouping StrategiesExperiments and Conclusion Experiments Conclusion

Questions?

Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries