k-nearest neighbor temporal aggregate queries · of people now find a good restaurant not far away...
TRANSCRIPT
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion
K-Nearest Neighbor Temporal Aggregate
Queries
Yu Sun † Jianzhong Qi † Yu Zheng ‡ Rui Zhang †
†Department of Computing and Information SystemsUniversity of Melbourne
‡Microsoft Research, Beijing
March 26th 2015
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion
Outline
1 Motivation and Related Work
Motivating Examples
Previous Work
2 Query kNNTA and Index TAR-tree
Definition of kNNTA
Structure and Usage of TAR-tree
3 Entry Grouping Strategies
The Proposed Strategy
Analysis of Different Strategies
4 Experiments and Conclusion
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion Motivating Examples Previous Work
Examples in Our Daily Life
Find some walking-distanceattractions
Find a nearby club gathering lotsof people now
Find a good restaurant not faraway and has few customers now
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion Motivating Examples Previous Work
Answering Such Questions Has Wide Applications
Foursquare or Facebook: placesnearby
Flickr or Instagram: photos takennearby having many Likes
Urban computing
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion Motivating Examples Previous Work
No Existing Queries Can Effectively Answer Such
Questions
Ranking locations on
Spatial distanceTemporal aggregate on visits or likes
Range aggregate does not work
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion Motivating Examples Previous Work
No Existing Algorithms or Indexes Can Efficiently
Support Such Rankings
Characteristics of such applications
Visits or likes arrive continuously
Interested periods range from hours to years
Explore results of different preferences
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion Definition of kNNTA Structure and Usage of TAR-tree
A Weighted Sum of the Spatial Distance and Temporal
Aggregate
The ranking function
f (p) = αd(p, q) + (1 − α)(1 − g(p, Iq))
K-Nearest Neighbor Temporal Aggregate Queries(kNNTA): returns the k locations with the minimumranking scores
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion Definition of kNNTA Structure and Usage of TAR-tree
A Query Example
a
bc
de
f
gh
i
jk
l
q t0→ t1→ t2→
a 1 1 0
. . .
e 1 1 0
f 3 5 4
. . .
l 1 0 1
Query: q, [t0, now], α = 0.3, k = 1
f (e) = 0.3 ·2.2415.6
+ (1 − 0.3) · (1 −2
12) = 0.626
f (f) = 0.3 ·3
15.6+ (1 − 0.3) · (1 −
1212) = 0.058
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion Definition of kNNTA Structure and Usage of TAR-tree
A Straightforward Approach May Encounter Very High
Cost
Number of locations in Foursquare is 60 million
Number of records for each location is 525, 600
Much more for applications like Instagram or Twitter
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion Definition of kNNTA Structure and Usage of TAR-tree
A Brief Reminder of R-tree
a
b
c
d
ef
gh
ij
kl R1
R2
R3
R4
R5
R6
R7
c g l a b d h j f e i k
R1 R2 R3 R4 R5
R6 R7
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Basic Structure of TAR-tree
a
bc
de
fg
h
i
j kl
R1R2
R3
R4R5
R6
R7
Temporal Indexes
f c g b a e d h k i l j
R1 R2 R3 R4 R5
R6 R7
(3,5,4)
(2,2,2)
(2,3,1)
(1,*,1)
(3,5,4)
(2,3,2)
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion Definition of kNNTA Structure and Usage of TAR-tree
kNNTA Query Processing using TAR-tree
Best-First Search:
R6 (0.000) R7 (0.427)
R1 (0.058) R2 (0.352) R3 (0.408) R7 (0.427)
f (0.058) R2 (0.352) R3 (0.408) R7 (0.427)
f c g b a e d h k i l j
R1 R2 R3 R4 R5
R6 R7(3,5,4) (2,2,2)
(3,5,4)
(2,3,2)
(2,3,*)
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion Definition of kNNTA Structure and Usage of TAR-tree
Maintenance of TAR-tree
Insert Visits or Likes
Insert location
Re-Insert
Note split
f c g b a e d h k i l j
R1 R2 R3 R4 R5
R6 R7
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion Definition of kNNTA Structure and Usage of TAR-tree
Maintenance of TAR-tree
Insert Visits or Likes
Insert location
Re-Insert
Note split
f c g b a e d h k i l j
R1 R2 R3 R4 R5
R6 R7
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion The Proposed Strategy Analysis of Different Strategies
The Importance of Entry Grouping Strategy
R2
R6
R7
Root, R6, R7, R2
Four node accesses
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion The Proposed Strategy Analysis of Different Strategies
The Importance of Entry Grouping Strategy
R2
R6
R7
Root, R6, R7, R2
Four node accesses
R3
R6
Root, R6, R3
Three node accesses
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion The Proposed Strategy Analysis of Different Strategies
Properly Integrating the Spatial Distance and
Temporal Aggregate
Straightforward one: Use the spatial information
Straightforward two: Use the aggregate distribution
(1, 0, 1) with (1, 1, 0)(1, 0, 1) not with (10, 8, 9)
Propose: 3D MBR in R*-tree
two spatial dimensionsthird is the aggregate dimension
Coordinate of the aggregate dimension
λ̂p =1
m
m∑
i=1
vi
Example: (1, 0, 1) → 0.67 and (10, 2, 9, 8) → 7.25
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion The Proposed Strategy Analysis of Different Strategies
Power-law Distribution of the Aggregate Data
Roughly 80% of the visits are at 20% of the locations
100
101
102
10310
-5
10-4
10-3
10-2
10-1
100
Pr(
X ≥
x)
x
(a) NYC
100
101
102
10310
-5
10-4
10-3
10-2
10-1
100
Pr(
X ≥
x)
x
(b) LA
100
101
102
103
10410
-7
10-5
10-3
10-1
Pr(
X ≥
x)
x
(c) GW
100
101
102
103
104
10510
-6
10-4
10-2
100
Pr(
X ≥
x)
x
(d) GS
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion The Proposed Strategy Analysis of Different Strategies
Estimation of the Query Search Region and Number
of Node Accesses
l jh
kb
da e
c
gi
f 0.00
0.50
0.83
g′
aggre
gate
dim
ensio
n
Conic shape search region
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion The Proposed Strategy Analysis of Different Strategies
Estimation of the Query Search Region and Number
of Node Accesses
l jh
kb
da e
c
gi
f 0.00
0.50
0.83
g′
aggre
gate
dim
ensio
n
Conic shape search region
search region
nodes
aggre
gate
dim
ensio
n
Power-law like node extents
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion The Proposed Strategy Analysis of Different Strategies
Bad Behavior of the Two Straightforward Strategies
lj
h k
bd
a
e
c
gi
f
Spatial grouping
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion The Proposed Strategy Analysis of Different Strategies
Bad Behavior of the Two Straightforward Strategies
lj
h k
bd
a
e
c
gi
f
Spatial grouping
l j hk
bd
ae
c
gi
f
Aggregate grouping
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion Experiments Conclusion
Experiment Set Up
Table: Data Set
Name Time Locations Check-ins
NYC 05/2008-06/2011 72,626 237,784
LA 02/2009-07/2011 45,591 127,924
GW 02/2009-10/2010 1,280,969 6,442,803
GS 01/2011-07/2011 182,968 1,385,223
Temporal index: Multi-version B-tree
Desktop with 3.40GHz CPU and 16GB RAM
Results are averaged over 1, 000 queries
By default k = 10 and α = 0.3.
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion Experiments Conclusion
Validation of The Analysis
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 5 10 50 100
f(pk)
k
measuredestimated
0
200
400
600
800
1000
1 5 10 50 100
node accesses
k
measuredestimated
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.1 0.3 0.5 0.7 0.9
f(pk)
α0
measuredestimated
0
100
200
300
0.1 0.3 0.5 0.7 0.9
node accesses
α0
measuredestimated
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion Experiments Conclusion
Performance of the TAR-tree
1
10
100
1000
10000
20% 40% 60% 80% 100%
CPU time (ms)
time
baselineIND-aggIND-spaTAR-tree
1
10
100
1000
10000
1 5 10 50 100
CPU time (ms)
k
baselineIND-aggIND-spaTAR-tree
1
10
100
1000
10000
0.1 0.3 0.5 0.7 0.9
CPU time (ms)
α0
baselineIND-aggIND-spaTAR-tree
1
10
100
1000
10000
1 3 7 14 28
CPU time (ms)
epoch length (day)
baselineIND-aggIND-spaTAR-tree
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion Experiments Conclusion
Conclusion
The kNNTA query can provide highly customizedlocation retrieval and has wide applications.
The TAR-tree index efficiently processes the kNNTAquery.
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries
Motivation and Related WorkQuery kNNTA and Index TAR-tree
Entry Grouping StrategiesExperiments and Conclusion Experiments Conclusion
Questions?
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang K-Nearest Neighbor Temporal Aggregate Queries