approximate querying about the past, the present, and the future in spatio-temporal databases jimeng...
Post on 20-Dec-2015
Approximate querying about the Past, the Present, and the Future
in Spatio-Temporal Databases
Jimeng Sun, Dimitris Papadias,
Yufei Tao, Bin Liu
2
Motivation
• Spatio-temporal databases vs. data streams
• Monitoring applications
– Traffic supervision
– Mobile user monitoring
– Weather forecasting
• Example:
– Find the number of vehicles in the city center now
• The challenge is to provide fast query responses in a highly update-intensive environment
3
Problems and methods
• Problems:
– How to efficiently store/summarize the spatio-temporal information?
– How to approximately answer queries about the past, the present, and the future?
• Methods:
– Adaptive multi-dimensional histogram (AMH)
– Historical synopsis
– Stochastic prediction method
4
Related work
• Histograms
– Static multi-dimensional histograms: Equi-depth, Mhist, Minskew, Genhist, SQ
– Query-adaptive multi-dimensional histograms: STGrid, STHoles, SASH
• Other approximation methods
– DCT, Wavelet, Sketch
• Spatio-temporal databases
– Historical retrieval
– Future prediction
5
Outline
• Introduction
• Problem and proposed methods
– Adaptive multi-dimensional histogram
– Historical synopsis
– Prediction model
• Experiment
• Conclusion
6
Query types
• Present Time (PT) queries
• Historical Time (HT) queries
• Future Time (FT) queries
[Figure: queries plotted against location and time; the time axis is divided into past, current, and future]
7
System Overview
[Figure: system overview. Inputs: spatio-temporal updates and PT, HT, and FT queries. Components: AMH, past index, historical synopsis, prediction model]
8
Histogram
• Partition the space into buckets
• Data within a bucket are summarized by the mean
• Properties of a good histogram:
– Uniformity within each bucket
– Incrementally updatable
[Figure: two bucket partitionings of the same data space (both axes 0 to 100), one labeled "bad" and one labeled "good"]
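To make the role of uniformity concrete, here is a minimal sketch of how a bucket histogram might answer a range-count query. The `Bucket` class and `estimate_count` function are hypothetical names introduced for illustration, not part of the presented system; the sketch assumes axis-aligned rectangular buckets and uniformly spread objects inside each bucket.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    # Axis-aligned bucket [x_lo, x_hi] x [y_lo, y_hi] holding a total object count.
    x_lo: float
    x_hi: float
    y_lo: float
    y_hi: float
    count: float

    def area(self) -> float:
        return (self.x_hi - self.x_lo) * (self.y_hi - self.y_lo)


def overlap_area(b: Bucket, qx_lo, qx_hi, qy_lo, qy_hi) -> float:
    # Area of the intersection between the bucket and the query rectangle.
    w = min(b.x_hi, qx_hi) - max(b.x_lo, qx_lo)
    h = min(b.y_hi, qy_hi) - max(b.y_lo, qy_lo)
    return max(w, 0.0) * max(h, 0.0)


def estimate_count(buckets, qx_lo, qx_hi, qy_lo, qy_hi) -> float:
    # Uniformity assumption: a bucket contributes its count scaled by the
    # fraction of its area covered by the query.
    return sum(
        b.count * overlap_area(b, qx_lo, qx_hi, qy_lo, qy_hi) / b.area()
        for b in buckets
    )
```

The estimate is only as good as the uniformity inside each bucket, which is why a partitioning that mixes dense and sparse regions in one bucket counts as "bad".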
9
Adaptive Multi-dimensional Histogram (AMH)
[Figure: the data space divided into regular cells, each annotated with its object count]
• Objective: minimize the weighted variance sum $\mathrm{WVS} = \sum_i \mathrm{area}_i \cdot \mathrm{var}_i$ over all buckets $i$ (as in Minskew [Acharya, Poosala, Ramaswamy 99])
[Figure: the BPT, a tree with internal nodes n1 to n5 and leaf buckets b1 to b6, shown next to the corresponding layout of buckets b1 to b6 in the data space]
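The objective can be evaluated directly from per-cell counts. The sketch below is illustrative only: it assumes a bucket's area is measured in cells and that the variance is the population variance of the cell counts inside the bucket; the function name `wvs` is hypothetical.

```python
from statistics import pvariance


def wvs(buckets):
    """Weighted variance sum of a partitioning.

    `buckets` is a list of buckets, each given as the list of per-cell
    counts it covers (assumption: area = number of cells).
    """
    return sum(len(cells) * pvariance(cells) for cells in buckets)
```

For example, grouping the cells `[1, 1, 5, 5]` into one bucket gives a WVS of 16, while splitting them into `[1, 1]` and `[5, 5]` gives 0, so the second partitioning is preferred under this objective.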
10
Dynamic Maintenance of AMH
• Our scheme: record the information during construction and modify the structure as needed.
– 1. Information update: update the bucket count
– 2. Bucket reorganization: merge (to reclaim buckets) and split (to reduce WVS)
11
Information update of AMH
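[Figure: the BPT (internal nodes n1 to n5, leaf buckets b1 to b6) and the bucket layout; an update is mapped to bucket b1, with labels b1, n2, and n1 marking the tree path from that bucket to the root]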
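An information update can be done in constant time per bucket if each bucket keeps running sums. The following sketch is an assumption about how such bookkeeping could look, not the authors' code: `BucketStats`, `apply_update`, and the cell-to-bucket dictionary are hypothetical, and the bucket statistics are the number of cells, the sum of cell counts, and the sum of squared cell counts.

```python
class BucketStats:
    def __init__(self, cell_counts):
        self.n_cells = len(cell_counts)
        self.total = sum(cell_counts)                   # sum of cell counts
        self.sum_sq = sum(c * c for c in cell_counts)   # 2nd moment

    def variance(self):
        # Population variance recovered from the running sums.
        mean = self.total / self.n_cells
        return self.sum_sq / self.n_cells - mean * mean


def apply_update(cells, cell_to_bucket, stats, cell_id, delta):
    """Change the count of one cell by `delta` and patch its bucket's sums."""
    old = cells[cell_id]
    new = old + delta
    cells[cell_id] = new
    bucket = stats[cell_to_bucket[cell_id]]
    bucket.total += delta
    bucket.sum_sq += new * new - old * old
```

Under these assumptions, an object moving between cells is two calls: `delta=-1` on the cell it leaves and `delta=+1` on the cell it enters.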
12
Bucket reorganization: Merge
[Figure: the BPT before and after a merge. The subtree rooted at n4 (covering buckets b3, b4, and b6) is merged into a single bucket b*, so the bucket list changes from b1 to b6 into b1, b2, b*, b5]
Bucket info:
1. Region: [x-, x+] × [y-, y+]
2. Frequency: count/area
3. 2nd moment (for variance calculation)
• Merge the subtree that leads to the minimal WVS increase
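The stored second moment is what makes the merge criterion cheap to evaluate. Below is a hedged sketch under the assumption that each bucket is summarized as a tuple `(n_cells, total, sum_sq)`; the function names are hypothetical. It uses the identity area × variance = sum_sq − total² / n_cells.

```python
def weighted_variance(n_cells, total, sum_sq):
    # area * variance, with area measured in cells.
    return sum_sq - total * total / n_cells


def merge_stats(parts):
    # Statistics of the bucket obtained by merging all `parts`.
    return (
        sum(p[0] for p in parts),
        sum(p[1] for p in parts),
        sum(p[2] for p in parts),
    )


def merge_cost(parts):
    # WVS increase caused by replacing `parts` with one merged bucket.
    merged = merge_stats(parts)
    return weighted_variance(*merged) - sum(weighted_variance(*p) for p in parts)


def cheapest_merge(candidates):
    # `candidates` maps a subtree name to the stats of the leaf buckets below it.
    return min(candidates, key=lambda name: merge_cost(candidates[name]))
```

Merging buckets with similar frequencies costs little, so they are the first to be merged when buckets need to be freed.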
13
Bucket reorganization: Split
[Figure: the BPT before and after splitting bucket b*; the split adds new internal nodes n4 and n5 and new leaf buckets b*1 to b*4]
• Split the bucket that leads to maximal WVS decrease
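A simple way to picture the split criterion is to try every axis-parallel cut of a bucket and keep the one with the largest WVS decrease. This is a sketch under stated assumptions (a bucket given as a small grid of per-cell counts, a single binary cut, exhaustive search); `best_split` is a hypothetical name, and the original system may choose cuts differently.

```python
def wv(cells):
    # area * variance for a flat list of cell counts (area = number of cells).
    total = sum(cells)
    return sum(c * c for c in cells) - total * total / len(cells)


def best_split(grid):
    """Return (axis, position, wvs_decrease) of the best binary cut, or None.

    `grid` is a list of rows of per-cell counts covering one bucket.
    """
    rows, cols = len(grid), len(grid[0])
    whole = wv([c for row in grid for c in row])
    best = None
    for k in range(1, cols):  # cut between columns k-1 and k
        left = [c for row in grid for c in row[:k]]
        right = [c for row in grid for c in row[k:]]
        gain = whole - wv(left) - wv(right)
        if best is None or gain > best[2]:
            best = ("x", k, gain)
    for k in range(1, rows):  # cut between rows k-1 and k
        top = [c for row in grid[:k] for c in row]
        bottom = [c for row in grid[k:] for c in row]
        gain = whole - wv(top) - wv(bottom)
        if best is None or gain > best[2]:
            best = ("y", k, gain)
    return best
```

Running this over all buckets and splitting the one with the largest returned decrease matches the rule stated above.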
14
Features of AMH
• Bucket information is updated as new data arrive
• Bucket extents continuously adapt to changes in the data distribution
• Maintenance does not affect normal query processing
– It can be interrupted at any moment
– It is performed during CPU idle time
15
Outline
• Introduction
• Problem and proposed methods
– Adaptive multi-dimensional histogram
– Historical synopsis
– Prediction model
• Experiment
• Conclusion
16
Historical Synopsis
• AMH maintains the current buckets.
• Past index stores the obsolete buckets.
• Past index:
– Packed B-tree
– 3D R-tree
[Figure: incoming streams update the current cells and the current buckets of the AMH in main memory; recent buckets stay in main memory, and old buckets are stored in the past index on disk]
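The division of labor between current and obsolete buckets can be sketched as follows. This is only an illustration: a sorted in-memory list stands in for the past index (it is not a packed B-tree or 3D R-tree), and `VersionedBucket`, `HistoricalSynopsis`, and the lifespan fields are hypothetical names. It assumes buckets are retired in non-decreasing time order.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class VersionedBucket:
    region: tuple                  # (x_lo, x_hi, y_lo, y_hi)
    count: float
    t_start: int
    t_end: Optional[int] = None    # None while the bucket is still current


class HistoricalSynopsis:
    def __init__(self):
        self.current = []      # live buckets (the AMH's role)
        self.past = []         # obsolete buckets ordered by t_end (past-index role)
        self._past_ends = []   # parallel list of t_end values for binary search

    def add(self, bucket):
        self.current.append(bucket)

    def retire(self, bucket, now):
        # Called when a merge or split makes a bucket obsolete.
        self.current.remove(bucket)
        bucket.t_end = now
        self.past.append(bucket)
        self._past_ends.append(now)

    def buckets_at(self, t):
        # Buckets whose lifespan [t_start, t_end) contains timestamp t.
        live = [b for b in self.current if b.t_start <= t]
        first = bisect.bisect_right(self._past_ends, t)
        old = [b for b in self.past[first:] if b.t_start <= t]
        return live + old
```

A historical query at time `t` would then be estimated from `buckets_at(t)` in the same way a present-time query is estimated from the current buckets.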
17
Prediction Model
• Prediction based on velocity doesn't work!
– It is not realistic to assume that velocity remains constant between the current time and the query time
– Velocity is highly dynamic
• We suggest using only past and present location information for prediction.
18
Prediction Model (cont.)
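[Figure: an FT query goes to the prediction model, which parses it into HT and PT queries on the historical synopsis and uses their results]
• Forecast the future using any time-series prediction method; we use AR
[Figure: an example time series, x-axis 0 to 80, y-axis 0 to 10]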
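As an illustration of forecasting with an autoregressive model, the sketch below fits AR(p) by ordinary least squares and iterates it forward. It is a generic textbook-style fit written in plain Python, not the authors' implementation; `fit_ar` and `forecast` are hypothetical names, and the input series is assumed to be a sequence of past and present values for the same quantity (for example, counts for one region at successive timestamps).

```python
def fit_ar(series, p):
    """Least-squares fit of x[t] = c + a1*x[t-1] + ... + ap*x[t-p].

    Returns [c, a1, ..., ap].
    """
    rows = [[1.0] + [series[t - k] for k in range(1, p + 1)]
            for t in range(p, len(series))]
    targets = [series[t] for t in range(p, len(series))]
    m = p + 1
    # Augmented normal equations: (X^T X | X^T y).
    A = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
         + [sum(r[i] * y for r, y in zip(rows, targets))]
         for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: series too short or degenerate")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(m):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][m] / A[i][i] for i in range(m)]


def forecast(series, coeffs, steps):
    # Iterate the fitted recurrence `steps` times beyond the end of the series.
    p = len(coeffs) - 1
    hist = list(series)
    for _ in range(steps):
        hist.append(coeffs[0] + sum(coeffs[k] * hist[-k] for k in range(1, p + 1)))
    return hist[len(series):]
```

Because only past and present values enter the fit, no velocity information is needed, which is the point made on the previous slide.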
19
Outline
• Introduction
• Related work
• Problem and proposed methods
– Adaptive multi-dimensional histogram
– Historical synopsis
– Prediction model
• Experiment
• Conclusion
20
Experiment settings
• Datasets
– 2.5M updates for each dataset
– spatial: 50K mobile objects from 2 spatial datasets
– road: from a spatio-temporal generator (described in [Brinkhoff 2002])
[Figure: the road network, and the data distribution at the initial, median, and final timestamps]
21
Robustness with time
[Figure: error rate vs. number of location updates (5k to 2.5M) for the spatial and road datasets]
Query: qlength = 6% of the data space; 25K queries uniformly distributed over space and time
22
Comparison with conventional histogram
• Minskew (a static spatial histogram) is rebuilt every 50k location updates
• tp (time proportion) is the ratio of the cost of AMH to that of Minskew
• The reorganization operations of AMH are uniformly distributed among the 50k location updates.
[Figure: error rate vs. time proportion tp (0.001 to 1, logarithmic axis) for Minskew and AMH, on the spatial and road datasets]
23
The effect of update intensity
• The B-tree performs better at high update rates.
• The R-tree provides much faster query response.
• In general, when the query/update ratio is large (>30%), the R-tree performs better.
[Figure: CPU time (msec) per query type (PT, HT, FT) for the 3D R-tree and the B-tree; error rate vs. update rate (1k to 100k) on the spatial and road datasets]
24
Conclusion
• We present a comprehensive approach for processing queries that refer to any time in history.
• The proposed architecture maintains
– an incremental multi-dimensional histogram;
– a past index structure for storing the outdated buckets.
• Future queries are answered by a stochastic method that uses the recent history to predict the future.
25
Q+A
26
Summary
• AMH
– 0. Goal: min(WVS)
– 1. Information update
– 2. Reorganization happens when the CPU is idle
• Past index
– 1. Recent buckets are kept in memory
– 2. Old buckets are dumped to disk
• Historical synopsis
• Prediction model
– Forecast based on the present and the past.
[Figure: old buckets flow from the AMH to the past index]
27
Related work
• Static multi-dimensional histograms
• Query-adaptive multi-dimensional histograms
• Other multi-dimensional approximation methods
• Spatio-temporal prediction methods
• Spatio-temporal aggregation methods
28
Evaluation over different query types
[Figure: error rate vs. query length qL (2% to 10%) for the spatial and road datasets]
29
Motivation (cont.)
• Spatio-temporal database (STDB) research:
– historical retrieval
– future prediction
30
Bucket reorganization: Split
[Figure: the BPT and the bucket list before and after splitting bucket b*. Before the split the buckets are b1, b2, b*, b5; the split adds internal nodes n4 and n5 and leaf buckets b*1 to b*4]