
Approximate Querying about the Past, the Present, and the Future in Spatio-Temporal Databases

Jimeng Sun, Dimitris Papadias, Yufei Tao, Bin Liu

Motivation

• Spatio-temporal databases vs. data streams

• Monitoring applications

– Traffic supervision

– Mobile user monitoring

– Weather forecasting

• Example: find the number of vehicles in the city center now

• The challenge is to provide fast query response in a highly update-intensive environment

Problems and methods

• Problems:

– How to efficiently store/summarize the spatio-temporal information?

– How to approximately answer queries about the past, the present, and the future?

• Methods:

– Adaptive multi-dimensional histogram (AMH)

– Historical synopsis

– Stochastic prediction method

Related work

• Histograms

– Static multi-dimensional histograms

  • Equi-depth, Mhist, Minskew, Genhist, SQ

– Query-adaptive multi-dimensional histograms

  • STGrid, STHoles, SASH

• Other approximation methods

– DCT, Wavelet, Sketch

• Spatio-temporal databases

– Historical retrieval

– Future prediction

Outline

• Introduction

• Problem and proposed methods

– Adaptive multi-dimensional histogram

– Prediction model

• Experiment

• Conclusion

Query types

• Present Time (PT) queries

• Historical Time (HT) queries

• Future Time (FT) queries

[Figure: queries plotted by location against a time axis divided into past, current, and future]

System Overview

[Figure: system overview. Components shown: spatio-temporal updates, queries, Historical Synopsis, Past Index, Prediction Model]

Histogram

• Partition the space into buckets

• Data within a bucket are summarized by their mean

• Properties of a good histogram:

– Uniformity within each bucket

– Incrementally updatable

[Figure: example histogram, axis range 0 to 100]
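As a minimal sketch of how a bucket histogram can answer a count query approximately, the following assumes that each bucket stores a rectangle and an object count, and that objects are spread uniformly inside a bucket (the uniformity property listed above). The names `Bucket`, `overlap_area`, and `estimate_count` are illustrative and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    x_lo: float
    x_hi: float
    y_lo: float
    y_hi: float
    count: float  # number of objects currently inside the bucket

    def area(self) -> float:
        return (self.x_hi - self.x_lo) * (self.y_hi - self.y_lo)


def overlap_area(b, q):
    """Area of the intersection of bucket b and query q = (x_lo, x_hi, y_lo, y_hi)."""
    w = min(b.x_hi, q[1]) - max(b.x_lo, q[0])
    h = min(b.y_hi, q[3]) - max(b.y_lo, q[2])
    return max(w, 0.0) * max(h, 0.0)


def estimate_count(buckets, q):
    """Estimate the number of objects in q, assuming uniform density per bucket."""
    return sum(
        b.count * overlap_area(b, q) / b.area() for b in buckets if b.area() > 0
    )


# Two side-by-side buckets; the query covers half of each.
buckets = [Bucket(0, 50, 0, 100, 200), Bucket(50, 100, 0, 100, 40)]
print(estimate_count(buckets, (25, 75, 0, 100)))
```

The estimate is exact only when the data really are uniform inside each bucket, which is why the bucket boundaries matter.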

Adaptive Multi-dimensional Histogram (AMH)

• Objective: minimize WVS = Σ_i (area_i ∙ var_i), summed over all buckets i (Minskew [Acharya, Poosala, Ramaswamy 99])

[Figure: regular cells grouped into buckets. Cell counts, row by row:]

1 1 3 3 3 5

4 4 6 3 1 2

1 1 5 3 4 5

1 1 1 1 6 5
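To make the objective concrete, here is a small sketch that evaluates WVS for two candidate partitions of the cell counts on this slide. It assumes that area is measured in cells, that var is the population variance of the cell counts inside a bucket, and that the four extracted rows of digits form a 4×6 grid; the bucket rectangles and function names are hypothetical.

```python
def bucket_stats(grid, r0, r1, c0, c1):
    """Return (area in cells, variance of cell counts) for rows r0..r1-1, cols c0..c1-1."""
    cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    n = len(cells)
    mean = sum(cells) / n
    var = sum((v - mean) ** 2 for v in cells) / n
    return n, var


def wvs(grid, buckets):
    """Sum of area * variance over all buckets."""
    total = 0.0
    for b in buckets:
        area, var = bucket_stats(grid, *b)
        total += area * var
    return total


# Cell counts as they appear on the slide (row layout assumed).
grid = [
    [1, 1, 3, 3, 3, 5],
    [4, 4, 6, 3, 1, 2],
    [1, 1, 5, 3, 4, 5],
    [1, 1, 1, 1, 6, 5],
]

one_bucket = [(0, 4, 0, 6)]                 # the whole grid as a single bucket
two_buckets = [(0, 4, 0, 2), (0, 4, 2, 6)]  # hypothetical split after column 2

print(wvs(grid, one_bucket))
print(wvs(grid, two_buckets))
```

Separating the two low-count left columns from the rest lowers WVS, which is the kind of improvement the histogram construction looks for.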

Dynamic Maintenance of AMH

• Our scheme: record information during construction and modify the structure as needed.

– 1. Information update

  • Update the bucket count

– 2. Bucket reorganization

  • Merge: to reclaim buckets

  • Split: to reduce WVS

Information update of AMH

[Figure: cells are mapped to the bucket that contains them, e.g. b1]
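One possible way to carry out the information update incrementally is sketched below. It assumes that per-cell counts are kept, that a cell-to-bucket mapping exists, and that each bucket stores the sum and the sum of squares of its cell counts so that the variance never has to be recomputed from scratch. The class and function names are hypothetical and not taken from the slides.

```python
class BucketSummary:
    def __init__(self, n_cells):
        self.n_cells = n_cells  # bucket area, measured in cells
        self.total = 0          # sum of the cell counts in the bucket
        self.sq_total = 0       # sum of the squared cell counts

    def frequency(self):
        return self.total / self.n_cells

    def variance(self):
        mean = self.frequency()
        return self.sq_total / self.n_cells - mean * mean


def apply_update(cell_counts, cell_to_bucket, summaries, old_cell, new_cell):
    """Move one object from old_cell to new_cell (old_cell is None for a new object)."""
    for cell, delta in ((old_cell, -1), (new_cell, +1)):
        if cell is None:
            continue
        c = cell_counts.get(cell, 0)
        s = summaries[cell_to_bucket[cell]]
        s.total += delta
        s.sq_total += (c + delta) ** 2 - c ** 2  # adjust the sum of squares in O(1)
        cell_counts[cell] = c + delta


cell_counts = {}
cell_to_bucket = {(0, 0): "b1", (0, 1): "b1"}
summaries = {"b1": BucketSummary(n_cells=2)}

apply_update(cell_counts, cell_to_bucket, summaries, None, (0, 0))    # object appears
apply_update(cell_counts, cell_to_bucket, summaries, None, (0, 0))    # second object
apply_update(cell_counts, cell_to_bucket, summaries, (0, 0), (0, 1))  # one object moves
print(summaries["b1"].total, summaries["b1"].variance())
```

Each location update touches at most two cells and two bucket summaries, so the cost per update is constant under these assumptions.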

Bucket reorganization - Merge

[Figure: buckets]

• Bucket info:

– 1. Region: [x-, x+] [y-, y+]

– 2. Frequency: count/area

– 3. 2nd moment (for variance calculation)

• Merge the subtree that leads to the minimal WVS increase
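The stored bucket information is enough to evaluate a merge without revisiting the cells. The sketch below assumes each bucket is summarized as a tuple (number of cells, sum of cell counts, sum of squared cell counts) and that the caller supplies candidate pairs whose union is a valid bucket. It works on a flat list of pairs and does not model the subtree structure mentioned on the slide; all names are illustrative.

```python
def weighted_var(n, s, q):
    """area * variance, computed from cell count n, sum s, and sum of squares q."""
    return q - s * s / n


def merge_increase(a, b):
    """WVS increase caused by merging the buckets summarized by a and b."""
    merged = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    return weighted_var(*merged) - weighted_var(*a) - weighted_var(*b)


def best_merge(summaries, candidates):
    """Pick the candidate pair whose merge increases WVS the least."""
    return min(
        candidates,
        key=lambda p: merge_increase(summaries[p[0]], summaries[p[1]]),
    )


# Hypothetical summaries: b1 and b2 hold four cells of count 2, b3 four cells of count 10.
summaries = {"b1": (4, 8, 16), "b2": (4, 8, 16), "b3": (4, 40, 400)}
candidates = [("b1", "b2"), ("b2", "b3")]
print(best_merge(summaries, candidates))
```

Merging two buckets with identical densities costs nothing in WVS, so that pair is preferred over one that mixes low and high counts.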

Bucket reorganization - Split

[Figure: split example with buckets labeled b2, b5, b*, b*3, b*4]

• Split the bucket that leads to the maximal WVS decrease
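A brute-force sketch of the split decision for a single bucket follows. It assumes the bucket is a rectangle of grid cells and that every axis-parallel cut position is tried; the slides do not say how candidate splits are enumerated, so this is only one plausible reading, and the function names are hypothetical.

```python
def weighted_var(cells):
    """area * variance of a list of cell counts (sum of squared deviations)."""
    mean = sum(cells) / len(cells)
    return sum((v - mean) ** 2 for v in cells)


def region(grid, r0, r1, c0, c1):
    return [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]


def best_split(grid, r0, r1, c0, c1):
    """Return (WVS decrease, axis, position) of the best cut of the bucket."""
    base = weighted_var(region(grid, r0, r1, c0, c1))
    best = (0.0, None, None)
    for c in range(c0 + 1, c1):  # vertical cuts
        d = (base - weighted_var(region(grid, r0, r1, c0, c))
             - weighted_var(region(grid, r0, r1, c, c1)))
        if d > best[0]:
            best = (d, "col", c)
    for r in range(r0 + 1, r1):  # horizontal cuts
        d = (base - weighted_var(region(grid, r0, r, c0, c1))
             - weighted_var(region(grid, r, r1, c0, c1)))
        if d > best[0]:
            best = (d, "row", r)
    return best


grid = [[1, 1, 9, 9],
        [1, 1, 9, 9]]
print(best_split(grid, 0, 2, 0, 4))
```

To choose which bucket to split, the same function could be evaluated for every bucket and the one with the largest decrease selected.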

Features of AMH

• Bucket information is updated as new data arrive

• Bucket extents continuously adapt to changes in the data distribution

• Maintenance does not affect normal query processing

– It is interruptible at any moment

– It is performed during CPU idle time

Outline

• Introduction

• Problem and proposed methods

Historical Synopsis

• AMH maintains the current buckets.

• Past index stores the obsolete buckets.

• Past index:

– Packed B-tree

– 3D R-tree

[Figure: incoming streams update the current cells; current buckets and recent buckets reside in main memory, while old buckets are stored on disk in the past index T]
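The division between memory and disk can be pictured with the toy sketch below. It assumes that every obsolete bucket version carries a lifespan [t_start, t_end), that a fixed number of recent versions stay in memory, and that older ones are handed to the past index. A plain list with a linear scan stands in for the packed B-tree or 3D R-tree, and all names and the capacity parameter are hypothetical.

```python
class HistoricalSynopsis:
    def __init__(self, memory_capacity):
        self.memory_capacity = memory_capacity
        self.recent = []  # obsolete bucket versions still in main memory
        self.disk = []    # stand-in for the disk-based past index

    def retire(self, region, frequency, t_start, t_end):
        """Record a bucket version that was valid during [t_start, t_end)."""
        self.recent.append((t_start, t_end, region, frequency))
        while len(self.recent) > self.memory_capacity:
            self.disk.append(self.recent.pop(0))  # dump the oldest version

    def buckets_at(self, t):
        """Bucket versions valid at historical timestamp t (linear scan)."""
        return [
            (region, freq)
            for (ts, te, region, freq) in self.disk + self.recent
            if ts <= t < te
        ]


syn = HistoricalSynopsis(memory_capacity=1)
r1 = (0, 10, 0, 10)  # region as (x-, x+, y-, y+)
syn.retire(r1, 2.0, t_start=0, t_end=5)
syn.retire(r1, 3.0, t_start=5, t_end=9)
print(syn.buckets_at(3), syn.buckets_at(7))
```

A real past index would replace the linear scan with an index lookup on the time (and, for the 3D R-tree, space) dimensions.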

Prediction Model

• Prediction based on velocity doesn’t work!

– It is not realistic to assume that velocity remains constant between the current time and the query time

– Velocity is highly dynamic

• We suggest using only past and present location information for prediction.

Prediction Model (cont.)

[Figure: the Prediction Model draws on the Historical Synopsis to produce results]

• Forecast the future using any time-series prediction method; we use an autoregressive (AR) model
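The slides name AR as the forecasting method but give no order or fitting procedure. As an illustration only, the sketch below fits a first-order model x_t = a + b·x_{t-1} by ordinary least squares to a series of past counts (for example, the frequency of one region at successive timestamps) and iterates it forward. The order, the fitting method, and the function names are all assumptions.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = a + b * x_{t-1}; returns (a, b)."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # flat history: fall back to predicting the mean
    a = my - b * mx
    return a, b


def forecast(series, steps):
    """Predict the next `steps` values by iterating the fitted model."""
    a, b = fit_ar1(series)
    x = series[-1]
    out = []
    for _ in range(steps):
        x = a + b * x
        out.append(x)
    return out


history = [100, 110, 121, 133.1]  # hypothetical counts growing 10% per timestamp
print(forecast(history, 2))
```

Because the model uses only past and present values of the series, it needs no velocity information, in line with the suggestion on the previous slide.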

Outline

• Introduction

• Related work

• Problem and proposed methods

Experiment settings

• Datasets

– 2.5M updates for each dataset

– spatial: 50K mobile objects from 2 spatial datasets

– road: from a spatio-temporal generator (described in [Brinkhoff 2002])

[Figure: road network and data distribution (initial, median, final)]

Robustness with time

[Figure: two plots against the number of location updates (0.5M to 2.5M), one of them showing the error rate; legend: spatial]

Query: qlength = 6% of the data space; 25K queries uniformly distributed over space and time

Comparison with conventional histogram

• Minskew (a static spatial histogram) is rebuilt every 50k location updates

• tp (time proportion) is the ratio of the cost of AMH to that of Minskew

• The re-organization operations of AMH are uniformly distributed among the 50k location updates.

[Figure: two plots of error rate against time proportion (0.001 to 1); legend includes spatial and minskew]

The effect of update intensity

• The B-tree performs better at high update rates.

• The R-tree provides much faster query response.

• In general, when the query/update ratio is large (>30%), the R-tree performs better.

[Figure: CPU time (msec) per query type (PT, HT, FT) and error rate vs. update rate (1k to 100k) for the spatial and road datasets; legend: 3D R-tree, B-tree]

Conclusion

• We present a comprehensive approach for processing queries that refer to any time in history.

• The proposed architecture maintains

– an incremental multi-dimensional histogram;

– a past index structure for storing the outdated buckets.

• Future queries are answered by a stochastic method that uses the recent history to predict the future.

Summary

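[Figure: system overview with Past Index, Historical Synopsis, and Prediction Model, annotated with the notes below]

• 0. Goal: min(WVS); 1. Information update; 2. Reorganization happens when the CPU is idle

• 1. Recent buckets are kept in memory; 2. Old buckets are dumped to disk

• Forecast based on the present and the past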

Related work

• Static multi-dimensional histograms

• Query-adaptive multi-dimensional histograms

• Other multi-dimensional approximation methods

• Spatio-temporal prediction methods

• Spatio-temporal aggregation methods

Evaluation over different query types

[Figure: two plots of error rate against L (2% to 10%); legend: spatial]

Motivation (cont.)

• Spatio-temporal database (STDB) research:

– Historical retrieval

– Future prediction

