approximate spatial query processing using raster signatures leonardo guerreiro azevedo, rodrigo...
TRANSCRIPT
Approximate Spatial Query Approximate Spatial Query Processing Using Processing Using Raster SignaturesRaster Signatures
Approximate Spatial Query Approximate Spatial Query Processing Using Processing Using Raster SignaturesRaster Signatures
Leonardo Guerreiro Azevedo, Rodrigo Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Salvador Monteiro, Geraldo Zimbrão &
Jano Moreira de SouzaJano Moreira de SouzaCoppe – Graduate School of EngineeringCoppe – Graduate School of Engineering
Institute of Mathematics – Computer Science DepartmentInstitute of Mathematics – Computer Science Department
Federal Federal University University
of of Rio de Rio de
JaneiroJaneiro
Approximate Spatial Query Processing Using Raster Signatures 2
Common Spatial QueriesCommon Spatial Queries
Area of polygonArea of polygon Area of polygon within windowArea of polygon within window Spatial Joins Spatial Joins
polygon polygon polygon, polygon polygon, polygon polyline & polyline polyline & polyline polylinepolyline
DistanceDistance BufferBuffer PerimeterPerimeter Topological queriesTopological queries
Approximate Spatial Query Processing Using Raster Signatures 3
Common Spatial QueriesCommon Spatial Queries
ApproximateApproximate Area of polygon Area of polygon ApproximateApproximate Area of polygon within Area of polygon within
windowwindow ApproximateApproximate Spatial Joins Spatial Joins
polygon polygon polygon, polygon polygon, polygon polyline & polyline polyline & polyline polylinepolyline
ApproximateApproximate Distance Distance ApproximateApproximate Buffer Buffer ApproximateApproximate Perimeter Perimeter ApproximateApproximate Topological queries Topological queries
Approximate Spatial Query Processing Using Raster Signatures 4
Approximate Answers to Spatial QueriesApproximate Answers to Spatial Queries
What is an approximate What is an approximate answer?answer?If the “exact” result is a number, If the “exact” result is a number,
the approximate result will be a the approximate result will be a number and a confidence number and a confidence intervalinterval
If not, the graphical display of If not, the graphical display of approximate answers is approximate answers is something like a “fuzzy” mapsomething like a “fuzzy” map
Approximate Spatial Query Processing Using Raster Signatures 5
The increase of storage capacityThe increase of storage capacity The decrease of hardware costsThe decrease of hardware costs Disk access time is still highDisk access time is still high Complex queriesComplex queries Data stored in devices that are Data stored in devices that are
not on-line.not on-line.
A query may take A query may take minutes or hours to be minutes or hours to be
processed.processed.
MotivationMotivation
Approximate Spatial Query Processing Using Raster Signatures 6
MotivationMotivation
Approximate answer may be Approximate answer may be enoughenough““exact” answers are itself approximationsexact” answers are itself approximationsApproximate answers can be computed quicklyApproximate answers can be computed quickly
Spatial query processing:Spatial query processing:ScaleScaleQualityQualityRound-off errorsRound-off errors
Approximate Spatial Query Processing Using Raster Signatures 7
Decision Support System Decision Support System Increasing business competitivenessIncreasing business competitiveness More use of accumulated dataMore use of accumulated data
Data mining Data mining During drill down query sequence in ad-hoc data miningDuring drill down query sequence in ad-hoc data mining Earlier queries in a sequence can be used to find out the Earlier queries in a sequence can be used to find out the
interesting queries.interesting queries. Data warehouseData warehouse
Performance and scalability when accessing very large Performance and scalability when accessing very large volumes of data during the analysis process.volumes of data during the analysis process.
Scenarios and ApplicationsScenarios and Applications
Approximate Spatial Query Processing Using Raster Signatures 8
Query optimizationQuery optimization To define the most efficient access plan for a given queryTo define the most efficient access plan for a given query
Distributed data recording and Distributed data recording and warehousing environments warehousing environments Data may be remote, and even may be unavailableData may be remote, and even may be unavailable Old data can be disposed in order to make room for new Old data can be disposed in order to make room for new
ones. Therefore it becomes impossible to answer to ones. Therefore it becomes impossible to answer to queries on deleted information.queries on deleted information.
Scenarios and ApplicationsScenarios and Applications
Approximate Spatial Query Processing Using Raster Signatures 9
Mobile computingMobile computing An approximate answer may be an alternative:An approximate answer may be an alternative:
• When the data is not availableWhen the data is not available• To save storage space To save storage space
Scenarios and ApplicationsScenarios and Applications
Approximate Spatial Query Processing Using Raster Signatures 10
Data environment set-Data environment set-up for providing up for providing
approximate answersapproximate answers
New data
Queries
Responses
Approx. Query Engine
A framework for approximate query processingA framework for approximate query processing
Database
Approximate Spatial Query Processing Using Raster Signatures 11
Four Color Raster Signature (4CRS)Four Color Raster Signature (4CRS)
Raster approximation Raster approximation (VLDB98)(VLDB98) Object representation upon a grid of cells.Object representation upon a grid of cells. Each cell stores relevant information using few bits.Each cell stores relevant information using few bits. Grid resolution can be changedGrid resolution can be changed
• Precision Precision storage requirements storage requirements 4 types of cells4 types of cells
Bit valueBit value Cell Cell typetype
DescriptionDescription
0000 EmptyEmpty The cell is not intersected by the polygonThe cell is not intersected by the polygon
0101 WeakWeak The cell contains an intersection of 50% or less The cell contains an intersection of 50% or less with the polygonwith the polygon
1010 StrongStrong The cell contains an intersection of more than The cell contains an intersection of more than 50% with the polygon and less than 100%50% with the polygon and less than 100%
1111 FullFull The cell is fully occupied by the polygonThe cell is fully occupied by the polygon
24th VLDB Conference New York, USA, 1998 12
Polygon
4CRS
4CRS Approximation Construction of Signatures
4CRS Approximation Construction of Signatures
Approximate Spatial Query Processing Using Raster Signatures 13
Polygon approximate areaPolygon approximate area
The algorithm is based on the sum of the The algorithm is based on the sum of the expectedexpected area of each cell grid area of each cell grid Empty cells: 0%Empty cells: 0% Full cells: 100%Full cells: 100% Weak and Strong cells Weak and Strong cells supposing uniform distribution supposing uniform distribution
• Weak cells: (0, 0.5] interval Weak cells: (0, 0.5] interval mean 0.25 mean 0.25• Strong cells: (0.5, 1) interval Strong cells: (0.5, 1) interval mean 0.75 mean 0.75
Count the number of each cell type in the Count the number of each cell type in the polygon’s 4CRS, and multiply these values polygon’s 4CRS, and multiply these values by the presumed cell area.by the presumed cell area.
Approximate Spatial Query Processing Using Raster Signatures 14
A measure of answer accuracyA measure of answer accuracy The polygon area inside weak or strong cell is The polygon area inside weak or strong cell is
assumed to be uniformly distributed.assumed to be uniformly distributed.
Weak cellsWeak cells
Strong cellsStrong cells
Using Central Limit Theorem Using Central Limit Theorem confidence confidence intervalinterval 95%95% 99%99%
25.02
)05.0(
48
1
12
5.01 22
75.0
2
)5.01(
NN 96.1NN 576.2
48
1
12
05.0 22
Confidence intervalConfidence interval
Approximate Spatial Query Processing Using Raster Signatures 15
Confidence interval (example)Confidence interval (example)
QQuery resultsuery results # weak cells: 100# weak cells: 100 # strong cells: 120# strong cells: 120 # full cells: 400# full cells: 400
Confidence interval: 95%Confidence interval: 95% Weak cells: Weak cells:
Strong cells:Strong cells:
Full cells: 400 (full cells have the exact area!)Full cells: 400 (full cells have the exact area!) Total:Total: Error between -1.15% and 1.15%Error between -1.15% and 1.15%
10.39048
12096.175.0120
83.22548
10096.125.0100
93.5515
Approximate Spatial Query Processing Using Raster Signatures 16
Cell Area DistributionCell Area Distribution
Cell Area Distribution - Township dataset
0
500
1000
1500
2000
2500
3000
0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1
Weak Strong
Comparable to an uniform distributionComparable to an uniform distributionVariance: Variance: 0.021369 (U: 0.020833)0.021369 (U: 0.020833)
Mean: 0.246453 (U: 0.25)Mean: 0.246453 (U: 0.25)
Approximate Spatial Query Processing Using Raster Signatures 17
ExampleExample
# empty cells: 55# empty cells: 55 # weak cells: 27# weak cells: 27 # strong cells: 26# strong cells: 26 # full cells: 79# full cells: 79 Approximate area:Approximate area:( ( Σ weak * 0.25 +Σ weak * 0.25 + Σ strong * 0.75 + Σ strong * 0.75 + Σ full ) * cellArea Σ full ) * cellArea
Exact area: 106.40Exact area: 106.40 Appr. area: 105.25Appr. area: 105.25 Error: 1.07%Error: 1.07%
Approximate Spatial Query Processing Using Raster Signatures 18
This algorithm is similar to the approximate This algorithm is similar to the approximate polygon area algorithmpolygon area algorithm
There are two kinds of cell overlap: There are two kinds of cell overlap: The cell may be completely contained by the windowThe cell may be completely contained by the window The cell may be partially contained by the windowThe cell may be partially contained by the window
• proportional to its overlapping areaproportional to its overlapping area
Approximate area of polygon window intersection
Approximate area of polygon window intersection
Approximate Spatial Query Processing Using Raster Signatures 19
Experimental testsExperimental tests
Computer: PC Pentium IV 1,8 GHz, 512 MB RAMComputer: PC Pentium IV 1,8 GHz, 512 MB RAM Page size 2,048 BytesPage size 2,048 Bytes Target: to evaluate the use of 4CRS for Target: to evaluate the use of 4CRS for
approximate query processing against exact approximate query processing against exact query processing related to the following query processing related to the following aspects:aspects: Response timeResponse time Storage requirements Storage requirements AccuracyAccuracy
The algorithms tested were :The algorithms tested were : Polygon approximate areaPolygon approximate area Approximate area of polygon x window intersectionApproximate area of polygon x window intersection
• 100 random windows for each data set (different sizes and positions)100 random windows for each data set (different sizes and positions)
Approximate Spatial Query Processing Using Raster Signatures 20
Use of R*-trees in order to reduce the search space.Use of R*-trees in order to reduce the search space.
Relation A Relation B
SAMs
Candidate pairs
Exact geometryprocessor
$
Response set
Step 1
Step 2
Relation A Relation B
SAMs
Candidate pairs
Approximate query processing
Response set
Step 1
Step 2
4CRS
Experimental testsExperimental tests
Approximate Spatial Query Processing Using Raster Signatures 21
The polygon real data sets used in the experiments The polygon real data sets used in the experiments consist of township boundaries, census block-group, consist of township boundaries, census block-group, topography, geologic map and hydrographic map topography, geologic map and hydrographic map from Iowa (USA), and Brazilian municipalities.from Iowa (USA), and Brazilian municipalities.
Experimental testsExperimental tests
Approximate Spatial Query Processing Using Raster Signatures 22
Approximate polygon areaApproximate polygon area
Execution time
1,69
3,26
0,32
1,550,95
0,39
7,82
25,51
2,03
3,132,52
1,18
0
2
4
6
8
10
12
14
Censusblock-group
Topograp. Hydrologic Townshipboundaries
Geologic map Municip.
Appr
Exact
Approximate Spatial Query Processing Using Raster Signatures 23
Approximate polygon areaApproximate polygon area
# Disk Access
259 69439 193 154 77
14,552
3876
8753
5351
3190
0
2000
4000
6000
8000
10000
12000
14000
16000
Census block-
group
Topograp. Hydrologic Township
boundaries
Geologic map Municip.
Appr
Exact
Approximate Spatial Query Processing Using Raster Signatures 24
Approximate polygon window areaApproximate polygon window area
Execution tim e
8.95 7.58
0.945.29 4.23 2.78
25.807
53.768 54.769
24.665
0
10
20
30
40
50
60
Census blk-group
Topogr. Hydrolog. Tow nshipboundar.
Geologic Municip.
Appr
Exact
Approximate Spatial Query Processing Using Raster Signatures 25
Approximate polygon window areaApproximate polygon window area
# Disk Access
4,8737,576
885 3,250 2,811 556
63,182
17,168
32,251 30,727
11,629
0
10,000
20,000
30,000
40,000
50,000
60,000
70,000
80,000
90,000
Census blk-group
Topogr. Hydrolog. Townshipboundar.
Geologic Municip.
Appr.
Exact
Approximate Spatial Query Processing Using Raster Signatures 26
ConclusionConclusion
The experimental results demonstrated the The experimental results demonstrated the efficiency of the 4CRS use for approximate query efficiency of the 4CRS use for approximate query processing.processing. Storage requirementsStorage requirements
• 4CRS has an average of 3.75% of the real data set size4CRS has an average of 3.75% of the real data set size
AccuracyAccuracy• Approximate area: average error of 2.62%Approximate area: average error of 2.62%• Window query approximate area: average error of 1%Window query approximate area: average error of 1%
Response timeResponse time• Approximate area: average 28.41%Approximate area: average 28.41%• Window query approximate area: average 7.22%Window query approximate area: average 7.22%
Disk accessDisk access• Approximate area: average 1.90%Approximate area: average 1.90%• Window query approximate area: average 7.04%Window query approximate area: average 7.04%
Approximate Spatial Query Processing Using Raster Signatures 27
Future worksFuture works
Algorithms for the other operationsAlgorithms for the other operationsApproximate area of polygon x polygon Approximate area of polygon x polygon
intersection algorithm is being evaluatedintersection algorithm is being evaluated
Use of approximations for mobile Use of approximations for mobile computingcomputing