approximate spatial query processing using raster signatures leonardo guerreiro azevedo, rodrigo...

27
Approximate Spatial Query Approximate Spatial Query Processing Using Processing Using Raster Signatures Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Monteiro, Geraldo Zimbrão & Jano Moreira de Souza Souza Coppe – Graduate School of Engineering Coppe – Graduate School of Engineering Institute of Mathematics – Computer Science Department Institute of Mathematics – Computer Science Department Federal Federal University of University of Rio de Rio de Janeiro Janeiro

Upload: alex-jordan

Post on 27-Mar-2015

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Approximate Spatial Query Processing Using Processing Using Raster SignaturesRaster Signatures

Approximate Spatial Query Approximate Spatial Query Processing Using Processing Using Raster SignaturesRaster Signatures

Leonardo Guerreiro Azevedo, Rodrigo Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Salvador Monteiro, Geraldo Zimbrão &

Jano Moreira de SouzaJano Moreira de SouzaCoppe – Graduate School of EngineeringCoppe – Graduate School of Engineering

Institute of Mathematics – Computer Science DepartmentInstitute of Mathematics – Computer Science Department

Federal Federal University University

of of Rio de Rio de

JaneiroJaneiro

Page 2: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 2

Common Spatial QueriesCommon Spatial Queries

Area of polygonArea of polygon Area of polygon within windowArea of polygon within window Spatial Joins Spatial Joins

polygon polygon polygon, polygon polygon, polygon polyline & polyline polyline & polyline polylinepolyline

DistanceDistance BufferBuffer PerimeterPerimeter Topological queriesTopological queries

Page 3: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 3

Common Spatial QueriesCommon Spatial Queries

ApproximateApproximate Area of polygon Area of polygon ApproximateApproximate Area of polygon within Area of polygon within

windowwindow ApproximateApproximate Spatial Joins Spatial Joins

polygon polygon polygon, polygon polygon, polygon polyline & polyline polyline & polyline polylinepolyline

ApproximateApproximate Distance Distance ApproximateApproximate Buffer Buffer ApproximateApproximate Perimeter Perimeter ApproximateApproximate Topological queries Topological queries

Page 4: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 4

Approximate Answers to Spatial QueriesApproximate Answers to Spatial Queries

What is an approximate What is an approximate answer?answer?If the “exact” result is a number, If the “exact” result is a number,

the approximate result will be a the approximate result will be a number and a confidence number and a confidence intervalinterval

If not, the graphical display of If not, the graphical display of approximate answers is approximate answers is something like a “fuzzy” mapsomething like a “fuzzy” map

Page 5: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 5

The increase of storage capacityThe increase of storage capacity The decrease of hardware costsThe decrease of hardware costs Disk access time is still highDisk access time is still high Complex queriesComplex queries Data stored in devices that are Data stored in devices that are

not on-line.not on-line.

A query may take A query may take minutes or hours to be minutes or hours to be

processed.processed.

MotivationMotivation

Page 6: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 6

MotivationMotivation

Approximate answer may be Approximate answer may be enoughenough““exact” answers are itself approximationsexact” answers are itself approximationsApproximate answers can be computed quicklyApproximate answers can be computed quickly

Spatial query processing:Spatial query processing:ScaleScaleQualityQualityRound-off errorsRound-off errors

Page 7: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 7

Decision Support System Decision Support System Increasing business competitivenessIncreasing business competitiveness More use of accumulated dataMore use of accumulated data

Data mining Data mining During drill down query sequence in ad-hoc data miningDuring drill down query sequence in ad-hoc data mining Earlier queries in a sequence can be used to find out the Earlier queries in a sequence can be used to find out the

interesting queries.interesting queries. Data warehouseData warehouse

Performance and scalability when accessing very large Performance and scalability when accessing very large volumes of data during the analysis process.volumes of data during the analysis process.

Scenarios and ApplicationsScenarios and Applications

Page 8: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 8

Query optimizationQuery optimization To define the most efficient access plan for a given queryTo define the most efficient access plan for a given query

Distributed data recording and Distributed data recording and warehousing environments warehousing environments Data may be remote, and even may be unavailableData may be remote, and even may be unavailable Old data can be disposed in order to make room for new Old data can be disposed in order to make room for new

ones. Therefore it becomes impossible to answer to ones. Therefore it becomes impossible to answer to queries on deleted information.queries on deleted information.

Scenarios and ApplicationsScenarios and Applications

Page 9: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 9

Mobile computingMobile computing An approximate answer may be an alternative:An approximate answer may be an alternative:

• When the data is not availableWhen the data is not available• To save storage space To save storage space

Scenarios and ApplicationsScenarios and Applications

Page 10: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 10

Data environment set-Data environment set-up for providing up for providing

approximate answersapproximate answers

New data

Queries

Responses

Approx. Query Engine

A framework for approximate query processingA framework for approximate query processing

Database

Page 11: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 11

Four Color Raster Signature (4CRS)Four Color Raster Signature (4CRS)

Raster approximation Raster approximation (VLDB98)(VLDB98) Object representation upon a grid of cells.Object representation upon a grid of cells. Each cell stores relevant information using few bits.Each cell stores relevant information using few bits. Grid resolution can be changedGrid resolution can be changed

• Precision Precision storage requirements storage requirements 4 types of cells4 types of cells

Bit valueBit value Cell Cell typetype

DescriptionDescription

0000 EmptyEmpty The cell is not intersected by the polygonThe cell is not intersected by the polygon

0101 WeakWeak The cell contains an intersection of 50% or less The cell contains an intersection of 50% or less with the polygonwith the polygon

1010 StrongStrong The cell contains an intersection of more than The cell contains an intersection of more than 50% with the polygon and less than 100%50% with the polygon and less than 100%

1111 FullFull The cell is fully occupied by the polygonThe cell is fully occupied by the polygon

Page 12: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

24th VLDB Conference New York, USA, 1998 12

Polygon

4CRS

4CRS Approximation Construction of Signatures

4CRS Approximation Construction of Signatures

Page 13: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 13

Polygon approximate areaPolygon approximate area

The algorithm is based on the sum of the The algorithm is based on the sum of the expectedexpected area of each cell grid area of each cell grid Empty cells: 0%Empty cells: 0% Full cells: 100%Full cells: 100% Weak and Strong cells Weak and Strong cells supposing uniform distribution supposing uniform distribution

• Weak cells: (0, 0.5] interval Weak cells: (0, 0.5] interval mean 0.25 mean 0.25• Strong cells: (0.5, 1) interval Strong cells: (0.5, 1) interval mean 0.75 mean 0.75

Count the number of each cell type in the Count the number of each cell type in the polygon’s 4CRS, and multiply these values polygon’s 4CRS, and multiply these values by the presumed cell area.by the presumed cell area.

Page 14: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 14

A measure of answer accuracyA measure of answer accuracy The polygon area inside weak or strong cell is The polygon area inside weak or strong cell is

assumed to be uniformly distributed.assumed to be uniformly distributed.

Weak cellsWeak cells

Strong cellsStrong cells

Using Central Limit Theorem Using Central Limit Theorem confidence confidence intervalinterval 95%95% 99%99%

25.02

)05.0(

48

1

12

5.01 22

75.0

2

)5.01(

NN 96.1NN 576.2

48

1

12

05.0 22

Confidence intervalConfidence interval

Page 15: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 15

Confidence interval (example)Confidence interval (example)

QQuery resultsuery results # weak cells: 100# weak cells: 100 # strong cells: 120# strong cells: 120 # full cells: 400# full cells: 400

Confidence interval: 95%Confidence interval: 95% Weak cells: Weak cells:

Strong cells:Strong cells:

Full cells: 400 (full cells have the exact area!)Full cells: 400 (full cells have the exact area!) Total:Total: Error between -1.15% and 1.15%Error between -1.15% and 1.15%

10.39048

12096.175.0120

83.22548

10096.125.0100

93.5515

Page 16: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 16

Cell Area DistributionCell Area Distribution

Cell Area Distribution - Township dataset

0

500

1000

1500

2000

2500

3000

0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1

Weak Strong

Comparable to an uniform distributionComparable to an uniform distributionVariance: Variance: 0.021369 (U: 0.020833)0.021369 (U: 0.020833)

Mean: 0.246453 (U: 0.25)Mean: 0.246453 (U: 0.25)

Page 17: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 17

ExampleExample

# empty cells: 55# empty cells: 55 # weak cells: 27# weak cells: 27 # strong cells: 26# strong cells: 26 # full cells: 79# full cells: 79 Approximate area:Approximate area:( ( Σ weak * 0.25 +Σ weak * 0.25 + Σ strong * 0.75 + Σ strong * 0.75 + Σ full ) * cellArea Σ full ) * cellArea

Exact area: 106.40Exact area: 106.40 Appr. area: 105.25Appr. area: 105.25 Error: 1.07%Error: 1.07%

Page 18: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 18

This algorithm is similar to the approximate This algorithm is similar to the approximate polygon area algorithmpolygon area algorithm

There are two kinds of cell overlap: There are two kinds of cell overlap: The cell may be completely contained by the windowThe cell may be completely contained by the window The cell may be partially contained by the windowThe cell may be partially contained by the window

• proportional to its overlapping areaproportional to its overlapping area

Approximate area of polygon window intersection

Approximate area of polygon window intersection

Page 19: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 19

Experimental testsExperimental tests

Computer: PC Pentium IV 1,8 GHz, 512 MB RAMComputer: PC Pentium IV 1,8 GHz, 512 MB RAM Page size 2,048 BytesPage size 2,048 Bytes Target: to evaluate the use of 4CRS for Target: to evaluate the use of 4CRS for

approximate query processing against exact approximate query processing against exact query processing related to the following query processing related to the following aspects:aspects: Response timeResponse time Storage requirements Storage requirements AccuracyAccuracy

The algorithms tested were :The algorithms tested were : Polygon approximate areaPolygon approximate area Approximate area of polygon x window intersectionApproximate area of polygon x window intersection

• 100 random windows for each data set (different sizes and positions)100 random windows for each data set (different sizes and positions)

Page 20: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 20

Use of R*-trees in order to reduce the search space.Use of R*-trees in order to reduce the search space.

Relation A Relation B

SAMs

Candidate pairs

Exact geometryprocessor

$

Response set

Step 1

Step 2

Relation A Relation B

SAMs

Candidate pairs

Approximate query processing

Response set

Step 1

Step 2

4CRS

Experimental testsExperimental tests

Page 21: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 21

The polygon real data sets used in the experiments The polygon real data sets used in the experiments consist of township boundaries, census block-group, consist of township boundaries, census block-group, topography, geologic map and hydrographic map topography, geologic map and hydrographic map from Iowa (USA), and Brazilian municipalities.from Iowa (USA), and Brazilian municipalities.

Experimental testsExperimental tests

Page 22: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 22

Approximate polygon areaApproximate polygon area

Execution time

1,69

3,26

0,32

1,550,95

0,39

7,82

25,51

2,03

3,132,52

1,18

0

2

4

6

8

10

12

14

Censusblock-group

Topograp. Hydrologic Townshipboundaries

Geologic map Municip.

Appr

Exact

Page 23: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 23

Approximate polygon areaApproximate polygon area

# Disk Access

259 69439 193 154 77

14,552

3876

8753

5351

3190

0

2000

4000

6000

8000

10000

12000

14000

16000

Census block-

group

Topograp. Hydrologic Township

boundaries

Geologic map Municip.

Appr

Exact

Page 24: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 24

Approximate polygon window areaApproximate polygon window area

Execution tim e

8.95 7.58

0.945.29 4.23 2.78

25.807

53.768 54.769

24.665

0

10

20

30

40

50

60

Census blk-group

Topogr. Hydrolog. Tow nshipboundar.

Geologic Municip.

Appr

Exact

Page 25: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 25

Approximate polygon window areaApproximate polygon window area

# Disk Access

4,8737,576

885 3,250 2,811 556

63,182

17,168

32,251 30,727

11,629

0

10,000

20,000

30,000

40,000

50,000

60,000

70,000

80,000

90,000

Census blk-group

Topogr. Hydrolog. Townshipboundar.

Geologic Municip.

Appr.

Exact

Page 26: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 26

ConclusionConclusion

The experimental results demonstrated the The experimental results demonstrated the efficiency of the 4CRS use for approximate query efficiency of the 4CRS use for approximate query processing.processing. Storage requirementsStorage requirements

• 4CRS has an average of 3.75% of the real data set size4CRS has an average of 3.75% of the real data set size

AccuracyAccuracy• Approximate area: average error of 2.62%Approximate area: average error of 2.62%• Window query approximate area: average error of 1%Window query approximate area: average error of 1%

Response timeResponse time• Approximate area: average 28.41%Approximate area: average 28.41%• Window query approximate area: average 7.22%Window query approximate area: average 7.22%

Disk accessDisk access• Approximate area: average 1.90%Approximate area: average 1.90%• Window query approximate area: average 7.04%Window query approximate area: average 7.04%

Page 27: Approximate Spatial Query Processing Using Raster Signatures Leonardo Guerreiro Azevedo, Rodrigo Salvador Monteiro, Geraldo Zimbrão & Jano Moreira de Souza

Approximate Spatial Query Processing Using Raster Signatures 27

Future worksFuture works

Algorithms for the other operationsAlgorithms for the other operationsApproximate area of polygon x polygon Approximate area of polygon x polygon

intersection algorithm is being evaluatedintersection algorithm is being evaluated

Use of approximations for mobile Use of approximations for mobile computingcomputing