spatial indexing sams. spatial indexing point access methods can index only points. what about...

45
Spatial Indexing SAMs

Post on 19-Dec-2015

238 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

Spatial Indexing

SAMs

Page 2: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

Spatial Indexing Point Access Methods can index only

points. What about regions? Z-ordering and quadtrees Use the transformation technique and a

PAM New methods: Spatial Access Methods

SAMs R-tree and variations

Page 3: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

Problem Given a collection of geometric

objects (points, lines, polygons, ...) organize them on disk, to answer

spatial queries (range, nn, etc)

Page 4: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

Transformation Technique Map an d-dim MBR into a point: ex. [(xmin, xmax) (ymin, ymax)] =>

(xmin, xmax, ymin, ymax) Use a PAM to index the 2d points Given a range query, map the

query into the 2d space and use the PAM to answer it

Page 5: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-tree

[Guttman 84] Main idea: allow parents to overlap! => guaranteed 50% utilization => easier insertion/split algorithms. (only deal with Minimum Bounding

Rectangles - MBRs)

Page 6: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-tree A multi-way external memory tree Index nodes and data (leaf) nodes All leaf nodes appear on the same level Every node contains between m and M

entries The root node has at least 2 entries

(children)

Page 7: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

Example

eg., w/ fanout 4: group nearby rectangles to parent MBRs; each group -> disk page

A

B

C

DE

FG

H

J

I

Page 8: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

Example F=4

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

F GD E

H I JA B C

Page 9: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

Example F=4| m=2, M=4

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

P1 P2 P3 P4

F GD E

H I JA B C

P5 P6

P5

P6

Page 10: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees - format of nodes

{(MBR; obj_ptr)} for leaf nodes

P1 P2 P3 P4

A B C

x-low; x-highy-low; y-high

...

objptr ...

Page 11: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees - format of nodes

{(MBR; node_ptr)} for non-leaf nodes

P1 P2 P3 P4

A B C

x-low; x-highy-low; y-high

...nodeptr ...

Page 12: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees:Search

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

P1 P2 P3 P4

F GD E

H I JA B C

P5 P6

Page 13: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees:Search

P1 P2 P3 P4

F GD E

H I JA B C

P5 P6

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

Page 14: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees:Search

Main points: every parent node completely covers its

‘children’ a child MBR may be covered by more than

one parent - it is stored under ONLY ONE of them. (ie., no need for dup. elim.)

a point query may follow multiple branches. everything works for any(?) dimensionality

Page 15: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees:Insertion

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

P1 P2 P3 P4

F GD E

H I JA B CX

X

Insert X

Page 16: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees:Insertion

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

P1 P2 P3 P4

F GD E

H I JA B CY

Insert Y

Page 17: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees:Insertion

Extend the parent MBR

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

P1 P2 P3 P4

F GD E

H I JA B CY

Y

Page 18: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees:Insertion How to find the next node to insert

the new object? Using ChooseLeaf: Find the entry that

needs the least enlargement to include Y. Resolve ties using the area (smallest)

Other methods (later)

Page 19: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees:Insertion

If node is full then Split : ex. Insert w

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

P1 P2 P3 P4

F GD E

H I JA B C

W

K

K

Page 20: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees:Insertion

If node is full then Split : ex. Insert w

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

Q1 Q2

F G

D E

H I J

A BW

K

C K W

P5 P1 P5 P2 P3 P4

Q1Q2

Page 21: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees:Split

Split node P1: partition the MBRs into two groups.

A

B

CW

KP1

• (A1: plane sweep,

until 50% of rectangles)

• A2: ‘linear’ split

• A3: quadratic split

• A4: exponential split:

2M-1 choices

Page 22: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees:Split pick two rectangles as ‘seeds’; assign each rectangle ‘R’ to the ‘closest’ ‘seed’

seed1

seed2

R

Page 23: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees:Split pick two rectangles as ‘seeds’; assign each rectangle ‘R’ to the ‘closest’

‘seed’: ‘closest’: the smallest increase in area

seed1

seed2

R

Page 24: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees:Split How to pick Seeds:

Linear:Find the highest and lowest side in each dimension, normalize the separations, choose the pair with the greatest normalized separation

Quadratic: For each pair E1 and E2, calculate the rectangle J=MBR(E1, E2) and d= J-E1-E2. Choose the pair with the largest d

Page 25: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees:Insertion Use the ChooseLeaf to find the

leaf node to insert an entry E If leaf node is full, then Split,

otherwise insert there Propagate the split upwards, if

necessary Adjust parent nodes

Page 26: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-Trees:Deletion Find the leaf node that contains the entry E Remove E from this node If underflow:

Eliminate the node by removing the node entries and the parent entry

Reinsert the orphaned (other entries) into the tree using Insert

Other method (later)

Page 27: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees: Variations R+-tree: DO not allow overlapping, so

split the objects (similar to z-values) R*-tree: change the insertion, deletion

algorithms (minimize not only area but also perimeter, forced re-insertion )

Hilbert R-tree: use the Hilbert values to insert objects into the tree

Page 28: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

Spatial Access Methods PAMs

Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree

R-tree Variations: R*-tree, Hilbert R-tree

Page 29: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-tree

A

B

C

DE

FG

H

I

J

P1

P2

P3

P4

P1 P2 P3 P4

F GD E

H I JA B C

Multi-way external memory structure, indexes MBRsDynamic structure

Page 30: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-tree The original R-tree tries to

minimize the area of each enclosing rectangle in the index nodes.

Is there any other property that can be optimized?

R*-tree Yes!

Page 31: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R*-tree Optimization Criteria:

(O1) Area covered by an index MBR (O2) Overlap between directory MBRs (O3) Margin of a directory rectangle (O4) Storage utilization

Sometimes it is impossible to optimize all the above criteria at the same time!

Page 32: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R*-tree

ChooseSubtree: If next node is a leaf node, choose the

node using the following criteria: Least overlap enlargement Least area enlargement Smaller area

Else Least area enlargement Smaller area

Page 33: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R*-tree SplitNode

Choose the axis to split Choose the two groups along the chosen axis

ChooseSplitAxis Along each axis, sort rectangles and break

them into two groups (M-2m+2 possible ways where one group contains at least m rectangles). Compute the sum S of all margin-values (perimeters) of each pair of groups. Choose the one that minimizes S

ChooseSplitIndex Along the chosen axis, choose the grouping

that gives the minimum overlap-value

Page 34: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R*-tree Forced Reinsert:

defer splits, by forced-reinsert, i.e.: instead of splitting, temporarily delete some entries, shrink overflowing MBR, and re-insert those entries

Which ones to re-insert? How many? A: 30%

Page 35: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-tree: variations What about static datasets?

(no ins/del) Hilbert What about other bounding

shapes?

Page 36: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees - variations

what about static datasets (no ins/del/upd)?

Q: Best way to pack points?

Page 37: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees - variations what about static datasets (no

ins/del/upd)? Q: Best way to pack points? A1: plane-sweep great for queries on ‘x’; terrible for ‘y’

Page 38: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees - variations what about static datasets (no

ins/del/upd)? Q: Best way to pack points? A1: plane-sweep great for queries on ‘x’; bad for ‘y’

Page 39: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees - variations what about static datasets (no

ins/del/upd)? Q: Best way to pack points? A1: plane-sweep great for queries on ‘x’; terrible for ‘y’ Q: how to improve?

Page 40: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees - variations A: plane-sweep on HILBERT curve!

Page 41: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees - variations

A: plane-sweep on HILBERT curve! In fact, it can be made dynamic

(how?), as well as to handle regions (how?)

Page 42: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees - variations Dynamic (‘Hilbert

R-tree): each point has an

‘h’-value (hilbert value)

insertions: like a B-tree on the h-value

but also store MBR, for searches

Page 43: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

Hilbert R-tree

Data structure of a node?

LHV x-low, ylowx-high, y-high

ptr

h-value >= LHV &MBRs: inside parent MBR

Page 44: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees - variations Data structure of a node?

LHV x-low, ylowx-high, y-high

ptr

h-value >= LHV &MBRs: inside parent MBR

~B-tree

Page 45: Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation

R-trees - variations Data structure of a node?

LHV x-low, ylowx-high, y-high

ptr

h-value >= LHV &MBRs: inside parent MBR

~ R-tree