r-trees extension of b+-trees. collection of d-dimensional rectangles. a point in d-dimensions is...

44
R-Trees Extension of B+-trees. Collection of d-dimensional rectangles. A point in d-dimensions is a trivial rectangle.

Upload: hester-lamb

Post on 24-Dec-2015

215 views

Category:

Documents


0 download

TRANSCRIPT

R-Trees

• Extension of B+-trees. Collection of d-dimensional rectangles. A point in d-dimensions is a trivial rectangle.

Non-rectangular Data

• Non-rectangular data may be represented by minimum bounding rectangles (MBRs).

Operations

• Insert

• Delete

• Find all rectangles that intersect a query rectangle.

• Good for large rectangle collections stored on disk.

R-Trees—Structure

• Data nodes (leaves) contain rectangles.

• Index nodes (non-leaves) contain MBRs for data in subtrees.

• MBR for rectangles or MBRs in a non-root node is stored in parent node.

R-Trees—Structure

• R-tree of order M. Each node other than the root has between m

<= ceil(M/2) and M rectangles/MBRs.• Assume m = ceil(M/2) henceforth.

Typically, m = ceil(M/2). Root has between 2 and M rectangles/MBRs. Each index node has as many MBRs as

children. All data nodes are at the same level.

Example• R-tree of order 4.

Each node may have up to 4 rectangles/MBRs.

Example• Possible partitioning of our example data

into 12 leaves.

Example• Possible R-tree of order 4 with 12 leaves.

a b c d e f g h i j k l

m n o p

Leaves are data nodes that contain 4 input rectangles each.

a-p are MBRs

Example• Possible corresponding

grouping.

a

b

cd

m

a b cd e f g h i j k l

m n o p

Example

a

b

cd

me fn

• Possible corresponding grouping.

a b cd e f g h i j k l

m n o p

Example

a

b

cd

me fn

h

g

i

o p

• Possible corresponding grouping.

a b cd e f g h i j k l

m n o p

Query• Report all rectangles that intersect a given

rectangle.

Query

• Start at root and find all MBRs that overlap query.• Search corresponding subtrees recursively.

Query

mn

o p

x

a b cd e f g h i j k l

m n o p

Query• Search m.

mn

o p

a

b

cd

x x

a b cd e f g h i j k l

m n o p

Insert• Similar to insertion into B+-tree but may

insert into any leaf; leaf splits in case capacity exceeded. Which leaf to insert into? How to split a node?

Insert—Leaf Selection• Follow a path from root to leaf.• At each node move into subtree whose MBR area

increases least with addition of new rectangle.

mn

o p

Insert—Leaf Selection• Insert into m.

m

Insert—Leaf Selection• Insert into n.

n

Insert—Leaf Selection• Insert into o.

o

Insert—Leaf Selection• Insert into p.

p

Insert—Split A Node• Split set of M+1 rectangles/MBRs into 2 sets A

and B. A and B each have at least m rectangles/MBRs. Sum of areas of MBRs of A and B is minimum.

M = 8, m = 4

Insert—Split A Node• Split set of M+1 rectangles/MBRs into 2 sets A

and B. A and B each have at least m rectangles/MBRs. Sum of areas of MBRs of A and B is minimum.

M = 8, m = 4

Insert—Split A Node• Split set of M+1 rectangles/MBRs into 2 sets A

and B. A and B each have at least m rectangles/MBRs. Sum of areas of MBRs of A and B is minimum.

M = 8, m = 4

Insert—Split A Node• Exhaustive search for best A and B.

Compute area(MBR(A)) + area(MBR(B)) for each possible A.

Note—for each A, the B is unique. Select partition that minimizes this sum.

• When |A| = m = ceil(M/2), number of choices for A is

(M+1)!

m!(M+1-m)!

Impractical for large M.

Insert—Split A Node• Grow A and B using a clustering strategy.

Start with a seed rectangle a for A and b for B. Grow A and B one rectangle at a time. Stop when the M+1 rectangles have been

partitioned into A and B.

Insert—Split A Node• Quadratic Method—seed selection.

Let S be the set of M+1 rectangles to be partitioned.

Find a and b inS that maximizearea(MBR(a,b)) – area(a) – area(b)

M = 8, m = 4

Insert—Split A Node• Quadratic Method—seed selection.

Let S be the set of M+1 rectangles to be partitioned.

Find a and b inS that maximizearea(MBR(a,b)) – area(a) – area(b)

M = 8, m = 4

Insert—Split A Node• Quadratic Method—assign remaining

rectangles/MBRs. Find an unassigned rectangle c that maximizes

|area(MBR(A,c)) – area(MBR(A))

- (area(MBR(B,c)) – area(MBR(B)))|

M = 8, m = 4

Insert—Split A Node• Quadratic Method—assign remaining

rectangles/MBRs. Find an unassigned rectangle c that maximizes

|area(MBR(A,c)) – area(MBR(A))

- (area(MBR(B,c)) – area(MBR(B)))|

M = 8, m = 4

Insert—Split A Node• Quadratic Method—assign remaining

rectangles/MBRs. Assign c to partition whose area increases least.

M = 8, m = 4

Insert—Split A Node• Quadratic Method—assign remaining rectangles/MBRs.

Continue assigning in this way until all remaining rectangles must necessarily be assigned to one of the two partitions for that partition to have m rectangles.

M = 8, m = 4

Insert—Split A Node• Linear Method—seed selection.

Choose a and b to have maximum normalized separation.

M = 8, m = 4

Insert—Split A Node• Linear Method—seed selection.

Choose a and b to have maximum normalized separation.

M = 8, m = 4

Separation in x-dimension

Insert—Split A Node• Linear Method—seed selection.

Choose a and b to have maximum normalized separation.

M = 8, m = 4

Rectangles with max x-separation

Insert—Split A Node• Linear Method—seed selection.

Choose a and b to have maximum normalized separation.

M = 8, m = 4

Divide by x-width to normalize

Insert—Split A Node• Linear Method—seed selection.

Choose a and b to have maximum normalized separation.

M = 8, m = 4

Separation in y-dimension

Insert—Split A Node• Linear Method—seed selection.

Choose a and b to have maximum normalized separation.

M = 8, m = 4

Rectangles with max y-separation

Insert—Split A Node• Linear Method—seed selection.

Choose a and b to have maximum normalized separation.

M = 8, m = 4

Divide by y-width to normalize

Insert—Split A Node• Linear Method—assign remainder.

Assign remaining rectangles in random order. Rectangle is assigned to partition whose MBR area

increases least. Stop when all remaining rectangles must be assigned to

one of the partitions so that the partition has its minimum required m rectangles.

M = 8, m = 4

Delete

• If leaf doesn’t become deficient, simply readjust MBRs in path from root.

• If leaf becomes deficient, get from nearest sibling (if possible) and readjust MBRs.

• Combine with sibling as in B+ tree.

• Could instead do a more global reorganization to get better R-tree.

Variants

• R*-tree Leaf selection and node overflows in insertion

handled differently.

• Hilbert R-tree

Related Structures

• R+-tree Index nodes have non-overlapping rectangles. A data object may be represented in several

data nodes. No upper bound on size of a data node. No bounds (lower/upper) on degree of an index

node.

Related Structures

• Cell tree Combines BSP and R+-tree concepts. Index nodes have non-overlapping convex

polyhedrons. No lower/upper bound on size of a data node. Lower bound (but not upper) on degree of an

index node.