computational geometrygerth/emf03/slides/buffertrees.pdf · 2003. 9. 17. · priority queues...
TRANSCRIPT
Buffer Trees
Lars Arge. The Buffer Tree: A New Technique for Optimal I/O
Algorithms. In Proceedings of Fourth Workshop on Algorithms and
Data Structures (WADS), Lecture Notes in Computer Science
Vol. 955, Springer-Verlag, 1995, 334-345.
1
Computational Geometry
2
Pairwise Rectangle Intersection
A
B
C
E
D
F
Input N rectangles
Output all R pairwise intersections
Example (A, B) (B, C) (B, F ) (D, E) (D, F )
Intersection Types
Intersection Identified by. . .
A
B Orthogonal Line Segment Intersection
on 4N rectangle sides
E
D
Batched Range Searching
on N rectangles and N upper-left corners
Algorithm Orthogonal Line Segment Intersection
+ Batched Range Searching + Duplicate removal
3
Orthogonal Line Segment Intersection
Input N segments, vertical and horizontal
Output all R intersections
sweepline
y1
y2
y3
y4
Sweepline Algorithm
• Sort all endpoints w.r.t. x-coordinate
• Sweep left-to-right with a range tree T
storing the y-coordinates of horizontal
segments intersecting the sweepline
• Left endpoint ⇒ insertion into T
• Right endpoint ⇒ deletion from T
• Vertical segment [y1, y2] ⇒report T ∩ [y1, y2]
Total (internal) time O(N · log2 N + R)
4
Range Trees
Create Create empty structure
Insert(x) Insert element x
Delete(x) Delete the inserted element x
Report(x1, x2) Report all x ∈ [x1, x2] x2x1
Binary search trees B-trees
(internal) (# I/Os)
Updates O(log2 N) O(logB N)
Report O(log2 N + R) O(logB N + RB )
Orthogonal Line Segment Intersection using B-trees
O(Sort(N) + N · logB N + RB ) I/Os . . .
5
Batched Range Searching
Input N rectangles and points
Output all R (r, p) where point p is within rectangle r
sweepline
Sweepline Algorithm• Sort all points and left/right rectangle
sides w.r.t. x-coordinate
• Sweep left-to-right while storing the
y-intervals of rectangles intersecting
the sweepline in a segment tree T
• Left side ⇒ insert interval into T
• Right side ⇒ delete interval from T
• Point (x, y) ⇒ stabbing query :
report all [y1, y2] where y ∈ [y1, y2]
Total (internal) time O(N · log2 N + R)
6
Segment Trees
Create Create empty structure
Insert(x1, x2) Insert segment [x1, x2]
Delete(x1, x2) Delete the inserted segment [x1, x2]
Report(x) Report the segments [x1, x2] where x ∈ [x1, x2]
Assumption The endpoints come from a fixed set S of size N + 1
• Construct a balanced binary tree on the N intervals defined by S
• Each node spans an interval and stores a linked list of intervals
• An interval I is stored at the O(log N) nodes where the node
intervals ⊆ I but the intervals of the parents are not
Create O(N log2 N)
Insert O(log2 N)
Delete O(log2 N)
Report O(log2 N + R)
7
Computational Geometry – Summary
A
B
C
E
D
F
Pairwise Rectangle Intersection
Orthogonal Line Segment Intersection
Batched Range Searching
O(N · log2 N + R)
Range Trees
Segment Trees
Updates O(log2 N)
Queries O(log2 N + R)
8
Observations on Range and Segment Trees
• Only inserted elements are deleted, i.e. Delete does not have to
check if the elements are present in the structure
• Applications are off-line, i.e. amortized performance is sufficient
• Queries to the range trees and segment trees can be answered lazily,
i.e. postpone processing queries until there are sufficient many
queries to be handled simultaneously
• Output can be generated in arbitrary order, i.e. batched queries
• The deletion time of a segment in a segment tree is known when the
segment is inserted, i.e. no explicit delete operation required
Assumptions for buffer trees
9
Buffer Trees
“General transformation”
=⇒
On-line Internal Batched External
10
Buffer Trees
• (a, b)-tree, a = m/4 and b = m
• Buffer at internal nodes m blocks
• Buffers contain delayed operations, e.g. Insert(x) and Delete(x)
• Internal memory buffer containing ≤ B last operations
Moved to root buffer when full
.. ..
... ...
.. ....
.. ...........
������������������������
���������������������������
���� � �� � �� � �� � �O(logm n)
B
m blocks
14
m . . . m
11
Buffer Emptying : Insertions Only
O( nm ) buffer empty operations per
internal level, each of O(m) I/Os
⇒ in total O(Sort(N)) I/Os
Emptying internal node buffers• Distribute elements to children
• For each child with more than m blocks of elements recursively
empty buffer
Emptying leaf buffers• Sort buffer
• Merge buffer with leaf blocks
• Rebalance by splitting nodes bottom-up (buffers are now empty)
Corollary Optimal sorting by top-down emptying all buffers
.. ..
... ...
.. ....
.. ...........
��������
�������������������
��������
�������������������
� � �� � �� � �� � �O(logm n)
B
m blocks
14
m . . . m
12
Priority Queues
• Operations : Insert(x) and DeleteMin
• Internal memory min-buffer containing the 14mB smallest elements
• Allow nodes on leftmost path to have degree between 1 and m
⇒ rebalancing only requires node splittings
• Buffer emptying on leftmost path
⇒ two leftmost leaves contain ≥ mB/4 elements
• Insert and DeleteMin amortized O( 1B logM/B
NB ) I/Os
.... .. ..
... ...
..
.. ...........
���������������������������
���������������������������
� � �� � �� � �
� � �� � �� � �
O(logm n)
B
m blocks
14
m . . . m
13
Batched Range Trees
Delayed operations in buffers : Insert(x), Delete(x), Report(x1, x2)
Assumption : Only inserted elements are deleted
14
Time Order Representation
Definition A buffer is in time order representation (TOR) if
1. Report queries are older than Insert operations and younger than
Delete operations
2. Insertions and deletions are in sorted order
3. Report queries are sorted w.r.t. x1
x1, x2, . . . [x11, x12], [x21, x22], . . .timeDelete Report
x1 ≤ x2 ≤ · · ·
Insert
y1, y2, . . .
y1 ≤ y2 ≤ · · ·x11 ≤ x21 ≤ · · ·
15
Constructing Time Order Representations
Lemma A buffer of O(M) elements can be made into TOR using
O(M+RB ) I/Os where R is the number of matches reported
Proof
• Load buffer into memory
• First Inserts are shifted up thru time
– If Insert(x) passes Report(x1, x2) and x ∈ [x1, x2] then a match
is reported
– If Insert(x) meets Delete(x), then both operations are removed
• Deletes are shifted down thru time
– If Delete(x) passes Report(x1, x2) and x ∈ [x1, x2] then a match
is reported
• Sort Deletions, Reports and Insertion internally
• Output to buffer 2
16
Merging Time Order Representations
Lemma Two list S1 and S2 in TOR where the elements in S2 are older
than the elements in S1 can be merged into one time ordered list in
O( |S1|+|S2|+RB ) I/Os
Proof
1. Swap i2 and d1 and remove canceling operations
2. Swap d1 and s2 and report matches
3. Swap i2 and s1 and report matches
4. Merge lists
� �� �� �� �
i1
s1
s2
d2
d1
i2
i1
s1
i2
d2
d1
s2
i1
d2
d1
s2
i2
s1
i
s
d
i1
s1
d1
i2
s2
d2
S1
S2
Step 3Step 2Step 1 Step 4
time
2
17
Emptying All Buffers
Lemma Emptying all buffers in a tree takes O(N+RB ) I/Os
Proof
• Make all buffers into time order representation, O(N+RB ) I/Os
• Merge buffers top-down for complete layers ⇒ since layer sizes
increase geometrically, #I/Os dominated by size of lowest level, i.e
O(N+RB ) I/Os
.. ..
... ...
.. ....
.. ...........
������������������������
���������������������������
���� � �� � �� � �� � �O(logm n)
B
m blocks
14
m . . . m
2
Note The tree should be rebalanced afterwards
18
Emptying Buffer on Overflow
Invariant Emptying a buffer distributes information to children in TOR
1. Load first m blocks in and make TOR and report matches
2. Merge with result from parent in TOR that caused overflow
3. Identify which subtrees are spanned completely by a Report(x1, x2)
4. Empty subtrees identified in 3.
• Merge with Delete operations
• Generate output for the range queries spanning the subtrees
• Merge Insert operations
5. Distribute remaining information to trees not found in 3.
19
Batched Range Trees - The Result
Rebalancing As in (a, b)-trees, except that buffers must be empty. For
Fusion and Sharing a forced buffer emptying on the sibling is required,
causing O(m) addtional I/Os. Since at most O(n/m) rebalacning steps
done ⇒ O(n) additional I/Os.
Total #I/Os Bounded by generated output O( RB ), and O( 1
B ) I/O for
each level an operation is moved down.
Theorem Batched range trees support
Updates O( 1N Sort(N)) amortized I/Os
Queries O( 1N Sort(N) + R
B ) amortized I/Os
20
Batched Segment Trees
√
m nodes
n leaves
m nodes
EA BC D
F
O(logm n)
• Internal node:
– Partition x-interval in√
m slabs/intervals
– O(m) multi-slabs defined by continuous ranges of slabs
– Segments spanning at least one slab (long segment) stored in list
associated with largest multi-slab it spans
– Short segments, as well as ends of long segments, are stored
further down the tree
21
Batched Segment Trees
√
m nodes
n leaves
m nodes
EA BC D
F
O(logm n)
• Buffer-emptying process in O(m + RB ) I/Os:
– Load buffer — O(m)
– Store long segments from buffer in multi-slab lists — O(m)
– Report “intersections” between queries from buffer and segments
in relevant multi-slab lists — O( RB )
– “Push” elements one level down — O(m)
22
Batched Segment Trees
Theorem Batched segment trees support
Updates O( 1N Sort(N)) amortized I/Os
Queries O( 1N Sort(N) + R
B ) amortized I/Os
23
Orthogonal Line Segment Intersection
sweepline
y1
y2
y3
y4
• Sort all endpoints w.r.t. x-coordinate Sort(N)
• Sweep left-to-right with a batched range tree T O(NB )
• Left endpoint ⇒ insertion into T }
O( 1B logM/B
NB )
• Right endpoint ⇒ deletion from T
• Vertical segment ⇒ batched report O( 1B logM/B
NB + R
B )
O(Sort(N) + RB ) I/Os
24
Batched Range Searching
sweepline
• Sort w.r.t. x-coordinate Sort(N)
• Sweep left-to-right with a batched segment tree T O(NB )
• Left side ⇒ insert interval into T }
O( 1B logM/B
NB )
• Right side ⇒ delete interval from T
• Point ⇒ batched stabbing query O( 1B logM/B
NB + R
B )
O(Sort(N) + RB ) I/Os
25
Pairwise Rectangle Intersection
A
B
C
E
D
F
Orthogonal line Batched range Duplicatesegment intersection searching removal
=A
B
+ E
D
+ ?
4N rectangle sides N rectangles andN upper-left corners
Trick Only generate one intersection between two rectangles
⇒ O(Sort(N) + RB ) I/Os
26
Buffer Tree Applications – Summary
A
B
C
E
D
F
Pairwise Rectangle Intersection
Orthogonal Line Segment Intersection
Batched Range Searching
O(Sort(N) + RB )
Batched Range Trees
Batched Segment Trees
}
Updates O( 1N Sort(N))
Queries O( 1N Sort(N) + R
B )
Priority Queues O( 1N Sort(N))
27