I/O-efficient Algorithms and Data Structures 3-4 (lecture by Lars Arge)
DESCRIPTION
In many modern applications that deal with massive data sets, communication between internal and external memory, and not actual computation time, is the bottleneck in the computation. This is due to the huge difference in access time of fast internal memory and slower external memory such as disks. In order to amortize this time over a large amount of data, disks typically read or write large blocks of contiguous data at once. This means that it is important to design algorithms with a high degree of locality in their disk access pattern, that is, algorithms where data accessed close in time is also stored close on disk. Such algorithms take advantage of block transfers by amortizing the large access time over a large number of accesses. In the area of I/O-efficient algorithms the main goal is to develop algorithms that minimize the number of block transfers (I/Os) used to solve a given problem. In these four lectures, we will cover fundamental I/O-efficient algorithms and data structures, such as sorting algorithms, search trees and priority queues. We will also discuss geometric problems, as well as geometric data structures for various range searching problems.
TRANSCRIPT
I/O-efficient Algorithms and Data Structures
Lars Arge
Professor and center director
Aarhus University
ALMADA summer school, August 5, 2013
I/O-Model
• Parameters
N = # elements in problem instance
B = # elements that fit in a disk block
M = # elements that fit in main memory
T = # output elements in a searching problem
• We often assume that $M > B^2$
• I/O: Movement of a block between memory and disk
[Figure: processor P, main memory M and disk D; a block I/O moves one block between M and D]
Fundamental Bounds
                Internal        External
• Scanning:     $N$             $\frac{N}{B}$
• Sorting:      $N \log N$      $\frac{N}{B}\log_{M/B}\frac{N}{B}$
• Permuting:    $N$             $\min\{N, \frac{N}{B}\log_{M/B}\frac{N}{B}\}$
• Searching:    $\log_2 N$      $\log_B N$
• Note:
– Linear I/O: $O(N/B)$
– Permuting not linear
– Permuting and sorting bounds are equal in all practical cases
– B factor VERY important: $\frac{N}{B} < \frac{N}{B}\log_{M/B}\frac{N}{B} \ll N$
– Cannot sort optimally with search tree
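To get a feel for how far apart these bounds are, the following is a minimal Python sketch that evaluates them with all constants dropped. The function name `io_bounds` and the sample parameter values are assumptions chosen for illustration; they are not from the lecture.

```python
import math

def io_bounds(N, M, B):
    """Evaluate the external-memory bounds (constant factors dropped)."""
    n = N / B                      # number of blocks holding the input
    m = M / B                      # number of blocks that fit in memory
    scan = n                       # N/B
    sort = n * math.log(n, m)      # (N/B) * log_{M/B}(N/B)
    permute = min(N, sort)         # min{N, sort bound}
    search = math.log(N, B)        # log_B N
    return {"scan": scan, "sort": sort, "permute": permute, "search": search}

# Hypothetical parameters: 2^30 elements, 2^20 fit in memory, 2^10 per block.
bounds = io_bounds(N=2**30, M=2**20, B=2**10)
for name, value in bounds.items():
    print(f"{name:8s} ~ {value:,.0f} I/Os")
```

With these parameters the sorting bound is only about twice the scanning bound, while an algorithm that spends one I/O per element would be roughly a factor B/2 worse.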
External Sort
• Merge sort:
– Create N/M memory-sized sorted runs
– Merge runs together M/B at a time
⇒ $O(\log_{M/B}\frac{N}{M})$ phases using $O(\frac{N}{B})$ I/Os each
• Distribution sort similar (but harder: requires finding split elements)
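The merge sort outline above can be simulated in memory. The sketch below is an illustration under stated assumptions, not the lecture's implementation: it keeps everything in Python lists, charges one I/O per block of B elements read or written, and the name `external_merge_sort` is hypothetical.

```python
import heapq
import math
import random

def external_merge_sort(data, M, B):
    """Sort `data` by run formation plus (M/B)-way merging.

    Returns (sorted_list, io_count); io_count charges one I/O per block of
    B elements read or written (a simulation, no real disk is involved).
    """
    fan_in = max(2, M // B)
    ios = 0
    # Run formation: read M elements at a time, sort in memory, write back.
    runs = []
    for i in range(0, len(data), M):
        run = sorted(data[i:i + M])
        runs.append(run)
        ios += 2 * math.ceil(len(run) / B)
    # Merge phases: combine fan_in runs at a time until one run remains.
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), fan_in):
            out = list(heapq.merge(*runs[i:i + fan_in]))
            merged.append(out)
            ios += 2 * math.ceil(len(out) / B)
        runs = merged
    return (runs[0] if runs else []), ios

random.seed(0)
data = random.sample(range(10_000), 1000)
result, ios = external_merge_sort(data, M=100, B=10)
print(ios)  # 10 runs, then a single 10-way merge phase
```

Here N/M = 10 runs fit into one M/B = 10-way merge, so the simulation performs one run-formation pass and one merge pass over the N/B = 100 blocks.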
Lower bounds
• Permuting N elements according to a given permutation takes $\Omega(\min\{N, \frac{N}{B}\log_{M/B}\frac{N}{B}\})$ I/Os in the “indivisibility” model
• Sorting N elements takes $\Omega(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os in the comparison model
– Proved using counting arguments: $(B!)^{N/B}\left(N\binom{M}{B}\right)^{t} \ge N!$
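A hedged sketch of how a counting inequality of this kind leads to the permuting bound, assuming the standard form of the argument in which $t$ is the number of I/Os and each I/O can increase the number of reachable permutations by a factor of at most $N\binom{M}{B}$ (with an extra $B!$ the first time each of the $N/B$ input blocks is read). The estimates $\log N! = N\log N - O(N)$, $\log B! \le B\log B$ and $\log\binom{M}{B} \le B\log\frac{eM}{B}$ are standard.

```latex
\begin{align*}
(B!)^{N/B}\Bigl(N\tbinom{M}{B}\Bigr)^{t} &\ge N! \\
t\Bigl(\log N + \log\tbinom{M}{B}\Bigr) &\ge \log N! - \tfrac{N}{B}\log B! \\
t\Bigl(\log N + B\log\tfrac{eM}{B}\Bigr) &\ge N\log\tfrac{N}{B} - O(N) \\
t &= \Omega\!\left(\frac{N\log\frac{N}{B}}{\log N + B\log\frac{M}{B}}\right)
\end{align*}
```

When $B\log\frac{M}{B} \le \log N$ the denominator is $O(\log N)$ and the bound becomes $\Omega(N)$; otherwise the denominator is $O(B\log\frac{M}{B})$ and the bound becomes $\Omega(\frac{N}{B}\log_{M/B}\frac{N}{B})$, which together give the minimum of the two terms.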
Cache-Oblivious Model
• N, B, and M as in I/O-model
– Assumed that $M > B^2$ (tall cache assumption)
• M and B not used in algorithm description
• Block transfers by optimal paging strategy
⇒ Analyze in two-level model
⇒ Efficient on one level, efficient on all levels (of fully associative hierarchy using LRU)
Cache-Oblivious Distribution Sort
• Break into $\sqrt{N}$ subarrays of size $\sqrt{N}$
– Recursively sort each subarray
• Distribute into $\sqrt{N}$ buckets of size ~$\sqrt{N}$
– Recursively sort each bucket
• Distribution performed in $O(\frac{N}{B})$ I/Os
⇒ $T(N) = 2\sqrt{N}\,T(\sqrt{N}) + O(\frac{N}{B})$ if $N > M$, and $T(N) = O(\frac{N}{B})$ if $N \le M$
⇒ $T(N) = O(\frac{N}{B}\log_{M/B}\frac{N}{B})$
Sorting summary
• External merge or distribution sort takes $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
– Merge sort based on M/B-way merging
– Distribution sort based on $\sqrt{M/B}$-way distribution and (complicated) split element finding
– Also cache-oblivious using $\sqrt{N}$-way distribution
• Optimal in comparison model
• $\Omega(\min\{N, \frac{N}{B}\log_{M/B}\frac{N}{B}\})$ lower bound in stronger indivisibility model
– Holds even for permuting
Distribution sweeping
• Combination of distribution and plane sweep
– Used to solve batched geometric problems
• General idea:
– Divide plane into M/B slabs with O(N/(M/B)) endpoints each
– Sweep plane top-down while finding solutions involving objects in different slabs
– Distribute data to M/B slabs and recurse in each slab
• Examples:
– Orthogonal line segment intersection
– Rectangle intersection
– Homework
Outline
• Thursday:
– Sorting upper bound (merge and distribution sort)
– Sorting (permuting) lower bound
– Cache-oblivious sorting
– Batched geometric problems: Distribution sweeping
• Today:
– Searching (B-trees)
– Batched searching (Buffer-trees)
– I/O-efficient priority queues
– I/O-efficient list ranking
External Search Trees
• Binary search tree:
– Standard method for search among N elements
– We assume elements in leaves
– Search traces at least one root-leaf path
– If nodes stored arbitrarily on disk
⇒ Search in $O(\log_2 N)$ I/Os
⇒ Rangesearch in $O(\log_2 N + T)$ I/Os
[Figure: binary search tree of height $\Theta(\log_2 N)$]
External Search Trees
• BFS blocking:
– Block height $O(\log_2 N) / O(\log_2 B) = O(\log_B N)$
– Output elements blocked
⇒ Rangesearch in $\Theta(\log_B N + \frac{T}{B})$ I/Os
• Optimal: O(N/B) space and $\Theta(\log_B N + \frac{T}{B})$ query
[Figure: tree cut into blocks of height $\Theta(\log_2 B)$ containing $\Theta(B)$ nodes each]
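Cache-Oblivious Search Tree
• “Van Emde Boas” recursive layout:
[Figure: tree of height $\log N$ split into a top tree A of height $\frac{1}{2}\log N$ and $\sqrt{N}$ bottom trees $B_1, B_2, \ldots, B_{\sqrt{N}}$, each of height $\frac{1}{2}\log N$]
• Recursive memory layout: A, $B_1$, $B_2$, ..., $B_{\sqrt{N}}$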
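The recursive layout can be made concrete with a short Python sketch. It assumes a complete binary tree whose nodes carry heap indices (root 1, children of i are 2i and 2i+1); the function name `veb_layout` and the choice to give the top tree the rounded-down half of the levels are assumptions for illustration.

```python
def veb_layout(root, height):
    """Return the heap-numbered nodes of a complete binary tree in vEB order.

    `root` is the heap index of the subtree root and `height` is the number
    of levels in that subtree.
    """
    if height == 1:
        return [root]
    top_h = height // 2            # levels in the top tree A
    bot_h = height - top_h         # levels in each bottom tree B_i
    order = veb_layout(root, top_h)
    # Roots of the bottom trees are the descendants top_h levels below `root`.
    first = root << top_h
    for r in range(first, first + (1 << top_h)):
        order.extend(veb_layout(r, bot_h))
    return order

layout = veb_layout(1, 4)
print(layout)  # [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```

For the 15-node tree the top tree {1, 2, 3} is stored first, followed by the four bottom trees rooted at 4, 5, 6 and 7, each stored contiguously.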
Cache-Oblivious Search Tree
• Search analysis:
– Conceptually stop recursion when recursive subtrees have size between $\sqrt{B}$ and $B$, so that each recursive subtree fits in a block
– Search path “hits” $O(\log N / \log\sqrt{B}) = O(\log_B N)$ blocks
[Figure: recursive subtrees of size between $\sqrt{B}$ and $B$ and height $\Theta(\log B)$ along a search path]
Dynamic External Search Trees?
• Maintaining BFS blocking during updates?
– Balance normally maintained in search trees using rotations
• Seems very difficult to maintain BFS blocking during rotation
– Also need to make sure output (leaves) is blocked!
[Figure: rotation exchanging nodes x and y]
B-trees
• BFS-blocking naturally corresponds to tree with fan-out $\Theta(B)$
• B-trees balanced by allowing node degree to vary
– Rebalancing performed by splitting and merging nodes
(a,b)-tree
• T is an (a,b)-tree (a≥2 and b≥2a-1) if:
– All leaves are on the same level and contain between a and b elements
– Except for the root, all nodes have degree between a and b
– Root has degree between 2 and b
• (a,b)-tree uses linear space and has height $O(\log_a N)$
⇒ Choosing a,b = $\Theta(B)$: each node/leaf stored in one disk block
⇒ O(N/B) space and $\Theta(\log_B N + \frac{T}{B})$ query
[Figure: example (2,4)-tree]
(a,b)-Tree Insert
• Insert:
Search and insert element in leaf v
WHILE v has b+1 elements/children DO
  Split v:
    make nodes v’ and v’’ with $\lceil\frac{b+1}{2}\rceil \le b$ and $\lfloor\frac{b+1}{2}\rfloor \ge a$ elements
    insert element (ref) in parent(v)
    (make new root if necessary)
  v=parent(v)
• Insert touches $O(\log_a N)$ nodes
[Figure: node v with b+1 elements split into v’ with $\lceil\frac{b+1}{2}\rceil$ and v’’ with $\lfloor\frac{b+1}{2}\rfloor$ elements]
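The insertion procedure above can be sketched in Python as follows. This is an in-memory illustration under assumptions, not the lecture's code: the `Node` class, the `low` routing field (smallest element below a node) and the helper names are hypothetical, and only insertion is covered.

```python
import bisect

class Node:
    def __init__(self, leaf, items):
        self.leaf = leaf          # leaf: items are sorted elements; else child Nodes
        self.items = items
        self.low = items[0] if leaf else items[0].low   # smallest element below

def split(v):
    """Split an overfull node into v' (ceiling half) and v'' (floor half)."""
    mid = (len(v.items) + 1) // 2
    return Node(v.leaf, v.items[:mid]), Node(v.leaf, v.items[mid:])

def insert(root, x, b):
    """Insert x into the (a,b)-tree rooted at `root`; return the (new) root."""
    path, v = [], root
    while not v.leaf:             # search: follow the child with largest low <= x
        lows = [c.low for c in v.items]
        i = max(0, bisect.bisect_right(lows, x) - 1)
        path.append((v, i))
        v = v.items[i]
    bisect.insort(v.items, x)
    v.low = v.items[0]
    for p, _ in path:
        p.low = min(p.low, x)
    while len(v.items) > b:       # rebalance: v has b+1 elements/children
        left, right = split(v)
        if not path:
            return Node(False, [left, right])      # make new root
        parent, i = path.pop()
        parent.items[i:i + 1] = [left, right]      # replace v by v' and v''
        v = parent
    return root

def leaves(v, depth=0):
    """List (depth, elements) for every leaf, left to right."""
    if v.leaf:
        return [(depth, v.items)]
    return [pair for c in v.items for pair in leaves(c, depth + 1)]

root = Node(True, [10])
for x in [3, 17, 8, 1, 12, 5, 20, 14, 7, 2]:
    root = insert(root, x, b=4)   # a (2,4)-tree
print(leaves(root))
```

Each split produces nodes with the ceiling and floor of (b+1)/2 items, so with b=4 the leaves end up holding between 2 and 4 elements, all on the same level.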
(2,4)-Tree Insert
[Figure: example of an insertion into a (2,4)-tree]
(a,b)-Tree Delete
• Delete:
Search and delete element from leaf v
WHILE v has a-1 elements/children DO
  Fuse v with sibling v’:
    move children of v’ to v
    delete element (ref) from parent(v)
    (delete root if necessary)
  If v has >b (and ≤ a+b-1<2b) children split v
  v=parent(v)
• Delete touches $O(\log_a N)$ nodes
[Figure: node v with a-1 elements fused with a sibling into a node with 2a-1 elements]
(2,4)-Tree Delete
(a,b)-Tree
• (a,b)-tree properties:
– If b=2a-1, every update can cause many rebalancing operations
[Figure: alternating insert and delete on a (2,3)-tree]
– If b≥2a, an update causes only O(1) rebalancing operations amortized
– If b>2a, only $O(\frac{1}{b/2 - a}) = O(\frac{1}{a})$ rebalancing operations amortized
* Both somewhat hard to show
– If b=4a, easy to show that an update causes $O(\frac{1}{a}\log_a N)$ rebalance operations amortized
* After split during insert a leaf contains 4a/2=2a elements
* After fuse during delete a leaf contains between 2a and 5a elements (split if more than 3a, giving between $\frac{3}{2}a$ and $\frac{5}{2}a$ elements)
B-tree Summary
• B-trees: (a,b)-trees with a,b = $\Theta(B)$
– O(N/B) space
– $O(\log_B N + \frac{T}{B})$ query
– $O(\log_B N)$ update
• B-trees with elements in the leaves are sometimes called B+-trees
• Construction in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
– Sort elements and construct leaves
– Build tree level-by-level bottom-up
Generalized B-tree
• B-tree with branching parameter b and leaf parameter k (b,k≥8)
– All leaves on same level and contain between $\frac{1}{4}k$ and k elements
– Except for the root, all nodes have degree between $\frac{1}{4}b$ and b
– Root has degree between 2 and b
• B-tree with leaf parameter $k = \Omega(B)$
– O(N/B) space
– Height $O(\log_b \frac{N}{B})$
– $O(\frac{1}{k})$ amortized leaf rebalance operations
– $O(\frac{1}{bk}\log_b \frac{N}{B})$ amortized internal node rebalance operations
B-tree Sorting
• Normally we can sort N elements in O(N log N) time with a search tree
– Insert all elements one-by-one (construct tree)
– Output in sorted order using in-order traversal
• Same algorithm using a B-tree uses $O(N \log_B N)$ I/Os
– A factor of $O(B\,\frac{\log (M/B)}{\log B})$ non-optimal
• We would like a dynamic data structure to use in algorithms: going from $O(N\log_B N)$ to $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os in total requires $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ I/Os per operation
Buffer-tree Technique
• Main idea: Logically group nodes together and add buffers
– Insertions done in a “lazy” way: elements inserted in buffers
– When a buffer runs full, elements are pushed one level down
– Buffer-emptying in O(M/B) I/Os
⇒ every block touched constant number of times on each level
⇒ inserting N elements (N/B blocks) costs $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
[Figure: tree with fan-out M/B and height $O(\log_{M/B}\frac{N}{B})$; each node has a buffer of M elements, and leaves are blocks of B elements]
Basic Buffer-tree
• Definition:
– B-tree with branching parameter $\frac{M}{B}$ and leaf parameter B
– Size M buffer in each internal node
• Updates:
– Add time-stamp to insert/delete element
– Collect B elements in memory before inserting in root buffer
– Perform buffer-emptying when buffer runs full
[Figure: node with fan-out between $\frac{1}{4}\frac{M}{B}$ and $\frac{M}{B}$ and a buffer of M elements ($m$ blocks); leaves of B elements]
Basic Buffer-tree
• Note:
– Buffer can be larger than M during recursive buffer-emptying
* Elements distributed in sorted order
⇒ at most M elements in buffer unsorted
– Rebalancing needed when “leaf-node” buffer emptied
* Leaf-node buffer-emptying only performed after all full internal node buffers are emptied
Basic Buffer-tree
• Internal node buffer-empty:
– Load first M (unsorted) elements into memory and sort them
– Merge elements in memory with rest of (already sorted) elements
– Scan through sorted list while
* Removing “matching” insert/deletes
* Distributing elements to child buffers
– Recursively empty full child buffers
• Emptying buffer of size X takes O(X/B+M/B)=O(X/B) I/Os
Basic Buffer-tree
• Buffer-empty of leaf node with K elements in leaves
– Sort buffer as previously
– Merge buffer elements with elements in leaves
– Remove “matching” insert/deletes obtaining K’ elements
– If K’<K then
* Add K-K’ “dummy” elements and insert in “dummy” leaves
Otherwise
* Place K elements in leaves
* Repeatedly insert block of elements in leaves and rebalance
• Delete dummy leaves and rebalance when all full buffers emptied
Basic Buffer-tree
• Invariant:
Buffers of nodes on path from root to emptied leaf-node are empty
• Insert rebalancing (splits) performed as in normal B-tree
[Figure: node v split into v’ and v’’]
• Delete rebalancing: v’ buffer emptied before fuse of v
– Necessary buffer emptyings performed before next dummy-block delete
– Invariant maintained
[Figure: node v fused with sibling v’]
Basic Buffer-tree
• Analysis:
– Not counting rebalancing, a buffer-emptying of a node with X ≥ M elements (full) takes O(X/B) I/Os
⇒ total full node emptying cost $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
– Delete rebalancing buffer-emptying (non-full) takes O(M/B) I/Os
⇒ cost of one split/fuse O(M/B) I/Os
– During N updates
* O(N/B) leaf split/fuse
* $O(\frac{N/B}{M/B}\log_{M/B}\frac{N}{B})$ internal node split/fuse
⇒ Total cost of N operations: $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
Basic Buffer-tree
• Emptying all buffers after N insertions:
Perform buffer-emptying on all nodes in BFS-order
⇒ resulting full-buffer emptyings cost $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
⇒ empty $O(\frac{N/B}{M/B})$ non-full buffers using O(M/B) I/Os each ⇒ O(N/B) I/Os
• N elements can be sorted using buffer tree in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
Buffer-tree Summary
• Batching of operations on B-tree using M-sized buffers
– $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ I/O updates amortized
– All buffers emptied in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
• One-dim. rangesearch operations can also be supported in $O(\frac{1}{B}\log_{M/B}\frac{N}{B} + \frac{T}{B})$ I/Os amortized
– Search elements handled lazily like updates
– All elements in relevant sub-trees reported during buffer-emptying
– Buffer-emptying in O(X/B+T’/B), where T’ is the number of reported elements
Orthogonal Line Segment Intersection
• Sweep plane top-down while maintaining search tree T on vertical segments crossing sweep line (by x-coordinates)
– Top endpoint of vertical segment: Insert in T
– Bottom endpoint of vertical segment: Delete from T
– Horizontal segment: Perform range query with x-interval on T
⇒ $O(\frac{N}{B}\log_{M/B}\frac{N}{B} + \frac{T}{B})$ I/Os using buffer-tree
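The sweep itself can be sketched in internal memory. In the Python sketch below a sorted list with `bisect` stands in for the search tree T (it is not a buffer tree and gives none of the I/O guarantees); the function name, the tuple formats and the tie-breaking rule that treats touching segments as intersecting are assumptions for illustration.

```python
import bisect

def orthogonal_intersections(verticals, horizontals):
    """Report (vertical index, horizontal index) pairs that intersect.

    verticals:   list of (x, y_top, y_bottom) with y_top >= y_bottom
    horizontals: list of (x_left, x_right, y) with x_left <= x_right
    """
    INSERT, QUERY, DELETE = 0, 1, 2      # tie-break order at equal y
    events = []
    for i, (_, y_top, y_bot) in enumerate(verticals):
        events.append((-y_top, INSERT, i))
        events.append((-y_bot, DELETE, i))
    for j, (_, _, y) in enumerate(horizontals):
        events.append((-y, QUERY, j))
    events.sort()                        # top-down sweep: decreasing y

    active = []                          # sorted (x, index) of verticals on the sweep line
    result = []
    for _, kind, idx in events:
        if kind == INSERT:
            bisect.insort(active, (verticals[idx][0], idx))
        elif kind == DELETE:
            active.remove((verticals[idx][0], idx))
        else:                            # range query with the x-interval
            x_left, x_right, _ = horizontals[idx]
            lo = bisect.bisect_left(active, (x_left, -1))
            hi = bisect.bisect_right(active, (x_right, len(verticals)))
            result.extend((i, idx) for _, i in active[lo:hi])
    return result

verticals = [(1, 5, 0), (3, 4, 2), (6, 3, 1)]
horizontals = [(0, 4, 3), (2, 7, 1)]
result = orthogonal_intersections(verticals, horizontals)
print(result)  # [(0, 0), (1, 0), (2, 1)]
```

In the external version the insertions, deletions and range queries are not answered immediately; they are batched through the buffer tree, which is what yields the amortized bound stated above.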
Buffered Priority Queue
• Basic buffer tree can be used in external priority queue
• To delete minimal element:
– Empty all buffers on leftmost path
– Delete elements in leftmost leaf and keep in memory
– Deletion of next M minimal elements free
– Inserted elements checked against minimal elements in memory
• $O(\frac{M}{B}\log_{M/B}\frac{N}{B})$ I/Os every O(M) delete ⇒ $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ amortized
[Figure: buffer tree with the leftmost path emptied; labels $\frac{1}{4}M$, $\frac{M}{B}$ and B]
List Ranking
• Problem:
– Given N-vertex linked list stored in array
– Compute rank (number in list) of each vertex
• One of the simplest graph problems one can think of
• Straightforward O(N) internal algorithm
– Also uses O(N) I/Os in external memory
• Much harder to get $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ external algorithm
[Figure: example linked list of 10 vertices stored in an array, with the rank of each vertex]
List Ranking
• We will solve more general problem:
– Given N-vertex linked list with edge-weights stored in array
– Compute sum of weights (rank) from start for each vertex
• List ranking: All edge weights one
• Note: Weight stored in array entry together with edge (next vertex)
[Figure: example linked list with all edge weights equal to 1]
List Ranking
• Algorithm:
1. Find and mark independent set of vertices
2. “Bridge-out” independent set: Add new edges
3. Recursively rank resulting list
4. “Bridge-in” independent set: Compute rank of independent set
• Step 1, 2 and 4 in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
• Independent set of size αN for 0 < α ≤ 1
⇒ $T(N) = T((1-\alpha)N) + O(\frac{N}{B}\log_{M/B}\frac{N}{B}) = O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
[Figure: example list with independent set bridged out; new edges have weight 2]
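The four steps can be illustrated with an in-memory Python sketch. It uses dictionaries instead of sorting and scanning, so it shows only the contraction logic and not the I/O behaviour; it picks the independent set by coin flips, defines the rank of the first vertex as 0, and switches to direct traversal for lists of at most 4 vertices. All of these choices, and the name `list_rank`, are assumptions made for illustration.

```python
import random

def list_rank(succ, weight, head, rng=None):
    """Return {v: sum of edge weights from `head` to v}.

    succ[v] is the successor of v (None for the last vertex) and weight[v]
    is the weight of the edge (v, succ[v]).
    """
    rng = rng or random.Random(0)
    if len(succ) <= 4:                       # small list: rank by direct traversal
        rank, v, r = {}, head, 0
        while v is not None:
            rank[v] = r
            r += weight.get(v, 0)
            v = succ[v]
        return rank
    # Step 1: independent set = "heads" vertices whose successor shows "tails".
    indep = set()
    while not indep:
        coin = {v: rng.random() < 0.5 for v in succ}
        indep = {v for v in succ if v != head and coin[v]
                 and (succ[v] is None or not coin[succ[v]])}
    # Step 2: bridge out; the predecessor of an independent vertex skips over it.
    new_succ, new_weight = {}, {}
    for u in succ:
        if u in indep:
            continue
        s = succ[u]
        if s in indep:
            new_succ[u] = succ[s]
            new_weight[u] = weight[u] + weight.get(s, 0)
        else:
            new_succ[u] = s
            new_weight[u] = weight.get(u, 0)
    # Step 3: recursively rank the contracted list.
    rank = list_rank(new_succ, new_weight, head, rng)
    # Step 4: bridge in; an independent vertex lies one original edge past its predecessor.
    for u in new_succ:
        if succ[u] in indep:
            rank[succ[u]] = rank[u] + weight[u]
    return rank

order = [3, 4, 5, 9, 6, 8, 2, 7, 10, 1]          # hypothetical list order
succ = dict(zip(order, order[1:] + [None]))
weight = {v: 1 for v in order[:-1]}              # list ranking: all weights one
rank = list_rank(succ, weight, head=order[0])
print([rank[v] for v in order])
```

With unit weights the rank of each vertex equals its position in the list, counted from 0.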
List Ranking: Bridge-out/in
• Obtain information (edge or rank) of successor
– Make copy of original list
– Sort original list by successor id
– Scan original and copy together to obtain successor information
– Sort modified original list by id
⇒ $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
[Figure: example list before and after sorting by successor id]
List Ranking: Independent Set
• Easy to design randomized algorithm:
– Scan list and flip a coin for each vertex
– Independent set is vertices with heads whose successor has tails
⇒ Independent set of expected size N/4
• Deterministic algorithm:
– 3-color vertices (no vertex same color as predecessor/successor)
– Independent set is vertices with most popular color
⇒ Independent set of size at least N/3
• 3-coloring ⇒ $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/O algorithm
[Figure: example list stored in an array]
List Ranking: 3-coloring
• Algorithm:
– Consider forward and backward lists (heads/tails in two lists)
– Color forward lists (except tail) alternately red and blue
– Color backward lists (except tail) alternately green and blue
⇒ 3-coloring
[Figure: example list stored in an array, split into forward and backward lists]
List Ranking: Forward List Coloring
• Identify heads and tails
• For each head, insert red element in priority-queue (priority=position)
• Repeatedly:
– Extract minimal element from queue
– Access and color corresponding element in list
– Insert opposite color element corresponding to successor in queue
• Scan of list
• O(N) priority-queue operations
⇒ $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
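The forward-list coloring can be sketched in Python with `heapq` standing in for the external priority queue. The sketch assumes the list is given as an array `succ` of successor positions, and that a forward edge is one whose successor sits at a larger array position; the function name is hypothetical.

```python
import heapq

def color_forward_lists(succ):
    """Color vertices of forward lists (except tails) alternately red/blue.

    succ[p] is the array position of the successor of the vertex stored at
    position p, or None for the last vertex of the linked list.
    """
    n = len(succ)
    is_forward = [s is not None and s > p for p, s in enumerate(succ)]
    has_forward_pred = [False] * n
    for p, s in enumerate(succ):
        if is_forward[p]:
            has_forward_pred[s] = True
    color = [None] * n
    # Heads start a forward edge but are not reached by one.
    queue = [(p, "red") for p in range(n)
             if is_forward[p] and not has_forward_pred[p]]
    heapq.heapify(queue)
    while queue:
        p, c = heapq.heappop(queue)      # positions come out in increasing order
        color[p] = c
        s = succ[p]
        if is_forward[s]:                # successor is not the tail of this forward list
            heapq.heappush(queue, (s, "blue" if c == "red" else "red"))
    return color

# Linked list visiting positions 2 -> 4 -> 5 -> 1 -> 3 -> 0.
succ = [None, 3, 4, 0, 5, 1]
color = color_forward_lists(succ)
print(color)  # [None, 'red', 'red', None, 'blue', None]
```

Because every pushed position is larger than the one just extracted, the queue hands out positions in increasing order, so the accesses to the list form a single scan. Positions left as `None` are tails of forward lists, which get their color from the backward-list pass.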
List Ranking Summary
• Simplest graph problem: Traverse linked list
• Very easy O(N) algorithm in internal memory
• Much more difficult $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ external memory algorithm
– Finding independent set via 3-coloring
– Bridging vertices in/out
• Permuting bound $O(\min\{N, \frac{N}{B}\log_{M/B}\frac{N}{B}\})$ best possible
– Also true for other graph problems
List Ranking Summary
• External list ranking algorithm similar to PRAM algorithm
– Sometimes external algorithms by “PRAM algorithm simulation”
• Forward list coloring algorithm example of “time forward processing”
– Use external priority-queue to send information “forward in time” to vertices to be processed later
• PRAM algorithm ideas and time forward processing used in many graph algorithms
References
• External Memory Geometric Data Structures (sections 1-5). Lecture notes by L. Arge.
• I/O-Efficient Graph Algorithms (sections 1-2). Lecture notes by N. Zeh.
• Cache-Oblivious Algorithms. M. Frigo, C. Leiserson, H. Prokop, and S. Ramachandran. Proc. FOCS '99.
Summary
• We discussed
– Sorting
– Hardness of permuting (sorting)
– Batched geometric problems
– Searching: B-trees, buffer-trees, priority-queue
– Graph algorithms: List-ranking
– Cache-oblivious algorithms
• Important techniques
– Multiway merging/distribution
– Distribution sweeping
– Multiway-trees, buffered data structures
– Time-forward processing (and PRAM simulation)