Posted on 21-Dec-2015
![Page 1](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/1.jpg)
Advanced Topics in Algorithms and Data Structures
An overview of lecture 3
• A simple parallel algorithm for computing parallel prefix.
• A parallel merging algorithm
![Page 2](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/2.jpg)
• We are given an ordered set $A = \{a_0, a_1, a_2, \ldots, a_{n-1}\}$ of n elements and a binary associative operator $\oplus$.
• We have to compute the ordered set
$\{a_0,\ (a_0 \oplus a_1),\ \ldots,\ (a_0 \oplus a_1 \oplus \cdots \oplus a_{n-1})\}$
Definition of prefix computation
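The definition can be checked with a sequential reference implementation. The following is a minimal Python sketch. The function name `prefix` and the choice to pass $\oplus$ as a Python callable are our own assumptions and do not come from the lecture. Because $\oplus$ only has to be associative, the sketch always places the running prefix as the left operand. This matters for non-commutative operators such as string concatenation.

```python
import operator
from typing import Callable, List, TypeVar

T = TypeVar("T")


def prefix(a: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Return [a0, a0 op a1, ..., a0 op a1 op ... op a_{n-1}] for an associative op."""
    out: List[T] = []
    for x in a:
        # Combine the running prefix (left operand) with the next element.
        out.append(x if not out else op(out[-1], x))
    return out


print(prefix([5, 3, -6, 2], operator.add))   # [5, 8, 2, 4]
print(prefix([5, 3, -6, 2], max))            # [5, 5, 5, 5]
print(prefix(["a", "b", "c"], operator.add)) # ['a', 'ab', 'abc']
```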
![Page 3](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/3.jpg)
• For example, if $\oplus$ is + and the input is the ordered set {5, 3, -6, 2, 7, 10, -2, 8}, then the output is {5, 8, 2, 4, 11, 21, 19, 27}.
• Prefix sums can be computed in O(n) time sequentially.
An example of prefix computation
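The O(n) sequential bound comes from a single left-to-right pass that performs one addition per element. The sketch below applies that pass to the example input. It then checks the result against Python's standard `itertools.accumulate`, which computes the same inclusive prefix sums.

```python
from itertools import accumulate

a = [5, 3, -6, 2, 7, 10, -2, 8]

# One left-to-right pass with one addition per element: O(n) time.
sums = []
running = 0
for x in a:
    running += x
    sums.append(running)

print(sums)  # [5, 8, 2, 4, 11, 21, 19, 27]
# The standard library's accumulate gives the same inclusive prefix sums.
assert sums == list(accumulate(a))
```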
![Page 4](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/4.jpg)
First Pass
• For every internal node v of the tree, compute the sum of all the leaves in its subtree in a bottom-up fashion:
  sum[v] := sum[L[v]] + sum[R[v]]
Using a binary tree
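Here is a minimal Python sketch of the first pass on an explicit binary tree. It assumes n is a power of two. It also uses a heap-style layout, which is our own choice and not part of the lecture: node 1 is the root, L[v] = 2v, R[v] = 2v + 1, and the leaves are nodes n .. 2n - 1. The sketch processes one level at a time, from just above the leaves up to the root. The nodes on one level are independent of each other, which is why each level can be handled in parallel.

```python
def first_pass(leaves):
    """Bottom-up pass: sum[v] = sum[L[v]] + sum[R[v]] for every internal node.

    Heap-style layout (assumed): root is node 1, L[v] = 2v, R[v] = 2v + 1,
    leaves are nodes n .. 2n - 1; index 0 is unused.
    """
    n = len(leaves)
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    s = [0] * (2 * n)
    s[n:] = leaves
    level_start = n // 2
    while level_start >= 1:
        # All nodes on this level only read their children: parallelizable.
        for v in range(level_start, 2 * level_start):
            s[v] = s[2 * v] + s[2 * v + 1]
        level_start //= 2
    return s


s = first_pass([5, 3, -6, 2, 7, 10, -2, 8])
print(s[1])    # 27, the sum of all leaves (stored at the root)
print(s[2:4])  # [4, 23], the sums of the left and right halves
```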
![Page 5](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/5.jpg)
for d = 0 to log n - 1 do
    for i = 0 to n - 1 by 2^(d+1) do in parallel
        a[i + 2^(d+1) - 1] := a[i + 2^d - 1] + a[i + 2^(d+1) - 1]
• In our example, n = 8, so the outer loop iterates 3 times, for d = 0, 1, 2.
Parallel prefix computation
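The array version of the first pass can be translated directly into Python. The sketch below is a sequential simulation, assuming n is a power of two. Within one value of d, the inner iterations read and write disjoint cells, so they could run concurrently. Here they simply run one after another. After the pass, the last cell holds the total sum.

```python
def up_sweep(a):
    """In-place first pass of the array-based prefix algorithm (n a power of two)."""
    n = len(a)
    log_n = n.bit_length() - 1  # log2(n) for a power of two
    for d in range(log_n):
        step = 2 ** (d + 1)
        # Iterations of this loop touch disjoint cells: the "in parallel" loop.
        for i in range(0, n, step):
            a[i + step - 1] = a[i + 2 ** d - 1] + a[i + step - 1]
    return a


print(up_sweep([5, 3, -6, 2, 7, 10, -2, 8]))  # [5, 8, -6, 4, 7, 17, -2, 27]
```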
![Page 6](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/6.jpg)
• d = 0: In this case, i advances in steps of 2^(d+1) = 2 elements.
• For i = 0:
  a[0 + 2^(0+1) - 1] := a[0 + 2^0 - 1] + a[0 + 2^(0+1) - 1], or a[1] := a[0] + a[1]
• Likewise, i = 2, 4, 6 give a[3] := a[2] + a[3], a[5] := a[4] + a[5] and a[7] := a[6] + a[7].
When d = 0
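The index arithmetic can be made concrete by listing the (read, write) pairs at each level. At every level, the pseudocode performs a[right] := a[left] + a[right]. The helper name `update_pairs` is hypothetical and serves only to illustrate this. The output shows that the i = 0 update is one of four independent updates at d = 0.

```python
def update_pairs(n, d):
    """(left, right) cells used at level d, where a[right] := a[left] + a[right]."""
    return [(i + 2 ** d - 1, i + 2 ** (d + 1) - 1) for i in range(0, n, 2 ** (d + 1))]


for d in range(3):  # n = 8, so d = 0, 1, 2
    print(d, update_pairs(8, d))
# 0 [(0, 1), (2, 3), (4, 5), (6, 7)]
# 1 [(1, 3), (5, 7)]
# 2 [(3, 7)]
```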
![Page 7](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/7.jpg)
![Page 8](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/8.jpg)
• d = 1: In this case, i advances in steps of 2^(d+1) = 4 elements.
• For i = 0:
  a[0 + 2^(1+1) - 1] := a[0 + 2^1 - 1] + a[0 + 2^(1+1) - 1], or a[3] := a[1] + a[3]
• For i = 4:
  a[4 + 2^(1+1) - 1] := a[4 + 2^1 - 1] + a[4 + 2^(1+1) - 1], or a[7] := a[5] + a[7]
When d = 1
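To follow the first pass on the example input, it helps to record the array after each level. The function name `up_sweep_levels` is our own, and the sketch assumes n is a power of two. After d = 0, the odd cells hold pairwise sums. After d = 1, a[3] = 4 and a[7] = 23 hold the sums of each half. After d = 2, a[7] holds the total 27.

```python
def up_sweep_levels(a):
    """Return a snapshot of the array after each level d of the first pass.

    Assumes len(a) is a power of two; the caller's list is not modified.
    """
    a = list(a)
    n = len(a)
    snapshots = []
    for d in range(n.bit_length() - 1):  # d = 0 .. log n - 1
        for i in range(0, n, 2 ** (d + 1)):
            a[i + 2 ** (d + 1) - 1] = a[i + 2 ** d - 1] + a[i + 2 ** (d + 1) - 1]
        snapshots.append(list(a))
    return snapshots


for d, snap in enumerate(up_sweep_levels([5, 3, -6, 2, 7, 10, -2, 8])):
    print(d, snap)
# 0 [5, 8, -6, -4, 7, 17, -2, 6]
# 1 [5, 8, -6, 4, 7, 17, -2, 23]
# 2 [5, 8, -6, 4, 7, 17, -2, 27]
```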
![Page 9](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/9.jpg)
• blue: no change from the last iteration.
• magenta: changed in the current iteration.
The First Pass
![Page 10](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/10.jpg)
Second Pass
• The idea in the second pass is to do a top-down computation to generate all the prefix sums.
• We use the notation pre[v] to denote the prefix sum at node v.
The Second Pass
![Page 11](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/11.jpg)
• pre[root] := 0, which is the identity element for the operation, since we are considering the + operation.
• If the operation is max, the identity element is -∞.
Computation in the second phase
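The identity element matters because it is the value that an element with no predecessors receives. The following sequential sketch computes the exclusive prefix, which is what pre[·] holds at the leaves. It takes the operator and its identity as parameters, and the function name `exclusive_prefix` is our own. The sketch uses 0 as the identity for + and -∞ as the identity for max.

```python
import math


def exclusive_prefix(a, op, identity):
    """pre[i] = identity op a[0] op ... op a[i-1]; pre[0] is the identity."""
    pre, running = [], identity
    for x in a:
        pre.append(running)
        running = op(running, x)
    return pre


print(exclusive_prefix([5, 3, -6, 2], lambda x, y: x + y, 0))  # [0, 5, 8, 2]
print(exclusive_prefix([5, 3, -6, 2], max, -math.inf))        # [-inf, 5, 5, 5]
```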
![Page 12](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/12.jpg)
pre[L[v]] := pre[v]
pre[R[v]] := sum[L[v]] + pre[v]
Second phase (continued)
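Both passes on an explicit tree can be sketched in Python as follows. The sketch assumes the same heap-style layout used for illustration earlier: node 1 is the root, L[v] = 2v, R[v] = 2v + 1, and the leaves are nodes n .. 2n - 1. It also assumes n is a power of two and the operation is + with identity 0. The function name `prefix_tree` is hypothetical. The top-down loop visits every parent before its children, so pre[v] is always set by the time it is read.

```python
def prefix_tree(leaves):
    """Two-pass tree prefix computation; returns pre[] at the leaves.

    Heap-style layout (assumed): root 1, L[v] = 2v, R[v] = 2v + 1,
    leaves n .. 2n - 1, with n a power of two. Operation +, identity 0.
    """
    n = len(leaves)
    sum_ = [0] * (2 * n)
    sum_[n:] = leaves
    for v in range(n - 1, 0, -1):                # first pass, bottom-up
        sum_[v] = sum_[2 * v] + sum_[2 * v + 1]
    pre = [0] * (2 * n)
    pre[1] = 0                                   # pre[root] := identity for +
    for v in range(1, n):                        # second pass, top-down
        pre[2 * v] = pre[v]                      # pre[L[v]] := pre[v]
        pre[2 * v + 1] = sum_[2 * v] + pre[v]    # pre[R[v]] := sum[L[v]] + pre[v]
    return pre[n:]


print(prefix_tree([5, 3, -6, 2, 7, 10, -2, 8]))  # [0, 5, 8, 2, 4, 11, 21, 19]
```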
![Page 13](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/13.jpg)
Example of second phase
pre[L[v]] := pre[v]
pre[R[v]] := sum[L[v]] + pre[v]
![Page 14](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/14.jpg)
for d = (log n - 1) downto 0 do
    for i = 0 to n - 1 by 2^(d+1) do in parallel
        temp := a[i + 2^d - 1]
        a[i + 2^d - 1] := a[i + 2^(d+1) - 1]              (left child)
        a[i + 2^(d+1) - 1] := temp + a[i + 2^(d+1) - 1]   (right child)
• Before the loop starts, a[7] (the root, a[n - 1]) is set to 0.
Parallel prefix computation
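The in-place second pass can be sketched in Python as a sequential simulation, assuming n is a power of two and the operation is +. The function name `down_sweep` is our own. Its input is the array left by the first pass on the example, which we computed by hand as [5, 8, -6, 4, 7, 17, -2, 27]. After the pass, the array holds 0 followed by every prefix sum except the last.

```python
def down_sweep(a):
    """In-place second pass; `a` must hold the first-pass result (n a power of two)."""
    n = len(a)
    log_n = n.bit_length() - 1
    a[n - 1] = 0                              # root's prefix: identity for +
    for d in range(log_n - 1, -1, -1):
        step = 2 ** (d + 1)
        for i in range(0, n, step):           # independent iterations ("in parallel")
            left, right = i + 2 ** d - 1, i + step - 1
            temp = a[left]                    # sum of the left subtree
            a[left] = a[right]                # left child inherits the parent's prefix
            a[right] = temp + a[right]        # right child adds the left subtree's sum
    return a


print(down_sweep([5, 8, -6, 4, 7, 17, -2, 27]))  # [0, 5, 8, 2, 4, 11, 21, 19]
```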
![Page 15](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/15.jpg)
• We consider the case d = 2 and i = 0:
  temp := a[0 + 2^2 - 1], or temp := a[3]
  a[0 + 2^2 - 1] := a[0 + 2^(2+1) - 1], or a[3] := a[7]
  a[0 + 2^(2+1) - 1] := temp + a[0 + 2^(2+1) - 1], or a[7] := temp + a[7], where temp holds the old value of a[3]
Parallel prefix computation
![Page 16](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/16.jpg)
• blue: no change from the last iteration.
• magenta: left child.
• brown: right child.
Parallel prefix computation
![Page 17](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/17.jpg)
• The leaves of the tree now hold, from left to right, 0 followed by all the prefix sums except the last one.
• The prefix sums therefore have to be shifted one position to the left. In addition, the last prefix sum (the sum of all the elements) must be inserted at the last leaf.
• The complexity is O(log n) time with O(n) processors.
Exercise: Reduce the processor complexity to O(n / log n).
Parallel prefix computation
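The final shift can be sketched as follows, using the example's second-pass output. The helper name is hypothetical. In the in-place array version, the total is a[n - 1] after the first pass. It is overwritten with 0 before the second pass, so it must either be saved first or be recomputed as pre[n - 1] + a_{n-1} (here 19 + 8 = 27). Each output position reads only one input position, so the shift is also a single parallel step.

```python
def inclusive_from_exclusive(pre, total):
    """Shift the exclusive prefixes left by one and append the overall sum."""
    return pre[1:] + [total]


pre = [0, 5, 8, 2, 4, 11, 21, 19]  # leaves after the second pass (example)
print(inclusive_from_exclusive(pre, 27))  # [5, 8, 2, 4, 11, 21, 19, 27]
```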
![Page 18](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/18.jpg)
• Vertex x precedes vertex y if x appears before y in the preorder (depth-first) traversal of the tree.
Lemma: After the second pass, each vertex of the tree contains the sum of all the leaf values that precede it.
Proof: The proof is by induction, starting from the root and working down the tree.
Proof of correctness
![Page 19](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/19.jpg)
Inductive hypothesis: The parent v has the correct prefix sum pre[v]. We show that both of its children then receive the correct prefix sums.
Base case: This holds for the root, since no leaf precedes it, so pre[root] is correctly the identity element (0 for +).
Proof of correctness
![Page 20](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/20.jpg)
• Left child: The left child L[v] of vertex v has exactly the same leaves preceding it as the vertex itself.
Proof of correctness
![Page 21](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/21.jpg)
• These are the leaves in region A for vertex L[v].
• Hence, for L[v] we can copy pre[v], since the parent's prefix sum is correct by the inductive hypothesis.
Proof of correctness
![Page 22](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d625503460f94a442c9/html5/thumbnails/22.jpg)
• Right child: The right child R[v] of v has two sets of leaves preceding it:
  • the leaves preceding the parent v (region A), and
  • the leaves in the subtree of L[v] (region B), whose total is sum[L[v]].
pre[v] is correct by the inductive hypothesis, and sum[L[v]] was computed in the first pass.
Hence, pre[R[v]] := pre[v] + sum[L[v]].
Proof of correctness
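The lemma can also be spot-checked empirically. This does not replace the proof. The sketch assumes the heap-style layout used for illustration (root 1, L[v] = 2v, R[v] = 2v + 1, leaves n .. 2n - 1) and n a power of two. In that layout, the leaves that precede v in preorder are exactly those to the left of v's subtree. These are found by descending from v to its leftmost leaf. The function name `check_lemma` and the random test inputs are our own.

```python
import random


def check_lemma(leaves):
    """True if, after both passes, every node v holds the sum of the leaves
    preceding it in preorder (the leaves left of v's subtree)."""
    n = len(leaves)
    s = [0] * (2 * n)
    s[n:] = leaves
    for v in range(n - 1, 0, -1):        # first pass, bottom-up
        s[v] = s[2 * v] + s[2 * v + 1]
    pre = [0] * (2 * n)                  # pre[1] = 0 is the root's identity
    for v in range(1, n):                # second pass, top-down
        pre[2 * v] = pre[v]
        pre[2 * v + 1] = s[2 * v] + pre[v]
    for v in range(1, 2 * n):
        leftmost = v
        while leftmost < n:              # descend to the leftmost leaf under v
            leftmost *= 2
        if pre[v] != sum(leaves[: leftmost - n]):
            return False
    return True


random.seed(0)
for k in range(6):                       # n = 1, 2, 4, ..., 32
    assert check_lemma([random.randint(-9, 9) for _ in range(2 ** k)])
print("lemma holds on all tested inputs")
```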