lecture 19: parallel sorting - university of maryland · 2019-11-05 · parallel sorting •...

10
Lecture 19: Parallel Sorting Abhinav Bhatele, Department of Computer Science High Performance Computing Systems (CMSC714)

Upload: others

Post on 19-Jul-2020

15 views

Category:

Documents


7 download

TRANSCRIPT

Page 1: Lecture 19: Parallel Sorting - University Of Maryland · 2019-11-05 · Parallel Sorting • Sorting is used in many HPC codes • For example, figuring out which particles/atoms

Lecture 19: Parallel SortingAbhinav Bhatele, Department of Computer Science

High Performance Computing Systems (CMSC714)

Page 2: Lecture 19: Parallel Sorting - University Of Maryland · 2019-11-05 · Parallel Sorting • Sorting is used in many HPC codes • For example, figuring out which particles/atoms

Abhinav Bhatele, CMSC714

Summary of last lecture

• I/O can become a bottleneck when other portions of the code scale well

• Reading input datasets, writing numerical/scientific output, checkpointing

• Parallel file system required for high performance

• Different approaches

• One process per file, shared file, shared files for subsets of processes

• Contention for metadata server and OSTs/disks

2

Page 3: Lecture 19: Parallel Sorting - University Of Maryland · 2019-11-05 · Parallel Sorting • Sorting is used in many HPC codes • For example, figuring out which particles/atoms

Abhinav Bhatele, CMSC714

Parallel Sorting

• Sorting is used in many HPC codes

• For example, figuring out which particles/atoms are within a cutoff radius

• Two broad categories of parallel sorting algorithms:

• Merge-based

• Splitter-based

3

Page 4: Lecture 19: Parallel Sorting - University Of Maryland · 2019-11-05 · Parallel Sorting • Sorting is used in many HPC codes • For example, figuring out which particles/atoms

Abhinav Bhatele, CMSC714

Review Bitonic Sort

• Merge-based algorithm: sort by merging bitonic sequences

• Bitonic sequence: increases monotonically then decreases monotonically

• At each step, merge a bitonic sequence

4

9 6 1 5 14 7 15 11 2 12 13 4 16 8 3 10

6 9 5 1 7 14 15 11 2 12 13 4 8 16 10 3

1 5 6 9 15 14 11 7 2 4 12 13 16 10 8 3

1 5 6 7 9 11 14 15 16 13 12 10 8 4 3 2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Page 5: Lecture 19: Parallel Sorting - University Of Maryland · 2019-11-05 · Parallel Sorting • Sorting is used in many HPC codes • For example, figuring out which particles/atoms

Abhinav Bhatele, CMSC714

Review QuickSort

• Choose a pivot element from the unsorted list

• Move all elements < pivot before the pivot and all elements > pivot after the pivot

• Recursively apply this to the sublists before and after pivot

5

Page 6: Lecture 19: Parallel Sorting - University Of Maryland · 2019-11-05 · Parallel Sorting • Sorting is used in many HPC codes • For example, figuring out which particles/atoms

Abhinav Bhatele, CMSC714

Parallel Sample Sort

• Instead of selecting one pivot, we select p-1 samples (if there are p processors)

• This provides us with p-1 “splitters”

• These p-1 splitters create p buckets

• Keys are then sent to the appropriate bucket

• Why called sample sort? sample s keys randomly from each processor, sort sp keys and select p-1 splitters from this sorted sample

6

Page 7: Lecture 19: Parallel Sorting - University Of Maryland · 2019-11-05 · Parallel Sorting • Sorting is used in many HPC codes • For example, figuring out which particles/atoms

Abhinav Bhatele, CMSC714

Parallel Radix Sort

• Instead of comparing keys, looks at k bits of each key in every step

• k-bit radix sort looks at k bits in one step

• Move from least significant to most significant bits

• k bits leads to putting keys into 2k buckets in a step

• Parallel version:

• These buckets are assigned to p processes and key movement leads to all-to-all communication

• To balance buckets across processes: use histograms to decide assignment of buckets to processes

7

Page 8: Lecture 19: Parallel Sorting - University Of Maryland · 2019-11-05 · Parallel Sorting • Sorting is used in many HPC codes • For example, figuring out which particles/atoms

Abhinav Bhatele, CMSC714

Questions

• Can we talk about how the “plus scan” works as described in the “Scanning the Histogram” section? The paper essentially just states that it’s something that happens.

• The sending of data clearly dominates the run time of the described radix sort. Is this always the case with parallel sorting?

• What does it mean to pipeline operations?

• How do GPUs fare in these sorting schemes?

8

An Improved Supercomputer Sorting Benchmark

Page 9: Lecture 19: Parallel Sorting - University Of Maryland · 2019-11-05 · Parallel Sorting • Sorting is used in many HPC codes • For example, figuring out which particles/atoms

Abhinav Bhatele, CMSC714

Questions

• Are there commonly used libraries that implement these versions of parallel sorting algorithms? Are there other commonly used parallel sorting libraries?

• Can we go over the different sorting algorithms, the paper was confusing

• When is it important for sorting algorithms to be stable?

9

A Comparison of Sorting Algorithms for the Connection Machine CM-2

Page 10: Lecture 19: Parallel Sorting - University Of Maryland · 2019-11-05 · Parallel Sorting • Sorting is used in many HPC codes • For example, figuring out which particles/atoms

Abhinav Bhatele

5218 Brendan Iribe Center (IRB) / College Park, MD 20742

phone: 301.405.4507 / e-mail: [email protected]

Questions?