Post on 21-Dec-2015
TRANSCRIPT
Parallel sorting…
• n data values.
• Sequential algorithms ~ O(n log n), except special cases.
• Motivation:
• As n gets huge, memory must be distributed.
• We want some parallel speedup.
• On p processes, nlocal = nglobal / nprocs.
• O(nglobal(log nglobal + 1)) possible with a parallel sort.
Naive solution…
• Distributed sequential sort method.
• Still ~ O(nglobal log nglobal) + communication cost:
• As n gets huge this becomes painful, because many short messages are passed.
• No parallel speedup.
• Optimal for memory space.
• Consider a lightly optimized bubble sort: O(n²).
[Diagram: bubble-sorting the string "SORTINGEXAMPLEONEA": the data are partitioned across processes, each partition is sorted, and boundary elements are repeatedly compared/swapped between neighbors (with re-sorts) until the whole sequence is sorted. DONE.]
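The "lightly optimized" bubble sort mentioned above presumably refers to the usual early-exit variant; a minimal sketch in C (the function name and the early-exit flag are assumptions, not from the slides):

```c
#include <stdbool.h>
#include <stddef.h>

/* Bubble sort with an early-exit flag: stop as soon as a full pass
 * makes no swaps.  Still O(n^2) in the worst case, as noted above. */
void bubble_sort(int *a, size_t n)
{
    bool swapped = true;
    while (n > 1 && swapped) {
        swapped = false;
        for (size_t i = 1; i < n; i++) {
            if (a[i - 1] > a[i]) {
                int tmp = a[i - 1];
                a[i - 1] = a[i];
                a[i] = tmp;
                swapped = true;
            }
        }
        n--; /* the largest remaining value has bubbled to position n-1 */
    }
}
```

On nearly sorted input the early exit helps, but the worst case stays quadratic, which is why the following slides distribute the work instead.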
Distributed sequential sort…
[Diagram: the input "SORTINGEXAMPLEONEA" partitioned across processes, alternately labelled odd/even.]
Odd-even sort…
[Diagram: odd-even sort trace on six processes p0, …, p5. Each pe first sorts data[0,…,nlocal-1]. Then, per iteration: odds send left, evens merge, evens return the high half; evens send left, odds merge, odds return the high half. DONE after ceil(nprocs/2) iterations (first, second, and third iterations shown).]
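The merge/return-high step in the trace above is a compare-split: after it, every value kept on the low side is ≤ every value on the high side. A sketch, assuming both input runs are already sorted (the name compare_split is illustrative):

```c
#include <stdlib.h>
#include <string.h>

/* One compare-split step of odd-even sort: merge two sorted runs of
 * nlocal ints; "low" keeps the smallest nlocal values, "high" the rest.
 * On a pe, "low" would be data[] and "high" the received buffer[]. */
void compare_split(int *low, int *high, size_t nlocal)
{
    int *merged = malloc(2 * nlocal * sizeof(int));
    size_t i = 0, j = 0, k = 0;
    while (k < 2 * nlocal) {
        if (j == nlocal || (i < nlocal && low[i] <= high[j]))
            merged[k++] = low[i++];
        else
            merged[k++] = high[j++];
    }
    memcpy(low, merged, nlocal * sizeof(int));           /* keep low half  */
    memcpy(high, merged + nlocal, nlocal * sizeof(int)); /* return high half */
    free(merged);
}
```

Because both runs are sorted, the merge is O(nlocal) rather than a full re-sort, which is what makes each odd-even iteration cheap.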
Odd-even sort… (subtleties)
• Allocate contiguous memory for the merge:
int *data, *buffer;
data = (int*)calloc(2*nlocal, sizeof(int)); // 2*nlocal ints allocated; data[0,…,nlocal-1] is significant
buffer = &data[nlocal]; // buffer points to the second half of the data[ ] array
• Need to figure out the left and right neighbors:
left_neighbor = mype-1; // mype will send nlocal ints from &data[0] to left_neighbor’s &buffer[0]
right_neighbor = mype+1; // mype will send nlocal ints from &buffer[0] to right_neighbor’s &data[0]
• BUT…
• …the left-most pe (mype == 0) has nowhere to send the low values in data[0,…,nlocal-1] since it has no left neighbor.
• This pe’s left_neighbor should be set to MPI_PROC_NULL. Sending to MPI_PROC_NULL returns immediately without doing anything.
• AND …
• …the right-most pe (mype == nprocs-1) has nowhere to return the high data pointed to by buffer because it has no right neighbor.
• This pe should send the high values in buffer[0,…,nlocal-1] to MPI_PROC_NULL, and
• EITHER
• When initializing data[ ], this pe should also pad buffer[0,…,nlocal-1] with INT_MAX (assuming we’re sorting ints) so that the merge step operates correctly,
• OR
• This pe should not execute the merge step of the iteration.
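The allocation, neighbor, and padding rules above can be sketched without any actual MPI calls; mype and nprocs would come from MPI_Comm_rank/MPI_Comm_size, and NO_NEIGHBOR stands in for MPI_PROC_NULL (the struct and all names here are illustrative):

```c
#include <limits.h>
#include <stdlib.h>

#define NO_NEIGHBOR (-1) /* stand-in for MPI_PROC_NULL in this sketch */

typedef struct {
    int *data;   /* data[0,…,nlocal-1] holds this pe's values        */
    int *buffer; /* aliases data[nlocal,…,2*nlocal-1], used for merge */
    int left_neighbor, right_neighbor;
} pe_state;

pe_state setup(int mype, int nprocs, size_t nlocal)
{
    pe_state s;
    s.data = calloc(2 * nlocal, sizeof(int)); /* one contiguous block   */
    s.buffer = &s.data[nlocal];               /* second half is scratch */
    s.left_neighbor  = (mype == 0)          ? NO_NEIGHBOR : mype - 1;
    s.right_neighbor = (mype == nprocs - 1) ? NO_NEIGHBOR : mype + 1;
    /* Right-most pe: pad buffer with INT_MAX so the merge step works
     * unchanged (the slide's "EITHER" option). */
    if (s.right_neighbor == NO_NEIGHBOR)
        for (size_t i = 0; i < nlocal; i++)
            s.buffer[i] = INT_MAX;
    return s;
}
```

With real MPI, sending to MPI_PROC_NULL returns immediately, so the same send/receive code runs on every pe without special-casing the ends.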
Shearsort…
• We have seen odd-even sort treating the processes as a 1D array decomposed into odds and evens.
• How about odd-even sort on 2D array of processes?
• Big speed-up requires special ordering
[Diagram: the special "snake" ordering of 16 processes:
0 1 2 3
7 6 5 4
8 9 10 11
15 14 13 12
(smallest number at the top left, largest number at the bottom left).]
Shearsort…
• For i = 1, 2, 3, …, log(n)+1:
• If i is odd:
• sort even rows biggest at left, smallest at right
• sort odd rows smallest at left, biggest at right
• If i is even:
• sort all columns so the smallest number is at the top and the biggest at the bottom
[Diagram: shearsort trace on the 4×4 grid
S O R T
E G N I
X A M P
N O E L
alternating snake-order row sorts (phases n = 1, 3, 5) with column sorts (phases n = 2, 4) until the grid is sorted in snake order after n = 5 phases. Done.]
Other sorts…
• Bucketsort:
• Pe’s partition their data into small buckets.
• Pe’s send appropriate chunks to “large buckets” on each pe.
• Pe’s sort “large buckets”.
• (Optional: a preprocessing stage where the master collects info on the distribution and assigns buckets to slaves. This deals with non-uniform distribution problems.)
• Parallel mergesort: O(log² n)
• Odd-even implementation of standard mergesort.
• Parallel quicksort: O(n)
• master-slave: difficult to balance sub-tasks.
• tree implementation: even harder to balance the tree.
• hypercube topology can be optimal!
• still O(n²) worst case.
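The bucketsort idea above, sketched sequentially for values known to lie in [lo, hi); a real distributed version would exchange the chunks between pes (e.g. with MPI_Alltoallv), which this sketch does not do, and all names here are illustrative:

```c
#include <stdlib.h>

static int cmp_int(const void *a, const void *b) { return *(const int*)a - *(const int*)b; }

/* Partition a[0,…,n-1] into nbuckets equal value ranges ("large
 * buckets"), sort each bucket, and write the concatenation back:
 * the result is globally sorted. */
void bucketsort(int *a, size_t n, int lo, int hi, int nbuckets)
{
    size_t *count = calloc(nbuckets, sizeof(size_t));
    size_t *start = calloc(nbuckets, sizeof(size_t));
    size_t *next  = malloc(nbuckets * sizeof(size_t));
    int *out = malloc(n * sizeof(int));
    double width = (double)(hi - lo) / nbuckets;

    for (size_t i = 0; i < n; i++)                 /* count per bucket   */
        count[(int)((a[i] - lo) / width)]++;
    start[0] = 0;                                  /* bucket offsets     */
    for (int b = 1; b < nbuckets; b++)
        start[b] = start[b - 1] + count[b - 1];
    for (int b = 0; b < nbuckets; b++) next[b] = start[b];
    for (size_t i = 0; i < n; i++) {               /* scatter to buckets */
        int b = (int)((a[i] - lo) / width);
        out[next[b]++] = a[i];
    }
    for (int b = 0; b < nbuckets; b++)             /* sort each bucket   */
        qsort(out + start[b], count[b], sizeof(int), cmp_int);
    for (size_t i = 0; i < n; i++) a[i] = out[i];

    free(count); free(start); free(next); free(out);
}
```

The equal-width ranges are where the non-uniform-distribution problem shows up: skewed data overloads one bucket, which is what the optional master preprocessing stage is meant to fix.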