Sorting on the Cray X1 - Cray User Group

Sorting on the Cray X1

Helene Kulsrud

CCR-P

Sorting Methods

• Heap Sort

• Bubble Sort

• Insertion Sort

• Shellsort

• Mergesort

• Quicksort

• Radix Sort

• Radix Exchange Sort

• etc…

• Parallel Sorts etc.

Sorting on the Cray X1

• QKSORT – A scalar sort (radix exchange)

• PSORT - A vectorized sort (radix)

• MSORT - A multiprocessor sort

[Figure sequence: radix exchange steps on a list of 4-bit binary keys. The first frame shows the keys 1110 1011 1010 0100 1100 0110 1111 0011 0001 0101 1101 1000 0100 0000 0111; later frames show pairs of keys being exchanged within sublists.]

QKSORT on Cray X1

• A version of standard radix sort.

• Basically a scalar code.

• Code has two distinct parts.

• Part 1: Interchange which uses vector registers.

• Part 2: Short sort in vector registers.

• Two routines in assembly language.
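
The radix exchange idea behind QKSORT can be sketched in plain scalar C. This is not the QKSORT code: it uses no vector registers or assembly, the names `radix_exchange_sort`, `short_sort` and the cutoff `SHORT_LEN` are made up for illustration, and an ordinary insertion sort stands in for the vector-register short sort.

```c
#include <stddef.h>
#include <stdint.h>

#define SHORT_LEN 8   /* hypothetical cutoff below which the short sort takes over */

/* Stand-in for the short sort: plain insertion sort on a[0..n-1]. */
static void short_sort(uint64_t *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        uint64_t v = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > v) { a[j] = a[j - 1]; j--; }
        a[j] = v;
    }
}

/* Sort a[0..n-1] ascending, examining bits from `bit` down to 0. */
void radix_exchange_sort(uint64_t *a, size_t n, int bit)
{
    if (n < 2 || bit < 0) return;
    if (n <= SHORT_LEN) { short_sort(a, n); return; }

    uint64_t mask = (uint64_t)1 << bit;
    size_t lo = 0, hi = n;   /* a[0..lo) has the bit clear, a[hi..n) has it set */
    while (lo < hi) {
        if ((a[lo] & mask) == 0) {
            lo++;
        } else {
            hi--;            /* interchange: move the key to the upper part */
            uint64_t t = a[lo]; a[lo] = a[hi]; a[hi] = t;
        }
    }
    radix_exchange_sort(a, lo, bit - 1);          /* keys with the bit clear */
    radix_exchange_sort(a + lo, n - lo, bit - 1); /* keys with the bit set */
}
```

For 4-bit keys the call would be `radix_exchange_sort(a, n, 3)`; for full 64-bit words the starting bit is 63.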

Partitioning in the Vector Registers

[Figure: diagram of 64-word blocks ("64 wds") with a "new" block at either end, held in vector registers V0, V1, V2, V4, V6 and V7.]

Short Sort

[Figure, two build steps: lists of values being sorted in vector registers V1 and V2.]

Improving QKSORT

            Count
Version      5000    10000    50000   100000   500000  1000000  4000000
Port         .016     .037     .217     .467    2.784    5.987   27.208
New          .002     .003     .018     .038     .205     .425    1.824

QKSORT in 2003

               Size
Machine        10000    50000   100000   500000  1000000  5000000  10000000
T3E             .006     .037     .079     .510    1.024        -         -
T90             .008     .041     .083     .427     .855    4.367     9.507
Alpha EV6.7     .002     .020     .025     .153     .342    2.242     4.756
Alpha EV6.8*    .001     .009     .016     .085     .180    1.273     2.888
X1              .003     .018     .038     .204     .425    2.331     4.805

* Row emphasized on the slide.

[Figure, two frames: the 4-bit binary keys to be sorted: 1110 1011 1101 0100 1100 0110 1111 0010 0001 0101 1000 0100 0000 0111.]

Counts in right 2-bit pocket

Value   Count   Address
00      5       0
01      3       5
10      3       8
11      3       11
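
The counts and addresses on this slide can be reproduced with a short counting pass followed by a running sum. The C sketch below is an illustration only; the function name `count_and_address` and its parameters are assumptions, not names from the talk.

```c
#include <stddef.h>

/* Count the keys in each 2-bit pocket selected by `shift`, then turn the
   counts into starting addresses (each address is the sum of the counts
   of all smaller pocket values). */
void count_and_address(const unsigned *keys, size_t n, unsigned shift,
                       size_t count[4], size_t addr[4])
{
    for (int v = 0; v < 4; v++) count[v] = 0;

    for (size_t i = 0; i < n; i++)
        count[(keys[i] >> shift) & 3u]++;

    size_t next = 0;
    for (int v = 0; v < 4; v++) {
        addr[v] = next;      /* first slot for pocket value v */
        next += count[v];
    }
}
```

With `shift = 0` the function looks at the right 2 bits of each key, and with `shift = 2` at the left 2 bits of a 4-bit key.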

[Figure, two frames: the keys being moved into their pockets according to the right 2 bits.]

Counts in left 2-bit pocket

Value   Count   Address
00      3       0
01      5       3
10      2       8
11      4       10

[Figure: the keys redistributed according to the left 2 bits into pockets labeled 00.., 01.., 10.. and 11..]

PSORT on the Cray X1

• A radix sort.

• It requires double memory.

• Three major parts to the code.

• A vector code.

• The address calculation used most time: word[j+1] = word[j] + word[j+1]

• Write address calculation in assembly language.

• Two versions – scalar and 2 vectors
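
As a rough illustration of the points above, here is a minimal least-significant-digit radix sort in C with a second buffer. It is a sketch, not PSORT: the 8-bit digit width, the function name `lsd_radix_sort`, and the mapping of the "three major parts" onto count, address calculation and move are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGIT_BITS 8                 /* assumed digit width */
#define BUCKETS (1u << DIGIT_BITS)

/* Sort a[0..n-1] ascending; tmp must hold n words (the "double memory"). */
void lsd_radix_sort(uint64_t *a, uint64_t *tmp, size_t n)
{
    uint64_t *src = a, *dst = tmp;

    for (unsigned shift = 0; shift < 64; shift += DIGIT_BITS) {
        size_t word[BUCKETS + 1] = {0};

        /* Count how many keys fall in each pocket. */
        for (size_t i = 0; i < n; i++)
            word[((src[i] >> shift) & (BUCKETS - 1)) + 1]++;

        /* Address calculation: word[j+1] = word[j] + word[j+1]. */
        for (size_t j = 0; j < BUCKETS; j++)
            word[j + 1] = word[j] + word[j + 1];

        /* Move each key to the next free slot of its pocket. */
        for (size_t i = 0; i < n; i++)
            dst[word[(src[i] >> shift) & (BUCKETS - 1)]++] = src[i];

        uint64_t *t = src; src = dst; dst = t;   /* swap buffer roles */
    }

    /* The pass count is even here, so the result is normally already in a. */
    if (src != a) memcpy(a, src, n * sizeof *a);
}
```

The address loop carries a dependence from one element to the next, which is why a straightforward version of it does not vectorize.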

Two vectors Address Update

k = (n-1)/63 + 1

for (j=1; j<k; j++)
    for (i=0; i<63; i++)
        word[k*i+j+1] = word[k*i+j+1] + word[k*i+j]

for (i=1; i<63; i++)
    for (j=1; j<k; j++)
        word[k*i+j] = word[k*i] + word[k*i+j]
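
One way to read the two loops is as a blocked running sum: the array is cut into 63 blocks of length k, the first loop nest sums within all blocks at once (stepping across blocks with stride k), and the second adds the final total of the preceding block to each block in turn. The C sketch below follows that reading with 0-based indexing and explicit bounds checks, neither of which appears on the slide, so it is a reindexed interpretation rather than the original code. The name `blocked_prefix_sum` is hypothetical.

```c
#include <stddef.h>

#define NBLOCKS 63   /* block count taken from the slide */

/* In-place inclusive running sum of word[0..n-1], done in two phases. */
void blocked_prefix_sum(long *word, size_t n)
{
    if (n == 0) return;
    size_t k = (n - 1) / NBLOCKS + 1;   /* block length, as on the slide */

    /* Phase 1: running sum inside every block. The inner loop visits one
       element per block, and those updates are independent of each other. */
    for (size_t j = 1; j < k; j++)
        for (size_t i = 0; i < NBLOCKS; i++) {
            size_t p = k * i + j;
            if (p < n) word[p] += word[p - 1];
        }

    /* Phase 2: add the already final last element of the previous block
       to every element of block i. The inner loop is again independent. */
    for (size_t i = 1; i < NBLOCKS; i++) {
        size_t base = k * i;
        if (base >= n) break;
        long carry = word[base - 1];
        for (size_t j = 0; j < k && base + j < n; j++)
            word[base + j] += carry;
    }
}
```

Both inner loops are free of element-to-element dependences, which is the property that makes a two-loop formulation attractive on a vector machine.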

Improving PSORT

            Count
Version      5000    10000    50000   100000   500000  1000000  4000000
Ported      .0049    .0047    .0360    .0722    .4142    .4972   1.2252
V-S Loop    .0013    .0028    .0159    .0327    .1725    .2484    .7838
2V Loops    .0012    .0027    .0092    .0210    .1273    .2015    .7080

PSORT in 2003

               Size
Machine        10000    50000   100000   500000  1000000  5000000  10000000
T3E             .037     .377     .747    3.307    6.371        -         -
T90*            .003     .017     .019     .090     .183     .702      1.31
Alpha EV6.7     .004     .040     .120    1.490    3.016   12.801    27.604
Alpha EV6.8     .003     .023     .070     .394     .934    7.636    16.536
X1              .003     .009     .021     .127     .202     .866     1.684

* Row emphasized on the slide.

[Figure sequence, three frames: six boxes labeled SORT, one for each of PE 0 to PE 5; six boxes labeled Tiles on PE 0; six boxes labeled Merge.]

Improving MSORT

• Slow time due to SHMEM.

• May be due to barrier code.

• PSORT is faster than QKSORT.

• We need a double block of memory for the merge.

• However, times are similar since most of the time is spent moving data.
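
The slides do not show the merge code. As a minimal sketch of why a merge wants a second block of memory, assuming the data arrives as already sorted runs, a two-way merge in C writes its result to a separate output buffer. The name `merge_runs` and its signature are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Merge sorted runs x[0..nx-1] and y[0..ny-1] into out[0..nx+ny-1].
   out must not overlap x or y, hence the extra block of memory. */
void merge_runs(const uint64_t *x, size_t nx,
                const uint64_t *y, size_t ny, uint64_t *out)
{
    size_t i = 0, j = 0, k = 0;

    /* Take the smaller head element; ties go to x to keep the merge stable. */
    while (i < nx && j < ny)
        out[k++] = (y[j] < x[i]) ? y[j++] : x[i++];

    /* Copy whatever remains of either run. */
    while (i < nx) out[k++] = x[i++];
    while (j < ny) out[k++] = y[j++];
}
```

Every key is read once and written once per merge step, so the work is dominated by data movement rather than comparison logic.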

MSORT in 2003

             Size
Machine      10,000   100,000   1,000,000   10,000,000
T3E - 2        .003      .040        .488            -
T3E - 4        .002      .023        .263            -
T3E - 8        .002      .025        .142        1.661
T3E - 16       .004      .068        .099        1.098
T3E - 24       .009      .085        .074         .755
X1 - 2        .013*      .091        .955            -
X1 - 4         .014      .064        .650            -
X1 - 8         .033      .050        .492        4.011
X1 - 16        .042      .048        .348        2.292
X1 - 24        .109     .026*       .191*       1.674*

* Value emphasized on the slide.

What next?

• Can we think of a way to use multistreaming?

• If so – should we do it?

• Algorithms using SSP mode.

• Suggest compiler improvements to Cray.

What was the Point?

• Early exercise on the X1.

• Produce good sorts for the X1.

• Found compiler weaknesses.

• Found machine strengths.