CSCI-455/552 Introduction to High Performance Computing Lecture 13

Upload: grant-grant

Post on 04-Jan-2016



Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers, 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.

Sorting Numbers

A parallel version of insertion sort.


Pipeline for sorting using insertion sort

Type 2 pipeline computation


The basic algorithm for process P_i is

    recv(&number, P_{i-1});
    if (number > x) {
        send(&x, P_{i+1});
        x = number;
    } else
        send(&number, P_{i+1});

With n numbers, the number of values the ith process accepts is n - i. The number of values it passes onward is n - i - 1. Hence, a simple loop could be used.
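The compare-and-forward rule can be checked with a sequential simulation. The sketch below is not message-passing code: the array `x` stands in for the value held by each process, and the function name `pipeline_insertion_sort` and its parameters are illustrative assumptions. Each stage keeps the larger of the two values and hands the smaller one to the next stage, so stage i sees n - i numbers in total.

```c
#include <stddef.h>

/* Sequential simulation of the insertion-sort pipeline (illustrative only).
 * x[i] plays the role of the value held by process P_i.
 * input and x must both have room for n values. */
void pipeline_insertion_sort(const int *input, int *x, size_t n)
{
    size_t filled = 0;                 /* stages that already hold a value */

    for (size_t k = 0; k < n; k++) {
        int number = input[k];         /* next number entering the pipeline */

        /* Pass through every stage that already holds a value. */
        for (size_t i = 0; i < filled; i++) {
            if (number > x[i]) {       /* keep the larger, forward the smaller */
                int smaller = x[i];
                x[i] = number;
                number = smaller;
            }
        }

        /* The first empty stage simply accepts what reaches it. */
        x[filled++] = number;
    }
}
```

After the last number has entered, `x[0]` holds the largest value and `x[n-1]` the smallest, which is the order in which the pipeline stages end up holding them.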


Insertion sort with results returned to master process using bidirectional line configuration


Insertion sort with results returned


The code for a process, P_i, could be based upon

    recv(&x, P_{i-1});
    /* repeat following for each number */
    recv(&number, P_{i-1});
    if ((number % x) != 0) send(&number, P_{i+1});

The processes do not all receive the same count of numbers, and the count is not known beforehand. Use a “terminator” message, which is sent at the end of the sequence:

    recv(&x, P_{i-1});
    for (i = 0; i < n; i++) {
        recv(&number, P_{i-1});
        if (number == terminator) break;
        if ((number % x) != 0) send(&number, P_{i+1});
    }
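The transcript does not name this computation, but the filter `(number % x) != 0` matches a pipelined sieve for prime numbers, where each process keeps the first number it receives and drops its multiples. Under that assumption, a sequential simulation might look like the sketch below. The function name `pipeline_sieve` and its parameters are hypothetical, and the end of the outer loop plays the role of the terminator message.

```c
#include <stddef.h>

/* Sequential simulation of the filtering pipeline (illustrative only).
 * stage_x[i] is the first number received by process P_i.
 * Feeds 2..limit into the pipeline and returns the number of stages used.
 * Numbers that survive every stage after max_stages is reached are dropped. */
size_t pipeline_sieve(int limit, int *stage_x, size_t max_stages)
{
    size_t stages = 0;

    for (int number = 2; number <= limit; number++) {
        size_t i = 0;

        /* Each stage forwards the number only if it is not a multiple of x. */
        while (i < stages && (number % stage_x[i]) != 0)
            i++;

        /* A number that passes every stage becomes x of the next stage. */
        if (i == stages && stages < max_stages)
            stage_x[stages++] = number;
    }
    return stages;
}
```

In a real message-passing version each process would also need to forward the terminator to its successor; the slide pseudocode leaves that step out, so it is not modeled here.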


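Solving a System of Linear Equations

Upper-triangular form

\[
\begin{aligned}
a_{n-1,0}x_0 + a_{n-1,1}x_1 + a_{n-1,2}x_2 + \cdots + a_{n-1,n-1}x_{n-1} &= b_{n-1} \\
&\;\;\vdots \\
a_{2,0}x_0 + a_{2,1}x_1 + a_{2,2}x_2 &= b_2 \\
a_{1,0}x_0 + a_{1,1}x_1 &= b_1 \\
a_{0,0}x_0 &= b_0
\end{aligned}
\]

where the a's and b's are constants and the x's are unknowns to be found.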


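Back Substitution

First, unknown x_0 is found from the last equation; i.e.,

\[ x_0 = \frac{b_0}{a_{0,0}} \]

The value obtained for x_0 is substituted into the next equation to obtain x_1; i.e.,

\[ x_1 = \frac{b_1 - a_{1,0}x_0}{a_{1,1}} \]

The values obtained for x_1 and x_0 are substituted into the next equation to obtain x_2:

\[ x_2 = \frac{b_2 - a_{2,0}x_0 - a_{2,1}x_1}{a_{2,2}} \]

and so on until all the unknowns are found.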
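As an illustration, consider a three-equation system with made-up coefficients (these numbers are not from the slides):

\[
\begin{aligned}
x_0 + 2x_1 + 4x_2 &= 20 \\
x_0 + x_1 &= 5 \\
2x_0 &= 4
\end{aligned}
\]

Working upward from the last equation:

\[
x_0 = \frac{4}{2} = 2, \qquad
x_1 = \frac{5 - 1\cdot 2}{1} = 3, \qquad
x_2 = \frac{20 - 1\cdot 2 - 2\cdot 3}{4} = 3
\]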


Pipeline Solution

The first pipeline stage computes x_0 and passes x_0 on to the second stage, which computes x_1 from x_0 and passes both x_0 and x_1 on to the next stage, which computes x_2 from x_0 and x_1, and so on.

Type 3 pipeline computation


The ith process (0 < i < n) receives the values x_0, x_1, x_2, …, x_{i-1} and computes x_i from the equation:

\[ x_i = \frac{b_i - \sum_{j=0}^{i-1} a_{i,j}x_j}{a_{i,i}} \]


Sequential Code

Given constants a_{i,j} and b_k stored in arrays a[ ][ ] and b[ ], respectively, and values for unknowns to be stored in array x[ ], sequential code could be

    x[0] = b[0]/a[0][0];              /* computed separately */
    for (i = 1; i < n; i++) {         /* for remaining unknowns */
        sum = 0;
        for (j = 0; j < i; j++)
            sum = sum + a[i][j]*x[j];
        x[i] = (b[i] - sum)/a[i][i];
    }
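The loop above can be wrapped into a compilable function. The sketch below assumes the coefficients are stored row-major in a flat array, so that `a[i*n + j]` stands for a[i][j], and that every diagonal entry is nonzero. The function name `back_substitute` is an illustrative choice. Starting the outer loop at i = 0 covers x[0] as well, since the inner loop is then empty and `sum` stays 0.

```c
#include <stddef.h>

/* Solves the triangular system from the slides (illustrative sketch).
 * a: n*n coefficients, row-major, only entries with j <= i are read.
 * b: n right-hand-side constants.
 * x: output, n unknowns.
 * Assumes a[i*n + i] != 0 for every i. */
void back_substitute(size_t n, const double *a, const double *b, double *x)
{
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;

        /* Accumulate the contribution of the unknowns already found. */
        for (size_t j = 0; j < i; j++)
            sum += a[i * n + j] * x[j];

        x[i] = (b[i] - sum) / a[i * n + i];
    }
}
```

The work for unknown i grows with i, so the total is on the order of n² multiply-add operations.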


Parallel Code

Pseudocode of process P_i (1 < i < n) could be

    for (j = 0; j < i; j++) {
        recv(&x[j], P_{i-1});
        send(&x[j], P_{i+1});
    }
    sum = 0;
    for (j = 0; j < i; j++)
        sum = sum + a[i][j]*x[j];
    x[i] = (b[i] - sum)/a[i][i];
    send(&x[i], P_{i+1});

Each process now has additional computations to do after receiving and resending values.
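The receive, forward, then compute pattern of each stage can be simulated sequentially. In the sketch below, `stage` models one process: the `in` buffer stands for the messages received from the previous stage and `out` for the messages sent onward. The names `stage` and `pipeline_solve`, the flat row-major layout of `a`, and the two-buffer hand-off are all assumptions made for illustration. There is no real message passing and no overlap between stages here.

```c
#include <stddef.h>
#include <string.h>

/* One simulated pipeline stage P_i (illustrative only).
 * in:  the i values x[0..i-1] "received" from the previous stage.
 * out: receives those i values plus the newly computed x[i]. */
void stage(size_t i, size_t n, const double *a, const double *b,
           const double *in, double *out)
{
    /* Receive each earlier unknown and resend it unchanged. */
    for (size_t j = 0; j < i; j++)
        out[j] = in[j];

    /* Additional computation after receiving and resending. */
    double sum = 0.0;
    for (size_t j = 0; j < i; j++)
        sum += a[i * n + j] * out[j];
    out[i] = (b[i] - sum) / a[i * n + i];
}

/* Chains stages 0..n-1; x and scratch must each hold n values. */
void pipeline_solve(size_t n, const double *a, const double *b,
                    double *x, double *scratch)
{
    double *in = scratch;
    double *out = x;

    for (size_t i = 0; i < n; i++) {
        stage(i, n, a, b, in, out);
        double *t = in;                /* output of P_i feeds P_{i+1} */
        in = out;
        out = t;
    }

    /* After the final swap, `in` holds the last stage's output. */
    if (in != x)
        memcpy(x, in, n * sizeof *x);
}
```

In an actual pipeline, a stage could forward each x[j] as soon as it arrives, so later stages start receiving before earlier stages finish their own sums. That overlap is what the sequential simulation cannot show.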


Pipeline processing using back substitution
