Introduction to Parallel Programming (Message Passing)

Francisco Almeida ([email protected]), Parallel Computing Group


Page 1: Introduction to Parallel Programming  (Message Passing)

Introduction to Parallel Programming (Message Passing)

Francisco Almeida

[email protected]

Parallel Computing Group

Page 2: Introduction to Parallel Programming  (Message Passing)

Beowulf Computers

• Distributed Memory

• COTS: Commercial-Off-The-Shelf computers

Page 3: Introduction to Parallel Programming  (Message Passing)
Page 4: Introduction to Parallel Programming  (Message Passing)

The Parallel Model

• Computational Models: PRAM, BSP, LogP

• Programming Models: PVM, MPI, HPF, Threads, OpenMP

• Architectural Models: Parallel Architectures

Page 5: Introduction to Parallel Programming  (Message Passing)

The Message Passing Model

Figure: several processors connected by an interconnection network; processes cooperate by exchanging messages with Send(parameters) and Recv(parameters).

Page 6: Introduction to Parallel Programming  (Message Passing)

Network of Workstations (Hardware)

• Sun Sparc Ultra 1, 143 MHz
• Etherswitch
• Distributed Memory
• Non-Shared Memory Space
• Star Topology

Page 7: Introduction to Parallel Programming  (Message Passing)

SGI Origin 2000 (Hardware)

• C4-CEPBA
• 64 R10000 processors
• 8 GB memory
• 32 Gflop/s
• Shared Distributed Memory
• Hypercubic Topology

Page 8: Introduction to Parallel Programming  (Message Passing)

Digital AlphaServer 8400 (Hardware)

• C4-CEPBA
• 10 Alpha 21164 processors
• 2 GB memory
• 8.8 Gflop/s
• Shared Memory
• Bus Topology

Page 9: Introduction to Parallel Programming  (Message Passing)

Drawbacks that Arise when Solving Problems Using Parallelism

Parallel programming is more complex than sequential programming.

Results may vary as a consequence of intrinsic non-determinism.

New problems appear: deadlocks, starvation...

It is more difficult to debug parallel programs.

Parallel programs are less portable.

Page 10: Introduction to Parallel Programming  (Message Passing)

MPI

Figure: MPI as the convergence of earlier message passing systems (EUI, p4, PVM, Express, Zipcode, CMMD, PARMACS), serving parallel applications, parallel libraries and parallel languages.

Page 11: Introduction to Parallel Programming  (Message Passing)

MPI

• What is MPI?
  • The Message Passing Interface standard
  • The first standard and portable message passing library with good performance
  • "Standard" by consensus of MPI Forum participants from over 40 organizations
  • Finished and published in May 1994, updated in June 1995

• What does MPI offer?
  • Standardization - on many levels
  • Portability - to existing and new systems
  • Performance - comparable to vendors' proprietary libraries
  • Richness - extensive functionality, many quality implementations

Page 12: Introduction to Parallel Programming  (Message Passing)

MPI hello.c

#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
  int name, p, source, dest, tag = 0;
  char message[100];
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &name);   /* name = rank of this process */
  MPI_Comm_size(MPI_COMM_WORLD, &p);      /* p = number of processes     */

  if (name != 0) {
    printf("Processor %d of %d\n", name, p);
    sprintf(message, "greetings from process %d!", name);
    dest = 0;
    MPI_Send(message, strlen(message) + 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
  } else {
    printf("processor 0, p = %d ", p);
    for (source = 1; source < p; source++) {
      MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
      printf("%s\n", message);
    }
  }
  MPI_Finalize();
  return 0;
}

Processor 2 of 4
Processor 3 of 4
Processor 1 of 4
processor 0, p = 4 greetings from process 1!
greetings from process 2!
greetings from process 3!

A Simple MPI Program

mpicc -o hello hello.c

mpirun -np 4 hello

Page 13: Introduction to Parallel Programming  (Message Passing)

Basic Communication Operations

Page 14: Introduction to Parallel Programming  (Message Passing)

One-to-all Broadcast / Single-node Accumulation

Figure: a one-to-all broadcast sends a message M from node 0 to all p nodes; the dual operation, single-node accumulation, combines a message from every node into node 0, proceeding in steps 1, 2, ..., p.

Page 15: Introduction to Parallel Programming  (Message Passing)

Broadcast on Hypercubes

Figure: first and second steps of a broadcast on a three-dimensional hypercube (nodes 0-7); in each step, every node that already holds the message forwards it along one additional dimension.

Page 16: Introduction to Parallel Programming  (Message Passing)

Broadcast on Hypercubes

Figure: third step of the hypercube broadcast; after log2(8) = 3 steps all eight nodes hold the message.

Page 17: Introduction to Parallel Programming  (Message Passing)

MPI Broadcast

int MPI_Bcast(
    void *buffer,
    int count,
    MPI_Datatype datatype,
    int root,
    MPI_Comm comm
);

Broadcasts a message from the process with rank "root" to all other processes of the group.
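As an illustration (not from the slides; the value of n and the variable names are made up), a complete program in which process 0 supplies n and MPI_Bcast makes it available to every process:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
    int rank, n = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        n = 1000;                       /* only the root knows the value initially */
    /* after the call every process holds n == 1000 */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("process %d: n = %d\n", rank, n);
    MPI_Finalize();
    return 0;
}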

Page 18: Introduction to Parallel Programming  (Message Passing)

Reduction on Hypercubes

@ is a commutative and associative operator.

A_i resides in processor i. Every processor has to obtain A0 @ A1 @ ... @ A(p-1).

Figure: reduction on a three-dimensional hypercube (processors 000 to 111). In each step, processors exchange partial results along one dimension; after three steps every processor holds A0 @ A1 @ ... @ A7.

Page 19: Introduction to Parallel Programming  (Message Passing)

Reductions with MPI

int MPI_Reduce(
    void *sendbuf,
    void *recvbuf,
    int count,
    MPI_Datatype datatype,
    MPI_Op op,
    int root,
    MPI_Comm comm
);

Reduces values on all processes to a single value on the root process.

int MPI_Allreduce(
    void *sendbuf,
    void *recvbuf,
    int count,
    MPI_Datatype datatype,
    MPI_Op op,
    MPI_Comm comm
);

Combines values from all processes and distributes the result back to all processes.
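A hedged sketch of the difference between the two calls (the local values are made up): MPI_Reduce leaves the sum only on the root, while MPI_Allreduce leaves it on every process.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
    int rank, p;
    int local, total_on_root = 0, total_everywhere = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    local = rank + 1;                    /* each process contributes rank + 1 */

    /* only process 0 receives the sum 1 + 2 + ... + p */
    MPI_Reduce(&local, &total_on_root, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("reduce: total on root = %d\n", total_on_root);

    /* every process receives the same sum */
    MPI_Allreduce(&local, &total_everywhere, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("allreduce: process %d sees total = %d\n", rank, total_everywhere);

    MPI_Finalize();
    return 0;
}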

Page 20: Introduction to Parallel Programming  (Message Passing)

All-To-All BroadcastMultinode Accumulation

. . .0 p1 . . .0 p1

M1All-to-all broadcast

Single-node AccumulationM0

Mp

M1

M2 Mp

M0 M0

M1 M1

MpMp

Reductions, Prefixsums

Page 21: Introduction to Parallel Programming  (Message Passing)

MPI Collective Operations

MPI Operator Operation

---------------------------------------------------------------

MPI_MAX maximum

MPI_MIN minimum

MPI_SUM sum

MPI_PROD product

MPI_LAND logical and

MPI_BAND bitwise and

MPI_LOR logical or

MPI_BOR bitwise or

MPI_LXOR logical exclusive or

MPI_BXOR bitwise exclusive or

MPI_MAXLOC max value and location

MPI_MINLOC min value and location
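MPI_MAXLOC and MPI_MINLOC reduce value/location pairs and therefore use paired datatypes such as MPI_DOUBLE_INT. A small illustrative sketch (the local values are arbitrary) that finds the largest value and the rank that owns it:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
    int rank;
    struct { double value; int rank; } in, out;   /* layout required by MPI_DOUBLE_INT */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    in.value = (double) ((rank * 7) % 5);   /* some local value */
    in.rank  = rank;                        /* the location to report */

    /* out.value is the global maximum, out.rank the rank that holds it */
    MPI_Reduce(&in, &out, 1, MPI_DOUBLE_INT, MPI_MAXLOC, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("max value %.1f found on process %d\n", out.value, out.rank);
    MPI_Finalize();
    return 0;
}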

Page 22: Introduction to Parallel Programming  (Message Passing)

The Master Slave Paradigm

Figure: a master process distributes work to, and collects results from, a set of slave processes.

Page 23: Introduction to Parallel Programming  (Message Passing)

Computing π

π = ∫_0^1 4/(1 + x^2) dx

MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
h = 1.0 / (double) n;
sum = 0.0;
for (i = myid + 1; i <= n; i += numprocs) {
  x = h * ((double) i - 0.5);
  sum += f(x);                 /* f(x) = 4 / (1 + x^2) */
}
mypi = h * sum;
MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

mpirun -np 3 cpi
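The fragment above omits the declarations, the definition of f and the handling of n; a self-contained version along the lines of the classic cpi example (fixing n in the program instead of reading it is an assumption made here for brevity):

#include <stdio.h>
#include "mpi.h"

/* the function being integrated: 4 / (1 + x^2) */
static double f(double x) { return 4.0 / (1.0 + x * x); }

int main(int argc, char *argv[]) {
    int myid, numprocs, i, n = 10000;    /* number of intervals; process 0 would normally read it */
    double h, x, sum, mypi, pi;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    h = 1.0 / (double) n;
    sum = 0.0;
    for (i = myid + 1; i <= n; i += numprocs) {   /* cyclic distribution of the intervals */
        x = h * ((double) i - 0.5);
        sum += f(x);
    }
    mypi = h * sum;

    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0)
        printf("pi is approximately %.16f\n", pi);
    MPI_Finalize();
    return 0;
}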

Page 24: Introduction to Parallel Programming  (Message Passing)

The Portability of the Efficiency

Page 25: Introduction to Parallel Programming  (Message Passing)

The Sequential Algorithm

void mochila01_sec (void)
{
  unsigned v1;
  int c, k;

  for (c = 0; c <= C; c++)
    f[0][c] = 0;
  for (k = 1; k <= N; k++) {
    for (c = 0; c <= C; c++) {
      f[k][c] = f[k-1][c];
      if (c >= w[k]) {
        v1 = f[k-1][c - w[k]] + p[k];
        if (f[k][c] < v1)      /* keep the larger benefit */
          f[k][c] = v1;
      }
    }
  }
}

f[k][c] = max { f[k-1][c], f[k-1][c - w[k]] + p[k] }   for c >= w[k]

Figure: the dynamic programming table with n rows and C columns; row f[k] is computed from row f[k-1]. Complexity: O(n*C).

Page 26: Introduction to Parallel Programming  (Message Passing)

The Parallel Algorithm

void transition (int stage)
{
  unsigned x;
  int c, k;

  k = stage;
  for (c = 0; c <= C; c++)
    f[c] = 0;
  for (c = 0; c <= C; c++) {
    IN(&x);                              /* receive f[k-1][c] from the previous stage */
    f[c] = max(f[c], x);
    OUT(&f[c], 1, sizeof(unsigned));     /* forward f[k][c] to the next stage */
    if (C >= c + w[k])
      f[c + w[k]] = x + p[k];
  }
}

Figure: processor k - 1 computes and sends f[k-1][c]; processor k receives it and computes f[k][c], for c = 0, ..., C.

f[k][c] = max { f[k-1][c], f[k-1][c - w[k]] + p[k] }
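IN and OUT are the communication primitives of the authors' pipeline tool (llp, used again in later slides), not MPI calls. Purely as an illustration, a minimal sketch of how a linear pipeline could provide them on top of MPI point-to-point operations, assuming each processor receives from rank - 1 and sends to rank + 1, and that rank and p have been set with MPI_Comm_rank and MPI_Comm_size after MPI_Init:

#include "mpi.h"

static int rank, p;    /* assumed to be initialized after MPI_Init */

/* receive one unsigned from the previous stage */
static void IN(unsigned *x) {
    if (rank == 0) {
        *x = 0;        /* no predecessor: matches the boundary condition f[0][c] = 0 */
    } else {
        MPI_Status status;
        MPI_Recv(x, 1, MPI_UNSIGNED, rank - 1, 0, MPI_COMM_WORLD, &status);
    }
}

/* forward values to the next stage (the last stage keeps its results) */
static void OUT(unsigned *x, int count, int size) {
    (void) size;       /* llp passes the element size; not needed with MPI_UNSIGNED */
    if (rank < p - 1)
        MPI_Send(x, count, MPI_UNSIGNED, rank + 1, 0, MPI_COMM_WORLD);
}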

Page 27: Introduction to Parallel Programming  (Message Passing)

The Evolution of the Pipeline

Figure: snapshots of the pipeline advancing over the n x C table.

Page 28: Introduction to Parallel Programming  (Message Passing)

The Running Time

With one stage per processor, the running time is n - 1 + C steps: n - 1 steps to fill the pipeline plus C steps for the last stage.

Page 29: Introduction to Parallel Programming  (Message Passing)

Processor Virtualization

Figure: block mapping; each of the processors 0, 1, 2, ... is assigned a block of n/p consecutive stages, each stage spanning the C columns.

Page 30: Introduction to Parallel Programming  (Message Passing)

Processor Virtualization

Figure: a later snapshot of the same block mapping.

Page 31: Introduction to Parallel Programming  (Message Passing)

Processor Virtualization

Figure: a further snapshot of the block mapping animation.

Page 32: Introduction to Parallel Programming  (Message Passing)

The Running Time

Figure: with block mapping, if each processor sweeps every one of its n/p stages over all C columns before passing results on, its successor can only start after (n/p - 1)C steps; the total running time is

(p - 1)(n/p - 1)C + nC/p ≈ nC

so almost no speedup is obtained.

Page 33: Introduction to Parallel Programming  (Message Passing)

Processor Virtualization

Figure: block mapping revisited; each processor handles its block of n/p stages column by column over the C columns.

Page 34: Introduction to Parallel Programming  (Message Passing)

The Running Time

Figure: when each column is forwarded as soon as the block has processed it, the successor starts after only n/p steps; the total running time is

(p - 1)(n/p) + nC/p ≈ nC/p

Page 35: Introduction to Parallel Programming  (Message Passing)

Block Mapping

void transition (void)
{
  unsigned c, k, i, inData;

  for (c = 0; c <= C; c++) {
    IN(&inData);                       /* receive f[k-1][c] from the previous processor */
    k = calcInitStage();
    for (i = 0; i < width; k++, i++) {
      f[i][c] = max(f[i][c], inData);
      if (c + w[k] <= C)
        f[i][c + w[k]] = inData + p[k];
      inData = f[i][c];
    }
    OUT(&f[i-1][c], 1, sizeof(unsigned));   /* forward the last stage of the block */
  }
}

width = N / num_proc;
if (f_name < N % num_proc)   /* load balancing */
  width++;

int calcInitStage( void )
{
  return (f_name < N % num_proc) ?
         f_name * width :
         (f_name * width) + (N % num_proc);
}
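As an illustration of the load balancing: with N = 10 stages and num_proc = 3, processor 0 gets width = 4 and initial stage 0, processor 1 gets width = 3 and initial stage 4, and processor 2 gets width = 3 and initial stage 7, so the ten stages are split into the blocks 0-3, 4-6 and 7-9.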

Page 36: Introduction to Parallel Programming  (Message Passing)

Cyclic Mapping

Figure: stages are dealt to processors 0, 1, 2, ... in round-robin order; stages not yet started wait in a queue (cola).

Page 37: Introduction to Parallel Programming  (Message Passing)

The Running Time

Figure: with cyclic mapping (queue of stages), the pipeline fills after p - 1 steps and the running time is approximately (p - 1) + (n/p)C.

Page 38: Introduction to Parallel Programming  (Message Passing)

Cyclic Mapping

void transition (int stage)
{
  unsigned x;
  int c, k;

  k = stage;
  for (c = 0; c <= C; c++)
    f[c] = 0;
  for (c = 0; c <= C; c++) {
    IN(&x);
    f[c] = max(f[c], x);
    OUT(&f[c], 1, sizeof(unsigned));
    if (C >= c + w[k])
      f[c + w[k]] = x + p[k];
  }
}

int bands = num_bands(n);
for (i = 0; i < bands; i++) {
  stage = f_name + i * num_proc;   /* processor f_name runs stages f_name, f_name + p, ... */
  if (stage <= n - 1)
    transition (stage);
}

unsigned num_bands (unsigned n)
{
  float aux_f;
  unsigned aux;

  aux_f = (float) n / (float) num_proc;
  aux = (unsigned) aux_f;
  if (aux_f > aux)
    return (aux + 1);
  return (aux);
}

Page 39: Introduction to Parallel Programming  (Message Passing)

Advantages and Disadvantages

Block Distribution:
– Minimizes the number of communications
– Penalizes the startup time of the pipeline

Cyclic Distribution:
– Minimizes the startup time of the pipeline
– May produce communication overhead

Page 40: Introduction to Parallel Programming  (Message Passing)

Transputer Network vs. Local Area Network

Local Area Network:
– Coarse Grain
– Serial Communications

Transputer Network:
– Fine Grain
– Parallel Communications

Page 41: Introduction to Parallel Programming  (Message Passing)

Computational Results

Figure: running time versus number of processors (1 to 32) for problem sizes 4x8 up to 16x128, on the transputer network and on the local area network.

Page 42: Introduction to Parallel Programming  (Message Passing)

The Resource Allocation Problem

Given M units of an indivisible resource and a set of N tasks, f_j(x) is the benefit obtained when x units of the resource are allocated to task j.

maximize    sum_{j=1..N} f_j(x_j)

subject to  sum_{j=1..N} x_j = M,
            x_j integer, 0 <= x_j <= B_j,  j = 1, ..., N.

Page 43: Introduction to Parallel Programming  (Message Passing)

RAP- The Sequential Algorithm

G[k][m] = max { G[k-1][m-i] + f_k(i) : 0 <= i <= m }

int rap_seq(void)
{
  int i, k, m;

  for (m = 0; m <= M; m++)
    G[0][m] = 0;
  for (k = 1; k <= N; k++) {
    for (m = 0; m <= M; m++) {
      G[k][m] = 0;                     /* benefits are assumed nonnegative */
      for (i = 0; i <= m; i++)
        G[k][m] = max(G[k][m], G[k-1][i] + f(k, m - i));
    }
  }
  return G[N][M];
}

Figure: the N x M dynamic programming table; stage k is computed from stage k - 1. Complexity: O(N*M^2).

Page 44: Introduction to Parallel Programming  (Message Passing)

RAP - The Parallel Algorithm

void transition (int stage)
{
  int m, j, x, k;

  for (m = 0; m <= M; m++)
    G[m] = 0;
  k = stage;
  for (m = 0; m <= M; m++) {
    IN(&x);                              /* receive G[k-1][m] from the previous stage */
    G[m] = max(G[m], x + f(k-1, 0));
    OUT(&G[m], 1, sizeof(int));          /* forward G[k][m] to the next stage */
    for (j = m + 1; j <= M; j++)
      G[j] = max(G[j], x + f(k - 1, j - m));
  }
}

Figure: processor k - 1 computes and sends G[k-1][m]; processor k receives it and computes G[k][m], for m = 0, ..., M.

G[k][m] = max { G[k-1][m-i] + f_k(i) : 0 <= i <= m }

Page 45: Introduction to Parallel Programming  (Message Passing)

The Cray T3E

CRAY T3E:
– Shared Address Space
– Three-Dimensional Toroidal Network

Page 46: Introduction to Parallel Programming  (Message Passing)

Block - Cyclic Mapping

Figure: block-cyclic mapping with grain g; blocks of g consecutive stages are dealt to the processors 0, 1, 2, ... in round-robin order, and pending blocks wait in a queue. The running time shown is approximately g(p - 1) + g M^2 (n/(g p)).

Page 47: Introduction to Parallel Programming  (Message Passing)

Computational Results

Figure: results on the Cray T3E. Running time and speedup as a function of the grain (1, 2, 5, 10, 20, 40) for 2, 4, 8 and 16 processors, and as a function of the number of processors for problem sizes 10x100, 100x1000, 400x1000 and 1000x1000.

Page 48: Introduction to Parallel Programming  (Message Passing)

Linear Model to Predict Communication Performance

Time to send n bytes = (per-byte cost) * n + (startup latency)

Fitted models for the two machines: 5E-08 n + 5E-05 and 7E-07 n + 0.0003 (seconds).

Figure: measured communication time versus message size (up to about 1,000,000 bytes) on BEOULL and the CRAY T3E, on logarithmic and linear scales.
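The two constants of such a linear model are typically measured with a ping-pong experiment and then fitted by least squares; a sketch of the measurement step (message sizes and repetition count are arbitrary choices, and at least two MPI processes are assumed):

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define REPS 100

int main(int argc, char *argv[]) {
    int rank, sizes[] = { 1024, 8192, 65536, 524288 };
    char *buf = malloc(524288);
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int s = 0; s < 4; s++) {
        int n = sizes[s];
        double t0 = MPI_Wtime();
        for (int r = 0; r < REPS; r++) {
            if (rank == 0) {              /* ping-pong between ranks 0 and 1 */
                MPI_Send(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            } else if (rank == 1) {
                MPI_Recv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        if (rank == 0)                    /* average one-way time for n bytes */
            printf("%d bytes: %g s\n", n, (MPI_Wtime() - t0) / (2.0 * REPS));
    }
    free(buf);
    MPI_Finalize();
    return 0;
}

Fitting a straight line through the (size, time) points then gives the per-byte cost and the latency.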

Page 50: Introduction to Parallel Programming  (Message Passing)

Buffering Data

Virtual process name runs on real processor fname if (name / grain) mod p == fname.

Figure: with P = 2 and Grain = 3, virtual processes 0, 1, 2 run on processor 0, processes 3, 4, 5 on processor 1, processes 6, 7, 8 on processor 0 again, and so on. SET_BUFIO(1, size) sets the size B of the buffer used by the IN and OUT calls.
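The mapping rule can be checked with a few lines; an illustrative snippet that prints the owner of each virtual process for the values of the figure (P = 2, Grain = 3):

#include <stdio.h>

int main(void) {
    int p = 2, grain = 3;
    /* virtual process "name" runs on real processor (name / grain) mod p */
    for (int name = 0; name < 9; name++)
        printf("virtual %d -> processor %d\n", name, (name / grain) % p);
    return 0;
}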

Page 51: Introduction to Parallel Programming  (Message Passing)

The Knapsack Problem (N = 12800, M = 12800), Cray T3E

Figure: running time (sec) of problem p128-128.knp as a function of the grain and the buffer size, for np = 2, 4, 8 and 16 processors.

Page 52: Introduction to Parallel Programming  (Message Passing)

The Resource Allocation Problem, Cray T3E

Figure: running time (sec) of the 1000x1000 problem as a function of the grain and the buffer size, for np = 2, 4, 8 and 16 processors.

Page 53: Introduction to Parallel Programming  (Message Passing)

Portability of the Efficiency

A disappointing contrast in parallel computing is the gap between the peak performance of parallel systems and the actual performance of parallel applications.

Metrics, techniques and tools have been developed to understand the sources of performance degradation.

An effective parallel program development cycle may iterate many times before achieving the desired performance.

Performance prediction is important for achieving efficient execution of parallel programs, since it avoids the cost of coding and debugging inefficient strategies.

Most of the approaches to performance analysis fall into two categories: Analytical Modeling and Performance Profiling.

Page 54: Introduction to Parallel Programming  (Message Passing)

Performance Analysis

Profiling may be conducted on an existing parallel system to recognize current performance bottlenecks, correct them, and identify and prevent potential future performance problems.

It is architecture dependent: the majority of the performance metrics and tools devised reflect their orientation towards the measurement-modify paradigm.

Examples of tools: PICL, Dimemas, Kpi; ParaGraph, Vampir, Paraver.

Figure: the profiling cycle: instrumentation, computation, profile analysis, new tuning parameters, with run time and error predictions as outputs.

Page 55: Introduction to Parallel Programming  (Message Passing)

Performance Analysis

Analytical Modeling:
– Provides a structured way for understanding performance problems
– Architecture independent
– Has predictive ability
– Modeling is not a trivial task: the model must be simple enough to be tractable, and sufficiently detailed to be accurate
– PRAM, LogP, BSP, BSPWB, etc.

Figure: the modeling cycle: computation, analytical modeling, optimal parameter prediction, with run time and error predictions as outputs.

Page 56: Introduction to Parallel Programming  (Message Passing)

Standard Loop on a Pipeline Algorithm

void f() {
  Compute(body0);
  while (running) {
    Receive();
    Compute(body1);
    Send();
    Compute(body2);
  }
}

body0 takes constant time; body1 and body2 depend on the iteration of the loop.

Analytical Model: numerical solutions for every case.

Page 57: Introduction to Parallel Programming  (Message Passing)

The Analytical Model

Ts denotes the startup time between two processors:

Ts = t0*(G - 1) + G * sum_{i=1..B-1} (t1_i + t2_i) + 2*I*(G - 1)*B + E*B + (communication cost of one packet of size B)

Tc denotes the whole evaluation of G processes, including the time to send M/B packets of size B:

Tc = t0*G + G * sum_{i=1..M} (t1_i + t2_i) + 2*I*(G - 1)*M + E*M + (communication cost of one packet of size B) * M/B

Figure: each processor evaluates a band of G virtual processes over buffers of B elements, sending M/B packets in total.

Page 58: Introduction to Parallel Programming  (Message Passing)

The Analytical Model

T1(G, B) = Ts * (p - 1) + Tc * N/(G*p),   for 1 <= G <= N/p and 1 <= B <= M

T2(G, B) = Ts * (N/G - 1) + Tc

R1 and R2 are the regions of (G, B) values determined by comparing Ts * p with Tc; T1 applies in one region and T2 in the other.

Figure: the p processors 0, 1, ..., p - 1, each evaluating bands of G stages with buffers of size B.

Page 59: Introduction to Parallel Programming  (Message Passing)

Validation of The Model

Figure: predicted (model) versus best measured running times on 2, 4, 8 and 16 processors, for the knapsack problem and for the RAP problem.

Page 60: Introduction to Parallel Programming  (Message Passing)

The Tuning Problem

Given an algorithm A, F_A is the input/output function computed by the algorithm, defined on D = D1 x ... x Dn; F_A(z) is the output value of the algorithm A for an input z belonging to D.

Time_M(A(z)) is the execution time of the algorithm A over the input z on a machine M. CTime_M(A(z)) is the analytical complexity-time formula that approximates Time_M(A(z)).

T = D1 x ... x Dk is the set of tuning parameters and I = D(k+1) x ... x Dn is the set of input parameters: x belongs to T if and only if x has an impact only on the performance of the algorithm and not on its output.

F_A(x, z) = F_A(y, z) for any x and y in T, while Time_M(A(x, z)) and Time_M(A(y, z)) may differ.

The tuning problem is to find x0 in T such that CTime_M(A(x0, z)) = min { CTime_M(A(x, z)) : x in T }.

Page 61: Introduction to Parallel Programming  (Message Passing)

Tuning Parameters

The list of tuning parameters in parallel computing is extensive:

– The most obvious tuning parameter is the number of processors.
– The size of the buffers used during data exchange.
– Under the master-slave paradigm, the size and the number of data items generated by the master.
– In the parallel divide and conquer technique, the size of a subproblem to be considered trivial and the processor assignment policy.
– On regular numerical HPF-like algorithms, the block size allocation.

Page 62: Introduction to Parallel Programming  (Message Passing)

The Methodology

Profile the execution to compute the parameters needed for the complexity-time function CTime_M(A(x, z)).

Compute x0 in T that minimizes the complexity-time function:

CTime_M(A(x0, z)) = min { CTime_M(A(x, z)) : x in T }

At this point, the predictive ability of the complexity-time function can be used to predict the execution time Time_M(A(z)) of an optimal execution, or to execute the algorithm with the tuning parameters set to x0.

Figure: the cycle: instrumentation, analytical modeling, optimal parameter computation, run time prediction, error prediction computation.
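A minimal sketch of the minimization step, assuming the analytical model of the previous slides is available as a function T(p, N, G, B); the coefficients below are placeholders, since in practice Ts and Tc are built from the measured constants t0, t1 and t2:

#include <stdio.h>
#include <float.h>

/* Stand-in for the analytical model T1(G, B) = Ts*(p-1) + Tc*N/(G*p).
   The coefficients are hypothetical; the real ones come from profiling. */
static double T(int p, int N, int G, int B) {
    double Ts = 1e-4 * G + 1e-5 * B;           /* hypothetical startup time      */
    double Tc = 1e-3 * G * B + 1e-2 * G / B;   /* hypothetical band evaluation   */
    return Ts * (p - 1) + Tc * ((double) N / (G * p));
}

int main(void) {
    int p = 4, N = 1000, M = 100;
    int bestG = 1, bestB = 1;
    double best = DBL_MAX;

    /* scan 1 <= G <= N/p and 1 <= B <= M for the minimizing pair */
    for (int G = 1; G <= N / p; G++)
        for (int B = 1; B <= M; B++) {
            double t = T(p, N, G, B);
            if (t < best) { best = t; bestG = G; bestB = B; }
        }
    printf("optimal grain G = %d, buffer B = %d, predicted time %g\n",
           bestG, bestB, best);
    return 0;
}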

Page 63: Introduction to Parallel Programming  (Message Passing)

llp Solver

Figure: the llp solver. The llp communication calls (IN, OUT) are instrumented with gettime() to measure the constants t0, t1 and t2; the llp analytical model then computes Min(T(p, G, B)), from which run time and error predictions are obtained.

Page 64: Introduction to Parallel Programming  (Message Passing)


The MALLBA Infrastructure

Page 65: Introduction to Parallel Programming  (Message Passing)

Performance Prediction: BA - ULL

Fitted linear models for the BA - ULL link (BAULL-1 and BAULL-2): 0.0001 n - 0.0151 and 9E-05 n + 0.005.

Figure: measured communication time versus message size (up to about 1,000,000 bytes) for BAULL-1 and BAULL-2, compared with BEOULL and the CRAY T3E.

Page 66: Introduction to Parallel Programming  (Message Passing)

The MALLBA Project

Library for the resolution of combinatorial optimisation problems.

– 3 types of resolution techniques:
  • Exact
  • Heuristic
  • Hybrid
– 3 implementations:
  • Sequential
  • LAN
  • WAN

Goals:
– Genericity
– Ease of utilization
– Locally- and geographically-distributed computation

Page 67: Introduction to Parallel Programming  (Message Passing)

References

Wilkinson B., Allen M. Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers. 1999. Prentice-Hall.

Gropp W., Lusk E., Skjellum A. Using MPI: Portable Parallel Programming with the Message-Passing Interface. 1999. The MIT Press.

Pacheco P. Parallel Programming with MPI. 1997. Morgan Kaufmann Publishers.

Wu X. Performance Evaluation, Prediction and Visualization of Parallel Systems.

nereida.deioc.ull.es