Operating Systems - Distributed Parallel Computing


Page 1: Operating Systems - Distributed Parallel Computing

UNIVERSITY OF MASSACHUSETTS AMHERST • Department of Computer Science

Operating Systems CMPSCI 377

Distributed Parallel Programming

Emery Berger

University of Massachusetts Amherst

Page 2: Operating Systems - Distributed Parallel Computing


Outline

Previously:

Programming with threads

Shared memory, single machine

Today:

Distributed parallel programming

Message passing

some material adapted from slides by Kathy Yelick, UC Berkeley

Page 3: Operating Systems - Distributed Parallel Computing


Why Distribute?

SMP (symmetric multiprocessor): easy to program, but limited

Bus becomes a bottleneck when processors are not operating on local data

Typically < 32 processors

$$$

[Diagram: SMP architecture: processors P1, P2, ..., Pn, each with a cache ($), sharing one memory over a network/bus]

Page 4: Operating Systems - Distributed Parallel Computing


Distributed Memory

Vastly different platforms:

Networks of workstations

Supercomputers

Clusters

Page 5: Operating Systems - Distributed Parallel Computing


Distributed Architectures

Distributed memory machines: local memory but no global memory

Individual nodes often SMPs

Network interface for all interprocessor communication: message passing

[Diagram: distributed-memory architecture: nodes P0, P1, ..., Pn, each with its own memory and network interface (NI), connected by an interconnect]

Page 6: Operating Systems - Distributed Parallel Computing


Message Passing

Program: a number of independent, communicating processes

Thread + local address space only

Shared data: partitioned

Communicate by send & receive events

Cluster = messages sent over sockets

[Diagram: processes P0, P1, ..., Pn, each with private copies of s and i (s: 12 / i: 2, s: 14 / i: 3, s: 11 / i: 1); code such as y = ..s.. reads only the local s, and data crosses the network only via explicit operations like send P1,s and receive Pn,s]
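A minimal sketch of this model, using the MPI calls introduced on the following slides (an illustration, not from the lecture): each process keeps a private s, and a value moves between processes only through an explicit send/receive pair. Run with at least two processes.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char * argv[]) {
    int rank, s;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    s = 10 + rank;                    /* each process starts with its own private s */
    if (rank == 0) {
        MPI_Send(&s, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);          /* ship s to P1 */
    } else if (rank == 1) {
        MPI_Recv(&s, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status); /* overwrite the local s */
    }
    printf("Process %d: s = %d\n", rank, s);
    MPI_Finalize();
    return 0;
}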

Page 7: Operating Systems - Distributed Parallel Computing


Message Passing

Pros: efficient

Makes data sharing explicit

Can communicate only what is strictly necessary for computation

No coherence protocols, etc.

Cons: difficult

Requires manual partitioning

Divide up problem across processors

Unnatural model (for some)

Deadlock-prone (hurray)

Page 8: Operating Systems - Distributed Parallel Computing


Message Passing Interface

Library approach to message-passing

Supports most common architectural abstractions

Vendors supply optimized versions

⇒ programs run on different machines, but with (somewhat) different performance

Bindings for popular languages

Especially Fortran, C

Also C++, Java

Page 9: Operating Systems - Distributed Parallel Computing


MPI execution model

Spawns multiple copies of same program (SPMD = single program, multiple data)

Each one is a different "process" (different local memory)

Can act differently by determining which processor "self" corresponds to (see the sketch below)
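For instance, a minimal sketch (not from the lecture) of one SPMD program acting differently depending on which rank "self" turns out to be:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char * argv[]) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which copy am I? */
    if (rank == 0) {
        printf("I coordinate the work\n"); /* only the rank-0 copy runs this */
    } else {
        printf("I am worker %d\n", rank);  /* every other copy runs this */
    }
    MPI_Finalize();
    return 0;
}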

Page 10: Operating Systems - Distributed Parallel Computing


An Example

% mpirun -np 10 exampleProgram

#include <stdio.h>
#include <mpi.h>

int main(int argc, char * argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello world from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

Page 11: Operating Systems - Distributed Parallel Computing


An Example (cont.)

MPI_Init(&argc, &argv): initializes MPI (passes the arguments in)

Page 12: Operating Systems - Distributed Parallel Computing


An Example (cont.)

MPI_Comm_size(MPI_COMM_WORLD, &size): returns the # of processors in the "world"

Page 13: Operating Systems - Distributed Parallel Computing


An Example (cont.)

MPI_Comm_rank(MPI_COMM_WORLD, &rank): which processor am I?

Page 14: Operating Systems - Distributed Parallel Computing


An Example (cont.)

MPI_Finalize(): we're done sending messages

Page 15: Operating Systems - Distributed Parallel Computing


An Example

% mpirun -np 10 exampleProgram
Hello world from process 5 of 10
Hello world from process 3 of 10
Hello world from process 9 of 10
Hello world from process 0 of 10
Hello world from process 2 of 10
Hello world from process 4 of 10
Hello world from process 1 of 10
Hello world from process 6 of 10
Hello world from process 8 of 10
Hello world from process 7 of 10
%

// what happened?

Page 16: Operating Systems - Distributed Parallel Computing


Message Passing

Messages can be sent directly to another processor

MPI_Send, MPI_Recv

Or to all processors

MPI_Bcast (the same call does the send on the root and the receive everywhere else)

Page 17: Operating Systems - Distributed Parallel Computing


Send/Recv Example

Send data from process 0 to all

“Pass it along” communication

Operations:

MPI_Send(data, count, MPI_INT, dest, 0, MPI_COMM_WORLD);

MPI_Recv(data, count, MPI_INT, source, 0, MPI_COMM_WORLD, &status);

Page 18: Operating Systems - Distributed Parallel Computing


Send & Receive

Send integer input in a ring

int main(int argc, char * argv[]) {
    int rank, value, size;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    do {
        if (rank == 0) {
            scanf("%d", &value);
            MPI_Send(&value, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
        } else {
            MPI_Recv(&value, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &status);
            if (rank < size - 1)
                MPI_Send(&value, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
        }
        printf("Process %d got %d\n", rank, value);
    } while (value >= 0);
    MPI_Finalize();
    return 0;
}

Page 19: Operating Systems - Distributed Parallel Computing


Send & Receive (cont.)

Send destination: rank + 1 (the next process in the ring)

Page 20: Operating Systems - Distributed Parallel Computing


Send & Receive (cont.)

Receive from: rank - 1 (the previous process in the ring)

Page 21: Operating Systems - Distributed Parallel Computing


Send & Receive (cont.)

Message tag: the 0 passed to each MPI_Send and MPI_Recv; tags must match for a send and a receive to pair up

Page 22: Operating Systems - Distributed Parallel Computing


Exercise

Compute expensiveComputation(i) on n processors; process 0 computes & prints sum


// MPI_Send(&value, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
int main(int argc, char * argv[]) {
    int rank, size;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {
        int sum = 0;

        printf("sum = %d\n", sum);
    } else {

    }
    MPI_Finalize();
    return 0;
}
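One possible way to fill in the skeleton; this is only a sketch, not the official solution. It assumes every rank (including 0) computes one value via a helper expensiveComputation(i), which is stubbed out here so the sketch compiles on its own:

#include <stdio.h>
#include <mpi.h>

/* placeholder for the real expensive computation */
int expensiveComputation(int i) { return i * i; }

int main(int argc, char * argv[]) {
    int rank, size;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {
        int sum = expensiveComputation(0);      /* process 0's own share */
        for (int i = 1; i < size; i++) {
            int value;
            /* collect the partial result from worker i */
            MPI_Recv(&value, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &status);
            sum += value;
        }
        printf("sum = %d\n", sum);
    } else {
        int value = expensiveComputation(rank); /* this worker's share */
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}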

Page 23: Operating Systems - Distributed Parallel Computing


Broadcast

Send and receive: point-to-point

Can also broadcast data

Source sends to everyone else


Page 24: Operating Systems - Distributed Parallel Computing


Broadcast

Repeatedly broadcast input (one integer) to all

#include <stdio.h>
#include <mpi.h>

int main(int argc, char * argv[]) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    do {
        if (rank == 0)
            scanf("%d", &value);
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
        printf("Process %d got %d\n", rank, value);
    } while (value >= 0);
    MPI_Finalize();
    return 0;
}

Page 25: Operating Systems - Distributed Parallel Computing


Broadcast (cont.)

&value: the value to send or receive (the send happens on the root, the receive everywhere else)

Page 26: Operating Systems - Distributed Parallel Computing


Broadcast (cont.)

How many to send/receive? The count argument: 1

Page 27: Operating Systems - Distributed Parallel Computing


Broadcast (cont.)

What's the datatype? MPI_INT

Page 28: Operating Systems - Distributed Parallel Computing


Broadcast (cont.)

Who's "root" for the broadcast? Process 0 (the fourth argument to MPI_Bcast)

Page 29: Operating Systems - Distributed Parallel Computing


Communication Flavors

Basic communication

blocking = wait until done

point-to-point = from me to you

broadcast = from me to everyone

Non-blocking

Think create & join, fork & wait...

MPI_Isend, MPI_Irecv (sketched below)

MPI_Wait, MPI_Waitall, MPI_Test

Collective (e.g., MPI_Bcast: everyone participates in one operation)
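A minimal non-blocking sketch (not from the lecture), assuming a simple ring exchange: each process starts a send to its right neighbor and a receive from its left neighbor, could do other work in the meantime, and then waits for both operations to complete.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char * argv[]) {
    int rank, size, outgoing, incoming;
    MPI_Request reqs[2];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    outgoing = rank;
    /* start both operations; neither call blocks */
    MPI_Isend(&outgoing, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&incoming, 1, MPI_INT, (rank + size - 1) % size, 0, MPI_COMM_WORLD, &reqs[1]);
    /* ... useful computation could overlap with the communication here ... */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  /* like join/wait: block until both finish */
    printf("Process %d received %d\n", rank, incoming);
    MPI_Finalize();
    return 0;
}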

Page 30: Operating Systems - Distributed Parallel Computing


The End

Page 31: Operating Systems - Distributed Parallel Computing


Scaling Limits

(From Pat Worley, ORNL)

Kernel used in atmospheric models

99% floating point ops; multiplies/adds

Sweeps through memory with little reuse

One "copy" of code running independently on varying numbers of procs