
Page 1: Parallel Programming and MPI

An Advanced Simulation & Computing (ASC) Academic Strategic Alliances Program (ASAP) Center

at The University of Chicago

The Center for Astrophysical Thermonuclear Flashes

FLASH Tutorial, May 13, 2004

Parallel Computing and MPI

Page 2: Parallel Programming and MPI

What is Parallel Computing? And why is it useful?

Parallel computing is more than one CPU working together on one problem.
It is useful when the problem is large and could take very long, or when the data are too big to fit in the memory of one processor.
When to parallelize: when the problem can be subdivided into relatively independent tasks.
How much to parallelize: as long as the speedup relative to a single processor remains of the order of the number of processors.

Page 3: Parallel Programming and MPI

Parallel paradigms

SIMD (single instruction, multiple data): processors work in lock-step.
MIMD (multiple instruction, multiple data): processors do their own thing, with occasional synchronization.
Shared memory: one-sided communication.
Distributed memory: message passing.
Loosely coupled: the process on each CPU is fairly self-contained and relatively independent of processes on other CPUs.
Tightly coupled: CPUs need to communicate with each other frequently.

Page 4: Parallel Programming and MPI

How to Parallelize

Divide a problem into a set of mostly independent tasks:
Partition the problem: tasks get their own data.
Localize each task: it operates on its own data for the most part; try to make it self-contained.
Occasionally data may be needed from other tasks (inter-process communication), or synchronization may be required between tasks (global operations).
Map tasks to different processors: one processor may get more than one task, and the task distribution should be well balanced.

Page 5: Parallel Programming and MPI

New Code Components

Initialization
Query the parallel state: identify this process, identify the number of processes
Exchange data between processes: local, global
Synchronization: barriers, blocking communication, locks
Finalization
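
A minimal sketch of these components in C follows (FLASH itself is written in Fortran; this C version and its printed message are only for illustration):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;

        /* Initialization */
        MPI_Init(&argc, &argv);

        /* Query the parallel state: who am I, and how many processes are there? */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        printf("Process %d of %d is alive\n", rank, nprocs);

        /* Synchronization: wait until every process reaches this point */
        MPI_Barrier(MPI_COMM_WORLD);

        /* Finalization */
        MPI_Finalize();
        return 0;
    }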

Page 6: Parallel Programming and MPI

MPI

Message Passing Interface: the standard for the distributed-memory model of parallelism.

MPI-2 adds one-sided communication, commonly associated with shared-memory operations.

Works with communicators: a communicator is a collection of processes; MPI_COMM_WORLD is the default.

Supports both low-level communication operations and composite (collective) operations.

Has blocking and non-blocking operations.

Page 7: Parallel Programming and MPI

Communicators

[Figure: the processes of MPI_COMM_WORLD divided into two sub-communicators, COMM1 and COMM2]
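
One common way to create sub-communicators like COMM1 and COMM2 is MPI_Comm_split (the slides do not show which call was used); a minimal C sketch, with the half-and-half split chosen purely for illustration:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Comm subcomm;   /* plays the role of COMM1 or COMM2 */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Split MPI_COMM_WORLD in two by "color":
           lower ranks form one sub-communicator, upper ranks the other. */
        int color = (rank < nprocs / 2) ? 0 : 1;
        MPI_Comm_split(MPI_COMM_WORLD, color, rank, &subcomm);

        int subrank, subsize;
        MPI_Comm_rank(subcomm, &subrank);
        MPI_Comm_size(subcomm, &subsize);
        printf("World rank %d is rank %d of %d in sub-communicator %d\n",
               rank, subrank, subsize, color);

        MPI_Comm_free(&subcomm);
        MPI_Finalize();
        return 0;
    }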

Page 8: Parallel Programming and MPI

Low level Operations in MPI

MPI_Init
MPI_Comm_size: find the number of processes
MPI_Comm_rank: find my process number (rank)
MPI_Send / MPI_Recv: communicate with other processes one at a time
MPI_Bcast: global data transmission
MPI_Barrier: synchronization
MPI_Finalize
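
A short C sketch combining these low-level calls (the message contents, tag values, and printed output are made up for illustration):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Point-to-point: rank 0 sends one value to rank 1 */
        if (nprocs > 1) {
            double x = 3.14;
            if (rank == 0)
                MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            else if (rank == 1)
                MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        }

        /* Global data transmission: rank 0 broadcasts a parameter to all */
        int nsteps = (rank == 0) ? 100 : 0;
        MPI_Bcast(&nsteps, 1, MPI_INT, 0, MPI_COMM_WORLD);

        /* Synchronization before reporting */
        MPI_Barrier(MPI_COMM_WORLD);
        printf("Rank %d sees nsteps = %d\n", rank, nsteps);

        MPI_Finalize();
        return 0;
    }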

Page 9: Parallel Programming and MPI

Advanced Constructs in MPI

Composite operations: Gather/Scatter, Allreduce, Alltoall
Cartesian grid operations: Shift
Communicators: creating subgroups of processors to operate on
User-defined datatypes
I/O: parallel file operations
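
As an illustration of the composite operations, a hedged C sketch of MPI_Scatter and MPI_Gather (the data, the squaring step, and the choice of rank 0 as root are arbitrary, not anything FLASH prescribes):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* The root scatters one value to each process ... */
        double *senddata = NULL;
        if (rank == 0) {
            senddata = malloc(nprocs * sizeof(double));
            for (int i = 0; i < nprocs; i++) senddata[i] = (double)i;
        }
        double mine;
        MPI_Scatter(senddata, 1, MPI_DOUBLE, &mine, 1, MPI_DOUBLE,
                    0, MPI_COMM_WORLD);

        /* ... each process works on its own piece ... */
        mine = mine * mine;

        /* ... and the results are gathered back on the root. */
        double *results = (rank == 0) ? malloc(nprocs * sizeof(double)) : NULL;
        MPI_Gather(&mine, 1, MPI_DOUBLE, results, 1, MPI_DOUBLE,
                   0, MPI_COMM_WORLD);

        if (rank == 0) {
            for (int i = 0; i < nprocs; i++)
                printf("result[%d] = %g\n", i, results[i]);
            free(senddata);
            free(results);
        }
        MPI_Finalize();
        return 0;
    }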

Page 10: Parallel Programming and MPI

Communication Patterns

[Figure: schematic diagrams of processes 0-3 illustrating the patterns: Point to Point, One to All Broadcast, Collective, Shift, and All to All]

Page 11: Parallel Programming and MPI

Communication Overheads

Latency vs. bandwidth
Blocking vs. non-blocking: overlap, buffering and copying
Scale of communication: nearest neighbor, short range, long range
Volume of data: resource contention for links
Efficiency: hardware, software, communication method
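
To show how non-blocking operations allow overlap, a C sketch that posts MPI_Irecv/MPI_Isend, does unrelated local work, and only then waits (the ring exchange and the dummy compute loop are made-up examples):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Exchange a value around a ring, overlapping communication with work. */
        int right = (rank + 1) % nprocs;
        int left  = (rank - 1 + nprocs) % nprocs;
        double sendbuf = (double)rank, recvbuf = -1.0;
        MPI_Request reqs[2];

        /* Post the communication first (non-blocking) ... */
        MPI_Irecv(&recvbuf, 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&sendbuf, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

        /* ... do local computation that does not need the incoming data ... */
        double local = 0.0;
        for (int i = 0; i < 1000000; i++) local += 1e-6;

        /* ... then wait for the messages before using recvbuf. */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        printf("Rank %d received %g from rank %d (local work = %g)\n",
               rank, recvbuf, left, local);

        MPI_Finalize();
        return 0;
    }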

Page 12: Parallel Programming and MPI

Parallelism in FLASH

Short-range communications: nearest neighbor
Long-range communications: regridding
Other global operations: all-reduce operations on physical quantities
Specific to solvers: multipole method, FFT-based solvers

Page 13: Parallel Programming and MPI

Domain Decomposition

[Figure: the computational domain split into four patches, one each for processors P0, P1, P2, and P3]
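
A minimal C sketch of such a decomposition in one dimension (the grid size NX is made up, and FLASH actually distributes AMR blocks rather than slices of a single uniform array; this only illustrates assigning contiguous pieces of the domain to processors):

    #include <mpi.h>
    #include <stdio.h>

    /* Split NX cells as evenly as possible among the processes; the first
       (NX % nprocs) ranks each get one extra cell. */
    #define NX 1000

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int base   = NX / nprocs, extra = NX % nprocs;
        int nlocal = base + (rank < extra ? 1 : 0);
        int start  = rank * base + (rank < extra ? rank : extra);

        printf("Rank %d owns cells [%d, %d)\n", rank, start, start + nlocal);

        MPI_Finalize();
        return 0;
    }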

Page 14: Parallel Programming and MPI

Border Cells / Ghost Points

When the solution data (solnData) is split across processors, each processor needs data from its neighbors.
A layer of border (ghost) cells is kept from each neighboring processor.
These ghost cells must be updated every time step.

Page 15: Parallel Programming and MPI

Border/Ghost Cells

Short-range communication

Page 16: Parallel Programming and MPI

Two MPI Methods for Ghost-Cell Exchange

Method 1: Cartesian topology (sketched in code below)
MPI_Cart_create: create the topology
MPE_Decomp1d: domain decomposition on the topology
MPI_Cart_shift: who is on my left/right?
MPI_Sendrecv: exchange ghost cells with the left neighbor
MPI_Sendrecv: exchange ghost cells with the right neighbor

Method 2: manual decomposition
MPI_Comm_rank, MPI_Comm_size: manually decompose the grid over processors
Calculate the left/right neighbors
MPI_Send / MPI_Recv, ordered carefully to avoid deadlocks
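
A condensed C sketch of the first method, using a 1-D Cartesian topology and MPI_Sendrecv to fill one ghost cell on each side (the local array size and data values are made up, and the MPE_Decomp1d step is replaced by a fixed per-process size):

    #include <mpi.h>
    #include <stdio.h>

    #define NLOCAL 8   /* interior cells per process (made-up size) */

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Create a 1-D, non-periodic Cartesian topology over all processes. */
        MPI_Comm cart;
        int dims[1] = { nprocs }, periods[1] = { 0 };
        MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 1, &cart);
        MPI_Comm_rank(cart, &rank);

        /* Who is on my left and right?  MPI_PROC_NULL at the domain edges. */
        int left, right;
        MPI_Cart_shift(cart, 0, 1, &left, &right);

        /* Local array: u[0] and u[NLOCAL+1] are the ghost cells. */
        double u[NLOCAL + 2];
        for (int i = 1; i <= NLOCAL; i++) u[i] = rank;
        u[0] = u[NLOCAL + 1] = -1.0;

        /* Send my rightmost interior cell to the right neighbor while
           receiving the left neighbor's rightmost cell into my left ghost. */
        MPI_Sendrecv(&u[NLOCAL], 1, MPI_DOUBLE, right, 0,
                     &u[0],      1, MPI_DOUBLE, left,  0,
                     cart, MPI_STATUS_IGNORE);
        /* Send my leftmost interior cell to the left neighbor while
           receiving the right neighbor's leftmost cell into my right ghost. */
        MPI_Sendrecv(&u[1],          1, MPI_DOUBLE, left,  1,
                     &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 1,
                     cart, MPI_STATUS_IGNORE);

        printf("Rank %d ghosts: left = %g, right = %g\n",
               rank, u[0], u[NLOCAL + 1]);

        MPI_Comm_free(&cart);
        MPI_Finalize();
        return 0;
    }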

Page 17: Parallel Programming and MPI

Adaptive Grid Issues

Discretization is not uniform.
Simple left-right guard-cell fills are inadequate.
Adjacent grid points may not be mapped to nearest neighbors in the processor topology.
Redistribution of work is necessary.

Page 18: Parallel Programming and MPI

Regridding

Change in the number of cells/blocks: some processors get more work than others, causing load imbalance.
Redistribute data to even out the work on all processors.
This involves long-range communications and large quantities of data being moved.

Page 19: Parallel Programming and MPI

Regridding

Page 20: Parallel Programming and MPI

Other parallel operations in FLASH

Global max/sum, etc. (Allreduce): physical quantities, in solvers, performance monitoring
Alltoall: FFT-based solver on the uniform grid (UG)
User-defined datatypes and file operations: parallel I/O
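
As an example of the global reductions, a C sketch of an all-reduce that computes a global maximum (the quantity and its use are illustrative assumptions, not FLASH's actual call sites):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Each process computes a local maximum over its own data
           (here a made-up value derived from the rank). */
        double local_max = 1.0 + 0.1 * rank;

        /* Global maximum across all processes; every process gets the result. */
        double global_max;
        MPI_Allreduce(&local_max, &global_max, 1, MPI_DOUBLE,
                      MPI_MAX, MPI_COMM_WORLD);

        printf("Rank %d: local max = %g, global max = %g\n",
               rank, local_max, global_max);

        MPI_Finalize();
        return 0;
    }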