CC-MPI: A Compiled Communication Capable MPI Prototype for Ethernet Switched Clusters
Amit Karwande, Xin Yuan, Department of Computer Science, Florida State University
David K. Lowenthal, Department of Computer Science, University of Georgia


Page 1:

CC-MPI: A Compiled Communication Capable MPI Prototype for Ethernet Switched Clusters

Amit Karwande, Xin Yuan, Department of Computer Science, Florida State University

David K. Lowenthal, Department of Computer Science, University of Georgia

Page 2:

• Motivation
• Related work
• CC-MPI
  – One-to-many(all) communications
  – Many(all)-to-many(all) communications
• Performance study
• Conclusion

Page 3:

• Traditional communication libraries (e.g. MPI) hide network details and provide a simple API.
• Advantage: user friendly.
• Limitation: the opportunity for communication optimization is limited.
  – Optimizations can be done either in the compiler or in the library.
    » Architecture-independent optimizations in the compiler.
    » Architecture-dependent optimizations in the library, but such optimizations can only be done for a single routine.

Page 4:

• Compiled communication:
  – At compile time, use both the application communication information and the network architecture information to perform communication optimizations.
    » Static management of network resources.
    » Compiler-directed, architecture-dependent optimizations.
    » Architecture-dependent optimizations across communication patterns.
  – To apply the compiled communication technique to MPI programs:
    » The library must closely match the MPI library.
    » The library must be able to support the optimizations in compiled communication:
      – Expose network details.
      – Provide different implementations of a routine so that the user can choose the best one.
  – This work focuses on the compiled communication capable communication library.

Page 5:

• Related work:
  – Compiler-directed, architecture-dependent optimization [Hinrichs94]
  – Compiled communication [Bromley91, Cappello95, Kumar92, Yuan03]
  – MPI optimizations [Ogawa96, Lauria97, Tang00, Kielmann99]

Page 6:

• CC-MPI:
  – Optimizes one-to-all, one-to-many, all-to-all, and many-to-many communications.
  – Targets Ethernet switched clusters.
  – Basic idea:
    » Separate network control routines from data transmission routines.
    » Provide multiple implementations for each MPI routine.

Page 7:

• One-to-many(all) communications:
  – Multicast-based implementations.
    » Reliable multicast (IP multicast is unreliable): uses a simple ACK-based protocol (a sketch follows this slide).
    » Group management:
      – A group must be created before any communication can be performed.
      – There are 2^n potential groups for n members.
      – The hardware limits the number of simultaneous groups.
  – CC-MPI supports three group management schemes:
    » Static, dynamic, and compiler-assisted.
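The slides do not show the ACK protocol itself. The sketch below is only a rough illustration of one round of ACK-based reliable multicast from the root over UDP/IP multicast: multicast the payload once, then block until a unicast ACK arrives from every receiver. The group address, ports, and message format are assumptions, and timeouts/retransmission are omitted.

```c
/* Sketch only: one round of ACK-based reliable multicast, root side.
 * Assumptions (not from the slides): UDP/IP multicast for data, unicast UDP
 * ACKs back to the root, no timeouts or retransmission, no error checking. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

#define GROUP_ADDR "239.0.0.1"   /* hypothetical multicast group address */
#define DATA_PORT  5000          /* hypothetical data port */
#define ACK_PORT   5001          /* hypothetical ACK port */

int root_multicast(const char *buf, size_t len, int nreceivers)
{
    int data_fd = socket(AF_INET, SOCK_DGRAM, 0);
    int ack_fd  = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in group = {0}, ack_addr = {0};
    group.sin_family = AF_INET;
    group.sin_port   = htons(DATA_PORT);
    inet_pton(AF_INET, GROUP_ADDR, &group.sin_addr);

    ack_addr.sin_family      = AF_INET;
    ack_addr.sin_port        = htons(ACK_PORT);
    ack_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(ack_fd, (struct sockaddr *)&ack_addr, sizeof(ack_addr));

    /* 1. Multicast the data once to the whole group. */
    sendto(data_fd, buf, len, 0, (struct sockaddr *)&group, sizeof(group));

    /* 2. Block until every receiver has acknowledged. A real protocol would
     *    also time out and retransmit; that part is omitted here. */
    char ack;
    for (int acks = 0; acks < nreceivers; acks++)
        recv(ack_fd, &ack, 1, 0);

    close(data_fd);
    close(ack_fd);
    return 0;
}
```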

Page 8:

• Static group management:
  – Associate a multicast group with a communicator statically.
    » MPI_Bcast: send a reliable multicast message to the group.
    » MPI_Scatter: aggregate the messages for the different nodes and send the aggregated message to the group; each receiver extracts its own portion.
    » MPI_Scatterv: two MPI_Bcasts, one for the layout of the data and one for the data itself (a sketch follows this slide).
  – Problem:
    » For one-to-many communications, nodes that are not involved in the communication must still participate in the reliable multicast process.
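The slides give no code for the two-broadcast MPI_Scatterv. The following is a minimal sketch of that idea written with standard MPI_Bcast calls for illustration only; CC-MPI itself would use its reliable multicast rather than MPI_Bcast. It assumes MPI_CHAR data, non-decreasing displacements, and the helper name scatterv_by_bcast is ours.

```c
/* Sketch of the "two broadcasts" idea behind multicast-based MPI_Scatterv:
 * broadcast the layout, broadcast the aggregated data, and let each receiver
 * extract its own portion. Not CC-MPI's actual implementation. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

void scatterv_by_bcast(char *sendbuf, int *sendcounts, int *displs,
                       char *recvbuf, int root, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int layout[2 * size];                      /* [counts | displs] packed */
    if (rank == root) {
        memcpy(layout, sendcounts, size * sizeof(int));
        memcpy(layout + size, displs, size * sizeof(int));
    }

    /* Broadcast 1: the data layout. */
    MPI_Bcast(layout, 2 * size, MPI_INT, root, comm);
    int *counts = layout, *offs = layout + size;

    /* Broadcast 2: the aggregated data; every receiver gets the whole
     * buffer and copies out only its own portion. */
    int total = offs[size - 1] + counts[size - 1];
    char *all = (rank == root) ? sendbuf : malloc((size_t)total);
    MPI_Bcast(all, total, MPI_CHAR, root, comm);
    memcpy(recvbuf, all + offs[rank], (size_t)counts[rank]);
    if (rank != root) free(all);
}
```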

Page 9:

• Dynamic group management:
  – Dynamically creates a new group for each one-to-many communication.
  – May introduce too much group management overhead.
• Compiler-assisted group management:
  – Extends the MPI API to allow users to manage multicast groups directly.
    » For example, MPI_Scatterv is split into three routines:
      – MPI_Scatterv_open_group
      – MPI_Scatterv_data_movement
      – MPI_Scatterv_close_group
  – The user (or compiler) may move, merge, and delete the control routines when additional information is available.

Page 10:

• An example:
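The example figure on this slide is not reproduced in the transcript. The sketch below only illustrates the intent: the three routine names come from the slides, but the argument lists and the group handle type are hypothetical (modeled on MPI_Scatterv), and the loop hoisting shown is the kind of transformation a compiler or user could apply.

```c
#include <mpi.h>

/* Hypothetical handle type and prototypes for the split routines named in
 * the slides; the real CC-MPI signatures may differ. */
typedef int CCMPI_Group_handle;
int MPI_Scatterv_open_group(int *counts, int *displs, int root,
                            MPI_Comm comm, CCMPI_Group_handle *grp);
int MPI_Scatterv_data_movement(void *sendbuf, int *counts, int *displs,
                               MPI_Datatype sendtype, void *recvbuf,
                               int recvcount, MPI_Datatype recvtype,
                               int root, MPI_Comm comm,
                               CCMPI_Group_handle grp);
int MPI_Scatterv_close_group(CCMPI_Group_handle *grp);

/* How a loop of MPI_Scatterv calls might be rewritten: the group management
 * (network control) is hoisted out of the loop and paid once, while only the
 * data movement runs on every iteration. */
void repeated_scatterv(double *sendbuf, int *counts, int *displs,
                       double *recvbuf, int recvcount, int root,
                       MPI_Comm comm, int iterations)
{
    CCMPI_Group_handle grp;

    MPI_Scatterv_open_group(counts, displs, root, comm, &grp);    /* control */
    for (int i = 0; i < iterations; i++)
        MPI_Scatterv_data_movement(sendbuf, counts, displs, MPI_DOUBLE,
                                   recvbuf, recvcount, MPI_DOUBLE,
                                   root, comm, grp);              /* data only */
    MPI_Scatterv_close_group(&grp);                               /* control */
}
```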

Page 11:

• All(many)-to-all(many) communications:
  – MPI_Alltoall, MPI_Alltoallv, MPI_Allgather, etc.
  – A multicast-based implementation may not be efficient.
  – We need to distinguish between communications with small messages and those with large messages:
    » Small messages: each node sends as fast as it can.
    » Large messages: use some mechanism to reduce contention.
      – Phased communication [Hinrichs94]: partition the all-to-all communication into phases such that there is no network contention within each phase, and use barriers to separate the phases so that different phases do not interfere with each other (a sketch follows this slide).
  – Phased communication for all-to-all communication is well studied for many topologies.
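The slides describe phased communication only at a high level. Below is a minimal illustration of the idea using standard MPI calls: a simple rotation schedule in which, in phase p, rank i sends one block to rank (i+p) mod N and receives from rank (i-p) mod N, with a barrier separating the phases. CC-MPI's actual schedules are derived from the network topology; the helper name phased_alltoall is ours.

```c
/* Sketch of phased all-to-all: each node sends and receives exactly one
 * large message per phase, and barriers keep phases from overlapping. */
#include <mpi.h>
#include <string.h>

void phased_alltoall(char *sendbuf, char *recvbuf, int blocksize, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* Phase 0 is the local copy; remote exchanges are phases 1 .. size-1. */
    memcpy(recvbuf + rank * blocksize, sendbuf + rank * blocksize,
           (size_t)blocksize);

    for (int p = 1; p < size; p++) {
        int dst = (rank + p) % size;
        int src = (rank - p + size) % size;
        MPI_Sendrecv(sendbuf + dst * blocksize, blocksize, MPI_CHAR, dst, p,
                     recvbuf + src * blocksize, blocksize, MPI_CHAR, src, p,
                     comm, MPI_STATUS_IGNORE);
        MPI_Barrier(comm);   /* keep phases from interfering with each other */
    }
}
```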

Page 12:

• Phased communication for many-to-many communication (MPI_Alltoallv):
  – All nodes must know the communication pattern:
    » Use an MPI_Allgather before anything else is done (a sketch follows this slide), or
    » Assume the compiler has the information and stores it in local data structures.
  – Communication scheduling:
    » Greedy scheduling
    » All-to-all based scheduling
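A sketch of the pattern-collection step, using a standard MPI_Allgather so that every node learns the full send-count matrix before any scheduling is done. The helper name gather_pattern is ours, not a CC-MPI routine, and the scheduling step itself is not shown.

```c
/* Before phased MPI_Alltoallv, all-gather each rank's per-destination send
 * counts so every node knows the whole communication pattern. */
#include <mpi.h>
#include <stdlib.h>

int *gather_pattern(const int *sendcounts, MPI_Comm comm)
{
    int size;
    MPI_Comm_size(comm, &size);

    /* pattern[i*size + j] = number of elements rank i sends to rank j */
    int *pattern = malloc((size_t)size * size * sizeof(int));
    MPI_Allgather(sendcounts, size, MPI_INT, pattern, size, MPI_INT, comm);
    return pattern;   /* input to greedy or all-to-all based scheduling */
}
```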

Page 13:

• CC-MPI supports four methods for MPI_Alltoallv:
  – All nodes send as fast as possible.
  – Phased communication, level 1:
    » MPI_Allgather for the pattern information
    » Communication scheduling
    » Actual phased communication
  – Phased communication, level 2 (the pattern is known):
    » Communication scheduling
    » Actual phased communication
  – Phased communication, level 3 (the phases are known):
    » Actual phased communication

Page 14:

• Performance study:
  – Environment: 29 Pentium III 650 MHz nodes, 100 Mbps Ethernet switch
  – LAM/MPI version 6.5.4 in c2c mode
  – MPICH version 1.2.4 with the ch_p4 device

Page 15:

• Evaluation of individual routines:

Page 16:

• MPI_Bcast:

Page 17:

• MPI_Scatter:

Page 18:

• MPI_Scatterv (5 to 5 out of 29 nodes):

Page 19:

• MPI_Allgather (16 nodes):

Page 20:

• MPI_Alltoall (16 nodes):

Page 21:

• MPI_Alltoallv (alltoall pattern on 16 nodes):

Page 22:

• MPI_Alltoallv (random pattern):

Page 23:

• Benchmark Program (IS):

Page 24:

• Benchmark Program (FT):

Page 25:

• CC-MPI for software DSM (a synthetic application):

Page 26:

• Conclusion:
  – We developed a compiled communication capable MPI prototype.
  – We demonstrated that, by giving users more control over communication, significant performance improvements can be obtained.
  – Compiler support is needed for this model to be successful.

http://www.cs.fsu.edu/~xyuan/CCMPI