zero copy mpi derived datatype communication over...

39
Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan Santhanaraman Jiesheng Wu D.K.Panda Network Based Computing Lab The Ohio State University

Upload: others

Post on 12-Jun-2020

51 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Zero Copy MPI Derived Datatype Communication Over InfiniBand

Gopalakrishnan SanthanaramanJiesheng WuD.K.Panda

Network Based Computing LabThe Ohio State University

Page 2: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Presentation Layout

� Introduction

� Background and Existing approaches

� Motivation for new Scatter/Gather (SGRS) approach

� Design and implementation issues

� Performance Evaluation

� Conclusions and Future work

Page 3: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Introduction

� Non-contiguous data communication is common in scientific applications.

� Decomposition of multi dimensional volumes, FFT, finite elementcodes

� NAS BENCHMARKS, LINPACK

� MPI provides derived datatype interface to facilitate this kind of data movement

� Current Implementations of derived datatypes not very efficient

Page 4: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Presentation Layout

� Introduction

� Background and Existing Approaches

� Motivation for new Scatter/Gather(SGRS) approach

� Design and Implementation Issues

� Performance Evaluation

� Conclusions

Page 5: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Related Work

� Improve datatype processing

� Optimized packing and Unpacking Procedures

� Taking advantage of network features to improve non contiguous datatype communication

Page 6: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

InfiniBand Overview

� Emerging interconnect based on Open standards

� Provides low latency and high Bandwidth

� Several Novel features

� RDMA

� Scatter/Gather

� Atomic operations

� VAPI – low level interface (API) over InfiniBand

Page 7: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Our Previous Work

� Different Approaches

� Pack/Unpack Based Approach

� Copy on both sides

� Pipeline packing, network communication and unpacking

� Reduced Copy

� RDMA write with Gather on sender side

� RDMA read with Scatter on receiver side

� Zero Copy

� Multiple RDMA writes on sender side (Multi-W scheme)

Jiesheng Wu, Pete Wyckoff, and Dhabaleswar K. Panda. High Performance Implementation of MPI Datatype Communication over InfiniBand. In Int'l Parallel and Distributed Processing Symposium (IPDPS 04), April, 2004

Page 8: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Conclusions of Previous Work

� For small messages with eager protocol, segment pack/unpack is best.

� For messages in rendezvous protocol range, zero copy schemes are beneficial.

Multi-W zero copy scheme was proposed.

��� � � �� ��� � �� � �

�� �� ��� � � � � �� �� ��� � � � �

�� �� �� � �

�� �� �� � �

�� �� �� � �

�� �� �� � �

Page 9: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Limitations of Earlier Approaches

� RDMA write/gather, RDMA read/scatter

Needs copy in order to handle non-contiguity on both sides

� Multi-W

For large number of small segments, performance degrades.

� Overhead of large number of RDMA operations

� Poor network utilization

� Motivation to explore other zero copy schemes

� Problem statementHow can we utilize the advanced features provided by modern

interconnects like InfiniBand to handle non-contiguous data communication efficiently and overcome the above limitations?

Page 10: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Presentation Layout

� Introduction� Background and Existing approaches� Motivation for New Scatter/Gather (SGRS)

Approach� Design and Implementation issues� Performance Evaluation� Conclusions and Future work

Page 11: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Semantics of send/gather, receive/scatter feature

� Based on send/receive channel semantics

� Handles non-contiguity on both send/receive sides which is the most generic case

� To implement datatype using this feature needs a synchronization phase. Hence applicable for messages which fall under the rendezvous protocol

Page 12: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

VAPI level Comparison Multi-W vs SGRS

Observations

� For a fixed number of segments SGRS approach outperforms the Multi-W approach for different message sizes

� For a fixed message size with increasing degree of non-contiguity,

SGRS scheme degradation is negligible

Multi-W degradation is significant

Page 13: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Presentation Layout

� Introduction� Background and Existing approaches� Motivation for new Scatter/Gather

approach� Design and Implementation issues� Performance Evaluation� Conclusions

Page 14: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

MVAPICH Overview

� High Performance Implementation of MPI over InfiniBand

� Design based on MPICH and MVICH

� Eager protocol for small messages

� Rendezvous protocol for large messages

� Datatype Implementation currently uses the generic packing and unpacking scheme.

� small datatype messages are packed/unpacked

� large datatype messages both sides allocate pack/ unpack buffers dynamically

Page 15: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

� Open Source (current version is 0.9.4 released last week)

� Have been directly downloaded by more than 119 organizations and industry

� Available in the software stack distributions of IBA vendors

MVAPICH Software Distribution

��� � ��� � � � �� �� � �� �� �� � � � � � � ��

����� � �� ��� � �� �� ! � "� � � �� �#$� � � � % & �� � # $� � � � �$� � � � � '� � (� � & � )� � ��* + � � , $� )-. � � � /* � � �* �0% & � � � � & � � � � , + 12 � + �43 '� � 5�6 - � � � )� � �� 7 &# + ��* + 08 � � )� �# 1

2 � + �43 '� � 7 �� � � � ) / ��. * �. � � + � � ,9 � �� :�� � � ��; � � �� �08 � � )� �# 1

! �< � � �* � =�� � >� � # ��� � �� �� ! � "� � � �� �#!� + � � )� + �� � �� �� ! � "� � � �� � #(� 6 7 ?� �* > 2 � + � � �. � � '� � � + ��� �� )# 0 8 � � )� �# 1

� � / � � ) � + @� +� � � * & $� � �� �� $ / ��� � �� �� $� � � � � '� � � � )� +- & � � �* @ � +� � � * &

: & �� /. - � � *� )-. � � � $� � �� �7 � * � ' �* �� � � &< � + � ��� � �� �� ! � "� � � �� � #7 � � � + ". �� & /. - � � *� ) -. � � �� $� � � � �@� +� � � * &A 9 �B � � - )� � � 2 � + � � �. � � CB � � � 0 @. + + � � 1

/* � � �* � � - - �* � � �� � + 2 � � � � �� � �� �� $� � -� � � � �� �/� � , ��� ��� � �� �� ! � "� � � �� � #

D � ��E � �� � � � ��

8 �� � � � � % � * &

2 � , ��� �� F � �B � � + � �#C� � � � F � �B 3 0 C� � � � 1

C� � � � 2 � + �43 : ' /* � � �* � � � ,% � * & 3 0 C� � � � 1

C# . + &. F � �B 3 0G � - � � 1( � + + � + + � - - � / �� � � F � �B � � + � �#(� +*� < / �� � � F � �B � � + � �# 0 @. + + � � 1

�� � � & � � + � � � � F � �B � � + � �#7 � � � / �� � � F � �B � � + � �#@. + + � � � �* � , � )# � ' /* � � �* � + 0 @. + + � � 1

/ �� � '� � , F � �B � � + � �#% � * & � �� � 0 2 +� � � 1

% � * & � �* � F � �B 3� ' (. �* & � � 0 8 � � )� �# 1

% � * & � �* � F � �B 3� ' $ & � ) � � �; 0 8 � � )� �# 1

F � �B 3� ' 8 � � �B � 0 /< � �; � � � � , 1

F � �B 3� 'H�� . + �� �F � �B 3� ' C� � +�. & � 0 8 � � )� �# 1

F � �B 3� ' (� + +� * &. +� � � +!� < �

F � �B 3� '7 � , � � "� � � 0 8 � � )� �# 1

F � �B 3� '7� � + ,� ) 0 8 � � )� �# 1

F � �B 3� ' @ �� 8 � � � ,?� 0 =� � ; � 1

F � �B 3� ' / & � � "�� � >� 0 $� �� ,� 1

F � �B 3� ' / �. � �� � � � 0 8 � � )� �# 1

F � �B 3� '%� �� � �� 0 $� �� ,� 1

Page 16: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

MVAPICH Users (Cont’d)

� " "� % � * & �� � � #� ,B � �* � , $ . + � � � � �� % � * & 3� (9� ) )� + +�� - - ���� � � # /# + � � ) + $� )- 3 0 $� �� ,� 1

� � � - � % � * & �� � � � � +�� � � � �% � * & �� � � � � +$ . + �� � + /. - � � *� )-. � � ���% � * & �� � � # 2 �* 3 0 $ & � �� 1

$ . + � � �B � + �� � 0 � � � & � � � � , + 1$� )-. +# + 0 F C 1

$ / /! � "� � � �� � � � +�� 2 �* 39 � 9 � �� $� )-. �� � 0 8 � � )� �# 1

5 )- �* + 0 8 � � )� �# 1

� . � � � 2 �* 356 � � � � 0 2 +� � � 1

8 � � - & / �� � � ) � 2 �* 3H 7H 7 0 �� � �* � 1

� � ��� � � ��

2 = (2 = ( 0 �� � �* � 1

2 = ( 0 8 � � )� �# 1

2 �% 5 @ / 59 0 �� � �* � 1

2 � ' � � � $� �2 � � � 2 � � � 0 $ & � �� 1

2 � � � 0 8 � � )� �# 1

2 � � � /� . � �� � /�� �B ��* � + 0H�� �� C� �� 1

2 � � � /� . � �� � /�� �B ��* � + 0G � - � � 1G � 2C� � ' �< � # 0 @. + + � � 1

! � �� * &�� 0 $ & � �� 1

! � �. 6 � � �< � � 6! � �B � + �� � 0 � � � & � � � � , + 1( � � �< � � � 0 8 � � )� �# 1

( � � * . �# $� )-. �� � /# + �� ) +( � � �� 6 % � * & �� � � � � +( � �� +# + 0 �� � �* � 1

( ��* �� < � # � 2 �* 3� 5 $ 0G � - � � 1� 5 $ /� . � �� � + � 2 �* 3� 5 $ 0 / � �� � -� � � 1

� 2 $ 5 % 0 @. + + � � 1

: $ � - * 0 F � � � � , C � �� ,� ) 1

:* � �� � =� # 0 $� �� ,� 1

7 � �% � /# + �� ) +7 � � % � * 0 8 � � )� �# 1

7 � � & /* � � � 2 �* 37 . �� * 0G � - � � 17 # � � ) � , $� )-. � � � 08 � � )� �# 1

. + � � � + 0 2 +� � � 1

@� # � & �� � 2 �* 3@! �% � * & �� � � � � +@� + �� ! � ,3 0 @. + + � � 1

/ = $% � * & �� � � � � +�� 2 �* 3/* # , /� ' �< � � �/ 8 2 0 / � ��*� � 8 � � - & �* + � 2 �* 3 1

/ C� $� ) -. � � � +/ �� � � ) � �� $� )-. � � �� 0 F C 1

/# + �� � �%� )� �% � ?*� � , ��� � - - � � , @ � +� � � * &

% &� � + F � , � �< � � � � /# + � � ) + 0 F C 1

% � � � + � � * 0 8 � � )� �# 1

%� 7 � � '� � ) + 0 @. + + � � 1

%� - +- � �F � � +# +

� �� �� � � � > + �� � �� � + F C� ! � ,3 0 F C 1

� B � � /# + � � ) + � 2 �* 3

Page 17: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Larger IBA Clusters using MVAPICH and Top500 Rankings

� 1105-node cluster at Virginia Tech

� 3rd in Nov. ’03 ranking

� 192-node cluster at Mississippi State University

� 150th in June ’04 ranking

� 128-node cluster at Sandia/Livermore

� 111th in Nov ’03 ranking and 211th in June ’04 ranking

� 256-node cluster at Los Alamos

� 116th in Nov ’03 ranking and 218th in June ’04 ranking

� 128-node cluster at Ohio Supercomputer Center (OSC)

� 272th in June ’04 ranking

� More are getting installed ….

Page 18: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Framework For Handling Datatypes

MPI INTERFACE

INFINIBAND LAYER

Rendezvous

Reduced CopyPipeline Zero copyPack

Small messages Large messages

Eager

Page 19: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

��� � � � � �� � �� �

� �� � ��

��� � � � �

�� � � ��� � � � � �� �� ��� � � � �

��� � �

� �� � � �

Basic Idea

Page 20: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Design Issues

� Exchanging layout information

MPI datatype has only local semantics

Optimizing layout exchange

� Layout matching decision needs to be conveyed

� Registration and deregistration on user datatype message buffers

Unique issue due to non-contiguity in buffers

� Posting Descriptors

Upper limit on number of scatter gather descriptors.

Needs a secondary connection for transmitting non-contiguous data

Page 21: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

SG

RS

CO

MM

UN

ICA

TIO

N P

RO

TO

CO

L

SE

ND

ER

RE

CE

IVE

R

RE

QU

ES

T C

TR

L M

ES

G+

LA

YO

UT

(P

RIM

AR

Y C

ON

NE

CT

ION

)

PO

ST

SC

AT

TE

R

PO

ST

GA

TH

ER

RE

PL

Y C

TR

L M

ES

G +

DE

CIS

ION

INF

O (

PR

IMA

RY

CO

NN

EC

TIO

N)

DA

TA

(S

EC

ON

D C

ON

NE

CT

ION

)

Page 22: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Layout Exchange and Matching Decision

� Take advantage of handshake messages in the rendezvous protocol to achieve this

� Sender’s datatype layout is appended to Rendezvous start control message

� The matching decision information is conveyed in the Rendezvous reply/clear to send message

� A layout cache mechanism is implemented to reduce overhead of layout transfer

� Datatype information is exchanged only once

� Only the index needs to be sent for future messages

� Datatype Cache mechanism proposed by Traff et al.

Page 23: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Registration

� Registration and Deregistration on user datatype message buffers

� Common issues in both the zero copy schemes

� Unique issue due to non-contiguity in buffers

� Use Optimistic Group Registration scheme

J. Wu, P. Wyckoff, and D. K. Panda. “Supporting Efficient Noncontiguous Access in PVFS over InfiniBand”. IEEE Cluster Computing 2003, Dec. 2003

Page 24: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Posting Descriptors

� Needs a separate Queue pair connection

Ordering

Scalability

� Upper limit on number of gather/scatter descriptor

Message might need to be chopped into multiple gather/scatter descriptors

Number of posted gather descriptors must be equal to the number of posted scatter

Needs a negotiation phase

Page 25: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Presentation Layout

� Introduction� Background and Existing approaches� Motivation for new Scatter/Gather (SGRS)

approach� Design and Implementation issues� Performance Evaluation� Conclusions and Future work

Page 26: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Experimental Evaluation

� Experimental Test bed

Cluster of 8 Supermicro nodes

� Dual Xeon 3.0 GHz processors

� 512 KB L2 Cache, PCI-X 64bit 133 MHz bus

� InfiniHost SDK version 3.0.1

� Physical memory 1GB DDR-SDRAM memory

� Experiments conducted

Latency, Bandwidth with vector datatype

Collective latency (MPI_Alltoall)

CPU overhead tests

Impact of layout cache

Page 27: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Vector Datatype TestA vector (multiple columns in a 64x4096 integer array) test

Page 28: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

MPI Level Vector Latency

0

100

200

300

400

500

600

700

800

900

2k 4k 8k 16k 32k 64k 128k 256k 512k

Message size(bytes)L

aten

cy (u

sec)

SGRS-128

Multi-W-128

Generic-128

Contiguous

0

300

600

900

1200

1500

1800

2100

2k 4k 8k 16k 32k 64k 128k 256k 512k

Message size (bytes)

Lat

ency

(u

secs

)

SGRS-64

MultiW-64

Contiguous

Generic-64

� SGRS scheme reduces latency by up to 62% as compared to Multi-W

Page 29: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

MPI Level Vector Bandwidth

0

100

200

300

400

500

600

700

800

900

2k 4k 8k 16k 32k 64k 128k 256k 512k

Message size(bytes)

Ban

dw

idth

(M

egab

ytes

/sec

)

SGRS-64

SGRS-128

MultiW-64

MultiW-128

Contiguous

Generic-128

Generic-64

� SGRS scheme gives the best performance

� For large messages we get Bandwidth close to that of contiguous Bandwidth

Page 30: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

0

10

20

30

40

50

60

2k 4k 8k 16k 32k 64k 128k 256k 512k

Message size(bytes)

CP

U o

verh

ead

(use

c)

MultiW-64 segments

MultiW-128 segments

SGRS-64 segments

SGRS-128 segments

• The CPU overhead associated with SGRS protocol is relatively low

CPU Overhead

Receiver side OverheadSender side Overhead

0

4

8

12

16

20

2k 4k 8k 16k 32k 64k 128k 256k 512k

Message size(bytes)C

PU

ove

rhea

d(us

ec)

MultiW-64 segments

MultiW-128 segments

SGRS-64 segments

SGRS-128 segments

Page 31: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

MPI_Alltoall Latency

• The Alltoall latency test shows significant improvement for the SGRS approach

0

2000

4000

6000

8000

10000

12000

4k 8k 16k 32k 64k 128k 256k 512k

Message size(bytes)

Lat

ency

(u

sec)

Multi-W 64

Multi-W-128

SGRS-64

SGRS-128

Page 32: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Synthetic Benchmark to Measure Impact of Layout Caching

� Need to transfer the two diagonals of a square matrix.

� Diagonal elements are actually blocks.

� Need significant layout size to describe it

Page 33: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

0

5

10

15

20

25

30

500 750 1000 1250 1500 1750 2000

Num of blocks

Per

centa

ge

of O

verh

ead

blocksize:4bytes

blocksize:8bytes

blocksize:16bytes

Effect of Layout Cache

� Layout cache shows benefits for certain scenarios

� Layout itself is contiguous as compared to the data that it describes

Page 34: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Presentation Layout

� Introduction� Background and Existing approaches� Motivation for new Scatter/Gather (SGRS)

approach� Design and implementation issues� Performance Evaluation� Conclusions and Future work

Page 35: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Conclusions and Future Work

� Provided a new zero-copy scheme for datatype communication over InfiniBand

� The new scheme outperforms the existing schemes

Latency can be improved by up to 62%

Bandwidth can be increased by up to 400%

Collective communication like Alltoall can derive potential benefits

Layout cache is shown to be beneficial for some scenarios

� Future Work

Evaluate the effectiveness of this scheme at application level

Provide a comprehensive solution that internally uses multiple schemes to achieve best performance

Page 36: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

��� � � � � � ��� �� � � � ���� � � � �� � � � � �

� � � �� � � � ��� � � ��� � � � ��� � �� � ���

� � �� � � � � � !"� � � �� # $ � � � � �%

& � � ' � � � ( � ) � �� � � � � %

Thank You!

NBC Home Page

Page 37: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

BACKUP SLIDES

Page 38: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Vapi Level Bandwidth Comparison SGRS vs. Multi-W

• SGRS scheme consistently outperforms the Multi-W

0

100

200

300

400

500

600

700

800

900

1000

2k 4k 8k 16k 32k 64k 128k 256k 512k

Message size (bytes)

Ban

dw

idth

(M

ega

byt

es/s

ec)

Multi-W-Bw

SGRS-Bw

Page 39: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Effect of degree of non-contiguity

• SGRS scheme fares better with increased non-contiguity

500

550

600

650

700

750

800

850

900

4 8 16 32 64

Num of Blocks

Ban

dw

idth

(M

egab

ytes

/sec

)

Multi-W-128k

Multi-W-256k

Multi-W-512k

SGRS-128k

SGRS-256k

SGRS-512k