towards mpi progression layer elimination with tcp and sctp brad penoff and alan wagner department...
TRANSCRIPT
![Page 1: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/1.jpg)
Towards MPI progression layer elimination with TCP and SCTP
Brad Penoff and Alan Wagner Department of Computer Science
University of British ColumbiaVancouver, Canada
HIPS 2006 April 25
Distributed SystemsGroup
![Page 2: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/2.jpg)
Portability Aspect of parallel processing integration
MPI API provides interface for portable parallel applications, independent of MPI implementation
Will my application run?
![Page 3: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/3.jpg)
MPI API
User Code
any MPI Implementation
Resources
![Page 4: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/4.jpg)
Portability
Aspect of parallel processing integration
MPI API provides interface for portable parallel applications, independent of MPI implementation
MPI Middleware provides glue for a variety of underlying components required for a complex parallel runtime environment, independent of component implementation
Will my application perform well?
![Page 5: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/5.jpg)
MPI Middleware
User Code
any MPI Implementation
Resources
![Page 6: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/6.jpg)
MPI Middleware
User Code
Job SchedulerComponent
Process Manager Component
Transport
Network
MPI Middleware
Message Progression
Communication Component
Operating System
Glues together components
Job SchedulerComponent
Process Manager Component
![Page 7: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/7.jpg)
Message Progression Communication Component
Maintains necessary state between MPI calls Calls not a simple library
function
Manages underlying communication through the OS (e.g. TCP) direct low-level interaction
(e.g. Infiniband)
User Code
Transport
Network
MPI Middleware
Message Progression
Communication Component
OS
![Page 8: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/8.jpg)
Communication Requirements
Common: Portability by having support
for all potential interconnects
In this work: Portability by eliminating this
component by assuming IP! Push MPI functionality down
onto IP-based transports Learn about necessary MPI
implementation design changes
Application
Middleware
transport Infiniband Myrinet. . .
Ethernet
IP
Application
Transport
IP
Ethernet Infiniband Myrinet. . .
![Page 9: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/9.jpg)
Component Elimination
User Code
Job SchedulerComponent
Process Manager Component
Network
MPI Middleware/Library
Message Progression
Communication Component
Operating System Transport
![Page 10: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/10.jpg)
Elimination Motivation
Common approach Exploit specific features for all potential interconnects
Middleware does transport-layer “things”
Sequencing & flow control complicates the middleware
Implemented differently, MPI implementations incompatible
Our approach here Assume IP
Leverage mainstream commodity networking advances
Simplify middleware
Increase MPI implementation interoperability (perhaps?)
![Page 11: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/11.jpg)
Elimination Approach View MPI as a protocol, from a
networking point-of-view
Design MPI with elimination as a goal
MPI Message matching Expected / unexpected queues Short / long protocol
Networking Demultiplexing Storage/buffering Flow control
![Page 12: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/12.jpg)
MPI Implementation Designs
TCP
SCTP
![Page 13: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/13.jpg)
TCP Socket Per TRC
General scheme Socket per MPI message stream (tag-
rank-context (TRC)) Control port
MPI_Send calls connect (MPI_Recv could wildcard)
Resulting socket stored in table attached to communicator object
![Page 14: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/14.jpg)
TCP-MPI as a Protocol Matching
select() fd sets for wildcards Queues
Unexpected = socket buffer w/ flow control Expected = more local, attached to
handles Short/long
No distinction, rely on TCP flow control
![Page 15: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/15.jpg)
TCP per TRC critique
Design achieves elimination, but… # sockets – OS user limits Expense of sys calls (context switch,
copying) select() – doesn’t scale Flow control
Mismatch : transport/OS = event driven vs. MPI application = control-driven
![Page 16: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/16.jpg)
SCTP-based design
![Page 17: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/17.jpg)
What is SCTP?
Stream Control Transmission Protocol General purpose unicast transport
protocol for IP network data communications
Recently standardized by IETF Can be used anywhere TCP is used
![Page 18: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/18.jpg)
Available SCTP stacks
BSD / Mac OS X LKSCTP – Linux Kernel 2.4.23 and later Solaris 10 HP OpenCall SS7 OpenSS7 Other implementations listed on
sctp.org for Windows, AIX, VxWorks, etc.
![Page 19: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/19.jpg)
Relevant SCTP features
Multistreaming
One-to-many socket style
Multihoming
Message-based
![Page 20: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/20.jpg)
Logical View of Multiple Streams in an Association
Flow control per association (not stream)
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
Stream 1
Stream 2
Stream 3
SEND
SEND
RECEIVE
RECEIVE
Inbound Streams
Outbound Streams
![Page 21: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/21.jpg)
Using SCTP for MPI TRC-to-stream map matches MPI semantics
MPISCTP
Context
Rank
Tag
One-to-Many Socket
Association
Streams
![Page 22: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/22.jpg)
SCTP-MPI as a protocol Matching – required since cannot receive
from a particular stream sctp_recvmsg() = ANY_RANK + ANY_TAG Avoids select() through one-to-many socket
Queues – globally required for matching Short/Long – required; flow control not
per stream
![Page 23: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/23.jpg)
SCTP and elimination
SCTP thins the middleware but the component cannot be eliminated Need flow control per stream Need ability to receive from stream Need ability to query which streams
have data ready
![Page 24: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/24.jpg)
Conclusions TCP design eliminates but doesn’t scale
SCTP scales but only thins component
SCTP one-to-many socket style requires additional features for elimination Flow control per stream Ability to receive from stream Ability to query which streams have data
ready
![Page 25: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/25.jpg)
More information about our work is at:http://www.cs.ubc.ca/labs/dsg/mpi-sctp/
Thank you!
Or Google “sctp mpi”
![Page 26: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/26.jpg)
Upcoming annual SCTP Interop
July 30 – Aug 4, 2006 to be held at UBC
Vendors and implementers test their stacks Performance Interoperability
![Page 27: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/27.jpg)
Extra slides
![Page 28: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/28.jpg)
MPI Point-to-Point
Message matching is done based on Tag, Rank and Context (TRC).
Combinations such as blocking, non-blocking, synchronous, asynchronous, buffered, unbuffered.
Use of wildcards for receive
MPI_Send(msg,cnt,type,dst-rank,tag,context)
MPI_Recv(msg,cnt,type,src-rank,tag,context)
Payload
Format of MPI Message
Context Rank Tag
Envelope
![Page 29: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/29.jpg)
MPI Messages Using Same Context, Two Processes
Process X Process Y
Msg_1MPI_Send(Msg_1,Tag_A)
MPI_Irecv(..ANY_TAG..)
MPI_Send(Msg_2,Tag_B)
MPI_Send(Msg_3,Tag_A) Msg_3
Msg_2
Process X Process Y
Msg_1MPI_Send(Msg_1,Tag_A)
MPI_Send(Msg_2,Tag_B)
MPI_Send(Msg_3,Tag_A)Msg_3
Msg_2
MPI_Irecv(..ANY_TAG..)
![Page 30: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/30.jpg)
MPI Messages Using Same Context, Two Processes
Process X Process Y
Msg_1
MPI_Send(Msg_1,Tag_A)
MPI_Send(Msg_2,Tag_B)
MPI_Send(Msg_3,Tag_A)Msg_3
Msg_2
MPI_Irecv(..ANY_TAG..)
Out of order messages withsame tagsviolate MPI semantics
![Page 31: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/31.jpg)
Associations and Multihoming
Endpoint X
NIC1 NIC2
Endpoint Y
NIC3 NIC4
Network207.10.x.x
Network168.1.x.x
IP=207 .10.40.1
IP=168.1.140.10IP=168.1.10.30
IP=207.10.3.20
Association
![Page 32: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/32.jpg)
SCTP Key Similarities
Reliable in-order delivery, flow control, full duplex transfer.
TCP-like congestion control
Selective ACK is built-in the protocol
![Page 33: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/33.jpg)
SCTP Key Differences
Message oriented
Added security
Multihoming, use of associations
Multiple streams within an association
![Page 34: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/34.jpg)
MPI over SCTP
LAM and MPICH2 are two popular open source implementations of the MPI library.
We redesigned LAM to use SCTP and take advantage of its additional features.
Future plans include SCTP support within MPICH2.
![Page 35: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/35.jpg)
How can SCTP help MPI? A redesign for SCTP thins the MPI
middleware’s communication component. Use of one-to-many socket-style scales well.
SCTP adds resilience to MPI programs. Avoids unnecessary head-of-line blocking with
streams Increased fault tolerance in presence of
multihomed hosts Built-in security features Improved congestion control
Full Results Presented @
![Page 36: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/36.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg A Msg D Msg EMsg B Msg C
Send order
![Page 37: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/37.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg A
Msg D Msg EMsg B Msg C
Send order
![Page 38: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/38.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg A
Msg D Msg E
Msg B
Msg C
Send order
![Page 39: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/39.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg A
Msg D Msg E
Msg B
Msg C
Send order
![Page 40: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/40.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg A Msg D
Msg E
Msg B
Msg C
Send order
![Page 41: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/41.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg A Msg D
Msg E
Msg B
Msg C
Send order
![Page 42: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/42.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg A
Msg D
Msg E
Msg B
Msg C
Receive order
![Page 43: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/43.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg A
Msg D
Msg E
Msg B
Msg C
Receive order
![Page 44: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/44.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg A
Msg D
Msg E
Msg B Msg C
Receive order
![Page 45: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/45.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg E
Receive order
Msg A Msg DMsg B Msg C
![Page 46: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/46.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg A Msg D Msg EMsg B Msg C
Receive order
![Page 47: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/47.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg A Msg D Msg EMsg B Msg C
Can be received in the same order as it was sent (required in TCP).
![Page 48: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/48.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg A Msg D
Msg E
Msg B
Msg C
Alternative receiveorder
![Page 49: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/49.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg A Msg D
Msg B
Msg C
Alternative receiveorder
Msg E
![Page 50: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/50.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg D
Msg B
Msg C
Msg AMsg E
Alternative receiveorder
![Page 51: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/51.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg D
Msg B
Msg C
Msg AMsg E
Alternative receiveorder
![Page 52: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/52.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg D
Msg B Msg CMsg AMsg E
Alternative receiveorder
![Page 53: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/53.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg DMsg B Msg CMsg AMsg E
Alternative receiveorder
![Page 54: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/54.jpg)
Partially Ordered User Messages Sent
on Different Streams
Endpoint X Endpoint Y
Stream 1
Stream 2
Stream 3
SENDRECEIVE
Msg DMsg B Msg CMsg AMsg E
Delivery constraints: A must be before C and C must be before D
![Page 55: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/55.jpg)
MPI Middleware
JobScheduler
ProcessManager
Message Progression
Communication Component
MPI Middleware
Parallel Application
Parallel Application
Resource Resource Resource Resource Resource
Parallel Application
Parallel Application
MPI Parallel Library API
Components{ }←
![Page 56: Towards MPI progression layer elimination with TCP and SCTP Brad Penoff and Alan Wagner Department of Computer Science University of British Columbia Vancouver,](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649f525503460f94c762b8/html5/thumbnails/56.jpg)
Elimination Motivation
• Common approach : Exploit specific features for all potential interconnects– Middleware does transport-layer “things”
• Sequencing & flow control complicates the middleware
– Implemented differently, MPI implementations incompatible
• Our approach here : Assume IP – Leverage mainstream commodity networking advances
• Simplify middleware
– Increase MPI implementation interoperability