![Page 1: Implementing Babel RMI with ARMCI Jian Yin Khushbu Agarwal Daniel Chavarría Manoj Krishnan Ian Gorton Vidhya Gurumoorthi Patrick Nichols](https://reader035.vdocuments.site/reader035/viewer/2022062305/5697c0251a28abf838cd4e58/html5/thumbnails/1.jpg)
Implementing Babel RMI with ARMCI
Jian Yin, Khushbu Agarwal, Daniel Chavarría, Manoj Krishnan,
Ian Gorton, Vidhya Gurumoorthi,
Patrick Nichols
Motivation
Remote Method Invocation provides a useful abstraction for distributed computing
Example: event service for CCA framework
Existing TCP/IP-based implementation has performance problems
Question: can we speed up Babel RMI with high-performance communication protocols?
Objectives
Demonstrate that it is feasible to build high performance Babel RMI
Prototype a Babel RMI with ARMCI and measure its performance experimentally
Produce a quality implementation of high performance RMI
Outline
Motivation
Objectives
Background
Babel RMI
ARMCI
Preliminary performance results
Future work
Babel RMI
Babel supports Remote Method Invocation
Transparent
Flexible
Implemented with extensive code marshalling and runtime library
Existing TCP/IP-based implementation incurs high overhead
Multiple copying
Context switching
TCP RMI Performance
ARMCI
Middleware for remote memory access (RMA)
Supports many networks and HPC systems
Myrinet, InfiniBand, Quadrics, Giganet, …
Cray XT4, XT, X1, IBM BlueGene, …
Efficient
Minimal copying
Truly one-sided communication protocol
Put, get, accumulate
Atomic read-modify-write, mutex
Blocking and non-blocking interfaces
Experiment Setup
Hardware
Cluster with 11 nodes
4-core 2.4 GHz Intel Xeon processor
InfiniBand DDR network
Software
Babel 1.4.0
ARMCI 1.4
OpenMPI 1.2.6
Implementation
Implemented an extensive set of functions in the runtime library
InstanceHandle, Server, Invocation, Response, Call, Return, …
Usage examples
hello_World h = hello_World__createRemote("armcihandler://<process_id>:<mutex_id>", &_ex);
hello_World h2 = hello_World__connect("armcihandler://<process_id>:<mutex_id>/<object_id>", &_ex);
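The two URL forms in the usage examples can be made concrete with a small parser. This is only a sketch under stated assumptions, not the prototype's code: the slides do not say how the runtime library parses these URLs, so `armci_url` and `parse_armci_url` are hypothetical names, and the sketch assumes that `<process_id>` and `<mutex_id>` are non-negative integers and that `<object_id>` is a short string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the parts of an armcihandler URL. */
typedef struct {
    int  process_id;      /* assumed: integer id of the target process */
    int  mutex_id;        /* assumed: integer index of the mutex to use */
    char object_id[64];   /* assumed: string; empty in the createRemote form */
} armci_url;

/* Parses "armcihandler://<process_id>:<mutex_id>[/<object_id>]".
 * Returns 0 on success and -1 if the string does not match that shape. */
static int parse_armci_url(const char *url, armci_url *out)
{
    static const char prefix[] = "armcihandler://";
    const size_t plen = sizeof prefix - 1;
    int consumed = 0;

    assert(out != NULL);
    if (url == NULL || strncmp(url, prefix, plen) != 0)
        return -1;
    url += plen;

    /* %n stores how many characters the "<int>:<int>" part consumed. */
    if (sscanf(url, "%d:%d%n", &out->process_id, &out->mutex_id, &consumed) != 2)
        return -1;
    if (out->process_id < 0 || out->mutex_id < 0)
        return -1;
    url += consumed;

    out->object_id[0] = '\0';
    if (*url == '\0')
        return 0;                       /* createRemote form: no object id */
    if (*url != '/' || url[1] == '\0')
        return -1;                      /* trailing junk or empty object id */
    url++;
    if (strlen(url) >= sizeof out->object_id)
        return -1;                      /* object id too long for the buffer */
    strcpy(out->object_id, url);
    return 0;                           /* connect form: object id present */
}
```

Calling `parse_armci_url("armcihandler://3:1/obj42", &u)` fills `u` with process 3, mutex 1 and object id `obj42`; the same call without the `/obj42` suffix leaves `object_id` empty, which matches the createRemote form above.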
ARMCI RMI Performance
Next Step
Reduce protocol overhead
Reduce function call overhead
Reduce copying
Batch RMI calls
Reduce RDMA overhead
Prefetch in the background
Preload libraries
Prefetch arguments
Where to Use High Performance Babel RMI
Applications for high-performance RMI
Fine-grained distribution
Hybrid computing
Suggestions …