Latency Reduction Techniques for Remote Memory Access in ANEMONE
Mark Lewandowski
Department of Computer Science
Florida State University
Outline
• Introduction
• Architecture / Implementation
  • Adaptive NEtwork MemOry engiNE (ANEMONE)
  • Reliable Memory Access Protocol (RMAP)
  • Two Level LRU Caching
  • Early Acknowledgments
• Experimental Results
• Future Work
• Related Work
• Conclusions
Introduction
• Virtual memory performance is bound by slow disks
• The state of computers today lends itself to the idea of shared memory
  • Gigabit Ethernet
  • Machines on a LAN have lots of free memory
• Improvements to ANEMONE yield higher performance than both disk and the original ANEMONE system
[Figure: memory hierarchy with ANEMONE placed between main memory and disk: Registers, Cache, Memory, ANEMONE, Disk]
Contributions
• Pseudo Block Device (PBD)
• Reliable Memory Access Protocol (RMAP)
  • Replaces NFS
• Early Acknowledgments
  • Shortcut communication path
• Two Level LRU-Based Caching
  • Client
  • Memory Engine
ANEMONE Architecture
| | ANEMONE (NFS) | ANEMONE |
|---|---|---|
| Client | NFS swapping | Pseudo Block Device (PBD) |
| | Swap daemon cache | Client cache |
| Memory Engine | No caching | Engine cache |
| | Must wait for server to receive page | Early ACKs |
| Memory Server | Communicates with Memory Engine | Communicates with Memory Engine |
Architecture
[Figure: architecture diagram showing the Client Module, the RMAP Protocol, and the Engine Cache]
Pseudo Block Device
• Provides a transparent interface between the swap daemon and ANEMONE
• Is not a kernel modification
• Begins handling READ/WRITE requests in order of arrival
  • No expensive elevator algorithm
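The in-order handling above can be illustrated with a bounded FIFO ring buffer: requests leave in exactly the order they arrived, with no sorting by offset as an elevator scheduler would do. This is a minimal userspace sketch, not the actual PBD (which lives in the kernel); `pbd_request`, `pbd_queue`, `pbd_enqueue`, and `pbd_dequeue` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request types mirroring the READ/WRITE requests the PBD handles. */
enum pbd_op { PBD_READ, PBD_WRITE };

struct pbd_request {
    enum pbd_op op;
    unsigned long offset; /* page offset in the swap area */
};

#define PBD_QUEUE_CAP 64 /* hypothetical queue depth */

/* Bounded FIFO ring buffer: requests are dispatched in arrival order. */
struct pbd_queue {
    struct pbd_request slots[PBD_QUEUE_CAP];
    size_t head;  /* index of the oldest request */
    size_t count; /* number of queued requests */
};

void pbd_queue_init(struct pbd_queue *q) {
    q->head = 0;
    q->count = 0;
}

/* Append a request at the tail; returns false if the queue is full. */
bool pbd_enqueue(struct pbd_queue *q, struct pbd_request req) {
    assert(q->count <= PBD_QUEUE_CAP);
    if (q->count == PBD_QUEUE_CAP)
        return false;
    q->slots[(q->head + q->count) % PBD_QUEUE_CAP] = req;
    q->count++;
    return true;
}

/* Remove the oldest request. Nothing is reordered by offset, so each
 * dispatch is O(1), unlike an elevator (sorting) scheduler. */
bool pbd_dequeue(struct pbd_queue *q, struct pbd_request *out) {
    if (q->count == 0)
        return false;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % PBD_QUEUE_CAP;
    q->count--;
    return true;
}
```

Enqueuing requests for offsets 900, 10, and 500 dispatches them as 900, 10, 500; an elevator would have reordered them by offset, which helps a seeking disk but only adds work when the backing store is remote memory.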
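Reliable Memory Access Protocol (RMAP)
[Figure: protocol stack in which the Swap Daemon uses RMAP while applications use the Transport and IP layers; RMAP and IP both run directly over Ethernet]
• Lightweight, reliable protocol with flow control
• Sits next to the IP layer to give the swap daemon quick access to pages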
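Sitting "next to" IP means an RMAP message is carried directly in an Ethernet frame, with no IP or TCP/UDP header to build or parse. The sketch below shows that framing under loudly stated assumptions: the EtherType 0x88B5 (an IEEE 802 local experimental value) and the 13-byte header layout (type, sequence number, page offset) are hypothetical, since the slides do not give RMAP's actual wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: RMAP frames ride directly in Ethernet, beside IP, using the
 * IEEE 802 local experimental EtherType 0x88B5. The real ANEMONE value and
 * header layout are not given in the slides; this layout is hypothetical. */
#define RMAP_ETHERTYPE 0x88B5u
#define ETH_HDR_LEN 14  /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define RMAP_HDR_LEN 13 /* hypothetical: type (1) + seq (4) + page offset (8) */

/* Store v big-endian (network byte order) in nbytes bytes. */
static void put_be(uint8_t *p, uint64_t v, int nbytes) {
    assert(nbytes >= 1 && nbytes <= 8);
    for (int i = nbytes - 1; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xffu);
        v >>= 8;
    }
}

/* Read a big-endian value of nbytes bytes. */
static uint64_t get_be(const uint8_t *p, int nbytes) {
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Build Ethernet + RMAP headers; no IP or transport header is involved. */
size_t rmap_build_header(uint8_t *buf, const uint8_t dst[6], const uint8_t src[6],
                         uint8_t type, uint32_t seq, uint64_t offset) {
    memcpy(buf, dst, 6);
    memcpy(buf + 6, src, 6);
    put_be(buf + 12, RMAP_ETHERTYPE, 2);
    buf[14] = type;
    put_be(buf + 15, seq, 4);
    put_be(buf + 19, offset, 8);
    return ETH_HDR_LEN + RMAP_HDR_LEN;
}

/* Returns 1 and fills the fields if buf holds an RMAP frame, else 0. */
int rmap_parse_header(const uint8_t *buf, size_t len,
                      uint8_t *type, uint32_t *seq, uint64_t *offset) {
    if (len < ETH_HDR_LEN + RMAP_HDR_LEN || get_be(buf + 12, 2) != RMAP_ETHERTYPE)
        return 0;
    *type = buf[14];
    *seq = (uint32_t)get_be(buf + 15, 4);
    *offset = get_be(buf + 19, 8);
    return 1;
}
```

The receiver demultiplexes on the EtherType alone, which is what lets the swap path bypass the IP and transport layers entirely.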
RMAP
• Window-based protocol
• Requests are served as they arrive
• Messages:
  • REG/UNREG: register/unregister the client with the ANEMONE cluster
  • READ/WRITE: receive/send data from/to ANEMONE
  • STAT: retrieve statistics from the ANEMONE cluster
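A window-based protocol lets the sender have several requests outstanding at once, up to the window size, before it must wait for an acknowledgment. The sketch below is a sender-side sliding window assuming cumulative ACKs; the message types come from the list above, but their numeric values, the `rmap_window` struct, and the window size used are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Message types listed for RMAP; the numeric values are hypothetical. */
enum rmap_msg { RMAP_REG, RMAP_UNREG, RMAP_READ, RMAP_WRITE, RMAP_STAT };

struct rmap_req {
    enum rmap_msg type;
    uint32_t seq;
};

/* Sender-side sliding window, assuming cumulative acknowledgments:
 * at most `size` requests may be outstanding (sent but not ACKed). */
struct rmap_window {
    uint32_t base;     /* oldest unacknowledged sequence number */
    uint32_t next_seq; /* sequence number for the next request */
    uint32_t size;     /* maximum number of outstanding requests */
};

void rmap_window_init(struct rmap_window *w, uint32_t size) {
    assert(size > 0);
    w->base = 0;
    w->next_seq = 0;
    w->size = size;
}

/* True if another request may be sent without waiting for an ACK. */
bool rmap_can_send(const struct rmap_window *w) {
    return w->next_seq - w->base < w->size;
}

/* Assign a sequence number to a new request; false if the window is full. */
bool rmap_send(struct rmap_window *w, enum rmap_msg type, struct rmap_req *out) {
    if (!rmap_can_send(w))
        return false;
    out->type = type;
    out->seq = w->next_seq++;
    return true;
}

/* Cumulative ACK: every request with seq < ack_seq has been received.
 * Stale ACKs (below base) or ACKs beyond next_seq are ignored; unsigned
 * subtraction keeps the range check correct across wraparound. */
void rmap_ack(struct rmap_window *w, uint32_t ack_seq) {
    if (ack_seq - w->base <= w->next_seq - w->base)
        w->base = ack_seq;
}
```

With a window of 2, a third request blocks until an ACK slides `base` forward; this is the mechanism by which a small window can throttle throughput, as the Early Acknowledgments slide notes for early ACKs.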
Why do we need a cache?
• It is the natural counterpart to on-disk buffers
• Caching reduces network traffic
• Caching decreases latency
  • Write latencies benefit the most
  • Requests are buffered before they are sent over the wire
Basic Cache Structure
• A FIFO queue keeps track of the least-recently-used (LRU) page
• A hashtable is used for fast page lookups
[Figure: cache_entry structures linked in a FIFO queue from Head to Tail, with a hash function mapping each page into an index (hash table)]
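```c
#include <linux/list.h>   /* struct list_head */
#include <linux/skbuff.h> /* struct sk_buff, kfree_skb() */
#include <linux/types.h>  /* u8 */

struct cache_entry {
    struct list_head queue; /* links this entry into the linked list (FIFO queue) that makes up the cache */
    unsigned long offset;   /* offset of the page */
    u8 *page;               /* the page */
    int write;              /* assumed: nonzero if this entry holds a write request */
    struct sk_buff *skb;    /* This may or may not point to an sk_buff. If it does,
                             * then the cache must take care to call kfree_skb() when the
                             * page is kicked out of memory (this is to avoid a memcpy). */
    int answered;           /* assumed: nonzero once the request has been answered */
};
```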
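The queue-plus-hashtable design above is the standard way to get O(1) LRU operations: the hash finds an entry by page offset, and the doubly linked queue orders entries from least to most recently used. The userspace sketch below is an illustration of that technique, not ANEMONE's code: it uses plain pointers instead of the kernel's `list_head`, stores an `int` in place of a 4 KB page, and all names (`lru_cache`, `lru_get`, `lru_put`, `LRU_NBUCKETS`) are hypothetical. Entries are allocated up front, loosely like a cache allocated at load time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define LRU_NBUCKETS 64 /* hypothetical hash-table size */

/* Hypothetical entry: an int stands in for the 4 KB page. */
struct lru_entry {
    unsigned long offset;          /* page offset, used as the key */
    int data;                      /* stand-in for the page contents */
    struct lru_entry *prev, *next; /* queue links: head = LRU, tail = MRU */
    struct lru_entry *hnext;       /* next entry in the same hash bucket */
};

struct lru_cache {
    struct lru_entry *head, *tail;
    struct lru_entry *buckets[LRU_NBUCKETS];
    struct lru_entry *pool; /* all entries allocated up front */
    size_t used, capacity;
};

static size_t bucket_of(unsigned long off) { return (size_t)(off % LRU_NBUCKETS); }

static void queue_unlink(struct lru_cache *c, struct lru_entry *e) {
    if (e->prev) e->prev->next = e->next; else c->head = e->next;
    if (e->next) e->next->prev = e->prev; else c->tail = e->prev;
    e->prev = e->next = NULL;
}

static void queue_push_tail(struct lru_cache *c, struct lru_entry *e) {
    e->prev = c->tail;
    e->next = NULL;
    if (c->tail) c->tail->next = e; else c->head = e;
    c->tail = e;
}

static struct lru_entry *hash_find(struct lru_cache *c, unsigned long off) {
    for (struct lru_entry *e = c->buckets[bucket_of(off)]; e; e = e->hnext)
        if (e->offset == off) return e;
    return NULL;
}

static void hash_remove(struct lru_cache *c, struct lru_entry *e) {
    struct lru_entry **pp = &c->buckets[bucket_of(e->offset)];
    while (*pp != e) pp = &(*pp)->hnext;
    *pp = e->hnext;
}

bool lru_init(struct lru_cache *c, size_t capacity) {
    assert(capacity > 0);
    c->head = c->tail = NULL;
    for (size_t i = 0; i < LRU_NBUCKETS; i++) c->buckets[i] = NULL;
    c->pool = calloc(capacity, sizeof *c->pool);
    c->used = 0;
    c->capacity = capacity;
    return c->pool != NULL;
}

void lru_destroy(struct lru_cache *c) { free(c->pool); c->pool = NULL; }

/* Hash lookup; a hit moves the entry to the tail (most recently used). */
bool lru_get(struct lru_cache *c, unsigned long off, int *data) {
    struct lru_entry *e = hash_find(c, off);
    if (!e) return false;
    queue_unlink(c, e);
    queue_push_tail(c, e);
    *data = e->data;
    return true;
}

/* Insert or update. When full, the head (LRU) entry is evicted and reused;
 * returns true in that case and stores the evicted offset in *evicted. */
bool lru_put(struct lru_cache *c, unsigned long off, int data, unsigned long *evicted) {
    struct lru_entry *e = hash_find(c, off);
    bool did_evict = false;
    if (e) {
        queue_unlink(c, e);
    } else {
        if (c->used < c->capacity) {
            e = &c->pool[c->used++];
        } else {
            e = c->head;
            queue_unlink(c, e);
            hash_remove(c, e); /* must run before e->offset is overwritten */
            if (evicted) *evicted = e->offset;
            did_evict = true;
        }
        e->offset = off;
        e->hnext = c->buckets[bucket_of(off)];
        c->buckets[bucket_of(off)] = e;
    }
    e->data = data;
    queue_push_tail(c, e);
    return did_evict;
}
```

In a two-entry cache holding offsets 10 and 20, reading 10 makes 20 the LRU entry, so inserting 30 evicts 20.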
ANEMONE Cache Details
• Client cache
  • 16 MB
  • Write-back
  • Memory allocated at load time
• Engine cache
  • 80 MB
  • Write-through
  • Partial memory allocation at load time
  • sk_buffs are copied when they arrive at the Engine
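The two write policies differ in when a write reaches the network. Write-through sends every write onward immediately. Write-back only marks the page dirty and sends it when it is evicted or flushed, so repeated writes to the same page are coalesced. The toy model below counts network sends under each policy; the tiny FIFO dirty set and all names are hypothetical simplifications, while the page counts follow from the slide's 16 MB and 80 MB sizes, assuming 4 KB pages and binary megabytes.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL
#define CLIENT_CACHE_PAGES (16UL * 1024 * 1024 / PAGE_SIZE) /* 4096 pages */
#define ENGINE_CACHE_PAGES (80UL * 1024 * 1024 / PAGE_SIZE) /* 20480 pages */

enum write_policy { WRITE_THROUGH, WRITE_BACK };

/* Toy model: room for only a few dirty pages, evicted in FIFO order. */
#define MODEL_SLOTS 8

struct policy_model {
    enum write_policy policy;
    unsigned long dirty[MODEL_SLOTS]; /* offsets of dirty pages (write-back only) */
    size_t ndirty;
    unsigned long net_sends; /* pages sent over the wire so far */
};

void model_init(struct policy_model *m, enum write_policy p) {
    m->policy = p;
    m->ndirty = 0;
    m->net_sends = 0;
}

void model_write(struct policy_model *m, unsigned long offset) {
    assert(m->ndirty <= MODEL_SLOTS);
    if (m->policy == WRITE_THROUGH) {
        m->net_sends++; /* every write goes straight out */
        return;
    }
    for (size_t i = 0; i < m->ndirty; i++)
        if (m->dirty[i] == offset)
            return; /* already dirty: the write is coalesced */
    if (m->ndirty == MODEL_SLOTS) {
        m->net_sends++; /* write back the oldest dirty page to make room */
        for (size_t i = 1; i < MODEL_SLOTS; i++)
            m->dirty[i - 1] = m->dirty[i];
        m->ndirty--;
    }
    m->dirty[m->ndirty++] = offset;
}

/* Send all remaining dirty pages. */
void model_flush(struct policy_model *m) {
    m->net_sends += m->ndirty;
    m->ndirty = 0;
}
```

For the write sequence 1, 1, 1, 2, 2, write-through sends five pages while write-back sends two after a flush. The usual trade-off is that write-back data exists only in the cache until it is written back, whereas write-through keeps the downstream copy current.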
Early Acknowledgments
[Figure: message flow among the Client, the Memory Engine, and the Memory Server]
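• The Memory Engine acknowledges a write to the client without waiting for the Memory Server to receive the page
• Reduces client wait time
• Can reduce write latency by up to 200 µs per write request
• Early ACK performance is limited by the small RMAP window size
• A small pool (~200) of sk_buffs is maintained for forward ACKing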
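A preallocated pool means the engine never has to allocate memory on the fast path before sending an early ACK. The sketch below models that with a free-list stack of about 200 buffers (the figure from the slide). The fallback to acknowledging only after the server confirms when the pool is empty is an assumption, as are `ack_pool`, `engine_handle_write`, and the 64-byte ACK size; plain arrays stand in for kernel `sk_buff`s.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ACK_POOL_SIZE 200 /* the slides mention a pool of about 200 sk_buffs */
#define ACK_BUF_LEN 64    /* hypothetical size of an ACK message */

struct ack_buf {
    unsigned char data[ACK_BUF_LEN];
};

/* Fixed pool with a free-list stack: O(1) get/put, no allocation per ACK. */
struct ack_pool {
    struct ack_buf bufs[ACK_POOL_SIZE];
    struct ack_buf *free_list[ACK_POOL_SIZE];
    size_t nfree;
};

void ack_pool_init(struct ack_pool *p) {
    for (size_t i = 0; i < ACK_POOL_SIZE; i++)
        p->free_list[i] = &p->bufs[i];
    p->nfree = ACK_POOL_SIZE;
}

struct ack_buf *ack_pool_get(struct ack_pool *p) {
    return p->nfree ? p->free_list[--p->nfree] : NULL;
}

/* Return a buffer once its ACK has been transmitted. */
void ack_pool_put(struct ack_pool *p, struct ack_buf *b) {
    assert(p->nfree < ACK_POOL_SIZE);
    p->free_list[p->nfree++] = b;
}

enum ack_mode { ACK_EARLY, ACK_AFTER_SERVER };

/* Hypothetical engine-side handling of a client WRITE: with a free buffer,
 * ACK the client now and forward the page to the memory server afterwards;
 * with the pool exhausted, fall back to ACKing after the server confirms. */
enum ack_mode engine_handle_write(struct ack_pool *p, struct ack_buf **ack_out) {
    struct ack_buf *b = ack_pool_get(p);
    *ack_out = b;
    return b ? ACK_EARLY : ACK_AFTER_SERVER;
}
```

Keeping the pool small bounds the memory reserved for ACKs; the cost is that bursts of more than ~200 in-flight writes lose the early-ACK shortcut until buffers are returned.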
Experimental Testbed
The experimental testbed was configured with 400,000 blocks of memory (one 4 KB page each), about 1.6 GB in total.
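As a check on the ~1.6 GB figure, assuming 4 KB means 4096 bytes:

```latex
400{,}000 \times 4\,\text{KB} = 400{,}000 \times 4096\,\text{B}
  = 1{,}638{,}400{,}000\,\text{B} \approx 1.64\,\text{GB} \approx 1.53\,\text{GiB}
```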
Experimental Description
• Latency
  • 100,000 read/write requests
  • Sequential/random
• Application run times
  • Quicksort / POV-Ray
  • Single/multiple processes
  • Execution times
• Cache performance
  • Measured cache hit rates
  • Client / Engine
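The sequential and random latency runs differ only in which block each of the 100,000 requests touches. The sketch below generates both access patterns over the testbed's 400,000 blocks; it covers only offset generation, not the timing itself. The xorshift64 generator is a stand-in chosen for this example, not something the slides specify, and `make_offsets` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REQUESTS 100000UL
#define NUM_BLOCKS 400000UL /* testbed size from the slides, in 4 KB blocks */

/* xorshift64 (Marsaglia, shifts 13/7/17): a small PRNG chosen for this
 * sketch, not taken from ANEMONE. The state must never be zero. */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

enum access_pattern { SEQUENTIAL, RANDOM };

/* Fill offsets[0..n) with block numbers for one latency run.
 * The slight modulo bias in the random case is ignored. */
void make_offsets(unsigned long *offsets, size_t n, enum access_pattern pat, uint64_t seed) {
    assert(seed != 0);
    uint64_t state = seed;
    for (size_t i = 0; i < n; i++)
        offsets[i] = (pat == SEQUENTIAL)
                         ? (unsigned long)(i % NUM_BLOCKS)
                         : (unsigned long)(xorshift64(&state) % NUM_BLOCKS);
}
```

Using a fixed seed makes the random pattern reproducible, so the same request stream can be replayed against disk, the original ANEMONE, and the improved system.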
Sequential Read
Sequential Write
Random Read
Random Write
Single Process Performance
• The single process size increases by 100 MB in each iteration
• Quicksort: 298% performance increase over disk, 226% over the original ANEMONE
• POV-Ray: 370% performance increase over disk, 263% over the original ANEMONE
Multiple Process Performance
• The number of 100 MB processes increases by 1 in each iteration
• Quicksort: 710% increase over disk and 117% over the original ANEMONE
• POV-Ray: 835% increase over disk and 115% over the original ANEMONE
Client Cache Performance
• Each hit saves ~500 µs
• POV-Ray's hit rate saves ~270 seconds for the 1200 MB test
• Quicksort's hit rate saves ~45 seconds for the 1200 MB test
• The swap daemon interferes with cache hit rates
  • Prefetching
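Assuming every client-cache hit saves the same ~500 µs, the reported savings imply roughly how many hits each 1200 MB run had:

```latex
N_{\text{hits}} \approx \frac{T_{\text{saved}}}{t_{\text{hit}}}, \qquad
N_{\text{POV-Ray}} \approx \frac{270\,\text{s}}{500\,\mu\text{s}} = 540{,}000, \qquad
N_{\text{Quicksort}} \approx \frac{45\,\text{s}}{500\,\mu\text{s}} = 90{,}000
```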
Engine Cache Performance
• The engine cache hit rate levels out at ~10%
• POV-Ray does not exceed 10% because it performs over 3x as many page swaps as Quicksort
• The engine cache saves up to 1000 seconds for the 1200 MB POV-Ray test
Future Work
• More extensive testing
• Aggressive caching algorithms
• Data compression
• Page fragmentation
• P2P
• RDMA over Ethernet
• Scalability and fault tolerance
Related Work
• Global Memory System [feeley95]
  • Implements a global memory management algorithm over ATM
  • Does not directly address virtual memory
• Reliable Remote Memory Pager [markatos96], Network RAM Disk [flouris99]
  • TCP sockets
• Samson [stark03]
  • Myrinet
  • Does not perform caching
• Remote Memory Model [comer91]
  • Implements a custom protocol
  • Guarantees in-order delivery
Conclusions
• ANEMONE does not modify the client OS or applications
• Performance increases by up to 263% over the original ANEMONE for single processes
• Performance increases by up to 117% over the original ANEMONE for multiple processes
• Improved caching is a promising line of research, but more aggressive caching algorithms are required
Questions?
Appendix A: Quicksort Memory Access Patterns
Appendix B: POV-Ray Memory Access Patterns