practical memory disaggregation
![Page 1: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/1.jpg)
Practical Memory Disaggregation: A Case Study in Network-Informed Data Systems Design
Mosharaf Chowdhury, November 2020
![Page 2: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/2.jpg)
Five Years Ago…
The volume of data businesses want to make sense of is increasing
[Figure: chart over the years 2015, 2016, 2017, 2018]
![Page 3: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/3.jpg)
1. Data Volume Will Keep Increasing
[Figure: chart over the years 2015, 2016, 2017, 2018]
![Page 4: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/4.jpg)
Data Systems
Big Data Analytics
AI/ML Tools
• Massive data
• High parallelism
• GPU clusters
• Distributed…
![Page 5: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/5.jpg)
2. Deployed in Diverse Networks
• Within a Rack: < 10 µs
• Within a Datacenter: ~ 1 ms
• Over the World: > 100 ms
![Page 6: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/6.jpg)
Network-Informed Data Systems Design
I. Network-adaptive Big Data and AI/ML systems
II. Tailoring data systems to extreme networks
   I. Computation over the Internet
   II. Leveraging high-speed networks
[Figure: project timeline, 2016 to 2020: HUG, CODA, Hermes, EC-Cache, Carbyne, QOOP, Tiresias, Salus, AlloX, NOCS, Sol, Pando, Infiniswap, DSLR, Leap, NetLock, CellScope]
![Page 7: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/7.jpg)
Practical Memory Disaggregation
![Page 8: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/8.jpg)
Memory is King!
![Page 9: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/9.jpg)
Perform Great!
[Figure: TPC-C on VoltDB. TPS (thousands) vs. in-memory working set: 36.18 at 100%, 6.619 at 75%, 1.542 at 50%]
![Page 10: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/10.jpg)
Perform Great Until Memory Runs Out
[Figure: TPC-C on VoltDB. TPS (thousands) vs. in-memory working set: 36.18 at 100%, 6.619 at 75%, 1.542 at 50%]
![Page 11: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/11.jpg)
Perform Great Until Memory Runs Out
[Figure: TPC-C on VoltDB. TPS (thousands) vs. in-memory working set: 36.18 at 100%, 6.619 at 75%, 1.542 at 50%]
[Figure: FB Workload on Memcached. Ops (thousands) vs. in-memory working set: 95.8 at 100%, 44.9 at 75%, 23.8 at 50%]
![Page 12: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/12.jpg)
Perform Great Until Memory Runs Out
[Figure: TPC-C on VoltDB. TPS (thousands) vs. in-memory working set: 36.18 at 100%, 6.619 at 75%, 1.542 at 50%]
[Figure: FB Workload on Memcached. Ops (thousands) vs. in-memory working set: 95.8 at 100%, 44.9 at 75%, 23.8 at 50%]
[Figure: PageRank on PowerGraph. Completion time (s, log scale) vs. in-memory working set: 57 at 100%, 67.5 at 75%, 453.4 at 50%]
![Page 13: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/13.jpg)
50% Less Memory Causes Slowdown of …
[Figure: TPC-C on VoltDB. TPS (thousands) vs. in-memory working set: 36.18 at 100%, 6.619 at 75%, 1.542 at 50%]
[Figure: FB Workload on Memcached. Ops (thousands) vs. in-memory working set: 95.8 at 100%, 44.9 at 75%, 23.8 at 50%]
[Figure: PageRank on PowerGraph. Completion time (s, log scale) vs. in-memory working set: 57 at 100%, 67.5 at 75%, 453.4 at 50%]
![Page 14: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/14.jpg)
Between a Rock and a Hard Place
Overallocation: leads to underutilization
• 30-50% in Google, Alibaba, and Facebook
VS.
Underallocation: leads to severe performance loss
![Page 15: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/15.jpg)
Memory Disaggregation
[Figure: Machine 1, Machine 2, Machine 3, …, Machine N, each with used memory and free memory; the free memory across machines is shown as disaggregated (remote) memory]
![Page 16: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/16.jpg)
Network is Getting Faster!
Time to access a 4KB memory page:
• TCP/IP: hundreds of µsec
• DPDK: tens of µsec
• RDMA: single-digit µsec
• DRAM: hundreds of nsec
![Page 17: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/17.jpg)
What is Practical Memory Disaggregation?
1. Applicability
2. Scalability
3. Efficiency
4. Performance
5. Isolation
6. Resilience
7. Security
8. Generality
9. …
![Page 18: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/18.jpg)
![Page 19: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/19.jpg)
Infiniswap: Efficient Memory Disaggregation
w/ Juncheng Gu and many others, NSDI’17
![Page 20: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/20.jpg)
Applicability
[Figure: Performance (Poor to Best) vs. Application Changes (Major, Minor, No) for remote-memory interfaces: Key-Value, File, Disk, and Paging; one region is labeled "Very Good"]
How can we enable any application to leverage disaggregated memory without sacrificing performance?
![Page 21: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/21.jpg)
Remote Memory Paging
Exposes memory across server boundaries
• Scalable
• Efficient
• Fault-tolerant
No changes to
• applications,
• operating systems, or
• hardware
![Page 22: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/22.jpg)
Core Idea
Exposes free remote memory as swap devices in a decentralized manner without affecting remote processes
1. Infiniswap Block Device: finds free remote memory, maps pages, and provides fault tolerance without any central coordination
2. Infiniswap Daemon: proactively evicts remote pages to ensure transparent, best-effort service
![Page 23: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/23.jpg)
Infiniswap in One Slide
[Figure: Machine-1 through Machine-N. On each machine, user space hosts containers (Container 1 … Container N) and the Infiniswap Daemon; kernel space hosts the Virtual Memory Manager (VMM) and the Infiniswap Block Device. A page fault sends an individual page from the VMM to the block device, which writes asynchronously to local disk and synchronously through the RNIC. A label "X" marks a region mapped to memory of Machine-X.]
![Page 24: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/24.jpg)
Scalability via Decentralization
How to find free remote memory in a large cluster?
• Problem: Centralized solution can be slow and expensive
• Solution: Power of two choices
How to evict mapped memory?
• Problem: LRU/LFU is hard because one-sided RDMA bypasses CPU
• Solution: Power of many choices
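The two decentralized decisions on this slide can be sketched as follows. This is a minimal illustration and not Infiniswap's actual code: the function names, the `free_memory` and `activity` dictionaries, and the `extra` parameter are all hypothetical. Placement samples two machines and keeps the one with more free memory; eviction samples more mapped chunks than it needs and evicts the least active ones in the sample.

```python
import random


def pick_remote_machine(free_memory, rng=random):
    """Power of two choices: sample two machines at random and
    keep the one that reports more free memory (hypothetical API)."""
    a, b = rng.sample(sorted(free_memory), 2)
    return a if free_memory[a] >= free_memory[b] else b


def pick_eviction_victims(activity, needed, extra, rng=random):
    """Power of many choices: sample `needed + extra` mapped chunks,
    then evict the `needed` least-active chunks within that sample."""
    k = min(len(activity), needed + extra)
    sample = rng.sample(sorted(activity), k)
    return sorted(sample, key=lambda chunk: activity[chunk])[:needed]


if __name__ == "__main__":
    free = {"m1": 2, "m2": 10, "m3": 5, "m4": 7}        # GB free per machine
    load = {"c1": 50, "c2": 3, "c3": 90, "c4": 12}      # accesses per chunk
    print(pick_remote_machine(free))
    print(pick_eviction_victims(load, needed=1, extra=2))
```

Neither function needs a global view of the cluster, which is the point of the slide: a few random probes replace a central coordinator.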
![Page 25: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/25.jpg)
Higher Efficiency & Better Load Balancing
Higher Utilization
[Figure: Memory utilization (%) vs. rank of 32 machines, with Infiniswap and w/o Infiniswap]
![Page 26: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/26.jpg)
Even on 50% Memory, Slowdown is
• TPC-C on VoltDB: <2X
• FB Workload on Memcached: ≈1X
• PageRank on PowerGraph: ≈1X
[Figure: TPC-C on VoltDB. TPS (thousands) vs. in-memory working set: 35.89 at 100%, 27.74 at 75%, 19.33 at 50%]
[Figure: FB Workload on Memcached. Ops (thousands) vs. in-memory working set: 99.1 at 100%, 100.4 at 75%, 91.3 at 50%]
[Figure: PageRank on PowerGraph. Completion time (s, log scale) vs. in-memory working set: 56.1 at 100%, 63.9 at 75%, 64.2 at 50%]
![Page 27: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/27.jpg)
What is Practical Memory Disaggregation?
1. Applicability
2. Scalability
3. Efficiency
4. Performance
5. Isolation
6. Resilience
7. Security
8. Generality
9. …
![Page 28: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/28.jpg)
Leap: Effectively Prefetching Remote Memory
w/ Hasan Al Maruf, ATC’20 Best Paper
![Page 29: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/29.jpg)
Life of a Page
[Figure: data path of a page fault. User space: Process 1, Process 2, …, Process N. Kernel space: Memory Management Unit (MMU) and page cache (cache hit or cache miss), then the Device Mapping Layer, Generic Block Layer (bio), I/O Scheduler request queue (request queue processing: insertion, merging, sorting, staging and dispatch), and Block Device Driver dispatch queue, ending at remote memory. Annotated latencies: 0.27 µs, 2.1 µs, 10.04 µs, 21.88 µs, and RDMA: 4.3 µs.]
![Page 30: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/30.jpg)
Where Does the Time Go?
[Figure: flowchart of a page request. Nodes: Page Request; In Page Cache? (Yes/No); Allocate Cache for Page; Read Request? (Yes/No); Update Page Table & End I/O; Prepare for I/O; Queue and Batch Requests; Execute I/O. Two paths are marked: Fast Path and Slow Path. Annotated latencies: 0.12 µs, 0.15 µs, 2.1 µs, 10.04 µs, 21.88 µs, and RDMA: 4.3 µs.]
![Page 31: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/31.jpg)
We Need to
1. Increase cache hits
• Faster path serves more page faults
2. Reduce the latency of the slow path
• Remove block-layer operations unnecessary for RDMA
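A rough model shows why both goals matter. The sketch below treats the average page-fault latency as a weighted mix of the fast and slow paths. It rests on two assumptions that the slides do not state outright: that 0.27 µs is the cache-hit (fast path) cost, and that the slow path costs the serial sum of the other annotated latencies (2.1 + 10.04 + 21.88 + 4.3 µs). The function name is hypothetical.

```python
def expected_fault_latency_us(hit_ratio, fast_us, slow_us):
    """Average latency when a fraction `hit_ratio` of page faults
    is served by the fast path and the rest by the slow path."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be in [0, 1]")
    return hit_ratio * fast_us + (1.0 - hit_ratio) * slow_us


# Assumption: fast path = page cache hit.
FAST_US = 0.27
# Assumption: slow-path stages run one after another, so their costs add up.
SLOW_US = 2.1 + 10.04 + 21.88 + 4.3

if __name__ == "__main__":
    for hit in (0.0, 0.5, 0.9, 0.99):
        avg = expected_fault_latency_us(hit, FAST_US, SLOW_US)
        print(f"hit ratio {hit:.2f}: {avg:.2f} us")
```

Under these assumptions the slow path dominates unless the hit ratio is very high, so raising the hit ratio and shortening the slow path are complementary.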
![Page 32: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/32.jpg)
![Page 33: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/33.jpg)
Life of a Page w/ Leap
[Figure: data path with Leap. User space: Process 1, Process 2, …, Process N. Kernel space: Memory Management Unit (MMU) and page cache (cache hit or cache miss), then Leap, then remote memory; the block-layer stages no longer appear. Leap components: Process Specific Page Access Tracker, Trend Detection, Prefetch Candidate Generation, Prefetcher, Eager Cache Eviction. Annotated latencies: 0.27 µs, 0.34 µs, 2.1 µs, and RDMA: 4.3 µs.]
![Page 34: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/34.jpg)
Prefetching in Linux
Reads ahead pages sequentially
Based only on the last page access
Does not distinguish between processes
Cannot detect thread-level access irregularities
Too aggressive on sequential access: cache pollution
Too conservative off sequential access: brings nothing
![Page 35: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/35.jpg)
Trend Detection in Leap
Identifies the majority element in access history
• Regular trends can be found within recent accesses
• Resilient to short term irregularity
1. Start with a smaller window of Access History
2. Run Boyer-Moore on the window
3. Majority found? If yes, return majority ∆maj
4. If no and the max. window size is reached, no trend found
5. Otherwise, double the window size and go back to step 2
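The loop above can be sketched in a few lines. This is an illustration under stated assumptions, not Leap's kernel implementation: the access history is modeled as a Python list of page-offset deltas (oldest first), and `find_trend`, `initial_window`, and `max_window` are hypothetical names. The Boyer-Moore majority vote finds a candidate in one pass, and a second pass confirms that it really covers more than half of the window.

```python
def boyer_moore_majority(values):
    """Return the element occurring in more than half of `values`, else None."""
    candidate, count = None, 0
    for v in values:
        if count == 0:
            candidate, count = v, 1
        elif v == candidate:
            count += 1
        else:
            count -= 1
    # The vote only yields a candidate; verify that it is a true majority.
    if candidate is not None and values.count(candidate) * 2 > len(values):
        return candidate
    return None


def find_trend(deltas, initial_window=4, max_window=None):
    """Look for a majority delta in the most recent accesses,
    doubling the window until a trend is found or the maximum is hit."""
    if max_window is None:
        max_window = len(deltas)
    max_window = min(max_window, len(deltas))
    window = min(initial_window, max_window)
    while window > 0:
        majority = boyer_moore_majority(deltas[-window:])
        if majority is not None:
            return majority
        if window >= max_window:
            return None  # no trend found
        window = min(window * 2, max_window)
    return None


if __name__ == "__main__":
    # A stride of +3 pages, interrupted by two irregular accesses at the end.
    history = [3, 3, 3, 3, 3, 3, -7, 2]
    print(find_trend(history, initial_window=2))
```

In the usage example, the two most recent deltas disagree, so the small windows find no majority; only the widened window recovers the stride, which is the resilience to short-term irregularity that the slide describes.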
![Page 36: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/36.jpg)
Lowers Remote Page Access Latency by…
• Sequential Access: 4X
• Stride Access: 104X
[Figure: two CDFs of latency (µs, log scale from 0.01 to 10000), one per access pattern, comparing Infiniswap and Infiniswap+Leap]
![Page 37: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/37.jpg)
Performs Great Even After Memory Runs Out
[Figure: TPC-C on VoltDB. TPS (thousands) vs. in-memory working set at 100%, 75%, 50%, 25%]
• Infiniswap: 37.00, 27.74, 19.33, 1.5 (annotated <2X)
• Infiniswap + Leap: 37, 36.3, 35.6, 15.6 (annotated ≈1X)
![Page 38: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/38.jpg)
Performs Great Even After Memory Runs Out
[Figure: TPC-C on VoltDB. TPS (thousands) vs. in-memory working set at 100%, 75%, 50%, 25%]
• Infiniswap: 37.00, 27.74, 19.33, 1.5
• Infiniswap + Leap: 37, 36.3, 35.6, 15.6 (annotated 2.4X)
![Page 39: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/39.jpg)
Applicability & Performance
[Figure: Performance (Poor to Best) vs. Application Changes (Major, Minor, No) for Key-Value, File, Disk, Infiniswap, and Infiniswap + Leap; one region is labeled "Very Good"]
![Page 40: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/40.jpg)
What is Practical Memory Disaggregation?
1. Applicability
2. Scalability
3. Efficiency
4. Performance
5. Isolation
6. Resilience
7. Security
8. Generality
9. …
Systems named on the slide: Memtrade, Justitia, Hydra
![Page 41: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/41.jpg)
![Page 42: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/42.jpg)
NetLock: Lock Management with Programmable Switches
w/ Zhuolong Yu, Yiwen Zhang and others, SIGCOMM’20
![Page 43: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/43.jpg)
Transactions
Transaction processing needs
• High throughput;
• Low latency; and
• Policy support
Existing approaches
• Centralized: low throughput
• Decentralized: limited policy support
![Page 44: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/44.jpg)
Network-Assisted Lock Management
Programmable Switch
Transaction processing needs
• High throughput;
• Low latency; and
• Policy support
Challenges
• Limited memory to store the locks
• Limited functionalities to process the locks and realize the policies
![Page 45: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/45.jpg)
NetLock Architecture
NetLock processes lock requests with a combination of switch and servers
• The switch only stores and processes the requests on hot locks
• Servers do the rest
Implemented on a 6.5Tbps Barefoot Tofino switch
[Figure: clients connect through a ToR switch running NetLock (lock table alongside L2/L3 routing) to lock table servers and database servers]
![Page 46: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/46.jpg)
Switch Memory Disaggregation
Determine how much switch memory is needed for a target throughput
• Formulated as a fractional knapsack problem
• Depends on expected contention
Handling overflow
• Move locks back and forth between switch and servers
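A greedy fractional-knapsack allocation could look like the sketch below. It is an illustration only and not NetLock's actual allocator: the `(lock_id, request_rate, slots_needed)` tuples are hypothetical, with `slots_needed` standing in for the switch memory a lock requires under its expected contention. Locks are ranked by request rate per slot, and the last lock that fits may receive only part of its slots, leaving the remainder to the lock servers.

```python
def allocate_switch_slots(locks, capacity):
    """Greedy fractional knapsack over switch memory slots.

    locks: iterable of (lock_id, request_rate, slots_needed), slots_needed > 0.
    Returns {lock_id: slots granted on the switch}.
    """
    allocation = {}
    remaining = capacity
    # Hottest locks per unit of switch memory come first.
    ranked = sorted(locks, key=lambda lock: lock[1] / lock[2], reverse=True)
    for lock_id, _rate, slots in ranked:
        if remaining <= 0:
            break
        granted = min(slots, remaining)  # the last lock may be split
        allocation[lock_id] = granted
        remaining -= granted
    return allocation


if __name__ == "__main__":
    demo = [("a", 100, 10), ("b", 90, 3), ("c", 10, 5)]
    print(allocate_switch_slots(demo, capacity=8))
```

Locks that receive no slots, or only some, correspond to the overflow case on the slide: their requests fall through to the servers.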
![Page 47: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/47.jpg)
Single µs Latency
20X lower latency for TPC-C over DSLR
![Page 48: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/48.jpg)
Billions of Locks/Sec
18X higher throughput for TPC-C over DSLR
![Page 49: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/49.jpg)
What is Practical Memory Disaggregation?
1. Applicability
2. Scalability
3. Efficiency
4. Performance
5. Isolation
6. Resilience
7. Security
8. Generality
9. …
![Page 50: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/50.jpg)
Memory Disaggregation
Wide-Area Computing
AI/ML Systems
Big Data Systems
![Page 51: Practical Memory Disaggregation](https://reader036.vdocuments.site/reader036/viewer/2022071612/6157014aa097e25c764ffcb5/html5/thumbnails/51.jpg)
Network-Informed Data Systems Design
PhD Students
Juncheng Gu, Jie You, Yiwen Zhang, Fan Lai, Jiachen Liu, Hasan Al Maruf, Peifeng Yu

Undergraduate & Master’s
Chris Chen, Yinwei Dai, Shuoren Fu, Songyuan Guan, Jack Kosaian, Qinye Li, Yang Liu, Yuze Lou, Alexander Neben, Wenting Tan, Yue Tan, Kaiwei Tu, Yuchen Wang, Yujia Xie, Yilei Xu, Jiaxing Yang, Yiwei Zhang, Jiangchen Zhu, Jingyuan Zhu, Xiangfeng Zhu

Collaborators
Aditya Akella, Ganesh Ananthanarayanan, Wei Bai, Vladimir Braverman, Shuchi Chawla, Kai Chen, Li Chen, Asaf Cidon, Yanhui Geng, Ali Ghodsi, Ayush Goel, Robert Grandl, Chuanxiong Guo, Matan Hamilis, Anthony Huang, Anand P. Iyer, Myeongjae Jeon, Xin Jin, Samir Khuller, Tan N. Le, Youngmoon Lee, Li Erran Li, Hongqiang Liu, Zhenhua Liu, Harsha V. Madhyastha, Kshiteej Mahajan, Barzan Mozafari, Linh Nguyen, Aurojit Panda, Manish Purohit, Junjie Qian, Kannan Ramchandran, K. V. Rashmi, Kang G. Shin, Scott Shenker, Brent Stephens, Ion Stoica, Xiao Sun, Muhammed Uluyol, Shivaram Venkataraman, Carl Waldspurger, Hongyi Wang, Jingfeng Wu, Sheng Yang, Bairen Yi, Dong Young Yoon, Zhuolong Yu, Hong Zhang, Junxue Zhang, Yuhong Zhong, Yibo Zhu