spin - usenix · you are here acsl - technion 8 summary background observations objective spin...
TRANSCRIPT
![Page 1: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/1.jpg)
Seamless Operating System Integration of Peer-to-Peer DMA Between SSDs and GPUsShai Bergman | Tanya Brokhman | Tzachi Cohen | Mark Silberstein
ACSL - Technion
SPIN:
![Page 2: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/2.jpg)
2ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
Summary What do we do?
Enable efficient file I/O for GPUs
Why?
Support diverse I/O workloads involving GPUs
How?
Make P2P a first class citizen within the file I/O stack
Results
Better throughputStandard file API
cross-GPU portability
![Page 3: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/3.jpg)
3ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
Fast data transfers
Bounded by extra copy
Data resides in SSD
BackgroundCPU mediated data
transfers introduce extra latency with lower
throughput
CPUIO - CPU mediated transfer
PCIe
![Page 4: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/4.jpg)
4ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
Eliminates redundant copies
Not involved in data path
Better power efficiency
Higher throughput & lower latency
Background
[1] J. Zhang, D. Donofrio, J. Shalf, M. T. Kandemir, and M. Jung, “NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures,”
[2] H.-W. Tseng, Y. Liu, M. Gahagan, J. Li, Y. Jin, and S. Swanson, “Gullfoss: Accelerating and Simplifying Data Movement Among Heterogeneous Computing and Storage Resources,”
[3] M. Shihab, K. Taht, and M. Jung, “GPUDrive: Reconsidering Storage Accesses for GPU Acceleration,”[4] H.-W. Tseng, Q. Zhao, Y. Zhou, M. Gahagan, and S. Swanson, “Morpheus: creating application objects efficiently for heterogeneous
computing,”[5]“Project Donard.” https://github.com/sbates130272/donard, 2015.
GPU vendors support P2P
&
PCIe
![Page 5: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/5.jpg)
5ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
0.00
0.50
1.00
1.50
2.00
2.50
3.00
512B53
4K262
16k472
32k675
64K846
512K2570
1M2692
Spee
du
p
Block size
CPUIO P2P
Observations
P2P is not a silver bullet…
P2P is great!*
* But wait – only for certain data access patterns
33x 7.6x
Sequential reads (e.g. grep)
P2P is not a silver bullet!
**data is not preloaded to the page cache
4.2x
CPUIO - CPU mediated transfer
Large sequential reads: P2P ~1.4x SpeedupShort sequential reads: P2P ~33x Slower than CPUIO?
![Page 6: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/6.jpg)
6ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
Hard to utilize
Non-standard API | No misaligned accesses | LVM/MDADM incompatible
No Page Cache Integration
No read ahead | Cannot utilize P$ for data reuse
ObservationsWhat went wrong?
P2P bypasses the kernel!
No file consistency
Can read stale data | Requires explicit flushes to SSD
![Page 7: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/7.jpg)
7ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
ObjectiveWhat do we want?
Regular file I/O to GPU memory
int fd;
...
//open file
...
pread64(fd,gpu_buffer,20*1024,0);
...
![Page 8: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/8.jpg)
8ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
SPIN:Contributions
Combine Page Cache and P2PInterleave system memory and SSD when possible
GPU Read AheadActivate read ahead mechanism when determined beneficial. Nested page cache within CPU memory for GPU Use
Standard File API• Underlying block device
support (RAID, LVM)
Data Consistency + POSIX file semanticsKeep POSIX file semantics + data consistency, even when CPU + GPU work on the same file
• Activate P2P when beneficial
![Page 9: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/9.jpg)
9ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
NVMe SSD
SPIN
P-routerP-Read Ahead policy
P-cache checker
P2PDMA
VFSPage cache
BlockLayer
NVMeDriver
GPU
GPU buffer
GPU addr. extraction
Current FS API
file
2.a 2.b
3.b
3.a
4.b
4.a
PCIe
P2P
DM
A t
ran
sfe
r
5.a
fd GPU buffer
1
pread( , )
5.b
GPURA
CPU PC
SPIN:High level overview
RQ
From Compatible SSD?Destined to GPU?Part of a sequential read?Data resides in page cache?
![Page 10: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/10.jpg)
10ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
NVMe SSD
SPIN
P-routerP-Read Ahead policy
P-cache checker
P2PDMA
VFSPage cache
BlockLayer
NVMeDriver
GPU
GPU buffer
GPU addr. extraction
Current FS API
file
2.a 2.b
3.b
3.a
4.b
4.a
PCIe
P2P
DM
A t
ran
sfe
r
5.a
fd GPU buffer
1
pread( , )
5.b
GPURA
CPU PC
SPIN:High level overview RQ
![Page 11: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/11.jpg)
11ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
NVMe SSD
SPIN
P-routerP-Read Ahead policy
P-cache checker
P2PDMA
VFSPage cache
BlockLayer
NVMeDriver
GPU
GPU buffer
GPU addr. extraction
Current FS API
file
2.a 2.b
3.b
3.a
4.b
4.a
PCIe
P2P
DM
A t
ran
sfe
r
5.a
fd GPU buffer
1
pread( , )
5.b
GPURA
CPU PC
SPIN:High level overview
RQ
![Page 12: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/12.jpg)
12ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
NVMe SSD
SPIN
P-routerP-Read Ahead policy
P-cache checker
P2PDMA
VFSPage cache
BlockLayer
NVMeDriver
GPU
GPU buffer
GPU addr. extraction
Current FS API
file
2.a 2.b
3.b
3.a
4.b
4.a
PCIe
P2P
DM
A t
ran
sfe
r
5.a
fd GPU buffer
1
pread( , )
5.b
GPURA
CPU PC
SPIN:High level overview RQ
![Page 13: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/13.jpg)
13ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
NVMe SSD
SPIN
P-routerP-Read Ahead policy
P-cache checker
P2PDMA
VFSPage cache
BlockLayer
NVMeDriver
GPU
GPU buffer
GPU addr. extraction
Current FS API
file
2.a 2.b
3.b
3.a
4.b
4.a
PCIe
P2P
DM
A t
ran
sfe
r
5.a
fd GPU buffer
1
pread( , )
5.b
GPURA
CPU PC
SPIN:High level overview
RQ
![Page 14: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/14.jpg)
14ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
NVMe SSD
SPIN
P-routerP-Read Ahead policy
P-cache checker
P2PDMA
VFSPage cache
BlockLayer
NVMeDriver
GPU
GPU buffer
GPU addr. extraction
Current FS API
file
2.a 2.b
3.b
3.a
4.b
4.a
PCIe
P2P
DM
A t
ran
sfe
r
5.a
fd GPU buffer
1
pread( , )
5.b
GPURA
CPU PC
SPIN:High level overview RQRQ
![Page 15: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/15.jpg)
15ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
NVMe SSD
SPIN
P-routerP-Read Ahead policy
P-cache checker
P2PDMA
VFSPage cache
BlockLayer
NVMeDriver
GPU
GPU buffer
GPU addr. extraction
Current FS API
file
2.a 2.b
3.b
3.a
4.b
4.a
PCIe
P2P
DM
A t
ran
sfe
r
5.a
fd GPU buffer
1
pread( , )
5.b
GPURA
CPU PC
SPIN:High level overview
P-cache checkerP-cache checker
![Page 16: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/16.jpg)
16ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
SPIN:Combine page cache and P2P
Sometimes the requested data resides in the P$
e.g due to previous usage of the data by CPU
0
500
1000
1500
2000
2500
0 500 1000 1500 2000Tr
ansf
er T
ime
[use
c]
Request Size [KiB]
P2P
P$
Transferring data from P$ is faster!
Transfer time from P$ and SSD P2P vs request size
Lower is better
![Page 17: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/17.jpg)
17ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
SPIN:Combine page cache and P2P
pread64(fd,gpu_dest,5*4096,0); //5 pages of 4KiB
Page cache is empty:
PCIe
Memory Bus
TransferP2P!
P-cache checker
Page Cache
EMPTY
Page #0 Page #1 Page #2 Page #3 Page #4
![Page 18: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/18.jpg)
18ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
Page #0 Page #1 Page #2 Page #3 Page #4
SPIN:Combine page cache and P2P
pread64(fd,gpu_destk,5*4096,0); //5 pages of 4KiB
All pread64 contents in P$:
PCIe
Memory Bus
Transferfrom memory (CPUIO)
P-cache checker
Page #0 Page #1 Page #2 Page #3 Page #4
Page Cache
EMPTY
Page Cache
![Page 19: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/19.jpg)
19ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
Page #0 Page #1 Page #2 Page #3 Page #4
SPIN:Combine page cache and P2P
pread64(fd,gpu_dest,5*4096,0); //5 pages of 4KiB
Some of the data is in P$?
PCIe
Memory Bus
Page #0 Page #1 Page #2 Page #3 Page #4
Page Cache
EMPTY
Page Cache
?
?#p1#p3
#p0#p2#p4
Fine grained interleaving is a bad idea!
Page resides in SSD only
Page resides in SSD and P$
![Page 20: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/20.jpg)
20ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
Page #0 Page #1 Page #2 Page #3 Page #4
SPIN:Combine page cache and P2P
pread64(fd,gpu_dest,5*4096,0); //5 pages of 4KiB
Page #0 Page #1 Page #2 Page #3 Page #4
Page resides in SSD only
Page resides in SSD and P$
3 transfers of 4KiB via P2P: 120.3us
Single transfer of 20KiB via P2P: 74.3us
40.1us 40.1us 40.1us
![Page 21: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/21.jpg)
21ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
SPIN:Combine page cache and P2P SSDs:
- Short IO requests are less efficient (low parallelism)- Invocation overhead per request
Fine grained interleaving = poor performance!
Optimization Problem: Find the transfer schedule to minimize transfer time
![Page 22: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/22.jpg)
22ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
0
500
1000
1500
2000
2500
32 512 992 1472 1952
Tran
sfer
Tim
e [s
ec]
Request Size [KiB]
P2P
P$
15361024 204851232
SPIN:Combine page cache and P2P
We model the SSD and RAM performance characteristics:- Assume P2P transfer time as piece-wise linear- Assume RAM transfer time as linear
To solve the problem & get an optimal schedule we need:
𝑇𝑝2𝑝 𝑠 - P2P transfer time for a given request size
𝑇𝑃$ 𝑠 - P$ transfer time for a given request size
![Page 23: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/23.jpg)
23ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
SPIN:Combine page cache and P2P
Solution is polynomial in number of blocks
We apply a greedy heuristic:- Examine every 3 consecutive data chunks
Costly to calculate for every transfer
Chunk #n Chunk #n+1 Chunk #n+2
Page resides in SSD only
Page resides in SSD and P$
𝑇𝑝2𝑝 𝑛 + 𝑛 + 1 + |𝑛 + 2|
𝑣𝑠.𝑇𝑝2𝑝 |𝑛| + 𝑇𝑃$ |𝑛 + 1| + 𝑇𝑝2𝑝 |𝑛 + 2|
Calculate:
![Page 24: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/24.jpg)
24ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
SPIN:Combine page cache and P2P
Solution is polynomial in number of blocks
We apply a greedy heuristic:- Examine every 3 consecutive data chunks
Costly to calculate for every transfer
Chunk #n Chunk #n+1 Chunk #n+2
Page resides in SSD only
Page resides in SSD and P$
𝑇𝑝2𝑝 𝑛 + 𝑛 + 1 + |𝑛 + 2|
𝑣𝑠.𝑇𝑝2𝑝 |𝑛| + 𝑇𝑃$ |𝑛 + 1| + 𝑇𝑝2𝑝 |𝑛 + 2|
Calculate:
Greedy Heuristic is only 1.6% slower than optimal
scheduling
![Page 25: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/25.jpg)
25ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
SPIN:Implementation:P2P & P$ Transfers
NVMe SSD
SPIN
P-routerP-Read Ahead policy
P-cache checker
P2PDMA
VFSPage cache
BlockLayer
NVMeDriver
GPU
GPU buffer
GPU addr. extraction
Current FS API
file
2.a 2.b
3.b
3.a
4.b
4.a
PCIe
P2P
DM
A t
ran
sfe
r
5.a
fd GPU buffer
1
pread( , )
5.b
GPURA
CPU PC
P2P: Address tunneling mechanism
P$: Memcpy from P$ to GPU mapped memory
![Page 26: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/26.jpg)
26ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
SPIN:Implementation& Evaluation
SPIN is implemented as a kernel module, patched
NVME module & an LD_PRELOAD library
No kernel modifications are required
System Specs:
- Intel P3700 NVME SSD
- AMD Radeon R9 Fury & NVIDIA Tesla K40c
- Ubuntu + Linux kernel 3.19
- Intel Core i7-5930K (6 Phys Cores) & X99 Chipset
- 24GB DDR4 RAM
![Page 27: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/27.jpg)
27ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
SPIN:Implementation& Evaluation
We have evaluated the following:
Threaded IO (TIOtest) Benchmark (1-4 threads):
- Sequential reads (including software RAID)
- Random reads/writes
- Effects of P$ residency on read throughput
- Effects CPU & I/O stress on read throughput
Application Benchmarks
- Aerial imagery rendering
- GPU accelerated log server
- Image collage utilizing GPUFS
![Page 28: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/28.jpg)
28ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
SPIN:Implementation& Evaluation
Effect of P$ on Read Throughput
Potential performance gains for producer-consumer workloads
0
20
40
60
80
100
120
07
108
208
3010
4011
5014
6017
7022
8032
9058
100239
Rel
ativ
e th
rou
ghp
ut
%
% of file in page cache
SPIN P2PDMA CPUIO
Thpt [MB/sec]:
*512B reads
No data in P$, less than 5%
overhead
All data in P$, less than 5%
overhead
![Page 29: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/29.jpg)
29ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
SPIN:Implementation& Evaluation
GPU Accelerated Log Server
- Store a log into SSD
- Analyze log using GPU acceleration for string matching
- Similar to fail2ban
Real time configuration:- Log arrives to server- Server stores logs in SSD- GPU analyzes logs by reading file
Offline configuration:- Log is already in SSD
- GPU analyzes logs by reading file
![Page 30: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/30.jpg)
30ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
SPIN:Implementation& Evaluation
GPU Accelerated Log Server
- Store a log into SSD
- Analyze log using GPU acceleration for string matching
- Similar to fail2ban
Real time configuration:- Log arrives to server- Server stores logs in SSD- GPU analyzes logs by reading file
Offline configuration:- Log is already in SSD
- GPU analyzes logs by reading file
We want our application to work efficiently in any
configuration
![Page 31: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/31.jpg)
31ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
SPIN:Implementation& Evaluation
GPU Accelerated Log Server
Real Time configuration
0
500
1000
1500
2000
2500
P2P CPUIO SPIN
BW
[M
BP
S]
Transfer Mechanism
Offline configuration
0
500
1000
1500
2000
2500
3000
P2P CPUIO SPIN
BW
[M
BP
S]
Transfer Mechanism
Data resides in p$ and SSDSPIN reads data from P$
Data resides in SSD onlySPIN utilizes P2P
I/O
Co
nte
nti
on
Ad
dit
ion
al “
ho
p”
![Page 32: SPIN - USENIX · You are here ACSL - Technion 8 Summary Background Observations Objective SPIN Conclusion SPIN: Contributions Combine Page Cache and P2P Interleave system memory and](https://reader033.vdocuments.site/reader033/viewer/2022041618/5e3cfef42d651031d0414719/html5/thumbnails/32.jpg)
32ACSL - TechnionYou are here
Summary
Background
Observations
Objective
SPIN
Conclusion
SPIN:Conclusion
Thank you!
[email protected] github.com/acsl-technion/spin
- SPIN seamlessly integrates P2P as a first class citizen
into the file I/O stack
- SPIN utilizes several mechanisms to speed up data
transfers transparently
- With SPIN, the same code performs well under all
setups