strata: a cross media file systemawang/courses/cop5611_s2020/strata.pdfintuitive crash consistency....
TRANSCRIPT
![Page 1: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/1.jpg)
Strata: A Cross Media File System
1
Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson
![Page 2: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/2.jpg)
Let’s build a fast server
2
Requirements
• Small updates (1 Kbytes) dominate
• Dataset scales up to 10 TB
• Updates must be crash consistent
NoSQL store, Database, File server, Mail server …
![Page 3: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/3.jpg)
Storage diversification
Latency $/GB
DRAM 100 ns 8.6
NVM (soon) 300 ns 4.0
SSD 10 us 0.25
HDD 10 ms 0.02 Bet
ter p
erfo
rman
ce
Hig
her c
apac
ity
3
![Page 4: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/4.jpg)
Storage diversification
Latency $/GB
DRAM 100 ns 8.6
NVM (soon) 300 ns 4.0
SSD 10 us 0.25
HDD 10 ms 0.02 Bet
ter p
erfo
rman
ce
Hig
her c
apac
ity
3
Byte-addressable: cache-line granularity IO
![Page 5: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/5.jpg)
Storage diversification
Latency $/GB
DRAM 100 ns 8.6
NVM (soon) 300 ns 4.0
SSD 10 us 0.25
HDD 10 ms 0.02 Bet
ter p
erfo
rman
ce
Hig
her c
apac
ity
3
Large erasure blocks need to be sequentially written Random writes: 5~6x slowdown due to GC [FAST’15]
Byte-addressable: cache-line granularity IO
![Page 6: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/6.jpg)
Application
A fast server on today’s file system
4
• Small updates (1 Kbytes) dominate • Dataset scales up to 10TB • Updates must be crash consistent
91%Kernel file system
NVM
Kernel file system: NOVA [FAST 16, SOSP 17]
![Page 7: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/7.jpg)
Application
A fast server on today’s file system
4
Small, random IO is slow!
• Small updates (1 Kbytes) dominate • Dataset scales up to 10TB • Updates must be crash consistent
91%Kernel file system
NVM
Kernel file system: NOVA [FAST 16, SOSP 17]
![Page 8: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/8.jpg)
Application
A fast server on today’s file system
4
Small, random IO is slow!
• Small updates (1 Kbytes) dominate • Dataset scales up to 10TB • Updates must be crash consistent
91%Kernel file system
NVM
1 KB
IO latency (us)0 1.5 3 4.5 6
Write to device Kernel code91%
Kernel file system: NOVA [FAST 16, SOSP 17]
![Page 9: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/9.jpg)
Application
A fast server on today’s file system
4
Small, random IO is slow!
• Small updates (1 Kbytes) dominate • Dataset scales up to 10TB • Updates must be crash consistent
91%Kernel file system
NVM
1 KB
IO latency (us)0 1.5 3 4.5 6
Write to device Kernel code91%
Kernel file system: NOVA [FAST 16, SOSP 17] NVM is so fast that kernel is the bottleneck
![Page 10: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/10.jpg)
A fast server on today’s file system
5
• Small updates (1 Kbytes) dominate • Dataset scales up to 10TB • Updates must be crash consistent
Kernel file system
NVM
Application
![Page 11: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/11.jpg)
A fast server on today’s file system
5
• Small updates (1 Kbytes) dominate • Dataset scales up to 10TB • Updates must be crash consistent
Need huge capacity, but NVM alone is too expensive! ($40K for 10TB)
Kernel file system
NVM
Application
![Page 12: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/12.jpg)
A fast server on today’s file system
5
• Small updates (1 Kbytes) dominate • Dataset scales up to 10TB • Updates must be crash consistent
Need huge capacity, but NVM alone is too expensive! ($40K for 10TB)
Kernel file system
NVM
Application
For low-cost capacity with high performance, must leverage multiple device types
![Page 13: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/13.jpg)
A fast server on today’s file system
6
• Small updates (1 Kbytes) dominate • Dataset scales up to 10TB • Updates must be crash consistent
Block-level caching
NVM SSD HDD
Kernel file system
Application
![Page 14: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/14.jpg)
A fast server on today’s file system
6
• Small updates (1 Kbytes) dominate • Dataset scales up to 10TB • Updates must be crash consistent
• Block-level caching manages data in blocks, but NVM is byte-addressable
• Extra level of indirection
Block-level caching
NVM SSD HDD
Kernel file system
Application
![Page 15: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/15.jpg)
A fast server on today’s file system
6
• Small updates (1 Kbytes) dominate • Dataset scales up to 10TB • Updates must be crash consistent
• Block-level caching manages data in blocks, but NVM is byte-addressable
• Extra level of indirection
Block-level caching
NVM SSD HDD
Kernel file system
Application
1 KB
IO latency (us)
0 3 6 9 12
NOVA Block-level caching
Better
![Page 16: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/16.jpg)
A fast server on today’s file system
6
• Small updates (1 Kbytes) dominate • Dataset scales up to 10TB • Updates must be crash consistent
• Block-level caching manages data in blocks, but NVM is byte-addressable
• Extra level of indirection
Block-level caching
NVM SSD HDD
Kernel file system
Application
1 KB
IO latency (us)
0 3 6 9 12
NOVA Block-level caching
Better
Block-level caching is too slow
![Page 17: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/17.jpg)
A fast server on today’s file system
7
• Small updates (1 Kbytes) dominate • Dataset scales up to 10TB • Updates must be crash consistent
![Page 18: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/18.jpg)
A fast server on today’s file system
7
• Small updates (1 Kbytes) dominate • Dataset scales up to 10TB • Updates must be crash consistent
Pillai et al., OSDI 2014
SQLiteHDFS
ZooKeeperLevelDB
HSQLDBMercurial
Git
Crash vulnerabilities0 2 4 6 8 10
![Page 19: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/19.jpg)
A fast server on today’s file system
7
• Small updates (1 Kbytes) dominate • Dataset scales up to 10TB • Updates must be crash consistent
Pillai et al., OSDI 2014
SQLiteHDFS
ZooKeeperLevelDB
HSQLDBMercurial
Git
Crash vulnerabilities0 2 4 6 8 10
Applications struggle for crash consistency
![Page 20: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/20.jpg)
Problems in today’s file systems
8
• Kernel mediates every operation NVM is so fast that kernel is the bottleneck
• Tied to a single type of device For low-cost capacity with high performance, must leverage multiple device types NVM (soon), SSD, HDD
• Aggressive caching in DRAM, write to device only when you must (fsync)
Applications struggle for crash consistency
![Page 21: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/21.jpg)
Strata: A Cross Media File System
9
Performance: especially small, random IO • Fast user-level device access
Low-cost capacity: leverage NVM, SSD & HDD • Transparent data migration across different storage media • Efficiently handle device IO properties
Simplicity: intuitive crash consistency model • In-order, synchronous IO • No fsync() required
![Page 22: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/22.jpg)
Strata: main design principle
Log operations to NVM at user-level
10
Digest and migrate data in kernel
![Page 23: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/23.jpg)
Strata: main design principle
Log operations to NVM at user-level
Performance: Kernel bypass, but private
10
Digest and migrate data in kernel
![Page 24: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/24.jpg)
Strata: main design principle
Log operations to NVM at user-level
Simplicity: Intuitive crash consistencyPerformance: Kernel bypass, but private
10
Digest and migrate data in kernel
![Page 25: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/25.jpg)
Strata: main design principle
Log operations to NVM at user-level
Simplicity: Intuitive crash consistencyPerformance: Kernel bypass, but private
Coordinate multi-process accesses
10
Digest and migrate data in kernel
![Page 26: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/26.jpg)
Strata: main design principle
Log operations to NVM at user-level
Simplicity: Intuitive crash consistencyPerformance: Kernel bypass, but private
Coordinate multi-process accesses
10
Digest and migrate data in kernel
Apply log operations to shared data
![Page 27: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/27.jpg)
Strata: main design principle
Log operations to NVM at user-level
Simplicity: Intuitive crash consistencyPerformance: Kernel bypass, but private
Coordinate multi-process accesses
10
Digest and migrate data in kernel
Apply log operations to shared data
LibFS
KernelFS
![Page 28: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/28.jpg)
Outline• LibFS: Log operations to NVM at user-level
• Fast user-level access • In-order, synchronous IO
• KernelFS: Digest and migrate data in kernel • Asynchronous digest • Transparent data migration • Shared file access
• Evaluation
11
![Page 29: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/29.jpg)
Log operations to NVM at user-level
12
unmodified application
Strata: LibFS
NVM
POSIX API
Private operation log
creat write …
File operations (data & metadata)
rename
![Page 30: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/30.jpg)
Log operations to NVM at user-level
12
• Fast writes
• Directly access fast NVM
• Sequentially append data
• Cache-line granularity
• Blind writes
unmodified application
Strata: LibFS
Kernel-bypass
NVM
POSIX API
Private operation log
creat write …
File operations (data & metadata)
rename
![Page 31: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/31.jpg)
Log operations to NVM at user-level
12
• Fast writes
• Directly access fast NVM
• Sequentially append data
• Cache-line granularity
• Blind writes
unmodified application
Strata: LibFS
Kernel-bypass
NVM
POSIX API
Private operation log
creat write …
File operations (data & metadata)
• Crash consistency
• On crash, kernel replays logrename
![Page 32: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/32.jpg)
unmodified application
Intuitive crash consistency
13
Strata: LibFS
Kernel-bypass
NVM
Synchronous IO
POSIX API
Private operation log
![Page 33: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/33.jpg)
unmodified application
Intuitive crash consistency
13
Strata: LibFS
Kernel-bypass
NVM
Synchronous IO
• When each system call returns:
• Data/metadata is durable
• In-order update
• Atomic write
• Limited size (log size)
POSIX API
Private operation log
![Page 34: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/34.jpg)
unmodified application
Intuitive crash consistency
13
Strata: LibFS
Kernel-bypass
NVM
Synchronous IO
• When each system call returns:
• Data/metadata is durable
• In-order update
• Atomic write
• Limited size (log size)
POSIX API
Private operation log
fsync() is no-op
![Page 35: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/35.jpg)
unmodified application
Intuitive crash consistency
13
Strata: LibFS
Kernel-bypass
NVM
Synchronous IO
• When each system call returns:
• Data/metadata is durable
• In-order update
• Atomic write
• Limited size (log size)
POSIX API
Fast synchronous IO: NVM and kernel-bypass
Private operation log
fsync() is no-op
![Page 36: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/36.jpg)
Outline• LibFS: Log operations to NVM at user-level
• Fast user-level access • In-order, synchronous IO
• KernelFS: Digest and migrate data in kernel • Asynchronous digest • Transparent data migration • Shared file access
• Evaluation
14
![Page 37: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/37.jpg)
15
Digest data in kernel
NVMNVM Shared areaPrivate operation log
Application
Strata: LibFS
POSIX API
Strata: KernelFS
![Page 38: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/38.jpg)
15
Digest data in kernel
Write
NVMNVM Shared areaPrivate operation log
Application
Strata: LibFS
POSIX API
Strata: KernelFS
![Page 39: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/39.jpg)
15
Digest data in kernel
Write
NVM
Digest
NVM Shared areaPrivate operation log
Application
Strata: LibFS
POSIX API
Strata: KernelFS
![Page 40: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/40.jpg)
15
• Visibility: make private log visible to other applications
• Data layout: turn write-optimized to read-optimized format (extent tree)
• Large, batched IO
• Coalesce log
Digest data in kernel
Write
NVM
Digest
NVM Shared areaPrivate operation log
Application
Strata: LibFS
POSIX API
Strata: KernelFS
![Page 41: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/41.jpg)
Digest optimization: Log coalescing
SQLite, Mail server: crash consistent update using write ahead logging
16
Digest eliminates unneeded work
. . .. . .
Remove temporary durable writes
Private operation log
Application
Strata: LibFS
Strata: KernelFS
NVM Shared area
![Page 42: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/42.jpg)
Digest optimization: Log coalescing
SQLite, Mail server: crash consistent update using write ahead logging
16
Create journal fileWrite data to journal fileWrite data to database fileDelete journal file
Digest eliminates unneeded work
. . .. . .
Remove temporary durable writes
Private operation log
Application
Strata: LibFS
Strata: KernelFS
NVM Shared area
![Page 43: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/43.jpg)
Digest optimization: Log coalescing
SQLite, Mail server: crash consistent update using write ahead logging
16
Create journal fileWrite data to journal fileWrite data to database fileDelete journal file
Digest eliminates unneeded work
. . .. . .
Write data to database file
Remove temporary durable writes
Private operation log
Application
Strata: LibFS
Strata: KernelFS
NVM Shared area
![Page 44: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/44.jpg)
Digest optimization: Log coalescing
SQLite, Mail server: crash consistent update using write ahead logging
16
Create journal fileWrite data to journal fileWrite data to database fileDelete journal file
Digest eliminates unneeded work
. . .. . .
Write data to database file
Remove temporary durable writes
Private operation log
Application
Strata: LibFS
Strata: KernelFS
Throughput optimization: Log coalescing saves IO while digesting
NVM Shared area
![Page 45: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/45.jpg)
17
Application
Strata: LibFS
Strata: KernelFS
NVM Shared areaPrivate operation log
Digest and migrate data in kernel
![Page 46: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/46.jpg)
Application
Strata: LibFS
Strata: KernelFS
NVM Shared areaPrivate operation log
18
SSD Shared area
HDD Shared area
• Low-cost capacity
• KernelFS migrates cold data to lower layers
Digest and migrate data in kernel
![Page 47: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/47.jpg)
Application
Strata: LibFS
Strata: KernelFS
NVM Shared areaPrivate operation log
18
SSD Shared area
HDD Shared area
• Low-cost capacity
• KernelFS migrates cold data to lower layers
Digest and migrate data in kernel
NVM data
![Page 48: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/48.jpg)
Application
Strata: LibFS
Strata: KernelFS
NVM Shared areaPrivate operation log
18
SSD Shared area
HDD Shared area
• Low-cost capacity
• KernelFS migrates cold data to lower layers
Digest and migrate data in kernel
NVM dataLogs
Digest
![Page 49: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/49.jpg)
Application
Strata: LibFS
Strata: KernelFS
NVM Shared areaPrivate operation log
18
SSD Shared area
HDD Shared area
• Low-cost capacity
• KernelFS migrates cold data to lower layers
Digest and migrate data in kernel
NVM dataLogs
Digest
![Page 50: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/50.jpg)
Application
Strata: LibFS
Strata: KernelFS
NVM Shared areaPrivate operation log
18
SSD Shared area
HDD Shared area
• Low-cost capacity
• KernelFS migrates cold data to lower layers
Digest and migrate data in kernel
NVM data
![Page 51: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/51.jpg)
Application
Strata: LibFS
Strata: KernelFS
NVM Shared areaPrivate operation log
18
SSD Shared area
HDD Shared area
• Low-cost capacity
• KernelFS migrates cold data to lower layers
Digest and migrate data in kernel
NVM data
• Handle device IO properties
• Migrate 1 GB blocks
• Avoid SSD garbage collection overhead
![Page 52: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/52.jpg)
Application
Strata: LibFS
Strata: KernelFS
NVM Shared areaPrivate operation log
18
SSD Shared area
HDD Shared area
• Low-cost capacity
• KernelFS migrates cold data to lower layers
Digest and migrate data in kernel
NVM data
Resembles log-structured merge (LSM) tree
• Handle device IO properties
• Migrate 1 GB blocks
• Avoid SSD garbage collection overhead
![Page 53: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/53.jpg)
Read: hierarchical search
19
Application
Strata: LibFS
Strata: KernelFS
NVM Shared areaPrivate OP log
SSD Shared area
HDD Shared area
NVM data
SSD data
HDD data
Log data
21
3
4
Search order
![Page 54: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/54.jpg)
Shared file access
20
• Leases grant access rights to applications [SOSP’89] • Required for files and directories • Function like lock, but revocable • Exclusive writer, shared readers
![Page 55: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/55.jpg)
Shared file access
20
• Leases grant access rights to applications [SOSP’89] • Required for files and directories • Function like lock, but revocable • Exclusive writer, shared readers
• On revocation, LibFS digests leased data • Private data made public before losing lease
• Leases serialize concurrent updates
![Page 56: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/56.jpg)
Outline• LibFS: Log operations to NVM at user-level
• Fast user-level access • In-order, synchronous IO
• KernelFS: Digest and migrate data in kernel • Asynchronous digest • Transparent data migration • Shared file access
• Evaluation
21
![Page 57: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/57.jpg)
Experimental setup• 2x Intel Xeon E5-2640 CPU, 64 GB DRAM
• 400 GB NVMe SSD, 1 TB HDD • Ubuntu 16.04 LTS, Linux kernel 4.8.12
• Emulated NVM • Use 40 GB of DRAM • Performance model [Y. Zhang et al. MSST 2015]
• Throttle latency & throughput in software
22
![Page 58: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/58.jpg)
Evaluation questions
23
• Latency:
• Does Strata efficiently support small, random writes?
• Does asynchronous digest have an impact on latency?
• Throughput:
• Strata writes data twice (logging and digesting). Can Strata sustain high throughput?
• How well does Strata perform when managing data across storage layers?
![Page 59: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/59.jpg)
Related work
24
• NVM file systems PMFS[EuroSys 14]: In-place update file system
• NOVA[FAST 16]: log-structured file system
• EXT4-DAX: NVM support for EXT4
• SSD file system
• F2FS[FAST 15]: log-structured file system
![Page 60: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/60.jpg)
Microbenchmark: write latency
25
• Strata logs to NVM
• Compare to NVM kernel file systems:PMFS, NOVA, EXT4-DAX
• Strata, NOVA
• In-order, synchronous IO
• Atomic write
• PMFS, EXT4-DAX
• No atomic write 0
2
4
6
8
10
IO size
128 B 1 KB 4 KB 16 KB
Strata PMFSNOVA EXT4-DAX
Latency (us)
Better
17 21 23 29
![Page 61: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/61.jpg)
Microbenchmark: write latency
25
• Strata logs to NVM
• Compare to NVM kernel file systems:PMFS, NOVA, EXT4-DAX
• Strata, NOVA
• In-order, synchronous IO
• Atomic write
• PMFS, EXT4-DAX
• No atomic write 0
2
4
6
8
10
IO size
128 B 1 KB 4 KB 16 KB
Strata PMFSNOVA EXT4-DAX
Latency (us)
Better
17 21 23 29
Avg.: 26% better Tail : 43% better
![Page 62: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/62.jpg)
Latency: persistent RPC
26
0
15
30
45
60
RPC size (IO size)1 KB 4 KB 64 KB
Strata PMFSNOVA EXT4-DAXNo persist
Better
98
Latency (us)
• Foundation of most servers
• Persist RPC data before sending ACK to client
• RPC over RDMA
• 40 Gb/s Infiniband NIC
• For small IO (1 KB)
• 25% slower than No persist
• 35% faster than PMFS 7x faster than EXT4-DAX
![Page 63: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/63.jpg)
Latency: persistent RPC
26
0
15
30
45
60
RPC size (IO size)1 KB 4 KB 64 KB
Strata PMFSNOVA EXT4-DAXNo persist
Better
98
Latency (us)
• Foundation of most servers
• Persist RPC data before sending ACK to client
• RPC over RDMA
• 40 Gb/s Infiniband NIC
• For small IO (1 KB)
• 25% slower than No persist
• 35% faster than PMFS 7x faster than EXT4-DAX
35% better
![Page 64: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/64.jpg)
Latency: LevelDB
0
10
20
30
Write sync.
Write seq.
Write rand.
Overwrite Readrand.
Strata PMFSNOVA EXT4-DAX
35.2 49.2 37.7
27
BetterLatency (us)
• LevelDB (NVM)
• Key size: 16 B
• Value size: 1 KB
• 300,000 objects
• Workload causes asynchronous digests
• Fast user-level logging
• Random write • 25% better than PMFS
• Random read • Tied with PMFS
![Page 65: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/65.jpg)
Latency: LevelDB
0
10
20
30
Write sync.
Write seq.
Write rand.
Overwrite Readrand.
Strata PMFSNOVA EXT4-DAX
35.2 49.2 37.7
27
Better
25% better
TiedLatency (us)
• LevelDB (NVM)
• Key size: 16 B
• Value size: 1 KB
• 300,000 objects
• Workload causes asynchronous digests
• Fast user-level logging
• Random write • 25% better than PMFS
• Random read • Tied with PMFS
![Page 66: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/66.jpg)
Latency: LevelDB
0
10
20
30
Write sync.
Write seq.
Write rand.
Overwrite Readrand.
Strata PMFSNOVA EXT4-DAX
35.2 49.2 37.7
27
Better
25% better
TiedLatency (us)
• LevelDB (NVM)
• Key size: 16 B
• Value size: 1 KB
• 300,000 objects
• Workload causes asynchronous digests
• Fast user-level logging
• Random write • 25% better than PMFS
• Random read • Tied with PMFS
IO latency not impacted by asynchronous digest
![Page 67: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/67.jpg)
Evaluation questions
28
• Latency:
• Does Strata efficiently support small, random writes?
• Does asynchronous digest have an impact on latency?
• Throughput:
• Strata writes data twice (logging and digesting). Can Strata sustain high throughput?
• How well does Strata perform when managing data across storage layers?
![Page 68: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/68.jpg)
Throughput: Varmail
29
Mail server workload from Filebench • Using only NVM • 10000 files • Read/Write ratio is 1:1 • Write-ahead logging Create journal file
Write data to journalWrite data to database fileDelete journal file
Digest eliminates unneeded work
Write data to database file
Removes temporary durable writes
KernelFS
Application
LibFS
Log coalescing
![Page 69: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/69.jpg)
Throughput: Varmail
29
Mail server workload from Filebench • Using only NVM • 10000 files • Read/Write ratio is 1:1 • Write-ahead logging
StrataPMFSNOVA
EXT4-DAX
Throughput (op/s)
0K 100K 200K 300K 400KBetter
29% better
Create journal fileWrite data to journalWrite data to database fileDelete journal file
Digest eliminates unneeded work
Write data to database file
Removes temporary durable writes
KernelFS
Application
LibFS
Log coalescing
![Page 70: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/70.jpg)
Throughput: Varmail
29
Mail server workload from Filebench • Using only NVM • 10000 files • Read/Write ratio is 1:1 • Write-ahead logging
Log coalescing eliminates 86% of log entries, saving 14 GB of IO
StrataPMFSNOVA
EXT4-DAX
Throughput (op/s)
0K 100K 200K 300K 400KBetter
29% better
Create journal fileWrite data to journalWrite data to database fileDelete journal file
Digest eliminates unneeded work
Write data to database file
Removes temporary durable writes
KernelFS
Application
LibFS
Log coalescing
![Page 71: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/71.jpg)
Throughput: Varmail
29
Mail server workload from Filebench • Using only NVM • 10000 files • Read/Write ratio is 1:1 • Write-ahead logging
Log coalescing eliminates 86% of log entries, saving 14 GB of IO
StrataPMFSNOVA
EXT4-DAX
Throughput (op/s)
0K 100K 200K 300K 400KBetter
29% better
Create journal fileWrite data to journalWrite data to database fileDelete journal file
Digest eliminates unneeded work
Write data to database file
Removes temporary durable writes
KernelFS
Application
LibFS
Log coalescing No kernel file system has both low latency and high throughput:
• PMFS: better latency • NOVA: better throughput
Strata achieves both low latency and high throughput
![Page 72: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/72.jpg)
Throughput: data migration
30
File server workload from Filebench • Working set starts at NVM, grows to SSD, HDD • Read/Write ratio is 1:2
![Page 73: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/73.jpg)
Throughput: data migration
30
File server workload from Filebench • Working set starts at NVM, grows to SSD, HDD • Read/Write ratio is 1:2
User-level migration • LRU: whole file granularity • Treat each file system as a black-box • NVM: NOVA, SSD: F2FS, HDD: EXT4
![Page 74: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/74.jpg)
Throughput: data migration
30
File server workload from Filebench • Working set starts at NVM, grows to SSD, HDD • Read/Write ratio is 1:2
User-level migration • LRU: whole file granularity • Treat each file system as a black-box • NVM: NOVA, SSD: F2FS, HDD: EXT4
Block-level caching • Linux LVM cache, formatted with F2FS
![Page 75: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/75.jpg)
Throughput: data migration
30
File server workload from Filebench • Working set starts at NVM, grows to SSD, HDD • Read/Write ratio is 1:2
User-level migration • LRU: whole file granularity • Treat each file system as a black-box • NVM: NOVA, SSD: F2FS, HDD: EXT4
StrataUser-level migrationBlock-level caching
Avg. throughput (ops/s)0K 2K 4K 6K 8K 10K
2x faster
Block-level caching • Linux LVM cache, formatted with F2FS
![Page 76: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/76.jpg)
Throughput: data migration
30
File server workload from Filebench • Working set starts at NVM, grows to SSD, HDD • Read/Write ratio is 1:2
User-level migration • LRU: whole file granularity • Treat each file system as a black-box • NVM: NOVA, SSD: F2FS, HDD: EXT4
22% faster than user-level migration Cross layer optimization: placing hot metadata in faster layers
StrataUser-level migrationBlock-level caching
Avg. throughput (ops/s)0K 2K 4K 6K 8K 10K
2x faster
Block-level caching • Linux LVM cache, formatted with F2FS
![Page 77: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/77.jpg)
Conclusion
Source code is available at https://github.com/ut-osa/strata
31
Server applications need fast, small random IO on vast datasets with intuitive crash consistency
Strata, a cross media file system, addresses these concerns
Performance: low latency, high throughput • Novel split of LibFS, KernelFS • Fast user-level access
Low-cost capacity: leverage NVM, SSD & HDD • Asynchronous digest • Transparent data migration with large, sequential IO
Simplicity: intuitive crash consistency model • In-order, synchronous IO
![Page 78: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/78.jpg)
Backup
32
![Page 79: Strata: A Cross Media File Systemawang/courses/cop5611_s2020/strata.pdfIntuitive crash consistency. 13. Strata: LibFS. Kernel-bypass NVM. Synchronous IO • When each system call returns:](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120422/html5/thumbnails/79.jpg)
Device management overhead
SSD
Thro
ughp
ut (M
B/s)
0
250
500
750
1000
SSD utilization0.1 0.25 0.5 0.6 0.7 0.8 0.9 1
64 MB 128 MB 256 MB512 MB 1024 MB
For example, SSD Random write:
Sequential writes avoid management overhead33
5-6x difference by hardware GC
SSD, HDD prefer large sequential IO