strata: a cross media file system - nvmw 2020 | 11th...
Strata: A Cross Media File System
Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson
Let’s build a fast server
Requirements
• Small updates (1 KB) dominate
• Dataset scales up to 10 TB
• Updates must be crash consistent
NoSQL store, Database, File server, Mail server …
Storage diversification
| Media | Latency | $/GB |
|---|---|---|
| DRAM | 100 ns | 8.6 |
| NVM (soon) | 300 ns | 4.0 |
| SSD | 10 us | 0.25 |
| HDD | 10 ms | 0.02 |
Better performance toward the top of the table; higher capacity toward the bottom.
• NVM: byte-addressable, cache-line granularity IO
• SSD: large erasure blocks need to be sequentially written; random writes see a 5~6x slowdown due to GC [FAST’15]
A fast server on today’s file system
Layers shown: Application, Kernel file system, NVM
Kernel file system: NOVA [FAST 16, SOSP 17]
Small, random IO is slow!
[Figure: latency of a 1 KB IO in us (axis 0 to 6), split into "Write to device" and "Kernel code"; kernel code accounts for 91%.]
NVM is so fast that kernel is the bottleneck
A fast server on today’s file system
Layers shown: Application, Kernel file system, NVM
Need huge capacity, but NVM alone is too expensive! ($40K for 10 TB)
For low-cost capacity with high performance, must leverage multiple device types
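The $40K figure follows from the $/GB values on the Storage diversification slide. A minimal sketch of that arithmetic, assuming decimal terabytes (1 TB = 1000 GB); the tiered mix at the end is a hypothetical example for comparison, not a configuration from the talk.

```python
# $/GB values taken from the "Storage diversification" slide.
PRICE_PER_GB = {"DRAM": 8.6, "NVM": 4.0, "SSD": 0.25, "HDD": 0.02}

GB_PER_TB = 1000  # assumption: decimal terabytes


def cost_usd(capacity_gb_by_media):
    """Total price of a storage configuration, given GB per media type."""
    return sum(PRICE_PER_GB[m] * gb for m, gb in capacity_gb_by_media.items())


# 10 TB held entirely in NVM, as on the slide.
nvm_only = cost_usd({"NVM": 10 * GB_PER_TB})

# Hypothetical tiered mix (illustrative only): 100 GB NVM, 900 GB SSD, 9 TB HDD.
tiered = cost_usd({"NVM": 100, "SSD": 900, "HDD": 9 * GB_PER_TB})

print(f"NVM only: ${nvm_only:,.0f}")
print(f"Tiered:   ${tiered:,.0f}")
```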
A fast server on today’s file system
Layers shown: Application, Kernel file system, Block-level caching, NVM / SSD / HDD
• Block-level caching manages data in blocks, but NVM is byte-addressable
• Extra level of indirection
[Figure: latency of a 1 KB IO in us (axis 0 to 12), NOVA versus block-level caching; lower is better.]
Block-level caching is too slow
A fast server on today’s file system
[Figure: crash vulnerabilities per application (axis 0 to 10) for SQLite, HDFS, ZooKeeper, LevelDB, HSQLDB, Mercurial, and Git. Source: Pillai et al., OSDI 2014.]
Applications struggle for crash consistency
Problems in today’s file systems
• Kernel mediates every operation
  NVM is so fast that kernel is the bottleneck
• Tied to a single type of device
  For low-cost capacity with high performance, must leverage multiple device types: NVM (soon), SSD, HDD
• Aggressive caching in DRAM, write to device only when you must (fsync)
  Applications struggle for crash consistency
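To make the last problem concrete, here is a sketch of the protocol an application typically follows on a conventional POSIX file system to replace a file’s contents crash-consistently. The function name `atomic_replace` and the `.tmp` suffix are hypothetical; the `os` calls are standard Python, and the directory fsync assumes a POSIX platform.

```python
import os


def atomic_replace(path, data):
    """Crash-consistent whole-file update on a conventional POSIX file system.

    Every step matters: skipping an fsync or reordering the rename is the
    kind of mistake that leaves an application vulnerable after a crash.
    """
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:             # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # force the new data to the device
    finally:
        os.close(fd)
    os.replace(tmp, path)                # atomically switch to the new version
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)                 # make the rename itself durable
    finally:
        os.close(dir_fd)
```

The point of the sketch is the number of steps the application has to get right, not the specific helper.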
Strata: A Cross Media File System
Performance: especially small, random IO
• Fast user-level device access
Low-cost capacity: leverage NVM, SSD & HDD
• Transparent data migration across different storage media
• Efficiently handle device IO properties
Simplicity: intuitive crash consistency model
• In-order, synchronous IO
• No fsync() required
Strata: main design principle
LibFS: Log operations to NVM at user-level
• Performance: Kernel bypass, but private
• Simplicity: Intuitive crash consistency
KernelFS: Digest and migrate data in kernel
• Apply log operations to shared data
• Coordinate multi-process accesses
Outline
• LibFS: Log operations to NVM at user-level
  • Fast user-level access
  • In-order, synchronous IO
• KernelFS: Digest and migrate data in kernel
  • Asynchronous digest
  • Transparent data migration
  • Shared file access
• Evaluation
Log operations to NVM at user-level
Layers shown: unmodified application, POSIX API, Strata: LibFS (kernel-bypass), private operation log in NVM
File operations (data & metadata) go to the log: creat, write, rename, …
• Fast writes
  • Directly access fast NVM
  • Sequentially append data
  • Cache-line granularity
  • Blind writes
• Crash consistency
  • On crash, kernel replays log
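A minimal sketch of an append-only operation log with replay. It is an illustration under assumptions, not Strata’s on-media format: a `bytearray` stands in for the private NVM region, and the length-prefixed JSON record layout and the class name `OperationLog` are invented here.

```python
import json


class OperationLog:
    """Append-only log of file operations (data and metadata).

    Assumption: a bytearray stands in for a private NVM region, and the
    length-prefixed JSON record format is invented for illustration.
    """

    def __init__(self):
        self.region = bytearray()

    def append(self, op, **args):
        """Sequentially append one operation; never reads existing data."""
        payload = json.dumps({"op": op, **args}).encode()
        self.region.extend(len(payload).to_bytes(4, "little") + payload)

    def replay(self):
        """Yield logged operations in the order they were appended."""
        pos = 0
        while pos < len(self.region):
            size = int.from_bytes(self.region[pos:pos + 4], "little")
            yield json.loads(self.region[pos + 4:pos + 4 + size].decode())
            pos += 4 + size


log = OperationLog()
log.append("creat", path="/a")
log.append("write", path="/a", offset=0, data="hello")
log.append("rename", src="/a", dst="/b")
ops = [rec["op"] for rec in log.replay()]
```

Appends never read or modify earlier records, which is what makes them blind, sequential writes.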
Intuitive crash consistency
Layers shown: unmodified application, POSIX API, Strata: LibFS (kernel-bypass), private operation log in NVM
Synchronous IO
• When each system call returns:
  • Data/metadata is durable
  • In-order update
  • Atomic write
    • Limited size (log size)
fsync() is no-op
Fast synchronous IO: NVM and kernel-bypass
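The slides do not say how atomicity of a logged write is enforced. One common technique, shown here purely as an assumption, is to checksum each record so that recovery stops at the first torn record and a half-written operation is discarded as a whole. The names `append_record` and `recover` and the header layout are hypothetical.

```python
import zlib

HEADER = 8  # 4-byte payload length + 4-byte CRC32


def append_record(region, payload):
    """Append payload to the log region with a length and CRC32 header."""
    crc = zlib.crc32(payload)
    region.extend(len(payload).to_bytes(4, "little"))
    region.extend(crc.to_bytes(4, "little"))
    region.extend(payload)


def recover(region):
    """Return the payloads of all complete records, in append order.

    Stops at the first record that is truncated or fails its checksum,
    so a write interrupted by a crash is dropped as a whole.
    """
    records, pos = [], 0
    while pos + HEADER <= len(region):
        size = int.from_bytes(region[pos:pos + 4], "little")
        crc = int.from_bytes(region[pos + 4:pos + 8], "little")
        payload = bytes(region[pos + HEADER:pos + HEADER + size])
        if len(payload) < size or zlib.crc32(payload) != crc:
            break
        records.append(payload)
        pos += HEADER + size
    return records


region = bytearray()
append_record(region, b"first")
append_record(region, b"second")
torn = region[:-3]  # simulate a crash partway through the second append
```

Recovering from `torn` keeps the first record and drops the incomplete second one, which preserves the in-order property: no later update survives without the earlier ones.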
Digest data in kernel
Layers shown: Application, POSIX API, Strata: LibFS, Strata: KernelFS, private operation log (NVM), shared area (NVM)
• Operation log
  • Private data
  • Read/writable to LibFS
• Shared area
  • Managed by KernelFS
  • Globally visible
  • Read only to LibFS
Digest data in kernel
Layers shown: Application, POSIX API, Strata: LibFS, Strata: KernelFS, private operation log (NVM), shared area (NVM)
Write goes to the private operation log; digest (background copy) moves it to the NVM shared area.
• Visibility: make private log visible to other applications
• Data layout: turn write-optimized to read-optimized format (extent tree)
• Large, batched IO
• Coalesce log
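A toy sketch of the digest step: logged writes are applied in order to a shared area and the private log is then emptied. The list-of-tuples log and the dict of byte buffers are illustrative stand-ins chosen here, not Strata’s structures; in particular, no extent tree is built.

```python
def digest(private_log, shared_area):
    """Apply logged writes, in order, to the shared area, then empty the log.

    private_log: list of (path, offset, data) tuples, oldest first.
    shared_area: dict mapping path -> bytearray, visible to all readers.
    Both structures are illustrative stand-ins, not an on-media layout.
    """
    for path, offset, data in private_log:
        buf = shared_area.setdefault(path, bytearray())
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(bytes(end - len(buf)))  # zero-fill any gap
        buf[offset:end] = data
    private_log.clear()


shared = {}
log = [("/db", 0, b"aaaa"), ("/db", 2, b"BB"), ("/db", 6, b"cc")]
digest(log, shared)
```

Because the whole log is processed in one pass, the copy into the shared area can be issued as large, batched IO instead of one small write per operation.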
Digest optimization: Log coalescing
SQLite, Mail server: crash consistent update using write ahead logging
Private operation log (Application, Strata: LibFS):
1. Create journal file
2. Write data to journal file
3. Write data to database file
4. Delete journal file
Digest eliminates unneeded work: remove temporary durable writes
NVM shared area after digest (Strata: KernelFS):
• Write data to database file
Throughput optimization: Log coalescing saves IO while digesting
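The journal example can be sketched as a filter over the log. This is a simplified illustration: the `(op, path)` tuple format and the function name `coalesce` are assumptions, it covers only the temporary-file pattern from the slide, and it assumes each path is created at most once within the log being digested.

```python
def coalesce(log):
    """Drop operations on files that are both created and deleted in the log.

    log: list of (op, path) tuples in program order, where op is one of
    "create", "write", or "delete". Assumes a path is created at most once
    per log; a real digest would have to handle more cases.
    """
    created, temporary = set(), set()
    for op, path in log:
        if op == "create":
            created.add(path)
        elif op == "delete" and path in created:
            temporary.add(path)
    return [(op, path) for op, path in log if path not in temporary]


log = [
    ("create", "journal"),
    ("write", "journal"),
    ("write", "database"),
    ("delete", "journal"),
]
digested = coalesce(log)
```

Only the database write remains to be copied to the shared area; the three journal operations were durable in the log while needed and never cost IO during digest.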
Digest and migrate data in kernel
Layers shown: Application, Strata: LibFS, Strata: KernelFS, private operation log and shared area in NVM, SSD shared area, HDD shared area
Logs are digested into NVM data.
• Low-cost capacity
  • KernelFS migrates cold data to lower layers
• Handle device IO properties
  • Migrate 1 GB blocks
  • Avoid SSD garbage collection overhead
Resembles log-structured merge (LSM) tree
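A small sketch of cold-data migration across layers. The slide only states that cold data moves down in large blocks; the least-recently-written eviction order, the tiny capacities, and the `Tier` class are assumptions made for illustration.

```python
from collections import OrderedDict


class Tier:
    """One storage layer holding fixed-size blocks, ordered coldest first."""

    def __init__(self, name, capacity_blocks, lower=None):
        self.name = name
        self.capacity = capacity_blocks
        self.lower = lower               # next (slower, larger) layer, if any
        self.blocks = OrderedDict()      # block id -> data, least recent first

    def put(self, block_id, data):
        """Store a block, migrating the coldest blocks down if over capacity."""
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        while len(self.blocks) > self.capacity and self.lower is not None:
            cold_id, cold_data = self.blocks.popitem(last=False)
            self.lower.put(cold_id, cold_data)


hdd = Tier("HDD", capacity_blocks=100)
ssd = Tier("SSD", capacity_blocks=2, lower=hdd)
nvm = Tier("NVM", capacity_blocks=2, lower=ssd)
for i in range(5):
    nvm.put(i, f"block-{i}")
```

After the loop the two newest blocks sit in the NVM tier and older ones have cascaded to SSD and HDD. Moving whole large blocks at a time keeps writes to the lower layers sequential.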
Read: hierarchical search
Layers shown: Application, Strata: LibFS, Strata: KernelFS, private OP log, NVM shared area, SSD shared area, HDD shared area
Search order:
1. Log data (private OP log)
2. NVM data (NVM shared area)
3. SSD data (SSD shared area)
4. HDD data (HDD shared area)
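The search order can be sketched as a lookup that returns the first hit, so the newest copy of a block wins. Plain dictionaries stand in for the log and the shared areas, and `read_block` is a hypothetical name.

```python
def read_block(block_id, layers):
    """Return (layer_name, data) from the first layer that holds the block.

    layers: list of (name, mapping) pairs in search order, newest data first.
    """
    for name, store in layers:
        if block_id in store:
            return name, store[block_id]
    raise KeyError(block_id)


# Block 7 exists in three layers; the copy in the log is the most recent.
layers = [
    ("log", {7: "v3"}),
    ("NVM", {7: "v2", 8: "x"}),
    ("SSD", {9: "y"}),
    ("HDD", {7: "v1", 10: "z"}),
]
```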
Shared file access
• Leases grant access rights to applications [SOSP’89]
  • Function like lock, but revocable
  • Required for files and directories
  • Exclusive writer, shared readers
• On revocation, LibFS digests leased data
  • Private data made public before losing lease
• Leases serialize concurrent updates
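A minimal sketch of exclusive-writer, shared-reader leases with revocation. The `LeaseManager` class, string holder names, and the synchronous `on_revoke` callback are assumptions; timeouts and failure handling, which a real lease protocol needs, are left out.

```python
class LeaseManager:
    """Grants read/write leases per path: one writer or many readers.

    Assumption: holders are plain names and revocation is a synchronous
    callback; a real system would also need timeouts and fault handling.
    """

    def __init__(self, on_revoke):
        self.on_revoke = on_revoke   # called as on_revoke(holder, path)
        self.writer = {}             # path -> holder with the write lease
        self.readers = {}            # path -> set of holders with read leases

    def acquire(self, holder, path, write=False):
        current = self.writer.get(path)
        if current is not None and current != holder:
            self.on_revoke(current, path)     # writer digests before losing lease
            del self.writer[path]
        if write:
            for reader in self.readers.pop(path, set()) - {holder}:
                self.on_revoke(reader, path)
            self.writer[path] = holder
        else:
            self.readers.setdefault(path, set()).add(holder)


revoked = []
leases = LeaseManager(on_revoke=lambda holder, path: revoked.append((holder, path)))
leases.acquire("A", "/f", write=True)
leases.acquire("B", "/f")             # B wants to read: A's write lease is revoked
```

In this sketch the callback is where the previous writer would digest its private log, so its updates are public before another process reads the file.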
![Page 58: Strata: A Cross Media File System - NVMW 2020 | 11th ...nvmw.ucsd.edu/nvmw2018-program/unzip/current/nvmw...Intuitive crash consistency 13 Strata: LibFS Kernel-bypass NVM Synchronous](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120423/html5/thumbnails/58.jpg)
Experimental setup
• 2x Intel Xeon E5-2640 CPU, 64 GB DRAM
• 400 GB NVMe SSD, 1 TB HDD
• Ubuntu 16.04 LTS, Linux kernel 4.8.12
• Emulated NVM
  • Use 40 GB of DRAM
  • Performance model [Y. Zhang et al. MSST 2015]
  • Throttle latency & throughput in software
![Page 59: Strata: A Cross Media File System - NVMW 2020 | 11th ...nvmw.ucsd.edu/nvmw2018-program/unzip/current/nvmw...Intuitive crash consistency 13 Strata: LibFS Kernel-bypass NVM Synchronous](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120423/html5/thumbnails/59.jpg)
Related work
• NVM file systems
  • PMFS [EuroSys 14]: in-place update file system
  • NOVA [FAST 16]: log-structured file system
  • EXT4-DAX: NVM support for EXT4
• SSD file system
  • F2FS [FAST 15]: log-structured file system
![Page 60: Strata: A Cross Media File System - NVMW 2020 | 11th ...nvmw.ucsd.edu/nvmw2018-program/unzip/current/nvmw...Intuitive crash consistency 13 Strata: LibFS Kernel-bypass NVM Synchronous](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120423/html5/thumbnails/60.jpg)
Evaluation questions
• Latency:
• Does Strata efficiently support small, random writes?
• Does asynchronous digest have an impact on latency?
• Throughput:
• Strata writes data twice (logging and digesting). Can Strata sustain high throughput?
• How well does Strata perform when managing data across storage layers?
![Page 62: Strata: A Cross Media File System - NVMW 2020 | 11th ...nvmw.ucsd.edu/nvmw2018-program/unzip/current/nvmw...Intuitive crash consistency 13 Strata: LibFS Kernel-bypass NVM Synchronous](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120423/html5/thumbnails/62.jpg)
Microbenchmark: write latency
• Strata logs to NVM
• Compare to NVM kernel file systems: PMFS, NOVA, EXT4-DAX
• Strata, NOVA
  • In-order, synchronous IO
  • Atomic write
• PMFS, EXT4-DAX
  • No atomic write
[Figure: bar chart of write latency (us, axis 0 to 10, lower is better) by IO size (128 B, 1 KB, 4 KB, 16 KB) for Strata, PMFS, NOVA, and EXT4-DAX; bar labels: 17, 21, 23, 29]
Avg.: 26% better
Tail: 43% better
![Page 65: Strata: A Cross Media File System - NVMW 2020 | 11th ...nvmw.ucsd.edu/nvmw2018-program/unzip/current/nvmw...Intuitive crash consistency 13 Strata: LibFS Kernel-bypass NVM Synchronous](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120423/html5/thumbnails/65.jpg)
Latency: LevelDB
• LevelDB (NVM)
  • Key size: 16 B
  • Value size: 1 KB
  • 300,000 objects
• Workload causes asynchronous digests
• Fast user-level logging
  • Random write
    • 25% better than PMFS
  • Random read
    • Tied with PMFS
[Figure: bar chart of latency (us, axis 0 to 30, lower is better) for Write sync., Write seq., Write rand., Overwrite, and Read rand., comparing Strata, PMFS, NOVA, and EXT4-DAX; bar labels: 35.2, 49.2, 37.7; annotations: "25% better", "Tied"]
Low-latency IO despite background digest
![Page 70: Strata: A Cross Media File System - NVMW 2020 | 11th ...nvmw.ucsd.edu/nvmw2018-program/unzip/current/nvmw...Intuitive crash consistency 13 Strata: LibFS Kernel-bypass NVM Synchronous](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120423/html5/thumbnails/70.jpg)
Throughput: Varmail
Mail server workload from Filebench
• Using only NVM
• 10000 files
• Read/Write ratio is 1:1
• Write-ahead logging
Log coalescing eliminates 86% of log entries, saving 14 GB of IO
[Figure: bar chart of throughput (op/s, axis 0K to 400K, higher is better) for Strata, PMFS, NOVA, and EXT4-DAX; annotation: "29% better"]
[Diagram: the application's write-ahead logging sequence (create journal file, write data to journal, write data to database file, delete journal file) is logged by LibFS; after log coalescing, only "write data to database file" is passed to KernelFS]
Digest eliminates unneeded work
• Removes temporary durable writes
No kernel file system has both low latency and high throughput:
• PMFS: better latency
• NOVA: better throughput
Strata achieves both low latency and high throughput
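The effect of log coalescing on the write-ahead logging sequence can be sketched as follows. This is an illustrative assumption about one way to coalesce, not Strata's digest code: the `LogEntry` tuple layout, the operation names, and the file names are all hypothetical. The sketch drops every entry that belongs to a file both created and later deleted within the same log.

```python
from typing import Dict, List, Set, Tuple

# (operation, file name, payload size in bytes); layout is illustrative only.
LogEntry = Tuple[str, str, int]


def coalesce(log: List[LogEntry]) -> List[LogEntry]:
    """Drop all entries for files created and then deleted inside this log."""
    created_at: Dict[str, int] = {}  # file name -> index of its create entry
    dropped: Set[int] = set()
    for i, (op, name, _size) in enumerate(log):
        if op == "create":
            created_at[name] = i
        elif op == "delete" and name in created_at:
            start = created_at.pop(name)
            # Everything done to the temporary file between its create and
            # its delete never needs to reach the shared area.
            dropped.update(j for j in range(start, i + 1) if log[j][1] == name)
    return [entry for i, entry in enumerate(log) if i not in dropped]


wal_sequence: List[LogEntry] = [
    ("create", "journal", 0),
    ("write", "journal", 1024),
    ("write", "mailbox.db", 1024),
    ("delete", "journal", 0),
]
digested = coalesce(wal_sequence)  # [("write", "mailbox.db", 1024)]
```

A delete of a file that existed before the log began is kept, since that deletion still has to be applied.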
![Page 75: Strata: A Cross Media File System - NVMW 2020 | 11th ...nvmw.ucsd.edu/nvmw2018-program/unzip/current/nvmw...Intuitive crash consistency 13 Strata: LibFS Kernel-bypass NVM Synchronous](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120423/html5/thumbnails/75.jpg)
Throughput: data migration
File server workload from Filebench
• Working set starts at NVM, grows to SSD, HDD
• Read/Write ratio is 1:2
User-level migration
• LRU: whole file granularity
• Treat each file system as a black-box
• NVM: NOVA, SSD: F2FS, HDD: EXT4
Block-level caching
• Linux LVM cache, formatted with F2FS
[Figure: bar chart of average throughput (ops/s, axis 0K to 10K) for Strata, user-level migration, and block-level caching; annotation: "2x faster"]
22% faster than user-level migration
Cross layer optimization: placing hot metadata in faster layers
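For readers unfamiliar with the user-level migration baseline, whole-file LRU tiering can be sketched roughly as below. The slide does not describe the baseline's policy beyond "LRU: whole file granularity", so the details here are assumptions: the `TieredLRU` class is hypothetical, every access promotes the file to the fastest tier, and the slowest tier is assumed large enough to hold any overflow.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


class TieredLRU:
    """Toy whole-file LRU placement across tiers ordered fastest to slowest."""

    def __init__(self, capacities: Dict[str, int]) -> None:
        self.order: List[str] = list(capacities)  # e.g. ["NVM", "SSD", "HDD"]
        self.capacity: Dict[str, int] = dict(capacities)
        # Per tier: file name -> size, oldest access first.
        self.files: Dict[str, "OrderedDict[str, int]"] = {
            tier: OrderedDict() for tier in self.order
        }

    def _used(self, tier: str) -> int:
        return sum(self.files[tier].values())

    def access(self, name: str, size: int) -> None:
        """Touch a file: move it to the fastest tier, then demote overflow."""
        for tier in self.order:
            self.files[tier].pop(name, None)
        self.files[self.order[0]][name] = size  # most recently used goes last
        for upper, lower in zip(self.order, self.order[1:]):
            while self._used(upper) > self.capacity[upper]:
                victim, vsize = self.files[upper].popitem(last=False)  # LRU file
                self.files[lower][victim] = vsize  # whole file moves down

    def tier_of(self, name: str) -> Optional[str]:
        for tier in self.order:
            if name in self.files[tier]:
                return tier
        return None
```

Because the unit of migration is a whole file, a single hot block keeps the entire file in the fast tier, and demoting a large file moves all of it at once.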
![Page 76: Strata: A Cross Media File System - NVMW 2020 | 11th ...nvmw.ucsd.edu/nvmw2018-program/unzip/current/nvmw...Intuitive crash consistency 13 Strata: LibFS Kernel-bypass NVM Synchronous](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120423/html5/thumbnails/76.jpg)
Conclusion
Source code is available at https://github.com/ut-osa/strata
Server applications need fast, small random IO on vast datasets with intuitive crash consistency
Strata, a cross media file system, addresses these concerns
Performance: low latency, high throughput
• Novel split of LibFS, KernelFS
• Fast user-level access
Low-cost capacity: leverage NVM, SSD & HDD
• Asynchronous digest
• Transparent data migration with large, sequential IO
Simplicity: intuitive crash consistency model
• In-order, synchronous IO
![Page 77: Strata: A Cross Media File System - NVMW 2020 | 11th ...nvmw.ucsd.edu/nvmw2018-program/unzip/current/nvmw...Intuitive crash consistency 13 Strata: LibFS Kernel-bypass NVM Synchronous](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120423/html5/thumbnails/77.jpg)
Backup
![Page 78: Strata: A Cross Media File System - NVMW 2020 | 11th ...nvmw.ucsd.edu/nvmw2018-program/unzip/current/nvmw...Intuitive crash consistency 13 Strata: LibFS Kernel-bypass NVM Synchronous](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120423/html5/thumbnails/78.jpg)
Device management overhead
SSD, HDD prefer large sequential IO
For example, SSD random write:
[Figure: SSD throughput (MB/s, axis 0 to 1000) vs. SSD utilization (0.1, 0.25, 0.5, 0.6, 0.7, 0.8, 0.9, 1); series: 64 MB, 128 MB, 256 MB, 512 MB, 1024 MB]
5-6x difference by hardware GC
Sequential writes avoid management overhead
![Page 80: Strata: A Cross Media File System - NVMW 2020 | 11th ...nvmw.ucsd.edu/nvmw2018-program/unzip/current/nvmw...Intuitive crash consistency 13 Strata: LibFS Kernel-bypass NVM Synchronous](https://reader034.vdocuments.site/reader034/viewer/2022050208/5f5ae33d350bf851aa120423/html5/thumbnails/80.jpg)
Latency: persistent RPC
• Foundation of most servers
  • Persist RPC data before sending ACK to client
• RPC over RDMA
  • 40 Gb/s InfiniBand NIC
• For small IO (1 KB)
  • 25% slower than No persist
  • 35% faster than PMFS
  • 7x faster than EXT4-DAX
[Figure: bar chart of latency (us, axis 0 to 60, lower is better) by RPC size (IO size: 1 KB, 4 KB, 64 KB) for Strata, PMFS, NOVA, EXT4-DAX, and No persist; bar label: 98; annotation: "35% better"]
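The "persist before ACK" pattern measured here can be sketched with ordinary POSIX file calls. This is a minimal illustration, not the benchmark's code: `handle_rpc` and the log file name are hypothetical, the RDMA transport is omitted, and `os.fsync` stands in for whatever durability barrier the underlying file system requires.

```python
import os
import tempfile


def handle_rpc(payload: bytes, log_fd: int) -> bytes:
    """Persist the request payload, then return the ACK to send back."""
    view = memoryview(payload)
    while view:
        written = os.write(log_fd, view)  # may write fewer bytes than asked
        view = view[written:]
    os.fsync(log_fd)  # the data must be durable before the ACK leaves
    return b"ACK"


# Usage: persist a 1 KB request to a scratch log file.
with tempfile.TemporaryDirectory() as scratch:
    log_path = os.path.join(scratch, "rpc.log")
    fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        reply = handle_rpc(b"x" * 1024, fd)
    finally:
        os.close(fd)
    persisted_size = os.path.getsize(log_path)
```

The latency of this path is dominated by the write and the durability barrier, which is why the choice of file system shows up directly in the RPC latency on the chart.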