How to Teach an Old File System Dog New Object Store Tricks

USENIX HotStorage ’18

Eunji Lee¹, Youil Han¹, Suli Yang², Andrea C. Arpaci-Dusseau², Remzi H. Arpaci-Dusseau²

¹ Chungbuk National University

Data-service Platforms
• Layering
  • Abstracts away underlying details
  • Reuses existing software
  • Agility in development, operation, and maintenance

[Figure: Ecosystem of data-service platforms]

Layering: often at odds with efficiency
• Local File System
  • Bottom layer of modern storage platforms
  • Portability, extensibility, ease of development

[Figure: three data-service stacks, each built on a local file system]
• Distributed Data Store (Dynamo, MongoDB) over Key-value Store (RocksDB, BerkeleyDB) over Local File System (Ext4, XFS, Btrfs)
• Distributed Data Store (HBase, BigTable) over Distributed File System (HDFS, GFS) over Local File System (Ext4, XFS, Btrfs)
• Distributed Data Store (Ceph) over Object Store Daemon (Ceph) over Local File System (Ext4, XFS, Btrfs)

Local File System
• Not intended to serve as an underlying storage engine
• Mismatch between the two layers
  • System-wide optimization
    • Ignores demands from individual applications
    • Applications have little control over file system internals
    • Applications suffer from degraded QoS
  • Lack of required operations
    • No atomic operations
    • No data movement or reorganization
    • No additional user-level metadata

Result: out-of-control and sub-optimal performance

Current Solutions
• Bypass the file system
  • Key-value stores, object stores, databases
  • But these relinquish the benefits of the file system
• Extend file system interfaces
  • Add new features to POSIX APIs
  • Slow and conservative evolution
  • Stable maintenance favored over specific optimizations

[Figure annotation: “Name: Ext2/3/4, Birth: 1993”]

Our Approach
• Use a file system as it is, but in a different manner!
• Design patterns for a user-level data platform that
  • Take advantage of the file system
  • Minimize the negative effects of mismatches

Contents
• Motivation
• Problem Analysis
• SwimStore
• Performance Evaluation
• Conclusion

Data-service Platform Taxonomy
• Mapping: “Object as a file”
• Packing: “Multiple objects in a file”

What is the best way to store objects atop a file system?

Case Study: Ceph
• Backend object store engines
  • FileStore: mapping
  • KStore: packing
  • BlueStore

[Figure: Ceph architecture: RGW, RBD, and CephFS over RADOS; OSDs with a backend object store (FileStore, KStore, or BlueStore); file system; storage device.]

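Mapping vs. Packing

[Figure: Both designs put a log in front of the object store. FileStore (Mapping), “Object as a File”: each object is stored in its own file (object A in file A, object B in file B, …). KStore (Packing), “Multiple Objects in a File”: objects A, B, … are packed into shared files managed by an LSM tree.]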
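The two layouts can be contrasted in a few lines of Python. This is a minimal sketch, not Ceph code: `MappingStore`, `PackingStore`, and `demo` are hypothetical names, and the packing store keeps its index in a plain in-memory dict where KStore would rely on an LSM tree.

```python
import os
import tempfile


class MappingStore:
    """'Object as a file': every object lives in its own file (hypothetical sketch)."""

    def __init__(self, root: str) -> None:
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        # One file per object: each put creates or rewrites a file, which also
        # touches file system metadata (inode, directory entry, block allocation).
        with open(os.path.join(self.root, key), "wb") as f:
            f.write(data)

    def get(self, key: str) -> bytes:
        with open(os.path.join(self.root, key), "rb") as f:
            return f.read()


class PackingStore:
    """'Multiple objects in a file': objects are appended to one pack file and found
    through an in-memory index of key -> (offset, length) (hypothetical sketch)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.index: dict[str, tuple[int, int]] = {}
        open(path, "wb").close()  # start with an empty pack file

    def put(self, key: str, data: bytes) -> None:
        with open(self.path, "r+b") as f:
            offset = f.seek(0, os.SEEK_END)  # append after the last object
            f.write(data)
        # Overwriting a key leaves its old bytes behind as garbage that a packing
        # store must later reclaim with compaction.
        self.index[key] = (offset, len(data))

    def get(self, key: str) -> bytes:
        offset, length = self.index[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)


def demo() -> tuple[bytes, bytes, tuple[int, int], int]:
    """Overwrite object A in both layouts and report what each one holds."""
    with tempfile.TemporaryDirectory() as root:
        mapping = MappingStore(root)
        mapping.put("A", b"v1")
        mapping.put("A", b"v2")        # truncates and rewrites file A
        pack_path = os.path.join(root, "pack")
        packing = PackingStore(pack_path)
        packing.put("A", b"hello")
        packing.put("B", b"xy")
        packing.put("A", b"world!")    # "hello" stays in the pack file as garbage
        return (mapping.get("A"), packing.get("A"),
                packing.index["B"], os.path.getsize(pack_path))
```

In `demo()`, the mapping store pays file system metadata costs on every object update, while the packing store only appends: its pack file grows to 13 bytes even though just 8 bytes are live, which is the garbage that compaction must later clean up.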

Experimental Setup
• Ceph 12.01
• Amazon EC2 clusters
  • Intel Xeon quad-core
  • 32 GB DRAM
  • 256 GB SSD x 2
  • Ubuntu Server 16.04
• File system: XFS (recommended in Ceph)
• Backend: FileStore, KStore
• Benchmark: Rados
• Metrics: IOPS, throughput, write traffic

Performance
• Small write (4KB)
  • KStore performs better than FileStore by 1.5x
  • Cause: write amplification by file system metadata

[Figure: Average IOPS for 4KB writes, where KStore outperforms FileStore by 1.5x. Write traffic breakdown (original, logging, compaction, file system) as a ratio to the original write: for 4KB writes, KStore 4.4x and FileStore 8.8x; for 1MB writes, KStore 3.2x and FileStore 2.1x. Original write traffic: KStore (4KB) 864 MB, KStore (1MB) 2.4 GB, FileStore (4KB) 332 MB, FileStore (1MB) 3.8 GB.]
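Write amplification, as used in the write traffic breakdown, is the ratio of total device write traffic to the traffic the application issued. A small Python sketch of that arithmetic; the function names and the breakdown in the example are hypothetical, not measurements from these experiments.

```python
def write_amplification(original: float, logging: float = 0.0,
                        compaction: float = 0.0, filesystem: float = 0.0) -> float:
    """Total device write traffic divided by the traffic the application issued.

    The four inputs mirror the categories of the breakdown (original, logging,
    compaction, file system); all must use the same unit.
    """
    if original <= 0:
        raise ValueError("original write traffic must be positive")
    return (original + logging + compaction + filesystem) / original


def device_bytes_per_write(io_size: float, amplification: float) -> float:
    """Device bytes written for one application write of io_size bytes."""
    return io_size * amplification


# Hypothetical breakdown in MB (not measured data): 100 issued, 100 logged,
# no compaction, 300 of file system metadata.
wa = write_amplification(100, logging=100, filesystem=300)  # 5.0
```

Read the other way, a ratio such as 8.8x means each 4KB (4,096-byte) application write turns into about 36,045 bytes (roughly 35 KiB) of device writes, assuming the ratio applies uniformly to every write.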

Performance
• Large write (1MB)
  • FileStore outperforms KStore by 1.6x
  • Cause: write amplification by compaction

[Figure: Average IOPS for 1MB writes, where FileStore outperforms KStore by 1.6x; write traffic breakdown (original, logging, compaction, file system) for 4KB and 1MB writes, repeated from the small-write slide.]

Performance
• Lack of atomic update support in file systems
  • Double-write penalty of logging
  • Halves bandwidth for large writes

[Figure: Write traffic breakdown (original, logging, compaction, file system) for 4KB and 1MB writes, repeated from the small-write slide.]
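The double-write penalty is simple arithmetic: when every byte is first logged and then written to its final location, the device absorbs two copies, so a bandwidth-bound workload sees half of the device bandwidth. A sketch with a hypothetical 500 MB/s device; `effective_write_bandwidth` is an illustrative name.

```python
def effective_write_bandwidth(device_mb_s: float, copies_per_byte: int) -> float:
    """Bandwidth the application sees when the device must absorb every byte
    `copies_per_byte` times (1 = write once, 2 = log first, then write in place)."""
    if copies_per_byte < 1:
        raise ValueError("each byte is written at least once")
    return device_mb_s / copies_per_byte


# Hypothetical 500 MB/s device running bandwidth-bound large writes.
without_log = effective_write_bandwidth(500, 1)  # 500.0 MB/s
with_log = effective_write_bandwidth(500, 2)     # 250.0 MB/s: the double-write penalty
```

Small writes tend to be bounded by per-operation latency rather than raw bandwidth, which is one reason the halving shows up most clearly for large writes.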

QoS
• FileStore

[Figure: FileStore on XFS, 4KB writes (left) and 1MB writes (right): throughput (MB/s) and background write traffic (MiB) over time (s).]
• Periodic flush with buffered I/O: writes go to the page cache and are periodically flushed to storage
• Transaction entanglement

QoS
• KStore

[Figure: KStore on XFS, 4KB writes (left) and 1MB writes (right): throughput (MB/s) and background write traffic (MiB) over time (s); annotations “Throughput: 40MB/s” and “Consistently poor”.]
• Frequent compaction from the user-level cache to storage
• Write amplification by merges

Summary
• Performance penalties of file systems
  • Small objects suffer seriously from write amplification caused by file system metadata.
  • Large writes are sensitive to increased write traffic from logging, common to both designs, and from frequent compaction in the packing architecture.
  • Buffered I/O and the uncontrolled flush mechanism of file systems make it challenging to support QoS.

SwimStore
• Shadowing with Immutable Metadata Store
• Provides consistently excellent performance for all object sizes while running over a file system

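SwimStore
• Strategy 1. In-file shadowing

[Figure: objects are written into a file with direct I/O; an update A’ is written to a new location in the file instead of over A, and a log records the index entry (key, offset, length).]
Problems:
• File system metadata overhead
• Double-write penalty
• Performance fluctuation
• Compaction cost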
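In-file shadowing can be sketched as out-of-place writes within a single file. This Python version is hypothetical: it uses buffered writes plus `os.fsync` instead of the direct I/O shown on the slide, a simple bump allocator for free space, and an in-memory dict where SwimStore records (key, offset, length) in a log. `ShadowingFile` and `demo` are illustrative names.

```python
import os
import tempfile


class ShadowingFile:
    """Hypothetical in-file shadowing sketch: an update never overwrites the
    current copy. It goes to unused space in the same file, and only the small
    index entry key -> (offset, length) is switched afterwards."""

    def __init__(self, path: str) -> None:
        self.f = open(path, "w+b")
        self.next_free = 0                           # bump allocator (assumption)
        self.index: dict[str, tuple[int, int]] = {}  # key -> (offset, length)
        self.obsolete: list[tuple[int, int]] = []    # old copies awaiting recycling

    def put(self, key: str, data: bytes) -> None:
        offset = self.next_free
        self.f.seek(offset)
        self.f.write(data)
        self.f.flush()
        os.fsync(self.f.fileno())  # new copy is durable before the index points at it
        self.next_free += len(data)
        old = self.index.get(key)
        self.index[key] = (offset, len(data))  # the "commit": switch from A to A'
        if old is not None:
            self.obsolete.append(old)          # A is now garbage

    def get(self, key: str) -> bytes:
        offset, length = self.index[key]
        self.f.seek(offset)
        return self.f.read(length)

    def close(self) -> None:
        self.f.close()


def demo():
    with tempfile.TemporaryDirectory() as d:
        store = ShadowingFile(os.path.join(d, "objects"))
        store.put("A", b"old!")
        store.put("B", b"bb")
        store.put("A", b"new!!")  # A' lands after B; A is left untouched
        result = (store.get("A"), store.get("B"), store.index["A"], store.obsolete)
        store.close()
        return result
```

Because A’ never overwrites A, a crash before the index switch leaves the old version readable without a separate data journal; the price is the `obsolete` list, space that must later be recycled.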

SwimStore
• Strategy 1. In-file shadowing

[Figure: FileStore logs the update A’ to a raw-device log and later writes A to its file with asynchronous buffered I/O; SwimStore writes A’ into the file through the file system with synchronous direct I/O.]
• User-facing latency increases!

SwimStore
• File system access is slower than raw device access
  • File system metadata (e.g., inode, allocation bitmap)
  • Transaction entanglement

[Figure: a synchronous direct I/O write of A’ into a file also triggers file system metadata writes (m).]

SwimStore
• Strategy 2. Metadata-Immutable Container
  • Create a container file and allocate space in advance

[Figure: synchronous direct I/O writes (A, A’) into a file and the file system metadata (m) they touch; normalized latency of a 4KB write for the per-file, single-file, raw-device, and metadata-immutable container layouts (values shown: 1, 0.4, 0.86, 0.4).]
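Strategy 2 can be sketched in Python as “allocate once, then only overwrite”. This is a hypothetical sketch: `create_container`, `write_in_container`, `BLOCK`, and `demo` are illustrative names, and it uses buffered writes plus `os.fsync` in place of the synchronous direct I/O on the slide, because `O_DIRECT` needs aligned buffers and is not supported by every file system. Space is preallocated by writing zeros, an assumption chosen because ext4 and XFS mark `fallocate`d extents as unwritten, so the first write into them would still update metadata.

```python
import os
import tempfile

BLOCK = 4096  # assumed allocation unit


def create_container(path: str, size: int) -> None:
    """Create a container file and allocate all of its space up front by
    writing zeros (fallocated extents would start out 'unwritten')."""
    if size <= 0 or size % BLOCK:
        raise ValueError("container size must be a positive multiple of BLOCK")
    zero = bytes(BLOCK)
    with open(path, "wb") as f:
        for _ in range(size // BLOCK):
            f.write(zero)
        f.flush()
        os.fsync(f.fileno())  # make the size and block allocation durable once


def write_in_container(path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes inside the preallocated region; never grow the file."""
    size = os.path.getsize(path)
    if offset < 0 or offset + len(data) > size:
        raise ValueError("write would extend the container")
    with open(path, "r+b") as f:  # r+b: no truncation, no append
        f.seek(offset)
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "container-0")
        create_container(path, 8 * BLOCK)
        write_in_container(path, BLOCK, b"A'")
        size_after = os.path.getsize(path)
        with open(path, "rb") as f:
            f.seek(BLOCK)
            head = f.read(2)
        try:
            write_in_container(path, 8 * BLOCK - 1, b"xy")
            grew = True
        except ValueError:
            grew = False
        return size_after, head, grew
```

Since every later write stays inside the preallocated range, the file's size and block allocation do not change; timestamps such as mtime still do, so this removes most, not all, per-write metadata updates.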

SwimStore
• Strategy 3. Hole-punching with Buddy-like Allocation
  • The shadowing technique requires recycling obsolete data space

SwimStore
• Strategy 3. Hole-punching with Buddy-like Allocation
  • Opportunities
    (+) The file system has an “infinite address space”
    (+) The file system provides “physical space reclamation” with punch-hole

[Figure: hole-punching]

SwimStore
• Strategy 3. Hole-punching with Buddy-like Allocation
  • Holes that are too small severely fragment the space

[Figure: logical vs. physical addresses of a new object placed among small holes]

SwimStore
• Strategy 3. Hole-punching with Buddy-like Allocation
  • Size classes 2^0, 2^1, …, 2^n
  • Hole-punching for large holes
  • GC for small holes
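A buddy-like allocator fits the rule on this slide: power-of-two size classes, hole-punching for large holes, GC for small ones. The sketch below is hypothetical in its details: the `punch_order` threshold, block-granular offsets, and the `punched` queue are assumptions, and it records punch requests instead of issuing them. On Linux, the actual call would be `fallocate` with `FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE`.

```python
class BuddyAllocator:
    """Hypothetical buddy-like allocator over a container's logical block range.

    Blocks are handed out in power-of-two sizes (2^0 .. 2^max_order blocks).
    On free, a block merges with its buddy while the buddy is also free.
    Merged holes of at least 2^punch_order blocks are queued for hole punching;
    smaller holes keep their physical space until a garbage-collection pass.
    """

    def __init__(self, max_order: int, punch_order: int) -> None:
        self.max_order = max_order
        self.punch_order = punch_order
        self.free_lists: dict[int, set[int]] = {o: set() for o in range(max_order + 1)}
        self.free_lists[max_order].add(0)         # whole range starts free
        self.punched: list[tuple[int, int]] = []  # (offset, nblocks) to punch

    def alloc(self, nblocks: int) -> tuple[int, int]:
        """Return (offset, order) of a free block of 2^order >= nblocks blocks."""
        if nblocks < 1:
            raise ValueError("nblocks must be positive")
        order = (nblocks - 1).bit_length()        # round up to a power of two
        for o in range(order, self.max_order + 1):
            if self.free_lists[o]:
                offset = min(self.free_lists[o])
                self.free_lists[o].remove(offset)
                while o > order:                  # split; upper half stays free
                    o -= 1
                    self.free_lists[o].add(offset + (1 << o))
                return offset, order
        raise MemoryError("no free block large enough")

    def free(self, offset: int, order: int) -> tuple[int, int]:
        """Free a block, coalesce with free buddies, and return the merged hole."""
        while order < self.max_order:
            buddy = offset ^ (1 << order)
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            offset = min(offset, buddy)
            order += 1
        self.free_lists[order].add(offset)
        if order >= self.punch_order:
            # A real store would punch this byte range out of the container here.
            self.punched.append((offset, 1 << order))
        return offset, order

    def gc_candidates(self) -> list[tuple[int, int]]:
        """Free blocks below the punch size, as (offset, nblocks); holes among
        them are not punched, so only GC can reclaim their physical space."""
        return sorted((off, 1 << o) for o in range(self.punch_order)
                      for off in self.free_lists[o])
```

Merging buddies before deciding is what makes the split workable: two small neighboring holes become one punchable hole, which limits the fragmentation that punching many tiny holes would cause.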

SwimStore
• Architecture
  • Container file pool
  • Metadata (indexing, attributes, etc.) in an LSM tree (LevelDB)
  • Intent log (metadata, checksum)
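The architecture lists a container file pool, metadata in an LSM tree, and an intent log holding metadata and a checksum. A minimal sketch of one plausible write path under stated assumptions: a `bytearray` stands in for a container file, a list of JSON lines for the intent log, and a dict for LevelDB; the record format, ordering, and recovery policy are assumptions, not SwimStore's documented protocol. One reading of the checksum's role is that it lets recovery detect a shadow write that never completed, so the data itself never needs to be logged.

```python
import json
import zlib


class SwimStoreSketch:
    """Hypothetical write path: object data is written once, out of place, into a
    container; the intent log holds only metadata plus a checksum of the data.
    A bytearray stands in for a container file, a list of JSON lines for the
    intent log, and a dict for the LevelDB-backed metadata index."""

    def __init__(self, container_size: int) -> None:
        self.container = bytearray(container_size)
        self.intent_log: list[str] = []
        self.index: dict[str, tuple[int, int]] = {}  # key -> (offset, length)
        self.next_free = 0

    def put(self, key: str, data: bytes) -> None:
        offset = self.next_free
        if offset + len(data) > len(self.container):
            raise ValueError("container is full")
        self.next_free += len(data)
        # 1. Shadow-write the data into free container space (never in place).
        self.container[offset:offset + len(data)] = data
        # 2. Log metadata and a checksum of the data, but not the data itself.
        record = {"key": key, "off": offset, "len": len(data), "crc": zlib.crc32(data)}
        self.intent_log.append(json.dumps(record))
        # 3. Publish the new location in the metadata index.
        self.index[key] = (offset, len(data))

    def recover(self) -> dict[str, tuple[int, int]]:
        """Rebuild the index from the intent log, skipping records whose data
        does not match its checksum (e.g., a shadow write that never finished)."""
        index: dict[str, tuple[int, int]] = {}
        for line in self.intent_log:
            r = json.loads(line)
            chunk = bytes(self.container[r["off"]:r["off"] + r["len"]])
            if zlib.crc32(chunk) == r["crc"]:
                index[r["key"]] = (r["off"], r["len"])
        return index
```

If the newest copy of an object is torn, `recover()` falls back to the previous version, which shadowing left intact in the container.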

Experimental Setup
• Ceph 12.01; C++, 12K LOC
• Amazon EC2 clusters
  • Intel Xeon quad-core
  • 32 GB DRAM
  • 256 GB SSD x 2
  • Ubuntu Server 16.04
• File system: XFS (recommended in Ceph)
• Backend: FileStore, KStore, BlueStore, SwimStore
• Benchmark: Rados
• Metrics: IOPS, throughput, write traffic

Performance Evaluation
• IOPS

[Figure: IOPS of FileStore, BlueStore, SwimStore, and KStore for I/O sizes of 4KB, 16KB, 64KB, 256KB, and 1MB, as a ratio to FileStore; bar labels 1454, 1472, 881, 243, and 67 ops/s.]
• Small writes: SwimStore is 2.5x better than FileStore, 1.6x better than BlueStore, and 1.1x better than KStore
• Large writes: SwimStore is 1.8x better than FileStore and 3.1x better than KStore

Performance Evaluation
• Write Traffic

[Figure: write traffic of FileStore, BlueStore, SwimStore, and KStore for I/O sizes of 4KB, 16KB, 64KB, 256KB, and 1MB, as a ratio to SwimStore; bar labels 1129, 3020, 5370, 7342, and 7342 MB.]

Conclusion
• Explore design patterns for building an object store atop a local file system
• SwimStore: a new backend object store
  • In-file shadowing
  • Immutable metadata container
  • Hole-punching with buddy-like allocation
  • Provides high performance with little performance variation
  • Retains all the benefits of the file system

Thank you