lcfs - storage driver for docker

Page 1: LCFS - Storage Driver for Docker

1© 2017 PORTWORX | LAYER CLONING FILESYSTEM

LCFS Storage Driver For Docker

Jobi, Feb 10, 2017

Page 2: LCFS - Storage Driver for Docker

Exec Summary

Every time you build, pull or destroy a Docker container, you are using a storage driver.

Because LCFS is designed only for containers, it is up to 2.5x faster to build an image and almost 2x faster to pull one.

We're looking forward to working with the container community to improve and expand this new tool.

− Open sourced (Apache 2.0)

− Use or contribute: https://github.com/portworx/lcfs

Page 3: LCFS - Storage Driver for Docker

What is LCFS?

Layers are first-class citizens

− Atomicity guarantees for each layer, not at the system-call level

Provides

− Efficient snapshotting/cloning mechanism

− Correctness guarantees to containers

A POSIX file system in user space (FUSE), written in C

− No kernel modifications or license issues

No configuration required

Image source: Docker Docs

Page 4: LCFS - Storage Driver for Docker

What is a Graphdriver?

Docker image and container data repository

− And corresponding configuration data

It is a POSIX file system, with some special operations like

− Create read-only layer

− Create read-write layer

− Mount a layer

− Unmount a layer

− Delete a layer

Layers are mostly ephemeral (temporary)

Docker provides ordering of operations
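
The layer operations listed above can be modelled with a small in-memory toy. This is only a sketch: the class and method names are hypothetical, and the rule that a layer with children cannot be deleted is an assumption made for the example, not Docker's or LCFS's actual interface.

```python
"""Toy in-memory model of the layer operations a graphdriver exposes.

All class and method names are hypothetical, chosen for illustration.
"""


class ToyGraphDriver:
    def __init__(self):
        self.layers = {}      # layer id -> {"parent", "read_only"}
        self.mounted = set()  # ids of currently mounted layers

    def create(self, layer_id, parent=None, read_only=True):
        """Create a read-only (image) or read-write (container) layer."""
        if layer_id in self.layers:
            raise ValueError(f"layer {layer_id} already exists")
        if parent is not None and parent not in self.layers:
            raise KeyError(f"unknown parent {parent}")
        self.layers[layer_id] = {"parent": parent, "read_only": read_only}

    def mount(self, layer_id):
        if layer_id not in self.layers:
            raise KeyError(layer_id)
        self.mounted.add(layer_id)

    def unmount(self, layer_id):
        self.mounted.discard(layer_id)

    def delete(self, layer_id):
        """Delete a layer; refuse if it is mounted or still has children
        (the child check is an assumption of this sketch)."""
        if layer_id in self.mounted:
            raise RuntimeError("layer is mounted")
        if any(l["parent"] == layer_id for l in self.layers.values()):
            raise RuntimeError("layer has children")
        del self.layers[layer_id]

    def chain(self, layer_id):
        """Return the layer and its ancestors, newest first."""
        out = []
        while layer_id is not None:
            out.append(layer_id)
            layer_id = self.layers[layer_id]["parent"]
        return out
```

Creating "base", then "app" on top of it, then a read-write "c1" on top of "app" gives the chain c1, app, base, which is the order a lookup would follow.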

Page 5: LCFS - Storage Driver for Docker

Existing solutions

Union file systems vs. Snapshot based

Merged solutions (duplicated effort)

− AUFS on top of Ext4/XFS

− Overlay on top of Ext4/XFS

− Devicemapper on top of LVM/Ext4/XFS

Traditional solutions are optimized for file/block storage, persistent data, point-in-time snapshots and clones, and all kinds of workflows (mostly data constantly being modified)

− Not very efficient for storing ephemeral and mostly read-only layers

Page 6: LCFS - Storage Driver for Docker

LCFS Architecture

[Architecture diagram. Labels: Docker Daemon; containers, each with init and read/write access (Container 1 boot device); Fedora image layers; MySQL image layers; FUSE in kernel; FUSE library; LCFS (user mode, purpose built, native); kernel; device.]

Page 7: LCFS - Storage Driver for Docker

Layers

Root Layer – docker configuration data & volumes

Base layer and read-only layers

Read-write layers (2 per container)

Data shared between layers in a tree

Layers track space allocated to data created in a layer

Each layer has an inode table

Strictly read-only once a layer is created on top

Thin provisioned and branch-on-write

Page 8: LCFS - Storage Driver for Docker

How are layers different?

Layers can be created/deleted without pausing any running containers

− Cloning read-only layers is a lot simpler

Data access time is constant for a container irrespective of the number of containers of an image

− Different from point-in-time snapshots/clones, no rollback

Layers are deleted in the reverse order of creation

− Layers are not deleted in the beginning/middle of a chain

No reference counting of blocks

− Creation/deletion time is independent of the size of the device, the size of the data set and the number of layers

− Unlimited number of layers

Page 9: LCFS - Storage Driver for Docker

Layout

Unit of allocation is 4KB

Each layer has a superblock

Superblocks are linked together to recreate the tree of layers on remount

Root layer superblock tracks blocks where free space information is tracked

Each layer tracks blocks where allocated space is tracked for the layer

Each layer tracks blocks where inodes are stored

Metadata blocks are checksummed

Page 10: LCFS - Storage Driver for Docker

Space Management

Space is tracked using Extents (start block + count of blocks)

Free Extent Map of the whole file system

Allocated Extent Map for each layer

Each layer makes reservations in large chunks and allocates from those chunks

− Less locking of the global free list

− Better contiguity within a layer (separate chunks for user data, metadata and inodes)

Minimum size for a device, Minimum free space for writes and layer creation
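
A minimal sketch of the scheme above: a global free extent map, a per-layer allocated extent map, and chunk reservations so that a layer touches the global map only occasionally. The names, the first-fit policy and the chunk size are assumptions for illustration, not LCFS's actual code.

```python
"""Sketch of extent-based space tracking with per-layer chunk reservations."""

CHUNK_BLOCKS = 256  # assumed reservation size, in 4KB blocks


class FreeExtentMap:
    """Global free space as a list of (start block, block count) extents."""

    def __init__(self, total_blocks):
        self.extents = [(0, total_blocks)]

    def allocate(self, count):
        """First-fit allocation; returns the start block of the extent."""
        for i, (start, length) in enumerate(self.extents):
            if length >= count:
                if length == count:
                    del self.extents[i]
                else:
                    self.extents[i] = (start + count, length - count)
                return start
        raise MemoryError("no free extent large enough")


class Layer:
    """Carves small allocations out of a large reserved chunk."""

    def __init__(self, free_map):
        self.free_map = free_map
        self.allocated = []   # allocated extent map for this layer
        self.chunk_start = 0
        self.chunk_left = 0

    def allocate(self, count):
        if count > self.chunk_left:
            # Go to the global map only when the reservation runs out.
            # Any unused tail of the old chunk is simply dropped here.
            size = max(count, CHUNK_BLOCKS)
            self.chunk_start = self.free_map.allocate(size)
            self.chunk_left = size
        start = self.chunk_start
        self.chunk_start += count
        self.chunk_left -= count
        self.allocated.append((start, count))
        return start
```

With two layers allocating alternately, each layer's blocks stay contiguous inside its own chunk instead of interleaving, which is the contiguity benefit the slide mentions.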

Page 11: LCFS - Storage Driver for Docker

Inodes

Each inode takes 128 bytes on disk

− Symbolic links are stored along with the inode, and the inode consumes 4KB

− Access/Creation times not tracked

− Inode number is stored within the inode

Directory blocks are reachable from directory inodes

User data of single extent files reachable directly from the inode

Emap of fragmented files reachable from inode

The same is the case for blocks tracking extended attributes

Page 12: LCFS - Storage Driver for Docker

File Handles

Formed using layer index + inode number

Layer index is unique for a layer, in the range 0 to 64K

Inode number is unique globally

− Inode numbers are shared between layers in a tree for shared files

Inode numbers are never reused

Creates duplicate copies of shared data in kernel page cache, but those are invalidated as soon as file is closed

− May work better if FUSE is smarter here
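
One way to form such a handle is to pack both values into a single 64-bit integer. The sketch below assumes a 16-bit layer index in the high bits and a 48-bit inode number in the low bits; that split and the function names are assumptions for illustration.

```python
"""Sketch: pack a layer index and an inode number into one 64-bit handle."""

LAYER_BITS = 16                   # layer index range 0..64K-1
INODE_BITS = 64 - LAYER_BITS      # remaining bits hold the inode number
INODE_MASK = (1 << INODE_BITS) - 1


def make_handle(layer_index, inode):
    if not 0 <= layer_index < (1 << LAYER_BITS):
        raise ValueError("layer index out of range")
    if not 0 <= inode <= INODE_MASK:
        raise ValueError("inode number out of range")
    return (layer_index << INODE_BITS) | inode


def split_handle(handle):
    """Return (layer index, inode number) for a handle."""
    return handle >> INODE_BITS, handle & INODE_MASK
```

Because the layer index is part of the handle, the same shared inode number seen through two layers yields two different handles, which is why the kernel page cache can hold duplicate copies of shared data.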

Page 13: LCFS - Storage Driver for Docker

Directory Tree

Global root of the file system with inode number 2

There is another directory called Layer Root Directory, created for docker for placing root directory of all layers

− This directory cannot be deleted, and many operations are not allowed on it

Atomic rename(2) is supported

No need to keep “whiteouts” for removed files as directories are COWed

Page 14: LCFS - Storage Driver for Docker

Locking

Each layer has a read-write lock, taken by all operations in shared mode

A layer is locked exclusive while deleting it

Root layer is locked in shared mode while creating/deleting layers

Root layer is locked exclusive while unmounting the file system

Page 15: LCFS - Storage Driver for Docker

File Operations

Each inode has a read-write lock, taken in shared mode by read-only operations and exclusive mode by modify operations – this lock is not taken on frozen layers

Writes are acknowledged immediately after copying data to dirty page cache of the file

fsync(2) is disabled

rmdir(2) in root layer succeeds even when directory is not empty

getxattr()/removexattr() fail immediately, without looking up the inode, when the file system does not have any extended attributes

ioctl(2) support on layer root directory for creating/mounting/unmounting/deleting layers

Page 16: LCFS - Storage Driver for Docker

Branch-On-Write (BOW - COW – Copy UP)

Inode is copied up on modification along with metadata like extended attributes and directory entries or block map

− Shared metadata may be shared in cache even after copy up

User data blocks are BOWed on modification in 4KB sizes

− Most applications truncate the whole file and rewrite file with new data
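
Branch-on-write at 4KB granularity can be sketched as follows: a write copies only the blocks it touches into the writing layer, and every other block is still read from the parent. The class and its fields are hypothetical; this is an illustration of the idea, not LCFS's implementation.

```python
"""Sketch of branch-on-write at 4KB block granularity."""

BLOCK_SIZE = 4096


class LayerFile:
    """A file in a layer; blocks not present locally come from the parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.blocks = {}  # block index -> bytes owned by this layer

    def read_block(self, index):
        layer = self
        while layer is not None:
            if index in layer.blocks:
                return layer.blocks[index]
            layer = layer.parent
        return bytes(BLOCK_SIZE)  # hole: reads as zeros

    def write(self, offset, data):
        """Copy only the touched blocks into this layer, then modify them."""
        pos = 0
        while pos < len(data):
            index, start = divmod(offset + pos, BLOCK_SIZE)
            n = min(BLOCK_SIZE - start, len(data) - pos)
            block = bytearray(self.read_block(index))  # branch from parent
            block[start:start + n] = data[pos:pos + n]
            self.blocks[index] = bytes(block)
            pos += n
```

Writing three bytes into the second block of a two-block file in a child layer leaves the child owning exactly one block, while the parent's data stays unchanged.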

Page 17: LCFS - Storage Driver for Docker

Caching

All metadata stays in memory

− Inodes, directories, emaps, extended attributes, space extent maps, symbolic links etc.

− Caching actual amount of metadata, not page-aligned metadata

Each layer has a hash table for inodes

− Lookups may traverse the parent chain

Inodes have a dirty page list

Layers track hardlinks

Mostly using sequential lists (hashing scheme for large directories and dirty page list)
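
The per-layer inode table and the parent-chain lookup can be sketched like this, using a Python dict in place of the hash table. The names are hypothetical, and inodes are plain dicts purely for illustration.

```python
"""Sketch: per-layer inode tables with lookups that walk the parent chain."""


class Layer:
    def __init__(self, parent=None):
        self.parent = parent
        self.icache = {}  # inode number -> inode private to this layer

    def lookup(self, ino):
        """Return (owning layer, inode), searching this layer first."""
        layer = self
        while layer is not None:
            if ino in layer.icache:
                return layer, layer.icache[ino]
            layer = layer.parent
        raise FileNotFoundError(ino)

    def copy_up(self, ino):
        """Instantiate a private copy of a shared inode before modifying it."""
        owner, inode = self.lookup(ino)
        if owner is not self:
            self.icache[ino] = dict(inode)
        return self.icache[ino]
```

A lookup in a fresh layer falls through to the parent until the inode is copied up, after which the layer's own table answers it.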

Page 18: LCFS - Storage Driver for Docker

Page Block Cache

File system blocks are cached in a private page cache, indexed by block numbers for shared data

− Data not shared still use kernel page cache

Each base image maintains a page cache that is shared by all layers in the tree which have the same base image

Shared by both user data and metadata

Page 19: LCFS - Storage Driver for Docker

Data Placement

Space allocated to files at the time of sync, not when written

− Size of file known at the time of sync and never changes in a read-only layer

− Most files can be placed contiguous on disk

− Temporary files and layers may not be written to disk

Small files and metadata are coalesced together as well

Zero blocks written do not consume space

Less metadata, less memory, fewer I/Os

Page 20: LCFS - Storage Driver for Docker

Layer Diff

Needed for docker commit/build operations to find paths modified in a layer compared to parent layer

Uses custom diff driver, not NaiveDiffDriver

− Except pre-existing layers after remount

Plugin invokes getxattr calls to get diff for a layer from LCFS

LCFS traverses the private icache of the layer and reports inodes instantiated in the layer

Only for generating diff from the parent layer
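
The difference between the two approaches can be illustrated with a toy comparison. A naive diff compares every path against the parent, while an icache-based diff reports only what was instantiated in the layer. The data model is an assumption of this sketch (paths mapped to contents, and inodes that carry their own path), and deletions are ignored for brevity.

```python
"""Sketch: naive tree comparison versus reporting a layer's private inodes."""


def naive_diff(parent_files, merged_files):
    """Compare every path in the merged view against the parent.

    Cost grows with the total number of files.
    """
    return sorted(path for path, data in merged_files.items()
                  if parent_files.get(path) != data)


def icache_diff(layer_icache):
    """Report only inodes instantiated in the layer.

    Cost grows with the number of changed files.
    """
    return sorted(inode["path"] for inode in layer_icache.values())
```

Both functions return the same list of paths for a layer that modified one file and added another, but the second never looks at unchanged files.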

Page 21: LCFS - Storage Driver for Docker

Crash Consistency

The Docker database of images and containers needs to stay consistent even after an abnormal shutdown of the graphdriver

Considering a checkpointing scheme over a journaling scheme

− Note fsync is disabled

Page 22: LCFS - Storage Driver for Docker

Stats

Every operation in every layer is counted, and the total, maximum and minimum time for each type of operation are tracked

This information can be presented in a tabular form on a per layer basis on demand, periodically or at the time a layer is unmounted

Stats for a container can be restarted before running an application for proper tracing

Memory usage tracked for each layer

Count of different file types in every layer is tracked

CPU profiling can be enabled with gperftools
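
Per-operation counters of this kind can be sketched as below. The class, its field names and the report layout are hypothetical; the sketch only shows the bookkeeping (count, failures, total, maximum and minimum time per request type).

```python
"""Sketch of per-operation request statistics for one layer."""

import time
from collections import defaultdict


class OpStats:
    def __init__(self):
        self.stats = defaultdict(
            lambda: {"total": 0, "failed": 0, "time": 0.0,
                     "max": 0.0, "min": float("inf")})

    def record(self, op, seconds, failed=False):
        s = self.stats[op]
        s["total"] += 1
        s["failed"] += int(failed)
        s["time"] += seconds
        s["max"] = max(s["max"], seconds)
        s["min"] = min(s["min"], seconds)

    def timed(self, op, fn, *args):
        """Run fn(*args), recording its duration and whether it raised."""
        start = time.perf_counter()
        try:
            result = fn(*args)
        except Exception:
            self.record(op, time.perf_counter() - start, failed=True)
            raise
        self.record(op, time.perf_counter() - start)
        return result

    def report(self):
        """Return a small table with one row per request type."""
        lines = ["{:<12}{:>8}{:>8}{:>12}".format(
            "Request:", "Total", "Failed", "Average")]
        for op, s in sorted(self.stats.items()):
            avg = s["time"] / s["total"]
            lines.append("{:<12}{:>8}{:>8}{:>12.6f}".format(
                op + ":", s["total"], s["failed"], avg))
        return "\n".join(lines)
```

Restarting stats for a container then amounts to replacing the object with a fresh one before the traced application starts.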

Page 23: LCFS - Storage Driver for Docker

Container stats

Running a dd command in an ubuntu/bash container: dd if=/dev/zero of=file count=10000 bs=4096

Stats for file system 0x1878680 with root 8130 index 7 at Thu Dec 8 09:26:30 2016

Layer created at Thu Dec 8 09:25:11 2016

Last acccessed at Thu Dec 8 09:26:14 2016

Request:     Total  Failed  Average     Max         Min
LOOKUP:        110      34  0s.000010u  0s.000054u  0s.000003u
GETATTR:        36       0  0s.000005u  0s.000018u  0s.000003u
READLINK:       22       0  0s.000006u  0s.000023u  0s.000004u
OPEN:           43       0  0s.000005u  0s.000013u  0s.000003u
READ:          191       0  0s.000068u  0s.000266u  0s.000004u
FLUSH:           2       0  0s.000000u  0s.000000u  0s.000000u
RELEASE:        35       0  0s.000039u  0s.000430u  0s.000003u
OPENDIR:         1       0  0s.000007u  0s.000007u  0s.000007u
RELEASEDIR:      1       0  0s.000007u  0s.000007u  0s.000007u
CREATE:          1       0  0s.000011u  0s.000011u  0s.000011u
WRITE_BUF:   10000       0  0s.000008u  0s.000120u  0s.000003u

blocks allocated 1 freed 0

2 inodes 10000 pages

0 reads 0 writes (0 inodes written)

Page 24: LCFS - Storage Driver for Docker

Container Memory stats

Running a dd command in an ubuntu/bash container: dd if=/dev/zero of=file count=10000 bs=4096

Memory Stats for file system 0x1435a00 with root 8130 index 7 at Fri Dec 9 06:15:15 2016

DIRENT Allocated 21 Freed 0

ICACHE Allocated 1 Freed 0

INODE Allocated 2 Freed 0

EXTENT Allocated 1 Freed 0

BLOCK Allocated 1 Freed 0

DATA Allocated 10000 Freed 0

DPAGEHASH Allocated 14 Freed 13

STATS Allocated 1 Freed 0

Total memory in use 41213339 bytes
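
The total can be sanity-checked against the dd parameters. The arithmetic below assumes each DATA allocation is one 4096-byte page (matching bs=4096 and the 4KB allocation unit); the report itself does not state the page size.

```python
PAGE_SIZE = 4096          # assumed size of one DATA page (dd bs=4096)
pages = 10000             # "DATA Allocated 10000" (dd count=10000)
total_in_use = 41213339   # "Total memory in use" from the report

data_bytes = pages * PAGE_SIZE          # memory holding the file's pages
overhead = total_in_use - data_bytes    # everything else tracked for the layer

print(data_bytes, overhead)  # 40960000 253339
```

Under that assumption, the file's dirty pages account for 40,960,000 of the 41,213,339 bytes, leaving roughly 253 KB for all other tracked allocations.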

Page 25: LCFS - Storage Driver for Docker

Time to Pull/Delete 30 popular images

[Bar chart: Serial Pull, Parallel Pull, Serial Delete and Parallel Delete times for Devmapper, btrfs, Overlay, Overlay2 and Lcfs; y-axis 0 to 800.]

Page 26: LCFS - Storage Driver for Docker

Time to Pull/Delete 30 popular images

[Bar chart: Serial Pull, Parallel Pull, Serial Delete and Parallel Delete times for AUFS and Lcfs; y-axis 0 to 500.]

Page 27: LCFS - Storage Driver for Docker

Time to Pull individual images

[Bar chart: pull time per image for Overlay, Overlay2 and Lcfs; y-axis 0 to 140. Images: php-zendserver, hectcastro/riak, wordpress, rails, rabbitmq, logstash, golang, sysdig/sysdig, cassandra, postgres, mariadb, redis, httpd, nginx, gliderlabs/logspout.]

Page 28: LCFS - Storage Driver for Docker

Time to Spawn fedora/apache Containers

[Chart: spawn time for Devicemapper, btrfs, Overlay, Overlay2 and Lcfs; x-axis 20 to 100 (presumably the number of containers); y-axis 0 to 180.]

Page 29: LCFS - Storage Driver for Docker

Time to Spawn fedora/apache Containers

[Chart: spawn time for AUFS and Lcfs; x-axis 20 to 100 (presumably the number of containers); y-axis 0 to 60.]

Page 30: LCFS - Storage Driver for Docker

Time to Remove fedora/apache Containers

[Chart: removal time for Devmapper, btrfs, Overlay, Overlay2 and Lcfs; x-axis 20 to 100 (presumably the number of containers); y-axis 0 to 70.]

Page 31: LCFS - Storage Driver for Docker

Time to Remove fedora/apache Containers

[Chart: removal time for AUFS and Lcfs; x-axis 20 to 100 (presumably the number of containers); y-axis 0 to 45.]

Page 32: LCFS - Storage Driver for Docker

Time to Build Docker sources

[Bar chart: Docker Build time for Devmapper, btrfs, Overlay, Overlay2 and Lcfs; y-axis 0 to 1600.]

Page 33: LCFS - Storage Driver for Docker

Time to Build Docker sources

[Bar chart: Docker Build time for AUFS and Lcfs; y-axis 0 to 700.]

Page 34: LCFS - Storage Driver for Docker

IOPS with fiograph

docker run portworx/fiograph --blocksize=1024K --filename=/root/1g.bin --ioengine=libaio --readwrite=read --size=1024M --name=test --gtod_reduce=1 --iodepth=1 --time_based --runtime=60

[Bar chart: IOPS with the libaio and splice I/O engines for Devmapper, Overlay, Overlay2 and Lcfs; y-axis 0 to 7000.]

Page 35: LCFS - Storage Driver for Docker

LCFS - A Docker V2 Graphdriver Plugin

Download & build LCFS or install RPM

− git clone [email protected]:/portworx/lcfs, cd lcfs/lcfs, make

− rpm -Uvh http://yum.portworx.com/repo/rpms/px-graph/lcfs-0.0.0-0.x86_64.rpm

Mount a device at /var/lib/docker and /lcfs

− ./lcfs <device/file> /var/lib/docker /lcfs -f

Start docker with vfs storage driver (1.13+)

− dockerd -s vfs

Install LCFS plugin

− docker plugin install portworx/lcfs

Restart docker with lcfs graphdriver

− dockerd --experimental -s portworx/lcfs

Page 36: LCFS - Storage Driver for Docker

Pending tasks

Crash consistency

Metadata paging

Replace linear search algorithms

https://github.com/portworx/lcfs/issues

QA

Page 37: LCFS - Storage Driver for Docker

Roadmap

QOS at container level (COS, IOPS, Quotas etc.)

Distributed Graphdriver (images shared)

Seamless container migration in a cluster

− Load Balancing

Backup/Restore of Graphdriver

Page 38: LCFS - Storage Driver for Docker

Q&A

More info

− https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/

− https://github.com/portworx/lcfs

Thank You!