osdc2011.ext4btrfs.talk

Quo vadis Linux File Systems Ext4 or BTRFS Udo Seidel

Upload: udo-seidel

Post on 19-May-2015

DESCRIPTION

Some technical information on EXT4 and BTRFS (Spring 2011)

TRANSCRIPT

Quo vadis Linux File Systems Ext4 or BTRFS

Udo Seidel

OSDC 2011 2

Agenda

● Introduction/motivation
● ext4 – the new member of the extfs family
  ● Facts, specs
  ● Migration
● BTRFS – the newbie ... the hope
  ● Facts, specs
  ● Migration
● Summary

Linux file systems

● More than 50 file systems shipped with the Linux kernel
  ● Local
  ● Remote
  ● Cluster
  ● ...
● A few serve as the standard choice for the root file system
  ● ext2, ext3
  ● XFS

Linux file systems – challenges

● ReiserFS being phased out
● Limitations of ext3
● Changes in recent Enterprise distributions

Linux file systems – new players

● New version of the ext family -> ext4
  ● Marked as stable
  ● Shipped with Enterprise distributions
● New approach with BTRFS
  ● Still experimental
  ● Default in some projects, e.g. MeeGo

4th extended file system

● Shipped since 2.6.19
● Stable since 2.6.28
● To overcome limits of ext3
  ● Size
  ● Performance

Ext4 - history

● Successor of ext3
● Started as a set of patches for ext3
● Later forked
  ● First called ext3dev (sometimes ext4dev)
  ● No impact on ext3 stability
  ● Fewer dependencies on ext3 code
  ● Source code easier to maintain

Ext4 - facts

● Max volume size: 1 EByte = 1024 PByte
● Max file size: 16 TByte
● Max length of file name: 255 Bytes
● Support of extended attributes
● No encryption
● No real compression support
● Partially 64-bit

Ext4 – starting from known

● Known tools
  ● mkfs
  ● fsck
  ● tune2fs
  ● e2label

Ext4 – global structure I

● Entry point -> superblock
  ● Block size
  ● Number of blocks and inodes
  ● Number of free blocks and inodes
● Disk divided into block groups
  ● Backup of superblock
  ● Block group description (inode/block bitmaps)

Ext4 – global structure II

● Similar to ext3
● Inherits some ext3 limitations
  ● Number of inodes per block group
● 2nd type of block groups => flexible
  ● Flexible placement of bitmaps
● Bigger inodes to store additional information
  ● 256 Bytes
  ● Nanosecond time stamps

Ext4 – from blocks to extents

● Common addressing scheme for modern file systems
● Extent = contiguous area of blocks
  ● Less management information needed
  ● Fewer meta data operations
  ● Less “fragmentation”
● Requires change of on-disk format

Ext4 – extent I

● 15 bit for extent size
  ● Block size of 4 KByte => 128 MByte
● 1 bit for extent initialization information

struct ext4_extent {
  __le32  ee_block;    /* first logical block extent covers */
  __le16  ee_len;      /* number of blocks covered by extent */
  __le16  ee_start_hi; /* high 16 bits of physical block */
  __le32  ee_start_lo; /* low 32 bits of physical block */
};
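The split of `ee_len` into a 15-bit size and one initialization bit can be sketched as follows. This is a minimal illustration, assuming the convention used by the Linux kernel's extent helpers, where a length value above 2^15 marks an allocated but uninitialized extent. The function names are made up for this example, and `ee_len` is assumed to be already converted from little-endian to host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest block count of an initialized extent: 2^15 = 32768 blocks. */
#define INIT_MAX_LEN (1u << 15)

/* True if the length field marks an allocated but uninitialized extent. */
static inline bool extent_is_uninitialized(uint16_t ee_len)
{
    return ee_len > INIT_MAX_LEN;
}

/* Number of blocks covered, with the initialization marker removed. */
static inline uint32_t extent_block_count(uint16_t ee_len)
{
    return ee_len <= INIT_MAX_LEN ? ee_len : (uint32_t)ee_len - INIT_MAX_LEN;
}

/* Largest byte range a single extent can map for a given block size. */
static inline uint64_t max_extent_bytes(uint32_t block_size)
{
    assert(block_size > 0);
    return (uint64_t)INIT_MAX_LEN * block_size;
}
```

With a block size of 4096 bytes, `max_extent_bytes` yields 134217728 bytes, which is the 128 MByte quoted on the slide.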

Ext4 – extent II

● 32 bit for block addresses inside a file
  ● Block size of 4 KByte => 16 TByte
● 48 (!) bit for block addresses of the file system
  ● Block size of 4 KByte => 1 EByte
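Both limits follow directly from the field widths of `struct ext4_extent`. The sketch below shows the arithmetic and how the 48-bit physical block number is assembled from `ee_start_hi` and `ee_start_lo`. The struct `extent_fields` and the function names are hypothetical; the fields are assumed to be host-endian copies of the on-disk values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-endian copy of the on-disk extent fields. */
struct extent_fields {
    uint32_t ee_block;    /* first logical block the extent covers */
    uint16_t ee_len;      /* number of blocks covered */
    uint16_t ee_start_hi; /* high 16 bits of physical block */
    uint32_t ee_start_lo; /* low 32 bits of physical block */
};

/* Combine the two halves into one 48-bit physical block number. */
static inline uint64_t extent_physical_block(const struct extent_fields *e)
{
    return ((uint64_t)e->ee_start_hi << 32) | e->ee_start_lo;
}

/* Bytes addressable with 'bits' of block number; valid while the result fits in 64 bits. */
static inline uint64_t addressable_bytes(unsigned bits, uint32_t block_size)
{
    assert(bits < 64 && block_size > 0);
    return ((uint64_t)1 << bits) * block_size;
}
```

For 4 KByte blocks, 32 bits give 2^44 bytes (16 TByte per file) and 48 bits give 2^60 bytes (1 EByte per file system).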

Ext4 – extent III

● 60 Byte for extent information
  ● 12 Byte for extent header
  ● 12 Byte for extent structure
    – Up to 4 extents per inode
    – Max. 512 MByte directly addressable (ext3: 48 KByte)
    – Different scheme for bigger files
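The numbers on this slide can be checked with a few lines of arithmetic. This is a worked example rather than kernel code; the constant and function names are invented for illustration, and the 12 direct block pointers for ext3 are the classic ext2/ext3 inode layout.

```c
#include <assert.h>
#include <stdint.h>

enum {
    EXTENT_AREA_BYTES    = 60,    /* space in the inode for extent data */
    EXTENT_HEADER_BYTES  = 12,    /* one header at the start */
    EXTENT_ENTRY_BYTES   = 12,    /* each extent structure */
    EXTENT_MAX_BLOCKS    = 32768, /* 2^15 blocks per initialized extent */
    EXT3_DIRECT_POINTERS = 12     /* direct block pointers in an ext3 inode */
};

/* Extents that fit into the inode after the header: (60 - 12) / 12. */
static inline unsigned extents_in_inode(void)
{
    return (EXTENT_AREA_BYTES - EXTENT_HEADER_BYTES) / EXTENT_ENTRY_BYTES;
}

/* Bytes reachable straight from an ext4 inode, without an extent tree. */
static inline uint64_t ext4_direct_bytes(uint32_t block_size)
{
    assert(block_size > 0);
    return (uint64_t)extents_in_inode() * EXTENT_MAX_BLOCKS * block_size;
}

/* Bytes reachable through the direct block pointers of an ext3 inode. */
static inline uint64_t ext3_direct_bytes(uint32_t block_size)
{
    assert(block_size > 0);
    return (uint64_t)EXT3_DIRECT_POINTERS * block_size;
}
```

For 4 KByte blocks this gives 4 extents, 536870912 bytes (512 MByte) for ext4 and 49152 bytes (48 KByte) for ext3.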

Ext4 – extent tree I

● For files > 512 MByte (or with more than 4 extents)
● B+ tree
● Extent structure only at leaf nodes
● New element: extent index
  ● Same header structure as a data extent
  ● Points to data block
  ● Data block contains either extent indexes or extent structures
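A sketch of the two additional on-disk elements may help here. The field names follow the ext4 extent header and extent index as found in the Linux kernel sources, but they are written with plain fixed-width types, so endianness conversion is left out and this should be read as an approximation of the layout, not as the kernel definition. The capacity helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Header at the start of every extent node (inode root or tree block). */
struct extent_header {
    uint16_t eh_magic;      /* magic number identifying an extent node */
    uint16_t eh_entries;    /* number of valid entries following */
    uint16_t eh_max;        /* capacity of this node in entries */
    uint16_t eh_depth;      /* 0 = leaf holding extents, >0 = index node */
    uint32_t eh_generation; /* generation of the tree */
};

/* Index entry: points to the block holding the next tree level. */
struct extent_idx {
    uint32_t ei_block;   /* index covers logical blocks from here on */
    uint32_t ei_leaf_lo; /* low 32 bits of the next-level block */
    uint16_t ei_leaf_hi; /* high 16 bits of the next-level block */
    uint16_t ei_unused;
};

/* Entries (indexes or extents, both 12 bytes) that fit in one tree block. */
static inline uint32_t entries_per_block(uint32_t block_size)
{
    assert(block_size > sizeof(struct extent_header));
    return (uint32_t)((block_size - sizeof(struct extent_header)) / sizeof(struct extent_idx));
}

/* Upper bound on extents reachable from the 4 root entries at a given depth. */
static inline uint64_t extents_at_depth(unsigned depth, uint32_t block_size)
{
    uint64_t capacity = 4; /* entries stored in the inode itself */
    for (unsigned level = 0; level < depth; level++)
        capacity *= entries_per_block(block_size);
    return capacity;
}
```

With 4 KByte blocks one tree block holds 340 entries, so a tree of depth 1 can reference up to 1360 extents.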

Ext4 – extent tree II

[Diagram of the extent tree; not included in the transcript]

Ext4 – from extents to blocks

● In the end: block allocation
● New features
  ● Multi-block allocation
  ● Delayed allocation
  ● Persistent allocation

Ext4 – multi-block allocation

● Ext3: only one block per allocation call
  ● 12800 calls for a 50 MByte file
● Ext4: multiple blocks per call
  ● Less overhead
  ● Contiguous physical location of data
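The figure of 12800 calls is simply the file size divided by the block size. The sketch below reproduces it and shows how the call count shrinks when one call may hand out several blocks. The `blocks_per_call` parameter is a hypothetical knob for illustration; the real ext4 allocator decides the batch size itself.

```c
#include <assert.h>
#include <stdint.h>

/* Blocks needed to store file_bytes, rounded up to whole blocks. */
static inline uint64_t blocks_needed(uint64_t file_bytes, uint32_t block_size)
{
    assert(block_size > 0);
    return (file_bytes + block_size - 1) / block_size;
}

/* Allocator calls needed if each call can hand out up to blocks_per_call blocks. */
static inline uint64_t allocator_calls(uint64_t blocks, uint32_t blocks_per_call)
{
    assert(blocks_per_call > 0);
    return (blocks + blocks_per_call - 1) / blocks_per_call;
}
```

For a 50 MByte file and 4 KByte blocks, `blocks_needed` returns 12800, so a one-block-per-call allocator needs 12800 calls, while an assumed batch of 2048 blocks per call needs 7.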

Ext4 – delayed allocation

● Ext3
  ● Instant block allocation
  ● Fragmentation due to buffers and caches
● Ext4
  ● Delayed block allocation
  ● Uses cache information for placement
  ● Risk of data loss in early versions => improved since 2.6.30

Ext4 – “clever” allocation

● Support of system call fallocate()
  ● Application reserves blocks ahead
  ● File system ensures disk space availability
● Allocation information in extent structure
  ● Remember the 16th bit of the extent length
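From an application's point of view, reserving blocks ahead looks roughly like the sketch below. It uses the portable `posix_fallocate()` wrapper instead of calling `fallocate()` directly; on Linux the C library normally maps it to the system call when the file system supports it. The helper names `reserve_space` and `file_size` are made up for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Reserve 'bytes' of disk space for 'path'; returns 0 or an errno value. */
static inline int reserve_space(const char *path, off_t bytes)
{
    assert(path != NULL);
    if (bytes <= 0)
        return EINVAL;

    int fd = open(path, O_CREAT | O_WRONLY, 0600);
    if (fd < 0)
        return errno;

    /* posix_fallocate() returns the error code directly, not via errno. */
    int err = posix_fallocate(fd, 0, bytes);
    close(fd);
    return err;
}

/* Current size of the file in bytes, or -1 if it cannot be examined. */
static inline off_t file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? st.st_size : (off_t)-1;
}
```

After a successful `reserve_space(path, n)` the file is at least `n` bytes long and later writes into that range should not fail for lack of disk space.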

Ext4 – consistent status

● New journaling => JBD2
  ● Transactions have checksums
  ● 64-bit ready
  ● Deactivation possible

Ext4 – repair

● Improved fsck
  ● No check of unused blocks
    – Information stored in block group header
    – Information secured via checksums
    – (De)activation possible at any time
● First run as slow as with ext3

Ext4 – other news

● Nanosecond-precision time stamps
  ● Unix millennium bug shifted to 2514
● More subdirectories
  ● Up to 65000
  ● More than 65000 ... with limitation

Ext4 – general migration paths

● mkfs and backup/restore
  ● Clean new file system structure
  ● Only way for file systems other than ext2/3
  ● Extended outage
● Conversion via tune2fs
  ● Partial only
  ● Only possible for the ext family
  ● Faster/easier

Ext4 – background for migration

● Two kinds of changes compared to ext3
  ● Change of on-disk format
    – Extents
    – Only enabled for new files via tune2fs
    – Additional tasks needed
  ● On-disk format not affected
    – Block allocation
    – Immediately enabled via tune2fs

Ext4 – migration via tune2fs

● Results in a mix of ext3 and ext4 structures
● Access via ext3 driver impossible
● fsck needed

Parameter     Description
extent        Extent-based block allocation
flex_bg       Flexible placement of meta data
uninit_bg     Flag uninitialized block groups for faster fsck
dir_nlink     Lifts the limit of 65000 subdirectories
extra_isize   Larger inodes, e.g. for nanosecond time stamps

Ext4 – migration hints

● fsck recommended
● /boot – is booting from ext4 possible?
● Are the rescue media enabled for ext4?

Ext4 – summary

● Good successor of ext3
● Manages larger amounts of data
● Faster
  ● Performance
  ● Recovery
● Safer
● Sufficient migration options from ext2/3

Better/b-tree file system

● Shipped since 2.6.29
● Still experimental
● Intended to replace ext3/4
● New storage management approach

BTRFS - history

● Basic idea
  ● Shown 2007
  ● Usage of B-trees for standard structures
  ● Not new ... see XFS, ReiserFS
● Chris Mason
  ● Worked on ReiserFS for SUSE
  ● Moved to Oracle -> started BTRFS development

BTRFS - facts

● Max file/volume size: 16 EByte
● Max length of file name: 255 Bytes
● Support of
  ● Extended attributes
  ● Encryption (planned)
  ● Compression
  ● Snapshot
  ● Copy-on-Write

BTRFS – global structure

● Entry point -> superblock
● More than one file system per volume
● Extents
  ● Put together in block groups
  ● No mix of data and meta data

BTRFS – internals: the trees

● Consists of B+ trees
  ● Root tree
  ● File system tree
  ● Extent allocation tree
  ● Checksum tree
  ● Log tree
  ● Chunk & device tree
  ● Data relocation tree

BTRFS – internals: structures

● 3 structures
  ● Key
    – Index of the tree structure
  ● Block header
    – ID of file system
    – Reference of insert time
    – Level position
  ● Item
    – Different types: inodes, extents, directories

BTRFS – internals: the key

● Index of the tree structure
● Size: 136 bit
  ● First 64 bit: unique object ID
  ● Next 8 bit: type/item
  ● Last 64 bit: item dependent
    – e.g. hash of directory name
    – e.g. number of elements in directory
    – e.g. object ID of upper layer directory
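The 136-bit key maps naturally onto a packed C struct of 17 bytes. The sketch below is modelled on the key layout described on this slide; the struct and function names are invented for illustration, the `packed` attribute is a GCC/Clang extension, and on-disk little-endian conversion is left out. Keys are ordered by object ID first, then type, then the item-dependent part.

```c
#include <assert.h>
#include <stdint.h>

/* Two item types, with the values listed in the talk's item table. */
enum { DEMO_INODE_ITEM = 1, DEMO_DIR_ITEM = 84 };

/* 64 + 8 + 64 bits; packed so that no padding is inserted. */
struct demo_key {
    uint64_t objectid; /* unique object ID */
    uint8_t  type;     /* item type */
    uint64_t offset;   /* item dependent, e.g. hash of a directory name */
} __attribute__((packed));

_Static_assert(sizeof(struct demo_key) == 17, "key must be 136 bits");

/* Three-way comparison: object ID, then type, then offset. */
static inline int key_compare(const struct demo_key *a, const struct demo_key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}
```

Because the object ID is compared first, all items belonging to one object sort next to each other in the tree, which is why several items per object ID can be stored and found together.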

BTRFS – internals: the item

● More than one item per object ID possible

Item          Value
INODE_ITEM    1
XATTR_ITEM    24
DIR_ITEM      84
DIR_INDEX     96
EXTENT_DATA   108
EXTENT_CSUM   128
ROOT_ITEM     132
EXTENT_ITEM   168

BTRFS – more about trees

● Highest layer
  ● Root tree
  ● Referenced in superblock
  ● Other trees => object ID in root tree
● Some trees unique
  ● Extent allocation
  ● Data relocation
● Possibly multiple trees
  ● File system

BTRFS – file system tree

● Visible part
● Contains:
  ● Inode items
  ● Reference items
● No file data
  ● See extents
  ● Exception: small files

BTRFS – extent allocation tree

● Space management
● Backward reference
  ● File system object
  ● Possibly multiple per extent
  ● Maybe move to extent data reference object

BTRFS – other trees

● Log tree
  ● Collects fsync() calls
  ● Journal of this kind of COW calls
● Checksum tree
  ● CRC32C checksums of data and meta data
● Chunk tree
  ● Manages devices: device item and chunk map item
● Device tree
  ● Counterpart of chunk tree
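The checksums kept in the checksum tree are of the CRC-32C (Castagnoli) flavour of CRC32. A minimal bitwise sketch of that checksum is shown below; it is written for clarity, not speed, and it makes no claim about how BTRFS seeds or stores the value on disk. The function name is chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the Castagnoli polynomial used by CRC-32C. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Bit-by-bit CRC-32C over 'len' bytes, standard init and final inversion. */
static inline uint32_t crc32c(const void *data, size_t len)
{
    const uint8_t *bytes = data;
    uint32_t crc = 0xFFFFFFFFu;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= bytes[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; apply the polynomial if the dropped bit was 1. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The usual check value applies: the nine ASCII bytes `123456789` give `0xE3069283`. Production code would use a table-driven or hardware-accelerated variant instead of the bit loop.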

BTRFS – device management

● Included volume manager
  ● Pool concept
  ● RAID-0 and RAID-1
    – For data and meta data
    – Not necessarily identical
● Chunk tree
  ● Abstracts from disk blocks

BTRFS – extents, chunks, blocks

[Diagram relating extents, chunks and blocks; not included in the transcript]

BTRFS – what else

● Transparent compression via zlib
● Support of POSIX ACLs
● Online grow/shrink
● Online add/removal of disks
● No fsck tool (yet)
● Management tool evolution (btrfsctl -> btrfs)

BTRFS – migration I

● Via tool btrfs-convert
● du/df not fully BTRFS-aware
● In place from ext3/4
  ● Via libext2fs
  ● BTRFS meta data location flexible
  ● Old ext3/4 organized in snapshot
  ● Roll-back possible to date/time of conversion

BTRFS – migration II

[Diagram of the in-place conversion; not included in the transcript]

BTRFS summary

● Still experimental
● Meets standard file system requirements
● Bridges existing gaps
  ● e.g. snapshots
● Easy migration from ext3/4 possible
● New approach to storage management
  ● e.g. included volume manager

Summary

● Improvement moving to ext4
● Safe switching to ext4
● In-place migration from ext3 possible
● Future is BTRFS
● In-place migration from ext3/4 to BTRFS possible

References

● http://ext4.wiki.kernel.org
● http://btrfs.wiki.kernel.org

Thank you!