lug11 zfs on linux for lustre

  • 8/13/2019 LUG11 ZFS on Linux for Lustre

    1/22

    Lawrence Livermore National Laboratory

    Brian Behlendorf

    Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA 94551

    This work performed under the auspices of the U.S. Department of Energy by
    Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344

    ZFS on Linux for Lustre

    LUG11, April 13, 2011

    LLNL-PRES-479831


    ZFS/Lustre History

    2007
        Livermore raises ldiskfs scalability/performance concerns
            Fsck, filesystem size, random IO, data integrity, etc.
            Alternate backend is needed for large Lustre filesystems
        ZFS identified as technically the best solution
            Addresses all known ldiskfs limitations
            Proven production quality implementation
            Licensing concerns can be addressed
            Must be ported to Linux
        CFS/Sun start ZFS/Lustre user space implementation


    ZFS/Lustre History

    2008
        Livermore starts porting ZFS to the kernel
            Intended to determine viability of a kernel port
            No unsurmountable technical issues discovered
            Initial performance results are encouraging
        Sun Lustre-osd development
            Shift in strategy, the Livermore kernel port is adopted
            Brian joins the Sun Lustre-osd development team
            Continued Lustre-osd development
        Licensing concerns unresolved, work continues


    ZFS/Lustre History

    2009
        Livermore ZFS development
            Focus on a production quality ZFS port
            Built quarter scale prototype ZFS/Lustre filesystem
        Sun/Oracle Lustre-osd development
            Oracle acquires Sun
            Lustre-osd development continues unchanged
            Zero-copy, grants, large dnodes, quotas, utilities, etc.
        Licensing concerns unresolved, work continues


    ZFS/Lustre History

    2010
        Livermore ZFS development
            Linux integration (utilities, udev, zevents, disk failures)
            Built a full scale ZFS/Lustre filesystem
        Oracle Lustre-osd development
            Announced ZFS/Lustre only available for Solaris
            Lustre-osd development continues on Linux
            Oracle cancels Lustre, progress is delayed
        Licensing concerns unresolved, work continues at LLNL


    ZFS/Lustre History

    2011
        Livermore ZFS development
            ZFS Posix Layer (ZPL) added
            Lustre-osd development branch publicly available
        Whamcloud Lustre-osd development
            Contracted by Livermore to complete Lustre-osd
            Most of the original Lustre-osd developers are at Whamcloud
        Licensing concerns unresolved, work continues

    Late 2011
        Livermore plans a ZFS/Lustre filesystem for Sequoia
            50 PB capacity, 512 GB/s - 1 TB/s bandwidth


    ZFS Overview

    Developed by Sun (now Oracle) on Solaris
    Combined filesystem, logical volume manager, RAID
    Copy-on-write
    Built-in data integrity
    Intelligent online scrubbing and resilvering
    Very large filesystem limits


    LLNL's Reasons for porting ZFS

    Lustre servers currently use ext4 (ldiskfs)
        Random writes bound by disk IOPS rate, not disk bandwidth
        OST size limits
        fsck time is unacceptable
        Expensive hardware required to make disks reliable
    Late 2011 requirement:
        50 PB, 512 GB/s - 1 TB/s
        At a price we can afford

    COW sequentializes random writes
        No longer bound by drive IOPS
    Single volume size limit of 16 EiB
    Zero fsck time
    Online data integrity and error handling
    Expensive RAID controllers are unnecessary
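The difference between IOPS-bound and bandwidth-bound writes can be sketched with a back-of-the-envelope model. This is an idealized illustration, not a measurement: the drive figures (150 random IOPS, 100 MB/s streaming) and the function names are assumptions chosen for the example, and real copy-on-write behavior also depends on fragmentation and metadata overhead.

```python
def in_place_write_mb_s(iops: float, io_kib: float, bandwidth_mb_s: float) -> float:
    """Random in-place writes: one seek per I/O, capped by streaming bandwidth."""
    return min(iops * io_kib / 1024.0, bandwidth_mb_s)


def cow_write_mb_s(bandwidth_mb_s: float) -> float:
    """Copy-on-write: random writes are laid out sequentially, so the
    drive's streaming bandwidth is the (idealized) limit."""
    return bandwidth_mb_s


# Hypothetical drive characteristics, for illustration only.
DRIVE_IOPS = 150.0
DRIVE_BANDWIDTH_MB_S = 100.0

for io_kib in (4, 64, 1024):
    in_place = in_place_write_mb_s(DRIVE_IOPS, io_kib, DRIVE_BANDWIDTH_MB_S)
    cow = cow_write_mb_s(DRIVE_BANDWIDTH_MB_S)
    print(f"{io_kib:5d} KiB writes: in-place {in_place:6.1f} MB/s, COW {cow:6.1f} MB/s")
```

With these assumed numbers, small random writes reach only a fraction of the streaming rate when done in place, which is the gap the COW bullet refers to.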


    Licensing Concerns

    [Diagram: Linux Kernel (GPL), SPL (GPL), ZFS (CDDL), Lustre (GPL)]

    CDDL: Common Development and Distribution License
    GPL: GNU General Public License


    Licensing Concerns

    Distributing Source
        CDDL is an open source license
        CDDL provides an explicit patent license
        ZFS changes contributed as CDDL code
        ZFS sources kept separate from all GPL code
    Distributing Binaries
        Linux kernel allows non-GPL third party modules
            Nvidia, ATI, etc.
        Linus views the kernel module interface as LGPL
        ZFS uses no GPL-only symbols
        Included headers do not make a derived work


    Licensing Concerns

    ZFS is NOT a derived work of Linux

    "It would be rather preposterous to call the Andrew FileSystem a 'derived
    work' of Linux, for example, so I think it's perfectly OK to have a AFS
    module, for example."
        Linus Torvalds

    "Our view is that just using structure definitions, typedefs, enumeration
    constants, macros with simple bodies, etc., is NOT enough to make a
    derivative work. It would take a substantial amount of code (coming from
    inline functions or macros with substantial bodies) to do that."
        Richard Stallman (The FSF's view)


    Solaris Porting Layer: Linux/ZFS Glue

    [Diagram: Linux Kernel, SPL, ZFS]


    Ported by LLNL

    [Diagram: ZFS/Lustre software stack]
    User: ZFS CLI, libzfs
    Kernel:
        Lustre: OST, OFD, MDT, MDD, ZFS OSD
        Interface Layer: ZPL, ZVOL, /dev/zfs
        Transactional Object Layer: ZIL, ZAP, DMU, DSL, Traversal
        Pooled Storage Layer: ARC, ZIO, VDEV, Configuration


    CFS + Sun + Oracle + Whamcloud

    [Diagram: same component diagram as the preceding slide.
    User: ZFS CLI, libzfs. Kernel: Lustre (OST, OFD, MDT, MDD, ZFS OSD),
    Interface Layer (ZPL, ZVOL, /dev/zfs), Transactional Object Layer
    (ZIL, ZAP, DMU, DSL, Traversal), Pooled Storage Layer (ARC, ZIO, VDEV,
    Configuration)]


    ZFS/Lustre Prototype (Zeno)


    OSS SSU (Zeno)

    896 TB / SSU, 25.6 GB/s
    70 2TB Disks / Host
    7 8+2 RaidZ2 groups, 1 112 TB OST / Host

    Component    Bandwidth
    QDR IB       25.6 GB/s
    Host SAS     60 GB/s
    JBOD SAS     60 GB/s
    Disk         56.0 GB/s
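The per-host figures on this slide are consistent with simple RAIDZ2 arithmetic, sketched below. The inputs (7 groups of 8+2 RaidZ2, 2 TB disks, 112 TB OST per host, 896 TB per SSU) are read from the slide; the helper name `raidz2_layout` is hypothetical, and the number of hosts per SSU is inferred from the capacities rather than stated. The sketch ignores filesystem metadata overhead and the difference between decimal and binary units.

```python
def raidz2_layout(groups: int, data_disks: int, disk_tb: int, parity_disks: int = 2):
    """Return (disks per host, usable TB per host) for identical RAIDZ2 groups."""
    disks = groups * (data_disks + parity_disks)
    usable_tb = groups * data_disks * disk_tb
    return disks, usable_tb


# Figures from the slide: 7 groups of 8+2 RAIDZ2 built from 2 TB disks.
disks_per_host, ost_tb = raidz2_layout(groups=7, data_disks=8, disk_tb=2)

SSU_TB = 896                      # SSU capacity from the slide
hosts_per_ssu = SSU_TB // ost_tb  # inferred, not stated on the slide

print(disks_per_host, ost_tb, hosts_per_ssu)
```

Under these assumptions the layout accounts for all 70 disks per host and for the 112 TB OST size.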


    OSS SSU (Zeno3)

    960 TB / SSU, 38.4 GB/s
    50 2TB Disks / Host
    5 8+2 RaidZ2 groups, 1 80TB OST / Host

    Component    Bandwidth
    QDR IB       38.4 GB/s
    Host SAS     38.4 GB/s
    JBOD SAS     60 GB/s
    Disk         60.0 GB/s


    ZFS Performance Comparison

    [Bar chart: Write and Read throughput in GiB/s (axis 0 to 30). Series:
    Current SSU Lustre IOR, Current SSU Hardware Limit, ZFS SSU Lustre IOR,
    ZFS SSU ZPIOS, ZFS SSU Hardware Limit]

    Same number of drives
    SATA vs SAS disk
    RAIDZ2 vs RAID6
    Write performance is limited by the ZFS port
    Read performance is limited by Lustre/CPU
    ZFS is unoptimized, this can all be improved!


    Single Node Write Performance

    Write performance is consistent with Lustre
    Lustre workload
        Random 1MiB I/Os
        128 thrs to


    Single Node Read Performance

    Read performance is significantly better than Lustre
    Lustre workload
        Random 1MiB I/Os
        128 thrs to


    More Information

    ZFS & SPL
        http://zfsonlinux.org
        Mailing Lists
            zfs-announce@zfsonlinux.org
            zfs-discuss@zfsonlinux.org
            zfs-devel@zfsonlinux.org
        Download software
        Documentation
    Lustre support for ZFS
        http://zfsonlinux.org/lustre.html
    Licenses
        CDDL: http://hub.opensolaris.org/bin/view/Main/licensing_faq
        GPLv2: http://www.gnu.org/licenses/gpl-2.0.html
        Linus: http://linuxmafia.com/faq/Kernel/proprietary-kernel-modules.html
        RMS: http://lkml.indiana.edu/hypermail/linux/kernel/0301.1/0362.html