DConf 2016 - Using D for Primary Storage
TRANSCRIPT
Using D for Development of Large Scale Primary Storage
Liran Zvibel, Weka.IO, @liranzvibel #DConf2016
• Weka.IO introduction
• Our progress since we picked off
• Examples where D really shines
• Our challenges
• Improvement suggestions
• Q&A
Agenda
D for Primary Storage #DConf2016
• Enabling clouds and enterprises with a single storage solution for resilience, performance, scalability and cost efficiency
• HQ in San Jose, CA; R&D in Tel Aviv, Israel
• 30 engineers, vast storage experience
• VC-backed company; Series B led by Walden International; Series A led by Norwest Venture Partners
• Product used in production by early adopters (still in stealth)
• Over 200k LOC of our own D code, about 35 packages
About Weka.IO
• Extremely reliable, “always on”, stateful.
• High-performance data path, measured in µsecs
• Complicated “control path” / “management code”
• Distributed nature due to HA requirements
• Low-level interaction with HW devices
• Some kernel-level code, some assembly
• Language has to be efficient to program, and fit for large projects
Storage system requirements
• Software-only solution
• User-space processes
• 100% CPU, polling based on networking and storage
• Asynchronous programming model, using Fibers and a Reactor
• Memory efficient, zero-copy everything, very low latency
• GC free, lock-free efficient data structures
• Proprietary networking stack from Ethernet to RPC
The Weka.IO framework
• No more show-stoppers, still a long way to go
• Indeed productivity is very high, very good code-to-features ratio
• We are able to “rapid prototype” features and then iron them out
• All major runtime issues resolved
• We get great performance
• Choosing D was a good move, and proved to be a huge success
Current state for Weka
• Switched to LDC (thanks David Nadlinger and the LDC team!)
• Compilation is now by package
• Better RAM “management”
• Leveraging parallelism to speed build time
• Recent front-ends “feel” much more stable
• LDC lets us do optimized builds with asserts enabled, which is a good thing for QA.
Compilation progress
• Got over 100% performance boost over DMD
• When compiling as a single package with optimizations
• Fiber switching based on registers and not pthreads
• No GC allocation when throwing and handling exceptions (Thanks Mithun!)
• Integrate libunwind with DWARF support for stack traces (no --disable-fp-elim)
• Support debug (-g) with backend optimizations
• Template instantiation bug: still unresolved upstream
• @(ldc.attributes.section("SECTIONNAME"))
• -static flag to ldc, allowing easy compile and shipment of utilities
LDC status
• We now check how much we allocated (using hacks, an API would be nice) from the Reactor, and decide to collect if we allocated more than 20MB
• Collection actually happens very infrequently (a few times an hour)
• Collection time is de-synchronized across the cluster
• Collection time still significant: about 10ms
• Main drawback: allocation MAY take an ‘infinite’ amount of time if the kernel is stressed on memory.
GC allocation and latency
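The collection policy on this slide could look roughly like the sketch below: automatic collections are switched off and the event loop decides when to collect. The slide says the allocation count was obtained "using hacks"; this version instead relies on `core.memory.GC.stats`, which newer druntime releases provide. The `GcPacer` type and its method names are invented for illustration, and the sketch does nothing about the allocation-stall drawback listed on the slide.

```d
import core.memory : GC;

/// Threshold quoted on the slide: collect once ~20 MB has been allocated.
enum size_t collectThreshold = 20 * 1024 * 1024;

/// Hypothetical helper: the reactor would call maybeCollect() at a quiet point.
struct GcPacer
{
    private size_t baseline; // GC heap usage right after the last collection

    /// Stop automatic collections and remember the current heap usage.
    void start()
    {
        GC.disable();
        baseline = GC.stats().usedSize;
    }

    /// Collect only if usage grew past the threshold; returns true if it collected.
    bool maybeCollect()
    {
        immutable used = GC.stats().usedSize;
        if (used < baseline + collectThreshold)
            return false;
        GC.collect();                    // explicit collection at a time we chose
        baseline = GC.stats().usedSize;  // measure growth from here on
        return true;
    }
}
```

Calling `maybeCollect()` from a different offset on each node would be one way to keep collections de-synchronized across a cluster.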
• Exception handling code was modified to never rely on GC allocation
• Reactor and Fibers code (+ our TraceInfo class) modified to keep the trace in a fiber-local state.
✴ Problem: potentially throwing from scope(exit/success/failure)
• Throwables are a class, so allocating them comes from the GC; they must be statically allocated:
    static __gshared auto ex = new Exception(":o(");
Exceptions and GC
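A minimal sketch of the statically allocated exception idea, assuming stock druntime rather than Weka's modified runtime. The names `poolExhausted` and `failFast` are made up for illustration. The exception object is built at compile time and stored in static data, so the `throw` site itself passes the `@nogc` check. Whether the runtime allocates while capturing a stack trace is a separate matter, which is presumably why the slide mentions a custom TraceInfo.

```d
/// Built once at compile time and stored in static data; no GC allocation at the throw site.
__gshared Exception poolExhausted = new Exception("buffer pool exhausted");

/// Throwing an existing object is allowed in @nogc code.
void failFast() @nogc
{
    throw poolExhausted;
}
```

One design consequence: every throw shares the same object, so its message cannot carry per-call details without extra care.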
• _gen keeps incrementing when buffers are allocated from pools
• Pointers remember their generations, and validate accurate access
• Helps debugging stale pointers
• Problem with implicit casts of null; alias this is not strong enough. Maybe some syntax could help
NetworkBufferPtr
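    @nogc @property inout(NetworkBuffer)* get() inout nothrow pure {
        // Upper bits of _addr hold the pointer, the low MAGIC_BITS hold the generation
        auto ptr = cast(inout(NetworkBuffer)*)(_addr >> MAGIC_BITS);
        // A stale pointer shows up as a generation mismatch
        assert(ptr is null || (_addr & MAGIC_MASK) == ptr._gen);
        return ptr;
    }
    alias get this;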
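To make the generation check concrete, here is a self-contained sketch of a generation-tagged pointer. It assumes a 64-bit target whose user-space addresses leave the top 16 bits clear. `Buffer`, `BufferPtr` and `isStale` are hypothetical names, and the real NetworkBufferPtr may pack its bits differently.

```d
enum MAGIC_BITS = 16;                            // assumed number of generation bits
enum ulong MAGIC_MASK = (1UL << MAGIC_BITS) - 1;

/// Stand-in for a pooled buffer; _gen is bumped whenever the buffer is handed out again.
struct Buffer
{
    ushort _gen;
    ubyte[64] payload;
}

/// Packs the address and the generation observed at creation time into one word.
struct BufferPtr
{
    private ulong _addr;

    this(Buffer* p) @nogc nothrow
    {
        _addr = p is null ? 0 : ((cast(ulong) p) << MAGIC_BITS) | p._gen;
    }

    /// True when the buffer has been recycled since this pointer was created.
    bool isStale() const @nogc nothrow
    {
        auto p = cast(const(Buffer)*)(_addr >> MAGIC_BITS);
        return p !is null && (_addr & MAGIC_MASK) != p._gen;
    }
}
```

A pointer taken before the pool recycles a buffer then reports itself as stale instead of silently reading the new owner's data.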
    switch (pkt.header.type) {
        // The foreach runs over a compile-time list, so it unrolls into one case per enum member
        foreach (name; __traits(allMembers, PacketType)) {
            case __traits(getMember, PacketType, name):
                return __traits(getMember, this, "handle" ~ name)(pkt);
        }
    }
Handling all enum values
• A similar solution verifies that all fields in a C struct have the same offset; naturally, the C part ends up being much more complex.
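The enum dispatch from the "Handling all enum values" slide can be tried out in a small, complete form. The `PacketType` members, `Packet`, `Dispatcher` and the handler bodies below are placeholders, not Weka's types. What matters is that adding an enum member without a matching `handleXxx` method fails to compile.

```d
enum PacketType : ubyte { Ping, Data, Ack }

struct Packet { PacketType type; }

struct Dispatcher
{
    // One handler per enum member, named "handle" ~ member name.
    string handlePing(ref Packet pkt) { return "ping"; }
    string handleData(ref Packet pkt) { return "data"; }
    string handleAck(ref Packet pkt)  { return "ack"; }

    string dispatch(ref Packet pkt)
    {
        switch (pkt.type)
        {
            // Unrolled at compile time: one case label per PacketType member.
            foreach (name; __traits(allMembers, PacketType))
            {
                case __traits(getMember, PacketType, name):
                    return __traits(getMember, this, "handle" ~ name)(pkt);
            }
            default:
                assert(0, "value outside PacketType");
        }
    }
}
```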
    @property bool flag(string NAME)() {
        return (_flags & __traits(getMember, NBFlags, NAME)) != 0;
    }
    @property void flag(string NAME)(bool val) {
        if (val) {
            _flags |= __traits(getMember, NBFlags, NAME);
        } else {
            _flags &= ~__traits(getMember, NBFlags, NAME);
        }
    }
    buffer.flag!"TX_ACK" = true;
Flag setting/testing
    static if (JoinedKV.sizeof <= CACHE_LINE_SIZE) {
        alias KV = JoinedKV;
        enum separateKV = false;
    } else {
        struct KV {
            K key;
            /* values will be stored separately for better cache behavior */
        }
        V[NumEntries] values;
        enum separateKV = true;
    }
Efficient packing
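A sketch of how the `static if` layout choice above might sit inside a container. The `Table` wrapper, the `valueAt` accessor and the 64-byte cache line are assumptions added for illustration; only the `static if` branch structure comes from the slide.

```d
enum CACHE_LINE_SIZE = 64; // assumed cache line size

struct Table(K, V, size_t NumEntries)
{
    private static struct JoinedKV { K key; V value; }

    static if (JoinedKV.sizeof <= CACHE_LINE_SIZE)
    {
        alias KV = JoinedKV;          // small pairs: keep key and value together
        enum separateKV = false;
    }
    else
    {
        static struct KV { K key; }   // large values: keys stay densely packed
        V[NumEntries] values;         // values live in a parallel array
        enum separateKV = true;
    }

    KV[NumEntries] entries;

    /// Callers use one accessor regardless of which layout was chosen.
    ref V valueAt(size_t i)
    {
        static if (separateKV)
            return values[i];
        else
            return entries[i].value;
    }
}
```

The layout is fixed per instantiation at compile time, so the accessor carries no runtime branch.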
• Project is broken into ~35 packages.
• Some logical packages are compiled as several smaller packages
• With the current 2.068.2 compiler, several packages take about 90 seconds each to compile, leading to a total compile time of 4-5 minutes.
• The newer 2.070.2 + PGO compiler reduces time by about 35% (Thanks Johan!). Still getting 3-4 minutes per complete compile.
Compilation time
• Introduce more parallelism into the build process
• Support incremental compiles.
• Now when a dependency is changed, complete packages have to be completely rebuilt. In many cases, most of the work is redundant
• When a dependency's IMPLEMENTATION is changed, still everything gets recompiled
• Support (centralized) caching for build results.
• Don't let humans “context switch” while waiting for the compiler!
Compile time improvement suggestions
• Total symbols: 99649, over 1k: 9639, over 500k: 102, over 1M: 62
• Longest symbol was 5M!
• Makes working with standard tools much harder (some nm tools crash on the exe).
• A simple hashing solution was implemented in our special compiler
• Demangling has now stopped working for us; we only get module/func name
• More time is spent on hashing than what is saved on linkage. We may need a “native” solution.
Long Symbols
    private struct MapResult(alias fun, Range, ARGS...) {
        ARGS _args;
        alias R = Unqual!Range;
        R _input;
        this(R input, ARGS args) {
            _input = input;
            _args = args;
        }
        @property auto ref front() {
            return fun(_input.front, _args);
        }
        …

    auto under_value_gc(R)(R r, int value) {
        return r.filter!(x => x < value);
    }
    auto under_value_nogc(R)(R r, int value) {
        return r.xfilter!((x, y) => x < y)(value);
    }
    auto multiple_by_gc(R)(R r, int value) {
        return r.map!(x => x * value);
    }
    auto multiple_by_nogc(R)(R r, int value) {
        return r.xmap!((x, y) => x * y)(value);
    }
Phobos Algs Forcing GC
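The slide only shows part of `MapResult`, so the following is a guess at what a complete `xmap` could look like, not Weka's implementation. The idea is that extra arguments are stored in the range itself and passed to the lambda explicitly. The lambda therefore captures nothing and no GC closure is needed. `XMapResult` and `multiplyBy` are illustrative names.

```d
import std.range.primitives : isInputRange, empty, front, popFront;
import std.traits : Unqual;

/// Lazy map that stores its extra arguments alongside the source range.
struct XMapResult(alias fun, Range, Args...)
{
    alias R = Unqual!Range;
    R _input;
    Args _args;

    this(R input, Args args)
    {
        _input = input;
        _args = args;
    }

    @property bool empty() { return _input.empty; }
    void popFront() { _input.popFront(); }
    @property auto ref front() { return fun(_input.front, _args); }
}

template xmap(alias fun)
{
    auto xmap(Range, Args...)(Range r, Args args)
        if (isInputRange!(Unqual!Range))
    {
        return XMapResult!(fun, Range, Args)(r, args);
    }
}

/// The lambda gets `value` as a parameter instead of capturing it, so this is @nogc.
auto multiplyBy(R)(R r, int value) @nogc
{
    return r.xmap!((x, y) => x * y)(value);
}
```

Marking `multiplyBy` as `@nogc` makes the compiler verify that no closure is allocated along the way.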
• Make it explicit
• Allow it to manipulate types, to replace complex template recursion
static foreach
    template hasUDAttributeOfType(T, alias X) {
        alias attrs = TypeTuple!(__traits(getAttributes, X));
        template helper(int i) {
            static if (i >= attrs.length) {
                enum helper = false;
            } else static if (is(attrs[i] == T) || is(typeof(attrs[i]) == T)) {
                static assert(!helper!(i + 1), "More than one matching attribute: " ~ attrs.stringof);
                enum helper = true;
            } else {
                enum helper = helper!(i + 1);
            }
        }
        enum hasUDAttributeOfType = helper!0;
    }
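D later gained a `static foreach` statement (DIP 1010). As a sketch of how the recursive helper above might be flattened with it: the version below counts matching attributes in a compile-time-evaluated lambda. It uses `AliasSeq` from `std.meta` in place of `TypeTuple`, and it reports duplicates through a CTFE `assert` rather than a `static assert`, which is a choice made here and not something from the talk.

```d
import std.meta : AliasSeq;

template hasUDAttributeOfType(T, alias X)
{
    alias attrs = AliasSeq!(__traits(getAttributes, X));

    // The lambda is evaluated at compile time because it initializes an enum.
    enum hasUDAttributeOfType = () {
        size_t matches = 0;
        static foreach (attr; attrs)
        {
            // Matches both a bare type (@Marker) and a value of that type (@(Marker())).
            static if (is(attr == T) || is(typeof(attr) == T))
                ++matches;
        }
        assert(matches <= 1, "More than one matching attribute: " ~ attrs.stringof);
        return matches == 1;
    }();
}
```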
• Specify some @UDAs as transitive, so the compiler can help “prove” correctness.
• For example:
• Define a function as @atomic if it does not context switch
• A function may be @atomic if it only calls @atomic functions
• Next step would be to prove that no context switch happens
• Can be implemented in “runtime” if there is a __traits that returns all the functions that a function may call.
• Next phase would be to be able to ‘prove’ things on the functions, so @nogc, nothrow, pure etc. can use the same mechanism.
Transitive @UDA
• A __traits that returns the max stack size of a function
• Add a predicate that tells whether there is an existing exception currently handled
• Donate Weka's @nogc ‘standard library’ to Phobos:
• Our Fiber additions into Phobos (throwInFiber, TraceInfo support, etc.) [other lib funs as well]
• Containers, algorithms, lockless data structures, etc…
Other Suggestions
Table 1: per-package compile time (seconds) by compiler version

2.068    2.070    2.070 + PGO
88.4     58.1     54.7
84.3     57.1     54.7
75.8     51.0     49.7
67.5     40.7     37.3
59.5     36.4     43.1
56.6     38.3     35.3
51.9     35.9     32.4
50.4     30.9     34.6
44.9     25.2     26.8
42.7     31.0     27.0
35.7     31.3     30.2
35.1     24.5     22.9
31.4     21.1     17.7
30.5     20.5     19.9
25.8     20.1     16.3
19.0     13.5     15.4
18.3     12.0     11.3
14.3     10.6     10.4
13.7     14.0     13.6
9.4      6.8      6.3
• 2.070.2 is a major improvement in compile time over 2.068.2
• Still, even with the 30-40% improvement, engineers have to wait long minutes to get the whole exe to build.
• We're breaking large packages into smaller ones, when possible