Design Rationales in the JRockit JVM

Marcus Lagergren, Senior Software Architect, Klarna

Upload: javadayua

Post on 16-Apr-2017



Agenda
• In the beginning…
• What did we accomplish / Internals
  – Code Generation
  – Memory Management
  – Threads & Synchronization
• Externals
  – The Java Mission Control suite
  – A parenthesis on JRockit VE
• Q&A

About the speaker

@lagergren

• M.Sc. from KTH, Stockholm
  – Narrowly escaped doing a PhD on bit security in cryptographic systems
• Runtime, OS and compiler engineer since 1999, with some startup breaks
• One of the original creators of the JRockit JVM

In the beginning

Appeal Virtual Machines
• Appeal Software Solutions
  – Consulting, almost exclusively Java by 1997
• Still the pre-appserver era

Appeal Virtual Machines
• We saw that Java would be great on the server side
  – Shorter development cycles: money in the bank
• Buffer overrun protection
• Automatic memory management
• Write once run everywhere

Appeal Virtual Machines
• Tremendous scalability problems
• Sun Classic VM was all-encompassing

JavaOne 1997
• Sun Microsystems presents the HotSpot virtual machine
  – “WOW! This is the way to do it! Adaptive runtimes!”

JavaOne 1998
• Sun Microsystems presents the HotSpot virtual machine again
  – “WTF! This is slide-by-slide the exact same presentation as last year!?!”
  – We can’t wait any longer. Let’s build our own VM. How hard can it be?

Creating our own JVM: JRockit

Productize a narrower domain?
• Server-side usage only. Headless.
  – We need to help the early appserver vendors get performance and scalability
• No interpreter
  – “startup time doesn’t matter on the server anyway”
• Green threads or n x m threads.
  – Explicit parallelism was all-pervasive.

Productize a narrower domain?
• Incremental GC
  – We thought something like [Seligman, Grarup] would suffice.
• Support ourselves on consulting only.
  – Nope, needed venture capital

The Java License
• You can’t call yourself “Java” without a Java license
• You need to pass the TCK test suite
  – Not available without license
• To get a Java License you need a “value add”

The Java License
• What’s a “value add”?
  – Superior performance!
  – What? You didn’t like that?
  – OK… Let’s see… Err.. “manageability”

The Java License
• Java License was granted 2001
  – Helped us partner up with BEA Systems and Intel
  – BEA acquired us in 2002
  – Oracle acquired BEA in 2008
  – Oracle acquired Sun in 2010

What did we accomplish?

The real value adds turned out to be:
• Multitiered support for paying customers
  – Part of the WLS stack
• Monitoring and Serviceability
  – JRockit Mission Control (now Java Mission Control)
  – Record and introspect production systems with zero overhead.

The real value adds turned out to be:
• Pioneered “Soft real time” GC
  – Deterministic GC
  – Low latency GC

The real value adds turned out to be:
• Virtualization
  – JRockit Virtual Edition: an operating system for Java
  – Shorter paths between Java and hardware
  – Hypervisor required
  – JRockit VE on virtual hardware outperformed physical Linux!

The real value adds turned out to be:
• The benchmark wars
  – Constantly keeping it going with Sun and IBM, driving Java server-side performance

The real value adds turned out to be:
• JRockit became the default JVM in the Oracle stack in 2008
• ExaLogic

…and then

INTERNALS

@SimmsUpNorth

Code Generation

Code generation: No Interpreter
• Keep test matrix small
• Keep operational complexity down
• Targeting server side apps
  – Warmup a small issue
• “Code caching / AOT can be done later”

Code generation: One JIT
• Keep test matrix small
• Keep operational complexity down
• Run it in different modes, with maximum code reuse
• Same IR throughout
  – With gradual augmentations

But…
• Startup became a problem
  – We removed optimizers and added as a “spine” to the normal JIT pipeline.
• Lazy code generation through trampolines
• Same mechanism for code invalidation
• Bookkeeping to identify a program point down to any individual machine instruction
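
The trampoline bullets above can be illustrated with a toy model. This is only a sketch under stated assumptions, not JRockit's mechanism: `LazyMethod`, `call` and `invalidate` are hypothetical names, a lambda stands in for bytecode, and "code generation" is reduced to handing back that lambda. What it shows is the control flow: callers always go through a patchable target, the first call lands in the trampoline, and invalidation simply points the target back at the trampoline.

```java
import java.util.function.IntUnaryOperator;

/** Toy model of a method whose call target starts out as a trampoline. */
final class LazyMethod {
    private final IntUnaryOperator source;  // stands in for the method's bytecode
    private IntUnaryOperator target;        // what callers actually invoke
    private int compilations = 0;

    LazyMethod(IntUnaryOperator source) {
        this.source = source;
        this.target = this::trampoline;     // nothing has been generated yet
    }

    /** The first call lands here: generate code, back-patch, then run it. */
    private int trampoline(int arg) {
        compilations++;
        IntUnaryOperator compiled = source; // hypothetical "code generation" step
        target = compiled;                  // back-patch the call target
        return compiled.applyAsInt(arg);
    }

    int call(int arg) {
        return target.applyAsInt(arg);
    }

    /** Invalidation reuses the mechanism: point back at the trampoline. */
    void invalidate() {
        target = this::trampoline;
    }

    int compilations() {
        return compilations;
    }
}
```

A method that is never called is never compiled in this model, and a call after `invalidate()` triggers regeneration without the caller noticing.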

Code Generation
• Same “spine” used in all tiers of code generation

Optimizations
• In and out of SSA
• Applied to all levels of IR
  – Loop peeling, value numbering, String append explosion, type check removal, sign extension elimination, copy propagation, bounds check removal, virtual to fixed calls, inlining, if short circuiting, straightening, strength reduction, constant propagation, dead code removal, out of loop hoisting, explode objects and array copies, boxing & unboxing removal, local escape analysis, ASM peephole optimization, redundant memory access removal, etc etc etc…
• Support for regionalized IRs
• Graph Fusion Register Allocator

Optimization Targets
• Thread sampling
• Partly taken over by safepoint based approach in R28
• Some code instrumentation, for example for inlining path
  – Not in the general case, e.g. invocation counters

Optimization Targets
• Hardware sampling where available
  – Only good thing about IA64?
  – Could also match e.g. L2 misses to program points
• Bugging the processor manufacturers since 2002 about userland PC sample buffer.
• JRockit VE: x1000 more samples
  – Proven significantly shorter warmup

HotSpot style?
• On-stack replacement?
• Deoptimization?
• Never much cared for any of it ;-)

HotSpot Style OSR and Deoptimization
• We’ve never found a practical use case.
  – So we can’t ever swap out the main function with the microbenchmark loop. Who cares?
• An assumption is invalidated
  – Either patch code directly or use a guard when generating it in the first place
• A large assumption
  – Write a trap in the code and schedule lazy regeneration of entire method
• Not strictly true for dynamic languages


“Garbage collecting code”
• Code kept in binary tree of code blocks ~64M
  – More if large pages enabled
• Classloader unloading → garbage collection
• Reference count to active code modified when backpatching
• Specialized usage of code blocks.
  – Trampolines only
  – Optimized code only

Bytecode is bad: kill it quickly
• What’s with the goto:s?
• Why can it express more than Java source code?
  – OK we understand the multilanguage concept, we sorta forgive you.
  – But man, dominators and loop analysis, that’s a lot of compile time

Bytecode is bad: kill it quickly
• …and why is it a stack machine AND a register machine with 65535 registers at the same time!?
• Initially tried to reconstruct ASTs
  – Obfuscators etc made this pretty hopeless.
• ~15% of the klocs in JRockit/codegen do flow control analysis on the goto:s

The IR
• Use IR everywhere (or Java)
• The IR should ideally reflect any of several pluggable frontends.
  – We played around with CLR a bit.
  – These days: dynamic languages :-)
• No Sea of Nodes
• No HotSpot style “high level IR is low level”

The IR
• Simple IR in MIR form (platform independent)

The IR: Design Rationale
• We had some compiler experience and wanted to be on track quickly. Do it the traditional way.
• We are not “wrong”. LLVM is very similar.

The IR: Design Rationale
• Tiered: highest tier == always high level
• Hardware agnostic.
• No architecture specific memory ops
• Tiered: lowest tier == always the native architecture, instruction for instruction.
• A gradual transition.
• A CPU has no sea of nodes.

The IR
• Highest IR level may have operations as operands
• Intrinsics everywhere
  – arraycopy, membar, cmpuXX, sse4IndexOf, doubleToLongBits, crypto, Math.sin and so on…
• Regret not doing more in SSA form

The IR Info “database”
• Lazily computable information
  – Liveness
  – Dominators
  – Loop information
  – Aliases
  – Type inference
  – Ranges
  – Nullness analysis
  – …
  – Invalidate on modification.
• Not a very stable model.
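
The "lazily computable, invalidate on modification" pattern above might look roughly like the following. It is a sketch under assumptions, not the JRockit data structure: `IrInfo`, the string keys and the `Supplier` callbacks are hypothetical, and every cached analysis is dropped on any change, which is the coarsest possible invalidation policy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Per-method cache of analysis results, computed on first request. */
final class IrInfo {
    private final Map<String, Object> cache = new HashMap<>();
    private int computations = 0;

    /** Returns the cached result for an analysis, computing it if absent. */
    @SuppressWarnings("unchecked")
    <T> T get(String analysis, Supplier<T> compute) {
        if (!cache.containsKey(analysis)) {
            computations++;
            cache.put(analysis, compute.get());
        }
        return (T) cache.get(analysis);
    }

    /** Called whenever a transformation modifies the IR. */
    void invalidate() {
        cache.clear();
    }

    int computations() {
        return computations;
    }
}
```

The fragility the slide hints at is visible even here: correctness depends on every IR-modifying pass remembering to call `invalidate()`.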

Memory management

Transition: object layout, types and livemaps…

Object layout and types
• Object headers should be fixed sized.
• JRockit object header is 32+32 bits
• All platforms with some content variations.
• [Grove] ramblings on object models
• Type tree similar to [Krall, Vitek, Horspool]

Livemaps (oopmaps)
• Registers and stack slots on the local frame that contain objects.
• Nothing strange here. Required for non-conservative garbage collection of any sort.
• Internal pointer bit
• Forms the root set.
• Rollforwarding vs the safepoint approach

Transition: livemaps

Memory management
• Garbage collectors
  – Concurrent
  – Parallel
  – Deterministic
• With or without generations

Memory management
• Concurrent collection
  – Your basic generational concurrent mark and sweep collector [Printezis, Detlefs]
  – Supports multigeneration (>1) young spaces.
    • Combats heavy object allocation situations.
    • Adaptively balanced against copy overhead
  – Write barriers before object writes
  – Minimize stopping the world
  – Young collections use a variant of stop & copy

Memory management
• Can also run with a parallel policy
  – Stop the world and clean up quickly
  – Only throughput oriented
  – No write barriers, as there is no need for a card table

Mark & Sweep
• Backbone of GC based on traditional tri-color mark and sweep
• Adaptive thread usage and additional concurrency

Mark & Sweep
• Two colors, not three.
  – Object is in one of two sets
  – Live objects: grey bits (mix of grey & black objects in traditional tri-coloring)
  – Distinction handled by putting grey objects in thread local queues for each GC thread.
  – Parallel threads can work on thread local data
  – Efficient prefetching is possible due to FIFO order.
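
A single-threaded sketch of the two-colour idea above, assuming a heap modelled as an array of object indices (`refs[i]` lists what object `i` points to). `TwoColorMarker` is a hypothetical name, and the real collector runs many such loops in parallel with one queue per GC thread. The point is that one bit per object is enough: an object whose bit is set and which is still in the queue plays the role of "grey", and one whose bit is set and which has left the queue plays the role of "black".

```java
import java.util.ArrayDeque;
import java.util.BitSet;

final class TwoColorMarker {
    /** Returns the set of objects reachable from the roots. */
    static BitSet mark(int[][] refs, int[] roots) {
        BitSet grey = new BitSet(refs.length);           // one "grey bit" per object
        ArrayDeque<Integer> queue = new ArrayDeque<>();  // local FIFO work queue
        for (int root : roots) {
            if (!grey.get(root)) {
                grey.set(root);
                queue.addLast(root);
            }
        }
        while (!queue.isEmpty()) {
            int obj = queue.pollFirst();                 // oldest entry first (FIFO)
            for (int child : refs[obj]) {
                if (!grey.get(child)) {                  // unmarked: mark and enqueue
                    grey.set(child);
                    queue.addLast(child);
                }
            }
        }
        return grey;                                     // unset bits are garbage
    }
}
```

Because the queue is FIFO, the objects about to be scanned are known some distance ahead of time, which is what makes the prefetching mentioned on the slide possible.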

No permgen ever!

Other nice features
• No permgen!!! Ever!
• Pinned objects.
  – Fast memory buffers.
  – Also enable non-contiguous heaps.
• Compaction
  – “Internal and external”.
  – G1 evacuates regions instead, with a stop-the-world-and-copy policy similar to JRockit YC

Memory management
• Concurrent GC has an additional set: live bits
  – Contains all live objects in the system, including the newly created ones.
  – JRockit can quickly find objects that have been created during a concurrent mark phase.
  – Card tables
    • Not just for generational GC
    • Also to avoid searching the entire live object graph when a concurrent mark phase cleans up.
    • Just look at dirty cards at the end of the mark phase.
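
As an illustration of the card table bullets above, here is a minimal sketch assuming 512-byte cards and a heap addressed by byte offset. Both are assumptions made for the example, and `CardTable`, `recordWrite` and `scanAndClean` are hypothetical names. The write barrier only flips one flag per store; the end of the mark phase then revisits dirty cards instead of the whole object graph.

```java
final class CardTable {
    static final int CARD_SHIFT = 9;  // 512-byte cards: an assumed size
    private final boolean[] dirty;

    CardTable(long heapBytes) {
        long cardSize = 1L << CARD_SHIFT;
        dirty = new boolean[(int) ((heapBytes + cardSize - 1) >>> CARD_SHIFT)];
    }

    /** Write barrier: remember which card the updated field lives on. */
    void recordWrite(long fieldOffset) {
        dirty[(int) (fieldOffset >>> CARD_SHIFT)] = true;
    }

    boolean isDirty(int card) {
        return dirty[card];
    }

    /** End of the mark phase: visit dirty cards only, cleaning them as we go. */
    int scanAndClean() {
        int visited = 0;
        for (int i = 0; i < dirty.length; i++) {
            if (dirty[i]) {
                visited++;        // a real collector would rescan the card here
                dirty[i] = false;
            }
        }
        return visited;
    }
}
```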

Young Collections
• A variant of stop and copy is used.
  – All threads are halted and objects are deleted or promoted
  – Hierarchical breadth first copy for cache locality
• Parallelizes nicely
• Many threads always harvest a young space

Young Collections
• Young and old collections may occur at same time.
  – All bitsets and data structures can be shared as long as the old collection is guaranteed to see all cards that have become dirty during a concurrent phase. (Extra card table to record this “difference”: the “modified union set”)
  – Keep this intact for old collection

Thread Local Allocation
• Thread local allocation
• Thread local areas are roughly L2 cache sized and objects are allocated here before they are forced upon the heap
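
Allocation inside a thread local area is typically a pointer bump, which is why it needs no synchronization. The sketch below assumes 8-byte object alignment and models addresses as plain `long` values; `ThreadLocalArea` and its `allocate` method are hypothetical names, and what happens when the area is full (fetching a new area from the heap) is left to the caller.

```java
/** A bump-pointer allocation area owned by exactly one thread. */
final class ThreadLocalArea {
    private final long end;  // first address past the area
    private long top;        // next free address

    ThreadLocalArea(long start, long size) {
        this.top = start;
        this.end = start + size;
    }

    /** Returns the address of the new object, or -1 if the area is full. */
    long allocate(long bytes) {
        long aligned = (bytes + 7) & ~7L;  // round up to 8 bytes (assumed alignment)
        if (top + aligned > end) {
            return -1;                     // caller must fetch a new area
        }
        long address = top;
        top += aligned;                    // no locking: only the owner gets here
        return address;
    }
}
```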

Compressed References
• For less than 4 (or 4*x) GB of maximum heap size
• Use 32 bit pointers (or 32+log2(x) bits)

    CompRef compress(Ref ref) {
        return (uint32_t)ref; // truncate reference to 32 bits
    }

    Ref decompress(CompRef ref) {
        return globalHeapBase | ref;
    }

Compressed References

    CompRef compress(Ref ref) {
        // drop the alignment bits, then truncate to 32 bits
        return (uint32_t)(ref >> log2(objectAlignment));
    }

    Ref decompress(CompRef ref) {
        // widen to Ref before shifting so the high bits are not lost
        return globalHeapBase | ((Ref)ref << log2(objectAlignment));
    }
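
The same arithmetic as a runnable Java sketch, with a worked value. Everything concrete here is an assumption chosen for illustration: the heap base `0x7f00_0000_0000`, 8-byte object alignment (so a shift of 3 and a 32 GB addressable range), and the name `CompressedRefs`. The scheme only works if the low 35 bits of the base are zero, so that OR-ing the base back in is equivalent to adding it.

```java
final class CompressedRefs {
    static final long HEAP_BASE = 0x7f00_0000_0000L; // assumed base, low 35 bits zero
    static final int ALIGN_SHIFT = 3;                // log2 of 8-byte alignment

    /** Drops the alignment bits and keeps the low 32 bits of what remains. */
    static int compress(long ref) {
        return (int) (ref >>> ALIGN_SHIFT);
    }

    /** Widens as unsigned before shifting, then restores the heap base. */
    static long decompress(int comp) {
        return HEAP_BASE | (Integer.toUnsignedLong(comp) << ALIGN_SHIFT);
    }
}
```

For example, an object 24 GB + 8 bytes into the heap sits at `HEAP_BASE + 0x6_0000_0008`. Its compressed form is `0xC000_0001`, which is negative when read as a signed `int`; that is why `decompress` uses `Integer.toUnsignedLong` rather than a plain widening cast.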

Deterministic GC
• QoS level for latencies. “No more than X ms”
• Down to single digits on modern x86 hardware
• Caveat: live data on heap is the main constraint.
  – Up to 50% of heap live data still feasible

Deterministic GC

Deterministic GC: How?
• Greedy strategy
  – Postpone stopping the world for as long as possible.
  – Maybe the problem goes away and we don’t have to stop the world
• Split up everything into work packets
  – Drop them at any time.
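
One way to picture "work packets that can be dropped at any time" is a loop that checks a deadline between packets. This is a sketch of the general idea only; `PacketRunner`, the injected clock and the nanosecond deadline are assumptions made for the example and say nothing about how JRockit schedules its packets.

```java
import java.util.ArrayDeque;
import java.util.function.LongSupplier;

final class PacketRunner {
    /**
     * Runs queued packets until the queue is empty or the deadline has passed.
     * Returns the number of packets executed; the rest stay queued.
     */
    static int runUntil(ArrayDeque<Runnable> packets,
                        LongSupplier clockNanos,
                        long deadlineNanos) {
        int done = 0;
        while (!packets.isEmpty() && clockNanos.getAsLong() < deadlineNanos) {
            packets.pollFirst().run();  // each packet is small and self-contained
            done++;
        }
        return done;                    // leftovers are picked up in a later pause
    }
}
```

The pause can overshoot by at most one packet, so the achievable latency bound depends on keeping individual packets small. In production the clock would be something like `System::nanoTime`; it is injected here so the behaviour can be exercised deterministically.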

Deterministic GC: How?
• Efficient parallelization.
  – Mark phase is 90% of GC time
• Efficient heuristics
  – Some more work in e.g. write barriers

Threads and Synchronization

Threads and Synchronization
• A java.lang.Thread is a native thread.
  – Interesting, though: thread pooling and pseudo thin-threads are back, for example in Akka.
  – Java 8: Collection.parallelStream
  – The world is moving towards implicit parallelism in general
• Most of the JRockit thread code and adaptivity logic is written in Java

Threads and Synchronization
• Locks are thin or fat
  – Adaptive inflation and deflation
• Lazy locking (biased locking supported)
  – Adaptive heuristics for banning and retrying the lazy approach.

Threads and Synchronization

    public class PseudoSpinlock {
        private static final int LOCK_FREE = 0;
        private static final int LOCK_TAKEN = 1;

        public void lock() {
            // burn cycles
            while (cmpxchg(LOCK_TAKEN, &lock) == LOCK_TAKEN) {
                micropause(); // optional
            }
        }

        public void unlock() {
            int old = cmpxchg(LOCK_FREE, &lock);
            // guard against recursive locks
            assert(old == LOCK_TAKEN);
        }
    }
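
The pseudocode above is not compilable Java, since `cmpxchg` and `&lock` are stand-ins for an atomic instruction on a memory location. A runnable approximation using the standard `java.util.concurrent.atomic.AtomicInteger` could look like this. `SpinLock` and `tryLock` are names introduced for the example, and `Thread.onSpinWait()` (Java 9 and later) takes the place of `micropause()`.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SpinLock {
    private static final int LOCK_FREE = 0;
    private static final int LOCK_TAKEN = 1;

    private final AtomicInteger state = new AtomicInteger(LOCK_FREE);

    /** Spins until the state flips from free to taken. */
    void lock() {
        while (!state.compareAndSet(LOCK_FREE, LOCK_TAKEN)) {
            Thread.onSpinWait();  // hint to the CPU that we are busy-waiting
        }
    }

    /** Single attempt; returns false instead of spinning. */
    boolean tryLock() {
        return state.compareAndSet(LOCK_FREE, LOCK_TAKEN);
    }

    /** Releases the lock and checks that it was actually held. */
    void unlock() {
        int old = state.getAndSet(LOCK_FREE);
        if (old != LOCK_TAKEN) {
            throw new IllegalStateException("unlock of a lock that was not taken");
        }
    }
}
```

Like the slide's version, this lock is not reentrant: a thread that calls `lock()` twice without unlocking spins forever.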

Threads and Synchronization
• Locks are thin when first taken
• Time spent in lock and times taken triggers inflation
• wait or notify immediately inflates a lock
• Fat locks are also deflated when uncontended for too long

Threads and Synchronization

[Diagram: thin lock lifecycle]

[Diagram: thin & fat lock lifecycle]

Lock Pairing
• Bytecode again: no restriction on matching monitorenter with monitorexit
• Not all of them can be analyzed by the JIT

Lock Pairing
• We can store what we know, and make unlocks quick.
  – Lock tokens (the object OR 3 bits)
    • Thin, fat, recursive, lazily taken, unmatched
  – Livemap system contains nesting order.
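
"The object OR 3 bits" reads like classic pointer tagging: with 8-byte aligned objects the low three address bits are always zero and can carry the lock kind. The sketch below illustrates that reading only. The numeric values assigned to each kind, and the names `LockToken`, `encode`, `object` and `kind`, are invented for the example and are not JRockit's encoding.

```java
final class LockToken {
    // Hypothetical tag values; the slide only names the kinds.
    static final int THIN = 0;
    static final int FAT = 1;
    static final int RECURSIVE = 2;
    static final int LAZY = 3;
    static final int UNMATCHED = 4;

    private static final long TAG_MASK = 0x7L;  // the low three bits

    /** Packs an 8-byte aligned object address and a lock kind into one word. */
    static long encode(long objectAddress, int kind) {
        if ((objectAddress & TAG_MASK) != 0) {
            throw new IllegalArgumentException("address is not 8-byte aligned");
        }
        if (kind < 0 || kind > TAG_MASK) {
            throw new IllegalArgumentException("kind does not fit in three bits");
        }
        return objectAddress | kind;
    }

    static long object(long token) {
        return token & ~TAG_MASK;
    }

    static int kind(long token) {
        return (int) (token & TAG_MASK);
    }
}
```

With the kind stored next to the object, an unlock can branch on `kind(token)` directly instead of rediscovering how the lock was taken.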

Optimizations
• A lot of smallish codegen transforms: e.g. lock fusion
• “Fat spin”
• Lazy unlocking (biased locking)
  – Start assuming all locks are lazy. Tag thin locks as lazily locked.
  – If object already lazily locked
    • If it’s the same thread: profit
    • Else: stop the lock holder, detect the “real” lock state by stack walk. Convert to thin lock or forcefully unlock it
  – Transfer bits
  – Heuristics: object and class banning. Ageing.

Threads and Synchronization

[Diagram: thin, fat & lazy lock lifecycle]

Export it all! JRockit Mission Control (now Java Mission Control)

@javamissionctrl

$JAVA_HOME/bin/jmc

Mission Control
• Use “free” runtime information!
  – JRockit (Java) Mission Control
    • JRockit (Java) flight recorder
    • Memory leak detector (JRockit only)
    • Management console
• $JAVA_HOME/bin/jcmd (used to be jrcmd)
• Everything in the VM abstracted into an event that may or may not have a duration
• Soon: public API

Java Flight Recorder
• Always on
  – Excellent for debugging and analysis of crashes
  – Can be set to record more intrusively for periods in production
    • E.g. extensive lock profiling
• Everything is an event
• Buffered recording: the last n seconds available at any crash or when a command is given.
• Very fine precision.
  – Multimedia timers and system hardware support required for e.g. latencies
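
The "last n seconds are always available" behaviour can be pictured as a time-windowed buffer of events. The following is a toy model, not the flight recorder's implementation or API: `RecordingBuffer`, `record` and `dump` are hypothetical names, events are assumed to arrive in start-time order, and a duration of 0 stands for an instantaneous event.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

final class RecordingBuffer {
    private static final class Event {
        final String name;
        final long startNanos;
        final long durationNanos;  // 0 for events without a duration

        Event(String name, long startNanos, long durationNanos) {
            this.name = name;
            this.startNanos = startNanos;
            this.durationNanos = durationNanos;
        }
    }

    private final ArrayDeque<Event> events = new ArrayDeque<>();
    private final long windowNanos;

    RecordingBuffer(long windowNanos) {
        this.windowNanos = windowNanos;
    }

    /** Appends an event and evicts everything older than the window. */
    void record(String name, long startNanos, long durationNanos) {
        events.addLast(new Event(name, startNanos, durationNanos));
        long cutoff = startNanos - windowNanos;
        while (events.peekFirst().startNanos < cutoff) {
            events.pollFirst();
        }
    }

    /** What a crash handler or an explicit command would write out. */
    List<String> dump() {
        List<String> names = new ArrayList<>();
        for (Event e : events) {
            names.add(e.name);
        }
        return names;
    }
}
```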

Latency Analysis

The Management Console
• Peek into the running production JVM
• Add triggers on events
• Interact with the VM: force GC etc.

The Memory Leak Detector
• Introspect the type graph in real time. Look for types that are growing despite GC:s

Studying a recording offline

JRockit Virtual Edition

Is the JVM an OS?
• Add a cooperative aspect to thread switching
• Zero-copy networking code
• Reduce cost of entering OS
• Balloon driver
• Runs only on hypervisor
• Facilitates pauseless GC

Thank you!

Would you like to know more?

Oracle JRockit: The Definitive Guide