are we fast yet? html & javascript performance - utahjs
DESCRIPTION
Presentation to UtahJS on webkit.js and HTML/JavaScript performance.
TRANSCRIPT
Are we fast yet? JavaScript & HTML performance
Trevor Linton - July 2014
JavaScript? Sure.
• Firefox has asm.js (A subset of JavaScript)
• Chrome has V8
• Safari now has LLVM optimizations
C++/Clang
Firefox ASM.js
In some measurements we’re butting up against native C++.
JS has road blocks in HTML though.
JIT Begins optimizing.
STOP, unknown what this function may do or return.
Recalculate Layouts
STOP, Waiting for return value from renderer
STOP, Events have unknown values, cannot pre-optimize
Mouse click from user
Renderer kicks off JS
HTML Renderer JS compiler Code
function() {
  ....
  var el = document.getElementById('id')
  ....
  var bounds = el.getBoundingClientRect()
  ....
}
addEventListener('click', function(e) { ... });
An experiment to overcome this
• Re-implement HTML5 rendering in JavaScript.
• JS can fully JIT through any DOM operation and optimize.
• JS optimizer has ability to anticipate inputs from C++ in sync/async events.
• Using ASM.js we can get near C++ runtime speeds.
Original C++ WebKit code (WebCore, actually)
Using LLVM/Clang and Emscripten, compile it down to JavaScript.
webkit.js
webkit.js speed results (x=iter.)
• Rendering becomes substantially faster after progressive runs.
• Rendering speed on par with native speeds.
• Firefox faster due to built-in 1:1 ASM.js optimizations.
• DEMO: http://trevorlinton.github.io
[Chart: webkit.js speed results. x-axis: iteration, 1 to 8; y-axis: 0 to 1.2. Series: webkit.js in Chrome 35, webkit.js in Firefox 30, Chrome 35, Firefox 30.]
Getting over the DOM fence...
• Continue building a JS based HTML renderer.
• Firefox, Chromium and the like are working on pulling more of the DOM into native JS.
• Proposal is out to recreate CSS styles in JS for Chromium.
• Firefox is already getting close to this...
But WebKit is complex...
CSS Animation
Rendering
Hardware Compositing
Other Things… (Layout, Network, Parsing, DOM, CSS, Javascript)
Compositing, Painting, Drawing and Rendering
ChromeClient (implemented as ChromeClient***.cpp)
AcceleratedCompositor (GraphicsLayerClient)
GraphicsLayer(TextureMapperLayer)
WebView
TextureMapperGL (TextureMapper / GraphicsLayerClient)
ContextGL, OpenGLES V2 (platform specific, accelerated)
CREATES BUT DOES NOT MANAGE
CREATES CHROMECLIENTJS AND HANDS OFF ACCELERATED COMPOSITOR ONCE CREATED
ChromeClientJS executes on AcceleratedContext: setRootGraphicsLayer, enabled?(), scheduleLayerFlush, resizeRootLayer
Chrome class: a proxy for the ChromeClient interface passed into Frame
When a graphics layer is created it sends attachRootGraphicsLayer to ChromeClientJS. In addition it will execute WidgetSizeChanged (or WebView may), setNeedsOneShotDrawingSynchronization, scheduleCompositingLayerFlush and scheduleAnimation. These are all passed through to the AcceleratedCompositor on behalf of WebKit. Chrome/WebKit/WebCore will only do this if accelerated compositing is turned on by settings and ACCELERATED_COMPOSITING=1 && TEXTURE_MAPPER_GL=1 && TEXTURE_MAPPER=1 are set in the compiler settings.
DEVICE SCALE FACTOR, PAGE SIZE, ETC.
Executes setDeviceScaleFactor(float), usually 2 in webkit.js for HiDPI rendering. Also sets the viewport size to set the size of the view. This will cause the frame, in both accelerated and non-accelerated mode, to kick out a bitmap image twice the size when bit-blitting. However, all coordinates are still in logical pixels.
The “layout black box”. This is where the magic happens; we will be re-informed of results through the ChromeClient executed by Chrome.
Shoots the created layers and root layers to the texture mapper, which tiles and uploads them to GL for display. These manage scrolling, memory use and other things for us so we don’t just haphazardly create 20,000 different compositing layers, textures, etc.
paintContents on ChromeClientJS actually draws contents to TextureMapperLayer; as the TextureMapperGL interface needs them, it’ll request these through paintContents.
GraphicsLayer(CREATION)
HostWindow or IPC Channel for WebKit2/Chrome
GraphicsContext3D(GraphicsContext)
TextureMapper sends paintContents to AcceleratedCompositor, which in turn manages clearing OpenGL, maintaining buffers/contexts.
Clears buffers, makes current context.
Chrome/WebKit pushes a graphics layer to ChromeClient
that is created by GraphicsLayerFactory
GraphicsLayerFactory
Created by taking the ChromeClientJS that’s held by Frame, or a global default constructor, to create all RenderLayers and GraphicsLayers
Note, on some platforms this is part of ChromeClient
WebView creates the chrome client that is platform specific; it’s sent to WebCore::Frame and a copy is retained for WebView. WebCore::Frame, WebCore::FrameView, WebCore::Page and a whole host of other classes run methods on the chrome client when specific work needs to be done.
Informs each other of size changes, when graphics layers need to be flushed, and a whole host of other things to sync states.
Pushes textures that are tiled or full as “composited” layers to GL.
Used for special transforms or accelerated scaling.
Painting / Drawing
cairo (or other drawing library: skia, CoreAnimation, etc.)
pixman for fast patched drawing optimizations
Image, ImageBuffer
libjpeg-turbo, libpng (note gif and bmp are built in to WebCore)
zlib (decompressing pngs)
FreeType, FontConfig: used for font parsing and layout
GraphicsContext(library/platform specific)
WebCore::Frame WebCore::Page
ChromeClient (created by WebView, is passed to Frame/Page for WebCore to use.)
There’s also coordinated graphics and tiling.
Platform Blit Surface(non-accelerated)
Software Compositing
TextureMapper
GraphicsLayer
When attachRootGraphicsLayer is executed by Chrome, the GraphicsLayer is passed into the accelerated compositor. The compositor is checked to see if it’s enabled; if not, compositing is turned off, and if so, compositing is turned on.
Non-accelerated, non-composited, bit-blt path.
Composited, but not accelerated path (not bit-blted)
WebView creates a device GL and EGL (OpenGL ES v2) context via SDL. This context in WebKit is globally available once created. It then creates the AcceleratedCompositor and does nothing else with it than hand it to ChromeClientJS. It also makes these the current context and sets the device viewport size (not the GL context size). ContextGL and ContextEGL are hacked to pass specific params to Emscripten to create the right compatible surface; these hacks are wrapped in PLATFORM(JS) preprocessor checks.
RenderLayerCompositor
RenderLayer
Accelerated, but not composited bit-blt path.
Composited and accelerated path
Compiled Vertexes & Shaders
Classes compile layout commands into OpenGL Vertex & Shader Program
WebCore::FrameView WebCore::Document
Video Codecs
GraphicsLayerTextureMapper.cpp
GraphicsLayer::create: factory ? factory->create : GraphicsLayerTextureMapper()
ChromeClient->graphicsLayerFactory() (GraphicsLayerFactory passed through from ChromeClient->factory(); if none exists, use default TextureMapper implementation.)
RenderLayerBacking
Plugins
Layout and painting produce a render tree that is managed by a host of classes. The RenderLayers and RenderLayerTree communicate with RenderLayerCompositor to determine the GraphicsLayers that are then passed on through the RenderLayerBacking.
glBindTexture() / Canvas / SDL / GLUT / XWindow / DWM / NSOpenGL / etc..
AnimationController
AnimationBase
AnimationControllerPrivate
Document
New
StartWaitTimer
StartWaitStyleAvailable
StartWaitResponse
Looping
Ending
PausedNew
PausedWaitTimer
PausedWaitResponse
PausedWaitStyleAvailable
PausedRun
Done
FillingForwards
Animation state, viewed as a state machine with enum m_animState
Knows about, and fires, AnimationController methods as states change.
Element
Knows about and executes style recalculations on documents and elements. However it does not actually change the style’s value, just whether it should recalculate and potentially layout/render.
Document::updateStyleIfNeeded
Element::setNeedsStyleRecalc
CompositeAnimation
RenderElement
Knows about and interacts with AnimationBase, unclear why.
WaitingAnimationSet (An array of AnimationBase)
! Seems to be a list of animations (AnimationBase classes) waiting to be animated. Their state is stored in AnimationBase and could potentially become out of sync by being in an array when they are technically not waiting.
RenderStyle
! AnimationController has two paths based on whether requestAnimationFrame is enabled or not. In addition there is a requestAnimationFrame timing feature that further branches into a new path, confusing how the implementation path flows.
Performs most of its work in AnimationControllerPrivate as a proxy; seems unnecessary, and unclear why.
! Performs separate paths for compositing animations, this makes for confusing bugs.
AnimationUpdateBlock (implemented in AnimationController.h)
! Issues beginAnimationUpdate or endAnimationUpdate simply through its constructor/destructor; very unclear why, and seems to pollute the paths.
animation() contains one controller per frame.
! Has a circular dependency with AnimationController, unclear why.
! Runs on a one-shot timer, unclear why.
! Has a circular dependency with AnimationBase, unclear why.
! Implementation hides “AnimationControllerPrivate” rather than implementing AnimationController. Unclear why.
Creates an animation update block on the stack, letting the constructor/destructor fire begin/end calls to AnimationController. Gets the AnimationController from frame.animation().
Uses the frame reference only to get access to the FrameView class to execute flushCompositingStateIncludingSubframes and other flush compositing state classes.
Combined with RenderLayerCompositor these do the actual changes to the styles and are called by AnimationController, AnimationBase and Frame/Element.
KeyframeAnimation
KNOWN DESIGN ISSUES:
This system has a race condition. If the compositor is flushed or invalidated too quickly (e.g., the chrome client calls scheduleLayerFlush on AcceleratedContext.cpp), the animation base’s timer (within AnimationController) fails to remove waiting animations that have already completed from the WaitingAnimationSet. Because the AnimationController has no chance to remove these on its next timer run between the AcceleratedContext’s scheduled layer flushes, items within WaitingAnimationSet are thought to be “Waiting” for an animation but have an m_animState (on AnimationBase) of Ending, Done or other. In other words, the AnimationController thinks that animations that have completed are still waiting for their style because the accelerated compositor is plowing through them too quickly.
The cure for this is to simply think of requests from ChromeClient to flush, invalidate or paint as “suggestions” and prevent them from executing more often than once every 1/60th of a second. In addition, do not allow more than one flush to be issued at a time (e.g., two timers on separate threads running a flush concurrently).
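The "suggestion" idea can be sketched in a few lines of JavaScript. This is only an illustration of the rate limit and the one-flush-at-a-time guard, not the webkit.js code: createFlushScheduler, requestFlush and the injected clock are hypothetical names, and the real fix would live in the C++ compositor path.

```javascript
// Hypothetical helper: these names are illustrative and not taken from WebKit.
function createFlushScheduler(flush, now, minIntervalMs) {
  var lastFlush = -Infinity; // time of the last flush that actually ran
  var flushing = false;      // true while a flush is in progress

  return function requestFlush() {
    var t = now();
    // Treat the request as a suggestion: drop it if a flush is already
    // running or if the previous one ran less than minIntervalMs ago.
    if (flushing || t - lastFlush < minIntervalMs) {
      return false;
    }
    flushing = true;
    lastFlush = t;
    try {
      flush();
    } finally {
      flushing = false;
    }
    return true;
  };
}

// Usage with a fake clock so the behavior is easy to follow.
var clock = 0;
var flushCount = 0;
var requestFlush = createFlushScheduler(
  function () { flushCount += 1; },
  function () { return clock; },
  1000 / 60
);

requestFlush(); // runs: first request
clock += 5;
requestFlush(); // dropped: only 5 ms since the last flush
clock += 20;
requestFlush(); // runs: 25 ms since the last flush
```

This sketch drops requests that arrive too early. A fuller version would probably remember that a flush is still owed and run it once the interval has passed.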
flushPendingLayerChanges
flushCompositingStateIncludingSubframes
Academic exercises aside...
What can we do now?
FIRST REMEMBER:
• There’s a difference between perceived performance vs. actual performance (e.g., is your event just firing late?)
• Be careful when optimizing your code; it’s a rabbit hole and sometimes a pitfall (80/20 rule).
Some rules of thumb:
Avoid interacting with the DOM with these patterns:
• Changing a DOM parameter (adding, modifying, removing elements) then reading from another. This requires a layout validation / invalidation since the renderer has no idea if the change you made could potentially cause a change to the value you’re trying to read!
Some rules of thumb:
Avoid incremental changes to the DOM if you can batch them together:
For instance, if you need to create HTML elements in JavaScript, using innerHTML is faster than using document.createElement, provided you’re creating nested elements or more than one element.
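A minimal sketch of the batching idea: build the markup for every element as one string, then hand it to the renderer in a single innerHTML write. The helper names (escapeHtml, buildListHtml) and the 'list-container' id are assumptions made for illustration. The DOM write is guarded so the string-building part also runs outside a browser.

```javascript
// Escape text so it is safe to place inside HTML markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build the markup for the whole list as one string (no DOM calls here).
function buildListHtml(items) {
  var parts = [];
  for (var i = 0; i < items.length; i += 1) {
    parts.push('<li>' + escapeHtml(items[i]) + '</li>');
  }
  return '<ul>' + parts.join('') + '</ul>';
}

var html = buildListHtml(['one', 'two', '<three>']);

// In a browser, write the whole batch once.
// 'list-container' is a hypothetical element id.
if (typeof document !== 'undefined') {
  var container = document.getElementById('list-container');
  if (container) {
    container.innerHTML = html;
  }
}
```

Escaping matters here because innerHTML parses whatever string it is given; untrusted text should never be concatenated in raw.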
Some rules of thumb:
Avoid JavaScript that interacts with DOM functions (vs. strings or properties on the DOM)
• JavaScript can safely optimize more if you’re modifying a string rather than executing a function.
• Again, innerHTML does not cause a JS optimization pause (if you’re writing, appending but not reading), but document.createElement will.
Some rules of thumb:
Give the browser as much information about animations as you can. Use declarative animation styles in CSS.
• Use animation key frames and transition in CSS.
• Use will-change CSS property for properties that frequently change (not yet implemented, but SOON!)
• These can be pre-compiled by the RenderLayer prior to the animation ever being executed!
Some rules of thumb:
Use linear transformations rather than standard CSS style rules to change the position or scale.
• Using CSS transform() you can apply linear transformations that can be entirely done in the compositor and GPU.
• Changing the X/Y (left/top) or width/height will cause a reflow/relayout and a new texture to upload in the GPU.
Some rules of thumb:
Use requestAnimationFrame whenever possible.
• requestAnimationFrame prevents layout thrashing as it’s explicitly done before the next layout loop and after compositing.
• The compositor is aware of requestAnimationFrame and lets you modify elements prior to compositing frames.
• This can largely prevent you from interrupting a layout and causing a new one to run.
Some rules of thumb:
Simplify your CSS
• Do not use overly complex selectors
• Duplicate styles must be resolved and increase layout time.
• This has an r*e growth rate! (r=rules, e=elements), reducing either will lower your layout time.
• Rules have a z*r*e growth rate! (z=number of selector parameters)
Some rules of thumb:
Do not add CSS rules or explicitly set style parameters after a document load.
• Browsers can cache possible states (or visited style states), but not when they’re dynamically set.
• Create various possible “style states” for each element and switch the class on the element rather than setting the style attribute.
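One way to read the “style states” advice, sketched under assumptions: each state is a class whose rule already exists in the stylesheet at load time, and script only swaps the class name. The state names and setStyleState are hypothetical. A plain object with a className string stands in for a DOM element so the sketch runs anywhere.

```javascript
// Hypothetical state names; each would map to a rule in the stylesheet,
// for example ".panel--open { ... }", defined before the document loads.
var STATES = ['panel--closed', 'panel--open', 'panel--busy'];

// Replace whichever state class is present with the requested one.
// Works on anything with a className string, such as a DOM element.
function setStyleState(el, state) {
  if (STATES.indexOf(state) === -1) {
    throw new Error('Unknown style state: ' + state);
  }
  var kept = el.className.split(/\s+/).filter(function (name) {
    return name !== '' && STATES.indexOf(name) === -1;
  });
  kept.push(state);
  el.className = kept.join(' ');
}

// A plain object stands in for an element here.
var panel = { className: 'panel panel--closed' };
setStyleState(panel, 'panel--open');
```

Non-state classes such as "panel" are left untouched, so the element keeps its base styling while the state changes.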
Some rules of thumb:
Avoid, if you can, using libraries and frameworks
Best practice:
• Prototype with libraries, then profile and begin removing/replacing functionality with a smaller limited set/needs.
• Most libraries and frameworks are built for ease of use, and not performance.
Some rules of thumb:
JavaScript memory leaks are easy to create. It’s fairly easy to accidentally have an object refer to another object that refers back to itself. Be careful (and aware) of these corner cases.
• Use closures and avoid objects that take in other objects.
• Avoid defining variables in the global scope
Some rules of thumb:
Use prototypes rather than user-defined objects.
var obj = { foo: function() { console.log('hello'); } };
Create 10,000 of these and you’ll have 10,000 definitions AND instances. Careful!
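The difference can be seen directly by comparing function identity. In this sketch makeGreeter and Greeter are hypothetical names; the point is only that an object literal creates a new function object per instance, while a prototype method is created once and shared.

```javascript
// Per-instance method: every object gets its own copy of the function.
function makeGreeter() {
  return { foo: function () { return 'hello'; } };
}

// Prototype method: the function is defined once and shared.
function Greeter() {}
Greeter.prototype.foo = function () { return 'hello'; };

var a = makeGreeter();
var b = makeGreeter();
var c = new Greeter();
var d = new Greeter();

var perInstanceShared = (a.foo === b.foo); // false: two separate functions
var prototypeShared = (c.foo === d.foo);   // true: one shared function
```

Engines may share the compiled code behind the scenes, but each literal still allocates its own function object, which adds up over thousands of instances.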
Some rules of thumb:
Don’t fear iframes
• If you have complex controls (visjs, d3?) that may need their own UI loop consider placing them into iframes
• iframes give you a new thread and potentially a new process!
• Useful, but don’t overdo it, iframes are heavy weight.
Some rules of thumb:
Be careful mixing contents
• Plugins, video, webgl, CSS animations and traditional DOM rendering all run on separate contexts.
• They’re pulled together via render layers and graphics layers.
• The more contexts you introduce the more complex the synchronization between them can become.
• Contexts != compositing layers (but can sometimes)
Some rules of thumb:
Avoid listening to high-throughput events
• A common performance mistake is not removing event listeners on DOM elements or reacting to the DOM event in the event thread.
• High-throughput events such as mousemove, touchmove and scroll should very rarely be used.
• If you need to use these cache the result and animate in requestAnimationFrame, NOT the event.
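The "cache the result and animate in requestAnimationFrame" advice might look like the sketch below. createFrameCoalescer is a hypothetical helper; the scheduler is passed in as a parameter so that requestAnimationFrame can be supplied in a browser and a stand-in queue can be used elsewhere.

```javascript
// Hypothetical helper. `schedule` is requestAnimationFrame in a browser;
// it is a parameter so the sketch can also run without one.
function createFrameCoalescer(schedule, render) {
  var latest = null;     // most recent cached event data
  var scheduled = false; // true while a frame callback is pending

  return function onEvent(data) {
    latest = data;       // cheap: just remember the newest value
    if (scheduled) {
      return;
    }
    scheduled = true;
    schedule(function () {
      scheduled = false;
      render(latest);    // do the expensive work once per frame
    });
  };
}

// Stand-in scheduler that queues callbacks until we "run a frame".
var queue = [];
var rendered = [];
var onMove = createFrameCoalescer(
  function (cb) { queue.push(cb); },
  function (pos) { rendered.push(pos); }
);

onMove({ x: 1 });
onMove({ x: 2 });
onMove({ x: 3 }); // three events arrive before the next frame
queue.shift()();  // the frame runs: render is called once, with the last value
```

In a page, the scheduler would be window.requestAnimationFrame.bind(window) and onMove would be called from a mousemove or scroll listener with the values it needs from the event.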
Some rules of thumb:
Be conservative when forcing a compositing layer (e.g., translate3d(0,0,0) or translateZ(0))
• Creating a graphics layer/render layer is expensive.
• Generally the rendering sub-system is very efficient at figuring out what should and shouldn’t be layers.
• It makes very little sense to force compositing layers in a nested manner, be careful doing this!
• It makes very little sense to force compositing layers if they don’t have a linear transformation or mask (e.g., overflow:scroll)
CSS styles that cause paints
Repaints are the most expensive operation, and should ALWAYS be declarative (when possible..)
color                  border-style
visibility             background
text-decoration        background-image
background-position    background-repeat
outline-color          outline
outline-style          border-radius
outline-width          box-shadow
background-size
CSS styles that cause layout
Layouts are lighter than repaints but can (in certain circumstances) trigger a repaint as well!
width          height         overflow-y        font-weight
padding        margin         display           border-width
border         top            position          font-size
float          text-align     overflow          left
font-family    line-height    vertical-align    right
clear          white-space    bottom            min-height
CSS styles that cause a composite
Composites are generally not expensive, declarative or imperative style declarations are fine (note, not all of these cause a NEW composite layer, but cause a composite of existing layers):
opacity              -webkit-user-select
cursor               -webkit-transform
z-index              transform(scale)
translate3d          translateZ()
transform(rotate)
Identifying Issues: Jank
The overuse of graphics layers causing pages to take excessively long to composite:
Cause: Composite CSS calls used in a nested pattern.
Diagnose: Large composite times.
Cure: Remove nested translate3d/translateZ, reduce linear transforms, remove scroll event listeners, remove opacity or CSS composite filters.
http://wesleyhales.com/blog/2013/10/26/Jank-Busting-Apples-Home-Page/
Identifying Issues: Paint Storm
Cause: Changing a paint CSS style on a high-throughput event, or circularly flip-flopping a CSS paint style.
Diagnose: Very frequent paint->composite in frames.
Cure: Find where paint CSS styles are changing.
Identifying Issues: Layout Thrashing
Common Cause: Reading layout DOM properties or modifying DOM.
Diagnose: Frequent but short layout requests without paint/composite after.
Cure: You’re most likely reading a CSS property or DOM property A LOT in your JavaScript code (perhaps in a tight loop?)
// els is an array of elements
for (var i = 0; i < els.length; i += 1) {
  var w = someOtherElement.offsetWidth / 3;
  els[i].style.width = w + 'px';
}
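The loop above reads offsetWidth on every iteration, right after a style write, so each read can force a fresh layout. A possible fix, sketched here, is to read once before the loop and then only write. The counting getter and the stand-in objects are assumptions added so the sketch runs without a browser.

```javascript
// Read the layout-dependent value once, before any writes.
function setWidths(els, someOtherElement) {
  var w = someOtherElement.offsetWidth / 3; // single layout read
  for (var i = 0; i < els.length; i += 1) {
    els[i].style.width = w + 'px';          // writes only
  }
}

// Stand-in objects that count how often offsetWidth is read.
var reads = 0;
var other = {};
Object.defineProperty(other, 'offsetWidth', {
  get: function () { reads += 1; return 300; }
});
var els = [{ style: {} }, { style: {} }, { style: {} }];

setWidths(els, other);
```

With real elements the same function works unchanged; the layout is read once regardless of how many elements are resized.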
Identifying Issues: JS Memory Leaks
Common Cause: Circular references or code holding onto object references for longer than necessary.
Diagnose: Use DOM shims and run your code in node with the --expose_gc option; use “gc()” to force garbage collection.
Cure: Use binary-search type methods for isolating the offending code and fix/refactor.
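A rough sketch of the diagnose step in Node, assuming the script is started as `node --expose-gc script.js`. The helpers collect and heapGrowth and the deliberately leaky function are hypothetical. Without the flag the sketch still runs, but the numbers are noisier because no collection is forced.

```javascript
// Force a collection when node was started with --expose-gc; otherwise skip.
function collect() {
  if (typeof global !== 'undefined' && typeof global.gc === 'function') {
    global.gc();
    return true;
  }
  return false;
}

// Run `fn` `rounds` times and report how much heapUsed grew, in bytes.
function heapGrowth(fn, rounds) {
  collect();
  var before = process.memoryUsage().heapUsed;
  for (var i = 0; i < rounds; i += 1) {
    fn();
  }
  collect();
  return process.memoryUsage().heapUsed - before;
}

// Hypothetical leaky code: a module-level array that is never cleared.
var retained = [];
function leaky() {
  retained.push(new Array(1000).join('x'));
}

var growth = heapGrowth(leaky, 1000);
console.log('heap growth in bytes:', growth, 'gc available:', collect());
```

Growth that keeps rising as rounds increases, even after a forced collection, points at retained references. Halving the suspect code and re-measuring is the binary-search step described under Cure.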
Mobile App Best Practices
• Don’t use touch move events on scrollable items.
• Nest overflow elements to produce scroll effects
• Overflow elements should be in 500px intervals
– WebKit uses tiling for composite layers, each tile is 500px.
• Use absolute positioning/transforms wherever possible.
• Avoid nesting elements
• Less is more when listening to events
• Pre-paint items soon to show up, use display:none to hide.
• Mobile has more memory to lend, less GPU/CPU.
– Declarative style CSS animations are key here.
– Be careful when forcing a compositing layer with transforms.