The power of ESI and HTTP Cache for performant page delivery
DESCRIPTION
For web sites with frequent content changes, such as news portals, an optimal cache strategy is crucial for serving millions of page views. Caching every page statically is usually not an option, as editors want to see their changes immediately. In the new version of eZ Publish CMS, which is built on the Symfony PHP framework, content caching is based on HTTP Cache. This enables much tighter integration with reverse proxies like Varnish and better control over cached pages. Symfony also provides ESI capabilities for controlling the reverse proxy cache at block level, so invalidating a whole page is not expensive.
TRANSCRIPT
The power of ESI and HTTP Cache for performant page delivery
Ivo Lukač @ ZgPHP Conference 2014, Zagreb
www.netgenlabs.com
Warning!
• Not so much code in the slides
• This is more about using the right tools for the job and integrating those tools
Netgen
• Web technology company, established 2002
• Focused on medium and large web projects based on eZ Publish CMS and Symfony
• eZ Business partner since 2007
• Working for eZ Systems on core eZ Publish 5 development since 2012
• Very active in the community
• Organised the third consecutive eZ/PHP Summer Camp, with more than 90 people from across the EU, this September in Rovinj
The problem
• Medium and large web sites need to be cached on various levels as much as possible
• But the specific cache needs to be invalidated when there is a change in the content
• So a simple “cache everything with a TTL of days” approach is not an option when content changes frequently on sites that are also heavily cross-linked (e.g. news, media, portals, etc.)
Agenda
• How to cache content pages and invalidate when necessary
• How to cache parts of content pages and invalidate when necessary
Focus on optimal cache strategy: invalidate only when there is a change in content
* We are not covering caching of static assets like CSS, JavaScript and images. These are usually cached with a long TTL and versioned with a hash in the URL
Part 1
• How to cache content pages and invalidate when necessary:
  - use a powerful CMS
  - with a good framework
  - and a fast reverse proxy
  - which all support W3C standards :)
eZ Publish CMS
• Powerful enterprise open source CMS
• Developed and supported by eZ Systems, Norway
• Symfony full stack from eZ version 5.0
• Latest versions: 5.3 EE and 2014.07 CE
Content caching in eZ
• eZ Publish has an integrated content caching system that works out which page caches to invalidate and when: Smart View Cache
• In version 4 the cache was stored in local files
• In version 5 it uses the HTTP Cache implementation from Symfony to integrate reverse proxies for content caching
www.netgenlabs.com
Smart View Cache rules example
[folder]
DependentClassIdentifier[]=frontpage
ClearCacheMethod[]=object
ClearCacheMethod[]=relating
[article]
DependentClassIdentifier[]=folder
DependentClassIdentifier[]=frontpage
ClearCacheMethod[]=object
ClearCacheMethod[]=relating
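A rough sketch of how rules like these could be evaluated, written in Python only to keep it self-contained. It assumes that DependentClassIdentifier lists the classes of ancestor pages to clear and that the "object" method clears the published object's own page; the "relating" method is left out. The page names (`new-article`, `news`, `home`) and the function `pages_to_purge` are hypothetical, and this is not the eZ Publish implementation.

```python
# Rules transcribed from the example above: content class -> ancestor classes to clear.
DEPENDENT = {
    "folder": {"frontpage"},
    "article": {"folder", "frontpage"},
}


def pages_to_purge(published, ancestors):
    """Return the names of pages whose cache should be cleared.

    published -- (name, class_identifier) of the object just published
    ancestors -- [(name, class_identifier), ...] from the parent up to the root
    """
    name, cls = published
    purge = [name]  # "object" method: the object's own page
    wanted = DEPENDENT.get(cls, set())
    # Assumption: only ancestors whose class is listed as dependent are cleared.
    purge += [n for n, c in ancestors if c in wanted]
    return purge


result = pages_to_purge(
    ("new-article", "article"),
    [("news", "folder"), ("home", "frontpage")],
)
print(result)
```

Publishing an article here clears the article, its folder and the frontpage, and nothing else.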
Symfony
• Powerful and modern PHP framework
• Separated in many components
• Big community
• Excellent documentation
• Supported by Sensio Labs
• Latest version: 2.5.5
HTTP Cache
• Part of RFC 2616 (HTTP/1.1)
• Two models of cache handling:
  - Expiration: “Cache-Control”, “Expires”
  - Validation: “ETag”, “Last-Modified” (not implemented in eZ 5 yet as it doesn’t behave very well in multi-user scenarios)
Expiration with Cache-Control
• setting the Cache-Control directive to public instead of private so that pages are cached in the reverse proxy (private is the default in Symfony)
• setting the max-age directive (or, specifically for shared caches, s-maxage) to a large value (days or even more)
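To make the effect of these directives concrete, here is a small Python sketch of how a shared cache could decide how long it may keep a response. It is a simplification: the function names are made up for illustration, quoted directive values are not handled, and a response without explicit freshness is treated as not cacheable, although real caches may apply heuristics.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control header into a {directive: value-or-True} dict."""
    directives = {}
    for part in header.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name] = value if value else True
    return directives


def shared_cache_ttl(header: str) -> int:
    """Seconds a shared cache (reverse proxy) may keep the response; 0 = do not store."""
    cc = parse_cache_control(header)
    # private and no-store responses must not be stored by a shared cache.
    if "private" in cc or "no-store" in cc:
        return 0
    # s-maxage takes precedence over max-age for shared caches.
    for name in ("s-maxage", "max-age"):
        if name in cc and cc[name] is not True:
            return int(cc[name])
    return 0  # simplification: no heuristic freshness


print(shared_cache_ttl("private, max-age=600"))
print(shared_cache_ttl("public, s-maxage=86400, max-age=60"))
```

The first response stays out of the proxy entirely; the second may be kept there for a day while browsers only keep it for a minute.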
Cache miss
credits to http://tomayko.com/writings/things-caches-do
Cache hit
credits to http://tomayko.com/writings/things-caches-do
In Symfony?
use Symfony\Component\HttpFoundation\Response;
$response = new Response();
// mark the response as either public or private
$response->setPublic();
$response->setPrivate();
// set the private or shared max age
$response->setMaxAge(600);
$response->setSharedMaxAge(600);
But what about invalidation?
• as ETag validation is not implemented yet, a different mechanism is used to invalidate the cache
• the custom header X-Location-Id: 123 is set to mark the page with location id 123
• the eZ content caching mechanism sends an HTTP PURGE request to the reverse proxy when it's necessary to invalidate a specific cache
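The idea can be sketched with a toy cache in Python. This only illustrates tagging entries with a location id and dropping them on purge; the class `TaggedCache` and its methods are hypothetical, and how a real reverse proxy matches a PURGE against the X-Location-Id header is an assumption here.

```python
class TaggedCache:
    """Toy reverse-proxy cache: entries keyed by URL, tagged by X-Location-Id."""

    def __init__(self):
        self._entries = {}  # url -> (location_id, body)

    def store(self, url, headers, body):
        # Remember the location id from the response headers next to the body.
        self._entries[url] = (headers.get("X-Location-Id"), body)

    def get(self, url):
        entry = self._entries.get(url)
        return entry[1] if entry else None

    def purge_location(self, location_id):
        """Drop every entry tagged with location_id; return the number purged."""
        doomed = [u for u, (loc, _) in self._entries.items() if loc == location_id]
        for url in doomed:
            del self._entries[url]
        return len(doomed)


cache = TaggedCache()
cache.store("/news", {"X-Location-Id": "123"}, "<html>news</html>")
cache.store("/news?page=2", {"X-Location-Id": "123"}, "<html>news 2</html>")
cache.store("/about", {"X-Location-Id": "456"}, "<html>about</html>")
purged = cache.purge_location("123")
print(purged, cache.get("/about"))
```

Tagging by location id rather than by URL means one purge can clear every cached variant of the same content while leaving unrelated pages untouched.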
How eZ does it?
# stock ViewController::viewLocation()
$response->setPublic();
if ( $this->getParameter( 'content.ttl_cache' ) === true )
{
    $response->setSharedMaxAge( $this->getParameter( 'content.default_ttl' ) );
}
if ( $request->headers->has( 'X-User-Hash' ) )
{
    $response->setVary( 'X-User-Hash' );
}
$response->headers->set( 'X-Location-Id', $locationId );
Reverse proxy becomes very important
• Reverse proxy AKA web accelerator or gateway cache
• preferred with Symfony and eZ Publish: Varnish Cache
Varnish Cache
• HTTP reverse proxy (web accelerator)
• Performant and configurable
• Supports most important parts of ESI
• Supported by Varnish Software
• Latest version: 4.0
Varnish Cache setup
• we need to support the PURGE call
# Varnish 3.x VCL syntax
sub vcl_recv {
    if (req.request == "PURGE") { return(lookup); }
}
sub vcl_hit {
    if (req.request == "PURGE") {
        set obj.ttl = 0s;
        error 200 "Purged";
    }
}
sub vcl_miss {
    if (req.request == "PURGE") { error 404 "Not purged"; }
}
Part 2
• How to cache parts of content pages and invalidate when necessary:
  - use a powerful CMS
  - with a good framework
  - and a fast reverse proxy
  - which all support ESI :)
Block caching in eZ
• Repeated parts of a page need special attention (header, menus, footer, sidebars, etc.)
• In eZ version 4 cache blocks were stored in local files
• In eZ version 5 it uses the same HTTP Cache principles but with ESI powered fragments
Fragments in Symfony
• Make sub-requests callable via a URI
• Trust only certain IPs (reverse proxies) to call it
• Could be used directly from web clients (from JavaScript with hinclude.js) but then the call must be signed
# app/config/config.yml
framework:
    esi: { enabled: true }
    fragments: { path: /_fragment }
    trusted_proxies: [127.0.0.1, 10.0.0.0/8]
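What "signed" means for a fragment call can be sketched with a generic HMAC signature in Python. This shows the general technique only, not Symfony's exact algorithm; the secret, the `_hash` parameter name as used here and the helper names are assumptions for illustration.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical secret known only to the application


def _signature(uri: str) -> str:
    return hmac.new(SECRET, uri.encode("utf-8"), hashlib.sha256).hexdigest()


def sign(uri: str) -> str:
    """Append a signature parameter computed over the whole URI."""
    sep = "&" if "?" in uri else "?"
    return f"{uri}{sep}_hash={_signature(uri)}"


def check(signed_uri: str) -> bool:
    """Accept the URI only if its trailing signature matches its content."""
    uri, marker, given = signed_uri.rpartition("_hash=")
    if not marker or not uri or uri[-1] not in "?&":
        return False
    # Strip the separator that sign() added, then compare in constant time.
    return hmac.compare_digest(given, _signature(uri[:-1]))


signed = sign("/_fragment?_path=foo%3Dbar")
print(check(signed), check(signed.replace("foo", "baz")))
```

A client that changes any part of the fragment URI ends up with a signature that no longer matches, so the application can refuse to render it.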
Edge Side Includes
• Markup protocol for proxies to cache partial pages
• Specified by Akamai
• Implemented by many proxies, including Varnish
• The simplest form:
<esi:include src="http://foo.bar/1.html"/>
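What the proxy does with such a tag can be sketched in a few lines of Python. This handles only the self-closing include form with a double-quoted src; the regular expression, the `assemble` function and the in-memory `fragments` dict standing in for the fragment cache are all illustrative assumptions, not how Varnish is implemented.

```python
import re

# Matches only the simplest self-closing form with a double-quoted src.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(page: str, fetch_fragment) -> str:
    """Replace each <esi:include> tag with the fragment returned for its src."""
    return ESI_INCLUDE.sub(lambda m: fetch_fragment(m.group(1)), page)


# Stand-in for the proxy's fragment cache (or a backend request on a miss).
fragments = {"http://foo.bar/1.html": "<ul><li>Latest news</li></ul>"}
page = '<body><esi:include src="http://foo.bar/1.html"/><p>Article</p></body>'
html = assemble(page, fragments.__getitem__)
print(html)
```

The page and each fragment are separate cache entries, so each can have its own TTL and be purged on its own.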
Include ESI block
• render a sub-request as ESI in Symfony (ESI markup is generated only if the request announces ESI support):
{{ render_esi( controller( 'aController:someAction', { 'foo': 'bar' } )) }}
Cache control on block level
• as each block is a sub-request we can set different headers (private or public, TTL, etc.)
• this can be per block type (from settings) or even per block (parameter set by the editor)
• where it makes sense, the X-Location-Id header is set automatically so the cache can be invalidated
Varnish Cache setup
• when Varnish requests a page from the web server, it needs to announce that it wants ESI markup instead of fully rendered HTML
sub vcl_recv {
    // Add a header to announce ESI support.
    set req.http.Surrogate-Capability = "abc=ESI/1.0";
}
Result?
• common blocks have a long TTL and are rarely executed on the backend
• they are mostly just embedded from cache when the reverse proxy builds the page from ESI tags
• different cache strategies are possible for different pages, zones and blocks across the whole site
Caution
• A rule of thumb is not to have more than 10 sub-requests in one page render
• As there might be a lot of blocks on the page, we create zones of blocks that are rendered as sub-requests with a small TTL (that way we spread the sub-requests over more page renderings)
Achieved scenario
• When an editor creates some new content (article) in a news section (folder), the CMS will automatically purge the cache of the folder and the latest news block on the frontpage so that the new article shows up instantly on both pages without touching other caches
Questions now or later
ilukac.com/twitter
ilukac.com/facebook
ilukac.com/gplus
ilukac.com/linkedin