CI_CONF 2012: Scaling - Chris Miller

Going Big: Scalability

Upload: the-huffington-post-tech-team

Post on 14-Aug-2015

Page 1: CI_CONF 2012: Scaling - Chris Miller

Going Big: Scalability

Who am I?

• Chris Miller

• Huffington Post - Senior Developer
• CMS platform and API

• Started in systems/network admin before code

What is Huffington Post?

• #87 most popular site in the world (Alexa)
• #3 most popular news site in the world (Alexa)
• #19 most popular US site (Alexa)

• More traffic than nytimes.com

Our Platform: Today

• Everything! No, really.

• Perl: CMS core
• PHP “layer” integrated on top of Perl code
• MySQL data storage
• MongoDB for comments storage
• Hadoop for internal statistical analysis
• Memcache for lightweight caching
• Redis for more structured data types
• Varnish for caching!

Our Platform: Tomorrow

• Re-think tools and platform from ground up
• Building new API
  – Yes, OAuth 2.0!
  – Complete REST approach
  – Will be public!
• We can’t re-write everything at once, so the API build has 4 phases:
  – Build “bridge” middleware to allow access to existing functionality
  – Refactor backend edit/admin tools
  – Refactor frontend to use API
  – Transparently, and calmly, refactor old code while maintaining API interfaces

So what about CI?

• New API is built on CodeIgniter
  – Using Phil’s REST library as a starting point

• Thanks Phil!

• Backend editorial tools are being built on CI

• We love CI
  – But it isn’t our only framework
  – Different tools work better for different teams
  – We use what works. You should too.

How we scale

• CDN: Akamai
• 80%+ hit rate
• Amazon S3 for origin of static files

• Basic page layout/content is generated to flat file
• These contain some dynamic content, in PHP
• By having the basic page as a flat file, it's less overhead to load
• It also means for certain changes, we have to "regenerate" the page. Ugh.
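
The generate-once, regenerate-on-change idea can be sketched roughly as follows. This is only an illustration in Python, not the Huffington Post implementation (which the slides describe as Perl and PHP): the function names, the `.html` suffix, and the `render` callback are all assumptions made for the example, and the sketch ignores the dynamic PHP fragments mentioned above.

```python
from pathlib import Path
from typing import Callable


def page_path(root: Path, slug: str) -> Path:
    """Location of the flat file for a page (hypothetical layout)."""
    return root / f"{slug}.html"


def regenerate(root: Path, slug: str, render: Callable[[str], str]) -> Path:
    """Rebuild the flat file; called whenever the page content changes."""
    path = page_path(root, slug)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(render(slug), encoding="utf-8")
    tmp.replace(path)  # swap the new file into place in one step
    return path


def serve(root: Path, slug: str, render: Callable[[str], str]) -> str:
    """Serve from the flat file, rendering only if it does not exist yet."""
    path = page_path(root, slug)
    if not path.exists():
        regenerate(root, slug, render)
    return path.read_text(encoding="utf-8")
```

The cost of rendering is paid once per change instead of once per request, which is the trade-off the slide describes: cheap loads, at the price of an explicit regeneration step.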

Varnish

• HTTP caching reverse proxy (“HTTP Accelerator”)

• Caching layer in front of your web server

• Stores complete responses in memory
• If request exists, serves from memory
  – Otherwise, forwards to web server, and then caches

• Works nicely with Linux Kernel to delegate memory allocation and management to the OS, where it belongs
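
The lookup-then-forward behaviour in the bullets above can be modelled in a few lines of Python. This is a toy sketch, not how Varnish is implemented: the `TinyCache` class, the `fetch_from_backend` callable, and the 30-second default TTL are assumptions chosen for the example.

```python
import time
from typing import Callable, Dict, Tuple


class TinyCache:
    """Toy in-memory response cache (hypothetical, for illustration only)."""

    def __init__(self,
                 fetch_from_backend: Callable[[str], str],
                 default_ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_backend
        self._ttl = default_ttl
        self._clock = clock
        # url -> (expiry time, cached response body)
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: serve from memory
        # Cache miss or expired entry: forward to the web server, then cache.
        body = self._fetch(url)
        self._store[url] = (now + self._ttl, body)
        return body
```

Repeated calls to `get()` for the same URL within the TTL reach the backend only once, which is where the load reduction comes from.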

Controlling Varnish

• Set custom TTLs for content:

if (beresp.http.X-HP-Cache-Control ~ "s-maxage") {
    // Reduce the header to just the number of seconds.
    set beresp.http.X-HP-Cache-Control = regsub(beresp.http.X-HP-Cache-Control, "^.*s-maxage=([0-9]+).*", "\1");
    // set the ttl.
    C{
        char *ttl;
        /* "\023" is the octal length (19) of "X-HP-Cache-Control:" */
        ttl = VRT_GetHdr(sp, HDR_BERESP, "\023X-HP-Cache-Control:");
        VRT_l_beresp_ttl(sp, atoi(ttl));
    }C
    set beresp.http.X-Cacheable = "CUSTOM: " + beresp.ttl;
} elsif (beresp.http.X-HP-Cache-Control ~ "(no-cache|private)" || beresp.http.pragma ~ "no-cache") {
    set beresp.ttl = 0s;
    set beresp.http.X-Cacheable = "NO-CACHE";
} else {
    set beresp.http.X-Cacheable = "DEFAULT: 30s";
    set beresp.ttl = 30s;
}
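
For readers less familiar with VCL, the three branches above can be restated as a small Python function. This is a sketch of the decision logic only: `choose_ttl` is a hypothetical name, and the exact text of the `CUSTOM` label is an approximation of what Varnish would emit.

```python
import re
from typing import Optional, Tuple

DEFAULT_TTL = 30  # seconds, matching the fallback branch of the VCL


def choose_ttl(cache_control: Optional[str],
               pragma: Optional[str] = None) -> Tuple[int, str]:
    """Return (ttl_seconds, X-Cacheable label) for a backend response.

    cache_control stands for the X-HP-Cache-Control response header.
    """
    cache_control = cache_control or ""
    pragma = pragma or ""
    if "s-maxage" in cache_control:
        match = re.search(r"s-maxage=([0-9]+)", cache_control)
        # Assumption: without digits, treat the TTL as 0, similar to atoi()
        # on a non-numeric string.
        ttl = int(match.group(1)) if match else 0
        return ttl, f"CUSTOM: {ttl}"
    if re.search(r"no-cache|private", cache_control) or "no-cache" in pragma:
        return 0, "NO-CACHE"
    return DEFAULT_TTL, f"DEFAULT: {DEFAULT_TTL}s"
```

A backend that sends `X-HP-Cache-Control: s-maxage=300` gets a five-minute TTL, while a response with no such header falls back to 30 seconds.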

Controlling Varnish

• Refreshing content

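sub process_refresh_requests {
    // Treat the custom REFRESH method as a GET that skips the cached copy,
    // so a fresh response is fetched from the backend and cached.
    if (req.request == "REFRESH") {
        set req.request = "GET";
        set req.hash_always_miss = true;
    }
}

• This is invoked early in the vcl_recv method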
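
From the client side, triggering this refresh only requires sending the custom HTTP method. A minimal Python sketch, assuming the cache is reachable at a URL of your choosing (the slides do not say how the Huffington Post tools actually issue the request, and `build_refresh_request` is a hypothetical helper):

```python
import urllib.request


def build_refresh_request(url: str) -> urllib.request.Request:
    """Build a request using the custom REFRESH method handled in the VCL."""
    return urllib.request.Request(url, method="REFRESH")


def refresh(url: str, timeout: float = 5.0) -> int:
    """Send the REFRESH request and return the HTTP status code."""
    with urllib.request.urlopen(build_refresh_request(url), timeout=timeout) as resp:
        return resp.status
```

Because the VCL rewrites the method to GET, the web server behind the cache sees an ordinary request.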

Edge Side Includes

• Include cached content blocks into pages

<html><body>
<esi:include src="http://example.com/my_page1.html" alt="http://example.com/my_page2.html" onerror="continue" />
</body></html>

Edge Side Includes

• How to use ESI:
  – Make complicated blocks independently-accessible URIs
  – Create a “template” file with ESI includes to bring the page together
• Why this is powerful
  – If multiple pages use different combinations of page components, some may already be cached
  – Reduces the number of times the entire page must be served; serve only the components needed
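
To make the template-plus-fragments idea concrete, here is a rough Python model of how an ESI processor could assemble a page. It is a simplification, not Varnish's ESI engine: `render_esi` and the `fetch` callback are hypothetical, and only the `src`, `alt`, and `onerror` attributes from the example slide are handled.

```python
import re
from typing import Callable

# Matches self-closing <esi:include ... /> tags and captures the attributes.
ESI_TAG = re.compile(r"<esi:include\s+([^>]*?)/>")
ATTR = re.compile(r'([\w:-]+)="([^"]*)"')


def render_esi(template: str, fetch: Callable[[str], str]) -> str:
    """Replace each ESI include with the fragment returned by fetch(url)."""

    def replace(match: "re.Match[str]") -> str:
        attrs = dict(ATTR.findall(match.group(1)))
        # Try the primary source first, then the alternative.
        for key in ("src", "alt"):
            url = attrs.get(key)
            if url is None:
                continue
            try:
                return fetch(url)
            except Exception:
                continue
        if attrs.get("onerror") == "continue":
            return ""  # drop the block silently
        raise RuntimeError("ESI include failed: " + attrs.get("src", "?"))

    return ESI_TAG.sub(replace, template)
```

If `fetch` is backed by a cache, a fragment shared by many templates is produced once and reused by every page that includes it.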

Varnish Tricks

• Intelligently purge the cache when your content changes
  – Allows you to increase TTL without fear of caching outdated content

if (req.request == "PURGE") {
    if (!client.ip ~ purgers) {
        error 405 "Method not allowed";
    }
    return (lookup);
}
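
The access check in this snippet can be expressed in Python as follows. This is a sketch of the logic only: the networks listed in `PURGERS` are made-up examples standing in for the `purgers` ACL, whose real contents are not shown in the slides, and `check_purge` is a hypothetical name.

```python
import ipaddress
from typing import Iterable, Optional, Tuple

# Hypothetical stand-in for the "purgers" ACL referenced by the VCL.
PURGERS = ("127.0.0.1/32", "10.0.0.0/8")


def check_purge(method: str, client_ip: str,
                purgers: Iterable[str] = PURGERS) -> Optional[Tuple[int, str]]:
    """Return None for non-PURGE requests, else a (status, reason) pair."""
    if method != "PURGE":
        return None  # not a purge; handle as a normal request
    addr = ipaddress.ip_address(client_ip)
    allowed = any(addr in ipaddress.ip_network(net) for net in purgers)
    if not allowed:
        return 405, "Method not allowed"
    return 200, "Purge accepted"
```

Restricting PURGE to known addresses matters because anyone able to purge can force every request through to the backend.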

Other Scaling Tips

• Hardware SSL offloading is your friend
• Consider mod_php
  – CGI has huge overhead
  – CGI/SuExec has huge security advantages
  – FastCGI is a happy-medium for some

Other Scaling Tips

• Don’t try to do everything on one server/cluster
  – Splitting your application is ok
  – 1 cluster for frontend, 1 server/cluster for backend, etc.

• Keep an open mind about technologies, platforms, and tools

One More Thing…

(sorry, I couldn’t resist)

Guilds!

• What a guild is:
  – Groups of people around a topic
  – Membership/participation is encouraged, but not required
  – Think of it as an internal Meetup

• Join to learn new things
• Join to talk about things you are interested in

• Examples: PHP, Front End, Python, Ruby, Management, Platform/Architecture, Big Data, etc…

Guilds!

• Experts to solve technology-specific problems
  – Example: Front-end SWAT team to improve page load time when JS is slow or there is too much of it

• Collectively give back to the community around your technology

• Help others learn, and learn from others

• Meet people on other teams

Guilds!

• Try it out

¿Preguntas?

Questions?

Perguntas?

Chris Miller

[email protected]

@ee99ee

(P.S. – We’re hiring in NYC)