improving the performance of your web app
DESCRIPTION
These are the slides from my FOWA workshop on how to scale your web apps.
TRANSCRIPT
Improving the Performance of your Web Application
Joe Stump, Lead Architect, Digg.com
Introductions
Users want access to all of their crap at all times. I, personally, don’t find your dog funny or cute, but I’ll be damned if I’m the one who’ll stand in the way of you posting it and others consuming it.
“Web 2.0 sucks (for scaling).”
Joe Stump, Lead Architect, Digg.com
Backend Considerations
Language considerations
Scaling out
Caching strategies
Content storage and delivery
Parallel data requests
Near time data processing
Partitioning data
Frontend Considerations
Reduce HTTP requests
Avoid inline JavaScript and CSS
Compression and Minification
Learn to love HTTP/1.1
“PHP doesn’t scale.”
Cal Henderson, Director of Development, Flickr.com
Languages don’t scale
Bytecode caching (PHP, Python, etc.)
Robust library & driver support
Active developer communities
What language do you use?
Why?
Does it help you or hurt to use it?
Discussion!
Your mom lied; don’t share.
Decentralize data, storage, processing, etc.
Increased redundancy
Scaling becomes simple; add more boxes
Scaling Up
Scaling Out
How do I scale easily?
1. Caching
2. Caching
3. Caching!
What are my options?
Disk based caching (e.g. Cache_Lite)
In memory caching (e.g. APC, Memcached)
Cloud caching (e.g. MogileFS, S3)
Disk based caching
Stupid simple
Cheap
Fairly easy to scale out
Dynamic images
Slower than others
Use fast disks!
RAM disks are faster
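To make the "stupid simple" point concrete, here is a minimal Python sketch of a file-per-key disk cache. It is not Cache_Lite's API; the class name `DiskCache`, the `.cache` suffix, and the TTL default are all assumptions made up for illustration.

```python
import hashlib
import os
import pickle
import tempfile
import time


class DiskCache:
    """Tiny file-per-key cache; names and layout are illustrative only."""

    def __init__(self, directory, ttl_seconds=300):
        self.directory = directory
        self.ttl_seconds = ttl_seconds
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the key so any string maps to a safe file name.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".cache")

    def get(self, key):
        path = self._path(key)
        try:
            age = time.time() - os.path.getmtime(path)
            if age > self.ttl_seconds:
                return None  # expired; the caller regenerates the value
            with open(path, "rb") as handle:
                return pickle.load(handle)
        except OSError:
            return None  # a missing or unreadable file counts as a miss

    def set(self, key, value):
        # Write to a temp file, then rename, so readers never see a partial file.
        fd, tmp_path = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(value, handle)
        os.replace(tmp_path, self._path(key))
```

Pointing `directory` at a RAM disk is one way to act on the "RAM disks are faster" bullet without changing any code.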
APC (PHP)
Bytecode caching
In memory user cache
Insanely fast
Not centralized or shared
Memcache
If you’re not using this you’re crazy
Easy to set up and use
Insanely fast over the network
Scales to insane heights
Failover, widely supported, etc.
Centralized and shared across site
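A common way to use a shared cache like this is the "check the cache, fall back to the source, store the result" pattern. The sketch below shows that pattern in Python under stated assumptions: `InMemoryStandIn` is a hypothetical stand-in so the example runs without a server, and `cached_fetch` is an invented helper name, not part of any Memcached client.

```python
import time


class InMemoryStandIn:
    """Stand-in for a shared cache client so the sketch runs without a server."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._items[key]  # drop the stale entry
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, time.time() + ttl_seconds)


def cached_fetch(cache, key, loader, ttl_seconds=60):
    """Return the cached value, or call loader() once and cache its result."""
    value = cache.get(key)
    if value is None:
        value = loader()  # e.g. the expensive database query
        cache.set(key, value, ttl_seconds)
    return value
```

In a real deployment the stand-in would be replaced by a client talking to the shared cache servers, which is what makes the cached value visible to every web box.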
MogileFS
File and data store
Runs over WebDAV
Scales out infinitely (in theory)
Serialize data, store in file
Centralized and shared across site
Amazon S3
File and data store
Runs over HTTP
Scales out infinitely (in theory)
Serialize data, store in file
Centralized and shared across site
Costs money
Widely supported in all languages
Check out ThruDB
Are you using caching?
Why not?
If so, what’s your strategy?
Discussion!
Content Storage/Delivery
What are your storage needs?
Is it critical YOU store them?
How costly is it to store in-house?
Can you do it for free? (YAY! Mooching!)
i can has free storage?
YouTube for video
Scribd for documents
Flickr for images
Cloud Services (S3)
Simple to get up and running
No hardware maintenance
Costs money, but not as much as you think
NFS
Simple to set up and get running
Costs money, requires colocation, etc.
Does. Not. Scale.
Did I mention it doesn’t scale?
Stopgap solution at best
MogileFS
Somewhat complicated to set up
Costs money, requires colocation, etc.
Scales exceptionally well
Used at Digg, LiveJournal, others
Check out File_Mogile by Digg (PEAR)
Roll Your Own
File storage IS your business
Highly specialized and customized
Costs money, requires colocation, etc.
Last resort
CDN
Completely outsource it
Costs a ton of money
Out of your control
Scales and scales and scales
What are you using for storage?
What’s worked for you?
What’s failed epically?
Discussion!
Parallel Data Requests
Access your data in parallel
Make data access asynchronous (WHAT?!)
Loosely couple your data access layer
All for the low, low price of FREE!*
*Offer only available for hardcore nerds looking for street cred.
HTTP
Parallel
Asynchronous
Non-blocking
Loosely coupled
Free foot massages!
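As a rough illustration of fetching page data in parallel rather than one call after another, here is a Python sketch that uses a thread pool. The helper names `fetch_all` and `slow_call` are hypothetical; `slow_call` just sleeps to stand in for an HTTP request to an internal data service.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fetch_all(fetchers, max_workers=8):
    """Run every zero-argument fetcher at once and return results by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
        # result() blocks until that particular call has finished.
        return {name: future.result() for name, future in futures.items()}


def slow_call(value, delay=0.2):
    """Hypothetical stand-in for an HTTP request to an internal service."""
    time.sleep(delay)
    return value


if __name__ == "__main__":
    started = time.time()
    page_data = fetch_all({
        "user": lambda: slow_call({"id": 1}),
        "stories": lambda: slow_call(["a", "b"]),
        "comments": lambda: slow_call([]),
    })
    # The three calls overlap, so the total is close to one delay, not three.
    print(page_data, round(time.time() - started, 2))
```

Because each fetcher is just a callable, the page code does not need to know where the data lives, which is the loose coupling the slide asks for.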
Gearman
Parallel
Asynchronous
Scales well
Which format to use for exchange?
Anyone doing this already?
Amazon, Google, Yahoo!
Discussion!
Near time processing
Does this need to be done NOW?
Offload to background processes
Offloading must be a no-op
Feeds, Facebook, crawling, etc.
Cron
Run every minute or two
Simple
Great for batch jobs
Not decentralized, locking issues
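One common way to handle the locking issue on a single box is to have each run grab a non-blocking file lock and exit if a previous run still holds it. The Python sketch below assumes a Unix-like system (it uses `fcntl.flock`); the function name `try_lock` and the lock file path are hypothetical.

```python
import fcntl
import os
import tempfile


def try_lock(lock_path):
    """Return an open handle holding the lock, or None if another run has it."""
    handle = open(lock_path, "w")
    try:
        # LOCK_NB makes this fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle  # keep it open; closing it releases the lock


# Hypothetical lock location for a batch job.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "batch_job.lock")
```

This only protects jobs on one machine, so it does nothing for the "not decentralized" half of the bullet.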
Gearman
Fire and forget
Simple
Scales well
Digg Images
Nearly instant
Decentralized
No guarantees
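The fire-and-forget idea can be sketched in Python with an in-process queue and a worker thread. This is not Gearman's API; it only shows the shape of the pattern, and the job names and payloads are invented for illustration.

```python
import queue
import threading

jobs = queue.Queue()
processed = []


def worker():
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        try:
            name, payload = job
            processed.append((name, payload))  # real work would go here
        finally:
            jobs.task_done()


def submit_background(name, payload):
    """Fire and forget: enqueue the job and return immediately."""
    jobs.put((name, payload))


thread = threading.Thread(target=worker, daemon=True)
thread.start()

# The web request returns right after these calls; the worker catches up later.
submit_background("resize_image", {"story_id": 123})
submit_background("update_feed", {"user_id": 7})
```

Like the "No guarantees" bullet warns, nothing here survives a crash: queued jobs live only in memory, so the caller must be fine with a job occasionally never running.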
Queues
Grid Engine by Sun
Starling by Twitter
Others?
Amazon’s EC2
Near limitless computing resources
Remember; don’t share
Awesome for bots, crawling, etc.
http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/
What’s low(er) priority?
Where would you implement this?
Discussion!
Partitioning Data
Horizontal v. Vertical
Not all data lives in a single place
Hash records to partitions
App smart / logical sharding
Horizontal
Users: id int(11), username char(15), password char(15), email char(45)
Same table on each host: 192.168.0.1, 192.168.0.2, 192.168.0.3
Hashing your data
oh hai! were’s mai dataz?!
How?
Put 10,000 users per partition
Partition users alphabetically
Partition home listings by zip code
Partition products by SKU
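Here is a small Python sketch of two of these schemes: fixed-size ranges of user ids, and a stable hash for string keys such as SKUs or zip codes. The function names are hypothetical, and mapping partitions onto the three hosts by wrapping around is an assumption made so the example stays small.

```python
import zlib

USERS_PER_PARTITION = 10_000
# Hosts taken from the slide's diagram; the order is an assumption.
PARTITION_HOSTS = ["192.168.0.1", "192.168.0.2", "192.168.0.3"]


def partition_by_range(user_id):
    """Users 1-10,000 land in partition 0, 10,001-20,000 in partition 1, ..."""
    return (user_id - 1) // USERS_PER_PARTITION


def partition_by_hash(key, partitions=len(PARTITION_HOSTS)):
    """Stable hash of any string key (username, SKU, zip code) to a partition."""
    return zlib.crc32(key.encode("utf-8")) % partitions


def host_for_user(user_id):
    # Wrap around so the sketch works with only three hosts.
    return PARTITION_HOSTS[partition_by_range(user_id) % len(PARTITION_HOSTS)]
```

`zlib.crc32` is used instead of Python's built-in `hash()` because the built-in is randomized per process for strings, which would send the same key to different partitions on different web boxes.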
Vertical
Users: id int(11), username char(15), password char(15), email char(45)
UsersPrf: id int(11), fname char(50), lname char(50), url char(255)
UsersStg: id int(11), cmts_pg tinyint(2), cmts_lvl tinyint(1), cmts_prf tinyint(1)
Hosts: 192.168.0.1, 192.168.0.2, 192.168.0.3
Why?
Avoid altering large tables
Save time during insert
Many small tables v. one large table
Lazy loading of rarely used data
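The lazy loading point might look something like this in application code: the core Users row is loaded up front, and the profile row is fetched only the first time something asks for it. This Python sketch is illustrative only; the `User` class and `load_profile_row` are hypothetical, and the loader returns canned data instead of querying UsersPrf.

```python
class User:
    """Core row is loaded eagerly; the profile row only when first accessed."""

    def __init__(self, user_id, username, load_profile):
        self.id = user_id
        self.username = username
        self._load_profile = load_profile  # callable that reads the UsersPrf table
        self._profile = None

    @property
    def profile(self):
        if self._profile is None:
            # First access: go to the profile partition and remember the row.
            self._profile = self._load_profile(self.id)
        return self._profile


queries = []


def load_profile_row(user_id):
    """Hypothetical stand-in for a SELECT against UsersPrf by id."""
    queries.append(user_id)
    return {"fname": "Ada", "lname": "Example", "url": "http://example.com"}
```

Pages that only need the username never touch the profile partition at all.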
Natural partitions in your data?
How would you hash your data?
Discussion!
Reduce HTTP Requests
Bundle JavaScript and CSS
Use sprites for images
Reduce images / outside objects
Avoid inline JS/CSS
External = Cached
Inline = Not Cached
Compression / Minify
Enable Gzip compression sitewide
Use minification software on JS
jQuery/Prototype Minified
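To get a feel for why Gzip is worth turning on, this short Python sketch compresses a chunk of repetitive JavaScript and compares sizes. The sample script is made up; in practice the web server does this for you once compression is enabled.

```python
import gzip

# Repetitive text such as JS or CSS compresses very well.
script = ("function add(a, b) { return a + b; }\n" * 200).encode("utf-8")
compressed = gzip.compress(script)
restored = gzip.decompress(compressed)  # lossless round trip

print(len(script), "bytes before,", len(compressed), "bytes after")
```

Minifying first and compressing second tends to stack, since minification removes bytes that Gzip would otherwise still have to encode.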
Learn to Love HTTP/1.1
Cache-Control: public/private
Connection: close
Expires: Thu, 28 Feb 2008 16:00:00 GMT
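As a sketch of how an app might generate these headers for static assets, the Python below builds `Cache-Control` and `Expires` values. The helper name `caching_headers` is hypothetical, and adding a `max-age` directive is an assumption that goes beyond what the slide shows.

```python
from email.utils import formatdate
import time


def caching_headers(max_age_seconds, shared=True, now=None):
    """Build Cache-Control and Expires values for a static asset."""
    now = time.time() if now is None else now
    visibility = "public" if shared else "private"
    return {
        "Cache-Control": f"{visibility}, max-age={max_age_seconds}",
        # usegmt=True yields the HTTP date format, ending in "GMT".
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```

Use `shared=False` for anything user-specific so intermediate proxies do not hand one user's page to another.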
Conclusions
Share nothing, decentralize, redundancy
Caching, caching, caching, caching
Reduce, recycle and reuse
Resources
High Performance Web Sites: Essential Knowledge for Front-End Engineers
by Steve Souders
Serving JavaScript Fast
http://www.thinkvitamin.com/features/webapps/serving-javascript-fast
by Cal Henderson, Director of Development, Flickr.com
Questions?!
Contact/Flame Me
Joe Stump
[email protected]
joestump.net