PyCon 2011 Scaling Disqus


Upload: zeeg

Post on 08-Sep-2014


DESCRIPTION

Disqus talks about how they scale their Python web application to over 500 million visitors a month. Video is available here: http://pycon.blip.tv/file/4880330/

TRANSCRIPT

Page 1: PyCon 2011 Scaling Disqus

DISQUS

Jason Yan @jasonyan

David Cramer @zeeg

Python at 400 500 million visitors

Got feedback? Use hashtag #sckrw

Sunday, March 13, 2011

Page 2: PyCon 2011 Scaling Disqus

Agenda

• What is DISQUS?

• An Overview of the Infrastructure

• Iterative Development and Deployment

• Why We Love Python

Page 3: PyCon 2011 Scaling Disqus

What is DISQUS?

dis·cuss • dĭ-skŭs'

We are a comment system with an emphasis on connecting communities

http://disqus.com/about/

Page 4: PyCon 2011 Scaling Disqus

Embeddable Comments


Page 5: PyCon 2011 Scaling Disqus

A Brief History


Page 6: PyCon 2011 Scaling Disqus

Startup-ish

• Founded just about 4 years ago

• 16 employees, 8 engineers

• Traffic increasing 15-20% a month

• Flat organizational structure, every engineer is a product manager

• Fast turnaround, new feature launches every week (sometimes daily)

Page 7: PyCon 2011 Scaling Disqus

Traffic

[Chart: number of visitors, March 2008 through March 2011; y-axis from 0M to 500M]

Page 8: PyCon 2011 Scaling Disqus

DjangoCon 2010

• 17,000 requests/second peak

• 450,000 websites

• 15 million profiles

• 75 million comments

• 250 million visitors


Page 9: PyCon 2011 Scaling Disqus

Six Months Later

• 25,000 requests/second peak (up from 17,000)

• 700,000 websites (up from 450,000)

• 30 million profiles (up from 15 million)

• 170 million comments (up from 75 million)

• 500 million visitors (up from 250 million)

Page 10: PyCon 2011 Scaling Disqus

Six Months Later

• September 2010: 250 million uniques

• March 2011: 500 million uniques

• Handling over 2x the traffic
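As a rough check on these numbers, doubling from 250 million to 500 million uniques in six months works out to about 12% compound growth per month, a little under the 15-20% a month quoted earlier in the talk. The sketch below only does that arithmetic; the function names are illustrative and not from the talk.

```python
import math


def implied_monthly_growth(start, end, months):
    """Constant monthly growth rate that takes `start` to `end` in `months`."""
    return (end / start) ** (1.0 / months) - 1.0


def months_to_double(monthly_rate):
    """How many months a constant monthly growth rate needs to double traffic."""
    return math.log(2) / math.log(1.0 + monthly_rate)


if __name__ == "__main__":
    # September 2010 -> March 2011, figures taken from the slides.
    rate = implied_monthly_growth(250e6, 500e6, 6)
    print(f"implied monthly growth: {rate:.1%}")
    for quoted in (0.15, 0.20):
        print(f"{quoted:.0%} a month doubles in {months_to_double(quoted):.1f} months")
```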

Page 11: PyCon 2011 Scaling Disqus

Six Months Later

• September 2010: ~100 servers

• March 2011: ~100 servers

• Scale diagonally

Page 12: PyCon 2011 Scaling Disqus

Scaling Diagonally

• We still rent hardware, so there is no “commodity hardware”

• Cheaper to upgrade

• Everything is redundant

• Partition data where you need to, scale partitions vertically

• Upgrade hardware (more RAM, more drives, more cores)

• Python apps tend to be CPU bound
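The slides do not say how data is partitioned. As a minimal sketch of the general idea, the code below routes a key to one of a fixed set of partitions with a stable hash. The partition names and the choice of CRC32 are assumptions for illustration, not the scheme Disqus uses.

```python
import zlib

# Hypothetical partition names; a real setup would map these to database hosts.
PARTITIONS = ["db-part-0", "db-part-1", "db-part-2", "db-part-3"]


def partition_for(key, partitions=PARTITIONS):
    """Map a key (for example a site or thread id) to a partition.

    zlib.crc32 is stable across processes and machines, unlike the built-in
    hash() for strings, so every web server routes a key the same way.
    """
    digest = zlib.crc32(str(key).encode("utf-8"))
    return partitions[digest % len(partitions)]


if __name__ == "__main__":
    for key in ("forum-1", "forum-2", "forum-3"):
        print(key, "->", partition_for(key))
```

With modulo routing like this, changing the number of partitions remaps most keys, which is one reason to keep the partition count fixed and grow each partition's hardware instead.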

Page 13: PyCon 2011 Scaling Disqus

Infrastructure

• 35% Web Servers (Apache + mod_wsgi)

• 15% Utility Servers (Python scripts, background workers)

• 20% Databases (PostgreSQL, Redis, Membase)

• 20% Load Balancing / High Availability (HAProxy + Heartbeat)

• 10% Caching servers (Memcached, Varnish)

• Half of our servers run Python


Page 14: PyCon 2011 Scaling Disqus

Python Web Servers

• Use what you’re comfortable with

• Apache + mod_wsgi vs nginx + uWSGI

• Bottleneck is in the application

[Chart: req/sec (min, avg, max) for mod_wsgi vs uWSGI; axis from 0 to 600]

[Chart: memory for mod_wsgi vs uWSGI; axis from 0 to 60.0]

Page 15: PyCon 2011 Scaling Disqus

Background Workers

• Lots of tasks that don’t need to be done in web application process:

• Crawling URLs

• Updating avatars

• Email notifications

• Analytics

• Counters


Page 16: PyCon 2011 Scaling Disqus

Background Workers (cont’d)

• Most jobs are I/O bound

• Slow external calls

• Twitter is slow

• Facebook is slow

• Could parallelize with multiple processes, but...

Page 17: PyCon 2011 Scaling Disqus

Background Workers (cont’d)

• Waste of memory

• Use non-blocking I/O

• Celery 2.2 adds support for gevent/eventlet
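To illustrate why non-blocking I/O suits I/O-bound jobs, here is a minimal sketch in which several slow "external calls" overlap inside one process. It uses asyncio from the standard library rather than the gevent/eventlet pools mentioned above, and the service names and delays are made up; the sleep stands in for a slow HTTP request.

```python
import asyncio
import time


async def call_external_service(name, delay):
    """Stand-in for a slow external API call (hypothetical)."""
    await asyncio.sleep(delay)  # yields to other jobs while "waiting on the network"
    return f"{name}: done"


async def run_jobs(jobs):
    """Run all jobs concurrently in a single process and collect results in order."""
    return await asyncio.gather(*(call_external_service(n, d) for n, d in jobs))


def main():
    jobs = [("twitter", 0.2), ("facebook", 0.2), ("avatar-crawl", 0.2)]
    started = time.perf_counter()
    results = asyncio.run(run_jobs(jobs))
    elapsed = time.perf_counter() - started
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = main()
    # Run one after another, the three waits would add up to about 0.6 seconds.
    print(results, f"{elapsed:.2f}s")
```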

Page 18: PyCon 2011 Scaling Disqus

Monitoring

• Application side: Graphite

• Real-time(ish) graphing

• Django front-end, Python backend

• Etsy’s StatsD proxy to Graphite

• UDP (fire and forget)

• Batches updates
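The "fire and forget" part can be sketched with a plain UDP socket. StatsD accepts plain-text datagrams such as `name:1|c` for counters and `name:12|ms` for timers, and listens on port 8125 by default. The class and metric names below are hypothetical, and this is a sketch rather than the client Disqus runs.

```python
import socket


def format_counter(name, value=1):
    """Encode a counter increment in StatsD's plain-text format."""
    return f"{name}:{value}|c".encode("ascii")


def format_timer(name, millis):
    """Encode a timing sample, in milliseconds."""
    return f"{name}:{millis}|ms".encode("ascii")


class StatsClient:
    """Tiny UDP sender: no connection, no reply, errors are swallowed."""

    def __init__(self, host="127.0.0.1", port=8125):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(self, payload):
        try:
            self.sock.sendto(payload, self.addr)
        except OSError:
            pass  # metrics must never break the request that emits them

    def incr(self, name, value=1):
        self.send(format_counter(name, value))

    def timing(self, name, millis):
        self.send(format_timer(name, millis))


if __name__ == "__main__":
    stats = StatsClient()
    stats.incr("comments.new")       # hypothetical metric names
    stats.timing("thread.render", 12)
```

Because the sender never waits for an acknowledgement, a slow or missing metrics server costs the web process almost nothing; the aggregation and batching happen on the StatsD side before data reaches Graphite.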

Page 19: PyCon 2011 Scaling Disqus

Monitoring

• Track application metrics

• Errors, exceptions

• New comments, users, sites, etc.

• Anything


Page 20: PyCon 2011 Scaling Disqus

Monitoring

• Check out Etsy’s posts:

• Measure Anything, Measure Everything http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/

• Tracking Every Release http://codeascraft.etsy.com/2010/12/08/track-every-release/


Page 21: PyCon 2011 Scaling Disqus

What about the code?


Page 22: PyCon 2011 Scaling Disqus

Powered By Django


Page 23: PyCon 2011 Scaling Disqus

Which means...

• Largest Django-powered web application

• We fork, and even sometimes monkey patch to make it scale to our needs

• Fortunately, we don’t have to do too much (Yay, Django!)

• Unfortunately, we can’t use the whole of the Django internal components (and if we do, we do it in atypical ways)


Page 24: PyCon 2011 Scaling Disqus

Iterative Development

Release Early, Release Often

Page 25: PyCon 2011 Scaling Disqus

Iterating Quickly

• Abstracting our application environment

• Fewer dependencies locally

• Rely on CI for dependency coverage

• Heavy use of open source packages

• No NIH syndrome

• Deploy frequently, 3-7 times a day

• Lots of branches, but master is “stable”

• Realtime reporting on exceptions, metrics

• Our test suite is the main blocker (slow)

Page 26: PyCon 2011 Scaling Disqus

Dealing with Deploys


Page 27: PyCon 2011 Scaling Disqus

Gargoyle

Being users of our product, we actively use early versions of features before public release

Deploy features to portions of a user base at a time to ensure smooth, measurable releases

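Gargoyle's own API is not shown in the slides, so the sketch below does not use it. It only illustrates the idea described here: staff always see a feature, and everyone else is placed in a stable bucket so a feature can be opened to a growing percentage of users. All names are hypothetical.

```python
import hashlib


def bucket(feature, user_id):
    """Return a stable bucket in [0, 100) for this feature/user pair."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_active(feature, user_id, percent, staff_ids=frozenset()):
    """Decide whether `user_id` should see `feature`.

    Staff always get the feature (early internal use); other users get it
    when their bucket falls below the rollout percentage.
    """
    if user_id in staff_ids:
        return True
    return bucket(feature, user_id) < percent


if __name__ == "__main__":
    staff = frozenset({1, 2})
    for percent in (0, 10, 50, 100):
        enabled = sum(is_active("new-editor", uid, percent, staff) for uid in range(1000))
        print(f"{percent:3d}% rollout -> {enabled} of 1000 users")
```

Hashing the feature name together with the user id keeps a user's answer stable between requests while giving each feature a different slice of the user base. Raising the percentage only adds users; nobody who already has the feature loses it.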

Page 28: PyCon 2011 Scaling Disqus

The Deployment Problem

• Make some changes locally

• Run a subset of the test suite

• Push your commits

• CI server begins running tests

• ....

Page 29: PyCon 2011 Scaling Disqus

Waiting on the test suite...


Page 30: PyCon 2011 Scaling Disqus

Rinse and Repeat

• 30 minutes later tests fail, start over

• Finally, deploy to a subset of servers

• Open Sentry (our exception logger)

• Monitor Graphite

• Deploy to 35 servers (~8 minutes)

• Full rollback in < 30 seconds
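The sequence above (a subset first, watch the dashboards, then everything, with a fast way back) can be sketched as a small staged rollout. The slides do not describe the actual deploy tooling, so `deploy`, `is_healthy` and `rollback` are hypothetical hooks supplied by the caller, and the host names are made up.

```python
def staged_deploy(servers, deploy, is_healthy, rollback, canary_count=2):
    """Deploy to a few servers first, then the rest; roll back on trouble.

    `deploy`, `is_healthy` and `rollback` are caller-supplied callables
    (hypothetical hooks, e.g. wrappers around your own deploy scripts).
    Returns True if every server was updated, False after a rollback.
    """
    stages = [servers[:canary_count], servers[canary_count:]]
    updated = []
    for stage in stages:
        for server in stage:
            deploy(server)
            updated.append(server)
        # Stand-in for "open Sentry, monitor Graphite" after each stage.
        if not is_healthy(updated):
            for server in reversed(updated):
                rollback(server)
            return False
    return True


if __name__ == "__main__":
    servers = [f"web{n:02d}" for n in range(1, 36)]  # 35 hypothetical hosts
    log = []
    ok = staged_deploy(
        servers,
        deploy=lambda s: log.append(("deploy", s)),
        is_healthy=lambda updated: True,
        rollback=lambda s: log.append(("rollback", s)),
    )
    print(ok, len(log))
```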

Page 31: PyCon 2011 Scaling Disqus

Wait, Sentry?


Page 32: PyCon 2011 Scaling Disqus

Testing


Page 33: PyCon 2011 Scaling Disqus

Testing Code

• Test suite takes around 25 minutes usually

• “Stuck” with Hudson (or Jenkins)

• Most tightly integrated plugins are geared towards Java developers

• Which framework do we use?

• unittest(2), nose, doctests, LETTUCE?

• We use unittest and nose

• Need to report code coverage, speed of tests, pylint (or pyflakes)
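One way to get at "speed of tests" with nothing but the standard library is to time each test in `setUp`/`tearDown` and sort the results. This is a minimal sketch with made-up class and test names, not the reporting setup used in the talk.

```python
import time
import unittest


class TimedTestCase(unittest.TestCase):
    """Base class that records how long each test method takes."""

    durations = {}  # test id -> seconds, shared by all subclasses

    def setUp(self):
        self._started = time.perf_counter()

    def tearDown(self):
        TimedTestCase.durations[self.id()] = time.perf_counter() - self._started


class CommentTests(TimedTestCase):  # hypothetical example tests
    def test_strip(self):
        self.assertEqual("  hi ".strip(), "hi")


def run_and_report():
    """Run the example tests and return (result, tests sorted slowest first)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CommentTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    slowest = sorted(TimedTestCase.durations.items(), key=lambda kv: kv[1], reverse=True)
    return result, slowest


if __name__ == "__main__":
    result, slowest = run_and_report()
    for test_id, seconds in slowest:
        print(f"{seconds:.4f}s  {test_id}")
```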

Page 34: PyCon 2011 Scaling Disqus

We Love Python


Page 35: PyCon 2011 Scaling Disqus

Love-ish

• Many of us started with PHP or Rails

• Clean syntax, clear standards

• All languages need PEP8.py and PyFlakes

• Interpreted, fast... enough

• Very easy to learn

• We all started by learning Django first, then Python

Page 36: PyCon 2011 Scaling Disqus

Haters Gonna Hate

If you could choose one thing in Python to hate on...

Page 37: PyCon 2011 Scaling Disqus

Better package management


Page 38: PyCon 2011 Scaling Disqus

What can we do?

• Too many forks, too many frameworks

• We need fewer clones, and more combined effort

• Improving existing Python solutions

• More Python solutions for existing products

Page 39: PyCon 2011 Scaling Disqus

Python Rocks!


Page 40: PyCon 2011 Scaling Disqus

DISQUS

Questions?

psst, we’re [email protected]


Page 41: PyCon 2011 Scaling Disqus

References

• Sentry (our exception tracking tool): http://github.com/dcramer/django-sentry

• Gargoyle (feature switches): https://github.com/disqus/gargoyle

• Django DB Utils (collection of db helpers for Django): https://github.com/disqus/django-db-utils

• Jenkins CI: http://jenkins-ci.org/

code.disqus.com