real-time django
DESCRIPTION
The web is live. APIs give us access to continuously changing data. We discuss ways to get real-time data into your app, how to handle data processing, and what to do when you get thousands of updates per second.
TRANSCRIPT
with Ben Slavin and Adam Miskiewicz
Real-Time Django
Presented for your enjoyment at DjangoCon US 2011
@benslavin @skevy
The web is
Read / Write
[Slides: walls of HTTP request methods, almost all GET with only an occasional POST]
1 / second
(with intelligent application design and proper caching)
Django Just Works
50 / second
500 / second
5,000 / second
http://twitter.com/#!/twitterglobalpr/status/108285017792331776
8,868 / second
Beyonce!!!
http://blog.twitter.com/2011/02/superbowl.html
Super Bowl XLV
4,064 at peak
>2,000 sustained
Django wasn’t built for this.
... but that doesn’t mean we need to use
J2EE or Erlang.
Using the techniques discussed today, we have:
Processed >4k pieces of data/second
Tracked >50k live datapoints
Run live events
Served award-show-sized audiences
You may not deal with this scale, but hopefully you can
Learn from our techniques
Under-documented
A lot to cover
A play. In three parts.
Retrieval
Processing
Presentation
Polling
Retrieval
Twitter, Facebook, Foursquare, etc.
Widely used
Retrieval / Polling
the naïve approach
Continuous Polling
Retrieval
Synchronously blocks the request/response cycle
Slow
Retrieval / Continuous Polling
Adds undue burden on the upstream service
Not neighborly
Retrieval / Continuous Polling
If the upstream service goes down, so do you
Failure model sucks
Retrieval / Continuous Polling
a slightly less-awful approach
Cached Polling
Retrieval
Same as ‘continuous polling’ in the degenerate case
Dog pile
Retrieval / Cached Polling
If the upstream service goes down, so do you
Failure model sucks
Retrieval / Cached Polling
Don’t do this in the request/response cycle
DON’T BREAK THE CYCLE
Retrieval / Polling
manage.py poll_stuff + crontab -e ♥
Retrieval / Polling
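A minimal sketch of the out-of-band idea, in plain Python rather than a real management command. The names `poll_once`, `read_for_request`, `fetch`, and the dict standing in for the cache are all hypothetical; in a Django project the polling body would presumably live in the `handle()` method of a `poll_stuff` command that cron runs on a schedule.

```python
import time


def poll_once(fetch, cache, key="upstream_data", now=time.time):
    """Fetch from upstream and store the result; keep the old value on failure.

    `fetch` is any zero-argument callable that returns fresh data or raises.
    `cache` is any dict-like store (a stand-in for memcached, Redis, etc.).
    Returns True when the cache was refreshed.
    """
    try:
        data = fetch()
    except Exception:
        # Upstream is down or slow: leave the last good value in place so
        # request handlers keep serving slightly stale data.
        return False
    cache[key] = {"data": data, "fetched_at": now()}
    return True


def read_for_request(cache, key="upstream_data"):
    """What a view would call: a cache read only, never an upstream request."""
    entry = cache.get(key)
    return entry["data"] if entry else None
```

Because the view only ever reads the cache, an upstream outage costs freshness rather than availability.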
Still not enough
Retrieval / Polling
ex. 500 requests / hour
Rate limits
Retrieval / Polling
http://api.twitter.com/1/users/lookup.json?screen_name=bolsterlabs,benslavin,skevy
Batched requests
Retrieval / Rate Limits
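One way batching might look, as a sketch: split a list of screen names into groups and build one lookup URL per group, in the same shape as the URL on the slide. The helper name `batched_lookup_urls` is hypothetical, and the default `batch_size` of 100 is an assumption; check the provider's documentation for the real per-request limit.

```python
from urllib.parse import urlencode

LOOKUP_URL = "http://api.twitter.com/1/users/lookup.json"  # URL from the slide


def batched_lookup_urls(screen_names, batch_size=100):
    """Yield one lookup URL per batch of screen names.

    batch_size is an assumed per-request limit, not a documented value.
    """
    for start in range(0, len(screen_names), batch_size):
        batch = screen_names[start:start + batch_size]
        # safe="," keeps the commas literal, matching the URL on the slide.
        query = urlencode({"screen_name": ",".join(batch)}, safe=",")
        yield f"{LOOKUP_URL}?{query}"
```

Three names with a batch size of two cost two requests against the rate limit instead of three; with realistic batch sizes the saving is much larger.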
Use a pool of workers with different IPs and API keys
Multiple clients
Retrieval / Rate Limits
Ask the upstream provider.
Special access
Retrieval / Rate Limits
No, you come to me.
Web hooks
Retrieval
Asynchronous from the user’s perspective.
Out of band
Retrieval / Web Hooks
Used by Gowalla, Myspace, Google
PubSubHubbub
Retrieval / Web Hooks
True ‘push’.
The data comes to you
Retrieval / Web Hooks
Class-based views or plain-old methods. It’s just Django w/ different auth.
Just handle it
Retrieval / Web Hooks
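The "different auth" is often a signature on the request body rather than a session or login. A sketch of one common scheme follows: an HMAC-SHA1 of the raw body carried in a header shaped like `sha1=<hex digest>`, which is how PubSubHubbub's `X-Hub-Signature` works. The function name `is_authentic` is hypothetical, and whether a given provider signs its hooks this way is an assumption to verify against its documentation.

```python
import hashlib
import hmac


def is_authentic(body: bytes, signature_header: str, secret: bytes) -> bool:
    """Return True if the header carries a valid HMAC-SHA1 of the body.

    Expects a header value shaped like "sha1=<hex digest>".
    """
    scheme, _, received = signature_header.partition("=")
    if scheme != "sha1" or not received:
        return False
    expected = hmac.new(secret, body, hashlib.sha1).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, received)
```

In a Django view this check would run against `request.body` before any parsing, and a failed check would return a 4xx response without touching the payload.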
Or worse, completely manual.
Setup can be complex
Retrieval / Web Hooks
Long-lived, open-socket communication.
Streaming
Retrieval
but only when you’re connected.
Live updates
Retrieval / Streaming
can be a significant bottleneck
Single client
Retrieval / Streaming
“Site Streams may deliver hundreds of messages per second to a client, and each stream may consume significant (> 1 Mbit/sec) bandwidth.
Your processing of tweets should be asynchronous, with appropriate buffers in place to handle spikes of 3x normal throughput. Note
that slow reading clients are automatically terminated.”
https://dev.twitter.com/docs/streaming-api/site-streams
Retrieval / Streaming
Pass data off as quickly as possible
Hot potato
Retrieval / Streaming
This data is ephemeral, and there may be no good way to recreate it once it’s gone.
STORE IT. LOG IT. SAVE IT.
Retrieval
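The "hot potato" and "store it" advice can be sketched together: the code that reads the stream does nothing except append the raw message to a log and hand it to a queue, and a separate worker does the slow parsing. The names `receive`, `worker`, and `handle` are hypothetical, and an in-process `queue.Queue` stands in for whatever broker a real deployment would use.

```python
import json
import queue
import threading


def receive(raw_line, log_file, work_queue):
    """Called by the stream reader for every message: log it, queue it, return."""
    log_file.write(raw_line + "\n")   # raw copy first, so nothing is lost
    work_queue.put(raw_line)          # slow processing happens elsewhere


def worker(work_queue, handle):
    """Drain the queue in a separate thread; None is the shutdown signal."""
    while True:
        raw_line = work_queue.get()
        if raw_line is None:
            break
        handle(json.loads(raw_line))
```

Keeping `receive` this small is what lets the reader keep up with the stream; the raw log also makes it possible to replay messages if the processing code turns out to be wrong.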
Processing
Denormalization
Processing
* Unless you know Frank Wiles.
Your DB is slow.
Processing / Denormalization
db_index=True is not the answer
Processing / Denormalization
Tweet.objects.filter(screenname="aplusk").count()
Processing / Denormalization
TweetCount.objects.get(screenname="aplusk").tweet_count
Processing / Denormalization
Also consider memcached, Redis, etc.
pre_save, post_save, post_delete and F objects
Processing / Denormalization
Use these.
These only work in Django
Be careful
Processing / Denormalization
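To make the trade concrete without a database, here is a toy sketch in plain Python. `TweetStore` is a hypothetical stand-in for a tweets table plus a `TweetCount` table; in Django the two counter updates would presumably be done by `post_save` (on create only) and `post_delete` handlers, ideally with `F` expressions so that concurrent writers do not overwrite each other.

```python
from collections import Counter


class TweetStore:
    """Toy stand-in for a tweets table plus a denormalized count table."""

    def __init__(self):
        self.tweets = []               # the "normalized" rows
        self.tweet_count = Counter()   # the denormalized per-user counter

    def save(self, screenname, text):
        self.tweets.append((screenname, text))
        self.tweet_count[screenname] += 1   # what a post_save handler would do

    def delete(self, screenname, text):
        self.tweets.remove((screenname, text))
        self.tweet_count[screenname] -= 1   # what a post_delete handler would do

    def slow_count(self, screenname):
        # Equivalent of filter(...).count(): scans every row.
        return sum(1 for name, _ in self.tweets if name == screenname)

    def fast_count(self, screenname):
        # Equivalent of reading TweetCount.tweet_count: one lookup.
        return self.tweet_count[screenname]
```

The counter stays correct only if every write goes through `save` and `delete`, which is the same caveat as the slide's warning that signals only fire inside Django.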
Workers
Processing
Deconstruct the problem
Processing / Workers
Check for profanity, then
Retrieve an avatar, then
Geo-locate the author, then
Add as input for trending terms, then
Retrieve author’s social graph, then
Adjust the leaderboard
Check for profanity, then (in parallel):
Retrieve an avatar
Geo-locate the author
Add as input for trending terms
Retrieve author’s social graph
Adjust the leaderboard
Or manage yourself with any queue
django-celery
Processing / Workers
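A sketch of the deconstructed pipeline: one gate that must run first, followed by independent steps that can run side by side. A thread pool stands in for real workers here; with django-celery each step would instead be a task on a queue. `process_tweet`, `is_profane`, and the entries in `tasks` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def process_tweet(tweet, is_profane, tasks, max_workers=5):
    """Run the profanity gate first, then run the independent tasks in parallel.

    `is_profane` and every callable in `tasks` are hypothetical stand-ins for
    real work (avatar fetch, geo-location, and so on).
    Returns {task_name: result}, or None if the tweet was rejected.
    """
    if is_profane(tweet):
        return None
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything before waiting, so the tasks overlap in time.
        futures = {name: pool.submit(task, tweet) for name, task in tasks.items()}
        return {name: future.result() for name, future in futures.items()}
```

The total latency becomes roughly the gate plus the slowest single task, rather than the sum of all of them.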
It’s not that scary
map + reduce
Processing
Get data. Process. Cache results.
Generational
Processing / map + reduce
Especially where the intermediate working set is large.
Good for many problems.
Processing / map + reduce / Generational
CouchDB, Mongo, Hadoop
Solutions exist.
Processing / map + reduce / Generational
Sometimes we can be smarter
Processing / map + reduce
Consider averages.
Processing / map + reduce / incremental
I mean the mean.
\[ \frac{1}{n} \sum_{i=1}^{n} a_i \]
Processing / map + reduce / incremental
\[ \frac{1}{n} \sum_{i=1}^{n} a_i = \frac{1}{n} \left( \sum_{i=1}^{n-1} a_i + a_n \right) \]
From O(n) to O(1)
Processing / map + reduce / incremental
This example was trivial, but you can often
Store a partial solution
Processing / map + reduce / incremental
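For the mean, the stored partial solution is just the running sum and the count: adding a new value means one addition and one division, with no pass over the earlier values. A minimal sketch, with `RunningMean` as a hypothetical name:

```python
class RunningMean:
    """Keep (count, total) as the stored partial solution."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        # O(1) per update: no need to revisit earlier values.
        self.count += 1
        self.total += value
        return self.mean

    @property
    def mean(self):
        return self.total / self.count if self.count else None
```

The same pattern applies to anything that can be rebuilt from a small summary plus the newest item, such as counts, sums, minimums, and maximums.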
Presentation
(Of the data, not the thing we’re doing now.)
Partial Caching
Presentation
Template fragment caching
{% cache 500 my_stuff %}
Presentation / Partial Caching
from django.db import models

class MyModel(models.Model):
    as_html = models.TextField()  # pre-rendered HTML stored on the row
Presentation / Partial Caching
Don’t be afraid of low-level caching.
import json
from django.core.cache import cache

serialized = json.dumps(my_stuff)
cache.set('my_stuff', serialized)
Presentation / Partial Caching
Continuous Caching
Presentation
while True:
    cache_page()
Presentation / Continuous Caching
Works when the number of pages is relatively small.
Similar to proxy_cache, but more resilient.
Out-of-band caching.
Presentation / Continuous Caching
[Watch this space.]
Presentation / Continuous Caching
Real-Time Updates
Presentation
gevent, eventlet, tornado, twisted
Presentation / Real-Time Updates
Django plays well with others
Presentation / Real-Time Updates
Django + RabbitMQ + node.js + socket.io ♥
[Watch this space.]
Presentation / Real-Time Updates
Failure Models
Presentation
Presentation / Failure Models
This isn’t good for anyone.
proxy_cache_use_stale and proxy_next_upstream
For use with nginx.
Presentation / Failure Models
It can serve pre-cached content.
Anything is better than a 404, 500 or 502 (usually)
Build a small backup app.
Presentation / Failure Models
Follow @bolsterlabs for slides and
lively discussion.
Thank you.
@bolsterlabs
Don’t be a stranger.
Ben Slavin
Adam Miskiewicz @skevy