Real-Time Django

Real-Time Django, with Ben Slavin (@benslavin) and Adam Miskiewicz (@skevy). Presented for your enjoyment at DjangoCon US 2011.

Upload: bolster-labs

Post on 25-May-2015


DESCRIPTION

The web is live. APIs give us access to continuously changing data. We discuss ways to get real-time data into your app, how to handle data processing and what to do when you get thousands of updates per second.

TRANSCRIPT

Page 1: Real-Time Django

Real-Time Django

Presented for your enjoyment at DjangoCon US 2011

with Ben Slavin and Adam Miskiewicz

@benslavin @skevy

Page 2: Real-Time Django

The web is

Read / Write

Page 4: Real-Time Django

[Slide graphic: a wall of HTTP requests, almost all GET with an occasional POST.]

Page 5: Real-Time Django

1 / second

Page 6: Real-Time Django

(with intelligent application design and proper caching)

Django Just Works

Page 7: Real-Time Django

50 / second

Page 8: Real-Time Django

500 / second

Page 9: Real-Time Django

5,000 / second

Page 10: Real-Time Django

http://twitter.com/#!/twitterglobalpr/status/108285017792331776

8,868 / second

Beyonce!!!

Page 11: Real-Time Django

http://blog.twitter.com/2011/02/superbowl.html

Superbowl XLV

Page 12: Real-Time Django

4,064 at peak

Page 13: Real-Time Django

>2,000 sustained

Page 14: Real-Time Django

Django wasn’t built for this.

Page 15: Real-Time Django

... but that doesn’t mean we need to use

J2EE or Erlang.

Page 16: Real-Time Django

Using the techniques discussed today, we have:

Processed > 4k pieces of data/second

Tracked > 50k live datapoints

Run live events

Served award-show sized audiences

Page 17: Real-Time Django

You may not deal with this scale, but hopefully you can

Learn from our techniques

Page 18: Real-Time Django

Under-documented

Page 19: Real-Time Django

A lot to cover

Page 20: Real-Time Django

A play. In three parts.

Retrieval

Processing

Presentation

Page 21: Real-Time Django

Polling

Retrieval

Page 22: Real-Time Django

Twitter, Facebook, Foursquare, etc.

Widely used

Retrieval / Polling

Page 23: Real-Time Django

the naïve approach

Continuous Polling

Retrieval

Page 24: Real-Time Django

Synchronously blocks the request/response cycle

Slow

Retrieval / Continuous Polling

Page 25: Real-Time Django

Adds undue burden on the upstream service

Not neighborly

Retrieval / Continuous Polling

Page 26: Real-Time Django

If the upstream service goes down, so do you

Failure model sucks

Retrieval / Continuous Polling

Page 27: Real-Time Django

a slightly less-awful approach

Cached Polling

Retrieval

Page 28: Real-Time Django

Same as ‘continuous polling’ in the degenerate case

Dog pile

Retrieval / Cached Polling

Page 29: Real-Time Django

If the upstream service goes down, so do you

Failure model sucks

Retrieval / Cached Polling

Page 30: Real-Time Django

Don’t do this in the request/response cycle

DON’T BREAK THE CYCLE

Retrieval / Polling

Page 31: Real-Time Django

♥ manage.py poll_stuff + crontab -e

Retrieval / Polling
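
A minimal sketch of the out-of-band idea, assuming a plain dict as a stand-in for a shared cache and a caller-supplied `fetch` function (both hypothetical, not from the talk). In a Django project the body of `poll_once` would plausibly live in the `handle()` method of the `poll_stuff` management command that cron runs; views would only ever call something like `read_for_request`.

```python
import json
import time

# Stand-in for a shared cache such as memcached; a dict keeps the sketch runnable.
CACHE = {}


def poll_once(fetch, cache=CACHE, key="upstream_data"):
    """Fetch from upstream once and store the result; keep the old copy on failure."""
    try:
        data = fetch()
    except Exception as exc:  # upstream down, timeout, bad payload, ...
        print(f"poll failed, keeping cached copy: {exc}")
        return False
    cache[key] = {"fetched_at": time.time(), "payload": json.dumps(data)}
    return True


def read_for_request(cache=CACHE, key="upstream_data"):
    """What a view would call: reads the cache and never touches upstream."""
    entry = cache.get(key)
    return json.loads(entry["payload"]) if entry else None
```

Because the request path only reads the cache, an upstream outage leaves visitors with slightly stale data instead of an error page.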

Page 32: Real-Time Django

Still not enough

Retrieval / Polling

Page 33: Real-Time Django

e.g. 500 requests / hour

Rate limits

Retrieval / Polling

Page 34: Real-Time Django

http://api.twitter.com/1/users/lookup.json?screen_name=bolsterlabs,benslavin,skevy

Batched requests

Retrieval / Rate Limits
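
One way to build such batched URLs, as a sketch. The endpoint is the one shown on the slide; the helper name `batched_lookup_urls` and the default batch size of 100 are assumptions for illustration, so check the provider's documented per-request limit.

```python
from urllib.parse import urlencode

LOOKUP_URL = "http://api.twitter.com/1/users/lookup.json"  # URL from the slide


def batched_lookup_urls(screen_names, batch_size=100):
    """Group screen names so that each request looks up a whole batch."""
    urls = []
    for start in range(0, len(screen_names), batch_size):
        batch = screen_names[start:start + batch_size]
        # safe="," keeps the comma separators readable instead of %2C.
        query = urlencode({"screen_name": ",".join(batch)}, safe=",")
        urls.append(f"{LOOKUP_URL}?{query}")
    return urls
```

Looking up N users then costs roughly N / batch_size requests against the rate limit instead of N.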

Page 35: Real-Time Django

Use a pool of workers with different IPs and API keys

Multiple clients

Retrieval / Rate Limits

Page 36: Real-Time Django

Ask the upstream provider.

Special access

Retrieval / Rate Limits

Page 37: Real-Time Django

No, you come to me.

Web hooks

Retrieval

Page 38: Real-Time Django

Asynchronous from the user’s perspective.

Out of band

Retrieval / Web Hooks

Page 39: Real-Time Django

Used by Gowalla, Myspace, Google

PubSubHubbub

Retrieval / Web Hooks

Page 40: Real-Time Django

True ‘push’.

The data comes to you

Retrieval / Web Hooks

Page 41: Real-Time Django

Class-based views or plain-old methods. It’s just Django w/ different auth.

Just handle it

Retrieval / Web Hooks
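
As an illustration of what "different auth" can mean for a web hook, here is a sketch of verifying a signed payload. It assumes the PubSubHubbub-style `X-Hub-Signature: sha1=<hex digest>` header, computed as an HMAC-SHA1 of the raw request body with a shared secret; the function name is hypothetical and other providers sign differently.

```python
import hashlib
import hmac


def is_valid_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Check a 'sha1=<hexdigest>' signature header against the raw request body."""
    algo, _, received = header_value.partition("=")
    if algo != "sha1" or not received:
        return False
    expected = hmac.new(secret, body, hashlib.sha1).hexdigest()
    # Constant-time comparison; bytes avoid errors on non-ASCII header values.
    return hmac.compare_digest(expected.encode(), received.encode())
```

A Django view would call this with `request.body` and the header value before doing anything else, then hand the payload off for processing.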

Page 42: Real-Time Django

Or worse, completely manual.

Setup can be complex

Retrieval / Web Hooks

Page 43: Real-Time Django

Long-lived, open-socket communication.

Streaming

Retrieval

Page 44: Real-Time Django

but only when you’re connected.

Live updates

Retrieval / Streaming

Page 45: Real-Time Django

can be a significant bottleneck

Single client

Retrieval / Streaming

Page 46: Real-Time Django

“Site Streams may deliver hundreds of messages per second to a client, and each stream may consume significant (> 1 Mbit/sec) bandwidth. Your processing of tweets should be asynchronous, with appropriate buffers in place to handle spikes of 3x normal throughput. Note that slow reading clients are automatically terminated.”

https://dev.twitter.com/docs/streaming-api/site-streams

Retrieval / Streaming

Page 47: Real-Time Django

Pass data off as quickly as possible

Hot potato

Retrieval / Streaming
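
A sketch of the hot-potato pattern using only the standard library: the reader does nothing except hand each raw message to a buffer, and a separate worker does the slow part. The names `read_stream`, `process` and `run` are illustrative, and `lines` stands in for whatever iterable the streaming client exposes.

```python
import queue
import threading


def read_stream(lines, buffer):
    """Reader: do no real work, just pass each raw message to the buffer."""
    for raw in lines:
        buffer.put(raw)
    buffer.put(None)  # sentinel: tells the worker the stream has ended


def process(buffer, handle):
    """Worker: drain the buffer and do the slow work away from the socket."""
    while True:
        raw = buffer.get()
        if raw is None:
            break
        handle(raw)


def run(lines, handle, max_buffered=10000):
    buffer = queue.Queue(maxsize=max_buffered)
    worker = threading.Thread(target=process, args=(buffer, handle))
    worker.start()
    read_stream(lines, buffer)
    worker.join()
```

Note that a full bounded queue blocks the reader, which is exactly the slow-reading behaviour to avoid, so the buffer should be sized for spikes or replaced with an external queue.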

Page 48: Real-Time Django

This data is ephemeral, and there may be no good way to recreate it once it’s gone.

STORE IT. LOG IT. SAVE IT.

Retrieval

Page 49: Real-Time Django

Processing

Page 50: Real-Time Django

Denormalization

Processing

Page 51: Real-Time Django

* Unless you know Frank Wiles.

Your DB is slow.

Processing / Denormalization

Page 52: Real-Time Django

db_index=True is not the answer

Processing / Denormalization

Page 53: Real-Time Django

Tweet.objects.filter(screenname="aplusk").count()

Processing / Denormalization

Page 54: Real-Time Django

TweetCount.objects.get(screenname="aplusk").tweet_count

Processing / Denormalization

Also consider memcached, Redis, etc.

Page 55: Real-Time Django

pre_save, post_save, post_delete and F objects

Processing / Denormalization

Use these.
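
To show the idea without a Django project, here is a sketch in plain `sqlite3`; the table and column names mirror the `Tweet` and `TweetCount` examples but are otherwise assumptions. The in-database `tweet_count = tweet_count + 1` is the kind of atomic update an `F("tweet_count") + 1` expression produces, and `save_tweet` plays the role a `post_save` handler would.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweet (id INTEGER PRIMARY KEY, screenname TEXT, body TEXT)")
    conn.execute(
        "CREATE TABLE tweet_count (screenname TEXT PRIMARY KEY, tweet_count INTEGER NOT NULL)"
    )
    return conn


def save_tweet(conn, screenname, body):
    """Insert the tweet and bump the denormalized counter in one transaction."""
    with conn:
        conn.execute("INSERT INTO tweet (screenname, body) VALUES (?, ?)", (screenname, body))
        conn.execute(
            "INSERT OR IGNORE INTO tweet_count (screenname, tweet_count) VALUES (?, 0)",
            (screenname,),
        )
        # Increment inside the database rather than read-modify-write in Python.
        conn.execute(
            "UPDATE tweet_count SET tweet_count = tweet_count + 1 WHERE screenname = ?",
            (screenname,),
        )


def tweet_count(conn, screenname):
    """Single-row lookup instead of counting every matching tweet."""
    row = conn.execute(
        "SELECT tweet_count FROM tweet_count WHERE screenname = ?", (screenname,)
    ).fetchone()
    return row[0] if row else 0
```

The counter stays correct only while every write goes through `save_tweet`, which is the same caveat that applies to signal-based counters.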

Page 56: Real-Time Django

These only work in Django

Be careful

Processing / Denormalization

Page 57: Real-Time Django

Workers

Processing

Page 58: Real-Time Django

Deconstruct the problem

Processing / Workers

Page 59: Real-Time Django

Check for profanity, then

Retrieve an avatar, then

Geo-locate the author, then

Add as input for trending terms, then

Retrieve author’s social graph, then

Adjust the leaderboard

Page 60: Real-Time Django

Check for profanity, then

Retrieve an avatar

Geo-locate the author

Add as input for trending terms

Retrieve author’s social graph

Adjust the leaderboard

Page 61: Real-Time Django

Or manage yourself with any queue

django-celery

Processing / Workers
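
A rough sketch of the deconstructed pipeline using a thread pool from the standard library in place of django-celery or another queue. The function name, the `is_profane` check and the step callables are all hypothetical; the point is only that the gate runs first and the independent steps run side by side.

```python
from concurrent.futures import ThreadPoolExecutor


def process_tweet(tweet, is_profane, independent_steps):
    """Run the gate first, then fan the independent steps out to workers."""
    if is_profane(tweet):
        return None  # rejected: none of the other work is worth doing
    workers = max(1, len(independent_steps))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(step, tweet) for name, step in independent_steps.items()}
        # Collect each step's result under its name once it finishes.
        return {name: future.result() for name, future in futures.items()}
```

With a real queue each step would be its own task, so a slow avatar fetch never delays the leaderboard update.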

Page 62: Real-Time Django

It’s not that scary

map + reduce

Processing

Page 63: Real-Time Django

Get data. Process. Cache results.

Generational

Processing / map + reduce

Page 64: Real-Time Django

Especially where the intermediate working set is large.

Good for many problems.

Processing / map + reduce / Generational

Page 65: Real-Time Django

CouchDB, Mongo, Hadoop

Solutions exist.

Processing / map + reduce / Generational

Page 66: Real-Time Django

Sometimes we can be smarter

Processing / map + reduce

Page 67: Real-Time Django

Consider averages.

Processing / map + reduce / incremental

I mean the mean.

Page 68: Real-Time Django

$$\frac{1}{n} \sum_{i=1}^{n} a_i$$

Processing / map + reduce / incremental

Page 69: Real-Time Django

$$\frac{1}{n} \sum_{i=1}^{n} a_i = \frac{1}{n} \left( \sum_{i=1}^{n-1} a_i + a_n \right)$$

From O(n) to O(1)

Processing / map + reduce / incremental

Page 70: Real-Time Django

This example was trivial, but you can often

Store a partial solution

Processing / map + reduce / incremental
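
The mean example can be sketched as a small class that stores the partial solution (the running sum and count), so each new value costs O(1) instead of a pass over all the data. The class name `RunningMean` is an illustration, not something from the talk.

```python
class RunningMean:
    """Keep the partial solution (sum and count) so updates are O(1)."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        # Fold the new value into the stored sum instead of re-reading old data.
        self.count += 1
        self.total += value
        return self.mean

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0
```

The same shape works for counts, sums, minimums and maximums; values that can be removed again, or statistics such as medians, need more state than this.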

Page 71: Real-Time Django

Presentation

(Of the data, not the thing we’re doing now.)

Page 72: Real-Time Django

Partial Caching

Presentation

Page 73: Real-Time Django

Template fragment caching

{% cache 500 my_stuff %}

Presentation / Partial Caching

Page 74: Real-Time Django

from django.db import models

class MyModel(models.Model):
    as_html = models.TextField()

Presentation / Partial Caching

Page 75: Real-Time Django

Don’t be afraid of low-level caching.

import json
from django.core.cache import cache

serialized = json.dumps(my_stuff)
cache.set('my_stuff', serialized)

Presentation / Partial Caching

Page 76: Real-Time Django

Continuous Caching

Presentation

Page 77: Real-Time Django

while True: cache_page()

Presentation / Continuous Caching

Page 78: Real-Time Django

Works when the number of pages is relatively small. Similar to proxy_cache, but more resilient.

Out-of-band caching.

Presentation / Continuous Caching
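
A sketch of what the `while True: cache_page()` loop might look like once failures are taken into account. The `render` callable, the dict used as a cache and the `rounds` parameter (there only so the loop can be stopped) are assumptions for illustration.

```python
import time


def refresh_pages(paths, render, cache):
    """Re-render every path once; on error keep the previously cached copy."""
    failures = []
    for path in paths:
        try:
            cache[path] = render(path)
        except Exception:
            failures.append(path)  # the stale copy stays in place
    return failures


def run_forever(paths, render, cache, interval=1.0, rounds=None):
    """Out-of-band loop; rounds=None means keep going until the process is killed."""
    done = 0
    while rounds is None or done < rounds:
        refresh_pages(paths, render, cache)
        done += 1
        time.sleep(interval)
```

Since a failed render never overwrites the cache, visitors keep getting the last good page while the backend is unhealthy.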

Page 79: Real-Time Django

[Watch this space.]

Presentation / Continuous Caching

Page 80: Real-Time Django

Real-Time Updates

Presentation

Page 81: Real-Time Django

gevent, eventlet, tornado, twisted

Presentation / Real-Time Updates

Page 82: Real-Time Django

Django plays well with others

Presentation / Real-Time Updates

Page 83: Real-Time Django

Presentation / Real-Time Updates

♥ Django + RabbitMQ + node.js + socket.io

Page 84: Real-Time Django

[Watch this space.]

Presentation / Real-Time Updates

Page 85: Real-Time Django

Failure Models

Presentation

Page 86: Real-Time Django

Presentation / Failure Models

This isn’t good for anyone.

Page 87: Real-Time Django

proxy_cache_use_stale and proxy_next_upstream

For use with nginx.

Presentation / Failure Models

Page 88: Real-Time Django

It can serve pre-cached content. Anything is better than a 404, 500 or 502 (usually).

Build a small backup app.

Presentation / Failure Models

Page 89: Real-Time Django

Follow @bolsterlabs for slides and lively discussion.

Page 90: Real-Time Django

Thank you.

Page 91: Real-Time Django

@bolsterlabs

Don’t be a stranger.

Ben Slavin
@benslavin
[email protected]

Adam Miskiewicz
@skevy
[email protected]