Real-Time Django


DESCRIPTION

The web is live. APIs give us access to continuously changing data. We discuss ways to get real-time data into your app, how to handle data processing and what to do when you get thousands of updates per second.

TRANSCRIPT

with Ben Slavin and Adam Miskiewicz

Real-Time Django

Presented for your enjoyment at DjangoCon US 2011

with Ben Slavin and Adam Miskiewicz

Real-Time Django
@benslavin @skevy

The web is

Read / Write


[Slide visual: a wall of hundreds of GET requests with only the occasional POST, illustrating that web traffic is overwhelmingly reads with a trickle of writes.]

1 / second

(with intelligent application design and proper caching)

Django Just Works

50 / second

500 / second

5,000 / second

http://twitter.com/#!/twitterglobalpr/status/108285017792331776

8,868 / second

Beyonce!!!

http://blog.twitter.com/2011/02/superbowl.html

Super Bowl XLV

4,064 / second at peak

>2,000 / second sustained

Django wasn't built for this.

... but that doesn't mean we need to use

J2EE or Erlang.

Using the techniques discussed today, we have:

Processed >4k pieces of data/second
Tracked >50k live datapoints

Run live events
Served award-show sized audiences

You may not deal with this scale, but hopefully you can

Learn from our techniques

Under-documented

A lot to cover

A play. In three parts.

Retrieval
Processing
Presentation

Polling

Retrieval

Twitter, Facebook, Foursquare, etc.

Widely used

Retrieval / Polling

the naïve approach

Continuous Polling

Retrieval

Synchronously blocks the request/response cycle

Slow

Retrieval / Continuous Polling

Adds undue burden on the upstream service

Not neighborly

Retrieval / Continuous Polling

If the upstream service goes down, so do you

Failure model sucks

Retrieval / Continuous Polling

a slightly less-awful approach

Cached Polling

Retrieval

Same as ‘continuous polling’ in the degenerate case

Dog pile

Retrieval / Cached Polling

If the upstream service goes down, so do you

Failure model sucks

Retrieval / Cached Polling

Don’t do this in the request/response cycle

DON’T BREAK THE CYCLE

Retrieval / Polling

manage.py poll_stuff + crontab -e ♥

Retrieval / Polling
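
A minimal sketch of what such a command might look like, assuming the requests library; the file path comes from the slide's command name, while the API URL and Item model are illustrative, not from the talk.

# myapp/management/commands/poll_stuff.py (hypothetical location)
import requests

from django.core.management.base import BaseCommand

from myapp.models import Item  # assumed model for storing polled data


class Command(BaseCommand):
    help = "Poll an upstream API outside the request/response cycle."

    def handle(self, *args, **options):
        # One poll per invocation; cron ("crontab -e") handles the schedule,
        # e.g.  */1 * * * *  python manage.py poll_stuff
        resp = requests.get("https://api.example.com/updates", timeout=10)
        resp.raise_for_status()
        for entry in resp.json():
            # Store first, process later.
            Item.objects.get_or_create(external_id=entry["id"],
                                       defaults={"payload": entry})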

Still not enough

Retrieval / Polling

ex. 500 requests / hour

Rate limits

Retrieval / Polling

http://api.twitter.com/1/users/lookup.json?screen_name=bolsterlabs,benslavin,skevy

Batched requests

Retrieval / Rate Limits
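
For instance, the lookup above fetches several users in one call. A rough sketch using the requests library; the endpoint is the one on the slide, the parsing is an assumption.

import requests

def lookup_users(screen_names):
    # One batched request instead of one request per user keeps us
    # well inside a 500 requests/hour rate limit.
    resp = requests.get(
        "http://api.twitter.com/1/users/lookup.json",
        params={"screen_name": ",".join(screen_names)},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

users = lookup_users(["bolsterlabs", "benslavin", "skevy"])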

Use a pool of workers with different IPs and API keys

Multiple clients

Retrieval / Rate Limits

Ask the upstream provider.

Special access

Retrieval / Rate Limits

No, you come to me.

Web hooks

Retrieval

Asynchronous from the user’s perspective.

Out of band

Retrieval / Web Hooks

Used by Gowalla, Myspace, Google

PubSubHubbub

Retrieval / Web Hooks

True ‘push’.

The data comes to you

Retrieval / Web Hooks

Class-based views or plain-old methods. It's just Django w/ different auth.

Just handle it

Retrieval / Web Hooks
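
A rough sketch of "just handling it" with a plain view; the header name, shared secret, and handle_update hook are assumptions for illustration.

import json

from django.http import HttpResponse, HttpResponseForbidden
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST


def handle_update(payload):
    # Placeholder: hand the payload off for asynchronous processing.
    pass


@csrf_exempt   # the upstream service won't have a CSRF token
@require_POST
def webhook(request):
    # "Different auth": check a shared secret instead of a user session.
    if request.META.get("HTTP_X_WEBHOOK_SECRET") != "expected-secret":
        return HttpResponseForbidden()
    handle_update(json.loads(request.body))
    return HttpResponse(status=204)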

Or worse, completely manual.

Setup can be complex

Retrieval / Web Hooks

Long-lived, open-socket communication.

Streaming

Retrieval

but only when you’re connected.

Live updates

Retrieval / Streaming

can be a significant bottleneck

Single client

Retrieval / Streaming

“Site Streams may deliver hundreds of messages per second to a client, and each stream may consume significant (> 1 Mbit/sec) bandwidth. Your processing of tweets should be asynchronous, with appropriate buffers in place to handle spikes of 3x normal throughput. Note that slow reading clients are automatically terminated.”

https://dev.twitter.com/docs/streaming-api/site-streams

Retrieval / Streaming

Pass data off as quickly as possible

Hot potato

Retrieval / Streaming
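
One way to play hot potato, assuming a Redis list as the hand-off buffer; the stream URL and key name are illustrative.

import redis
import requests

r = redis.Redis()

def consume_stream(url):
    # Read the stream line by line and hand each message off immediately,
    # so this client never reads slowly enough to get disconnected.
    with requests.get(url, stream=True, timeout=90) as resp:
        for line in resp.iter_lines():
            if not line:
                continue  # keep-alive newline
            # Store it now; workers parse and process it out of band.
            r.rpush("raw_stream", line)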

This data is ephemeral, and there may beno good way to recreate it once it’s gone.

STORE IT. LOG IT. SAVE IT.

Retrieval

Processing

Denormalization

Processing

Your DB is slow.*

* Unless you know Frank Wiles.

Processing / Denormalization

db_index=True is not the answer

Processing / Denormalization

Tweet.objects.filter(screenname="aplusk").count()

Processing / Denormalization

TweetCount.objects.get(screenname="aplusk").tweet_count

Processing / Denormalization

Also consider memcached, Redis, etc.

pre_save, post_save, post_delete and F objects

Processing / Denormalization

Use these.

These only fire for changes made through the Django ORM

Be careful

Processing / Denormalization
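
A minimal sketch of keeping the denormalized count from the earlier slides current with post_save and an F expression; the field names are assumptions.

from django.db import models
from django.db.models import F
from django.db.models.signals import post_save
from django.dispatch import receiver


class Tweet(models.Model):
    screenname = models.CharField(max_length=50)
    text = models.TextField()


class TweetCount(models.Model):
    screenname = models.CharField(max_length=50, unique=True)
    tweet_count = models.IntegerField(default=0)


@receiver(post_save, sender=Tweet)
def bump_count(sender, instance, created, **kwargs):
    # Only fires for saves made through the ORM -- see "Be careful" above.
    if not created:
        return
    TweetCount.objects.get_or_create(screenname=instance.screenname)
    # F() makes the database do the increment atomically, avoiding a
    # read-modify-write race between concurrent workers.
    TweetCount.objects.filter(screenname=instance.screenname).update(
        tweet_count=F("tweet_count") + 1)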

Workers

Processing

Deconstruct the problem

Processing / Workers

Check for profanity, then
Retrieve an avatar, then
Geo-locate the author, then
Add as input for trending terms, then
Retrieve author's social graph, then
Adjust the leaderboard

Check for profanity, then:
Retrieve an avatar
Geo-locate the author
Add as input for trending terms
Retrieve author's social graph
Adjust the leaderboard

Or manage yourself with any queue

django-celery

Processing / Workers
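
A sketch of the deconstructed pipeline as celery tasks; the task bodies are stubs, and the signature style follows current Celery rather than the 2011 django-celery release.

from celery import chain, group, shared_task


@shared_task
def check_profanity(tweet_id):
    # Gate-keeping step; pass the id along to the rest of the pipeline.
    return tweet_id

@shared_task
def retrieve_avatar(tweet_id): ...

@shared_task
def geolocate_author(tweet_id): ...

@shared_task
def update_trending(tweet_id): ...

@shared_task
def fetch_social_graph(tweet_id): ...

@shared_task
def adjust_leaderboard(tweet_id): ...


def process(tweet_id):
    # Profanity check runs first; the independent steps then fan out
    # across the worker pool instead of running one after another.
    chain(
        check_profanity.s(tweet_id),
        group(
            retrieve_avatar.s(),
            geolocate_author.s(),
            update_trending.s(),
            fetch_social_graph.s(),
            adjust_leaderboard.s(),
        ),
    ).delay()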

It’s not that scary

map + reduce

Processing

Get data. Process. Cache results.

Generational

Processing / map + reduce

Especially where the intermediate working set is large.

Good for many problems.

Processing / map + reduce / Generational

CouchDB, Mongo, Hadoop

Solutions exist.

Processing / map + reduce / Generational

Sometimes we can be smarter

Processing / map + reduce

Consider averages.

Processing / map + reduce / incremental

I mean the mean.

(1/n) Σ_{i=1}^{n} a_i

Processing / map + reduce / incremental

(1/n) Σ_{i=1}^{n} a_i = (1/n) ( Σ_{i=1}^{n-1} a_i + a_n )

From O(n) to O(1)

This example was trivial, but you can often

Store a partial solution

Processing / map + reduce / incremental
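
The same idea in code: a sketch of storing the partial solution (the running sum) so each new value updates the mean in O(1) instead of re-reducing all n values.

class RunningMean:
    # Keep the sum and count instead of re-reducing the whole dataset.
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, value):
        self.total += value           # sum_{i=1..n-1} a_i + a_n
        self.count += 1
        return self.total / self.count


mean = RunningMean()
for x in (3, 5, 10):
    current = mean.add(x)             # O(1) per update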

Presentation (of the data, not the thing we're doing now)

Partial Caching

Presentation

Template fragment caching

{% cache 500 my_stuff %}

Presentation / Partial Caching

class MyModel(models.Model):
    as_html = models.TextField()

Presentation / Partial Caching

Don’t be afraid of low-level caching.

serialized = json.dumps(my_stuff)
cache.set('my_stuff', serialized)

Presentation / Partial Caching

Continuous Caching

Presentation

while True:
    cache_page()

Presentation / Continuous Caching

Works when the number of pages is relatively small. Similar to proxy_cache, but more resilient.

Out-of-band caching.

Presentation / Continuous Caching
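
A sketch of that out-of-band loop, assuming a small, known set of pages and Django's cache framework; the page list, key names, and base URL are illustrative.

import time

import requests
from django.core.cache import cache

PAGES = ["/leaderboard/", "/trends/"]  # assumed: small, known page set


def cache_pages_forever(base_url="http://localhost:8000"):
    # Regenerate pages continuously outside the request/response cycle;
    # the web tier only ever serves what is already in the cache, so a
    # slow render or upstream hiccup never blocks a user request.
    while True:
        for path in PAGES:
            resp = requests.get(base_url + path, timeout=30)
            if resp.ok:
                cache.set("page:" + path, resp.text, timeout=None)
        time.sleep(1)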

[Watch this space.]

Presentation / Continuous Caching

Real-Time Updates

Presentation

gevent, eventlet, tornado, twisted

Presentation / Real-Time Updates

Django plays well with others

Presentation / Real-Time Updates


Django + RabbitMQ + node.js + socket.io ♥

[Watch this space.]

Presentation / Real-Time Updates
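
A sketch of the Django side of that stack: publish an event to RabbitMQ (using pika here, an assumption for the client library) so a node.js + socket.io process can fan it out to connected browsers.

import json

import pika  # assumption: pika as the RabbitMQ client


def publish_update(payload):
    # Push onto a fanout exchange; the node.js/socket.io process is the
    # consumer and forwards messages to connected browsers.
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.exchange_declare(exchange="updates", exchange_type="fanout")
    channel.basic_publish(exchange="updates", routing_key="",
                          body=json.dumps(payload))
    connection.close()


publish_update({"event": "leaderboard_changed"})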

Failure Models

Presentation

Presentation / Failure Models

This isn’t good for anyone.

proxy_cache_use_stale and proxy_next_upstream

For use with nginx.

Presentation / Failure Models

It can serve pre-cached content. Anything is better than a 404, 500 or 502 (usually).

Build a small backup app.

Presentation / Failure Models

Follow @bolsterlabs for slides and lively discussion.

Thank you.

@bolsterlabs

Don't be a stranger.

Ben Slavin
@benslavin
ben@bolsterlabs.com

Adam Miskiewicz
@skevy
adam@bolsterlabs.com
