asynchronous architectures for implementing scalable cloud services - evan cooke - gluecon 2012

Asynchronous Architectures for Implementing Scalable Cloud Services

Designing for Graceful Degradation

EVAN COOKE

CO-FOUNDER & CTO

twilio CLOUD COMMUNICATIONS

Cloud services power the apps that are the backbone of modern society: how we work, play, and communicate.

Cloud Workloads Can Be Unpredictable

6x spike in 5 mins

SMS API Usage

[Figure: Request Latency and Load over Time, ending in FAIL]

Danger! Load higher than instantaneous throughput

Don’t Fail Requests

[Diagram: Incoming Requests -> Load Balancer -> App Servers (AAA, Throttling) -> Worker Pool (W), shown at 10%, 70%, and 100%+; plot of Failed Requests over Time]

Worker Pools, e.g., Apache/Nginx

Problem Summary

• Cloud services often use worker pools to handle incoming requests

• When load exceeds the size of the worker pool, requests fail
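To make the failure mode above concrete, here is a minimal sketch of a fixed-size worker pool that rejects work once every worker is busy. The names `WorkerPool`, `try_handle`, and `simulate` are hypothetical, and the sketch assumes no request queue at all; real servers such as Apache or Nginx have their own backlog and queueing behavior.

```python
import threading

class WorkerPool:
    """Hypothetical fixed-size pool: one slot per worker, no queueing."""

    def __init__(self, size):
        self._slots = threading.BoundedSemaphore(size)

    def try_handle(self):
        # Non-blocking acquire: returns False right away when all workers are busy.
        return self._slots.acquire(blocking=False)

    def release(self):
        # Called when a worker finishes its request.
        self._slots.release()

def simulate(pool_size, concurrent_requests):
    """Return (accepted, failed) when requests arrive at once and none finish."""
    pool = WorkerPool(pool_size)
    accepted = sum(1 for _ in range(concurrent_requests) if pool.try_handle())
    return accepted, concurrent_requests - accepted
```

With a pool of 4 workers, a burst of 10 simultaneous requests leaves 6 of them failed even if each one would have been cheap to serve a moment later.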

What next?

A few observations based on work implementing and scaling the Twilio API over the past 4 years...

• Twilio Voice/SMS Cloud APIs

• 100,000 Twilio Developers

• 100+ employees

Observation 1

For many APIs, taking more time to service a request is better than failing that request

Implication: in many cases, it is better to service a request with some delay rather than failing it

Observation 2

Matching the amount of available resources precisely to the size of incoming request worker pools is challenging

Implication: under load, it may be possible to delay or drop only those requests that truly impact resources

What are we going to do?

Suggestion: if request concurrency were very cheap, we could implement delay and finer-grained resource controls much more easily...

Event-driven programming and the Reactor Pattern


req = 'GET /';
req.append('\r\n\r\n');
socket.write(req);
resp = socket.read();
print(resp);

[Timeline: Worker activity over Time; relative time per line: 1, 1, 10000x, 10000000x, 10]



Huge IO latency blocks worker


req = 'GET /';
req.append('\r\n\r\n');
socket.write(req, fn() {
  socket.read(fn(resp) {
    print(resp);
  });
});

Make IO operations async and “callback” when done




Central dispatch to coordinate event callbacks

reactor.run_forever();




[Timeline: Worker activity over Time with async IO; relative time per line: 1, 1, 10, 10, 10]

Result: we don’t block the worker
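The central dispatch idea can be sketched with Python's standard `selectors` module. This is a toy illustration, not how any of the frameworks in this talk are implemented: the `Reactor` class, its `on_readable` method, and `demo` are hypothetical names, and the loop stops when no callbacks remain instead of running forever.

```python
import selectors
import socket

class Reactor:
    """Toy central dispatcher: waits for readiness events and runs callbacks."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._pending = 0

    def on_readable(self, sock, callback):
        # Register interest; callback(sock) runs once when sock has data.
        self._selector.register(sock, selectors.EVENT_READ, callback)
        self._pending += 1

    def run(self):
        # Unlike run_forever(), stop once no callbacks remain registered.
        while self._pending:
            for key, _events in self._selector.select():
                self._selector.unregister(key.fileobj)
                self._pending -= 1
                key.data(key.fileobj)

    def close(self):
        self._selector.close()

def demo():
    results = []
    a, b = socket.socketpair()      # stands in for a client/server connection
    reactor = Reactor()
    reactor.on_readable(b, lambda s: results.append(s.recv(1024)))
    a.sendall(b"GET /\r\n\r\n")     # data arrives at some point
    reactor.run()                   # dispatches the read callback
    reactor.close()
    a.close()
    b.close()
    return results
```

The worker thread only ever blocks inside `select()`, where it waits on all registered sockets at once, so a single slow peer does not hold it hostage.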

(Some) Reactor Pattern Frameworks

js/node.js

python/twisted

python/gevent

c/libevent

c/libev

ruby/eventmachine

java/nio/netty

The Callback Mess

Python Twisted

req = 'GET /'
req += '\r\n\r\n'

def r(resp):
    print(resp)

def w(result):
    # Callbacks receive the result of the previous step.
    socket.read().addCallback(r)

socket.write(req).addCallback(w)

The Callback Mess


yield socket.write()
resp = yield socket.read()
print(resp)

Use deferred generators and inline callbacks


Easy sequential programming with mostly implicit async IO
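How a `yield` can stand in for a callback is easier to see in a small, library-free sketch. Twisted provides this through `twisted.internet.defer.inlineCallbacks`; the `Deferred` class, `inline_callbacks` decorator, and `demo` below are simplified, hypothetical stand-ins written only for illustration, with no error handling.

```python
class Deferred:
    """Hypothetical stand-in for a Twisted-style deferred result."""

    def __init__(self):
        self._callbacks = []

    def add_callback(self, fn):
        self._callbacks.append(fn)

    def fire(self, value):
        for fn in self._callbacks:
            fn(value)

def inline_callbacks(gen_func):
    """Drive a generator: each yielded Deferred resumes it when fired."""
    def start(*args):
        gen = gen_func(*args)

        def step(value=None):
            try:
                deferred = gen.send(value)   # run until the next yield
            except StopIteration:
                return
            deferred.add_callback(step)      # resume later, with the result

        step()
    return start

def demo():
    log = []
    write_done, read_done = Deferred(), Deferred()

    @inline_callbacks
    def fetch():
        yield write_done            # looks sequential, but suspends here
        resp = yield read_done      # resumed with the value passed to fire()
        log.append(resp)

    fetch()
    log.append("worker is free")    # runs while fetch() is suspended
    write_done.fire(None)
    read_done.fire("200 OK")
    return log
```

The generator body reads top to bottom like blocking code, while the decorator quietly registers the rest of the function as the callback at each `yield`.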

Enter gevent

“gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libevent event loop.”

socket.write()
resp = socket.read()
print(resp)

Natively Async

http://en.wikipedia.org/wiki/Coroutine

http://www.python.org/

http://codespeak.net/py/0.9.2/greenlet.html

http://monkey.org/~provos/libevent/

Enter gevent

from gevent.server import StreamServer

def echo(socket, address):
    print('New connection from %s:%s' % address)
    socket.sendall(b'Welcome to the echo server!\r\n')
    # Wrap the socket in a file object so we can use readline().
    fileobj = socket.makefile(mode='rwb')
    line = fileobj.readline()
    fileobj.write(line)
    fileobj.flush()
    print("echoed %r" % line)

if __name__ == '__main__':
    server = StreamServer(('0.0.0.0', 6000), echo)
    server.serve_forever()

Simple Echo Server

Easy sequential model

Fully async

Async Services with Ginkgo

Ginkgo is a simple framework for composing async gevent services with common configuration, logging, daemonizing, etc.

https://github.com/progrium/ginkgo

Let’s look at a simple example that implements a TCP and HTTP server...



Async Services with Ginkgo

# Import WSGI/TCP servers
import gevent
from gevent.pywsgi import WSGIServer
from gevent.server import StreamServer

from ginkgo.core import Service

# HTTP handler
def handle_http(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    print('new http request!')
    return [b"hello world"]

# TCP handler
def handle_tcp(socket, address):
    print('new tcp connection!')
    while True:
        socket.send(b'hello\n')
        gevent.sleep(1)

# Service composition
app = Service()
app.add_service(StreamServer(('127.0.0.1', 1234), handle_tcp))
app.add_service(WSGIServer(('127.0.0.1', 8080), handle_http))
app.serve_forever()

[Diagram: Incoming Requests -> Load Balancer -> Async Server, Async Server, Async Server, ...]

Using our async reactor-based approach, let’s redesign our serving infrastructure

[Diagram: Incoming Requests -> Load Balancer -> Async Servers, each with an AAA layer]

Step 1: define an authentication and authorization layer that will identify the user and the resource being requested

[Diagram: Incoming Requests -> Load Balancer -> Async Servers, each with AAA and Throttling layers; Concurrency Manager]

Step 2: add a throttling layer and concurrency manager

Concurrency Admission Control

• Goal: limit concurrency by delaying or selectively failing requests

• Common metrics
  - By Account
  - By Resource Type
  - By Availability of Dependent Resources

• What we’ve found useful
  - By (Account, Resource Type)
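Keying admission control by (Account, Resource Type) can be sketched as a counter of in-flight requests per key. This is an assumption-laden illustration, not Twilio's implementation: the `ConcurrencyManager` class, its methods, and the single shared `limit_per_key` are hypothetical, and a real deployment would need state shared across servers.

```python
from collections import defaultdict

class ConcurrencyManager:
    """Hypothetical in-memory tracker of in-flight requests per key."""

    def __init__(self, limit_per_key):
        self._limit = limit_per_key
        self._in_flight = defaultdict(int)

    def try_admit(self, account, resource_type):
        # Admit only while this (account, resource type) pair is under its limit.
        key = (account, resource_type)
        if self._in_flight[key] >= self._limit:
            return False
        self._in_flight[key] += 1
        return True

    def release(self, account, resource_type):
        # Called when an admitted request completes.
        key = (account, resource_type)
        if self._in_flight[key] > 0:
            self._in_flight[key] -= 1
```

Because the key combines both dimensions, one account saturating its SMS limit does not affect its own voice requests or another account's SMS requests.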

Delay - delay responses without failing requests

[Figure: two Latency vs. Load plots; one is marked Fail]

Deny - deny requests based on resource usage
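The two policies combine naturally: delay a request while it waits for a free slot, and deny it only once a delay budget is spent. The sketch below uses the standard library's `asyncio` in place of gevent so that it is self-contained; `Denied`, `with_admission`, `max_delay`, and `demo` are hypothetical names, and the timings are arbitrary.

```python
import asyncio

class Denied(Exception):
    """Raised when a request waited max_delay without getting a slot."""

async def with_admission(slots, max_delay, handler):
    # Delay: wait for a free slot instead of failing immediately.
    try:
        await asyncio.wait_for(slots.acquire(), timeout=max_delay)
    except asyncio.TimeoutError:
        # Deny: only after the delay budget is spent.
        raise Denied()
    try:
        return await handler()
    finally:
        slots.release()

async def demo():
    slots = asyncio.Semaphore(1)   # one unit of constrained resource

    async def handler():
        await asyncio.sleep(0.05)  # simulated work holding the resource
        return "ok"

    # Three concurrent requests, one slot: the extra two are delayed, not failed.
    delayed = await asyncio.gather(
        *(with_admission(slots, 1.0, handler) for _ in range(3)))

    # With a tiny delay budget, the request behind a busy slot is denied.
    results = await asyncio.gather(
        with_admission(slots, 1.0, handler),
        with_admission(slots, 0.001, handler),
        return_exceptions=True)
    return delayed, results
```

Waiting costs almost nothing here because a suspended coroutine holds no worker, which is the property that makes the delay policy practical.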

[Diagram: Incoming Requests -> Load Balancer -> App Servers, each with AAA and Throttling layers -> Dependent Services, each with Throttling; Concurrency Manager]

Step 3: allow backend resources to throttle requests
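One way to picture Step 3 is a dependent service that answers with an explicit "throttled" signal when it is saturated, and a front end that holds the request open and retries after a pause. Everything in this sketch is hypothetical (the `Backend` class, the `"throttled"` return value, `call_with_backoff`, and the retry numbers), and it again uses `asyncio` rather than gevent to stay self-contained.

```python
import asyncio

class Backend:
    """Hypothetical dependent service that can tell callers to back off."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.active = 0

    async def call(self):
        if self.active >= self.capacity:
            return "throttled"          # explicit signal instead of an error
        self.active += 1
        try:
            await asyncio.sleep(0.02)   # simulated backend work
            return "done"
        finally:
            self.active -= 1

async def call_with_backoff(backend, retries=50, pause=0.01):
    # The async server holds the request open and retries after a short pause.
    for _ in range(retries):
        result = await backend.call()
        if result != "throttled":
            return result
        await asyncio.sleep(pause)
    return "denied"                     # give up only after all retries

async def demo():
    backend = Backend(capacity=1)
    return await asyncio.gather(
        *(call_with_backoff(backend) for _ in range(3)))
```

The backend protects itself by refusing work it cannot take, while the caller converts that refusal into added latency rather than a failed request.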

Summary

Async frameworks like gevent allow you to easily decouple a request from access to constrained resources

[Figure: Request Latency over Time, ending in Service-wide Failure]

Don’t Fail Requests

Decrease Performance
