Scaling the Britain's Got Talent buzzer
Post on 28-Nov-2014
Powering the Britain’s Got Talent buzzer*
*And Big Data
Big Data Meetup, London 25/5/2011
Thursday, 26 May 2011
What we do
Me
Malcolm Box, Co-founder & CTO
boxm@livetalkback.com
@malcolmbox
The Buzzer
BIG DATA
The challenge
10 Million+ viewers
Design goal of 50,000 requests/s, 10,000 buzzes/s
Equivalent to 130 billion requests/month
But just on Saturday night
And four weeks to build
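The monthly-equivalent figure can be checked with a quick back-of-envelope calculation. This is only a sketch of the arithmetic: the 30-day month is an assumption, and the constant names are illustrative.

```python
# Back-of-envelope check of the monthly-equivalent figure.
REQUESTS_PER_SECOND = 50_000        # design goal from the slide
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30                 # assumption: a 30-day month

# Sustaining the peak rate around the clock for a whole month.
requests_per_month = REQUESTS_PER_SECOND * SECONDS_PER_DAY * DAYS_PER_MONTH

print(f"{requests_per_month / 1e9:.1f} billion requests/month")
```

Sustained all month, the design rate works out to roughly 130 billion requests, which is where the comparison comes from.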
The challenge
Source: http://www.google.com/adplanner/static/top1000/#
Where do 130 billion requests fit?
Where we started....
ELB
Webserver / Django / Ubuntu
Webserver / Django / Ubuntu
MySQL
app.livetalkback.com
Zabbix
Control plane
S3
CloudFront
cdn.livetalkback.com
Step 1: Testing
Started with a platform with a previous peak of 100 requests/s
No idea where it would break
Tsung! http://tsung.erlang-projects.org/
Step 2: ELB
Amazon Elastic Load Balancer
“Infinite capacity”
BUT very long impulse response and NO controls :(
HAProxy to the rescue
5K requests/s per node
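The per-node figure gives a rough way to size the load-balancer tier. The following is a sketch only: `nodes_needed` and the optional headroom parameter are hypothetical helpers, not something from the talk.

```python
PEAK_REQUESTS_PER_SECOND = 50_000       # design goal stated on the challenge slide
PER_NODE_REQUESTS_PER_SECOND = 5_000    # per-node HAProxy figure from this slide


def nodes_needed(peak_rps: int, per_node_rps: int, headroom_percent: int = 0) -> int:
    """Smallest node count that carries peak_rps plus optional spare capacity."""
    required = peak_rps * (100 + headroom_percent)
    capacity = per_node_rps * 100
    return -(-required // capacity)     # integer ceiling division


min_nodes = nodes_needed(PEAK_REQUESTS_PER_SECOND, PER_NODE_REQUESTS_PER_SECOND)
print(min_nodes)
```

With no headroom this gives the bare minimum for the design load. Passing a non-zero `headroom_percent` shows how the count grows if spare capacity is wanted.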
Step 3: Avoid the DB
MySQL was never going to be able to handle 10,000 writes/s, nor 50,000 reads/s
“Hey, Django does memcached. Problem solved”
Help, our memcached server I/O is maxed out :(
Two-layer cache: https://gist.github.com/953524
Write-behind data
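The last two bullets can be sketched roughly as follows. This is not the code from the linked gist: it is a minimal illustration that assumes a short-lived per-process cache in front of the shared cache, and an in-memory queue that is flushed to the slow store in batches. The class names are hypothetical, and a plain dict stands in for a memcached client.

```python
import time
from collections import deque


class TwoLayerCache:
    """Per-process cache (layer 1) in front of a shared cache (layer 2)."""

    def __init__(self, shared, local_ttl=1.0, clock=time.monotonic):
        self.shared = shared          # dict standing in for a memcached client
        self.local_ttl = local_ttl    # seconds an entry may be served locally
        self.clock = clock
        self._local = {}              # key -> (expires_at, value)

    def get(self, key):
        entry = self._local.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]           # served without touching the shared cache
        value = self.shared.get(key)  # fall through to the shared layer
        if value is not None:
            self._local[key] = (self.clock() + self.local_ttl, value)
        return value


class WriteBehindBuffer:
    """Queue writes in memory and persist them later in batches."""

    def __init__(self, store_batch):
        self.store_batch = store_batch  # callable that persists a list of records
        self._pending = deque()

    def write(self, record):
        self._pending.append(record)    # returns at once; nothing is persisted yet

    def flush(self):
        batch = list(self._pending)
        self._pending.clear()
        if batch:
            self.store_batch(batch)
        return len(batch)
```

The trade-offs are the usual ones: the local layer can serve a value that is up to `local_ttl` seconds stale, and anything still queued in the write-behind buffer is lost if the process dies before `flush()` runs.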
But we want analytics!
Now 10K things to write to disk every second
Logging? Database?
This is starting to look like BIG DATA
Step 4: Baby
Step 5: Cassandra
Deployed Cassandra cluster on EC2 to handle buzz records
Tested to > 10K writes/s
All good!
“So how many users did we have last night?”
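That closing question is awkward for a store tuned for fast appends: the number of distinct users is not stored anywhere and has to be derived from the buzz records. A minimal sketch of the naive approach is below. The record layout (`user_id`, `at`) is an assumption for illustration, not taken from the talk, and a plain list stands in for the stored records.

```python
from datetime import datetime

# Hypothetical buzz records; field names and values are illustrative only.
buzzes = [
    {"user_id": "u1", "at": datetime(2011, 5, 21, 20, 15)},
    {"user_id": "u2", "at": datetime(2011, 5, 21, 20, 16)},
    {"user_id": "u1", "at": datetime(2011, 5, 21, 20, 40)},
    {"user_id": "u3", "at": datetime(2011, 5, 14, 20, 5)},
]


def distinct_users(records, start, end):
    """Count unique users with at least one buzz in the window [start, end)."""
    return len({r["user_id"] for r in records if start <= r["at"] < end})


last_night = distinct_users(
    buzzes,
    start=datetime(2011, 5, 21, 0, 0),
    end=datetime(2011, 5, 22, 0, 0),
)
print(last_night)
```

The answer needs a pass over every record in the window, which is cheap for four rows and much less so at 10K writes/s.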
Where we ended...
HAProxy
HAProxy
Webserver / Django / Ubuntu
Webserver / Django / Ubuntu
Memcached
Cassandra
RDS Master
app.livetalkback.com
Chef
Zabbix
Control plane
Cassandra
Memcached
S3
CloudFront
cdn.livetalkback.com
10 nodes
100+ nodes
Scaling up - and down
Configuring 100+ servers by hand each week would have been a pain
Used Chef to automate
Also builds the test swarm
http://wiki.opscode.com/display/chef/Home
Now what?
Still challenges with analytics & ad-hoc queries
Looking at Brisk and Hadoop
We’re sucking the Twitter firehose for Tellybug
MySQL is coping so far, but only just
Questions?
boxm@livetalkback.com
@malcolmbox