behind the scenes at livejournal: scaling storytime
DESCRIPTION
Brad talks about clustering setups using MySQL and DRBD and their Open Source software, most of which he wrote initially and continues to develop. Many of these techniques and much of this software are used by other companies as well, among them Flickr/Yahoo! and Facebook.
TRANSCRIPT
http://danga.com/words/
LiveJournal: Behind The Scenes
Scaling Storytime
June 2007
USENIX
Brad [email protected]
danga.com / livejournal.com / sixapart.com
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/1.0/ or send a letter to
Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.
The plan...
Refer to previous presentations for more details... http://danga.com/words/
Questions anytime! Yell. Interrupt.
Part 0:
− show where talk will end up
Part I:
− What is LiveJournal? Quick history.
− LJ’s scaling history
Part II:
− explain all our software,
− explain all the moving parts
LiveJournal Backend: Today (Roughly.)
[Architecture diagram. Components shown:]
net.
BIG-IP: bigip1, bigip2
perlbal (httpd/proxy): proxy1, proxy2, proxy3, proxy4, proxy5
mod_perl: web1, web2, web3, web4, ... webN
Memcached: mc1, mc2, mc3, mc4, ... mcN
Global Database: master_a, master_b, slave1, slave2 ... slave5
User DB Cluster 1: uc1a, uc1b
User DB Cluster 2: uc2a, uc2b
User DB Cluster 3: uc3a, uc3b
User DB Cluster N: ucNa, ucNb
Job Queues (xN): jqNa, jqNb
MogileFS Database: mog_a, mog_b
Mogile Trackers: tracker1, tracker3
Mogile Storage Nodes: sto1, sto2, ... sto8
djabberd (x3)
gearmand: gearmand1 ... gearmandN
“workers”: gearwrkN, theschwkN
slave1, slaveN
LiveJournal Overview
college hobby project, Apr 1999
4-in-1:
− blogging
− forums
− social-networking (“friends”)
− aggregator: “friends page”
− “friends” can be external RSS/Atom
10M+ accounts
Open Source!
− server,
− infrastructure,
− original clients,
Stuff we've built... (all production, open source)
memcached
− distributed caching
MogileFS
− distributed filesystem
Perlbal
− HTTP load balancer, web server, swiss-army knife
gearman
− LB/HA/coalescing low-latency function call “router”
TheSchwartz
− reliable, async job dispatch system
djabberd
− the super-extensible everything-is-a-plugin mod_perl/qpsmtpd/Eclipse of XMPP/Jabber servers
.....
OpenID
− federated identity protocol
“Uh, why?”
NIH? (Not Invented Here?)
Are we reinventing the wheel?
Yes.
We build wheels.
− ... when existing suck,
− ... or don’t exist.
(yes, arguably tires. sshh..)
Part I
Quick Scaling History
Quick Scaling History
1 server to hundreds...
you can do all this with just 1 server!
− then you’re ready for tons of servers, without pain
− don’t repeat our scaling mistakes
Terminology
Scaling:
− NOT: “How fast?”
− But: “When you add twice as many servers, are you twice as fast (or have twice the capacity)?”
Fast still matters,
− 2x faster: 50 servers instead of 100... that’s some good money
− but that’s not what scaling is.
Terminology
“Cluster”
− varying definitions... basically:
− making a bunch of computers work together for some purpose
− what purpose?
  load balancing (LB), high availability (HA)
Load Balancing? High Availability?
Venn Diagram time!
− I love Venn Diagrams
LB vs. HA
[Venn diagram: Load Balancing / High Availability. Labels shown:]
http reverse proxy, wackamole, ...
round-robin DNS, data partitioning, ....
LVS heartbeat, cold/warm/hot spare, ...
Favorite Venn Diagram
[Venn diagram:]
Times When I’m Truly Happy
Times When I’m Wearing Pants
One Server
Simple:
mysql
apache
Two Servers
mysql
apache
Two Servers - Problems
Two single points of failure!
No hot or cold spares
Site gets slow again.
− CPU-bound on web node
− need more web nodes...
Four Servers
3 webs, 1 db
Now we need to load-balance!
− LVS, mod_backhand, wackamole, BIG-IP, Alteon, pound, Perlbal, etc, etc..
− ...
Four Servers - Problems
Now I/O bound...
− ... how to use another database?
Five Servers
introducing MySQL replication
We buy a new DB
MySQL replication
Writes to DB (master)
Reads from both
More Servers
Chaos!
Where we're at....
[Architecture diagram. Components shown:]
net.
BIG-IP: bigip1, bigip2
mod_proxy: proxy1, proxy2, proxy3
mod_perl: web1, web2, web3, web4, ... web12
Global Database: master, slave1, slave2 ... slave6
Problems with Architecture
or, “This don't scale...”
DB master is SPOF
Adding slaves doesn't scale well...
− only spreads reads, not writes!
[Diagram: one database at 200 writes/s + 500 reads/s; with two databases, each still takes 200 writes/s but only 250 reads/s]
Eventually...
databases eventually only writing
[Diagram: many replicas, each doing 400 writes/s and only 3 reads/s]
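The arithmetic behind the replication slides can be sketched as below. This is a simplified model and not from the talk: it assumes every replica applies every write, reads are spread evenly, and each machine has a fixed operations-per-second budget. The function names and the 403 ops/s capacity figure are illustrative assumptions chosen to line up with the slide numbers.

```python
def per_server_load(writes_per_s, reads_per_s, n_servers):
    """Ops/s each replica handles: every write, plus an even share of reads."""
    return writes_per_s + reads_per_s / n_servers


def read_capacity(writes_per_s, capacity_per_server, n_servers):
    """Total reads/s the pool can serve after writes take their share."""
    spare = max(capacity_per_server - writes_per_s, 0)
    return spare * n_servers


if __name__ == "__main__":
    # Doubling the servers does not halve the load: writes are not shared.
    print(per_server_load(200, 500, 1))  # 700.0
    print(per_server_load(200, 500, 2))  # 450.0
    # Assumed capacity of 403 ops/s per box: at 400 writes/s, only
    # 3 reads/s of headroom is left on each of the 7 replicas.
    print(read_capacity(400, 403, 7))    # 21
```

Under these assumptions, adding replicas only grows read capacity by the small amount each box has left over after applying writes.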
Spreading Writes
Our database machines already did RAID
We did backups
So why put user data on 6+ slave machines? (~12+ disks)
− overkill redundancy
− wasting time writing everywhere!
Partition your data!
Spread your databases out, into “roles”
− roles that you never need to join between
  different users
  or accept you'll have to join in app
Each user assigned to a numbered HA cluster
Each cluster has multiple machines
− writes self-contained in cluster (writing to 2-3 machines, not 6)
User Clusters
SELECT userid,clusterid FROM user WHERE user='bob'
userid: 839
clusterid: 2
SELECT .... FROM ... WHERE userid=839 ...
OMG i like totally hate my parents they just dont understand me and i h8 the world omg lol rofl *! :^-^^;
add me as a friend!!!
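The two-step lookup on the User Clusters slide can be sketched as follows. This is a minimal illustration, not LiveJournal's code: in-memory SQLite databases stand in for the global database and the user clusters, and the `log` table, its columns, and all function names are hypothetical.

```python
import sqlite3


def make_global_db():
    """Global DB: only maps a username to (userid, clusterid)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE user (userid INTEGER PRIMARY KEY, "
        "user TEXT UNIQUE, clusterid INTEGER)"
    )
    db.execute("INSERT INTO user VALUES (839, 'bob', 2)")
    return db


def make_cluster_db(rows):
    """One user cluster: holds the actual per-user data (hypothetical table)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE log (userid INTEGER, itemid INTEGER, body TEXT, "
        "PRIMARY KEY (userid, itemid))"
    )
    db.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
    return db


def entries_for(global_db, clusters, username):
    """Step 1: ask the global DB where the user lives. Step 2: ask that cluster."""
    row = global_db.execute(
        "SELECT userid, clusterid FROM user WHERE user = ?", (username,)
    ).fetchone()
    if row is None:
        return []
    userid, clusterid = row
    cur = clusters[clusterid].execute(
        "SELECT body FROM log WHERE userid = ? ORDER BY itemid", (userid,)
    )
    return [body for (body,) in cur]


if __name__ == "__main__":
    clusters = {
        1: make_cluster_db([]),
        2: make_cluster_db([(839, 1, "add me as a friend!!!")]),
    }
    print(entries_for(make_global_db(), clusters, "bob"))
```

Only the small username-to-cluster mapping lives in the global database; everything else is read from, and written to, the user's own cluster.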
Details
per-user numberspaces
− don't use AUTO_INCREMENT
− PRIMARY KEY (user_id, thing_id)
− so:
Can move/upgrade users 1-at-a-time:
− per-user “readonly” flag
− per-user “schema_ver” property
− user-moving harness
  job server that coordinates, distributed long-lived user-mover clients who ask for tasks
− balancing disk I/O, disk space
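One way to get per-user numberspaces without AUTO_INCREMENT is a small counter row per user. The sketch below is an assumption about how this could be done, not the schema from the talk; the `counter` and `thing` tables and the `add_thing` helper are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3


def open_cluster():
    """Create a stand-in user cluster with a per-user counter table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE counter (user_id INTEGER PRIMARY KEY, "
        "max_id INTEGER NOT NULL)"
    )
    db.execute(
        "CREATE TABLE thing (user_id INTEGER, thing_id INTEGER, body TEXT, "
        "PRIMARY KEY (user_id, thing_id))"
    )
    return db


def add_thing(db, user_id, body):
    """Allocate the next thing_id within this user's own numberspace."""
    with db:  # one transaction: bump the counter, then insert the row
        db.execute("INSERT OR IGNORE INTO counter VALUES (?, 0)", (user_id,))
        db.execute(
            "UPDATE counter SET max_id = max_id + 1 WHERE user_id = ?",
            (user_id,),
        )
        (thing_id,) = db.execute(
            "SELECT max_id FROM counter WHERE user_id = ?", (user_id,)
        ).fetchone()
        db.execute(
            "INSERT INTO thing VALUES (?, ?, ?)", (user_id, thing_id, body)
        )
    return thing_id


if __name__ == "__main__":
    db = open_cluster()
    print(add_thing(db, 839, "first"))   # 1
    print(add_thing(db, 839, "second"))  # 2
    print(add_thing(db, 7, "other"))     # 1
```

Because ids are only unique per user, a user's rows can be copied to another cluster without colliding with ids already allocated there.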
Shared Storage
(SAN, SCSI, DRBD...)
Turn pair of InnoDB machines into a cluster
− looks like 1 box to outside world. floating IP.
One machine at a time mounting fs, running MySQL
Heartbeat to move IP, {un,}mount filesystem, {stop,start} mysql
filesystem repairs, innodb repairs, don’t lose any committed transactions.
No special schema considerations
MySQL 4.1 w/ binlog sync/flush options
− good
− The cluster can be a master or slave as well
Shared Storage: DRBD
Linux block device driver
− “Network RAID 1”
− Shared storage without sharing!
− sits atop another block device
− syncs w/ another machine's block device
  cross-over gigabit cable ideal. network is faster than random writes on your disks.
InnoDB on DRBD: HA MySQL!
− can hang slaves off HA pair,
− and/or,
− HA pair can be slave of a master
[Diagram: two machines, each running mysql on ext3 on drbd on sda, with a floater ip between them]
MySQL Clustering Options: Pros & Cons
No magic bullet...
− Master/Slave
  doesn’t scale with writes
− Master/Master
  special schemas
− DRBD
  only HA, not LB
− MySQL Cluster
  special-purpose
− ....
lots of options!
− :)
− :(
Part II
Our Software
Caching
caching's key to performance
− store result of a computation or I/O for quicker future access (classic space/time trade-off)
Where to cache?
− mod_perl/php internal caching
  memory waste (address space per apache child)
− shared memory
  limited to single machine, same with Java/C#/Mono
− MySQL query cache
  flushed per update, small max size
− HEAP tables
  fixed length rows, small max size
memcached
http://www.danga.com/memcached/
our Open Source, distributed caching system
implements a dictionary ADT, with network API
run instances wherever free memory
two-level hash
− client hashes* to server,
− server has internal dictionary (hash table)
no “master node”, nodes aren’t aware of each other
protocol simple, XML-free
− clients: c, perl, java, c#, php, python, ruby, ...
popular, fast
scalable
Protocol Commands
set, add, replace
delete
incr, decr
− atomic, returning new value
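The difference between the storage commands is easy to miss on a slide, so here is a toy in-memory model of their semantics. It is a sketch, not a memcached client: `ToyCache` is a made-up class, there is no network or expiry, and a single-threaded dict only imitates the atomicity the real server provides.

```python
class ToyCache:
    """In-process stand-in that mimics memcached command semantics."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        """Store unconditionally."""
        self._data[key] = value
        return True

    def add(self, key, value):
        """Store only if the key does not exist yet."""
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def replace(self, key, value):
        """Store only if the key already exists."""
        if key not in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        """Remove the key; report whether it was present."""
        if key in self._data:
            del self._data[key]
            return True
        return False

    def incr(self, key, delta=1):
        """Add to a numeric value and return the new value (None if missing)."""
        if key not in self._data:
            return None
        self._data[key] = int(self._data[key]) + delta
        return self._data[key]

    def decr(self, key, delta=1):
        """Subtract, never going below zero, and return the new value."""
        if key not in self._data:
            return None
        self._data[key] = max(int(self._data[key]) - delta, 0)
        return self._data[key]


if __name__ == "__main__":
    cache = ToyCache()
    print(cache.add("hits", 0), cache.add("hits", 5))  # True False
    print(cache.incr("hits", 3), cache.decr("hits", 10))  # 3 0
```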
Picture
[Diagram: three memcached servers and one client]
10.0.0.100:11211 1GB
10.0.0.101:11211 2GB
10.0.0.102:11211 1GB
Slots: 0 1 2 3
Client:
$val = $client->get("foo")
CRC32("foo") % 4 = 2
connect to server[2] ("10.0.0.101:11211")
GET foo
(response)
Client hashing onto a memcached node
Up to client how to pick a memcached node
Traditional way:
− CRC32(<key>) % <num_servers>
− (servers with more memory can own more slots)
− CRC32 was least common denominator for all languages to implement, allowing cross-language memcached sharing
− con: can’t add/remove servers without hit rate crashing
“Consistent hashing”
− can add/remove servers with minimal <key> to <server> map changes
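The trade-off between the two schemes can be measured with a short sketch. The code below is illustrative only: `modulo_pick` applies plain CRC32 modulo the slot count as the slide states (real client libraries may post-process the CRC differently), and `HashRing` is one simple consistent-hashing variant with virtual points, using SHA-256 as an arbitrary choice of ring hash. All names are assumptions.

```python
import bisect
import hashlib
import zlib


def modulo_pick(key, slots):
    """Traditional scheme: CRC32(key) % number_of_slots.

    A server with more memory can simply appear in `slots` more than once.
    """
    return slots[zlib.crc32(key.encode()) % len(slots)]


class HashRing:
    """Minimal consistent-hash ring with several virtual points per server."""

    def __init__(self, servers, points_per_server=100):
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(text):
        return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")

    def pick(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[i][1]


def moved_fraction(pick_before, pick_after, keys):
    """Share of keys whose server changes between two configurations."""
    moved = sum(1 for k in keys if pick_before(k) != pick_after(k))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(10000)]
    old = ["10.0.0.100:11211", "10.0.0.101:11211",
           "10.0.0.102:11211", "10.0.0.103:11211"]
    new = old + ["10.0.0.104:11211"]
    print("modulo:", moved_fraction(
        lambda k: modulo_pick(k, old), lambda k: modulo_pick(k, new), keys))
    ring_old, ring_new = HashRing(old), HashRing(new)
    print("ring:  ", moved_fraction(ring_old.pick, ring_new.pick, keys))
```

With modulo hashing, going from four to five slots remaps most keys, which is the hit-rate crash the slide warns about. With the ring, only the keys claimed by the new server move.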
memcached internals
libevent
− epoll, kqueue...
event-based, non-blocking design
− optional multithreading, thread per CPU (not per client)
slab allocator
reference counted objects
− slow clients can’t block other clients from altering namespace or data
LRU
all internal operations O(1)
Perlbal
Web Load Balancing
BIG-IP, Alteon, Juniper, Foundry
− good for L4 or minimal L7
− not tricky / fun enough. :-)
Tried a dozen reverse proxies
− none did what we wanted or were fast enough
Wrote Perlbal
− fast, smart, manageable HTTP web server / reverse proxy / LB
− can do internal redirects
  and dozen other tricks
Perlbal
Perl, parts optionally in C with plugins
single threaded, async event-based
− uses epoll, kqueue, etc.
console / HTTP remote management
− live config changes
handles dead nodes, smart balancing
multiple modes
− static webserver
− reverse proxy
− plug-ins (Javascript message bus.....)
plug-ins
− GIF/PNG altering, ....
Perlbal: Persistent Connections
perlbal to backends (mod_perls)
− know exactly when a connection is ready for a new request
  no complex load balancing logic: just use whatever's free.
  beats managing “weighted round robin” hell.
clients persistent; not tied to a specific backend connection
[Diagram: two Clients, PB in the middle, two Apaches. Client connections carry reqA1, A2 and reqB1, B2; backend connections carry reqA1, B2 and reqB1, A2]
Perlbal: can verify new backend connections
connects to backends are often fast, but...
are you talking to the kernel’s listen queue? or apache? (did apache accept() yet?)
send OPTIONS request to see if apache is there
− Apache can reply to OPTIONS request quickly,
− then Perlbal knows that conn is bound to an apache process, not waiting in a kernel queue
Huge improvement to user-visible latency! (and more fair/even load balancing)
#include <sys/socket.h>
int listen(int sockfd, int backlog);
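The listen-queue effect is easy to reproduce: a connect() succeeds as soon as the kernel queues the connection, even if no process ever calls accept(). The sketch below shows the idea of probing with OPTIONS before trusting a connection. It is an illustration of the technique, not Perlbal's implementation; `backend_is_accepting` and its timeout are assumptions.

```python
import socket


def backend_is_accepting(addr, timeout=0.5):
    """Return True only if something answers an OPTIONS request on addr.

    A successful connect() alone proves little: the kernel completes the
    TCP handshake for connections still sitting in the listen backlog.
    """
    try:
        with socket.create_connection(addr, timeout=timeout) as conn:
            conn.sendall(b"OPTIONS * HTTP/1.0\r\n\r\n")
            # Any reply byte means a real process accepted the connection.
            return bool(conn.recv(1))
    except OSError:  # covers timeouts and refused connections
        return False


if __name__ == "__main__":
    # A listener that never calls accept(): connect works, the probe fails.
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(5)
    print(backend_is_accepting(listener.getsockname(), timeout=0.3))
    listener.close()
```

A balancer that only hands requests to probed connections avoids queueing a user's request behind a busy backend's backlog.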
Perlbal: multiple queues
high, normal, low priority queues
paid users -> high queue
bots/spiders/suspect traffic -> low queue
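Combining the two ideas, priority queues plus "just use whatever's free", gives a very small dispatcher. The sketch below is a toy model, not Perlbal's scheduler: `ToyBalancer`, the `paid` and `suspect` request flags, and the strict high-before-normal-before-low ordering are all assumptions made for illustration.

```python
from collections import deque


class ToyBalancer:
    """Hands queued requests to idle backends, highest priority first."""

    PRIORITIES = ("high", "normal", "low")

    def __init__(self):
        self.queues = {p: deque() for p in self.PRIORITIES}
        self.idle_backends = deque()

    def classify(self, request):
        """Paid users go high, suspect traffic goes low, the rest is normal."""
        if request.get("paid"):
            return "high"
        if request.get("suspect"):
            return "low"
        return "normal"

    def enqueue(self, request):
        self.queues[self.classify(request)].append(request)
        return self._dispatch()

    def backend_ready(self, backend):
        """Called when a persistent backend connection finishes a request."""
        self.idle_backends.append(backend)
        return self._dispatch()

    def _next_request(self):
        for priority in self.PRIORITIES:
            if self.queues[priority]:
                return self.queues[priority].popleft()
        return None

    def _dispatch(self):
        """Pair idle backends with waiting requests; return the assignments."""
        assigned = []
        while self.idle_backends:
            request = self._next_request()
            if request is None:
                break
            assigned.append((self.idle_backends.popleft(), request))
        return assigned


if __name__ == "__main__":
    lb = ToyBalancer()
    lb.enqueue({"id": "bot", "suspect": True})
    lb.enqueue({"id": "anon"})
    lb.enqueue({"id": "payer", "paid": True})
    print(lb.backend_ready("web1"))
```

No weights are configured anywhere: a backend gets work exactly when it reports itself free.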
Perlbal: cooperative large file serving
large file serving w/ mod_perl bad...
− mod_perl has better things to do than spoon-feed clients bytes
internal redirects
− mod_perl can pass off serving a big file to Perlbal
  either from disk, or from other URL(s)
− client sees no HTTP redirect
− “Friends-only” images
  one, clean URL
  mod_perl does auth, and is done.
  perlbal serves.
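From the backend's point of view, an internal redirect amounts to doing the auth check and then answering with a header instead of a body. The sketch below assumes Perlbal's reproxy response headers (`X-REPROXY-URL` for other URLs, `X-REPROXY-FILE` for a local path) and that reproxying is enabled on the service; the handler shape, function name, and arguments are hypothetical.

```python
def serve_friends_only_image(viewer, owner_friends, storage_urls):
    """Backend handler sketch: authorize, then hand the transfer to the proxy.

    Returns (status, headers, body). On success the body is empty and the
    proxy is expected to fetch one of the listed URLs and stream it out.
    """
    if viewer not in owner_friends:
        return 403, {}, b"Forbidden"
    # Assumed format: a space-separated list of candidate URLs.
    headers = {"X-REPROXY-URL": " ".join(storage_urls)}
    return 200, headers, b""


if __name__ == "__main__":
    urls = ["http://sto1.example/dev1/0/000/123.fid",
            "http://sto2.example/dev7/0/000/123.fid"]
    print(serve_friends_only_image("alice", {"alice", "carol"}, urls))
    print(serve_friends_only_image("mallory", {"alice", "carol"}, urls))
```

The backend process is free again as soon as it has sent the headers; the slow byte-by-byte delivery happens in the event-based proxy.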
Internal redirect picture
And the reverse...
Now Perlbal can buffer uploads as well..
− Problems:
  LifeBlog uploading
  − cellphones are slow
  LiveJournal/Friendster photo uploads
  − cable/DSL uploads still slow
− decide to buffer to “disk” (tmpfs, likely)
  on any of: rate, size, time
  blast at backend, only when full request is in
Palette Altering GIF/PNGs
based on palette indexes, colors in URL, dynamically alter GIF/PNG palette table, then sendfile(2) the rest.
MogileFS
oMgFileS
MogileFS
our distributed file system
open source
userspace
based all around HTTP (NFS support now removed)
hardly unique
− Google GFS
− Nutch Distributed File System (NDFS)
production-quality
− lot of users
− lot of big installs
MogileFS: Why
alternatives at time were either:
− closed, non-existent, expensive, in development, complicated, ...
− scary/impossible when it came to data recovery
  new/uncommon/unstudied on-disk formats
because it was easy
− initial version = 1 weekend! :)
− current version = many, many weekends :)
MogileFS: Main Ideas
files belong to classes, which dictate:
− replication policy, min replicas, ...
tracks what disks files are on
− set disk's state (up, temp_down, dead) and host
keep replicas on devices on different hosts
− (default class policy)
− No RAID!
− multiple tracker databases
− all share same database cluster (MySQL, etc..)
big, cheap disks
− dumb storage nodes w/ 12, 16 disks, no RAID
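The default placement rule, replicas on devices on different hosts, can be sketched in a few lines. This is a simplified illustration and not MogileFS's replication code: the device dictionaries, field names, and the first-fit selection order are assumptions.

```python
def pick_devices(devices, min_replicas):
    """Choose devices for a file's replicas, one per host, skipping bad disks.

    devices: list of dicts with 'devid', 'host' and 'state' keys, where
    state is one of the slide's values: 'up', 'temp_down', 'dead'.
    """
    chosen, used_hosts = [], set()
    for dev in devices:
        if dev["state"] != "up" or dev["host"] in used_hosts:
            continue
        chosen.append(dev["devid"])
        used_hosts.add(dev["host"])
        if len(chosen) == min_replicas:
            return chosen
    raise RuntimeError("not enough distinct hosts with devices that are up")


if __name__ == "__main__":
    devices = [
        {"devid": 1, "host": "sto1", "state": "up"},
        {"devid": 2, "host": "sto1", "state": "up"},
        {"devid": 3, "host": "sto2", "state": "dead"},
        {"devid": 4, "host": "sto2", "state": "up"},
    ]
    print(pick_devices(devices, 2))  # [1, 4]
```

Spreading copies across hosts is what lets the storage nodes skip RAID: losing a disk, or a whole machine, still leaves another replica elsewhere.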
MogileFS components
clients
mogilefsd (does all real work)
database(s) (MySQL, .... abstract)
storage nodes
MogileFS: Clients
tiny text-based protocol
Libraries available for:
− Perl
  tied filehandles
  MogileFS::Client
  − my $fh = $mogc->new_file("key", [[$class], ...])
− Java
− PHP
− Python?
− porting to $LANG is trivial
− future: no custom protocol. only HTTP
clients don't do database access
MogileFS: Tracker (mogilefsd)
The Meat
event-based message bus
load balances client requests, world info
process manager
− heartbeats/watchdog, respawner, ...
Child processes:
− ~30x client interface (“query” process)
  interfaces client protocol w/ db(s), etc
− ~5x replicate
− ~2x delete
− ~1x fsck, reap, monitor, ..., ...
Trackers' Database(s)
Abstract as of Mogile 2.x
− MySQL
− SQLite (joke/demo)
− Pg/Oracle coming soon?
− Also future:
  wrapper driver, partitioning any above
  − small metadata in one driver (MySQL Cluster?),
  − large tables partitioned over 2-node HA pairs
Recommended config:
− 2xMySQL InnoDB on DRBD
− 2 slaves underneath HA VIP
  1 for backups
  read-only slave for during master failover window
MogileFS storage nodes (mogstored)
HTTP transport
− GET
− PUT
− DELETE
mogstored listens on 2 ports...
HTTP. --server={perlbal,lighttpd,...}
− configs/manages your webserver of choice. perlbal is default. some people like apache, etc
management/status:
− iostat interface, AIO control, multi-stat() (for faster fsck)
files on filesystem, not DB
− sendfile()! future: splice()
− filesystem can be any filesystem
Large file GET
request
60
![Page 81: Behind the Scenes at LiveJournal: Scaling Storytime](https://reader033.vdocuments.site/reader033/viewer/2022042814/554939b3b4c905144d8b4bcc/html5/thumbnails/81.jpg)
http://danga.com/words/
Large file GET
request
Auth: complex, but quick
60
![Page 82: Behind the Scenes at LiveJournal: Scaling Storytime](https://reader033.vdocuments.site/reader033/viewer/2022042814/554939b3b4c905144d8b4bcc/html5/thumbnails/82.jpg)
http://danga.com/words/
Large file GET
request
Auth: complex, but quick
Spoonfeeding: slow, but event-based
Gearman
manaGer
Manager
dispatches work, but doesn't do anything useful itself. :)

Gearman
system to load balance function calls...
scatter/gather bunch of calls in parallel,
different languages,
db connection pooling,
spread CPU usage around your network,
keep heavy libraries out of caller code,
...
...

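A minimal sketch of the scatter/gather idea, assuming local threads stand in for remote workers. This is not the Gearman client API; `scatter_gather` and `fake_db_query` are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter_gather(func, arg_list, max_workers=4):
    """Submit every call up front (scatter), then collect results in input order (gather)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, arg) for arg in arg_list]
        return [future.result() for future in futures]


def fake_db_query(shard):
    # Hypothetical stand-in for a remote call, e.g. one query per database shard.
    return "rows-from-shard-%d" % shard
```

Usage: `scatter_gather(fake_db_query, [1, 2, 3])` issues the three calls concurrently and returns their results in the order the arguments were given.
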
Gearman Pieces
gearmand
− the function call router
− event-loop (epoll, kqueue, etc)
workers
− Gearman::Worker – perl/ruby
− register/heartbeat/grab jobs
clients
− Gearman::Client[::Async] -- perl
− also Ruby Gearman::Client
− submit jobs to gearmand
   opaque (to server) “funcname” string
   optional opaque (to server) “args” string
   opt coalescing key

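To make the pieces concrete, here is a toy in-process router, assuming a batch of jobs stands in for requests that are in flight at the same time. It is not how gearmand is implemented; `MiniRouter`, `submit_batch` and the round-robin choice of worker are all illustrative assumptions. It shows workers registering function names, the router treating funcname and args as opaque, and jobs that share a coalescing key being run once.

```python
from collections import defaultdict, deque


class MiniRouter:
    """Toy in-process stand-in for a job router; not the gearmand implementation."""

    def __init__(self):
        self.workers = defaultdict(deque)  # funcname -> rotating queue of handlers
        self.executions = 0                # how many times a worker actually ran

    def can_do(self, funcname, handler):
        """A worker registers that it can run funcname."""
        self.workers[funcname].append(handler)

    def submit_batch(self, jobs):
        """Run a batch of (funcname, args, coalesce_key) jobs.

        Jobs in the batch that share funcname and a non-None coalesce_key are
        executed once, and every submitter gets the same result. The router
        never looks inside funcname or args.
        """
        merged = {}
        results = []
        for funcname, args, key in jobs:
            slot = (funcname, key)
            if key is not None and slot in merged:
                results.append(merged[slot])   # piggyback on the earlier job
                continue
            handlers = self.workers[funcname]
            if not handlers:
                raise LookupError("no worker registered for %r" % funcname)
            handler = handlers[0]
            handlers.rotate(-1)                # next job goes to the next worker
            result = handler(args)
            self.executions += 1
            if key is not None:
                merged[slot] = result
            results.append(result)
        return results
```

The coalescing key is what lets many identical requests (say, the same expensive lookup) cost one worker execution instead of many.
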
Gearman Picture
[Diagram: two clients and two workers, all connected to three gearmand servers.]
Workers register: can_do(“funcA”) / can_do(“funcA”), can_do(“funcB”)
Clients submit: call(“funcA”) / call(“funcB”)

Gearman Protocol
efficient binary protocol
No XML
but also line-based text protocol for admin commands
− telnet to gearmand and get status
− useful for Nagios plugins, etc

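A rough sketch of what the two protocols look like on the wire, based on my reading of the publicly documented Gearman protocol: a 12-byte header (magic, type, body length, all big-endian) followed by NUL-separated arguments, and a tab-separated `status` listing terminated by a lone `.`. The helper names are hypothetical, and the packet type number and status columns are assumptions to check against your gearmand version.

```python
import struct

REQ_MAGIC = b"\0REQ"
SUBMIT_JOB = 7  # type number as I understand the published protocol notes (assumption)


def pack_request(ptype, *args):
    """Frame a request: 4-byte magic, 4-byte type, 4-byte body length, NUL-joined args."""
    body = b"\0".join(args)
    return REQ_MAGIC + struct.pack(">II", ptype, len(body)) + body


def unpack_packet(data, nargs):
    """Split one framed packet into (magic, type, args).

    nargs is fixed per packet type, so the final argument may itself contain NULs.
    """
    magic = data[:4]
    ptype, size = struct.unpack(">II", data[4:12])
    body = data[12:12 + size]
    return magic, ptype, body.split(b"\0", nargs - 1)


def parse_status(text):
    """Parse admin 'status' output: function, total, running, workers per line."""
    table = {}
    for line in text.splitlines():
        if line == ".":
            break
        func, total, running, workers = line.split("\t")
        table[func] = {
            "total": int(total),
            "running": int(running),
            "workers": int(workers),
        }
    return table
```

A monitoring plugin would only need the text side: send `status`, read until the `.` line, and alert on queue depth per function.
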
Gearman Uses
Image::Magick outside of your mod_perls!
DBI connection pooling (DBD::Gofer + Gearman)
reducing load, improving visibility
“services”
− can all be in different languages, too!

Gearman Uses, cont..
running code in parallel
− query ten databases at once
running blocking code from event loops
− DBI from POE/Danga::Socket apps
spreading CPU from ev loop daemons
calling between different languages,
...

Gearman Misc
Guarantees:
− none! hah! :)
   please wait for your results.
   if client goes away, no promises
− all retries on failures are done by client
   but server will notify client(s) if working worker goes away.
No policy/conventions in gearmand
− all policy/meaning between clients <-> workers
...

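Since the server promises nothing, retry policy has to live in the caller. A minimal sketch, assuming a hypothetical `submit` callable and a hypothetical `WorkerGone` exception that represents the server's "your worker went away" notification (neither is a real Gearman client API):

```python
class WorkerGone(Exception):
    """Hypothetical signal that the worker running our job disappeared."""


def call_with_retries(submit, funcname, args, max_attempts=3):
    """Resubmit the job when told the worker went away; give up after max_attempts."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return submit(funcname, args)
        except WorkerGone as err:
            last_error = err   # remember why we failed, then try again
    raise RuntimeError("gave up after %d attempts" % max_attempts) from last_error
```

Retrying like this is only safe when the function is idempotent, which is part of the policy the clients and workers must agree on between themselves.
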
Sick Gearman Demo
Don’t actually use it like this... but:

use strict;
use DMap qw(dmap);
DMap->set_job_servers("sammy", "papag");

my @foo = dmap { "$_ = " . `hostname` } (1..10);

print "dmap says:\n @foo";

$ ./dmap.pl
dmap says:
 1 = sammy
 2 = papag
 3 = sammy
 4 = papag
 5 = sammy
 6 = papag
 7 = sammy
 8 = papag
 9 = sammy
 10 = papag

Gearman Summary
Gearman is sexy.
− especially the coalescing
Check it out!
− it's kinda our little unadvertised secret
   oh crap, did I leak the secret?

TheSchwartz
TheSchwartz
Like gearman:
− job queuing system
− opaque function name
− opaque “args” blob
− clients are either:
   submitting jobs
   workers
But unlike gearman:
− Reliable job queueing system
− not low latency
− fire & forget (as opposed to gearman, where you wait for result)
currently library, not network service

TheSchwartz Primitives
insert job
“grab” job (atomic grab)
− for 'n' seconds.
mark job done
temp fail job for future
− optional notes, rescheduling details..
replace job with 1+ other jobs
− atomic.
...

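The primitives above can be sketched on top of any SQL database. The version below uses an in-memory SQLite table; the schema, column names and function names are assumptions for illustration, not TheSchwartz's actual tables or API. The point is that a grab is a time-limited lease taken with a conditional UPDATE, and that replacing a job happens inside one transaction.

```python
import sqlite3


def open_queue():
    """Create a hypothetical job table in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE job (id INTEGER PRIMARY KEY, funcname TEXT, arg TEXT,"
        " run_after REAL DEFAULT 0, grabbed_until REAL DEFAULT 0)"
    )
    return db


def insert_job(db, funcname, arg, run_after=0.0):
    with db:
        cur = db.execute(
            "INSERT INTO job (funcname, arg, run_after) VALUES (?, ?, ?)",
            (funcname, arg, run_after),
        )
    return cur.lastrowid


def grab_job(db, now, lease_seconds):
    """Lease one runnable job for lease_seconds; returns (id, funcname, arg) or None."""
    with db:
        row = db.execute(
            "SELECT id, funcname, arg FROM job"
            " WHERE run_after <= ? AND grabbed_until <= ? ORDER BY id LIMIT 1",
            (now, now),
        ).fetchone()
        if row is None:
            return None
        # The WHERE clause re-checks the lease, so only one grabber can win the row.
        cur = db.execute(
            "UPDATE job SET grabbed_until = ? WHERE id = ? AND grabbed_until <= ?",
            (now + lease_seconds, row[0], now),
        )
        return row if cur.rowcount == 1 else None


def mark_done(db, job_id):
    with db:
        db.execute("DELETE FROM job WHERE id = ?", (job_id,))


def temp_fail(db, job_id, retry_at):
    """Release the lease and reschedule the job for a later time."""
    with db:
        db.execute(
            "UPDATE job SET grabbed_until = 0, run_after = ? WHERE id = ?",
            (retry_at, job_id),
        )


def replace_with(db, job_id, new_jobs):
    """Swap one job for one or more follow-up jobs in a single transaction."""
    with db:
        db.execute("DELETE FROM job WHERE id = ?", (job_id,))
        db.executemany("INSERT INTO job (funcname, arg) VALUES (?, ?)", new_jobs)
```

If a worker dies mid-job, nothing needs cleaning up: the lease simply expires and another worker can grab the job again.
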
TheSchwartz
backing store:
− a database
− uses Data::ObjectDriver
   MySQL, Postgres, SQLite, ....
but HA: you tell it @dbs, and it finds one to insert job into
− likewise, workers foreach (@dbs) to do work

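A small sketch of the "give it a list of databases" idea, with in-memory lists standing in for real databases. `ListQueue`, `DeadDatabase` and the random try order are assumptions made for illustration, not TheSchwartz's code.

```python
import random


class DeadDatabase(Exception):
    """Hypothetical error raised when a queue database is unreachable."""


class ListQueue:
    """Stand-in for one job database: an in-memory list with an on/off switch."""

    def __init__(self, name, alive=True):
        self.name, self.alive, self.jobs = name, alive, []

    def insert(self, job):
        if not self.alive:
            raise DeadDatabase(self.name)
        self.jobs.append(job)

    def grab(self):
        if not self.alive:
            raise DeadDatabase(self.name)
        return self.jobs.pop(0) if self.jobs else None


def insert_anywhere(dbs, job):
    """Try the databases in random order; the first one that accepts the job wins."""
    for db in random.sample(dbs, len(dbs)):
        try:
            db.insert(job)
            return db.name
        except DeadDatabase:
            continue
    raise RuntimeError("no job database available")


def work_everywhere(dbs, handle):
    """A worker visits every database and drains whatever each one holds."""
    done = 0
    for db in dbs:
        try:
            job = db.grab()
            while job is not None:
                handle(job)
                done += 1
                job = db.grab()
        except DeadDatabase:
            continue   # skip dead databases; their jobs wait until they return
    return done
```

No single database has to be up for inserts to succeed, and because workers poll all of them, a job lands somewhere a worker will eventually look.
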
TheSchwartz uses
outgoing email (SMTP client)
− millions of emails per day
− TheSchwartz::Worker::SendEmail
− Email::Send::TheSchwartz
LJ notifications
− ESN: event, subscription, notification
   one event (new post, etc) -> thousands of emails, SMSes, XMPP messages, etc...
pinging external services
atomstream injection
.....
dozens of users
shared farm for TypePad, Vox, LJ

gearmand + TheSchwartz
gearmand: not reliable, low-latency, no disks
TheSchwartz: latency, reliable, disks
In TypePad:
− TheSchwartz, with gearman to fire off TheSchwartz workers.
   disks, but low-latency
   future: no disks, SSD/Flash, MySQL Cluster

djabberd
djabberd
Our Jabber/XMPP server
powers our “LJ Talk” service
S2S: works with GoogleTalk, etc
perl, event-based (epoll, etc)
done 300,000+ conns
tiny per-conn memory overhead
− release XML parser state if possible

djabberd hooks
everything is a hook
− not just auth! like, everything.
− auth,
− roster,
− vcard info (avatars),
− presence,
− delivery,
− inter-node cluster delivery,
− ala mod_perl, qpsmtpd, etc.
async hooks
− hook phases can take as long as they want before they answer, or decline to next phase in hook chain...
− we use Gearman::Client::Async

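A minimal sketch of an asynchronous hook chain, assuming a callback-passing convention. `HookChain`, `DECLINE` and the callback names are hypothetical and are not djabberd's interface. Each hook receives a `reply` callback and may call it immediately or much later; replying with `DECLINE` hands the request to the next hook in the chain.

```python
DECLINE = object()  # sentinel: "not my problem, ask the next hook"


class HookChain:
    """Toy hook registry; names and calling convention are illustrative only."""

    def __init__(self):
        self.hooks = {}

    def register(self, phase, hook):
        self.hooks.setdefault(phase, []).append(hook)

    def run(self, phase, request, on_answer, on_fallback):
        """Walk the chain for a phase; each hook answers through its reply callback."""
        pending = list(self.hooks.get(phase, []))

        def next_hook():
            if not pending:
                on_fallback(request)   # nobody answered: use the default behaviour
                return
            hook = pending.pop(0)

            def reply(result):
                if result is DECLINE:
                    next_hook()
                else:
                    on_answer(result)

            hook(request, reply)       # the hook may call reply now or later

        next_hook()
```

Because `run` returns as soon as a hook parks its `reply` callback, the event loop stays free while something slow (a database lookup, a remote job) finishes in the background.
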
Thank you!
Questions to: [email protected]
Software:
http://danga.com/
http://code.sixapart.com/

Questions?
[Diagram: LiveJournal backend overview, repeated from the start of the talk. BIG-IP (bigip1, bigip2); perlbal httpd/proxy (proxy1 ... proxy5); mod_perl (web1 ... webN); Memcached (mc1 ... mcN); Global Database (master_a, master_b, slave1 ... slave5); User DB Clusters 1 ... N (ucNa, ucNb); Job Queues xN (jqNa, jqNb); MogileFS Database (mog_a, mog_b); Mogile Trackers (tracker1, tracker3); Mogile Storage Nodes (sto1, sto2, ... sto8); djabberd; gearmand (gearmand1 ... gearmandN); “workers” (gearwrkN, theschwkN).]

Bonus Slides
if extra time
Data Integrity
Databases depend on fsync()
− but databases can't send raw SCSI/ATA commands to flush controller caches, etc
fsync() almost never works
− Linux, FS' (lack of) barriers, raid cards, controllers, disks, ....
Solution: test! & fix
− disk-checker.pl
   client/server
   spew writes/fsyncs, record intentions on alive machine, yank power, checks.

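The test procedure can be sketched roughly as follows. This is not disk-checker.pl; the function names are hypothetical, and a plain Python list stands in for the record kept on the machine that stays alive. The writer only acknowledges a record after fsync() returns, and the checker reports any acknowledged record that is missing once the machine comes back.

```python
import os
import tempfile


def write_and_ack(path, count, acked):
    """Append records, fsync each one, and only then record it as acknowledged."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for seq in range(count):
            os.write(fd, b"%08d\n" % seq)
            os.fsync(fd)        # the OS now claims this record is durable
            acked.append(seq)   # in a real test this note lives on another machine
    finally:
        os.close(fd)


def missing_after_crash(path, acked):
    """Return acknowledged sequence numbers that are absent from the file."""
    try:
        with open(path, "rb") as fh:
            on_disk = {int(line) for line in fh if line.strip()}
    except FileNotFoundError:
        on_disk = set()
    return sorted(set(acked) - on_disk)


def spew(count):
    """Run one writer pass in a fresh temp directory; returns (path, acked)."""
    path = os.path.join(tempfile.mkdtemp(), "spew.log")
    acked = []
    write_and_ack(path, count, acked)
    return path, acked
```

Any non-empty result from `missing_after_crash` after a real power pull means some layer acknowledged a write it had not actually made durable.
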
Persistent Connection Woes
connections == threads == memory
− My pet peeve:
   want connection/thread distinction in MySQL!
   w/ max-runnable-threads tunable
max threads
− limit max memory/concurrency
DBD::Gofer + Gearman
− Ask
Data::ObjectDriver + Gearman

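The idea of putting a small, fixed set of database connections behind a shared service can be sketched like this. `ConnectionPool` is a hypothetical in-process stand-in; it is not DBD::Gofer or Gearman, and a real setup would run the pool in separate worker processes. Many callers share a few connections, so the database sees a bounded number of threads however many clients exist.

```python
import queue
import threading


class ConnectionPool:
    """Hypothetical pool: many callers share a fixed number of connections."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())   # open every connection up front

    def run(self, fn):
        """Borrow a connection, run fn(conn), and always hand the connection back."""
        conn = self._idle.get()         # blocks while all connections are busy
        try:
            return fn(conn)
        finally:
            self._idle.put(conn)
```

With `size=2`, twenty concurrent callers still produce at most two simultaneous database sessions; the rest wait in the queue instead of holding idle connections open.
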