mixi jp scaling out with open source

mixi.jp scaling out with open source Batara Kesuma mixi, Inc. [email protected]

Upload: yejr

Post on 12-Nov-2014


Page 1: mixi jp scaling out with open source

mixi.jp: scaling out with open source

Batara Kesuma, mixi, Inc. [email protected]

Page 2: mixi jp scaling out with open source

Introduction

•Batara Kesuma
•CTO of mixi, Inc.

Page 3: mixi jp scaling out with open source

What is mixi?

•Social networking service
• Diary, community, message, review, photo album, etc.

• Invitation only

•Largest and fastest growing SNS in Japan

Page 4: mixi jp scaling out with open source
Page 5: mixi jp scaling out with open source

Latest information
- Friends new diary
- Comments history
- Communities topics
- Friends new reviews
- Friends new albums

My latest diaries and reviews

User Testimonials

Friends

Community listing

Page 6: mixi jp scaling out with open source

History of mixi

•Development started in December 2003
• Only 1 engineer (me)
• 4 months of coding
•Opened in February 2004

Page 7: mixi jp scaling out with open source

Two months later

•10,000 users
•600,000 PV/day

Page 8: mixi jp scaling out with open source

The “Oh crap!” factor

•This model works
•But how do we scale out?

Page 9: mixi jp scaling out with open source

The first year

•The online population of mixi grew significantly

•600 users to 210,000 users

Page 10: mixi jp scaling out with open source

The second year

•210,000 users to 2 million users

Page 11: mixi jp scaling out with open source

And now?

Page 12: mixi jp scaling out with open source

More than 3.7 million users
15,000 new users/day

Population of Japan: 127 million
Internet users: 86.7 million

Source: CIA Factbook

Page 13: mixi jp scaling out with open source

70% of users are active (last login less than 72 hours ago)

Page 14: mixi jp scaling out with open source

Average user spends 3 hours 20 minutes on mixi per week

Page 15: mixi jp scaling out with open source

Ranked 35th on Alexa worldwide, and 3rd in Japan

Page 16: mixi jp scaling out with open source

PV growth in 2 years

[Chart: page views over 2 years for Google Japan, mixi, and Amazon Japan]

Page 17: mixi jp scaling out with open source

User growth in 2 years

[Chart: number of users, y-axis from 0 to 3,500,000, x-axis from 04/03 to 06/03]

Page 18: mixi jp scaling out with open source

Our technology solutions

Page 19: mixi jp scaling out with open source

The technology behind

•Linux 2.6
•Apache 2.0
•MySQL
•Perl 5.8
•memcached
•Squid

Page 20: mixi jp scaling out with open source

[Architecture diagram: REQUEST arrows into mod_proxy and mod_perl; images; memcached holding HOT OBJECTS; diary cluster, message cluster, other cluster]

Page 21: mixi jp scaling out with open source

MySQL

•More than 100 MySQL servers
•Adding more than 10 servers/month
•Non-persistent connections
•Mostly InnoDB
•Relies heavily on DB partitioning (our own solution)

Page 22: mixi jp scaling out with open source

DB replication

•MySQL server load gets heavy
•Add more slaves

[Diagram: a REQUEST reaches mod_perl; QUERY (WRITE) goes to one DB, which replicates to a second DB; QUERY (READ) goes to the replica]
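The read/write split on this slide can be sketched as a small router. This is a minimal illustration in Python rather than mixi's Perl code; the class name, the host names, and the rule "anything that is not a SELECT is a write" are assumptions made for the example.

```python
import itertools

class ReplicatedDB:
    """Routes writes to the master and spreads reads over the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._readers = itertools.cycle(slaves or [master])

    def host_for(self, sql):
        # Assumption: every statement that is not a SELECT counts as a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.master


db = ReplicatedDB("db-master", ["db-slave1", "db-slave2"])
print(db.host_for("SELECT * FROM diary WHERE id = 1"))    # db-slave1
print(db.host_for("INSERT INTO diary VALUES (1, 'hi')"))  # db-master
print(db.host_for("SELECT * FROM diary WHERE id = 2"))    # db-slave2
```

Adding a slave only means adding a host to the list, which is why this is the first scaling step.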

Page 23: mixi jp scaling out with open source

DB replication

•Classic problem with DB replication

[Diagram: one master replicating to 2 slaves, then one master replicating to 4 slaves]
MASTER: 100 reads/s, 50 writes/s
2 SLAVES: 50 reads/s and 50 writes/s each
MASTER: 100 reads/s, 50 writes/s
4 SLAVES: 25 reads/s and 50 writes/s each
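The numbers on this slide follow from one rule: reads are shared between slaves, but every slave has to replay every write. A small Python sketch of that arithmetic (the function name is made up for illustration):

```python
def per_slave_load(reads_per_s, writes_per_s, n_slaves):
    """Load on each slave: reads are divided, writes are replayed in full."""
    return reads_per_s / n_slaves, writes_per_s

for n in (2, 4, 8):
    reads, writes = per_slave_load(100, 50, n)
    print(f"{n} slaves: {reads:g} reads/s + {writes} writes/s each")
```

Doubling the slaves halves the reads per slave but leaves the 50 writes/s untouched, so a write-heavy workload eventually gains almost nothing from more slaves.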

Page 24: mixi jp scaling out with open source

Some statistics

•Diary related tables
• Read 85%
• Write 15%
•Message related tables
• Read 75%
• Write 25%

Page 25: mixi jp scaling out with open source

DB partitioning

•Replication couldn’t keep up anymore

•Try to split the DB

Page 26: mixi jp scaling out with open source

How to split?

[Diagram: one DB holding message tables, diary tables, and other tables, each containing data for user A, user B, user C]

Splitting vertically by users or splitting horizontally by table types

Page 27: mixi jp scaling out with open source

Vertical partition

[Diagram: the DB (message tables, diary tables, other tables) is split by user (user A, user B, user C) across DB 1 and DB 2]

Page 28: mixi jp scaling out with open source

Vertical partition

•Too many tables to deal with at one time

•The transition in splitting gets complex and difficult

Page 29: mixi jp scaling out with open source

Horizontal partition

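Also called level 1 partitioning within mixi

[Diagram: the OLD DB holds message tables, diary tables, and other tables; the message tables and the diary tables each move to their own NEW DB]

$dbh = $db->load_dbh(type => "message");   # NEW DB (message tables)
$dbh = $db->load_dbh(type => "diary");     # NEW DB (diary tables)
$dbh = $db->load_dbh();                    # OLD DB (default)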

Page 30: mixi jp scaling out with open source

Partition map for level 1

•Small and static
•Just put it in a configuration file
•For example:
$DB_DIARY = 'DBI:mysql:host=db1;database=diary';
$DB_MESSAGE = 'DBI:mysql:host=db2;database=message';
...
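A static level 1 map is just a lookup from table type to connection string. The sketch below shows the idea in Python; `PARTITION_MAP`, `DEFAULT_DSN`, `dsn_for`, and the `db0`/`mixi` default are hypothetical names, and only the two DSNs for diary and message come from the slide.

```python
# Static map from table type to DSN, mirroring the configuration example.
PARTITION_MAP = {
    "diary": "DBI:mysql:host=db1;database=diary",
    "message": "DBI:mysql:host=db2;database=message",
}
# Assumed DSN for tables that have not been split out yet (hypothetical host).
DEFAULT_DSN = "DBI:mysql:host=db0;database=mixi"

def dsn_for(table_type=None):
    """Return the DSN for a table type, or the default DB if it is not partitioned."""
    return PARTITION_MAP.get(table_type, DEFAULT_DSN)

print(dsn_for("message"))  # DBI:mysql:host=db2;database=message
print(dsn_for())           # DBI:mysql:host=db0;database=mixi
```

Because the map is small and rarely changes, shipping it with the application configuration avoids any lookup cost at request time.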

Page 31: mixi jp scaling out with open source

Easy transition

[Diagram: mod_perl with READ and WRITE arrows to the OLD DB and the NEW DB]
1 Writes to both DBs
2 Copies in background (SELECT, INSERT IGNORE)
3 Shifts reads
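The three transition steps can be tried end to end with two in-memory SQLite databases standing in for the old and new MySQL servers. This is a sketch under assumptions: the `message` table layout is invented, and SQLite spells MySQL's `INSERT IGNORE` as `INSERT OR IGNORE`.

```python
import sqlite3

old_db = sqlite3.connect(":memory:")
new_db = sqlite3.connect(":memory:")
for db in (old_db, new_db):
    db.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, body TEXT)")

# Rows written before the migration started exist only in the old DB.
old_db.executemany("INSERT INTO message VALUES (?, ?)", [(1, "a"), (2, "b")])

def write_message(msg_id, body):
    # Step 1: every write goes to both DBs.
    for db in (old_db, new_db):
        db.execute("INSERT OR REPLACE INTO message VALUES (?, ?)", (msg_id, body))

def copy_in_background():
    # Step 2: SELECT from the old DB and insert into the new one, ignoring
    # rows that the dual writes have already created there.
    rows = old_db.execute("SELECT id, body FROM message").fetchall()
    new_db.executemany("INSERT OR IGNORE INTO message VALUES (?, ?)", rows)

read_db = old_db               # reads still go to the old DB
write_message(3, "c")          # new row during the migration
write_message(1, "a-edited")   # update during the migration
copy_in_background()
read_db = new_db               # step 3: shift reads

print(read_db.execute("SELECT id, body FROM message ORDER BY id").fetchall())
# [(1, 'a-edited'), (2, 'b'), (3, 'c')]
```

The ignore-on-conflict insert is what makes the background copy safe to run while dual writes are in progress.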

Page 32: mixi jp scaling out with open source

Problems with level 1

•Cannot use JOIN anymore
• Use FEDERATED tables from MySQL 5
• Or do SELECT twice, which is faster than using FEDERATED tables
• If the table is small, just duplicate it
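"SELECT twice" means joining in the application: query one DB, collect the keys, then query the other DB with those keys. The sketch below uses two in-memory SQLite databases as stand-ins; the `member` and `diary` tables and their columns are invented for the example.

```python
import sqlite3

# Two separate databases stand in for two level 1 partitions.
user_db = sqlite3.connect(":memory:")
diary_db = sqlite3.connect(":memory:")
user_db.execute("CREATE TABLE member (id INTEGER PRIMARY KEY, nickname TEXT)")
diary_db.execute(
    "CREATE TABLE diary (id INTEGER PRIMARY KEY, member_id INTEGER, title TEXT)"
)
user_db.executemany("INSERT INTO member VALUES (?, ?)", [(1, "alice"), (2, "bob")])
diary_db.executemany(
    "INSERT INTO diary VALUES (?, ?, ?)",
    [(10, 1, "hello"), (11, 2, "lunch"), (12, 1, "trip")],
)

def latest_diaries(limit):
    # First SELECT: diary rows from the diary DB.
    diaries = diary_db.execute(
        "SELECT id, member_id, title FROM diary ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    if not diaries:
        return []
    # Second SELECT: the matching members from the other DB, in one query.
    member_ids = sorted({member_id for _, member_id, _ in diaries})
    placeholders = ",".join("?" * len(member_ids))
    names = dict(user_db.execute(
        f"SELECT id, nickname FROM member WHERE id IN ({placeholders})", member_ids
    ).fetchall())
    # Join the two result sets in the application.
    return [(title, names[member_id]) for _, member_id, title in diaries]

print(latest_diaries(2))  # [('trip', 'alice'), ('lunch', 'bob')]
```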

Page 33: mixi jp scaling out with open source

Next step

•When the new DB gets overloaded

•We split the DB, yet again
•Get ready for level 2

Page 34: mixi jp scaling out with open source

Partitioning key

•user id, message id
•Choose wisely!

[Diagram: message tables split by user id (user A, user B) or by message id]

Page 35: mixi jp scaling out with open source

Level 2 partition

[Diagram: the message tables in the LEVEL 1 DB (user A, user B, user C, user D) are split across two NEW DBs, NODE 1 and NODE 2, each holding message tables]

Page 36: mixi jp scaling out with open source

Partition map for level 2

•Big and dynamic
•Cannot put it all in a configuration file

Page 37: mixi jp scaling out with open source

Partition map for level 2

•Manager based
• Use another DB to do the partition mapping
•Algorithm based
• Partition map is computed inside the application
• node_id = member_id % TOTAL_NODE

Page 38: mixi jp scaling out with open source

Manager based

[Diagram: mod_perl, MANAGER DB, and message tables on NODE 1, NODE 2, NODE 3]
1 Asks for node_id (user_id=14)
2 Returns node_id (node_id=2)
3 Connects to node
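A manager-based lookup can be sketched with a small mapping table. In this Python sketch an in-memory SQLite database stands in for the manager DB; the `partition_map` table, the `NODES` host names, and both function names are hypothetical.

```python
import sqlite3

# Hypothetical manager DB: one row per user recording which node holds the data.
manager_db = sqlite3.connect(":memory:")
manager_db.execute(
    "CREATE TABLE partition_map (user_id INTEGER PRIMARY KEY, node_id INTEGER)"
)
manager_db.executemany(
    "INSERT INTO partition_map VALUES (?, ?)", [(13, 1), (14, 2), (15, 3)]
)

# Hypothetical node hosts.
NODES = {1: "node1", 2: "node2", 3: "node3"}

def node_for(user_id):
    # Steps 1 and 2: ask the manager for node_id and read its answer.
    row = manager_db.execute(
        "SELECT node_id FROM partition_map WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"user {user_id} has no partition mapping")
    # Step 3: the caller connects to this node.
    return NODES[row[0]]

def move_user(user_id, new_node_id):
    # Once the data has been copied, moving a user is one UPDATE on the map.
    manager_db.execute(
        "UPDATE partition_map SET node_id = ? WHERE user_id = ?",
        (new_node_id, user_id),
    )

print(node_for(14))  # node2
move_user(14, 3)
print(node_for(14))  # node3
```

The extra SELECT per request is the price for being able to move any single user to any node.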

Page 39: mixi jp scaling out with open source

Algorithm based

[Diagram: mod_perl and message tables on NODE 1, NODE 2, NODE 3; number of nodes = 3]
1 Computes node_id: node_id=(user_id%3)+1, giving node_id=3
2 Connects to node
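The algorithm-based map needs no lookup at all. A minimal Python version of the formula on this slide (the function and constant names are illustrative):

```python
TOTAL_NODES = 3

def node_id_for(user_id, total_nodes=TOTAL_NODES):
    # Same formula as on the slide; nodes are numbered from 1.
    return (user_id % total_nodes) + 1

for user_id in (12, 13, 14):
    print(user_id, "-> NODE", node_id_for(user_id))
# 12 -> NODE 1
# 13 -> NODE 2
# 14 -> NODE 3
```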

Page 40: mixi jp scaling out with open source

Manager based

•Pros:
• Easy to manage
• Add a new node, move data between nodes
•Cons:
• Costs 1 extra query to look up the partition map
• It needs to send a request to the manager

Page 41: mixi jp scaling out with open source

Algorithm based

•Cons:
• Difficult to manage
• Adding new nodes is tricky
•Pros:
• Application servers can compute node id by themselves
• Bypass the connection to the manager

Page 42: mixi jp scaling out with open source

Adding nodes is tricky

[Diagram: mod_perl with READ and WRITE arrows to NODE 1 and NODE 2, plus new nodes NODE 3 and NODE 4 receiving WRITE and COPY arrows]
old_node_id=(member_id%2)+1 (number of nodes = 2)
new_node_id=(member_id%4)+1 (number of nodes = 4)
1 Adds a new application logic
2 Writes to both DBs if node_id is different
3 Copies in background
4 Shifts reads
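The application logic for growing from 2 to 4 nodes can be sketched as follows. The two formulas are the ones on this slide; `write_targets`, `read_target`, and the `reads_shifted` flag are hypothetical names for steps 2 and 4.

```python
OLD_TOTAL, NEW_TOTAL = 2, 4

def old_node_id(member_id):
    return (member_id % OLD_TOTAL) + 1

def new_node_id(member_id):
    return (member_id % NEW_TOTAL) + 1

def write_targets(member_id):
    """Step 2: write to both nodes while they differ, otherwise to one."""
    return sorted({old_node_id(member_id), new_node_id(member_id)})

def read_target(member_id, reads_shifted):
    """Step 4: reads move to the new node only after the copy has finished."""
    return new_node_id(member_id) if reads_shifted else old_node_id(member_id)

for member_id in range(4):
    print(member_id, write_targets(member_id))
# 0 [1]
# 1 [2]
# 2 [1, 3]
# 3 [2, 4]

moved = sum(old_node_id(m) != new_node_id(m) for m in range(1000))
print(f"{moved} of 1000 members change node")  # 500 of 1000 members change node
```

Going from 2 to 4 nodes moves half of the members, and each old node sends data to exactly one new node, which keeps the background copy simple.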

Page 43: mixi jp scaling out with open source

Problems with level 2

[Diagram: member tables on NODE 1, NODE 2, NODE 3; community tables on NODE 1, NODE 2]

• Too many connections to different DBs
• Fortunately, on mixi, the majority are small data sets
• Cache them all by using distributed memory caching
• We rarely hit the DB
• Average page load time is about 0.02 sec*

* Average load time may vary depending on the data set

Page 44: mixi jp scaling out with open source

Caching

•memcached
• Also used in LiveJournal, Slashdot, etc.
•Install the server on the mod_perl machines
•39 machines x 2 GB memory
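The usual way to put memcached in front of a partitioned DB is the cache-aside pattern: check the cache, fall back to the DB on a miss, and store the result. In this Python sketch a plain dict stands in for a memcached client, and the key format and function names are assumptions.

```python
stats = {"db_hits": 0}
cache = {}  # stand-in for a memcached client

def load_profile_from_db(user_id):
    # Placeholder for a real query against the user's DB node.
    stats["db_hits"] += 1
    return {"id": user_id, "nickname": f"user{user_id}"}

def get_profile(user_id):
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        # Cache miss: read from the DB and populate the cache.
        profile = load_profile_from_db(user_id)
        cache[key] = profile
    return profile

def invalidate_profile(user_id):
    # Call after writing to the DB so the next read reloads fresh data.
    cache.pop(f"profile:{user_id}", None)

get_profile(14)
get_profile(14)
print(stats["db_hits"])  # 1
```

With small, frequently read data sets, most requests are served from memory and never open a DB connection.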

Page 45: mixi jp scaling out with open source

Summary of DB partitioning

•Level 1 partition (split by table types)

•Level 2 partition (split by partitioning key)
• Manager based
• Algorithm based

Page 46: mixi jp scaling out with open source

Summary of DB partitioning

[Diagram: the OLD DB holds message tables, diary tables, and other tables; the message tables move to a LEVEL 1 DB, whose data for user A, user B, user C is then split across two LEVEL 2 DBs]
1 Split by table types
2 Split by partitioning key

Page 47: mixi jp scaling out with open source

Image Servers

Page 48: mixi jp scaling out with open source

Statistics

•Total size is more than 8 TB of storage

•Growth rate is about 23 GB/day
•We use MySQL to store metadata only

Page 49: mixi jp scaling out with open source

Two types of images

•Frequently accessed images
• Number of image files is relatively small (a few million files)
• For example, user profile photos, community logos
•Rarely accessed images
• Hundreds of millions of image files
• Diary photos, album photos, etc.

Page 50: mixi jp scaling out with open source

Frequently accessed images

•A few hundred GB of files
•Distributed via FTP and Squid
•Third party Content Delivery Network

Page 51: mixi jp scaling out with open source

Frequently accessed images

[Diagram: mod_perl, Storage, Squid, CDN; hosts sto1.mixi.jp and sto2.mixi.jp]
UPLOAD
1 Uploads to storage
2 Pulls images from storage

Page 52: mixi jp scaling out with open source

Rarely accessed images

•A few TB of files
•Newer files get accessed more often

•Cache hit ratio is very bad

•Distribute directly from storage

Page 53: mixi jp scaling out with open source

Uploading rarely accessed images

[Diagram: mod_perl, MANAGER DB, and four storage servers: sto1.mixi.jp, sto2.mixi.jp, sto3.mixi.jp, sto4.mixi.jp; two UPLOAD arrows]
1 Assigns an id for an image file (abc.gif)
2 Arranges a pair of area_id (area_id=1,2)
3 Uploads image to storage
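The upload flow stores each image in two areas so that one copy survives a storage failure. The Python sketch below walks through the three steps with in-memory stand-ins. The slide does not say how the pair of area ids is chosen, so the rotation policy is an assumption, as are the area-to-host mapping and all names.

```python
import itertools

# Hypothetical mapping from area_id to storage host.
STORAGE = {1: "sto1.mixi.jp", 2: "sto2.mixi.jp", 3: "sto3.mixi.jp", 4: "sto4.mixi.jp"}

image_ids = itertools.count(1)   # stand-in for an auto-increment column
image_areas = {}                 # stand-in for the manager DB table
stored = {host: {} for host in STORAGE.values()}  # stand-in for the disks

def upload_image(filename, data):
    # Step 1: assign an id for the image file.
    image_id = next(image_ids)
    # Step 2: pick a pair of areas (assumed policy: rotate through adjacent pairs).
    first = (image_id - 1) % len(STORAGE) + 1
    second = first % len(STORAGE) + 1
    image_areas[image_id] = (first, second)
    # Step 3: upload one copy to each area.
    for area_id in (first, second):
        stored[STORAGE[area_id]][filename] = data
    return image_id

image_id = upload_image("abc.gif", b"GIF89a")
print(image_areas[image_id])  # (1, 2)
```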

Page 54: mixi jp scaling out with open source

Viewing rarely accessed images

[Diagram: User, mod_perl, MANAGER DB, and four storage servers: sto1.mixi.jp, sto2.mixi.jp, sto3.mixi.jp, sto4.mixi.jp]
1 Asks for view_diary.pl
2 Detects abc.gif in view_diary.pl
3 Asks for area_id (abc.gif)
4 Returns area_id (area_id=1)
5 Creates image URL
6 Returns view_diary.pl and URL for abc.gif
7 Asks for abc.gif
8 Returns abc.gif
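On the viewing side, the application only has to turn an area_id into a URL; the browser then fetches the image directly from storage. This Python sketch covers steps 3 to 6. The URL layout, the area-to-host mapping, and the choice of the first area are all assumptions, since the slides do not show them.

```python
# Hypothetical manager data: image file -> area_ids that hold a copy.
IMAGE_AREAS = {"abc.gif": (1, 2)}
# Hypothetical mapping from area_id to storage host.
STORAGE = {1: "sto1.mixi.jp", 2: "sto2.mixi.jp"}

def image_url(filename):
    # Steps 3 and 4: ask the manager for an area_id; here, take the first one.
    area_id = IMAGE_AREAS[filename][0]
    # Step 5: build a URL that points straight at the storage host.
    return f"http://{STORAGE[area_id]}/{filename}"

def render_diary(body_template, filename):
    # Step 6: return the page with the image URL filled in.
    return body_template.format(img=image_url(filename))

print(render_diary('<img src="{img}">', "abc.gif"))
# <img src="http://sto1.mixi.jp/abc.gif">
```

Since the image bytes never pass through mod_perl, the application servers stay free for page rendering.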

Page 55: mixi jp scaling out with open source

To do

•Try MySQL Cluster
•Try to implement better algorithm
• Consistent hashing?
• Linear hashing?
•Level 3 partitioning?
• Split again by timestamp?
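To see why consistent hashing is listed as a candidate, the sketch below compares how many keys change node when a fourth node is added, once with a hash ring and once with plain modulo. It is a generic Python illustration, not something the slides describe as implemented; the `HashRing` class, node names, and replica count are assumptions.

```python
import bisect
import hashlib

def _hash(key):
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring at many points to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # Walk clockwise to the first point at or after the key's hash.
        index = bisect.bisect_left(self._points, _hash(str(key))) % len(self._ring)
        return self._ring[index][1]

def moved(assign_old, assign_new, keys):
    """Count the keys whose node differs between two assignment functions."""
    return sum(assign_old(k) != assign_new(k) for k in keys)

keys = range(10000)
old_ring = HashRing(["n1", "n2", "n3"])
new_ring = HashRing(["n1", "n2", "n3", "n4"])
print("ring:  ", moved(old_ring.node_for, new_ring.node_for, keys))
print("modulo:", moved(lambda k: k % 3, lambda k: k % 4, keys))
```

With the ring, the only keys that move are the ones the new node takes over, whereas modulo reshuffles most keys between all nodes.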

Page 56: mixi jp scaling out with open source

Questions?

Page 57: mixi jp scaling out with open source

Thank you

•Further questions to [email protected]

•We are hiring :)
•Have a nice day!