Retaining globally distributed high availability
Example of a solution for retaining globally distributed high availability with MySQL
![Page 1: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/1.jpg)
Retaining globally distributed high availability
Art van Scheppingen
Head of Database Engineering
![Page 2: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/2.jpg)
2
1. Who is Spil Games?
2. Theory
3. Spil Storage Platform
4. Questions?
Overview
![Page 3: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/3.jpg)
Who are we? Who is Spil Games?
![Page 4: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/4.jpg)
4
• Company founded in 2001
• 350+ employees worldwide
• 180M+ unique visitors per month
• 45 portals in 19 languages
• Casual games
• Social games
• Real-time multiplayer games
• Mobile games
• 35+ MySQL clusters
• 60k queries per second (3.5 billion qpd)
Facts
![Page 5: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/5.jpg)
5
Geographic Reach
180 Million Monthly Active Users(*)
Source: (*) Google Analytics, August 2012
• Over 45 localized portals in 19 languages
• Multi-platform: web, mobile, tablet
• Focus on casual and social games
• 180M MAU per month (30M YoY growth)
• Over 50M registered users
![Page 6: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/6.jpg)
6
Girls, Teens and Family
spielen.com juegos.com gamesgames.com games.co.uk
Brands
![Page 7: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/7.jpg)
Foundations
The exciting theory
![Page 8: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/8.jpg)
8
• What does it exactly mean?
Retaining globally distributed HA
![Page 9: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/9.jpg)
9
Wikipedia: High availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.
Oracle:
• Availability of resources in a computer system
What is high availability?
![Page 10: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/10.jpg)
10
• Master with (many) slave(s)
How do we reach HA with MySQL?
[Diagram: one master with three slaves]
![Page 11: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/11.jpg)
11
• Master with (many) slave(s)
• Multi-master
How do we reach HA with MySQL?
[Diagram: two masters and two slaves]
![Page 12: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/12.jpg)
12
• Master with (many) slave(s)
• Multi-master
• Clustering
How do we reach HA with MySQL?
[Diagram: two mysqld nodes, five ndbd data nodes, one mgmt node]
![Page 13: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/13.jpg)
13
• Master with (many) slave(s)
• Multi-master
• Clustering
• Geographical redundancy
How do we reach HA with MySQL?
[Diagram: master and slave in the local DC, plus one slave in Asia and one in the US]
![Page 14: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/14.jpg)
14
• Scale up
  • Vertical
  • Faster CPU/memory/disks
  • Expensive
  • Costs multiply at the same rate as # of nodes
• Scale out
  • Horizontal
  • More (small) machines
  • Inexpensive
  • Partitioning/federating (sharding)
What if we keep growing?
![Page 15: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/15.jpg)
15
• Functional
  • Shard your database functionally
• Reads
  • Add more slaves (keep them coming!)
• Writes
  • More disks
  • Horizontal partitioning
  • Federated partitions
Scale out
![Page 16: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/16.jpg)
16
• Breaking up tables in small parts on the same host
• Partitioned on a column
• Infinite growth (as long as you add disk space)
• Less used data to slower (cheaper) disks
• No stored procedures, functions, etc.
• Uneven usage of partitions (hash partition may help)
• Once written, data remains on the partition
Horizontal partitioning
![Page 17: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/17.jpg)
17
• Breaking up your table in parts on multiple hosts
• Partitioned on a column
• Infinite growth (as long as you add hosts)
• Less used data on slower hosts
• Not supported in (standard) MySQL
• Partitioning on application level (or proxy)
• Alternatively: NDB
• Uneven usage of partitions
• Once written, data (mostly) remains on the partition
• Parallel queries to retrieve data from all shards
Federated partitions (sharding)
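As a rough illustration of partitioning at the application level, the sketch below routes each key to one owning shard and fans a query out to all shards in parallel. The shard names, the modulo routing rule, and the in-memory dictionaries standing in for separate MySQL hosts are assumptions made for this example, not a description of Spil's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: plain dicts stand in for separate MySQL hosts.
SHARDS = {"shard0": {}, "shard1": {}, "shard2": {}}
SHARD_NAMES = sorted(SHARDS)


def shard_for(key: int) -> str:
    """Pick a shard by taking the partition column modulo the shard count."""
    return SHARD_NAMES[key % len(SHARD_NAMES)]


def put(key: int, row: dict) -> None:
    """Write a row to the single shard that owns its key."""
    SHARDS[shard_for(key)][key] = row


def get(key: int):
    """Read a row; only the owning shard is contacted."""
    return SHARDS[shard_for(key)].get(key)


def scan_all(predicate) -> list:
    """Scatter a query to every shard in parallel and gather the results."""
    def scan_one(name):
        return [row for row in SHARDS[name].values() if predicate(row)]

    with ThreadPoolExecutor(max_workers=len(SHARD_NAMES)) as pool:
        parts = list(pool.map(scan_one, SHARD_NAMES))
    return [row for part in parts for row in part]
```

A lookup by key touches one shard, while a query that cannot be answered from the partition column has to visit all of them, which is why the parallel fan-out matters.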
![Page 18: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/18.jpg)
18
• Parallel execution of sequential jobs
• Limited by the weakest link
• As fast as the slowest node
• Fix: non-sequential (asynchronous) execution
Amdahl's law
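For reference, Amdahl's law in its usual form gives the speedup of a job as 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of workers. The small sketch below evaluates that formula; the function names are chosen for this example only.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def max_speedup(parallel_fraction: float) -> float:
    """Upper bound on the speedup as the number of workers grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction
```

With 90% of the work parallelizable, the bound is roughly a tenfold speedup no matter how many nodes are added, so the sequential part is what limits scale-out.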
![Page 19: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/19.jpg)
19
Typical LAMP stack
[Diagram: Client, Loadbalancer, two Webserver/PHP nodes, Memcache, MySQL]
![Page 20: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/20.jpg)
20
A-typical LAMP stack
[Diagram: Client, Loadbalancer, two Webserver/PHP nodes, Memcache, MySQL, MQ, Jobs]
![Page 21: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/21.jpg)
Spil Storage Platform
Abstracting the storage layer
![Page 22: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/22.jpg)
22
• Dependent on one storage platform
  • No more platform-specific query language
• Differentiate writes
  • Optimistic (asynchronous)
  • Pessimistic (synchronous)
• Shard data better
  • Partition on user and function
  • Cluster information by users, not by function
• Global expansion
  • Partition on geographic location
• Solve uneven usage of data storage
  • Move data from shard to shard
• Anything may/could/will fail eventually
  • Not designed for the “happy” flow
What was our wishlist?
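One possible reading of the optimistic/pessimistic distinction is sketched below: a pessimistic write returns only after the value is stored, while an optimistic write is queued and applied by a background worker. The `Store` class, its method names, and the in-memory dictionary are hypothetical; the slides do not say how SSP implements either mode.

```python
import queue
import threading


class Store:
    """Toy key-value store with synchronous and asynchronous write paths."""

    def __init__(self):
        self._data = {}
        self._pending = queue.Queue()
        # Background worker that applies queued (optimistic) writes.
        threading.Thread(target=self._drain, daemon=True).start()

    def write_pessimistic(self, key, value) -> bool:
        """Synchronous: return only after the value has been stored."""
        self._data[key] = value
        return True

    def write_optimistic(self, key, value) -> None:
        """Asynchronous: queue the write and return immediately."""
        self._pending.put((key, value))

    def flush(self) -> None:
        """Block until every queued write has been applied."""
        self._pending.join()

    def read(self, key):
        return self._data.get(key)

    def _drain(self) -> None:
        while True:
            key, value = self._pending.get()
            self._data[key] = value
            self._pending.task_done()
```

The trade-off is that an optimistic write may not be visible to a read issued right after it, which is acceptable only for data that tolerates a short delay.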
![Page 23: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/23.jpg)
23
Old architecture overview
![Page 24: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/24.jpg)
24
New architecture overview
![Page 25: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/25.jpg)
25
New architecture overview
[Diagram labels: Server API, Application Model, Storage platform, Client-side API, Presentation layer, Physical storage]
![Page 26: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/26.jpg)
26
• Everything written in Erlang
• Piqi as protocol
  • binary
  • JSON
  • XML
• SSP utilizes local caching (memcache)
• Flexible (persistent) storage layer
  • MySQL (various flavors)
  • Membase/Couchbase
  • Could be any other storage product
• MQs (DWH updates)
Our building blocks
![Page 27: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/27.jpg)
27
• Predictable
• Reliable
• Decent performance
• Easy to comprehend
• Excellent ecosystem
  • Libraries
  • Monitoring tools
  • Knowledge
Why choose MySQL?
![Page 28: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/28.jpg)
28
• Functional language
• High availability: designed for telecom solutions
• Excels at concurrency, distribution, fault tolerance
• Do more with less!
• Other companies using Erlang:
Why Erlang?
![Page 29: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/29.jpg)
29
• What is the bucket model?
• Each record has one unique owner attribute (GID)
• GID (Global IDentifier) identifying different types
• Bucket(s) per functionality
• Bucket is structured data
• Attributes contain data of records
• Attributes do not have to correspond to schema
How do we shard?
![Page 30: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/30.jpg)
30
$ curl -X POST -H 'Accept: application/json' -H \
'Content-Type: application/json' --data-binary "{\"gid\": \
288511851128422401}" http://127.0.0.1:8777/demobucket/get
{
  "records": [
    {
      "gid": 288511851128422401,
      "given_name": "g",
      "registered_on": 1,
      "email": "mail1",
      "gender": "m",
      "birthdate": { "year": 1963, "month": 6, "day": 21 }
    }
  ],
  "meta_info": { "total_ct": 1 }
}
Example bucket
![Page 31: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/31.jpg)
31
CREATE TABLE demobucket (
  gid bigint(20) unsigned not null,
  given_name varchar(64) not null,
  registered_on tinyint(3) unsigned default 0,
  email varchar(255) not null,
  gender enum('m', 'f', 'u') not null default 'm',
  birthdate date not null,
  PRIMARY KEY(gid)
);
Example bucket MySQL 1
![Page 32: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/32.jpg)
32
CREATE TABLE demobucket (
  gid bigint(20) unsigned not null,
  user_name varchar(64) not null,
  user_register timestamp on update CURRENT_TIMESTAMP(),
  user_emailaddress varchar(255) not null,
  user_gender char(1) not null default 'm',
  user_dob varchar(10) not null,
  PRIMARY KEY(gid)
);
Example bucket MySQL 2
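Because bucket attributes do not have to correspond to the schema, something has to translate between the two. The sketch below shows one way that could look for the second MySQL table: a name mapping plus a per-attribute encoder. The mapping table, the function names, and the choice to store the birthdate as a `YYYY-MM-DD` string are assumptions for illustration; conversions for other attributes are left out.

```python
# Hypothetical mapping from bucket attribute names to the columns of the
# second MySQL table shown above.
COLUMN_MAP_V2 = {
    "gid": "gid",
    "given_name": "user_name",
    "registered_on": "user_register",
    "email": "user_emailaddress",
    "gender": "user_gender",
    "birthdate": "user_dob",
}


def encode_birthdate(value: dict) -> str:
    """Turn the structured birthdate into a 10-character string."""
    return f"{value['year']:04d}-{value['month']:02d}-{value['day']:02d}"


# Attributes without an entry here are passed through unchanged.
ENCODERS = {"birthdate": encode_birthdate}


def record_to_row(record: dict, column_map=COLUMN_MAP_V2, encoders=ENCODERS) -> dict:
    """Rename attributes to column names and encode values where needed."""
    row = {}
    for attr, value in record.items():
        encode = encoders.get(attr, lambda v: v)
        row[column_map[attr]] = encode(value)
    return row
```

With a layer like this, the same bucket record could be stored in either table layout without the caller noticing the difference.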
![Page 33: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/33.jpg)
33
CREATE COLUMNFAMILY demobucket (
  gid int PRIMARY KEY,
  given_name varchar,
  registered_on timestamp,
  email varchar,
  gender varchar,
  birth_date varchar
);
Example bucket Cassandra
![Page 34: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/34.jpg)
34
demobucket:get(
  #demobucket_get_input{
    gid = 12345,
    filters = [
      #filter{ attr = <<"gender">>, op = <<"=">>, parms = {string, <<"f">>} },
      #filter{ attr = <<"registered_on">>, op = <<"sort">>, parms = asc },
      #filter{ attr = <<"gid">>, op = <<"limit">>, parms = {int, 10} }
    ]}
)
Example Erlang filters
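To make the filter list more concrete, the sketch below shows how a storage layer might turn such filters into a parameterized SQL query. It is written in Python with plain dictionaries standing in for the filter records; the function name, the attribute whitelist, and the generated SQL shape are assumptions, not the actual SSP translation.

```python
# Attribute names taken from the example bucket; used as a whitelist because
# column names cannot be passed as query parameters.
ALLOWED_ATTRS = {"gid", "given_name", "registered_on", "email", "gender", "birthdate"}


def filters_to_sql(table: str, gid: int, filters: list):
    """Build a parameterized SELECT from equality, sort and limit filters."""
    where, params = ["gid = %s"], [gid]
    order_by, limit = None, None
    for f in filters:
        attr, op, parms = f["attr"], f["op"], f["parms"]
        if attr not in ALLOWED_ATTRS:
            raise ValueError(f"unknown attribute: {attr}")
        if op == "=":
            where.append(f"{attr} = %s")
            params.append(parms)
        elif op == "sort":
            order_by = f"{attr} {'ASC' if parms == 'asc' else 'DESC'}"
        elif op == "limit":
            limit = int(parms)
        else:
            raise ValueError(f"unsupported operator: {op}")
    # The table name is assumed to come from trusted bucket configuration.
    sql = f"SELECT * FROM {table} WHERE " + " AND ".join(where)
    if order_by:
        sql += f" ORDER BY {order_by}"
    if limit is not None:
        sql += f" LIMIT {limit}"
    return sql, params
```

Keeping the filters as data rather than as query text is what lets the same request be answered by different storage backends.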
![Page 35: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/35.jpg)
35
Pipeline flow of a bucket
![Page 36: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/36.jpg)
36
• Nearest datacenter (DC) to the end user
• Satellite DC
  • Processing and caching
  • Do not own/store data
• Storage DC
  • Processing, caching and persistent storage
  • Store all same user data in same DC
• Partition on user globally
  • Global IDentifier per user
Global distribution
![Page 37: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/37.jpg)
37
• Contains GIDs and their master DC
• A GID's master DC is predefined
• Migrated GIDs get updated
The lookup server
![Page 38: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/38.jpg)
38
• Globally sharded on GID
• (local) GID Lookup
How does this work?
[Diagram: GID lookup in front of Shard 1 and Shard 2 in persistent storage]
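The two lookup levels can be sketched as follows: a lookup server maps a GID to its master DC, with an override for migrated GIDs, and a local lookup then picks the shard inside that DC. The class and function names, the modulo rules for the predefined DC and the shard, and the DC names in the usage note are all assumptions; the slides do not specify how either mapping is computed.

```python
class LookupServer:
    """Maps a GID to its master DC; migrated GIDs override the default."""

    def __init__(self, datacenters):
        self._dcs = list(datacenters)
        self._migrated = {}

    def default_dc(self, gid: int) -> str:
        # Assumption: the predefined master DC is derived from the GID itself.
        return self._dcs[gid % len(self._dcs)]

    def master_dc(self, gid: int) -> str:
        return self._migrated.get(gid, self.default_dc(gid))

    def migrate(self, gid: int, new_dc: str) -> None:
        """Record that a GID now lives in another DC."""
        if new_dc not in self._dcs:
            raise ValueError(f"unknown DC: {new_dc}")
        self._migrated[gid] = new_dc


def route(gid: int, lookup: LookupServer, shards_per_dc: dict):
    """Resolve a GID to (DC, shard): global lookup first, local lookup second."""
    dc = lookup.master_dc(gid)
    shards = shards_per_dc[dc]
    return dc, shards[gid % len(shards)]
```

For example, with DCs `["eu", "us"]` and two shards per DC, `route(5, ...)` resolves to the `us` DC until `migrate(5, "eu")` updates the lookup entry.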
![Page 39: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/39.jpg)
39
Master/Satellite DC example
![Page 40: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/40.jpg)
40
• Spread data evenly across shards
  • Migration of buckets between shards
• GID migration between DCs
  • Creating a new storage DC needs data migration
  • Users will automatically be migrated after visiting another DC many times
Why do we need data migration?
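The rule that users are migrated after visiting another DC many times could be implemented with a simple visit counter, as in the sketch below. The threshold of three visits, the reset when the user is served from the master DC again, and all names are hypothetical; the slides only say "many times".

```python
from collections import Counter, defaultdict


class VisitTracker:
    """Counts visits to non-master DCs and suggests a migration target."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self._visits = defaultdict(Counter)

    def record_visit(self, gid: int, visited_dc: str, master_dc: str):
        """Return the DC the GID should be migrated to, or None."""
        if visited_dc == master_dc:
            # Assumption: a visit to the master DC resets the counters.
            self._visits.pop(gid, None)
            return None
        self._visits[gid][visited_dc] += 1
        if self._visits[gid][visited_dc] >= self.threshold:
            del self._visits[gid]
            return visited_dc
        return None
```

The caller would act on a non-`None` result by moving the user's data and updating the GID's entry on the lookup server.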
![Page 41: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/41.jpg)
41
• Versioning on bucket definitions
• GIDs are assigned to a bucket version
• Data in old bucket versions remains (read only)
• New data only gets written to the new bucket version
• Updates migrate data to the new bucket version
• Migrations can be triggered
Seamless schema upgrades
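The versioning rules can be sketched with two in-memory tables: reads fall back to the old version, new records go only to the new version, and an update moves the record across. The class name and the decision to remove a record from the old version once it has been migrated are assumptions for this example.

```python
class VersionedBucket:
    """Toy bucket with an old (v1) and a new (v2) version."""

    def __init__(self, v1_rows: dict):
        self.v1 = dict(v1_rows)  # old version: no new records are written here
        self.v2 = {}             # new version: receives all writes

    def get(self, gid: int):
        """Read from the new version first, then fall back to the old one."""
        if gid in self.v2:
            return self.v2[gid]
        return self.v1.get(gid)

    def insert(self, gid: int, record: dict) -> None:
        """New data is only written to the new bucket version."""
        self.v2[gid] = dict(record)

    def update(self, gid: int, **changes) -> None:
        """An update migrates the record to the new version if needed."""
        if gid not in self.v2:
            # Raises KeyError if the GID exists in neither version.
            self.v2[gid] = dict(self.v1.pop(gid))
        self.v2[gid].update(changes)
```

Records that are never updated stay in the old version until a migration is triggered explicitly, so the schema change never requires a stop-the-world table rebuild.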
![Page 42: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/42.jpg)
42
Seamless schema upgrades
[Diagram: successive states of the two bucket versions]
Start: Demobucket v1 (GID, name) contains 1234 Roy, 1235 Moss, 1236 Jen, 1237 Douglas, 1238 Denholm, 1239 Richmond. Demobucket v2 (GID, name, gender) is empty.
Then: v2 contains 1241 Patricia f.
Then: v2 contains 1241 Patricia f, 1235 Moss m. v1 contains 1234 Roy, 1236 Jen, 1237 Douglas, 1238 Denholm, 1239 Richmond.
Then: v1 contains 1234 Roy, 1237 Douglas, 1238 Denholm, 1239 Richmond. v2 contains 1241 Patricia f, 1235 Moss m, 1236 Jen f.
![Page 43: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/43.jpg)
43
• Every cluster (two masters) will contain two shards
• Data written interleaved
• HA for both shards
• No warmup needed
• Both masters active and “warmed up”
• Slaves added (other DC) for HA and backup
Multi Master writes
[Diagram: SSP writing to Shard 1 and Shard 2]
![Page 44: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/44.jpg)
44
• SPAPI is in place
• SSP is (mostly) running in shadow mode
• GID buckets running in production
• Activity feed system first to production
• Satellite DC in early 2013!
Where do we stand now?
![Page 45: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/45.jpg)
45
![Page 46: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/46.jpg)
Questions?
![Page 47: Retaining globally distributed high availability](https://reader034.vdocuments.site/reader034/viewer/2022052523/5554dda1b4c905a16f8b52ab/html5/thumbnails/47.jpg)
47
• Presentation can be found at: http://spil.com/perconalondon2012
• If you wish to contact me: [email protected]
• Don’t forget to rate my talk!
Thank you!