PhxSQL: A High-Availability & Strong-Consistency MySQL Cluster
Ming CHEN@WeChat
Why PhxSQL
Highly expected features for MySQL cluster
Availability and consistency in MySQL cluster
• Master-slaves replication
• Replication solutions
  • Fully synchronous
  • Semi-synchronous
  • Asynchronous
• Third-party aided failover
  • Zookeeper/Admin/…
[Figure: a client sends requests to MySQL (master) on SVR A. MySQL (slave) on SVR B and MySQL (slave) on SVR C each pull the binlog from the master. Every server holds a binlog (1: x, 2: y, 3: z, …) and an InnoDB store.]
Call for both high-availability and strong-consistency
• Critical applications demand both high-availability and strong-consistency
  • accounts/financial transactions/…
• Support for both high-availability and strong-consistency
  • greatly simplifies system design
  • makes correctness-reasoning easier
PhxSQL: a new MySQL cluster that supports the same high availability and strong consistency as Zookeeper does
What is PhxSQL
High-availability & strong-consistency MySQL cluster
Key PhxSQL features
• Support full compatibility with MySQL
• Support high availability and linearizable consistency
• Support deployment in wide-area network
• Support online reconfiguration of cluster membership
Support full compatibility with MySQL
• Transparent to MySQL clients
• Support full features of MySQL
  • Even serializable level of transaction isolation
• Minimum intrusive change to MySQL source code
  • modify after_flush
  • add before_recovery
Support high availability
• Fully automatic failover within a configurable number of seconds
• A cluster keeps working as long as more than half of its servers still function
Support linearizable consistency
• A PhxSQL cluster appears as a single MySQL server to the MySQL clients concurrently accessing it
• PhxSQL supports a consistency level as strong as Zookeeper's!
Support online reconfiguration of cluster membership
• Add a new server, remove an old server, or replace an old server with a new one in an atomic fashion while still serving read/write requests
• Easier to maintain and friendlier to clients
How PhxSQL Works
MySQL cluster powered by Paxos
Paxos maintains consistent states across servers by enforcing same operations on all servers when more than half of servers work
[Figure: Client R, Client S and Client T submit operations to SVR A, SVR B and SVR C. Each server runs Paxos and holds the same state and operations:
    int X=0;
    int Y=0;
    void Foo(int a, int b);
    bool Bar(int a, int b);
Client R submits Foo(1, 2); Bar(3, 30), Client S submits Bar(1, 10), and Client T submits Foo(3, 4). Paxos agrees on one global order (labeled "God" on the slide):
    1: R1: Foo(1, 2)
    2: T1: Foo(3, 4)
    3: S1: Bar(1, 10)
    4: R2: Bar(3, 30)
    …
Every server then invokes R1, T1, S1, R2, … in that order.]
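The idea on this slide can be sketched as a replicated state machine: if every server replays the same ordered log from the same initial state, every server ends in the same state. The sketch below is only an illustration. The slide declares Foo and Bar without bodies, so the bodies here are invented (hypothetical) semantics, and the names Replica, Op, AgreedLog, Replay and CheckReplicasAgree are assumptions introduced for the example. Paxos itself is not implemented; the agreed order is simply given.

```cpp
#include <cassert>
#include <vector>

// State and operations from the slide. The bodies of Foo and Bar are
// hypothetical: the slide only shows their declarations.
struct Replica {
    int X = 0;
    int Y = 0;
    void Foo(int a, int b) { X += a; Y += b; }
    bool Bar(int a, int b) {
        if (X < a) return false;  // invented precondition, for illustration
        Y += b;
        return true;
    }
};

// One entry of the globally ordered operation log.
struct Op {
    enum Kind { kFoo, kBar } kind;
    int a;
    int b;
};

// The order shown on the slide: R1, T1, S1, R2.
std::vector<Op> AgreedLog() {
    return {{Op::kFoo, 1, 2},    // 1: R1: Foo(1, 2)
            {Op::kFoo, 3, 4},    // 2: T1: Foo(3, 4)
            {Op::kBar, 1, 10},   // 3: S1: Bar(1, 10)
            {Op::kBar, 3, 30}};  // 4: R2: Bar(3, 30)
}

// Applies the log, in order, to a fresh replica.
Replica Replay(const std::vector<Op>& log) {
    Replica r;
    for (const Op& op : log) {
        if (op.kind == Op::kFoo) {
            r.Foo(op.a, op.b);
        } else {
            r.Bar(op.a, op.b);
        }
    }
    return r;
}

// Usage: two servers that replay the same log agree on X and Y.
void CheckReplicasAgree() {
    Replica a = Replay(AgreedLog());
    Replica b = Replay(AgreedLog());
    assert(a.X == b.X && a.Y == b.Y);
}
```

The operations are deterministic, which is what makes "same log" imply "same state"; an operation that read a local clock or a random number would break that property.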
PhxSQL enforces a simple but effective constraint by Paxos
• Constraint: two MySQL servers have the SAME state if and only if they have the SAME binlog
• Enforcement
  • PhxSQL maintains a GLOBAL consistent binlog by Paxos
  • Every MySQL server aligns its LOCAL binlog to the GLOBAL one
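One way to picture "aligning the LOCAL binlog to the GLOBAL one" is sketched below: keep the longest prefix on which both logs agree, discard any local entries after it, and append the global entries that are missing. This is an assumption about what alignment could look like, not a description of PhxSQL's actual recovery code. Binlog, CommonPrefix and AlignToGlobal are hypothetical names, and a real server would also have to bring its storage engine back in line with the truncated log.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A binlog modeled as an ordered list of opaque entries (assumption).
using Binlog = std::vector<std::string>;

// Length of the longest prefix on which the two logs agree.
std::size_t CommonPrefix(const Binlog& local, const Binlog& global) {
    const std::size_t n = std::min(local.size(), global.size());
    std::size_t i = 0;
    while (i < n && local[i] == global[i]) {
        ++i;
    }
    return i;
}

// Makes `local` identical to `global` and returns how many local entries
// had to be discarded because the global log does not contain them.
std::size_t AlignToGlobal(Binlog& local, const Binlog& global) {
    const std::size_t keep = CommonPrefix(local, global);
    const std::size_t discarded = local.size() - keep;
    local.resize(keep);  // drop the divergent local suffix
    local.insert(local.end(),
                 global.begin() + static_cast<std::ptrdiff_t>(keep),
                 global.end());  // catch up with the global log
    assert(local == global);
    return discarded;
}
```

For example, a local log x, y, w aligned against a global log x, y, z loses one entry (w) and gains z, while a local log that is merely behind loses nothing.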
MySQL cluster
[Figure: a client sends requests to MySQL (master) on SVR A. MySQL (slave) on SVR B and MySQL (slave) on SVR C each pull the binlog from the master. Every server holds a binlog (1: x, 2: y, 3: z, …) and an InnoDB store.]
PhxSQL
[Figure: SVR A, SVR B and SVR C each run a PhxProxy, a MySQL instance (master on SVR A, slaves on SVR B and SVR C) with its binlog (1: x, 2: y, 3: z, …), InnoDB store and PhxPlugin, and a PhxBinlogSvr that holds the PhxBinlog (1: x, 2: y, 3: z, …) and runs Paxos. The client connects to a PhxProxy; arrows are labeled "forward" and "pull".]
1. Paxos detects failures and elects a new master by leasing and periodic heartbeats
2. Clients access the master MySQL transparently via PhxProxy, which forwards requests to the current master
3. The MySQL master appends its local binlog to the globally consistent PhxBinlog maintained by Paxos
4. MySQL slaves pull the binlog from the globally consistent PhxBinlog
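Step 1 above relies on leases and heartbeats. The sketch below shows the general lease idea only: the current master keeps its role by renewing the lease before it expires, and another server can take over only once the lease has run out. It is a simplified, single-process illustration. In PhxSQL the decision is made through Paxos, and the names MasterLease, Holds and Heartbeat, as well as the millisecond timestamps, are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// In-memory stand-in for the replicated "who is master" record (assumption).
struct MasterLease {
    std::string holder;           // empty means no master has been elected
    std::int64_t expires_ms = 0;  // absolute expiry time in milliseconds
};

// True when `node` currently holds an unexpired lease.
bool Holds(const MasterLease& lease, const std::string& node,
           std::int64_t now_ms) {
    return lease.holder == node && now_ms < lease.expires_ms;
}

// A heartbeat from the current holder renews the lease. A heartbeat from
// any other node succeeds only after the lease has expired, which is how
// a failed master ends up being replaced.
bool Heartbeat(MasterLease& lease, const std::string& node,
               std::int64_t now_ms, std::int64_t lease_ms) {
    assert(lease_ms > 0);
    const bool expired = now_ms >= lease.expires_ms;
    if (lease.holder != node && !expired) {
        return false;  // someone else still holds a valid lease
    }
    lease.holder = node;
    lease.expires_ms = now_ms + lease_ms;
    return true;
}
```

The lease length bounds how long the cluster can be without a master after a failure, which matches the earlier slide's "failover within configurable seconds".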
Paxos maintains consistent states in the cluster
[Figure: PhxPlugin clients R, S and T submit commands to PhxBinlogSvr A, B and C. Each PhxBinlogSvr runs Paxos and a state machine (SM) with the following state and commands:
    master_ver
    master_id
    binlog_queue
    C1: bool change_master(ver, new_master);
    C2: int app_binlog(binlog_id, binlog);
Client R submits C1(1, R); C2("R_xxx", "insert foo=2 into table_1"), client S submits C1(1, S), and client T submits C1(1, T). Paxos agrees on one global order (labeled "God" on the slide):
    1: R1: C1(1, R)
    2: T1: C1(1, T)
    3: S1: C1(1, S)
    4: R2: C2(…)
    …]
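The state machine on this slide can be sketched as follows. The slide gives only the state names and the two signatures, so the rules below are assumptions chosen to reproduce the slide's scenario: change_master succeeds only when ver is exactly one more than the current master_ver, so of several competing proposals with the same version only the first in the agreed order wins, and app_binlog accepts an entry only when its binlog_id starts with the current master's id followed by an underscore. BinlogStateMachine, ReplaySlideLog, the accessors and the return convention (queue index, or -1 on rejection) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class BinlogStateMachine {
public:
    // C1: accept the proposal only if it targets the next version
    // (assumed rule; the slide shows only the signature).
    bool change_master(int ver, const std::string& new_master) {
        if (ver != master_ver_ + 1) {
            return false;  // stale or losing proposal
        }
        master_ver_ = ver;
        master_id_ = new_master;
        return true;
    }

    // C2: append a binlog entry. Assumed rule: binlog_id must be prefixed
    // with "<master_id>_". Returns the queue index, or -1 if rejected.
    int app_binlog(const std::string& binlog_id, const std::string& binlog) {
        const std::string prefix = master_id_ + "_";
        if (master_id_.empty() ||
            binlog_id.compare(0, prefix.size(), prefix) != 0) {
            return -1;
        }
        binlog_queue_.emplace_back(binlog_id, binlog);
        return static_cast<int>(binlog_queue_.size()) - 1;
    }

    int master_ver() const { return master_ver_; }
    const std::string& master_id() const { return master_id_; }
    std::size_t queue_size() const { return binlog_queue_.size(); }

private:
    int master_ver_ = 0;
    std::string master_id_;
    std::vector<std::pair<std::string, std::string>> binlog_queue_;
};

// Replays the command order shown on the slide: R1, T1, S1, R2.
BinlogStateMachine ReplaySlideLog() {
    BinlogStateMachine sm;
    const bool r = sm.change_master(1, "R");
    const bool t = sm.change_master(1, "T");
    const bool s = sm.change_master(1, "S");
    assert(r && !t && !s);  // only the first proposal for version 1 wins
    sm.app_binlog("R_xxx", "insert foo=2 into table_1");
    return sm;
}
```

Because every PhxBinlogSvr applies the same commands in the same order, each one reaches the same verdict on who the master is and which binlog entries were accepted.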
How to Use PhxSQL
Easy to integrate into existing system
Case 1: non-intrusive change to existing system
• Assume the existing system has some kind of naming service that tells clients the current master
  • Zookeeper/DNS/configuration file/...
• Develop a new daemon program that learns of master changes in the PhxSQL cluster and updates the naming service accordingly
  • Safety: PhxSQL ensures consistency even if the master information in the naming service is stale or a MySQL client connects to a slave
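[Figure: the naming service records "master ip: 10.1.1.1". The PhxSQL cluster has a master at 10.1.1.1 and slaves at 10.1.1.2 and 10.1.1.3. The daemon (I) learns the new master from PhxSQL and (II) updates the naming service. The client (1) gets the IP of the master from the naming service and (2) sends its request.]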
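The daemon in Case 1 can be sketched as a polling step that compares the master reported by the cluster with the one recorded in the naming service and writes only on change. How the daemon actually learns the master from PhxSQL is not shown on the slide, so it is modeled here as a caller-supplied function. NamingService, SetMaster and SyncOnce are hypothetical names, and the in-memory struct stands in for Zookeeper, DNS or a configuration file.

```cpp
#include <cassert>
#include <functional>
#include <string>

// In-memory stand-in for the existing naming service (assumption).
struct NamingService {
    std::string master_ip;
    int writes = 0;  // counts updates, to show that unchanged polls are free
    void SetMaster(const std::string& ip) {
        master_ip = ip;
        ++writes;
    }
};

// One polling step of the daemon. `query_master` is a hypothetical hook
// that returns the IP of the current PhxSQL master, or an empty string
// when no master is known. Returns true if the naming service was updated.
bool SyncOnce(const std::function<std::string()>& query_master,
              NamingService& naming) {
    assert(static_cast<bool>(query_master));
    const std::string current = query_master();  // I. learn new master
    if (current.empty() || current == naming.master_ip) {
        return false;  // nothing to do
    }
    naming.SetMaster(current);  // II. update
    return true;
}
```

A daemon would call SyncOnce in a loop with a short sleep. As the slide notes, a stale entry between two polls affects only where clients connect, not consistency.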
Case 2: minimal intrusive change to existing system
• Traditional invocation scenario
  1. Read the IP of the master from the configuration file
  2. Get a MySQL handle by calling mysql_real_connect(MYSQL *mysql, const char * IP, …) with the IP
  3. Invoke other MySQL client APIs with the handle
• PhxSQL invocation scenario
  1. Read the IP list of the PhxSQL cluster servers from the configuration file
  2. Get a MySQL handle by calling PhxSQLClientBase::Connect() with the IP list and then PhxSQLClientBase::GetMySQLFD()
  3. Invoke other MySQL client APIs with the handle
Performance
On par with MySQL semi-sync replication
Performance test settings: PhxSQL vs. MySQL Semi-sync
• 3 servers: Intel(R) Xeon(R) CPU E5-2420 @ 1.90GHz * 2, 32G memory, SSD Raid10, 1000M NIC
• Network ping latency: master->slave: 3ms; client->master: 4ms
• Percona 5.6.31-77.0
• Master of MySQL semi-sync replication waits for only ONE ack
• Test tool and parameters: sysbench --oltp-tables-count=10 --oltp-table-size=1000000 --num-threads=500 --max-requests=100000 --report-interval=1 --max-time=200
QPS on par

200 concurrent client threads
Workload                   PhxSQL     MySQL
insert.lua (100% write)      5076      4055
select.lua (0% write)       46334     47528
OLTP.lua (20% write)        25657     20391

500 concurrent client threads
Workload                   PhxSQL     MySQL
insert.lua (100% write)      8260      7072
select.lua (0% write)      105928    121535
OLTP.lua (20% write)        46543     33229
Response time (ms) on par

200 concurrent client threads
Workload                   PhxSQL     MySQL
insert.lua (100% write)     39.34     49.27
select.lua (0% write)        4.21       4.1
OLTP.lua (20% write)       140.16    176.39

500 concurrent client threads
Workload                   PhxSQL     MySQL
insert.lua (100% write)     60.41      70.6
select.lua (0% write)        4.58      4.17
OLTP.lua (20% write)       192.93    270.38
Current progress
• Deployed in WeChat account system
  • WeChat account: 889M monthly active users
• Support for Percona 5.7 in progress
• Open source: https://github.com/tencent-wechat/phxsql