Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Yahoo Case Study: MySQL GTIDs and Parallel or Multithreaded Replication PRESENTED BY Stacy Yuan, Yashada Jadhav | October 2015

Upload: yashada-jadhav

Post on 15-Apr-2017


TRANSCRIPT

Page 1: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Yahoo Case Study: MySQL GTIDs and Parallel or Multithreaded Replication

PRESENTED BY Stacy Yuan, Yashada Jadhav| October 2015

Page 2: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

About Yahoo

▪  Yahoo is focused on making the world’s daily habits inspiring and entertaining.

▪  By creating highly personalized experiences for our users, we keep people connected to what matters most to them, across devices and around the world.

▪  In turn, we create value for advertisers by connecting them with the audiences that build their businesses

▪  More than 1B monthly active users across Yahoo and Tumblr

▪  More than 575M mobile monthly active users across Yahoo and Tumblr

Page 3: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Ad Products Team

Mission Statement: Delivering scalable and cost-efficient data services through innovation and automation, powering Yahoo products
▪  Thousands of Production Servers
▪  OLTP systems & Data marts
▪  Database Design and Architecture
▪  Capacity Planning and Performance Reviews
▪  24x7 Monitoring and Operational Support

Page 4: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

MySQL at Yahoo

▪  MySQL powers many mission-critical products within Advertising and User space across Desktop and Mobile

▪  Multiple production configurations based on product requirement
●  Yahoo Sports, Daily Fantasy: Mobile friendly
●  Flickr: Sharded across thousands of servers
●  DBaaS setup for multiple products
●  Hot:Hot, Hot:Warm Configurations
▪  Versions range from Percona Server 5.5 to 5.6, including Percona XtraDB Cluster
▪  Operating systems running customized RHEL 6.x

Page 5: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

About Stacy

▪  14 years of experience on various flavors of relational databases.
▪  Focus on performance tuning, code reviews, database deployment and infrastructure management for MySQL
▪  In her spare time, she enjoys reading books and doing some volunteer work.

Page 6: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

About Yashada

▪  MySQL DevOps Engineer with a background in database design and performance tuning.

▪  4+ years of experience on various flavors of relational databases.

▪  Special Skills - Fluency in Sarcasm

Page 7: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

What are the next 45 minutes about?

▪  GTID replication
●  Advantages and disadvantages
●  Performance compared to regular replication
▪  Multi-threaded slaves
●  Why do we want MTS?
●  MTS vs single-threaded replication: performance tests
▪  Rolling out GTID and MTS to a live system with no downtime
▪  GTID and MTS in production
●  Operational issues
●  Monitoring and HA
●  Backups using xtrabackup

Page 8: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Why go for GTID and MTS

▪  Slave promotion becomes easier with a global transaction ID
▪  Multitenant database systems suffer from problems like resource contention due to bad queries, batch jobs, etc., which affect replication.
▪  MTS without GTID: replication coordinates might no longer be accurate due to multiple parallel worker threads.
▪  MTS with GTID

Page 9: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

GTID Replication

Page 10: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

File-based Replication

Enables data from one MySQL database server (the master) to be replicated to one or more MySQL database servers (the slaves) using the master's binary log file name and position.
▪  Create a replication user and enable the binlog on the master
▪  Load a copy of the master database on the slave
▪  Connect to the master through master_host, port, replication user, master log file and its position.
▪  Each slave pulls the events from the master and executes them locally.

Page 11: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

GTID Replication

A global transaction identifier (GTID) is a unique identifier created and associated with each transaction committed on the server of origin (master). A GTID is unique not only on the server where it originated but across all servers in a given replication setup.

GTID = source_id:transaction_id
▪  The source_id identifies the originating server.
▪  The transaction_id is a sequence number determined by the order in which the transaction was committed on this server.

Example (a GTID set covering transactions 1 through 13476 from one source):
▪  5c7401d3-3623-11e5-ae8c-78e7d15fd641:1-13476
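As a rough illustration of the source_id:transaction_id layout, the Python sketch below splits the example above into its parts. The names parse_gtid_range and GtidRange are made up for this illustration and are not part of MySQL; the sketch only handles one source with a single interval.

```python
import uuid
from typing import NamedTuple


class GtidRange(NamedTuple):
    source_id: str  # server UUID of the originating server
    first: int      # first transaction_id in the interval
    last: int       # last transaction_id in the interval (inclusive)


def parse_gtid_range(text: str) -> GtidRange:
    """Split 'source_id:first-last' (or 'source_id:n') into its parts."""
    source_id, _, interval = text.strip().partition(":")
    uuid.UUID(source_id)  # raises ValueError if source_id is not a UUID
    first, _, last = interval.partition("-")
    return GtidRange(source_id, int(first), int(last or first))


# The example string from the slide.
example = parse_gtid_range("5c7401d3-3623-11e5-ae8c-78e7d15fd641:1-13476")
transaction_count = example.last - example.first + 1
```

Here transaction_count is the number of transactions the interval covers for that one source.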

Page 12: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

GTID Replication Advantage

▪  Replication topology is easy to change: binlog file name and position are no longer required; we use master_auto_position=1 instead

▪  Failover is simplified

▪  Increased performance on relay slaves: set sync_binlog=0

▪  Managing multi-tiered replication is easier

Master_log_file=‘mysql-bin.***’ Master_log_pos=****

master_auto_position=1

Page 13: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Replication Failover Comparison

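Regular replication failover: if S1 is bad, S4 and S5 need to be rebuilt.

GTID replication failover: redirect S4 to M.

[Diagram: master M with slaves S1 to S5, shown once for regular replication failover and once for GTID replication failover]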

Page 14: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

GTID Replication Limitations

▪  GTID does not provide replication monitoring

▪  SQL_SLAVE_SKIP_COUNTER does not work

▪  Cannot force the database to start replication from a specific position

Page 15: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

GTIDs Replication Caveats

▪  Updates involving non-transactional storage engines.

▪  CREATE TABLE ... SELECT statements are not supported.

▪  Temporary tables are not supported inside a transaction.

To prevent GTID-based replication from failing, enable enforce-gtid-consistency.

Page 16: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Replication Performance GTID vs Regular Rep

In terms of performance, GTID replication is almost the same as regular replication, though slightly slower. Possible reasons:
▪  GTID writes more lines into the binary log (information about the GTID)
▪  GTID performs additional checks for transactions

Page 17: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

GTID vs Regular Rep

Page 18: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

GTID vs Regular Rep

Page 19: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

GTID vs Regular Rep

Page 20: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Multi threaded Replication

Page 21: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Single threaded replication

▪ Applications write to the master in parallel from multiple threads
▪ Replication from master to slave is single-threaded, which becomes a bottleneck in a busy system.

Master Slave

Page 22: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Multi-Threaded Slaves (MTS)

▪  Coordinator thread on the slave dispatches work across several worker threads

▪  Each worker thread commits transactions individually.

▪  Multiple active schemas/databases can take advantage of parallel replication

Master Slave

Page 23: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

MTS Prerequisites

▪  MySQL 5.6 or above
▪  Transactions on different databases are independent of each other.
▪  Multitenant databases benefit most from MTS
▪  For N databases, use N parallel workers: slave_parallel_workers = N
▪  Example: with 3 databases in MySQL, it is better to set slave_parallel_workers = 3

Master db1, db2, db3

Slave db1, db2, db3

Page 24: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Configure MTS

▪  STOP SLAVE;
▪  SET GLOBAL slave_parallel_workers=3;
▪  START SLAVE;

Page 25: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

MTS Execution Gaps and Checkpoint

▪  Events are no longer guaranteed to be consecutive
▪  Execution gaps are tracked
▪  Checkpoints are performed from time to time. Check the settings:
●  slave_checkpoint_period (default 300 ms)
●  slave_checkpoint_group (default 512 transactions)
▪  Exec_Master_Log_Pos shows the latest checkpoint, not the latest transaction
▪  How to fix execution gaps: STOP SLAVE; START SLAVE UNTIL SQL_AFTER_MTS_GAPS;

Page 26: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Convert MTS to Single-threaded

▪  Run MTS until no more gaps are found in the relay log
▪  Stop replication
▪  Configure single-threaded slave
▪  Start single-threaded slave

START SLAVE UNTIL SQL_AFTER_MTS_GAPS;
SET @@GLOBAL.slave_parallel_workers = 0;
START SLAVE;

Page 27: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

MTS Advantages and Limitations

Advantages:
▪  Takes advantage of multi-core servers
▪  Changes to each schema are applied and committed independently by worker threads
▪  Smaller risk of data loss

Limitations:
▪  START SLAVE UNTIL is no longer supported
▪  Foreign keys cross-referencing databases will disable MTS
▪  No implicit transaction retry after a transient failure

Page 28: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

MTS Caveats

▪  Enforcing foreign key relationships between tables in different databases causes MTS to use sequential mode, which can have a negative impact on performance

▪  With only a single database, MTS slows down replication performance

Page 29: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

MTS without GTID

▪  Exec_Master_Log_Pos in SHOW SLAVE STATUS is misleading.
▪  Skipping replication errors with SQL_SLAVE_SKIP_COUNTER=1 is dangerous
▪  A backup taken from a slave, with either mysqldump or xtrabackup, might not get the right position

GTID comes to the rescue

Page 30: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Performance Testing - GTID with MTS Setup

Test scenario:
▪  one master
▪  two slaves, both using GTID: one with single-threaded replication, the other with multi-threaded replication

[Diagram: Master replicating to Slave1 (GTID replication) and Slave2 (MTS with GTID replication)]

Page 31: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Replication Performance Comparison

Page 32: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Replication Performance Comparison

QPS increased about 3 or 4 times.

Load, CPU, and writes per second increased as well.

Page 33: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Replication Performance Comparison

Page 34: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

GTID with MTS enabled: Things to watch out for

▪  Exec_Master_Log_Pos is no longer reliable
▪  Executed_Gtid_Set is reliable
▪  SQL_SLAVE_SKIP_COUNTER no longer works
▪  START SLAVE UNTIL is not supported
▪  Slave_transaction_retries is treated as 0 and cannot be changed.

Page 35: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

MySQL 5.7 GA

▪  Parallel replication improvement
●  Slave can apply transactions in parallel within a single database/schema with --slave-parallel-type=LOGICAL_CLOCK.

▪  GTID improvements:
●  Automatically tracks the replication position in the replication stream.
●  GTID can be enabled or disabled online without a MySQL restart

Page 36: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

MySQL 5.7 SLAVE_PARALLEL_TYPE Study

Master: slave-parallel-type          DATABASE     LOGICAL_CLOCK
Master generated binary logs (MB)    3924         3690
read/write requests                  18704820     17587168
read/write requests per sec          20783.1      19541.25
response time AVG (ms)               1.54         1.64
95 percentile                        2.21         2.36
15 mins work                         81           38
Slave QPS                            4614.648     9466.198
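To put the two columns side by side, the short Python sketch below computes the ratios between them from the figures in the table. The variable names are chosen for this illustration only, and the "15 mins work" row is left out because the slide does not state its unit.

```python
# Figures copied from the table above.
slave_qps = {"DATABASE": 4614.648, "LOGICAL_CLOCK": 9466.198}
master_requests_per_sec = {"DATABASE": 20783.1, "LOGICAL_CLOCK": 19541.25}

# Ratio of LOGICAL_CLOCK to DATABASE for each metric.
slave_speedup = slave_qps["LOGICAL_CLOCK"] / slave_qps["DATABASE"]
master_ratio = (
    master_requests_per_sec["LOGICAL_CLOCK"] / master_requests_per_sec["DATABASE"]
)

print(f"slave QPS ratio (LOGICAL_CLOCK / DATABASE): {slave_speedup:.2f}")
print(f"master requests/sec ratio (LOGICAL_CLOCK / DATABASE): {master_ratio:.2f}")
```

In this test the slave QPS roughly doubled with LOGICAL_CLOCK, while the master's request rate was about 6% lower.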

Page 37: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Rolling out GTID and MTS to production

Page 38: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Online Rollout GTID with MTS in Percona Server

▪  MySQL 5.6 requires downtime to enable GTID, which is not acceptable
▪  With Percona Server 5.6, GTID can be enabled with almost no downtime

The variable gtid_deployment_step plays an important role

Page 39: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Database Servers Setup

Dual masters setup
▪ Masters are set up across different colos.
▪ Each master carries one slave

[Diagram: DNS in front of Prod master and BCP master; Prod slave under Prod master, BCP slave under BCP master]

Page 40: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Enable GTID without downtime

Enable GTID on the BCP side

1. Make sure BCP master and BCP slave are in sync

2. Stop mysqld on BCP master and BCP slave, add gtid_deployment_step=on, gtid_mode=ON and enforce-gtid-consistency to my.cnf, then restart mysqld on both servers.

3. Replication from prod master to BCP master is good.

[Diagram: DNS, Prod master, Prod slave, BCP master (gtid_deployment_step=on), BCP slave (gtid_deployment_step=on)]

Page 41: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Enable GTID without downtime

Promote BCP master to Prod master

4. Prod master: set global read_only=on;

5. BCP master: set global gtid_deployment_step = off; set global read_only=off;

6. The replication from BCP master to Prod master is broken.

[Diagram: DNS, Prod master, Prod slave, BCP master (gtid_deployment_step=off), BCP slave (gtid_deployment_step=on)]

Page 42: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Enable GTID without downtime

Enable GTID on Prod master

7. Enable GTID on old prod master and prod slave

8. Fix replication from BCP master to prod master: CHANGE MASTER TO MASTER_AUTO_POSITION = 1; START SLAVE;

9. Enable GTID replication from Prod master to BCP master

10. Enable MTS on all servers: stop slave; set global slave_parallel_workers=16; start slave;

[Diagram: DNS, Prod master (GTID enabled), Prod slave (GTID enabled), BCP master (gtid_deployment_step=off), BCP slave (gtid_deployment_step=on)]

Page 43: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Enable GTID without downtime

Switch back

11. Perform switchover back to Prod master

12. Disable gtid_deployment_step across all servers.

[Diagram: DNS, Prod master (GTID enabled), Prod slave (GTID enabled), BCP master (gtid_deployment_step=off), BCP slave (gtid_deployment_step=off)]

Page 44: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Switchover Steps

   

1. Set global read_only=on on prod master

2. Sanity check to make sure BCP master has caught up with its master (WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS)

3. Disable read_only on BCP master. BCP master becomes prod master

Failover:

•  If prod master is unreachable, fail over without steps 1 and 2.
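The ordering of the switchover and failover steps can be sketched in Python as a function that returns the statements to run on each server. The function name switchover_plan, the server labels and the 60-second timeout are assumptions made for this illustration; the sketch only builds the statement list and does not connect to any server.

```python
def switchover_plan(prod_gtid_executed, prod_reachable=True, timeout_s=60):
    """Return (server, statement) pairs in the order they would be run."""
    plan = []
    if prod_reachable:
        # Step 1: stop writes on the current prod master.
        plan.append(("prod master", "SET GLOBAL read_only = ON;"))
        # Step 2: wait until the BCP master has applied everything the
        # prod master has executed (its Executed_Gtid_Set).
        plan.append((
            "BCP master",
            "SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS("
            f"'{prod_gtid_executed}', {timeout_s});",
        ))
    # Step 3: open the BCP master for writes; it becomes the prod master.
    plan.append(("BCP master", "SET GLOBAL read_only = OFF;"))
    return plan


# Planned switchover (GTID set value is illustrative).
planned = switchover_plan("5c7401d3-3623-11e5-ae8c-78e7d15fd641:1-13476")
# Failover: prod master unreachable, so steps 1 and 2 are skipped.
failover = switchover_plan(None, prod_reachable=False)
```

Keeping the plan as data makes it easy to review or log the exact statements before anything is executed.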

 

Page 45: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

GTID and MTS in Production : MySQL Ops

Page 46: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

GTID and MTS in production : What did we learn?

▪  Errant Transactions ▪  Replication Monitoring ▪  Building slaves using xtrabackup

Page 47: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Errant Transaction

Errant transactions are transactions that were executed only on a slave.
▪  Could result from a mistake
▪  Could be intentional by design, such as report tables

▪  Why they cause problems: when the slave becomes the master during failover, it exchanges its own set of executed GTIDs with the other servers, then sends any missing transactions to the slaves.

Page 48: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Errant Transaction Detection and Fix

Detect: GTID_SUBSET(slave-Executed_Gtid_Set, master-Executed_Gtid_Set)
▪  If it returns true (1), there are no errant transactions.
▪  If it returns false (0), there are errant transactions.

Identify: GTID_SUBTRACT(slave-Executed_Gtid_Set, master-Executed_Gtid_Set)
▪  It returns the errant GTIDs.

Fix: Inject an empty transaction on all other servers or on its master. If the transaction must be executed on the slave only, use set sql_log_bin=0;
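To make the detection logic concrete, here is a small pure-Python sketch that mimics what GTID_SUBSET and GTID_SUBTRACT compute on GTID set strings. It is an illustration only, not the server's implementation; the function names and the sample slave/master sets are assumptions, and in practice the built-in MySQL functions should be used.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:8,uuid2:3' into {uuid: [(1, 5), (8, 8)], uuid2: [(3, 3)]}."""
    parsed = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        source_id, *ranges = part.split(":")
        intervals = parsed.setdefault(source_id.lower(), [])
        for item in ranges:
            first, _, last = item.partition("-")
            intervals.append((int(first), int(last or first)))
    return parsed


def subtract_intervals(a, b):
    """Return the parts of the intervals in `a` not covered by `b`."""
    remaining = []
    for start, end in sorted(a):
        cursor = start
        for b_start, b_end in sorted(b):
            if b_end < cursor or b_start > end:
                continue  # no overlap with what is left of this interval
            if b_start > cursor:
                remaining.append((cursor, b_start - 1))
            cursor = max(cursor, b_end + 1)
        if cursor <= end:
            remaining.append((cursor, end))
    return remaining


def gtid_subtract(set1, set2):
    """GTIDs that are in set1 but not in set2, as a GTID set string."""
    a, b = parse_gtid_set(set1), parse_gtid_set(set2)
    parts = []
    for source_id in sorted(a):
        left = subtract_intervals(a[source_id], b.get(source_id, []))
        if left:
            ranges = ":".join(str(s) if s == e else f"{s}-{e}" for s, e in left)
            parts.append(f"{source_id}:{ranges}")
    return ",".join(parts)


def gtid_subset(set1, set2):
    """True if every GTID in set1 is also in set2."""
    return gtid_subtract(set1, set2) == ""


# Illustrative values: the slave has one transaction the master never saw.
master_set = "5c7401d3-3623-11e5-ae8c-78e7d15fd641:1-13476"
slave_set = master_set + ",68fb0071-299b-11e5-9cd6-78e7d15dbe38:501"
has_errant = not gtid_subset(slave_set, master_set)
errant = gtid_subtract(slave_set, master_set)
```

With these sample sets, errant holds the single GTID that exists only on the slave, which is the one an empty transaction would need to cover on the other servers.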

Page 49: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Inject Empty Transaction

SQL_SLAVE_SKIP_COUNTER=n no longer works.

Execute a fake (empty) transaction with the GTID that you want to skip.

For example: GTID=68fb0071-299b-11e5-9cd6-78e7d15dbe38:501

STOP SLAVE;
SET GTID_NEXT="68fb0071-299b-11e5-9cd6-78e7d15dbe38:501";
BEGIN; COMMIT;
SET GTID_NEXT="AUTOMATIC";
START SLAVE;
SHOW SLAVE STATUS\G # Verification
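When several GTIDs have to be skipped, the same statement sequence can be generated rather than typed by hand. The Python sketch below is one way to do that; the helper name empty_transaction_statements and the validation pattern are assumptions for this illustration, and the output is meant to be reviewed before being run on a slave.

```python
import re

# One GTID: a server UUID, a colon, and a single transaction number.
GTID_PATTERN = re.compile(
    r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}:\d+$"
)


def empty_transaction_statements(gtids):
    """Build the statements that inject one empty transaction per GTID."""
    statements = ["STOP SLAVE;"]
    for gtid in gtids:
        if not GTID_PATTERN.match(gtid):
            raise ValueError(f"not a single GTID: {gtid!r}")
        statements += [f"SET GTID_NEXT='{gtid}';", "BEGIN;", "COMMIT;"]
    # Hand GTID assignment back to the server before restarting replication.
    statements += ["SET GTID_NEXT='AUTOMATIC';", "START SLAVE;"]
    return statements


for statement in empty_transaction_statements(
    ["68fb0071-299b-11e5-9cd6-78e7d15dbe38:501"]
):
    print(statement)
```

Rejecting ranges such as uuid:1-5 is deliberate, since GTID_NEXT takes one GTID at a time.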

Page 50: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

MySQL Replication Monitoring

   

•  Seconds_Behind_Master is a good approximation of how far behind the slave is only while the slave is actively processing updates.

If the network is slow or there are few updates on the master, it is NOT a good measurement.

   

Page 51: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

MySQL Replication Monitoring at Yahoo

▪  MySQL Health Heartbeat

1. Master generates a heartbeat by updating a timestamp (last_update)

2. Slave checks the difference between the current time and last_update
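The slave-side check in step 2 amounts to a timestamp subtraction plus a threshold. The Python sketch below shows that calculation on its own; the function names and the 30-second and 300-second thresholds are assumptions for illustration, not values from the slides, and reading last_update from the replicated heartbeat table is left out.

```python
from datetime import datetime, timezone


def heartbeat_lag_seconds(last_update, now=None):
    """Seconds between the replicated heartbeat timestamp and now."""
    now = now or datetime.now(timezone.utc)
    # Clamp at zero so small clock differences do not produce negative lag.
    return max(0.0, (now - last_update).total_seconds())


def classify_lag(lag_seconds, warn_after=30, critical_after=300):
    """Map a lag value to a monitoring state (thresholds are hypothetical)."""
    if lag_seconds >= critical_after:
        return "CRITICAL"
    if lag_seconds >= warn_after:
        return "WARNING"
    return "OK"


# Example: the heartbeat row seen on the slave was written 45 seconds ago.
last_update = datetime(2015, 10, 1, 12, 0, 0, tzinfo=timezone.utc)
now = datetime(2015, 10, 1, 12, 0, 45, tzinfo=timezone.utc)
lag = heartbeat_lag_seconds(last_update, now)
state = classify_lag(lag)
```

Unlike Seconds_Behind_Master, this value keeps growing when the slave stops receiving events, because the heartbeat row stops advancing.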

Page 52: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

GTID MTS Monitoring Challenge

▪  SHOW SLAVE STATUS

▪  Seconds_Behind_Master is still a good indication of the replication lag

▪  Retrieved_Gtid_Set: list of GTIDs received by the I/O thread, cleared after a server restart

▪  Executed_Gtid_Set: list of GTIDs executed by the SQL thread

▪  Auto_position: 1 if GTID-based replication is enabled

▪  MySQL 5.7 exposes replication status through performance_schema

Page 53: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Build Slaves Using Xtrabackup

▪  Run xtrabackup on either the master or a slave
●  If the backup is taken from the master, check the file xtrabackup_binlog_info in the backup folder
●  If the backup is taken from a slave, check the file xtrabackup_slave_info

$ cat xtrabackup_slave_info

SET GLOBAL gtid_purged='ffee1ff8-363f-11e5-af47-9cb654954cac:1-29123533'; CHANGE MASTER TO MASTER_AUTO_POSITION=1

Page 54: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Build Slave Using Xtrabackup

Enable replication on the slave

Issue:

mysql> SET GLOBAL gtid_purged='ffee1ff8-363f-11e5-af47-9cb654954cac:1-29123533';
ERROR 1840 (HY000): @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.

How to fix:
▪  RESET MASTER;
▪  SET GLOBAL gtid_purged='ffee1ff8-363f-11e5-af47-9cb654954cac:1-29123533';
▪  CHANGE MASTER TO MASTER_HOST="mastername", master_user='rep_user', master_password='rep_password', MASTER_AUTO_POSITION = 1;
▪  START SLAVE;
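The fix sequence above can be derived from the contents of xtrabackup_slave_info. The Python sketch below extracts the gtid_purged value with a regular expression and builds the four statements; the function name rebuild_statements is an assumption for this illustration, the host and credentials are the placeholders from the slide, and the sketch only produces text for review.

```python
import re


def rebuild_statements(slave_info_text, master_host, rep_user, rep_password):
    """Build the slave setup statements from xtrabackup_slave_info contents."""
    match = re.search(
        r"gtid_purged\s*=\s*'([^']*)'", slave_info_text, re.IGNORECASE
    )
    if match is None:
        raise ValueError("no gtid_purged value found")
    gtid_purged = match.group(1)
    return [
        # Clears GTID_EXECUTED so that gtid_purged can be set (error 1840).
        "RESET MASTER;",
        f"SET GLOBAL gtid_purged='{gtid_purged}';",
        (
            f"CHANGE MASTER TO MASTER_HOST='{master_host}', "
            f"MASTER_USER='{rep_user}', MASTER_PASSWORD='{rep_password}', "
            "MASTER_AUTO_POSITION=1;"
        ),
        "START SLAVE;",
    ]


# File contents as shown on the earlier slide.
slave_info = (
    "SET GLOBAL gtid_purged='ffee1ff8-363f-11e5-af47-9cb654954cac:1-29123533'; "
    "CHANGE MASTER TO MASTER_AUTO_POSITION=1"
)
statements = rebuild_statements(slave_info, "mastername", "rep_user", "rep_password")
```

RESET MASTER also removes the new slave's own binary logs, so it is only appropriate on a freshly restored server.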

Page 55: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Build Slave Using Xtrabackup

Still an issue?

mysql> start slave;
ERROR 1872 (HY000): Slave failed to initialize relay log info structure from the repository

RESET SLAVE;
START SLAVE;

Page 56: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

Summary

▪  GTID
▪  MTS
▪  GTID with MTS performance comparison
▪  GTID with MTS online rollout
▪  Things to watch out for
▪  Rebuild slave

Page 57: Yahoo: Experiences with MySQL GTID and Multi Threaded Replication

We would love to talk more ..

mysqlatyahoo.tumblr.com

Yashada Jadhav

[email protected]

Stacy Yuan

[email protected]