Migrating to XtraDB Cluster
Jay Janssen, MySQL Consulting Lead
Percona Live University, Toronto
March 22nd, 2013
Overview of XtraDB Cluster
• Percona Server 5.5 + Codership's Galera synchronous replication addon
• “Cluster of MySQL nodes”
– Have all the data, all the time
– Readable and writeable
• Established cluster:
– Synchronizes new nodes
– Handles node failures
– Handles node resync
– Split brain protection (quorum)
Company Confidential December 2010
XtraDB Cluster FAQ
• Standard MySQL replication
– into or out of the cluster
• Write-scalable to a point
– all writes still hit all nodes
• LAN/WAN architectures
– write latency ~1 RTT
• MyISAM support is experimental
– big list of caveats
– Galera is designed and built for InnoDB
What you really want to know
• Is it production worthy?
– There are several production users of Galera/PXC
– You should evaluate your own workload to see if it's a good fit for Galera/PXC
• What are the limitations of using Galera?
– http://www.codership.com/wiki/doku.php?id=limitations
CONFIGURING XTRADB CLUSTER
Cluster Replication Config
• Configured via wsrep_provider_options
• Can run on a separate network from mysqld
• Default cluster replication port is 4567 (tcp)
– Supports multicast
– Supports SSL
• A starting node needs to know one cluster node IP
– you can list all the nodes you know, and it will find one that is a member of the cluster
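A joining node only needs one reachable member, but listing every known node makes startup more robust. As a minimal sketch (the helper name is hypothetical), building the wsrep_cluster_address value from a node list looks like this:

```python
def gcomm_address(node_ips):
    """Build a wsrep_cluster_address value from known cluster node IPs.

    An empty list yields 'gcomm://', which bootstraps a brand-new
    cluster, so only use that form deliberately.
    """
    return "gcomm://" + ",".join(node_ips)

# A joiner tries the listed addresses until it finds a live member.
print(gcomm_address(["192.168.70.2", "192.168.70.3", "192.168.70.4"]))
# → gcomm://192.168.70.2,192.168.70.3,192.168.70.4
```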
Other intra-cluster communication
• Outside of Galera replication (gcomm):
• SST
– full state transfers
– a donor is picked from the running cluster and gives a full backup to the joiner node
– might be blocking (various methods allowed)
– default port: tcp 4444
• IST
– incremental state transfers
– default port: wsrep port + 1 (tcp 4568)
Essential Galera settings
• [mysqld]
– wsrep_provider = /usr/lib64/libgalera_smm.so
– wsrep_cluster_name - identify the cluster
– wsrep_cluster_address - where to find the cluster
– wsrep_node_address - tell Galera which IP to use for replication/SST/IST
– wsrep_sst_method - how to synchronize nodes
– binlog_format = ROW
– innodb_autoinc_lock_mode = 2
– innodb_locks_unsafe_for_binlog = 1 - performance
Other Galera Settings
• [mysqld]
– wsrep_node_name - identify this node
– wsrep_provider_options - cluster communication options
• wsrep_provider_options="gcache.size=<gcache size>"
• http://www.codership.com/wiki/doku.php?id=galera_parameters
– wsrep_node_incoming_address=<node mysql IP>
– wsrep_slave_threads - apply writesets in parallel
• http://www.codership.com/wiki/doku.php?id=mysql_options_0.8
Example configuration

[mysqld]
datadir=/var/lib/mysql
binlog_format=ROW

wsrep_cluster_name=trimethylxanthine
wsrep_cluster_address=gcomm://192.168.70.2,192.168.70.3,192.168.70.4

# Only use this before the cluster is formed
# wsrep_cluster_address=gcomm://

wsrep_node_name=percona1
wsrep_node_address=192.168.70.2
wsrep_provider=/usr/lib64/libgalera_smm.so

wsrep_sst_method=xtrabackup
wsrep_sst_auth=backupuser:password

wsrep_slave_threads=2

innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2

innodb_buffer_pool_size=128M
innodb_log_file_size=64M
CONVERTING STANDALONE
MYSQL TO XTRADB CLUSTER
Method 1 - Single Node
• Migrating a single server:
– stop MySQL
– replace the packages
– add the essential Galera settings
– start MySQL
• A stateless, peerless node will form its own cluster
– if an empty cluster address is given (gcomm://)
• That node is the baseline data for the cluster
• Easiest from Percona Server 5.5
Method 2 - Blanket changeover
• All at once (with downtime):
– Stop all writes; stop all nodes after replication is synchronized
– skip-slave-start / RESET SLAVE
– Start the first node - the initial cluster
– Start the others with wsrep_sst_method=skip
• The slaves will join the cluster, skipping SST
• Afterwards, change wsrep_sst_method back to a non-skip method
Method 3 - No downtime
• Form a new cluster from one slave
– the node replicates from the old master
– log-slave-updates on this node
• Test it like any other slave
• Move more slave nodes into the cluster
– nodes join via a normal (non-skip) SST
• Cut writes over to the cluster
• Absorb the master into the cluster
OPERATIONAL CONSIDERATIONS
Monitoring
• SHOW GLOBAL STATUS LIKE 'wsrep%';
• Cluster integrity - should be the same across all nodes
– wsrep_cluster_conf_id - configuration version
– wsrep_cluster_size - number of active nodes
– wsrep_cluster_status - should be Primary
• Node status
– wsrep_ready - indicator that the node is healthy
– wsrep_local_state_comment - status message
– wsrep_flow_control_paused/sent - replication lag feedback
– wsrep_local_send_q_avg - possible network bottleneck
• http://www.codership.com/wiki/doku.php?id=monitoring
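The cluster-integrity checks above can be expressed as one predicate. A sketch of what a monitoring script might assert, assuming you have already fetched SHOW GLOBAL STATUS LIKE 'wsrep%' from every node (the helper name is hypothetical):

```python
def cluster_consistent(node_statuses):
    """Check the cluster-integrity variables that must agree on all nodes.

    node_statuses: one dict per node, holding wsrep_* values as strings,
    the way SHOW GLOBAL STATUS returns them.
    """
    conf_ids = {s["wsrep_cluster_conf_id"] for s in node_statuses}
    sizes = {int(s["wsrep_cluster_size"]) for s in node_statuses}
    statuses = {s["wsrep_cluster_status"] for s in node_statuses}
    return (
        len(conf_ids) == 1                  # same configuration version everywhere
        and sizes == {len(node_statuses)}   # every node sees all the nodes
        and statuses == {"Primary"}         # no node in a non-primary partition
    )

healthy = {"wsrep_cluster_conf_id": "12", "wsrep_cluster_size": "3",
           "wsrep_cluster_status": "Primary"}
print(cluster_consistent([healthy] * 3))  # → True
```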
Realtime Wsrep status
Maintenance
• Rolling package updates
• Schema changes
– potential to block the whole cluster
– Galera supports a rolling schema upgrade feature
• http://www.codership.com/wiki/doku.php?id=rolling_schema_upgrade
• Isolates DDL to individual cluster nodes
• Won't work if replication events become incompatible
– pt-online-schema-change
• Prefer IST over SST
– be sure you know when IST will and won't work!
Architecture
• How many nodes should I have?
– >= 3 nodes for quorum purposes
• 50% is not a quorum
– garbd - Galera Arbitrator Daemon
• Contributes as a voting node for quorum
• Does not store data, but does replicate
• What gear should I get?
– Writes are only as fast as your slowest node
– Standard MySQL + InnoDB hardware choices
– garbd could run on a cloud server
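The quorum rule above can be made concrete: a partition stays Primary only with a strict majority of the last-known voters, which is why an even-sized cluster needs garbd as a tie-breaking vote. A small illustration, not Galera's actual implementation:

```python
def has_quorum(reachable_voters, total_voters):
    """A partition keeps quorum only with a strict majority of voters.

    garbd counts as a voter here even though it stores no data.
    """
    return reachable_voters * 2 > total_voters

# Two data nodes alone: a 1/2 split has no quorum on either side.
print(has_quorum(1, 2))  # → False
# Add garbd as a third voter: the side holding 2 of 3 stays Primary.
print(has_quorum(2, 3))  # → True
```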
APPLICATION WORKLOADS
How (Virtually) Synchronous Writes Work
• Source node - pessimistic locking
– InnoDB transaction locking
• Cluster replication - optimistic locking
– Before the source returns from commit:
• the writeset replicates to all nodes and a GTID is chosen
• the source certifies it
– PASS: the source applies it
– FAIL: the source gets a deadlock error (local certification failure)
– Other nodes
• receive, certify, apply (or drop)
• Certification is deterministic on all nodes
– Apply can abort open transactions (brute-force abort)
• First commit wins!
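The certification step above can be modeled as a toy. This sketch uses per-row sequence numbers rather than Galera's real write-set bookkeeping, but shows the first-commit-wins behavior: a transaction certifies only if no later commit touched its rows since the transaction's snapshot.

```python
class Cluster:
    """Toy certification model: first committer wins on overlapping rows."""

    def __init__(self):
        self.seqno = 0        # global transaction ID counter (GTID-like)
        self.last_write = {}  # row key -> seqno that last modified it

    def begin(self):
        return self.seqno     # snapshot: last seqno seen at transaction start

    def commit(self, snapshot, rows):
        # Certification: fail if any row changed after our snapshot.
        if any(self.last_write.get(r, -1) > snapshot for r in rows):
            return "DEADLOCK"  # local certification failure / brute-force abort
        self.seqno += 1
        for r in rows:
            self.last_write[r] = self.seqno
        return "COMMIT"

c = Cluster()
t1 = c.begin()
t2 = c.begin()                 # both transactions start concurrently
print(c.commit(t1, ["row1"]))  # → COMMIT   (first commit wins)
print(c.commit(t2, ["row1"]))  # → DEADLOCK (conflicts with t1's commit)
```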
Why does the Application care?
• Workload dependent!
• Writing to all nodes simultaneously and evenly:
– increases deadlock errors on data hot spots
• Can be avoided by:
– Writing to only one node at a time
• all pessimistic locking happens on one node
– Writing each data subset only on a single node
• e.g., different databases, tables, rows, etc.
• different nodes can handle writes for different datasets
• pessimistic locking for that subset happens only on one node
Workloads that work best with Galera
• Multi-node writing
– few data hotspots
– auto_increment offset/increment handling is ok
• Galera sets them automatically by default
• Small transactions
– large ones expose serialization points in replication and certification
• Tables
– with primary keys
– InnoDB
– avoid triggers, FKs, etc. -- supported, but problematic
APPLICATION CLUSTER HA
Application to Cluster Connects
• For writes:
– best practice: a single node (any one)
• For reads:
– all nodes, load-balanced
• can be hashed to hit hot caches
• geo-affinity for WAN setups
– replication lag is still possible, but minimal; avoidable with wsrep_causal_reads (session|global)
• Be sure to monitor that nodes are functioning members of the cluster!
Load balancing and Node status
• Health check:
– TCP 3306
– SHOW GLOBAL STATUS
• wsrep_ready = ON
• wsrep_local_state_comment !~ m/Donor/
• /usr/bin/clustercheck
• Maintain separate rotations:
– Reads
• round-robin or least-connected across all available nodes
– Writes
• a single node, with backups on failure
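The health-check rule above can be sketched as a predicate along the lines of what /usr/bin/clustercheck implements (simplified; the real script has more options, and the flag name here is an assumption for illustration):

```python
def node_available(status, accept_donor=False):
    """Decide whether a load balancer should send traffic to this node.

    status: dict of wsrep_* values from SHOW GLOBAL STATUS on the node.
    """
    if status.get("wsrep_ready") != "ON":
        return False
    state = status.get("wsrep_local_state_comment", "")
    if not accept_donor and "Donor" in state:
        return False  # a donor may be blocked while serving an SST
    return True

print(node_available({"wsrep_ready": "ON",
                      "wsrep_local_state_comment": "Synced"}))  # → True
print(node_available({"wsrep_ready": "ON",
                      "wsrep_local_state_comment": "Donor/Desynced"}))  # → False
```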
Load Balancing Technologies
• glbd - Galera Load Balancer
– similar to Pen, but can utilize multiple cores
– http://www.codership.com/products/galera-loadbalancer
• HAProxy
– use httpchk to monitor node status
– http://www.percona.com/doc/percona-xtradb-cluster/haproxy.html
• Watch out for a lot of TIME_WAIT connections!
HAProxy Sample config
# Random Reads connection (any node)
listen all *:3306
  server db1 10.2.46.120:3306 check port 9200
  server db2 10.2.46.121:3306 check port 9200
  server db3 10.2.46.122:3306 check port 9200

# Writer connection (first available node)
listen writes *:4306
  server db1 10.2.46.120:3306 track all/db1
  server db2 10.2.46.121:3306 track all/db2 backup
  server db3 10.2.46.122:3306 track all/db3 backup
Resources
• XtraDB Cluster homepage and documentation:
– http://www.percona.com/software/percona-xtradb-cluster/
• Galera Documentation:
– http://www.codership.com/wiki/doku.php
• PXC tutorial (self-guided or at a conference):
– https://github.com/jayjanssen/percona-xtradb-cluster-tutorial
• http://www.mysqlperformanceblog.com/category/xtradb-cluster/
THANK YOU
Jay Janssen
@jayjanssen
http://www.percona.com/software/percona-xtradb-cluster