Migrating to XtraDB Cluster
Jay Janssen, MySQL Consulting Lead
Percona Live University, Toronto
March 22nd, 2013
Overview of XtraDB Cluster
• Percona Server 5.5 + Codership's Galera synchronous replication addon
• “Cluster of MySQL nodes”
– Have all the data, all the time
– Readable and writeable
• Established cluster:
– Synchronizes new nodes
– Handles node failures
– Handles node resync
– Split brain protection (quorum)
Company Confidential December 2010
XtraDB Cluster FAQ
• Standard MySQL replication
– into or out of the cluster
• Write-scalable to a point
– all writes still hit all nodes
• LAN/WAN architectures
– write latency ~1 RTT
• MyISAM support is experimental
– big list of caveats
– Galera is designed and built for InnoDB
What you really want to know
• Is it production worthy?
– There are several production users of Galera/PXC
– You should evaluate your own workload to see if it's a good fit for Galera/PXC
• What are the limitations of using Galera?
– http://www.codership.com/wiki/doku.php?id=limitations
CONFIGURING XTRADB CLUSTER
Cluster Replication Config
• Configured via wsrep_provider_options
• Can run on a separate network from mysqld
• Default cluster replication port is 4567 (tcp)
– Supports multicast
– Supports SSL
• A starting node needs to know one cluster node IP
– you can list all the nodes you know, and it will find one that is a member of the cluster
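A joining node only needs one reachable member, but listing every known node makes startup more robust. As a minimal sketch (the helper name is hypothetical), building the wsrep_cluster_address value from a node list looks like this:

```python
def gcomm_address(node_ips):
    """Build a wsrep_cluster_address value from known cluster node IPs.

    An empty list yields 'gcomm://', which bootstraps a brand-new
    cluster, so only use that form deliberately.
    """
    return "gcomm://" + ",".join(node_ips)

# A joiner tries the listed addresses until it finds a live member.
print(gcomm_address(["192.168.70.2", "192.168.70.3", "192.168.70.4"]))
# → gcomm://192.168.70.2,192.168.70.3,192.168.70.4
```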
Other intra-cluster communication
• Outside of Galera replication (gcomm):
• SST
– full state transfers
– a donor is picked from the running cluster and gives a full backup to the joiner node
– might be blocking (various methods allowed)
– default port: tcp 4444
• IST
– incremental state transfers
– default port: wsrep port + 1 (tcp 4568)
Essential Galera settings
• [mysqld]
– wsrep_provider = /usr/lib64/libgalera_smm.so
– wsrep_cluster_name - identify the cluster
– wsrep_cluster_address - where to find the cluster
– wsrep_node_address - tell Galera which IP to use for replication/SST/IST
– wsrep_sst_method - how to synchronize nodes
– binlog_format = ROW
– innodb_autoinc_lock_mode = 2
– innodb_locks_unsafe_for_binlog = 1 - performance
Other Galera Settings
• [mysqld]
– wsrep_node_name - identify this node
– wsrep_provider_options - cluster communication options
• wsrep_provider_options="gcache.size=<gcache size>"
• http://www.codership.com/wiki/doku.php?id=galera_parameters
– wsrep_node_incoming_address=<node mysql IP>
– wsrep_slave_threads - apply writesets in parallel
• http://www.codership.com/wiki/doku.php?id=mysql_options_0.8
Example configuration

[mysqld]
datadir=/var/lib/mysql
binlog_format=ROW

wsrep_cluster_name=trimethylxanthine
wsrep_cluster_address=gcomm://192.168.70.2,192.168.70.3,192.168.70.4

# Only use this before the cluster is formed
# wsrep_cluster_address=gcomm://

wsrep_node_name=percona1
wsrep_node_address=192.168.70.2
wsrep_provider=/usr/lib64/libgalera_smm.so

wsrep_sst_method=xtrabackup
wsrep_sst_auth=backupuser:password

wsrep_slave_threads=2

innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2

innodb_buffer_pool_size=128M
innodb_log_file_size=64M
CONVERTING STANDALONE
MYSQL TO XTRADB CLUSTER
Method 1 - Single Node
• Migrating a single server:
– stop MySQL
– replace the packages
– add the essential Galera settings
– start MySQL
• A stateless, peerless node will form its own cluster
– if an empty cluster address is given (gcomm://)
• That node is the baseline data for the cluster
• Easiest from Percona Server 5.5
Method 2 - Blanket changeover
• All at once (with downtime):
– Stop all writes; stop all nodes after replication is synchronized
– skip-slave-start / RESET SLAVE
– Start the first node - the initial cluster
– Start the others with wsrep_sst_method=skip
• The slaves will join the cluster, skipping SST
• Afterwards, change wsrep_sst_method back to a non-skip method
Method 3 - No downtime
• Form a new cluster from one slave
– the node replicates from the old master
– log-slave-updates on this node
• Test it like any other slave
• Move more slave nodes into the cluster
– nodes join via a normal (non-skip) SST
• Cut writes over to the cluster
• Absorb the master into the cluster
OPERATIONAL CONSIDERATIONS
Monitoring
• SHOW GLOBAL STATUS LIKE 'wsrep%';
• Cluster integrity - should be the same across all nodes
– wsrep_cluster_conf_id - configuration version
– wsrep_cluster_size - number of active nodes
– wsrep_cluster_status - should be Primary
• Node status
– wsrep_ready - indicator that the node is healthy
– wsrep_local_state_comment - status message
– wsrep_flow_control_paused/sent - replication lag feedback
– wsrep_local_send_q_avg - possible network bottleneck
• http://www.codership.com/wiki/doku.php?id=monitoring
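The cluster-integrity checks above can be expressed as one predicate. A sketch of what a monitoring script might assert, assuming you have already fetched SHOW GLOBAL STATUS LIKE 'wsrep%' from every node (the helper name is hypothetical):

```python
def cluster_consistent(node_statuses):
    """Check the cluster-integrity variables that must agree on all nodes.

    node_statuses: one dict per node, holding wsrep_* values as strings,
    the way SHOW GLOBAL STATUS returns them.
    """
    conf_ids = {s["wsrep_cluster_conf_id"] for s in node_statuses}
    sizes = {int(s["wsrep_cluster_size"]) for s in node_statuses}
    statuses = {s["wsrep_cluster_status"] for s in node_statuses}
    return (
        len(conf_ids) == 1                  # same configuration version everywhere
        and sizes == {len(node_statuses)}   # every node sees all the nodes
        and statuses == {"Primary"}         # no node in a non-primary partition
    )

healthy = {"wsrep_cluster_conf_id": "12", "wsrep_cluster_size": "3",
           "wsrep_cluster_status": "Primary"}
print(cluster_consistent([healthy] * 3))  # → True
```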
Realtime Wsrep status
Maintenance
• Rolling package updates
• Schema changes
– potential to block the whole cluster
– Galera supports a rolling schema upgrade feature
• http://www.codership.com/wiki/doku.php?id=rolling_schema_upgrade
• Isolates DDL to individual cluster nodes
• Won't work if replication events become incompatible
– pt-online-schema-change
• Prefer IST over SST
– be sure you know when IST will and won't work!
Architecture
• How many nodes should I have?
– >= 3 nodes for quorum purposes
• 50% is not a quorum
– garbd - Galera Arbitrator Daemon
• Contributes as a voting node for quorum
• Does not store data, but does replicate
• What gear should I get?
– Writes are only as fast as your slowest node
– Standard MySQL + InnoDB hardware choices
– garbd could run on a cloud server
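The quorum rule above can be made concrete: a partition stays Primary only with a strict majority of the last-known voters, which is why an even-sized cluster needs garbd as a tie-breaking vote. A small illustration, not Galera's actual implementation:

```python
def has_quorum(reachable_voters, total_voters):
    """A partition keeps quorum only with a strict majority of voters.

    garbd counts as a voter here even though it stores no data.
    """
    return reachable_voters * 2 > total_voters

# Two data nodes alone: a 1/2 split has no quorum on either side.
print(has_quorum(1, 2))  # → False
# Add garbd as a third voter: the side holding 2 of 3 stays Primary.
print(has_quorum(2, 3))  # → True
```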
APPLICATION WORKLOADS
How (Virtually) Synchronous Writes Work
• Source node - pessimistic locking
– InnoDB transaction locking
• Cluster replication - optimistic locking
– Before the source returns from commit:
• the writeset replicates to all nodes and a GTID is chosen
• the source certifies it
– PASS: the source applies it
– FAIL: the source gets a deadlock error (local certification failure)
– Other nodes
• receive, certify, apply (or drop)
• Certification is deterministic on all nodes
– Apply can abort open transactions (brute-force abort)
• First commit wins!
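The certification step above can be modeled as a toy. This sketch uses per-row sequence numbers rather than Galera's real write-set bookkeeping, but shows the first-commit-wins behavior: a transaction certifies only if no later commit touched its rows since the transaction's snapshot.

```python
class Cluster:
    """Toy certification model: first committer wins on overlapping rows."""

    def __init__(self):
        self.seqno = 0        # global transaction ID counter (GTID-like)
        self.last_write = {}  # row key -> seqno that last modified it

    def begin(self):
        return self.seqno     # snapshot: last seqno seen at transaction start

    def commit(self, snapshot, rows):
        # Certification: fail if any row changed after our snapshot.
        if any(self.last_write.get(r, -1) > snapshot for r in rows):
            return "DEADLOCK"  # local certification failure / brute-force abort
        self.seqno += 1
        for r in rows:
            self.last_write[r] = self.seqno
        return "COMMIT"

c = Cluster()
t1 = c.begin()
t2 = c.begin()                 # both transactions start concurrently
print(c.commit(t1, ["row1"]))  # → COMMIT   (first commit wins)
print(c.commit(t2, ["row1"]))  # → DEADLOCK (conflicts with t1's commit)
```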
Why does the Application care?
• Workload dependent!
• Writing to all nodes simultaneously and evenly:
– increases deadlock errors on data hot spots
• Can be avoided by:
– Writing to only one node at a time
• all pessimistic locking happens on one node
– Writing each data subset only on a single node
• e.g., different databases, tables, rows, etc.
• different nodes can handle writes for different datasets
• pessimistic locking for that subset happens only on one node
Workloads that work best with Galera
• Multi-node writing
– few data hotspots
– auto_increment offset/increment handling is ok
• Galera sets them automatically by default
• Small transactions
– large ones expose serialization points in replication and certification
• Tables
– with primary keys
– InnoDB
– avoid triggers, FKs, etc. -- supported, but problematic
APPLICATION CLUSTER HA
Application to Cluster Connects
• For writes:
– best practice: a single node (any one)
• For reads:
– all nodes, load-balanced
• can be hashed to hit hot caches
• geo-affinity for WAN setups
– replication lag is still possible, but minimal; avoidable with wsrep_causal_reads (session|global)
• Be sure to monitor that nodes are functioning members of the cluster!
Load balancing and Node status
• Health check:
– TCP 3306
– SHOW GLOBAL STATUS
• wsrep_ready = ON
• wsrep_local_state_comment !~ m/Donor/
• /usr/bin/clustercheck
• Maintain separate rotations:
– Reads
• round-robin or least-connected across all available nodes
– Writes
• a single node, with backups on failure
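The health-check rule above can be sketched as a predicate along the lines of what /usr/bin/clustercheck implements (simplified; the real script has more options, and the flag name here is an assumption for illustration):

```python
def node_available(status, accept_donor=False):
    """Decide whether a load balancer should send traffic to this node.

    status: dict of wsrep_* values from SHOW GLOBAL STATUS on the node.
    """
    if status.get("wsrep_ready") != "ON":
        return False
    state = status.get("wsrep_local_state_comment", "")
    if not accept_donor and "Donor" in state:
        return False  # a donor may be blocked while serving an SST
    return True

print(node_available({"wsrep_ready": "ON",
                      "wsrep_local_state_comment": "Synced"}))  # → True
print(node_available({"wsrep_ready": "ON",
                      "wsrep_local_state_comment": "Donor/Desynced"}))  # → False
```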
Load Balancing Technologies
• glbd - Galera Load Balancer
– similar to Pen, but can utilize multiple cores
– http://www.codership.com/products/galera-loadbalancer
• HAProxy
– use httpchk to monitor node status
– http://www.percona.com/doc/percona-xtradb-cluster/haproxy.html
• Watch out for a lot of TIME_WAIT connections!
HAProxy Sample config
# Random Reads connection (any node)
listen all *:3306
  server db1 10.2.46.120:3306 check port 9200
  server db2 10.2.46.121:3306 check port 9200
  server db3 10.2.46.122:3306 check port 9200

# Writer connection (first available node)
listen writes *:4306
  server db1 10.2.46.120:3306 track all/db1
  server db2 10.2.46.121:3306 track all/db2 backup
  server db3 10.2.46.122:3306 track all/db3 backup
Resources
• XtraDB Cluster homepage and documentation:
– http://www.percona.com/software/percona-xtradb-cluster/
• Galera Documentation:
– http://www.codership.com/wiki/doku.php
• PXC tutorial (self-guided or at a conference):
– https://github.com/jayjanssen/percona-xtradb-cluster-tutorial
• http://www.mysqlperformanceblog.com/category/xtradb-cluster/
THANK YOU
Jay Janssen
@jayjanssen
http://www.percona.com/software/percona-xtradb-cluster