Building MySQL DBaaS on
OpenStack with XtraDB
Cluster
Who We Are
Paddy Power Betfair is a leading international sports betting and gaming operator
FTSE100, Market Cap ~£7Bn
We operate six leading brands: Paddy Power, Betfair,
Sportsbet, FanDuel, TVG, DRAFT
Over five million customers worldwide
We run some of the world’s most exciting online sports
betting and gaming brands
We employ over 7000 people from Los Angeles to
Melbourne, via Dublin and London
Where We Started
Merger of Paddy Power and Betfair
Ageing native Infrastructure
Lack of cross DC DR for MySQL
Reduce TTM for new database systems
S/W and H/W inconsistencies across Dev,
QA and Prod
Our Vision
DB as a service
Always-On, Highly Available, Disaster Proof
architecture
Rapid provisioning
Ability to quickly patch systems with little to
no disruption for Applications
Free up staff for more valuable work
DBaaS at Paddy Power Betfair
XtraDB Cluster on OpenStack…
MySQL HA Options?
• MySQL Master-Master cross DC replication
• XtraDB cluster with arbitrator node in cloud/3-DC
• Asymmetric cross DC XtraDB Cluster (3-node)
Why Not Master-Master Cross DC
Replication?
Limitations:
• Handling replication lags in case of unplanned
failovers
• Handling split brain scenarios
• Operational overhead of keeping replication
working across 160+ environments
• Conflict resolution
Why Not XtraDB Cluster with Arbitrator in
3rd DC?
Limitations:
• Additional round trip network latency
• SST with just 2 active nodes will cause
service disruption
• Handling split brain scenarios
Why Asymmetric Cross DC XtraDB
Cluster?
Limitations:
• Unplanned DC outage on majority node DC
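This limitation follows from the Galera quorum rule: after an unplanned failure, the surviving nodes stay primary only if they hold a strict majority of the cluster. A minimal sketch of that arithmetic, assuming equal node weights and a hypothetical 2+1 layout across data centres named dc1 and dc2 (both names are illustrative):

```python
def has_quorum(surviving_nodes: int, total_nodes: int) -> bool:
    """A partition keeps quorum only with a strict majority of nodes."""
    return surviving_nodes * 2 > total_nodes


# Hypothetical asymmetric layout: 2 nodes in dc1 (majority), 1 node in dc2.
layout = {"dc1": 2, "dc2": 1}
total = sum(layout.values())


def survives_outage(failed_dc: str) -> bool:
    """True if the nodes left after losing failed_dc still form a majority."""
    remaining = total - layout[failed_dc]
    return has_quorum(remaining, total)


print(survives_outage("dc2"))  # losing the minority DC -> True
print(survives_outage("dc1"))  # losing the majority DC -> False
```

Losing the minority DC leaves the cluster writable; losing the majority DC leaves a single non-primary node that has to be bootstrapped by hand or by automation.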
Why Percona XtraDB?
• Cross DC resiliency
• Transparent/seamless failover for planned maintenance
• Cross DC deployment pipeline
• Fast recovery from DC outages
• Less operational overhead
• Improving customer experience
Why XtraBackup, PMM,
pt-online-schema-change?
• XtraBackup allows us to recover individual
nodes without having to do an SST on 1 TB DBs
• XtraBackup allows us to do point-in-time
and partition-level recovery
• PMM allows us to monitor XtraDB cluster,
MySQL and O/S metrics in a centralized
fashion.
• PMM allows us to add PMM agents as part of
our deployment pipeline
• pt-online-schema-change for running schema
upgrades, on OLTP platform
PMM Dashboard
Why NetScaler?
• MaxScale and ProxySQL did not support values
returned from DB procedure calls (at the time of
testing)
• NetScaler allows us to check DB state when routing
connections, which works better than connection
managers that only check the port state
• The DB state check has helped reduce failover
times from 10 seconds to 2-3 seconds
• NetScaler allows us to implement read/write split
rules; this is something we plan to use in future.
• Existing framework code to provision NetScalers
What Did We Build?
IaC /1 - Automation Tools
Our toolset includes:
• GitLab (code repository)
• Artifactory (artifacts, external repos proxy)
• Jenkins (CI build jobs)
• GoCD (Pipeline configuration and templates)
IaC /2 - Ansible Framework
We have a number of Git repositories to describe our
infrastructure requirements.
They all feed our Ansible Framework that calls APIs
to provision what’s required.
IaC /3 - Our repos
• OpenStack VM provisioning specs
• SDN (Nuage network and firewall
design)
• Load Balancer (Citrix NetScaler
VIPs, AVI GSLB)
• Monitoring (Sensu, Splunk, TSDB)
IaC – PPB Cloud /3a
Percona XtraDB Cluster Configuration
Percona XtraDB Cluster gets configured using an
Ansible role included by our Framework
• We use jinja2 templates
• Default values for all MySQL parameters
• Override values for each environment
e.g.
Memory parameter is calculated dynamically as a
percentage of the total allocated memory to VM.
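A minimal sketch of the defaults-plus-overrides idea with a memory setting derived from the VM size. The slides use jinja2 templates inside an Ansible role; this sketch swaps in plain Python to stay dependency-free. The parameter names, the 70% figure and the choice of innodb_buffer_pool_size as the "memory parameter" are all assumptions for illustration, not values from the deck.

```python
# Illustrative defaults only; the real role's defaults are not shown in the slides.
DEFAULTS = {"max_connections": 500, "buffer_pool_pct": 70}


def render_settings(total_memory_mb, overrides=None):
    """Merge per-environment overrides over defaults and derive the memory setting."""
    settings = {**DEFAULTS, **(overrides or {})}
    pct = settings.pop("buffer_pool_pct")
    # Assumed memory parameter: buffer pool as a percentage of VM memory.
    settings["innodb_buffer_pool_size"] = f"{total_memory_mb * pct // 100}M"
    return settings


def to_cnf(settings):
    """Render the settings as a [mysqld] section of a my.cnf file."""
    lines = ["[mysqld]"] + [f"{key} = {value}" for key, value in sorted(settings.items())]
    return "\n".join(lines)


# Hypothetical environment: a 16 GB VM that overrides one default.
print(to_cnf(render_settings(16384, {"max_connections": 2000})))
```

Because the memory value is computed from the VM size, resizing a VM changes the rendered configuration without touching the per-environment overrides.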
IaC – PPB Cloud /3b
Percona XtraDB Cluster Configuration
IaC - PPB Cloud /5 Jenkins wraps it up
IaC - PPB Cloud /6 GoCD Pipelines
Provisioning the desired infrastructure with the same process for each Environment (QA/Pre-Prod/Perf/Prod)
CI/CD Workflow in a picture
Challenges
• Hosting stateful applications on PPBF OpenStack.
• Reducing Service Disruption.
• Hosting highly concurrent OLTP application on
XtraDB Cluster.
• Developing a mechanism for fast recovery from full
unplanned DC outages.
Stateful Apps on PPBF OpenStack
• Rolling update is the process to redeploy our
environment(s); challenge was how to minimize
service disruption
• Rolling update requires a new VM to be deployed
with the new changes and move the DB instance
onto the new VM (A / B deployments)
Rolling Update Explained
[Diagram sequence: volume clone, volume snapshot, volume clone]
Reducing Service Disruption
• Reducing the time to failover.
sqlquery: "show global status like 'wsrep_local_state_comment'"
evalrule: "MYSQL.RES.ROW(0).TEXT_ELEM(1).CONTAINS(\"Synced\")||MYSQL.RES.ROW(0).TEXT_ELEM(1).CONTAINS(\"Donor\")"
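The rule above marks a node as routable when the value column of the status row contains "Synced" or "Donor". A rough Python equivalent of the same check, assuming the query result arrives as a list of (name, value) tuples and that element index 1 is the value column; the function name is hypothetical:

```python
def node_is_routable(rows):
    """Mirror the evalrule: first row, second column contains Synced or Donor."""
    if not rows:
        # No status row returned: treat the node as down.
        return False
    value = rows[0][1]
    return "Synced" in value or "Donor" in value


# Result shape of: show global status like 'wsrep_local_state_comment'
print(node_is_routable([("wsrep_local_state_comment", "Synced")]))
print(node_is_routable([("wsrep_local_state_comment", "Joined")]))
```

Checking the replication state rather than the TCP port means a node that is up but still joining the cluster receives no traffic.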
Hosting Highly Concurrent OLTP
Applications
• Route all write connections to a single XtraDB cluster node
• Reads are scaled across nodes
• DB design strategy
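The routing policy above can be sketched as follows: every write goes to one designated node, which avoids write conflicts between nodes at commit time, while reads rotate across all nodes. The class and node names are hypothetical, and in the deck this role is played by the load balancer, not application code.

```python
from itertools import cycle


class Router:
    """Send writes to one designated node and spread reads round-robin."""

    def __init__(self, nodes, writer):
        if writer not in nodes:
            raise ValueError("writer must be one of the cluster nodes")
        self.writer = writer
        self._readers = cycle(nodes)  # endless round-robin over all nodes

    def route(self, is_write):
        """Return the node that should receive this connection."""
        return self.writer if is_write else next(self._readers)


router = Router(["node1", "node2", "node3"], writer="node1")
print(router.route(is_write=True))
print([router.route(is_write=False) for _ in range(4)])
```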
Recovery from Unplanned DC Outages
Fast recovery from unplanned DC outages:
• Trade-off between transaction latency with an
arbitrator and fast-enough recovery from
infrequent DC failures
• Created a workflow to recover from unplanned DC
outages
• The process works through the MySQL environments,
identifying those that have minority nodes in the
surviving DC, and bootstraps them
• This process takes 5-10 min for the entire DC
• set global wsrep_provider_options="pc.bootstrap=1";
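The recovery workflow can be sketched as a scan over an inventory of clusters. The inventory shape, environment names and the "first survivor by name" choice are assumptions for illustration; a real workflow would also need to confirm which surviving node holds the most recent data before bootstrapping it.

```python
BOOTSTRAP_SQL = 'SET GLOBAL wsrep_provider_options="pc.bootstrap=1";'


def clusters_to_bootstrap(clusters, failed_dc):
    """Map each environment that lost quorum to one surviving node to bootstrap.

    clusters: {environment: {node_name: dc_name}} (hypothetical inventory shape).
    """
    targets = {}
    for env, nodes in clusters.items():
        survivors = sorted(name for name, dc in nodes.items() if dc != failed_dc)
        # Survivors without a strict majority have lost quorum.
        if survivors and len(survivors) * 2 <= len(nodes):
            targets[env] = survivors[0]
    return targets


# Hypothetical inventory: env_a has its majority in dc1, env_b in dc2.
inventory = {
    "env_a": {"a1": "dc1", "a2": "dc1", "a3": "dc2"},
    "env_b": {"b1": "dc2", "b2": "dc2", "b3": "dc1"},
}
for env, node in clusters_to_bootstrap(inventory, failed_dc="dc1").items():
    print(f"{env}: run on {node}: {BOOTSTRAP_SQL}")
```

Only env_a is selected here: env_b still has two of its three nodes in the surviving DC and keeps quorum on its own.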
How has Percona XtraDB Cluster on
OpenStack Benefited PPBF?
✓ Time-to-Market
✓ Operational Support and Consistency
✓ HA and DR for our business critical services
✓ Minimal service disruption for planned maintenance and upgrades
✓ Better manageability
✓ Standardised and improved monitoring
✓ Security
Current Status…
• We host 40+ applications on XtraDB clusters
• Biggest single database about 1TB in size
• Max transaction rate 6k/sec
• 480 VMs used
• On average we deploy/migrate 3 applications onto MySQL XtraDB cluster
every month
• About half a day average time to build a full set of environments for a new
application
• 2 major planned XtraDB Cluster and OpenStack version upgrades completed
with close to zero downtime
What’s Next for PPBF?
• Standard deployment pipelines for other SQL and
NoSQL technologies
• Integrating DB releases into the pipeline and making
it self-service for development teams
• Full DBaaS offering for Dev team to test different
DB technologies
Thank You
Any Questions?