Online Schema Changes for Maximizing Uptime
PalominoDB's David Turner and Ben Black cover common operations implemented in production and how you can minimize downtime and customer impact.
PALOMINODB OPERATIONAL EXCELLENCE FOR DATABASES
David Turner
Ben Black
www.palominodb.com
PalominoDB: Who are we and what do we do?
• Proactive database awesomeness so you can focus on other things
• Systems and Performance Optimization
• Operational Best Practices
• DevOps
• Ask us how we like working from anywhere
Online Schema Changes
• Best practices
• Online Schema Tools
• pt-online-schema-change
• InnoDB Online DDL
• Case Study
Best Practices
• Backups
• Benchmark
o Start small
o Step up
o Time
o Performance_schema and Information_schema
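One way to act on "start small, step up, time" is to time the change against progressively larger copies of the table and extrapolate to full size. This is a minimal sketch that assumes run time grows roughly linearly with row count; buffer pool misses and index builds can break that assumption, so treat the estimate as a floor. The sample timings in the demo are made up.

```python
from typing import Sequence, Tuple

Sample = Tuple[int, float]  # (rows in the test copy, seconds the ALTER took)


def fit_linear(samples: Sequence[Sample]) -> Tuple[float, float]:
    """Least-squares fit of seconds = slope * rows + intercept."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two timed runs")
    mean_x = sum(r for r, _ in samples) / n
    mean_y = sum(s for _, s in samples) / n
    sxx = sum((r - mean_x) ** 2 for r, _ in samples)
    if sxx == 0:
        raise ValueError("timed runs need different row counts")
    sxy = sum((r - mean_x) * (s - mean_y) for r, s in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_seconds(samples: Sequence[Sample], target_rows: int) -> float:
    """Extrapolate the fitted line to the full table size."""
    slope, intercept = fit_linear(samples)
    return slope * target_rows + intercept


if __name__ == "__main__":
    # Made-up timings from test runs on 1M, 2M and 4M-row copies of a table.
    runs = [(1_000_000, 62.0), (2_000_000, 121.0), (4_000_000, 245.0)]
    print(f"{estimate_seconds(runs, 40_000_000) / 3600:.1f} hours (rough)")
```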
Repository
• Prevent
o Broken replication
o Data loss
o Broken applications
Creating the repository
• Dump
• Script
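When the repository is built from a schema-only dump (for example `mysqldump --no-data`), volatile details such as each table's current AUTO_INCREMENT counter and the "-- Dump completed on ..." timestamp make every snapshot look changed. A minimal normalizer, assuming that dump format, strips them before committing or diffing against the repository:

```python
import difflib
import re

_AUTO_INC = re.compile(r" AUTO_INCREMENT=\d+")


def normalize_schema(dump: str) -> str:
    """Drop per-snapshot noise from a schema-only dump so diffs show real changes."""
    kept = []
    for line in dump.splitlines():
        if line.startswith("-- Dump completed"):
            continue  # timestamp line differs on every run
        kept.append(_AUTO_INC.sub("", line))  # counter moves with every insert
    return "\n".join(kept) + "\n"


def schema_diff(repo_dump: str, live_dump: str) -> str:
    """Unified diff of two normalized dumps; an empty string means no schema change."""
    return "".join(difflib.unified_diff(
        normalize_schema(repo_dump).splitlines(keepends=True),
        normalize_schema(live_dump).splitlines(keepends=True),
        fromfile="repo", tofile="live"))
```

Run the same normalization on the scheduled dump, and a non-empty diff flags a change that did not go through the migration process.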
Syncing the repository
• Migrations
• Schema Changes
• Scheduled
Problems
• Schema changes
• Fragmentation
You want to do what to that 40G table?
• Down for maintenance / blocking DDL
• Disable writes
• Remaster
• Facebook OSC
• Large Hadron Migrator
• openark-kit
• pt-online-schema-change
• MySQL InnoDB Online DDL (IOD)
OSC tools, or how I learned to love the ALTER...
• copy the original table structure to a new table
• alter the new table
• create triggers to copy DML from the original table to the new table
• copy the data in the original table to the new table in chunks
• swap the table names and drop the original table
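The steps above can be sketched end to end. This example uses SQLite (Python's built-in `sqlite3`) purely as a stand-in so it runs anywhere; it is not how pt-online-schema-change or the other tools work against MySQL. It assumes a single-column integer primary key and copies the table definition by a naive name replacement, and the `osc_*` trigger and `_<table>_new` names are made up. Chunks are bounded by real PK values rather than id arithmetic, so gaps in the key do not produce empty or oversized chunks.

```python
import sqlite3
from typing import Callable, Optional


def online_alter(conn: sqlite3.Connection, table: str, alter_clause: str,
                 chunk_size: int = 1000,
                 on_chunk: Optional[Callable[[sqlite3.Connection], None]] = None) -> None:
    """Trigger-based online ALTER on SQLite; conn must use isolation_level=None."""
    new, old = f"_{table}_new", f"_{table}_old"
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    cols = [row[1] for row in info]
    pk = next(row[1] for row in info if row[5] == 1)  # single-column PK assumed
    col_list = ", ".join(cols)
    new_vals = ", ".join(f"NEW.{c}" for c in cols)

    # 1. Copy the original table structure to a new table (naive rename in the DDL).
    ddl = conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
                       (table,)).fetchone()[0]
    conn.execute(ddl.replace(table, new, 1))
    # 2. Alter the new, still-empty table.
    conn.execute(f"ALTER TABLE {new} {alter_clause}")
    # 3. Triggers replay writes on the original table into the new one.
    conn.execute(f"CREATE TRIGGER osc_ins AFTER INSERT ON {table} BEGIN "
                 f"INSERT OR REPLACE INTO {new} ({col_list}) VALUES ({new_vals}); END")
    conn.execute(f"CREATE TRIGGER osc_upd AFTER UPDATE ON {table} BEGIN "
                 f"DELETE FROM {new} WHERE {pk} = OLD.{pk}; "  # handles PK changes
                 f"INSERT OR REPLACE INTO {new} ({col_list}) VALUES ({new_vals}); END")
    conn.execute(f"CREATE TRIGGER osc_del AFTER DELETE ON {table} BEGIN "
                 f"DELETE FROM {new} WHERE {pk} = OLD.{pk}; END")
    # 4. Copy existing rows in PK-range chunks; OR IGNORE keeps trigger-written rows.
    lo = conn.execute(f"SELECT MIN({pk}) FROM {table}").fetchone()[0]
    while lo is not None:
        hi = conn.execute(f"SELECT MAX({pk}) FROM (SELECT {pk} FROM {table} "
                          f"WHERE {pk} >= ? ORDER BY {pk} LIMIT ?)",
                          (lo, chunk_size)).fetchone()[0]
        conn.execute(f"INSERT OR IGNORE INTO {new} ({col_list}) "
                     f"SELECT {col_list} FROM {table} WHERE {pk} BETWEEN ? AND ?", (lo, hi))
        if on_chunk:
            on_chunk(conn)  # stand-in for application writes arriving mid-copy
        lo = conn.execute(f"SELECT MIN({pk}) FROM {table} WHERE {pk} > ?",
                          (hi,)).fetchone()[0]
    # 5. Drop triggers and swap names in one transaction, then drop the original.
    conn.execute("BEGIN")
    for trig in ("osc_ins", "osc_upd", "osc_del"):
        conn.execute(f"DROP TRIGGER {trig}")
    conn.execute(f"ALTER TABLE {table} RENAME TO {old}")
    conn.execute(f"ALTER TABLE {new} RENAME TO {table}")
    conn.execute("COMMIT")
    conn.execute(f"DROP TABLE {old}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    # ids 1, 4, 7, ... leave gaps in the key.
    conn.executemany("INSERT INTO items VALUES (?, ?)",
                     [(i, f"item{i}") for i in range(1, 3000, 3)])
    online_alter(conn, "items", "ADD COLUMN note TEXT DEFAULT ''", chunk_size=250)
    print(conn.execute("PRAGMA table_info(items)").fetchall())
```

The `on_chunk` hook simulates writes that land during the copy: the triggers capture them, and `INSERT OR IGNORE` stops a later chunk from overwriting a newer row the triggers already wrote.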
pt-online-schema-change
Percona Toolkit is your friend.
Mature product
Pay attention to the version you are using!!!
FK and log_bin issues with 2.0
Run it in screen, along with tee and time
It's all ball bearings nowadays
--progress time,10 (default 30 seconds)
--max-lag
--recursion-method (how to find slaves)
o processlist (SHOW PROCESSLIST)
o hosts (SHOW SLAVE HOSTS)
o dsn table (great for ignoring some slaves)
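A small helper that assembles the invocation keeps these options from being forgotten between runs. It only builds the argument list; the flag spellings follow recent Percona Toolkit documentation, so check `pt-online-schema-change --help` for the version you actually run. The database, table, host and DSN-table names in the demo are hypothetical.

```python
import shlex
from typing import List, Optional


def build_pt_osc_cmd(database: str, table: str, alter: str, *,
                     host: str = "localhost",
                     execute: bool = False,
                     progress_secs: int = 10,
                     max_lag_secs: int = 1,
                     recursion_method: str = "processlist",
                     alter_fk_method: Optional[str] = None) -> List[str]:
    """Assemble a pt-online-schema-change argv; a dry run unless execute=True."""
    cmd = [
        "pt-online-schema-change",
        f"--alter={alter}",                        # clause only, no "ALTER TABLE t"
        f"--progress=time,{progress_secs}",        # report every N seconds
        f"--max-lag={max_lag_secs}",               # pause while a slave lags more than this
        f"--recursion-method={recursion_method}",  # processlist, hosts, or dsn=<DSN>
    ]
    if alter_fk_method:
        cmd.append(f"--alter-foreign-keys-method={alter_fk_method}")
    cmd.append("--execute" if execute else "--dry-run")
    cmd.append(f"h={host},D={database},t={table}")  # DSN of the table to alter
    return cmd


if __name__ == "__main__":
    cmd = build_pt_osc_cmd("tickets", "ticket", "ADD COLUMN status_id INT",
                           recursion_method="dsn=h=db1,D=percona,t=dsns")
    # Paste into a screen session, prefixed with `time` and piped through `tee`.
    print(shlex.join(cmd))
```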
What could possibly go wrong?
• PK/UK required
• FK names will change on the altered table
• FKs on other tables reference the table to alter
--alter-foreign-keys-method drop_swap
o sets foreign_key_checks=0
o drops the original table (hardlink!!!)
o renames the new table
What could possibly go wrong?
• Table drops on non-XFS filesystems (create the hardlink first!)
o Can even cause InnoDB to crash mysqld
• PKs with gaps
• Largest table to alter vs. free disk space
• Disk space (2x for RBR: the copied rows also go to the binary log)
• Global mutexes (table drops)
• Table metadata locks (triggers)
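The disk-space items can be turned into a pre-flight check. This sketch assumes file-per-table `.ibd` files, binary logs on the same filesystem as the datadir, and the rule of thumb above that RBR roughly doubles the space needed (one full copy plus about as much again in row events); the 10% headroom is an arbitrary margin.

```python
import os
import shutil


def required_bytes(table_bytes: int, row_based_binlog: bool, headroom_pct: int = 10) -> int:
    """Rule-of-thumb space to rebuild one table (all factors are assumptions)."""
    factor = 2 if row_based_binlog else 1  # copy, plus binlog row events under RBR
    return table_bytes * factor * (100 + headroom_pct) // 100


def largest_ibd(datadir: str) -> str:
    """Path of the biggest .ibd file under datadir, or "" if there is none."""
    best, best_size = "", -1
    for root, _dirs, files in os.walk(datadir):
        for name in files:
            if name.endswith(".ibd"):
                path = os.path.join(root, name)
                size = os.path.getsize(path)
                if size > best_size:
                    best, best_size = path, size
    return best


def can_alter(ibd_path: str, datadir: str, row_based_binlog: bool) -> bool:
    """Compare the estimate with free space on the datadir's filesystem."""
    need = required_bytes(os.path.getsize(ibd_path), row_based_binlog)
    return shutil.disk_usage(datadir).free >= need
```

Checking `largest_ibd` first answers the "largest table vs free disk space" question before a long-running copy fills the disk partway through.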
NOOOOOO!!
• Running pt-osc against a slave with RBR
o Replication started erroring after pt-osc. Why???
• And how about syncing a table using pt-osc with RBR?
Set it and forget it???
• watch
o w
o df -h
o ls -alh on the .ibd files
o mysql -e "show processlist;" | egrep -v "Sleep|repl"
o slave lag
• How is it affecting the application?
• --max-lag, --max-load
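The processlist and slave-lag checks can be scripted so the interesting rows float to the top. This sketch mirrors the `egrep -v "Sleep|repl"` filter (assuming, as that filter suggests, that `repl` is the replication user's name) and sorts what remains by running time, which surfaces metadata-lock waits quickly. It relies on `mysql` printing tab-separated rows when its output is piped; the parsing helpers are illustrative, not part of any tool.

```python
import re
import subprocess
from typing import List, Optional


def poll_processlist() -> str:
    """Run the slide's query; mysql prints tab-separated rows when piped."""
    return subprocess.run(["mysql", "-e", "show processlist;"],
                          capture_output=True, text=True, check=True).stdout


def active_threads(processlist_tsv: str, ignore: str = r"Sleep|repl") -> List[List[str]]:
    """Drop rows matching `ignore` (like egrep -v); sort the rest longest-running first."""
    lines = processlist_tsv.strip("\n").splitlines()
    time_col = lines[0].split("\t").index("Time")  # first line is the column header
    rows = [line.split("\t") for line in lines[1:] if not re.search(ignore, line)]
    return sorted(rows, key=lambda r: int(r[time_col]), reverse=True)


def seconds_behind_master(slave_status: str) -> Optional[int]:
    """Parse `SHOW SLAVE STATUS\\G` text; None means NULL (replication not running)."""
    m = re.search(r"Seconds_Behind_Master:\s*(\S+)", slave_status)
    if m is None or m.group(1) == "NULL":
        return None
    return int(m.group(1))
```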
Features we like
• Throttling
• Nibbling
• Reporting
• Replicating
• Reorgs
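Throttling and nibbling work together: the copy runs in chunks whose size adapts to how long each chunk takes, and copying pauses while slaves lag or the server is busy. This sketch illustrates the idea only. pt-online-schema-change's own chunk-size logic (its `--chunk-time` option) differs in detail, and the smoothing factor, limits and thresholds here are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Nibbler:
    """Adapt chunk size so each chunk takes about target_secs (assumed values)."""
    target_secs: float = 0.5
    chunk_rows: int = 1000
    min_rows: int = 100
    max_rows: int = 100_000
    rate: float = 0.0  # smoothed rows per second; 0 until the first chunk

    def record(self, rows: int, secs: float) -> int:
        """Feed back one chunk's row count and duration; return the next chunk size."""
        observed = rows / max(secs, 1e-6)
        self.rate = observed if self.rate == 0 else 0.75 * self.rate + 0.25 * observed
        wanted = int(self.rate * self.target_secs)
        self.chunk_rows = max(self.min_rows, min(self.max_rows, wanted))
        return self.chunk_rows


def should_pause(lag_secs: Optional[int], max_lag: int,
                 threads_running: int, max_threads: int) -> bool:
    """Throttle while a slave lags (or reports no lag) or the server is busy."""
    return lag_secs is None or lag_secs > max_lag or threads_running > max_threads
```

Smoothing the observed rate keeps one slow chunk (a cold buffer pool, a burst of application traffic) from collapsing the chunk size, while the clamp bounds how much a single chunk can lock or replicate at once.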
InnoDB Online DDL (IOD)
• Looking back: 5.1 and 5.5
o Fast index creation
• Enhancements
o Online DDL
o In-place
Silver Bullets
http://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-overview.html#innodb-online-ddl-summary-grid
• Drop and add an index in one statement
• Temporary table index creation
Online ALTER lock modes (LOCK=...)
• Exclusive
• Shared
• Default
• None
* Note
Online ALTER algorithms (ALGORITHM=...)
• In-place
• Copy
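In MySQL 5.6 the lock mode and algorithm are requested as clauses of ALTER TABLE, and several changes can be grouped into one statement so the table is processed once. If the server cannot honor the requested ALGORITHM/LOCK combination for an operation, the statement fails instead of silently falling back, which makes an explicit `ALGORITHM=INPLACE, LOCK=NONE` a cheap safety check. The helper below only builds the statement text; the table and index names are hypothetical.

```python
from typing import Sequence

ALGORITHMS = {"DEFAULT", "INPLACE", "COPY"}
LOCKS = {"DEFAULT", "NONE", "SHARED", "EXCLUSIVE"}


def online_alter_sql(table: str, clauses: Sequence[str],
                     algorithm: str = "INPLACE", lock: str = "NONE") -> str:
    """Group several changes into one ALTER TABLE and state the algorithm and lock level."""
    algorithm, lock = algorithm.upper(), lock.upper()
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown ALGORITHM: {algorithm}")
    if lock not in LOCKS:
        raise ValueError(f"unknown LOCK: {lock}")
    if not clauses:
        raise ValueError("nothing to alter")
    return f"ALTER TABLE {table} {', '.join(clauses)}, ALGORITHM={algorithm}, LOCK={lock};"


if __name__ == "__main__":
    print(online_alter_sql("ticket", ["DROP INDEX idx_status",
                                      "ADD INDEX idx_status_created (status, created_at)"]))
```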
Grouping DDL
• Copy
• Sequence
• Lock
IOD Behavior
• Metadata
• Data volume
• Type of index
• Foreign keys
• Partitioning
o Add partition, drop partition
o Truncate partition
o Add partition and coalesce partition
• Algorithm
• Auto-increment
• Locking
IOD Errors
• Locking
• Timeouts
• Tmpdir
• DML
IOD Crash Recovery
• Secondary Indexes
• Clustered Indexes (don't crash!)
Of Interest
• Tmpdir
• Dropping Indexes with Foreign keys
• Cascade on
• Inconsistent .frm
Monitoring
• Rows Affected
• Performance_schema and Information_schema
• Progress
• Replication lag
IOD server parameters
• innodb_online_alter_log_max_size
• sort_buffer_size
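`innodb_online_alter_log_max_size` caps the temporary log InnoDB keeps of DML that arrives while an online ALTER runs (one log per index being created or table being rebuilt). If a log outgrows it, the ALTER fails and uncommitted concurrent DML is rolled back. The 5.6 default is 128 MB. A back-of-the-envelope check, assuming each logged change costs about one average row and using an arbitrary 50% safety margin:

```python
DEFAULT_LOG_MAX = 128 * 1024 * 1024  # MySQL 5.6 default for innodb_online_alter_log_max_size


def estimated_log_bytes(changes_per_sec: float, avg_row_bytes: int, alter_secs: float) -> int:
    """Rough size of one online-DDL log: concurrent changes times row size (assumed)."""
    return int(changes_per_sec * avg_row_bytes * alter_secs)


def log_fits(changes_per_sec: float, avg_row_bytes: int, alter_secs: float,
             log_max: int = DEFAULT_LOG_MAX, margin: float = 0.5) -> bool:
    """True if the estimate stays under the configured limit with room to spare."""
    return estimated_log_bytes(changes_per_sec, avg_row_bytes, alter_secs) <= log_max * margin
```

If the estimate does not fit, the options are a larger limit for the duration of the change, a quieter window, or a trigger-based tool instead of IOD.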
Case Study
• Ticketing system
Architecture
• 1 master
• 1 active slave
• 1 analytics/indexing slave
• 1 backup/reorg slave
Issues
• Growth
• Rebalancing Shards
• Deletions
• Reorgs
Timeline
• Difficult failovers
• Ticket Growth
• FB OSC
• PT OSC
• Reorgs