monitoring mysql replication lag with prometheus & pt-heartbeat
![Page 1: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/1.jpg)
Monitoring MySQL Replication Delay with mysqld_exporter & pt-heartbeat
Julien Pivotto (@roidelapluie)
PromConf Munich
August 18, 2017
![Page 2: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/2.jpg)
SELECT USER();
Julien "roidelapluie" Pivotto
@roidelapluie
Sysadmin at inuits
Automation, monitoring, HA
MySQL/MariaDB user/admin/contributor
Grafana and Prometheus user/contributor
![Page 3: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/3.jpg)
inuits
![Page 4: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/4.jpg)
MySQL Replication
MySQL Master <-> MySQL Master
MySQL Master -> MySQL Slave
MySQL Master -> MySQL Slave -> MySQL Slave
MySQL Masters -> MySQL Slaves -> MySQL Slaves -> MySQL Slaves
MySQL Master -> MySQL Slaves
![Page 5: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/5.jpg)
mysqld_exporter
![Page 6: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/6.jpg)
mysqld_exporter
![Page 7: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/7.jpg)
mysqld_exporter is great
Lots of data
Lots of alert examples
Percona's Grafana dashboards project provides dozens of useful dashboards
![Page 8: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/8.jpg)
Migrating to Prometheus does not mean that we should forget the past ... or lower our monitoring expectations.
![Page 9: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/9.jpg)
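pt-heartbeat
pt-heartbeat is a daemon that updates an entry with the current timestamp on a MySQL server every second.
On the replica, you can read that timestamp and compute NOW - timestamp to get the real lag.
+----------------------------+-----------+
| ts                         | server_id |
+----------------------------+-----------+
| 2017-08-17T16:55:01.001030 |         1 |
+----------------------------+-----------+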
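The subtraction done on the replica can be sketched in Python. This is only an illustration: the function name `heartbeat_lag_seconds` is hypothetical, the `ts` value is assumed to be formatted as `YYYY-MM-DDTHH:MM:SS.ffffff`, and master and replica clocks are assumed to be synchronized and in the same timezone.

```python
from datetime import datetime

# Assumed format of the ts column written by the heartbeat daemon.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def heartbeat_lag_seconds(stored_ts: str, now: datetime) -> float:
    """Return NOW - ts, in seconds, for a single heartbeat row."""
    stored = datetime.strptime(stored_ts, TS_FORMAT)
    return (now - stored).total_seconds()


if __name__ == "__main__":
    # "now" as seen on the replica, 2.5 seconds after the stored heartbeat.
    replica_now = datetime(2017, 8, 17, 16, 55, 3, 501030)
    print(heartbeat_lag_seconds("2017-08-17T16:55:01.001030", replica_now))
```

Because the heartbeat row itself travels through replication, this difference measures how old the newest replicated write is, which is what "real lag" means here.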
![Page 10: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/10.jpg)
pt-heartbeat
GPL
Perl
Part of Percona Toolkit
![Page 11: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/11.jpg)
pt-heartbeat
Our previous monitoring tool (Munin) had support for pt-heartbeat. Prometheus mysqld_exporter didn't.
![Page 12: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/12.jpg)
wait, mysql has that natively
mysql> SHOW SLAVE STATUS\G
...
Seconds_Behind_Master: 0
...
aka mysqld_exporter metric:
mysql_slave_lag_seconds
![Page 13: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/13.jpg)
![Page 14: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/14.jpg)
Bugs
Fixes for Seconds_Behind_Master in: 5.7.18, 5.6.36, 5.6.23, 5.6.16.
![Page 15: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/15.jpg)
pt-heartbeat is useful
Okay, so we had that thing. Now that we are moving to Prometheus, we don't want to lose it.
Idea: let's implement this!
![Page 16: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/16.jpg)
Pull Request 183
https://github.com/prometheus/mysqld_exporter/pull/183
Opened Feb 20
Merged Feb 21
![Page 17: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/17.jpg)
How it works
Checks the heartbeat table (SQL query). It does not call the pt-heartbeat CLI, so it is independent of it.
![Page 18: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/18.jpg)
CLI flags
collect.heartbeat
collect.heartbeat.database
collect.heartbeat.table
![Page 19: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/19.jpg)
Metrics
mysql_heartbeat_stored_timestamp_seconds{server_id="1"}
mysql_heartbeat_now_timestamp_seconds{server_id="1"}
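To make the shape of these two gauges concrete, here is a minimal Python sketch that turns heartbeat rows into Prometheus text exposition lines. It is not the exporter's actual code (mysqld_exporter is written in Go and reads the rows with an SQL query). The function name `render_heartbeat_metrics` and the row layout are assumptions for illustration, and rows are passed in directly so the sketch runs without a database.

```python
from typing import Iterable, Tuple

# Hypothetical row layout: (stored heartbeat, replica "now", server_id),
# with both timestamps expressed as Unix seconds.
Row = Tuple[float, float, int]


def render_heartbeat_metrics(rows: Iterable[Row]) -> str:
    """Render one pair of gauge samples per heartbeat row."""
    lines = []
    for stored, now, server_id in rows:
        labels = f'{{server_id="{server_id}"}}'
        lines.append(
            f"mysql_heartbeat_stored_timestamp_seconds{labels} {stored:.6f}"
        )
        lines.append(
            f"mysql_heartbeat_now_timestamp_seconds{labels} {now:.6f}"
        )
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # One master (server_id 1) whose heartbeat is 2.5 seconds old.
    print(render_heartbeat_metrics([(100.0, 102.5, 1)]), end="")
```

Exposing both timestamps rather than a precomputed lag keeps the collector trivial and leaves the subtraction to a Prometheus rule.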
![Page 20: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/20.jpg)
Recording Lag
mysql_heartbeat_lag_seconds = mysql_heartbeat_now_timestamp_seconds - mysql_heartbeat_stored_timestamp_seconds
https://github.com/prometheus/mysqld_exporter/blob/master/example.rules
![Page 21: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/21.jpg)
Alert
ALERT MySQLReplicationLag
  IF (mysql_heartbeat_lag_seconds > 30) AND on (instance) (predict_linear(mysql_heartbeat_lag_seconds[5m], 60*2) > 0)
  FOR 1m
  LABELS { severity = "critical" }
  ANNOTATIONS {
    summary = "MySQL slave replication is lagging",
    description = "The mysql slave replication has fallen behind and is not recovering",
  }
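The alert condition combines a threshold with a trend check: the lag must exceed 30 seconds, and a linear extrapolation of the last 5 minutes must still be above zero 2 minutes from now. The Python sketch below illustrates that logic with an ordinary least-squares fit. It approximates what `predict_linear` does and is not Prometheus's implementation; the helper names are hypothetical, and the `FOR 1m` clause is deliberately left out.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (Unix timestamp in seconds, lag in seconds)


def predict_linear(samples: List[Sample], eval_time: float, horizon: float) -> float:
    """Fit a least-squares line and evaluate it at eval_time + horizon."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Shift timestamps so that x = 0 corresponds to the evaluation time.
    xs = [t - eval_time for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("samples must have distinct timestamps")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept + slope * horizon


def should_alert(samples: List[Sample], eval_time: float,
                 threshold: float = 30.0, horizon: float = 120.0) -> bool:
    """Lag above threshold now AND still predicted positive after horizon."""
    current_lag = samples[-1][1]
    return current_lag > threshold and predict_linear(samples, eval_time, horizon) > 0


if __name__ == "__main__":
    # Six samples over 5 minutes, one per minute.
    growing = [(t, 10 + 0.2 * t) for t in range(0, 301, 60)]
    recovering = [(t, 160 - 0.4 * t) for t in range(0, 301, 60)]
    print(should_alert(growing, eval_time=300))
    print(should_alert(recovering, eval_time=300))
```

The trend check is what keeps the alert quiet for a replica that is above the threshold but catching up fast enough to reach zero lag within the prediction window.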
https://github.com/prometheus/mysqld_exporter/blob/master/example.rules
![Page 22: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/22.jpg)
Contributing to Percona Grafana Dashboards
less great
PR opened Feb 23
Still open
![Page 23: Monitoring MySQL Replication lag with Prometheus & pt-heartbeat](https://reader030.vdocuments.site/reader030/viewer/2022012304/5a6d40b97f8b9ac2418b59b1/html5/thumbnails/23.jpg)
Takeaways
contributing to prometheus is easy
pt-heartbeat is the way to monitor mysql replication lag
and now it's available in prometheus
any volunteers to rewrite pt-heartbeat in go? :)