I recently had a case where replication lag on a slave was caused by a backup script. My first reaction was to blame the additional pressure on the disks, but the cause turned out to be more subtle: Percona XtraBackup was not able to execute FLUSH TABLES WITH READ LOCK because of a long-running query, and the server ended up being effectively read-only, since every new write had to queue behind the pending lock request. Let’s see how we can deal with that kind of situation.
Starting with Percona XtraBackup 2.1.4, you can:
- Configure a timeout after which the backup will be aborted (and the global lock released) with the
- Or automatically kill all queries that prevent the lock from being granted with the
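
To make the two strategies more concrete, here is a minimal sketch that builds an `innobackupex` command line for each of them. The option names (`--lock-wait-timeout`, `--lock-wait-threshold`, `--lock-wait-query-type`, `--kill-long-queries-timeout`, `--kill-long-query-type`) are taken from my recollection of the innobackupex option reference for the 2.1 series, not from the text above, so please check them against the documentation for your version. The helper name `build_backup_command`, the backup directory, and the numeric values are all hypothetical.

```python
import shlex


def build_backup_command(target_dir, mode, wait_seconds, long_query_seconds=60):
    """Return an innobackupex argument list for one of the two strategies.

    mode="abort": wait up to `wait_seconds` for long-running queries to
                  finish before taking the global lock, and give up on the
                  backup if they are still running after that.
    mode="kill":  take the global lock, then after `wait_seconds` kill the
                  queries that still prevent it from being granted.

    Option names are assumed from the innobackupex 2.1 documentation.
    """
    cmd = ["innobackupex"]
    if mode == "abort":
        cmd += [
            f"--lock-wait-timeout={wait_seconds}",
            # Queries running longer than this are considered blocking.
            f"--lock-wait-threshold={long_query_seconds}",
            "--lock-wait-query-type=all",
        ]
    elif mode == "kill":
        cmd += [
            f"--kill-long-queries-timeout={wait_seconds}",
            # Only kill SELECTs; killing writes may be costly to roll back.
            "--kill-long-query-type=select",
        ]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    cmd.append(target_dir)  # hypothetical backup destination
    return cmd


if __name__ == "__main__":
    # Print both variants instead of running them, so the sketch is safe
    # to execute on a machine without Percona XtraBackup installed.
    for mode in ("abort", "kill"):
        print(shlex.join(build_backup_command("/data/backups", mode, 120)))
```

The sketch only prints the commands. In a real backup script you would pass the returned list to `subprocess.run` and check the exit status, since in the "abort" case a non-zero status would mean that no backup was taken.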