pt-online-schema-change (if misused) can’t save the day

In this blog post, we’ll discuss pt-online-schema-change and how to use it correctly.

Always use pt-osc?

Altering large tables can still be a problematic DBA task, even with the improved Online DDL features in MySQL 5.6 and 5.7. Some ALTER types are still not online, or are simply too expensive to execute on a busy production master.

So in some cases, we may want to apply an ALTER first on the slaves, taking them out of the traffic pool one by one and bringing them back after the ALTER is done. In the end, we can promote one of the already-altered slaves to be the new master, so that downtime/maintenance time is greatly minimized. The ex-master can be altered later, without affecting production. Of course, this method works best when the schema change is backwards-compatible.
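
As a rough sketch of that per-slave routine (using the same table and hosts as in the example below; how you drain traffic from the slave depends entirely on your load balancer or proxy, so that step is only hinted at in the comments):

-- On the slave (db2), after taking it out of the traffic pool:
db2 {root} (test) > ALTER TABLE db1.sbtest1 ENGINE=InnoDB;

-- When the ALTER finishes, let replication catch up (watch Seconds_Behind_Master)
-- before putting the slave back into rotation:
db2 {root} (test) > SHOW SLAVE STATUS\G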

So far so good, but there is another problem. Let’s say the table is huge, and the ALTER takes a lot of time on the slave. If it is a DML-blocking type of ALTER (perhaps because you are using MySQL 5.5.x or older, etc.), the slave will lag badly while the statement runs, since the replication SQL thread cannot apply writes to the table at the same time. So what do we do to speed up the process and avoid the lag on the altered slave? One tempting idea is: why not use pt-online-schema-change on the slave, which can do the ALTER in a non-blocking fashion?

Let’s see how that would work. I need to rebuild a big table on a slave running MySQL 5.6.16 (the “null” ALTER was only made online in 5.6.17) to reclaim disk space after some rows were deleted.

This example demonstrates the process (db1 is the master, db2 is the slave):

[root@db2 ~]# pt-online-schema-change --execute --alter "engine=innodb" D=db1,t=sbtest1
 No slaves found.  See --recursion-method if host db2 has slaves.
 Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
 Operation, tries, wait:
 analyze_table, 10, 1
 copy_rows, 10, 0.25
 create_triggers, 10, 1
 drop_triggers, 10, 1
 swap_tables, 10, 1
 update_foreign_keys, 10, 1
 Altering `db1`.`sbtest1`...
 Creating new table...
 Created new table db1._sbtest1_new OK.
 Altering new table...
 Altered `db1`.`_sbtest1_new` OK.
 2016-05-16T10:50:50 Creating triggers...
 2016-05-16T10:50:50 Created triggers OK.
 2016-05-16T10:50:50 Copying approximately 591840 rows...
 Copying `db1`.`sbtest1`:  51% 00:28 remain
(...)

While the tool is working, the table receives some writes on the master:

db1 {root} (db1) > update db1.sbtest1 set k=k+2 where id<100;
 Query OK, 99 rows affected (0.06 sec)
 Rows matched: 99  Changed: 99  Warnings: 0
db1 {root} (db1) > update db1.sbtest1 set k=k+2 where id<100;
 Query OK, 99 rows affected (0.05 sec)
 Rows matched: 99  Changed: 99  Warnings: 0

These are applied on the slave right away, as the table accepts writes the whole time.

(...)
 Copying `db1`.`sbtest1`:  97% 00:01 remain
 2016-05-16T10:51:53 Copied rows OK.
 2016-05-16T10:51:53 Analyzing new table...
 2016-05-16T10:51:53 Swapping tables...
 2016-05-16T10:51:53 Swapped original and new tables OK.
 2016-05-16T10:51:53 Dropping old table...
 2016-05-16T10:51:53 Dropped old table `db1`.`_sbtest1_old` OK.
 2016-05-16T10:51:53 Dropping triggers...
 2016-05-16T10:51:53 Dropped triggers OK.
 Successfully altered `db1`.`sbtest1`.

Done! No slave lag, and the table is rebuilt. But . . . let’s just make sure the data is consistent between the master and the slave (you can use pt-table-checksum for that):

db1 {root} (db1) > select max(k) from db1.sbtest1 where id<100;
 +--------+
 | max(k) |
 +--------+
 | 392590 |
 +--------+
 1 row in set (0.00 sec)
db2 {root} (test) > select max(k) from db1.sbtest1 where id<100;
 +--------+
 | max(k) |
 +--------+
 | 392586 |
 +--------+
 1 row in set (0.00 sec)
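
The spot check above is enough to expose the problem; for a more systematic comparison you could run pt-table-checksum from the master, along these lines (connection options omitted, and the --databases/--tables filters simply narrow the check to our table):

[root@db1 ~]# pt-table-checksum --databases db1 --tables sbtest1

A non-zero value in the DIFFS column of its per-table report points at the inconsistent table.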

No, it is not! The slave is clearly missing the updates that happened during the pt-osc run. Why?

The explanation is simple: pt-online-schema-change relies on triggers. The triggers make the writes happening to the original table also propagate to the new table copy, so that both tables are consistent when the final table switch happens at the end of the process. So what is the problem here? It’s the binary log format: with ROW-based replication, triggers are not fired on the slave! And my master is running in ROW mode:

db1 {root} (db1) > show variables like 'binlog_format';
 +---------------+-------+
 | Variable_name | Value |
 +---------------+-------+
 | binlog_format | ROW   |
 +---------------+-------+
 1 row in set (0.01 sec)
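
To make the mechanism concrete, here is a heavily simplified sketch of the kind of trigger pt-osc places on the original table (the name, column list and body are illustrative only, not what the tool literally generates):

-- Illustration only: mirror updates on the original table into the copy table
CREATE TRIGGER pt_osc_db1_sbtest1_upd AFTER UPDATE ON db1.sbtest1
FOR EACH ROW
  REPLACE INTO db1._sbtest1_new (id, k, c, pad)
  VALUES (NEW.id, NEW.k, NEW.c, NEW.pad);

With statement-based replication, the replicated UPDATE statements would execute on the slave and fire this trigger; with row-based events they do not, so `db1`.`_sbtest1_new` on the slave never receives those changes.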

So, had I used pt-online-schema-change on the master instead, this data inconsistency would not have happened. Using it on a slave, though, is just dangerous!

Conclusion

Whenever you use pt-online-schema-change, make sure you are not executing it on a slave instance. For that reason, I escalated this bug report: https://bugs.launchpad.net/percona-toolkit/+bug/1221372. Also, in many cases a normal ALTER will work well enough. As in my example: to rebuild the table separately on each slave without blocking writes, I would just need to upgrade to a more recent 5.6 release.
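
For reference, on 5.6.17 and later that rebuild is a plain statement, and you can even ask the server to refuse it if it cannot be done without blocking writes (ALGORITHM and LOCK are standard MySQL Online DDL modifiers):

db2 {root} (test) > ALTER TABLE db1.sbtest1 ENGINE=InnoDB, ALGORITHM=INPLACE, LOCK=NONE;

If the server cannot satisfy LOCK=NONE, the statement fails immediately instead of silently blocking DML (and replication) on the slave.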

BTW, if you’re wondering about Galera replication (as used in Percona XtraDB Cluster, etc.): even though it also uses a ROW-based format, it is not affected by this problem. The pt-osc triggers are created on all nodes, thanks to the synchronous, write-anywhere nature of Galera replication. It does not matter which node you start pt-online-schema-change on, nor which other nodes your applications write to at the same time. No slaves, no problem!