I recently worked on a case where one node of a Galera cluster had its schema out of sync with the other nodes, even though the Total Order Isolation method was in effect for schema changes. Let’s see what happened.
Background
For those of you who are not familiar with how Galera can perform schema changes, here is a short recap:
- Two methods are available, depending on the value of the wsrep_OSU_method setting. Both have benefits and drawbacks, but comparing them is not the main topic of this post.
- With TOI (Total Order Isolation), a DDL statement is executed at the same point in the replication flow on all nodes, which gives strong guarantees that the schema is always identical on all nodes.
- With RSU (Rolling Schema Upgrade), a DDL statement is not replicated to the other nodes. Until the DDL statement has been executed on all nodes, the schema is not consistent everywhere (so …
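To make the difference between the two methods concrete, here is a toy model in Python. It is not Galera code and does not reflect how Galera is implemented. It assumes each node's schema can be represented as a dict mapping a table name to a set of column names, and that a DDL statement is a simple "add column". The names `Node`, `Cluster`, `ddl_toi`, `ddl_rsu` and `schemas_consistent` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Hypothetical schema representation: table name -> set of column names.
    schema: dict = field(default_factory=lambda: {"t": {"id"}})

    def apply_ddl(self, table: str, column: str) -> None:
        # Models "ALTER TABLE <table> ADD COLUMN <column>" on this node only.
        self.schema.setdefault(table, set()).add(column)


class Cluster:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.seqno = 0  # modeled position in the total order of replicated events

    def ddl_toi(self, table: str, column: str) -> int:
        # TOI (modeled): the DDL is one replicated event, so every node
        # applies it at the same sequence number.
        self.seqno += 1
        for node in self.nodes:
            node.apply_ddl(table, column)
        return self.seqno

    def ddl_rsu(self, node_name: str, table: str, column: str) -> None:
        # RSU (modeled): the DDL runs only on the node where it is issued
        # and is not replicated; it must be repeated on each node.
        for node in self.nodes:
            if node.name == node_name:
                node.apply_ddl(table, column)

    def schemas_consistent(self) -> bool:
        first = self.nodes[0].schema
        return all(node.schema == first for node in self.nodes)


cluster = Cluster(["node1", "node2", "node3"])
cluster.ddl_toi("t", "a")
print(cluster.schemas_consistent())  # True

cluster.ddl_rsu("node1", "t", "b")
print(cluster.schemas_consistent())  # False: only node1 has column b

cluster.ddl_rsu("node2", "t", "b")
cluster.ddl_rsu("node3", "t", "b")
print(cluster.schemas_consistent())  # True once the DDL has run on every node
```

In this model, the TOI path can never leave nodes with different schemas, while the RSU path is inconsistent until the statement has been repeated on every node, which is the trade-off described in the list above.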