We had a Galera Cluster support case recently. The customer was drenched in tears because his Galera Cluster did not work any more and he could not make it work any more.
Upsss! What has happened?
A bit of the background of this case: The customer wanted to do a rolling-restart of the Galera Cluster under load because of an Operating System upgrade which requires a reboot of the system.
Lets have a look at the MySQL error log to see what was going on. Customer restarted server with NodeC:
12:20:42 NodeC: normal shutdown --> Group 2/2 12:20:46 NodeC: shutdown complete 12:22:09 NodeC: started 12:22:15 NodeC: start replication 12:22:16 NodeC: CLOSED -> OPEN 12:22:16 all : Group 2/3 component all PRIMARY 12:22:17 NodeC: Gap in state sequence. Need state transfer. 12:22:18 all : Node 1 (NodeC) requested state transfer from '*any*'. Selected 0 (NodeB)(SYNCED) as donor. 12:22:18 NodeB: Shifting SYNCED -> DONOR/DESYNCED (TO: …[Read more]