In the previous release of our Percona Distribution for MySQL Operator, we implemented an interesting feature that can be seen as “self-healing”: https://jira.percona.com/browse/K8SPXC-564.
I do not think it got enough attention, so I want to write more about it.
As is well known, a 3-node cluster can survive the crash of one node (or pod, in Kubernetes terminology), and the cluster handles this case well on its own. However, if 2 nodes fail at the same time, the scenario is problematic for Percona XtraDB Cluster. Let’s see why this is a problem.
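The arithmetic behind the 3-node claim can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every node carries equal weight, all failures are simultaneous crashes, and a partition keeps working only if it holds a strict majority of the last known membership. The function names `has_quorum` and `max_tolerated_failures` are hypothetical and are not part of the Operator or of Percona XtraDB Cluster.

```python
def has_quorum(reachable: int, total: int) -> bool:
    """Return True if `reachable` nodes are a strict majority of `total`.

    Assumption: equal node weights, simple majority rule.
    """
    return reachable * 2 > total


def max_tolerated_failures(total: int) -> int:
    """Largest number of simultaneous crashes that still leaves a majority."""
    return (total - 1) // 2


if __name__ == "__main__":
    total_nodes = 3
    for crashed in range(total_nodes + 1):
        survivors = total_nodes - crashed
        state = "keeps quorum" if has_quorum(survivors, total_nodes) else "loses quorum"
        print(f"{crashed} node(s) down, {survivors} left: {state}")
```

With three nodes, two survivors still form a majority, while a single survivor does not, which is why the simultaneous loss of two nodes is the problematic case.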
First, let’s review what happens if the first node goes offline:
In this case, the cluster can continue to work, because Node 1 and Node 2 figure out …