How To Speed Up Re-sync of Dropped Percona XtraDB Cluster Node

The Problem

HELP, HELP! My Percona XtraDB Cluster (version 5.7.31-31) has a single node stuck in the Joined state.

I recently had the privilege to help a client with a fascinating issue.

NODE-B dropped out of the 3-node PXC cluster. Disk I/O appeared to be the cause: NODE-B fell far behind and was eventually removed from the cluster. A restart of NODE-B allowed it to rejoin the cluster. NODE-B appeared to have been down for about 4 hours. Once it was back in the cluster, it required a full SST.

When NODE-B had stayed in the Joined state for more than 12 hours, the client gave me a call. They were concerned that there was another issue with this cluster.
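To make the "stuck in Joined" symptom concrete, here is a minimal sketch of how one might interpret a node's wsrep status. It assumes the status was dumped as tab-separated text, for example with `mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_local%'"`. The helper names `parse_status` and `describe_node` and the sample values are hypothetical, not taken from the client's cluster.

```python
# Galera node states as reported by the wsrep_local_state status variable.
WSREP_STATES = {1: "Joining", 2: "Donor/Desynced", 3: "Joined", 4: "Synced"}


def parse_status(text):
    """Parse 'name<TAB>value' lines into a dict of strings."""
    status = {}
    for line in text.splitlines():
        parts = line.split("\t", 1)
        if len(parts) == 2:
            status[parts[0].strip()] = parts[1].strip()
    return status


def describe_node(status):
    """Return a one-line summary of the node's sync state."""
    state = int(status["wsrep_local_state"])
    name = WSREP_STATES.get(state, "Unknown")
    # wsrep_local_recv_queue is the number of write-sets waiting to be applied.
    queue = int(status.get("wsrep_local_recv_queue", 0))
    if state == 4:
        return f"{name}: node is in sync"
    return f"{name}: not synced yet, {queue} write-sets waiting in the receive queue"


if __name__ == "__main__":
    # Hypothetical sample values for illustration only.
    sample = (
        "wsrep_local_state\t3\n"
        "wsrep_local_state_comment\tJoined\n"
        "wsrep_local_recv_queue\t1520\n"
    )
    print(describe_node(parse_status(sample)))
```

A node that reports Joined has received its state transfer and is still applying the write-sets it missed, so watching whether the receive queue shrinks over time is one way to tell slow catch-up apart from a genuinely stalled node.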

Before going further, let’s make sure we know the CPU, RAM, and database size.

Database Size approx. 2.75TB
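The database size matters because a full SST has to copy the whole dataset to the joining node. As a rough back-of-the-envelope sketch, the raw copy time can be estimated from the size and a sustained throughput. The throughput figures below are illustrative assumptions, not measurements from this cluster, and the function name `sst_copy_hours` is hypothetical. The estimate also ignores any backup prepare/apply phase and the catch-up that follows the transfer.

```python
def sst_copy_hours(size_tb, throughput_mb_s):
    """Estimate hours needed to copy size_tb terabytes at throughput_mb_s MB/s.

    Uses decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
    """
    size_bytes = size_tb * 1000**4
    seconds = size_bytes / (throughput_mb_s * 1000**2)
    return seconds / 3600


if __name__ == "__main__":
    # Assumed sustained throughputs in MB/s, for illustration only.
    for rate in (50, 100, 200):
        print(f"{rate} MB/s -> {sst_copy_hours(2.75, rate):.2f} hours")
```

Under these assumptions, 2.75 TB at a sustained 100 MB/s takes a little under 8 hours for the copy alone, so a multi-hour resync is not by itself a sign of a second fault.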

Let’s gather some base information.

I pulled the below data once I …

Deploying MySQL on Kubernetes with a Percona-based Operator

At Presslabs, we provide managed WordPress hosting services, which means we operate lots of small to medium-sized databases in what we call a DB-per-service model. The workloads are mostly reads, so we need to scale reads efficiently. The MySQL® asynchronous replication model fits the bill very well, allowing us to scale horizontally from one server (with the obvious availability pitfalls) to tens of nodes. The next release of the stack is going to be open-sourced.

As we were already using Kubernetes, we were looking for an operator that could automate our DB deployments and auto-scaling. The ones available used synchronous replication, through either MySQL Group Replication or Galera-based replication. Therefore, we decided to write our own operator.

Solution architecture

The …

