Introduction
The purpose of this article is to describe how Galera Cluster multi-master replication provides high availability for MySQL beyond simply replicating all updates to multiple nodes.
High availability has multiple dimensions, such as the ability to detect and tolerate failures in individual components and to recover from them quickly. We will discuss the different failure modes that can occur in a cluster and how Galera facilitates detection of, and recovery from, each one.
Your load balancer and application may be governed by different timeouts and recovery mechanisms, but an operational Galera Cluster will provide a stable foundation to recover the rest of your infrastructure in case of a widespread outage.
Failures of Individual Nodes
Synchronous replication requires the participation of all nodes, but a Galera Cluster will detect a node that has gone down and automatically remove it within the default timeout of 5 seconds (configurable using the evs.suspect_timeout option). The timeout places an upper bound on how long a transaction that is attempting to commit at the moment of a node failure can be blocked.
After the node has been evicted from the cluster, if enough other nodes remain in the majority, the cluster will continue to operate without further interruptions.
Recovery
The recovery procedure for a Galera Cluster node is streamlined so that there are as few surprises as possible during a time of high stress. The procedure is identical regardless of which node has failed. The customary multi-step procedure of promoting a slave to master that is used in legacy MySQL replication does not apply to Galera.
- if the node still has its data files, restarting the server via the init script will cause it to rejoin the cluster automatically. (Internally, mysqld will first be called with the --wsrep-recover option to initiate InnoDB recovery, followed by the actual server restart.)
If the downtime was short, Galera Cluster will bring the node up to speed by replaying the transactions that it missed while it was down, the so-called Incremental State Transfer (IST). If the downtime was longer, the node may need to be instantiated anew using a complete copy of the database from another node (State Snapshot Transfer, or SST).
- if the node no longer has its data files, Galera will bring in a copy from another node automatically. While it is also possible to restore the node from a backup, it is not required.
In all cases the procedure happens internally and can proceed without administrator intervention.
Mitigation
A Galera cluster can tolerate the simultaneous loss of fewer than half of its nodes and remain operational. It is possible to create 7- and 9-node Galera clusters in order to increase the number of failed nodes that can be tolerated (3 and 4, respectively). On the smaller side of the scale, using the Galera Arbitrator allows a two-node cluster to survive the failure of a single node.
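The arithmetic behind these numbers can be sketched in a few lines of Python. This is only an illustration: it assumes that every voting member has the same weight and that the failures happen at the same time, and the function name is made up for this example.

```python
def max_tolerated_failures(voting_members: int) -> int:
    """Largest number of simultaneous failures that still leaves a strict majority.

    Assumes equal weights; an arbitrator counts as one voting member.
    """
    if voting_members < 1:
        raise ValueError("a cluster needs at least one voting member")
    return (voting_members - 1) // 2


# Two data nodes plus one Galera Arbitrator make three voting members.
for members in (2, 3, 5, 7, 9):
    print(members, "voting members tolerate", max_tolerated_failures(members), "failure(s)")
```

A plain two-node cluster yields zero here, which is why the arbitrator is needed as a third vote.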
Complete snapshot transfers after non-destructive outages can be avoided by giving Galera nodes more disk space in which to store pending database updates. This is done by increasing the value of the gcache.size variable.
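As a rough aid for choosing a value, the following sketch estimates how much cache would be needed to cover an outage of a given length. The write rate, the outage duration and the headroom factor are all hypothetical inputs that have to be measured or chosen for a particular workload; this is not a formula taken from Galera itself.

```python
import math


def gcache_bytes_needed(write_bytes_per_sec: float,
                        outage_seconds: float,
                        headroom: float = 1.5) -> int:
    """Estimate the cache size needed to retain all updates made during an outage.

    headroom is an arbitrary safety factor (assumption), not a Galera setting.
    """
    if write_bytes_per_sec < 0 or outage_seconds < 0 or headroom < 1:
        raise ValueError("rates and durations must be non-negative, headroom >= 1")
    return math.ceil(write_bytes_per_sec * outage_seconds * headroom)


# Hypothetical workload: 2 MiB/s of replicated updates, one-hour maintenance window.
needed = gcache_bytes_needed(2 * 1024 * 1024, 3600)
print(f"gcache.size should be at least about {needed / 1024**3:.1f} GiB")
```

If the estimate exceeds the disk space that can be set aside, a longer outage will simply fall back to a full snapshot transfer.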
Node Instability
A node that is behaving erratically or whose network connection is unstable can be detected and evicted from the cluster via the Auto Eviction mechanism. The administrator specifies the maximum number of incidents per unit of time that will be tolerated. When the limit is reached, the node is instructed to shut down so as not to impact the operation of the cluster.
Network Issues
If a network issue prevents just two nodes or groups of nodes from communicating directly with each other, while the rest of the cluster is still able to communicate correctly, Galera will internally reroute the traffic to work around the network failure.
Whole Datacenter Failure
Galera can be used to build geo-distributed clusters that can handle a failure of an entire datacenter, even if multiple nodes were located in that datacenter.
Mitigation
Galera is not limited to two datacenters, so it is recommended that you use three or more datacenters for maximum reliability. In a three-datacenter cluster, the loss of any one datacenter leaves the majority of the nodes running, which avoids the case where the cluster is split exactly in the middle.
In a two-datacenter setup, one of the datacenters can be designated the main one by adding more nodes to it, by installing the Galera Arbitrator in it, or by giving its nodes a higher weight using the pc.weight option. If a network split causes the datacenters to lose contact with one another, the datacenter whose nodes have the greater total weight will remain running and continue to service requests.
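The effect of weights on a network split can be illustrated with a simplified model. The sketch below treats a partition as the survivor if it holds a strict majority of the total weight; Galera's actual quorum calculation also takes into account the previous cluster membership and nodes that left gracefully, so this is an approximation. The partition names and weights are hypothetical.

```python
from typing import Dict, List, Optional


def surviving_partition(partitions: Dict[str, List[int]]) -> Optional[str]:
    """Return the partition holding a strict majority of the total weight, if any.

    partitions maps a partition name to the pc.weight values of its nodes.
    """
    total = sum(sum(weights) for weights in partitions.values())
    for name, weights in partitions.items():
        if 2 * sum(weights) > total:
            return name
    return None  # exact tie: no side has a majority


# Two nodes per datacenter, equal weights: neither side keeps a majority.
print(surviving_partition({"dc1": [1, 1], "dc2": [1, 1]}))
# Same layout, but the nodes in dc1 were given pc.weight=2.
print(surviving_partition({"dc1": [2, 2], "dc2": [1, 1]}))
```

The second call shows why raising the weight in the main datacenter is enough to break the tie without adding hardware.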
Recovery
While it is possible to simply restart the failed nodes together, data transfers over the WAN may be minimized if one node is restarted and left to join the cluster first. When the other nodes in the datacenter are restarted, they may elect to come up to speed using that node as a donor.
Whole-cluster Failures
Prevention
Galera does not require nodes to be on the same physical network, to be in the same physical location, or to share the same storage, so a truly shared-nothing cluster can be built even without using geo-distributed replication. The chance of a whole-cluster outage can be reduced by placing nodes in different availability zones or even different datacenters, which provides the desired amount of isolation from common-cause failures.
Recovery
If the pc.recover option is used and all machines have been successfully restarted after the outage, the only administrator action required is to restart the nodes in any order. They will find each other and recreate the cluster as it existed before the failure.
If that option is not used, or some machines have not survived, it is important to determine which node died last and restart it first. The rest of the nodes can be restarted in any order.
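One way to find that node is to compare the state that each node saved in the grastate.dat file in its data directory and pick the one with the highest sequence number. The sketch below assumes the file contents have already been collected from every node; the node names and values are hypothetical. A seqno of -1 means that the node did not shut down cleanly, in which case its position has to be recovered first with mysqld --wsrep-recover.

```python
from typing import Dict


def parse_grastate(text: str) -> Dict[str, str]:
    """Parse the 'key: value' lines of a grastate.dat file, skipping comments."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def pick_bootstrap_node(states: Dict[str, str]) -> str:
    """Return the name of the node that recorded the highest seqno."""
    def seqno(name: str) -> int:
        return int(parse_grastate(states[name]).get("seqno", "-1"))

    best = max(states, key=seqno)
    if seqno(best) < 0:
        raise ValueError("no usable seqno found; recover positions with --wsrep-recover")
    return best


TEMPLATE = (
    "# GALERA saved state\n"
    "version: 2.1\n"
    "uuid:    5ee99582-bb8d-11e2-b8e3-23de375c1d30\n"
    "seqno:   {seqno}\n"
    "safe_to_bootstrap: 0\n"
)
# Hypothetical file contents gathered from three nodes.
SAMPLE = {
    "node1": TEMPLATE.format(seqno=1041),
    "node2": TEMPLATE.format(seqno=1057),
    "node3": TEMPLATE.format(seqno=-1),
}
print("restart first:", pick_bootstrap_node(SAMPLE))
```

In this sample node2 has applied the most transactions, so it is the one to restart first.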
If all nodes lost their data files in a truly catastrophic outage, recovering Galera Cluster from a backup uses the same tools that are used to recover a stand-alone MySQL server. As soon as the first node of the cluster has been recovered from a backup, it can be made immediately available for servicing requests. More nodes can then be started and they will automatically fetch a copy of the database.
Configuration and Procedural Errors
Galera Cluster has various characteristics that make it simple to configure and maintain. These same features are very useful in emergency situations when trying to restore service under a high level of stress. Fewer steps to perform and fewer files to take care of reduce the potential for mistakes and result in faster recovery times.
Limited External Dependencies
Galera requires only working TCP networking in order to run. There is no requirement for things such as functioning multicast or dedicated network interfaces. Galera does not depend on third-party services, or on libraries that are unlikely to be present on a fresh server and would not be brought in by the package manager when Galera is installed on a replacement server.
Galera can even start up without a functioning DNS service if IP addresses are present in the wsrep_cluster_address variable. It does not require SANs, shared storage, or user or file permissions that are not present by default.
Single Configuration
The entire configuration for Galera Cluster is contained within MySQL’s my.cnf file (or files included from it), which enables it to be quickly restored or recreated in case of an emergency. The configuration file is not required to contain any host-specific entries, so it can be reused across nodes.
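To illustrate how little host-specific information is involved, the following sketch renders the Galera-related part of a my.cnf file from a handful of values. It is not a complete configuration: the library path, cluster name, IP addresses and option values are assumptions chosen for the example, and a real file contains further MySQL settings.

```python
from typing import Dict, List


def render_galera_cnf(cluster_name: str,
                      node_ips: List[str],
                      provider_options: Dict[str, str]) -> str:
    """Render a [mysqld] fragment that is identical on every node of the cluster."""
    options = "; ".join(f"{key}={value}" for key, value in provider_options.items())
    lines = [
        "[mysqld]",
        # Library location varies by distribution; this path is an assumption.
        "wsrep_provider=/usr/lib/galera/libgalera_smm.so",
        f"wsrep_cluster_name={cluster_name}",
        f"wsrep_cluster_address=gcomm://{','.join(node_ips)}",
        f'wsrep_provider_options="{options}"',
    ]
    return "\n".join(lines) + "\n"


print(render_galera_cnf(
    "example_cluster",                       # hypothetical name
    ["10.0.0.1", "10.0.0.2", "10.0.0.3"],    # hypothetical addresses
    {"evs.suspect_timeout": "PT5S", "gcache.size": "1G"},
))
```

Because the output lists every node and names no particular host, the same fragment can be copied unchanged to a replacement server.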
No Requirement for SQL commands
Galera Cluster does not require issuing commands such as CHANGE MASTER or RESET SLAVE via the command-line interface. There is no need to figure out where replication stopped in order to determine the proper parameters to specify to such commands.
Limited File “Sprawl”
The only Galera file that is located outside of the MySQL directory hierarchy is the libgalera_smm.so library. Galera stores its data files in MySQL’s data directory, but does not require them to survive restarts or to be restored from a backup. Galera prints its log messages to MySQL’s error log.
Automatic Replication of Authentication
A newly-joining Galera node will obtain the authentication database for the SQL users from the cluster, so it will be able to accept incoming connections from the application without the need to set up users manually.
Conclusion
Galera Cluster is not merely a way to replicate data from one MySQL server to another. It is a complete clustering solution for MySQL high availability that is designed to handle all possible failure scenarios and allow for speedy recovery from each one. The database can be kept running and available under very challenging circumstances.