Load Balancing a Galera Cluster

Introduction

This article details the proper method of load balancing either a Percona XtraDB Cluster (PXC) or a MariaDB Galera Cluster.  Both of these solutions utilize the same Galera plugin to facilitate clustering.  This information details which ports must be open, how they are to be configured, and how to route traffic to the database nodes.

There are generally a total of three (3) nodes in this cluster, along with a single load balancer. Three is the minimum number of nodes required for normal cluster operation. This does not mean the cluster is down the moment a node is lost: with one node down, the two remaining nodes still hold a quorum and continue serving traffic.  By default, however, losing two of three nodes leaves the surviving node without a quorum, and it will stop accepting traffic.

 

Data Centers

One of the most common questions we receive is whether the cluster can span two data centers.  The short answer is “Yes, but…”   While it is possible for a cluster to span two data centers, it would be best to span three.  Otherwise, if one data center is lost, it is possible to end up with a split-brain condition, and the cluster would fail.

Any time you span data centers, you should span three of them.  You should have at least three nodes in the cluster.  If you span switches, you should span three switches.  And so on.  This is known as the Rule of Threes.

Do not take a three-node cluster and put two nodes in one data center and one node in the other; you are asking for problems.  It would be better to put one node in each of three data centers, but then you have to consider the latency between the nodes of the cluster, which can adversely affect performance.

Also bear in mind that you can have two clusters (one in each of two data centers) and replicate between them.  We had not yet done this at the time of writing, however.

Another option is to have a single node in a second data center that acts as a replication slave of the cluster.

 

Load Balancers

Load balancing can be accomplished either via hardware or software.  Typically, we recommend a hardware solution for load balancing, but software is an option when hardware is not available for some reason.  When software is needed, we generally recommend the open-source HAProxy tool, which can be configured to perform round-robin balancing and the two necessary port checks outlined below.
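As a minimal sketch, an HAProxy configuration for this setup might look like the following (the hostnames node1–node3 and the listener name are placeholders; health is judged by the HTTP check on port 9200 while traffic flows on 3306):

```
# Illustrative HAProxy front end for a three-node Galera cluster
listen galera
    bind *:3306
    mode tcp
    balance roundrobin
    option httpchk GET /
    server node1 node1:3306 check port 9200
    server node2 node2:3306 check port 9200
    server node3 node3:3306 check port 9200
```

Note that the `check port 9200` directive sends the health probe to the clustercheck port while client connections are still forwarded to 3306.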

A Virtual IP (VIP) should be utilized by all clients that wish to connect to the cluster.  This is the only way the load balancer can successfully route database traffic to the nodes that are considered viable members of the cluster.  The cluster software has no internal mechanism for doing this, nor for handling IP failover.

 

Load Balancer Probe Checks

Essentially, the load balancer should monitor two ports on each node and see success on both before it routes traffic to that node.  For a node to be considered ready to receive traffic, it should respond to the following probes:

  1. A TCP probe of port 3306 should be successful for a node to receive traffic. Port 3306 is the default port for MySQL connections to the database.  Any clients performing queries will typically route through this port via TCP/IP.  If testing, you can even perform a simple telnet to that port on the host; you should receive some data back, albeit a little hard to read.
  2. An HTTP probe of port 9200 should receive an HTTP response of 200. A clustercheck bash script should be running on the Linux servers, responding to these probes via xinetd.  This clustercheck script connects to the database and checks the replication status of the node to ensure that the node is in primary mode, is synced with the rest of the cluster, and is ready to receive traffic.  If the node is not ready for traffic, an error response code of 503 is returned.
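The decision the load balancer makes from these two probes can be sketched as a small shell function (the function name and inputs are illustrative; a real load balancer performs the probes itself):

```shell
#!/bin/sh
# Decide whether a node should receive traffic, given the results of the two
# probes: whether a TCP connection to 3306 succeeded, and the HTTP status
# code returned by the clustercheck probe on 9200.
node_routable() {
    tcp_ok="$1"      # "yes" if a TCP connection to 3306 succeeded
    http_code="$2"   # HTTP status code from the port 9200 probe
    if [ "$tcp_ok" = "yes" ] && [ "$http_code" = "200" ]; then
        echo "route"
    else
        echo "skip"
    fi
}

node_routable yes 200   # node is up and synced
node_routable yes 503   # mysqld answers, but node is a donor or desynced
node_routable no  200   # mysqld itself is unreachable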

Again, it is not enough that the node responds successfully to a port 3306 check alone.  A node may be responsive on this port yet be busy acting as a donor for a node joining the cluster, have broken replication with the rest of the cluster, be undergoing maintenance or upgrades, or have some other issue.  In those cases, the Load Balancer should not consider it a viable node.

By ensuring both probes are successful, we route traffic only to nodes that are fully synced members of the cluster, which preserves consistency of reads across the cluster.

 

Node Sequencing

For MySQL clusters it is best to utilize a simple Round-Robin DNS scheme.  This keeps the solution simple while allowing rather good performance from the cluster.  There is, therefore, no need to set up complicated scripts to monitor activity or loads on the individual nodes.  A simple Round-Robin DNS rotation works fine.
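Round-Robin DNS can be as simple as multiple A records for a single name (the hostname and addresses below are placeholders):

```
; BIND zone fragment: three A records for one name.
; Resolvers rotate through the addresses on successive lookups.
galera-db    IN  A  10.0.0.11
galera-db    IN  A  10.0.0.12
galera-db    IN  A  10.0.0.13
```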

 

Sticky Sessions

It is generally a good idea to enable Sticky Sessions for the nodes.  Due to the virtually synchronous nature of the cluster, it is possible that a commit on one node will not be immediately visible to reads on other nodes unless Causal Reads are enabled, which hurts performance.  By default, Causal Reads are disabled, but they can be enabled if absolutely necessary.  To help deal with this, we generally recommend Sticky Sessions: an immediate read after a write will usually succeed because the read comes from the same session and is therefore routed to the same node.
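If Causal Reads are truly required for a particular workload, they can be enabled per session rather than globally.  The variables below are the Galera-provided ones; the older boolean is deprecated in favor of wsrep_sync_wait on newer versions:

```
-- Enable causal reads for this session only (newer Galera versions):
SET SESSION wsrep_sync_wait = 1;        -- wait for replication before reads

-- Older versions use the deprecated boolean instead:
-- SET SESSION wsrep_causal_reads = ON;
```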

 

Networking & Traffic Routing

Although two different ports are checked, routing of connections from hosts to the database nodes should only occur on TCP port 3306!  The check on port 9200 is only to ensure that the node is ready for traffic.  No other connections, other than the probe from the Load Balancer, should be routed to this port.  All database traffic should flow through the load balancer and to the same port (3306) on the database node.  Naturally, the database node should be able to transmit its data back to the requestor through port 3306 as well.

Firewalls will need to be configured to allow communication between Load Balancer and database nodes as follows:

 

Port   Purpose                                                     Origin & Destination
3306   MySQL connections from clients to/from database hosts       To and from remote clients and database nodes, through the load balancer.
4444   SST – State Snapshot Transfer from donor to joining node    Between all database nodes.
4567   Group communication among nodes                             Between all database nodes.
4568   IST – Incremental State Transfer from donor to joining node Between all database nodes.
9200   Status check of database node from Load Balancer            Load Balancer to each of the database nodes.

 

All nodes must be able to communicate with one another.  Segments can be utilized to minimize network traffic, which is especially useful for clusters spanning a WAN, but all nodes are still required to be able to communicate with one another over the ports listed above.  If communication is not possible, the node will be removed from the cluster.
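As a sketch, the corresponding firewall rules on each database node could look like the following iptables fragment (the rules file path and ordering are illustrative; adapt to your firewall tooling):

```
# Fragment of /etc/sysconfig/iptables on each database node (illustrative)
-A INPUT -p tcp --dport 3306 -j ACCEPT   # MySQL client traffic via the load balancer
-A INPUT -p tcp --dport 4444 -j ACCEPT   # SST from donor to joining node
-A INPUT -p tcp --dport 4567 -j ACCEPT   # Galera group communication
-A INPUT -p tcp --dport 4568 -j ACCEPT   # IST from donor to joining node
-A INPUT -p tcp --dport 9200 -j ACCEPT   # clustercheck probe from the load balancer
```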

 

Clustercheck Script Configuration

Install xinetd to allow TCP/IP connections to port 9200 to return cluster node status:

yum install xinetd

Edit the /etc/services file and comment out any line with port 9200 and add the following line:

mysqlchk        9200/tcp                # MySQL check

Restart xinetd:

service xinetd restart

There should now be a file in /etc/xinetd.d (installed with the cluster packages) that points to the /usr/bin/clustercheck bash script.  This xinetd config file handles requests coming in on port 9200.  Upon such a connection, xinetd calls the /usr/bin/clustercheck script, which queries the database with a SHOW STATUS query to determine whether the node is ready to receive traffic.  On a successful check, the script returns an HTTP-formatted response header with code 200 and a message stating whether the node is synced.
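The xinetd service definition typically looks like the following (shipped as /etc/xinetd.d/mysqlchk with the Percona packages; shown here for reference, and your packaged version may differ slightly):

```
# /etc/xinetd.d/mysqlchk
service mysqlchk
{
    disable         = no
    flags           = REUSE
    socket_type     = stream
    port            = 9200
    wait            = no
    user            = nobody
    server          = /usr/bin/clustercheck
    log_on_failure  += USERID
    only_from       = 0.0.0.0/0
    per_source      = UNLIMITED
}
```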

Unfortunately, there is a bug in the /usr/bin/clustercheck script that can cause some connections from remote hosts to fail.  The fix is to add a “sleep 0.1” to the script in four places, as noted below:

#!/bin/bash
#
# Script to make a proxy (ie HAProxy) capable of monitoring Percona XtraDB Cluster nodes properly
#
# Authors:
# Raghavendra Prabhu <raghavendra.prabhu@percona.com>
# Olaf van Zandwijk <olaf.vanzandwijk@nedap.com>
#
# Based on the original script from Unai Rodriguez and Olaf (https://github.com/olafz/percona-clustercheck)
#
# Grant privileges required:
# GRANT PROCESS ON *.* TO 'clustercheckuser'@'localhost' IDENTIFIED BY 'clustercheckpassword!';

if [[ $1 == '-h' || $1 == '--help' ]];then
  echo "Usage: $0 <user> <pass> <available_when_donor=0|1> <log_file>"
  exit
fi

MYSQL_USERNAME="${1-clustercheckuser}"
MYSQL_PASSWORD="${2-clustercheckpassword!}"
AVAILABLE_WHEN_DONOR=${3:-0}
ERR_FILE="${4:-/dev/null}"

#Timeout exists for instances where mysqld may be hung
TIMEOUT=10

#
# Perform the query to check the wsrep_local_state
#
WSREP_STATUS=`mysql -nNE --connect-timeout=$TIMEOUT --user=${MYSQL_USERNAME} --password=${MYSQL_PASSWORD} -e "SHOW STATUS LIKE 'wsrep_local_state';" 2>${ERR_FILE} | tail -1 2>>${ERR_FILE}`

if [[ "${WSREP_STATUS}" == "4" ]] || [[ "${WSREP_STATUS}" == "2" && ${AVAILABLE_WHEN_DONOR} == 1 ]]
then
  # Percona XtraDB Cluster node local state is 'Synced' => return HTTP 200
  # Shell return-code is 0
  echo -en "HTTP/1.1 200 OK\r\n"
  echo -en "Content-Type: text/plain\r\n"
  echo -en "Connection: close\r\n"
  sleep 0.1
  echo -en "Content-Length: 40\r\n"
  echo -en "\r\n"
  sleep 0.1
  echo -en "Percona XtraDB Cluster Node is synced.\r\n"
  exit 0
else
  # Percona XtraDB Cluster node local state is not 'Synced' => return HTTP 503
  # Shell return-code is 1
  echo -en "HTTP/1.1 503 Service Unavailable\r\n"
  echo -en "Content-Type: text/plain\r\n"
  echo -en "Connection: close\r\n"
  sleep 0.1
  echo -en "Content-Length: 44\r\n"
  echo -en "\r\n"
  sleep 0.1
  echo -en "Percona XtraDB Cluster Node is not synced.\r\n"
  exit 1
fi

The cluster should now be up and operational.

 

Node Testing

You can test each node locally to see if it is part of the cluster by executing the following command:

curl http://127.0.0.1:9200

If you want to check each node remotely, you can verify that all nodes are up and running properly via HTTP response:

curl http://node1:9200
curl http://node2:9200
curl http://node3:9200

This will return the following result if the node you are checking is synced with the rest of the cluster:

Percona XtraDB Cluster Node is synced.