Making HAProxy 1.5 replication lag aware in MySQL

HAProxy is frequently used as a software load balancer in the MySQL world. Peter Boros, in a past post, explained how to set it up with Percona XtraDB Cluster (PXC) so that it only sends queries to available nodes. The same approach can be used in a regular master-slaves setup to spread the read load across multiple slaves. However with MySQL replication, another factor comes into play: replication lag. In this case the approach mentioned for Percona XtraDB Cluster does not work that well as the check we presented only returns ‘up’ or ‘down’. We would like to be able to tune the weight of a replica inside HAProxy depending on its replication lag. This is what we will do in this post using HAProxy 1.5.

Agent checks in HAProxy

HAProxy 1.5 allows us to run an agent check, which is a check that can be added to a regular health check. The benefit of agent checks is that the return value can be ‘up’ or ‘down’, but also a weight.

What is an agent? It is simply a program that can be accessed from a TCP connection on a given port. So if we want to run an agent on a MySQL server that will:

  • Mark the server as down in HAProxy if replication is not working
  • Set the weight to 100% if the replication lag is < 10s
  • Set the weight to 50% if the replication lag is >= 10s and < 60s
  • Set the weight to 5% in all other situations

We can use a script like this:

$ less agent.php
= 10 && $lag < 60){
                return "up 50%";
        }
        else
                return "up 5%";
}
set_time_limit(0);
$socket = stream_socket_server("tcp://127.0.0.1:$port", $errno, $errstr);
if (!$socket) {
        echo "$errstr ($errno)
n";
} else {
        while ($conn = stream_socket_accept($socket,9999999999999)) {
                $cmd = "$mysql -h127.0.0.1 -u$user -p$password -P$mysql_port -Ee "$query" | grep Seconds_Behind_Master | cut -d ':' -f2 | tr -d ' '";
                exec("$cmd",$lag);
                $weight = set_weight($lag[0]);
                unset($lag);
                fputs ($conn, $weight);
                fclose ($conn);
        }
        fclose($socket);
}
?>

If you want the script to be accessible from port 6789 and connect to a MySQL instance running on port 3306, run:

$ php agent.php 6789 3306

You will also need a dedicated MySQL user:

mysql> GRANT REPLICATION CLIENT ON *.* TO 'haproxy'@'127.0.0.1' IDENTIFIED BY 'haproxy_pwd';

When the agent is started, you can check that it is working properly:

# telnet 127.0.0.1 6789
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
up 100%
Connection closed by foreign host.

Assuming it is run locally on the app server, that 2 replicas are available (192.168.10.2 and 192.168.10.3) and that the application will send all reads on port 3307, you will define a frontend and a backend in your HAProxy configuration like this:

frontend read_only-front
bind *:3307
mode tcp
option tcplog
log global
default_backend read_only-back
backend read_only-back
mode tcp
balance leastconn
server slave1 192.168.10.2 weight 100 check agent-check agent-port 6789 inter 1000  rise 1  fall 1 on-marked-down shutdown-sessions
server slave2 192.168.10.3 weight 100 check agent-check agent-port 6789 inter 1000  rise 1  fall 1 on-marked-down shutdown-sessions

Demo

Now that everything is set up, let’s see how HAProxy can dynamically change the weight of the servers depending on the replication lag.

No lag

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# HAProxy
$ echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,100
read_only-back,slave2,UP,100
read_only-back,BACKEND,UP,200

Slave1 lagging

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 25
# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,50
read_only-back,slave2,UP,100
read_only-back,BACKEND,UP,150

Slave2 down

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: NULL
# echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,100
read_only-back,slave2,DOWN (agent),100
read_only-back,BACKEND,UP,100

Conclusion

Agent checks are a nice addition in HAProxy 1.5. The setup presented above is a bit simplistic though: for instance, if HAProxy fails to connect to the agent, it will not mark the corresponding as down. It is then recommended to keep a regular health check along with the agent check.

Astute readers will also notice that in this configuration, if replication is broken on all nodes, HAProxy will stop sending reads. This may not be the best solution. Possible options are: stop the agent and mark the servers as UP using the stats socket or add the master as a backup server.

And as a final note, you can edit the code of the agent so that replication lag is measured with Percona Toolkit’s pt-heartbeat instead of Seconds_Behind_Master.

The post Making HAProxy 1.5 replication lag aware in MySQL appeared first on MySQL Performance Blog.