Monitoring flow control in a Galera cluster is important: without it, you will not understand why writes sometimes stall. Percona XtraDB Cluster 5.6 provides two status variables for this purpose: wsrep_flow_control_paused and wsrep_flow_control_paused_ns. Which one should you use?
What is flow control?
Flow control does not exist in regular MySQL replication; it is specific to Galera replication. It is the mechanism a node uses when it cannot keep up with the write load: to keep replication synchronous, the node that is starting to lag instructs the other nodes to pause writes for some time so that it does not fall too far behind.
If you are not familiar with this notion, you should read this blog post.
Triggering flow control and graphing it
For this test, we’ll use a 3-node Percona XtraDB Cluster 5.6 setup. On node 3, we will lower gcs.fc_limit so that flow control is triggered very quickly, and then we will lock the node:
pxc3> set global wsrep_provider_options="gcs.fc_limit=1";
pxc3> flush tables with read lock;
Now we will use sysbench to insert rows on node 1:
$ sysbench --test=oltp --oltp-table-size=50000 --mysql-user=root --mysql-socket=/tmp/pxc1.sock prepare
Because of flow control, writes will be stalled and sysbench will hang. So after some time, we will release the lock on node 3:
pxc3> unlock tables;
During the whole process, wsrep_flow_control_paused and wsrep_flow_control_paused_ns are recorded every second with mysqladmin ext -i1. We can then build a graph of the evolution of both variables:
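While we can clearly see on both graphs when flow control was triggered, it is much easier to see when it stopped with wsrep_flow_control_paused_ns. That variable is a cumulative counter of paused time, so it stops increasing as soon as writes resume, whereas wsrep_flow_control_paused is a fraction of elapsed time that only decays slowly. The difference would be even more obvious if there had been several periods during which flow control was in effect.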
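The recording step described above can be turned into graphable data with a small filter. The following is only a sketch: the function name flow_control_samples is hypothetical, and it assumes the usual table layout printed by mysqladmin ext (one "| name | value |" row per variable). It prints one line per sample containing both values, whichever order the two variables appear in.

```shell
# Hypothetical helper: read `mysqladmin ext -i1` output on stdin and print
# "<wsrep_flow_control_paused> <wsrep_flow_control_paused_ns>" per sample.
flow_control_samples() {
  awk -F'|' '
    {
      # Column 2 is the variable name, column 3 its value; strip padding.
      name = $2; value = $3
      gsub(/ /, "", name); gsub(/ /, "", value)
    }
    name == "wsrep_flow_control_paused"    { p = value;  have_p = 1 }
    name == "wsrep_flow_control_paused_ns" { ns = value; have_ns = 1 }
    # Emit a row once both variables of the current sample have been seen.
    have_p && have_ns { print p, ns; have_p = have_ns = 0 }
  '
}
```

A saved capture can then be converted with something like flow_control_samples < capture.txt, where capture.txt is an assumed file name holding the mysqladmin output.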
Conclusion
Monitoring a server is obviously necessary if you want to catch issues, but you need to look at the right metrics. So don’t be alarmed if wsrep_flow_control_paused is not 0: it simply means that flow control has been triggered at some point since the server started up. If you want to know what is happening right now, prefer wsrep_flow_control_paused_ns.
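As an illustration of using wsrep_flow_control_paused_ns to see what is happening right now, the sketch below compares two successive readings of the counter. The helper name paused_pct is hypothetical, and it assumes the readings are plain integers in nanoseconds taken a known number of seconds apart. A result of 0 means flow control was not active during the interval.

```shell
# Hypothetical helper: percentage of an interval spent paused by flow control.
#   $1, $2: two successive wsrep_flow_control_paused_ns readings (nanoseconds)
#   $3:     seconds elapsed between the two readings
paused_pct() {
  # Integer arithmetic: paused nanoseconds * 100 / interval in nanoseconds.
  echo $(( ($2 - $1) * 100 / ($3 * 1000000000) ))
}

# Example: the counter grew by 0.5 s worth of nanoseconds over a 1 s interval.
paused_pct 0 500000000 1
```

The result is truncated to a whole percentage, which is enough to tell an idle interval from a stalled one.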
The post Monitoring MySQL flow control in Percona XtraDB Cluster 5.6 appeared first on MySQL Performance Blog.