Sometimes on a very busy MySQL server you will see sporadic connection timeouts, such as Can't connect to MySQL server on 'mydb' (110). If you time connects in your application you will also see some successful connections taking well over a second. The problem may start very slowly and stay almost invisible for a long time, for example with one out of a million connection attempts timing out, but as the load grows it may become a lot more frequent.
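If you do not have connection timing built into the application, a quick way to sample it from the shell is to time a fresh connection plus a trivial query in a loop; a rough sketch (the host name comes from the error above, credentials are assumed to be in ~/.my.cnf):

    # time a fresh connection plus a trivial query; looping catches
    # the occasional slow connect among many fast ones
    while true; do
        ( time mysql -h mydb -e "SELECT 1" > /dev/null ) 2>&1 | grep real
        sleep 1
    done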
If you time the connects you will often see connection times close to 3 or 9 seconds. These are "magic" numbers I remember from years ago; they correspond to the SYN packet being dropped during the connection attempt and resent. 3 seconds corresponds to one packet being dropped and 9 seconds to two. If this is happening it is possible you have network issues, but more likely you have listen queue overflow. You can check whether this is the case by running netstat -s and looking for something like:
    38409 times the listen queue of a socket overflowed
    38409 SYNs to LISTEN sockets dropped
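Note these counters are cumulative since boot, so what matters is whether they keep growing. One simple way to watch them (the exact wording of the lines can differ slightly between distributions):

    # sample the listen-queue counters every 10 seconds;
    # steadily growing numbers mean drops are still happening
    while true; do
        date
        netstat -s | grep -i -e "listen queue" -e "SYNs to LISTEN"
        sleep 10
    done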
This means some SYN packets had to be dropped because the kernel buffer of connection requests on the LISTEN socket overflowed, meaning MySQL is not accepting connections as quickly as it needs to.
There are two places you need to tune if this is what is happening.
First is the Linux kernel setting net.ipv4.tcp_max_syn_backlog, which is the size of the kernel's queue of pending connection requests. The default on my kernel is 2048, though it may vary between versions; you might need to increase it to 8192 or so if you have an intense connection rate. I'll explain the math below.
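If you need to raise it, a sketch of the usual way (treat 8192 as the example figure from above, not a universal recommendation, and check your current value first):

    # check the current value
    sysctl net.ipv4.tcp_max_syn_backlog
    # raise it at runtime
    sysctl -w net.ipv4.tcp_max_syn_backlog=8192
    # make it persistent by adding to /etc/sysctl.conf:
    #   net.ipv4.tcp_max_syn_backlog = 8192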
Second is the MySQL parameter back_log, which has a default value of just 50. You may want to set it to 1000 or even higher. You may also need to increase the net.core.somaxconn kernel setting, which contains the maximum listen queue depth allowed. The kernel I'm running has it set to just 128, which would be too low for many workloads.
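As a sketch of both changes (back_log is not a dynamic variable, so MySQL needs a restart to pick it up; the numbers are just the examples used above):

    # /etc/my.cnf
    [mysqld]
    back_log = 1000

    # kernel side: the backlog requested by listen() is capped by somaxconn
    sysctl -w net.core.somaxconn=1024
    # persist it by adding to /etc/sysctl.conf:
    #   net.core.somaxconn = 1024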
Now let's look more into the problem and do some math. First, let's look at how MySQL accepts connections. There is a single main thread which accepts connections coming to the LISTEN sockets. When a connection comes in, it needs to create a new socket for the incoming connection and either create a new thread or take one out of the thread cache. From this point on MySQL processes network communication in multiple threads and can benefit from multiple cores, but the work done by the main thread does not.
Usually the main thread is able to accept connections pretty quickly; however, if it stalls waiting on a mutex, or if other work such as launching a new thread takes a lot of time, the listen queue can overflow. Let's look at a database which accepts 1000 connects/sec on average. This is a high number, but you can see even higher ones. In most cases, because of the "random arrivals" nature of traffic, you will see some seconds where as many as 3000 connections come in. Under such conditions the default back_log of 50 is enough for just 17 milliseconds (50/3000 of a second), and if the main thread stalls anywhere longer than that, some SYN packets may be lost.
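If you want to see how deep the listen queue on the MySQL port actually gets, ss can show it per listening socket; a rough check, assuming MySQL listens on the default port 3306:

    # for LISTEN sockets, Recv-Q is the current accept-queue length
    # and Send-Q is the configured backlog limit
    ss -ltn '( sport = :3306 )'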
I would suggest sizing your tcp_max_syn_backlog and back_log values to be enough for at least 2 seconds' worth of connection attempts. For example, if I have 100 connects/sec on average, I should plan for 300 connects/sec using a 3x "peak multiplier", which means the values should be set to at least 600 (two seconds' worth of peak traffic).
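To plug your own numbers into this formula, you can estimate the average connection rate from the server's own counters; a rough sketch (Connections counts every connection attempt since the server started):

    # average connects/sec = Connections / Uptime
    mysql -N -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('Connections','Uptime')" \
        | awk '{v[$1]=$2} END {printf "avg connects/sec: %.1f\n", v["Connections"]/v["Uptime"]}'

Multiply the result by the 3x peak factor and by 2 seconds to get a floor for back_log and tcp_max_syn_backlog, as in the example above.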
Setting them to cover much more than 2 seconds does not make much sense, because if the client does not get a response within 3 seconds it will consider the SYN packet lost and send a new one anyway.
There is something else. If you're creating 1000 connections a second to a MySQL server you might be pushing your luck, and at the very least you're using a lot of resources setting up and tearing down connections. Consider using persistent connections or a connection pool, at least for the applications which are responsible for most of the connections being created.