In the MySQL ecosystem there are few load balancers that are also open source, and ProxySQL is one of the few proxies that works at the application layer and is therefore SQL aware.
In this blog post we will benchmark ProxySQL against MaxScale,
another popular proxy for MySQL.
The idea to compare ProxySQL vs MaxScale came after reading Krzysztof Książek's interesting blog post on SQL Load Balancing Benchmark, which compared the performance of MaxScale against HAProxy.
Disclaimer: ProxySQL is not GA yet, therefore please do not
use it in production.
Sysbench setup
I wanted a sysbench setup similar to the one Krzysztof used in his benchmark, but mine is slightly different:
a) instead of using a MySQL cluster with Galera, I set up a cluster with 1 master and 3 slaves. Since the workload was meant to be completely read-only and in-memory, the 2 setups are functionally identical;
b) instead of using AWS instances I used 4 physical servers: server A was running as a master and servers B, C and D were running as slaves. Since the master was idle (remember, this is a read-only workload that uses only the slaves), I used the same box to also run sysbench and all the various proxies.
Benchmarks were executed by running the following:
./sysbench \
--test=./tests/db/oltp.lua \
--num-threads=$THREADS \
--max-requests=0 \
--max-time=600 \
--mysql-user=rcannao \
--mysql-password=rcannao \
--mysql-db=test \
--db-driver=mysql \
--oltp-tables-count=128 \
--oltp-read-only=on \
--oltp-skip-trx=on \
--report-interval=1 \
--oltp-point-selects=100 \
--oltp-table-size=400000 \
--mysql-host=127.0.0.1 \
--mysql-port=$PORT \
run
The versions used are:
Percona Server 5.6.22
sysbench 0.5
ProxySQL at commit a47136e with debugging disabled
MaxScale 1.0.5 GA
HAProxy 1.4.15
ProxySQL and MaxScale: a few design differences
In the benchmark executed by Krzysztof, MaxScale was configured
to listen on port 4006 where the service "RW Split Router" was
running, and on port 4008 where the service "Read Connection
Router" was running.
To my understanding:
a) RW Split Router performs read/write split, parsing the queries
and tracking the state of the transaction;
b) Read Connection Router performs a simple network forwarding,
connecting clients to backends;
c) the two services, to operate, need to listen on different ports (see the configuration sketch after this list).
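For illustration, a MaxScale configuration along these lines might look like the sketch below. The section names match the two services, but the server list and credentials are my own placeholders, not taken from the original benchmark:

# Two services, each with its own listener on a different port
[RW Split Router]
type=service
router=readwritesplit
servers=serverB,serverC,serverD
user=maxuser
passwd=maxpwd

[RW Split Listener]
type=listener
service=RW Split Router
protocol=MySQLClient
port=4006

[Read Connection Router]
type=service
router=readconnroute
router_options=slave
servers=serverB,serverC,serverD
user=maxuser
passwd=maxpwd

[Read Connection Listener]
type=listener
service=Read Connection Router
protocol=MySQLClient
port=4008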
ProxySQL is, by design, different.
ProxySQL and RW split
ProxySQL performs a very simple query analysis to determine where each query needs to be sent.
ProxySQL decides where a query needs to be forwarded based on a user-configurable chain of rules, in which a DBA can specify various matching criteria like username, schema name, whether there is an active transaction (a feature not completely implemented), and a regular expression to match against the query.
Matching against a regular expression is faster than building a syntax tree, and having a chain of rules that match on either a regex or other attributes allows a great degree of flexibility compared to hardcoded routing policies.
Therefore, to implement a basic read/write split, ProxySQL was configured in such a way that:
a) all the queries matching '^SELECT.*FOR UPDATE$' were sent to the master;
b) all the queries not matching the previous rule but matching '^SELECT.*' were sent to the slaves;
c) by default, all traffic not matching any of the previous rules was sent to the master.
Considering the 3 rules listed above (sketched below), all the traffic generated by sysbench was always sent to the slaves.
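In current ProxySQL builds, rules like these are defined through the admin interface. A minimal sketch of the three rules follows; the mysql_query_rules schema and the hostgroup numbering are assumptions based on today's ProxySQL and may differ in the pre-GA commit benchmarked here:

-- Hostgroup 0 = master, hostgroup 1 = slaves (assumed numbering).
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE$', 0, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (2, 1, '^SELECT.*', 1, 1);
-- Any query matching no rule falls through to the default (the master).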
Additionally, while ProxySQL doesn't perform any syntax parsing to determine the target of a query, no matter what routing rules are in place it always performs a very simple query analysis to determine what type of statement is being executed and to generate statistics. That is, ProxySQL counts the types of statements it executes, and this information is accessible through ProxySQL itself.
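As an example, in current builds these counters can be read from the admin interface with a query along these lines (the stats table name is taken from today's ProxySQL and may differ in the pre-GA version):

-- Per-command counters, most frequent statement types first.
SELECT Command, Total_cnt
FROM stats_mysql_commands_counters
ORDER BY Total_cnt DESC;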
As already pointed out in previous articles, one of the main ideas behind ProxySQL is that the DBA is now the one controlling and defining query routing rules. This makes routing completely transparent to developers, eliminates the politics of DBAs depending on developers for such tweaking of the setup, and therefore speeds up iteration.
ProxySQL and Fast Forwarding
I think the way MaxScale implements different modules listening on different ports is a very interesting approach, yet it forces developers to enforce some sort of read/write split in the application: connect to port 4006 if you want R/W split, or to port 4008 if you want read-only load balancing.
My aim in ProxySQL is that the application should have a single connection point, ProxySQL, and the proxy should determine what to do with the incoming requests. In other words, the application should just connect to ProxySQL, and ProxySQL takes care of the rest, according to its configuration.
To do so, ProxySQL must always authenticate the client before applying any rule. Therefore I thought a quick feature to implement was Fast Forwarding based on username: when a specific user connects, all of its requests are forwarded to the backends without any query processing or connection pooling.
In other words, ProxySQL's Fast Forwarding is a concept similar to MaxScale's Read Connection, but it uses the same port as the R/W split module, and the matching criterion is the client's username instead of the listener port.
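As a sketch, enabling fast forwarding for one user could look like this through the admin interface; the mysql_users schema shown is an assumption based on current ProxySQL, and the credentials are placeholders:

-- Requests from 'ff_user' bypass query processing and the connection pool;
-- all other users still go through the full R/W split path.
INSERT INTO mysql_users (username, password, fast_forward)
VALUES ('ff_user', 'ff_pass', 1);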
Note that ProxySQL already supports multiple listeners, but the same rules apply to all ports; future versions of ProxySQL will also support matching criteria based on the listener's port, behaving in a way similar to MaxScale, and will add additional matching criteria like the source of the connection.
Performance benchmarks
As said previously, on the same host where sysbench was running I also configured ProxySQL, MaxScale and HAProxy.
In the blog post published by Severalnines, one of the comments states that MaxScale was very slow with few connections on physical hardware.
Therefore, the first benchmark I wanted to run was exactly at a low number of connections, progressively increasing it.
ProxySQL and MaxScale were both configured with just 1 worker
thread, and HAProxy was configured with only 1 process.
Please note that in the following benchmarks, worker threads and connections are two completely different entities:
1) a connection is defined as a client connection;
2) a worker thread is a thread inside the proxy, be it ProxySQL, MaxScale or HAProxy (even if HAProxy uses processes and not threads).
What could cause confusion is the fact that in sysbench a thread is a connection: from the proxy's perspective, it is just a connection.
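For reference, this is roughly how the worker count was pinned to 1 in each proxy. The MaxScale and HAProxy directives are standard; the ProxySQL variable name is taken from current builds and may differ in the pre-GA commit:

# MaxScale (maxscale.cnf): one worker thread
[maxscale]
threads=1

# HAProxy (haproxy.cfg): one worker process
global
    nbproc 1

# ProxySQL (via the admin interface; variable name assumed):
# UPDATE global_variables SET variable_value='1' WHERE variable_name='mysql-threads';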
Benchmark with 1 worker thread
Legend:
maxscale rw = MaxScale with RW Split Router
maxscale rr = MaxScale with Read Connection Router
proxysql rw = ProxySQL with query routing enabled
proxysql ff = ProxySQL with fast forwarding enabled
Average throughput in QPS:
Connections | HAProxy | MaxScale RW | MaxScale RR | ProxySQL RW | ProxySQL FF |
---|---|---|---|---|---|
1 | 3703.36 | 709.99 | 722.27 | 3534.92 | 3676.04 |
4 | 14506.45 | 2815.7 | 2926.44 | 13125.67 | 14275.66 |
8 | 26628.44 | 5690.22 | 5833.77 | 23000.98 | 24514.94 |
32 | 54570.26 | 14722.97 | 22969.73 | 41072.51 | 51998.35 |
256 | 53715.79 | 13902.92 | 42227.46 | 45348.59 | 58210.93 |
In the above graphs we can easily spot that:
a) indeed, MaxScale's performance is very low when running with just a few connections (more details below);
b) for any proxy, performance becomes quite unstable when the number of connections increases;
c) proxysql-ff is very close to the performance of haproxy;
d) with only 1 or 4 client connections, ProxySQL provides 5 times more throughput than MaxScale in both modules; with 8 client connections ProxySQL provides 4 times more throughput than MaxScale in R/W split mode, and 4.3 times more in fast forward mode;
e) at 32 client connections, proxysql-rw provides 2.8x more throughput than maxscale-rw, and proxysql-ff provides 2.3x more than maxscale-rr;
f) 4 proxy configurations (haproxy, maxscale-rw, proxysql-rw, proxysql-ff) behave similarly at 32 and 256 client connections, while maxscale-rr almost doubles its throughput at 256 connections vs 32 connections: in other words, when the number of connections is high, some bottleneck is taken away.
Below are also the graphs of average throughput, and of average and 95% response time, at a low number of connections.
Fortunately, I have access to physical hardware (not AWS instances) and I was able to reproduce the issue reported in that comment: MaxScale does indeed seem to be very slow when running with just a few connections.
For comparison, though, I tried a simple benchmark on AWS and found that MaxScale doesn't behave as badly there as on a physical server.
After these interesting results, I tried running the same benchmark connecting to MaxScale and ProxySQL not through TCP but through a Unix Domain Socket, with further interesting results.
Unfortunately, I didn't have a version of HAProxy that accepted connections via UDS, so I ran the benchmark against HAProxy using TCP connections.
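For the UDS runs, the only change to the sysbench invocation was pointing it at the proxy's socket instead of a host and port; a sketch, with an illustrative socket path:

# Same sysbench options as the TCP run, but with --mysql-host/--mysql-port
# replaced by the proxy's socket:
./sysbench [options as above] --mysql-socket=/tmp/proxysql.sock run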
Average throughput in QPS:
Connections | HAProxy | MaxScale RW | MaxScale RR | ProxySQL RW | ProxySQL FF |
---|---|---|---|---|---|
1 | 3703.36 | 3276.85 | 3771.15 | 3716.19 | 3825.81 |
4 | 14506.45 | 11780.27 | 14807.45 | 13333.03 | 14729.59 |
8 | 26628.44 | 15203.93 | 27068.81 | 24504.42 | 25538.57 |
32 | 54570.26 | 16370.69 | 44711.25 | 46846.04 | 58016.03 |
256 | 53715.79 | 14689.73 | 45108.54 | 54229.29 | 71981.32 |
In the above graphs we can easily spot that:
a) MaxScale is no longer slow when running with just a few connections: the performance bottleneck at a low number of connections is not present when using UDS instead of TCP;
b) again, for any proxy, performance becomes quite unstable when the number of connections increases;
c) maxscale-rw is the slowest configuration at any number of connections;
d) with an increased number of client connections, the performance of MaxScale reaches its limits with an average QPS of 16.4k reads/s, peaking at 32 connections for maxscale-rw, and an average QPS of 45.1k reads/s, peaking at 256 connections for maxscale-rr;
e) with an increased number of client connections, the performance of ProxySQL reaches its limits with an average QPS of 54.2k reads/s, peaking at 256 connections for proxysql-rw, and an average QPS of 72.0k reads/s, peaking at 256 connections for proxysql-ff.
As already pointed out, with an increased number of connections the performance becomes quite unstable, although it is easy to spot that:
1) in R/W split mode, ProxySQL can reach a throughput over 3 times higher than MaxScale;
2) ProxySQL in Fast Forward mode can reach a throughput 33% higher than MaxScale in Read Connection Router mode;
3) ProxySQL in R/W split mode is faster than MaxScale in simple Read Connection Router mode.
The above shows that, while MaxScale's readconnroute module has low latency, neither of the two MaxScale modules scales very well. The bottleneck seems to be that MaxScale uses a lot of CPU, as already pointed out by Krzysztof in his blog post; it therefore quickly saturates its CPU resources without being able to scale.
Of course, it is possible to scale by adding more threads: more results below!
MaxScale and TCP
At this stage I knew that, on physical hardware:
- ProxySQL was running well when clients were connecting via TCP
or UDS at any number of connections;
- MaxScale was running well when clients were connecting via UDS
at any number of connections;
- MaxScale was running well when clients were connecting via TCP
with a high number of connections;
- MaxScale was not running well when clients were connecting via
TCP with a low number of connections.
My experience with network programming quickly pointed me to where the bottleneck could be. This search returns no results:
https://github.com/mariadb-corporation/MaxScale/search?utf8=%E2%9C%93&q=TCP_NODELAY
In other words, MaxScale never disables Nagle's algorithm, which adds latency to any communication with the client. The problem is noticeable only at a low number of connections because, at a high number of connections, the latency introduced by Nagle's algorithm becomes small compared to the overall latency caused by processing multiple clients. For reference:
http://en.wikipedia.org/wiki/Nagle%27s_algorithm
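For anyone unfamiliar with the fix, disabling Nagle's algorithm is a one-line setsockopt() on each client-facing socket; a minimal sketch in C:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on an accepted client socket so that small
   writes (like single result sets) are sent immediately instead of being
   buffered while waiting for an ACK. */
static int disable_nagle(int fd) {
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}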
I will also soon open a bug report against MaxScale.
What I can't understand, and I would appreciate it if someone else could comment on this, is why Nagle's algorithm doesn't seem to have any effect on AWS or in other virtualized environments.
In any case, this is a very interesting example of how software behaves differently on physical hardware and in virtualized environments.
Because MaxScale performs, on average, 5x more slowly at a low number of connections via TCP, the following graphs only use UDS for ProxySQL and MaxScale: the performance of MaxScale over TCP was too low to be worth considering.
Benchmark with 2 worker threads
Because MaxScale performs really badly at a low number of connections via TCP on physical hardware, due to Nagle's algorithm, I decided to run all the next benchmarks connecting to MaxScale and ProxySQL only through UDS. HAProxy is still used for comparison, even if its connections go through TCP sockets.
I know it is not fair to compare the performance of connections via TCP (HAProxy) against connections via UDS (ProxySQL and MaxScale), but HAProxy is included only as a reference.
Average throughput in QPS:
Connections | HAProxy | MaxScale RW | MaxScale RR | ProxySQL RW | ProxySQL FF |
---|---|---|---|---|---|
4 | 14549.61 | 11627.16 | 14185.88 | 13697.03 | 14795.74 |
8 | 27492.41 | 21865.39 | 27863.94 | 25540.61 | 27747.1 |
32 | 81301.32 | 29602.84 | 63553.77 | 62350.89 | 77449.45 |
256 | 109867.66 | 28329.8 | 73751.24 | 81663.75 | 125717.18 |
512 | 105999.84 | 26696.6 | 69488.71 | 81734.18 | 128512.32 |
1024 | 103654.97 | 27340.47 | 63446.61 | 74747.25 | 118992.24 |
Notes with 2 worker threads (for
MaxScale and ProxySQL) or 2 worker processes (HAProxy):
a) once again, for any proxy, performance becomes quite unstable when the number of connections increases. Perhaps this is not a bug in the proxies, but a result of how the kernel schedules processes;
b) up to 32 client connections, the performance with 2 workers is very similar to the performance with 1 worker, no matter the proxy. Each proxy configuration has its own performance profile, but it performs the same with either 1 or 2 workers;
c) maxscale-rw reaches its average peak at 32 connections,
reaching 29.6k reads/s;
d) maxscale-rr reaches its average peak at 256 connections,
reaching 73.8k reads/s;
e) proxysql-rw reaches its average peak at 512 connections,
reaching 81.7k reads/s;
f) proxysql-ff reaches its average peak at 512 connections, reaching 128.5k reads/s.
As already pointed out, with an increased number of connections the performance becomes quite unstable but, as in the workload with just one worker thread, it is easy to spot that:
1) in R/W split mode, ProxySQL can reach a throughput nearly 3 times higher than MaxScale;
2) ProxySQL in Fast Forward mode can reach a throughput 74% higher than MaxScale in Read Connection Router mode;
3) ProxySQL in R/W split mode is faster than MaxScale in simple Read Connection Router mode.
The above confirms what was said previously: ProxySQL uses fewer CPU resources, and is therefore able to scale much better than MaxScale as the number of client connections increases.
Benchmark with 4 worker threads
I ran more benchmarks using 4 worker threads for ProxySQL and MaxScale, and 4 worker processes for HAProxy.
Average throughput in QPS:
Connections | HAProxy | MaxScale RW | MaxScale RR | ProxySQL RW | ProxySQL FF |
---|---|---|---|---|---|
16 | 50258.21 | 41939.8 | 50621.74 | 46265.65 | 51280.99 |
32 | 89501.33 | 50339.81 | 87192.58 | 70321.17 | 85846.94 |
256 | 174666.09 | 52294.7 | 117709.3 | 115056.5 | 183602.6 |
512 | 176398.33 | 46777.17 | 114743.73 | 112982.78 | 188264.03 |
2048 | 157304.08 | 0 | 107052.01 | 102456.38 | 187906.29 |
What happens with 4 worker threads/processes?
a) as with 1 or 2 workers, for any proxy, performance becomes quite unstable when the number of connections increases, but this time the fluctuations seem smoother. Still, ProxySQL seems to be the most stable proxy at a high number of connections;
b) at 32 connections, ProxySQL and HAProxy give similar throughput with either 2 or 4 workers;
c) at 32 connections, MaxScale provides more throughput with 4 workers than with 2 workers, showing that MaxScale needs more CPU power to provide better throughput;
d) at 32 connections, HAProxy, ProxySQL and MaxScale provide similar reads/s when they do not analyze traffic (89.5k, 85.8k and 87.2k);
e) using the R/W functionality, at 16 connections ProxySQL provides 10% more reads/s than MaxScale (46.3k vs 41.9k), and at 32 connections ProxySQL provides 40% more reads/s than MaxScale (70.3k vs 50.3k);
f) MaxScale in R/W mode wasn't able to handle 2048 client connections;
g) maxscale-rw reaches its average peak at 256 connections, with
52.3k reads/s;
h) maxscale-rr reaches its average peak at 256 connections, with
117.7k reads/s;
i) proxysql-rw reaches its average peak at 256 connections, with
115.1k reads/s;
j) proxysql-ff reaches its average peak at 512 connections, with 188.3k reads/s.
A few more notes on scalability with 4 threads:
1) in R/W split mode, ProxySQL can reach a throughput over 2 times higher than MaxScale;
2) ProxySQL in Fast Forward mode can reach a throughput 60% higher than MaxScale in Read Connection Router mode;
3) ProxySQL in R/W split mode is, for the first time, slightly slower than MaxScale in simple Read Connection Router mode (115.1k vs 117.7k).
Note on transport layer load balancing
I consider only the benchmarks related to R/W split important, because only R/W split provides SQL load balancing; HAProxy, ProxySQL with fast forwarding and MaxScale with the readconnroute module do not provide SQL load balancing, but are present in the benchmarks above to give a reference for the overhead caused by processing SQL traffic.
Furthermore, the performance of MaxScale's readconnroute cannot be compared with the performance of HAProxy or ProxySQL. From a user's perspective, I would prefer to use HAProxy because it provides far better performance.
Conclusions
One of the main goals while developing ProxySQL is that it must be a very fast proxy, introducing almost no latency. This goal seems to have been achieved very well: ProxySQL is able to process MySQL traffic with very little overhead, and it is able to scale very well.
In all the benchmarks listed above, ProxySQL scales easily.
In fact, in R/W split mode (highly configurable in ProxySQL, but hardcoded in MaxScale), ProxySQL is able to provide up to 5 times more throughput than MaxScale, depending on the workload.
Since ProxySQL in query processing mode (R/W split) provides more throughput than MaxScale's readconnroute in the majority of cases, I would always use ProxySQL's query processing, which implements important features like query routing, query rewriting, query caching, statistics, connection pooling, etc.
As of today, the only reason why I wouldn't use ProxySQL in production is that ProxySQL is not GA ... yet!