Peter Zaitsev wrote about the importance of single-threaded performance and
expressed concern that there might be regressions in MySQL 5.6.
Not much has been published on this, so I repeated the sysbench tests that I
previously ran for high-concurrency workloads, using both IO-bound and cached
configurations. These tests used sysbench
with 128M rows in one table. I compared MySQL 5.6.10 (orig5610
below), MySQL 5.1.63 (orig5163 below) and MySQL 5.1.63 with the
Facebook patch (fb5163 below). For all tests the sysbench clients
ran on the same host as mysqld. Each query fetches
one row by primary key.
My performance summary is:
- performance_schema reduced peak performance by about 7.5% (4682 vs 5063 QPS) for the IO-bound test. I assume the overhead of the PS is only an issue for very fast queries, and queries are fast in my test because the storage is fast (100us reads). However, the table_stats and user_stats features from the Facebook patch provide a lot of useful statistics without that overhead. I hope the PS can be made as efficient.
- performance_schema did not reduce peak performance for the cached test. But in that case MySQL 5.6 was always about 9% worse in peak QPS than the others. I have yet to identify the source of the overhead.
- innodb_checksum_algorithm=CRC32 improves performance when storage is fast.
For the IO-bound test a 4GB InnoDB buffer pool was used and the database file was ~29GB. Storage was fast, with an average latency of 100 microseconds per 16KB page read. orig5163 with innodb_checksums=1 is slower because MySQL 5.1.63 doesn't use x86 instructions to make InnoDB checksum validation faster. For orig5610 I used innodb_checksum_algorithm=CRC32, and fb5163 does something similar. The performance schema still has a big cost: about 7.5% of peak QPS is lost when it is enabled with default options.
- 5063 QPS - orig5610, performance_schema not compiled
- 5012 QPS - fb5163
- 4913 QPS - orig5163, innodb_checksums=0
- 4682 QPS - orig5610, performance_schema=ON, default options
- 4342 QPS - orig5163, innodb_checksums=1
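The relative costs can be checked directly from these QPS numbers. The short Python sketch below computes each configuration's drop versus the fastest one, plus the mean time per query. The time-per-query figure ASSUMES a single sysbench client thread (so that 1/QPS approximates latency); the client count is not stated in the results above. Under that assumption, enabling the performance schema adds roughly 16 microseconds per query, and innodb_checksums=1 costs orig5163 about 11.6% of its QPS.

```python
# IO-bound sysbench results from the list above (QPS).
IO_BOUND_QPS = {
    "orig5610, performance_schema not compiled": 5063,
    "fb5163": 5012,
    "orig5163, innodb_checksums=0": 4913,
    "orig5610, performance_schema=ON, default options": 4682,
    "orig5163, innodb_checksums=1": 4342,
}


def pct_drop(base_qps: float, other_qps: float) -> float:
    """Percentage of base_qps lost when throughput falls to other_qps."""
    return 100.0 * (base_qps - other_qps) / base_qps


def usec_per_query(qps: float) -> float:
    """Mean microseconds per query, ASSUMING a single client thread."""
    return 1e6 / qps


if __name__ == "__main__":
    best = max(IO_BOUND_QPS.values())
    for name, qps in IO_BOUND_QPS.items():
        print(f"{name:50s} {qps:5d} QPS {pct_drop(best, qps):5.1f}% below best "
              f"{usec_per_query(qps):6.1f} us/query")
    ps_on = IO_BOUND_QPS["orig5610, performance_schema=ON, default options"]
    ps_off = IO_BOUND_QPS["orig5610, performance_schema not compiled"]
    print(f"PS cost: {pct_drop(ps_off, ps_on):.1f}% of QPS, "
          f"{usec_per_query(ps_on) - usec_per_query(ps_off):.1f} us/query")
```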
For the cached tests a 64GB InnoDB buffer pool was used, and the table was read into the buffer pool via a SELECT statement prior to the benchmark. The adaptive hash index was not warm before the test, so most queries spent some time updating it.
- 11940 QPS - orig5163
- 11901 QPS - fb5163
- 10875 QPS - orig5610, performance_schema not compiled
- 10860 QPS - orig5610, performance_schema with default options
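For the cached results, a short Python sketch separates the two effects noted in the summary: the gap between MySQL 5.1.63 and 5.6.10, and the cost of the performance schema. The 5.6 gap is about 8.9% of QPS, while enabling the PS costs only about 0.14%. The microseconds-per-query line ASSUMES a single client thread, which is not stated in the results; under that assumption 5.6 spends roughly 8 extra microseconds per query.

```python
# Cached sysbench results from the list above (QPS).
CACHED_QPS = {
    "orig5163": 11940,
    "fb5163": 11901,
    "orig5610, performance_schema not compiled": 10875,
    "orig5610, performance_schema with default options": 10860,
}


def pct_drop(base_qps: float, other_qps: float) -> float:
    """Percentage of base_qps lost when throughput falls to other_qps."""
    return 100.0 * (base_qps - other_qps) / base_qps


if __name__ == "__main__":
    v51 = CACHED_QPS["orig5163"]
    v56 = CACHED_QPS["orig5610, performance_schema not compiled"]
    v56_ps = CACHED_QPS["orig5610, performance_schema with default options"]
    # MySQL 5.6 vs 5.1 with no performance_schema involved.
    print(f"5.6 vs 5.1: {pct_drop(v51, v56):.1f}% fewer QPS")
    # Extra cost of enabling performance_schema in 5.6.
    print(f"PS cost:    {pct_drop(v56, v56_ps):.2f}% fewer QPS")
    # Extra time per query, ASSUMING one client thread so time = 1/QPS.
    print(f"5.6 extra:  {1e6 / v56 - 1e6 / v51:.1f} us/query")
```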