I previously tested InnoDB on an 8-core server to determine how many IOPs it can do for a simple IO-bound workload. The limits were ~12k disk reads/second for MySQL 5.0 and ~18k reads/second for MySQL 5.1. I just repeated the tests using a 16-core server and the results are much better. I can get 20,000 to 30,000 disk reads/second using InnoDB 5.1. InnoDB 5.5 appears to be capable of 50,000 to 70,000 disk reads/second courtesy of support for multiple buffer pools (innodb_buffer_pool_instances). This is possible on benchmarks assuming sufficient concurrency and without unreasonable degradation in response time. This is a best-case result that is unlikely to be achieved in production, but I much prefer a best case of 30,000 disk reads/second instead of 10,000/second.
The test runs 1 to 1024 concurrent threads on a client server. Each thread fetches one row per query from mysqld running on a separate server and there is one query per transaction. The rows are fetched using HANDLER and that requires sysbench from Launchpad to get the --oltp-point-select-mysql-handler option. This is used to minimize the overhead from processing SQL statements as I am trying to determine the best case.
The database server has 16 cores, lots of RAM and a very fast PCI-based flash device. I ran tests with the database on tmpfs and on the flash device. I tested four MySQL binaries:
- 5084forever - MySQL 5.0.84 with the Facebook patch
- 5147orig - unmodified MySQL 5.1.47
- 5147dev - MySQL 5.1.47 with the Facebook patch. Tests were repeated for 16kb and 8kb database pages.
- 554m3 - MySQL 5.5.4m3 with support for innodb_buffer_pool_instances
The default my.cnf settings for the tests are listed below. The InnoDB database file was about 30G and could not be cached in a 4G InnoDB buffer pool. The test queries fetched primary keys at random (--oltp-dist-type=uniform). Assuming a reasonable RNG then about 4/30 of the fetches will hit in the InnoDB buffer pool and 26/30 of the fetches require a disk read.
[mysqld] innodb_buffer_pool_size=4000M innodb_flush_method=O_DIRECT innodb_thread_concurrency=0 max_connections=2000 table_cache=2000 thread_cache_size=2000
The command line to setup and run the tests:
# setup the test sysbench --test=oltp --oltp-table-size=128000000 --max-time=60 --max-requests=0 \ --oltp-read-only --oltp-skip-trx --oltp-test-mode=simple \ --mysql-table-engine=innodb --db-ps-mode=disable --mysql-engine-trx=yes \ --oltp-point-select-mysql-handler --oltp-dist-type=uniform --oltp-range-size=1 \ --num-threads=1 --seed-rng=1 prepare # run the tests using nt = 1, 2, 4, ..., 1024 sysbench --test=oltp --oltp-table-size=128000000 --max-time=60 --max-requests=0 \ --mysql-table-engine=innodb --db-ps-mode=disable --mysql-engine-trx=yes \ --oltp-read-only --oltp-skip-trx --oltp-test-mode=simple \ --oltp-point-select-mysql-handler --oltp-dist-type=uniform --oltp-range-size=1 \ --num-threads=$nt --seed-rng=1 run
Results in TPS for 1 to 1024 concurrent threads using sysbench and the flash device. A transaction is one HANDLER statement that fetches one row by primary key value. Here is a graph of the results. The peak is about 30,000 TPS which means that InnoDB did almost 30,000 reads/second from the flash device. The peak in r/s from iostat is about 28,000. That is very nice.
1025 2328 4391 8634 15523 17647 15937 16071 14811 10784 2210 5084forever 898 1763 3424 8522 16334 21271 29173 32480 33101 31651 13625 5147dev 1119 1913 4574 8992 16424 23142 26383 27910 29065 28157 19063 5147orig 1072 1901 4301 8603 16785 24307 27353 27490 27666 25483 9238 554m3
95th percentile response time in microseconds for the same test. Results begin to degrade at 64 concurrent threads. Here is a graph of the results.
1220 1300 1340 1760 1920 3260 7820 16460 37000 98550 916560 5084forever 2040 2070 2090 1860 1880 2570 4120 8820 21110 52370 207780 5147dev 1940 2000 1150 1790 1800 2330 4380 8540 17400 37680 130920 5147orig 1930 2000 1890 1840 1780 2240 4370 8710 18660 43410 255380 554m3
I repeated tests using the 5147dev binary compiled to use 8kb pages. The table below lists TPS for 1 to 1024 concurrent threads using 8kb and 16kb pages. That required a few changes to InnoDB source code. Here is a graph of the results.
898 1763 3424 8522 16334 21271 29173 32480 33101 31651 13625 5147dev-16kb 850 2271 4391 9307 19095 32292 35026 37155 35605 30107 13577 5147dev-8kb
And 95th percentile response time in microseconds for 8kb versus 16kb pages with a graph of the results.
2040 2070 2090 1860 1880 2570 4120 8820 21110 52370 207780 5147dev-16kb 2020 1970 1190 1170 1160 1650 3760 7560 17130 40870 209160 5147dev-8kb
Finally, I wanted to understand whether InnoDB was limited by the speed of the storage device so I repeated tests using tmpfs. Two results are reported for the 554m3 binary: one with innodb_buffer_pool_instances=1 and one with innodb_buffer_pool_instances=4. The results with innodb_buffer_pool_instances=4 are much better. This is a graph of the results. The table lists TPS for 1 to 1024 concurrent threads.
1110 2482 4410 10513 22573 53237 77219 69870 59835 54468 13918 554m3-instances=4 1327 2191 4715 9091 21857 30569 34201 33589 32015 28815 11291 554m3-instances=1 1503 2258 4470 9527 18941 17897 16851 16442 14437 11745 2218 5084forever 1305 2575 4452 10480 21445 34666 37879 39325 37539 32054 14425 5147dev 1183 2344 4829 9900 21792 31378 34844 35192 35281 33635 25023 5147orig
And 95th percentile response time in microseconds:
1100 1230 1660 1100 1100 1060 1840 4600 9580 25720 198250 554m3-instances=4 1110 1900 1660 1250 1100 1900 3770 8440 18900 42610 226220 554m3-instances=1 1040 1110 1220 1710 1280 3210 7920 17300 39510 92520 927600 5084forever 1070 1100 1630 1130 1140 1720 3620 7930 17190 39890 205620 5147dev 1640 1080 1100 1130 1120 1850 3670 7920 16390 34290 110850 5147orig
Finally, this table lists the reads/second I get from sysbench --test=fileio (no mysql) using the flash device for 8kb and 16kb reads with 1 to 1024 concurrent threads:
6069 12620 18976 21127 24377 28941 29230 29457 29645 29587 29470 flash-16kb 6758 13329 22636 27092 30897 40853 42786 43206 43296 43244 43100 flash-8kb # sysbench command line bs=8192; sec=60; for nt in 1 2 4 8 16 32 64 128 256 512 1024 ; do sysbench --test=fileio --file-num=1 --file-total-size=28G --file-test-mode=rndrd \ --file-io-mode=sync --file-fsync-freq=0 --file-extra-flags=direct \ --num-threads=$nt --max-requests=0 --max-time=$sec \ --file-block-size=$bs run; done