How many IOPs can InnoDB do?

I previously tested InnoDB on an 8-core server to determine how many IOPs it can do for a simple IO-bound workload. The limits were ~12k disk reads/second for MySQL 5.0 and ~18k reads/second for MySQL 5.1. I just repeated the tests using a 16-core server and the results are much better. I can get 20,000 to 30,000 disk reads/second using InnoDB 5.1. InnoDB 5.5 appears to be capable of 50,000 to 70,000 disk reads/second courtesy of support for multiple buffer pools (innodb_buffer_pool_instances). This is possible on benchmarks assuming sufficient concurrency and without unreasonable degradation in response time. This is a best-case result that is unlikely to be achieved in production, but I much prefer a best case of 30,000 disk reads/second instead of 10,000/second.

The test runs 1 to 1024 concurrent threads on a client server. Each thread fetches one row per query from mysqld running on a separate server and there is one query per transaction. The rows are fetched using HANDLER and that requires sysbench from Launchpad to get the --oltp-point-select-mysql-handler option. This is used to minimize the overhead from processing SQL statements as I am trying to determine the best case.

The database server has 16 cores, lots of RAM and a very fast PCI-based flash device. I ran tests with the database on tmpfs and on the flash device. I tested four MySQL binaries:

The default my.cnf settings for the tests are listed below. The InnoDB database file was about 30G and could not be cached in a 4G InnoDB buffer pool. The test queries fetched primary keys at random (--oltp-dist-type=uniform). Assuming a reasonable RNG then about 4/30 of the fetches will hit in the InnoDB buffer pool and 26/30 of the fetches require a disk read.

[mysqld]
innodb_buffer_pool_size=4000M
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=0
max_connections=2000
table_cache=2000
thread_cache_size=2000

The command line to setup and run the tests:

# setup the test
sysbench --test=oltp --oltp-table-size=128000000 --max-time=60 --max-requests=0 \
    --oltp-read-only --oltp-skip-trx --oltp-test-mode=simple \
    --mysql-table-engine=innodb --db-ps-mode=disable --mysql-engine-trx=yes \
    --oltp-point-select-mysql-handler --oltp-dist-type=uniform --oltp-range-size=1 \
    --num-threads=1 --seed-rng=1 prepare

# run the tests using nt = 1, 2, 4, ..., 1024
sysbench --test=oltp --oltp-table-size=128000000 --max-time=60 --max-requests=0 \
    --mysql-table-engine=innodb --db-ps-mode=disable --mysql-engine-trx=yes \
    --oltp-read-only --oltp-skip-trx --oltp-test-mode=simple \
    --oltp-point-select-mysql-handler --oltp-dist-type=uniform --oltp-range-size=1 \
    --num-threads=$nt --seed-rng=1 run

Results in TPS for 1 to 1024 concurrent threads using sysbench and the flash device. A transaction is one HANDLER statement that fetches one row by primary key value. Here is a graph of the results. The peak is about 30,000 TPS which means that InnoDB did almost 30,000 reads/second from the flash device. The peak in r/s from iostat is about 28,000. That is very nice.

  1025   2328   4391   8634  15523  17647  15937  16071  14811  10784   2210 5084forever
   898   1763   3424   8522  16334  21271  29173  32480  33101  31651  13625 5147dev
  1119   1913   4574   8992  16424  23142  26383  27910  29065  28157  19063 5147orig
  1072   1901   4301   8603  16785  24307  27353  27490  27666  25483   9238 554m3

95th percentile response time in microseconds for the same test. Results begin to degrade at 64 concurrent threads. Here is a graph of the results.

  1220   1300   1340   1760   1920   3260   7820  16460  37000  98550 916560 5084forever
  2040   2070   2090   1860   1880   2570   4120   8820  21110  52370 207780 5147dev
  1940   2000   1150   1790   1800   2330   4380   8540  17400  37680 130920 5147orig
  1930   2000   1890   1840   1780   2240   4370   8710  18660  43410 255380 554m3

I repeated tests using the 5147dev binary compiled to use 8kb pages. The table below lists TPS for 1 to 1024 concurrent threads using 8kb and 16kb pages. That required a few changes to InnoDB source code. Here is a graph of the results.

   898   1763   3424   8522  16334  21271  29173  32480  33101  31651  13625 5147dev-16kb
   850   2271   4391   9307  19095  32292  35026  37155  35605  30107  13577 5147dev-8kb

And 95th percentile response time in microseconds for 8kb versus 16kb pages with a graph of the results.

  2040   2070   2090   1860   1880   2570   4120   8820  21110  52370 207780 5147dev-16kb
  2020   1970   1190   1170   1160   1650   3760   7560  17130  40870 209160 5147dev-8kb

Finally, I wanted to understand whether InnoDB was limited by the speed of the storage device so I repeated tests using tmpfs. Two results are reported for the 554m3 binary: one with innodb_buffer_pool_instances=1 and one with innodb_buffer_pool_instances=4. The results with innodb_buffer_pool_instances=4 are much better. This is a graph of the results. The table lists TPS for 1 to 1024 concurrent threads.

  1110   2482   4410  10513  22573  53237  77219  69870  59835  54468  13918 554m3-instances=4
  1327   2191   4715   9091  21857  30569  34201  33589  32015  28815  11291 554m3-instances=1
  1503   2258   4470   9527  18941  17897  16851  16442  14437  11745   2218 5084forever
  1305   2575   4452  10480  21445  34666  37879  39325  37539  32054  14425 5147dev
  1183   2344   4829   9900  21792  31378  34844  35192  35281  33635  25023 5147orig

And 95th percentile response time in microseconds:

  1100   1230   1660   1100   1100   1060   1840   4600   9580  25720 198250 554m3-instances=4
  1110   1900   1660   1250   1100   1900   3770   8440  18900  42610 226220 554m3-instances=1
  1040   1110   1220   1710   1280   3210   7920  17300  39510  92520 927600 5084forever
  1070   1100   1630   1130   1140   1720   3620   7930  17190  39890 205620 5147dev
  1640   1080   1100   1130   1120   1850   3670   7920  16390  34290 110850 5147orig

Finally, this table lists the reads/second I get from sysbench --test=fileio (no mysql) using the flash device for 8kb and 16kb reads with 1 to 1024 concurrent threads:

6069 12620 18976 21127 24377 28941 29230 29457 29645 29587 29470  flash-16kb
6758 13329 22636 27092 30897 40853 42786 43206 43296 43244 43100  flash-8kb

# sysbench command line
bs=8192; sec=60; for nt in 1 2 4 8 16 32 64 128 256 512 1024 ; do
sysbench --test=fileio --file-num=1 --file-total-size=28G --file-test-mode=rndrd \
    --file-io-mode=sync --file-fsync-freq=0 --file-extra-flags=direct \
    --num-threads=$nt --max-requests=0 --max-time=$sec \
    --file-block-size=$bs run; done