More EXT4 vs XFS IO Testing

Following my previous post, I got some excellent feedback in the form of comments, tweets, and other chat. In no particular order:

  • Commenter Tibi noted that mounting with noatime, nodiratime, and nobarrier should all improve performance.
  • Commenter benbradley pointed out a missing flag on some of my sysbench tests which will necessitate re-testing.
  • Former co-worker @preston4tw suggested looking at different IO schedulers. In all previous tests I used deadline, which seems to be the best choice, but re-testing with noop could be useful.
  • Fellow DBA @kormoc encouraged me to try many smaller partitions to limit the number of concurrent fsyncs.

There are plenty of options here that should allow me to re-try my testing with a more consistent method. The one constant so far has been the file system difference, EXT4 vs XFS, with XFS performing at about half the speed of EXT4.

The constants for the testing:

  • Testing tool: sysbench 0.4.12, using --test=fileio --file-test-mode=rndrw (random read/write)
  • fsync interval: Every 100 requests
  • Read:Write Ratio: 1.5:1
  • IO Scheduler: deadline
  • Thread count: 1 (to mimic the MySQL SQL thread as closely as possible)
  • Test duration: 300 seconds per run
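Putting those constants together, the synchronous baseline runs looked roughly like this. This is a sketch: the file count and total size are illustrative placeholders, since the originals aren't listed above.

```shell
# Hypothetical file layout -- --file-num and --file-total-size are
# illustrative, not the values used in the actual tests.
sysbench --test=fileio --file-num=128 --file-total-size=8G prepare

# rndrw = random read/write, fsync every 100 requests, 1.5:1 read:write
# ratio, a single thread (to mimic the MySQL SQL thread), 300s per run.
sysbench --test=fileio --file-test-mode=rndrw --file-fsync-freq=100 \
    --file-rw-ratio=1.5 --num-threads=1 --max-time=300 \
    --max-requests=0 run

# Remove the test files afterwards.
sysbench --test=fileio cleanup
```

Setting --max-requests=0 removes the request cap so the run is bounded by --max-time alone.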

First test is to look at whether the additional mount options (noatime, nodiratime, nobarrier) have any impact.

FS    Size  Mount Options                 Transfer/s    Requests/s  Avg/Request  95%/Request
xfs   59T   none                          1.8787Mb/sec  120.23      8.31ms       18.71ms
ext4  59T   none                          3.7856Mb/sec  242.28      4.12ms       9.96ms
xfs   59T   noatime,nodiratime,nobarrier  1.8287Mb/sec  117.04      8.54ms       19.21ms
ext4  59T   noatime,nodiratime,nobarrier  3.7284Mb/sec  238.62      4.18ms       10.00ms

EXT4 is still proving to be twice as fast as XFS, and the additional mount options don't impact the results in any significant way; the XFS FAQ sheds some light on why.
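For reference, the noatime, nodiratime, and nobarrier options can be applied either in /etc/fstab or with a live remount. The device and mount point below are hypothetical:

```shell
# /etc/fstab entry (hypothetical device and mount point):
#   /dev/sdb1  /data  xfs  noatime,nodiratime,nobarrier  0  0

# Or apply to an already-mounted file system without unmounting:
mount -o remount,noatime,nodiratime,nobarrier /data

# Verify the options actually in effect:
grep /data /proc/mounts
```

Checking /proc/mounts is worthwhile because some kernels silently ignore options they don't support.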

Recalling that MySQL 5.5 introduced native Linux asynchronous IO, and having discovered that sysbench can exercise this behavior (--file-io-mode=async), I thought this would make a good follow-up test.
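Only one flag changes relative to the synchronous runs; a sketch, with all other flags matching the constants listed above:

```shell
# Same random read/write workload, but using Linux native AIO
# instead of synchronous IO.
sysbench --test=fileio --file-test-mode=rndrw --file-io-mode=async \
    --file-fsync-freq=100 --file-rw-ratio=1.5 --num-threads=1 \
    --max-time=300 --max-requests=0 run
```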

FS    Size  Mount Options                 Transfer/s    Requests/s  Avg/Request  95%/Request
xfs   59T   none                          17.755Mb/sec  1136.32     0.83ms       3.44ms
ext4  59T   none                          31.85Mb/sec   2038.43     0.45ms       1.89ms
xfs   59T   noatime,nodiratime,nobarrier  17.823Mb/sec  1140.68     0.84ms       3.40ms
ext4  59T   noatime,nodiratime,nobarrier  31.822Mb/sec  2036.63     0.46ms       1.89ms

This shows an incredible gain across the board on both file systems, but XFS still delivers half the performance of EXT4.

In my last post, I mentioned that I had inadvertently created a smaller partition. This next test used two 4TB partitions instead of two 59TB partitions, first with the synchronous and then the asynchronous IO mode:

FS    Size  IO Mode  Mount Options                 Transfer/s    Requests/s  Avg/Request  95%/Request
xfs   4T    sync     noatime,nodiratime,nobarrier  3.3147Mb/sec  212.14      4.71ms       11.09ms
ext4  4T    sync     noatime,nodiratime,nobarrier  3.8028Mb/sec  243.38      4.10ms       9.94ms
xfs   4T    async    noatime,nodiratime,nobarrier  28.597Mb/sec  1830.24     0.51ms       2.06ms
ext4  4T    async    noatime,nodiratime,nobarrier  32.583Mb/sec  2085.33     0.46ms       1.89ms

For the first time, XFS came close to EXT4 on synchronous IO, but it still fell short on the asynchronous test, albeit not by as wide a margin as in previous tests. The simple conclusion here is that as volume size grows, EXT4 performs more consistently than XFS.

I found this result to be fascinating and wonder if it holds true for other workloads. For example, if you have a large storage box to hold backups, and are writing multiple simultaneous backup streams to it, it could have a significant impact on performance. Seeing that partition size has such a profound impact on XFS also made me wonder if there was something that simply wasn’t tuned correctly. Strangely, the XFS FAQ effectively tells you to use the defaults.

The last set of tests I wanted to try was around the IO scheduler. So far, all tests have used the deadline scheduler; the other candidate here is noop.
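Switching schedulers is a runtime sysfs change, no remount or reboot needed. The device name below is hypothetical:

```shell
# Show the available schedulers; the active one appears in brackets,
# e.g. "noop [deadline] cfq".
cat /sys/block/sdb/queue/scheduler

# Switch this device to noop (requires root; takes effect immediately).
echo noop > /sys/block/sdb/queue/scheduler
```

The change only persists until reboot; making it permanent typically means a kernel boot parameter (elevator=noop) or a udev rule.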

FS    Size  Mount Options                 Transfer/s    Requests/s  Avg/Request  95%/Request
xfs   4T    noatime,nodiratime,nobarrier  28.778Mb/sec  1841.79     0.50ms       2.05ms
ext4  4T    noatime,nodiratime,nobarrier  32.607Mb/sec  2086.83     0.46ms       1.88ms

I didn’t see any clear wins with this change.

With all that tested, the choice seems clear: smaller partitions with EXT4 are the right way to go for a practical replication test. I will follow up with a (final?) post on the results of a live setup.