Following my previous post, I got some excellent feedback in the form of comments, tweets and other chats. In no particular order:
- Commenter Tibi noted that mounting with noatime, nodiratime and nobarrier should improve performance.
- Commenter benbradley pointed out a missing flag on some of my sysbench tests which will necessitate re-testing.
- Former co-worker @preston4tw suggests looking at different IO schedulers. For all past tests, I used deadline, which seems to be best, but re-testing with noop could be useful.
- Fellow DBA @kormoc encouraged me to try many smaller partitions to limit the number of concurrent fsyncs.
There seem to be plenty of options here that should allow me to re-try my testing with a slightly more consistent method. The consistent difference seems to be in the file system, EXT4 vs XFS, with XFS performing at about half the speed of EXT4.
The constants for the testing:
- Testing tool: sysbench 0.4.12, using test=fileio, file-test-mode=rndrw (random read-write)
- fsync interval: Every 100 requests
- Read:Write Ratio: 1.5:1
- IO Scheduler: deadline
- Thread count: 1 (to mimic the MySQL SQL thread as closely as possible)
- sysbench testing duration was 300 seconds per run
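As a rough sketch of how the constants above might map onto a sysbench 0.4.x command line, the helper below assembles the argument list. The exact command used for these runs is not shown in this post, so treat the flag mapping as an assumption; `build_sysbench_cmd`, `--max-requests=0` and the optional `total_size` parameter are illustrative choices.

```python
import shlex


def build_sysbench_cmd(io_mode="sync", stage="run", total_size=None):
    """Return a sysbench 0.4.x fileio command as a list of arguments."""
    cmd = [
        "sysbench",
        "--test=fileio",
        "--file-test-mode=rndrw",   # random read-write
        "--file-fsync-freq=100",    # fsync every 100 requests
        "--file-rw-ratio=1.5",      # read:write ratio of 1.5:1
        "--num-threads=1",          # mimic the single MySQL SQL thread
        "--max-time=300",           # 300 seconds per run
        "--max-requests=0",         # assumption: let the time limit end the run
        f"--file-io-mode={io_mode}",
    ]
    if total_size is not None:
        # The test file size is not stated in the post, so it stays optional.
        cmd.append(f"--file-total-size={total_size}")
    cmd.append(stage)               # "prepare", "run" or "cleanup"
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_sysbench_cmd()))
```

Passing `stage="prepare"` before the run and `stage="cleanup"` afterwards would cover the usual fileio workflow.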
First test is to look at whether the additional mount options (noatime, nodiratime, nobarrier) have any impact.
| FS | Size | Mount Options | Transfer/s | Requests/s | Avg/Request | 95%/Request |
| --- | --- | --- | --- | --- | --- | --- |
| xfs | 59T | none | 1.8787Mb/sec | 120.23 | 8.31ms | 18.71ms |
| ext4 | 59T | none | 3.7856Mb/sec | 242.28 | 4.12ms | 9.96ms |
| xfs | 59T | noatime,nodiratime,nobarrier | 1.8287Mb/sec | 117.04 | 8.54ms | 19.21ms |
| ext4 | 59T | noatime,nodiratime,nobarrier | 3.7284Mb/sec | 238.62 | 4.18ms | 10.00ms |
EXT4 is still about twice as fast as XFS, and the additional mount options don’t affect the results in any significant way. The XFS FAQ sheds some light on this.
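Before trusting a comparison like this, it can help to confirm the options are actually active on the mounted volume. The sketch below parses `/proc/mounts`-style text and reports which requested options are missing. The sample lines, device names and mount points are hypothetical, and the exact spelling of the barrier option in `/proc/mounts` can differ by file system and kernel version. On Linux, noatime already covers directory access times, so only noatime and nobarrier are checked by default.

```python
def mount_options(mounts_text, mount_point):
    """Return the set of options for mount_point from /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            return set(fields[3].split(","))
    return set()


def missing_options(mounts_text, mount_point, wanted=("noatime", "nobarrier")):
    """List the wanted options that are not active on mount_point."""
    active = mount_options(mounts_text, mount_point)
    return [opt for opt in wanted if opt not in active]


# Hypothetical sample; on a real host, read the text from /proc/mounts instead.
SAMPLE = (
    "/dev/sdb1 /data1 xfs rw,noatime,attr2,nobarrier,inode64,noquota 0 0\n"
    "/dev/sdc1 /data2 ext4 rw,relatime,data=ordered 0 0\n"
)

if __name__ == "__main__":
    for mp in ("/data1", "/data2"):
        print(mp, "missing:", missing_options(SAMPLE, mp))
```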
Recalling that MySQL 5.5 introduced native Linux asynchronous IO, I discovered that sysbench has a way to set this behavior (--file-io-mode=async), and I thought this might be a good follow-up test.
| FS | Size | Mount Options | Transfer/s | Requests/s | Avg/Request | 95%/Request |
| --- | --- | --- | --- | --- | --- | --- |
| xfs | 59T | none | 17.755Mb/sec | 1136.32 | 0.83ms | 3.44ms |
| ext4 | 59T | none | 31.85Mb/sec | 2038.43 | 0.45ms | 1.89ms |
| xfs | 59T | noatime,nodiratime,nobarrier | 17.823Mb/sec | 1140.68 | 0.84ms | 3.40ms |
| ext4 | 59T | noatime,nodiratime,nobarrier | 31.822Mb/sec | 2036.63 | 0.46ms | 1.89ms |
This shows an incredible gain on both file systems across every metric, but XFS still delivers only a little over half the performance of EXT4.
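To quantify the gain, here is a small sketch that divides the asynchronous requests/s by the synchronous requests/s for each file system. The figures are the 59T, no-mount-option rows from the two tables above; the variable and function names are only for illustration.

```python
# Requests/s on the 59T volumes with no extra mount options.
SYNC_RPS = {"xfs": 120.23, "ext4": 242.28}
ASYNC_RPS = {"xfs": 1136.32, "ext4": 2038.43}


def async_speedup(sync_rps, async_rps):
    """Return async/sync throughput ratio per file system."""
    return {fs: async_rps[fs] / sync_rps[fs] for fs in sync_rps}


if __name__ == "__main__":
    for fs, factor in async_speedup(SYNC_RPS, ASYNC_RPS).items():
        print(f"{fs}: {factor:.1f}x faster with async IO")
```

With these numbers, the speedup works out to roughly 9.5x for XFS and 8.4x for EXT4.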
In my last post, I mentioned that I had inadvertently created a smaller partition. This next test used two 4T partitions instead of two 59T partitions, first with synchronous and then with asynchronous IO:
| FS | Size | IO Mode | Mount Options | Transfer/s | Requests/s | Avg/Request | 95%/Request |
| --- | --- | --- | --- | --- | --- | --- | --- |
| xfs | 4T | sync | noatime,nodiratime,nobarrier | 3.3147Mb/sec | 212.14 | 4.71ms | 11.09ms |
| ext4 | 4T | sync | noatime,nodiratime,nobarrier | 3.8028Mb/sec | 243.38 | 4.10ms | 9.94ms |
| xfs | 4T | async | noatime,nodiratime,nobarrier | 28.597Mb/sec | 1830.24 | 0.51ms | 2.06ms |
| ext4 | 4T | async | noatime,nodiratime,nobarrier | 32.583Mb/sec | 2085.33 | 0.46ms | 1.89ms |
For the first time, XFS came close to EXT4 on synchronous IO (212 vs. 243 requests/s). It still fell short on the asynchronous test, although not by as wide a margin as in previous tests. The simple conclusion here is that as volume size grows, EXT4 performs more consistently than XFS.
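To put a number on that conclusion, this short sketch computes XFS throughput as a fraction of EXT4 for each volume size and IO mode, using the requests/s figures from the rows mounted with noatime,nodiratime,nobarrier. The dictionary layout and function name are only for illustration.

```python
# (file system, volume size, IO mode) -> requests/s, taken from the result tables.
RESULTS = {
    ("xfs", "59T", "sync"): 117.04,
    ("ext4", "59T", "sync"): 238.62,
    ("xfs", "59T", "async"): 1140.68,
    ("ext4", "59T", "async"): 2036.63,
    ("xfs", "4T", "sync"): 212.14,
    ("ext4", "4T", "sync"): 243.38,
    ("xfs", "4T", "async"): 1830.24,
    ("ext4", "4T", "async"): 2085.33,
}


def xfs_vs_ext4(size, mode):
    """Return XFS requests/s as a fraction of EXT4 requests/s."""
    return RESULTS[("xfs", size, mode)] / RESULTS[("ext4", size, mode)]


if __name__ == "__main__":
    for size in ("59T", "4T"):
        for mode in ("sync", "async"):
            print(f"{size} {mode}: XFS at {xfs_vs_ext4(size, mode):.0%} of EXT4")
```

With these figures, XFS lands at roughly half of EXT4 on the 59T volumes and a bit under 90% on the 4T volumes, while the EXT4 numbers barely move between sizes.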
I found this result to be fascinating and wonder if it holds true for other workloads. For example, if you have a large storage box to hold backups, and are writing multiple simultaneous backup streams to it, it could have a significant impact on performance. Seeing that partition size has such a profound impact on XFS also made me wonder if there was something that simply wasn’t tuned correctly. Strangely, the XFS FAQ effectively tells you to use the defaults.
The last set of tests I wanted to try was around the IO scheduler. So far, all the tests used the deadline scheduler. The other choice here is the noop scheduler.
| FS | Size | Mount Options | Transfer/s | Requests/s | Avg/Request | 95%/Request |
| --- | --- | --- | --- | --- | --- | --- |
| xfs | 4T | noatime,nodiratime,nobarrier | 28.778Mb/sec | 1841.79 | 0.50ms | 2.05ms |
| ext4 | 4T | noatime,nodiratime,nobarrier | 32.607Mb/sec | 2086.83 | 0.46ms | 1.88ms |
I didn’t see any clear wins with this change.
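For anyone repeating the scheduler comparison, Linux exposes the scheduler for each block device in `/sys/block/<device>/queue/scheduler`, with the active one shown in square brackets. The sketch below parses that format; the function names and the device name `sdb` are hypothetical, and the sample string is only an example of the file's layout.

```python
def parse_schedulers(text):
    """Split the contents of a queue/scheduler file into (active, available)."""
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]       # the bracketed entry is the active scheduler
            active = name
        available.append(name)
    return active, available


def read_scheduler(device):
    """Read and parse the scheduler file for a block device such as 'sdb'."""
    with open(f"/sys/block/{device}/queue/scheduler") as fh:
        return parse_schedulers(fh.read())


if __name__ == "__main__":
    # Example of the file's layout; a real host would call read_scheduler("sdb").
    print(parse_schedulers("noop [deadline] cfq\n"))
```

Switching schedulers is typically done by writing the desired name (for example `noop`) to the same file as root, which makes it easy to flip between runs.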
With all that tested, it seems like the choice is clear here. Smaller partitions with EXT4 seem like the right way to go for a practical replication test. I will follow up with a (final?) post about the results of a live setup.