ext4 vs xfs on SSD

As ext4 is the de facto standard filesystem on many modern Linux systems, I get a lot of questions about whether it is a good choice for SSDs, or whether something else (e.g. xfs) should be used.
Traditionally our recommendation has been xfs, and it comes down to a known problem in ext3, where IO gets serialized per inode in O_DIRECT mode (see, for example, Domas’s post).

However, the results of my recent benchmarks made me feel this should be revisited.
While I am still running experiments, I would like to share the early results I have so far.

I used a 200GB SLC SATA SSD from STEC (my thanks to STEC for providing the drives).

What I see is that ext4 still has a problem with O_DIRECT. Here are the results for the “single file” O_DIRECT case (sysbench fileio, 16 KiB block size, random write workload; the full script is at the end of the post):

  • ext4 1 thread: 87 MiB/sec
  • ext4 4 threads: 74 MiB/sec
  • xfs 4 threads: 97 MiB/sec

The drop in performance with 4 threads on ext4 is a signal that there are still contention issues.

I was pointed to the ext4 mount option dioread_nolock, which supposedly fixes this, but that option is not available in my CentOS 6.2 kernel, so I could not test it.
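
On a kernel that does support it, dioread_nolock is just a mount option; a minimal sketch (the device name here is only an example, check your kernel’s ext4 documentation before relying on it):

mount -o dioread_nolock /dev/sdb1 /mnt/stec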

At this point we may decide that xfs is still preferable, but there is one more point to consider.

Starting with MySQL 5.1 + InnoDB-plugin, and later MySQL 5.5 (and equally Percona Server 5.1 and 5.5), InnoDB uses “asynchronous” IO on Linux.
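
In MySQL 5.5 (and Percona Server 5.5) you can check whether native Linux AIO is actually enabled; a quick way, assuming a local server and the mysql client on the path:

mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_use_native_aio'"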

Let’s test “async” mode in sysbench, and now we get:

  • ext4 4 threads: 120 MiB/sec
  • xfs 4 threads: 97 MiB/sec

This corresponds to the results I see when running MySQL benchmarks (to be published later) on ext4 vs xfs.
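
A sysbench invocation for the “async” case would look roughly like the sync-mode one in the script at the end of the post, with --file-io-mode switched to async (a sketch, not necessarily the exact command I ran), e.g. for the 100GB file and 4 threads:

sysbench --test=fileio --file-total-size=100G --file-test-mode=rndwr \
    --max-time=3600 --max-requests=0 --num-threads=4 --rand-init=on \
    --file-num=1 --file-extra-flags=direct --file-fsync-freq=0 \
    --file-io-mode=async --file-block-size=16384 --report-interval=10 run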

Actually, the number of threads does not affect the result significantly. This relates to another question I was asked, namely: “If MySQL 5.5 uses async IO, is innodb_write_io_threads still important?” It seems it is not: in my tests it does not affect the final result. I would still use a value of 2 or 4, to avoid scheduling overhead on a single thread, but it does not seem critical.
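
For reference, a my.cnf fragment with the settings touched on in this post might look like this (the values are only illustrative, not a recommendation from these benchmarks; innodb_use_native_aio is ON by default in 5.5):

[mysqld]
innodb_flush_method     = O_DIRECT
innodb_use_native_aio   = 1
innodb_write_io_threads = 4
innodb_read_io_threads  = 4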

In conclusion, ext4 looks like a good option, providing about 20% better throughput in the async IO case. I am still going to run more benchmarks to get a fuller picture.

The script for the tests (the sync-mode, O_DIRECT variant):

#!/bin/bash
# 16KiB random-write sysbench fileio benchmark on a single test file.
# Run as root: writing to /proc/sys/vm/drop_caches requires it.

for size in 100
do
    cd /mnt/stec
    # create the test file, then flush caches before the run
    sysbench --test=fileio --file-num=1 --file-total-size=${size}G prepare
    sync
    echo 3 > /proc/sys/vm/drop_caches

    for numthreads in 4
    do
        sysbench --test=fileio --file-total-size=${size}G --file-test-mode=rndwr \
            --max-time=3600 --max-requests=0 --num-threads=$numthreads --rand-init=on \
            --file-num=1 --file-extra-flags=direct --file-fsync-freq=0 \
            --file-io-mode=sync --file-block-size=16384 --report-interval=10 run \
            | tee -a run$size.thr$numthreads.txt
    done
done
