Many people think buffered writes (write()/pwrite()) are fast
because they do not access the disk. This is not always true.
A buffered write sometimes accesses the disk itself, or waits for
disk accesses issued by other threads. Here are three common cases
where write() takes longer than expected (i.e., causes stalls).
1. Read Modify Write
Consider the following logic: open aaa.dat without O_DIRECT/O_SYNC,
write 1000 bytes sequentially 100,000 times, then flush with fsync().
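int fd = open("aaa.dat", O_WRONLY);  /* no O_DIRECT/O_SYNC: buffered I/O */
for (int i = 0; i < 100000; i++) {
    write(fd, buf, 1000);            /* 1000-byte buffered write */
}
fsync(fd);                           /* flush dirty pages to disk */
close(fd);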
You might expect each write() to finish quickly (well under
0.1 ms) because it should not touch the disk. That is not
always true.
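One way to check this on your own system is to time every write() individually. The sketch below is a hypothetical helper (time_buffered_writes and struct write_stats are illustrative names, not from any library) that runs the same kind of loop with clock_gettime(CLOCK_MONOTONIC) around each call, recording the slowest write and how many exceeded a threshold such as 0.1 ms. It assumes a POSIX/Linux system and that the target file already exists.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result type for one timing run. */
struct write_stats {
    long writes;     /* successful write() calls */
    int64_t max_ns;  /* slowest single write(), in nanoseconds */
    long slow;       /* writes slower than the threshold */
};

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: time each buffered write() of `size` bytes, `n`
 * times, then fsync(). The file must already exist (no O_CREAT).
 * Returns 0 on success, -1 on any error. */
static int time_buffered_writes(const char *path, size_t size, long n,
                                int64_t threshold_ns, struct write_stats *st)
{
    char *buf = malloc(size);
    if (buf == NULL)
        return -1;
    memset(buf, 'x', size);

    int fd = open(path, O_WRONLY); /* buffered: no O_DIRECT / O_SYNC */
    if (fd < 0) {
        free(buf);
        return -1;
    }

    memset(st, 0, sizeof(*st));
    for (long i = 0; i < n; i++) {
        int64_t t0 = now_ns();
        ssize_t r = write(fd, buf, size);
        int64_t dt = now_ns() - t0;
        if (r != (ssize_t)size) { /* short write treated as error for simplicity */
            close(fd);
            free(buf);
            return -1;
        }
        st->writes++;
        if (dt > st->max_ns)
            st->max_ns = dt;
        if (dt > threshold_ns)
            st->slow++;
    }

    int rc = fsync(fd); /* flush dirty pages; not included in the per-write timings */
    close(fd);
    free(buf);
    return rc == 0 ? 0 : -1;
}
```

For example, time_buffered_writes("aaa.dat", 1000, 100000, 100000, &st) mirrors the loop above with a 0.1 ms threshold. Read-modify-write stalls are only likely when aaa.dat already holds data that is not in the page cache; a freshly created, empty file will not exercise that path.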
The operating system manages I/O in units of pages, which are 4KB
in most Linux environments. If you modify 1000 bytes of a 4KB page
starting at offset 0, Linux first needs to read the 4KB …