Have just read INSERT, Don’t DELETE by Aaron Brown, and have some lengthy response, which is why I write this post instead of commenting on said post.
I wish to offer my counter thought and suggest that DELETEs are probably the better choice.
Aaron suggests that, when one wishes to purge rows from some table, a trick can be used: instead of DELETEing unwanted rows, one can INSERT "good" rows into a new table, then switch over with RENAME (but please read referenced post for complete details).
I respectfully disagree on several points discussed.
Lockdown
The fact one needs to block writes during the time of creation of new table is problematic: you need to essentially turn off parts of your application. The posts suggests one could use a slave - but this solution is far from being trivial as well. To switch over, you yet again need to turn off access to DB, even if for a short while.
A switch over to a slave is quite a big deal, in my opinion, for the mere purpose of deletion of rows.
DELETEs are easy
The DELETEs are so much easier: the first thing to note is the following: You don't actually have to delete all the rows *at once*.
You just need to drop some rows, right? Why waste a huge transaction that takes minutes, when you can drop the rows by chunks, one at a time?
For that, you can use either pt-archive from Percona Toolkit, oak-chunk-update from openark-kit, or write a simple QueryScript code with common_schema:
while (DELETE FROM title WHERE title <= 'g' LIMIT 1000)
{
throttle 1;
}
So, drop 1,000 rows or so at a time, then sleep some time, etc. The total runtime is longer, but who cares? The impact can be reduced to be unnoticeable.
Space reclaim
You can use online table operations to rebuild your table and reclaim the disk space. Either see oak-online-alter-table or pt-online-schema-change. Again, both work in small chunks, so no long stalls.
But more on this: my usual purge scenario shows that it is repetitive. You purge, data fills again, you purge again, and so on.
Which is why it doesn't make much sense to rebuild the table and reclaim the disk space: it just grows again to roughly same dimensions.
For a one time operation (e.g. after neglect of cleanup for long time) -- yes, absolutely, do a rebuild and reclaim. For repetitive cleanup - I don't bother.
Conclusion
Aaron does make note at the end of his post that DELETE operations can be done online, while the INSERT trick requires downtime, and this is a fair assessment.
But just to make a point: none of the DELETE timings are interesting. Since we are not concerned with deleting the rows in a given time (no "press the red button"), we can spread them over time and make the impact negligible. So not only is everything done online, it also goes unnoticed by the user. And this, I believe, is the major thing to consider.