From time to time we have to work with live environments and production databases. For some of us this is day-to-day job. And most of the time cost of a mistake is way higher than expected improvement especially on the databases. Because issue on the database side will affect everything else.
I heard enough war stories about ruined productions and can imagine well enough speed of DROP DATABASE command replicating across the cluster. So I’m scared to make changes in production. The more loss expected if things go wrong the more I’m going to be scared planning every change. But I still love to make improvements so the only question is how to make them safer.
This post is not intended to be a guide or best practices on how to avoid issues at all, it’s more invitation to discussion that started between me and @randomsurfer in twitter on how to avoid production failures. …[Read more]