I wrote a couple of weeks ago about the dangers of bad cache design. Today I’ve been troubleshooting a production-down case that had a fair number of issues related to how the cache was used.
The situation was as follows. An update to the codebase was deployed and caused performance issues, so it was rolled back, yet the problem remained. This is a very common case: the customer tells you everything is the same as it was yesterday… but it does not work today.
When I hear these words, I like to tell people that computers are state machines and work in a predictable way. If something does not work today the way it worked yesterday, something has changed… you just may not recognize WHAT changed. It may be something as subtle as a change in a query plan or an increase in search engine bot activity. It may be the RAID writeback cache being disabled …