There are number of people recently blogging about MySQL automated failover, based on production incident which GitHub disclosed.
Here is my take on it. When we look at systems providing high availability we can identify 2 cases of system breaking down. First is when the system itself has a bug or limitations which does not allow it to take the right decision. Second is the configuration issue – which can be hardware configuration such as redundant network paths, STONITH, as well as things like various timeouts to know the difference between just transient errors or performance problem and real problem.
To be truly …
[Read more]