Automatic failover for MySQL Replication has been subject to debate for many years.
Is it a good thing or a bad thing?
For those with long memory in the MySQL world, they might remember the GitHub outage in 2012 which was mainly caused by software taking the wrong decisions.
GitHub had then just migrated to a combo of MySQL Replication, Corosync, Pacemaker and Percona Replication Manager. PRM decided to do a failover after failing health checks on the master, which was overloaded during a schema migration. A new master was selected, but it performed poorly because of cold caches. The high query load from the busy site caused PRM heartbeats to fail again on the cold master, and PRM then triggered another failover …
[Read more]