Automated failover is pretty much a must have for many applications - uptime is taken for granted. It’s quite hard to accept that an application is down for 20 or 30 minutes because someone has to be paged to log in and investigate the situation before taking action.
In the real world, replication setups tend to grow over time to become complex, sometimes messy. And there are constraints. For instance, not every node in a setup makes a good master candidate. Maybe the hardware differs and some of the replicas have less powerful hardware as they are dedicated to handle some specific types of the workload? Maybe you are in the middle of migration to a new MySQL version and some of the slaves have already been upgraded? You’d rather not have a master in more recent version replicating to old replicas, as this can break replication. If you have two datacenters, one active and one for disaster recovery, you may prefer to pick master …
[Read more]