High availability is a high percentage of time that the system is working and responding according to the business needs. For production database systems it is typically the highest priority to keep it close to 100%. We build database clusters to eliminate all single point of failure. If an instance becomes unavailable, another node should be able to take the workload and carry on from there. In a perfect world, a database cluster would solve all of our system availability problems. Unfortunately, while all may look good on paper, the reality is often different. So where can it go wrong?
Transactional databases systems come with sophisticated storage engines. Keeping data consistent across multiple nodes makes this task way harder. Clustering introduces a number of new variables that highly depend on network and underlying infrastructure. It is not uncommon for a standalone database instance that was running fine on a single node suddenly …
[Read more]