The occurrence of failures and crashes can compromise the high
availability of your database system affecting your revenue and
reputation. Therefore, it is fundamental to minimize downtime and
have an efficient strategy for crash recovery.
Replication and failover are commonly applied to deal with those
situations. However, other types of failures can also affect the
recovery process. In fact, the occurrence of unanticipated faults
can really be an headache! Thus, it is better to be prepared and
implement a good fault-tolerant failover strategy.
Performing failover is not trivial. It requires the execution of
several steps in order to ensure data consistency (i.e., no data
loss) -- especially if the "best" candidate to become the new
master is not the most up-to-date.
Note that, one might desire that the slave with the best hardware
should become the new master. In this case, the candidate must be …
Working day to day with Percona Remote DBA customers, we have been facing an issue from time to time when MySQL replication lag is flapping between 0 and XXXXX constantly – i.e. Seconds_Behind_Master is 0 for a few secs, then it’s like 6287 or 25341, again 0 and so on. I would like to note the 5 different scenarios and symptoms of this – some are obvious and some are not really.
1. Duplicate server-ids on two or more slaves.
Symptoms: MySQL error log on a slave shows the
slave thread is connecting/disconnecting from a master
constantly.
Solution: check whether all nodes in the
replication have unique server-ids.
2. Dual-master setup, “log_slave_updates” enabled, server-ids changed.
Scenario: you stop MySQL on the first master, then you stop the second one. …
[Read more]I was rooting through past blog entries and I stumbled accross a draft post on setting up multi-master (update anywhere) asynchronous replication for MySQL Cluster. The post never quite got finished and published and while the material is now 4 years old it may still be helpfull to some and so I’m posting it now. Note that a lot has happened with MySQL Cluster in the last 4 years and in this area, the most notable change has been the Enhanced conflict resolution with MySQL Cluster active-active replication feature introduced in MySQL Cluster 7.2 and if you’re only dealing with a pair of Clusters, that’s your best option as it removed the need for you to maintain the timestamp columns and backs out entire transactions rather than just the conflicting rows. So when would you use this “legacy” method? The main use case is when you want conflict detection/resolution among a ring of more than …
[Read more]It was just a few days ago that we announced, with celebratory enthusiasm, Tungsten Replicator 2.1.1, and today we are at it again, with Tungsten Replicator 2.1.2.
What happened? In a surfeit of overconfidence, we released Tungsten 2.1.1, with faith on the test suite and its result. The faith was justified, as the test suite was able to catch any known problem and regression. The overconfidence was unjustified, because, due to a series of unfortunate events, some sections of the test suite were accidentally disabled, and the regression that was lurking in the dark was not caught.
Therefore, instead of having a quiet post-release week-end, the whole team has worked round the clock to plug the holes. There …
[Read more]
MySQL Connect 2013 is coming up with several interesting new
sessions. Some sessions that I am participating in got accepted
for the conference, so if you are going there, you might find the
following sessions interesting. For your convenience, the
sessions have hCalendar markup, so it should be easier to
add them to your calendar.
- MySQL Sharding, Replication, and HA (September 21, 5:30-6:30pm in Imperial Ballroom B)
-
This session is an opportunity for you to meet the MySQL engineering team and discuss the latest tools and best practices for sharding MySQL across distributed server farms while maintaining high availability.
Come …
A tip for all those cloud users that like cloning database servers (as reported in my book Effective MySQL – Replication Techniques in Depth).
Starting with MySQL 5.6, MySQL instances have a UUID. Cloning servers to quickly create slaves will result in the following error message.
mysql> SHOW SLAVE STATUSG ... Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server UUIDs; these UUIDs must be different for replication to work. ...
The solution is simple. Clear our the file based configuration file (located in the MySQL datadir) and restart the MySQL instance.
$ rm -f /mysql/data/auto.cnf $ service mysql restart
UPDATE 2013-08-30: Tungsten 2.1.2 was released.
UPDATE 2013-08-23: We have found a few problems that happen when replicating with RBR and temporal columns. We will have to publish an updated bugfix release quite soon.
Tungsten Replicator 2.1.1 is out. Key features in this release are:
- A better installer, of which we have already given a preview in tpm, the multi-master composer. The new installer allows faster and more powerful deployments of both single and multiple masters topologies. And it also allows the next feature:
- Secured communication layer. Now the replicator data and administrative …
If you are in Washington DC area, don't miss Jeff Mace talk about using Continuent Tungsten to survive MySQL failures.
Join us September 12, 2013 for Percona University in Washington, DC. This full-day event will provide you with deep insights into MySQL as well as an opportunity to network with the local MySQL community. The practical knowledge you'll receive will help you be more successful in
Two of the more popular solutions are MySQL Master HA and Percona Replication Manager.
MySQL Master HA
MHA is based on a set of Perl scripts that monitors for replication and server health. When a failover scenario will happen, it can do automatic failover to a slave OR from a selection of slaves you have configured to make as the new master. The good thing about this is, after the initial failover, there will be no succeeding attempts to fail back, this is to protect your data and consistency of the cluster. You can also configure to only have manual failover for scheduled maintenance and the like. Between PRM and MHA, MHA is really easier to manage.
http://code.google.com/p/mysql-master-ha/
Percona Replication Manager
PRM is based on …
[Read more]Recently i updated to MySQL 5.6 and we were really excited about the very good overall performance. But beside a major bug concerning wrong results when running a SELECT that includes a HAVING based on a function (see http://bugs.mysql.com/bug.php?id=69638) we also noticed that from time to time the replication breaks with the following error:
Last_SQL_Errno: 1590 Last_SQL_Error: The incident LOST_EVENTS occured on the master. Message: error writing to the binary log
After some investigation it seemed like this happens if one modifies some user privileges, so we stumbled upon http://bugs.mysql.com/bug.php?id=68892.
Essentially the bug report says that if you use the wrong syntax for GRANT-statements the replication will break. So far, so bad. I told everyone who had the …
[Read more]