Using the MySQL Script mysqlfailover for Automatic Failover with MySQL 5.6 GTID Replication

This post is the second in a series that I will be doing on MySQL Workbench Utilities – Administer MySQL with Python Scripts. You may want to read the first half of this post to understand how MySQL Workbench Utilities work and how you access the scripts. These scripts were written by Chuck Bell (a MySQL employee) and are available as stand-alone scripts (see Chuck’s blog for more information) or as part of the MySQL Workbench utility.

I am going to show you one way that you can use the mysqlfailover script to monitor your replication stack and automatically failover to a slave database when your master has failed. You will need to have both your master and slave databases running with GTID’s enabled. I will provide a brief overview of GTID’s, and how to start replication with GTID enabled. The term “automatically failover” in the title might be a bit misleading, as the failover process is automatic, but it does take a couple of minutes. It is automatic but not instantaneous. Also, you may use the mysqlfailover script on a master with multiple slaves, but in this example I will only have one master and one slave.

Let’s start with a quick review of GTID’s – or global transaction identifiers. GTID’s were introduced in MySQL 5.6.5. With GTID’s, each transaction can be identified and tracked as it is committed on the originating server and applied by any slaves; this means that it is not necessary when using GTIDs to refer to log files or positions within those files when starting a new slave or failing over to a new master, which greatly simplifies these tasks.

(From http://dev.mysql.com/doc/refman/5.6/en/replication-gtids.html)

A global transaction identifier (GTID) is a unique identifier created and associated with each transaction when it is committed on the server of origin (master). This identifier is unique not only to the server on which it originated, but is unique across all servers in a given replication setup. There is a 1-to-1 mapping between all transactions and all GTIDs.

The GTID has this format: GTID = source_id:transaction_id – with the source_id identifying the originating server (in this case, the master server), and the transaction_id being a sequential number of the transactions that were committed on the originating server. For example, the twenty-third (23rd) transaction to be committed originally on the server having the UUID 3E11FA47-71CA-11E1-9E33-C80AA9429562 has this GTID:

3E11FA47-71CA-11E1-9E33-C80AA9429562:23

(From http://dev.mysql.com/doc/refman/5.6/en/replication-gtids-concepts.html.)

When you provide your slave server(s) with the information about which master to use for replication, without using GTID’s, you would normally execute a statement like this on the slave:

CHANGE MASTER TO
  MASTER_HOST = '192.168.1.121',
  MASTER_USER = 'replicate',
  MASTER_PASSWORD = 'password',
  MASTER_PORT = 3306,
  MASTER_LOG_FILE = 'mysql-bin.000003',
  MASTER_LOG_POS = 150691098,
  MASTER_CONNECT_RETRY = 10;

When you have GTID’s enabled, you don’t have to provide the log file and position, you only have to provide this:

CHANGE MASTER TO 
MASTER_HOST = '192.168.1.121',
MASTER_PORT = 3306,
MASTER_USER = 'replicate',
MASTER_PASSWORD = 'password',
MASTER_AUTO_POSITION = 1;

If you have worked with replication before, this should make some sense. If not, then you will probably want to read more about replication and GTID’s.

For this example, I am going to take an existing MySQL database, export the data, install the latest version of MySQL (in this case 5.6.8), enable GTID’s, and then demo the mysqlfailover script. This post is going to be a long one, but I will not try to go into as much detail as I normally would. I am writing this after I have already tested this – so I am writing from memory – and hopefully I won’t forget any steps. Here is my current configuration:

I have an application and web server at 192.168.1.2, a MySQL master server at 192.168.1.121 and a MySQL slave server at 192.168.1.122.

For my master and slave servers, I was running MySQL version 5.5.27. To export the data, I am going to just use mysqldump – but I will not export any of the MySQL tables (such as information_schema, mysql, performance_schema and test). When I upgrade from a new major version of MySQL (such as from 5.5. to 5.6), I like to start with a new install versus trying to upgrade from a previous version. (For large databases, this might not be as efficient or even possible, but since my database dump is only 26 megabytes, this will work for me.)

Since I only have a few MySQL users, I keep the SQL statements that I need to re-create these users and their permissions in a text file. Obviously this isn’t the best and most secure way to do this, but this is for my home system, so it doesn’t matter in my case. If you prefer, you can just upgrade from 5.5 to 5.6 and not export the data – and instructions for upgrading this way may be found via this link – Installing and Upgrading MySQL.

You need to make sure that there aren’t any updates to the database while you are doing your mysqldump. You can lock the database with this command from a mysql prompt FLUSH TABLES WITH READ LOCK; and then unlock it with UNLOCK TABLES;. Here is the mysqldump command that I used:

/usr/local/mysql/bin/mysqldump --databases [list of your databases here, separated by spaces] 
--add-drop-database --add-drop-table --user=root --pass= > /users/tonydarnell/2012_11_30_1645_dbdump.db

(Yes, you will get a notification “Warning: Using a password on the command line interface can be insecure.” so you could leave the password blank and enter it when prompted.)

I usually also create a backup of the entire database in case I have any problems or if I destroy something.

/usr/local/mysql/bin/mysqldump --all-databases --add-drop-database --add-drop-table --user=root \
  --pass=[your_password]
 > /users/tonydarnell/2012_11_30_1645_all_dbdump.db

I keep my data directory on two external USB hard drives that I have set up as a RAID on each machine. I use a symbolic link from my MySQL data directory (/usr/local/mysql/data) to point to a directory on the RAID – with this command ln -s /volumes/server_raid/mysql_data/data /usr/local/mysql/data). Since I am creating a new install, I just rename the directory on the raid mv /volumes/server_raid/mysql_data/data /volumes/server_raid/mysql_data/data-old. When I install MySQ it will create a new data directory. I then can move the new data directory to the RAID, and recreate the link. You could also use this method to move your data directory to another internal or SSD drive.

Now I install MySQL version 5.6.8. (see http://dev.mysql.com/doc/refman/5.6/en/installing.html for instructions on installing MySQL.)

Once I have MySQL 5.6.8 installed on the master (including running any post-install scripts per the instructions above), I can import my database.

mysql -uroot -p < /users/tonydarnell/2012_11_30_1645_all_dbdump.db

I then start MySQL, login, create my users, and the master is finished and ready. I then repeat the same procedures on a slave machine. Since no one has updated the master since my data dump, the master and the slave should be exact copies of each other. If you are using virtual machines, once you have created the first virtual machine to be used as your master, you can just duplicate the VM to be your slave machine. Just be sure to change the server-id option in your mysql config file (my.cnf or my.ini) to be a different number.

It is time to turn on GTID's and to get replication started. If you already have a master and slave configured, then you can refer to this link on how to start replication using GTID's http://dev.mysql.com/doc/refman/5.6/en/replication-gtids-howto.html.

You can either start the GTID process on both MySQL servers by adding these options when you start mysqld:

--gtid_mode=ON --log-bin --log-slave-updates --disable-gtid-unsafe-statements

Or, you can add these options to your MySQL config file (/etc/my.cnf or c:\my.cnf or c:\Windows\my.ini).

gtid_mode=ON
disable-gtid-unsafe-statements = 1
log-bin
log-slave-updates

Binary logging should be enabled on the master, and you will also want to enable binary logging on the slave, so when the slave is promoted to the master, you can make the old master a slave to the new master. See this link for binary log options and variables.

(Starting with MySQL 5.6.9, –disable-gtid-unsafe-statements is now named –enforce-gtid-consistency)

Now that you have both of these options in place, you may start both of your servers. On the slave, you will want to add –skip-slave-start to the mysqld command. You will want to start the slave manually, after you have given the slave the information about the master from a mysql prompt:

To test and make sure that GTID is running, you may issue this command on both servers:

mysql> show global variables like '%GTID%';
+--------------------------------+--------------------------------------------+
| Variable_name                  | Value                                      |
+--------------------------------+--------------------------------------------+
| disable_gtid_unsafe_statements | ON                                         |
| gtid_done                      | 6CD03F68-3B38-11E2-99FA-588CB3DE3E9D:1-242 |
| gtid_lost                      |                                            |
| gtid_mode                      | ON                                         |
| gtid_owned                     |                                            |
+--------------------------------+--------------------------------------------+
5 rows in set (0.00 sec)

You can see the value for gtid_done contains the GTID information – source_id:transaction_id, where 1-242 is the range of transactions that have been committed. (Your values will be different)

Now that we have MySQL replication running with GTID’s enabled, we can look at running the mysqlfailover script. I use Perl on my web site, and my Perl scripts make a connection to the MySQL database by reading the connection information from a text file (connection file) stored on the web server in the CGI directory. (for more information on how I use this connection file, please see Connecting to MySQL with Perl)

This connection file contains the database name, IP address, mysql user and password. This file determines which MySQL server will be used by the web server and in this example the file is named accessWEB. The file contains the following: (you will have to configure the file to match your system)

scripts_db
192.168.1.121
user_name
password

With the mysqlfailover script, you have the option to run a script before failover and after failover. There is an option to also run a script prior to failing over, and one to run a script after failover has finished and mysqlfailover has refreshed the health report.

For this test, I will create a script that will change the connection file information to point to the slave database when the master fails. This is as simple as creating a new connection file with the slave’s information, and then copying it on top of the existing file. I will create a file for each server, and name the files after their IP addresses. So, the file 192-168-1-121.txt will have the same information as the current accessWEB connection file, and the file 192-168-1-122.txt will contain:

scripts_db
192.168.1.122
user_name
password

For my pre-failover script, I will then create a shell script that input some text into a file so that I can see when failover started. The script will be named “prefail.sh”, and it will contain the following:

cd /Library/WebServer/cgi-bin/
echo "failover started" > failover_started.txt

I will need to make sure that prefail.sh has execute privileges and that all of the connection files have the correct privileges as well. And I would want to test the script prior to using it.

For my post-failover script, I will create a shell script that will send me a text message, will change the connection file after failover has occurred and input some text into a file so that I can see when failover finished. I will name this script postfail.sh. It will contain the following:

cd /Library/WebServer/cgi-bin/
cp 192-168-1-122.txt accessWEB
echo "Failover has occurred." | mail 4045552232@messaging.att.net
echo "failover finished" > failover_finished.txt

I have my master and slave using GTID, and the web server is connecting to the master (192.168.1.121). I can now run the mysqlfailover script. I don’t want to run it on the master or slave, because if one of them fails, then the script could fail as well. I will run the script on the web server. If it fails, then it doesn’t matter if the MySQL servers are down, as no one can access the web site anyway.

Prior to running this script, I created a MySQL user name “scripts” to use for the mysqlfailover script. I gave the user the same permissions as root. I have a few options that I will use when executing the mysqlfailover script:

--master=scripts:scripts123@192.168.1.121:3306 - connection information for the master
--slaves=scripts:scripts123@192.168.1.122:3306 - connection information for the slave(s)
--candidates=scripts:scripts123@192.168.1.122:3306 - a list of candidates for failover
--exec-before=/users/tonydarnell/scripts/prefail.sh - the script to execute before the failover
--exec-after=/users/tonydarnell/scripts/postfail.sh - the script to execute after the failover

There is an option for setting the refresh time for the script with the –interval=X option (where X is the number of seconds for the interval), but I will be using the default of 15 seconds.

I am also adding the –force option – because at startup, the console will attempt to register itself with the master. If another console is already registered, and the failover mode is auto or elect, the console will be blocked from running failover. When a console quits, it deregisters itself from the master. If this process is broken, the user may override the registration check by using the –force option.

(From: http://dev.mysql.com/doc/workbench/en/mysqlfailover.html

I can then open a terminal window and run the mysqlfailover script:

# mysqlfailover --master=scripts:scripts123@192.168.1.121:3306 --slaves=scripts:scripts123@192.168.1.122:3306 
--candidates=scripts:scripts123@192.168.1.122:3306 --exec-before=/users/tonydarnell/scripts/prefail.sh 
--exec-after=/users/tonydarnell/scripts/postfail.sh --force

Here is a screen shot of the script in action:

To test the script, and to simulate the master server crashing or the mysqld process failing, I will just kill the mysqld process that is on the master server. Since I am using mysqld_safe to start the mysqld process, I will need to kill that process as well.

Once the mysqld processes have been killed, and the mysqlfailover script has refreshed (or you can refresh it manually), the failover process will start. This entire process might take 20-30 seconds (give or take), and you will see something similar to this:

Once the process has completed, the mysqlfailover script will now show you that the failover process has completed and the slave at 192.168.1.122 is now the master.

If you have more than one slave attached to the master, there are options that will allow you to specify a slave to become the master, or you can have the mysqlfailover script decide which slave is the best candidate to be promoted to master. You will need to refer to the mysqlfailover page for more information.

We can check to make sure that our scripts ran successfully by checking the actions of our pre and post-failover scripts. We can check to see if the files were created by our “echo” commands in both scripts:

We can also check our accessWEB file, to see that it has the new connection information.

With the mysqlfailover script, both of our pre and post-failover scripts were executed, and our slave was promoted to the master. Even though the failover process wasn’t immediate (the entire failover process took about a minute), it was successful.

Once the failover has completed, and the old master has been restarted, you can then make the old master (192.168.1.121) a slave to the new master with this command:

CHANGE MASTER TO 
MASTER_HOST = '192.168.1.122',
MASTER_PORT = 3306,
MASTER_USER = 'replicate',
MASTER_PASSWORD = '',
MASTER_AUTO_POSITION = 1;

The mysqlfailover script will recognize the new slave, but now your scripts will not be correct in that it will not copy the master info to the accessWEB file – so you would want to change them to match the new configuration. Of course, you can obviously create scripts that provide the logic to failover to whichever server is available – maybe that is a topic for a future post.

If you prefer to have the old master as the current master, then you can wait until the old master catches up to the new master, stop both servers, and make the old master the new master again. But it is easier to just keep both servers in the new configuration until failover happens again. You could also use the mysqlfailover script to fail over to old master as well – making it the new master again.

 

Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn.