Orchestrator-agent: How to recover a MySQL database

In our previous post, we showed how Orchestrator can handle complex replication topologies. Today we will discuss how Orchestrator-agent complements Orchestrator by monitoring our servers and providing snapshot and recovery abilities when there are problems.

Please be aware that the scripts and settings in this post are not production ready (they lack error handling, etc.); this post is just a proof of concept.

What is Orchestrator-agent?

Orchestrator-agent is a sub-project of Orchestrator. It is a service that runs on the MySQL servers and gives us seeding (data deployment) capabilities.

In this context, “seeding” means copying MySQL data files from a donor server to the target machine. Afterwards, MySQL can start on the target machine and use the new data files.

Functionalities (list from Github):

  • Detection of the MySQL service, starting and stopping (start/stop/status commands provided via configuration)
  • Detection of MySQL port, data directory (assumes configuration is /etc/my.cnf)
  • Calculation of disk usage on data directory mount point
  • Tailing the error log file
  • Discovery (the mere existence of the orchestrator-agent service on a host may suggest the existence or need of existence of a MySQL service)
  • Detection of LVM snapshots on MySQL host (snapshots that are MySQL specific)
  • Creation of new snapshots
  • Mounting/umounting of LVM snapshots
  • Detection of DC-local and DC-agnostic snapshots available for a given cluster
  • Transmitting/receiving seed data

The following image shows an overview of a specific host:

How does it work?

Orchestrator-agent runs on the MySQL server as a service and connects to Orchestrator through an HTTP API; it is controlled by Orchestrator. It is built around LVM and LVM snapshots: without them, it cannot work.
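
Because of this LVM dependency, it is worth verifying that the MySQL data directory actually lives on a logical volume before deploying the agent. A minimal sanity check (nothing agent-specific, just standard MySQL and LVM commands) could look like this:

#!/bin/bash
# Find the MySQL data directory and check which device backs it
datadir=$(mysql -NBe "SELECT @@datadir")
df -h "$datadir"
# The backing device should appear in the list of logical volumes
lvs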

The agent requires external scripts/commands, for example to:

  • Detect where in the local and remote DCs it can find an appropriate snapshot
  • Find said snapshot on server, mount it
  • Stop MySQL on target host, clear data on MySQL data directory
  • Initiate send/receive process
  • Cleanup data after copy

If these external commands are configured, a snapshot can be created through the Orchestrator web interface. The agent gets the task through the HTTP API and calls an external script, which creates a consistent snapshot.
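
Before triggering anything from the web interface, it is worth running the configured hook by hand on the donor and confirming that a snapshot really appears. With the script path used later in this post, a quick manual check could be:

# Run the snapshot hook manually (the same script configured as CreateSnapshotCommand below)
/usr/local/orchestrator-agent/create-snapshot.sh
# Verify that a new snapshot logical volume showed up
lvs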

Orchestrator-agent configuration settings

There are many configuration options, some of which we’ll list here (a combined sample configuration follows the list):

  • SnapshotMountPoint
      – Where the agent should mount the snapshot.
  • AgentsServer
      – The address of the Orchestrator agents server, for example: “http://192.168.56.111:3001”.
  • CreateSnapshotCommand
      – Creates a consistent snapshot.
  • AvailableLocalSnapshotHostsCommand
      – Shows us the available snapshots on localhost.
  • AvailableSnapshotHostsCommand
      – Shows us the available snapshots on remote hosts.
  • SnapshotVolumesFilter
      – Free text which identifies MySQL data snapshots.
  • ReceiveSeedDataCommand
      – Command that receives the data.
  • SendSeedDataCommand
      – Command that sends the data.
  • PostCopyCommand
      – Command to be executed after the seed is complete.
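
Putting the options above together, the relevant part of our test agent configuration looked roughly like this. All values are the ones used throughout this post, except SnapshotMountPoint and PostCopyCommand, which are assumptions/placeholders here:

"AgentsServer": "http://192.168.56.111:3001",
"SnapshotMountPoint": "/tmp/MySQLSnapshot",
"SnapshotVolumesFilter": "-my-snapshot",
"CreateSnapshotCommand": "/usr/local/orchestrator-agent/create-snapshot.sh",
"AvailableLocalSnapshotHostsCommand": "lvdisplay | grep \"LV Path\" | awk '{print $3}' | grep my-snapshot",
"AvailableSnapshotHostsCommand": "echo rep4",
"ReceiveSeedDataCommand": "/usr/local/orchestrator-agent/receive.sh",
"SendSeedDataCommand": "/usr/local/orchestrator-agent/seed.sh",
"PostCopyCommand": "",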

Example external scripts

As we mentioned before, these scripts are not production ready.

"CreateSnapshotCommand": "/usr/local/orchestrator-agent/create-snapshot.sh",
#!/bin/bash
# Proof-of-concept snapshot hook: assumes the MySQL data directory lives on an LV whose path contains "MySQL"
donorName='MySQL'
snapName='my-snapshot'
size='500M'
lvName=$(lvdisplay | grep "LV Path" | awk '{print $3}' | grep "$donorName")
dt=$(date '+%d_%m_%Y_%H_%M_%S')
# Hold a global read lock for 10 seconds in the background while the snapshot is taken
mysql -e "STOP SLAVE; FLUSH TABLES WITH READ LOCK; SELECT SLEEP(10);" &>/dev/null &
lvcreate --size "$size" --snapshot --name "orc-$snapName-$dt" "$lvName"

This small script creates a consistent snapshot that the agent can use later, but it locks all tables for 10 seconds. (Better solutions exist, but this is a proof-of-concept script.)

"AvailableLocalSnapshotHostsCommand": "lvdisplay | grep "LV Path" | awk '{print $3}'|grep my-snapshot",

We can filter the available snapshots based on the “SnapshotVolumesFilter” string.

"AvailableSnapshotHostsCommand": "echo rep4",

You can define a command that shows where the available snapshots in your topology are, or you can use a dedicated slave. In our test, we simply used a dedicated server (rep4).

"SnapshotVolumesFilter": "-my-snapshot",

“-my-snapshot” is the filter here.
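
To see why this filter works: the create-snapshot.sh script above names its snapshots orc-my-snapshot-<timestamp>, so every MySQL data snapshot contains the “-my-snapshot” string while other logical volumes do not. A filtered listing on the donor might look like this (the volume group name is made up):

# List only the MySQL data snapshots on this host
lvdisplay | grep "LV Path" | awk '{print $3}' | grep -- "-my-snapshot"
# Example output:
# /dev/vg_mysql/orc-my-snapshot-05_10_2016_10_05_42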

"ReceiveSeedDataCommand": "/usr/local/orchestrator-agent/receive.sh",
#!/bin/bash
# Proof-of-concept receive hook: wipes the datadir, then unpacks the incoming stream
directory=$1
SeedTransferPort=$2
echo "delete $directory"
rm -rf "$directory"/*
cd "$directory" || exit 1
echo "Start nc on port $SeedTransferPort"
# Listen on the given port and extract the compressed tar stream into the datadir
/bin/nc -l -q -1 -p "$SeedTransferPort" | tar xz
# Remove auto.cnf so MySQL generates a new server UUID on the next start
rm -f "$directory/auto.cnf"
echo "run chown on $directory"
chown -R mysql:mysql "$directory"

The agent passes two parameters to the script and calls it like this:

/usr/local/orchestrator-agent/receive.sh /var/lib/mysql/ 21234

The script cleans the data directory (you cannot start a seed while mysqld is running; first you have to stop it from the web interface or the command line), then listens on the specified port and waits for the compressed input. It also removes “auto.cnf”, so MySQL generates a new UUID at start time. Finally, it makes sure every file has the right owner.
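
Once the seed has finished and MySQL has been started on the target, it is worth confirming that the server really generated a new identity; a replica with the same server_uuid as its master will refuse to replicate:

# Must return a UUID different from the donor's
mysql -e "SELECT @@server_uuid;"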

"SendSeedDataCommand": "/usr/local/orchestrator-agent/seed.sh",
#!/bin/bash
# Proof-of-concept send hook: streams the mounted snapshot to the target host
directory=$1
targetHostname=$2
SeedTransferPort=$3
cd "$directory" || exit 1
echo "start nc"
# Compress the snapshot contents and stream them to the target over nc
/bin/tar -czf - -C "$directory" . | /bin/nc "$targetHostname" "$SeedTransferPort"

The agent passes three parameters to the script:

/usr/local/orchestrator-agent/seed.sh /tmp/MySQLSnapshot rep5 21234

The first parameter is the mount point of the snapshot, the second is the destination host, and the third is the port number. The script simply compresses the data and sends it through “nc”.
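
The two scripts can also be exercised end-to-end without Orchestrator, which is an easy way to validate the whole pipeline before wiring it into the agent. Using the hosts and port from this post (rep4 as donor, rep5 as target), a manual run would look roughly like this:

# On the target (rep5): stop mysqld first, then start the receiver
/usr/local/orchestrator-agent/receive.sh /var/lib/mysql/ 21234

# On the donor (rep4): with the snapshot mounted, stream it to the target
/usr/local/orchestrator-agent/seed.sh /tmp/MySQLSnapshot rep5 21234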

Job details

A detailed log is available for every seed. These logs can be really helpful in discovering any problems during the seed.

Why do we need Orchestrator-agent?

If you have a larger MySQL topology where you frequently have to provision new servers, or if you have a dev/staging replica set where you want to easily roll back to a previous production state, Orchestrator-agent can be really helpful and save you a lot of time. Finally, you’ll have time for other fun and awesome stuff!

Feature requests

Orchestrator-agent does its job, but adding a few extra abilities could make it even better:

  • Adding XtraBackup support (see the sketch after this list).
  • Adding mysqlpump/mysqldump/mydumper support.
  • Implementing some kind of scheduling.
  • Batch seeding (seeding to more than one server in one job).
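
For example, XtraBackup support could probably be implemented with nothing more than alternative send/receive hooks. A rough, untested sketch (reusing the host and port from this post, and assuming Percona XtraBackup is installed on both sides) might look like this:

# Hypothetical send hook: stream a hot backup instead of an LVM snapshot
xtrabackup --backup --stream=xbstream --target-dir=/tmp | nc rep5 21234

# Hypothetical receive hook: unpack the stream into the empty data directory
nc -l -p 21234 | xbstream -x -C /var/lib/mysql
# The unpacked backup still needs "xtrabackup --prepare" before MySQL can be started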

Summary

Shlomi did a great job again, just like with Orchestrator.

Orchestrator and Orchestrator-agent together give us a useful platform to manage our topology and deploy MySQL data files to the new servers, or re-sync old ones.