I haven't written in a while. Some of the reason is that life has
been incredibly busy with two young children. The other reason is
that I've been busy at work with my team developing HP Cloud DNS,
a service that runs in the cloud, as do the various other
applications within the Platform as a Service group that I am
part of. After a period of rapid development and deployment,
Cloud DNS recently went into private
beta . The ability to build an cloud application such as this
was primarily that I have a great team (Kiall Mac Innes, Simon
McCartney and a great product manger Josh Barry) as well as
having chosen the right tools and software solutions. For the
deployment side of things, Chef, despite frustrations that I have
had before getting my mind around it, was a solution that I knew
would make it so we could deploy anywhere, as well as being what
HPCS uses for their deployments for under-the-cloud
infrastructure. This is a topic in of itself, something that I
wish to elaborate in another future post. For the application,
the core of Cloud DNS is based around Moniker, an OpenStack-like project created by
Of course there is a need in an application for a backing database, and the choice I pondered was a MySQL dual-master setup or Galera Cluster, which I had been wanting to investigate and familiarize myself with for some time because of what little I did know about it. Having synchronous replication and a multi-master setup where split brain is not a much of an issue (if at least three nodes are used) was very appealing. Also, avoiding having to do a pacemaker setup for failover, which me saying so is not a disparagement, but the goal being with a three person team to keep the database solution simple.
After testing Percona XtraDB Cluster which uses Galera, I was very pleased with the results and the performance was excellent for what we needed. Since we use chef, there was an existing cookbook for Percona XtraDB Cluster albeit it was not really ready for what we needed and it even attempted to build the server from source, something we definitely wanted to avoid adding to a deployment process, so quite a few changes were needed to simplify it and utilize binary packages for installation instead of source compilation.
The functionality of the cookbook for a cluster is somewhat straightforward now that I reflect on how it was implemented and know more about the cluster setup, but at the time I wasn't sure of the approach I would take with the cookbook. The basic setup is that three nodes are needed in a Galera cluster, the first node needs to be boot-strapped. At this point, being that this is a cloud environment where a node is launched and then connected to a chef server which depending on the run list that node is provisioned accordingly, the other cloud instances are not yet in existence as far as this boot-strapped node, and chef, are concerned. Why is this important to discuss? Because in setting up a Galera cluster, there is a MySQL parameter "wsrep_cluster_address", which has the format "gcomm://,,...". The value of this parameter is a string in the chef recipe that will be built using the results of chef search for percona nodes, excluding itself, and interpolated in the template for my.cnf. It's important to note, that in the beginning, the first node is set up as a cluster of one with solely "gcomm://" as the wsrep_cluster_address. So, at this point of bootstrapping, node1 would have "gcomm://". This happens without any particular logic, because there are no other nodes other than itself, hence the list of nodes used to build the string empty.
After the first node is boot-strapped, the second node is launched and provisioned with chef. At this point, through chef, this node can be searched and found to exist. Also, the the first node (shall henceforth be refered to as node1) that was bootstrapped can be found through a chef search. And with this search, the IP address of the first node1 is used as the wsrep_cluster_address for node2 such that "wsrep_cluster_address=gcomm://. When node2 has MySQL started, it connects to node1 and state exchange is initiated, the cluster state being given to node2. Again, node1 is the single-node cluster at this point and the state of node2 is most likely different hence node2 will request a snapshot (in this case using Xtrabackup) of the state (Snapshot State Transfer, SST). Once node2 the SST is installed, node2 will then join the cluster, the cluster now consisting of node1 and node2.
Then the third node is launched and provisioned with chef. Now there are two other nodes that can be found, hence "wsrep_cluster_address=gcomm://, (the order depending on results of chef search). The same process as occured with node2 now begins for node3: state exchange, SST (with two nodes, the cluster will assign either node1 or node2 as the SST donor), then SST installation, node3 then joins the cluster of node1 and node2. At this point, the wsrep_cluster_address as found in the chef generated from template my.cnf for the three nodes are as follows:
- node1: "wsrep_cluster_address=gcomm://"</li>
- node2: "wsrep_cluster_address=gcomm://<node1 IP>"
- node3: "wsrep_cluster_address=gcomm://<node1
It is apparent that this is not complete! What must remain to be done? Well, through the utility of chef, a subsequent run of chef-client is required. The recipe in the cookbook has logic that as stated before performs a chef search that will now return a list of all three nodes. Of course, in building up the string for wsrep_cluster_address to be used in the template for the my.cnf, excluding itself. At end end of the second chef-client run on each of the nodes, the three nodes will have in their my.cnf the following:
- node1: "wsrep_cluster_address=gcomm://<node2
- node2: "wsrep_cluster_address=gcomm://<node1
- node3: "wsrep_cluster_address=gcomm://<node1
At this point, there is a running cluster!
The other important design of this is the HAProxy setup. This setup is pretty much describe in Peter's excellent post. Essentially, each application node also has a running HAProxy that is configured to use the node in the cluster with the least connections and utilizes an HTTP check against an inetd service that provides the output of the utility clustercheck which simply indicates if the cluster has a problem. The application itself connects to localhost and HAProxy takes care of the rest. And all of this is built using chef!
Finally, the other important HA design of all of this is that all of these components-- each database node and each API server with HAProxy are in separate availability zones. The functionality that Simon built into our deployment tool is cognizant of all three availability zones and is able to launch these components into each AZ seamlesly!
The following illustration shows the finished product:
It is worth noting that I worked with Peter Boros of Percona on this setup, as well as the rest of my team of course was able to help me get unblocked while developing this cookbook for building this.
I will conclude this post by stating that I am very pleased at the architecture we have chosen for the database component of our DNSaaS and encourage anyone who needs a simple MySQL-based HA solution for their application to consider using such a setup. Also, the basic chef cookbooks for this I will make available at a later day after much cleanup and testing.