Percona XtraDB Cluster (PXC) offers a great deal of flexibility in its State Snapshot Transfer (SST) options (SST is used when a new node is automatically provisioned with data). For many environments, on-the-fly compression saves significant network bandwidth while sending what is sometimes terabytes of data. The usual choice for compression here is the built-in Percona XtraBackup compress option (which uses qpress internally), or the compressor/decompressor options with the compression tool of your choice. In the second case, the popular option is gzip or its …[Read more]
This week we are talking about size, which is a subject that should matter to any system administrator in charge of the backup system of any project, and in particular database backups.
I sometimes get asked which compression tool is best for a particular backup system: gzip, bzip2, or something else?
The testing environment
In order to test several formats and tools, I created a .csv file (comma-separated values) that was 3,700,635,579 bytes in size by transforming a recent dump of all the OpenStreetMap nodes of the European portion of Spain. It had a total of 46,741,126 rows and looked like this:
171773 38.6048402 -0.0489871 4 2012-08-25 00:37:46 12850816 472193 rubensd
171774 38.6061981 -0.0496867 2 2008-01-19 10:23:21 666916 9250 …[Read more]
The following is an evaluation of various compression utilities that I tested when reviewing the various options for MySQL backup strategies. The overall winner in performance was pigz, a parallel implementation of gzip. If you use gzip today as most organizations do, this one change will improve your backup compression times.
Details of the test:
- The database is 5.4GB of data
- mysqldump produces a backup file of 2.9GB
- The server is an AWS t1.xlarge with a dedicated EBS volume for backups
The following tests compare the compression time and percentage size savings of several open source tools. This was not an exhaustive test: it did not include multiple iterations or different types of data files.
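A comparison of this kind can be approximated with a short script. The following is a minimal sketch, not the harness used for the test described here: it relies only on Python's standard-library gzip, bz2, and lzma modules, so it does not cover pigz or other external tools, and the `make_sample` helper is a hypothetical stand-in for a real mysqldump file.

```python
import bz2
import gzip
import lzma
import time


def benchmark(data: bytes) -> list[dict]:
    """Compress `data` with several stdlib codecs; record time and % savings."""
    codecs = {
        "gzip": gzip.compress,
        "bzip2": bz2.compress,
        "xz": lzma.compress,
    }
    results = []
    for name, compress in codecs.items():
        start = time.perf_counter()
        compressed = compress(data)
        elapsed = time.perf_counter() - start
        # Savings: how much smaller the output is, as a percentage of the input.
        savings = 100.0 * (1 - len(compressed) / len(data))
        results.append(
            {
                "tool": name,
                "seconds": elapsed,
                "bytes": len(compressed),
                "savings_pct": savings,
            }
        )
    return results


def make_sample(rows: int = 20000) -> bytes:
    """Hypothetical stand-in for a dump file: repetitive INSERT statements."""
    lines = [
        f"INSERT INTO t VALUES ({i}, 'user{i % 100}', '2012-08-25 00:37:46');"
        for i in range(rows)
    ]
    return "\n".join(lines).encode("utf-8")


if __name__ == "__main__":
    for row in benchmark(make_sample()):
        print(
            f"{row['tool']:6s} {row['seconds']:.3f}s "
            f"{row['bytes']:>10d} bytes {row['savings_pct']:.1f}% saved"
        )
```

To measure a real backup, replace the `make_sample()` call with the bytes of an actual dump file. For multi-gigabyte files, a streaming approach would be needed rather than loading everything into memory.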
This week I was given the task of repopulating our entire primary database cluster. This was due to an ALTER that had to be performed on our largest table. It was easiest to run it on one host and then populate the dataset from that host to all the others.
I recalled reading a blog post from Tumblr a while back about how to chain a copy to multiple hosts using a combination of nc, tar, and pigz. I used this, along with a few other things, to greatly speed up our repopulation process. As I was repopulating production servers, I used a combination of raw data copies and xtrabackup streams across our servers, depending on each server's position in our replication setup.
For a normal straight copy, here’s what I did:
On the last host, configure netcat to listen and then pipe the output through pigz and tar to …[Read more]