Showing entries 1 to 3
Displaying posts with tag: pigz (reset)
Which Compression Tool Should I Use for my Database Backups? (Part I: Compression)

This week we are talking about size, which is a subject that should matter to any system administrator in charge of the backup system of any project, and in particular database backups.

I sometimes get questions about what should be the best compression tool to apply during a particular backup system: gzip? bzip2? any other?

The testing environment

In order to test several formats and tools, I created a .csv file (comma-separated values) that was 3,700,635,579 bytes in size by transforming a recent dump of all the OpenStreetMap nodes of the European portion of Spain. It had a total of 46,741,126 rows and looked like this:

171773  38.6048402      -0.0489871      4       2012-08-25 00:37:46     12850816        472193  rubensd
171774  38.6061981      -0.0496867      2       2008-01-19 10:23:21     666916  9250 …
[Read more]
What compression do you use?

The following is an evaluation of various compression utilities that I tested when reviewing the various options for MySQL backup strategies. The overall winner in performance was pigz, a parallel implementation of gzip. If you use gzip today as most organizations do, this one change will improve your backup compression times.

Details of the test:

  • The database is 5.4GB of data
  • mysqldump produces a backup file of 2.9GB
  • The server is an AWS t1.xlarge with a dedicated EBS volume for backups

The following testing was performed to compare the time and % compression savings of various available open source products. This was not an exhaustive test with multiple iterations and different types of data files.

Compression Time
Decompression Time
[Read more]
Chain Copying to Multiple hosts

This week I was given the task of repopulating our entire primary database cluster.  This was due to an alter that had to be performed on our largest table.  It was easiest to run it on one host and populate the dataset from that host everywhere.

I recalled a while back reading a blog post from Tumblr about how to chain a copy to multiple hosts using a combination of nc, tar, and pigz.  I used this, with a few other things to greatly speed up our repopulation process.  As I was repopulating production servers, I did a combination of raw data copy and xtrabackup streams across our servers, depending on the position in our replication setup.

For a normal straight copy, here’s what I did:

On the last host, configure netcat to listen and then pipe the output through pigz and tar to …

[Read more]
Showing entries 1 to 3