Showing entries 1 to 4
Displaying posts with tag: pigz
Putting Galera SST Compression on the benchmark

I mentioned in my Galera crash (un)safe post that it is bad if an SST is triggered on a large dataset. During an SST, a Galera node simply wipes the contents of its MySQL data directory and copies a snapshot from another node in the cluster. From a consistency perspective this is a good thing to do, but from an HA perspective it isn't: one node in the cluster is unavailable and another is acting as donor. This means that in a three-node Galera cluster only two nodes are available, and one of those is running with "degraded" performance.

This can have quite a high impact: the proxysql_galera_checker script, for instance, only selects synced nodes to receive traffic. By default donor nodes will not receive any traffic, so the cluster is left with only one node to serve traffic. Don't worry: if there are no synced …
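
The configuration itself is not shown in this excerpt, but when the xtrabackup-v2 SST method is used, the compression applied to the SST stream is typically chosen in the [sst] section of my.cnf on every node. A minimal sketch, assuming pigz is installed on both donor and joiner (treat the exact option values as an assumption, not something quoted from the post):

    [sst]
    # Assumed my.cnf fragment: the donor compresses the SST stream with pigz
    # and the joiner decompresses it; both nodes need pigz installed.
    compressor='pigz'
    decompressor='pigz -dc'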

[Read more]
Which Compression Tool Should I Use for my Database Backups? (Part I: Compression)

This week we are talking about size, a subject that should matter to any system administrator in charge of the backup system of any project, and in particular of database backups.

I sometimes get asked which compression tool is best to apply in a particular backup system: gzip? bzip2? something else?

The testing environment

In order to test several formats and tools, I created a .csv file (comma-separated values) that was 3,700,635,579 bytes in size by transforming a recent dump of all the OpenStreetMap nodes of the European portion of Spain. It had a total of 46,741,126 rows and looked like this:

171773  38.6048402      -0.0489871      4       2012-08-25 00:37:46     12850816        472193  rubensd
171774  38.6061981      -0.0496867      2       2008-01-19 10:23:21     666916  9250 …
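
The actual measurements are behind the "[Read more]" cut, but a comparison like this usually comes down to timing each tool on the same input file and recording the output size. A rough sketch, where nodes.csv is a hypothetical name for the 3.7 GB test file and the chosen tools and compression levels are assumptions:

    # Time each compressor on the same input and record the resulting size.
    for cmd in "gzip -6" "bzip2 -9" "xz -6" "pigz -6"; do
        tool=${cmd%% *}                      # e.g. "gzip"
        /usr/bin/time -f "$tool: %e s" \
            sh -c "$cmd -c nodes.csv > nodes.csv.$tool"
        ls -l "nodes.csv.$tool"              # compare compressed sizes
    done
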
[Read more]
What compression do you use?

The following is an evaluation of the compression utilities I tested while reviewing options for MySQL backup strategies. The overall winner in performance was pigz, a parallel implementation of gzip. If you use gzip today, as most organizations do, this one change alone will improve your backup compression times.

Details of the test:

  • The database is 5.4GB of data
  • mysqldump produces a backup file of 2.9GB
  • The server is an AWS t1.xlarge with a dedicated EBS volume for backups

The following tests compare the time taken and the percentage of space saved by various available open source products. This was not an exhaustive test with multiple iterations or different types of data files.
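
The results table is cut off in this excerpt, but the drop-in change the post argues for looks roughly like the following; the mysqldump options, thread count, and file names are assumptions, not taken from the post:

    # Typical single-threaded pipeline: gzip becomes the bottleneck on large dumps.
    mysqldump --single-transaction --all-databases | gzip -c > backup.sql.gz

    # Same pipeline with pigz, which fans compression out over several CPU cores.
    mysqldump --single-transaction --all-databases | pigz -p 4 -c > backup.sql.gz

    # pigz writes standard gzip output, so restoring still works with gunzip or pigz -d.
    pigz -dc backup.sql.gz | mysql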

Compression Utility | Compression Time (sec) | Decompression Time
[Read more]
Chain Copying to Multiple hosts

This week I was given the task of repopulating our entire primary database cluster. This was due to an ALTER that had to be performed on our largest table. It was easiest to run it on one host and then populate the dataset everywhere from that host.

I recalled reading a blog post from Tumblr a while back about how to chain a copy to multiple hosts using a combination of nc, tar, and pigz. I used this, along with a few other things, to greatly speed up our repopulation process. Since I was repopulating production servers, I used a combination of raw data copies and xtrabackup streams across our servers, depending on each server's position in our replication setup.

For a normal straight copy, here’s what I did:

On the last host, configure netcat to listen and then pipe the output through pigz and tar to …
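
The command itself is cut off here, but the general shape of such a chain copy (as described in the Tumblr post referenced above) is roughly the following; the host names, port, and data directory path are assumptions:

    # Last host in the chain: listen, decompress, and unpack into the data directory.
    # (Some netcat builds want "nc -l -p 1234" instead of "nc -l 1234".)
    nc -l 1234 | pigz -dc | tar -xf - -C /var/lib/mysql

    # An intermediate host unpacks its own copy while forwarding the raw stream
    # on to the next host (bash process substitution).
    nc -l 1234 | tee >(nc next-host 1234) | pigz -dc | tar -xf - -C /var/lib/mysql

    # Source host: tar up the data directory, compress with pigz, and send it
    # to the first receiver.
    tar -cf - -C /var/lib/mysql . | pigz | nc first-host 1234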

[Read more]