Showing entries 1 to 10
Displaying posts with tag: archive (reset)
Archiving MySQL and MongoDB Data

This post discusses archiving MySQL and MongoDB data, and determining what, when and how to archive data.

Many people store infrequently used data. This data is taking up storage space and might make your database slower than it could be. Archiving data can be a huge benefit, both regarding the performance impact and storage savings.

Why archive?

One of the reasons for archiving data is freeing up space on your database volumes. You can store archived data on slower, less expensive storage devices, and current data on the faster database drives. Archiving old data makes backups and restores run faster since they need to process less data. Last, but by no means least, archiving data has the benefit of making your queries perform more efficiently since they do not need to process through old …

[Read more]
MySQL compression: Compressed and Uncompressed data size

MySQL has information_schema.tables that contain information such as “data_length” or “avg_row_length.” Documentation on this table however is quite poor, making an assumption that those fields are self explanatory – they are not when it comes to tables that employ compression. And this is where inconsistency is born. Lets take a look at the same table containing some highly compressible data using different storage engines that support MySQL compression:


mysql> select * from information_schema.tables where table_schema='test' G
*************************** 1. row ***************************
ROW_FORMAT: tokudb_zlib
DATA_LENGTH: 409722880
[Read more]
The ARCHIVE Storage Engine

I wonder how much longer the ARCHIVE storage engine is going to ship with MySQL…. I think I’m the last person to actually fix a bug in it, and that was, well, a good number of years ago now. It was created to solve a simple problem: write once read hardly ever. Useful for logs and the like. A zlib stream of rows in a file.

You can actually easily beat ARCHIVE for INSERT speed with a non-indexed MyISAM table, and with things like TokuDB around you can probably get pretty close to compression while at the same time having these things known as “indexes”.

ARCHIVE for a long time held this niche though and was widely and quietly used (and likely still is). It has the great benefit of being fairly lightweight – it’s only about 2500 lines of code (1130 if you exclude azio.c, the slightly modified gzio.c from zlib).

It also use the table discovery mechanism that NDB uses. If you remove the FRM file for an ARCHIVE …

[Read more]
Where are they now: MySQL Storage Engines

There was once a big hooplah about the MySQL Storage Engine Architecture and how it was easy to just slot in some other method of storage instead of the provided ones. Over the years I’ve repeatedly mentioned how this wasn’t really

[Read more]
Setting up slave, stripping indexes and changing engines, on the fly

Warning, the following is quite ugly, but does the job :)

A while back I needed to create an archive slave database from a half a terabyte myisam master and had space restrictions. I could not dump the db, load it, then drop keys (archive doesn’t support keys apart from a primary key on one column as of 5.1), alter engine etc (would take even longer than it took either way). So an ugly single liner came to mind and worked nicely too.

mysqldump -uuser -ppassword -h127.0.0.1 -P3306 dbname --master-data=1 | sed 's/ENGINE=MyISAM/ENGINE=archive/g' | grep -v '^ UNIQUE KEY' | grep -v '^ KEY' | perl -p0777i -e 's/,\n^\)/\n\)/mg' | mysql -uuser -ppassword -h127.0.0.1 -P3307 dbname

So what is it doing?
Broken down:
mysqldump -uuser -ppassword -h127.0.0.1 -P3306 dbname --master-data=1 –> extract the database with the master position
sed …

[Read more]
A tale of a bug…

So I sometimes get asked if we funnel back bug reports or patches back to MySQL from Drizzle. Also, MariaDB adds some interest here as they are a lot closer (and indeed compatible with) to MySQL. With Drizzle, we have deviated really quite heavily from the MySQL codebase. There are still some common areas, but they’re getting rarer (especially to just directly apply a patch).

Back in June 2009, while working on Drizzle at Sun, I found a bug that I knew would affect both. The patch would even directly apply (well… close, but I made one anyway).

So the typical process of me filing a MySQL bug these days is:

  • Stewart files bug
  • In the next window of Sveta being awake, it’s verified.

[Read more]
Drizzle FRM replacement: the table proto

Drizzle originally inherited the FRM file from MySQL (which inherited it from UNIREG). The FRM file stores metadata about a table; what columns it has, what type those columns are, what indexes, any default values, comments etc are all stored in the FRM. In the days of MyISAM, this worked relatively well. The row data was stored in table.MYD, indexes on top of it in table.MYI and information about the format of the row was
in table.FRM. Since MyISAM itself wasn’t crash safe, it didn’t really matter if creating/deleting the FRM file along with the table was either.

As more sophisticated engines were introduced (e.g. InnoDB) that had their own data dictionary, there started to be more of a problem. There were now two places storing information about a table: the FRM file and the data dictionary specific to the engine. Even if the data dictionary of the storage engine was crash safe, the FRM file was not plugged into that, so you …

[Read more]
There is more than one way to do it….

I spent Friday examining the systems for a more traditional consulting gig (in case you did not know, Pythian has offered more traditional consulting and emergency services in addition to our remote DBA offering for our entire 12 year history). It is a familiar story to many people — the database performance was unacceptable.

The company had a few log tables that had huge amounts of inserts and foreign keys, so they used InnoDB. Unfortunately, they also used a GUID as a primary key (which is getting more and more common these days, but long primary keys with InnoDB slow down INSERT, UPDATE and DELETE commands a great deal) — varchar(32) and utf8.

That’s right — their primary key for many of these tables was 96 bytes long (32 characters * 3 bytes per character), and as an InnoDB table, the primary key is clustered with …

[Read more]
A lesson in never letting your filesystem fill up too much

A good rule of thumb is that a DBA should not let the size of his/her database grow to much more than 50% of the available disk space. I’m sure everyone has a story of being burned in some way by not following this advice.

Before promoting 5.1 to some beefier hardware, I’ve got a few large tables sitting on a small SATA-based machine. Noticing that a few were growing faster than I expected, i decided to pause my loading and convert another one of them to the archive storage engine to save some space.

I certainly saved space.

By trying to do too much at one time, I had some other temp files lying around on the same filesystem, and my conversion to the archive storage engine failed at about 80% through with my disk full.

mysql> alter table w_stats engine = Archive;
ERROR 1030 (HY000): Got error -1 from storage engine

mysql 5.1 has …

[Read more]
One backup script that does it all.

This integrates with Monolith, but the database update function can be stripped out for use without Monolith. The idea is that this script is a wrapper for mysqldump that does backup file consistency checking, email reporting, file based logging and directory pruning.

I used to have one script for daily, weekly, and monthly all running out of /etc/cron.daily /etc/cron.weekly /etc/cron.monthly - respectively. But maintaining 3 scripts is foolish if one can do everything. So I added some variables to check day of week and day of month to achieve this.

Enjoy the code. Script Link here.

Showing entries 1 to 10