Showing entries 11 to 20 of 29
« 10 Newer Entries | 9 Older Entries »
Displaying posts with tag: Good practice / Bad practice (reset)
InnoDB without PRIMARY KEY

Having an InnoDB table without a PRIMARY KEY is not good. Many have known this for years, but exact opinions as to why have differed. From observation, it was clear to me that it impacted performance.

InnoDB stores its row data in the leaf nodes of the primary key B+tree structure, that means that it can’t work without… so if you don’t specify a PK, it makes one up. Seems pretty innocuous and shouldn’t actually perform any worse than an auto-inc field. Except that in reality the performance can be much much worse. Annoying. Naturally we recommend clients to always have a PK (auto-inc, a composite of foreign keys, or if need be a natural key) but production systems cannot always be quickly changed, depending on the app code adding a column is not something you can just do at the DBA level.

Recently my good friend and former colleague Jeremy Cole, who has been delving into the depths of InnoDB, asked me if I had any …

[Read more]
Serving Clients Rather than Falling Over

Dawnstar Australis (yes, nickname – but I know him personally – he speaks with knowledge and authority) updates on The Real Victims Of The Click Frenzy Fail: The Australian Consumer after his earlier post from a few months ago.

Colourful language aside, I believe he rightfully points out the failings of the organising company and the big Australian retailers. From the Open Query perspective we can just review the situation where sites fall over under load. Contrary to what they say, that’s not a cool indication of popularity. Let’s compare with the real world:

  1. Brick & Mortar store does something that turns out popular and we see a huge queue outside, people need to wait for hours. The people in the queue can chat, and overall …
[Read more]
One-way Password Crypting Flaws

I was talking with a client and the topic of password crypting came up. From my background as a C coder, I have a few criteria to regard a mechanism to be safe. In this case we’ll just discuss things from the perspective of secure storage, and validation in an application.

  1. use a digital fingerprint algorithm, not a hash or CRC. A hash is by nature lossy (generates evenly distributed duplicates) and a CRC is intended to identify bit errors in transmitted data, not compare potentially different data.
  2. Store/use all of the fingerprint, not just part (otherwise it’s lossy again).
  3. SHA1 and its siblings are not ideal for this purpose, but ok. MD5 and that family of “message digests” has been proven flawed long ago, they can be “freaked” to create a desired outcome. Thus, it is possible to …
[Read more]
What a Hosting Provider did Today

I found Dennis the Menace, he now has a job as system administrator for a hosting company. Scenario: client has a problem with a server becoming unavailable (cause unknown) and has it restarted. MySQL had some page corruption in the InnoDB tablespace.

The hosting provider, being really helpful, goes in as root and first deletes ib_logfile* then ib* in /var/lib/mysql. He later says “I am sorry if I deleted it. I thought I deleted the log only. Sorry again.”  Now this may appear nice, but people who know what they’re doing with MySQL will realise that deleting the iblogfiles actually destroys data also. MySQL of course screams loudly that while it has FRM files it can’t find the tables. No kidding!

Then, while he’s been told to not touch anything any more, and I’m trying to see if I can recover the deleted files on ext3 filesystem (yes there are tools for that), he goes in again and puts an ibdata1 file back. No, …

[Read more]
On Password Strength

XKCD (as usual) makes a very good point – this time about password strength, and I reckon it’s something app developers need to consider urgently. Geeks can debate the exact amount of entropy, but that’s not really the issue: insisting on mixed upper/lower and/or non-alpha and/or numerical components to a user password does not really improve security, and definitely makes life more difficult for users.

So basically, the functions that do a “is this a strong password” should seriously reconsider their approach, particularly if they’re used to have the app decide whether to accept the password as “good enough” at all.

Update: Jeff Preshing has written an xkcd password generator. Users probably should choose their own four …

[Read more]
MySQL data backup: going beyond mysqldump

A user on a linux user group mailing list asked about this, and I was one of the people replying. Re-posting here as I reckon it’s of wider interest.

> [...] tens of gigs of data in MySQL databases. > Some in memory tables, some MyISAM, a fair bit InnoDB. According to my > understanding, when one doesn’t have several hours to take a DB > offline and do dbbackup, there was/is ibbackup from InnoBase.. but now > that MySQL and InnoBase have both been ‘Oracle Enterprised’, said > product is now restricted to MySQL Enterprise customers.. > > Some quick searching has suggested Percona XtraBackup as a potential > FOSS alternative. > What backup techniques do people employ around these parts for backups > of large mixed MySQL data sets where downtime *must* be minimised? > > Has your backup plan ever been put to the test?

You should put it to the test regularly, not just when it’s needed. …

[Read more]
Challenge: identify this pattern in datadir

You take a look at someone’s MySQL (or MariaDB) data directory, and see

mysql
foo
bar -> foo

  1. What’s the issue? Identify pattern.
  2. What does it mean?  Consequences.
  3. Is there any way it can be safe and useful/usable? Describe.

Good luck!

Unqualified COUNT(*) speed PBXT vs InnoDB

So this is about a SELECT COUNT(*) FROM tblname without a WHERE clause. MyISAM has an optimisation for that since it maintains a rowcount for each table. InnoDB and PBXT can’t do that (at least not easily) because of their multi-versioned nature… different transactions may see a different number of rows for the table table!

So, it’s kinda known but nevertheless often ignored that this operation on InnoDB is costly in terms of time; what InnoDB has to do to figure out the exact number of rows is scan the primary key and just tally. Of course it’s faster if it doesn’t have to read a lot of the blocks from disk (i.e. smaller dataset or a large enough buffer pool).

I was curious about PBXT’s performance on this, and behold it appears to be quite a bit faster! For a table with 50 million rows, PBXT took about 20 minutes whereas the same table in InnoDB took 30 minutes. Interesting! …

[Read more]
The actual range and storage size of an INT

What’s the difference between INT(2) and INT(20) ? Not a lot. It’s about output formatting, which you’ll never encounter when talking with the server through an API (like you do from most app languages).

The confusion stems from the fact that with CHAR(n) and VARCHAR(n), the (n) signifies the length or maximum length of that field. But for INT, the range and storage size is specified using different data types: TINYINT, SMALLINT, MEDIUMINT, INT (aka INTEGER), BIGINT.

At Open Query we tend to pick on things like INT(2) when reviewing a client’s schema, because chances are that the developers/DBAs are working under a mistaken assumption and this could cause trouble somewhere – even if not in the exact spot where we pick on it. So it’s a case of pattern recognition.

A very practical example of this comes from a client I worked with last week. I first spotted some harmless ones, we talked about it, and then we hit …

[Read more]
relay-log-space-limit

We don’t often see this option configured (default: unlimited) but it might be a good idea to set it. What it does is limit the amount of disk space the combined relay logs are allowed to take up.

A slave’s IO_Thread reads from the master and puts the events into the relay log; the slave’s SQL_Thread reads from the relay log and executes the query. If/when replication “breaks”, unless it’s connection related it tends to be during execution of a query. In that case the IO_Thread will keep running (receiving master events and storing in the relay log). Beyond some point, that doesn’t make sense.

The reason for having two separate replication threads (introduced in MySQL 4.0) is that long-running queries don’t delay receiving more data. That’s good. But receiving data is generally pretty fast, so as long as that basic issue is handled, it’s not necessary (for performance) to have the IO_Thread run ahead that far. …

[Read more]
Showing entries 11 to 20 of 29
« 10 Newer Entries | 9 Older Entries »