This release contains bug fixes and new features. It also contains a new tool: my implementation of Paul Tuckfield's relay log pipelining idea. I have had quite a few responses to that blog post, and requests for the code. So I'm releasing it as part of Maatkit.
I dashed off a hasty post about speeding up replication slaves, and gave no references or explanation. That's what happens when I write quickly! This post explains what the heck I was talking about.
Paul Tuckfield of YouTube has spoken about how he sped up his slaves by pre-fetching the slave's relay logs. I wrote an implementation of this, tried it on my workload, and it didn't speed them up. (I didn't expect it to; I don't have the right workload.) I had a few email exchanges with Paul and some other experts on the topic, and we agreed that my workload isn't going to benefit from pre-fetching.
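As the idea is usually described, a slave applies replicated writes in a single SQL thread, so a disk-bound slave can spend much of its time waiting on reads. A read-ahead process scans the relay log ahead of that thread and runs read-only versions of the upcoming writes, so the pages they need are already cached when the SQL thread arrives. The sketch below shows only the rewriting step. It is a hypothetical illustration (the name `rewrite_as_select` and the regex-based parsing are assumptions), not the implementation discussed in this post, and it handles only simple single-table UPDATE and DELETE statements with a WHERE clause.

```python
import re

# Hypothetical helper, not Maatkit code: turn a simple UPDATE or DELETE into a
# SELECT that reads the same rows, so running it early can warm the cache.
# Only single-table statements with a WHERE clause are recognized; anything
# else (INSERT, multi-table UPDATE, no WHERE) returns None and is skipped.
_UPDATE_RE = re.compile(
    r"^\s*UPDATE\s+(?P<table>[\w.`]+)\s+SET\s+.+?\s+WHERE\s+(?P<where>.+?);?\s*$",
    re.IGNORECASE | re.DOTALL,
)
_DELETE_RE = re.compile(
    r"^\s*DELETE\s+FROM\s+(?P<table>[\w.`]+)\s+WHERE\s+(?P<where>.+?);?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def rewrite_as_select(statement):
    """Return a cache-warming SELECT for statement, or None if unsupported."""
    for pattern in (_UPDATE_RE, _DELETE_RE):
        match = pattern.match(statement)
        if match:
            return "SELECT 1 FROM {} WHERE {}".format(
                match.group("table"), match.group("where")
            )
    return None
```

A pre-fetcher would run each rewritten SELECT against the slave somewhat ahead of the SQL thread and discard the result. As noted above, whether this helps depends on the workload: if the slave's working set already fits in memory, there is little to pre-fetch.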
In the meantime, I've got a pretty sophisticated implementation of Paul's idea just sitting around, unused. I haven't released it for the same reasons Paul didn't release his: I'm afraid it might do more harm than good.
However, if you'd like the code, send me an email at [baron at this domain] and I'll share the code with you. In return, I would like you to tell me about your hardware and your workload, and to do at least some rudimentary benchmarks to show whether it works or not on your workload. If I find that this is beneficial for …
Whew! I just finished a marathon of revisions. It's been a while since I posted about our progress, so here's an update for the curious readers.
There is a lot of information out there about how to set up circular replication, but nothing about how to recover it when all else fails. This article will cover a quick and easy method I use. Depending on the size of your databases and the interconnects between your servers, this method may not be suitable: it requires copying all replicated databases from one good server to every other server in the replication circle, which means a certain amount of downtime.
OK, so one or more servers are showing Slave_IO_Running and/or Slave_SQL_Running as No, there is some error about a failed query when you run "show slave status;", and no amount of effort to fix it is working.
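Before recovering anything, it helps to confirm exactly which replication thread has stopped on each server. This minimal sketch assumes you have captured the vertical-format output of `SHOW SLAVE STATUS\G` as text (for example, from the mysql command-line client); it parses that text into a dictionary and lists the threads whose status is not `Yes`. The function names are hypothetical.

```python
def parse_slave_status(text):
    """Parse vertical-format SHOW SLAVE STATUS output into a dict."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only, so values that contain colons
        # (such as Last_Error messages) are kept intact.
        key, sep, value = line.partition(":")
        # Skip the "*** 1. row ***" banner and any line without a colon.
        if sep and not key.strip().startswith("*"):
            status[key.strip()] = value.strip()
    return status


def stopped_threads(status):
    """Return the replication threads that are not reported as running."""
    return [
        name
        for name in ("Slave_IO_Running", "Slave_SQL_Running")
        if status.get(name) != "Yes"
    ]
```

Running this against each server in the circle gives a quick list of which slaves need attention; the `Last_Error` field in the same output usually shows the query that failed.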
First, DO NOT PANIC. It is broken. OK, tell yourself that, and realise that trying to fix something when you are in a panicked state is only liable to make the situation worse (hell, I'd guarantee it), so …
This release contains bug fixes and new features. Click through to the full article for the details. I'll also write more about the changes in a separate article.
My posts lately have been mostly progress reports and release notices. That's because we're in the home stretch on the book, and I don't have much spare time. However, a lot has also been changing with Maatkit, and I wanted to take some time to write about it properly.
... I didn't get two-way sync done, and I didn't get the Nibble algorithm done. That much I expected. But I also didn't get the current work released tonight because I'm paranoid about breaking things. I'm trying to go through all the tools and write at least a basic test for them to be sure they can do the simplest "unit of work" (such as mk-find running and printing out that it finds the mysql.columns_priv table).
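A "unit of work" test like the one described above can be as simple as running the tool and checking its output. The sketch below is a hypothetical Python harness (the name `smoke_test` and its parameters are illustrative, not part of Maatkit): it runs a command and passes only if the command exits with status 0 within a time limit and prints the expected text. The time limit is there so that a tool which hangs, rather than failing outright, still produces a failed test instead of a stuck test run.

```python
import subprocess


def smoke_test(command, expected_text, timeout=60):
    """Run command (a list of arguments); pass only if it exits 0 within
    timeout seconds and its standard output contains expected_text."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # The child is killed when the timeout expires; treat a hang as failure.
        return False
    return result.returncode == 0 and expected_text in result.stdout
```

For mk-find, the expected text would be `columns_priv`; the command line itself (connection options and so on) depends on your setup, so it is left out here.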
It's good that I'm doing this. I found that mk-heartbeat suddenly doesn't work on my Ubuntu 7.10 laptop: it sleeps forever. Can anyone repro this and/or diagnose it? The same code works fine on my Gentoo servers at work.
Hopefully I'll be able to release something very soon. Release early/often is fine, but "knowingly release brokenness" isn't in my code of conduct :)
This is the last day I'm taking off work to hack on mk-table-sync, and I thought it was time for (yet another) progress report. Here's what I have done so far. (Click through to the full article to read the details).
I created MySQL Table Checksum because I was certain replication slaves were slowly drifting out of sync with their masters, and there was no way to prove it. Once I could prove it, I was able to show that replication gets out of sync for lots of people, lots of times. (If you really want to hear war stories, you should probably talk to one of the MySQL support staff or consulting team members; I'm sure they see this a lot more than I do).
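To make the idea of proving drift concrete, here is a simplified sketch of comparing a table's contents on a master and a slave. It is an illustration only, not how MySQL Table Checksum is implemented: it computes a CRC32 for each row with Python's `zlib` and XORs the results together, so the checksum does not depend on the order in which rows are fetched. The function name `table_checksum` is hypothetical, and the sketch assumes each table has a primary key, because with XOR two identical duplicate rows cancel each other out.

```python
import zlib


def table_checksum(rows):
    """Order-independent checksum of an iterable of row tuples:
    the XOR of the CRC32 of each row."""
    checksum = 0
    for row in rows:
        # Join column values with a NUL separator so that ("a", "bc") and
        # ("ab", "c") produce different text; None gets its own marker.
        text = "\x00".join("\\N" if value is None else str(value) for value in row)
        checksum ^= zlib.crc32(text.encode("utf-8"))
    return checksum
```

Comparing the checksum of the same table fetched from each server shows whether the two copies differ, though not which rows differ.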
I finally figured out what was causing one of my most persistent and annoying out-of-sync scenarios. It turns out to be nothing earth-shaking; it's just an easy-to-overlook limitation of statement-based replication. You could call it a bug, but as far as I can see, there's no way to fix it with statement-based replication. (I'd love to be proven wrong). Read on for the details.