I’ve just released changes to all tools in MySQL Toolkit. The biggest changes are in MySQL Table Sync, which I’m beginning to give sane defaults and options to. Some of the changes are incompatible (but that’s what you get with MySQL Table Sync, which is still very rough). I also found and fixed some bugs with MySQL Visual Explain. Thanks to everyone who submitted bug reports. Note, the formatting overflow in MySQL Query Profiler was not a security vulnerability.
Dear Kettle fan,
Since our code is open, we have to be honest: in the past, the
performance of Kettle was less than stellar in the “Text File”
department. It’s true that we did offer some workarounds with
respect to database loading, but there are cases when people
don’t want to touch any database at all. Let’s take a closer look
at that specific problem…
Reading and writing text files…
Let’s take a look at this delimited (CSV) file (28MB). Unzipped, the
file is around 89MB in size.
Suppose you read this file using version 2.5.1 (soon to be out) with a single “Text File Input” step. On my machine, that process consumes most of the available CPU power and takes around 57 seconds to complete (roughly 1M rows/minute, or 60M rows/hour).
When we analyze what’s eating the CPU …
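The profiling details are cut off above, but the quoted throughput figures can be sanity-checked with a minimal sketch. This is plain Python, not Kettle’s actual (Java) code, and the implied row count is an inference from the “57 seconds, ~1M rows/minute” numbers, not a figure from the post:

```python
import csv
import io
import time

def measure_read_rate(f, delimiter=","):
    """Count rows in a delimited file and time the read."""
    start = time.time()
    rows = sum(1 for _ in csv.reader(f, delimiter=delimiter))
    elapsed = time.time() - start
    return rows, elapsed

# "~1M rows/minute" over 57 seconds implies roughly this many rows:
implied_rows = 1_000_000 * 57 // 60
print(implied_rows)  # 950000

# Tiny in-memory stand-in for the 89MB file from the post:
rows, elapsed = measure_read_rate(io.StringIO("a,b,c\n" * 1000))
print(rows)  # 1000
```

Timing a plain line-by-line read like this gives a useful baseline: whatever an ETL step adds on top of raw parsing shows up as the gap between its rate and this one.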
adapt.com uses sharding to scale out. In my talk on moving to MySQL, I mentioned a few design issues we had to become aware of before we rolled out a sharded database environment. I promised to give more details, so here they are…
We wanted some keys to be globally unique (eg, userid and accountid). How would we do this?
auto_increment_offset and auto_increment_increment
We had been using autoincremented values, so we considered using MySQL’s auto_increment_offset and auto_increment_increment to keep autoincrement values unique across shards. These settings tell each shard to start at a certain number (auto_increment_offset), e.g. 1 for one shard, 2 for the next, and then to skip auto_increment_increment between values. So, for example, if we set the increment to 1000, our next keys would be 1001 for shard one, 1002 for shard two, etc. Then 2001 for shard one, 2002 for shard two, and so on. …
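The arithmetic behind those two settings can be sketched in a few lines. This is a Python simulation of the id sequence each shard would produce (MySQL generates these values itself; the function name here is just for illustration):

```python
def shard_ids(offset, increment, count):
    """Simulate MySQL's auto_increment_offset / auto_increment_increment:
    a shard generates offset, offset + increment, offset + 2*increment, ..."""
    return [offset + n * increment for n in range(count)]

# Two shards, increment 1000, offsets 1 and 2, as in the post.
# The sequences interleave without ever colliding:
print(shard_ids(1, 1000, 3))  # [1, 1001, 2001]
print(shard_ids(2, 1000, 3))  # [2, 1002, 2002]
```

As long as every shard’s offset is between 1 and the increment, no two shards can ever generate the same value, which is the whole point of the scheme.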
In one of the last commits I added a SQL Tokenizer which understands the basic tokens of (My)SQL:
- Strings
- Literals
- Numbers
- Comments
- Operators
With this basic understanding we can normalize Queries and build statistics over similar queries.
The idea is simple and already implemented in
mysqldumpslow:
/* login.php:37 */SELECT * FROM tbl WHERE id = 1243 AND name = "jan"
is turned into
SELECT * FROM `tbl` WHERE `id` = ? AND `name` = ?
The queries now look like prepared statements and can be used to characterize queries of the same kind.
- comments are removed
- whitespace is stripped
- identifiers are quoted with backticks
- constants are replaced with ?
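The normalization steps listed above can be sketched with a handful of regular expressions. This is not the tokenizer from the commit (which does real lexing); it is a minimal regex-based approximation, and it skips the identifier-backquoting step for brevity:

```python
import re

def normalize(sql):
    """Rough mysqldumpslow-style normalization: strip comments,
    collapse whitespace, replace string and numeric constants with ?."""
    sql = re.sub(r'/\*.*?\*/', '', sql, flags=re.S)   # /* ... */ comments
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", '?', sql)      # single-quoted strings
    sql = re.sub(r'"(?:[^"\\]|\\.)*"', '?', sql)      # double-quoted strings
    sql = re.sub(r'\b\d+\b', '?', sql)                # numeric literals
    return re.sub(r'\s+', ' ', sql).strip()           # collapse whitespace

q = '/* login.php:37 */SELECT * FROM tbl WHERE id = 1243 AND name = "jan"'
print(normalize(q))  # SELECT * FROM tbl WHERE id = ? AND name = ?
```

A regex pass like this is fragile on edge cases (nested quotes, numbers inside identifiers), which is exactly why a real tokenizer that understands strings, literals, numbers, comments and operators is the better tool.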
Taking the famous world-db and executing some simple queries like:
root@127.0.0.1:4040 …
I came across this issue sometime last year, got the solution from the resident Emacs wizard and forgot about it. Last week, it happened again. I'm recording it now so I don't forget about it.
Emacs lets you search and replace in a buffer with regular expressions. So far so cool. I needed to replace a part of a lot of HTML links in a file.
<a href="/path/to/file" title="some title">Go!</a>
was to become
<a href="/path/to/file" title="other title">Go!</a>
with varying values for the title attribute obviously. So I searched for
<a href="/path/to/file" title="([^\"]*)">Go!</a>
and Emacs complained about some invalid regex. Darn. I double and triple checked all necessary escape sequences (which I omitted here for clarity) and finally asked the wizard.
Easy, Emacs doesn't know the * quantifier. Do …
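The wizard’s full answer is cut off above, so for comparison, here is the same substitution in Python’s regex dialect, where bare `( )` group and `*` repeats without extra escaping. (GNU Emacs regexps differ on exactly this point: grouping there is written `\( \)`, a classic source of “invalid regex” confusion.)

```python
import re

html = '<a href="/path/to/file" title="some title">Go!</a>'

# Match the title attribute's value and swap it out, keeping the
# surrounding attribute syntax via the two capture groups:
fixed = re.sub(r'(title=")[^"]*(")', r'\g<1>other title\g<2>', html)
print(fixed)  # <a href="/path/to/file" title="other title">Go!</a>
```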
InnoDB: Number of pending reads 128, pending pread calls 0
InnoDB: Error: InnoDB has waited for 50 seconds for pending
InnoDB: reads to the buffer pool to be finished.
What does this mean? Searching Yahoo, I couldn't find
anything other than pointers to the source code where this
message is generated. I assumed it was a disk problem because
performance was super crappy.
My disk monitoring software, which probes MegaRaidCli, didn't
report a failed disk. Linux didn't report a failed disk or any
errors. So what was happening? Was I wrong?
Well, in fact it was a failed disk, but not in a state that could
be classified as failed by the controller or any other component,
including the OS. With the help of my bud we found that a disk in
the array was in a critical state: the disk itself just had a
bunch of sector errors, yet the controller, the OS, everything would
not mark it as dead - so the end result …
We have started MySQL Camp II. The first session is Introductions.
I didn’t catch all the employers, but here is part of the list of attendees at the Introductions session. Great to see multiple people from many places including ESPN, priceline.com, Proven Scaling, Solid Tech - sponsors of Dorsal Source, ForSaleByOwner, fontshop.com, 9Mmedia, CafeMom, JP Morgan, …
I mentioned this event a few weeks ago, but it's
worth a reminder.
The Free and
Open Source Software Conference is held in Sankt Augustin, a
cheerful place near Bonn.
Some reasons to attend:
- the organizers are friendly and efficient
- the program is exciting
- the food is good
- it's full of open source enthusiasts
- I am meeting with a bunch of friends and colleagues
Do you need more? Ah, yes, my session :)
I will present a topic for database administrators, Monitoring and logging a database server.
But there is also Lenz …
Well readers, you’re either here or you’re not. MySQL Camp II
starts today in Brooklyn, New York, at Polytechnic University.
Last night’s pre-drinks meetup in NYC went well, but today it’s
brass tacks time.
For those of you not able to make it, IRC@Freenode #mysql-camp will be the place to hang out to hear what’s happening. If you’re not at the camp, please identify yourself. Be sure to also check out the MySQL Camp II web site for the plans for today and tomorrow.
For those of you not here, MySQL Camp III is already in planning.
Let’s have a quick look at some clustering examples in the new 3.0 engine:
This example runs all steps in the transformation in a
clustered mode. That means that there are 4 slave
transformations that run in parallel with each other.
The interesting part is that first of all the “Fixed Input”
step is running in parallel, each copy reading a certain part of
a file.
The second thing to mention about it is that we now allow you
to run multiple copies of a single step on a cluster. In this
example, we run 3 copies of a step per slave transformation. In
total there are 12 copies of the sort step in action in
parallel.
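The copy arithmetic and the file-splitting idea above can be sketched in a few lines. This is plain Python rather than Kettle’s own partitioning logic, and it assumes a fixed-width file where any byte offset is a safe split point (a delimited file would need extra handling at chunk boundaries):

```python
def step_copies(slaves, copies_per_slave):
    """Total parallel copies of a step across the whole cluster."""
    return slaves * copies_per_slave

def file_chunks(file_size, readers):
    """Split a file into contiguous byte ranges, one per parallel
    reader, so each copy reads its own part of the file."""
    chunk = file_size // readers
    return [
        (i * chunk, file_size if i == readers - 1 else (i + 1) * chunk)
        for i in range(readers)
    ]

print(step_copies(4, 3))    # 12 copies of the sort step, as in the example
print(file_chunks(100, 4))  # [(0, 25), (25, 50), (50, 75), (75, 100)]
```

The last reader absorbs any remainder, so the ranges always cover the whole file exactly once.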
IMPORTANT: this is a test transformation; a real-world sorting exercise would also include a “Sorted Merge” step to keep the data sorted. I was too lazy to redo the screenshots, though.
The clustered transformations now also support logging to …