Showing entries 11 to 20 of 33
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: yves (reset)
The upcoming MySQL UC 2009

The MySQL UC 2009 is coming and it is time for my own little marketing. As Matt already annonced it a few months ago we (Matt and I) are doing a WaffleGrid presentation, Distributed InnoDB caching with Memcached, Tuesday at 2PM. I am also presenting at the MySQL Camp or unconference, NBD (MySQL Cluster) performance tuning and pitfalls, also Tuesday at 4:25PM.

NDB Cluster one step closer to become a DB killer app!

If you have been following the development of the NDB Cluster storage engine lately, you are probably as excited as I am. NDB Cluster is becoming a kind of large database killer app. Look at all the nice features that have been added:

  • Replication, if you know MySQL you know what I am talking about
  • Distribution awareness, optimize query execution based on the distribution, a strong scaling factor
  • Disk based data, the possibility of pushing some columns to disk
  • Online add index, among the only online DDL I know of in MySQL
  • Multi-threading, no more need to configure many data nodes per server
  • Realtime, when query execution times matter

and I probably miss some. And now, with version 7 (renamed from 6.4) it is possible to …

[Read more]
Improvement in subquery execution with the upcoming MySQL-6.0 branch

Although I know the JOIN syntax very well, IN clauses are often so much easier to read. A few years ago I was developing a custom ERP application on SAPDB/MAXDB (still opensource at the time) and I used extensively IN clauses to make complex queries easier to read. Of course with MySQL I have been fairly disappointed up to now by the performance of the IN clause. Look at the following explain from a 5.1.x database:

mysql> explain select u2.name from user u2 where u2.id IN (select Friend from friend where friend.UserID IN (select id from user where name = 'Yves'));
+----+--------------------+--------+-----------------+---------------+---------+---------+------+------+--------------------------+
| id | select_type        | table  | type            | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+--------------------+--------+-----------------+---------------+---------+---------+------+------+--------------------------+
| …
[Read more]
Social networking type queries with NDB (part 2)

Recently, I talked about how to optimize social networking type queries for the NDB storage engine using IN clause statements. In clauses are great but they have one fundamental limitation, they work only on one column (Actually, this is not true, I discovered, thanks to Roland’s comment, that MySQL supports multiple columns IN clause). What if the primary key is a composite of let’s say “region_id”, “application_id” and “user_id”? Recently, while on site with a client, Brian Morin showed me a very clever way of dealing these type of primary keys in an IN clause. The main problem is that you cannot return a binary or varbinary from a function. So the idea was to used the return values of a stored proc. First we need to compose the varbinary from the actual values with this stored proc:
.

delimiter $$
drop procedure if exists compose_user_account_key $$ …
[Read more]
High performance replacement of the MySQL Memory storage engine with NDB

People often wants to use the MySQL memory engine to store web sessions or other similar volatile data.
There are good reasons for that, here are the main ones:

  • Data is volatile, it is not the end of the world if it is lost
  • Elements are accessed by primary key so hash index are good
  • Sessions tables are accessed heavily (reads/writes), using Memory tables save disk IO

Unfortunately, the Memory engine also has some limitations that can prevent its use on a large scale:

  • Bound by the memory of one server
  • Variable length data types like varchar are expanded
  • Bound to the CPU processing of one server
  • The Memory engine only supports table level locking, limiting concurrency

Those limitations can be hit fairly rapidly, especially if the session payload data is large. What is less known is that NDB Cluster can creates tables …

[Read more]
WaffleGrid ported to the InnoDB plugin 1.0.3

For the interested, I just ported Wafflegrid to the newest version of the InnoDB pluging, v. 1.0.3. Meanwhile, I also corrected a small bug with the CRC_32 code that wrongly cause MySQL to report corrupted blocks after being retrieved from memcached. To get the new code, simply do:

bzr branch lp:~y-trudeau/wafflegrid/waffleGrid-Innodb-plugin

and replace the directory storage/innobase by what you will retrieve from launchpad. If you are new to WaffleGrid, you will also need a custom version of memcached and a modified build script for MySQL. All these can be obtained at:

http://bazaar.launchpad.net/~yonkovim/wafflegrid/Mattscode/files/8

You will also need to change the file memcached_constants.h in libmemcached with the following value:

#define MEMCACHED_MAX_BUFFER 16500

This gives a 20x perf gain… Yes… WaffleGrid is still experimental.

The early end of the MiniWheatFS project for tmpfs with MySQL!!!

I am happy to say that I found a kernel level way to join a ramdisk (actually not a tmpfs strictly speaking) with a normal file system. This cut my motivation for the MiniWheatFS project by 99.999%. I recall that the goal of the MiniWheatFS project was to provide and efficient filesystem for the the “tmpdir”, where MySQL puts its temporary files and tables. The trick uses the tendancy of ext2 (and probably ext3 and 4) to use the first available block from its bitmap and LVM to join a ramdisk with a normal device. Here are my steps.

1. Give a ramdisk to LVM

root@yves-laptop:/home/yves# pvcreate /dev/ram0
  Physical volume "/dev/ram0" successfully created

By default, my Ubuntu laptop creates 16 ramdisk of 64 MB each. RAM is not allocated until used. To create bigger one, you need to add a ramdisk_size=SizeInKB to the kernel command line in Grub menu.lst or …

[Read more]
Social Networking type queries with NDB (part 1)

NDB Cluster is the only integrated sharding framework that I know of (educate me if I am wrong) but it is known to have issues with large joins. These days, large databases that would benefit from a sharding framework are often for social networking type applications that requires large joins.

Actually it is not NDB that is the root cause of the joins problem, it is the way MySQL executes joins. Instead of asking the cluster for a large number of rows for the secondary table it joins to, the current version of MySQL does ask one row at a time. NDB cluster answers those queries very rapidly but, the time to hop over the network kills performance. The MySQL-6.0 branch will implement the Batch Key Access (BKA) algorithm which will solve that issue and might create database application killer with NDB cluster.

Although right now BKA is not available, there are ways to execute those queries in an efficient way by rewriting them. The …

[Read more]
NDB cluster and Max_connections

NDB cluster is a strange beast. Usually, performance wise, it is a good idea to limit the number of threads inside MySQL, that’s why there are parameters like thread_concurrency and innodb_thread_concurrency. MySQL is known to show mutexes contention with a number of active threads greater than a hundred (actually even less) but with NDB the situation is quite different since threads have to wait for the network latency. With NDB Cluster, be prepared to use unusually high numbers of connections and be prepared to crank up the number of active workers if you want to push NDB to its limit. The following figure shows some tests I made recently during one my engagements. As one can see, the number of active connections has an important impact on the overall throughput and it peaks at approximately 800 connections!!!

We did by mistake one run of our test with InnoDB with 800 clients… we had to kill the server after a while, …

[Read more]
InnoDB secondary index file structure

In my previous Post, we took a look at the file structure of the InnoDB primary key index. With the same table structure, but a slightly different dataset:

mysql> select * from test_innodb_growth limit 10;
+----+------------+--------+
| id | data       | spacer |
+----+------------+--------+
|  1 | a          | |      |
|  2 | aa         | |      |
|  3 | aaa        | |      |
|  4 | aaaa       | |      |
|  5 | aaaaa      | |      |
|  6 | aaaaaa     | |      |
|  7 | aaaaaaa    | |      |
|  8 | aaaaaaaa   | |      |
|  9 | aaaaaaaaa  | |      |
| 10 | aaaaaaaaaa | |      |
+----+------------+--------+
10 rows in set (0.00 sec)

All the rows after id = 10 have data = ‘aaaaa’. Let’s take a look at the secondary index ‘idxdata’. I recently had an argument with a client that claimed that a varchar is fully expanded in a secondary index. The following will prove it is not …

[Read more]
Showing entries 11 to 20 of 33
« 10 Newer Entries | 10 Older Entries »