Showing entries 21 to 30 of 32
« 10 Newer Entries | 2 Older Entries »
Displaying posts with tag: NDB Cluster (reset)
NDB Cluster one step closer to become a DB killer app!

If you have been following the development of the NDB Cluster storage engine lately, you are probably as excited as I am. NDB Cluster is becoming a kind of large database killer app. Look at all the nice features that have been added:

  • Replication, if you know MySQL you know what I am talking about
  • Distribution awareness, optimize query execution based on the distribution, a strong scaling factor
  • Disk based data, the possibility of pushing some columns to disk
  • Online add index, among the only online DDL I know of in MySQL
  • Multi-threading, no more need to configure many data nodes per server
  • Realtime, when query execution times matter

and I probably miss some. And now, with version 7 (renamed from 6.4) it is possible to …

[Read more]
Social networking type queries with NDB (part 2)

Recently, I talked about how to optimize social networking type queries for the NDB storage engine using IN clause statements. In clauses are great but they have one fundamental limitation, they work only on one column (Actually, this is not true, I discovered, thanks to Roland’s comment, that MySQL supports multiple columns IN clause). What if the primary key is a composite of let’s say “region_id”, “application_id” and “user_id”? Recently, while on site with a client, Brian Morin showed me a very clever way of dealing these type of primary keys in an IN clause. The main problem is that you cannot return a binary or varbinary from a function. So the idea was to used the return values of a stored proc. First we need to compose the varbinary from the actual values with this stored proc:
.

delimiter $$
drop procedure if exists compose_user_account_key $$ …
[Read more]
High performance replacement of the MySQL Memory storage engine with NDB

People often wants to use the MySQL memory engine to store web sessions or other similar volatile data.
There are good reasons for that, here are the main ones:

  • Data is volatile, it is not the end of the world if it is lost
  • Elements are accessed by primary key so hash index are good
  • Sessions tables are accessed heavily (reads/writes), using Memory tables save disk IO

Unfortunately, the Memory engine also has some limitations that can prevent its use on a large scale:

  • Bound by the memory of one server
  • Variable length data types like varchar are expanded
  • Bound to the CPU processing of one server
  • The Memory engine only supports table level locking, limiting concurrency

Those limitations can be hit fairly rapidly, especially if the session payload data is large. What is less known is that NDB Cluster can creates tables …

[Read more]
Social Networking type queries with NDB (part 1)

NDB Cluster is the only integrated sharding framework that I know of (educate me if I am wrong) but it is known to have issues with large joins. These days, large databases that would benefit from a sharding framework are often for social networking type applications that requires large joins.

Actually it is not NDB that is the root cause of the joins problem, it is the way MySQL executes joins. Instead of asking the cluster for a large number of rows for the secondary table it joins to, the current version of MySQL does ask one row at a time. NDB cluster answers those queries very rapidly but, the time to hop over the network kills performance. The MySQL-6.0 branch will implement the Batch Key Access (BKA) algorithm which will solve that issue and might create database application killer with NDB cluster.

Although right now BKA is not available, there are ways to execute those queries in an efficient way by rewriting them. The …

[Read more]
NDB cluster and Max_connections

NDB cluster is a strange beast. Usually, performance wise, it is a good idea to limit the number of threads inside MySQL, that’s why there are parameters like thread_concurrency and innodb_thread_concurrency. MySQL is known to show mutexes contention with a number of active threads greater than a hundred (actually even less) but with NDB the situation is quite different since threads have to wait for the network latency. With NDB Cluster, be prepared to use unusually high numbers of connections and be prepared to crank up the number of active workers if you want to push NDB to its limit. The following figure shows some tests I made recently during one my engagements. As one can see, the number of active connections has an important impact on the overall throughput and it peaks at approximately 800 connections!!!

We did by mistake one run of our test with InnoDB with 800 clients… we had to kill the server after a while, …

[Read more]
Using local tables and replication in a clever way with NDB

Tuning queries for MySQL NDB cluster is way trickier than tuning for any other storage engines. This is especially true for highly normalized schema. Look at the following query trying to retrieve all male account from a given city. Since a city is not unique, we also need to specify the state.

SELECT a.First_name, a.Last_name
FROM Account a INNER JOIN Cities c ON a.City_id = c.Id
  INNER JOIN States s ON c.State_id = s.Id
  INNER JOIN Gender g ON a.Gender_id = g.Id
WHERE c.Name = ‘Columbia’ AND s.Name = ‘South Caroline’ and g.Name = ‘Male’;

If you look at how the query will be executed with NDB, you realize that many hops over the network will be needed, at least one per table. If you push the cluster a lot, these extra hops will limit the performance. Since tables like Cities, States and especially Gender are …

[Read more]
NDB realtime options, choosing CPU and balancing interrupts

With the NDB realtime options, you can choose on which CPU the execution thread and the maintenance will be running. The point is, which CPUs to use.

The output of “cat /proc/interrupts” will help you determine which CPU to use. Here is an example of a dual quad-cores box:

$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
[Read more]
Reducing latency of queries with NBD Cluster realtime extensions

I was recently involved in a project where the main requirement was the smallest possible latency for queries. The queries were simple insert and update statements, the update being by primary key. The cluster was using regular Gigabits Ethernet and formed by 3 servers with 8 cores each, one Management node and 2 data nodes. Since throughput was not a concern, a design decision has been made to locate the MySQL daemons on the same servers as the data nodes, in order to save network hops. MySQL chooses the closest NDB data node as its transaction coordinators and the closest on will be reachable on localhost.

Apart from the architectural decision to host MySQL and the NDB nodes on the same boxes, we decided to benchmark the new real time options of NDB. The options are the following:

RealtimeScheduler: enable the real time schedule, was set to 1 LockExecuteThreadToCPU: assign a given CPU to the NDB execute thread, was set to 7, a …[Read more]
Linux Swappiness

Have you ever been upset by the Linux tendancy to swap… Especially when trying to allocate a large InnoDB buffer pool.. Look at the following output:

yves@yves-laptop:~$ free
total used free shared buffers cached
Mem: 2041888 1991096 50792 0 52 954592
-/+ buffers/cache: 1036452 1005436
Swap: 975200 1308 973892

There is still 50792 + 52 + 954592 = 1005436 of free memory and Linux starts to swap!!! The reason is hidden here:

yves@yves-laptop:~$ cat /proc/sys/vm/swappiness
60

The swappiness controls the Linux to swap for the File cache. For a file server or a web server or even MySQL with MyISAM tables, the file cache is interesting but for InnoDB or NDB Cluster it is close to useless. Only put a “0″ in that proc entry (echo 0 > /proc/sys/vm/swappiness) and add …

[Read more]
e1000 InterruptThrottleRate

The e1000 Nic is very common and can be found in many servers.  During my last NBD cluster engagement, on a set of 5 Dell 1950 servers equipped with a quad e1000 Nic, we were trying to tune cluster performance and one of the thing that was disapointing was that the system time, to processed interrupts from the Nics, was always low.  Since neither the ndbd processes or the mysqld processes were using a significant amount of CPU, we were wondering where the bottleneck was.

Then, Michel Donais, from the client, read the Linux driver documentation and bingo!!! Those e1000 Nics throttle interrupts by default at 8000/s.  8000… it is a big limiting factor….   design to prevent DOS attacks.  We removed the throttling and… reach more than 30,000 interrupt/s and guess what, the number of transaction per second scaled with it along with the ndbd and mysqld cpu usage.  Interesting isn’t it? Then I remembered another …

[Read more]
Showing entries 21 to 30 of 32
« 10 Newer Entries | 2 Older Entries »