Showing entries 1 to 10 of 33
10 Older Entries »
Displaying posts with tag: freesoftware (reset)
Improving replication with multiple storage engines

New MariaDB/MySQL storage engines such as MyRocks and TokuDB have renewed interest in using engines other than InnoDB. This is great, but also presents new challenges. In this article, I will describe work that I am currently finishing, and which addresses one such challenge.

For example, the left bar in the figure shows what happens to MyRocks replication performance when used with a default install where the replication state table uses InnoDB. The middle bar shows the performance improvement from my patch.

Current MariaDB and MySQL replication uses tables to transactionally record the replication state (eg mysql.gtid_slave_pos). When non-InnoDB storage engines are introduced the question becomes: What engine should be used for the replication table? Any choice will penalise other engines heavily by injecting a cross-engine transaction with every replicated change. Unless all tables can be migrated to the …

[Read more]
First steps with MariaDB Global Transaction ID

My previous writings were mostly teoretical, so I wanted to give a more practical example, showing the actual state of the current code. I also wanted to show how I have tried to make the feature fit well into the existing replication features, without requiring the user to enable lots of options or understand lots of restrictions before being able to use it.

So let us start! We will build the code from lp:~maria-captains/maria/10.0-mdev26, which at the time of writing is at revision knielsen@knielsen-hq.org-20130214134205-403yjqvzva6xk52j.

First, we start a master server on port 3310 and put a bit of data into it:

    server1> use test; …
[Read more]
More on global transaction ID in MariaDB

I got some very good comments/questions on my previous post on MariaDB global transaction ID, from Giuseppe and Robert (of Tungsten fame). I thought a follow-up post would be appropriate to answer and further elaborate on the comments, as the points they raise are very important and interesting.

(It also gives me the opportunity to explain more deeply a lot of interesting design decisions that I left out in the first post for the sake of brevity and clarity.)

On crash-safe slave

One of the things I really wanted to improve with global transaction ID is to make the replication slaves more crash safe with respect to their current replication state. This state is mostly persistently stored information about which event(s) were last executed on the slave, so that after a server restart the slave will know from which point in the master binlog(s) …

[Read more]
Global transaction ID in MariaDB

The main goal of global transaction ID is to make it easy to promote a new master and switch all slaves over to continue replication from the new master. This is currently harder than it could be, since the current replication position for a slave is specified in coordinates that are specific to the current master, and it is not trivial to translate them into the corresponding coordinates on the new master. Global transaction ID solves this by annotating each event with the global transaction id which is unique and universal across the whole replication hierarchy.

In addition, there are at least two other main goals for MariaDB global transaction ID:

  1. Make it easy to setup global transaction ID, and easy to provision a new slave into an existing replication hierarchy.
  2. Fully support …
[Read more]
Integer overflow

What do you think of this piece of C code?

  void foo(long v) {
    unsigned long u;
    unsigned sign;
    if (v < 0) {
      u = -v;
      sign = 1;
    } else {
      u = v;
      sign = 0;
    }
    ...

Seems pretty simple, right? Then what do you think of this output from MySQL:

  mysql> create table t1 (a bigint) as select '-9223372036854775807.5' as a;
  mysql> select * from t1;
  +----------------------+
  | a                    |
  +----------------------+
  | -'..--).0-*(+,))+(0( | 
  +----------------------+

Yes, that is authentic output from older versions of MySQL. Not just the wrong number, the output is complete garbage! This is my all-time favorite MySQL bug#31799. It was caused by code like the above C snippet.

So can you spot what is wrong with the code? Looks pretty simple, does it not? But the title of this post …

[Read more]
Even faster group commit!

I found time to continue my previous work on group commit for the binary log in MariaDB.

In current code, a (group) commit to InnoDB does not less than three fsync() calls:

  1. Once during InnoDB prepare, to make sure we can recover the transaction in InnoDB if we crash after writing it to the binlog.
  2. Once after binlog write, to make sure we have the transaction in the binlog before we irrevocably commit it in InnoDB.
  3. Once during InnoDB commit, to make sure we no longer need to scan …
[Read more]
Tale of a bug

This is a tale of the bug lp:798213. The bug report has the initial report, and a summary of the real problem obtained after detailed analysis, but it does not describe the processes of getting from the former to the latter. I thought it would be interesting to document this, as the analysis of this bug was rather tricky and contains several good lessons.

Background

The bug first manifested itself as a sporadic failure in one of our random query generator tests for replication. We run this test after all MariaDB pushes in our Buildbot setup. However, this failure had only occured twice in several months, so it is clearly a very rare failure.

The first task was to try to repeat the problem and get some more data in the form of binlog files and so on. Philip kindly helped with this, and …

[Read more]
The future of replication revealed in Istanbul

A very good meeting in Istanbul is drawing to an end. People from Monty Program, Facebook, Galera, Percona, SkySQL, and other parts of the community are meeting with one foot on the European continent and another in Asia to discuss all things MariaDB and MySQL and experience the mystery of the Orient.

At the meeting I had the opportunity to present my plans and visions for the future development of replication in MariaDB. My talk was very well received, and I had a lot of good discussions afterwards with many of the bright people here. Working from home in a virtual company, it means a lot to get this kind of inspiration and encouragement from others on occasion, and I am looking forward to continuing the work after an early flight to Copenhagen tomorrow.

The new interface for transaction coordinator plugins is what particularly interests me at the moment. The …

[Read more]
Dynamic linking costs two cycles

It turns out that the overhead of dynamic linking on Linux amd64 is 2 CPU cycles per cross-module call. I usually take forever to get to the point in my writing, so I thought I would change this for once :-)

In MySQL, there has been a historical tendency to favour static linking, in part because to avoid the overhead (in execution efficiency) associated with dynamic linking. However, on modern systems there are also very serious drawbacks when using static linking.

The particular issue that inspired this article is that I was working on MWL#74, building a proper shared libmysqld.so library for the MariaDB embedded server. The lack of a proper libmysqld.so in MySQL and MariaDB has caused no end of grief for packaging Amarok for the various Linux distributions. My patch …

[Read more]
Micro-benchmarking pthread_cond_broadcast()

In my work on group commit for MariaDB, I have the following situation:

A group of threads are going to participate in group commit. This means that one of the threads, called the group leader, will run an fsync() for all of them, while the other threads wait. Once the group leader is done, it needs to wake up all of the other threads.

The obvious way to do this is to have the group leader call pthread_cond_broadcast() on a condition that the other threads are waiting for with pthread_cond_wait():

  bool wakeup= false;
  pthread_cond_t wakeup_cond;
  pthread_mutex_t wakeup_mutex

Waiter:

  pthread_mutex_lock(&wakeup_mutex);
  while (!wakeup)
    pthread_cond_wait(&wakeup_cond, &wakeup_mutex);
  pthread_mutex_unlock(&wakeup_mutex);
  // Continue processing after group commit …
[Read more]
Showing entries 1 to 10 of 33
10 Older Entries »