I will present two talks at the MySQL Conference next
April.
One is a three hours tutorial on Advanced MySQL Replication
Techniques, and the other is a normal session on The art
of sandboxing. Reducing Complex Systems to Manageable
Boxes.
The first topic is not a first to me. But the contents are going
to be fresh and new. There has been so much going on in the
replication field, that the talk on this topic that I presented
in 2007 looks like ancient history.
The second topic is completely new. I have often presented the
result of my sandboxing efforts, but I have never thought of
explaining the techniques themselves. Now that I have got some
experience at reducing differently complex systems to sandboxes,
I want to share the knowledge, to promote more work in this
field.
The following quirky dynamic SQL will scan each index of each table so that they’re loaded into the key_buffer (MyISAM) or innodb_buffer_pool (InnoDB). If you also use the PBXT engine which does have a row cache but no clustered primary key, you could also incorporate some full table scans.
To make mysqld execute this on startup, create /var/lib/mysql/initfile.sql and make it be owned by mysql:mysql
SET SESSION group_concat_max_len=100*1024*1024;
SELECT GROUP_CONCAT(CONCAT('SELECT COUNT(`',column_name,'`) FROM `',table_schema,'`.`',table_name,'` FORCE INDEX (`',index_name,'`)') SEPARATOR ' UNION ALL ') INTO @sql FROM information_schema.statistics WHERE table_schema NOT IN ('information_schema','mysql') AND seq_in_index = 1;
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
SET SESSION group_concat_max_len=@@group_concat_max_len;
and in my.cnf add a line in the [mysqld] block
init-file = …[Read more]
In my continuing MySQL 5.5 blog series, today I am covering
what's new on the replication front. MySQL replication is
my favorite server feature and what drew me to MySQL during my
tenure with Embarcadero Technologies. Others seem to
agree as based on community and customer surveys, MySQL
replication is the most popular and widely used database
feature. Mostly because it is easy to set up and ease, it
enables scalability and provides a pretty robust solution for
data redundancy, backup and overall availability. In MySQL
5.5 replication has been enhanced in response to user requests
that MySQL replication:
- Ensure data consistency between master and slave servers
- Immediately detect if replication is not working
- Allow a crashed slave to automatically recover from the master relay log
- Allow users to filter …
A few days ago, a friend of mine asked me if I knew of a way of
filtering a bunch of binary logs, to extract only statements
related to a single table. The task was about filtering a few
hundred binary log files.
It's a tricky problem. Even with my experience with regular
expressions, I knew that using a script to extract statements
related to a single table was going to be a nasty business.
However, I know of an entity that can filter by table name
efficiently, and that's the MySQL replication system. So I
suggested using replication to a sandbox with a
replicate-wild-do-table statement to get the job done.
My friend was skeptical and did not want to go that way. I was
busy writing an article for an Italian magazine and did not
follow up immediately. But today, with the article safely in the
editor's hands, I did a quick test, and guess what? It
works!
Here is a step-by-step procedure to do it. I started …
A few days ago I saw an article about Semi-Synchronous Replication in MySQL 5.5. It
asks questions, and doesn't give answers beyond gut feeling. So I
thought that I would do some practical testing of this new
feature.
Before we go that way, though, let's revisit the theory.
How semi-synchronous replication works
Figure 1. A transaction with regular replication
With regular replication, you send a transaction to the master
(1). When the COMMIT is received, the master
executes it (2), and if successful it logs the transaction to the
binary log (3). The the master answers the client request (4)
with a successful result. In the meantime, the slaves replicate
the record (5).
What happens if the master crashes after point #4 and before a
slave has had a chance of getting the data in point #5?
…
Single-threaded apply is one of the big downsides of MySQL's
built-in replication, as Baron Schwartz pointed out a couple of days
ago. While a master can process dozens of updates at once,
slaves must apply them one after the other on a single
thread. Add in disk I/O, and the result is very slow
performance indeed. The obvious answer is parallel
apply, namely writing multiple non-conflicting updates to the
slave at once.
I have spent the last few months implementing parallel apply for
Tungsten 2.0, which we are now testing at customer sites.
In this article I would like to describe how Tungsten's parallel
apply works as well as some of the lessons that have become
apparent through the implementation.
There are a couple of big challenges in parallel apply. …
If you have used MySQL for some time you know that mysqld can write binlogs. This is usually used for backup purposes and JITR or for replication purposes so a slave can collect the changes made on the master and apply them locally.
Most of the time apart from configuring how long you keep these binlogs they are pretty much ignored.
Recently I came across an issue. I have a slave server which is NOT configured read only and which has an additional database used to collect statistics from the replicated database and provided aggregation and business information. The typical sales per country, per product, per day, week, month, year, per whatever etc. This is the usual datawarehouse type functionality. It’s done on a slave and not the master so that the load is not passed on down stream to all boxes. …
[Read more]This is the second in a series on what's seriously limiting MySQL in certain circumstances (links: part 1). In the first part, I wrote about single-threaded replication. Upstream from the replicas is the primary, which enables replication by writing a so-called "binary log" of events that modify data in the server. The binary log is a real limitation in MySQL.
The binary log is necessary not only for replication, but for point-in-time recovery, too. Given a backup and the corresponding binary log position, you can replay the binary log and roll forward the state of your server to a desired point in time.
But enabling the binary log reduces MySQL's performance dramatically. It is not the logging itself that's the problem -- writing the log is usually not much additional work. It's ensuring consistency and …
[Read more]I recently mentioned a few of the big "non-starter" limitations Postgres has overcome for specific use cases. I decided to write a series of blog posts on MySQL's unsolved severe limitations. I mean limitations that really hobble it for major, important needs -- not in areas where it isn't used, but in areas where it is used and thus is used very wastefully compared to its potential.
The first glaring problem is single-threaded replication. It is severe and getting much worse as servers get more and more CPUs and CPU cores. The replication replay process executes in a single thread on the replicas, and thus has no hope of keeping up with even a moderately busy write load on the primary, where many updates are occurring concurrently.
In a lot of Web applications, this isn't really seen as a huge limitation. That's because these …
[Read more]
Disaster recovery (DR) is not the first thing most DBAs think of
when putting up a new database application. However,
it's one of the top issues for people using the data--what
happens if the site goes down and everything
disappears? So even if DR is not the first issue in
every deployment, it is a very high priority as soon as your
application is the least bit successful.
At the database level DR has a fairly simple solution: keep
copies of data on a backup site that is up-to-date at all
times. This article explains the architecture for MySQL DR
with Tungsten and a couple of key features that make it work,
namely floating IP addresses and global transation IDs. We
will dig into those at the end.
First a bit of introduction. Tungsten manages clusters of
off-the-shelf database connected by master/slave
replication. There are replication and management services
on …