Backing up databases has never been fun, not as fun as having a cool English Ale on the balcony on a hot summer day anyway, but MongoDB takes this one step further when it comes to annoyances.
In general, I often feel that many Open Source projects start with good intentions for what the project should do and how, but then more features are added as they are needed, and what started out as a simple, fast application for a narrow use case turns into a bit of a mess. The issue may well be that building fast, compact software for a specialized use case, as these projects start out doing, is not the same as writing generic software with a wide range of use cases and code that can easily be maintained and enhanced as we go along. And why should it not be like that? In many cases this is just fine: the limited use case is just what the project sets out to do, and it does it well. But sometimes this turns into something really …
Log Writer is busily and efficiently pumping innumerable bytes from the log buffer to the online redo logs, more and more exciting features are being added to MySQL, and the SQL Server community is getting more vibrant with each passing day; in all this beautiful frenzy, we manage to catch some elegant blog [...]
It is a constant, yet interesting debate in the world of big data. What scales best? OldSQL, NoSQL, NewSQL?
I have a longer post coming on this soon. But for now, let me make the following comments. Generally, most data technologies can be made to scale - somehow. Scaling up tends not to be too much of an issue; scaling out is where the difficulties begin. Yet most data technologies can be scaled in one form or another to meet a data challenge, even if the result isn't pretty.
What is best? Well, that comes down to the resulting complexity, cost, performance, and other trade-offs. Trade-offs are key, as there are almost always significant concessions to be made as you scale up.
For a recent example of mine, I was looking at the scalability aspects of MySQL, in particular MySQL Cluster. It is …
The Codership team announced the availability of MySQL/Galera 0.8.1, which is a minor release but actually has a bunch of improvements that make Galera replication more user-friendly (many bugs are fixed, including several reported by me personally because they annoyed me a lot).
As part of my evaluation activity I ported MySQL/Galera 0.8.1 to Percona Server/Galera 0.8.1, and you can get the source code on Launchpad.
I appreciate the fact that not everybody has fun compiling source code (hint, hint for Drizzle developers), which is why I also made binaries for RHEL 6.1 / Oracle Linux 6.1
…
Multi-master replication between sites is the holy grail of applications ranging from credit card processing to large-scale software-as-a-service (SaaS) operations. Tungsten Replicator is award-winning open source software that helps you solve a wide range of multi-master problems that you can only dream of tackling with MySQL native replication. Learn the nuts and bolts of multi-master
Yoshinori Matsunobu sparked some interest recently when he posted about his MySQL MHA high-availability solution, and there has been some discussion of it internally at Yahoo, comparing it with the standard we currently have.
Full disclosure: I haven't read every bit of the documentation or tried it out yet, so I apologize in advance to Yoshinori if I misrepresent his hard work.
I see a lot of great ideas in Yoshinori's release; it seems to focus on two main problems (a rough sketch of the second follows the list):
- A process to monitor an active master and perform a failover when it fails
- The bit that finds the most “caught-up” slave and distributes …
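To make the "most caught-up slave" idea concrete, here is a minimal sketch of how a failover tool might pick the replica that has received the most of the failed master's binary log, based on what each slave reports in SHOW SLAVE STATUS. This is purely illustrative and is not MHA's actual code; the pymysql usage, host names, and credentials are assumptions for the example.

```python
# Illustrative sketch, not MHA's implementation: choose the replica that has
# received the most of the master's binary log, using the Master_Log_File /
# Read_Master_Log_Pos columns of SHOW SLAVE STATUS.
# Connection details below are made-up assumptions.

import pymysql


def slave_position(host):
    conn = pymysql.connect(host=host, user="repl_admin", password="secret")
    try:
        with conn.cursor(pymysql.cursors.DictCursor) as cur:
            cur.execute("SHOW SLAVE STATUS")
            status = cur.fetchone()
            # (log file name, byte offset) sorts correctly because the file
            # name carries an increasing numeric suffix (e.g. mysql-bin.000042)
            return (status["Master_Log_File"], int(status["Read_Master_Log_Pos"]))
    finally:
        conn.close()


def most_caught_up(slaves):
    # The most caught-up slave has the highest (file, position) pair; it is
    # the natural candidate to promote or to use as the source of missing events.
    return max(slaves, key=slave_position)


print(most_caught_up(["slave1.example.com", "slave2.example.com"]))
```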
Since my last post I've changed how the table statistics work quite a bit in MariaDB. I ran into a few problems with my original changes. In the TiVo 5.0 patch the show table_statistics command chose one of three hash tables to read from, depending on the flags. There is a global hash table for global stats and two in the thd object for session and query stats. Each time a non-show query is executed, the query statistics are reset. In 5.1 the implementation of the show command changed from reading arbitrary data structures to constructing queries to run against information_schema tables. The information_schema tables are constructed on the fly, placed into a temporary table, and have the select resulting from the show command …
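Just to illustrate the bookkeeping described above, here is a toy sketch of the three-level statistics idea: global, session, and per-query counters, with the per-query ones reset on every non-show statement. It is a simplification in Python, not the MariaDB/TiVo code, and all names in it are made up.

```python
# Toy model of three-level table statistics (global / session / query),
# not the actual MariaDB or TiVo patch code.

from collections import defaultdict


class TableStats:
    def __init__(self):
        # global stats live for the life of the "server"
        self.global_stats = defaultdict(int)
        # session and query stats would live in the per-connection (thd) object
        self.session_stats = defaultdict(int)
        self.query_stats = defaultdict(int)

    def on_statement_start(self, is_show_command):
        # Per the description above: query stats reset on every non-show
        # statement, so a show right after a query still sees that query's numbers.
        if not is_show_command:
            self.query_stats.clear()

    def record_rows_read(self, table, rows):
        self.global_stats[table] += rows
        self.session_stats[table] += rows
        self.query_stats[table] += rows

    def show(self, scope):
        # Pick which "hash table" to read from, depending on the requested scope.
        return dict({"global": self.global_stats,
                     "session": self.session_stats,
                     "query": self.query_stats}[scope])


stats = TableStats()
stats.on_statement_start(is_show_command=False)
stats.record_rows_read("orders", 120)
print(stats.show("query"))   # {'orders': 120}
```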
There are a lot of scalability challenges we see with clients over and over. The list could easily include 20, 50 or even 100 items, but we shortened it down to the five biggest issues we see.
1. Tune those queries
By far the biggest bang for your buck is query optimization. Queries can be functionally correct and meet business requirements without being stress tested for high traffic and high load. This is why we often see clients with growing pains and scalability challenges as their site becomes more popular. This also makes sense: it wouldn't necessarily be a good use of time to tune a query for some page off in a remote corner of your site that doesn't receive real-world traffic. So some amount of reactive tuning is common and appropriate.
Enable the slow query log and watch it. Use …
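As a concrete starting point, here is a small sketch of turning the slow query log on at runtime. The system variables are standard MySQL; the pymysql connection details are made-up assumptions, and you need a privileged account to set global variables.

```python
# Minimal sketch: enable the slow query log and log statements slower than
# one second. slow_query_log, long_query_time and slow_query_log_file are
# stock MySQL system variables; host/user/password here are assumptions.

import pymysql

conn = pymysql.connect(host="db1.example.com", user="admin", password="secret")
with conn.cursor() as cur:
    cur.execute("SET GLOBAL slow_query_log = 'ON'")
    cur.execute("SET GLOBAL long_query_time = 1")       # seconds
    cur.execute("SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log'")
conn.close()
```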
Even though multiple fixes have been implemented in Percona Server and MySQL 5.5, there are still workloads in which mutex (or rw-lock) contention is a performance-limiting factor, helped along by the ever-growing number of cores available in systems. It is interesting, though, that the contention may manifest itself in different forms from a system monitoring standpoint. In many cases, when heavy contention happens, user CPU will be very high and the number of context switches will be reasonable. In others you will see low CPU usage with a lot of idle CPU, a larger portion of system CPU than under a normal workload, and a high number of context switches. These correspond to different contention situations, which can be handled differently.
The first situation often corresponds to InnoDB spending a lot of CPU time running "loops" as part of its spinlock implementation. In many cases a busy wait is indeed more efficient than doing …
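To illustrate the difference between the two patterns, here is a toy Python sketch contrasting a busy-wait ("spin") with a blocking OS-style wait. It has nothing to do with InnoDB's actual sync primitives; it only shows why spinning shows up as user CPU while blocking shows up as idle time plus context switches. Loop counts and sleep values are arbitrary assumptions.

```python
# Toy illustration of the two contention patterns, not InnoDB code:
# spinning burns user CPU but avoids a context switch, while a blocking
# wait sleeps in the kernel (idle CPU, more context switches).

import threading
import time

flag_released = False
blocking_lock = threading.Lock()


def spin_wait():
    # "Spinlock" style: poll in a tight loop, hoping the holder releases soon.
    spins = 0
    while not flag_released:
        spins += 1          # pure user CPU, no syscall, no context switch
    print(f"spin wait finished after {spins} iterations")


def blocking_wait():
    # OS-mutex style: the thread sleeps inside the kernel until woken up,
    # which costs a context switch but no CPU while waiting.
    with blocking_lock:
        print("blocking wait finished")


blocking_lock.acquire()                      # simulate another thread holding the lock
threading.Thread(target=spin_wait).start()
threading.Thread(target=blocking_wait).start()

time.sleep(0.01)                             # let both waiters contend briefly
flag_released = True                         # release the "spinlock"
blocking_lock.release()                      # release the OS-style mutex
```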
A short discussion with Baron at Henrik's blog has stirred my eloquence.
Baron points to a great post by Josh Berkus where Josh contemplates database clustering issues from a novel viewpoint. The post is really insightful. But I'm going to top that (albeit not so skilfully).
In his post Josh maintains that existing PostgreSQL clustering solutions do a poor job of satisfying user needs because developers concentrate too much on technological choices and too little on use cases, of which he identifies three: Transactional User, Analytic User, and Online User. And all developers need to do is make three (only three) clustering solutions that satisfy each of those use cases well. And all will be nice and …