For a long time I’ve wanted to know how MySQL scales as you add more memory to the server. Vadim recently benchmarked the effects of increasing memory and CPU core count. He looked for a balance between utilizing the hardware as much as possible, limiting the system complexity, and lowering the price-to-performance ratio.
The outcome of the research, which was sponsored by Virident, is that as you add CPUs and increase memory size, MySQL doesn’t scale as well as we would like, and solid-state storage — specifically, the Virident tachIOn drive — has more bandwidth than MySQL can fully utilize at present. Therefore, to decrease the price-to-performance ratio and increase the utilization of the tachIOn drive, Vadim sharded the database into smaller instances and colocated them on the same machine. It’s not a[Read more...]
I started going over the slides from talks from MySQL Conference and Expo 2011 to pick set of must see presentations and publish the list, but this is not happening due to lack of time. Instead I’m only going to recommend 1 talk, from list of tutorials I had a chance to review. If you have not check it out yet, take a look at Linux and Hardware Optimizations by Yoshinori Matsunobu. Check out the slides which are very well done with many graphs and verbose enough explanations so you can easily follow slides alone.
Were you excited by some other talk on MySQL Conference and Expo ? Feel free to leave it as a comment to this post.
Shard-Query is an open source tool kit which helps improve the performance of queries against a MySQL database by distributing the work over multiple machines and/or multiple cores. This is similar to the divide and conquer approach that Hive takes in combination with Hadoop. Shard-Query applies a clever approach to parallelism which allows it to significantly improve the performance of queries by spreading the work over all available compute resources. In this test, Shard-Query averages a nearly 6x (max over 10x) improvement over the baseline, as shown in the following graph:
Part of the InnoDB duties, being an MVCC-implementing storage engine, is to get rid of–purge–the old versions of the records as they become obsolete. In MySQL 5.1 this is done by the master InnoDB thread. Since then, InnoDB has been moving towards the parallelized purge: in MySQL 5.5 there is an option to have a single separate dedicated purge thread and in MySQL 5.6.2 one can have multiple dedicated purge threads.
Percona Server 5.1 supports multiple purge threads too, although using more than one is considered experimental at the moment. Unfortunately this patch hasn’t been ported to Percona Server 5.5 yet.
Let’s test these two implementations and find out what benefits, if any, do the additional purge threads bring.
The test workload makes a long history list and then lets purge
My previous benchmark on Performance Schema was mainly in memory workload and against single tables.
Now after adding multi-tables support to sysbench, it is interesting to see what statistic we can get from workload that produces some disk IO.
So let’s run sysbench against 100 tables, each 5000000 rows (~1.2G ) and buffer pool 30G.
The scripts and results are on Benchmark Wiki.
If we look on performance overhead it appears rather big in read-only benchmark, and it is well explained in
We just pushed to sysbench support for workload against multiple tables ( traditionally it used only single table).
It is available from launchpad source tree
This is set of LUA scripts for sysbench 0.5 ( it supports scripting), and it works following way:
- you should use
--test=tests/db/oltp.lua to run OLTP test
./sysbench --test=tests/db/oltp.lua --oltp-tables-count=25 prepare
./sysbench --test=tests/db/oltp.lua --oltp-tables-count=25 --num-threads=5 run
oltp.lua should understand most options that available for regular sysbench –test=oltp
there are couple other scripts, like
As continuation of my CPU benchmarks it is interesting to see what is scalability limitation in MySQL 5.6.2, and I am going to check that using PERFORMANCE SCHEMA, but before that let’s estimate what is potential overhead of using PERFORMANCE SCHEMA. So I am going to run the same benchmarks (sysbench read-only and read-write) as in previous post with different performance schema options and compare results.
I am going to use Cisco UCS C250
with next settings:
Having two big boxes in our lab, one based Intel Nehalem (Cisco UCS C250) and second on AMD Opteron (Dell PowerEdge R815), I decided to run some simple sysbench benchmark to compare how both CPUs perform and what kind of scalability we can expect.
It is hard to make apples to apples comparison, but I think it is still interesting.
Cisco UCS C250 has total 12 cores / 24 threads of Intel Nehalem X5670, and Dell PowerEdge R815 has 48 cores of AMD Opteron 6168.
One of biggest difference is that Cisco is running CentOS 5.5 and Dell R815 is based on RedHat EL 6. I will probably will need to rerun benchmark after upgrade Cisco to CentOS 6 ( will be it even released or
There have been recent discussions about DROP TABLE performance in InnoDB. (You can refer to Peter’s post http://www.mysqlperformanceblog.com/2011/02/03/performance-problem-with-innodb-and-drop-table/ and these bug reports: http://bugs.mysql.com/bug.php?id=51325 and http://bugs.mysql.com/bug.php?id=56332.) It may not sound that serious, but if your workload often uses DROP TABLE and you have a big buffer pool, it may be a significant issue. This can get especially painful, as during this operation InnoDB holds the LOCK_open mutex, which prevents other queries from executing. So, this is a problem for a server with a large amount of memory, like the one we have[Read more...]
I mentioned problems with InnoDB flushing in a previous post. Before getting to ideas on a solution, let’s define some terms and take a look into theory.
The two most important parameters for InnoDB performance are innodb_buffer_pool_size and innodb_log_file_size. InnoDB works with data in memory, and all changes to data are performed in memory. In order to survive a crash or system failure, InnoDB is logging changes into InnoDB transaction logs. The size of the InnoDB transaction log defines how many changed blocks we can have in memory for a given period of time. The obvious question is: Why can’t we simply have a gigantic InnoDB transaction log? The answer is that the size of the transaction log[Read more...]
You may have seen in the last couple of weekly news posts that Baron mentioned we are working on a new adaptive flushing algorithm in InnoDB. In fact, we already have three such algorithms in Percona Server (reflex, estimate, keep_average). Why do we need one more? Okay, first let me start by showing the current problems, and then we will go to solutions.
The basic problem is that, unfortunately, none of the existing flushing implementations (including both MySQL native adaptive flushing and that in Percona Server) can handle it properly. Our last invention, “keep_average”, is doing a very good job on systems based on SSD/Flash storage, but it is not so good for regular slow hard drives.
Let me state the following: If you have a lot of memory (and this is not rare nowadays, for example[Read more...]
This is a continuation of my series of benchmark posts comparing Amazon RDS to a server running on Amazon EC2. Upcoming posts (probably 6 or 8 in total) will extend the scope of the benchmark to include data on our Dell r900 with traditional hard drives in RAID10, and a server in the Joyent cloud. As a reminder, my goal was to run a long-term benchmark and see how the instance performed over time. Can it sustain performance over a several-day period of intense workload? The first step was to determine the number of threads that should be used for the benchmark.
To gauge this, I ran a series of 60-second benchmarks on the RDS server, and extracted the transactions per second from them, then used the peak throughput as my target configuration. The benchmark was sysbench oltp complex, with 400 million rows (88GB of data and indexes, which[Read more...]
The duration of a benchmark is an important factor that helps determine how meaningful it is. Most systems have some “burstable capacity,” and this can influence the results a lot. You can see this in all areas of life — you can sprint much faster than you can run a 10k race. Your stereo system components are usually advertised in both peak and sustained output. Transducers can generally hit peaks that would melt them due to heat dispersion challenges if run at that level long-term. Database servers are no different. Many components in the system have the capacity to absorb peaks. But buffers eventually fill if pressured for a long time.
When designing a benchmark, you should think about what type of performance characteristics you are looking for in your production system. If you want a system that can handle peak loads that[Read more...]
Before I begin, a disclaimer. VoltDB is not a customer, and did not pay Percona or me to investigate VoltDB’s scalability or publish this blog post. More disclaimers at the end. Short version: VoltDB is very scalable; it should scale to 120 partitions, 39 servers, and 1.6 million complex transactions per second at over 300 CPU cores, on the benchmarked configuration, with the recommended level of redundancy for HA.
First, if you’re new to VoltDB, I’ll summarize: it’s an open-source OLTP database that is designed to run on a cluster, not just a single machine, and doesn’t sacrifice consistency during a network partition. It is an in-memory shared-nothing system, and tables are partitioned across multiple servers in the cluster; high availability is ensured by keeping[Read more...]
In my recent benchmarks, such as this one about the Virident TachIon card, I used different values for innodb_buffer_pool_size, like 13GB, 52GB, and 144GB, for testing the tpcc-mysql database with size 100G. This was needed in order to test different memory/dataset size ratios. But why is it important, and how does it affect how InnoDB works internally? Let me show some details.
Internally, InnoDB uses two lists for flushing (writing pages from the Innodb buffer pool memory to disk): the LRU list and the flush list. You can see a tracking of these lists in SHOW ENGINE INNODB STATUS:
... Pending writes: LRU 0, flush list 129, single page 0 ...
It is important to understand which list is being used for[Read more...]
I again work with the system which needs high insertion rate for data which generally fits in memory. Last time I worked with similar system it used MyISAM and the system was built using multiple tables. Using multiple key caches was the good solution at that time and we could get over 200K of inserts/sec.
This time I worked with Innodb tables... it was a different system with different table structure, not to mention different hardware so It can't be compared directly, still it is nice to see you can get the numbers as high with Innodb too.
I will spare you all experiments we went through and just share final numbers. On 8 core Opteron Box we were able to achieve 275K inserts/sec at which time we started to see load to get IO bound because of log writes and flushing dirty buffers. I'm confident you can get[Read more...]
In my post MySQL 5.5.8 and Percona Server: being adaptive I mentioned that I used innodb-log-block-size=4096 in Percona Server to get better throughput, but later Dimitri in his article MySQL Performance: Analyzing Percona's TPCC-like Workload on MySQL 5.5 sounded doubt that it really makes sense. Here us quote from his article:
"Question: what is a potential impact on buffered 7MB/sec writes if we'll use 4K or 512 bytes block size to write to the buffer?.. )
There will be near no or no impact at all
A couple of days ago, Dimitri published a blog post, Analyzing Percona's TPCC-like Workload on MySQL 5.5, which was a response to my post, MySQL 5.5.8 and Percona Server: being adaptive. I will refer to Dimitri's article as article . As always, Dimitri has provided a very detailed and thoughtful article, and I strongly recommend reading if you want to understand how InnoDB works. In his post, Dimitri questioned some of my conclusions, so I decided to take a more detailed look at my findings. Let me show you my results.
Article  recommends using the innodb_max_dirty_pages_pct and[Read more...]
This is to follow up on my previous post and show the results for MySQL 5.5.8 and Percona Server on the fastest hardware I have in our lab: a Cisco UCS C250 server with 384GB of RAM, powered by a Virident tachIOn 400GB SLC card.
To see different I/O patterns, I used different innodb_buffer_pool_size settings: 13G, 52G, an 144G on a tpcc-mysql workload with 1000W (around 100GB of data). This combination of buffer pool sizes gives us different data/memory ratios (for 13G - an I/O intensive workload, for 52G - half of the data fits into the buffer pool, for 144G -[Read more...]
As we can see, MySQL 5.5.8 comes with great improvements and scalability fixes. Adding up all the new features, you have a great release. However, there is one area I want to touch on in this post. At Percona, we consider it important not only to have the best peak performance, but also stable and predictable performance. I refer you to Peter's post, Performance Optimization and Six Sigma.
In Percona Server (and actually even before that, in percona-patches builds for 5.0), we added adaptive checkpoint algorithms, and later the InnoDB-plugin included an implementation of "adaptive flushing". This post shows the differences between them and MySQL.
The post also answers the question of whether we are going to have releases of Percona[Read more...]
A couple of weeks ago I blogged about Sharing an auto_increment value across multiple MySQL tables. In the comments, a few people wrote in to suggest alternative ways of implementing this. I just got around to benchmarking those alternatives today across two large EC2 machines:
What is the conclusion? With the exception of my original option2, they actually all perform fairly similar.[Read more...]
The parameter sort_buffer_size is one the MySQL parameters that is far from obvious to adjust. It is a per session buffer that is allocated every time it is needed. The problem with the sort buffer comes from the way Linux allocates memory. Monty Taylor (here) have described the underlying issue in detail, but basically above 256kB the behavior changes and becomes slower. After reading a post from Ronald Bradford (here), I decide to verify and benchmark performance while varying the size of the sort_buffer. It is my understanding that the sort_buffer is used when no index are available to help the sorting so I created a MyISAM table with one char column without an index:[Read more...]
The title is SEO bait – you can’t do it. We’ve seen a few recurring patterns trying to achieve similar – and I thought I would share with you my favorite two:
Option #1: Use a table to insert into, and grab the insert_id:
CREATE TABLE option1 (id int not null primary key auto_increment) engine=innodb; # each insert does one operations to get the value: INSERT INTO option1 VALUES (NULL); # $connection->insert_id();
Option #2: Use a table with one just row:
CREATE TABLE option2 (id int not null primary key) engine=innodb; INSERT INTO option2 VALUES (1); # start from 1 # each insert does two operations to get the value: UPDATE option2 SET id=@id:=id+1; SELECT @id;
So which is better? I don’t think it’s that easy to tell at a[Read more...]
We now have hardware in our test lab that represents the next generation of commodity servers for databases. It’s a Cisco UCS C250 server, powered by two Intel Westmere CPUs (X5670 @ 2.93GHz). Each CPU has 6 cores and 12 threads. The most amazing part is the amount of memory. It has 384GB of RAM, which is actually more space than the disks contain. The disks are 270GB in total, with the underlying configuration RAID10 over eight 2.5″ 15K RPM disks. To make the system even more powerful, I put a FusionIO 320GB SLC card in the PCI-E slot. Here is a link to the box specs.
The server was generously provided by Cisco Systems, Inc.
There is a lot of discussions whenever running MySQL storing data on NFS is a good idea. There is a lot of things for and against this and this post is not about them.
The fact is number of people run their databases on NetApp and other forms of NFS storage and this post is about one of discoveries in such setup.
There are good reasons to have binary logs on NFS volume - binary logs is exactly the thing you want to survive the server crash - using them you can do point in time recovery from backup.
I was testing high volume replication today using Sysbench:PLAIN TEXT SQL:
(Note: The review was done as part of our consulting practice, but is totally independent and fully reflects our opinion)
In my talk on MySQL Conference and Expo 2010 "An Overview of Flash Storage for Databases" I mentioned that most likely there are other players coming soon. I actually was not aware about any real names at that time, it was just a guess, as PCI-E market is really attractive so FusionIO can't stay alone for long time. So I am not surprised to see new card provided by Virident and I was lucky enough to test a pre-production sample Virident tachIOn 400GB SLC card.
I think it will be fair to say that Virident targets where right now FusionIO has a monopoly, and it will[Read more...]
You might be familiar with Six Sigma business management strategy which is employed by variety of the companies in relationship to managing quality of its product. Six Sigma applies to number of defects - when you have reached six sigma quality in your production you would see 99.99966% of the products manufactured with no defects, or in other words there is less than 3 defects per million.
One of principles of six sigma is what customers tend to be concerned about variance a lot more than average. For example if you produce tomato soup and the average difference from declared weight is going to be 0.1 gram or 0.5 gram, probably nobody would not notice the difference. What would worry people however is significant number of very large differences, such as half empty tomato soup can.
This run is very similar what I had on Intel SSD X25-M card, but now I use FusionIO 80GB SLC card. I chose this card as smallest available card (and therefore cheapest. On Dell.com you can see it for about $3K). There is also FusionIO IO-Xtreme 80GB card, which is however MLC based and it could be not best choice for FlashCache usage ( as there high write rate on FlashCache for both reading and writing to/from disks, so lifetime could be short).
Also Facebook team released WriteThrough module for FlashCache, which could be good trade-off if you want extra warranty for data consistency and your[Read more...]
Finally I was able to run PBXT 1.0.11 pre-GA in tpcc-like workload, apparently there was bug with did not allow me to get the result earlier, and I am happy to see that PBXT team managed it.
For initial runs I took tpcc 100 warehouses ( about 10GB of data) which fully fits into memory (32 GB on server),
and compared 1 and 16 users in MySQL-5.1.46/PBXT and Percona Server / XtraDB - 5.1.45-rel10.2. As workload is totally memory based it will show how PBXT scales in CPU-bond cases on 16 cores systems.
As storage system it was Intel SSD X25-M card.
While full results and config are on Wiki:
there are graphs for 1 user: