(and before you ask, yes “rotating blades” comes from “become a fan”)
I’m forming the ideas here first and then we can go and implement it. Feedback is much appreciated.
Table one looks like this:
CREATE TABLE fan_of (
    user_id BIGINT NOT NULL,
    item_id BIGINT NOT NULL,
    PRIMARY KEY (user_id, item_id),
    INDEX (item_id)
);
That is, two columns, both 64bit integers. The primary key covers both columns (a user cannot be a fan of something more than once) and can be used to look up all things the user is a fan of. There is also an index over item_id so that you can find out which users are a fan of an item.
The second table looks like this:
CREATE TABLE fan_count (
    item_id BIGINT PRIMARY KEY,
    fans    BIGINT NOT NULL DEFAULT 0  -- count column; name assumed, the original listing was cut off
);
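To make the access patterns concrete, here is how the two tables might be queried and kept in step (a sketch only; column names beyond the description above, and how counts are actually maintained, are my assumptions, not something the post has said yet):

```sql
-- All things a user is a fan of (served by the primary key):
SELECT item_id FROM fan_of WHERE user_id = 42;

-- All fans of an item (served by the secondary index on item_id):
SELECT user_id FROM fan_of WHERE item_id = 7;

-- When a user becomes a fan, record it and bump the counter:
INSERT INTO fan_of (user_id, item_id) VALUES (42, 7);
INSERT INTO fan_count (item_id, fans) VALUES (7, 1)
    ON DUPLICATE KEY UPDATE fans = fans + 1;
```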
You have to love all the debating going on over NOSQL -vs- SQL, don’t you? With my UC session on choosing the right data storage tools (does that sound better than “SQL -vs- NoSQL”?) I have been trying to stay current with the mood of the community so I can make my talk more relevant. Today I was catching up on reading a few blog posts and I thought I would pass along these two: Pro SQL and Pro NoSQL … these represent the two very different views on this subject. (Note: I think there are misleading facts and figures in these that should be fleshed out more, but they are a good sample of what I am talking about.)[Read more...]
I am giving a talk in a couple of weeks at the 2010 MySQL User Conference that will touch on use cases for NOSQL tools -vs- more relational tools; the talk is entitled “Choosing the Right Tools for the Job, SQL or NOSQL”. This talk is NOT supposed to be a deep dive into the good, bad, and ugly of these solutions, but rather a way to discuss potential use cases for various solutions and where they may make a lot of sense. Still, being me, I felt a need to at least do some minor benchmarking of these solutions. The series of posts I wrote last year over on mysqlperformanceblog.com comparing Tokyo Tyrant to both MySQL and Memcached was fairly popular. In fact, the initial set of benchmark scripts I used for that series has been put to good use since then[Read more...]
Oracle’s plans for Sun’s OSS. The UK’s updated OSS strategy. And more.
Oracle’s plans for Sun’s OSS
# Oracle’s MySQL strategy slide.
# eWeek reported that database thought leaders are divided on Oracle MySQL.
# Zack Urlocker is leaving Oracle/Sun/MySQL.
# Red Hat’s Mark Little[Read more...]
The server in question was Solaris 10 with an 8-disk RAID10 and two 32GB SSDs used for ZIL and L2ARC, 72GB of RAM and a 40GB buffer pool. We started it up with innodb_adaptive_flushing=OFF and innodb_doublewrite=OFF, then ramped up traffic and everything looked stable ... but I noticed one troubling thing: ~2GB of uncheckpointed data.
mysql> SHOW INNODB STATUS\G
....
Database pages      2318457
Old database pages  855816
Modified db pages   457902
Log flushed up to   10026890404067
Last checkpoint at  [Read more...]
A while ago I started a project which will be heavily IO-bound on the MySQL Server, the testmachine allocated for this had a DAS with 15 disks (although I only used 14) connected via external SAS (standard 3Gb/s half-duplex or 6Gb/s full-duplex on two ports).
I used sysbench for the tests, both fileio and oltp, although these results are based on the fileio runs. The setup had the disks in RAID10 (7 RAID1 sets, then striping over them) and later RAID50 (2 RAID5 sets with 7 disks each, then striping over that), the latter yielding better results.
Let’s take a look at 1, 2, 4, 8, 16 and 128 concurrent clients, with different IO schedulers, all on XFS.
The config for the raid controller was write-back, cached access, advanced readahead. 512MB battery backed cache on the controller.
Also, I tested both sequential reading (SEQRD in[Read more...]
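As a sketch of the kind of sysbench fileio run described above (the file size, run time, and thread counts here are my guesses, not the author's actual settings; sysbench 0.4 syntax):

```shell
# Prepare test files larger than RAM so reads hit the disks, not the page cache
sysbench --test=fileio --file-total-size=64G prepare

# Random read/write test with 16 concurrent clients
sysbench --test=fileio --file-total-size=64G --file-test-mode=rndrw \
         --num-threads=16 --max-time=300 --max-requests=0 run

# Sequential read test (SEQRD) over the same files
sysbench --test=fileio --file-total-size=64G --file-test-mode=seqrd \
         --num-threads=1 --max-time=300 --max-requests=0 run

sysbench --test=fileio --file-total-size=64G cleanup
```

Switching IO schedulers between runs is a one-liner on Linux, e.g. `echo deadline > /sys/block/sdX/queue/scheduler`.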
After finishing my post on the Faban 1.0 announcement, I realized that it was geared towards users who were already using Faban. So I decided to write this post for users who have never used Faban.
Faban is two things: a framework for developing load drivers and workloads, and a harness for deploying, running, and managing benchmarks.
The former is called the “Faban Driver Framework” and the latter is called the “Faban Harness”. Although the two are related, it is entirely possible to run an arbitrary test developed outside of Faban using the Faban Harness. In fact, many benchmarks do just that. In this respect, Faban is rather unique.
The real power of Faban is unleashed only when you use the[Read more...]
Faban 1.0 has just been released. This is a major milestone for this open-source workload creation and test framework. Faban is widely used by many performance geeks to performance test various types of server applications. Amongst open source tools, Faban is unique in that it not only provides a framework to create and run performance/load tests, but also has robust functionality to run monitoring tools. It collects all sorts of configuration information as well to truly help performance engineers keep track of configuration and tuning settings.
Here are some major new features in 1.0 which I think will make Faban a very attractive proposition (when compared to the likes of some very expensive proprietary tools).
If you have used Faban before to create workloads, this is the feature you[Read more...]
For the past two months, I have been running tests on TokuDB in my free time. TokuDB is a storage engine put out by Tokutek. TokuDB uses fractal tree indexes instead of B-tree indexes to improve performance, which is dramatically noticeable when dealing with large tables (over 100 million rows).
For those who like the information “above the fold”, here is a table with results from a test comparing InnoDB and TokuDB. All the steps are explained in the post below, if you want more details, but here’s the table:
Action                                     InnoDB                     TokuDB
Importing ~40 million rows                 119 min 20.596 sec         69 min 1.982 sec
INSERTing again, ~80 million rows total    5 hours 13 min 52.58 sec   56 min 44.56 sec
INSERTing again, ~160 million rows total   20 hours 10 min 32.35 sec  2 hours 2 min 11.95 sec
Size of table on
I used three tables, each with integer primary keys, having 109, 600 and 16k+ rows. I did two runs for each of the four algorithms: the first run used an empty destination table so all rows from the source had to be synced; the second run used an already synced destination table so all rows had to be checked but none were synced. I ran Perl with DProf to get simple wallclock and user time measurements.
Here are the results for the first run:
Mark Callaghan posted a good test of the MySQL query cache in different versions. His tests clearly show that in 5.0.44 and 5.0.84 and 5.1.38, there is more query throughput when the query cache is disabled.
However, the tests are skewed — not on purpose, I am sure, and Mark admits he has not used the query cache before — but they are skewed all the same. Mark’s error was that he assumed he could just turn on the query cache and see if it works. Most features of MySQL do not work that way — you have to understand the strengths and weaknesses of the feature in order to use it properly.
Mark’s benchmark definitely reinforces that turning on the query cache without any knowledge of your system is a bad idea, and I agree with him on that. But it does not in any way mean that[Read more...]
So during preparation of the XtraDB template for EC2 I wanted to understand what IO characteristics we can expect from an EBS volume (I am speaking about a single volume, not RAID as in my previous post). Yasufumi did some benchmarks and pointed me at an interesting behavior: there seem to be several levels of caching on an EBS volume.
Let me show you. I ran a sysbench random read IO benchmark on files with sizes from 256M to 5GB in 256M steps. And, as Morgan pointed out to me, I first wrote to the whole volume to avoid the first-write penalty:
dd if=/dev/zero of=/dev/sdk bs=1M
For reference, the script is:
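The script itself was lost in syndication; based on the description above (file sizes from 256M to 5GB in 256M steps, sysbench random reads), it presumably looked something like this. This is my reconstruction, not the original, and the run-time settings are guesses:

```shell
#!/bin/sh
# Run a sysbench random-read fileio test for each file size 256M .. 5120M,
# stepping by 256M, as described in the post
for sz in $(seq 256 256 5120); do
    sysbench --test=fileio --file-total-size=${sz}M prepare
    sysbench --test=fileio --file-total-size=${sz}M --file-test-mode=rndrd \
             --num-threads=1 --max-time=180 --max-requests=0 run
    sysbench --test=fileio --file-total-size=${sz}M cleanup
done
```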
During preparation of the Percona-XtraDB template to run in the RightScale environment, I noticed that IO performance on an EBS volume in the EC2 cloud is not quite perfect. So I have spent some time benchmarking volumes. An interesting aspect of EBS volumes is that each one appears as a device in your OS, so you can easily build a software RAID from several volumes.
So I created 4 volumes (I used an m.large instance), and made:
RAID0 on 2 volumes as:
mdadm -C /dev/md0 --chunk=256 -n 2 -l 0 /dev/sdj /dev/sdk
RAID0 on 4 volumes as:
mdadm -C /dev/md0 --chunk=256 -n 4 -l 0 /dev/sdj /dev/sdk /dev/sdl /dev/sdm
RAID5 on 3 volumes as:
mdadm -C /dev/md0 --chunk=256 -n 3 -l 5 /dev/sdj /dev/sdk /dev/sdl
RAID10 on 4 volumes in two steps:
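The two RAID10 commands were cut off above; the classic two-step mdadm construction (my assumption of what the post showed, following the chunk size used for the other arrays) is to build two RAID1 mirrors and then stripe over them:

```shell
# Step 1: two RAID1 mirrors (chunk size does not apply at this level)
mdadm -C /dev/md1 -n 2 -l 1 /dev/sdj /dev/sdk
mdadm -C /dev/md2 -n 2 -l 1 /dev/sdl /dev/sdm

# Step 2: RAID0 stripe over the two mirrors
mdadm -C /dev/md0 --chunk=256 -n 2 -l 0 /dev/md1 /dev/md2
```

Newer mdadm versions can also build this in one step with `-l 10 -n 4`.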
There was a small delay in our releases; during part of this time we worked on features I mentioned before:
- Moving InnoDB tables between servers
- Improving InnoDB recovery time
and the rest of the time we worked on performance, trying to align XtraDB performance with MySQL 5.4, and also ported all the performance fixes to the 5.0 tree.
So basically we made a new split-buffer-mutex patch, which separates the global buffer pool mutex into several smaller mutexes, and ported some of the Google IO fixes.
Here are the results we have so far. As usual for benchmarks, I used our workhorse Dell PowerEdge R900 with 16 cores and 32GB of RAM and RAID
Today, I would like to introduce “skyload“, a small project that I’ve been working on for the last couple of weeks. In brief, skyload is a libdrizzle-based load emulation tool that is capable of running concurrent load tests against database instances that speak the Drizzle and/or MySQL (http://www.mysql.com) protocol.
Something I’d like to emphasize here is that skyload is not a replacement for mysqlslap or drizzleslap, since it only provides a subset of what they can do. As I’ve stated in the project description, skyload is designed to do a good job at this subset of tasks by giving you[Read more...]
Let’s get down to how the latest version of Waffle Grid performs.
Starting off simple, let’s look at the difference between the Waffle Grid modes. As mentioned before, the LRU mode is the “classic” Waffle Grid setup: a page is put into memcached when the page is removed from the buffer pool via the LRU process, and when a page is retrieved from memcached it is expired so it’s no longer valid. In the new “Non-LRU” mode, when a page is read from disk, the page is placed in memcached, and when a dirty page is flushed to disk, the page is overwritten in memcached. So how do the different modes perform?
4GB Memcached, Read Ahead Enabled
              TPM       % Increase
No Waffle     3245.79   Baseline
Waffle LRU    10731.34  330.62%
Waffle NoLRU  10847.52  334.20%
You can see here that with 100% of[Read more...]
So I spent several hours over the last few days on the secondary index bug. Out of frustration I decided to try to bypass the LRU concept altogether and go to a true secondary page cache. In standard Waffle, a page is written to memcached only when it is expunged (or LRU’d) from the main buffer pool. This means anything in the BP should not be in memcached. Obviously with this approach we missed something; as Heikki pointed out in a comment to a previous post, it seems likely we are getting an old version of a page. Logically this could happen if we do not correctly expire a page on get, or if we bypass a push/LRU, leaving an old page in memcached to be retrieved later on.
So I was thinking: why not bypass the LRU process? While I feel this is the most efficient way to do this, it’s not the only way. I modified InnoDB to use the default LRU code and then modified the page get to push to[Read more...]
So let’s test some different configurations and try to build some best practices around multiple SSDs:
Which is better? Raid 5 or Raid 10?
As with regular disks, RAID 10 seems to perform better (except for pure reads). I did get a lot of movement from test to test, e.g. with the 67% read test -vs- the 75% or 80% tests. But all in all, RAID 10 seemed to be the optimal config.
Should you enable the controller cache? One of the things I have found in my single-drive tests is that “dumb” controllers tend to give better performance numbers than “smart” controllers. Really expensive controllers tend to have extra logic to compensate for the limitations of traditional disks. So I[Read more...]
It has been some time since we benchmarked MySQL/Galera with sysbench; we have been using it mostly for testing. Our recent visit to the Percona Performance Conference showed that sysbench is probably the most widely used tool for MySQL benchmarking in the community, and besides, it is the only benchmark I know that correctly measures response times. So I gave it a shot with our 0.6 release.
I ran the OLTP test on 1-4 large EC2 instances. At first I tried a 100K-row table, and it was good except that the deadlock rate was too high for my taste:
nodes  users  trx/s  deadlks  95%lat
------------------------------------
4      40     840    28.13    0.099
4      60     866    86.34    0.150
4      80     781    194.8    0.240
Note how the deadlock rate escalates with the number of concurrent connections. But what is 100K rows by modern standards? Kids’ play. So I tried 1M rows. And it shows that the Galera cluster is cut out for big tables:[Read more...]
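For reference, a sysbench OLTP invocation along the lines of the test above might look like this (sysbench 0.4 syntax; the host, credentials, run length, and thread count are placeholders, not the author's settings):

```shell
# Create and populate the 100K-row test table
sysbench --test=oltp --oltp-table-size=100000 \
         --mysql-host=node1.example.com --mysql-user=test --mysql-password=secret \
         prepare

# Run the mixed read/write OLTP transaction test with 40 concurrent clients;
# the summary includes the 95th-percentile response time
sysbench --test=oltp --oltp-table-size=100000 --oltp-test-mode=complex \
         --mysql-host=node1.example.com --mysql-user=test --mysql-password=secret \
         --num-threads=40 --max-time=300 --max-requests=0 run
```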
So in my previous post I showed some benchmarks showing a large drop-off in performance when you fill the X-25E. I wanted to follow up and say this: even if you do everything correctly (i.e. leave 50%+ space free, disable the controller cache, etc.) you may still see a drop in performance if your workload is heavily write-skewed. To show this I ran a 100% random read sysbench fileio test over a 12GB dataset (37.5% full); the tests were run back-to-back over several hours. Here is what we see:
*Note the scale is a little skewed here (I start at 2500 reqs).
Each data point represents 2 million IOs, so somewhere after about 6 million IOs we start to drop. At the end it looks like we stabilize around 2900-3000 requests per second, an overall drop of about 25%.
The plan was only to do two quick posts on RAID performance on the X-25E, but this was compelling enough to post on its own. In part I, Mark Callaghan asked: hey, what gives with the SLC Intel’s single-drive random write performance? It’s lower than the MLC drive. To be completely honest with you, I had overlooked it; after all, I was focusing on RAID performance. This was my mistake, because it is actually caused by one of the Achilles’ heels of most flash on the market today: crappy performance as you fill more of the drive. I don’t really know what the official title for it is, but I will call it “Drive Overeating”.
Let me try and put this simply: a quick trick most vendors use to push better random write #’s and help wear leveling is to not fully erase[Read more...]
Everyone loves SSDs. It’s a hot topic all around the MySQL community, with vendors lining up all kinds of new solutions to attack the “disk IO” problem that has plagued us all for years and years. At this year’s user conference I talked about SSDs and MySQL. Those who follow my blog know I love IO and I love to benchmark anything that can help overcome IO issues. One of the most exciting things out there at this point is the Intel X-25E drive. These bad boys are not only fast but relatively inexpensive. How fast are they? Let’s do a quick bit of review here and peek at the single-drive numbers from sysbench. Here you can see that a single X-25E outperforms all my other single-drive tests.
Javier Soltero, former CEO of Hyperic, has maintained that the sale of Hyperic to SpringSource was driven by discussions between himself and SpringSource CEO Rod Johnson, but the fact that the companies shared investors - Accel Partners and Benchmark Capital - no doubt accelerated the deal (and I wonder whether either could have afforded to acquire the other without shared investors).
When examining the open source vendor landscape it is tempting to imagine that the combined total could be bigger than the sum of its parts - that a combination of many open source product specialists could mount a challenge to Red Hat and Sun to claim the title of biggest open source software vendor.
Benchmark and Accel[Read more...]
Pre-UC I put out a teaser on some dbt2 scores in the 50K range. I mentioned and showed the graphs during my SSD session, but I thought I would show them here for those who skipped the UC or did not attend my session. Basically, what most people consider a classic “CPU-bound” workload, where all of your data easily fits into memory, can also see benefits from moving to SSDs. Remember, just because everything fits into memory doesn’t mean you’re not going to be doing some operations to disk (logging, flushes, etc.). Take a look:
Well, I just finished my SSD session. I was concerned by the number of slides I had, so I kept trimming them back. What happened? I finished early. Why? The problem with storage is it’s not really that sexy. I mean, nobody (sane, anyway) drools over drive specs (I do consider myself insane, by the way, and do drool over some drives). CPUs are sexy... memory... sexy... graphics cards, sexy... drives... are not. We only had a small crowd turn out (50-60 people maybe), but they were vocal and interactive. I got some great feedback from others who love IO performance as much as I do. In fact, one attendee stopped me in the hall and thanked me, telling me I helped him make up his mind to purchase some SSD drives, which made it worthwhile.