Salle gave the first answer. Converting the tables to MyISAM would only exacerbate the problem. Instead of having one file per table, the …
NDB Cluster is the only integrated sharding framework that I know of (educate me if I am wrong), but it is known to have issues with large joins. These days, large databases that would benefit from a sharding framework are often for social-networking-type applications, which require large joins.
Actually, it is not NDB that is the root cause of the joins problem; it is the way MySQL executes joins. Instead of asking the cluster for a large number of rows from the secondary table it joins to, the current version of MySQL asks for one row at a time. NDB Cluster answers those queries very rapidly, but the time to hop over the network kills performance. The MySQL 6.0 branch will implement the Batched Key Access (BKA) algorithm, which will solve that issue and might make NDB Cluster a killer platform for database applications.
Although BKA is not available right now, there are ways to execute those queries efficiently by rewriting them. The …
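The post is truncated here, but a common rewrite of this kind (all table and column names below are hypothetical) replaces the row-at-a-time nested-loop join with two set-oriented queries, so the cluster is asked for all the secondary rows in a single network round trip:

mysql> -- the join as written: MySQL fetches rows from `child` one at a time
mysql> select p.name, c.payload
    ->   from parent p join child c on c.parent_id = p.id
    ->  where p.created > '2009-01-01';

mysql> -- rewritten: grab the keys first, then let the application build an
mysql> -- IN (...) list so all child rows come back in one round trip
mysql> select id, name from parent where created > '2009-01-01';
mysql> select parent_id, payload from child where parent_id in (1, 7, 42);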
[Read more]

My favorite Linux tool in DB work is ‘iostat -x’ (and I really, really want to see it whenever I’m doing any kind of performance analysis), yet I had to learn its limitations and properties. For example, I took a 1s snapshot from a slightly overloaded 16-disk database box:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.12    0.00    2.57   21.65    0.00   67.66

Device:  rrqm/s  wrqm/s      r/s     w/s   rsec/s   wsec/s \
sda     7684.00   19.00  2420.00  498.00 81848.00  5287.00 \

        avgrq-sz avgqu-sz   await   svctm   %util
           29.86    32.99   11.17    0.34  100.00
I pasted this somewhere on IRC, and was told it “doesn’t look too healthy” and that the box is disk-bound. Now, to understand whether it really is, one has to understand what iostat is telling us here.
The first line of numbers shows that we’ve got plenty of CPU resources (that’s because nowadays it is quite difficult to get a box with not enough CPU power, …
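The excerpt cuts off here, but a quick back-of-envelope on the snapshot above shows why these numbers need care: (2420 + 498) requests/s × 0.34 ms svctm ≈ 0.99 s of device busy time per second of wall clock, which is why %util reads 100.00. Yet this is one logical device sitting in front of 16 disks: an avgqu-sz of ~33 is only about two outstanding requests per spindle, and await (11.17 ms) being roughly 33× svctm (0.34 ms) says requests spend most of their time waiting in queue rather than being serviced. On a striped array, 100% util therefore does not by itself mean the spindles are saturated.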
[Read more]

Why should you care about the latest “early adopter” release of the InnoDB Plugin, version 1.0.3? One word: performance! The release introduces these features:
- Enhanced concurrency & scalability: the “Google SMP patch” using atomic instructions for mutexing
- More efficient memory allocation: ability to use more scalable platform memory allocator
- Improved out-of-the-box scalability: unlimited concurrent thread execution by default
- Dynamic tuning: at run-time, enable or disable insert buffering and adaptive hash indexing (a quick sketch of these switches follows this list)
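A minimal sketch of what those run-time switches look like; the variable names below (innodb_adaptive_hash_index, and innodb_change_buffering with values 'inserts' or 'none') are from my reading of the plugin's documentation, so verify them against the 1.0.3 release notes before relying on this:

mysql> -- turn off the adaptive hash index without a server restart
mysql> set global innodb_adaptive_hash_index = 0;

mysql> -- disable insert buffering at run-time ('inserts' re-enables it)
mysql> set global innodb_change_buffering = 'none';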
These new performance features can yield up to twice the throughput or more, depending on your workload, platform and other tuning considerations. In another post, we explore some details about these changes, but first, what do these enhancements mean for performance and scalability?
In brief, we’ve tested …
[Read more]

In my previous post, we took a look at the file structure of the InnoDB primary key index. With the same table structure, but a slightly different dataset:
mysql> select * from test_innodb_growth limit 10;
+----+------------+--------+
| id | data       | spacer |
+----+------------+--------+
|  1 | a          |        |
|  2 | aa         |        |
|  3 | aaa        |        |
|  4 | aaaa       |        |
|  5 | aaaaa      |        |
|  6 | aaaaaa     |        |
|  7 | aaaaaaa    |        |
|  8 | aaaaaaaa   |        |
|  9 | aaaaaaaaa  |        |
| 10 | aaaaaaaaaa |        |
+----+------------+--------+
10 rows in set (0.00 sec)
All the rows after id = 10 have data = ‘aaaaa’. Let’s take a look at the secondary index ‘idxdata’. I recently had an argument with a client who claimed that a varchar is fully expanded in a secondary index. The following will prove it is not …
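The proof itself is truncated, but one quick way to test the claim against the table above is to watch what InnoDB actually allocates for the index (a rough sketch; exact sizes vary with page fill factors):

mysql> -- if varchars were expanded to full width in 'idxdata', Index_length
mysql> -- would grow as rows × declared column width; it grows far slower
mysql> show table status like 'test_innodb_growth'\G

Watching Index_length as rows are added shows growth proportional to the actual bytes stored (‘aaaaa’ is 5 bytes plus record overhead), not the declared varchar width.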
[Read more]

While going through the DATA TYPES section of the Certification Study Guide, I was reminded of the ENUM datatype, which I rarely use.
I usually create individual tables for enumerations, so that new values can be added with just an insert, or deprecated values can be marked as such.
However, today I got to thinking about the performance cost involved in all that joining, and how an ENUM column could improve a select.
Here’s what I came up with:
mysql> create table project_types (
    ->   id int unsigned not null auto_increment,
    ->   name char(30) not null,
    ->   primary key (id),
    ->   index(name)
    -> ) Engine InnoDB;
Query OK, 0 rows affected (0.00 sec)
Then create the projects table:
mysql> create table projects (
    ->   id int unsigned not null auto_increment,
    ->   name char(30) not null,
    ->   project_type int unsigned not null,
    ->   primary key (id), …

[Read more]
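The post is cut off above, but the ENUM variant it is building toward would presumably look something like this (the enumeration values are illustrative, not from the original):

mysql> create table projects_enum (
    ->   id int unsigned not null auto_increment,
    ->   name char(30) not null,
    ->   project_type enum('internal','client','research') not null,
    ->   primary key (id)
    -> ) Engine InnoDB;

A select can then filter on project_type directly, with no join against a lookup table; the trade-off is that adding a new value requires an ALTER TABLE rather than a simple insert.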
This is part 3 in my RAMSan Series.
While I am confident the read-only test was a reasonably good test (I just needed to push more), my mixed-load test was marred by issues. It was really a quick attempt to get a heavy read/write workload. I ran into issues with how I wrote this, so I will spare you the details. Some flash devices are notoriously poor at write performance, so it's important to at least briefly look at this. What I will share are the IOPS & latency numbers from this test. The mixed workload does updates & selects at this point; these are a mix of PK updates, secondary index updates, etc. These are typically built to run faster and smaller than the read-only IO-bound workload.
By the 11th interval the RAMSan was pretty much done. The peaks are what's interesting… let's look at this in a slightly different way.
So in the admittedly flawed mixed workload I am …
[Read more]

For a long time, people (including me) have complained that storeconfigs was a real resource hog. Unfortunately for us, this option is just too cool and useful to give up.
What’s storeconfigs?
Storeconfigs is a puppetmasterd option that stores each node's actual configuration to a database. It does this by comparing the result of the last compilation against what is actually in the database, resource by resource, then parameter by parameter, and so on.
The actual implementation is based on Rails' Active Record, which is a great way to abstract away the gory details of the database and to prototype code easily …
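The excerpt ends here; for context, enabling storeconfigs in this era of Puppet came down to a few settings in the master's puppet.conf. The option names below are from memory and should be treated as an assumption to check against your Puppet version's documentation:

[puppetmasterd]
    # store each node's compiled configuration in a database
    storeconfigs = true
    # Active Record connection settings (assumed names; verify per version)
    dbadapter  = mysql
    dbserver   = localhost
    dbname     = puppet
    dbuser     = puppet
    dbpassword = secret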
[Read more]

This is part 2 of my RAMSan Series.
In my normal suite of benchmarks I typically run dbt2 & sysbench oltp benchmarks next… and I did run them, but to be honest they just weren't that interesting. They showed an improvement over the Intel SSD results I ran on frankenmatt, but it was difficult to provide an apples-to-apples comparison. The server hardware was way different (CPU, memory, controller, etc.). Plus I typically run a test vs. non-flash, then a test with flash, and run tests with varying degrees of memory… the test box had 2GB of memory and sparse internal disk, so my normal test cycles were already in jeopardy. For what I ran, I was pushing CPU limits long before I was hitting the IOPS I saw above. In fact, in a 100W test I ended up peaking @ 1200 IOPS while the CPU was @ 100%.
The challenge is building an effective solution that will easily maximize MySQL …
[Read more]

I am going to get asked about this later today, so I figured I would head it off at the pass. I am working on a new benchmark, and I am using an early test release of it to do some of my SSD testing. Here is the quick 10K-foot overview of what it does.
I build three large tables and several small tables. The large tables share the exact same structure. Why 3? Well, it's a cheap, easy way to help ensure that the BP (buffer pool) is turned over, and I can use them later to join to each other. One of the things I wanted to accomplish was to test different types of queries, i.e. full scan, index scan with low cardinality, medium cardinality, high data skew, etc. So I added columns to help with each. Here is the structure of the large tables:
create table benchmark_a_lrg (
  a_id int not null auto_increment,
  b_id int,
  c_id int,
  a_name varchar(200),
  a_post text,
  a_var1 …
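The column list is cut off, but given the goals stated above, the query classes would look roughly like this (the indexed columns, their cardinalities, and the sibling table name benchmark_b_lrg are all assumptions, since only the start of one definition survives):

mysql> -- full table scan: no usable index
mysql> select count(*) from benchmark_a_lrg;

mysql> -- index scan on an assumed low-cardinality column
mysql> select count(*) from benchmark_a_lrg where b_id = 42;

mysql> -- join two of the identically structured large tables, helping
mysql> -- turn over the buffer pool as described above
mysql> select count(*)
    ->   from benchmark_a_lrg a
    ->   join benchmark_b_lrg b on b.a_id = a.a_id;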