In the process of building a new benchmark tool for Yahoo, I
needed a good "guinea pig." I think I found the one by showing
how much more powerful covering indexes can be with InnoDB. A
covering index is one where the index itself contains all of the
necessary data field(s). In other words, no need to access the
data page!
Here's a sample table:
CREATE TABLE entity (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`entity_type` enum('Advertiser','Publisher','Network') NOT NULL DEFAULT 'Advertiser',
`managing_entity_id` int(10) unsigned DEFAULT NULL,
....
PRIMARY KEY (`id`),
KEY `ix_entity_managing_entity_id` (`managing_entity_id`),
....
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
It's a wide table with 88 some odd columns with an average row
length of 240 bytes.
Now to test the QPS difference between a query like this:
SELECT * FROM entity WHERE entity.managing_entity_id = X;
and a query that uses a covering index:
SELECT entity.managing_entity_id FROM entity WHERE entity.managing_entity_id = X;
I have an SQL file that contains 2000 SQL statements with fairly
random ids, and I'll run my benchmark program with 4 clients a
few times to get a good QPS reading (throwing out the low & high
numbers).
Results with SELECT *:
Tot Q=2000,Tot T=1.757571,QPS=1137.93411475269,Min=0.0006665,Max=0.027022,90th Pctl=0.000946
Tot Q=2000,Tot T=1.770522,QPS=1129.61036349732,Min=0.0006635,Max=0.026858,90th Pctl=0.00096675
Tot Q=2000,Tot T=1.753147,QPS=1140.80564835693,Min=0.00066325,Max=0.026966,90th Pctl=0.00095175
Results with covering index:
Tot Q=2000,Tot T=0.703581,QPS=2842.60092299252,Min=0.00026625,Max=0.014821,90th Pctl=0.00035325
Tot Q=2000,Tot T=0.897792,QPS=2227.6874821785,Min=0.00026625,Max=0.04546425,90th Pctl=0.0003595
Tot Q=2000,Tot T=0.720616,QPS=2775.403266094,Min=0.000257,Max=0.01566475,90th Pctl=0.0003452
Fairly significant results. I've seen many times where SQL has
been hastily written just to perform a simple check if a record
exists or not. I've always tried to rationalize this with the
developers I work with, but I never had the raw numbers to show
the exact impact. If the query happens only a handful of times,
not a huge deal. But, if it happens hundreds of thousands or
millions of times per day, then that's a different story, and
making a very simple rewrite of the SQL helps immensely, and
could even save on cost, if you're thinking about adding another
slave to spread out load.