Covering indexes in MySQL - revisited (with benchmark)

In the process of building a new benchmark tool for Yahoo, I needed a good "guinea pig." I think I found the one by showing how much more powerful covering indexes can be with InnoDB. A covering index is one where the index itself contains all of the necessary data field(s). In other words, no need to access the data page!

Here's a sample table:

CREATE TABLE entity (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`entity_type` enum('Advertiser','Publisher','Network') NOT NULL DEFAULT 'Advertiser',
`managing_entity_id` int(10) unsigned DEFAULT NULL,
....
PRIMARY KEY (`id`),
KEY `ix_entity_managing_entity_id` (`managing_entity_id`),
....
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


It's a wide table with 88 some odd columns with an average row length of 240 bytes.

Now to test the QPS difference between a query like this:

SELECT * FROM entity WHERE entity.managing_entity_id = X;


and a query that uses a covering index:

SELECT entity.managing_entity_id FROM entity WHERE entity.managing_entity_id = X;


I have an SQL file that contains 2000 SQL statements with fairly random ids, and I'll run my benchmark program with 4 clients a few times to get a good QPS reading (throwing out the low & high numbers).

Results with SELECT *:

Tot Q=2000,Tot T=1.757571,QPS=1137.93411475269,Min=0.0006665,Max=0.027022,90th Pctl=0.000946
Tot Q=2000,Tot T=1.770522,QPS=1129.61036349732,Min=0.0006635,Max=0.026858,90th Pctl=0.00096675
Tot Q=2000,Tot T=1.753147,QPS=1140.80564835693,Min=0.00066325,Max=0.026966,90th Pctl=0.00095175


Results with covering index:

Tot Q=2000,Tot T=0.703581,QPS=2842.60092299252,Min=0.00026625,Max=0.014821,90th Pctl=0.00035325
Tot Q=2000,Tot T=0.897792,QPS=2227.6874821785,Min=0.00026625,Max=0.04546425,90th Pctl=0.0003595
Tot Q=2000,Tot T=0.720616,QPS=2775.403266094,Min=0.000257,Max=0.01566475,90th Pctl=0.0003452


Fairly significant results. I've seen many times where SQL has been hastily written just to perform a simple check if a record exists or not. I've always tried to rationalize this with the developers I work with, but I never had the raw numbers to show the exact impact. If the query happens only a handful of times, not a huge deal. But, if it happens hundreds of thousands or millions of times per day, then that's a different story, and making a very simple rewrite of the SQL helps immensely, and could even save on cost, if you're thinking about adding another slave to spread out load.