Late row lookups: InnoDB

Answering questions asked on the site.

Aryé asks:

Thanks for your article about late row lookups in MySQL.

I have two questions for you please:

  • Is this workaround specific to MyISAM engine?
  • How does PostgreSQL handle this?

The question concerns a certain workaround for MySQL LIMIT … OFFSET queries like this:

SELECT  *
FROM    mytable
ORDER BY
        id
LIMIT   10
OFFSET  10000

which can be improved using a little rewrite:

SELECT  m.*
FROM    (
        SELECT  id
        FROM    mytable
        ORDER BY
                id
        LIMIT   10
        OFFSET  10000
        ) q
JOIN    mytable m
ON      m.id = q.id
ORDER BY
        m.id

For the rationale behind this improvement, please read the original article.

Now, to the questions.

The second question is easy: PostgreSQL won't pull the fields from the table until it really needs them. If a query involving an ORDER BY along with LIMIT and OFFSET is optimized to use the index for the ORDER BY part, the table lookups won't happen for the records skipped.

Though PostgreSQL does not reflect the table lookups in the EXPLAIN output, a simple test would show us that they are done only LIMIT times, not OFFSET + LIMIT times (as MySQL does them).
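For illustration, this is how the plan could be examined in PostgreSQL (assuming a table like mytable above with a PRIMARY KEY on id; mytable_pkey is PostgreSQL's default name for the primary key index, and the actual costs will of course differ per installation):

```sql
-- PostgreSQL: check that the ORDER BY is served by the index
EXPLAIN ANALYZE
SELECT  *
FROM    mytable
ORDER BY
        id
LIMIT   10
OFFSET  10000;

-- The plan is expected to have this shape:
--   Limit
--     ->  Index Scan using mytable_pkey on mytable
-- There is no Sort node: the records come out of the index
-- already ordered, and the skipped records are never fully
-- assembled for output.
```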

Now, let's try to answer the first question: will this trick improve the queries against an InnoDB table?

To do this, we will create a sample table:

Table creation details

CREATE TABLE filler (
        id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

CREATE TABLE lookup (
        id INT NOT NULL PRIMARY KEY,
        value INT NOT NULL,
        shorttxt TEXT NOT NULL,
        longtxt TEXT NOT NULL
) ENGINE=InnoDB ROW_FORMAT=COMPACT;

CREATE INDEX
        ix_lookup_value
ON      lookup (value);

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt <= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

START TRANSACTION;
CALL prc_filler(100000);
COMMIT;

INSERT
INTO    lookup
SELECT  id, CEILING(RAND(20110211) * 1000000),
        RPAD('', CEILING(RAND(20110211 << 1) * 100), '*'),
        RPAD('', CEILING(8192 + RAND(20110211 << 1) * 100), '*')
FROM    filler;

There is one indexed INT column and two TEXT columns: shorttxt, storing short strings (1 to 100 characters), and longtxt, storing long strings (8193 to 8292 characters).

Let's run some queries against this table.

PRIMARY KEY and the INT column

No rewrite:

SELECT  value
FROM    lookup
ORDER BY
        id
LIMIT   10
OFFSET  90000

Query results

value
12336
314476
535374
733443
61089
105117
342318
396237
954232
582449
10 rows fetched in 0.0002s (0.0415s)
id | select_type | table  | type  | possible_keys | key     | key_len | ref | rows  | filtered | Extra
1  | SIMPLE      | lookup | index |               | PRIMARY | 4       |     | 90010 | 622.36   |
select `20110211_late`.`lookup`.`value` AS `value` from `20110211_late`.`lookup` order by `20110211_late`.`lookup`.`id` limit 90000,10

Rewrite:

SELECT  l.value
FROM    (
        SELECT  id
        FROM    lookup
        ORDER BY
                id
        LIMIT   10
        OFFSET  90000
        ) q
JOIN    lookup l
ON      l.id = q.id
ORDER BY
        q.id

Query results

value
12336
314476
535374
733443
61089
105117
342318
396237
954232
582449
10 rows fetched in 0.0002s (0.0407s)
id | select_type | table      | type   | possible_keys | key     | key_len | ref  | rows  | filtered | Extra
1  | PRIMARY     | <derived2> | ALL    |               |         |         |      | 10    | 100.00   | Using filesort
1  | PRIMARY     | l          | eq_ref | PRIMARY       | PRIMARY | 4       | q.id | 1     | 100.00   |
2  | DERIVED     | lookup     | index  |               | PRIMARY | 4       |      | 90010 | 622.36   | Using index
select `20110211_late`.`l`.`value` AS `value` from (select `20110211_late`.`lookup`.`id` AS `id` from `20110211_late`.`lookup` order by `20110211_late`.`lookup`.`id` limit 90000,10) `q` join `20110211_late`.`lookup` `l` where (`20110211_late`.`l`.`id` = `q`.`id`) order by `q`.`id`

As you can see, there is almost no difference (41 ms vs 40 ms).

InnoDB tables are clustered on their PRIMARY KEY columns, which means that the index on id (used to serve the ORDER BY condition) contains all the data the query needs. There is a negligible benefit in not looking up the value column early, because the column is tiny and the index pages need to be read anyway.

PRIMARY KEY and the short TEXT column

Rewrite:

SELECT  LENGTH(l.shorttxt)
FROM    (
        SELECT  id
        FROM    lookup
        ORDER BY
                id
        LIMIT   10
        OFFSET  90000
        ) q
JOIN    lookup l
ON      l.id = q.id
ORDER BY
        q.id

Query results

LENGTH(l.shorttxt)
17
41
52
39
36
65
15
78
44
85
10 rows fetched in 0.0003s (0.0401s)
id | select_type | table      | type   | possible_keys | key     | key_len | ref  | rows  | filtered | Extra
1  | PRIMARY     | <derived2> | ALL    |               |         |         |      | 10    | 100.00   | Using filesort
1  | PRIMARY     | l          | eq_ref | PRIMARY       | PRIMARY | 4       | q.id | 1     | 100.00   |
2  | DERIVED     | lookup     | index  |               | PRIMARY | 4       |      | 90010 | 622.36   | Using index
select length(`20110211_late`.`l`.`shorttxt`) AS `LENGTH(l.shorttxt)` from (select `20110211_late`.`lookup`.`id` AS `id` from `20110211_late`.`lookup` order by `20110211_late`.`lookup`.`id` limit 90000,10) `q` join `20110211_late`.`lookup` `l` where (`20110211_late`.`l`.`id` = `q`.`id`) order by `q`.`id`

No rewrite:

SELECT  LENGTH(shorttxt)
FROM    lookup
ORDER BY
        id
LIMIT   10
OFFSET  90000

Query results

LENGTH(shorttxt)
17
41
52
39
36
65
15
78
44
85
10 rows fetched in 0.0002s (0.0925s)
id | select_type | table  | type  | possible_keys | key     | key_len | ref | rows  | filtered | Extra
1  | SIMPLE      | lookup | index |               | PRIMARY | 4       |     | 90010 | 622.36   |
select length(`20110211_late`.`lookup`.`shorttxt`) AS `LENGTH(shorttxt)` from `20110211_late`.`lookup` order by `20110211_late`.`lookup`.`id` limit 90000,10

There is quite a significant difference (92 ms vs 40 ms), the reasons for which we will discuss a little bit later, after we see the results of the third query.

PRIMARY KEY and the long TEXT column

Rewrite:

SELECT  LENGTH(l.longtxt)
FROM    (
        SELECT  id
        FROM    lookup
        ORDER BY
                id
        LIMIT   10
        OFFSET  90000
        ) q
JOIN    lookup l
ON      l.id = q.id
ORDER BY
        q.id

Query results

LENGTH(l.longtxt)
8209
8233
8244
8231
8228
8257
8207
8270
8236
8277
10 rows fetched in 0.0002s (0.0396s)
id | select_type | table      | type   | possible_keys | key     | key_len | ref  | rows  | filtered | Extra
1  | PRIMARY     | <derived2> | ALL    |               |         |         |      | 10    | 100.00   | Using filesort
1  | PRIMARY     | l          | eq_ref | PRIMARY       | PRIMARY | 4       | q.id | 1     | 100.00   |
2  | DERIVED     | lookup     | index  |               | PRIMARY | 4       |      | 90010 | 622.36   | Using index
select length(`20110211_late`.`l`.`longtxt`) AS `LENGTH(l.longtxt)` from (select `20110211_late`.`lookup`.`id` AS `id` from `20110211_late`.`lookup` order by `20110211_late`.`lookup`.`id` limit 90000,10) `q` join `20110211_late`.`lookup` `l` where (`20110211_late`.`l`.`id` = `q`.`id`) order by `q`.`id`

No rewrite:

SELECT  LENGTH(longtxt)
FROM    lookup
ORDER BY
        id
LIMIT   10
OFFSET  90000

Query results

LENGTH(longtxt)
8209
8233
8244
8231
8228
8257
8207
8270
8236
8277
10 rows fetched in 0.0000s (30.3594s)
id | select_type | table  | type  | possible_keys | key     | key_len | ref | rows  | filtered | Extra
1  | SIMPLE      | lookup | index |               | PRIMARY | 4       |     | 90010 | 622.36   |
select length(`20110211_late`.`lookup`.`longtxt`) AS `LENGTH(longtxt)` from `20110211_late`.`lookup` order by `20110211_late`.`lookup`.`id` limit 90000,10

The query employing the trick runs in the same 40 ms, while the straightforward query takes as much as 30 seconds (30,359 ms, to be exact).

Why such a difference?

The reason is that InnoDB, despite the fact that it stores the data in the clustered index, is still able to move some data out of the index. This is called external storage.

With the COMPACT row format I used to create the table, InnoDB, when inserting a record with two TEXT columns, will first try to fit both of them on the page. If this is not possible, it will split the longest of the columns in two parts: the first 768 bytes will be stored on the page, and the remaining data will be stored on a separate page (or pages), with a pointer to these data stored in the original clustered index record. This is repeated until all TEXT columns fit on the page or there is no more space there (in which case an error is thrown).

This means that all TEXT columns shorter than 768 bytes will be stored completely on the page, while longer ones can be either split or stored as a whole (with at least the first 768 bytes still being on the page).

With the column lengths chosen (1 to 100 and 8K to 8K + 100, respectively), it can be easily seen that shorttxt will always be stored on-page, while longtxt will always be split (since InnoDB allows at most 8K per record (minus some overhead) to be stored on one page).
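As a side note (not part of the test above): the newer Barracuda file format offers ROW_FORMAT=DYNAMIC, which stores overlong columns fully off-page, keeping only a 20-byte pointer in the clustered index record instead of the 768-byte prefix. A hypothetical variant of the test table would look like this:

```sql
-- With DYNAMIC row format, a longtxt value that does not fit
-- on the page is moved off-page entirely: no 768-byte prefix
-- is kept in the clustered index, only a 20-byte pointer.
CREATE TABLE lookup_dynamic (
        id INT NOT NULL PRIMARY KEY,
        value INT NOT NULL,
        shorttxt TEXT NOT NULL,
        longtxt TEXT NOT NULL
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
```

This changes where the data lives, but not the moral of the story: the straightforward LIMIT … OFFSET query would still have to follow the off-page pointers for every record scanned.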

Now, this becomes more clear. As with MyISAM, the straightforward query involving longtxt has to perform two page lookups for each record scanned: the first one on the clustered index, the second one on the external storage. This takes lots of time and may even spoil the InnoDB cache with unneeded data (which would lead to an increased cache miss ratio).

The query on shorttxt is not that bad, since it only requires some extra CPU cycles per record to calculate the string length.

Now, let's check one more query, which orders by a field with a secondary index rather than by id:

Secondary index and the short TEXT column

Rewrite:

SELECT  LENGTH(l.shorttxt)
FROM    (
        SELECT  id, value
        FROM    lookup
        ORDER BY
                value
        LIMIT   10
        OFFSET  90000
        ) q
JOIN    lookup l
ON      l.id = q.id
ORDER BY
        q.value

Query results

LENGTH(l.shorttxt)
14
85
16
34
77
78
3
49
53
60
10 rows fetched in 0.0002s (0.0262s)
id | select_type | table      | type   | possible_keys | key             | key_len | ref  | rows  | filtered | Extra
1  | PRIMARY     | <derived2> | ALL    |               |                 |         |      | 10    | 100.00   | Using filesort
1  | PRIMARY     | l          | eq_ref | PRIMARY       | PRIMARY         | 4       | q.id | 1     | 100.00   |
2  | DERIVED     | lookup     | index  |               | ix_lookup_value | 4       |      | 90010 | 622.36   | Using index
select length(`20110211_late`.`l`.`shorttxt`) AS `LENGTH(l.shorttxt)` from (select `20110211_late`.`lookup`.`id` AS `id`,`20110211_late`.`lookup`.`value` AS `value` from `20110211_late`.`lookup` order by `20110211_late`.`lookup`.`value` limit 90000,10) `q` join `20110211_late`.`lookup` `l` where (`20110211_late`.`l`.`id` = `q`.`id`) order by `q`.`value`

No rewrite:

SELECT  LENGTH(shorttxt)
FROM    lookup
ORDER BY
        value
LIMIT   10
OFFSET  90000

Query results

LENGTH(shorttxt)
14
85
16
34
77
78
3
49
53
60
10 rows fetched in 0.0002s (0.2663s)
id | select_type | table  | type  | possible_keys | key             | key_len | ref | rows  | filtered | Extra
1  | SIMPLE      | lookup | index |               | ix_lookup_value | 4       |     | 90010 | 622.36   |
select length(`20110211_late`.`lookup`.`shorttxt`) AS `LENGTH(shorttxt)` from `20110211_late`.`lookup` order by `20110211_late`.`lookup`.`value` limit 90000,10

The execution times for these queries vary tenfold: 26 ms vs 266 ms.

As with MyISAM, a secondary index requires an extra lookup to retrieve the actual values from the table (the only difference being that in MyISAM any index is secondary, including the one used to enforce the PRIMARY KEY).

The first query does not perform these row lookups for the skipped records and is hence ten times as fast. It is even faster than the queries ordering on the PRIMARY KEY, because the secondary index contains significantly less data than the clustered one, holds many more records per page, is therefore shallower and can be traversed faster.

The second query does perform the early row lookups, as usual.
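The size difference between the two indexes can also be checked directly. On MySQL 5.6 and later (a feature added after this article was written), the persistent statistics tables expose per-index sizes; a sketch, assuming the database name used in the plans above:

```sql
-- Compare the footprint of the PRIMARY (clustered) index
-- and ix_lookup_value; for stat_name = 'size', stat_value
-- is the index size in pages.
SELECT  index_name,
        stat_value * @@innodb_page_size AS size_in_bytes
FROM    mysql.innodb_index_stats
WHERE   database_name = '20110211_late'
        AND table_name = 'lookup'
        AND stat_name = 'size';
```

With 8K+ of longtxt data per record, the clustered index should come out orders of magnitude larger than the narrow (id, value) secondary index.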

Conclusion

The trick used to avoid early row lookups in LIMIT … OFFSET queries is useful on InnoDB tables too, though to a different extent, depending on the ORDER BY condition and the columns involved:

  • It's very useful on queries involving columns stored off-page (long TEXT, BLOB and VARCHAR columns)
  • It's very useful on ORDER BY conditions served by secondary indexes
  • It's quite useful on moderately sized columns (still stored on-page) or CPU-intensive expressions
  • It's almost useless on short columns without complex CPU-intensive processing

Hope that helps.

I'm always glad to answer questions regarding database queries.

Ask me a question

The post Late row lookups: InnoDB appeared first on EXPLAIN EXTENDED.