Dynamic range access (and recent changes)

Dynamic range access (aka “Range checked for each record” since that is what EXPLAIN will say in the ‘Extra’ column) is one of the big mysteries to MySQL users. The reason is that it is used for queries that are less common, such as queries with non-equality join predicates. The following query is an example; it finds the number of messages sent since the user was last logged in:

EXPLAIN SELECT user.user_id, COUNT(message.id)
FROM message, user
WHERE message.send_date >= user.last_activity
GROUP BY user.user_id;

However, although such join predicates are less common, it doesn’t mean that the dynamic range access method is not important.

How the dynamic range access method works

It is easier to explain dynamic range access if we consider a “normal” join first, so let’s take a look at this query with an equality join predicate:

mysql> EXPLAIN SELECT user.user_id, user.name, message.id
    -> FROM user JOIN message ON user_id=message.sender_id
    -> WHERE user.name="Fred%";
+----+-------------+---------+--------+---------+---------+------+-------------+
| id | select_type | table   | type   | key     | ref     | rows | Extra       |
+----+-------------+---------+--------+---------+---------+------+-------------+
|  1 | SIMPLE      | user    | ALL    | NULL    | NULL    | 6384 | Using index |
|  1 | SIMPLE      | message | ref    | send_id | user_id |   10 | Using index |
+----+-------------+---------+--------+---------+---------+------+-------------+

What this tells us is that MySQL will read a row from ‘user’ and, if the name starts with “Fred”, check the ‘send_id’ index of ‘message’ to see if there are any rows with exactly that sender_id. If so, each  {user,message} row combination is returned to the user. MySQL then repeats the process with the next row in ‘user’ until all rows have been processed.

Dynamic range works in a similar way.  Again, consider the query with a non-equality join predicate shown above:

mysql> EXPLAIN SELECT user.user_id, COUNT(message.id)
    -> FROM message, user
    -> WHERE message.send_date >= user.last_activity
    -> GROUP BY user.user_id;
+-----+---------+-------+-------+------------------------------------------------+
| ... | table   | type  | rows  | Extra                                          |
+-----+---------+-------+-------+------------------------------------------------+
| ... | user    | index | 6384  | NULL                                           |
| ... | message | ALL   | 32240 | Range checked for each record (index map: 0x2) |
+-----+---------+-------+-------+------------------------------------------------+

For this query, MySQL will start by reading a row from ‘user’. MySQL now has to find all messages not yet delivered to the user, and it can do this in two ways: Read all rows in the table through a table scan or, read all rows which were sent after the last time this user logged out using the dynamic range access method on index ’0×2′.

To make this choice between range and table scan, range access analysis is now performed on table ‘message’ using the value of ‘user.last_activity’ as input. If the cost calculations indicate that range access is cheaper than a table scan, the range access method is used. This analysis is done independently for each row in ‘user’. Thus, if a user “John” has recently been logged in, it is likely that the range access method is used to get a few messages not yet delivered to him. On the other hand, it is likely that a table scan will be used to all the messages not yet delivered to user “Jen” who hasn’t logged in for two years.

The dynamic range access method may significantly improve the response time of a query if only a small portion of the rows match a non-equality join predicate, and this is why it is an important access method even if it is used only for less common queries:

# Use dynamic range access
mysql> SELECT user.user_id, count(message.id)
    -> FROM message, user
    -> WHERE message.send_date >= user.last_activity
    -> GROUP BY user.user_id;
...
1 row in set (1.06 sec)

# Same query, but do NOT use dynamic range access
mysql> SELECT user.user_id, count(message.id)
    -> FROM message IGNORE INDEX(send_date), user
    -> WHERE message.send_date >= user.last_activity
    -> GROUP BY user.user_id;
...
1 row in set (44.84 sec)

Recent changes to the dynamic range access method

Now that dynamic range access has been explained it’s time to discuss a few fairly recent improvements. Before MySQL 5.6.8, dynamic range could not be used for subquery predicates with outer references (columns resolved outside the subquery) like this one:

EXPLAIN SELECT id, salary
FROM employee
WHERE NOT EXISTS (
      SELECT *
      FROM employee e2
      WHERE e2.salary > employee.salary);

Notice the employee reference in the subquery. BUG#41659 fixes this so that the dynamic range access method can be used for outer references like the above.

The other recent change applies to the cases where comparison between two  columns of different data type cannot be performed in the index. One example is comparison between a temporal datatype (date, datetime etc) and a VARCHAR column. For instance, let’s say that in the query above, ‘message.send_date’ is a VARCHAR and ‘user.last_activity’ is a DATE. When this is the case, access methods that do comparisons in the index (ref, eq_ref, range) cannot be used to perform the join.

However, prior to MySQL 5.7.4, the dynamic range access method would not be completely refused for a query like this. In fact, the queries that use dynamic range above would still print “Range checked for… ” in EXPLAIN, and the range access analysis would be performed for each row in the referred-to table… and the range analysis module would have to reply “Sorry – can’t do it” every time. For the query

SELECT user.user_id, count(message.id)
FROM message, user
WHERE message.send_date >= user.last_activity
GROUP BY user.user_id;

it would be something like this:

  1. Read a row from ‘user’
  2. Invoke range analysis: “what’s the cost of doing range access using the value of ‘user.id’“?
  3. Range analysis calculates and after a while comes up with the answer: “Sorry, that’s impossible because these datatypes are not comparable in the index”
  4. Perform table scan on ‘message’ to find all join matches
  5. Repeat from 1)

It is even possible to get in this situation with equality predicates because the ref/eq_ref analysis module of MySQL was made aware of such incompatibilities a long time ago. Thus, the query

SELECT user.user_id, user.name, message.id
FROM user JOIN message ON user_id=message.sender_id
WHERE user.name="Fred%";

could actually end up using dynamic range if ‘user_id’ was incomparable (in the index) to ‘sender_id’ because ref access is refused once and for all while dynamic range access is not. That’s what this guy indirectly found out.

Obviously, the MySQL optimizer would do better if it realized that dynamic range access was impossible the first time the range access module was invoked, and starting from MySQL 5.7.4 it does. The performance implications are not dramatic, but you can expect up to ~10% reduced response time (depending on the data) for such queries.