New optimization in MariaDB 10.0: EXISTS-to-IN subquery rewrite

MariaDB 10.0 has got another new feature: Sanja Byelkin has pushed EXISTS-to-IN rewrite feature. It is an optimization targeted at EXISTS subqueries. The idea behind it is simple. EXISTS subqueries often have form:

EXISTS (SELECT …  FROM … WHERE outer_col=inner_col AND inner_where)

where outer_col=inner_col is the only place where the subquery has references to outside. In this case, the subquery can be converted into an uncorrelated IN:

outer_col IN (SELECT inner_col FROM … WHERE inner_where)

The conversion opens new opportunities for the optimizer. Correlated EXISTS subquery has only one execution strategy. Uncorrelated IN subquery has two:

  1. re-run the subquery every time the subquery is evaluated (the same as in EXISTS)
  2. Materialize the subquery output into a temporary table. Then, evaluation of subquery predicate will be reduced to making a lookup in that temporary table

The optimizer is able to make a cost-based choice whether to do #1 or #2. But wait, there is more: if the subquery is at top level of the WHERE clause, like

SELECT * FROM person WHERE EXISTS(SELECT… FROM book WHERE book.author=person.name) AND …

then the subquery is converted into a semi-join, which has even more execution strategies, including those that will use the subquery to “drive” access to tables in the parent select. In the above example, EXISTS-to-IN allows the optimizer to first access the books, and then for each book access its author. When your database has a few books and lots of people, this execution order can give a big speedup.

Now, a little context. EXISTS->IN rewrite is useful, but is not a revolutionary breakthrough. PostgreSQL has a similar optimization. They have introduced it in release 8.4.1, which happened in July, 2009. IIRC MySQL/Sun also had a task for adding EXISTS->IN rewrite, also around 2009. I don’t know what the fate of that task was. It is definitely not in MySQL 5.6, and google doesn’t find the worklog entry, I suppose because it has been made private. MariaDB’s implementation was developed from scratch here at Monty Program Ab. Now, when we have it in MariaDB 10.0, I’m wondering what are the other important subquery cases that we have not yet covered.