- Subquery materialization has a high startup cost (it needs to create and fill the temporary table).
- But afterwards, lookups are fast (the temporary table has a hash index, contains no duplicates, and is in memory).
- In other words, compared to EXISTS, the first evaluation of the IN predicate is slow (high startup cost) and all following evaluations are fast (just a hash lookup).
- In the DBT-3 setup, one outer table (named "part") has 200,000 rows, so there are 200,000 evaluations of IN. Subquery materialization therefore wins over EXISTS: the time it loses in the first evaluation is more than compensated by the many faster evaluations that follow.
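The trade-off in the list above can be written as a rough cost model. This is only a sketch: the symbols are shorthand introduced here for illustration (S for the one-time cost of building the temporary table, c_hash for one hash lookup, c_exists for one EXISTS evaluation, N for the number of outer rows), and they are not the optimizer's actual cost formulas.

```latex
C_{\text{mat}}(N) = S + N \, c_{\text{hash}}, \qquad
C_{\text{exists}}(N) = N \, c_{\text{exists}}

% Materialization is cheaper when C_mat(N) < C_exists(N), i.e.
N > \frac{S}{c_{\text{exists}} - c_{\text{hash}}}
\quad \text{(assuming } c_{\text{exists}} > c_{\text{hash}} \text{)}
```

With many outer rows, N is far above this threshold and the startup cost is amortized; with few outer rows, N stays below it and EXISTS is cheaper.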
To show it in action, I will use query Q16 again. First I will run it with the normal "part" table which has 200,000 rows. Then I will reduce this table to only 200 rows, and run the query again. Each time, I will run EXPLAIN to see what subquery strategy is chosen by the optimizer. I will also, by tweaking the optimizer_switch variable, force the optimizer to use the other strategy which it didn't like, in order to verify that it is indeed worse.
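The switching described above could look roughly like the following session. It is a sketch under assumptions, not the exact statements used for the measurements: the query is a simplified stand-in for Q16 (not the full benchmark query), and the column names follow the standard DBT-3 schema. The `materialization` and `subquery_materialization_cost_based` flags are the ones documented for `optimizer_switch` in MySQL 5.6.

```sql
-- Inspect the current optimizer settings.
SELECT @@optimizer_switch;

-- 1. Let the optimizer choose between materialization and EXISTS by cost.
SET optimizer_switch = 'materialization=on,subquery_materialization_cost_based=on';

-- Simplified stand-in for Q16 (assumption: standard DBT-3 column names).
EXPLAIN
SELECT COUNT(*)
FROM part, partsupp
WHERE p_partkey = ps_partkey
  AND ps_suppkey NOT IN (SELECT s_suppkey
                         FROM supplier
                         WHERE s_comment LIKE '%Customer%Complaints%');

-- 2. Force the EXISTS strategy by disabling materialization,
--    then rerun the EXPLAIN and the query itself.
SET optimizer_switch = 'materialization=off';

-- 3. Force materialization: keep it enabled but turn off the cost-based
--    choice, then rerun the EXPLAIN and the query itself.
SET optimizer_switch = 'materialization=on,subquery_materialization_cost_based=off';

-- Restore the default settings when done.
SET optimizer_switch = 'default';
```

In the EXPLAIN output, the subquery's select_type should read SUBQUERY when materialization is used and DEPENDENT SUBQUERY when the EXISTS strategy is used.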
For brevity, let me jump directly to the results, obtained with a release build of MySQL 5.6.7 on my machine:
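| Rows in part | Optimizer chooses | Execution time | If I force alternative |
|--------------|-------------------|----------------|------------------------|
| 200,000      | Materialization   | 550 ms         | 830 ms                 |
| 200          | EXISTS            | 1 ms           | 10 ms                  |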
We can see that in both cases the optimizer has made the right choice!