Home |  MySQL Buzz |  FAQ |  Feeds |  Submit your blog feed |  Feedback |  Archive |  Aggregate feed RSS 2.0 English Deutsch Español Français Italiano 日本語 Русский Português 中文
Employee Cost-based choice between subquery materialization and EXISTS
+7 Vote Up -1 Vote Down
In a previous post, I had demonstrated how subquery materialization, introduced in MySQL 5.6.5, improves the performance of certain queries, like query Q16 of DBT3. Such improvement was easily explained:
  • Subquery materialization has a high start up cost (it needs to create and fill the temporary table).
  • But afterwards it has fast lookups (temporary table has a hash index, no duplicates, and is in memory).
  • In other words, compared to EXISTS, the first evaluation of the IN predicate is slow (high start up cost) and all following evaluations are fast (just a hash lookup).
  • In the DBT 3 setup, one outer table (named "part") has 200,000 rows, so there are 200,000 evaluations of IN, thus subquery materialization wins over EXISTS because the time it loses in the first evaluation is more than compensated by the many faster following evaluations.
However, if there were only few outer rows, then subquery materialization should logically be slower than EXISTS (the compensation would not happen anymore)... MySQL 5.6.5, by blindly always choosing subquery materialization, takes the risk of making certain queries slower. There needs to be a cost-based choice between the two strategies, to pick the best, depending on the situation! That is what I have implemented in MySQL 5.6.7.

To show it in action, I will use query Q16 again. First I will run it with the normal "part" table which has 200,000 rows. Then I will reduce this table to only 200 rows, and run the query again. Each time, I will run EXPLAIN to see what subquery strategy is chosen by the optimizer. I will also, by tweaking the optimizer_switch variable, force the optimizer to use the other strategy which it didn't like, in order to verify that it is indeed worse.

For brevity, let me jump directly to the results, obtained with a release build of MySQL 5.6.7 on my machine:

Rows in part Optimizer chooses Execution time If I force alternative 200,000 Materialization 550 ms 830 ms 200 EXISTS 1 ms 10 ms
We can see that in both cases the optimizer has made the right choice!

Votes:

You must be logged in with a MySQL account to vote on Planet MySQL entries. More information on PlanetMySQL voting.

Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates   Legal Policies | Your Privacy Rights | Terms of Use

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.