State of the UNION

Since union queries were introduced in MySQL 4.0.0, they have been processed by executing each part of the union and adding the result to a temporary table. Then a final query block is executed that, if needed, filters out duplicates, do global ordering and limit the number of output rows. The good thing about this method is that it works for all union queries. The bad thing is that it always uses a temporary table, even in the really simple cases, and there’s always an extra query block to execute, even if there’s no sorting and no duplicate filtering.

As of MySQL 5.7.3, UNION ALL doesn’t use temporary tables unless needed for sorting. The result of each part of the union will be sent directly back to the client, without waiting in a temporary table or passing through an extra query block. When 5.7.3 came out, Facebook (who filed the feature request for this) cheered, and Morgan took the new feature for a test run. If you haven’t already done so, I recommend reading Morgan’s blog post that will tell you how to check if your query is optimized by this feature.

Performance improvement

It’s hard to give an objective measure of the speedup, and we haven’t claimed any numbers. The reason is that the most important speedup depends on the data and the queries that are executed. I could run some tests and present numbers, but unless your queries look like mine, my numbers wouldn’t help. You really need to evaluate this with your own queries and data.

The thing that has really changed a lot is the time the client has to wait for the first row. Earlier, the client would have to wait until the server had executed all parts of the query and inserted the resulting rows into the temporary table before the final query block could read the temporary table and output the first row to the client. Now the server outputs the first row as soon as it is found.

The speedup depends on the execution time of the remaining parts of the union. The time saved waiting for the first row is proportional to the total execution time of the rest of the query. If one part of the union is significantly slower than the others, and time to first row matters, you should consider pushing that query block to the end of the union.

Of course, none of this will reduce the total execution time. You may gain something due to less internal processing of the data, especially if the temporary table grows large and is moved from main memory to disk. Again, the performance improvement depends on your data and queries. As a rule of thumb, the worse your queries perform now, the more improvement you’ll see.

In a comment to the Facebook post, Robert Eisele suggested a further improvement: to stop execution early if the global LIMIT is reached. It’s a very good idea that would significantly reduce the server load if the limit is small relative to the total number of result rows. There’s no use in executing a query past the last row the client will see, and this, as opposed to the improvements mentioned above, will reduce the total execution time of the query. I searched the bug database and found that we already have a feature request for this: bug #7888. And I have an idea of how it could be implemented.

Another aspect of the new UNION ALL processing is that there is no longer a temporary table to fill your memory or disk while executing the query. Judging by the comments in the feature request, there are people that will appreciate this improvement, too. (BTW, remember to not put your on-disk temporary tables in main memory.)

Backporting

In a comment to the same Facebook post, Kacper Rowiński writes what I interpret as a request to backport the feature to 5.5. I understand that people want this, but let me give a few reasons not to do it:

  • Linux distros expect GA releases to only fix bugs, not introduce new features. Ideally, they would want only security bug fixes.
  • Every new feature we backport increases the risk of regressions, and none of us wants that.
  • Even though this feature is stand-alone and doesn’t directly depend on other new features, we’ve done other changes in 5.7 that makes it non-trivial to backport this feature.

That said, MySQL is open source, so if anyone wants to backport it and compile their own server, go ahead! Keep an eye out for how the global limit (i.e., the limit for the whole union) is handled. That is one of the changes in 5.7 that you’ll have to work around.