One of the new features of MySQL 5.6 is the optimizer trace. What does it do? Well, you know Explain, right? Just add Explain before a query and MySQL will print out how it would execute the query. The optimizer trace does that, except it gives you a little more background into to how it makes its decisions. Think of it as “It’s a Wonderful Life” for indexes – it shows you what the world would be like without that primary key or index.
If you ask, why did MySQL choose to run a full table scan vs. an index seek, the optimizer trace can help you get a better picture of the situation. The other big deal with the optimizer trace is that it’ll show you the relative costs of operations. Yeah, MySQL already tells you how many rows it’ll have to read, but now you know how much those cost given the access path (index seek, scan, etc.)
Example
Let’s jump in to see what it looks like. First, enable the optimizer trace (this is restricted to MySQL 5.6, so if you use it on an earlier version, you’ll get a warning):
Set optimizer_trace="enabled=on";
Now we can run a query (I’m using the sample employee database):
select dept_no from departments join dept_manager using(dept_no) where dept_manager.to_date > now() group by dept_no having count(emp_no) > 1;
Now we can get the trace:
Select * From Information_Schema.Optimizer_Trace;
Once we’re done tracing, we can disable it:
Set optimizer_trace="enabled=off";
This will produce a big ol’ JSON document, with a whole bunch of fun stuff in it:
{ "steps": [ { "join_preparation": { "select#": 1, "steps": [ { "expanded_query": "/* select#1 */ select `departments`.`dept_no` AS `dept_no` from (`departments` join `dept_manager` on((`departments`.`dept_no` = `dept_manager`.`dept_no`))) where (`dept_manager`.`to_date` > now()) group by `departments`.`dept_no` having (count(`dept_manager`.`emp_no`) > 1) limit 0,1000" } ] } },...
You can see the full version here. I used to Online JSON Viewer to sort through the layers.
Interesting Parts
At the first level, we can see that the optimizer goes through three parts: preparation, optimization, and execution. The optimization step is where the meat of the action happens.
In step 1 and 2, we see conditional processing, which is all things MySQL can assume about the stuff in the Where, On, and Having Clauses. The only thing I saw in equality propagation is the application of a transitive closure (if a=b and b=c then a=c), but there might be other optimizations that are applied. The next bit is constant propagation, which means that constants will be evaluated. Trivial condition removal takes out conditions like 1=1 or (a=b and ab).
There are other steps in between, but I’m not going to cover those right now. The next interesting piece is the “considered execution plans”, which also includes the chosen plan. Each item under the considered execution plan is a starting table, and each access path is listed. The cool thing about this is that is shows the costs of accessing the table for each plan.
The query optimizer is choosing in what order to join the tables. As you get more tables in the join, there are more possibilities (specifically, n! combinations where n is the number of tables). Note that MySQL tries to use the order you specified first.
As a note, I would love to hear from the optimizer team to get better definitions for the terms in this document. This is an exciting piece of functionality and I’d like to understand it better.
How to Use It
The first use is just learning how MySQL works. Understanding the magical steps that take your English-like SQL statement and transform it into nitty-gritty IO is useful when you’re building a high-performance system. Sometimes the magic breaks down and the leaky abstraction leaks.
The other use is double-checking your work. When you see that the trivial condition removal step removes your entire Where clause, perhaps you made a typo.
The final use I’ll mention is understanding how much joins are costing you. When you bring in that other table, how much is it actually costing you vs. the rest of the query. When I was programming in SQL Server, this was a great tool to provide management with a quick estimate of how much feature X impacted our performance. It looks like MySQL will get cost estimations directly in the Explain plan in 5.7.
Additional Notes
If you’re trying to view a particularly large trace, it might split beyond what MySQL Workbench can handle. You can just dump the trace directly into a file like so:
Select Trace Into Dumpfile "C:/temp/trace.txt" From Information_Schema.Optimizer_Trace;
Or if you’re on Linux/Mac OS X:
Select Trace Into Dumpfile "/tmp/trace.txt" From Information_Schema.Optimizer_Trace;
By default, this captures just one trace. You can configure it to keep track of more than one:
Set optimizer_trace_offset=-2; Set optimizer_trace_limit=2;
You can replace 2 with however many you’d like to see (the offset is negative).
Want to hear more about what I’ve learned about MySQL? You can subscribe below and I’ll send you updates on a new book I’m writing that helps you find the goodies in MySQL. If you sign-up, I’ll also send out stuff on MySQL like tips and tutorials.
Email
Your Name
No spam
Further Reading
Introducing MySQL Optimizer Trace