I love my job. One of the best feelings is when I find an interesting paper and use it to solve a real problem. It feels like I found a cheat code. Instead of having to do a lot of hard thinking, I can just stand on the shoulders of really big people and take a shortcut. Here, I want to share a recent project that I could solve using a public paper.
Introduction # In this blog post, we will discuss an example of a change to the Vitess query planner and how it enhances the optimization process. The new model focuses on making every step in the optimization pipeline a runnable plan. This approach offers several benefits, including simpler understanding and reasoning, ease of testing, and the ability to use arbitrary expressions in ordering, grouping, and aggregations. Vitess distributed query planner # VTGate is the proxy component of Vitess.
Introduction # I recently encountered an intriguing bug. A user reported that their query was causing vtgate to fetch a large amount of data, sometimes resulting in an Out Of Memory (OOM) error. For a deeper understanding of grouping and aggregations on Vitess, I recommend reading this prior blog post. The Query # The problematic query was: selectsum(user.type)fromuserjoinuser_extraonuser.team_id=user_extra.idgroupbyuser_extra.idorderbyuser_extra.id;The planner was unable to delegate aggregation to MySQL, leading to the fetching of a significant amount of data for aggregation.
Updated June 1, 2023.
Suboptimal MySQL ORDER BY implementation, especially together with MySQL LIMIT is often the cause of MySQL performance problems. Here is what you need to know about MySQL ORDER BY LIMIT optimization to avoid these problems.
Try Now: Free your applications with Percona Distribution for MySQL
MySQL LIMIT clause
The MySQL LIMIT clause is a valuable tool for controlling the number of rows returned by a SELECT statement. By specifying the maximum number of rows to retrieve from the result set, it enables you to work with subsets of data, especially in situations involving large tables. This feature enhances query performance and optimizes resource usage by fetching only …
[Read more]Sometimes, the MySQL Optimizer chooses a wrong plan, and a query that should execute in less than 0.1 second ends-up running for 12 minutes ! This is not a new problem: bugs about this can be traced back to 2014, and a blog post on the subject dates of 2015. But even if this is old news, because this problem recently came to my attention, it is a problem worth writing on.
This
When Index Condition Pushdown is used, you cannot trust rows examined as reported by the slow query log (nor in some Performance Schema tables) and you cannot trust the global status Innodb_rows_read (nor the InnoDB Metrics dml_reads). These can lead to an incorrect query complexity analysis, to a mistaken estimation of the total load of a MySQL server, and to a lot of wasted time.&
Query planning is hard # Have you ever wondered what goes on behind the scenes when you execute a SQL query? What steps are taken to access your data? In this article, I'll talk about the history of Vitess's V3 query planner, why we created a new query planner, and the development of the new Gen4 query planner. Vitess is a horizontally scalable database solution which means that a single table can be spread out across multiple database instances.
Originally posted at Andres's blog. Traditional query optimizing is mostly about two things: first, in which order and from where to access data, and then how to then combine it. You have probably seen the tree shapes execution plans that are produced from query planning. I’ll use an example from the MySQL docs, using FORMAT=TREE which was introduced in MySQL 8.0: mysql>EXPLAINFORMAT=TREE->SELECT*->FROMt1->JOINt2->ON(t1.c1=t2.c1ANDt1.c2<t2.c2)->JOINt3->ON(t2.c1=t3.c1)\G***************************1.row***************************EXPLAIN:->Innerhashjoin(t3.c1=t1.c1)(cost=1.05rows=1)->Tablescanont3(cost=0.35rows=1)->Hash->Filter:(t1.c2<t2.c2)(cost=0.70rows=1)->Innerhashjoin(t2.c1=t1.c1)(cost=0.70rows=1)->Tablescanont2(cost=0.35rows=1)->Hash->Tablescanont1(cost=0.35rows=1)Here we can see that the MySQL optimizer thinks the best plan is to start reading from t1 using a table scan.
MySQL introduced optimizer hints in version 5.7 and greatly
extended the feature in MySQL 8. One thing that has been missing
though is the ability to specify index hints using the syntax of
optimizer hints. This has been improved of in MySQL 8.0.20 with
the introduction of index-level optimizer hints for the
FORCE
and IGNORE
versions of the index
hints. This blog will look at the new index hint syntax.
Warning
Do not add index hints – neither using the old or new style –
unless you really need them. When you add index hints, you limit
the options of the optimizer which can prevent the optimizer
obtaining the optimal query plan as new optimizer improvements
are implemented or the data changes.
On …
I will try to make it short and clear: if you are writing SQL
queries with “NOT IN” like
SELECT … WHERE x NOT IN (SELECT y FROM …)
you have to be sure to first understand what happens when “x” or
“y” are NULL: it might not be what you want!…
Facebook Twitter LinkedIn