Introduction # In this blog post, we will discuss an example of a change to the Vitess query planner and how it enhances the optimization process. The new model focuses on making every step in the optimization pipeline a runnable plan. This approach offers several benefits, including simpler understanding and reasoning, ease of testing, and the ability to use arbitrary expressions in ordering, grouping, and aggregations. Vitess distributed query planner # VTGate is the proxy component of Vitess.
In this blog post, I’ll look into four ways MySQL executes GROUP BY.
In my previous blog post, we learned that indexes or other means of finding data might not be the most expensive part of query execution. For example, MySQL GROUP BY could potentially be responsible for 90% or more of the query execution time.
The main complexity when MySQL executes GROUP BY is computing aggregate functions in a GROUP BY statement. How this works is shown in the documentation for UDF Aggregate Functions. As we see, the requirement is that UDF functions get all values that constitute the single group one …
[Read more]In this blog post, I’ll look at whether optimizing indexing is always the key to improving MySQL query performance (spoiler, it isn’t).
As we look at MySQL query performance, our first concern is often whether a query is using the right indexes to retrieve the data. This is based on the assumption that finding the data is the most expensive operation – and the one you should focus on for MySQL query optimization. However, this is not always the case.
Let’s look at this query for illustration:
mysql> show create table tbl G *************************** 1. row *************************** Table: tbl Create Table: CREATE TABLE `tbl` ( `id` int(11) NOT NULL AUTO_INCREMENT, `k` int(11) NOT NULL DEFAULT '0', `g` int(10) unsigned NOT NULL, PRIMARY KEY (`id`), KEY `k_1` (`k`) ) ENGINE=InnoDB …[Read more]
New MySQL version, YAY!
MySQL 5.7 is full of new features, like virtual columns, virtual indexes and JSON fields! But, it came with some changes to the default configuration. When running:
SELECT @@GLOBAL.sql_mode;
We get:
ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
What I want to talk about is the ONLY_FULL_GROUP_BY
mode. This mode rejects queries where nonaggregated columns are
expected, but aren’t on the GROUP BY
or
HAVING
clause. Before MySQL 5.7.5,
ONLY_FULL_GROUP_BY
was disabled by default, now it
is enabled.
You know the drill…
This is a simple statement, people use it everywhere, it shouldn’t be that hard to use, right?
Given the following schema:
Suppose I want to list all users that commented on …
[Read more]We usually try to avoid subselects because sometimes they force the use of a temporary table and limits the use of indexes. But, when is good to use a subselect?
This example was tested over table a (1310723 rows), b, c and d ( 5 rows each) and with MySQL version 5.5 and 5.6.
Let’s suppose we have a query like this:
select a.name,sum(a.count) aSum,avg(a.position) aAVG,b.col1,c.col2,d.col3 from a join b on (a.bid = b.id) join c on (a.cid = c.id) join d on (a.did = d.id) group by a.name,b.id,c.id,d.id
What will MySQL do? First it will take the entire data set – this means that will go through each row scanning the value of “bid,” “cid” and “did” and then apply the join to each table. At this point it has the complete data set and then it will start to cluster it, executing the sum and the average functions.
Let’s analyze it step by step:
- Scan each row of table a which …
Friend and former colleague Roland Bouwman has written an excellent update on the GROUP BY implementation in MySQL.
MySQL’s implementation of GROUP BY has historically been quirky. Sometimes that quirkiness has been useful, but often it causes grief as SQL authors can make mistakes that are executed but don’t produce the results they want (or expect).
Simple example:
SELECT cat, COUNT(val) as cnt, othercol FROM tbl GROUP BY cat
The ‘cat‘ column is in the GROUP BY clause, the COUNT(val) is an aggregate, but the ‘othercol‘ column is … well… neither. What used to effectively happen is that the server would pick one othercol value from within each group. As I noted before, sometimes useful but often a pest as the server wouldn’t know if …
[Read more]Today, Oracle announced the availability of the Development Milestone Release 15 of MySQL 5.7.5. The tagline for this release promises "Enhanced Database Performance and Manageability". That may sound rather generic, the actual list of changes and improvements is simply *huge*, and includes many items that I personally find rather exciting! Perhaps I'm mistaken but I think this may be one of the largest number of changes packed into a MySQL point release that I've witnessed in a long time. The list of changes includes improvements such as:
- InnoDB improvements: Simplified tablespace recovery, support for spatial indexes, dynamic configuration …
Today we had an interesting situation where the same query was executed significantly slower when it was written with GROUP BY instead of DISTINCT and I saw many people still had the assumption that these two types of queries are actually equivalent which is simply not true. Although DISTINCT queries can be implemented using GROUP BY but not every GROUP BY query can be translated to DISTINCT. Depending on the brand and the optimizer the database server may actually use group by internally for the execution of distinct but that won’t make them equivalent. Let’s see why…
GROUP BY as the name suggest groups the result by some set of parameters and evaluate the whole result set. In most databases group by is implemented based on sorting and the same rules applies to it as well.
DISTINCT will make sure that the same row won’t be returned in the result set twice. Distinct doesn’t necessary …
[Read more]MySQL is a database of compromise. Compromise between running a production-ready relational database and being popular with all sorts of hackers - mostly the ones that don't really like SQL. And because they don't really like SQL, they choose MySQL, as MySQL is very forgiving. It is just as forgiving as their favourite language PHP, … Continue reading MySQL Bad Idea #384 →
In my opinion, one of the best things that happened to Planet MySQL
lately, is Explain Extended, a blog by Alex Bolenok (also
known as Quassnoi on Stackoverflow).
I never had the pleasure of meeting Alex in person, but his
articles are always interesting and of high quality, and the SQL
wizardry he pulls off is downright inspiring. I really feel
humbled by the creativity of some of his solutions and his
apparent experience with multiple RDBMS products.
Alex' most recent post is about aggregation, and
finding a top 3 based on the aggregate:
In …