Showing entries 1 to 9
Displaying posts with tag: group by (reset)
GROUP BY, are you sure you know it?

New MySQL version, YAY!

MySQL 5.7 is full of new features, like virtual columns, virtual indexes and JSON fields! But, it came with some changes to the default configuration. When running:

SELECT @@GLOBAL.sql_mode;

We get:


What I want to talk about is the ONLY_FULL_GROUP_BY mode. This mode rejects queries where nonaggregated columns are expected, but aren’t on the GROUP BY or HAVING clause. Before MySQL 5.7.5, ONLY_FULL_GROUP_BY was disabled by default, now it is enabled.

You know the drill…

This is a simple statement, people use it everywhere, it shouldn’t be that hard to use, right?

Given the following schema:

Suppose I want to list all users that commented on …

[Read more]
Speed up GROUP BY queries with subselects in MySQL

We usually try to avoid subselects because sometimes they force the use of a temporary table and limits the use of indexes. But, when is good to use a subselect?

This example was tested over table a (1310723 rows), b, c and d ( 5 rows each) and with MySQL version 5.5 and 5.6.

Let’s suppose we have a query like this:

select,sum(a.count) aSum,avg(a.position) aAVG,b.col1,c.col2,d.col3
a join
b on ( = join
c on (a.cid = join
d on (a.did =
group by,,,

What will MySQL do? First it will take the entire data set – this means that will go through each row scanning the value of  “bid,” “cid” and “did” and then apply the join to each table. At this point it has the complete data set and then it will start to cluster it, executing the sum and the average functions.

Let’s analyze it step by step:

  1. Scan each row of  table a …
[Read more]
GROUP BY fixed

Friend and former colleague Roland Bouwman has written an excellent update on the GROUP BY implementation in MySQL.

MySQL’s implementation of GROUP BY has historically been quirky. Sometimes that quirkiness has been useful, but often it causes grief as SQL authors can make mistakes that are executed but don’t produce the results they want (or expect).

Simple example:

SELECT cat, COUNT(val) as cnt, othercol FROM tbl GROUP BY cat

The ‘cat‘ column is in the GROUP BY clause, the COUNT(val) is an aggregate, but the ‘othercol‘ column is … well… neither. What used to effectively happen is that the server would pick one othercol value from within each group. As I noted before, sometimes useful but often a pest as the server wouldn’t know if …

[Read more]
MySQL 5.7.5: GROUP BY respects functional dependencies!

Today, Oracle announced the availability of the Development Milestone Release 15 of MySQL 5.7.5. The tagline for this release promises "Enhanced Database Performance and Manageability". That may sound rather generic, the actual list of changes and improvements is simply *huge*, and includes many items that I personally find rather exciting! Perhaps I'm mistaken but I think this may be one of the largest number of changes packed into a MySQL point release that I've witnessed in a long time. The list of changes includes improvements such as:

  • InnoDB improvements: Simplified tablespace recovery, support for spatial indexes, dynamic configuration
[Read more]
Difference between DISTINCT and GROUP BY

Today we had an interesting situation where the same query was executed significantly slower when it was written with GROUP BY instead of DISTINCT and I saw many people still had the assumption that these two types of queries are actually equivalent which is simply not true. Although DISTINCT queries can be implemented using GROUP BY but not every GROUP BY query can be translated to DISTINCT. Depending on the brand and the optimizer the database server may actually use group by internally for the execution of distinct but that won’t make them equivalent. Let’s see why…

GROUP BY as the name suggest groups the result by some set of parameters and evaluate the whole result set. In most databases group by is implemented based on sorting and the same rules applies to it as well.

DISTINCT will make sure that the same row won’t be returned in the result set twice. Distinct doesn’t necessary …

[Read more]
Greatest N per group: top 3 with GROUP_CONCAT()

In my opinion, one of the best things that happened to Planet MySQL lately, is Explain Extended, a blog by Alex Bolenok (also known as Quassnoi on Stackoverflow).

I never had the pleasure of meeting Alex in person, but his articles are always interesting and of high quality, and the SQL wizardry he pulls off is downright inspiring. I really feel humbled by the creativity of some of his solutions and his apparent experience with multiple RDBMS products.

Alex' most recent post is about aggregation, and finding a top 3 based on the aggregate:

In …

[Read more]
Loading a dimension table with SCD1 and SCD2 attributes

Jos, my co-author for the "Building Pentaho Solutions" book just pointed me to a recent article by Jeff Prenevost entitled "The Problem with History".AbstractJeff's topic, loading a hybrid Type 1 / Type 2 slowly changing dimension table is related to data warehousing but maybe of interest outside of that context as well.

As it turns out, the particular problem described by Jeff is non-trivial, but can be solved quite elegantly in a single SQL statment. This may be a compelling alternative to the multi-step, …

[Read more]
Old challenges, new synthax ...

I've blogged before about the new SQL synthax which is becoming available in databases and how it helps solving questions which are increasingly common.Now it's time for another example, something which doesn't come up often in a reporting environment as most tools have this feature, but can be a problem if you're building your output with a scipting language.Adding a "Total" row at the end of

How to pick indexes for order by and group by queries

First some of the things that you need to use and understand

Explain Syntax

Order by Optimization

Group by Optimization

Update: Updated errors.

Now some details that are usually missed. GROUP BY does sorting unless you tell mysql not to. GROUP BY has two optimization methods, loose index scan, and tight index scan.

Loose index scan, scans the entire table index, while tight index scan uses some sort of constraint. For large datasets that are accessed often and require some sort of group by, tight index scans are better.

So how to pick columns to create …

[Read more]
Showing entries 1 to 9