We are considering enabling the SQL mode
ONLY_FULL_GROUP_BY
by default in MySQL 5.7. Here is
a quick example to demonstrate how this change will impact you:
Sample Data:
CREATE TABLE invoice_line_items (id INT NOT NULL PRIMARY KEY auto_increment, invoice_id INT NOT NULL, description varchar(100) ); INSERT INTO invoice_line_items VALUES (NULL, 1, 'New socks'), (NULL, 1, 'A hat'), (NULL, 2, 'Shoes'), (NULL, 2, 'T shirt'), (NULL, 3, 'Tie');
Default behaviour in MySQL 5.6 and below:
mysql> SELECT id, invoice_id, description FROM invoice_line_items GROUP BY invoice_id; +----+------------+-------------+ | id | invoice_id | description | +----+------------+-------------+ | 1 | 1 | New socks | | 3 | 2 | Shoes | | 5 | 3 | Tie | +----+------------+-------------+ 3 rows in set (0.00 sec)
Proposed default behaviour in MySQL 5.7:
mysql> SELECT id, invoice_id, description FROM invoice_line_items GROUP BY invoice_id; ERROR 1055 (42000): 'test.invoice_line_items.id' isn't in GROUP BY
That is to say that in 5.7 this statement will produce an error.
Notice that columns id
and description
are not actually unique? Roland Bouman has a blog post from 2007 that describes what MySQL
is doing here, and how it is different in behaviour to other
databases.
To summarize: MySQL is picking one of the possible values for
id
and description
and the query
executes in a non deterministic way. Here’s some more information
to prove this:
mysql> SELECT id, invoice_id, description, max(description) FROM invoice_line_items GROUP BY invoice_id; +----+------------+-------------+------------------+ | id | invoice_id | description | max(description) | +----+------------+-------------+------------------+ | 1 | 1 | New socks | New socks | | 3 | 2 | Shoes | T shirt | | 5 | 3 | Tie | Tie | +----+------------+-------------+------------------+ 3 rows in set (0.00 sec) mysql> SELECT GROUP_CONCAT(id), invoice_id, GROUP_CONCAT(description) FROM invoice_line_items GROUP BY invoice_id; +------------------+------------+---------------------------+ | GROUP_CONCAT(id) | invoice_id | GROUP_CONCAT(description) | +------------------+------------+---------------------------+ | 1,2 | 1 | New socks,A hat | | 3,4 | 2 | Shoes,T shirt | | 5 | 3 | Tie | +------------------+------------+---------------------------+ 3 rows in set (0.00 sec)
What is problematic about this behaviour is that it can hide bugs in application code. To highlight two blog posts:
- Ike Walker @ Flite explains why he only uses full group by.
- Keyur Govande @ Etsy strongly recommends STRICT_ALL_TABLES and ONLY_FULL_GROUP_BY to avoid data corruption.
The Proposal
We are proposing to change ONLY_FULL_GROUP_BY
to be
enabled by default as of MySQL 5.7. The downside in doing this,
is that some users upgrading will have statements return errors.
These users will be left with two options:
- Set
sql_mode
in their my.cnf (or my.ini) file explicitly, or as part of their application code since sql_mode is configurable per session. - Modify the offending
GROUP BY
statements to be deterministic.
On the second point, we are also proposing introducing an
ANY_VALUE()
SQL function to allow statements that
are ‘acceptable as non-deterministic’ to run easy retrofit in
this less permissive configuration. For example:
mysql> SELECT ANY_VALUE(id) AS id, invoice_id, ANY_VALUE(description) AS description FROM invoice_line_items GROUP BY invoice_id; +------+------------+-------------+ | id | invoice_id | description | +------+------------+-------------+ | 1 | 1 | New socks | | 3 | 2 | Shoes | | 5 | 3 | Tie | +------+------------+-------------+ 3 rows in set (0.01 sec)
Why I personally like this proposal, is that I compare it to buying an undercooked burger in a North American restaurant. We are proposing to switch our policy from “we may serve it to you without you having realized you ordered it” to “sign a waiver, then we will serve it”.
i.e. We are not removing options, but want you to have to opt-in to what can be an unsafe choice, rather than opt-out.
Fans of standard SQL will rejoice at compatibility with SQL-2003, a standard defined after this behavior was first implemented.
Conclusion
As with other changes that have been proposed for MySQL 5.7, we are soliciting your feedback!
- Have you accidentally returned incorrect data, and could have
benefited from having
ONLY_FULL_GROUP_BY
enabled? - Do you maintain an application that relies on the non deterministic behaviour?
- Do you agree or disagree that this change is the better default for users?
Please leave a comment, or get in touch!