Sometimes MySQL just doesn't choose the most efficient way to
execute a query. GROUP BY is a good example. A
customer recently wanted to add a unique key over some columns of
a large (~50GB) table, and they first had to find duplicates. In
this case, no suitable index was available that would help with
the query. That means "Using temporary". And of course MySQL's
GROUP BY also sorts by default, which means "Using
filesort".
The goal here is just to find duplicates, or more specifically to find the lowest primary key value out of some group of rows with identical values for a subset of columns in the table. I think there is a much more efficient way to solve this problem.
I proposed a temporary table with the primary key column and a hash of the concatenated columns to be used for the grouping. This ended up allowing the customer to finish in 10 minutes an operation that originally resulted in a 30GB …
[Read more]