As part of the ongoing effort to refactor MySQL and make it better, we are defining the behaviour of certain features and in some cases reimplimenting them in a more maintainable fashion.
I have previously blogged on:
- A proposal to simplify SQL_MODE options.
- The Federated Storage Engine.
- The Query Cache.
- The deprecation of EXPLAIN PARTITIONS and EXPLAIN EXTENDED.
- The deprecation of NULL synonym \N.
Today what I want to write about, is the
ALTER TABLE. From the MySQL manual:
“IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.”
This creates several issues for the MySQL server team:
IGNOREcould remove rows from a parent table when using a foreign key relationship.
IGNOREmakes it impossible to use InnoDB Online DDL for several operations, for example adding a
IGNOREhas some strange side-effects for replication. For example: DDL is always replicated via statement-based replication, and since SQL does not imply ordering, it’s not clear which rows will be deleted as part of the ignore step. I also see cross-version replication problematic if future MySQL versions were to introduce more strictness, since a slave may de-duplicate more rows.
The most common case
We believe that the most common use case for
is to be able to add a
UNIQUE INDEX on a table which
currently has duplicate values present. i.e.
ALTER IGNORE TABLE users ADD UNIQUE INDEX (emailaddress);
In this scenario, a novice user manages to avoid auditing each entry in the users table, and simply lets MySQL pick a row to be kept, with all duplicates automatically removed.
There are two other ways to be able to do that:
Using the same an example as above, return a list of email
PRIMARY KEY values for records that
SELECT GROUP_CONCAT(id), emailaddress, count(*) as count FROM users GROUP BY emailaddress HAVING count >= 2; /* delete or merge duplicate from above query */ ALTER TABLE users ADD UNIQUE INDEX (emailaddress);
Note: This method will be the fastest way, since
when not using
IGNORE, MySQL is able to use InnoDB’s
New table + INSERT IGNORE
While this method looks very similar, internally it’s semantics are quite different:
CREATE TABLE users_new LIKE users; ALTER TABLE users ADD UNIQUE INDEX (emailaddress); INSERT IGNORE INTO users_new SELECT * FROM users; DROP TABLE users; RENAME TABLE users_new TO users;
By creating a table first, the MySQL server will not have to manage rows in a foreign key relationship. The rows will also be re-sent to the slave using row-based replication, so issue (3) I mentioned above does not come into play.
We are looking for feedback on how you use this feature.
Is there another common use-case we are not accounting for?
Would it be acceptable to add a feature to MySQL Workbench to assist in preparing a table for adding a unique index (similar to hand removal)?
Help shape the future of MySQL. Please leave a comment or contact me.