A while back one of my foreign keys started causing trouble. The
problem was that some parent rows had tens of thousand of child
rows, and the foreign key was defined with CASCADE
DELETE
enabled. When we deleted one of those parent rows
on a master database, it took several seconds to execute the
delete because of the cascade. This led to latency for the end
user, and also led to replication delays.
The immediate solution was make the application tolerant of orphaned rows in the child table and to drop the explicit foreign key constraint.
I didn't really want to leave those orphaned rows hanging around in the child table, so I decided to implement an asynchronous process to delete the orphaned rows on a scheduled basis. Read on for a description of that process.
Using the sakila database as an example, imagine I drop
the foreign key between file_category
and
category
, like so:
alter table sakila.film_category drop foreign key
fk_film_category_category;
Without the foreign key in place, deletes on the
category
table lead to orphaned rows in
film_category
. For example, I will delete the "New"
category:
``` mysql> delete from sakila.category
-> where name = 'New';
Query OK, 1 row affected (0.01 sec) ```
There are several ways to count the orphaned rows. Here are two
different naive implementations using OUTER JOIN
or
NOT EXISTS
:
``` mysql> select count(*)
-> from sakila.film_category c
-> left outer join sakila.category p on p.category_id = c.category_id
-> where p.category_id is null;
+----------+ | count(*) | +----------+ | 63 | +----------+ 1 row in set (0.00 sec)
mysql> select count(*)
-> from sakila.film_category c
-> where not exists
-> (
-> select NULL from sakila.category p where p.category_id = c.category_id
-> );
+----------+ | count(*) | +----------+ | 63 | +----------+ 1 row in set (0.01 sec) ```
I can also delete the orphaned rows using the same query approaches. I'll roll the first delete back so I can demonstrate the second query in the same session:
``` mysql> set autocommit = 0; Query OK, 0 rows affected (0.00 sec)
mysql> delete c.*
-> from sakila.film_category c
-> left outer join sakila.category p on p.category_id = c.category_id
-> where p.category_id is null;
Query OK, 63 rows affected (0.02 sec)
mysql> rollback; Query OK, 0 rows affected (0.00 sec)
mysql> delete c.*
-> from sakila.film_category c
-> where not exists (select NULL from sakila.category p where p.category_id = c.category_id);
Query OK, 63 rows affected (0.01 sec)
mysql> commit; Query OK, 0 rows affected (0.00 sec) ```
This approach will be very slow for tables containing millions of rows, so in my real world case I didn't use this approach. Instead I decided it would be a lot easier and faster to delete the orphaned rows if I knew who their parent was. To this end I created a new table to track the deleted rows, and populated it using a trigger. Continuing the example in the sakila database:
``` CREATE TABLE category_deleted ( category_id tinyint(3) unsigned NOT NULL, name varchar(25) NOT NULL, last_update timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, delete_time timestamp NOT NULL, PRIMARY KEY (category_id) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DELIMITER $$
DROP TRIGGER IF EXISTS sakila.TR_A_DEL_CATEGORY $$
CREATE TRIGGER sakila.TR_A_DEL_CATEGORY AFTER DELETE ON sakila.category FOR EACH ROW BEGIN
INSERT IGNORE INTO sakila.category_deleted (category_id, name, last_update, delete_time) VALUES (old.category_id, old.name, old.last_update,now());
END $$
DELIMITER ; ```
Now I can delete another category to test the trigger:
``` mysql> -- delete a single category mysql> delete from sakila.category
-> where name = 'Classics';
Query OK, 1 row affected (0.01 sec)
mysql> -- verify the trigger worked mysql> select * from sakila.category_deleted; +-------------+----------+---------------------+---------------------+ | category_id | name | last_update | delete_time | +-------------+----------+---------------------+---------------------+ | 4 | Classics | 2006-02-15 04:46:27 | 2013-05-21 18:21:53 | +-------------+----------+---------------------+---------------------+ 1 row in set (0.00 sec)
mysql> -- count the orphaned rows using the _deleted table mysql> select count(*)
-> from sakila.film_category c
-> inner join sakila.category_deleted p on p.category_id = c.category_id;
+----------+ | count(*) | +----------+ | 57 | +----------+ 1 row in set (0.00 sec) ```
I also wanted to execute the deletes on the child table in
chunks, so I implemented a stored procedure to delete the
orphaned rows by iterating through the rows in the
_deleted
table, deleting the child rows, and then
deleting from the _deleted
table. If I had to
implement it again, I would probably use common_schema to chunk the deletes so I wouldn't
need the stored procedure.
Here's an implementation using common_schema:
``` -- first delete the orphaned rows from the child table set @script := " split(sakila.film_category: delete sakila.film_category.*
from sakila.film_category
inner join sakila.category_deleted on sakila.category_deleted.category_id = sakila.film_category.category_id
) SELECT $split_total_rowcount AS 'rows deleted so far'; "; call common_schema.run(@script);
-- then delete the rows from the _deleted table (assuming they have no children) delete p.* from sakila.category_deleted p left outer join sakila.film_category c on c.category_id = p.category_id where c.category_id is null; ```