Implementing asynchronous cascade delete in MySQL

A while back one of my foreign keys started causing trouble. The problem was that some parent rows had tens of thousand of child rows, and the foreign key was defined with CASCADE DELETE enabled. When we deleted one of those parent rows on a master database, it took several seconds to execute the delete because of the cascade. This led to latency for the end user, and also led to replication delays.

The immediate solution was make the application tolerant of orphaned rows in the child table and to drop the explicit foreign key constraint.

I didn't really want to leave those orphaned rows hanging around in the child table, so I decided to implement an asynchronous process to delete the orphaned rows on a scheduled basis. Read on for a description of that process.

Using the sakila database as an example, imagine I drop the foreign key between file_category and category, like so:

alter table sakila.film_category drop foreign key fk_film_category_category;

Without the foreign key in place, deletes on the category table lead to orphaned rows in film_category. For example, I will delete the "New" category:

``` mysql> delete from sakila.category

-> where name = 'New';

Query OK, 1 row affected (0.01 sec) ```

There are several ways to count the orphaned rows. Here are two different naive implementations using OUTER JOIN or NOT EXISTS:

``` mysql> select count(*)

-> from sakila.film_category c 
-> left outer join sakila.category p on p.category_id = c.category_id 
-> where p.category_id is null;

+----------+ | count(*) | +----------+ | 63 | +----------+ 1 row in set (0.00 sec)

mysql> select count(*)

-> from sakila.film_category c 
-> where not exists 
-> (
->   select NULL from sakila.category p where p.category_id = c.category_id
-> );

+----------+ | count(*) | +----------+ | 63 | +----------+ 1 row in set (0.01 sec) ```

I can also delete the orphaned rows using the same query approaches. I'll roll the first delete back so I can demonstrate the second query in the same session:

``` mysql> set autocommit = 0; Query OK, 0 rows affected (0.00 sec)

mysql> delete c.*

-> from sakila.film_category c 
-> left outer join sakila.category p on p.category_id = c.category_id 
-> where p.category_id is null;

Query OK, 63 rows affected (0.02 sec)

mysql> rollback; Query OK, 0 rows affected (0.00 sec)

mysql> delete c.*

-> from sakila.film_category c 
-> where not exists (select NULL from sakila.category p where p.category_id = c.category_id);

Query OK, 63 rows affected (0.01 sec)

mysql> commit; Query OK, 0 rows affected (0.00 sec) ```

This approach will be very slow for tables containing millions of rows, so in my real world case I didn't use this approach. Instead I decided it would be a lot easier and faster to delete the orphaned rows if I knew who their parent was. To this end I created a new table to track the deleted rows, and populated it using a trigger. Continuing the example in the sakila database:

``` CREATE TABLE category_deleted ( category_id tinyint(3) unsigned NOT NULL, name varchar(25) NOT NULL, last_update timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, delete_time timestamp NOT NULL, PRIMARY KEY (category_id) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DELIMITER $$

DROP TRIGGER IF EXISTS sakila.TR_A_DEL_CATEGORY $$

CREATE TRIGGER sakila.TR_A_DEL_CATEGORY AFTER DELETE ON sakila.category FOR EACH ROW BEGIN

INSERT IGNORE INTO sakila.category_deleted (category_id, name, last_update, delete_time) VALUES (old.category_id, old.name, old.last_update,now());

END $$

DELIMITER ; ```

Now I can delete another category to test the trigger:

``` mysql> -- delete a single category mysql> delete from sakila.category

->     where name = 'Classics';

Query OK, 1 row affected (0.01 sec)

mysql> -- verify the trigger worked mysql> select * from sakila.category_deleted; +-------------+----------+---------------------+---------------------+ | category_id | name | last_update | delete_time | +-------------+----------+---------------------+---------------------+ | 4 | Classics | 2006-02-15 04:46:27 | 2013-05-21 18:21:53 | +-------------+----------+---------------------+---------------------+ 1 row in set (0.00 sec)

mysql> -- count the orphaned rows using the _deleted table mysql> select count(*)

-> from sakila.film_category c 
->     inner join sakila.category_deleted p on p.category_id = c.category_id;   

+----------+ | count(*) | +----------+ | 57 | +----------+ 1 row in set (0.00 sec) ```

I also wanted to execute the deletes on the child table in chunks, so I implemented a stored procedure to delete the orphaned rows by iterating through the rows in the _deleted table, deleting the child rows, and then deleting from the _deleted table. If I had to implement it again, I would probably use common_schema to chunk the deletes so I wouldn't need the stored procedure.

Here's an implementation using common_schema:

``` -- first delete the orphaned rows from the child table set @script := " split(sakila.film_category: delete sakila.film_category.*

from sakila.film_category 
inner join sakila.category_deleted on sakila.category_deleted.category_id = sakila.film_category.category_id 

) SELECT $split_total_rowcount AS 'rows deleted so far'; "; call common_schema.run(@script);

-- then delete the rows from the _deleted table (assuming they have no children) delete p.* from sakila.category_deleted p left outer join sakila.film_category c on c.category_id = p.category_id where c.category_id is null; ```