Versioning MySQL data: Multi-table records

In the article ‘Versioning MySQL data‘, I showed the basics of implementing a revisioning system using trigger. As Jens Schauder already pointed out, often the data of a record is spread across multiple tables, like an invoice with multiple invoice lines. Having each invoice line versioned individually isn’t really useful. Instead we want a new revision of the whole invoice on each change.

The perfect solution
Ideally a change of one or more parts of the invoice would be changed, a new revision would be created. There are several issues in actually creating this those. Detecting the change of multiple parts of the invoice at once, generating a single revision, would mean we need to know if the actions are done within the same transaction. Unfortunately there is a connection_id(), but no transaction_id() function in MySQL. Also, the query would fail when a query inserts or updates a record in the child table, using the parent table. We need to come up with something else.

In the implementation we currently have in production, we version the rows in the parent as well in the child tables. For each version of the parent row, we register which versions of the child rows ware set. This however has really complicated the trigger code and tends to need a lot of checking an querying slowing the write process down. Since nobody ever looks at the versions of the child rows, the application forces a new version of the parent row. The benefits of versioning both are therefor minimal.

Only versioning the parent
For this new (simplified) implementation, we will only have one revision number across all tables of the record. Changing data from the parent table, will trigger a new version. This will not only copy the parent row to the revisioning table, but also the rows of the children.

Writing to the child will not trigger a new version, instead it will update the data in the revisioning table. This means that when changing the record, you need to write to the parent table, before writing to the child tables. To force a new version without changing values use

UPDATE mytable SET _revision=NULL where id=$id

The parent and child tables are defined as

CREATE TABLE `mytable` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL DEFAULT '',
  `description` text,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB

CREATE TABLE `mychild` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `mytable_id` int(10) unsigned NOT NULL DEFAULT '0',
  `title` varchar(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `mytable_id` (`mytable_id`),
  CONSTRAINT `mychild_ibfk_1` FOREIGN KEY (`mytable_id`) REFERENCES `mytable` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB

Note that we are using InnoDB tables here. MyISAM doesn’t have foreign key constraints, therefor it’s not possible to define a parent-child relationship.

Insert, update and delete
In the parent trigger, two different things happen concerning the child rows. When a new version is created, the data of `mychild` is copied to the revisioning table. On a revision switch, data will be copied from the revisioning table into `mychild`. The “`_revision_action` IS NULL” condition, means that `_revision_mytable` is only updated when a new revision is created.

CREATE TRIGGER `mytable-afterupdate` AFTER update ON `mytable`
  FOR EACH ROW BEGIN
    DECLARE `newrev` BOOLEAN;
    
    UPDATE `_revision_mytable` SET `id` = NEW.`id`, `name` = NEW.`name`, `description` = NEW.`description`, `_revision_action`='update' WHERE `_revision`=NEW.`_revision` AND `_revision_action` IS NULL;
    SET newrev = (ROW_COUNT() > 0);
    INSERT INTO `_revhistory_mytable` VALUES (NEW.`id`, NEW.`_revision`, @auth_uid, NOW());
    
    IF newrev THEN
       INSERT INTO `_revision_mychild` SELECT *, NEW.`_revision` FROM `mychild` WHERE `mytable_id` = NEW.`id`;
    ELSE
       DELETE `t`.* FROM `mychild` AS `t` LEFT JOIN `_revision_mychild` AS `r` ON 0=1 WHERE `t`.`mytable_id` = NEW.`id`;
       INSERT INTO `mychild` SELECT `id`, `mytable_id`, `title` FROM `_revision_mychild` WHERE `_revision` = NEW.`_revision`;
    END IF;
  END

CREATE TRIGGER `mychild-afterinsert` AFTER INSERT ON `mychild`
  FOR EACH ROW BEGIN
    DECLARE CONTINUE HANDLER FOR 1442 BEGIN END;
    INSERT IGNORE INTO `_revision_mychild` (`id`, `mytable_id`, `title`, `_revision`) SELECT NEW.`id`, NEW.`mytable_id`, NEW.`title`, `_revision` FROM `mytable` AS `p` WHERE `p`.`id`=NEW.`mytable_id`;
  END

CREATE TRIGGER `mychild-afterupdate` AFTER UPDATE ON `mychild`
  FOR EACH ROW BEGIN
    REPLACE INTO `_revision_mychild` (`id`, `mytable_id`, `title`, `_revision`) SELECT NEW.`id`, NEW.`mytable_id`, NEW.`title`, `_revision` FROM `mytable` AS `p` WHERE `p`.`id`=NEW.`mytable_id`;
  END

CREATE TRIGGER `mychild-afterdelete` AFTER DELETE ON `mychild`
  FOR EACH ROW BEGIN
    DECLARE CONTINUE HANDLER FOR 1442 BEGIN END;
    DELETE `r`.* FROM `_revision_mychild` AS `r` INNER JOIN `mytable` AS `p` ON `r`.`_revision` = `p`.`_revision` WHERE `r`.`id` = OLD.`id`;
  END

Changing data in table `mychild` simply updates the data in the revisioning table. The revision number is grabbed from the field in the parent table.

Switching the revision can only be done through the parent table. This will also automatically change the data in the child tables. We simply delete all rows of the record and replace them with data from the revisioning table. This would however trigger the deletion of the data in `_revision_child` on which the insert has nothing to do. To prevent this, we can abuse that fact that a trigger can’t update data of a table using in the insert/update/delete query. This causes error 1442. With a continue handler we can ignore this silently.

The InnoDB constraints will handle the cascading delete. Deleting child data won’t activate the deletion trigger, which is all the better in this case.

Without a primary key
A primary key is not required for the child table, since versioning is done purely based on the id of `mytable`.

CREATE TABLE `mypart` (
  `mytable_id` int(10) unsigned NOT NULL,
  `reference` varchar(255) NOT NULL,
  KEY `mytable_id` (`mytable_id`),
  CONSTRAINT `mypart_ibfk_1` FOREIGN KEY (`mytable_id`) REFERENCES `mytable` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB

This does cause an issue for the update and delete triggers of the child table. It can’t use the primary to id to locate the current version of the modified/removed row. This can be solved by a trick I got from PhpMyAdmin. We can simply locate the record by comparing the old values of all fields. There is no constraint for the table enforcing the uniqueness of a row, so we could be targeting multiple identical rows. Since they are identical, it doesn’t matter which one we target, as long as we limit to 1 row.

CREATE TRIGGER `mypart-afterupdate` AFTER UPDATE ON `mypart`
  FOR EACH ROW BEGIN
    DELETE FROM `_revision_mypart` WHERE `_revision` IN (SELECT `_revision` FROM `mytable` WHERE `id` = OLD.`mytable_id`) AND `mytable_id` = OLD.`mytable_id` AND `reference` = OLD.`reference` LIMIT 1;
    INSERT INTO `_revision_mypart` (`mytable_id`, `reference`, `_revision`) SELECT NEW.`mytable_id`, NEW.`reference`, `_revision` FROM `mytable` AS `p` WHERE `p`.`id`=NEW.`mytable_id`;
  END

CREATE TRIGGER `mypart-afterdelete` AFTER DELETE ON `mypart`
  FOR EACH ROW BEGIN
    DECLARE CONTINUE HANDLER FOR 1442 BEGIN END;
    DELETE FROM `_revision_mypart` WHERE `_revision` IN (SELECT `_revision` FROM `mytable` WHERE `id` = OLD.`mytable_id`) AND `mytable_id` = OLD.`mytable_id` AND `reference` = OLD.`reference` LIMIT 1;
  END

Unique keys
The revisioning table has multiple versions of a record. Unique indexes from the original table should be converted to non-unique indexes in the revisioning table. This information can be fetched using INFORMATION_SCHEMA.

SELECT c.CONSTRAINT_NAME, GROUP_CONCAT(CONCAT('`', k.COLUMN_NAME, '`')) AS cols FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS `c` INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS `k` ON c.TABLE_SCHEMA=k.TABLE_SCHEMA AND c.TABLE_NAME=k.TABLE_NAME AND c.CONSTRAINT_NAME=k.CONSTRAINT_NAME WHERE c.TABLE_SCHEMA=DATABASE() AND c.TABLE_NAME='mytable' AND c.CONSTRAINT_TYPE='UNIQUE' AND c.CONSTRAINT_NAME != '_revision' GROUP BY c.CONSTRAINT_NAME

Revisioning and replication
Baron Schwartz pointed out a race condition when relying on auto-increment keys in triggers with replication. Actions carried out through triggers on a master are not replicated to a slave server. Instead, triggers on the slave will be invoked, which should do the same action as on the master.

It probably isn’t needed to have a copy of the revisioning tables on the slave. This would mean that we could simply omit the triggers. Unfortunately this causes problems when changing the revision. In that case we are forced to move switching of a revision out of the database. Instead the application needs to select the data from all revisioning tables and write that to the original tables. Any other thoughts on solving this issue are welcome.

Download
–> Download mysql-revisioning script @github <--