In the article ‘Versioning MySQL data‘, I showed the basics of
implementing a revisioning system using trigger. As Jens Schauder
already pointed out, often the data of a record is spread across
multiple tables, like an invoice with multiple invoice lines.
Having each invoice line versioned individually isn’t really
useful. Instead we want a new revision of the whole invoice on
each change.
The perfect solution
Ideally a change of one or more parts of the invoice would be
changed, a new revision would be created. There are several
issues in actually creating this those. Detecting the change of
multiple parts of the invoice at once, generating a single
revision, would mean we need to know if the actions are done
within the same transaction. Unfortunately there is a
connection_id(), but no transaction_id() function in MySQL. Also,
the query would fail when a query inserts or
updates a record in the child table, using the parent table. We
need to come up with something else.
In the implementation we currently have in production, we version the rows in the parent as well in the child tables. For each version of the parent row, we register which versions of the child rows ware set. This however has really complicated the trigger code and tends to need a lot of checking an querying slowing the write process down. Since nobody ever looks at the versions of the child rows, the application forces a new version of the parent row. The benefits of versioning both are therefor minimal.
Only versioning the parent
For this new (simplified) implementation, we will only have one
revision number across all tables of the record. Changing data
from the parent table, will trigger a new version. This will not
only copy the parent row to the revisioning table, but also the
rows of the children.
Writing to the child will not trigger a new version, instead it will update the data in the revisioning table. This means that when changing the record, you need to write to the parent table, before writing to the child tables. To force a new version without changing values use
UPDATE mytable SET _revision=NULL where id=$id
The parent and child tables are defined as
CREATE TABLE `mytable` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `name` varchar(255) NOT NULL DEFAULT '', `description` text, PRIMARY KEY (`id`), UNIQUE KEY `name` (`name`) ) ENGINE=InnoDB CREATE TABLE `mychild` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `mytable_id` int(10) unsigned NOT NULL DEFAULT '0', `title` varchar(255) NOT NULL DEFAULT '', PRIMARY KEY (`id`), KEY `mytable_id` (`mytable_id`), CONSTRAINT `mychild_ibfk_1` FOREIGN KEY (`mytable_id`) REFERENCES `mytable` (`id`) ON DELETE CASCADE ) ENGINE=InnoDB
Note that we are using InnoDB tables here. MyISAM doesn’t have foreign key constraints, therefor it’s not possible to define a parent-child relationship.
Insert, update and delete
In the parent trigger, two different things happen concerning the
child rows. When a new version is created, the data of `mychild`
is copied to the revisioning table. On a revision switch, data
will be copied from the revisioning table into `mychild`. The
“`_revision_action` IS NULL” condition, means that
`_revision_mytable` is only updated when a new revision is
created.
CREATE TRIGGER `mytable-afterupdate` AFTER update ON `mytable` FOR EACH ROW BEGIN DECLARE `newrev` BOOLEAN; UPDATE `_revision_mytable` SET `id` = NEW.`id`, `name` = NEW.`name`, `description` = NEW.`description`, `_revision_action`='update' WHERE `_revision`=NEW.`_revision` AND `_revision_action` IS NULL; SET newrev = (ROW_COUNT() > 0); INSERT INTO `_revhistory_mytable` VALUES (NEW.`id`, NEW.`_revision`, @auth_uid, NOW()); IF newrev THEN INSERT INTO `_revision_mychild` SELECT *, NEW.`_revision` FROM `mychild` WHERE `mytable_id` = NEW.`id`; ELSE DELETE `t`.* FROM `mychild` AS `t` LEFT JOIN `_revision_mychild` AS `r` ON 0=1 WHERE `t`.`mytable_id` = NEW.`id`; INSERT INTO `mychild` SELECT `id`, `mytable_id`, `title` FROM `_revision_mychild` WHERE `_revision` = NEW.`_revision`; END IF; END CREATE TRIGGER `mychild-afterinsert` AFTER INSERT ON `mychild` FOR EACH ROW BEGIN DECLARE CONTINUE HANDLER FOR 1442 BEGIN END; INSERT IGNORE INTO `_revision_mychild` (`id`, `mytable_id`, `title`, `_revision`) SELECT NEW.`id`, NEW.`mytable_id`, NEW.`title`, `_revision` FROM `mytable` AS `p` WHERE `p`.`id`=NEW.`mytable_id`; END CREATE TRIGGER `mychild-afterupdate` AFTER UPDATE ON `mychild` FOR EACH ROW BEGIN REPLACE INTO `_revision_mychild` (`id`, `mytable_id`, `title`, `_revision`) SELECT NEW.`id`, NEW.`mytable_id`, NEW.`title`, `_revision` FROM `mytable` AS `p` WHERE `p`.`id`=NEW.`mytable_id`; END CREATE TRIGGER `mychild-afterdelete` AFTER DELETE ON `mychild` FOR EACH ROW BEGIN DECLARE CONTINUE HANDLER FOR 1442 BEGIN END; DELETE `r`.* FROM `_revision_mychild` AS `r` INNER JOIN `mytable` AS `p` ON `r`.`_revision` = `p`.`_revision` WHERE `r`.`id` = OLD.`id`; END
Changing data in table `mychild` simply updates the data in the revisioning table. The revision number is grabbed from the field in the parent table.
Switching the revision can only be done through the parent table. This will also automatically change the data in the child tables. We simply delete all rows of the record and replace them with data from the revisioning table. This would however trigger the deletion of the data in `_revision_child` on which the insert has nothing to do. To prevent this, we can abuse that fact that a trigger can’t update data of a table using in the insert/update/delete query. This causes error 1442. With a continue handler we can ignore this silently.
The InnoDB constraints will handle the cascading delete. Deleting child data won’t activate the deletion trigger, which is all the better in this case.
Without a primary key
A primary key is not required for the child table, since
versioning is done purely based on the id of `mytable`.
CREATE TABLE `mypart` ( `mytable_id` int(10) unsigned NOT NULL, `reference` varchar(255) NOT NULL, KEY `mytable_id` (`mytable_id`), CONSTRAINT `mypart_ibfk_1` FOREIGN KEY (`mytable_id`) REFERENCES `mytable` (`id`) ON DELETE CASCADE ) ENGINE=InnoDB
This does cause an issue for the update and delete triggers of the child table. It can’t use the primary to id to locate the current version of the modified/removed row. This can be solved by a trick I got from PhpMyAdmin. We can simply locate the record by comparing the old values of all fields. There is no constraint for the table enforcing the uniqueness of a row, so we could be targeting multiple identical rows. Since they are identical, it doesn’t matter which one we target, as long as we limit to 1 row.
CREATE TRIGGER `mypart-afterupdate` AFTER UPDATE ON `mypart` FOR EACH ROW BEGIN DELETE FROM `_revision_mypart` WHERE `_revision` IN (SELECT `_revision` FROM `mytable` WHERE `id` = OLD.`mytable_id`) AND `mytable_id` = OLD.`mytable_id` AND `reference` = OLD.`reference` LIMIT 1; INSERT INTO `_revision_mypart` (`mytable_id`, `reference`, `_revision`) SELECT NEW.`mytable_id`, NEW.`reference`, `_revision` FROM `mytable` AS `p` WHERE `p`.`id`=NEW.`mytable_id`; END CREATE TRIGGER `mypart-afterdelete` AFTER DELETE ON `mypart` FOR EACH ROW BEGIN DECLARE CONTINUE HANDLER FOR 1442 BEGIN END; DELETE FROM `_revision_mypart` WHERE `_revision` IN (SELECT `_revision` FROM `mytable` WHERE `id` = OLD.`mytable_id`) AND `mytable_id` = OLD.`mytable_id` AND `reference` = OLD.`reference` LIMIT 1; END
Unique keys
The revisioning table has multiple versions of a record. Unique
indexes from the original table should be converted to non-unique
indexes in the revisioning table. This information can be fetched
using INFORMATION_SCHEMA.
SELECT c.CONSTRAINT_NAME, GROUP_CONCAT(CONCAT('`', k.COLUMN_NAME, '`')) AS cols FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS `c` INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS `k` ON c.TABLE_SCHEMA=k.TABLE_SCHEMA AND c.TABLE_NAME=k.TABLE_NAME AND c.CONSTRAINT_NAME=k.CONSTRAINT_NAME WHERE c.TABLE_SCHEMA=DATABASE() AND c.TABLE_NAME='mytable' AND c.CONSTRAINT_TYPE='UNIQUE' AND c.CONSTRAINT_NAME != '_revision' GROUP BY c.CONSTRAINT_NAME
Revisioning and replication
Baron Schwartz pointed out a race condition when relying on auto-increment
keys in triggers with replication. Actions carried out through
triggers on a master are not replicated to a slave server. Instead,
triggers on the slave will be invoked, which
should do the same action as on the master.
It probably isn’t needed to have a copy of the revisioning tables on the slave. This would mean that we could simply omit the triggers. Unfortunately this causes problems when changing the revision. In that case we are forced to move switching of a revision out of the database. Instead the application needs to select the data from all revisioning tables and write that to the original tables. Any other thoughts on solving this issue are welcome.