Versioning MySQL data

As a developer you’re probably using a versioning control system, like subversion or git, to safeguard your data. Advantages of using a VCS are that you can walk to the individual changes for a document, see who made each change and revert back to specific revision if needed. These are features which would also be nice for data stored in a database. With the use of triggers we can implement versioning for data stored in a MySQL db.

The revisioning table
We will not store the different versions of the records in the original table. We want this solution to be in the database layer instead of putting all the logic in the application layer. Instead we’ll create a new table, which stores all the different versions and lives next to the original table, which only contains the current version of each record. This revisioning table is copy of the original table, with a couple of additional fields.

CREATE TABLE `_revision_mytable` LIKE `mytable`;

ALTER TABLE `_revision_mytable`
  CHANGE `id` `id` int(10) unsigned,
  DROP PRIMARY KEY,
  ADD `_revision` bigint unsigned AUTO_INCREMENT,
  ADD `_revision_previous` bigint unsigned NULL,
  ADD `_revision_action` enum('INSERT','UPDATE') default NULL,
  ADD `_revision_user_id` int(10) unsigned NULL,
  ADD `_revision_timestamp` datetime NULL default NULL,
  ADD `_revision_comment` text NULL,
  ADD PRIMARY KEY (`_revision`),
  ADD INDEX (`_revision_previous`),
  ADD INDEX `org_primary` (`id`);

The most important field is `_revision`. This field contains a unique identifier for a version of a record from the table. Since this is the unique identifier in the revisioning table, the original id field becomes a normal (indexed) field.

We’ll also store some additional information in the revisioning table. The `_revision_previous` field hold the revision nr of the version that was updated to create this revision. Field `_revision_action` holds the action that was executed to create this revision. This field has an extra function that will discussed later. The user id and timestamp are useful for blaming changes on someone. We can add some comment per revision.

The database user is probably always the same. Storing this in the user id field is not useful. Instead, we can set variable @auth_id after logging in and on connecting to the database to the session user.

Altering the original table
The original table needs 2 additional fields: `_revision` and `_revision_comment`. The `_revision` field holds the current active version. The field can also be used to revert to a different revision. The value of `_revision_comment` set on an update or insert will end up in the revisioning table. The field in the original table will always be empty.

ALTER TABLE `mytable`
  ADD `_revision` bigint unsigned NULL,
  ADD `_revision_comment` text NULL,
  ADD UNIQUE INDEX (`_revision`);

The history table
Saving each version is not enough. Since we can revert back to older revisions and of course delete the record altogether, we want to store which version of the record was enabled at what time. The history table only needs to hold the revision number and a timestamp. We’ll add the primary key fields, so it’s easier to query. A user id field is included again to blame.

CREATE TABLE `_revhistory_mytable` (
  `id` int(10) unsigned,
  `_revision` bigint unsigned NULL,
  `_revhistory_user_id` int(10) unsigned NULL,
  `_revhistory_timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP,
  INDEX (`id`),
  INDEX (_revision),
  INDEX (_revhistory_user_id),
  INDEX (_revhistory_timestamp)
) ENGINE=InnoDB;

How to use
Inserting, updating and deleting data should work as normal, including the INSERT … ON DUPLICATE KEY UPDATE syntax. When updating the _revision field shouldn’t be changed.

To switch to a different version, we would do something like

UPDATE mytable SET _revision=$rev WHERE id=$id;

However if the record has been deleted, there will be no record in the original table, therefore the update won’t do anything. Instead we could insert a record, specifying the revision.

INSERT INTO mytable SET _revision=$rev;

We can combine these two into a statement that works either way.

INSERT INTO mytable SET id=$id, _revision=$rev ON DUPLICATE KEY UPDATE _revision=VALUES(_revision);

The above query shows that there an additional constraint. The only thing that indicates that different versions is of the same record, is the primary key. Therefore value of the primary key can’t change on update. This might mean that some tables need to start using surrogate keys if they are not.

On Insert
Let’s dive into the triggers. We’ll start with before insert. This trigger should get the values of a revision when the _revision field is set, or otherwise add a new row to the revision table.

CREATE TRIGGER `mytable-beforeinsert` BEFORE INSERT ON `mytable`
  FOR EACH ROW BEGIN
    DECLARE `var-id` int(10) unsigned;
    DECLARE `var-title` varchar(45);
    DECLARE `var-body` text;
    DECLARE `var-_revision` BIGINT UNSIGNED;
    DECLARE revisionCursor CURSOR FOR SELECT `id`, `title`, `body` FROM `_revision_mytable` WHERE `_revision`=`var-_revision` LIMIT 1;
  
    IF NEW.`_revision` IS NULL THEN
      INSERT INTO `_revision_mytable` (`_revision_comment`, `_revision_user_id`, `_revision_timestamp`) VALUES (NEW.`_revision_comment`, @auth_uid, NOW());
      SET NEW.`_revision` = LAST_INSERT_ID();
    ELSE
      SET `var-_revision`=NEW.`_revision`;
      OPEN revisionCursor;
      FETCH revisionCursor INTO `var-id`, `var-title`, `var-body`;
      CLOSE revisionCursor;
      
      SET NEW.`id` = `var-id`, NEW.`title` = `var-title`, NEW.`body` = `var-body`;
    END IF;
    
    SET NEW.`_revision_comment` = NULL;
  END

CREATE TRIGGER `mytable-afterinsert` AFTER INSERT ON `mytable`
  FOR EACH ROW BEGIN
    UPDATE `_revision_mytable` SET `id` = NEW.`id`, `title` = NEW.`title`, `body` = NEW.`body`, `_revision_action`='INSERT' WHERE `_revision`=NEW.`_revision` AND `_revision_action` IS NULL;
    INSERT INTO `_revhistory_mytable` VALUES (NEW.`id`, NEW.`_revision`, @auth_uid, NOW());
  END

If the `_revision` field is NULL, we insert a new row into the revision table. This action is primarily to get a revision number. We set the comment, user id and timestamp. We won’t set the values, action and previous id yet. The insert might fail or be converted into an update action by insert on duplicate key update. If the insert action fails, we’ll have an unused row in the revisioning table. This is a problem, since the primary key has not been set, so it won’t show up anywhere. We can clean up these phantom records once in a while to keep the table clean.

When `_revision` is set, we use a cursor to get the values from the revision table. We can’t fetch to values directly into NEW, therefore we first fetch them into variables and than copy that into NEW.

After insert, we’ll update the revision, setting the values and the action. However, the insert might have been an undelete action. In that case `_revision_action` is already set and we don’t need to update the revision. We also add an entry in the history table.

On Update
The before and after update trigger do more or less the same as the before and after insert trigger.

CREATE TRIGGER `mytable-beforeupdate` BEFORE UPDATE ON `mytable`
  FOR EACH ROW BEGIN
    DECLARE `var-id` int(10) unsigned;
    DECLARE `var-title` varchar(45);
    DECLARE `var-body` text;
    DECLARE `var-_revision` BIGINT UNSIGNED;
    DECLARE `var-_revision_action` enum('INSERT','UPDATE','DELETE');
    DECLARE revisionCursor CURSOR FOR SELECT `id`, `title`, `body`, `_revision_action` FROM `_revision_mytable` WHERE `_revision`=`var-_revision` LIMIT 1;
    
    IF NEW.`_revision` = OLD.`_revision` THEN
      SET NEW.`_revision` = NULL;
      
    ELSEIF NEW.`_revision` IS NOT NULL THEN 
      SET `var-_revision` = NEW.`_revision`;
      
      OPEN revisionCursor;
      FETCH revisionCursor INTO `var-id`, `var-title`, `var-body`, `var-_revision_action`;
      CLOSE revisionCursor;
      
      IF `var-_revision_action` IS NOT NULL THEN
        SET NEW.`id` = `var-id`, NEW.`title` = `var-title`, NEW.`body` = `var-body`;
      END IF;
    END IF;

    IF (NEW.`id` != OLD.`id` OR NEW.`id` IS NULL != OLD.`id` IS NULL) THEN
-- Workaround for missing SIGNAL command
      DO `Can't change the value of the primary key of table 'mytable' because of revisioning`;
    END IF;

    IF NEW.`_revision` IS NULL THEN
      INSERT INTO `_revision_mytable` (`_revision_previous`, `_revision_comment`, `_revision_user_id`, `_revision_timestamp`) VALUES (OLD.`_revision`, NEW.`_revision_comment`, @auth_uid, NOW());
      SET NEW.`_revision` = LAST_INSERT_ID();
    END IF;
    
    SET NEW.`_revision_comment` = NULL;
  END

CREATE TRIGGER `mytable-afterupdate` AFTER UPDATE ON `mytable`
  FOR EACH ROW BEGIN
    UPDATE `_revision_mytable` SET `id` = NEW.`id`, `title` = NEW.`title`, `body` = NEW.`body`, `_revision_action`='UPDATE' WHERE `_revision`=NEW.`_revision` AND `_revision_action` IS NULL;
    INSERT INTO `_revhistory_mytable` VALUES (NEW.`id`, NEW.`_revision`, @auth_uid, NOW());
  END

If `_revision` is not set, it has the old value. In that case a new revision should be created. Setting `_revision` to NULL will have the same behaviour of not setting `_revision`. Next to the comment, user id and timestamp, we add also set the previous revision.

As said before, it’s very important that the value of primary key doesn’t change. We need to check this and trigger an error, if it would be changed.

On Delete
Deleting won’t create a new revisiong. However we do want to log that the record has been deleted. Therefore we add an entry to the history table with `_revision` set to NULL.

CREATE TRIGGER `mytable-afterdelete` AFTER DELETE ON `mytable`
  FOR EACH ROW BEGIN
    INSERT INTO `_revhistory_mytable` VALUES (OLD.`id`, NULL, @auth_uid, NOW());
  END

To conclude
Using triggers we can implement the basic versioning functionality to MySQL. Since this is completely done the by database, it can be added to an existing application, without having to change to application code (or with very little changes). Using the history table, we can get the data of the database on any moment in time.

There are some situations where this solution as a bit to basic. A record might span across multiple table, like an invoice with invoice lines. In that case, we don’t want to revision each individual invoice line, but the invoice as a whole. I’ll come around in a follow up with a solution for this. I can tell up front that this solution is unfortunately not as clean as these basics.

Continue reading
Please continue reading the follow up article ‘Versioning MySQL data: Multi-table records‘. At the bottom of that article you’ll find a download link for a script that adds revisioning to existing MySQL tables.