It has been seven years since the last time I blogged about generated columns, and a lot has happened -- now both MariaDB and MySQL support them. So it's time to look again, see how well they're doing, and compare to the SQL standard's Optional feature T175 Generated columns.
This is not an introductory description or an explanation why you'd want to use generated columns rather than (say) triggers and views. For that, I'd recommend the relevant manuals or the blog posts by Alexander Rubin and Anders Karlsson.
The Generation Clause
Standard MariaDB 10.1 MySQL 5.7 --------- ------------ --------- [data type] data_type data type GENERATED ALWAYS [GENERATED ALWAYS] [GENERATED ALWAYS] AS AS AS (expression) (expression) (expression) [VIRTUAL | PERSISTENT] [VIRTUAL | STORED] [constraints] [constraints] [constraints] [COMMENT 'string'] [COMMENT 'string']
The above side-by-side BNFs show the standard syntax and the syntax that MariaDB and MySQL actually allow at the time I'm writing this. The MariaDB manual says incorrectly that either VIRTUAL or PERSISTENT is mandatory. The MySQL manual suggests incorrectly that the clause order is fixed, and has a typo: there should be two "]]"s after "STORED".
The first important deviation from standard SQL is that "data type" is not an optional clause in either MariaDB or MySQL. The data type can be figured out from the type of the expression, after all.
The second important deviation is that [GENERATED ALWAYS] is optional and there's a way to say whether the column is virtual (column value is generated when it's accessed) or persistent/stored (column value is generated when it's set, and kept in the database). I call this a single deviation because it's got a single backgrounder: compatibility with Microsoft SQL Server. In fact the original title of the worklog task (WL#411) was "Computed virtual columns as MS [i.e. Microsoft] SQL server has". We changed it to "Generated columns", but the perfume of Microsoftness lingers, and you'll see traces in the vocabulary too. For example, MariaDB has an error message: ""HY000 A computed column cannot be based on a computed column".
So the tip sheet is: for the sake of compatibility with the standard rather than with Microsoft, always say GENERATED ALWAYS, and call it a "generated" column not a "computed" column. It's okay to say VIRTUAL, though, because Oracle does.
Restrictions
In standard SQL these restrictions apply:
"Every column reference contained in [the generation expression)
shall reference a base column of [the same table]."
In other words, a generated column cannot be based on another
generated column. MariaDB adheres to this, but MySQL, as a
harmless extension, allows
CREATE TABLE t (a INT, b INT GENERATED ALWAYS AS (a), c INT
GENERATED ALWAYS AS (b));
"[The generation expression] shall be deterministic."
This is pretty reasonable, and both MariaDB and MySQL comply.
"[The generation expression] shall not contain a routine
invocation whose subject routine possibly reads SQL-data."
This is reasonable too, but MariaDB and MySQL go much further --
they forbid every user-defined function, even if it's declared
that it's deterministic and reads no SQL data.
"[The generation expression] shall not contain a query
expression".
In other words, GENERATED ALWAYS AS (SELECT ...)" is a no-no.
Again, reasonable, and I doubt it will occur to anyone to try.
Differences between MariaDB and MySQL
We're actually looking at two different implementations --
MariaDB's generated columns come ultimately from a user
contribution by Andrey Zhakov, while MySQL's generated columns
are younger and are more of an in-house development. (Update
added later: Mr Zhakov deserves credit for the MySQL development
too, see the comments.) Things worth noticing are:
* the PERSISTENT versus STORED syntax detail, mentioned
earlier,
* GENERATED is a reserved word in MySQL but not in MariaDB,
* MariaDB has some restrictions about foreign keys that MySQL
doesn't have.
MySQL lacks some restrictions about foreign keys, eh? That could lead to interesting results. I tried this sequence of statements:
CREATE TABLE t1 (s1 INT PRIMARY KEY); CREATE TABLE t2 (s1 INT, s2 INT AS (s1) STORED, FOREIGN KEY (s1) REFERENCES t1 (s1) ON UPDATE CASCADE); INSERT INTO t1 VALUES (1); INSERT INTO t2 VALUES (1, DEFAULT); UPDATE t1 SET s1 = 2; SELECT * FROM t1; SELECT * FROM t2;
And the results from the two SELECTs looked like this:
mysql> SELECT * FROM t1; +----+ | s1 | +----+ | 2 | +----+ 1 row in set (0.00 sec) mysql> SELECT * FROM t2; +------+------+ | s1 | s2 | +------+------+ | 2 | 1 | +------+------+ 1 row in set (0.00 sec)
If you're thinking "whoa, that shouldn't be right", then I'm sure you'll understand why MariaDB doesn't allow this trick.
Metadata
In the standard there are two relevant INFORMAIION_SCHEMA columns: IS_GENERATED and GENERATION_EXPRESSION. In MySQL all we get is a value in the EXTRA column: "VIRTUAL GENERATED". In MariaDB also we get a value in the EXTRA column: "VIRTUAL".
I'd say that both implementations are deficient here -- you can't even see what the "(expression)" was.
Hurray For Everybody
Both MariaDB and MySQL have slight flaws in their implementations of generated columns, but my complaints here shouldn't cause worry. Both are robust, as I found out by wasting time looking for bugs. This feature can be used.
What about us?
In my last blog post I may have been over-confident when I predicted ocelotgui (our GUI for MariaDB and MySQL) would allow multiple connections and go beta this month. We will have something that, I hope, will be far more exciting.