Update on porting table_statistics to MariaDB

Over the past few weeks I’ve been working on porting the per table and per session row stats from the tivo patch into mariadb. This is the show table_statistics functionality. I’ve been pushing the code to a branch on lanchpad as I get features working. Up to revision 2953 the row stats are working but the index stats aren’t. I think I have a reasonable implementation of the row stats code given my level of C++ skill. I’m changing the syntax a bit from how it worked in the tivo patch to make it be more compatible with how the original row stats code worked that was ported into mariadb.

In MariaDB 5.2 there are show table_statistics show index_statistics, and show user_statistics functions that return global stats collected by the server when the userstats variable is enabled. There are also associated information schema tables that feed these commands. So far I’ve changed the syntax of the show command to be

SHOW [ QUERY | SESSION | GLOBAL ] TABLE_STATISTICS

Where the default show table_statistics returns the global stats. This is different from the tivo patch against 5.0.72 which changed the default to be the last query. The new syntax keeps things backwards compatible with the original show command. The query keyword will return the previous query_statistics_history_size queries of statistics. This is a bit different from the 5.0 version which only tracked one query.

For example if I set query_statistics_history_size=5 then mariadb will track the row stats from the previous 5 queries:

mysql> set query_statistics_history_size=5;
Query OK, 0 rows affected (0.00 sec)

mysql> show tables;
+—————-+
| Tables_in_test |
+—————-+
| t |
| t2 |
+—————-+
2 rows in set (0.00 sec)

mysql> select * from t;
+—+
| t |
+—+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+—+
5 rows in set (0.00 sec)

mysql> select * from t2;
+—-+
| t2 |
+—-+
| 1 |
| 2 |
| 3 |
+—-+
3 rows in set (0.00 sec)

mysql> show query table_statistics;
+———-+————–+————+———–+————–+————————-+
| Query_id | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+———-+————–+————+———–+————–+————————-+
| 81 | test | t | 5 | 0 | 0 |
| 82 | test | t2 | 3 | 0 | 0 |
+———-+————–+————+———–+————–+————————-+
2 rows in set (0.00 sec)

mysql> select count(*) from t,t2;
+———-+
| count(*) |
+———-+
| 15 |
+———-+
1 row in set (0.00 sec)

mysql> show query table_statistics;
+———-+————–+————+———–+————–+————————-+
| Query_id | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+———-+————–+————+———–+————–+————————-+
| 81 | test | t | 5 | 0 | 0 |
| 82 | test | t2 | 3 | 0 | 0 |
| 84 | test | t | 5 | 0 | 0 |
| 84 | test | t2 | 3 | 0 | 0 |
+———-+————–+————+———–+————–+————————-+
4 rows in set (0.00 sec)

Note that show query table_statistics shows the stats only for the current connection. There isn’t a way to get lists from other threads. My next task is to add a separate table with the query_id and query text similar to show profiles that can be used to correlate a query back to the different statistics kept for it. After that I plan on adding index statistics.

The older version of this in the tivo patch has been extremely useful for optimizing queries. I hope I can make this even better and get it into mariadb for other people to use. If you have any suggestions for how I should change this please post in the comments.