MySQL Performance Schema Statement Digests

MySQL 5.6.5 was released recently, with many a great blog written by various developers involved in making it such an awesome release. The list is large, I’d recommend reading the change log. The MySQL 5.6 What Is New page provides a great overview of the 5.6 release so far as well.

The changelog includes this nugget, which I’ll expand upon here:

The Performance Schema now maintains statement digest information. This normalizes and groups statements with the same “signature” and permits questions to be answered about the types of statements the server is executing and how often they occur.

  • A statement_digest consumer in the setup_consumers table controls whether the Performance Schema maintains digest information.
  • The statement event tables (events_statements_current, events_statements_history, and events_statements_history_long) have DIGEST and DIGEST_TEXT columns that contain digest MD5 values and the corresponding normalized statement text strings.
  • A events_statements_summary_by_digest table provides aggregated statement digest information.

Many of you should be familiar with this view now:

It is the Query Analysis view from MySQL Enterprise Monitor. This was first released at the end of 2008, and was conceived (actually in 2006, presented and much discussed on a snowy day at a MEM team meeting in Amsterdam) from the need to easily monitor what statements are running within the database instance.

Yes, there were the Slow Query Log and General Query Log, but they had their downfalls when we were evaluating this – not fine grained enough on the one side, not enough info with too much overhead on the other, and both require access at the OS level to read the log files – not always a problem, but certainly is in some cases. The only tool that did any kind of analysis and statistics aggregation on queries that MySQL had been executing, at the time, was mysqldumpslow.

And so MySQL Proxy was born, and hooked up with MEM to collect statistics on queries that were funneled through it. It was revolutionary (for the MySQL world, at least). Hot on it’s heels we saw new GUI tools from the community trying to do the same thing. We’ve also seen extensions to the old ways of doing things as well with the extended slow query logging, and command line scripts such as mk-query-digest being born (and re-born). We then extended Query Analysis in to the Connectors as well, to share the load of statistics gathering across application servers, and not have to funnel everything through MySQL Proxy (among other benefits).

These are all still great solutions today. Yet they all have one major downfall – they try and skirt the issue of the database instance … actually generating this data and having it available with a SQL interface. That’s what databases are supposed to be good at … right?

Now it’s 2012, and we finally have a solution for this – Performance Schema Statement Digests. So with the picture above in mind, now take a look at:

Strikingly similar, no? And it’s being returned by the database directly. It gets better.

One of the problems with trying to gather statistics on statement traffic from within the protocol stream (done by Proxy, Connectors, any tcpdump solution, etc.) is that there are certain statistics not exposed within the MySQL protocol – everything like the number of rows examined, temporary table usage, select type etc. that can be seen in a SHOW SESSION STATUS for example are practically impossible to correlate (we would have to inject that in to each connection to wrap each “real” statement).

The Slow Query Log does not have this problem – and the extensions there in the community have proved that it’s useful to be able to aggregate more at the statement level too.

Statement Digests most certainly do not have this problem. Here’s the current structure of the new events_statements_summary_by_digest table:

mysql> DESC performance_schema.events_statements_summary_by_digest;
+-----------------------------+---------------------+------+-----+---------------------+-------+
| Field                       | Type                | Null | Key | Default             | Extra |
+-----------------------------+---------------------+------+-----+---------------------+-------+
| DIGEST                      | varchar(32)         | YES  |     | NULL                |       |
| DIGEST_TEXT                 | longtext            | YES  |     | NULL                |       |
| COUNT_STAR                  | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_TIMER_WAIT              | bigint(20) unsigned | NO   |     | NULL                |       |
| MIN_TIMER_WAIT              | bigint(20) unsigned | NO   |     | NULL                |       |
| AVG_TIMER_WAIT              | bigint(20) unsigned | NO   |     | NULL                |       |
| MAX_TIMER_WAIT              | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_LOCK_TIME               | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_ERRORS                  | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_WARNINGS                | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_ROWS_AFFECTED           | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_ROWS_SENT               | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_ROWS_EXAMINED           | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_CREATED_TMP_DISK_TABLES | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_CREATED_TMP_TABLES      | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SELECT_FULL_JOIN        | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SELECT_FULL_RANGE_JOIN  | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SELECT_RANGE            | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SELECT_RANGE_CHECK      | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SELECT_SCAN             | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SORT_MERGE_PASSES       | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SORT_RANGE              | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SORT_ROWS               | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_SORT_SCAN               | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_NO_INDEX_USED           | bigint(20) unsigned | NO   |     | NULL                |       |
| SUM_NO_GOOD_INDEX_USED      | bigint(20) unsigned | NO   |     | NULL                |       |
| FIRST_SEEN                  | timestamp           | NO   |     | 0000-00-00 00:00:00 |       |
| LAST_SEEN                   | timestamp           | NO   |     | 0000-00-00 00:00:00 |       |
+-----------------------------+---------------------+------+-----+---------------------+-------+
28 rows in set (0.02 sec)

Possibilities abound!

Before diving in to some of the use cases, first it’s worth describing exactly what a digest is.

As with Query Analysis in MEM, we track and aggregate statistics by normalizing the statements. Performance Schema does this in slightly different ways to how the MEM components do this, however they both follow roughly the same process, by working on a tokenized representation of the statement.

Performance Schema does this by tying in to the lexer within the MySQL server, which spits out the tokenized statement to be parsed, by recording it’s token output stream and doing some of it’s own normalization on top.

A normalized statement may have any of the following done to it:

  • Stripping whitespace (done in the lexer layer)
  • Stripping comments (done in the lexer layer)
  • Replacing literals (integer and string inputs) with a “?” placeholder
    • SELECT foo, bar FROM foobar WHERE foo = 100; becomes SELECT foo, bar FROM foobar WHERE foo = ?
    • For a single value INSERT statement, INSERT INTO foobar VALUES (100); becomes INSERT INTO foobar VALUES (?);
  • Lists of values are folded
    • Folding lists of IN values
      • SELECT foo FROM foobar WHERE bar IN (1, 2, 3) becomes SELECT foo FROM foobar WHERE bar IN ()
    • Folding multi-row INSERTs with single values
      • INSERT INTO t1 VALUES (1), (2) becomes INSERT INTO t1 VALUES (?) /* , … */
    • Folding multi-row INSERTs with many values
      • INSERT INTO t1 VALUES (1, 2, 3), (4, 5, 6) becomes INSERT INTO t1 VALUES (…) /* , … */
    • Lists of values in the SELECT list
      • SELECT 1, 2, 3, foo, bar FROM foobar becomes SELECT ?, … , foo, bar FROM foobar
  • Long normalized statements are truncated, with “” added at the end
    • “long” is currently defined as 1024 bytes, set by PSI_MAX_DIGEST_STORAGE_SIZE in psi.h, unfortunately this is not configurable yet, however it should also be noted that this is on the normalized statement – so large string values etc. are not a concern here

MEM does not fold lists of values down, i.e. “INSERT INTO foobar VALUES (1,2,3,4,5)” will be normalized to “INSERT INTO foobar VALUES (?,?,?,?,?)” within MEM. Performance Schema does this to save memory, by discarding the value tokens along the way and using the above forms for normalization.

With that description out of the way, now let’s take a look at some the value that you can get from the data!

First, let’s get the obvious out the way:

A high level overview of the statements like Query Analysis, sorted by those queries with the highest latency

SELECT IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
       IF(SUM_NO_GOOD_INDEX_USED > 0 OR SUM_NO_INDEX_USED > 0, '*', '') AS full_scan,
       COUNT_STAR AS exec_count,
       SUM_ERRORS AS err_count,
       SUM_WARNINGS AS warn_count,
       SEC_TO_TIME(SUM_TIMER_WAIT/1000000000000) AS exec_time_total,
       SEC_TO_TIME(MAX_TIMER_WAIT/1000000000000) AS exec_time_max,
       (AVG_TIMER_WAIT/1000000000) AS exec_time_avg_ms,
       SUM_ROWS_SENT AS rows_sent,
       ROUND(SUM_ROWS_SENT / COUNT_STAR) AS rows_sent_avg,
       SUM_ROWS_EXAMINED AS rows_scanned,
       DIGEST AS digest
  FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC LIMIT 5;

+-------------------------------------------------------------------+-----------+------------+-----------+------------+-----------------+---------------+------------------+------------+----------+--------------+----------------------------------+
| query                                                             | full_scan | exec_count | err_count | warn_count | exec_time_total | exec_time_max | exec_time_avg_ms | rows_total | rows_avg | rows_scanned | digest                           |
+-------------------------------------------------------------------+-----------+------------+-----------+------------+-----------------+---------------+------------------+------------+----------+--------------+----------------------------------+
| SELECT `mysqlserve0_` . `hid`  ... ` . `performanceSchema` AS ... | *         |       7747 |         0 |          0 | 00:00:02.5492   | 00:00:00.0009 |           0.3288 |       5200 |        1 |         5200 | 3176fe79ae8ce631eb328feaaafcf972 |
| SELECT `agent0_` . `hid` AS `h ... ent0_` . `id` = ? FOR UPDATE   |           |       7706 |         0 |          0 | 00:00:02.4067   | 00:00:00.0008 |           0.3122 |       7706 |        1 |         7706 | cbcf9ed706ce0974f1ef91b7c2d690eb |
| SELECT `os0_` . `hid` AS `hid1 ... ion` AS `hasVersion121_` , ... | *         |       5098 |         0 |          0 | 00:00:02.2526   | 00:00:00.0047 |           0.4416 |       5098 |        1 |         5098 | 1677026fd320b4876261b1270d28be38 |
| SELECT `replicatio0_` . `hid`  ... ess` AS `hasMast18_107_` , ... | *         |       2550 |         0 |          0 | 00:00:02.0543   | 00:00:00.0016 |           0.8055 |       2550 |        1 |         2550 | 90b8d082e28985953ff558546e00eb33 |
| SELECT `environmen0_` . `hid`  ... ck130_` , `environmen0_` . ... | *         |       2550 |         0 |          0 | 00:00:01.6082   | 00:00:00.0047 |           0.6304 |       2550 |        1 |         2550 | f0ba68f44b6d9357f1cff9670dc4ff2a |
+-------------------------------------------------------------------+-----------+------------+-----------+------------+-----------------+---------------+------------------+------------+----------+--------------+----------------------------------+

My test data is a little bland (this is a MEM schema monitoring a single MySQL instance), but you get the idea. You can instantly get a picture of your top statements, ordered in this case by latency, along with some extra statistics like whether it caused a full scan, their average latency, how many rows were both scanned and returned etc.

But you could slice and dice this data any way you want – it’s just SQL.

Here’s a few more use cases:

List all normalized statements that use temporary tables ordered by number of on disk temporary tables descending first, then by the number of memory tables.

SELECT IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
       COUNT_STAR AS exec_count,
       SUM_CREATED_TMP_TABLES AS memory_tmp_tables,
       SUM_CREATED_TMP_DISK_TABLES AS disk_tmp_tables,
       ROUND(SUM_CREATED_TMP_TABLES / COUNT_STAR) AS avg_tmp_tables_per_query,
       ROUND((SUM_CREATED_TMP_DISK_TABLES / SUM_CREATED_TMP_TABLES) * 100) AS tmp_tables_to_disk_pct,
       DIGEST AS digest
  FROM performance_schema.events_statements_summary_by_digest
 WHERE SUM_CREATED_TMP_TABLES > 0
ORDER BY SUM_CREATED_TMP_DISK_TABLES DESC, SUM_CREATED_TMP_TABLES DESC LIMIT 5;

+-------------------------------------------------------------------+------------+-------------------+-----------------+--------------------------+------------------------+
| query                                                             | exec_count | memory_tmp_tables | disk_tmp_tables | avg_tmp_tables_per_query | tmp_tables_to_disk_pct |
+-------------------------------------------------------------------+------------+-------------------+-----------------+--------------------------+------------------------+
| SELECT DISTINCTROW `hibalarm0_ ... testeval2_` . `alarm_id` = ... |          5 |                15 |               5 |                        3 |                     33 |
| SELECT DISTINCTROW `hibalarm0_ ... testeval2_` . `alarm_id` = ... |          2 |                 6 |               2 |                        3 |                     33 |
| SELECT * FROM latest_file_io                                      |          2 |                 4 |               2 |                        2 |                     50 |
| SELECT * FROM PROCESSLIST                                         |          2 |                 2 |               2 |                        1 |                    100 |
| SELECT SQL_CALC_FOUND_ROWS `st ...  , MIN ( `min_exec_time` ) ... |          1 |                 3 |               1 |                        3 |                     33 |
+-------------------------------------------------------------------+------------+-------------------+-----------------+--------------------------+------------------------+

List all normalized statements that have done sorts, ordered by sort_merge_passes, sort_scans and sort_rows, all descending.

SELECT IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
       COUNT_STAR AS exec_count,
       SUM_SORT_MERGE_PASSES AS sort_merge_passes,
       ROUND(SUM_SORT_MERGE_PASSES / COUNT_STAR) AS avg_sort_merges,
       SUM_SORT_SCAN AS sorts_using_scans,
       SUM_SORT_RANGE AS sort_using_range,
       SUM_SORT_ROWS AS rows_sorted,
       ROUND(SUM_SORT_ROWS / COUNT_STAR) AS avg_rows_sorted,
       DIGEST AS digest
  FROM performance_schema.events_statements_summary_by_digest
 WHERE SUM_SORT_ROWS > 0
 ORDER BY SUM_SORT_MERGE_PASSES DESC, SUM_SORT_SCAN DESC, SUM_SORT_ROWS DESC LIMIT 5;

+-------------------------------------------------------------------+------------+-------------------+-----------------+-------------------+------------------+-------------+-----------------+----------------------------------+
| query                                                             | exec_count | sort_merge_passes | avg_sort_merges | sorts_using_scans | sort_using_range | rows_sorted | avg_rows_sorted | digest                           |
+-------------------------------------------------------------------+------------+-------------------+-----------------+-------------------+------------------+-------------+-----------------+----------------------------------+
| SELECT * FROM ps_helper . statements_with_sorting                 |          7 |                 0 |               0 |                 7 |                0 |          31 |               4 | 635d19e3e652972b3267ada0bf9c7b36 |
| SELECT * FROM statement_analysis                                  |          4 |                 0 |               0 |                 4 |                0 |          89 |              22 | 10f918a1a410f4fa0fc2602cff02deb7 |
| SELECT table_schema , SUM ( da ... tables GROUP BY table_schema   |          2 |                 0 |               0 |                 2 |                0 |          24 |              12 | 27fecd44f0bf5c0fc4e46f547083a09d |
| SELECT * FROM statements_with_sorting                             |          2 |                 0 |               0 |                 2 |                0 |           3 |               2 | dc117dd0eb81394322e3d4144a997ffc |
+-------------------------------------------------------------------+------------+-------------------+-----------------+-------------------+------------------+-------------+-----------------+----------------------------------+

List all normalized statements that use have done a full table scan ordered by the percentage of times a full scan was done, then by the number of times the statement executed

SELECT IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
       COUNT_STAR AS exec_count,
       SUM_NO_INDEX_USED AS no_index_used_count,
       SUM_NO_GOOD_INDEX_USED AS no_good_index_used_count,
       ROUND((SUM_NO_INDEX_USED / COUNT_STAR) * 100) no_index_used_pct,
       DIGEST AS digest
  FROM performance_schema.events_statements_summary_by_digest
 WHERE SUM_NO_INDEX_USED > 0
    OR SUM_NO_GOOD_INDEX_USED > 0
ORDER BY no_index_used_pct DESC, exec_count DESC LIMIT 5;

+-------------------------------------------------------------------+------------+---------------------+--------------------------+-------------------+
| query                                                             | exec_count | no_index_used_count | no_good_index_used_count | no_index_used_pct |
+-------------------------------------------------------------------+------------+---------------------+--------------------------+-------------------+
| SELECT `hibadvisor0_` . `sched ... _` . `advisorClassId` IN (?)   |        679 |                 679 |                        0 |               100 |
| SELECT `hibalarm0_` . `alarm_i ... ` . `fixed_time` < ? LIMIT ?   |        365 |                 365 |                        0 |               100 |
| SELECT `id` , `millis_stamp` , ... s` ORDER BY `id` ASC LIMIT ?   |        353 |                 353 |                        0 |               100 |
| SELECT `agent0_` . `hid` AS `h ... ventory` . `Agent` `agent0_`   |        112 |                 112 |                        0 |               100 |
| SELECT COUNT ( * ) AS `col_0_0 ... entry0_` . `fetchedDate` > ?   |         18 |                  18 |                        0 |               100 |
+-------------------------------------------------------------------+------------+---------------------+--------------------------+-------------------+

List all normalized statements that have raised errors or warnings.

SELECT IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
       COUNT_STAR AS exec_count,
       SUM_ERRORS AS errors,
       (SUM_ERRORS / COUNT_STAR) * 100 as error_pct,
       SUM_WARNINGS AS warnings,
       (SUM_WARNINGS / COUNT_STAR) * 100 as warning_pct,
       DIGEST AS digest
  FROM performance_schema.events_statements_summary_by_digest
 WHERE SUM_ERRORS > 0
    OR SUM_WARNINGS > 0
ORDER BY SUM_ERRORS DESC, SUM_WARNINGS DESC;

+-------------------------------------------------------------------+------------+--------+-----------+----------+-------------+----------------------------------+
| query                                                             | exec_count | errors | error_pct | warnings | warning_pct | digest                           |
+-------------------------------------------------------------------+------------+--------+-----------+----------+-------------+----------------------------------+
| CREATE PROCEDURE currently_ena ... w_instruments BOOLEAN DEFAULT  |          2 |      2 |  100.0000 |        0 |      0.0000 | ad6024cfc2db562ae268b25e65ef27c0 |
| CREATE PROCEDURE currently_ena ... ents WHERE enabled = ? ; END   |          2 |      1 |   50.0000 |        0 |      0.0000 | 4aac3ab9521a432ff03313a69cfcc58f |
| CREATE PROCEDURE currently_enabled ( BOOLEAN show_instruments     |          1 |      1 |  100.0000 |        0 |      0.0000 | c6df6711da3d1a26bc136dc8b354f6eb |
| CREATE PROCEDURE disable_backg ... d = ? WHERE TYPE = ? ; END IF  |          1 |      1 |  100.0000 |        0 |      0.0000 | 12e0392402780424c736c9555bcc9703 |
| DROP PROCEDURE IF EXISTS currently_enabled                        |         12 |      0 |    0.0000 |        6 |     50.0000 | 44cc7e655d08f430e0dd8f3110ed816c |
| DROP PROCEDURE IF EXISTS disable_background_threads               |          3 |      0 |    0.0000 |        2 |     66.6667 | 0153b7158dae80672bda6181c73f172c |
| CREATE SCHEMA IF NOT EXISTS ps_helper                             |          2 |      0 |    0.0000 |        1 |     50.0000 | a12cabd32d1507c758c71478075f5290 |
+-------------------------------------------------------------------+------------+--------+-----------+----------+-------------+----------------------------------+

All of these greatly show the flexibility of having this instrumentation directly within the server, accessible via the server itself. If you haven’t done it already now is the time to take a little breath, and then shout Hurrah.

Now, it’s worth noting that all of this is stored in memory. By default, the maximum number of rows in this table in 5.6.5 is 200, which effectively lets you monitor 199 normalized statements – as there is a row reserved for a NULL digest, which is a catch all for once this number of normalized statements have been generated – Performance Schema then stores the aggregated statistics for all other statement traffic whose statement digests do not match those already within the table within this NULL row.

In the MEM team we’ve found in practice that people often have many more than this – often 500+ different kinds of statements being executed. You can increase the number of digests that are tracked with the performance_schema_digests_size system variable. You can track how much memory this uses with the SHOW ENGINE PERFORMANCE_SCHEMA STATUS command (here I have the number of digests to track set to 400, which takes roughly 490KB of memory):

mysql> show engine performance_schema status;
+--------------------+--------------------------------------------------------------+-----------+
| Type               | Name                                                         | Status    |
+--------------------+--------------------------------------------------------------+-----------+
...
| performance_schema | events_statements_summary_by_digest.row_size                 | 1256      |
| performance_schema | events_statements_summary_by_digest.row_count                | 400       |
| performance_schema | events_statements_summary_by_digest.memory                   | 502400    |
...

This limit has been raised to 1000 within the current development version (5.6.6), along with Performance Schema (and the instrumentation I have talked about here) being enabled by default!

As with other aggregate tables within Performance Schema, you can reset the statistics within the digest table with:

TRUNCATE TABLE performance_schema.events_statements_summary_by_digest;

Any scripts or tools that are used to analyze this table will need to take this in to account.

We’re just scratching the surface of possibilities with this table at the moment. I have a number of things that I’d like to see added (like.. a summary of each of the major classes of lower wait events, if enabled, like “SUM_FILE_IO_WAIT”, “SUM_SOCKET_IO_WAIT”, “SUM_MUTEX_WAIT” etc.), and I’m sure many of you out there in the community will have your own ideas as well.

Welcome to the future of statement diagnostics within MySQL, we hope you enjoy it! Tell us what you’d like to see next!