“Show profile” + “Information_schema.profiling”

Starting with version 5.1.28, there is a new table INFORMATION_SCHEMA.PROFILING and a new statement SHOW PROFILE(S). Basic information is in the reference manual and in an introductory article. But I don’t think that’s adequate for everyone. I’ll try here to give some non-basic information. I’ll assume that you read the manual, think of this as a supplement.

A bit of a caveat: I was totally against adding this feature, and some of my bias might show. But I’m not trying to argue about it. I do believe that once we add a feature we should explain it, and all I’m trying to do is explain here.

How it works

The data-gathering code, or “instrumentation”, goes approximately like this:
statement_begin:
start-and-setup-for-getting-info
Stage 1: collect-data()
bunch-of-code
Stage 2: collect-data()
bunch-of-code
Stage 3: collect-data()

end-and-put-info-in-buffer

And specifically the timing part of the collect-data() code looks like this:
In sql/sql_profile.cc:

void PROF_MEASUREMENT::collect()
{
time_usecs= (double) my_getsystime() / 10.0; /* 1 sec was 1e7, now is 1e6 */
#ifdef HAVE_GETRUSAGE
getrusage(RUSAGE_SELF, &rusage);
#endif
}

The my_getsystime() function gets time of day; the getrusage() function gets a little more; I’ll get into the details about ‘timing’ later.

So what’s happening is: if profiling is on (@@profiling==1), then, at each “stage” of processing for an SQL statement, MySQL gets the current time with a get-time-of-day function, and maybe gets a little more than that with getrusage().

This information is stored in memory. When you select from the PROFILING table, MySQL gets basic data about the query from elsewhere (things like “Query Id” or “Seq” are just counters and things like “function name” are straightforward), combines it with the timing information, and Bob’s your uncle, you have a display like this:

mysql> select query_id,seq,state,duration,source_function
       from information_schema.profiling where query_id = 6;
+----------+-----+--------------------+----------+-----------------------+
| query_id | seq | state              | duration | source_function       |
+----------+-----+--------------------+----------+-----------------------+
|        6 |   1 | starting           | 0.000565 | NULL                  |
|        6 |   2 | Opening tables     | 0.001148 | open_tables           |
|        6 |   3 | System lock        | 0.000137 | mysql_lock_tables     |
|        6 |   4 | Table lock         | 0.000222 | mysql_lock_tables     |
|        6 |   5 | init               | 0.000209 | mysql_select          |
|        6 |   6 | optimizing         | 0.027357 | optimize              |
|        6 |   7 | statistics         | 0.000398 | optimize              |
|        6 |   8 | preparing          | 0.000211 | optimize              |
|        6 |   9 | executing          | 0.001443 | exec                  |
|        6 |  10 | Sending data       | 0.000542 | exec                  |
|        6 |  11 | end                | 0.000213 | mysql_select          |
|        6 |  12 | query end          | 0.000143 | mysql_execute_command |
|        6 |  13 | freeing items      | 0.001860 | mysql_parse           |
|        6 |  14 | removing tmp table | 0.000900 | free_tmp_table        |
|        6 |  15 | closing tables     | 0.000142 | free_tmp_table        |
|        6 |  16 | logging slow query | 0.000139 | log_slow_statement    |
|        6 |  17 | cleaning up        | 0.000582 | dispatch_command      |
+----------+-----+--------------------+----------+-----------------------+
17 rows in set (0.00 sec)

There will be rows for the last 15 queries (15 is the default if you just “set @@profiling=1″) or perhaps as many as 100 (100 is the maximum value if you say “set @@profiling_history_size = 100″).

I’ll now explain what each of the above columns (query_id, seq, state, duration, source_function) is supposed to mean.

query_id

Whenever you enter a new statement, your “query identifier” goes up by 1. (MySQL people sometimes use the word “query” when the meaning is “statement”.)

This is a number within your session, not a global number, so each connection’s query_id value starts with 1.

SEQ

This is the “stage number” or “step number”. Each time the program reaches a new stage, this number goes up by 1.

state

By now the question might be: when does a new stage happen? MySQL has always (or at least, since before there was a “profiling” patch) had a function thd_set_proc_info() which it calls at different stages. The profiling feature is simply adding the timing functions to this thd_set_proc_info() and making it available via SHOW and SELECT. This piggybacking made the feature easier to add, but it means the stages aren’t always meaningful because they weren’t originally designed with query profiling in mind.

By using grep on the source code, I have been able to collect a list of the server functions which invoke get_thd_proc_info().

sql/item_func.cc:3393: thd_proc_info(thd, “User lock”);
sql/item_func.cc:3418: thd_proc_info(thd, 0);
sql/item_func.cc:3509: thd_proc_info(thd, “User lock”);
sql/item_func.cc:3552: thd_proc_info(thd, 0);
sql/item_func.cc:3737: thd_proc_info(thd, “User sleep”);
sql/item_func.cc:3749: thd_proc_info(thd, 0);
sql/log_event.cc:5926: thd_proc_info(thd, proc_info);
sql/log_event.cc:5979: thd_proc_info(thd, 0);
sql/log_event.cc:6101: thd_proc_info(thd, proc_info);
sql/log_event.cc:6135: thd_proc_info(thd, 0);
sql/sql_delete.cc:62: thd_proc_info(thd, “init”);
sql/sql_delete.cc:253: thd_proc_info(thd, “updating”);
sql/sql_delete.cc:336: thd_proc_info(thd, “end”);
sql/sql_delete.cc:582: thd_proc_info(thd, “deleting from main table”);
sql/sql_delete.cc:897: thd_proc_info(thd, “deleting from reference tables”);
sql/sql_delete.cc:906: thd_proc_info(thd, “end”);
sql/sql_class.cc:264:const char *set_thd_proc_info(THD *thd, const char *info,
sql/sp_head.cc:2734: thd_proc_info(thd, “closing tables”);
sql/sp_head.cc:2737: thd_proc_info(thd, 0);
sql/lock.cc:258: thd_proc_info(thd, “System lock”);
sql/lock.cc:269: thd_proc_info(thd, “Table lock”);
sql/lock.cc:317: thd_proc_info(thd, 0);
sql/lock.cc:332: thd_proc_info(thd, 0);
sql/sql_base.cc:940: thd_proc_info(thd, “Flushing tables”);
sql/sql_base.cc:1008: thd_proc_info(thd, 0);
sql/sql_base.cc:2140: thd_proc_info(thd, “Waiting for table”);
sql/sql_base.cc:2160: thd_proc_info(thd, proc_info);
sql/sql_base.cc:3532: thd_proc_info(thd, “Waiting for tables”);
sql/sql_base.cc:3548: thd_proc_info(thd, “Reopen tables”);
sql/sql_base.cc:3553: thd_proc_info(thd, 0);
sql/sql_base.cc:4450: thd_proc_info(thd, “Opening tables”);
sql/sql_base.cc:4729: thd_proc_info(thd, 0);
sql/sql_base.cc:4878: thd_proc_info(thd, “Opening table”);
sql/sql_base.cc:4918: thd_proc_info(thd, 0);
sql/sql_base.cc:8511: thd_proc_info(thd, “FULLTEXT initialization”);
sql/sql_view.cc:669: thd_proc_info(thd, “end”);
sql/sql_update.cc:235: thd_proc_info(thd, “init”);
sql/sql_update.cc:464: thd_proc_info(thd, “Searching rows for update”);
sql/sql_update.cc:533: thd_proc_info(thd, “Updating”);
sql/sql_update.cc:766: thd_proc_info(thd, “end”);
sql/sql_update.cc:1260: thd_proc_info(thd, “updating main table”);
sql/sql_update.cc:1974: thd_proc_info(thd, “updating reference tables”);
sql/sql_update.cc:1986: thd_proc_info(thd, “end”);
sql/sql_insert.cc:618: thd_proc_info(thd, “init”);
sql/sql_insert.cc:703: thd_proc_info(thd, “update”);
sql/sql_insert.cc:923: thd_proc_info(thd, “end”);
sql/sql_insert.cc:1796: thd_proc_info(thd, “waiting for delay_list”);
sql/sql_insert.cc:1878: thd_proc_info(thd, “Creating delayed handler”);
sql/sql_insert.cc:1924: thd_proc_info(thd, “waiting for handler open”);
sql/sql_insert.cc:1930: thd_proc_info(thd, “got old table”);
sql/sql_insert.cc:2007: thd_proc_info(client_thd, “waiting for handler lock”);
sql/sql_insert.cc:2013: thd_proc_info(client_thd, “got handler lock”);
sql/sql_insert.cc:2031: thd_proc_info(client_thd, “allocating local table”);
sql/sql_insert.cc:2116: thd_proc_info(thd, “waiting for handler insert”);
sql/sql_insert.cc:2120: thd_proc_info(thd, “storing row into queue”);
sql/sql_insert.cc:2366: thd_proc_info(&(di->thd), “Waiting for INSERT”);
sql/sql_insert.cc:2401: thd_proc_info(&(di->thd), 0);
sql/sql_insert.cc:2543: thd_proc_info(&thd, “upgrading lock”);
sql/sql_insert.cc:2556: thd_proc_info(&thd, “insert”);
sql/sql_insert.cc:2694: thd_proc_info(&thd, “reschedule”);
sql/sql_insert.cc:2715: thd_proc_info(&thd, “insert”);
sql/sql_insert.cc:2721: thd_proc_info(&thd, 0);
sql/slave.cc:1054: thd_proc_info(thd, “Creating table from master dump”);
sql/slave.cc:1069: thd_proc_info(thd, “Opening master dump table”);
sql/slave.cc:1085: thd_proc_info(thd, “Reading master dump table data”);
sql/slave.cc:1097: thd_proc_info(thd, “Rebuilding the index on master dump table”);
sql/slave.cc:1586: thd_proc_info(thd, “Waiting for the next event in relay log”);
sql/slave.cc:1588: thd_proc_info(thd, “Waiting for master update”);
sql/slave.cc:2311: thd_proc_info(thd, “Connecting to master”);
sql/slave.cc:2338: thd_proc_info(thd, “Checking master version”);
sql/slave.cc:2347: thd_proc_info(thd, “Registering slave on master”);
sql/slave.cc:2377: thd_proc_info(thd, “Requesting binlog dump”);
sql/slave.cc:2408: thd_proc_info(thd, “Waiting for master to send event”);
sql/slave.cc:2452: thd_proc_info(thd, “Queueing master event to the relay log”);
sql/slave.cc:2527: thd_proc_info(thd, “Waiting for slave mutex on exit”);
sql/slave.cc:2714: thd_proc_info(thd, “Reading event from the relay log”);
sql/slave.cc:2801: thd_proc_info(thd, “Waiting for slave mutex on exit”);
sql/sql_cache.cc:760: thd_proc_info(thd, “storing result in query cache”);
sql/sql_cache.cc:813: thd_proc_info(thd, “storing result in query cache”);
sql/sql_cache.cc:1262: thd_proc_info(thd, “checking query cache for query”);
sql/sql_cache.cc:1346: thd_proc_info(thd, “checking privileges on cached query”);
sql/sql_cache.cc:1441: thd_proc_info(thd, “sending cached result to client”);
sql/sql_cache.cc:1518: thd_proc_info(thd, “invalidating query cache entries (table list)”);
sql/sql_cache.cc:1544: thd_proc_info(thd, “invalidating query cache entries (table)”);
sql/sql_class.h:1298: Set it using the thd_proc_info(THD *thread, const char *message)
sql/sql_parse.cc:386: thd_proc_info(thd, “Execution of init_command”);
sql/sql_parse.cc:447: thd_proc_info(thd, 0);
sql/sql_parse.cc:1575: thd_proc_info(thd, “cleaning up”);
sql/sql_parse.cc:1577: thd_proc_info(thd, 0);
sql/sql_parse.cc:1608: thd_proc_info(thd, “logging slow query”);
sql/sql_parse.cc:1618: thd_proc_info(thd, “logging slow query”);
sql/sql_parse.cc:3220: thd_proc_info(thd, “init”);
sql/sql_parse.cc:4799: thd_proc_info(thd, “query end”);
sql/sql_parse.cc:5037: thd_proc_info(thd, “checking permissions”);
sql/sql_parse.cc:5805: thd_proc_info(thd, “freeing items”);
sql/repl_failsafe.cc:104: thd_proc_info(thd, “Thread initialized”);
sql/repl_failsafe.cc:619: thd_proc_info(thd, “Processing request”);
sql/repl_failsafe.cc:974: thd_proc_info(thd, “purging old relay logs”);
sql/repl_failsafe.cc:1001: thd_proc_info(thd, “starting slave”);
sql/repl_failsafe.cc:1013: thd_proc_info(thd, 0);
sql/sql_repl.cc:658: thd_proc_info(thd, “Sending binlog event to slave”);
sql/sql_repl.cc:696: thd_proc_info(thd, “Finished reading one binlog; switching to next binlog”);
sql/sql_repl.cc:742: thd_proc_info(thd, “Waiting to finalize termination”);
sql/sql_repl.cc:749: thd_proc_info(thd, “Waiting to finalize termination”);
sql/sql_repl.cc:903: thd_proc_info(thd, “Killing slave”);
sql/sql_repl.cc:930: thd_proc_info(thd, 0);
sql/sql_repl.cc:1089: thd_proc_info(thd, “Changing master”);
sql/sql_repl.cc:1220: thd_proc_info(thd, “Purging old relay logs”);
sql/sql_repl.cc:1281: thd_proc_info(thd, 0);
sql/sql_select.cc:762: thd_proc_info(thd, “optimizing”);
sql/sql_select.cc:944: thd_proc_info(thd, “statistics”);
sql/sql_select.cc:954: thd_proc_info(thd, “preparing”);
sql/sql_select.cc:1408: thd_proc_info(thd, “Creating tmp table”);
sql/sql_select.cc:1458: thd_proc_info(thd, “Sorting for group”);
sql/sql_select.cc:1479: thd_proc_info(thd, “Sorting for order”);
sql/sql_select.cc:1638: thd_proc_info(thd, “executing”);
sql/sql_select.cc:1777: thd_proc_info(thd, “Copying to tmp table”);
sql/sql_select.cc:1904: thd_proc_info(thd, “Creating sort index”);
sql/sql_select.cc:1918: thd_proc_info(thd, “Copying to group table”);
sql/sql_select.cc:1977: thd_proc_info(thd, “Removing duplicates”);
sql/sql_select.cc:2040: thd_proc_info(thd, “Sorting result”);
sql/sql_select.cc:2177: thd_proc_info(thd, “Sending data”);
sql/sql_select.cc:2337: thd_proc_info(thd, “init”);
sql/sql_select.cc:2382: thd_proc_info(thd, “end”);
sql/sql_select.cc:10527: thd_proc_info(thd, “removing tmp table”);
sql/sql_select.cc:10549: thd_proc_info(thd, save_proc_info);
sql/sql_select.cc:10583: thd_proc_info(thd, “converting HEAP to MyISAM”);
sql/sql_select.cc:10647: thd_proc_info(thd, (!strcmp(save_proc_info,”Copying to tmp table”) ?
sql/sql_select.cc:10660: thd_proc_info(thd, save_proc_info);
sql/sql_table.cc:3517: thd_proc_info(thd, “creating table”);
sql/sql_table.cc:3586: thd_proc_info(thd, “After create”);
sql/sql_table.cc:5096: thd_proc_info(thd, “discard_or_import_tablespace”);
sql/sql_table.cc:5113: thd_proc_info(thd, “end”);
sql/sql_table.cc:6068: thd_proc_info(thd, “init”);
sql/sql_table.cc:6271: thd_proc_info(thd, “setup”);
sql/sql_table.cc:6327: thd_proc_info(thd, “rename”);
sql/sql_table.cc:6724: thd_proc_info(thd, “copy to tmp table”);
sql/sql_table.cc:6898: thd_proc_info(thd, “rename result table”);
sql/sql_table.cc:7013: thd_proc_info(thd, “end”);
sql/mysql_priv.h:619:#define thd_proc_info(thd, msg) set_thd_proc_info(thd, msg, __func__, __FILE__, __LINE__)
sql/mysql_priv.h:681: The meat of thd_proc_info(THD*, char*), a macro that packs the last
sql/mysql_priv.h:685:const char *set_thd_proc_info(THD *thd, const char *info,
sql/mysql_priv.h.pp:1694:const char *thd_proc_info(void* thd, const char *info);
sql/mysql_priv.h.pp:2394:const char *set_thd_proc_info(THD *thd, const char *info,
storage/csv/ha_tina.cc:1552: old_proc_info= thd_proc_info(thd, “Checking table”);
storage/csv/ha_tina.cc:1582: thd_proc_info(thd, old_proc_info);
storage/myisam/ha_myisam.cc:752: thd_proc_info(thd, “Checking table”);
storage/myisam/ha_myisam.cc:826: thd_proc_info(thd, old_proc_info);
storage/myisam/ha_myisam.cc:1112: thd_proc_info(thd, buf);
storage/myisam/ha_myisam.cc:1115: thd_proc_info(thd, “Repair done”); // to reset proc_info, as
storage/myisam/ha_myisam.cc:1120: thd_proc_info(thd, “Repair by sorting”);
storage/myisam/ha_myisam.cc:1127: thd_proc_info(thd, “Repair with keycache”);
storage/myisam/ha_myisam.cc:1141: thd_proc_info(thd, “Sorting index”);
storage/myisam/ha_myisam.cc:1149: thd_proc_info(thd, “Analyzing”);
storage/myisam/ha_myisam.cc:1156: thd_proc_info(thd, “Saving state”);
storage/myisam/ha_myisam.cc:1194: thd_proc_info(thd, old_proc_info);
storage/myisam/ha_myisam.cc:1409: thd_proc_info(thd, “Creating index”);
storage/myisam/ha_myisam.cc:1434: thd_proc_info(thd, save_proc_info);
storage/archive/ha_archive.cc:1553: old_proc_info= thd_proc_info(thd, “Checking table”);
storage/archive/ha_archive.cc:1568: thd_proc_info(thd, old_proc_info);

So there you go, now you know what the “state” values are, except for a few like “Resuming”. And for more information, all you have to do is read the above source files. Of course, the line numbers will change frequently and the calls to set_thd_proc_info() will change occasionally.

duration

The my_getsystime() function in sql/my_getsystime.c gets time of day with one of:
clock_gettime() available on many Unix systems
QueryPerformanceCounter() Windows only
NXGetTime() NetWare only
gettimeofday() the default if there’s nothing else
All these time-of-day functions are slow on most platforms, but they only get called a few times per statement, because a typical statement has ten or fewer “Stages”. And they’re all good for microsecond measurements.

To get “Duration”, MySQL just subtracts time-of-day [for the previous stage] from time-of-day [for the current stage].

The first (”starting”) row has the duration since the statement execution started, so it doesn’t include some network-IO overhead.

source_function

I tend to think that “state” is more explanatory than “source_function”, but if you really need to get an exact idea about the stage that’s been reached, source_function and source_line are the things to look at. The reference is of course to the source code at build time, so if you pull and build at different times, or depend on a binary distribution, you might have a bit of trouble with these values.

The columns that come from getrusage()

Remember that the lines for ‘timing’ invoked a get-time-of-day function, and then this:
#ifdef HAVE_GETRUSAGE
getrusage(RUSAGE_SELF, &rusage);
#endif
which means “call the operating system’s getrusage() function f it’s there”.

The getrusage() function is available on most operating systems but the MySQL names are all different from the getrusage names. Here is a handy chart:

MYSQL COLUMN NAME               GETRUSAGE FIELD NAME
CPU_USER decimal(9,6)           ru_utime
CPU_SYSTEM decimal(9,6)         ru_stime
CONTEXT_VOLUNTARY               ru_nvcsw
CONTEXT_INVOLUNTARY int(20)     ru_nivcsw
BLOCK_OPS_IN int(20)            ru_inblock
BLOCK_OPS_OUT int(20)           ru_oublock
MESSAGES_SENT int(20)           ru_msgend
MESSAGES_RECEIVED int(20)       ru_msgrcv
PAGE_FAULTS_MAJOR int(20)       ru_majflt
PAGE_FAULTS_MINOR int(20)       ru_minflt
SWAPS int(20)                   ru_nswap

The possibly interesting values are PAGE_FAULTS_MAJOR (which tells you if the operating system had to do a disk read because it didn’t have a copy of the disk block in its memory), CONTEXT_VOLUNTARY (which may help you guess the number of “waits” i.e. how many times a thread had to wait for something), and BLOCK_OPS_IN + BLOCK_OPS_OUT (which give a good hint about how active the disk is).

Some things may need emphasis for a MySQL context.

1 It does not work on Windows!
There is a Windows-API function to get the number of page faults and some memory information: GetProcessMemoryInfo (http://msdn.microsoft.com/en-us/library/ms683219(VS.85).aspx). And GetProcessTimes might give something like ru_utime + ru_stime. So we could get at least some of the getrusage() data. But we don’t.

2 It’s per process not per thread!
Since MySQL tends to use a thread-per-connection model, it might be most interesting to know “how long did this process take for this thread” rather than “how long did it take for the whole process (which includes other threads)”. This way, if some other job interrupts, you’ll see the other job’s time rather than your own. Linux is moving towards per-thread accounting (my Brazilian colleague informed me it’s already working fine with the 2.6.25 kernel), but on a busy system your times may be misleading.

3 It gives estimates that are useless for short measurements!
Usually getrusage() is getting information that the operating system deposited the last time there was a ‘timer tick’, which could be as much as 1/100 seconds ago. Some operating systems tick 1000 times per second instead of 100 times per second, and Solaris has “microstate accounting” which means it’s precise. But as a general rule you can’t trust getrusage() results for high precision.

All of these observations are okay, given the purpose of the profile feature. But I thought you should know about them.

The future

If we decide to remove a feature, we always give plenty of warning with a “deprecation notice”. At time of writing there is no deprecation notice for the profiling feature.