As there was a lot of complain recently about various cases with
PERFORMANCE SCHEMA (PFS) overhead, I've decided to start a series
of articles to help you to use PFS in the best way to analyze
your workloads. This is the first part, starting with an overview
of instrumentation problems generally..
First of all, instrumentation cannot be free! - when you
want to trace or get some stats about your application internals
you're adding a new code, and this code is increasing an overall
code path, so your final binary simply cannot run on the same
speed by definition!.. Of course, if your instrumentation code is
poor, the overall overhead will be bigger ;-) but if you're
designing it well, it'll be lower, but still not zero!..
Dynamic instrumentation brings additional cost - as soon
as you're willing to add some advanced stuff within your
instrumentation, it makes sense to not keep it active all of the
time.. - but checking of all these flag/states on active/inactive
will bring additional cost as will require more additional code
to manage it.. (the best realization for today of dynamic
instrumentation will probably attributed to DTrace, but currently
you cannot get it yet available on most of platforms, and which
is remaining a story on itself)..
Instrumentation overhead is directly depending on event
frequency - sounds trivial, right? - more frequently your
instrumentation code is executed - bigger overhead you may
expect. So, before you activate any instrumentations on some
execution levels of your code you have to think about a frequency
of events on this level.. BTW, exactly the same problem you're
meeting with DTrace as well (for ex. see my old article with
example of x10 slowdown under DTrace).
However, when we're adding instrumentation into our own code, we
may then keep in mind what kind of level of event frequency to
expect within each part of code, and make and overhead of such an
instrumentation predictable(!)..
All this sounds simple.. - until we're not starting to feel an
overhead directly related to involved instrumentation ;-) To give
you a feeling of what it could be I wrote a simple "dumb code" of
a program which is not doing anything useful, but just looping to
keep CPU warm, and within it loops checking different conditions
;-) (the full source code you may find
here if you want to play with it too).
The program is calling recursively Loop_func(), which will do "N"
loops of a given level, and then call "M" times Loop_func() again
to involve execution of the next level (in fact the program is
simulating a work like: main()->do some work-> call
function1() -> do some work -> call in loop func2() ->
do some work -> call in loop fun3() -> and so on...)
For a database code it could be something similar to:
main() ... '-> loop: execute_SQL_SELECT(); <== level 1 '-> loop: read_ranges(); <== level 2 '-> loop: read_rows(); <== level 3 '-> loop: read_row(); <== level 4 '-> ...
So, more we're going in depth in levels - bigger a probability
that a code on this level is executed more frequently (more loop
iterations). And then, we want to instrument it to measure
exactly the time we're spending there. And to follow the PFS
idea, I'll add the following to my code around call of the
Loop_func():
- first I'm able to compile the code with or without "PFS instructions" (depending if PFS_DEF is declared or not)
- then, I'm able "dynamically" enable instrumentation code according the "PFS" arg value
- and then according to enabled instrumentation of each level, I'm then executing or not the instrumentation code on such a level (for TIMED instrumentation I'm using gettimeofday() which is probably not the most optimal, but good for an example)..
#ifdef PFS_DEF if( PFS ) { if( Data[ level ].Enabled ) Data[ level ].CNT++; if( Data[ level ].Timed ) gettimeofday( &tv1, NULL ); } #endif Loop_func( level + 1, max ); #ifdef PFS_DEF if( PFS ) { if( Data[ level ].Enabled && Data[ level ].Timed ) { gettimeofday( &tv2, NULL ); Data[ level ].TIMED += (tv2.tv_sec - tv1.tv_sec) * 1000000 + tv2.tv_usec - tv1.tv_usec; } } #endif
Hm.. it's probably will be faster for you to read the code than
read my explanations ;-)
But well, now how to use this program:
-
first of all, I've compiled 2 binaries:
- one "without PFS enabled" and namedd "pfs_loop-noPFS"
- another with "PFS enabled" (PFS_DEF defined) and named "pfs_loop-withPFS"
-
then I'm executing the program with following arguments:
- $ pfs_loop PFS Loop1_N Loop1_M Instr1 Timed1 [Loop2_N Loop2_M Instr2 Timed2]...
-
where:
- PFS - 1/0: enabled/disabled PFS code
- Loop1_N - number of loops to execute on the "level1" to simulate a kind of work
- Loop1_M - number of loops with call of the Loop_func() function for the next level ("level2")
- Instr1 - 1/0 : enabled/disabled instrumentation on the "level1"
- Timed1 - 1/0 : enabled/disabled TIMED instrumentation on the "level1"
- ... - and so on for each next level
- the total number of levels is calculated then automatically
So far, in the following example I'm executing the "pfs_loop"
program with 4 levels in total, on the first 3 levels the program
will run 1000 times in the "work loop" (N), and then will 500
times call execution of the next level. And only on the last
(4th) level the "work loop" will be shorter - 10 times only
(similar like you read few rows from FS cache, etc.)..
The first execution is with a binary without any instrumentation:
$ time pfs_loop-noPFS 0 1000 500 0 0 1000 500 0 0 1000 500 0 0 10 0 0 0 real 0m4.380s user 0m4.366s sys 0m0.000s
Now exactly the same thing, but with binary compiled with "PFS
instrumentation", but PFS is "disabled"
$ time pfs_loop-withPFS 0 1000 500 0 0 1000 500 0 0 1000 500 0 0 10 0 0 0 real 0m4.551s <-- 4% more.. user 0m4.537s sys 0m0.000s
The same, but PFS "enabled", but all instrumentations "disabled":
$ time pfs_loop-withPFS 1 1000 500 0 0 1000 500 0 0 1000 500 0 0 10 0 0 0 real 0m4.685s <-- 6.6% more.. user 0m4.671s sys 0m0.001s
And now "enabling" instrumentation on different levels:
$ time pfs_loop-withPFS 1 1000 500 1 1 1000 500 0 0 1000 500 0 0 10 0 0 0 real 0m4.685s <-- same 6.6% for "level 1" user 0m4.671s sys 0m0.000s $ time pfs_loop-withPFS 1 1000 500 0 0 1000 500 1 1 1000 500 0 0 10 0 0 0 real 0m4.700s <-- 7.3% for "level 2" user 0m4.684s sys 0m0.001s $ time pfs_loop-withPFS 1 1000 500 0 0 1000 500 0 0 1000 500 1 1 10 0 0 0 real 0m12.864s <-- 293% for "level 3" user 0m12.826s sys 0m0.000s
As you can see, more in "lower level" we're involving
instrumentation - bigger overhead we're getting (trivial, right?
- on a more lower level the same instrumentation code is executed
way more often, so as the result the impact is more bigger
too)..
Exactly the same is going with PERFORMANCE SCHEMA in MySQL! -
more low level instrumentation you're enabling, bigger overhead
you may hit.. However, keep in mind:
- ALL PFS instrumentation is completely dynamic
- at any time you may disable everything!
- you may go very progressively and detailed level by level in instrumentation, just be aware about what exactly you're doing ;-)
Now, let's take a popular example of Sysbench OLTP_RO
Point-Select 8-tables workload. This workload on 16cores-HT is
getting its max performance level usually on 64 or 256 concurrent
user sessions. Let's analyze the PFS impact on this workload on
the following situations:
- noPFS -- MySQL binary was compiled without PFS
- withPFS, PFS=off -- MySQL binary was compiled with PFS, but PFS is disabled
- withPFS, PFS=def -- same as before, but PFS is enabled with default instrumentation (as in MySQL 5.6 by default)
- withPFS, PFS=none -- PFS is enabled, but all instrumentations are disabled
So, we're getting:
Observations :
- noPFS : 64usr-> 250K QPS, 256usr-> 280K QPS
- PFS=off : 64usr-> 249K QPS, 256usr-> 275K QPS
- PFS=def : 64usr-> 240K QPS, 256usr-> 260K QPS
- PFS=none : 64usr-> 245K QPS, 256usr-> 270K QPS
- so, in worse case we have 8% "overhead" - 260K QPS -vs- 280K QPS.. (PFS default -vs- compiled without PFS)
- while, "not compiled" -vs- "compiled, but disabled" (noPFS -vs- PFS=off) is "near" the same - only 2% overhead max here..
However, the "default" PFS instrumentation is composed from
"table I/O" and "statements/digest" instrumentations, let's see
the impact of the each one:
Observations :
- PFS=none : 64usr-> 245K QPS, 256usr-> 270K QPS
- PFS=tables_only : 64usr-> 240K QPS, 256usr-> 265K QPS
- PFS=statements_only : 64usr-> 240K QPS, 256usr-> 260K QPS
- so, as you can see, by disabling statements instrumentation you can reduce the max overhead from 8% to 5%
- and all this dynamic, without MySQL Server restart!
Well, I'll go more in details within next articles, but to finish
with this one, let's see how well the "alternative" solutions are
working on the same workload...
Let me start with MySQL 5.6 code patched by Facebook and
providing TABLE_STATISTICS (and other stats). As mentioned Domas,
there is no need any enable/disable settings, just start MySQL
server and you get it. So, let's compare its overhead on the same
workload with noPFS and PFS=def :
Oservations :
- as you can see, there is near no difference in overhead between PFS=def and Facebook stats instrumentation
- there are the same max 8% overhead -vs- noPFS.. - so, there is no instrumentation without cost, right?..
- while PFS=def is not only have table stats, but also query digest..
- and PFS instrumentation you may dynamically reduce at any moment, while the Facebook code is just here..
Another "alternative" stats solution is implemented in Percona
Server, but this instrumentation is dynamic, in fact binary: you
may just enable or disable it completely by setting the
"userstat" global variable to 1 or 0. Percona Server 5.6 is only
RC for the moment, and I was unable to compile it without PFS due
errors in Percona code.. So, let's compare it on the same
workload with noPFS and PFS=off (the difference in PFS=off config
between MySQL 5.6 and Percona 5.6 may be a sign of yet additional
overhead coming from the "userstat" related code) :
Observations :
- in fact, yes, MySQL 5.6 max QPS with PFS=off is higher than on Percona 5.6..
- and then, once userstat=1 the regression is horrible.. over 30%...
So, it's very easy to complain about PFS overhead.. - but what if
we'll try together to improve it rather to blame?.. ;-)) PFS is
an universal solution for MySQL profiling, it has a HUGE power,
we all just need to learn how to use it right, and then life will
become much more easier.. ;-)
INSTEAD OF SUMMARY :
- Instrumentation is not free ;-)
- Instrumentation of higher frequency events having a bigger overhead!
- PFS is a really great tool, just need some love ;-)
Well, it was only a story about PFS instrumentation overhead, but
nothing about practical usage.. So, the next one in this series
will be all about practical ;-)
To be continued..
Rgds,
-Dimitri