We've all heard the hype. MySQL 5.6 is packed with amazing
new features that address all our database problems. 5.6
deals with replication and HA and performance and monitoring and security and features. It just may cure cancer.
In fact it's been out for ages. It went GA about 18 months ago. This is all old news.
MySQL 5.7 already exists. So why am I writing
about this now?
The hard truth is that if you're a DBA with a non-trivial
database environment then you can't drop everything and upgrade
at the drop of a hat. Newly GA versions tend to be buggy.
You have uptime requirements to maintain. Upgrading to the
latest version to try out all the new features in production is a
luxury you simply do not have. Additionally the transition
brings challenges. New versions may have incompatibilities
with the old so you need plenty of time to test your application
against it. The deployment process may have lots of hidden
gotchas.
If you're a DBA then you're constantly evaluating this equation:
at what point does the disadvantage of remaining on a known
old version outweigh the pain and effort of doing an upgrade.
If there are no clear benefits, then why upgrade at all?
In software and operations, the weight of an incumbent
solution is heavy indeed.
So for me, that tipping point has been in favor of upgrading only
recently. Since I think a lot of DBAs are still running on
5.5 environments I thought it was worth it to document this
process here to see if my mistakes could help others. This
isn't an exhaustive how-to. This is just my own experience
of going through the process.
First up: performance_schema Performance_schema was introduced in
5.5, but it was generally accepted that the performance cost was
too prohibitive to enable it in production. Even in 5.6
some claim it's still too expensive, but others have measured it
at around 2-3% overhead. Despite the cost I'm really
excited about using it to solve other, nastier performance issues
so I'm going to keep it enabled by default for new servers.
The first use-case I wanted to implement was to monitor
replication capacity more easily. Estimating how close you
are to having replication lag problems has not been very reliable
before performance_schema. It was mostly guesswork.
Once you go over your replication capacity then there's not
much you can do to fix it except do fewer updates on your
masters.
Mark Leith has been a great proponent of all things
performance_schema and has been writing lots of
tools to make it easier to use. He demonstrated how to
implement a replication "load average" type metric using
p_s -- and that was way back with 5.5!
I set out to get this data into our graphing system so I could
generate pretty pictures like this:
This show the percentage of time the replication SQL thread was
actually busy. This is data collected from a replica which is
clearly not doing much. On busier clusters it would show us
when we're about to have lag issues.
So how did implementing this go? Well here's a list of
gotchas that I ran into:
1. performance_schema=on can only be set at startup time. This
one is documented in the manual so no surprises there.
2. certain instruments must *also* be enabled at startup time.
This isn't documented and drove me crazy. Finally the one
and only place I found which described this issue was in the
slide deck from a Percona presentation by Valerii Kravchuk which was a
life-saver! I pointed out bug
#68097 which discussed the issue. The summary is
that some objects like mutexes are initialized only once at
server start time, so if the instruments aren't enabled at
startup then enabling them later won't do anything.
This is in direct contradiction of the documentation which
says: "Modifications to the setup_instruments table
affect monitoring immediately."
To fix this you
have to set the following in your my.cnf and restart:
performance_schema_instrument = 'wait/synch/mutex/innodb/%=on'
performance_schema_instrument = 'wait/synch/cond/%=on' 3. Check
the setup_consumers table if you're still not getting data Setup
for p_s has at minimum two parts. the setup_instruments
table lets you customize which events you want to collect data
about, but you also have to enable a place to put that collected
data, which are the consumers
> select * from setup_consumers;
+--------------------------------+---------+ | NAME
| ENABLED |
+--------------------------------+---------+ |
events_stages_current |
NO | | events_stages_history
| NO | |
events_stages_history_long | NO
| | events_statements_current | YES
| | events_statements_history | YES
| | events_statements_history_long | NO
| | events_waits_current
| YES | | events_waits_history
| YES | |
events_waits_history_long | YES
| | global_instrumentation | YES
| | thread_instrumentation
| YES | | statements_digest
| YES |
+--------------------------------+---------+
Mostly at this point I'm interested in waits. These help us
measure where the server is spending it's time internally.
UPDATE setup_consumers SET ENABLED='YES' WHERE NAME LIKE
'%waits%';
4. Performance_schema can consume a lot of memory To make storing
and displaying performance schema data as fast as possible, it
uses a storage engine which is memory based. AFAIK it
allocates all this memory up front. Most of the buffers are
now auto-sized and the memory consumed depends a lot on the value
of max_connections you have set on your server.
SHOW ENGINE PERFORMANCE_SCHEMA STATUS;
+---------------------------+------------+ |
performance_schema.memory | 2283449856 |
+---------------------------+------------+
On a system with max_connections=10,000 it consumes around 2G of
RAM.
5. The structure of performance_schema tables changes often It's
common to see error messages like the following in the server's
error.log:
[ERROR] Native table 'performance_schema'.'events_waits_current'
has the wrong structure
This means you've forgotten to run mysql_upgrade after changing
the installed server version. This will happen when moving
from 5.5 to 5.6, but if you look at the changelog of MySQL 5.6
it's clear that lots of development is going on with
performance_schema which results in changes to the structure of
these tables. Therefore even with minor version upgrades
within the 5.6 branch you will need to run mysql_upgrade to
update these tables. Otherwise performance_schema will fail
to initialize and it will be unusable. Summary While performance
schema is super powerful, information about how to configure and
use it is lacking in content and accuracy. I'm not the
first one to raise these issues and hopefully by repeating them
we can get more attention on it.
I will try to post more about specific issues with upgrading from
5.5 to 5.6!