The slow query log is one of the most misused features of MySQL.
I say this because I often talk to people who are using it,
but they're not getting everything they could from it. It's
an awesomely powerful source of information! Used correctly,
it can make performance optimization much simpler and lets you
really understand what your MySQL server spends its time
doing.
Slow is faster than you think
The most misunderstood part of the query log is the "slow" part.
People often set long_query_time to some high value
(the default is 10 seconds) and wait for a few outlier queries to
trickle in. At most they'll lower the threshold to 1
second to be really aggressive. Here's the secret:
those queries aren't really the ones that
matter.
A query that runs for 60 seconds and hits your server twice a
day may look ripe for optimization. It probably is.
You can probably get that query to run in less than a
second and pat yourself on the back for improving it by several
orders of magnitude. That's great, but that long query only
represents a tiny fraction of what keeps your server busy.
On the other hand, you probably have queries that take only 50
milliseconds to run but execute once per
second. In aggregate, your server spends 4,320
seconds per day executing that query (86,400 executions at 0.05
seconds each).
Even if you improve that query to take 25 ms, a much more modest
improvement, you're still cutting those 4,320 seconds of execution
time in half.
If you never gather stats from your slow query log at a threshold
below 1 second, you'll never see that 50 ms query, and never
optimize it. Those are the queries that are most important
to fix!
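The arithmetic above is worth spelling out. A quick back-of-the-envelope check, using the 50 ms, once-per-second figures from the example:

```shell
# One 50 ms query executing once per second, over a 24-hour day:
executions_per_day=86400            # 60 * 60 * 24
ms_per_execution=50
total_seconds=$(( executions_per_day * ms_per_execution / 1000 ))
echo "${total_seconds} seconds"     # 4320 seconds -- 1.2 hours of execution time
```

That single "fast" query burns well over an hour of server time every day, which is why it matters more than the rare 60-second outlier.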
Pre-MySQL 5.5 the minimum time you could set was 1 second.
Percona builds allowed you to set it
to sub-second values, which was huge, and now that's
built in to the Oracle MySQL builds too. It was such a
powerful feature that it was impossible to ignore. I think
lots of people have been trained never to think of it because old
versions of MySQL didn't allow it. Now they do! Use
it!
You don't have to leave it on all the time
So you're sold on gathering data with a slower threshold, but you
say "My server can't possibly handle writing that much slow log,
I have thousands of queries per second!"
That's where sampling comes in. Don't capture everything
all the time. Capture everything part of the time.
I only gather about 1% of all the queries that hit the
server.
That can be as simple as:
mysql -e "SET GLOBAL long_query_time=0.0; SET GLOBAL slow_query_log=1"
&& sleep 36 && mysql -e "SET GLOBAL
slow_query_log=0"
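To make the sampling regular, you can wrap those commands in a script and run it from cron. A minimal sketch; the filename and schedule are just placeholders, and it assumes the mysql client can authenticate non-interactively (e.g. via ~/.my.cnf):

```shell
#!/bin/sh
# sample-slow-log.sh -- capture roughly 1% of each hour (36 seconds)
# at full verbosity, then turn logging back off.
mysql -e "SET GLOBAL long_query_time = 0.0; SET GLOBAL slow_query_log = 1"
sleep 36
mysql -e "SET GLOBAL slow_query_log = 0"
```

Run it hourly with a crontab entry like `0 * * * * /usr/local/bin/sample-slow-log.sh` and you get consistent, low-overhead data collection.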
Percona builds also allow for rate limiting slow query log
events. This is fantastic, because it means I can set it to
log 1 in 4 events, and then leave my collection on for 4 times as
long and still get 1% overall. This lets me sample over
a longer period of time and get better coverage.
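On Percona Server, that rate limiting looks something like the following. The variable names here are Percona's log_slow_rate_limit and log_slow_rate_type; check the docs for your version, since the exact behavior has changed between releases:

```shell
# Percona Server only: log roughly 1 in 4 queries to the slow log,
# with the threshold at zero so every sampled query is captured.
mysql -e "SET GLOBAL log_slow_rate_type = 'query';
          SET GLOBAL log_slow_rate_limit = 4;
          SET GLOBAL long_query_time = 0.0;
          SET GLOBAL slow_query_log = 1"
```

Combined with the sampling window above, logging 1 in 4 queries for a window 4 times as long still yields about 1% of traffic overall.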
Process those logs!
Now that you're getting a firehose of performance data about your
server, it's not really practical to simply read the logs top to
bottom. You need a way to summarize them and extract what's
important.
Not only that, but these collections are like a gold mine of
information -- you don't want to throw them away!
You don't have to save the raw .log files; the best way to
keep the data is to use pt-query-digest to process the log and insert
the results into a database. This digest keeps stats about
queries for each collection period. It intelligently strips
out all the variable information to identify which queries are
the same, and aggregates information about all of them.
This compacts that huge file into just a few hundred rows.
Keeping these digested samples lets you build
a history of performance on your database.
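As a sketch, the digest-and-store step can look like this. The host, database, and table names are illustrative placeholders; recent pt-query-digest versions use --review for the query fingerprint table and --history for the per-collection stats:

```shell
# Summarize a sampled slow log and store per-query stats in a database.
# Host, database, and table names are illustrative -- adjust for your setup.
pt-query-digest \
  --review  h=reports.example.com,D=slow_query_log,t=global_query_review \
  --history h=reports.example.com,D=slow_query_log,t=global_query_review_history \
  --no-report \
  /var/log/mysql/mysql-slow.log
```

Run this after each collection window and each query class gets one fingerprint row plus a stats row per period, which is exactly the history you want to keep.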
Visualise the data
Now here's the part where I self promote: I wrote an
awesome tool to take this data that has been collected and
digested, and give you a nice web interface for exploring and
reporting on it. It's called Anemometer; it takes only minutes to
get up and running, and combined with the slow query log and
pt-query-digest it's a really powerful tool.
It allows you to quickly see which are the most expensive queries
running on your system, even across multiple hosts!
Summary
That was really a lot of information, but getting slow query logging right will make a huge difference in how you handle performance optimization on your server. Let's put it all together:
- When you turn on the slow query log, gather everything: set long_query_time = 0.0 (or 0.001, or something similarly small).
- Sample your slow logs; you don't need to leave them on all the time. Set up a cron job to do this regularly so you have consistent data gathering. Aim for 1% sampling to start; you may never need more than that.
- Use pt-query-digest to process and store your data.
- Use Anemometer to visualise the data.