When mysqld kills mysqld

The other day a colleague and friend of mine, Peter Boros, had a case where one of our clients had to track down the process shutting down MySQL. This post is based on our internal discussion of that case.

Our client wanted Peter to identify the culprit behind periodic shutdowns. This proved to be slightly more difficult than usual, for reasons that you might guess from the title of this blog.

Here is what Peter saw in the logs:

150928 15:15:33 [Note] /usr/sbin/mysqld: Normal shutdown
150928 15:15:36 [Note] Event Scheduler: Purging the queue. 0 events
150928 15:15:39 [Warning] /usr/sbin/mysqld: Forcing close of thread 411515  user: 'dashboard'
150928 15:15:40  InnoDB: Starting shutdown...
150928 15:15:59  InnoDB: Shutdown completed; log sequence number 4873840375
150928 15:16:00 [Note] /usr/sbin/mysqld: Shutdown …

What stopped MySQL? Tracing back signals sent to MySQL

Have you ever had a case where you needed to find a process which sent a HUP/KILL/TERM or other signal to your database? Let me rephrase. Did you ever have to find which process messed up your night? If so, you might want to read on. I’m going to tell you how you can find it.

Granted, on small or meticulously managed systems, tracking down the culprit is probably not a big deal. You can likely identify it simply by checking which processes have enough privileges to send mysqld a HUP/KILL/TERM signal. However, we frequently see cases where this does not work, or where the process of elimination would be too tedious to carry out.
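
The privilege check described above can be sketched in a few lines of Python. This is only an illustration and not the method used in the case at hand. The function names are hypothetical, it assumes a Linux /proc filesystem, and it treats effective UID 0 as a stand-in for the CAP_KILL capability (other capabilities, user namespaces and security modules are ignored).

```python
import os


def parse_uids(status_text):
    """Return (real, effective, saved) UIDs from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            real, effective, saved = (int(field) for field in line.split()[1:4])
            return real, effective, saved
    raise ValueError("no Uid: line found")


def may_signal(sender, target):
    """Approximate the kill(2) permission check for two (real, effective, saved) tuples.

    Assumption: effective UID 0 stands in for CAP_KILL; everything else
    that can grant or deny the permission is ignored.
    """
    s_real, s_eff, _ = sender
    t_real, _, t_saved = target
    if s_eff == 0:
        return True
    # The sender's real or effective UID must match the target's real or saved UID.
    return s_real in (t_real, t_saved) or s_eff in (t_real, t_saved)


def candidate_senders(target_pid, proc_root="/proc"):
    """List (pid, command name) pairs for processes that could signal target_pid right now."""
    with open(os.path.join(proc_root, str(target_pid), "status")) as f:
        target = parse_uids(f.read())
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                text = f.read()
            if may_signal(parse_uids(text), target):
                # The first line of the status file is "Name:\t<command name>".
                found.append((int(entry), text.splitlines()[0].split("\t", 1)[1]))
        except (OSError, ValueError, IndexError):
            continue  # the process exited or the entry was unreadable
    return found
```

Calling candidate_senders() with the PID of mysqld returns a point-in-time snapshot only. On a busy host the list is often long, and a short-lived process that sends the signal and exits will never appear in it, which is where this approach stops being practical.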

We recently had a case where a process was frequently sending SIGHUPs to mysqld, and the customer asked us to see if we could get rid of this annoyance. This blog is the direct result of a discussion I had with my colleague …

Systemtap solves phantom MySQLd SIGTERM / SIGKILL issue

The Percona Managed Services team recently faced a somewhat peculiar client issue. We’d receive pages about their MySQL service being unreachable. However, studying the logs showed nothing out of the ordinary: for the most part it appeared to be a normal shutdown, and neither anyone’s command history nor any cron task looked suspicious.

This is one of those obscure and peculiar (read: unique) issues that triggered an old memory; I’d seen this behavior before and I had just the tool to catch the culprit in the act.

Systemtap made diagnosing this issue possible, and I can’t overstate how powerful, and how often under-utilized, the systemtap tool set really is.

cat > signals.stp << EOF
probe signal.send {
if (sig_name == …

