How to use tcpdump on very busy hosts

Often I run into problems when trying to use mk-query-digest with tcpdump on “very” busy hosts. You might be thinking, “very busy is a relative and unquantifiable term,” and you’d be right, so I’ll phrase this differently. Let me give a little background to the problem first. Mk-query-digest tries to handle dropped or missing packets gracefully, but even so they can skew results dramatically. Imagine a situation where a single connection sends two queries and gets two responses, with a response time of R for each request, and a wait time of W between the requests. If the first response and second request are dropped by the kernel, the result – from mk-query-digest’s perspective – is that the database took 2R + W time to respond to the first request.
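
To put a number on the distortion: if R is 10ms and W is 5 seconds, mk-query-digest would report 2(0.01) + 5 = 5.02 seconds of response time for the first query, making a 10ms query look like a 5-second one.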

Back to the question: what is a "very busy host"? In my experience, if even 5% of tcpdump packets are dropped by the kernel, the results can be skewed enough to cause confusion about which queries are really slow. Recently, I saw more than 60% of packets dropped on a server with roughly 50MB/s of traffic on port 3306, a system load of about 10, and 8 CPU cores. The resulting mk-query-digest (mkqd) output was obviously bogus when compared with the host's slow query log (for example, none of the top 5 slow queries reported by mkqd appeared in the actual slow log file). After a little brainstorming, we came up with a few solutions:

  1. use mk-query-digest --filter '$event->{time} && $event->{time} < 1' to exclude all queries which it believes took longer than the server's long_query_time of 1 second (anything genuinely that slow should already be in the slow query log)
  2. tcpdump traffic only from a small and representative subset of clients (see the example filter after this list)
  3. tcpdump a modulo of incoming ports (including port 3306, because we must capture the responses from mysqld)
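
For #2, the client subset can be written directly into the capture filter. A minimal sketch, assuming two representative application servers at the hypothetical addresses 10.0.0.21 and 10.0.0.22:

tcpdump -i eth0 -s 65535 -x -n -q -tttt 'port 3306 and (host 10.0.0.21 or host 10.0.0.22)'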

#1 has an obvious flaw: if your long_query_time is 1 second, and mkqd believes that a query which actually took 10ms instead took 0.9s, the results are still useless. That is to say, this doesn't actually solve the real problem of dropped packets; it just applies a mask to the output. #2 seems like the simplest good solution, but when I tested it, I still got a very high percentage of dropped packets (around 30% when filtering on only 4 of hundreds of active clients). While this is lower than without the filter, it is still unusable. #3 actually worked very well and resulted in about 0.2% packet loss on this host, which is acceptable: the variances are statistically smoothed out and don't noticeably affect the results. Here is the tcpdump command I used:

tcpdump -i eth0 -s 65535 -x -n -q -tttt 'port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2'
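
A note on why the filter works, since it looks cryptic: tcp[1] and tcp[3] are the low bytes of the TCP source and destination ports, and the low byte of 3306 is 234, which satisfies 234 & 7 == 2. The mysqld side of every packet therefore always matches, so the filter effectively keeps just the roughly one-in-eight connections whose client port's low byte also ends in binary 010, and it keeps both directions of those connections. You can sanity-check the arithmetic in the shell:

# low byte of port 3306 is 234, and 234 mod 8 is 2
printf '%d\n' $(( (3306 & 0xff) & 7 ))    # prints 2

To sample a smaller fraction, a wider mask should work the same way, e.g. tcp[1] & 15 == 10 and tcp[3] & 15 == 10 for roughly one connection in sixteen (since 234 & 15 == 10), though mask widths other than 7 are untested here.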

I also had to forward the tcpdump output to another host for processing, because the database host couldn't handle the additional IO or CPU pressure of either writing it to a file or piping it to mk-query-digest. Here is a draft of a script to automate this; use at your own risk and update to suit your needs.

#!/bin/bash
# Set some defaults
d=$(date +%F-%T | tr :- _)
self=$(basename "$0")
REMOTEHOST='some.other.hostname'           # CHANGEME
LIMIT=50                                   # limit # of queries in report
SLEEPTIME=1200                             # duration to capture tcpdump data in seconds (1200 = 20 min)
TMP='/tmp'
LOCKFILE="$self.lockfile"
TMPfile="$self.$d.temp"                    # timestamped so runs don't collide
RESfile="$self.$d.result"

# check lock file
if [ -e "$TMP/$LOCKFILE" ]; then
 echo "$self: lock file $LOCKFILE already exists, aborting"
 exit 1
fi

# set trap to be sure tcpdump doesn't run forever,
# and clean up the lock and temp files too
trap 'rm -f $TMP/$LOCKFILE; [ -n "$PID" ] && kill $PID 2>/dev/null; ssh $REMOTEHOST -- "rm -f $TMP/$TMPfile"; exit' INT TERM EXIT
touch $TMP/$LOCKFILE

# run the tcpdump & write to remote file and sleep for a bit
tcpdump -i eth0 -s 65535 -x -n -q -tttt 'port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2' 2>/dev/null \
 | ssh $REMOTEHOST -- "cat - > $TMP/$TMPfile" &

PID=$!          # PID of ssh, the last command in the pipeline
sleep $SLEEPTIME
kill $PID       # closing the pipe makes tcpdump exit on SIGPIPE

# set trap to be sure both remote files are removed
trap 'ssh $REMOTEHOST -- "rm -f $TMP/$TMPfile $TMP/$RESfile"; rm -f $TMP/$LOCKFILE $RESfile; exit' INT TERM EXIT

# digest the result, copy to localhost, then email it
ssh $REMOTEHOST -- "mk-query-digest --type tcpdump --limit $LIMIT < $TMP/$TMPfile > $TMP/$RESfile 2>&1"
scp -q $REMOTEHOST:$TMP/$RESfile $RESfile

# email $RESfile using your preferred transport

# clean up remote and local files.
ssh $REMOTEHOST -- "rm -f $TMP/$TMPfile $TMP/$RESfile"
rm -f $RESfile $TMP/$LOCKFILE

trap - INT TERM EXIT
exit 0
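
Since each run captures for 20 minutes, digests the sample remotely, and hands you a report to mail out (once the email step is filled in), the script is a natural fit for cron. A hypothetical /etc/cron.d entry, where the path and schedule are placeholders and tcpdump needs root:

# digest a 20-minute sample of MySQL traffic at the top of every hour
0 * * * * root /usr/local/bin/tcpdump-mkqd.sh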