In an application such as a database server, instrumentation is like sex: it’s not enough to know how often things happen. You also care about how long they took, and in many cases you want to know how big they were.
“Things” are the things you want to optimize. Want to optimize queries? Then you need to know what activities those queries cause. Most systems have at least some instrumentation of this kind. If you look around at… let’s not pick on the usual targets… oh, say Sphinx, Redis, and memcached. What metrics do they provide? Counters that say how often various things happened. (Most of these systems provide only a few, coarse-grained counters.) That’s not very helpful. So I read from disk N times, and I read from memory N times, and I compared rows N times… so what? I still don’t know anything relevant to execution time.
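To make the gap concrete, here is a minimal Python sketch of instrumentation that records all three dimensions for each activity: how often, how long, and how big. The `Metrics` class, its `measure` method, and the activity names are hypothetical, invented for illustration; this is not how Sphinx, Redis, or memcached expose their metrics.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Metrics:
    """Hypothetical per-activity metrics: count, total seconds, total bytes."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the timing logic can be tested deterministically.
        self.clock = clock
        self.stats = defaultdict(lambda: {"count": 0, "seconds": 0.0, "bytes": 0})

    @contextmanager
    def measure(self, name, size=0):
        """Time the wrapped block and record it under the activity `name`."""
        start = self.clock()
        try:
            yield
        finally:
            entry = self.stats[name]
            entry["count"] += 1                        # how often
            entry["seconds"] += self.clock() - start   # how long
            entry["bytes"] += size                     # how big


metrics = Metrics()

# Two activities with the same count and size but very different cost.
with metrics.measure("disk_read", size=4096):
    time.sleep(0.01)  # stand-in for a slow read (assumption, not a real I/O call)

with metrics.measure("memory_read", size=4096):
    pass  # stand-in for a fast read

for name, entry in metrics.stats.items():
    print(name, entry)
```

Both activities end up with a count of 1, so a counter alone can't tell them apart; only the accumulated time shows which one is worth optimizing.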
That’s why we need to measure how long things took. It’d be …