Many monitoring systems aim for end-to-end tracing, from “click to disk” or something similar. This is usually implemented by attaching tracing information to each request; X-Trace and Zipkin are two examples. The idea is to collect a complete trace of the entire call tree that a user request generates, even across services and through different subsystems, so that a slow web page load can be blamed on an overutilized disk somewhere.
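To make the idea concrete, here is a minimal sketch of how tracing information can ride along with a request. It is not how X-Trace or Zipkin actually work; the `trace-id` and `span-id` header names, the `Span` and `Collector` types, and the `handle` helper are all assumptions made up for illustration, and the timings are simulated rather than measured.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """One unit of work in a trace (hypothetical structure)."""
    trace_id: str             # shared by every span in one user request
    span_id: str              # unique to this unit of work
    parent_id: Optional[str]  # span that called us; None at the root
    name: str
    self_ms: float            # simulated time spent in this span itself


@dataclass
class Collector:
    """Stand-in for a central trace store."""
    spans: List[Span] = field(default_factory=list)

    def record(self, span: Span) -> None:
        self.spans.append(span)

    def trace(self, trace_id: str) -> List[Span]:
        return [s for s in self.spans if s.trace_id == trace_id]


def handle(collector: Collector, name: str, headers: Dict[str, str],
           self_ms: float, downstream: list) -> str:
    """Simulate a service handling a request and calling its dependencies."""
    # Reuse the incoming trace ID, or start a new trace at the edge.
    trace_id = headers.get("trace-id") or uuid.uuid4().hex
    span = Span(trace_id, uuid.uuid4().hex[:16], headers.get("span-id"),
                name, self_ms)
    collector.record(span)
    # Propagate the context so downstream spans point back at this one.
    outgoing = {"trace-id": trace_id, "span-id": span.span_id}
    for child_name, child_ms, grandchildren in downstream:
        handle(collector, child_name, outgoing, child_ms, grandchildren)
    return trace_id


collector = Collector()
tid = handle(collector, "web: GET /report", {}, 5.0, [
    ("api: build_report", 12.0, [
        ("db: SELECT", 8.0, [("disk: read", 430.0, [])]),
        ("cache: get", 1.0, []),
    ]),
])

# Find the span with the most self time, then walk parent links to the root.
spans = collector.trace(tid)
by_id = {s.span_id: s for s in spans}
culprit = max(spans, key=lambda s: s.self_ms)
blame_path = []
node = culprit
while node is not None:
    blame_path.append(node.name)
    node = by_id.get(node.parent_id)
blame_path.reverse()
print(" -> ".join(blame_path))
```

Because every span carries the same trace ID and a pointer to its parent, the collector can rebuild the call tree after the fact and follow it from the slow page load down to the disk read.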
I was at a database conference recently where this topic came up, and someone mentioned “blaming” resource usage on any of a variety of things. An example was blaming all disk I/O operations on tenants in a multi-tenant SaaS service. (My ears perked up, because VividCortex is such a service.)
VividCortex doesn’t do end-to-end tracing, and it’s not a goal for us. However, the conversation made me pause and reexamine how I made the decision to …