Monitoring tools offer two core types of functionality: alerts based on aliveness checks and comparing metrics to thresholds, and displaying time-series charts of status counters. Nagios + Graphite are the prototypical time-series tools that do these things.
But these tools don’t answer the crucial questions about what we should monitor. What kinds of aliveness/health checks should we build into Nagios? Which metrics should we monitor with thresholds to raise alarms, and what should the thresholds be? What graphs should we build of status counters, which graphs should we examine and what do they mean?
We need guiding principles to help answer these questions. This webinar briefly introduces the principles that motivate and inform what we do at VividCortex, then explains which types of health checks and charts are valuable and what conclusions should be drawn from them. The webinar is focused mostly on MySQL database monitoring, …
[Read more]