Now it’s time to set up proper monitoring to avoid unpleasant surprises in the future.
Monitoring solves two major problems: alerting and trending. Alerting notifies a responsible person about a major event, such as a service that has stopped working. Trending tracks how something changes over time: disk or memory usage, replication lag, etc.
This post will be about alerting with Nagios.
The major problem with most Nagios setups I’ve seen is an excessive number of false positives. This kills the whole …