More importantly, how often do you confirm access to your server and database with that alert monitoring?
With a client yesterday, the primary database server remained usable and kept serving connections for a while, but was not accessible via SSH to investigate performance issues. It eventually became unresponsive and required a physical reboot. With alert monitoring recording system availability only every 5 minutes, the delay in detecting the problem was simply too long.
This led to a discussion with more questions than answers, including:
- How often should you ping your server(s), both internally and externally?
- How often do you physically connect to your server for confirmation, e.g. with an SSH key-based authentication test?
- How often do you perform a physical database connection test?
- How often do you do an end-to-end test, from an HTTP request through to a database query?
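The checks in the questions above could be sketched roughly as follows. This is only a minimal illustration using the Python standard library, not a recommended monitoring setup. The helper names (`tcp_check`, `ssh_check`, `http_check`, `build_checks`, `run_checks`), the database port 3306, and the idea of a health URL that runs a database query are all assumptions made for the example. The SSH test assumes an OpenSSH client with a key already configured, and the TCP test on the database port only confirms that the port accepts connections; a real database test would log in with a driver and run a trivial query.

```python
import socket
import subprocess
import time
import urllib.request
from typing import Callable, Dict, Tuple


def tcp_check(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ssh_check(host: str, timeout: int = 5) -> bool:
    """Return True if a non-interactive (key-based) SSH login can run `true`."""
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout}", host, "true"]
    try:
        proc = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout + 5,
        )
        return proc.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def http_check(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200 within timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        return False


def build_checks(host: str, health_url: str, db_port: int = 3306) -> Dict[str, Callable[[], bool]]:
    """Map a check name to a zero-argument callable (db_port is an assumption)."""
    return {
        "ssh_port": lambda: tcp_check(host, 22),
        "ssh_login": lambda: ssh_check(host),
        "db_port": lambda: tcp_check(host, db_port),
        "end_to_end": lambda: http_check(health_url),
    }


def run_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, Tuple[bool, float]]:
    """Run every check, recording (success, elapsed seconds) for each one."""
    results = {}
    for name, check in checks.items():
        start = time.monotonic()
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a crashing check is treated as a failed check
        results[name] = (ok, time.monotonic() - start)
    return results
```

Calling `run_checks(build_checks(host, health_url))` from a scheduler at an interval well under 5 minutes would have shown the SSH login failing while the database port still accepted connections, which is the situation described above. Recording the elapsed time for each check also gives an early hint of degradation before a check fails outright.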
As with all of …