Showing entries 1 to 2
Displaying posts with tag: alert
What alert monitoring do you use?

More importantly, how often do you confirm access to your server and database with that alert monitoring?

With a client yesterday, the primary database server remained usable and kept serving connections for a while, but was not accessible via SSH to investigate performance issues. It eventually became unresponsive and required a physical reboot. With alert monitoring recording system availability only every 5 minutes, the delay in detection was simply too long.

This led to a discussion with more questions than answers, including:

  • How often should you ping your server(s), both internally and externally?
  • How often do you actually log in to your server for confirmation, e.g. an SSH key-based authentication test?
  • How often do you perform a physical database connection test?
  • How often do you do an end-to-end test, from HTTP request through to database query?
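
As a starting point for the questions above, here is a minimal sketch of a connectivity probe in Python. It only confirms that a port accepts TCP connections, so it is a lower bound: it does not prove that an SSH login or a database query would succeed. The host name, ports, and 30-second interval are placeholders chosen for illustration, not values from the incident.

```python
import socket
import time


def tcp_check(host, port, timeout=3.0):
    """Return (ok, seconds) for a single TCP connect attempt to host:port."""
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        ok = False
    return ok, time.monotonic() - start


def run_checks(targets, timeout=3.0):
    """Probe each (name, host, port) target once; return results keyed by name."""
    return {name: tcp_check(host, port, timeout) for name, host, port in targets}


def monitor(targets, interval=30, rounds=None):
    """Print the status of every target each `interval` seconds.

    rounds=None loops forever; an integer stops after that many passes.
    """
    done = 0
    while rounds is None or done < rounds:
        for name, (ok, elapsed) in run_checks(targets).items():
            print(f"{name}: {'up' if ok else 'DOWN'} ({elapsed:.2f}s)")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)


# Hypothetical targets: replace the host and ports with your own.
TARGETS = [
    ("ssh", "db1.example.com", 22),
    ("mysql", "db1.example.com", 3306),
    ("http", "db1.example.com", 80),
]
```

Calling `monitor(TARGETS, interval=30)` would run the probe continuously. A fuller check would go further and perform a real key-based SSH login, run a trivial query, and issue an HTTP request that reaches the database.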

As with all of …

Gandi SiteMaker Incident (fixed)

An incident has occurred on the Gandi SiteMaker platform which has rendered your websites offline. We will keep you informed of the situation as soon as the problem has been identified. Please accept our apologies for any problem this may have caused.

10:38 => incident fixed.
11:00 => Following the network outage and subsequent impact on the SiteMaker service, we need to perform some emergency maintenance to bring the file system back up. It should take about an hour; in the meantime, a maintenance page will be shown.
13:00 => Maintenance done
