We’ve had a couple of issues with some of our server infrastructure recently, which have affected portions of our customer base. In this blog post I want to explain what happened, why it happened, and what we’re doing to correct it and prevent a recurrence.
I am writing a combined report of these issues because the first one wasn’t fully understood when the second one happened, and because the issues largely have the same contributing factors.
I apologize to our customers who have been impacted. Monitoring is supposed to be more highly available than the monitored systems. I know firsthand how damaging it can be when you can’t access your monitoring data. I take this very seriously, and the whole team is working hard to prevent these issues from recurring.
Summary of Incidents
- On November 15, 2015, from 19:15 until 22:30 Eastern time, some customer data ingest was delayed. Up to 25% of customer environments were affected at …