Showing entries 1 to 3
Displaying posts with tag: disaster is inevitable (reset)
Disaster @ Tumblr

Tumblr has been down for more than 12 hours due to an issue with their database cluster. Here is the comment I left on GigaOm.com

This is the freshest lesson for entrepreneurs and startups:
- Learn to value your data
- Implement a high availability plan
- Plan a disaster recovery strategy

“Tumblr likely has the resources to recover…”

I really hope that holds out true but remember, data is the only irreplaceable asset of an organization. Once it’s gone, it’s gone.

When I was handling the disaster at Fotolog (massive database corruption when our SAN crashed), I couldn’t find any company or consulting firm ready to handle the situation and help with data recovery. It was a miracle that I came across the concept of DUDE (Data …

[Read more]
S3 suffers major outage

“Funny how Amazon doesn't use S3 to store any assets for amazon.com”tweet by @gruber

Amazon's S3 suffered a major outage today knocking many websites offline. S3 outage started at approximately 12:00 PM EST and the last time I checked at 11:11PM EST, Smugmug, a popular photo hosting site that extensively uses S3, was still down.

- S3 down for more than 7 hours
- S3 outage, 7 hours and counting
- S3 down again
- Amazon failure downs …

[Read more]
Disaster is Inevitable - Must shutdown generators

Disaster is really inevitable. Even with all the redundant power investments, ThePlanet (formerly EV1 and RackShack), had to shut down their backup generators at their H1 data center on the instructions of the fire crew. This happened after a wire-short in fault transformer led to an explosion that knocked off one of their walls, ultimately bringing 9,000 servers down. Luckily no one was injured.

This just goes on to show that just because a data center has redundant power and backup generators, it does not mean that a disaster cannot happen. IIRC, ThePlanet's last disaster was blamed on backup generators not kicking off properly.

While there was no damage to servers, I wonder how many MyISAM repairs need to be triggered once the servers do come back online?

- …

[Read more]
Showing entries 1 to 3