Showing entries 1 to 2
Displaying posts with tag: SPOF
Can I have your horror-stories, please? (SANs and VMs)

Please make it descriptive, graphic, and if anything burnt or exploded I'd love to have pictures.
Include an approximate timeline of when things happened and when it was all working again (if ever).

This relates to the earlier post "A SAN is a single point-of-failure, too". People somehow end up with highly virtualised environments in which features such as replication are in place, yet everything still runs on the same hardware and SAN backend. So when this admittedly very nice hardware fails (and it will!), the degree of "we're stuffed" is particularly high. How heavily the business processes rely on that one setup is possibly a key factor there, rather than purely technical issues.

Anyway, if you have good stories of (distributed?) SAN and VM infra failure, please step up and tell all. It'll help prevent similar issues for …

So you want to talk about Single points of failure, eh?

In reply to Arjen's post about Single points of failure:

Arjen, you are absolutely right. It doesn't matter how over-engineered a storage solution is (I'm thinking of a giant dual-headed NetApp with redundant everything). After you've paid a few hundred K for that, you still have a single point of failure. Is it a highly unlikely point of failure? Sure, but it's still a point of failure.
Let's take it a step further. At Yahoo we're beyond thinking about how to make a single node redundant (be it for storage, networking, or even a simple webserver); we consider entire datacenters to be single points of failure. What does that mean?

