Few developers, when trying to build robust platforms, consider
all the possible modes of failure. Indeed, it is difficult to
enumerate them all, let alone plan for each one or design tests
which exercise its particular symptoms.
In this post, I discuss some of the types of failure we can see
in real systems.
Complete server failure
Most developers DO consider this. In a "Complete server failure",
what generally happens is:
* The server stops processing new requests, completely.
* The server's OS no longer responds to any network request at
all (e.g. "ping").
* Processing does not continue within the server.
* The contents of memory are immediately and irretrievably
lost.
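
As a rough illustration of how the symptoms above look from a client, here is a minimal sketch in Python. The function name `probe` and its three result strings are hypothetical, chosen only for this example. It assumes that a host whose OS is still running will actively refuse a connection to a closed port, whereas a completely failed server sends nothing back at all, so the attempt simply times out.

```python
import socket

def probe(host: str, port: int, timeout: float = 1.0) -> str:
    """Classify a TCP connection attempt (hypothetical helper for illustration).

    "up"          - the connection was accepted
    "refused"     - the OS answered but nothing is listening on the port
    "no_response" - timeout or other network error; consistent with a
                    complete server failure, but also with a network fault
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "up"
    except ConnectionRefusedError:
        # The remote OS is alive enough to reject the connection.
        return "refused"
    except OSError:
        # Includes timeouts and "host unreachable" errors.
        return "no_response"
```

Note that "no_response" cannot prove the server has failed completely; from the outside, a broken network path looks the same.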
Typically, the server eventually recovers: it is rebooted and
restored to full health. All writes which were acknowledged
before the failure have been persisted.
This is …
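
The guarantee that acknowledged writes survive the failure depends on the server not acknowledging a write until it is on stable storage. The following is a minimal sketch of that idea in Python, not a description of any particular server; `durable_append` is a hypothetical name, and the sketch assumes a local file system that honours `fsync`.

```python
import os

def durable_append(path: str, record: bytes) -> None:
    """Append a record and return only after it has been flushed to disk.

    A caller would send its acknowledgement only after this returns.
    Assumption: fsync on this file system really reaches stable storage.
    """
    with open(path, "ab") as f:
        f.write(record)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to push its cache to the device
```

This sketch ignores some details a real system would need, such as syncing the containing directory when the file is newly created.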
May 23, 2011