There are a bunch of posts on Planet MySQL this week about RAID. This comment from Kevin Burton really kind of made me go “huh?”.
You’re thinking too low level. Who cares if the disk fails. The entire shard is setup for high availability. Each server is redundant with 1-2 other boxes (depends on the number of replicas). If you have automated master promotion you’ll never notice any downtime. All the disks can fail in the server and a slave will be promoted to a new master.
Monitoring then catches that you have a failed server and you have operations repair it and put it back into production as a new slave.
Someone has to think low level. The key phrase in there is you have operations repair it and put it back into production as a …