The more databases you have in your cluster, the greater the
probability that one of them is going to fail.
The expected time until one of these boxes crashes is basically
MTBF/N, where N is the number of machines. If you have 10k
machines, expect multiple failures per day.
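To make the MTBF/N arithmetic concrete, here is a rough sketch. It assumes machines fail independently at a constant rate, and the 100,000-hour per-machine MTBF is a hypothetical figure chosen for illustration, not a number from any vendor.

```python
def cluster_mtbf_hours(machine_mtbf_hours: float, machines: int) -> float:
    """Expected hours between failures somewhere in the cluster (MTBF/N)."""
    if machines <= 0:
        raise ValueError("machines must be positive")
    return machine_mtbf_hours / machines


def expected_failures_per_day(machine_mtbf_hours: float, machines: int) -> float:
    """Expected number of machine failures per day across the whole cluster."""
    return 24.0 / cluster_mtbf_hours(machine_mtbf_hours, machines)


# Hypothetical per-machine MTBF of 100,000 hours (about 11 years).
ASSUMED_MTBF_HOURS = 100_000.0

if __name__ == "__main__":
    for n in (1, 50, 1_000, 10_000):
        per_day = expected_failures_per_day(ASSUMED_MTBF_HOURS, n)
        print(f"{n:>6} machines: {per_day:.4f} expected failures/day")
```

Under that assumption, 10,000 machines see a failure roughly every 10 hours, which is where "multiple failures per day" comes from. A different MTBF scales the result linearly.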
If you’re performing realtime writes to these databases you can
lose a DB in the middle of a write.
Now what? Do you lose transactions?
Most people using InnoDB/MyISAM have used write-caching
controllers to solve this problem.
You buy this expensive card with 128MB of memory which
temporarily caches writes. The card has a battery, so if you lose
power the cached writes are preserved until they can be flushed
onto the disks.
But as I mentioned before, these devices are expensive. Expect
them to add another 20% to the price of your cluster.
This doesn’t sound like a ton of cash if you have one or two
machines, but if you’re buying 50-1000 it’s a significant …