When you have a node failure, the best practice is to restart the
failed ndbd node as soon as possible.
This post illustrates what happens to restart times:
- if you have configured TimeBetweenLocalCheckpoints wrongly and the cluster is under high load
- if you don't restart the failed ndbd node immediately
- how long it takes to resync 1 GB of data, with and without ongoing transactions.
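Despite its name, TimeBetweenLocalCheckpoints (set for the data nodes in config.ini) is not a time value. According to the MySQL Cluster reference manual, it is the base-2 logarithm of the number of 4-byte words of write operations that must occur before a new local checkpoint is started, so the default of 20 means 4 MB. The small Python sketch below only does that arithmetic; the function name is hypothetical, and it says nothing about the restart-time effects the post measures.

```python
def lcp_write_threshold_bytes(time_between_lcps: int) -> int:
    """Hypothetical helper: convert a TimeBetweenLocalCheckpoints value into
    the amount of write activity (in bytes) it stands for.

    Per the manual's definition the value is log2 of a count of 4-byte words,
    with an allowed range of 0..31.
    """
    if not 0 <= time_between_lcps <= 31:
        raise ValueError("TimeBetweenLocalCheckpoints must be in 0..31")
    return 4 * 2 ** time_between_lcps


if __name__ == "__main__":
    for value in (20, 21, 31):
        # Prints: "20 4 MB", "21 8 MB", "31 8192 MB"
        print(value, lcp_write_threshold_bytes(value) // 2**20, "MB")
```

Each step of 1 doubles the amount of writes required between local checkpoints, so a small change in the configured value has a large effect.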
5.0 and 5.1 differences in node recovery protocol
In 5.0 the node recovery protocol copies all data from the other
node in its node group.
In 5.1, there is a new node recovery protocol called "Optimized
Node Recovery" (ONR below).
When a failed node is restarted, it recovers its data from its
local logs (Local Checkpoint (LCP) + redo log) and then copies only
the changes that have been made on the started node (the node in the
node group that stayed up).
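To make the difference between the two protocols concrete, here is a toy Python model, not NDB's actual implementation. It assumes that every row carries a hypothetical `gci` counter recording when it was last written, and that the restarting node has already recovered its local state (LCP + redo log) up to some `recovered_gci`. The 5.0-style recovery copies every row from the started node; the ONR-style recovery starts from the local state and copies only rows written after `recovered_gci`. Handling deletes by comparing key sets is a simplification chosen for this model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    value: str
    gci: int  # hypothetical: counter at which the row was last written


Table = dict[str, Row]


def full_copy(started: Table) -> tuple[Table, int]:
    """5.0-style recovery: copy every row from the started node."""
    return dict(started), len(started)


def optimized_recovery(local: Table, recovered_gci: int,
                       started: Table) -> tuple[Table, int]:
    """ONR-style recovery (simplified model): begin with the locally
    recovered rows, then copy only rows changed after recovered_gci."""
    result = dict(local)
    copied = 0
    for key, row in started.items():
        if row.gci > recovered_gci:
            result[key] = row
            copied += 1
    # Model-only delete handling: drop rows the started node no longer has.
    for key in list(result):
        if key not in started:
            del result[key]
    return result, copied


if __name__ == "__main__":
    started = {f"k{i}": Row(f"v{i}", gci=1) for i in range(1000)}
    local = dict(started)  # state recovered from LCP + redo log, up to gci 1
    # While the node was down: 10 updates and 1 delete on the started node.
    for i in range(10):
        started[f"k{i}"] = Row(f"new{i}", gci=2)
    del started["k999"]

    copy_all, n_full = full_copy(started)
    onr, n_onr = optimized_recovery(local, 1, started)
    print(n_full, n_onr, copy_all == onr)  # 999 10 True
```

Both paths end with identical data, but in this model the ONR path transfers 10 rows instead of 999, because the bulk of the data is already recovered locally.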
This is faster than copying all information, as done in …