Chasing bugs can be tedious, and multi-threaded software doesn’t make it any easier. Threads are scheduled at different times on each run, so the interleaving of their instructions is not deterministic, and reproducing a particular issue might require the exact same threads doing the exact same work at the exact same time. As you can imagine, this is not straightforward.
Let’s say your database is crashing or even having a transient stall. By the time you get to it, the crash has already happened, and you are left restoring service as quickly as possible and doing after-the-fact forensics. Wouldn’t it be nice to replay the work from right before or during the crash and see exactly what was happening?
Record and Replay is a technique where we record the execution of a program, allowing it to be replayed over and over and produce the same result each time. Engineers at …
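To make the idea concrete, here is a minimal sketch in Python. It is only an illustration: it assumes the program's nondeterminism comes from a few explicit inputs (random numbers and the clock), and the names `Recorder`, `Replayer`, `capture`, and `workload` are hypothetical. Real record-and-replay tooling has to capture much more, such as thread scheduling.

```python
import random
import time


class Recorder:
    """Runs live and logs every nondeterministic value the program consumes."""

    def __init__(self):
        self.log = []

    def capture(self, source):
        value = source()        # call the real, nondeterministic source
        self.log.append(value)  # remember what it returned, in order
        return value


class Replayer:
    """Feeds the logged values back in order instead of calling the source."""

    def __init__(self, log):
        self._values = iter(log)

    def capture(self, source):
        # The real source is ignored; the recorded value is returned instead.
        return next(self._values)


def workload(env):
    """Hypothetical workload whose result depends on nondeterministic inputs."""
    total = 0
    for _ in range(5):
        total += env.capture(lambda: random.randint(1, 100))
    stamp = env.capture(time.time)
    return total, stamp


recorder = Recorder()
original = workload(recorder)                 # live run, recorded
replayed = workload(Replayer(recorder.log))   # replay from the log
assert replayed == original                   # same result every time
```

Because the replayed run never touches the random number generator or the clock, it can be repeated as many times as needed and will always follow the recorded execution.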