At GitHub we use MySQL as our main datastore. While repository
data lies in git
, metadata is stored in MySQL. This
includes Issues, Pull Requests, Comments etc. We also auth
against MySQL via a custom git proxy (babeld). To be able to serve under the high
load GitHub operates at, we use MySQL replication to scale out
read load.
We have different clusters to provide with different types of services, but the single-writer-multiple-readers design applies to them all. Depending on growth of traffic, on application demand, on operational tasks or other constraints, we take replicas in or out of our pools. Depending on workloads some replicas may lag more than others.
Displaying up-to-date data is important. We have tooling that
helps us ensure we keep replication lag at a minimum, and
typically it doesn’t exceed 1
…