More information about how we handle databases can be found in some of my talks.
Lately I hear people questioning the database software choices we made at Wikipedia, and I’d like to point out that…
Wikipedia’s database infrastructure needs are remarkably boring.
We have worked a lot on having the majority of the site workload handled by edge HTTP caches, and some of the most database-intensive code (our parsing pipeline) is well absorbed by just 160 GB of memcached arena residing on our web servers.
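To make the caching idea concrete, here is a minimal cache-aside sketch in Python. It is not our actual code: the `ParserCache` class, the `parsed:<revision_id>` key format, and `toy_parse` are all made up for illustration, and a plain dict stands in for a memcached client. The point is only that an expensive parse runs once per revision, and repeat requests never reach the parser or the database behind it.

```python
from typing import Callable, Dict


class ParserCache:
    """Cache-aside wrapper: look in the cache first, parse only on a miss."""

    def __init__(self, parse: Callable[[str], str]) -> None:
        self._parse = parse
        # Hypothetical stand-in for a memcached client; a real deployment
        # would talk to a shared cache over the network instead.
        self._store: Dict[str, str] = {}
        self.misses = 0

    def render(self, revision_id: int, wikitext: str) -> str:
        # Assumed key scheme: one cache entry per revision.
        key = f"parsed:{revision_id}"
        cached = self._store.get(key)
        if cached is not None:
            return cached  # hit: the expensive parse is skipped
        self.misses += 1
        html = self._parse(wikitext)
        self._store[key] = html
        return html


def toy_parse(wikitext: str) -> str:
    """Toy parser used only for this example; it just strips bold markers."""
    return "<p>" + wikitext.replace("'''", "") + "</p>"


cache = ParserCache(toy_parse)
first = cache.render(42, "'''Hello''' world")   # miss: parses and stores
second = cache.render(42, "'''Hello''' world")  # hit: served from the cache
```

Because a revision never changes once saved, keying on the revision ID means entries never need invalidating in this sketch; a real cache would still need eviction and expiry.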
Also, the major issue with our databases is finding the right balance between storage space and the I/O performance available per dollar for that much space. We store information about every revision, every link, and every edit, even though the text itself lives in ‘external store’, which is just a set of machines with lots of large, slow disks.