Showing entries 1 to 6
Displaying posts with tag: wikitech
on swapping and kernels

There is much more to write about all the work we do at Facebook with memory management efficiency on our systems, but there was this one detour investigation in the middle of 2012 that I had to revisit recently courtesy of Wikipedia.

There are lots of factors that make machines page out memory segments to disk, thus slowing everything down and locking software up – from file system cache pressure to runaway memory leaks to kernel drivers being greedy. But certain swap-out scenarios are confusing – systems seem to have lots of memory available, with proper settings the file system cache should not cause swapping, and obviously in a production environment all the memory leaks are ironed out.

And yet in mid-2012 we noticed that our new kernel machines were swapping out for no obvious reason. When it comes to swapping, the MySQL community will always point to Jeremy’s post on …

on wikipedia and mariadb

There’s some media coverage about Wikipedia switching to MariaDB. I just wanted to point out that the performance figures cited are somewhat incorrect and don’t attribute the gains to the correct authors.

Proper performance evaluation should include not just MariaDB 5.5 but Oracle’s MySQL 5.5 version too, because that’s where most of the performance development happened (multiple buffer pools, rollback segments, change buffering et al).

5.5 is faster for some workloads, while 5.1-fb can outperform 5.5 in others (ones with lots of IO). It is good to know that there’s a beneficial impact from upgrading (though I’d wait for 5.6), but it is important to state that it is an effort from Oracle as well, not just MariaDB developers.

on primary keys

5.1.46 has this change:

Performance: While looking for the shortest index for a covering index scan, the optimizer did not consider the full row length for a clustered primary key, as in InnoDB. Secondary covering indexes will now be preferred, making full table scans less likely.

In other words, if you have a covering index on * (which is quite common on m:n mapping tables), the optimizer will now use it rather than the PK. As I have spent my time getting indexing right, with PKs based on the primary access pattern and SKs on the secondary access pattern, I hereby do not welcome the new change that suddenly reverses the behavior in a late GA version.

Not good, when mysqldump queries end up taking 6 days instead of the previous half an hour, not good at all.

Update: Oh, MariaDB has this reverted, from their …

MySQL versions at Wikipedia

More information about how we handle database stuff can be found in some of my talks.

Lately I hear people questioning database software choices we made at Wikipedia, and I’d like to point out that…

Wikipedia database infrastructure needs are remarkably boring.

We have worked a lot on having the majority of site workload handled by edge HTTP caches, and some of the most database-intensive code (our parsing pipeline) is well absorbed by just 160G of memcached arena, residing on our web servers.

Also, a major issue with our databases is finding the right balance between storage space (even though text is stored in ‘external store’, which is just a set of machines with lots of large slow disks) – we store information about every revision, every link, every edit – and available I/O performance per dollar for that kind of space needed.

As a …

GDB 7!

I wasn’t prepared for this. After spending months playing with GDB development trees I somehow entirely missed that the 7.0 release was getting close, and it took me more than an hour to spot it.

My favorite features are Python scripting and non-stop debugging. I was toying around with Python scripting for a while, and was planning to make backtraces make sense. Having hands that open means that one can see PHP backtraces when gdb’ing Apache, see table names and states when a MySQL thread accesses handler interfaces, or see remote IPs and users when it is writing to the network. Process inspection can simply rock if the right tools are created using these new capabilities, and I’m way too excited when I think about those. “Always have debugging symbols” gets way more meaning now.

Another issue I’ve been trying to resolve lately is avoiding long locking …

on tools and operating systems

Sometimes people ask why I use MacOSX as my main work platform (isn’t that something to do with beliefs?). My answer is “good foundation with great user interface”. Though that can be treated as “he must like the unix kernel and look&feel!”, it is not exactly that.

What I like is that I can have a good, stable graphical environment with some mandatory tools (yes, I use the OS-supplied browser, mail, etc), but besides that maintain the bleeding edge open-source space (provided by MacPorts).

What I also like is the OS-supplied development and performance tools. Having DTrace included is awesome, yes, but Apple did put some special touch on it too. This is a visualization environment for DTrace probes and other profiling/debugging tools:

Even the web browser (well, I upgraded to Safari 4.0 ;-) provides some impressive debugging …
