To help the more than 1.23 billion people who use Facebook to
share and connect with each other, we’ve had to build an
expansive and incredibly advanced infrastructure -- including one
of the largest deployments of MySQL in the world. Along the way,
we’ve learned and benefited from code changes made by the MySQL
community. Today we’re announcing WebScaleSQL, a collaboration
among engineers from several companies that face similar
challenges in running MySQL at scale and seek greater performance
from a database technology tailored to their needs.
WebScaleSQL currently includes contributions from MySQL
engineering teams at Facebook, Google, LinkedIn, and Twitter.
Together, we’re working to share a common base of code changes to
the upstream MySQL branch that we can all use and that will be
made available via open source. This collaboration will expand on
existing efforts by the MySQL community, and we will continue to
track the upstream branch that is the latest, production-ready
release (currently MySQL 5.6).
Our goal in launching WebScaleSQL is to enable the scale-oriented
members of the MySQL community to work more closely together in
order to prioritize the aspects that are most important to us. We
aim to create a more integrated system of knowledge-sharing to
help companies leverage the great features already found in MySQL
5.6, while building and adding more features that are specific to
deployments in large scale environments. In the last few months,
engineers from all four companies have contributed code and
provided feedback to each other to develop a new, more unified,
and more collaborative branch of MySQL.
But as effective as this collaboration has been so far, we know
we’re not the only ones who are trying to solve these particular
challenges. So we will keep WebScaleSQL open as we go, to
encourage others who have the scale and resources to customize
MySQL to join in our efforts. And of course we will welcome input
from anyone who wants to contribute, regardless of what they’re
currently working on.
What we’ve built so far:
We want WebScaleSQL contributors to be able to collaborate
effectively and to move fast. To that end, we have set up a
system for collaborating, reviewing code, and reporting bugs. For
example, a WebScaleSQL engineer proposes a code change, and a
WebScaleSQL engineer from another company reviews the code and
provides feedback. If both engineers agree the change makes sense
and is functional, it is pushed into the WebScaleSQL branch for
everyone to use.
Beyond this, each organization may further customize WebScaleSQL
to suit its own needs, just as we all do today.
This has already produced exciting results. Working together, the
engineers involved in WebScaleSQL have made major changes to aid
in the development of the new branch, including:
- An automated framework that will, for each proposed change, run and publish the results of MySQL's built-in test system (mtr).
- A full new suite of stress tests (https://github.com/webscalesql/webscalesql-5.6/commit/8b6adf69913226cab5cf8aaf45914e66b812692d) and a prototype automated performance testing system.
- Several changes to the tests already found in MySQL, and to the structure of some existing code, to avoid problems where otherwise safe code changes had previously caused tests to fail or caused unnecessary conflicts. These changes make it easier to work on the code and helped us get started creating WebScaleSQL.
- Several changes to improve the performance of WebScaleSQL, including buffer pool flushing improvements (https://github.com/webscalesql/webscalesql-5.6/commit/1aa4d3cf18f71d7e30da35cc4082a786c2870f49, https://github.com/webscalesql/webscalesql-5.6/commit/d90a06daebb3abbbb3aacfe23168a33c7a940c4a), optimizations to certain types of queries (https://github.com/webscalesql/webscalesql-5.6/commit/d72b580597fecbdbb5b2f96cc9f57c946889fea4), support for NUMA interleave policy (https://github.com/webscalesql/webscalesql-5.6/commit/175520ac44545decff760506fa24b98ea5c21dff), and more.
- New features that make operating WebScaleSQL at true web scale easier, such as super_read_only (https://github.com/webscalesql/webscalesql-5.6/commit/4142091449dd439d473ab22f2e5d60b326e01dc7), and the ability to specify sub-second client timeouts (https://github.com/webscalesql/webscalesql-5.6/commit/c1d98ebd607c571f554e96c2b477a7d9f826b4bf).
What we’re working on now:
After these initial accomplishments, we’ve started work on a
number of other improvements to upstream MySQL. A few activities
that Facebook’s WebScaleSQL team is currently working on:
- Contributing an asynchronous MySQL client (https://reviews.facebook.net/D17025, https://reviews.facebook.net/D17031), which means that code querying MySQL does not have to block while it connects, sends a query, or retrieves results. This non-blocking client (http://www.percona.com/live/mysql-conference-2014/sessions/asynchronous-mysql-how-facebook-queries-databases) is currently being code-reviewed by the other WebScaleSQL teams, after being used in production at Facebook for many months.
- Preparing to move Facebook's production-tested versions of table, user, and compression statistics into WebScaleSQL.
- Preparing to push the remaining components of Facebook's current production-tested version of compression that were not already included in MySQL 5.6 into WebScaleSQL.
- Adding the Logical Read-Ahead mechanism (http://yoshinorimatsunobu.blogspot.com/2013/10/making-full-table-scan-10x-faster-in.html) that we have proven in production to achieve large, quantifiable speed improvements (up to 10x) for full table scans, such as those run by nightly logical backups.
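To make the non-blocking idea behind the asynchronous client more concrete, here is a minimal sketch in Python. It is not the Facebook client and does not talk to MySQL: `fake_query` is a hypothetical stand-in that simulates a database round trip with `asyncio.sleep`, and the query names and latency are made up for illustration. The point is only that, when the caller does not block on each round trip, several queries can be in flight at once.

```python
import asyncio
import time
from typing import List, Tuple


async def fake_query(name: str, latency_s: float) -> str:
    """Hypothetical stand-in for one database round trip (not a real client API)."""
    # asyncio.sleep hands control back to the event loop, much as a
    # non-blocking client would while waiting on the network.
    await asyncio.sleep(latency_s)
    return f"{name}: done"


async def run_all(latency_s: float = 0.1) -> Tuple[List[str], float]:
    """Issue three simulated queries concurrently and time the whole batch."""
    start = time.perf_counter()
    # gather() starts all three coroutines and returns results in call order.
    results = await asyncio.gather(
        fake_query("users", latency_s),
        fake_query("posts", latency_s),
        fake_query("likes", latency_s),
    )
    elapsed = time.perf_counter() - start
    return list(results), elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_all())
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

Because the three simulated queries overlap, the batch should take roughly one round trip rather than three. A blocking client would have to pay for each round trip in turn.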
What to expect in the future:
We will keep all our WebScaleSQL work open, to create a useful
branch for others within the MySQL community who are focused on
scale deployments. We’ll continue to follow the most up-to-date
upstream version of MySQL. As long as the MySQL community
releases continue, we are committed to remaining a branch – and
not a fork – of MySQL.
We’re excited to expand our existing work on WebScaleSQL, and we
think that this collaboration represents an opportunity for the
scale-oriented members of the MySQL community to work together in
a more efficient and transparent way that will benefit us all.
To learn more about how to get involved, visit: http://webscalesql.org/