A few years ago I wrote about Asynchronous Shard Queries verses Synchronous
Shard Queries, and in this post I talked about having to write a
server to handle this for me in Java. Now I do it in PHP and got
great results that are posted below.
Building the Feed was taking 100ms up to 40 seconds on the
initial load, if the feed is out of cache. This is not acceptable
for me as an engineer or the users I serve. Although the 40
seconds was rare it still is wrong.
The problem is as I added more shards to handle our data growth,
the feed got proportionally slower. To build the feed for large
users I would have to hit each server or at the very least a
large percentage of them. Adding capacity made things slower in
my Synchronous World.
My options to fix this issue where, do what I did in the past
which, is boring, or try something new. In the past I wrote a
server using Java with a Jetty Core that took a request and sent
a merged response to get around php not having the ability to be
parallel in a web context. It worked flawlessly but if that
server went down users would not be able to see their info.
Really its just another moving piece, another thing for me to
monitor and at my company there is just 4 engineers to handle
3 million (and growing) DAU.
The new option was to use a mysqlnd option that enables asynchronous
queries. Since I am running PHP 5.3.10 its an option that is
available as a shared mod. and works with PDO except for this
feature.
PDO is like Perl's DBI (but better) its an abstract that enables
you to switch SQL database servers without having to change your
method calls. The problem is PDO doesn't have the concept of
asynchronous queries but this is really not an issue. I suggest
for you to always write wrappers around used API's so you can
switch the underlying (vendor) api without having to change
multiple places in your code. I did such thing to PDO we call it
Shard which is a wrapper around the PDO object. Then through
polymorphism I am able to change query, connect, fetch
equivalents to support this mysqlnd feature.
Below is the code
ShardDB::getAsyncShardInstance($pool)->setSide($userId)->query($query)->fetchAll();
ShardDB is a static class that has a method getAsyncShardInstance
for a given shard pool. Really the construct design pattern is a
singleton which returns an async shard object that extends shard.
(I love objects things just work and is clean).
setSide is optional to say keep the query on the side which the
calling user uses unless that side is down.
query connects to the servers if not connected and executes the
sql.
fetchAll polls for the responses and is equivalent to PDO's
fetchAll.
Here is the query code:
public function query($query, $args = array()) {
foreach($this->connectionList as $shardId => $connection) {
Debugger::timer('async_query');
$connection->query($query, MYSQLI_ASYNC);
Debugger::log("ASYNC_QUERY", $query, 'async_query');
}
return $this;
}
Then fetch all
public function fetchAll() {
$connectionCount = count($this->connectionList);
Debugger::timer('async_poll');
$results = array();
$processed = 0;
do {
$links = $errors = $reject = array();
foreach ($this->connectionList as $i => $link) {
$links[] = $errors[] = $reject[] = $link;
}
if (!mysqli_poll($links, $errors, $reject, $this->defaultTimeout)) {
$spinned++;
continue;
}
foreach ($links as $link) {
if ($result = $link->reap_async_query()) {
while ($row = $result->fetch_object()){
$results[] = $row;
}
$result->free();
$processed++;
}
}
} while($processed++ < $connectionCount);
Debugger::log("ASYNC_QUERY", "Poll is done - Spinned $spinned times", 'async_poll');
return $results;
}
The affect of this code is illustrated below.
The blue line is greater than 2 seconds, which is bad then the
line goes away the the change above. There is still some work
needed to totally get rid of the greater than 2 second calls but
what I got from this one change was worth the effort.
Thanks to mySQL and the internet super hero for building this feature. It
was like it was built just for me!