With TokuDB v6.6 out now, I’m excited to present one of my favorite enhancements: fast updates with TokuDB. Update-intensive applications can have their throughput limited by the random read capacity of the storage system. The cause of this limit is the read-modify-write algorithm that MySQL uses to process update statements: MySQL reads a row from the storage engine, applies the updates to it, and then writes the new row back to the storage engine. To remove this bottleneck, TokuDB uses a different update algorithm that encodes the update expressions of the SQL statement into tiny programs stored in an update Fractal Tree® message. This update message is injected into the root of the Fractal Tree index. Eventually, these update messages reach a leaf node, where the update programs are applied to the row. Since messages move between Fractal Tree levels in batches, the cost of reading in the leaf node is amortized over many update messages.
What is a simple update statement? TokuDB can offload an update statement from MySQL when it affects one and only one row, identified by its primary key. This simplifies the implementation; a future implementation could support updating a range of primary keys. We also require each update expression to set a column to a constant, to add a constant to a column (increment is a special case), or to subtract a constant from a column (decrement is a special case). These constraints on update expressions are imposed by the current version of our expression parser. A future implementation could identify a larger set of update expressions from the expression trees created by the MySQL parser, and we could extend our update machine language as needed.
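For example, given a hypothetical table s with an integer primary key id and integer columns a and b (the names here are illustrative), each of the following updates uses a supported expression form:

update s set a=42 where id=%id;

update s set a=a+1 where id=%id;

update s set b=b-10 where id=%id;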
What about affected rows? SQL update and insert statements compute the number of rows affected by the statement. Computing affected rows requires knowing whether or not a row exists in the table prior to executing the statement, which forces a read from the storage system whenever the leaf node of the tree is not in memory. For very large trees, most leaf nodes are not in memory, so performance tanks. Many applications never look at the number of affected rows, so we allow them to mark their update and insert statements with a modifier that lets the TokuDB storage engine handle the entire update statement.
How does one use this feature? Suppose that we want to count a stream of events. We can create a table that associates events with their count. Here is an example table.
create table t ( id bigint unsigned not null primary key, count bigint unsigned not null );
If we don’t know whether the event id exists in the table, then we use an insert on duplicate key update statement to insert it if it does not exist, or increment its count if it does. Here is an example marked duplicate key insertion statement, where “%id” represents a specific event id.
insert noar into t values (%id, 1) on duplicate key update count=count+1;
If the event ids are random, then the throughput of this application will be limited by the random read capacity of the storage system, since each insert statement has to determine whether the event id exists in the table. The “noar” modifier allows TokuDB to replace the primary key existence check with the insertion of an “upsert” message into the Fractal Tree index. This “upsert” message contains a copy of the row and a program that increments count. As the Fractal Tree buffers fill, this “upsert” message is flushed down the tree. Eventually, the message reaches a leaf node and gets executed there. If the key exists in the leaf node, then the count is incremented. Otherwise, the new row is inserted into the leaf node.
If the event id is known to exist in the table, then we can use an update SQL statement to increment its count. Here is an example marked update statement.
update noar t set count=count+1 where id=%id;
TokuDB generates an “update” message from the update statement and its update expression trees, and inserts this message into the Fractal Tree index. When the message eventually reaches the leaf node, the increment program is extracted from the message and executed.
Many graph applications that map onto relational tables can use duplicate key inserts and updates to maintain the graph. For example, one can update the metadata associated with a link in the graph using duplicate key insertions. If the affected-rows count is not used by the application, then the insertion or update can be marked and executed as a fast insertion or a fast update.
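As a sketch, suppose the links are stored in an edge table keyed by source and destination, with a weight column counting how many times each link has been observed (the schema is illustrative, and this assumes the single-row requirement is satisfied by fully specifying the composite primary key):

create table edges ( src bigint unsigned not null, dst bigint unsigned not null, weight bigint unsigned not null, primary key (src, dst) );

insert noar into edges values (%src, %dst, 1) on duplicate key update weight=weight+1;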
Stay tuned – we will publish fast update performance benchmarks over the next few weeks.