Benchmarking Percona Server TokuDB vs InnoDB

After compiling Percona Server with TokuDB, of course I wanted to compare InnoDB performance vs TokuDB.
I have a particular workload I’m interested in testing – it is an insert-intensive workload (which is TokuDB’s strong suit) with some roll-up aggregation, which should produce updates in-place (I will use INSERT .. ON DUPLICATE KEY UPDATE statements for that), so it will produce all good amount of reads.

A few words about the hardware: I am going to use new the Dell PowerEdge R420 with two Intel(R) Xeon(R) CPU E5-2450 0 @ 2.10GHz, 48GB of RAM and SATA SSD: Kingston HyperX 3K 240 GB.

Workload: I will use two different schemas. The first schema is from sysbench, and the table looks like:

CREATE TABLE sbtest$I (
id BIGINT UNSIGNED NOT NULL,
k INTEGER UNSIGNED DEFAULT '0' NOT NULL,
c CHAR(120) DEFAULT '' NOT NULL,
pad CHAR(60) DEFAULT '' NOT NULL,
PRIMARY KEY (id)
)
CREATE INDEX k on sbtest$I(k)

and tables sbtest$I_r10, sbtest$I_r100, sbtest$I_r1000, with roll-up sum for 10, 100, 1000 records in the main table.

and transactions for this workload are:

$ID=monotonically increasing ID
$K=rand(0,10000) // distributed by pareto distribution
$C, $PAD = random_string()
BEGIN
INSERT INTO sbtest (id, k, c, pad) VALUES ($ID, $K, $C, $PAD);
INSERT INTO sbtest_r10 (id, k) VALUES ($ID/10, $K) ON DUPLICATE KEY UPDATE k=VALUES(k)+k;
INSERT INTO sbtest_r100 (id, k) VALUES ($ID/100, $K) ON DUPLICATE KEY UPDATE k=VALUES(k)+k;
INSERT INTO sbtest_r1000 (id, k) VALUES ($ID/1000, $K) ON DUPLICATE KEY UPDATE k=VALUES(k)+k;
COMMIT;

So this workload produces SEQUENTIAL inserts into Primary Key, that this is quite suitable for InnoDB, and in it we have random inserts into SECONDARY KEYS (k),
which is not so good for InnoDB.

So let’s see what results we have. The results are in TPS (more is better) and we start with empty tables.

Now, before looking at the graph, please do not jump to conclusions, as the graph is MISLEADING.

So we see that InnoDB performance steadily declines from 24000 tps to 18000 tps, but InnoDB can’t make 5h run. After 3h the disk is full, and InnoDB data size is about 210GB with 234.238.440 inserted records.
While TokuDB averages around 14000 tps mark with some periodical drops into 10000 tps area.
TokuDB datasize after 5h of run is about 50GB with 276.934.863 records.

So why do I say that the graph is misleading?
Obviously we can say that InnoDB is faster, but you should look into the steady decline of InnoDB throughput. Eventually it will drop to the level 14000 tps and below. I do not have enough space on this SSD to run this experiment that long. So there we see the strong side of TokuDB: it has more than 4x data compression on this dataset.
We can easily fill TokuDB tables with 1bln of rows on this SSD, and projected InnoDB performance on this size will be the same or worse, but will require 1TB in size.

Now to see this point of intersection, let’s review different workload (which actually is closer to what I need).

Tables looks like:

CREATE TABLE `sbtest1` (
  `hid` int(10) unsigned NOT NULL DEFAULT '0',
  `mid` int(10) unsigned NOT NULL DEFAULT '0',
  `id` bigint(20) unsigned NOT NULL,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`hid`,`mid`,`id`)
)

and transactions are:

$HID=rand(0,10000) // distributed by uniform distribution
$MID=rand(0,10000) // distributed by uniform distribution
$ID=monotonically non-decreasing ID
$K=rand(0,10000) // distributed by pareto distribution
BEGIN
INSERT INTO sbtest (hid, mid, id, k) VALUES ($HID, $MID, $ID, $K);
INSERT INTO sbtest_r10 (hid, mid, id, k) VALUES ($HID, $MID, $ID/10, $K) ON DUPLICATE KEY UPDATE k=VALUES(k)+k;
INSERT INTO sbtest_r100 (hid, mid, id, k) VALUES ($HID, $MID, $ID/100, $K) ON DUPLICATE KEY UPDATE k=VALUES(k)+k;
INSERT INTO sbtest_r1000 (hid, mid, id, k) VALUES ($HID, $MID, $ID/1000, $K) ON DUPLICATE KEY UPDATE k=VALUES(k)+k;
COMMIT;

That is, our PRIMARY KEY is not sequential anymore, which is bad for InnoDB, but this is what I need for my setup (I still can have synthetic auto_inc PK, but in that case I still will need SECONDARY KEY (hid,mid,id) ).

Also please note the transaction produces 4 INSERTs and workload is very write intensive.

So what are results in this case:

InnoDB gradually declines as data growth (which is expected) and by the end of 5 hours averages at 2700 tps.
With TokuDB we also see a drop, and by end of 5 hours the average throughput is 7800 tps.
Something to take into account: TokuDB results are not quite stable, that is why I also show 5-minute moving averages to TokuDB.

So TokuDB shows about 2.8x better throughput, and on data size:

  • InnoDB table: 58GB and 244.980.192 records
  • TokuDB table: 15GB and 232.927.460 records

So TokuDB looks better in this workload, however the sparse throughput is worrisome to me. Let’s zoom in to 10 min intervals and see throughput:

We can see periodical drops, which I believe are related to 60-sec checkpoint interval, as TokuDB does time-based checkpoints.
These drops are quite concerning, and it might be a problem for some users.

Now, I understand that my PRIMARY KEY (hid,mid,id) where id is sequential, and hid,mid is low selectivity is not good for fast inserts, but it is suitable for range selects by id. However it will interesting how both InnoDB and TokuDB performs if PK is (id,hid,mid). This also will affect select performance, so we will need to measure that also.

And, if you want to repeat this benchmark, the sysbench code is on Launchpad lp:~vadim-tk/sysbench/insert-roll-2,
command line to run:

sysbench --test=insert_roll.lua --oltp-table-size=10000  --mysql-user=root --oltp-tables-count=32 --mysql_table_engine=tokudb --oltp_auto_inc=on --max-time=18000 --report-interval=10 --max-requests=0 --num-threads=32 --rand-type=pareto run

and InnoDB options are:

[mysqld_safe]
numa-interleave
malloc-lib=/usr/local/Percona-Server-5.5.30-rel30.2-500.Linux.x86_64/lib/mysql/libjemalloc.so
[mysqld]
gdb
datadir=/mnt/data/mysql
#for SSD
innodb_flush_neighbor_pages = none
innodb_adaptive_flushing_method = keep_average
innodb_file_per_table = true
innodb_data_file_path = ibdata1:100M:autoextend
innodb_flush_method = O_DIRECT
innodb_log_buffer_size = 256M
innodb_flush_log_at_trx_commit = 1
innodb_buffer_pool_size = 36G
innodb_log_file_size = 4G
innodb_log_files_in_group = 2
innodb_log_block_size=4096
#####plugin options
innodb_read_io_threads = 16
innodb_write_io_threads = 4
innodb_io_capacity = 4000
innodb_use_native_aio=0
#not innodb options (fixed)
port = 3306
back_log = 50
max_connections = 2000
max_prepared_stmt_count=500000
max_connect_errors = 10
table_open_cache = 2048
max_allowed_packet = 16M
binlog_cache_size = 16M
max_heap_table_size = 64M
sort_buffer_size = 4M
join_buffer_size = 4M
thread_cache_size = 1000
query_cache_size = 0
query_cache_type = 0
ft_min_word_len = 4
#default_table_type = InnoDB
thread_stack = 192K
tmp_table_size = 64M
server-id = 10
#*** MyISAM Specific options
key_buffer_size = 8M
read_buffer_size = 1M
read_rnd_buffer_size = 4M
bulk_insert_buffer_size = 8M
myisam_sort_buffer_size = 8M
myisam_max_sort_file_size = 10G
#myisam_max_extra_sort_file_size = 10G
myisam_repair_threads = 1
myisam_recover
socket=/var/lib/mysql/mysql.sock
user=root
skip-grant-tables

TokuDB-related options are all defaults, as I understand from documentation TokuDB comes with good settings out-of-box, but I am ready to tune something if there are suggestions.

The post Benchmarking Percona Server TokuDB vs InnoDB appeared first on MySQL Performance Blog.