Archive vs Percona XtraDB vs Tokutek TokuDB LOAD DATA performance

Stewart blogged about the archive storage engine and he asked the Internet to test how fast ARCHIVE is compared with other MySQL storage engines like Percona XtraDB and Tokutek TokuDB. Since I've been curious about this subject as well, I decided to give it a quick test.

Not very compressible data
I took a star schema benchmark "lineorder" file and grabbed the first 15M rows. To each row I added a TEXT field called "extra_data". This field contains 10 random words from /usr/share/dict/words separated by space. This adds up to just about 3GB of raw input data.

Insert performance with no indexes (best to worst)
TokuDB: 187K rows/sec

Query OK, 15000000 rows affected (1 min 20.25 sec)



XtraDB (uncompressed): 119K rows/sec:

Query OK, 15000000 rows affected (2 min 5.40 sec)



Archive: 61K rows/sec:

Query OK, 15000000 rows affected (4 min 3.15 sec)



XtraDB Compressed key_block_size=8: 6K row/sec:
I cancelled after 6 mins, that is 2.2M rows of 15M rows.

---TRANSACTION 514, ACTIVE 365 sec inserting
1 lock struct(s), heap size 376, 0 row lock(s), undo log entries 2233162



Test with indexes
I added the following keys to the table:

alter table insert_test add key(LO_CustKey), add key(LO_OrderDateKey),
add key(LO_SuppKey), add key(LO_PartKey);



I then loaded the table as before to see how multiple indexes affect insert performance.

TokuDB: 84K rows/sec:

Query OK, 15000000 rows affected (2 min 58.42 sec)



InnoDB: 53K rows/sec:

Query OK, 15000000 rows affected (4 min 40.05 sec)



Insert performance with more highly compressible data
This time I took the first 5M rows, but instead of just adding 10 random words, I added the words in a repeating "phrase". For example: "word1 word2 ... word10 word1 word2 ... word10 word1 ...". This generated about the same amount of data.

TokuDB (no indexes): 92K rows/sec:

Query OK, 5000000 rows affected (45.34 sec)



TokuDB (with indexes): 66K rows/sec:

Query OK, 5000000 rows affected (1 min 14.89 sec)



XtraDB (no indexes): 65K rows/sec:

Query OK, 5000000 rows affected (1 min 16.75 sec)



Archive: 43K rows/sec:

Query OK, 5000000 rows affected (1 min 55.24 sec)



XtraDB (with indexes): 36.6K rows/sec:

Query OK, 5000000 rows affected (2 min 16.23 sec)



XtraDB Compressed key_block_size=8 (no indexes): 19.25K rows/sec:

Query OK, 5000000 rows affected (4 min 20.12 sec)



XtraDB Compressed key_block_size=8 (with indexes): 7.59K rows/sec:

Query OK, 5000000 rows affected (10 min 58.43 sec)



Building index on copy of the 15M row table (best to worst)

alter table sortbuild_test  add key(LO_CustKey), add key(LO_OrderDateKey),  add key(LO_SuppKey), add key(LO_PartKey);


[edit: Fixed order to put TokuDB first]
TokuDB:

Query OK, 0 rows affected (1 min 40.55 sec)



XtraDB:

Query OK, 0 rows affected (2 min 21.93 sec)





Building index on copy of the 5M row table (InnoDB 8k only)

XtraDB Compressed key_block_size=8:

Query OK, 0 rows affected (1 min 14.05 sec)




Data size comparison (smallest disk footprint to largest)
[edit]
These numbers were calculated using INFORMATION_SCHEMA.TABLES which reports the UNCOMPRESSED size for TokuDB tables. I hope they fix this in a new release. See the comments for a look at the tokudb data directory contents.

+------------------------------+-------------------+---------+---------+----------+
| table_schema                 | table_name        | data_mb | idx_mb  | sum_size |
+------------------------------+-------------------+---------+---------+----------+
| archive_phrase               | insert_test       |  529.78 |    0.00 |   529.78 |
| archive                      | insert_test       | 1386.51 |    0.00 |  1386.51 |
| innodb_compressed_phrase     | insert_test_8k    | 1601.00 |    0.00 |  1601.00 |
| innodb_compressed_phrase_idx | sortbuild_test_8k | 1601.00 |  151.30 |  1752.30 |
| innodb_compressed_phrase_idx | insert_test_8k    | 1601.00 |  433.83 |  2034.83 |
| tokudb                       | insert_test       |    X    |    X    |     X    |
| tokudb_phrase                | insert_test       |    X    |    X    |     X    |
| tokudb_phrase_idx            | insert_test       |    X    |    X    |     X    |
| innodb                       | insert_test       | 3112.00 |    0.00 |  3112.00 |
| innodb_phrase                | insert_test       | 3202.00 |    0.00 |  3202.00 |
| tokudb_idx                   | insert_test       |    X    |    X    |     X    |
| tokudb_idx                   | sortbuild_test    |    X    |    X    |     X    |
| innodb_phrase_idx            | insert_test       | 3202.00 |  565.03 |  3767.03 |
| innodb_idx                   | sortbuild_test    | 3112.00 |  899.19 |  4011.19 |
| innodb_idx                   | insert_test       | 3112.00 | 1519.00 |  4631.00 |
+------------------------------+-------------------+---------+---------+----------+



[edit]
The TokuDB test table is 1800MB (15M row table, 3G orig size)
The TokuDB test2 table is 2171MB (keys built with ALTER)
The tokudb_idx database is 2202MB (keys built with INSERT)
The tokudb_phrase database is 681MB (5M row, 3G orig size)
The tokudb_phrase_idx database is 806MB (5M row, keys built with INSERT)
[/edit]

Here is the CREATE TABLE for the data:

CREATE TABLE IF NOT EXISTS insert_test
(
LO_OrderKey bigint not null,
LO_LineNumber tinyint not null,
LO_CustKey int not null,
LO_PartKey int not null,
LO_SuppKey int not null,
LO_OrderDateKey int not null,
LO_OrderPriority varchar(15),
LO_ShipPriority char(1),
LO_Quantity tinyint,
LO_ExtendedPrice decimal,
LO_OrdTotalPrice decimal,
LO_Discount decimal,
LO_Revenue decimal,
LO_SupplyCost decimal,
LO_Tax tinyint,
LO_CommitDateKey int not null,
LO_ShipMode varchar(10),
extra_data TEXT
);

For compression:
ROW_FORMAT=compressed, KEY_BLOCK_SIZE=8



Here is the my.cnf for Percona XtraDB (not tuning for tokudb):

[mysqld]
#datadir=/data/datadirs/tokudb_55
datadir=/data/datadirs/percona_55
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

innodb_buffer_pool_size=16G
innodb_log_file_size=4G
innodb_flush_neighbor_pages=cont
innodb_fast_checksum
innodb_file_per_table
innodb_file_format=barracuda
innodb_buffer_pool_instances=6
innodb_write_io_threads=12
innodb_read_io_threads=12
innodb_flush_method=O_DIRECT
innodb_io_capacity=10000

read_buffer_size=2M
key_buffer_size=32M

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid


[edit]
Test Hardware:

Intel i970-3.2GHz 6 core w/HT (12 threads)
24GB RAM
LSI 9211-8i IT HBA
DB on software RAID 0 - Two Intel 520 SSD, one OCZ vertex.  300GB total.