Stewart blogged about the archive storage
engine and he asked the Internet to test how fast ARCHIVE is
compared with other MySQL storage engines like Percona XtraDB and
Tokutek TokuDB. Since I've been curious about this subject as
well, I decided to give it a quick test.
Not very compressible data
I took a star schema benchmark "lineorder" file and grabbed the
first 15M rows. To each row I added a TEXT field called
"extra_data". This field contains 10 random words from
/usr/share/dict/words separated by space. This adds up to just
about 3GB of raw input data.
Insert performance with no indexes (best to worst)
TokuDB: 187K rows/sec
Query OK, 15000000 rows affected (1 min 20.25 sec)
XtraDB (uncompressed): 119K rows/sec:
Query OK, 15000000 rows affected (2 min 5.40 sec)
Archive: 61K rows/sec:
Query OK, 15000000 rows affected (4 min 3.15 sec)
XtraDB Compressed key_block_size=8: 6K row/sec:
I cancelled after 6 mins, that is 2.2M rows of 15M rows.
---TRANSACTION 514, ACTIVE 365 sec inserting 1 lock struct(s), heap size 376, 0 row lock(s), undo log entries 2233162
Test with indexes
I added the following keys to the table:
alter table insert_test add key(LO_CustKey), add key(LO_OrderDateKey), add key(LO_SuppKey), add key(LO_PartKey);
I then loaded the table as before to see how multiple indexes
affect insert performance.
TokuDB: 84K rows/sec:
Query OK, 15000000 rows affected (2 min 58.42 sec)
InnoDB: 53K rows/sec:
Query OK, 15000000 rows affected (4 min 40.05 sec)
Insert performance with more highly compressible data
This time I took the first 5M rows, but instead of just adding 10
random words, I added the words in a repeating "phrase". For
example: "word1 word2 ... word10 word1 word2 ... word10 word1
...". This generated about the same amount of data.
TokuDB (no indexes): 92K rows/sec:
Query OK, 5000000 rows affected (45.34 sec)
TokuDB (with indexes): 66K rows/sec:
Query OK, 5000000 rows affected (1 min 14.89 sec)
XtraDB (no indexes): 65K rows/sec:
Query OK, 5000000 rows affected (1 min 16.75 sec)
Archive: 43K rows/sec:
Query OK, 5000000 rows affected (1 min 55.24 sec)
XtraDB (with indexes): 36.6K rows/sec:
Query OK, 5000000 rows affected (2 min 16.23 sec)
XtraDB Compressed key_block_size=8 (no indexes): 19.25K
rows/sec:
Query OK, 5000000 rows affected (4 min 20.12 sec)
XtraDB Compressed key_block_size=8 (with indexes): 7.59K
rows/sec:
Query OK, 5000000 rows affected (10 min 58.43 sec)
Building index on copy of the 15M row table (best to worst)
alter table sortbuild_test add key(LO_CustKey), add key(LO_OrderDateKey), add key(LO_SuppKey), add key(LO_PartKey);
[edit: Fixed order to put TokuDB first]
TokuDB:
Query OK, 0 rows affected (1 min 40.55 sec)
XtraDB:
Query OK, 0 rows affected (2 min 21.93 sec)
Building index on copy of the 5M row table (InnoDB 8k only)
XtraDB Compressed key_block_size=8:
Query OK, 0 rows affected (1 min 14.05 sec)
Data size comparison (smallest disk footprint to largest)
[edit]
These numbers were calculated using INFORMATION_SCHEMA.TABLES
which reports the UNCOMPRESSED size for TokuDB tables. I hope
they fix this in a new release. See the comments for a look at
the tokudb data directory contents.
+------------------------------+-------------------+---------+---------+----------+ | table_schema | table_name | data_mb | idx_mb | sum_size | +------------------------------+-------------------+---------+---------+----------+ | archive_phrase | insert_test | 529.78 | 0.00 | 529.78 | | archive | insert_test | 1386.51 | 0.00 | 1386.51 | | innodb_compressed_phrase | insert_test_8k | 1601.00 | 0.00 | 1601.00 | | innodb_compressed_phrase_idx | sortbuild_test_8k | 1601.00 | 151.30 | 1752.30 | | innodb_compressed_phrase_idx | insert_test_8k | 1601.00 | 433.83 | 2034.83 | | tokudb | insert_test | X | X | X | | tokudb_phrase | insert_test | X | X | X | | tokudb_phrase_idx | insert_test | X | X | X | | innodb | insert_test | 3112.00 | 0.00 | 3112.00 | | innodb_phrase | insert_test | 3202.00 | 0.00 | 3202.00 | | tokudb_idx | insert_test | X | X | X | | tokudb_idx | sortbuild_test | X | X | X | | innodb_phrase_idx | insert_test | 3202.00 | 565.03 | 3767.03 | | innodb_idx | sortbuild_test | 3112.00 | 899.19 | 4011.19 | | innodb_idx | insert_test | 3112.00 | 1519.00 | 4631.00 | +------------------------------+-------------------+---------+---------+----------+
[edit]
The TokuDB test table is 1800MB (15M row table, 3G orig
size)
The TokuDB test2 table is 2171MB (keys built with ALTER)
The tokudb_idx database is 2202MB (keys built with INSERT)
The tokudb_phrase database is 681MB (5M row, 3G orig size)
The tokudb_phrase_idx database is 806MB (5M row, keys built with
INSERT)
[/edit]
Here is the CREATE TABLE for the data:
CREATE TABLE IF NOT EXISTS insert_test ( LO_OrderKey bigint not null, LO_LineNumber tinyint not null, LO_CustKey int not null, LO_PartKey int not null, LO_SuppKey int not null, LO_OrderDateKey int not null, LO_OrderPriority varchar(15), LO_ShipPriority char(1), LO_Quantity tinyint, LO_ExtendedPrice decimal, LO_OrdTotalPrice decimal, LO_Discount decimal, LO_Revenue decimal, LO_SupplyCost decimal, LO_Tax tinyint, LO_CommitDateKey int not null, LO_ShipMode varchar(10), extra_data TEXT ); For compression: ROW_FORMAT=compressed, KEY_BLOCK_SIZE=8
Here is the my.cnf for Percona XtraDB (not tuning for
tokudb):
[mysqld] #datadir=/data/datadirs/tokudb_55 datadir=/data/datadirs/percona_55 socket=/var/lib/mysql/mysql.sock user=mysql # Disabling symbolic-links is recommended to prevent assorted security risks symbolic-links=0 innodb_buffer_pool_size=16G innodb_log_file_size=4G innodb_flush_neighbor_pages=cont innodb_fast_checksum innodb_file_per_table innodb_file_format=barracuda innodb_buffer_pool_instances=6 innodb_write_io_threads=12 innodb_read_io_threads=12 innodb_flush_method=O_DIRECT innodb_io_capacity=10000 read_buffer_size=2M key_buffer_size=32M [mysqld_safe] log-error=/var/log/mysqld.log pid-file=/var/run/mysqld/mysqld.pid
[edit]
Test Hardware:
Intel i970-3.2GHz 6 core w/HT (12 threads) 24GB RAM LSI 9211-8i IT HBA DB on software RAID 0 - Two Intel 520 SSD, one OCZ vertex. 300GB total.