Showing entries 1 to 10 of 128
10 Older Entries »
Displaying posts with tag: RocksDB (reset)
Comparing TokuDB, RocksDB and InnoDB Performance on Intel(R) Xeon(R) Gold 6140 CPU

Recently one of our customers wanted us to benchmark InnoDB, TokuDB and RocksDB on Intel(R) Xeon(R) Gold 6140 CPU (with 72 CPUs),  nvme SSD (7 TB) and  530 GB RAM for performance. We have used Ubuntu xenial 16.04.4, Percona Server 5.7 (included storage engines- InnoDB/XtraDB, TokuDB and RocksDB) and  Sysbench 1.0.15 with custom Lua scripts for this exercise, This benchmarking exercise included bulk INSERTS, WRITES, READS and READS-WRITES. We have tried our best to capture maximum information about the hardware infrastructure and copied / shared scripts we have used for benchmarking. This is not a paid / sponsored benchmarking effort by any of the software or hardware vendors, We will remain forever an vendor neutral and independent web-scale database infrastructure operations company with core expertise in performance, scalability, high availability and database reliability engineering. This benchmarking is …

[Read more]
Lock elision, pthreads and MySQL

Yesterday I learned that lock elision is supported in recent versions of glibc for pthread mutex and rw-lock. I am curious if anyone has results for MySQL with it. My memory is that InnoDB can suffer from contention on a rw-lock, but that is a custom rw-lock not the one included with glibc. But code above the storage engine uses mutex and maybe rw-lock from glibc.

A rw-lock where reads dominate can suffer from contention because it has at least twice the memory writes per lock/unlock pair compared to a mutex. So when the lock hold time is short a mutex wins even when exclusive access isn't required. This can often be seen in PMP output where there are convoys and the worst-case is when a thread gets stuck trying to get the internal latch during unlock, but the InnoDB custom rw-lock might not have …

[Read more]
Tiered or leveled compaction, why not both via adaptive compaction?

First there was leveled compaction and it was good, but it took a while for implementations to become popular. Then there was (size) tiered compaction and it was good too, but more confusing given the diversity in strategies for picking files to compact. RocksDB didn't help with the confusion by calling it universal compaction. Eventually compaction algorithms optimized for time series were added (see DTCS for Cassandra). Finally, Kudu and InfluxDB have specialized compaction algorithms that are also worth understanding.

This post is about adaptive compaction which is yet another compaction algorithm. The summary for adaptive compaction is:

[Read more]
Default options in MyRocks

We need to make MyRocks easier to configure -- this isn't a new idea. If you are using MyRocks with default options in mid-2018 then you are probably not using bloom filters, compression or my favorite compaction policy.

You can fix all of that by setting rocksdb_default_cf_options. I wish this were the default.
rocksdb_default_cf_options=block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;compaction_pri=kMinOverlappingRatioThe above will enable the default compression type for all levels of the LSM tree which is Snappy in a recent MyRocks build with FB MySQL. But one of the proper distros only provides zlib and doing that for the small levels in the LSM tree (L0, L1, L2) might slow down compaction too much.

To set rocksdb_default_cf_options but disable compression use:

[Read more]
CPU overheads for RocksDB queries

An LSM like RocksDB has much better write and space efficiency than a B-Tree. That means with RocksDB you will use less SSD than with a B-Tree and the SSD will either last longer or you can use lower endurance SSD. But this efficiency comes at a cost. In the worst-case RocksDB might use 2X more CPU/query than a B-Tree, which means that in the worst case QPS with RocksDB might be half of what it is with a B-Tree. In the examples below I show that RocksDB can use ~2X more comparisons per query compared to a B-Tree and it is nice when results in practice can be explained by theory.

But first I want to explain the context where this matters and where it doesn't matter. It matters when the CPU overhead from RocksDB is a significant fraction of the query response time -- so the workload needs to be CPU bound (cached working set). …

[Read more]
Percona Server for MySQL 5.7.22-22 Is Now Available

Percona announces the GA release of Percona Server for MySQL 5.7.22-22 on on May 31, 2018. Download the latest version from the Percona web site or the Percona Software Repositories. You can also run Docker containers from the images in the Docker Hub repository.

Based on MySQL 5.7.22, including all the bug fixes in it, Percona Server for MySQL 5.7.22-22 is the current GA release in the Percona Server for MySQL 5.7 series. Percona …

[Read more]
Percona Server for MySQL 5.7.21-21 Is Now Available with Increased Built-In Security Enhancements

Percona announces the GA release of Percona Server for MySQL 5.7.21-21 on on April 24, 2018. Download the latest version from the Percona web site or the Percona Software Repositories. You can also run Docker containers from the images in the Docker Hub repository.

This version of Percona Server for MySQL 5.7.21 includes three new encryption features – Vault keyring plug-in, encryption for InnoDB general tablespaces, and encryption for binary log files.

These new capabilities, which allow companies to immediately …

[Read more]
Index Structures, Access Methods, whatever

I prefer to use Index Structures when writing about algorithms for indexing data stored on disk and SSD but Access Methods is another popular name. Recently I have been working on a high-level comparison (lots of hand waving) of index structures in terms of read, write, space and cache amplification.

I started by dividing the index structures into tree-based and hash-based. Tree-based support range queries while hash-based do not. Most of my experience is with tree-based approaches. There was a lot of research on hash-based in the 80s (extendible, dynamic and linear hashing), they are in production today but not as prominent as b-trees, and there is recent research on them. This paper by Seltzer and Yigit is a great overview on hash-based index …

[Read more]
Cache amplification

How much of the database must be in cache so that a point-query does at most one read from storage? I call this cache-amplification or cache amplification. The answer depends on the index structure (b-tree, LSM, something else). Cache amplification can join read, write and space amplification. Given that RWS was renamed RUM by the excellent RUM Conjecture now we have CRUM which is close to crummy. I briefly wrote about this in a previous post.

To do at most 1 storage read for a point query:

[Read more]
Percona Server for MySQL 5.7.21-20 Is Now Available

Percona announces the GA release of Percona Server for MySQL 5.7.21-20 on February 19, 2018. Download the latest version from the Percona web site or the Percona Software Repositories. You can also run Docker containers from the images in the Docker Hub repository.

Based on MySQL 5.7.21, including all the bug fixes in it, Percona Server for MySQL 5.7.21-20 is the current GA release in the Percona Server for MySQL 5.7 series. Percona …

[Read more]
Showing entries 1 to 10 of 128
10 Older Entries »