Showing entries 11 to 20 of 156
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: RocksDB (reset)
Geek code for LSM trees

This is a link to slides from my 5-minute talk at the CIDR 2019 Gong Show. The slides are a brief overview of the geek code for LSM trees. If you click on the settings icon in the slide show you can view the speaker notes which have links to blog posts that have more details. I also pasted the links below. Given time I might add to this post, but most of the content is in my past blog posts. Regardless I think there is more to be discovered about performant, efficient and manageable LSM trees.

The key points are there are more compaction algorithms to discover, we need to make it easier to describe them and compaction is a property of a level, not of the LSM tree. …

[Read more]
LSM math: fixing mistakes in my last post

My last post explained the number of levels in an LSM that minimizes write amplification using 3 different estimates for the per-level write-amp. Assuming the per-level growth factor is w then the 3 estimates were approximately w, w+1 and w-1 and named LWA-1, LWA-2 and LWA-3 in the post.

I realized there was a mistake in that post for the analysis of LWA-3. The problem is that the per-level write-amp must be >= 1 (and really should be > 1) but the value of w-1 is <= 1 when the per-level growth factor is <= 2. By allowing the per-level write-amp to be < 1 it easy to incorrectly show that a huge number of levels reduces write-amp as I do for curve #3 in this graph. While I don't claim that (w-1) or (w-1)/2 can't be a useful estimate for …

[Read more]
LSM math: revisiting the number of levels that minimizes write amplification

I previously used math to explain the number of levels that minimizes write amplification for an LSM tree with leveled compaction. My answer was one of ceil(ln(T)) or floor(ln(T)) assuming the LSM tree has total fanout = T where T is size(database) / size(memtable).

Then I heard from a coworker that the real answer is less than floor(ln(T)). Then I heard from Niv Dayan, first author of the Dostoevsky paper, that the real answer is larger than ceil(ln(T)) and the optimal per-level growth factor is ~2 rather than ~e.

All of our answers are correct. We have different answers because we use different functions to estimate the per-level write-amp. The graph of the …

[Read more]
Define "better"

Welcome to my first rant of 2019, although I have written about this before. While I enjoy benchmarketing from a distance it is not much fun to be in the middle of it. The RocksDB project has been successful and thus becomes the base case for products and research claiming that something else is better. While I have no doubt that other things can be better I am wary about the definition of better.

There are at least 3 ways to define better when evaluating database performance. The first, faster is better, ignores efficiency, the last two do not. I'd rather not ignore efficiency. The marginal return of X more QPS eventually becomes zero while the benefit of using less hardware is usually greater than zero.

[Read more]
How to Get Details About MyRocks Deadlocks in MariaDB and Percona Server

In my previous post on ERROR 1213 I noted that Percona Server does not support the SHOW ENGINE ROCKSDB TRANSACTION STATUS statement to get deadlock details in "text" form. I've got some clarifications in my related feature request, PS-5114. So I decided to write this followup post and show what is the way to get deadlock details for the ROCKSDB tables in current versions of MariaDB and Percona Server.

First of all, I'd like to check MariaDB's implementation of MyRocks. For this I'll re-create deadlock scenario from that my post with MariaDB 10.3.12 I have at hand. We should start with installing ROCKSDB

[Read more]
New small servers for performance testing

My old NUC cluster found a new home and I downsized to 2 new NUC servers. The new server is NUC8i7beh with 16g RAM, 500g Samsung 860 EVO for the OS and 500g Samsung 970 EVO for performance. The Samsung 860 is SATA and the Samsung 970 is an m.2 device. I expect to wear out the performance devices as I have done that in the past. With the OS on a separate device I avoid the need to reinstall the OS when that happens.

[Read more]
LSM math - size of search space for LSM tree configuration

I have written before and will write again about using 3-tuples to explain the shape of an LSM tree. This makes it easier to explain the configurations supported today and configurations we might want to support tomorrow in addition to traditional tiered and leveled compaction. The summary is that n LSM tree has N levels labeled from L1 to Ln and Lmax is another name for Ln. There is one 3-tuple per level and the components of the 3-tuple are (type, fanout, runs) for Lk (level k) where:

  • type is Tiered or Leveled and explains compaction into that level
  • fanout is the size of a sorted run in Lk relative to a sorted run from Lk-1, a real and >= 1
  • runs is the number of sorted runs in that level, an integer and >= 1

Given the above how many valid configurations exist for …

[Read more]
LSM math - how many levels minimizes write amplification?

How do you configure an LSM tree with leveled compaction to minimize write amplification? For a given number of levels write-amp is minimal when the same fanout (growth factor) is used between all levels, but that does not explain the number of levels to use. In this post I answer that question.

  1. The number of levels that minimizes write-amp is one of ceil(ln(T)) or floor(ln(T)) where T is the total fanout -- sizeof(database) / sizeof(memtable)
  2. When #1 is done then the per-level fanout is e when the number of levels is ln(t) and a value close to e when the number of levels is an integer.

Introduction
I don't recall reading this result elsewhere, but I am happy to update this post with a link to such a result. I was encouraged to answer this after a discussion with the RocksDB team and thank Siying Dong for stating #2 above while leaving the math to me. I assume the …

[Read more]
What May Cause MySQL ERROR 1213

Probably all of us, MySQL users, DBAs and developers had seen error 1213 more than once, in one context or the other:
mysql> select * from t1;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transactionThe first thing that comes to mind in this case is: "OK, we have InnoDB deadlock, let's check the details", followed by the SHOW ENGINE INNODB STATUS check, like this:
mysql> show engine innodb status\G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
2018-12-08 17:41:11 0x7f2f8b8db700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 12 seconds
-----------------
BACKGROUND THREAD

[Read more]
Review of TRIAD: Creating Synergies Between Memory, Disk and Log in Log Structured Key-Value Stores

This is review of TRIAD which was published in USENIX ATC 2017. It explains how to reduce write amplification for RocksDB leveled compaction although the ideas are useful for many LSM implementations. I share a review here because the paper has good ideas. It isn't easy to keep up with all of the LSM research, even when limiting the search to papers that reference RocksDB, and I didn't notice this paper until recently.

TRIAD reduces write amplification for an LSM with leveled compaction and with a variety of workloads gets up to 193% more throughput, up to 4X less write amplification and spends up to 77% less time doing compaction and flush. Per the …

[Read more]
Showing entries 11 to 20 of 156
« 10 Newer Entries | 10 Older Entries »