Showing entries 1 to 6
Displaying posts with tag: LevelDB
Geek code for LSM trees

This is a link to slides from my 5-minute talk at the CIDR 2019 Gong Show. The slides are a brief overview of the geek code for LSM trees. If you click on the settings icon in the slide show you can view the speaker notes, which link to blog posts with more details. I also pasted the links below. Given time I might add to this post, but most of the content is in my past blog posts. Regardless, I think there is more to be discovered about performant, efficient and manageable LSM trees.

The key points are that there are more compaction algorithms to discover, that we need to make it easier to describe them, and that compaction is a property of a level, not of the LSM tree. …

LSM math: fixing mistakes in my last post

My last post explained the number of levels in an LSM that minimizes write amplification using 3 different estimates for the per-level write-amp. Assuming the per-level growth factor is w then the 3 estimates were approximately w, w+1 and w-1 and named LWA-1, LWA-2 and LWA-3 in the post.
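A minimal sketch of how the three estimates could be compared. It assumes that total write-amp is the number of levels times the per-level write-amp, that the growth factor is the same for every level, and that the estimates are exactly w, w+1 and w-1 (the text only says "approximately"). The function names and the total fanout of 1024 are hypothetical and are not taken from the original post.

```python
def growth_factor(total_fanout: float, levels: int) -> float:
    """Per-level growth factor w when all levels share the total fanout evenly."""
    return total_fanout ** (1.0 / levels)


# Simplified per-level write-amp estimates (assumption: the exact formulas
# in the original post may differ from these).
ESTIMATES = {
    "LWA-1": lambda w: w,
    "LWA-2": lambda w: w + 1,
    "LWA-3": lambda w: w - 1,
}


def total_write_amp(total_fanout: float, levels: int, per_level) -> float:
    """Assumption: total write-amp = levels * per-level write-amp."""
    return levels * per_level(growth_factor(total_fanout, levels))


def best_level_count(total_fanout: float, per_level, max_levels: int = 50) -> int:
    """Brute-force the level count in 1..max_levels with the lowest total write-amp."""
    return min(
        range(1, max_levels + 1),
        key=lambda n: total_write_amp(total_fanout, n, per_level),
    )


if __name__ == "__main__":
    T = 1024  # hypothetical total fanout: size(database) / size(memtable)
    for name, estimate in ESTIMATES.items():
        n = best_level_count(T, estimate)
        print(name, n, round(total_write_amp(T, n, estimate), 2))
```

Under these assumptions the w-1 estimate never reaches a minimum: its total keeps shrinking as levels are added, so the search always returns `max_levels`.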

I realized there was a mistake in that post for the analysis of LWA-3. The problem is that the per-level write-amp must be >= 1 (and really should be > 1) but the value of w-1 is <= 1 when the per-level growth factor is <= 2. By allowing the per-level write-amp to be < 1 it is easy to incorrectly show that a huge number of levels reduces write-amp as I do for curve #3 in this graph. While I don't claim that (w-1) or (w-1)/2 can't be a useful estimate for …

LSM math: revisiting the number of levels that minimizes write amplification

I previously used math to explain the number of levels that minimizes write amplification for an LSM tree with leveled compaction. My answer was one of ceil(ln(T)) or floor(ln(T)) assuming the LSM tree has total fanout = T where T is size(database) / size(memtable).
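The floor/ceil answer can be checked with a short sketch. It assumes the per-level write-amp is approximately the per-level growth factor w = T^(1/n) for n levels, so the total is n * T^(1/n). For real-valued n that expression is smallest at n = ln(T), which makes w = e, and the integer optimum is then one of the two neighbours. The function names below are hypothetical.

```python
import math


def leveled_write_amp(total_fanout: float, levels: int) -> float:
    # Assumption: per-level write-amp ~ growth factor w = T ** (1 / levels),
    # summed over all levels.
    return levels * total_fanout ** (1.0 / levels)


def candidate_level_counts(total_fanout: float) -> tuple:
    """floor(ln(T)) and ceil(ln(T)), clamped so there is at least one level."""
    x = math.log(total_fanout)
    return max(1, math.floor(x)), max(1, math.ceil(x))


def optimal_level_count(total_fanout: float) -> int:
    """Pick whichever of the two candidates gives the lower total write-amp."""
    return min(
        candidate_level_counts(total_fanout),
        key=lambda n: leveled_write_amp(total_fanout, n),
    )


if __name__ == "__main__":
    T = 1024  # hypothetical total fanout
    n = optimal_level_count(T)
    print(candidate_level_counts(T), n, T ** (1.0 / n))
```

A different per-level estimate changes the function being minimized, so this check only covers the "write-amp is about w" case.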

Then I heard from a coworker that the real answer is less than floor(ln(T)). Then I heard from Niv Dayan, first author of the Dostoevsky paper, that the real answer is larger than ceil(ln(T)) and the optimal per-level growth factor is ~2 rather than ~e.

All of our answers are correct. We have different answers because we use different functions to estimate the per-level write-amp. The graph of the …

Define "better"

Welcome to my first rant of 2019, although I have written about this before. While I enjoy benchmarketing from a distance it is not much fun to be in the middle of it. The RocksDB project has been successful and thus becomes the base case for products and research claiming that something else is better. While I have no doubt that other things can be better I am wary about the definition of better.

There are at least 3 ways to define better when evaluating database performance. The first, faster is better, ignores efficiency; the last two do not. I'd rather not ignore efficiency. The marginal return of X more QPS eventually becomes zero while the benefit of using less hardware is usually greater than zero.

Compaction stalls: something to make better in RocksDB


In previous results that I shared for the insert benchmark it was obvious that MyRocks throughput is steady when the workload transitions from in-memory to IO-bound. The reason is that non-unique secondary index maintenance is read-free for MyRocks so there are no stalls for storage reads of secondary index pages. Even with the change buffer, InnoDB eventually is slowed by storage reads and by page writeback.

It was less obvious that MyRocks has more variance on both the in-memory and IO-bound insert benchmark tests. I try to be fair when explaining storage engine performance so I provide a few more details here and results for InnoDB in MySQL 5.7.10 & 5.6.26 along with MyRocks from our fork of MySQL 5.6. The binlog was enabled for all tests, fsync-on-commit was disabled and 16 clients inserted 500m or 2b rows into …

451 CAOS Links 2011.07.29

Open Cloud Initiative launches. HP joins OpenStack. Oracle releases Java 7. And more.

# The Open Cloud Initiative launched to drive open standards in cloud computing.

# HP announced its support for OpenStack.

# Oracle announced the availability of Java SE 7. The Apache Software Foundation warned of index corruption and crashes in Apache Lucene and Solr.

# Nebula …
