My goal for the year is more time learning math and less time
running MySQL benchmarks. I haven't done serious benchmarks for
more than 12 months. It was a great experience but I want to
learn new things. MySQL 8.0.14 has been released with fixes for a
serious bug I found via the insert benchmark. I won't confirm
whether it has been fixed. I hope someone else does.

My tests and methodology are described in posts for sysbench, linkbench and the insert benchmark. I hope the
upstream distros (MySQL, MariaDB, Percona) repeat my tests and
methodology and I am happy to answer questions about that. I even
have inscrutable shell scripts that …


I have been trying to solve the problem of finding an optimal LSM
configuration for a given workload. The real problem is larger
than that, which is to find the right index structure and the right configuration
for a given workload. But my focus is RocksDB so I will start by
solving for an LSM.

This link is to slides that summarize my
effort. I have expressed the problem as a cost to be minimized,
where the cost is described by differentiable functions.
The cost functions have a mix of real and integer
valued parameters for which values must be determined to minimize
the cost. I have yet to solve the …

This is a link to slides from my 5-minute talk
at the CIDR 2019 Gong Show. The slides are a brief
overview of the geek code for LSM trees. If you click on the
settings icon in the slide show you can view the speaker notes
which have links to blog posts that have more details. I also
pasted the links below. Given time I might add to this post, but
most of the content is in my past blog posts. Regardless I think
there is more to be discovered about performant, efficient and
manageable LSM trees.

The key points are that there are more compaction algorithms to
discover, that we need to make it easier to describe them, and that
compaction is a property of a level, not of the LSM tree. …

My last post explained the number of levels in
an LSM that minimizes write amplification using 3 different
estimates for the per-level write-amp. Assuming the per-level
growth factor is w then the 3 estimates were approximately w, w+1
and w-1 and named LWA-1, LWA-2 and LWA-3 in the post.
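A minimal sketch of how the three estimates can be compared. It assumes, and this is not stated in the excerpt, that total write-amp is the number of levels times the per-level estimate and that every level uses the same growth factor w = T^(1/n) for a total fanout T. The function names are hypothetical.

```python
def growth_factor(total_fanout: float, levels: int) -> float:
    """Per-level growth factor w, assuming all levels share the same w."""
    return total_fanout ** (1.0 / levels)


def total_write_amp(total_fanout: float, levels: int, estimate: str) -> float:
    """Total write-amp, assumed to be levels * per-level write-amp."""
    w = growth_factor(total_fanout, levels)
    # The three per-level estimates named in the post.
    per_level = {"LWA-1": w, "LWA-2": w + 1, "LWA-3": w - 1}[estimate]
    return levels * per_level


if __name__ == "__main__":
    for name in ("LWA-1", "LWA-2", "LWA-3"):
        print(name, round(total_write_amp(1024, 10, name), 2))
```

With a total fanout of 1024 and 10 levels the growth factor is 2, so the per-level values are 2, 3 and 1 respectively.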

I realized there was a mistake in that post for the analysis of
LWA-3. The problem is that the per-level write-amp must be >=
1 (and really should be > 1) but the value of w-1 is <= 1
when the per-level growth factor is <= 2. By allowing the
per-level write-amp to be < 1 it is easy to incorrectly show that
a huge number of levels reduces write-amp as I do for curve #3
in this graph. While I don't claim that (w-1) or
(w-1)/2 can't be a useful estimate for …

I previously used math to explain the number of
levels that minimizes write amplification for an LSM tree with
leveled compaction. My answer was one of ceil(ln(T)) or
floor(ln(T)) assuming the LSM tree has total fanout = T where T
is size(database) / size(memtable).
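The ceil(ln(T)) or floor(ln(T)) result can be checked numerically. The sketch below assumes the per-level write-amp equals the per-level growth factor w = T^(1/n), so total write-amp for n levels is n * T^(1/n); treating n as real, the derivative is zero at n = ln(T). The function names and the `max_levels` bound are hypothetical.

```python
import math


def write_amp(total_fanout: float, levels: int) -> float:
    """Total write-amp, assuming per-level write-amp equals w = T**(1/n)."""
    return levels * total_fanout ** (1.0 / levels)


def best_level_count(total_fanout: float, max_levels: int = 64) -> int:
    """Brute-force search for the integer level count with minimal write-amp."""
    return min(range(1, max_levels + 1),
               key=lambda n: write_amp(total_fanout, n))


if __name__ == "__main__":
    T = 1024
    n = best_level_count(T)
    print(n, math.floor(math.log(T)), math.ceil(math.log(T)))
```

A different per-level estimate changes the function being minimized and therefore the answer, so this check only covers the one assumption stated above.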

Then I heard from a coworker that the real answer is less than
floor(ln(T)). Then I heard from Niv Dayan, first author of
the Dostoevsky paper, that the real
answer is larger than ceil(ln(T)) and the optimal per-level
growth factor is ~2 rather than ~e.

All of our answers are correct. We have different answers because
we use different functions to estimate the per-level write-amp.
The graph of the …

Welcome to my first rant of 2019, although I have written about this before. While I enjoy
benchmarketing from a distance it is not much fun
to be in the middle of it. The RocksDB project has been
successful and thus becomes the base case for products and
research claiming that something else is better. While I have no
doubt that other things can be better I am wary about the
definition of *better*.

There are at least 3 ways to define better when evaluating database performance. The first, faster is better, ignores efficiency; the last two do not. I'd rather not ignore efficiency. The marginal return of X more QPS eventually becomes zero while the benefit of using less hardware is usually greater than zero.

- …

In previous results that I shared for the insert benchmark it was
obvious that MyRocks throughput is steady when the workload
transitions from in-memory to IO-bound. The reason is that
non-unique secondary index maintenance is read-free for MyRocks
so there are no stalls for storage reads of secondary index
pages. Even with the change buffer, InnoDB eventually is slowed
by storage reads and by page writeback.

It was less obvious that MyRocks has more variance on both the
in-memory and IO-bound insert benchmark tests. I try to be fair
when explaining storage engine performance so I provide a few
more details here and results for InnoDB in MySQL 5.7.10 &
5.6.26 along with MyRocks from our fork of MySQL 5.6. The binlog
was enabled for all tests, fsync-on-commit was disabled and 16
clients inserted 500m or 2b rows into …

Open Cloud Initiative launches. HP joins OpenStack. Oracle releases Java 7. And more.

- The Open Cloud Initiative launched to drive open standards in cloud computing.

- HP announced its support for OpenStack.

- Oracle announced the availability of Java SE 7. The Apache Software Foundation warned of index corruption and crashes in Apache Lucene and Solr.

- Nebula …
