Showing entries 1 to 10 of 163
10 Older Entries »
Displaying posts with tag: RocksDB (reset)
Throttling writes: LSM vs B-Tree

Reducing response time variance is important for some workloads. This post explains sources of variance for workloads with high write rates when the index structure is an LSM or a B-Tree. I previously wrote about this in my post on durability debt.

Short summary:

  1. For a given write rate stalls are more likely with a B-Tree than an LSM
  2. Many RocksDB write stalls can be avoided via configuration
  3. Write stalls with a B-Tree are smaller but more frequent versus an LSM
  4. Write stalls are more likely when the redo log isn't forced on commit
  5. The worst case difference between an LSM and B-Tree is larger when the working set isn't cached
  6. Life is easier but more expensive when the working set …
[Read more]
My theory on technical debt and OSS

This is a hypothesis, perhaps it is true. I am biased given that I spent 15 years acknowledging tech debt on a vendor-based project (MySQL) and not much time on community-based projects.

My theory on tech debt and OSS is that there is more acknowledgement of tech debt in vendor-based projects than community-based ones. This is an advantage for vendor-based projects assuming you are more likely to fix acknowledged problems. Of course there are other advantages for community-based projects.

I think there is more acknowledgement of tech debt in vendor-based projects because the community is criticizing someone else's effort rather than their own. This is human nature, even if the effect and behavior aren't always kind. I spent many years marketing bugs that needed to be fixed -- along with many years  leading teams working to fix those bugs.

The joy of database configuration

I am wary of papers with performance results for too many products.Too many means including results from systems for which you lack expertise. Wary means I have less faith in the comparison even when the ideas in the paper are awesome. I have expertise in MySQL, MongoDB, RocksDB, WiredTiger and InnoDB but even for them I have made and acknowledged ridiculous mistakes.

Database configuration is too hard. There are too many options, most of them aren't significant and the approach is bottom-up. I an expert on this -- in addition to years of tuning I have added more than a few options to RocksDB and MySQL.

This post was motivated by PostgreSQL. I want to run the insert benchmark for it and need a good configuration. I have nothing against PG with the exception of a few too many why …

[Read more]
Using an LSM for analytics

How do you use an LSM for analytics? I haven't thought much about it because my focus has been small data -- web-scale OLTP since 2006. It is a great question given that other RocksDB users (Yugabyte, TiDB, Rockset, CockroachDB) support analytics.

This post isn't a list of everything that is needed for an analytics engine. That has been described elsewhere. It is a description of problems that must be solved when trying to use well-known techniques with an LSM. I explain the LSM challenges for a vectorized query engine, columnar storage, analytics indexes and bitmap indexes.

There is much speculation in this post. Some of it is informed -- I used to maintain bitmap indexes at Oracle. There is also big opportunity for research and for R&D in this space. I am happy to be corrected and be told of papers that I should read.

Vectorized query engine

See …

[Read more]
Just put the cold data over there

There are several ways to use less SSD for an OLTP workload: choose a database engine has less space amplification, store less data, move the cold data elsewhere. The first approach is a plan while the others are goals. A plan is something you can implement. A goal requires a plan to get done.

This matters when you want to decrease the cost of storage for a database workload but not everyone needs to do that. The first approach assumes your DBMS supports an LSM with leveled compaction and compression (MariaDB and Percona Server include MyRocks, ScyllaDB and Cassandra are also options). The second approach, store less data, assumes you can get everyone to agree to remove data and that is a hard conversation.

The third approach, move the cold data elsewhere, is a wonderful goal. I wonder how often that goal …

[Read more]
Learned indexes for an LSM?

The Learned Indexes paper opened a new area of research for storage engines. The idea is to use the distribution of the data to make searches faster and/or reduce the size of search structures. Whether ML will be used is an implementation artifact to me. I am in this for the efficiency gains -- reducing space and CPU read amplification. I prefer to call this topic learned search structures rather than learned indexes because the public isn't ready for learned recovery and learned consistency.

While the Learned Indexes paper has ideas for hash maps and bloom filters most research has focused on a B-Tree. I expect learned indexes to work better with an LSM because its search structures (block indexes) are created off line with respect to user operations. There is more time for analysis and …

[Read more]
Open Core Summit and OSS glue code

I am optimistic about the Open Core Summit. It can be something that benefits users, startups, the source-available community and the OSS crowd. The summit has many of the important people from the open core community. It is an opportunity for them to collaborate -- form a foundation, reduce open core license proliferation, discuss the next license to be reviewed by OSI and most importantly -- fix the open core marketing approach.

I appreciate that startups put so much time and money into building interesting system software that is shared either as OSS or source-available. I haven't enjoyed much of the marketing about the challenges that cloud creates for VC-funded …

[Read more]
Exposing MyRocks Internals Via System Variables: Part 7, Use Case Considerations

(In the previous post, Part 6, we covered Replication.)

In this final blog post, we conclude our series of exploring MyRocks by taking a look at use case considerations. After all, having knowledge of how an engine works is really only applicable if you feel like you’re in a good position to use it.

Advantages of MyRocks

Let’s start by talking about some of the advantages of MyRocks.

Compression

MyRocks will typically do a good job of reducing the physical footprint of your data. As I mentioned in my previous post in this series about compression, you have the ability to configure compression down to the individual compaction layers for each column family. You also get the advantage of the fact that data isn’t updated once it’s written to disk. Compaction, which was …

[Read more]
Exposing MyRocks internals Via system variables: Part 6, Replication

(In the previous post, Part 5, we covered Data Reads.)

In this blog post, we continue our series of exploring MyRocks mechanics by looking at the configurable server variables and column family options. In our last post, I explained at a high level how reads occur in MyRocks, concluding the arc of covering how data moves into and out of MyRocks. In this post, we’re going to explore replication with MyRocks, more specifically read-free replication.

Some of you may already be familiar with the concepts of read-free replication as it was a key feature of the TokuDB engine, which leveraged fractal tree indexing. TokuDB was similar to MyRocks in the sense that it had a pseudo log-based storage …

[Read more]
Exposing MyRocks internals via system variables: Part 5, Data Reads

(In the previous post, Part 4, we covered Compression and Bloom Filters)

In this blog post, we continue on our series of exploring MyRocks mechanics by looking at the configurable server variables and column family options. In our last post, I explained at a high level how compression and bloom filtering are applied to data files as they are initially flushed from immutable memtables and are subsequently passed through the compaction process. With that being covered, we should now have a clear understanding as to how data writing works in MyRocks and can start reviewing how data read requests are handled.

The Read Process

Let’s start off by talking about how read processes are handled at the file level. When a read request comes in, the first thing it needs to do is pull the …

[Read more]
Showing entries 1 to 10 of 163
10 Older Entries »