In my two previous blog posts I wrote about our implementation of Fractal Tree Indexes on MongoDB, showing a 10x insertion performance increase and a 268x query performance increase. MongoDB’s covered indexes can provide some performance benefits over a regular MongoDB index, as they reduce the amount of IO required to satisfy certain queries. In essence, when all of the fields you are requesting are present in the index key, MongoDB does not have to go back to the main storage heap to retrieve anything. My benchmark results are …
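To make the covered-index idea concrete, here is a toy in-memory model. It is not MongoDB code, and every name in it (`CompoundIndex`, `find`, `heap_fetches`) is hypothetical. Each index entry holds the key fields plus the document's position in a "heap" list. When every projected field is part of the key, the query is answered from the index alone; otherwise each match costs an extra heap fetch.

```python
# Hypothetical toy model of a covered query (not MongoDB's implementation).
# `heap` stands in for the main document storage; each index entry holds the
# indexed key fields plus the document's position in the heap.

class CompoundIndex:
    def __init__(self, heap, key_fields):
        self.heap = heap
        self.key_fields = list(key_fields)
        # Entries kept sorted by key; each is (key tuple, heap position).
        self.entries = sorted(
            (tuple(doc[f] for f in self.key_fields), pos)
            for pos, doc in enumerate(heap)
        )
        self.heap_fetches = 0  # documents read from the heap so far

    def find(self, value, projection):
        """Equality match on the leading key field (linear scan for brevity)."""
        covered = set(projection) <= set(self.key_fields)
        results = []
        for key, pos in self.entries:
            if key[0] != value:
                continue
            if covered:
                # Every requested field is in the key: answer from the index alone.
                row = dict(zip(self.key_fields, key))
            else:
                # A requested field is missing from the key: go back to the heap.
                self.heap_fetches += 1
                row = self.heap[pos]
            results.append({f: row[f] for f in projection})
        return results


heap = [
    {"user": "ann", "age": 31, "bio": "likes trees"},
    {"user": "bob", "age": 25, "bio": "likes hashes"},
    {"user": "ann", "age": 40, "bio": "likes flash"},
]
index = CompoundIndex(heap, ["user", "age"])

covered_rows = index.find("ann", ["user", "age"])   # served from the index
fetches_after_covered = index.heap_fetches          # still 0
uncovered_rows = index.find("ann", ["age", "bio"])  # "bio" forces heap fetches
```

The covered query touches the heap zero times; asking for `bio`, which is not in the key, costs one heap fetch per matching document. On disk, each of those fetches is a potential random IO.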
Last week I wrote about our 10x insertion performance increase with MongoDB. We’ve continued our experimental integration of Fractal Tree® Indexes into MongoDB, adding support for clustered indexes. A clustered index stores all non-index fields as the “value” portion of the index, as opposed to a standard MongoDB index that stores a pointer to the document data. The benefit is that indexed lookups can immediately return any requested values instead of needing to do an additional lookup (and potential disk IOs) for the requested fields.
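The difference between the two index layouts can be sketched with plain dictionaries. This is a hypothetical illustration, not TokuDB or MongoDB code, and it assumes unique index keys. It simply counts lookups: a standard index maps a key to a document id that must then be looked up in the main store, while a clustered index maps the key straight to the document's fields.

```python
# Hypothetical toy model (not TokuDB or MongoDB code). Index keys assumed unique.

class LookupCounter:
    def __init__(self):
        self.lookups = 0


def build_standard_index(store, field):
    # Standard secondary index: key -> pointer (document id) into the main store.
    return {doc[field]: doc_id for doc_id, doc in store.items()}


def build_clustered_index(store, field):
    # Clustered index: key -> a copy of the whole document.
    return {doc[field]: dict(doc) for doc in store.values()}


def lookup_standard(index, store, key, counter):
    counter.lookups += 1  # search the index
    doc_id = index[key]
    counter.lookups += 1  # follow the pointer into the main store
    return store[doc_id]


def lookup_clustered(index, key, counter):
    counter.lookups += 1  # the index entry already holds the document
    return index[key]


store = {
    1: {"email": "ann@example.com", "name": "Ann", "city": "Bonn"},
    2: {"email": "bob@example.com", "name": "Bob", "city": "Istanbul"},
}
standard = build_standard_index(store, "email")
clustered = build_clustered_index(store, "email")

std_counter, clu_counter = LookupCounter(), LookupCounter()
doc_a = lookup_standard(standard, store, "bob@example.com", std_counter)
doc_b = lookup_clustered(clustered, "bob@example.com", clu_counter)
```

Both lookups return the same document, but the clustered one needs a single lookup instead of two. The trade-off is space: each clustered index carries its own copy of the non-index fields.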
To create a clustered index, you just need to add “clustering:true”, as in the following example (note that version 2 indexes are Fractal Tree Indexes): …
The challenge of handling massive data processing workloads has spawned many new innovations and techniques in the database world, from indexing innovations like our Fractal Tree® technology to a myriad of “NoSQL” solutions (here is our Chief Scientist’s perspective). MongoDB is among the most popular and widely adopted NoSQL solutions, and we became curious whether our Fractal Tree indexing could offer some advantage when combined with it. The answer seems to be a strong “yes”.
Earlier in the summer we kicked off a small side project and here’s what we did: we implemented a “version 2” IndexInterface as a Fractal Tree index and ran some benchmarks. Note that our integration only affects MongoDB’s secondary indexes; primary indexes continue to rely on MongoDB’s indexing code. All the changes we made to the MongoDB source …
Next week I (Bradley) will be traveling to FROSCON near Bonn, Germany, and then on to VLDB in Istanbul.
At FROSCON I’ll be talking about fast data structures for maintaining indexes. The talk will share some content with my upcoming MySQL Connect talk.
At VLDB, Dzejla Medjedovic will be presenting a talk on our paper on SSD-friendly Bloom-filter-like data structures. The paper is:
Michael A. Bender, Martin Farach-Colton, Rob Johnson, Russell Kraner, Bradley C. Kuszmaul, Dzejla Medjedovic, Pablo Montes, Pradeep Shetty, Richard P. Spillane, and Erez Zadok. “Don’t Thrash: How to Cache Your Hash on Flash.” PVLDB 5(11):1627-1637, 2012.
An earlier version of the paper appeared at …
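For readers who have not met the underlying idea, here is a minimal sketch of a classic in-memory Bloom filter. It is not the SSD-friendly structure from the paper; it only shows the membership-query contract that such structures share: an answer of "no" is always correct, while "maybe" can be a false positive. The class name, the SHA-256-based double hashing, and the parameters are illustrative choices.

```python
import hashlib


class BloomFilter:
    """Classic Bloom filter: k bit positions per key in an m-bit array.
    May report false positives, never false negatives."""

    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key):
        # Derive k positions from one SHA-256 digest by double hashing:
        # position_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd, so never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # True only if every one of the key's k bits is set.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


bf = BloomFilter(m_bits=2048, k_hashes=4)
keys = [f"key-{i}" for i in range(100)]
for k in keys:
    bf.add(k)
```

Every inserted key is reported as present. Note that each lookup touches k essentially random bit positions, which is cheap in RAM but becomes k random reads if the bit array lives on flash.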
A few weeks ago Bradley Kuszmaul and I attended the Dagstuhl Seminar on Database Workload Management.
The Dagstuhl computer science research center is located in the remote countryside of Saarland, Germany. The building itself is an 18th-century manor house that was first converted into an old-age home and later into a computer science research center. Workshop participants typically spend the whole week talking and working together.
(Photo: Dagstuhl Computer Science Center)
Shivnath Babu (Duke University), Goetz Graefe (Hewlett Packard), and Harumi Kuno (Hewlett Packard) did a great job organizing. …
In April, I got to give a talk at Percona Live about why The Right Read Optimization is Actually Write Optimization. It was my first industry talk, so I was delighted when someone in the audience said, “I feel like I just earned a college credit.”
Box offered to host everyone’s slides from the conference here (mine are here). A big thanks to Sheeri Cabral for recording my talk and posting it online!
The focus of the talk …
Master/slave replication is an important tool that gets used in many ways: distributing read loads among many slaves for performance, using a slave for backups so the master can handle live load, providing geographically distributed disaster recovery, etc. The Achilles’ heel of slave performance is that the slave’s workload is single-threaded. The master can have many clients inserting, updating, and querying, whereas the slave has only one insertion client: the master. InnoDB’s single-client throughput is much lower than its multi-client throughput, which means that the bottleneck in a master/slave system is often the rate at which a slave can keep up.
If the master’s average rate of transactions per second (tps) is higher than the slave can sustain, the slave will fall further and further behind. If the slaves are being used to distribute read workload, for example, the results they produce will fall further out of date. If a slave is used to …
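The "further and further behind" effect is simple arithmetic when both rates are constant: the unapplied backlog grows by (master tps minus slave tps) every second, and the slave's staleness is that backlog divided by the master's rate, i.e. how many seconds of master work it has not yet replayed. The sketch below assumes constant rates and a slave that starts fully caught up; the function name and the example rates are hypothetical.

```python
def slave_staleness(master_tps, slave_tps, elapsed_s):
    """Return (backlog in transactions, staleness in seconds) after elapsed_s
    seconds, assuming constant rates and a slave that started caught up."""
    # Transactions the master has committed that the slave has not yet applied.
    backlog = max(0.0, (master_tps - slave_tps) * elapsed_s)
    # Seconds of master activity those unapplied transactions represent.
    staleness = backlog / master_tps if backlog else 0.0
    return backlog, staleness


one_hour = slave_staleness(1200, 1000, 3600)
two_hours = slave_staleness(1200, 1000, 7200)
fast_slave = slave_staleness(900, 1000, 3600)
```

With a hypothetical master at 1,200 tps and a slave limited to 1,000 tps, the slave is 600 seconds behind after one hour and 1,200 seconds behind after two; the lag grows without bound. If the slave can apply transactions faster than the master produces them, the backlog stays at zero.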
I’ll be speaking on April 11th at 4:30 pm in Room 4 at the Percona Conference and Expo. The topic will be “Creating a Benchmark Infrastructure That Just Works.”
Throughout my career I’ve been involved with maintaining the performance of database applications, and I have therefore created many benchmark frameworks. At Tokutek, an important part of my role is measuring the performance of our storage engine over time and against competing solutions. There is nothing proprietary about what I’ve created; it can be used anywhere.
My presentation will cover how I created the benchmark infrastructure at Tokutek:
- Hardware and software …
iiBench measures the rate at which a database can insert new rows while maintaining several secondary indexes. We ran this for 1 billion rows with TokuDB and InnoDB starting last week, right after we launched TokuDB v5.2. While TokuDB completed it in 15 hours, InnoDB took 7 days.
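The quantity iiBench reports, the insertion rate while secondary indexes are maintained, can be illustrated with a small stand-in built on Python's bundled SQLite. This is not iiBench: the table, the three index definitions, and the function name are hypothetical, and an in-memory SQLite database will not reproduce the disk-bound slowdown that appears once indexes outgrow RAM. It only shows the measurement loop, inserting in batches and recording rows per second for each batch so the rate can be tracked as the table grows.

```python
import random
import sqlite3
import time


def insert_benchmark(total_rows, batch_size=1000, seed=42):
    """Insert random rows into a table with three secondary indexes and
    return (final row count, per-batch insertion rates in rows/second)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (id INTEGER PRIMARY KEY, ts INTEGER, "
        "customer_id INTEGER, product_id INTEGER, price REAL)"
    )
    # Hypothetical secondary indexes that every insert must also update.
    conn.execute("CREATE INDEX ix_price_customer ON purchases (price, customer_id)")
    conn.execute("CREATE INDEX ix_ts_customer ON purchases (ts, customer_id)")
    conn.execute("CREATE INDEX ix_price_ts ON purchases (price, ts)")

    rates = []
    for start in range(0, total_rows, batch_size):
        n = min(batch_size, total_rows - start)
        rows = [
            (start + i, rng.randrange(10**6), rng.randrange(10**5),
             rng.randrange(10**4), rng.random() * 500)
            for i in range(n)
        ]
        t0 = time.perf_counter()
        with conn:  # one transaction per batch, committed on exit
            conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?, ?)", rows)
        elapsed = time.perf_counter() - t0
        rates.append(n / elapsed if elapsed > 0 else float("inf"))

    count = conn.execute("SELECT COUNT(*) FROM purchases").fetchone()[0]
    conn.close()
    return count, rates


count, rates = insert_benchmark(5000, batch_size=1000)
```

Plotting `rates` against rows inserted gives the kind of curve the benchmark produces; in a real run against an on-disk engine, that curve is where a B-tree's rate falls off once its indexes no longer fit in memory.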
The results are shown below. At the end of the test, TokuDB’s insertion rate remained at 17,028 inserts/second whereas InnoDB had dropped to 1,050 inserts/second. That is a difference of over 16x. Our complete set of benchmarks for TokuDB v5.2 can be found here.
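The figures quoted above can be cross-checked with a little arithmetic. The run times ("15 hours", "7 days") are the rounded values given in the text, so the whole-run averages computed here are approximate.

```python
ROWS = 1_000_000_000
SECONDS_PER_HOUR = 3600

tokudb_hours = 15
innodb_hours = 7 * 24  # 7 days

# Average insertion rate over the whole run, in rows per second.
tokudb_avg = ROWS / (tokudb_hours * SECONDS_PER_HOUR)
innodb_avg = ROWS / (innodb_hours * SECONDS_PER_HOUR)

# Ratio of the end-of-test insertion rates, and of total wall-clock time.
final_ratio = 17_028 / 1_050
wall_clock_ratio = innodb_hours / tokudb_hours
```

The end-of-test ratio works out to about 16.2x, matching the "over 16x" figure. Over the whole run, TokuDB averaged roughly 18,500 inserts/second against roughly 1,650 for InnoDB, and finished in about one eleventh of the wall-clock time.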
…
TokuDB® v5.2, the latest version of Tokutek’s flagship storage engine for MySQL and MariaDB, is now available.
This version offers performance enhancements over previous releases, especially for multi-client scale-up and point queries, and extends the set of ALTER TABLE operations that are non-blocking, most notably by adding Hot Column Rename.
TokuDB v5.2 maintains all our established advantages: fast trickle load, fast bulk load, fast range queries through clustering indexes, hot schema changes, great compression, no fragmentation, and full MySQL compatibility for ease of installation. See our benchmark page for details.
Multi-client workloads
In TokuDB v5.2, we have reworked our locking scheme to better support multi-client workloads, and as always, we have focused on large databases. How did we do? Let’s check out some …