This webinar covers the basics of B-trees and Fractal Tree Indexes, the benchmarks we’ve run so far, and the development road map going forward.
Date: November 13th
Time: 2 PM EST / 11 AM PST
Topics will include:
We look forward to having you join the webinar. We also hope that by sharing these results with[Read more...]
I’ll be presenting “MongoDB and Fractal Tree Indexes” at MongoDB Boston 2012 on October 24th. My presentation covers the basics of B-trees and Fractal Tree Indexes, the benchmarks we’ve run so far, and the development road map going forward.
I’ve been to this one day conference twice now and both times came away with a better understanding of MongoDB’s capabilities, use-cases, and many questions answered via their deep technical dives. I highly recommend current MongoDB users and anyone considering a MongoDB project attend – it appears that seats are still available.
The core technology behind Tokutek is based on the academic research by our founders: Michael Bender, Bradley Kuszmaul and Martin Farach-Colton. They are all still in academia, in addition to their work at Tokutek.
Back in March, the White House kicked off a new Initiative for Big Data. Last week, the National Science Foundation announced the first interagency grants for this. Eight awards were given, and our own Michael Bender and Martin Farach-Colton, along with Robert Johnson of Stony Brook University, received one of[Read more...]
According to the article, “Fractal Tree indexing is helping organizations analyze big data more efficiently due to its ability to improve database efficiency thanks to faster ‘database insertion speed, quicker input/output performance, operational agility, and data compression.’” As a start-up based on “the first algorithm-based breakthrough in the database world in 40 years,” Toktuetek is following in the footsteps of firms such as Google and RSA, which also relied on novel algortithm advances as core to their technology.
To read the full article, and[Read more...]
We are excited to announce TokuDB® v6.5, the latest version of Tokutek’s flagship storage engine for MySQL and MariaDB.
This version offers optimization for Flash as well as more hot schema change operations for improved agility.
We’ll be posting more details about the new features and performance, so here’s an overview of what’s in store.Flash TokuDB v6.5 continues the great Toku-tradition of fast insertions. On flash drives, we show an order-of-magnitude (9x) faster insertion rate than InnoDB. TokuDB’s standard compression works just as well on flash and helps you get the most out of your [Read more...]
The tutorial, which is 4 hours on Monday afternoon, aims to cover the following topics (but it’s looking like we’ll have to drop several items for lack of time.)
This tutorial will explore data structures and algorithms for big databases. The topics include:
- Data structures including B-trees, Log Structured Merge Trees, and Streaming B-trees.
- Approximate Query Membership data structures including Bloom filters and cascade filters.
- Algorithms for join including hash joins and
At FROSCON I’ll be talking about fast data structures for maintaining indexes. The talk will share some content with my upcoming MySQL Connect talk.
At VLDB, Dzejla Medjedovic will be presenting a talk on our paper on SSD-friendly Bloom-filter-like data structures. The paper is
Michael A. Bender, Martin Farach-Colton, Rob Johnson, Russell Kraner, Bradley C. Kuszmaul, Dzejla Medjedovic, Pablo Montes, Pradeep Shetty, Richard P. Spillane, and Erez Zadok.
Don’t Thrash: How to Cache Your Hash on Flash. PVLDB 5(11):1627-1637, 2012.
An earlier version of the paper appeared at[Read more...]
A few weeks ago Bradley Kuszmaul and I attended the Dagstuhl Seminar on Database Workload Management.
The Dagstuhl computer science research center is (remotely) located in the countryside in Saarland, Germany. The actual building is an 18th Century Manor House, first retooled as an old-age home, and then a computer science research center. Workshop participants typically spend the whole week talking and working together.
Dagstuhl Computer Science Center[Read more...]
This past week I attended OSCon, the annual conference for open source’s true believers. And there was a religious fervor in the air, particularly from the point of view of someone more accustomed to Oracle conferences.
And if open source is the religion, proprietary closed-source companies are the devil. That having been said, I was surprised how virtually all large companies were demonized. Even long-time defenders of open source like IBM were ignored at best. That didn’t prevent them from coming though, with Microsoft and HP in particular with high-profile sponsorships and PR offensives that didn’t seem to have much influence with the crowd.
The companies generating buzz were the small companies built around development of their own open source products. There are a surprising number of them out[Read more...]
Three rules on making indexes around queries to provide good performance
Application performance often depends on how fast a query can respond and query performance almost always depends on good indexing. So one of the quickest and least expensive ways to increase application performance is to optimize the indexes. This talk presents three simple and effective rules on how to construct indexes around queries that result in good performance.
Time: 2PM EDT / 11AM PDT
This webinar is a general discussion applicable to all databases using indexes and is not specific to any particular MySQL® storage engine[Read more...]
"Why the days are numbered for Hadoop as we know it"I know GigaOM like to provoke scandals sometimes, we all remember some other unforgettable piece, but there is something behind it...
Solving the Challenges of Big Databases with MySQL
When you’re using MySQL for big data (more than ten times as large as main memory), these challenges often arise: loading data fast; maintaining indexes under insertions deletions, and updates; adding and removing columns online; adding indexes online; preventing slave lag; and compressing data effectively.
This session shows why some of these challenges are difficult to solve with storage engines based on B-trees, how Fractal Tree® data structures work, and why they can help solve these problems. Tokutek sells a transaction-safe Fractal Tree storage engine for MySQL, but the presentation is primarily about the underlying technology. It includes a discussion of both the theoretical and practical aspects of Fractal Tree indexes.
I have the privilege of being able to give[Read more...]
Table optimization is a necessary evil; tables sometimes need to be optimized to reclaim space or to improve query performance. Unfortunately, MySQL blocks writes to a table while it is being optimized. Because optimization time is proportional to the table size, writes can be blocked for a long time. Fractal Tree indexes support online optimization; however, the MySQL metadata lock gets in the way of writing while optimizing. We will describe a simple patch to MySQL that enables online optimization of TokuDB tables.
Why do tables need to be optimized? Here are some reasons.
Fast indexing requires the leaves of a Fractal Tree® Index to be big. But some queries require the leaves to be small in order to get any reasonable performance. Basements nodes are our way to achieve these conflicting goals, and here I’ll explain how.
On many occasions, we at Tokutek have pointed out that TokuDB is write optimized, which means TokuDB indexes data much faster than a B-tree solution such as InnoDB. As with any write-optimized data structure, Fractal Tree indexes need to bundle up lots of small writes into a few big writes. Otherwise, there’d be no way to beat a B-tree. So the question is, how big do the writes have to be?
Consider how long it takes to write k bytes to a disk. First, there is the seek time s, which we can assume to be independent of k.[Read more...]
The signal-to-noise ratio in the NoSQL world has made it hard to figure out what’s going on, or even who has something new. For all the talk of performance in the NoSQL world, much of the most exciting part of what’s new is really not about performance at all.
Take for example, MongoDB, which has a really great data model and MapReduce has a very handy scripting language. These are genuine and probably long-lasting contributions. Their innovation is all about finding a new language to use for interacting with data. They are about NoSQL.
The confusion comes, for me, when we get to the performance side of the equation. I explore this in detail in an article I did for Datanami recently – http://www.datanami.com/datanami/2012-05-22/the_sound_and_the_nosql_fury.html.
Martin Farach-Colton and I ran a Tutorial on Algorithms for Memory Sensitive Computing on May 18th at the 44th ACM Symposium on Theory of Computing (STOC) at NYU. Here is the program for the tutorial.
Erik Demaine (MIT) spoke on the History of I/O Models. Throughout the years, a remarkable variety of computational models have been proposed to explain the effects of caching, data locality, prefetching, and single-and multi-level memory hierarchies. Erik traced the intellectual history and connections between these models. Most approaches[Read more...]