According to the article, “Fractal Tree indexing is helping organizations analyze big data more efficiently due to its ability to improve database efficiency thanks to faster ‘database insertion speed, quicker input/output performance, operational agility, and data compression.’” As a start-up based on “the first algorithm-based breakthrough in the database world in 40 years,” Toktuetek is following in the footsteps of firms such as Google and RSA, which also relied on novel algortithm advances as core to their technology.
To read the full article, and[Read more...]
We are excited to announce TokuDB® v6.5, the latest version of Tokutek’s flagship storage engine for MySQL and MariaDB.
This version offers optimization for Flash as well as more hot schema change operations for improved agility.
We’ll be posting more details about the new features and performance, so here’s an overview of what’s in store.Flash TokuDB v6.5 continues the great Toku-tradition of fast insertions. On flash drives, we show an order-of-magnitude (9x) faster insertion rate than InnoDB. TokuDB’s standard compression works just as well on flash and helps you get the most out of your [Read more...]
The tutorial, which is 4 hours on Monday afternoon, aims to cover the following topics (but it’s looking like we’ll have to drop several items for lack of time.)
This tutorial will explore data structures and algorithms for big databases. The topics include:
- Data structures including B-trees, Log Structured Merge Trees, and Streaming B-trees.
- Approximate Query Membership data structures including Bloom filters and cascade filters.
- Algorithms for join including hash joins and
At FROSCON I’ll be talking about fast data structures for maintaining indexes. The talk will share some content with my upcoming MySQL Connect talk.
At VLDB, Dzejla Medjedovic will be presenting a talk on our paper on SSD-friendly Bloom-filter-like data structures. The paper is
Michael A. Bender, Martin Farach-Colton, Rob Johnson, Russell Kraner, Bradley C. Kuszmaul, Dzejla Medjedovic, Pablo Montes, Pradeep Shetty, Richard P. Spillane, and Erez Zadok.
Don’t Thrash: How to Cache Your Hash on Flash. PVLDB 5(11):1627-1637, 2012.
An earlier version of the paper appeared at[Read more...]
A few weeks ago Bradley Kuszmaul and I attended the Dagstuhl Seminar on Database Workload Management.
The Dagstuhl computer science research center is (remotely) located in the countryside in Saarland, Germany. The actual building is an 18th Century Manor House, first retooled as an old-age home, and then a computer science research center. Workshop participants typically spend the whole week talking and working together.
Dagstuhl Computer Science Center[Read more...]
This past week I attended OSCon, the annual conference for open source’s true believers. And there was a religious fervor in the air, particularly from the point of view of someone more accustomed to Oracle conferences.
And if open source is the religion, proprietary closed-source companies are the devil. That having been said, I was surprised how virtually all large companies were demonized. Even long-time defenders of open source like IBM were ignored at best. That didn’t prevent them from coming though, with Microsoft and HP in particular with high-profile sponsorships and PR offensives that didn’t seem to have much influence with the crowd.
The companies generating buzz were the small companies built around development of their own open source products. There are a surprising number of them out[Read more...]
Three rules on making indexes around queries to provide good performance
Application performance often depends on how fast a query can respond and query performance almost always depends on good indexing. So one of the quickest and least expensive ways to increase application performance is to optimize the indexes. This talk presents three simple and effective rules on how to construct indexes around queries that result in good performance.
Time: 2PM EDT / 11AM PDT
This webinar is a general discussion applicable to all databases using indexes and is not specific to any particular MySQL® storage engine[Read more...]
"Why the days are numbered for Hadoop as we know it"I know GigaOM like to provoke scandals sometimes, we all remember some other unforgettable piece, but there is something behind it...
Solving the Challenges of Big Databases with MySQL
When you’re using MySQL for big data (more than ten times as large as main memory), these challenges often arise: loading data fast; maintaining indexes under insertions deletions, and updates; adding and removing columns online; adding indexes online; preventing slave lag; and compressing data effectively.
This session shows why some of these challenges are difficult to solve with storage engines based on B-trees, how Fractal Tree® data structures work, and why they can help solve these problems. Tokutek sells a transaction-safe Fractal Tree storage engine for MySQL, but the presentation is primarily about the underlying technology. It includes a discussion of both the theoretical and practical aspects of Fractal Tree indexes.
I have the privilege of being able to give[Read more...]
Table optimization is a necessary evil; tables sometimes need to be optimized to reclaim space or to improve query performance. Unfortunately, MySQL blocks writes to a table while it is being optimized. Because optimization time is proportional to the table size, writes can be blocked for a long time. Fractal Tree indexes support online optimization; however, the MySQL metadata lock gets in the way of writing while optimizing. We will describe a simple patch to MySQL that enables online optimization of TokuDB tables.
Why do tables need to be optimized? Here are some reasons.
Fast indexing requires the leaves of a Fractal Tree® Index to be big. But some queries require the leaves to be small in order to get any reasonable performance. Basements nodes are our way to achieve these conflicting goals, and here I’ll explain how.
On many occasions, we at Tokutek have pointed out that TokuDB is write optimized, which means TokuDB indexes data much faster than a B-tree solution such as InnoDB. As with any write-optimized data structure, Fractal Tree indexes need to bundle up lots of small writes into a few big writes. Otherwise, there’d be no way to beat a B-tree. So the question is, how big do the writes have to be?
Consider how long it takes to write k bytes to a disk. First, there is the seek time s, which we can assume to be independent of k.[Read more...]
The signal-to-noise ratio in the NoSQL world has made it hard to figure out what’s going on, or even who has something new. For all the talk of performance in the NoSQL world, much of the most exciting part of what’s new is really not about performance at all.
Take for example, MongoDB, which has a really great data model and MapReduce has a very handy scripting language. These are genuine and probably long-lasting contributions. Their innovation is all about finding a new language to use for interacting with data. They are about NoSQL.
The confusion comes, for me, when we get to the performance side of the equation. I explore this in detail in an article I did for Datanami recently – http://www.datanami.com/datanami/2012-05-22/the_sound_and_the_nosql_fury.html.
Martin Farach-Colton and I ran a Tutorial on Algorithms for Memory Sensitive Computing on May 18th at the 44th ACM Symposium on Theory of Computing (STOC) at NYU. Here is the program for the tutorial.
Erik Demaine (MIT) spoke on the History of I/O Models. Throughout the years, a remarkable variety of computational models have been proposed to explain the effects of caching, data locality, prefetching, and single-and multi-level memory hierarchies. Erik traced the intellectual history and connections between these models. Most approaches[Read more...]
"Dell announced a prototype low-power server with ARM processors, following a growing demand by Web companies for custom-built servers that can scale performance while reducing financial overhead on data centers"In short, ARM (see Wikipedia definition here) is an architecture standard for processors. ARM processors are slower compared to good old x86 processors from Intel and AMD, but have power-efficiency, density and price attributes that intrigue [Read more...]
TokuDB® is a proven solution that scales MySQL® and MariaDB® from GBs to TBs with unmatched insert and query speed, compression, and online schema flexibility.
Tokutek’s recently launched TokuDB v6 delivers all of these features and more, with the introduction of high performance replication for MySQL and MariaDB. TokuDB v6 eliminates the common and persistent problem of “slave lag” in which a replication server is unable to keep up with the query load borne by the master server. TokuDB v6 solves this by offering high ingestion rates at the slave.[Read more...]
Many database management tasks become difficult as you move from millions of rows and gigabytes of data to billions of rows and terabytes of data. Such tasks include ingesting data while maintaining indexes; changing schemas without downtime; and supporting connections, replication, and backup. For some scaling problems (connections and replication), MySQL® is better than most of the competition. For others, such as indexing, schema changes, and backup, MySQL has typically been harder to use. Fortunately, the tasks MySQL does well are in its core, whereas the tasks that are more difficult can be solved with storage engine plug-ins.
I recently gave a talk at[Read more...]
Tackling machine data on the ground to ensure successful operations for NASA in space
The Company: Southwest Research Institute (SwRI) is an independent, nonprofit applied research and development organization. The staff of more than 3,000 specializes in the creation and transfer of technology in engineering and the physical sciences. Currently, SwRI is part of an international team working on the NASA[Read more...]
In April, I got to give a talk at Percona Live, about why The Right Read Optimization is Actually Write Optimization. It was my first industry talk, so I was delighted when someone in the audience said “I feel like I just earned a college credit.”[Read more...]