KAYAK, the world’s leading travel search engine, is using TokuDB for MySQL to provide a more personalized user experience. Read all about it in today’s press release.
Tokutek is pleased to announce the general availability of TokuDB for MySQL, version 2.2.0. This version offers several improvements:
- Better multi-core load balancing for concurrent workloads.
- Faster bulk loading performance.
- Enhanced diagnostics for easier tuning and troubleshooting.
- Fixed all known bugs.
About TokuDB
TokuDB for MySQL is a storage engine built with Tokutek’s Fractal Tree technology. TokuDB provides near seamless compatibility for MySQL applications. Tables can be individually defined to use TokuDB, MyISAM, InnoDB or other MySQL-compliant storage engines. Data is loaded, inserted, and queried using standard MySQL commands, with no restrictions or special requirements. Our Fractal Tree technology indexes up to 50 times faster than traditional database technologies, enabling near …
At Tokutek, Rich Prohaska used Gearman to automate our nightly build and test process for TokuDB for MySQL. Rich is busy working on TokuDB, so I’m writing up an overview of the build and test architecture on his behalf.
Build and Test Process
Rich created a script, nightly.bash, that gets kicked off every night as a cron job. The script creates a separate Gearman job for each build target. We have a separate build target (unique binary) for each combination of operating system (e.g. Linux, Windows, etc.) and hardware architecture (e.g. i686, x86_64) supported by TokuDB. As we support more operating systems over time, the number of build targets grows quickly, so we needed a build and test architecture that scales, and Gearman makes it easy.
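To illustrate how the number of build targets grows, here is a minimal Python sketch that enumerates one job per operating system and architecture pair. It is not nightly.bash: the lists, the `make_build_jobs` name, and the JSON payload format are hypothetical, and actually submitting the jobs to Gearman is left out.

```python
import json
from itertools import product

# Hypothetical target lists, for illustration only.
OPERATING_SYSTEMS = ["linux", "windows"]
ARCHITECTURES = ["i686", "x86_64"]


def make_build_jobs(operating_systems, architectures):
    """Return one job payload (a JSON string) per OS/architecture pair."""
    return [
        json.dumps({"os": os_name, "arch": arch})
        for os_name, arch in product(operating_systems, architectures)
    ]


jobs = make_build_jobs(OPERATING_SYSTEMS, ARCHITECTURES)
for payload in jobs:
    # A real script would hand each payload to a Gearman client here.
    print(payload)
```

Each additional operating system adds one job per supported architecture, which is why the job count climbs quickly.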
Gearman then automatically distributes the build jobs to a set of systems set up as …
We often hear from customers and MySQL experts that fragmentation causes problems such as wasting disk space, increasing backup times, and degrading performance. Typical remedies include periodic “optimize table” or dump and re-load (for example, see Project Golden Gate). Unfortunately, these techniques impact database availability and/or require additional administrative cost and complexity. Tokutek’s Fractal Tree algorithms do not cause fragmentation, and we’re looking for ways to measure the effects of fragmentation to quantify TokuDB’s benefits.
I ran some tests using the iiBench benchmark as an experiment to try to quantify the impact of fragmentation, and observed some …
I saw Mark Callaghan’s post, and his graph showing miss rate as a function of cache size for MySQL running InnoDB. He plots miss rate against cache size and compares it to two simple models:
- A linear model where the miss rate is (1-C/D)/50, and
- An inverse-proportional model where the miss rate is D/(1000C).
He seemed happy (and maybe surprised) that the linear model is a bad match and that the inverse-proportional model is a good match. The linear model is the one that would make sense if every page were equally likely to have a hit.
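The two models are easy to compare numerically. The sketch below is only an illustration: it assumes C is the cache size and D is the data size in the same units, which the excerpt does not spell out, and the function names are made up.

```python
def linear_miss_rate(cache_size, data_size):
    """Linear model from the post: (1 - C/D) / 50."""
    return (1 - cache_size / data_size) / 50


def inverse_miss_rate(cache_size, data_size):
    """Inverse-proportional model from the post: D / (1000 * C)."""
    return data_size / (1000 * cache_size)


if __name__ == "__main__":
    data_size = 100  # assumed units; only the ratio C/D matters
    for cache_size in (10, 25, 50, 100):
        print(
            cache_size,
            linear_miss_rate(cache_size, data_size),
            inverse_miss_rate(cache_size, data_size),
        )
```

In the linear model the miss rate falls by the same amount for every unit of cache added, while in the inverse-proportional model doubling the cache halves the miss rate.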
I’ll argue here that it’s not so surprising. Suppose that miss rate has a heavy-tailed distribution, such as Zipf’s law. An example of a Zipf’s-law distribution would be if …
We’re supporting the OpenSQL Camp, which will be held in Portland on November 14.
One of my objectives for the camp is to make progress on a universal storage engine API, to make it possible to use the same storage engines in MySQL, PostgreSQL, Ingres, or any other database. I’m also looking forward to hearing other people’s great ideas.
After OpenSQL Camp, I’ll be attending Supercomputing’09. Supercomputing and database hardware technology seem to be converging. Many of the fastest databases today look like a supercomputer with disks attached. Will there be other kinds of convergence? For example, what kind of convergence will we see between multicore computing and cluster computing? Today we program multicore machines very differently from clusters. I think in the future that difference will vanish.
Sorting a Terabyte in 197 seconds
I just returned from The 21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), held in Calgary, where I gave a talk about my entry to the sorting contest. I sorted 1TB in 197s on a 400-node machine at MIT Lincoln Laboratory, a record which still stands today. (And it will likely remain standing, since terabyte sorting is now deprecated because it’s too fast. Now the challenge is to sort 100TB.)
For many years Jim Gray ran a sorting contest to see how fast anyone could sort a terabyte worth of 100-byte records, how much data could be sorted in one minute, and how much data could be sorted for a penny. After Jim’s disappearance at sea in January 2007, a committee formed …
Tokutek® announces the release of the TokuDB storage engine for MySQL®, version 2.1.0. This release offers the following improvements over our previous release:
- Faster indexing of sequential keys.
- Faster bulk loads on tables with auto-increment fields.
- Faster range queries in some circumstances.
- Added support for InnoDB.
- Upgraded from MySQL 5.1.30 to 5.1.36.
- Fixed all known bugs.
About TokuDB
TokuDB for MySQL is a storage engine built with Tokutek’s Fractal Tree technology. TokuDB provides near seamless compatibility for MySQL applications. Tables can be individually defined to use TokuDB, MyISAM, InnoDB® or other MySQL-compliant storage engines. Data is loaded, inserted, and queried using standard MySQL commands, with no restrictions or special requirements. …
In our last post, Bradley described how auto increment works in TokuDB. In this post, I explain one of our implementation’s big benefits, the ability to combine better primary keys with clustered primary keys.
In working with customers, the following scenario has come up frequently. The user has data that is streamed into the table, in order of time. The table will have a primary key that is an auto increment field, ‘id’, and then have an index on the field ‘time’. The user’s queries are all on some range of time (e.g. select sum(clicks) from foo where time > date '2008-12-19' and time < date '2008-12-20';).
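The scenario can be reproduced on a small scale. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for MySQL, so it shows the shape of the schema and query rather than TokuDB or InnoDB behavior. The index name `idx_time`, the sample rows, and the text encoding of the timestamps are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Auto-increment primary key 'id' plus a secondary index on 'time'.
conn.execute(
    """
    CREATE TABLE foo (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        time TEXT NOT NULL,
        clicks INTEGER NOT NULL
    )
    """
)
conn.execute("CREATE INDEX idx_time ON foo (time)")

# Rows arrive in time order, as in the streaming scenario.
rows = [
    ("2008-12-18 23:00:00", 5),
    ("2008-12-19 08:30:00", 7),
    ("2008-12-19 17:45:00", 11),
    ("2008-12-20 00:00:00", 13),
]
conn.executemany("INSERT INTO foo (time, clicks) VALUES (?, ?)", rows)

# Range query on time; the timestamps compare correctly as ISO-8601 text.
total = conn.execute(
    "SELECT SUM(clicks) FROM foo WHERE time > ? AND time < ?",
    ("2008-12-19", "2008-12-20"),
).fetchone()[0]
print(total)
```

Only the two rows dated 2008-12-19 fall inside the range, so the query sums their clicks.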
For storage engines with clustered primary keys (such as TokuDB and InnoDB), having such a schema hurts query performance. Queries do a range query on a secondary index (time), and then perform point queries …