Showing entries 61 to 70 of 117
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: blogging
Introducing Multiple Clustering Indexes

In this posting I’ll describe TokuDB’s multiple clustering index feature.  (This posting is by Zardosht.)

In general (not just for TokuDB), a clustered index or clustering index is an index that stores all of the data for the rows.  Quoting the MySQL 5.1 reference manual:

Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record.

Most storage engines allow at most one clustered index per table. For example, MyISAM does not support clustered indexes at all, whereas InnoDB allows only the primary key to be a clustered index.
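To make the difference concrete, here is a minimal Python sketch of a toy in-memory model. It is hypothetical and is not TokuDB's or InnoDB's actual storage layout. A non-clustering secondary index stores only the indexed value and the primary key, so every match costs an extra fetch of the row. A clustering index stores a copy of the whole row next to the key. The names `ROWS`, `NonClusteringIndex`, and `ClusteringIndex` are illustrative assumptions.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy table: primary key -> full row.
ROWS = {
    1: {"id": 1, "customer": "alice", "amount": 30},
    2: {"id": 2, "customer": "bob", "amount": 10},
    3: {"id": 3, "customer": "alice", "amount": 25},
}


class NonClusteringIndex:
    """Stores (value, primary key) pairs; the row itself lives elsewhere."""

    def __init__(self, rows, column):
        self.rows = rows  # the primary storage, keyed by primary key
        self.entries = sorted((row[column], pk) for pk, row in rows.items())
        self.keys = [value for value, _ in self.entries]

    def lookup(self, value):
        """Return (matching rows, number of extra row fetches)."""
        lo, hi = bisect_left(self.keys, value), bisect_right(self.keys, value)
        # Each match needs a second lookup in the primary storage.
        matches = [self.rows[pk] for _, pk in self.entries[lo:hi]]
        return matches, len(matches)


class ClusteringIndex:
    """Stores (value, primary key, copy of the full row) triples."""

    def __init__(self, rows, column):
        # Primary keys are unique, so sorting never has to compare the row dicts.
        self.entries = sorted((row[column], pk, dict(row)) for pk, row in rows.items())
        self.keys = [value for value, _, _ in self.entries]

    def lookup(self, value):
        """Return (matching rows, number of extra row fetches), which is always 0."""
        lo, hi = bisect_left(self.keys, value), bisect_right(self.keys, value)
        return [row for _, _, row in self.entries[lo:hi]], 0


rows, fetches = NonClusteringIndex(ROWS, "customer").lookup("alice")
print([r["id"] for r in rows], fetches)  # [1, 3] 2
rows, fetches = ClusteringIndex(ROWS, "customer").lookup("alice")
print([r["id"] for r in rows], fetches)  # [1, 3] 0
```

The trade-off, in this toy as in real engines, is that each clustering index keeps another copy of every row, which costs space and extra work on every insert or update.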

Publications Related to Fractal Tree Indexing

The TokuDB storage engine for MySQL employs Fractal Tree technology.  We’ve been planning to write a white paper explaining how Fractal Tree indexing works, but haven’t gotten to it yet.  In the meantime, here are links to some academic papers that relate to our technology.


  • Cache-Oblivious B-Trees by Michael A. Bender, Erik D. Demaine and Martin Farach-Colton in SICOMP 35:2, pp. 341-358, 2005.  An early version of this paper appeared in FOCS in 2000.
  • The Cost of Cache-Oblivious Searching by Michael A. Bender, Gerth Stølting Brodal, Rolf Fagerberg, Dongdong Ge, Simai He, Haodong Hu, John Iacono, and Alejandro López-Ortiz in FOCS 2003 p. 271.
Covering Indexes: Orders-of-Magnitude Improvements

The talk I gave at the Percona Performance Conference at the MySQL
Users Conference in April 2009 can be found
at http://tokutek.com/images/blog/mysqluc09/kuszmaul-mysqluc-percona-09-slides.pdf.

This talk provides some examples where covering indexes help, and
then describes a performance model that can be used to understand and
predict query performance.  It covers clustering indexes (which are a
kind of “universal” covering index), and describes the asymptotic
performance of Fractal Tree indexing (but sorry, it doesn’t yet
explain how Fractal Tree indexes work). We’re working on writing a
white paper to explain how they work, but we’ve simply been too
busy.  The talk concludes with the graph (shown above) that
illustrates iiBench …
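The definition of a covering index fits in a few lines of Python. This is a hypothetical toy, not MySQL's optimizer: an index covers a query when every column the query reads is stored in the index, so the query never has to visit the base rows. A clustering index stores every column and therefore covers any query on its table, which is the sense in which it is a "universal" covering index. The table, column, and index names are made up for illustration.

```python
# Hypothetical table and indexes; names are illustrative only.
TABLE_COLUMNS = ("id", "customer", "amount", "note")

INDEXES = {
    "idx_customer": ("customer", "id"),                   # ordinary secondary index
    "idx_customer_amount": ("customer", "amount", "id"),  # wider, covers more queries
    "clu_customer": TABLE_COLUMNS,                        # clustering index: every column
}


def covers(index_columns, query_columns):
    """An index covers a query if every column the query reads is in the index."""
    return set(query_columns) <= set(index_columns)


def first_covering_index(indexes, query_columns):
    """Return the name of the first index that covers the query, or None."""
    for name, columns in indexes.items():
        if covers(columns, query_columns):
            return name
    return None


# SELECT SUM(amount) FROM t WHERE customer = 'alice' reads customer and amount.
print(first_covering_index(INDEXES, ["customer", "amount"]))  # idx_customer_amount
# A query that also reads `note` is covered only by the clustering index.
print(first_covering_index(INDEXES, ["customer", "note"]))    # clu_customer
```

A real optimizer chooses among covering indexes by estimated cost rather than by taking the first match; the sketch only shows the coverage test itself.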

Presenting and blogging in Chinese

Travelling to Hong Kong and Taipei has made such an impression on me that I couldn’t help but add two new blogs to my homepage kaj.arno.fi:

Guanxi means “relations”, as in “Community Relations”. It’s also a very common word describing how to get things done in China. It even has its own English language Wikipedia entry.

Yi-ling-yi means one-oh-one, as in Taipei 101. …

The Depth of a B-tree

Shlomi Noach recently wrote a useful primer on the depth of B-trees and how that plays out for point queries, both in clustered indexes, like InnoDB, and in unclustered indexes, like MyISAM.  Here, I’d like to talk about the effect of B-tree depth on insertions and range queries.  And, of course, I’ll talk about alternatives like Fractal Trees, since that’s the basis of Tokutek’s storage engine for MySQL.

Please see Shlomi’s post for details, but I can summarize a few points, partly because I need some vocabulary for the points I’d like to make below.  Shlomi notes that there are two main features determining the depth of a B-tree (or B+-tree):


  1. The number of rows in the database.  We’ll call that N.
  2. The size of the indexed key.  Let’s …
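A rough way to see how N and the key size interact: if each internal node holds about block_size / (key_size + pointer_size) entries, the depth is the smallest d with fanout^d ≥ N, that is ⌈log_fanout N⌉. The Python sketch below assumes full nodes, a 16 KiB block, and an 8-byte child pointer; these are illustrative assumptions, not figures from either post, and real nodes are usually only partly full.

```python
def fanout(block_size, key_size, pointer_size=8):
    """Entries per node if each entry is one key plus one child pointer (assumed layout)."""
    return max(2, block_size // (key_size + pointer_size))


def btree_depth(n_rows, block_size=16 * 1024, key_size=8, pointer_size=8):
    """Smallest depth d with fanout**d >= n_rows, i.e. ceil(log_fanout(n_rows)).

    Uses integer arithmetic to avoid floating-point rounding in the logarithm.
    """
    f = fanout(block_size, key_size, pointer_size)
    depth, reach = 1, f
    while reach < n_rows:
        reach *= f
        depth += 1
    return depth


print(fanout(16 * 1024, 8))                  # 1024
print(btree_depth(10**9))                    # 3
print(btree_depth(10**9, key_size=120))      # 5
```

Under these assumptions an 8-byte key gives a fanout of 1024, so a billion rows fit in 3 levels, while a 120-byte key drops the fanout to 128 and needs 5 levels.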
High Anxiety Whenever You’re Near

Every time I visit the Sun Santa Clara Campus, I’m reminded of Mel
Brooks’s movie “High Anxiety”.  The campus was known as The Great
Asylum for the Insane in the 19th century, and even includes a tower. 

High Anxiety,
whenever you’re near.
High Anxiety,
it’s you that I fear.

I went to the MySQL Storage Engine (SE) Summit held on the Sun
campus in Santa Clara.  I thought it was a great meeting, and many
thanks to Sanjay for inviting us.  Also attending from Tokutek were
Zardosht and Tom.  We heard interesting points of view from SE
implementers such as Akiba, ScaleDB, InnoDB, PBXT, and Virident, as
well as from the Sun/MySQL implementers.  Here are a few highlights:

Everyone agrees that the Storage Engine (SE) API needs better
documentation.

The InnoDB team suggested that one approach …

TokuDB Storage Engine for MySQL

Tokutek officially announced the TokuDB for MySQL v2.0 Storage Engine on April 16th, 2009.  TokuDB uses Fractal Tree (TM) technology to boost MySQL performance for users challenged with interactive querying in high-volume, always-on applications.  As a pure software storage engine, TokuDB provides drop-in compatibility for existing MySQL code and applications.  Curt Monash posted an introduction to Fractal Tree technology over on Monash Research’s DBMS2 blog.  TokuDB is …

Improving TPC-H-like Queries - Q2

Posted by: Bradley C. Kuszmaul and David Wells

Executive Summary: A MySQL straight join can speed up a query that is very similar to TPC-H Q2 by a factor of 159 on MySQL.

Recently, we began looking at TPC-H performance on MySQL. Our early tests yielded unexpectedly poor performance for MyISAM, InnoDB and the Tokutek storage engine. So we decided to take a look at each query individually to see what could be done. This post is about Query 2.

Before going further, let us be clear - this is NOT "TPC-H" benchmarking. The TPC prescribes methods and procedures for measuring performance, and we didn't follow the rules (which you can read at …

MySQL insert performance with iiBench Python client

Mark Callaghan recently developed and released an enhanced Python version of Tokutek’s iiBench benchmark (thanks, Mark!).  We’re happy to see a Python version of the benchmark, as it can now be run more easily by a broader group of people in more diverse environments.  Going forward, we will continue building upon Mark’s work on the Python version.  In addition to porting iiBench to Python, Mark added query capabilities to it, functionality that we were also planning to add.  We will test and discuss query performance in a future post.

Given our focus on overall performance, we tested the insert performance of the Python version (iiBench.py) and found that the resulting numbers were about 30% lower than with …
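For readers who want to try the shape of an insert benchmark locally, here is a much-simplified, hypothetical stand-in written against Python's built-in sqlite3 module rather than MySQL. It is not iiBench or iiBench.py: it only inserts batches of random integers into a table with a few secondary indexes and reports rows per second. The table layout, batch size, and the function name `insert_rate` are assumptions made for illustration.

```python
import random
import sqlite3
import time


def insert_rate(n_rows=20_000, batch=1_000, n_secondary=3, seed=42):
    """Insert random rows into a table with secondary indexes.

    Returns (rows in table, rows inserted per second). Hypothetical sketch,
    not iiBench: an in-memory SQLite table with one index per extra column.
    """
    if n_secondary < 1:
        raise ValueError("need at least one secondary-indexed column")
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    names = [f"c{i}" for i in range(n_secondary)]
    column_defs = ", ".join(f"{n} INTEGER" for n in names)
    conn.execute(f"CREATE TABLE t (id INTEGER PRIMARY KEY, {column_defs})")
    for n in names:
        # Every insert must also update each of these secondary indexes.
        conn.execute(f"CREATE INDEX idx_{n} ON t ({n})")
    sql = f"INSERT INTO t ({', '.join(names)}) VALUES ({', '.join('?' * len(names))})"

    start = time.perf_counter()
    inserted = 0
    while inserted < n_rows:
        k = min(batch, n_rows - inserted)
        rows = [tuple(rng.randrange(1 << 30) for _ in names) for _ in range(k)]
        with conn:  # one transaction per batch, committed on exit
            conn.executemany(sql, rows)
        inserted += k
    elapsed = time.perf_counter() - start

    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return count, count / elapsed


count, rate = insert_rate()
print(count, f"{rate:,.0f} rows/s")
```

Numbers from an in-memory SQLite database say nothing about MySQL storage engines; the sketch only shows the measurement pattern: fixed-size batches, one transaction per batch, and secondary indexes that must be maintained on every insert.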

Consolidating Blogs

I'm consolidating my various blogs into a common blog, the Database Geek. I'm still tweaking the site, and I plan to add links to categories (e.g., Oracle, Postgres, MySQL) so that it is easily searchable. I also plan to add an RSS feed for each category so you can read only the topics that interest you. Stop by and check it out.

Any comments or feedback are appreciated.

LewisC

