Home |  MySQL Buzz |  FAQ |  Feeds |  Submit your blog feed |  Feedback |  Archive |  Aggregate feed RSS 2.0 English Deutsch Español Français Italiano 日本語 Русский Português 中文
Showing entries 1 to 23

Displaying posts with tag: Fractal Trees (reset)

Thoughts on Small Datum – Part 1
+0 Vote Up -0Vote Down

A little background…

When I ventured into sales and marketing (I’m an engineer by education) I learned I would often have to interpret and simply summarize the business value that is sometimes hidden in benchmarks. Simply put, the people who approve the purchase of products like TokuDB® and TokuMX™ appreciate the executive summary.

Therefore, I plan to publish a multipart series here on TokuView where I will share my simple summaries and thoughts on business value for the benchmarks Mark Callaghan (@markcallaghan), a former Google and now Facebook database guru, is publishing on his blog, Small Datum.

I’m going to start with his first benchmark post and work my way forward to

  [Read more...]
Fast Updates with TokuDB
+4 Vote Up -0Vote Down

With TokuDB v6.6 out now, I’m excited to present one of my favorite enhancements: fast updates with TokuDB. Update intensive applications can have their throughput limited by the random read capacity of the storage system. The cause of the throughput limit is the read-modify-write algorithm that MySQL uses when processing update statements. MySQL reads a row from the storage engine, applies the updates to it, and then writes the new row to the storage engine. To address this throughput limit, TokuDB uses a different update algorithm that simply encodes the update expressions of the SQL statement into tiny programs that are stored in an update Fractal Tree® message. This update message is

  [Read more...]
268x Query Performance Increase for MongoDB with Fractal Tree Indexes, SAY WHAT?
+1 Vote Up -0Vote Down

Last week I wrote about our 10x insertion performance increase with MongoDB. We’ve continued our experimental integration of Fractal Tree® Indexes into MongoDB, adding support for clustered indexes.  A clustered index stores all non-index fields as the “value” portion of the index, as opposed to a standard MongoDB index that stores a pointer to the document data.  The benefit is that indexed lookups can immediately return any requested values instead of needing to do an additional lookup (and potential disk IOs) for the requested fields.

To create a clustered index you just need to add

  [Read more...]
Indexing: The Director’s Cut
+0 Vote Up -0Vote Down

Thanks again to Erin O’Neill and Mike Tougeron for having me at the SF MySQL Meetup last month for the talk on “Understanding Indexing.” The crowd was very interactive, and I appreciated that over 100 people signed up for the event and left some very positive comments and reviews.

Thanks to Mike, a video of the talk is now available:

As a brief overview – Application performance often depends on how fast a query can respond and query performance almost always depends on good indexing. So one of the quickest and least expensive ways to increase application performance is to optimize the indexes. This talk presents three simple and effective rules on how to construct

  [Read more...]
Don’t Thrash: How to Cache your Hash on Flash
+1 Vote Up -0Vote Down

Last week I gave a talk entitled “Don’t Thrash: How to Cache your Hash.” The talk took place at the Workshop on Algorithms and Data Structures (ADS) in a medieval castle turned conference center in Bertinoro, Italy. An earlier version of this work (with the same title) appeared at the HotStorage conference in Portland, OR. Tokutek co-founders Bradley, Martin, and I are coauthors on the work, along with students and other faculty at Stony Brook University.

The talk title is colorful and doggerel-y. Here’s what the title means. “Cache your hash”—the so-called Bloom Filter type data structure. A Bloom filter acts like

  [Read more...]
Don’t Thrash: How to Cache your Hash on Flash
+0 Vote Up -0Vote Down

Last week I gave a talk entitled “Don’t Thrash: How to Cache your Hash.” The talk took place at the Workshop on Algorithms and Data Structures (ADS) in a medieval castle turned conference center in Bertinoro, Italy. An earlier version of this work (with the same title) appeared at the HotStorage conference in Portland, OR. Tokutek co-founders Bradley, Martin, and I are coauthors on the work, along with students and other faculty at Stony Brook University.

The talk title is colorful and doggerel-y. Here’s what the title means. “Cache your hash”—the so-called Bloom Filter type data structure. A Bloom filter acts like a negative

  [Read more...]
Understanding Indexing – SF MySQL Meetup
+0 Vote Up -0Vote Down

At this week’s SF MySQL Meetup, I will give a talk: “Understanding Indexing: Three rules on making indexes around queries to provide good performance.” The meetup is 7 pm tomorrow (Wednesday, 6/22), and will be held at CBS Interactive (235 2nd St., San Francisco). Thanks to hosts Erin O’Neill and Mike Tougeron for the invitation and location.

Application performance often depends on how fast a query can respond and query performance almost always depends on good indexing. So one of the quickest and least expensive ways to increase application performance is to optimize the indexes. This talk presents three simple and effective rules on how to construct indexes around queries that result in good performance.

This is a general discussion applicable to all databases using indexes and is not

  [Read more...]
OldSQL Tricks or NewSQL Treats
+4 Vote Up -0Vote Down

Why do B-trees need “Tricks” to work?

Marko Mäkelä recently posted a couple of “tips and tricks” you can use to improve InnoDB performance. Tips and tricks. A general purpose relational database like MySQL shouldn’t need “tips and tricks” to perform well, and I lay the blame on design choices that were made in the early ’70s: the B-tree data structure underlying all OldSQL databases. B-trees were designed for machines that had very different performance characteristics than the machines of today. Hardware has changed, but B-trees are the same. Tips and Tricks are an attempt to make up the difference.

So B-tree implementers — InnoDB, Oracle, MS SQL Server — are fighting an uphill battle; they’re fighting the future. B-trees

  [Read more...]
Tokutek’s Chief Scientist Discusses TokuDB v5.0
+3 Vote Up -0Vote Down

Running with Big Data

It’s spring here in Boston, though one could hardly tell (still barely hitting 40°F). So, for those stuck indoors working out on the treadmill, or those lucky enough to do a workout outdoors, we’ve got a great podcast. Our Chief Scientist and co-founder Martín Farach-Colton had the privilege of sitting down with Sheeri Cabral and Sarah Novotny for their weekly MySQL Database Community Podcast (OurSQL Episode 39). In it, he speaks about Tokutek and TokuDB v5.0, which was just released last week (see

  [Read more...]
MySQL Partitioning: A Flow Chart
+2 Vote Up -4Vote Down

In Part 1, and Part 2 of this series, I presented some thoughts on partitioning. I heard some great feedback on why people use partitioning. Here, I present a flow chart that summarizes what I’ve learned. In summary: with TokuDB in the picture there’s almost no reason to use partitioning. Or I should say, there are almost always better (higher performing, more robust, lower maintenance) alternatives to partitioning.

Here goes:

  • Spindle contention? In other words, are you partitioning in order to spread your query work load across many disks? I’ve yet to see a compelling
  •   [Read more...]
    On “Replace Into”, “Insert Ignore”, Triggers, and Row Based Replication
    +0 Vote Up -0Vote Down

    In posts on June 30 and July 6, I explained how implementing the commands “replace into” and “insert ignore” with TokuDB’s fractal trees data structures can be two orders of magnitude faster than implementing them with B-trees. Towards the end of each post, I hinted at that there are some caveats that complicate the story a little. On July 21st I explained one caveat, secondary keys, and on August 3rd, Rich explained another caveat. In this

      [Read more...]
    On “Replace Into”, “Insert Ignore”, and Secondary Keys
    +0 Vote Up -0Vote Down

    In posts on June 30 and July 6, I explained how implementing the commands “replace into” and “insert ignore” with TokuDB’s fractal trees data structures can be two orders of magnitude faster than implementing them with B-trees. Towards the end of each post, I hinted at that there are some caveats that complicate the story a little. In this post, I explain one of the complications: secondary indexes.

    Secondary indexes act the same way in TokuDB as they do in InnoDB. They store the defined secondary key, and the primary key as a pointer to the rest of the row. So, say

      [Read more...]
    Why “insert … on duplicate key update” May Be Slow, by Incurring Disk Seeks
    +0 Vote Up -0Vote Down

    In my post on June 18th, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because they require disk seeks on large data sets. I previously explained why it would be better to use “replace into” or to use “insert ignore” over normal inserts. In this post, I explain why another alternative to normal inserts, “insert … on duplicate key update” is no better in MySQL, because the command incurs disk seeks.

    The reason “insert ignore” and “replace into” can be made fast with

      [Read more...]
    Making “Insert Ignore” Fast, by Avoiding Disk Seeks
    +0 Vote Up -0Vote Down

    In my post from three weeks ago, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because they require disk seeks on large data sets. Towards the end of the post, I claimed that it would be better to use “replace into” or “insert ignore” over normal inserts, because the semantics of these statements do NOT require disk seeks. In my post last week, I explained how the command “replace into” can be fast with TokuDB’s fractal trees. Today, I explain how “insert ignore” can be fast, using a strategy that is very similar to what we do

      [Read more...]
    Making “Replace Into” Fast, by Avoiding Disk Seeks
    +0 Vote Up -1Vote Down

    In this post two weeks ago, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because they require disk seeks on large data sets. Towards the end of the post, I claimed that it would be better to use “replace into” or “insert ignore” over normal inserts, because the semantics of these statements do NOT require disk seeks. In this post, I explain how the command “replace into” can be fast with fractal trees.

    The semantics of “replace into” are as follows:


    • if the primary (or unique) key does not exist, insert the new row
    • if the primary (or unique) key does exist, overwrite the existing row with the new row

    The slow, expensive way





      [Read more...]
    Making Updates Fast, by Avoiding Disk Seeks
    +1 Vote Up -0Vote Down

    The analysis that shows how to make deletions really fast by using clustering keys and TokuDB’s fractal tree based engine also applies to make updates really fast. (I left it out of the last post to keep the story simple). As a quick example, let’s look at the following statement:

    update foo set price=price+1 where product=toy;
    

    Executing this statement has two steps:


    • a query to find where product=toy
    • a combination of insertions and deletions to change old rows to new rows

    The analysis is identical to that for deletions. Just like for





      [Read more...]
    Disk seeks are evil, so let’s avoid them, pt. 4
    +0 Vote Up -0Vote Down

    Continuing in the theme from previous posts, I’d like to examine another case where we can eliminate all disk seeks from a MySQL operation and therefore get two orders-of-magnitude speedup. The general outline of these posts is:


    • B-trees do insertion disk seeks. While they’re at it, they piggyback some other work on the disk seeks. This piggyback work requires disk seeks regardless.
    • TokuDB’s Fractal Tree indexes don’t do insertion disk seeks. If we also get rid of the piggyback work, we end up with no disk seeks, and a two order of magnitude improvement.

    So it’s all about finding out which piggyback work is important (important enough to pay a huge performance penalty for), and which isn’t.





      [Read more...]
    Making Deletions Fast, by Avoiding Disk Seeks
    +1 Vote Up -2Vote Down

    In my last post, I discussed how fractal tree data structures can be up to two orders of magnitude faster on deletions over B-trees. I focused on the deletions where the row entry is known (the storage engine API handler::delete_row), but I did not fully analyze how MySQL delete statements can be fast. In this post, I do. Here I show how one can use TokuDB, a storage engine that uses fractal tree data structures, to make MySQL deletions run fast.

    Let’s take a step back and analyze the work needed to be done to execute a MySQL delete statement. Suppose we have the table:

    create table foo (
    	id auto_increment
    	a int,
    	b int,
    	primary key (id)
    )
    

    Say we wish to perform the following operation that deletes 100,000 rows:

    delete from
      [Read more...]
    Disk seeks are evil, so let’s avoid them, pt. 3 (Deletions)
    +0 Vote Up -0Vote Down

    As mentioned in parts 1 and 2, having many disk seeks are bad (they slow down performance). Fractal tree data structures minimize disk seeks on ad-hoc insertions, whereas B-trees practically guarantee that disk seeks are performed on ad-hoc insertions. As a result, fractal tree data structures can insert data up to two orders of magnitude faster than B-Trees can.

    In this post, let’s examine deletions, and get an intuitive understanding for why fractal-tree data structures exhibit the same two orders of magnitude faster deletions than B-trees. In MySQL 5.1, this advantage is really eye-popping for TokuDB v. InnoDB, because InnoDB does not use its insert buffer for deletions.

      [Read more...]
    Disk seeks are evil, so let’s avoid them, pt. 2
    +1 Vote Up -0Vote Down

    In part 1, I discussed why having many disk seeks are bad (they slow down performance), and how fractal tree data structures minimize disk seeks on ad-hoc insertions, whereas B-trees practically guarantee that disk seeks are performed on ad-hoc insertions. As a result, fractal tree data structures can insert data up to two orders of magnitude faster than B-Trees can.

    Now that insertion disk seeks are out of the way (and I don’t want to shortchange the importance of getting rid of these seeks!), let’s look at other places where databases perform seeks, and see if we can get rid of them. Over my next couple of posts, I will look at several use cases and analyze whether disk seeks are required. If disk seeks are required, then performance will suffer on large amounts of

      [Read more...]
    Fractal Tree Video from OpenSQL Camp (Portland in 2009)
    +2 Vote Up -0Vote Down

    I recently discovered that there’s a youtube video of the talk I gave at OpenSQL Camp in Portland in 2009.

    This is a whiteboard presentation and is less well developed than the talk I gave a the MySQL conference (I posted those slides two days ago. But since it includes audio it may be easier to understand.

    This talk presents the data structure underlying the TokuDB storage engine for MySQL.

    “How Fractal Trees Work” talk at MySQL 2010
    +3 Vote Up -0Vote Down

    Here’s the talk I presented at the MySQL User Conference. This talk is a fairly technical talk on how fractal trees work.

    You can find this talk and other mostly technical material at http://tokutek.com/technology/.

    Tokutek MySQL UC Talks
    +1 Vote Up -0Vote Down

    I (Bradley C. Kuszmaul) am presenting two talks at the MySQL User Conference.

    The first talk is a 5-minute talk at tonight’s Ignite MySQL session organized by Brian Aker. I’ll present some performance measurements on the Intel X25E SSD. The bottom line is that although I can get the 3,300 random 4KB writes per second, as the spec sheet advertises, I cannot seem to get more than about 11,000 reads per second, although the spec sheet says I should get 35,000.

    My second talk is tomorrow (Thursday) at 10:50am, where I’ll talk about Fractal Trees. I’ll explain how Fractal Trees work, and show why they can get one to two orders of magnitude speedup on insertions compared to B-tree indexes. The talk is about data structures and algorithms, but I think it should be easy for everyone to

      [Read more...]
    Showing entries 1 to 23

    Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates   Legal Policies | Your Privacy Rights | Terms of Use

    Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.