Showing entries 131 to 140 of 211
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: big data (reset)
FROSCON and VLDB

Next week I (Bradley) will be traveling to FROSCON near Bonn, Germany, and then on to VLDB in Istanbul.

At FROSCON I’ll be talking about fast data structures for maintaining indexes. The talk will share some content with my upcoming MySQL Connect talk.

At VLDB, Dzejla Medjedovic will be presenting a talk on our paper on SSD-friendly Bloom-filter-like data structures. The paper is

Michael A. Bender, Martin Farach-Colton, Rob Johnson, Russell Kraner, Bradley C. Kuszmaul, Dzejla Medjedovic, Pablo Montes, Pradeep Shetty, Richard P. Spillane, and Erez Zadok.
Don’t Thrash: How to Cache Your Hash on Flash. PVLDB 5(11):1627-1637, 2012.

An earlier version of the paper appeared at …

[Read more]
Dagstuhl Seminar on Database Workload Management

A few weeks ago Bradley Kuszmaul and I attended the Dagstuhl Seminar on Database Workload Management.

The Dagstuhl computer science research center is (remotely) located in the countryside in Saarland, Germany. The actual building is an 18th Century Manor House, first retooled as an old-age home, and then a computer science research center. Workshop participants typically spend the whole week talking and working together.

Dagstuhl Computer Science Center

Shivnath Babu (Duke University), Goetz Graefe (Hewlett Packard), and Harumi Kuno (Hewlett Packard) did a great job organizing. …

[Read more]
A Few Thoughts on OSCon and the Open Source Community

This past week I attended OSCon, the annual conference for open source’s true believers. And there was a religious fervor in the air, particularly from the point of view of someone more accustomed to Oracle conferences.

And if open source is the religion, proprietary closed-source companies are the devil. That having been said, I was surprised how virtually all large companies were demonized. Even long-time defenders of open source like IBM were ignored at best. That didn’t prevent them from coming though, with Microsoft and HP in particular with high-profile sponsorships and PR offensives that didn’t seem to have much influence with the crowd.

The companies generating buzz were the small companies built around development of their own open source products. There are a surprising number of them out there, especially relating to multiple forks of a popular product like MySQL or …

[Read more]
Webinar: Understanding Indexing

Three rules on making indexes around queries to provide good performance

Application performance often depends on how fast a query can respond and query performance almost always depends on good indexing. So one of the quickest and least expensive ways to increase application performance is to optimize the indexes. This talk presents three simple and effective rules on how to construct indexes around queries that result in good performance.


Time: 2PM EDT / 11AM PDT

This webinar is a general discussion applicable to all databases using indexes and is not specific to any particular MySQL® storage engine (e.g., InnoDB, TokuDB®, etc.). The rules are explained using a simple model that does NOT rely on understanding B-trees, Fractal Tree® indexing, …

[Read more]
So now Hadoop's days are numbered?

Earlier this week we all read GigaOM's article with this title:
"Why the days are numbered for Hadoop as we know it"I know GigaOM like to provoke scandals sometimes, we all remember some other unforgettable piece, but there is something behind it...

Hadoop today (after SOA not so long ago) is one of the worst case of an abused buzzword ever known to men. It's everything, everywhere, can cure illnesses and do "big-data" at the same time! Wow! Actually Hadoop is a software framework that supports data-intensive distributed applications, derived from Google's MapReduce and Google File System (GFS) papers.

My take from the article is this: Hadoop is a foundation, low-level platform. I used the word …

[Read more]
My Talks at MySQL Connect and Percona Live NYC


Solving the Challenges of Big Databases with MySQL

When you’re using MySQL for big data (more than ten times as large as main memory), these challenges often arise: loading data fast; maintaining indexes under insertions deletions, and updates; adding and removing columns online; adding indexes online; preventing slave lag; and compressing data effectively.

This session shows why some of these challenges are difficult to solve with storage engines based on B-trees, how Fractal Tree® data structures work, and why they can help solve these problems. Tokutek sells a transaction-safe Fractal Tree storage engine for MySQL, but the presentation is primarily about the underlying technology. It includes a discussion of both the theoretical and practical aspects of Fractal Tree indexes.

I have the privilege of being able to give this talk at both conferences, so please stop by my presentation at …

[Read more]
ARM based data center. Inspiring.

In a previous post I wrote ARM based servers. Since then, and thanks to all the comments and responses I got, I looked more into this ARM thing and it's absolutely fascinating...

Look at this beauty (taken from the site of Calxeda, the manufacturer):

What is it? A chip? A server? No, it's a cluster of 4 servers...

And this:

is HP Redstone Server, 288 chips, 1,152 cores (Calxeda quad-core SoC) in a 4U server “Dramatically reducing the cost and complexity of cabling and …

[Read more]
Hot Table Optimization with MySQL

Table optimization is a necessary evil; tables sometimes need to be optimized to reclaim space or to improve query performance.  Unfortunately, MySQL blocks writes to a table while it is being optimized.  Because optimization time is proportional to the table size, writes can be blocked for a long time.  Fractal Tree indexes support online optimization; however, the MySQL metadata lock gets in the way of writing while optimizing.  We will describe a simple patch to MySQL that enables online optimization of TokuDB tables.

Why do tables need to be optimized?  Here are some reasons.

  • Insertions with random keys can result in a tree with underutilized leaf blocks.  Many tree algorithms split nodes in half when they become full.  If these nodes are stored in fixed sized blocks, like many B-trees do, then there can be a lot of wasted space.  Table optimization of B-trees write blocks with less …
[Read more]
The catch-22 of read/write splitting

In my previous post I covered the shard-disk paradigm's pros and cons, but the conclusion that is that it cannot really qualify as a scale-out solution, when it comes to massive OLTP, big-data, big-sessions-count and mixture of reads and writes.

Read/Write splitting is achieved when numerous replicated database servers are used for reads. This way the system can scale to cope with increase in concurrent load. This solution qualifies as a scale-out solution as it allow expansion beyond the boundaries of one DB, DB machines are shared-nothing, can be added as a slave to the replication "group" when required.


And, as a fact, read/write …

[Read more]
Why shared-storage DB clusters don't scale

Yesterday I was asked by a customer for the reason why he had failed to achieve scale with a state-of-the-art "shared-storage" cluster. "It's a scale-out to 4 servers, but with a shared disk. And I got, after tons of work and efforts, 130% throughput, not even close to the expected 400%" he said.

Well, scale-out cannot be achieved with a shared storage and the word "shared" is the key. Scale-out is done with absolutely nothing shared or a "shared-nothing" architecture. This what makes it linear and unlimited. Any shared resource, creates a tremendous burden on each and every database server in the cluster.

In a previous post, I identified database engine activities such as buffer management, locking, thread locks/semaphores, and recovery tasks - as the main bottleneck in the OLTP …

[Read more]
Showing entries 131 to 140 of 211
« 10 Newer Entries | 10 Older Entries »