The Last Mile for Big Data – Strata Overview with Jeff Kelly of Wikibon (Part 2)

During the second half of our CUBE discussion with Wikibon analyst Jeff Kelly at this year’s Strata Conference in Santa Clara, we talked about the tipping point for Big Data. Strata veterans could see at a glance that this year’s conference was markedly different. No longer the exclusive domain of geeks and database administrators, this year’s Strata featured some of the biggest enterprise vendors around. With heavyweight enterprise players Intel and EMC Greenplum announcing their own Hadoop distributions, big data is clearly going mainstream. Now that we know how to capture, store, access and analyze big data, what’s the next step? Listen in to hear my conversation with Jeff Kelly about taking big data down its last mile and finally putting it in the hands of business users.

MySQL and MongoDB – Strata Discussion with Jeff Kelly of Wikibon (Part 1)

We had the opportunity to do a CUBE interview with Wikibon analyst Jeff Kelly at last week’s Strata Conference in Santa Clara. In the first part of our conversation, we discuss how our success in integrating Tokutek’s Fractal Tree® technology into MySQL has led us to another popular database, MongoDB. We explain the results of our recent benchmarking tests with MongoDB, which indicate that adding indexing can also improve performance for this popular NoSQL database with faster insertion rates, lower query latency and greater …

MongoDB + Fractal Tree Indexes = High Compression

One doesn’t have to look far to see that there is strong interest in MongoDB compression. MongoDB has an open ticket from 2009 titled “Option to Store Data Compressed” with Fix Version/s planned but not scheduled. The ticket has a lot of comments, mostly from MongoDB users explaining their use-cases for the feature. For example, Khalid Salomão notes that “Compression would be very good to reduce storage cost and improve IO performance” and Andy notes that “SSD is getting more and more common for servers. They are very fast. The problems are high costs and low capacity.” There are many …

NoSQL is Great, But You Still Need Indexes

I’ve said it before, and, as is the nature of these things, I’ll almost certainly say it again: your database performance is only as good as your indexes.

That’s the grand thesis, so what does that mean? In any DB system (SQL, NoSQL, NewSQL, PostSQL, …), data gets ingested and organized. And the system answers queries. The pain point for most users is around the speed to answer queries. And the query speed (both latency and throughput, to be exact) depends on how the data is organized. In short: Good Indexes, Fast Queries; Poor Indexes, Slow Queries.
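To make the "Good Indexes, Fast Queries" point concrete, here is a minimal Python sketch that compares a full scan with a lookup through a sorted index. It is only an illustration: the sorted list of (key, position) pairs stands in for a real index structure such as a B-tree, and the function names (`scan_lookup`, `build_index`, `index_lookup`) and the `id` field are hypothetical, not taken from any database.

```python
def scan_lookup(rows, key):
    """No index: examine rows one by one until the key is found.

    Returns (row_or_None, rows_examined).
    """
    examined = 0
    for row in rows:
        examined += 1
        if row["id"] == key:
            return row, examined
    return None, examined


def build_index(rows):
    """Build a sorted list of (key, position) pairs.

    This is a stand-in for a real index; it is an assumption made
    for illustration, not how any particular database stores indexes.
    """
    return sorted((row["id"], pos) for pos, row in enumerate(rows))


def index_lookup(rows, index, key):
    """With an index: binary search over the sorted keys.

    Returns (row_or_None, index_entries_probed).
    """
    lo, hi, probes = 0, len(index), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if index[mid][0] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(index) and index[lo][0] == key:
        return rows[index[lo][1]], probes
    return None, probes


if __name__ == "__main__":
    n = 100_000
    # Keys in a scrambled but deterministic order (7919 is coprime to n).
    rows = [{"id": (i * 7919) % n, "payload": i} for i in range(n)]
    index = build_index(rows)
    key = rows[-1]["id"]  # worst case for the scan: the last row

    _, examined = scan_lookup(rows, key)
    _, probes = index_lookup(rows, index, key)
    print(f"full scan examined {examined} rows; index lookup probed {probes} entries")
```

The scan does work proportional to the number of rows, while the index lookup does work proportional to the logarithm of that number. The sketch leaves out the cost of keeping the index up to date as data is ingested.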

But building indexes is hard work, or at least it has been for the last several decades, because almost all indexing is done with B-trees. That’s true of commercial databases, of MySQL, and of most NoSQL solutions that do indexing. (The ones that don’t do …

Big Data and MySQL – a Discussion with SiliconANGLE on theCUBE

Given all the focus and hype on Big Data, I was excited to have the chance at the recent O’Reilly Strata Show to sit down with Jeff Kelly, one of the top-rated “Big Data” analysts, to give a MySQL perspective. Below is my interview with Jeff Kelly and David Floyer.

In the segment, you’ll find a number of topics. These include indexing technology, NoSQL vs. MySQL, when to use flash drives, how to avoid partitioning, and customer use cases.

David makes a particularly salient …

O’Reilly Strata 2012: The Year of the Data Scientist

We had the privilege this past week to be invited to be part of the 2012 O’Reilly Strata “Making Data Work” Conference. Some of our photos from the event are here. At the event, we were excited to have Tokutek described in front of the approximately 2,500 attendees during the keynote sessions.

Overall, the diversity of topics discussed at the conference was impressive, spanning databases, developer tools, data visualization techniques, customer stories, and business implications. The full agenda is here.

For those who missed it, here are …

Evidenzia Upgrades to TokuDB v5.2 to Address Storage Growth and Scale Performance

Ensuring sufficient disk I/O to catch copyright violations at network speed. Evidenzia GmbH & Co. KG

Issues addressed:

  • Storage growth, including maxed-out disk I/O utilization
  • Performance issues and business impact due to slow selects
  • Inability to revise data schema on the fly

The Company: Evidenzia GmbH & Co. KG is one of the leading partners of the software, movie and music industry when it comes to tracing copyright infringements and illegal file sharing activities in peer-to-peer networks. Evidenzia helps copyright owners in protecting their intellectual property. Their powerful technologies enable …

Tokutek Selected as a Finalist for O’Reilly Strata Conference

We are excited to announce that we’ve been named as one of ten finalists selected for the startup showcase at the O’Reilly Strata “Making Data Work” Conference at the end of this month in Santa Clara, California. The startup showcase will be held on February 29th, starting at 6:30 pm.

The conference offers a great overview of the big data space, with tracks on Data Science, Business and Industry, Visualization and Interfaces, Hadoop Applied, Hadoop Tech, Policy and Privacy, and Domain Data. With all of the “NoSQL” buzz and sessions at the show (Hadoop gets two tracks!), we are glad …

From Under the Desk to the Cloud


Review of the O’Reilly Strata Making Data Work Conference

(reprinted from my guest blog for the Cloud Council of 7)

Monica Rogati of LinkedIn told a story of the early days at the firm, when the reporting system consisted of a single server under someone’s desk. One day, someone needed an Ethernet cable and unplugged the machine from the data outlet in the wall. LinkedIn’s data reporting, its lifeblood, instantly came to a screeching halt.

The Push to the …

