451 CAOS Links 2011.01.18

Funding for OpenGamma. Riptano becomes DataStax. And more.

# OpenGamma raised $6m series B funding.

# Apache Cassandra-supporter Riptano changed its name to DataStax and has added 50 customers in 6 months.

# WANdisco acquired the Subversion user community.

# Univa hired the principal engineers from the Grid Engine team, will publish a …

How Real is the Data Deluge?

It seems obvious that given the decreasing cost of storage and computation, there's going to be a significant increase in the volume of data that organizations accumulate over the next 10 years.  But the type of data being accumulated may be different from the areas where traditional DBMSs dominated.  It's not just about transactions; it's search patterns, on-line behavior, click-thru data, events fired off by smartphones, messages over Twitter & Facebook, log data of various kinds.

If an organization can figure out a better way to identify prospects, deliver more targeted ads, or optimize pricing decisions by analyzing terabytes of data, it would be crazy not to. Over the long term, companies that don't develop these capabilities will be at a competitive disadvantage.

As to what the implications are from a …

451 CAOS Links 2010.10.08

Patents! Patents! Patents! Canonical’s perfect 10. And more.

# Google responded to Oracle’s claims that its Android OS infringes copyrights and patents related to Java.

# Matt Asay evaluated the various patent claims against Android and its related devices.

# Microsoft licensed smartphone patents from ACCESS Co and a subsidiary of Acacia Research.

# Glyn Moody …

LCA Miniconf Call for Papers: Data Storage: Databases, Filesystems, Cloud Storage, SQL and NoSQL

This miniconf aims to cover many of the current methods of data storage and retrieval and attempt to bring order to the universe. We’re aiming to cover what various systems do, what the latest developments are and what you should use for various applications.

We aim for talks from developers of and developers using the software in question.

Aiming for some combination of: PostgreSQL, Drizzle, MySQL, XFS, ext[34], Swift (open source cloud storage, part of OpenStack), memcached, TokyoCabinet, TDB/CTDB, CouchDB, MongoDB, Cassandra, HBase….. and more!

Call for Papers open NOW (Until 22nd October).

Do We Need a New Programming Language for Big Data?


I'm on the boards of two companies (Pentaho, Revolution Analytics) that are starting to see a lot of customer traction around Big Data. More and more companies in media, pharma, retail and finance are doing advanced analysis, reporting, graphing, etc. with massive data sets. It made me wonder what other areas of the technology stack might evolve with the trend towards Big Data. Obviously, there are new middleware layers like Hadoop and MapReduce, and we're also seeing the emergence of NoSQL data management layers with Cassandra, MongoDB, MemBase and others. But what …

Digg’s main competitor (Reddit) runs Cassandra, but Digg’s VP of Engineering was fired for the decision to switch.

Apparently, Digg performed a big migration from MySQL to Cassandra and a big migration to their new Digg v4 architecture and now their VP of Engineering has been shown the door:

Ever since Digg launched its new site design, it’s been plagued with all kinds of trouble, not least of which is that it keeps going down. The problems with the new architecture are so bad that VP of Engineering John Quinn is now gone, we’ve confirmed with sources close to Digg.

In a Diggnation video today, CEO Kevin Rose explained some of the technical issues the site is dealing with and why it can’t simply roll back to the previous architecture. The new version of Digg, v4, is based on a distributed database called Cassandra, which replaced the MySQL database the site ran on before. Cassandra is very advanced—it is supposed to be faster and scale …

Cassandra and Ganglia

I finally got some time to do some house cleaning. One of my nagging low-hanging-fruit jobs was to stop using jconsole as my monitor. I created a ganglia script to graph what is above. The image above shows all the Cassandra servers and their total row read stages completed in the last hour as a gauge. In essence, I am graphing the delta between ganglia script runs.
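To make the "delta between script runs" idea concrete, here is a minimal Python sketch, not the script described in this post. It assumes the counters have already been read from JMX into a dictionary, keeps the previous readings in a small JSON state file, and builds (but does not run) a `gmetric` command line for each delta. The metric name `tpc_row_read`, the state file layout, and the helper names are all hypothetical.

```python
import json


def counter_delta(previous, current):
    """Change in an ever-increasing counter since the previous run."""
    if previous is None:
        # First run: nothing to compare against, so report no change.
        return 0
    if current < previous:
        # Counter went backwards, e.g. the node restarted: count from zero.
        return current
    return current - previous


def update_deltas(state_path, readings):
    """Compare readings with the last saved run, then save the new readings."""
    try:
        with open(state_path) as fh:
            state = json.load(fh)
    except (FileNotFoundError, ValueError):
        # Missing or unreadable state file: treat this as the first run.
        state = {}
    deltas = {
        name: counter_delta(state.get(name), value)
        for name, value in readings.items()
    }
    with open(state_path, "w") as fh:
        json.dump(readings, fh)
    return deltas


def gmetric_command(name, value):
    """Build a gmetric invocation for one delta (hypothetical metric naming)."""
    return [
        "gmetric",
        "--name=" + name,
        "--value=" + str(value),
        "--type=uint32",
        "--units=count",
    ]
```

Run from cron, each invocation would call `update_deltas` with the latest counter values and pass each resulting command to `subprocess.run`. Treating a counter that went backwards as a restart is a design choice of this sketch; dropping that sample instead would be equally reasonable.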

Here is how I have it set up:

All data exposed by JMX to produce tpstats and cfstats is graphed via ganglia. The pattern for each graph is as follows


stat_class - tpc, tpp, tpa mean completed, pending, active respectively
key - would be message deserialization, for instance.

For column family stats I graph the keyspace stats as well as the specific column family …

There’s a European OpenSQL Camp coming up

In addition to the Boston edition, there’s an OpenSQL Camp at the same time and place as FrOSCon mid-August in Germany. The call for papers is open until July 11th. As always, the conference is about all kinds of open-source databases: MySQL and PostgreSQL are only two of the obvious ones; MongoDB and Cassandra featured prominently at the last one I attended, and SQLite was well represented at the first one.

451 CAOS Links 2010.04.27

VMware and Salesforce.com launch VMforce. Red Hat provides Cloud Access. And more.

# VMware and Salesforce.com launched VMforce, a platform for developing and deploying Java cloud applications.

# Red Hat Cloud Access enables enterprises to use their Red Hat Enterprise Linux subscription on Amazon Web Services.

# Canonical announced Ubuntu 10.04 LTS Server Edition, Desktop Edition and ISV support.

Cloud openness contemplated

I caught some of the keynotes and discussion at the Linux Foundation Collaboration Summit today, and was particularly interested in the panel discussion on open source and cloud computing. While we are used to hearing and talking about how important open source software is to cloud computing (open source giving to cloud computing), moderator John Mark Walker posed the question of whether cloud computing gives back. The discussion also rightfully focused on openness in cloud computing, how open source might or might not translate to cloud openness, and the importance of data being open as well.

The discussion also centered on some issues regarding open standards and how open is open enough for cloud computing. …

