It’s common wisdom that large-scale database systems require distributing the data across machines. But what seems to be missing in a lot of discussions is distributing the query processing too. By this I mean the actual computation that’s performed on the data.
I just had a conversation with Peter Zaitsev yesterday that helped make concrete some thoughts I’ve been having about Cassandra for a while. Because Cassandra doesn’t allow you to really do any computation in the data (aggregating, evaluating expressions, and so on), if you’re going to use it for truly Big data, you’re going to fetch enormous amounts of data across the network. Sure, you’re distributing the storage and retrieval across many machines — but you’re locating your data far[Read more...]