I’ve had a few VCs ask how we compare to Hadoop and companies
using MapReduce. With Google blessing MapReduce, it seems to be
the cool new thing. I figure I’m going to have to explain this to
VCs, so I might as well blog about it.
MapReduce is a process of dividing a problem into small pieces
and distributing (mapping) those pieces to a large number of
computers. Then it collects the processed data and merges
(reduces) it into a result set. Hadoop provides the plumbing, so
users can focus on writing the query while Hadoop handles the
dirty work of distributing the pieces and collecting the results.
Such a query, written in a procedural language like Java, is more
complex than a comparable SQL query, but more on that below.
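
To make the map and reduce steps concrete, here is a minimal word-count sketch in Python. It runs in a single process and does not use Hadoop's API. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample data are made up for illustration, so treat it as a rough picture of the idea rather than how Hadoop actually implements it.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: merge all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # On a real cluster each chunk would be mapped on a different machine;
    # here the chunks are simply processed one after another (an assumption
    # made to keep the sketch self-contained).
    mapped = chain.from_iterable(map_phase(c) for c in chunks)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input: three small "pieces" of a larger data set.
chunks = ["the quick brown fox", "the lazy dog", "the fox"]
counts = word_count(chunks)
```

For comparison, the same result in SQL would be roughly SELECT word, COUNT(*) FROM words GROUP BY word, assuming a hypothetical table words with one row per word. That gap in effort is the complexity difference mentioned above.
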
So what is MapReduce good for? It really shines when you want to
summarize, analyze or transform a very large data set. This is
why it is well suited to web data. MapReduce doesn’t use an
index, so the tradeoff you need to consider is whether …