In a world where new data processing languages appear every day, it can be helpful to have tutorials explaining language characteristics in detail from the ground up. This blog post is not such a tutorial. It also isn’t a tutorial on getting started with MySQL or Hadoop, nor is it a list of best practices for the various languages I’ll reference here – there are bound to be better ways to accomplish certain tasks, and where a choice was required, I’ve emphasized clarity and readability over performance. Finally, this isn’t meant to be a quickstart for SQL experts to access Hadoop – there are a number of SQL interfaces to Hadoop such as Impala or Hive that make Hadoop incredibly accessible to those with existing SQL skills.
Instead, this post is a pale equivalent of the …
[Read more]