In a world where new data processing languages appear every
day, it can be helpful to have tutorials explaining language
characteristics in detail from the ground up. This blog
post is not such a tutorial.
It also isn’t a tutorial on getting started with
MySQL or Hadoop, nor is it a list of best practices for the
various languages I’ll reference here – there are bound to be
better ways to accomplish certain tasks, and where a choice was
required, I’ve emphasized clarity and readability over
performance. Finally, this isn’t meant to be a quickstart
for SQL experts who want to access Hadoop: SQL interfaces
such as Impala and Hive already make Hadoop
accessible to those with existing SQL skills.
Instead, this post is a pale equivalent of the …