In Part 1 we discussed Tracker, our existing architecture for ingesting MySQL data, including its wins and challenges, and outlined the new architecture with a focus on the Hadoop side. Here we’ll focus on the implementation details on the MySQL side. The uploader of data to S3 has been open-sourced as part of the Pinterest MySQL Utils.
Tracker V-0
As a proof of concept, we wrote a hacky 96-line Bash script to unblock backups to Hive for a new data set. The script spawned a bunch of workers, each of which worked on one database at a time. For each table in the database, it ran SELECT INTO OUTFILE and then uploaded the data to S3. It worked, but it was Bash, and that just isn’t a long-term solution.
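To make the shape of that proof of concept concrete, here is a minimal Python sketch of the same flow: one worker per database, and for each table a SELECT ... INTO OUTFILE followed by an upload. It is not the original Bash script. The function names, the output path layout, the S3 key layout, and the `run_query` and `upload` callables are all hypothetical stand-ins for a real MySQL client and a real S3 uploader.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def outfile_path(database: str, table: str, out_dir: str) -> str:
    """Hypothetical local path layout for one table's dump file."""
    return f"{out_dir}/{database}.{table}.tsv"


def build_dump_sql(database: str, table: str, out_dir: str) -> str:
    """Build the SELECT ... INTO OUTFILE statement for one table."""
    path = outfile_path(database, table, out_dir)
    return f"SELECT * FROM `{database}`.`{table}` INTO OUTFILE '{path}'"


def dump_database(
    database: str,
    tables: List[str],
    out_dir: str,
    run_query: Callable[[str], None],    # assumed: executes SQL on the MySQL host
    upload: Callable[[str, str], None],  # assumed: uploads (local_path, s3_key)
) -> List[str]:
    """Dump and upload every table of one database, one table at a time."""
    keys = []
    for table in tables:
        run_query(build_dump_sql(database, table, out_dir))
        key = f"{database}/{table}.tsv"  # hypothetical S3 key layout
        upload(outfile_path(database, table, out_dir), key)
        keys.append(key)
    return keys


def dump_all(
    schema: Dict[str, List[str]],
    out_dir: str,
    run_query: Callable[[str], None],
    upload: Callable[[str, str], None],
    workers: int = 4,
) -> Dict[str, List[str]]:
    """Run a pool of workers, each handling one database at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            db: pool.submit(dump_database, db, tables, out_dir, run_query, upload)
            for db, tables in schema.items()
        }
        # result() re-raises any exception from the worker thread.
        return {db: future.result() for db, future in futures.items()}
```

Passing `run_query` and `upload` in as callables keeps the sketch independent of any particular MySQL driver or S3 client, and makes it easy to exercise with fakes. Note that INTO OUTFILE writes the file on the MySQL server host, so in this sketch the upload step is assumed to run where that file is visible.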
Tracker V-1
For our …