MySQL replication enables data to be replicated from one MySQL database server (the master) to one or more MySQL database servers (the slaves). However, imagine the number of use cases being served if the slave (to which data is replicated) isn't restricted to be a MySQL server; but it can be any other database server or platform with replication events applied in real-time!
This is what the new Hadoop Applier empowers you to do.
An example of such a slave could be a data warehouse system such as Apache Hive, which uses HDFS as a data store. If you have a Hive metastore associated with HDFS(Hadoop Distributed File System), the Hadoop Applier can populate Hive tables in real time. Data is …
This is a follow up post, describing the implementation details of Hadoop Applier, and steps to configure and install it. Hadoop Applier integrates MySQL with Hadoop providing the real-time replication of INSERTs to HDFS, and hence can be consumed by the data stores working on top of Hadoop. You can know more about the design rationale and per-requisites in the previous post.
Design and Implementation:
Hadoop Applier replicates rows inserted into a table in MySQL to the Hadoop Distributed File System(HDFS). It uses an API provided by libhdfs, a C library to manipulate files in HDFS.
The library comes pre-compiled with Hadoop distributions. It connects to the MySQL master (or read …
Enabling Real-Time MySQL to HDFS Integration
Batch processing delivered by Map/Reduce remains central to Apache Hadoop, but as the pressure to gain competitive advantage from “speed of thought” analytics grows, so Hadoop itself is undergoing significant evolution. The development of technologies allowing real time queries, such as Apache Drill, Cloudera Impala and the Stinger Initiative are emerging, supported by new generations of resource management with Apache YARN
To support this growing emphasis on real-time operations, we are releasing a new …[Read more]