Planet MySQL

Displaying posts with tag: sqoop

Apr

2016

Rosetta Stone: MySQL, Pig and Spark (Basics)

Posted by Todd Farmer on Wed 13 Apr 2016 20:27 UTC
Tags:

planetmysql, hadoop, cloudera, sqoop, Pig, MySQL, spark

In a world where new data processing languages appear every day, it can be helpful to have tutorials explaining language characteristics in detail from the ground up. This blog post is not such a tutorial. It also isn’t a tutorial on getting started with MySQL or Hadoop, nor is it a list of best practices for the various languages I’ll reference here – there are bound to be better ways to accomplish certain tasks, and where a choice was required, I’ve emphasized clarity and readability over performance. Finally, this isn’t meant to be a quickstart for SQL experts to access Hadoop – there are a number of SQL interfaces to Hadoop such as Impala or Hive that make Hadoop incredibly accessible to those with existing SQL skills.

Instead, this post is a pale equivalent of the …


May

2014

Archival and Analytics - Importing MySQL data into Hadoop Cluster using Sqoop

Posted by Severalnines on Fri 16 May 2014 04:46 UTC
Tags:

Other, analytics, hadoop, mariadb, sqoop, galera, MySQL, archival


We won’t bore you with buzzwords like volume, velocity and variety. This post is for MySQL users who want to get their hands dirty with Hadoop, so roll up your sleeves and prepare for work. Why would you ever want to move MySQL data into Hadoop? One good reason is archival and analytics. You might not want to delete old data, but rather move it into Hadoop and make it available for further analysis at a later stage.

In this post, we are going to deploy a Hadoop cluster and export data in bulk from a Galera Cluster using Apache Sqoop. Sqoop is a well-proven approach for bulk data loading from a relational database into the Hadoop Distributed File System (HDFS). There is also Hadoop Applier available from …


Aug

2013

Big Data with MySQL and Hadoop at MySQL Connect 2013

Posted by Alexander Rubin of MySQL Performance Blog on Thu 08 Aug 2013 10:00 UTC
Tags:

hadoop, big data, sqoop, Hive, MySQL, flume, MySQL Connect 2013, Alexander Rubin

I will be talking about Big Data with MySQL and Hadoop at MySQL Connect 2013 (Sept. 21-22) in San Francisco as well as at Percona University in Washington, DC (September 12, 2013). Apache Hadoop is a very popular Big Data solution, and nowadays we can easily integrate it with MySQL. I will start with a brief introduction to Apache Hadoop and its components (HDFS, Map/Reduce, Hive, HBase/HCatalog, Flume, Sqoop, etc.). Next I will show two major Big Data scenarios:

From file to Hadoop to MySQL. This is an example of an “ELT” process: Extract data from an external source; Load data into Hadoop; Transform data/Analyze data; Extract results to MySQL. It is similar to the original Data Warehouse ETL …


Jul

2013

MySQL and Hadoop integration

Posted by Alexander Rubin of MySQL Performance Blog on Thu 11 Jul 2013 10:00 UTC
Tags:

hadoop, sqoop, Hive, Insight for DBAs, MySQL, Apache Hadoop, Data Science, no sql

Dolphin and Elephant: an Introduction

This post is intended for MySQL DBAs or sysadmins who need to start using Apache Hadoop and want to integrate the two solutions. In this post I will cover some basic information about Hadoop, focusing on Hive as well as MySQL and Hadoop/Hive integration.

First of all, if you have been dealing with MySQL or any other relational database for most of your professional life (like I have), Hadoop may look different. Very different. Apparently, Hadoop is the opposite of any relational database. Unlike a database, where we have a set of tables and indexes, Hadoop works with a set of text files. And… there are no indexes at all. And yes, this may be shocking, but all scans are sequential (full “table” scans in MySQL terms).
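As a rough illustration of that contrast (a sketch only, not taken from the original post), the Python snippet below compares a key lookup through a prebuilt index with a filter over tab-separated text lines. The table contents and the helper names `indexed_lookup` and `sequential_scan` are hypothetical, and a plain in-memory list stands in for files stored in HDFS.

```python
# Hypothetical in-memory "table" of (id, name) rows; the data is made up.
rows = [(i, f"user{i}") for i in range(1, 1001)]

def indexed_lookup(index, key):
    """Relational-style access: a prebuilt index maps the key straight to its row."""
    return index.get(key)

def sequential_scan(lines, key):
    """Text-file-style access: parse every line and keep the ones that match."""
    matches = []
    scanned = 0
    for line in lines:
        scanned += 1
        row_id, name = line.split("\t")
        if int(row_id) == key:
            matches.append((int(row_id), name))
    return matches, scanned

# The "database" side keeps an index; the "Hadoop" side keeps only text lines.
index = {row_id: (row_id, name) for row_id, name in rows}
lines = [f"{row_id}\t{name}" for row_id, name in rows]

hit = indexed_lookup(index, 42)
matches, scanned = sequential_scan(lines, 42)
print(hit, matches, scanned)
```

Both approaches find the same row, but the scan has to read all 1000 lines to do so. That cost is what "full table scan" refers to.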

So, when does Hadoop make sense?

First, Hadoop is great if you need to …


Apr

2013

MySQL Applier For Hadoop: Real time data export from MySQL to HDFS

Posted by Shubhangi Garg on Mon 22 Apr 2013 17:34 UTC
Tags:

Replication, Introduction, hadoop, big data, sqoop, real time, Hive, MySQL, applier, lab release, Hadoop Applier, MySQL Applier for Hadoop, CDC

MySQL replication enables data to be replicated from one MySQL database server (the master) to one or more MySQL database servers (the slaves). However, imagine the number of use cases that could be served if the slave (to which data is replicated) were not restricted to being a MySQL server, but could be any other database server or platform, with replication events applied in real time!
This is what the new Hadoop Applier empowers you to do.
An example of such a slave could be a data warehouse system such as Apache Hive, which uses HDFS as a data store. If you have a Hive metastore associated with HDFS (Hadoop Distributed File System), the Hadoop Applier can populate Hive tables in real time. Data is …


Apr

2013

MySQL Applier For Hadoop: Implementation

Posted by Shubhangi Garg on Mon 22 Apr 2013 17:30 UTC
Tags:

Replication, hadoop, big data, sqoop, real time, Hive, MySQL, big, applier, lab release, Hadoop Applier, MySQL Applier for Hadoop, CDC

This is a follow-up post describing the implementation details of Hadoop Applier and the steps to configure and install it. Hadoop Applier integrates MySQL with Hadoop by providing real-time replication of INSERTs to HDFS, so the data can be consumed by the data stores working on top of Hadoop. You can learn more about the design rationale and prerequisites in the previous post.

Design and Implementation:

Hadoop Applier replicates rows inserted into a table in MySQL to the Hadoop Distributed File System (HDFS). It uses an API provided by libhdfs, a C library for manipulating files in HDFS.

The library comes pre-compiled with Hadoop distributions. It connects to the MySQL master (or read …


Apr

2013

Announcing the MySQL Applier for Apache Hadoop

Posted by Oracle MySQL Group on Mon 22 Apr 2013 15:03 UTC
Tags:

data, hadoop, sqoop, Hive, MySQL, hdfs, big, applier

Enabling Real-Time MySQL to HDFS Integration

Batch processing delivered by Map/Reduce remains central to Apache Hadoop, but as the pressure to gain competitive advantage from “speed of thought” analytics grows, Hadoop itself is undergoing significant evolution. Technologies allowing real-time queries, such as Apache Drill, Cloudera Impala and the Stinger Initiative, are emerging, supported by a new generation of resource management with Apache YARN.

To support this growing emphasis on real-time operations, we are releasing a new …


Nov

2012

MySQL and Hadoop Integration - Unlocking New Insight

Posted by Oracle MySQL Group on Thu 29 Nov 2012 18:58 UTC
Tags:

Apache, cluster, data, hadoop, BI, sqoop, NoSQL, MySQL, big

“Big Data” offers the potential for organizations to revolutionize their operations. With the volume of business data doubling every 1.2 years, analysts and business users are discovering very real benefits when integrating and analyzing data from multiple sources, enabling deeper insight into their customers, partners, and business processes.

As the world’s most popular open source database, and the most deployed database in the web and cloud, MySQL is a key component of many big data platforms, with Hadoop vendors estimating 80% of deployments are integrated with MySQL.

The new Guide to MySQL and Hadoop presents the tools enabling integration between the two data platforms, supporting the data lifecycle from acquisition and organisation to …


Jun

2011

HPCC vs Hadoop at a glance

Posted by Roland Bouman on Sat 18 Jun 2011 08:22 UTC
Tags:

Open Source, gpl, Pentaho, ETL, hadoop, agpl, business intelligence, big data, sqoop, NoSQL, Pig, Hive, Roxie, Thor, Apache v2 license, ECL, HPCC Systems

Update

Since this article was written, HPCC has undergone a number of significant changes and updates. These address some of the criticism voiced in this blog post, such as the license (updated from AGPL to Apache 2.0) and integration with other tools. For more information, refer to the comments placed by Flavio Villanustre and Azana Baksh.

The original article can be read unaltered below:

Yesterday I noticed this tweet by Andrei Savu. This prompted me to read the related GigaOM article and then check out the HPCC Systems …


Jul

2010

MapReduce – DBInputFormat – Serialization on readers

Posted by Venu Anuganti on Tue 20 Jul 2010 05:46 UTC
Tags:

database, scalability, MapReduce, cloudera, sqoop, Hadopp, cloudera import tool, DBInputFomat Locking issue, Hive, how to load mysql data to hadoop, mapreduce isolation, MySQL

Last week I was working on an EC2 MySQL server where one of the slaves was taking a lot of time to catch up, and the only job running on that server[...]
