Showing entries 1 to 12

Displaying posts with tag: Hive (reset)

Using Apache Hadoop and Impala together with MySQL for data analysis

Apache Hadoop is commonly used for data analysis. It is fast for data loads and scalable. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from MySQL to Hadoop, load the data into Cloudera Impala (columnar format), and run reports on top of it. For the examples below I will use the “ontime flight performance” data from my previous post (Increasing MySQL performance with parallel query execution). I’ve used the

  [Read more...]
Percona Live London 2013: an insider’s view of the schedule

With the close of call for papers earlier this month, the Percona Live London conference committee was in full swing this past week reviewing all of the many submissions for November’s Percona Live London MySQL Conference.

The submissions are far-ranging and cover some really interesting topics, making the lineup for Percona Live London really strong! What the committee looks for in a submission is how much “value” a talk will bring to the

  [Read more...]
Big Data with MySQL and Hadoop at MySQL Connect 2013

I will be talking about Big Data with MySQL and Hadoop at MySQL Connect 2013 (Sept. 21-22) in San Francisco as well as at Percona University in Washington, DC (September 12, 2013). Apache Hadoop is a very popular Big Data solution, and nowadays we can easily integrate it with MySQL. I will start with a brief introduction to Apache Hadoop and its components (HDFS, Map/Reduce, Hive, HBase/HCatalog, Flume, Sqoop, etc). Next I will show two major Big Data scenarios:

  • From file to Hadoop to MySQL. This is an example of an “ELT” process: Extract data from an external source; Load data into Hadoop; Transform
  [Read more...]
MySQL and Hadoop integration

Dolphin and Elephant: an Introduction

This post is intended for MySQL DBAs or sysadmins who need to start using Apache Hadoop and want to integrate the two solutions. In this post I will cover some basic information about Hadoop, focusing on Hive, as well as MySQL and Hadoop/Hive integration.

First of all, if you have been working with MySQL or any other relational database for most of your professional life (like I have), Hadoop may look different. Very different. Apparently, Hadoop is the opposite of any relational database. Unlike a database, where we have a set of tables and indexes, Hadoop works with a set of text files. And… there are no indexes at all. And yes, this may be shocking,

  [Read more...]
MySQL Applier For Hadoop: Implementation

This is a follow-up post describing the implementation details of the Hadoop Applier and the steps to configure and install it. The Hadoop Applier integrates MySQL with Hadoop by providing real-time replication of INSERTs to HDFS, so the data can be consumed by the data stores working on top of Hadoop. You can learn more about the design rationale and prerequisites in the previous post.

Design and Implementation:

Hadoop Applier replicates rows inserted into a table in MySQL to the Hadoop Distributed File System (HDFS). It uses an API provided by libhdfs, a C library to manipulate files in HDFS.

The library comes pre-compiled with Hadoop distributions. It

  [Read more...]
MySQL Applier For Hadoop: Real time data export from MySQL to HDFS

MySQL replication enables data to be replicated from one MySQL database server (the master) to one or more MySQL database servers (the slaves). However, imagine the number of use cases that could be served if the slave (to which data is replicated) weren't restricted to being a MySQL server, but could be any other database server or platform, with replication events applied in real time!
This is what the new Hadoop Applier empowers you to do.
An example of such a slave could be a data warehouse system such as Apache Hive, which uses HDFS as a data store. If you have a Hive metastore associated with HDFS (Hadoop Distributed File System), the Hadoop Applier can populate Hive


  [Read more...]
Announcing the MySQL Applier for Apache Hadoop

Enabling Real-Time MySQL to HDFS Integration

Batch processing delivered by Map/Reduce remains central to Apache Hadoop, but as the pressure to gain competitive advantage from “speed of thought” analytics grows, Hadoop itself is undergoing significant evolution. Technologies allowing real-time queries, such as Apache Drill, Cloudera Impala and the Stinger Initiative, are emerging, supported by new generations of resource management with Apache YARN.

To support this growing emphasis on real-time operations, we are releasing a new

  [Read more...]
The Data Day, Two days: January 15/16 2013

Funding for Ayasdi and Zettaset. NuoDB launches cloud database. And more

For 451 Research clients: NuoDB launches distributed ‘cloud data management system’ bit.ly/UO3ssM

— Matt Aslett (@maslett) January 15, 2013

For 451 clients: Armed with $20m series C, Lattice Engines looks to bring sales intelligence inside bit.ly/11z4VdF By Krishna Roy

— Matt Aslett (@maslett) January 16, 2013

Ayasdi Launches with $10 Million from Khosla Ventures and FLOODGATE. bit.ly/X7oemJ

— Matt Aslett (@maslett)

  [Read more...]
Typical “Big” Data Architecture
Here is a typical “Big” data architecture that covers most components involved in the data pipeline. More or less, we have the same architecture in production in a number of places[...]
HPCC vs Hadoop at a glance

Update

Since this article was written, HPCC has undergone a number of significant changes and updates. These address some of the criticism voiced in this blog post, such as the license (updated from AGPL to Apache 2.0) and integration with other tools. For more information, refer to the comments posted by Flavio Villanustre and Azana Baksh.

The original article can be read unaltered below:

Yesterday I noticed this tweet by Andrei Savu. This prompted me to read the related GigaOM article and then check out the [Read more...]
451 CAOS Links 2011.03.25

Red Hat grows revenue 20%+. Google withholding Honeycomb source code. And more.

Follow 451 CAOS Links live @caostheory on Twitter and Identi.ca, and daily at Paper.li/caostheory
“Tracking the open source news wires, so you don’t have to.”

# Red Hat reported Q4 revenue up 25% to $245m, FY revenue up 22% to $909m

# Google is withholding the source code to Honeycomb for the foreseeable future.

# Rick Clark explained why he left Rackspace amid concerns that the company is exerting too much control over OpenStack.

# DataStax launched Brisk, a Hadoop/Hive


  [Read more...]
MapReduce – DBInputFormat – Serialization on readers
Last week I was working on an EC2 MySQL server where one of the slaves was taking a lot of time to catch up, and the only job that was running on that server[...]

Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates   Legal Policies | Your Privacy Rights | Terms of Use

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.