Planet MySQL

Displaying posts with tag: Hive

Apr

2013

MySQL Applier For Hadoop: Real time data export from MySQL to HDFS

Posted by Shubhangi Garg on Mon 22 Apr 2013 17:34 UTC
Tags:

Replication, Introduction, hadoop, big data, sqoop, real time, Hive, MySQL, applier, lab release, Hadoop Applier, MySQL Applier for Hadoop, CDC

MySQL replication enables data to be replicated from one MySQL database server (the master) to one or more MySQL database servers (the slaves). However, imagine the number of use cases that could be served if the slave (to which data is replicated) were not restricted to being a MySQL server, but could be any other database server or platform, with replication events applied in real time!
This is what the new Hadoop Applier empowers you to do.
An example of such a slave could be a data warehouse system such as Apache Hive, which uses HDFS as a data store. If you have a Hive metastore associated with HDFS (Hadoop Distributed File System), the Hadoop Applier can populate Hive tables in real time. Data is …


Apr

2013

MySQL Applier For Hadoop: Implementation

Posted by Shubhangi Garg on Mon 22 Apr 2013 17:30 UTC
Tags:

Replication, hadoop, big data, sqoop, real time, Hive, MySQL, big, applier, lab release, Hadoop Applier, MySQL Applier for Hadoop, CDC

This is a follow-up post describing the implementation details of the Hadoop Applier and the steps to install and configure it. The Hadoop Applier integrates MySQL with Hadoop by replicating INSERTs to HDFS in real time, where they can be consumed by the data stores working on top of Hadoop. You can learn more about the design rationale and prerequisites in the previous post.

Design and Implementation:

The Hadoop Applier replicates rows inserted into a table in MySQL to the Hadoop Distributed File System (HDFS). It uses an API provided by libhdfs, a C library for manipulating files in HDFS.

The library comes pre-compiled with Hadoop distributions. It connects to the MySQL master (or read …
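
As a rough illustration of the idea in the excerpt above (each inserted row ends up as a line in a file that Hive can read), here is a minimal Python sketch. It is not the Applier's code: the real tool is described as using the C library libhdfs, whereas this sketch writes to a local file as a stand-in for HDFS. The function names, the directory layout, and the file name `rows.txt` are hypothetical; the Ctrl-A field delimiter and the `\N` null marker are Hive's defaults for text tables.

```python
import os
import tempfile

# Hive's default field delimiter for text tables (Ctrl-A).
FIELD_DELIMITER = "\x01"


def format_row(values):
    """Serialize one inserted row as a single delimited text line."""
    # \N is Hive's default null marker for text tables.
    fields = ["\\N" if v is None else str(v) for v in values]
    return FIELD_DELIMITER.join(fields) + "\n"


def apply_insert(warehouse_dir, database, table, values):
    """Append one inserted row to the table's data file and return its path.

    Assumption: the layout <warehouse>/<database>.db/<table>/rows.txt is
    hypothetical, and a local file stands in for a file in HDFS.
    """
    table_dir = os.path.join(warehouse_dir, database + ".db", table)
    os.makedirs(table_dir, exist_ok=True)
    path = os.path.join(table_dir, "rows.txt")
    # Append mode: each replicated INSERT adds one line to the file.
    with open(path, "a", encoding="utf-8") as f:
        f.write(format_row(values))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as warehouse:
        path = apply_insert(warehouse, "shop", "orders", [1, "widget", None])
        apply_insert(warehouse, "shop", "orders", [2, "gadget", 9.5])
        with open(path, encoding="utf-8") as f:
            print(repr(f.read()))
```

Appending one line per inserted row keeps the sketch close to the insert-only replication the post describes; updates and deletes are deliberately out of scope here.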


Apr

2013

Announcing the MySQL Applier for Apache Hadoop

Posted by Oracle MySQL Group on Mon 22 Apr 2013 15:03 UTC
Tags:

data, hadoop, sqoop, Hive, MySQL, hdfs, big, applier

Enabling Real-Time MySQL to HDFS Integration

Batch processing delivered by Map/Reduce remains central to Apache Hadoop, but as the pressure to gain competitive advantage from “speed of thought” analytics grows, Hadoop itself is undergoing significant evolution. Technologies allowing real-time queries, such as Apache Drill, Cloudera Impala and the Stinger Initiative, are emerging, supported by new generations of resource management with Apache YARN.

To support this growing emphasis on real-time operations, we are releasing a new …


Jan

2013

The Data Day, Two days: January 15/16 2013

Posted by Matt Aslett on Wed 16 Jan 2013 19:38 UTC
Tags:

Oracle, Uncategorized, Tokutek, Hive, datastax, MySQL, HortonWorks, NuoDB, Clustrix, Impala, ObjectRocket, Ayasdi, Lattice Engines, zettaset

Funding for Ayasdi and Zettaset. NuoDB launches cloud database. And more

For 451 Research clients: NuoDB launches distributed ‘cloud data management system’ bit.ly/UO3ssM

— Matt Aslett (@maslett) January 15, 2013

For 451 clients: Armed with $20m series C, Lattice Engines looks to bring sales intelligence inside bit.ly/11z4VdF By Krishna Roy

— Matt Aslett (@maslett) January 16, 2013

Ayasdi Launches with $10 Million from Khosla Ventures and FLOODGATE. bit.ly/X7oemJ

— Matt Aslett (@maslett) …


Nov

2012

Typical “Big” Data Architecture

Posted by Venu Anuganti on Fri 30 Nov 2012 22:15 UTC
Tags:

postgresql, sql, database, scalability, ETL, hadoop, data warehouse, MapReduce, hbase, reporting, cloudera, NoSQL, vertica, Hive, bigdata, MySQL, SAS, Big Data Architecture, Big Data Warehouse, Data Architecture, Impala, NoSQL and BigData, Data Analytics, Data Science, kognitio, druid

Here is the typical “Big” data architecture, which covers most components involved in the data pipeline. More or less, we have the same architecture in production in a number of places[...]

Jun

2011

HPCC vs Hadoop at a glance

Posted by Roland Bouman on Sat 18 Jun 2011 08:22 UTC
Tags:

Open Source, gpl, Pentaho, ETL, hadoop, agpl, business intelligence, big data, sqoop, NoSQL, Pig, Hive, Roxie, Thor, Apache v2 license, ECL, HPCC Systems

Update

Since this article was written, HPCC has undergone a number of significant changes and updates. These address some of the criticism voiced in this blog post, such as the license (updated from AGPL to Apache 2.0) and integration with other tools. For more information, refer to the comments placed by Flavio Villanustre and Azana Baksh.

The original article can be read unaltered below:

Yesterday I noticed this tweet by Andrei Savu. This prompted me to read the related GigaOM article and then check out the HPCC Systems …


Mar

2011

451 CAOS Links 2011.03.25

Posted by The 451 Group on Fri 25 Mar 2011 17:11 UTC
Tags:

gpl, software, Linux, Google, opensource, symbian, Red Hat, 451 group, 451caostheory, 451group, caostheory, matt aslett, mattaslett, matthew aslett, matthewaslett, open-source, The 451 Group, the451group, hadoop, honeycomb, black duck, continuent, android, Mike Olson, tungsten, Mark Radcliffe, cloudera, Stephen Walli, rackspace, openlogic, hadoopdb, cassandra, mulesoft, Tcat server, tasktop, genuitec, Future of Open Source, North Bridge, Hive, OpenStack, clearstone, datastax, brisk, evident software, hadapt, mapr, myeclipse, rick clark

Red Hat grows revenue 20%+. Google withholding Honeycomb source code. And more.

Follow 451 CAOS Links live @caostheory on Twitter and Identi.ca, and daily at Paper.li/caostheory
“Tracking the open source news wires, so you don’t have to.”

# Red Hat reported Q4 revenue up 25% to $245m, FY revenue up 22% to $909m

# Google is withholding the source code to Honeycomb for the foreseeable future.

# Rick Clark explained why he left Rackspace amid concerns that the company is exerting too much control over OpenStack.

# DataStax …


Jul

2010

MapReduce – DBInputFormat – Serialization on readers

Posted by Venu Anuganti on Tue 20 Jul 2010 05:46 UTC
Tags:

database, scalability, MapReduce, cloudera, sqoop, Hadopp, cloudera import tool, DBInputFomat Locking issue, Hive, how to load mysql data to hadoop, mapreduce isolation, MySQL

Last week I was working on an EC2 MySQL server where one of the slaves was taking a lot of time to catch up, and the only job running on that server[...]
