Showing entries 1 to 10 of 43
10 Older Entries »
Displaying posts with tag: cloudera
Hark: The Software Paradox

Stephen O'Grady at RedMonk has launched a new podcast called Hark. In the second episode, he and Agile programming guru Kent Beck have a thoughtful discussion of the ideas in O'Grady's book "The Software Paradox." Even though software is "eating the world" and becoming more widespread and strategic, its economic value appears to be declining rapidly. Certainly, we've seen a shift in the …

Rosetta Stone: MySQL, Pig and Spark (Basics)

In a world where new data processing languages appear every day, it can be helpful to have tutorials explaining language characteristics in detail from the ground up.  This blog post is not such a tutorial.   It also isn’t a tutorial on getting started with MySQL or Hadoop, nor is it a list of best practices for the various languages I’ll reference here – there are bound to be better ways to accomplish certain tasks, and where a choice was required, I’ve emphasized clarity and readability over performance.  Finally, this isn’t meant to be a quickstart for SQL experts to access Hadoop – there are a number of SQL interfaces to Hadoop such as Impala or Hive that make Hadoop incredibly accessible to those with existing SQL skills.

Instead, this post is a pale equivalent of the …

How to Deploy a Cluster


In this blog post I will talk about how to deploy a cluster, the methods I tried, and my solution to the prerequisites problem.

I’m fairly new to the big data field. While learning about Hadoop, I kept hearing about “clusters”: deploying a cluster, installing some services on the namenode, some on the datanode, and so on. I also heard about Cloudera Manager, which helps deploy services on a cluster, so I set up a VM and followed several tutorials, including the Cloudera documentation, to install Cloudera Manager. However, every time I reached the “cluster installation” step, my installation failed. I later found out that a Cloudera Manager installation has several prerequisites, and not meeting them was the reason for the failure.


Deploy a Cluster

Though I discuss 3 other methods in detail, ultimately I recommend method …

Introducing VMware Continuent 4.0 – MySQL Clustering and Real-time Replication to Data Warehouses

It’s with great pleasure we announce the general availability of VMware Continuent 4.0 – a new suite of solutions for clustering and replication of MySQL to data warehouses.

VMware Continuent enables enterprises running business-critical database applications to achieve commercial-grade high availability (HA), globally redundant disaster recovery (DR) and performance scaling. The new suite

Hadoop BoF Session at OSCON

I have a BoF session at OSCON next week:

Migrating Data from MySQL and Oracle into Hadoop

The session is at 7pm Tuesday night – look for rooms D135 and/or D137/138.

Correction: We are now in E144 on Tuesday, with the Hadoop get-together first at 7pm and the Data Migration session to follow at 8pm.

I’m actually going to be joined by Gwen Shapira from Cloudera, who has a BoF session on Hadoop next door at the same time, along with Eric Herman. We’ll use the opportunity to talk all things Hadoop, but particularly the ingestion of data from MySQL and other databases into the Hadoop datastore.

As always, it’d be great to meet anybody interested in Hadoop at the BoF; please come along and introduce yourselves, and …

Continuent Delivers Real-Time Data to Cloudera | Business Wire

SAN JOSE, CA – May 6, 2014 – Continuent, Inc., a leading provider of open source database clustering and replication solutions, today announced that its recently announced Tungsten Replicator 3.0 solution has been certified by Cloudera, the leader in enterprise analytic data management powered by Apache Hadoop™. Continuent Tungsten Replicator 3.0 enables organizations to quickly and easily

Tungsten Replicator 3.0 is Cloudera Enterprise 5 Certified

One of the key platforms I’ve been testing on for the MySQL to Hadoop replication has been Cloudera, largely driven by customer requirements, but it’s also one of the easiest ways to get started with Hadoop.

What I’m even more pleased about is that Tungsten Replicator 3.0 is now certified for use on the new Cloudera Enterprise 5 platform. That means we’re confident that you can replicate your data from MySQL to Cloudera 5 and have it work without causing problems or difficulties with the Hadoop loading and materialisation.

Cloudera is a great product, and we’re very happy to be working so effectively with the new Cloudera Enterprise 5. Cloudera …

Typical “Big” Data Architecture

Here is the typical “Big” data architecture that covers most components involved in the data pipeline. More or less, we have the same architecture in production in a number of places [...]

CAOS Theory Podcast 2012.01.20

Topics for this podcast:

*Hadoop v1.0 and year ahead
*Oracle-Cloudera deal for more Hadoop
*Oracle’s ‘Sun spot’ with Solaris
*Open Source M&A outlook for 2012
*Our new MySQL/NoSQL/NewSQL survey

iTunes or direct download (28:49, 4.9MB)

OSSCube adds one more Cloudera Certified Developer in its Armor

OSSCube now has one more Cloudera Certified Developer for Apache Hadoop. Rakesh Kumar has become a Cloudera Certified Developer through the CCDH, the industry's only certification for software developers on Hadoop. He passed the Cloudera Certified Developer for Apache Hadoop exam after going through a rigorous training program.

Rakesh is also a MySQL certified DBA and Cluster DBA and has trained several engineers for Zend Certification Examinations.

Tags: Cloudera, Hadoop
