Showing entries 1 to 6
Displaying posts with tag: hdfs (reset)
Big Data Integration & ETL - Moving Live Clickstream Data from MongoDB to Hadoop for Analytics

June 16, 2014 By Severalnines

MongoDB is great at storing clickstream data, but using it to analyze millions of documents can be challenging. Hadoop provides a way of processing and analyzing data at large scale. Since it is a parallel system, workloads can be split on multiple nodes and computations on large datasets can be done in relatively short timeframes. MongoDB data can be moved into Hadoop using ETL tools like Talend or Pentaho Data Integration (Kettle).

 

In this blog, we’ll show you how to integrate your MongoDB and Hadoop datastores using Talend. We have a MongoDB database collecting clickstream data from several websites. We’ll create a job in Talend to extract the documents from MongoDB, transform and then load them into HDFS. We will also show you how to schedule this job to be executed every 5 minutes.

 

Test Case

 

We have an application …

[Read more]
Announcing the MySQL Applier for Apache Hadoop

Enabling Real-Time MySQL to HDFS Integration

Batch processing delivered by Map/Reduce remains central to Apache Hadoop, but as the pressure to gain competitive advantage from “speed of thought” analytics grows, so Hadoop itself is undergoing significant evolution. The development of technologies allowing real time queries, such as Apache Drill, Cloudera Impala and the Stinger Initiative are emerging, supported by new generations of resource management with Apache YARN

To support this growing emphasis on real-time operations, we are releasing a new …

[Read more]
MySQL and Hadoop

Introduction

"Improving MySQL performance using Hadoop" was the talk which I and Manish Kumar gave at Java One & Oracle Develop 2012, India. Based on the response and interest of the audience, we decided to summarize the talk in a blog post. The slides of this talk can be found here. They also include a screen-cast of a live Hadoop system pulling data from MySQL and working on the popular 'word count' problem.



MySQL and Hadoop have been popularly considered as 'Friends with benefits' and our talk was aimed at showing how!

The benefits of MySQL to developers are the speed, reliability, data integrity and …

[Read more]
Using MySQL Cluster to Protect & Scale the HDFS Namenode

The MySQL Cluster product team is always interested to see new and innovative uses of the database. Last week, a team of students at the KTH Royal Institute of Technology in Sweden blogged about their use of MySQL Cluster in creating a scalable and highly available HDFS Namenode. The blog has received some pretty wide coverage, but was first picked up by Alex Popescu at the myNoSQL site

There are many established use cases of MySQL Cluster in the web, cloud/SaaS, telecoms and even flight control systems – you can see those we are allowed to talk about publicly …

[Read more]
Using MySQL Cluster to Protect & Scale the HDFS Namenode

The MySQL Cluster product team is always interested to see new and innovative uses of the database. Last week, a team of students at the KTH Royal Institute of Technology in Sweden blogged about their use of MySQL Cluster in creating a scalable and highly available HDFS Namenode. The blog has received some pretty wide coverage, but was first picked up by Alex Popescu at the myNoSQL site

There are many established use cases of MySQL Cluster in the web, cloud/SaaS, telecoms and even flight control systems – you can see those we are allowed to talk about publicly …

[Read more]
Hadoop Cluster Setup on Debian Lenny

Today I will describe the setup of a Hadoop / HDSF multi-node cluster on Debian Lenny with a redundant Namenode using DRBD and Heartbeat, four Datanodes and Tasktracker, a Backup- Checkpointnode and Rack awareness.

Hadoop Cluster Setup on Debian Lenny purposes

This article descibes how to setup a hadoop (version 0.21.0) cluster on debian lenny (version 5.x). I will not describe how to use MapReduce.

general

Hadoop is a framework for distributed computing written in Java. The project includs the following subprojects:

  • HDFS: A distributed file system
  • MapReduce: A framework for distributed large data processing

list of references

[Read more]
Showing entries 1 to 6