June 16, 2014 By Severalnines
MongoDB is great at storing clickstream data, but using it to
analyze millions of documents can be challenging. Hadoop provides
a way of processing and analyzing data at large scale. Since it
is a parallel system, workloads can be split across multiple nodes
and computations on large datasets can be done in relatively
short timeframes. MongoDB data can be moved into Hadoop using ETL
tools like Talend or Pentaho Data Integration (Kettle).
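Talend and Kettle are GUI-driven, but the underlying extract-and-load step is easy to sketch in code. Below is a minimal illustration of the same idea, assuming pymongo and the `hdfs` WebHDFS client are installed; the hostnames, database, and collection names are placeholders, not the setup used in this post.

```python
# Minimal sketch: stream documents out of MongoDB and write them as
# newline-delimited JSON into HDFS. All connection details below are
# assumptions for illustration only.
import json
from pymongo import MongoClient
from hdfs import InsecureClient

mongo = MongoClient("mongodb://localhost:27017")      # assumed MongoDB URI
collection = mongo["clickstream"]["events"]           # hypothetical db/collection

# Assumed WebHDFS endpoint on the namenode (default port for Hadoop 2.x).
hdfs_client = InsecureClient("http://namenode:50070", user="hdfs")

with hdfs_client.write("/clickstream/events.json", encoding="utf-8") as writer:
    for doc in collection.find():
        doc["_id"] = str(doc["_id"])  # ObjectId is not JSON-serializable
        writer.write(json.dumps(doc) + "\n")
```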
In this blog, we’ll show you how to integrate your MongoDB and
Hadoop datastores using Talend. We have a MongoDB database
collecting clickstream data from several websites. We’ll create a
job in Talend to extract the documents from MongoDB, transform
them, and then load them into HDFS. We will also show you how to
schedule this job to run every 5 minutes.
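Outside of Talend's own scheduler, a common way to run such a job every 5 minutes is cron. A sketch of a crontab entry, assuming the Talend job has been exported as a shell script; the script and log paths are hypothetical:

```
# Run the exported Talend job every 5 minutes (placeholder paths).
*/5 * * * * /opt/talend/clickstream_job.sh >> /var/log/clickstream_job.log 2>&1
```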
Test Case
We have an application …