Showing entries 1 to 10 of 22
10 Older Entries »
Displaying posts with tag: MapReduce (reset)
The Uber Engineering Tech Stack, Part II: The Edge and Beyond

The end of a two-part series on the tech stack that Uber Engineering uses to make transportation as reliable as running water, everywhere, for everyone, as of spring 2016.

The post The Uber Engineering Tech Stack, Part II: The Edge and Beyond appeared first on Uber Engineering Blog.

What’s the latest with Hadoop

The Big Data explosion in recent years has created a vast number of new technologies in the area of data processing, storage, and management. One of the biggest names to appear on the scene is Hadoop. In case you need a quick review, Hadoop is a Big Data storage system that takes in large amounts of data from servers and breaks it into smaller, manageable chunks. The technology is complex but at a high level the Hadoop ecosystem essentially takes a “divide and conquer” approach to processing Big Data instead of processing data in tables, as in a relational database like Oracle or MySQL.



One projection expects …

[Read more]
On Hadoop RDBMS. Interview with Monte Zweben.

“HBase and Hadoop are the only technologies proven to scale to dozens of petabytes on commodity servers, currently being used by companies such as Facebook, Twitter, Adobe and”–Monte Zweben.

Is it possible to turn Hadoop into a RDBMS? On this topic, I have interviewed Monte Zweben, Co-Founder and Chief Executive Officer of Splice Machine.


Q1. What are the main challenges of applications and operational analytics that support real-time, interactive queries on data updated in real-time for Big Data?

Monte Zweben: Let’s break down “real-time, interactive queries on data updated in real-time for Big Data”. “Real-time, interactive queries” means that results need to be returned in milliseconds to a few seconds.
For “Data updated in real-time” to happen, …

[Read more]
Data Analytics at NBCUniversal. Interview with Matthew Eric Bassett.

“The most valuable thing I’ve learned in this role is that judicious use of a little bit of knowledge can go a long way. I’ve seen colleagues and other companies get caught up in the “Big Data” craze by spend hundreds of thousands of pounds sterling on a Hadoop cluster that sees a few megabytes [...]

On Big Data, Analytics and Hadoop. Interview with Daniel Abadi.

“Some people even think that “Hadoop” and “Big Data” are synonymous (though this is an over-characterization). Unfortunately, Hadoop was designed based on a paper by Google in 2004 which was focused on use cases involving unstructured data (e.g. extracting words and phrases from Webpages in order to create Google’s Web index). Since it was not [...]

Typical “Big” Data Architecture

Here is the typical “Big” data architecture, that covers most components involved in the data pipeline. More or less, we have the same architecture in production in number of places[...]

MySQL and Hadoop


"Improving MySQL performance using Hadoop" was the talk which I and Manish Kumar gave at Java One & Oracle Develop 2012, India. Based on the response and interest of the audience, we decided to summarize the talk in a blog post. The slides of this talk can be found here. They also include a screen-cast of a live Hadoop system pulling data from MySQL and working on the popular 'word count' problem.

MySQL and Hadoop have been popularly considered as 'Friends with benefits' and our talk was aimed at showing how!

The benefits of MySQL to developers are the speed, reliability, data integrity and …

[Read more]
A super-set of MySQL for Big Data. Interview with John Busch, Schooner.

“Legacy MySQL does not scale well on a single node, which forces granular sharding and explicit application code changes to make them sharding-aware and results in low utilization of severs”– Dr. John Busch, Schooner Information Technology A super-set of MySQL suitable for Big Data? On this subject, I have interviewed Dr. John Busch, Founder, Chairman, [...]

451 CAOS Links 2011.08.23

Engine Yard acquires Orchestra. Red Hat considers NoSQL move. And more.

# Engine Yard announced a definitive agreement to acquire Orchestra, bringing PHP expertise to the Engine Yard platform.

# Red Hat’s CEO indicated the company is interested in a NoSQL or Hadoop acquisition.

# Gluster announced Apache Hadoop compatibility in the next GlusterFS release.

# Microsoft signed an agreement with China Standard Software Co (CS2C) to …

[Read more]
451 CAOS Links 2011.07.01

A herd of Hadoop announcements. Rockmelt raises $30m. And more.

A herd of Hadoop announcements
# Yahoo! and Benchmark Capital confirmed the formation of Hortonworks, an independent company focused on the development and support of Apache Hadoop.

# Cloudera announced the availability of Cloudera Enterprise 3.5 and the launch of Cloudera SCM Express, based on the new Service and Configuration Manager in Cloudera Enterprise 3.5.

# MapR …

[Read more]
Showing entries 1 to 10 of 22
10 Older Entries »