Showing entries 1 to 10 of 15
Displaying posts with tag: Apache Hadoop
DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake

Keeping the Uber platform reliable and real-time across our global markets is a 24/7 business. People may be going to sleep in San Francisco, but in Paris they’re getting ready for work, requesting rides from Uber driver-partners. At that same …

The post DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake appeared first on Uber Engineering Blog.

Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads

Cluster management, a common software infrastructure among technology companies, aggregates compute resources from a collection of physical hosts into a shared resource pool, amplifying compute power and allowing for the flexible use of data center hardware. At Uber, cluster management …

The post Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads appeared first on Uber Engineering Blog.

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber is committed to delivering safer and more reliable transportation across our global markets. To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks

The post Uber’s Big Data Platform: 100+ Petabytes with Minute Latency appeared first on Uber Engineering Blog.

New VMware Continuent 5.0 – A powerful and cost efficient Oracle GoldenGate alternative!

VMware Continuent 5.0 is a complete data replication solution that includes all the functionality you need at one low price. In this webinar-on-demand, you’ll see how VMware Continuent delivers: Migration. Replicate from an old version of Oracle, often running on a non-Linux platform (Windows, AIX, HP-UX, Solaris), to a new version of Oracle (often running on Linux). VMware Continuent supports

Replication in real-time from Oracle and MySQL into data warehouses and analytics

Practical tips and a live demo of how to get your data warehouse loading projects off the ground quickly and efficiently when replicating from MySQL and Oracle into Amazon Redshift, HP Vertica and Hadoop.

Webinar-on-demand. Recorded 07/23/15.

Introducing VMware Continuent 4.0 – MySQL Clustering and Real-time Replication to Data Warehouses

It’s with great pleasure we announce the general availability of VMware Continuent 4.0 – a new suite of solutions for clustering and replication of MySQL to data warehouses.

VMware Continuent enables enterprises running business-critical database applications to achieve commercial-grade high availability (HA), globally redundant disaster recovery (DR) and performance scaling. The new suite

Real-time data loading from Oracle and MySQL to data warehouses, analytics

Analyzing transactional data is becoming increasingly common, especially as data sizes and complexity increase and transactional stores are no longer able to keep pace with the ever-increasing storage. Although there are many techniques available for loading data, getting effective data in real-time into your data warehouse store is a more difficult problem. In this webinar-on-demand we showcase

theCube @ Hadoop Summit 2014 - Robert Hodges (Continuent) with John Furrier and Jeff Kelly on real-time data loading from Oracle and MySQL into Hadoop.

The Hadoop Summit, a leading Apache Hadoop industry conference, has grown significantly over the years, and throughout the day, theCUBE, led by hosts John Furrier and Jeff Kelly, featured the best of thought leaders, use cases, data scientists, data analysts, and developers at the event. Watch yesterday's interview with Robert Hodges (CEO, Continuent) on real-time data loading from Oracle and

Big Data Tools that You Need to Know About – Hadoop & NoSQL – Part 2

 

In the previous article we introduced Hadoop as the most popular Big Data toolset on the market today. We had just started talking about MapReduce as the major framework that makes Hadoop distinctive. So let’s continue the discussion where we left off.

 

MapReduce is really the key to understanding Hadoop’s parallel processing functionality, as it enables data in various formats (XML, text, binary, log, SQL, etc.) to be divided up and mapped out to many computer nodes and then recombined to produce a final data set.
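The divide, map and recombine flow described above can be sketched in a few lines of plain Python. This is a minimal single-process illustration, not Hadoop's actual API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample chunks are hypothetical, and each "chunk" stands in for the slice of input a separate node would receive.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the combined output."""
    mapped = []
    for chunk in chunks:  # on a real cluster, each chunk would go to a different node
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle(mapped))


# Hypothetical input, already divided into three chunks.
chunks = ["big data tools", "big data hadoop", "hadoop mapreduce"]
counts = word_count(chunks)
print(counts)
```

Here the loop over chunks runs sequentially; the point of the model is that each map call depends only on its own chunk, so the calls could run on different machines before the results are recombined.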

 

 

MySQL and Hadoop integration

Dolphin and Elephant: an Introduction

This post is intended for MySQL DBAs or sysadmins who need to start using Apache Hadoop and want to integrate the two solutions. In this post I will cover some basic information about Hadoop, focusing on Hive, as well as MySQL and Hadoop/Hive integration.

First of all, if you have been dealing with MySQL or any other relational database for most of your professional life (like I have), Hadoop may look different. Very different. In many ways, Hadoop is the opposite of a relational database. Unlike a database, where we have a set of tables and indexes, Hadoop works with a set of text files. And… there are no indexes at all. And yes, this may be shocking, but all scans are sequential (full “table” scans in MySQL terms).
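To make the "no indexes, sequential scans" point concrete, here is a minimal Python sketch of what a lookup looks like when the "table" is just a delimited text file. It is only an illustration under stated assumptions: full_scan, the tab delimiter and the sample rows are all hypothetical, and this is not how Hadoop or Hive is actually invoked.

```python
import io


def full_scan(lines, column, value, delimiter="\t"):
    """Return rows whose field at `column` equals `value`, plus the number of lines read.

    With no index to consult, every line has to be read and split.
    """
    matches = []
    scanned = 0
    for line in lines:
        scanned += 1
        fields = line.rstrip("\n").split(delimiter)
        if fields[column] == value:
            matches.append(fields)
    return matches, scanned


# Hypothetical tab-separated "table": id, name, city.
data = io.StringIO(
    "1\talice\tparis\n"
    "2\tbob\tsan francisco\n"
    "3\tcarol\tparis\n"
)
rows, scanned = full_scan(data, column=2, value="paris")
print(rows, scanned)
```

Every line is read regardless of how many rows match, which is the full "table" scan behaviour in MySQL terms.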

So, when does Hadoop make sense?

First, Hadoop is great if you need to …
