Planet MySQL

Displaying posts with tag: Apache Hadoop

Mar

2019

DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake

Posted by Uber Engineering on Thu 14 Mar 2019 16:00 UTC
Tags:

big data, cassandra, schemaless, MySQL, Apache Hadoop, Data Analytics, Uber Data, Data Infrastructure, Hudi, Marmaray

Keeping the Uber platform reliable and real-time across our global markets is a 24/7 business. People may be going to sleep in San Francisco, but in Paris they’re getting ready for work, requesting rides from Uber driver-partners. At that same …

The post DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake appeared first on Uber Engineering Blog.

Oct

2018

Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads

Posted by Uber Engineering on Tue 30 Oct 2018 15:00 UTC
Tags:

Hardware, Architecture, data center, big data, cassandra, redis, capacity planning, MySQL, Apache Hadoop, Apache Spark, Uber, Uber Engineering, Cluster Management, Peloton, Unified Resource Scheduler, Workload Cluster

Cluster management, a common software infrastructure among technology companies, aggregates compute resources from a collection of physical hosts into a shared resource pool, amplifying compute power and allowing for the flexible use of data center hardware. At Uber, cluster management …

The post Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads appeared first on Uber Engineering Blog.

Oct

2018

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Posted by Uber Engineering on Wed 17 Oct 2018 16:00 UTC
Tags:

Apache, engineering, storage, Architecture, hadoop, data warehouse, big data, json, MySQL, Data Modeling, latency, Apache Hadoop, Docker, Apache Spark, Uber Data, PostgresSQL, hoodie, Apache Parquet, Hudi, Uber Eng

Uber is committed to delivering safer and more reliable transportation across our global markets. To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks…

The post Uber’s Big Data Platform: 100+ Petabytes with Minute Latency appeared first on Uber Engineering Blog.

Dec

2015

New VMware Continuent 5.0 – A powerful and cost efficient Oracle GoldenGate alternative!

Posted by Petri Virsunen of Continuent on Mon 21 Dec 2015 21:57 UTC
Tags:

Oracle, cloud, VMWare, big data, MySQL, Apache Hadoop, Data Analytics, database replication, Amazon Redshift, HP Vertica

VMware Continuent 5.0 is a complete data replication solution that includes all the functionality you need at one low price. In this webinar-on-demand, you’ll see how VMware Continuent delivers: Migration. Replicate from an old version of Oracle, often running on a non-Linux platform (Windows, AIX, HP-UX, Solaris), to a new version of Oracle (often running on Linux). VMware Continuent supports

Jul

2015

Replication in real-time from Oracle and MySQL into data warehouses and analytics

Posted by Petri Virsunen of Continuent on Thu 23 Jul 2015 21:00 UTC
Tags:

Oracle, data warehouse, mysql replication, big data, vertica, mapr, MySQL, HortonWorks, Apache Hadoop, Data Analytics, database replication, Amazon Redshift, HP Vertica

Practical tips and a live demo of how to get your data warehouse loading projects off the ground quickly and efficiently when replicating from MySQL and Oracle into Amazon Redshift, HP Vertica and Hadoop.

Webinar-on-demand. Recorded 07/23/15.

Apr

2015

Introducing VMware Continuent 4.0 – MySQL Clustering and Real-time Replication to Data Warehouses

Posted by Petri Virsunen of Continuent on Fri 17 Apr 2015 19:37 UTC
Tags:

Oracle, VMWare, continuent, mysql replication, cloudera, mapr, MySQL, HortonWorks, Apache Hadoop, mysql disaster recovery, mysql high availability, Pivotal, Amazon Redshift, HP Vertica, vCloud Air

It’s with great pleasure that we announce the general availability of VMware Continuent 4.0 – a new suite of solutions for clustering and replication of MySQL to data warehouses.

VMware Continuent enables enterprises running business-critical database applications to achieve commercial-grade high availability (HA), globally redundant disaster recovery (DR) and performance scaling. The new suite

Feb

2015

Real-time data loading from Oracle and MySQL to data warehouses, analytics

Posted by Petri Virsunen of Continuent on Mon 23 Feb 2015 19:45 UTC
Tags:

Oracle, Replication, data warehouse, mongodb, Continuent Tungsten, Apache Hadoop, Data Analytics, database replication, Amazon Redshift, HP Vertica

Analyzing transactional data is becoming increasingly common, especially as data sizes and complexity increase and transactional stores are no longer able to keep pace with the ever-increasing storage. Although there are many techniques available for loading data, getting effective data in real-time into your data warehouse store is a more difficult problem. In this webinar-on-demand we showcase

Jun

2014

theCube @ Hadoop Summit 2014 - Robert Hodges (Continuent) with John Furrier and Jeff Kelly on real-time data loading from Oracle and MySQL into Hadoop.

Posted by Petri Virsunen of Continuent on Fri 06 Jun 2014 18:04 UTC
Tags:

Oracle, hadoop, data warehouse, MySQL, Continuent Tungsten, Apache Hadoop, Continuent Tungsten Replicator, Hadoop Summit 2014

The Hadoop Summit, a leading Apache Hadoop industry conference, has grown significantly over the years, and throughout the day, theCUBE, led by hosts John Furrier and Jeff Kelly, featured the best of thought leaders, use cases, data scientists, data analysts, and developers at the event. Watch yesterday's interview with Robert Hodges (CEO, Continuent) on real-time data loading from Oracle and

Nov

2013

Big Data Tools that You Need to Know About – Hadoop & NoSQL – Part 2

Posted by Hovhannes Avoyan on Wed 13 Nov 2013 11:08 UTC
Tags:

News, hadoop, big data, NoSQL, Apache Hadoop, Industry Info

In the previous article we introduced Hadoop as the most popular Big Data toolset on the market today. We had just started talking about MapReduce as the major framework that makes Hadoop distinctive. So let’s continue the discussion where we left off.

MapReduce is really the key to understanding Hadoop’s parallel processing functionality, as it enables data in various formats (XML, text, binary, log, SQL, etc.) to be divided up and mapped out to many computer nodes and then recombined to produce a final data set.
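The divide, map, and recombine flow described above can be sketched in a few lines of plain Python. This is only an illustration running in a single process, not Hadoop's API: the function names (`split_input`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-count task are assumptions chosen for the example, and the "nodes" are simulated as list chunks.

```python
from collections import defaultdict
from itertools import chain


def split_input(records, num_splits):
    """Divide the input into chunks, one per simulated node (hypothetical helper)."""
    return [records[i::num_splits] for i in range(num_splits)]


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records, num_splits=3):
    """Run the whole pipeline: split, map each chunk, shuffle, reduce."""
    chunks = split_input(records, num_splits)
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    lines = ["big data tools", "Hadoop and NoSQL", "big Hadoop clusters"]
    print(word_count(lines))
```

In a real cluster each chunk would be mapped on a different machine and the shuffle would move data over the network; here everything happens in memory so the shape of the computation is easy to follow.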

…


Jul

2013

MySQL and Hadoop integration

Posted by Alexander Rubin of MySQL Performance Blog on Thu 11 Jul 2013 10:00 UTC
Tags:

hadoop, sqoop, Hive, Insight for DBAs, MySQL, Apache Hadoop, Data Science, no sql

Dolphin and Elephant: an Introduction

This post is intended for MySQL DBAs or sysadmins who need to start using Apache Hadoop and want to integrate the two. In this post I will cover some basic information about Hadoop, focusing on Hive, as well as MySQL and Hadoop/Hive integration.

First of all, if you have been dealing with MySQL or any other relational database for most of your professional life (like I have), Hadoop may look different. Very different. Apparently, Hadoop is the opposite of any relational database. Unlike a database, where we have a set of tables and indexes, Hadoop works with a set of text files. And… there are no indexes at all. And yes, this may be shocking, but all scans are sequential (full “table” scans in MySQL terms).
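To make the contrast concrete, here is a minimal Python sketch of the two access patterns. It is an illustration only: the sample data, the column layout, and the function names (`scan_for_city`, `build_city_index`) are assumptions made for this example and do not come from Hadoop or MySQL.

```python
import io

# Hypothetical sample data: id,name,city, one record per line of text.
SAMPLE = "1,alice,paris\n2,bob,london\n3,carol,paris\n"


def scan_for_city(text_file, city):
    """Sequential scan: every line is read and tested, like a full table scan."""
    examined, matches = 0, []
    for line in text_file:
        examined += 1
        _user_id, name, row_city = line.rstrip("\n").split(",")
        if row_city == city:
            matches.append(name)
    return matches, examined


def build_city_index(text_file):
    """Roughly what a database index offers: a lookup structure built in advance."""
    index = {}
    for line in text_file:
        _user_id, name, row_city = line.rstrip("\n").split(",")
        index.setdefault(row_city, []).append(name)
    return index


if __name__ == "__main__":
    # Without an index, answering the query means touching all three lines.
    print(scan_for_city(io.StringIO(SAMPLE), "paris"))
    # With an index, the answer is a single dictionary lookup.
    print(build_city_index(io.StringIO(SAMPLE))["paris"])
```

The scan reads every record no matter how few match, which is the cost the paragraph above describes; the trade-off is that no index has to be built or maintained as data arrives.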

So, when does Hadoop make sense?

First, Hadoop is great if you need to …

